Thinking=True on GGUF?

#7
by MrDevolver - opened

How do we set thinking parameter to true on GGUF? 🤔

IBM Granite org

Hi! We're actively working on an Ollama model with the corresponding go template. Ultimately, enabling thinking is a matter of enabling the right section of system prompt, so in the meantime you can use apply_chat_template on the client side, then use the expanded string with raw generate.

IBM Granite org

The draft Ollama model is now public: https://ollama.com/gabegoodhart/granite3.2-preview

@gabegoodhart is there any way to do this with llama.cpp currently? (not ollama) Thank you!

@quantflex At the moment, to do this in llama.cpp, you would have to use apply_chat_template with thinking=True on the client side (or do the equivalent string manipulation in the programming language of your choice) and then use the formatted string as input for the raw generation. We have not updated the built in chat template logic in llama.cpp itself yet.

The key addition to the system prompt can be seen at line 88 here: https://ollama.com/gabegoodhart/granite3.2-preview:8b/blobs/f7e156ba65ab

Thank you @gabegoodhart !

Sign up or log in to comment