Thinking=True on GGUF?
How do we set thinking parameter to true on GGUF? 🤔
Hi! We're actively working on an Ollama model with the corresponding Go template. Ultimately, enabling thinking is a matter of enabling the right section of the system prompt, so in the meantime you can use `apply_chat_template` on the client side, then use the expanded string with raw `generate`.
@gabegoodhart is there any way to do this with llama.cpp currently? (not ollama) Thank you!
@quantflex At the moment, to do this in llama.cpp, you would have to use `apply_chat_template` with `thinking=True` on the client side (or do the equivalent string manipulation in the programming language of your choice) and then use the formatted string as input for raw generation. We have not updated the built-in chat template logic in llama.cpp itself yet.
The key addition to the system prompt can be seen at line 88 here: https://ollama.com/gabegoodhart/granite3.2-preview:8b/blobs/f7e156ba65ab
Thank you @gabegoodhart !