ibm-granite/granite-3.2-8b-instruct-preview

13 days ago

How do we set thinking parameter to true on GGUF? 🤔

IBM Granite org 13 days ago

Hi! We're actively working on an Ollama model with the corresponding go template. Ultimately, enabling thinking is a matter of enabling the right section of system prompt, so in the meantime you can use apply_chat_template on the client side, then use the expanded string with raw generate.

gabegoodhart

IBM Granite org 13 days ago

The draft Ollama model is now public: https://ollama.com/gabegoodhart/granite3.2-preview

quantflex

8 days ago

@gabegoodhart is there any way to do this with llama.cpp currently? (not ollama) Thank you!

gabegoodhart

IBM Granite org 6 days ago

•

edited 6 days ago

@quantflex At the moment, to do this in llama.cpp, you would have to use apply_chat_template with thinking=True on the client side (or do the equivalent string manipulation in the programming language of your choice) and then use the formatted string as input for the raw generation. We have not updated the built in chat template logic in llama.cpp itself yet.

The key addition to the system prompt can be seen at line 88 here: https://ollama.com/gabegoodhart/granite3.2-preview:8b/blobs/f7e156ba65ab

quantflex

about 15 hours ago

Thank you @gabegoodhart !

ibm-granite
/

granite-3.2-8b-instruct-preview

Thinking=True on GGUF?