Text Generation
English
How to use reasoning models.
How to use thinking models.
How to create reasoning models.
deepseek
reasoning
reason
thinking
all use cases
creative
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
romance
all genres
story
writing
vivid writing
fiction
roleplaying
bfloat16
float32
float16
role play
sillytavern
backyard
lmstudio
Text Generation WebUI
llama 3
mistral
llama 3.1
qwen 2.5
context 128k
mergekit
Merge
---

<B>Template Considerations:</B>

For most reasoning/thinking models your template CHOICE is critical, as well as your System Prompt / Role setting(s) (see below).

Here is a Qwen 2.5 version example (DO NOT USE: I have added spacing/breaks for readability):

<pre>
<small>
"chat_template": "{% if not add_generation_prompt is defined %}
{% set add_generation_prompt = false %}
{% endif %}

...

{% if add_generation_prompt and not ns.is_tool %}
{{'<|Assistant|>'}}
{% endif %}"
</small>
</pre>
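To see what a "chat_template" like this actually does, here is a minimal sketch. The template below is a simplified stand-in for illustration, NOT the full Qwen 2.5 template above:

```python
# Minimal sketch of how a "chat_template" is applied:
# it is a Jinja template that turns a message list into the prompt text.
from jinja2 import Template

# Simplified placeholder template (assumption for illustration only).
chat_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>{{ message['content'] }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|Assistant|>' }}{% endif %}"
)

messages = [{"role": "User", "content": "Why is the sky blue?"}]

prompt = Template(chat_template).render(
    messages=messages, add_generation_prompt=True
)
print(prompt)  # <|User|>Why is the sky blue?<|Assistant|>
```

In practice you rarely render this by hand: apps like LM Studio or the `transformers` library read the `chat_template` field from the model's tokenizer config and apply it for you (e.g. `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`).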
In some cases you may need to set a "tokenizer" too - depending on the LLM/AI app - to work with specific reasoning/thinking models. Usually this is NOT an issue, as it is auto-detected/set, but if you are getting strange results then this might be the cause.

Additional section "General Notes" is at the end of this document.

TEMP/SETTINGS:
...

4. Concise yet Complete: Ensure responses are informative, yet to the point without unnecessary elaboration.
5. Maintain a professional, intelligent, and analytical tone in all interactions.
</PRE>

---
<B>General Notes:</B>

These are general notes collected from my various repos and/or from experiences with both specific models and models in general.

These notes may assist you with the operation of other models as well.

---

From:

https://huggingface.co/DavidAU/L3.1-MOE-2X8B-Deepseek-DeepHermes-e32-uncensored-abliterated-13.7B-gguf

Due to how this model is configured, I suggest 2-4 generations depending on your use case(s), as each will vary widely in terms of context, thinking/reasoning and response.

Likewise, depending on how your prompt is worded, it may take 1-4 regens for "thinking" to engage. Sometimes, however, the model will generate a response, then think/reason and improve on this response and continue again. This comes in part from the "Deepseek" parts of the model.

If you raise temp over .9, you may want to consider 4+ generations.

Note on "reasoning/thinking": this will activate depending on the wording of your prompt(s) and also the temp selected.

There can also be variations because of how the models interact per generation.

Also, as a general note:

If you are getting "long winded" generation/thinking/reasoning, you may want to break down the "problem(s)" to solve into one or more prompts. This allows the model to focus more strongly, and in some cases give far better answers.

IE:

If you ask it to generate 6 general plots for a story VS generate one plot with these specific requirements, you may get better results.
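The "one focused prompt at a time" idea can be sketched as a simple loop. Here `generate` is a placeholder stub (an assumption, not a real API) standing in for whatever completion call your LLM app exposes, and the theme is made up:

```python
# Sketch of breaking one big request into several focused prompts.
# `generate` is a hypothetical stand-in for your LLM app's completion call.
def generate(prompt: str) -> str:
    return f"[model output for: {prompt}]"  # stub for illustration

theme = "a lighthouse keeper who hears voices in the fog"  # example theme

# Instead of one prompt asking for 6 plots at once, ask for ONE per call,
# so the model can focus on a single plot each generation:
plots = [
    generate(f"Generate ONE plot for this theme: {theme}. Plot #{n}:")
    for n in range(1, 7)
]
print(len(plots))  # 6
```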
---

From:

https://huggingface.co/DavidAU/Qwen2.5-MOE-6x1.5B-DeepSeek-Reasoning-e32-gguf

Temp of .4 to .8 is suggested; however, it will still operate at much higher temps like 1.8, 2.6 etc.

Depending on your prompt, change temp SLOWLY: IE: .41, .42, .43 ... etc.

Likewise, because these are small models, the model may do a tonne of "thinking"/"reasoning" and then "forget" to finish the task(s). In this case, prompt the model to "Complete the task XYZ with the 'reasoning plan' above".

Likewise, it may function better if you break down the reasoning/thinking task(s) into smaller pieces:

"IE: Instead of asking for 6 plots FOR theme XYZ, ASK IT for ONE plot for theme XYZ at a time."

Also set the context limit at 4k minimum; 8K+ is suggested.

---
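For background on what the temp advice above is doing: temperature divides the model's logits before sampling, so a low temp sharpens the token distribution (more deterministic) and a high temp flattens it (more random/creative). A minimal sketch of standard softmax temperature scaling (generic illustration with toy logits, not code from any of these repos):

```python
import math

def softmax_with_temp(logits: list[float], temp: float) -> list[float]:
    """Scale logits by 1/temp, then softmax into a probability distribution."""
    scaled = [x / temp for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]               # toy logits for three candidate tokens
low = softmax_with_temp(logits, 0.4)   # sharp: the top token dominates
high = softmax_with_temp(logits, 1.8)  # flat: more randomness/creativity
print(round(low[0], 2), round(high[0], 2))  # 0.92 0.52
```

This is why small temp steps (.41, .42, .43 ...) matter: the division is applied to every logit, so even tiny changes shift the whole distribution.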