Text Generation
English
How to use reasoning models.
How to use thinking models.
How to create reasoning models.
deepseek
reasoning
reason
thinking
all use cases
creative
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
romance
all genres
story
writing
vivid writing
fiction
roleplaying
bfloat16
float32
float16
role play
sillytavern
backyard
lmstudio
Text Generation WebUI
llama 3
mistral
llama 3.1
qwen 2.5
context 128k
mergekit
Merge
---
license: apache-2.0
---

<h2>How-To-Use-Reasoning-Thinking-Models-and-Create-Them - DOCUMENT</h2>

This document covers suggestions and methods for getting the most out of "Reasoning/Thinking" models, including parameters/samplers and System Prompt/Role settings, as well as links to "Reasoning/Thinking" models and how to create your own (via adapters).

This is a live document and updates will occur often.

This document and the information contained in it can be used for ANY "Reasoning/Thinking" model - at my repo and/or other repos.

---

<B>Support: Document about Parameters, Samplers and How to Set These:</B>

---

For additional generational support, general questions, detailed parameter info, and a lot more, see also:

https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

---

<B>Support: AI Auto-Correct Engine (software patch for SillyTavern Front End)</B>

---

The AI Auto-Correct Engine (built and programmed by DavidAU) auto-corrects AI generation in real time, including modifying the live generation stream to and from the AI... creating a two-way street of information that operates, changes, and edits automatically. This system works with all GGUF, EXL2, HQQ, and other quants/compressions, and with full-precision source models too.

Below is an example generation using a standard GGUF (and a standard AI app), auto-corrected via this engine. The engine is an API-level system.

Software Link:

https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE

---

<h2>MAIN: How To Use Reasoning / Thinking Models 101</h2>

<B>Special Operation Instructions:</B>

---

Template Considerations:

For most reasoning/thinking models your template CHOICE is critical, as are your System Prompt/Role setting(s) - see below.

For most models you will need: Llama 3 Instruct or Chat, ChatML, and/or Command-R, OR the standard "Jinja Autoloaded Template" (this is contained in the quant and will autoload in SOME AI apps).

The last one is usually the BEST CHOICE for a reasoning/thinking model (and in many cases for other models too).

In LM Studio, this option appears in the lower left: "template to use" -> "Manual" or "Jinja Template".

This option/setting will vary from one AI/LLM app to another.

A "Jinja" template usually ships with the model's "source code" / "full precision" version, located in the "tokenizer_config.json" file (usually at the very BOTTOM/END of the file), and is then "copied" into the GGUF quants, where it is available to AI/LLM apps.

Here is a Qwen 2.5 version example (DO NOT USE: I have added spacing/breaks for readability):

<pre>
"chat_template": "{% if not add_generation_prompt is defined %}
{% set add_generation_prompt = false %}
{% endif %}
{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}
{%- for message in messages %}
{%- if message['role'] == 'system' %}
{% set ns.system_prompt = message['content'] %}
{%- endif %}
{%- endfor %}
{{bos_token}}
{{ns.system_prompt}}
{%- for message in messages %}
{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}
{{'<|User|>' + message['content']}}
{%- endif %}
{%- if message['role'] == 'assistant' and message['content'] is none %}
{%- set ns.is_tool = false -%}
{%- for tool in message['tool_calls']%}
{%- if not ns.is_first %}
{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n'
+ '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}
{%- set ns.is_first = true -%}
{%- else %}
{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>'
+ tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n'
+ '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- if message['role'] == 'assistant' and message['content'] is not none %}
{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}
{%- set ns.is_tool = false -%}
{%- else %}
{% set content = message['content'] %}
{% if '</think>' in content %}
{% set content = content.split('</think>')[-1] %}
{% endif %}
{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}
{%- endif %}{%- endif %}
{%- if message['role'] == 'tool' %}
{%- set ns.is_tool = true -%}
{%- if ns.is_output_first %}
{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
{%- set ns.is_output_first = false %}
{%- else %}
{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
{%- endif %}
{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}
{% endif %}
{% if add_generation_prompt and not ns.is_tool %}
{{'<|Assistant|>'}}
{% endif %}"
</pre>
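
One detail worth noting in the template above: the content.split('</think>')[-1] step strips the "thought" block from prior assistant turns, so earlier reasoning is not re-fed to the model along with the chat history. If your front end stores raw transcripts, the same cleanup can be done client-side. A minimal sketch in Python (no specific app assumed; the tag name follows the template above):

<pre>
# Minimal sketch: strip the reasoning block from an assistant reply before
# storing it in chat history (mirrors content.split('</think>')[-1] above).
def strip_thinking(reply: str, close_tag: str = "</think>") -> str:
    # Keep only the text after the LAST closing think tag, if any.
    if close_tag in reply:
        return reply.split(close_tag)[-1].lstrip()
    return reply

raw = "<think>Count the r's one by one...</think>There are three r's in strawberry."
print(strip_thinking(raw))  # -> There are three r's in strawberry.
</pre>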

In some cases you may also need to set a "tokenizer" - depending on the LLM/AI app - to work with specific reasoning/thinking models. Usually this is NOT an issue, as the tokenizer is auto-detected/set, but if you are getting strange results then this might be the cause.

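If you suspect a template or tokenizer problem, you can check exactly what the model will see. A minimal sketch, assuming the Hugging Face transformers library and a placeholder local model folder (the path is illustrative, not a specific repo):

<pre>
import json
from transformers import AutoTokenizer

MODEL_DIR = "path/to/model"  # placeholder: folder containing tokenizer_config.json

# 1) Print the raw chat template, usually near the bottom of tokenizer_config.json.
with open(f"{MODEL_DIR}/tokenizer_config.json", encoding="utf-8") as f:
    print(json.load(f).get("chat_template", "no chat_template key found"))

# 2) Render a test prompt exactly as the autoloaded template would format it.
tok = AutoTokenizer.from_pretrained(MODEL_DIR)
messages = [
    {"role": "system", "content": "You are a careful reasoning assistant."},
    {"role": "user", "content": "Explain step by step: is 2027 prime?"},
]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
</pre>
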
TEMP/SETTINGS:

1. Set Temp between 0 and .8; higher than this, the "think" functions will activate differently. The most "stable" temp seems to be .6, with a variance of +/-0.05. Lower it for more "logic" reasoning, raise it for more "creative" reasoning (max .8 or so). Also set context to at least 4096 to account for "thoughts" generation.
2. At temps of 1+, 2+, etc., thought(s) will expand and become deeper and richer.
3. Set "repeat penalty" to 1.02 to 1.07 (recommended); see the sketch below.
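
The sketch below applies these settings using the llama-cpp-python bindings (an assumption; any AI app exposing temperature, repeat penalty, and context size works the same way, and the GGUF path is a placeholder):

<pre>
from llama_cpp import Llama

# Context of at least 4096 leaves room for the "thoughts" generation.
llm = Llama(model_path="path/to/reasoning-model.Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Think this through: why is the sky blue?"}],
    temperature=0.6,      # "stable" zone; lower = more logic, higher (to ~.8) = more creative
    repeat_penalty=1.05,  # recommended range: 1.02 to 1.07
    max_tokens=2048,      # room for the thinking block plus the answer
)
print(out["choices"][0]["message"]["content"])
</pre>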
PROMPTS:

ADDITIONAL SUPPORT:

For additional generational support, general questions, detailed parameter info, and a lot more, see also:

https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

---