Text Generation
English
How to use reasoning models.
How to use thinking models.
How to create reasoning models.
deepseek
reasoning
reason
thinking
all use cases
creative
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
romance
all genres
story
writing
vivid writing
fiction
roleplaying
bfloat16
float32
float16
role play
sillytavern
backyard
lmstudio
Text Generation WebUI
llama 3
mistral
llama 3.1
qwen 2.5
context 128k
mergekit
Merge
---
license: apache-2.0
---

<h2>How-To-Use-Reasoning-Thinking-Models-and-Create-Them - DOCUMENT</h2>

This document covers suggestions and methods for getting the most out of "Reasoning/Thinking" models, including parameters/samplers and System Prompt/Role settings, as well as links to "Reasoning/Thinking" models and how to create your own (via adapters).

This is a live document and updates will occur often.

This document and the information contained in it can be used for ANY "Reasoning/Thinking" model - at my repo and/or other repos.

---

<B>Support: Document about Parameters, Samplers and How to Set These:</B>

---

For additional generational support, general questions, detailed parameter info, and a lot more, see also:

https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

---

<B>Support: AI Auto-Correct Engine (software patch for SillyTavern Front End)</B>

---

The AI Auto-Correct Engine (built and programmed by DavidAU) auto-corrects AI generation in real time, including modifying the live generation stream to and from the AI... creating a two-way street of information that operates, changes, and edits automatically. This system works with all GGUF, EXL2, HQQ, and other quants/compressions, and with full-precision source models too.

Below is an example generation using a standard GGUF (and a standard AI app), auto-corrected via this engine. The engine is an API-level system.

Software Link:

https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE

---

<h2>MAIN: How To Use Reasoning / Thinking Models 101</h2>

<B>Special Operation Instructions:</B>

---

Template Considerations:

For most reasoning/thinking models your template CHOICE is critical, as are your System Prompt/Role setting(s) - see below.

For most models you will need: Llama 3 Instruct or Chat, ChatML, and/or Command-R, OR the standard "Jinja Autoloaded Template" (this is contained in the quant and will autoload in SOME AI apps).

The last one is usually the BEST CHOICE for a reasoning/thinking model (and in many cases for other models too).

In LM Studio, this option appears in the lower left: "template to use" -> "Manual" or "Jinja Template".

This option/setting will vary from one AI/LLM app to another.

A "Jinja" template usually ships with the model's "source code" / "full precision" version, located in the "tokenizer_config.json" file (usually at the very BOTTOM/END of the file), and is then "copied" into the GGUF quants, where it is available to AI/LLM apps.

Here is a Qwen 2.5 version example (DO NOT USE: I have added spacing/breaks for readability):

<pre>
"chat_template": "{% if not add_generation_prompt is defined %}
{% set add_generation_prompt = false %}
{% endif %}
{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}
{%- for message in messages %}
{%- if message['role'] == 'system' %}
{% set ns.system_prompt = message['content'] %}
{%- endif %}
{%- endfor %}
{{bos_token}}
{{ns.system_prompt}}
{%- for message in messages %}
{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}
{{'<|User|>' + message['content']}}
{%- endif %}
{%- if message['role'] == 'assistant' and message['content'] is none %}
{%- set ns.is_tool = false -%}
{%- for tool in message['tool_calls']%}
{%- if not ns.is_first %}
{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n'
+ '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}
{%- set ns.is_first = true -%}
{%- else %}
{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>'
+ tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n'
+ '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- if message['role'] == 'assistant' and message['content'] is not none %}
{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}
{%- set ns.is_tool = false -%}
{%- else %}
{% set content = message['content'] %}
{% if '</think>' in content %}
{% set content = content.split('</think>')[-1] %}
{% endif %}
{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}
{%- endif %}{%- endif %}
{%- if message['role'] == 'tool' %}
{%- set ns.is_tool = true -%}
{%- if ns.is_output_first %}
{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
{%- set ns.is_output_first = false %}
{%- else %}
{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
{%- endif %}
{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}
{% endif %}
{% if add_generation_prompt and not ns.is_tool %}
{{'<|Assistant|>'}}
{% endif %}"
</pre>
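
One detail worth noting in the template above: the content.split('</think>')[-1] step strips the "thought" block from prior assistant turns, so earlier reasoning is not re-fed to the model along with the chat history. If your front end stores raw transcripts, the same cleanup can be done client-side. A minimal sketch in Python (no specific app assumed; the tag name follows the template above):

<pre>
# Minimal sketch: strip the reasoning block from an assistant reply before
# storing it in chat history (mirrors content.split('</think>')[-1] above).
def strip_thinking(reply: str, close_tag: str = "</think>") -> str:
    # Keep only the text after the LAST closing think tag, if any.
    if close_tag in reply:
        return reply.split(close_tag)[-1].lstrip()
    return reply

raw = "<think>Count the r's one by one...</think>There are three r's in strawberry."
print(strip_thinking(raw))  # -> There are three r's in strawberry.
</pre>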

In some cases you may also need to set a "tokenizer" - depending on the LLM/AI app - to work with specific reasoning/thinking models. Usually this is NOT an issue, as the tokenizer is auto-detected/set, but if you are getting strange results then this might be the cause.

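If you suspect a template or tokenizer problem, you can check exactly what the model will see. A minimal sketch, assuming the Hugging Face transformers library and a placeholder local model folder (the path is illustrative, not a specific repo):

<pre>
import json
from transformers import AutoTokenizer

MODEL_DIR = "path/to/model"  # placeholder: folder containing tokenizer_config.json

# 1) Print the raw chat template, usually near the bottom of tokenizer_config.json.
with open(f"{MODEL_DIR}/tokenizer_config.json", encoding="utf-8") as f:
    print(json.load(f).get("chat_template", "no chat_template key found"))

# 2) Render a test prompt exactly as the autoloaded template would format it.
tok = AutoTokenizer.from_pretrained(MODEL_DIR)
messages = [
    {"role": "system", "content": "You are a careful reasoning assistant."},
    {"role": "user", "content": "Explain step by step: is 2027 prime?"},
]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
</pre>
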
TEMP/SETTINGS:

1. Set Temp between 0 and .8; higher than this, the "think" functions will activate differently. The most "stable" temp seems to be .6, with a variance of +/-0.05. Lower it for more "logic" reasoning, raise it for more "creative" reasoning (max .8 or so). Also set context to at least 4096 to account for "thoughts" generation.
2. At temps of 1+, 2+, etc., thought(s) will expand and become deeper and richer.
3. Set "repeat penalty" to 1.02 to 1.07 (recommended); see the sketch below.
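
The sketch below applies these settings using the llama-cpp-python bindings (an assumption; any AI app exposing temperature, repeat penalty, and context size works the same way, and the GGUF path is a placeholder):

<pre>
from llama_cpp import Llama

# Context of at least 4096 leaves room for the "thoughts" generation.
llm = Llama(model_path="path/to/reasoning-model.Q4_K_M.gguf", n_ctx=4096)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Think this through: why is the sky blue?"}],
    temperature=0.6,      # "stable" zone; lower = more logic, higher (to ~.8) = more creative
    repeat_penalty=1.05,  # recommended range: 1.02 to 1.07
    max_tokens=2048,      # room for the thinking block plus the answer
)
print(out["choices"][0]["message"]["content"])
</pre>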
PROMPTS:

ADDITIONAL SUPPORT:

For additional generational support, general questions, detailed parameter info, and a lot more, see also:

https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

---