DavidAU committed 8630094 (verified) · Parent(s): 8ed2a73

Update README.md

Files changed (1): README.md (+120 -3). The updated file follows.
 
---
license: apache-2.0
---

<h2>How-To-Use-Reasoning-Thinking-Models-and-Create-Them - DOCUMENT</h2>

This document covers suggestions and methods to get the most out of "Reasoning/Thinking" models, including parameters/samplers and System Prompt/Role settings, as well as links to "Reasoning/Thinking" models and how to create your own (via adapters).

This is a live document and will be updated often.

This document and the information in it can be used for ANY "Reasoning/Thinking" model - at my repo and/or other repos.

---

<B>Support: Document about Parameters, Samplers and How to Set These</B>

---

For additional generation support, general questions, detailed parameter info, and a lot more, see:

https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

---

<B>Support: AI Auto-Correct Engine (software patch for the SillyTavern front end)</B>

---
The AI Auto-Correct Engine (built and programmed by DavidAU) auto-corrects AI generation in real time, including modification of the live generation stream to and from the AI - creating a two-way street of information that operates, changes, and edits automatically. This system works with all GGUF, EXL2, HQQ, and other quants/compressions, as well as with full source models.

The engine is an API-level system; see the software link below for details and an example generation using a standard GGUF (and a standard AI app), auto-corrected via the engine.

Software Link:

https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE
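
To make the idea concrete, here is a minimal, hypothetical sketch of what an API-level correction layer can do. This is NOT the engine's actual code, and the correct_chunk rule is an invented stand-in; it only illustrates intercepting and editing a live generation stream:

<pre>
# Hypothetical sketch of an API-level auto-correct layer (NOT the real engine):
# it sits between the client and the model's streaming API and rewrites
# chunks of the generation stream in flight.
import re
from typing import Callable, Iterable, Iterator

def correct_chunk(text: str) -> str:
    """Invented correction rule: collapse repeated words ("the the" -> "the")."""
    return re.sub(r"\b(\w+)( \1\b)+", r"\1", text, flags=re.IGNORECASE)

def corrected_stream(
    token_stream: Iterable[str],
    fix: Callable[[str], str] = correct_chunk,
) -> Iterator[str]:
    """Buffer the stream into sentences, correct each, then re-emit."""
    buffer = ""
    for tok in token_stream:
        buffer += tok
        # Flush on sentence boundaries so corrections see whole sentences.
        while any(p in buffer for p in ".!?"):
            cut = max(buffer.rfind(p) for p in ".!?") + 1
            yield fix(buffer[:cut])
            buffer = buffer[cut:]
    if buffer:
        yield fix(buffer)

# Stand-in for a real streaming LLM API response:
fake_tokens = ["The the ", "cat sat. ", "It it ", "purred."]
print("".join(corrected_stream(fake_tokens)))  # -> "The cat sat. It purred."
</pre>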

---

<h2>MAIN: How To Use Reasoning / Thinking Models 101</h2>

<B>Special Operation Instructions:</B>

---

Template Considerations:

For most reasoning/thinking models your template CHOICE is critical, as are your System Prompt/Role settings (covered below).

For most models you will need: Llama 3 Instruct or Chat, ChatML, and/or Command-R, OR the standard "Jinja Autoloaded Template" (this is contained in the quant and will autoload in SOME AI apps).

The last one is usually the BEST CHOICE for a reasoning/thinking model (and in many cases for other models too).

In LM Studio, this option appears in the lower left: "template to use" -> "Manual" or "Jinja Template".

This option/setting will vary from one AI/LLM app to another.

A "Jinja" template is usually found in the model's "source code" / "full precision" version, typically at the very BOTTOM/END of the "tokenizer_config.json" file, and is then "copied" into the GGUF quants, where it is available to AI/LLM apps.

Here is a Qwen 2.5 based example (DO NOT USE AS-IS: I have added spacing/breaks for readability):

<pre>
"chat_template": "{% if not add_generation_prompt is defined %}
{% set add_generation_prompt = false %}
{% endif %}
{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}
{%- for message in messages %}
{%- if message['role'] == 'system' %}
{% set ns.system_prompt = message['content'] %}
{%- endif %}
{%- endfor %}
{{bos_token}}
{{ns.system_prompt}}
{%- for message in messages %}
{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}
{{'<|User|>' + message['content']}}
{%- endif %}
{%- if message['role'] == 'assistant' and message['content'] is none %}
{%- set ns.is_tool = false -%}
{%- for tool in message['tool_calls']%}
{%- if not ns.is_first %}
{{'<|Assistant|><|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\\n'
+ '```json' + '\\n' + tool['function']['arguments'] + '\\n' + '```' + '<|tool▁call▁end|>'}}
{%- set ns.is_first = true -%}
{%- else %}
{{'\\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>'
+ tool['function']['name'] + '\\n' + '```json' + '\\n' + tool['function']['arguments'] + '\\n'
+ '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}
{%- endif %}
{%- endfor %}
{%- endif %}
{%- if message['role'] == 'assistant' and message['content'] is not none %}
{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + message['content'] + '<|end▁of▁sentence|>'}}
{%- set ns.is_tool = false -%}
{%- else %}
{% set content = message['content'] %}
{% if '</think>' in content %}
{% set content = content.split('</think>')[-1] %}
{% endif %}
{{'<|Assistant|>' + content + '<|end▁of▁sentence|>'}}
{%- endif %}{%- endif %}
{%- if message['role'] == 'tool' %}
{%- set ns.is_tool = true -%}
{%- if ns.is_output_first %}
{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
{%- set ns.is_output_first = false %}
{%- else %}
{{'\\n<|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}
{%- endif %}
{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}
{% endif %}
{% if add_generation_prompt and not ns.is_tool %}
{{'<|Assistant|>'}}
{% endif %}"
</pre>
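
Note how the template strips everything up to '</think>' from previous assistant turns, so earlier "thoughts" are not fed back into context. If you want to inspect or test a template yourself, here is a minimal sketch (an assumed setup, not part of the model card) that loads the "chat_template" entry from tokenizer_config.json and renders it with the jinja2 package - the file path and token strings are placeholders that vary per model:

<pre>
# Render a model's chat template by hand to see the exact prompt format.
# Assumes: pip install jinja2 ; tokenizer_config.json downloaded from the repo.
import json
from jinja2 import Template

with open("tokenizer_config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

# The template usually sits near the bottom of the file under this key.
template = Template(config["chat_template"])

rendered = template.render(
    bos_token="<|begin▁of▁sentence|>",  # BOS string differs per model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    add_generation_prompt=True,  # append the assistant tag so generation starts there
)
print(rendered)
</pre>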

In some cases you may also need to set a "tokenizer" - depending on the LLM/AI app - to work with specific reasoning/thinking models. Usually this is NOT an issue, as it is auto-detected/set, but if you are getting strange results then a tokenizer mismatch might be the cause.
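
If your app exposes the tokenizer, a quick sanity check along these lines (using the Hugging Face transformers library; the model ID is a placeholder example) confirms the tokenizer and its bundled chat template load and render together:

<pre>
# Verify that the tokenizer and its chat template are picked up correctly.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "How many r's are in strawberry?"}],
    tokenize=False,
    add_generation_prompt=True,  # ends the prompt at the assistant tag
)
print(prompt)  # garbled or missing special tokens here = template/tokenizer mismatch
</pre>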
TEMP/SETTINGS:

1. Set temp between 0 and .8; higher than this and the "think" functions will activate differently. The most "stable" temp seems to be .6, with a variance of +/-0.05. Lower it for more "logic" reasoning, raise it for more "creative" reasoning (max .8 or so). Also set context to at least 4096, to account for "thoughts" generation.
2. For temps of 1+, 2+, etc., thought(s) will expand and become deeper and richer.
3. Set "repeat penalty" to 1.02 to 1.07 (recommended). A sketch of these settings in code appears after this list.
 

PROMPTS:

[...]

ADDITIONAL SUPPORT:

For additional generation support, general questions, detailed parameter info, and a lot more, see:

https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters

---