LHC88 committed
Commit 837e943 · 1 parent: 2c82be4

minor fixes

Files changed (1): README.md (+70 −1)
README.md CHANGED
@@ -161,10 +161,14 @@ We recommend that you use Mistral-Small-24B-Instruct-2501 in a server/client set
1. Spin up a server:

```
- vllm serve mistralai/Mistral-Small-24B-Instruct-2501 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice
+ vllm serve --model uncensoredai/Mistral-Small-24B-Instruct-2501 \
+     --enable-auto-tool-choice --tool-call-parser mistral_v3_debug \
+     --chat-template /path/to/chat_template_with_tools.jinja \
+     /path/to/mistral_small_v3_parser.py
```

**Note:** Running Mistral-Small-24B-Instruct-2501 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
+ **Note:** Don't mind the warning about a non-Mistral tokenizer: Mistral-Small-24B-Instruct v3 does use a [LlamaTokenizer](https://huggingface.co/uncensoredai/Mistral-Small-24B-Instruct-2501/blob/2c82be49cce933e26113a754cd980ab238d957cf/tokenizer_config.json#L9018).

2. To ping the client you can use a simple Python snippet.
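The "simple Python snippet" referenced in step 2 is not reproduced in this hunk. A minimal sketch of such a ping, assuming the vLLM server from step 1 exposes its OpenAI-compatible API on the default `http://localhost:8000/v1` (the endpoint URL and request shape here are assumptions, not taken from the diff):

```python
import json
import urllib.request

# Assumed default vLLM endpoint; adjust host/port to your deployment.
URL = "http://localhost:8000/v1/chat/completions"


def build_payload(prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completion request body."""
    return {
        "model": "uncensoredai/Mistral-Small-24B-Instruct-2501",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.15,
    }


def ping(prompt: str) -> str:
    """POST the prompt to the server and return the assistant's reply."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Usage (with the server running):
#   print(ping("What's the weather like in San Francisco?"))
```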
 
 
@@ -256,6 +260,71 @@ This command reads the contents of chat_template.txt and creates a JSON object w
jq --rawfile template chat_template_with_tools.jinja '.chat_template = $template' tokenizer_config.json > temp.json && mv temp.json tokenizer_config.json
```

+ Jinja input example:
+ ```yaml
+ # System configuration
+ bos_token: "<s>"
+ eos_token: "</s>"
+
+ # Tools configuration
+ tools:
+   - type: "function"
+     function:
+       name: "get_weather"
+       description: "Get the current weather in a given location"
+       parameters:
+         type: "object"
+         properties:
+           location:
+             type: "string"
+             description: "City and state, e.g., 'San Francisco, CA'"
+           unit:
+             type: "string"
+             enum: ["celsius", "fahrenheit"]
+         required: ["location", "unit"]
+
+   - type: "function"
+     function:
+       name: "get_gold_price"
+       description: "Get the current gold price in the requested currency (defaults to USD)."
+       parameters:
+         type: "object"
+         properties:
+           currency:
+             type: "string"
+             description: "Currency code, e.g. USD or EUR."
+
+ # Messages array
+ messages:
+   # Optional system message (if omitted, a default is used)
+   - role: "system"
+     content: "You are AI."
+
+   # User message
+   - role: "user"
+     content: "What's the weather like in San Francisco?"
+
+   # Example assistant message with tool calls
+   - role: "assistant"
+     tool_calls:
+       - id: "call_weather_123456789"
+         function:
+           name: "get_weather"
+           arguments:
+             location: "San Francisco, CA"
+             unit: "celsius"
+
+   # Example tool response
+   - role: "tool"
+     tool_call_id: "call_weather_123456789"
+     content: '{"temperature": 18, "condition": "sunny"}'
+
+   # Example assistant final response
+   - role: "assistant"
+     content: "The weather in San Francisco is sunny with a temperature of 18°C."
+ ```
+
### 📝 Develop and Test Jinja Prompt Templates with [Jinja Sandbox](http://jinja.quantprogramming.com/)

[Jinja Sandbox](http://jinja.quantprogramming.com/) is a great online tool for **testing Jinja prompt templates** before integrating them into your application. It allows you to quickly render templates with custom input data and debug formatting issues.
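The jq one-liner above (splicing `chat_template_with_tools.jinja` into `tokenizer_config.json` under the `chat_template` key) can also be done with stdlib Python, which avoids a jq dependency. A sketch, assuming both files are in the current directory; the function name is this editor's, not part of the repo:

```python
import json


def embed_chat_template(config_path: str, template_path: str) -> None:
    """Set the "chat_template" key of a tokenizer config to the raw contents
    of a Jinja template file, mirroring the jq one-liner above."""
    with open(template_path, encoding="utf-8") as f:
        template = f.read()
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    config["chat_template"] = template
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)


# Usage:
#   embed_chat_template("tokenizer_config.json", "chat_template_with_tools.jinja")
```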