minor fixes
README.md CHANGED
@@ -161,10 +161,14 @@ We recommend that you use Mistral-Small-24B-Instruct-2501 in a server/client setup
 1. Spin up a server:
 
 ```
-vllm serve
+vllm serve --model uncensoredai/Mistral-Small-24B-Instruct-2501 \
+    --enable-auto-tool-choice --tool-call-parser mistral_v3_debug \
+    --chat-template /path/to/chat_template_with_tools.jinja \
+    --tool-parser-plugin /path/to/mistral_small_v3_parser.py
 ```
 
 **Note:** Running Mistral-Small-24B-Instruct-2501 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
+**Note:** Don't mind the warning about a non-Mistral tokenizer: Mistral-Small-24B-Instruct v3 does use a [LlamaTokenizer](https://huggingface.co/uncensoredai/Mistral-Small-24B-Instruct-2501/blob/2c82be49cce933e26113a754cd980ab238d957cf/tokenizer_config.json#L9018).
 
 2. To ping the client you can use a simple Python snippet.
 
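The Python snippet mentioned in step 2 sits outside this hunk; the sketch below shows what such a ping could look like, using only the standard library. The host/port (`localhost:8000`, vLLM's default), the helper names, and the prompt are illustrative assumptions, not part of the README.

```python
# Hypothetical client ping for the vLLM server from step 1, assuming it
# listens on the default http://localhost:8000 and serves the
# OpenAI-compatible /v1/chat/completions route.
import json
import urllib.request

URL = "http://localhost:8000/v1/chat/completions"
MODEL = "uncensoredai/Mistral-Small-24B-Instruct-2501"

def build_request(prompt: str) -> urllib.request.Request:
    """Build the chat-completion POST request without sending it."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ping(prompt: str) -> str:
    """Send the request and return the assistant's reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ping("Who are you?")  # requires the server from step 1 to be running
```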
@@ -256,6 +260,71 @@ This command reads the contents of chat_template.txt and creates a JSON object w
 jq --rawfile template chat_template_with_tools.jinja '.chat_template = $template' tokenizer_config.json > temp.json && mv temp.json tokenizer_config.json
 ```
 
+Jinja input example:
+```yaml
+# System configuration
+bos_token: "<s>"
+eos_token: "</s>"
+
+# Tools configuration
+tools:
+  - type: "function"
+    function:
+      name: "get_weather"
+      description: "Get the current weather in a given location"
+      parameters:
+        type: "object"
+        properties:
+          location:
+            type: "string"
+            description: "City and state, e.g., 'San Francisco, CA'"
+          unit:
+            type: "string"
+            enum: ["celsius", "fahrenheit"]
+        required: ["location", "unit"]
+
+  - type: "function"
+    function:
+      name: "get_gold_price"
+      description: "Get the current gold price in the wanted currency (defaults to USD)."
+      parameters:
+        type: "object"
+        properties:
+          currency:
+            type: "string"
+            description: "Currency code, e.g. USD or EUR."
+
+# Messages array
+messages:
+  # Optional system message (if omitted, a default will be used)
+  - role: "system"
+    content: "You are AI."
+
+  # User message
+  - role: "user"
+    content: "What's the weather like in San Francisco?"
+
+  # Example assistant message with tool calls
+  - role: "assistant"
+    tool_calls:
+      - id: "call_weather_123456789"
+        function:
+          name: "get_weather"
+          arguments:
+            location: "San Francisco, CA"
+            unit: "celsius"
+
+  # Example tool response
+  - role: "tool"
+    tool_call_id: "call_weather_123456789"
+    content: '{"temperature": 18, "condition": "sunny"}'
+
+  # Example assistant final response
+  - role: "assistant"
+    content: "The weather in San Francisco is sunny with a temperature of 18°C."
+```
+
 ### 📝 Develop and Test Jinja Prompt Templates with [Jinja Sandbox](http://jinja.quantprogramming.com/)
 
 [Jinja Sandbox](http://jinja.quantprogramming.com/) is a great online tool for **testing Jinja prompt templates** before integrating them into your application. It allows you to quickly render templates with custom input data and debug formatting issues.
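On systems without `jq`, the template-embedding one-liner from the hunk above can be reproduced with the Python standard library. A minimal sketch, reusing the README's file names (the function name is hypothetical):

```python
# Stdlib equivalent of the jq one-liner: read the Jinja chat template and
# store it under the "chat_template" key of tokenizer_config.json.
import json
from pathlib import Path

def embed_chat_template(template_path: str, config_path: str) -> None:
    template = Path(template_path).read_text(encoding="utf-8")
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    config["chat_template"] = template
    # Rewriting in place replaces jq's "> temp.json && mv" shuffle.
    Path(config_path).write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )

# embed_chat_template("chat_template_with_tools.jinja", "tokenizer_config.json")
```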