minor fixes
README.md CHANGED
@@ -161,10 +161,14 @@ We recommend that you use Mistral-Small-24B-Instruct-2501 in a server/client setup
 1. Spin up a server:
 
 ```
-vllm serve
+vllm serve --model uncensoredai/Mistral-Small-24B-Instruct-2501 \
+    --enable-auto-tool-choice --tool-call-parser mistral_v3_debug \
+    --chat-template /path/to/chat_template_with_tools.jinja \
+    --tool-parser-plugin /path/to/mistral_small_v3_parser.py
 ```
 
 **Note:** Running Mistral-Small-24B-Instruct-2501 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
+**Note:** Don't mind the warning about a non-Mistral tokenizer: Mistral-Small-24B-Instruct v3 does use a [LlamaTokenizer](https://huggingface.co/uncensoredai/Mistral-Small-24B-Instruct-2501/blob/2c82be49cce933e26113a754cd980ab238d957cf/tokenizer_config.json#L9018).
 
 2. To ping the client you can use a simple Python snippet.
 
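The Python snippet mentioned in step 2 sits outside this hunk; the sketch below shows what such a ping could look like, using only the standard library. The host/port (`localhost:8000`, vLLM's default), the helper names, and the prompt are illustrative assumptions, not part of the README.

```python
# Hypothetical client ping for the vLLM server from step 1, assuming it
# listens on the default http://localhost:8000 and serves the
# OpenAI-compatible /v1/chat/completions route.
import json
import urllib.request

URL = "http://localhost:8000/v1/chat/completions"
MODEL = "uncensoredai/Mistral-Small-24B-Instruct-2501"

def build_request(prompt: str) -> urllib.request.Request:
    """Build the chat-completion POST request without sending it."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ping(prompt: str) -> str:
    """Send the request and return the assistant's reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# ping("Who are you?")  # requires the server from step 1 to be running
```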
@@ -256,6 +260,71 @@ This command reads the contents of chat_template.txt and creates a JSON object w
 jq --rawfile template chat_template_with_tools.jinja '.chat_template = $template' tokenizer_config.json > temp.json && mv temp.json tokenizer_config.json
 ```
 
+Jinja input example:
+```yaml
+# System configuration
+bos_token: "<s>"
+eos_token: "</s>"
+
+# Tools configuration
+tools:
+  - type: "function"
+    function:
+      name: "get_weather"
+      description: "Get the current weather in a given location"
+      parameters:
+        type: "object"
+        properties:
+          location:
+            type: "string"
+            description: "City and state, e.g., 'San Francisco, CA'"
+          unit:
+            type: "string"
+            enum: ["celsius", "fahrenheit"]
+        required: ["location", "unit"]
+
+  - type: "function"
+    function:
+      name: "get_gold_price"
+      description: "Get the current gold price in the wanted currency (defaults to USD)."
+      parameters:
+        type: "object"
+        properties:
+          currency:
+            type: "string"
+            description: "Currency code, e.g. USD or EUR."
+
+# Messages array
+messages:
+  # Optional system message (if omitted, a default will be used)
+  - role: "system"
+    content: "You are AI."
+
+  # User message
+  - role: "user"
+    content: "What's the weather like in San Francisco?"
+
+  # Example assistant message with tool calls
+  - role: "assistant"
+    tool_calls:
+      - id: "call_weather_123456789"
+        function:
+          name: "get_weather"
+          arguments:
+            location: "San Francisco, CA"
+            unit: "celsius"
+
+  # Example tool response
+  - role: "tool"
+    tool_call_id: "call_weather_123456789"
+    content: '{"temperature": 18, "condition": "sunny"}'
+
+  # Example assistant final response
+  - role: "assistant"
+    content: "The weather in San Francisco is sunny with a temperature of 18°C."
+```
+
 ### 📝 Develop and Test Jinja Prompt Templates with [Jinja Sandbox](http://jinja.quantprogramming.com/)
 
 [Jinja Sandbox](http://jinja.quantprogramming.com/) is a great online tool for **testing Jinja prompt templates** before integrating them into your application. It allows you to quickly render templates with custom input data and debug formatting issues.
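On systems without `jq`, the template-embedding one-liner from the hunk above can be reproduced with the Python standard library. A minimal sketch, reusing the README's file names (the function name is hypothetical):

```python
# Stdlib equivalent of the jq one-liner: read the Jinja chat template and
# store it under the "chat_template" key of tokenizer_config.json.
import json
from pathlib import Path

def embed_chat_template(template_path: str, config_path: str) -> None:
    template = Path(template_path).read_text(encoding="utf-8")
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    config["chat_template"] = template
    # Rewriting in place replaces jq's "> temp.json && mv" shuffle.
    Path(config_path).write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )

# embed_chat_template("chat_template_with_tools.jinja", "tokenizer_config.json")
```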