Update README.md
README.md
CHANGED
@@ -32,7 +32,7 @@ We introduce the updated version of the **Qwen3-235B-A22B non-thinking mode**, n
 - Number of Activated Experts: 8
 - Context Length: **262,144 natively**.
 
-**NOTE:
+**NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.**
 
 For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
 
@@ -135,7 +135,7 @@ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` or to create
 vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 --tensor-parallel-size 8 --max-model-len 262144
 ```
 
-Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as 32,768
+**Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
 
 For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers have also supported Qwen3.
 
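The first changed line concerns the thinking flag. As a quick check of the new behavior, here is a minimal sketch that queries the OpenAI-compatible endpoint exposed by the `vllm serve` command in the second hunk; a running server on vLLM's default port `8000` is assumed, and no `chat_template_kwargs` override such as `enable_thinking` is passed:

```shell
# Minimal sketch: query the OpenAI-compatible endpoint started by the
# `vllm serve` command above (vLLM's default port 8000 assumed).
# Per the updated NOTE, no enable_thinking / chat_template_kwargs override
# is needed: the model answers directly, with no <think></think> block.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-235B-A22B-Instruct-2507",
    "messages": [{"role": "user", "content": "Give me a short introduction to large language models."}]
  }'
```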
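The second changed line is the OOM note. Under the same assumptions, the fix amounts to lowering `--max-model-len` on the deployment command from the diff:

```shell
# Same deployment command as in the diff, with the context window reduced to
# the suggested 32,768 tokens to cut KV-cache memory pressure.
vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 --tensor-parallel-size 8 --max-model-len 32768
```

For the `sglang>=0.4.6.post1` path mentioned in the hunk header, the corresponding knob should be its `--context-length` argument.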