hzhwcmhf committed · verified
Commit 61e22c6 · 1 Parent(s): 8a39f5b

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -32,7 +32,7 @@ We introduce the updated version of the **Qwen3-235B-A22B non-thinking mode**, n
  - Number of Activated Experts: 8
  - Context Length: **262,144 natively**.

- **NOTE: this model only supports the non-thinking mode. In other words, there are no ``<think></think>`` blocks in the generated outputs. Meanwhile, `enable_thinking=False` is no longer needed to be specified.**
+ **NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Additionally, specifying `enable_thinking=False` is no longer required.**

  For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
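
For readers of the updated note, a minimal sketch of what it means in practice, assuming the standard `transformers` chat-template API (the prompt text is illustrative):

```python
# Minimal sketch: building a prompt for Qwen3-235B-A22B-Instruct-2507.
# Because this model is non-thinking only, the enable_thinking=False argument
# that hybrid Qwen3 models accepted in apply_chat_template is no longer needed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-235B-A22B-Instruct-2507")
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]

# No enable_thinking=False here; the output will contain no <think></think> blocks.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(text)
```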
 
@@ -135,7 +135,7 @@ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` or to create
  vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 --tensor-parallel-size 8 --max-model-len 262144
  ```

- Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as 32,768.
+ **Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**

  For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers have also supported Qwen3.
 
 
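As a usage note on the bolded OOM advice: the server-side fix is simply to rerun the `vllm serve` command above with `--max-model-len 32768`. A minimal sketch of the same reduced-context setup via vLLM's Python API, mirroring the CLI flags in the diff (sampling settings are illustrative, not recommended values):

```python
# Minimal sketch: the OOM mitigation from the note, applied through vLLM's
# Python API instead of `vllm serve`. Flags mirror the command in the diff;
# max_model_len is lowered from the native 262,144 to 32,768.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    tensor_parallel_size=8,
    max_model_len=32768,  # reduced context length to avoid OOM
)

# Illustrative sampling settings only.
outputs = llm.chat(
    [{"role": "user", "content": "Hello!"}],
    SamplingParams(max_tokens=256),
)
print(outputs[0].outputs[0].text)
```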