Update README.md
README.md
CHANGED
@@ -32,7 +32,7 @@ We introduce the updated version of the **Qwen3-235B-A22B non-thinking mode**, n
 - Number of Activated Experts: 8
 - Context Length: **262,144 natively**.
 
-**NOTE:
+**NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.**
 
 For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
 
@@ -135,7 +135,7 @@ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` or to create
 vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 --tensor-parallel-size 8 --max-model-len 262144
 ```
 
-Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as 32,768
+**Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
 
 For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers have also supported Qwen3.
 
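The first changed line concerns the thinking flag. As a quick check of the new behavior, here is a minimal sketch that queries the OpenAI-compatible endpoint exposed by the `vllm serve` command in the second hunk; a running server on vLLM's default port `8000` is assumed, and no `chat_template_kwargs` override such as `enable_thinking` is passed:

```shell
# Minimal sketch: query the OpenAI-compatible endpoint started by the
# `vllm serve` command above (vLLM's default port 8000 assumed).
# Per the updated NOTE, no enable_thinking / chat_template_kwargs override
# is needed: the model answers directly, with no <think></think> block.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-235B-A22B-Instruct-2507",
    "messages": [{"role": "user", "content": "Give me a short introduction to large language models."}]
  }'
```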
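The second changed line is the OOM note. Under the same assumptions, the fix amounts to lowering `--max-model-len` on the deployment command from the diff:

```shell
# Same deployment command as in the diff, with the context window reduced to
# the suggested 32,768 tokens to cut KV-cache memory pressure.
vllm serve Qwen/Qwen3-235B-A22B-Instruct-2507 --tensor-parallel-size 8 --max-model-len 32768
```

For the `sglang>=0.4.6.post1` path mentioned in the hunk header, the corresponding knob should be its `--context-length` argument.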