shimmyshimmer committed on
Commit 81c6610 · verified · 1 Parent(s): a52547f

Update README.md

Files changed (1): README.md (+208 −3)
---
tags:
- unsloth
- qwen3
- qwen
base_model:
- Qwen/Qwen3-Coder-480B-A35B-Instruct
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct/blob/main/LICENSE
pipeline_tag: text-generation
---
<div>
<p style="margin-bottom: 0; margin-top: 0;">
    <strong>See <a href="https://huggingface.co/collections/unsloth/qwen3-680edabfb790c8c34a242f95">our collection</a> for all versions of Qwen3 including GGUF, 4-bit & 16-bit formats.</strong>
</p>
<p style="margin-bottom: 0;">
    <em>Learn to run Qwen3-Coder correctly - <a href="https://docs.unsloth.ai/basics/qwen3-coder">Read our Guide</a>.</em>
</p>
<p style="margin-top: 0;margin-bottom: 0;">
    <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
</p>
<div style="display: flex; gap: 5px; align-items: center; ">
    <a href="https://github.com/unslothai/unsloth/">
        <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
    </a>
    <a href="https://discord.gg/unsloth">
        <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
    </a>
    <a href="https://docs.unsloth.ai/basics/qwen3-coder">
        <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
    </a>
</div>
<h1 style="margin-top: 0rem;">✨ Read our Qwen3-Coder Guide <a href="https://docs.unsloth.ai/basics/qwen3-coder">here</a>!</h1>
</div>

- Fine-tune Qwen3 (14B) for free using our Google [Colab notebook](https://docs.unsloth.ai/get-started/unsloth-notebooks)!
- Read our blog about Qwen3 support: [unsloth.ai/blog/qwen3](https://unsloth.ai/blog/qwen3)
- View the rest of our notebooks in our [docs here](https://docs.unsloth.ai/get-started/unsloth-notebooks).
- Run & export your fine-tuned model to Ollama, llama.cpp, or HF.

| Unsloth supports | Free Notebooks | Performance | Memory use |
|------------------|----------------|-------------|------------|
| **Qwen3 (14B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 3x faster | 70% less |
| **GRPO with Qwen3 (8B)** | [▶️ Start on Colab](https://docs.unsloth.ai/get-started/unsloth-notebooks) | 3x faster | 80% less |
| **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
| **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
| **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |

# Qwen3-Coder-480B-A35B-Instruct
<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
    <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
</a>

## Highlights

Today, we're announcing **Qwen3-Coder**, our most agentic code model to date. **Qwen3-Coder** is available in multiple sizes, but we're excited to introduce its most powerful variant first: **Qwen3-Coder-480B-A35B-Instruct**, featuring the following key enhancements:

- **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks, achieving results comparable to Claude Sonnet.
- **Long-context Capabilities** with native support for **256K** tokens, extendable up to **1M** tokens with YaRN, optimized for repository-scale understanding.
- **Agentic Coding** support for most platforms such as **Qwen Code** and **CLINE**, featuring a specially designed function call format.

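The 1M-token extension with YaRN is typically enabled through a `rope_scaling` entry in the model's `config.json`. A minimal sketch, assuming a scaling factor of 4.0 over the native 262,144-token window (consult the Qwen documentation for the exact values before using this):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```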
![image/jpeg](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Coder/qwen3-coder-main.jpg)

## Model Overview

**Qwen3-Coder-480B-A35B-Instruct** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 480B in total and 35B activated
- Number of Layers: 62
- Number of Attention Heads (GQA): 96 for Q and 8 for KV
- Number of Experts: 160
- Number of Activated Experts: 8
- Context Length: **262,144 natively**

**NOTE: This model supports only non-thinking mode and does not generate `<think></think>` blocks in its output. Specifying `enable_thinking=False` is no longer required.**

For more details, including benchmark evaluations, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3-coder/), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [Documentation](https://qwen.readthedocs.io/en/latest/).

## Quickstart

We advise you to use the latest version of `transformers`.

With `transformers<4.51.0`, you will encounter the following error:
```
KeyError: 'qwen3_moe'
```
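To clear this error, upgrading `transformers` should suffice (4.51.0 is taken here as the minimum, per the note above):

```shell
pip install --upgrade "transformers>=4.51.0"
```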

The following code snippet illustrates how to use the model to generate content from given inputs.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-Coder-480B-A35B-Instruct"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Write a quick sort algorithm."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=65536
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
```

**Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**

For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.

## Agentic Coding

Qwen3-Coder excels in tool-calling capabilities.

You can simply define or use any tools, as in the following example.
```python
from openai import OpenAI

# Your tool implementation
def square_the_number(num: float) -> float:
    return num ** 2

# Define Tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "square_the_number",
            "description": "output the square of the number.",
            "parameters": {
                "type": "object",
                "required": ["input_num"],
                "properties": {
                    "input_num": {
                        "type": "number",
                        "description": "input_num is a number that will be squared"
                    }
                },
            }
        }
    }
]

# Define LLM
client = OpenAI(
    # Use a custom endpoint compatible with OpenAI API
    base_url="http://localhost:8000/v1",  # api_base
    api_key="EMPTY"
)

messages = [{"role": "user", "content": "square the number 1024"}]

completion = client.chat.completions.create(
    messages=messages,
    model="Qwen3-Coder-480B-A35B-Instruct",
    max_tokens=65536,
    tools=tools,
)

print(completion.choices[0])
```
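When the model responds with a tool call, your client still has to execute the tool and feed the result back as a `"role": "tool"` message. A minimal dispatch sketch, assuming the OpenAI chat-completions response shape and the `square_the_number` tool defined above (the registry and helper names are illustrative, not part of any API):

```python
import json

# Tool implementations, keyed by the name advertised in `tools`
def square_the_number(num: float) -> float:
    return num ** 2

TOOL_REGISTRY = {"square_the_number": square_the_number}

def dispatch_tool_call(name: str, arguments: str) -> str:
    """Run the named tool with its JSON-encoded arguments and return a JSON result."""
    args = json.loads(arguments)
    result = TOOL_REGISTRY[name](args["input_num"])
    return json.dumps({"result": result})

# Example: the model asked for square_the_number with input_num=1024
print(dispatch_tool_call("square_the_number", '{"input_num": 1024}'))
# → {"result": 1048576}
```

In a full loop, the returned JSON string would be appended to `messages` as a tool message and the conversation re-submitted so the model can produce its final answer.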

## Best Practices

To achieve optimal performance, we recommend the following settings:

1. **Sampling Parameters**:
   - We suggest using `temperature=0.7`, `top_p=0.8`, `top_k=20`, `repetition_penalty=1.05`.

2. **Adequate Output Length**: We recommend using an output length of 65,536 tokens for most queries, which is adequate for instruct models.
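The recommendations above can be collected into one dictionary and passed straight through to `model.generate` as keyword arguments; a sketch (note `do_sample=True` is an added assumption, since temperature/top-p only apply when sampling is enabled):

```python
# Recommended sampling settings from the Best Practices above
SAMPLING_PARAMS = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "repetition_penalty": 1.05,
    "do_sample": True,        # sampling must be on for temperature/top_p to take effect
    "max_new_tokens": 65536,  # adequate output length for most queries
}

# Usage (model and model_inputs as in the Quickstart):
# generated_ids = model.generate(**model_inputs, **SAMPLING_PARAMS)
```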

### Citation

If you find our work helpful, feel free to cite us.

```bibtex
@misc{qwen3technicalreport,
      title={Qwen3 Technical Report},
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388},
}
```