Commit e574bf6 by alexmarques (verified) · 1 parent: cb153b0

Update README.md

Files changed (1)
  1. README.md +23 -31
README.md CHANGED
@@ -33,7 +33,7 @@ base_model: meta-llama/Meta-Llama-3.1-405B-Instruct
  - **Model Developers:** Neural Magic

  Quantized version of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct).
- It achieves scores within 1% of the scores of the unquantized model for MMLU, ARC-Challenge, GSM-8k, Hellaswag, Winogrande and TruthfulQA.
+ It achieves scores within 1.3% of the scores of the unquantized model for MMLU, ARC-Challenge, GSM-8k, Hellaswag, Winogrande and TruthfulQA.

  ### Model Optimizations

@@ -149,6 +149,8 @@ The model was evaluated on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande an
  Evaluation was conducted using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
  This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-405B-Instruct-evals).

+ **Note:** Results have been updated after Meta modified the chat template.
+
  ### Accuracy

  #### Open LLM Leaderboard evaluation scores
@@ -158,7 +160,7 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td><strong>Meta-Llama-3.1-405B-Instruct </strong>
  </td>
- <td><strong>Meta-Llama-3.1-405B-Instruct-quantized.w8a8 (this model)</strong>
+ <td><strong>Meta-Llama-3.1-405B-Instruct-quantized.w4a16 (this model)</strong>
  </td>
  <td><strong>Recovery</strong>
  </td>
@@ -166,31 +168,21 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  <tr>
  <td>MMLU (5-shot)
  </td>
- <td>87.41
+ <td>87.38
  </td>
  <td>86.76
  </td>
  <td>99.3%
  </td>
- </tr>
- <tr>
- <td>MMLU (CoT, 0-shot)
- </td>
- <td>88.26
- </td>
- <td>87.42
- </td>
- <td>99.0%
- </td>
  </tr>
  <tr>
  <td>ARC Challenge (0-shot)
  </td>
  <td>94.97
  </td>
- <td>94.62
+ <td>94.37
  </td>
- <td>99.6%
+ <td>99.4%
  </td>
  </tr>
  <tr>
@@ -198,19 +190,19 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td>96.44
  </td>
- <td>96.13
+ <td>95.45
  </td>
- <td>99.7%
+ <td>99.0%
  </td>
  </tr>
  <tr>
  <td>Hellaswag (10-shot)
- </td>
+ </td>
  <td>88.33
  </td>
- <td>88.08
+ <td>88.15
  </td>
- <td>99.7%
+ <td>99.8%
  </td>
  </tr>
  <tr>
@@ -218,19 +210,19 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td>87.21
  </td>
- <td>86.42
+ <td>86.11
  </td>
- <td>99.7%
+ <td>98.7%
  </td>
  </tr>
  <tr>
- <td>TruthfulQA (0-shot, mc2)
+ <td>TruthfulQA (0-shot)
  </td>
  <td>64.64
  </td>
- <td>64.44
+ <td>64.39
  </td>
- <td>99.7%
+ <td>99.6%
  </td>
  </tr>
  <tr>
@@ -238,9 +230,9 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td><strong>86.75</strong>
  </td>
- <td><strong>86.27</strong>
+ <td><strong>86.11</strong>
  </td>
- <td><strong>99.4%</strong>
+ <td><strong>99.3%</strong>
  </td>
  </tr>
  </table>
@@ -253,7 +245,7 @@ The results were obtained using the following commands:
  ```
  lm_eval \
  --model vllm \
- --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=3850,max_gen_toks=10,tensor_parallel_size=8 \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8",dtype=auto,max_model_len=3850,max_gen_toks=10,tensor_parallel_size=8 \
  --tasks mmlu_llama_3.1_instruct \
  --fewshot_as_multiturn \
  --apply_chat_template \
@@ -265,7 +257,7 @@ lm_eval \
  ```
  lm_eval \
  --model vllm \
- --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=4064,max_gen_toks=1024,tensor_parallel_size=8 \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8",dtype=auto,max_model_len=4064,max_gen_toks=1024,tensor_parallel_size=8 \
  --tasks mmlu_cot_0shot_llama_3.1_instruct \
  --apply_chat_template \
  --num_fewshot 0 \
@@ -276,7 +268,7 @@ lm_eval \
  ```
  lm_eval \
  --model vllm \
- --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=3940,max_gen_toks=100,tensor_parallel_size=8 \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8",dtype=auto,max_model_len=3940,max_gen_toks=100,tensor_parallel_size=8 \
  --tasks arc_challenge_llama_3.1_instruct \
  --apply_chat_template \
  --num_fewshot 0 \
@@ -287,7 +279,7 @@ lm_eval \
  ```
  lm_eval \
  --model vllm \
- --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8",dtype=auto,add_bos_token=True,max_model_len=4096,max_gen_toks=1024,tensor_parallel_size=8 \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a8",dtype=auto,max_model_len=4096,max_gen_toks=1024,tensor_parallel_size=8 \
  --tasks gsm8k_cot_llama_3.1_instruct \
  --fewshot_as_multiturn \
  --apply_chat_template \
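Each value in the Recovery column of the updated table equals the quantized score divided by the unquantized score, expressed as a percentage. The following is a minimal sketch, a plain POSIX shell and awk script that is not part of the model card or of lm-evaluation-harness, which recomputes that column from the scores on the new (`+`) side of this commit and reports the largest gap. That gap can be compared with the updated "within 1.3%" summary sentence. The function name `recovery_table` is hypothetical. The row labels GSM-8K and Winogrande are assumptions: their label lines fall outside the diff hunks, so they are inferred from the task order in the summary sentence.

```shell
#!/bin/sh
# Sketch: recompute the "Recovery" column of the updated accuracy table.
# Recovery = quantized score / unquantized score, as a percentage.
# Scores are copied from the new ("+") side of the commit. The labels
# GSM-8K and Winogrande are assumed (their label lines are not in the diff).
recovery_table() {
  awk '
    BEGIN { worst = 100 }
    {
      rec = 100 * $3 / $2   # column 3 (quantized) over column 2 (unquantized)
      printf "%-14s %6.2f %6.2f %6.1f%%\n", $1, $2, $3, rec
      if (rec < worst) worst = rec   # track the lowest recovery seen
    }
    END { printf "largest gap: %.1f%%\n", 100 - worst }
  ' <<'EOF'
MMLU           87.38 86.76
ARC-Challenge  94.97 94.37
GSM-8K         96.44 95.45
Hellaswag      88.33 88.15
Winogrande     87.21 86.11
TruthfulQA     64.64 64.39
EOF
}

recovery_table
```

Running it prints one line per task whose recovery values match the updated table, followed by `largest gap: 1.3%`. The worst case is the row with an unquantized score of 87.21 (98.7% recovery), which is what the new summary sentence reflects.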