Lin-K76 committed
Commit a69a373 · verified · 1 Parent(s): 313cd77

Update README.md
Files changed (1): README.md (+9 −9)

README.md CHANGED
@@ -25,7 +25,7 @@ language:
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct).
-It achieves an average score of 82.78 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 82.74.
+It achieves an average score of 83.41 on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 83.61.
 
 ### Model Optimizations
 
@@ -130,8 +130,8 @@ oneshot(
 
 ## Evaluation
 
-The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command:
-A modified version of ARC-C was used for evaluations, in line with Llama 3.1's prompting.
+The model was evaluated on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) leaderboard tasks (version 1) with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and the [vLLM](https://docs.vllm.ai/en/stable/) engine, using the following command.
+A modified version of ARC-C and GSM8k-cot was used for evaluations, in line with Llama 3.1's prompting. It can be accessed on the [Neural Magic fork of the lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct).
 ```
 lm_eval \
 --model vllm \
@@ -177,11 +177,11 @@ lm_eval \
 <tr>
 <td>GSM-8K (5-shot, strict-match)
 </td>
-<td>87.95
+<td>93.18
 </td>
-<td>88.40
+<td>92.19
 </td>
-<td>100.5%
+<td>98.94%
 </td>
 </tr>
 <tr>
@@ -217,11 +217,11 @@ lm_eval \
 <tr>
 <td><strong>Average</strong>
 </td>
-<td><strong>82.74</strong>
+<td><strong>83.61</strong>
 </td>
-<td><strong>82.78</strong>
+<td><strong>83.41</strong>
 </td>
-<td><strong>100.0%</strong>
+<td><strong>99.76%</strong>
 </td>
 </tr>
 </table>
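The third column in the updated table is a recovery percentage, i.e. the quantized score divided by the unquantized baseline. A minimal sketch of that arithmetic, using the GSM-8K and Average values from this commit (the `recovery` helper is hypothetical, not part of the evaluation harness):

```python
def recovery(baseline: float, quantized: float) -> float:
    """Recovery = quantized score as a percentage of the unquantized baseline."""
    return round(100 * quantized / baseline, 2)

# Values from the updated table in this commit.
print(recovery(93.18, 92.19))  # GSM-8K row -> 98.94
print(recovery(83.61, 83.41))  # Average row -> 99.76
```

This reproduces the 98.94% and 99.76% figures shown in the diff.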