Update README.md
README.md
CHANGED
@@ -234,6 +234,26 @@ We evaluate our model on all benchmarks of the leaderboard's version 2 using the
 | `falcon2-11B`       | 32.61 | 21.94 | 2.34 | 2.8  | 7.53  | 15.44 | 13.78 |
 | `Mistral-7B-v0.1`   | 23.86 | 22.02 | 2.49 | 5.59 | 10.68 | 22.36 | 14.50 |
 
+
+| `model name`                     |`ARC`|`HellaSwag`|`MMLU`|`Winogrande`|`TruthfulQA`|`GSM8K`|`Average`|
+|:---------------------------------|:---:|:---------:|:----:|:----------:|:----------:|:-----:|:-------:|
+| ***Pure SSM models***            |     |           |      |            |            |       |         |
+| `Falcon-Mamba-7B`                |62.03|   80.82   | 62.11|   73.64    |   53.42    | 52.54 |  64.09  |
+| `mamba1`                         |00.00|   00.00   | 00.00|   00.00    |   00.00    | 00.00 |  00.00  |
+| `mamba2`                         |00.00|   00.00   | 00.00|   00.00    |   00.00    | 00.00 |  00.00  |
+| `mamba3`                         |00.00|   00.00   | 00.00|   00.00    |   00.00    | 00.00 |  00.00  |
+| ***Hybrid SSM-attention models***|     |           |      |            |            |       |         |
+| `hybrid1`                        |00.00|   00.00   | 00.00|   00.00    |   00.00    | 00.00 |  00.00  |
+| `hybrid2`                        |00.00|   00.00   | 00.00|   00.00    |   00.00    | 00.00 |  00.00  |
+| `hybrid3`                        |00.00|   00.00   | 00.00|   00.00    |   00.00    | 00.00 |  00.00  |
+| ***Transformer models***         |     |           |      |            |            |       |         |
+| `Meta-Llama-3-8B`                |00.00|   00.00   | 00.00|   00.00    |   00.00    | 00.00 |  00.00  |
+| `gemma-7B`                       |00.00|   00.00   | 00.00|   00.00    |   00.00    | 00.00 |  00.00  |
+| `falcon2-11B`                    |00.00|   00.00   | 00.00|   00.00    |   00.00    | 00.00 |  00.00  |
+| `Mistral-7B-v0.1`                |00.00|   00.00   | 00.00|   00.00    |   00.00    | 00.00 |  00.00  |
+
 
 ## Throughput
 
 This model achieves throughput and performance comparable to other transformer-based models that use optimized kernels such as Flash Attention 2. Make sure to install the optimized Mamba kernels with the following commands:
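The install commands themselves are truncated in this view. A typical setup, assuming the optimized kernels ship in the `mamba-ssm` and `causal-conv1d` packages (package names and version pins are assumptions, not taken from this README), would look like:

```shell
# Assumed packages providing the optimized Mamba kernels; the exact
# commands and version pins in the original README may differ.
pip install "causal-conv1d>=1.4.0"
pip install mamba-ssm
```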