Adding Evaluation Results

#2
Files changed (1)
  1. README.md +21 -13
README.md CHANGED
@@ -29,8 +29,7 @@ model-index:
29
  value: 60.64
30
  name: averaged accuracy
31
  source:
32
- url: >-
33
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
34
  name: Open LLM Leaderboard
35
  - task:
36
  type: text-generation
@@ -46,8 +45,7 @@ model-index:
46
  value: 46.53
47
  name: normalized accuracy
48
  source:
49
- url: >-
50
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
51
  name: Open LLM Leaderboard
52
  - task:
53
  type: text-generation
@@ -63,8 +61,7 @@ model-index:
63
  value: 37.08
64
  name: exact match
65
  source:
66
- url: >-
67
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
68
  name: Open LLM Leaderboard
69
  - task:
70
  type: text-generation
@@ -80,8 +77,7 @@ model-index:
80
  value: 16.44
81
  name: acc_norm
82
  source:
83
- url: >-
84
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
85
  name: Open LLM Leaderboard
86
  - task:
87
  type: text-generation
@@ -96,8 +92,7 @@ model-index:
96
  value: 20.95
97
  name: acc_norm
98
  source:
99
- url: >-
100
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
101
  name: Open LLM Leaderboard
102
  - task:
103
  type: text-generation
@@ -114,8 +109,7 @@ model-index:
114
  value: 47.85
115
  name: accuracy
116
  source:
117
- url: >-
118
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
119
  name: Open LLM Leaderboard
120
  ---
121
  ![opus.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/BELYApcX2oNMRsOW6nIyR.gif)
@@ -226,4 +220,18 @@ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-
226
  |MATH Lvl 5 (4-Shot)| 37.08|
227
  |GPQA (0-shot) | 16.44|
228
  |MuSR (0-shot) | 20.95|
229
- |MMLU-PRO (5-shot) | 47.85|
 
29
  value: 60.64
30
  name: averaged accuracy
31
  source:
32
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
 
33
  name: Open LLM Leaderboard
34
  - task:
35
  type: text-generation
 
45
  value: 46.53
46
  name: normalized accuracy
47
  source:
48
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
 
49
  name: Open LLM Leaderboard
50
  - task:
51
  type: text-generation
 
61
  value: 37.08
62
  name: exact match
63
  source:
64
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
 
65
  name: Open LLM Leaderboard
66
  - task:
67
  type: text-generation
 
77
  value: 16.44
78
  name: acc_norm
79
  source:
80
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
 
81
  name: Open LLM Leaderboard
82
  - task:
83
  type: text-generation
 
92
  value: 20.95
93
  name: acc_norm
94
  source:
95
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
 
96
  name: Open LLM Leaderboard
97
  - task:
98
  type: text-generation
 
109
  value: 47.85
110
  name: accuracy
111
  source:
112
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=prithivMLmods%2FCalcium-Opus-14B-Elite
 
113
  name: Open LLM Leaderboard
114
  ---
115
  ![opus.gif](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/BELYApcX2oNMRsOW6nIyR.gif)
 
220
  |MATH Lvl 5 (4-Shot)| 37.08|
221
  |GPQA (0-shot) | 16.44|
222
  |MuSR (0-shot) | 20.95|
223
+ |MMLU-PRO (5-shot) | 47.85|
224
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
225
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/prithivMLmods__Calcium-Opus-14B-Elite-details)!
226
+ Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=prithivMLmods%2FCalcium-Opus-14B-Elite&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
227
+
228
+ | Metric |Value (%)|
229
+ |-------------------|--------:|
230
+ |**Average** | 40.08|
231
+ |IFEval (0-Shot) | 60.52|
232
+ |BBH (3-Shot) | 46.93|
233
+ |MATH Lvl 5 (4-Shot)| 47.89|
234
+ |GPQA (0-shot) | 16.55|
235
+ |MuSR (0-shot) | 20.78|
236
+ |MMLU-PRO (5-shot) | 47.80|
237
+
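A quick sanity check on the added table: the **Average** row appears consistent with a plain unweighted mean of the six benchmark scores. The sketch below assumes that is how the average is derived, which the diff itself does not state. The names `scores` and `mean_score` are hypothetical and exist only for this illustration.

```python
# Scores copied from the table added in this diff (values in percent).
scores = {
    "IFEval (0-Shot)": 60.52,
    "BBH (3-Shot)": 46.93,
    "MATH Lvl 5 (4-Shot)": 47.89,
    "GPQA (0-shot)": 16.55,
    "MuSR (0-shot)": 20.78,
    "MMLU-PRO (5-shot)": 47.80,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: every benchmark carries equal weight in the average.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Average: {average:.2f}")  # prints "Average: 40.08"
```

The result matches the 40.08 reported in the table, so the row is at least arithmetically consistent with the six values beneath it.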