vicgalle leaderboard-pr-bot committed on
Commit
a2e6e31
1 Parent(s): 3fe9bf5

Adding Evaluation Results (#3)


- Adding Evaluation Results (49e0538b1ee251cb2977c2ec717bd967d90944e3)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -109,6 +109,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 54.15
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 33.06
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 5.51
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.94
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 9.19
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 25.29
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/CarbonBeagle-11B
+      name: Open LLM Leaderboard
 ---
 # CarbonBeagle-11B
 
@@ -172,3 +264,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |84.06|
 |GSM8k (5-shot) |66.94|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_vicgalle__CarbonBeagle-11B)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |22.36|
+|IFEval (0-Shot)    |54.15|
+|BBH (3-Shot)       |33.06|
+|MATH Lvl 5 (4-Shot)| 5.51|
+|GPQA (0-shot)      | 6.94|
+|MuSR (0-shot)      | 9.19|
+|MMLU-PRO (5-shot)  |25.29|
+
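
The `Avg.` row in the added table appears to be the unweighted mean of the six benchmark scores listed beneath it. The following is a minimal Python sketch for checking that. The dictionary and the `leaderboard_average` helper are hypothetical names introduced here for illustration, and the unweighted mean is an assumption rather than a documented leaderboard formula.

```python
# Scores copied from the results table added in this commit.
scores = {
    "IFEval (0-Shot)": 54.15,
    "BBH (3-Shot)": 33.06,
    "MATH Lvl 5 (4-Shot)": 5.51,
    "GPQA (0-shot)": 6.94,
    "MuSR (0-shot)": 9.19,
    "MMLU-PRO (5-shot)": 25.29,
}


def leaderboard_average(values):
    """Unweighted arithmetic mean rounded to two decimals (assumed aggregation)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. = {average:.2f}")
```

Under that assumption, the result matches the 22.36 reported in the `Avg.` row.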