leaderboard-pr-bot committed
Commit 268c899
1 Parent(s): bbc16db

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
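For context, the evaluation results land in the `model-index` block of the card's YAML front matter, so they can be read back programmatically once this PR is merged. Below is a minimal sketch, assuming the `huggingface_hub` library (it is not part of this PR); only the repo id `vicgalle/ConfigurableBeagle-11B` is taken from the diff that follows:

```python
# Minimal sketch: read the card's model-index evaluation results back.
# Assumes `huggingface_hub` is installed; this code is not part of the automated PR.
from huggingface_hub import ModelCard

card = ModelCard.load("vicgalle/ConfigurableBeagle-11B")

# `eval_results` is parsed from the `model-index` section this PR extends.
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_name or result.metric_type} = {result.metric_value}")
```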

Files changed (1)
  1. README.md +112 -13
README.md CHANGED
@@ -21,8 +21,7 @@ model-index:
       value: 72.53
       name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -38,8 +37,7 @@ model-index:
       value: 88.85
       name: normalized accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -56,8 +54,7 @@ model-index:
       value: 66.71
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -73,8 +70,7 @@ model-index:
     - type: mc2
       value: 77.13
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -91,8 +87,7 @@ model-index:
       value: 83.27
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
       name: Open LLM Leaderboard
   - task:
       type: text-generation
@@ -109,8 +104,99 @@ model-index:
       value: 63.91
       name: accuracy
     source:
-      url: >-
-        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 58.34
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 32.39
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 3.7
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.94
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.38
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 26.38
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=vicgalle/ConfigurableBeagle-11B
       name: Open LLM Leaderboard
 ---
 
@@ -155,4 +241,17 @@ If you find this work, data and/or models useful for your research, please consi
   archivePrefix={arXiv},
   primaryClass={cs.CL}
 }
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_vicgalle__ConfigurableBeagle-11B)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |22.52|
+|IFEval (0-Shot)    |58.34|
+|BBH (3-Shot)       |32.39|
+|MATH Lvl 5 (4-Shot)| 3.70|
+|GPQA (0-shot)      | 6.94|
+|MuSR (0-shot)      | 7.38|
+|MMLU-PRO (5-shot)  |26.38|
+