Commit c0e7ac6
1 Parent(s): 2921859

Adding Evaluation Results (#3)

- Adding Evaluation Results (06df4cb6333dfe20cda841370114db9e80c8c6d4)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +154 -48
README.md CHANGED
@@ -1,58 +1,151 @@
  ---
- base_model: mwitiderrick/open_llama_3b_code_instruct_0.1
  datasets:
  - mwitiderrick/AlpacaCode
  inference: true
  model_type: llama
- prompt_template: |
- <s>[INST]
- {prompt}
- [/INST]
  created_by: mwitiderrick
- tags:
- - transformers
- license: apache-2.0
- language:
- - en
- library_name: transformers
  pipeline_tag: text-generation
-
  model-index:
- - name: mwitiderrick/open_llama_3b_instruct_v_0.2
- results:
- - task:
- type: text-generation
- dataset:
- name: hellaswag
- type: hellaswag
- metrics:
- - name: hellaswag(0-Shot)
- type: hellaswag (0-Shot)
- value: 0.6600
- - task:
- type: text-generation
- dataset:
- name: winogrande
- type: winogrande
- metrics:
- - name: winogrande(0-Shot)
- type: winogrande (0-Shot)
- value: 0.6322
-
- - task:
- type: text-generation
- dataset:
- name: arc_challenge
- type: arc_challenge
- metrics:
- - name: arc_challenge(0-Shot)
- type: arc_challenge (0-Shot)
- value: 0.3447
- source:
- name: open_llama_3b_instruct_v_0.2 model card
- url: https://huggingface.co/mwitiderrick/open_llama_3b_instruct_v_0.2
-
-
  ---
  # OpenLLaMA Glaive: An Open Reproduction of LLaMA

@@ -122,4 +215,17 @@ def quick_sort(arr):
  |-------------|-------|------|-----:|--------|-----:|---|-----:|
  |arc_challenge|Yaml |none | 0|acc |0.3234|± |0.0137|
  | | |none | 0|acc_norm|0.3447|± |0.0139|
- ```

  ---
+ language:
+ - en
+ license: apache-2.0
+ library_name: transformers
+ tags:
+ - transformers
  datasets:
  - mwitiderrick/AlpacaCode
+ base_model: mwitiderrick/open_llama_3b_code_instruct_0.1
  inference: true
  model_type: llama
+ prompt_template: "<s>[INST] \n{prompt}\n[/INST]\n"
  created_by: mwitiderrick
  pipeline_tag: text-generation
  model-index:
+ - name: mwitiderrick/open_llama_3b_instruct_v_0.2
+ results:
+ - task:
+ type: text-generation
+ dataset:
+ name: hellaswag
+ type: hellaswag
+ metrics:
+ - type: hellaswag (0-Shot)
+ value: 0.66
+ name: hellaswag(0-Shot)
+ - task:
+ type: text-generation
+ dataset:
+ name: winogrande
+ type: winogrande
+ metrics:
+ - type: winogrande (0-Shot)
+ value: 0.6322
+ name: winogrande(0-Shot)
+ - task:
+ type: text-generation
+ dataset:
+ name: arc_challenge
+ type: arc_challenge
+ metrics:
+ - type: arc_challenge (0-Shot)
+ value: 0.3447
+ name: arc_challenge(0-Shot)
+ source:
+ url: https://huggingface.co/mwitiderrick/open_llama_3b_instruct_v_0.2
+ name: open_llama_3b_instruct_v_0.2 model card
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: AI2 Reasoning Challenge (25-Shot)
+ type: ai2_arc
+ config: ARC-Challenge
+ split: test
+ args:
+ num_few_shot: 25
+ metrics:
+ - type: acc_norm
+ value: 40.7
+ name: normalized accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: HellaSwag (10-Shot)
+ type: hellaswag
+ split: validation
+ args:
+ num_few_shot: 10
+ metrics:
+ - type: acc_norm
+ value: 67.45
+ name: normalized accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: MMLU (5-Shot)
+ type: cais/mmlu
+ config: all
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 27.74
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: TruthfulQA (0-shot)
+ type: truthful_qa
+ config: multiple_choice
+ split: validation
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: mc2
+ value: 35.86
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: Winogrande (5-shot)
+ type: winogrande
+ config: winogrande_xl
+ split: validation
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 64.72
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: GSM8k (5-shot)
+ type: gsm8k
+ config: main
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 1.97
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/open_llama_3b_glaive_assistant_v0.1
+ name: Open LLM Leaderboard
  ---
  # OpenLLaMA Glaive: An Open Reproduction of LLaMA

  |-------------|-------|------|-----:|--------|-----:|---|-----:|
  |arc_challenge|Yaml |none | 0|acc |0.3234|± |0.0137|
  | | |none | 0|acc_norm|0.3447|± |0.0139|
+ ```
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mwitiderrick__open_llama_3b_glaive_assistant_v0.1)
+
+ | Metric |Value|
+ |---------------------------------|----:|
+ |Avg. |39.74|
+ |AI2 Reasoning Challenge (25-Shot)|40.70|
+ |HellaSwag (10-Shot) |67.45|
+ |MMLU (5-Shot) |27.74|
+ |TruthfulQA (0-shot) |35.86|
+ |Winogrande (5-shot) |64.72|
+ |GSM8k (5-shot) | 1.97|
+
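The `Avg.` row added by this commit is consistent with an unweighted mean of the six benchmark scores. The minimal Python sketch below checks that arithmetic. It assumes equal weighting and rounding to two decimals, neither of which the commit states explicitly, and the `SCORES` dictionary and `leaderboard_average` helper are hypothetical names introduced only for illustration.

```python
# Per-benchmark Open LLM Leaderboard scores, copied from the README table in this commit.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 40.70,
    "HellaSwag (10-Shot)": 67.45,
    "MMLU (5-Shot)": 27.74,
    "TruthfulQA (0-shot)": 35.86,
    "Winogrande (5-shot)": 64.72,
    "GSM8k (5-shot)": 1.97,
}


def leaderboard_average(scores: dict[str, float]) -> float:
    """Hypothetical helper: unweighted mean of the scores, rounded to 2 decimals.

    Equal weighting is an assumption; it reproduces the reported Avg. but is
    not documented in the commit itself.
    """
    if not scores:
        raise ValueError("need at least one benchmark score")
    return round(sum(scores.values()) / len(scores), 2)


if __name__ == "__main__":
    print(f"Avg. {leaderboard_average(SCORES):.2f}")
```

Running it prints `Avg. 39.74`, which matches the `Avg.` row: the six scores sum to 238.44, and 238.44 / 6 = 39.74.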