leaderboard-pr-bot committed
Commit 26107a8
1 Parent(s): 3e61859

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
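
For context, a metadata PR like this one can be opened programmatically with the `huggingface_hub` client. The snippet below is a minimal sketch, not this bot's actual implementation; it assumes a write-capable token is available via the `HF_TOKEN` environment variable and simply appends a placeholder section to the card body before opening the pull request.

```python
# Minimal sketch of opening a model-card PR with huggingface_hub.
# Not the leaderboard bot's actual code; assumes HF_TOKEN is set.
from huggingface_hub import ModelCard

REPO_ID = "Josephgflowers/160M-TinyLLama-Mini-Cinder"

# Load the existing card so its current metadata and text are preserved.
card = ModelCard.load(REPO_ID)

# Example edit: append a results heading to the card body. A real bot would
# also merge the new `model-index` entries into card.data before pushing.
card.text += (
    "\n# [Open LLM Leaderboard Evaluation Results]"
    "(https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)\n"
)

# Push the change as a pull request so the repository owner can review it.
card.push_to_hub(
    REPO_ID,
    commit_message="Adding Evaluation Results",
    create_pr=True,
)
```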

Files changed (1)
  1. README.md +122 -5
README.md CHANGED
@@ -1,11 +1,114 @@
 ---
 license: mit
-
 widget:
-- text: |
-    <|user|>
-    Can you tell me a space adventure story?</s>
-    <|assistant|>
+- text: '<|user|>
+
+  Can you tell me a space adventure story?</s>
+
+  <|assistant|>'
+model-index:
+- name: 160M-TinyLLama-Mini-Cinder
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 24.66
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/160M-TinyLLama-Mini-Cinder
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 28.16
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/160M-TinyLLama-Mini-Cinder
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 25.09
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/160M-TinyLLama-Mini-Cinder
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 44.08
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/160M-TinyLLama-Mini-Cinder
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 52.57
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/160M-TinyLLama-Mini-Cinder
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0.0
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/160M-TinyLLama-Mini-Cinder
+      name: Open LLM Leaderboard
 ---
 Model trained on Tiny Stories. Followed up with conversations datasets, followed up with trimmed Cinder Dataset.
 Mini Cinder is ok at conversation and story telling for kids stories.
@@ -23,3 +126,17 @@ User: Ship member.
 
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6328952f798f8d122ce62a44/1fBgUSy8Aob7glhecfY3u.png)
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Josephgflowers__160M-TinyLLama-Mini-Cinder)
+
+| Metric                          |Value|
+|---------------------------------|----:|
+|Avg.                             |29.09|
+|AI2 Reasoning Challenge (25-Shot)|24.66|
+|HellaSwag (10-Shot)              |28.16|
+|MMLU (5-Shot)                    |25.09|
+|TruthfulQA (0-shot)              |44.08|
+|Winogrande (5-shot)              |52.57|
+|GSM8k (5-shot)                   | 0.00|
+
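
The "Avg." row in the table above is the arithmetic mean of the six benchmark scores. Assuming this PR has been merged so that the `model-index` metadata is present in the card, the short sketch below (illustrative only, not part of this PR) reads it back and recomputes that average.

```python
# Read the model-index metadata back from the model card and recompute the
# leaderboard average. Illustrative sketch; not part of this PR.
from huggingface_hub import ModelCard

card = ModelCard.load("Josephgflowers/160M-TinyLLama-Mini-Cinder")
results = card.data.to_dict()["model-index"][0]["results"]

# Each task above reports exactly one metric, so take the first entry.
scores = [task["metrics"][0]["value"] for task in results]

print(scores)                               # [24.66, 28.16, 25.09, 44.08, 52.57, 0.0]
print(round(sum(scores) / len(scores), 2))  # 29.09 -- matches the "Avg." row
```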