Commit
0a5b994
1 Parent(s): 3a2cb50

Adding Evaluation Results (#10)

- Adding Evaluation Results (d9b009c42a5019cd655073fe3f142c1faec9a9e9)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +167 -42
README.md CHANGED
@@ -1,52 +1,163 @@
1
  ---
2
  license: apache-2.0
3
  tags:
4
- - text-generation
5
  base_model: Locutusque/TinyMistral-248M
6
  datasets:
7
- - OpenAssistant/oasst_top1_2023-08-25
8
  widget:
9
- - messages:
10
- - role: user
11
- content: "Invited some friends to come home today. Give me some ideas for games to play with them!"
12
- - messages:
13
- - role: user
14
- content: "How do meteorologists predict how much air pollution will be produced in the next year?"
15
- - messages:
16
- - role: user
17
- content: "Who is Mona Lisa?"
18
- - messages:
19
- - role: user
20
- content: "Heya!"
21
- - role: assistant
22
- content: "Hi! How may I help you today?"
23
- - role: user
24
- content: "I need to build a simple website. Where should I start learning about web development?"
25
- - messages:
26
- - role: user
27
- content: "What are some potential applications for quantum computing?"
28
- - messages:
29
- - role: user
30
- content: "Got a question for you!"
31
- - role: assistant
32
- content: "Sure! What's it?"
33
- - role: user
34
- content: "Why do you love cats so much!? 🐈"
35
- - messages:
36
- - role: user
37
- content: "Tell me about the pros and cons of social media."
38
- - messages:
39
- - role: user
40
- content: "Question: What is a dog?"
41
- - role: assistant
42
- content: "A dog is a four-legged, domesticated animal that is a member of the class Mammalia, which includes all mammals. Dogs are known for their loyalty, playfulness, and ability to be trained for various tasks. They are also used for hunting, herding, and as service animals."
43
- - role: user
44
- content: "Question: What is the capital of France?"
45
- - role: assistant
46
- content: "The capital of France is Paris. Paris is located in the north-central region of France and is known for its famous landmarks, such as the Eiffel Tower, the Louvre Museum, and Notre-Dame Cathedral."
47
- - role: user
48
- content: "Question: What is the color of an apple?"
49
  inference: false
50
  ---
51
 
52
  # Locutusque's TinyMistral-248M trained on OpenAssistant TOP-1 Conversation Threads
@@ -77,3 +188,17 @@ To try out this model online, please visit this HuggingFace Space: [Felladrin/Mo
77
  penalty_alpha: 0.5
78
  top_k: 5
79
  ```
1
  ---
2
  license: apache-2.0
3
  tags:
4
+ - text-generation
5
  base_model: Locutusque/TinyMistral-248M
6
  datasets:
7
+ - OpenAssistant/oasst_top1_2023-08-25
8
  widget:
9
+ - messages:
10
+ - role: user
11
+ content: Invited some friends to come home today. Give me some ideas for games
12
+ to play with them!
13
+ - messages:
14
+ - role: user
15
+ content: How do meteorologists predict how much air pollution will be produced
16
+ in the next year?
17
+ - messages:
18
+ - role: user
19
+ content: Who is Mona Lisa?
20
+ - messages:
21
+ - role: user
22
+ content: Heya!
23
+ - role: assistant
24
+ content: Hi! How may I help you today?
25
+ - role: user
26
+ content: I need to build a simple website. Where should I start learning about
27
+ web development?
28
+ - messages:
29
+ - role: user
30
+ content: What are some potential applications for quantum computing?
31
+ - messages:
32
+ - role: user
33
+ content: Got a question for you!
34
+ - role: assistant
35
+ content: Sure! What's it?
36
+ - role: user
37
+ content: Why do you love cats so much!? 🐈
38
+ - messages:
39
+ - role: user
40
+ content: Tell me about the pros and cons of social media.
41
+ - messages:
42
+ - role: user
43
+ content: 'Question: What is a dog?'
44
+ - role: assistant
45
+ content: A dog is a four-legged, domesticated animal that is a member of the class
46
+ Mammalia, which includes all mammals. Dogs are known for their loyalty, playfulness,
47
+ and ability to be trained for various tasks. They are also used for hunting,
48
+ herding, and as service animals.
49
+ - role: user
50
+ content: 'Question: What is the capital of France?'
51
+ - role: assistant
52
+ content: The capital of France is Paris. Paris is located in the north-central
53
+ region of France and is known for its famous landmarks, such as the Eiffel Tower,
54
+ the Louvre Museum, and Notre-Dame Cathedral.
55
+ - role: user
56
+ content: 'Question: What is the color of an apple?'
57
  inference: false
58
+ model-index:
59
+ - name: TinyMistral-248M-SFT-v3
60
+ results:
61
+ - task:
62
+ type: text-generation
63
+ name: Text Generation
64
+ dataset:
65
+ name: AI2 Reasoning Challenge (25-Shot)
66
+ type: ai2_arc
67
+ config: ARC-Challenge
68
+ split: test
69
+ args:
70
+ num_few_shot: 25
71
+ metrics:
72
+ - type: acc_norm
73
+ value: 21.93
74
+ name: normalized accuracy
75
+ source:
76
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
77
+ name: Open LLM Leaderboard
78
+ - task:
79
+ type: text-generation
80
+ name: Text Generation
81
+ dataset:
82
+ name: HellaSwag (10-Shot)
83
+ type: hellaswag
84
+ split: validation
85
+ args:
86
+ num_few_shot: 10
87
+ metrics:
88
+ - type: acc_norm
89
+ value: 28.26
90
+ name: normalized accuracy
91
+ source:
92
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
93
+ name: Open LLM Leaderboard
94
+ - task:
95
+ type: text-generation
96
+ name: Text Generation
97
+ dataset:
98
+ name: MMLU (5-Shot)
99
+ type: cais/mmlu
100
+ config: all
101
+ split: test
102
+ args:
103
+ num_few_shot: 5
104
+ metrics:
105
+ - type: acc
106
+ value: 22.91
107
+ name: accuracy
108
+ source:
109
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
110
+ name: Open LLM Leaderboard
111
+ - task:
112
+ type: text-generation
113
+ name: Text Generation
114
+ dataset:
115
+ name: TruthfulQA (0-shot)
116
+ type: truthful_qa
117
+ config: multiple_choice
118
+ split: validation
119
+ args:
120
+ num_few_shot: 0
121
+ metrics:
122
+ - type: mc2
123
+ value: 40.03
124
+ source:
125
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
126
+ name: Open LLM Leaderboard
127
+ - task:
128
+ type: text-generation
129
+ name: Text Generation
130
+ dataset:
131
+ name: Winogrande (5-shot)
132
+ type: winogrande
133
+ config: winogrande_xl
134
+ split: validation
135
+ args:
136
+ num_few_shot: 5
137
+ metrics:
138
+ - type: acc
139
+ value: 51.54
140
+ name: accuracy
141
+ source:
142
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
143
+ name: Open LLM Leaderboard
144
+ - task:
145
+ type: text-generation
146
+ name: Text Generation
147
+ dataset:
148
+ name: GSM8k (5-shot)
149
+ type: gsm8k
150
+ config: main
151
+ split: test
152
+ args:
153
+ num_few_shot: 5
154
+ metrics:
155
+ - type: acc
156
+ value: 0.0
157
+ name: accuracy
158
+ source:
159
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/TinyMistral-248M-SFT-v3
160
+ name: Open LLM Leaderboard
161
  ---
162
 
163
  # Locutusque's TinyMistral-248M trained on OpenAssistant TOP-1 Conversation Threads
 
188
  penalty_alpha: 0.5
189
  top_k: 5
190
  ```
191
+
192
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
193
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__TinyMistral-248M-SFT-v3)
194
+
195
+ | Metric |Value|
196
+ |---------------------------------|----:|
197
+ |Avg. |27.45|
198
+ |AI2 Reasoning Challenge (25-Shot)|21.93|
199
+ |HellaSwag (10-Shot) |28.26|
200
+ |MMLU (5-Shot) |22.91|
201
+ |TruthfulQA (0-shot) |40.03|
202
+ |Winogrande (5-shot) |51.54|
203
+ |GSM8k (5-shot) | 0.00|
204
+
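The `Avg.` row in the table above can be sanity-checked with a short script. This is a minimal sketch that assumes the average is the unweighted arithmetic mean of the six benchmark scores; the commit does not state how it is computed, and the names `scores` and `leaderboard_average` are hypothetical, introduced here only for illustration.

```python
# Scores copied from the table added in this commit (benchmark -> value).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 21.93,
    "HellaSwag (10-Shot)": 28.26,
    "MMLU (5-Shot)": 22.91,
    "TruthfulQA (0-shot)": 40.03,
    "Winogrande (5-shot)": 51.54,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(values):
    """Return the unweighted arithmetic mean (assumed aggregation rule)."""
    values = list(values)
    return sum(values) / len(values)


average = leaderboard_average(scores.values())

# Compare against the reported Avg. with a tolerance, since the table
# rounds to two decimals and float sums are not exact.
reported_avg = 27.45
print(f"computed: {average:.3f}, reported: {reported_avg}")
print("consistent" if abs(average - reported_avg) < 0.01 else "mismatch")
```

Under that assumption the computed mean agrees with the reported value to within rounding, which suggests the `0.00` GSM8k score is included in the average rather than skipped.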