leaderboard-pt-pr-bot committed on
Commit 94b29d2 • 1 Parent(s): dd4bcf7

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +171 -5
README.md CHANGED
@@ -1,12 +1,159 @@
  ---
- license: mit
- license_link: https://huggingface.co/upstage/solar-pro-preview-instruct/blob/main/LICENSE
  language:
  - en
- pipeline_tag: text-generation
  tags:
  - nlp
- library_name: transformers
  ---

  <p align="left">
@@ -126,4 +273,23 @@ Learn more:
  Also try out:

  - [Document Parse](http://developers.upstage.ai/docs/apis/document-parse?utm_campaign=solarpro-preview-launch): An industry-leading model for converting complex document files to LLM-compatible HTML formats.
- - [Solar DocVision Preview](http://developers.upstage.ai/docs/apis/document-qa?utm_campaign=solarpro-preview-launch): A vision LLM specialized on documents.
  ---
  language:
  - en
+ license: mit
+ library_name: transformers
  tags:
  - nlp
+ license_link: https://huggingface.co/upstage/solar-pro-preview-instruct/blob/main/LICENSE
+ pipeline_tag: text-generation
+ model-index:
+ - name: solar-pro-preview-instruct
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 73.97
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=upstage/solar-pro-preview-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 66.76
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=upstage/solar-pro-preview-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 56.13
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=upstage/solar-pro-preview-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 93.87
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=upstage/solar-pro-preview-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 75.41
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=upstage/solar-pro-preview-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 77.52
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=upstage/solar-pro-preview-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 81.6
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=upstage/solar-pro-preview-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 69.73
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=upstage/solar-pro-preview-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 68.22
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=upstage/solar-pro-preview-instruct
+       name: Open Portuguese LLM Leaderboard
  ---

  <p align="left">
 
  Also try out:

  - [Document Parse](http://developers.upstage.ai/docs/apis/document-parse?utm_campaign=solarpro-preview-launch): An industry-leading model for converting complex document files to LLM-compatible HTML formats.
+ - [Solar DocVision Preview](http://developers.upstage.ai/docs/apis/document-qa?utm_campaign=solarpro-preview-launch): A vision LLM specialized on documents.
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/upstage/solar-pro-preview-instruct) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**73.69**|
+ |ENEM Challenge (No Images)|    73.97|
+ |BLUEX (No Images)         |    66.76|
+ |OAB Exams                 |    56.13|
+ |Assin2 RTE                |    93.87|
+ |Assin2 STS                |    75.41|
+ |FaQuAD NLI                |    77.52|
+ |HateBR Binary             |    81.60|
+ |PT Hate Speech Binary     |    69.73|
+ |tweetSentBR               |    68.22|
+
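
The Average row in the table added by this PR can be sanity-checked against the nine per-task scores. The following is a minimal Python sketch, assuming the average is a plain unweighted mean of the task scores; the PR does not state how the leaderboard aggregates, so treat that as an assumption. The `scores` dictionary is a hypothetical name, and its values are copied from the table.

```python
from statistics import mean

# Per-task scores copied from the results table in this PR.
scores = {
    "ENEM Challenge (No Images)": 73.97,
    "BLUEX (No Images)": 66.76,
    "OAB Exams": 56.13,
    "Assin2 RTE": 93.87,
    "Assin2 STS": 75.41,
    "FaQuAD NLI": 77.52,
    "HateBR Binary": 81.60,
    "PT Hate Speech Binary": 69.73,
    "tweetSentBR": 68.22,
}

# Assumption: the reported average is an unweighted arithmetic mean,
# rounded to two decimal places.
average = round(mean(scores.values()), 2)
print(f"Average: {average:.2f}")
```

Under that assumption the computed value is 73.69, which agrees with the Average row in the table.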