leaderboard-pt-pr-bot committed on
Commit d2ac481 • 1 Parent(s): 3a088cb

Adding the Open Portuguese LLM Leaderboard Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +170 -4
README.md CHANGED
@@ -1,11 +1,158 @@
 ---
+language:
+- zh
+- en
 license: mit
 datasets:
 - wenbopan/Fusang-v1
 - wenbopan/OpenOrca-zh-20k
-language:
-- zh
-- en
+model-index:
+- name: Faro-Yi-34B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 73.2
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 64.81
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 54.53
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 91.58
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 79.37
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 71.84
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 87.1
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 65.34
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 70.46
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wenbopan/Faro-Yi-34B
+      name: Open Portuguese LLM Leaderboard
 ---
 
 ![image/webp](https://cdn-uploads.huggingface.co/production/uploads/62cd3a3691d27e60db0698b0/s21sMRxRT56c5t4M15GBP.webp)
@@ -60,4 +207,23 @@ response = tokenizer.decode(generated_ids[0], skip_special_tokens=True) # Aye, m
 
 </details>
 
-For more info please refer to [wenbopan/Faro-Yi-9B](https://huggingface.co/wenbopan/Faro-Yi-9B)
+For more info please refer to [wenbopan/Faro-Yi-9B](https://huggingface.co/wenbopan/Faro-Yi-9B)
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/wenbopan/Faro-Yi-34B) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**73.14**|
+|ENEM Challenge (No Images)|    73.20|
+|BLUEX (No Images)         |    64.81|
+|OAB Exams                 |    54.53|
+|Assin2 RTE                |    91.58|
+|Assin2 STS                |    79.37|
+|FaQuAD NLI                |    71.84|
+|HateBR Binary             |    87.10|
+|PT Hate Speech Binary     |    65.34|
+|tweetSentBR               |    70.46|
+
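
The PR does not say how the Average row in the results table is computed. A minimal Python sketch, assuming it is the unweighted mean of the nine task scores rounded to two decimals (an assumption, not something the bot or the leaderboard states here), reproduces the reported 73.14. The names `scores` and `unweighted_average` are illustrative only.

```python
# Scores copied from the results table added by this PR.
scores = {
    "ENEM Challenge (No Images)": 73.20,
    "BLUEX (No Images)": 64.81,
    "OAB Exams": 54.53,
    "Assin2 RTE": 91.58,
    "Assin2 STS": 79.37,
    "FaQuAD NLI": 71.84,
    "HateBR Binary": 87.10,
    "PT Hate Speech Binary": 65.34,
    "tweetSentBR": 70.46,
}


def unweighted_average(values):
    """Plain arithmetic mean, rounded to two decimals.

    Assumption: every task counts equally. The leaderboard's real
    aggregation is not described in this PR.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = unweighted_average(scores.values())
print(f"Average: {average:.2f}")  # prints "Average: 73.14"
```

If the recomputed value ever disagrees with the Average row, that points to either a different aggregation rule or a mismatch between the table and the `model-index` metadata.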