Adding the Open Portuguese LLM Leaderboard Evaluation Results

#2
Files changed (1)
  1. README.md +170 -4
README.md CHANGED
@@ -1,11 +1,158 @@
  ---
- license: mit
- datasets:
- - mlabonne/orpo-dpo-mix-40k
  language:
  - en
  tags:
  - ORPO
  ---
  Barcenas-14b-Phi-3-medium-ORPO

@@ -13,4 +160,23 @@ Model trained with the innovative ORPO method, based on the robust VAGOsolutions

  The model was trained with the dataset: mlabonne/orpo-dpo-mix-40k, which combines diverse data sources to enhance conversational capabilities and contextual understanding.

- Made with ❤️ in Guadalupe, Nuevo Leon, Mexico 🇲🇽
  ---
  language:
  - en
+ license: mit
  tags:
  - ORPO
+ datasets:
+ - mlabonne/orpo-dpo-mix-40k
+ model-index:
+ - name: Barcenas-14b-Phi-3-medium-ORPO
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 73.2
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 65.79
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 51.03
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 92.6
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 71.45
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 69.06
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 84.6
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 73.55
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 67.01
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO
+       name: Open Portuguese LLM Leaderboard
  ---
  Barcenas-14b-Phi-3-medium-ORPO

  The model was trained with the dataset: mlabonne/orpo-dpo-mix-40k, which combines diverse data sources to enhance conversational capabilities and contextual understanding.

+ Made with ❤️ in Guadalupe, Nuevo Leon, Mexico 🇲🇽
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/Danielbrdz/Barcenas-14b-Phi-3-medium-ORPO) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).
+
+ | Metric                     | Value     |
+ |----------------------------|-----------|
+ | Average                    | **72.03** |
+ | ENEM Challenge (No Images) | 73.20     |
+ | BLUEX (No Images)          | 65.79     |
+ | OAB Exams                  | 51.03     |
+ | Assin2 RTE                 | 92.60     |
+ | Assin2 STS                 | 71.45     |
+ | FaQuAD NLI                 | 69.06     |
+ | HateBR Binary              | 84.60     |
+ | PT Hate Speech Binary      | 73.55     |
+ | tweetSentBR                | 67.01     |