leaderboard-pt-pr-bot committed on
Commit
6b52074
1 Parent(s): 3edb304

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +166 -3
README.md CHANGED
@@ -1,9 +1,156 @@
  ---
  license: apache-2.0
  datasets:
  - nicholasKluge/Pt-Corpus
- language:
- - pt
  ---

  This is a base model pre-trained on about 1B Portuguese tokens, initialized from the official model weights; it should be used for fine-tuning.
@@ -18,4 +165,20 @@ Note: Awaiting [official results](https://huggingface.co/spaces/eduagarcia/o
  | faquad_nli | 68,11 | 47,63 | 20,48 |
  | hatebr_offensive_binary | 79,65 | 77,63 | 2,02 |
  | oab_exams | 45,42 | 45,24 | 0,18 |
- | portuguese_hate_speech_binary| 59,18 | 55,72 | 3,46 |

  ---
+ language:
+ - pt
  license: apache-2.0
  datasets:
  - nicholasKluge/Pt-Corpus
+ model-index:
8
+ - name: Mistral-7B-v0.2-Base_ptbr
9
+ results:
10
+ - task:
11
+ type: text-generation
12
+ name: Text Generation
13
+ dataset:
14
+ name: ENEM Challenge (No Images)
15
+ type: eduagarcia/enem_challenge
16
+ split: train
17
+ args:
18
+ num_few_shot: 3
19
+ metrics:
20
+ - type: acc
21
+ value: 64.94
22
+ name: accuracy
23
+ source:
24
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
25
+ name: Open Portuguese LLM Leaderboard
26
+ - task:
27
+ type: text-generation
28
+ name: Text Generation
29
+ dataset:
30
+ name: BLUEX (No Images)
31
+ type: eduagarcia-temp/BLUEX_without_images
32
+ split: train
33
+ args:
34
+ num_few_shot: 3
35
+ metrics:
36
+ - type: acc
37
+ value: 53.96
38
+ name: accuracy
39
+ source:
40
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
41
+ name: Open Portuguese LLM Leaderboard
42
+ - task:
43
+ type: text-generation
44
+ name: Text Generation
45
+ dataset:
46
+ name: OAB Exams
47
+ type: eduagarcia/oab_exams
48
+ split: train
49
+ args:
50
+ num_few_shot: 3
51
+ metrics:
52
+ - type: acc
53
+ value: 45.42
54
+ name: accuracy
55
+ source:
56
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
57
+ name: Open Portuguese LLM Leaderboard
58
+ - task:
59
+ type: text-generation
60
+ name: Text Generation
61
+ dataset:
62
+ name: Assin2 RTE
63
+ type: assin2
64
+ split: test
65
+ args:
66
+ num_few_shot: 15
67
+ metrics:
68
+ - type: f1_macro
69
+ value: 90.11
70
+ name: f1-macro
71
+ source:
72
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
73
+ name: Open Portuguese LLM Leaderboard
74
+ - task:
75
+ type: text-generation
76
+ name: Text Generation
77
+ dataset:
78
+ name: Assin2 STS
79
+ type: eduagarcia/portuguese_benchmark
80
+ split: test
81
+ args:
82
+ num_few_shot: 15
83
+ metrics:
84
+ - type: pearson
85
+ value: 72.51
86
+ name: pearson
87
+ source:
88
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
89
+ name: Open Portuguese LLM Leaderboard
90
+ - task:
91
+ type: text-generation
92
+ name: Text Generation
93
+ dataset:
94
+ name: FaQuAD NLI
95
+ type: ruanchaves/faquad-nli
96
+ split: test
97
+ args:
98
+ num_few_shot: 15
99
+ metrics:
100
+ - type: f1_macro
101
+ value: 69.04
102
+ name: f1-macro
103
+ source:
104
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
105
+ name: Open Portuguese LLM Leaderboard
106
+ - task:
107
+ type: text-generation
108
+ name: Text Generation
109
+ dataset:
110
+ name: HateBR Binary
111
+ type: ruanchaves/hatebr
112
+ split: test
113
+ args:
114
+ num_few_shot: 25
115
+ metrics:
116
+ - type: f1_macro
117
+ value: 79.62
118
+ name: f1-macro
119
+ source:
120
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
121
+ name: Open Portuguese LLM Leaderboard
122
+ - task:
123
+ type: text-generation
124
+ name: Text Generation
125
+ dataset:
126
+ name: PT Hate Speech Binary
127
+ type: hate_speech_portuguese
128
+ split: test
129
+ args:
130
+ num_few_shot: 25
131
+ metrics:
132
+ - type: f1_macro
133
+ value: 58.52
134
+ name: f1-macro
135
+ source:
136
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
137
+ name: Open Portuguese LLM Leaderboard
138
+ - task:
139
+ type: text-generation
140
+ name: Text Generation
141
+ dataset:
142
+ name: tweetSentBR
143
+ type: eduagarcia/tweetsentbr_fewshot
144
+ split: test
145
+ args:
146
+ num_few_shot: 25
147
+ metrics:
148
+ - type: f1_macro
149
+ value: 62.32
150
+ name: f1-macro
151
+ source:
152
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
153
+ name: Open Portuguese LLM Leaderboard
  ---

  This is a base model pre-trained on about 1B Portuguese tokens, initialized from the official model weights; it should be used for fine-tuning.
 
  | faquad_nli | 68,11 | 47,63 | 20,48 |
  | hatebr_offensive_binary | 79,65 | 77,63 | 2,02 |
  | oab_exams | 45,42 | 45,24 | 0,18 |
+ | portuguese_hate_speech_binary| 59,18 | 55,72 | 3,46 |
+ # Open Portuguese LLM Leaderboard Evaluation Results
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/JJhooww/Mistral-7B-v0.2-Base_ptbr) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**66.27**|
+ |ENEM Challenge (No Images)|    64.94|
+ |BLUEX (No Images)         |    53.96|
+ |OAB Exams                 |    45.42|
+ |Assin2 RTE                |    90.11|
+ |Assin2 STS                |    72.51|
+ |FaQuAD NLI                |    69.04|
+ |HateBR Binary             |    79.62|
+ |PT Hate Speech Binary     |    58.52|
+ |tweetSentBR               |    62.32|
+
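
The Average row in the results table is consistent with an unweighted mean of the nine task scores. The PR does not say how the leaderboard aggregates results, so treating it as a plain arithmetic mean is an assumption; the helper name `leaderboard_average` is hypothetical. A minimal sketch that reproduces the reported value from the table:

```python
# Task scores copied from the "Open Portuguese LLM Leaderboard Evaluation Results" table.
scores = {
    "ENEM Challenge (No Images)": 64.94,
    "BLUEX (No Images)": 53.96,
    "OAB Exams": 45.42,
    "Assin2 RTE": 90.11,
    "Assin2 STS": 72.51,
    "FaQuAD NLI": 69.04,
    "HateBR Binary": 79.62,
    "PT Hate Speech Binary": 58.52,
    "tweetSentBR": 62.32,
}


def leaderboard_average(task_scores):
    """Unweighted mean of all task scores, rounded to two decimals.

    Assumption: every task carries equal weight in the aggregate.
    """
    return round(sum(task_scores.values()) / len(task_scores), 2)


average = leaderboard_average(scores)
print(f"Average: {average}")
```

Under that assumption the result matches the 66.27 shown in the Average row, which is a quick sanity check that the per-task values in the `model-index` block and the table agree.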