Commit
61eff5b
1 Parent(s): ff83a87

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#1)

Browse files

- Adding the Open Portuguese LLM Leaderboard Evaluation Results (ba22d2e08272cd07bf9516b54b277ab75b4cefbd)


Co-authored-by: Open PT LLM Leaderboard PR Bot <[email protected]>

Files changed (1) hide show
  1. README.md +166 -0
README.md CHANGED
@@ -1,5 +1,152 @@
1
  ---
2
  library_name: peft
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3
  ---
4
  ## Training procedure
5
 
@@ -22,3 +169,22 @@ The following `bitsandbytes` quantization config was used during training:
22
 
23
 
24
  - PEFT 0.5.0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  library_name: peft
3
+ model-index:
4
+ - name: gembode-2b-base-ultraalpaca
5
+ results:
6
+ - task:
7
+ type: text-generation
8
+ name: Text Generation
9
+ dataset:
10
+ name: ENEM Challenge (No Images)
11
+ type: eduagarcia/enem_challenge
12
+ split: train
13
+ args:
14
+ num_few_shot: 3
15
+ metrics:
16
+ - type: acc
17
+ value: 31.77
18
+ name: accuracy
19
+ source:
20
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/gembode-2b-base-ultraalpaca
21
+ name: Open Portuguese LLM Leaderboard
22
+ - task:
23
+ type: text-generation
24
+ name: Text Generation
25
+ dataset:
26
+ name: BLUEX (No Images)
27
+ type: eduagarcia-temp/BLUEX_without_images
28
+ split: train
29
+ args:
30
+ num_few_shot: 3
31
+ metrics:
32
+ - type: acc
33
+ value: 24.2
34
+ name: accuracy
35
+ source:
36
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/gembode-2b-base-ultraalpaca
37
+ name: Open Portuguese LLM Leaderboard
38
+ - task:
39
+ type: text-generation
40
+ name: Text Generation
41
+ dataset:
42
+ name: OAB Exams
43
+ type: eduagarcia/oab_exams
44
+ split: train
45
+ args:
46
+ num_few_shot: 3
47
+ metrics:
48
+ - type: acc
49
+ value: 27.84
50
+ name: accuracy
51
+ source:
52
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/gembode-2b-base-ultraalpaca
53
+ name: Open Portuguese LLM Leaderboard
54
+ - task:
55
+ type: text-generation
56
+ name: Text Generation
57
+ dataset:
58
+ name: Assin2 RTE
59
+ type: assin2
60
+ split: test
61
+ args:
62
+ num_few_shot: 15
63
+ metrics:
64
+ - type: f1_macro
65
+ value: 69.51
66
+ name: f1-macro
67
+ source:
68
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/gembode-2b-base-ultraalpaca
69
+ name: Open Portuguese LLM Leaderboard
70
+ - task:
71
+ type: text-generation
72
+ name: Text Generation
73
+ dataset:
74
+ name: Assin2 STS
75
+ type: eduagarcia/portuguese_benchmark
76
+ split: test
77
+ args:
78
+ num_few_shot: 15
79
+ metrics:
80
+ - type: pearson
81
+ value: 30.31
82
+ name: pearson
83
+ source:
84
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/gembode-2b-base-ultraalpaca
85
+ name: Open Portuguese LLM Leaderboard
86
+ - task:
87
+ type: text-generation
88
+ name: Text Generation
89
+ dataset:
90
+ name: FaQuAD NLI
91
+ type: ruanchaves/faquad-nli
92
+ split: test
93
+ args:
94
+ num_few_shot: 15
95
+ metrics:
96
+ - type: f1_macro
97
+ value: 55.55
98
+ name: f1-macro
99
+ source:
100
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/gembode-2b-base-ultraalpaca
101
+ name: Open Portuguese LLM Leaderboard
102
+ - task:
103
+ type: text-generation
104
+ name: Text Generation
105
+ dataset:
106
+ name: HateBR Binary
107
+ type: ruanchaves/hatebr
108
+ split: test
109
+ args:
110
+ num_few_shot: 25
111
+ metrics:
112
+ - type: f1_macro
113
+ value: 53.18
114
+ name: f1-macro
115
+ source:
116
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/gembode-2b-base-ultraalpaca
117
+ name: Open Portuguese LLM Leaderboard
118
+ - task:
119
+ type: text-generation
120
+ name: Text Generation
121
+ dataset:
122
+ name: PT Hate Speech Binary
123
+ type: hate_speech_portuguese
124
+ split: test
125
+ args:
126
+ num_few_shot: 25
127
+ metrics:
128
+ - type: f1_macro
129
+ value: 64.74
130
+ name: f1-macro
131
+ source:
132
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/gembode-2b-base-ultraalpaca
133
+ name: Open Portuguese LLM Leaderboard
134
+ - task:
135
+ type: text-generation
136
+ name: Text Generation
137
+ dataset:
138
+ name: tweetSentBR
139
+ type: eduagarcia/tweetsentbr_fewshot
140
+ split: test
141
+ args:
142
+ num_few_shot: 25
143
+ metrics:
144
+ - type: f1_macro
145
+ value: 50.18
146
+ name: f1-macro
147
+ source:
148
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/gembode-2b-base-ultraalpaca
149
+ name: Open Portuguese LLM Leaderboard
150
  ---
151
  ## Training procedure
152
 
 
169
 
170
 
171
  - PEFT 0.5.0
172
+
173
+
174
+ # Open Portuguese LLM Leaderboard Evaluation Results
175
+
176
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/gembode-2b-base-ultraalpaca) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
177
+
178
+ | Metric | Value |
179
+ |--------------------------|---------|
180
+ |Average |**45.25**|
181
+ |ENEM Challenge (No Images)| 31.77|
182
+ |BLUEX (No Images) | 24.20|
183
+ |OAB Exams | 27.84|
184
+ |Assin2 RTE | 69.51|
185
+ |Assin2 STS | 30.31|
186
+ |FaQuAD NLI | 55.55|
187
+ |HateBR Binary | 53.18|
188
+ |PT Hate Speech Binary | 64.74|
189
+ |tweetSentBR | 50.18|
190
+