Commit 9637dc1
1 Parent(s): 89363c2

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#1)

- Adding the Open Portuguese LLM Leaderboard Evaluation Results (628078916205436c5374ca8b943caaf9865e9e1f)


Co-authored-by: Open PT LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +171 -5
README.md CHANGED
@@ -1,17 +1,164 @@
  ---
+ language:
+ - pt
+ license: apache-2.0
  library_name: transformers
  tags:
  - portugues
  - portuguese
  - QA
  - instruct
- license: apache-2.0
+ base_model: meta-llama/Meta-Llama-3-8B-Instruct
  datasets:
  - rhaymison/superset
- language:
- - pt
  pipeline_tag: text-generation
- base_model: meta-llama/Meta-Llama-3-8B-Instruct
+ model-index:
+ - name: Llama3-portuguese-luana-8b-instruct
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 69.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Llama3-portuguese-luana-8b-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 51.74
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Llama3-portuguese-luana-8b-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 47.56
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Llama3-portuguese-luana-8b-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 89.24
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Llama3-portuguese-luana-8b-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 72.87
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Llama3-portuguese-luana-8b-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 68.94
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Llama3-portuguese-luana-8b-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 85.93
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Llama3-portuguese-luana-8b-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 64.16
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Llama3-portuguese-luana-8b-instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 63.91
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Llama3-portuguese-luana-8b-instruct
+       name: Open Portuguese LLM Leaderboard
  ---
  
  # Llama3-portuguese-luana-8b-instruct
@@ -128,4 +275,23 @@ email: [email protected]
  </a>
  <a href="https://github.com/rhaymisonbetini" target="_blank">
  <img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white">
- </a>
+ </a>
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/rhaymison/Llama3-portuguese-luana-8b-instruct) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric                   | Value   |
+ |--------------------------|---------|
+ |Average                   |**68.15**|
+ |ENEM Challenge (No Images)|       69|
+ |BLUEX (No Images)         |    51.74|
+ |OAB Exams                 |    47.56|
+ |Assin2 RTE                |    89.24|
+ |Assin2 STS                |    72.87|
+ |FaQuAD NLI                |    68.94|
+ |HateBR Binary             |    85.93|
+ |PT Hate Speech Binary     |    64.16|
+ |tweetSentBR               |    63.91|
+
+