leaderboard-pt-pr-bot commited on
Commit
b8ff21d
1 Parent(s): c4aa820

Adding the Open Portuguese LLM Leaderboard Evaluation Results

Browse files

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1) hide show
  1. README.md +166 -1
README.md CHANGED
@@ -13,6 +13,153 @@ base_model: Qwen/Qwen1.5-7B-Chat
13
  datasets:
14
  - rhaymison/superset
15
  pipeline_tag: text-generation
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
16
  ---
17
 
18
  # Qwen-portuguese-luana-7b
@@ -117,4 +264,22 @@ email: [email protected]
117
  </a>
118
  <a href="https://github.com/rhaymisonbetini" target="_blank">
119
  <img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white">
120
- </a>
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
13
  datasets:
14
  - rhaymison/superset
15
  pipeline_tag: text-generation
16
+ model-index:
17
+ - name: Qwen-portuguese-luana-7b
18
+ results:
19
+ - task:
20
+ type: text-generation
21
+ name: Text Generation
22
+ dataset:
23
+ name: ENEM Challenge (No Images)
24
+ type: eduagarcia/enem_challenge
25
+ split: train
26
+ args:
27
+ num_few_shot: 3
28
+ metrics:
29
+ - type: acc
30
+ value: 58.36
31
+ name: accuracy
32
+ source:
33
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Qwen-portuguese-luana-7b
34
+ name: Open Portuguese LLM Leaderboard
35
+ - task:
36
+ type: text-generation
37
+ name: Text Generation
38
+ dataset:
39
+ name: BLUEX (No Images)
40
+ type: eduagarcia-temp/BLUEX_without_images
41
+ split: train
42
+ args:
43
+ num_few_shot: 3
44
+ metrics:
45
+ - type: acc
46
+ value: 48.12
47
+ name: accuracy
48
+ source:
49
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Qwen-portuguese-luana-7b
50
+ name: Open Portuguese LLM Leaderboard
51
+ - task:
52
+ type: text-generation
53
+ name: Text Generation
54
+ dataset:
55
+ name: OAB Exams
56
+ type: eduagarcia/oab_exams
57
+ split: train
58
+ args:
59
+ num_few_shot: 3
60
+ metrics:
61
+ - type: acc
62
+ value: 42.73
63
+ name: accuracy
64
+ source:
65
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Qwen-portuguese-luana-7b
66
+ name: Open Portuguese LLM Leaderboard
67
+ - task:
68
+ type: text-generation
69
+ name: Text Generation
70
+ dataset:
71
+ name: Assin2 RTE
72
+ type: assin2
73
+ split: test
74
+ args:
75
+ num_few_shot: 15
76
+ metrics:
77
+ - type: f1_macro
78
+ value: 81.05
79
+ name: f1-macro
80
+ source:
81
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Qwen-portuguese-luana-7b
82
+ name: Open Portuguese LLM Leaderboard
83
+ - task:
84
+ type: text-generation
85
+ name: Text Generation
86
+ dataset:
87
+ name: Assin2 STS
88
+ type: eduagarcia/portuguese_benchmark
89
+ split: test
90
+ args:
91
+ num_few_shot: 15
92
+ metrics:
93
+ - type: pearson
94
+ value: 74.25
95
+ name: pearson
96
+ source:
97
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Qwen-portuguese-luana-7b
98
+ name: Open Portuguese LLM Leaderboard
99
+ - task:
100
+ type: text-generation
101
+ name: Text Generation
102
+ dataset:
103
+ name: FaQuAD NLI
104
+ type: ruanchaves/faquad-nli
105
+ split: test
106
+ args:
107
+ num_few_shot: 15
108
+ metrics:
109
+ - type: f1_macro
110
+ value: 57.96
111
+ name: f1-macro
112
+ source:
113
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Qwen-portuguese-luana-7b
114
+ name: Open Portuguese LLM Leaderboard
115
+ - task:
116
+ type: text-generation
117
+ name: Text Generation
118
+ dataset:
119
+ name: HateBR Binary
120
+ type: ruanchaves/hatebr
121
+ split: test
122
+ args:
123
+ num_few_shot: 25
124
+ metrics:
125
+ - type: f1_macro
126
+ value: 70.29
127
+ name: f1-macro
128
+ source:
129
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Qwen-portuguese-luana-7b
130
+ name: Open Portuguese LLM Leaderboard
131
+ - task:
132
+ type: text-generation
133
+ name: Text Generation
134
+ dataset:
135
+ name: PT Hate Speech Binary
136
+ type: hate_speech_portuguese
137
+ split: test
138
+ args:
139
+ num_few_shot: 25
140
+ metrics:
141
+ - type: f1_macro
142
+ value: 69.92
143
+ name: f1-macro
144
+ source:
145
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Qwen-portuguese-luana-7b
146
+ name: Open Portuguese LLM Leaderboard
147
+ - task:
148
+ type: text-generation
149
+ name: Text Generation
150
+ dataset:
151
+ name: tweetSentBR
152
+ type: eduagarcia/tweetsentbr_fewshot
153
+ split: test
154
+ args:
155
+ num_few_shot: 25
156
+ metrics:
157
+ - type: f1_macro
158
+ value: 59.69
159
+ name: f1-macro
160
+ source:
161
+ url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=rhaymison/Qwen-portuguese-luana-7b
162
+ name: Open Portuguese LLM Leaderboard
163
  ---
164
 
165
  # Qwen-portuguese-luana-7b
 
264
  </a>
265
  <a href="https://github.com/rhaymisonbetini" target="_blank">
266
  <img src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white">
267
+ </a>
268
+
269
+ # Open Portuguese LLM Leaderboard Evaluation Results
270
+
271
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/rhaymison/Qwen-portuguese-luana-7b) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
272
+
273
+ | Metric | Value |
274
+ |--------------------------|---------|
275
+ |Average |**62.49**|
276
+ |ENEM Challenge (No Images)| 58.36|
277
+ |BLUEX (No Images) | 48.12|
278
+ |OAB Exams | 42.73|
279
+ |Assin2 RTE | 81.05|
280
+ |Assin2 STS | 74.25|
281
+ |FaQuAD NLI | 57.96|
282
+ |HateBR Binary | 70.29|
283
+ |PT Hate Speech Binary | 69.92|
284
+ |tweetSentBR | 59.69|
285
+