Adding the Open Portuguese LLM Leaderboard Evaluation Results

#26
Files changed (1)
  1. README.md +172 -6
README.md CHANGED
@@ -1,13 +1,160 @@
  ---
- license: other
- license_name: tongyi-qianwen
- license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
  language:
  - en
- pipeline_tag: text-generation
- base_model: Qwen/Qwen2-72B
  tags:
  - chat
  ---

  # Qwen2-72B-Instruct
@@ -167,4 +314,23 @@ If you find our work helpful, feel free to give us a cite.
  title={Qwen2 Technical Report},
  year={2024}
  }
- ```

  ---
  language:
  - en
+ license: other
  tags:
  - chat
+ base_model: Qwen/Qwen2-72B
+ license_name: tongyi-qianwen
+ license_link: https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/LICENSE
+ pipeline_tag: text-generation
+ model-index:
+ - name: Qwen2-72B-Instruct
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 82.23
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen2-72B-Instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 74.41
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen2-72B-Instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 65.56
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen2-72B-Instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 94.69
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen2-72B-Instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 75.12
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen2-72B-Instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 84.95
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen2-72B-Instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 84.32
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen2-72B-Instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 74.64
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen2-72B-Instruct
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 76.54
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=Qwen/Qwen2-72B-Instruct
+       name: Open Portuguese LLM Leaderboard
  ---

  # Qwen2-72B-Instruct

  title={Qwen2 Technical Report},
  year={2024}
  }
+ ```
+
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/Qwen/Qwen2-72B-Instruct) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**79.16**|
+ |ENEM Challenge (No Images)| 82.23|
+ |BLUEX (No Images) | 74.41|
+ |OAB Exams | 65.56|
+ |Assin2 RTE | 94.69|
+ |Assin2 STS | 75.12|
+ |FaQuAD NLI | 84.95|
+ |HateBR Binary | 84.32|
+ |PT Hate Speech Binary | 74.64|
+ |tweetSentBR | 76.54|
+
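
The Average row in the added table is consistent with an unweighted mean of the nine task scores. The sketch below recomputes it as a sanity check; it assumes equal weighting per task (the PR does not state how the leaderboard aggregates), and the helper name `leaderboard_average` is hypothetical, not part of any leaderboard tooling.

```python
# Scores copied from the results table in this PR (one entry per task).
scores = {
    "ENEM Challenge (No Images)": 82.23,
    "BLUEX (No Images)": 74.41,
    "OAB Exams": 65.56,
    "Assin2 RTE": 94.69,
    "Assin2 STS": 75.12,
    "FaQuAD NLI": 84.95,
    "HateBR Binary": 84.32,
    "PT Hate Speech Binary": 74.64,
    "tweetSentBR": 76.54,
}


def leaderboard_average(values):
    """Unweighted arithmetic mean (assumed aggregation, not confirmed by the PR)."""
    values = list(values)
    return sum(values) / len(values)


average = leaderboard_average(scores.values())
print(f"{average:.2f}")  # prints 79.16, matching the Average row
```

Because the metrics mix accuracy, macro-F1, and Pearson correlation, the mean is a convenience summary rather than a single comparable quantity.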