leaderboard-pt-pr-bot committed on
Commit b7caa80 • 1 Parent(s): 6beb706

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +169 -3
README.md CHANGED
@@ -1,16 +1,163 @@
 ---
 language:
 - eng
+license:
+- mit
 tags:
 - sft
 - Yi-34B-200K
-license:
-- mit
 datasets:
 - LDJnr/Capybara
 - LDJnr/LessWrong-Amplify-Instruct
 - LDJnr/Pure-Dove
 - LDJnr/Verified-Camel
+model-index:
+- name: Nous-Capybara-34B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 71.17
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=NousResearch/Nous-Capybara-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 63.0
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=NousResearch/Nous-Capybara-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 55.31
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=NousResearch/Nous-Capybara-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 90.07
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=NousResearch/Nous-Capybara-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 75.71
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=NousResearch/Nous-Capybara-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 77.31
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=NousResearch/Nous-Capybara-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 74.09
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=NousResearch/Nous-Capybara-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 71.61
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=NousResearch/Nous-Capybara-34B
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 70.79
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=NousResearch/Nous-Capybara-34B
+      name: Open Portuguese LLM Leaderboard
 ---
 
 ## **Nous-Capybara-34B V1.9**
@@ -121,4 +268,23 @@ The following are benchmarks we checked for contamination against our dataset:
 journal={arXiv preprint arXiv:(comming soon)},
 year={2023}
 }
-```
+```
+
+
+# Open Portuguese LLM Leaderboard Evaluation Results
+
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/NousResearch/Nous-Capybara-34B) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**72.12**|
+|ENEM Challenge (No Images)|    71.17|
+|BLUEX (No Images)         |       63|
+|OAB Exams                 |    55.31|
+|Assin2 RTE                |    90.07|
+|Assin2 STS                |    75.71|
+|FaQuAD NLI                |    77.31|
+|HateBR Binary             |    74.09|
+|PT Hate Speech Binary     |    71.61|
+|tweetSentBR               |    70.79|
+
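
The PR does not say how the Average row is computed. The numbers are consistent with an unweighted arithmetic mean of the nine task scores, rounded to two decimals. The sketch below checks that assumption. The `scores` dictionary is a hypothetical name introduced here, and its values are copied from the results table in this diff.

```python
from statistics import mean

# Per-task scores copied from the results table added by this PR.
# Assumption: the "Average" row is a plain, unweighted mean of these values.
scores = {
    "ENEM Challenge (No Images)": 71.17,
    "BLUEX (No Images)": 63.0,
    "OAB Exams": 55.31,
    "Assin2 RTE": 90.07,
    "Assin2 STS": 75.71,
    "FaQuAD NLI": 77.31,
    "HateBR Binary": 74.09,
    "PT Hate Speech Binary": 71.61,
    "tweetSentBR": 70.79,
}

# Round to two decimals, matching the precision shown in the table.
average = round(mean(scores.values()), 2)
print(f"Average: {average:.2f}")  # prints "Average: 72.12"
```

The result matches the 72.12 reported in the table. Note that the tasks use different metrics (accuracy, macro F1, Pearson), so the mean is a convenience summary rather than a single comparable quantity.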