leaderboard-pt-pr-bot committed on
Commit
73ed09c
1 Parent(s): 58ca939

Adding the Open Portuguese LLM Leaderboard Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard

The purpose of this PR is to add evaluation results from the Open Portuguese LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/eduagarcia-temp/portuguese-leaderboard-results-to-modelcard/discussions

Files changed (1)
  1. README.md +141 -3
README.md CHANGED
@@ -1,12 +1,133 @@
  ---
+ language:
+ - pt
  license: apache-2.0
+ library_name: transformers
  datasets:
  - wikimedia/wikipedia
- language:
- - pt
  metrics:
  - accuracy
- library_name: transformers
+ model-index:
+ - name: periquito-3B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 17.98
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wandgibaut/periquito-3B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 21.14
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wandgibaut/periquito-3B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 22.69
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wandgibaut/periquito-3B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 43.01
+       name: f1-macro
+     - type: pearson
+       value: 8.92
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wandgibaut/periquito-3B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 43.97
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wandgibaut/periquito-3B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 50.46
+       name: f1-macro
+     - type: f1_macro
+       value: 41.19
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wandgibaut/periquito-3B
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia-temp/tweetsentbr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 47.96
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=wandgibaut/periquito-3B
+       name: Open Portuguese LLM Leaderboard
  ---
  # Model Card for Model ID

@@ -207,3 +328,20 @@ If you found periquito-3B useful in your research or applications, please cite u
  url = {https://huggingface.co/wandgibaut/periquito-3B}
  }
  ```
+
+ # [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/wandgibaut/periquito-3B)
+
+ |          Metric          |  Value  |
+ |--------------------------|---------|
+ |Average                   |**33.04**|
+ |ENEM Challenge (No Images)|    17.98|
+ |BLUEX (No Images)         |    21.14|
+ |OAB Exams                 |    22.69|
+ |Assin2 RTE                |    43.01|
+ |Assin2 STS                |     8.92|
+ |FaQuAD NLI                |    43.97|
+ |HateBR Binary             |    50.46|
+ |PT Hate Speech Binary     |    41.19|
+ |tweetSentBR               |    47.96|
+
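The reported average can be checked against the nine per-task scores in the table. The sketch below assumes the leaderboard aggregates with an unweighted arithmetic mean rounded to two decimals; the PR does not state the aggregation method, and the names `scores` and `leaderboard_average` are hypothetical. Note that the table's `Assin2 STS` (8.92) and `PT Hate Speech Binary` (41.19) rows appear to correspond to the second metric listed under the `Assin2 RTE` and `HateBR Binary` entries in the `model-index` metadata.

```python
# Scores copied from the results table added by this PR.
scores = {
    "ENEM Challenge (No Images)": 17.98,
    "BLUEX (No Images)": 21.14,
    "OAB Exams": 22.69,
    "Assin2 RTE": 43.01,
    "Assin2 STS": 8.92,
    "FaQuAD NLI": 43.97,
    "HateBR Binary": 50.46,
    "PT Hate Speech Binary": 41.19,
    "tweetSentBR": 47.96,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed aggregation, not confirmed by the PR)."""
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(average)  # 33.04, matching the "Average" row
```

Under that assumption the computed value agrees with the bolded `Average` row, which suggests all nine tasks are weighted equally.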