---
language:
- pt
license: apache-2.0
datasets:
- nicholasKluge/Pt-Corpus
model-index:
- name: Mistral-7B-v0.2-Base_ptbr
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 64.94
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 53.96
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 45.42
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 90.11
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 72.51
      name: pearson
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 69.04
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 79.62
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 58.52
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia/tweetsentbr_fewshot
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 62.32
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr
      name: Open Portuguese LLM Leaderboard
---

This is a base model, further pre-trained on roughly 1B tokens of Portuguese starting from the official model weights. It is intended to be fine-tuned rather than used directly.

| Task | Score | Baseline | Difference |
|------|-------|----------|------------|
| faquad_nli | 68.11 | 47.63 | 20.48 |
| hatebr_offensive_binary | 79.65 | 77.63 | 2.02 |
| oab_exams | 45.42 | 45.24 | 0.18 |
| portuguese_hate_speech_binary | 59.18 | 55.72 | 3.46 |
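
For reference, the sketch below loads the model with the Hugging Face `transformers` library and generates a plain-text continuation (this is a base model, so no chat template applies). The prompt, dtype, and sampling settings are illustrative assumptions, not part of this card.

```python
# Minimal sketch: load the base model and generate a continuation.
# Sampling parameters are placeholders; the card does not prescribe any.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "JJhooww/Mistral-7B-v0.2-Base_ptbr"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bfloat16 support
    device_map="auto",
)

prompt = "A culinária brasileira é conhecida por"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Base (non-instruct) model: plain text continuation, no chat template.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```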

# Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found here and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=JJhooww/Mistral-7B-v0.2-Base_ptbr).

| Metric | Value |
|--------|-------|
| Average | 66.27 |
| ENEM Challenge (No Images) | 64.94 |
| BLUEX (No Images) | 53.96 |
| OAB Exams | 45.42 |
| Assin2 RTE | 90.11 |
| Assin2 STS | 72.51 |
| FaQuAD NLI | 69.04 |
| HateBR Binary | 79.62 |
| PT Hate Speech Binary | 58.52 |
| tweetSentBR | 62.32 |
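
Because the card recommends fine-tuning rather than direct use, here is a minimal fine-tuning sketch. LoRA via the `peft` library, the small slice of `nicholasKluge/Pt-Corpus` (assumed to expose a `text` column), and every hyperparameter below are illustrative assumptions, not the author's training recipe.

```python
# Minimal LoRA fine-tuning sketch (illustrative only: LoRA, the dataset slice,
# and all hyperparameters are assumptions, not the author's recipe).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "JJhooww/Mistral-7B-v0.2-Base_ptbr"

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

# Illustrative corpus: a small slice of the pre-training dataset listed in the card;
# it is assumed to provide a "text" column.
dataset = load_dataset("nicholasKluge/Pt-Corpus", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mistral-7b-v0.2-ptbr-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

A full fine-tune or a supervised fine-tuning trainer such as TRL's `SFTTrainer` would work just as well; LoRA is shown only to keep the sketch runnable on a single GPU.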