OSainz committed
Commit 91ae07b • 1 Parent(s): a267d6b

Update README.md

Files changed (1):
  README.md +9 -7
README.md CHANGED
@@ -24,7 +24,7 @@ model-index:
         value: 67.24
       source:
         name: Paper
-        url: https://paper-url.com
+        url: https://arxiv.org/abs/2403.20266
   - task:
       type: multiple-choice
     dataset:
@@ -36,7 +36,7 @@ model-index:
         value: 51.56
       source:
         name: Paper
-        url: https://paper-url.com
+        url: https://arxiv.org/abs/2403.20266
   - task:
       type: mix
     dataset:
@@ -48,7 +48,7 @@ model-index:
         value: 54.04
       source:
         name: Paper
-        url: https://paper-url.com
+        url: https://arxiv.org/abs/2403.20266
   - task:
       type: multiple_choice
     dataset:
@@ -60,7 +60,7 @@ model-index:
         value: 45.02
       source:
         name: Paper
-        url: https://paper-url.com
+        url: https://arxiv.org/abs/2403.20266
   - task:
       type: multiple_choice
     dataset:
@@ -72,7 +72,7 @@ model-index:
         value: 29.83
       source:
         name: Paper
-        url: https://paper-url.com
+        url: https://arxiv.org/abs/2403.20266
   - task:
       type: multiple_choice
     dataset:
@@ -84,7 +84,7 @@ model-index:
         value: 56.44
       source:
         name: Paper
-        url: https://paper-url.com
+        url: https://arxiv.org/abs/2403.20266
   - task:
       type: multiple_choice
     dataset:
@@ -96,7 +96,7 @@ model-index:
         value: 43.18
       source:
         name: Paper
-        url: https://paper-url.com
+        url: https://arxiv.org/abs/2403.20266
 ---
 
 # **Model Card for Latxa 13b**
@@ -105,6 +105,8 @@ model-index:
   <img src="https://github.com/hitz-zentroa/latxa/blob/b9aa705f60ee2cc03c9ed62fda82a685abb31b07/assets/latxa_round.png?raw=true" style="height: 350px;">
 </p>
 
+<span style="color: red; font-weight: bold">IMPORTANT:</span> This model is outdated and made publicly available for reproducibility purposes only. Please use the most recent version from [our HuggingFace collection](https://huggingface.co/collections/HiTZ/latxa-65a697e6838b3acc53677304).
+
 We introduce Latxa, a family of large language models for Basque ranging from 7 to 70 billion parameters. Latxa is based on Llama 2, which we continue pretraining on a new Basque corpus comprising 4.3M documents and 4.2B tokens. In our extensive evaluation, Latxa outperforms all previous open models we compare to by a large margin. In addition, it is competitive with GPT-4 Turbo in language proficiency and understanding, despite lagging behind in reading comprehension and knowledge-intensive tasks. Both the Latxa family of models, as well as our new pretraining corpora and evaluation datasets, are publicly available under open licenses. Our suite enables reproducible research on methods to build LLMs for low-resource languages.
 
 - 📒 Blog Post: [Latxa: An Open Language Model and Evaluation Suite for Basque](https://www.hitz.eus/en/node/340)
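
For context on what this commit edits: each of the seven `url:` fixes lands inside one entry of the model card's `model-index` metadata, and all entries follow the same Hugging Face model-card schema. The sketch below reconstructs a single complete entry around the first hunk. Only the `task.type`, the metric `value`, and the `source` block are visible in the diff context; the model name, `dataset` block, and metric `type` are collapsed there, so the values shown for them are illustrative assumptions, not content from this commit.

```yaml
# A single model-index entry, reconstructed around the first hunk above.
# Fields marked "assumption" are collapsed in the diff and are illustrative only.
model-index:
- name: Latxa-13b                # assumption: model name is not visible in the diff
  results:
  - task:
      type: multiple-choice      # shown in the diff context
    dataset:
      name: example-benchmark    # assumption: dataset block is collapsed
      type: example-benchmark    # assumption
    metrics:
    - type: accuracy             # assumption: metric type is collapsed
      value: 67.24               # shown in the diff context
    source:
      name: Paper                # shown in the diff context
      url: https://arxiv.org/abs/2403.20266  # the URL this commit fills in
```

The commit replaces the `https://paper-url.com` placeholder in every `source.url` with the arXiv link to the Latxa paper, so each reported score is traceable to its source.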