mT5 small spanish es
This is a Spanish fine-tuned version of Google's mT5-small model.
https://huggingface.co/google/mt5-small
Datasets
The model was fine-tuned on the following datasets. Each task is selected by the prefix placed at the start of the input text:
| Task | Prefix |
|---|---|
| MultiNLI (English) | `multi nli premise:[Text] hypo:[Text]` |
| MultiNLI (Spanish) | `multi nli premise:[Text] hypo:[Text]` |
| PAWS-X (English) | `pawx sentence1:[Text] sentence2:[Text]` |
| PAWS-X (Spanish) | `pawx sentence1:[Text] sentence2:[Text]` |
| SQuAD (English) | `question:[Text] context:[Text]` |
| SQuAD (Spanish) | `question:[Text] context:[Text]` |
| Translations (English-Spanish) | `translate English to Spanish:[Text]` |
| Translations (Spanish-English) | `translate Spanish to English:[Text]` |
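Building the input string by hand is error-prone, so a small helper can keep the prefixes in one place. The following is a minimal sketch: the prefix strings are copied from the list above, while `TASK_TEMPLATES`, `build_prompt`, and the short task keys are illustrative names that are not part of the model or of the Transformers library.

```python
# Prefix templates copied from the task list above.
# The dictionary keys and the helper name are illustrative assumptions.
TASK_TEMPLATES = {
    "multinli": "multi nli premise:{premise} hypo:{hypothesis}",
    "pawx": "pawx sentence1:{sentence1} sentence2:{sentence2}",
    "squad": "question:{question} context:{context}",
    "en-es": "translate English to Spanish:{text}",
    "es-en": "translate Spanish to English:{text}",
}


def build_prompt(task: str, **fields: str) -> str:
    """Fill the template registered for `task` with the given text fields."""
    if task not in TASK_TEMPLATES:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(TASK_TEMPLATES)}"
        )
    return TASK_TEMPLATES[task].format(**fields)


print(build_prompt("es-en", text="Esta frase es para probar el modelo"))
# translate Spanish to English:Esta frase es para probar el modelo
```

The returned string can be passed to the tokenizer in place of a hand-written `task` value.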
Inference
The following code snippets can be used to perform the different model tasks.
Translations
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "HURIDOCS/mt5-small-spanish-es"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The prefix before the colon selects the task.
task = "translate Spanish to English:Esta frase es para probar el modelo"

# Tokenize the prompt, padding or truncating it to 512 tokens.
input_ids = tokenizer(
    [task],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512
)["input_ids"]

# Generate with beam search and keep the best sequence.
output_ids = model.generate(
    input_ids=input_ids,
    max_length=84,
    no_repeat_ngram_size=2,
    num_beams=4
)[0]

# Convert the generated token ids back to text.
result_text = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(result_text)
Question answering
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "HURIDOCS/mt5-small-spanish-es"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# The prompt follows the pattern "question:[Text] context:[Text]".
task = '''question:En qué país se encuentra Normandía? context:Los normandos (normandos: Nourmann; Francés: Normandos; Normanni)
fue el pueblo que en los siglos X y XI dio su nombre a Normandía, una región de Francia.
Eran descendientes de invasores nórdicos ("normandos" viene de "Norseman") y piratas de Dinamarca, Islandia y Noruega que,
bajo su líder Rollo, acordaron jurar lealtad al rey Carlos III de Francia Occidental. A través de generaciones de asimilación
y mezcla con las poblaciones nativas francas y galas romanas, sus descendientes se fusionarían gradualmente con las culturas
carolingias de Francia Occidental. La identidad cultural y étnica distintiva de los normandos surgió inicialmente en la
primera mitad del siglo X, y continuó evolucionando durante los siglos siguientes.'''

# Tokenize the prompt, padding or truncating it to 512 tokens.
input_ids = tokenizer(
    [task],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512
)["input_ids"]

# Generate with beam search and keep the best sequence.
output_ids = model.generate(
    input_ids=input_ids,
    max_length=84,
    no_repeat_ngram_size=2,
    num_beams=4
)[0]

# Convert the generated token ids back to text.
result_text = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(result_text)
Fine-tuning
Check out the Transformers library examples:
https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering
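For a text-to-text model such as mT5, each training record has to be turned into an input string and a target string before fine-tuning. The sketch below shows one way to do that for a SQuAD-style record, reusing the `question:... context:...` prefix from the Datasets section. The record layout (`answers` holding a `text` list) follows the common SQuAD layout, and both the choice of the first gold answer as the target and the empty-string target for unanswerable questions are assumptions; this card does not state how the training targets were actually built. The function name is hypothetical.

```python
def squad_to_text_pair(example: dict) -> tuple[str, str]:
    """Turn one SQuAD-style record into a (model input, target text) pair."""
    source = f"question:{example['question']} context:{example['context']}"
    answers = example["answers"]["text"]
    # Assumption: use the first gold answer as the target, and an empty
    # string when the question has no answer (as in SQuAD v2).
    target = answers[0] if answers else ""
    return source, target


record = {
    "question": "En qué país se encuentra Normandía?",
    "context": "Normandía es una región de Francia.",
    "answers": {"text": ["Francia"], "answer_start": [27]},
}
source, target = squad_to_text_pair(record)
print(source)
print(target)
```

The resulting pairs can then be tokenized and passed to whichever training script is chosen from the linked examples.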
Performance
Spanish SQuAD v2, 512 tokens

| Rank | Model | Exact match | F1 |
|---|---|---|---|
| 1 | mrm8488/distill-bert-base-spanish-wwm-cased | 50.43% | 71.45% |
| 2 | **mT5 small spanish es** | 48.35% | 62.03% |
| 3 | flan-t5-small | 41.44% | 56.48% |
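To make the two metric columns concrete, here is a simplified sketch of SQuAD-style exact match and token-level F1 for a single prediction. It is not the script used to produce the numbers above, which this card does not specify. The normalization (lowercasing, stripping punctuation, collapsing whitespace) is an assumption, and unlike the official English SQuAD script it does not remove articles.

```python
import string
from collections import Counter

# Assumption: strip ASCII punctuation plus the Spanish opening marks.
PUNCTUATION = set(string.punctuation + "¿¡")


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCTUATION)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # An empty answer only matches another empty answer.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Francia.", "francia"))  # 1.0
print(round(f1_score("en Francia", "Francia"), 4))  # 0.6667
```

A dataset-level score is the mean of these per-example values; with several gold answers per question, the usual convention is to take the maximum over the references.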