|
--- |
|
language: |
|
- it |
|
license: apache-2.0 |
|
tags: |
|
- italian |
|
- sequence-to-sequence |
|
- style-transfer |
|
- efficient |
|
- formality-style-transfer |
|
datasets: |
|
- yahoo/xformal_it |
|
widget: |
|
- text: "Questa performance è a dir poco spiacevole." |
|
- text: "In attesa di un Suo cortese riscontro, Le auguriamo un piacevole proseguimento di giornata." |
|
- text: "Questa visione mi procura una goduria indescrivibile." |
|
- text: "qualora ciò possa interessarti, ti pregherei di contattarmi." |
|
metrics: |
|
- rouge |
|
- bertscore |
|
model-index:
- name: it5-efficient-small-el32-formal-to-informal
  results:
  - task:
      type: formality-style-transfer
      name: "Formal-to-informal Style Transfer"
    dataset:
      type: xformal_it
      name: "XFORMAL (Italian Subset)"
    metrics:
    - type: rouge1
      value: 0.459
      name: "Avg. Test Rouge1"
    - type: rouge2
      value: 0.244
      name: "Avg. Test Rouge2"
    - type: rougeL
      value: 0.435
      name: "Avg. Test RougeL"
    - type: bertscore
      value: 0.739
      name: "Avg. Test BERTScore"
|
--- |
|
|
|
# IT5 Cased Small Efficient EL32 for Formal-to-informal Style Transfer 🤗 |
|
|
|
*Shout-out to [Stefan Schweter](https://github.com/stefan-it) for contributing the pre-trained efficient model!* |
|
|
|
This repository contains the checkpoint for the [IT5 Cased Small Efficient EL32](https://huggingface.co/it5/it5-efficient-small-el32) |
|
model fine-tuned for formal-to-informal style transfer on the Italian subset of the XFORMAL dataset, as part of the experiments of the paper [IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation](https://arxiv.org/abs/2203.03759) by [Gabriele Sarti](https://gsarti.com) and [Malvina Nissim](https://malvinanissim.github.io).
|
|
|
Efficient IT5 models differ from the standard ones by adopting a different vocabulary that enables cased text generation and an [optimized model architecture](https://arxiv.org/abs/2109.10686) to improve performance while reducing parameter count. The Small-EL32 variant replaces the original encoder of the T5 Small architecture with a 32-layer deep encoder, showing improved performance over the base model.
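As a quick sanity check, the deeper encoder of the EL32 variant can be inspected from the checkpoint configuration. This is a minimal sketch assuming the standard `T5Config` attribute names exposed by Transformers:

```python
from transformers import AutoConfig

# Load the configuration of the fine-tuned checkpoint
config = AutoConfig.from_pretrained("it5/it5-efficient-small-el32-formal-to-informal")

# T5Config reports encoder and decoder depth separately: the EL32 variant
# uses a 32-layer encoder while keeping the decoder depth of T5 Small.
print("encoder layers:", config.num_layers)
print("decoder layers:", config.num_decoder_layers)
```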
|
|
|
A comprehensive overview of other released materials is provided in the [gsarti/it5](https://github.com/gsarti/it5) repository. Refer to the paper for additional details concerning the reported scores and the evaluation approach. |
|
|
|
## Using the model |
|
|
|
Model checkpoints are available for use in TensorFlow, PyTorch and JAX. They can be used directly with the `pipeline` API:
|
|
|
```python |
|
from transformers import pipeline
|
|
|
f2i = pipeline("text2text-generation", model='it5/it5-efficient-small-el32-formal-to-informal') |
|
f2i("Vi ringrazio infinitamente per vostra disponibilità") |
|
>>> [{"generated_text": "e grazie per la vostra disponibilità!"}] |
|
``` |
|
|
|
or loaded using autoclasses: |
|
|
|
```python |
|
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("it5/it5-efficient-small-el32-formal-to-informal")
model = AutoModelForSeq2SeqLM.from_pretrained("it5/it5-efficient-small-el32-formal-to-informal")
|
``` |
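Once loaded, generation follows the usual seq2seq pattern. The snippet below is a minimal sketch; the generation settings (e.g. `max_length`) are illustrative and not the exact parameters used in the paper:

```python
# Tokenize a formal Italian sentence and generate its informal rewrite
inputs = tokenizer(
    "Vi ringrazio infinitamente per vostra disponibilità", return_tensors="pt"
)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```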
|
|
|
If you use this model in your research, please cite our work as: |
|
|
|
```bibtex |
|
@article{sarti-nissim-2022-it5, |
|
title={{IT5}: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation}, |
|
author={Sarti, Gabriele and Nissim, Malvina}, |
|
journal={ArXiv preprint 2203.03759}, |
|
url={https://arxiv.org/abs/2203.03759}, |
|
year={2022}, |
|
month={mar} |
|
} |
|
``` |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a sketch of the equivalent training arguments follows the list):
|
- learning_rate: 0.0003 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 10.0 |
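
For reference, these settings map roughly onto Hugging Face `Seq2SeqTrainingArguments` as sketched below. This is a hypothetical reconstruction: the output directory and any arguments not listed above are placeholders, not the exact configuration used for the released checkpoint.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical mapping of the hyperparameters above; "./output" is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="./output",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10.0,
)
```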
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.15.0 |
|
- PyTorch 1.10.0+cu102
|
- Datasets 1.17.0 |
|
- Tokenizers 0.10.3 |