BERTino: an Italian DistilBERT model

This repository hosts BERTino, an Italian DistilBERT model pre-trained by indigo.ai on a large general-domain Italian corpus. BERTino is task-agnostic and can be fine-tuned for any downstream task.

Corpus

We pre-trained BERTino on the union of the Paisa and ItWaC corpora. The resulting corpus contains 14 million sentences, for a total of 12 GB of text.

Downstream Results

To validate the pre-training, we evaluated BERTino on Italian ParTUT, Italian ISDT, Italian WikiNER, and a multi-class sentence classification task. For comparison, we also report the results of the teacher model, fine-tuned on the same tasks for the same number of epochs.

Italian ISDT:

Model	F1 score	Fine-tuning time	Evaluation time
BERTino	0.9801	9m 4s	3s
Teacher	0.983	16m 28s	5s

Italian ParTUT:

Model	F1 score	Fine-tuning time	Evaluation time
BERTino	0.9268	1m 18s	1s
Teacher	0.9688	2m 18s	1s

Italian WikiNER:

Model	F1 score	Fine-tuning time	Evaluation time
BERTino	0.9038	35m 35s	3m 1s
Teacher	0.9178	67m 8s	5m 16s

Multi-class sentence classification:

Model	F1 score	Fine-tuning time	Evaluation time
BERTino	0.7788	4m 40s	6s
Teacher	0.7986	8m 52s	9s
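The tables report raw numbers per task. To compare the two models directly, one can derive two ratios from the same figures: the share of the teacher's F1 score that BERTino retains (BERTino F1 / teacher F1) and the fine-tuning speedup (teacher time / BERTino time). The following is a minimal sketch of that arithmetic, not part of the original evaluation; the names `RESULTS`, `to_seconds`, and `summarize` are hypothetical and only used for illustration.

```python
import re

# Hypothetical helper data: (F1 score, fine-tuning time) for each model,
# transcribed from the downstream results tables.
RESULTS = {
    "Italian ISDT": {"BERTino": (0.9801, "9m 4s"), "Teacher": (0.983, "16m 28s")},
    "Italian ParTUT": {"BERTino": (0.9268, "1m 18s"), "Teacher": (0.9688, "2m 18s")},
    "Italian WikiNER": {"BERTino": (0.9038, "35m 35s"), "Teacher": (0.9178, "67m 8s")},
    "Multi-class sentence classification": {
        "BERTino": (0.7788, "4m 40s"),
        "Teacher": (0.7986, "8m 52s"),
    },
}


def to_seconds(duration: str) -> int:
    """Convert durations such as '16m 28s', '9m, 4s' or '3s' to seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*([ms])", duration):
        total += int(value) * (60 if unit == "m" else 1)
    return total


def summarize(results):
    """Return {task: (F1 retention in %, fine-tuning speedup)} for BERTino vs. the teacher."""
    summary = {}
    for task, models in results.items():
        student_f1, student_time = models["BERTino"]
        teacher_f1, teacher_time = models["Teacher"]
        # Share of the teacher's F1 that the distilled model keeps, in percent.
        retention = round(100 * student_f1 / teacher_f1, 1)
        # How many times faster BERTino fine-tunes than the teacher.
        speedup = round(to_seconds(teacher_time) / to_seconds(student_time), 2)
        summary[task] = (retention, speedup)
    return summary


if __name__ == "__main__":
    for task, (retention, speedup) in summarize(RESULTS).items():
        print(f"{task}: {retention}% of teacher F1, {speedup}x faster fine-tuning")
```

On the figures reported here, this gives fine-tuning speedups between about 1.77x (ParTUT) and 1.9x (multi-class classification), while BERTino retains between 95.7% (ParTUT) and 99.7% (ISDT) of the teacher's F1 score.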