hmByT5 - Preliminary Language Models

Preliminary Historic Multilingual and Monolingual ByT5 Models. The following languages are currently covered:

  • English (British Library Corpus - Books)
  • German (Europeana Newspaper)
  • French (Europeana Newspaper)
  • Finnish (Europeana Newspaper)
  • Swedish (Europeana Newspaper)
  • Dutch (Delpher Corpus)

More details can be found in our GitHub repository.
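
The checkpoints can be loaded with the Hugging Face Transformers library. The snippet below is only a minimal sketch; the model id is a placeholder and should be replaced with the actual hmByT5 checkpoint name from this organization:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder model id - replace with the actual hmByT5 checkpoint name.
model_name = "hmbyt5-preliminary/byt5-small-historic-multilingual"

# ByT5 operates directly on UTF-8 bytes, so the tokenizer needs no vocabulary file.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Encode a short text and run it through the encoder to obtain byte-level representations.
inputs = tokenizer("Eine Maschine kann niemals ein Mensch werden.", return_tensors="pt")
encoder_outputs = model.encoder(**inputs)
print(encoder_outputs.last_hidden_state.shape)  # (batch, sequence length in bytes, hidden size)
```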

Pretraining

We pretrain hmByT5 on a v3-32 TPU Pod. Details about the training can be found here.

Evaluation on Downstream Tasks (NER)

We evaluated the hmByT5 model (pretrained for 200k steps) on the English AjMC corpus:

| Hyper-param Configuration              | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Avg.         |
|----------------------------------------|-------|-------|-------|-------|-------|--------------|
| wsFalse-bs4-e10-lr0.00016-poolingfirst | 83.80 | 84.78 | 83.74 | 83.35 | 84.37 | 84.01 ± 0.50 |
| wsFalse-bs4-e10-lr0.00015-poolingfirst | 84.67 | 82.69 | 83.92 | 84.53 | 82.90 | 83.74 ± 0.82 |
| wsFalse-bs8-e10-lr0.00016-poolingfirst | 82.12 | 83.82 | 83.37 | 83.00 | 83.70 | 83.20 ± 0.61 |
| wsFalse-bs8-e10-lr0.00015-poolingfirst | 83.45 | 82.83 | 84.15 | 81.76 | 83.78 | 83.19 ± 0.84 |
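
The configuration string encodes the fine-tuning hyper-parameters: batch size (bs), number of epochs (e), learning rate (lr) and the subtoken pooling strategy (pooling). As a rough illustration of such a setup, here is a minimal NER fine-tuning sketch assuming a Flair-based pipeline; the model id, data folder and column format are assumptions and not taken from this repository:

```python
from flair.datasets import ColumnCorpus
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Assumed CoNLL-style layout of the AjMC data: token in column 0, NER tag in column 1.
corpus = ColumnCorpus("data/ajmc-en", {0: "text", 1: "ner"})
label_dict = corpus.make_label_dictionary(label_type="ner")

# Placeholder model id - replace with the actual hmByT5 checkpoint name.
embeddings = TransformerWordEmbeddings(
    "hmbyt5-preliminary/byt5-small-english",
    subtoken_pooling="first",  # "poolingfirst" in the configuration string
    fine_tune=True,
)

tagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=label_dict,
    tag_type="ner",
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

trainer = ModelTrainer(tagger, corpus)
trainer.fine_tune(
    "resources/taggers/ajmc-en-hmbyt5",
    learning_rate=0.00016,  # "lr0.00016"
    mini_batch_size=4,      # "bs4"
    max_epochs=10,          # "e10"
)
```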

It turns out that the results are not on par with the current SOTA on the English AjMC corpus (see a comparison here). We therefore continue our experiments with the Hugging Face Transformers JAX/FLAX implementation to pretrain ByT5 models on TPU.

Acknowledgements

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC). Many thanks for providing access to the TPUs ❤️
