language: es
license: cc-by-4.0
tags:
  - multilingual
  - bert
pipeline_tag: fill-mask
widget:
  - text: ¿Qué es la vida? Un [MASK].

ALBERTI

ALBERTI is a set of two BERT-based multilingual models for poetry: one for verses and another for stanzas. This model was further trained on the PULPO corpus for verses using Flax; the training scripts are included.

This work is part of the Flax/JAX Community Week, organised by HuggingFace, with TPU usage sponsored by Google.
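Since the model card declares the `fill-mask` pipeline, a minimal usage sketch may help. It masks the last word of a verse and asks the model for candidate completions through the `transformers` `pipeline` API. The model identifier below is an assumption (it is not stated in this card) and should be replaced with the actual repository name; the helper names `mask_last_word` and `suggest_fillers` are hypothetical and only serve the illustration.

```python
MASK = "[MASK]"  # BERT-style mask token, as in the widget example of this card

# ASSUMPTION: repository id is not given in the card; replace with the real one.
MODEL_ID = "flax-community/alberti-bert-base-multilingual-cased"


def mask_last_word(verse: str, mask_token: str = MASK) -> str:
    """Replace the last word of a verse with the mask token,
    keeping any trailing punctuation in place."""
    stripped = verse.rstrip()
    # Walk back over trailing non-alphanumeric characters (e.g. ".", "!").
    end = len(stripped)
    while end > 0 and not stripped[end - 1].isalnum():
        end -= 1
    trailing = stripped[end:]
    # Split off the last whitespace-separated word.
    head, _, _last = stripped[:end].rpartition(" ")
    if head:
        return f"{head} {mask_token}{trailing}"
    return f"{mask_token}{trailing}"


def suggest_fillers(masked_verse: str, model_id: str = MODEL_ID, top_k: int = 5):
    """Return (token, score) pairs for the masked position.
    Requires `transformers` and network access to download the model."""
    from transformers import pipeline  # imported lazily on purpose

    fill = pipeline("fill-mask", model=model_id)
    return [(p["token_str"], p["score"]) for p in fill(masked_verse, top_k=top_k)]


if __name__ == "__main__":
    masked = mask_last_word("¿Qué es la vida? Un frenesí.")
    print(masked)
    for token, score in suggest_fillers(masked):
        print(f"{token}\t{score:.3f}")
```

The masking helper runs without any dependency; only `suggest_fillers` downloads the model, so the two steps can be tried separately.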

PULPO

PULPO, the Prodigious Unannotated Literary Poetry Corpus, is a set of multilingual corpora of verses and stanzas with over 95M words.

The following corpora have been downloaded using the Averell tool, developed by the POSTDATA team:

Spanish

English

French

Italian

Czech

Portuguese

Also, we obtained the following corpora from these sources:

Spanish

English

Arabic

Chinese

Finnish

German

Hungarian

Portuguese

Russian

Team members

Useful links

Acknowledgments

This project would not have been possible without the infrastructure and resources provided by HuggingFace and Google Cloud. Moreover, we want to thank the POSTDATA Project (ERC-StG-679528) and the Computational Literary Studies Infrastructure (CLS INFRA No. 101004984) of the European Union's Horizon 2020 research and innovation programme for their support and time allowance.