
Set up transformers following the instructions in README.md (I would fork first).

git clone git@github.com:huggingface/transformers.git
cd transformers
pip install -e .
pip install pandas GitPython wget

Get required metadata

curl https://cdn-datasets.huggingface.co/language_codes/language-codes-3b2.csv > language-codes-3b2.csv
curl https://cdn-datasets.huggingface.co/language_codes/iso-639-3.csv > iso-639-3.csv

Install Tatoeba-Challenge repo inside transformers

git clone git@github.com:Helsinki-NLP/Tatoeba-Challenge.git
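
Before converting, it can help to confirm that the setup steps above left everything in place. The following is a minimal sketch, assuming the two CSV files and the Tatoeba-Challenge clone all sit in the root of the transformers checkout, as the commands above would leave them. `missing_prerequisites` and `REQUIRED_PATHS` are hypothetical names used for illustration; they are not part of transformers.

```python
from pathlib import Path

# Paths produced by the setup steps above, relative to the transformers checkout.
# Assumption: the curl and git clone commands were run from that directory.
REQUIRED_PATHS = [
    "language-codes-3b2.csv",
    "iso-639-3.csv",
    "Tatoeba-Challenge",
]


def missing_prerequisites(root="."):
    """Return the required paths that do not exist under `root`, in list order."""
    root = Path(root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

An empty list means every path was found. The check only looks at existence, not at file contents.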

To convert a few models, call the conversion script from the command line:

python src/transformers/models/marian/convert_marian_tatoeba_to_pytorch.py --models heb-eng eng-heb --save_dir converted

To convert lots of models, pass your list of Tatoeba model names to resolver.convert_models from a Python session or script:

from transformers.models.marian.convert_marian_tatoeba_to_pytorch import TatoebaConverter
resolver = TatoebaConverter(save_dir='converted')
resolver.convert_models(['heb-eng', 'eng-heb'])
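
For a long list of models, one failing conversion can interrupt the whole run. The sketch below shows one possible way to work in batches and collect failures instead of stopping. `convert_in_batches` is a hypothetical helper, not part of transformers, and it takes the conversion callable as a parameter, so it assumes nothing about `convert_models` beyond accepting a list of model names, as in the snippet above.

```python
def convert_in_batches(model_names, convert, batch_size=10):
    """Call `convert` on successive slices of `model_names`.

    Returns a list of (batch, error_message) pairs for batches that raised,
    so the remaining batches are still attempted.
    """
    failed = []
    for start in range(0, len(model_names), batch_size):
        batch = list(model_names[start:start + batch_size])
        try:
            convert(batch)
        except Exception as exc:  # record the failure and keep going
            failed.append((batch, repr(exc)))
    return failed


# Hypothetical usage with the resolver created above:
#   failed = convert_in_batches(['heb-eng', 'eng-heb'], resolver.convert_models)
#   for batch, error in failed:
#       print(batch, error)
```

Smaller batches narrow down which model caused a failure, at the cost of more calls.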

Upload converted models

Since v3.5.0, model sharing uses a git-based workflow. Refer to the model sharing doc for more details.

To upload all converted models:

Install git-lfs.
Log in with huggingface-cli:

huggingface-cli login

Run the upload_models script

./scripts/tatoeba/upload_models.sh
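
Before uploading, you may want to review what the conversion step produced. This is a minimal sketch, assuming each converted model ends up in its own subdirectory of the save_dir used earlier (`converted`). That layout is an assumption here, not something this README states, and `list_converted_models` is a hypothetical helper.

```python
from pathlib import Path


def list_converted_models(save_dir="converted"):
    """Return the sorted names of subdirectories in `save_dir`.

    Assumption: one subdirectory per converted model.
    Returns an empty list if `save_dir` does not exist.
    """
    save_dir = Path(save_dir)
    if not save_dir.is_dir():
        return []
    return sorted(p.name for p in save_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    for name in list_converted_models():
        print(name)
```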

Modifications

To change the naming logic, change the code near os.rename. The model card creation code may also need to change.
To change model card content, modify TatoebaConverter.write_model_card.