--- tags: - translation - torch==1.8.0 widget: - text: Inference Unavailable language: - th - zh --- ### marianmt-th-zh_cn * source languages: th * target languages: zh_cn * dataset: * model: transformer-align * pre-processing: normalization + SentencePiece * test set scores: 15.53 ## Training Training scripts from [LalitaDeelert/NLP-ZH_TH-Project](https://github.com/LalitaDeelert/NLP-ZH_TH-Project). Experiments tracked at [cstorm125/marianmt-th-zh_cn](https://wandb.ai/cstorm125/marianmt-th-zh_cn). ``` export WANDB_PROJECT=marianmt-th-zh_cn python train_model.py --input_fname ../data/v1/Train.csv \\\\\\\\ \\\\t--output_dir ../models/marianmt-th-zh_cn \\\\\\\\ \\\\t--source_lang th --target_lang zh \\\\\\\\ \\\\t--metric_tokenize zh --fp16 ``` ## Usage ``` from transformers import AutoTokenizer, AutoModelForSeq2SeqLM tokenizer = AutoTokenizer.from_pretrained("Lalita/marianmt-zh_cn-th") model = AutoModelForSeq2SeqLM.from_pretrained("Lalita/marianmt-zh_cn-th").cpu() src_text = [ 'ฉันรักคุณ', 'ฉันอยากกินข้าว', ] translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True)) print([tokenizer.decode(t, skip_special_tokens=True) for t in translated]) > ['我爱你', '我想吃饭。'] ``` ## Requirements ``` transformers==4.6.0 torch==1.8.0 ```