metadata

tags:
  - translation
license: apache-2.0
metrics:
  - bleu
  - sacrebleu

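1. Project Introduction

This project is based on mRASP2, an excellent machine translation project on GitHub. It converts the officially released fairseq pretrained weights to the transformers architecture so that the model is easier to use.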

2. Usage

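from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hub repository ID; it is also reused below as the local cache directory.
model_path = 'ENLP/mrasp2'
# trust_remote_code=True is required because the repository ships custom code (e.g. tokenization_bat.py).
model = AutoModelForSeq2SeqLM.from_pretrained(model_path, trust_remote_code=True, cache_dir=model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, cache_dir=model_path)

# Tokenize a batch of sentences, padding to the longest one and truncating at 1024 tokens.
input_text = ["Welcome to download and use!"]
inputs = tokenizer(input_text, return_tensors="pt", padding=True, max_length=1024, truncation=True)

# Generate the translations, then decode and strip surrounding whitespace.
result = model.generate(**inputs)
result = tokenizer.batch_decode(result, skip_special_tokens=True)
result = [pre.strip() for pre in result]
# ['欢迎下载和使用!']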
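
For longer lists of sentences, it may be convenient to split the input into smaller batches before calling `generate`. The following is a minimal sketch, not part of this repository: the helper names `chunked`, `translate_in_batches`, and `make_translate_batch`, and the `batch_size` parameter, are hypothetical and chosen only for illustration. It assumes `model` and `tokenizer` were loaded as in the snippet above and reuses the same tokenizer and decoding calls.

```python
from typing import Callable, Iterator, List


def chunked(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_in_batches(
    texts: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Apply `translate_batch` to each chunk and concatenate the results in order."""
    outputs: List[str] = []
    for batch in chunked(texts, batch_size):
        outputs.extend(translate_batch(batch))
    return outputs


def make_translate_batch(model, tokenizer, max_length: int = 1024):
    """Wrap a loaded model and tokenizer into a function that translates one batch.

    Hypothetical helper: it repeats the tokenize / generate / decode steps
    from the usage snippet, nothing more.
    """
    def translate_batch(batch: List[str]) -> List[str]:
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            padding=True,
            max_length=max_length,
            truncation=True,
        )
        generated = model.generate(**inputs)
        decoded = tokenizer.batch_decode(generated, skip_special_tokens=True)
        return [text.strip() for text in decoded]

    return translate_batch
```

With `model` and `tokenizer` already loaded, a call could look like `translate_in_batches(sentences, make_translate_batch(model, tokenizer), batch_size=8)`. A suitable batch size depends on available memory and sentence length, so the default here is only a placeholder.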

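3. Notes

The model supports 32 languages; see mRASP2 for more details. The tokenizer in this repository is optimized only for Chinese-English bilingual use. If you need other languages, refer to tokenization_bat.py and modify it yourself. Note that this is the official 6e6d-no-mono model. The two 12e12d models cannot be made to work for now, and the cause has not been found. If you know the reason, please share it.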

4. Other Models

ENLP/mrasp