---
license: gpl-3.0
datasets:
- Mxode/BiST
language:
- en
- zh
pipeline_tag: translation
library_name: transformers
---
|
# **NanoTranslator-L** |
|
|
|
English | [简体中文](README_zh-CN.md) |
|
|
|
## Introduction |
|
|
|
This is the **large** model of NanoTranslator. It currently supports **English to Chinese** translation only.
|
|
|
The ONNX version of the model is also available in the repository. |
|
|
|
All models are collected in the [NanoTranslator Collection](https://huggingface.co/collections/Mxode/nanotranslator-66e1de2ba352e926ae865bd2). |
|
|
|
| Model | P. | Arch. | Act. | V. | H. | I. | L. | A.H. | K.H. | Tie |
| :--: | :-----: | :--: | :--: | :--: | :-----: | :---: | :------: | :--: | :--: | :--: |
| [XXL2](https://huggingface.co/Mxode/NanoTranslator-XXL2) | 102 | LLaMA | SwiGLU | 16K | 1120 | 3072 | 6 | 16 | 8 | True |
| [XXL](https://huggingface.co/Mxode/NanoTranslator-XXL) | 100 | LLaMA | SwiGLU | 16K | 768 | 4096 | 8 | 24 | 8 | True |
| [XL](https://huggingface.co/Mxode/NanoTranslator-XL) | 78 | LLaMA | GeGLU | 16K | 768 | 4096 | 6 | 24 | 8 | True |
| [L](https://huggingface.co/Mxode/NanoTranslator-L) | 49 | LLaMA | GeGLU | 16K | 512 | 2816 | 8 | 16 | 8 | True |
| [M2](https://huggingface.co/Mxode/NanoTranslator-M2) | 22 | Qwen2 | GeGLU | 4K | 432 | 2304 | 6 | 24 | 8 | True |
| [M](https://huggingface.co/Mxode/NanoTranslator-M) | 22 | LLaMA | SwiGLU | 8K | 256 | 1408 | 16 | 16 | 4 | True |
| [S](https://huggingface.co/Mxode/NanoTranslator-S) | 9 | LLaMA | SwiGLU | 4K | 168 | 896 | 16 | 12 | 4 | True |
| [XS](https://huggingface.co/Mxode/NanoTranslator-XS) | 2 | LLaMA | SwiGLU | 2K | 96 | 512 | 12 | 12 | 4 | True |
|
|
|
- **P.** - parameters (in millions)
- **Arch.** - architecture
- **Act.** - activation function of the MLP
- **V.** - vocabulary size
- **H.** - hidden size
- **I.** - intermediate size
- **L.** - number of layers
- **A.H.** - number of attention heads
- **K.H.** - number of key-value heads
- **Tie** - tie word embeddings
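
These values for the **L** model can be checked against the model's own configuration. Below is a minimal sketch, assuming the standard LLaMA configuration fields exposed by `transformers`:

```python
from transformers import AutoConfig

# Inspect the hyper-parameters listed in the table above for the L model.
config = AutoConfig.from_pretrained("Mxode/NanoTranslator-L")
print("vocab size:", config.vocab_size)
print("hidden size:", config.hidden_size)
print("intermediate size:", config.intermediate_size)
print("layers:", config.num_hidden_layers)
print("attention heads:", config.num_attention_heads)
print("kv heads:", config.num_key_value_heads)
print("tied embeddings:", config.tie_word_embeddings)
```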
|
|
|
|
|
|
|
## How to use |
|
|
|
The prompt format is as follows:
|
|
|
```
<|im_start|> {English Text} <|endoftext|>
```
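
For example, filling in the sentence used in the code snippets below gives a prompt like:

```
<|im_start|> Each step of the cell cycle is monitored by internal. <|endoftext|>
```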
|
|
|
### Directly using transformers |
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = 'Mxode/NanoTranslator-L'

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)


def translate(text: str, model, **kwargs):
    # Default generation settings; each can be overridden via kwargs.
    generation_args = dict(
        max_new_tokens = kwargs.pop("max_new_tokens", 512),
        do_sample = kwargs.pop("do_sample", True),
        temperature = kwargs.pop("temperature", 0.55),
        top_p = kwargs.pop("top_p", 0.8),
        top_k = kwargs.pop("top_k", 40),
        **kwargs
    )

    # Wrap the English text in the expected prompt format.
    prompt = "<|im_start|>" + text + "<|endoftext|>"
    model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

    generated_ids = model.generate(model_inputs.input_ids, **generation_args)
    # Strip the prompt tokens, keeping only the newly generated translation.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response


text = "Each step of the cell cycle is monitored by internal."

response = translate(text, model, max_new_tokens=64, do_sample=False)
print(response)
```
|
|
|
|
|
### ONNX |
|
|
|
Measurements show that inference with the ONNX model is **2-10 times faster** than inference directly with the transformers model.
|
|
|
You need to manually switch to the [onnx branch](https://huggingface.co/Mxode/NanoTranslator-L/tree/onnx) and download the files to a local folder.
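
If you prefer to do this programmatically, here is a minimal sketch using `huggingface_hub` (the local folder name `onnx_model` is just a placeholder):

```python
from huggingface_hub import snapshot_download

# Download the onnx branch of the repository to a local folder.
snapshot_download(
    repo_id="Mxode/NanoTranslator-L",
    revision="onnx",
    local_dir="onnx_model",
)
```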
|
|
|
Reference docs:
|
|
|
- [Export to ONNX](https://huggingface.co/docs/transformers/serialization) |
|
- [Inference pipelines with the ONNX Runtime accelerator](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/pipelines) |
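
Alternatively, if you would rather export the ONNX model yourself instead of downloading the onnx branch, Optimum can convert the original checkpoint at load time. A minimal sketch (the output folder name is a placeholder, and `optimum[onnxruntime]` is assumed to be installed):

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Convert the transformers checkpoint to ONNX on the fly.
ort_model = ORTModelForCausalLM.from_pretrained("Mxode/NanoTranslator-L", export=True)
tokenizer = AutoTokenizer.from_pretrained("Mxode/NanoTranslator-L")

# Save the exported model so it can be reloaded later without re-exporting.
ort_model.save_pretrained("onnx_model")
tokenizer.save_pretrained("onnx_model")
```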
|
|
|
**Using ORTModelForCausalLM** |
|
|
|
```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_path = "your/folder/to/onnx_model"

ort_model = ORTModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

text = "Each step of the cell cycle is monitored by internal."

# Reuses the translate() helper defined in the transformers example above.
response = translate(text, ort_model, max_new_tokens=64, do_sample=False)
print(response)
```
|
|
|
**Using pipeline** |
|
|
|
```python
from optimum.pipelines import pipeline

model_path = "your/folder/to/onnx_model"
pipe = pipeline("text-generation", model=model_path, accelerator="ort")

text = "Each step of the cell cycle is monitored by internal."

# The pipeline returns a list of dicts with a "generated_text" field.
response = pipe(text, max_new_tokens=64, do_sample=False)
print(response)
```
|
|