LLaMAX2-7B-X-CSQA

	---
	tags:
	- Multilingual
	license: mit
	language:
	- af
	- am
	- ar
	- hy
	- as
	- ast
	- az
	- be
	- bn
	- bs
	- bg
	- my
	- ca
	- ceb
	- zho
	- hr
	- cs
	- da
	- nl
	- en
	- et
	- tl
	- fi
	- fr
	- ff
	- gl
	- lg
	- ka
	- de
	- el
	- gu
	- ha
	- he
	- hi
	- hu
	- is
	- ig
	- id
	- ga
	- it
	- ja
	- jv
	- kea
	- kam
	- kn
	- kk
	- km
	- ko
	- ky
	- lo
	- lv
	- ln
	- lt
	- luo
	- lb
	- mk
	- ms
	- ml
	- mt
	- mi
	- mr
	- mn
	- ne
	- ns
	- no
	- ny
	- oc
	- or
	- om
	- ps
	- fa
	- pl
	- pt
	- pa
	- ro
	- ru
	- sr
	- sn
	- sd
	- sk
	- sl
	- so
	- ku
	- es
	- sw
	- sv
	- tg
	- ta
	- te
	- th
	- tr
	- uk
	- umb
	- ur
	- uz
	- vi
	- cy
	- wo
	- xh
	- yo
	- zu
	---

	### Model Sources
	- Paper: LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
	- Link: https://arxiv.org/pdf/2407.05975
	- Repository: https://github.com/CONE-MT/LLaMAX/

	### Model Description

	🔥 LLaMAX-7B-X-CSQA is a multilingual commonsense reasoning model. It was obtained by fully fine-tuning the multilingual model [LLaMAX-7B](https://huggingface.co/LLaMAX/LLaMAX-7B) on five English commonsense reasoning datasets: X-CSQA, ARC-Easy, ARC-Challenge, OpenBookQA, and QASC.

	🔥 Compared with Llama-2 fine-tuned under the same setting, LLaMAX-7B-X-CSQA improves average accuracy on the X-CSQA dataset by 4.2 points (50.9 → 55.1).


	### Experiments


	| X-CSQA | Avg. | Sw | Ur | Hi | Ar | Vi | Ja | Pl | Zh | Nl | Ru | It | De | Pt | Fr | Es | En |
	|------------------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|
	| Llama2-7B-X-CSQA | 50.9 | 23.2 | 24.7 | 32.9 | 32.4 | 51.0 | 50.0 | 51.5 | 55.6 | 56.9 | 55.8 | 58.8 | 59.9 | 60.4 | 61.8 | 61.9 | 78.1 |
	| LLaMAX-7B-X-CSQA | 55.1 | 43.5 | 39.0 | 44.1 | 45.1 | 54.0 | 49.9 | 54.6 | 58.2 | 58.9 | 57.1 | 59.1 | 59.0 | 60.9 | 61.6 | 62.7 | 74.0 |
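
The averages and per-language differences can be re-derived from the two rows of the table above. The following is a small sketch for readers who want to check the numbers; it is not part of the evaluation code, and the variable names are illustrative. The scores are copied from the table, so the result is only as precise as the one-decimal values reported there.

```python
# Accuracy (%) per language, copied from the X-CSQA table in this model card.
languages = ["Sw", "Ur", "Hi", "Ar", "Vi", "Ja", "Pl", "Zh",
             "Nl", "Ru", "It", "De", "Pt", "Fr", "Es", "En"]
llama2 = [23.2, 24.7, 32.9, 32.4, 51.0, 50.0, 51.5, 55.6,
          56.9, 55.8, 58.8, 59.9, 60.4, 61.8, 61.9, 78.1]
llamax = [43.5, 39.0, 44.1, 45.1, 54.0, 49.9, 54.6, 58.2,
          58.9, 57.1, 59.1, 59.0, 60.9, 61.6, 62.7, 74.0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Unweighted average over the 16 languages, rounded as in the table.
avg_llama2 = round(mean(llama2), 1)
avg_llamax = round(mean(llamax), 1)
gain = round(avg_llamax - avg_llama2, 1)

# Per-language difference (LLaMAX minus Llama2), in accuracy points.
deltas = {lang: round(b - a, 1) for lang, a, b in zip(languages, llama2, llamax)}
largest = max(deltas, key=deltas.get)
regressions = [lang for lang, d in deltas.items() if d < 0]

print(f"Average: {avg_llama2} -> {avg_llamax} (+{gain} points)")
print(f"Largest gain: {largest} (+{deltas[largest]})")
print(f"Languages where Llama2 scores higher: {regressions}")
```

The largest gains fall on the languages where the Llama2 baseline is weakest (Sw, Ur, Hi, Ar), while a few high-resource languages, English in particular, score slightly lower.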

	### Model Usage

	Code Example:
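	```python
	from transformers import AutoTokenizer, LlamaForCausalLM

	# Placeholders: set both to a local directory or Hub ID holding the model and tokenizer.
	# "LLaMAX/LLaMAX2-7B-X-CSQA" is assumed from this repository's name.
	PATH_TO_CONVERTED_WEIGHTS = "LLaMAX/LLaMAX2-7B-X-CSQA"
	PATH_TO_CONVERTED_TOKENIZER = "LLaMAX/LLaMAX2-7B-X-CSQA"

	model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
	tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

	# Question, answer options, and the "Answer:" cue in a single prompt string.
	query = (
	    "What is someone operating a vehicle likely to be accused of after becoming inebriated? \n "
	    "Options: A.punish \t B. arrest \t C. automobile accidents \t D. talking nonsense \t E.drunk driving \n "
	    "Answer:"
	)
	inputs = tokenizer(query, return_tensors="pt")

	# max_new_tokens limits only the generated answer, independent of the prompt length
	# (5 is an illustrative value).
	generate_ids = model.generate(inputs.input_ids, max_new_tokens=5)
	output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
	print(output)  # the prompt followed by the generated answer, which should be E
	```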
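
When evaluating many questions, it can help to build the prompt and read back the answer letter programmatically. The helpers below are a minimal sketch, not the authors' evaluation code: `build_prompt` and `extract_answer` are hypothetical names, and the template is inferred from the single example query above, with the option spacing normalized to `A. text`. The exact template used during fine-tuning is not documented here, so check it against the repository before relying on it.

```python
import re
import string


def build_prompt(question, options):
    """Format a question and its options in the style of the example query.

    Assumption: options are labelled A, B, C, ... and separated by " \t ".
    """
    labelled = [f"{label}. {text}" for label, text in zip(string.ascii_uppercase, options)]
    return f"{question} \n Options: " + " \t ".join(labelled) + " \n Answer:"


def extract_answer(decoded, num_options=5):
    """Return the first option letter that follows the last "Answer:" cue, or None."""
    completion = decoded.rsplit("Answer:", 1)[-1]
    valid = string.ascii_uppercase[:num_options]
    match = re.search(rf"\b([{valid}])\b", completion)
    return match.group(1) if match else None


# Usage with the example question from this model card.
prompt = build_prompt(
    "What is someone operating a vehicle likely to be accused of after becoming inebriated?",
    ["punish", "arrest", "automobile accidents", "talking nonsense", "drunk driving"],
)
# A decoded generation is the prompt plus the model's continuation; simulated here.
simulated_output = prompt + " E"
print(extract_answer(simulated_output))
```

Splitting on the last `Answer:` keeps the option letters inside the prompt from being mistaken for the model's answer, since the decoded text contains the prompt as well as the generated tokens.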

	### Citation
	If our model helps your work, please cite this paper:

	```
	@article{lu2024llamax,
	title={LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages},
	author={Lu, Yinquan and Zhu, Wenhao and Li, Lei and Qiao, Yu and Yuan, Fei},
	journal={arXiv preprint arXiv:2407.05975},
	year={2024}
	}
	```