---
tags:
- Multilingual
license: mit
language:
- af
- am
- ar
- hy
- as
- ast
- az
- be
- bn
- bs
- bg
- my
- ca
- ceb
- zho
- hr
- cs
- da
- nl
- en
- et
- tl
- fi
- fr
- ff
- gl
- lg
- ka
- de
- el
- gu
- ha
- he
- hi
- hu
- is
- ig
- id
- ga
- it
- ja
- jv
- kea
- kam
- kn
- kk
- km
- ko
- ky
- lo
- lv
- ln
- lt
- luo
- lb
- mk
- ms
- ml
- mt
- mi
- mr
- mn
- ne
- ns
- no
- ny
- oc
- or
- om
- ps
- fa
- pl
- pt
- pa
- ro
- ru
- sr
- sn
- sd
- sk
- sl
- so
- ku
- es
- sw
- sv
- tg
- ta
- te
- th
- tr
- uk
- umb
- ur
- uz
- vi
- cy
- wo
- xh
- yo
- zu
---
### Model Sources
- **Paper**: LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
- **Link**: https://arxiv.org/pdf/2407.05975
- **Repository**: https://github.com/CONE-MT/LLaMAX/
### Model Description
🔥 LLaMAX-7B-X-CSQA is a commonsense reasoning model with multilingual capability. It is obtained by fully fine-tuning the powerful multilingual model [LLaMAX-7B](https://huggingface.co/LLaMAX/LLaMAX-7B) on five English commonsense reasoning datasets: X-CSQA, ARC-Easy, ARC-Challenge, OpenBookQA, and QASC.

🔥 Compared with Llama-2 fine-tuned in the same setting, LLaMAX-7B-X-CSQA improves average accuracy on the X-CSQA dataset by 4.2 points (50.9 to 55.1).
### Experiments
| X-CSQA | Avg. | Sw | Ur | Hi | Ar | Vi | Ja | Pl | Zh | Nl | Ru | It | De | Pt | Fr | Es | En |
|--------------------|------|------|------|------|------|----|-------|------|-------|----|------|------|-------|------|-------|--------|--------|
| Llama2-7B-X-CSQA | 50.9 | 23.2 | 24.7 | 32.9 | 32.4 | 51.0 | 50.0 | 51.5 | 55.6 | 56.9 | 55.8 | 58.8 | 59.9 | 60.4 | 61.8 | 61.9 | 78.1 |
| LLaMAX-7B-X-CSQA | 55.1 | 43.5 | 39.0 | 44.1 | 45.1 | 54.0 | 49.9 | 54.6 | 58.2 | 58.9 | 57.1 | 59.1 | 59.0 | 60.9 | 61.6 | 62.7 | 74.0 |
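
The averages and per-language gains can be recomputed from the scores in the table. The snippet below is a small sketch that copies the table values by hand; the variable names are illustrative and are not part of the LLaMAX repository.

```python
# Language columns of the X-CSQA table, in the order they appear.
langs = ["Sw", "Ur", "Hi", "Ar", "Vi", "Ja", "Pl", "Zh",
         "Nl", "Ru", "It", "De", "Pt", "Fr", "Es", "En"]

# Accuracy (%) per language, copied from the two table rows.
llama2 = [23.2, 24.7, 32.9, 32.4, 51.0, 50.0, 51.5, 55.6,
          56.9, 55.8, 58.8, 59.9, 60.4, 61.8, 61.9, 78.1]
llamax = [43.5, 39.0, 44.1, 45.1, 54.0, 49.9, 54.6, 58.2,
          58.9, 57.1, 59.1, 59.0, 60.9, 61.6, 62.7, 74.0]

# Macro-average over the 16 languages.
avg_llama2 = sum(llama2) / len(llama2)
avg_llamax = sum(llamax) / len(llamax)

# Per-language difference (LLaMAX minus Llama2), in accuracy points.
gains = {lang: round(x - b, 1) for lang, b, x in zip(langs, llama2, llamax)}
best_lang = max(gains, key=gains.get)

print(f"Llama2 avg: {avg_llama2:.1f}, LLaMAX avg: {avg_llamax:.1f}")
print(f"Average gain: {avg_llamax - avg_llama2:.1f} points")
print(f"Largest gain: {best_lang} (+{gains[best_lang]})")
```

The gain is concentrated in the lower-scoring columns (for example, Sw goes from 23.2 to 43.5), while the En column drops from 78.1 to 74.0.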
### Model Usage
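Code Example:
```python
from transformers import AutoTokenizer, LlamaForCausalLM

# Replace both placeholders with the local path or Hub ID of the checkpoint.
model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS)
tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER)

query = "What is someone operating a vehicle likely to be accused of after becoming inebriated? \n Options: A.punish \t B. arrest \t C. automobile accidents \t D. talking nonsense \t E.drunk driving \n Answer:"
inputs = tokenizer(query, return_tensors="pt")

# max_new_tokens limits only the generated answer; the prompt alone exceeds 30 tokens,
# so max_length=30 would leave no room for generation.
generate_ids = model.generate(inputs.input_ids, max_new_tokens=5)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = generate_ids[:, inputs.input_ids.shape[1]:]
print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
# Expected answer: E
```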
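
For evaluating many questions, it can help to build the prompt and parse the answer letter programmatically. The sketch below follows the `Options: ... Answer:` layout of the example query; `build_prompt` and `extract_answer` are hypothetical helpers, and the exact spacing used during fine-tuning is an assumption, since this card does not specify it.

```python
import re

def build_prompt(question, options):
    """Format a question and up to five options in the layout of the example query."""
    labels = "ABCDE"
    formatted = " \t ".join(
        f"{label}. {text}" for label, text in zip(labels, options)
    )
    return f"{question} \n Options: {formatted} \n Answer:"

def extract_answer(decoded):
    """Return the first standalone option letter after the last 'Answer:', or None."""
    tail = decoded.rsplit("Answer:", 1)[-1]
    match = re.search(r"\b([A-E])\b", tail)
    return match.group(1) if match else None

prompt = build_prompt(
    "What is someone operating a vehicle likely to be accused of after becoming inebriated?",
    ["punish", "arrest", "automobile accidents", "talking nonsense", "drunk driving"],
)
print(prompt)
```

`extract_answer` accepts either the full decoded sequence (prompt plus continuation) or the continuation alone, because it only inspects the text after the last `Answer:`.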
### Citation
If our model helps your work, please cite this paper:
```
@article{lu2024llamax,
title={LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages},
author={Lu, Yinquan and Zhu, Wenhao and Li, Lei and Qiao, Yu and Yuan, Fei},
journal={arXiv preprint arXiv:2407.05975},
year={2024}
}
```