---
license: apache-2.0
language:
- ja
---

# Leia-Swallow-7B
LEIA is a training technique for autoregressive LLMs that effectively improves their performance in languages other than English by enhancing cross-lingual knowledge transfer from English to a target language.

This model is constructed by applying LEIA to Swallow, a Japanese-English bilingual LLM based on LLaMA 2.

The model achieves enhanced performance on six Japanese question-answering benchmarks, as reported below.
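
As a rough intuition for the entity-based data augmentation behind LEIA, the toy sketch below attaches English entity names to the corresponding Japanese mentions in training text. It is an illustration only: the entity mapping (e.g., one derived from Wikipedia inter-language links) and the output format are illustrative assumptions, not the exact procedure from the paper.

```python
# Toy sketch of entity-based data augmentation in the spirit of LEIA.
# The entity dictionary and the "(English name)" output format are
# illustrative assumptions; see the paper for the actual procedure.

# Hypothetical Japanese-to-English entity-name mapping, e.g. as could be
# derived from Wikipedia inter-language links.
ENTITY_NAMES = {
    "富士山": "Mount Fuji",
    "東京": "Tokyo",
}

def augment(text: str) -> str:
    """Attach the English entity name right after each known mention."""
    for ja, en in ENTITY_NAMES.items():
        text = text.replace(ja, f"{ja} ({en})")
    return text

print(augment("富士山は日本で最も高い山です。"))
# -> 富士山 (Mount Fuji) は日本で最も高い山です。
```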

Please refer to our paper or blog post (in Japanese) for further technical details:

- [LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation](https://arxiv.org/abs/2402.11485) (arxiv.org)
- [LEIA: 言語間転移学習でLLMを賢くする新しい方法](#) (zenn.dev)

## Model List

- [Leia-Swallow-7b](https://huggingface.co/leia-llm/Leia-Swallow-7b/)
- [Leia-Swallow-13b](https://huggingface.co/leia-llm/Leia-Swallow-13b/)
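
Both checkpoints are intended to be used as ordinary causal language models. The following is a minimal usage sketch with Hugging Face Transformers; it assumes the standard LLaMA-style `AutoModelForCausalLM`/`AutoTokenizer` interface inherited from Swallow, and the prompt and generation settings are purely illustrative.

```python
# Minimal usage sketch (assumes a standard LLaMA-style causal LM checkpoint).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "leia-llm/Leia-Swallow-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # illustrative; pick a dtype your hardware supports
    device_map="auto",          # requires the `accelerate` package
)

prompt = "富士山は"  # "Mount Fuji is ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```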

## Empirical Results

The model is assessed using the following six question-answering benchmarks (a toy scoring sketch follows the list):

- X-CODAH
- X-CSQA
- JCommonsenseQA
- NIILC
- JEMHopQA
- JAQKET v2
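
Several of these benchmarks (e.g., X-CODAH, X-CSQA, JCommonsenseQA) are multiple-choice tasks. One common scoring protocol, sketched below, picks the option with the highest log-likelihood under the model; this is not necessarily the exact protocol used in the paper, and the question and options are made up.

```python
# Sketch: score a multiple-choice question by per-option log-likelihood.
# Assumes the prompt tokenization is a prefix of the prompt+option tokenization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "leia-llm/Leia-Swallow-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
model.eval()

question = "日本で一番高い山は"  # "The highest mountain in Japan is"
options = ["富士山", "高尾山", "阿蘇山"]

@torch.no_grad()
def option_logprob(question: str, option: str) -> float:
    """Sum of token log-probabilities of the option given the question."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(question + option, return_tensors="pt").input_ids.to(model.device)
    logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)  # each position predicts the next token
    targets = full_ids[:, 1:]
    token_logprobs = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    n_prompt = prompt_ids.shape[1]
    return token_logprobs[0, n_prompt - 1 :].sum().item()  # option tokens only

print(max(options, key=lambda o: option_logprob(question, o)))
```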

| Model | X-CODAH | X-CSQA | JCommonsenseQA | NIILC | JEMHopQA | JAQKET v2 |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| Swallow | 42.0 | 41.0 | 80.3 | 59.5 | 50.8 | 86.2 |
| LEIA | **42.7** | **42.4** | **80.6** | **60.3** | **54.7** | **86.5** |

For further details of this experiment, please refer to [our paper](https://arxiv.org/abs/2402.11485).

## Contributors

- Ikuya Yamada (Studio Ousia, RIKEN)
- Ryokan Ri (LY Corporation, SB Intuitions)
|