metadata
library_name: transformers
license: llama3
language:
- ja
- en
Llama-3-ELYZA-JP-8B-AWQ
Model Description
Llama-3-ELYZA-JP-8B is a large language model trained by ELYZA, Inc. Based on meta-llama/Meta-Llama-3-8B-Instruct, it has been enhanced for Japanese usage through additional pre-training and instruction tuning.
For more details, please refer to our blog post.
Quantization
We provide two quantized versions of the model, GGUF and AWQ. This repository contains the AWQ version, quantized with AutoAWQ.
The following table shows the performance degradation due to quantization:
| Model | ELYZA-tasks-100 GPT4 score |
|---|---|
| Llama-3-ELYZA-JP-8B | 3.655 |
| Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M) | 3.57 |
| Llama-3-ELYZA-JP-8B-AWQ | 3.39 |
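To put the table in relative terms, the drop from the unquantized baseline can be computed directly from the scores. The following is a minimal sketch: only the three scores come from the table, and the variable and function names are illustrative.

```python
# Scores copied from the table above (ELYZA-tasks-100 GPT4 score).
BASELINE = "Llama-3-ELYZA-JP-8B"
scores = {
    "Llama-3-ELYZA-JP-8B": 3.655,
    "Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M)": 3.57,
    "Llama-3-ELYZA-JP-8B-AWQ": 3.39,
}


def relative_drop(score: float, baseline: float) -> float:
    """Return the score decrease as a fraction of the baseline score."""
    return (baseline - score) / baseline


# Relative drop of every model against the unquantized baseline.
drops = {name: relative_drop(s, scores[BASELINE]) for name, s in scores.items()}

for name, drop in drops.items():
    print(f"{name}: {scores[name]:.3f} ({drop:.1%} below baseline)")
```

With these numbers, the GGUF (Q4_K_M) model scores roughly 2.3% below the baseline and the AWQ model roughly 7.3% below it.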
Use with vLLM
Install vLLM:
pip install vllm
vLLM Offline Batched Inference
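from vllm import LLM, SamplingParams

# Load the AWQ-quantized checkpoint.
llm = LLM(model="elyza/Llama-3-ELYZA-JP-8B-AWQ", quantization="awq")
tokenizer = llm.get_tokenizer()

# English: "You are a sincere and excellent Japanese assistant.
# Unless otherwise instructed, always answer in Japanese."
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=1000)

# One conversation per entry; each conversation is a list of chat messages.
messages_batch = [
    [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        # English: "What are the key points to know when studying ancient Greece?"
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
    ],
    [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        # English: "Write a short story in which a bear goes to the seaside,
        # befriends a seal, and eventually returns home."
        {"role": "user", "content": "クマが海辺に行ってアザラシと友達になり、最終的には家に帰るというプロットの短編小説を書いてください。"}
    ]
]

# Render each conversation into a prompt string with the model's chat template.
prompts = [
    tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    for messages in messages_batch
]

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    print(output.outputs[0].text)
    print("=" * 50)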
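When the batch grows beyond a couple of prompts, the `messages_batch` structure can be generated from a plain list of questions. The following is a minimal sketch, assuming every conversation is single-turn and shares one system prompt. `build_messages_batch` is a hypothetical helper and not part of vLLM.

```python
from typing import Dict, List

DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"

Message = Dict[str, str]


def build_messages_batch(
    questions: List[str], system_prompt: str = DEFAULT_SYSTEM_PROMPT
) -> List[List[Message]]:
    """Wrap each question in a single-turn conversation with a shared system prompt."""
    return [
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ]
        for question in questions
    ]


# Same two questions as in the example above.
messages_batch = build_messages_batch([
    "古代ギリシャを学ぶ上で知っておくべきポイントは?",
    "クマが海辺に行ってアザラシと友達になり、最終的には家に帰るというプロットの短編小説を書いてください。",
])
print(len(messages_batch), "conversations prepared")
```

The resulting list has the same shape as the hand-written `messages_batch`, so it can be passed through `tokenizer.apply_chat_template` unchanged.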
vLLM OpenAI Compatible Server
Start the API server:
python -m vllm.entrypoints.openai.api_server \
    --model elyza/Llama-3-ELYZA-JP-8B-AWQ \
    --port 8000 \
    --host localhost \
    --quantization awq
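Loading the model can take a while, so requests sent right after launching the server may be refused. The sketch below polls the server until it answers. It assumes the server started by the command above is listening on `localhost:8000` and probes the `/v1/models` route of the OpenAI-compatible API. `wait_for_server` is a hypothetical helper name, and the timeout values are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Server is not accepting connections yet; retry after a pause.
            pass
        time.sleep(interval_s)
    return False
```

A typical call would be `wait_for_server("http://localhost:8000/v1/models")`, which returns `True` once the server responds and `False` if the timeout expires first.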
Call the API using curl:
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "elyza/Llama-3-ELYZA-JP-8B-AWQ",
        "messages": [
            { "role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。" },
            { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
        ],
        "temperature": 0.6,
        "max_tokens": 1000,
        "stream": false
    }'
Call the API using Python:
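import openai

# Point the OpenAI client at the local vLLM server.
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy_api_key"  # placeholder; the client requires some value
)

completion = client.chat.completions.create(
    model="elyza/Llama-3-ELYZA-JP-8B-AWQ",
    messages=[
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
    ]
)

# Print the assistant's reply.
print(completion.choices[0].message.content)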
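For long generations it can be more convenient to show text as it arrives. The request accepts a `stream` flag (set to `false` in the curl example), and the sketch below sets it to true and joins the streamed deltas into one string. `stream_chat` is a hypothetical helper. It accepts any client object exposing `chat.completions.create`, such as the `client` created above.

```python
from typing import Any, Dict, List


def stream_chat(client: Any, model: str, messages: List[Dict[str, str]], **kwargs: Any) -> str:
    """Stream a chat completion, echo text as it arrives, and return the full reply."""
    stream = client.chat.completions.create(
        model=model, messages=messages, stream=True, **kwargs
    )
    parts: List[str] = []
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()  # finish the echoed line
    return "".join(parts)
```

With the server running, a call might look like `reply = stream_chat(client, "elyza/Llama-3-ELYZA-JP-8B-AWQ", messages, temperature=0.6, max_tokens=1000)`, where `messages` is a list of role/content dictionaries as in the examples above.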
Developers
Listed in alphabetical order.
License
Meta Llama 3 Community License
How to Cite
@misc{elyzallama2024,
title={elyza/Llama-3-ELYZA-JP-8B},
url={https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B},
author={Masato Hirakawa and Shintaro Horie and Tomoaki Nakamura and Daisuke Oba and Sam Passaglia and Akira Sasaki},
year={2024},
}
Citations
@article{llama3modelcard,
title={Llama 3 Model Card},
author={AI@Meta},
year={2024},
url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}