hfl
/

Llama-3-Chinese-8B-Instruct-GGUF

This repository contains Llama-3-Chinese-8B-Instruct-GGUF (llama.cpp/ollama/tgw, etc. compatible), which is the quantized version of Llama-3-Chinese-8B-Instruct.

Note: this is an instruction (chat) model, which can be used for conversation, QA, etc.

Further details (performance, usage, etc.) should refer to GitHub project page: https://github.com/ymcui/Chinese-LLaMA-Alpaca-3

Performance

Metric: PPL, lower is better

Note: Old models have been removed due to its inferior performance (llama.cpp has breaking changes on pre-tokenizer).

Quant Size PPL (old model) 👍🏻 PPL (new model)
Q2_K 2.96 GB 10.3918 +/- 0.13288 9.1168 +/- 0.10711
Q3_K 3.74 GB 6.3018 +/- 0.07849 5.4082 +/- 0.05955
Q4_0 4.34 GB 6.0628 +/- 0.07501 5.2048 +/- 0.05725
Q4_K 4.58 GB 5.9066 +/- 0.07419 5.0189 +/- 0.05520
Q5_0 5.21 GB 5.8562 +/- 0.07355 4.9803 +/- 0.05493
Q5_K 5.34 GB 5.8062 +/- 0.07331 4.9195 +/- 0.05436
Q6_K 6.14 GB 5.7757 +/- 0.07298 4.8966 +/- 0.05413
Q8_0 7.95 GB 5.7626 +/- 0.07272 4.8822 +/- 0.05396
F16 14.97 GB 5.7628 +/- 0.07275 4.8802 +/- 0.05392

Others

Downloads last month
1,641
GGUF
Model size
8.03B params
Architecture
llama

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference API
Unable to determine this model's library. Check the docs .

Collection including hfl/llama-3-chinese-8b-instruct-gguf