This repository stores an experimental IQ_1S quantized GGUF build of the Llama 3.1 instruction-tuned 70B model.

**Model developer**: Meta

**Model Architecture**: Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| |Training Data|Params|Input modalities|Output modalities|Context length|GQA|Token count|Knowledge cutoff|
|---------------------|--------------------------------------------|------|-----------------|--------------------------|--------------|---|-----------|----------------|
|Llama 3.1 (text only)|A new mix of publicly available online data.|70B |Multilingual Text|Multilingual Text and code|128k |Yes|15T+ |December 2023 |

**Supported languages**: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
# Quantization Information

|Weight Quantization| PPL |
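The PPL column reports perplexity, i.e. the exponential of the mean negative log-likelihood per token. As a reminder of what that metric measures, a minimal sketch (illustrative helper, not part of this repo):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model assigning probability 0.25 to every token has perplexity of about 4
print(perplexity([math.log(0.25)] * 10))
```

Lower PPL after quantization means the quantized weights degrade the model's predictions less.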
Dataset used for re-calibration: Mix of [standard_cal_data](https://github.com/turboderp/exllamav2/tree/master/exllamav2/conversion/standard_cal_data)

The generated `imatrix` can be downloaded from [imatrix.dat](https://huggingface.co/npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S/resolve/main/imatrix.dat)
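The `imatrix.dat` link above follows the Hub's standard resolve URL pattern, `https://huggingface.co/<repo_id>/resolve/<revision>/<filename>`. A small helper to build such URLs for other files in this repo (hypothetical convenience function, not part of any library):

```python
def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file hosted in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Reconstructs the imatrix.dat link used in this README
url = hf_resolve_url("npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S", "imatrix.dat")
print(url)
```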
**Usage**: with `llama-cpp-python`

```python
from llama_cpp import Llama

# Download the quantized GGUF from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
    repo_id="npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S",
    filename="GGUF_FILE",  # replace with the actual .gguf filename in this repo
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
```
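When called without streaming, `create_chat_completion` returns an OpenAI-style response dictionary; a short sketch for pulling out the assistant's reply (the helper name is illustrative):

```python
def extract_reply(response: dict) -> str:
    """Return the assistant message text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]

# Example with a mock response of the shape llama-cpp-python returns:
mock = {"choices": [{"message": {"role": "assistant", "content": "Paris."}}]}
print(extract_reply(mock))  # prints: Paris.
```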