This repository stores an experimental IQ_1S quantized GGUF build of the Llama 3.1 instruction-tuned 70B model.

**Model developer**: Meta

**Model Architecture**: Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
| |Training Data|Params|Input modalities|Output modalities|Context length|GQA|Token count|Knowledge cutoff|
|---------------------|--------------------------------------------|------|-----------------|--------------------------|--------------|---|-----------|----------------|
|Llama 3.1 (text only)|A new mix of publicly available online data.|70B |Multilingual Text|Multilingual Text and code|128k |Yes|15T+ |December 2023 |

**Supported languages**: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
# Quantization Information

|Weight Quantization| PPL |
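The PPL column reports perplexity, i.e. the exponential of the mean negative log-likelihood per token. As a reminder of what that metric measures, a minimal sketch (illustrative helper, not part of this repo):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# A model assigning probability 0.25 to every token has perplexity of about 4
print(perplexity([math.log(0.25)] * 10))
```

Lower PPL after quantization means the quantized weights degrade the model's predictions less.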
Dataset used for re-calibration: Mix of [standard_cal_data](https://github.com/turboderp/exllamav2/tree/master/exllamav2/conversion/standard_cal_data)

The generated `imatrix` can be downloaded from [imatrix.dat](https://huggingface.co/npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S/resolve/main/imatrix.dat)
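The `imatrix.dat` link above follows the Hub's standard resolve URL pattern, `https://huggingface.co/<repo_id>/resolve/<revision>/<filename>`. A small helper to build such URLs for other files in this repo (hypothetical convenience function, not part of any library):

```python
def hf_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file hosted in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Reconstructs the imatrix.dat link used in this README
url = hf_resolve_url("npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S", "imatrix.dat")
print(url)
```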
**Usage**: with `llama-cpp-python`

```python
from llama_cpp import Llama

# Download the quantized GGUF from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
    repo_id="npc0/Meta-Llama-3.1-70B-Instruct-IQ_1S",
    filename="GGUF_FILE",  # replace with the actual .gguf filename in this repo
)

llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
```
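When called without streaming, `create_chat_completion` returns an OpenAI-style response dictionary; a short sketch for pulling out the assistant's reply (the helper name is illustrative):

```python
def extract_reply(response: dict) -> str:
    """Return the assistant message text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]

# Example with a mock response of the shape llama-cpp-python returns:
mock = {"choices": [{"message": {"role": "assistant", "content": "Paris."}}]}
print(extract_reply(mock))  # prints: Paris.
```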