This is a quantized version of Llama-3-SauerkrautLM-70b-Instruct, produced with GPTQ (developed by IST Austria) using the following configuration:
- 8bit
- Act order: True
- Group size: 128
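As a rough illustration of what this configuration means for storage, the sketch below estimates the bits stored per quantized weight and the resulting weight size. It assumes each group of 128 weights shares one fp16 scale and one 8-bit zero point, and it assumes roughly 70e9 quantized parameters; neither figure is stated in this card, and unquantized layers (for example embeddings) are ignored. The function names are hypothetical.

```python
def gptq_bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Approximate storage cost per quantized weight, in bits.

    Assumption: every group of `group_size` weights shares one scale
    (`scale_bits` wide, fp16 by default) and one zero point stored
    with `bits` bits.
    """
    overhead = (scale_bits + bits) / group_size
    return bits + overhead


def approx_weight_size_gb(n_params: float, bits: int, group_size: int) -> float:
    """Rough size of the quantized weights in gigabytes (10^9 bytes)."""
    total_bits = n_params * gptq_bits_per_weight(bits, group_size)
    return total_bits / 8 / 1e9


# Assumption: about 70e9 quantized parameters (not stated in this card).
bits_per_weight = gptq_bits_per_weight(bits=8, group_size=128)
size_gb = approx_weight_size_gb(70e9, bits=8, group_size=128)
print(f"{bits_per_weight:.4f} bits/weight, ~{size_gb:.1f} GB of weights")
```

Under these assumptions the per-group metadata adds less than 0.2 bits per weight, so the weights alone are still on the order of 70 GB and need to be spread over several GPUs.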
Usage
Install vLLM and run the server:
python -m vllm.entrypoints.openai.api_server --model cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b
Access the model:
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d ' {
"model": "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b",
"prompt": "San Francisco is a"
} '
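The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server from the previous step is listening on `localhost:8000` and that the response follows the OpenAI completions format (`choices[0].text`). The helper names `build_request` and `complete` are hypothetical.

```python
import json
import urllib.request

MODEL = "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b"
URL = "http://localhost:8000/v1/completions"  # assumes the default vLLM port


def build_request(prompt: str, model: str = MODEL, url: str = URL) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the prompt to the running server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        result = json.load(resp)
    # Assumption: OpenAI-style completions response.
    return result["choices"][0]["text"]
```

With the server running, `complete("San Francisco is a")` returns the model's continuation of the prompt.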
Evaluations
English | Llama-3-SauerkrautLM-70b-Instruct | Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b | Llama-3-SauerkrautLM-70b-Instruct-GPTQ |
---|---|---|---|
Avg. | 78.17 | 78.1 | 76.72 |
ARC | 74.5 | 74.4 | 73.0 |
Hellaswag | 79.2 | 79.2 | 78.0 |
MMLU | 80.8 | 80.7 | 79.15 |

German | Llama-3-SauerkrautLM-70b-Instruct | Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b | Llama-3-SauerkrautLM-70b-Instruct-GPTQ |
---|---|---|---|
Avg. | 70.83 | 70.47 | 69.13 |
ARC_de | 66.7 | 66.2 | 65.9 |
Hellaswag_de | 70.8 | 71.0 | 68.8 |
MMLU_de | 75.0 | 74.2 | 72.7 |

Safety | Llama-3-SauerkrautLM-70b-Instruct | Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b | Llama-3-SauerkrautLM-70b-Instruct-GPTQ |
---|---|---|---|
Avg. | 65.86 | 65.94 | 65.94 |
RealToxicityPrompts | 97.6 | 97.8 | 98.4 |
TruthfulQA | 67.07 | 66.92 | 65.56 |
CrowS | 32.92 | 33.09 | 33.87 |
We did not check for data contamination.
Evaluation was done with Eval. Harness using limit=1000.
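The "Avg." rows appear to be the plain mean of the three task scores in each table. The sketch below reproduces the English averages and the gap between each quantized variant and the original model. That the average is an unweighted mean is an assumption inferred from the numbers, and the short labels `original`, `GPTQ-8b`, and `GPTQ` are hypothetical stand-ins for the three column names.

```python
from statistics import mean

# Per-task English scores copied from the table above.
ENGLISH = {
    "original": {"ARC": 74.5, "Hellaswag": 79.2, "MMLU": 80.8},
    "GPTQ-8b": {"ARC": 74.4, "Hellaswag": 79.2, "MMLU": 80.7},
    "GPTQ": {"ARC": 73.0, "Hellaswag": 78.0, "MMLU": 79.15},
}


def average(scores: dict) -> float:
    """Unweighted mean of the task scores, rounded to two decimals."""
    return round(mean(scores.values()), 2)


def drop_vs_original(table: dict, variant: str) -> float:
    """Points lost on the average relative to the unquantized model."""
    return round(average(table["original"]) - average(table[variant]), 2)


for name in ENGLISH:
    print(name, average(ENGLISH[name]), drop_vs_original(ENGLISH, name))
```

The same two functions can be applied to the German and Safety tables by swapping in their task scores.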
Performance
Hardware | requests/s | tokens/s |
---|---|---|
NVIDIA L4x4 | 0.27 | 128.98 |
NVIDIA L4x8 | 1.31 | 625.65 |

Performance measured on cortecs inference.
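Two quantities can be derived from these throughput figures: the implied number of tokens per request and the throughput ratio between the two setups. The sketch below computes both. Whether tokens/s counts only generated tokens or also prompt tokens is not stated here, so the per-request figure is only indicative; the dictionary keys and function names are hypothetical.

```python
# Throughput figures copied from the table above.
THROUGHPUT = {
    "NVIDIA L4x4": {"requests_per_s": 0.27, "tokens_per_s": 128.98},
    "NVIDIA L4x8": {"requests_per_s": 1.31, "tokens_per_s": 625.65},
}


def tokens_per_request(row: dict) -> float:
    """Implied average number of tokens handled per request."""
    return row["tokens_per_s"] / row["requests_per_s"]


def speedup(base: dict, other: dict) -> float:
    """Ratio of token throughput between two hardware setups."""
    return other["tokens_per_s"] / base["tokens_per_s"]


for name, row in THROUGHPUT.items():
    print(f"{name}: ~{tokens_per_request(row):.0f} tokens per request")

ratio = speedup(THROUGHPUT["NVIDIA L4x4"], THROUGHPUT["NVIDIA L4x8"])
print(f"L4x8 vs L4x4 token throughput: {ratio:.2f}x")
```

Both rows imply a similar number of tokens per request, which suggests the two measurements used a comparable workload.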