---
license: mit
language:
- en
---
# Model
Here is a quantized version of Llama-3.1-70B-Instruct in GGUF format.
GGUF is designed for use with GGML and other executors.
GGUF was developed by @ggerganov, who is also the developer of llama.cpp, a popular C/C++ LLM inference framework.
Models initially developed in frameworks like PyTorch can be converted to GGUF format for use with those engines.
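If you want to produce a GGUF file yourself, here is a minimal sketch of the conversion step, assuming a local checkout of llama.cpp and its convert_hf_to_gguf.py script; the model directory and output file names are placeholders, not files from this repository:

```python
# Hedged sketch: convert a Hugging Face checkpoint to GGUF using llama.cpp's
# convert_hf_to_gguf.py script. Paths below are placeholders and assume a
# local llama.cpp checkout.
import subprocess

subprocess.run(
    [
        "python",
        "llama.cpp/convert_hf_to_gguf.py",   # script shipped with llama.cpp
        "path/to/Llama-3.1-70B-Instruct",    # local HF model directory (placeholder)
        "--outfile", "llama-3.1-70b-instruct-f16.gguf",
        "--outtype", "f16",                  # keep full precision before quantizing
    ],
    check=True,
)
```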
# Uploaded Quantization Types
Currently, I have uploaded 3 quantized versions (see the download sketch after this list):
- Q4_K_M ~ Recommended
- Q5_K_M ~ Recommended
- Q8_0 ~ NOT Recommended
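To fetch one of these files programmatically, here is a minimal sketch using huggingface_hub; the repo_id and filename are placeholders, so substitute the actual values shown on this repository's "Files and versions" tab:

```python
# Hedged sketch: download one of the uploaded GGUF files with huggingface_hub.
# repo_id and filename are placeholders, not the exact names used in this repo.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="your-username/Llama-3.1-70B-Instruct-GGUF",  # placeholder
    filename="Llama-3.1-70B-Instruct-Q4_K_M.gguf",        # placeholder
)
print(model_path)  # local path to the downloaded GGUF file
```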
# All Quantization Types Possible
Here are all of the quantization types that are possible. Let me know if you need any other versions.
| ID | Type | Description |
|---|---|---|
| 2 | Q4_0 | small, very high quality loss - legacy, prefer using Q3_K_M |
| 3 | Q4_1 | small, substantial quality loss - legacy, prefer using Q3_K_L |
| 8 | Q5_0 | medium, balanced quality - legacy, prefer using Q4_K_M |
| 9 | Q5_1 | medium, low quality loss - legacy, prefer using Q5_K_M |
| 10 | Q2_K | smallest, extreme quality loss - NOT Recommended |
| 12 | Q3_K | alias for Q3_K_M |
| 11 | Q3_K_S | very small, very high quality loss |
| 12 | Q3_K_M | very small, high quality loss |
| 13 | Q3_K_L | small, high quality loss |
| 15 | Q4_K | alias for Q4_K_M |
| 14 | Q4_K_S | small, some quality loss |
| 15 | Q4_K_M | medium, balanced quality - Recommended |
| 17 | Q5_K | alias for Q5_K_M |
| 16 | Q5_K_S | large, low quality loss - Recommended |
| 17 | Q5_K_M | large, very low quality loss - Recommended |
| 18 | Q6_K | very large, very low quality loss |
| 7 | Q8_0 | very large, extremely low quality loss |
| 1 | F16 | extremely large, virtually no quality loss - NOT Recommended |
| 0 | F32 | absolutely huge, lossless - NOT Recommended |
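Either the numeric ID or the type name from the table can be passed to llama.cpp's quantization tool. Here is a minimal sketch, assuming a built llama.cpp with the llama-quantize binary on your PATH (older builds name it quantize); the file names are placeholders:

```python
# Hedged sketch: produce a Q5_K_M file from a full-precision GGUF with
# llama.cpp's llama-quantize tool. Binary name and file paths are assumptions.
import subprocess

subprocess.run(
    [
        "llama-quantize",
        "llama-3.1-70b-instruct-f16.gguf",     # input GGUF (placeholder)
        "llama-3.1-70b-instruct-Q5_K_M.gguf",  # output GGUF (placeholder)
        "Q5_K_M",                              # type name, or its numeric ID (17) from the table
    ],
    check=True,
)
```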
# Uses
By using the GGUF version of Llama-3.1-70B-Instruct, you can run this LLM with significantly fewer resources than the non-quantized version requires.
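For example, here is a minimal sketch of local inference with the llama-cpp-python bindings; the model path is a placeholder, and n_ctx and n_gpu_layers should be adjusted to your hardware:

```python
# Hedged sketch: run the quantized model locally via llama-cpp-python
# (pip install llama-cpp-python). The file path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-70B-Instruct-Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what GGUF is in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```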