---
base_model: [meta-llama/Meta-Llama-3.1-405B-Instruct]
---


# 🚀 CPU optimized quantizations of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) 🖥️

This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-405B-Instruct model. These quantizations are designed to run efficiently on CPU hardware while maintaining good performance.

## Available Quantizations

1. Q4_0_48 (CPU Optimized): ~264 GB
2. BF16: ~855 GB
3. Q8_0: ~435 GB
More quantizations are on the way.
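Before downloading, it is worth confirming the target volume has enough room for the quantization you pick (sizes from the list above). A quick portable check:

```shell
# Print available space on the current volume in GB.
# Rough disk requirements: Q4_0_48 ~264 GB, Q8_0 ~435 GB, BF16 ~855 GB.
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
echo "Available: $((avail_kb / 1024 / 1024)) GB"
```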

## Use aria2 for parallelized downloads (up to ~9x faster)

> [!TIP]
> 🐧 On Linux: `sudo apt install -y aria2`
>
> 🍎 On macOS: `brew install aria2`
>
> Feel free to paste these commands in all at once or one at a time.

### Q4_0_48 (CPU Optimized) Version

```bash
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00001-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00001-of-00006.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00002-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00002-of-00006.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00003-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00003-of-00006.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00004-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00004-of-00006.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00005-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00005-of-00006.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-inst-cpu-optimized-q4048-00006-of-00006.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-405b-inst-cpu-optimized-q4048-00006-of-00006.gguf
```

### BF16 Version

```bash
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00001-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00001-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00002-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00002-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00003-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00003-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00004-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00004-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00005-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00005-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00006-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00006-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00007-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00007-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00008-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00008-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00009-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00009-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00010-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00010-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00011-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00011-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00012-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00012-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00013-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00013-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00014-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00014-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00015-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00015-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00016-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00016-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00017-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00017-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00018-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00018-of-00019.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-bf16-00019-of-00019.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-bf16-00019-of-00019.gguf
```

### Q8_0 Version

```bash
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00001-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00001-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00002-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00002-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00003-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00003-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00004-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00004-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00005-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00005-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00006-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00006-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00007-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00007-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00008-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00008-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00009-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00009-of-00010.gguf
aria2c -x 16 -s 16 -k 1M -o meta-llama-405b-inst-q8_0-00010-of-00010.gguf https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main/meta-llama-405b-inst-q8_0-00010-of-00010.gguf
```
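The shard filenames in all three lists above follow the same `NNNNN-of-NNNNN` pattern, so the commands can also be generated with a short loop instead of pasted one by one. A sketch (pipe the output to `sh` to actually run the downloads):

```shell
# Emit one aria2c command per shard of a given file prefix and part count.
base="https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/resolve/main"

gen_cmds() {  # usage: gen_cmds <file-prefix> <num-parts>
  prefix=$1; n=$2
  total=$(printf '%05d' "$n")
  for i in $(seq -f '%05g' 1 "$n"); do
    f="${prefix}-${i}-of-${total}.gguf"
    echo "aria2c -x 16 -s 16 -k 1M -o $f $base/$f"
  done
}

gen_cmds meta-405b-inst-cpu-optimized-q4048 6   # Q4_0_48
# gen_cmds meta-llama-405b-inst-bf16 19         # BF16
# gen_cmds meta-llama-405b-inst-q8_0 10         # Q8_0
```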

## Usage

After downloading, you can run these models with `llama.cpp`. Point it at the first shard and it will load the remaining splits automatically. Here's a basic example:

```bash
./llama-cli -t 32 --temp 0.4 -fa -m ~/meow/meta-405b-inst-cpu-optimized-q4048-00001-of-00006.gguf -b 512 -c 9000 -p "Adopt the persona of a NASA JPL mathematician and friendly programmer that doesn't talk much and answers questions fast and on a first-principles basis." -cnv -co -i
```
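For longer sessions it can be easier to serve the model over HTTP with `llama.cpp`'s `llama-server` and talk to it from any OpenAI-compatible client. A minimal sketch, mirroring the flags above; the model path, thread count, host, and port are placeholders, adjust them to your machine:

```shell
# Build an example llama-server invocation (llama.cpp's OpenAI-compatible
# HTTP server). Echoed here for inspection; run it directly once the shards
# are downloaded.
MODEL="$HOME/meow/meta-405b-inst-cpu-optimized-q4048-00001-of-00006.gguf"
SERVE="./llama-server -t 32 -fa -m $MODEL -c 9000 --host 127.0.0.1 --port 8080"
echo "$SERVE"
```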

## Model Information

This model is based on the [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) model. It's an instruction-tuned version of the 405B parameter Llama 3.1 model, designed for assistant-like chat and various natural language generation tasks.

Key features:
- 405 billion parameters
- Supports 8 languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai
- 128k context length
- Uses Grouped-Query Attention (GQA) for improved inference scalability

For more detailed information about the base model, please refer to the [original model card](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct).

## License

The use of this model is subject to the [Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE). Please ensure you comply with the license terms when using this model.

## Acknowledgements

Special thanks to the Meta AI team for creating and releasing the Llama 3.1 model series.

## Enjoy; more quants and perplexity benchmarks are coming