---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- granite-3.0
- llama-cpp
- gguf-my-repo
base_model: ibm-granite/granite-3.0-8b-instruct

---

# eformat/granite-3.0-8b-instruct-Q4_K_M-GGUF

Not all tools (vllm, llama.cpp) appear to support the new model config params yet (as of 25/10/2024).

```json
# config.json
"model_type": "granite",
"architectures": [
  "GraniteForCausalLM"
]
```

This GGUF conversion was done using the old config values:

```json
# config.json
"model_type": "llama",
"architectures": [
  "LlamaForCausalLM"
]
```
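
For reference, a minimal sketch of the usual llama.cpp conversion steps (assuming `config.json` has already been patched to the legacy llama values above; paths and output names are illustrative):

```bash
# convert the HF checkpoint to an f16 GGUF, then quantize to Q4_K_M
python convert_hf_to_gguf.py ~/models/granite-3.0-8b-instruct \
  --outtype f16 \
  --outfile granite-3.0-8b-instruct-f16.gguf

./llama-quantize granite-3.0-8b-instruct-f16.gguf \
  granite-3.0-8b-instruct-Q4_K_M.gguf Q4_K_M
```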

This GGUF loads OK - tested using:

```bash
# llama.cpp
./llama-server --verbose --gpu-layers 99999 --parallel 2 --ctx-size 4096 -m ~/instructlab/models/granite-3.0-8b-instruct-Q4_K_M.gguf
```
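
A quick smoke test against the running llama.cpp server, assuming the default port 8080 and the OpenAI-compatible chat endpoint (prompt and token limit are just examples):

```bash
# query llama-server via its OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64
  }'
```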

```bash
# vllm
vllm serve ~/instructlab/models/granite-3.0-8b-instruct-Q4_K_M.gguf
```
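
Similarly for vllm, assuming the default port 8000; the served model name typically matches the path passed to `vllm serve` (check `GET /v1/models` to confirm):

```bash
# query vllm's OpenAI-compatible completions endpoint
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "~/instructlab/models/granite-3.0-8b-instruct-Q4_K_M.gguf",
    "prompt": "Say hello in one sentence.",
    "max_tokens": 64
  }'
```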