|
--- |
|
pipeline_tag: text-generation |
|
inference: false |
|
license: apache-2.0 |
|
library_name: transformers |
|
tags: |
|
- language |
|
- granite-3.0 |
|
- llama-cpp |
|
- gguf-my-repo |
|
base_model: ibm-granite/granite-3.0-8b-instruct |
|
|
|
--- |
|
|
|
# eformat/granite-3.0-8b-instruct-Q4_K_M-GGUF |
|
|
|
As of 25/10/2024, not all tools (vLLM, llama.cpp) support the new Granite model config params:
|
|
|
```json
# config.json
"model_type": "granite",
"architectures": [
  "GraniteForCausalLM"
]
```
|
|
|
This GGUF conversion was done using the older `llama` config values instead:
|
|
|
```json
# config.json
"model_type": "llama",
"architectures": [
  "LlamaForCausalLM"
]
```
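For reference, here is a minimal sketch of how such a conversion could be reproduced with llama.cpp's tooling. It assumes a local checkout of llama.cpp and a locally downloaded copy of the model; the paths, and the idea of patching `config.json` in place with `jq`, are assumptions, not the exact commands used for this repo:

```bash
# Illustrative only: patch a local copy of the model config to the
# older llama values (paths are assumptions).
jq '.model_type = "llama" | .architectures = ["LlamaForCausalLM"]' \
  granite-3.0-8b-instruct/config.json > /tmp/config.json \
  && mv /tmp/config.json granite-3.0-8b-instruct/config.json

# Convert the HF weights to an f16 GGUF, then quantize to Q4_K_M.
python convert_hf_to_gguf.py granite-3.0-8b-instruct \
  --outtype f16 --outfile granite-3.0-8b-instruct-f16.gguf
./llama-quantize granite-3.0-8b-instruct-f16.gguf \
  granite-3.0-8b-instruct-Q4_K_M.gguf Q4_K_M
```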
|
|
|
This GGUF loads OK; tested using:
|
|
|
```bash
# llama.cpp
./llama-server --verbose --gpu-layers 99999 --parallel 2 --ctx-size 4096 \
  -m ~/instructlab/models/granite-3.0-8b-instruct-Q4_K_M.gguf
```
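`llama-server` exposes an OpenAI-compatible HTTP API (port 8080 by default), so a quick smoke test against the running server might look like this (the prompt is arbitrary):

```bash
# Query the running llama-server via its OpenAI-compatible endpoint
# (default port 8080).
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello."}]}'
```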
|
|
|
```bash
# vllm
vllm serve ~/instructlab/models/granite-3.0-8b-instruct-Q4_K_M.gguf
```
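vLLM likewise serves an OpenAI-compatible API (port 8000 by default). When serving a GGUF file directly, the model id reported by the server should be the path passed to `vllm serve`; `/v1/models` will confirm it. A possible smoke test, assuming the paths above:

```bash
# List the served model id (vLLM defaults to port 8000).
curl -s http://localhost:8000/v1/models

# Request a completion using that id (the GGUF path, shown expanded).
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$HOME/instructlab/models/granite-3.0-8b-instruct-Q4_K_M.gguf\", \"prompt\": \"Say hello.\", \"max_tokens\": 32}"
```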
|
|