NeoChen1024/dolphin-2.9.4-llama3.1-8b-GGUF


GGUF quants of cognitivecomputations/dolphin-2.9.4-llama3.1-8b. The following files are provided:

IQ4_XS (4.2G, perplexity 8.7992 +/- 0.11237; fits into 8GiB VRAM with 4096 context and F16 KV cache)
Q4_K_M (4.6G, perplexity 8.7948 +/- 0.11223; fits into 8GiB VRAM with 4096 context and F16 KV cache, also good for CPU inference on E5-26xx v3/v4)
Q8_0   (8.0G, perplexity 8.5970 +/- 0.10933; the imatrix was derived from this file)
F16    ( 15G, perplexity 8.6617 +/- 0.11043; for 24GiB VRAM)
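
To make the "8GiB VRAM + 4096 context" note easier to check, here is a rough back-of-the-envelope sketch of the F16 KV cache size. The attention shape (32 layers, 8 KV heads, head dimension 128) is an assumption taken from the usual Llama 3.1 8B configuration, not from this card, and the file sizes are assumed to be in GiB. Compute buffers and other runtime overhead are not counted, so treat the totals as a lower bound.

```python
# Assumed Llama 3.1 8B attention shape (not stated in this model card).
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_F16 = 2

GIB = 1024 ** 3


def kv_cache_bytes(n_ctx, n_layers=N_LAYERS, n_kv_heads=N_KV_HEADS,
                   head_dim=HEAD_DIM, bytes_per_elem=BYTES_F16):
    """Bytes needed to hold K and V for n_ctx tokens across all layers."""
    # Factor 2: one K tensor and one V tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx


if __name__ == "__main__":
    kv_gib = kv_cache_bytes(4096) / GIB
    # File sizes from the list above, assumed to be GiB.
    for name, weights_gib in [("IQ4_XS", 4.2), ("Q4_K_M", 4.6)]:
        total = weights_gib + kv_gib
        print(f"{name}: weights {weights_gib:.1f} + KV cache {kv_gib:.2f} "
              f"= {total:.2f} GiB (before compute buffers)")
```

Under these assumptions the cache costs 128 KiB per token, so a 4096-token context adds about 0.5 GiB on top of the weights, which leaves headroom on an 8GiB card for both 4-bit files.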

Perplexity was measured with -fa -c 2048 -ub 2048 on the UTF-8 text version of "Wired Love" from Project Gutenberg.
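
As a reminder of what these numbers mean, the sketch below shows the standard definition of perplexity (the exponential of the mean negative log-likelihood per token) and compares the reported values against F16. It is only an illustration: the helper names are hypothetical, and it does not reproduce how the measurement tool computes its +/- estimate.

```python
import math


def perplexity(token_logprobs):
    """exp(mean negative log-likelihood), with natural-log token probabilities."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


# Values copied from the list above: (perplexity, reported +/- uncertainty).
REPORTED = {
    "IQ4_XS": (8.7992, 0.11237),
    "Q4_K_M": (8.7948, 0.11223),
    "Q8_0": (8.5970, 0.10933),
    "F16": (8.6617, 0.11043),
}


def relative_change(name, reference="F16"):
    """Fractional perplexity change of a quant relative to the reference."""
    ppl, _ = REPORTED[name]
    ref_ppl, _ = REPORTED[reference]
    return (ppl - ref_ppl) / ref_ppl


if __name__ == "__main__":
    for name in ("IQ4_XS", "Q4_K_M", "Q8_0"):
        print(f"{name}: {relative_change(name):+.2%} vs F16")
```

Read loosely, the gaps between quants are of the same order as the reported uncertainties, so the Q8_0 value sitting slightly below F16 is probably not a meaningful difference.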


Model size: 8.03B params
Architecture: llama
Quantizations: 4-bit, 8-bit, 16-bit


Model tree for NeoChen1024/dolphin-2.9.4-llama3.1-8b-GGUF

Base model: meta-llama/Llama-3.1-8B
Finetuned: cognitivecomputations/dolphin-2.9.4-llama3.1-8b
Quantized: this model (one of 25 quantizations)