
BEE-spoke-data/smol_llama-220M-GQA-GGUF

Quantized GGUF model files for smol_llama-220M-GQA from BEE-spoke-data

Name                             Quant method  Size
smol_llama-220m-gqa.fp16.gguf    fp16          436.50 MB
smol_llama-220m-gqa.q2_k.gguf    q2_k          102.60 MB
smol_llama-220m-gqa.q3_k_m.gguf  q3_k_m        115.70 MB
smol_llama-220m-gqa.q4_k_m.gguf  q4_k_m        137.58 MB
smol_llama-220m-gqa.q5_k_m.gguf  q5_k_m        157.91 MB
smol_llama-220m-gqa.q6_k.gguf    q6_k          179.52 MB
smol_llama-220m-gqa.q8_0.gguf    q8_0          232.28 MB
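As a rough sanity check on the table, dividing each file size by the parameter count gives the average storage cost per weight. This is a sketch under stated assumptions: "MB" is taken as 10^6 bytes, and the 218M figure is the rounded parameter count shown on the model page. File sizes also include metadata and tokenizer data, so the results are averages rather than exact quantization widths.

```python
# Rough bits-per-weight estimate for each GGUF file listed in the card.
# Assumptions: "MB" means 10**6 bytes, and PARAMS is the rounded 218M
# parameter count from the model page. Sizes include metadata/tokenizer
# data, so the results are averages, not exact quant widths.

PARAMS = 218e6

FILE_SIZES_MB = {
    "fp16": 436.50,
    "q2_k": 102.60,
    "q3_k_m": 115.70,
    "q4_k_m": 137.58,
    "q5_k_m": 157.91,
    "q6_k": 179.52,
    "q8_0": 232.28,
}


def bits_per_weight(size_mb: float, params: float = PARAMS) -> float:
    """Average number of bits stored per parameter for a file of size_mb MB."""
    return size_mb * 1e6 * 8 / params


for name, size in FILE_SIZES_MB.items():
    print(f"{name:>7}: {bits_per_weight(size):5.2f} bits/weight")
```

Under these assumptions the fp16 file comes out at about 16.02 bits per weight and q8_0 at about 8.52, close to the 8.5 bits implied by the q8_0 block layout (32 int8 values plus one fp16 scale per block). The lower-bit files average noticeably more than their nominal width, plausibly because a model this small devotes a large share of its parameters to embedding and output tensors, which k-quant mixes tend to store at higher precision.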

Original Model Card:

smol_llama: 220M GQA

Model card WIP; more details to come.

A small decoder-only model with 220M parameters (total). This is the first version of the model.

  • 1024 hidden size, 10 layers
  • GQA (32 query heads, 8 key-value heads), context length 2048
  • trained from scratch on one GPU :)
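The attention settings in the list above determine a few derived quantities: the per-head dimension, how many query heads share each key-value head, and the size of the KV cache at full context. The sketch below works these out, assuming the usual Llama convention that the head dimension is the hidden size divided by the number of query heads, a contiguous grouping of query heads onto KV heads, and an fp16 KV cache; none of these are stated in the card.

```python
# Shape arithmetic for the attention config listed in the card:
# hidden size 1024, 10 layers, 32 query heads, 8 KV heads, context 2048.
# Assumptions (not from the card): head_dim = hidden / query heads,
# contiguous query-to-KV grouping, and a 2-byte (fp16) KV cache.

HIDDEN = 1024
LAYERS = 10
Q_HEADS = 32
KV_HEADS = 8
CONTEXT = 2048
BYTES_PER_VALUE = 2  # assumed fp16 cache entries

head_dim = HIDDEN // Q_HEADS       # 32
group_size = Q_HEADS // KV_HEADS   # 4 query heads share each KV head


def kv_head_for(query_head: int) -> int:
    """KV head used by a given query head under contiguous grouping."""
    return query_head // group_size


def kv_cache_bytes(kv_heads: int, tokens: int = CONTEXT) -> int:
    """Bytes for cached keys and values across all layers at `tokens` positions."""
    return 2 * LAYERS * kv_heads * head_dim * tokens * BYTES_PER_VALUE


gqa = kv_cache_bytes(KV_HEADS)
mha = kv_cache_bytes(Q_HEADS)  # hypothetical variant: one KV head per query head
print(f"head_dim={head_dim}, group_size={group_size}")
print(f"KV cache at {CONTEXT} tokens: GQA {gqa / 2**20:.0f} MiB vs MHA {mha / 2**20:.0f} MiB")
```

With 8 KV heads the full-context cache works out to 20 MiB, a quarter of the 80 MiB that a hypothetical 32-KV-head (standard multi-head attention) variant would need; that 4x reduction is the main benefit GQA trades for sharing keys and values across query heads.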

File format: GGUF
Model size: 218M params
Architecture: llama


Repository: afrideva/smol_llama-220M-GQA-GGUF (quantized from smol_llama-220M-GQA by BEE-spoke-data)