
smol_llama-220M-GQA-32k-linear

Experimental model intended to serve as a long-context draft model for speculative decoding.
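In speculative decoding, a small draft model proposes several tokens and a larger target model verifies them, keeping the longest agreeing prefix. The sketch below shows the greedy variant of that loop in plain Python. The "models" are hypothetical toy functions (`toy_target`, `toy_draft`) that return the next greedy token for a prefix; a real setup would use logits from two transformer models that share a tokenizer (Hugging Face transformers exposes this as assisted generation through the `assistant_model` argument of `generate()`). In the greedy case the output is identical to decoding with the target alone. The draft only changes how much verification work can be batched.

```python
from typing import Callable, List

# Toy stand-in for a model: maps a token prefix to the next greedy token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Reference: plain greedy decoding with a single model."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes up to k tokens, target verifies."""
    if k < 1:
        raise ValueError("k must be at least 1")
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. The cheap draft model proposes tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each position (one batched forward pass in practice).
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(tokens + proposal[:i]) != tok:
                break
            accepted += 1
        tokens.extend(proposal[:accepted])
        # 3. After the first mismatch (or a fully accepted run), the target adds one token.
        if len(tokens) < goal:
            tokens.append(target(tokens))
    return tokens


# Hypothetical toy models: the draft disagrees with the target at every 5th position.
def toy_target(prefix: List[int]) -> int:
    return (sum(prefix) * 7 + len(prefix)) % 50


def toy_draft(prefix: List[int]) -> int:
    t = toy_target(prefix)
    return t if len(prefix) % 5 else (t + 1) % 50


prompt = [1, 2, 3]
out = speculative_decode(toy_target, toy_draft, prompt, n_new=20, k=4)
```

The better the draft agrees with the target over long prefixes, the more tokens are accepted per verification step, which is why a draft that stays coherent at long context lengths is useful.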

Created by further pretraining BEE-spoke-data/smol_llama-220M-GQA at a 32768-token context length on togethercomputer/RedPajama-Data-1T-Sample.

This variant uses linear RoPE scaling for context extension.
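Linear RoPE scaling (also called position interpolation) divides each token position by a fixed factor before computing the rotary angles, so positions in a long context map back into the range the base model saw during pretraining. A scale of 16.0 corresponds to 32768 / 2048. The sketch below assumes standard Llama-style RoPE with base theta = 10000 and a hypothetical head dimension of 64; neither value is stated in this card. In transformers' `LlamaConfig` this kind of setting is expressed through the `rope_scaling` field with a `"linear"` type and a `factor`.

```python
from typing import List


def rope_inv_freq(head_dim: int, theta: float = 10000.0) -> List[float]:
    """Inverse frequency of each RoPE dimension pair: theta^(-2i/d)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def rope_angles(position: float, head_dim: int, theta: float = 10000.0,
                linear_factor: float = 1.0) -> List[float]:
    """Rotation angle per dimension pair at a token position.

    Linear scaling divides the position by `linear_factor` before
    multiplying by the per-pair frequencies.
    """
    scaled_pos = position / linear_factor
    return [scaled_pos * f for f in rope_inv_freq(head_dim, theta)]


HEAD_DIM = 64                 # assumption for illustration only
factor = 32768 / 2048         # 16.0, the scale factor used by this variant

# With scaling, position 32767 receives exactly the angles an unscaled
# model would use at position 32767 / 16 = 2047.9375, inside the 2048 range.
scaled = rope_angles(32767, HEAD_DIM, linear_factor=factor)
unscaled = rope_angles(32767 / factor, HEAD_DIM)
```

The trade-off is resolution: neighbouring tokens now differ by 1/16 of a position step, which is one plausible reason the linear variant needs further pretraining and scores worse than the base model at 2048 tokens in the results below.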

Wikitext perplexity (64 rows), as evaluated by exllamav2, at each context length:
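Perplexity is the exponential of the mean negative log-likelihood the model assigns to each actual next token, so lower is better. The minimal sketch below shows only the metric. How exllamav2 splits the 64 Wikitext rows into windows of each listed length is not described here and is not reproduced.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) over the scored tokens.

    `token_logprobs` are natural-log probabilities the model gave to the
    true next token at each scored position.
    """
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that is uniformly unsure among 4 options at every step has perplexity 4.
uniform_four = perplexity([math.log(0.25)] * 10)
```

Read this way, the base model's perplexity of 515.8 at 32768 tokens is roughly as uncertain as guessing uniformly among about 500 tokens, while the extended variants stay near 17 to 21.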

Base model (BEE-spoke-data/smol_llama-220M-GQA):
2048: 20.2193
4096: 102.6928
8192: 235.5210
16384: 390.7198
32768: 515.8053

32k, linear RoPE scale 16.0 (this model):
2048: 25.7148
4096: 23.4461
8192: 22.3326
16384: 21.6744
32768: 21.4317

32k, RoPE theta 1000000.0 (for comparison):
2048: 20.2158
4096: 18.3868
8192: 17.5976
16384: 17.1462
32768: 16.6989
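The two 32k variants stretch RoPE differently. Linear scaling multiplies the wavelength of every dimension pair by the same factor (16), while raising the base theta leaves the fastest-rotating pair unchanged and stretches the slow pairs far more. The sketch below compares wavelengths in tokens under the same assumptions as before (base theta 10000 and a hypothetical head dimension of 64). It illustrates the mechanical difference only; it does not by itself explain the perplexity gap in the table.

```python
import math
from typing import Dict, List


def rope_wavelengths(head_dim: int, theta: float, linear_factor: float = 1.0) -> List[float]:
    """Wavelength in tokens of each RoPE pair: 2*pi * factor * theta^(2i/d).

    Dividing positions by `linear_factor` is equivalent to multiplying
    every wavelength by that factor.
    """
    return [2 * math.pi * linear_factor * theta ** (2 * i / head_dim)
            for i in range(head_dim // 2)]


HEAD_DIM = 64  # assumption for illustration only
variants: Dict[str, List[float]] = {
    "base": rope_wavelengths(HEAD_DIM, theta=10000.0),
    "linear x16": rope_wavelengths(HEAD_DIM, theta=10000.0, linear_factor=16.0),
    "theta 1e6": rope_wavelengths(HEAD_DIM, theta=1_000_000.0),
}

for name, w in variants.items():
    print(f"{name:>10}: fastest pair {w[0]:8.1f} tokens, slowest pair {w[-1]:12.0f} tokens")
```

Because the theta variant keeps the high-frequency pairs identical to the base model, short-range position information is undisturbed, which is consistent with its 2048-token perplexity staying essentially at the base model's value.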

Model repository: Doctor-Shotgun/smol_llama-220M-GQA-32k-linear