---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
language:
- en
tags:
- llama
- llama 2
- smol_llama
---
# smol_llama-220M-GQA-32k-linear
An experimental model intended to serve as a draft model for long-context speculative decoding.
Created from [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) by further pretraining at a context length of 32768 tokens on [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample).
This variant uses linear RoPE scaling for context extension.
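
As a rough illustration of what linear RoPE scaling does, the sketch below divides each position index by a scale factor before computing the rotary angles, so positions up to 32768 are compressed into a 2048-position range (32768 / 16 = 2048). The scale factor 16.0 comes from the perplexity results in this card; the head dimension of 64 and the base theta of 10000 are assumptions chosen for illustration, not values read from the model config, and `rope_angles` is a hypothetical helper name.

```python
import math

def rope_angles(position, head_dim=64, theta=10000.0, linear_scale=1.0):
    """Return the rotary angles for a single position.

    head_dim and theta are illustrative assumptions, not read from the model config.
    """
    # Linear scaling: compress the position index before computing the angles.
    scaled_position = position / linear_scale
    return [
        scaled_position * theta ** (-2.0 * i / head_dim)
        for i in range(head_dim // 2)
    ]

# With scale 16.0, position 32752 is mapped onto unscaled position 2047.
scaled = rope_angles(32752, linear_scale=16.0)
unscaled = rope_angles(2047)
print(all(math.isclose(a, b) for a, b in zip(scaled, unscaled)))
```

Because the angles at long positions coincide with angles at shorter unscaled positions, neighbouring tokens are separated by smaller angular steps than in the base model, which may be why further pretraining at the long context length is needed.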
Wikitext Perplexity (64 rows) as evaluated by [exllamav2](https://github.com/turboderp/exllamav2):
```
Base Model
2048: 20.2193
4096: 102.6928
8192: 235.5210
16384: 390.7198
32768: 515.8053

32k - Linear Rope Scale 16.0
2048: 25.7148
4096: 23.4461
8192: 22.3326
16384: 21.6744
32768: 21.4317

32k - Rope Theta 1000000.0
2048: 20.2158
4096: 18.3868
8192: 17.5976
16384: 17.1462
32768: 16.6989
```
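
For reference when reading these numbers, perplexity is the exponential of the mean per-token negative log-likelihood, so lower is better. The sketch below shows only that definition. It is not exllamav2's evaluation code, `perplexity` is a hypothetical helper, and the log-probabilities are made-up inputs.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p(token))), using natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical input: every token predicted with probability 1/20.
example = [math.log(1 / 20)] * 8
print(perplexity(example))
```

A model that assigns each token a probability of 1/20 would score a perplexity of about 20, which gives a sense of scale for the values reported above.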