---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
language:
- en
tags:
- llama
- llama 2
- smol_llama
---

# smol_llama-220M-GQA-32k-linear

Experimental model intended to serve as a long-context speculative decoding (draft) model.

Created from [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) by further pretraining at a 32768-token context length on [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample).

This variant uses the linear RoPE scaling method for context extension.

Wikitext perplexity (64 rows), evaluated with [exllamav2](https://github.com/turboderp/exllamav2):

```
Base Model
 2048: 20.2193
 4096: 102.6928
 8192: 235.5210
16384: 390.7198
32768: 515.8053

32k - Linear Rope Scale 16.0
 2048: 25.7148
 4096: 23.4461
 8192: 22.3326
16384: 21.6744
32768: 21.4317

32k - Rope Theta 1000000.0
 2048: 20.2158
 4096: 18.3868
 8192: 17.5976
16384: 17.1462
32768: 16.6989
```
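
Below is a minimal sketch of how a checkpoint like this might be used as the draft model for speculative (assisted) decoding with Hugging Face `transformers`. The repository ids are assumptions based on this card, the target model is just an example, and assisted generation generally expects the draft and target models to share a tokenizer/vocabulary, so adjust to your own setup.

```python
# Minimal sketch of assisted (speculative) decoding with Hugging Face transformers,
# using this checkpoint as the small draft model. Repository ids below are
# assumptions -- swap in the actual draft and target checkpoints you intend to use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

draft_id = "BEE-spoke-data/smol_llama-220M-GQA-32k-linear"  # assumed repo id for this card
target_id = "meta-llama/Llama-2-7b-hf"                      # example target model (assumption)

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16).to(device)
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16).to(device)

inputs = tokenizer("Summarize the following document: ...", return_tensors="pt").to(device)

# Passing assistant_model enables speculative decoding: the 220M draft model
# proposes candidate tokens that the larger target model verifies in parallel.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The linear scaling factor of 16.0 shown in the perplexity results is presumably what stretches the base model's original RoPE range out to 32768 tokens (32768 / 2048 = 16).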