Yi-1.5-6B?
#2
by
saishf
- opened
Yi 1.5 is impressive, fast, and its KV cache is pretty small.
There's a lack of Yi-1.5 tunes, it seems like it would be fun, and it feels less robotic than Llama-3 Instruct 🐥
maybe! although, at that parameter size, it does feel like we're getting a bit close to 7/8b to be worth bothering with a slightly worse pretrain :p
Qwen 1.5 32B for some reason has a huge KV cache, and that curse extended to this model too.
Tested with 8K ctx and Flash Attention on KoboldCPP:
Model | Size on disk | VRAM @ 8K ctx + FA | VRAM minus model size
---|---|---|---
duloxetine-4b-v1 @ Q5_K_M | 2.8GB | 5.9GB | 2.1GB |
Yi-1.5-6B @ Q5_K_M | 4.3GB | 4.6GB | 0.3GB |
WizardLM-2 @ Q5_K_M | 5.1GB | 5.9GB | 0.8GB |
Qwen2-7B @ Q4_K_M | 4.7GB | 4.9GB | 0.2GB |
Yi-1.5-9B @ Q4_K_M | 5.3GB | 5.8GB | 0.5GB |
Llama-3-8B @ Q4_K_M | 4.9GB | 5.7GB | 0.8GB |
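The gap between disk size and VRAM allocation in the table is mostly the KV cache, which scales with layers × KV heads × head dim × context length. A minimal sketch of the arithmetic (the model config values are assumptions taken from public model cards, and runtime buffers mean it won't match KoboldCPP's measured numbers exactly):

```python
# Rough KV-cache size estimate: 2 tensors (K and V) per layer,
# each of shape [ctx, kv_heads, head_dim], at dtype_bytes per element.
def kv_cache_bytes(layers, kv_heads, head_dim, ctx, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * ctx * dtype_bytes

# Assumed Llama-3-8B config: 32 layers, 8 KV heads (GQA), head_dim 128.
llama3_8k = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, ctx=8192)
print(f"{llama3_8k / 2**30:.2f} GiB")  # ≈ 1.0 GiB at fp16
```

This lines up roughly with the ~0.8 GB gap measured for Llama-3-8B above. Yi-1.5's small overhead would follow from more aggressive GQA (reportedly 4 KV heads), halving the per-token cache relative to Llama-3.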
Edit - Added Llama-3
@saishf
decided to train the 9B instead so it makes a bit more sense over 7B/8B trains, also it has an extended native ctxlen. but its goin