CLA-Experiments
Training Script: https://github.com/AnswerDotAI/fsdp_qlora/blob/3f7c583e985ff35e37a7b7497a7d4fedb77df695/experiments/cla/train.sh
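This model shares KV activations across each pair of adjacent layers: every odd-numbered layer reuses the KV activations computed by the even-numbered layer directly before it. For example, layer 1 uses the KV activations of layer 0, layer 3 uses those of layer 2, and so on.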
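The sharing pattern described above can be sketched as a simple index mapping. This is only an illustration of the layer pairing, not code from the training script: the names `kv_source_layer`, `kv_sharing_map`, and the `share_every` parameter are hypothetical, and zero-based layer indexing is assumed from the example in the text.

```python
def kv_source_layer(layer_idx: int, share_every: int = 2) -> int:
    """Return the index of the layer whose KV activations `layer_idx` reuses.

    Hypothetical helper: with share_every=2, layers 0 and 1 both use the
    KV activations computed in layer 0, layers 2 and 3 use layer 2's, etc.
    """
    if layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    if share_every < 1:
        raise ValueError("share_every must be at least 1")
    # Round down to the first layer of the sharing group.
    return (layer_idx // share_every) * share_every


def kv_sharing_map(num_layers: int, share_every: int = 2) -> dict[int, int]:
    """Map every layer index to the layer that produces its KV activations."""
    return {i: kv_source_layer(i, share_every) for i in range(num_layers)}


if __name__ == "__main__":
    # For a 6-layer model: {0: 0, 1: 0, 2: 2, 3: 2, 4: 4, 5: 4}
    print(kv_sharing_map(6))
```

Under this mapping, only the layers that map to themselves (0, 2, 4, ...) need to compute and cache keys and values, so the KV cache holds entries for half of the layers when sharing every 2 layers.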