Efficient LLM Pretraining: Packed Sequences and Masked Attention
By sirluk • Oct 7