Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20 • 85
view article Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation Apr 29 • 73
LongNet: Scaling Transformers to 1,000,000,000 Tokens Paper • 2307.02486 • Published Jul 5, 2023 • 80
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers Paper • 2305.07185 • Published May 12, 2023 • 9