TinyGSM: achieving >80% on GSM8k with small language models
Paper: arXiv:2312.09241
1. TinyGSM: achieving >80% on GSM8k with small language models
2. ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
3. Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
4. Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models
5. Rho-1: Not All Tokens Are What You Need
6. Universal Guidance for Diffusion Models
7. 2BP: 2-Stage Backpropagation
8. LinFusion: 1 GPU, 1 Minute, 16K Image
9. LVCD: Reference-based Lineart Video Colorization with Diffusion Models
10. GRIN: GRadient-INformed MoE
11. Addition is All You Need for Energy-efficient Language Models
12. Reinforcement Learning Textbook