MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs Paper • 2402.15627 • Published Feb 23 • 32
Beyond Language Models: Byte Models are Digital World Simulators Paper • 2402.19155 • Published Feb 29 • 47
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU Paper • 2403.06504 • Published Mar 11 • 52
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models Paper • 2403.06764 • Published Mar 11 • 24
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Paper • 2404.08801 • Published Apr 12 • 62
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding Paper • 2404.16710 • Published Apr 25 • 56
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs Paper • 2406.18629 • Published Jun 26 • 36
Simulating Classroom Education with LLM-Empowered Agents Paper • 2406.19226 • Published Jun 27 • 27
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation Paper • 2406.19215 • Published Jun 27 • 26
Aligning Teacher with Student Preferences for Tailored Training Data Generation Paper • 2406.19227 • Published Jun 27 • 22
T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings Paper • 2406.19223 • Published Jun 27 • 8
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation Paper • 2406.18676 • Published Jun 26 • 5
Direct Preference Knowledge Distillation for Large Language Models Paper • 2406.19774 • Published Jun 28 • 17
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1 • 64