SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers Paper • 2410.10629 • Published Oct 14 • 6
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published 7 days ago • 37
Public Domain 12M: A Highly Aesthetic Image-Text Dataset with Novel Governance Mechanisms Paper • 2410.23144 • Published 29 days ago • 4
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens Paper • 2406.11271 • Published Jun 17 • 20
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 74
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Paper • 2410.02367 • Published Oct 3 • 47
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data Paper • 2410.18558 • Published Oct 24 • 18
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function Paper • 2410.21438 • Published about 1 month ago • 1
BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments Paper • 2410.23918 • Published 28 days ago • 18
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent Paper • 2411.02265 • Published 24 days ago • 24
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published 21 days ago • 109
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30 • 6
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Paper • 2411.10958 • Published 11 days ago • 47
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17 • 49
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published 9 days ago • 47
nGPT: Normalized Transformer with Representation Learning on the Hypersphere Paper • 2410.01131 • Published Oct 1 • 9
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published 13 days ago • 27
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B Paper • 2406.07394 • Published Jun 11 • 23