The Surprising Effectiveness of Test-Time Training for Abstract Reasoning Paper • 2411.07279 • Published Nov 11 • 3
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published Oct 17 • 27
Quantifying Generalization Complexity for Large Language Models Paper • 2410.01769 • Published Oct 2 • 13
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3 • 36
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1 • 144
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2 • 40
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 56
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19 • 135
Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models Paper • 2408.06663 • Published Aug 13 • 15
MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark Paper • 2409.02813 • Published Sep 4 • 28
Llama 3.1 Collection This collection hosts the Transformers-format and original repos of the Llama 3.1, Llama Guard 3, and Prompt Guard models • 11 items • Updated Sep 25 • 622
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 33
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published Aug 15 • 44