Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 115
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 56
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9 • 29
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect Paper • 2403.03853 • Published Mar 6 • 62
Enhancing Vision-Language Pre-training with Rich Supervisions Paper • 2403.03346 • Published Mar 5 • 14
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Paper • 2312.09390 • Published Dec 14, 2023 • 32
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models Paper • 2312.04410 • Published Dec 7, 2023 • 14
Measuring Faithfulness in Chain-of-Thought Reasoning Paper • 2307.13702 • Published Jul 17, 2023 • 27
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a \10,000 Budget; An Extra 4,000 Unlocks 81.8% Accuracy Paper • 2306.15658 • Published Jun 27, 2023 • 12
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought Paper • 2306.12672 • Published Jun 22, 2023 • 26