Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25 • 103
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12 • 35
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels Paper • 2406.09415 • Published Jun 13 • 50
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 65
Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step Paper • 2406.04314 • Published Jun 6 • 27
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions Paper • 2406.04325 • Published Jun 6 • 72
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Paper • 2405.01535 • Published May 2 • 118
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report Paper • 2405.00732 • Published Apr 29 • 118
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites Paper • 2404.16821 • Published Apr 25 • 53
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 254
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 27