Compress3D: a Compressed Latent Space for 3D Generation from a Single Image Paper • 2403.13524 • Published Mar 20 • 8
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis Paper • 2403.13501 • Published Mar 20 • 9
Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos Paper • 2403.13044 • Published Mar 19 • 14
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation Paper • 2403.13745 • Published Mar 20 • 11
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model Paper • 2403.13064 • Published Mar 19 • 31
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20 • 58
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework Paper • 2403.13248 • Published Mar 20 • 76
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation Paper • 2403.14621 • Published Mar 21 • 14
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Paper • 2403.14148 • Published Mar 21 • 17
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks Paper • 2403.14468 • Published Mar 21 • 21
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference Paper • 2403.14520 • Published Mar 21 • 32
FAX: Scalable and Differentiable Federated Primitives in JAX Paper • 2403.07128 • Published Mar 11 • 11
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14 • 20
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring Paper • 2403.09333 • Published Mar 14 • 14
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14 • 25
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking Paper • 2403.09629 • Published Mar 14 • 72
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14 • 54
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 124
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 93
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 56
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6 • 182
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like Paper • 2402.07383 • Published Feb 12 • 13
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models Paper • 2402.07033 • Published Feb 10 • 16
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement Paper • 2402.07456 • Published Feb 12 • 41
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model Paper • 2402.07827 • Published Feb 12 • 45
IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation Paper • 2402.08682 • Published Feb 13 • 12
Graph Mamba: Towards Learning on Graphs with State Space Models Paper • 2402.08678 • Published Feb 13 • 13
Lumos : Empowering Multimodal LLMs with Scene Text Recognition Paper • 2402.08017 • Published Feb 12 • 24
World Model on Million-Length Video And Language With RingAttention Paper • 2402.08268 • Published Feb 13 • 36
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models Paper • 2402.08714 • Published Feb 13 • 10
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion Paper • 2402.10009 • Published Feb 15 • 18
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation Paper • 2402.10210 • Published Feb 15 • 29
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 34
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts Paper • 2402.09727 • Published Feb 15 • 35
RLVF: Learning from Verbal Feedback without Overgeneralization Paper • 2402.10893 • Published Feb 16 • 10
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots Paper • 2402.10329 • Published Feb 15 • 13
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation Paper • 2402.10491 • Published Feb 16 • 16
SPAR: Personalized Content-Based Recommendation via Long Engagement Attention Paper • 2402.10555 • Published Feb 16 • 32
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss Paper • 2402.10790 • Published Feb 16 • 40
Linear Transformers with Learnable Kernel Functions are Better In-Context Models Paper • 2402.10644 • Published Feb 16 • 78
Learning to Learn Faster from Human Feedback with Language Model Predictive Control Paper • 2402.11450 • Published Feb 18 • 20
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Paper • 2402.12226 • Published Feb 19 • 40
Speculative Streaming: Fast LLM Inference without Auxiliary Models Paper • 2402.11131 • Published Feb 16 • 41