HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows Paper • 2409.17433 • Published 6 days ago • 7
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published 4 days ago • 15
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models Paper • 2409.17066 • Published 6 days ago • 18
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult Paper • 2409.17545 • Published 5 days ago • 14
MinerU: An Open-Source Solution for Precise Document Content Extraction Paper • 2409.18839 • Published 4 days ago • 16
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published 5 days ago • 42
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published 5 days ago • 30
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 5 days ago • 28
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Paper • 2409.17422 • Published 6 days ago • 21
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Paper • 2409.18124 • Published 5 days ago • 20
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling Paper • 2409.14683 • Published 8 days ago • 8
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts Paper • 2405.19893 • Published May 30 • 29
Ovis: Structural Embedding Alignment for Multimodal Large Language Model Paper • 2405.20797 • Published May 31 • 24
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published 6 days ago • 56
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published 6 days ago • 85
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors Paper • 2409.17058 • Published 6 days ago • 9
Game4Loc: A UAV Geo-Localization Benchmark from Game Data Paper • 2409.16925 • Published 6 days ago • 6
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published 7 days ago • 38
Present and Future Generalization of Synthetic Image Detectors Paper • 2409.14128 • Published 10 days ago • 18
MonoFormer: One Transformer for Both Diffusion and Autoregression Paper • 2409.16280 • Published 7 days ago • 16
MaskBit: Embedding-free Image Generation via Bit Tokens Paper • 2409.16211 • Published 7 days ago • 13
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published 7 days ago • 28
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation Paper • 2409.16283 • Published 7 days ago • 6
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control Paper • 2409.12192 • Published 13 days ago • 4
Seeing Faces in Things: A Model and Dataset for Pareidolia Paper • 2409.16143 • Published 7 days ago • 14
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning Paper • 2409.14674 • Published 8 days ago • 40
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Paper • 2409.15278 • Published 8 days ago • 21
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs Paper • 2409.14988 • Published 8 days ago • 20
Phantom of Latent for Large Language and Vision Models Paper • 2409.14713 • Published 8 days ago • 26
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published 11 days ago • 45
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published 11 days ago • 64
Portrait Video Editing Empowered by Multimodal Generative Priors Paper • 2409.13591 • Published 11 days ago • 15
Colorful Diffuse Intrinsic Image Decomposition in the Wild Paper • 2409.13690 • Published 11 days ago • 12
Temporally Aligned Audio for Video with Autoregression Paper • 2409.13689 • Published 11 days ago • 7
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published 12 days ago • 46
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 12 days ago • 126
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation Paper • 2409.12576 • Published 12 days ago • 14
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published 12 days ago • 20
LVCD: Reference-based Lineart Video Colorization with Diffusion Models Paper • 2409.12960 • Published 12 days ago • 20
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation Paper • 2409.12532 • Published 12 days ago • 5
MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions Paper • 2409.12958 • Published 12 days ago • 6
FlexiTex: Enhancing Texture Generation with Visual Guidance Paper • 2409.12431 • Published 12 days ago • 9
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer Paper • 2409.08425 • Published 19 days ago • 9
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Paper • 2409.11564 • Published 14 days ago • 18
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models Paper • 2409.12139 • Published 13 days ago • 11