CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Paper • 2409.19291 • Published 8 days ago • 13
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment Paper • 2410.01679 • Published 3 days ago • 17
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation Paper • 2410.02458 • Published 2 days ago • 8
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? Paper • 2410.02115 • Published 3 days ago • 8
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Paper • 2410.02367 • Published 3 days ago • 12
MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis Paper • 2410.02103 • Published 3 days ago • 8
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published 3 days ago • 21
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published 2 days ago • 29
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published 2 days ago • 46
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs Paper • 2410.01518 • Published 3 days ago • 2
FactAlign: Long-form Factuality Alignment of Large Language Models Paper • 2410.01691 • Published 3 days ago • 8
Quantifying Generalization Complexity for Large Language Models Paper • 2410.01769 • Published 3 days ago • 12
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis Paper • 2409.20059 • Published 6 days ago • 15
LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks Paper • 2410.01744 • Published 3 days ago • 21
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation Paper • 2410.01680 • Published 3 days ago • 27
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning Paper • 2410.01044 • Published 4 days ago • 34
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation Paper • 2410.00890 • Published 4 days ago • 14
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published 6 days ago • 48
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published 5 days ago • 27
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models Paper • 2409.20551 • Published 5 days ago • 13
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published 5 days ago • 43
MinerU: An Open-Source Solution for Precise Document Content Extraction Paper • 2409.18839 • Published 8 days ago • 22
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult Paper • 2409.17545 • Published 10 days ago • 16
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making Paper • 2409.16686 • Published 11 days ago • 7
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models Paper • 2409.17066 • Published 10 days ago • 22
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends Paper • 2409.14195 • Published 14 days ago • 10
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Paper • 2409.17422 • Published 10 days ago • 22
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Paper • 2409.18124 • Published 9 days ago • 23
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 9 days ago • 34
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published 10 days ago • 43
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published 9 days ago • 32
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published 16 days ago • 23
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 16 days ago • 128
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published 16 days ago • 35
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published 17 days ago • 46
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Paper • 2409.11564 • Published 18 days ago • 18
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published 17 days ago • 35
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 17 days ago • 69
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models Paper • 2409.11136 • Published 18 days ago • 21
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published 18 days ago • 23
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published 19 days ago • 17
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B Paper • 2409.11055 • Published 19 days ago • 16
Single-Layer Learnable Activation for Implicit Neural Representation (SL^{2}A-INR) Paper • 2409.10836 • Published 19 days ago • 4
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published 18 days ago • 13
Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records Paper • 2409.07012 • Published 25 days ago • 3
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models Paper • 2409.06277 • Published 26 days ago • 14