ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer Paper • 2410.00086 • Published 2 days ago • 6
Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models Paper • 2410.00231 • Published 2 days ago • 3
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation Paper • 2410.00890 • Published 1 day ago • 7
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published 1 day ago • 15
Visual Question Decomposition on Multimodal Large Language Models Paper • 2409.19339 • Published 4 days ago • 5
Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers Paper • 2409.20537 • Published 2 days ago • 10
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models Paper • 2409.20551 • Published 2 days ago • 10
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models Paper • 2409.18943 • Published 5 days ago • 22
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published 2 days ago • 33
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published 16 days ago • 17
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B Paper • 2409.11055 • Published 15 days ago • 16
Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts Paper • 2409.13449 • Published 12 days ago • 7
Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study Paper • 2409.17580 • Published 6 days ago • 6
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends Paper • 2409.14195 • Published 11 days ago • 10
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Paper • 2409.18124 • Published 6 days ago • 23
Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction Paper • 2409.18121 • Published 6 days ago • 7
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image Paper • 2409.17280 • Published 7 days ago • 8
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction Paper • 2409.17422 • Published 7 days ago • 22
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 6 days ago • 30
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness Paper • 2409.18125 • Published 6 days ago • 32
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models Paper • 2409.17481 • Published 7 days ago • 42
NoTeeline: Supporting Real-Time Notetaking from Keypoints with Large Language Models Paper • 2409.16493 • Published 8 days ago • 7
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale Paper • 2409.16299 • Published 23 days ago • 9
TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans Paper • 2409.16666 • Published 7 days ago • 5
Synchronize Dual Hands for Physics-Based Dexterous Guitar Playing Paper • 2409.16629 • Published 8 days ago • 9
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion Paper • 2409.17145 • Published 7 days ago • 11
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published 7 days ago • 87
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published 7 days ago • 57
RRM: Robust Reward Model Training Mitigates Reward Hacking Paper • 2409.13156 • Published 13 days ago • 3
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Paper • 2409.16040 • Published 8 days ago • 9
OmniBench: Towards The Future of Universal Omni-Language Models Paper • 2409.15272 • Published 9 days ago • 24
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control Paper • 2409.12192 • Published 14 days ago • 4
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation Paper • 2409.16283 • Published 8 days ago • 6
MonoFormer: One Transformer for Both Diffusion and Autoregression Paper • 2409.16280 • Published 8 days ago • 17
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published 8 days ago • 39
SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending Paper • 2409.13926 • Published 12 days ago • 4
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Paper • 2409.15278 • Published 9 days ago • 21
Phantom of Latent for Large Language and Vision Models Paper • 2409.14713 • Published 10 days ago • 26
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? Paper • 2409.15277 • Published 9 days ago • 34
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments Paper • 2409.11276 • Published 15 days ago • 6
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation Paper • 2409.12941 • Published 13 days ago • 19
Colorful Diffuse Intrinsic Image Decomposition in the Wild Paper • 2409.13690 • Published 12 days ago • 12
Portrait Video Editing Empowered by Multimodal Generative Priors Paper • 2409.13591 • Published 12 days ago • 15
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published 12 days ago • 64
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse Paper • 2409.11242 • Published 15 days ago • 4
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark Paper • 2409.11363 • Published 15 days ago • 2
RoMath: A Mathematical Reasoning Benchmark in Romanian Paper • 2409.11074 • Published 15 days ago • 3
Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning Paper • 2409.12001 • Published 14 days ago • 3
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer Paper • 2409.08425 • Published 20 days ago • 9