We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published 1 day ago • 56
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published 5 days ago • 49
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published 4 days ago • 68
Simulating Classroom Education with LLM-Empowered Agents Paper • 2406.19226 • Published 5 days ago • 27
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published 5 days ago • 49
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs Paper • 2406.18629 • Published 6 days ago • 36
Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published 7 days ago • 45
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs Paper • 2406.18521 • Published 6 days ago • 25
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models Paper • 2406.18510 • Published 6 days ago • 7
MatchTime: Towards Automatic Soccer Game Commentary Generation Paper • 2406.18530 • Published 6 days ago • 11
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models Paper • 2406.17294 • Published 8 days ago • 8
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation Paper • 2406.18522 • Published 6 days ago • 37
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning Paper • 2406.17770 • Published 7 days ago • 18
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models Paper • 2311.07919 • Published Nov 14, 2023 • 8
MotionBooth: Motion-Aware Customized Text-to-Video Generation Paper • 2406.17758 • Published 7 days ago • 16
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models Paper • 2406.16863 • Published 8 days ago • 10
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published 7 days ago • 71
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models Paper • 2406.15704 • Published 11 days ago • 5
IRASim: Learning Interactive Real-Robot Action Simulators Paper • 2406.14540 • Published 12 days ago • 6
ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians Paper • 2406.16815 • Published 8 days ago • 7
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published 10 days ago • 41
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models Paper • 2406.16338 • Published 9 days ago • 22
LongVA Collection Long Context Transfer From Text To Vision: https://lmms-lab.github.io/posts/longva/ • 5 items • Updated 7 days ago • 9
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published 8 days ago • 53
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation Paper • 2406.15252 • Published 11 days ago • 13
EvTexture: Event-driven Texture Enhancement for Video Super-Resolution Paper • 2406.13457 • Published 14 days ago • 12
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs Paper • 2406.15319 • Published 11 days ago • 55
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Paper • 2406.13542 • Published 13 days ago • 15
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning Paper • 2406.14130 • Published 13 days ago • 10
Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published 12 days ago • 75
HARE: HumAn pRiors, a key to small language model Efficiency Paper • 2406.11410 • Published 16 days ago • 37
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks Paper • 2406.12066 • Published 15 days ago • 7
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing Paper • 2406.06523 • Published 22 days ago • 48
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding Paper • 2406.14515 • Published 12 days ago • 27
Depth Anywhere: Enhancing 360 Monocular Depth Estimation via Perspective Distillation and Unlabeled Data Augmentation Paper • 2406.12849 • Published 14 days ago • 48
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI Paper • 2406.12753 • Published 14 days ago • 14
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published 14 days ago • 26
VoCo-LLaMA: Towards Vision Compression with Large Language Models Paper • 2406.12275 • Published 15 days ago • 28
Bootstrapping Language Models with DPO Implicit Rewards Paper • 2406.09760 • Published 19 days ago • 36
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published 15 days ago • 54
Pandora: Towards General World Model with Natural Language Actions and Video States Paper • 2406.09455 • Published 20 days ago • 12
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences Paper • 2406.11069 • Published 16 days ago • 11
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs Paper • 2406.11833 • Published 15 days ago • 60
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers Paper • 2406.10163 • Published 18 days ago • 27
mDPO: Conditional Preference Optimization for Multimodal Large Language Models Paper • 2406.11839 • Published 15 days ago • 36
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery Paper • 2406.08587 • Published 20 days ago • 15
Article The CVPR Survival Guide: Discovering Research That's Interesting to YOU! By harpreetsahota • 18 days ago • 9
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published 21 days ago • 30
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models Paper • 2406.06563 • Published 30 days ago • 17