Collections
Discover the best community collections!
Collections including paper arxiv:2410.24211
-
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Paper • 2410.02740 • Published • 52 -
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
Paper • 2410.01215 • Published • 30 -
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published • 103 -
EuroLLM: Multilingual Language Models for Europe
Paper • 2409.16235 • Published • 24
-
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 108 -
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning
Paper • 2408.07931 • Published • 19 -
DELTA: Dense Efficient Long-range 3D Tracking for any video
Paper • 2410.24211 • Published • 8
-
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Paper • 2407.21705 • Published • 25 -
TrackGo: A Flexible and Efficient Method for Controllable Video Generation
Paper • 2408.11475 • Published • 17 -
TVG: A Training-free Transition Video Generation Method with Diffusion Models
Paper • 2408.13413 • Published • 13 -
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Paper • 2409.18964 • Published • 25
-
LocalMamba: Visual State Space Model with Windowed Selective Scan
Paper • 2403.09338 • Published • 7 -
GiT: Towards Generalist Vision Transformer through Universal Language Interface
Paper • 2403.09394 • Published • 25 -
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Paper • 2402.19479 • Published • 32 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 26