TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models Paper • 2410.10818 • Published Oct 14 • 14
Attention Prompting on Image for Large Vision-Language Models Paper • 2409.17143 • Published Sep 25 • 7
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally Paper • 2409.08270 • Published Sep 12 • 9
Gated Slot Attention for Efficient Linear-Time Sequence Modeling Paper • 2409.07146 • Published Sep 11 • 19
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities Paper • 2408.00765 • Published Aug 1 • 12 • 9