CLAIR-A: Leveraging Large Language Models to Judge Audio Captions Paper • 2409.12962 • Published Sep 19 • 2
Visual Haystacks: Answering Harder Questions About Sets of Images Paper • 2407.13766 • Published Jul 18 • 2 • 4
Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition Paper • 2403.19822 • Published Mar 28
Virtual Personas for Language Models via an Anthology of Backstories Paper • 2407.06576 • Published Jul 9
Post: 🚨 Launching The Visual Haystacks (VHs) Benchmark: the first "visual-centric" Needle-In-A-Haystack (NIAH) benchmark to assess LMMs' capability in long-context visual retrieval and reasoning. Check it out!
Dataset: tsunghanwu/visual_haystacks
Project page: https://visual-haystacks.github.io/
Paper: https://arxiv.org/abs/2407.13766
Code: https://github.com/visual-haystacks/vhs_benchmark
Article: Are We Ready for Multi-Image Reasoning? Launching VHs: The Visual Haystacks Benchmark! By davidchan • Jul 23 • 3
Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition Paper • 2401.02417 • Published Jan 4 • 1
Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification Paper • 2312.14378 • Published Dec 22, 2023