Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types Paper • 2409.09269 • Published Sep 14 • 7
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13 • 47
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models Paper • 2405.13974 • Published May 22 • 9
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 118
Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task Article • By danaaubakirova • Published May 16 • 17