- One missing piece in Vision and Language: A Survey on Comics Understanding (Paper 2409.09502)
- Comics Datasets Framework: Mix of Comics datasets for detection benchmarking (Paper 2407.03540)
- CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding (Paper 2407.03550)
Vision, Language and Reading
non-profit
AI & ML interests
Multimodal AI, Document Understanding, Reading Systems.
Organization Card
Vision, Language, and Reading Group
Based at the Computer Vision Center (CVC) in Barcelona, Spain.
The VLR research team conducts fundamental research and technology transfer at the intersection of vision, language, and reading systems. We devise reading systems for text in the wild and incorporate scene text semantics into computer vision tasks such as captioning, visual question answering, cross-modal retrieval, and fine-grained classification. In parallel, we advance document understanding, with a special interest in end-to-end approaches to Document Visual Question Answering.