## Abstract
This project focuses on Spanish image captioning. Most existing datasets and models for this task work with English-only image-text pairs. Our aim is to show that a CLIP Vision + Marian model, built from a pre-trained image encoder and a Spanish translation text checkpoint, can be trained to perform reasonably well on this task.
Due to the lack of good-quality Spanish data, we translate subsets of the Conceptual 12M dataset into Spanish using the Marian MT `Helsinki-NLP/opus-mt-en-es` model. With better-translated captions and hyperparameter tuning, we expect to see higher performance.
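
As a rough illustration of the caption translation step, the sketch below batches English captions through `Helsinki-NLP/opus-mt-en-es` using the `transformers` Marian classes. It is not the project's actual preprocessing script: the function names `load_marian_translator` and `translate_captions`, the default `batch_size`, and the use of PyTorch tensors are assumptions made for this example.

```python
from typing import Callable, Iterable, List


def load_marian_translator(
    model_name: str = "Helsinki-NLP/opus-mt-en-es",
) -> Callable[[List[str]], List[str]]:
    """Return a function that translates a batch of English strings to Spanish.

    Hypothetical helper; assumes `transformers` and PyTorch are installed.
    """
    # Imported lazily so the batching helper below works without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(batch: List[str]) -> List[str]:
        # Pad to the longest caption in the batch and truncate overlong ones.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**inputs)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate


def translate_captions(
    captions: Iterable[str],
    translate_fn: Callable[[List[str]], List[str]],
    batch_size: int = 16,  # assumed value, not taken from the project
) -> List[str]:
    """Translate captions in fixed-size batches, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    captions = list(captions)
    translated: List[str] = []
    for start in range(0, len(captions), batch_size):
        translated.extend(translate_fn(captions[start:start + batch_size]))
    return translated
```

A call such as `translate_captions(english_captions, load_marian_translator())` would download the checkpoint on first use. Passing the translator in as a function keeps the batching logic independent of the model, so it can be checked with a stub before running over a large subset.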