---
language:
- es
metrics:
- accuracy
- f1
pipeline_tag: visual-question-answering
---
# Model Card for Model ID
This is a multimodal model for visual question answering (VQA) in Spanish.
GitHub: https://github.com/pvbastidas/spanish-vqa
## Performance
These are the training results after each of the 5 epochs.
- Text transformer: MarIA
- Image transformer: BEiT
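The card does not say how the MarIA and BEiT outputs are combined. As a hedged illustration only, the sketch below assumes a common late-fusion design: a pooled text vector and a pooled image vector are concatenated and fed to a linear classifier over a fixed answer vocabulary. The functions `concat_fusion`, `linear` and `predict_answer`, as well as all vectors, weights and answers, are hypothetical toy values; in a real model they would come from the two backbones and a trained head.

```python
from typing import List, Sequence


def concat_fusion(text_vec: Sequence[float], image_vec: Sequence[float]) -> List[float]:
    # ASSUMPTION: late fusion by simple concatenation of the two pooled vectors.
    return list(text_vec) + list(image_vec)


def linear(x: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    # One logit per candidate answer: logits[j] = sum_i weights[j][i] * x[i] + bias[j]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def predict_answer(
    text_vec: Sequence[float],
    image_vec: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
    answers: Sequence[str],
) -> str:
    fused = concat_fusion(text_vec, image_vec)
    if len(weights) != len(answers) or any(len(row) != len(fused) for row in weights):
        raise ValueError("weights must have shape (num_answers x fused_dim)")
    logits = linear(fused, weights, bias)
    # VQA treated as classification: return the highest-scoring answer.
    best = max(range(len(logits)), key=logits.__getitem__)
    return answers[best]


# Hypothetical toy values standing in for pooled encoder outputs and a trained head.
text_vec = [1.0, 0.0]   # stand-in for the text encoder's pooled output
image_vec = [0.0, 1.0]  # stand-in for the image encoder's pooled output
answers = ["sí", "no", "dos"]
weights = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.2, 0.2, 0.2, 0.2],
]
bias = [0.0, 0.0, 0.0]

print(predict_answer(text_vec, image_vec, weights, bias, answers))  # sí
```

Concatenation is the simplest fusion choice; the repository linked above may use a different mechanism, such as cross-attention between the two encoders.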
Epoch | Step | Training Loss | eval_loss | eval_wups | eval_acc | eval_f1 | # Trainable Parameters |
---|---|---|---|---|---|---|---|
1 | 624 | 5.046 | 4.231 | 0.173 | 0.135 | 0.006 | 211M |
2 | 1248 | 4.198 | 3.896 | 0.224 | 0.198 | 0.013 | 211M |
3 | 1872 | 3.834 | 3.729 | 0.260 | 0.236 | 0.024 | 211M |
4 | 2496 | 3.569 | 3.598 | 0.272 | 0.249 | 0.029 | 211M |
5 | 4680 | 3.358 | 3.566 | 0.274 | 0.251 | 0.030 | 211M |
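The card does not define `eval_acc`, `eval_f1` or `eval_wups`. The sketch below shows one plausible reading, assuming each answer is a single class label: exact-match accuracy, macro-averaged F1 over answer labels, and a single-answer simplification of WUPS, the Wu-Palmer-based score introduced for the DAQUAR dataset, with the commonly used 0.9 threshold. The averaging scheme and threshold are assumptions, not taken from the card. `toy_wup` is a hypothetical stand-in with made-up similarity values; a real evaluation would compute Wu-Palmer similarity over a Spanish lexical resource, for example NLTK's `wup_similarity` on WordNet synsets.

```python
from collections import Counter
from typing import Callable, Sequence


def _check(preds: Sequence[str], golds: Sequence[str]) -> None:
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and the same length")


def accuracy(preds: Sequence[str], golds: Sequence[str]) -> float:
    # Exact match: fraction of questions whose predicted answer equals the gold answer.
    _check(preds, golds)
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def macro_f1(preds: Sequence[str], golds: Sequence[str]) -> float:
    # ASSUMPTION: F1 is computed per answer label, then averaged with equal weight (macro).
    _check(preds, golds)
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, g in zip(preds, golds):
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    labels = set(preds) | set(golds)
    # F1 = 2TP / (2TP + FP + FN); the denominator is positive for every label in the union.
    scores = [2 * tp[lab] / (2 * tp[lab] + fp[lab] + fn[lab]) for lab in labels]
    return sum(scores) / len(scores)


def wups(
    preds: Sequence[str],
    golds: Sequence[str],
    wup: Callable[[str, str], float],
    threshold: float = 0.9,  # ASSUMPTION: threshold not stated in the card
) -> float:
    # Single-answer WUPS: mean of the down-weighted Wu-Palmer similarity per question.
    _check(preds, golds)

    def score(a: str, b: str) -> float:
        s = 1.0 if a == b else wup(a, b)
        # Similarities below the threshold are scaled by 0.1.
        return s if s >= threshold else 0.1 * s

    return sum(score(p, g) for p, g in zip(preds, golds)) / len(golds)


# Hypothetical similarity values for illustration only (NOT real WordNet scores).
TOY_WUP = {frozenset({"azul", "rojo"}): 0.92, frozenset({"no", "sí"}): 0.5}


def toy_wup(a: str, b: str) -> float:
    return TOY_WUP.get(frozenset({a, b}), 0.0)


golds = ["perro", "rojo", "dos", "sí"]
preds = ["perro", "azul", "dos", "no"]
print(accuracy(preds, golds))       # 0.5
print(macro_f1(preds, golds))       # 0.333... (2 of 6 labels have F1 = 1)
print(wups(preds, golds, toy_wup))  # about 0.7425
```

Under a macro average every answer label counts equally, so a model that mostly gets frequent answers right can score far lower on F1 than on accuracy; if the card's F1 is macro-averaged, that would be consistent with the gap between `eval_f1` and `eval_acc` in the table.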