---
language:
- es
metrics:
- accuracy
- f1
pipeline_tag: visual-question-answering
---

# Model Card for Model ID

This is a multimodal model for visual question answering (VQA) in Spanish.

**GitHub**: https://github.com/pvbastidas/spanish-vqa

## Performance

These are the training results over 5 epochs.

**Text Transformer**: MarIA

**Image Transformer**: BEiT

| Epoch | Step | Training Loss | Eval Loss | Eval WUPS | Eval Accuracy | Eval F1 | Trainable Parameters |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | 624 | 5.046 | 4.231 | 0.173 | 0.135 | 0.006 | 211M |
| 2 | 1248 | 4.198 | 3.896 | 0.224 | 0.198 | 0.013 | 211M |
| 3 | 1872 | 3.834 | 3.729 | 0.260 | 0.236 | 0.024 | 211M |
| 4 | 2496 | 3.569 | 3.598 | 0.272 | 0.249 | 0.029 | 211M |
| 5 | 4680 | 3.358 | 3.566 | 0.274 | 0.251 | 0.030 | 211M |
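
Besides accuracy and F1, the table reports WUPS, a score based on Wu-Palmer similarity that gives partial credit to predicted answers that are semantically close to the ground truth. The snippet below is a minimal sketch of the standard WUPS definition, not this repository's evaluation code. It assumes each answer is a set of words and takes the word-similarity function as a parameter, because the similarity backend (for example Wu-Palmer over a Spanish WordNet) and the threshold used for `eval_wups` are not stated here. The names `wups` and `thresholded` are hypothetical.

```python
from typing import Callable, Sequence, Set

# A word-level similarity in [0, 1], e.g. Wu-Palmer similarity (assumed backend).
Similarity = Callable[[str, str], float]


def thresholded(sim: Similarity, threshold: float) -> Similarity:
    """Scale similarities below `threshold` by 0.1, as in the WUPS definition."""
    def wrapped(a: str, b: str) -> float:
        s = sim(a, b)
        return s if s >= threshold else 0.1 * s
    return wrapped


def _directed(src: Set[str], dst: Set[str], sim: Similarity) -> float:
    """Product over words in `src` of their best match in `dst`."""
    score = 1.0
    for a in src:
        score *= max((sim(a, t) for t in dst), default=0.0)
    return score


def wups(predictions: Sequence[Set[str]], truths: Sequence[Set[str]],
         sim: Similarity, threshold: float = 0.9) -> float:
    """Average over questions of the min of both directed match scores."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not predictions:
        return 0.0
    s = thresholded(sim, threshold)
    total = 0.0
    for pred, truth in zip(predictions, truths):
        total += min(_directed(pred, truth, s), _directed(truth, pred, s))
    return total / len(predictions)


if __name__ == "__main__":
    # Toy similarity for illustration only: exact string match.
    exact = lambda a, b: 1.0 if a == b else 0.0
    print(wups([{"rojo"}, {"azul"}], [{"rojo"}, {"verde"}], exact))  # 0.5
```

With a real semantic similarity, WUPS rewards near-miss answers, so it is typically higher than exact-match accuracy. This is consistent with the table, where `eval_wups` sits slightly above `eval_acc` at every epoch.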