metadata

language:
  - es
metrics:
  - accuracy
  - f1
pipeline_tag: visual-question-answering

Model Card for Model ID

This is a multimodal model for visual question answering (VQA) in Spanish.

Performance

These are the training results for each of the 5 epochs.

Text Transformer: MarIA

Image Transformer: BEiT

Epoch	Step	Loss	eval_loss	eval_wups	eval_acc	eval_f1	# Trainable Parameters
1	624	5.046	4.231	0.173	0.135	0.006	211M
2	1248	4.198	3.896	0.224	0.198	0.013	211M
3	1872	3.834	3.729	0.260	0.236	0.024	211M
4	2496	3.569	3.598	0.272	0.249	0.029	211M
5	4680	3.358	3.566	0.274	0.251	0.030	211M
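
The card does not say how eval_acc and eval_f1 are computed. As a minimal sketch, assuming VQA is treated as classification over a fixed answer vocabulary and that F1 is macro-averaged (both are assumptions, not confirmed by this card), the two metrics could be computed as follows. The function names and the toy answers are hypothetical and are not taken from the model's evaluation set.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the reference answer."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every answer class seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy answers, for illustration only.
y_true = ["rojo", "dos", "perro", "mesa"]
y_pred = ["rojo", "rojo", "perro", "rojo"]

print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Under these assumptions, a model that favors a few frequent answers can score much higher on accuracy than on macro F1, because every answer class it never predicts correctly contributes an F1 of zero to the average.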