## Challenges and Technical Difficulties
We faced challenges at every step, despite having example scripts and models in Flax prepared by the 🤗 team.
- Translating our dataset, Conceptual 12M, took 2-3 days with mBART (we didn't have Marian at the time). The major bottleneck was implementing the translation efficiently. We first tried `mtranslate`, but it turned out to be too slow, even with multiprocessing.
- Translations produced by deep learning models aren't as "perfect" as those from translation APIs like Google and Yandex, which could hurt the performance of a model trained on them.
- We wrote the model and config classes for our model from scratch, basing them on the `CLIP Vision` and `mBART` implementations in Flax. The major challenge here was that the ViT embeddings have to be used inside the BERT embeddings class.
- We got only about 1.5 days of training time on TPUs because of the challenges above, and we were unable to perform hyperparameter tuning. Our [loss curves on the pre-training model](https://huggingface.co/flax-community/spanish-image-captioning/tensorboard) show that training hasn't converged, so further training could still improve the BLEU scores.
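
To illustrate the translation bottleneck mentioned above: one common way to speed up model-based translation is to send captions to the model in fixed-size batches rather than one at a time. The snippet below is a minimal sketch of that batching pattern only, not our actual pipeline. The names `batched`, `translate_captions`, and `fake_translate_batch` are hypothetical, and `fake_translate_batch` is a stand-in for a real mBART (or Marian) call that would tokenize a batch, run generation, and decode the result.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def translate_captions(
    captions: Iterable[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 64,
) -> List[str]:
    """Translate captions batch by batch, preserving input order.

    `translate_batch` is supplied by the caller (hypothetical here); it must
    return exactly one translation per input caption.
    """
    results: List[str] = []
    for batch in batched(captions, batch_size):
        translated = translate_batch(batch)
        if len(translated) != len(batch):
            raise ValueError("translate_batch must return one output per input")
        results.extend(translated)
    return results


def fake_translate_batch(batch: List[str]) -> List[str]:
    # Placeholder for a real model call; it only tags each caption.
    return [f"es:{text}" for text in batch]


if __name__ == "__main__":
    captions = ["a dog on a beach", "a cat on a sofa", "a bird in a tree"]
    print(translate_captions(captions, fake_translate_batch, batch_size=2))
```

With a real model, the batch size would be tuned to the accelerator's memory, and sorting captions by length before batching can reduce padding overhead.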