Update README.md
README.md
**🖼️ When ViT meets GPT-2 📝**

An image captioning model, [ViT-GPT2](https://huggingface.co/flax-community/vit-gpt2/tree/main), built by combining the ViT model with a French GPT2 model.

Part of the [Huggingface JAX/Flax event](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/).

The GPT2 model source code is modified so that it can accept the ViT encoder's output.
The pretrained weights of both models are loaded, and a new set of randomly initialized cross-attention weights is added.

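The README does not show the modified GPT2 code. As a rough illustration only, here is a minimal sketch of what a cross-attention sublayer does in an encoder-decoder setup: the decoder's hidden states (queries) attend over the encoder's outputs (keys and values). It is written in plain JAX rather than Flax, and the function names `init_cross_attention` and `cross_attention`, the parameter layout, and the 0.02 initialization scale are assumptions for illustration, not this repository's code. Layer norm, residual connections, dropout and batching are omitted.

```python
import jax
import jax.numpy as jnp


def init_cross_attention(key, d_model, d_enc):
    """Hypothetical parameter layout: randomly initialized Q/K/V/output projections.

    Queries come from the decoder (size d_model); keys and values are projected
    from the encoder output (size d_enc) into the decoder's hidden size.
    """
    kq, kk, kv, ko = jax.random.split(key, 4)
    scale = 0.02  # assumed init std (GPT2Config's default initializer_range)
    return {
        "w_q": scale * jax.random.normal(kq, (d_model, d_model)),
        "w_k": scale * jax.random.normal(kk, (d_enc, d_model)),
        "w_v": scale * jax.random.normal(kv, (d_enc, d_model)),
        "w_o": scale * jax.random.normal(ko, (d_model, d_model)),
    }


def cross_attention(params, decoder_hidden, encoder_out, num_heads):
    """Multi-head cross-attention from decoder tokens to encoder outputs.

    decoder_hidden: (seq_len, d_model), e.g. GPT2 hidden states
    encoder_out:    (num_patches, d_enc), e.g. ViT patch embeddings
    """
    seq_len, d_model = decoder_hidden.shape
    head_dim = d_model // num_heads

    def split_heads(x):
        # (length, d_model) -> (num_heads, length, head_dim)
        return x.reshape(x.shape[0], num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(decoder_hidden @ params["w_q"])
    k = split_heads(encoder_out @ params["w_k"])
    v = split_heads(encoder_out @ params["w_v"])

    # No causal mask: every caption token may attend to every image patch.
    scores = q @ k.transpose(0, 2, 1) / jnp.sqrt(head_dim)  # (heads, seq_len, num_patches)
    weights = jax.nn.softmax(scores, axis=-1)
    context = weights @ v  # (heads, seq_len, head_dim)
    context = context.transpose(1, 0, 2).reshape(seq_len, d_model)
    return context @ params["w_o"]


# Toy sizes for illustration; 197 = 196 patches + [CLS] for ViT-Base/16 at 224x224.
key = jax.random.PRNGKey(0)
k_params, k_dec, k_enc = jax.random.split(key, 3)
d_model, d_enc, num_heads = 64, 32, 4
params = init_cross_attention(k_params, d_model, d_enc)
decoder_hidden = jax.random.normal(k_dec, (10, d_model))  # 10 caption tokens
encoder_out = jax.random.normal(k_enc, (197, d_enc))
out = cross_attention(params, decoder_hidden, encoder_out, num_heads)
print(out.shape)  # (10, 64)
```

With real checkpoints, `d_model` and `d_enc` would be the hidden sizes of the GPT2 and ViT models. In a full decoder, a sublayer like this would typically sit inside each GPT2 block after the masked self-attention, which is where the new, randomly initialized weights come in.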
The model is trained on 65,000 images from the COCO dataset for about 1,500 steps (batch\_size=256), with the original English captions translated into French for training.

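A quick back-of-the-envelope check of the training budget, assuming every step uses a full batch of 256 examples (the step count is given as "about" 1,500, so these figures are approximate):

$$
1500 \times 256 = 384{,}000 \ \text{examples}, \qquad \frac{384{,}000}{65{,}000} \approx 5.9
$$

So each of the 65,000 images is seen roughly 5.9 times on average. Whether that amounts to about 5.9 epochs or fewer depends on whether one training example is one image or one image-caption pair (COCO provides several captions per image), which the README does not specify.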
A HuggingFace Space demo for this model: [🖼️ French Image Captioning Demo 📝](https://huggingface.co/spaces/flax-community/image-caption-french)