Update README.md
README.md CHANGED

@@ -16,7 +16,7 @@ widget:
 
 ### Summary
 
-
+A Vision-and-Language Pre-training (VLP) model for a fashion-related downstream task, Visual Question Answering (VQA). The related model, ViLT, was proposed in [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) and incorporates text embeddings into a Vision Transformer (ViT), allowing it to have a minimal design for VLP.
 
 ### Model Description
 
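The summary added above hinges on ViLT's minimal design: instead of a CNN or region-proposal backbone, flattened image patches are linearly projected and concatenated with text token embeddings into one sequence for a single shared transformer. The toy NumPy sketch below illustrates only that input-pipeline idea; every dimension, weight, and token id here is an illustrative assumption, not the actual ViLT implementation (ViLT-B/32 uses 32x32 patches and hidden size 768).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration only)
d = 64        # embedding dim (ViLT-B/32 actually uses 768)
patch = 32    # patch size, as in ViLT-B/32
H = W = 224   # image size
vocab = 1000  # toy vocabulary

# 1. Image -> flattened 32x32 patches, linearly projected (no CNN backbone)
image = rng.standard_normal((H, W, 3))
patches = image.reshape(H // patch, patch, W // patch, patch, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)
W_proj = rng.standard_normal((patch * patch * 3, d)) * 0.02
img_tokens = patches @ W_proj            # (49, d): 7x7 grid of patch embeddings

# 2. Text -> token embeddings via a lookup table
E = rng.standard_normal((vocab, d)) * 0.02
text_ids = np.array([5, 42, 7])          # hypothetical ids, e.g. "what color dress"
txt_tokens = E[text_ids]                 # (3, d)

# 3. Add modality-type embeddings and concatenate: one joint sequence
#    that a single shared transformer encoder would consume.
type_img = rng.standard_normal(d) * 0.02
type_txt = rng.standard_normal(d) * 0.02
seq = np.concatenate([txt_tokens + type_txt, img_tokens + type_img], axis=0)
print(seq.shape)  # (3 text + 49 patch tokens, d)
```

The point of the sketch is step 3: because both modalities land in one sequence, all cross-modal interaction happens inside the transformer itself, which is what lets ViLT drop the heavy visual backbone used by earlier VLP models.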