daki97/visualbert_finetuned_easy_vqa


AEnigmista commited on Jun 12, 2023

Commit

2b0340d

•

1 Parent(s): 1d41bac

Create README.md


README.md +43 -0


	@@ -0,0 +1,43 @@

+---
+license: apache-2.0
+language:
+- en
+tags:
+- visual_bert
+- vqa
+- easy_vqa
+---
+# Visual BERT finetuned on easy_vqa
+This model is a fine-tuned version of VisualBERT, trained on the easy_vqa dataset. The dataset is available in the following [GitHub repo](https://github.com/vzhou842/easy-VQA/tree/master/easy_vqa).
+## VisualBERT
+VisualBERT is a multi-modal vision and language model. It can be used for tasks such as visual question answering, multiple choice and visual reasoning.
+For more information on VisualBERT, see the [documentation](https://huggingface.co/docs/transformers/model_doc/visual_bert#overview).
+## Dataset
+The easy_vqa dataset, on which the model was fine-tuned, can be installed with pip via the easy_vqa package:
+```bash
+pip install easy_vqa
+```
+Each instance of the dataset consists of a question, its answer (a label) and the id of the image the question refers to.
+Each image is 64x64 and contains a shape (rectangle, triangle or circle) filled with a single color (blue, red, green, yellow, black, gray, brown or teal)
+at a random position.
+The questions ask about the shape (e.g. What is the blue shape?), the color of the shape (e.g. What color is the triangle?)
+and the presence of a particular shape or color, in both affirmative and negative form (e.g. Is there a red shape?).
+Therefore, the possible answers to a question are the three shapes, the eight colors, yes and no, for 13 labels in total.
+The dataset's [repo](https://github.com/vzhou842/easy-VQA/tree/master/easy_vqa) documents the package functions for loading the images and the questions,
+as well as a utility script for generating new instances of the dataset in case data augmentation is needed.
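The answer set described above is small enough to write out by hand. The following is a minimal sketch of a label mapping built from it; the names `ANSWERS`, `label2id`, `id2label` and `decode` are illustrative, and the label order is an assumption that may not match the order used by the fine-tuned checkpoint.

```python
# Answer vocabulary as described in the text: 3 shapes + 8 colors + yes/no.
SHAPES = ["rectangle", "triangle", "circle"]
COLORS = ["blue", "red", "green", "yellow", "black", "gray", "brown", "teal"]
ANSWERS = SHAPES + COLORS + ["yes", "no"]

# Hypothetical label order, for illustration only; the fine-tuned
# checkpoint may assign different ids to the same answers.
label2id = {answer: idx for idx, answer in enumerate(ANSWERS)}
id2label = {idx: answer for answer, idx in label2id.items()}


def decode(scores):
    """Return the answer whose score is the highest in `scores`."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return id2label[best]
```

With a mapping like this, a vector of 13 per-class scores can be turned back into a text answer by taking the index of the highest score. Before relying on any particular order, compare it with the label order actually used during fine-tuning.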
+## How to Use
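+Load the processor and the model with the following code:
+```python
+from transformers import ViltProcessor, VisualBertForQuestionAnswering
+# The processor comes from a ViLT checkpoint; the model is the fine-tuned VisualBERT.
+processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+model = VisualBertForQuestionAnswering.from_pretrained("daki97/visualbert_finetuned_easy_vqa")
+```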
+## COLAB Demo
+An example of using the model with the easy_vqa dataset is available [here](https://colab.research.google.com/drive/1yQfmz6wiSasRl6z-DmP-X403r3lZFqQS#scrollTo=HeVnH8BKkYCI).