dandelin
/

vilt-b32-finetuned-vqa

Visual Question Answering

nielsr HF staff committed on Jan 23, 2022

Commit

4355f59

•

1 Parent(s): 1d05211

Add code example

Files changed (1)

README.md +24 -5

README.md CHANGED

@@ -9,17 +9,36 @@ Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by
 Disclaimer: The team releasing ViLT did not write a model card for this model so this model card has been written by the Hugging Face team.
-## Model description
-(to do)
 ## Intended uses & limitations
 You can use the raw model for visual question answering.
 ### How to use
-(to do)
 ## Training data

 Disclaimer: The team releasing ViLT did not write a model card for this model so this model card has been written by the Hugging Face team.
 ## Intended uses & limitations
 You can use the raw model for visual question answering.
 ### How to use
+Here is how to use this model in PyTorch:
+```python
+from transformers import ViltProcessor, ViltForQuestionAnswering
+import requests
+from PIL import Image
+# prepare image + question
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+text = "How many cats are there?"
+processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
+# prepare inputs
+encoding = processor(image, text, return_tensors="pt")
+# forward pass
+outputs = model(**encoding)
+logits = outputs.logits
+idx = logits.argmax(-1).item()
+print("Predicted answer:", model.config.id2label[idx])
+```
 ## Training data
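
The example added in this commit ends by taking the argmax over `logits` and looking the index up in `model.config.id2label`, which yields only the single best answer. A possible extension is to rank several candidate answers. The sketch below is illustrative and not part of the commit: `top_k_answers`, `toy_id2label`, and `toy_logits` are hypothetical names with made-up values, and the per-label sigmoid assumes the answers are scored independently (multi-label style) rather than with a softmax.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def top_k_answers(logits, id2label, k=3):
    """Return the k highest-scoring (label, score) pairs.

    logits   -- flat list of floats, one per answer label
    id2label -- dict mapping label index to answer string
    """
    # Assumption: each label is scored independently, so apply a sigmoid per logit.
    scored = [(id2label[i], sigmoid(x)) for i, x in enumerate(logits)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up values for illustration only; they do not come from the model.
toy_id2label = {0: "0", 1: "1", 2: "2", 3: "3"}
toy_logits = [-4.0, -1.5, 3.2, -2.0]

for label, score in top_k_answers(toy_logits, toy_id2label, k=2):
    print(f"{label}: {score:.3f}")
```

With the real model, the inputs would presumably be `outputs.logits[0].tolist()` and `model.config.id2label` from the example in the diff.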