Can InstructBLIP process videos?

by UncleanCode - opened

I recently looked at the source of blip2_vicuna-instruct7b in the Salesforce/LAVIS repository and found code for handling videos. I don't know whether this is also in the Hugging Face InstructBLIP model. So I'm asking: can InstructBLIP handle videos, and if so, how do I go about it?


Thanks for your interest in InstructBLIP. Support for videos is not yet present in the Transformers library. Did the authors release any checkpoints trained on video?

I'm not aware of any at the moment. I'll check whether there are. What I saw was just a line of code for handling videos with a low frame count.

Also interested in processing videos.

Hi @UncleanCode @louis030195

Can you share the snippet for handling videos from the original authors? It can probably be adapted a bit to work with the Transformers model.
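
In the meantime, here is a rough sketch of one piece such an adaptation would likely need: picking a small number of evenly spaced frames from a clip, so that each frame can be passed to the model as an ordinary image. This is not the LAVIS code. The function name `sample_frame_indices` and the choice of taking the middle of each segment are assumptions made for illustration.

```python
def sample_frame_indices(num_frames_total, num_frames_wanted):
    """Return evenly spaced frame indices for a clip.

    Hypothetical helper, not part of LAVIS or Transformers.
    The clip is split into `num_frames_wanted` equal segments and
    the frame nearest the middle of each segment is selected.
    """
    if num_frames_total <= 0 or num_frames_wanted <= 0:
        raise ValueError("frame counts must be positive")
    # A clip shorter than the request is returned in full.
    if num_frames_wanted >= num_frames_total:
        return list(range(num_frames_total))
    step = num_frames_total / num_frames_wanted
    return [int(step * i + step / 2) for i in range(num_frames_wanted)]


# Example: keep 4 frames out of a 100-frame clip.
indices = sample_frame_indices(100, 4)
print(indices)
```

The selected frames could then be decoded with whatever video reader you already use and fed to the processor one image at a time. How the per-frame outputs should be combined depends on the original snippet.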


I'm trying to run the demo from the page at the end, and the model does not generate text. Instead it gives this:

Loading checkpoint shards: 100%|██████████| 4/4 [00:24<00:00, 6.10s/it]
/home/tanya.kaintura/Project/myenv/lib/python3.8/site-packages/transformers/generation/ UserWarning: do_sample is set to False. However, top_p is set to 0.9 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
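
The UserWarning in the log says that `top_p` is only used when `do_sample=True`. Below is a minimal sketch of making the generation arguments consistent before calling `generate`. The helper name `consistent_generation_kwargs` and the example values are assumptions for illustration, not taken from the demo. This only addresses the warning. Nothing in the thread establishes whether the warning is related to the `?` output.

```python
# Flags that Transformers only uses in sample-based generation modes.
SAMPLING_ONLY = ("top_p", "top_k", "temperature")


def consistent_generation_kwargs(kwargs):
    """Drop sampling-only flags when do_sample is not enabled.

    Hypothetical helper, not part of the Transformers API.
    Returns a new dict and leaves the input untouched.
    """
    cleaned = dict(kwargs)
    if not cleaned.get("do_sample", False):
        for name in SAMPLING_ONLY:
            cleaned.pop(name, None)
    return cleaned


# Example values are assumptions, loosely modelled on the warning above.
demo_kwargs = {"do_sample": False, "num_beams": 5, "max_length": 256, "top_p": 0.9}

greedy_kwargs = consistent_generation_kwargs(demo_kwargs)  # top_p removed
sampled_kwargs = consistent_generation_kwargs({**demo_kwargs, "do_sample": True})  # top_p kept
print(greedy_kwargs)
print(sampled_kwargs)
```

Either dict can then be passed on as `model.generate(**inputs, **greedy_kwargs)`.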


For videos I recommend taking a look at VideoBLIP:
