kpyu/video-blip-flan-t5-xl-ego4d

kpyu committed on May 17, 2023

Commit b494c30 • 1 Parent(s): 205d387

Update README.md

Files changed (1)

README.md +29 -0

@@ -1,3 +1,32 @@
 ---
+language: en
 license: mit
+tags:
+- vision
+- image-to-text
+- video-to-text
+- image-captioning
+- video-captioning
+- visual-question-answering
+pipeline_tag: image-to-text
 ---
+# VideoBLIP, Flan T5-xl, fine-tuned on Ego4D
+This VideoBLIP model builds on [BLIP-2](https://arxiv.org/abs/2301.12597) and uses [Flan T5-xl](https://huggingface.co/google/flan-t5-xl) (a large language model with roughly 3 billion parameters) as its LLM backbone.
+## Model description
+VideoBLIP is an augmented BLIP-2 that can handle videos.
+## Bias, Risks, Limitations, and Ethical Considerations
+VideoBLIP uses an off-the-shelf Flan-T5 as its language model, so it inherits the same risks and limitations as [Flan-T5](https://arxiv.org/pdf/2210.11416.pdf):
+> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.
+VideoBLIP has not been tested in real-world applications and should not be deployed directly in any application. Researchers should first carefully assess the safety and fairness of the model in the specific context in which it will be deployed.
+## How to use
+For code examples, please refer to the [official repository](https://github.com/yukw777/VideoBLIP).