---
tags:
- vision
- clip
- clip4clip
- video
pipeline_tag: text-to-video
---

# Model Card

## Details

This model was trained via CLIP4Clip, a CLIP-based video retrieval method, based on this [paper](https://arxiv.org/pdf/2104.08860.pdf) and [code](https://github.com/ArrowLuo/CLIP4Clip).

This model was trained on 150k videos from the [WebVid Dataset](https://m-bain.github.io/webvid-dataset/) (a large-scale dataset of short videos with textual descriptions sourced from the web).

We adapted the weights of the CLIP model obtained from our training to the model implemented in [clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32) and made a few changes to the last layers.

### Use with Transformers

```python
import numpy as np
import torch
from transformers import AutoTokenizer, CLIPTextModelWithProjection

search_sentence = "a basketball player performing a slam dunk"

model = CLIPTextModelWithProjection.from_pretrained("Diangle/clip4clip-webvid")
tokenizer = AutoTokenizer.from_pretrained("Diangle/clip4clip-webvid")

inputs = tokenizer(text=search_sentence, return_tensors="pt", padding=True)

# With return_dict=False the model returns a tuple; outputs[1] is the last hidden state.
outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    return_dict=False,
)

# Adding special projection and changing last layers:
# project every token's hidden state, then keep the end-of-text token of each
# sequence (the position of the highest token id).
text_projection = model.state_dict()["text_projection.weight"]
text_embeds = outputs[1] @ text_projection
final_output = text_embeds[torch.arange(text_embeds.shape[0]), inputs["input_ids"].argmax(dim=-1)]

# Normalizing the embeddings:
final_output = final_output / final_output.norm(dim=-1, keepdim=True)
final_output = final_output.cpu().detach().numpy()
# Divide by the squared norm, which is approximately 1 after the step above.
sequence_output = final_output / np.sum(final_output**2, axis=1, keepdims=True)
print("sequence_output: ", sequence_output)
```

## Model Use

### Intended Use

This model is intended for video retrieval; see, for example, **this space**.
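As a rough illustration of the retrieval use case, the sketch below ranks a set of video embeddings against a text embedding by cosine similarity. It is not taken from the original training or demo code: the function name `rank_videos`, the `top_k` parameter, and the randomly generated embeddings are hypothetical stand-ins. In practice, the text embedding would be the `sequence_output` computed in the example above, and the video embeddings would come from a separate video encoding step.

```python
import numpy as np


def rank_videos(text_embedding, video_embeddings, top_k=3):
    """Return indices and cosine scores of the top_k videos closest to the text query.

    Hypothetical helper for illustration; not part of the model's published code.
    """
    # Accept either a (dim,) vector or a (1, dim) array such as sequence_output.
    text = np.asarray(text_embedding, dtype=np.float64).reshape(-1)
    videos = np.asarray(video_embeddings, dtype=np.float64)

    # L2-normalize both sides so the dot product equals cosine similarity.
    text = text / np.linalg.norm(text)
    videos = videos / np.linalg.norm(videos, axis=1, keepdims=True)

    scores = videos @ text
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return order, scores[order]


# Assumption: random vectors stand in for real 512-dimensional embeddings.
rng = np.random.default_rng(0)
video_embeddings = rng.normal(size=(5, 512))
# Make the query a slightly noisy copy of video 2 so it has a clear best match.
text_embedding = (video_embeddings[2] + 0.1 * rng.normal(size=512)).reshape(1, -1)

top_indices, top_scores = rank_videos(text_embedding, video_embeddings)
print("top indices:", top_indices)
print("top scores:", top_scores)
```

Normalizing inside the helper keeps the ranking correct even if the stored video embeddings were not normalized beforehand.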
### Extra Information

For video embedding, there is an additional notebook that describes how to embed videos.

## Performance and Limitations

### Performance

We have evaluated the performance

## Limitations

## Feedback

### Where to send questions or comments about the model

Please use [this Google Form](https://forms.gle/Uv7afRH5dvY34ZEs9).