Update README.md
README.md
```python
from transformers import AutoTokenizer, CLIPTextModelWithProjection
import numpy as np

search_sentence = "a basketball player performing a slam dunk"

model = CLIPTextModelWithProjection.from_pretrained("Diangle/clip4clip-webvid")
tokenizer = AutoTokenizer.from_pretrained("Diangle/clip4clip-webvid")

inputs = tokenizer(text=search_sentence, return_tensors="pt", padding=True)
outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], return_dict=False)

# Special projection and changing last layers:
# ... (projection steps elided in this excerpt) ...
sequence_output = final_output / np.sum(final_output**2, axis=1, keepdims=True)
print("sequence_output: ", sequence_output)
```
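Once a normalized text embedding is in hand, retrieval reduces to ranking video embeddings by similarity. A minimal numpy sketch with toy, already-normalized embeddings standing in for real model outputs (the helper `rank_videos` is ours for illustration, not part of the model's API):

```python
import numpy as np

def rank_videos(text_emb: np.ndarray, video_embs: np.ndarray) -> np.ndarray:
    """Return video indices sorted by descending cosine similarity."""
    sims = video_embs @ text_emb  # dot products == cosine sim for unit vectors
    return np.argsort(-sims)

text_emb = np.array([0.6, 0.8])                              # toy query embedding
video_embs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])  # toy video embeddings
order = rank_videos(text_emb, video_embs)
print(order)  # best match first
```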

## Model Use

### Intended Use

This model is intended to be used for video retrieval; see for example this [**space**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid).

### Extra Information

For video embedding, there is an extra notebook that describes how to embed videos.
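As a rough sketch of what such an embedding step can look like (the actual notebook may differ, and the frame values below are toy stand-ins for real CLIP frame features): CLIP4Clip-style video embeddings are typically built by encoding sampled frames and mean-pooling the normalized frame embeddings.

```python
import numpy as np

# Toy stand-ins for per-frame image embeddings of shape (n_frames, dim);
# real values would come from a CLIP vision encoder over sampled frames.
frame_embs = np.array([[3.0, 4.0],
                       [0.0, 5.0]])

# Normalize each frame embedding, mean-pool across frames, renormalize.
frame_embs = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
video_emb = frame_embs.mean(axis=0)
video_emb = video_emb / np.linalg.norm(video_emb)
print(video_emb)  # a single unit-length video embedding
```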

### Performance

We have evaluated the performance of different models on the last 10k video clips from the Webvid database.

| Model | R1 | R5 | R10 | MR | MedianR | MeanR |
|-------|----|----|-----|----|---------|-------|
| Zero-shot CLIP weights | 37.16 | 62.10 | 71.16 | 3.0 | 3.0 | 42.2128 |
| CLIP4Clip weights trained on MSR-VTT | 38.38 | 62.89 | 72.01 | 3.0 | 3.0 | 39.3023 |
| CLIP4Clip trained on 150k Webvid | 50.74 | 77.30 | 85.05 | 1.0 | 1.0 | 14.9535 |
| Binarized CLIP4Clip trained on 150k Webvid with rerank100 | 50.56 | 76.39 | 83.51 | 1.0 | 1.0 | 43.2964 |

For more information about the evaluation you can look at this [notebook].
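To make the table's metrics concrete, here is a small illustration of how R@K (as a percentage), MedianR, and MeanR are computed from the 1-based rank of the correct video for each query. The ranks below are toy values for demonstration, not the actual evaluation data:

```python
import numpy as np

# Toy 1-based ranks of the correct video for ten queries.
ranks = np.array([1, 3, 2, 15, 1, 8, 1, 4, 2, 1])

def recall_at_k(ranks: np.ndarray, k: int) -> float:
    """Percentage of queries whose correct video is ranked in the top k."""
    return 100.0 * float(np.mean(ranks <= k))

r1 = recall_at_k(ranks, 1)
r5 = recall_at_k(ranks, 5)
r10 = recall_at_k(ranks, 10)
median_r = float(np.median(ranks))  # MedianR: median rank of the correct video
mean_r = float(np.mean(ranks))      # MeanR: mean rank of the correct video
print(r1, r5, r10, median_r, mean_r)
```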