Update README.md
README.md
```python
from transformers import AutoTokenizer, CLIPTextModelWithProjection
import numpy as np

search_sentence = "a basketball player performing a slam dunk"

model = CLIPTextModelWithProjection.from_pretrained("Diangle/clip4clip-webvid")
tokenizer = AutoTokenizer.from_pretrained("Diangle/clip4clip-webvid")

inputs = tokenizer(text=search_sentence, return_tensors="pt", padding=True)
outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], return_dict=False)

# Special projection and changing last layers:
# ... (projection steps elided in this excerpt) ...
sequence_output = final_output / np.sum(final_output**2, axis=1, keepdims=True)
print("sequence_output: ", sequence_output)
```
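Once a normalized text embedding is in hand, retrieval reduces to ranking video embeddings by similarity. A minimal numpy sketch with toy, already-normalized embeddings standing in for real model outputs (the helper `rank_videos` is ours for illustration, not part of the model's API):

```python
import numpy as np

def rank_videos(text_emb: np.ndarray, video_embs: np.ndarray) -> np.ndarray:
    """Return video indices sorted by descending cosine similarity."""
    sims = video_embs @ text_emb  # dot products == cosine sim for unit vectors
    return np.argsort(-sims)

text_emb = np.array([0.6, 0.8])                              # toy query embedding
video_embs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])  # toy video embeddings
order = rank_videos(text_emb, video_embs)
print(order)  # best match first
```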

## Model Use

### Intended Use

This model is intended to be used for video retrieval; see for example this [**space**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid).

### Extra Information

For video embedding, there is an extra notebook that describes how to embed videos.
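As a rough sketch of what such an embedding step can look like (the actual notebook may differ, and the frame values below are toy stand-ins for real CLIP frame features): CLIP4Clip-style video embeddings are typically built by encoding sampled frames and mean-pooling the normalized frame embeddings.

```python
import numpy as np

# Toy stand-ins for per-frame image embeddings of shape (n_frames, dim);
# real values would come from a CLIP vision encoder over sampled frames.
frame_embs = np.array([[3.0, 4.0],
                       [0.0, 5.0]])

# Normalize each frame embedding, mean-pool across frames, renormalize.
frame_embs = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
video_emb = frame_embs.mean(axis=0)
video_emb = video_emb / np.linalg.norm(video_emb)
print(video_emb)  # a single unit-length video embedding
```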

### Performance

We have evaluated the performance of different models on the last 10k video clips from the Webvid database.

| Model | R1 | R5 | R10 | MR | MedianR | MeanR |
|-------|----|----|-----|----|---------|-------|
| Zero-shot CLIP weights | 37.16 | 62.10 | 71.16 | 3.0 | 3.0 | 42.2128 |
| CLIP4Clip weights trained on MSR-VTT | 38.38 | 62.89 | 72.01 | 3.0 | 3.0 | 39.3023 |
| CLIP4Clip trained on 150k Webvid | 50.74 | 77.30 | 85.05 | 1.0 | 1.0 | 14.9535 |
| Binarized CLIP4Clip trained on 150k Webvid with rerank100 | 50.56 | 76.39 | 83.51 | 1.0 | 1.0 | 43.2964 |

For more information about the evaluation you can look at this [notebook].
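To make the table's metrics concrete, here is a small illustration of how R@K (as a percentage), MedianR, and MeanR are computed from the 1-based rank of the correct video for each query. The ranks below are toy values for demonstration, not the actual evaluation data:

```python
import numpy as np

# Toy 1-based ranks of the correct video for ten queries.
ranks = np.array([1, 3, 2, 15, 1, 8, 1, 4, 2, 1])

def recall_at_k(ranks: np.ndarray, k: int) -> float:
    """Percentage of queries whose correct video is ranked in the top k."""
    return 100.0 * float(np.mean(ranks <= k))

r1 = recall_at_k(ranks, 1)
r5 = recall_at_k(ranks, 5)
r10 = recall_at_k(ranks, 10)
median_r = float(np.median(ranks))  # MedianR: median rank of the correct video
mean_r = float(np.mean(ranks))      # MeanR: mean rank of the correct video
print(r1, r5, r10, median_r, mean_r)
```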