Diangle committed on
Commit 486c8d1
1 Parent(s): e72a24f

Update README.md

Files changed (1):
  1. README.md +10 -6
README.md CHANGED
@@ -29,9 +29,7 @@ search_sentence = "a basketball player performing a slam dunk"
 model = CLIPTextModelWithProjection.from_pretrained("Diangle/clip4clip-webvid")
 tokenizer = AutoTokenizer.from_pretrained("Diangle/clip4clip-webvid")
 
-
 inputs = tokenizer(text=search_sentence , return_tensors="pt", padding=True)
-
 outputs = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], return_dict=False)
 
 # Special projection and changing last layers:
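The normalization step quoted in the next hunk's context line (`sequence_output = final_output / np.sum(final_output**2, axis=1, keepdims=True)`) can be sketched in isolation with NumPy. The `final_output` array below is a stand-in for illustration, not the model's real projected text features:

```python
import numpy as np

# Stand-in for the projected text features (batch_size x dim); in the
# model card these come from CLIPTextModelWithProjection.
final_output = np.array([[3.0, 4.0],
                         [1.0, 0.0]])

# Per-row normalization exactly as written in the model card: each row is
# divided by the sum of its squared components (i.e. by ||row||**2, so the
# result has length 1/||row||, not unit length).
sequence_output = final_output / np.sum(final_output**2, axis=1, keepdims=True)

print("sequence_output:", sequence_output)  # rows: [0.12, 0.16] and [1.0, 0.0]
```

Note that dividing by the squared norm differs from standard L2 normalization (which divides by `np.linalg.norm(final_output, axis=1, keepdims=True)`); the sketch follows the model card's formula as written.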
@@ -46,15 +44,15 @@ sequence_output = final_output / np.sum(final_output**2, axis=1, keepdims=True)
 print("sequence_output: ", sequence_output)
 ```
 
-
 ## Model Use
 
 ### Intended Use
 
-This model is intended to use for video retrival, look for example **this space**.
+This model is intended for video retrieval; see for example this [**space**](https://huggingface.co/spaces/Diangle/Clip4Clip-webvid).
 
 ### Extra Information
 
+We have
 For video embedding there is an extra notebook that describes how to embed videos.
 
@@ -63,11 +61,17 @@ For video embedding there is an extra notebook that describes how to embed videos.
 
 ### Performance
 
-We have evaluated the performance
+We have evaluated the performance of different models on the last 10k video clips from the Webvid database.
+
+| Model | R@1 | R@5 | R@10 | MR | MedianR | MeanR |
+|-------|-----|-----|------|-----|---------|---------|
+| Zero-shot CLIP weights | 37.16 | 62.10 | 71.16 | 3.0 | 3.0 | 42.2128 |
+| CLIP4Clip weights trained on MSR-VTT | 38.38 | 62.89 | 72.01 | 3.0 | 3.0 | 39.3023 |
+| CLIP4Clip trained on 150k Webvid | 50.74 | 77.30 | 85.05 | 1.0 | 1.0 | 14.9535 |
+| Binarized CLIP4Clip trained on 150k Webvid with rerank100 | 50.56 | 76.39 | 83.51 | 1.0 | 1.0 | 43.2964 |
 
-## Limitations
+For more information about the evaluation you can look at this [notebook].
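The R@K, median-rank, and mean-rank numbers added in the table above are standard retrieval metrics. A minimal sketch of how they can be computed from a text-to-video similarity matrix (toy data here, not the actual Webvid evaluation; ground truth is assumed to be the diagonal):

```python
import numpy as np

def retrieval_metrics(sim):
    """sim[i, j] = similarity of text query i to video j; the correct
    video for query i is assumed to be video i (diagonal ground truth)."""
    # Sort candidates per query by descending similarity, then find the
    # 1-based rank of the correct (diagonal) video for each query.
    order = np.argsort(-sim, axis=1)
    ranks = np.where(order == np.arange(len(sim))[:, None])[1] + 1
    return {
        "R@1": float(np.mean(ranks <= 1) * 100),
        "R@5": float(np.mean(ranks <= 5) * 100),
        "R@10": float(np.mean(ranks <= 10) * 100),
        "MedianR": float(np.median(ranks)),
        "MeanR": float(np.mean(ranks)),
    }

# Toy example: an identity similarity matrix means every query ranks its
# own video first, so all recalls are perfect.
metrics = retrieval_metrics(np.eye(20))
print(metrics)  # R@1 == 100.0, MedianR == 1.0
```

In the real evaluation, `sim` would be the cosine similarities between the normalized text embeddings and the video embeddings from the extra notebook.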