DavidJung committed
Commit e49739d
1 Parent(s): a5a23fc

Update README.md

Files changed (1)
  1. README.md +60 -2
README.md CHANGED
---
tags:
- clip
- e-commerce
- fashion
- multimodal retrieval
library_name: open_clip
pipeline_tag: zero-shot-image-classification
license: apache-2.0
datasets:
- Marqo/atlas
- Marqo/deepfashion-inshop
- Marqo/deepfashion-multimodal
- Marqo/fashion200k
- Marqo/iMaterialist
- Marqo/KAGL
- Marqo/polyvore
language:
- en
metrics:
- precision
- recall
- MRR
---
# Marqo FashionCLIP Model Card
Marqo-FashionCLIP leverages Generalised Contrastive Learning ([GCL](https://www.marqo.ai/blog/generalized-contrastive-learning-for-multi-modal-retrieval-and-ranking)), which allows the model to be trained not only on text descriptions but also on categories, styles, colors, materials, keywords, and fine details, so it provides highly relevant search results for fashion products.
The model was fine-tuned from ViT-B-16 (laion2b_s34b_b88k).

**Github Page**: [Marqo-FashionCLIP](https://github.com/marqo-ai/marqo-FashionCLIP)

## Usage
The model can be used seamlessly with [OpenCLIP](https://github.com/mlfoundations/open_clip):

```python
import open_clip

model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:Marqo/marqo-fashionCLIP')
tokenizer = open_clip.get_tokenizer('hf-hub:Marqo/marqo-fashionCLIP')
```
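
To sanity-check the loaded model, a minimal zero-shot classification sketch in the usual OpenCLIP style is shown below; the image path and candidate labels are illustrative placeholders, not part of the official examples.

```python
import torch
from PIL import Image

model.eval()  # the model is returned in train mode by default

# Preprocess one product image (path is a placeholder) and tokenize candidate labels.
image = preprocess_val(Image.open("example_product.jpg")).unsqueeze(0)
text = tokenizer(["a red shirt", "a pair of jeans", "a leather handbag"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize embeddings so the dot product is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probabilities:", text_probs)
```

For retrieval, the same normalized `encode_image`/`encode_text` outputs can be precomputed and indexed; only a similarity search over those embeddings is needed at query time.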

## Benchmark Results
Average evaluation results on 6 public multimodal fashion datasets ([Atlas](https://huggingface.co/datasets/Marqo/atlas), [DeepFashion (In-shop)](https://huggingface.co/datasets/Marqo/deepfashion-inshop), [DeepFashion (Multimodal)](https://huggingface.co/datasets/Marqo/deepfashion-multimodal), [Fashion200k](https://huggingface.co/datasets/Marqo/fashion200k), [KAGL](https://huggingface.co/datasets/Marqo/KAGL), and [Polyvore](https://huggingface.co/datasets/Marqo/polyvore)) are reported below:

**Text-To-Image (Averaged across 6 datasets)**
| Model                      | AvgRecall | Recall@1  | Recall@10 | MRR       |
|----------------------------|-----------|-----------|-----------|-----------|
| FashionCLIP2.0             | 0.163     | 0.077     | 0.249     | 0.165     |
| Marqo-FashionCLIP          | **0.192** | **0.094** | **0.290** | **0.200** |
| OpenFashionCLIP            | 0.132     | 0.060     | 0.204     | 0.135     |
| ViT-B-16-laion2b_s34b_b88k | 0.174     | 0.088     | 0.261     | 0.180     |

**Category-To-Product (Averaged across 5 datasets)**
| Model                      | AvgP      | P@1       | P@10      | MRR       |
|----------------------------|-----------|-----------|-----------|-----------|
| FashionCLIP2.0             | 0.684     | 0.681     | **0.686** | 0.741     |
| Marqo-FashionCLIP          | **0.705** | **0.734** | 0.676     | **0.776** |
| OpenFashionCLIP            | 0.646     | 0.653     | 0.639     | 0.720     |
| ViT-B-16-laion2b_s34b_b88k | 0.662     | 0.673     | 0.652     | 0.743     |

**Sub-Category-To-Product (Averaged across 4 datasets)**
| Model                      | AvgP      | P@1       | P@10      | MRR       |
|----------------------------|-----------|-----------|-----------|-----------|
| FashionCLIP2.0             | 0.657     | 0.676     | 0.638     | 0.733     |
| Marqo-FashionCLIP          | **0.707** | **0.747** | **0.667** | **0.772** |
| OpenFashionCLIP            | 0.598     | 0.619     | 0.578     | 0.689     |
| ViT-B-16-laion2b_s34b_b88k | 0.638     | 0.651     | 0.624     | 0.712     |
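
The metrics above (Recall@K, Precision@K, MRR) are standard retrieval metrics. As a reference point only, a minimal sketch of how they can be computed for text-to-image retrieval from precomputed, L2-normalized embeddings is given below; this is an illustration assuming one relevant gallery item per query, not the evaluation code used to produce these numbers.

```python
import numpy as np

def retrieval_ranks(query_emb: np.ndarray, gallery_emb: np.ndarray, gt_idx: np.ndarray) -> np.ndarray:
    """0-based rank of each query's ground-truth item, assuming L2-normalized embeddings
    and exactly one correct gallery item per query (given by gt_idx)."""
    sims = query_emb @ gallery_emb.T                     # (num_queries, num_gallery) cosine similarities
    order = np.argsort(-sims, axis=1)                    # gallery indices sorted best-first per query
    return np.argmax(order == gt_idx[:, None], axis=1)   # position of the correct item in that ordering

def recall_at_k(ranks: np.ndarray, k: int) -> float:
    """Fraction of queries whose correct item appears in the top k results."""
    return float(np.mean(ranks < k))

def mean_reciprocal_rank(ranks: np.ndarray) -> float:
    """Mean of 1 / (1 + rank) over all queries."""
    return float(np.mean(1.0 / (1.0 + ranks)))
```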