---
tags:
- clip
- e-commerce
- fashion
- multimodal retrieval
library_name: open_clip
pipeline_tag: zero-shot-image-classification
license: apache-2.0
datasets:
- Marqo/atlas
- Marqo/deepfashion-inshop
- Marqo/deepfashion-multimodal
- Marqo/fashion200k
- Marqo/iMaterialist
- Marqo/KAGL
- Marqo/polyvore
language:
- en
metrics:
- precision
- recall
- MRR
---
# Marqo FashionCLIP Model Card

Marqo-FashionCLIP is trained with Generalised Contrastive Learning ([GCL](https://www.marqo.ai/blog/generalized-contrastive-learning-for-multi-modal-retrieval-and-ranking)), which lets the model learn not only from text descriptions but also from categories, styles, colors, materials, keywords, and fine-grained details, so that it returns highly relevant search results for fashion products.

The model was fine-tuned from ViT-B-16 (laion2b_s34b_b88k).

**GitHub page**: [Marqo-FashionCLIP](https://github.com/marqo-ai/marqo-FashionCLIP)

## Usage

The model can be loaded directly with [OpenCLIP](https://github.com/mlfoundations/open_clip):

```python
import open_clip

# Load the model together with its training and inference preprocessing transforms
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms('hf-hub:Marqo/marqo-fashionCLIP')
# Load the matching tokenizer for text inputs
tokenizer = open_clip.get_tokenizer('hf-hub:Marqo/marqo-fashionCLIP')
```
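
Once images and texts have been encoded (for example with OpenCLIP's `model.encode_image(...)` on `preprocess_val` outputs and `model.encode_text(...)` on `tokenizer(...)` outputs), both retrieval and zero-shot classification come down to comparing L2-normalized embeddings by cosine similarity. The sketch below is a minimal illustration under that assumption: it uses tiny hand-written vectors in place of real embeddings so it runs without downloading the model, and the helper names (`normalize`, `cosine`, `rank_by_cosine`, `zero_shot_probs`) are hypothetical. The scale of 100 on the similarities mirrors the zero-shot example in the OpenCLIP README; it is not a value taken from this card.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_by_cosine(query, candidates):
    """Indices of candidates sorted from most to least similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def zero_shot_probs(image_emb, label_embs, scale=100.0):
    """Softmax over scaled cosine similarities between one image and each label embedding."""
    logits = [scale * cosine(image_emb, t) for t in label_embs]
    top = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-d vectors standing in for real image/text embeddings.
image = [0.9, 0.1, 0.0]
labels = {
    "a red dress": [1.0, 0.0, 0.0],
    "a pair of sneakers": [0.0, 1.0, 0.0],
    "a leather bag": [0.0, 0.0, 1.0],
}

names = list(labels)
probs = zero_shot_probs(image, [labels[n] for n in names])
print(names[probs.index(max(probs))])                # a red dress
print(rank_by_cosine(image, list(labels.values())))  # [0, 1, 2]
```

With real embeddings the same ranking step serves text-to-image search (the text embedding is the query, product images are the candidates), and the softmax step serves zero-shot classification over a list of label prompts.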

## Benchmark Results

Evaluation results averaged over six public multimodal fashion datasets ([Atlas](https://huggingface.co/datasets/Marqo/atlas), [DeepFashion (In-shop)](https://huggingface.co/datasets/Marqo/deepfashion-inshop), [DeepFashion (Multimodal)](https://huggingface.co/datasets/Marqo/deepfashion-multimodal), [Fashion200k](https://huggingface.co/datasets/Marqo/fashion200k), [KAGL](https://huggingface.co/datasets/Marqo/KAGL), and [Polyvore](https://huggingface.co/datasets/Marqo/polyvore)) are reported below. The best score in each column is in bold.

**Text-To-Image (Averaged across 6 datasets)**

| Model                      | AvgRecall | Recall@1  | Recall@10 | MRR       |
|----------------------------|-----------|-----------|-----------|-----------|
| FashionCLIP2.0             | 0.163     | 0.077     | 0.249     | 0.165     |
| Marqo-FashionCLIP          | **0.192** | **0.094** | **0.290** | **0.200** |
| OpenFashionCLIP            | 0.132     | 0.060     | 0.204     | 0.135     |
| ViT-B-16-laion2b_s34b_b88k | 0.174     | 0.088     | 0.261     | 0.180     |
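
The card does not spell out its exact evaluation protocol, so the sketch below assumes the common definitions: for each text query, Recall@k is the fraction of that query's relevant images found in the top k results (equivalent to a hit/miss when each query has a single matching image), and MRR is the mean over queries of 1 / rank of the first relevant result. The query rankings and image IDs are made-up toy data, and the function names are hypothetical.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of one ranked list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / (1-based rank of the first relevant item), or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def evaluate(queries, k_values=(1, 10)):
    """Average Recall@k and MRR over (ranked_ids, relevant_ids) pairs."""
    n = len(queries)
    results = {
        f"Recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in queries) / n
        for k in k_values
    }
    results["MRR"] = sum(reciprocal_rank(r, rel) for r, rel in queries) / n
    return results


# Hypothetical retrieval output: image IDs ranked for each text query, plus its ground-truth image.
queries = [
    (["img3", "img7", "img1"], {"img3"}),  # correct image ranked 1st
    (["img2", "img5", "img9"], {"img5"}),  # correct image ranked 2nd
    (["img4", "img6", "img8"], {"img0"}),  # correct image not retrieved
]
print(evaluate(queries, k_values=(1, 3)))  # Recall@1 = 1/3, Recall@3 = 2/3, MRR = 0.5
```
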

**Category-To-Product (Averaged across 5 datasets)**

| Model                      | AvgP      | P@1       | P@10      | MRR       |
|----------------------------|-----------|-----------|-----------|-----------|
| FashionCLIP2.0             | 0.684     | 0.681     | **0.686** | 0.741     |
| Marqo-FashionCLIP          | **0.705** | **0.734** | 0.676     | **0.776** |
| OpenFashionCLIP            | 0.646     | 0.653     | 0.639     | 0.720     |
| ViT-B-16-laion2b_s34b_b88k | 0.662     | 0.673     | 0.652     | 0.743     |
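
In the category-to-product setting the query is presumably a category label, and a retrieved product counts as relevant when it belongs to that category. Assuming the usual definition, P@k is then the fraction of the top k products whose category matches the query. A minimal sketch with hypothetical product data (the function name and categories are illustrative only):

```python
def precision_at_k(ranked_categories, query_category, k):
    """Fraction of the top-k retrieved products whose category matches the query."""
    top = ranked_categories[:k]
    return sum(1 for c in top if c == query_category) / k


# Categories of the products returned for the query "dress", best match first (toy data).
retrieved = ["dress", "dress", "skirt", "dress", "top"]
for k in (1, 5):
    print(f"P@{k} = {precision_at_k(retrieved, 'dress', k):.2f}")  # P@1 = 1.00, then P@5 = 0.60
```

Unlike Recall@k, P@k does not depend on how many matching products exist in the whole catalogue, which suits category queries where many products are relevant.
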

**Sub-Category-To-Product (Averaged across 4 datasets)**

| Model                      | AvgP      | P@1       | P@10      | MRR       |
|----------------------------|-----------|-----------|-----------|-----------|
| FashionCLIP2.0             | 0.657     | 0.676     | 0.638     | 0.733     |
| Marqo-FashionCLIP          | **0.707** | **0.747** | **0.667** | **0.772** |
| OpenFashionCLIP            | 0.598     | 0.619     | 0.578     | 0.689     |
| ViT-B-16-laion2b_s34b_b88k | 0.638     | 0.651     | 0.624     | 0.712     |
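
In every row of the three tables, the AvgRecall and AvgP columns are consistent with the arithmetic mean of the @1 and @10 columns; for example, (0.094 + 0.290) / 2 = 0.192 for Marqo-FashionCLIP on text-to-image. This reading is inferred from the numbers rather than stated in the card. The short check below recomputes each average from the table values, allowing for rounding to three decimals; the dictionary layout and function name are illustrative.

```python
# (score@1, score@10, reported average) for each model, copied from the benchmark tables.
TABLES = {
    "Text-To-Image": {
        "FashionCLIP2.0": (0.077, 0.249, 0.163),
        "Marqo-FashionCLIP": (0.094, 0.290, 0.192),
        "OpenFashionCLIP": (0.060, 0.204, 0.132),
        "ViT-B-16-laion2b_s34b_b88k": (0.088, 0.261, 0.174),
    },
    "Category-To-Product": {
        "FashionCLIP2.0": (0.681, 0.686, 0.684),
        "Marqo-FashionCLIP": (0.734, 0.676, 0.705),
        "OpenFashionCLIP": (0.653, 0.639, 0.646),
        "ViT-B-16-laion2b_s34b_b88k": (0.673, 0.652, 0.662),
    },
    "Sub-Category-To-Product": {
        "FashionCLIP2.0": (0.676, 0.638, 0.657),
        "Marqo-FashionCLIP": (0.747, 0.667, 0.707),
        "OpenFashionCLIP": (0.619, 0.578, 0.598),
        "ViT-B-16-laion2b_s34b_b88k": (0.651, 0.624, 0.638),
    },
}


def mismatched_averages(tables, tol=0.001):
    """List rows whose reported average differs from mean(@1, @10) by more than tol.

    tol allows for the reported values being rounded to three decimals.
    """
    mismatches = []
    for task, rows in tables.items():
        for model, (at1, at10, reported) in rows.items():
            mean = (at1 + at10) / 2
            if abs(mean - reported) > tol:
                mismatches.append((task, model, round(mean, 4), reported))
    return mismatches


print(mismatched_averages(TABLES))  # []
```
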