---
library_name: transformers
base_model: "cross-encoder/ms-marco-MiniLM-L-12-v2"
model-index:
- name: esci-ms-marco-MiniLM-L-12-v2
  results:
  - task:
      type: Reranking
    metrics:
    - type: mrr@10
      value: 91.74
    - type: ndcg@10
      value: 84.83
tags: ["cross-encoder", "search", "product-search"]
---

# Model Description

<!-- Provide a quick summary of what the model is/does. -->

A cross-encoder fine-tuned from `cross-encoder/ms-marco-MiniLM-L-12-v2` on the Amazon ESCI dataset for product-search reranking.

# Usage

## Transformers

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->


```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch import no_grad

model_name = "lv12/esci-ms-marco-MiniLM-L-12-v2"

queries = [
    "adidas shoes",
    "adidas sambas",
    "girls sandals",
    "backpacks",
    "shoes",
    "mustard blouse",
]
documents = [
    "Nike Air Max, with air cushion",
    "Adidas Ultraboost, the best boost you can get",
    "Women's sandals wide width 9",
    "Girl's surf backpack",
    "Fresh watermelon, all you can eat",
    "Floral yellow dress with frills and lace",
]

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer(
    queries,
    documents,
    padding=True,
    truncation=True,
    return_tensors="pt",
)

model.eval()
with no_grad():
    scores = model(**inputs).logits.detach().cpu().numpy()
    print(scores)
```

## Sentence Transformers

```python
from sentence_transformers import CrossEncoder

model_name = "lv12/esci-ms-marco-MiniLM-L-12-v2"

queries = [
    "adidas shoes",
    "adidas sambas",
    "girls sandals",
    "backpacks",
    "shoes",
    "mustard blouse",
]
documents = [
    "Nike Air Max, with air cushion",
    "Adidas Ultraboost, the best boost you can get",
    "Women's sandals wide width 9",
    "Girl's surf backpack",
    "Fresh watermelon, all you can eat",
    "Floral yellow dress with frills and lace",
]

model = CrossEncoder(model_name, max_length=512)
scores = model.predict([(q, d) for q, d in zip(queries, documents)])
print(scores)
```
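In a search setting, the scores are typically used to rerank one query's candidate products. A minimal sketch, using illustrative stand-in scores rather than actual model output:

```python
# Rerank a single query's candidate documents by cross-encoder score.
# The query, candidates, and scores below are hypothetical examples;
# in practice `scores` would come from model.predict on (query, doc) pairs.
query = "adidas shoes"
candidates = [
    "Nike Air Max, with air cushion",
    "Adidas Ultraboost, the best boost you can get",
    "Adidas Samba leather sneakers",
]
scores = [0.12, 0.87, 0.95]  # hypothetical relevance scores

# Sort candidates by descending score to get the final ranking.
reranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
for doc, score in reranked:
    print(f"{score:.2f}  {doc}")
```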

## Training

Trained with MSE loss on `<query, document>` pairs, using the ESCI relevance `grade` as the label.

```python
from torch import nn
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# <query, document> pairs with the relevance grade as a float label
train_samples = [
    InputExample(texts=["query 1", "document 1"], label=0.3),
    InputExample(texts=["query 1", "document 2"], label=0.8),
    InputExample(texts=["query 2", "document 2"], label=0.1),
]

# Illustrative hyperparameters; not the exact configuration used for this model.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2", num_labels=1, max_length=512)
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
model.fit(train_dataloader=train_dataloader, loss_fct=nn.MSELoss(), epochs=1, warmup_steps=100)
```