juanpablomesa committed
Commit cdbd278
1 Parent(s): 671e922

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
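
This pooling configuration takes the hidden state of the first ([CLS]) token as the sentence embedding; combined with the Normalize module listed in modules.json below, the output is a unit-length 768-dimensional vector. A minimal sketch of the equivalent computation in plain `transformers` (illustrative only; loading the repo through Sentence Transformers, as shown in the README below, is the supported path):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative CLS pooling + L2 normalization, mirroring the config above.
tokenizer = AutoTokenizer.from_pretrained("juanpablomesa/bge-base-financial-matryoshka")
bert = AutoModel.from_pretrained("juanpablomesa/bge-base-financial-matryoshka")

batch = tokenizer(["example sentence"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = bert(**batch).last_hidden_state  # (batch, seq_len, 768)

# pooling_mode_cls_token=true: use the first token's hidden state.
embedding = token_embeddings[:, 0]
# Normalize(): unit vectors, so dot product equals cosine similarity.
embedding = torch.nn.functional.normalize(embedding, p=2, dim=1)
```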
README.md ADDED
@@ -0,0 +1,806 @@
+ ---
+ base_model: BAAI/bge-base-en-v1.5
+ datasets: []
+ language:
+ - en
+ library_name: sentence-transformers
+ license: apache-2.0
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:9600
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: The median home value in San Carlos, CA is $2,350,000.
+   sentences:
+   - What does the console property of the WorkerGlobalScope interface provide access to?
+   - What is the last sold price and date for the property at 4372 W 14th Street Dr, Greeley, CO 80634?
+   - What is the median home value in San Carlos, CA?
+ - source_sentence: The four new principals hired by Superintendent of Schools Ken Kenworthy for the Okeechobee school system are Joseph Stanley at Central Elementary, Jody Hays at Yearling Middle School, Tuuli Robinson at North Elementary, and Dr. Thelma Jackson at Seminole Elementary School.
+   sentences:
+   - Who won the gold medal in the men's 1,500m final at the speed skating World Cup?
+   - What is the purpose of the 1,2,3 bowling activity for toddlers?
+   - Who are the four new principals hired by Superintendent of Schools Ken Kenworthy for the Okeechobee school system?
+ - source_sentence: Twitter Audit is used to scan your followers and find out what percentage of them are real people.
+   sentences:
+   - What is the main product discussed in the context of fair trade?
+   - What is the software mentioned in the context suitable for?
+   - What is the purpose of the Twitter Audit tool?
+ - source_sentence: Michael Czysz made the 2011 E1pc lighter and more powerful than the 2010 version, and also improved the software controlling the bike’s D1g1tal powertrain.
+   sentences:
+   - What changes did Michael Czysz make to the 2011 E1pc compared to the 2010 version?
+   - What is the author's suggestion for leaving a legacy for future generations?
+   - What is the most affordable and reliable option to fix a MacBook according to the technician?
+ - source_sentence: HTC called the Samsung Galaxy S4 “mainstream”.
+   sentences:
+   - What is the essential aspect of the vocation to marriage according to Benedict XVI's message on the 40th Anniversary of Humanae Vitae?
+   - What did HTC announce about the Samsung Galaxy S4?
+   - What was Allan Cox's First Class Delivery launched on for his Level 1 certification flight?
+ model-index:
+ - name: BGE base Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 768
+       type: dim_768
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9675
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9791666666666666
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9829166666666667
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.98875
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9675
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3263888888888889
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.1965833333333333
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09887499999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9675
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9791666666666666
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9829166666666667
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.98875
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9776735843960416
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9741727843915341
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.974471752833939
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 512
+       type: dim_512
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9641666666666666
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9775
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9816666666666667
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.98875
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9641666666666666
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3258333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.1963333333333333
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09887499999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9641666666666666
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9775
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9816666666666667
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.98875
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9758504869144781
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9717977843915344
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9720465527215371
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9620833333333333
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9741666666666666
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9804166666666667
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.98625
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9620833333333333
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.32472222222222225
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.1960833333333333
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09862499999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9620833333333333
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9741666666666666
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9804166666666667
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.98625
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9737941784937224
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9698406084656085
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9702070899963996
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9554166666666667
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.97
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9766666666666667
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.98375
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9554166666666667
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3233333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.1953333333333333
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09837499999999999
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9554166666666667
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.97
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9766666666666667
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.98375
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.969307497603498
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9647410714285715
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9652034022263717
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9391666666666667
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9616666666666667
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9666666666666667
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9758333333333333
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9391666666666667
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3205555555555556
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.1933333333333333
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09758333333333333
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9391666666666667
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9616666666666667
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9666666666666667
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9758333333333333
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9577277779716886
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9519417989417989
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9525399354798056
+       name: Cosine Map@100
+ ---
+ 
+ # BGE base Financial Matryoshka
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
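+ 
+ For reference, an equivalent module stack can be built by hand (a sketch only; loading the published checkpoint by name, as shown below, is the normal route):
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer, models
+ 
+ # Hypothetical manual re-creation of the three modules above.
+ transformer = models.Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)
+ pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")
+ model = SentenceTransformer(modules=[transformer, pooling, models.Normalize()])
+ ```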
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("juanpablomesa/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+     'HTC called the Samsung Galaxy S4 “mainstream”.',
+     'What did HTC announce about the Samsung Galaxy S4?',
+     "What is the essential aspect of the vocation to marriage according to Benedict XVI's message on the 40th Anniversary of Humanae Vitae?",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
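+ 
+ Because the model was trained with MatryoshkaLoss at dimensions 768/512/256/128/64, embeddings can also be truncated to one of those prefixes with only a small quality drop (see the evaluation tables below). A sketch using the `truncate_dim` argument available in recent Sentence Transformers releases:
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # encode() now returns only the first 256 dimensions of each embedding.
+ model = SentenceTransformer("juanpablomesa/bge-base-financial-matryoshka", truncate_dim=256)
+ embeddings = model.encode(["HTC called the Samsung Galaxy S4 “mainstream”."])
+ print(embeddings.shape)
+ # (1, 256)
+ ```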
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### Information Retrieval
+ * Dataset: `dim_768`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.9675     |
+ | cosine_accuracy@3   | 0.9792     |
+ | cosine_accuracy@5   | 0.9829     |
+ | cosine_accuracy@10  | 0.9888     |
+ | cosine_precision@1  | 0.9675     |
+ | cosine_precision@3  | 0.3264     |
+ | cosine_precision@5  | 0.1966     |
+ | cosine_precision@10 | 0.0989     |
+ | cosine_recall@1     | 0.9675     |
+ | cosine_recall@3     | 0.9792     |
+ | cosine_recall@5     | 0.9829     |
+ | cosine_recall@10    | 0.9888     |
+ | cosine_ndcg@10      | 0.9777     |
+ | cosine_mrr@10       | 0.9742     |
+ | **cosine_map@100**  | **0.9745** |
+ 
+ #### Information Retrieval
+ * Dataset: `dim_512`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric              | Value     |
+ |:--------------------|:----------|
+ | cosine_accuracy@1   | 0.9642    |
+ | cosine_accuracy@3   | 0.9775    |
+ | cosine_accuracy@5   | 0.9817    |
+ | cosine_accuracy@10  | 0.9888    |
+ | cosine_precision@1  | 0.9642    |
+ | cosine_precision@3  | 0.3258    |
+ | cosine_precision@5  | 0.1963    |
+ | cosine_precision@10 | 0.0989    |
+ | cosine_recall@1     | 0.9642    |
+ | cosine_recall@3     | 0.9775    |
+ | cosine_recall@5     | 0.9817    |
+ | cosine_recall@10    | 0.9888    |
+ | cosine_ndcg@10      | 0.9759    |
+ | cosine_mrr@10       | 0.9718    |
+ | **cosine_map@100**  | **0.972** |
+ 
+ #### Information Retrieval
+ * Dataset: `dim_256`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.9621     |
+ | cosine_accuracy@3   | 0.9742     |
+ | cosine_accuracy@5   | 0.9804     |
+ | cosine_accuracy@10  | 0.9862     |
+ | cosine_precision@1  | 0.9621     |
+ | cosine_precision@3  | 0.3247     |
+ | cosine_precision@5  | 0.1961     |
+ | cosine_precision@10 | 0.0986     |
+ | cosine_recall@1     | 0.9621     |
+ | cosine_recall@3     | 0.9742     |
+ | cosine_recall@5     | 0.9804     |
+ | cosine_recall@10    | 0.9862     |
+ | cosine_ndcg@10      | 0.9738     |
+ | cosine_mrr@10       | 0.9698     |
+ | **cosine_map@100**  | **0.9702** |
+ 
+ #### Information Retrieval
+ * Dataset: `dim_128`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.9554     |
+ | cosine_accuracy@3   | 0.97       |
+ | cosine_accuracy@5   | 0.9767     |
+ | cosine_accuracy@10  | 0.9838     |
+ | cosine_precision@1  | 0.9554     |
+ | cosine_precision@3  | 0.3233     |
+ | cosine_precision@5  | 0.1953     |
+ | cosine_precision@10 | 0.0984     |
+ | cosine_recall@1     | 0.9554     |
+ | cosine_recall@3     | 0.97       |
+ | cosine_recall@5     | 0.9767     |
+ | cosine_recall@10    | 0.9838     |
+ | cosine_ndcg@10      | 0.9693     |
+ | cosine_mrr@10       | 0.9647     |
+ | **cosine_map@100**  | **0.9652** |
+ 
+ #### Information Retrieval
+ * Dataset: `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+ 
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.9392     |
+ | cosine_accuracy@3   | 0.9617     |
+ | cosine_accuracy@5   | 0.9667     |
+ | cosine_accuracy@10  | 0.9758     |
+ | cosine_precision@1  | 0.9392     |
+ | cosine_precision@3  | 0.3206     |
+ | cosine_precision@5  | 0.1933     |
+ | cosine_precision@10 | 0.0976     |
+ | cosine_recall@1     | 0.9392     |
+ | cosine_recall@3     | 0.9617     |
+ | cosine_recall@5     | 0.9667     |
+ | cosine_recall@10    | 0.9758     |
+ | cosine_ndcg@10      | 0.9577     |
+ | cosine_mrr@10       | 0.9519     |
+ | **cosine_map@100**  | **0.9525** |
+ 
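+ The figures above were produced with `InformationRetrievalEvaluator`, once per embedding dimension. A rough sketch of how such an evaluation is set up (the queries and corpus here are toy placeholders, not the actual evaluation split):
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+ 
+ model = SentenceTransformer("juanpablomesa/bge-base-financial-matryoshka")
+ 
+ # Toy example: query ids to text, doc ids to text, query ids to relevant doc ids.
+ queries = {"q1": "What is the median home value in San Carlos, CA?"}
+ corpus = {"d1": "The median home value in San Carlos, CA is $2,350,000."}
+ relevant_docs = {"q1": {"d1"}}
+ 
+ evaluator = InformationRetrievalEvaluator(queries=queries, corpus=corpus,
+                                           relevant_docs=relevant_docs, name="dim_768")
+ metrics = evaluator(model)  # dict with accuracy@k, precision@k, recall@k, NDCG, MRR, MAP
+ ```
+ 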
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 9,600 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | positive                                                                            | anchor                                                                             |
+   |:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                             |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 50.19 tokens</li><li>max: 435 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 18.66 tokens</li><li>max: 43 tokens</li></ul> |
+ * Samples:
+   | positive | anchor |
+   |:---------|:-------|
+   | <code>The Berry Export Summary 2028 is a dedicated export plan for the Australian strawberry, raspberry, and blackberry industries. It maps the sectors’ current position, where they want to be, high-opportunity markets, and next steps. The purpose of this plan is to grow their global presence over the next 10 years.</code> | <code>What is the Berry Export Summary 2028 and what is its purpose?</code> |
+   | <code>Benefits reported from having access to Self-supply water sources include convenience, less time spent for fetching water and access to more and better quality water. In some areas, Self-supply sources offer important added values such as water for productive use, income generation, family safety and improved food security.</code> | <code>What are some of the benefits reported from having access to Self-supply water sources?</code> |
+   | <code>The unique features of the Coolands for Twitter app include Real-Time updates without the need for a refresh button, Avatar Indicator which shows small avatars on the title bar for new messages, Direct Link for intuitive and convenient link opening, Smart Bookmark to easily return to previous reading position, and User Level Notification which allows customized notification settings for different users.</code> | <code>What are the unique features of the Coolands for Twitter app?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
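+ 
+ In code, these parameters amount to wrapping the ranking loss, roughly as follows (a sketch, assuming the Sentence Transformers 3.x loss API):
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+ 
+ model = SentenceTransformer("BAAI/bge-base-en-v1.5")
+ 
+ # MultipleNegativesRankingLoss uses in-batch negatives; MatryoshkaLoss re-applies
+ # it at each truncated embedding size so the prefixes stay useful on their own.
+ base_loss = MultipleNegativesRankingLoss(model)
+ loss = MatryoshkaLoss(model, base_loss,
+                       matryoshka_dims=[768, 512, 256, 128, 64],
+                       matryoshka_weights=[1, 1, 1, 1, 1])
+ ```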
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `tf32`: True
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+ 
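+ These map onto `SentenceTransformerTrainingArguments` roughly as follows (a sketch; the output directory and save strategy are assumptions, not taken from this commit):
+ 
+ ```python
+ from sentence_transformers import SentenceTransformerTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+ 
+ args = SentenceTransformerTrainingArguments(
+     output_dir="bge-base-financial-matryoshka",  # assumed output path
+     eval_strategy="epoch",
+     save_strategy="epoch",  # assumed; must match eval_strategy for load_best_model_at_end
+     per_device_train_batch_size=32,
+     per_device_eval_batch_size=16,
+     gradient_accumulation_steps=16,
+     learning_rate=2e-5,
+     num_train_epochs=4,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     bf16=True,
+     tf32=True,
+     load_best_model_at_end=True,
+     optim="adamw_torch_fused",
+     batch_sampler=BatchSamplers.NO_DUPLICATES,
+ )
+ ```
+ 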
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: True
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ 
+ </details>
+ 
+ ### Training Logs
+ | Epoch    | Step   | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
+ |:--------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
+ | 0.5333   | 10     | 0.6065        | -                      | -                      | -                      | -                     | -                      |
+ | 0.96     | 18     | -             | 0.9583                 | 0.9674                 | 0.9695                 | 0.9372                | 0.9708                 |
+ | 1.0667   | 20     | 0.3313        | -                      | -                      | -                      | -                     | -                      |
+ | 1.6      | 30     | 0.144         | -                      | -                      | -                      | -                     | -                      |
+ | 1.9733   | 37     | -             | 0.9630                 | 0.9699                 | 0.9716                 | 0.9488                | 0.9745                 |
+ | 2.1333   | 40     | 0.1317        | -                      | -                      | -                      | -                     | -                      |
+ | 2.6667   | 50     | 0.0749        | -                      | -                      | -                      | -                     | -                      |
+ | 2.9867   | 56     | -             | 0.9650                 | 0.9701                 | 0.9721                 | 0.9522                | 0.9747                 |
+ | 3.2      | 60     | 0.088         | -                      | -                      | -                      | -                     | -                      |
+ | 3.7333   | 70     | 0.0598        | -                      | -                      | -                      | -                     | -                      |
+ | **3.84** | **72** | **-**         | **0.9652**             | **0.9702**             | **0.972**              | **0.9525**            | **0.9745**             |
+ 
+ * The bold row denotes the saved checkpoint.
+ 
+ ### Framework Versions
+ - Python: 3.11.5
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.1.2+cu121
+ - Accelerate: 0.31.0
+ - Datasets: 2.19.1
+ - Tokenizers: 0.19.1
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.1.2+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:28f7d007f8e0ab61c15a918b58edc7adaeb9abc74f0893704d8d842d83525358
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff