Tomor0720 committed on
Commit
68f6838
1 Parent(s): b9f2e59

Add new SentenceTransformer model.

Files changed (4)
  1. README.md +733 -0
  2. config.json +30 -0
  3. config_sentence_transformers.json +10 -0
  4. modules.json +8 -0
README.md ADDED
@@ -0,0 +1,733 @@
+ ---
+ library_name: sentence-transformers
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ ---
+
+ # SentenceTransformer
+
+ This is a [sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a dense vector space (the generated card reports the dimensionality as `None`) and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
+ - **Maximum Sequence Length:** 512 tokens (from `max_seq_len` in `config.json`; the generated card reported `None`)
+ - **Output Dimensionality:** None (not reported by the card generator)
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
34
+ ```
35
+ SentenceTransformer(
36
+ (0): ConcatCustomPooling(
37
+ (model): BertModel(
38
+ (embeddings): BertEmbeddings(
39
+ (word_embeddings): Embedding(30522, 1024, padding_idx=0)
40
+ (position_embeddings): Embedding(512, 1024)
41
+ (token_type_embeddings): Embedding(2, 1024)
42
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
43
+ (dropout): Dropout(p=0.1, inplace=False)
44
+ )
45
+ (encoder): BertEncoder(
46
+ (layer): ModuleList(
47
+ (0): BertLayer(
48
+ (attention): BertAttention(
49
+ (self): BertSelfAttention(
50
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
51
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
52
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
53
+ (dropout): Dropout(p=0.1, inplace=False)
54
+ )
55
+ (output): BertSelfOutput(
56
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
57
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
58
+ (dropout): Dropout(p=0.1, inplace=False)
59
+ )
60
+ )
61
+ (intermediate): BertIntermediate(
62
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
63
+ (intermediate_act_fn): GELUActivation()
64
+ )
65
+ (output): BertOutput(
66
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
67
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
68
+ (dropout): Dropout(p=0.1, inplace=False)
69
+ )
70
+ )
71
+ (1): BertLayer(
72
+ (attention): BertAttention(
73
+ (self): BertSelfAttention(
74
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
75
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
76
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
77
+ (dropout): Dropout(p=0.1, inplace=False)
78
+ )
79
+ (output): BertSelfOutput(
80
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
81
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
82
+ (dropout): Dropout(p=0.1, inplace=False)
83
+ )
84
+ )
85
+ (intermediate): BertIntermediate(
86
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
87
+ (intermediate_act_fn): GELUActivation()
88
+ )
89
+ (output): BertOutput(
90
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
91
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
92
+ (dropout): Dropout(p=0.1, inplace=False)
93
+ )
94
+ )
95
+ (2): BertLayer(
96
+ (attention): BertAttention(
97
+ (self): BertSelfAttention(
98
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
99
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
100
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
101
+ (dropout): Dropout(p=0.1, inplace=False)
102
+ )
103
+ (output): BertSelfOutput(
104
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
105
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
106
+ (dropout): Dropout(p=0.1, inplace=False)
107
+ )
108
+ )
109
+ (intermediate): BertIntermediate(
110
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
111
+ (intermediate_act_fn): GELUActivation()
112
+ )
113
+ (output): BertOutput(
114
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
115
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
116
+ (dropout): Dropout(p=0.1, inplace=False)
117
+ )
118
+ )
119
+ (3): BertLayer(
120
+ (attention): BertAttention(
121
+ (self): BertSelfAttention(
122
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
123
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
124
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
125
+ (dropout): Dropout(p=0.1, inplace=False)
126
+ )
127
+ (output): BertSelfOutput(
128
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
129
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
130
+ (dropout): Dropout(p=0.1, inplace=False)
131
+ )
132
+ )
133
+ (intermediate): BertIntermediate(
134
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
135
+ (intermediate_act_fn): GELUActivation()
136
+ )
137
+ (output): BertOutput(
138
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
139
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
140
+ (dropout): Dropout(p=0.1, inplace=False)
141
+ )
142
+ )
143
+ (4): BertLayer(
144
+ (attention): BertAttention(
145
+ (self): BertSelfAttention(
146
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
147
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
148
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
149
+ (dropout): Dropout(p=0.1, inplace=False)
150
+ )
151
+ (output): BertSelfOutput(
152
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
153
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
154
+ (dropout): Dropout(p=0.1, inplace=False)
155
+ )
156
+ )
157
+ (intermediate): BertIntermediate(
158
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
159
+ (intermediate_act_fn): GELUActivation()
160
+ )
161
+ (output): BertOutput(
162
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
163
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
164
+ (dropout): Dropout(p=0.1, inplace=False)
165
+ )
166
+ )
167
+ (5): BertLayer(
168
+ (attention): BertAttention(
169
+ (self): BertSelfAttention(
170
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
171
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
172
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
173
+ (dropout): Dropout(p=0.1, inplace=False)
174
+ )
175
+ (output): BertSelfOutput(
176
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
177
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
178
+ (dropout): Dropout(p=0.1, inplace=False)
179
+ )
180
+ )
181
+ (intermediate): BertIntermediate(
182
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
183
+ (intermediate_act_fn): GELUActivation()
184
+ )
185
+ (output): BertOutput(
186
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
187
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
188
+ (dropout): Dropout(p=0.1, inplace=False)
189
+ )
190
+ )
191
+ (6): BertLayer(
192
+ (attention): BertAttention(
193
+ (self): BertSelfAttention(
194
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
195
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
196
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
197
+ (dropout): Dropout(p=0.1, inplace=False)
198
+ )
199
+ (output): BertSelfOutput(
200
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
201
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
202
+ (dropout): Dropout(p=0.1, inplace=False)
203
+ )
204
+ )
205
+ (intermediate): BertIntermediate(
206
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
207
+ (intermediate_act_fn): GELUActivation()
208
+ )
209
+ (output): BertOutput(
210
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
211
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
212
+ (dropout): Dropout(p=0.1, inplace=False)
213
+ )
214
+ )
215
+ (7): BertLayer(
216
+ (attention): BertAttention(
217
+ (self): BertSelfAttention(
218
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
219
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
220
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
221
+ (dropout): Dropout(p=0.1, inplace=False)
222
+ )
223
+ (output): BertSelfOutput(
224
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
225
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
226
+ (dropout): Dropout(p=0.1, inplace=False)
227
+ )
228
+ )
229
+ (intermediate): BertIntermediate(
230
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
231
+ (intermediate_act_fn): GELUActivation()
232
+ )
233
+ (output): BertOutput(
234
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
235
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
236
+ (dropout): Dropout(p=0.1, inplace=False)
237
+ )
238
+ )
239
+ (8): BertLayer(
240
+ (attention): BertAttention(
241
+ (self): BertSelfAttention(
242
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
243
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
244
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
245
+ (dropout): Dropout(p=0.1, inplace=False)
246
+ )
247
+ (output): BertSelfOutput(
248
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
249
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
250
+ (dropout): Dropout(p=0.1, inplace=False)
251
+ )
252
+ )
253
+ (intermediate): BertIntermediate(
254
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
255
+ (intermediate_act_fn): GELUActivation()
256
+ )
257
+ (output): BertOutput(
258
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
259
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
260
+ (dropout): Dropout(p=0.1, inplace=False)
261
+ )
262
+ )
263
+ (9): BertLayer(
264
+ (attention): BertAttention(
265
+ (self): BertSelfAttention(
266
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
267
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
268
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
269
+ (dropout): Dropout(p=0.1, inplace=False)
270
+ )
271
+ (output): BertSelfOutput(
272
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
273
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
274
+ (dropout): Dropout(p=0.1, inplace=False)
275
+ )
276
+ )
277
+ (intermediate): BertIntermediate(
278
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
279
+ (intermediate_act_fn): GELUActivation()
280
+ )
281
+ (output): BertOutput(
282
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
283
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
284
+ (dropout): Dropout(p=0.1, inplace=False)
285
+ )
286
+ )
287
+ (10): BertLayer(
288
+ (attention): BertAttention(
289
+ (self): BertSelfAttention(
290
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
291
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
292
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
293
+ (dropout): Dropout(p=0.1, inplace=False)
294
+ )
295
+ (output): BertSelfOutput(
296
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
297
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
298
+ (dropout): Dropout(p=0.1, inplace=False)
299
+ )
300
+ )
301
+ (intermediate): BertIntermediate(
302
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
303
+ (intermediate_act_fn): GELUActivation()
304
+ )
305
+ (output): BertOutput(
306
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
307
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
308
+ (dropout): Dropout(p=0.1, inplace=False)
309
+ )
310
+ )
311
+ (11): BertLayer(
312
+ (attention): BertAttention(
313
+ (self): BertSelfAttention(
314
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
315
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
316
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
317
+ (dropout): Dropout(p=0.1, inplace=False)
318
+ )
319
+ (output): BertSelfOutput(
320
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
321
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
322
+ (dropout): Dropout(p=0.1, inplace=False)
323
+ )
324
+ )
325
+ (intermediate): BertIntermediate(
326
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
327
+ (intermediate_act_fn): GELUActivation()
328
+ )
329
+ (output): BertOutput(
330
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
331
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
332
+ (dropout): Dropout(p=0.1, inplace=False)
333
+ )
334
+ )
335
+ (12): BertLayer(
336
+ (attention): BertAttention(
337
+ (self): BertSelfAttention(
338
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
339
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
340
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
341
+ (dropout): Dropout(p=0.1, inplace=False)
342
+ )
343
+ (output): BertSelfOutput(
344
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
345
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
346
+ (dropout): Dropout(p=0.1, inplace=False)
347
+ )
348
+ )
349
+ (intermediate): BertIntermediate(
350
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
351
+ (intermediate_act_fn): GELUActivation()
352
+ )
353
+ (output): BertOutput(
354
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
355
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
356
+ (dropout): Dropout(p=0.1, inplace=False)
357
+ )
358
+ )
359
+ (13): BertLayer(
360
+ (attention): BertAttention(
361
+ (self): BertSelfAttention(
362
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
363
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
364
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
365
+ (dropout): Dropout(p=0.1, inplace=False)
366
+ )
367
+ (output): BertSelfOutput(
368
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
369
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
370
+ (dropout): Dropout(p=0.1, inplace=False)
371
+ )
372
+ )
373
+ (intermediate): BertIntermediate(
374
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
375
+ (intermediate_act_fn): GELUActivation()
376
+ )
377
+ (output): BertOutput(
378
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
379
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
380
+ (dropout): Dropout(p=0.1, inplace=False)
381
+ )
382
+ )
383
+ (14): BertLayer(
384
+ (attention): BertAttention(
385
+ (self): BertSelfAttention(
386
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
387
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
388
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
389
+ (dropout): Dropout(p=0.1, inplace=False)
390
+ )
391
+ (output): BertSelfOutput(
392
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
393
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
394
+ (dropout): Dropout(p=0.1, inplace=False)
395
+ )
396
+ )
397
+ (intermediate): BertIntermediate(
398
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
399
+ (intermediate_act_fn): GELUActivation()
400
+ )
401
+ (output): BertOutput(
402
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
403
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
404
+ (dropout): Dropout(p=0.1, inplace=False)
405
+ )
406
+ )
407
+ (15): BertLayer(
408
+ (attention): BertAttention(
409
+ (self): BertSelfAttention(
410
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
411
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
412
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
413
+ (dropout): Dropout(p=0.1, inplace=False)
414
+ )
415
+ (output): BertSelfOutput(
416
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
417
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
418
+ (dropout): Dropout(p=0.1, inplace=False)
419
+ )
420
+ )
421
+ (intermediate): BertIntermediate(
422
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
423
+ (intermediate_act_fn): GELUActivation()
424
+ )
425
+ (output): BertOutput(
426
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
427
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
428
+ (dropout): Dropout(p=0.1, inplace=False)
429
+ )
430
+ )
431
+ (16): BertLayer(
432
+ (attention): BertAttention(
433
+ (self): BertSelfAttention(
434
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
435
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
436
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
437
+ (dropout): Dropout(p=0.1, inplace=False)
438
+ )
439
+ (output): BertSelfOutput(
440
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
441
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
442
+ (dropout): Dropout(p=0.1, inplace=False)
443
+ )
444
+ )
445
+ (intermediate): BertIntermediate(
446
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
447
+ (intermediate_act_fn): GELUActivation()
448
+ )
449
+ (output): BertOutput(
450
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
451
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
452
+ (dropout): Dropout(p=0.1, inplace=False)
453
+ )
454
+ )
455
+ (17): BertLayer(
456
+ (attention): BertAttention(
457
+ (self): BertSelfAttention(
458
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
459
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
460
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
461
+ (dropout): Dropout(p=0.1, inplace=False)
462
+ )
463
+ (output): BertSelfOutput(
464
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
465
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
466
+ (dropout): Dropout(p=0.1, inplace=False)
467
+ )
468
+ )
469
+ (intermediate): BertIntermediate(
470
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
471
+ (intermediate_act_fn): GELUActivation()
472
+ )
473
+ (output): BertOutput(
474
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
475
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
476
+ (dropout): Dropout(p=0.1, inplace=False)
477
+ )
478
+ )
479
+ (18): BertLayer(
480
+ (attention): BertAttention(
481
+ (self): BertSelfAttention(
482
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
483
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
484
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
485
+ (dropout): Dropout(p=0.1, inplace=False)
486
+ )
487
+ (output): BertSelfOutput(
488
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
489
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
490
+ (dropout): Dropout(p=0.1, inplace=False)
491
+ )
492
+ )
493
+ (intermediate): BertIntermediate(
494
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
495
+ (intermediate_act_fn): GELUActivation()
496
+ )
497
+ (output): BertOutput(
498
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
499
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
500
+ (dropout): Dropout(p=0.1, inplace=False)
501
+ )
502
+ )
503
+ (19): BertLayer(
504
+ (attention): BertAttention(
505
+ (self): BertSelfAttention(
506
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
507
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
508
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
509
+ (dropout): Dropout(p=0.1, inplace=False)
510
+ )
511
+ (output): BertSelfOutput(
512
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
513
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
514
+ (dropout): Dropout(p=0.1, inplace=False)
515
+ )
516
+ )
517
+ (intermediate): BertIntermediate(
518
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
519
+ (intermediate_act_fn): GELUActivation()
520
+ )
521
+ (output): BertOutput(
522
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
523
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
524
+ (dropout): Dropout(p=0.1, inplace=False)
525
+ )
526
+ )
527
+ (20): BertLayer(
528
+ (attention): BertAttention(
529
+ (self): BertSelfAttention(
530
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
531
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
532
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
533
+ (dropout): Dropout(p=0.1, inplace=False)
534
+ )
535
+ (output): BertSelfOutput(
536
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
537
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
538
+ (dropout): Dropout(p=0.1, inplace=False)
539
+ )
540
+ )
541
+ (intermediate): BertIntermediate(
542
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
543
+ (intermediate_act_fn): GELUActivation()
544
+ )
545
+ (output): BertOutput(
546
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
547
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
548
+ (dropout): Dropout(p=0.1, inplace=False)
549
+ )
550
+ )
551
+ (21): BertLayer(
552
+ (attention): BertAttention(
553
+ (self): BertSelfAttention(
554
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
555
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
556
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
557
+ (dropout): Dropout(p=0.1, inplace=False)
558
+ )
559
+ (output): BertSelfOutput(
560
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
561
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
562
+ (dropout): Dropout(p=0.1, inplace=False)
563
+ )
564
+ )
565
+ (intermediate): BertIntermediate(
566
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
567
+ (intermediate_act_fn): GELUActivation()
568
+ )
569
+ (output): BertOutput(
570
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
571
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
572
+ (dropout): Dropout(p=0.1, inplace=False)
573
+ )
574
+ )
575
+ (22): BertLayer(
576
+ (attention): BertAttention(
577
+ (self): BertSelfAttention(
578
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
579
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
580
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
581
+ (dropout): Dropout(p=0.1, inplace=False)
582
+ )
583
+ (output): BertSelfOutput(
584
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
585
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
586
+ (dropout): Dropout(p=0.1, inplace=False)
587
+ )
588
+ )
589
+ (intermediate): BertIntermediate(
590
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
591
+ (intermediate_act_fn): GELUActivation()
592
+ )
593
+ (output): BertOutput(
594
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
595
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
596
+ (dropout): Dropout(p=0.1, inplace=False)
597
+ )
598
+ )
599
+ (23): BertLayer(
600
+ (attention): BertAttention(
601
+ (self): BertSelfAttention(
602
+ (query): Linear(in_features=1024, out_features=1024, bias=True)
603
+ (key): Linear(in_features=1024, out_features=1024, bias=True)
604
+ (value): Linear(in_features=1024, out_features=1024, bias=True)
605
+ (dropout): Dropout(p=0.1, inplace=False)
606
+ )
607
+ (output): BertSelfOutput(
608
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
609
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
610
+ (dropout): Dropout(p=0.1, inplace=False)
611
+ )
612
+ )
613
+ (intermediate): BertIntermediate(
614
+ (dense): Linear(in_features=1024, out_features=4096, bias=True)
615
+ (intermediate_act_fn): GELUActivation()
616
+ )
617
+ (output): BertOutput(
618
+ (dense): Linear(in_features=4096, out_features=1024, bias=True)
619
+ (LayerNorm): LayerNorm((1024,), eps=1e-12, elementwise_affine=True)
620
+ (dropout): Dropout(p=0.1, inplace=False)
621
+ )
622
+ )
623
+ )
624
+ )
625
+ (pooler): BertPooler(
626
+ (dense): Linear(in_features=1024, out_features=1024, bias=True)
627
+ (activation): Tanh()
628
+ )
629
+ )
630
+ )
631
+ )
632
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ # Note: modules.json in this commit references `__main__.ConcatCustomPooling`,
+ # so that class likely has to be defined in the running script before loading.
+ model = SentenceTransformer("Tomor0720/bge_large_en_v1.5_custom_pooling")
+ # Run inference
+ sentences = [
+     'The weather is lovely today.',
+     "It's so sunny outside!",
+     'He drove to the stadium.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Framework Versions
+ - Python: 3.9.18
+ - Sentence Transformers: 3.1.1
+ - Transformers: 4.45.1
+ - PyTorch: 1.13.0+cu117
+ - Accelerate: 0.20.3
+ - Datasets: 2.13.0
+ - Tokenizers: 0.20.0
+
+ ## Citation
+
+ ### BibTeX
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
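Reviewer note: the card lists cosine similarity as the similarity function, and its usage snippet ends with a 3 x 3 similarity matrix. As a rough illustration of what such a matrix contains, here is a minimal pure-Python sketch. The helper name `cosine_similarity_matrix` and the toy vectors are assumptions made for illustration; they are not part of this commit or of the Sentence Transformers API.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity of a list of equal-length vectors."""
    # Euclidean norm of every vector, computed once up front.
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    matrix = []
    for i, a in enumerate(vectors):
        row = []
        for j, b in enumerate(vectors):
            dot = sum(x * y for x, y in zip(a, b))
            row.append(dot / (norms[i] * norms[j]))
        matrix.append(row)
    return matrix


# Toy 3-dimensional "embeddings" (hypothetical values, far smaller than real model output).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]
similarities = cosine_similarity_matrix(toy_embeddings)
# One row and one column per input sentence.
print(len(similarities), len(similarities[0]))
```

The sketch assumes no vector has zero norm; a production implementation would need to guard against that division.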
config.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "model_name": "BAAI/bge-large-en-v1.5",
+   "layers": [
+     0,
+     1,
+     2,
+     3,
+     4,
+     5,
+     6,
+     7,
+     8,
+     9,
+     10,
+     11,
+     12,
+     13,
+     14,
+     15,
+     16,
+     17,
+     18,
+     19,
+     20,
+     21,
+     22,
+     23
+   ],
+   "max_seq_len": 512
+ }
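Reviewer note: the values in `config.json` can be cross-checked against the architecture printed in the README, which lists 24 `BertLayer` blocks (indices 0 to 23) and a position embedding table with 512 entries. The sketch below performs that check. The function name `check_pooling_config` and its parameters are hypothetical, and how `ConcatCustomPooling` actually consumes these fields is not shown anywhere in this commit.

```python
import json

# Contents of config.json from this commit, with the layer list written compactly.
CONFIG_TEXT = """
{
  "model_name": "BAAI/bge-large-en-v1.5",
  "layers": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
             12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
  "max_seq_len": 512
}
"""


def check_pooling_config(config, num_hidden_layers=24, max_positions=512):
    """Return a list of problems; an empty list means the config looks consistent.

    The defaults are read off the README's architecture printout:
    24 BertLayer blocks and an Embedding(512, 1024) position table.
    """
    problems = []
    layers = config["layers"]
    if len(set(layers)) != len(layers):
        problems.append("duplicate layer indices")
    out_of_range = [i for i in layers if not 0 <= i < num_hidden_layers]
    if out_of_range:
        problems.append(f"layer indices out of range: {out_of_range}")
    if config["max_seq_len"] > max_positions:
        problems.append("max_seq_len exceeds the position embedding table")
    return problems


config = json.loads(CONFIG_TEXT)
problems = check_pooling_config(config)
print(problems)
```

For the committed values the check finds nothing to report, which suggests the layer list simply selects every encoder layer.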
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.1",
+     "transformers": "4.45.1",
+     "pytorch": "1.13.0+cu117"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
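Reviewer note: `similarity_fn_name` is `null` in this file, while the README lists cosine similarity. As far as the Sentence Transformers documentation describes it, an unset similarity function falls back to cosine when a similarity method is first called, which would make the two consistent. The sketch below only mimics that fallback on the JSON itself; `resolve_similarity_fn_name` is a hypothetical helper, not a library function.

```python
import json

# Contents of config_sentence_transformers.json from this commit.
ST_CONFIG_TEXT = """
{
  "__version__": {
    "sentence_transformers": "3.1.1",
    "transformers": "4.45.1",
    "pytorch": "1.13.0+cu117"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
"""


def resolve_similarity_fn_name(config, default="cosine"):
    """Return the configured similarity function name, or the default if unset."""
    name = config.get("similarity_fn_name")
    return default if name is None else name


st_config = json.loads(ST_CONFIG_TEXT)
resolved = resolve_similarity_fn_name(st_config)
print(resolved)
```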
modules.json ADDED
@@ -0,0 +1,8 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "__main__.ConcatCustomPooling"
+   }
+ ]
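Reviewer note: the `type` field is a dotted path that has to be imported when the model is loaded. Because it points at `__main__`, the class will probably only be found when the loading script itself defines `ConcatCustomPooling`. The following standard-library sketch shows the general idea of resolving such a path. `resolve_class` is a hypothetical helper written for illustration and is not the Sentence Transformers loader.

```python
import importlib


def resolve_class(dotted_path):
    """Import `package.module.ClassName` and return the named attribute."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A path inside an installed package resolves in any process.
ordered_dict_cls = resolve_class("collections.OrderedDict")
print(ordered_dict_cls.__name__)

# "__main__.ConcatCustomPooling" resolves only if the running script defines that class.
try:
    resolve_class("__main__.ConcatCustomPooling")
    main_resolves = True
except (ImportError, AttributeError):
    main_resolves = False
print(main_resolves)
```

Moving the class into an importable module and recording that module path in `modules.json` would likely make the model loadable from other scripts.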