nickprock committed on
Commit b09bab8
1 Parent(s): 026a6b0

Upload 11 files

1_Pooling/config.json CHANGED
@@ -3,5 +3,7 @@
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
- "pooling_mode_mean_sqrt_len_tokens": false
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false
  }
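The two keys added in this hunk are both `false`, so they should not change the embedding size. In sentence-transformers, the `Pooling` module concatenates the outputs of every enabled mode, so its output dimension is the number of enabled `pooling_mode_*` flags times `word_embedding_dimension`. Below is a minimal sketch of that arithmetic. It assumes a `word_embedding_dimension` of 768, taken from the README's architecture summary, because that key is not shown in this hunk. The helper `pooling_output_dim` is hypothetical.

```python
import json

# New 1_Pooling/config.json flags from this commit. The 768 for
# word_embedding_dimension is an assumption taken from the README's
# architecture summary (that key is outside the hunk shown).
POOLING_CONFIG = json.loads("""
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false
}
""")


def pooling_output_dim(config: dict) -> int:
    """Hypothetical helper: enabled pooling outputs are concatenated, so the
    sentence embedding size is (number of enabled modes) * token dimension."""
    enabled = [key for key, value in config.items()
               if key.startswith("pooling_mode_") and value is True]
    return len(enabled) * config["word_embedding_dimension"]


print(pooling_output_dim(POOLING_CONFIG))  # 768
```

Enabling a second mode, for example `pooling_mode_cls_token`, would double the output to 1536 dimensions.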
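Because `pooling_mode_mean_tokens` is the only mode set to `true`, the sentence embedding is the average of the token embeddings. Padding positions are excluded using the attention mask. The sketch below shows this computation on plain Python lists for a single sequence. The real model operates on PyTorch tensors, and the name `masked_mean` and the toy values are illustrative only.

```python
from typing import List


def masked_mean(token_embeddings: List[List[float]],
                attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1.

    Padding positions (mask 0) contribute to neither the sum nor the count.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
            count += 1
    # Avoid dividing by zero if every position is padding.
    denominator = count if count > 0 else 1
    return [value / denominator for value in summed]


# Two real tokens followed by one padding token whose values must be ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(masked_mean(tokens, mask))  # [2.0, 3.0]
```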
README.md CHANGED
@@ -1,24 +1,18 @@
  ---
+ library_name: sentence-transformers
  pipeline_tag: sentence-similarity
  tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - transformers
- license: mit
- datasets:
- - stsb_multi_mt
- - unicamp-dl/mmarco
- language:
- - it
- library_name: sentence-transformers
+
  ---

- # {multi-sentence-BERTino}
+ # {MODEL_NAME}

  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.

- This model is trained from [indigo-ai/BERTino](https://huggingface.co/indigo-ai/BERTino) using [mmarco italian](https://huggingface.co/datasets/unicamp-dl/mmarco) (200K) and [stsb italian](https://huggingface.co/datasets/stsb_multi_mt).
  <!--- Describe your model here -->

  ## Usage (Sentence-Transformers)
@@ -33,12 +27,11 @@ Then you can use the model like this:

  ```python
  from sentence_transformers import SentenceTransformer
- sentences = ["Una ragazza si acconcia i capelli.", "Una ragazza si sta spazzolando i capelli."]
+ sentences = ["This is an example sentence", "Each sentence is converted"]

- model = SentenceTransformer('nickprock/multi-sentence-BERTino')
+ model = SentenceTransformer('{MODEL_NAME}')
  embeddings = model.encode(sentences)
  print(embeddings)
-
  ```


@@ -59,11 +52,11 @@ def mean_pooling(model_output, attention_mask):


  # Sentences we want sentence embeddings for
- sentences = ['Una ragazza si acconcia i capelli.', 'Una ragazza si sta spazzolando i capelli.']
+ sentences = ['This is an example sentence', 'Each sentence is converted']

  # Load model from HuggingFace Hub
- tokenizer = AutoTokenizer.from_pretrained('nickprock/multi-sentence-BERTino')
- model = AutoModel.from_pretrained('nickprock/multi-sentence-BERTino')
+ tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
+ model = AutoModel.from_pretrained('{MODEL_NAME}')

  # Tokenize sentences
  encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
@@ -77,7 +70,6 @@ sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask']

  print("Sentence embeddings:")
  print(sentence_embeddings)
-
  ```


@@ -94,7 +86,7 @@ The model was trained with the parameters:

  **DataLoader**:

- `torch.utils.data.dataloader.DataLoader` of length 15625 with parameters:
+ `torch.utils.data.dataloader.DataLoader` of length 31250 with parameters:
  ```
  {'batch_size': 16, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
  ```
@@ -117,6 +109,20 @@ The model was trained with the parameters:

  `sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`

+ **DataLoader**:
+
+ `torch.utils.data.dataloader.DataLoader` of length 31250 with parameters:
+ ```
+ {'batch_size': 16, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
+ ```
+
+ **Loss**:
+
+ `sentence_transformers.losses.CachedMultipleNegativesRankingLoss.CachedMultipleNegativesRankingLoss` with parameters:
+ ```
+ {'scale': 20.0, 'similarity_fct': 'cos_sim'}
+ ```
+
  Parameters of the fit()-Method:
  ```
  {
@@ -140,7 +146,7 @@ Parameters of the fit()-Method:
  ```
  SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel
- (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
  )
  ```

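In the README's training section, the DataLoader length changes from 15625 to 31250 batches at `batch_size` 16, and a second DataLoader of the same length is added. With PyTorch's default `drop_last=False`, `len(DataLoader)` equals ceil(N / batch_size), so each length restricts the number of training examples N to a narrow range. The sketch below inverts that formula. The helper name is hypothetical. `drop_last=False` is an assumption because the README does not list that parameter.

```python
from typing import Tuple


def dataset_size_range(num_batches: int, batch_size: int) -> Tuple[int, int]:
    """Smallest and largest N such that ceil(N / batch_size) == num_batches.

    Assumes drop_last=False (the PyTorch default): the last batch may be
    partial, so N lies in ((num_batches - 1) * batch_size, num_batches * batch_size].
    """
    return (num_batches - 1) * batch_size + 1, num_batches * batch_size


print(dataset_size_range(15625, 16))  # (249985, 250000): old README
print(dataset_size_range(31250, 16))  # (499985, 500000): new README
```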
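The second objective added to the README is `CachedMultipleNegativesRankingLoss` with `scale` 20.0 and `cos_sim` similarity. The cached variant is a memory-saving, GradCache-style version of `MultipleNegativesRankingLoss`. The sketch below computes only the underlying in-batch ranking loss value, without caching or gradients. Each anchor should score its own positive higher than every other positive in the batch. Scores are cosine similarities multiplied by `scale`, and the loss is the mean cross-entropy with the diagonal as targets. The function names and toy vectors are illustrative.

```python
import math
from typing import List

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(anchors: List[Vector], positives: List[Vector],
             scale: float = 20.0) -> float:
    """In-batch ranking loss: for anchor i, positive i is the target class
    and all other positives in the batch act as negatives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Cross-entropy via a numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
print(mnr_loss(anchors, positives))        # matched pairs: loss near zero
print(mnr_loss(anchors, positives[::-1]))  # mismatched pairs: large loss
```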
config_sentence_transformers.json CHANGED
@@ -1,6 +1,6 @@
  {
  "__version__": {
- "sentence_transformers": "2.2.2",
+ "sentence_transformers": "2.3.1",
  "transformers": "4.36.0",
  "pytorch": "2.0.0"
  }
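`config_sentence_transformers.json` records the library versions the model was saved with. This commit bumps the recorded sentence-transformers version from 2.2.2 to 2.3.1. The sketch below checks an installed version against the recorded one. It compares numeric components rather than raw strings, because as strings "2.10.0" would sort before "2.3.1". The helper names are hypothetical, and only plain `X.Y.Z` strings are handled. A real project would more likely use `packaging.version.Version`.

```python
import json
from typing import Tuple

# The "__version__" block from the new config_sentence_transformers.json
# (only the fields shown in this diff; the real file may contain more keys).
CONFIG = json.loads("""
{
  "__version__": {
    "sentence_transformers": "2.3.1",
    "transformers": "4.36.0",
    "pytorch": "2.0.0"
  }
}
""")


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def is_older_than_saved(installed: str,
                        library: str = "sentence_transformers") -> bool:
    """True if the installed version predates the one the model was saved with."""
    saved = CONFIG["__version__"][library]
    return parse_version(installed) < parse_version(saved)


print(is_older_than_saved("2.2.2"))   # True
print(is_older_than_saved("2.10.0"))  # False
```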
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:305fe04c3b7865015153d6ee7b4c49651b5986658c386bd4e305762cd07ea05e
+ oid sha256:97427de98b80da98b373899f7fe3e6d5de04f987ec0cc3ad88fb0d4e7d772f03
  size 270316376
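`model.safetensors` is stored with Git LFS, so the diff shows only the pointer file. The pointer records the spec version, the SHA-256 `oid` of the real content, and its `size` in bytes. The new weights have a different hash but the same size (270316376 bytes). The sketch below checks a downloaded file against such a pointer. The helper names are illustrative, and the demo uses a small temporary file instead of the actual weights.

```python
import hashlib
import os
import tempfile
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer: Dict[str, str]) -> bool:
    """Compare a file's byte size and SHA-256 digest with the pointer."""
    if os.path.getsize(path) != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in 1 MiB chunks so large weight files are not read at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return pointer["oid"] == "sha256:" + digest.hexdigest()


# Demo with a small stand-in file instead of the real model weights.
payload = b"example weights"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
print(matches_pointer(tmp.name, pointer))  # True
os.remove(tmp.name)
```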