yaniseuranova committed on
Commit
7f18162
1 Parent(s): a16f48d

Add SetFit model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json CHANGED
@@ -1,7 +1,7 @@
  {
- "word_embedding_dimension": 768,
- "pooling_mode_cls_token": false,
- "pooling_mode_mean_tokens": true,
+ "word_embedding_dimension": 1024,
+ "pooling_mode_cls_token": true,
+ "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
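Reviewer note: the pooling config above switches the sentence embedding from mean pooling to CLS-token pooling. The difference can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the sentence-transformers implementation: the function names `mean_pool` and `cls_pool` and the toy tensor shapes are hypothetical, and the hidden size is shrunk to 3 for readability (the config above uses 1024).

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sentence, ignoring padded positions (mask == 0)."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def cls_pool(token_embeddings):
    """Use the vector at position 0 (the first, CLS-style token) as the sentence embedding."""
    return token_embeddings[:, 0, :]

# Toy batch (hypothetical): 2 sentences, 4 token positions, hidden size 3.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 3))
mask = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0]])

mean_vecs = mean_pool(tokens, mask)  # shape (2, 3)
cls_vecs = cls_pool(tokens)          # shape (2, 3)
```

Both functions return one vector per sentence; they differ only in which token positions contribute to it.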
README.md CHANGED
@@ -5,26 +5,24 @@ tags:
5
  - sentence-transformers
6
  - text-classification
7
  - generated_from_setfit_trainer
8
- base_model: sentence-transformers/all-mpnet-base-v2
9
  metrics:
10
  - accuracy
11
  widget:
12
- - text: "What is the difference between ACID compliance and BASE topology in distributed\
13
- \ database systems?\n While designing a high throughput RDBMS like\
14
- \ Amazon Aurora, how would you choose between consistency model that is highly\
15
- \ optimistic and one that highly paranoid in conflictedupdate scenarios"
16
- - text: What is the primary function of the Apache Hive metastore in a Hadoop ecosystem,
17
- and how does it differ from a traditional relational database management system?
18
- - text: What is the primary purpose of employing Entity-Controlled Vocabulary (ESS)
19
- in open data publishing, according to industry experts at Microsoft?
20
- - text: How do organizations prioritize innovation to strive in a rapidly changing
21
- industry landscape driven by technological and societal shifts?
22
- - text: How do societal norms influence the emergence of new business models in unstable
23
- economies?
24
  pipeline_tag: text-classification
25
  inference: true
26
  model-index:
27
- - name: SetFit with sentence-transformers/all-mpnet-base-v2
28
  results:
29
  - task:
30
  type: text-classification
@@ -39,9 +37,9 @@ model-index:
39
  name: Accuracy
40
  ---
41
 
42
- # SetFit with sentence-transformers/all-mpnet-base-v2
43
 
44
- This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
45
 
46
  The model has been trained using an efficient few-shot learning technique that involves:
47
 
@@ -52,9 +50,9 @@ The model has been trained using an efficient few-shot learning technique that i
52
 
53
  ### Model Description
54
  - **Model Type:** SetFit
55
- - **Sentence Transformer body:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
56
  - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
57
- - **Maximum Sequence Length:** 384 tokens
58
  - **Number of Classes:** 2 classes
59
  <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
60
  <!-- - **Language:** Unknown -->
@@ -67,10 +65,10 @@ The model has been trained using an efficient few-shot learning technique that i
67
  - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
68
 
69
  ### Model Labels
70
- | Label | Examples |
71
- |:---------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
72
- | semantic | <ul><li>'How do artificial intelligence systems navigate the trade-off between simplicity and accuracy when modeling complex real-world phenomena?'</li><li>'How do complex systems, consisting of many interconnected components, give rise to emergent properties that cannot be predicted from the characteristics of their individual parts?'</li><li>'How do complex systems, such as those found in nature and human societies, exhibit emergent properties that arise from the interactions of individual components?'</li></ul> |
73
- | lexical | <ul><li>'What is the primary difference between a generative adversarial network (GAN) and a variational autoencoder (VAE) in deep learning?'</li><li>'What is the primary difference between a Decision Tree and a Random Forest in Machine Learning, and how do they alleviate overfitting?'</li><li>'What is the primary difference between a Bayesian neural network and a traditional feedforward neural network in the context of machine learning?'</li></ul> |
74
 
75
  ## Evaluation
76
 
@@ -97,7 +95,7 @@ from setfit import SetFitModel
97
  # Download from the 🤗 Hub
98
  model = SetFitModel.from_pretrained("yaniseuranova/setfit-paraphrase-mpnet-base-v2-sst2")
99
  # Run inference
100
- preds = model("How do societal norms influence the emergence of new business models in unstable economies?")
101
  ```
102
 
103
  <!--
@@ -129,12 +127,12 @@ preds = model("How do societal norms influence the emergence of new business mod
129
  ### Training Set Metrics
130
  | Training set | Min | Median | Max |
131
  |:-------------|:----|:--------|:----|
132
- | Word count | 4 | 18.6566 | 56 |
133
 
134
  | Label | Training Sample Count |
135
  |:---------|:----------------------|
136
- | lexical | 47 |
137
- | semantic | 52 |
138
 
139
  ### Training Hyperparameters
140
  - batch_size: (16, 16)
@@ -156,36 +154,27 @@ preds = model("How do societal norms influence the emergence of new business mod
156
  ### Training Results
157
  | Epoch | Step | Training Loss | Validation Loss |
158
  |:-------:|:-------:|:-------------:|:---------------:|
159
- | 0.0032 | 1 | 0.3177 | - |
160
- | 0.1592 | 50 | 0.0905 | - |
161
- | 0.3185 | 100 | 0.0013 | - |
162
- | 0.4777 | 150 | 0.0011 | - |
163
- | 0.6369 | 200 | 0.0002 | - |
164
- | 0.7962 | 250 | 0.0003 | - |
165
- | 0.9554 | 300 | 0.0001 | - |
166
- | 1.0 | 314 | - | 0.0001 |
167
- | 1.1146 | 350 | 0.0001 | - |
168
- | 1.2739 | 400 | 0.0001 | - |
169
- | 1.4331 | 450 | 0.0001 | - |
170
- | 1.5924 | 500 | 0.0001 | - |
171
- | 1.7516 | 550 | 0.0001 | - |
172
- | 1.9108 | 600 | 0.0001 | - |
173
- | 2.0 | 628 | - | 0.0 |
174
- | 2.0701 | 650 | 0.0001 | - |
175
- | 2.2293 | 700 | 0.0 | - |
176
- | 2.3885 | 750 | 0.0001 | - |
177
- | 2.5478 | 800 | 0.0 | - |
178
- | 2.7070 | 850 | 0.0001 | - |
179
- | 2.8662 | 900 | 0.0001 | - |
180
- | **3.0** | **942** | **-** | **0.0** |
181
- | 3.0255 | 950 | 0.0001 | - |
182
- | 3.1847 | 1000 | 0.0 | - |
183
- | 3.3439 | 1050 | 0.0001 | - |
184
- | 3.5032 | 1100 | 0.0001 | - |
185
- | 3.6624 | 1150 | 0.0 | - |
186
- | 3.8217 | 1200 | 0.0001 | - |
187
- | 3.9809 | 1250 | 0.0001 | - |
188
- | 4.0 | 1256 | - | 0.0 |
189
 
190
  * The bold row denotes the saved checkpoint.
191
  ### Framework Versions
 
5
  - sentence-transformers
6
  - text-classification
7
  - generated_from_setfit_trainer
8
+ base_model: BAAI/bge-m3
9
  metrics:
10
  - accuracy
11
  widget:
12
+ - text: What is the primary difference between a Bayesian neural network and a traditional
13
+ feedforward neural network in the context of machine learning?
14
+ - text: What is the difference betweensupervised and unsupervised machine learning
15
+ algorithms in terms of data labeling and model training?
16
+ - text: What is the primary application of Natural Language Processing (NLP) in Google's
17
+ BERT language model, and how does it utilize masked language modeling to improve
18
+ contextual understanding?
19
+ - text: What is the main advantage of using GraphQL over traditional RESTful APIs,
20
+ as demonstrated by social media giant Facebook in their Facebook ADS API?
21
+ - text: Qui est Robin Mancini ?
 
 
22
  pipeline_tag: text-classification
23
  inference: true
24
  model-index:
25
+ - name: SetFit with BAAI/bge-m3
26
  results:
27
  - task:
28
  type: text-classification
 
37
  name: Accuracy
38
  ---
39
 
40
+ # SetFit with BAAI/bge-m3
41
 
42
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
43
 
44
  The model has been trained using an efficient few-shot learning technique that involves:
45
 
 
50
 
51
  ### Model Description
52
  - **Model Type:** SetFit
53
+ - **Sentence Transformer body:** [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3)
54
  - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
55
+ - **Maximum Sequence Length:** 8192 tokens
56
  - **Number of Classes:** 2 classes
57
  <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
58
  <!-- - **Language:** Unknown -->
 
65
  - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
66
 
67
  ### Model Labels
68
+ | Label | Examples |
69
+ |:---------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
70
+ | lexical | <ul><li>'What is the definition of semantics in the context ofontology-based data integration, and how does it differ from outright data normalization, as implementented in graph databases like neo4j orAmazon Neptune?'</li><li>'What is the primary application of graph convolutional neural networks (GCNNs) in natural language processing (NLP) for modeling syntactic dependencies in parsing?'</li><li>"What is the distinguising feature of Apache Hive's Metadata Tables, used for maintaining and managingtables in Hadoop Distributed File System (HDFS)?"</li></ul> |
71
+ | semantic | <ul><li>'What is a key challenge faced by managers in sustaining a work culture that encourages creativity, innovation, and critical thinking within the technological industry globally?'</li><li>'How might shifting societal values influence the dynamics between multinational corporations and governments, leading to Changes in the global economic landscape?'</li><li>'How does the allocation of limited resources affect the allocation of decision-making power within an organization?'</li></ul> |
72
 
73
  ## Evaluation
74
 
 
95
  # Download from the 🤗 Hub
96
  model = SetFitModel.from_pretrained("yaniseuranova/setfit-paraphrase-mpnet-base-v2-sst2")
97
  # Run inference
98
+ preds = model("Qui est Robin Mancini ?")
99
  ```
100
 
101
  <!--
 
127
  ### Training Set Metrics
128
  | Training set | Min | Median | Max |
129
  |:-------------|:----|:--------|:----|
130
+ | Word count | 4 | 19.1392 | 56 |
131
 
132
  | Label | Training Sample Count |
133
  |:---------|:----------------------|
134
+ | lexical | 36 |
135
+ | semantic | 43 |
136
 
137
  ### Training Hyperparameters
138
  - batch_size: (16, 16)
 
154
  ### Training Results
155
  | Epoch | Step | Training Loss | Validation Loss |
156
  |:-------:|:-------:|:-------------:|:---------------:|
157
+ | 0.0050 | 1 | 0.1549 | - |
158
+ | 0.2475 | 50 | 0.0045 | - |
159
+ | 0.4950 | 100 | 0.0009 | - |
160
+ | 0.7426 | 150 | 0.0005 | - |
161
+ | 0.9901 | 200 | 0.0005 | - |
162
+ | 1.0 | 202 | - | 0.0001 |
163
+ | 1.2376 | 250 | 0.0006 | - |
164
+ | 1.4851 | 300 | 0.0006 | - |
165
+ | 1.7327 | 350 | 0.0005 | - |
166
+ | 1.9802 | 400 | 0.0004 | - |
167
+ | 2.0 | 404 | - | 0.0 |
168
+ | 2.2277 | 450 | 0.0003 | - |
169
+ | 2.4752 | 500 | 0.0003 | - |
170
+ | 2.7228 | 550 | 0.0003 | - |
171
+ | 2.9703 | 600 | 0.0003 | - |
172
+ | **3.0** | **606** | **-** | **0.0** |
173
+ | 3.2178 | 650 | 0.0003 | - |
174
+ | 3.4653 | 700 | 0.0004 | - |
175
+ | 3.7129 | 750 | 0.0003 | - |
176
+ | 3.9604 | 800 | 0.0002 | - |
177
+ | 4.0 | 808 | - | 0.0 |
 
 
 
 
 
 
 
 
 
178
 
179
  * The bold row denotes the saved checkpoint.
180
  ### Framework Versions
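Reviewer note: the Epoch column in the Training Results tables is consistent with step divided by steps per epoch, where the steps per epoch (314 before this commit, 202 after) are read off the rows at epoch 1.0. The short check below reproduces a few table values. The helper name `epoch_fraction` is hypothetical, and the pair counts at the end assume that the first element of `batch_size: (16, 16)` is the batch size of the embedding fine-tuning phase, which this diff does not state.

```python
def epoch_fraction(step, steps_per_epoch):
    """Epoch value for a training step, rounded to 4 decimals as in the table."""
    return round(step / steps_per_epoch, 4)

# Steps per epoch, taken from the rows where Epoch == 1.0.
OLD_STEPS_PER_EPOCH = 314
NEW_STEPS_PER_EPOCH = 202

old_fracs = [epoch_fraction(s, OLD_STEPS_PER_EPOCH) for s in (1, 50, 300)]
new_fracs = [epoch_fraction(s, NEW_STEPS_PER_EPOCH) for s in (1, 50, 200)]

# Assumption: 16 pairs per step during embedding fine-tuning.
old_pairs_per_epoch = OLD_STEPS_PER_EPOCH * 16
new_pairs_per_epoch = NEW_STEPS_PER_EPOCH * 16
```

The smaller training set (79 samples instead of 99) is what shortens each epoch from 314 to 202 steps.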
config.json CHANGED
@@ -1,24 +1,28 @@
  {
- "_name_or_path": "checkpoints/step_942",
+ "_name_or_path": "checkpoints/step_606",
  "architectures": [
- "MPNetModel"
+ "XLMRobertaModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
+ "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
- "hidden_size": 768,
+ "hidden_size": 1024,
  "initializer_range": 0.02,
- "intermediate_size": 3072,
+ "intermediate_size": 4096,
  "layer_norm_eps": 1e-05,
- "max_position_embeddings": 514,
- "model_type": "mpnet",
- "num_attention_heads": 12,
- "num_hidden_layers": 12,
+ "max_position_embeddings": 8194,
+ "model_type": "xlm-roberta",
+ "num_attention_heads": 16,
+ "num_hidden_layers": 24,
+ "output_past": true,
  "pad_token_id": 1,
- "relative_attention_num_buckets": 32,
+ "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.39.0",
- "vocab_size": 30527
+ "type_vocab_size": 1,
+ "use_cache": true,
+ "vocab_size": 250002
  }
config_sentence_transformers.json CHANGED
@@ -1,8 +1,8 @@
  {
  "__version__": {
- "sentence_transformers": "2.0.0",
- "transformers": "4.6.1",
- "pytorch": "1.8.1"
+ "sentence_transformers": "2.2.2",
+ "transformers": "4.33.0",
+ "pytorch": "2.1.2+cu121"
  },
  "prompts": {},
  "default_prompt_name": null
config_setfit.json CHANGED
@@ -1,7 +1,7 @@
  {
+ "normalize_embeddings": false,
  "labels": [
  "lexical",
  "semantic"
- ],
- "normalize_embeddings": false
+ ]
  }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:dbf6d3930a01c83d13c8e9917fadace89ffe82b20e0e8e1e79879f17c3f3ffea
- size 437967672
+ oid sha256:b6b631443242ee9cb7eaa44b335a5b8a0932d0f7730c1e523a2972f095dd5fe6
+ size 2271064456
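Reviewer note: the LFS pointer sizes above give a rough sense of how much larger the new embedding model is. Since config.json in this commit lists `"torch_dtype": "float32"`, dividing the file size by 4 bytes approximates the parameter count. This is only a sketch: a safetensors file also carries a small JSON header, so the figures slightly overestimate the true counts, and the helper name `approx_params` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4

def approx_params(file_size_bytes):
    """Rough parameter count, assuming every byte stores float32 weights."""
    return file_size_bytes // BYTES_PER_FLOAT32

# Sizes copied from the model.safetensors LFS pointers in this diff.
old_size = 437_967_672    # before: all-mpnet-base-v2 body
new_size = 2_271_064_456  # after: BAAI/bge-m3 body

old_params = approx_params(old_size)  # about 109 million
new_params = approx_params(new_size)  # about 568 million
size_ratio = new_size / old_size      # a bit over 5x
```

The roughly fivefold increase matches the config changes in this commit (hidden size 768 to 1024, 12 to 24 layers, vocabulary 30527 to 250002).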
model_head.pkl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3eb7a8f7e177ed92e49ddc0fd6d56cff9748420121ded82c0713ee52aa1e2294
- size 7039
+ oid sha256:8f26505603a392dfebb8ca914a16aa7e94aeeb8b35f89376e80a56616a8b08a4
+ size 9087
sentence_bert_config.json CHANGED
@@ -1,4 +1,4 @@
  {
- "max_seq_length": 384,
+ "max_seq_length": 8192,
  "do_lower_case": false
  }
special_tokens_map.json CHANGED
@@ -42,7 +42,7 @@
  "single_word": false
  },
  "unk_token": {
- "content": "[UNK]",
+ "content": "<unk>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -27,20 +27,12 @@
  "3": {
  "content": "<unk>",
  "lstrip": false,
- "normalized": true,
- "rstrip": false,
- "single_word": false,
- "special": true
- },
- "104": {
- "content": "[UNK]",
- "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
  },
- "30526": {
+ "250001": {
  "content": "<mask>",
  "lstrip": true,
  "normalized": false,
@@ -52,21 +44,19 @@
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
- "do_lower_case": true,
  "eos_token": "</s>",
  "mask_token": "<mask>",
- "max_length": 128,
- "model_max_length": 512,
+ "max_length": 8192,
+ "model_max_length": 8192,
  "pad_to_multiple_of": null,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "</s>",
+ "sp_model_kwargs": {},
  "stride": 0,
- "strip_accents": null,
- "tokenize_chinese_chars": true,
- "tokenizer_class": "MPNetTokenizer",
+ "tokenizer_class": "XLMRobertaTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
- "unk_token": "[UNK]"
+ "unk_token": "<unk>"
  }