ppsingh committed
Commit 9fd5038
Parent: d959a2c

Add SetFit model

1_Pooling/config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false
+ }
README.md ADDED
@@ -0,0 +1,225 @@
+ ---
+ library_name: setfit
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ metrics:
+ - accuracy
+ widget:
+ - text: Specific information applicable to Parties, including regional economic integration
+     organizations and their member States, that have reached an agreement to act jointly
+     under Article 4, paragraph 2, of the Paris Agreement, including the Parties that
+     agreed to act jointly and the terms of the agreement, in accordance with Article
+     4, paragraphs 16–18, of the Paris Agreement. Not applicable. (c). How the Party’s
+     preparation of its nationally determined contribution has been informed by the
+     outcomes of the global stocktake, in accordance with Article 4, paragraph 9, of
+     the Paris Agreement.
+ - text: 'In the shipping and aviation sectors, emission reduction efforts will be
+     focused on distributing eco-friendly ships and enhancing the operational efficiency
+     of aircraft. Agriculture, livestock farming and fisheries: The Republic Korea
+     is introducing various options to accelerate low-carbon farming, for instance,
+     improving irrigation techniques in rice paddies and adopting low-input systems
+     for nitrogen fertilizers.'
+ - text: As part of this commitment, Oman s upstream oil and gas industry is developing
+     economically viable solutions to phase out routine flaring as quickly as possible
+     and ahead of the World Bank s target date. IV. Climate Preparedness and Resilience.
+     The Sultanate of Oman has stepped up its efforts in advancing its expertise and
+     methodologies to better manage the climate change risks over the past five years.
+     The adaptation efforts are underway, and the status of adaptation planning is
+     still at a nascent stage.
+ - text: 'Synergy and coherence 46 VII- Gender and youth 46 VIII- Education and employment
+     48 ANNEXES. 49 Annex No. 1: Details of mitigation measures, conditional and non-conditional,
+     by sector 49 Annex No.2: List of adaptation actions proposed by sectors. 57 Annex
+     No.3: GCF project portfolio. 63 CONTRIBUTION DENTERMINEE AT NATIONAL LEVEL CDN
+     MAURITANIE LIST OF TABLES Table 1: Summary of funding needs for the CND 2021-2030
+     updated. 12 Table 2: CND 2021-2030 mitigation measures updated by sector (cumulative
+     cost and reduction potential for the period). 14 Table 3: CND 2021-2030 adaptation
+     measures updated by sector. Error!'
+ - text: In the transport sector, restructuing is planned through a number of large
+     infrastructure initiatives aiming to revive the role of public transport and achieving
+     a relevant share of fuel efficient vehicles. Under both the conditional and unconditional
+     mitigation scenarios, Lebanon will achieve sizeable emission reductions. With
+     regards to adaptation, Lebanon has planned comprehensive sectoral actions related
+     to water, agriculture/forestry and biodiversity, for example related to irrigation,
+     forest management, etc. It also continues developing adaptation strategies in
+     the remaining sectors.
+ pipeline_tag: text-classification
+ inference: false
+ co2_eq_emissions:
+   emissions: 25.8151164022705
+   source: codecarbon
+   training_type: fine-tuning
+   on_cloud: false
+   cpu_model: Intel(R) Xeon(R) CPU @ 2.00GHz
+   ram_total_size: 12.674781799316406
+   hours_used: 0.622
+   hardware_used: 1 x Tesla T4
+ base_model: ppsingh/SECTOR-multilabel-mpnet_w
+ ---
+ 
+ # SetFit with ppsingh/SECTOR-multilabel-mpnet_w
+ 
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for text classification. This SetFit model uses [ppsingh/SECTOR-multilabel-mpnet_w](https://huggingface.co/ppsingh/SECTOR-multilabel-mpnet_w) as the Sentence Transformer embedding model. A [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance is used for classification.
+ 
+ The model has been trained using an efficient few-shot learning technique that involves:
+ 
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+ 
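+ The code below is a minimal, hypothetical sketch of these two steps using the `setfit` trainer API. The training data used for this model is not published in this repository, so the dataset and example sentences are placeholders; the base model and label set are taken from this card and `config_setfit.json`.
+ 
+ ```python
+ from datasets import Dataset
+ from setfit import SetFitModel, Trainer, TrainingArguments
+ 
+ # Placeholder few-shot training set: a few labelled sentences per class (illustrative only).
+ train_dataset = Dataset.from_dict({
+     "text": [
+         "Feed-in tariffs will support rooftop solar deployment.",
+         "A new metro line is planned to reduce private car use.",
+     ],
+     "label": ["Energy", "Transport"],
+ })
+ 
+ # Load the pretrained embedding body and attach a torch-based SetFitHead,
+ # as this model card describes.
+ model = SetFitModel.from_pretrained(
+     "ppsingh/SECTOR-multilabel-mpnet_w",
+     labels=["Economy-wide", "Energy", "Other Sector", "Transport"],
+     use_differentiable_head=True,
+     head_params={"out_features": 4},
+ )
+ 
+ # trainer.train() runs both phases: contrastive fine-tuning of the embedding
+ # body on sentence pairs, then fitting the classification head on the embeddings.
+ trainer = Trainer(
+     model=model,
+     args=TrainingArguments(batch_size=16, num_epochs=1),
+     train_dataset=train_dataset,
+ )
+ trainer.train()
+ ```
+ 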
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [ppsingh/SECTOR-multilabel-mpnet_w](https://huggingface.co/ppsingh/SECTOR-multilabel-mpnet_w)
+ - **Classification head:** a [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance
+ - **Maximum Sequence Length:** 512 tokens
+ - **Number of Classes:** 4 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+ 
+ ## Uses
+ 
+ ### Direct Use for Inference
+ 
+ First install the SetFit library:
+ 
+ ```bash
+ pip install setfit
+ ```
+ 
+ Then you can load this model and run inference.
+ 
+ ```python
+ from setfit import SetFitModel
+ 
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("ppsingh/iki_sector_setfit")
+ # Run inference
+ preds = model("In the shipping and aviation sectors, emission reduction efforts will be focused on distributing eco-friendly ships and enhancing the operational efficiency of aircraft. Agriculture, livestock farming and fisheries: The Republic Korea is introducing various options to accelerate low-carbon farming, for instance, improving irrigation techniques in rice paddies and adopting low-input systems for nitrogen fertilizers.")
+ ```
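+ 
+ Because `config_setfit.json` defines the classes `Economy-wide`, `Energy`, `Other Sector` and `Transport`, predictions are returned as these label strings. The optional snippet below (same `setfit` API; the input sentences are illustrative only) shows batch prediction and per-class probabilities:
+ 
+ ```python
+ texts = [
+     "A new metro line will reduce private car use in the capital.",       # illustrative only
+     "The national grid will integrate additional wind and solar power.",  # illustrative only
+ ]
+ print(model.predict(texts))        # list of predicted sector labels
+ print(model.labels)                # ['Economy-wide', 'Energy', 'Other Sector', 'Transport']
+ print(model.predict_proba(texts))  # per-class scores from the SetFitHead
+ ```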
+ 
+ <!--
+ ### Downstream Use
+ 
+ *List how someone could finetune this model on their own dataset.*
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:----|
+ | Word count   | 35  | 76.164 | 170 |
+ 
+ ### Training Hyperparameters
+ - batch_size: (16, 2)
+ - num_epochs: (1, 0)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - body_learning_rate: (2e-05, 1e-05)
+ - head_learning_rate: 0.01
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.01
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+ 
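+ These values correspond to the fields of `setfit.TrainingArguments`. As an illustration (not a reproduction of the original training script), the non-default settings above could be expressed as:
+ 
+ ```python
+ from sentence_transformers.losses import CosineSimilarityLoss
+ from setfit import TrainingArguments
+ 
+ args = TrainingArguments(
+     batch_size=(16, 2),                  # (embedding phase, classifier phase)
+     num_epochs=(1, 0),
+     sampling_strategy="oversampling",
+     body_learning_rate=(2e-05, 1e-05),
+     head_learning_rate=0.01,
+     loss=CosineSimilarityLoss,
+     warmup_proportion=0.01,
+     seed=42,
+ )
+ ```
+ 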
+ ### Training Results
+ | Epoch  | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0005 | 1    | 0.2029        | -               |
+ | 0.0993 | 200  | 0.0111        | 0.1124          |
+ | 0.1985 | 400  | 0.0063        | 0.111           |
+ | 0.2978 | 600  | 0.0183        | 0.1214          |
+ | 0.3970 | 800  | 0.0197        | 0.1248          |
+ | 0.4963 | 1000 | 0.0387        | 0.1339          |
+ | 0.5955 | 1200 | 0.0026        | 0.1181          |
+ | 0.6948 | 1400 | 0.0378        | 0.1208          |
+ | 0.7940 | 1600 | 0.0285        | 0.1267          |
+ | 0.8933 | 1800 | 0.0129        | 0.1254          |
+ | 0.9926 | 2000 | 0.0341        | 0.1271          |
+ 
+ ### Environmental Impact
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+ - **Carbon Emitted**: 0.026 kg of CO2
+ - **Hours Used**: 0.622 hours
+ 
+ ### Training Hardware
+ - **On Cloud**: No
+ - **GPU Model**: 1 x Tesla T4
+ - **CPU Model**: Intel(R) Xeon(R) CPU @ 2.00GHz
+ - **RAM Size**: 12.67 GB
+ 
+ ### Framework Versions
+ - Python: 3.10.12
+ - SetFit: 1.0.3
+ - Sentence Transformers: 2.3.1
+ - Transformers: 4.35.2
+ - PyTorch: 2.1.0+cu121
+ - Datasets: 2.3.0
+ - Tokenizers: 0.15.1
+ 
+ ## Citation
+ 
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "_name_or_path": "ppsingh/SECTOR-multilabel-mpnet_w",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "Agriculture",
+     "1": "Buildings",
+     "2": "Coastal Zone",
+     "3": "Cross-Cutting Area",
+     "4": "Disaster Risk Management (DRM)",
+     "5": "Economy-wide",
+     "6": "Education",
+     "7": "Energy",
+     "8": "Environment",
+     "9": "Health",
+     "10": "Industries",
+     "11": "LULUCF/Forestry",
+     "12": "Social Development",
+     "13": "Tourism",
+     "14": "Transport",
+     "15": "Urban",
+     "16": "Waste",
+     "17": "Water"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "Agriculture": 0,
+     "Buildings": 1,
+     "Coastal Zone": 2,
+     "Cross-Cutting Area": 3,
+     "Disaster Risk Management (DRM)": 4,
+     "Economy-wide": 5,
+     "Education": 6,
+     "Energy": 7,
+     "Environment": 8,
+     "Health": 9,
+     "Industries": 10,
+     "LULUCF/Forestry": 11,
+     "Social Development": 12,
+     "Tourism": 13,
+     "Transport": 14,
+     "Urban": 15,
+     "Waste": 16,
+     "Water": 17
+   },
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "problem_type": "multi_label_classification",
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.35.2",
+   "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "__version__": {
+     "sentence_transformers": "2.3.1",
+     "transformers": "4.35.2",
+     "pytorch": "2.1.0+cu121"
+   }
+ }
config_setfit.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "normalize_embeddings": true,
+   "labels": [
+     "Economy-wide",
+     "Energy",
+     "Other Sector",
+     "Transport"
+   ]
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c5f137bb2e1e7da1eebf8b21d4b5878675c41b4240614b3e4bccb248029eb52c
+ size 437967672
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88559af6420967265b7519c9b35c1a3efa8a7ef3ee3c4b40d3f5f3225ffab36b
+ size 13858
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,73 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "problem_type": "multi_label_classification",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff