Add SetFit model

Files changed (13)

1_Pooling/config.json +9 -0
README.md +241 -0
config.json +37 -0
config_sentence_transformers.json +7 -0
config_setfit.json +7 -0
model.safetensors +3 -0
model_head.pkl +3 -0
modules.json +14 -0
sentence_bert_config.json +4 -0
special_tokens_map.json +51 -0
tokenizer.json +0 -0
tokenizer_config.json +73 -0
vocab.txt +0 -0

1_Pooling/config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false
+}
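With `pooling_mode_mean_tokens: true` (and every other mode off), the sentence embedding is the average of the 768-dimensional token vectors produced by the MPNet body, counting only real tokens (attention mask 1) and ignoring padding. A minimal pure-Python sketch of that mask-aware mean, using illustrative 2-dimensional vectors instead of 768; `mean_pool` is a hypothetical helper, not the Sentence Transformers implementation:

```python
def mean_pool(token_embeddings, attention_mask):
    """Mask-aware mean pooling: average only the token vectors whose mask is 1."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                totals[i] += value
            kept += 1
    # Guard against an all-padding input to avoid dividing by zero.
    kept = max(kept, 1)
    return [total / kept for total in totals]


# Two real tokens and one padding token (illustrative 2-d vectors).
embeddings = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(embeddings, mask)
print(sentence_vector)  # [2.0, 3.0]
```

The padding row `[100.0, 100.0]` has no effect on the result, which is the point of weighting by the mask rather than averaging every position.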

README.md ADDED

	@@ -0,0 +1,241 @@

+---
+library_name: setfit
+tags:
+- setfit
+- sentence-transformers
+- text-classification
+- generated_from_setfit_trainer
+metrics:
+- accuracy
+widget:
+- text: During 2021-2030, Thailand’s LEDS will be implemented through the NDC roadmap
+    and sectoral action plans which provide detailed guidance on measures and realistic
+    actions to achieve the 1st NDC target by 2030, as well as regular monitoring and
+    evaluation of the progress and achievement. The monitoring and evaluation of the
+    mitigation measures relating to the Thailand’s LEDS will be carried out to ensure
+    its effectiveness and efficiency in achieving its objectives and key performance
+    indicators. Because it is a long-term plan spanning many years during which many
+    changes can occur, it is envisaged that it will be subject to a comprehensive
+    review every five years. This is consistent with the approach under the Paris
+    Agreement that assigned Parties to submit their NDCs to the UNFCCC every five
+    year.
+- text: The NDC also benefited from the reviews and comments of these implementing
+    partners as well as local and international experts. Special thanks to The Honourable
+    Molwyn Joseph, Minister for Health, Wellness and the Environment, for his unwavering
+    commitment to advance this ambitious climate change agenda, while Antigua and
+    Barbuda faced an outbreak of the COVID-19 pandemic. Significant contributions
+    to the process were made by a wide-cross section of stakeholders from the public
+    and private sector, civil society, trade and industry groups and training institutions,
+    who attended NDC-related workshops, consultations and participated in key stakeholder
+    interviews organized to inform the NDC update.
+- text: Antigua and Barbuda will mainstream gender in its energy planning through
+    an Inclusive Renewable Energy Strategy. This strategy will recognize and acknowledge,
+    among other things, the gender norms, and inequalities prevalent in the energy
+    sector, women and men’s differentiated access to energy, their different energy
+    needs and preferences, and different impacts that energy access could have on
+    their livelihoods. Antigua and Barbuda’s plan for an inclusive renewable energy
+    transition will ensure continued affordable and reliable access to electricity
+    and other energy services for all.
+- text: 'Thailand’s climate actions are divided into short-term, medium-term and long-term
+    targets up to 2050. For the mitigation actions, short-term targets include: (i)
+    develop medium- and long-term GHG emission reduction targets and prepare roadmaps
+    for the implementation by sector, including the GHG emission reduction target
+    on a voluntary basis (pre-2020 target), Nationally Appropriate Mitigation Actions
+    (NAMAs) roadmaps, and measurement, reporting, and verification mechanisms, (ii)
+    establish domestic incentive mechanisms to encourage low carbon development. The
+    medium-term targets include: (i) reduce GHG emissions from energy and transport
+    sectors by 7-20% against BAU level by 2020, subject to the level of international
+    support, (ii) supply at least 25% of energy consumption from renewable energy
+    sources by 2021 and (iii) increase the ratio of municipalities with more than
+    10 m2 of green space per capita.'
+- text: In the oil sector, the country has benefited from 372 million dollars for
+    the reduction of gas flaring at the initiative (GGFR - "Global Gas Flaring Reduction")
+    of the World Bank after having adopted in November 2015 a national reduction plan
+    flaring and associated gas upgrading. In the electricity sector, the NDC highlights
+    the development of hydroelectricity which should make it possible to cover 80%
+    of production in 2025, the remaining 20% being
+    covered by gas and other renewable energies.
+pipeline_tag: text-classification
+inference: true
+co2_eq_emissions:
+  emissions: 5.901369050433577
+  source: codecarbon
+  training_type: fine-tuning
+  on_cloud: false
+  cpu_model: Intel(R) Xeon(R) CPU @ 2.00GHz
+  ram_total_size: 12.674789428710938
+  hours_used: 0.185
+  hardware_used: 1 x Tesla T4
+base_model: ppsingh/TAPP-multilabel-mpnet
+---
+# SetFit with ppsingh/TAPP-multilabel-mpnet
+This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [ppsingh/TAPP-multilabel-mpnet](https://huggingface.co/ppsingh/TAPP-multilabel-mpnet) as the Sentence Transformer embedding model. A [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance is used for classification.
+The model has been trained using an efficient few-shot learning technique that involves:
+1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+2. Training a classification head with features from the fine-tuned Sentence Transformer.
+## Model Details
+### Model Description
+- **Model Type:** SetFit
+- **Sentence Transformer body:** [ppsingh/TAPP-multilabel-mpnet](https://huggingface.co/ppsingh/TAPP-multilabel-mpnet)
+- **Classification head:** a [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance
+- **Maximum Sequence Length:** 512 tokens
+- **Number of Classes:** 2 classes
+<!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+<!-- - **Language:** Unknown -->
+<!-- - **License:** Unknown -->
+### Model Sources
+- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+### Model Labels
+| Label    | Examples |
+|:---------|:---------|
+| NEGATIVE | <ul><li>'(p 70-1).Antigua and Barbuda’s 2021 update to the first Nationally Determined Contribution the most vulnerable in society have been predominantly focused on adaptation measures like building resilience to flooding and hurricanes. The updated NDC ambition provides an opportunity to focus more intently on enabling access to energy efficiency and renewable energy for the most vulnerable, particularly women who are most affected when electricity is not available since the grid is down after an extreme weather event. Nationally, Antigua and Barbuda intends to utilize the SIRF Fund as a mechanism primarily to catalyse and leverage investment in the transition for NGOs, MSMEs and informal sectors that normally cannot access traditional local commercial financing due to perceived high risks.'</li><li>'The transport system cost will be increased by 16.2% compared to the BAU level. Electric trucks and electric pick-ups will account for the highest share of investment followed by electric buses and trucks. In the manufacturing industries, the energy efficiency improvement in the heating and the motor systems and the deployment of CCS require the highest investment in the non-metallic and the chemical industries in 2050. The manufacturing industries system cost will be increased by 15.3% compared to the BAU level.'</li><li>'Figure 1-9: Total GHG emissions by sector (excluding LULUCF) 2000 and 2016 1.2.2 Greenhouse Gas Emission by Sector • Energy Total direct GHG emissions from the Energy sector in 2016 were estimated to be 253,895.61 eq. The majority of GHG emissions in the Energy sector were generated by fuel combustion, consisting mostly of grid-connected electricity and heat production at around eq (42.84%). GHG emissions from Transport, Manufacturing Industries and Construction, and other sectors were 68,260.17 GgCO2 eq eq (6.10%), respectively. Fugitive Emissions from fuel eq or a little over 4.33% of total GHG emissions from the Energy sector. Details of GHG emissions in the Energy sector by gas type and source in 2016 are presented in Figure 1-10. Source: Thailand Third Biennial Update Report, UNFCCC 2020.'</li></ul> |
+| TARGET   | <ul><li>'DNPM, NFA,. Cocoa. Board,. Spice Board,. Provincial. gov-ernments. in the. Momase. region. Ongoing -. 2025. 340. European Union. Support committed. Priority Sector: Health. By 2030, 100% of the population benefit from introduced health measures to respond to malaria and other climate-sensitive diseases in PNG. Action or Activity. Indicator. Status. Lead. Implementing. Agencies. Supporting. Agencies. Time Frame. Budget (USD). Funding Source. (Existing/Potential). Other Support. Improve vector control. measures, with a priority. of all households having. access to a long-lasting. insecticidal net (LLIN).'</li><li>'Conditionality: With national effort it is intended to increase the attention to vulnerable groups in case of disasters and/or emergencies up to 50% of the target and 100% of the target with international cooperation. Description: In this goal, it is projected to increase coverage from 33% to 50% (211,000 families) of agricultural insurance in attention to the number of families, whose crops were affected by various adverse weather events (flood, drought, frost, hailstorm, among others), in addition to the implementation of comprehensive actions for risk management and adaptation to Climate Change.'</li><li>'By 2030, upgrade watershed health and vitality in at least 20 districts to a higher condition category. By 2030, create an inventory of wetlands in Nepal and sustainably manage vulnerable wetlands. By 2025, enhance the sink capacity of the landuse sector by instituting the Forest Development Fund (FDF) for compensation of plantations and forest restoration. Increase growing stock including Mean Annual Increment in Tarai, Hills and Mountains. Afforest/reforest viable public and private lands, including agroforestry.'</li></ul> |
+## Uses
+### Direct Use for Inference
+First install the SetFit library:
+```bash
+pip install setfit
+```
+Then you can load this model and run inference.
+```python
+from setfit import SetFitModel
+# Download from the 🤗 Hub
+model = SetFitModel.from_pretrained("ppsingh/iki_target_setfit")
+# Run inference; predictions are returned as the labels "NEGATIVE" or "TARGET"
+preds = model("In the oil sector, the country has benefited from 372 million dollars for the reduction of gas flaring at the initiative (GGFR - \"Global Gas Flaring Reduction\") of the World Bank after having adopted in November 2015 a national reduction plan flaring and associated gas upgrading. In the electricity sector, the NDC highlights the development of hydroelectricity which should make it possible to cover 80% of production in 2025, the remaining 20% being covered by gas and other renewable energies.")
+```
+<!--
+### Downstream Use
+*List how someone could finetune this model on their own dataset.*
+-->
+<!--
+### Out-of-Scope Use
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+<!--
+## Bias, Risks and Limitations
+*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+-->
+<!--
+### Recommendations
+*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+-->
+## Training Details
+### Training Set Metrics
+| Training set | Min | Mean     | Max |
+|:-------------|:----|:---------|:----|
+| Word count   | 58  | 116.6632 | 508 |
+
+| Label    | Training Sample Count |
+|:---------|:----------------------|
+| NEGATIVE | 51                    |
+| TARGET   | 44                    |
+### Training Hyperparameters
+- batch_size: (8, 2)
+- num_epochs: (1, 0)
+- max_steps: -1
+- sampling_strategy: undersampling
+- body_learning_rate: (2e-05, 1e-05)
+- head_learning_rate: 0.01
+- loss: CosineSimilarityLoss
+- distance_metric: cosine_distance
+- margin: 0.25
+- end_to_end: False
+- use_amp: False
+- warmup_proportion: 0.01
+- seed: 42
+- eval_max_steps: -1
+- load_best_model_at_end: False
+### Training Results
+| Epoch  | Step | Training Loss | Validation Loss |
+|:------:|:----:|:-------------:|:---------------:|
+| 0.0018 | 1    | 0.3343        | -               |
+| 0.1783 | 100  | 0.0026        | 0.1965          |
+| 0.3565 | 200  | 0.0001        | 0.1995          |
+| 0.5348 | 300  | 0.0001        | 0.2105          |
+| 0.7130 | 400  | 0.0001        | 0.2153          |
+| 0.8913 | 500  | 0.0           | 0.1927          |
+### Environmental Impact
+Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+- **Carbon Emitted**: 0.006 kg of CO2
+- **Hours Used**: 0.185 hours
+### Training Hardware
+- **On Cloud**: No
+- **GPU Model**: 1 x Tesla T4
+- **CPU Model**: Intel(R) Xeon(R) CPU @ 2.00GHz
+- **RAM Size**: 12.67 GB
+### Framework Versions
+- Python: 3.10.12
+- SetFit: 1.0.3
+- Sentence Transformers: 2.3.1
+- Transformers: 4.35.2
+- PyTorch: 2.1.0+cu121
+- Datasets: 2.3.0
+- Tokenizers: 0.15.1
+## Citation
+### BibTeX
+```bibtex
+@article{https://doi.org/10.48550/arxiv.2209.11055,
+    doi = {10.48550/ARXIV.2209.11055},
+    url = {https://arxiv.org/abs/2209.11055},
+    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+    title = {Efficient Few-Shot Learning Without Prompts},
+    publisher = {arXiv},
+    year = {2022},
+    copyright = {Creative Commons Attribution 4.0 International}
+}
+```
+<!--
+## Glossary
+*Clearly define terms in order to be accessible across audiences.*
+-->
+<!--
+## Model Card Authors
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+<!--
+## Model Card Contact
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
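The README's Training Results table logs epoch 0.1783 at step 100 and 0.8913 at step 500, i.e. about 561 optimizer steps per epoch at the body batch size of 8, or roughly 4,488 sentence pairs. The sketch below is a simplified, hypothetical reconstruction of contrastive pair generation for `CosineSimilarityLoss` with `sampling_strategy: undersampling`: same-class pairs get target 1.0, cross-class pairs get 0.0, and the larger set is randomly cut down to the size of the smaller one. Including self-pairs among the positives is an assumption, chosen because it makes the count agree with the log. This is not SetFit's own sampler, and `build_contrastive_pairs` is an illustrative name. With the 51 NEGATIVE and 44 TARGET training samples, it yields exactly 4,488 pairs:

```python
import itertools
import math
import random


def build_contrastive_pairs(texts, labels, seed=42):
    """Simplified sketch of SetFit-style pair generation with undersampling.

    Positives: every unordered same-class pair, self-pairs included (assumption).
    Negatives: every cross-class pair.
    The larger set is randomly undersampled to the size of the smaller one.
    """
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    positives = [
        (a, b, 1.0)
        for members in by_label.values()
        for a, b in itertools.combinations_with_replacement(members, 2)
    ]
    negatives = [
        (a, b, 0.0)
        for label_a, label_b in itertools.combinations(by_label, 2)
        for a, b in itertools.product(by_label[label_a], by_label[label_b])
    ]

    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    pairs = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(pairs)
    return pairs


# Placeholder texts with the README's class counts.
texts = [f"negative passage {i}" for i in range(51)] + [f"target passage {i}" for i in range(44)]
labels = ["NEGATIVE"] * 51 + ["TARGET"] * 44
pairs = build_contrastive_pairs(texts, labels)
steps_per_epoch = math.ceil(len(pairs) / 8)  # body batch size from batch_size: (8, 2)
print(len(pairs), steps_per_epoch)  # 4488 561
print(round(500 / steps_per_epoch, 4))  # 0.8913
```

The arithmetic: 51·52/2 + 44·45/2 = 2,316 positive pairs and 51·44 = 2,244 negative pairs, so undersampling keeps 2,244 of each.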

config.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "_name_or_path": "ppsingh/TAPP-multilabel-mpnet",
+  "architectures": [
+    "MPNetModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "ActionLabel",
+    "1": "PlansLabel",
+    "2": "PolicyLabel",
+    "3": "TargetLabel"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "ActionLabel": 0,
+    "PlansLabel": 1,
+    "PolicyLabel": 2,
+    "TargetLabel": 3
+  },
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 514,
+  "model_type": "mpnet",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "problem_type": "multi_label_classification",
+  "relative_attention_num_buckets": 32,
+  "torch_dtype": "float32",
+  "transformers_version": "4.35.2",
+  "vocab_size": 30527
+}
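As a rough consistency check between `config.json` and the `model.safetensors` pointer (size 437967672 bytes), the sketch below counts parameters for the standard Hugging Face MPNet layout, assuming the pooler layer is stored and all weights are float32 (`torch_dtype`). It gives 109,486,464 parameters, or 437,945,856 bytes; attributing the remaining 21,816 bytes to the safetensors header is an assumption that was not checked against the file. `mpnet_param_count` is a hypothetical helper:

```python
import json

# Subset of config.json above.
CONFIG = json.loads("""{
  "hidden_size": 768, "intermediate_size": 3072, "num_hidden_layers": 12,
  "num_attention_heads": 12, "vocab_size": 30527,
  "max_position_embeddings": 514, "relative_attention_num_buckets": 32
}""")
SAFETENSORS_SIZE = 437967672  # bytes, from the model.safetensors LFS pointer


def linear_params(n_in, n_out):
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def mpnet_param_count(cfg, with_pooler=True):
    h = cfg["hidden_size"]
    layer_norm = 2 * h  # scale and shift
    # Word and position embeddings (MPNet has no token-type embeddings) plus LayerNorm.
    embeddings = (cfg["vocab_size"] + cfg["max_position_embeddings"]) * h + layer_norm
    per_layer = (
        4 * linear_params(h, h)  # q, k, v, o projections
        + layer_norm  # attention LayerNorm
        + linear_params(h, cfg["intermediate_size"])  # feed-forward up
        + linear_params(cfg["intermediate_size"], h)  # feed-forward down
        + layer_norm  # output LayerNorm
    )
    # Shared relative-position bias: one scalar per (bucket, head).
    relative_bias = cfg["relative_attention_num_buckets"] * cfg["num_attention_heads"]
    encoder = cfg["num_hidden_layers"] * per_layer + relative_bias
    pooler = linear_params(h, h) if with_pooler else 0
    return embeddings + encoder + pooler


params = mpnet_param_count(CONFIG)
weight_bytes = params * 4  # float32
print(params, weight_bytes, SAFETENSORS_SIZE - weight_bytes)  # 109486464 437945856 21816
```

The `id2label`/`label2id` maps (four labels, `multi_label_classification`) come from the base model and are not used by the two-class SetFit head, whose labels are in `config_setfit.json`.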

config_sentence_transformers.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "__version__": {
+    "sentence_transformers": "2.3.1",
+    "transformers": "4.35.2",
+    "pytorch": "2.1.0+cu121"
+  }
+}

config_setfit.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "labels": [
+    "NEGATIVE",
+    "TARGET"
+  ],
+  "normalize_embeddings": true
+}
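`config_setfit.json` lists the label names in class-index order and sets `normalize_embeddings: true`, so sentence embeddings are L2-normalized before they reach the classification head. The sketch below illustrates that flow with a hypothetical two-class linear head and made-up weights. The real head is the pickled `SetFitHead` in `model_head.pkl`, and this sketch does not claim to match its exact form (for example, one versus two output logits); `predict_label` is an illustrative name:

```python
import math

LABELS = ["NEGATIVE", "TARGET"]  # config_setfit.json, index order


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def softmax(logits):
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(embedding, weights, biases):
    """Hypothetical linear head: one weight row and one bias per class."""
    x = l2_normalize(embedding)
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Made-up 2-d embedding and weights, for illustration only.
label, probs = predict_label([3.0, 4.0], weights=[[1.0, 0.0], [0.0, 1.0]], biases=[0.0, 0.0])
print(label)  # TARGET
```

Normalization maps `[3.0, 4.0]` to `[0.6, 0.8]`, so the second logit wins and index 1 resolves to `TARGET`.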

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d43161723b44e5f295ece8f088b6a3dc0c70e5f861db5d7d1c692aca42c03e65
+size 437967672

model_head.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b471c1931e8f7d0c3c1e47b999d7a9041fd61d444f951d09a88ddf161c462122
+size 7702

modules.json ADDED

	@@ -0,0 +1,14 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  }
+]
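`modules.json` tells Sentence Transformers which modules to chain and where each one's files live. The `Transformer` module (the MPNet body) is loaded from the repository root because its `path` is empty and produces token embeddings. The `Pooling` module, configured in `1_Pooling/config.json`, averages those token embeddings into one 768-dimensional sentence vector. The sketch below only parses the file and lists that order; `module_plan` is a hypothetical helper, not a Sentence Transformers function:

```python
import json

MODULES_JSON = """[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]"""


def module_plan(modules_json):
    """Return (class name, directory) pairs in the order the modules are chained."""
    modules = sorted(json.loads(modules_json), key=lambda m: m["idx"])
    # An empty path means the module's files live at the repository root.
    return [(m["type"].rsplit(".", 1)[-1], m["path"] or ".") for m in modules]


for step, (cls, directory) in enumerate(module_plan(MODULES_JSON)):
    print(step, cls, directory)
# 0 Transformer .
# 1 Pooling 1_Pooling
```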

sentence_bert_config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "max_seq_length": 512,
+  "do_lower_case": false
+}
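`max_seq_length: 512` caps each input at 512 token IDs, special tokens included. Longer passages are cut from the right (`truncation_side: right` in `tokenizer_config.json`); the longest training example has 508 words, which may well exceed 512 subword tokens. `max_position_embeddings` in `config.json` is 514 rather than 512 because MPNet, like RoBERTa, numbers non-padding positions starting after the padding index (`pad_token_id: 1`), so 512 tokens occupy positions 2 to 513. The hypothetical helpers below sketch both behaviours; they are not the tokenizer's or the model's implementation:

```python
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2  # <s>, <pad>, </s> in tokenizer_config.json
MAX_SEQ_LENGTH = 512  # sentence_bert_config.json


def truncate_right(content_ids, max_len=MAX_SEQ_LENGTH):
    """Keep the first max_len - 2 content tokens, then wrap them in <s> ... </s>."""
    return [BOS_ID] + content_ids[: max_len - 2] + [EOS_ID]


def mpnet_position_ids(input_ids, pad_id=PAD_ID):
    """RoBERTa-style positions: non-padding tokens count up from pad_id + 1."""
    positions, seen = [], 0
    for token in input_ids:
        if token == pad_id:
            positions.append(pad_id)
        else:
            seen += 1
            positions.append(pad_id + seen)
    return positions


ids = truncate_right(list(range(1000, 1900)))  # 900 made-up content token IDs
print(len(ids), max(mpnet_position_ids(ids)))  # 512 513
```

The largest position ID, 513, is the last valid row of a 514-entry position table.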

special_tokens_map.json ADDED

	@@ -0,0 +1,51 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,73 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "104": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30526": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "do_lower_case": true,
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "max_length": 128,
+  "model_max_length": 512,
+  "pad_to_multiple_of": null,
+  "pad_token": "<pad>",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "problem_type": "multi_label_classification",
+  "sep_token": "</s>",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "MPNetTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}
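`tokenizer_config.json` registers two unknown-token strings, `<unk>` (ID 3) and `[UNK]` (ID 104), and sets `unk_token` to `[UNK]`, consistent with `special_tokens_map.json`. Resolving the configured token names through `added_tokens_decoder` shows which IDs those special tokens map to; `<mask>` lands at 30526, the last row of the 30,527-entry vocabulary in `config.json`. A small sketch with a hypothetical helper, `special_token_ids`:

```python
ADDED_TOKENS = {  # id -> content, from added_tokens_decoder above
    0: "<s>", 1: "<pad>", 2: "</s>", 3: "<unk>", 104: "[UNK]", 30526: "<mask>",
}
SPECIAL = {  # token name -> content, from the top-level keys above
    "bos_token": "<s>", "eos_token": "</s>", "pad_token": "<pad>",
    "mask_token": "<mask>", "unk_token": "[UNK]",
}


def special_token_ids(added_tokens, special):
    """Map each special-token name to its ID by inverting added_tokens_decoder."""
    content_to_id = {content: idx for idx, content in added_tokens.items()}
    return {name: content_to_id[content] for name, content in special.items()}


ids = special_token_ids(ADDED_TOKENS, SPECIAL)
print(ids["unk_token"], ids["mask_token"])  # 104 30526
```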

vocab.txt ADDED

The diff for this file is too large to render.