---
library_name: setfit
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
metrics:
- accuracy
widget:
- text: How does cannibalization within the RTEC category compare to other product categories within the MT channel, influencing the overall volumelift?
- text: Can you identify the specific factors or challenges that contributed to the decline in ROI within TT in 2022 compared to 2021?
- text: Which Sku cannibalizes higher margin Skus the most for CHEDRAUI channel_name?
- text: Can you compare the overall market share and competitive landscape of the category more sensitive to internal cannibalization with other categories?
- text: Can you identify the key factors or challenges that have contributed to the ROI decline within TT
pipeline_tag: text-classification
inference: true
base_model: intfloat/multilingual-e5-large
model-index:
- name: SetFit with intfloat/multilingual-e5-large
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      name: Unknown
      type: unknown
      split: test
    metrics:
    - type: accuracy
      value: 0.9130434782608695
      name: Accuracy
---

# SetFit with intfloat/multilingual-e5-large

This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
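
To make step 1 concrete, here is a minimal sketch of how labeled few-shot sentences can be turned into contrastive pairs: two sentences with the same label get a target similarity of 1.0, and two with different labels get 0.0. This is a simplified illustration, not SetFit's actual sampler; `make_contrastive_pairs` is a hypothetical helper, and the six sentences are taken from the label examples in this card. Step 2 (fitting the classification head on the resulting embeddings) is not shown.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Pair every two sentences; target is 1.0 for same label, else 0.0."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Two example sentences per label, copied from the Model Labels examples.
texts = [
    "Which channel has the max ROI and Vol Lift when we run the Promotion for RTEC category?",
    "Which sku_group have seen the highest Lifts for Promo Price in MT WS catg_nm in 2022?",
    "Why has the overall Lift declined in 2023 in Zucaritas vs 2022?",
    "Are there plans to enhance promotional activities specific to the MT to mitigate the ROI decline in 2023?",
    "How is the TL performed with respect to volume Lift for Zucaritas in 2022?",
    "Which category is more sensititive to internal cannibalization?",
]
labels = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]

pairs = make_contrastive_pairs(texts, labels)
positives = [pair for pair in pairs if pair[2] == 1.0]
print(f"{len(pairs)} pairs, {len(positives)} positive")
```

Pairs of this shape (two sentences plus a target similarity) are the kind of input a cosine-similarity loss consumes. The real trainer samples pairs rather than enumerating all of them.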
## Model Details

### Model Description
- **Model Type:** SetFit
- **Sentence Transformer body:** [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large)
- **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
- **Maximum Sequence Length:** 512 tokens
- **Number of Classes:** 3 classes

### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
- **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)

### Model Labels
| Label | Examples |
|:------|:---------|
| 2.0   | <ul><li>'Are there particular factors or trends contributing to the high level of cannibalization for certain brands in the SS category?'</li><li>'How is the TL performed with respect to volume Lift for Zucaritas in 2022?'</li><li>'Which category is more sensititive to internal cannibalization?'</li></ul> |
| 1.0   | <ul><li>'Are there plans to enhance promotional activities specific to the MT to mitigate the ROI decline in 2023?'</li><li>'Can you provide a detailed analysis of the categories that experienced the highest and lowest ROI changes from 2021 to 2022?'</li><li>'Why has the overall Lift declined in 2023 in Zucaritas vs 2022?'</li></ul> |
| 0.0   | <ul><li>'Which sku_group have seen the highest Lifts for Promo Price in MT WS catg_nm in 2022?'</li><li>'Which channel has the max ROI and Vol Lift when we run the Promotion for RTEC category?'</li><li>'How is the promotion efficacy in 2022 compared to 2021 for RTEC category and BARS subcategory? '</li></ul> |

## Evaluation

### Metrics
| Label   | Accuracy |
|:--------|:---------|
| **all** | 0.9130   |

## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("vgarg/promo_prescriptive_harshal_trail_27_02_2024")
# Run inference
preds = model("Which Sku cannibalizes higher margin Skus the most for CHEDRAUI channel_name?")
```

## Training Details

### Training Set Metrics
| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 8   | 16.1333 | 30  |

| Label | Training Sample Count |
|:------|:----------------------|
| 0.0   | 10                    |
| 1.0   | 10                    |
| 2.0   | 10                    |

### Training Hyperparameters
- batch_size: (16, 16)
- num_epochs: (3, 3)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 20
- body_learning_rate: (2e-05, 2e-05)
- head_learning_rate: 2e-05
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False

### Training Results
| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0133 | 1    | 0.3648        | -               |
| 0.6667 | 50   | 0.0031        | -               |
| 1.3333 | 100  | 0.0006        | -               |
| 2.0    | 150  | 0.0003        | -               |
| 2.6667 | 200  | 0.0003        | -               |

### Framework Versions
- Python: 3.10.12
- SetFit: 1.0.3
- Sentence Transformers: 2.4.0
- Transformers: 4.37.2
- PyTorch: 2.1.0+cu121
- Datasets: 2.17.1
- Tokenizers: 0.15.2

## Citation

### BibTeX
```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
```
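
A note on the Training Results table: its epoch values can be cross-checked against the hyperparameters. The sketch below assumes that each of the `num_iterations` passes draws one positive and one negative pair per training sentence. That is an assumption about the sampler, not something this card states. Under it, the 30 training sentences yield 1,200 pairs, or 75 steps per epoch at batch size 16. `steps_per_epoch` is a hypothetical helper, not a SetFit function.

```python
import math


def steps_per_epoch(num_samples, num_iterations, batch_size):
    """Optimizer steps per epoch, assuming 2 pairs per sample per iteration."""
    num_pairs = 2 * num_iterations * num_samples  # assumed pair count
    return math.ceil(num_pairs / batch_size)


# Values from this card: 3 labels x 10 samples, num_iterations=20, batch_size=16.
spe = steps_per_epoch(num_samples=30, num_iterations=20, batch_size=16)

# Logged steps from the Training Results table, converted to fractional epochs.
logged_steps = (1, 50, 100, 150, 200)
epochs = [round(step / spe, 4) for step in logged_steps]
for step, epoch in zip(logged_steps, epochs):
    print(f"step {step:>3} -> epoch {epoch}")
```

The computed values agree with the Epoch column of the table, which is consistent with the assumed pair count.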