# SetFit

This is a SetFit model that can be used for Text Classification. It uses a LogisticRegression instance as the classification head.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
## Model Details

### Model Description
- Model Type: SetFit
- Classification head: a LogisticRegression instance
- Maximum Sequence Length: 384 tokens
- Number of Classes: 2 classes
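
The figures above can be checked directly on a loaded model. The snippet below is a minimal sketch assuming the `setfit` 1.x API and the placeholder model id `setfit_model_id` used later in this card; `model_body` is the fine-tuned Sentence Transformer and `model_head` is the LogisticRegression classifier.

```python
from setfit import SetFitModel

# "setfit_model_id" is a placeholder, as elsewhere in this card.
model = SetFitModel.from_pretrained("setfit_model_id")

print(model.model_head)                 # LogisticRegression classification head
print(model.model_body.max_seq_length)  # maximum sequence length (384 tokens)
print(model.labels)                     # class labels, e.g. ["bug", "non-bug"], if stored with the model
```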
### Model Sources

- **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
- **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
### Model Labels

| Label | Examples |
|:------|:---------|
| bug | <ul><li>'lookatme requirements should specify click<9\nlookatme specifies a requirements of click>=7,<8 but in fact seems to work fine with click 8+. Many tools (including poetry, and soon pip) will refuse to install lookatme in a venv with modern Python packages because those packages require click 8+.\r\n\r\nThis is easily fixed by updating requirements.\r\n\r\nSteps to reproduce the behavior:\r\n\r\n\r\npoetry shell\r\npoetry add black\r\npoetry add lookatme\r\n \r\n\r\n**Expected behavior**\r\nlookatme can be installed with black.\r\n\r\n**Actual behavior**\r\npoetry refuses to install lookatme because of the unnecessary requirement.\r\n\r\n**Additional context**\r\nPR inbound.'</li><li>'Quarto error when trying to render a simple .qmd file\n### System details\r\n\r\nVersion 2022.11.0-daily+87 (2022.11.0-daily+87)\r\nsysname\r\n"Darwin"\r\nrelease\r\n"21.5.0"\r\nversion\r\n"Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:37 PDT 2022; root:xnu8020.121.3~4/RELEASE_ARM64_T6000" \r\n\r\n\r\n### Steps to reproduce the problem\r\nthis is an example qmd file \r\n\r\n---\r\ntitle: "An Introduction to data science"\r\nformat: revealjs\r\n---\r\n\r\n\r\n\r\n---\r\n# How is the project is constructed\r\n\r\n1. Intro\r\n\r\n2. Literature review\r\n\r\n3. Hypothesis\r\n\r\n4. Methods: which tools did you use and how you used them (more on this in a bit)\r\n\r\n5. Main results\r\n\r\n6. Conclusions\r\n\r\n<img src="https://www.dropbox.com/s/06o9rixg2r5ocvz/ppic155.jpeg?raw=1" alt="" style="zoom:33%;" />\r\n\r\n---\r\n# Intro\r\n\r\nPresent the research topic and research hypothesis\r\n\r\n---\r\n# Literature review\r\n\r\nat least 5-6 papers you will summarize relating to your project\r\n\r\n\r\n \r\n\r\n### Describe the problem in detail\r\n\r\nwhen rendering I get errors, here is the error from the example file above\r\n\r\nERROR: YAMLError: end of the stream or a document separator is expected at line 10, column 12:\r\n 4. Methods: which tools did you use and ho ... \r\n ^\r\n \r\n\r\n\r\n### Describe the behavior you expected\r\n\r\nexpected for the file to render correctly \r\n\r\n- [ X] I have read the guide for submitting good bug reports.\r\n- [ X] I have installed the latest version of RStudio, and confirmed that the issue still persists.\r\n- [ X] If I am reporting an RStudio crash, I have included a diagnostics report.\r\n- [ X] I have done my best to include a minimal, self-contained set of instructions for consistently reproducing the issue.\r\n'</li><li>'Nested buttons do not handle enabled properly\nWith 2 nested buttons, if the outside one has the prop enabled={false} then then inside one does not receive touch events.\r\n\r\nTested on iOS, not sure about Android.\r\n\r\nSnack: https://snack.expo.io/H15lpZuFQ'</li></ul> |
| non-bug | <ul><li>'Migrating Woo Comparison table to Sparks\n### Description:\r\nWe need to migrate the current comparison table to Sparks and remove it from Otter.'</li><li>'[bug] Hard code 'movie_id' in neg_sampler.py\n\r\n\r\nUse item parameter instead of hard code 'movie_id'.'</li><li>"mk: omit transitive shared-library dependencies from linker command line\nRight now, binaries created directly within a build directory are linked slightly different compared to binaries created as depot archive. When created in the build directory, all shared-library dependencies including transitive shared-library dependencies of the target's used shared libraries end up at the linker command line. In contrast, when building a depot archive - where transitive shared libraries are not known because they are hidden behind the library's ABI - only the immediate dependencies appear at the linker command line. To improve the consistency, we should better link without transitive shared objects in both cases."</li></ul> |
## Uses

### Direct Use for Inference

First install the SetFit library:

```bash
pip install setfit
```

Then you can load this model and run inference.

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("setfit_model_id")
# Run inference on a single text
preds = model("Read the Docs\nImplement read the docs for documentation")
```
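
For batches of texts, the same model can be used on a list of inputs; assuming the `setfit` 1.x API, `predict` returns labels and `predict_proba` returns per-class probabilities from the LogisticRegression head. The inputs below are illustrative only.

```python
from setfit import SetFitModel

model = SetFitModel.from_pretrained("setfit_model_id")

# Illustrative issue titles; any text works as input.
texts = [
    "Nested buttons do not handle enabled properly",
    "Migrating Woo Comparison table to Sparks",
]

preds = model.predict(texts)         # e.g. ["bug", "non-bug"]
probas = model.predict_proba(texts)  # class probabilities, one row per input text
print(preds)
print(probas)
```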
## Training Details

### Training Set Metrics

| Training set | Min | Median   | Max   |
|:-------------|:----|:---------|:------|
| Word count   | 3   | 186.9402 | 10443 |

| Label   | Training Sample Count |
|:--------|:----------------------|
| bug     | 47                    |
| non-bug | 137                   |
### Training Hyperparameters
- batch_size: (16, 2)
- num_epochs: (1, 1)
- max_steps: -1
- sampling_strategy: oversampling
- num_iterations: 20
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- l2_weight: 0.01
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False
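
As a rough guide to how these values are used, the sketch below maps them onto the `setfit` 1.x `Trainer`/`TrainingArguments` API. The base Sentence Transformer and the tiny `train_dataset` are placeholders (the card does not name the base model or publish the training data); only the hyperparameter values come from the list above.

```python
from datasets import Dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, Trainer, TrainingArguments

# Placeholder few-shot dataset with "text" and "label" columns; the real
# training set (47 "bug" / 137 "non-bug" examples) is not published here.
train_dataset = Dataset.from_dict({
    "text": [
        "Nested buttons do not handle enabled properly",
        "Migrating Woo Comparison table to Sparks",
    ],
    "label": ["bug", "non-bug"],
})

# Placeholder base model; the card does not state which Sentence Transformer was fine-tuned.
model = SetFitModel.from_pretrained(
    "sentence-transformers/all-mpnet-base-v2",
    labels=["bug", "non-bug"],
)

args = TrainingArguments(
    batch_size=(16, 2),                 # (embedding phase, classifier phase)
    num_epochs=(1, 1),
    sampling_strategy="oversampling",
    num_iterations=20,
    body_learning_rate=(2e-05, 1e-05),
    head_learning_rate=0.01,
    loss=CosineSimilarityLoss,
    warmup_proportion=0.1,
    l2_weight=0.01,
    end_to_end=False,
    use_amp=False,
    seed=42,
    # distance_metric, margin, max_steps, eval_max_steps and
    # load_best_model_at_end keep their defaults, which match the list above.
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```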
### Training Results

| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0022 | 1    | 0.6468        | -               |
| 0.1087 | 50   | 0.2755        | -               |
| 0.2174 | 100  | 0.0535        | -               |
| 0.3261 | 150  | 0.0011        | -               |
| 0.4348 | 200  | 0.0004        | -               |
| 0.5435 | 250  | 0.0003        | -               |
| 0.6522 | 300  | 0.0003        | -               |
| 0.7609 | 350  | 0.0002        | -               |
| 0.8696 | 400  | 0.0002        | -               |
| 0.9783 | 450  | 0.0001        | -               |
### Framework Versions
- Python: 3.11.6
- SetFit: 1.1.0
- Sentence Transformers: 3.0.1
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu121
- Datasets: 2.21.0
- Tokenizers: 0.19.1
## Citation

### BibTeX

```bibtex
@article{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}
```