SetFit with BAAI/bge-m3

This is a SetFit model for text classification. It uses BAAI/bge-m3 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning.
Training a classification head with features from the fine-tuned Sentence Transformer.
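
The first step can be illustrated with a minimal sketch of how labeled few-shot examples become contrastive training pairs. This is a simplified illustration, not SetFit's actual sampler: the helper name `build_contrastive_pairs` is hypothetical, and the 1.0 / 0.0 targets assume the pairs feed a cosine-similarity style loss such as the CosineSimilarityLoss listed under Training Hyperparameters.

```python
from itertools import combinations


def build_contrastive_pairs(texts, labels):
    """Return (text_a, text_b, target) triples.

    target is 1.0 when both texts share a label (positive pair)
    and 0.0 when their labels differ (negative pair).
    Hypothetical helper for illustration only.
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Three example queries taken from the Model Labels table of this card.
texts = [
    "How can one organize a running event in Belgium?",
    "What changes can be made to a channel header?",
    "Who is responsible for managing guarantees and prolongations?",
]
labels = ["lexical", "semantic", "semantic"]

pairs = build_contrastive_pairs(texts, labels)
for text_a, text_b, target in pairs:
    print(target, "|", text_a, "|", text_b)
```

The embedding model is fine-tuned so that positive pairs end up close together and negative pairs far apart; the classification head is then fit on the resulting embeddings.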

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: BAAI/bge-m3
Classification head: a LogisticRegression instance
Maximum Sequence Length: 8192 tokens
Number of Classes: 2

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
lexical	"How does Happeo's search AI work to provide answers to user queries?", "What are the primary areas of focus in the domain of Data Science and Analysis?", "How can one organize a running event in Belgium?"
semantic	"What changes can be made to a channel header?", "How can hardware capabilities impact the accuracy of motion and object detections?", "Who is responsible for managing guarantees and prolongations?"

Evaluation

Metrics

Label	Accuracy
all	0.8947

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("yaniseuranova/setfit-rag-hybrid-search-query-router")
# Run inference
preds = model("What is the purpose of setting up a CUPS on a server?")
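
The repository name suggests the predicted label is meant to route a query to one of two retrieval strategies. The sketch below shows one way that could look. Everything in it is an assumption for illustration: `route_query`, `stub_classifier`, `keyword_search`, and `vector_search` are hypothetical names, and the stub stands in for the real model so the example runs without downloading anything. It also assumes the classifier returns the label strings "lexical" and "semantic" shown in the Model Labels table.

```python
from typing import Callable, Dict, List


def route_query(
    query: str,
    classify: Callable[[str], str],
    backends: Dict[str, Callable[[str], List[str]]],
) -> List[str]:
    """Send the query to the backend registered for its predicted label."""
    label = str(classify(query))
    if label not in backends:
        raise ValueError(f"No backend registered for label {label!r}")
    return backends[label](query)


# Hypothetical stand-ins; a real setup would pass the loaded SetFit model
# (for example `lambda q: model(q)`) and real search functions instead.
def stub_classifier(query: str) -> str:
    return "lexical" if "CUPS" in query else "semantic"


def keyword_search(query: str) -> List[str]:
    return [f"keyword:{query}"]


def vector_search(query: str) -> List[str]:
    return [f"vector:{query}"]


backends = {"lexical": keyword_search, "semantic": vector_search}
results = route_query(
    "What is the purpose of setting up a CUPS on a server?",
    stub_classifier,
    backends,
)
print(results)
```

Keeping the classifier and the backends as plain callables makes it easy to swap the stub for the real model and to add a fallback backend later.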

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	4	13.7407	28

Label	Training Sample Count
lexical	44
semantic	118

Training Hyperparameters

batch_size: (8, 8)
num_epochs: (3, 3)
max_steps: -1
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: True
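
As a rough cross-check of these settings, the sketch below estimates how many contrastive pairs and optimizer steps one epoch contains. It rests on two assumptions that are not stated in this card: pairs are drawn from all unordered combinations of training samples, including each sample paired with itself, and oversampling pads the smaller of the positive and negative pools to the size of the larger one. The function name `pairs_with_oversampling` is hypothetical. Under those assumptions the result agrees with the 2003 steps per epoch shown in the Training Results table.

```python
import math


def pairs_with_oversampling(label_counts):
    """Estimate pairs per epoch under the assumptions stated above."""
    counts = list(label_counts.values())
    n = sum(counts)
    # Unordered combinations with self-pairs: n * (n + 1) / 2.
    total = n * (n + 1) // 2
    # Positive pairs: both samples share a label.
    positive = sum(c * (c + 1) // 2 for c in counts)
    # Negative pairs: samples come from different labels.
    negative = total - positive
    # Oversampling: the smaller pool is padded to match the larger one.
    return 2 * max(positive, negative)


label_counts = {"lexical": 44, "semantic": 118}  # from Training Set Metrics
batch_size = 8                                   # embedding-phase batch size

num_pairs = pairs_with_oversampling(label_counts)
steps_per_epoch = math.ceil(num_pairs / batch_size)
print(num_pairs, steps_per_epoch)  # 16022 2003
```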

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0005	1	0.257	-
0.0250	50	0.1944	-
0.0499	100	0.2383	-
0.0749	150	0.1279	-
0.0999	200	0.0033	-
0.1248	250	0.0021	-
0.1498	300	0.0012	-
0.1747	350	0.0008	-
0.1997	400	0.0004	-
0.2247	450	0.0006	-
0.2496	500	0.0005	-
0.2746	550	0.0003	-
0.2996	600	0.0003	-
0.3245	650	0.0003	-
0.3495	700	0.0004	-
0.3744	750	0.0005	-
0.3994	800	0.0003	-
0.4244	850	0.0002	-
0.4493	900	0.0002	-
0.4743	950	0.0002	-
0.4993	1000	0.0001	-
0.5242	1050	0.0001	-
0.5492	1100	0.0001	-
0.5741	1150	0.0002	-
0.5991	1200	0.0001	-
0.6241	1250	0.0003	-
0.6490	1300	0.0002	-
0.6740	1350	0.0001	-
0.6990	1400	0.0003	-
0.7239	1450	0.0001	-
0.7489	1500	0.0002	-
0.7738	1550	0.0001	-
0.7988	1600	0.0002	-
0.8238	1650	0.0002	-
0.8487	1700	0.0002	-
0.8737	1750	0.0002	-
0.8987	1800	0.0003	-
0.9236	1850	0.0001	-
0.9486	1900	0.0001	-
0.9735	1950	0.0001	-
0.9985	2000	0.0001	-
1.0	2003	-	0.1735
1.0235	2050	0.0001	-
1.0484	2100	0.0001	-
1.0734	2150	0.0001	-
1.0984	2200	0.0	-
1.1233	2250	0.0001	-
1.1483	2300	0.0001	-
1.1732	2350	0.0001	-
1.1982	2400	0.0002	-
1.2232	2450	0.0001	-
1.2481	2500	0.0	-
1.2731	2550	0.0001	-
1.2981	2600	0.0001	-
1.3230	2650	0.0	-
1.3480	2700	0.0001	-
1.3729	2750	0.0001	-
1.3979	2800	0.0001	-
1.4229	2850	0.0	-
1.4478	2900	0.0001	-
1.4728	2950	0.0001	-
1.4978	3000	0.0001	-
1.5227	3050	0.0001	-
1.5477	3100	0.0	-
1.5726	3150	0.0	-
1.5976	3200	0.0001	-
1.6226	3250	0.0001	-
1.6475	3300	0.0001	-
1.6725	3350	0.0001	-
1.6975	3400	0.0001	-
1.7224	3450	0.0	-
1.7474	3500	0.0002	-
1.7723	3550	0.0001	-
1.7973	3600	0.0	-
1.8223	3650	0.0	-
1.8472	3700	0.0001	-
1.8722	3750	0.0	-
1.8972	3800	0.0001	-
1.9221	3850	0.0	-
1.9471	3900	0.0	-
1.9720	3950	0.0001	-
1.9970	4000	0.0	-
2.0	4006	-	0.2593
2.0220	4050	0.0001	-
2.0469	4100	0.0001	-
2.0719	4150	0.0	-
2.0969	4200	0.0001	-
2.1218	4250	0.0	-
2.1468	4300	0.0001	-
2.1717	4350	0.0001	-
2.1967	4400	0.0001	-
2.2217	4450	0.0001	-
2.2466	4500	0.0001	-
2.2716	4550	0.0	-
2.2966	4600	0.0	-
2.3215	4650	0.0	-
2.3465	4700	0.0001	-
2.3714	4750	0.0001	-
2.3964	4800	0.0002	-
2.4214	4850	0.0001	-
2.4463	4900	0.0001	-
2.4713	4950	0.0	-
2.4963	5000	0.0001	-
2.5212	5050	0.0001	-
2.5462	5100	0.0	-
2.5711	5150	0.0001	-
2.5961	5200	0.0	-
2.6211	5250	0.0	-
2.6460	5300	0.0	-
2.6710	5350	0.0	-
2.6960	5400	0.0	-
2.7209	5450	0.0	-
2.7459	5500	0.0	-
2.7708	5550	0.0	-
2.7958	5600	0.0001	-
2.8208	5650	0.0	-
2.8457	5700	0.0	-
2.8707	5750	0.0	-
2.8957	5800	0.0	-
2.9206	5850	0.0	-
2.9456	5900	0.0001	-
2.9705	5950	0.0	-
2.9955	6000	0.0	-
3.0	6009	-	0.2738

The bold row denotes the saved checkpoint.
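
With load_best_model_at_end set to True, one plausible reading is that the saved checkpoint is the evaluation step with the lowest validation loss. That selection criterion is an assumption here, not something this card states. Applied to the three validation losses reported in the table above, it can be sketched as:

```python
# Validation losses copied from the Training Results table (step -> loss).
validation_loss_by_step = {
    2003: 0.1735,  # end of epoch 1
    4006: 0.2593,  # end of epoch 2
    6009: 0.2738,  # end of epoch 3
}

# Assumption: "best" means lowest validation loss.
best_step = min(validation_loss_by_step, key=validation_loss_by_step.get)
print(best_step, validation_loss_by_step[best_step])
```

Validation loss rises after the first epoch while training loss stays near zero, which would be consistent with the embedding model starting to overfit the small training set.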

Framework Versions

Python: 3.10.12
SetFit: 1.0.3
Sentence Transformers: 2.6.1
Transformers: 4.39.0
PyTorch: 2.3.1+cu121
Datasets: 2.18.0
Tokenizers: 0.15.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
