metadata

library_name: setfit
tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
metrics:
  - accuracy
widget:
  - text: >-
      Building TopazMarket  Prev AptosLabs  Founder AptosNames  All views posts
      and opinions shared are my own  Not financial advice 
  - text: >-
      Founder FrequenC__ an awardwinning marketing agency for the next internet
      Mentor speaker  cat mom  Tweets are my own opinion libertylabsxyz  
  - text: >-
      No1 ExchangeIndonesia  Pertama Terdaftar dan Teregulasi di Bappebti  CS
      Live Chat 247 Jakarta Capital Region
  - text: producer business and elsewhere  on leave  views my own la gran manzana
  - text: >-
      Founder GainForestNow   CoLead ETHBiodivX  CL ClimateChangeAI  PhD ETH
      prevGermanyHong_Kong_SAR_ChinaVietnam Son of Hoa refugees  hehim Zurich
      Switzerland
pipeline_tag: text-classification
inference: true
base_model: BAAI/bge-small-en-v1.5
model-index:
  - name: SetFit with BAAI/bge-small-en-v1.5
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: Unknown
          type: unknown
          split: test
        metrics:
          - type: accuracy
            value: 0.5565092989985694
            name: Accuracy

SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model for text classification. It uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model was trained using an efficient few-shot learning technique that involves two steps:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
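
The first step can be illustrated with a minimal sketch. It assumes that contrastive fine-tuning draws pairs of texts, targets 1.0 when the labels match and 0.0 when they differ, and scores each pair with the squared error between the cosine similarity of the two embeddings and that target. The helper names (`make_pairs`, `cosine_similarity_loss`) and the shortened bios are illustrative and are not taken from the SetFit code base.

```python
import math
import random


def make_pairs(texts, labels, pairs_per_text=2, seed=42):
    """Return (text_a, text_b, target) triples for contrastive training.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    """
    rng = random.Random(seed)
    indexed = list(zip(texts, labels))
    pairs = []
    for i, (text, label) in enumerate(indexed):
        same = [t for j, (t, l) in enumerate(indexed) if l == label and j != i]
        different = [t for t, l in indexed if l != label]
        for _ in range(pairs_per_text):
            if same:
                pairs.append((text, rng.choice(same), 1.0))
            if different:
                pairs.append((text, rng.choice(different), 0.0))
    return pairs


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def cosine_similarity_loss(u, v, target):
    """Squared error between the cosine similarity and the pair target."""
    return (cosine_similarity(u, v) - target) ** 2


# Toy data: bios shortened from the label examples in this card.
texts = [
    "Engineer at Inria scikitlearn developer",
    "Working paritytech on substrate",
    "Law professor at Cal BerkeleyLaw",
    "IP litigator first sale doctrine respecter",
]
labels = ["DEVELOPER", "DEVELOPER", "LAWYER", "LAWYER"]
pairs = make_pairs(texts, labels)
```

After fine-tuning, the second step embeds every training text with the tuned body and fits the classification head on those embeddings and their labels.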

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: BAAI/bge-small-en-v1.5
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 28

Model Sources

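Repository: SetFit on GitHub (https://github.com/huggingface/setfit)
Paper: Efficient Few-Shot Learning Without Prompts (https://arxiv.org/abs/2209.11055)
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts (https://huggingface.co/blog/setfit)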

Model Labels

Label	Examples
UNDETERMINED	'Professor Emeritus of Cognitive Sciences at the University of California Irvine Research Visual perception evolutionary psychology consciousness AI Irvine CA' 'Emeritus Professor of War Studies Kings College London just published Command The Politics of Military Operations from Korea to Ukraine UK Penguin US OUP ' 'XML apologist Erlang enthusiast Currently JVMs Performance stuff at Netflix Previously JVMs performative stuff at Twitter Hehim San Francisco California'
NFT_ARTIST	'Artist Web3 Marketing Advisor Educator Making history everyday Trapped in the blockchain' 'OwnYourAssets TokenGatedFile Access For CrossPlatformInteroperableGaming C5isComing CYBΞRVΞRSΞ' 'Pronounced Akossya artist Zurich'
ONCHAIN_ANALYST	'I write about onchain stuff fixer AleoHQ prev rabbithole_gg and plenty of DAOs youve heard of ' 'cofounder 3pochLabs onchain' 'onchain data farcer building mosaicdrops media CryptoSapiens_ OntologyNetwork OrangeProtocol banklessDAO s0 _buildspace s4 Mosaicverse'
BUSINESS_DEVELOPER	'Prev opensea TheBlock__ amazon ' 'Building HxroNetwork variable' 'Building something old CoFounder alongsidefi '
NFT_COLLECTOR	'Building glitchmarfa Collecting brightopps prev brtmoments ' 'My soul is a cat My two children rpcnftclub ChainFeedsxyz Bangkok' 'prev OpenSea NYC'
DEVELOPER	'Architect DoraHacks DoraFactory The everlasting hacker movement Menlo Park' 'Engineer at Inria scikitlearn developer supported by Python and Machine Learning Between Vannes Paris France' 'Working paritytech on substrate Views are my own I working mostly with rustlang nowadays '
TRADER	'Applied game theorist blog occasionally at formerly not a very serious person Scott Alexander ' 'Crypto Trading Bitcoin class of 2013 insilicotrading COO Banana Cabana' 'token maxi '
COMMUNITY_MANAGER	'chutzpah controlled chaos connoisseur arbitrum chinshilling chinchillin thoughts are my own Rio de Janeiro Brazil' 'commonsstack CoFounder tecmns Founding Steward KERNEL0x KB5 trustedseed tamaralens ' 'Community Admin at The Arbitrum Foundation Helping to scale Ethereum at Arbitrum Feed KOL Binance WEB3'
SECURITY_AUDITOR	'founder adjacentfi cofounder former auditor osec_io MEV on solana ' 'Security Researcher Googles Threat Analysis Group 0days all day Love all things bytes assembly and glitter sheher ' '採用マーケ得意仮想通貨エンジニア4社1社ホワイトハッカーとして月110万達成現在歯科衛生士の妻と事業開始実績年商1億超えのマーケ担当開始5ヶ月で6名見学開始2年で累計DH11名見学6名採用ハイライト要チェックブログに今までの有益投稿をまとめました岩手長野福岡ドバイ沖縄'
VENTURE_CAPITALIST	'Liquid Crypto Brevan Howard Prev dragonfly_xyz consensys Arena' 'maverick LA' 'Founder of SavvyBooks Degen dcv_capital Summoner ElasticDAO metafam Judge code4rena Contributor CantoPublic Nomadic'
INVESTOR	'Crypto Investor at Tephra Digital Ex Head of Research Grayscale DCGco FMR Head of Digital Asset Strategy Fundstrat New York NY' 'Capital Allocators New York NY' 'Director of Research Autonomous Technology Robotics ARKinvest Automation robotics energy storage alternative energy and space Disclosure New York NY'
ANGEL_INVESTOR	'larp LawliettesLab angel uvocapital ' 'Initiator inverternetwork I Angel Investor I ex Gitcoin ' 'VP Head of BD AleoHQ Mainnet Launch Soon Strategic Advisor VoxiesNFT Angel Investor rcsdao ExOP ExCoinbase Professionally CuriousOpinions My Own Manhattan NY'
EXECUTIVE	'Chief Strategy Marketing Officer of Liquidity Group Im also the cofounder of Hudson Rock RockHudsonRock a cybercrime intelligence company TelAviv' 'CEO Polymarket Ethereum since 14 I love music and collect art new york' 'CEO StartaleHQ Founder AstarNetwork All things for Web3 for billions Japanese Sota_Web3 Earth'
MARKETER	'Director General en Kayum comparador de seguros insurance PPC tech crypto f1 Mexico City Mexico' 'Insights about Web3 data economy and AI by oceanprotocol Currently in Marcom oceanprotocol ocean Ocean ' 'f加速 ethereum China internet culture history podcast growth marketing realmasknetwork prev newsbreakapp smartnews Zuzalu human Palo Alto USA'
DATA_SCIENTIST	'data uniswap prev theTIEIO go bears New York NY' 'engineering data science a16zcrypto ' 'LangChainAI previously robusthq kensho MLOps Generative AI sports analytics '
EDUCATOR	' London' 'MSc Immunology student Past cofounder prof director USF Center Applied Data Ethics math PhD math_rachelmastodonsocial sheher Brisbane Australia' 'Here to build shared intelligence listen learn share via community tokenengineering KERNEL0x OptimismGov publicgoods education valuesmatter CyberDyn0x tauranga teikaamaui'
INFLUENCER	'the destroyer Titan' 'Healthy life style healthier bags Cape Town South Africa' 'Beauty Brains Bitcoin Beauty in an anonymous world'
ADVISOR	'A decentralized onchain governance consultant Health Wealth RunItUp The only Alpha discord youll ever need to joingametheoryweb3 squanchland Profit Land' 'Design director Startup Advisor Midjourney Sharing learnings and prompts In my free time working on offscreenai Vancouver Canada' 'I help fix and grow crypto portfolios through premium research and strategies 1000 members Founder cshift_io Podcast benandbergs Join 10k Crypto Investors '
BLOGGER	'NOW Editor Forbes Writer Stripe HarvardBiz Back on Twitter after ignoring it for a decade I will try my best London' 'larp coindesk ' ' '
RESEARCHER	'Roblox Chief Scientist UWaterloo McGill Prof morgan3dbsky Known for NVIDIA Unity Graphics Codex Markdeep G3D Skylanders E Ink Titan Quest Williams Ontario Canada' 'Simple human Simple life I am trying to do good around me Empathy creativity inspiration ArigatōMerci For ever apprenti researcher Nulle part ailleurs Nowhere' 'Research community And we have our own NFT collection Telegram'
METAVERSE_ENTHUSIAST	'fluent speaker of http and color virtual world evangelist game developer painter writer cj5 driver San Diego' 'Blockchain Gaming Evangelist CritTheory Gaming CoFounder Earth' 'We are a peeple obsessed recruiting service collective Treating everyone like a DMs checked infrequently Metaverse'
NODE_OPERATOR	'into protocools and shitposting at nodeguardians ' ' CoFounder of onivalidator Filmmaker People Maxi Los Angeles CA' 'I attest to block 247 Hobby involves the occasional block proposal Have commercial agreements with the MEV trade association Members of Sync Committees Los Angeles'
LAWYER	'Law professor at Cal BerkeleyLaw Berkeley California' 'IP litigator first sale doctrine respecter schedule a disrespecter wife mom to the tiny boss likes design patents needlework yarn new hampshire' 'Lawyer FINTConsulting TechPolicy E4EProject upcoming GRC CybersecurityAnalyst ex InstituteGC Tweet law tech policy GRC Cybersecurity Decentralized'
DATA_ANALYST	'Llama pilot at and ' 'blockchain data opensea kqian on Dune my views are my own dyor nfa data only wagmi open sea' 'Blockchain analyst Cat and dog dad Taylor Swift fan Army veteran Pittsburgh PA'
MINER	'Blockchain bitcoin mining since 2011 analyst 35 years in IT UnixNetwork engineer fpgachip design exCIO Bitfury BitfuryGroup LNSegWit taproot California USA' 'Founder and CEO of Austin TX' '在币圈捡矿泉水瓶子的人 0xb38544ccf295d78b7ae7b2bae5dbebdb1f09910dcrossbell Member of 33daoweb3 Metaverse'
SHITCOINER	'Degen ETH and SOL lover ' 'VMPX mrjacklevin Draculaborg' 'gripto alt notapornfolder_ '
FINANCIAL_ANALYST	'Enrolled Agent Crypto Enthusiast Tax EXPERT StackingSats Chopping Tax Since 2016 NoSatoshiLeftBehind hodlmore payless crypto taxes Longmont CO' 'Politico financial services editor zwarmbrodtpoliticocom zacharywarmbrodtprotonmailcom Washington DC' 'Im just lookin for clues at the scene of the crime Sedona Arizona'
BUSINESS_ANALYST	'Biz Analyst by day web3crypto learner by nightweekend Optimistic about Crypto FanVajpayeeji NaMo M Andreessen E Musk C Dixon Balaji S web3SF Bay Area'

Evaluation

Metrics

Label	Accuracy
all	0.5565
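
For reference, accuracy here is taken to mean the fraction of test examples whose predicted label equals the expected label. The sketch below shows that computation on a handful of made-up predictions; the label lists are hypothetical and are not drawn from the test split.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one example is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical labels, for illustration only.
expected = ["DEVELOPER", "TRADER", "LAWYER", "DEVELOPER"]
predicted = ["DEVELOPER", "INVESTOR", "LAWYER", "UNDETERMINED"]
score = accuracy(predicted, expected)
```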

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("kasparas12/crypto_individual_infer_model_setfit")
# Run inference
preds = model("producer business and elsewhere  on leave  views my own la gran manzana")
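
To classify several bios at once and see which labels the model considers most likely, something along these lines may help. It is a sketch that assumes the head is the scikit-learn LogisticRegression described in this card, so that `model.model_head.classes_` names the columns returned by `predict_proba`. The helpers `top_k` and `classify_bios` are illustrative names, not part of SetFit.

```python
def top_k(labels, probabilities, k=3):
    """Pair each label with its probability and keep the k most likely."""
    ranked = sorted(zip(labels, probabilities), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def classify_bios(bios, model_id="kasparas12/crypto_individual_infer_model_setfit", k=3):
    """Return the k most likely (label, probability) pairs for each bio."""
    # Imported lazily so that top_k can be used without setfit installed.
    from setfit import SetFitModel

    model = SetFitModel.from_pretrained(model_id)
    # Assumption: a scikit-learn head, whose classes_ attribute gives the
    # label for each probability column.
    labels = [str(label) for label in model.model_head.classes_]
    probabilities = model.predict_proba(bios)  # one row per input text
    return [top_k(labels, [float(p) for p in row], k) for row in probabilities]
```

Calling `classify_bios(["token maxi", "Law professor at Cal BerkeleyLaw Berkeley California"])` downloads the model and returns one ranked list per bio. With an overall accuracy of about 0.56, inspecting the top few labels is often more informative than the single best one.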

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	1	13.3415	65
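
The card does not state how these word counts were computed. A plausible reading, assumed in the sketch below, is a whitespace split of each training text. Note that a true median of integer counts is either an integer or ends in .5, so the reported 13.3415 may in fact be a mean; the sketch reports both.

```python
import statistics


def word_count_summary(texts):
    """Summarise whitespace-delimited word counts over a list of texts."""
    counts = [len(text.split()) for text in texts]
    return {
        "min": min(counts),
        "median": statistics.median(counts),
        "mean": statistics.mean(counts),
        "max": max(counts),
    }


# Three bios taken from the label examples in this card.
bios = [
    "token maxi",
    "maverick LA",
    "Law professor at Cal BerkeleyLaw Berkeley California",
]
summary = word_count_summary(bios)
```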

Label	Training Sample Count
DEVELOPER	2111
DATA_SCIENTIST	93
DATA_ANALYST	25
NODE_OPERATOR	71
MINER	47
SECURITY_AUDITOR	352
INVESTOR	484
ANGEL_INVESTOR	160
VENTURE_CAPITALIST	941
TRADER	270
SHITCOINER	88
BUSINESS_DEVELOPER	917
BUSINESS_ANALYST	1
COMMUNITY_MANAGER	401
MARKETER	190
FINANCIAL_ANALYST	72
ADVISOR	150
RESEARCHER	691
ONCHAIN_ANALYST	45
EXECUTIVE	741
INFLUENCER	834
LAWYER	137
BLOGGER	198
NFT_COLLECTOR	335
NFT_ARTIST	598
EDUCATOR	281
METAVERSE_ENTHUSIAST	132
UNDETERMINED	2216
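
The counts above are heavily imbalanced, which is worth keeping in mind when reading the accuracy figure. The following sketch uses only the numbers from this table to compute the total, the share of the largest class, and the labels with fewer than 50 samples; the threshold of 50 is an arbitrary choice for illustration.

```python
TRAINING_COUNTS = {
    "DEVELOPER": 2111, "DATA_SCIENTIST": 93, "DATA_ANALYST": 25,
    "NODE_OPERATOR": 71, "MINER": 47, "SECURITY_AUDITOR": 352,
    "INVESTOR": 484, "ANGEL_INVESTOR": 160, "VENTURE_CAPITALIST": 941,
    "TRADER": 270, "SHITCOINER": 88, "BUSINESS_DEVELOPER": 917,
    "BUSINESS_ANALYST": 1, "COMMUNITY_MANAGER": 401, "MARKETER": 190,
    "FINANCIAL_ANALYST": 72, "ADVISOR": 150, "RESEARCHER": 691,
    "ONCHAIN_ANALYST": 45, "EXECUTIVE": 741, "INFLUENCER": 834,
    "LAWYER": 137, "BLOGGER": 198, "NFT_COLLECTOR": 335,
    "NFT_ARTIST": 598, "EDUCATOR": 281, "METAVERSE_ENTHUSIAST": 132,
    "UNDETERMINED": 2216,
}

total = sum(TRAINING_COUNTS.values())

# Largest class and its share of the training set.
largest_label, largest_count = max(TRAINING_COUNTS.items(), key=lambda item: item[1])
majority_share = largest_count / total

# Labels with very few examples (threshold chosen for illustration).
rare_labels = sorted(label for label, n in TRAINING_COUNTS.items() if n < 50)

print(f"{total} samples across {len(TRAINING_COUNTS)} labels")
print(f"largest: {largest_label} ({majority_share:.1%})")
print(f"fewer than 50 samples: {', '.join(rare_labels)}")
```

Labels such as BUSINESS_ANALYST, with a single training sample, are unlikely to be predicted reliably.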

Training Hyperparameters

batch_size: (64, 64)
num_epochs: (1, 1)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 20
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
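
As a rough guide to reproducing this configuration, the values above can be passed to `setfit.TrainingArguments`. The sketch below omits `loss` and `distance_metric`, on the assumption that CosineSimilarityLoss and cosine_distance are the library defaults in SetFit 1.0; `build_training_arguments` is an illustrative helper name.

```python
# Values copied from the hyperparameter list in this card.
HYPERPARAMETERS = {
    "batch_size": (64, 64),            # (embedding phase, classifier phase)
    "num_epochs": (1, 1),
    "max_steps": -1,
    "sampling_strategy": "oversampling",
    "num_iterations": 20,
    "body_learning_rate": (2e-05, 1e-05),
    "head_learning_rate": 0.01,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "seed": 42,
    "eval_max_steps": -1,
    "load_best_model_at_end": False,
}


def build_training_arguments():
    """Create TrainingArguments from the values above (requires setfit)."""
    # Imported lazily so the dictionary can be inspected without setfit.
    from setfit import TrainingArguments

    # loss and distance_metric are left at their defaults (an assumption,
    # see the note above).
    return TrainingArguments(**HYPERPARAMETERS)
```

The resulting object would then be handed to a `setfit.Trainer` together with the model and a labelled training dataset, neither of which is published with this card.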

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0001	1	0.2625	-
0.0064	50	0.2677	-
0.0127	100	0.2515	-
0.0191	150	0.2413	-
0.0254	200	0.2374	-
0.0318	250	0.2383	-
0.0381	300	0.222	-
0.0445	350	0.1972	-
0.0509	400	0.2268	-
0.0572	450	0.2333	-
0.0636	500	0.199	-
0.0699	550	0.2035	-
0.0763	600	0.1676	-
0.0827	650	0.1566	-
0.0890	700	0.1909	-
0.0954	750	0.189	-
0.1017	800	0.1872	-
0.1081	850	0.1576	-
0.1144	900	0.1382	-
0.1208	950	0.1603	-
0.1272	1000	0.155	-
0.1335	1050	0.1764	-
0.1399	1100	0.1506	-
0.1462	1150	0.1439	-
0.1526	1200	0.1581	-
0.1590	1250	0.1494	-
0.1653	1300	0.1622	-
0.1717	1350	0.1503	-
0.1780	1400	0.1094	-
0.1844	1450	0.1576	-
0.1907	1500	0.1194	-
0.1971	1550	0.1515	-
0.2035	1600	0.1662	-
0.2098	1650	0.1642	-
0.2162	1700	0.0943	-
0.2225	1750	0.1472	-
0.2289	1800	0.1622	-
0.2352	1850	0.0809	-
0.2416	1900	0.1623	-
0.2480	1950	0.1444	-
0.2543	2000	0.1304	-
0.2607	2050	0.1175	-
0.2670	2100	0.078	-
0.2734	2150	0.1189	-
0.2798	2200	0.141	-
0.2861	2250	0.1233	-
0.2925	2300	0.1446	-
0.2988	2350	0.1076	-
0.3052	2400	0.1016	-
0.3115	2450	0.0818	-
0.3179	2500	0.1384	-
0.3243	2550	0.1065	-
0.3306	2600	0.1029	-
0.3370	2650	0.1227	-
0.3433	2700	0.0982	-
0.3497	2750	0.0959	-
0.3561	2800	0.0851	-
0.3624	2850	0.1028	-
0.3688	2900	0.1136	-
0.3751	2950	0.1111	-
0.3815	3000	0.115	-
0.3878	3050	0.1183	-
0.3942	3100	0.0689	-
0.4006	3150	0.1004	-
0.4069	3200	0.1079	-
0.4133	3250	0.112	-
0.4196	3300	0.0758	-
0.4260	3350	0.09	-
0.4323	3400	0.1267	-
0.4387	3450	0.1024	-
0.4451	3500	0.1352	-
0.4514	3550	0.0681	-
0.4578	3600	0.0483	-
0.4641	3650	0.0937	-
0.4705	3700	0.0744	-
0.4769	3750	0.0926	-
0.4832	3800	0.0764	-
0.4896	3850	0.0814	-
0.4959	3900	0.108	-
0.5023	3950	0.0936	-
0.5086	4000	0.0687	-
0.5150	4050	0.0607	-
0.5214	4100	0.0829	-
0.5277	4150	0.0772	-
0.5341	4200	0.0309	-
0.5404	4250	0.0797	-
0.5468	4300	0.063	-
0.5532	4350	0.071	-
0.5595	4400	0.0667	-
0.5659	4450	0.121	-
0.5722	4500	0.0565	-
0.5786	4550	0.0915	-
0.5849	4600	0.0613	-
0.5913	4650	0.0479	-
0.5977	4700	0.0622	-
0.6040	4750	0.0687	-
0.6104	4800	0.0635	-
0.6167	4850	0.1233	-
0.6231	4900	0.0351	-
0.6295	4950	0.0717	-
0.6358	5000	0.0906	-
0.6422	5050	0.0712	-
0.6485	5100	0.1133	-
0.6549	5150	0.0757	-
0.6612	5200	0.0809	-
0.6676	5250	0.112	-
0.6740	5300	0.0893	-
0.6803	5350	0.0591	-
0.6867	5400	0.0872	-
0.6930	5450	0.0937	-
0.6994	5500	0.038	-
0.7057	5550	0.0793	-
0.7121	5600	0.0569	-
0.7185	5650	0.0861	-
0.7248	5700	0.1022	-
0.7312	5750	0.0759	-
0.7375	5800	0.0451	-
0.7439	5850	0.08	-
0.7503	5900	0.058	-
0.7566	5950	0.0423	-
0.7630	6000	0.043	-
0.7693	6050	0.109	-
0.7757	6100	0.072	-
0.7820	6150	0.0342	-
0.7884	6200	0.0833	-
0.7948	6250	0.0643	-
0.8011	6300	0.1069	-
0.8075	6350	0.0713	-
0.8138	6400	0.0807	-
0.8202	6450	0.0518	-
0.8266	6500	0.0796	-
0.8329	6550	0.0954	-
0.8393	6600	0.0709	-
0.8456	6650	0.0541	-
0.8520	6700	0.0503	-
0.8583	6750	0.0737	-
0.8647	6800	0.0931	-
0.8711	6850	0.0636	-
0.8774	6900	0.0579	-
0.8838	6950	0.1168	-
0.8901	7000	0.0751	-
0.8965	7050	0.0945	-
0.9028	7100	0.0396	-
0.9092	7150	0.0623	-
0.9156	7200	0.0641	-
0.9219	7250	0.0697	-
0.9283	7300	0.0675	-
0.9346	7350	0.0544	-
0.9410	7400	0.0803	-
0.9474	7450	0.0549	-
0.9537	7500	0.0612	-
0.9601	7550	0.0721	-
0.9664	7600	0.0692	-
0.9728	7650	0.07	-
0.9791	7700	0.0476	-
0.9855	7750	0.0673	-
0.9919	7800	0.0606	-
0.9982	7850	0.1001	-
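
The Epoch column can be cross-checked against the hyperparameters. Assuming that `num_iterations` yields that many positive and that many negative pairs per training sample, and that the 12,581 training samples are the sum of the per-label counts in this card, one epoch comes to 7,864 steps at batch size 64, which matches the fractions logged above.

```python
import math

num_samples = 12581   # sum of the per-label training sample counts
num_iterations = 20   # from the training hyperparameters
batch_size = 64       # embedding-phase batch size

# Assumption: num_iterations positive plus num_iterations negative pairs
# are generated for every training sample.
num_pairs = 2 * num_iterations * num_samples
steps_per_epoch = math.ceil(num_pairs / batch_size)


def epoch_at(step):
    """Epoch fraction reached after the given step, rounded as in the log."""
    return round(step / steps_per_epoch, 4)


for step in (1, 50, 3000, 7850):
    print(step, epoch_at(step))
```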

Framework Versions

Python: 3.9.16
SetFit: 1.0.3
Sentence Transformers: 2.2.2
Transformers: 4.21.3
PyTorch: 1.12.1+cu116
Datasets: 2.4.0
Tokenizers: 0.12.1
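
To compare a local environment against these versions, a small check with the standard library's `importlib.metadata` may be enough. The distribution names below are the usual PyPI names, and `compare_versions` is an illustrative helper rather than part of any of these libraries.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED_VERSIONS = {
    "setfit": "1.0.3",
    "sentence-transformers": "2.2.2",
    "transformers": "4.21.3",
    "torch": "1.12.1+cu116",
    "datasets": "2.4.0",
    "tokenizers": "0.12.1",
}


def compare_versions(expected=EXPECTED_VERSIONS):
    """Map each package to (expected, installed, matches).

    installed is None when the package is not present.
    """
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, matches) in compare_versions().items():
    status = "ok" if matches else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading; it only indicates where behaviour may differ from the environment used for training.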

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}