metadata
base_model: BAAI/bge-small-en-v1.5
datasets: []
language: []
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@5
- cosine_ndcg@10
- cosine_ndcg@100
- cosine_mrr@5
- cosine_mrr@10
- cosine_mrr@100
- cosine_map@100
- dot_accuracy@1
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@5
- dot_recall@10
- dot_ndcg@5
- dot_ndcg@10
- dot_ndcg@100
- dot_mrr@5
- dot_mrr@10
- dot_mrr@100
- dot_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:7033
- loss:GISTEmbedLoss
widget:
- source_sentence: >-
How will the performance of CBBOs be assessed in the third and fourth
year?
sentences:
- >-
' (iv) In third and fourth year, performance of the CBBOs will be
assessed based on - (a) issuing Share Certificates to each member in
third year, if any; (b) audited Financial Statements for FPOs for
second year and third year in due time and filing as required; (c) MoU
and vendor registration as per Business Plan with Marketing
Agencies/Institutional Buyers; (d) trading/uploading of produce in
e-NAM/other sources, if any; (e) second tranche equity grant to FPOs,
if any; and (f) second tranche of credit guarantee facility, if any .
(v) In the fifth year, performance of the CBBOs will be assessed based
on (a) audited Statements of accounts of FPO and filing it; (b) 100% of
agri-business plan executed and value chain developed; (c) revenue
model showing financial growth in last 3 consecutive years; (d)
detailed project completion Report; and (e) third tranche of credit
guarantee facility if any.'
- >-
'5. Tussock caterpillar, Notolopus (=Orygyia) postica , Lymantriidae,
Lepidoptera Symptom of damage: Defoliation. Nature of damage:
Caterpillars of the moth feed on the leaves. Egg: Eggs are laid in
clusters on the leaves and covered over with hairs. Larva: Caterpillars
are gregarious in young stages. Full grown larva possess a brown head, a
pair of long pencil of hairs projecting forwardly from the prothorax,
yellowish tuft of hairs arising from the lateral side of the first two
abdominal segment and long brownish hairs arising from 8 th abdominal
segment. Pupa: Pupation takes place in silken cocoon. Adult: Small
adult with yellowish brown wings. Female moth is wingless. Presence of
bipectinate antenna.'
- >-
'The Kisan Credit Card (KCC) scheme was introduced in 1998 for issue of
Kisan Credit Cards to farmers on the basis of their holdings for uniform
adoption by the banks so that farmers may use them to readily purchase
agriculture inputs such as seeds, fertilizers, pesticides etc. and draw
cash for their production needs. The scheme was further extended for the
investment credit requirement of farmers viz. allied and non-farm
activities in the year 2004. The scheme was further revisited in 2012 by
a working Group under the Chairmanship of Shri T. M. Bhasin, CMD, Indian
Bank with a view to simplify the scheme and facilitate issue of
Electronic Kisan Credit Cards. The scheme provides broad guidelines to
banks for operationalizing the KCC scheme. Implementing banks will have
the discretion to adopt the same to suit institution/location specific
requirements.'
- source_sentence: >-
How should State Government disclose ceiling premium rate for a crop in
the tender document?
sentences:
- >-
'However, in absence of insured area of last year/season for all
proposed crops or any crop, net sown area of that crop(s) will be
considered for calculation of weighted premium of district. This data
will be used for calculation of L1 only. 7.1.5 Bidding **shall be done
through e-tendering** and work order may be released within 2 weeks of
the opening of the Tender. 7.1.6 Depending on the risk profile,
historical loss cost and cost benefit analysis for the proposed crop(s)
in district(s) of any cluster, if the State Government feels that the
premium rate likely to be offered by bidding Insurance Companies would
be abnormally high, then the State Govt. can fix a ceiling on premium
rates for such crop(s) proposed to be included in the bidding evaluation
for the bidding period. However, recourse to this ceiling provision may
be done only in well justified cases and not as a general practice. The
ceiling premium rate may be derived based on statistical
evaluation/actuarial premium analysis, loss cost, historical payout etc
and name of such crop should be disclosed by State Govt. compulsorily
in the tender document. 7.1.7 In such cases where a ceiling has been
indicated, State government must call financial bids in two step
bidding or in two separate envelopes. First bid/envelop is for
disclosing the premium rate offered by each participating Insurance
Company for such ceiling crops and must be categorised under \'Ceiling
Premium Rate\' and 2nd bid envelop is for bidding of crop wise premium
rate for all crops included in tender. Time interval for opening of both
bid/envelop should be compulsorily mentioned in the bidding documents
and should preferably be on the same day. All participating Insurance
Companies have to submit the bid offer as per the procedure mentioned
above. 7.1.8 State Govt.'
- >-
'| Chapters |
Particulars | Page No.
|\n|---------------|------------------------------------------------------------|-------------|\n|
1 | Concept of Producer
Organisation | 1 |\n| 2
| Producer Organisation Registered as Cooperative Society |
15 |\n| 3 | Producer Organisation Registered as
Producer Company | 19 |\n| 4 | Producer
Organisation Registered as Non-Profit Society | 33 |\n|
5 | Producer Organisation Registered as
Trust | 36 |\n| 6 | Producer
Organisation Registered as Section 8 Company | 39 |\n|
7 | Business
Planning | 42 |\n|
8 | Financial
Management | 55 |\n|
9 | Funding
Arrangement | 60 |\n|
10 | Monitoring by the PO, POPI and Funding
Agencies | 80 |\n| Attachment
|
| |\n| 1 | Producer Company Act
provisions | |\n| 2 |
PRODUCE Fund Operational Guidelines | 106
|\n| 3 | SFAC Circular on Promoting / supporting Producer
Companies | 114 |\n| 4 | Case Study on Bilaspur
Model of PO | 125 |\n| 5 |
Indicative Framework of the process of forming a PO | 131
|\n| 6 |
References | 138
|\n| 7 | Memorandum of Agreement between NABARD and
POPI | 139 |\n| 8 | Memorandum of
Understanding between NABARD and RSA | 143 |\n|
9
|
| |\n| Abbreviations
|
| |\n|
|
| |\n| 146
|
| |\n|
|
| |\n|
|
| |\n|
|
| |\n|
|
| |\n|
|
| |\n|
|
| |\n|
|
| |\n|
|
| |\n|
|
| |\n|
|
| |\n|
|
| |\n|
|
| |'
- >-
'Agro-industries generate residues like husk, hull, shell, peel, testa,
skin, fibre, bran, linter, stone, seed, cob, prawn, head, frog legs, low
grade fish, leather waste, hair, bones, coir dust, saw dust, bamboo
dust, etc. which could be recycled or used efficiently through
agro-processing centres. In the last three decades, rice and sugarcane
residues have increased by 162 and 172 %, respectively. Their disposal
problem needs serious rethinking (Vimal, 1981). To some extent these
organic residues are used as soil conditioner, animal feed, fuel,
thatching and packing materials. These can also be put to new uses for
manufacture of various chemicals and specific products (like silica,
alcohol, tannins, glue, gelatine, wax, etc), feed, pharmaceuticals
(Iycogenin, antibiotics, vitamins, etc.), fertilizers, energy,
construction materials, paper pulp, handicraft materials etc. Residues
from fruit and vegetable industries, fish and marine industries and
slaughter o straw decrease their efficiency without pretreatment.'
- source_sentence: What is the purpose of using pectolytic enzymes in fruit juice processing?
sentences:
- >-
'Aggregating producers into collectives is one of the best mechanism to
improve access of small producers to investment, technology and market.
The facilitating agency should however keep the following factors in
view: a. Types of small scale producers in the target area, volume of
production, socioeconomic status, marketing arrangement b. Sufficient
demand in the existing market to absorb the additional production
without significantly affecting the prices c. Willingness of producers
to invest and adopt new technology, if identified, to increase
productivity or quality of produce d. Challenges in the market chain
and market environment e. Vulnerability of the market to shocks, trends
and seasonality f. Previous experience of collective action (of any
kind) in the community g. Key commodities, processed products or
semi-finished goods demanded by major retailers or processing companies
in the surrounding areas/districts h. Support from Government
Departments, NGOs, specialist support agencies and private companies
for enterprise development i. Incentives for members (also
disincentives) for joining the PO Keeping in view the sustainability
of a Producer Organisation, a flow chart of activities along with
timeline, verifiable indicators and risk factors is provided at
Attachment-5.'
- >-
'2. Sampling method to be adopted – Random Size of the card including
area for label and other details = 20 x 30 cmm = 600 cm 2 No. of Grids
= 30 Area of each grid = 7 x 2 cm = 14 cm 2 Total No. of eggs / cm 2 to
be accommodated = 96,000 – 1,08,000 Mean number of egg / cm 2 of the
card in the grid area excluding area for labeling = 200 – 250 Number of
counts/ card of size 20 x 30 cm to be taken No. of parasitised eggs = 12
• 3-4 days old parasitised egg card has to be selected for examination •
count the number of eggs and eggs parasitised in an area by 1 cm 2 •
Per card of size 20 x 30 cm count randomly in 12 positions • Repeat the
process for three different cards of same age • Express the per cent
parasitisation . The result should fall in range of 85-90 per cent.'
- >-
'Pectins are colloidal in nature, making solutions viscous and holding
other materials in suspension. Pectinesterase removes methyl groups from
the pectin molecules exposing carboxyl groups which in the presence of
bi- or multivalent cations, such as calcium, form insoluble salts which
can readily be removed. At the same time, polygalacturonase degrades
macromolecular pectin, causing reduction in viscosity and destroying the
protective colloidal action so that suspended materials will settle out.
Extensive use of pectolytic enzymes is made in processing fruit juices.
Addition of pectic enzymes to grapes or other fruits during crushing or
grinding results in increased yields of juice on pressing. Wine from
grapes so treated will usually clear faster when fermentation is
complete, and have better color.'
- source_sentence: What is the purpose of the PM-Kisan Portal?
sentences:
- >-
' 2) In case of cultivable land in the State of Nagaland which is
categorised as Jhum land as per definition under Section–2(7) of the
Nagaland Jhum Land Act, 1970 and which is owned by the
community/clan/village council/village chieftan, the identification of
beneficiaries under PM-Kisan scheme, shall be on the basis of
certification of land holding by the village council/chief/head of the
village, duly verified by the administrative head of the circle/sub
division and countersigned by the Deputy Commissioner of the District.
Provided that the name of the beneficiary is included in the state of
Nagaland's Agriculture Census of 2015-16. This proviso shall not be
applicable in cases of succession and family partition. The list of
such beneficiaries shall be subject to the exclusions under the
operational guidelines. 5.6 For identification of *bona fide*
beneficiary under PM-Kisan Scheme in Jharkhand, the following proposal
of Government of Jharkhand was considered and approved by the Committee:
\'The farmer will be asked to submit 'Vanshavali (Lineage)' linked to
the entry of land record comprising his \\ her ancestor's name giving a
chart of successor. This lineage chart shall be submitted before the
Gram Sabha for calling objections. After approval of the Gram Sabha, the
village level \\ circle level revenue officials will verify and
authenticate the Vanshawali and possession of holding. This
authenticated list of farmers after due verification of succession chart
shall be countersigned by the District level revenue authority. Farmers'
names, subject to the exclusion criterion after following the
aforementioned process, shall be uploaded on the PM-Kisan portal along
with other required details for this disbursement of benefit under the
scheme.\''
- >-
'Deep summer ploughing should be done for field preparation for
pulses,apply FYM and compost @ 8-10 t/ha and mix well. Sowing of Pigeon
pea should be done by the end of June in rows at the spacing of
60-90x15-20 cm. Seed rate should be 12-15 kg/ha Seed should be treated
with Carbendazim or Thirum @3g/kg seed Fertilizer dose should be
scheduled as per the soil test results. In general, 20-25 kg N, 45-50 kg
P and 15-20 kg K and 20 kg S should be given basal. Improved varieties
like Chhattisgarh Arhar -1, Chhattisgarh-2, Rajivlochan and TJT-501
should be sown. Soybean and other pulse crops should be sown with proper
drainage arrangement. For this seed should be treated with culture
before sowing. The quantity of Rhizobium culture@5g + PSB @ 10 g/kg seed
should be used for this seed treatment.'
- >-
'Union Territory. The details of farmers are being maintained by the
States / UTs either in electronic form or in manual register. To make
integrated platform available in the country to assist in benefit
transfer, a platform named **PM-Kisan Portal** available at URL
(**http://pmkisan.gov.in**) has been be launched for uploading the
farmers' details at a single web-portal in a uniform structure. 9.2 The
PM-Kisan Portal has been created with the following objectives - i) To
provide verified and single source of truth on farmers' details at the
portal. ii) Timely assistance to the farmers in farm operation iii)
A unified e-platform for transferring of cash benefits into farmer's
bank account through Public Financial Management System (PFMS)
integration. iv) Location wise availability of benefited farmers'
list. v) Ease of monitoring across the country on fund transaction
details.'
- source_sentence: >-
What should be done before sowing pigeonpea in fields where it is being
sown for the first time after a long time?
sentences:
- >-
'The sole arbitrator shall be appointed by NABARD in case of dispute
raised by NABARD, from the panel of three persons nominated by RSA.
Similarly, the sole arbitrator shall be appointed by RSA if dispute is
raised by RSA from the panel of three persons nominated by NABARD. The
language of the Arbitration shall be English and the arbitrator shall be
fluent in English. The arbitrator should be person of repute and
integrity and place of arbitration shall be Mumbai.\' 9. NABARD shall
have the right to enter into similar MoU/agreements with any other
RSA/Institution. 10. Any notice required to be given under this
MoU/Agreement shall be served on the party at their respective address
given below by hand delivery or by registered post :'
- >-
'y Firstly, Treat 1kg seeds with a mixture of 2 grams of thiram and one
gram of carbendazim or 4 grams of Trichoderma + 1 gram of carboxyne or
carbendazim. Before planting, treat each seed with a unique Rhizobium
culture of pigeon pea. A packet of this culture has to be sprinkled over
10 kg of seeds, then mix it lightly with hands, so that a light layer is
formed on the seeds. Sow this seed immediately. There is a possibility
of the death of culture organisms from strong sunlight. In fields where
pigeonpea is being sown for the first time after a long time, it must
use culture.'
- >-
'Organic farming is one of the several approaches found to meet the
objectives of sustainable agriculture. Organic farming is often
associated directly with, \'Sustainable farming.\' However, ‘organic
farming’ and ‘sustainable farming’, policy and ethics-wise are t wo
different terms. Many techniques used in organic farming like
inter-cropping, mulching and integration of crops and livestock are not
alien to various agriculture systems including the traditional
agriculture practiced in old countries like India. However, organic
farming is based on various laws and certification programmes, which
prohibit the use of almost all synthetic inputs, and health of the soil
is recognized as the central theme of the method. Organic products are
grown under a system of agriculture without the use of chemical
fertilizers and pesticides with an environmentally and socially
responsible approach. This is a method of farming that works at'
model-index:
- name: SentenceTransformer based on BAAI/bge-small-en-v1.5
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: val evaluator
type: val_evaluator
metrics:
- type: cosine_accuracy@1
value: 0.4680306905370844
name: Cosine Accuracy@1
- type: cosine_accuracy@5
value: 0.9092071611253197
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9603580562659847
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.4680306905370844
name: Cosine Precision@1
- type: cosine_precision@5
value: 0.18184143222506394
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09603580562659846
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.4680306905370844
name: Cosine Recall@1
- type: cosine_recall@5
value: 0.9092071611253197
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9603580562659847
name: Cosine Recall@10
- type: cosine_ndcg@5
value: 0.7079399335444153
name: Cosine Ndcg@5
- type: cosine_ndcg@10
value: 0.724527850349024
name: Cosine Ndcg@10
- type: cosine_ndcg@100
value: 0.732682390595948
name: Cosine Ndcg@100
- type: cosine_mrr@5
value: 0.6404518329070746
name: Cosine Mrr@5
- type: cosine_mrr@10
value: 0.6473191450493229
name: Cosine Mrr@10
- type: cosine_mrr@100
value: 0.649235332852707
name: Cosine Mrr@100
- type: cosine_map@100
value: 0.6492353328527082
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.46675191815856776
name: Dot Accuracy@1
- type: dot_accuracy@5
value: 0.9092071611253197
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 0.9603580562659847
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.46675191815856776
name: Dot Precision@1
- type: dot_precision@5
value: 0.18184143222506394
name: Dot Precision@5
- type: dot_precision@10
value: 0.09603580562659846
name: Dot Precision@10
- type: dot_recall@1
value: 0.46675191815856776
name: Dot Recall@1
- type: dot_recall@5
value: 0.9092071611253197
name: Dot Recall@5
- type: dot_recall@10
value: 0.9603580562659847
name: Dot Recall@10
- type: dot_ndcg@5
value: 0.7074679767075504
name: Dot Ndcg@5
- type: dot_ndcg@10
value: 0.7240558935121589
name: Dot Ndcg@10
- type: dot_ndcg@100
value: 0.7322104337590828
name: Dot Ndcg@100
- type: dot_mrr@5
value: 0.6398124467178163
name: Dot Mrr@5
- type: dot_mrr@10
value: 0.6466797588600646
name: Dot Mrr@10
- type: dot_mrr@100
value: 0.6485959466634487
name: Dot Mrr@100
- type: dot_map@100
value: 0.6485959466634499
name: Dot Map@100
SentenceTransformer based on BAAI/bge-small-en-v1.5
This is a sentence-transformers model finetuned from BAAI/bge-small-en-v1.5. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
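- Documentation: Sentence Transformers Documentation (https://sbert.net)
- Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
- Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)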
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
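The last two modules can be illustrated in a few lines. The sketch below is a pure-Python illustration, not the library's implementation, of what CLS pooling followed by Normalize does to the token vectors produced by the Transformer module. The helper names `cls_pool` and `l2_normalize` and the toy 4-dimensional vectors are assumptions made for illustration only.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: keep only the first token's vector of a sequence."""
    return token_embeddings[0]

def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy input: 3 tokens x 4 dimensions.
# The real model produces up to 512 tokens x 384 dimensions.
tokens = [
    [3.0, 4.0, 0.0, 0.0],  # first ([CLS]) token
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
]

sentence_embedding = l2_normalize(cls_pool(tokens))
embedding_norm = math.sqrt(sum(x * x for x in sentence_embedding))
print(sentence_embedding, embedding_norm)
```

Because the output has unit length, cosine similarity and dot product give the same score up to floating-point rounding, which is consistent with the near-identical cosine_* and dot_* values reported under Evaluation.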
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("SamagraDataGov/embedding_finetuned")
# Run inference
sentences = [
'What should be done before sowing pigeonpea in fields where it is being sown for the first time after a long time?',
"'y Firstly, Treat 1kg seeds with a mixture of 2 grams of thiram and one gram of carbendazim or 4 grams of Trichoderma + 1 gram of carboxyne or carbendazim. Before planting, treat each seed with a unique Rhizobium culture of pigeon pea. A packet of this culture has to be sprinkled over 10 kg of seeds, then mix it lightly with hands, so that a light layer is formed on the seeds. Sow this seed immediately. There is a possibility of the death of culture organisms from strong sunlight. In fields where pigeonpea is being sown for the first time after a long time, it must use culture.'",
"'Organic farming is one of the several approaches found to meet the objectives of sustainable agriculture. Organic farming is often associated directly with, \\'Sustainable farming.\\' However, ‘organic farming’ and ‘sustainable farming’, policy and ethics-wise are t wo different terms. Many techniques used in organic farming like inter-cropping, mulching and integration of crops and livestock are not alien to various agriculture systems including the traditional agriculture practiced in old countries like India. However, organic farming is based on various laws and certification programmes, which prohibit the use of almost all synthetic inputs, and health of the soil is recognized as the central theme of the method. Organic products are grown under a system of agriculture without the use of chemical fertilizers and pesticides with an environmentally and socially responsible approach. This is a method of farming that works at'",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
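For retrieval, the same embeddings can be used to rank passages against a query. The following is a minimal sketch in plain Python, assuming the vectors come from `model.encode` and are already unit length (as the Normalize module in the architecture suggests). `rank_passages` is a hypothetical helper, and the two-dimensional vectors are toy stand-ins for the 384-dimensional output.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))

def rank_passages(query_embedding, passage_embeddings, k=5):
    """Return up to k (passage_index, score) pairs, best score first."""
    scored = [(dot(query_embedding, p), i) for i, p in enumerate(passage_embeddings)]
    # Sort by descending score; break ties by the lower passage index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(index, score) for score, index in scored[:k]]

# Toy unit-length vectors standing in for model.encode(...) output.
query = [1.0, 0.0]
passages = [
    [0.6, 0.8],  # partially aligned with the query
    [1.0, 0.0],  # identical direction to the query
    [0.0, 1.0],  # orthogonal to the query
]

ranking = rank_passages(query, passages, k=2)
print(ranking)
```

In practice the first string in `sentences` above would play the role of the query and the remaining strings the candidate passages.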
Evaluation
Metrics
Information Retrieval
- Dataset: val_evaluator
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.468 |
cosine_accuracy@5 | 0.9092 |
cosine_accuracy@10 | 0.9604 |
cosine_precision@1 | 0.468 |
cosine_precision@5 | 0.1818 |
cosine_precision@10 | 0.096 |
cosine_recall@1 | 0.468 |
cosine_recall@5 | 0.9092 |
cosine_recall@10 | 0.9604 |
cosine_ndcg@5 | 0.7079 |
cosine_ndcg@10 | 0.7245 |
cosine_ndcg@100 | 0.7327 |
cosine_mrr@5 | 0.6405 |
cosine_mrr@10 | 0.6473 |
cosine_mrr@100 | 0.6492 |
cosine_map@100 | 0.6492 |
dot_accuracy@1 | 0.4668 |
dot_accuracy@5 | 0.9092 |
dot_accuracy@10 | 0.9604 |
dot_precision@1 | 0.4668 |
dot_precision@5 | 0.1818 |
dot_precision@10 | 0.096 |
dot_recall@1 | 0.4668 |
dot_recall@5 | 0.9092 |
dot_recall@10 | 0.9604 |
dot_ndcg@5 | 0.7075 |
dot_ndcg@10 | 0.7241 |
dot_ndcg@100 | 0.7322 |
dot_mrr@5 | 0.6398 |
dot_mrr@10 | 0.6467 |
dot_mrr@100 | 0.6486 |
dot_map@100 | 0.6486 |
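The table is consistent with each query having exactly one relevant passage, although the card does not state this: recall@k equals accuracy@k, precision@k equals accuracy@k divided by k (for example 0.9092 / 5 ≈ 0.1818), and MAP@100 matches MRR@100. Under that assumption, the metrics reduce to simple functions of the rank at which the relevant passage was retrieved. The sketch below illustrates this; `ir_metrics` and the toy ranks are hypothetical and are not the evaluator's own code.

```python
import math

def ir_metrics(ranks, k):
    """Compute @k metrics assuming ONE relevant passage per query.

    ranks: 1-based rank of the relevant passage for each query,
           or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        # At most one of the k retrieved items can be relevant.
        "precision": accuracy / k,
        # The single relevant item is either found or not.
        "recall": accuracy,
        # Reciprocal rank, counted as 0 when the item is outside the top k.
        "mrr": sum(1.0 / r for r, h in zip(ranks, hits) if h) / n,
        # With one relevant item the ideal DCG is 1, so NDCG equals DCG.
        "ndcg": sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }

# Toy example: four queries; the third never retrieves its passage.
metrics = ir_metrics([1, 3, None, 2], k=5)
print(metrics)
```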
Training Details
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- gradient_accumulation_steps: 4
- learning_rate: 1e-05
- weight_decay: 0.01
- num_train_epochs: 1.0
- warmup_ratio: 0.1
- load_best_model_at_end: True
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 1e-05
- weight_decay: 0.01
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1.0
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
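The batch size, accumulation, and warmup settings together determine the optimizer schedule. The sketch below works through that arithmetic. It assumes a single training device (the card does not state the device count) and takes the dataset size from the `dataset_size:7033` tag. `linear_lr` is a hypothetical helper approximating a linear warmup and decay schedule, not the trainer's own code.

```python
import math

dataset_size = 7033        # from the dataset_size tag
per_device_batch = 8       # per_device_train_batch_size
grad_accum = 4             # gradient_accumulation_steps
num_devices = 1            # ASSUMPTION: not stated in the card
epochs = 1                 # num_train_epochs
warmup_ratio = 0.1
peak_lr = 1e-05            # learning_rate

# Samples consumed per optimizer update.
effective_batch = per_device_batch * grad_accum * num_devices

# Batches per epoch, then optimizer updates after accumulation.
batches_per_epoch = math.ceil(dataset_size / (per_device_batch * num_devices))
optimizer_steps = max(batches_per_epoch // grad_accum, 1) * epochs
warmup_steps = math.ceil(optimizer_steps * warmup_ratio)

def linear_lr(step):
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = optimizer_steps - step
    return peak_lr * max(0.0, remaining / (optimizer_steps - warmup_steps))

print(effective_batch, batches_per_epoch, optimizer_steps, warmup_steps)
print(linear_lr(11), linear_lr(warmup_steps), linear_lr(optimizer_steps))
```

Under these assumptions the run has 220 optimizer steps, which agrees with the final step recorded in the training logs, and step 15 corresponds to 15 / 220 ≈ 0.0682 of an epoch.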
Training Logs
Epoch | Step | Training Loss | Validation Loss | val_evaluator_dot_map@100 |
---|---|---|---|---|
0.0682 | 15 | 0.6463 | 0.3498 | 0.6152 |
0.1364 | 30 | 0.3071 | 0.1975 | 0.6212 |
0.2045 | 45 | 0.2023 | 0.1576 | 0.6248 |
0.2727 | 60 | 0.1457 | 0.1357 | 0.6321 |
0.3409 | 75 | 0.2456 | 0.1228 | 0.6370 |
0.4091 | 90 | 0.1407 | 0.1130 | 0.6365 |
0.4773 | 105 | 0.1727 | 0.1042 | 0.6393 |
0.5455 | 120 | 0.1311 | 0.0975 | 0.6428 |
0.6136 | 135 | 0.13 | 0.0910 | 0.6433 |
0.6818 | 150 | 0.0919 | 0.0872 | 0.6466 |
0.75 | 165 | 0.1587 | 0.0851 | 0.6490 |
0.8182 | 180 | 0.1098 | 0.0834 | 0.6481 |
0.8864 | 195 | 0.1013 | 0.0824 | 0.6461 |
0.9545 | 210 | 0.1144 | 0.082 | 0.6486 |
1.0 | 220 | - | 0.0820 | 0.6486 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.43.4
- PyTorch: 2.4.1+cu121
- Accelerate: 0.33.0
- Datasets: 2.21.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
GISTEmbedLoss
@misc{solatorio2024gistembed,
title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning},
author={Aivin V. Solatorio},
year={2024},
eprint={2402.16829},
archivePrefix={arXiv},
primaryClass={cs.LG}
}