Training in progress, step 428, checkpoint
e1f853e verified
metadata
base_model: microsoft/deberta-v3-small
datasets:
  - jinaai/negation-dataset-v2
  - tals/vitaminc
  - allenai/scitail
  - allenai/sciq
  - allenai/qasc
  - sentence-transformers/msmarco-msmarco-distilbert-base-v3
  - sentence-transformers/natural-questions
  - sentence-transformers/trivia-qa
  - sentence-transformers/gooaq
  - google-research-datasets/paws
language:
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
  - cosine_accuracy
  - dot_accuracy
  - manhattan_accuracy
  - euclidean_accuracy
  - max_accuracy
  - cosine_accuracy_threshold
  - cosine_f1
  - cosine_f1_threshold
  - cosine_precision
  - cosine_recall
  - cosine_ap
  - dot_accuracy_threshold
  - dot_f1
  - dot_f1_threshold
  - dot_precision
  - dot_recall
  - dot_ap
  - manhattan_accuracy_threshold
  - manhattan_f1
  - manhattan_f1_threshold
  - manhattan_precision
  - manhattan_recall
  - manhattan_ap
  - euclidean_accuracy_threshold
  - euclidean_f1
  - euclidean_f1_threshold
  - euclidean_precision
  - euclidean_recall
  - euclidean_ap
  - max_accuracy_threshold
  - max_f1
  - max_f1_threshold
  - max_precision
  - max_recall
  - max_ap
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:226010
  - loss:CachedGISTEmbedLoss
widget:
  - source_sentence: what is the common lifespan of a star
    sentences:
      - >-
        Mites can leave bites that look like they came from bed bugs (see these
        pictures of bed bug bites), but not all mites are the same, so let me
        quickly explain. In fact, there are almost 46,000 species of mites, but
        only a few bite humans! They are the Northern Fowl Mite, Tropical Rat
        Mite, and Itch or Scabies Mite.
      - >-
        Cost of Cardiac Catheterization Procedures Any type of cardiac care in
        the United States is growing increasingly pricey. A cardiac
        catheterization procedure, depending on location, may range in price
        between $2,400 and $4,000 in the United States.
      - "Lifespans for main sequence stars have a vast range. Whilst our Sun will spend 10 billion years on the main sequence, a high-mass, ten solar-mass (10 M Sun) star will only last 20 million years (2.0 × 10^7 years) on the main sequence. A star with only half the mass of Sun can spend 80 billion years on the main sequence. Stars are composed almost entirely of hydrogen and helium. A star such as our Sun is about 73% hydrogen by mass and 25% helium. If determined by number of nuclei then it is 92% hydrogen and 7.8% helium. The remaining 2% by mass or 0.2% by number is all the heavier elements."
  - source_sentence: >-
      More than 169 countries had reported over 212,000 COVID-19 cases before
      March 19 , 2020 .
    sentences:
      - >-
        As of 23 March , more than 341,000 cases of COVID-19 have been reported
        in 192 countries and territories , resulting in more than 14,700 deaths
        and 99,000 recoveries .
      - >-
        As of 21 March , more than 278,000 cases of COVID-19 have been reported
        in over 186 countries and territories , resulting in more than 11,500
        deaths and 92,000 recoveries. The virus seems to mostly spread between
        people via respiratory droplets .
      - >-
        As of 18 March 2020 , more than 212,000 cases of COVID-19 have been
        reported in at least 170 countries and territories , with major
        outbreaks in China , Iran and the European Union .
  - source_sentence: >-
      The memory walk saw participants gather at Bents Park in South Shields on
      Saturday and travel 7km (4.3miles) along a coastal route.

      The Alzheimer's Society said it was the biggest event of its kind it had
      staged in the north-east of England.

      Organisers said the number of participants almost doubled that of last
      year's event.

      About 35,000 people in the region have dementia, according to the charity.
    sentences:
      - >-
        More than 4,500 people have taken part in a charity event raising funds
        for the fight against Alzheimer's disease.
      - >-
        Gareth Southgate should be appointed England manager "as soon as
        possible" and be given the same contract as predecessor Sam Allardyce,
        says former Three Lions defender Danny Mills.
      - >-
        The owners of a Gwynedd skip hire business have been jailed for
        illegally storing waste.
  - source_sentence: >-
      Electrical energy can be converted into kinetic energy and heat energy by
      an electric motor.
    sentences:
      - >-
        Solution is the term for a homogeneous mixture of two or more
        substances.
      - >-
        Solution is the term for a homogeneous mixture of two or more
        substances.
      - Electric motors transform electrical energy into kinetic energy.
  - source_sentence: who did ben assault in home and away
    sentences:
      - >-
        List of Home and Away characters (2017) Ben and Maggie learn that Ziggy
        has been dating Brody in secret and they disapprove of the relationship.
        Ziggy leaves the house and Ben tells her not to come back. He apologises
        to her the next day, but does not accept her relationship with Brody, so
        Ziggy refuses to come home. Brody later breaks up with her. Ben begins
        making surf boards to sell at the pier. Ben finds Coco convulsing in the
        garden and he and Maggie learn she has bulimia. Ziggy later leaves home.
        Days later, Ben sees her with Brody, who is attempting to bring her
        home, and punches him in the face. Olivia Fraser Richards (Raechelle
        Banno) tells Sergeant Phillip McCarthy (Nicholas Cassim) and Ben is
        arrested. McCarthy and Kat Chapman (Pia Miller) learns he has a criminal
        record for assaulting his brother. Ben insults Kat, which leads to him
        being charged. Maggie secures a loan to get him out on bail. Maggie's
        mother, Diana (Sarah Chadwick) came to Summer Bay to visit the family
        and Diana told Ben and Maggie that she is the one who bailed Ben out of
        jail.
      - >-
        Stone (unit) The name "stone" derives from the use of stones for
        weights, a practice that dates back into antiquity. The Biblical law
        against the carrying of "diverse weights, a large and a small"[7] is
        more literally translated as "you shall not carry a stone and a stone
        (אבן ואבן), a large and a small". There was no standardised "stone" in
        the ancient Jewish world,[8] but in Roman times stone weights were
        crafted to multiples of the Roman pound.[9] Such weights varied in
        quality: the Yale Medical Library holds 10 and 50-pound examples of
        polished serpentine,[10] while a 40-pound example at the Eschborn Museum
        (see right) is made of sandstone.[11]
      - >-
        Bad Things (Machine Gun Kelly and Camila Cabello song) "Bad Things" is a
        song by American rapper Machine Gun Kelly and Cuban-American singer
        Camila Cabello. The song was released on October 14, 2016 and was
        produced by The Futuristics. Its music video was directed by Hannah Lux
        Davis and premiered on December 1, 2016. The song features an
        interpolation of Fastball's 1999 single "Out of My Head". The single
        peaked at number four on the US Billboard Hot 100.
model-index:
  - name: SentenceTransformer based on microsoft/deberta-v3-small
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8404451477820003
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8859238616569335
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.8739146310077538
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.8785835525192047
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.873696402540065
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.8767156244780591
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.8376741383364052
            name: Pearson Dot
          - type: spearman_dot
            value: 0.8626544264654313
            name: Spearman Dot
          - type: pearson_max
            value: 0.8739146310077538
            name: Pearson Max
          - type: spearman_max
            value: 0.8859238616569335
            name: Spearman Max
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: NLI v2
          type: NLI-v2
        metrics:
          - type: cosine_accuracy
            value: 1
            name: Cosine Accuracy
          - type: dot_accuracy
            value: 0
            name: Dot Accuracy
          - type: manhattan_accuracy
            value: 1
            name: Manhattan Accuracy
          - type: euclidean_accuracy
            value: 1
            name: Euclidean Accuracy
          - type: max_accuracy
            value: 1
            name: Max Accuracy
      - task:
          type: binary-classification
          name: Binary Classification
        dataset:
          name: VitaminC
          type: VitaminC
        metrics:
          - type: cosine_accuracy
            value: 0.578125
            name: Cosine Accuracy
          - type: cosine_accuracy_threshold
            value: 0.8052636384963989
            name: Cosine Accuracy Threshold
          - type: cosine_f1
            value: 0.6577540106951871
            name: Cosine F1
          - type: cosine_f1_threshold
            value: 0.3108493387699127
            name: Cosine F1 Threshold
          - type: cosine_precision
            value: 0.4900398406374502
            name: Cosine Precision
          - type: cosine_recall
            value: 1
            name: Cosine Recall
          - type: cosine_ap
            value: 0.5479388360307975
            name: Cosine Ap
          - type: dot_accuracy
            value: 0.58203125
            name: Dot Accuracy
          - type: dot_accuracy_threshold
            value: 318.633056640625
            name: Dot Accuracy Threshold
          - type: dot_f1
            value: 0.6577540106951871
            name: Dot F1
          - type: dot_f1_threshold
            value: 125.5129165649414
            name: Dot F1 Threshold
          - type: dot_precision
            value: 0.4900398406374502
            name: Dot Precision
          - type: dot_recall
            value: 1
            name: Dot Recall
          - type: dot_ap
            value: 0.5533499611019033
            name: Dot Ap
          - type: manhattan_accuracy
            value: 0.578125
            name: Manhattan Accuracy
          - type: manhattan_accuracy_threshold
            value: 266.60528564453125
            name: Manhattan Accuracy Threshold
          - type: manhattan_f1
            value: 0.6559999999999999
            name: Manhattan F1
          - type: manhattan_f1_threshold
            value: 512.4686279296875
            name: Manhattan F1 Threshold
          - type: manhattan_precision
            value: 0.4880952380952381
            name: Manhattan Precision
          - type: manhattan_recall
            value: 1
            name: Manhattan Recall
          - type: manhattan_ap
            value: 0.5411403083150335
            name: Manhattan Ap
          - type: euclidean_accuracy
            value: 0.58203125
            name: Euclidean Accuracy
          - type: euclidean_accuracy_threshold
            value: 12.9645357131958
            name: Euclidean Accuracy Threshold
          - type: euclidean_f1
            value: 0.6559999999999999
            name: Euclidean F1
          - type: euclidean_f1_threshold
            value: 23.908817291259766
            name: Euclidean F1 Threshold
          - type: euclidean_precision
            value: 0.4880952380952381
            name: Euclidean Precision
          - type: euclidean_recall
            value: 1
            name: Euclidean Recall
          - type: euclidean_ap
            value: 0.541753017593475
            name: Euclidean Ap
          - type: max_accuracy
            value: 0.58203125
            name: Max Accuracy
          - type: max_accuracy_threshold
            value: 318.633056640625
            name: Max Accuracy Threshold
          - type: max_f1
            value: 0.6577540106951871
            name: Max F1
          - type: max_f1_threshold
            value: 512.4686279296875
            name: Max F1 Threshold
          - type: max_precision
            value: 0.4900398406374502
            name: Max Precision
          - type: max_recall
            value: 1
            name: Max Recall
          - type: max_ap
            value: 0.5533499611019033
            name: Max Ap

SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model fine-tuned from microsoft/deberta-v3-small on the negation-triplets, vitaminc-pairs, scitail-pairs-qa, scitail-pairs-pos, xsum-pairs, sciq_pairs, qasc_pairs, openbookqa_pairs, msmarco_pairs, nq_pairs, trivia_pairs, gooaq_pairs and paws-pos datasets. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
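The Pooling module above sets `pooling_mode_mean_tokens: True`, so each sentence embedding is the average of its 768-dimensional token embeddings, with padding positions excluded through the attention mask. The minimal NumPy sketch below illustrates that computation; it assumes token embeddings of shape (batch, seq_len, 768) and a 0/1 attention mask, and random arrays stand in for real DeBERTa outputs.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_embeddings: (batch, seq_len, dim) output of the transformer
    attention_mask:   (batch, seq_len) with 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid division by zero
    return summed / counts

# Toy input: 2 sentences, up to 4 tokens, 768-dim token vectors (random stand-ins)
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0]])  # the second sentence has 2 padding positions
sentence_embeddings = mean_pool(tokens, mask)
print(sentence_embeddings.shape)  # (2, 768)
```

Only the real tokens of the second sentence contribute to its embedding, which is why padding length does not change the result.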

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTa-small-ST-v1-toytest-checkpoints-tmp")
# Run inference
sentences = [
    'who did ben assault in home and away',
    "List of Home and Away characters (2017) Ben and Maggie learn that Ziggy has been dating Brody in secret and they disapprove of the relationship. Ziggy leaves the house and Ben tells her not to come back. He apologises to her the next day, but does not accept her relationship with Brody, so Ziggy refuses to come home. Brody later breaks up with her. Ben begins making surf boards to sell at the pier. Ben finds Coco convulsing in the garden and he and Maggie learn she has bulimia. Ziggy later leaves home. Days later, Ben sees her with Brody, who is attempting to bring her home, and punches him in the face. Olivia Fraser Richards (Raechelle Banno) tells Sergeant Phillip McCarthy (Nicholas Cassim) and Ben is arrested. McCarthy and Kat Chapman (Pia Miller) learns he has a criminal record for assaulting his brother. Ben insults Kat, which leads to him being charged. Maggie secures a loan to get him out on bail. Maggie's mother, Diana (Sarah Chadwick) came to Summer Bay to visit the family and Diana told Ben and Maggie that she is the one who bailed Ben out of jail.",
    'Bad Things (Machine Gun Kelly and Camila Cabello song) "Bad Things" is a song by American rapper Machine Gun Kelly and Cuban-American singer Camila Cabello. The song was released on October 14, 2016 and was produced by The Futuristics. Its music video was directed by Hannah Lux Davis and premiered on December 1, 2016. The song features an interpolation of Fastball\'s 1999 single "Out of My Head". The single peaked at number four on the US Billboard Hot 100.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
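`model.similarity` scores every pair of embeddings. Assuming the model keeps the default cosine similarity (the card does not state the similarity function; `model.similarity_fn_name` reports it), the resulting matrix equals the dot products of L2-normalized embeddings. The NumPy sketch below reproduces that with small made-up vectors standing in for `embeddings`, then ranks the candidate passages for the query, which is the core of semantic search.

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T  # (n, m)

# Made-up 4-dim vectors standing in for the 768-dim model outputs:
# row 0 = query, row 1 = relevant passage, row 2 = unrelated passage
embeddings = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.3],
    [0.0, 0.1, 0.9, -0.4],
])
similarities = cosine_similarity_matrix(embeddings, embeddings)
print(similarities.shape)  # (3, 3)

# Semantic search: rank the candidate passages (rows 1..) by similarity to the query
query_scores = similarities[0, 1:]
ranking = np.argsort(-query_scores) + 1  # +1 maps back to row indices in `embeddings`
print(ranking)  # [1 2]
```

For real corpora, `sentence_transformers.util.semantic_search` performs the same top-k ranking over query and corpus embeddings.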

Evaluation

Metrics

Semantic Similarity

  • Dataset: sts-test

Metric Value
pearson_cosine 0.8404
spearman_cosine 0.8859
pearson_manhattan 0.8739
spearman_manhattan 0.8786
pearson_euclidean 0.8737
spearman_euclidean 0.8767
pearson_dot 0.8377
spearman_dot 0.8627
pearson_max 0.8739
spearman_max 0.8859
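These correlations measure how closely the model's pair scores track the gold similarity scores of the STS test set: Pearson correlates the raw values, Spearman correlates their ranks. Each `*_max` row is the best value across the four score functions (for example, spearman_max 0.8859 equals spearman_cosine, and pearson_max 0.8739 equals pearson_manhattan). The NumPy sketch below computes both correlations on made-up scores, giving tied values their average rank; it illustrates the definitions and is not the evaluator's own code.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def average_ranks(values: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share the mean of their positions."""
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: gold STS scores (0-5) and model cosine scores for five pairs
gold = np.array([4.8, 3.5, 1.0, 2.2, 0.4])
cosine = np.array([0.91, 0.80, 0.35, 0.52, 0.40])
print(pearson(cosine, gold))
print(spearman(cosine, gold))
```

Spearman only cares about ordering, so it rewards a model that ranks pairs correctly even when its scores are not linearly related to the gold labels.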

Triplet

  • Dataset: NLI-v2

Metric Value
cosine_accuracy 1.0
dot_accuracy 0.0
manhattan_accuracy 1.0
euclidean_accuracy 1.0
max_accuracy 1.0
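A triplet counts as correct when the anchor is closer to its positive than to its negative: a higher score for cosine and dot product, a smaller distance for Manhattan and Euclidean. Accuracy is the fraction of correct triplets. The NumPy sketch below spells this out on one made-up triplet. It also shows one way the dot-product result can disagree with the others: the dot product is not length-normalized, so a negative with a large norm can outscore a closer positive. Whether that explains the 0.0 dot_accuracy reported above is not established by this card.

```python
import numpy as np

def triplet_accuracy(anchors, positives, negatives, metric: str) -> float:
    """Fraction of triplets where the anchor is closer to the positive than the negative."""
    if metric == "cosine":
        def sim(a, b):
            return np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
        correct = sim(anchors, positives) > sim(anchors, negatives)
    elif metric == "dot":
        correct = np.sum(anchors * positives, axis=1) > np.sum(anchors * negatives, axis=1)
    elif metric == "manhattan":
        correct = np.abs(anchors - positives).sum(axis=1) < np.abs(anchors - negatives).sum(axis=1)
    elif metric == "euclidean":
        correct = np.linalg.norm(anchors - positives, axis=1) < np.linalg.norm(anchors - negatives, axis=1)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return float(np.mean(correct))

# One made-up triplet: the negative points elsewhere but has a much larger norm
anchors = np.array([[1.0, 0.0]])
positives = np.array([[0.9, 0.1]])
negatives = np.array([[5.0, 5.0]])
for m in ("cosine", "dot", "manhattan", "euclidean"):
    print(m, triplet_accuracy(anchors, positives, negatives, m))
```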

Binary Classification

  • Dataset: VitaminC

Metric Value
cosine_accuracy 0.5781
cosine_accuracy_threshold 0.8053
cosine_f1 0.6578
cosine_f1_threshold 0.3108
cosine_precision 0.49
cosine_recall 1.0
cosine_ap 0.5479
dot_accuracy 0.582
dot_accuracy_threshold 318.6331
dot_f1 0.6578
dot_f1_threshold 125.5129
dot_precision 0.49
dot_recall 1.0
dot_ap 0.5533
manhattan_accuracy 0.5781
manhattan_accuracy_threshold 266.6053
manhattan_f1 0.656
manhattan_f1_threshold 512.4686
manhattan_precision 0.4881
manhattan_recall 1.0
manhattan_ap 0.5411
euclidean_accuracy 0.582
euclidean_accuracy_threshold 12.9645
euclidean_f1 0.656
euclidean_f1_threshold 23.9088
euclidean_precision 0.4881
euclidean_recall 1.0
euclidean_ap 0.5418
max_accuracy 0.582
max_accuracy_threshold 318.6331
max_f1 0.6578
max_f1_threshold 512.4686
max_precision 0.49
max_recall 1.0
max_ap 0.5533
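Each `*_threshold` row is the cut-off on that score that maximizes the named metric, and F1 is the harmonic mean of the precision and recall reported at the F1 threshold: 2 × 0.4900 × 1.0 / (0.4900 + 1.0) ≈ 0.6578, matching cosine_f1. The sketch below is a brute-force illustration, not the sentence-transformers implementation: it tries every observed score as a cut-off, predicting a pair positive when its similarity is at or above the cut-off (or, for distances, at or below it). The scores and labels are made up.

```python
import numpy as np

def best_threshold(scores, labels, metric="accuracy", higher_is_similar=True):
    """Sweep every observed score as a cut-off; return (best metric value, threshold).

    higher_is_similar=True for similarity scores (cosine, dot),
    False for distances (Manhattan, Euclidean).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_value, best_t = -1.0, None
    for t in np.unique(scores):
        preds = scores >= t if higher_is_similar else scores <= t
        if metric == "accuracy":
            value = float(np.mean(preds == labels))
        elif metric == "f1":
            tp = int(np.sum(preds & labels))
            precision = tp / int(preds.sum())  # preds is never empty: t is an observed score
            recall = tp / max(int(labels.sum()), 1)
            value = 2 * precision * recall / (precision + recall) if tp else 0.0
        else:
            raise ValueError(f"unknown metric: {metric}")
        if value > best_value:
            best_value, best_t = value, float(t)
    return best_value, best_t

# Made-up similarity scores and gold labels (1 = positive pair)
scores = [0.92, 0.85, 0.40, 0.78, 0.30, 0.65]
labels = [1, 1, 0, 1, 0, 0]
print(best_threshold(scores, labels, "accuracy"))
print(best_threshold(scores, labels, "f1"))

# Distances work the same way, with the comparison flipped
distances = [0.1, 0.2, 0.9, 0.3, 1.1, 0.6]
print(best_threshold(distances, labels, "accuracy", higher_is_similar=False))

# F1 from the reported cosine precision and recall
precision, recall = 0.4900398406374502, 1.0
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.6578
```

Because accuracy and F1 are optimized separately, the two thresholds for the same score function can differ widely, as they do for cosine (0.8053 vs 0.3108) in the table above.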

Training Details

Training Datasets

negation-triplets

  • Dataset: negation-triplets
  • Size: 26,000 training samples
  • Columns: anchor, entailment, and negative
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    anchor | string | 5 tokens | 22.32 tokens | 124 tokens
    entailment | string | 4 tokens | 14.05 tokens | 42 tokens
    negative | string | 4 tokens | 14.36 tokens | 42 tokens
  • Samples:
    anchor | entailment | negative
    Braşov is part of the Transylvania area . | Like many other cities in Transylvania , Braşov is also home for a significant ethnic Hungarian minority . | Like many other cities in Transylvania, Braşov is also home for a significant ethnic Romanian majority.
    If some of the principles of supersymmetry are correct , it is possible to recreate these superparticles with particle accelerators . This attempt could prove or disprove the ideas of supersymmetry . | It is possible to have more than one kind of supersymmetry transformation . | It is impossible to find even one kind of supersymmetry transformation.
    A group of people running a bicycle race past a red building. | A bunch of people run past a building | A few people stay still near a building
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
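CachedGISTEmbedLoss combines GISTEmbed-style negative filtering with gradient caching. A frozen guide model (a BERT model with CLS pooling and normalization, per the printout above) scores the same batch; any in-batch candidate that the guide rates as more similar to an anchor than that anchor's own positive is treated as a likely false negative and masked out before an InfoNCE-style cross-entropy with temperature 0.05. The NumPy sketch below is a simplified, non-cached illustration of that idea, using only anchor-to-positive in-batch candidates and toy vectors in place of model and guide embeddings. The sentence-transformers implementation additionally scores anchor-anchor and positive-positive pairs, uses hard negatives when a dataset provides them (such as the negative column here), and caches gradients over mini-batches to allow large effective batch sizes.

```python
import numpy as np

def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def gist_style_loss(anchor, positive, guide_anchor, guide_positive, temperature=0.05):
    """Simplified GIST-style in-batch contrastive loss (no caching, no hard negatives).

    Row i of `anchor` pairs with row i of `positive`; every other positive in the
    batch is a candidate negative unless the guide scores it higher than the true
    positive, in which case it is masked out as a likely false negative.
    """
    sim = cosine_matrix(anchor, positive)                    # model scores (B, B)
    guide_sim = cosine_matrix(guide_anchor, guide_positive)  # guide scores (B, B)
    guide_pos = np.diag(guide_sim)[:, None]                  # guide score of each true pair
    sim = np.where(guide_sim > guide_pos, -np.inf, sim)      # drop suspected false negatives
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))               # cross-entropy, target = diagonal

# Well-separated toy batch: the loss is close to zero
eye = np.eye(3)
print(gist_style_loss(eye, eye, eye, eye))

# Two pairs the model cannot tell apart: with guide == model nothing is masked
# and the loss is log(2); a guide that scores the swapped pairs higher masks them
anchor = np.array([[1.0, 0.0], [1.0, 0.0]])
positive = np.array([[1.0, 0.0], [1.0, 0.0]])
guide_anchor = np.array([[1.0, 0.0], [0.0, 1.0]])
guide_positive = np.array([[0.6, 0.8], [0.8, 0.6]])
print(gist_style_loss(anchor, positive, anchor, positive))
print(gist_style_loss(anchor, positive, guide_anchor, guide_positive))
```

Masking removes the penalty for pushing apart items that are probably true matches, which matters for datasets with many near-duplicate texts.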
    

vitaminc-pairs

  • Dataset: vitaminc-pairs at be6febb
  • Size: 24,000 training samples
  • Columns: claim and evidence
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    claim | string | 7 tokens | 17.43 tokens | 67 tokens
    evidence | string | 8 tokens | 37.41 tokens | 366 tokens
  • Samples:
    claim | evidence
    By March 2016 , Baby was above the 8th most viewed YouTube video . | On March 5 , 2014 , Baby '' became the second video , after Gangnam Style '' , to receive 1 billion views on YouTube , and is the ninth most viewed video on the site , with over 1.33 billion views as of March 2016 .
    The movie Think Like A Man had a rating of less than 50 % on Metacritic . | Early reviews for the film were mixed , the film currently holds a 47 % on Metacritic , indicating `` mixed or average reviews '' .
    Animal consumption at the Huanan Seafood Market is suspected to be where the severe acute respiratory syndrome coronavirus 2 originated . | Animals sold for food are suspected to be the reservoir because many of first identified infected individuals were workers at the Huanan Seafood Market .
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
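Each "Approximate statistics" entry reports the minimum, mean and maximum token count per column over the first 1000 samples. The sketch below shows that computation with any tokenizer passed in as a callable; the whitespace tokenizer is a hypothetical stand-in for demonstration, so its counts will generally differ from the subword counts in this card. The sciq_pairs maximum of exactly 512 tokens, equal to `max_seq_length`, suggests those counts are capped by truncation.

```python
from statistics import mean

def token_length_stats(texts, tokenize, limit=1000):
    """Min/mean/max token counts over the first `limit` texts."""
    lengths = [len(tokenize(text)) for text in texts[:limit]]
    return {"min": min(lengths), "mean": round(mean(lengths), 2), "max": max(lengths)}

# Hypothetical stand-in tokenizer: plain whitespace split
def whitespace_tokenize(text):
    return text.split()

claims = [
    "The movie Think Like A Man had a rating of less than 50 % on Metacritic .",
    "By March 2016 , Baby was above the 8th most viewed YouTube video .",
]
stats = token_length_stats(claims, whitespace_tokenize)
print(stats)
```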
    

scitail-pairs-qa

  • Dataset: scitail-pairs-qa at 0cc4353
  • Size: 14,237 training samples
  • Columns: sentence2 and sentence1
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    sentence2 | string | 7 tokens | 16.12 tokens | 41 tokens
    sentence1 | string | 7 tokens | 15.23 tokens | 41 tokens
  • Samples:
    sentence2 | sentence1
    Instruments that measure the angle of the slope of a volcano are called tilt meters. | Instruments that measure the angle of the slope of a volcano are called what?
    Ultrasound, a diagnostic technology, uses high-frequency vibrations transmitted into any tissue in contact with the transducer. | What diagnostic technology uses high-frequency vibrations transmitted into any tissue in contact with the transducer?
    Many species of birds in new england fly south for the winter months to find an environment with more food. | Which of the following best explains why many species of birds in New England fly south for the winter months?
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

scitail-pairs-pos

  • Dataset: scitail-pairs-pos at 0cc4353
  • Size: 8,600 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    sentence1 | string | 8 tokens | 23.36 tokens | 74 tokens
    sentence2 | string | 7 tokens | 15.79 tokens | 41 tokens
  • Samples:
    sentence1 | sentence2
    Anyway, what makes it possible for insects to walk on water is called Surface tension . | Surface tension is responsible for the fact that small insects can walk on water.
    Elastic potential energy is the potential energy of an elastic object (for example a bow or a catapult) that is deformed under tension or compression (or stressed in formal terminology). | The term elastic potential energy is used to describe potential energy due to an object’s shape.
    But, season or not, tornadoes can occur at any time of the year, if the weather conditions are right. | Tornadoes can occur in any.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

xsum-pairs

  • Dataset: xsum-pairs
  • Size: 24,000 training samples
  • Columns: document and summary
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    document | string | 45 tokens | 255.53 tokens | 487 tokens
    summary | string | 8 tokens | 25.67 tokens | 42 tokens
  • Samples:
    document | summary
    Jean Galligan, 82, from Dumfries, died when her car caught fire after it was involved in a collision with a Vauxhall Zafira on 14 May. Police Scotland said Mrs Galligan was driving a red Daihatsu which was burnt out as a result of the accident on the A76 at Holywood. Neither the driver nor the front seat passenger in the Zafira were injured. | A woman killed in a road accident near Dumfries has been named by police.
    Police officers carried out arrests on Thursday in connection with alleged sex offences against females which occurred between 2008 and 2015. Six men were charged with the rape of a girl under 16 as well as other sexual offences, while a seventh man was charged with conspiracy to rape. Six men - all from Oxford - will appear before Oxford magistrates. They are: Shabir Dogar, 22; Shabaz Khan, 23; Shohab Dogar, 23; Yasin Hamid, 20; Usman Iddris, 22; and Joseph Suraina, 22. Waqas Hussain, 24, of no fixed abode, will appear at Oxford Magistrates' Court on 4 April. Mr Hussain has also been charged with the attempted sexual assault of a girl under 13, as have Shabir Dogar and Shohab Dogar. The raids were part of what the police are calling Operation Nautical. A further 10 men were also arrested on Wednesday as part of the same operation. | Seven men have been charged in connection with a major child sexual exploitation investigation in Oxford.
    In February 1957, 11-year-old Moira Anderson left her grandmother's house in Coatbridge to go to the shops but never returned. Bus driver and convicted paedophile Alexander Gartshore, who died in 2006, is suspected of her murder. Police are now looking at an area of Monkland Canal in an attempt to find her remains. Moira Anderson was last seen on 23 February 1957 when she left on an errand during a heavy snowstorm, and boarded a Baxter's bus that was driven by Gartshore. Later that year, he was jailed for raping a 17-year-old babysitter. In 1999, convicted child abuser James Gallogley named his former friend Gartshore as Moira's murderer. Gartshore's own daughter Sandra Brown was convinced he was the killer and campaigned to have him charged. In 2014 prosecutors took the unusual step of announcing that Gartshore would have faced prosecution for the schoolgirl's murder if he were still alive. In 1957 a witness reported seeing a tall man carrying a large, heavy sack towards the canal the morning after Moira disappeared but the possible sighting was not followed up. Four years ago a grave in Old Monkland Cemetery in Coatbridge was exhumed as part of the search but no evidence was found that Moira was buried there. | A new search to find a schoolgirl who disappeared 60 years in North Lanarkshire has begun.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

sciq_pairs

  • Dataset: sciq_pairs at 2c94ad3
  • Size: 11,095 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    sentence1 | string | 7 tokens | 16.67 tokens | 60 tokens
    sentence2 | string | 2 tokens | 82.57 tokens | 512 tokens
  • Samples:
    sentence1 | sentence2
    The punnett square shows the possible what, and their most likely ratios? | If the parents had four offspring, their most likely genotypes would be one BB, two Bb, and one bb. But the genotype ratios of their actual offspring may differ. That's because which gametes happen to unite is a matter of chance, like a coin toss. The Punnett square just shows the possible genotypes and their most likely ratios.
    Which hormones work together to control the level of glucose in the blood? | The pancreas is located near the stomach. Its hormones include insulin and glucagon. These two hormones work together to control the level of glucose in the blood. Insulin causes excess blood glucose to be taken up by the liver, which stores the glucose as glycogen. Glucagon stimulates the liver to break down glycogen into glucose and release it back into the blood. The pancreas also secretes digestive enzymes into the digestive tract.
    What is the “packet” of energy called that the nucleus emits during gamma decay? | Gamma rays are produced during gamma decay of an excited nucleus. During gamma decay, the nucleus emits a “packet” of energy called a gamma particle.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

qasc_pairs

  • Dataset: qasc_pairs at a34ba20
  • Size: 7,727 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    column | type | min | mean | max
    sentence1 | string | 5 tokens | 11.36 tokens | 22 tokens
    sentence2 | string | 14 tokens | 34.66 tokens | 66 tokens
  • Samples:
    sentence1 | sentence2
    Orbiting spacecraft reentering the Earth's atmosphere | friction causes an object to lose energy. Dear Ashlee, The heat in the reentry phase is due to friction between the spacecraft and the air.. Spacecraft that are reentering the atmosphere lose energy
    The chance of you developing cancer depends most on your | Cancer genes can be inherited.. Genes are inherited from parents.. Developing cancer can depend on your parents
    Why does a snake look for shelter in the winter? | shelter is used for protection by animals against weather. Many snakes seek shelter from the winter weather by holding up in dens.. Snakes use shelter to protect themselves in the winter
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
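Each dataset entry reports min, mean and max token counts over at most its first 1000 samples. The sketch below is a minimal version of that computation. Its whitespace tokenizer is a hypothetical stand-in: the card's figures come from the model's own subword tokenizer, so this code will not reproduce them. The example strings are qasc_pairs questions quoted in this section.

```python
from itertools import islice
from statistics import mean
from typing import Callable, Dict, Iterable, List


def token_length_stats(
    texts: Iterable[str],
    tokenize: Callable[[str], List[str]],
    limit: int = 1000,
) -> Dict[str, float]:
    """Min/mean/max token counts over the first `limit` texts (all of them if fewer)."""
    lengths = [len(tokenize(text)) for text in islice(texts, limit)]
    return {"min": min(lengths), "mean": round(mean(lengths), 2), "max": max(lengths)}


def whitespace_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for the model's subword tokenizer.
    return text.split()


questions = [
    "what code proteins?",
    "Erosion can be caused by",
    "Why does a snake look for shelter in the winter?",
]
print(token_length_stats(questions, whitespace_tokenize))
```

For the evaluation splits, which have far fewer than 1000 rows, `islice` simply covers every sample.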
    

openbookqa_pairs

  • Dataset: openbookqa_pairs
  • Size: 4,522 training samples
  • Columns: question and fact
  • Approximate statistics based on the first 1000 samples:
    • question (string): min: 3 tokens, mean: 13.8 tokens, max: 78 tokens
    • fact (string): min: 4 tokens, mean: 11.5 tokens, max: 30 tokens
  • Samples:
    question fact
    What is animal competition? if two animals eat the same prey then those animals compete for that pey
    If you wanted to make a metal bed frame, where would you start? alloys are made of two or more metals
    Places lacking warmth have few what cold environments contain few organisms
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
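Every dataset is trained with CachedGISTEmbedLoss, using a guide model and `'temperature': 0.05`. The core GISTEmbed idea is that a frozen guide model vetoes in-batch negatives it scores as closer to the anchor than the labelled positive. Such candidates are treated as probable false negatives. The sketch below is a simplified, pure-Python rendering of that idea for anchor-positive pairs only, with made-up similarity matrices. The sentence-transformers implementation is vectorized in PyTorch and handles more candidate sets than shown here, so treat this as an illustration rather than the library's code.

```python
import math
from typing import List

Matrix = List[List[float]]


def gist_loss(model_sim: Matrix, guide_sim: Matrix, temperature: float = 0.05) -> float:
    """Simplified GIST-style contrastive loss over anchor-positive similarities.

    model_sim[i][j]: similarity of anchor i and positive j under the model being trained.
    guide_sim[i][j]: the same similarity under the frozen guide model.
    """
    n = len(model_sim)
    total = 0.0
    for i in range(n):
        logits = []
        for j in range(n):
            # Drop in-batch candidates that the guide rates above the labelled
            # positive: they are likely false negatives.
            if j != i and guide_sim[i][j] > guide_sim[i][i]:
                continue
            logits.append((j, model_sim[i][j] / temperature))
        # Cross-entropy with the labelled positive (column i) as the target.
        top = max(v for _, v in logits)
        log_norm = top + math.log(sum(math.exp(v - top) for _, v in logits))
        target = next(v for j, v in logits if j == i)
        total += log_norm - target
    return total / n


model_sim = [[0.9, 0.8], [0.1, 0.7]]   # made-up student similarities
guide_sim = [[0.8, 0.85], [0.2, 0.9]]  # guide thinks positive 1 also fits anchor 0
no_veto = [[1.0, 0.0], [0.0, 1.0]]     # guide that never vetoes anything
print(gist_loss(model_sim, guide_sim), gist_loss(model_sim, no_veto))
```

With the veto, anchor 0 is no longer pushed away from a candidate the guide considers a valid match, so the loss drops.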
    

msmarco_pairs

  • Dataset: msmarco_pairs at 28ff31e
  • Size: 22,000 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 4 tokens, mean: 8.75 tokens, max: 38 tokens
    • sentence2 (string): min: 14 tokens, mean: 76.85 tokens, max: 201 tokens
  • Samples:
    sentence1 sentence2
    what is hydrolysis in digestion Digestion and Hydrolysis The digestion process relies upon hydrolysis to render the biochemical reactions that break down food. The digestive tract secretes enzymes, such as proteases, carbohydrases, nucleases and lipases that, along with water, catalyze the hydrolysis that releases various nutrients.
    is cartier a good watch CARTIER WATCHES. AuthenticWatches.com is one of the largest Internet Dealers for authentic Cartier watches. Cartier watches have no equal with respect to elegance and luxury. The name Cartier is synonymous with exquisite luxury and quality. Founded in 1847 by Louis-François Cartier, Cartier has led the industry in jewelry and watches alike. Cartier watches boast a large variety of design and functionality, yet maintain the utmost quality and sophistication in every series.
    what vitamin is a precursor for a neurotransmitter Tryptophan is an essential amino acid which is the precursor of serotonin. Serotonin is a brain neurotransmitter, platelet clotting factor and neurohormone found in organs throughout the body. Metabolism of tryptophan to serotonin requires nutrients such as vitamin B6, niacin and glutathione.ower doses, as little as 1000 to 2000 mg, have been found to be effective clinically, as well as experimentally in animals. The minimum daily requirement for adults of tyrosine and its precursor, phenylalanine, is 16 mg/kg a day or about 1000 mg total. Hence, 6 g is at least six times the minimum daily requirement.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
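The "Cached" prefix refers to GradCache-style training. The large batch is first embedded in small mini-batches without gradients, and the loss and its gradients with respect to the embeddings are computed once over the whole batch. Each mini-batch is then re-encoded with gradients enabled so the cached embedding gradients can be backpropagated. This keeps every in-batch example available as a negative while bounding activation memory. The sketch shows only the property the technique relies on: chunked embedding yields the same matrix as one full pass. `toy_encode` is a hypothetical deterministic encoder, and `mini_batch_size=2` is arbitrary; this section does not state the value used in training.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def embed_in_minibatches(
    texts: Sequence[str],
    encode: Callable[[List[str]], List[Vector]],
    mini_batch_size: int,
) -> List[Vector]:
    """Embed `texts` chunk by chunk and concatenate the results in order."""
    embeddings: List[Vector] = []
    for start in range(0, len(texts), mini_batch_size):
        # In GradCache this first pass keeps no activations, so peak memory
        # scales with mini_batch_size rather than with the full batch.
        embeddings.extend(encode(list(texts[start:start + mini_batch_size])))
    return embeddings


def toy_encode(batch: List[str]) -> List[Vector]:
    # Hypothetical deterministic "encoder": text length and vowel count.
    return [[float(len(t)), float(sum(ch in "aeiou" for ch in t.lower()))] for t in batch]


batch = [
    "what is hydrolysis in digestion",
    "is cartier a good watch",
    "what vitamin is a precursor for a neurotransmitter",
    "difference between integrated and dedicated graphics",
    "ppt vehicle definition",
]
full = toy_encode(batch)
cached = embed_in_minibatches(batch, toy_encode, mini_batch_size=2)
print(full == cached)
```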
    

nq_pairs

  • Dataset: nq_pairs at f9e894e
  • Size: 22,000 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 10 tokens, mean: 11.92 tokens, max: 27 tokens
    • sentence2 (string): min: 15 tokens, mean: 132.65 tokens, max: 512 tokens
  • Samples:
    sentence1 sentence2
    friends episode with the turkey on the head The One with All the Thanksgivings "The One with All the Thanksgivings" (also known as "The One with the Thanksgiving Flashbacks"[2]) is the eighth episode of the fifth season of Friends. It first aired on the NBC network in the United States on November 19, 1998. In the episode, the main characters spend Thanksgiving at Monica's (Courteney Cox) apartment and begin telling stories about their worst Thanksgivings: Chandler (Matthew Perry) learning of his parents' divorce, Phoebe (Lisa Kudrow) losing arms in past lives and Joey (Matt LeBlanc) having his head stuck in a turkey. Rachel (Jennifer Aniston) reveals Monica's worst Thanksgiving: accidentally cutting off Chandler's toe after he called her "fat" in their first encounter. When Monica begs Chandler to forgive her, he accidentally reveals that he loves her.
    who played the first buford pusser in walking tall Walking Tall (1973 film) Buford Pusser (Joe Don Baker), at his wife Pauline's (Elizabeth Hartman) behest, retires from the professional wrestling ring and moves back to Tennessee to start a logging business with his father, Carl Pusser (Noah Beery, Jr.).
    when did the us let go of the philippines History of the Philippines (1898–1946) The history of the Philippines from 1898 to 1946 covers the period of American rule in the Philippines and began with the outbreak of the Spanish–American War in April 1898, when the Philippines was still part of the Spanish East Indies, and concluded when the United States formally recognised the independence of the Republic of the Philippines on July 4, 1946.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

trivia_pairs

  • Dataset: trivia_pairs at a7c36e3
  • Size: 20,000 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 8 tokens, mean: 18.89 tokens, max: 60 tokens
    • sentence2 (string): min: 15 tokens, mean: 456.48 tokens, max: 512 tokens
  • Samples:
    sentence1 sentence2
    With the symbol Wb what is the unit of magnetic flux? schoolphysics ::Welcome:: HOME > AGE 16 - 19 > ELECTRICITY AND MAGNETISM > ELECTROMAGNETISM > FLUX AND FLUX DENSITY Flux and flux density To understand the meaning of magnetic flux (Φ) and magnetic flux density (B) think first about an ordinary bar magnet. Around the magnet there is a magnetic field and this gives a 'flow of magnetic energy' around the magnet. It is this flow of energy that we call magnetic flux (Φ). We think of magnetic flux as flowing from the north pole of a magnet round to its south pole as shown by the arrows on the lines in the diagram. Looking at the diagram you should see that there is as much flux flowing 'from the north pole' as there is 'flowing into the south pole'. Magnetic flux is given the symbol Φ and is measured in units called Webers (Wb). However the amount of magnetic flux flowing through a given area will change from one point to another around the magnet and you can understand this by thinking about a loop of wire placed in the field at two different points (A and B). You can see that in position B there are a smaller number of magnetic field lines passing through the loop than there is when it is in position A. We call the amount of flux passing through a unit area at right angles to the magnetic field lines the flux density (B) at that point. Flux density is measured in Tesla (T) where 1 T = 1 Wbm-2 So: Flux (Φ) = Flux density (B) x area through which flux passes (A) Φ = BA If we now use more than one loop of wire, in others words a coil of N turns as shown in position C the flux flowing through the N turns is simply N times that flowing through the single loop. The quantity NΦ is called the the flux linkage for the coil at that point. Therefore:
    The informal term for a gangster, especially belonging to the Mafia is? goodfella - definition of goodfella in English
    What did the band S Club 7 change their name to when Paul Cattermole left in June 2002? BBC News
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
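The answer columns of nq_pairs and trivia_pairs both report a maximum of exactly 512 tokens. That matches the `max_seq_length` of 512 in the guide model's config, which suggests the counts were taken after truncation and that longer passages were cut. The sketch below shows simple head truncation that reserves room for the two special tokens. `CLS_ID` and `SEP_ID` are hypothetical ids, the passages are made-up id ranges, and real tokenizers offer other truncation strategies.

```python
from typing import List

MAX_SEQ_LENGTH = 512  # value shown in the guide model's Transformer config
CLS_ID, SEP_ID = 1, 2  # hypothetical special-token ids


def encode_with_truncation(content_ids: List[int], max_len: int = MAX_SEQ_LENGTH) -> List[int]:
    """Wrap content in [CLS] ... [SEP], dropping content tokens beyond the length budget."""
    budget = max_len - 2  # reserve room for the two special tokens
    return [CLS_ID] + content_ids[:budget] + [SEP_ID]


def share_at_cap(lengths: List[int], max_len: int = MAX_SEQ_LENGTH) -> float:
    """Fraction of already-truncated sequences that sit at the cap (i.e. were likely cut)."""
    return sum(1 for n in lengths if n >= max_len) / len(lengths)


long_passage = list(range(1000, 1900))   # 900 made-up content token ids
short_passage = list(range(1000, 1013))  # 13 made-up content token ids
print(len(encode_with_truncation(long_passage)), len(encode_with_truncation(short_passage)))
print(share_at_cap([300, 512, 512, 120]))
```

trivia_pairs has a mean answer length of 456.48 tokens against a 512 cap. This implies a large share of its passages sit at the cap, so text beyond the first few hundred tokens of those passages is not seen during training.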
    

gooaq_pairs

  • Dataset: gooaq_pairs at b089f72
  • Size: 20,000 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 8 tokens, mean: 11.52 tokens, max: 19 tokens
    • sentence2 (string): min: 16 tokens, mean: 57.68 tokens, max: 121 tokens
  • Samples:
    sentence1 sentence2
    gyan is called in english? Gyan (Sanskrit), a Sanskrit word that roughly translates to 'knowledge' in English.
    are mud baths good for dogs? Mud has many benefits for your dog. It can soothe irritations by removing dead irritated skin. It can soothe hot spots. The mud applied to your dogs coat during the bath can help moisturize the skin and remove dandruff.
    how many calories do you burn doing interval running? Cost in Calories exerciser who performs sprint intervals at a speed of 12 mph for 20 minutes expends roughly 608 calories. To demonstrate how body weight affects calorie expenditure, a 250-lb. person burns approximately 845 calories using the identical sprint speed and workout duration.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
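In the loss parameters, `'temperature': 0.05` divides the similarity scores before the softmax over in-batch candidates. Cosine similarities lie in [-1, 1], so a small temperature stretches the gaps between them and concentrates the probability mass on the best-scoring candidate. The sketch compares temperature 1.0 with 0.05 on three made-up cosine scores.

```python
import math
from typing import List


def softmax_with_temperature(scores: List[float], temperature: float) -> List[float]:
    """Softmax of scores / temperature, shifted by the max for numerical stability."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


cosine_scores = [0.80, 0.70, 0.60]  # made-up anchor-candidate similarities
for t in (1.0, 0.05):
    probs = softmax_with_temperature(cosine_scores, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 1.0 the three candidates get nearly equal weight. At 0.05 the top candidate gets most of the mass, which gives the contrastive loss a much sharper training signal.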
    

paws-pos

  • Dataset: paws-pos at 161ece9
  • Size: 21,829 training samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 9 tokens, mean: 25.38 tokens, max: 51 tokens
    • sentence2 (string): min: 9 tokens, mean: 25.41 tokens, max: 50 tokens
  • Samples:
    sentence1 sentence2
    After some protests from the girl 's father , the man of Douji and Hime was killed before Suzu and her father 's corpse were consumed by the Orochi . After some protesting from the girl 's father , the man was killed by Douji and Hime before Suzu and her father 's corpse were consumed by the Orochi .
    162 . Fighter Escadrille was a unit of the Łódź Army at the start of the Second World War . The unit was attached to the Polish Air Force . At the beginning of the Second World War , the Fighter Escadrille was a unit of the Łódź army , which was attached to the Polish Air Force .
    The first music video for the album filmed and edited for the song 'Trust You ' was made by Pierre Bouvier-Patron of Friends Studio , London . The first music video for the album , filmed and edited for the song 'Trust You ' , was made by Pierre Bouvier-Patron of Friends Studio , London .
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

Evaluation Datasets

vitaminc-pairs

  • Dataset: vitaminc-pairs at be6febb
  • Size: 108 evaluation samples
  • Columns: claim and evidence
  • Approximate statistics based on the first 1000 samples:
    • claim (string): min: 9 tokens, mean: 21.36 tokens, max: 41 tokens
    • evidence (string): min: 11 tokens, mean: 36.11 tokens, max: 79 tokens
  • Samples:
    claim evidence
    Dragon Con had over 5000 guests . Among the more than 6000 guests and musical performers at the 2009 convention were such notables as Patrick Stewart , William Shatner , Leonard Nimoy , Terry Gilliam , Bruce Boxleitner , James Marsters , and Mary McDonnell .
    COVID-19 has reached more than 185 countries . As of , more than cases of COVID-19 have been reported in more than 190 countries and 200 territories , resulting in more than deaths .
    In March , Italy had 3.6x times more cases of coronavirus than China . As of 12 March , among nations with at least one million citizens , Italy has the world 's highest per capita rate of positive coronavirus cases at 206.1 cases per million people ( 3.6x times the rate of China ) and is the country with the second-highest number of positive cases as well as of deaths in the world , after China .
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
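Column names differ across datasets: claim/evidence, question/fact, document/summary, anchor/entailment/negative. Sentence-transformers losses read their text inputs by column position, not by name. The first column therefore acts as the anchor, the second as the positive, and an optional third as a negative. The sketch below illustrates that mapping using column lists copied from this section. The helper `roles_by_position` is hypothetical and not part of the library.

```python
from typing import Dict, List

ROLES = ("anchor", "positive", "negative")

# Column lists as given for some of the datasets in this section.
DATASET_COLUMNS: Dict[str, List[str]] = {
    "vitaminc-pairs": ["claim", "evidence"],
    "openbookqa_pairs": ["question", "fact"],
    "xsum-pairs": ["document", "summary"],
    "negation-triplets": ["anchor", "entailment", "negative"],
}


def roles_by_position(columns: List[str]) -> Dict[str, str]:
    """Map loss roles to column names purely by position (the names are ignored)."""
    if not 2 <= len(columns) <= len(ROLES):
        raise ValueError(f"expected 2 or 3 text columns, got {len(columns)}")
    return dict(zip(ROLES, columns))


for name, columns in DATASET_COLUMNS.items():
    print(name, roles_by_position(columns))
```

Under this mapping, an xsum document serves as the anchor and its one-sentence summary as the positive.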
    

negation-triplets

  • Dataset: negation-triplets
  • Size: 64 evaluation samples
  • Columns: anchor, entailment, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 10 tokens, mean: 13.88 tokens, max: 18 tokens
    • entailment (string): min: 10 tokens, mean: 13.31 tokens, max: 21 tokens
    • negative (string): min: 10 tokens, mean: 13.64 tokens, max: 22 tokens
  • Samples:
    anchor entailment negative
    1 military jet fighter flying in formation alongside a 1 military propeller pilot. The two planes are different in design, but flying in a similar flight pattern. The two planes are identical in design, but flying in different flight patterns.
    A random plane in the sky flying alone An airplane flying high in the blue sky. A helicopter flying low in the cloudy sky.
    A picture of a white gas range with figurines above. a white stove turned off with a digital clock a black stove turned on with a digital clock
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
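negation-triplets is the only three-column dataset in this section. With a third column, each anchor is scored against every in-batch positive and also every in-batch negative. The contrastive softmax therefore runs over twice as many candidates, while the target for anchor i stays at column i. The sketch below builds that score matrix from two made-up similarity blocks. The library's GIST loss additionally applies guide filtering and adds further candidate blocks, both omitted here.

```python
from typing import List

Matrix = List[List[float]]


def candidate_scores(anchor_pos: Matrix, anchor_neg: Matrix) -> Matrix:
    """Row i = anchor i vs. every in-batch positive, followed by every in-batch negative."""
    return [pos_row + neg_row for pos_row, neg_row in zip(anchor_pos, anchor_neg)]


def target_indices(batch_size: int) -> List[int]:
    """The labelled positive for anchor i stays at column i."""
    return list(range(batch_size))


# Made-up similarities for a batch of two triplets.
anchor_pos = [[0.82, 0.10], [0.05, 0.77]]
anchor_neg = [[0.64, 0.02], [0.08, 0.59]]  # negations score high but should rank lower

scores = candidate_scores(anchor_pos, anchor_neg)
print(scores, target_indices(len(scores)))
```

Because the negated sentences differ from the entailments by only a few words, these hard negatives tend to score close to the positive. That is what makes this dataset useful for teaching the model about negation.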
    

scitail-pairs-pos

  • Dataset: scitail-pairs-pos at 0cc4353
  • Size: 54 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 9 tokens, mean: 20.81 tokens, max: 45 tokens
    • sentence2 (string): min: 10 tokens, mean: 15.48 tokens, max: 23 tokens
  • Samples:
    sentence1 sentence2
    humans normally have 23 pairs of chromosomes. Humans typically have 23 pairs pairs of chromosomes.
    A solution is a homogenous mixture of two or more substances that exist in a single phase. Solution is the term for a homogeneous mixture of two or more substances.
    Upwelling The physical process in near-shore ocean systems of rising of nutrients and colder bottom waters to the surface because of constant wind patterns along the shoreline. Upwelling is the term for when deep ocean water rises to the surface.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

xsum-pairs

  • Dataset: xsum-pairs
  • Size: 128 evaluation samples
  • Columns: document and summary
  • Approximate statistics based on the first 1000 samples:
    • document (string): min: 74 tokens, mean: 242.33 tokens, max: 374 tokens
    • summary (string): min: 12 tokens, mean: 25.18 tokens, max: 38 tokens
  • Samples:
    document summary
    The region is already struggling to cope with a huge influx of migrants arriving from Tunisia.
    Since January, at least 15,000 migrants have arrived, many of them landing on the tiny island of Lampedusa which is struggling to cope.
    Thousands of people are living in basic camps on the island, leading to health concerns and rising local tensions.
    "Until now the only migrants to arrive in Lampedusa were Tunisians," said Laura Boldrini a spokeswoman for the UN's refugee agency.
    "This is the first boat coming from Libya with people fleeing the military escalation, the vendettas and the retaliation attacks," she said.
    Overnight on Saturday, a boat carrying some 300 migrants was escorted by the Italian coastguard to Linosa, an even smaller island some 50 km (35 miles) north of Lampedusa.
    The passengers were mostly Somalis, Eritreans and Ethiopians and included a woman who had just given birth - she and the baby were flown to Lampedusa for medical care.
    Several other boats from Libya, each carrying hundreds more migrants, are expected to reach Italy within hours.
    Officials on Lampedusa, which is less than 160km from the Tunisian coast, have moved thousands of migrants to reception centres on the mainland, but some 5,000 remain.
    The island's mayor has said he is desperate for help to relieve pressure on the island's very limited resources. Local people have said they are afraid of an outbreak of disease in the camps.
    The Italian government has appealed to the international community for help.
    Boatloads of migrants fleeing fighting in Libya are beginning to arrive in southern Italy, say officials.
    The Belgium midfielder was one of a handful of players Mourinho had deemed to have underperformed this season.
    Mourinho has also been under scrutiny, with the champions 15th in the league, 14 points behind leaders Leicester.
    "I don't have a problem with him. We hope we can win a lot of trophies together," said Hazard, 24.
    "Maybe not this season because it will be difficult, but next season and on."
    The Blues moved above Norwich into 15th with a 1-0 win over the Canaries on Saturday - it was only their fourth victory in 13 league games this season.
    Hazard, who was recently linked with a move to Real Madrid, has yet to score for the Blues in 18 appearances in all competitions in 2015-16.
    He managed 20 last season en route to winning the Professional Footballers' Association and Football Writers' Association player of the year awards.
    On his own form, he said: "I didn't start the season well. I tried to find out why, but I don't know.
    "Sometimes you don't know. You have to keep going. I gave everything in training, on the pitch when I played.
    "I hope I can get a lot of form and try to help the team win games."
    Eden Hazard has denied having a strained relationship with Chelsea boss Jose Mourinho and suggested he wants to stay at the Premier League champions.
    Amadou Gallo Fall, the NBA's vice-president for Africa, told the BBC the centre would train boys and girls aged between 16 and 18.
    He said the centre would be part of its global network of elite training academies.
    Several Africans have played for top teams in the NBA league.
    Mr Fall, who is originally from Senegal, said the pan-African academy would use its network to scout for players from around the continent.
    He said the players would be given access to facilities and resources available to elite players including nutritionists, personal coaches and physiotherapists.
    The centre will be in Thies, 60km (40 miles) east of the capital, Dakar.
    Senegal's national teams - men and women - have traditionally been among the strongest in Africa.
    Mr Fall said those who don't make it to the NBA would have other avenues, such as "other great leagues around the world, including the NBA development league or in US universities".
    He said there were 14 African-born players on the NBA opening roster this year, including Senegal's Gorgui Dieng and Cameroon's Pascal Siakam.
    He added the NBA had a long association with the continent, citing legendary players such as Hakeem Olajuwon, Manute Bol and Dikembe Mutombo.
    "That generation has paved the way and they've inspired and ushered a significant number of other young players over the years, a lot of them from Senegal," he said.
    The NBA launched three academy centres in China in October, one in India last month and is planning to open another global centre based in Australia.
    The NBA held its first game in Africa in 1 August 2015 in the South African city of Johannesburg.
    The US National Basketball Association (NBA) has announced it will open its first African training academy in Senegal early next year.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

sciq_pairs

  • Dataset: sciq_pairs at 2c94ad3
  • Size: 128 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 8 tokens, mean: 16.24 tokens, max: 37 tokens
    • sentence2 (string): min: 2 tokens, mean: 71.48 tokens, max: 375 tokens
  • Samples:
    sentence1 sentence2
    Water molds mostly live in water or moist? Define physical change, and give examples of physical change.
    By allowing blood levels of a hormone to be regulated within a narrow range, feedback loops contribute to maintaining what state? Role of Feedback Loops The contribution of feedback loops to homeostasis will only be briefly reviewed here. Positive feedback loops are characterized by the release of additional hormone in response to an original hormone release. The release of oxytocin during childbirth is a positive feedback loop. The initial release of oxytocin begins to signal the uterine muscles to contract, which pushes the fetus toward the cervix, causing it to stretch. This, in turn, signals the pituitary gland to release more oxytocin, causing labor contractions to intensify. The release of oxytocin decreases after the birth of the child. The more common method of hormone regulation is the negative feedback loop. Negative feedback is characterized by the inhibition of further secretion of a hormone in response to adequate levels of that hormone. This allows blood levels of the hormone to be regulated within a narrow range. An example of a negative feedback loop is the release of glucocorticoid hormones from the adrenal glands, as directed by the hypothalamus and pituitary gland. As glucocorticoid concentrations in the blood rise, the hypothalamus and pituitary gland reduce their signaling to the adrenal glands to prevent additional glucocorticoid secretion (Figure 17.6).
    What changes the chemical composition of a substance and can only occur through a chemical reaction? Pure substances, such as compounds, can be separated through chemical changes. Chemical changes change the chemical composition of a substance and can only occur through a chemical reaction.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

qasc_pairs

  • Dataset: qasc_pairs at a34ba20
  • Size: 128 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 6 tokens, mean: 11.16 tokens, max: 22 tokens
    • sentence2 (string): min: 17 tokens, mean: 34.24 tokens, max: 55 tokens
  • Samples:
    sentence1 sentence2
    what code proteins? Chromosomes contain genes, which code for proteins.. Chromosomes are composed of DNA and proteins.. genes code proteins
    Furry animals grow thicker coats which has what impact on their survival? staying warm has a positive impact on an animal 's survival. Furry animals grow thicker coats to keep warm in the winter.. Furry animals grow thicker coats which has a positive impact on their survival.
    Erosion can be caused by heavy rains cause flooding. Flooding is problematic because it causes erosion problems.. heavy rains cause erosion
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

openbookqa_pairs

  • Dataset: openbookqa_pairs
  • Size: 128 evaluation samples
  • Columns: question and fact
  • Approximate statistics based on the first 1000 samples:
    • question (string): min: 3 tokens, mean: 13.98 tokens, max: 47 tokens
    • fact (string): min: 4 tokens, mean: 11.78 tokens, max: 28 tokens
  • Samples:
    question fact
    The thermal production of a stove is generically used for a stove generates heat for cooking usually
    What creates a valley? a valley is formed by a river flowing
    when it turns day and night on a planet, what cause this? a planet rotating causes cycles of day and night on that planet
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

msmarco_pairs

  • Dataset: msmarco_pairs at 28ff31e
  • Size: 128 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 4 tokens, mean: 8.68 tokens, max: 32 tokens
    • sentence2 (string): min: 21 tokens, mean: 72.57 tokens, max: 159 tokens
  • Samples:
    sentence1 sentence2
    what types of functions might you use for looking items up within your data? can you list any examples of where the formulas you chose might be useful? Tip: Use MATCH instead of one of the LOOKUP functions when you need the position of an item in a range instead of the item itself. For example, you might use the MATCH function to provide a value for the row_num argument of the INDEX function.
    ppt vehicle definition A policy purchased by vehicle owners to mitigate costs associated with getting into an auto accident. Instead of paying out of pocket for auto accidents, people pay annual premiums to an auto insurance company; the company then pays all or most of the costs associated with an auto accident or other vehicle damage.
    difference between integrated and dedicated graphics Key Difference: Dedicated and Integrated Graphics Cards are two types of graphics cards. The main difference between two is that the integrated graphics card comes built in to the computer. Whereas, the dedicated graphics card is an external attachment that must be connected to the motherboard.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

nq_pairs

  • Dataset: nq_pairs at f9e894e
  • Size: 128 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 10 tokens, mean: 11.65 tokens, max: 18 tokens
    • sentence2 (string): min: 23 tokens, mean: 133.51 tokens, max: 299 tokens
  • Samples:
    sentence1 | sentence2
    when was everything that rises must converge written | Everything That Rises Must Converge Everything That Rises Must Converge is a collection of short stories written by Flannery O'Connor during the final decade of her life. The collection's eponymous story derives its name from the work of Pierre Teilhard de Chardin.[1][2] The collection was published posthumously in 1965 and contains an introduction by Robert Fitzgerald. Of the volume's nine stories, seven had been printed in magazines or literary journals prior to being collected. "Judgment Day" is a dramatically reworked version of "The Geranium," which was one of O'Connor's earliest publications and appeared in her graduate thesis at the University of Iowa. "Parker's Back," the collection's only completely new story, was a last-minute addition.
    what are the creatures in the woods american horror story | American Horror Story: Asylum Dr. Arden is a former Nazi whose experiments have produced "Raspers", mutated former patients, who lurk in the woods surrounding the institution, and who are fed the flesh of dead patients. Dr. Thredson is assigned to evaluate Kit, who is accused of being the infamous serial killer 'Bloody Face' and believes his wife Alma (Britne Oldford) was abducted by aliens. Thredson also tries to "reform" Lana, who was an ambitious journalist attempting to expose Briarcliff's mistreatments of patients. She was in a relationship with Wendy (Clea Duvall), who was blackmailed by Sister Jude into committing Winters, before being killed by Bloody Face. Thredson helps Lana escape from the asylum, but she learns that Thredson is actually Bloody Face, and is kept prisoner. He rapes her and tries to kill her, but she manages to escape, only to end up back at Briarcliff. She later learns she is pregnant with Thredson's baby.
    what does the pink panther movie have to do with the cartoon | The Pink Panther The first film in the series derives its name from the eponymous pink diamond that has an enormous size and value. The diamond is called the "Pink Panther" because the flaw at its centre, when viewed closely, is said to resemble a leaping pink panther. The phrase reappears in the title of the fourth film The Return of the Pink Panther, in which the theft of the diamond is again the centre of the plot. The phrase was used for all the subsequent films in the series, even when the jewel did not figure in the plot. It ultimately appeared in six of the eleven films.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
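The loss configuration above pairs the student with a frozen guide model (BERT, CLS pooling, normalized) and uses a temperature of 0.05. The sketch below illustrates the GIST idea in pure Python. It is a minimal sketch under stated assumptions and does not reproduce the sentence-transformers source.

- **Candidates:** only anchor-to-positive candidates are scored. We believe the library also scores anchor-anchor and positive-positive pairs.
- **Caching:** the "Cached" part is omitted. As we understand it, this GradCache-style mini-batching saves memory without changing the loss value.
- **Names:** all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def gist_mask(guide_a: List[Vector], guide_p: List[Vector]) -> List[List[bool]]:
    """mask[i][j] is True when the guide rates positive j closer to anchor i
    than anchor i's own positive, i.e. j is treated as a likely false negative."""
    mask = []
    for i, anchor in enumerate(guide_a):
        own = cosine(anchor, guide_p[i])
        mask.append([j != i and cosine(anchor, p) > own for j, p in enumerate(guide_p)])
    return mask


def gist_loss(student_a: List[Vector], student_p: List[Vector],
              guide_a: List[Vector], guide_p: List[Vector],
              temperature: float = 0.05) -> float:
    """Mean in-batch cross-entropy with guide-flagged negatives removed (hypothetical helper)."""
    mask = gist_mask(guide_a, guide_p)
    total = 0.0
    for i, anchor in enumerate(student_a):
        # Student cosine scores scaled by 1/temperature (0.05 -> x20); masked candidates get -inf.
        logits = [
            -math.inf if mask[i][j] else cosine(anchor, p) / temperature
            for j, p in enumerate(student_p)
        ]
        top = max(logits)  # logits[i] is never masked, so this is finite
        log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true positive
    return total / len(student_a)
```

The guide only decides which candidates are dropped. Gradients (in a real framework) would flow through the student scores alone.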
    

trivia_pairs

  • Dataset: trivia_pairs at a7c36e3
  • Size: 128 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 9 tokens, mean 19.75 tokens, max 54 tokens
    • sentence2 (string): min 58 tokens, mean 452.12 tokens, max 512 tokens
  • Samples:
    sentence1 | sentence2
    In which country was Ursula Andrews born? | Ursula Andress - Biography - IMDb Ursula Andress Biography Showing all 59 items Jump to: Overview  (3)
    What chemical element, symbol Cr, is named due its colourful/colorful compound effect? | Etymologies of element names
    What religion is the Dalai Lama? | BBC - Religions - Buddhism: Dalai Lama Dalai Lama Last updated 2006-09-21 The institution of the Dalai Lama is a relatively recent one. There have been only 14 Dalai Lamas in the history of Buddhism. On this page The rôle of the Dalai Lama Potala Palace, the Dalai Lama's residence until 1959 The Dalai Lama is the head monk of Tibetan Buddhism and traditionally has been responsible for the governing of Tibet, until the Chinese government took control in 1959. Before 1959, his official residence was Potala Palace in Lhasa, the capital of Tibet. The Dalai Lama belongs to the Gelugpa tradition of Tibetan Buddhism, which is the largest and most influential tradition in Tibet. The institution of the Dalai Lama is a relatively recent one. There have been only 14 Dalai Lamas in the history of Buddhism, and the first and second Dalai Lamas were given the title posthumously. According to Buddhist belief, the current Dalai Lama is a reincarnation of a past lama who decided to be reborn again to continue his important work, instead of moving on from the wheel of life. A person who decides to be continually reborn is known as tulku. Buddhists believe that the first tulku in this reincarnation was Gedun Drub, who lived from 1391-1474 and the second was Gendun Gyatso. However, the name Dalai Lama, meaning Ocean of Wisdom, was not conferred until the third reincarnation in the form of Sonam Gyatso in 1578. The current Dalai Lama is Tenzin Gyatso. Tenzin Gyatso, 14th Dalai Lama, as a child © Choosing a Dalai Lama After the death of a Dalai Lama it has traditionally been the responsibility of the High Lamas of the Gelugpa Tradition and the Tibetan government to find his reincarnation. The High Lamas search for a boy who was born around the same time as the death of the Dalai Lama. It can take around two or three years to identify the Dalai Lama, and for the current, 14th Dalai Lama, it was four years before he was found. 
There are several ways in which the High Lamas might find out where the next reincarnation will be found. Dream One of the High Lamas may dream about some mark or location that will identify the boy. Smoke If the previous Dalai Lama was cremated, High Lamas will watch the direction of the smoke and search accordingly. Oracle Lake High Lamas go to a holy lake, called Lhamo Lhatso, in central Tibet and watch for a sign from the lake itself. This may be either a vision or some indication of the direction in which to search.The home and village of Tenzin Gyatso was identified in a vision from this lake. Once the High Lamas have located the home and the boy, they present a number of artefacts which they have brought with them in preparation, to the child. Amongst these artefacts are a number of items that belonged to the deceased Dalai Lama. If the boy chooses the items that belonged to the previous Dalai Lama, this is seen as a sign, in conjunction with all of the other indications, that the boy is a reincarnation. This procedure, however, as Tenzin Gyatso has said himself, is not set in stone; if two thirds of the Tibetan people wish to change the method of identifying the next reincarnation, this would be just as valid. The search for the Dalai Lama has usually been limited to Tibet, although the third tulku was born in Mongolia. However, as Tibet has been taken by the Chinese government, Tenzin Gyatso says that if he is reborn it will not be in a country run by the People's Republic of China, or any other country which is not free. In order to see this content you need to have both Javascript enabled and Flash installed. Visit BBC Webwise for full instructions Interestingly, Tenzin Gyatso has also expressed doubts over whether he will be reborn at all, suggesting the function of the Dalai Lama may be over. However, until Tibet is reunited with its spiritual leader, it seems likely that there will continue to be a Dalai Lama. 
Top Tenzin Gyatso, the 14th Dalai Lama Tenzin Gyatso is the fourteenth Dalai Lama of Tibetan Buddhism. He was born in 1935 and recognised as the reincarnation of Thubten Gyatso at a young age. Tenzin
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
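The trivia_pairs sentence2 maximum is exactly 512 tokens. This suggests, though the card does not say so, that lengths are counted after truncation to a 512-token maximum sequence length.

The sketch below shows how such min/mean/max statistics over the first 1000 samples could be computed. Its assumptions are:

- **Tokenizer:** a whitespace split stands in for the real subword tokenizer.
- **Special tokens:** whether the card's counts include tokens such as [CLS]/[SEP] is not stated.
- **Names:** `token_stats` is a hypothetical helper, not a sentence-transformers function.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List


def token_stats(texts: Iterable[str], tokenize: Callable[[str], List[str]],
                max_seq_length: int = 512, limit: int = 1000) -> Dict[str, float]:
    """min/mean/max token counts over the first `limit` texts, each capped at max_seq_length."""
    lengths = []
    for i, text in enumerate(texts):
        if i >= limit:
            break
        # Truncation: anything beyond max_seq_length is never seen by the model.
        lengths.append(min(len(tokenize(text)), max_seq_length))
    return {"min": min(lengths), "mean": round(mean(lengths), 2), "max": max(lengths)}
```

For example, `token_stats(["short passage here", "word " * 900], str.split)` gives a min of 3 and a max of 512, because the 900-word text is capped. With only 128 evaluation samples per dataset here, the "first 1000 samples" window covers every sample.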
    

gooaq_pairs

  • Dataset: gooaq_pairs at b089f72
  • Size: 128 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 8 tokens, mean 11.23 tokens, max 17 tokens
    • sentence2 (string): min 16 tokens, mean 57.14 tokens, max 107 tokens
  • Samples:
    sentence1 | sentence2
    who are the dc characters? | ['Superman.', 'Batman. ... ', 'Flash. ... ', 'Green Lantern. ... ', 'Wonder Woman. ... ', 'Martian Manhunter. ... ', 'Aquaman. ... ', 'John Constantine. ... ']
    what restaurants are giving free food today? | ['Burger King.', 'The Cheesecake Factory.', "Steak 'n Shake.", "Wendy's.", "TGI Friday's.", 'Panera.', "Moe's Southwest Grill."]
    who is paige on pretty little liars? | McCullers is a character in Pretty Little Liars television series on ABC Family. She is portrayed by Lindsey Shaw. Paige is a talented swimmer and a pretty good fighter, as we see in This Is A Dark Ride. She is part of Rosewood High School's swim team.
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
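Each guide model above runs a Transformer, then Pooling with `pooling_mode_cls_token: True`, then Normalize(). The sketch below shows what the last two modules compute on a matrix of token embeddings. It is a minimal sketch with these assumptions:

- **Dimensions:** toy 2-dimensional vectors stand in for the real 768-dimensional ones.
- **Names:** the function names are hypothetical, not sentence-transformers APIs.

```python
import math
from typing import List

Matrix = List[List[float]]


def cls_pool(token_embeddings: Matrix) -> List[float]:
    """CLS pooling: the sentence vector is the first ([CLS]) token's vector."""
    return list(token_embeddings[0])


def l2_normalize(vec: List[float]) -> List[float]:
    """Normalize(): rescale to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def embed(token_embeddings: Matrix) -> List[float]:
    """CLS pooling followed by L2 normalization."""
    return l2_normalize(cls_pool(token_embeddings))


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))
```

Because the guide's outputs have unit length, their dot product equals their cosine similarity. For example, `embed([[3.0, 4.0], [100.0, -7.0]])` ignores the second token and returns roughly `[0.6, 0.8]`.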
    

paws-pos

  • Dataset: paws-pos at 161ece9
  • Size: 128 evaluation samples
  • Columns: sentence1 and sentence2
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 10 tokens, mean 25.72 tokens, max 42 tokens
    • sentence2 (string): min 10 tokens, mean 25.55 tokens, max 41 tokens
  • Samples:
    sentence1 | sentence2
    They were there to enjoy us and they were there to pray for us . | They were there for us to enjoy and they were there for us to pray .
    After the end of the war in June 1902 , Higgins left Southampton in the `` SSBavarian '' in August , returning to Cape Town the following month . | In August , after the end of the war in June 1902 , Higgins Southampton left the `` SSBavarian '' and returned to Cape Town the following month .
    From the merger of the Four Rivers Council and the Audubon Council , the Shawnee Trails Council was born . | Shawnee Trails Council was formed from the merger of the Four Rivers Council and the Audubon Council .
  • Loss: CachedGISTEmbedLoss with these parameters:
    {'guide': SentenceTransformer(
      (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
      (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
      (2): Normalize()
    ), 'temperature': 0.05}
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 160
  • per_device_eval_batch_size: 64
  • gradient_accumulation_steps: 8
  • learning_rate: 4e-05
  • weight_decay: 0.0001
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1.3333333333333335e-05}
  • warmup_ratio: 0.33
  • save_safetensors: False
  • fp16: True
  • push_to_hub: True
  • hub_model_id: bobox/DeBERTa-small-ST-v1-toytest-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • batch_sampler: no_duplicates
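CachedGISTEmbedLoss scores each anchor against the other texts in its batch. A text that occurs twice in one batch would therefore act as a negative for itself, which is what `batch_sampler: no_duplicates` guards against.

As we understand the Trainer's gradient accumulation, the in-batch candidates come from each 160-example micro-batch. They do not come from all 1,280 examples (160 × 8, per device) that feed one optimizer step.

The sketch below is a minimal greedy version of such a sampler. Its assumptions are:

- **Strategy:** shuffle the rows, fill each batch with rows whose texts are not yet in it, and defer clashes to a later batch.
- **Names:** `no_duplicate_batches` is hypothetical, not the sentence-transformers class.

```python
import random
from typing import Dict, Iterator, List


def no_duplicate_batches(rows: List[Dict[str, str]], batch_size: int,
                         seed: int = 42, drop_last: bool = False) -> Iterator[List[int]]:
    """Yield batches of row indices in which no text value occurs twice."""
    remaining = list(range(len(rows)))
    random.Random(seed).shuffle(remaining)
    while remaining:
        batch: List[int] = []
        seen = set()
        deferred: List[int] = []
        for idx in remaining:
            texts = set(rows[idx].values())
            if len(batch) < batch_size and not (texts & seen):
                batch.append(idx)
                seen |= texts
            else:
                deferred.append(idx)  # clashing or overflow rows wait for a later batch
        if drop_last and len(batch) < batch_size:
            break
        yield batch  # never empty: the first remaining row always fits
        remaining = deferred
```

The greedy pass can leave a few small batches at the end when many rows share texts. Setting `drop_last=True` discards an incomplete final batch instead of yielding it.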

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 160
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: None
  • learning_rate: 4e-05
  • weight_decay: 0.0001
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: cosine_with_min_lr
  • lr_scheduler_kwargs: {'num_cycles': 0.5, 'min_lr': 1.3333333333333335e-05}
  • warmup_ratio: 0.33
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: bobox/DeBERTa-small-ST-v1-toytest-checkpoints-tmp
  • hub_strategy: all_checkpoints
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
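With `lr_scheduler_type: cosine_with_min_lr`, `warmup_ratio: 0.33` and `warmup_steps: 0`, the schedule works as follows:

- **Warmup:** the learning rate rises linearly from 0 to 4e-05 over the first 33% of optimizer steps.
- **Decay:** it then follows a half cosine (`num_cycles: 0.5`) down to the `min_lr` floor.
- **Floor:** the `min_lr` value appears to be 4e-05 / 3, which would explain its long float representation.

The sketch below re-derives what we believe transformers' `get_cosine_with_min_lr_schedule_with_warmup` computes. It also assumes the warmup length is `ceil(total_steps × warmup_ratio)`. The function names are hypothetical.

```python
import math


def warmup_steps(total_steps: int, warmup_ratio: float = 0.33) -> int:
    """Warmup length when only warmup_ratio is set (assumed ceil rounding)."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at(step: int, total_steps: int, base_lr: float = 4e-05,
          min_lr: float = 1.3333333333333335e-05, warmup_ratio: float = 0.33,
          num_cycles: float = 0.5) -> float:
    """Learning rate at an optimizer step under linear warmup + cosine decay to min_lr."""
    warm = warmup_steps(total_steps, warmup_ratio)
    if step < warm:
        return base_lr * step / max(1, warm)  # linear warmup from 0
    progress = (step - warm) / max(1, total_steps - warm)
    cosine = 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress))
    min_rate = min_lr / base_lr  # the floor, as a fraction of the peak rate
    return base_lr * (cosine * (1.0 - min_rate) + min_rate)
```

At the end of warmup this returns the peak of 4e-05. At the final step the cosine term reaches 0, so the rate bottoms out at `min_lr` rather than 0.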

Training Logs

Columns, in order (each row below is space-separated; "-" marks steps with no evaluation):
Epoch | Step | Training Loss | vitaminc-pairs loss | trivia pairs loss | xsum-pairs loss | paws-pos loss | sciq pairs loss | msmarco pairs loss | openbookqa pairs loss | gooaq pairs loss | nq pairs loss | scitail-pairs-pos loss | qasc pairs loss | negation-triplets loss | NLI-v2_max_accuracy | VitaminC_max_ap | sts-test_spearman_cosine
0.0169 3 7.2372 - - - - - - - - - - - - - - -
0.0339 6 6.855 - - - - - - - - - - - - - - -
0.0508 9 7.4707 - - - - - - - - - - - - - - -
0.0677 12 7.0187 - - - - - - - - - - - - - - -
0.0847 15 6.6756 - - - - - - - - - - - - - - -
0.1016 18 6.0155 - - - - - - - - - - - - - - -
0.1186 21 6.1644 - - - - - - - - - - - - - - -
0.1355 24 6.2158 - - - - - - - - - - - - - - -
0.1524 27 6.1369 2.6986 6.3356 6.0730 2.2308 0.3450 6.9377 4.4060 6.4060 6.7941 1.9217 3.2268 5.1429 1.0 0.5356 0.1067
0.1694 30 5.7653 - - - - - - - - - - - - - - -
0.1863 33 6.1259 - - - - - - - - - - - - - - -
0.2032 36 5.7539 - - - - - - - - - - - - - - -
0.2202 39 6.0131 - - - - - - - - - - - - - - -
0.2371 42 6.0074 - - - - - - - - - - - - - - -
0.2541 45 5.7125 - - - - - - - - - - - - - - -
0.2710 48 5.5634 - - - - - - - - - - - - - - -
0.2879 51 5.2924 - - - - - - - - - - - - - - -
0.3049 54 5.2286 2.6647 5.6474 5.2498 0.8336 0.2962 5.2464 3.8855 5.2259 5.3326 1.2414 2.5309 4.6218 1.0 0.5225 0.1969
0.3218 57 4.4811 - - - - - - - - - - - - - - -
0.3387 60 4.4239 - - - - - - - - - - - - - - -
0.3557 63 4.0273 - - - - - - - - - - - - - - -
0.3726 66 3.4508 - - - - - - - - - - - - - - -
0.3896 69 3.9702 - - - - - - - - - - - - - - -
0.4065 72 3.5295 - - - - - - - - - - - - - - -
0.4234 75 3.6395 - - - - - - - - - - - - - - -
0.4404 78 3.2398 - - - - - - - - - - - - - - -
0.4573 81 3.116 2.5044 3.1392 2.4290 0.1975 0.1526 2.9677 2.4785 2.8775 3.3587 0.2785 1.2902 3.4229 1.0 0.5306 0.4892
0.4742 84 2.6049 - - - - - - - - - - - - - - -
0.4912 87 2.7738 - - - - - - - - - - - - - - -
0.5081 90 2.5416 - - - - - - - - - - - - - - -
0.5251 93 2.3913 - - - - - - - - - - - - - - -
0.5420 96 2.3144 - - - - - - - - - - - - - - -
0.5589 99 2.1857 - - - - - - - - - - - - - - -
0.5759 102 1.8881 - - - - - - - - - - - - - - -
0.5928 105 2.2699 - - - - - - - - - - - - - - -
0.6097 108 2.1425 2.7217 1.7080 1.2066 0.0800 0.0949 1.6446 1.5739 1.7924 2.3649 0.2329 0.8462 2.3389 1.0 0.5323 0.7806
0.6267 111 2.1276 - - - - - - - - - - - - - - -
0.6436 114 1.7531 - - - - - - - - - - - - - - -
0.6606 117 2.0179 - - - - - - - - - - - - - - -
0.6775 120 1.5305 - - - - - - - - - - - - - - -
0.6944 123 1.6925 - - - - - - - - - - - - - - -
0.7114 126 1.5248 - - - - - - - - - - - - - - -
0.7283 129 1.523 - - - - - - - - - - - - - - -
0.7452 132 1.5474 - - - - - - - - - - - - - - -
0.7622 135 1.7221 2.8521 1.4495 0.7707 0.0601 0.0751 1.1524 1.4015 1.3955 1.7769 0.2150 0.6356 2.0742 1.0 0.5327 0.8315
0.7791 138 1.5366 - - - - - - - - - - - - - - -
0.7960 141 1.3045 - - - - - - - - - - - - - - -
0.8130 144 1.1999 - - - - - - - - - - - - - - -
0.8299 147 1.3483 - - - - - - - - - - - - - - -
0.8469 150 1.2009 - - - - - - - - - - - - - - -
0.8638 153 1.4495 - - - - - - - - - - - - - - -
0.8807 156 1.2329 - - - - - - - - - - - - - - -
0.8977 159 1.1905 - - - - - - - - - - - - - - -
0.9146 162 1.277 2.7764 1.2929 0.5587 0.0525 0.0604 0.8656 1.1903 1.1581 1.1554 0.1988 0.4943 2.0055 1.0 0.5311 0.8548
0.9315 165 1.339 - - - - - - - - - - - - - - -
0.9485 168 1.1535 - - - - - - - - - - - - - - -
0.9654 171 1.1643 - - - - - - - - - - - - - - -
0.9824 174 1.2221 - - - - - - - - - - - - - - -
0.9993 177 1.0974 - - - - - - - - - - - - - - -
1.0162 180 1.0984 - - - - - - - - - - - - - - -
1.0332 183 1.0543 - - - - - - - - - - - - - - -
1.0501 186 1.0994 - - - - - - - - - - - - - - -
1.0670 189 1.0621 2.6755 1.2004 0.3837 0.0421 0.0556 0.6897 1.0837 1.0353 0.9604 0.1854 0.4047 1.9071 1.0 0.5420 0.8680
1.0840 192 0.8724 - - - - - - - - - - - - - - -
1.1009 195 0.9381 - - - - - - - - - - - - - - -
1.1179 198 0.9617 - - - - - - - - - - - - - - -
1.1348 201 1.0139 - - - - - - - - - - - - - - -
1.1517 204 1.1073 - - - - - - - - - - - - - - -
1.1687 207 0.8365 - - - - - - - - - - - - - - -
1.1856 210 1.1012 - - - - - - - - - - - - - - -
1.2025 213 1.0016 - - - - - - - - - - - - - - -
1.2195 216 1.0957 2.5466 1.1412 0.3591 0.0395 0.0517 0.5819 0.9366 0.9686 0.8172 0.1901 0.3075 1.9161 1.0 0.5385 0.8656
1.2364 219 1.1273 - - - - - - - - - - - - - - -
1.2534 222 1.2568 - - - - - - - - - - - - - - -
1.2703 225 0.873 - - - - - - - - - - - - - - -
1.2872 228 1.0003 - - - - - - - - - - - - - - -
1.3042 231 1.142 - - - - - - - - - - - - - - -
1.3211 234 0.807 - - - - - - - - - - - - - - -
1.3380 237 1.0231 - - - - - - - - - - - - - - -
1.3550 240 0.797 - - - - - - - - - - - - - - -
1.3719 243 0.8473 2.5140 1.1067 0.2802 0.0343 0.0467 0.5559 0.8562 0.8929 0.7435 0.1750 0.2355 1.8629 1.0 0.5508 0.8687
1.3888 246 0.9531 - - - - - - - - - - - - - - -
1.4058 249 0.9023 - - - - - - - - - - - - - - -
1.4227 252 0.8922 - - - - - - - - - - - - - - -
1.4397 255 0.9874 - - - - - - - - - - - - - - -
1.4566 258 0.8508 - - - - - - - - - - - - - - -
1.4735 261 0.7149 - - - - - - - - - - - - - - -
1.4905 264 0.894 - - - - - - - - - - - - - - -
1.5074 267 0.867 - - - - - - - - - - - - - - -
1.5243 270 0.7493 2.5574 1.0634 0.2217 0.0319 0.0435 0.5027 0.7999 0.8005 0.6530 0.1693 0.2443 1.8535 1.0 0.5499 0.8716
1.5413 273 0.7974 - - - - - - - - - - - - - - -
1.5582 276 0.797 - - - - - - - - - - - - - - -
1.5752 279 0.6749 - - - - - - - - - - - - - - -
1.5921 282 0.9325 - - - - - - - - - - - - - - -
1.6090 285 0.8418 - - - - - - - - - - - - - - -
1.6260 288 1.0135 - - - - - - - - - - - - - - -
1.6429 291 0.6961 - - - - - - - - - - - - - - -
1.6598 294 0.9361 - - - - - - - - - - - - - - -
1.6768 297 0.6747 2.4871 0.9762 0.2242 0.0291 0.0396 0.5025 0.7668 0.7546 0.6427 0.1596 0.1963 1.7349 1.0 0.5461 0.8787
1.6937 300 0.7786 - - - - - - - - - - - - - - -
1.7107 303 0.7171 - - - - - - - - - - - - - - -
1.7276 306 0.6627 - - - - - - - - - - - - - - -
1.7445 309 0.6711 - - - - - - - - - - - - - - -
1.7615 312 0.9076 - - - - - - - - - - - - - - -
1.7784 315 0.7414 - - - - - - - - - - - - - - -
1.7953 318 0.582 - - - - - - - - - - - - - - -
1.8123 321 0.6068 - - - - - - - - - - - - - - -
1.8292 324 0.6219 2.5197 1.0206 0.1630 0.0273 0.0383 0.4859 0.7109 0.7736 0.5533 0.1535 0.2044 1.7016 1.0 0.5532 0.8807
1.8462 327 0.5862 - - - - - - - - - - - - - - -
1.8631 330 0.678 - - - - - - - - - - - - - - -
1.8800 333 0.6272 - - - - - - - - - - - - - - -
1.8970 336 0.5048 - - - - - - - - - - - - - - -
1.9139 339 0.7653 - - - - - - - - - - - - - - -
1.9308 342 0.6613 - - - - - - - - - - - - - - -
1.9478 345 0.6122 - - - - - - - - - - - - - - -
1.9647 348 0.5939 - - - - - - - - - - - - - - -
1.9817 351 0.6923 2.4379 0.9582 0.1464 0.0264 0.0382 0.4348 0.7554 0.7220 0.5432 0.1481 0.1640 1.7345 1.0 0.5560 0.8837
1.9986 354 0.5712 - - - - - - - - - - - - - - -
2.0155 357 0.5969 - - - - - - - - - - - - - - -
2.0325 360 0.5881 - - - - - - - - - - - - - - -
2.0494 363 0.6005 - - - - - - - - - - - - - - -
2.0663 366 0.6066 - - - - - - - - - - - - - - -
2.0833 369 0.4921 - - - - - - - - - - - - - - -
2.1002 372 0.5354 - - - - - - - - - - - - - - -
2.1171 375 0.5602 - - - - - - - - - - - - - - -
2.1341 378 0.5686 2.3908 0.9614 0.1454 0.0271 0.0374 0.4246 0.7796 0.6965 0.5298 0.1401 0.1604 1.7678 1.0 0.5539 0.8804
2.1510 381 0.6496 - - - - - - - - - - - - - - -
2.1680 384 0.4713 - - - - - - - - - - - - - - -
2.1849 387 0.6345 - - - - - - - - - - - - - - -
2.2018 390 0.5994 - - - - - - - - - - - - - - -
2.2188 393 0.6763 - - - - - - - - - - - - - - -
2.2357 396 0.7254 - - - - - - - - - - - - - - -
2.2526 399 0.8032 - - - - - - - - - - - - - - -
2.2696 402 0.4914 - - - - - - - - - - - - - - -
2.2865 405 0.6307 2.4388 0.9862 0.1308 0.0262 0.0379 0.3928 0.7434 0.6976 0.4998 0.1192 0.1466 1.7093 1.0 0.5533 0.8859
2.3035 408 0.7493 - - - - - - - - - - - - - - -
2.3204 411 0.5139 - - - - - - - - - - - - - - -
2.3373 414 0.6364 - - - - - - - - - - - - - - -
2.3543 417 0.4763 - - - - - - - - - - - - - - -
2.3712 420 0.583 - - - - - - - - - - - - - - -
2.3881 423 0.5912 - - - - - - - - - - - - - - -
2.4051 426 0.5936 - - - - - - - - - - - - - - -
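Each log row holds 18 whitespace-separated fields. Rows logged between evaluations, which in this table run every 27 steps, carry "-" in all evaluation columns.

The sketch below parses rows by position and picks the evaluated row with the best value for a chosen metric. It is a minimal sketch with these assumptions:

- **Column names:** they are copied from the table header.
- **Helpers:** `parse_row` and `best_eval` are hypothetical.
- **Sample rows:** the `LOG` rows are copied verbatim from the table above.

```python
from typing import Dict, List, Optional, Tuple

COLUMNS = [
    "Epoch", "Step", "Training Loss",
    "vitaminc-pairs loss", "trivia pairs loss", "xsum-pairs loss", "paws-pos loss",
    "sciq pairs loss", "msmarco pairs loss", "openbookqa pairs loss", "gooaq pairs loss",
    "nq pairs loss", "scitail-pairs-pos loss", "qasc pairs loss", "negation-triplets loss",
    "NLI-v2_max_accuracy", "VitaminC_max_ap", "sts-test_spearman_cosine",
]


def parse_row(line: str) -> Dict[str, Optional[float]]:
    """Split one log row on whitespace; '-' (no evaluation at this step) becomes None."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return {name: None if v == "-" else float(v) for name, v in zip(COLUMNS, values)}


def best_eval(lines: List[str], metric: str = "sts-test_spearman_cosine") -> Tuple[int, float]:
    """Return (step, value) of the evaluated row with the highest `metric`."""
    rows = [parse_row(line) for line in lines]
    evaluated = [row for row in rows if row[metric] is not None]
    best = max(evaluated, key=lambda row: row[metric])
    return int(best["Step"]), best[metric]


LOG = [
    "1.9817 351 0.6923 2.4379 0.9582 0.1464 0.0264 0.0382 0.4348 0.7554 0.7220 0.5432 0.1481 0.1640 1.7345 1.0 0.5560 0.8837",
    "2.1341 378 0.5686 2.3908 0.9614 0.1454 0.0271 0.0374 0.4246 0.7796 0.6965 0.5298 0.1401 0.1604 1.7678 1.0 0.5539 0.8804",
    "2.2865 405 0.6307 2.4388 0.9862 0.1308 0.0262 0.0379 0.3928 0.7434 0.6976 0.4998 0.1192 0.1466 1.7093 1.0 0.5533 0.8859",
    "2.4051 426 0.5936 - - - - - - - - - - - - - - -",
]
```

On these four rows, `best_eval(LOG)` returns `(405, 0.8859)`. For loss columns, where lower is better, swap `max` for `min`.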

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.1.2
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1
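To check whether a local environment matches the versions listed above, the installed distributions can be compared against them. The sketch below uses only the standard library's `importlib.metadata`. Its assumptions are:

- **Package names:** the PyPI distribution names (for example `torch` for PyTorch) are our assumption.
- **Local tags:** suffixes such as `+cu121` are ignored when comparing.
- **Python version:** Python itself (3.10.13 here) is not checked.
- **Names:** `installed_versions` and `mismatches` are hypothetical helpers.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Versions from the list above; distribution names are assumed.
CARD_VERSIONS = {
    "sentence-transformers": "3.0.1",
    "transformers": "4.42.3",
    "torch": "2.1.2",
    "accelerate": "0.32.1",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Installed version per distribution name, or None if it is not installed."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(expected: Dict[str, str],
               installed: Dict[str, Optional[str]]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Packages whose installed version differs from the expected one (local tags ignored)."""
    diff: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or have.split("+")[0] != want:
            diff[name] = (want, have)
    return diff
```

Running `mismatches(CARD_VERSIONS, installed_versions(CARD_VERSIONS))` lists every package that is missing or on a different release.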

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}