metadata
base_model: intfloat/multilingual-e5-small
datasets: []
language: []
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:16
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      Please contact each construction office of the Construction Bureau. The
      police will respond in cooperation with the police and other authorities.
      For the telephone number of each construction office, please refer to the
      link "Area Management Offices and Construction Offices".
    sentences:
      - I have an abandoned vehicle on the street, what should I do?
      - Do I need special permission to place a giant ad?
      - What is katagatashi/tai?
  - source_sentence: >-
      Currently, there are four Seseragi-no-Sato (Seseragi-no-Sato) oases in the
      city filled with flowers and greenery, as an oasis of relaxation and
      luster. How about touring the "Seseragi-no-Sato" while admiring the
      seasonal flowers? Please take a stroll. For more information, please refer
      to the link "Flower Sewage Treatment Plant and Seseragi no Sato (Sewage
      Treatment Plant and Seseragi no Sato)". For more information, please refer
      to the link "Sewage Treatment Plants and Seseragi no Sato (Flower Sewage
      Treatment Plants and Seseragi no Sato)".
    sentences:
      - I want to install a sign on the road. Do I need any permits?
      - Who can I talk to about housing?
      - I would like to know more about Seseragi no Sato.
  - source_sentence: >-
      The Osaka Municipal Housing Information Center provides comprehensive
      information on housing, and consists of the Housing Information Plaza,
      which provides various consultations and information on housing, and the
      Osaka Kurashi-no Konjikan, a museum of housing that exhibits the culture
      and history of housing and people's lives. Location and Access] Location:
      6-4-20 Tenjinbashi, Kita-ku, Osaka Access: - Direct connection from Exit 3
      of Tenjinbashisuji Rokuchome Station on the Osaka Metro Tanimachi Line,
      Sakaisuji Line, and Hankyu Railway - Approximately 650 m north of Tenma
      Station on the JR Loop Line - Approximately 2 km by cab from Midosuji
      South Exit of JR Osaka Station via Miyakojima-dori, 7 minutes by car By
      car: Approx. 500 m from Nagara Exit on the Moriguchi Line of Hanshin
      Expressway via Miyakojima-dori Street. ◆Housing Information Plaza Hours:
      Weekdays and Saturdays: 9:00-19:00, Sundays and National Holidays:
      10:00-17:00, Closed: Tuesdays (closed the following day if Tuesday is a
      national vacation), the day after national holidays (except Sundays and
      Mondays), year-end and New Year holidays (12/29 - 1/3) *Special holidays
      may occur in addition to the above. ◆Housing Museum "Osaka Kurashi no
      Konjakukan" Hours: 10:00 - 17:00 (admission until 16:30) Closed: Tuesdays,
      Year-end and New Year holidays (12/29 - 1/3) *The museum may be open or
      closed on a temporary basis in addition to the above. In addition to the
      above, the museum may be open or closed temporarily. 6208-9224 Fax:
      06-6202-7064
    sentences:
      - Please tell me about the Osaka Municipal Housing Information Center.
      - Where is advertising prohibited?
      - How much is the admission fee to Osaka Kurashi-no-Museum?
  - source_sentence: >-
      A pamphlet and leaflet, "Sewerage in Osaka City," which introduces the
      sewerage system of Osaka City, including its structure and roles, are
      distributed at City Hall and other locations. They are also available on
      the city website. You can also tour the following sewerage facilities. All
      tours are free of charge. Taikoh Sewer: You can visit the Taikoh Sewer, a
      designated cultural asset of Osaka City. Those who wish to tour the
      underground facilities must apply in advance. Maishima Sludge Center
      Sludge Treatment Facility】Persons wishing to tour the sludge treatment
      facility are required to apply in advance. Sewage Treatment Plants】Persons
      wishing to tour the facilities should contact the respective sewage
      treatment plant in advance. (Tours may not be available due to
      construction work at sewage treatment plants.) For details, please refer
      to the following links: "Leaflet "Sewerage in Osaka City" (digest
      version)," "Pamphlet "Sewerage in Osaka City," "Taikoh Sewage Treatment
      Plant," "Maishima Sludge Center (sewage sludge treatment plant)," "Osaka
      City Visual Sewage Plan," "Osaka Eco Kids: Learn Sewage! for more
      information.
    sentences:
      - I want to know how the sewage system works.
      - >-
        How much is the rent for Osaka City's excellent rental housing for the
        elderly?
      - Please tell me about K.K. General Rental Housing and K.K. Sumai Ringu.
  - source_sentence: >-
      For posters, billboards, etc. on roads that are in violation of the
      ordinance, we systematically provide corrective guidance and remove them,
      as well as conduct road patrols as needed. In addition, for minor
      violations such as posters and billboards, we also remove them through the
      activities of contractors and citizen volunteers called "Katayaki-Tai".
      For details, please refer to the link "Recruitment of "KATADACHI-TAI" (a
      system for removing simple advertisements on the street)". For more
      information, please refer to the link "Simple Roadside Advertisement
      Removal Activities" for details.
    sentences:
      - >-
        I am thinking of buying a house in Osaka City, is there any assistance
        available? (Newlyweds and families raising children)
      - What is the maximum size of a billboard or advertisement?
      - What measures are in place to deal with objectionable signs and posters?
model-index:
  - name: SentenceTransformer based on intfloat/multilingual-e5-small
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: intfloat/multilingual e5 small
          type: intfloat/multilingual-e5-small
        metrics:
          - type: cosine_accuracy@1
            value: 0.75
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 1
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 1
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 1
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.75
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3333333333333333
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.2
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.1
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.75
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 1
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 1
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 1
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9077324383928644
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.875
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.875
            name: Cosine Map@100

SentenceTransformer based on intfloat/multilingual-e5-small

This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-small. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
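
The last two modules in the architecture, mean pooling followed by normalization, can be illustrated with a small dependency-free sketch. This is not the library's implementation: the function names `mean_pool` and `l2_normalize` and the toy 4-dimensional token vectors are assumptions made for illustration (the real model produces 384-dimensional vectors).

```python
import math
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, so padding is ignored."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale the vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


# Toy input: three tokens, the last one is padding (mask 0).
tokens = [[1.0, 2.0, 0.0, 0.0], [3.0, 0.0, 0.0, 4.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity between two such embeddings reduces to their dot product.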

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Yohhei/batch32-100")
# Run inference
sentences = [
    'For posters, billboards, etc. on roads that are in violation of the ordinance, we systematically provide corrective guidance and remove them, as well as conduct road patrols as needed. In addition, for minor violations such as posters and billboards, we also remove them through the activities of contractors and citizen volunteers called "Katayaki-Tai". For details, please refer to the link "Recruitment of "KATADACHI-TAI" (a system for removing simple advertisements on the street)". For more information, please refer to the link "Simple Roadside Advertisement Removal Activities" for details.',
    'What measures are in place to deal with objectionable signs and posters?',
    'I am thinking of buying a house in Osaka City, is there any assistance available? (Newlyweds and families raising children)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
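
To make the `[3, 3]` similarity matrix concrete, here is a minimal pure-Python sketch of pairwise cosine similarity, the similarity function this card lists. The helper names `cosine_similarity` and `similarity_matrix` and the 2-dimensional toy vectors are hypothetical stand-ins for the 384-dimensional model output.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry [i][j] is the cosine similarity between vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 3) for value in row])
```

Each row can be read as a ranking: for row 0, the entry with the highest value outside the diagonal identifies the sentence most similar to sentence 0.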

Evaluation

Metrics

Information Retrieval

| Metric              | Value  |
|---------------------|--------|
| cosine_accuracy@1   | 0.75   |
| cosine_accuracy@3   | 1.0    |
| cosine_accuracy@5   | 1.0    |
| cosine_accuracy@10  | 1.0    |
| cosine_precision@1  | 0.75   |
| cosine_precision@3  | 0.3333 |
| cosine_precision@5  | 0.2    |
| cosine_precision@10 | 0.1    |
| cosine_recall@1     | 0.75   |
| cosine_recall@3     | 1.0    |
| cosine_recall@5     | 1.0    |
| cosine_recall@10    | 1.0    |
| cosine_ndcg@10      | 0.9077 |
| cosine_mrr@10       | 0.875  |
| cosine_map@100      | 0.875  |
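
The card does not state the size of the evaluation set. The reported values are, however, consistent with four queries that each have exactly one relevant passage, three of them ranked first and one ranked second. The sketch below works through that arithmetic; the rank list `assumed_ranks` and the helper `retrieval_metrics` are assumptions for illustration, not the evaluator's code.

```python
import math
from typing import Dict, List


def retrieval_metrics(ranks: List[int], k: int) -> Dict[str, float]:
    """Metrics at cutoff k, where ranks[i] is the 1-based position of the
    single relevant passage for query i."""
    n = len(ranks)
    accuracy = sum(1 for r in ranks if r <= k) / n
    return {
        "accuracy": accuracy,
        # With one relevant passage per query, at most 1 of the k results is relevant.
        "precision": accuracy / k,
        # With one relevant passage per query, recall equals accuracy.
        "recall": accuracy,
        "mrr": sum(1.0 / r for r in ranks if r <= k) / n,
        # The ideal DCG is 1 when there is a single relevant passage.
        "ndcg": sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / n,
    }


assumed_ranks = [1, 1, 1, 2]  # hypothetical: inferred from the table, not from the card
at_1 = retrieval_metrics(assumed_ranks, 1)
at_3 = retrieval_metrics(assumed_ranks, 3)
at_10 = retrieval_metrics(assumed_ranks, 10)
print(at_1["accuracy"], at_3["precision"], at_10["mrr"], at_10["ndcg"])
```

With so few queries, each metric moves in steps of 0.25, so these numbers are a coarse indication rather than a stable estimate.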

Training Details

Training Dataset

Unnamed Dataset

  • Size: 16 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    |         | positive | anchor |
    |---------|----------|--------|
    | type    | string   | string |
    | details | min: 51 tokens, mean: 226.69 tokens, max: 419 tokens | min: 8 tokens, mean: 16.81 tokens, max: 31 tokens |
  • Samples:
    | positive | anchor |
    |----------|--------|
    | The following is a summary of K.H.I. General Rental Housing and K.H.I. Sumai Ringu. Outline】 ● KLH General Rental Housing and KLH Sumai Ringu are rental housing units for middle class households where the annual income of the applicant household must be within a certain range. The public corporation's Sumai Ringu housing is a unit to which the government's specified excellent rental housing system, etc., is applied, and the government and Osaka City subsidize a portion of the rent for a certain period of time, depending on the income of the household moving in. Applications for vacant units are accepted on an as-needed basis. Please refer to the "List of Apartment Complexes" at the link below. Inquiries: ◆Osaka City Housing Corporation, Housing Management Department, Management Division, Recruitment Section Telephone: 06-6882-9000* Weekdays: 9:00 - 19:00 (Tuesdays and the day after national holidays (weekdays): 9:00 - 17:30) Saturdays: 9:00 - 19:00 Sundays and holidays: 10:00 - 17:00, except during the year-end and New Year holidays (December 29 - January 3). | Please tell me about K.K. General Rental Housing and K.K. Sumai Ringu. |
    | There are permit criteria for each property for wall boards, towers (rooftop/ground), and boards (rooftop/ground), and permit criteria vary by location. Please refer to the "Permit Criteria" in the "Outdoor Advertisement Bookmark. (Downloadable from the website) △Link to "https://www.city.osaka.lg.jp/kensetsu/page/0000372127.html屋外広告物の許可について (Outdoor Advertisement Bookmark, Outdoor Advertisement Ordinance, etc.) [Inquiries] ◆Construction Bureau, Administration Division Phone: 06-6615-6687 Fax: 06-6615-6576 | What is the maximum size of a billboard or advertisement? |
    | Areas or properties where advertising materials may not be displayed are as follows In addition to the above, the following areas or properties are prohibited from displaying advertising materials: - Areas along the Hanshin Expressway up to 50 m on both sides and 15 m above the road surface level - Areas within the grounds of ancient tombs and cemeteries - Bridges, roadside trees, traffic signals, pedestrian railings, utility poles, mailboxes, transmission towers, statues, monuments, etc. - The Okawa Wind Area from Genpachi Bridge to Tenmabashi Bridge. In addition to the above, the display of posters, billboards, etc., advertising flags, and standing signs, etc., is prohibited on the following roads and in areas or locations facing these roads. Midosuji (from Osaka Station to Namba Station) ●Sakaisuji (from Naniwabashi to Nipponbashi) ●Tosabori Dori (from Higobashi to Yoshiyabashi) ●Uemachi-suji (from Otemae 1-chome, Chuo-ku to Hoenzaka 1-chome, Chuo-ku) ●Nagahori Dori (from Minami-Senba 1-chome, Chuo-ku to Minami Senba 1-chome, Chuo-ku) Dotonbori River promenade (from east side of Sumiyoshi Bridge to west side of Nihonbashi Bridge)-Please refer to the link. https://www.city.osaka.lg.jp/kensetsu/page/ 0000372127.htmlAbout Permission for Outdoor Advertisements (Outdoor Advertisement Booklet, Outdoor Advertisement Ordinance, etc.) [Inquiries: ◆Construction Bureau, Administration Division Tel: 06-6615-6687 Fax: 06-6615-6576 | Where is advertising prohibited? |
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
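
    As a rough illustration of what these two parameters control, the sketch below computes an in-batch ranking loss in plain Python: each anchor is scored against every positive in the batch using cosine similarity multiplied by the scale of 20, and cross-entropy is taken with the anchor's own positive as the target. The function names and toy vectors are assumptions for illustration, not the library's implementation.

```python
import math
from typing import List, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(
    anchors: List[List[float]], positives: List[List[float]], scale: float = 20.0
) -> float:
    """Mean cross-entropy where positives[i] is the target for anchors[i] and
    every other positive in the batch acts as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its own anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the other anchor

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

    The loss is near zero when each anchor is most similar to its own positive and grows when another positive in the batch scores higher. A larger scale sharpens that contrast.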
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
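
Combined with the 16 training samples reported above, these settings imply a very short run. The sketch below works through the step arithmetic, assuming a single device, the listed `gradient_accumulation_steps: 1` and `dataloader_drop_last: False`, warmup steps rounded up from `warmup_ratio`, and a linear schedule with warmup. Those rounding and scheduling rules are assumptions about the trainer rather than statements from this card, and `lr_factor` is a hypothetical helper.

```python
import math

# Values taken from this card.
train_samples = 16
per_device_train_batch_size = 32
num_train_epochs = 1
warmup_ratio = 0.1
learning_rate = 2e-05

# Assumption: one device, no gradient accumulation, the last partial batch is kept.
steps_per_epoch = math.ceil(train_samples / per_device_train_batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Assumption: warmup steps are the ratio times total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_factor(step: int) -> float:
    """Multiplier applied to learning_rate under linear warmup, then linear decay."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print("optimizer steps:", total_steps)
print("warmup steps:", warmup_steps)
print("learning rate at step 0:", learning_rate * lr_factor(0))
```

Under these assumptions the whole run is a single optimizer step that falls inside the warmup window, which is worth keeping in mind when comparing this checkpoint with the base model.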

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

| Epoch | Step | intfloat/multilingual-e5-small_cosine_map@100 |
|-------|------|-----------------------------------------------|
| 0     | 0    | 0.875                                         |

Framework Versions

  • Python: 3.8.10
  • Sentence Transformers: 3.0.1
  • Transformers: 4.44.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.32.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}