metadata
base_model: intfloat/multilingual-e5-small
datasets: []
language: []
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:16
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
Please contact each construction office of the Construction Bureau. The
police will respond in cooperation with the police and other authorities.
For the telephone number of each construction office, please refer to the
link "Area Management Offices and Construction Offices".
sentences:
- I have an abandoned vehicle on the street, what should I do?
- Do I need special permission to place a giant ad?
- What is katagatashi/tai?
- source_sentence: >-
Currently, there are four Seseragi-no-Sato (Seseragi-no-Sato) oases in the
city filled with flowers and greenery, as an oasis of relaxation and
luster. How about touring the "Seseragi-no-Sato" while admiring the
seasonal flowers? Please take a stroll. For more information, please refer
to the link "Flower Sewage Treatment Plant and Seseragi no Sato (Sewage
Treatment Plant and Seseragi no Sato)". For more information, please refer
to the link "Sewage Treatment Plants and Seseragi no Sato (Flower Sewage
Treatment Plants and Seseragi no Sato)".
sentences:
- I want to install a sign on the road. Do I need any permits?
- Who can I talk to about housing?
- I would like to know more about Seseragi no Sato.
- source_sentence: >-
The Osaka Municipal Housing Information Center provides comprehensive
information on housing, and consists of the Housing Information Plaza,
which provides various consultations and information on housing, and the
Osaka Kurashi-no Konjikan, a museum of housing that exhibits the culture
and history of housing and people's lives. Location and Access] Location:
6-4-20 Tenjinbashi, Kita-ku, Osaka Access: - Direct connection from Exit 3
of Tenjinbashisuji Rokuchome Station on the Osaka Metro Tanimachi Line,
Sakaisuji Line, and Hankyu Railway - Approximately 650 m north of Tenma
Station on the JR Loop Line - Approximately 2 km by cab from Midosuji
South Exit of JR Osaka Station via Miyakojima-dori, 7 minutes by car By
car: Approx. 500 m from Nagara Exit on the Moriguchi Line of Hanshin
Expressway via Miyakojima-dori Street. ◆Housing Information Plaza Hours:
Weekdays and Saturdays: 9:00-19:00, Sundays and National Holidays:
10:00-17:00, Closed: Tuesdays (closed the following day if Tuesday is a
national vacation), the day after national holidays (except Sundays and
Mondays), year-end and New Year holidays (12/29 - 1/3) *Special holidays
may occur in addition to the above. ◆Housing Museum "Osaka Kurashi no
Konjakukan" Hours: 10:00 - 17:00 (admission until 16:30) Closed: Tuesdays,
Year-end and New Year holidays (12/29 - 1/3) *The museum may be open or
closed on a temporary basis in addition to the above. In addition to the
above, the museum may be open or closed temporarily. 6208-9224 Fax:
06-6202-7064
sentences:
- Please tell me about the Osaka Municipal Housing Information Center.
- Where is advertising prohibited?
- How much is the admission fee to Osaka Kurashi-no-Museum?
- source_sentence: >-
A pamphlet and leaflet, "Sewerage in Osaka City," which introduces the
sewerage system of Osaka City, including its structure and roles, are
distributed at City Hall and other locations. They are also available on
the city website. You can also tour the following sewerage facilities. All
tours are free of charge. Taikoh Sewer: You can visit the Taikoh Sewer, a
designated cultural asset of Osaka City. Those who wish to tour the
underground facilities must apply in advance. Maishima Sludge Center
Sludge Treatment Facility】Persons wishing to tour the sludge treatment
facility are required to apply in advance. Sewage Treatment Plants】Persons
wishing to tour the facilities should contact the respective sewage
treatment plant in advance. (Tours may not be available due to
construction work at sewage treatment plants.) For details, please refer
to the following links: "Leaflet "Sewerage in Osaka City" (digest
version)," "Pamphlet "Sewerage in Osaka City," "Taikoh Sewage Treatment
Plant," "Maishima Sludge Center (sewage sludge treatment plant)," "Osaka
City Visual Sewage Plan," "Osaka Eco Kids: Learn Sewage! for more
information.
sentences:
- I want to know how the sewage system works.
- >-
How much is the rent for Osaka City's excellent rental housing for the
elderly?
- Please tell me about K.K. General Rental Housing and K.K. Sumai Ringu.
- source_sentence: >-
For posters, billboards, etc. on roads that are in violation of the
ordinance, we systematically provide corrective guidance and remove them,
as well as conduct road patrols as needed. In addition, for minor
violations such as posters and billboards, we also remove them through the
activities of contractors and citizen volunteers called "Katayaki-Tai".
For details, please refer to the link "Recruitment of "KATADACHI-TAI" (a
system for removing simple advertisements on the street)". For more
information, please refer to the link "Simple Roadside Advertisement
Removal Activities" for details.
sentences:
- >-
I am thinking of buying a house in Osaka City, is there any assistance
available? (Newlyweds and families raising children)
- What is the maximum size of a billboard or advertisement?
- What measures are in place to deal with objectionable signs and posters?
model-index:
- name: SentenceTransformer based on intfloat/multilingual-e5-small
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: intfloat/multilingual e5 small
type: intfloat/multilingual-e5-small
metrics:
- type: cosine_accuracy@1
value: 0.75
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 1
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.75
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.3333333333333333
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.2
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.1
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.75
name: Cosine Recall@1
- type: cosine_recall@3
value: 1
name: Cosine Recall@3
- type: cosine_recall@5
value: 1
name: Cosine Recall@5
- type: cosine_recall@10
value: 1
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.9077324383928644
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.875
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.875
name: Cosine Map@100
SentenceTransformer based on intfloat/multilingual-e5-small
This is a sentence-transformers model finetuned from intfloat/multilingual-e5-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: intfloat/multilingual-e5-small
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
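The last two modules listed above (mean pooling over tokens, then normalization) can be illustrated with a small dependency-free sketch. It is not the library's implementation: the function names `mean_pool` and `l2_normalize` and the 4-dimensional toy vectors are made up for illustration, whereas the real model works on 384-dimensional token embeddings.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding positions are skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length, which is what a normalization step does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy input: 3 token positions with 4 dimensions each; the last position is padding.
token_vectors = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 4.0],
    [9.0, 9.0, 9.0, 9.0],  # ignored because its mask entry is 0
]
attention_mask = [1, 1, 0]

pooled = mean_pool(token_vectors, attention_mask)
sentence_vector = l2_normalize(pooled)
print(pooled)
print(sentence_vector)
```

Because the output has unit length, the cosine similarity between two such vectors reduces to their dot product.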
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Yohhei/batch32-100")
# Run inference
sentences = [
'For posters, billboards, etc. on roads that are in violation of the ordinance, we systematically provide corrective guidance and remove them, as well as conduct road patrols as needed. In addition, for minor violations such as posters and billboards, we also remove them through the activities of contractors and citizen volunteers called "Katayaki-Tai". For details, please refer to the link "Recruitment of "KATADACHI-TAI" (a system for removing simple advertisements on the street)". For more information, please refer to the link "Simple Roadside Advertisement Removal Activities" for details.',
'What measures are in place to deal with objectionable signs and posters?',
'I am thinking of buying a house in Osaka City, is there any assistance available? (Newlyweds and families raising children)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Evaluation
Metrics
Information Retrieval
- Dataset: intfloat/multilingual-e5-small
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.75 |
cosine_accuracy@3 | 1.0 |
cosine_accuracy@5 | 1.0 |
cosine_accuracy@10 | 1.0 |
cosine_precision@1 | 0.75 |
cosine_precision@3 | 0.3333 |
cosine_precision@5 | 0.2 |
cosine_precision@10 | 0.1 |
cosine_recall@1 | 0.75 |
cosine_recall@3 | 1.0 |
cosine_recall@5 | 1.0 |
cosine_recall@10 | 1.0 |
cosine_ndcg@10 | 0.9077 |
cosine_mrr@10 | 0.875 |
cosine_map@100 | 0.875 |
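As a reading aid for the table, the sketch below recomputes accuracy@1, MRR@10 and NDCG@10 from a list of ranks. The rank pattern is an assumption, not something the card states: four queries with a single relevant passage each, three retrieved at rank 1 and one at rank 2. That pattern happens to be consistent with the reported values. The helper name `ir_metrics` is hypothetical.

```python
import math

def ir_metrics(first_relevant_ranks, k=10):
    """Accuracy@1, MRR@k and NDCG@k when every query has exactly one relevant passage.

    `first_relevant_ranks` holds the 1-based rank of that passage for each query.
    """
    n = len(first_relevant_ranks)
    accuracy_at_1 = sum(rank == 1 for rank in first_relevant_ranks) / n
    mrr = sum(1.0 / rank for rank in first_relevant_ranks if rank <= k) / n
    # With a single relevant passage the ideal DCG is 1, so NDCG equals DCG.
    ndcg = sum(1.0 / math.log2(rank + 1) for rank in first_relevant_ranks if rank <= k) / n
    return accuracy_at_1, mrr, ndcg

# Assumed rank pattern (hypothetical): three hits at rank 1, one hit at rank 2.
ranks = [1, 1, 1, 2]
accuracy_at_1, mrr_at_10, ndcg_at_10 = ir_metrics(ranks)
print(accuracy_at_1, mrr_at_10, ndcg_at_10)
```

With so few queries, a single change in rank moves every metric substantially, so the numbers are best read as a smoke test rather than a benchmark.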
Training Details
Training Dataset
Unnamed Dataset
- Size: 16 training samples
- Columns: positive and anchor
- Approximate statistics based on the first 1000 samples:
Field | positive | anchor |
---|---|---|
type | string | string |
details | min: 51 tokens, mean: 226.69 tokens, max: 419 tokens | min: 8 tokens, mean: 16.81 tokens, max: 31 tokens |
- Samples:
positive | anchor |
---|---|
The following is a summary of K.H.I. General Rental Housing and K.H.I. Sumai Ringu. Outline】 ● KLH General Rental Housing and KLH Sumai Ringu are rental housing units for middle class households where the annual income of the applicant household must be within a certain range. The public corporation's Sumai Ringu housing is a unit to which the government's specified excellent rental housing system, etc., is applied, and the government and Osaka City subsidize a portion of the rent for a certain period of time, depending on the income of the household moving in. Applications for vacant units are accepted on an as-needed basis. Please refer to the "List of Apartment Complexes" at the link below. Inquiries: ◆Osaka City Housing Corporation, Housing Management Department, Management Division, Recruitment Section Telephone: 06-6882-9000* Weekdays: 9:00 - 19:00 (Tuesdays and the day after national holidays (weekdays): 9:00 - 17:30) Saturdays: 9:00 - 19:00 Sundays and holidays: 10:00 - 17:00, except during the year-end and New Year holidays (December 29 - January 3). | Please tell me about K.K. General Rental Housing and K.K. Sumai Ringu. |
There are permit criteria for each property for wall boards, towers (rooftop/ground), and boards (rooftop/ground), and permit criteria vary by location. Please refer to the "Permit Criteria" in the "Outdoor Advertisement Bookmark. (Downloadable from the website) △Link to "https://www.city.osaka.lg.jp/kensetsu/page/0000372127.html屋外広告物の許可について (Outdoor Advertisement Bookmark, Outdoor Advertisement Ordinance, etc.) [Inquiries] ◆Construction Bureau, Administration Division Phone: 06-6615-6687 Fax: 06-6615-6576 | What is the maximum size of a billboard or advertisement? |
Areas or properties where advertising materials may not be displayed are as follows In addition to the above, the following areas or properties are prohibited from displaying advertising materials: - Areas along the Hanshin Expressway up to 50 m on both sides and 15 m above the road surface level - Areas within the grounds of ancient tombs and cemeteries - Bridges, roadside trees, traffic signals, pedestrian railings, utility poles, mailboxes, transmission towers, statues, monuments, etc. - The Okawa Wind Area from Genpachi Bridge to Tenmabashi Bridge. In addition to the above, the display of posters, billboards, etc., advertising flags, and standing signs, etc., is prohibited on the following roads and in areas or locations facing these roads. Midosuji (from Osaka Station to Namba Station) ●Sakaisuji (from Naniwabashi to Nipponbashi) ●Tosabori Dori (from Higobashi to Yoshiyabashi) ●Uemachi-suji (from Otemae 1-chome, Chuo-ku to Hoenzaka 1-chome, Chuo-ku) ●Nagahori Dori (from Minami-Senba 1-chome, Chuo-ku to Minami Senba 1-chome, Chuo-ku) Dotonbori River promenade (from east side of Sumiyoshi Bridge to west side of Nihonbashi Bridge)-Please refer to the link. https://www.city.osaka.lg.jp/kensetsu/page/ 0000372127.htmlAbout Permission for Outdoor Advertisements (Outdoor Advertisement Booklet, Outdoor Advertisement Ordinance, etc.) [Inquiries: ◆Construction Bureau, Administration Division Tel: 06-6615-6687 Fax: 06-6615-6576 | Where is advertising prohibited? |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
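To make the two loss parameters concrete, here is a minimal sketch of the objective in plain Python: each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by the scale, and cross-entropy is taken with the matching positive as the target. The function names and the 2-dimensional toy vectors are illustrative assumptions; the library itself operates on batched tensors.

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]
    and every other positive in the batch serves as a negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, positive) for positive in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch: two anchors and their positives (hypothetical 2-dimensional embeddings).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_positives = [[0.9, 0.1], [0.1, 0.9]]
swapped_positives = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned_positives)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped_positives)
print(loss_aligned, loss_swapped)
```

The loss is small when each anchor is closest to its own positive and large when the pairs are mismatched. A larger scale sharpens the softmax over the in-batch candidates.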
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- learning_rate: 2e-05
- num_train_epochs: 1
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
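A quick back-of-the-envelope check shows what these settings imply for a 16-sample dataset. The sketch assumes a single device, 16 distinct training pairs (so the no_duplicates sampler keeps all of them in one batch), no dropped last batch, and that warmup steps are obtained by rounding up; the variable names are illustrative.

```python
import math

num_samples = 16            # training set size reported in this card
train_batch_size = 32       # per_device_train_batch_size
num_train_epochs = 1
warmup_ratio = 0.1

# The final, partially filled batch is kept (dataloader_drop_last is False).
steps_per_epoch = math.ceil(num_samples / train_batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Assumption: warmup steps are rounded up from total_steps * warmup_ratio.
warmup_steps = math.ceil(total_steps * warmup_ratio)

# With an in-batch-negatives loss, each anchor sees the other positives as negatives.
in_batch_negatives = min(num_samples, train_batch_size) - 1

print(steps_per_epoch, total_steps, warmup_steps, in_batch_negatives)
```

Under these assumptions the whole run is a single optimizer step, which is worth keeping in mind when interpreting the evaluation numbers.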
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | intfloat/multilingual-e5-small_cosine_map@100 |
---|---|---|
0 | 0 | 0.875 |
Framework Versions
- Python: 3.8.10
- Sentence Transformers: 3.0.1
- Transformers: 4.44.2
- PyTorch: 2.1.2+cu121
- Accelerate: 0.32.0
- Datasets: 2.19.1
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}