--- base_model: BAAI/bge-base-en-v1.5 datasets: - sentence-transformers/hotpotqa language: - en library_name: sentence-transformers license: apache-2.0 metrics: - cosine_accuracy - dot_accuracy - manhattan_accuracy - euclidean_accuracy - max_accuracy pipeline_tag: sentence-similarity tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:76064 - loss:MatryoshkaLoss - loss:TripletLoss widget: - source_sentence: Which survey in 2010 recommended the The Wide Field Infrared Survey Telescope as the top priority for astronomy? sentences: - High Energy Astronomy Observatory 1 HEAO-1 was an X-ray telescope launched in 1977. HEAO-1 surveyed the sky in the X-ray portion of the electromagnetic spectrum (0.2 keV - 10 MeV), providing nearly constant monitoring of X-ray sources near the ecliptic poles and more detailed studies of a number of objects by observations lasting 3-6 hours. It was the first of NASA's three High Energy Astronomy Observatories, HEAO 1, launched August 12, 1977 aboard an Atlas rocket with a Centaur upper stage, operated until 9 January 1979. During that time, it scanned the X-ray sky almost three times - Wide Field Infrared Survey Telescope The Wide Field Infrared Survey Telescope (WFIRST) is a future infrared space observatory that was recommended in 2010 by United States National Research Council Decadal Survey committee as the top priority for the next decade of astronomy. On February 17, 2016, WFIRST was formally designated as a mission by NASA. - The Bluebells The Bluebells were a Scottish indie rock band, active between 1981 and 1986 (later briefly reforming in 1993, 2008–2009 and 2011). - source_sentence: Near what river is the library that contains the Aberdeen Bestiary located? 
sentences: - Joseph Roth Joseph Roth, born Moses Joseph Roth (2 September 1894 – 27 May 1939), was an Austrian-Jewish journalist and novelist, best known for his family saga "Radetzky March" (1932), about the decline and fall of the Austro-Hungarian Empire, his novel of Jewish life, "Job" (1930), and his seminal essay "Juden auf Wanderschaft" (1927; translated into English in "The Wandering Jews"), a fragmented account of the Jewish migrations from eastern to western Europe in the aftermath of World War I and the Russian Revolution. In the 21st century, publications in English of "Radetzky March" and of collections of his journalism from Berlin and Paris created a revival of interest in Roth. - Aberdeen Bestiary The Aberdeen Bestiary (Aberdeen University Library, Univ Lib. MS 24) is a 12th-century English illuminated manuscript bestiary that was first listed in 1542 in the inventory of the Old Royal Library at the Palace of Westminster. - House of Monymusk The House of Monymusk is located on the outskirts of the Scottish village of Monymusk, in the Marr region of Aberdeenshire. The house is located near the "river Don", which is known for its spectacular trout-fishing. The village, which history dates back to 1170, was bought by the Forbses in the 1560s, who later built the House of Monymusk. The Forbses claim they built the present House of Monymusk from the blackened stones of the old Priory. - source_sentence: The Stage" is a song by Avenged Sevenfold and the first single from their seventh studio album of the same name, which was released on which date? sentences: - Allegra Stratton Allegra Stratton (born 25 November 1980) is a British journalist and writer. Since January 2016, she has been the National Editor of ITV News after four years as political editor on BBC Two's "Newsnight". She has also co-presented "Peston on Sunday" with Robert Peston since May 2016. 
- Appetite for Destruction Appetite for Destruction is the debut studio album by American hard rock band Guns N' Roses. It was released on July 21, 1987, by Geffen Records to massive commercial success. It topped the "Billboard" 200 and became the best-selling debut album as well as the 11th best-selling album in the United States. With about 30 million copies sold worldwide, it is also one of the best-selling records ever. Although critics were ambivalent toward the album when it was first released, "Appetite for Destruction" has since received retrospective acclaim and been viewed as one of the greatest albums of all time. - The Stage (Avenged Sevenfold song) "The Stage" is a song by Avenged Sevenfold and the first single from their seventh studio album of the same name, which was released on October 28, 2016. - source_sentence: Union County Speedway is home to what type of motorsports that are usually held at county fairs and festivals? sentences: - Long Beach, New York Long Beach is a city in Nassau County, New York, United States. Just south of Long Island, it is located on Long Beach Barrier Island, which is the westernmost of the outer barrier islands off Long Island's South Shore. As of the United States 2010 Census, the city population was 33,275. It was incorporated in 1922, and is nicknamed "The City By the Sea" (as seen in Latin on its official seal). - Union County Speedway Union County Speedway is a dirt racetrack in Liberty, Indiana, United States. It features races with cars such as, late models, Modifieds, Sidestroke, Bombers, Road Hogs, and Street Stocks. UCS is also host to dirtbike, quad, Mini-Sprint, and Demolition Derbies. - Mercer County Fairgrounds The Mercer County Fairgrounds, located on 12th Avenue SW in Aledo, are the home of the annual county fair in Mercer County, Illinois. The fairgrounds were established in 1869 when the fair moved to Aledo; from its creation in 1853 until then, it had taken place in Millersburg. 
The early fairs mainly focused on agricultural exhibitions, and the first two buildings were used for horticulture exhibits and household floral shows; these fairs also included entertainment such as baseball games and band concerts. By the end of the century, the fair had grown to host 8,000 visitors, many who came from neighboring counties by train, and show 3,000 entries in its various agricultural competitions. The fair added traveling entertainment and grew to host over 20,000 visitors in the 20th century; it is still held annually at the fairgrounds. In addition to the county fair, the fairgrounds have also held horse races, political events, picnics, and other community events. - source_sentence: James D. Farley, Jr. had an early interest in automobiles because of his grandfather who worked for what company? sentences: - Continental Motors Company Continental Motors Company was an American manufacturer of internal combustion engines. The company produced engines as a supplier to many independent manufacturers of automobiles, tractors, trucks, and stationary equipment (such as pumps, generators, and industrial machinery drives) from the 1900s through the 1960s. Continental Motors also produced Continental-branded automobiles in 1932–1933. The Continental Aircraft Engine Company was formed in 1929 to develop and produce its aircraft engines, and would become the core business of Continental Motors, Inc. - The Atlantic The Atlantic is an American magazine and multi-platform publisher, founded in 1857 as The Atlantic Monthly in Boston, Massachusetts. - Jim Farley (businessman) James D. Farley, Jr. (born June 1962) is an American automobile executive that currently serves as Ford Motor Company's Executive Vice President and president, Global Markets since June 2017. From 2015 to 2017, he was CEO and Chairman of Ford Europe. 
He had an early interest in automobiles, primarily spurred from his grandfather who worked at Henry Ford's River Rouge Plant starting in 1914. model-index: - name: BGE-base-en-v1.5-Hotpotqa results: - task: type: triplet name: Triplet dataset: name: dim 768 type: dim_768 metrics: - type: cosine_accuracy value: 0.9068859441552295 name: Cosine Accuracy - type: dot_accuracy value: 0.09311405584477046 name: Dot Accuracy - type: manhattan_accuracy value: 0.9066493137718883 name: Manhattan Accuracy - type: euclidean_accuracy value: 0.9068859441552295 name: Euclidean Accuracy - type: max_accuracy value: 0.9068859441552295 name: Max Accuracy - task: type: triplet name: Triplet dataset: name: dim 512 type: dim_512 metrics: - type: cosine_accuracy value: 0.9074775201135826 name: Cosine Accuracy - type: dot_accuracy value: 0.09311405584477046 name: Dot Accuracy - type: manhattan_accuracy value: 0.9055844770468529 name: Manhattan Accuracy - type: euclidean_accuracy value: 0.9064126833885471 name: Euclidean Accuracy - type: max_accuracy value: 0.9074775201135826 name: Max Accuracy - task: type: triplet name: Triplet dataset: name: dim 256 type: dim_256 metrics: - type: cosine_accuracy value: 0.907359204921912 name: Cosine Accuracy - type: dot_accuracy value: 0.09311405584477046 name: Dot Accuracy - type: manhattan_accuracy value: 0.9062943681968765 name: Manhattan Accuracy - type: euclidean_accuracy value: 0.9062943681968765 name: Euclidean Accuracy - type: max_accuracy value: 0.907359204921912 name: Max Accuracy - task: type: triplet name: Triplet dataset: name: dim 128 type: dim_128 metrics: - type: cosine_accuracy value: 0.9060577378135353 name: Cosine Accuracy - type: dot_accuracy value: 0.09488878371982963 name: Dot Accuracy - type: manhattan_accuracy value: 0.9014434453383815 name: Manhattan Accuracy - type: euclidean_accuracy value: 0.9035731187884525 name: Euclidean Accuracy - type: max_accuracy value: 0.9060577378135353 name: Max Accuracy - task: type: triplet name: 
Triplet dataset: name: dim 64 type: dim_64 metrics: - type: cosine_accuracy value: 0.9054661618551822 name: Cosine Accuracy - type: dot_accuracy value: 0.09808329389493611 name: Dot Accuracy - type: manhattan_accuracy value: 0.8983672503549456 name: Manhattan Accuracy - type: euclidean_accuracy value: 0.9013251301467108 name: Euclidean Accuracy - type: max_accuracy value: 0.9054661618551822 name: Max Accuracy
---

# BGE-base-en-v1.5-Hotpotqa

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the [sentence-transformers/hotpotqa](https://huggingface.co/datasets/sentence-transformers/hotpotqa) dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [sentence-transformers/hotpotqa](https://huggingface.co/datasets/sentence-transformers/hotpotqa)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# ("sentence_transformers_model_id" is a placeholder; replace it with this model's Hub ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'James D. Farley, Jr. had an early interest in automobiles because of his grandfather who worked for what company?',
    "Jim Farley (businessman) James D. Farley, Jr. (born June 1962) is an American automobile executive that currently serves as Ford Motor Company's Executive Vice President and president, Global Markets since June 2017. From 2015 to 2017, he was CEO and Chairman of Ford Europe. He had an early interest in automobiles, primarily spurred from his grandfather who worked at Henry Ford's River Rouge Plant starting in 1914.",
    'Continental Motors Company Continental Motors Company was an American manufacturer of internal combustion engines. The company produced engines as a supplier to many independent manufacturers of automobiles, tractors, trucks, and stationary equipment (such as pumps, generators, and industrial machinery drives) from the 1900s through the 1960s. Continental Motors also produced Continental-branded automobiles in 1932–1933. The Continental Aircraft Engine Company was formed in 1929 to develop and produce its aircraft engines, and would become the core business of Continental Motors, Inc.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Triplet
* Dataset: `dim_768`
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| **cosine_accuracy** | **0.9069** |
| dot_accuracy        | 0.0931     |
| manhattan_accuracy  | 0.9066     |
| euclidean_accuracy  | 0.9069     |
| max_accuracy        | 0.9069     |

#### Triplet
* Dataset: `dim_512`
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| **cosine_accuracy** | **0.9075** |
| dot_accuracy        | 0.0931     |
| manhattan_accuracy  | 0.9056     |
| euclidean_accuracy  | 0.9064     |
| max_accuracy        | 0.9075     |

#### Triplet
* Dataset: `dim_256`
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| **cosine_accuracy** | **0.9074** |
| dot_accuracy        | 0.0931     |
| manhattan_accuracy  | 0.9063     |
| euclidean_accuracy  | 0.9063     |
| max_accuracy        | 0.9074     |

#### Triplet
* Dataset: `dim_128`
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| **cosine_accuracy** | **0.9061** |
| dot_accuracy        | 0.0949     |
| manhattan_accuracy  | 0.9014     |
| euclidean_accuracy  | 0.9036     |
| max_accuracy        | 0.9061     |

#### Triplet
* Dataset: `dim_64`
* Evaluated with [TripletEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| **cosine_accuracy** | **0.9055** |
| dot_accuracy        | 0.0981     |
| manhattan_accuracy  | 0.8984     |
| euclidean_accuracy  | 0.9013     |
| max_accuracy        | 0.9055     |

## Training Details

### Training Dataset

#### sentence-transformers/hotpotqa

* Dataset: [sentence-transformers/hotpotqa](https://huggingface.co/datasets/sentence-transformers/hotpotqa) at [f07d3cd](https://huggingface.co/datasets/sentence-transformers/hotpotqa/tree/f07d3cd2d290ea2e83ed35e33d67d6a4658b8786)
* Size: 76,064 training samples
* Columns: anchor, positive, and negative
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details |        |          |          |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | What historical geographic region in Central-Eastern Europe was the birthplace of a soldier of the Austro-Hungarian Army? | Bruno Olbrycht Bruno Olbrycht (nom de guerre: Olza; 6 October 1895 – 23 March 1951) was a soldier of the Austro-Hungarian Army and officer (later general) of the Polish Army both in the Second Polish Republic and postwar Poland. Born on 6 October 1895 in Sanok, Austrian Galicia, Olbrycht fought in Polish Legions in World War I, Polish–Ukrainian War, Polish–Soviet War and the Invasion of Poland. He died on 23 March 1951 in Kraków. | Padáň The village was first recorded in 1254 as "Padan", an old Pecheneg settlement. On the territory of the village, there used to be "Petény" village as well, which was mentioned in 1298 as the appurtenance of Pressburg Castle. Until the end of World War I, it was part of Hungary and fell within the Dunaszerdahely district of Pozsony County. After the Austro-Hungarian army disintegrated in November 1918, Czechoslovakian troops occupied the area. After the Treaty of Trianon of 1920, the village became officially part of Czechoslovakia. In November 1938, the First Vienna Award granted the area to Hungary and it was held by Hungary until 1945. After Soviet occupation in 1945, Czechoslovakian administration returned and the village became officially part of Czechoslovakia in 1947. |
  | Full Scale Assault is the fourth studio album by Dutch punk hardcore band Vitamin X, the album was recorded at Electrical Audio in Chicago by Steve Albini who previously recorded The Stooges, also known as Iggy and the Stooges, were an American rock band formed in Ann Arbor, Michigan in what year? | Full Scale Assault Full Scale Assault is the fourth studio album by Dutch punk hardcore band Vitamin X. Released through Tankcrimes on October 10, 2008 in the US, and Agipunk in Europe. The album was recorded at Electrical Audio in Chicago by Steve Albini who previously recorded Nirvana, Neurosis, PJ Harvey, High on Fire, Iggy Pop & The Stooges. It features guest vocals from Negative Approach's singer John Brannon. Art is by John Dyer Baizley. | The Dogs (US punk band) The Dogs are a three-piece proto-punk band formed in Lansing, Michigan, United States in 1969. They are noted for presaging the energy and sound of the later punk and hardcore genres. |
  | Which popular music style was a modification of the marches from "The March King" with heavy influences from African American communities? | Ragtime Ragtime – also spelled rag-time or rag time – is a musical style that enjoyed its peak popularity between 1895 and 1918. Its cardinal trait is its syncopated, or "ragged", rhythm. The style has its origins in African-American communities in cities such as St. Louis. Ernest Hogan (1865–1909) was a pioneer of ragtime and was the first composer to have his ragtime pieces (or "rags") published as sheet music, beginning with the song "LA Pas Ma LA," published in 1895. Hogan has also been credited for coining the term "ragtime". The term is actually derived from his hometown "Shake Rag" in Bowling Green, Kentucky. Ben Harney, another Kentucky native, has often been credited for introducing the music to the mainstream public. His first ragtime composition, "You've Been a Good Old Wagon But You Done Broke", helped popularize the style. The composition was published in 1895, a few months after Ernest Hogan's "LA Pas Ma LA." Ragtime was also a modification of the march style popularized by John Philip Sousa, with additional polyrhythms coming from African music. Ragtime composer Scott Joplin ("ca." 1868–1917) became famous through the publication of the "Maple Leaf Rag" (1899) and a string of ragtime hits such as "The Entertainer" (1902), although he was later forgotten by all but a small, dedicated community of ragtime aficionados until the major ragtime revival in the early 1970s. For at least 12 years after its publication, "Maple Leaf Rag" heavily influenced subsequent ragtime composers with its melody lines, harmonic progressions or metric patterns. | Joropo The Joropo is a musical style resembling the fandango, and an accompanying dance. It has African, Native South American and European influences and originated in the plains called "Los Llanos" of what is now Colombia and Venezuela. It is a fundamental genre of "música criolla" (creole music). It is also the most popular "folk rhythm": the well-known song "Alma Llanera" is a joropo, considered the unofficial national anthem of Venezuela. |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "TripletLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Evaluation Dataset

#### sentence-transformers/hotpotqa

* Dataset: [sentence-transformers/hotpotqa](https://huggingface.co/datasets/sentence-transformers/hotpotqa) at [f07d3cd](https://huggingface.co/datasets/sentence-transformers/hotpotqa/tree/f07d3cd2d290ea2e83ed35e33d67d6a4658b8786)
* Size: 8,452 evaluation samples
* Columns: anchor, positive, and negative
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details |        |          |          |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | What is the birthdate of this American dancer and choreographer of modern dance, who helped found the Joseph Campbell Foundation with Robert Walter? | Robert Walter (editor) Robert Walter is an editor and an executive with several not-for-profit organizations. Most notably, he is the executive director and board president of the Joseph Campbell Foundation (JCF), an organization that he helped found in 1990 with choreographer Jean Erdman, Joseph Campbell's widow. | Miguel Terekhov Miguel Terekhov (August 22, 1928 – January 3, 2012) was a Uruguayan-born American ballet dancer and ballet instructor. Terekhov and his wife, Yvonne Chouteau, one of the Five Moons, a group of Native American ballet dancers, founded the School of Dance at the University of Oklahoma in 1961. |
  | What is the difference between Konstantin Orbelyan and Haig P. Manoogian | Konstantin Orbelyan Konstantin Aghaparoni Orbelyan (Armenian: Կոնստանտին Աղապարոնի Օրբելյան ; Russian: Константин Агапаронович Орбелян , July 29, 1928 – April 24, 2014) was an Armenian pianist, composer, head of the State Estrada Orchestra of Armenia. | Mitrofan Lodyzhensky Mitrofan Vasilyevich Lodyzhensky (Russian: Митрофа́н Васи́льевич Лоды́женский , in some sources Лады́женский (Ladyzhensky ); February 27 [O.S. February 15] 1852 – May 31 [O.S. May 18] 1917 ) was a Russian religious philosopher, playwright, and statesman, best known for his "Mystical Trilogy" comprising "Super-consciousness and the Ways to Achieve It", "Light Invisible", and "Dark Force". |
  | Which movie has more producers, Laura's Star or 9? | Laura's Star Laura's Star (German: Lauras Stern ) is a 2004 German animated feature film produced and directed by Thilo Rothkirch. It is based on the children's book "Lauras Stern" by Klaus Baumgart. It was released by Warner Bros. Family Entertainment. | Laura Mañá Laura Mañá (born January 12, 1968 in Barcelona, Catalonia, Spain) is an actress, film director and screenwriter. |
* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "TripletLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 5
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `resume_from_checkpoint`: bge-base-hotpotwa-matryoshka
- `batch_sampler`: no_duplicates

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: bge-base-hotpotwa-matryoshka
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
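
The batch-related hyperparameters above determine how many samples feed each optimizer step and how a step count maps to an epoch fraction. The following is a minimal sketch of that arithmetic, using the dataset size and batch settings reported in this card. It assumes a single training device, which the card does not state; the variable and function names are illustrative only.

```python
import math

# Values reported in this card.
train_samples = 76_064
per_device_train_batch_size = 32
gradient_accumulation_steps = 16

# Assumption: one device. The card does not report the device count.
num_devices = 1

# Samples that contribute to each optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Micro-batches per epoch, then (fractional) optimizer steps per epoch.
batches_per_epoch = math.ceil(
    train_samples / (per_device_train_batch_size * num_devices)
)
steps_per_epoch = batches_per_epoch / gradient_accumulation_steps


def epoch_at(step: int) -> float:
    """Epoch fraction reached after `step` optimizer steps, rounded to 4 places."""
    return round(step / steps_per_epoch, 4)


print(effective_batch_size)
print(steps_per_epoch)
print([epoch_at(step) for step in (50, 100, 500)])
```

Under the single-device assumption, the epoch fractions this produces for steps 50, 100, and 500 can be compared against the Epoch column of the training logs as a sanity check on the reported settings.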

### Training Logs

| Epoch  | Step | Training Loss | loss    | dim_128_cosine_accuracy | dim_256_cosine_accuracy | dim_512_cosine_accuracy | dim_64_cosine_accuracy | dim_768_cosine_accuracy |
|:------:|:----:|:-------------:|:-------:|:-----------------------:|:-----------------------:|:-----------------------:|:----------------------:|:-----------------------:|
| 0.3366 | 50   | 23.6925       | 21.8521 | 0.9285                  | 0.9288                  | 0.9334                  | 0.9226                 | 0.9365                  |
| 0.6731 | 100  | 22.4254       | 20.8726 | 0.9102                  | 0.9110                  | 0.9156                  | 0.9063                 | 0.9168                  |
| 1.0097 | 150  | 22.046        | 20.7027 | 0.9142                  | 0.9162                  | 0.9188                  | 0.9098                 | 0.9200                  |
| 1.3462 | 200  | 21.871        | 20.6600 | 0.9227                  | 0.9198                  | 0.9233                  | 0.9159                 | 0.9232                  |
| 1.6828 | 250  | 21.7          | 20.6425 | 0.9193                  | 0.9192                  | 0.9203                  | 0.9148                 | 0.9217                  |
| 2.0194 | 300  | 21.5785       | 20.6416 | 0.9113                  | 0.9133                  | 0.9149                  | 0.9082                 | 0.9142                  |
| 2.3559 | 350  | 21.4963       | 20.5366 | 0.9141                  | 0.9139                  | 0.9162                  | 0.9107                 | 0.9177                  |
| 2.6925 | 400  | 21.4012       | 20.5315 | 0.9103                  | 0.9114                  | 0.9135                  | 0.9081                 | 0.9136                  |
| 3.0290 | 450  | 21.3447       | 20.5096 | 0.9093                  | 0.9089                  | 0.9102                  | 0.9057                 | 0.9106                  |
| 3.3656 | 500  | 21.3029       | 20.5548 | 0.9061                  | 0.9074                  | 0.9075                  | 0.9055                 | 0.9069                  |

### Framework Versions

- Python: 3.10.10
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.1.2+cu121
- Accelerate: 0.31.0
- Datasets: 2.19.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### TripletLoss

```bibtex
@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```