|
--- |
|
base_model: dbourget/pb-ds1-48K |
|
datasets: [] |
|
language: [] |
|
library_name: sentence-transformers |
|
metrics: |
|
- pearson_cosine |
|
- spearman_cosine |
|
- pearson_manhattan |
|
- spearman_manhattan |
|
- pearson_euclidean |
|
- spearman_euclidean |
|
- pearson_dot |
|
- spearman_dot |
|
- pearson_max |
|
- spearman_max |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- sentence-similarity |
|
- feature-extraction |
|
- generated_from_trainer |
|
- dataset_size:106810 |
|
- loss:CosineSimilarityLoss |
|
widget: |
|
- source_sentence: In The Law of Civilization and Decay, Brooks provides a detailed |
|
look at the rise and fall of civilizations, offering a critical perspective on |
|
the impact of capitalism. As societies become prosperous, their pursuit of wealth |
|
ultimately leads to their own downfall as greed takes over. |
|
sentences: |
|
- Patrick Todd's The Open Future argues that all future contingent statements, such |
|
as 'It will rain tomorrow', are inherently false. |
|
- If propositions are made true in virtue of corresponding to facts, then what are |
|
the truth-makers of true negative propositions such as ‘The apple is not red’? |
|
Russell argued that there must be negative facts to account for what makes true |
|
negative propositions true and false positive propositions false. Others, more |
|
parsimonious in their ontological commitments, have attempted to avoid them. Wittgenstein |
|
rejected them since he was loath to think that the sign for negation referred |
|
to a negative element in a fact. A contemporary of Russell’s, Raphael Demos, attempted |
|
to eliminate them by appealing to ‘incompatibility’ facts. More recently, Armstrong |
|
has appealed to the totality of positive facts as the ground of the truth of true |
|
negative propositions. Oaklander and Miracchi have suggested that the absence |
|
or non-existence of the positive fact (which is not itself a further fact) is |
|
the basis of a positive proposition being false and therefore of the truth of |
|
its negation. |
|
- The Law of Civilization and Decay is an overview of history, articulating Brooks' |
|
critical view of capitalism. A civilization grows wealthy, and then its wealth |
|
causes it to crumble upon itself due to greed. |
|
- source_sentence: It is generally accepted that the development of the modern sciences |
|
is rooted in experiment. Yet for a long time, experimentation did not occupy a |
|
prominent role, neither in philosophy nor in history of science. With the ‘practical |
|
turn’ in studying the sciences and their history, this has begun to change. This |
|
paper is concerned with systems and cultures of experimentation and the consistencies |
|
that are generated within such systems and cultures. The first part of the paper |
|
exposes the forms of historical and structural coherence that characterize the |
|
experimental exploration of epistemic objects. In the second part, a particular |
|
experimental culture in the life sciences is briefly described as an example. |
|
A survey will be given of what it means and what it takes to analyze biological |
|
functions in the test tube |
|
sentences: |
|
- Experimentation has long been overlooked in the study of science, but with a new |
|
focus on practical aspects, this is starting to change. This paper explores the |
|
systems and cultures of experimentation and the patterns that emerge within them. |
|
The first part discusses the historical and structural coherence of experimental |
|
exploration. The second part provides a brief overview of an experimental culture |
|
in the life sciences. The paper concludes with a discussion on analyzing biological |
|
functions in the test tube. |
|
- Hintikka and Mutanen have introduced Trail-And-Error machines as a new way to |
|
think about computation, expanding on the traditional Turing machine model. This |
|
innovation opens up new possibilities in the field of computation theory. |
|
- As Allaire and Firsirotu (1984) pointed out over a decade ago, the concept of |
|
culture seemed to be sliding inexorably into a superficial explanatory pool that |
|
promised everything and nothing. However, since then, some sophisticated and interesting |
|
theoretical developments have prevented drowning in the pool of superficiality |
|
and hence theoretical redundancy. The purpose of this article is to build upon |
|
such theoretical developments and to introduce an approach that maintains that |
|
culture can be theorized in the same way as structure, possessing irreducible |
|
powers and properties that predispose organizational actors towards specific courses |
|
of action. The morphogenetic approach is the methodological complement of transcendental |
|
realism, providing explanatory leverage on the conditions that maintain for cultural |
|
change or stability. |
|
- source_sentence: 'This chapter examines three approaches to applied political and |
|
legal philosophy: Standard activism is primarily addressed to other philosophers, |
|
adopts an indirect and coincidental role in creating change, and counts articulating |
|
sound arguments as success. Extreme activism, in contrast, is a form of applied |
|
philosophy directly addressed to policy-makers, with the goal of bringing about |
|
a particular outcome, and measures success in terms of whether it makes a direct |
|
causal contribution to that goal. Finally, conceptual activism (like standard |
|
activism), primarily targets an audience of fellow philosophers, bears a distant, |
|
non-direct, relation to a desired outcome, and counts success in terms of whether |
|
it encourages a particular understanding and adoption of the concepts under examination.' |
|
sentences: |
|
- John Rawls’ resistance to any kind of global egalitarian principle has seemed |
|
strange and unconvincing to many commentators, including those generally supportive |
|
of Rawls’ project. His rejection of a global egalitarian principle seems to rely |
|
on an assumption that states are economically bounded and separate from one another, |
|
which is not an accurate portrayal of economic relations among states in our globalised |
|
world. In this article, I examine the implications of the domestic theory of justice |
|
as fairness to argue that Rawls has good reason to insist on economically bounded |
|
states. I argue that certain central features of the contemporary global economy, |
|
particularly the free movement of capital across borders, undermine the distributional |
|
autonomy required for states to realise Rawls’ principles of justice, and the |
|
domestic theory thus requires a certain degree of economic separation among states |
|
prior to the convening of the international original position. Given this, I defend |
|
Rawls’ reluctance to endorse a global egalitarian principle and defend a policy |
|
regime of international capital controls, to restore distributional autonomy and |
|
make the realisation of the principles of justice as fairness possible. |
|
- 'Bibliography of the writings by Hilary Putnam: 16 books, 198 articles, 10 translations |
|
into German (up to 1994).' |
|
- The jurisprudence under international human rights treaties has had a considerable |
|
impact across countries. Known for addressing complex agendas, the work of expert |
|
bodies under the treaties has been credited and relied upon for filling the gaps |
|
in the realization of several objectives, including the peace and security agenda. In |
|
1982, the Human Rights Committee (ICCPR), in a General Comment observed that “states |
|
have the supreme duty to prevent wars, acts of genocide and other acts of mass |
|
violence ... Every effort … to avert the danger of war, especially thermonuclear |
|
war, and to strengthen international peace and security would constitute the most |
|
important condition and guarantee for the safeguarding of the right to life.” |
|
Over the years, all treaty bodies have contributed in this direction, endorsing |
|
peace and security so as “to protect people against direct and structural violence |
|
… as systemic problems and not merely as isolated incidents …”. A closer look |
|
at the jurisprudence on peace and security, emanating from treaty monitoring mechanisms |
|
including state periodic reports, interpretive statements, the individual communications |
|
procedure, and others, reveals its distinctive nature |
|
- source_sentence: Autonomist accounts of cognitive science suggest that cognitive |
|
model building and theory construction (can or should) proceed independently of |
|
findings in neuroscience. Common functionalist justifications of autonomy rely |
|
on there being relatively few constraints between neural structure and cognitive |
|
function (e.g., Weiskopf, 2011). In contrast, an integrative mechanistic perspective |
|
stresses the mutual constraining of structure and function (e.g., Piccinini & |
|
Craver, 2011; Povich, 2015). In this paper, I show how model-based cognitive neuroscience |
|
(MBCN) epitomizes the integrative mechanistic perspective and concentrates the |
|
most revolutionary elements of the cognitive neuroscience revolution (Boone & |
|
Piccinini, 2016). I also show how the prominent subset account of functional realization |
|
supports the integrative mechanistic perspective I take on MBCN and use it to |
|
clarify the intralevel and interlevel components of integration. |
|
sentences: |
|
- Fictional truth, or truth in fiction/pretense, has been the object of extended |
|
scrutiny among philosophers and logicians in recent decades. Comparatively little |
|
attention, however, has been paid to its inferential relationships with time and |
|
with certain deliberate and contingent human activities, namely, the creation |
|
of fictional works. The aim of the paper is to contribute to filling the gap. |
|
Toward this goal, a formal framework is outlined that is consistent with a variety |
|
of conceptions of fictional truth and based upon a specific formal treatment of |
|
time and agency, that of so-called stit logics. Moreover, a complete axiomatic |
|
theory of fiction-making TFM is defined, where fiction-making is understood as |
|
the exercise of agency and choice in time over what is fictionally true. The language |
|
\ of TFM is an extension of the language of propositional logic, with the addition |
|
of temporal and modal operators. A distinctive feature of \ with respect to other |
|
modal languages is a variety of operators having to do with fictional truth, including |
|
a ‘fictionality’ operator \ . Some applications of TFM are outlined, and some |
|
interesting linguistic and inferential phenomena, which are not so easily dealt |
|
with in other frameworks, are accounted for |
|
- 'We have structured our response according to five questions arising from the |
|
commentaries: (i) What is sentience? (ii) Is sentience a necessary or sufficient |
|
condition for moral standing? (iii) What methods should guide comparative cognitive |
|
research in general, and specifically in studying invertebrates? (iv) How should |
|
we balance scientific uncertainty and moral risk? (v) What practical strategies |
|
can help reduce biases and morally dismissive attitudes toward invertebrates?' |
|
- 'In 2007, ten world-renowned neuroscientists proposed “A Decade of the Mind Initiative.” |
|
The contention was that, despite the successes of the Decade of the Brain, “a |
|
fundamental understanding of how the brain gives rise to the mind [was] still |
|
lacking” (2007, 1321). The primary aims of the decade of the mind were “to build |
|
on the progress of the recent Decade of the Brain (1990-99)” by focusing on “four |
|
broad but intertwined areas” of research, including: healing and protecting, understanding, |
|
enriching, and modeling the mind. These four aims were to be the result of “transdisciplinary |
|
and multiagency” research spanning “across disparate fields, such as cognitive |
|
science, medicine, neuroscience, psychology, mathematics, engineering, and computer |
|
science.” The proposal for a decade of the mind prompted many questions (See Spitzer |
|
2008). In this chapter, I address three of them: (1) How do proponents of this |
|
new decade conceive of the mind? (2) Why should a decade be devoted to understanding |
|
it? (3) What should this decade look like?' |
|
- source_sentence: This essay explores the historical and modern perspectives on the |
|
Gettier problem, highlighting the connections between this issue, skepticism, |
|
and relevance. Through methods such as historical analysis, induction, and deduction, |
|
it is found that while contextual theories and varying definitions of knowledge |
|
do not fully address skeptical challenges, they can help clarify our understanding |
|
of knowledge. Ultimately, embracing subjectivity and intuition can provide insight |
|
into what it truly means to claim knowledge. |
|
sentences: |
|
- In this article I present and analyze three popular moral justifications for hunting. |
|
My purpose is to expose the moral terrain of this issue and facilitate more fruitful, |
|
philosophically relevant discussions about the ethics of hunting. |
|
- Teaching competency in bioethics has been a concern since the field's inception. |
|
The first report on the teaching of contemporary bioethics was published in 1976 |
|
by The Hastings Center, which concluded that graduate programs were not necessary |
|
at the time. However, the report speculated that future developments may require |
|
new academic structures for graduate education in bioethics. The creation of a |
|
terminal degree in bioethics has its critics, with scholars debating whether bioethics |
|
is a discipline with its own methods and theoretical grounding, a multidisciplinary |
|
field, or something else entirely. Despite these debates, new bioethics training |
|
programs have emerged at all postsecondary levels in the U.S. This essay examines |
|
the number and types of programs and degrees in this growing field. |
|
- 'Objective: In this essay, I will try to track some historical and modern stages |
|
of the discussion on the Gettier problem, and point out the interrelations of |
|
the questions that this problem raises for epistemologists, with sceptical arguments, |
|
and a so-called problem of relevance. Methods: historical analysis, induction, |
|
generalization, deduction, discourse, intuition results: Albeit the contextual |
|
theories of knowledge, the use of different definitions of knowledge, and the |
|
different ways of the uses of knowledge do not resolve all the issues that the |
|
sceptic can put forward, but they can be productive in giving clarity to a concept |
|
of knowledge for us. On the other hand, our knowledge will always have an element |
|
of intuition and subjectivity, however not equating to epistemic luck and probability. Significance |
|
novelty: the approach to the context in general, not giving up being a Subject |
|
may give us a clarity about the sense of what it means to say – “I know”.' |
|
model-index: |
|
- name: SentenceTransformer based on dbourget/pb-ds1-48K |
|
results: |
|
- task: |
|
type: semantic-similarity |
|
name: Semantic Similarity |
|
dataset: |
|
name: sts dev |
|
type: sts-dev |
|
metrics: |
|
- type: pearson_cosine |
|
value: 0.9378177365442741 |
|
name: Pearson Cosine |
|
- type: spearman_cosine |
|
value: 0.8943299298202461 |
|
name: Spearman Cosine |
|
- type: pearson_manhattan |
|
value: 0.9709949018414847 |
|
name: Pearson Manhattan |
|
- type: spearman_manhattan |
|
value: 0.8969442622028955 |
|
name: Spearman Manhattan |
|
- type: pearson_euclidean |
|
value: 0.9711044669329696 |
|
name: Pearson Euclidean |
|
- type: spearman_euclidean |
|
value: 0.8966133108746955 |
|
name: Spearman Euclidean |
|
- type: pearson_dot |
|
value: 0.9419649751470724 |
|
name: Pearson Dot |
|
- type: spearman_dot |
|
value: 0.8551487313582053 |
|
name: Spearman Dot |
|
- type: pearson_max |
|
value: 0.9711044669329696 |
|
name: Pearson Max |
|
- type: spearman_max |
|
value: 0.8969442622028955 |
|
name: Spearman Max |
|
--- |
|
|
|
# SentenceTransformer based on dbourget/pb-ds1-48K |
|
|
|
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [dbourget/pb-ds1-48K](https://huggingface.co/dbourget/pb-ds1-48K). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
- **Model Type:** Sentence Transformer |
|
- **Base model:** [dbourget/pb-ds1-48K](https://huggingface.co/dbourget/pb-ds1-48K) <!-- at revision fcd4aeedcdc3ad836820d47fd28ffd2529914647 --> |
|
- **Maximum Sequence Length:** 512 tokens |
|
- **Output Dimensionality:** 768 dimensions
|
- **Similarity Function:** Cosine Similarity |
|
<!-- - **Training Dataset:** Unknown --> |
|
<!-- - **Language:** Unknown --> |
|
<!-- - **License:** Unknown --> |
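The similarity function listed above compares two embeddings by the cosine of the angle between them. As a minimal sketch in plain Python (the helper name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions, not part of this model's code), it can be computed like this:

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors stand in for real 768-dimensional embeddings.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, so the score is close to 1
c = [-3.0, 0.0, 1.0]  # orthogonal to a, so the score is close to 0

print(cosine_similarity(a, b))
print(cosine_similarity(a, c))
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.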
|
|
|
### Model Sources |
|
|
|
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
|
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) |
|
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) |
|
|
|
### Full Model Architecture |
|
|
|
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
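The pooling layer above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of the token vectors produced by the BERT encoder, with padding positions left out. The following is a rough sketch of that idea in plain Python; the function name `mean_pool` and the toy 2-dimensional token vectors are assumptions made for illustration, and the library's own implementation works on batched tensors instead.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token vectors; the last one is padding and is masked out.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```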
|
|
|
## Usage |
|
|
|
### Direct Usage (Sentence Transformers) |
|
|
|
First install the Sentence Transformers library: |
|
|
|
```bash
pip install -U sentence-transformers
```
|
|
|
Then you can load this model and run inference. |
|
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dbourget/pb-ds1-48K-philsim")

# Run inference
sentences = [
    'This essay explores the historical and modern perspectives on the Gettier problem, highlighting the connections between this issue, skepticism, and relevance. Through methods such as historical analysis, induction, and deduction, it is found that while contextual theories and varying definitions of knowledge do not fully address skeptical challenges, they can help clarify our understanding of knowledge. Ultimately, embracing subjectivity and intuition can provide insight into what it truly means to claim knowledge.',
    'Objective: In this essay, I will try to track some historical and modern stages of the discussion on the Gettier problem, and point out the interrelations of the questions that this problem raises for epistemologists, with sceptical arguments, and a so-called problem of relevance. Methods: historical analysis, induction, generalization, deduction, discourse, intuition results: Albeit the contextual theories of knowledge, the use of different definitions of knowledge, and the different ways of the uses of knowledge do not resolve all the issues that the sceptic can put forward, but they can be productive in giving clarity to a concept of knowledge for us. On the other hand, our knowledge will always have an element of intuition and subjectivity, however not equating to epistemic luck and probability. Significance novelty: the approach to the context in general, not giving up being a Subject may give us a clarity about the sense of what it means to say – “I know”.',
    "Teaching competency in bioethics has been a concern since the field's inception. The first report on the teaching of contemporary bioethics was published in 1976 by The Hastings Center, which concluded that graduate programs were not necessary at the time. However, the report speculated that future developments may require new academic structures for graduate education in bioethics. The creation of a terminal degree in bioethics has its critics, with scholars debating whether bioethics is a discipline with its own methods and theoretical grounding, a multidisciplinary field, or something else entirely. Despite these debates, new bioethics training programs have emerged at all postsecondary levels in the U.S. This essay examines the number and types of programs and degrees in this growing field.",
]
embeddings = model.encode(sentences)  # returns a NumPy array by default
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
|
|
|
|
|
|
## Evaluation |
|
|
|
### Metrics |
|
|
|
#### Semantic Similarity |
|
* Dataset: `sts-dev` |
|
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator) |
|
|
|
| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.9378     |
| **spearman_cosine** | **0.8943** |
| pearson_manhattan   | 0.9710     |
| spearman_manhattan  | 0.8969     |
| pearson_euclidean   | 0.9711     |
| spearman_euclidean  | 0.8966     |
| pearson_dot         | 0.9420     |
| spearman_dot        | 0.8551     |
| pearson_max         | 0.9711     |
| spearman_max        | 0.8969     |
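Each row above pairs a correlation measure (Pearson or Spearman) with a similarity function (cosine, Manhattan, Euclidean, or dot product) and correlates the model's scores with the gold labels. The sketch below shows the two correlation measures in plain Python; the helper names and the four gold and predicted scores are made up for illustration, and the evaluator itself may compute these differently. The last lines check that the `*_max` rows are consistent with taking the best value across the four similarity functions, using the numbers from the table.

```python
import math


def pearson(x, y):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold labels and model scores (not taken from the dataset).
gold = [0.0, 0.25, 0.5, 1.0]
predicted = [0.1, 0.2, 0.3, 0.9]

print(pearson(gold, predicted))   # below 1: the relationship is not exactly linear
print(spearman(gold, predicted))  # about 1: the ordering is identical

# Values copied from the table above.
pearson_max = max(0.9378, 0.9710, 0.9711, 0.9420)
spearman_max = max(0.8943, 0.8969, 0.8966, 0.8551)
print(pearson_max, spearman_max)
```

Spearman only rewards getting the ordering of pairs right, which is why it is the usual headline metric for ranking-style uses such as semantic search.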
|
|
|
|
|
|
## Training Details |
|
|
|
### Training Hyperparameters |
|
#### Non-Default Hyperparameters |
|
|
|
- `eval_strategy`: steps |
|
- `per_device_train_batch_size`: 190 |
|
- `per_device_eval_batch_size`: 190 |
|
- `learning_rate`: 5e-06 |
|
- `num_train_epochs`: 2 |
|
- `warmup_ratio`: 0.1 |
|
- `bf16`: True |
|
- `batch_sampler`: no_duplicates |
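To make the interaction of `learning_rate`, `warmup_ratio`, and the linear scheduler concrete, here is a rough sketch of the resulting learning-rate curve. It assumes training on a single device, a dataset of 106,810 pairs (the `dataset_size` tag in the metadata), no gradient accumulation, and that the `no_duplicates` batch sampler does not change the number of batches; the function name `linear_warmup_lr` is hypothetical and the trainer's own scheduler may differ in small details.

```python
import math


def linear_warmup_lr(step, base_lr, total_steps, warmup_ratio):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed setup: 106,810 training pairs, batch size 190, one device, 2 epochs.
steps_per_epoch = math.ceil(106810 / 190)
total_steps = steps_per_epoch * 2

for step in (0, 50, 113, 563, 1126):
    lr = linear_warmup_lr(step, base_lr=5e-06, total_steps=total_steps, warmup_ratio=0.1)
    print(step, lr)
```

Under these assumptions an epoch is 563 steps, which agrees with the epoch fractions in the training logs (step 10 is logged at epoch 0.0178), and the peak rate of 5e-06 is reached after roughly the first 113 steps.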
|
|
|
#### All Hyperparameters |
|
<details><summary>Click to expand</summary> |
|
|
|
- `overwrite_output_dir`: False |
|
- `do_predict`: False |
|
- `eval_strategy`: steps |
|
- `prediction_loss_only`: True |
|
- `per_device_train_batch_size`: 190 |
|
- `per_device_eval_batch_size`: 190 |
|
- `per_gpu_train_batch_size`: None |
|
- `per_gpu_eval_batch_size`: None |
|
- `gradient_accumulation_steps`: 1 |
|
- `eval_accumulation_steps`: None |
|
- `learning_rate`: 5e-06 |
|
- `weight_decay`: 0.0 |
|
- `adam_beta1`: 0.9 |
|
- `adam_beta2`: 0.999 |
|
- `adam_epsilon`: 1e-08 |
|
- `max_grad_norm`: 1.0 |
|
- `num_train_epochs`: 2 |
|
- `max_steps`: -1 |
|
- `lr_scheduler_type`: linear |
|
- `lr_scheduler_kwargs`: {} |
|
- `warmup_ratio`: 0.1 |
|
- `warmup_steps`: 0 |
|
- `log_level`: passive |
|
- `log_level_replica`: warning |
|
- `log_on_each_node`: True |
|
- `logging_nan_inf_filter`: True |
|
- `save_safetensors`: True |
|
- `save_on_each_node`: False |
|
- `save_only_model`: False |
|
- `restore_callback_states_from_checkpoint`: False |
|
- `no_cuda`: False |
|
- `use_cpu`: False |
|
- `use_mps_device`: False |
|
- `seed`: 42 |
|
- `data_seed`: None |
|
- `jit_mode_eval`: False |
|
- `use_ipex`: False |
|
- `bf16`: True |
|
- `fp16`: False |
|
- `fp16_opt_level`: O1 |
|
- `half_precision_backend`: auto |
|
- `bf16_full_eval`: False |
|
- `fp16_full_eval`: False |
|
- `tf32`: None |
|
- `local_rank`: 0 |
|
- `ddp_backend`: None |
|
- `tpu_num_cores`: None |
|
- `tpu_metrics_debug`: False |
|
- `debug`: [] |
|
- `dataloader_drop_last`: False |
|
- `dataloader_num_workers`: 0 |
|
- `dataloader_prefetch_factor`: None |
|
- `past_index`: -1 |
|
- `disable_tqdm`: False |
|
- `remove_unused_columns`: True |
|
- `label_names`: None |
|
- `load_best_model_at_end`: False |
|
- `ignore_data_skip`: False |
|
- `fsdp`: [] |
|
- `fsdp_min_num_params`: 0 |
|
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
|
- `fsdp_transformer_layer_cls_to_wrap`: None |
|
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} |
|
- `deepspeed`: None |
|
- `label_smoothing_factor`: 0.0 |
|
- `optim`: adamw_torch |
|
- `optim_args`: None |
|
- `adafactor`: False |
|
- `group_by_length`: False |
|
- `length_column_name`: length |
|
- `ddp_find_unused_parameters`: None |
|
- `ddp_bucket_cap_mb`: None |
|
- `ddp_broadcast_buffers`: False |
|
- `dataloader_pin_memory`: True |
|
- `dataloader_persistent_workers`: False |
|
- `skip_memory_metrics`: True |
|
- `use_legacy_prediction_loop`: False |
|
- `push_to_hub`: False |
|
- `resume_from_checkpoint`: None |
|
- `hub_model_id`: None |
|
- `hub_strategy`: every_save |
|
- `hub_private_repo`: False |
|
- `hub_always_push`: False |
|
- `gradient_checkpointing`: False |
|
- `gradient_checkpointing_kwargs`: None |
|
- `include_inputs_for_metrics`: False |
|
- `eval_do_concat_batches`: True |
|
- `fp16_backend`: auto |
|
- `push_to_hub_model_id`: None |
|
- `push_to_hub_organization`: None |
|
- `mp_parameters`: |
|
- `auto_find_batch_size`: False |
|
- `full_determinism`: False |
|
- `torchdynamo`: None |
|
- `ray_scope`: last |
|
- `ddp_timeout`: 1800 |
|
- `torch_compile`: False |
|
- `torch_compile_backend`: None |
|
- `torch_compile_mode`: None |
|
- `dispatch_batches`: None |
|
- `split_batches`: None |
|
- `include_tokens_per_second`: False |
|
- `include_num_input_tokens_seen`: False |
|
- `neftune_noise_alpha`: None |
|
- `optim_target_modules`: None |
|
- `batch_eval_metrics`: False |
|
- `eval_on_start`: False |
|
- `batch_sampler`: no_duplicates |
|
- `multi_dataset_batch_sampler`: proportional |
|
|
|
</details> |
|
|
|
### Training Logs |
|
<details><summary>Click to expand</summary> |
|
|
|
| Epoch  | Step | Training Loss | Validation Loss | sts-dev_spearman_cosine |
|:------:|:----:|:-------------:|:---------------:|:-----------------------:|
| 0      | 0    | -             | -               | 0.8229                  |
| 0.0178 | 10   | 0.0545        | -               | -                       |
| 0.0355 | 20   | 0.0556        | -               | -                       |
| 0.0533 | 30   | 0.0502        | -               | -                       |
| 0.0710 | 40   | 0.0497        | -               | -                       |
| 0.0888 | 50   | 0.0413        | -               | -                       |
| 0.1066 | 60   | 0.0334        | -               | -                       |
| 0.1243 | 70   | 0.0238        | -               | -                       |
| 0.1421 | 80   | 0.0206        | -               | -                       |
| 0.1599 | 90   | 0.0167        | -               | -                       |
| 0.1776 | 100  | 0.0146        | 0.0725          | 0.8788                  |
| 0.1954 | 110  | 0.0127        | -               | -                       |
| 0.2131 | 120  | 0.0125        | -               | -                       |
| 0.2309 | 130  | 0.0115        | -               | -                       |
| 0.2487 | 140  | 0.0116        | -               | -                       |
| 0.2664 | 150  | 0.0111        | -               | -                       |
| 0.2842 | 160  | 0.0107        | -               | -                       |
| 0.3020 | 170  | 0.0113        | -               | -                       |
| 0.3197 | 180  | 0.0106        | -               | -                       |
| 0.3375 | 190  | 0.0099        | -               | -                       |
| 0.3552 | 200  | 0.0092        | 0.0207          | 0.8856                  |
| 0.3730 | 210  | 0.0097        | -               | -                       |
| 0.3908 | 220  | 0.0099        | -               | -                       |
| 0.4085 | 230  | 0.0087        | -               | -                       |
| 0.4263 | 240  | 0.0087        | -               | -                       |
| 0.4440 | 250  | 0.0082        | -               | -                       |
| 0.4618 | 260  | 0.0083        | -               | -                       |
| 0.4796 | 270  | 0.0089        | -               | -                       |
| 0.4973 | 280  | 0.0082        | -               | -                       |
| 0.5151 | 290  | 0.0078        | -               | -                       |
| 0.5329 | 300  | 0.0081        | 0.0078          | 0.8891                  |
| 0.5506 | 310  | 0.0081        | -               | -                       |
| 0.5684 | 320  | 0.0072        | -               | -                       |
| 0.5861 | 330  | 0.0084        | -               | -                       |
| 0.6039 | 340  | 0.0083        | -               | -                       |
| 0.6217 | 350  | 0.0078        | -               | -                       |
| 0.6394 | 360  | 0.0077        | -               | -                       |
| 0.6572 | 370  | 0.0080        | -               | -                       |
| 0.6750 | 380  | 0.0073        | -               | -                       |
| 0.6927 | 390  | 0.0080        | -               | -                       |
| 0.7105 | 400  | 0.0073        | 0.0058          | 0.8890                  |
| 0.7282 | 410  | 0.0075        | -               | -                       |
| 0.7460 | 420  | 0.0077        | -               | -                       |
| 0.7638 | 430  | 0.0074        | -               | -                       |
| 0.7815 | 440  | 0.0073        | -               | -                       |
| 0.7993 | 450  | 0.0070        | -               | -                       |
| 0.8171 | 460  | 0.0043        | -               | -                       |
| 0.8348 | 470  | 0.0052        | -               | -                       |
| 0.8526 | 480  | 0.0046        | -               | -                       |
| 0.8703 | 490  | 0.0073        | -               | -                       |
| 0.8881 | 500  | 0.0056        | 0.0069          | 0.8922                  |
| 0.9059 | 510  | 0.0059        | -               | -                       |
| 0.9236 | 520  | 0.0045        | -               | -                       |
| 0.9414 | 530  | 0.0033        | -               | -                       |
| 0.9591 | 540  | 0.0058        | -               | -                       |
| 0.9769 | 550  | 0.0056        | -               | -                       |
| 0.9947 | 560  | 0.0046        | -               | -                       |
| 1.0124 | 570  | 0.0030        | -               | -                       |
| 1.0302 | 580  | 0.0039        | -               | -                       |
| 1.0480 | 590  | 0.0032        | -               | -                       |
| 1.0657 | 600  | 0.0031        | 0.0029          | 0.8931                  |
| 1.0835 | 610  | 0.0046        | -               | -                       |
| 1.1012 | 620  | 0.0030        | -               | -                       |
| 1.1190 | 630  | 0.0021        | -               | -                       |
| 1.1368 | 640  | 0.0031        | -               | -                       |
| 1.1545 | 650  | 0.0035        | -               | -                       |
| 1.1723 | 660  | 0.0033        | -               | -                       |
| 1.1901 | 670  | 0.0024        | -               | -                       |
| 1.2078 | 680  | 0.0012        | -               | -                       |
| 1.2256 | 690  | 0.0075        | -               | -                       |
| 1.2433 | 700  | 0.0028        | 0.0036          | 0.8945                  |
| 1.2611 | 710  | 0.0033        | -               | -                       |
| 1.2789 | 720  | 0.0023        | -               | -                       |
| 1.2966 | 730  | 0.0034        | -               | -                       |
| 1.3144 | 740  | 0.0018        | -               | -                       |
| 1.3321 | 750  | 0.0016        | -               | -                       |
| 1.3499 | 760  | 0.0025        | -               | -                       |
| 1.3677 | 770  | 0.0020        | -               | -                       |
| 1.3854 | 780  | 0.0016        | -               | -                       |
| 1.4032 | 790  | 0.0018        | -               | -                       |
| 1.4210 | 800  | 0.0030        | 0.0027          | 0.8944                  |
| 1.4387 | 810  | 0.0018        | -               | -                       |
| 1.4565 | 820  | 0.0008        | -               | -                       |
| 1.4742 | 830  | 0.0014        | -               | -                       |
| 1.4920 | 840  | 0.0025        | -               | -                       |
| 1.5098 | 850  | 0.0026        | -               | -                       |
| 1.5275 | 860  | 0.0012        | -               | -                       |
| 1.5453 | 870  | 0.0010        | -               | -                       |
| 1.5631 | 880  | 0.0010        | -               | -                       |
| 1.5808 | 890  | 0.0012        | -               | -                       |
| 1.5986 | 900  | 0.0021        | 0.0021          | 0.8952                  |
| 1.6163 | 910  | 0.0016        | -               | -                       |
| 1.6341 | 920  | 0.0008        | -               | -                       |
| 1.6519 | 930  | 0.0008        | -               | -                       |
| 1.6696 | 940  | 0.0009        | -               | -                       |
| 1.6874 | 950  | 0.0004        | -               | -                       |
| 1.7052 | 960  | 0.0003        | -               | -                       |
| 1.7229 | 970  | 0.0007        | -               | -                       |
| 1.7407 | 980  | 0.0007        | -               | -                       |
| 1.7584 | 990  | 0.0011        | -               | -                       |
| 1.7762 | 1000 | 0.0007        | 0.0029          | 0.8952                  |
| 1.7940 | 1010 | 0.0008        | -               | -                       |
| 1.8117 | 1020 | 0.0010        | -               | -                       |
| 1.8295 | 1030 | 0.0006        | -               | -                       |
| 1.8472 | 1040 | 0.0006        | -               | -                       |
| 1.8650 | 1050 | 0.0015        | -               | -                       |
| 1.8828 | 1060 | 0.0009        | -               | -                       |
| 1.9005 | 1070 | 0.0005        | -               | -                       |
| 1.9183 | 1080 | 0.0006        | -               | -                       |
| 1.9361 | 1090 | 0.0021        | -               | -                       |
| 1.9538 | 1100 | 0.0009        | 0.0023          | 0.8943                  |
| 1.9716 | 1110 | 0.0007        | -               | -                       |
| 1.9893 | 1120 | 0.0003        | -               | -                       |
|
|
|
</details> |
|
|
|
### Framework Versions |
|
- Python: 3.10.12 |
|
- Sentence Transformers: 3.0.1 |
|
- Transformers: 4.42.3 |
|
- PyTorch: 2.2.0+cu121 |
|
- Accelerate: 0.31.0 |
|
- Datasets: 2.20.0 |
|
- Tokenizers: 0.19.1 |
|
|
|
## Citation |
|
|
|
### BibTeX |
|
|
|
#### Sentence Transformers |
|
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
|
|
|