Finetuning: Problem with dense similarity scores (0.7 to 1.0)?
Great ST! But finetuning gets me bad results:
I could not get good results when finetuning for sentence similarity with the standard HF classes.
Loss functions tried: (online) contrastive loss, cosine similarity loss. Contrastive loss uses a margin, which is usually 0.5. But for your Sentence Transformer this does not make sense, because the similarity scores fall between 0.7 and 1.0 (as you wrote and as I observed myself).
Evaluators tried: BinaryClassificationEvaluator, EmbeddingSimilarityEvaluator. However, results in real usage are bad after finetuning.
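To make the margin concern concrete, here is a minimal sketch. It assumes the textbook contrastive loss on cosine distance (d = 1 - cosine similarity) with margin 0.5; `contrastive_term` is a hypothetical helper written for illustration, not a library function. With scores compressed into 0.7 to 1.0, the distance never exceeds 0.3, so every negative pair stays inside the margin and is always penalized.

```python
def contrastive_term(cos_sim, is_similar, margin=0.5):
    """Per-pair contrastive loss on cosine distance (textbook form, assumed here)."""
    d = 1.0 - cos_sim  # cosine distance
    if is_similar:
        # Positive pairs are pulled together.
        return 0.5 * d ** 2
    # Negative pairs are pushed apart only while they are closer than the margin.
    return 0.5 * max(0.0, margin - d) ** 2


# Negative pairs whose scores sit in the compressed 0.7 to 1.0 range (made-up values).
compressed_negatives = [0.70, 0.80, 0.90]
active = [contrastive_term(s, is_similar=False) > 0 for s in compressed_negatives]

# A negative pair from a model with a wider score range, for comparison.
spread_negative = contrastive_term(0.30, is_similar=False)

print(active, spread_negative)
```

In the compressed range the margin never switches off for any negative, which is why the usual default of 0.5 looks questionable to me.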
Which loss function (with which basic parameters) would you recommend for finetuning for sentence similarity?
In particular, I want to match similar queries.
Thank you for your reply :-)
I recommend using the InfoNCE loss as mentioned in the paper; this loss is not sensitive to the absolute value of the similarity scores.
If you see decreased performance after fine-tuning, please try lowering the learning rate, training for fewer steps, using hard negatives, etc.
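As a rough illustration of that point, the sketch below computes InfoNCE with in-batch negatives on a small made-up similarity matrix and then again after subtracting a constant from every score. The function name `info_nce`, the temperature of 0.05, and the example scores are assumptions for illustration, not values taken from the paper. Because the loss is a softmax cross-entropy over each row, a constant offset cancels out and only the gaps between scores matter.

```python
import math


def info_nce(sim, temperature=0.05):
    """Mean InfoNCE loss for a square matrix where sim[i][i] is the positive pair."""
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the diagonal entry as the target.
        total += log_z - logits[i]
    return total / len(sim)


# Made-up cosine similarities, all inside the compressed 0.7 to 1.0 range.
sim = [
    [0.95, 0.80, 0.75],
    [0.78, 0.92, 0.81],
    [0.74, 0.83, 0.97],
]
# The same scores shifted down by a constant offset.
shifted = [[s - 0.7 for s in row] for row in sim]

loss_a = info_nce(sim)
loss_b = info_nce(shifted)
print(loss_a, loss_b)
```

The two losses agree up to floating-point error. Note that the temperature still scales the gaps between scores, so it remains a parameter worth tuning.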