Edit model card
YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

This is finetune version of SimCSE: Simple Contrastive Learning of Sentence Embeddings

  • Train supervised on 100K triplet samples samples related to stroke domain from : stroke books, quora medical, quora's stroke, quora's general and human annotates.
  • Positive sentences are generated by paraphrasing and back-translate.
  • Negative sentences are randomly selected in general domain.

Extract sentence representation

from transformers import AutoTokenizer, AutoModel  
tokenizer = AutoTokenizer.from_pretrained("demdecuong/stroke_sup_simcse")
model = AutoModel.from_pretrained("demdecuong/stroke_sup_simcse")

text = "What are disease related to red stroke's causes?"
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)[1]

Build up embedding for database

database = [
    'What is the daily checklist for stroke returning home',
    'What are some tips for stroke adapt new life',
    'What  should I consider when using nursing-home care'
]

embedding = torch.zeros((len(database),768))

for i in range(len(database)):
  inputs = tokenizer(database[i], return_tensors="pt")
  outputs = model(**inputs)[1]
  embedding[i] = outputs

print(embedding.shape)

Result

On our company's PoC project, the testset contains positive/negative pairs of matching question related to stroke from human-generation.

  • SimCSE supervised + 100k : Train on 100K triplet samples contains : medical, stroke and general domain
  • SimCSE supervised + 42k : Train on 42K triplet samples contains : medical, stroke domain
Model Top-1 Accuracy
SimCSE supervised (author) 75.83
SimCSE unsupervised (ours) 76.66
SimCSE supervised + 100k (ours) 73.33
SimCSE supervised + 42k (ours) 75.83
Downloads last month
31
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.