---
language:
- en
tags:
- feature-extraction
- pubmed
- sentence-similarity
datasets:
- biu-nlp/abstract-sim-pubmed
---
A model for mapping abstract sentence descriptions to sentences that fit them, trained on PubMed sentences. Use `load_finetuned_model` to load the query and sentence encoders, and `encode_batch()` to encode a batch of sentences with either encoder.
```python
from transformers import AutoTokenizer, AutoModel
import torch


def load_finetuned_model():
    # The sentence encoder and query encoder are separate models that share one tokenizer.
    sentence_encoder = AutoModel.from_pretrained("biu-nlp/abstract-sim-sentence-pubmed", revision="71f4539120e29024adc618173a1ed5fd230ac249")
    query_encoder = AutoModel.from_pretrained("biu-nlp/abstract-sim-query-pubmed", revision="8d34676d80a39bcbc5a1d2eec13e6f8078496215")
    tokenizer = AutoTokenizer.from_pretrained("biu-nlp/abstract-sim-sentence-pubmed")
    return tokenizer, query_encoder, sentence_encoder


def encode_batch(model, tokenizer, sentences, device):
    input_ids = tokenizer(sentences, padding=True, max_length=128, truncation=True,
                          return_tensors="pt", add_special_tokens=True).to(device)
    features = model(**input_ids)[0]
    # Mean-pool the token embeddings, ignoring padding positions via the attention mask.
    features = torch.sum(features * input_ids["attention_mask"].unsqueeze(-1), dim=1) / torch.clamp(
        torch.sum(input_ids["attention_mask"], dim=1, keepdim=True), min=1e-9)
    return features
```
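
The snippet below is a minimal usage sketch showing how the two encoders are combined: encode an abstract description with the query encoder, encode candidate sentences with the sentence encoder, and rank candidates by cosine similarity. The example query and candidate sentences are hypothetical placeholders, not taken from the dataset.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer, query_encoder, sentence_encoder = load_finetuned_model()
query_encoder.to(device).eval()
sentence_encoder.to(device).eval()

# Hypothetical abstract description and candidate sentences for illustration only.
query = "A sentence describing a protein-protein interaction."
sentences = [
    "We show that protein A binds directly to protein B in vitro.",
    "Patients were randomized into two treatment groups.",
]

with torch.no_grad():
    query_emb = encode_batch(query_encoder, tokenizer, [query], device)      # shape (1, d)
    sent_embs = encode_batch(sentence_encoder, tokenizer, sentences, device)  # shape (n, d)

# Rank candidate sentences by cosine similarity to the query embedding.
scores = F.cosine_similarity(query_emb, sent_embs, dim=1)
for sentence, score in sorted(zip(sentences, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}\t{sentence}")
```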