RadBERT-RoBERTa-4m
This is one variant of our RadBERT models, trained on 4 million de-identified medical reports from US VA hospitals. It achieves stronger medical language understanding performance than previous medical-domain models such as BioBERT, Clinical-BERT, BLUE-BERT, and BioMed-RoBERTa.
Performance is evaluated on three tasks: (a) abnormal sentence classification: classifying sentences in radiology reports as reporting abnormal or normal findings; (b) report coding: assigning a diagnostic code to a given radiology report under each of five different coding systems; (c) report summarization: given the findings section of a radiology report, extractively selecting key sentences that summarize the findings.
For details, check out the paper here: RadBERT: Adapting transformer-based language models to radiology
Code for the paper is released at this GitHub repo.
How to use
Here is an example of how to use this model to extract the features of a given text in PyTorch:
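from transformers import AutoConfig, AutoTokenizer, AutoModel

# Load the configuration, tokenizer, and encoder weights from the Hugging Face Hub
config = AutoConfig.from_pretrained('zzxslp/RadBERT-RoBERTa-4m')
tokenizer = AutoTokenizer.from_pretrained('zzxslp/RadBERT-RoBERTa-4m')
model = AutoModel.from_pretrained('zzxslp/RadBERT-RoBERTa-4m', config=config)

text = "Replace me by any medical text you'd like."
# Tokenize into PyTorch tensors and run a forward pass
encoded_input = tokenizer(text, return_tensors='pt')
# output.last_hidden_state holds one feature vector per input token
output = model(**encoded_input)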
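The forward pass above returns one vector per token. If a single fixed-size vector per text is needed (for example, as input to a downstream classifier), one common option is to average the token vectors while ignoring padding. The sketch below is an illustration, not the pooling used in the paper: `mean_pool` and `embed` are hypothetical helper names, and mean pooling is an assumption about how you might want to aggregate the features.

```python
import torch

MODEL_NAME = "zzxslp/RadBERT-RoBERTa-4m"


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, skipping padded positions.

    Hypothetical helper; mean pooling is one common choice, not
    necessarily the aggregation used by the RadBERT authors.
    """
    # (batch, seq_len) -> (batch, seq_len, 1) so it broadcasts over hidden dim
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Guard against division by zero for an all-padding row
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed(texts, model_name=MODEL_NAME):
    """Return one pooled vector per input text (downloads the model on first call)."""
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout for deterministic features

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for feature extraction
        output = model(**batch)
    return mean_pool(output.last_hidden_state, batch["attention_mask"])
```

Calling `embed(["No acute cardiopulmonary abnormality.", "Small left pleural effusion."])` would return a tensor with one row per sentence, with the row width equal to the model's hidden size.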
BibTeX entry and citation info
If you use the model, please cite our paper:
@article{yan2022radbert,
title={RadBERT: Adapting transformer-based language models to radiology},
author={Yan, An and McAuley, Julian and Lu, Xing and Du, Jiang and Chang, Eric Y and Gentili, Amilcare and Hsu, Chun-Nan},
journal={Radiology: Artificial Intelligence},
volume={4},
number={4},
pages={e210258},
year={2022},
publisher={Radiological Society of North America}
}