RadBERT-2m
This is a base model of Radiology-BERT (RadBERT) from UC San Diego and the VA healthcare system. It is initialized from BERT-base-uncased and further trained on 2 million de-identified radiology reports from US VA hospitals. The model achieves stronger medical language understanding performance than previous medical-domain models such as BioBERT, Clinical-BERT, BLUE-BERT, and BioMed-RoBERTa.
Performance is evaluated on three tasks: (a) abnormal sentence classification: classifying sentences in radiology reports as reporting abnormal or normal findings; (b) report coding: assigning a diagnostic code to a given radiology report under each of five coding systems; (c) report summarization: given the findings section of a radiology report, extractively selecting the key sentences that summarize the findings.
It also shows superior performance on other radiology NLP tasks that are not reported in the paper.
For details, see the paper: RadBERT: Adapting transformer-based language models to radiology
How to use
Here is an example of how to use this model to extract the features of a given text in PyTorch:
from transformers import AutoConfig, AutoTokenizer, AutoModel
# Model ID assumed from this card's title; swap in another RadBERT checkpoint
# (e.g. 'zzxslp/RadBERT-RoBERTa-4m') if that is the one you need.
model_id = 'zzxslp/RadBERT-2m'
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, config=config)
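text = "Replace me by any medical text you'd like."
# Tokenize into PyTorch tensors (input_ids, attention_mask, ...)
encoded_input = tokenizer(text, return_tensors='pt')
# output.last_hidden_state holds one feature vector per token: (batch, seq_len, hidden_size)
output = model(**encoded_input)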
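The snippet above yields one feature vector per token. Downstream tasks such as sentence classification usually need a single fixed-size vector per text, and one common way to get it is to average the token vectors while ignoring padding. The following is a minimal sketch of that idea, not the pooling strategy used in the paper; `mean_pool` and `embed_texts` are hypothetical helper names introduced here for illustration, and `model_name` is whichever model ID you load above.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sequence, ignoring padded positions.

    last_hidden_state: (batch, seq_len, hidden) float tensor.
    attention_mask:    (batch, seq_len) tensor, 1 for real tokens, 0 for padding.
    Returns a (batch, hidden) tensor.
    """
    # Broadcast the mask over the hidden dimension: (batch, seq_len, 1)
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    # Sum only the vectors of real tokens: (batch, hidden)
    summed = (last_hidden_state * mask).sum(dim=1)
    # Number of real tokens per sequence; clamp avoids division by zero
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


def embed_texts(texts, model_name):
    """Hypothetical helper: encode a list of strings into one vector each."""
    # Imported lazily so mean_pool can be used without transformers installed
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout so features are deterministic

    # Pad to the longest text in the batch and truncate overly long inputs
    encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # gradients are not needed for feature extraction
        output = model(**encoded)
    return mean_pool(output.last_hidden_state, encoded["attention_mask"])
```

Calling `embed_texts(["No acute cardiopulmonary abnormality."], model_name)` should return a tensor of shape `(1, hidden_size)`, where `hidden_size` comes from the model's config. Taking the first token's vector (`output.last_hidden_state[:, 0]`) is another common choice; which works better depends on the downstream task.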
BibTeX entry and citation info
If you use the model, please cite our paper:
@article{yan2022radbert,
title={RadBERT: Adapting transformer-based language models to radiology},
author={Yan, An and McAuley, Julian and Lu, Xing and Du, Jiang and Chang, Eric Y and Gentili, Amilcare and Hsu, Chun-Nan},
journal={Radiology: Artificial Intelligence},
volume={4},
number={4},
pages={e210258},
year={2022},
publisher={Radiological Society of North America}
}