|
--- |
|
language: en |
|
tags: |
|
- bert |
|
- medical |
|
- clinical |
|
- text-classification |
|
- transformers |
|
thumbnail: https://core.app.datexis.com/static/paper.png |
|
inference: true |
|
widget: |
|
- text: Patient with hypertension presents to ICU. |
|
--- |
|
|
|
# CORe Model - Clinical Diagnosis Prediction |
|
|
|
## Model description |
|
|
|
The CORe (_Clinical Outcome Representations_) model is introduced in the paper [Clinical Outcome Predictions from Admission Notes using Self-Supervised Knowledge Integration](https://www.aclweb.org/anthology/2021.eacl-main.75.pdf). |
|
It is based on BioBERT and further pre-trained on clinical notes, disease descriptions and medical articles with a specialised _Clinical Outcome Pre-Training_ objective. |
|
|
|
This model checkpoint is **fine-tuned on the task of diagnosis prediction**. |
|
The model expects patient admission notes as input and outputs multi-label ICD9-code predictions. |
|
|
|
#### Model Predictions |
|
The model makes predictions on a total of 9237 labels. These contain 3- and 4-digit ICD9 codes and textual descriptions of these codes. The 4-digit codes and textual descriptions help to incorporate further topical and hierarchical information into the model during training (see Section 4.2 _ICD+: Incorporation of ICD Hierarchy_ in our paper). We recommend to only use the **3-digit code predictions at inference time**, because only those have been evaluated in our work. |
|
|
|
#### How to use CORe Diagnosis Prediction |
|
|
|
You can load the model via the transformers library: |
|
``` |
|
from transformers import AutoTokenizer, AutoModelForSequenceClassification |
|
tokenizer = AutoTokenizer.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction") |
|
model = AutoModelForSequenceClassification.from_pretrained("bvanaken/CORe-clinical-diagnosis-prediction") |
|
``` |
|
|
|
The following code shows an inference example: |
|
|
|
``` |
|
input = "CHIEF COMPLAINT: Headaches\n\nPRESENT ILLNESS: 58yo man w/ hx of hypertension, AFib on coumadin presented to ED with the worst headache of his life." |
|
|
|
tokenized_input = tokenizer(input, return_tensors="pt") |
|
output = model(**tokenized_input) |
|
|
|
import torch |
|
predictions = torch.sigmoid(output.logits) |
|
predicted_labels = [model.config.id2label[_id] for _id in (predictions > 0.3).nonzero()[:, 1].tolist()] |
|
``` |
|
Note: For the best performance, we recommend to determine the thresholds (0.3 in this example) individually per label. |
|
|
|
|
|
### More Information |
|
|
|
For all the details about CORe and contact info, please visit [CORe.app.datexis.com](http://core.app.datexis.com/). |
|
|
|
### Cite |
|
|
|
```bibtex |
|
@inproceedings{vanaken21, |
|
author = {Betty van Aken and |
|
Jens-Michalis Papaioannou and |
|
Manuel Mayrdorfer and |
|
Klemens Budde and |
|
Felix A. Gers and |
|
Alexander Löser}, |
|
title = {Clinical Outcome Prediction from Admission Notes using Self-Supervised |
|
Knowledge Integration}, |
|
booktitle = {Proceedings of the 16th Conference of the European Chapter of the |
|
Association for Computational Linguistics: Main Volume, {EACL} 2021, |
|
Online, April 19 - 23, 2021}, |
|
publisher = {Association for Computational Linguistics}, |
|
year = {2021}, |
|
} |
|
``` |