Yue Yang
committed on
Commit
•
da4abd5
1
Parent(s):
a47f4ef
update README
README.md
CHANGED
@@ -2,14 +2,14 @@
|
|
2 |
license: mit
|
3 |
widget:
|
4 |
- src: >-
|
5 |
-
|
6 |
candidate_labels: enlarged heart, pleural effusion
|
7 |
example_title: X-ray of cardiomegaly
|
8 |
library_name: open_clip
|
9 |
pipeline_tag: zero-shot-image-classification
|
10 |
---
|
11 |
|
12 |
-
# Model Card for
|
13 |
|
14 |
# Table of Contents
|
15 |
|
@@ -22,9 +22,7 @@ pipeline_tag: zero-shot-image-classification
|
|
22 |
|
23 |
## Model Details
|
24 |
|
25 |
-
-
|
26 |
-
|
27 |
-
<!-- Provide the basic links for the model. -->
|
28 |
|
29 |
- **Paper:** https://arxiv.org/pdf/2405.14839
|
30 |
- **Website:** https://yueyang1996.github.io/knobo/
|
@@ -35,84 +33,83 @@ pipeline_tag: zero-shot-image-classification
|
|
35 |
|
36 |
Use the code below to get started with the model.
|
37 |
|
38 |
-
|
39 |
-
|
40 |
-
|
41 |
-
## Uses
|
42 |
|
43 |
-
|
|
|
|
|
|
|
44 |
|
45 |
-
|
|
|
|
|
46 |
|
47 |
-
|
|
|
48 |
|
49 |
-
|
|
|
|
|
|
|
|
|
50 |
|
51 |
-
|
52 |
|
53 |
-
|
54 |
-
|
55 |
-
[More Information Needed]
|
56 |
|
57 |
-
### Out-of-Scope Use
|
58 |
|
59 |
-
|
60 |
|
61 |
-
[
|
62 |
|
63 |
-
|
64 |
|
|
|
65 |
|
66 |
-
|
67 |
|
68 |
-
|
69 |
|
70 |
-
|
71 |
|
72 |
-
|
73 |
|
74 |
-
### Preprocessing [optional]
|
75 |
|
76 |
-
|
77 |
|
|
|
|
|
78 |
|
79 |
-
### Training
|
80 |
|
81 |
-
|
82 |
|
83 |
## Evaluation
|
84 |
|
85 |
-
<!-- This section describes the evaluation protocols and provides the results. -->
|
86 |
-
|
87 |
### Testing Data
|
88 |
|
89 |
-
|
90 |
|
91 |
-
|
92 |
-
|
93 |
-
|
94 |
-
### Metrics
|
95 |
-
|
96 |
-
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
97 |
-
|
98 |
-
[More Information Needed]
|
99 |
|
100 |
### Results
|
|
|
101 |
|
102 |
-
[
|
103 |
-
|
104 |
|
105 |
## Citation
|
106 |
|
107 |
-
|
108 |
-
|
109 |
-
**BibTeX:**
|
110 |
|
111 |
```
|
112 |
@article{yang2024textbook,
|
113 |
-
|
114 |
-
|
115 |
-
|
116 |
-
|
117 |
}
|
118 |
```
|
|
|
2 |
license: mit
|
3 |
widget:
|
4 |
- src: >-
|
5 |
+
https://prod-images-static.radiopaedia.org/images/566180/d527ff6fc1482161c9225345c4ab42_big_gallery.jpg
|
6 |
candidate_labels: enlarged heart, pleural effusion
|
7 |
example_title: X-ray of cardiomegaly
|
8 |
library_name: open_clip
|
9 |
pipeline_tag: zero-shot-image-classification
|
10 |
---
|
11 |
|
12 |
+
# Model Card for WhyXrayCLIP 🩻
|
13 |
|
14 |
# Table of Contents
|
15 |
|
|
|
22 |
|
23 |
## Model Details
|
24 |
|
25 |
+
WhyXrayCLIP aligns X-ray images with text descriptions. It is fine-tuned from [OpenCLIP (ViT-L/14)](https://huggingface.co/laion/CLIP-ViT-L-14-laion2B-s32B-b82K) on [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/), using clinical reports processed by GPT-4. In zero-shot and linear-probing evaluations on several chest X-ray datasets, WhyXrayCLIP significantly outperforms PubMedCLIP, BioMedCLIP, and other CLIP baselines (see [Evaluation](#evaluation)). Although our CLIP models benefit from careful data curation, training converges quickly, which suggests that the current contrastive objective may not fully exploit the information in the data and may instead take shortcuts, such as distinguishing images of different patients rather than focusing on diseases. Future research should explore more suitable objectives and larger-scale data collection to develop more robust medical foundation models.
|
|
|
|
|
26 |
|
27 |
- **Paper:** https://arxiv.org/pdf/2405.14839
|
28 |
- **Website:** https://yueyang1996.github.io/knobo/
|
|
|
33 |
|
34 |
Use the code below to get started with the model.
|
35 |
|
36 |
+
```bash
|
37 |
+
pip install open_clip_torch
|
38 |
+
```
|
|
|
39 |
|
40 |
+
```python
|
41 |
+
import torch
|
42 |
+
from PIL import Image
|
43 |
+
import open_clip
|
44 |
|
45 |
+
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:yyupenn/whyxrayclip")
|
46 |
+
model.eval()
|
47 |
+
tokenizer = open_clip.get_tokenizer("ViT-L-14")
|
48 |
|
49 |
+
image = preprocess(Image.open("test_xray.jpg")).unsqueeze(0)
|
50 |
+
text = tokenizer(["enlarged heart", "pleural effusion"])
|
51 |
|
52 |
+
with torch.no_grad(), torch.cuda.amp.autocast():
|
53 |
+
image_features = model.encode_image(image)
|
54 |
+
text_features = model.encode_text(text)
|
55 |
+
image_features /= image_features.norm(dim=-1, keepdim=True)
|
56 |
+
text_features /= text_features.norm(dim=-1, keepdim=True)
|
57 |
|
58 |
+
text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
|
59 |
|
60 |
+
print("Label probs:", text_probs)
|
61 |
+
```
|
|
|
62 |
|
|
|
63 |
|
64 |
+
## Uses
|
65 |
|
66 |
+
As per the original [OpenAI CLIP model card](https://github.com/openai/CLIP/blob/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1/model-card.md), this model is intended as a research output for research communities. We hope that this model will enable researchers to better understand and explore zero-shot medical image (X-ray) classification. We also hope it can be used for interdisciplinary studies of the potential impact of such models.
|
67 |
|
68 |
+
### Direct Use
|
69 |
|
70 |
+
WhyXrayCLIP can be used for zero-shot X-ray classification: it computes the similarity between an X-ray image and each candidate text description, and the best-matching description gives the predicted label.
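As a rough illustration of what computing the similarity involves, the sketch below scores one image embedding against two text embeddings using cosine similarity followed by a scaled softmax, mirroring the `100.0 * image_features @ text_features.T` step in the quick-start snippet. The 3-dimensional embeddings are made-up values and the helper names are hypothetical; real embeddings would come from `model.encode_image` and `model.encode_text`.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def zero_shot_probs(image_emb, text_embs, scale=100.0):
    """Softmax over scaled cosine similarities between one image and several texts."""
    img = normalize(image_emb)
    logits = [scale * sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (illustrative values only, not model outputs).
image_emb = [0.9, 0.1, 0.2]
text_embs = [
    [1.0, 0.0, 0.1],  # stands in for "enlarged heart"
    [0.0, 1.0, 0.3],  # stands in for "pleural effusion"
]
labels = ["enlarged heart", "pleural effusion"]

probs = zero_shot_probs(image_emb, text_embs)
best_label = labels[probs.index(max(probs))]
print(dict(zip(labels, probs)), best_label)
```

The fixed scale of 100 follows the quick-start snippet; with a scale this large, small differences in cosine similarity translate into sharply peaked probabilities.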
|
71 |
|
72 |
+
### Downstream Use
|
73 |
|
74 |
+
WhyXrayCLIP can also serve as a feature extractor: the image and text embeddings it produces can be used as inputs to other downstream tasks.
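One common downstream pattern is a linear probe: keep the encoder frozen and fit a small classifier on the extracted features. The sketch below is a dependency-free logistic regression trained by gradient descent on invented 2-dimensional "features"; the function names, learning rate, and epoch count are assumptions for illustration, and in practice a library implementation such as scikit-learn's `LogisticRegression` would likely be used on real image embeddings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(features, labels, lr=0.5, epochs=200):
    """Fit binary logistic regression with full-batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            # Prediction error for this example drives the gradient.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)

# Invented features standing in for frozen image embeddings.
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = [1, 1, 0, 0]

weights, bias = fit_logistic_regression(features, labels)
predictions = [predict(weights, bias, x) for x in features]
print(predictions)
```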
|
75 |
|
76 |
+
### Out-of-Scope Use
|
77 |
|
78 |
+
WhyXrayCLIP should not be used for clinical diagnosis or treatment. It is not intended to be used for any clinical decision-making. Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
|
79 |
|
|
|
80 |
|
81 |
+
## Training Details
|
82 |
|
83 |
+
### Training Data
|
84 |
+
We use the [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/) dataset, selecting only the PA (posteroanterior) and AP (anteroposterior) X-rays, which yields 243,334 images, each accompanied by a clinical report written by doctors. We preprocess these reports by extracting medically relevant findings, each described as a short, concise term. In total, we assemble 953K image-text pairs for training WhyXrayCLIP.
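The view selection step could look roughly like the sketch below, which keeps only rows whose view position is PA or AP. The column names `dicom_id` and `ViewPosition` are an assumption modeled on the MIMIC-CXR-JPG metadata file, the sample rows are invented, and `select_frontal` is a hypothetical helper rather than the authors' preprocessing code.

```python
import csv
import io

# Invented sample rows; the column names are an assumption and may not
# match the files used in the authors' pipeline.
SAMPLE_METADATA = """dicom_id,ViewPosition
img_001,PA
img_002,LATERAL
img_003,AP
img_004,LL
"""

FRONTAL_VIEWS = {"PA", "AP"}

def select_frontal(metadata_file):
    """Return the IDs of images whose view position is PA or AP."""
    reader = csv.DictReader(metadata_file)
    return [row["dicom_id"] for row in reader if row["ViewPosition"] in FRONTAL_VIEWS]

frontal_ids = select_frontal(io.StringIO(SAMPLE_METADATA))
print(frontal_ids)
```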
|
85 |
|
86 |
+
### Training Procedure
|
87 |
|
88 |
+
We use the training script from [OpenCLIP](https://github.com/mlfoundations/open_clip) and select [ViT-L/14](https://huggingface.co/laion/CLIP-ViT-L-14-laion2B-s32B-b82K) as the backbone. Training is performed on 4 RTX A6000 GPUs for 10 epochs with a batch size of 128 and a learning rate of 1e-5. We select checkpoints by the lowest contrastive loss on the validation sets.
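For readers unfamiliar with the objective, the sketch below shows a CLIP-style symmetric contrastive loss on toy embeddings, followed by picking the checkpoint with the lowest validation loss. It is a plain-Python illustration, not the OpenCLIP implementation: the fixed `scale` stands in for CLIP's learnable logit scale, and the embeddings and validation losses are invented.

```python
import math

def cross_entropy(logits, target_index):
    """Cross-entropy of one row of logits against the index of the correct column."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum_exp - logits[target_index]

def clip_contrastive_loss(image_embs, text_embs, scale=100.0):
    """Symmetric image-to-text and text-to-image loss for L2-normalized embeddings.

    Pair i (image_embs[i], text_embs[i]) is the positive; every other pairing
    in the batch acts as a negative.
    """
    n = len(image_embs)
    sim = [[scale * sum(a * b for a, b in zip(img, txt)) for txt in text_embs]
           for img in image_embs]
    image_to_text = sum(cross_entropy(sim[i], i) for i in range(n)) / n
    text_to_image = sum(cross_entropy([sim[j][i] for j in range(n)], i)
                        for i in range(n)) / n
    return (image_to_text + text_to_image) / 2

# Toy unit-length embeddings (illustrative only).
images = [[1.0, 0.0], [0.0, 1.0]]
matched = clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]])

# Checkpoint selection: keep the epoch with the lowest validation loss.
# These loss values are invented for the example.
val_losses = {1: 0.92, 2: 0.71, 3: 0.74}
best_epoch = min(val_losses, key=val_losses.get)
print(matched, shuffled, best_epoch)
```

Correctly paired embeddings give a loss near zero, while mismatched pairs are penalized heavily, which is the signal that pulls matching image-text pairs together during training.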
|
89 |
|
90 |
## Evaluation
|
91 |
|
|
|
|
|
92 |
### Testing Data
|
93 |
|
94 |
+
We evaluate on five X-ray classification datasets: [Pneumonia](https://pubmed.ncbi.nlm.nih.gov/29474911/), [COVID-QU](https://arxiv.org/pdf/2003.13145), [NIH-CXR](https://www.kaggle.com/datasets/nih-chest-xrays/data), [Open-i](https://www.kaggle.com/datasets/raddar/chest-xrays-indiana-university), and [VinDr-CXR](https://vindr.ai/datasets/cxr). We report zero-shot and linear-probing accuracy on these datasets.
|
95 |
|
96 |
+
### Baselines
|
97 |
+
We compare various CLIP models, including [OpenAI-CLIP](https://huggingface.co/openai/clip-vit-large-patch14), [OpenCLIP](https://huggingface.co/laion/CLIP-ViT-L-14-laion2B-s32B-b82K), [PubMedCLIP](https://huggingface.co/flaviagiammarino/pubmed-clip-vit-base-patch32), [BioMedCLIP](https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224), [PMC-CLIP](https://huggingface.co/ryanyip7777/pmc_vit_l_14), and [MedCLIP](https://github.com/RyanWangZf/MedCLIP). We evaluate these models in both zero-shot and linear-probing settings. In the zero-shot setting, GPT-4 generates prompts for each class, and we use the ensemble of cosine similarities between the image and the prompts as the score for each class. In linear probing, we use the CLIP models as image encoders to extract features for logistic regression. Additionally, we include [DenseNet-121](https://github.com/mlmed/torchxrayvision) (fine-tuned on the pretraining datasets with cross-entropy loss) as a baseline for linear probing.
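The zero-shot prompt ensemble described above can be sketched as follows. This assumes the ensemble is a plain mean of the per-prompt cosine similarities, which the text does not specify; the class names, embeddings, and helper functions are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def ensemble_scores(image_emb, prompt_embs_by_class):
    """Average the image-prompt cosine similarities within each class."""
    return {
        name: sum(cosine(image_emb, p) for p in prompts) / len(prompts)
        for name, prompts in prompt_embs_by_class.items()
    }

# Toy prompt embeddings (invented); in practice each would come from
# encoding one GPT-4-generated prompt with the text encoder.
prompt_embs_by_class = {
    "pneumonia": [[0.9, 0.1], [0.8, 0.3]],
    "normal": [[0.1, 0.9], [0.2, 0.7]],
}
image_emb = [1.0, 0.2]

scores = ensemble_scores(image_emb, prompt_embs_by_class)
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

Averaging over several prompts per class makes the score less sensitive to the wording of any single prompt.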
|
|
|
|
|
|
|
|
|
|
|
|
|
98 |
|
99 |
### Results
|
100 |
+
The figure below shows the zero-shot and linear-probing performance of each model, averaged over the five chest X-ray datasets.
|
101 |
|
102 |
+
![Results](X-ray-results.png)
|
|
|
103 |
|
104 |
## Citation
|
105 |
|
106 |
+
Please cite our paper if you use this model in your work:
|
|
|
|
|
107 |
|
108 |
```
|
109 |
@article{yang2024textbook,
|
110 |
+
title={A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis},
|
111 |
+
author={Yue Yang and Mona Gandhi and Yufei Wang and Yifan Wu and Michael S. Yao and Chris Callison-Burch and James C. Gee and Mark Yatskar},
|
112 |
+
journal={arXiv preprint arXiv:2405.14839},
|
113 |
+
year={2024}
|
114 |
}
|
115 |
```
|