Yue Yang committed on
Commit
da4abd5
1 Parent(s): a47f4ef

update README

Files changed (1)
  1. README.md +46 -49
README.md CHANGED
---
license: mit
widget:
- src: >-
    https://prod-images-static.radiopaedia.org/images/566180/d527ff6fc1482161c9225345c4ab42_big_gallery.jpg
  candidate_labels: enlarged heart, pleural effusion
  example_title: X-ray of cardiomegaly
library_name: open_clip
pipeline_tag: zero-shot-image-classification
---

# Model Card for WhyXrayCLIP 🩻

# Table of Contents

## Model Details

WhyXrayCLIP aligns chest X-ray images with text descriptions. It is fine-tuned from [OpenCLIP (ViT-L/14)](https://huggingface.co/laion/CLIP-ViT-L-14-laion2B-s32B-b82K) on [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/) using clinical reports processed by GPT-4. WhyXrayCLIP significantly outperforms PubMedCLIP, BioMedCLIP, and other medical CLIP models in zero-shot and linear-probing evaluations on various chest X-ray datasets (see [Evaluation](#evaluation)). While our CLIP models excel with careful data curation, training converges quickly, suggesting that the current contrastive objective may not fully exploit the information in the data and may take shortcuts, such as distinguishing images from different patients instead of focusing on diseases. Future research should explore more suitable objectives and larger-scale data collection to develop more robust medical foundation models.

- **Paper:** https://arxiv.org/pdf/2405.14839
- **Website:** https://yueyang1996.github.io/knobo/
 
Use the code below to get started with the model.

```bash
pip install open_clip_torch
```

```python
import torch
from PIL import Image
import open_clip

# Load WhyXrayCLIP from the Hugging Face Hub and get the matching ViT-L-14 tokenizer.
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:yyupenn/whyxrayclip")
model.eval()
tokenizer = open_clip.get_tokenizer("ViT-L-14")

# Preprocess an X-ray image and tokenize the candidate labels.
image = preprocess(Image.open("test_xray.jpg")).unsqueeze(0)
text = tokenizer(["enlarged heart", "pleural effusion"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Scaled cosine similarities, softmaxed over the candidate labels.
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```
 
## Uses

As per the original [OpenAI CLIP model card](https://github.com/openai/CLIP/blob/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1/model-card.md), this model is intended as a research output for research communities. We hope it will enable researchers to better understand and explore zero-shot medical image (X-ray) classification, and that it can support interdisciplinary studies of the potential impact of such models.

### Direct Use

WhyXrayCLIP can be used for zero-shot X-ray classification: compute the similarity between an X-ray image and a set of text descriptions, and take the best-matching description as the prediction.

### Downstream Use

WhyXrayCLIP can also be used as a feature extractor: embed X-ray images (and text descriptions) and feed the features to downstream models, e.g. a linear probe, as sketched below.

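The following is a minimal sketch of that linear-probing use, not part of the official release: it assumes scikit-learn for the classifier, and the file paths and labels are placeholders you would replace with your own dataset.

```python
# Illustrative only: linear probing with WhyXrayCLIP as a frozen image encoder.
import numpy as np
import torch
import open_clip
from PIL import Image
from sklearn.linear_model import LogisticRegression

model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:yyupenn/whyxrayclip")
model.eval()

@torch.no_grad()
def extract_features(image_paths):
    """Encode each X-ray and return L2-normalized features as a numpy array."""
    feats = []
    for path in image_paths:
        image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        feat = model.encode_image(image)
        feats.append((feat / feat.norm(dim=-1, keepdim=True)).squeeze(0).numpy())
    return np.stack(feats)

# Placeholder paths/labels -- substitute your own labeled X-ray dataset.
train_paths, train_labels = ["xray_001.jpg", "xray_002.jpg"], [0, 1]
test_paths, test_labels = ["xray_003.jpg"], [1]

clf = LogisticRegression(max_iter=1000).fit(extract_features(train_paths), train_labels)
print("Linear-probe accuracy:", clf.score(extract_features(test_paths), test_labels))
```
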
### Out-of-Scope Use

WhyXrayCLIP should not be used for clinical diagnosis or treatment, and it is not intended for any clinical decision-making. Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

## Training Details

### Training Data

We use the [MIMIC-CXR](https://physionet.org/content/mimic-cxr/2.0.0/) dataset, selecting only the PA and AP X-rays, which yields 243,334 images, each accompanied by a clinical report written by doctors. We preprocess these reports by extracting medically relevant findings, each described in a short, concise phrase. In total, we assemble 953K image-text pairs for training WhyXrayCLIP.

### Training Procedure

We use the training script from [OpenCLIP](https://github.com/mlfoundations/open_clip) with [ViT-L/14](https://huggingface.co/laion/CLIP-ViT-L-14-laion2B-s32B-b82K) as the backbone. Training runs on 4 RTX A6000 GPUs for 10 epochs with a batch size of 128 and a learning rate of 1e-5, and we select checkpoints by the lowest contrastive loss on the validation set.

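For illustration only, here is a minimal sketch of the symmetric contrastive (CLIP/InfoNCE) objective that such fine-tuning optimizes; the released model was trained with OpenCLIP's own training script, and the pretrained tag and helper function below are assumptions, not the official training code.

```python
# Illustrative only: one fine-tuning step with the symmetric CLIP contrastive loss.
import torch
import torch.nn.functional as F
import open_clip

# Assumed starting point: the LAION-2B ViT-L/14 checkpoint used as the backbone.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-L-14", pretrained="laion2b_s32b_b82k"
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def training_step(images, texts):
    """images: (B, 3, H, W) preprocessed X-rays; texts: (B, 77) tokenized findings."""
    image_features = F.normalize(model.encode_image(images), dim=-1)
    text_features = F.normalize(model.encode_text(texts), dim=-1)

    # Each image should match its own report finding, and vice versa.
    logits = model.logit_scale.exp() * image_features @ text_features.T
    labels = torch.arange(images.shape[0], device=logits.device)
    loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```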
 
## Evaluation

### Testing Data

We evaluate on 5 X-ray classification datasets: [Pneumonia](https://pubmed.ncbi.nlm.nih.gov/29474911/), [COVID-QU](https://arxiv.org/pdf/2003.13145), [NIH-CXR](https://www.kaggle.com/datasets/nih-chest-xrays/data), [Open-i](https://www.kaggle.com/datasets/raddar/chest-xrays-indiana-university), and [VinDr-CXR](https://vindr.ai/datasets/cxr), reporting zero-shot and linear-probing accuracy on each.

### Baselines

We compare against various CLIP models, including [OpenAI-CLIP](https://huggingface.co/openai/clip-vit-large-patch14), [OpenCLIP](https://huggingface.co/laion/CLIP-ViT-L-14-laion2B-s32B-b82K), [PubMedCLIP](https://huggingface.co/flaviagiammarino/pubmed-clip-vit-base-patch32), [BioMedCLIP](https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224), [PMC-CLIP](https://huggingface.co/ryanyip7777/pmc_vit_l_14), and [MedCLIP](https://github.com/RyanWangZf/MedCLIP), in both zero-shot and linear-probe settings. In the zero-shot setting, GPT-4 generates prompts for each class, and the score for a class is the ensemble of cosine similarities between the image and that class's prompts. In linear probing, we use the CLIP models as image encoders and train a logistic regression on the extracted features. We also include [DenseNet-121](https://github.com/mlmed/torchxrayvision) (fine-tuned on the pretraining datasets with cross-entropy loss) as a baseline for linear probing. A sketch of the zero-shot scoring scheme is given below.

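The following is a rough sketch of that zero-shot scoring scheme, assuming the ensemble is a simple average of cosine similarities; the prompts shown are hypothetical stand-ins for the GPT-4-generated ones.

```python
# Illustrative only: zero-shot scoring with an ensemble of prompts per class.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:yyupenn/whyxrayclip")
model.eval()
tokenizer = open_clip.get_tokenizer("ViT-L-14")

# Hypothetical prompts; in the paper these are generated by GPT-4 for each class.
class_prompts = {
    "cardiomegaly": ["enlarged heart", "increased cardiac silhouette"],
    "pleural effusion": ["pleural effusion", "fluid in the pleural space"],
}

@torch.no_grad()
def classify(image_path):
    image = preprocess(Image.open(image_path)).unsqueeze(0)
    image_feat = model.encode_image(image)
    image_feat /= image_feat.norm(dim=-1, keepdim=True)

    scores = {}
    for label, prompts in class_prompts.items():
        text_feat = model.encode_text(tokenizer(prompts))
        text_feat /= text_feat.norm(dim=-1, keepdim=True)
        # Class score = average cosine similarity between the image and its prompts.
        scores[label] = (image_feat @ text_feat.T).mean().item()
    return max(scores, key=scores.get)

print(classify("test_xray.jpg"))
```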
 
### Results

The figure below shows the average zero-shot and linear-probe performance of each model across the five chest X-ray datasets.

![Results](X-ray-results.png)

## Citation

Please cite our paper if you use this model in your work:

```
@article{yang2024textbook,
  title={A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis},
  author={Yue Yang and Mona Gandhi and Yufei Wang and Yifan Wu and Michael S. Yao and Chris Callison-Burch and James C. Gee and Mark Yatskar},
  journal={arXiv preprint arXiv:2405.14839},
  year={2024}
}
```