NeuronZero committed
Commit 288a011 (1 parent: ab95481)

Update README.md

Files changed (1):
1. README.md (+38 -10)
README.md CHANGED
@@ -1,22 +1,27 @@
-
  ---
  tags:
  - autotrain
  - image-classification
- widget:
- - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/tiger.jpg
-   example_title: Tiger
- - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/teapot.jpg
-   example_title: Teapot
- - src: https://huggingface.co/datasets/mishig/sample_images/resolve/main/palace.jpg
-   example_title: Palace
  datasets:
  - CXR-Classifier/autotrain-data
  ---

- # Model Trained Using AutoTrain

- - Problem type: Image Classification

  ## Validation Metrics
  loss: 0.1180819422006607
@@ -30,3 +35,26 @@ recall: 0.973109243697479
  auc: 0.9916270580630442

  accuracy: 0.9644607843137255

  ---
  tags:
  - autotrain
  - image-classification
+ - vision
  datasets:
  - CXR-Classifier/autotrain-data
+ license: apache-2.0
+ pipeline_tag: image-classification
  ---

+ # CXR-Classifier (Small-size model)
+
+ CXR-Classifier is a fine-tuned version of [Vision Transformer (base-sized model)](https://huggingface.co/google/vit-base-patch16-224), trained on a private dataset.
+
+ ## Model description
+
+ The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. Next, the model was fine-tuned on ImageNet (also referred to as ILSVRC2012), a dataset comprising 1 million images and 1,000 classes, also at resolution 224x224.
+
+ Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. A [CLS] token is added to the beginning of the sequence so it can be used for classification tasks, and absolute position embeddings are added before the sequence is fed to the layers of the Transformer encoder.
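A quick way to see this structure (a minimal sketch, not part of the original card, assuming the `transformers`, `torch`, and `Pillow` packages plus network access to download the weights): a 224x224 input split into 16x16 patches gives 14 x 14 = 196 patches, and with the [CLS] token the encoder sees a sequence of 197 embeddings.

```python
import torch
from transformers import AutoImageProcessor, AutoModel
from PIL import Image

# Load the base ViT encoder that CXR-Classifier was fine-tuned from.
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
encoder = AutoModel.from_pretrained("google/vit-base-patch16-224")

# Any RGB image works here; the processor resizes and normalizes it to 224x224.
image = Image.new("RGB", (512, 512))
inputs = processor(images=image, return_tensors="pt")  # pixel_values: (1, 3, 224, 224)

with torch.no_grad():
    hidden = encoder(**inputs).last_hidden_state

# (224 / 16)^2 = 196 patch embeddings + 1 [CLS] token = 197 positions, each of width 768.
print(hidden.shape)  # torch.Size([1, 197, 768])
```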
+
+ Through pre-training, the model learns an inner representation of images that can then be used to extract features for downstream tasks: if you have a dataset of labeled images, for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of the entire image.
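As a schematic of that idea (a sketch only; it is not the exact head AutoTrain builds for this model, and the class name and label count below are hypothetical):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class ViTLinearHead(nn.Module):
    """Pre-trained ViT encoder with a fresh linear classification layer on the [CLS] token."""

    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("google/vit-base-patch16-224")
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(pixel_values=pixel_values).last_hidden_state
        cls_embedding = hidden[:, 0]           # [CLS] token summarizes the whole image
        return self.classifier(cls_embedding)  # logits over the target classes
```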

+ It was introduced in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Dosovitskiy et al. and first released in [this repository](https://github.com/google-research/vision_transformer). The weights were converted from the [timm repository](https://github.com/rwightman/pytorch-image-models) by Ross Wightman, who had already converted them from JAX to PyTorch; credits go to him.

  ## Validation Metrics
  loss: 0.1180819422006607

  auc: 0.9916270580630442

  accuracy: 0.9644607843137255
+
+ ## How to use
+ Here is how to use the model to identify pneumonia in a chest X-ray image.
+
+ ```python
+ from transformers import AutoImageProcessor, AutoModelForImageClassification
+ from PIL import Image
+ import requests
+
+ processor = AutoImageProcessor.from_pretrained("NeuronZero/CXR-Classifier")
+ model = AutoModelForImageClassification.from_pretrained("NeuronZero/CXR-Classifier")
+
+ # dataset URL: https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia
+
+ # signed, time-limited download link for one pneumonia image from the test split
+ image_url = "https://storage.googleapis.com/kagglesdsdata/datasets/17810/23812/chest_xray/test/PNEUMONIA/person100_bacteria_482.jpeg?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240406%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240406T182943Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=7edc1aa6f7c2b182355aa18dd1e712c88594decce41f4f69f0c1839b1bc549e5e2d9e9b0eadc836ce0daf884c60f84e482eb0313b921bbc613b8de406dfec403df3845cb32c04c6df9efc2469a2f182a58e65c9def260409ac751d6c06302afb00d32205e8072cc773ba37867bd0940b0e45dff3bfb9924ac44f7bc3682f64b99b4ce26160f62484894594b89da602af0fa235cc998cda55b71d1d99bf10b2a7b7829f68e6742440d11ca141efe1af9cdf7bee47afc5be99bacc1d1c8d2c5eb2dc0978fa8f845b0c1e4a53d2641dc3ba8d10fd4161586596c57314b23b94813427b122141f26d5277ce5b63355801e65a1d39a955f6390f704afc024e91c8a47"
+ image = Image.open(requests.get(image_url, stream=True).raw)
+
+ inputs = processor(images=image, return_tensors="pt")
+ outputs = model(**inputs)
+ logits = outputs.logits
+
+ # pick the highest-scoring class and map it back to its label name
+ predicted_class_idx = logits.argmax(-1).item()
+ print("Predicted class:", model.config.id2label[predicted_class_idx])
+ ```
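Optionally, continuing from the snippet above, the logits can be converted into a probability per class instead of only the top label (a small sketch, not part of the original card):

```python
import torch

# convert logits to probabilities and list them per label
probs = torch.softmax(logits, dim=-1)[0]
for idx, p in enumerate(probs.tolist()):
    print(f"{model.config.id2label[idx]}: {p:.3f}")
```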