---
language: es
tags:
- sagemaker
- vit
- ImageClassification
- generated_from_trainer
license: apache-2.0
datasets:
- cifar100
metrics:
- accuracy
model-index:
- name: vit_base-224-in21k-ft-cifar100
  results:
  - task:
      name: Image Classification
      type: image-classification
    dataset:
      name: "Cifar100"
      type: cifar100
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9148
---

# Model vit_base-224-in21k-ft-cifar100

## **A fine-tuned model for image classification**

This model was trained using Amazon SageMaker and the Hugging Face Deep Learning Container.
The base model is the **Vision Transformer (base-sized model)**, a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224x224 pixels. [Link to base model](https://huggingface.co/google/vit-base-patch16-224-in21k)

## Base model citation
### BibTeX entry and citation info

```bibtex
@misc{wu2020visual,
  title={Visual Transformers: Token-based Image Representation and Processing for Computer Vision},
  author={Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda},
  year={2020},
  eprint={2006.03677},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

## Dataset
[Link to dataset description](http://www.cs.toronto.edu/~kriz/cifar.html)

The CIFAR-10 and CIFAR-100 datasets are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
This dataset, CIFAR-100, is just like CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).

Sizes of datasets:
- Train dataset: 50,000
- Test dataset: 10,000
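The split sizes above follow directly from the per-class counts; a quick sanity check in plain Python:

```python
# CIFAR-100: 100 classes, 500 training and 100 test images per class.
classes = 100
train_per_class, test_per_class = 500, 100

train_total = classes * train_per_class  # total training images
test_total = classes * test_per_class    # total test images

print(train_total, test_total)  # → 50000 10000
```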


## Intended uses & limitations

This model is intended for image classification.

## Hyperparameters

```json
{
    "epochs": "5",
    "train_batch_size": "32",
    "eval_batch_size": "8",
    "fp16": "true",
    "learning_rate": "1e-05"
}
```

## Test results

- Accuracy = 0.9148


## Model in action

### Usage for Image Classification

```python
from transformers import ViTFeatureExtractor, ViTForImageClassification
from PIL import Image
import requests

url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224-in21k')
model = ViTForImageClassification.from_pretrained('edumunozsala/vit_base-224-in21k-ft-cifar100')
inputs = feature_extractor(images=image, return_tensors="pt")

outputs = model(**inputs)
logits = outputs.logits
# Pick the highest-scoring class and map it to its label name
predicted_class_idx = logits.argmax(-1).item()
print(model.config.id2label[predicted_class_idx])
```

Created by [Eduardo Muñoz/@edumunozsala](https://github.com/edumunozsala)