# Vision Transformer (base-sized model)

Vision Transformer (ViT) model trained on the [Chaoyang dataset](https://paperswithcode.com/dataset/chaoyang) at resolution 384x384, using a fixed 10% of the training set as the validation set and evaluated on the official test set with the checkpoint that achieved the best validation loss.
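
The feature extractor and model support PyTorch, so here is a minimal inference sketch using the Hugging Face `transformers` library. The repository id and image path below are placeholders, not the actual identifiers of this model.

```python
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image

model_id = "OWNER/vit-base-chaoyang"  # placeholder: substitute the real Hub repo id
processor = ViTImageProcessor.from_pretrained(model_id)
model = ViTForImageClassification.from_pretrained(model_id)

image = Image.open("histology_patch.png").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")  # resizes to 384x384 and normalizes
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```
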
# Augmentation pipeline
To address the issue of class imbalance in our training set, we performed oversampling with repetition.
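
As a concrete illustration, here is a hypothetical PyTorch sketch of oversampling with repetition: each example is weighted inversely to its class frequency and drawn with replacement, so minority-class samples repeat within an epoch. The toy tensors only stand in for the real training set, and this is not necessarily the exact pipeline used here.

```python
from collections import Counter
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy imbalanced dataset standing in for the real training set
# (tiny tensors instead of 384x384 images keep the sketch light)
images = torch.randn(100, 3, 8, 8)
labels = torch.tensor([0] * 70 + [1] * 20 + [2] * 7 + [3] * 3)
train_dataset = TensorDataset(images, labels)

counts = Counter(labels.tolist())
weights = [1.0 / counts[int(y)] for y in labels]  # rarer classes get larger weights

# Drawing with replacement repeats minority-class examples within each epoch
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
```
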
## Training data

The ViT model was fine-tuned on the [Chaoyang dataset](https://paperswithcode.com/dataset/chaoyang) at resolution 384x384, using a fixed 10% of the training set as the validation set.

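For illustration, one way to hold out a fixed 10% validation set is a seeded random split, as in the sketch below; the actual split indices and seed used for this model are not published, so everything here is an assumption.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset; in practice this would be the Chaoyang training set
train_dataset = TensorDataset(torch.randn(100, 3, 8, 8), torch.randint(0, 4, (100,)))

n_val = int(0.1 * len(train_dataset))  # fixed 10% held out for validation
train_subset, val_subset = random_split(
    train_dataset,
    [len(train_dataset) - n_val, n_val],
    generator=torch.Generator().manual_seed(42),  # fixed seed keeps the split stable across runs
)
```
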
## Training procedure
The exact details of preprocessing of images during training/validation can be found [here](https://github.com/google-research/vision_transformer/blob/master/vit_jax/input_pipeline.py).

Images are resized/rescaled to the same resolution 384x384 during training and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).

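The following torchvision sketch mirrors that description; it is an approximation for convenience, and the linked `vit_jax` input pipeline remains the authoritative reference for the exact training-time preprocessing.

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((384, 384)),              # resize/rescale every image to 384x384
    transforms.ToTensor(),                      # convert to a tensor scaled to [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],  # per-channel normalization
                         std=[0.5, 0.5, 0.5]),
])
```
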
# License
This model is provided for non-commercial use only and may not be used in any research or publication without prior written consent from the author.