Model Details
This model is a Vision Transformer (ViT) based on the 'vit_base_patch16_224' configuration and fine-tuned for satellite image classification on the EuroSAT dataset.
Model type: Vision Transformer (ViT)
Finetuned from model: "timm/vit_base_patch16_224.augreg2_in21k_ft_in1k"
Model Sources
Repository: https://github.com/chathumal93/EuroSat-RGB-Classifiers
Training Details
Training Data
The dataset comprises JPEG composite chips extracted from Sentinel-2 satellite imagery, representing the Red, Green, and Blue bands. It contains 27,000 labeled, geo-referenced images across 10 Land Use and Land Cover (LULC) classes.
Training Procedure
Preprocessing: Standard image preprocessing, including resizing, center cropping, and normalization, plus data augmentation with RandomHorizontalFlip and RandomVerticalFlip.
Training Hyperparameters
- Learning rate: 3e-5
- Batch size: 64
- Optimizer: AdamW
- Scheduler: PolynomialLR
- Loss: CrossEntropyLoss
- Betas: (0.9, 0.999)
- Weight decay: 0.01
- Epochs: 20
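The hyperparameters above map onto PyTorch roughly as in the sketch below. The scheduler's `total_iters` and `power` are not stated in this card, so stepping once per epoch over 20 epochs with `power=1.0` is an assumption, and `build_training_objects` is a hypothetical helper. A small linear layer stands in for the ViT to keep the example light.

```python
import torch
from torch import nn

EPOCHS = 20
BATCH_SIZE = 64  # would be passed to the DataLoader, not used below


def build_training_objects(model: nn.Module):
    """Return (criterion, optimizer, scheduler) using the listed settings."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=3e-5,
        betas=(0.9, 0.999),
        weight_decay=0.01,
    )
    # Assumption: decay the learning rate once per epoch, linearly (power=1.0).
    scheduler = torch.optim.lr_scheduler.PolynomialLR(
        optimizer, total_iters=EPOCHS, power=1.0
    )
    return criterion, optimizer, scheduler


# Stand-in model with 10 outputs, one per EuroSAT class.
stand_in = nn.Linear(8, 10)
criterion, optimizer, scheduler = build_training_objects(stand_in)

# Record the learning rate seen at the start of each epoch.
lr_per_epoch = []
for _ in range(EPOCHS):
    lr_per_epoch.append(optimizer.param_groups[0]["lr"])
    optimizer.step()   # the per-batch training loop would go here
    scheduler.step()
```

Under these assumptions the learning rate starts at 3e-5 and decays towards zero by the end of the final epoch.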
Evaluation
Results
Results for the train, validation, and test datasets at the 8th epoch.
Model | Phase | Avg Loss | Accuracy |
---|---|---|---|
vit-base-patch16-224-eurosat | Train | 0.012038 | 99.61% |
vit-base-patch16-224-eurosat | Validation | 0.023757 | 99.04% |
vit-base-patch16-224-eurosat | Test | 0.040557 | 98.67% |
Model | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
vit-base-patch16-224-eurosat | 98.67% | 0.98673 | 0.98667 | 0.98668 |
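Metrics of the kind reported in the table above can be computed from true and predicted labels as in the following sketch. The card does not say which averaging scheme was used, so the function supports both macro and support-weighted averaging as assumptions; `classification_metrics` is a hypothetical helper written in plain Python.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, average="macro"):
    """Accuracy plus averaged precision, recall and F1 over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}

    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Macro: every class counts equally; weighted: scale by class support.
        if average == "weighted":
            weight = support[c] / len(y_true)
        else:
            weight = 1 / len(labels)
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1

    totals["accuracy"] = sum(1 for t, p in pairs if t == p) / len(pairs)
    return totals


# Toy example with three classes and one misclassified sample.
metrics = classification_metrics([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

With support-weighted averaging, recall always equals accuracy, which is one way to sanity-check which averaging scheme a reported table follows.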