---
library_name: transformers
tags: []
---
# Model Card for Sardegna-ViT
This model is a fine-tuned image classifier based on vit-base-patch16-224. It takes as input a picture from Google Maps Street View showing a road and returns a walkability score from 0 (worst score) to 4 (best score).
## How to Use
Load the model with the following code:

```python
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "AEnigmista/Sardegna-ViT", num_labels=5, ignore_mismatched_sizes=True
)
```
For more information on the code, please visit the GitHub repo.
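As a rough guide, running inference on a single Street View image might look like the sketch below. The image file name and the use of the base google/vit-base-patch16-224 image processor are assumptions for illustration, not part of this card; see the GitHub repo for the actual pipeline.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Assumption: the preprocessing matches the base ViT checkpoint.
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = AutoModelForImageClassification.from_pretrained(
    "AEnigmista/Sardegna-ViT", num_labels=5, ignore_mismatched_sizes=True
)
model.eval()

# Assumption: a local Street View picture of a road; replace with your own file.
image = Image.open("street_view_example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# The predicted class index is the walkability score (0 = worst, 4 = best).
score = logits.argmax(dim=-1).item()
print(f"Predicted walkability score: {score}")
```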
## Training Hyper-parameters
This version was trained with the following hyper-parameters (see the sketch after this list):
- fp16 = True
- batch size = 32
- epochs = 10
- learning rate = 1e-4
- optimizer = 'adamw_hf'
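For reference, these settings could be expressed with the transformers Trainer API roughly as in the minimal sketch below; the output directory is a hypothetical placeholder and anything not listed above is left at the TrainingArguments defaults, so this is not the exact configuration from the repo.

```python
from transformers import TrainingArguments

# Sketch of the hyper-parameters listed above; "sardegna-vit-finetune" is a
# hypothetical output directory, not the actual one used for training.
training_args = TrainingArguments(
    output_dir="sardegna-vit-finetune",
    fp16=True,
    per_device_train_batch_size=32,
    num_train_epochs=10,
    learning_rate=1e-4,
    optim="adamw_hf",
)
```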
## Metrics
The metrics used for evaluation are accuracy, recall, precision, MSE, the confusion matrix, and a custom metric called one_out. The one_out accuracy uses the confusion matrix to check how many predictions are within 1 of the ground truth: a prediction of 2 is considered correct if the ground truth is 1, 2, or 3, and incorrect if it is 0 or 4. Since each label is a walkability score, this metric shows how many predictions are correct or close to the expected value, and, conversely, how many are way off (for example, a street with a walkability score of 0 being predicted as 4).
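A minimal sketch of how such a one_out accuracy could be computed is shown below. The card describes the metric as derived from the confusion matrix; the sketch computes the equivalent quantity directly from predictions and labels, and the function name, NumPy usage, and example values are assumptions for illustration (the actual implementation lives in the GitHub repo).

```python
import numpy as np

def one_out_accuracy(predictions: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of predictions within +/- 1 of the ground-truth walkability score."""
    # A prediction counts as correct if it differs from the label by at most 1,
    # e.g. predicting 2 is accepted when the true score is 1, 2, or 3.
    within_one = np.abs(predictions - labels) <= 1
    return float(within_one.mean())

# Hypothetical example: ground-truth scores vs. model predictions.
labels = np.array([0, 1, 2, 3, 4])
predictions = np.array([1, 1, 4, 3, 0])
print(one_out_accuracy(predictions, labels))  # 0.6
```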