---
library_name: transformers
tags: []
---
# Model Card for Sardegna-ViT
This model is a fine-tuned image classifier based on vit-base-patch16-224. It takes as input a picture from Google Maps Street View showing a road and returns a walkability score from 0 (worst score) to 4 (best score).
## How to Use
Load the model with the following code:

```python
from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "AEnigmista/Sardegna-ViT", num_labels=5, ignore_mismatched_sizes=True
)
```
For more information on the code, please visit the GitHub repo.
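As a rough guide, running inference on a single Street View image might look like the sketch below. The image file name and the use of the base google/vit-base-patch16-224 image processor are assumptions for illustration, not part of this card; see the GitHub repo for the actual pipeline.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Assumption: the preprocessing matches the base ViT checkpoint.
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = AutoModelForImageClassification.from_pretrained(
    "AEnigmista/Sardegna-ViT", num_labels=5, ignore_mismatched_sizes=True
)
model.eval()

# Assumption: a local Street View picture of a road; replace with your own file.
image = Image.open("street_view_example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# The predicted class index is the walkability score (0 = worst, 4 = best).
score = logits.argmax(dim=-1).item()
print(f"Predicted walkability score: {score}")
```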
## Training Hyper-parameters
This version was trained with the following hyper-parameters (see the sketch after this list):
- fp16 = True
- batch size = 32
- epochs = 10
- learning rate = 1e-4
- optimizer = 'adamw_hf'
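For reference, these settings could be expressed with the transformers Trainer API roughly as in the minimal sketch below; the output directory is a hypothetical placeholder and anything not listed above is left at the TrainingArguments defaults, so this is not the exact configuration from the repo.

```python
from transformers import TrainingArguments

# Sketch of the hyper-parameters listed above; "sardegna-vit-finetune" is a
# hypothetical output directory, not the actual one used for training.
training_args = TrainingArguments(
    output_dir="sardegna-vit-finetune",
    fp16=True,
    per_device_train_batch_size=32,
    num_train_epochs=10,
    learning_rate=1e-4,
    optim="adamw_hf",
)
```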
## Metrics
The metrics used for evaluation are accuracy, recall, precision, MSE, the confusion matrix, and a custom metric called one_out. The one_out accuracy uses the confusion matrix to check how many predictions are within 1 of the ground truth: a prediction of 2 is considered correct if the ground truth is 1, 2, or 3, and incorrect if it is 0 or 4. Since each label is a walkability score, this metric shows how many predictions are correct or close to the expected value, and, conversely, how many are way off (for example, a street with a walkability score of 0 being predicted as 4).
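A minimal sketch of how such a one_out accuracy could be computed is shown below. The card describes the metric as derived from the confusion matrix; the sketch computes the equivalent quantity directly from predictions and labels, and the function name, NumPy usage, and example values are assumptions for illustration (the actual implementation lives in the GitHub repo).

```python
import numpy as np

def one_out_accuracy(predictions: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of predictions within +/- 1 of the ground-truth walkability score."""
    # A prediction counts as correct if it differs from the label by at most 1,
    # e.g. predicting 2 is accepted when the true score is 1, 2, or 3.
    within_one = np.abs(predictions - labels) <= 1
    return float(within_one.mean())

# Hypothetical example: ground-truth scores vs. model predictions.
labels = np.array([0, 1, 2, 3, 4])
predictions = np.array([1, 1, 4, 3, 0])
print(one_out_accuracy(predictions, labels))  # 0.6
```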