---
datasets:
- Matthijs/snacks
model-index:
- name: matteopilotto/vit-base-patch16-224-in21k-snacks
results:
- task:
type: image-classification
name: Image Classification
dataset:
name: Matthijs/snacks
type: Matthijs/snacks
config: default
split: test
metrics:
- name: Accuracy
type: accuracy
value: 0.8928571428571429
verified: true
- name: Precision Macro
type: precision
value: 0.8990033704680036
verified: true
- name: Precision Micro
type: precision
value: 0.8928571428571429
verified: true
- name: Precision Weighted
type: precision
value: 0.8972398709051788
verified: true
- name: Recall Macro
type: recall
value: 0.8914608843537415
verified: true
- name: Recall Micro
type: recall
value: 0.8928571428571429
verified: true
- name: Recall Weighted
type: recall
value: 0.8928571428571429
verified: true
- name: F1 Macro
type: f1
value: 0.892544821273258
verified: true
- name: F1 Micro
type: f1
value: 0.8928571428571429
verified: true
- name: F1 Weighted
type: f1
value: 0.8924168605019522
verified: true
- name: loss
type: loss
value: 0.479541540145874
verified: true
---
# Vision Transformer fine-tuned on `Matthijs/snacks` dataset
A Vision Transformer (ViT) model pre-trained on ImageNet-21k and fine-tuned on [**Matthijs/snacks**](https://huggingface.co/datasets/Matthijs/snacks) for 5 epochs, using various data augmentation transformations from `torchvision`.
The model achieves **94.97%** accuracy on the validation set and **94.43%** on the test set.
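As a quick usage sketch, the fine-tuned checkpoint can be loaded straight from the Hub with the `image-classification` pipeline (the image path below is a placeholder; any local file, URL, or PIL image works):

```python
from transformers import pipeline

# load the fine-tuned checkpoint from the Hub
classifier = pipeline(
    'image-classification',
    model='matteopilotto/vit-base-patch16-224-in21k-snacks'
)

# 'path/to/snack.jpg' is a placeholder for your own image
preds = classifier('path/to/snack.jpg')
print(preds)  # list of {'label': ..., 'score': ...} dicts, highest score first
```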
## Data augmentation pipeline
The code block below shows the various transformations applied during pre-processing to augment the original dataset.
The augmented images were generated on the fly with the `set_transform` method; a sketch of wiring these transforms into the dataset follows the code block.
```python
from transformers import ViTFeatureExtractor
from torchvision.transforms import (
    Compose,
    Normalize,
    Resize,
    RandomResizedCrop,
    RandomHorizontalFlip,
    RandomAdjustSharpness,
    ToTensor
)

checkpoint = 'google/vit-base-patch16-224-in21k'
feature_extractor = ViTFeatureExtractor.from_pretrained(checkpoint)

# transformations on the training set
# note: with the ViTFeatureExtractor API used here, `size` is an int (224)
train_aug_transforms = Compose([
    RandomResizedCrop(size=feature_extractor.size),
    RandomHorizontalFlip(p=0.5),
    RandomAdjustSharpness(sharpness_factor=5, p=0.5),
    ToTensor(),
    Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std),
])

# transformations on the validation/test set
valid_aug_transforms = Compose([
    Resize(size=(feature_extractor.size, feature_extractor.size)),
    ToTensor(),
    Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std),
])
```
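For reference, here is a minimal sketch of how these transforms can be attached to the dataset with `set_transform` so augmentation happens on the fly at access time. The helper function names and the use of the `image` column are assumptions based on the standard layout of `Matthijs/snacks`, not the exact training code:

```python
from datasets import load_dataset

dataset = load_dataset('Matthijs/snacks')

def preprocess_train(batch):
    # apply the training augmentations on-the-fly, one PIL image at a time
    batch['pixel_values'] = [train_aug_transforms(img.convert('RGB')) for img in batch['image']]
    return batch

def preprocess_eval(batch):
    # deterministic resize + normalization for validation and test
    batch['pixel_values'] = [valid_aug_transforms(img.convert('RGB')) for img in batch['image']]
    return batch

# set_transform applies the function lazily whenever examples are accessed,
# so no augmented copies of the dataset are materialized on disk
dataset['train'].set_transform(preprocess_train)
dataset['validation'].set_transform(preprocess_eval)
dataset['test'].set_transform(preprocess_eval)
```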