matteopilotto
/

vit-base-patch16-224-in21k-snacks

Image Classification

matteopilotto committed on May 14, 2022

Commit

a28da73

•

1 Parent(s): 05d39bf

Update README.md

Files changed (1)

README.md +44 -1

@@ -1,4 +1,47 @@
 ---
 datasets:
 - Matthijs/snacks
----
+---
+# Vision Transformer fine-tuned on the `Matthijs/snacks` dataset
+Vision Transformer (ViT) model pre-trained on ImageNet-21k and fine-tuned on the [**Matthijs/snacks**](https://huggingface.co/datasets/Matthijs/snacks) dataset for 5 epochs using various data augmentation transformations from `torchvision`.
+The model achieves **94.97%** and **94.43%** accuracy on the validation and test sets, respectively.
+## Data augmentation pipeline
+The code block below shows the various transformations applied during pre-processing to augment the original dataset.
+The augmented images were generated on the fly with the `set_transform` method.
+```python
+from transformers import ViTFeatureExtractor
+from torchvision.transforms import (
+    Compose,
+    Normalize,
+    Resize,
+    RandomResizedCrop,
+    RandomHorizontalFlip,
+    RandomAdjustSharpness,
+    ToTensor
+)
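+checkpoint = 'google/vit-base-patch16-224-in21k'
+feature_extractor = ViTFeatureExtractor.from_pretrained(checkpoint)
+# NOTE: the transforms below assume `feature_extractor.size` is a single int (e.g. 224);
+# newer transformers releases may return a dict instead
+# train: random crop, horizontal flip and sharpness augmentation, then normalization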
+train_aug_transforms = Compose([
+    RandomResizedCrop(size=feature_extractor.size),
+    RandomHorizontalFlip(p=0.5),
+    RandomAdjustSharpness(sharpness_factor=5, p=0.5),
+    ToTensor(),
+    Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std),
+])
+# validation/test: deterministic resize and normalization only (no random augmentation)
+valid_aug_transforms = Compose([
+    Resize(size=(feature_extractor.size, feature_extractor.size)),
+    ToTensor(),
+    Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std),
+])
+```