|
--- |
|
license: apache-2.0 |
|
base_model: microsoft/swinv2-base-patch4-window8-256 |
|
tags: |
|
- pytorch
|
- Swinv2ForImageClassification |
|
- food-classification |
|
- generated_from_trainer |
|
metrics: |
|
- accuracy |
|
- recall |
|
- precision |
|
- f1 |
|
model-index: |
|
- name: Swin-V2-base-Food |
|
results: [] |
|
datasets: |
|
- ItsNotRohit/Food121-224 |
|
- food101 |
|
language: |
|
- en |
|
library_name: transformers |
|
pipeline_tag: image-classification |
|
--- |
|
|
|
|
|
|
# Swin-V2-base-Food |
|
|
|
This model is a fine-tuned version of [microsoft/swinv2-base-patch4-window8-256](https://huggingface.co/microsoft/swinv2-base-patch4-window8-256) on the ItsNotRohit/Food121-224 dataset. |
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.7099 |
|
- Accuracy: 0.8160 |
|
- Recall: 0.8160 |
|
- Precision: 0.8168 |
|
- F1: 0.8159 |
|
|
|
## Model description |
|
|
|
Swin v2 is a Transformer-based vision model that achieves strong accuracy on image classification tasks. It excels thanks to:
|
|
|
- __Hierarchical architecture__: Efficiently captures features at different scales, like CNNs. |
|
- __Shifted windows__: Improves information flow and reduces computational cost. |
|
- __Large model capacity__: Enables accurate and generalizable predictions. |
|
|
|
Swin v2 set new records on ImageNet while using up to 40x less labeled data and training time than comparable large-scale models. It is also versatile, tackling a variety of vision tasks and handling large, high-resolution images.
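As an illustration of the hierarchical design mentioned above, here is a minimal sketch (using the base checkpoint and a dummy input, both assumptions for illustration) that prints the feature-map shape produced at each stage:

```python
import torch
from transformers import Swinv2Model

model = Swinv2Model.from_pretrained("microsoft/swinv2-base-patch4-window8-256")

pixel_values = torch.randn(1, 3, 256, 256)  # dummy 256x256 RGB image
with torch.no_grad():
    outputs = model(pixel_values, output_hidden_states=True)

# downsampling between stages halves each spatial dimension
# and doubles the channel width, much like a CNN backbone
for i, hidden in enumerate(outputs.hidden_states):
    print(f"stage {i}: {tuple(hidden.shape)}")  # (batch, tokens, channels)
```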
|
|
|
The model was fine-tuned on 120 categories of food images.
|
|
|
Use the following code snippet to run inference with the model:
|
|
|
```python
from transformers import pipeline
from PIL import Image

# initialize the image classification pipeline
classifier = pipeline("image-classification", model="arnabdhar/Swin-V2-base-Food")

# load an image and run inference
image = Image.open("path/to/food_image.jpg")  # replace with your image path
results = classifier(image)
print(results)
```
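The pipeline returns `results` as a list of `{"label": ..., "score": ...}` dictionaries for the top predicted classes, sorted by score.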
|
|
|
## Intended uses |
|
|
|
The model can be used for the following tasks: |
|
|
|
- __Food Image Classification__: Classify food images with the Transformers `pipeline` module, as shown above.
- __Base Model for Fine-Tuning__: Treat this model as a base model and fine-tune it on your own custom dataset, as sketched below.
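A minimal sketch of reusing this checkpoint as a base model via the Transformers auto classes; `NUM_LABELS` is a placeholder for your own dataset:

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

NUM_LABELS = 10  # hypothetical: number of classes in your dataset

processor = AutoImageProcessor.from_pretrained("arnabdhar/Swin-V2-base-Food")
model = AutoModelForImageClassification.from_pretrained(
    "arnabdhar/Swin-V2-base-Food",
    num_labels=NUM_LABELS,
    ignore_mismatched_sizes=True,  # swap the 120-class head for a freshly initialized one
)
# the model is now ready to be fine-tuned, e.g. with the Trainer API
```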
|
|
|
|
|
## Training procedure |
|
|
|
Fine-tuning was done on Google Colab with an NVIDIA T4 GPU with 15GB of VRAM. The model was trained for 20,000 steps, which took ~5.5 hours including periodic evaluation of the model.
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 5e-05 |
|
- train_batch_size: 16 |
|
- eval_batch_size: 128 |
|
- seed: 17769929 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_ratio: 0.01 |
|
- training_steps: 20000 |
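For reference, a hedged sketch of a `TrainingArguments` configuration mirroring the values above; `output_dir` and the evaluation cadence are assumptions (the cadence matches the results table below), and the Adam betas/epsilon listed above are the `Trainer` defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Swin-V2-base-Food",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=128,
    seed=17769929,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    max_steps=20_000,
    evaluation_strategy="steps",  # periodic evaluation during training
    eval_steps=2_000,             # assumed; matches the results table
)
```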
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Recall | Precision | F1 | |
|
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|:---------:|:------:| |
|
| 1.5169 | 0.33 | 2000 | 1.2680 | 0.6746 | 0.6746 | 0.7019 | 0.6737 | |
|
| 1.2362 | 0.66 | 4000 | 1.0759 | 0.7169 | 0.7169 | 0.7411 | 0.7178 | |
|
| 1.1076 | 0.99 | 6000 | 0.9757 | 0.7437 | 0.7437 | 0.7593 | 0.7430 | |
|
| 0.9163 | 1.32 | 8000 | 0.9123 | 0.7623 | 0.7623 | 0.7737 | 0.7628 | |
|
| 0.8291 | 1.65 | 10000 | 0.8397 | 0.7807 | 0.7807 | 0.7874 | 0.7796 | |
|
| 0.7949 | 1.98 | 12000 | 0.7724 | 0.7965 | 0.7965 | 0.8014 | 0.7965 | |
|
| 0.6455 | 2.31 | 14000 | 0.7458 | 0.8030 | 0.8030 | 0.8069 | 0.8031 | |
|
| 0.6332 | 2.64 | 16000 | 0.7222 | 0.8110 | 0.8110 | 0.8122 | 0.8106 | |
|
| 0.6132 | 2.98 | 18000 | 0.7021 | 0.8154 | 0.8154 | 0.8170 | 0.8155 | |
|
| 0.57 | 3.31 | 20000 | 0.7099 | 0.8160 | 0.8160 | 0.8168 | 0.8159 | |
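The four reported metrics can be reproduced with a `compute_metrics` callback along these lines, assuming the `evaluate` library; the `weighted` averaging mode is an assumption, not stated in this card:

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")
recall = evaluate.load("recall")
precision = evaluate.load("precision")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # predicted class per example
    return {
        "accuracy": accuracy.compute(predictions=preds, references=labels)["accuracy"],
        "recall": recall.compute(predictions=preds, references=labels, average="weighted")["recall"],
        "precision": precision.compute(predictions=preds, references=labels, average="weighted")["precision"],
        "f1": f1.compute(predictions=preds, references=labels, average="weighted")["f1"],
    }
```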
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.35.2 |
|
- PyTorch 2.1.0+cu121
|
- Datasets 2.15.0 |
|
- Tokenizers 0.15.0 |