|
--- |
|
license: apache-2.0 |
|
base_model: microsoft/swinv2-base-patch4-window8-256 |
|
tags: |
|
- pytorch
|
- Swinv2ForImageClassification |
|
- food-classification |
|
- generated_from_trainer |
|
metrics: |
|
- accuracy |
|
- recall |
|
- precision |
|
- f1 |
|
model-index: |
|
- name: Swin-V2-base-Food |
|
results: [] |
|
datasets: |
|
- ItsNotRohit/Food121-224 |
|
- food101 |
|
language: |
|
- en |
|
library_name: transformers |
|
pipeline_tag: image-classification |
|
--- |
|
|
|
|
|
|
# Swin-V2-base-Food |
|
|
|
This model is a fine-tuned version of [microsoft/swinv2-base-patch4-window8-256](https://huggingface.co/microsoft/swinv2-base-patch4-window8-256) on the ItsNotRohit/Food121-224 dataset. |
|
It achieves the following results on the evaluation set: |
|
- Loss: 0.7099 |
|
- Accuracy: 0.8160 |
|
- Recall: 0.8160 |
|
- Precision: 0.8168 |
|
- F1: 0.8159 |
|
|
|
## Model description |
|
|
|
Swin v2 is a Transformer-based vision model that achieves strong accuracy on image classification tasks. It excels thanks to:
|
|
|
- __Hierarchical architecture__: Efficiently captures features at different scales, like CNNs. |
|
- __Shifted windows__: Improves information flow and reduces computational cost. |
|
- __Large model capacity__: Enables accurate and generalizable predictions. |
|
|
|
Swin v2 set new records on ImageNet while using up to 40x less labeled data and training time than comparable large-scale models. It is also versatile, tackling a variety of vision tasks and handling large, high-resolution images.
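As an illustration of the hierarchical design mentioned above, here is a minimal sketch (using the base checkpoint and a dummy input, both assumptions for illustration) that prints the feature-map shape produced at each stage:

```python
import torch
from transformers import Swinv2Model

model = Swinv2Model.from_pretrained("microsoft/swinv2-base-patch4-window8-256")

pixel_values = torch.randn(1, 3, 256, 256)  # dummy 256x256 RGB image
with torch.no_grad():
    outputs = model(pixel_values, output_hidden_states=True)

# downsampling between stages halves each spatial dimension
# and doubles the channel width, much like a CNN backbone
for i, hidden in enumerate(outputs.hidden_states):
    print(f"stage {i}: {tuple(hidden.shape)}")  # (batch, tokens, channels)
```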
|
|
|
The model was fine-tuned on 120 categories of food images.
|
|
|
Use the following code snippet to run inference with the model:
|
|
|
```python
from transformers import pipeline
from PIL import Image

# initialize the image classification pipeline
classifier = pipeline("image-classification", model="arnabdhar/Swin-V2-base-Food")

# load an image and run inference
image = Image.open("path/to/food_image.jpg")  # replace with your image path
results = classifier(image)
print(results)
```
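The pipeline returns `results` as a list of `{"label": ..., "score": ...}` dictionaries for the top predicted classes, sorted by score.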
|
|
|
## Intended uses |
|
|
|
The model can be used for the following tasks: |
|
|
|
- __Food Image Classification__: Classify food images with the Transformers `pipeline` module, as shown above.
- __Base Model for Fine-Tuning__: Treat this model as a base model and fine-tune it on your own custom dataset, as sketched below.
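A minimal sketch of reusing this checkpoint as a base model via the Transformers auto classes; `NUM_LABELS` is a placeholder for your own dataset:

```python
from transformers import AutoImageProcessor, AutoModelForImageClassification

NUM_LABELS = 10  # hypothetical: number of classes in your dataset

processor = AutoImageProcessor.from_pretrained("arnabdhar/Swin-V2-base-Food")
model = AutoModelForImageClassification.from_pretrained(
    "arnabdhar/Swin-V2-base-Food",
    num_labels=NUM_LABELS,
    ignore_mismatched_sizes=True,  # swap the 120-class head for a freshly initialized one
)
# the model is now ready to be fine-tuned, e.g. with the Trainer API
```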
|
|
|
|
|
## Training procedure |
|
|
|
Fine-tuning was done on Google Colab with an NVIDIA T4 GPU with 15GB of VRAM. The model was trained for 20,000 steps, which took ~5.5 hours including periodic evaluation of the model.
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 5e-05 |
|
- train_batch_size: 16 |
|
- eval_batch_size: 128 |
|
- seed: 17769929 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_ratio: 0.01 |
|
- training_steps: 20000 |
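For reference, a hedged sketch of a `TrainingArguments` configuration mirroring the values above; `output_dir` and the evaluation cadence are assumptions (the cadence matches the results table below), and the Adam betas/epsilon listed above are the `Trainer` defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Swin-V2-base-Food",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=128,
    seed=17769929,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    max_steps=20_000,
    evaluation_strategy="steps",  # periodic evaluation during training
    eval_steps=2_000,             # assumed; matches the results table
)
```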
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Recall | Precision | F1 | |
|
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:------:|:---------:|:------:| |
|
| 1.5169 | 0.33 | 2000 | 1.2680 | 0.6746 | 0.6746 | 0.7019 | 0.6737 | |
|
| 1.2362 | 0.66 | 4000 | 1.0759 | 0.7169 | 0.7169 | 0.7411 | 0.7178 | |
|
| 1.1076 | 0.99 | 6000 | 0.9757 | 0.7437 | 0.7437 | 0.7593 | 0.7430 | |
|
| 0.9163 | 1.32 | 8000 | 0.9123 | 0.7623 | 0.7623 | 0.7737 | 0.7628 | |
|
| 0.8291 | 1.65 | 10000 | 0.8397 | 0.7807 | 0.7807 | 0.7874 | 0.7796 | |
|
| 0.7949 | 1.98 | 12000 | 0.7724 | 0.7965 | 0.7965 | 0.8014 | 0.7965 | |
|
| 0.6455 | 2.31 | 14000 | 0.7458 | 0.8030 | 0.8030 | 0.8069 | 0.8031 | |
|
| 0.6332 | 2.64 | 16000 | 0.7222 | 0.8110 | 0.8110 | 0.8122 | 0.8106 | |
|
| 0.6132 | 2.98 | 18000 | 0.7021 | 0.8154 | 0.8154 | 0.8170 | 0.8155 | |
|
| 0.57 | 3.31 | 20000 | 0.7099 | 0.8160 | 0.8160 | 0.8168 | 0.8159 | |
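The four reported metrics can be reproduced with a `compute_metrics` callback along these lines, assuming the `evaluate` library; the `weighted` averaging mode is an assumption, not stated in this card:

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")
recall = evaluate.load("recall")
precision = evaluate.load("precision")
f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # predicted class per example
    return {
        "accuracy": accuracy.compute(predictions=preds, references=labels)["accuracy"],
        "recall": recall.compute(predictions=preds, references=labels, average="weighted")["recall"],
        "precision": precision.compute(predictions=preds, references=labels, average="weighted")["precision"],
        "f1": f1.compute(predictions=preds, references=labels, average="weighted")["f1"],
    }
```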
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.35.2 |
|
- PyTorch 2.1.0+cu121
|
- Datasets 2.15.0 |
|
- Tokenizers 0.15.0 |