Realistic Image Classification with ViTs
This repository contains a pre-trained model for realistic image classification based on the Vision Transformer (ViT) architecture, along with a Python script for running inference on your own images. The model has been fine-tuned on a dataset of 20,000 high-quality images and is aimed especially at Stable Diffusion XL (SDXL) tasks.
Hugging Face Model Hub
You can access and download the pre-trained model from the Hugging Face Model Hub using the following link: Real Classifier Model (Vits)
Requirements
To run the inference script, you need to have the following dependencies installed:
- PyTorch
- Transformers library by Hugging Face
- Pillow (PIL)
You can install these requirements using pip:
pip install torch transformers Pillow
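Once the dependencies are installed, inference might look roughly like the sketch below. It is not the repository's own script: `MODEL_ID` is a hypothetical placeholder that must be replaced with the actual model identifier from the Hugging Face Model Hub, the helper names (`softmax`, `top_prediction`, `classify_image`) are made up for illustration, and it assumes the checkpoint loads through the standard `AutoImageProcessor` and `AutoModelForImageClassification` classes with an `id2label` mapping in its config.

```python
import math

# Hypothetical placeholder: replace with the real model ID from the Hub.
MODEL_ID = "your-username/your-vit-classifier"


def softmax(logits):
    """Convert a list of raw scores into probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify_image(image_path, model_id=MODEL_ID):
    """Classify one image file (assumes a standard ViT classification checkpoint)."""
    # Heavy dependencies are imported here so the helpers above work without them.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():  # no gradients needed for inference
        logits = model(**inputs).logits

    return top_prediction(logits[0].tolist(), model.config.id2label)
```

With a real model ID filled in, a call such as `label, prob = classify_image("path/to/image.jpg")` would return the predicted class name and its probability. The class names themselves come from the checkpoint's configuration, so they are not hard-coded here.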
Feel free to explore the capabilities of this model and contribute to its development by sharing feedback or improvements. If you have any questions or encounter any issues, please don't hesitate to open an issue in this repository.
Evaluation results
- Accuracy (self-reported): 0.924