Realistic Image Classification with ViT

This repository contains a pre-trained Vision Transformer (ViT) model for realistic image classification, along with a Python script for running inference on your own images. The model was fine-tuned on a dataset of 20,000 high-quality images and is aimed in particular at Stable Diffusion XL (SDXL) tasks.

Hugging Face Model Hub

You can access and download the pre-trained model from the Hugging Face Model Hub using the following link: Real Classifier Model (Vits)
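Before downloading the full weights, it can be useful to check which classes the model predicts. The sketch below fetches only the model configuration through the Transformers `AutoConfig` class and lists the label names by class index. The model id `nekofura/real_classifier` is an assumption inferred from the repository name, and the helper names are hypothetical; substitute the id shown on the Model Hub page if it differs.

```python
# Assumed model id, inferred from the repository name; replace it if the
# Model Hub page shows a different one.
MODEL_ID = "nekofura/real_classifier"


def label_names(id2label):
    """Return the label names ordered by class index."""
    return [id2label[index] for index in sorted(id2label)]


def fetch_label_names(model_id=MODEL_ID):
    """Download only the model config from the Hub and list its labels."""
    # Imported lazily so the pure helper above works without Transformers.
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained(model_id)
    return label_names(config.id2label)
```

Calling `fetch_label_names()` requires network access to the Hub and returns whatever label names the published configuration defines.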

Requirements

To run the inference script, you need to have the following dependencies installed:

PyTorch
Transformers library by Hugging Face
Pillow (PIL)

You can install these requirements using pip:

pip install torch transformers Pillow
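With the dependencies installed, inference can be sketched as follows. This is not the repository's own script, only a minimal illustration built on the generic Transformers image-classification classes. The model id, the function names, and the example file name `photo.jpg` are assumptions made for this sketch.

```python
# Assumed model id, inferred from the repository name; adjust as needed.
MODEL_ID = "nekofura/real_classifier"


def rank_labels(probs, id2label):
    """Pair each probability with its label, highest probability first."""
    pairs = [(id2label[index], prob) for index, prob in enumerate(probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def load_classifier(model_id=MODEL_ID):
    """Load the image processor and the fine-tuned model from the Hub."""
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = AutoModelForImageClassification.from_pretrained(model_id)
    model.eval()  # switch off dropout and other training-only behaviour
    return processor, model


def classify(image_path, processor, model):
    """Return (label, probability) pairs for one image file."""
    import torch
    from PIL import Image

    # Force three RGB channels so grayscale or RGBA files are handled.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():  # no gradients are needed for inference
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1)[0].tolist()
    return rank_labels(probs, model.config.id2label)
```

A typical call would be `processor, model = load_classifier()` followed by `classify("photo.jpg", processor, model)`, which returns the labels defined by the model configuration, sorted from most to least likely.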

Feel free to explore the capabilities of this model and contribute to its development by sharing feedback or improvements. If you have any questions or encounter any issues, please don't hesitate to open an issue in this repository.
