ConvNeXt V2 fine-tuned for camera level classification
A ConvNeXt V2 base-size model fine-tuned to classify the camera level (camera angle) of an image. The model was fine-tuned on the Cinescale dataset for 20 epochs.
It assigns an image to one of six classes: aerial, eye, ground, hip, knee, and shoulder.
Evaluation
On the test set (test.csv), the model reaches an accuracy of 89.82% and a macro-F1 of 82.31%.
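Macro-F1 averages the per-class F1 scores with equal weight, so a class with few samples counts as much as a frequent one, which is one way it can end up below accuracy. The following is a minimal sketch of how the two metrics relate, not the evaluation script behind the numbers above; the function names and the label lists are made up for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    classes = sorted(set(y_true) | set(y_pred))
    # F1 = 2*TP / (2*TP + FP + FN); the denominator is never zero here
    # because every class in `classes` appears at least once.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(scores) / len(scores)


# Hypothetical labels: the single "hip" sample is misclassified as "eye".
y_true = ["eye", "eye", "eye", "eye", "aerial", "hip"]
y_pred = ["eye", "eye", "eye", "eye", "aerial", "eye"]

print(f"accuracy: {accuracy(y_true, y_pred):.4f}")  # 0.8333
print(f"macro-F1: {macro_f1(y_true, y_pred):.4f}")  # 0.6296
```

In this toy example one error on a rare class costs a sixth of the accuracy but drives that class's F1 to zero, which pulls the macro average down much further.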
How to use
from transformers import AutoModelForImageClassification
import torch
from torchvision.transforms import v2
from torchvision.io import read_image, ImageReadMode

model = AutoModelForImageClassification.from_pretrained("gullalc/convnextv2-base-22k-384-cinescale-level")
im_size = 384  # side length in pixels of the square model input

# https://www.pexels.com/photo/aerial-view-of-city-buildings-8783146/
image = read_image("demo/level_demo.jpg", mode=ImageReadMode.RGB)  # uint8 tensor, shape (3, H, W)

# Resize, scale pixel values to [0, 1], then normalize with the ImageNet mean and std
transform = v2.Compose([
    v2.Resize((im_size, im_size), antialias=True),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
inputs = transform(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    outputs = model(pixel_values=inputs)

predicted_label = model.config.id2label[torch.argmax(outputs.logits).item()]
print(predicted_label)
# --> aerial
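The snippet above prints only the top class. To see how confident the model is, the logits can be turned into probabilities with a softmax and ranked. The sketch below uses plain Python and made-up logits so that it runs on its own; with the real model the list would come from `outputs.logits[0].tolist()`. The `ID2LABEL` mapping here is an assumed order, and `softmax` and `top_k` are hypothetical helper names; read the actual mapping from `model.config.id2label`.

```python
import math

# Assumed class order for illustration only; use model.config.id2label in practice.
ID2LABEL = {0: "aerial", 1: "eye", 2: "ground", 3: "hip", 4: "knee", 5: "shoulder"}


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label=ID2LABEL, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up logits standing in for outputs.logits[0].tolist()
example_logits = [4.0, 0.5, -1.0, 0.0, -0.5, 1.5]
for label, prob in top_k(example_logits):
    print(f"{label}: {prob:.3f}")
```

A low top probability, or two classes with similar scores, can be a hint that the frame sits between adjacent levels such as hip and knee.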