Model Card for ResNet-50 Text Detector

This model was trained to quickly classify whether an image contains legible text. It was trained as a binary classifier on the COCO-Text dataset together with some images from LLaVAR, for a total of ~70k images: 50% of them contained text and 50% had no legible text.

Model Details

How to Get Started with the Model

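import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Load the fine-tuned text-detector weights
model = AutoModelForImageClassification.from_pretrained(
    "miguelcarv/resnet-50-text-detector",
)

# The image is resized manually below, so resizing is disabled in the processor
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", do_resize=False)

# Fetch an example image and resize it to the 256x256 training resolution
url = "http://images.cocodataset.org/train2017/000000044520.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB").resize((256, 256))

inputs = processor(image, return_tensors="pt").pixel_values

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(inputs)

logits_per_image = outputs.logits
probs = logits_per_image.softmax(dim=1)  # per-class probabilities, shape (1, 2)
print(probs)
# tensor([[0.1149, 0.8851]])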
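
To turn the printed probabilities into a yes/no decision, something like the following minimal sketch could be used. It assumes that index 1 is the "contains legible text" class, which this card does not state (check model.config.id2label to confirm). The helper name has_text and the 0.5 default threshold are hypothetical choices for illustration.

```python
from typing import Sequence

# ASSUMPTION: index 1 is the "contains legible text" class. The model card does
# not state the label order; verify with model.config.id2label before use.
TEXT_CLASS_INDEX = 1


def has_text(probs: Sequence[float], threshold: float = 0.5) -> bool:
    """Return True if the assumed text-class probability reaches the threshold."""
    if len(probs) != 2:
        raise ValueError("expected exactly two class probabilities")
    return probs[TEXT_CLASS_INDEX] >= threshold


# Probabilities from the example above, e.g. obtained via probs[0].tolist()
example_probs = [0.1149, 0.8851]
print(has_text(example_probs))                 # True under the index-1 assumption
print(has_text(example_probs, threshold=0.9))  # False, since 0.8851 < 0.9
```

Raising the threshold trades recall for precision, which may matter if the detector is used to filter a large image collection.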

Training Details

  • Trained for three epochs
  • Resolution: 256x256
  • Learning rate: 5e-5
  • Optimizer: AdamW
  • Batch size: 64
  • Trained with FP32
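
For a rough sense of the training budget, the hyperparameters above can be collected into a small config and used to estimate the number of optimizer steps. This is only a sketch: it assumes exactly 70,000 training images (the card says ~70k and does not describe a train/validation split), no gradient accumulation, and that the final partial batch of each epoch is kept. TrainingConfig and total_optimizer_steps are hypothetical names, not part of the original training code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as listed in the Training Details section."""
    epochs: int = 3
    resolution: tuple = (256, 256)
    learning_rate: float = 5e-5
    optimizer: str = "AdamW"
    batch_size: int = 64
    precision: str = "fp32"


def total_optimizer_steps(num_images: int, cfg: TrainingConfig) -> int:
    """Steps over all epochs, keeping the last partial batch of each epoch."""
    steps_per_epoch = math.ceil(num_images / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = TrainingConfig()
NUM_IMAGES = 70_000  # ASSUMPTION: the card only says "~70k" images
print(total_optimizer_steps(NUM_IMAGES, cfg))  # 3282 (1094 steps per epoch x 3)
```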
Model size: 23.6M params (Safetensors, tensor type F32)