---
license: apache-2.0
tags:
- object-detection
- face-mask-detection
datasets:
- coco
- face-mask-detection
widget:
- src: https://drive.google.com/uc?id=1VwYLbGak5c-2P5qdvfWVOeg7DTDYPbro
  example_title: City Folk
- src: >-
    https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg
  example_title: Football Match
metrics:
- average precision
- recall
model-index:
- name: yolos-small-finetuned-masks
  results: []
---
# YOLOS (small-sized) model
The original YOLOS model was fine-tuned on COCO 2017 object detection (118k annotated images). It was introduced in the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Fang et al. and first released in [this repository](https://github.com/hustvl/YOLOS).
This model was further fine-tuned on the [face mask dataset](https://www.kaggle.com/datasets/andrewmvd/face-mask-detection) from Kaggle. The dataset consists of 853 images of people, with annotations categorised as "with mask", "without mask" and "mask not worn correctly". The model was trained for 200 epochs on a single GPU using Google Colab.
## Model description
YOLOS is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, a base-sized YOLOS model is able to achieve 42 AP on COCO validation 2017 (similar to DETR and more complex frameworks such as Faster R-CNN).
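The DETR loss mentioned above treats detection as set prediction: each predicted box is matched one-to-one to a ground-truth box via bipartite (Hungarian) matching before the classification and box losses are computed. As a toy illustration of just the matching step (not the model's actual code), with a made-up cost matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy cost matrix: rows = 4 predicted boxes, cols = 2 ground-truth boxes.
# In DETR/YOLOS the cost mixes class probability with L1 + GIoU box terms;
# the numbers here are invented purely to show the assignment step.
cost = np.array([[0.9, 0.2],
                 [0.1, 0.8],
                 [0.7, 0.6],
                 [0.4, 0.3]])

pred_idx, target_idx = linear_sum_assignment(cost)
print(list(zip(pred_idx, target_idx)))  # optimal one-to-one matching: [(0, 1), (1, 0)]
```

Predictions left unmatched are trained to predict the "no object" class, which is why the model can use a fixed number of detection tokens regardless of how many objects an image contains.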
Intended uses & limitations
You can use the raw model for object detection. See the [model hub](https://huggingface.co/models?search=yolos) to look for all available YOLOS models.
### How to use
Here is how to use this model:
```python
from transformers import YolosFeatureExtractor, YolosForObjectDetection
from PIL import Image
import requests

url = 'https://drive.google.com/uc?id=1VwYLbGak5c-2P5qdvfWVOeg7DTDYPbro'
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = YolosFeatureExtractor.from_pretrained('nickmuchi/yolos-small-finetuned-masks')
model = YolosForObjectDetection.from_pretrained('nickmuchi/yolos-small-finetuned-masks')

inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)

# model predicts bounding boxes and corresponding face mask detection classes
logits = outputs.logits
bboxes = outputs.pred_boxes
```
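To turn these raw outputs into readable detections you can threshold the class probabilities and rescale the normalized boxes. The sketch below assumes the standard DETR-style output convention (softmax over classes with a trailing "no object" class, boxes as normalized center-x, center-y, width, height); the 0.9 confidence threshold is illustrative, not part of the original card:

```python
import torch

# Drop the trailing "no object" class, then keep confident predictions only.
probas = outputs.logits.softmax(-1)[0, :, :-1]
keep = probas.max(-1).values > 0.9  # illustrative threshold

# Rescale normalized (cx, cy, w, h) boxes to pixel-space (xmin, ymin, xmax, ymax).
width, height = image.size
cx, cy, w, h = outputs.pred_boxes[0, keep].unbind(-1)
xyxy = torch.stack([(cx - 0.5 * w) * width,
                    (cy - 0.5 * h) * height,
                    (cx + 0.5 * w) * width,
                    (cy + 0.5 * h) * height], dim=-1)

for p, box in zip(probas[keep], xyxy):
    label = model.config.id2label[p.argmax().item()]
    print(label, round(p.max().item(), 3), [round(v, 1) for v in box.tolist()])
```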
Currently, both the feature extractor and model support PyTorch.
## Training data
The YOLOS model was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively.
## Training
This model was fine-tuned for 200 epochs on the face mask dataset described above.
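The Kaggle dataset ships Pascal VOC-style XML annotations, so fine-tuning requires converting them into the COCO-style targets that `YolosFeatureExtractor` expects. A minimal, illustrative conversion (the label mapping and file path are assumptions based on the dataset's published schema, not taken from this model's training code):

```python
import xml.etree.ElementTree as ET

# Assumed label mapping; the Kaggle dataset uses these three class names.
LABEL2ID = {"with_mask": 0, "without_mask": 1, "mask_weared_incorrect": 2}

def voc_to_coco_target(xml_path, image_id):
    """Parse one Pascal VOC XML file into a COCO-style annotation dict."""
    root = ET.parse(xml_path).getroot()
    annotations = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        annotations.append({
            "image_id": image_id,
            "category_id": LABEL2ID[obj.find("name").text],
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],  # COCO uses (x, y, w, h)
            "area": (xmax - xmin) * (ymax - ymin),
            "iscrowd": 0,
        })
    return {"image_id": image_id, "annotations": annotations}

# target = voc_to_coco_target("path/to/annotation.xml", image_id=0)
# encoding = feature_extractor(images=image, annotations=target, return_tensors="pt")
```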
## Evaluation results
This model achieves an AP (average precision) of 53.2 at IoU=0.50.
```
Accumulating evaluation results...
DONE (t=0.14s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.273
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.532
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.257
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.220
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.341
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.545
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.154
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.361
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.415
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.349
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.469
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.584
```
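The table above is the standard summary printed by pycocotools' `COCOeval`. For reference, a minimal sketch of how such numbers are produced from COCO-format ground truth and detection files (the file names are placeholders, not artifacts of this repository):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: COCO-format ground-truth annotations and model detections.
coco_gt = COCO("ground_truth.json")
coco_dt = coco_gt.loadRes("detections.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints an AP/AR table like the one shown above
```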