Edit model card
YAML Metadata Warning: empty or missing yaml metadata in repo card (https://huggingface.co/docs/hub/model-cards#model-card-metadata)

Face Detection using DEtection TRansformers from Facebook AI ๐Ÿš€

PyTorch 1.5 + torch vision 0.6 +

This repository includes

  • Training Pipeline for DETR on Custom dataset
  • Wider Face Dataset annotaions and images
  • Evaluation on test dataset
  • Trained weights for Wider Face Dataset in release page
  • Metrics Visualization

About Model

DETR or DEtection TRansformer is Facebookโ€™s newest addition to the market of available deep learning-based object detection solutions. Very simply, it utilizes the transformer architecture to generate predictions of objects and their position in an image. DETR is a joint Convolutional Neural Network (CNN) and Transformer with a feed-forward network as a head. This architecture allows the network to reliably reason about object relations in the image using the powerful multi-head attention mechanism inherent in the Transformer architecture using features extracted by the CNN.

DETR Architecutre

Face Dataset

Dataset Image

I've used WIDER FACE dataset which is a publicly available face detection benchmark dataset, consisting of 32,203 images and label 393,703 faces with a high degree of variability in scale, pose and occlusion as depicted in the sample images. WIDER FACE dataset is organized based on 61 event classes. For each event class, the original dataset was split into 40%/10%/50% as training, validation and testing sets.

By compiling the give code, the dataset will be automatically downloaded but you can download it manually from the official website or from my github release page.

In dataloader/face.py, I set the maximum width of images in the random transform to 800 pixels. This should allow for training on most GPUs, but it is advisable to change back to the original 1333 if your GPU can handle it.

Model

We're going to use DETR with a backbone of Resnet 50, pretrained on COCO 2017 dataset. AP is computed on COCO 2017 val5k, and inference time is over the first 100 val5k COCO images, with torchscript transformer. If you want to use other DETR models, you can find them in model zoo below.

Model Zoo

name backbone schedule inf_time box AP url size
0 DETR R50 500 0.036 42.0 model | logs 159Mb
1 DETR-DC5 R50 500 0.083 43.3 model | logs 159Mb
2 DETR R101 500 0.050 43.5 model | logs 232Mb
3 DETR-DC5 R101 500 0.097 44.9 model | logs 232Mb

Training and Evaluation Steps

Run all the cells of detr_custom_dataset.ipynb to train your model without any errors in Google Colaboratory.

Follow this readme to understand the training pipeline of DETR and evaluation on test images.

Results

COCO Evaluation Metrics on Validation Dataset (After 15 epochs of training)

It took me 4:59:45 hours to finish 15 epochs with batch_size=16 using Tesla P100-PCIE. If you want better accuracy, you can train more epochs.

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.393
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.766
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.370
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.055
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.391
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.615
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.201
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.448
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.194
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.519
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.706

Metrics Visualization

Augmentation methods

For train images,

T.RandomHorizontalFlip(),
            T.RandomSelect(
                T.RandomResize(scales, max_size=800),
                T.Compose([
                    T.RandomResize([400, 500, 600]),
                    T.RandomSizeCrop(384, 600),
                    T.RandomResize(scales, max_size=800),
                ])

For val images,

T.RandomResize([800], max_size=800)

References

DETR Tutorial by thedeepreader

Training DETR on your own dataset by Oliver Gyldenberg Hjermitslev

Facebook AI's original DETR repo

Downloads last month
2
Inference API
Unable to determine this modelโ€™s pipeline type. Check the docs .