Face Detection using DEtection TRansformers from Facebook AI 🚀

This repository includes

Training Pipeline for DETR on Custom dataset
Wider Face Dataset annotaions and images
Evaluation on test dataset
Trained weights for Wider Face Dataset in release page
Metrics Visualization

About Model

DETR or DEtection TRansformer is Facebook’s newest addition to the market of available deep learning-based object detection solutions. Very simply, it utilizes the transformer architecture to generate predictions of objects and their position in an image. DETR is a joint Convolutional Neural Network (CNN) and Transformer with a feed-forward network as a head. This architecture allows the network to reliably reason about object relations in the image using the powerful multi-head attention mechanism inherent in the Transformer architecture using features extracted by the CNN.

Face Dataset

I've used WIDER FACE dataset which is a publicly available face detection benchmark dataset, consisting of 32,203 images and label 393,703 faces with a high degree of variability in scale, pose and occlusion as depicted in the sample images. WIDER FACE dataset is organized based on 61 event classes. For each event class, the original dataset was split into 40%/10%/50% as training, validation and testing sets.

By compiling the give code, the dataset will be automatically downloaded but you can download it manually from the official website or from my github release page.

In dataloader/face.py, I set the maximum width of images in the random transform to 800 pixels. This should allow for training on most GPUs, but it is advisable to change back to the original 1333 if your GPU can handle it.

Model

We're going to use DETR with a backbone of Resnet 50, pretrained on COCO 2017 dataset. AP is computed on COCO 2017 val5k, and inference time is over the first 100 val5k COCO images, with torchscript transformer. If you want to use other DETR models, you can find them in model zoo below.

Model Zoo

	name	backbone	schedule	inf_time	box AP	url	size
0	DETR	R50	500	0.036	42.0	model \| logs	159Mb
1	DETR-DC5	R50	500	0.083	43.3	model \| logs	159Mb
2	DETR	R101	500	0.050	43.5	model \| logs	232Mb
3	DETR-DC5	R101	500	0.097	44.9	model \| logs	232Mb

Training and Evaluation Steps

Run all the cells of detr_custom_dataset.ipynb to train your model without any errors in Google Colaboratory.

Follow this readme to understand the training pipeline of DETR and evaluation on test images.

Results

COCO Evaluation Metrics on Validation Dataset (After 15 epochs of training)

It took me 4:59:45 hours to finish 15 epochs with batch_size=16 using Tesla P100-PCIE. If you want better accuracy, you can train more epochs.

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.393
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.766
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.370
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.055
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.391
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.615
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.201
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.448
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.194
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.519
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.706

Metrics Visualization

Augmentation methods

For train images,

T.RandomHorizontalFlip(),
            T.RandomSelect(
                T.RandomResize(scales, max_size=800),
                T.Compose([
                    T.RandomResize([400, 500, 600]),
                    T.RandomSizeCrop(384, 600),
                    T.RandomResize(scales, max_size=800),
                ])

For val images,

T.RandomResize([800], max_size=800)

References

DETR Tutorial by thedeepreader

Training DETR on your own dataset by Oliver Gyldenberg Hjermitslev

Facebook AI's original DETR repo