mAP drop
I tried to reproduce the results reported on this model card, but the mAP I obtain does not match the value claimed in the model card.
- Claimed mAP: 42.0
- Obtained mAP: 39.7
Here are the details for my validation:
- I instantiate the pre-trained model with transformers.pipeline() and use the COCO API to calculate AP from the detected bounding boxes (a minimal sketch of this setup follows the results below).
- Evaluation was performed on macOS, CPU only.
- The dataset was downloaded from cocodataset.org.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.397
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.590
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.420
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.185
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.431
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.586
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.316
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.470
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.483
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.238
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.525
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.691
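For context, here is a minimal sketch of the evaluation setup described above. The annotation and image paths are placeholders, pycocotools is assumed to be installed, and my actual script differs in minor details:
import os
from PIL import Image
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from transformers import pipeline
# placeholder paths -- adjust to wherever COCO val2017 was downloaded
ann_file = "annotations/instances_val2017.json"
img_dir = "val2017"
coco_gt = COCO(ann_file)
detector = pipeline("object-detection", model="facebook/detr-resnet-50")
results = []
for img_id in coco_gt.getImgIds():
    info = coco_gt.loadImgs(img_id)[0]
    image = Image.open(os.path.join(img_dir, info["file_name"])).convert("RGB")
    for det in detector(image):
        cat_ids = coco_gt.getCatIds(catNms=[det["label"]])
        if not cat_ids:  # skip labels that don't map to a COCO category (e.g. "N/A")
            continue
        box = det["box"]  # xmin/ymin/xmax/ymax in absolute pixels
        results.append({
            "image_id": img_id,
            "category_id": cat_ids[0],
            # COCO expects [x, y, width, height]
            "bbox": [box["xmin"], box["ymin"],
                     box["xmax"] - box["xmin"], box["ymax"] - box["ymin"]],
            "score": det["score"],
        })
coco_dt = coco_gt.loadRes(results)
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()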
Hi,
Thanks for validating. When porting the model to the Hugging Face format, I made sure the logits and pred_boxes match exactly on the same input data, as seen here.
Additionally, the image transformations used during validation can be found here. Images are resized with a minimum size of 800 and a maximum size of 1333. The pipeline uses DetrFeatureExtractor behind the scenes to prepare the images + targets for the model, and it performs the same transformations, as seen here.
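To make the resize rule concrete, here is a small sketch of the shortest-side / longest-side logic in plain Python. This is my own paraphrase of the behaviour described above, not the library code; the exact rounding inside DetrFeatureExtractor may differ slightly:
def resize_shape(height, width, shortest=800, longest=1333):
    # scale so the shorter side becomes `shortest`, but cap the scale
    # so the longer side never exceeds `longest`
    scale = shortest / min(height, width)
    if max(height, width) * scale > longest:
        scale = longest / max(height, width)
    return round(height * scale), round(width * scale)
# a 480x640 image scales cleanly to the 800 target;
# a very wide 400x1200 image is capped by the 1333 longest-side limit instead
print(resize_shape(480, 640), resize_shape(400, 1200))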
Did you evaluate on COCO 2017?
Thanks for your help. Yes, I evaluated on COCO 2017.
Why does the validated model use a different preprocessing transform than the one provided in DetrFeatureExtractor? Could this explain the discrepancy between my 39.7 mAP and the reported 42.0 mAP?
Could you clarify the difference? They should be equivalent.
We can test this by preparing an image using DetrImageProcessor (previously called feature extractor) and the original DETR transforms, like so (after pip installing transformers and git cloning the original DETR repo):
from transformers import DetrImageProcessor
import requests
from PIL import Image
import torch
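# make_coco_transforms lives in the cloned DETR repo (datasets/coco.py), so run this from inside that repo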
from datasets.coco import make_coco_transforms
# load image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
# prepare using image processor
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
pixel_values = processor(image, return_tensors="pt").pixel_values
# prepare using original code
original_transforms = make_coco_transforms("val")
original_pixel_values = original_transforms(image, None)[0].unsqueeze(0)
assert torch.allclose(pixel_values, original_pixel_values, atol=1e-4)
This passes locally for me.
I'd recommend taking a look at this notebook to evaluate the performance: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Evaluating_DETR_on_COCO_validation_2017.ipynb.
I wouldn't use the pipeline to evaluate the model, as it applies a default confidence threshold; detections below that threshold are dropped, which lowers the measured mAP.
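If you want to evaluate without the pipeline, here is a sketch of the intended flow (assuming a recent transformers version that provides post_process_object_detection); threshold=0.0 keeps every detection so the COCO evaluator can do its own score-based ranking:
import torch
import requests
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")
model.eval()
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# keep every prediction (threshold=0.0) and rescale boxes to the original image size,
# so the evaluator sees the full set of detections instead of a pre-filtered one
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.0, target_sizes=target_sizes
)[0]
print(results["scores"].shape, results["labels"].shape, results["boxes"].shape)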
@mhyatt000 we have reproduced the DETR results on our open detection leaderboard: https://huggingface.co/spaces/rafaelpadilla/object_detection_leaderboard.