Simple Open-Vocabulary Object Detection with Vision Transformers
Paper • 2205.06230
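Note: OWL-ViT is the seminal paper on open-set (open-vocabulary) object detection. It takes a pretrained CLIP-style image-text model, removes the final token pooling layer of the vision transformer, and adds a classification head and a box head on each output token. The entire model, not only the new heads, is fine-tuned during detection training.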
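To make the per-token head idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the token and text embeddings are random stand-ins for the encoder outputs, the heads are single linear layers rather than trained networks, and all names (`detect`, `class_w`, `box_w`, the dimensions) are hypothetical. It only illustrates the shape of the computation, where every image token yields one box and one similarity score per text query.

```python
import math
import random


def linear(x, w, b):
    """Apply y = W x + b to a single vector (w is a list of rows)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def detect(tokens, text_queries, class_w, class_b, box_w, box_b):
    """Run both heads on every token: one box and one score per text query."""
    queries = [l2_normalize(q) for q in text_queries]
    detections = []
    for tok in tokens:
        # Classification head: project the token into the text embedding space,
        # then score it against each query with cosine similarity.
        class_emb = l2_normalize(linear(tok, class_w, class_b))
        logits = [sum(c * q for c, q in zip(class_emb, query)) for query in queries]
        # Box head: four numbers squashed into (0, 1), read as (cx, cy, w, h).
        box = [sigmoid(z) for z in linear(tok, box_w, box_b)]
        detections.append((box, logits))
    return detections


# Assumed toy sizes; real models use far more tokens and wider embeddings.
rng = random.Random(0)
num_tokens, dim, embed_dim, num_queries = 9, 16, 8, 2
tokens = random_matrix(rng, num_tokens, dim)               # stand-in for unpooled ViT tokens
text_queries = random_matrix(rng, num_queries, embed_dim)  # stand-in for text encoder outputs
class_w, class_b = random_matrix(rng, embed_dim, dim), [0.0] * embed_dim
box_w, box_b = random_matrix(rng, 4, dim), [0.0] * 4

detections = detect(tokens, text_queries, class_w, class_b, box_w, box_b)
print(len(detections), len(detections[0][0]), len(detections[0][1]))  # prints: 9 4 2
```

Because the class scores come from comparing token embeddings with text embeddings, changing the set of detectable categories only means changing the text queries; no head has a fixed number of output classes.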