Florence-2 has a great capability of detecting various objects in a zero-shot setting with the task prompt "<OD>". However, if you want to detect specific objects that the base model is not able to in its current form, you can easily finetune it for this particular task. Below I show how to finetune the model to detect tables in a given image, but a similar process can be applied to detect any objects. Thanks to @andito, @merve, and @SkalskiP for sharing the fix for finetuning the Florence-2 model. Please also check their great blog post at https://huggingface.co/blog/finetune-florence2.