The new Qwen2-VL models seem to perform quite well at object detection. You can prompt them to respond with bounding boxes in a 1000 x 1000 pixel reference frame and then scale those boxes back to the original image size.
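Here is a minimal sketch of the scaling step. It assumes the model's answer has already been parsed into an (x1, y1, x2, y2) box in the 1000 x 1000 frame; the function name `scale_box` and its parameters are made up for illustration and are not taken from the space's code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def scale_box(box: Box, image_width: int, image_height: int,
              ref_size: int = 1000) -> Tuple[int, int, int, int]:
    """Map an (x1, y1, x2, y2) box from a ref_size x ref_size frame to pixels.

    Hypothetical helper: assumes corner-format boxes and a square
    reference frame (1000 by default, as described in the post).
    """
    x1, y1, x2, y2 = box

    def to_pixels(value: float, image_size: int) -> int:
        # Scale by image_size / ref_size, then clamp to the image bounds
        # in case the model returns a value slightly outside the frame.
        scaled = round(value * image_size / ref_size)
        return max(0, min(image_size, scaled))

    return (
        to_pixels(x1, image_width),
        to_pixels(y1, image_height),
        to_pixels(x2, image_width),
        to_pixels(y2, image_height),
    )


if __name__ == "__main__":
    # A box covering the middle of the reference frame, mapped to a 1920x1080 image.
    print(scale_box((100, 200, 500, 800), image_width=1920, image_height=1080))
```

The x and y coordinates are scaled independently, so this works for non-square images too.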
You can try it out with my space maxiw/Qwen2-VL-Detection