@maxiw on Hugging Face: "The new Qwen-2 VL models seem to perform quite well in object detection. You…"

maxiw

posted an update Sep 4

The new Qwen2-VL models seem to perform quite well at object detection. You can prompt them to respond with bounding boxes in a 1000 x 1000 pixel reference frame and then scale those boxes to the original image size.

You can try it out with my space maxiw/Qwen2-VL-Detection
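
For anyone who wants to do the scaling step by hand, here is a minimal sketch. It assumes each box arrives as (x_min, y_min, x_max, y_max) in the 1000 x 1000 reference frame; the function name `scale_box` and the `reference_size` parameter are made up for illustration and are not taken from the space's code.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def scale_box(box: Box, image_width: int, image_height: int,
              reference_size: int = 1000) -> Box:
    """Map a box from the square reference frame to original image pixels.

    Hypothetical helper: the x coordinates are scaled by the image width
    and the y coordinates by the image height, each divided by the size
    of the reference frame.
    """
    x_min, y_min, x_max, y_max = box
    return (
        x_min * image_width / reference_size,
        y_min * image_height / reference_size,
        x_max * image_width / reference_size,
        y_max * image_height / reference_size,
    )


# Example: a box covering the lower middle of the reference frame,
# mapped onto a 1920 x 1080 image.
print(scale_box((250, 500, 750, 1000), image_width=1920, image_height=1080))
# → (480.0, 540.0, 1440.0, 1080.0)
```

Because width and height are scaled independently, the same helper works for non-square images.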

maxiw

Sep 4

According to @simonw, Gemini might also be able to do this, but OpenAI's GPT-4o and Anthropic's Claude 3 and Claude 3.5 models can't.
https://simonwillison.net/2024/Aug/26/gemini-bounding-box-visualization/

fridayfairy

Sep 10

May I ask what dataset was used for fine-tuning on this task? Was LoRA used, and if so, can the LoRA parameters be shared? Looking forward to your reply!

maxiw

Sep 10

@fridayfairy this is not fine-tuned. It's the base model just prompted to return bounding boxes in a specific format. The Qwen2-VL models must have been pre-trained on detection data.

thusinh1969

Oct 6

I truly like the way Qwen2-VL does things. I fine-tuned from it and wow :)