- We found that VLMs can self-improve their reasoning performance through a reflection mechanism and, importantly, that this approach scales with test-time compute.
- Evaluations on comprehensive and diverse vision-language reasoning tasks are included!
Exciting to see open-source models thriving in the computer agent space! 🔥 I just built a demo for OS-ATLAS: A Foundation Action Model For Generalist GUI Agents. Check it out here: maxiw/OS-ATLAS
The demo takes a screenshot and an instruction as input and predicts bounding boxes.