maxiw
posted an update 5 days ago
You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. πŸ’»

import time
from askui import VisionAgent

with VisionAgent() as agent:
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")


Currently these models are accessed through the Gradio Spaces API. Local inference support is planned soon!
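If you're curious what calling one of these Spaces directly looks like, here is a rough sketch using gradio_client. The Space id, endpoint name, and argument order are placeholders assumed for illustration, not the exact endpoint vision-agent uses internally; check the Space's "Use via API" page for the real signature.

from gradio_client import Client, handle_file

# Sketch only: the Space id, api_name, and arguments below are assumptions.
client = Client("Qwen/Qwen2-VL-7B-Instruct")            # placeholder Space id
result = client.predict(
    handle_file("screenshot.png"),                       # screenshot of the current screen
    "search field in the center of the screen",          # element description to ground
    api_name="/predict",                                 # placeholder endpoint name
)
print(result)  # e.g. the predicted location of the element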

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B

Thank you

Hi @maxiw, would you consider integrating our ShowUI?
It's a 2B model built on Qwen2-VL-2B, with strong UI grounding and navigation :)


Hi @KevinQHLin, I integrated ShowUI in the latest release. Really cool model!
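If anyone wants to try it, the usage pattern is the same as for the other models; a minimal sketch, assuming ShowUI is selected through model_name with its Hub id (the exact identifier may differ, see the release notes):

from askui import VisionAgent

with VisionAgent() as agent:
    # Assumed model identifier; check the vision-agent release notes for the exact string.
    agent.click("text 'Images'", model_name="showlab/ShowUI-2B")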