---
license: mit
base_model: microsoft/Florence-2-base
---

[![arXiv][paper-shield]][paper-url]
[![MIT License][license-shield]][license-url]

# TinyClick: Single-Turn Agent for Empowering GUI Automation

Code for running the model from the paper [TinyClick: Single-Turn Agent for Empowering GUI Automation](https://arxiv.org/abs/2410.11871).

## About The Project

We present a single-turn agent for graphical user interface (GUI) interaction tasks, built on the Florence-2-Base vision-language model. The agent's main goal is to click on the desired UI element, given a screenshot and a user command. It demonstrates strong performance on Screenspot and OmniAct, while maintaining a compact size of 0.27B parameters and minimal latency.

## Usage

To set up the environment for running the code, please refer to the [GitHub repository](https://github.com/SamsungLabs/TinyClick). All necessary libraries and dependencies are listed in the `requirements.txt` file.

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image
import requests
import torch

# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

processor = AutoProcessor.from_pretrained(
    "Samsung/TinyClick", trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    "Samsung/TinyClick",
    trust_remote_code=True,
).to(device)

# Download the sample screenshot and make sure it has three channels.
url = "https://huggingface.co/Samsung/TinyClick/resolve/main/sample.png"
img = Image.open(requests.get(url, stream=True).raw).convert("RGB")

command = "click on accept and continue button"
image_size = img.size  # (width, height), needed later for postprocessing

# Build the lowercase prompt from the user command.
input_text = ("What to do to execute the command? " + command.strip()).lower()

# Move the inputs to the same device as the model before generating.
inputs = processor(
    images=img,
    text=input_text,
    return_tensors="pt",
    do_resize=True,
).to(device)

outputs = model.generate(**inputs)
# Keep special tokens: the postprocessing step parses them.
generated_texts = processor.batch_decode(outputs, skip_special_tokens=False)
```

The postprocessing function is available in our GitHub repository: https://github.com/SamsungLabs/TinyClick

```python
from tinyclick_utils import postprocess

result = postprocess(generated_texts[0], image_size)
```

## Citation

```
@misc{pawlowski2024tinyclicksingleturnagentempowering,
    title={TinyClick: Single-Turn Agent for Empowering GUI Automation},
    author={Pawel Pawlowski and Krystian Zawistowski and Wojciech Lapacz and Marcin Skorupa and Adam Wiacek and Sebastien Postansque and Jakub Hoscilowicz},
    year={2024},
    eprint={2410.11871},
    archivePrefix={arXiv},
    primaryClass={cs.HC},
    url={https://arxiv.org/abs/2410.11871},
}
```

## License

This project is released under the MIT license. See `LICENSE` for more information.
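
Returning to the postprocessing step in Usage: for readers who want to see roughly what such a step involves without installing `tinyclick_utils`, here is a minimal sketch. It rests on assumptions that are not confirmed by this README: that the generated text contains Florence-2 style location tokens of the form `<loc_N>`, that the first two tokens encode the x and y of the click point, and that each axis is quantized into 1000 bins. The name `parse_click_point` is hypothetical and is not the repository's `postprocess`, which may return a different structure.

```python
import re

# Assumption: Florence-2 style quantization with 1000 bins per axis.
NUM_BINS = 1000

# Assumption: a click point is encoded as two consecutive location tokens.
LOC_PAIR = re.compile(r"<loc_(\d+)><loc_(\d+)>")


def parse_click_point(text, image_size):
    """Return the (x, y) pixel position of the first location-token pair.

    `image_size` is (width, height), as given by PIL's `Image.size`.
    Returns None when the text contains no location-token pair.
    """
    match = LOC_PAIR.search(text)
    if match is None:
        return None
    width, height = image_size
    x_bin, y_bin = int(match.group(1)), int(match.group(2))
    # Map the centre of each bin back to pixel coordinates.
    x = (x_bin + 0.5) * width / NUM_BINS
    y = (y_bin + 0.5) * height / NUM_BINS
    return int(x), int(y)
```

Under these assumptions, `parse_click_point(generated_texts[0], image_size)` would give a pixel position that could be passed to a click routine. For the action format the model was actually trained to produce, the repository's `postprocess` remains the reference.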


[paper-shield]: https://img.shields.io/badge/2024-arXiv-red
[paper-url]: https://arxiv.org/abs/2410.11871
[license-shield]: https://img.shields.io/badge/License-MIT-yellow.svg
[license-url]: https://opensource.org/licenses/MIT