arXiv:2410.13757

MobA: A Two-Level Agent System for Efficient Mobile Task Automation

Published on Oct 17

Submitted by JamesZhutheThird on Oct 18

Authors:

Zichen Zhu, Kunyao Lan, Situo Zhang

Abstract

Current mobile assistants are limited by dependence on system APIs or struggle with complex user instructions and diverse interfaces due to restricted comprehension and decision-making abilities. To address these challenges, we propose MobA, a novel Mobile phone Agent powered by multimodal large language models that enhances comprehension and planning capabilities through a sophisticated two-level agent architecture. The high-level Global Agent (GA) is responsible for understanding user commands, tracking history memories, and planning tasks. The low-level Local Agent (LA) predicts detailed actions in the form of function calls, guided by sub-tasks and memory from the GA. Integrating a Reflection Module allows for efficient task completion and enables the system to handle previously unseen complex tasks. MobA demonstrates significant improvements in task execution efficiency and completion rate in real-life evaluations, underscoring the potential of MLLM-empowered mobile assistants.
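The two-level control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every class, function, and parameter name below (`GlobalAgent`, `LocalAgent`, `Action`, `run_task`, `max_retries`, and the toy stand-ins) is hypothetical, the MLLM calls are replaced by plain callables, and the reflection step is reduced to recording success or failure in memory, which may differ from how the paper's Reflection Module works.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """A function call emitted by the Local Agent (hypothetical format)."""
    name: str
    args: dict


@dataclass
class GlobalAgent:
    """High-level agent: plans sub-tasks and keeps a history memory."""
    planner: Callable[[str], list[str]]  # stand-in for an MLLM planning call
    memory: list[str] = field(default_factory=list)

    def plan(self, command: str) -> list[str]:
        return self.planner(command)

    def reflect(self, sub_task: str, succeeded: bool) -> None:
        # Simplified reflection: only record the outcome of each attempt.
        status = "done" if succeeded else "failed"
        self.memory.append(f"{sub_task}: {status}")


@dataclass
class LocalAgent:
    """Low-level agent: turns one sub-task plus memory into one action."""
    policy: Callable[[str, list[str]], Action]  # stand-in for an MLLM call

    def act(self, sub_task: str, memory: list[str]) -> Action:
        return self.policy(sub_task, memory)


def run_task(command: str, ga: GlobalAgent, la: LocalAgent,
             execute: Callable[[Action], bool], max_retries: int = 1) -> bool:
    """Run every planned sub-task; retry a failed one up to max_retries times."""
    for sub_task in ga.plan(command):
        for _ in range(max_retries + 1):
            action = la.act(sub_task, ga.memory)
            succeeded = execute(action)
            ga.reflect(sub_task, succeeded)
            if succeeded:
                break
        else:
            return False  # all attempts for this sub-task failed
    return True


# Toy stand-ins (assumptions for illustration only).
def toy_planner(command: str) -> list[str]:
    return [part.strip() for part in command.split(" then ")]


def toy_policy(sub_task: str, memory: list[str]) -> Action:
    return Action(name="tap", args={"target": sub_task})


def toy_execute(action: Action) -> bool:
    return action.name == "tap"
```

With these stand-ins, `run_task("open settings then enable wifi", GlobalAgent(toy_planner), LocalAgent(toy_policy), toy_execute)` plans two sub-tasks, executes one action for each, and records both outcomes in the Global Agent's memory. In a real system the planner and policy would be MLLM calls over screenshots and UI state, and `execute` would dispatch the function call to the device.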

Community

JamesZhutheThird

Paper author Paper submitter 14 days ago

🎮MobA operates mobile phones just as you would, using a two-level agent system that mimics brain functions. The "cerebrum" (Global Agent) comprehends, plans, and reflects🎯, while the "cerebellum" (Local Agent) predicts actions based on current information🕹️. It achieves a scoring rate of 66.2% across 50 real-world scenarios, with execution efficiency comparable to that of human experts.