arxiv:2402.05929

An Interactive Agent Foundation Model

Published on Feb 8

Submitted by akhaliq on Feb 9

#3 Paper of the day


Authors:

Zane Durante, Bidipta Sarkar, Ran Gong, Rohan Taori, Yusuke Noda, Ehsan Adeli, Shrinidhi Kowshika Lakshmikanth, Hoi Vo, Jianfeng Gao, Naoki Wake

Abstract

The development of artificial intelligence systems is transitioning from creating static, task-specific models to dynamic, agent-based systems capable of performing well in a wide range of applications. We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents across a wide range of domains, datasets, and tasks. Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction, enabling a versatile and adaptable AI framework. We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare. Our model demonstrates its ability to generate meaningful and contextually relevant outputs in each area. The strength of our approach lies in its generality, leveraging a variety of data sources such as robotics sequences, gameplay data, large-scale video datasets, and textual information for effective multimodal and multi-task learning. Our approach provides a promising avenue for developing generalist, action-taking, multimodal systems.
