arxiv:2409.14674

RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning

Published on Sep 23 · Submitted by tnlin on Sep 24

#1 Paper of the day

Authors:

Yinpei Dai, Jayjun Lee, Nima Fazeli

Abstract

Developing robust and correctable visuomotor policies for robotic manipulation is challenging due to the lack of self-recovery mechanisms from failures and the limitations of simple language instructions in guiding robot actions. To address these issues, we propose a scalable data generation pipeline that automatically augments expert demonstrations with failure recovery trajectories and fine-grained language annotations for training. We then introduce Rich languAge-guided failure reCovERy (RACER), a supervisor-actor framework that combines failure recovery data with rich language descriptions to enhance robot control. RACER features a vision-language model (VLM) that acts as an online supervisor, providing detailed language guidance for error correction and task execution, and a language-conditioned visuomotor policy that acts as an actor, predicting the next actions. Our experimental results show that RACER outperforms the state-of-the-art Robotic View Transformer (RVT) on RLBench across various evaluation settings, including standard long-horizon tasks, dynamic goal-change tasks, and zero-shot unseen tasks, achieving superior performance in both simulated and real-world environments. Videos and code are available at: https://rich-language-failure-recovery.github.io.
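The supervisor-actor loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D "gripper" environment, the rule-based `toy_supervisor` (standing in for the VLM), the `toy_actor` (standing in for the language-conditioned visuomotor policy), and all names and thresholds are assumptions chosen only to show the control flow, in which the supervisor emits a language instruction at every step and the actor conditions its next action on it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    """Hypothetical toy observation: 1-D gripper and target positions (meters)."""
    gripper_x: float
    target_x: float


def toy_supervisor(task: str, obs: Observation) -> str:
    """Stand-in for the VLM supervisor: inspects the observation and returns
    a language instruction. A real supervisor would take images as input."""
    error = obs.target_x - obs.gripper_x
    if abs(error) < 0.05:  # assumed alignment tolerance
        return "The gripper is aligned with the target; close the gripper."
    direction = "right" if error > 0 else "left"
    return f"The gripper is misaligned; move {direction} toward the target."


def toy_actor(obs: Observation, instruction: str) -> float:
    """Stand-in for the language-conditioned policy: returns a displacement
    along x. The direction comes from the instruction, the size from the
    observation (capped at an assumed 0.1 m per step)."""
    step = min(0.1, abs(obs.target_x - obs.gripper_x))
    if "move right" in instruction:
        return step
    if "move left" in instruction:
        return -step
    return 0.0


def run_episode(task: str, obs: Observation,
                max_steps: int = 20) -> Tuple[Observation, List[str]]:
    """Alternate supervisor and actor until the supervisor says to grasp."""
    log: List[str] = []
    for _ in range(max_steps):
        instruction = toy_supervisor(task, obs)  # supervisor: language guidance
        log.append(instruction)
        if "close the gripper" in instruction:
            break
        dx = toy_actor(obs, instruction)         # actor: next action
        obs = Observation(obs.gripper_x + dx, obs.target_x)
    return obs, log


if __name__ == "__main__":
    final_obs, instructions = run_episode("pick up the cup", Observation(0.0, 0.32))
    for line in instructions:
        print(line)
```

Because the supervisor is queried at every step, a corrective instruction can be issued as soon as the observation drifts from the goal, which is the property the abstract attributes to online supervision.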

Community

Yinpei

Paper author 5 days ago

https://rich-language-failure-recovery.github.io/

Yinpei

Paper author 5 days ago

In this paper, we found that visuomotor policies trained with rich language instructions and failure recovery behaviors demonstrate superior robustness and adaptability.

Rich language instructions provide more comprehensive details for failure recovery, such as failure analysis, spatial movements, target object attributes, and the expected outcome, and can guide the policy with more accurate control while serving as a form of regularization to prevent overfitting and improve generalization.

Our proposed model, RACER, not only surpasses previous state-of-the-art baselines on standard RLBench tasks, but also excels in handling dynamic task goal changes, zero-shot transfer to unseen tasks, and real-world scenarios.
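As a rough illustration of the four components the comment above attributes to a rich language instruction (failure analysis, spatial movement, target object attributes, expected outcome), here is a minimal sketch of how such an instruction might be represented and flattened into text for a language-conditioned policy. The `RichInstruction` class, its field names, and the example sentences are hypothetical and are not taken from the paper's data format.

```python
from dataclasses import dataclass


@dataclass
class RichInstruction:
    """Hypothetical container for the four kinds of detail named in the comment."""
    failure_analysis: str = ""
    spatial_movement: str = ""
    target_attributes: str = ""
    expected_outcome: str = ""

    def to_text(self) -> str:
        """Join the non-empty components into one instruction string."""
        parts = [
            self.failure_analysis,
            self.spatial_movement,
            self.target_attributes,
            self.expected_outcome,
        ]
        return " ".join(p.strip() for p in parts if p.strip())


# A simple instruction carries only the task goal.
simple = "Pick up the cup."

# A rich instruction adds recovery-relevant detail (made-up example text).
rich = RichInstruction(
    failure_analysis="The gripper closed too early and missed the cup.",
    spatial_movement="Open the gripper and move it slightly to the left.",
    target_attributes="The target is the red cup.",
    expected_outcome="The gripper should end up directly above the cup.",
)

if __name__ == "__main__":
    print(simple)
    print(rich.to_text())
```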

KT313

4 days ago

edited 4 days ago

just out of curiosity, what made you choose that acronym "RACER" specifically? When i first read the title of the paper i was really struggling to see how you got from "Rich Language-Guided Failure Recovery Policies" to "RACER" lol

personally i'd prefer a more boring acronym that's better connected to what it stands for, instead of forcing a cool sounding one. at least actually using the beginnings of the words. Since it's a VLM that does the supervising, you could call it something like "Vision-Language guided error recovery / correction" and shorten it to "VL-GER" / "VL-GEC", just for example.

Maybe not as cool sounding as racer but at least the readers don't have to do mental acrobatics to go from "racer" to "Rich Language-Guided Failure Recovery Policies" every time they read it