metadata

inference: false
license: apache-2.0
datasets:
  - VIMA/VIMA-Data
tags:
  - llara
  - robotics
  - vlm
pipeline_tag: object-detection

Model Card

This model is released with the paper "LLaRA: Supercharging Robot Learning Data for Vision-Language Policy".

Xiang Li¹, Cristina Mata¹, Jongwoo Park¹, Kumara Kahatapitiya¹, Yoo Sung Jang¹, Jinghuan Shang¹, Kanchana Ranasinghe¹, Ryan Burgert¹, Mu Cai², Yong Jae Lee², and Michael S. Ryoo¹

¹Stony Brook University ²University of Wisconsin-Madison

Model details

Model type: This repository contains three models, each trained on a different subset of data converted from VIMA-Data. For the conversion code, see convert_vima.ipynb.

Paper or resources for more information: https://github.com/LostXine/LLaRA

Where to send questions or comments about the model: https://github.com/LostXine/LLaRA/issues