Commit aa9579e
Parent(s): 307c120

Update README.md

Files changed (1):
  1. README.md +28 -13
README.md CHANGED
@@ -2,35 +2,50 @@
  license: mit
  language:
  - en
  ---
  # RDT-1B

- RDT-1B is a 1B-parameter imitation learning Diffusion Transformer pre-trained on 1M+ multi-robot episodes. Given a language instruction and 3-view RGB image observations, RDT can predict the next
- 64 robot actions. RDT is inherently compatible with almost all kinds of modern mobile manipulators, from single-arm to dual-arm, joint to EEF, pos. to vel., and even with a mobile chassis.

- All the [code]() and pretrained model weights are licensed under MIT license.

  Please refer to our [project page](https://rdt-robotics.github.io/rdt-robotics/) and [paper]() for more information.

  ## Model Details

- - **Developed by** RDT team from Tsinghua University
  - **License:** MIT
  - **Language(s) (NLP):** en
- - **Model Architecture:** Diffusion Transformer
- - **Pretrain dataset:** Curated pretrain dataset collected from 46 datasets. Please see [here]() for detail
  - **Repository:** [repo_url]
  - **Paper:** [paper_url]
  - **Project Page:** https://rdt-robotics.github.io/rdt-robotics/

  ## Uses

- RDT takes language instruction, image observations and proprioception as input, and predicts the next 64 robot actions in the form of unified action space vector.
- The unified action space vector includes all the main physical quantities of robots (e.g. the end-effector and joint, position and velocity, base movement, etc.) and can be applied to a wide range of robotic embodiments.

- The pre-trained RDT model can be fine-tuned for specific robotic embodiment and deployed on real-world robots.
- Here's an example of how to use the RDT-1B model for inference on a Mobile-ALOHA robot:

  ```python
  # Clone the repository and install dependencies
  from scripts.agilex_model import create_model
@@ -43,7 +58,7 @@ config = {
  'camera_names': CAMERA_NAMES,
  }
  pretrained_vision_encoder_name_or_path = "google/siglip-so400m-patch14-384"
- # Create the model with specified configuration
  model = create_model(
  args=config,
  dtype=torch.bfloat16,
@@ -64,8 +79,8 @@ actions = policy.step(
  )
  ```

- RDT-1B supports finetuning on custom dataset, deploying and inferencing on real-robots, as well as pretraining the model.
- Please refer to [our repository](https://github.com/GeneralEmbodiedSystem/RoboticsDiffusionTransformer/blob/main/docs/pretrain.md) for all the above guides.


  ## Citation
 
  license: mit
  language:
  - en
+ pipeline_tag: robotics
+ library_name: diffusers
+ tags:
+ - robotics
+ - multimodal
+ - pretraining
+ - vla
+ - diffusion
  ---
  # RDT-1B

+ RDT-1B is a 1B-parameter imitation learning Diffusion Transformer pre-trained on 1M+ multi-robot episodes. Given a language instruction and RGB images of up to three views, RDT can predict the next
+ 64 robot actions. RDT is compatible with almost all modern mobile manipulators, from single-arm to dual-arm, joint to EEF, pos. to vel., and even with a mobile chassis.

+ All the [code](https://github.com/GeneralEmbodiedSystem/RoboticsDiffusionTransformer/tree/main?tab=readme-ov-file) and pre-trained model weights are licensed under the MIT license.

  Please refer to our [project page](https://rdt-robotics.github.io/rdt-robotics/) and [paper]() for more information.

  ## Model Details

+ - **Developed by:** The RDT team, consisting of researchers from the [TSAIL group](https://ml.cs.tsinghua.edu.cn/) at Tsinghua University
+ - **Task Type:** Vision-Language-Action (language, image => robot actions)
+ - **Model Type:** Diffusion Policy with Transformers
  - **License:** MIT
  - **Language(s) (NLP):** en
+ - **Multi-Modal Encoders:**
+   - **Vision Backbone:** [siglip-so400m-patch14-384](https://huggingface.co/google/siglip-so400m-patch14-384)
+   - **Language Model:** [t5-v1_1-xxl](https://huggingface.co/google/t5-v1_1-xxl)
+ - **Pre-Training Datasets:** 46 datasets consisting of [RT-1 Dataset](https://robotics-transformer1.github.io/), [RH20T](https://rh20t.github.io/), [DROID](https://droid-dataset.github.io/), [BridgeData V2](https://rail-berkeley.github.io/bridgedata/), [RoboSet](https://robopen.github.io/roboset/), and a subset of [Open X-Embodiment](https://robotics-transformer-x.github.io/). See [todo]() for a detailed list.
  - **Repository:** [repo_url]
  - **Paper:** [paper_url]
  - **Project Page:** https://rdt-robotics.github.io/rdt-robotics/
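For reference, both encoders above are public Hugging Face checkpoints. Below is a minimal sketch of loading them on their own with the `transformers` library; it is not the RDT inference pipeline, only the two encoders, and both checkpoints are large downloads.

```python
# Load the two encoders listed under Model Details on their own.
# This is not the full RDT inference stack, only the pre-trained encoders.
from transformers import AutoModel, AutoProcessor, AutoTokenizer, T5EncoderModel

siglip = AutoModel.from_pretrained("google/siglip-so400m-patch14-384")
vision_encoder = siglip.vision_model                                  # SigLIP vision tower
siglip_processor = AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")

text_encoder = T5EncoderModel.from_pretrained("google/t5-v1_1-xxl")   # encoder-only T5
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")
```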
  ## Uses

+ RDT takes a language instruction, RGB images (of up to three views), the control frequency (if any), and proprioception as input and predicts the next 64 robot actions in the form of the unified action space vector.
+ The unified action space vector includes all the main physical quantities of the robot manipulator (e.g., the end-effector and joint, position and velocity, and base movement).
+ To deploy on your robot platform, you need to pick the relevant quantities from the unified vector. See our repository for more information.
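As a rough illustration of picking quantities from the unified vector: the actual size and index layout of the unified action space are defined in the repository's configuration, so the dimension and indices below are assumptions, not the real mapping.

```python
# Illustrative sketch only: UNIFIED_DIM and the index layout below are assumed
# placeholders, not the repository's actual unified-action-space definition.
import torch

UNIFIED_DIM = 128                      # assumed length of the unified action vector
ARM_JOINT_POS = slice(0, 6)            # hypothetical indices for 6 joint positions
GRIPPER_OPEN = 6                       # hypothetical index for the gripper opening

unified_action = torch.zeros(UNIFIED_DIM)      # one of the 64 predicted steps
joint_cmd = unified_action[ARM_JOINT_POS]      # keep only what your robot actually uses
gripper_cmd = unified_action[GRIPPER_OPEN]
```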
+ **Out-of-Scope**: Due to the embodiment gap, RDT cannot yet generalize to new robot platforms (not seen in the pre-training datasets).
+ In this case, we recommend collecting a small dataset of the target robot and then using it to fine-tune RDT.
+ See our repository for a tutorial.

+ Here's an example of how to use the RDT-1B model for inference on a robot:
  ```python
  # Clone the repository and install dependencies
  from scripts.agilex_model import create_model
@@ -43,7 +58,7 @@ config = {
  'camera_names': CAMERA_NAMES,
  }
  pretrained_vision_encoder_name_or_path = "google/siglip-so400m-patch14-384"
+ # Create the model with the specified configuration
  model = create_model(
  args=config,
  dtype=torch.bfloat16,
@@ -64,8 +79,8 @@ actions = policy.step(
  )
  ```
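The hunks above elide the unchanged middle of the example; the full block is in the README itself. As a purely hypothetical sketch of consuming the returned chunk, where `send_command` is a placeholder for your own controller interface and not part of the RDT code:

```python
# Hypothetical sketch of consuming the predicted chunk; send_command() is a
# placeholder for your own controller interface, not part of the RDT code.
def send_command(action) -> None:
    ...  # forward one unified-space step to your robot's low-level controller

for action in actions:   # `actions` is the chunk returned by policy.step() above
    send_command(action)
```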
+ <!-- RDT-1B supports finetuning on custom datasets, deploying and inferencing on real robots, as well as retraining the model.
+ Please refer to [our repository](https://github.com/GeneralEmbodiedSystem/RoboticsDiffusionTransformer/blob/main/docs/pretrain.md) for all the above guides. -->


  ## Citation