robotics-diffusion-transformer committed on
Commit b82f647 (parent: 5c4c7e4)

Update README.md

Files changed (1): README.md (+14, -8)
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
 RDT-1B is a 1B-parameter imitation learning Diffusion Transformer pre-trained on 1M+ multi-robot episodes. Given language instruction and RGB images of up to three views, RDT can predict the next
 64 robot actions. RDT is compatible with almost all modern mobile manipulators, from single-arm to dual-arm, joint to EEF, pos. to vel., and even with a mobile chassis.
 
-All the [code](https://github.com/GeneralEmbodiedSystem/RoboticsDiffusionTransformer/tree/main?tab=readme-ov-file) and pre-trained model weights are licensed under the MIT license.
+All the [code](https://github.com/GeneralEmbodiedSystem/RoboticsDiffusionTransformer/tree/main?tab=readme-ov-file), pre-trained model weights, and [data](https://github.com/thu-ml/RoboticsDiffusionTransformer) are licensed under the MIT license.
 
 Please refer to our [project page](https://rdt-robotics.github.io/rdt-robotics/) and [paper]() for more information.
 
@@ -39,18 +39,23 @@ tags:
 
 ## Uses
 
-RDT takes language instruction, RGB image (of up to three views), control frequency (if any), and proprioception as input and predicts the next 64 robot actions in the form of the unified action space vector.
-The unified action space vector includes all the main physical quantities of the robot manipulator (e.g., the end-effector and joint, position and velocity, and base movement).
-To deploy on your robot platform, you need to pick the relevant quantities from the unified vector. See our repository for more information.
+RDT takes language instruction, RGB images (of up to three views), control frequency (if any), and proprioception as input and predicts the next 64 robot actions.
+RDT supports control of almost all robot manipulators with the help of the unified action space, which
+includes all the main physical quantities of the robot manipulator (e.g., the end-effector and joint, position and velocity, and base movement).
+To deploy on your robot platform, you need to fill the relevant quantities of the raw action vector into the unified space vector. See [our repository](https://github.com/thu-ml/RoboticsDiffusionTransformer) for more information.
 
 **Out-of-Scope**: Due to the embodiment gap, RDT cannot yet generalize to new robot platforms (not seen in the pre-training datasets).
 In this case, we recommend collecting a small dataset of the target robot and then using it to fine-tune RDT.
 See our repository for a tutorial.
 
-Here's an example of how to use the RDT-1B model for inference on a robot:
+Here's an example of how to use the RDT-1B model for inference on a robot:
 ```python
-# Clone the repository and install dependencies
+# Please first clone the repository and install dependencies
+# Then switch to the root directory of the repository by "cd RoboticsDiffusionTransformer"
+
+# Import a create function from the code base
 from scripts.agilex_model import create_model
+
 # Names of cameras used for visual input
 CAMERA_NAMES = ['cam_high', 'cam_right_wrist', 'cam_left_wrist']
 config = {
@@ -68,8 +73,9 @@ model = create_model(
 pretrained='robotics-diffusion-transformer/rdt-1b',
 control_frequency=25,
 )
+
 # Start inference process
-# Load pre-computed language embeddings
+# Load the pre-computed language embeddings
 lang_embeddings_path = 'your/language/embedding/path'
 text_embedding = torch.load(lang_embeddings_path)['embeddings']
 images: List(PIL.Image) = ... # The images from last 2 frame
@@ -82,7 +88,7 @@ actions = policy.step(
 )
 ```
 
-<!-- RDT-1B supports finetuning on custom datasets, deploying and inferencing on real robots, as well as retraining the model.
+<!-- RDT-1B supports finetuning on custom datasets, deploying and inferencing on real robots, and retraining the model.
 Please refer to [our repository](https://github.com/GeneralEmbodiedSystem/RoboticsDiffusionTransformer/blob/main/docs/pretrain.md) for all the above guides. -->
 
 
 
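For illustration, here is a minimal sketch of the deployment step described in the Uses section above: filling only the quantities your robot actually uses into the unified action space vector. The dimensionality and index positions below are hypothetical placeholders, not the repository's actual layout; the real mapping is defined in the repository's configuration.

```python
# Sketch only: a hypothetical unified-action-space mapping (see the repository for the real layout).
import numpy as np

UNIFIED_DIM = 128                  # assumed size of the unified action space vector
RIGHT_ARM_JOINT_POS = slice(0, 6)  # hypothetical slots for 6 right-arm joint positions
RIGHT_GRIPPER_OPEN = 6             # hypothetical slot for the right gripper opening

def to_unified(joint_pos: np.ndarray, gripper_open: float) -> np.ndarray:
    """Fill the quantities this robot uses into the unified vector; leave the rest untouched."""
    unified = np.zeros(UNIFIED_DIM, dtype=np.float32)
    unified[RIGHT_ARM_JOINT_POS] = joint_pos
    unified[RIGHT_GRIPPER_OPEN] = gripper_open
    return unified

unified_action = to_unified(np.zeros(6, dtype=np.float32), gripper_open=1.0)
```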
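The inference example loads pre-computed language embeddings from disk. As a rough sketch (not the repository's own script), such embeddings could be produced with a T5-style text encoder and saved in the `{'embeddings': ...}` format that `torch.load` reads back; the encoder name below is an assumption, so check the repository for the exact encoder RDT was trained with.

```python
# Sketch only: pre-computing a language embedding for one instruction.
import torch
from transformers import AutoTokenizer, T5EncoderModel

ENCODER_NAME = 'google/t5-v1_1-xxl'  # assumption; the repository documents the exact encoder to use

tokenizer = AutoTokenizer.from_pretrained(ENCODER_NAME)
encoder = T5EncoderModel.from_pretrained(ENCODER_NAME, torch_dtype=torch.bfloat16).eval()

instruction = 'Pick up the red block and place it into the box.'
tokens = tokenizer(instruction, return_tensors='pt')
with torch.no_grad():
    embeddings = encoder(**tokens).last_hidden_state  # shape: (1, seq_len, hidden_dim)

# Save in the format the inference example loads via torch.load(path)['embeddings']
torch.save({'embeddings': embeddings}, 'lang_embed.pt')
```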
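Since RDT predicts a chunk of 64 future actions per query, one simple deployment pattern (an assumption here, not something prescribed by the repository) is to replay each chunk at the robot's control frequency before querying the model again.

```python
# Sketch only: replaying a predicted action chunk open-loop at the control frequency.
import time
import numpy as np

CONTROL_FREQUENCY = 25  # Hz, matching the control_frequency passed to create_model above

def send_to_robot(action: np.ndarray) -> None:
    """Hypothetical stand-in for your platform's low-level command interface."""
    pass  # replace with your robot driver call

def execute_chunk(actions: np.ndarray) -> None:
    """Step through the predicted chunk, one action per control cycle."""
    for action in actions:
        send_to_robot(action)
        time.sleep(1.0 / CONTROL_FREQUENCY)

execute_chunk(np.zeros((64, 7), dtype=np.float32))  # dummy chunk; shape is illustrative only
```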