monai / medical
katielink committed
Commit 967fa85
1 Parent(s): 42e62c6

add the ONNX-TensorRT way of model conversion

README.md CHANGED
@@ -70,6 +70,33 @@ The validation accuracy in this curve is the mean of mAP, mAR, AP(IoU=0.1), and
 
 ![A graph showing the detection val accuracy](https://developer.download.nvidia.com/assets/Clara/Images/monai_retinanet_detection_val_acc_v2.png)
 
+#### TensorRT speedup
+The `lung_nodule_ct_detection` bundle supports acceleration with TensorRT through the ONNX-TensorRT method. The table below shows the latencies and speedup ratios observed on an A100 80G GPU. Note that when running inference with the TensorRT model, the `force_sliding_window` parameter in `inference.json` must be set to `true`; this ensures the bundle uses the `SlidingWindowInferer` during inference and keeps the network's input spatial size fixed. Otherwise, an input with a spatial size smaller than `infer_patch_size` would change the input spatial size of the network.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 7449.84 | 996.08 | 976.67 | 626.90 | 7.48 | 7.63 | 11.88 | 1.59 |
+| end2end | 36458.26 | 7259.35 | 6420.60 | 4698.34 | 5.02 | 5.68 | 7.76 | 1.55 |
+
+Where:
+- `model computation` measures the model's inference on a random input, excluding preprocessing and postprocessing.
+- `end2end` measures running the bundle end-to-end with the TensorRT-based model.
+- `torch_fp32` and `torch_amp` are the PyTorch models without and with `amp` mode, respectively.
+- `trt_fp32` and `trt_fp16` are the TensorRT-based models converted in the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio of the TensorRT float16 model versus the PyTorch amp model.
+
+Currently, the only available way to accelerate this model is ONNX-TensorRT; the Torch-TensorRT method is under development and will be available in the near future.
+
+This result is benchmarked under:
+- TensorRT: 8.5.3+cuda11.8
+- Torch-TensorRT Version: 1.4.0
+- CPU Architecture: x86-64
+- OS: Ubuntu 20.04
+- Python version: 3.8.10
+- CUDA version: 12.0
+- GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
 
@@ -98,6 +125,18 @@ Note that in inference.json, the transform "LoadImaged" in "preprocessing" and "
 This depends on the input images. LUNA16 needs `"affine_lps_to_ras": true`.
 It is possible that your inference dataset should set `"affine_lps_to_ras": false`.
 
+#### Export checkpoint to TensorRT based models with fp32 or fp16 precision
+
+```bash
+python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --input_shape "[1, 1, 512, 512, 192]" --use_onnx "True" --use_trace "True" --onnx_output_names "['output_0', 'output_1', 'output_2', 'output_3', 'output_4', 'output_5']" --network_def#use_list_output "True"
+```
+
+#### Execute inference with the TensorRT model
+
+```bash
+python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
+```
+
 # References
 [1] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." ICCV 2017. https://arxiv.org/abs/1708.02002
 
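As a quick sanity check, each speedup column in the table above is a latency ratio against the PyTorch float32 baseline; a few lines of Python reproduce them from the latency columns:

```python
# Recompute the speedup columns of the README table from its latency columns
# (all latencies in milliseconds; PyTorch float32 is the baseline).
rows = {
    "model computation": (7449.84, 996.08, 976.67, 626.90),
    "end2end": (36458.26, 7259.35, 6420.60, 4698.34),
}
for name, (torch_fp32, torch_amp, trt_fp32, trt_fp16) in rows.items():
    print(
        f"{name}: "
        f"speedup amp={torch_fp32 / torch_amp:.2f}, "
        f"speedup fp32={torch_fp32 / trt_fp32:.2f}, "
        f"speedup fp16={torch_fp32 / trt_fp16:.2f}, "
        f"amp vs fp16={torch_amp / trt_fp16:.2f}"
    )
```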
configs/inference.json CHANGED
@@ -13,6 +13,14 @@
     "test_datalist": "$monai.data.load_decathlon_datalist(@data_list_file_path, is_segmentation=True, data_list_key='validation', base_dir=@dataset_dir)",
     "device": "$torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')",
     "amp": true,
+    "spatial_dims": 3,
+    "num_classes": 1,
+    "force_sliding_window": false,
+    "size_divisible": [
+        16,
+        16,
+        8
+    ],
     "infer_patch_size": [
         512,
         512,
@@ -47,22 +55,22 @@
     "feature_extractor": "$monai.apps.detection.networks.retinanet_network.resnet_fpn_feature_extractor(@backbone,3,False,[1,2],None)",
     "network_def": {
         "_target_": "RetinaNet",
-        "spatial_dims": 3,
-        "num_classes": 1,
+        "spatial_dims": "@spatial_dims",
+        "num_classes": "@num_classes",
         "num_anchors": 3,
         "feature_extractor": "@feature_extractor",
-        "size_divisible": [
-            16,
-            16,
-            8
-        ]
+        "size_divisible": "@size_divisible",
+        "use_list_output": false
     },
     "network": "$@network_def.to(@device)",
     "detector": {
         "_target_": "RetinaNetDetector",
         "network": "@network",
         "anchor_generator": "@anchor_generator",
-        "debug": false
+        "debug": false,
+        "spatial_dims": "@spatial_dims",
+        "num_classes": "@num_classes",
+        "size_divisible": "@size_divisible"
    },
     "detector_ops": [
         "$@detector.set_target_keys(box_key='box', label_key='label')",
@@ -136,7 +144,8 @@
     },
     "inferer": {
         "_target_": "scripts.detection_inferer.RetinaNetInferer",
-        "detector": "@detector"
+        "detector": "@detector",
+        "force_sliding_window": "@force_sliding_window"
     },
     "postprocessing": {
         "_target_": "Compose",
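The refactor above hoists `spatial_dims`, `num_classes`, `size_divisible`, and `force_sliding_window` into top-level keys that the network and detector sections reference, so each value is defined once and can be overridden at runtime. A minimal sketch (not part of the bundle) of how MONAI's `ConfigParser` resolves such `@` references:

```python
# Hypothetical illustration: MONAI's ConfigParser substitutes "@<id>"
# references with the referenced top-level values before instantiation.
from monai.bundle import ConfigParser

parser = ConfigParser({
    "spatial_dims": 3,
    "num_classes": 1,
    "network_def": {
        "spatial_dims": "@spatial_dims",
        "num_classes": "@num_classes",
    },
})
# both nested keys now point at the single top-level definition
print(parser.get_parsed_content("network_def"))  # {'spatial_dims': 3, 'num_classes': 1}
```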
configs/inference_trt.json ADDED
@@ -0,0 +1,11 @@
+{
+    "imports": [
+        "$import glob",
+        "$import os",
+        "$import torch_tensorrt"
+    ],
+    "force_sliding_window": true,
+    "handlers#0#_disabled_": true,
+    "network_def": "$torch.jit.load(@bundle_root + '/models/model_trt.ts')",
+    "evaluator#amp": false
+}
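This override config swaps the Python network definition for the exported TorchScript module, disables the first handler (most likely the checkpoint loader, unnecessary once the weights are baked into the TorchScript file), and turns off `amp` since precision is fixed at export time. As a quick check that the exported model loads and runs outside the bundle, a hypothetical smoke test (assuming `trt_export` above has written `models/model_trt.ts`):

```python
# Hypothetical smoke test, assuming the trt_export command from the README
# has produced models/model_trt.ts on a TensorRT-capable GPU.
import torch
import torch_tensorrt  # noqa: F401  (registers the TensorRT runtime ops)

model = torch.jit.load("models/model_trt.ts").eval().cuda()
x = torch.rand(1, 1, 512, 512, 192, device="cuda")  # the --input_shape used at export
with torch.no_grad():
    outputs = model(x)
# with use_list_output=True the network returns a flat list of head outputs
# (output_0 ... output_5) instead of the detector's box/label dictionaries
print([tuple(o.shape) for o in outputs])
```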
configs/metadata.json CHANGED
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
-    "version": "0.5.5",
+    "version": "0.5.6",
     "changelog": {
+        "0.5.6": "add the ONNX-TensorRT way of model conversion",
         "0.5.5": "update retrained validation results and training curve",
         "0.5.4": "add non-deterministic note",
         "0.5.3": "adapt to BundleWorkflow interface",
@@ -19,7 +20,7 @@
         "0.1.1": "add reference for LIDC dataset",
         "0.1.0": "complete the model package"
     },
-    "monai_version": "1.2.0rc4",
+    "monai_version": "1.2.0rc5",
     "pytorch_version": "1.13.1",
     "numpy_version": "1.22.2",
     "optional_packages_version": {
docs/README.md CHANGED
@@ -63,6 +63,33 @@ The validation accuracy in this curve is the mean of mAP, mAR, AP(IoU=0.1), and
 
 ![A graph showing the detection val accuracy](https://developer.download.nvidia.com/assets/Clara/Images/monai_retinanet_detection_val_acc_v2.png)
 
+#### TensorRT speedup
+The `lung_nodule_ct_detection` bundle supports acceleration with TensorRT through the ONNX-TensorRT method. The table below shows the latencies and speedup ratios observed on an A100 80G GPU. Note that when running inference with the TensorRT model, the `force_sliding_window` parameter in `inference.json` must be set to `true`; this ensures the bundle uses the `SlidingWindowInferer` during inference and keeps the network's input spatial size fixed. Otherwise, an input with a spatial size smaller than `infer_patch_size` would change the input spatial size of the network.
+
+| method | torch_fp32(ms) | torch_amp(ms) | trt_fp32(ms) | trt_fp16(ms) | speedup amp | speedup fp32 | speedup fp16 | amp vs fp16 |
+| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| model computation | 7449.84 | 996.08 | 976.67 | 626.90 | 7.48 | 7.63 | 11.88 | 1.59 |
+| end2end | 36458.26 | 7259.35 | 6420.60 | 4698.34 | 5.02 | 5.68 | 7.76 | 1.55 |
+
+Where:
+- `model computation` measures the model's inference on a random input, excluding preprocessing and postprocessing.
+- `end2end` measures running the bundle end-to-end with the TensorRT-based model.
+- `torch_fp32` and `torch_amp` are the PyTorch models without and with `amp` mode, respectively.
+- `trt_fp32` and `trt_fp16` are the TensorRT-based models converted in the corresponding precision.
+- `speedup amp`, `speedup fp32` and `speedup fp16` are the speedup ratios of the corresponding models versus the PyTorch float32 model.
+- `amp vs fp16` is the speedup ratio of the TensorRT float16 model versus the PyTorch amp model.
+
+Currently, the only available way to accelerate this model is ONNX-TensorRT; the Torch-TensorRT method is under development and will be available in the near future.
+
+This result is benchmarked under:
+- TensorRT: 8.5.3+cuda11.8
+- Torch-TensorRT Version: 1.4.0
+- CPU Architecture: x86-64
+- OS: Ubuntu 20.04
+- Python version: 3.8.10
+- CUDA version: 12.0
+- GPU models and configuration: A100 80G
+
 ## MONAI Bundle Commands
 In addition to the Pythonic APIs, a few command line interfaces (CLI) are provided to interact with the bundle. The CLI supports flexible use cases, such as overriding configs at runtime and predefining arguments in a file.
 
@@ -91,6 +118,18 @@ Note that in inference.json, the transform "LoadImaged" in "preprocessing" and "
 This depends on the input images. LUNA16 needs `"affine_lps_to_ras": true`.
 It is possible that your inference dataset should set `"affine_lps_to_ras": false`.
 
+#### Export checkpoint to TensorRT based models with fp32 or fp16 precision
+
+```bash
+python -m monai.bundle trt_export --net_id network_def --filepath models/model_trt.ts --ckpt_file models/model.pt --meta_file configs/metadata.json --config_file configs/inference.json --precision <fp32/fp16> --input_shape "[1, 1, 512, 512, 192]" --use_onnx "True" --use_trace "True" --onnx_output_names "['output_0', 'output_1', 'output_2', 'output_3', 'output_4', 'output_5']" --network_def#use_list_output "True"
+```
+
+#### Execute inference with the TensorRT model
+
+```bash
+python -m monai.bundle run --config_file "['configs/inference.json', 'configs/inference_trt.json']"
+```
+
 # References
 [1] Lin, Tsung-Yi, et al. "Focal loss for dense object detection." ICCV 2017. https://arxiv.org/abs/1708.02002
 
scripts/detection_inferer.py CHANGED
@@ -25,14 +25,19 @@ class RetinaNetInferer(Inferer):
     Args:
         detector: the RetinaNetDetector that converts network output BxCxMxN or BxCxMxNxP
             map into boxes and classification scores.
+        force_sliding_window: whether to force using a SlidingWindowInferer to do the inference.
+            If False, will check the input spatial size to decide whether to simply
+            forward the network or use the SlidingWindowInferer.
+            If True, will always use the SlidingWindowInferer for inference.
         args: other optional args to be passed to detector.
         kwargs: other optional keyword args to be passed to detector.
     """
 
-    def __init__(self, detector: RetinaNetDetector, *args, **kwargs) -> None:
+    def __init__(self, detector: RetinaNetDetector, force_sliding_window: bool = False) -> None:
        Inferer.__init__(self)
         self.detector = detector
         self.sliding_window_size = None
+        self.force_sliding_window = force_sliding_window
         if self.detector.inferer is not None:
             if hasattr(self.detector.inferer, "roi_size"):
                 self.sliding_window_size = np.prod(self.detector.inferer.roi_size)
@@ -52,8 +57,10 @@ class RetinaNetInferer(Inferer):
 
         # if image smaller than sliding window roi size, no need to use sliding window inferer
         # use sliding window inferer only when image is large
-        use_inferer = self.sliding_window_size is not None and not all(
-            [data_i[0, ...].numel() < self.sliding_window_size for data_i in inputs]
+        use_inferer = (
+            self.force_sliding_window
+            or self.sliding_window_size is not None
+            and not all([data_i[0, ...].numel() < self.sliding_window_size for data_i in inputs])
         )
 
         return self.detector(inputs, use_inferer=use_inferer, *args, **kwargs)
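Since the new `use_inferer` expression mixes `or` and `and` (and `and` binds tighter in Python), here is an equivalent, fully explicit formulation of the same decision, shown only for clarity:

```python
# Equivalent to the use_inferer expression above: force, or
# (roi size is known and at least one input is roi-sized or larger).
import torch

def should_use_sliding_window(force_sliding_window, sliding_window_size, inputs) -> bool:
    if force_sliding_window:
        return True
    if sliding_window_size is None:
        return False
    # not all(numel < size)  is the same as  any(numel >= size)
    return any(data_i[0, ...].numel() >= sliding_window_size for data_i in inputs)

roi_numel = 512 * 512 * 192  # the bundle's infer_patch_size
print(should_use_sliding_window(False, roi_numel, [torch.zeros(1, 64, 64, 64)]))  # False
print(should_use_sliding_window(True, None, [torch.zeros(1, 64, 64, 64)]))        # True
```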