monai / medical
katielink committed on
Commit 9984ad0
1 Parent(s): 2ee6428

add lesion FROC calculation and wsi_reader

README.md CHANGED
@@ -29,9 +29,9 @@ Annotation information are adopted from [NCRF/jsons](https://github.com/baidu-re
 
 ### Data Preparation
 
-This MMAR expects the training/validation data (whole slide images) reside in `$DATA_ROOT/training/images`. By default `$DATA_ROOT` is pointing to `/workspace/data/medical/pathology/` You can easily modify `$DATA_ROOT` to point to a different directory in `config/environment.json`.
+This bundle expects the training/validation data (whole slide images) to reside in `{data_root}/training/images`. By default, `data_root` points to `/workspace/data/medical/pathology/`. You can modify `data_root` in the bundle config files to point to a different directory.
 
-To reduce the computation burden during the inference, patches are extracted only where there is tissue and ignoring the background according to a tissue mask. You should run `prepare_inference_data.sh` prior to the inference to generate foreground masks, where the input is the whole slide test images and the output is the foreground masks. Please also create a directory for prediction output, aligning with the one specified with `$MMAR_EVAL_OUTPUT_PATH` in `config/environment.json` (e.g. `/eval`)
+To reduce the computational burden during inference, patches are extracted only where there is tissue, ignoring the background according to a tissue mask. Please also create a directory for the prediction output; by default, `output_dir` is set to the `eval` folder under the bundle root.
 
 Please refer to "Annotation" section of [Camelyon challenge](https://camelyon17.grand-challenge.org/Data/) to prepare ground truth images, which are needed for FROC computation. By default, this data set is expected to be at `/workspace/data/medical/pathology/ground_truths`. But it can be modified in `evaluate_froc.sh`.
 
@@ -39,13 +39,14 @@ Please refer to "Annotation" section of [Camelyon challenge](https://camelyon17.
 
 The training was performed with the following:
 
-- Script: train.sh
+- Config file: train.config
 - GPU: at least 16 GB of GPU memory.
 - Actual Model Input: 224 x 224 x 3
 - AMP: True
 - Optimizer: Novograd
 - Learning Rate: 1e-3
 - Loss: BCEWithLogitsLoss
+- Whole slide image reader: cuCIM (if running on Windows or Mac, please install `OpenSlide` on your system and change `wsi_reader` to "OpenSlide")
 
 ## Input
 
@@ -104,21 +105,12 @@ Export checkpoint to TorchScript file:
 
 TorchScript conversion is currently not supported.
 
-# Intended Use
-
-The model needs to be used with NVIDIA hardware and software. For hardware, the model can run on any NVIDIA GPU with memory greater than 16 GB. For software, this model is usable only as part of Transfer Learning & Annotation Tools in Clara Train SDK container. Find out more about Clara Train at the [Clara Train Collections on NGC](https://ngc.nvidia.com/catalog/collections/nvidia:claratrainframework).
-
-**The pre-trained models are for developmental purposes only and cannot be used directly for clinical procedures.**
-
-# License
-
-[End User License Agreement](https://developer.nvidia.com/clara-train-eula) is included with the product. Licenses are also available along with the model application zip file. By pulling and using the Clara Train SDK container and downloading models, you accept the terms and conditions of these licenses.
-
 # References
 
 [1] He, Kaiming, et al, "Deep Residual Learning for Image Recognition." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016. <https://arxiv.org/pdf/1512.03385.pdf>
 
 # License
+
 Copyright (c) MONAI Consortium
 
 Licensed under the Apache License, Version 2.0 (the "License");
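
Note (not part of the commit): the new `wsi_reader` option corresponds to the backends of MONAI's `WSIReader`. A minimal sketch of what the backend switch means in code, with a placeholder slide path:

```python
from monai.data import WSIReader

# cuCIM is the bundle default; OpenSlide also runs on Windows and Mac.
reader = WSIReader(backend="openslide")  # or backend="cucim"
wsi = reader.read("path/to/slide.tif")   # placeholder path
# Read one 224 x 224 patch at the highest-resolution level (level 0).
patch, meta = reader.get_data(wsi, location=(0, 0), size=(224, 224), level=0)
print(patch.shape)  # channel-first RGB patch, e.g. (3, 224, 224)
```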
configs/inference.json CHANGED
@@ -7,6 +7,7 @@
     "output_dir": "$os.path.join(@bundle_root, 'eval')",
     "dataset_dir": "/workspace/data/medical/pathology",
     "testing_file": "$os.path.join(@bundle_root, 'testing.csv')",
+    "wsi_reader": "cuCIM",
     "patch_size": [
         224,
         224
@@ -63,7 +64,8 @@
         "data": "@datalist",
         "mask_level": 6,
         "patch_size": "@patch_size",
-        "transform": "@preprocessing"
+        "transform": "@preprocessing",
+        "reader": "@wsi_reader"
     },
     "dataloader": {
         "_target_": "DataLoader",
configs/metadata.json CHANGED
@@ -1,7 +1,8 @@
 {
     "schema": "https://github.com/Project-MONAI/MONAI-extra-test-data/releases/download/0.8.1/meta_schema_20220324.json",
-    "version": "0.3.3",
+    "version": "0.4.0",
     "changelog": {
+        "0.4.0": "add lesion FROC calculation and wsi_reader",
         "0.3.3": "update to use monai 1.0.1",
         "0.3.2": "enhance readme on commands example",
         "0.3.1": "fix license Copyright error",
configs/train.json CHANGED
@@ -12,6 +12,7 @@
     "training_file": "$os.path.join(@bundle_root, 'training.csv')",
     "validation_file": "$os.path.join(@bundle_root, 'validation.csv')",
     "data_root": "/workspace/data/medical/pathology",
+    "wsi_reader": "cuCIM",
     "region_size": [
         768,
         768
@@ -166,7 +167,7 @@
         "data": "@train#datalist",
         "patch_level": 0,
         "patch_size": "@region_size",
-        "reader": "cucim",
+        "reader": "@wsi_reader",
         "transform": "@train#preprocessing"
     },
     "dataloader": {
@@ -317,7 +318,7 @@
         "data": "@validate#datalist",
         "patch_level": 0,
         "patch_size": "@region_size",
-        "reader": "cucim",
+        "reader": "@wsi_reader",
         "transform": "@validate#preprocessing"
     },
     "dataloader": {
docs/README.md CHANGED
@@ -22,9 +22,9 @@ Annotation information are adopted from [NCRF/jsons](https://github.com/baidu-re
 
 ### Data Preparation
 
-This MMAR expects the training/validation data (whole slide images) reside in `$DATA_ROOT/training/images`. By default `$DATA_ROOT` is pointing to `/workspace/data/medical/pathology/` You can easily modify `$DATA_ROOT` to point to a different directory in `config/environment.json`.
+This bundle expects the training/validation data (whole slide images) to reside in `{data_root}/training/images`. By default, `data_root` points to `/workspace/data/medical/pathology/`. You can modify `data_root` in the bundle config files to point to a different directory.
 
-To reduce the computation burden during the inference, patches are extracted only where there is tissue and ignoring the background according to a tissue mask. You should run `prepare_inference_data.sh` prior to the inference to generate foreground masks, where the input is the whole slide test images and the output is the foreground masks. Please also create a directory for prediction output, aligning with the one specified with `$MMAR_EVAL_OUTPUT_PATH` in `config/environment.json` (e.g. `/eval`)
+To reduce the computational burden during inference, patches are extracted only where there is tissue, ignoring the background according to a tissue mask. Please also create a directory for the prediction output; by default, `output_dir` is set to the `eval` folder under the bundle root.
 
 Please refer to "Annotation" section of [Camelyon challenge](https://camelyon17.grand-challenge.org/Data/) to prepare ground truth images, which are needed for FROC computation. By default, this data set is expected to be at `/workspace/data/medical/pathology/ground_truths`. But it can be modified in `evaluate_froc.sh`.
 
@@ -32,13 +32,14 @@ Please refer to "Annotation" section of [Camelyon challenge](https://camelyon17.
 
 The training was performed with the following:
 
-- Script: train.sh
+- Config file: train.config
 - GPU: at least 16 GB of GPU memory.
 - Actual Model Input: 224 x 224 x 3
 - AMP: True
 - Optimizer: Novograd
 - Learning Rate: 1e-3
 - Loss: BCEWithLogitsLoss
+- Whole slide image reader: cuCIM (if running on Windows or Mac, please install `OpenSlide` on your system and change `wsi_reader` to "OpenSlide")
 
 ## Input
 
@@ -97,21 +98,12 @@ Export checkpoint to TorchScript file:
 
 TorchScript conversion is currently not supported.
 
-# Intended Use
-
-The model needs to be used with NVIDIA hardware and software. For hardware, the model can run on any NVIDIA GPU with memory greater than 16 GB. For software, this model is usable only as part of Transfer Learning & Annotation Tools in Clara Train SDK container. Find out more about Clara Train at the [Clara Train Collections on NGC](https://ngc.nvidia.com/catalog/collections/nvidia:claratrainframework).
-
-**The pre-trained models are for developmental purposes only and cannot be used directly for clinical procedures.**
-
-# License
-
-[End User License Agreement](https://developer.nvidia.com/clara-train-eula) is included with the product. Licenses are also available along with the model application zip file. By pulling and using the Clara Train SDK container and downloading models, you accept the terms and conditions of these licenses.
-
 # References
 
 [1] He, Kaiming, et al, "Deep Residual Learning for Image Recognition." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778. 2016. <https://arxiv.org/pdf/1512.03385.pdf>
 
 # License
+
 Copyright (c) MONAI Consortium
 
 Licensed under the Apache License, Version 2.0 (the "License");
scripts/evaluate_froc.sh ADDED
@@ -0,0 +1,20 @@
+#!/usr/bin/env bash
+
+LEVEL=6
+SPACING=0.243
+READER=openslide
+EVAL_DIR=../eval
+GROUND_TRUTH_DIR=/workspace/data/medical/pathology/ground_truths
+
+echo "=> Level: ${LEVEL}"
+echo "=> Spacing: ${SPACING}"
+echo "=> WSI Reader: ${READER}"
+echo "=> Evaluation output directory: ${EVAL_DIR}"
+echo "=> Ground truth directory: ${GROUND_TRUTH_DIR}"
+
+python3 ./lesion_froc.py \
+    --level $LEVEL \
+    --spacing $SPACING \
+    --reader $READER \
+    --eval-dir ${EVAL_DIR} \
+    --ground-truth-dir ${GROUND_TRUTH_DIR}
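
For context: with MONAI's defaults, `LesionFROC.evaluate` reports the CAMELYON16-style FROC score, i.e. the mean lesion-level sensitivity over six false-positive rates per whole slide image:

```latex
% CAMELYON16 FROC score: average sensitivity at 1/4, 1/2, 1, 2, 4, and 8
% false positives per whole slide image.
\mathrm{FROC} = \frac{1}{6} \sum_{f \in \{1/4,\, 1/2,\, 1,\, 2,\, 4,\, 8\}} \mathrm{sensitivity}(f)
```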
scripts/lesion_froc.py ADDED
@@ -0,0 +1,56 @@
+import argparse
+import os
+
+from monai.apps.pathology import LesionFROC
+
+
+def full_path(dir: str, file: str):
+    return os.path.normpath(os.path.join(dir, file))
+
+
+def load_data(ground_truth_dir: str, eval_dir: str, level: int, spacing: float):
+    # Get the list of probability map result files
+    prob_files = os.listdir(eval_dir)
+
+    # Read the results and create an eval_dataset based on them.
+    eval_dataset = []
+    for prob_name in prob_files:
+        if prob_name.endswith(".npy"):
+            sample = {
+                "tumor_mask": full_path(ground_truth_dir, prob_name.replace("npy", "tif")),
+                "prob_map": full_path(eval_dir, prob_name),
+                "level": level,
+                "pixel_spacing": spacing,
+            }
+
+            eval_dataset.append(sample)
+
+    return eval_dataset
+
+
+def evaluate_froc(data, reader):
+    lesion_froc = LesionFROC(data, image_reader_name=reader)
+    score = lesion_froc.evaluate()
+    return score
+
+
+if __name__ == "__main__":
+    # Parse command line arguments
+    parser = argparse.ArgumentParser()
+    parser.add_argument("-s", "--spacing", type=float, default=0.243, dest="spacing")
+    parser.add_argument("-l", "--level", type=int, default=6, dest="level")
+    parser.add_argument("-r", "--reader", type=str, default="cucim", dest="reader")
+    parser.add_argument("-e", "--eval-dir", type=str, dest="eval_dir")
+    parser.add_argument("-g", "--ground-truth-dir", type=str, dest="ground_truth_dir")
+    args = parser.parse_args()
+
+    # Prepare FROC input data
+    data = load_data(args.ground_truth_dir, args.eval_dir, args.level, args.spacing)
+    if len(data) < 1:
+        raise RuntimeError(f"No probability map result found in '{args.eval_dir}' with '.npy' extension.")
+
+    # Evaluate FROC
+    score = evaluate_froc(data, args.reader)
+    with open(full_path(args.eval_dir, "froc_score.txt"), "w") as f:
+        f.write(f"FROC Score: {score}\n")
+    print(f"FROC Score: {score}")