|
--- |
|
license: etalab-2.0 |
|
tags: |
|
- pytorch |
|
- segmentation |
|
- point clouds |
|
- aerial lidar scanning |
|
- IGN |
|
model-index: |
|
- name: FRACTAL-LidarHD_7cl_randlanet |
|
results: |
|
- task: |
|
type: semantic-segmentation |
|
dataset: |
|
name: IGNF/FRACTAL |
|
type: point-cloud-segmentation-dataset |
|
metrics: |
|
- name: mIoU |
|
type: mIoU |
|
value: 77.5 |
|
- name: IoU Other |
|
type: IoU |
|
value: 47.5 |
|
- name: IoU Ground |
|
type: IoU |
|
value: 91.9 |
|
- name: IoU Vegetation |
|
type: IoU |
|
value: 93.8 |
|
- name: IoU Building |
|
type: IoU |
|
value: 90.4 |
|
- name: IoU Water |
|
type: IoU |
|
value: 90.1 |
|
- name: IoU Bridge |
|
type: IoU |
|
value: 65.2 |
|
- name: IoU Permanent Structure |
|
type: IoU |
|
value: 63.5 |
|
- task: |
|
type: semantic-segmentation |
|
dataset: |
|
name: eval67 (secret test set) |
|
type: point-cloud-segmentation-dataset |
|
metrics: |
|
- name: mIoU |
|
type: mIoU |
|
value: 60.8 |
|
- name: IoU Other |
|
type: IoU |
|
value: 22.3 |
|
- name: IoU Ground |
|
type: IoU |
|
value: 90.7 |
|
- name: IoU Vegetation |
|
type: IoU |
|
value: 91.4 |
|
- name: IoU Building |
|
type: IoU |
|
value: 86.9 |
|
- name: IoU Water |
|
type: IoU |
|
value: 77.7 |
|
- name: IoU Bridge |
|
type: IoU |
|
value: 38.0 |
|
- name: IoU Permanent Structure |
|
type: IoU |
|
value: 16.6 |
|
--- |
|
|
|
<div style="border:1px solid black; padding:25px; background-color:#FDFFF4 ; padding-top:10px; padding-bottom:1px;"> |
|
<h1>FRACTAL-LidarHD_7cl_randlanet</h1> |
|
<p>The general characteristics of this specific model <strong>FRACTAL-LidarHD_7cl_randlanet</strong> are:</p>
|
<ul style="list-style-type:disc;"> |
|
<li>Trained with the FRACTAL dataset for the semantic segmentation of Lidar HD point clouds</li> |
|
<li>Aerial lidar point clouds colorized with RGB + near-infrared, with a high point density (~40 pts/m²)</li>
|
<li>RandLa-Net architecture as implemented in the Myria3D library</li> |
|
<li>7-class nomenclature: other, ground, vegetation, building, water, bridge, permanent structure</li>
|
</ul> |
|
</div> |
|
|
|
## Model Information
|
- **Code repository:** https://github.com/IGNF/myria3d (V3.8) |
|
- **Paper:** TBD |
|
- **Developed by:** IGN |
|
- **Compute infrastructure:** |
|
- software: python, pytorch-lightning |
|
- hardware: in-house HPC/AI resources |
|
- **License:** Etalab 2.0
|
|
|
--- |
|
|
|
## Uses |
|
The model was specifically trained for the **semantic segmentation of aerial lidar point clouds from the [Lidar HD program (2020-2025)](https://geoservices.ign.fr/lidarhd)**. |
|
|
|
**_Aerial Lidar scene understanding_**: the model is designed for the segmentation of aerial lidar point clouds into 7 classes: other | ground | vegetation | building | water | bridge | permanent structure. |
|
While the model could be applied to other types of point clouds (mobile, terrestrial), aerial lidar scanning has specific geometric characteristics (occlusions, homogeneous densities, variable scanner angles...).
|
Furthermore, the aerial images used for point cloud colorization (from the [BD ORTHO®](https://geoservices.ign.fr/bdortho)) have their own spatial and radiometric specifications.
|
Therefore, the model is best suited to aerial lidar point clouds with densities and colorimetry similar to those of the original data.
|
|
|
|
|
## Bias, Risks, Limitations and Recommendations |
|
|
|
**_Spatial Generalization_**: The FRACTAL dataset used for training covers 5 spatial domains from 5 southern regions of metropolitan France. |
|
While large and diverse, the dataset covers only a fraction of the French territory and is not representative of its full diversity (landscapes, hardscapes, human-made objects...).
|
Adequate verification and evaluation should be performed when applying the model to new spatial domains.
|
|
|
**_Using the model for other data sources_**: The model was trained on Lidar HD data that was colorized with very high resolution aerial images from the ORTHO HR database. |
|
Each data source has its own specificities in terms of resolution and spectral domains. Users should expect a drop in performance with other 3D and 2D data sources.
|
That being said, while domain shifts are frequent for aerial imagery due to different acquisition conditions and downstream data processing,

aerial lidar point clouds of comparable point densities (~40 pts/m²) are expected to have more consistent geometric characteristics across spatial domains.
|
|
|
--- |
|
|
|
## How to Get Started with the Model |
|
|
|
The model was trained with an open source deep learning library developed in-house: [github.com/IGNF/myria3d](https://github.com/IGNF/myria3d).
|
Inference is only supported in this library, and inference instructions are detailed in the code repository documentation. |
|
Patched inference on large point clouds (e.g. 1 x 1 km Lidar HD tiles) is supported, with or without overlapping sliding windows (no overlap by default); see the sketch below.
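
For intuition, here is a minimal sketch of such a sliding-window layout. This is an illustration, not the Myria3D implementation: the 1 km tile size matches the Lidar HD tiles mentioned above, while the 50 m window size and the 25 m overlap are assumptions chosen for the example.

```python
import numpy as np

def window_origins(tile_size=1000.0, window_size=50.0, overlap=0.0):
    """Lower-left corners of square inference windows over a square tile.

    overlap=0.0 (the default behaviour described above) gives abutting
    windows; overlap > 0 makes consecutive windows overlap by that many
    meters. All sizes here are illustrative assumptions.
    """
    step = window_size - overlap
    starts = np.arange(0.0, tile_size - window_size + step, step)
    starts = np.minimum(starts, tile_size - window_size)  # keep the last window inside the tile
    return [(x, y) for x in starts for y in starts]

print(len(window_origins()))              # 400 abutting 50 m windows on a 1 km tile
print(len(window_origins(overlap=25.0)))  # 1521 windows with 25 m of overlap
```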
|
The original point cloud is augmented with several dimensions: a PredictedClassification dimension, an entropy dimension, and (optionally) class probability dimensions (e.g. building, ground...).
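
These added dimensions can be read back with any LAS library. Below is a minimal sketch using laspy; the file name is a placeholder for any file produced by inference.

```python
import laspy
import numpy as np

# Placeholder path: any LAS file augmented by Myria3D inference.
las = laspy.read("predicted_tile.las")

pred = np.asarray(las["PredictedClassification"])  # predicted class per point
entropy = np.asarray(las["entropy"])               # prediction uncertainty per point

# Distribution of predicted classes and overall uncertainty.
classes, counts = np.unique(pred, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
print("mean entropy:", float(entropy.mean()))
```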
|
For convenience and scalable model deployment, Myria3D comes with a Dockerfile. |
|
|
|
--- |
|
|
|
## Training Details |
|
|
|
The data comes from the Lidar HD program, more specifically from acquisition areas that underwent automated classification followed by manual correction |
|
(so-called "optimized Lidar HD"). |
|
It meets the quality requirements of the Lidar HD program, which accepts a controlled level of classification errors for each semantic class. |
|
The model was trained on FRACTAL, a benchmark dataset for semantic segmentation. FRACTAL contains 250 km² of data sampled from an original 17440 km² area, with |
|
a large diversity of landscapes and scenes. |
|
|
|
|
|
### Training Data |
|
|
|
80,000 point cloud patches of 50 x 50 meters each (200 km²) were used to train the **FRACTAL-LidarHD_7cl_randlanet** model. |
|
10,000 additional patches (25 km²) were used for model validation. |
|
|
|
### Training Procedure |
|
|
|
#### Preprocessing |
|
|
|
Point clouds were preprocessed for training with point subsampling, filtering of artefact points, on-the-fly creation of colorimetric features, and normalization of features and coordinates.
|
For inference, the preprocessing should match the training preprocessing as closely as possible. Refer to the inference configuration file and to the Myria3D code repository (V3.8).
|
|
|
#### Training Hyperparameters |
|
```yaml |
|
- Model architecture: RandLa-Net (implemented with the Pytorch-Geometric framework in [Myria3D](https://github.com/IGNF/myria3d/blob/main/myria3d/models/modules/pyg_randla_net.py))

- Augmentation:

  - VerticalFlip(p=0.5)

  - HorizontalFlip(p=0.5)

- Features:

  - Lidar: x, y, z, echo number (1-based numbering), number of echoes, reflectance (a.k.a. intensity)

  - Colors:

    - Original: RGB + near-infrared (colorization from aerial images by vertical pixel-point alignment)

    - Derived: average color = (R+G+B)/3 and NDVI.

- Input preprocessing:

  - grid sampling: 0.25 m

  - random sampling: 40,000 points (when the patch contains more)

  - horizontal normalization: mean xy subtraction

  - vertical normalization: min z subtraction

  - coordinates normalization: division by 25 meters

  - basic occlusion model: nullify color channels if echo_number > 1

  - features scaling (0-1 range):

    - echo number and number of echoes: division by 7

    - color (r, g, b, near-infrared, average color): division by 65280 (i.e. 255*256)

  - features normalization:

    - reflectance: log-normalization, standardization, clipping of amplitudes above 3 standard deviations.

    - average color: same as reflectance.

- Batch size: 10 (x 6 GPUs)

- Number of epochs: 100 (min) to 150 (max)

- Early stopping: patience of 6 epochs, monitoring val_loss

- Loss: Cross-Entropy

- Optimizer: Adam

- Scheduler: mode = "min", factor = 0.5, patience = 20, cooldown = 5

- Learning rate: 0.004
|
``` |
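
To make the preprocessing steps listed above concrete, here is a minimal NumPy sketch applied to one patch. It is a simplified illustration, not the Myria3D implementation: subsampling is omitted, the array names and epsilon guards are assumptions, and NDVI uses the standard formula (NIR - R) / (NIR + R).

```python
import numpy as np

def normalize_patch(xyz, rgbnir, echo_number, num_echoes, reflectance):
    """Normalize one 50 x 50 m patch as listed above (subsampling omitted)."""
    # Coordinates: mean xy subtraction, min z subtraction, division by 25 m.
    xyz = xyz.astype(np.float32).copy()
    xyz[:, :2] -= xyz[:, :2].mean(axis=0)
    xyz[:, 2] -= xyz[:, 2].min()
    xyz /= 25.0

    # Basic occlusion model: nullify color channels for non-first echoes.
    rgbnir = rgbnir.astype(np.float32).copy()
    rgbnir[echo_number > 1] = 0.0

    # Derived colors: average color and NDVI (standard formula, assumed here).
    r, g, b, nir = rgbnir.T
    avg_color = (r + g + b) / 3.0
    ndvi = (nir - r) / (nir + r + 1e-6)

    # Scaling to the 0-1 range: echo counts divided by 7, colors by 65280.
    echo_number = echo_number / 7.0
    num_echoes = num_echoes / 7.0
    rgbnir = rgbnir / 65280.0

    # Log-normalization, standardization, clipping at 3 standard deviations,
    # applied to reflectance and, per the list above, to average color too.
    def log_standardize(x):
        x = np.log1p(np.asarray(x, dtype=np.float32))
        x = (x - x.mean()) / (x.std() + 1e-6)
        return np.clip(x, -3.0, 3.0)

    reflectance = log_standardize(reflectance)
    avg_color = log_standardize(avg_color)

    return xyz, rgbnir, avg_color, ndvi, echo_number, num_echoes, reflectance
```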
|
|
|
#### Speeds, Sizes, Times |
|
|
|
The **FRACTAL-LidarHD_7cl_randlanet** model was trained on an in-house HPC cluster, using 6 V100 GPUs (2 nodes, 3 GPUs per node). With this configuration, the approximate training time is 30 minutes per epoch.

The selected checkpoint was obtained at epoch 21 (num_epoch=21), with a corresponding val_loss of 0.112.
|
|
|
<div style="position: relative; text-align: center;"> |
|
<img src="FRACTAL-LidarHD_7cl_randlanet-train_val_losses.excalidraw.png" alt="train and val losses" style="width: 60%; display: block; margin: 0 auto;"/> |
|
</div> |
|
|
|
## Evaluation |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
|
|
The model was evaluated on the 10,000 data patches of the test set of the FRACTAL dataset,

which are independent from the train and val patches and sampled from distinct areas within the five spatial domains of the dataset.

The diversity of landscapes and scenes in the test set should closely match that of the train and val sets.
|
|
|
#### Metrics |
|
|
|
The **FRACTAL-LidarHD_7cl_randlanet** model achieves a performance of **mIoU=77.5%** and **OA=96.1%**.
|
|
|
The following table gives the class-wise metrics on the test set: |
|
|
|
**Class**|**IoU**|**Accuracy**|**Precision**|**Recall**|**F1**
-----|---|--------|---------|------|---
**other**|47.5|54.9|77.8|54.9|64.4
**ground**|91.9|97.7|93.8|97.7|95.8
**vegetation**|93.8|95.6|98.0|95.6|96.8
**building**|90.4|93.7|96.2|93.7|95.0
**water**|90.1|92.6|97.1|92.6|94.8
**bridge**|65.2|78.6|79.3|78.6|79.0
**permanent structure**|63.5|76.6|78.9|76.6|77.7
**Macro Average**|77.5|84.2|88.7|84.2|86.2
|
|
|
|
|
The following illustration shows the resulting confusion matrices:

* Left: normalized according to rows; rows sum to 100% and the **recall** lies on the diagonal of the matrix

* Right: normalized according to columns; columns sum to 100% and the **precision** lies on the diagonal of the matrix
|
|
|
<div style="position: relative; text-align: center;"> |
|
<p style="margin: 0;">Normalized Confusion Matrices. (a) Recall, (b) Precision</p>
|
<img src="FRACTAL-LidarHD_7cl_randlanet-recall_confusion_matrix.excalidraw.png" alt="Confusion matrices" style="width: 70%; display: block; margin: 0 auto;"/> |
|
</div> |
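
As a reference for how these quantities relate, here is a minimal NumPy sketch deriving the per-class metrics from a raw (unnormalized) confusion matrix, assuming rows index the target class and columns the predicted class:

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of points of target class i predicted as class j."""
    tp = np.diag(cm).astype(float)
    fn = cm.sum(axis=1) - tp  # row sums minus the diagonal
    fp = cm.sum(axis=0) - tp  # column sums minus the diagonal

    recall = tp / (tp + fn)      # diagonal of the row-normalized matrix
    precision = tp / (tp + fp)   # diagonal of the column-normalized matrix
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return iou, precision, recall, f1

# mIoU is the macro average of the per-class IoUs:
# iou, precision, recall, f1 = per_class_metrics(cm)
# print(iou.mean())
```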
|
|
|
|
|
### Results |
|
|
|
From test patches with at least 10k points (i.e. at least 4 pts/m²), we sampled patches without cherry-picking,

matching the following metadata: a) URBAN, b) WATER & BRIDGE, c) OTHER_PARKING, d) BUILD_GREENHOUSE, e) HIGHSLOPE.
|
|
|
<div style="position: relative; text-align: center;"> |
|
<p style="margin: 0;">Input point cloud, target classification, and model prediction for a subset of patches from the test set of FRACTAL.</p> |
|
<img src="FRACTAL-LidarHD_7cl_randlanet-sample_predictions.excalidraw.png" alt="Sample input pc, target, and predictions" style="width: 70%; display: block; margin: 0 auto;"/> |
|
</div> |
|
|
|
--- |
|
|
|
## Citation |
|
|
|
|
|
**BibTeX:** |
|
|
|
```bibtex
|
@misc{gaydon2024fractal, |
|
title={FRACTAL: An Ultra-Large-Scale Aerial Lidar Dataset for 3D Semantic Segmentation of Diverse Landscapes}, |
|
author={Charles Gaydon and Michel Daab and Floryne Roche}, |
|
year={2024}, |
|
eprint={TBD}, |
|
archivePrefix={arXiv}, |
|
url={https://arxiv.org/abs/TBD},
|
primaryClass={cs.CV} |
|
} |
|
|
|
``` |
|
|
|
## Contact: TBD
|
|