Omnidata (Steerable Datasets)

A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans (ICCV 2021)


DPT-Hybrid trained for surface normal estimation or depth estimation

Vision Transformer (ViT) model trained using a DPT (Dense Prediction Transformer) decoder.

Intended uses & limitations

You can use this model for monocular surface normal estimation or depth estimation.

Normal: estimates surface normals, i.e. a unit vector perpendicular to the tangent plane of the surface at each pixel.
Depth: estimates normalized depth, i.e. relative depth rather than metric depth.
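
To make the two output types concrete, here is a minimal sketch in plain Python. It is illustrative only and is not the model's post-processing: the helper names `unit_normal` and `relative_depth` are hypothetical, and min-max rescaling is an assumed stand-in, since this card does not state the exact depth normalization used.

```python
import math

def unit_normal(v):
    """Scale a 3-vector to unit length, which is what a surface normal is."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("zero-length vector has no direction")
    return tuple(c / length for c in v)

def relative_depth(depths):
    """Min-max rescale a flat list of depths to [0, 1] (assumed normalization).

    The absolute scale is discarded, so only depth ordering and ratios
    between differences survive.
    """
    lo, hi = min(depths), max(depths)
    if hi == lo:
        return [0.0 for _ in depths]
    return [(d - lo) / (hi - lo) for d in depths]

n = unit_normal((0.0, 3.0, 4.0))

# The same scene measured in metres and in centimetres gives the same
# relative depth, which is why metric depth cannot be read off the output.
scene_m = relative_depth([1.0, 2.0, 3.0])
scene_cm = relative_depth([100.0, 200.0, 300.0])
```

Under this assumed normalization, `scene_m` and `scene_cm` are identical, so the output carries no information about real-world units.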

Models

Models to estimate surface normals or depth from RGB images.

Architecture: DPT
Training resolutions: 384x384
Training data: Omnidata dataset
Input:
- Dimensions: 384x384
- Normalization: (normals: [0, 1], depth: [-1, 1])
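
The input ranges listed above can be sketched as a per-channel mapping. This is a minimal sketch assuming 8-bit RGB input and assuming the [-1, 1] range is reached with `(x - 0.5) / 0.5`, which is equivalent to normalizing with mean 0.5 and standard deviation 0.5. The names `scale_pixel` and `INPUT_SIZE` are hypothetical and do not come from the released code.

```python
INPUT_SIZE = (384, 384)  # height, width listed under "Dimensions"

def scale_pixel(value, task):
    """Map an 8-bit channel value (0..255) to the input range listed for `task`."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit value in 0..255")
    unit = value / 255.0               # [0, 1], the range listed for normals
    if task == "normal":
        return unit
    if task == "depth":
        return (unit - 0.5) / 0.5      # [-1, 1], the range listed for depth
    raise ValueError(f"unknown task: {task!r}")

black_depth = scale_pixel(0, "depth")
white_depth = scale_pixel(255, "depth")
white_normal = scale_pixel(255, "normal")
```

In practice an image would first be resized or cropped to `INPUT_SIZE`, and the same mapping would be applied to every channel of every pixel.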

BibTeX entry and citation info

@inproceedings{eftekhar2021omnidata,
  title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
  author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={10786--10796},
  year={2021}
}

If you use our latest pretrained models, please also cite the following paper on 3D data augmentations:

@inproceedings{kar20223d,
  title={3D Common Corruptions and Data Augmentation},
  author={Kar, O{\u{g}}uzhan Fatih and Yeo, Teresa and Atanov, Andrei and Zamir, Amir},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18963--18974},
  year={2022}
}
