Omnidata (Steerable Datasets)
A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans (ICCV 2021)
Project Website · Paper · >> [Github] << · Data · Pretrained Weights · Annotator
DPT-Hybrid trained for surface normal estimation or depth estimation
Vision Transformer (ViT) model trained using a DPT (Dense Prediction Transformer) decoder.
Intended uses & limitations
You can use this model for monocular surface normal estimation or depth estimation.
- Normal: estimates surface normals, i.e. a unit vector perpendicular to the tangent plane of the surface at each pixel.
- Depth: estimates normalized depth, i.e. relative depth rather than metric depth.
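The two output types described above can be illustrated with a small post-processing sketch. It is not the authors' code: the function names `decode_normals` and `rescale_depth` are hypothetical, and it assumes the normal prediction is stored as an HxWx3 array encoded in [0, 1] (so that `2 * x - 1` recovers vector components) and that the depth prediction is an HxW array of relative values.

```python
import numpy as np

def decode_normals(pred: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Map an HxWx3 prediction in [0, 1] to unit-length vectors.

    Assumption: components are encoded as (n + 1) / 2, a common
    convention for storing normals as images.
    """
    n = pred.astype(np.float32) * 2.0 - 1.0
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    # Guard against division by zero for degenerate pixels.
    return n / np.maximum(norm, eps)

def rescale_depth(pred: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max rescale a relative depth map to [0, 1], e.g. for display.

    The result is still relative: it carries no metric scale.
    """
    d = pred.astype(np.float32)
    return (d - d.min()) / max(float(d.max() - d.min()), eps)

# Stand-in predictions; a real model output would replace these.
rng = np.random.default_rng(0)
normals = decode_normals(rng.random((384, 384, 3)))
depth = rescale_depth(rng.random((384, 384)))
```

Because the depth is relative, two predictions of the same scene are only comparable after a rescaling of this kind; recovering distances in metres would need extra information that the model does not provide.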
Models
Models to estimate surface normals or depth from RGB images.
- Architecture: DPT
- Training resolutions: 384x384
- Training data: Omnidata dataset
- Input:
  - Dimensions: 384x384
  - Normalization: [0, 1] for the normal model, [-1, 1] for the depth model
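A minimal sketch of the input normalization listed above, assuming the ranges refer to RGB pixel values and that the [-1, 1] range is obtained by shifting and scaling with a mean and standard deviation of 0.5. The helper name `normalize_input` is hypothetical, and the image is assumed to have already been resized to 384x384.

```python
import numpy as np

def normalize_input(rgb_uint8: np.ndarray, task: str) -> np.ndarray:
    """Scale an HxWx3 uint8 image to the input range listed for each task."""
    if rgb_uint8.shape[:2] != (384, 384):
        raise ValueError("expected a 384x384 image; resize before calling")
    x = rgb_uint8.astype(np.float32) / 255.0  # values in [0, 1]
    if task == "normal":
        return x
    if task == "depth":
        # Assumed mapping: (x - mean) / std with mean = std = 0.5.
        return (x - 0.5) / 0.5  # values in [-1, 1]
    raise ValueError(f"unknown task: {task!r}")

# Stand-in image; a real RGB frame would replace this.
image = np.random.default_rng(0).integers(0, 256, size=(384, 384, 3), dtype=np.uint8)
normal_in = normalize_input(image, "normal")
depth_in = normalize_input(image, "depth")
```

Feeding the depth model an image in [0, 1], or the normal model one in [-1, 1], would not raise an error but would likely degrade predictions, so it is worth keeping the task name next to the preprocessing call.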
BibTeX entry and citation info
@inproceedings{eftekhar2021omnidata,
title={Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets From 3D Scans},
author={Eftekhar, Ainaz and Sax, Alexander and Malik, Jitendra and Zamir, Amir},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={10786--10796},
year={2021}
}
If you use our latest pretrained models, please also cite the following paper for 3D data augmentations:
@inproceedings{kar20223d,
title={3D Common Corruptions and Data Augmentation},
author={Kar, O{\u{g}}uzhan Fatih and Yeo, Teresa and Atanov, Andrei and Zamir, Amir},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={18963--18974},
year={2022}
}