---
language:
- en
license: other
license_name: autodesk-non-commercial-3d-generative-v1.0
license_link: LICENSE.md
tags:
- make-a-shape
- mv-to-3d
pipeline_tag: image-to-3d
---
# Model Card for Make-A-Shape Multi-View to 3D Model

This model was introduced in the Make-A-Shape paper. It generates high-quality 3D shapes from multi-view images, with intricate geometric details, realistic structures, and complex topologies.

## Model Details

### Model Description

Make-A-Shape is a 3D generative framework trained on a dataset of over 10 million publicly available 3D shapes. The multi-view to 3D model is one of the conditional generation models in this framework. It takes four view-specific images as input and efficiently generates a wide range of high-quality 3D shapes. The model uses a wavelet-tree representation and an adaptive training strategy to achieve strong geometric detail and structural plausibility.

- **Developed by:** Ka-Hei Hui, Aditya Sanghi, Arianna Rampini, Kamal Rahimi Malekshan, Zhengzhe Liu, Hooman Shayani, Chi-Wing Fu
- **Model type:** 3D Generative Model
- **License:** Autodesk Non-Commercial (3D Generative) v1.0

For more information, see the [Autodesk Research publication page](https://www.research.autodesk.com/publications/generative-ai-make-a-shape/), the [project page](https://edward1997104.github.io/make-a-shape/), and [the ICML paper](https://proceedings.mlr.press/v235/hui24a.html).

### Model Sources

- **Repository:** [https://github.com/AutodeskAILab/Make-a-Shape](https://github.com/AutodeskAILab/Make-a-Shape)
- **Paper:** [arXiv:2401.11067](https://arxiv.org/abs/2401.11067), [ICML - Make-A-Shape: a Ten-Million-scale 3D Shape Model](https://proceedings.mlr.press/v235/hui24a.html)
- **Demo:** [Google Colab](https://colab.research.google.com/drive/1XIoeanLjXIDdLow6qxY7cAZ6YZpqY40d?usp=sharing)

## Uses 

### Direct Use 

This model is released by Autodesk and is intended for academic and research purposes only, for the theoretical exploration and demonstration of the Make-A-Shape 3D generative framework. See the [repository README](https://github.com/AutodeskAILab/Make-a-Shape?tab=readme-ov-file#multi-view-to-3d) for inference instructions.

### Out-of-Scope Use 

The model should not be used for:

- Commercial purposes 

- Creation of load-bearing physical objects the failure of which could cause property damage or personal injury 

- Any usage not in compliance with the [license](https://huggingface.co/ADSKAILab/Make-A-Shape-multi-view-20m/blob/main/LICENSE.md), in particular, the "Acceptable Use" section. 

## Bias, Risks, and Limitations 

### Bias 

- The model may inherit biases present in the publicly-available training datasets, which could lead to uneven representation of certain object types or styles. 

- The model's performance may degrade for object categories or styles that are underrepresented in the training data. 

### Risks and Limitations 

- The quality of the generated 3D output depends on the quality and clarity of the input images.

- The model may occasionally generate implausible shapes, especially when the input images are ambiguous or of low quality. Even theoretically plausible shapes should not be relied upon for real-world structural soundness.

## How to Get Started with the Model 

Please refer to the instructions [here](https://github.com/AutodeskAILab/Make-a-Shape?tab=readme-ov-file#multi-view-to-3d).

## Training Details 

### Training Data 

The model was trained on a dataset of over 10 million 3D shapes aggregated from 18 different publicly-available sub-datasets, including ModelNet, ShapeNet, SMPL, Thingi10K, SMAL, COMA, House3D, ABC, Fusion 360, 3D-FUTURE, BuildingNet, DeformingThings4D, FG3D, Toys4K, ABO, Infinigen, Objaverse, and two subsets of ObjaverseXL (Thingiverse and GitHub).

### Training Procedure

#### Preprocessing 

Each 3D shape in the dataset was converted into a truncated signed distance function (TSDF) with a resolution of 256³. The TSDF was then decomposed using a discrete wavelet transform to create the wavelet-tree representation used by the model.
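
The two preprocessing steps above can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline: it uses an analytic sphere instead of a real mesh, a 32³ grid instead of 256³, and a single-level Haar transform swapped in for whatever wavelet filter and number of levels the paper uses. The function names (`sphere_tsdf`, `haar_step`, `haar_dwt3d`) and the truncation value are assumptions made for illustration only.

```python
import numpy as np

def sphere_tsdf(resolution=32, radius=0.5, truncation=0.1):
    """Truncated signed distance to a sphere, sampled on a cubic grid in [-1, 1]^3."""
    axis = np.linspace(-1.0, 1.0, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    sdf = np.sqrt(x**2 + y**2 + z**2) - radius  # negative inside, positive outside
    return np.clip(sdf, -truncation, truncation)  # truncation step of the TSDF

def haar_step(volume, axis):
    """One Haar analysis step along one axis: returns (low-pass, high-pass) halves."""
    even = np.take(volume, np.arange(0, volume.shape[axis], 2), axis=axis)
    odd = np.take(volume, np.arange(1, volume.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_dwt3d(volume):
    """Single-level separable 3D Haar transform: one coarse and seven detail subbands."""
    subbands = {"": volume}
    for axis in range(3):
        split = {}
        for key, band in subbands.items():
            low, high = haar_step(band, axis)
            split[key + "L"] = low
            split[key + "H"] = high
        subbands = split
    return subbands

tsdf = sphere_tsdf()
bands = haar_dwt3d(tsdf)
coarse = bands["LLL"]  # half-resolution approximation of the TSDF
```

Applying the same transform again to `coarse` would give the next, coarser level; stacking such levels is the general idea behind a wavelet-tree style representation, although the exact tree layout used by the model is described in the paper rather than here.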

#### Training Hyperparameters

- **Training regime:** Please refer to the paper.

#### Speeds, Sizes, Times 

- The model was trained on 48 × A10G GPUs for about 20 days, amounting to around 23,000 GPU hours.
- The model can generate shapes within two seconds for most conditions.
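
As a quick sanity check on the compute figure above, the GPU-hour estimate follows directly from the GPU count and the training duration. The 20-day figure is approximate ("about 20 days"), so the result is only a rough total.

```python
gpus = 48          # number of A10G GPUs
days = 20          # approximate training duration
hours_per_day = 24

gpu_hours = gpus * days * hours_per_day
print(gpu_hours)   # 23040, i.e. around 23,000 GPU hours
```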

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was evaluated on a test set consisting of 2% of the shapes from each sub-dataset in the training data, as well as on the entire Google Scanned Objects (GSO) dataset, which was not part of the training data.
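
A per-sub-dataset hold-out of this kind can be sketched as follows. This is only an illustration of the idea of taking 2% from each source; the actual sampling procedure, seed, and rounding used by the authors are not stated here, and `split_per_subdataset` as well as the placeholder source names are hypothetical.

```python
import random
from collections import defaultdict

def split_per_subdataset(items, test_fraction=0.02, seed=0):
    """Split (source, shape_id) pairs so each source contributes ~test_fraction to the test set."""
    by_source = defaultdict(list)
    for source, shape_id in items:
        by_source[source].append(shape_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for source in sorted(by_source):
        ids = sorted(by_source[source])
        rng.shuffle(ids)
        # Assumption: every source contributes at least one test shape.
        n_test = max(1, round(len(ids) * test_fraction))
        test += [(source, i) for i in ids[:n_test]]
        train += [(source, i) for i in ids[n_test:]]
    return train, test

# Placeholder sources, not real sub-dataset names.
items = [("subset_a", i) for i in range(100)] + [("subset_b", i) for i in range(50)]
train, test = split_per_subdataset(items)
```

Splitting within each source, rather than over the pooled data, keeps small sub-datasets represented in the test set.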

#### Factors

The evaluation considered various factors such as the quality of generated shapes, the ability to capture fine details and complex structures, and the model's performance across different object categories.

#### Metrics

The model was evaluated using the following metrics:
- Intersection over Union (IoU)
- Light Field Distance (LFD)
- Chamfer Distance (CD)
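
For readers unfamiliar with these metrics, the sketch below shows one common way to compute IoU on voxel occupancy grids and Chamfer Distance on point sets. It is an assumption-laden illustration, not the paper's evaluation code: the exact conventions (grid resolution, squared versus unsquared distances, number of sampled points, normalization) may differ, and LFD is omitted because it requires rendering the shapes. The function names are hypothetical.

```python
import numpy as np

def voxel_iou(a, b):
    """Intersection over Union of two boolean occupancy grids of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty grids are treated as identical
    return float(np.logical_and(a, b).sum() / union)

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance using mean squared nearest-neighbor distances."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Two overlapping slabs in a 4x4x4 grid: 16 shared voxels out of 48 occupied in total.
grid_a = np.zeros((4, 4, 4), dtype=bool)
grid_b = np.zeros((4, 4, 4), dtype=bool)
grid_a[:2] = True
grid_b[1:3] = True
iou = voxel_iou(grid_a, grid_b)

points_p = [[0, 0, 0], [1, 0, 0]]
points_q = [[0, 0, 0], [1, 0, 1]]
cd = chamfer_distance(points_p, points_q)
```

Higher IoU indicates better overlap, while lower Chamfer Distance indicates closer surfaces. The brute-force pairwise distance matrix is fine for small point sets; larger ones would normally use a spatial index.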

### Results

The multi-view to 3D model achieved the following results on the "Our Val" dataset (lower is better for LFD and CD; higher is better for IoU):
- LFD: 2217.25
- IoU: 0.6707
- CD: 0.00350

On the GSO dataset:
- LFD: 1890.85
- IoU: 0.7460
- CD: 0.00337


## Technical Specifications 

### Model Architecture and Objective 

The model uses a U-ViT architecture with learnable skip-connections between the convolution and deconvolution blocks. It employs a wavelet-tree representation and a subband adaptive training strategy to effectively capture both coarse and fine details of 3D shapes. 

### Compute Infrastructure

#### Hardware

The model was trained on 48 × A10G GPUs.

## Citation 

**BibTeX:**
```bibtex
@InProceedings{pmlr-v235-hui24a,
  title = 	 {Make-A-Shape: a Ten-Million-scale 3{D} Shape Model},
  author =       {Hui, Ka-Hei and Sanghi, Aditya and Rampini, Arianna and Rahimi Malekshan, Kamal and Liu, Zhengzhe and Shayani, Hooman and Fu, Chi-Wing},
  booktitle = 	 {Proceedings of the 41st International Conference on Machine Learning},
  pages = 	 {20660--20681},
  year = 	 {2024},
  editor = 	 {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = 	 {235},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {21--27 Jul},
  publisher =    {PMLR},
  pdf = 	 {https://raw.githubusercontent.com/mlresearch/v235/main/assets/hui24a/hui24a.pdf},
  url = 	 {https://proceedings.mlr.press/v235/hui24a.html},
}
```