---
license: cc0-1.0
tags:
- art
- computer vision
- Image segmentation
---

# DeepLabV3+ ResNet50 for human body parts segmentation

This is a very simple ONNX model that can segment human body parts.

## Why this model

This model is an ONNX conversion of [keras-io/deeplabv3p-resnet50](https://huggingface.co/keras-io/deeplabv3p-resnet50),
which can segment human body parts. All the other models I found were trained on
city scene segmentation.

The original model was built for an old version of Keras and cannot be used with recent versions of TensorFlow,
so I converted it to the ONNX format.

## Usage

Download the `deeplabv3p-resnet50-human.onnx` file and load it with the ONNX Runtime (`onnxruntime`) package.

The result of `model.run` is a list with one array per requested output; taken as a whole, it has the shape `(1, 1, 512, 512, 20)`:

- 1: number of outputs (you can squeeze it)
- 1: batch size (you can squeeze it)
- 512, 512: height and width of the image (fixed)
- 20: number of classes, so you can take the `argmax` over this last axis to get the class of each pixel
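To make the shape handling concrete, here is a minimal sketch that reduces an output of that shape to a per-pixel class map. It uses random numbers as a stand-in for real model output (for illustration only), and `to_label_map` is a hypothetical helper name, not part of ONNX Runtime.

```python
import numpy as np


def to_label_map(outputs):
    """Reduce the list returned by `model.run` to a (512, 512) class-index map.

    `outputs` is assumed to be a list holding one array of shape
    (1, 512, 512, 20): batch, height, width, classes.
    """
    logits = np.asarray(outputs)   # (1, 1, 512, 512, 20)
    logits = logits[0, 0]          # drop the output and batch axes -> (512, 512, 20)
    return logits.argmax(axis=-1)  # index of the highest score per pixel -> (512, 512)


# random stand-in for the real output of model.run (illustration only)
fake_outputs = [np.random.rand(1, 512, 512, 20).astype(np.float32)]
label_map = to_label_map(fake_outputs)
print(label_map.shape)
```

Indexing `[0, 0]` is equivalent to squeezing the two leading axes, but it fails loudly if either of them is not actually of size 1.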

```python
import sys

import numpy as np
import onnxruntime
from PIL import Image

model = onnxruntime.InferenceSession("deeplabv3p-resnet50-human.onnx")

# load as RGB (drops any alpha channel) and resize to the fixed input size
img = Image.open(sys.argv[1] if len(sys.argv) > 1 else "image.jpg").convert("RGB")
img = img.resize((512, 512))
# scale pixel values from [0, 255] to [-1, 1]
img = np.array(img).astype(np.float32) / 127.5 - 1
# add the batch dimension: (512, 512, 3) -> (1, 512, 512, 3)
img = np.expand_dims(img, axis=0)

# infer
input_name = model.get_inputs()[0].name
output_name = model.get_outputs()[0].name
result = model.run([output_name], {input_name: img})

# result is a list holding one array of shape (1, 512, 512, 20)
result = np.array(result[0])
# argmax the classes, then remove the batch dimension -> (512, 512)
result = result.argmax(axis=3).squeeze(0)

# get the masks
for i in range(20):
    detected = result == i  # (512, 512) boolean array, True where class i was predicted
    mask = np.zeros((512, 512), dtype=np.uint8)  # 8-bit grayscale, displayable by PIL
    mask[detected] = 255
    Image.fromarray(mask).show()  # or save, or return the mask...
```
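Because the model works on a fixed 512x512 input, the class map has to be scaled back if you want masks that line up with the original picture. Below is a minimal sketch in plain NumPy; `resize_label_map` is a hypothetical helper. It uses nearest-neighbour sampling (rather than bilinear interpolation) so that class indices are never blended into values that belong to no class.

```python
import numpy as np


def resize_label_map(label_map, width, height):
    """Nearest-neighbour resize of a (H, W) class-index map to (height, width)."""
    src_h, src_w = label_map.shape
    # for every target row/column, pick the source row/column it falls into
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    return label_map[rows[:, None], cols[None, :]]


# usage (hypothetical): `orig` would be the PIL image kept before resizing to 512x512
# full_size = resize_label_map(result, orig.width, orig.height)
```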

## Classes index

This is the list of classes that the model can detect (some classes are unidentified; see the known limitations below):

- 0: "background"
- 1: "unknown"
- 2: "hair"
- 3: "unknown"
- 4: "glasses"
- 5: "top-clothes"
- 6: "unknown"
- 7: "unknown"
- 8: "unknown"
- 9: "bottom-clothes"
- 10: "torso-skin"
- 11: "unknown"
- 12: "unknown"
- 13: "face"
- 14: "left-arm"
- 15: "right-arm"
- 16: "left-leg"
- 17: "right-leg"
- 18: "left-foot"
- 19: "right-foot"
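If you prefer to work with names rather than indices, the list above can be turned into a lookup table. This is a minimal sketch: `CLASSES` and `class_mask` are hypothetical names, and because several indices are labelled "unknown", a name is mapped to every index that carries it.

```python
import numpy as np

# index -> name, copied from the list above
CLASSES = {
    0: "background", 1: "unknown", 2: "hair", 3: "unknown", 4: "glasses",
    5: "top-clothes", 6: "unknown", 7: "unknown", 8: "unknown",
    9: "bottom-clothes", 10: "torso-skin", 11: "unknown", 12: "unknown",
    13: "face", 14: "left-arm", 15: "right-arm", 16: "left-leg",
    17: "right-leg", 18: "left-foot", 19: "right-foot",
}


def class_mask(label_map, name):
    """Return a uint8 mask (255 where the pixel belongs to `name`) for a (H, W) class map."""
    indices = [i for i, n in CLASSES.items() if n == name]
    if not indices:
        raise KeyError(f"unknown class name: {name!r}")
    return np.where(np.isin(label_map, indices), 255, 0).astype(np.uint8)
```

Masks can then be combined with bitwise operators, for example `class_mask(result, "left-arm") | class_mask(result, "right-arm")` for both arms.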

## Known limitations

- The model may fail on portrait images, because it was trained on full-body images.
- Some classes are unidentified: I could not find the list of classes of the original model (help is welcome!).
- The model is not perfect and can fail on some images. I am not the author of the model, so I cannot fix it.

## License

The [original model card](https://huggingface.co/keras-io/deeplabv3p-resnet50/blob/main/README.md) lists the CC0-1.0
license. I am not sure it is the right license for the model, but I keep it.

> Anyway, thanks to the authors of the model for sharing it and leaving it open to use.

This means that you may use, share, modify, and distribute the model without restriction.