---
license: cc0-1.0
tags:
- art
- computer vision
- Image segmentation
---
# DeepLabV3+ ResNet50 for human body parts segmentation
This is a very simple ONNX model that can segment human body parts.
## Why this model
This model is an ONNX conversion of [keras-io/deeplabv3p-resnet50](https://huggingface.co/keras-io/deeplabv3p-resnet50),
which can segment human body parts. All the other models I found were trained on
city segmentation.
The original model was built for an old version of Keras and cannot be used with recent versions of TensorFlow,
so I converted it to the ONNX format.
## Usage
Get the `deeplabv3p-resnet50-human.onnx` file and run it with the `onnxruntime` package.
The result of `model.run` is a `(1, 1, 512, 512, 20)` tensor:
- 1: number of outputs (you can squeeze it)
- 1: batch size (you can squeeze it)
- 512, 512: the size of the image (fixed)
- 20: number of classes; take the `argmax` over this axis to get the class of each pixel
```python
import sys

import numpy as np
import onnxruntime
from PIL import Image

model = onnxruntime.InferenceSession("deeplabv3p-resnet50-human.onnx")

# load the image, force 3 channels, and resize to the fixed input size
img = Image.open(sys.argv[1] if len(sys.argv) > 1 else "image.jpg").convert("RGB")
img = img.resize((512, 512))
# scale pixel values from [0, 255] to [-1, 1]
img = np.array(img).astype(np.float32) / 127.5 - 1
# add the batch dimension: (512, 512, 3) -> (1, 512, 512, 3)
img = np.expand_dims(img, axis=0)

# infer
input_name = model.get_inputs()[0].name
output_name = model.get_outputs()[0].name
result = model.run([output_name], {input_name: img})

# take the single output: (1, 512, 512, 20)
result = np.array(result[0])
# argmax over the classes, then remove the batch dimension -> (512, 512)
result = result.argmax(axis=3).squeeze(0)

# get the masks
for i in range(20):
    detected = result == i  # boolean (512, 512) array of the pixels of class i
    mask = np.zeros((512, 512), dtype=np.uint8)
    mask[detected] = 255
    Image.fromarray(mask).show()  # or save, or return the mask...
```
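Opening one window per class can be tedious, so a single color-coded image is sometimes easier to inspect. The sketch below is only one possible approach: `make_palette` and `colorize` are hypothetical helper names, the colors are arbitrary, and a synthetic class map stands in for the `(512, 512)` array of class indices produced above.

```python
import colorsys

import numpy as np

NUM_CLASSES = 20  # number of classes stated in this model card


def make_palette(num_classes: int = NUM_CLASSES) -> np.ndarray:
    """Return a (num_classes, 3) uint8 palette; class 0 (background) stays black."""
    palette = np.zeros((num_classes, 3), dtype=np.uint8)
    for i in range(1, num_classes):
        # evenly spaced hues so that every class gets a distinct, arbitrary color
        r, g, b = colorsys.hsv_to_rgb((i - 1) / (num_classes - 1), 0.8, 1.0)
        palette[i] = (int(r * 255), int(g * 255), int(b * 255))
    return palette


def colorize(class_map: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Turn an (H, W) array of class indices into an (H, W, 3) RGB image."""
    return palette[class_map]


# synthetic class map standing in for the argmax result of the model
demo_map = np.zeros((512, 512), dtype=np.int64)
demo_map[100:200, 100:200] = 2   # pretend this square is "hair"
demo_map[200:400, 150:350] = 5   # pretend this rectangle is "top-clothes"

palette = make_palette()
rgb = colorize(demo_map, palette)
print(rgb.shape, rgb.dtype)
```

With the real output, `Image.fromarray(colorize(result, make_palette())).save("segmentation.png")` should write the whole segmentation as one image.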
## Classes index
This is the list of classes that the model can detect (some classes are not specifically identified, see below):
- 0: "background"
- 1: "unknown"
- 2: "hair"
- 3: "unknown"
- 4: "glasses"
- 5: "top-clothes"
- 6: "unknown"
- 7: "unknown"
- 8: "unknown"
- 9: "bottom-clothes"
- 10: "torso-skin"
- 11: "unknown"
- 12: "unknown"
- 13: "face"
- 14: "left-arm"
- 15: "right-arm"
- 16: "left-leg"
- 17: "right-leg"
- 18: "left-foot"
- 19: "right-foot"
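To make the indices easier to use in code, the list above can be kept as a Python list. The following is a minimal sketch: `CLASS_NAMES` and `pixel_share_by_class` are hypothetical names, not part of the model, and the keys include the index because several classes share the "unknown" label.

```python
from collections import Counter

# labels copied from the list above; "unknown" marks the unidentified classes
CLASS_NAMES = [
    "background", "unknown", "hair", "unknown", "glasses",
    "top-clothes", "unknown", "unknown", "unknown", "bottom-clothes",
    "torso-skin", "unknown", "unknown", "face", "left-arm",
    "right-arm", "left-leg", "right-leg", "left-foot", "right-foot",
]


def pixel_share_by_class(class_ids):
    """Return {"<index>:<name>": fraction of pixels} for each class present."""
    counts = Counter(int(i) for i in class_ids)
    total = sum(counts.values())
    return {
        f"{idx}:{CLASS_NAMES[idx]}": count / total
        for idx, count in sorted(counts.items())
    }


# tiny hand-made example standing in for a flattened class map
demo = [0, 0, 0, 0, 2, 2, 13, 5]
shares = pixel_share_by_class(demo)
print(shares)
```

With the `(512, 512)` class map from the usage example, the call would be `pixel_share_by_class(result.ravel().tolist())`.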
## Known limitations
- The model may fail on portrait images, because it was trained on "full body" images.
- Some classes are unidentified: I could not find the original list of classes (help welcome!).
- The model is not perfect and can fail on some images. I'm not the author of the model, so I can't fix it.
## License
The [original model card](https://huggingface.co/keras-io/deeplabv3p-resnet50/blob/main/README.md) proposes the "CC0-1.0"
license. I don't know if it's the right license for the model, but I am keeping it.
> Anyway, thanks to the authors of the model for sharing it and leaving it open to use.
Under CC0-1.0, you may use, share, modify, and distribute the model without any restriction.