---
license: openrail
---
## ADetailer Custom Model
### Foot Model (YOLOv8x)
![foot v.1.0 card](https://huggingface.co/MonetEinsley/ADetailer_CM/resolve/main/Examples/Tharja%20Example.png)
![Tharja Feet Preview](https://huggingface.co/MonetEinsley/ADetailer_CM/resolve/main/Examples/Preview%20Image.png)
![Tharja output](https://huggingface.co/MonetEinsley/ADetailer_CM/resolve/main/Examples/00026-1935383540.png)

Thanks to sp00ns' guide, [Training a Custom Adetailer Model | Civitai](https://civitai.com/articles/1224/training-a-custom-adetailer-model), I created a custom foot model using YOLOv8x.

The foot model that sp00ns provided was helpful, but I wanted to see about making my own.

I first tried using AutoDistiller and Grounded SAM to label each of the 1000 images automatically, but that only partially worked: it also registered hands as feet. (I also hate Colab, as I can't get work done there without it ending the job prematurely.)
So I painstakingly labeled each and every image by hand using RectLabel on my Mac, then spent about 8 hours training the YOLO model on my PC.

Though I'd planned for 500 epochs, training stopped early, and the best result turned out to be from the 93rd epoch.
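A run like the one described above could be set up roughly as follows with the Ultralytics Python package. This is a minimal sketch, not my exact command: the dataset file name `foot.yaml`, the `patience` value, and the image size are assumptions for illustration. Early stopping in Ultralytics is controlled by `patience`, which ends training once validation results stop improving for that many epochs.

```python
def build_train_args(data_yaml, epochs=500, patience=50, imgsz=640):
    """Collect keyword arguments for Ultralytics' YOLO.train().

    `patience` and `imgsz` are illustrative values, not the ones
    used for this model.
    """
    return {
        "data": data_yaml,     # dataset YAML listing train/val folders and class names
        "epochs": epochs,      # upper bound; training may stop earlier
        "patience": patience,  # epochs without improvement before stopping
        "imgsz": imgsz,
    }


def train_foot_model(data_yaml="foot.yaml"):
    """Fine-tune YOLOv8x on a foot dataset (hypothetical file name)."""
    # Imported lazily so the helper above works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolov8x.pt")  # pretrained YOLOv8x weights
    return model.train(**build_train_args(data_yaml))
```

With default settings, Ultralytics writes the best-scoring weights to a `weights/best.pt` file inside the run folder, which is the file you would then hand to ADetailer.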

The dataset includes a lot of my own generated images as well as some stock images: anime, 3D models, and realistic images; male and female subjects; varying skin tones; and various footwear configurations as well as bare feet. That said, there are some things the model still cannot handle well, such as unconventional poses (like images rotated by 90 degrees) and images where the foot is the subject of the composition. My guess is that because the vast majority of the training images show feet taking up only a small percentage of the canvas, not enough training went to close-ups of feet. On the other hand, my intent was to use this model to refine feet that aren't the focus of the image.
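One way to check the guess about close-ups is to measure how much of each image the labeled boxes cover. The sketch below assumes the labels are in YOLO text format (one `class x_center y_center width height` line per box, all normalized to 0-1), so the covered fraction is simply `width * height`. The function names and the 0.25 close-up threshold are made up for illustration.

```python
from pathlib import Path


def box_area_fractions(label_dir):
    """Return the image-area fraction of every box in a folder of YOLO label files."""
    fractions = []
    for label_file in sorted(Path(label_dir).glob("*.txt")):
        for line in label_file.read_text().splitlines():
            parts = line.split()
            if len(parts) != 5:
                continue  # skip blank or malformed lines
            _, _, _, width, height = parts
            fractions.append(float(width) * float(height))
    return fractions


def share_of_closeups(fractions, threshold=0.25):
    """Fraction of boxes covering at least `threshold` of their image."""
    if not fractions:
        return 0.0
    return sum(f >= threshold for f in fractions) / len(fractions)
```

A very low share of large boxes would be consistent with the model struggling on images where the foot fills most of the frame.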

### Version 2.0
V2 came about because I noticed that I had mislabeled my training and validation folders (i.e. my training folder was actually my validation folder, and vice versa). So I renamed the folders to what they're supposed to be, migrated a few of the images from the validation folder into the training folder, and added ~160 new images to the training dataset. I set training for 200 epochs, but epoch 148 was determined to be the best, so that's what V2 is.
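For anyone who runs into the same folder mix-up, the fix can be scripted. This is a rough sketch rather than what I actually ran (I renamed the folders by hand); the function names and the idea of moving an image together with its same-named label file are assumptions about how the dataset is laid out.

```python
import shutil
from pathlib import Path


def swap_dirs(first, second):
    """Swap the names of two sibling folders via a temporary name."""
    first, second = Path(first), Path(second)
    temp = first.with_name(first.name + "_swap_tmp")
    first.rename(temp)
    second.rename(first)
    temp.rename(second)


def move_samples(stems, src_dir, dst_dir):
    """Move every file whose name starts with `<stem>.` (image and label) to dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for stem in stems:
        for path in sorted(src_dir.glob(stem + ".*")):
            shutil.move(str(path), str(dst_dir / path.name))
            moved.append(path.name)
    return moved
```

After swapping, `move_samples` can shift a handful of validation samples into the training folder so the split leans more heavily toward training data.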

All the images below were detected with this version 2 model.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64321f3971bf2c8bcf6e7b74/2STuPS0QzVd8bdkrJvVWb.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64321f3971bf2c8bcf6e7b74/kukSUP3mpIujIeFZqVbG7.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64321f3971bf2c8bcf6e7b74/oe3aRssKDjiYtq_XCZuRI.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64321f3971bf2c8bcf6e7b74/3ceCfPytgfHqljuY9wC5p.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64321f3971bf2c8bcf6e7b74/CmjPSbUts7QoBRrMaHN51.png)
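
To check detections outside the webui, the model can also be loaded directly with the Ultralytics Python package. The snippet below is only a sketch: the weights path and image path are placeholders you would supply yourself, and the 0.3 confidence threshold is an assumed value, not a recommendation from testing.

```python
def to_detections(xyxy_rows, confidences, min_conf=0.3):
    """Pair each box with its score and drop anything below min_conf."""
    detections = []
    for (x1, y1, x2, y2), score in zip(xyxy_rows, confidences):
        if score >= min_conf:
            detections.append({"box": (x1, y1, x2, y2), "conf": score})
    return detections


def detect_feet(image_path, weights_path, min_conf=0.3):
    """Run the foot model on one image and return its boxes in pixel coordinates."""
    # Imported lazily so to_detections() works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)  # path to the downloaded .pt file (placeholder)
    result = model.predict(source=image_path, conf=min_conf)[0]
    return to_detections(
        result.boxes.xyxy.tolist(),  # [x1, y1, x2, y2] per detection
        result.boxes.conf.tolist(),  # confidence per detection
        min_conf,
    )
```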

## Installation
Simply move the model file into the ~\stable-diffusion-webui\models\adetailer folder and restart the webui. It should also work on ComfyUI, but I haven't tested it there. Of course, you'll need the ADetailer extension for AUTOMATIC1111, or its equivalent on ComfyUI, for any of this to work.
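
If you'd rather script the install step, something like the following would do it. It is a minimal sketch: `install_model` is a made-up helper, and both the model file path and the webui root are whatever they happen to be on your machine.

```python
import shutil
from pathlib import Path


def install_model(model_file, webui_root):
    """Copy a .pt file into <webui_root>/models/adetailer and return the new path."""
    target_dir = Path(webui_root) / "models" / "adetailer"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    return Path(shutil.copy2(model_file, target_dir))
```

The webui still needs a restart afterwards before the model shows up in the ADetailer model list.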

## Note
These days Stable Diffusion seems to be good at doing feet, at least in portrait aspect ratios, so I had a hard time coming up with a good use case for portrait images. Instead, I used the model to paint Tharja's toenails in the example. This model should be especially useful for landscape aspect ratios like the ones I normally work in, as feet tend to come out quite low quality there.

In case anyone was wondering, I used other ADetailer models for the hands and face in the Tharja example images above, which is why the faces and hands appear different.