Update README.md
README.md
CHANGED
@@ -1,20 +1,12 @@
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
-
|
5 |
-
|
6 |
-
Experimental opinionated aesthetic score models.
|
7 |
|
8 |
## CityAesthetics - Anime
|
9 |
|
10 |
[Training/inference code](https://github.com/city96/CityClassifiers) | [Live Demo](https://huggingface.co/spaces/city96/CityAesthetics-demo)
|
11 |
|
12 |
-
### Design goals
|
13 |
-
|
14 |
-
The goal was to create an aesthetic predictor that can work well on one specific type of image (in this case, anime) while filtering out everything else. To achieve this, the model was trained on a set of 3080 hand-scored images with multiple refinement steps, where false positives and negatives would be added to the training set with corrected scores after each test run.
|
15 |
-
|
16 |
-
This model aims to produce as few false positives as possible. Restricting it to one type of media seems to help with this, as predictors that attempt to score both real life and 2D images tend to produce false positives. For a mixed dataset with both types of images, the simplest solution would be to use two separate aesthetic score models and a classifier to pick the appropriate one.
|
17 |
-
|
18 |
#### Intentional biases
|
19 |
|
20 |
- Completely negative towards real life photos (ideal score of 0%)
|
@@ -29,58 +21,3 @@ This model focuses on as few false positives as possible. Only having one type o
|
|
29 |
- Noticeable positive bias towards anime characters with animal ears
|
30 |
- Hit-or-miss with AI generated images due to style/quality not being correlated
|
31 |
|
32 |
-
#### Out-of-scope
|
33 |
-
|
34 |
-
- This model is not meant for moderation/live filtering/etc
|
35 |
-
- The demo code is not meant to work with large-scale datasets and is therefore only single-threaded. If you're working on something that requires an optimized version that can work on pre-computed CLIP embeddings for faster iteration, feel free to [contact me](mailto:[email protected]).
|
36 |
-
|
37 |
-
### Use cases
|
38 |
-
|
39 |
-
The main use case will be to provide baseline filtering on large datasets (i.e. a high pass filter). For this, the score brackets were decided as follows:
|
40 |
-
|
41 |
-
- <10% - Real life photos, noise, excessive text (subtitles, memes, etc)
|
42 |
-
- 10-20% - Manga panels, images with no subject, non-human subjects
|
43 |
-
- 20-40% - Sketches, oekaki, rough lineart (score depends on quality)
|
44 |
-
- 40-50% - Flat shading, TV anime screenshots, average images
|
45 |
-
- \>50% - "High quality" images based on my personal style preferences
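The brackets above can be turned into a small lookup. This is only a sketch: `describe_score` and the shortened labels are hypothetical names, and treating each boundary as belonging to the higher bracket (so exactly 50% counts as "high quality") is an assumption, since the list does not say which side the edges fall on.

```python
# Upper bounds come from the bracket list; labels are shortened.
# Assumption: a score equal to a boundary falls into the higher bracket.
BRACKETS = [
    (0.10, "real life photos, noise, excessive text"),
    (0.20, "manga panels, images with no subject, non-human subjects"),
    (0.40, "sketches, oekaki, rough lineart"),
    (0.50, "flat shading, TV anime screenshots, average images"),
]
TOP_LABEL = "high quality (author's style preferences)"


def describe_score(score: float) -> str:
    """Map a model score in [0.0, 1.0] to its bracket description."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0.0, 1.0], got {score}")
    for upper, label in BRACKETS:
        if score < upper:
            return label
    return TOP_LABEL
```

For baseline filtering, one would keep only the images whose bracket is at or above the level the dataset needs.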
|
46 |
-
|
47 |
-
The \>60% score range is intended to help pick out the "best" images from a dataset. One could filter by score directly (i.e. use it as a band pass filter), but the scores above 50% are a lot more vague. Instead, I'd recommend sorting the dataset by score and setting a limit on the total number of images to select.
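The "sort by score and cap the count" approach could look roughly like this. `select_top` is a hypothetical helper, and the optional `min_score` floor of 0.5 is an assumption borrowed from the bracket list, not a setting from the repository.

```python
import heapq


def select_top(scores: dict[str, float], limit: int, min_score: float = 0.5):
    """Return up to `limit` (name, score) pairs, highest score first.

    Entries below `min_score` are dropped before ranking, so fewer than
    `limit` results come back when the dataset has few good images.
    """
    candidates = [(name, s) for name, s in scores.items() if s >= min_score]
    return heapq.nlargest(limit, candidates, key=lambda item: item[1])
```

With precomputed scores in a dict keyed by filename, `select_top(scores, 100)` would give a "top 100" style selection without committing to a fixed score threshold.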
|
48 |
-
|
49 |
-
Top 100 images from a subset of danbooru2021:
|
50 |
-
|
51 |
-
![AesPredv17_T100C](https://github.com/city96/CityClassifiers/assets/125218114/b7d8a167-a53a-46bb-8737-6c6c2a04f50f)
|
52 |
-
|
53 |
-
### Training
|
54 |
-
|
55 |
-
The training script provided is initialized with the current model settings as the defaults (7e-6 LR, cosine scheduler, 100K steps).
|
56 |
-
|
57 |
-
![loss](https://github.com/city96/CityClassifiers/assets/125218114/611ae144-1390-48d3-988d-59a03c4a2f26)
|
58 |
-
|
59 |
-
Final dataset score distribution for v1.8:
|
60 |
-
```
|
61 |
-
3080 images in dataset.
|
62 |
-
0 - 31 |
|
63 |
-
1 - 162 |||||
|
64 |
-
2 - 533 |||||||||||||||||
|
65 |
-
3 - 675 |||||||||||||||||||||
|
66 |
-
4 - 690 ||||||||||||||||||||||
|
67 |
-
5 - 576 ||||||||||||||||||
|
68 |
-
6 - 228 |||||||
|
69 |
-
7 - 95 |||
|
70 |
-
8 - 54 |
|
71 |
-
9 - 29
|
72 |
-
10 - 7
|
73 |
-
raw - 0
|
74 |
-
```
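The bar lengths in the listing above are consistent with one `|` per whole percent of the dataset (690 of 3080 is 22.4%, which gives 22 bars). That rule is inferred from the numbers rather than taken from the training script, so the following is a sketch under that assumption; `render_distribution` is a hypothetical name and the `raw` row is left out because its meaning is not stated.

```python
def render_distribution(counts: dict[int, int]) -> str:
    """Render a text histogram with one '|' per whole percent of the total."""
    total = sum(counts.values())
    lines = [f"{total} images in dataset."]
    for score in sorted(counts):
        n = counts[score]
        # Integer division floors the percentage, so small buckets get no bar.
        bars = "|" * (n * 100 // total) if total else ""
        lines.append(f"{score} - {n} {bars}".rstrip())
    return "\n".join(lines)


# Counts copied from the v1.8 listing above.
V18_COUNTS = {0: 31, 1: 162, 2: 533, 3: 675, 4: 690, 5: 576,
              6: 228, 7: 95, 8: 54, 9: 29, 10: 7}
print(render_distribution(V18_COUNTS))
```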
|
75 |
-
|
76 |
-
Version history:
|
77 |
-
|
78 |
-
- v1.0 - Initial test model with ~150 images to test viability
|
79 |
-
- v1.1 - Initialized top 5 score brackets with ~250 hand-picked images
|
80 |
-
- v1.2 - Manually scored ~2500 danbooru images for the main training set
|
81 |
-
- v1.3-v1.7 - Repeatedly ran the model against various datasets, adding the false negatives/positives to the training set to try to correct for various edge cases
|
82 |
-
- v1.8 - Added 3D and 2.5D images to the negative brackets to filter these as well
|
83 |
-
|
84 |
-
### Architecture
|
85 |
-
|
86 |
-
The model itself is fairly simple. It takes embeddings from a CLIP model (in this case, `openai/clip-vit-large-patch14`) and expands them to 1024 dimensions. From there, a single block with residuals is followed by a few linear layers which converge down to the final output - a single float between 0.0 and 1.0.
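As a rough illustration of that data flow, here is a framework-free sketch with random weights. Only the overall shape follows the description (768-dimensional CLIP ViT-L/14 embedding, expansion to 1024, one residual block, narrowing linear layers, one float out). The class name `AestheticHead`, the intermediate widths `(256, 64)`, the ReLU activations and the final sigmoid are all assumptions; see the linked repository for the actual implementation.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Create a randomly initialised (weights, bias) pair."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def apply_linear(layer, x):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class AestheticHead:
    """Untrained stand-in for the predictor head (hypothetical layout)."""

    def __init__(self, in_dim=768, hidden=1024, down_dims=(256, 64), seed=0):
        rng = random.Random(seed)
        self.expand = make_linear(in_dim, hidden, rng)   # embedding -> 1024
        self.block = make_linear(hidden, hidden, rng)    # residual block
        dims = (hidden, *down_dims, 1)                   # assumed widths
        self.down = [make_linear(a, b, rng) for a, b in zip(dims, dims[1:])]

    def __call__(self, embedding):
        x = apply_linear(self.expand, embedding)
        # Residual connection: add the block's output back onto its input.
        x = [a + b for a, b in zip(x, relu(apply_linear(self.block, x)))]
        for layer in self.down[:-1]:
            x = relu(apply_linear(layer, x))
        (logit,) = apply_linear(self.down[-1], x)
        return 1.0 / (1.0 + math.exp(-logit))  # assumed squashing to (0, 1)
```

Calling `AestheticHead()(embedding)` with a 768-element list returns a single float; with random weights the value is meaningless and only shows the shapes involved.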
|
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
+
Experimental opinionated aesthetic score models. [See GitHub for more info](https://github.com/city96/CityClassifiers)
|
|
|
|
|
5 |
|
6 |
## CityAesthetics - Anime
|
7 |
|
8 |
[Training/inference code](https://github.com/city96/CityClassifiers) | [Live Demo](https://huggingface.co/spaces/city96/CityAesthetics-demo)
|
9 |
|
10 |
#### Intentional biases
|
11 |
|
12 |
- Completely negative towards real life photos (ideal score of 0%)
|
|
|
21 |
- Noticeable positive bias towards anime characters with animal ears
|
22 |
- Hit-or-miss with AI generated images due to style/quality not being correlated
|
23 |
|