# Colorizer Model
**Table of Contents**
1. Objectives
2. Iterations
- Release v0.0.1
3. Conclusion
## 1. Objectives
Given a grayscale image of a manga, comic, or drawing in general, the goal is to output a colorized version of it.
The dataset I gathered for this model contains 1079 images extracted from:
- Bleach (Volume 1)
- Dragon Ball Super (Volume 21)
- Naruto (Volume 1)
- One Piece (Volume 99)
- Attack on Titan (Volume 1 and 2)
Of those 1079 images, 755 (~70%) were used for training, 215 (~20%) for validation, and 109 (~10%) for the test set.
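For illustration, such a split could be produced with `torch.utils.data.random_split`; this is only a sketch, and `manga_dataset` is a hypothetical `Dataset` of (grayscale, color) image pairs rather than the actual preprocessing code:
```python
import torch
from torch.utils.data import random_split

# Sketch only: `manga_dataset` is assumed to be a Dataset of
# (grayscale, color) image pairs; the real split code may differ.
generator = torch.Generator().manual_seed(42)  # fixed seed for reproducibility
train_set, val_set, test_set = random_split(
    manga_dataset, [755, 215, 109], generator=generator
)
```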
## 2. Iterations
### Release v0.0.1
For the first release, I trained an encoder-decoder model from scratch with the following architecture:
```
MangaColorizer(
(encoder): Sequential(
(0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(5): ReLU(inplace=True)
)
(decoder): Sequential(
(0): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(1): ReLU(inplace=True)
(2): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(3): ReLU(inplace=True)
(4): ConvTranspose2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): Tanh()
)
)
```
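For reference, the printed architecture above corresponds to an `nn.Module` along these lines (a reconstruction from the repr, not the original training code):
```python
import torch.nn as nn

class MangaColorizer(nn.Module):
    """Encoder-decoder reconstructed from the printed architecture above."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, x):
        # x: (N, 1, H, W) grayscale -> (N, 3, H, W) colorized
        return self.decoder(self.encoder(x))
```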
The model was trained for 100 epochs with the following optimizer, using MSE as the loss function:
`optimizer = optim.Adam(model.parameters(), lr=0.0001)`
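A minimal training-loop sketch consistent with this setup might look as follows; `train_loader` is a hypothetical `DataLoader` yielding (grayscale, color) batches scaled to the `Tanh` output range, and `MangaColorizer` refers to the module sketched above:
```python
import torch
import torch.nn as nn
import torch.optim as optim

# Assumptions: `train_loader` is a hypothetical DataLoader of
# (grayscale, color) batches; targets are scaled to [-1, 1] to match Tanh.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = MangaColorizer().to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)

for epoch in range(100):
    model.train()
    for gray, color in train_loader:
        gray, color = gray.to(device), color.to(device)
        optimizer.zero_grad()
        loss = criterion(model(gray), color)
        loss.backward()
        optimizer.step()
```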
![training history](images/training_history.png)
From the training history curve, it looks like we could benefit from increasing the learning rate or training for more epochs, since the loss is still decreasing in the final epochs.
However, the rate of decrease is slow, which could indicate that the model does not have enough capacity to learn more signal from the data.
Let's have a look at some examples from the test set:
<div style="display:flex;">
<figure style="width:33.33%; text-align:center;">
<img src="images/examples/01_input.png" style="width:100%;">
<figcaption>Input Image</figcaption>
</figure>
<figure style="width:33.33%; text-align:center;">
<img src="images/examples/01_output.png" style="width:100%;">
<figcaption>Output Image</figcaption>
</figure>
<figure style="width:33.33%; text-align:center;">
<img src="images/examples/01_target.png" style="width:100%;">
<figcaption>Target Image</figcaption>
</figure>
</div>
<div style="display:flex;">
<figure style="width:33.33%; text-align:center;">
<img src="images/examples/02_input.png" style="width:100%;">
<figcaption>Input Image</figcaption>
</figure>
<figure style="width:33.33%; text-align:center;">
<img src="images/examples/02_output.png" style="width:100%;">
<figcaption>Output Image</figcaption>
</figure>
<figure style="width:33.33%; text-align:center;">
<img src="images/examples/02_target.png" style="width:100%;">
<figcaption>Target Image</figcaption>
</figure>
</div>
The model learned to reproduce the drawing from the input image and to add a little color to it, but the result is still nowhere near satisfactory.
**Performance on the test set**
- MSE: 0.008598181701702099
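For completeness, a per-pixel test MSE like the one above could be computed along these lines (a sketch only; `test_loader` is a hypothetical `DataLoader` over the 109 test images, and `model`/`device` are as in the training sketch):
```python
import torch

# Accumulate squared error over all pixels/channels, then average.
model.eval()
total, count = 0.0, 0
with torch.no_grad():
    for gray, color in test_loader:
        gray, color = gray.to(device), color.to(device)
        pred = model(gray)
        total += torch.nn.functional.mse_loss(pred, color, reduction="sum").item()
        count += color.numel()
print(f"Test MSE: {total / count:.6f}")
```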
## 3. Conclusion
To conclude, training a model from scratch is probably not going to cut it for this task, especially given the limited GPU resources (I use Kaggle's free GPU). The next step is to take pre-trained models from Hugging Face and fine-tune them on the manga data.