zaidmehdi committed on
Commit
e5654bc
1 Parent(s): 2c8c123

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+manga-colorizer/images/demo_screenshot.png filter=lfs diff=lfs merge=lfs -text
manga-colorizer/images/demo_screenshot.png ADDED

Git LFS Details

  • SHA256: 034dd4fdde7bc6493f9fdd597550541f9d5d9e2c5fda23ebf3b57e1833be188f
  • Pointer size: 132 Bytes
  • Size of remote file: 1.7 MB
manga-colorizer/images/examples/01_input.png ADDED
manga-colorizer/images/examples/01_output.png ADDED
manga-colorizer/images/examples/01_target.png ADDED
manga-colorizer/images/examples/02_input.png ADDED
manga-colorizer/images/examples/02_output.png ADDED
manga-colorizer/images/examples/02_target.png ADDED
manga-colorizer/images/training_history.png ADDED
manga-colorizer/model.md ADDED
@@ -0,0 +1,90 @@
# Colorizer Model

**Table of Contents**
1. Objectives
2. Iterations
   - Release v0.0.1
3. Conclusion

## 1. Objectives
Given a grayscale image of a manga, comic, or other drawing, the goal is to output a colorized version of it.
The dataset I gathered for this model contains 1079 images extracted from:
- Bleach (Volume 1)
- Dragon Ball Super (Volume 21)
- Naruto (Volume 1)
- One Piece (Volume 99)
- Attack on Titan (Volumes 1 and 2)

Of those 1079 images, 755 were used for training (~70%), 215 for validation (~20%), and 109 for the test set (~10%).
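
A split along these lines can be reproduced with a simple shuffle-and-slice. The sketch below is illustrative only: the `images/` directory layout and the seed are assumptions, not the exact setup used for this dataset.

```python
import random
from pathlib import Path

# Hypothetical layout: all extracted pages live in a single "images/" folder.
image_paths = sorted(Path("images").glob("*.png"))
random.seed(42)  # illustrative seed, not necessarily the one used here
random.shuffle(image_paths)

n = len(image_paths)
n_train = int(0.7 * n)  # ~70% for training
n_val = int(0.2 * n)    # ~20% for validation

train_files = image_paths[:n_train]
val_files = image_paths[n_train:n_train + n_val]
test_files = image_paths[n_train + n_val:]  # remaining ~10% for the test set
```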

## 2. Iterations
### Release v0.0.1
For the first release, I trained an encoder-decoder model from scratch with the following architecture:
```
MangaColorizer(
  (encoder): Sequential(
    (0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (5): ReLU(inplace=True)
  )
  (decoder): Sequential(
    (0): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): ConvTranspose2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): Tanh()
  )
)
```
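
For readers who want to reproduce this architecture, the printout above corresponds to roughly the following PyTorch definition. This is a minimal sketch: the layer shapes are taken from the printout, while the forward pass (encoder followed by decoder) is an assumption.

```python
import torch.nn as nn

class MangaColorizer(nn.Module):
    """Simple encoder-decoder: grayscale (1 channel) in, RGB (3 channels) out."""

    def __init__(self):
        super().__init__()
        # Encoder: two stride-2 convolutions downsample the input by a factor of 4 overall.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder: two transposed convolutions upsample back to the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, x):
        # Assumed forward pass: encode, then decode.
        return self.decoder(self.encoder(x))
```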
The model was trained for 100 epochs with MSE as the loss function and the following optimizer:
`optimizer = optim.Adam(model.parameters(), lr=0.0001)`
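
Put together, the training setup described above amounts to something like the sketch below, reusing the `MangaColorizer` sketch from the previous block. The `train_loader` of (grayscale, color) pairs, the device handling, and the scaling of targets to [-1, 1] to match the Tanh output are assumptions, not code taken from this repository.

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MangaColorizer().to(device)

criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001)

# train_loader: assumed DataLoader yielding (grayscale, color) tensor pairs,
# with targets scaled to [-1, 1] to match the Tanh output.
for epoch in range(100):
    model.train()
    running_loss = 0.0
    for gray, color in train_loader:
        gray, color = gray.to(device), color.to(device)

        optimizer.zero_grad()
        output = model(gray)
        loss = criterion(output, color)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * gray.size(0)

    print(f"epoch {epoch + 1}: train MSE = {running_loss / len(train_loader.dataset):.6f}")
```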

![training history](images/training_history.png)

The training history curve suggests that we could probably benefit from increasing the learning rate or training for more epochs, since the loss is still decreasing in the final epochs.
However, the rate of decrease is low, which could indicate that the model doesn't have enough capacity to extract more signal from the data.

Let's have a look at some examples from the test set:

<div style="display:flex;">
  <figure style="width:33.33%; text-align:center;">
    <img src="images/examples/01_input.png" style="width:100%;">
    <figcaption>Input Image</figcaption>
  </figure>
  <figure style="width:33.33%; text-align:center;">
    <img src="images/examples/01_output.png" style="width:100%;">
    <figcaption>Output Image</figcaption>
  </figure>
  <figure style="width:33.33%; text-align:center;">
    <img src="images/examples/01_target.png" style="width:100%;">
    <figcaption>Target Image</figcaption>
  </figure>
</div>

<div style="display:flex;">
  <figure style="width:33.33%; text-align:center;">
    <img src="images/examples/02_input.png" style="width:100%;">
    <figcaption>Input Image</figcaption>
  </figure>
  <figure style="width:33.33%; text-align:center;">
    <img src="images/examples/02_output.png" style="width:100%;">
    <figcaption>Output Image</figcaption>
  </figure>
  <figure style="width:33.33%; text-align:center;">
    <img src="images/examples/02_target.png" style="width:100%;">
    <figcaption>Target Image</figcaption>
  </figure>
</div>

The model seems to have learned to reproduce the drawing from the input image and to add a little color, but the result is still nowhere close to satisfactory.

**Performance on the test set**
- MSE: 0.008598181701702099
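
A test-set MSE like the one above can be obtained with a straightforward evaluation loop. The sketch below assumes a `test_loader` built the same way as the training loader, with targets scaled to the same range as the model output.

```python
import torch

@torch.no_grad()
def evaluate_mse(model, test_loader, device):
    """Average per-pixel, per-channel MSE over the test set."""
    model.eval()
    criterion = torch.nn.MSELoss(reduction="sum")
    total_error, total_elements = 0.0, 0
    for gray, color in test_loader:
        gray, color = gray.to(device), color.to(device)
        output = model(gray)
        total_error += criterion(output, color).item()
        total_elements += color.numel()
    return total_error / total_elements
```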

## 3. Conclusion
To conclude, training a model from scratch is probably not going to cut it for this task, especially given the lack of GPU resources (I use Kaggle's free GPU). The next step is to use some pre-trained models from Hugging Face within the encoder-decoder model.