![00044-20230326044731](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00044-20230326044731.png)

#### Network dimension and alpha

This is one of the most debated topics in LoRa training.
Both the original paper and the initial implementation of LoRa for SD suggest using quite small ranks.
However, 128 dim/alpha became the unfortunate default in many implementations for quite some time, which resulted in files of more than 100 MB.
Ever since LoCon was introduced, we have again advocated the use of smaller dimensions, and we default alpha to 1.

As for LoHa, I have been insisting that the values I am using here (net dim 8, conv dim 4, alpha 1) should be more than enough in most cases.
These values do not come from nowhere: after some analysis, it turns out that almost every model fine-tuned from SD has the information of its weight-difference matrices concentrated in fewer than 64 ranks (this applies even to WD 1.5).
Therefore, 64 should be enough, provided we can actually reach a good point during optimization.
Nonetheless, optimization is quite tricky. Changing the dimension does not only change expressive power; it also modifies the optimization landscape, and it is exactly for the latter reason that alpha was introduced.
It turns out that it may be easier to get good results with a larger dimension, which explains the success of compressing models after training.
In fact, for my 60K umamusume dataset I have a LoCon extracted from a fine-tuned model, but I failed to train a LoCon on it directly.
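The rank claim above can be checked directly: take the weight-difference matrix W_finetuned − W_base for a layer and look at how fast its singular values decay. Below is a minimal sketch with a synthetic stand-in matrix — the layer size and the rank-8 signal are assumptions for illustration, not measurements from a real checkpoint:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 320  # stand-in for one attention weight's size

# Hypothetical weight-difference matrix: a rank-8 signal plus small noise,
# standing in for W_finetuned - W_base of a single layer.
diff = rng.normal(size=(d, 8)) @ rng.normal(size=(8, d))
diff += 0.01 * rng.normal(size=(d, d))

# Effective rank = number of singular values needed for 99% of the energy.
s = np.linalg.svd(diff, compute_uv=False)
energy = np.cumsum(s**2) / np.sum(s**2)
effective_rank = int(np.searchsorted(energy, 0.99)) + 1
print(effective_rank)  # small compared to d
```

Running the same computation on the actual per-layer differences of a fine-tuned model is how one arrives at observations like "fewer than 64 ranks".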

To clarify all this, I tested the following three setups for LoHa with net dim 32 and conv dim 16:
- lr 2e-4, alpha 1
- lr 5e-4, alpha 1
- lr 2e-4, net alpha 16, conv alpha 8
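The third setup exists because of how alpha enters the forward pass: the low-rank update is applied scaled by alpha/dim, so raising alpha simply scales the whole update, much like raising the learning rate does. A minimal numerical sketch (the layer size is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, d = 32, 320  # net dim 32, as in the setups above; d is an illustrative layer size

# Low-rank factors as in LoRa-style updates: delta_W = (alpha / dim) * B @ A
A = rng.normal(size=(dim, d))
B = rng.normal(size=(d, dim))

norm_alpha_1 = np.linalg.norm((1.0 / dim) * (B @ A))
norm_alpha_16 = np.linalg.norm((16.0 / dim) * (B @ A))
print(norm_alpha_16 / norm_alpha_1)  # 16.0: the update scales linearly with alpha
```

With optimizers like Adam, whose per-step update size is largely set by the learning rate rather than the raw gradient magnitude, multiplying alpha by k therefore moves the effective weights roughly k times faster per step — which is why a larger alpha and a larger learning rate can be expected to behave similarly.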

I made the following observations:
- I get good results with the latter two configurations, which confirms that increasing alpha and increasing the learning rate have similar effects. More precisely, I get better backgrounds and better separation between fanart and screenshot styles (only for Mihari and Euphyllia, though) compared to the dim 8/4 LoHas.
- Each of the two, however, has its own strengths and weaknesses. The 5e-4 one works better for `Euphyllia; fanart`.
![00081-20230327011335](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00081-20230327011335.png)
- Among all the modules I trained, only the dim 32/16 half-alpha one can almost consistently output the correct outfit for Mihari.
![00084-20230327021752](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00084-20230327021752.png)
- They seem to give better results for style training in general.
![00091-20230327040330](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00091-20230327040330.png)
![00094-20230327052628](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00094-20230327052628.png)
![00095-20230327055221](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00095-20230327055221.png)

One interesting observation: in the first image, the small LoHa trained at higher resolution gives a better background than the larger LoHa trained only at resolution 512. This again suggests that we may be able to get good results with small dimensions if the networks are trained properly. It is, however, unclear how to achieve that; simply increasing the learning rate to 5e-4 does not seem to be sufficient in this case (as can be seen from the above images).

Finally, these results do not mean that you would always want to use a larger dimension, as you probably do not really need all the details that the additional dimensions bring.


#### Optimizer, learning rate scheduler, and learning rate

This is probably the most important thing to tune once you have a good dataset, but I do not have much to say here.
You should just find the configuration that works.
Some people suggest the [lr finder strategy](https://followfoxai.substack.com/p/find-optimal-learning-rates-for-stable).

I tested several things, and here is what I can say:
- Setting a larger learning rate of course makes training faster, as long as it does not fry things up. Here, switching the learning rate from 2e-4 to 5e-4 improves likeness. Would it, however, be better to train longer with a smaller learning rate? This still needs more testing. (I will zoom in on the case where we only change the text encoder learning rate below.)
- The cosine scheduler learns more slowly than the constant scheduler for a fixed learning rate.
- It seems that Dadaptation trains styles faster but characters more slowly. Why? Since the outputs of Dadaptation seem to change more over time, I guess it may simply have picked a larger learning rate.
![00074-20230326204643](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00074-20230326204643.png)
![00097-20230327063406](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00097-20230327063406.png)
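The cosine-vs-constant observation is expected from the schedules themselves: over a full run, a cosine decay delivers only about half the cumulative learning rate of a constant schedule. A quick check (the step count and base lr are arbitrary):

```python
import math

steps, base_lr = 1000, 2e-4

# Standard cosine decay from base_lr down to 0 over the run
cosine = [base_lr * 0.5 * (1 + math.cos(math.pi * t / steps)) for t in range(steps)]
mean_lr = sum(cosine) / steps
print(mean_lr / base_lr)  # ~0.5: about half the cumulative lr of a constant schedule
```

So at the same base learning rate, a cosine run simply takes fewer effective optimization steps, which matches the "learns more slowly" observation.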

#### Text encoder learning rate

It is often suggested to set the text encoder learning rate to be smaller than that of the unet.

[...]

![00083-20230327015201](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00083-20230327015201.png)

In any case, I still believe that if we want to get the best result, we should avoid text encoder training completely and do [pivotal tuning](https://github.com/cloneofsimo/lora/discussions/121) instead.
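The idea of pivotal tuning, in a nutshell: freeze every network weight and optimize only the embedding of a new token that represents the concept. A toy sketch of that training loop — the frozen projection and the target are stand-ins for illustration, not the actual SD text encoder:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 32
frozen = rng.normal(size=(dim, dim)) / np.sqrt(dim)  # stand-in for the frozen network
target = rng.normal(size=dim)                        # stand-in for the training signal

v = np.zeros(dim)  # the only trainable parameter: the new token's embedding
lr = 0.05
for _ in range(2000):
    err = frozen @ v - target
    v -= lr * (frozen.T @ err)  # gradient flows only into the embedding

# The loss decreases while all network weights stay untouched.
print(float(np.sum((frozen @ v - target) ** 2)))
```

Because nothing but one embedding vector changes, the base model's language understanding cannot be damaged, which is the main appeal over training the text encoder.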


#### LoRa, LoCon, LoHa

It may seem weird to mention this so late, but honestly I do not find them to be that different here.
The common belief is that LoHa trains style more than LoCon, which in turn trains style more than LoRa.
This seems to be mostly true, but the difference is quite subtle. Moreover, I would rather use the word "texture" instead of "style".
I especially tested whether any of them would be more favorable when transferred to a different base model. No conclusion here.
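For reference, the structural difference between the modules: LoRa (and LoCon, which applies the same idea to conv layers as well) adds a single low-rank product BA, while LoHa adds a Hadamard (elementwise) product of two such products, which can reach rank dim² from similarly sized factors. A quick illustration (the layer size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, dim = 320, 8

# LoRa / LoCon update: one low-rank product, rank at most dim
lora = rng.normal(size=(d, dim)) @ rng.normal(size=(dim, d))

# LoHa update: Hadamard product of two low-rank products;
# its rank can reach dim * dim
loha = (rng.normal(size=(d, dim)) @ rng.normal(size=(dim, d))) \
     * (rng.normal(size=(d, dim)) @ rng.normal(size=(dim, d)))

print(np.linalg.matrix_rank(lora), np.linalg.matrix_rank(loha))  # 8 64
```

This higher attainable rank per parameter is one plausible reason LoHa picks up fine texture faster, though as noted above the practical difference is subtle.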

- LoHa
![00067-20230326093940](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00067-20230326093940.png)
- LoCon
![00068-20230326095613](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00068-20230326095613.png)
- LoRa
![00069-20230326102713](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00069-20230326102713.png)
- Without additional network
![00070-20230326103743](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00070-20230326103743.png)

Some remarks:
- In the above images, the LoHa has dim 8/4, the LoCon has dim 16/8, and the LoRa has dim 8. The LoHa and LoCon thus have roughly the same size (25 MB) while the LoRa is smaller (11 MB). LoRa with a smaller dimension seems to train faster here.
- Some comparisons between LoHa and LoCon do suggest that LoHa indeed trains texture faster while LoCon trains higher-level traits faster. The difference is however very small, so it is not really conclusive.
![00034-20230325234457](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00034-20230325234457.png)
![00035-20230325235521](https://huggingface.co/alea31415/LyCORIS-experiments/resolve/main/generated_samples/00035-20230325235521.png)
- In an [early experiment](https://civitai.com/models/17336/roukin8-character-lohaloconfullckpt-8) I saw that LoHa and LoCon training led to quite different results. One possible explanation is that I am training on NAI now while I was training on [BP](https://huggingface.co/Crosstyan/BPModel) there.


#### Others