TheDrummer committed a886873 (parent: f6eabc1): Update README.md

README.md CHANGED
```diff
@@ -91,7 +91,7 @@ WIP
 - The duplicated layers on all layer types (except one) are extra sensitive. `post_attention_layernorm` interestingly had some changes in the upscale's duplicated layers, unlike Cydonia where latter layers were completely unchanged.
 - The duplicated layers in `o_proj` are less sensitive for some reason.
 
-# Further
+# Further Experimentation
 Given how the duplicated layers seem to have a stabilizing effect, it begs the question: What if we duplicate only ONE layer? What about five layers?
 - Will fewer empty layers dampen the stabilizing effect?
 - Will the few empty layers get 'filled' quickly? Will the 600MB dataset be enough?
```
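The "duplicate only ONE layer" experiment proposed in the diff could be set up as a passthrough depth-upscale. A minimal sketch in mergekit-style YAML, assuming mergekit's `passthrough` merge method and a hypothetical 40-layer base model (the model name, layer count, and chosen layer index are placeholders, not from the original notes):

```yaml
# Hypothetical sketch: depth-upscale a 40-layer model by repeating
# a single layer (layer 20 here). Everything below is illustrative.
slices:
  - sources:
      - model: example/base-model
        layer_range: [0, 21]    # layers 0-20 unchanged
  - sources:
      - model: example/base-model
        layer_range: [20, 21]   # layer 20 appears a second time
  - sources:
      - model: example/base-model
        layer_range: [21, 40]   # remaining layers unchanged
merge_method: passthrough
dtype: bfloat16
```

Widening the middle slice (e.g. `[20, 25]`) would give the five-duplicated-layers variant, so the two questions in the diff could be compared under otherwise identical training runs.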