Commit 24ce408 by cloneofsimo (parent: c4ff943): Update README.md
---
license: apache-2.0
---

# Equivariant 16ch, f8 VAE

<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6311151c64939fabc00c8436/6DQGRWvQvDXp2xQlvwvwU.mp4"></video>
AuraEquiVAE is a novel autoencoder that addresses multiple problems of existing conventional VAEs. First, unlike traditional VAEs, whose latents have significantly small log-variance, this model admits large noise in the latent space. Additionally, unlike traditional VAEs, its latent space is equivariant under `Z_2 x Z_2` group operations (horizontal / vertical flips).
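As a rough illustration of what "admits large noise" means (the log-variance values below are illustrative assumptions, not the model's actual statistics): with the standard reparameterization `z = mu + exp(0.5 * logvar) * eps`, a very negative log-variance makes the latent nearly deterministic, while a log-variance near zero injects noise on the order of the signal itself.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.standard_normal(16)    # one 16-channel latent vector
eps = rng.standard_normal(16)   # standard Gaussian noise

# Conventional VAE regime: log-variance is typically very negative,
# so the sampled latent barely deviates from the mean.
z_small = mu + np.exp(0.5 * -12.0) * eps   # noise std ~ 0.0025

# "Large noise" regime described above: log-variance near zero means
# the injected noise is comparable in scale to the signal.
z_large = mu + np.exp(0.5 * 0.0) * eps     # noise std = 1.0

print(np.abs(z_small - mu).max())  # tiny perturbation
print(np.abs(z_large - mu).max())  # signal-scale perturbation
```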
To understand the equivariance, we apply suitable group actions to the latent space both globally and locally. The latent is represented as `Z = (z_1, ..., z_n)`, and we perform a global permutation action `g_global` on the tuple such that `g_global` is isomorphic to the `Z_2 x Z_2` group. We also apply a local action `g_local` to the individual elements `z_i` such that `g_local` is also isomorphic to the `Z_2 x Z_2` group.
In our specific case, `g_global` corresponds to spatial flips of the latent grid, while `g_local` corresponds to sign flips on specific latent channels. Using 2 channels for sign flips for each of the horizontal and vertical directions was chosen empirically.
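The combined action described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the latent shape `(16, 32, 32)` and the channel indices dedicated to each sign flip are assumptions.

```python
import numpy as np

# Hypothetical sketch of the Z_2 x Z_2 latent action described above.
# Which 2 channels carry each sign flip is an illustrative assumption.

def act_horizontal(z: np.ndarray, sign_ch=(0, 1)) -> np.ndarray:
    """g_global: flip the latent grid horizontally;
    g_local: negate the 2 channels tied to the horizontal flip."""
    out = z[:, :, ::-1].copy()     # global action: flip along width
    out[list(sign_ch)] *= -1       # local action: sign flip on 2 channels
    return out

def act_vertical(z: np.ndarray, sign_ch=(2, 3)) -> np.ndarray:
    out = z[:, ::-1, :].copy()     # global action: flip along height
    out[list(sign_ch)] *= -1       # local action: sign flip on 2 channels
    return out

z = np.random.randn(16, 32, 32)    # 16-channel, f8 latent of a 256x256 image

# Each action is an involution (applying it twice is the identity),
# and the two actions commute, so together they realize Z_2 x Z_2.
assert np.allclose(act_horizontal(act_horizontal(z)), z)
assert np.allclose(act_vertical(act_horizontal(z)),
                   act_horizontal(act_vertical(z)))
```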
The model has been trained using the approach described in [Mastering VAE Training](https://github.com/cloneofsimo/vqgan-training), where a detailed explanation of the training process can be found.

## How to use
...

## Citation

If you find this model useful, please cite:
```
@misc{Training-VQGAN-and-VAE-with-detailed-explanation,
  ...
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cloneofsimo/vqgan-training}},
}
```