training: 

dataset = images-damian2
last 2 text encoder layers unfrozen
30 epochs at 2e-6, then 20 epochs at 1e-6
cond_dropout= default