doohickey/neopian-diffusion

crumb committed on Nov 18, 2022

Commit 78a993e • 1 Parent(s): 70b9cc1

Update README.md

Files changed (1)

README.md +8 -0

@@ -1,3 +1,5 @@
@@ -5,4 +7,10 @@ Example chosen frame of GIF from CLIP
+# Neopian-Diffusion
 Stable Diffusion models, fine-tuned from [runwayml/stable-diffusion-v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) on images extracted from GIFs at https://www.neopets.com/funimages.phtml. CLIP ViT-B/32 (OpenAI) was used to select the best-matching frame of each GIF for every caption/GIF pair: the frame with the minimum spherical distance to the caption was chosen and saved for training. In total this amounts to 1,950 images of around 100x100px. The DreamBooth models were fine-tuned at 448x448px on a Colab T4 with the term "low-resolution" concatenated onto 1/3 of the prompts, in the hope of reducing artifacts in the final results (see this link for a hypothesis from someone on Discord about using negative terms while training Textual Inversions: https://cdn.discordapp.com/attachments/1008246088148463648/1041538692432527470/image.png).
 Example chosen frame of GIF from CLIP
 | --- | --- | --- |
 | "yurble_baby_clap" | ![](https://images.neopets.com/template_images/yurble_baby_clap.gif) | ![](https://cdn.discordapp.com/attachments/1010693530181718146/1043310485413576794/yurble_baby_clap.jpg) |
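
The frame-selection step described above could look roughly like the sketch below. It is not the script used for this model: the function names are hypothetical, the embeddings are assumed to have already been produced by CLIP's image and text encoders, and the spherical-distance formula is an assumption (the form commonly seen in CLIP-guidance notebooks), since the README does not give one.

```python
import math

def _normalize(v):
    # Scale a vector to unit length so it lies on the unit sphere.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def spherical_distance(a, b):
    # Assumed formula: 2 * arcsin(||a - b|| / 2) ** 2 on unit vectors,
    # which equals (angle between a and b) ** 2 / 2.
    a, b = _normalize(a), _normalize(b)
    chord = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    half_angle = math.asin(min(chord / 2.0, 1.0))  # clamp against rounding error
    return 2.0 * half_angle ** 2

def pick_best_frame(frame_embeddings, caption_embedding):
    # Return the index of the frame whose embedding is closest to the caption.
    distances = [spherical_distance(f, caption_embedding) for f in frame_embeddings]
    return min(range(len(distances)), key=distances.__getitem__)

# Toy 2-D vectors standing in for real CLIP embeddings.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
caption = [0.9, 1.0]
best_index = pick_best_frame(frames, caption)
```

In a real pipeline, each GIF would be split into frames, every frame embedded with the CLIP image encoder, and the caption (for example "yurble_baby_clap") embedded with the CLIP text encoder before calling `pick_best_frame`.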
+### Training Details
+The text encoder was trained together with the UNet, at half precision, for the first 15% of the 8,000 total steps (1,200 steps); the UNet was then trained alone for the remaining 6,800 steps. I used polynomial learning-rate decay starting at 2e-6 (the default in fast-DreamBooth).
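
As a rough illustration of the schedule described above, the sketch below computes the text-encoder cutoff step and a polynomial learning-rate decay. Only the 8,000 steps, the 15% fraction, and the 2e-6 starting rate come from this README; the final learning rate (`end_lr`) and the polynomial `power` are not stated here, so the values used are placeholder assumptions, and the function names are hypothetical rather than part of fast-DreamBooth.

```python
TOTAL_STEPS = 8_000            # from the README
TEXT_ENCODER_FRACTION = 0.15   # from the README
INITIAL_LR = 2e-6              # from the README

def text_encoder_steps(total_steps=TOTAL_STEPS, fraction=TEXT_ENCODER_FRACTION):
    # Number of initial steps during which the text encoder is also trained.
    return round(total_steps * fraction)

def trains_text_encoder(step):
    # True while the text encoder is trained alongside the UNet.
    return step < text_encoder_steps()

def polynomial_lr(step, initial_lr=INITIAL_LR, end_lr=1e-7, power=1.0,
                  total_steps=TOTAL_STEPS):
    # end_lr and power are assumed values, not taken from the README.
    if step >= total_steps:
        return end_lr
    remaining = 1.0 - step / total_steps
    return (initial_lr - end_lr) * remaining ** power + end_lr

# Learning rate at the start, at the text-encoder cutoff, and at the end.
schedule_preview = [polynomial_lr(s) for s in (0, text_encoder_steps(), TOTAL_STEPS)]
```

With `power=1.0` the decay is linear; larger values would make the rate fall faster early in training.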
+### Neopets Copyright Notice
 "Don't forget, if you use these images on a non-Neopets page, you need to include our Copyright Notice." https://www.neopets.com/terms.phtml