metadata

license: bigscience-bloom-rail-1.0
language:
  - en
library_name: diffusers
tags:
  - stable-diffusion
  - text-to-image

pony-diffusion-g5 - a new generation of waifus

pony-diffusion-g5 is a latent text-to-image diffusion model that has been conditioned on medium-to-low-quality pony images through fine-tuning.

Fine-tuned for the main characters of MLP G5, based on AstraliteHeart/pony-diffusion.

Dataset criteria

All training images come from Derpibooru and were selected with the search criteria below.

General: "g5, safe, solo, score.gte:250, -webm, -animate || g5, suggestive, solo, score.gte:250, -webm, -animate", 856 entries without GIFs, 5 epochs
Izzy Moonbow: "izzy moonbow, safe, solo, score.gte:200, -webm, -animate || izzy moonbow, suggestive, solo, score.gte:200, -webm, -animate", 531 entries without GIFs, 3 epochs
Sunny Starscout: "sunny starscout, safe, solo, score.gte:200, -webm, -animate || sunny starscout, suggestive, solo, score.gte:200, -webm, -animate", 252 entries without GIFs, 3 epochs
Pipp Petals: "pipp petals, safe, solo, score.gte:200, -webm, -animate || pipp petals, suggestive, solo, score.gte:200, -webm, -animate", 218 entries without GIFs, 3 epochs
Zipp Storm: "zipp storm, safe, solo, score.gte:200, -webm, -animate || pipp petals, suggestive, solo, score.gte:200, -webm, -animate", 167 entries without GIFs, 3 epochs
Hitch Trailblazer: "hitch trailblazer, safe, solo, score.gte:200, -webm, -animate || hitch trailblazer, suggestive, solo, score.gte:200, -webm, -animate", 34 entries without GIFs (wat), 3 epochs
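
To give a sense of how small the dataset is, here is a minimal sketch that rebuilds the search strings from the pattern they share and adds up the entry counts and epochs listed above. The names `Subset`, `build_query`, `total_entries`, and `total_samples` are made up for this illustration and are not part of the training code. It assumes one epoch means one pass over a subset, and it does not account for images that appear in more than one subset. Note that the Zipp Storm query as listed uses a different tag in its second clause, so the helper does not reproduce that line verbatim.

```python
from dataclasses import dataclass

# Both ratings are searched for every subset, joined with "||" (OR).
RATINGS = ("safe", "suggestive")


@dataclass(frozen=True)
class Subset:
    """One training subset (hypothetical helper type for this sketch)."""
    tag: str        # Derpibooru tag the search is built around
    min_score: int  # lower bound used in score.gte
    entries: int    # images matched, GIFs excluded
    epochs: int     # assumed: one epoch = one pass over the subset


def build_query(tag: str, min_score: int) -> str:
    """Rebuild a search string following the pattern of the listed queries."""
    clauses = [
        f"{tag}, {rating}, solo, score.gte:{min_score}, -webm, -animate"
        for rating in RATINGS
    ]
    return " || ".join(clauses)


# Numbers copied from the list above.
SUBSETS = [
    Subset("g5", 250, 856, 5),
    Subset("izzy moonbow", 200, 531, 3),
    Subset("sunny starscout", 200, 252, 3),
    Subset("pipp petals", 200, 218, 3),
    Subset("zipp storm", 200, 167, 3),
    Subset("hitch trailblazer", 200, 34, 3),
]

# Images overlapping between subsets are counted once per subset here.
total_entries = sum(s.entries for s in SUBSETS)
total_samples = sum(s.entries * s.epochs for s in SUBSETS)

if __name__ == "__main__":
    for s in SUBSETS:
        print(f"{s.tag}: {s.entries} x {s.epochs} = {s.entries * s.epochs}")
    print(f"entries: {total_entries}, image passes: {total_samples}")
```

Under these assumptions the subsets add up to 2058 entries and 7886 image passes in total, with the Hitch Trailblazer subset contributing only 102 of those passes.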

Why is the model's quality bad?

Few G5 pony images match the search criteria, so don't expect the quality to be as high as the original model's.

~~Also bcs im new to ai stuff i don't know how to train datasets correctly if u could help me great thx~~

Example code

import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

# Local path to the fine-tuned checkpoint (here, the Izzy Moonbow model).
model_path = "./gen_model_izzy"
prompt = "(((izzy moonbow))), pony, looking at you, smiling, sitting on beach, cute, portrait, intricate, digital painting, smooth, sharp, focus, depth of field"
negative = "3d sfm"
# Uncomment to fix the random seed.
# torch.manual_seed(1145141919810)

# Load the pipeline in half precision with a DDIM scheduler.
pipe = StableDiffusionPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    scheduler=DDIMScheduler(
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule="scaled_linear",
        clip_sample=False,
        set_alpha_to_one=True,
    ),
    # safety_checker=None,
)

pipe = pipe.to("cuda")

# Generate five 512x512 images for the prompt and save each one.
images = pipe(
    prompt,
    width=512,
    height=512,
    num_inference_steps=50,
    num_images_per_prompt=5,
    negative_prompt=negative,
).images
for i, image in enumerate(images):
    image.save(f"test-{i}.png")

Thanks

AstraliteHeart/pony-diffusion, for providing a solid starting point to train on.

This project would not have been possible without the incredible work by the CompVis Researchers.

With special thanks to Waifu-Diffusion for providing finetuning expertise and Novel AI for providing necessary compute.