Trying to recreate styles in a local Stable Diffusion XL instance.
So, I've been trying to extract the prompts for my local Stable Diffusion instance, because I really like some of these styles.
I reverse engineered them by reading the source code of both this Space and the VideoChain API. From what I gather, it's supposed to be done with a base SDXL model.
For example, here's the modern american comic style:
Positive: beautiful, intricate details, modern american comic about {prompt}, digital color comicbook style, award winning, high resolution
Negative: watermark, copyright, blurry, low quality, ugly
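In case it helps anyone reproduce this locally, here is a minimal sketch of how I fill in the `{prompt}` placeholder before sending the text to SDXL. The `fillStyle` helper and the sample prompt are my own illustration, not code taken from the Space.

```typescript
// Style preset as reverse engineered above; {prompt} stands for the user's text.
const modernAmericanComic = {
  positive:
    "beautiful, intricate details, modern american comic about {prompt}, digital color comicbook style, award winning, high resolution",
  negative: "watermark, copyright, blurry, low quality, ugly",
};

// Hypothetical helper (not from the Space's source): substitute the placeholder.
// A replacer function is used so "$" sequences in the user text are kept literally.
function fillStyle(template: string, userPrompt: string): string {
  return template.replace("{prompt}", () => userPrompt.trim());
}

// Sample prompt, chosen only for illustration.
const positive: string = fillStyle(
  modernAmericanComic.positive,
  "a detective walking in the rain"
);
console.log(positive);
```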
Yet when I try it on Stable Diffusion XL, my results are markedly worse and less in line with the style. The first image shows my local generation made with ClipDrop, the second is one I made with Comic Factory.
Other styles are similarly worse.
Am I missing something? Is this using a custom model somehow?
Hello,
well, they both look pretty nice to be honest!
Now, regarding the differences, there could be multiple causes:
my implementation crops the prompt (see my post about how the prompt is constructed), so it is possible that some words placed after the user prompt are sometimes cut off, leading to different results
my SDXL code always adds the following keywords:
positive = "beautiful", "intricate details" + prompt + "award winning", "high resolution"
negative = "watermark", "copyright", "blurry", "low quality", "ugly"
it's something I did a long time ago back in July, and I forgot to remove it
(ideally I would prefer to put those keywords in the client/frontend app - I've added a note to remind myself to refactor that)
for the minor differences around edges and lines, maybe they are caused by different settings in the SDXL parameters
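To make the first two points concrete, here is a rough sketch of how cropping combined with the fixed keywords can drop style words. The keyword lists come from the lines above; the 120-character limit, the `buildPrompts` name, and the crop-then-wrap order are assumptions for illustration only, since the real cropping rule is not spelled out in this thread.

```typescript
// Keywords quoted above; the SDXL wrapper always adds them.
const POSITIVE_PREFIX: string[] = ["beautiful", "intricate details"];
const POSITIVE_SUFFIX: string[] = ["award winning", "high resolution"];
const NEGATIVE: string[] = ["watermark", "copyright", "blurry", "low quality", "ugly"];

// Hypothetical limit: the actual cropping length is not given here.
const MAX_PROMPT_CHARS = 120;

function buildPrompts(
  prompt: string,
  maxChars: number = MAX_PROMPT_CHARS
): { positive: string; negative: string } {
  // Assumed order: crop first, then wrap. Style words that sit after a long
  // user prompt (e.g. "digital color comicbook style") can be cut off here.
  const cropped = prompt.slice(0, maxChars).trim();
  return {
    positive: [...POSITIVE_PREFIX, cropped, ...POSITIVE_SUFFIX].join(", "),
    negative: NEGATIVE.join(", "),
  };
}
```

With a short user prompt the style suffix survives; with a long one it silently disappears, which could account for some of the drift between a local run and the Space.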
Here's what I use:
const rawResponse = (await api.predict("/run", [
  positive,   // string in 'Prompt' Textbox component
  negative,   // string in 'Negative prompt' Textbox component
  positive,   // string in 'Prompt 2' Textbox component
  negative,   // string in 'Negative prompt 2' Textbox component
  true,       // boolean in 'Use negative prompt' Checkbox component
  false,      // boolean in 'Use prompt 2' Checkbox component
  false,      // boolean in 'Use negative prompt 2' Checkbox component
  seed,       // number (numeric value between 0 and 2147483647) in 'Seed' Slider component
  width,      // number (numeric value between 256 and 1024) in 'Width' Slider component
  height,     // number (numeric value between 256 and 1024) in 'Height' Slider component
  8,          // number (numeric value between 1 and 20) in 'Guidance scale for base' Slider component
  8,          // number (numeric value between 1 and 20) in 'Guidance scale for refiner' Slider component
  nbSteps,    // number (numeric value between 10 and 100) in 'Number of inference steps for base' Slider component
  nbSteps,    // number (numeric value between 10 and 100) in 'Number of inference steps for refiner' Slider component
  true,       // boolean in 'Apply refiner' Checkbox component
  secretToken
])) as any
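If you want to compare settings side by side with a local setup, one option is to build that positional array from named fields. This is only a sketch: `SdxlSettings`, `clamp` and `buildRunArgs` are hypothetical names, and the clamping is my addition based on the slider ranges in the comments above, not something the Space is confirmed to do.

```typescript
// Named view of the positional arguments passed to "/run" above.
interface SdxlSettings {
  positive: string;
  negative: string;
  seed: number;           // slider range 0..2147483647
  width: number;          // slider range 256..1024
  height: number;         // slider range 256..1024
  nbSteps: number;        // slider range 10..100
  guidanceScale?: number; // slider range 1..20, 8 in the call above
  applyRefiner?: boolean; // true in the call above
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: returns the 16 arguments in the same order as the call above.
function buildRunArgs(
  s: SdxlSettings,
  secretToken: string
): (string | number | boolean)[] {
  const guidance = clamp(s.guidanceScale ?? 8, 1, 20);
  const steps = clamp(Math.round(s.nbSteps), 10, 100);
  return [
    s.positive, s.negative, // prompt, negative prompt
    s.positive, s.negative, // prompt 2, negative prompt 2
    true, false, false,     // use negative prompt, use prompt 2, use negative prompt 2
    clamp(Math.round(s.seed), 0, 2147483647),
    clamp(Math.round(s.width), 256, 1024),
    clamp(Math.round(s.height), 256, 1024),
    guidance, guidance,     // guidance scale for base and refiner
    steps, steps,           // inference steps for base and refiner
    s.applyRefiner ?? true,
    secretToken,
  ];
}
```

Having the values named makes it easier to spot what a local pipeline does differently, for example a lower guidance scale or a skipped refiner pass.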
For reference, the SDXL server I use is: https://huggingface.co/spaces/hysts/SD-XL
I still can't recreate the exact styles, but I guess it's close enough. A high-res upscale seems to give a much cleaner image, and I also recommend increasing the step count significantly. I still wonder what causes the noticeable difference, though.