playgroundai/playground-v2-512px-base

@@ -8,7 +8,7 @@ tags:
 ---
 # Playground v2 – 512px Base Model
-This repository contains a base (pretrain) model that generates images of resolution 512x512.
+This repository contains a base (pre-train) model that generates images of resolution 512x512.
 **This model is primarily for research purposes. It does not tend to produce highly aesthetic images.**
@@ -29,9 +29,9 @@ You can use the model with Hugging Face 🧨 Diffusers.
 **Playground v2** is a diffusion-based text-to-image generative model. The model was trained from scratch by the research team at [Playground](https://playground.com).
-Playground v2’s images are favored 2.5 times more than those produced by Stable Diffusion XL, according to Playground’s [user study](#user-study).
-We are thrilled to release all intermediate checkpoints at different training stages, including evaluation metrics, to the community. We hope this will foster more foundation model research in pixels.
+Playground v2’s images are favored **2.5** times more than those produced by Stable Diffusion XL, according to Playground’s [user study](#user-study).
+We are thrilled to release [intermediate checkpoints](#intermediate-base-models) at different training stages, including evaluation metrics, to the community. We hope this will foster more foundation model research in pixels.
 Lastly, we introduce a new benchmark, [MJHQ-30K](#mjhq-30k-benchmark), for automatic evaluation of a model’s aesthetic quality.
@@ -56,7 +56,7 @@ from diffusers import DiffusionPipeline
 import torch
 pipe = DiffusionPipeline.from_pretrained(
-    "playgroundai/playground-v2-512px-base",
+    "playgroundai/playground-v2-1024px-aesthetic",
     torch_dtype=torch.float16,
     use_safetensors=True,
     add_watermarker=False,
@@ -72,7 +72,7 @@ image  = pipe(prompt=prompt).images[0]
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/63855d851769b7c4b10e1f76/8VzBkSYaUU3dt509Co9sk.png)
-According to user studies conducted by Playground, involving over 2,600 prompts and thousands of users, the images generated by Playground v2 are favored 2.5 times more than those produced by [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
+According to user studies conducted by Playground, involving over 2,600 prompts and thousands of users, the images generated by Playground v2 are favored **2.5** times more than those produced by [Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0).
 We report user preference metrics on [PartiPrompts](https://github.com/google-research/parti), following standard practice, and on an internal prompt dataset curated by the Playground team. The “Internal 1K” prompt dataset is diverse and covers various categories and tasks.
@@ -91,11 +91,11 @@ We introduce a new benchmark, [MJHQ-30K](https://huggingface.co/datasets/playgro
 We curate the high-quality dataset from Midjourney with 10 common categories, each category with 3K samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category.
-For Playground v2, we report both the overall FID and per-category FID. (All FID metrics are computed at resolution 1024x1024.)
+For Playground v2, we report both the overall FID and per-category FID. All FID metrics are computed at resolution 1024x1024. Our benchmark results show that our model outperforms SDXL-1-0-refiner in overall FID and all category FIDs, especially in people and fashion categories. This is in line with the results of the user study, which indicates a correlation between human preference and FID score on the MJHQ30K benchmark.
 We release this benchmark to the public and encourage the community to adopt it for benchmarking their models’ aesthetic quality.
-### Base Models for all resolution
+### Intermediate Base Models
 | Model                        | FID    | Clip Score |
 | ---------------------------- | ------ | ---------- |