Questions

#2
by LOL2024 - opened

Hello, I just have some questions about this model. Could you tell me how you trained it: was it fine-tuned or trained from scratch? How many pictures were used to train it? Why is the architecture SD1.5 instead of SDXL or another architecture (such as Flux.1 Schnell or PixArt-Sigma)? And will you add CC0 images from Wikimedia Commons to the training dataset in the future?

Excellent questions; all of them will be answered in a paper soon. The model was trained from scratch, not fine-tuned, and we used the Microcosmos dataset for this (you can check it out here: Microcosmos on HuggingFace). I'm still working on a final version, so what's here isn't final yet, but it's a good start. We used about 15k images, which is a bit on the lower side (the ideal number is closer to 50k), so it's going to be overfitted, but still usable. I want to reach the mark of half a million CC0 images with quality captions.
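
In case you want to poke at the data in the meantime, a minimal sketch with the `datasets` library looks like this; the repo id and column names below are placeholders, so check the dataset card for the real ones:

```python
# Minimal sketch: inspecting the Microcosmos dataset with the `datasets` library.
# "your-username/microcosmos" is a placeholder repo id; the column names
# (image, caption) are assumptions -- see the dataset card for the actual schema.
from datasets import load_dataset

ds = load_dataset("your-username/microcosmos", split="train")
print(ds)                    # features and row count (~15k images at the moment)
sample = ds[0]
print(sample["image"].size)  # PIL image, if the dataset stores an Image feature
print(sample["caption"])     # caption text, if the column is named "caption"
```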

I went with Stable Diffusion 1.5 because it's powerful but also less computationally demanding than something like SDXL or Flux.1 Schnell; I only have a 3090 and two 3060s. I actually think good v-prediction models (which could be our next iteration) are on par with Flux and SDXL, so there's room to experiment with that in the future. As for Wikimedia Commons, it's a bit tricky: while a lot of it is marked CC0, sometimes it's not actually free to use, so if you're aiming for 100% correct usage you have to be very careful with those images.
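
To make the v-prediction remark concrete: in diffusers the switch is mostly a scheduler setting, with the UNet then trained against the v target instead of epsilon. This is a generic sketch, not this model's actual training config:

```python
# Generic sketch of the epsilon vs. v-prediction choice in diffusers.
# These are the standard SD1.5 schedule values, not this model's actual config.
from diffusers import DDPMScheduler

# SD1.5's usual noise schedule, trained with epsilon-prediction.
eps_scheduler = DDPMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    prediction_type="epsilon",
)

# Same schedule, but the UNet would be trained to predict v instead of epsilon.
v_scheduler = DDPMScheduler.from_config(
    eps_scheduler.config, prediction_type="v_prediction"
)
print(v_scheduler.config.prediction_type)  # -> "v_prediction"
```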

One of the biggest challenges now is producing high-quality captions. I'm using GIT (Generative Image-to-Text Transformer) and Gemini, with human review of everything. With a small number of images, each error is much more significant, and it saddens me that many of the cool things can't go into the data because they're all under copyright. I hope to see how legislators handle this situation, because I'm considering adding some AI-generated content to the dataset, which would help cover the missing concepts.
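
For reference, the GIT half of a captioning pass can be run through the transformers image-to-text pipeline. This is just a generic sketch with a public checkpoint and an example image path, not the exact Microcosmos pipeline; the Gemini and human-review steps aren't shown:

```python
# Generic GIT captioning sketch via the transformers image-to-text pipeline.
# The checkpoint and image path are examples; Gemini and human review not shown.
from transformers import pipeline
from PIL import Image

captioner = pipeline("image-to-text", model="microsoft/git-base-coco")

image = Image.open("sample.jpg").convert("RGB")   # any local CC0 image
result = captioner(image, max_new_tokens=50)
print(result[0]["generated_text"])                # draft caption, to be human-reviewed
```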

Thanks for the answer; this model is great, and I hope to see the final version soon. I have another question, though: will it feature a rating system similar to Pony Diffusion, using prompts such as 'score_9', 'score_7_up', 'source_cartoon', and 'rating_safe' to control the quality and content of generated images? Or, like common SD1.5 and SDXL models, will it use prompts such as 'masterpiece', 'best quality', 'medium quality', and 'worst quality' to control the quality of generated images?

In some subsets, like the one from OpenGameArt, I did some score rating, because some of the art was good and some was boring. I might add more nuance to that in newer classified images; most of the CC0 images are boring...
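
On the data side, score tags like that are usually just prepended to the caption before training. A hypothetical sketch of the idea (the thresholds and tag names here are made up, not what the dataset actually uses):

```python
# Hypothetical sketch of Pony-style score tags: prepend a quality bucket to the
# caption before training. Thresholds and tag names are illustrative only.
def add_score_tag(caption: str, score: int) -> str:
    """Map a 0-10 quality rating to a prompt tag and prepend it to the caption."""
    if score >= 9:
        tag = "score_9"
    elif score >= 7:
        tag = "score_7_up"
    elif score >= 5:
        tag = "score_5_up"
    else:
        tag = "score_low"  # made-up tag for the boring bucket
    return f"{tag}, {caption}"

print(add_score_tag("hand-painted forest tileset, side view", 8))
# -> "score_7_up, hand-painted forest tileset, side view"
```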

Well, could you tell me how I can contribute CC0 images, or my own CC0 works (if that's allowed), to the training dataset?
Is it just a matter of checking that they really are licensed CC0, uploading them, and making pull requests for them?

I think it would work like that. I usually group elements by subject and by what the images contain, and then by source: which site the image came from and how it is licensed on that site. I do need to add good or interesting illustrations.
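
Mechanically, once the license check is done, a contribution could go in as a pull request on the dataset repo. A minimal huggingface_hub sketch, with placeholder repo id and folder layout:

```python
# Sketch of contributing verified-CC0 images as a pull request on the dataset repo.
# The repo id and folder layout are placeholders; adjust to the real dataset structure.
from huggingface_hub import HfApi

api = HfApi()  # assumes you are logged in via `huggingface-cli login`

api.upload_folder(
    folder_path="my_cc0_contribution",          # images plus captions/licensing notes
    path_in_repo="contributions/opengameart",   # grouped by source site, as described above
    repo_id="your-username/microcosmos",        # placeholder; use the actual dataset repo id
    repo_type="dataset",
    create_pr=True,                             # open a PR instead of pushing to main
    commit_message="Add verified CC0 images from OpenGameArt",
)
```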

Emm, have CC0 3D resources on opengameart.org already been used in this model's training dataset, such as texture resources for 3D models and previews (or multiple views) of 3D models?

I haven't added any 3D images yet. I think it might help the model achieve better volume or reduce some flatness. There are some CC0 models on Sketchfab; I have downloaded some and taken some screenshots, but be aware that some of them are marked as CC0 while their descriptions say CC BY or something else (theoretically it would be okay to add those to the dataset, but I give preference to fully CC0 material). If you could collect some 3D screenshots, I would gladly classify them and add them to the Microcosmos dataset.

Preferably 768x768 or larger.
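
If it helps, a quick way to filter out anything below that size before sending screenshots over (plain Pillow; the folder path is a placeholder):

```python
# Quick filter: keep only screenshots that are at least 768x768 on both sides.
from pathlib import Path
from PIL import Image

MIN_SIDE = 768

def big_enough(path: Path, min_side: int = MIN_SIDE) -> bool:
    with Image.open(path) as img:
        width, height = img.size
    return width >= min_side and height >= min_side

screenshots = sorted(Path("sketchfab_screenshots").glob("*.png"))  # placeholder folder
keepers = [p for p in screenshots if big_enough(p)]
print(f"{len(keepers)} of {len(screenshots)} screenshots are >= {MIN_SIDE}x{MIN_SIDE}")
```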
