We have also been testing face embeddings, but even with multiple samples the quality is nowhere near what we expect. Even with the techniques that do work, high-quality (studio-level) pictures seem to be a must, so another avenue I'm curious about is whether anyone has looked at automatically filtering/segmenting the input samples in the past?
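For reference, a minimal sketch of what such automatic filtering could look like, assuming you already have some face-embedding model producing a vector per image; all names and the `blur_threshold`/`sim_threshold` values are illustrative, not from any specific tool. It drops blurry images via the variance of the Laplacian and off-identity images via cosine similarity to a reference embedding:

```python
import numpy as np

def laplacian_variance(gray):
    """Sharpness score: variance of the 4-neighbour discrete Laplacian."""
    g = np.asarray(gray, dtype=float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def cosine_similarity(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_samples(samples, reference_embedding,
                   blur_threshold=50.0, sim_threshold=0.5):
    """Keep (image, embedding) pairs that are sharp and match the reference identity."""
    kept = []
    for image, embedding in samples:
        if laplacian_variance(image) < blur_threshold:
            continue  # too blurry for training
        if cosine_similarity(embedding, reference_embedding) < sim_threshold:
            continue  # probably a different person or a bad crop
        kept.append((image, embedding))
    return kept
```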
I think for faces it's this one: https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID . It's few-shot and it works very well.
gonna do a collection of merve avatars 🔥
The current SOTA techniques for photorealistic image generation mix merged LoRA, LyCORIS, and Embeddings models with a current photorealistic merged model, such as Serenity v2.0 by
@malcolmrey
, a merge of 45 models: https://civitai.com/models/110426?modelVersionId=248599
Serenity, an underrated merge of 18 photorealistic models, is also on the Hub right now:
https://huggingface.co/malcolmrey/serenity (3 likes at time of posting)
Here is an article with more details on Serenity and v2.0:
https://civitai.com/articles/3198/new-version-of-my-base-model-serenity-v2-for-sd-15
When you're training personalized models, keep in mind that mixing certain weights can add or remove wrinkles/pores, depending on your selection/needs.
To make someone appear younger or older, add some tokens about age, but also increase the weight of the LoRA rather than the LyCORIS.
Decreasing the weight of the Embeddings or increasing that of the LoRA/LyCORIS has also been found helpful for less realistic models.
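As a toy illustration of that knob (shapes and values here are made up, not real SD weights): a LoRA stores a low-rank update ΔW = B·A, and the weight you set in the UI is the α that scales how strongly it is applied when merging.

```python
import numpy as np

# Hypothetical toy shapes -- a real SD LoRA patches attention/projection
# matrices, but the scaling rule is the same: W_merged = W_base + alpha * (B @ A)
rng = np.random.default_rng(0)
d, rank = 8, 2
W_base = rng.standard_normal((d, d))
A = rng.standard_normal((rank, d))   # LoRA "down" matrix
B = rng.standard_normal((d, rank))   # LoRA "up" matrix

def apply_lora(W, A, B, alpha):
    """Merge a low-rank LoRA delta into W, scaled by alpha."""
    return W + alpha * (B @ A)

# Raising alpha pushes the merged weights further toward the LoRA's concept,
# which is the knob behind "increase the weight of the LoRA".
W_soft = apply_lora(W_base, A, B, alpha=0.6)
W_hard = apply_lora(W_base, A, B, alpha=1.0)
```

At α = 0 the base model is untouched; larger α means a larger deviation from the base weights, for better or worse (over-weighting is one way you get the artifacts mentioned below).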
In terms of fast personalized photorealistic image generation training, merging/mixing LoRAs and Embeddings with an existing photorealistic merged model may be one of the best approaches.
Let's analyze some key pros/cons of excluding LyCORIS from the mix:
- LyCORIS may improve/alter quality in certain ways
- LyCORIS files tend to take up more space
- adding LyCORIS to the mix may take more time/compute
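The merge itself can be sketched as a weighted sum of deltas applied to the base checkpoint (toy shapes, purely illustrative); excluding LyCORIS just means fewer deltas to compute and store:

```python
import numpy as np

# W_mix = W_base + sum_i alpha_i * delta_i, where each delta_i is the
# (already expanded) weight update contributed by one adapter.
def mix_into_base(W_base, deltas, alphas):
    W = np.array(W_base, dtype=float, copy=True)
    for delta, alpha in zip(deltas, alphas):
        W += alpha * np.asarray(delta, dtype=float)
    return W

rng = np.random.default_rng(1)
W_base = rng.standard_normal((4, 4))
face_lora = 0.10 * rng.standard_normal((4, 4))   # hypothetical identity LoRA delta
style_lora = 0.05 * rng.standard_normal((4, 4))  # hypothetical style LoRA delta
W_mix = mix_into_base(W_base, [face_lora, style_lora], [0.8, 0.5])
```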
Here is an easy way to train a LoRA: https://huggingface.co/spaces/multimodalart/lora-ease
&
Here is a LoRA Merging repo:
https://github.com/mkshing/ziplora-pytorch .
Paper ZipLoRA: https://arxiv.org/abs/2311.13600
Paper HF: https://huggingface.co/papers/2311.13600
Here is a Textual Inversion embeddings creation/merging repo: https://github.com/klimaleksus/stable-diffusion-webui-embedding-merge
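Two simple ways such a tool can combine Textual Inversion embeddings (my own sketch, not the extension's exact code): average the per-token vectors into one pseudo-token set, or concatenate them so the merged pseudo-word spans more tokens.

```python
import numpy as np

# Each Textual Inversion embedding is a (n_tokens, dim) array of learned
# pseudo-token vectors; shapes here are illustrative.
def merge_embeddings_avg(emb_a, emb_b, weight_a=0.5):
    """Weighted average: the merged embedding keeps one pseudo-token set."""
    return weight_a * np.asarray(emb_a, float) + (1.0 - weight_a) * np.asarray(emb_b, float)

def merge_embeddings_concat(emb_a, emb_b):
    """Concatenation: the merged pseudo-word consumes both token sets."""
    return np.concatenate([np.asarray(emb_a, float), np.asarray(emb_b, float)], axis=0)
```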
( resource to learn more about embeddings: https://stable-diffusion-art.com/embedding/ )
Decent/good quality, extremely fast, few-shot, easy-to-use method: https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID
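IP-Adapter-FaceID conditions generation on a face-identity embedding. For the few-shot case, one common trick (illustrative, not necessarily what the Space does internally) is to L2-normalize each sample's face embedding and average them into a single identity vector:

```python
import numpy as np

def pooled_identity_embedding(embeddings):
    """Average several per-image face embeddings into one unit-norm ID vector."""
    normed = [np.asarray(e, float) / np.linalg.norm(e) for e in embeddings]
    pooled = np.mean(normed, axis=0)
    return pooled / np.linalg.norm(pooled)  # renormalize the mean direction
```

Pooling over multiple samples averages out per-photo noise (lighting, pose), which is part of why a handful of images beats a single one.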
as also mentioned by
@merve
the method I mention should not be very time-consuming and
theoretically should produce photorealistic, high-quality personalized images. The missing factor is automating all of this together, which should be trivial. One could implement a version of this incorporating an LLM using HuggingGPT, a project contributed by
@tricktreat
and many others, or some other method. https://huggingface.co/spaces/microsoft/HuggingGPT
This concept of model mixing/merging is one of the key trends in improving models, increasing quality across various domains.
This is an exhilarating opportunity to try mixing techniques from different domains, including increasing the number of experts in an MoE LLM and adding more modalities for fine-tuning, while analyzing the similar performance gains across domains.
Here are some really helpful resources that I also refer to in this post. Shoutout again to
@malcolmrey
https://civitai.com/articles/1591/sdxl-lora-training
https://civitai.com/articles/3114/textual-inversion-embedding-training-guide
https://civitai.com/articles/7/dreambooth-lycoris-lora-guide
https://civitai.com/articles/1721/improving-results-by-using-multiple-models-of-the-same-concept-turning-it-to-11
https://civitai.com/articles/3527/bringing-it-up-to-twelve-going-deep-into-quality
These examples are for the SOTA techniques I initially mentioned, not for the faster method that excludes LyCORIS.
I specifically chose these examples because you may see artifacts from mixing older-age tokens with younger-age tokens in the forehead wrinkles, as well as the potential need for better embeddings.
I quite like using https://github.com/s0md3v/sd-webui-roop, which you can combine with SDXL and other SDXL-compatible tools, especially when dealing with multiple faces in an image. Here are some examples for a specific output style:
Holy papers! You guys, I am excited to share this new project that recently dropped:
'InstantID' https://huggingface.co/papers/2401.07519 Arxiv: https://arxiv.org/abs/2401.07519
Here is the project page: https://instantid.github.io/
Here is the code repo (code soon to be released) : https://github.com/InstantID/InstantID
SOTA Fast Personalized Image Generation
This method preserves both face and style magnificently. A more than worthy competitor (dare I say better) to Ip-Adapter-FaceID and roop.
I am excited to see the planned code release, and a potential HF Space for this project 🤗
@samusenps
already posted my flow, and I encourage you to try it, but I would like to comment on one thing:
"We have been also testing face embeddings, but even with multiple samples the quality is not anywhere close to what we expect."
If you want to achieve the best results then mixing models is the way to go IMHO, but I've had quite good results with only the embeddings. You can check my models and filter by embeddings to see some examples: https://civitai.com/user/malcolmrey
If you like what you see, there is an article on how exactly (all parameters) I train those embeddings :)