Is it possible to share training tips or training parameters?

#1
by nipi - opened

Hello @Midu, thank you for sharing this. Is the Chinese-style finetuning script based on the diffusers training script? Is it possible to share training tips or training parameters?

+1

Hi @nipi,
Yes, my script is based on diffusers because of its good DeepSpeed support. I did not tune the hyperparameters much; I only used a low learning rate (6e-5 for a batch size of 256).
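
For reference, here is a minimal sketch of how those numbers might map onto a diffusers-style setup. The optimizer choice (AdamW), the base checkpoint, and the per-device split of the 256 effective batch size are illustrative assumptions, not details confirmed in this thread.

```python
# Hedged sketch of the stated hyperparameters in a diffusers-style finetuning setup.
# AdamW, the base checkpoint, and the batch-size split are illustrative assumptions.
import torch
from diffusers import UNet2DConditionModel

# Hypothetical base checkpoint; replace with the actual model being finetuned.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="unet"
)

learning_rate = 6e-5               # stated above, for an effective batch size of 256
per_device_batch_size = 32         # hypothetical: 32 x 8 GPUs x 1 accumulation step = 256
gradient_accumulation_steps = 1

optimizer = torch.optim.AdamW(unet.parameters(), lr=learning_rate, weight_decay=1e-2)
```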

Hello @Midu,
Thank you for responding. The scheduler in the model's scheduler config.json is EulerDiscreteScheduler with prediction_type set to v_prediction, but diffusers' EulerDiscreteScheduler does not implement the get_velocity method. Do you use DDPMScheduler for finetuning? Also, do you freeze the text encoder and VAE during training and finetune only the UNet?
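
For context, the stock diffusers text-to-image example freezes the VAE and text encoder and trains only the UNet; whether this script does the same is exactly what the question above asks. A minimal sketch of that pattern, with a hypothetical base checkpoint:

```python
# Freezing pattern from the standard diffusers text-to-image example (not
# necessarily the author's script): VAE and text encoder frozen, UNet trained.
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel

model_id = "stabilityai/stable-diffusion-2-1"   # hypothetical base checkpoint
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")

vae.requires_grad_(False)            # frozen
text_encoder.requires_grad_(False)   # frozen
unet.train()                         # only the UNet receives gradient updates
```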

Hi @Midu,
I can run the script normally with a batch size of 256 using ZeRO stage 3 and PyTorch Lightning, but I'm not sure how PyTorch Lightning applies the EMA strategy across multiple GPUs, and I ran into a dimension issue when modifying the diffusers text_to_image script directly. Could you please share the finetuning code? If that is not possible, please send me a private message (my email address is [email protected]).
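
Not the author's code, but for anyone hitting the same EMA question: the diffusers training examples keep a separate EMA copy of the UNet weights via EMAModel and update it after each optimizer step. The sketch below assumes the full parameters are visible on each rank (ZeRO stage 1/2); under ZeRO stage 3 the parameters are sharded and would need to be gathered first.

```python
# EMA sketch following the diffusers training examples (EMAModel), not the
# author's actual script. Assumes unsharded parameters (not ZeRO stage 3).
from diffusers import UNet2DConditionModel
from diffusers.training_utils import EMAModel

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="unet"   # hypothetical base checkpoint
)

ema_unet = EMAModel(
    unet.parameters(),
    decay=0.9999,
    model_cls=UNet2DConditionModel,
    model_config=unet.config,
)

# In the training loop, after each optimizer.step():
#     ema_unet.step(unet.parameters())
# Before evaluation or saving, copy the averaged weights into the UNet:
#     ema_unet.copy_to(unet.parameters())
```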

When finetuning, use DDPMScheduler to add the noise and compute the target with its get_velocity function, and use EulerDiscreteScheduler for sampling.
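
A minimal sketch of that split, assuming a standard diffusers v-prediction setup: DDPMScheduler supplies the noise schedule and the get_velocity() target during training, while EulerDiscreteScheduler is used only at inference time. The beta schedule and the toy tensors below are placeholders, not the model's actual configuration.

```python
# v-prediction training target with DDPMScheduler (sampling would use
# EulerDiscreteScheduler instead). Scheduler settings and tensors are placeholders.
import torch
from diffusers import DDPMScheduler

noise_scheduler = DDPMScheduler(
    num_train_timesteps=1000,
    prediction_type="v_prediction",   # matches the model's scheduler config
)

latents = torch.randn(4, 4, 64, 64)   # placeholder VAE latents (batch of 4)
noise = torch.randn_like(latents)
timesteps = torch.randint(
    0, noise_scheduler.config.num_train_timesteps, (latents.shape[0],)
)

noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
target = noise_scheduler.get_velocity(latents, noise, timesteps)   # v-prediction target

# The UNet would then predict `target` from `noisy_latents`:
#     model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
#     loss = torch.nn.functional.mse_loss(model_pred.float(), target.float())
```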

nipi changed discussion status to closed
