# Image-to-Video finetuning - zhuhz22/try4
## Pipeline usage

You can use the pipeline like so:
```python
import torch

from diffusers import EulerDiscreteScheduler
from diffusers.utils import load_image, export_to_video
from svd.inference.pipline_CILsvd import StableVideoDiffusionCILPipeline

# Set the start time M (sigma_max) for inference.
scheduler = EulerDiscreteScheduler.from_pretrained(
    "zhuhz22/try4",
    subfolder="scheduler",
    sigma_max=100,
)
pipeline = StableVideoDiffusionCILPipeline.from_pretrained(
    "zhuhz22/try4", scheduler=scheduler, torch_dtype=torch.float16, variant="fp16"
)  # Note: set the default conditioning parameters (fps, motion_bucket_id) in the pipeline call below.
pipeline.enable_model_cpu_offload()

# Demo
image = load_image("demo/a car parked in a parking lot with palm trees nearby,calm seas and skies..png")
image = image.resize((512, 320))
generator = torch.manual_seed(42)

# analytic_path:
#   - if it is a video path, the initial noise is computed from that video automatically
#   - if it is a tensor path, the precomputed initial noise is loaded
#   - if None, standard inference is performed
analytic_path = None

frames = pipeline(
    image,
    height=image.height,
    width=image.width,
    num_frames=16,
    fps=3,
    motion_bucket_id=20,
    decode_chunk_size=8,
    generator=generator,
    analytic_path=analytic_path,
).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```
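
As the comments indicate, `analytic_path` controls how the initial noise is prepared. A minimal sketch of the three modes, using hypothetical file names (`demo/reference.mp4`, `demo/initial_noise.pt`) that stand in for your own files:

```python
# Standard inference: the initial noise is sampled as usual.
analytic_path = None

# Hypothetical video path: the pipeline computes the analytic initial noise from this video.
analytic_path = "demo/reference.mp4"

# Hypothetical tensor path: the pipeline loads a precomputed initial-noise tensor.
analytic_path = "demo/initial_noise.pt"
```

Whichever value you choose is passed through the `analytic_path` argument of the pipeline call above.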
Intended uses & limitations
How to use
# TODO: add an example code snippet for running this diffusion pipeline
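
The "Pipeline usage" section above shows the full workflow; as a minimal sketch (assuming the same `StableVideoDiffusionCILPipeline` class from the repository and a hypothetical input image `demo/example.png`), inference reduces to:

```python
import torch
from diffusers.utils import load_image, export_to_video
from svd.inference.pipline_CILsvd import StableVideoDiffusionCILPipeline

# Load the finetuned checkpoint with its bundled scheduler.
pipeline = StableVideoDiffusionCILPipeline.from_pretrained(
    "zhuhz22/try4", torch_dtype=torch.float16, variant="fp16"
)
pipeline.enable_model_cpu_offload()

# Hypothetical input image; resize to the resolution used in the example above.
image = load_image("demo/example.png").resize((512, 320))

frames = pipeline(
    image, height=image.height, width=image.width,
    num_frames=16, fps=3, motion_bucket_id=20, decode_chunk_size=8,
).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```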
### Limitations and bias

[TODO: provide examples of latent issues and potential remediations]
## Training details

[TODO: describe the data used to train the model]