Fine-Tuning Mochi Text-to-Video: InfiniteZoom-Mochi

This project demonstrates the fine-tuning of the Mochi Text-to-Video model using a LoRA (Low-Rank Adaptation) approach, focusing on the infinite zoom art style.

Training Details

  • Model Base: genmo/mochi-1-preview
  • Fine-Tuning Dataset: 23 short video clips of infinite zoom art style, and .txt descriptions
  • Training Hardware: H100 GPU
  • Training Duration: 2h
Prompt
Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, zoom focuses on a can, all surface around it is made of liquid and objects swimming in it.
Prompt
Human fingers pinching to zoom on an infinite zoom canvas, spaceship going through space.
Prompt
Human fingers pinching to zoom on an infinite zoom canvas, orange cat in the middle of a canvas, looking upward.

lora.yaml:

init_checkpoint_path: /weights/dit.safetensors
checkpoint_dir: /finetunes/my_mochi_lora
train_data_dir: /videos_prepared
attention_mode: sdpa
single_video_mode: false # Useful for debugging whether your model can learn a single video

# You only need this if you're using wandb
wandb:
  # project: mochi_1_lora
  # name: ${checkpoint_dir}
  # group: null

optimizer:
  lr: 2e-4
  weight_decay: 0.01

model:
  type: lora
  kwargs:
    # Apply LoRA to the QKV projection and the output projection of the attention block.
    qkv_proj_lora_rank: 16
    qkv_proj_lora_alpha: 16
    qkv_proj_lora_dropout: 0.
    out_proj_lora_rank: 16
    out_proj_lora_alpha: 16
    out_proj_lora_dropout: 0.

training:
  model_dtype: bf16
  warmup_steps: 200
  num_qkv_checkpoint: 48
  num_ff_checkpoint: 48
  num_post_attn_checkpoint: 48
  num_steps: 2000
  save_interval: 200
  caption_dropout: 0.1
  grad_clip: 0.0
  save_safetensors: true

# Used for generating samples during training to monitor progress ...
sample:
   interval: 200
   output_dir: ${checkpoint_dir}/samples
   decoder_path: /weights/decoder.safetensors
   prompts:
      - Human fingers pinching to zoom on an infinite zoom canvas, a vast desert landscape stretches into the horizon. At the center, a giant hourglass sits, its glass exterior glinting in the sunlight. The zoom begins within the hourglass, revealing cascading grains of sand, each grain transitioning into a crystalline snowflake, leading to a frozen tundra as the scene deepens further.
      - Human fingers pinching to zoom on an infinite zoom canvas, a colossal tree rises from a lush forest, its bark covered with intricate carvings of stories. The zoom focuses on one carving, which transforms into a vibrant painting of a village. Zooming further, the village reveals bustling streets, where a single doorway becomes the entry to a glowing cosmos.
      - Human fingers pinching to zoom on an infinite zoom canvas, a tranquil ocean surface reflects the twilight sky. The zoom begins within a whirlpool, diving into vibrant coral reefs teeming with marine life. A single pearl on the ocean floor becomes the focus, transitioning into a marble palace with intricate golden inlays as the zoom continues seamlessly.
      - Human fingers pinching to zoom on an infinite zoom canvas, a glowing campfire crackles in a dense, dark forest. The zoom begins in the heart of the fire, revealing swirling embers that transition into galaxies of stars. The zoom then centers on a lone star, which transforms into a lantern hanging in a cozy mountain cabin, seamlessly revealing new layers.
      - Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, illuminated by neon lights and bustling with activity. The zoom focuses on a lit billboard advertising a soda can, transitioning into the sparkling surface of the liquid. As the zoom deepens, microscopic bubbles transform into entire ecosystems of floating islands within the soda.
   seed: 12345
   kwargs:
     height: 480
     width: 848
     num_frames: 37
     num_inference_steps: 64
     sigma_schedule_python_code: "linear_quadratic_schedule(64, 0.025)"
     cfg_schedule_python_code: "[6.0] * 64"
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference API
Unable to determine this model's library. Check the docs .

Model tree for martintomov/InfiniteZoom-Mochi

Finetuned
(6)
this model