Hotshot-XL is an AI text-to-GIF model trained to work alongside [Stable Diffusion XL](https://stability.ai/stable-diffusion).

Hotshot-XL can generate GIFs with any fine-tuned SDXL model. This means two things:

1. You'll be able to make GIFs with any existing or newly fine-tuned SDXL model you may want to use.
2. If you'd like to make GIFs of personalized subjects, you can load your own SDXL-based LORAs without having to fine-tune Hotshot-XL. This is awesome because it's usually much easier to find suitable images for training data than it is to find videos. It also hopefully fits into everyone's existing LORA usage/workflows :) See more [here](#text-to-gif-with-personalized-loras).

Hotshot-XL is compatible with SDXL ControlNet to make GIFs in the composition/layout you'd like. See the [ControlNet](#text-to-gif-with-controlnet) section below.

Hotshot-XL was trained to generate 1 second GIFs at 8 FPS.

Hotshot-XL was trained on various aspect ratios. For best results with the base Hotshot-XL model, we recommend using it with an SDXL model that has been fine-tuned with 512x512 images. You can find an SDXL model we fine-tuned for 512x512 resolutions [here](https://huggingface.co/hotshotco/SDXL-512).

# Try It

Try Hotshot-XL yourself here: https://www.hotshot.co

Or, if you'd like to run Hotshot-XL yourself locally, continue on to the sections below.

If you're running Hotshot-XL yourself, you'll have a lot more flexibility and control over the model. As a very simple example, you'll be able to change the sampler. We've seen the best results with Euler-A so far, but you may find interesting results with other samplers.
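As a quick reference for the default output length mentioned above (1 second at 8 FPS), the following is a minimal sketch of the arithmetic. The function name `gif_timing` and its return keys are hypothetical and are not part of Hotshot-XL; only the 1 second and 8 FPS figures come from this README.

```python
def gif_timing(duration_s: float = 1.0, fps: int = 8) -> dict:
    """Return the frame count and per-frame delay for a clip.

    Hypothetical helper for illustration only; not part of Hotshot-XL.
    """
    if duration_s <= 0 or fps <= 0:
        raise ValueError("duration_s and fps must be positive")

    frame_count = round(duration_s * fps)  # total frames in the clip
    frame_ms = 1000.0 / fps                # delay per frame in milliseconds

    return {
        "frame_count": frame_count,
        "frame_ms": frame_ms,
        # The GIF format stores frame delays in hundredths of a second.
        "frame_cs": frame_ms / 10.0,
    }


if __name__ == "__main__":
    print(gif_timing())
```

With the defaults this gives 8 frames at 125 ms each. Since 125 ms is 12.5 hundredths of a second, a GIF encoder has to round the stored delay, so actual playback speed may differ very slightly from exactly 8 FPS.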
# Setup

### Environment Setup

```
pip install virtualenv --upgrade
virtualenv -p $(which python3) venv
source venv/bin/activate
pip install -r requirements.txt
```

### Download the Hotshot-XL Weights

```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/hotshotco/Hotshot-XL
```

or visit [https://huggingface.co/hotshotco/Hotshot-XL](https://huggingface.co/hotshotco/Hotshot-XL)

### Download our fine-tuned SDXL model (or BYOSDXL)

- *Note*: To maximize data and training efficiency, Hotshot-XL was trained at various aspect ratios around 512x512 resolution. For best results with the base Hotshot-XL model, we recommend using it with an SDXL model that has been fine-tuned with images around the 512x512 resolution. You can download an SDXL model we trained with images at 512x512 resolution below, or bring your own SDXL base model.

```
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/hotshotco/SDXL-512
```

or visit [https://huggingface.co/hotshotco/SDXL-512](https://huggingface.co/hotshotco/SDXL-512)

# Inference

### Text-to-GIF

```
python inference.py \
  --prompt="a bulldog in the captains chair of a spaceship, hd, high quality" \
  --output="output.gif"
```

*What to Expect:*

| **Prompt** | Sasquatch scuba diving | a camel smoking a cigarette | Ronald McDonald sitting at a vanity mirror putting on lipstick | drake licking his lips and staring through a window at a cupcake |
|-----------|----------|----------|----------|----------|
| **Output** | | | | |

### Text-to-GIF with personalized LORAs

```
python inference.py \
  --prompt="a bulldog in the captains chair of a spaceship, hd, high quality" \
  --output="output.gif" \
  --spatial_unet_base="path/to/stabilityai/stable-diffusion-xl-base-1.0/unet" \
  --lora="path/to/lora"
```

*What to Expect:*

*Note*: The outputs below use the DDIMScheduler.

| **Prompt** | sks person screaming at a capri sun | sks person kissing kermit the frog | sks person wearing a tuxedo holding up a glass of champagne, fireworks in background, hd, high quality, 4K |
|-----------|----------|----------|----------|
| **Output** | | | |

### Text-to-GIF with ControlNet

```
python inference.py \
  --prompt="a girl jumping up and down and pumping her fist, hd, high quality" \
  --output="output.gif" \
  --control_type="depth" \
  --gif="https://media1.giphy.com/media/v1.Y2lkPTc5MGI3NjExbXNneXJicG1mOHJ2dzQ2Y2JteDY1ZWlrdjNjMjl3ZWxyeWFxY2EzdyZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/YOTAoXBgMCmFeQQzuZ/giphy.gif"
```

By default, Hotshot-XL will create key frames from your source GIF using 8 equally spaced frames and crop the key frames to the default aspect ratio. For finer-grained control, learn how to [vary aspect ratios](#varying-aspect-ratios) and [vary frame rates/lengths](#varying-frame-rates--lengths-experimental).

Hotshot-XL currently supports the use of one ControlNet model at a time; supporting Multi-ControlNet would be [exciting](#-further-work).

*What to Expect:*

| **Prompt** | pixar style girl putting two thumbs up, happy, high quality, 8k, 3d, animated disney render | keanu reaves holding a sign that says "HELP", hd, high quality | a woman laughing, hd, high quality | barack obama making a rainbow with their hands, the word "MAGIC" in front of them, wearing a blue and white striped hoodie, hd, high quality |
|-----------|----------|----------|----------|----------|
| **Output** | | | | |
| **Control** | | | | |

### Varying Aspect Ratios

- *Note*: The base SDXL model is trained to best create images around 1024x1024 resolution. To maximize data and training efficiency, Hotshot-XL was trained at aspect ratios around 512x512 resolution. Please see [Additional Notes](#supported-aspect-ratios) for a list of aspect ratios the base Hotshot-XL model was trained with.

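To make the note above concrete, here is a small sketch that reports a requested size's aspect ratio and how its pixel count compares with a 512x512 frame. The helper `describe_size` is hypothetical and is not part of Hotshot-XL. It does not tell you whether a size was actually used in training; for that, refer to the list in Additional Notes.

```python
BASE_SIDE = 512  # Hotshot-XL's stated training resolution is around 512x512


def describe_size(width: int, height: int, base_side: int = BASE_SIDE) -> dict:
    """Summarize a frame size relative to a square base resolution.

    Hypothetical helper for illustration only; not part of Hotshot-XL.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    return {
        # Width divided by height, e.g. 1.0 for a square frame.
        "aspect_ratio": width / height,
        # Pixel count relative to base_side x base_side (1.0 means equal).
        "area_vs_base": (width * height) / (base_side * base_side),
    }


if __name__ == "__main__":
    print(describe_size(512, 512))
    print(describe_size(1024, 1024))
```

For example, a 1024x1024 frame (the resolution base SDXL targets) has four times the pixels of a 512x512 frame, which is the efficiency trade-off the note refers to.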
Like SDXL, Hotshot-XL was trained at various aspect ratios with aspect ratio bucketing, and includes support for SDXL parameters like target-size and original-size. This means you can create GIFs at several different aspect ratios and resolutions, just with the base Hotshot-XL model.

```
python inference.py \
  --prompt="a bulldog in the captains chair of a spaceship, hd, high quality" \
  --output="output.gif" \
  --width=