Stable Video Diffusion



We are releasing Stable Video Diffusion, an image-to-video model, for research purposes:
  • SVD: This model was trained to generate 14 frames at a resolution of 576x1024, given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware decoder.
  • SVD-XT: Same architecture as SVD, but fine-tuned for 25-frame generation.
  • We provide a Streamlit demo and a standalone Python script for inference with both models.
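
As a quick reference, the frame counts and resolution listed above can be written down as a small lookup table. This is only an illustrative sketch and not part of the released code: the names `VideoModelSpec`, `SPECS`, `output_shape`, and `accepts_context_frame` are hypothetical, and it assumes that 576x1024 means height x width and that frames are 3-channel RGB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoModelSpec:
    """Hypothetical container for the numbers quoted in the list above."""
    name: str
    num_frames: int
    height: int  # assumption: 576x1024 is read as height x width
    width: int

    def output_shape(self, channels: int = 3) -> tuple:
        # Assumption: RGB frames, laid out as (frames, channels, height, width).
        return (self.num_frames, channels, self.height, self.width)

    def accepts_context_frame(self, height: int, width: int) -> bool:
        # The context frame is expected to have the same size as the output.
        return (height, width) == (self.height, self.width)


# Values taken from the model descriptions; the keys are illustrative only.
SPECS = {
    "svd": VideoModelSpec("svd", num_frames=14, height=576, width=1024),
    "svd_xt": VideoModelSpec("svd_xt", num_frames=25, height=576, width=1024),
}

if __name__ == "__main__":
    for spec in SPECS.values():
        print(spec.name, spec.output_shape())
```

A table like this makes it easy to check a conditioning image's size before sampling, for example `SPECS["svd_xt"].accepts_context_frame(576, 1024)`.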