Stable Video Diffusion
We are releasing Stable Video Diffusion, an image-to-video model, for research purposes:
- SVD: This model was trained to generate 14 frames at a resolution of 576x1024, given a context frame of the same size. We use the standard image encoder from SD 2.1, but replace the decoder with a temporally-aware decoder.
- SVD-XT: Same architecture as SVD, but finetuned to generate 25 frames.
- We provide a Streamlit demo and a standalone Python script for running inference with both models.
- Alongside the models, we are releasing a technical report.
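Since both models reuse the SD 2.1 image encoder, the frame counts and resolution above determine the size of the latent clip the video model works on. The sketch below computes that shape. It assumes the SD 2.1 autoencoder's 8x spatial downsampling and 4 latent channels. `latent_shape` is a hypothetical helper for illustration and is not part of the released scripts.

```python
# Hypothetical helper (not part of the released code): latent shape of a clip,
# assuming the SD 2.1 autoencoder downsamples 8x spatially into 4 channels.
VAE_DOWNSAMPLE = 8    # assumption: SD 2.1 VAE spatial downsampling factor
LATENT_CHANNELS = 4   # assumption: SD 2.1 VAE latent channel count


def latent_shape(num_frames: int, height: int, width: int) -> tuple:
    """Return (frames, channels, latent_height, latent_width) for a clip."""
    # The encoder only maps sizes divisible by the downsampling factor cleanly.
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("height and width must be multiples of 8")
    return (
        num_frames,
        LATENT_CHANNELS,
        height // VAE_DOWNSAMPLE,
        width // VAE_DOWNSAMPLE,
    )


svd = latent_shape(14, 576, 1024)     # SVD: 14 frames at 576x1024
svd_xt = latent_shape(25, 576, 1024)  # SVD-XT: 25 frames at 576x1024
print(svd, svd_xt)  # (14, 4, 72, 128) (25, 4, 72, 128)
```

Under these assumptions, the two models share the same per-frame latent (4x72x128) and differ only in the number of frames.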