
Stability AI launches Stable Video 4D 2.0 with better output quality for real-world video
Stability AI has launched Stable Video 4D 2.0 (SV4D 2.0) with improved output quality on real-world video. The model generates high-fidelity 4D assets from a single, object-centric video, with no need for multi-camera setups or reference views. SV4D 2.0 achieves state-of-the-art results in 4D generation and novel-view synthesis, outperforming models such as DreamGaussian4D, L4GM, SV3D, and Diffusion², and scores best on evaluation metrics including LPIPS, FVD-V, FVD-F, and FV4D.
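For readers unfamiliar with the metrics: LPIPS measures learned perceptual distance between image pairs (lower is better), while the FVD variants score video quality across frames and views. As a hedged illustration only, not Stability AI's evaluation pipeline, here is a minimal Python sketch of computing LPIPS with the widely used lpips package, using random tensors in place of generated and ground-truth frames:

    import torch
    import lpips  # pip install lpips

    # LPIPS compares deep features of two images; net='alex' is the
    # common default backbone. Inputs are expected in the range [-1, 1].
    loss_fn = lpips.LPIPS(net='alex')

    generated = torch.rand(1, 3, 64, 64) * 2 - 1  # stand-in for a rendered frame
    reference = torch.rand(1, 3, 64, 64) * 2 - 1  # stand-in for ground truth
    distance = loss_fn(generated, reference)
    print(distance.item())  # lower means perceptually closer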
The release features a redesigned network that uses 3D attention to improve spatio-temporal consistency without reference imagery. A two-phase training process first learns static 3D assets and then introduces motion dynamics, yielding sharper and more coherent output. The model leverages knowledge from a pre-trained video model, which improves generalization to real-world content even though the training data is primarily synthetic.
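The announcement does not ship architecture code, but the idea behind 3D attention can be sketched: rather than attending over space and time in separate layers, tokens from all views and frames attend to one another in a single pass. The PyTorch module below is a hypothetical, minimal illustration under that assumption; the tensor layout and sizes are invented for the example and are not SV4D 2.0's actual design.

    import torch
    import torch.nn as nn

    class Attention3D(nn.Module):
        """Joint attention over views, frames, and spatial tokens."""

        def __init__(self, dim: int, num_heads: int = 8):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, views, frames, tokens, dim)
            b, v, f, t, d = x.shape
            # Flatten views, frames, and spatial tokens into one sequence
            # so every token attends across space, time, and viewpoint.
            seq = x.reshape(b, v * f * t, d)
            h = self.norm(seq)
            out, _ = self.attn(h, h, h)
            return (seq + out).reshape(b, v, f, t, d)

    # Example: 2 views, 4 frames, 16 spatial tokens of width 64.
    x = torch.randn(1, 2, 4, 16, 64)
    print(Attention3D(dim=64)(x).shape)  # torch.Size([1, 2, 4, 16, 64])

The appeal of the single pass is that view consistency and temporal consistency are enforced by the same attention weights, which is what lets the model do without separate reference views.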
While SV4D 2.0 simplifies multi-view asset generation for applications in games, film, and virtual environments, some artifacts may still appear in high-motion sequences. The model is provided under the Stability AI Community License for both commercial and non-commercial use, with model weights on Hugging Face, code on GitHub, and research available on arXiv.
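Assuming the weights follow Stability AI's usual Hugging Face distribution, downloading them could look like the sketch below; the repository id is an assumption, so check the official model page for the actual name and accept the Community License there first.

    from huggingface_hub import snapshot_download

    # Hypothetical repo id -- verify the real name on Stability AI's
    # Hugging Face page; gated models require accepting the license
    # and logging in (huggingface-cli login) before downloading.
    local_dir = snapshot_download(repo_id="stabilityai/sv4d2.0")
    print("weights downloaded to:", local_dir)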