Stability AI introduces Stable Virtual Camera to turn 2D images into immersive 3D videos
Mar 19, 2025 at 8:50 AM

Stability AI has unveiled Stable Virtual Camera, a new multi-view diffusion model currently available in research preview. The tool transforms 2D images into immersive 3D videos with realistic depth and perspective, without complex reconstruction or scene-specific optimization. Users can generate 3D videos from a single image or up to 32 images, following either user-defined camera trajectories or any of 14 dynamic camera paths, including 360°, Lemniscate, Spiral, Dolly Zoom, Move, Pan, and Roll.
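The camera paths listed above are, at their core, parameterized sequences of camera poses. As a rough illustration only (this is not Stable Virtual Camera's actual API; the function and the pose format are hypothetical), a 360° orbit path could be generated like this:

```python
import math

def orbit_trajectory(radius, height, num_frames):
    """Camera positions for a 360-degree orbit around the scene origin.

    Each pose is (x, y, z, yaw_radians): the camera circles the origin
    at a fixed radius and height, with yaw pointing back at the origin.
    """
    poses = []
    for i in range(num_frames):
        theta = 2.0 * math.pi * i / num_frames
        x = radius * math.cos(theta)
        z = radius * math.sin(theta)
        yaw = math.atan2(-z, -x)  # face the origin from (x, z)
        poses.append((x, height, z, yaw))
    return poses

# Eight evenly spaced poses on a circle of radius 2 at height 0.5
frames = orbit_trajectory(radius=2.0, height=0.5, num_frames=8)
```

A Lemniscate or Spiral path would swap in a different parametric curve for the same per-frame pose structure; the model then renders a novel view for each pose.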

Stable Virtual Camera builds on the concept of traditional virtual cameras used in filmmaking and 3D animation, combining this familiar control with generative AI capabilities. This allows for precise and intuitive control over 3D video outputs. Unlike traditional 3D video models that require large input image sets or complex preprocessing, this model can create novel views from one or more images at specified camera angles, ensuring smooth and consistent 3D video outputs.

Stable Virtual Camera is available for research purposes under a non-commercial license. Researchers can download the model weights on Hugging Face and access the code on GitHub.

Mar 19, 2025 by Paul

Comments

UserPower
Mar 19, 2025
1

Needless to say, the objects in the video are static (unlike other models, where objects behaved pretty randomly), and the results are mediocre for complex objects, humans, and animals. Video-gen AI is a very complex (and very costly) subject that requires expertise across many different domains, so these models are trained for a limited purpose to avoid too many uncanny valleys. It was the same with CGI in the 2000s: it evolved slowly until we got great results (after years and many billions of dollars invested in research and computing). For now, these tools can greatly reduce the cost of making a storyboard (for clients, not for the companies training and querying these models), but with current technology, and even projected computing power, there is no clear path to creating a full theater movie this decade.

1 reply
Mauricio B. Holguin
Mar 19, 2025

Which model would you say comes closest to delivering decent results for a medium-shot talking-head video with some light hand movement? So far, Google Veo 2 seems like the closest to something usable in a real-life context, imo.
