

Stable Audio
Stable Audio, developed by Stability AI, is a music generation product built on a latent diffusion model. It generates audio of varying lengths, conditioned on text metadata and timing, and offers fast inference along with creative control over content and duration.
Cost / License
- Freemium (Subscription)
- Proprietary
Platforms
- Online
Features
- Virtual Instrument
- Music Sequencer
- AI-Powered
Tags
- stable-diffusion-api
- Stable Diffusion
Stable Audio News & Activities
Recent News
- POX published news article about Stable Audio
Stable Audio 2.5 from Stability AI delivers faster, more intricate AI‑generated music
Stability AI has released Stable Audio 2.5, an update to its audio generation model, aimed at enter...
- Maoholguin published news article about Stable Audio Open
Stability unveils Stable Audio Open, a new open-source model for music & audio generation
Stability AI, known for its image generator Stable Diffusion, has launched an AI model for audio ge...
- POX published news article about Stable Audio
Stability AI launches Stable Audio 2.0 with full track generation up to three minutes long
Stability AI has unveiled Stable Audio 2.0, a fresh standard in AI-generated audio. This new model ...
What is Stable Audio?
Stable Audio is a music generation product built by Stability AI. It is a first-of-its-kind product that uses the latest generative AI techniques to deliver faster, higher-quality music and sound effects via an easy-to-use web interface. Stability AI offers a basic free version of Stable Audio, which can be used to generate and download tracks of up to 45 seconds, and a ‘Pro’ subscription, which delivers 90-second tracks that are downloadable for commercial projects.
Stable Audio is ideal for musicians seeking to create samples to use in their music, but the opportunities for creators are limitless. Audio tracks are generated in response to descriptive text prompts supplied by the user, along with a desired length of audio. For instance, “Post-Rock, Guitars, Drum Kit, Bass, Strings, Euphoric, Up-Lifting, Moody, Flowing, Raw, Epic, Sentimental, 125 BPM” can be entered with a request for a 95-second track.
One of the main issues with generating audio using diffusion models is that they are usually trained to generate a fixed-size output. For example, an audio diffusion model trained on 30-second audio clips will only be able to generate audio in 30-second chunks. This is a problem when training on, and trying to generate, audio of greatly varying lengths, as is the case when generating full songs.
Audio diffusion models tend to be trained on randomly cropped chunks of audio from longer audio files, cropped or padded to fit the diffusion model’s training length. With music, the result is a model that tends to generate arbitrary sections of a song, which may start or end in the middle of a musical phrase.
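The crop-or-pad step described above can be sketched as follows. This is a minimal illustration using NumPy, not Stability AI's training code; the function name `crop_or_pad` and the zero-padding choice are assumptions made for the example.

```python
import numpy as np

def crop_or_pad(audio, window, rng):
    """Return (chunk, start) where chunk has exactly `window` samples.

    `audio` is an array whose last axis is time, e.g. (samples,) or
    (channels, samples). Hypothetical helper for illustration only.
    """
    n = audio.shape[-1]
    if n >= window:
        # Long file: pick a random start, so the chunk may begin or end
        # in the middle of a musical phrase.
        start = int(rng.integers(0, n - window + 1))
        return audio[..., start:start + window], start
    # Short file: pad the end with silence (zeros) up to the window length.
    pad_width = [(0, 0)] * (audio.ndim - 1) + [(0, window - n)]
    return np.pad(audio, pad_width), 0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    song = np.arange(10.0)           # stand-in for a long recording
    chunk, start = crop_or_pad(song, 4, rng)
    print(start, chunk)              # a random 4-sample slice of the song
```

Because the start offset is random, the model never learns where in the song a chunk came from unless that offset is also passed to it.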
We introduce Stable Audio, a latent diffusion model architecture for audio conditioned on text metadata as well as audio file duration and start time, allowing for control over the content and length of the generated audio. This additional timing conditioning allows us to generate audio of a specified length up to the training window size.
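One way to picture the timing conditioning is as two numbers attached to every training chunk: where the chunk starts in the original file, and how long that file is. The sketch below is an assumption-laden illustration; the names `TimingCondition`, `seconds_start`, `seconds_total`, and both helper functions are hypothetical and do not come from the text above.

```python
from dataclasses import dataclass

@dataclass
class TimingCondition:
    seconds_start: float   # offset of the chunk within the source file
    seconds_total: float   # duration of the whole source file

def training_condition(start_sample, total_samples, sample_rate):
    """Timing values for a chunk cropped at `start_sample` from a file
    of `total_samples` samples (hypothetical helper)."""
    return TimingCondition(start_sample / sample_rate,
                           total_samples / sample_rate)

def inference_condition(requested_seconds, window_seconds):
    """Timing values for generating a clip of a requested length.

    Assumes generation starts at time zero and that the request cannot
    exceed the training window, as the text states.
    """
    if not 0 < requested_seconds <= window_seconds:
        raise ValueError("requested length must fit in the training window")
    return TimingCondition(0.0, float(requested_seconds))

if __name__ == "__main__":
    # A chunk cropped 10 s into a 100 s file sampled at 44.1 kHz.
    print(training_condition(441_000, 4_410_000, 44_100))
    # Ask for a 30 s clip from a model with a 95 s window.
    print(inference_condition(30, 95))
```

At inference time, a caller would pass the desired length as the total duration and a start of zero, which is what lets the model produce a clip that begins and ends sensibly instead of an arbitrary slice.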
Working with a heavily downsampled latent representation of audio allows for much faster inference times compared to raw audio. Using the latest advancements in diffusion sampling techniques, our flagship Stable Audio model is able to render 95 seconds of stereo audio at a 44.1 kHz sample rate in less than one second on an NVIDIA A100 GPU.
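To get a feel for the figures quoted above, the arithmetic can be worked through in a few lines. The sample rate, duration, and channel count come from the text; the downsampling ratio of 1024 is a purely hypothetical value chosen for illustration, since the text does not state the actual ratio.

```python
import math

SAMPLE_RATE = 44_100             # Hz, from the text
DURATION_S = 95                  # seconds, from the text
CHANNELS = 2                     # stereo, from the text
HYPOTHETICAL_DOWNSAMPLE = 1024   # assumption: not stated in the text

def raw_frames(duration_s, sample_rate):
    """Number of audio frames (one frame = one sample per channel)."""
    return duration_s * sample_rate

def latent_frames(frames, downsample):
    """Latent sequence length after temporal downsampling, rounded up."""
    return math.ceil(frames / downsample)

def min_realtime_factor(duration_s, render_s):
    """Seconds of audio produced per second of compute."""
    return duration_s / render_s

if __name__ == "__main__":
    frames = raw_frames(DURATION_S, SAMPLE_RATE)
    print(frames)                                          # 4189500
    print(frames * CHANNELS)                               # 8379000 sample values
    print(latent_frames(frames, HYPOTHETICAL_DOWNSAMPLE))  # 4092
    # "Less than one second" means this is a lower bound on the speed-up.
    print(min_realtime_factor(DURATION_S, 1.0))            # 95.0
```

Under that assumed ratio, the diffusion model would operate on a sequence of a few thousand latent frames instead of roughly 4.2 million raw frames per channel, which is the intuition behind the faster inference.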







