

Voicebox Studio
The open-source voice synthesis studio.
Cost / License
- Free
- Open Source (MIT)
Platforms
- Windows
- Mac
- Linux
- Docker
Voicebox Studio information
What is Voicebox Studio?
Voicebox: The Open-Source Voice Cloning Studio
Voicebox is a local-first alternative to services like ElevenLabs, designed for high-fidelity voice cloning and speech synthesis. It works as a full creative suite: users can clone a voice from a few seconds of audio, generate speech in 23 languages, and arrange complex audio projects on a multi-track timeline, all while running entirely on their own hardware.
Key Capabilities
- Multi-Engine Synthesis: Voicebox integrates five distinct Text-to-Speech (TTS) engines, allowing users to choose the best tool for the task:
  - Qwen3-TTS: High-quality multilingual cloning with support for delivery instructions (e.g., "whisper").
  - LuxTTS: A lightweight, ultra-fast engine optimized for 48kHz CPU generation.
  - Chatterbox (Multilingual & Turbo): Offers the broadest language support and paralinguistic tags for expressive speech (laughs, sighs, gasps).
  - TADA: A speech-language model designed for long-form, coherent audio (up to 700s+).
- Advanced Audio Post-Processing: Powered by Spotify's pedalboard library, Voicebox includes eight real-time effects (Pitch Shift, Reverb, Compression, etc.). Users can build custom presets or use built-in profiles like "Radio" or "Robotic" to polish their clones.
- Professional Workflow Tools:
  - Unlimited Generation: Uses smart auto-chunking and crossfading to generate up to 50,000 characters without breaks.
  - Stories Editor: A multi-track timeline editor for composing podcasts, conversations, and narratives with drag-and-drop ease.
  - Version Control: Tracks "Takes" and "Effects versions" for every generation, ensuring the original clean output is always preserved.
  - Async Queue: A non-blocking generation system that allows you to queue multiple tasks without crashing your GPU.
- Voice & Model Management:
  - Profile Management: Create voice identities from recordings or files, supporting multi-sample inputs for higher cloning accuracy.
  - Recording & Transcription: Built-in system audio capture and Whisper-powered transcription for seamless content creation.
  - Hardware Efficiency: Local model management allows users to load/unload models to optimize VRAM usage.
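
The auto-chunking and crossfading mentioned under Unlimited Generation can be illustrated with a minimal sketch. This is not Voicebox's actual implementation: the function names (`chunk_text`, `crossfade`, `stitch`), the 200-character default limit, and the linear fade are all assumptions chosen for illustration. The idea is to split long text at sentence boundaries, synthesize each chunk separately, and blend the overlapping edges of adjacent audio segments so the joins are not audible.

```python
import re


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries.

    Hypothetical helper: the limit and the splitting rule are illustrative only.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample lists, linearly blending the end of `a` into the start of `b`."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    start = len(a) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `b` rises across the overlap
        blended.append(a[start + i] * (1 - w) + b[i] * w)
    return a[:start] + blended + b[overlap:]


def stitch(segments: list[list[float]], overlap: int) -> list[float]:
    """Crossfade a sequence of audio segments into one continuous signal."""
    out: list[float] = []
    for seg in segments:
        out = crossfade(out, seg, overlap) if out else list(seg)
    return out
```

In a real pipeline each chunk from `chunk_text` would be passed to a TTS engine and the resulting sample arrays handed to `stitch`; plain Python lists stand in for audio buffers here to keep the sketch dependency-free.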




