
Voicebox Studio

The open-source voice synthesis studio.


Cost / License

  • Free
  • Open Source (MIT)

Platforms

  • Windows
  • Mac
  • Linux (pre-built binaries are not yet available; see voicebox.sh/linux-install for build-from-source instructions)
  • Docker

Voicebox Studio information

  • Developed by

    jamiepine (Canada)
  • Licensing

    Open Source (MIT) and Free product.
  • Written in

  • Alternatives

    26 alternatives listed
  • Supported Languages

    • English

GitHub repository

  •  14,430 Stars
  •  1,715 Forks
  •  191 Open Issues
Voicebox Studio was added to AlternativeTo by Muhammad Farag.

What is Voicebox Studio?

Voicebox: The Open-Source Voice Cloning Studio

Voicebox is a local-first alternative to services like ElevenLabs, built for high-fidelity voice cloning and speech synthesis. It lets users clone voices from a few seconds of audio, generate speech in 23 languages, and arrange complex audio projects on a multi-track timeline, all while running entirely on the user's own hardware.

Key Capabilities

  1. Multi-Engine Synthesis: Voicebox integrates five distinct Text-to-Speech (TTS) engines, so users can choose the best one for the task:
  • Qwen3-TTS: High-quality multilingual cloning with support for delivery instructions (e.g., "whisper").
  • LuxTTS: A lightweight, ultra-fast engine optimized for 48 kHz CPU generation.
  • Chatterbox (Multilingual & Turbo): Offers the broadest language support and paralinguistic tags for expressive speech (laughs, sighs, gasps).
  • TADA: A speech-language model designed for long-form, coherent audio (700+ seconds).
  2. Advanced Audio Post-Processing: Powered by Spotify's pedalboard library, Voicebox includes eight real-time effects (Pitch Shift, Reverb, Compression, etc.). Users can build custom presets or use built-in profiles like "Radio" or "Robotic" to polish cloned voices.

  3. Professional Workflow Tools

  • Unlimited Generation: Uses smart auto-chunking and crossfading to generate up to 50,000 characters without breaks.
  • Stories Editor: A multi-track timeline editor for composing podcasts, conversations, and narratives with drag-and-drop ease.
  • Version Control: Tracks "Takes" and "Effects versions" for every generation, ensuring the original clean output is always preserved.
  • Async Queue: A non-blocking generation system that lets you queue multiple tasks without overloading the GPU.
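
The auto-chunking and crossfading behind Unlimited Generation can be pictured roughly as follows. This is a minimal sketch and not Voicebox's actual implementation: the function names, the 400-character chunk limit, and the linear fade are all assumptions chosen for illustration. Text is split at sentence boundaries, each chunk would be synthesized separately, and neighbouring audio segments are blended over a short overlap.

```python
import re


def split_into_chunks(text: str, max_chars: int = 400) -> list[str]:
    """Split text at sentence boundaries into chunks of at most max_chars.

    Simplification: a single sentence longer than max_chars is kept whole
    instead of being cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample lists, linearly blending the tail of a into the head of b."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises across the overlap
        blended.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    return a[: len(a) - overlap] + blended + b[overlap:]
```

In this sketch, each string returned by split_into_chunks would be passed to a TTS engine, and the resulting sample lists would be folded together pairwise with crossfade. A real implementation would more likely operate on NumPy arrays and might use an equal-power fade instead of a linear one.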
  4. Voice & Model Management
  • Profile Management: Create voice identities from recordings or files, supporting multi-sample inputs for higher cloning accuracy.
  • Recording & Transcription: Built-in system audio capture and Whisper-powered transcription for seamless content creation.
  • Hardware Efficiency: Local model management allows users to load/unload models to optimize VRAM usage.
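
One way to picture the load/unload model management described above is a small cache that keeps resident models under a memory budget. This is a hypothetical sketch, not Voicebox's code: the ModelCache class, the least-recently-used eviction policy, and the sizes in megabytes are assumptions made for illustration, and no real model is loaded.

```python
from collections import OrderedDict


class ModelCache:
    """Track loaded models under a memory budget, unloading the least recently used."""

    def __init__(self, budget_mb: int):
        self.budget_mb = budget_mb
        self._loaded = OrderedDict()  # model name -> size in MB, oldest first

    @property
    def used_mb(self) -> int:
        return sum(self._loaded.values())

    def load(self, name: str, size_mb: int) -> list[str]:
        """Mark a model as loaded and return the names unloaded to make room."""
        if size_mb > self.budget_mb:
            raise ValueError(f"{name} ({size_mb} MB) exceeds the {self.budget_mb} MB budget")
        if name in self._loaded:
            self._loaded.move_to_end(name)  # already resident: refresh recency only
            return []
        evicted = []
        while self.used_mb + size_mb > self.budget_mb:
            old_name, _ = self._loaded.popitem(last=False)  # drop least recently used
            evicted.append(old_name)
        self._loaded[name] = size_mb
        return evicted

    def unload(self, name: str) -> None:
        """Explicitly unload a model; unknown names are ignored."""
        self._loaded.pop(name, None)

    def loaded(self) -> list[str]:
        return list(self._loaded)
```

With an 8000 MB budget, loading a 4000 MB model and a 3000 MB model fits; requesting a further 2000 MB model then unloads whichever of the first two was used least recently. An actual application would pair each entry with the code that moves weights on and off the GPU.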

Official Links