Enginn Studio is a text-to-speech service described as 'Enginn creates human-quality voices which don't belong to anyone, and provides a SaaS platform, Enginn Studio, allowing to use those voices for content production'. There are more than 10 alternatives to Enginn Studio, including websites and apps for a variety of platforms such as SaaS, Python, self-hosted, and Google Chrome. The best Enginn Studio alternative is VoiceCraft, which is both free and open source. Other great sites and apps similar to Enginn Studio are ElevenLabs, X to Voice, Voice Engine, and Chatterbox TTS.
VoiceCraft is a token infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data, including audiobooks, internet videos, and podcasts.




ElevenLabs uses AI to deliver natural, expressive speech for diverse applications such as podcasts and videos. It features a user-friendly interface, customizable intonation, and seamless API integration. Privacy controls, scalability, and multilingual support make it adaptable to many use cases.




An open-source tool that analyzes your X/Twitter profile data to generate a custom voice with the ElevenLabs Voice Design API and integrates with Hedra's video API for an innovative audio-visual experience.


Voice Engine is a text-to-voice generation platform from OpenAI, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.


We're excited to introduce Chatterbox, Resemble AI's first production-grade open-source TTS model. Released under the MIT license, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs and is consistently preferred in side-by-side evaluations.

AIVocal is your all-in-one AI assistant for voice tasks, perfect for AI podcasting, speech generation, vocal editing, and voice control. From transcribing meetings to creating high-quality audio content, AIVocal makes voice work smarter and faster.

Dia is a 1.6B-parameter text-to-speech model created by Nari Labs. It was pushed to the Hugging Face Hub using the PytorchModelHubMixin integration.

An AI-powered platform revolutionizing voice creation with cutting-edge technology. We provide advanced audio solutions for creators and businesses worldwide.



A privacy-focused online video editor that processes videos entirely in your browser using FFmpeg and WebAssembly. No uploads, no signups required.

AudiowaveAI lets you convert text to high-quality audio easily and affordably. Listen to PDFs, EPUBs, articles, blog posts, links, emails, or anything else you want on any device with natural-sounding voices.




Convert text into natural-sounding speech using an API powered by the best of Google’s AI technologies.
