VoiceCraft is a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including audiobooks, internet videos, and podcasts.




SpeakPerfect is described as 'Turns your fuzzy thoughts into great script + audio using AI. With SpeakPerfect, you no longer need to spend hours writing down the script before making a video' and is a Text to Speech service in the ai tools & services category. There are more than 25 alternatives to SpeakPerfect for a variety of platforms, including Web-based, Windows, Mac, Linux and SaaS apps. The best SpeakPerfect alternative is VoiceCraft, which is both free and Open Source. Other great apps like SpeakPerfect are ElevenLabs, X to Voice, RHVoice and NaturalReader.
VoiceCraft is a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including audiobooks, internet videos, and podcasts.




ElevenLabs uses AI to deliver natural, expressive speech for diverse applications such as podcasts and videos. It features a user-friendly interface, customizable intonation, and offers seamless API integration. Privacy, scalability, and multilingual capabilities enhance its adaptability.




Open-source tool that analyzes your X/Twitter profile data to generate a custom voice with ElevenLabs Voice Design API, integrating with Hedra's video API for an innovative audio-visual experience.




Natural Reader is a professional text to speech program that converts any written text into spoken words. The paid versions of Natural Reader have many more features.



Voice Engine is a text-to-voice generation platform from OpenAI, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.



SherpaTTS is an Android Text-to-Speech engine based on Next-gen Kaldi using Piper or Coqui voices.


Transform text into speech with natural synthesis, offering smooth and fine-tuned audio export. Create high-quality voiceovers, download outputs for diverse applications, and experience excellent synthesis. Supports various languages and operates on multiple platforms.




We're excited to introduce Chatterbox, Resemble AI's first production-grade open source TTS model. Licensed under MIT, Chatterbox has been benchmarked against leading closed-source systems like ElevenLabs, and is consistently preferred in side-by-side evaluations.
AIVocal is your all-in-one AI assistant for voice tasks—perfect for AI podcasting, speech generation, vocal editing, and voice control. From transcribing meetings to creating high-quality audio content, AIVocal makes voice work smarter and faster.

Dia is a 1.6B parameter text to speech model created by Nari Labs. It was pushed to the Hub using the PytorchModelHubMixin integration.
