

Amphion
Amphion is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Cost / License
- Free
- Open Source
Platforms
- Self-Hosted
- Docker
- Python
Features
- Text to Speech
- Ad-free
- Voice synthesis
- AI-Powered
Tags
- speech-synthesis
- speech-synthesizer
- music-synthesizer
- voice-conversion
- ai-music-generation
- speech-generation
Amphion information
What is Amphion?
Amphion is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development. Amphion also offers a distinctive feature: visualizations of classic models and architectures. These visualizations are intended to help junior researchers and engineers gain a better understanding of the models.
The North-Star objective of Amphion is to offer a platform for studying the conversion of any input into audio. Amphion is designed to support individual generation tasks, including but not limited to:
- TTS: Text to Speech
- SVS: Singing Voice Synthesis
- VC: Voice Conversion
- SVC: Singing Voice Conversion
- TTA: Text to Audio
- TTM: Text to Music
- more…
In addition to the specific generation tasks, Amphion includes several vocoders and evaluation metrics. A vocoder is an important module for producing high-quality audio signals, while evaluation metrics are critical for assessing generation tasks in a consistent, comparable way. Moreover, Amphion is dedicated to advancing audio generation in real-world applications, such as building large-scale datasets for speech synthesis.




