

Gazelle Speech Language Model
Gazelle is a joint speech-language model by Tincans. For more details and prompt ideas, see our v0.2 announcement. This is an early research preview, so please temper expectations! Gazelle accepts text and audio as input (interchangeably) and generates text as output.
Cost / License
- Free
- Open Source
Platforms
- Online
Features
- Speech to text
- Speech transcription
- Speech Recognition
- AI-Powered
Gazelle Speech Language Model information
What is Gazelle Speech Language Model?
Gazelle is a joint speech-language model by Tincans. For more details and prompt ideas, see our v0.2 announcement. This is an early research preview, so please temper expectations! Gazelle accepts text and audio as input (interchangeably) and generates text as output. You can further synthesize the text output into audio via a TTS provider (not implemented here). Example tasks include transcribing audio, answering questions, and understanding spoken audio. We expect this joint approach to be better suited to business use cases where latency and conversational quality matter, such as customer support and outbound sales.
Known limitations exist! The model was trained only on English audio and is not expected to work well with other languages. Similarly, the model does not yet handle accents well. The Gradio demo may have bugs with audio sample rates. We also accept only a single audio input (microphone or upload).
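Since the demo may mishandle sample rates, one possible workaround is to resample a recording to a fixed rate before uploading it. The sketch below is only an illustration: the 16 kHz target is an assumption (the rate Gazelle expects is not stated here), and `resample_linear` is a hypothetical helper, not part of Gazelle or Gradio. It uses plain linear interpolation, which does no low-pass filtering and can alias when downsampling, so a dedicated audio resampler would be preferable for real use.

```python
TARGET_RATE = 16_000  # assumed target rate in Hz; not confirmed for Gazelle


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample a mono signal (sequence of floats) by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # input sample at or before pos
        hi = min(lo + 1, last)     # next input sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


# One second of a 48 kHz ramp signal, downsampled to the assumed 16 kHz.
ramp = [float(i) for i in range(48_000)]
resampled = resample_linear(ramp, 48_000)
print(len(resampled))
```

After resampling, the clip would be saved with the new rate in its header (for example as a WAV file) so that the rate reported to the demo matches the audio data.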
Inference runs on serverless GPUs on Modal. As a result, you may experience a cold-start delay (about 30 seconds) on first use, but subsequent responses will be faster. This demo is intentionally not optimized for inference speed; its purpose is to showcase Gazelle's capabilities. We do not store any responses.




