Gazelle Speech Language Model

Cost / License

  • Free
  • Open Source

Platforms

  • Online

Features

  1.  Speech to text
  2.  Speech transcription
  3.  Speech Recognition
  4.  AI-Powered

Gazelle Speech Language Model information

  • Developed by

    Tincans
  • Licensing

    Free and open source (Apache-2.0).
  • Alternatives

    3 alternatives listed
  • Supported Languages

    • English

AlternativeTo Category

AI Tools & Services

GitHub repository

  •  372 Stars
  •  33 Forks
  •  2 Open Issues
Gazelle Speech Language Model was added to AlternativeTo by Paul on and this page was last updated .

What is Gazelle Speech Language Model?

Gazelle is a joint speech-language model by Tincans. For more details and prompt ideas, see our v0.2 announcement. This is an early research preview, so please temper expectations! Gazelle can take text and audio as input (interchangeably) and generates text as output. The text output can then be synthesized into audio via a separate TTS provider (not implemented here). Example tasks include transcribing audio, answering questions, and understanding spoken audio. This approach should be superior for business use cases where latency and conversational quality matter, such as customer support, outbound sales, and more.

Known limitations exist! The model was trained only on English audio and is not expected to work well with other languages. Similarly, the model does not yet handle accents well. The Gradio demo may have bugs related to audio sample rates. We also accept only a single audio input (microphone or upload).

Inference runs on serverless GPUs on Modal. As a result, you may experience a cold-start delay (about 30 seconds) on first use, but subsequent responses will be faster. This demo is deliberately not optimized for inference speed; its purpose is to showcase the capabilities of Gazelle. We do not store any responses.

Official Links