WhisperWriter is a small speech-to-text app that uses OpenAI's Whisper model to transcribe speech from your microphone and type the result into the active window.
Once started, the script runs in the background and waits for a keyboard shortcut to be pressed (ctrl+shift+space by default). When the shortcut is pressed, the app starts recording from your microphone. There are four recording modes to choose from:
- continuous (default): Recording will stop after a long enough pause in your speech. The app will transcribe the text and then start recording again. To stop listening, press the keyboard shortcut again.
- voice_activity_detection: Recording will stop after a long enough pause in your speech. Recording will not start until the keyboard shortcut is pressed again.
- press_to_toggle: Recording will stop when the keyboard shortcut is pressed again. Recording will not start until the keyboard shortcut is pressed again.
- hold_to_record: Recording will continue until the keyboard shortcut is released. Recording will not start until the keyboard shortcut is held down again.
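The differences between the four modes can be summarized as a small lookup table. The following is only an illustrative sketch, not WhisperWriter's actual code: the event names (`pause`, `key_pressed`, `key_released`) and the `on_event` helper are hypothetical, and only the mode names come from the list above.

```python
# Hypothetical sketch: which event ends a recording in each mode.
# Event names are assumptions made for illustration only.
STOP_EVENTS = {
    "continuous": {"pause", "key_pressed"},
    "voice_activity_detection": {"pause"},
    "press_to_toggle": {"key_pressed"},
    "hold_to_record": {"key_released"},
}


def on_event(mode, event):
    """Return (stop_recording, restart_recording) for an event seen while recording."""
    stop = event in STOP_EVENTS[mode]
    # Only continuous mode starts a new recording by itself, and only
    # after a pause; pressing the shortcut ends the listening session.
    restart = stop and mode == "continuous" and event == "pause"
    return stop, restart


if __name__ == "__main__":
    for mode in STOP_EVENTS:
        for event in ("pause", "key_pressed", "key_released"):
            print(mode, event, on_event(mode, event))
```

In every mode except continuous, a stopped recording stays stopped until the shortcut is used again.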
You can change the keyboard shortcut (activation_key) and recording mode in the Configuration Options. While recording and transcribing, a small status window shows the current stage of the process (this can be turned off). Once the transcription is complete, the transcribed text is automatically written to the active window.
The transcription can either be done locally through the faster-whisper Python package or through a request to OpenAI's API. By default, the app will use a local model, but you can change this in the Configuration Options. If you choose to use the API, you will need to either provide your OpenAI API key or change the base URL endpoint.
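For readers curious what the two transcription paths look like, here is a rough sketch assuming the `faster-whisper` and `openai` Python packages are installed. It is not the app's own implementation: the function names, the `"base"` model size, the CPU/int8 settings, the `"whisper-1"` API model, and the placeholder key are all assumptions chosen for illustration.

```python
def api_client_kwargs(api_key=None, base_url=None):
    """Build keyword arguments for the OpenAI client (hypothetical helper)."""
    kwargs = {}
    if base_url:
        kwargs["base_url"] = base_url
    if api_key:
        kwargs["api_key"] = api_key
    elif base_url:
        # The openai client requires some key value even when a custom
        # endpoint ignores it, so a placeholder is passed (assumption).
        kwargs["api_key"] = "not-needed"
    return kwargs


def transcribe(audio_path, use_api=False, api_key=None, base_url=None):
    """Transcribe an audio file locally (default) or through the API."""
    if use_api:
        from openai import OpenAI  # imported lazily; only needed for API use

        client = OpenAI(**api_client_kwargs(api_key, base_url))
        with open(audio_path, "rb") as audio_file:
            result = client.audio.transcriptions.create(
                model="whisper-1",  # assumed model name
                file=audio_file,
            )
        return result.text.strip()

    from faster_whisper import WhisperModel  # imported lazily; local path

    # Model size and compute settings are illustrative assumptions.
    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    return "".join(segment.text for segment in segments).strip()
```

Calling `transcribe("recording.wav")` would use the local model, while `transcribe("recording.wav", use_api=True, api_key="...")` would send the file to the API; `recording.wav` is a made-up file name.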