GitHub repository: https://github.com/Const-me/Whisper
Download page: https://github.com/Const-me/Whisper/releases/tag/1.11.0
On Windows, just download WhisperDesktop.zip from the download page.
I. Introduction:
WhisperDesktop is based on Whisper, the speech-to-text model that OpenAI open-sourced in September 2022. Using AI speech recognition, it can generate text quickly and accurately, and it can also translate the speech while transcribing. Its advantages are that it is free and runs offline on a single machine without uploading any data. The disadvantage of the original Whisper is that it has to be run from a Python command line, which is not very friendly to novice users. That is why many graphical wrapper applications exist, and WhisperDesktop is one of the more convenient ones.
II. How to use:
1. In addition to the WhisperDesktop software, you also need to download a Whisper model from https://huggingface.co/datasets/ggerganov/whisper.cpp/tree/main (the developer recommends ggml-medium.bin for this version). Click the link for the model you want, then click the download button on its page.
2. Unpack the archive and run WhisperDesktop.exe. The first time it runs, you need to select the model file.
3. See the following illustration for the main interface:
4. Audio Capture: the software can also capture audio in real time and turn it into text. See the following illustration for details:
5. Conversion speed depends on your computer's hardware. With a discrete graphics card, converting a 6-minute video generally takes no more than a minute and a half (for reference only).
6. In general, Whisper's recognition accuracy is already very high (above 95%), but it still depends on the model you choose, so test it on your own material. You can also open the generated subtitle file and proofread or correct it yourself.
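To make the proofreading in step 6 a little easier, here is a minimal Python sketch that reads subtitle text and points out entries that probably deserve a second look. It assumes the result was saved in SubRip (.srt) format. The names parse_srt and flag_for_review, the SAMPLE text, and the threshold of 20 characters per second are illustrative choices made for this example; they are not part of WhisperDesktop.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content):
    """Parse SRT text into a list of Cue objects, skipping malformed blocks."""
    # Drop a possible byte-order mark and normalise Windows line endings.
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content):
        lines = block.split("\n")
        if len(lines) < 2:
            continue
        match = TIMING.search(lines[1])
        if not lines[0].strip().isdigit() or match is None:
            continue
        fields = match.groups()
        cues.append(
            Cue(
                index=int(lines[0]),
                start=_seconds(*fields[:4]),
                end=_seconds(*fields[4:]),
                text="\n".join(lines[2:]).strip(),
            )
        )
    return cues


def flag_for_review(cues, max_chars_per_second=20.0):
    """Return cues that are empty or whose text is very dense for their duration."""
    flagged = []
    for cue in cues:
        duration = cue.end - cue.start
        if not cue.text or duration <= 0:
            flagged.append(cue)
        elif len(cue.text) / duration > max_chars_per_second:
            flagged.append(cue)
    return flagged


# Hypothetical subtitle text, used only to demonstrate the functions.
SAMPLE = """1
00:00:00,000 --> 00:00:04,000
Hello and welcome to the show.

2
00:00:04,000 --> 00:00:04,500
This line is far too long to be spoken in half a second.

3
00:00:05,000 --> 00:00:07,000

"""

if __name__ == "__main__":
    for cue in flag_for_review(parse_srt(SAMPLE)):
        print(f"check cue {cue.index} at {cue.start:.1f}s: {cue.text!r}")
```

To try it on a real file, read the file into a string first (for example with open(path, encoding="utf-8-sig").read(), where path is your own subtitle file) and pass that string to parse_srt. The check is only a rough heuristic: it cannot tell whether a word was misrecognized, so it narrows down where to look rather than replacing manual proofreading.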
III. Summary:
WhisperDesktop is free and easy to use, requires no uploads to the cloud, and has no usage restrictions. Combined with its good recognition accuracy and generation speed, it is well worth recommending to anyone who needs speech-to-text.