Transcribing recorded audio and video to text using Whisper AI on a Mac

February 15, 2023

Late last year, OpenAI announced Whisper, a new speech-to-text model that is extremely accurate at transcribing many spoken languages into text. The whisper repository contains instructions for installation and use.

tl;dr:

# Install whisper and its dependencies.
pip3 install git+https://github.com/openai/whisper.git 

# (When needed) Update whisper.
pip3 install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

# Make sure ffmpeg is installed.
brew install ffmpeg

# Transcribe speech into text.
whisper my_audio_file.mp3 --language English

One thing I do quite regularly for my YouTube channel is extract a video's audio track, convert it to text using an online tool (I used to use Welder until they were bought out by Veed), and then hand-edit the file to fix references to product names, people, etc.
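For mistakes that keep recurring, part of that hand-editing could be scripted. Here's a minimal sketch using sed. The file names and the mis-heard phrases are made up for illustration (they aren't from a real transcript), and a real list of fixes would have to be built up by hand over time.

```shell
# Hypothetical sample: two caption cues as a transcription tool might emit them.
cat > captions.srt <<'EOF'
1
00:00:00,000 --> 00:00:04,000
Today I'm testing a raspberry pie in a slider case.

2
00:00:04,000 --> 00:00:08,000
The slider case came from Plink USA.
EOF

# Each -e expression rewrites one recurring mis-hearing.
# The patterns only match words, so cue numbers and timestamps are left alone.
sed -e 's/raspberry pie/Raspberry Pi/g' \
    -e 's/slider case/Sliger case/g' \
    -e 's/Plink USA/PlinkUSA/g' \
    captions.srt > captions.fixed.srt

cat captions.fixed.srt
```

Writing to a second file instead of editing in place keeps the original around for comparison, which helps when a replacement turns out to be too greedy.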

Then I upload either an edited .txt or .srt file alongside my video on YouTube, and people are able to use Closed Captions. YouTube shows whether a video has manually-curated captions with a handy little 'CC' icon.

But as Veed's free tier only allows up to 10 minutes of audio to be transcribed at a time, it was time to look elsewhere. And on my earlier blog post about using macOS's built-in Dictation feature for transcription, rasmi commented that a new tool was available, Whisper.

So I took it for a spin!

I installed it and ran it on one of my videos' audio tracks using the commands at the top of this post, and I was pleasantly surprised.

Experimenting with the different models, base.en was very fast for English, but I found that small or medium were much better at identifying product names, obscure technical terms, etc. Honestly it blew me away that it picked up words like 'PlinkUSA', 'Sliger', and 'Raspberry Pi'—something other transcription tools would trip on.
You can even translate speech in other languages into English text (using --task translate), which is a neat trick. Whisper will automatically identify the source language, or you can specify it with --language.
It's not quite perfect yet—I still need to touch up probably one word every 10 sentences. But it's a thousand times easier than trying to transcribe things manually! And it even does punctuation and outputs an .srt natively.
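As a rough sketch of how those options fit together on the command line, the script below prints two whisper invocations, one for transcription and one for translation into English. The file names, the output directories, and the little run helper are placeholders I've assumed. The flags themselves (--model, --task, --language, --output_format, --output_dir) are the ones listed by whisper --help.

```shell
# Dry-run switch: with 1, commands are only printed. Set to 0 to execute them.
DRY_RUN=1

# Hypothetical helper: echo the command, then run it unless DRY_RUN is 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Transcribe English audio with the small model and write an .srt file.
run whisper my_audio_file.mp3 --model small --language English \
    --output_format srt --output_dir transcripts

# Translate non-English speech into English text (placeholder German recording).
run whisper interview_de.mp3 --model medium --language German --task translate \
    --output_format srt --output_dir translations
```

Separate output directories keep the transcription and the translation from overwriting each other, since whisper names its output after the input file.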

I've been scanning through discussions and there are already some great ones about features like diarization (being able to identify multiple speakers in a conversation) and performance benchmarking.

On my Mac Studio's CPU, the conversion process is only a little slower than real-time. I haven't yet tested it on my PC with a beefier GPU, but I plan on testing that soon.

Whisper is fairly new, so dedicated UIs for it aren't mature yet... but I did find things like whisper-ui, and there's even a Hugging Face webapp, Whisper Webui, that you can use for up to 10 minutes of audio transcription to get a feel for it.

And on macOS, if the command line isn't your thing, Jordi Bruin created an app, MacWhisper, which is free for the standard version and includes a UI for editing the transcription live.

Hopefully more UIs are developed, especially something I could toss on one of my PCs here, so I could quickly throw an audio file at it from any device.

I'm generally a bit conservative when it comes to throwing AI at a problem, but speech to text (and vice-versa) is probably one of the most cut-and-dried uses that makes sense and doesn't carry a number of footguns.
