Video transcripts on a Mac
Whisper runs on the Mac you already own at 5–8x realtime. No API key, no per-minute charges, no audio leaving the laptop.
whisper.cpp is a single binary with Metal acceleration. No Python venv, no dependency hell. Homebrew installs the binary; the model weights come separately.
Install
brew install whisper-cpp ffmpeg
The Homebrew whisper-cli build does not shell out to ffmpeg automatically — it wants a 16 kHz mono WAV directly, so ffmpeg is what gets anything else into the right shape.
Get a model
mkdir -p ~/.whisper-models && cd ~/.whisper-models
curl -L -O https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3.bin
That is ~3 GB and gives the best quality. ggml-large-v3-turbo.bin (~1.6 GB) is faster with marginal quality loss on English, but its drop on non-English content is larger than the headline numbers suggest. For anything mixed-language, stick with large-v3. Other sizes — medium, base, tiny — live at the same URL pattern.
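Since only the size/variant name changes in that URL, a tiny helper makes swapping models painless. This function is hypothetical (not part of whisper.cpp), but the path it builds is the real HuggingFace layout quoted above:

```shell
# Hypothetical helper: every ggml model file lives at the same HuggingFace
# path; only the size/variant name (tiny, base, medium, large-v3, ...) changes.
whisper_model_url() {
  printf 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-%s.bin' "$1"
}
```

Then grabbing the turbo variant is just curl -L -O "$(whisper_model_url large-v3-turbo)".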
Transcribe
Drop this into ~/.zshrc:
transcribe() {
  local input="$1"
  local base="${input%.*}"
  local tmpwav="/tmp/whisper_$$.wav"
  ffmpeg -loglevel error -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$tmpwav" || return 1
  whisper-cli -m ~/.whisper-models/ggml-large-v3.bin \
    -l "${WHISPER_LANG:-auto}" -otxt -osrt -t 4 -pp \
    -of "$base" "$tmpwav"
  rm -f "$tmpwav"
}
Open a new shell; then transcribe video.mp4 drops video.txt and video.srt next to the source. Set the language when you know it (WHISPER_LANG=hu transcribe video.mp4), because auto-detect occasionally misfires on the first few seconds and leaves the rest of the file transcribed in the wrong language.
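For a folder of recordings, a thin batch wrapper saves retyping the command. This is a sketch, and it assumes the transcribe function above is already loaded in the shell; it skips any file that already has a .txt sitting next to it:

```shell
# Sketch: transcribe every .mp4 in a directory, skipping files that already
# have a .txt alongside them. Assumes the transcribe() wrapper is defined.
transcribe_all() {
  local f
  for f in "$1"/*.mp4; do
    [ -e "$f" ] || continue            # empty directory: glob stays literal
    [ -e "${f%.*}.txt" ] && continue   # already transcribed, skip
    transcribe "$f"
  done
}
```

Re-running transcribe_all ~/recordings after adding new files only touches the new ones.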
Threads, counterintuitively
The wrapper pins -t 4. On an M4 Air with 4 performance and 6 efficiency cores, this is faster than -t 10: the E-cores drag down the average. Match the performance-core count of whichever M-series chip you are on.
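On macOS the performance-core count is queryable, so the thread count does not have to be hard-coded per machine. A sketch; the 4-core fallback is an assumption for shells where the sysctl key is absent, not anything whisper.cpp does:

```shell
# Performance-core count on Apple Silicon: hw.perflevel0 is the P-core
# cluster. Falls back to 4 (an assumed default) if the key is unavailable.
perf_cores() {
  sysctl -n hw.perflevel0.physicalcpu 2>/dev/null || echo 4
}
```

Swap -t 4 in the wrapper for -t "$(perf_cores)" and the same function works on any M-series chip.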
Confirm Metal is on
On the first run, look for whisper_backend_init: using Metal backend in the startup log. If it says CPU instead, reinstall with brew reinstall whisper-cpp. The "tensor API disabled for pre-M5" line that comes up on M1–M4 machines is not an error; it just notes a feature only the M5/A19 chips have.
Related
- whisper.cpp on GitHub — the source, the flags, and the docs
- ggml model files on HuggingFace — every model size, every variant
- Robust Speech Recognition via Large-Scale Weak Supervision — the original Whisper paper, if you want to know what the model is actually doing