Video transcripts on a Mac
Whisper runs on the Mac you already own at 5–8x realtime. No API key, no per-minute charges, no audio leaving the laptop.
whisper.cpp is a single binary with Metal acceleration. No Python venv, no dependency hell. Homebrew installs the binary; the model weights come separately.
Install
brew install whisper-cpp ffmpeg
The Homebrew whisper-cli build does not shell out to ffmpeg automatically — it wants a 16 kHz mono WAV directly, so ffmpeg is what gets anything else into the right shape.
Get a model
mkdir -p ~/.whisper-models && cd ~/.whisper-models
curl -L -O https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large-v3.bin
That is ~3 GB and gives the best quality. ggml-large-v3-turbo.bin (~1.6 GB) is faster with marginal quality loss on English, but its drop on non-English content is larger than the headline numbers suggest. For anything mixed-language, stick with large-v3. Other sizes — medium, base, tiny — live at the same URL pattern.
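Since only the size/variant name changes in that URL, a tiny helper makes swapping models painless. This function is hypothetical (not part of whisper.cpp), but the path it builds is the real HuggingFace layout quoted above:

```shell
# Hypothetical helper: every ggml model file lives at the same HuggingFace
# path; only the size/variant name (tiny, base, medium, large-v3, ...) changes.
whisper_model_url() {
  printf 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-%s.bin' "$1"
}
```

Then grabbing the turbo variant is just curl -L -O "$(whisper_model_url large-v3-turbo)".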
Transcribe
Drop this into ~/.zshrc:
transcribe() {
  local input="$1"
  local base="${input%.*}"
  local tmpwav="/tmp/whisper_$$.wav"
  ffmpeg -loglevel error -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$tmpwav" || return 1
  whisper-cli -m ~/.whisper-models/ggml-large-v3.bin \
    -l "${WHISPER_LANG:-auto}" -otxt -osrt -t 4 -pp \
    -of "$base" "$tmpwav"
  rm -f "$tmpwav"
}
Open a new shell; then transcribe video.mp4 drops video.txt and video.srt next to the source. Set the language when you know it (WHISPER_LANG=hu transcribe video.mp4), because auto-detect occasionally misfires on the first few seconds and leaves the rest of the file transcribed in the wrong language.
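For a folder of recordings, a thin batch wrapper saves retyping the command. This is a sketch, and it assumes the transcribe function above is already loaded in the shell; it skips any file that already has a .txt sitting next to it:

```shell
# Sketch: transcribe every .mp4 in a directory, skipping files that already
# have a .txt alongside them. Assumes the transcribe() wrapper is defined.
transcribe_all() {
  local f
  for f in "$1"/*.mp4; do
    [ -e "$f" ] || continue            # empty directory: glob stays literal
    [ -e "${f%.*}.txt" ] && continue   # already transcribed, skip
    transcribe "$f"
  done
}
```

Re-running transcribe_all ~/recordings after adding new files only touches the new ones.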
Threads, counterintuitively
The wrapper pins -t 4. On an M4 Air with 4 performance and 6 efficiency cores, this is faster than -t 10: the E-cores drag down the average. Match the performance-core count of whichever M-series chip you are on.
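On macOS the performance-core count is queryable, so the thread count does not have to be hard-coded per machine. A sketch; the 4-core fallback is an assumption for shells where the sysctl key is absent, not anything whisper.cpp does:

```shell
# Performance-core count on Apple Silicon: hw.perflevel0 is the P-core
# cluster. Falls back to 4 (an assumed default) if the key is unavailable.
perf_cores() {
  sysctl -n hw.perflevel0.physicalcpu 2>/dev/null || echo 4
}
```

Swap -t 4 in the wrapper for -t "$(perf_cores)" and the same function works on any M-series chip.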
Confirm Metal is on
On the first run, look for whisper_backend_init: using Metal backend in the startup log. If it says CPU instead, reinstall with brew reinstall whisper-cpp. The "tensor API disabled for pre-M5" line that comes up on M1–M4 machines is not an error; it just notes a feature only the M5/A19 chips have.
Related
- whisper.cpp on GitHub — the source, the flags, and the docs
- ggml model files on HuggingFace — every model size, every variant
- Robust Speech Recognition via Large-Scale Weak Supervision — the original Whisper paper, if you want to know what the model is actually doing