PDF to Markdown on a Mac
Feeding a PDF straight to an LLM chat app works, but it is wasteful. Most apps fall back to a visual pipeline that rasterises every page and burns tokens describing layout you do not care about. Convert the PDF to Markdown first and the same document becomes searchable, diffable, and cheap to feed into a model.
marker does this conversion with surprisingly good handling of tables, equations, and code blocks. On Apple Silicon it runs locally on the GPU through MPS, so documents never leave the machine. The latest releases have a regression on MPS, so pin to 1.8.0 until issue #960 is resolved.
Install
uv tool install --python 3.12 'marker-pdf==1.8.0' --with psutil
printf '\nexport TORCH_DEVICE=mps\nexport PYTORCH_ENABLE_MPS_FALLBACK=1\n' >> ~/.zshrc
Open a new shell so the environment variables take effect. TORCH_DEVICE=mps tells marker to run its models on PyTorch's Metal (MPS) backend, and PYTORCH_ENABLE_MPS_FALLBACK=1 lets operations that are not yet implemented for MPS fall back to the CPU instead of raising an error.
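Before converting anything, it can be worth confirming that the new shell really picked up both variables. The helper below is a minimal sketch: check_marker_env is a made-up name, not something marker or PyTorch ships, and it only inspects the environment without touching the GPU.

```shell
# Print each variable that the Install step appended to ~/.zshrc, and flag
# any that are missing from the current shell's environment.
check_marker_env() {
  local name value missing=0
  for name in TORCH_DEVICE PYTORCH_ENABLE_MPS_FALLBACK; do
    # printenv only sees exported variables, which is what child processes inherit.
    value=$(printenv "$name" || true)
    if [ -n "$value" ]; then
      printf '%s=%s\n' "$name" "$value"
    else
      printf '%s is not set\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

check_marker_env || echo "Open a new shell, or run: source ~/.zshrc"
```

If either variable is reported as not set, the printf line from the install step probably did not reach ~/.zshrc, or the current shell predates it.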
Convert a single PDF
marker_single input.pdf --output_dir ./out
The output directory will contain a Markdown file, extracted images, and a JSON metadata sidecar.
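To see what a run actually wrote, a small helper can count the files under the output directory by extension. This is a sketch rather than anything marker provides: summarize_output is a hypothetical name, and it makes no assumption about marker's exact file names or image format, only that the files it writes have extensions.

```shell
# Count files under a directory (default ./out), grouped by file extension.
# Files without an extension are skipped.
summarize_output() {
  local dir=${1:-./out}
  find "$dir" -type f \
    | sed -n 's|.*/[^/]*\.\([^./]*\)$|\1|p' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'   # normalise to "extension count"
}
```

Running summarize_output ./out after a conversion should show one line per extension, which makes it easy to spot a run that produced no Markdown file or no images.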
Convert a folder of PDFs
marker ./pdfs --output_dir ./out --workers 4
Tune --workers to taste. On an M-series Mac with plenty of RAM, 4 is a reasonable starting point. Push it much higher and memory pressure will force the machine into swap, which slows every worker down.
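One way to pick a starting value is to derive it from installed RAM. The sketch below is only a heuristic: suggest_workers is a hypothetical helper, and the 4 GB per worker and 4 GB of reserved headroom are assumed defaults, not figures measured for marker. Adjust both after watching memory pressure on a real run.

```shell
# suggest_workers TOTAL_GB [PER_WORKER_GB] [RESERVED_GB]
# Prints a worker count that fits in RAM under the assumed budgets, minimum 1.
suggest_workers() {
  local total_gb=$1
  local per_worker_gb=${2:-4}   # assumption: rough memory budget per worker
  local reserved_gb=${3:-4}     # assumption: headroom for macOS and other apps
  local n=$(( (total_gb - reserved_gb) / per_worker_gb ))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

# On macOS, hw.memsize reports installed RAM in bytes.
if [ "$(uname)" = "Darwin" ]; then
  ram_gb=$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
  echo "Suggested --workers: $(suggest_workers "$ram_gb")"
fi
```

With these defaults a 16 GB machine gets 3 workers and a 32 GB machine gets 7. Treat the result as a ceiling to test against, not a guarantee that the run will stay out of swap.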
Related
- marker on GitHub — the source and the docs
- MPS backend regression — issue #960 — why we pin to 1.8.0
- PyTorch MPS backend docs — what the Metal backend can and cannot do