István

Agent Skills

Your agent does not know how your team works. It does not know how to set up a testing environment, how to conduct a UI review, or how to write a migration script that follows your conventions. A skill is a markdown file that teaches it — loaded into context on demand, only when relevant. That single distinction makes skills fundamentally different from system prompts, and far more powerful.

The system prompt trap

System prompts are loaded into every conversation. Every instruction you add competes for space in the context window — space you need for the actual task. The natural impulse is to keep adding: coding conventions, testing patterns, deployment rules, error handling guidelines. Before long you have a monolithic instruction file that tries to cover everything and covers nothing well.

The agent reads all of it, all the time, whether it needs it or not. Your commit message conventions are taking up context when the agent is writing a database migration. Your deployment checklist is sitting there while the agent fixes a CSS bug. This is waste, and it gets worse as the system prompt grows.

On-demand expertise

Skills solve this the same way lazy loading solves bundle size: don’t pay for what you don’t use. The agent pulls in only the skill it needs for the task at hand. Writing a commit? Load the commit skill. Deploying? Load the deployment skill. Writing tests? Load the testing skill. The rest stays out of the context window.

You can have dozens of skills across completely different domains — branding guidelines, UX patterns, API conventions, low-level debugging playbooks, deployment checklists — and the agent only loads what it needs. A project with thirty skills behaves the same as one with three, because only the relevant skill enters the context.

This means each skill can be thorough. You’re not rationing instructions to fit a budget — you have the full context window to work with. A testing skill can include examples of good tests, patterns to follow, libraries to use, and edge cases to consider. None of that bloats conversations that have nothing to do with testing.

What belongs in a skill

A good skill reads like instructions to a competent developer who knows the tools but not your conventions. Specific enough to produce consistent results, not so prescriptive that it breaks when the context changes.

Scope it to one task. “How we write API endpoints” is a skill. “Everything about our backend” is a system prompt that should have been broken into skills.

Include examples. Show the agent what good output looks like. A commit message skill with three example commits will produce better results than two paragraphs describing the format. Models learn from patterns faster than from rules.

State the non-obvious. Skip what any model already knows. “Use descriptive variable names” wastes tokens. “We prefix all database migration files with the Jira ticket number” — that’s worth including because no model would guess it.

Keep it current. A stale skill is worse than no skill. If it tells the agent to use a deprecated API or a pattern you’ve moved away from, it actively produces wrong output. Version control your skills and update them when conventions change.
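Put together, a minimal skill is just a markdown file with a short description the agent can match against the task. A sketch of what one might look like — the frontmatter fields and the conventions in the body are illustrative, not a fixed schema:

```markdown
---
name: commit-messages
description: How we write commit messages. Use when committing changes.
---

# Commit messages

- Prefix the subject line with the Jira ticket number: `PAY-123: Add retry to webhook handler`
- Subject in imperative mood, 50 characters or fewer.
- Body explains *why*, not *what* — the diff already shows the what.

## Examples

PAY-412: Cap webhook retries at five attempts
PAY-398: Move billing flags behind payments-team sign-off
```

Note how the file leads with examples and the non-obvious ticket-prefix rule, and spends no tokens on advice any model already knows.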

Encoding tribal knowledge

Every team has conventions that exist only in people’s heads. “We never auto-scale past four replicas because the database can’t handle it.” “Feature flags for billing changes need sign-off from the payments team before you flip them.” “The legacy auth service returns 200 with an error body — don’t trust the status code.” This is tribal knowledge — it’s real, it matters, and it’s invisible to anyone who wasn’t in the room when the decision was made.

Skills make it explicit. When you write a skill, you’re extracting knowledge from someone’s head and putting it into a file that both agents and new team members can consume. The skill becomes the canonical source of truth for how a task should be done. No more “ask Sarah, she knows how we handle that.”

This is the real value of skills — not just agent productivity, but organisational knowledge that’s finally written down.

Composability

Skills compose with commands. A command is a predefined prompt you invoke by name — /commit, /test, /deploy. The command triggers the agent, the agent loads the relevant skill, and the skill provides the detailed instructions. You get one-keystroke actions backed by thorough playbooks. And if the agent fails to load a skill automatically, you can always invoke it yourself with a slash command — it is a reliable fallback, not a workaround.
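In Claude Code, for instance, a custom command is a markdown file under `.claude/commands/` whose contents become the prompt — a sketch, assuming a `commit-messages` skill like the one described above exists:

```markdown
<!-- .claude/commands/commit.md — invoked as /commit -->
Stage and commit the current changes. Load the commit-messages skill
and follow its conventions for the subject line and body.
```

The command stays tiny because all the detail lives in the skill it points at.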

The layers work together. Your system prompt carries the essentials — tech stack, project structure, the minimum the agent always needs. Skills carry task-specific depth. Commands make them fast to invoke. Each layer does one thing, nothing overlaps, and the context window stays focused.

The ecosystem

You don’t have to write every skill from scratch. skills.sh is a registry of community-contributed skills you can install with a single command. There are skills for testing, deployment, code review, documentation — the common tasks that most projects share.

But don’t stop at installing. Generic skills produce generic results. The real power comes from customising them for your project — or writing your own from scratch. The best skills are the ones that capture your team’s specific way of working, not a generic best practice that could apply to anyone.

From zero to useful

Here is what this looks like in practice. An empty directory, one skill, and a prompt.

mkdir hn-highlights && cd hn-highlights
npx skills add https://github.com/anthropics/claude-code-playwright --skill playwright-cli
claude

Three commands. The directory has nothing in it except a skill file that teaches the agent how to drive a browser. You open Claude Code and ask it to scrape Hacker News for today’s highlights.

❯ Scrape Hacker News, look at the top articles and their top comment
threads, and give me a summary of the highlights.
⏺ Plan: Scrape and Summarize Hacker News Top Articles
Using the playwright-cli skill (already installed in the project), I will:
1. Open browser and navigate to HN
2. Capture the front page snapshot
3. Visit top comment threads
4. Summarize
5. Close browser
...
⏺ Bash(playwright-cli open https://news.ycombinator.com/)
⎿ Browser opened with pid 31344.
⏺ I have the front page. Now let me visit the top comment threads for
the most popular articles.
⏺ 5 Bash agents finished
├─ Scrape HN comments thread 1 · 12 tool uses · 23.5k tokens
├─ Scrape HN comments thread 2 · 30 tool uses · 32.6k tokens
├─ Scrape HN comments thread 3 · 18 tool uses · 24.3k tokens
├─ Scrape HN comments thread 4 · 16 tool uses · 15.3k tokens
└─ Scrape HN comments thread 5 · 14 tool uses · 21.8k tokens
⏺ Bash(playwright-cli close)
⎿ Browser 'default' closed
⏺ Here's your Hacker News Highlights for February 21, 2026:
1. Keep Android Open (1661 pts, 597 comments)
...
8. Andrej Karpathy Talks About "Claws" (127 pts, 174 comments)
...
✻ Brewed for 4m 42s

The agent read the skill, opened a browser, scraped the front page, visited comment threads in parallel, and produced a digest. It knew how to drive Playwright because the skill told it — not because we described every step in the prompt.

Then we ask it to make the process repeatable.

❯ Make this process easier next time, write a prompt where you instruct
yourself along with some helper scripts written in TypeScript / Bun.
...
⏺ Instead of the browser-based approach (which required 5 parallel
subagents and ~3 minutes), you just say "get me HN highlights" and
I'll:
1. bun src/scrape.ts — Fetches everything from the HN API in ~3
seconds (no browser needed)
2. bun src/format.ts — Produces a raw markdown digest
3. Read & summarize — I read the digest and write a polished summary
✻ Brewed for 5m 23s

The agent did not just automate the same approach. It found the Hacker News Firebase API, realised it was faster and more reliable than driving a browser, and rewrote the workflow around it. It created TypeScript scripts, a CLAUDE.md with instructions for future sessions, and pre-approved the shell commands so next time there are no permission prompts.
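The generated scripts themselves aren't shown above, but the core of a scrape step like this can be sketched against the public HN Firebase API (`topstories.json` and `item/<id>.json` are its real endpoints; the function names and digest format here are illustrative):

```typescript
// Minimal sketch of an HN "scrape" step over the Firebase API — no browser needed.
const API = "https://hacker-news.firebaseio.com/v0";

interface Item {
  id: number;
  title?: string;
  score?: number;
  descendants?: number; // total comment count
}

// Pure formatting: one digest line per story, e.g.
// "1. Keep Android Open (1661 pts, 597 comments)"
export function formatStory(rank: number, item: Item): string {
  return `${rank}. ${item.title} (${item.score} pts, ${item.descendants} comments)`;
}

// Fetch the top N stories in parallel.
export async function topStories(n = 5): Promise<Item[]> {
  const ids: number[] = await (await fetch(`${API}/topstories.json`)).json();
  return Promise.all(
    ids.slice(0, n).map(async (id) =>
      (await fetch(`${API}/item/${id}.json`)).json(),
    ),
  );
}

// Run directly: bun src/scrape.ts
if ((import.meta as any).main) {
  const stories = await topStories();
  console.log(stories.map((s, i) => formatStory(i + 1, s)).join("\n"));
}
```

Against the ~3 minutes of browser-driving above, a handful of JSON fetches like this finishes in seconds — which is exactly the trade-off the agent spotted.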

Now, opening Claude Code in that directory and saying “get me HN highlights” just works. One sentence. No setup. No explanation.

That is the compound effect: a skill taught the agent a capability, the agent found a better path on its own, built infrastructure around it, and made future sessions faster. Each layer builds on the last.

The meta-skill

Writing good skills is itself a skill worth developing. It’s a form of technical writing — clear, structured, example-driven. The better your skills, the better your agent output, across every session, for every developer on the team. Time spent here compounds faster than almost anything else in agentic coding.

Use the agent to help write them. Have it draft a skill based on how you just completed a task, then refine it. The model knows what format works best for itself — it’s the ideal co-author for its own instructions.

