Spec Driven Development
Your coding agent is a monkey with an AK-47. It runs around, shoots things, sometimes hits the correct target — just so it can forget what the target was in the next session.
Where it starts falling apart
Most people start the same way: tell the agent to do something and let it run. Sometimes it works. It fixes the bug, adds the feature, cleans up the tests. That first success teaches you the wrong lesson. It makes the workflow feel reliable when it is often a coin flip.
```
Fix the billing cancellation flow and make it better.
```
That prompt sounds reasonable, but it does not say what “better” means. The agent will still act confident. It will produce files and diffs, but that does not mean it understood the task.
And once it goes wrong, the follow-up prompt is usually worse. You are no longer asking the agent to solve the original problem. You are asking it to recover from its own misunderstanding, on top of code it already changed. That is how you end up with a messy repo and no clear line between the original task and the damage control.
A common example: on day one you ask the agent to add address fields to the payment form because the bank wants more information for fraud prevention. On day two you realise the full address is not needed, so you ask the agent to remove it again. The agent removes the postcode field as well, because it sees postcode as part of the address. It has no idea that in the UK the postcode was the one field still needed for anti-fraud checks.
```
Day 1:  "Add address fields to the payment form for fraud prevention."
Day 2:  "We don't need the full address after all. Remove the address fields."

What the agent hears:  "Delete everything that looks like address data."
What you meant:        "Keep the postcode field. Remove the rest."
```
And the problem does not stop at today's mistake. A year later you may not remember why the postcode field survived while the rest of the address was removed. If you are no longer on the project, your replacement definitely will not know. Without a spec, that remaining field just looks arbitrary, and the next cleanup pass will be another coin flip.
From prompt to plan to spec
Prompt
```
Add a self-serve cancel subscription button in billing settings.
```
This feels complete. It is not. What happens to the current billing period? Does the user get a confirmation step? What about already-cancelled subscriptions? The prompt does not say, and the agent will not ask. That gap is what leads to plan mode.
Plan mode
Nobody is writing code yet. The agent is in a feedback loop: it asks clarifying questions, surfaces assumptions, and proposes steps — and the only way out is a plan you accept. A decent output for the cancellation example might look like this:
```
Questions
- What happens to the remaining billing period after cancellation?
- Is an explicit confirmation step required?
- Should the user receive an email confirmation?
- What should the UI show for already-cancelled subscriptions?

Plan
1. Inspect the billing settings page and subscription lifecycle
2. Add a cancel action with an explicit confirmation step
3. Update the backend to mark the subscription to end at period end
4. Show the effective end date in the UI
5. Send a cancellation confirmation email
6. Add tests for active and inactive subscription states
```
Once those assumptions are exposed, you can correct them before the agent touches the codebase. That is why plan mode works: both you and the agent are focused on one thing — understanding what needs to be done.
But once the session ends, those insights are gone. The next agent can see what its predecessor did — the code is right there — but it will not know why. The reasoning, the trade-offs, the rejected alternatives: all trapped in a chat transcript that no future session will read.
The spec
What if that plan did not get binned with the session? What if it was saved somewhere your agent could recall it in a future conversation? That is the spec: the approved plan, extracted from the chat and stored as a durable artifact. Tooling varies, but the shape stays the same. Here is a minimal example in the style of OpenSpec.
```markdown
## Why
Users can only cancel subscriptions by emailing support. This is slow,
inconsistent, and creates avoidable manual work.

## What Changes
- Add a self-serve cancel action in billing settings
- Require an explicit confirmation step before cancellation
- Send a cancellation confirmation email

## Impact
- Affected specs: billing
- Affected code: billing UI, subscriptions API, email worker

## ADDED Requirements

### Requirement: Subscription Cancellation
The system SHALL allow a signed-in customer to cancel an active subscription
from the billing settings page.

#### Scenario: Successful cancellation
- **WHEN** the customer confirms cancellation for an active subscription
- **THEN** the subscription is marked to cancel at the end of the billing period
- **AND** the UI shows the effective end date
- **AND** a confirmation email is sent

#### Scenario: No active subscription
- **WHEN** the customer opens billing settings without an active subscription
- **THEN** the cancellation action is not shown
```
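One practical property of scenarios written this way is that each one maps almost one-to-one onto an acceptance test. A minimal Python sketch, with every name hypothetical:

```python
from datetime import date

def confirm_cancellation(status: str, period_end: date):
    """Hypothetical handler: returns (effective end date, emails queued)."""
    if status != "active":
        return None, []                           # nothing to cancel
    return period_end, ["cancellation_confirmation"]

# Scenario: Successful cancellation
end, emails = confirm_cancellation("active", date(2025, 6, 30))
assert end == date(2025, 6, 30)                   # ends at period end, not now
assert "cancellation_confirmation" in emails      # confirmation email queued

# Scenario: No active subscription
end, emails = confirm_cancellation("cancelled", date(2025, 6, 30))
assert end is None and emails == []               # no cancel action to take
```

If a scenario cannot be turned into a check like this, that is usually a sign the requirement is still too vague.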
That is the whole ladder: prompt, plan, spec. The important step is the extraction. If the plan stays trapped in the conversation, you do not have spec driven development — you just have a longer prompt. Once it is written down, a reviewer can challenge scope, find missing edge cases, and decide whether the change should exist at all, before code is written.
How information flows
- Prompt alone: one shot, no review. Intent dies with the session.
- Plan mode: a feedback loop improves the plan, but the plan still dies with the session.
- Spec: the plan persists. A new agent reads it and starts with full context.
What goes in the spec
A good spec answers four questions:
- What problem are we solving?
- What is in scope, and what is not?
- How do we know the work is correct?
- What parts of the system will change?
In practice that usually means a short why, a concrete list of changes, and explicit acceptance criteria. If the reviewer has to infer behaviour from implementation details, the spec is too vague. Leave out incidental implementation details, verbatim chat logs, and vague goals. The spec describes behaviour and acceptance, not how you arrived there. You do not need special tooling for this — plain markdown files in a repo work fine. Tools like OpenSpec add structure, but the hard part is deciding that conversations are drafts and specs are the artifact.
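To show how little tooling this needs: a few lines of Python can lint a directory of markdown specs for the sections the four questions imply. The `specs/` layout and the required headings here are assumptions modelled on the OpenSpec-style example earlier, not a standard.

```python
from pathlib import Path

# Headings an OpenSpec-style markdown spec is assumed to carry.
REQUIRED = ["## Why", "## What Changes", "## Impact"]

def missing_sections(spec_text: str) -> list[str]:
    """Return the required headings this spec text does not contain."""
    return [h for h in REQUIRED if h not in spec_text]

def lint_specs(spec_dir: str) -> dict[str, list[str]]:
    """Map each markdown spec file in spec_dir to its missing headings."""
    return {
        p.name: missing_sections(p.read_text())
        for p in Path(spec_dir).glob("*.md")
    }

assert missing_sections("## Why\n## What Changes\n## Impact\n") == []
assert missing_sections("## Why\ntext") == ["## What Changes", "## Impact"]
```

Running `lint_specs("specs/")` in CI is one cheap way to keep specs from quietly degenerating back into prose notes.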
Review the spec, not the chat
Reviewers should not have to reconstruct intent from a diff or a conversation transcript. The spec is the contract. If the code and the spec disagree, one of them is wrong — and that gives you a clean place to resolve ambiguity. It also makes agents safer to use: point a fresh session at the approved spec and it starts clean, with no stale assumptions or inherited confusion.
That usefulness does not expire when the feature ships. A year later you can still query the spec directly — what is the system supposed to do, which scenarios matter, does the code still match the requirement. If the implementation has drifted, you know you have a bug, an undocumented change, or a spec that needs updating.
```
"What does the payment form spec say about postcode?"
"Check whether the current implementation still matches the cancellation spec."
```
It does not stop at specs
Skills fill two gaps that specs and code alone cannot cover. The first is tribal knowledge: conventions, architectural decisions, and preferences that live in your team's heads but never make it into the codebase. A skill can encode those rules so the agent follows them without being told every session.
The second is keeping the model's existing knowledge current. A skill can instruct the agent to read the latest documentation for a framework or library before writing any code, instead of relying on whatever it absorbed at training time. Specs say what should happen; skills say how to verify and which sources to trust. For more on skills, see the agentic coding survival guide.
What the agent knows
- Model training data: broad but frozen at a cutoff date — goes stale.
- The codebase: shows what exists, but not why it was built that way.
- Skills: encode tribal knowledge and force the agent to read the latest docs before acting.
- Specs: the durable intent. Shows what should exist, why, and how to verify it.
The payoff
Every spec you write makes the agent's understanding of the project deeper. Over time, the accumulated specs mean the agent knows more about intent, constraints, and history than any single person on the team. People leave, context fades, institutional memory erodes — but the specs stay. You are not just improving handoffs. You are building a system that understands the project better than the people who built it. Whether that excites you or unsettles you probably depends on how many specs you have written.