István

Agentic Coding Survival Guide

The model

An LLM predicts the next token in a sequence of tokens. That is all it does. Everything else — planning, tool use, “reasoning” — is emergent behaviour from next-token prediction. Pick the latest Claude model.

Context window

The maximum length of text the model can see at once. Every message, file, and tool result competes for space. Longer context means higher cost and worse performance.
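To make the trade-off concrete, here is a rough sketch of context budgeting, assuming the common rule of thumb of roughly four characters per token (a real harness would use the model's actual tokenizer):

```typescript
// Rough sketch: fitting messages into a finite context window.
// The 4-chars-per-token ratio is a rule of thumb, not a real tokenizer.
interface Message { role: string; content: string }

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Keep the newest messages that fit the budget; drop the oldest first.
function fitToWindow(messages: Message[], budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (const msg of [...messages].reverse()) {
    const cost = estimateTokens(msg.content);
    if (used + cost > budget) break;
    kept.unshift(msg);
    used += cost;
  }
  return kept;
}
```

Every file you attach and every tool result the agent pulls in spends from the same budget, which is why trimming what the model sees matters.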

Context rot

As a conversation grows, the context fills with old mistakes, abandoned approaches, and redundant information. The model treats all of it as relevant, so its responses get worse over time — not better. This is context rot. The fix is simple: start a new session. Carry over what matters in your first message, leave the rest behind.

Statelessness

The model has no memory between sessions. Every conversation starts from zero. Everything it knows comes from the context window — your messages, tool results, and system prompts. It does not “learn” your codebase over time. This is why system prompts and specs matter: they are the only way to carry knowledge forward.

Harness

The program that runs your model and manages interaction between the model and the user. Claude Code, Cursor, Windsurf — these are harnesses. The model is interchangeable; the harness defines your workflow.

Tools

Functions the model can call to interact with the outside world. Without tools, a model can only produce text. With tools, it can act.

  • read_file(path) — read a file
  • write_file(path, content) — create or overwrite a file
  • run_command(cmd) — execute a shell command
  • search(query) — search the codebase

The model predicts that the next tokens should be a tool call. The harness intercepts that, executes the function, and injects the result back into the context. That loop is the entirety of agentic behaviour.

[user]
What does the User model look like?
[assistant]
I'll check the User model.
<tool_call>
read_file
{"path": "src/models/user.ts"}
</tool_call>
[tool_result]
export interface User {
  id: string;
  email: string;
  name: string;
}
[assistant]
The User model has 3 fields: id, email, and name.
There is no password hash — authentication is handled elsewhere.

The model produces the <tool_call> tokens, the harness strips them out, executes the function, injects the result, and the model continues predicting from there.
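That loop can be sketched in a few lines. Everything below is illustrative — `callModel` stands in for a real LLM API, and the tool bodies are stubs — but the shape is the whole of agentic behaviour:

```typescript
// Minimal sketch of the tool loop: model output is either text (done)
// or a tool call (execute, inject result, go round again).
type ToolCall = { tool: string; args: Record<string, string> };
type ModelOutput = { text?: string; toolCall?: ToolCall };

// Stubbed tool registry, mirroring the tool list in this guide.
const tools: Record<string, (args: Record<string, string>) => string> = {
  read_file: ({ path }) => `// contents of ${path}`,
  run_command: ({ cmd }) => `ran: ${cmd}`,
};

function runAgent(
  callModel: (context: string) => ModelOutput,
  userPrompt: string,
): string {
  let context = `[user] ${userPrompt}`;
  for (let step = 0; step < 10; step++) {      // cap the loop
    const out = callModel(context);
    if (out.toolCall) {
      const fn = tools[out.toolCall.tool];
      if (!fn) return `unknown tool: ${out.toolCall.tool}`;
      // The harness intercepts the call, runs it, injects the result.
      context += `\n[tool_result] ${fn(out.toolCall.args)}`;
      continue;
    }
    return out.text ?? "";                     // plain text ends the loop
  }
  return "(step limit reached)";
}
```

Note that the model never executes anything itself. It only emits tokens; the harness does the running.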

Modes

Preset configurations that control which tools the agent can access.

Ask — read-only. The model can read files and answer questions but cannot modify anything. Use when you need to understand code, not change it.

Agent — full access. The model can read, write, and execute commands. Use when the task is well-defined and the model knows exactly what to do.

Plan — read-only until a plan is approved, then switches to write mode. The right default for anything non-trivial. The plan becomes part of the context, making subsequent actions more coherent.
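Under the hood, a mode is little more than a tool allowlist. A sketch, with illustrative names:

```typescript
// Modes as tool-permission presets. Tool names mirror the list above;
// the preset names are illustrative, not any harness's exact config.
type Tool = "read_file" | "search" | "write_file" | "run_command";

const modes: Record<string, Tool[]> = {
  ask:   ["read_file", "search"],
  plan:  ["read_file", "search"],   // widens once the plan is approved
  agent: ["read_file", "search", "write_file", "run_command"],
};

const canUse = (mode: keyof typeof modes, tool: Tool): boolean =>
  modes[mode].includes(tool);
```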

Prompting

The quality of your output is bounded by the quality of your input. Vague prompts get vague results. Good prompts are specific, constrained, and include examples when the expected output is not obvious.

Bad: “Make the code better.”
Good: “The auth middleware tests in src/middleware/__tests__/auth.test.ts are failing because the mock for jwt.verify does not match the new function signature. Update the mock to match.”

Context matters — “fix the tests” is a fine prompt if the model just broke them, because the context already has everything it needs. It is a bad prompt in a fresh session with no prior context. The rule is: the less the model already knows, the more you need to say.

References

Most harnesses let you attach context directly to your prompt — files, code selections, images, URLs. Use them. Pointing the model at the exact file is faster and more reliable than describing where something is.

You can reference a whole file, a highlighted selection, or paste in a screenshot of a design or error message. But beware: models do not see images the way we do. They interpret them as tokens, which means they can miss layout, spacing, and visual hierarchy that would be obvious to you. Use images as a supplement, not as your only input.

System prompts

Instructions that are injected at the start of every conversation. Most harnesses support this — Claude Code uses CLAUDE.md files, Cursor uses .cursorrules. This is where you put things the model should always know: project conventions, tech stack, file structure, testing patterns.

Think of it as onboarding a new developer. The system prompt is the README they read on day one. Without it, the model guesses. With it, the model follows your conventions from the first message.
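A sketch of what such a file might contain. Every project detail below is an invented placeholder, not a prescribed format:

```markdown
# CLAUDE.md (example — details are project-specific)

## Stack
- TypeScript, Node 20, Express, PostgreSQL

## Conventions
- Tests live next to the code in `__tests__/` folders.
- Never edit generated files under `src/generated/`.
- Run the linter before declaring a task done.
```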

Skills

Reusable instruction files that teach the agent how to do a specific task the way you want it done. “How we write API endpoints,” “how we structure tests,” “how we deploy.”

Why not just put everything in the system prompt? Context window. The system prompt is loaded into every conversation — the bigger it gets, the less room you have for actual work. Skills are loaded on demand. The agent pulls in only the skill it needs for the task at hand, keeping the context focused and the window free.
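What a skill file might look like. Claude Code, for example, stores skills as markdown files with a short frontmatter description the agent uses to decide when to load the rest; the content below is an invented placeholder:

```markdown
---
name: api-endpoint
description: How we write REST API endpoints. Use when adding or changing routes.
---

1. Define the route in `src/routes/`, one file per resource.
2. Validate input with a schema before touching the database.
3. Add an integration test that hits the route end to end.
```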

Commands

Predefined prompts you can invoke by name. Instead of typing the same instruction every time, you write it once and trigger it with a shortcut. “Run the linter and fix what it finds,” “write a test for this file,” “review this diff.” Commands are prompts that got promoted to a workflow. A command can reference a skill, so you get a one-keystroke action backed by a detailed playbook.

Approval flow

Most harnesses let you choose between confirming each action or letting the agent run autonomously. Auto-approve is faster but riskier — the model can delete files, run destructive commands, or go down a rabbit hole burning tokens on a wrong approach.

A sensible default: auto-approve read operations, confirm writes and shell commands. As you build trust with a specific task, open it up. If the model starts looping or producing nonsense, that is context rot — stop it, start fresh.
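The default above is a one-line policy. A sketch, with illustrative tool names:

```typescript
// Auto-approve reads, hold writes and shell commands for a human.
// Tool names are illustrative, not any harness's exact vocabulary.
const READ_ONLY = new Set(["read_file", "search"]);

function needsApproval(tool: string): boolean {
  return !READ_ONLY.has(tool);
}
```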

Checkpoints

Commit before you let the agent loose. If it makes a mess, git checkout . restores tracked files in seconds (add git clean -fd to sweep away any new files it created). If it does something surprisingly good, you have a clean diff to review. Treat agent sessions like spikes — cheap to try, cheap to throw away.

Spec Driven Development

Take your output from Plan mode and save it to a file that you can add to version control. Now you have a specification that any agent — or any developer — can pick up and implement. The spec survives context rot, works across sessions, and becomes the source of truth for what should be built. Review the spec, not the conversation.
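A spec can be as simple as a checked-in markdown file. The skeleton below is illustrative, not a required format:

```markdown
# Spec: password reset flow

## Goal
Users can reset a forgotten password via an emailed link.

## Constraints
- Tokens expire after 30 minutes and are single-use.
- Reuse the existing mailer; do not add a new dependency.

## Out of scope
- Changing the login UI.

## Done when
- The test suite passes, including new tests for token expiry.
```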

The meta game

Your system prompts, skills, commands, and specs are all text that one AI session produces and another AI session consumes. Use the model to write the instructions that future sessions will follow. It is cheaper and faster than writing them by hand, and the model knows what format works best for itself.

Warnings

Model eagerness. The model wants to succeed. Not in a conscious way — it predicts tokens that look like progress, which means it is biased toward declaring victory. “10 passed, 1 failed” becomes “tests are passing” in the model’s summary. It will gloss over errors, skip edge cases, and tell you the task is done when it is not. In chat sessions, this gets worse — the model is biased toward your point of view and the way you frame the question. If you ask “should I use Redis here?” the model will find reasons to say yes. Read the actual output, not the model’s interpretation of it.

Don’t ship code you don’t understand. The agent writes it, you own it. If you cannot explain why something works, you cannot debug it when it breaks. And it will break.

Abandon bad approaches early. Context rot, wrong direction, model going in circles — recognise when a session is cooked and start fresh. It is cheap to try again. Cheaper than nursing a broken conversation for another 20 messages. Sunk cost fallacy kills more agent sessions than bad prompts.

Know when to take over. Sometimes the agent is burning tokens on something you could fix in 30 seconds. Grab the keyboard. The goal is shipping, not proving the agent can do everything.

Automated testing is no longer optional. When an agent writes your code, tests are how you verify it actually works. The good news is the agent can write those too. Get your test coverage up before you start handing implementation work to the model — it gives you a safety net and the agent a success signal.

Take care of yourself. Agentic coding makes everything feel fast and cheap. You will be tempted to take on more, ship more, automate more. Just because it looks easy does not mean it is — the cognitive load of reviewing, directing, and course-correcting an agent is real work. Burnout does not care that a model wrote the code for you.