Context Engineering Setup

Use this when an AI coding agent keeps reading the wrong files, missing conventions, or losing the plot across sessions in a real codebase. This is not prompt engineering. It is engineering the whole information environment the agent operates in: the repo layout it navigates, the instruction file it reads, the tools and MCP servers it can reach, what gets retrieved into the window, and what persists between sessions. If you only need a one-off setup for a single Claude project chat, use /claude-project-setup. If you are scoping one feature for an agent to build, use /spec-driven-feature.

Related skills: /spec-driven-feature scopes the work an agent will do. /claude-project-setup stands up the project shell. /vibe-coding-stack picks the local toolchain. /mcp-integration-plan wires external tools and servers.

The hard part most teams miss

The context window is the constraint, not the codebase. A team that treats "give the agent more context" as the goal is optimizing the wrong thing.

Context is curation, not accumulation. More context is not automatically better; past the point of relevance it is worse. Every file, doc, and instruction you load competes for a finite window and pushes the signal the agent actually needs further from its attention. The skill is deciding what to leave out. A lean environment the agent can fully use beats a complete one it has to wade through.
Feature-driven layout beats layer-driven, because the agent reads only the directory it touches. A layer-driven tree (controllers/, services/, models/) scatters one feature across every layer folder, so an agent working on billing must search the whole repo to find the handful of files that matter. A feature-driven tree (src/auth/, src/billing/) co-locates everything for billing in one directory that the agent can read in full while ignoring the rest. Layout is a context decision.
The instruction file is a public good competing for the window. CLAUDE.md / AGENTS.md is loaded on every turn, so every line is rent paid out of the budget on every single request. The test for any line is "does the model already know this?" If yes, cut it. The file is for what is specific to this repo (commands, conventions, guardrails), never for what a competent model already knows about the language or framework.

Process

Step 1: Gather inputs

Ask the user:

What is the repo and what does it do? ({{repo_name}} and a one-line purpose. The domain, not the stack.)
What is the current layout? (Layer-driven, feature-driven, flat, a monorepo with packages. The output of tree -L 2 is ideal.)
What does the agent get wrong today? (Reads the wrong files, ignores conventions, breaks the build, forgets context between sessions. Concrete failures, not "it is unreliable.")
What conventions are non-obvious? (Test command, lint command, naming rules, the one library you must use, the thing that breaks if done wrong.)
What tools and external systems must it reach? (Test runner, a database, an API, a deploy step, anything behind an MCP server.)
How many sessions does a typical task span? (One sitting, or does work hand off across days. This decides how much memory and persistence matter.)

Step 2: Audit the current environment

Establish the baseline before changing anything:

Layout shape. Is it layer-driven or feature-driven? How many directories must an agent read to change one feature? If the answer is more than two, layout is a cost center.
Instruction file. Read the existing CLAUDE.md / AGENTS.md. Flag every line that fails the "model already knows this" test, and every convention that is missing.
Tool reach. What can the agent run today, and what does it have to ask a human to do? Each manual step is context the agent cannot act on.
Persistence. Is there any memory across sessions, or does each one start cold? Map where prior decisions live, if anywhere.
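The layout question above ("how many directories must an agent read to change one feature?") can be turned into a number. The following is a minimal sketch, assuming you can list the files that belong to each feature by hand or from a recent change set. The file paths and the function names are hypothetical, and the threshold of two comes from the audit rule above.

```python
from pathlib import PurePosixPath


def directories_per_feature(feature_files: dict[str, list[str]]) -> dict[str, int]:
    """Count the distinct parent directories an agent must open per feature."""
    counts = {}
    for feature, paths in feature_files.items():
        parents = {str(PurePosixPath(p).parent) for p in paths}
        counts[feature] = len(parents)
    return counts


def layout_cost_centers(feature_files: dict[str, list[str]], threshold: int = 2) -> list[str]:
    """Return the features that span more directories than the threshold."""
    counts = directories_per_feature(feature_files)
    return sorted(name for name, n in counts.items() if n > threshold)


# Hypothetical file lists for the same feature under two layouts.
layer_driven = {
    "billing": [
        "controllers/billing_controller.ts",
        "services/billing_service.ts",
        "models/invoice.ts",
        "tests/billing_test.ts",
    ],
}
feature_driven = {
    "billing": [
        "src/billing/controller.ts",
        "src/billing/service.ts",
        "src/billing/invoice.ts",
        "src/billing/billing.test.ts",
    ],
}

print(directories_per_feature(layer_driven))    # {'billing': 4}
print(directories_per_feature(feature_driven))  # {'billing': 1}
print(layout_cost_centers(layer_driven))        # ['billing']
```

Running this on a few representative features gives the audit a baseline to compare against after the layout changes.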

Step 3: Decide the repo layout

Default to feature-driven for agent-heavy work. Group by domain (src/auth/, src/billing/, src/reporting/) so an agent loads one directory and gets the whole vertical slice. The win is that it reads less and touches less.
Co-locate what changes together. Tests, types, and helpers for a feature live with the feature, not in a parallel tests/ mirror the agent has to cross-reference.
Name for discoverability. Directory and file names are the agent's primary retrieval signal. billing/refunds.ts is findable; utils/helpers2.ts is not. Boring, literal names beat clever ones.
Do not refactor the whole repo on day one. Note the target shape, and migrate feature by feature as work touches each area. A half-migrated tree with a clear direction beats a big-bang move that stalls.
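The naming rule above can be spot-checked mechanically. This is a rough sketch, assuming a hand-picked set of generic file stems. The set GENERIC_STEMS and the function is_vague_name are illustrative, not a standard list or tool.

```python
import re
from pathlib import PurePosixPath

# Hypothetical list of stems that say nothing about the domain.
GENERIC_STEMS = {"utils", "helpers", "misc", "common", "stuff", "temp"}


def is_vague_name(path: str) -> bool:
    """Flag files whose name (ignoring a trailing number) is a generic stem."""
    stem = PurePosixPath(path).stem.lower()
    base = re.sub(r"\d+$", "", stem)  # helpers2 -> helpers
    return base in GENERIC_STEMS


paths = ["billing/refunds.ts", "utils/helpers2.ts", "auth/session.ts"]
vague = [p for p in paths if is_vague_name(p)]
print(vague)  # ['utils/helpers2.ts']
```

A check like this only catches the obvious offenders; whether a name is findable still depends on the domain vocabulary the team actually uses.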

Step 4: Write the instruction file

The instruction file (CLAUDE.md or AGENTS.md) earns its place line by line.

Include what is specific to this repo and changes how the agent acts:

Commands. Exact test, lint, build, and run commands. The single most valuable lines in the file.
Conventions. Naming, structure, the one library that must be used, the pattern to copy.
Guardrails. What never to touch, what always to gate, the irreversible action that needs a human.

Exclude what fails the test:

Anything a competent model already knows about the language, framework, or general practice.
Long prose explanations. The agent reads this on every turn; write it as terse rules, not an essay.
Aspirational standards nobody follows. The file describes the repo as it is, not as you wish it were.
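To make the "earns its place line by line" rule concrete, it can help to measure the file. The sketch below assumes a crude heuristic of about four characters per token, which is an approximation and not a real tokenizer, and it treats long lines as a hint that prose has crept in. The sample commands (make test, make lint) are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the 4 chars/token ratio is an assumption."""
    return round(len(text) / chars_per_token)


def audit_instruction_file(text: str, max_line_chars: int = 120) -> dict:
    """Summarize size and flag long lines that probably read as prose, not rules."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    long_lines = [ln for ln in lines if len(ln) > max_line_chars]
    return {
        "lines": len(lines),
        "approx_tokens": estimate_tokens(text),
        "long_lines": long_lines,
    }


# Hypothetical instruction file: two terse command lines and one long prose line.
sample = "Test: make test\nLint: make lint\n\n" + "x" * 130
report = audit_instruction_file(sample)
print(report["lines"], len(report["long_lines"]))  # 3 1
```

The token figure is only useful for comparing versions of the same file; for real budgets, use the tokenizer of the model you run.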

Step 5: Select skills, tools, and MCP wiring

Tools the agent can run beat steps it must ask for. Every capability you wire in (test runner, formatter, DB query) is one less round-trip through a human and one less thing to explain in the instruction file.
Gate the irreversible. Reads run freely; writes, deploys, and external sends sit behind confirmation or an allowlist.
Wire external systems through MCP deliberately. Each connected server adds reach, and it also adds context the agent must reason over. Wire only what a task actually needs, and scope the work with /mcp-integration-plan.
Prefer a few sharp tools over a broad pile. A large tool surface dilutes selection the same way a large context does.
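The gating rule above ("reads run freely; writes, deploys, and external sends sit behind confirmation or an allowlist") can be expressed as a small policy function. This is a sketch under assumptions: the Effect categories mirror the wording above, while the tool names and the ALLOWLIST contents are hypothetical and would come from your own agent harness.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    READ = "read"
    WRITE = "write"
    DEPLOY = "deploy"
    EXTERNAL_SEND = "external_send"


@dataclass(frozen=True)
class ToolCall:
    name: str
    effect: Effect


# Hypothetical allowlist: non-read tools trusted to run without confirmation.
ALLOWLIST = frozenset({"format_code"})


def decide(call: ToolCall, allowlist: frozenset = ALLOWLIST) -> str:
    """Return 'run' for reads and allowlisted tools, 'confirm' for everything else."""
    if call.effect is Effect.READ:
        return "run"
    if call.name in allowlist:
        return "run"
    return "confirm"


print(decide(ToolCall("run_tests", Effect.READ)))      # run
print(decide(ToolCall("format_code", Effect.WRITE)))   # run
print(decide(ToolCall("deploy_prod", Effect.DEPLOY)))  # confirm
```

Defaulting unknown non-read actions to "confirm" keeps the policy safe when a new tool is wired in before anyone classifies it.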

Step 6: Plan retrieval, memory, and persistence

Retrieval is a budget, not a buffet. Decide what gets pulled into the window by default versus on demand. The default set should be the smallest that lets the agent start, not everything that might help.
Exclude noise explicitly. Lockfiles, build output, generated code, and vendored deps should stay out of the agent's retrieval path (for example, through an ignore file) so they never crowd out signal.
Persist decisions, not transcripts. For multi-session work, write durable decisions and state to a known location (a memory file, an ADR, a continuation prompt), not the raw chat. The next session reads the conclusion, not the conversation.
Make persistence discoverable. Name and place memory where the agent will actually look. Memory the agent does not find is wasted budget.
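Excluding noise from retrieval can be sketched as a pattern filter. The example below is a simplification that uses shell-style matching from Python's fnmatch module, not full gitignore semantics: a pattern such as dist/* only matches paths that start at the repo root, and * also matches across slashes. The pattern list and file paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical exclusion patterns for lockfiles, build output, and vendored code.
EXCLUDE_PATTERNS = [
    "*.lock",
    "package-lock.json",
    "dist/*",
    "build/*",
    "vendor/*",
    "node_modules/*",
]


def is_excluded(path: str, patterns: list[str] = EXCLUDE_PATTERNS) -> bool:
    """True if the repo-relative path matches any exclusion pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def retrieval_candidates(paths: list[str]) -> list[str]:
    """Keep only the paths the agent should be allowed to pull into the window."""
    return [p for p in paths if not is_excluded(p)]


repo_files = [
    "src/billing/refunds.ts",
    "yarn.lock",
    "dist/app.js",
    "vendor/lib/index.js",
    "src/auth/session.ts",
]
print(retrieval_candidates(repo_files))
# ['src/billing/refunds.ts', 'src/auth/session.ts']
```

If your agent tooling supports its own ignore file, prefer that mechanism; a filter like this is mainly useful for checking what the default retrieval set would contain.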

Step 7: Output the context-environment plan

# Context Environment: (repo_name)

**Purpose:** (one line)
**Tasks span:** (single session / multi-session)
**Today's top failure:** (the concrete thing the agent gets wrong)

## Repo layout
- Current shape: (layer-driven / feature-driven / flat / monorepo)
- Target shape: (usually feature-driven; name the directories)
- Migration order: (which features move first, or "already feature-driven")

## Instruction file (CLAUDE.md / AGENTS.md)
| Section | Keep / Add / Cut | Why |
|---|---|---|
| Commands | | |
| Conventions | | |
| Guardrails | | |
- Lines cut for "model already knows this": (list)

## Tools & MCP
| Capability | Source (local tool / MCP server) | Gated? | Needed by which task |
|---|---|---|---|

## Retrieval & persistence
- Default retrieval set: (smallest set to start a task)
- Explicitly excluded from retrieval: (lockfiles, build output, etc.)
- Memory location & format: (file path, what it holds)
- What persists across sessions: (decisions / state, not transcript)

## Open questions
- (unresolved decisions)

Step 8: Review