Use this when a prototype proved the idea and now has to survive production: real users, real data, real failure modes, and someone other than the author maintaining it. The prototype answered "should we build this." This skill answers "how do we build it for real" by writing the spec you skipped, drawing the line between load-bearing code and throwaway scaffolding, and deciding what to keep, rewrite, and delete. If you are still exploring whether the idea works at all, do not run this yet, keep vibing. Run this only when the demo lands and someone says "ship it."
Related skills: Writes the missing spec with
/spec-driven-feature. Sets up the agent context the prototype lacked via/context-engineering-setup. Reviews the hardened code with/code-review. Adds the tests it never had through/tdd-implement.
The hard part most teams miss
Vibe code is fast to write and expensive to own. The speed that made the prototype worth building is the same speed that makes it dangerous to ship.
- Vibe code becomes a liability the moment it ships, because nobody understands it, including the author. Code generated in a flow state has no design rationale anyone can recover. The first production bug lands on a system where "why is it like this" has no answer. Understanding has to be rebuilt before the code can be trusted, and that work does not get cheaper by waiting.
- The prototype proved the what; you still owe the how. A demo establishes that the feature is worth having. It says nothing about correctness under load, data integrity, auth, error paths, or the next engineer's ability to change it safely. Skipping the how is exactly how prototypes calcify into production systems no one can maintain.
- The dangerous move is shipping the prototype as-is because it "works." It works in the demo, on the happy path, with the author driving and clean inputs. Demos are designed to hide the failure modes production exposes: concurrency, hostile input, partial failure, scale. "It works" is a statement about the demo, not the system.
Process
Step 1: Gather inputs
Ask the user:
- What does the prototype do, and what did it prove? ({{what_it_proves}} -- the validated bet, in one or two sentences.)
- Where does the code live, and can I read it? ({{repo_or_path}} -- you cannot harden what you cannot see.)
- What does production mean here? ({{production_definition}} -- internal tool, paying customers, regulated data? This sets the bar.)
- What is the worst thing that happens if it breaks in production? (Wrong number on a dashboard, or leaked customer data. This sets where you spend.)
- Who maintains it after this, and do they understand it today? (If the answer is "nobody understands it," that is the first problem to solve.)
- What is the deadline, and is it real? (A hard date changes what you keep versus rewrite. Name the tradeoff out loud.)
Step 2: Map the code into three buckets
Read the prototype and sort every meaningful piece into one of three buckets. Be explicit; ambiguity here is how scaffolding ships as if it were load-bearing.
- Keep: correct, understood, and worth the maintenance cost as-is. Rare in vibe code. Earn this label, do not assume it.
- Rewrite: the behavior is right but the implementation is not trustworthy, untyped, untested, tangled, or unexplainable. The spec from Step 3 defines what the rewrite must satisfy.
- Delete: dead code, abandoned experiments, AI slop (over-abstracted helpers, defensive code for cases that cannot happen, comments that restate the code), and anything the proven bet does not depend on.
For each piece, also tag whether it is load-bearing (production correctness depends on it) or scaffolding (it got you to the demo and can be thrown away). Load-bearing code earns real engineering. Scaffolding earns deletion.
Step 3: Write the spec you skipped
The prototype is the spec now, but only as observed behavior, not intent. Reconstruct intent explicitly. Hand off to /spec-driven-feature for the full treatment, or capture the essentials inline: what the feature must do, the inputs and their valid ranges, the outputs, the invariants that must always hold, and the failure behavior. This is the contract the rewrite is measured against. Without it, "rewrite" means "rewrite into different code nobody understands either."
Step 4: Plan the hardening passes
Decide which passes the code needs and in what order. Not every prototype needs all of them; the production bar from Step 1 decides the depth.
- Types and contracts: add types at the boundaries first, then inward. Types are the cheapest documentation and the fastest bug finder.
- Tests: characterize current behavior, then test the spec from Step 3. Use
/tdd-implementfor anything you rewrite. Untested load-bearing code is not shippable. - Security pass: validate and sanitize all input, check authz and authn on every path, remove hardcoded secrets, audit dependencies. Vibe code routinely trusts input and skips auth.
- Error handling: replace happy-path-only code with explicit handling of the failure modes the demo hid. Decide retry, fail-closed, or degrade per path.
- Observability: add logging, metrics, and alerting so production failures are visible, not silent. Set up agent and runtime context with
/context-engineering-setupif relevant. - Slop and dead-code strip: delete everything tagged Delete in Step 2. Less code is less liability.
Step 5: Output the hardening plan
# Hardening Plan: (prototype name)
**Proven bet:** (what the prototype validated)
**Production bar:** (internal / customer-facing / regulated)
**Blast radius if it breaks:** (worst realistic outcome)
**Maintainer + current understanding:** (who, and do they get it today)
## Keep / Rewrite / Delete
| Component | Bucket | Load-bearing? | Rationale |
|---|---|---|---|
## Spec gaps to close
- Invariants that must hold: (list)
- Input ranges / validation: (list)
- Failure behavior per path: (list)
## Hardening passes (in order)
| Pass | Scope | Why it matters here | Done means |
|---|---|---|---|
## Cut for now (deferred, with reason)
- (what you are deliberately not doing, and the risk you accept)
## Open questions
- (unresolved decisions)
Step 6: Review
Ask the user:
- What is still in the Keep bucket only because reading it was hard? (Suspect anything understood by assumption.)
- Which load-bearing path has no test, and why is that acceptable to ship?
- What input does this code still trust that an attacker or a bad upstream could send?
- When it fails in production at 2am, what does the on-call person see, and is that enough?
- What did you cut for now, and who agreed to accept that risk?
Anti-patterns
| Anti-pattern | Why it fails | Do instead |
|---|---|---|
| Ship the prototype as-is because it works | The demo hid the failure modes; production finds them first, in front of users | Harden to the spec before production, every load-bearing path |
| Rewrite everything from scratch | Throws away the validated behavior and the deadline with it | Bucket into keep/rewrite/delete; rewrite only what is load-bearing and untrustworthy |
| Add tests after, around whatever the code already does | Locks in the prototype's accidental behavior as the contract | Write the spec first, then test intent, not the current implementation |
| Keep the AI slop because deleting feels risky | Dead code and defensive cruft for impossible cases is pure maintenance tax | Delete aggressively; less code is less liability |
| Treat "it ran in the demo" as "it is correct" | Happy path with clean input is not correctness under load or hostile input | Define invariants and failure behavior, then verify them |
| Defer the security pass to a later sprint | Untrusted input and missing auth are exploitable the day you ship, not later | Validate input and check auth on every path before production |
Output location
Present the hardening plan as formatted text in the conversation for the user to paste into their design doc or ticket.