Debug Assist

Use this when you're stuck on a bug and want a systematic approach instead of random guessing. Works for runtime errors, unexpected behavior, test failures, and "it works on my machine" problems.

Process

Step 1: Capture the problem

Ask the engineer:

What did you expect to happen?
What actually happened? (Error message, wrong output, crash, hang -- be specific.)
When did it start? (After a deploy, a dependency update, a code change, or "it's always been like this.")
Can you reproduce it reliably? (Always, sometimes, only in production, only on Tuesdays.)
What have you already tried?

Step 2: Narrow the problem space

Based on the answers, classify the bug:

Category	Signals	First moves
Crash / exception	Stack trace, error message	Read the stack trace bottom-up. Identify the throwing line. Check inputs to that function.
Wrong output	Code runs but produces incorrect results	Identify the last point where data is correct. Trace forward to where it diverges.
Performance	Slow, hanging, timeouts	Check for N+1 queries, unbounded loops, missing indexes, blocking I/O.
Intermittent	Flaky, race condition, timing-dependent	Look for shared mutable state, missing locks, order-dependent operations, time-sensitive logic.
Environment	Works locally, fails in CI/staging/prod	Compare env vars, dependency versions, OS differences, network access, file paths.
Agent / LLM silent failure	Pipeline completes, no exception, but output is wrong: bad retrieval chunk, wrong tool pick, dropped summary, low eval score	Nothing throws, so a stack trace won't help. Switch to Step 4b: trajectory-level RCA across the agent's steps.

Step 3: Form hypotheses

Generate 2-4 hypotheses ranked by likelihood. For each:

Hypothesis: One sentence -- what might be causing this.
Evidence for: What supports this theory.
Evidence against: What doesn't fit.
Test: The fastest way to confirm or eliminate this hypothesis (a log statement, a test case, a config change -- not a rewrite).

Present hypotheses and ask the engineer which to investigate first, or recommend the fastest-to-test option.

Step 4: Investigate

For the chosen hypothesis:

Read the relevant code.
Suggest the minimal investigation step (add a log, write a failing test, check a config value, inspect state at a breakpoint).
If the hypothesis is confirmed -- propose the fix.
If eliminated -- move to the next hypothesis.
If investigation reveals new information -- update hypotheses and re-rank.

Repeat until the root cause is identified.

Step 4b: Agentic RCA mode (for silent agent/LLM failures)

Use this when the bug is a non-throwing failure in an AI agent, RAG pipeline, or multi-step LLM chain. By 2026 these became their own debugging discipline, separate from APM and single-call LLM logging, because the failure is silent: the pipeline runs to completion and produces wrong output that only surfaces as a low evaluator score days later. A bottom-up stack trace has nothing to read. The work is localizing which step in the trajectory broke.

Get the trajectory trace, not just the final output. Pull per-span tracing for the whole run (LangSmith, Langfuse, Galileo, Arize Phoenix, or whatever the team already emits). One row per step: the retrieval call, each tool selection, every model turn, the final synthesis. If the agent isn't instrumented for span-level tracing yet, that gap is the first finding, and recommend /llm-observability-plan to fix it before chasing the bug further.
Score each step, don't eyeball it. Run per-step evaluators so "which step broke" becomes a query instead of a read-through: Groundedness on the retrieval (did the chunk actually support the claim), ToolSelectionAccuracy on each tool pick, TaskCompletion on the final answer. The lowest-scoring step is your prime suspect.
Localize, then form hypotheses as usual. Once a step is implicated (bad chunk retrieved, wrong tool called, context dropped between turns), return to Step 3 and treat that step as the bug surface. The hypothesis-and-cheapest-test discipline still applies; you've just narrowed a 5-step pipeline down to the one span that matters.
Verify the fix with an eval, not a single re-run. Agent failures are often probabilistic. Confirm the fix against a small held-out set of inputs and a continuous in-production eval, not one lucky green run. This is the executable-guardrail idea from Step 5, moved into the agent's trajectory.

Step 5: Fix and verify

Once the root cause is found:

Write a failing test that reproduces the bug (if one doesn't exist).
Propose the fix.
Run the test -- confirm it passes.
Run the full test suite -- confirm nothing else broke.
Briefly explain the root cause so the team can learn from it.

Step 5b: Escalation assessment

Before closing the debugging session, evaluate whether this bug signals something larger.

Incident indicators -- escalate if any are true:

User-facing impact affecting more than one user or session
Data integrity compromise (wrong data written, data loss, corruption)
Security implications (unauthorized access, data exposure)
SLA breach or error budget burn
The fix requires coordination across multiple teams or services
The same root cause could affect other features or services

If escalation is warranted, recommend who to notify, what communication is needed (status page, Slack, stakeholder update), and whether to trigger a formal postmortem using /incident-postmortem.

Pattern recognition -- check for systemic signals:

Has this category of bug occurred before? (Check incident history, bug tracker.)
Is the root cause in a shared component, library, or infrastructure layer?
Would this bug have been caught by existing tests if they existed?
Does the fix address the symptom or the underlying cause?

If a pattern emerges, recommend whether this warrants a /tech-debt-assessment for the affected area or a /dx-assessment if the pattern points to tooling or workflow friction.

Output location

Present hypotheses, investigation results, and the fix in the conversation. Apply code changes directly to the codebase when approved.

Example Output

Input

Engineer's report: "Our checkout API is returning 500 errors for roughly 30% of requests in production. No errors locally or in staging."
Error message: NullPointerException: Cannot read property 'taxRate' of undefined in PricingService.calculateTotal() at line 84
When it started: After Friday's deploy of pricing-service v2.4.1, which added multi-currency support
Reproducibility: Intermittent in production only — seems to happen more on orders from international customers
Already tried: Rolled back a feature flag for the new currency selector UI, but errors continued

Output (abbreviated)

Debug Session: Checkout 500 Errors — `taxRate` NullPointerException

Step 1: Problem Summary

Field	Detail
Expected	`calculateTotal()` returns a valid price object for all checkout requests
Actual	`NullPointerException` on `taxRate` — crashing ~30% of prod requests
Started	After deploy of `pricing-service` v2.4.1 on Friday
Reproducibility	Intermittent, production only, skews toward international orders
Tried	Feature flag rollback — no effect

Bug category: Environment + Wrong Output (data-dependent crash) The flag rollback ruling out the UI layer is a strong signal. The fault likely lives in the new pricing-service backend logic or its data dependencies.

Step 2: Hypotheses

Hypothesis 1 — Missing tax region data for non-US locales (Most likely)

What: v2.4.1 introduced currency lookups but may not have backfilled taxRate for all regions in the tax_regions table or config map.
Evidence for: Skews international; NullPointerException specifically on taxRate; feature was added in this version.
Evidence against: Staging has no errors — but staging may only have US test fixtures.
Test: Run SELECT region_code, tax_rate FROM tax_regions WHERE tax_rate IS NULL; against the prod database. Also check if the code has a null guard before accessing taxRate.

Hypothesis 2 — Race condition in currency resolution cache

What: The new multi-currency layer caches exchange rates; a cache miss returns undefined instead of a fallback, and calculateTotal() doesn't guard against it.
Evidence for: Intermittent failure rate (not 100%) fits a cache miss pattern.
Evidence against: Would expect a different error path — cache misses usually return a typed empty object, not undefined.
Test: Add a log line immediately before line 84: console.log('regionConfig:', regionConfig) — confirm whether the entire object is undefined or just the taxRate property.

Hypothesis 3 — Environment config missing in prod

What: A new TAX_CONFIG_PATH or similar env var introduced in v2.4.1 was set in staging but not added to the prod deployment manifest.
Evidence for: Classic "works in staging, fails in prod" pattern; env vars are a common deployment gap.
Evidence against: Would expect 100% failure rate, not 30% — unless config controls a subset of code paths.
Test: printenv | grep TAX on a prod container vs. staging. Check the v2.4.1 release notes and deployment checklist for new required vars.

Hypothesis 4 — Null not handled for orders with no shipping address country

What: Guest checkout or saved addresses without a country field causes the region lookup to return undefined.
Evidence for: ~30% could correlate with guest checkout rate.
Evidence against: International framing suggests country is present, just unsupported.
Test: Filter 500 error logs by user_type: guest — check if error rate is disproportionate.

Step 3: Recommended First Move

Start with Hypothesis 1 — fastest, highest-signal, zero-risk.

-- Run against prod (read-only)
SELECT region_code, tax_rate
FROM tax_regions
WHERE tax_rate IS NULL OR tax_rate = 0;

Simultaneously, inspect line 84 of PricingService.calculateTotal():

// Current (suspected)
const total = subtotal * (1 + regionConfig.taxRate);

// Is there a null guard? If not, that's a latent bug regardless of root cause.
const total = subtotal * (1 + (regionConfig?.taxRate ?? 0));

Step 4: If Hypothesis 1 Confirmed — Proposed Fix

Immediate (stop the bleeding):

// PricingService.js, line 84
const taxRate = regionConfig?.taxRate ?? DEFAULT_TAX_RATE;
const total = subtotal * (1 + taxRate);

Backfill missing data:

INSERT INTO tax_regions (region_code, tax_rate, currency)
VALUES ('GB', 0.20, 'GBP'), ('DE', 0.19, 'EUR'), ('FR', 0.20, 'EUR')
-- ... remaining missing regions from audit query
ON CONFLICT (region_code) DO UPDATE SET tax_rate = EXCLUDED.tax_rate;

Regression test:

it('should not throw when taxRate is missing for a region', () => {
  const mockRegionConfig = { currency: 'GBP' }; // no taxRate
  expect(() => calculateTotal(100, mockRegionConfig)).not.toThrow();
  expect(calculateTotal(100, mockRegionConfig)).toBe(100); // fallback to 0% tax
});

Step 5b: Escalation Assessment

⚠️ Escalation warranted.

Indicator	Status
User-facing impact (>1 user)	✅ Yes — ~30% of checkout attempts failing
Data integrity risk	✅ Monitor — orders that did complete may have incorrect tax applied
SLA / revenue impact	✅ Yes — checkout failures directly block revenue
Multi-service coordination needed	Possibly — if `tax_regions` is shared with `invoice-service`

Recommended actions:

Notify: #incidents Slack, eng lead, and head of payments immediately
Status page: Post degraded checkout experience for international users
Check invoice-service — if it reads the same tax_regions table, invoices generated since Friday may have incorrect tax calculations (data integrity issue)
Postmortem: Run /incident-postmortem — this is the second pricing-related regression in 90 days

Pattern flag: This bug would have been caught by integration tests using non-US fixtures. Recommend a /dx-assessment for the pricing-service test strategy — staging data coverage appears systematically US-only.

Run this now

Try /debug-assist on your own input

0/4000

Related Engineering skills

ADR Generate AI Testing Strategy Architecture Context Reviewer Architecture Discovery Boris Model Build vs Buy Code Review Codependency Analyzer

Back to Skills Catalog