Skip to main content
Engineering/code-review

Code Review

You want structured review feedback tied to standards.

Use this when you have code changes (a PR, a diff, or modified files) and want a structured review covering test coverage, naming, error handling, security, and alignment with the story's acceptance criteria.

Process

Step 1: Gather inputs

Ask the user to provide:

  1. Code changes — a diff, PR link (if MCP available), file paths, or pasted code. Include both new and modified files.
  2. Story or acceptance criteria — the user story this code implements (optional but strongly recommended). Used to check that the code actually delivers what was specified.
  3. Test files — the tests covering these changes (if separate from the diff).
  4. Team conventions — any specific coding standards, linting rules, or conventions to check against (optional — the review will use general best practices if none provided).

Step 2: Review the changes

Evaluate the code across these dimensions:

Acceptance criteria alignment

  • Does the code implement what the story specifies?
  • Are all acceptance criteria covered, or are some missing from the implementation?
  • Does the code do things the story didn't ask for (scope creep)?

Test coverage

  • Are there tests for the changed code?
  • Do tests cover happy path, error paths, and edge cases?
  • Are tests testing behavior (not implementation details)?
  • Are there missing test cases?
  • Gherkin traceability: Is there a test for every Gherkin scenario in the story's AC? Any AC without a corresponding test?

Code quality

  • Naming — are variables, functions, and classes named clearly? Can you understand the code without comments?
  • Structure — is the code organized logically? Are functions doing one thing?
  • Duplication — is there copy-pasted logic that should be extracted?
  • Complexity — are there deeply nested conditionals, long functions, or hard-to-follow control flow?

Error handling

  • Are errors handled, not swallowed?
  • Do error messages help with debugging?
  • Are edge cases handled (null inputs, empty collections, missing data)?
  • Are external calls (APIs, database, file system) wrapped with appropriate error handling?

Architecture and system impact

  • Coupling — does this change introduce tight coupling between modules or services? Could these components change independently?
  • Data flow — does data flow in the expected direction? Are there new cross-service calls that should be async instead of sync?
  • Observability — are new code paths instrumented? Are errors logged with sufficient context for debugging in production?
  • Migration safety — if this changes a database schema or API contract, is it backward compatible? Is there a rollback path?
  • Dependency direction — do dependencies point in the right direction? (Domain logic should not depend on infrastructure details.)

Security

  • Input validation — is user input validated before use?
  • Injection risks — SQL injection, XSS, command injection?
  • Authentication/authorization — are access checks in place?
  • Data exposure — are sensitive fields (passwords, tokens, PII) protected?
  • Dependency concerns — any new dependencies with known vulnerabilities?

Style and consistency

  • Does the code follow the project's existing patterns?
  • Are imports organized consistently?
  • Is formatting consistent with the rest of the codebase?

Step 3: Generate the review

Output a structured review:


Code Review: (PR title or description)

Summary

(One sentence: Approve / Approve with suggestions / Request changes)

Must-fix issues

(Issues that should be resolved before merging)

  1. [file:line] (issue) — (why it matters) — (suggested fix)

Suggestions

(Improvements that would make the code better but aren't blockers)

  1. [file:line] (suggestion) — (rationale)

Acceptance criteria coverage

CriterionCovered?Notes
(AC from story)Yes/No/Partial(details)

Test coverage assessment

  • (What's well tested)
  • (What's missing)

Test quality

Gherkin scenarioTest exists?Test typeQuality
(AC scenario)Yes/NoUnit/Integration/E2eAdequate / Needs improvement / Missing

Related skills: Use /test-coverage-audit for a full AC-to-test mapping across the codebase.

Architecture impact

  • (Assessment of coupling, data flow, observability additions)
  • (Any concerns about system-level implications of this change)

What's good

  • (Specific things done well — call out clean patterns, good naming, thorough error handling)

Step 4: Discuss

Ask the user:

  • Any issues you'd like me to explain further?
  • Want me to write fixes for the must-fix issues?
  • Want me to write the missing tests?

Step 5: Team-level review standards (for engineering leads)

If the user is setting up or improving code review culture for a team, generate team review standards:

## Code Review Standards -- {{team name}}

### Review SLOs
| Change type | Target turnaround | Escalation if missed |
|------------|------------------|---------------------|
| Bug fix (< 50 lines) | 4 business hours | Author pings reviewer directly |
| Feature (50-300 lines) | 1 business day | Raise in standup |
| Large change (300+ lines) | 2 business days | Split the PR or schedule synchronous review |
| Security-sensitive | 4 business hours | Ping security champion + lead |

### Review depth by change type
| Change type | Expected depth | Focus areas |
|------------|---------------|-------------|
| Bug fix | Targeted -- verify the fix, check for regression risk | Root cause addressed (not just symptom), test coverage for the bug |
| New feature | Thorough -- full review of all dimensions | AC alignment, test coverage, error handling, security |
| Refactor | Structure-focused -- verify behavior preservation | Tests still pass, no behavior changes, improved readability |
| Dependency update | Risk-focused -- check for breaking changes | Changelog review, breaking API changes, security advisories |
| Config/infra change | Impact-focused -- blast radius assessment | What environments affected, rollback path, monitoring |

### Reviewer assignment
- **Round-robin** for general changes -- spread knowledge, avoid bottlenecks
- **Domain expert** for security, performance, or architecture changes
- **Two reviewers** for changes touching shared libraries, APIs, or data models
- **Pair review** (synchronous) for changes > 500 lines or cross-cutting concerns

### Review health metrics
Track quarterly:
- **Review cycle time:** Median time from PR opened to approved (target: < 1 business day)
- **Review throughness:** % of PRs with at least one substantive comment (not just "LGTM")
- **Knowledge distribution:** Number of unique reviewers per codebase area (avoid single-reviewer silos)
- **PR size distribution:** Track % of PRs > 300 lines (goal: shrink over time)

Ask the user:

  • Does the team have existing review norms this should build on?
  • Any pain points with current review practices (bottlenecks, rubber-stamping, nitpick wars)?
  • Want me to draft a team-specific version based on your current workflow?

Output location

Present the review as formatted text in the conversation.

Example Output

Input

  • Code changes: Diff for a new /api/v1/payments/refund endpoint in a FastAPI service — adds a POST route, a RefundService class, and a Pydantic request model
  • Story/AC: "As a support agent, I can issue a full or partial refund for a completed order so that customer disputes are resolved without manual DB edits. AC: (1) Given a valid order ID and refund amount ≤ original charge, the refund is processed and a refund.created event is emitted. (2) Given an amount exceeding the original charge, a 422 is returned with a clear error message. (3) Given an already-refunded order, a 409 is returned. (4) All refund attempts are logged with agent ID, order ID, amount, and outcome."
  • Test files: tests/test_refund_endpoint.py — 3 test functions covering the happy path and the over-amount case
  • Team conventions: PEP 8, no bare except, all external calls must use the internal http_client wrapper, secrets via settings object only

Output (abbreviated)

Code Review: Add POST /api/v1/payments/refund

Summary

Request changes — two must-fix security and error-handling issues before merging; AC coverage is incomplete, and one test scenario is missing.


Must-fix issues

  1. services/refund_service.py:34 Bare except clause swallows all exceptions from the Stripe API call — violates team convention and hides failures silently. Wrap in except StripeError as e and re-raise as a structured PaymentGatewayError with the original message and request ID attached.

  2. routers/refund.py:18 refund_amount is read directly from the request body and passed to Stripe without validating it is a positive number. A value of 0 or -50.00 will reach the external API. Add a Pydantic @validator enforcing amount > 0.

  3. services/refund_service.py:61 Stripe API key is accessed via os.environ["STRIPE_SECRET_KEY"] — violates the team convention requiring secrets through the settings object. Change to settings.stripe_secret_key.


Suggestions

  1. services/refund_service.py:45–72 process_refund() is doing three things: validating business rules, calling Stripe, and emitting the event. Extract into _validate_refund_eligibility() and _emit_refund_event() — easier to test and reason about.

  2. routers/refund.py:12 agent in the request model would be clearer as agent_id: UUID — matches the naming used everywhere else in the codebase and enforces the type at the boundary.

  3. services/refund_service.py:58 Consider returning the Stripe refund_id in the 200 response body so callers can correlate with Stripe's dashboard without a separate lookup.


Acceptance criteria coverage

CriterionCovered?Notes
Valid refund processed + refund.created event emittedPartialRefund is processed; event emission is implemented but not verified in tests
Amount > original charge → 422 with clear messageYesValidated in model and service; error message is descriptive
Already-refunded order → 409NoNo check against existing refund status in process_refund(); no test
All attempts logged with agent ID, order ID, amount, outcomeNoLogging call on line 67 omits agent_id and doesn't log failure outcomes

Test coverage assessment

  • Well tested: Happy path (201 response, correct payload shape) and over-amount validation (422) are covered with clear assertions
  • Missing: No test for the duplicate-refund 409 case (AC #3); no test for Stripe API failure (500 vs. 502 distinction); no test verifying the refund.created event is actually emitted; edge case of amount = 0 not covered

Test quality

Gherkin scenarioTest exists?Test typeQuality
Valid full/partial refund processedYesUnit (mocked Stripe)Adequate
Amount exceeds original charge → 422YesUnitAdequate
Already-refunded order → 409NoMissing
All attempts logged with required fieldsNoMissing

Related skills: Use /test-coverage-audit for a full AC-to-test mapping across the payments service.


Architecture impact

  • Coupling: RefundService instantiates its own stripe.Client internally — makes the dependency hard to swap for testing and couples the service layer to the SDK. Inject the client via the constructor or use the existing http_client wrapper per team convention.
  • Observability: No structured log on the success path includes the Stripe refund_id, meaning production incidents can't be correlated to Stripe events without manual lookup. Add it to the success log line.
  • Migration safety: No schema changes — low risk. The refund.created event is a new event type; confirm the event bus consumer team is aware before merging to avoid silent drops.

What's good

  • The Pydantic request model is clean and uses Decimal for amount rather than float — correct choice for currency
  • 409 vs. 422 status code distinction is semantically accurate (conflict vs. validation error)
  • The router stays thin — business logic is correctly pushed into RefundService, not the route handler
  • Dependency on settings is used correctly everywhere except the one flagged violation

Want me to:

  • Write fixes for the three must-fix issues?
  • Generate the two missing test cases (409 duplicate refund + event emission assertion)?
  • Explain the event bus concern in more detail?