Story Review - AI Agent Skill

Use this when you have a draft user story and want to check it against INVEST criteria and testable-acceptance-criteria standards before it enters an iteration. Catches vague criteria, missing error states, untestable conditions, and scope issues.

Process

Step 1: Gather the story

Ask the user to provide the story to review. Accept any of:

A full story (title, description, acceptance criteria)
A story pasted from a tracker (Jira, Linear, Shortcut)
A link to a story (if an MCP integration with the tracker is available)

If the story is incomplete (missing sections), note which sections are missing but still review what's provided.

Step 2: INVEST criteria check

Evaluate the story against each INVEST criterion and assign each one Pass, Fail, or Warning:

Criterion	Check
Independent	Can this story be built and delivered without waiting on other unfinished work?
Negotiable	Is the story written at a level where the team can discuss alternatives, or is it over-specified with implementation details?
Valuable	Is there clear user or business value stated? Does the "so that" clause articulate real value?
Estimable	Is there enough detail for the team to estimate effort? Are there unresolved unknowns that block estimation?
Small	Can this reasonably be completed in 1-3 days? If not, flag it as a split candidate.
Testable	Can every acceptance criterion be verified through observation or automated tests?

Step 3: Acceptance criteria deep review

For each acceptance criterion, check:

Gherkin structure -- Does it follow Given/When/Then format? Are Given conditions specific enough to set up the test?
Testability -- Can this be verified by observing the system? Flag criteria that describe internal state ("database is updated") instead of observable outcomes.
Specificity -- Are values concrete? Flag vague terms: "appropriate message," "reasonable time," "relevant data," "properly formatted."
Happy path coverage -- Does at least one criterion cover the primary success scenario?
Error path coverage -- Are failure cases covered? (invalid input, unauthorized access, network failure, empty states)
Edge cases -- Are boundary conditions addressed? (max lengths, zero items, concurrent actions, first-time vs. returning user)
Consistency -- Do the criteria use consistent terminology? Flag when the same concept has different names.
Test design technique coverage -- Apply a quick check from practices/qa/test-design-techniques.md:
- Boundary values: For fields with constraints, are both sides of the boundary tested? (e.g., min and min-1, max and max+1)
- Equivalence partitions: Are different categories of valid and invalid input each represented? (Not 5 tests for invalid emails, but one from each invalid category)
- E2e testability: Could each scenario generate a failing Playwright or Cypress test? If the Then clause isn't specific enough for an e2e assertion, flag it.
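You can automate a rough first pass over the Gherkin structure, Specificity, and Testability checks before the judgment-heavy review. The sketch below is a minimal Python illustration, not part of the skill. `lint_criterion`, `Finding`, and the phrase lists are hypothetical names. The vague terms come from the Specificity check, and the internal-state phrases are made-up examples. Plain keyword matching misses paraphrases, so treat its output as hints for the reviewer, not a verdict.

```python
import re
from dataclasses import dataclass

# Vague terms from the Specificity check; extend for your team's vocabulary.
VAGUE_TERMS = ("appropriate", "reasonable", "relevant", "properly formatted")
# Hypothetical phrases that describe internal state rather than observable outcomes.
INTERNAL_STATE_PHRASES = ("database is updated", "is saved to", "is stored in")


@dataclass
class Finding:
    ac_number: int
    issue: str


def _keyword_position(text: str, keyword: str) -> int:
    """Index of the first whole-word match of keyword, or -1 if absent."""
    match = re.search(rf"\b{keyword}\b", text)
    return match.start() if match else -1


def lint_criterion(number: int, text: str) -> list[Finding]:
    lowered = text.lower()
    findings = []
    # Gherkin structure: Given, When, and Then must all appear, in that order.
    positions = [_keyword_position(lowered, k) for k in ("given", "when", "then")]
    if -1 in positions or positions != sorted(positions):
        findings.append(Finding(number, "Not in Given/When/Then form"))
    # Specificity: flag vague terms that a test cannot pass or fail against.
    for term in VAGUE_TERMS:
        if term in lowered:
            findings.append(Finding(number, f'Vague term: "{term}"'))
    # Testability: flag criteria that describe internal state.
    for phrase in INTERNAL_STATE_PHRASES:
        if phrase in lowered:
            findings.append(Finding(number, f'Describes internal state: "{phrase}"'))
    return findings
```

For example, `lint_criterion(2, "The CSV should contain the appropriate columns.")` returns two findings: one for the missing Given/When/Then structure and one for the vague term "appropriate".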

Step 3.5: Cross-functional lens check

Check the story from Design and Engineering perspectives:

Design lens:

Are all UI states covered in acceptance criteria? Check for: happy path, empty state, loading state, error state, partial data state.
Are interaction patterns specified? (What happens on click, hover, keyboard navigation?)
Is accessibility addressed? (Screen reader behavior, keyboard-only flow, color contrast requirements)
If the story involves UI, are there design references (Figma links, wireframes)?

Engineering lens:

Is this technically feasible as scoped? Flag anything that seems like it would require significant architectural changes.
Are there hidden dependencies? (APIs that don't exist yet, data migrations, third-party integrations)
Can the acceptance criteria be automated as tests? If not, flag which ones need rewriting.

Report findings as:

✅ Covered
⚠️ Partially covered -- (what's missing)
❌ Not addressed -- (what needs to be added)
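After you decide which states the acceptance criteria cover, you can apply the three-level report format mechanically. This Python sketch assumes the reviewer supplies that judgment as a set of state names. `ui_state_coverage` and the state labels are illustrative and not part of any existing tool.

```python
# The five UI states listed in the design lens.
REQUIRED_UI_STATES = ("happy path", "empty", "loading", "error", "partial data")


def ui_state_coverage(covered: set[str]) -> str:
    """Map the states an AC set covers onto the report markers."""
    missing = [state for state in REQUIRED_UI_STATES if state not in covered]
    if not missing:
        return "✅ Covered"
    if len(missing) == len(REQUIRED_UI_STATES):
        return "❌ Not addressed -- add ACs for: " + ", ".join(missing)
    return "⚠️ Partially covered -- missing: " + ", ".join(missing)
```

For a story whose criteria cover only the happy path and the empty state, `ui_state_coverage({"happy path", "empty"})` reports partial coverage and lists loading, error, and partial data as missing.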

Step 4: Structural review

Check for:

Missing sections -- Title, description (As a / I want / So that), acceptance criteria, technical details, out of scope
Scope creep signals -- Multiple user personas in one story, "and also" clauses, acceptance criteria that belong to a different story
Design references -- Are wireframes or mockups referenced if the story involves UI changes?
Out of scope clarity -- Is it clear what this story does NOT include?
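The missing-section and scope-creep checks are mostly presence tests, so a few lines can cover them. This Python sketch assumes the story has already been parsed into a dict keyed by lowercase section name. That shape is hypothetical, since tracker exports differ. The sketch only catches the literal "and also" signal. It does not detect multiple personas or criteria that belong to another story.

```python
import re

REQUIRED_SECTIONS = ("title", "description", "acceptance criteria",
                     "technical details", "out of scope")
STORY_CLAUSES = ("as a", "i want", "so that")


def structural_review(story: dict[str, str]) -> list[str]:
    notes = []
    # Missing sections: absent or blank.
    for section in REQUIRED_SECTIONS:
        if not story.get(section, "").strip():
            notes.append(f"Missing section: {section}")
    # The description should carry the As a / I want / So that clauses.
    description = story.get("description", "").lower()
    for clause in STORY_CLAUSES:
        if not re.search(rf"\b{clause}\b", description):
            notes.append(f'Description lacks "{clause}" clause')
    # Scope creep signal: an "and also" anywhere in the story text.
    if "and also" in " ".join(story.values()).lower():
        notes.append('Scope creep signal: "and also" clause')
    return notes
```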

Step 5: Generate the review

Output a structured review with the following sections:

Story Review: (story title)

Summary verdict

(One sentence: Ready for iteration / Needs revision / Needs splitting)

INVEST scorecard

Criterion	Status	Notes
Independent	Pass/Fail/Warning	(brief explanation)
Negotiable	Pass/Fail/Warning	(brief explanation)
Valuable	Pass/Fail/Warning	(brief explanation)
Estimable	Pass/Fail/Warning	(brief explanation)
Small	Pass/Fail/Warning	(brief explanation)
Testable	Pass/Fail/Warning	(brief explanation)
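The skill leaves the choice of verdict to the reviewer. If a team wants the verdict to follow consistently from the scorecard, the rule below is one option. Its precedence is an assumption of this sketch, not something the skill prescribes: a failing Small means split, any other Fail means revise, and Warnings alone do not block the iteration.

```python
from enum import Enum


class Status(Enum):
    PASS = "Pass"
    WARNING = "Warning"
    FAIL = "Fail"


def summary_verdict(scorecard: dict[str, Status]) -> str:
    # Assumed rule: splitting takes precedence, then any Fail needs revision.
    if scorecard.get("Small") is Status.FAIL:
        return "Needs splitting"
    if Status.FAIL in scorecard.values():
        return "Needs revision"
    # Warnings are reported in the notes but do not block the iteration.
    return "Ready for iteration"
```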

Acceptance criteria issues

(For each issue found, list:)

AC #(number): (the issue) -- (suggested fix)

Test design quality

Check	Status	Notes
Boundary values specified	Pass/Fail/Warning	(Are min/max tested on both sides?)
Equivalence partitions covered	Pass/Fail/Warning	(One test per input category, not redundant tests?)
E2e testable	Pass/Fail/Warning	(Could each scenario generate a failing e2e test?)

If test design quality is weak, recommend: "Run /test-case-design to systematically expand boundaries and partitions, or /edge-case-hunt for persona-driven edge cases."
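When a story's criteria are thin on boundaries or partitions, the expected test inputs are easy to enumerate. This Python sketch shows both techniques in their simplest form: one for an integer range and one for a set of named input categories. The field, limits, and email examples are made up for illustration.

```python
def boundary_values(minimum: int, maximum: int) -> list[int]:
    """Both sides of each boundary: min-1, min, max, max+1."""
    return [minimum - 1, minimum, maximum, maximum + 1]


def one_per_partition(partitions: dict[str, list[str]]) -> dict[str, str]:
    """Keep a single representative input from each equivalence class."""
    return {name: values[0] for name, values in partitions.items() if values}


# Hypothetical field: a display name limited to 1-50 characters.
lengths_to_test = boundary_values(1, 50)  # [0, 1, 50, 51]

# Hypothetical invalid-email partitions: five candidate inputs, three categories.
invalid_emails = one_per_partition({
    "missing @": ["alice.example.com", "bob.example.com", "carol.example.com"],
    "missing domain": ["alice@"],
    "empty": [""],
})
```

The partition step reduces five candidate inputs to three tests, one per invalid category, which is the redundancy the Equivalence partitions check looks for.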

Missing coverage

(List any missing error paths, edge cases, or scenarios that should be covered)

Recommended changes

(Specific, actionable change)
(Specific, actionable change)
...

Step 6: Offer to fix

Ask the user:

Want me to rewrite the story with these fixes applied?
Want me to split this story if it's too large?
Any issues you disagree with?

Remember: this draft multiplies your output, not your judgment. You own the final narrative.

Related skills

/story-write -- write the story this skill reviews
/test-case-design -- expand AC with boundary value analysis (BVA), equivalence partitioning, and decision tables
/edge-case-hunt -- find persona-driven edge cases
/e2e-test-gen -- generate failing e2e tests from the Gherkin AC

Output location

Present the review as formatted text in the conversation. If the user asks, rewrite the story with fixes applied.

Example Output

Input

Story title: Export Transactions to CSV
Team: Rho Finance, payments product squad
Story as written:

As a finance admin, I want to export my transaction history to CSV so that I can share it with my accountant.

Acceptance Criteria:

Given the user is on the Transactions page, when they click Export, then a CSV file downloads.

The CSV should contain the appropriate columns.

Large exports should complete in a reasonable time.

Given the user selects a date range, when they click Export, then only transactions in that range are included.

Story source: Pasted from Linear, no Figma link attached
Iteration target: Sprint starting Monday

Output (abbreviated)

Story Review: Export Transactions to CSV

Summary verdict

Needs revision. The core flow is clear and the scope is reasonable, but two acceptance criteria are untestable as written, error states are entirely absent, and column content is unspecified — blocking both implementation and QA.

INVEST scorecard

Criterion	Status	Notes
Independent	Pass	No apparent dependency on unfinished upstream work
Negotiable	Warning	AC #1 specifies a click trigger — acceptable, but leaves no room to discuss keyboard shortcut or bulk-action menu alternatives
Valuable	Pass	"So that" clause names a real workflow; accountant hand-off is a concrete outcome
Estimable	Warning	"Appropriate columns" is unresolved — engineers can't size the mapping work without a column spec
Small	Pass	Scoped to a single export action; completable in 2–3 days once columns are defined
Testable	Fail	AC #2 and AC #3 cannot be verified as written; see below

Acceptance criteria issues

AC #2: "Appropriate columns" is vague — a test cannot pass or fail without a defined list. Fix: Replace with an explicit column spec, e.g., "Then the CSV includes columns: Date, Merchant, Amount (USD), Category, Status, Transaction ID."
AC #3: "Large exports" and "reasonable time" are untestable. What row count? What SLA? Fix: "Given an export of 10,000 transactions, when the user clicks Export, then the file downloads within 8 seconds."
AC #1: The Then clause says "a CSV file downloads" but doesn't specify filename format. Minor, but an e2e assertion needs something to check. Fix: Add "…with filename transactions_YYYY-MM-DD.csv."
AC #4: No criterion covers what happens when the selected date range contains zero transactions. Fix: Add an empty-state AC: "Given no transactions exist in the selected range, when they click Export, then a toast displays 'No transactions found for this date range' and no file downloads."
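To illustrate why the fixes matter, once AC #1 and AC #2 carry concrete values, their Then clauses become direct assertions. The Python sketch below checks a downloaded file's name and header row against the suggested fixes. `check_export` is a hypothetical helper. In a real Playwright or Cypress test, the filename and content would come from the captured download rather than from literals.

```python
import csv
import io
import re

# Values taken from the suggested fixes for AC #1 and AC #2.
EXPECTED_COLUMNS = ["Date", "Merchant", "Amount (USD)", "Category",
                    "Status", "Transaction ID"]
FILENAME_PATTERN = re.compile(r"transactions_\d{4}-\d{2}-\d{2}\.csv")


def check_export(filename: str, content: str) -> list[str]:
    problems = []
    if not FILENAME_PATTERN.fullmatch(filename):
        problems.append(f"Unexpected filename: {filename}")
    # The first CSV row is the header; an empty file yields no header at all.
    header = next(csv.reader(io.StringIO(content)), [])
    missing = [col for col in EXPECTED_COLUMNS if col not in header]
    if missing:
        problems.append("Missing columns: " + ", ".join(missing))
    return problems
```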

Test design quality

Check	Status	Notes
Boundary values specified	Fail	Date range boundaries not tested: what happens with a single-day range? A range where end < start? A future date?
Equivalence partitions covered	Warning	Only one valid export scenario covered; no partition for "user has no transactions at all" or "user has transactions but none match the filter"
E2e testable	Warning	AC #1 and AC #4 could generate Playwright assertions today; AC #2 and AC #3 cannot until values are made concrete

Recommendation: Run /test-case-design to systematically define boundary values for the date range filter and row-count thresholds.
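As an illustration, the date range boundaries flagged in the Boundary values row translate into concrete test inputs. In this Python sketch, the three cases come from the review. The validation rule in `is_valid_range` rejects reversed and future ranges. That rule is an assumption the team would need to confirm, because the story doesn't say how those ranges should behave.

```python
from datetime import date, timedelta


def date_range_cases(today: date) -> list[tuple[str, date, date]]:
    """Boundary cases raised in the review for the date range filter."""
    return [
        ("single-day range", today, today),
        ("end before start", today, today - timedelta(days=1)),
        ("future end date", today, today + timedelta(days=1)),
    ]


def is_valid_range(start: date, end: date, today: date) -> bool:
    # Assumed rule: start must not be after end, and no date may be in the future.
    return start <= end <= today
```

Under this assumed rule, only the single-day range is valid. Each case still needs its own acceptance criterion stating what the user sees.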

Missing coverage

Error states: What happens if the export fails mid-download (server timeout, S3 error)? No AC covers this.
Authorization: Can a non-admin role (e.g., Viewer) reach the Export button? Should it be hidden or disabled?
Large file / browser behavior: Files above ~50 MB may trigger browser download warnings — is that in scope or explicitly out?
Default scope (no date filter applied): What is exported when no range is selected: all time, or the last 30 days?
Loading state: Is there a spinner or progress indicator while the file generates? No UI state defined for exports that take > 1 second.

Recommended changes

Define the column spec explicitly in AC #2 — align with the accountant's actual import format (QuickBooks, Xero, etc.) if known.
Add a numeric SLA to AC #3 — agree on a row-count ceiling and time threshold with engineering before the sprint starts.
Add an error-state AC for export failure: toast message, no partial file download.
Add an empty-state AC for filtered ranges with zero results.
Clarify default export scope when no date range is selected — add to the description or an explicit out-of-scope note.
Add a Figma reference for the Export button placement, loading indicator, and toast design. The design lens shows that none of these is currently specified.

Want me to rewrite the story with these fixes applied? Or split off the permission/authorization behavior into a separate story?
