Use this when the team feels stuck, velocity is declining, or something is off with delivery and you need to systematically identify root causes and recommend targeted interventions.
Related skills: For systems-level root cause analysis using feedback loops and archetypes, see /systems-diagnosis.
Process
Step 1: Gather inputs
Ask the user to provide:
- Iteration metrics — for the last 4-6 iterations: stories committed, completed, carry-overs, and any disruptions.
- Symptoms — what feels wrong? (e.g., "we keep carrying over stories," "demos feel thin," "team morale is low," "stakeholders are frustrated")
- Recent retro themes — what has the team been raising in retrospectives?
- Team composition — who's on the team and any recent changes (people joining, leaving, changing roles)?
Step 2: Form hypotheses
Based on the data and symptoms, map to potential root causes from the failure modes catalog:
| Hypothesis | What to look for |
|---|---|
| Scope creep | Stories growing during iteration, acceptance criteria expanding mid-build, "just one more thing" pattern |
| Story readiness | Stories entering iteration without clear acceptance criteria or designs, or with unresolved questions |
| Estimation accuracy | Consistently under-estimating story size, or no estimation happening at all |
| External blockers | Waiting on client decisions, environment access, third-party APIs |
| Team misalignment | PM, design, and engineering working toward different goals |
| Stakeholder absence | No feedback loop, decisions delayed, surprise rejections in demo |
| Technical debt | Increasing time spent on bugs, slow CI/CD, fragile infrastructure |
| CI/CD friction | Build times > 15 min, flaky tests blocking merges, manual deploy steps, no rollback automation, environment drift between staging and prod |
| Architecture bottleneck | One service in every critical path, database is the scaling constraint, monolith coupling prevents independent deploys, shared libraries force coordinated releases |
| Knowledge concentration | One person owns a critical system (bus factor = 1), PRs wait for specific reviewers, on-call rotation is unbalanced, new team members ramp slowly |
| Pairing issues | Knowledge silos, unbalanced pairs, no pair rotation |
| Meeting overload | More time in meetings than building, ceremony fatigue |
| AI output quality | AI-generated work shipping without review, quality regressions |
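One way to make the mapping step repeatable is to hold part of the catalog as data and shortlist the hypotheses whose signals were actually observed. The sketch below is illustrative only: it covers four rows of the table, the signal phrasings are condensed from the "What to look for" column, and `CATALOG` and `shortlist` are hypothetical names. Counting matched signals is an assumption made here to decide which diagnostic questions to ask first; it is not a confidence score.

```python
# A subset of the failure modes catalog, with signals condensed from the table.
# (Assumption: signals are recorded using these exact phrasings.)
CATALOG = {
    "Story readiness": {
        "unclear acceptance criteria",
        "missing designs",
        "unresolved questions",
    },
    "External blockers": {
        "waiting on client decisions",
        "waiting on environment access",
        "waiting on third-party APIs",
    },
    "Knowledge concentration": {
        "bus factor of 1",
        "PRs wait for specific reviewers",
        "unbalanced on-call rotation",
        "new team members ramp slowly",
    },
    "Pairing issues": {
        "knowledge silos",
        "unbalanced pairs",
        "no pair rotation",
    },
}


def shortlist(observed):
    """Return (hypothesis, matched signals) pairs, most matches first."""
    matches = []
    for hypothesis, signals in CATALOG.items():
        hit = sorted(signals & observed)
        if hit:
            matches.append((hypothesis, hit))
    # Ties are broken alphabetically so the order is deterministic.
    matches.sort(key=lambda m: (-len(m[1]), m[0]))
    return matches


observed = {
    "unclear acceptance criteria",
    "unresolved questions",
    "PRs wait for specific reviewers",
    "no pair rotation",
}
for hypothesis, hit in shortlist(observed):
    print(f"{hypothesis}: {', '.join(hit)}")
```

The shortlist only narrows the field; each candidate still has to be confirmed or ruled out with the diagnostic questions in Step 3.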
Step 3: Ask diagnostic questions
Based on initial hypotheses, ask 3-5 targeted questions to confirm or rule out causes:
- "Are stories fully groomed before they enter the iteration, or do questions come up mid-build?"
- "How often do stories change scope after the team commits to them?"
- "When was the last time a stakeholder attended a demo and gave feedback?"
- "How much time does the team spend on bug fixes vs. new feature work?"
- "Is the team pair programming? How often do pairs rotate?"
Step 4: Generate the diagnosis
Output in this format:
Delivery Diagnostic: (team/engagement name)
Date: (today's date)
Symptom: (what the user described)
Assessment period: (iterations reviewed)
Velocity analysis
| Iteration | Committed | Completed | Rate | Trend |
|---|---|---|---|---|
| (number) | (X) | (Y) | (%) | (baseline / ↑ / ↓ / flat) |
Pattern: (what the numbers show — declining, erratic, stable but low)
Root cause hypotheses (ranked by confidence)
1. (Root cause) — Confidence: High / Medium
- Evidence: (specific signals from the data and conversation)
- Contributing factors: (what's making this worse)
- If unaddressed: (what happens if this continues)
2. (Root cause) — Confidence: High / Medium
- (Same structure)
Ruled out
- (Hypotheses considered but not supported by evidence)
Recommended interventions
| Priority | Intervention | Owner | Try for | Success signal |
|---|---|---|---|---|
| 1 | (specific action) | (role) | (1-2 iterations) | (what changes if it works) |
| 2 | (specific action) | (role) | (timeline) | (what changes) |
What NOT to do
- (Common knee-jerk reactions that won't help — e.g., "adding more people won't fix a scope problem")
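The Rate and Trend columns of the velocity table can be filled in mechanically. The following is a minimal sketch, not a prescribed implementation: `Iteration`, `completion_rate`, and `velocity_rows` are hypothetical names, and the 2-point `tolerance` used to call a change "flat" is an assumption. The sprint figures are taken from the example input later in this document.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    name: str
    committed: int  # points (or stories) committed at planning
    completed: int  # points (or stories) finished by the end


def completion_rate(it: Iteration) -> int:
    """Completed work as a whole-number percentage of committed work."""
    if it.committed <= 0:
        raise ValueError(f"{it.name}: committed must be positive")
    # Integer arithmetic that rounds halves up, so 62.5 becomes 63.
    # (Python's built-in round() rounds halves to even and would give 62.)
    return (200 * it.completed + it.committed) // (2 * it.committed)


def velocity_rows(iterations, tolerance=2):
    """Return (name, committed, completed, rate, trend) per iteration.

    `tolerance` is an assumed noise band in percentage points: a change
    no larger than this counts as "flat".
    """
    rows = []
    previous = None
    for it in iterations:
        rate = completion_rate(it)
        if previous is None:
            trend = "baseline"
        elif rate - previous > tolerance:
            trend = "up"
        elif previous - rate > tolerance:
            trend = "down"
        else:
            trend = "flat"
        rows.append((it.name, it.committed, it.completed, rate, trend))
        previous = rate
    return rows


sprints = [
    Iteration("Sprint 21", 34, 28),
    Iteration("Sprint 22", 30, 22),
    Iteration("Sprint 23", 26, 18),
    Iteration("Sprint 24", 28, 17),
    Iteration("Sprint 25", 24, 15),
]

for name, committed, completed, rate, trend in velocity_rows(sprints):
    print(f"| {name} | {committed} | {completed} | {rate}% | {trend} |")
```

Comparing each iteration with the one before it keeps the trend local; a team that prefers to compare against the first iteration could pass the baseline rate in instead.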
Step 4b: Engineering health signals
When diagnosing an engineering-heavy team, add these indicators to the analysis:
### Engineering health indicators
#### DORA metrics
| Metric | Current | Benchmark (High performer) | Assessment |
|--------|---------|--------------------------|------------|
| Deployment frequency | (how often?) | On-demand / multiple per day | (healthy / needs attention / critical) |
| Lead time for changes | (commit to production) | Less than one day | (healthy / needs attention / critical) |
| Change failure rate | (% of deploys causing failure) | 0-15% | (healthy / needs attention / critical) |
| Mean time to restore | (time to recover from failure) | Less than one hour | (healthy / needs attention / critical) |
#### Build and CI health
| Signal | Current state | Impact |
|--------|-------------|--------|
| Build time | (minutes) | (> 10 min = context-switching tax, > 30 min = batching deploys) |
| CI pass rate | (%) | (< 95% = flaky tests eroding trust, < 90% = broken pipeline) |
| Flaky test count | (number) | (each flaky test = ~30 min/week wasted across team) |
| Deploy pipeline duration | (minutes) | (> 30 min = rollback is slow, hotfixes are painful) |
#### On-call and incident burden
| Signal | Current state | Trend |
|--------|-------------|-------|
| Pages per week | (number) | (increasing / stable / decreasing) |
| Off-hours pages | (% of total) | (> 30% = burnout risk) |
| On-call rotation size | (people) | (< 4 = unsustainable frequency) |
| Incident frequency | (per month) | (trend matters more than absolute number) |
#### Technical debt velocity
| Signal | Current state | Trend |
|--------|-------------|-------|
| % of capacity on debt/bugs | (%) | (> 30% = debt is winning) |
| Age of oldest open bug | (days) | (> 90 days = triage is broken) |
| Dependency update cadence | (last update) | (> 6 months = accumulating risk) |
Use these signals alongside the standard delivery diagnostic to distinguish between process problems and engineering infrastructure problems. A team can have perfect story readiness and still underdeliver if CI takes 40 minutes and deploys fail 20% of the time.
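The thresholds in the tables above can be applied as a simple lookup. This is a sketch under stated assumptions: the signal keys (`build_time_min`, `ci_pass_rate_pct`, and so on), the `assess` and `flagged` helpers, and the "no threshold crossed" label are all hypothetical; only the threshold values and their labels come from the tables. The readings in the usage example are made up for illustration.

```python
# Each signal maps to (predicate, label) pairs, most severe first.
THRESHOLDS = {
    "build_time_min": [
        (lambda v: v > 30, "batching deploys"),
        (lambda v: v > 10, "context-switching tax"),
    ],
    "ci_pass_rate_pct": [
        (lambda v: v < 90, "broken pipeline"),
        (lambda v: v < 95, "flaky tests eroding trust"),
    ],
    "deploy_pipeline_min": [
        (lambda v: v > 30, "rollback is slow, hotfixes are painful"),
    ],
    "off_hours_pages_pct": [(lambda v: v > 30, "burnout risk")],
    "on_call_rotation_size": [(lambda v: v < 4, "unsustainable frequency")],
    "debt_capacity_pct": [(lambda v: v > 30, "debt is winning")],
    "oldest_open_bug_days": [(lambda v: v > 90, "triage is broken")],
    "months_since_dependency_update": [(lambda v: v > 6, "accumulating risk")],
}

OK = "no threshold crossed"


def assess(signal, value):
    """Return the label of the most severe threshold the value crosses."""
    for crossed, label in THRESHOLDS[signal]:
        if crossed(value):
            return label
    return OK


def flagged(readings):
    """Keep only the readings that cross a threshold."""
    results = {signal: assess(signal, value) for signal, value in readings.items()}
    return {signal: label for signal, label in results.items() if label != OK}


# Hypothetical readings, for illustration only.
readings = {
    "build_time_min": 40,
    "ci_pass_rate_pct": 97,
    "on_call_rotation_size": 3,
    "debt_capacity_pct": 25,
}
print(flagged(readings))
```

A reading that crosses no threshold is not proof of health; the tables also ask for the trend, which a single snapshot cannot show.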
Step 5: Discuss
Ask the user:
- Does this diagnosis match what you're experiencing?
- Any evidence that points in a different direction?
- Which intervention do you want to try first?
- Should we raise this in the retro?
Output location
Present the diagnosis as formatted text in the conversation.
Example Output
Input
- Team/engagement: Meridian Health — Patient Portal squad (6 engineers, 1 PM, 1 designer)
- Iteration metrics (last 5 sprints):
  - Sprint 21: committed 34pts, completed 28pts (82%), 2 carryovers
  - Sprint 22: committed 30pts, completed 22pts (73%), 4 carryovers, mid-sprint scope add from compliance team
  - Sprint 23: committed 26pts, completed 18pts (69%), 3 carryovers, one engineer out sick
  - Sprint 24: committed 28pts, completed 17pts (61%), 5 carryovers, "emergency" accessibility fix injected day 3
  - Sprint 25: committed 24pts, completed 15pts (63%), 4 carryovers
- Symptoms: "We keep carrying the same stories across multiple sprints, demos feel thin and stakeholders are visibly frustrated, team morale has dropped noticeably in the last month"
- Retro themes: Stories aren't ready when we pull them in; too many interruptions from compliance and security; the lead engineer reviews almost every PR and is a bottleneck; pairing has dropped off since two new engineers joined in Sprint 22
Delivery Diagnostic: Meridian Health — Patient Portal Squad
Date: June 12, 2025
Symptom: Chronic story carryovers, thin demos, stakeholder frustration, declining morale
Assessment period: Sprints 21–25
Velocity analysis
| Iteration | Committed | Completed | Rate | Trend |
|---|---|---|---|---|
| Sprint 21 | 34 | 28 | 82% | Baseline |
| Sprint 22 | 30 | 22 | 73% | ↓ |
| Sprint 23 | 26 | 18 | 69% | ↓ |
| Sprint 24 | 28 | 17 | 61% | ↓ |
| Sprint 25 | 24 | 15 | 63% | Flat/low |
Pattern: Sustained decline over 5 sprints despite the team self-correcting by lowering commitments. Completion rate has fallen 19 percentage points from baseline (82% to 63%). Lowering commitment without fixing root causes is masking the problem, not solving it.
Root cause hypotheses (ranked by confidence)
1. Story readiness — Confidence: High
- Evidence: Retro feedback explicitly names stories arriving ungroomed; 4 of 5 sprints show carryovers beginning mid-sprint, not at the end — classic sign that questions surface during build, not before; carryover count is increasing even as commitment decreases
- Contributing factors: Compliance and security teams injecting requirements post-commit; designer and PM may not be completing grooming before sprint planning
- If unaddressed: Carryover rate will continue to climb; engineers will stop trusting that accepted stories are actually ready; planning becomes performative
2. Knowledge concentration / PR bottleneck — Confidence: High
- Evidence: Team retro explicitly names the lead engineer as a PR review bottleneck; two new engineers joined in Sprint 22 — exactly when velocity began its steepest drop; pairing has declined since their arrival, meaning onboarding is happening through solo work and slow PR feedback loops
- Contributing factors: No defined PR review rotation; new engineers may be producing more review-intensive work; lead engineer is likely context-switching between coding and reviewing
- If unaddressed: Bus factor stays at 1 for critical subsystems; new engineers keep ramping slowly; lead burns out; velocity stays depressed even when story readiness improves
3. External blockers / unplanned injection — Confidence: Medium
- Evidence: Sprint 22 had a mid-sprint compliance scope add; Sprint 24 had an injected accessibility fix on day 3; these aren't outliers — they're becoming a pattern without a defined intake process
- Contributing factors: No formal mechanism for compliance/security to submit urgent work; PM may lack authority to push back or defer non-emergency requests
- If unaddressed: Sprint planning commitments become meaningless; team loses sense of predictability; stakeholder frustration compounds because demos reflect interrupted, half-finished work
Ruled out
- Estimation accuracy: The team has cut commitment from 34 to 24 points over the period (with one small rebound in Sprint 24), showing self-awareness, yet the completion rate kept falling; the gap between committed and completed points to execution problems, not sizing problems
- Meeting overload — Not raised in retros; no data pointing to ceremony bloat
- Team misalignment — PM, design, and engineering appear to be working on the same goals; the issue is upstream readiness, not directional disagreement
Recommended interventions
| Priority | Intervention | Owner | Try for | Success signal |
|---|---|---|---|---|
| 1 | Implement a story readiness checklist (AC written, designs attached, open questions resolved) — no story enters sprint planning without it | PM + Designer | 2 sprints | Carryovers drop below 2 per sprint; mid-sprint question volume falls |
| 2 | Establish a PR review rotation: every engineer reviews, lead engineer approves only when architecture judgment is needed | Lead Engineer + PM | 1 sprint | PR cycle time drops below 4 hours; lead's review queue shrinks by 50% |
| 3 | Restart pair rotation, intentionally pairing new engineers with mid-level engineers rather than only the lead | Lead Engineer | 2 sprints | New engineers begin merging PRs with less review churn; knowledge spread improves |
| 4 | Create an injection protocol: compliance/security requests go to PM backlog and are scheduled — only P0 safety issues bypass planning | PM | 1 sprint | Zero unplanned mid-sprint injections; compliance team has a clear, respected intake path |
What NOT to do
- Don't keep reducing sprint commitment without fixing readiness — you'll hit a floor where demos are so thin that stakeholder trust collapses entirely
- Don't add a second senior engineer to solve the PR bottleneck before fixing the rotation process — another senior will just become a second bottleneck
- Don't run a morale initiative (team lunch, recognition program) as a substitute for fixing the structural problems — morale is a symptom here, not the root cause; it will recover when work starts flowing again
Engineering health indicators
DORA metrics
| Metric | Current | Benchmark (High performer) | Assessment |
|---|---|---|---|
| Deployment frequency | Once per sprint (2 weeks) | On-demand / multiple per day | Needs attention |
| Lead time for changes | ~8 days (commit to staging review) | Less than one day | Critical |
| Change failure rate | ~18% (2 of last 11 deploys caused rollback) | 0–15% | Needs attention |
| Mean time to restore | ~3 hours average | Less than one hour | Needs attention |
Build and CI health
| Signal | Current state | Impact |
|---|---|---|
| Build time | 22 minutes | Context-switching tax; engineers not waiting for green before moving on |
| CI pass rate | 91% | Approaching broken-pipeline threshold; flaky tests eroding trust in CI signal |
| Flaky test count | ~14 known flaky tests | ~7 hours/week wasted team-wide; engineers beginning to ignore red builds |
| Deploy pipeline duration | 38 minutes | Rollback is slow; hotfixes are painful; contributing to batched deploys |
CI/CD note: The 22-minute build time and 38-minute deploy pipeline are compounding the PR bottleneck. Even if review rotation improves, engineers are losing 60+ minutes per PR cycle waiting on automation. This should be addressed in parallel with process interventions.
Next questions for discussion
- Does this diagnosis match what you're experiencing on the ground — especially the readiness and PR bottleneck findings?
- The compliance injection pattern: does the PM have the authority to defer those requests, or is there an organizational dynamic we need to surface to leadership?
- Which intervention do you want to run first — story readiness gate or PR rotation? (Recommend readiness gate as it has the highest leverage on carryovers.)
- Should we bring the PR bottleneck finding directly to the lead engineer, or surface it first in the retro as a team-owned problem?
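The derived figures in the example can be checked with a few lines of arithmetic. This is a worked check rather than part of the diagnostic itself: the variable names are hypothetical, the 30 minutes per flaky test per week comes from the Build and CI health table, and treating one build plus one deploy as a single sequential wait per PR cycle is an assumption.

```python
# Flaky-test cost: 14 known flaky tests at ~30 minutes per test per week.
flaky_tests = 14
minutes_per_flaky_test_per_week = 30
flaky_hours_per_week = flaky_tests * minutes_per_flaky_test_per_week / 60

# Change failure rate: 2 of the last 11 deploys caused a rollback.
failed_deploys = 2
total_deploys = 11
change_failure_rate_pct = 100 * failed_deploys / total_deploys

# Automation wait per PR cycle, assuming one build and one deploy run
# back to back with no reruns.
build_time_min = 22
deploy_pipeline_min = 38
automation_wait_min = build_time_min + deploy_pipeline_min

# Completion-rate drop from Sprint 21 to Sprint 25, in percentage points.
completion_drop_points = 82 - 63

print(f"Flaky-test cost: {flaky_hours_per_week:.0f} hours/week")
print(f"Change failure rate: {change_failure_rate_pct:.0f}%")
print(f"Automation wait per PR cycle: {automation_wait_min} minutes")
print(f"Completion-rate drop: {completion_drop_points} points")
```

Any rerun caused by a flaky test adds another build to the wait, which is why the note in the example reads the 60 minutes as a lower bound.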