Delivery Diagnose - AI Agent Skill

Use this when the team feels stuck, velocity is declining, or something is off with delivery and you need to systematically identify root causes and recommend targeted interventions.

Related skills: For systems-level root cause analysis using feedback loops and archetypes, see /systems-diagnosis.

Process

Step 1: Gather inputs

Ask the user to provide:

Iteration metrics -- for the last 4-6 iterations: stories committed, stories completed, carryovers, and any disruptions.
Symptoms -- what feels wrong? (e.g., "we keep carrying over stories," "demos feel thin," "team morale is low," "stakeholders are frustrated")
Recent retro themes -- what has the team been raising in retrospectives?
Team composition -- who's on the team and any recent changes (people joining, leaving, changing roles)?
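
The inputs above can be captured in a small structure before any analysis starts. The following is a minimal sketch in Python; the class, field, and method names (`Iteration`, `DiagnosticInput`, `recent_iterations`) are illustrative assumptions and not part of the skill itself. It only encodes what Step 1 asks for, including the "last 4-6 iterations" window.

```python
from dataclasses import dataclass, field


@dataclass
class Iteration:
    """Metrics for one iteration, as requested in Step 1."""
    name: str
    committed: int          # stories (or points) committed
    completed: int          # stories (or points) completed
    carryovers: int = 0     # stories carried into the next iteration
    disruptions: list[str] = field(default_factory=list)


@dataclass
class DiagnosticInput:
    """Hypothetical container for everything Step 1 asks the user to provide."""
    iterations: list[Iteration]
    symptoms: list[str]
    retro_themes: list[str]
    team_changes: list[str] = field(default_factory=list)

    def recent_iterations(self, minimum: int = 4, maximum: int = 6) -> list[Iteration]:
        """Return the last 4-6 iterations, or raise if there is too little history."""
        if len(self.iterations) < minimum:
            raise ValueError(
                f"need at least {minimum} iterations, got {len(self.iterations)}"
            )
        return self.iterations[-maximum:]
```

Raising on fewer than four iterations is a design choice for this sketch: a trend read from two or three data points is easy to over-interpret, so it seems safer to ask for more history than to proceed.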

Step 2: Form hypotheses

Based on the data and symptoms, map to potential root causes from the failure modes catalog:

Hypothesis	What to look for
Scope creep	Stories growing during iteration, acceptance criteria expanding mid-build, "just one more thing" pattern
Story readiness	Stories entering the iteration without clear acceptance criteria or designs, or with unresolved questions
Estimation accuracy	Consistently under-estimating story size, no estimation happening at all
External blockers	Waiting on client decisions, environment access, third-party APIs
Team misalignment	PM, design, and engineering working toward different goals
Stakeholder absence	No feedback loop, decisions delayed, surprise rejections in demo
Technical debt	Increasing time spent on bugs, slow CI/CD, fragile infrastructure
CI/CD friction	Build times > 15 min, flaky tests blocking merges, manual deploy steps, no rollback automation, environment drift between staging and prod
Architecture bottleneck	One service in every critical path, database is the scaling constraint, monolith coupling prevents independent deploys, shared libraries force coordinated releases
Knowledge concentration	One person owns a critical system (bus factor = 1), PRs wait for specific reviewers, on-call rotation is unbalanced, new team members ramp slowly
Pairing issues	Knowledge silos, unbalanced pairs, no pair rotation
Meeting overload	More time in meetings than building, ceremony fatigue
AI output quality	AI-generated work shipping without review, quality regressions

Step 3: Ask diagnostic questions

Based on initial hypotheses, ask 3-5 targeted questions to confirm or rule out causes:

"Are stories fully groomed before they enter the iteration, or do questions come up mid-build?"
"How often do stories change scope after the team commits to them?"
"When was the last time a stakeholder attended a demo and gave feedback?"
"How much time does the team spend on bug fixes vs. new feature work?"
"Is the team pair programming? How often do pairs rotate?"

Step 4: Generate the diagnosis

Output in this format:

Delivery Diagnostic: (team/engagement name)

Date: (today's date)
Symptom: (what the user described)
Assessment period: (iterations reviewed)

Velocity analysis

Iteration	Committed	Completed	Rate	Trend
(number)	(X)	(Y)	(%)	(↑ / ↓ / flat)

Pattern: (what the numbers show -- declining, erratic, stable but low)

Root cause hypotheses (ranked by confidence)

1. (Root cause) -- Confidence: High / Medium

Evidence: (specific signals from the data and conversation)
Contributing factors: (what's making this worse)
If unaddressed: (what happens if this continues)

2. (Root cause) -- Confidence: High / Medium

(Same structure)

Ruled out

(Hypotheses considered but not supported by evidence)

Recommended interventions

Priority	Intervention	Owner	Try for	Success signal
1	(specific action)	(role)	(1-2 iterations)	(what changes if it works)
2	(specific action)	(role)	(timeline)	(what changes)

What NOT to do

(Common knee-jerk reactions that won't help -- e.g., "adding more people won't fix a scope problem")

Step 4b: Engineering health signals

When diagnosing an engineering-heavy team, add these indicators to the analysis:

### Engineering health indicators

#### DORA metrics
| Metric | Current | Benchmark (High performer) | Assessment |
|--------|---------|--------------------------|------------|
| Deployment frequency | (how often?) | On-demand / multiple per day | (healthy / needs attention / critical) |
| Lead time for changes | (commit to production) | Less than one day | (healthy / needs attention / critical) |
| Change failure rate | (% of deploys causing failure) | 0-15% | (healthy / needs attention / critical) |
| Mean time to restore | (time to recover from failure) | Less than one hour | (healthy / needs attention / critical) |

#### Build and CI health
| Signal | Current state | Impact |
|--------|-------------|--------|
| Build time | (minutes) | (> 10 min = context-switching tax, > 30 min = batching deploys) |
| CI pass rate | (%) | (< 95% = flaky tests eroding trust, < 90% = broken pipeline) |
| Flaky test count | (number) | (each flaky test = ~30 min/week wasted across team) |
| Deploy pipeline duration | (minutes) | (> 30 min = rollback is slow, hotfixes are painful) |

#### On-call and incident burden
| Signal | Current state | Trend |
|--------|-------------|-------|
| Pages per week | (number) | (increasing / stable / decreasing) |
| Off-hours pages | (% of total) | (> 30% = burnout risk) |
| On-call rotation size | (people) | (< 4 = unsustainable frequency) |
| Incident frequency | (per month) | (trend matters more than absolute number) |

#### Technical debt velocity
| Signal | Current state | Trend |
|--------|-------------|-------|
| % of capacity on debt/bugs | (%) | (> 30% = debt is winning) |
| Age of oldest open bug | (days) | (> 90 days = triage is broken) |
| Dependency update cadence | (last update) | (> 6 months = accumulating risk) |

Use these signals alongside the standard delivery diagnostic to distinguish between process problems and engineering infrastructure problems. A team can have perfect story readiness and still underdeliver if CI takes 40 minutes and deploys fail 20% of the time.
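
The threshold rules in the on-call and technical debt tables above can be expressed as simple checks. This is a minimal sketch, assuming the signals have already been collected as plain numbers; the function name `oncall_and_debt_flags`, its parameter names, and the sample team values are hypothetical. The thresholds and most of the labels are taken from the tables.

```python
def oncall_and_debt_flags(
    off_hours_page_pct: float,
    rotation_size: int,
    debt_capacity_pct: float,
    oldest_bug_age_days: int,
    months_since_dependency_update: float,
) -> list[str]:
    """Return the warning labels whose threshold from the tables is crossed."""
    flags = []
    if off_hours_page_pct > 30:
        flags.append("burnout risk")
    if rotation_size < 4:
        flags.append("unsustainable on-call frequency")
    if debt_capacity_pct > 30:
        flags.append("debt is winning")
    if oldest_bug_age_days > 90:
        flags.append("triage is broken")
    if months_since_dependency_update > 6:
        flags.append("accumulating dependency risk")
    return flags


# Hypothetical team: 40% of pages off-hours, 3-person rotation, 25% of capacity
# on debt, oldest open bug 120 days old, dependencies updated 2 months ago.
print(oncall_and_debt_flags(40, 3, 25, 120, 2))
```

The tables also ask for a trend on each signal, which a single snapshot like this cannot show; comparing the flags across several iterations would be one way to approximate it.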

Step 5: Discuss

Ask the user:

Does this diagnosis match what you're experiencing?
Any evidence that points in a different direction?
Which intervention do you want to try first?
Should we raise this in the retro?

Output location

Present the diagnosis as formatted text in the conversation.

Example Output

Input

Team/engagement: Meridian Health — Patient Portal squad (6 engineers, 1 PM, 1 designer)
Iteration metrics (last 5 sprints):
- Sprint 21: committed 34pts, completed 28pts (82%), 2 carryovers
- Sprint 22: committed 30pts, completed 22pts (73%), 4 carryovers, mid-sprint scope add from compliance team
- Sprint 23: committed 26pts, completed 18pts (69%), 3 carryovers, one engineer out sick
- Sprint 24: committed 28pts, completed 17pts (61%), 5 carryovers, "emergency" accessibility fix injected day 3
- Sprint 25: committed 24pts, completed 15pts (63%), 4 carryovers
Symptoms: "We keep carrying the same stories across multiple sprints, demos feel thin and stakeholders are visibly frustrated, team morale has dropped noticeably in the last month"
Retro themes: Stories aren't ready when we pull them in; too many interruptions from compliance and security; the lead engineer reviews almost every PR and is a bottleneck; pairing has dropped off since two new engineers joined in Sprint 22

Delivery Diagnostic: Meridian Health — Patient Portal Squad

Date: June 12, 2025
Symptom: Chronic story carryovers, thin demos, stakeholder frustration, declining morale
Assessment period: Sprints 21–25

Velocity analysis

Iteration	Committed	Completed	Rate	Trend
Sprint 21	34	28	82%	Baseline
Sprint 22	30	22	73%	↓
Sprint 23	26	18	69%	↓
Sprint 24	28	17	61%	↓
Sprint 25	24	15	63%	Flat/low

Pattern: Completion rate fell every sprint from Sprint 21 (82%) to Sprint 24 (61%), then held roughly flat at 63% in Sprint 25, 19 points below baseline. The team has tried to self-correct by lowering its commitment from 34 to 24 points, but lowering commitment without fixing root causes masks the problem rather than solving it.
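
The rates in the velocity table can be reproduced from the committed and completed points in the input. The sketch below is one way to do that arithmetic; the helper names (`completion_rate`, `velocity_rows`) are illustrative, and rounding halves up is an assumption chosen so that 15/24 (62.5%) matches the 63% shown for Sprint 25.

```python
# Example data from Sprints 21-25: (name, committed points, completed points)
SPRINTS = [
    ("Sprint 21", 34, 28),
    ("Sprint 22", 30, 22),
    ("Sprint 23", 26, 18),
    ("Sprint 24", 28, 17),
    ("Sprint 25", 24, 15),
]


def completion_rate(committed: int, completed: int) -> int:
    """Completion rate as a whole percent, rounding halves up."""
    return (200 * completed + committed) // (2 * committed)


def velocity_rows(sprints):
    """Yield (name, rate, change vs. previous sprint in percentage points)."""
    previous = None
    for name, committed, completed in sprints:
        rate = completion_rate(committed, completed)
        delta = None if previous is None else rate - previous
        yield name, rate, delta
        previous = rate


rows = list(velocity_rows(SPRINTS))
for name, rate, delta in rows:
    change = "baseline" if delta is None else f"{delta:+d} pts"
    print(f"{name}: {rate}% ({change})")

# Drop from the first sprint (baseline) to the most recent one.
drop_from_baseline = rows[0][1] - rows[-1][1]
print(f"Drop from baseline: {drop_from_baseline} points")
```

The per-sprint changes come out as -9, -4, -8, and +2 points. The small uptick in Sprint 25 is why its trend reads as flat/low rather than as a further decline.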

Root cause hypotheses (ranked by confidence)

1. Story readiness — Confidence: High

Evidence: Retro feedback explicitly names stories arriving ungroomed; the same stories are being carried across multiple sprints, which suggests that questions surface during build rather than before it; carryovers rose from 2 in Sprint 21 to 4 or 5 in Sprints 24 and 25 even as commitment fell
Contributing factors: Compliance and security teams injecting requirements post-commit; designer and PM may not be completing grooming before sprint planning
If unaddressed: Carryover rate will continue to climb; engineers will stop trusting that accepted stories are actually ready; planning becomes performative

2. Knowledge concentration / PR bottleneck — Confidence: High

Evidence: Team retro explicitly names the lead engineer as a PR review bottleneck; two new engineers joined in Sprint 22 — exactly when velocity began its steepest drop; pairing has declined since their arrival, meaning onboarding is happening through solo work and slow PR feedback loops
Contributing factors: No defined PR review rotation; new engineers may be producing more review-intensive work; lead engineer is likely context-switching between coding and reviewing
If unaddressed: Bus factor stays at 1 for critical subsystems; new engineers stay slow to ramp up; lead burns out; velocity stays depressed even when story readiness improves

3. External blockers / unplanned injection — Confidence: Medium

Evidence: Sprint 22 had a mid-sprint compliance scope add; Sprint 24 had an injected accessibility fix on day 3; these aren't outliers — they're becoming a pattern without a defined intake process
Contributing factors: No formal mechanism for compliance/security to submit urgent work; PM may lack authority to push back or defer non-emergency requests
If unaddressed: Sprint planning commitments become meaningless; team loses sense of predictability; stakeholder frustration compounds because demos reflect interrupted, half-finished work

Ruled out

Estimation accuracy -- The team has cut its commitment from 34 to 24 points over the period, which shows self-awareness; the gap between committed and completed points to execution problems, not sizing problems
Meeting overload -- Not raised in retros; no data pointing to ceremony bloat
Team misalignment -- PM, design, and engineering appear to be working on the same goals; the issue is upstream readiness, not directional disagreement

Recommended interventions

Priority	Intervention	Owner	Try for	Success signal
1	Implement a story readiness checklist (AC written, designs attached, open questions resolved) — no story enters sprint planning without it	PM + Designer	2 sprints	Carryovers drop below 2 per sprint; mid-sprint question volume falls
2	Establish a PR review rotation: every engineer reviews, lead engineer approves only when architecture judgment is needed	Lead Engineer + PM	1 sprint	PR cycle time drops below 4 hours; lead's review queue shrinks by 50%
3	Restart pair rotation, intentionally pairing new engineers with mid-level engineers and not only with the lead	Lead Engineer	2 sprints	New engineers begin merging PRs with less review churn; knowledge spread improves
4	Create an injection protocol: compliance/security requests go to PM backlog and are scheduled — only P0 safety issues bypass planning	PM	1 sprint	Zero unplanned mid-sprint injections; compliance team has a clear, respected intake path

What NOT to do

Don't keep reducing sprint commitment without fixing readiness — you'll hit a floor where demos are so thin that stakeholder trust collapses entirely
Don't add a second senior engineer to solve the PR bottleneck before fixing the rotation process — another senior will just become a second bottleneck
Don't run a morale initiative (team lunch, recognition program) as a substitute for fixing the structural problems — morale is a symptom here, not the root cause; it will recover when work starts flowing again

Engineering health indicators

DORA metrics

Metric	Current	Benchmark (High performer)	Assessment
Deployment frequency	Once per sprint (2 weeks)	On-demand / multiple per day	Needs attention
Lead time for changes	~8 days (commit to staging review)	Less than one day	Critical
Change failure rate	~18% (2 of last 11 deploys caused rollback)	0–15%	Needs attention
Mean time to restore	~3 hours average	Less than one hour	Needs attention
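
The comparison against the high-performer benchmarks is simple arithmetic, sketched below with the example values from the table. Deployment frequency is left out because "on-demand" is not a single number; the variable names are illustrative, and the sketch reports only whether each benchmark is met, not the healthy / needs attention / critical grading, which involves judgment.

```python
# Example values from the DORA table above.
lead_time_days = 8                      # commit to staging review
failed_deploys, total_deploys = 2, 11   # deploys that caused a rollback
restore_hours = 3                       # average time to restore

change_failure_rate = 100 * failed_deploys / total_deploys  # percent

meets_benchmark = {
    "lead time for changes": lead_time_days < 1,        # less than one day
    "change failure rate": change_failure_rate <= 15,   # 0-15%
    "mean time to restore": restore_hours < 1,          # less than one hour
}

print(f"Change failure rate: {change_failure_rate:.0f}%")
for metric, ok in meets_benchmark.items():
    print(f"{metric}: {'meets' if ok else 'misses'} high-performer benchmark")
```

All three metrics miss the benchmark here. Lead time is furthest off (8 days against less than one day), which is consistent with it being the only one rated Critical.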

Build and CI health

Signal	Current state	Impact
Build time	22 minutes	Context-switching tax; engineers not waiting for green before moving on
CI pass rate	91%	Approaching broken-pipeline threshold; flaky tests eroding trust in CI signal
Flaky test count	~14 known flaky tests	~7 hours/week wasted team-wide; engineers beginning to ignore red builds
Deploy pipeline duration	38 minutes	Rollback is slow; hotfixes are painful; contributing to batched deploys

CI/CD note: The 22-minute build time and 38-minute deploy pipeline are compounding the PR bottleneck. Even if review rotation improves, engineers are losing 60+ minutes per PR cycle waiting on automation. This should be addressed in parallel with process interventions.
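
The figures in the build and CI table follow from the thresholds in Step 4b, and a short sketch makes the arithmetic explicit. It assumes, as the note above does, that the wait per PR cycle is the build time plus the deploy pipeline duration; the variable names and finding labels are illustrative.

```python
build_minutes = 22
deploy_pipeline_minutes = 38
ci_pass_rate = 91          # percent
flaky_tests = 14

MINUTES_PER_FLAKY_TEST_PER_WEEK = 30  # rule of thumb from the Step 4b table

flaky_hours_per_week = flaky_tests * MINUTES_PER_FLAKY_TEST_PER_WEEK / 60
automation_wait_minutes = build_minutes + deploy_pipeline_minutes

# Apply the Step 4b thresholds, most severe band first.
findings = []
if build_minutes > 30:
    findings.append("build: batching deploys")
elif build_minutes > 10:
    findings.append("build: context-switching tax")
if ci_pass_rate < 90:
    findings.append("CI: broken pipeline")
elif ci_pass_rate < 95:
    findings.append("CI: flaky tests eroding trust")
if deploy_pipeline_minutes > 30:
    findings.append("deploy: slow rollback, painful hotfixes")

print(f"Flaky-test cost: {flaky_hours_per_week:.0f} hours/week")
print(f"Automation wait per PR cycle: {automation_wait_minutes} minutes")
print(findings)
```

With these inputs the flaky-test cost is 7 hours per week and the automation wait is 60 minutes, matching the table and the note.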

Next questions for discussion

Does this diagnosis match what you're experiencing on the ground — especially the readiness and PR bottleneck findings?
The compliance injection pattern: does the PM have the authority to defer those requests, or is there an organizational dynamic we need to surface to leadership?
Which intervention do you want to run first — story readiness gate or PR rotation? (Recommend readiness gate as it has the highest leverage on carryovers.)
Should we bring the PR bottleneck finding directly to the lead engineer, or surface it first in the retro as a team-owned problem?
