Use this when engineering leadership wants to understand what's slowing developers down, where the team's biggest friction points are, and which investments would have the highest impact on productivity and retention. This produces a structured developer experience (DX) assessment covering tooling, workflow, knowledge access, cognitive load, and team health signals.
Related skills: DX friction often surfaces as a symptom in /delivery-diagnose. Tooling and infrastructure debt feeds into /tech-debt-assessment. On-call and incident burden connects to /incident-review patterns.
Process
Step 1: Gather inputs
Ask the user to provide:
- Team scope -- which team or teams are being assessed? How many engineers?
- Known frustrations -- what do engineers complain about most? What comes up in retros?
- Recent changes -- any new tooling, process changes, re-orgs, or platform migrations in the last 3-6 months?
- Attrition signals -- has anyone left recently citing engineering culture or tooling? Are people disengaged?
- Available data -- do you have survey results, CI metrics, PR cycle times, on-call logs, or DORA metrics to reference?
Step 2: Assess across DX dimensions
Evaluate the developer experience across six dimensions. Rate each dimension as Strong / Adequate / Weak based on the evidence gathered:
2a. Local development environment
| Signal | What to check | Red flag |
|---|---|---|
| Setup time | How long for a new engineer to go from clone to running app? | > 1 day |
| Reliability | Does the local env break frequently? Do engineers waste time fixing it? | Weekly "it works on my machine" issues |
| Parity | Does local env match staging/prod? | Significant drift causing surprises at deploy |
| Documentation | Is setup documented and current? | Engineers rely on asking teammates |
2b. CI/CD and deployment
| Signal | What to check | Red flag |
|---|---|---|
| Build time | End-to-end CI pipeline duration | > 15 minutes |
| Flaky tests | Percentage of test runs that fail then pass on retry | > 2% |
| Deploy frequency | How often the team deploys to production | < 1/week for an active team |
| Deploy confidence | Do engineers deploy without anxiety? | Deploys require "deploy captain" or are batched weekly |
| Rollback ease | Can a bad deploy be reverted quickly? | Manual rollback or no rollback at all |
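
The flaky-test signal above is defined as the percentage of test runs that fail and then pass on retry. A minimal sketch of that calculation follows; the `CiRun` record and its field names are hypothetical, and real CI systems expose this data in different shapes.

```python
from dataclasses import dataclass


@dataclass
class CiRun:
    """One CI test run. Field names are hypothetical, not from any CI API."""
    run_id: str
    failed_first_attempt: bool
    passed_on_retry: bool


def flaky_rate(runs: list[CiRun]) -> float:
    """Percentage of runs that failed and then passed on retry."""
    if not runs:
        return 0.0
    flaky = sum(1 for r in runs if r.failed_first_attempt and r.passed_on_retry)
    return 100.0 * flaky / len(runs)


# Illustrative data only.
runs = [
    CiRun("r1", False, False),
    CiRun("r2", True, True),   # flaky: failed, then passed on retry
    CiRun("r3", True, False),  # genuine failure, not counted as flaky
    CiRun("r4", False, False),
]

rate = flaky_rate(runs)
# The 2% red-flag threshold comes from the table above.
print(f"flaky rate: {rate:.1f}% (red flag: {rate > 2.0})")
```

Counting only fail-then-pass runs keeps genuine failures out of the flaky figure, which matches the definition in the table.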
2c. Code review and collaboration
| Signal | What to check | Red flag |
|---|---|---|
| Time to first review | Median time from PR open to first review | > 24 hours |
| PR cycle time | Median time from PR open to merge | > 2 business days |
| Review bottleneck | Do PRs wait for specific people? | > 30% of PRs blocked on one person |
| Review quality | Are reviews substantive or rubber-stamps? | Mostly "LGTM" with no comments |
| Pairing culture | Does the team pair or mob? | Knowledge stuck in silos |
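
Two of the review signals above are straightforward to compute from PR history. The sketch below assumes a hypothetical list of `(opened, first_review, first_reviewer)` tuples, and treats the first reviewer as a rough proxy for who a PR was waiting on, which is an assumption rather than something the table defines.

```python
from collections import Counter
from datetime import datetime
from statistics import median

# Hypothetical PR records: (opened, first_review, first_reviewer).
prs = [
    (datetime(2025, 7, 1, 9), datetime(2025, 7, 1, 15), "alice"),
    (datetime(2025, 7, 1, 10), datetime(2025, 7, 3, 10), "alice"),
    (datetime(2025, 7, 2, 9), datetime(2025, 7, 2, 11), "bob"),
    (datetime(2025, 7, 3, 9), datetime(2025, 7, 4, 15), "alice"),
]


def median_hours_to_first_review(records):
    """Median wall-clock hours from PR open to first review."""
    hours = [(rev - opened).total_seconds() / 3600 for opened, rev, _ in records]
    return median(hours)


def top_reviewer_share(records):
    """Return (reviewer, percent of PRs) for the most frequent first reviewer."""
    counts = Counter(reviewer for _, _, reviewer in records)
    reviewer, n = counts.most_common(1)[0]
    return reviewer, 100.0 * n / len(records)


hours = median_hours_to_first_review(prs)
reviewer, share = top_reviewer_share(prs)
# Thresholds (24 hours, 30%) come from the table above.
print(f"median time to first review: {hours:.1f} h (red flag: {hours > 24})")
print(f"{reviewer} is first reviewer on {share:.0f}% of PRs (red flag: {share > 30})")
```

This uses wall-clock hours; a version that counts only business hours would need a working-calendar assumption that the table does not specify.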
2d. Knowledge access and onboarding
| Signal | What to check | Red flag |
|---|---|---|
| Onboarding time | How long until a new engineer ships their first meaningful PR? | > 4 weeks |
| Documentation | Are architecture decisions, runbooks, and domain context documented? | "Ask Sarah, she knows" |
| Bus factor | Are there critical systems only one person understands? | Bus factor = 1 for any production system |
| Search and discovery | Can engineers find answers without asking someone? | Tribal knowledge dominates |
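
The bus factor check above can be made concrete by listing, for each production system, who can operate it unaided. This is a minimal sketch with a hypothetical `knowledge_map`; in practice the map would come from interviews or ownership records, not from code.

```python
# Hypothetical map of production systems to engineers who can operate them unaided.
knowledge_map = {
    "auth-service": {"dana"},
    "billing": {"dana", "lee", "sam"},
    "deploy-tooling": {"lee", "sam"},
    "legacy-reports": set(),
}


def at_risk_systems(km):
    """Systems with a bus factor of 1 or 0, sorted by name."""
    return sorted(name for name, people in km.items() if len(people) <= 1)


for name in at_risk_systems(knowledge_map):
    print(f"{name}: bus factor = {len(knowledge_map[name])}")
```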
2e. On-call and operational burden
| Signal | What to check | Red flag |
|---|---|---|
| On-call rotation | Is on-call distributed fairly? | Same 2-3 people always on-call |
| Alert noise | Signal-to-noise ratio of production alerts | > 50% of alerts are non-actionable |
| Incident frequency | How often does the team get paged? | > 2 incidents/week requiring human response |
| Toil | Time spent on repetitive operational work | > 20% of engineering time on toil |
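
The alert-noise and incident-frequency signals above reduce to two small ratios. The sketch below assumes a hypothetical list of alert records with an `actionable` flag; how an alert gets classified as actionable is left to the team.

```python
def non_actionable_pct(alerts):
    """Percentage of alerts marked non-actionable."""
    if not alerts:
        return 0.0
    noise = sum(1 for a in alerts if not a["actionable"])
    return 100.0 * noise / len(alerts)


def incidents_per_week(incident_count, period_days):
    """Average incidents per 7-day week over the observation period."""
    return 7 * incident_count / period_days


# Illustrative data only: 10 alerts, 6 of them noise, and 18 incidents over 30 days.
alerts = [{"actionable": i < 4} for i in range(10)]
noise = non_actionable_pct(alerts)
weekly = incidents_per_week(18, 30)

# Thresholds (50% non-actionable, 2 incidents/week) come from the table above.
print(f"non-actionable alerts: {noise:.0f}% (red flag: {noise > 50})")
print(f"incidents per week: {weekly:.1f} (red flag: {weekly > 2})")
```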
2f. Cognitive load and flow
| Signal | What to check | Red flag |
|---|---|---|
| Context switching | How often are engineers interrupted by meetings, Slack, or incidents? | > 3 context switches per focused work block |
| Meeting load | Percentage of the week in meetings | > 30% for individual contributors |
| Scope of ownership | How many systems does each engineer own? | Ownership so broad that nothing gets deep attention |
| Decision autonomy | Can engineers make technical decisions without escalation? | Every decision requires manager or architect approval |
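
Many of the red flags in Step 2 are numeric, so they can be checked mechanically once the data is gathered. The sketch below encodes those thresholds from the tables above; the metric key names are hypothetical. It only lists tripped flags per dimension. Turning flags into a Strong / Adequate / Weak rating still needs judgment, because the qualitative signals are not captured here.

```python
import operator

# metric name -> (dimension, comparison, threshold), taken from the Step 2 tables.
# Metric key names are hypothetical.
RED_FLAGS = {
    "setup_days": ("Local development", operator.gt, 1),
    "ci_minutes": ("CI/CD and deployment", operator.gt, 15),
    "flaky_pct": ("CI/CD and deployment", operator.gt, 2),
    "deploys_per_week": ("CI/CD and deployment", operator.lt, 1),
    "first_review_hours": ("Code review and collaboration", operator.gt, 24),
    "pr_cycle_business_days": ("Code review and collaboration", operator.gt, 2),
    "onboarding_weeks": ("Knowledge access and onboarding", operator.gt, 4),
    "non_actionable_alert_pct": ("On-call and operational burden", operator.gt, 50),
    "incidents_per_week": ("On-call and operational burden", operator.gt, 2),
    "toil_pct": ("On-call and operational burden", operator.gt, 20),
    "meeting_pct": ("Cognitive load and flow", operator.gt, 30),
}


def find_red_flags(metrics):
    """Group tripped red flags by dimension; unknown or missing metrics are skipped."""
    flagged = {}
    for name, value in metrics.items():
        if name not in RED_FLAGS:
            continue
        dimension, compare, threshold = RED_FLAGS[name]
        if compare(value, threshold):
            flagged.setdefault(dimension, []).append(name)
    return flagged


# Illustrative values only.
metrics = {
    "ci_minutes": 18,
    "flaky_pct": 3.0,
    "deploys_per_week": 3,
    "pr_cycle_business_days": 2.5,
    "meeting_pct": 25,
}

for dimension, names in find_red_flags(metrics).items():
    print(f"{dimension}: {', '.join(names)}")
```

Metrics with no data are simply absent from the input, which keeps "no evidence" distinct from "no red flag".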
Step 3: Identify the top friction points
From the assessment, identify the 3-5 highest-impact friction points. For each:
- What it is -- the specific friction, with evidence
- Who it affects -- all engineers, a subset, or specific roles?
- Impact type -- slows delivery, hurts quality, causes attrition, blocks scaling, or all of the above?
- Root cause -- is this a tooling problem, a process problem, a people problem, or a structural problem?
- Trend -- getting worse, stable, or already improving?
Step 4: Generate recommendations
For each friction point, recommend an intervention:
| Field | Description |
|---|---|
| Recommendation | Specific, actionable change |
| Effort | S / M / L (to implement) |
| Expected impact | What improves and by how much? |
| Leading indicator | How will you know it's working within 2-4 weeks? |
| Owner | Who drives this? |
Sequence recommendations: quick wins first (high impact, low effort), then structural improvements.
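
One way to apply the sequencing rule above is a sort that puts quick wins first. This is a sketch under assumptions: the high / medium / low impact scale and the tie-breaking order are illustrative choices, not part of the process definition.

```python
from dataclasses import dataclass

EFFORT_RANK = {"S": 0, "M": 1, "L": 2}
IMPACT_RANK = {"high": 0, "medium": 1, "low": 2}  # assumed scale


@dataclass
class Recommendation:
    title: str
    effort: str  # "S", "M", or "L", as in the table above
    impact: str  # "high", "medium", or "low" (assumed scale)


def sequence(recs):
    """Quick wins (high impact, S effort) first, then by impact, then by effort."""
    def key(r):
        quick_win = r.impact == "high" and r.effort == "S"
        return (0 if quick_win else 1, IMPACT_RANK[r.impact], EFFORT_RANK[r.effort])
    return sorted(recs, key=key)


# Illustrative recommendations only.
recs = [
    Recommendation("Finish CI migration", "M", "high"),
    Recommendation("Quarantine flaky tests", "S", "high"),
    Recommendation("Build internal docs site", "L", "medium"),
    Recommendation("Tidy channel naming", "S", "low"),
]

for i, r in enumerate(sequence(recs), start=1):
    print(f"{i}. {r.title} (effort {r.effort}, impact {r.impact})")
```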
Step 5: Produce the assessment
Output in this format:
Developer Experience Assessment: {{team-or-org}}
Date: {{date}} | Team size: {{count}} | Assessed by: {{who}}
DX scorecard
| Dimension | Rating | Key signal |
|---|---|---|
| Local development | {{Strong/Adequate/Weak}} | {{most telling signal}} |
| CI/CD and deployment | {{Strong/Adequate/Weak}} | {{most telling signal}} |
| Code review and collaboration | {{Strong/Adequate/Weak}} | {{most telling signal}} |
| Knowledge access and onboarding | {{Strong/Adequate/Weak}} | {{most telling signal}} |
| On-call and operational burden | {{Strong/Adequate/Weak}} | {{most telling signal}} |
| Cognitive load and flow | {{Strong/Adequate/Weak}} | {{most telling signal}} |
Top friction points (ranked by impact)
1. {{friction point}}
- Evidence: {{specific data or observation}}
- Affects: {{who}}
- Impact: {{delivery / quality / attrition / scaling}}
- Root cause: {{tooling / process / people / structural}}
- Trend: {{worse / stable / improving}}
2. {{friction point}}
- (Same structure)
Recommendations
| # | Recommendation | Effort | Expected impact | Leading indicator | Owner |
|---|---|---|---|---|---|
| 1 | {{action}} | {{S/M/L}} | {{what improves}} | {{early signal}} | {{who}} |
What's working well
- {{Things the team should keep doing}}
Suggested reassessment date
{{3-6 months out, with specific metrics to re-measure}}
Step 6: Discuss
Ask the user:
- Does the scorecard match your intuition? Any ratings that surprise you?
- Are there friction points I missed?
- Which recommendations resonate most with the team?
- Are there organizational constraints that would block any of these?
- Should we run this as a team survey to validate with the broader engineering group?
Output location
Present the assessment as formatted text in the conversation, or write it to a file if the user specifies a path.
Example Output
Input
- Team scope: Platform Engineering team at Meridian Health Systems, 14 engineers (8 senior, 4 mid-level, 2 junior) building internal developer tooling and shared infrastructure for 6 product teams
- Known frustrations: Engineers constantly complain about flaky integration tests, slow CI pipelines, and getting pulled into Slack threads to answer questions other teams should be able to self-serve; retros repeatedly surface "I can't find deep focus time"
- Recent changes: Migrated from Jenkins to GitHub Actions 4 months ago (incomplete — 3 legacy pipelines still on Jenkins); onboarded 4 new engineers in Q3; one senior engineer (Priya) left 6 weeks ago citing burnout and "being the only one who understood the auth service"
- Attrition signals: Priya's exit interview explicitly mentioned on-call burden and knowledge silo stress; two engineers flagged similar concerns in the last engagement survey (eNPS dropped from +32 to +11 over two quarters)
- Available data: GitHub Actions dashboards showing median CI time of 22 minutes, flaky test rate of ~6.5%, PR cycle time averaging 3.4 business days; PagerDuty logs show 4.2 incidents/week with 60% classified as non-actionable noise; no formal onboarding tracking
Output (abbreviated)
Developer Experience Assessment: Meridian Health Systems — Platform Engineering
Date: 2025-07-14 | Team size: 14 engineers | Assessed by: Engineering Leadership + DX Assessment
DX scorecard
| Dimension | Rating | Key signal |
|---|---|---|
| Local development | Adequate | No data on setup time; parity issues suspected given dual CI system |
| CI/CD and deployment | Weak | Median CI time 22 min; 6.5% flaky test rate; 3 pipelines still on Jenkins |
| Code review and collaboration | Weak | PR cycle time 3.4 business days; knowledge concentrated after Priya's departure |
| Knowledge access and onboarding | Weak | No onboarding tracking; auth service now bus factor = 1 (or zero); "ask in Slack" is the de facto docs |
| On-call and operational burden | Weak | 4.2 incidents/week; 60% non-actionable alerts; eNPS drop signals burnout accumulation |
| Cognitive load and flow | Weak | Platform team fielding cross-team Slack interruptions continuously; 4 new engineers adding ramp burden |
Top friction points (ranked by impact)
1. Flaky and slow CI pipeline is the team's biggest daily tax
- Evidence: 22-minute median CI time (red flag threshold: 15 min); 6.5% flaky test rate (threshold: 2%); hybrid Jenkins/GitHub Actions environment adds configuration complexity and cognitive overhead
- Affects: All 14 engineers, plus the 6 downstream product teams depending on shared pipelines
- Impact: Delivery slowdown, quality erosion (engineers learn to distrust test results), scaling blocker as product teams grow
- Root cause: Tooling — incomplete migration left technical debt mid-stream; no dedicated effort to quarantine or fix flaky tests
- Trend: Worse — migration is stalled and flaky test count has grown since Q3 onboarding
2. Auth service knowledge void — single point of organizational failure
- Evidence: Priya's departure 6 weeks ago left auth service effectively undocumented; exit interview cited isolation and burnout as direct causes; no runbooks confirmed in knowledge audit
- Affects: All engineers on rotation, all product teams integrating auth, future incident responders
- Impact: Attrition risk (same conditions remain for the next senior engineer), delivery risk (any auth incident is now a crisis), onboarding blocker
- Root cause: Structural — no knowledge-sharing practice, no documentation culture, rotation didn't distribute ownership
- Trend: Worse — the gap is widening as time passes without intervention
3. Alert noise consuming on-call engineers and eroding trust in monitoring
- Evidence: 4.2 incidents/week with 60% non-actionable (PagerDuty logs); eNPS dropped from +32 to +11 over two quarters, and both the exit interview and engagement survey responses cite on-call burden
- Affects: Whoever is on rotation; disproportionately senior engineers who inherited legacy alerting configs
- Impact: Attrition risk, cognitive load, quality (engineers who are sleep-deprived or interrupted make worse decisions)
- Root cause: Process + tooling — no alert review cadence, no ownership assigned to reducing noise, thresholds never tuned post-Jenkins migration
- Trend: Stable at a bad level; no active effort to improve
4. Platform team acting as human API for other teams' questions
- Evidence: Engineers report continuous Slack interruptions; retros have flagged this for multiple quarters; no self-service documentation portal or searchable runbook library exists
- Affects: All engineers, but senior engineers disproportionately; 4 new engineers also impacted as they can't self-onboard
- Impact: Cognitive load, flow destruction, scaling ceiling (team can't grow its impact if it's also customer support)
- Root cause: Structural — the team has never invested in productizing its knowledge; documentation is treated as optional
- Trend: Worse — each new product team onboarded amplifies the interruption surface
Recommendations
| # | Recommendation | Effort | Expected impact | Leading indicator | Owner |
|---|---|---|---|---|---|
| 1 | Quarantine all known flaky tests into a separate suite; block merge only on stable suite | S | Restore CI signal reliability within 2 weeks; developer trust in green builds returns | Flaky test rate in primary suite drops to <1% | Platform TL |
| 2 | Complete Jenkins → GitHub Actions migration with a hard cutoff date (30 days); assign one engineer as DRI | M | Eliminate dual-system cognitive overhead; unblock pipeline optimization | All pipelines running in GHA; Jenkins decommissioned | EM + Platform TL |
| 3 | Run an auth service documentation sprint — 2 engineers pair on Priya's domain for 2 sprints, producing runbooks + architecture doc | M | Bus factor rises from ~0 to 3+; on-call confidence improves | Runbook published; 2 engineers can independently handle auth incidents | Senior engineer + EM |
| 4 | Alert audit: review all PagerDuty rules, silence or raise thresholds on non-actionable alerts, assign ownership per service | S | Reduce incident volume by ~40%; reduce on-call burnout | Non-actionable alert rate drops below 25% within 4 weeks | On-call rotation lead |
| 5 | Create a Platform team "office hours" model (2x/week, 30 min) + internal docs site (Notion or Backstage) to deflect async questions | M | Reduce ad-hoc Slack interruptions by 50%+; unblock product teams to self-serve | Measurable drop in #platform-help thread volume; new engineer time-to-first-PR improves | EM + one mid-level engineer as DX champion |
| 6 | Instrument onboarding: track time-to-first-meaningful-PR for all new engineers; set 3-week target | S | Creates accountability; identifies where new engineers get stuck | First data point visible after next hire | EM |
What's working well
- GitHub Actions adoption shows the team is willing to invest in tooling improvement — the migration intent was right, execution just stalled
- Only one engineer has left despite high friction, which suggests team cohesion is largely intact and is a real asset to protect
- Engagement survey data and exit interview candor indicate a feedback culture exists; signals are visible and honest, which makes this assessment actionable
Suggested reassessment date
October 14, 2025 (3 months out)
Metrics to re-measure at that point:
- Median CI time (target: ≤12 minutes)
- Flaky test rate in primary suite (target: <1%)
- Non-actionable alert rate (target: <25%)
- PR cycle time (target: <2 business days)
- eNPS (target: return to +25 or above)
- Auth service bus factor (target: ≥3 engineers)
- Time-to-first-meaningful-PR for new engineers (target: ≤3 weeks)
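
At reassessment, each re-measured metric can be compared against both its baseline and its target. The sketch below uses the baselines and targets from this example; the `remeasured` values are hypothetical placeholders, not results.

```python
# metric -> (baseline, target check, lower_is_better); baselines and targets
# come from the example above. Metric key names are hypothetical.
TARGETS = {
    "median_ci_minutes": (22.0, lambda v: v <= 12, True),
    "flaky_pct": (6.5, lambda v: v < 1, True),
    "non_actionable_alert_pct": (60.0, lambda v: v < 25, True),
    "pr_cycle_business_days": (3.4, lambda v: v < 2, True),
    "enps": (11.0, lambda v: v >= 25, False),
}


def status(name, measured):
    """Classify a re-measured value as 'target met', 'improving', or 'not improving'."""
    baseline, meets_target, lower_is_better = TARGETS[name]
    if meets_target(measured):
        return "target met"
    improved = measured < baseline if lower_is_better else measured > baseline
    return "improving" if improved else "not improving"


# Hypothetical re-measured values, for illustration only.
remeasured = {"median_ci_minutes": 11.0, "flaky_pct": 2.5, "enps": 9.0}

for name, value in remeasured.items():
    print(f"{name}: {value} -> {status(name, value)}")
```

Separating "improving" from "target met" keeps partial progress visible, which fits the trend field used for each friction point.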