DX Assessment

You need to assess developer experience – tooling friction, CI/CD health, code review bottlenecks, knowledge access, on-call burden, and cognitive load.

Use this when engineering leadership wants to understand what's slowing developers down, where the team's biggest friction points are, and what investments would have the highest impact on productivity and retention. This produces a structured DX assessment covering tooling, workflow, knowledge access, cognitive load, and team health signals.

Related skills: DX friction often surfaces as a symptom in /delivery-diagnose. Tooling and infrastructure debt feeds into /tech-debt-assessment. On-call and incident burden connects to /incident-review patterns.

Process

Step 1: Gather inputs

Ask the user to provide:

  1. Team scope -- which team or teams are being assessed? How many engineers?
  2. Known frustrations -- what do engineers complain about most? What comes up in retros?
  3. Recent changes -- any new tooling, process changes, re-orgs, or platform migrations in the last 3-6 months?
  4. Attrition signals -- has anyone left recently citing engineering culture or tooling? Are people disengaged?
  5. Available data -- do you have survey results, CI metrics, PR cycle times, on-call logs, or DORA metrics to reference?

Step 2: Assess across DX dimensions

Evaluate the developer experience across six dimensions. For each, rate as Strong / Adequate / Weak based on the evidence gathered:

2a. Local development environment

| Signal | What to check | Red flag |
| --- | --- | --- |
| Setup time | How long for a new engineer to go from clone to running app? | > 1 day |
| Reliability | Does the local env break frequently? Do engineers waste time fixing it? | Weekly "it works on my machine" issues |
| Parity | Does local env match staging/prod? | Significant drift causing surprises at deploy |
| Documentation | Is setup documented and current? | Engineers rely on asking teammates |

2b. CI/CD and deployment

| Signal | What to check | Red flag |
| --- | --- | --- |
| Build time | End-to-end CI pipeline duration | > 15 minutes |
| Flaky tests | Percentage of test runs that fail then pass on retry | > 2% |
| Deploy frequency | How often the team deploys to production | < 1/week for an active team |
| Deploy confidence | Do engineers deploy without anxiety? | Deploys require "deploy captain" or are batched weekly |
| Rollback ease | Can a bad deploy be reverted quickly? | Manual rollback or no rollback at all |
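
The build-time and flaky-test thresholds above can be checked against exported CI data. The following is a minimal sketch, assuming each pipeline run is available as a record holding its duration and the outcome of each attempt. The `CiRun` type and its field names are hypothetical and do not correspond to any CI provider's API.

```python
from dataclasses import dataclass
from statistics import median

# Red-flag thresholds taken from the CI/CD table above.
MAX_BUILD_MINUTES = 15
MAX_FLAKY_RATE = 0.02


@dataclass
class CiRun:
    """One pipeline run (hypothetical shape; adapt to your CI export)."""
    duration_minutes: float
    attempts: list[bool]  # outcome of each attempt in order, True = passed


def is_flaky(run: CiRun) -> bool:
    # Flaky = the first attempt failed and a retry of the same run passed.
    return len(run.attempts) > 1 and not run.attempts[0] and run.attempts[-1]


def ci_red_flags(runs: list[CiRun]) -> dict[str, bool]:
    build_time = median(r.duration_minutes for r in runs)
    flaky_rate = sum(is_flaky(r) for r in runs) / len(runs)
    return {
        "build_time": build_time > MAX_BUILD_MINUTES,
        "flaky_tests": flaky_rate > MAX_FLAKY_RATE,
    }


# Illustrative data only.
runs = [
    CiRun(22.0, [False, True]),  # failed, then passed on retry: flaky
    CiRun(18.5, [True]),
    CiRun(24.0, [True]),
    CiRun(21.0, [False]),        # genuine failure, not flaky
]
print(ci_red_flags(runs))  # {'build_time': True, 'flaky_tests': True}
```

The median is used for build time so that a single unusually slow run does not trip the flag on its own.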

2c. Code review and collaboration

| Signal | What to check | Red flag |
| --- | --- | --- |
| Time to first review | Median time from PR open to first review | > 24 hours |
| PR cycle time | Median time from PR open to merge | > 2 business days |
| Review bottleneck | Do PRs wait for specific people? | > 30% of PRs blocked on one person |
| Review quality | Are reviews substantive or rubber-stamps? | Mostly "LGTM" with no comments |
| Pairing culture | Does the team pair or mob? | Knowledge stuck in silos |
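
The three quantitative review signals above can be derived from pull request timestamps. This is a rough sketch, assuming PR data has been exported into records like the hypothetical `PullRequest` type below. It measures calendar hours rather than business hours, and it approximates "blocked on one person" by the share of PRs gated by the busiest reviewer. Both are simplifications.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PullRequest:
    """Hypothetical PR record; map the fields from your Git host's export."""
    opened: datetime
    first_review: datetime
    merged: datetime
    reviewer: str  # person whose approval the PR waited on


def hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def review_signals(prs: list[PullRequest]) -> dict[str, float]:
    top_reviewer_count = Counter(pr.reviewer for pr in prs).most_common(1)[0][1]
    return {
        # Calendar hours, not business hours: a simplification.
        "median_hours_to_first_review": median(
            hours(pr.opened, pr.first_review) for pr in prs
        ),
        "median_hours_to_merge": median(hours(pr.opened, pr.merged) for pr in prs),
        # Proxy for the bottleneck signal: share of PRs gated by one reviewer.
        "top_reviewer_share": top_reviewer_count / len(prs),
    }


# Illustrative data only.
prs = [
    PullRequest(datetime(2025, 7, 1, 9), datetime(2025, 7, 2, 15), datetime(2025, 7, 4, 9), "alex"),
    PullRequest(datetime(2025, 7, 2, 9), datetime(2025, 7, 2, 13), datetime(2025, 7, 3, 9), "alex"),
    PullRequest(datetime(2025, 7, 3, 9), datetime(2025, 7, 4, 9), datetime(2025, 7, 7, 9), "sam"),
]
signals = review_signals(prs)
print("first review red flag:", signals["median_hours_to_first_review"] > 24)  # False
print("bottleneck red flag:", signals["top_reviewer_share"] > 0.30)            # True
```

Review quality and pairing culture are qualitative and still need to be judged from reading reviews and talking to the team.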

2d. Knowledge access and onboarding

| Signal | What to check | Red flag |
| --- | --- | --- |
| Onboarding time | How long until a new engineer ships their first meaningful PR? | > 4 weeks |
| Documentation | Are architecture decisions, runbooks, and domain context documented? | "Ask Sarah, she knows" |
| Bus factor | Are there critical systems only one person understands? | Bus factor = 1 for any production system |
| Search and discovery | Can engineers find answers without asking someone? | Tribal knowledge dominates |

2e. On-call and operational burden

| Signal | What to check | Red flag |
| --- | --- | --- |
| On-call rotation | Is on-call distributed fairly? | Same 2-3 people always on-call |
| Alert noise | Signal-to-noise ratio of production alerts | > 50% alerts are non-actionable |
| Incident frequency | How often does the team get paged? | > 2 incidents/week requiring human response |
| Toil | Time spent on repetitive operational work | > 20% of engineering time on toil |
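
Alert noise, paging frequency, and rotation spread can all be estimated from a paging log. The sketch below assumes the log has been exported into hypothetical `Page` records and that every page required a human response. Neither the type nor its field names come from a real on-call tool.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One page from the on-call tool (hypothetical shape)."""
    responder: str
    actionable: bool


def on_call_signals(pages: list[Page], weeks: float) -> dict[str, float]:
    non_actionable = sum(not p.actionable for p in pages)
    return {
        "pages_per_week": len(pages) / weeks,
        "non_actionable_rate": non_actionable / len(pages),
        # A low count over a long window suggests the rotation is not shared.
        "distinct_responders": len({p.responder for p in pages}),
    }


# Illustrative data only: 10 pages over a 4-week window.
pages = (
    [Page("dana", False)] * 4
    + [Page("lee", False)] * 2
    + [Page("dana", True)] * 3
    + [Page("lee", True)]
)
signals = on_call_signals(pages, weeks=4)
print("frequency red flag:", signals["pages_per_week"] > 2)          # True
print("noise red flag:", signals["non_actionable_rate"] > 0.50)      # True
print("responders:", signals["distinct_responders"])                 # 2
```

Toil is not covered here because it usually has to come from time tracking or a survey, not from the paging log.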

2f. Cognitive load and flow

| Signal | What to check | Red flag |
| --- | --- | --- |
| Context switching | How often are engineers interrupted by meetings, Slack, or incidents? | > 3 context switches per focused work block |
| Meeting load | Percentage of the week in meetings | > 30% for individual contributors |
| Scope of ownership | How many systems does each engineer own? | Ownership so broad that nothing gets deep attention |
| Decision autonomy | Can engineers make technical decisions without escalation? | Every decision requires manager or architect approval |
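
Step 2 asks for a Strong / Adequate / Weak rating per dimension but does not prescribe how red flags map to a rating. One possible starting rule is sketched below. The cut-offs (no flags, one flag, two or more) are an assumption made for illustration and should be overridden by judgment when the qualitative evidence points elsewhere.

```python
def rate_dimension(red_flags: dict[str, bool]) -> str:
    """Map the red-flag checks for one dimension to a rating.

    The cut-offs are an assumption for illustration, not part of the
    assessment method: no flags -> Strong, one -> Adequate, more -> Weak.
    """
    count = sum(red_flags.values())
    if count == 0:
        return "Strong"
    if count == 1:
        return "Adequate"
    return "Weak"


# Illustrative values for the cognitive load dimension.
cognitive_load = {
    "context_switching": True,       # > 3 switches per focused work block
    "meeting_load": 14 / 40 > 0.30,  # 14 meeting hours in a 40-hour week
    "scope_of_ownership": False,
    "decision_autonomy": False,
}
print(rate_dimension(cognitive_load))  # Weak
```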

Step 3: Identify the top friction points

From the assessment, identify the 3-5 highest-impact friction points. For each:

  • What it is -- the specific friction, with evidence
  • Who it affects -- all engineers, a subset, or specific roles?
  • Impact type -- slows delivery, hurts quality, causes attrition, blocks scaling, or all of the above?
  • Root cause -- is this a tooling problem, a process problem, a people problem, or a structural problem?
  • Trend -- getting worse, stable, or already improving?
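
If the friction points are tracked outside the conversation, the five fields above map naturally onto a small record type. This is only a sketch: the `FrictionPoint` name, the field names, and the sample values are illustrative, while the allowed values for impact, root cause, and trend follow the list above.

```python
from dataclasses import dataclass
from typing import Literal

Impact = Literal["delivery", "quality", "attrition", "scaling"]
RootCause = Literal["tooling", "process", "people", "structural"]
Trend = Literal["worse", "stable", "improving"]


@dataclass
class FrictionPoint:
    what: str
    evidence: str
    affects: str
    impacts: list[Impact]  # a friction point may have several impact types
    root_cause: RootCause
    trend: Trend

    def summary(self) -> str:
        return f"{self.what} [{', '.join(self.impacts)}; {self.root_cause}; {self.trend}]"


# Illustrative values only.
ci = FrictionPoint(
    what="Slow, flaky CI",
    evidence="Median CI time above 15 minutes; flaky rate above 2%",
    affects="all engineers",
    impacts=["delivery", "quality"],
    root_cause="tooling",
    trend="worse",
)
print(ci.summary())  # Slow, flaky CI [delivery, quality; tooling; worse]
```

The `Literal` types document the allowed values for a type checker but are not enforced at runtime.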

Step 4: Generate recommendations

For each friction point, recommend an intervention:

| Field | Description |
| --- | --- |
| Recommendation | Specific, actionable change |
| Effort | S / M / L (to implement) |
| Expected impact | What improves and by how much? |
| Leading indicator | How will you know it's working within 2-4 weeks? |
| Owner | Who drives this? |

Sequence recommendations: quick wins first (high impact, low effort), then structural improvements.
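
The sequencing rule can be expressed as a sort. The sketch below assumes each recommendation has been given a coarse impact score of high, medium, or low, which the table above does not require, and treats a quick win as high impact with S effort. The `Recommendation` type and the sample actions are hypothetical.

```python
from dataclasses import dataclass

EFFORT_ORDER = {"S": 0, "M": 1, "L": 2}
IMPACT_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Recommendation:
    action: str
    effort: str  # "S", "M", or "L"
    impact: str  # "high", "medium", or "low": an assumed coarse score


def sequence(recs: list[Recommendation]) -> list[Recommendation]:
    def key(rec: Recommendation) -> tuple[bool, int, int]:
        quick_win = rec.effort == "S" and rec.impact == "high"
        # False sorts before True, so quick wins lead; the rest follow by
        # descending impact, then ascending effort.
        return (not quick_win, IMPACT_ORDER[rec.impact], EFFORT_ORDER[rec.effort])

    return sorted(recs, key=key)


# Illustrative data only.
recs = [
    Recommendation("Finish CI migration", "M", "high"),
    Recommendation("Quarantine flaky tests", "S", "high"),
    Recommendation("Rewrite deploy tooling", "L", "high"),
    Recommendation("Tidy alert labels", "S", "low"),
]
print([r.action for r in sequence(recs)])
# ['Quarantine flaky tests', 'Finish CI migration', 'Rewrite deploy tooling', 'Tidy alert labels']
```

Ordering the remainder by impact before effort keeps a cheap but low-value change from jumping ahead of a structural fix. Swapping the last two key elements gives the opposite preference.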

Step 5: Produce the assessment

Output in this format:


Developer Experience Assessment: {{team-or-org}}

Date: {{date}} | Team size: {{count}} | Assessed by: {{who}}

DX scorecard

| Dimension | Rating | Key signal |
| --- | --- | --- |
| Local development | {{Strong/Adequate/Weak}} | {{most telling signal}} |
| CI/CD and deployment | {{Strong/Adequate/Weak}} | {{most telling signal}} |
| Code review and collaboration | {{Strong/Adequate/Weak}} | {{most telling signal}} |
| Knowledge access and onboarding | {{Strong/Adequate/Weak}} | {{most telling signal}} |
| On-call and operational burden | {{Strong/Adequate/Weak}} | {{most telling signal}} |
| Cognitive load and flow | {{Strong/Adequate/Weak}} | {{most telling signal}} |

Top friction points (ranked by impact)

1. {{friction point}}

  • Evidence: {{specific data or observation}}
  • Affects: {{who}}
  • Impact: {{delivery / quality / attrition / scaling}}
  • Root cause: {{tooling / process / people / structural}}
  • Trend: {{worse / stable / improving}}

2. {{friction point}}

  • (Same structure)

Recommendations

| # | Recommendation | Effort | Expected impact | Leading indicator | Owner |
| --- | --- | --- | --- | --- | --- |
| 1 | {{action}} | {{S/M/L}} | {{what improves}} | {{early signal}} | {{who}} |

What's working well

  • {{Things the team should keep doing}}

Suggested reassessment date

{{3-6 months out, with specific metrics to re-measure}}


Step 6: Discuss

Ask the user:

  • Does the scorecard match your intuition? Any ratings that surprise you?
  • Are there friction points I missed?
  • Which recommendations resonate most with the team?
  • Are there organizational constraints that would block any of these?
  • Should we run this as a team survey to validate with the broader engineering group?

Output location

Present the assessment as formatted text in the conversation, or write it to a file if the user specifies a path.

Example Output

Input

  • Team scope: Platform Engineering team at Meridian Health Systems, 14 engineers (8 senior, 4 mid-level, 2 junior) building internal developer tooling and shared infrastructure for 6 product teams
  • Known frustrations: Engineers constantly complain about flaky integration tests, slow CI pipelines, and getting pulled into Slack threads to answer questions other teams should be able to self-serve; retros repeatedly surface "I can't find deep focus time"
  • Recent changes: Migrated from Jenkins to GitHub Actions 4 months ago (incomplete — 3 legacy pipelines still on Jenkins); onboarded 4 new engineers in Q3; one senior engineer (Priya) left 6 weeks ago citing burnout and "being the only one who understood the auth service"
  • Attrition signals: Priya's exit interview explicitly mentioned on-call burden and knowledge silo stress; two engineers flagged similar concerns in the last engagement survey (eNPS dropped from +32 to +11 over two quarters)
  • Available data: GitHub Actions dashboards showing median CI time of 22 minutes, flaky test rate of ~6.5%, PR cycle time averaging 3.4 business days; PagerDuty logs show 4.2 incidents/week with 60% classified as non-actionable noise; no formal onboarding tracking

Output (abbreviated)

Developer Experience Assessment: Meridian Health Systems — Platform Engineering

Date: 2025-07-14 | Team size: 14 engineers | Assessed by: Engineering Leadership + DX Assessment


DX scorecard

| Dimension | Rating | Key signal |
| --- | --- | --- |
| Local development | Adequate | No data on setup time; parity issues suspected given dual CI system |
| CI/CD and deployment | Weak | Median CI time 22 min; 6.5% flaky test rate; 3 pipelines still on Jenkins |
| Code review and collaboration | Weak | PR cycle time 3.4 business days; knowledge concentrated after Priya's departure |
| Knowledge access and onboarding | Weak | No onboarding tracking; auth service now bus factor = 1 (or zero); "ask in Slack" is the de facto docs |
| On-call and operational burden | Weak | 4.2 incidents/week; 60% non-actionable alerts; eNPS drop signals burnout accumulation |
| Cognitive load and flow | Weak | Platform team fielding cross-team Slack interruptions continuously; 4 new engineers adding ramp burden |

Top friction points (ranked by impact)

1. Flaky and slow CI pipeline is the team's biggest daily tax

  • Evidence: 22-minute median CI time (red flag threshold: 15 min); 6.5% flaky test rate (threshold: 2%); hybrid Jenkins/GitHub Actions environment adds configuration complexity and cognitive overhead
  • Affects: All 14 engineers, plus the 6 downstream product teams depending on shared pipelines
  • Impact: Delivery slowdown, quality erosion (engineers learn to distrust test results), scaling blocker as product teams grow
  • Root cause: Tooling — incomplete migration left technical debt mid-stream; no dedicated effort to quarantine or fix flaky tests
  • Trend: Worse — migration is stalled and flaky test count has grown since Q3 onboarding

2. Auth service knowledge void — single point of organizational failure

  • Evidence: Priya's departure 6 weeks ago left auth service effectively undocumented; exit interview cited isolation and burnout as direct causes; no runbooks confirmed in knowledge audit
  • Affects: All engineers on rotation, all product teams integrating auth, future incident responders
  • Impact: Attrition risk (same conditions remain for the next senior engineer), delivery risk (any auth incident is now a crisis), onboarding blocker
  • Root cause: Structural — no knowledge-sharing practice, no documentation culture, rotation didn't distribute ownership
  • Trend: Worse — the gap is widening as time passes without intervention

3. Alert noise consuming on-call engineers and eroding trust in monitoring

  • Evidence: 4.2 incidents/week with 60% non-actionable (PagerDuty logs); eNPS dropped from +32 to +11 over two quarters, and both Priya's exit interview and two engagement survey responses point to on-call burden
  • Affects: Whoever is on rotation; disproportionately senior engineers who inherited legacy alerting configs
  • Impact: Attrition risk, cognitive load, quality (engineers who are sleep-deprived or interrupted make worse decisions)
  • Root cause: Process + tooling — no alert review cadence, no ownership assigned to reducing noise, thresholds never tuned post-Jenkins migration
  • Trend: Stable at a bad level; no active effort to improve

4. Platform team acting as human API for other teams' questions

  • Evidence: Engineers report continuous Slack interruptions; retros have flagged this for multiple quarters; no self-service documentation portal or searchable runbook library exists
  • Affects: All engineers, but senior engineers disproportionately; 4 new engineers also impacted as they can't self-onboard
  • Impact: Cognitive load, flow destruction, scaling ceiling (team can't grow its impact if it's also customer support)
  • Root cause: Structural — the team has never invested in productizing its knowledge; documentation is treated as optional
  • Trend: Worse — each new product team onboarded amplifies the interruption surface

Recommendations

| # | Recommendation | Effort | Expected impact | Leading indicator | Owner |
| --- | --- | --- | --- | --- | --- |
| 1 | Quarantine all known flaky tests into a separate suite; block merge only on stable suite | S | Restore CI signal reliability within 2 weeks; developer trust in green builds returns | Flaky test rate in primary suite drops to <1% | Platform TL |
| 2 | Complete Jenkins → GitHub Actions migration with a hard cutoff date (30 days); assign one engineer as DRI | M | Eliminate dual-system cognitive overhead; unblock pipeline optimization | All pipelines running in GHA; Jenkins decommissioned | EM + Platform TL |
| 3 | Run an auth service documentation sprint: 2 engineers pair on Priya's domain for 2 sprints, producing runbooks + architecture doc | M | Bus factor rises from ~0 to 3+; on-call confidence improves | Runbook published; 2 engineers can independently handle auth incidents | Senior engineer + EM |
| 4 | Alert audit: review all PagerDuty rules, silence or raise thresholds on non-actionable alerts, assign ownership per service | S | Reduce incident volume by ~40%; reduce on-call burnout | Non-actionable alert rate drops below 25% within 4 weeks | On-call rotation lead |
| 5 | Create a Platform team "office hours" model (2x/week, 30 min) + internal docs site (Notion or Backstage) to deflect async questions | M | Reduce ad-hoc Slack interruptions by 50%+; unblock product teams to self-serve | Measurable drop in #platform-help thread volume; new engineer time-to-first-PR improves | EM + one mid-level engineer as DX champion |
| 6 | Instrument onboarding: track time-to-first-meaningful-PR for all new engineers; set 3-week target | S | Creates accountability; identifies where new engineers get stuck | First data point visible after next hire | EM |

What's working well

  • GitHub Actions adoption shows the team is willing to invest in tooling improvement — the migration intent was right, execution just stalled
  • The team has had only one departure despite high friction, suggesting psychological safety and team cohesion are largely intact, a real asset to protect
  • Engagement survey data and exit interview candor indicate a feedback culture exists; signals are visible and honest, which makes this assessment actionable

Suggested reassessment date

October 14, 2025 (3 months out)

Metrics to re-measure at that point:

  • Median CI time (target: ≤12 minutes)
  • Flaky test rate in primary suite (target: <1%)
  • Non-actionable alert rate (target: <25%)
  • PR cycle time (target: <2 business days)
  • eNPS (target: return to +25 or above)
  • Auth service bus factor (target: ≥3 engineers)
  • Time-to-first-meaningful-PR for new engineers (target: ≤3 weeks)
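
At reassessment, the measured values can be compared against these targets mechanically. The sketch below encodes the targets from the list above. The metric key names and the `unmet_targets` helper are hypothetical, and the sample input reuses the baseline figures from the example input, so every metric with a baseline is expected to miss its target.

```python
import operator

# (comparison that must hold, target value) per metric, from the list above.
TARGETS = {
    "median_ci_minutes": (operator.le, 12),
    "flaky_rate_pct": (operator.lt, 1),
    "non_actionable_alert_pct": (operator.lt, 25),
    "pr_cycle_business_days": (operator.lt, 2),
    "enps": (operator.ge, 25),
    "auth_bus_factor": (operator.ge, 3),
    "weeks_to_first_pr": (operator.le, 3),
}


def unmet_targets(measured: dict[str, float]) -> list[str]:
    """Return the metrics that were measured and still miss their target."""
    return [
        name
        for name, (meets, target) in TARGETS.items()
        if name in measured and not meets(measured[name], target)
    ]


# Baseline values from the example input; metrics without a baseline are omitted.
baseline = {
    "median_ci_minutes": 22,
    "flaky_rate_pct": 6.5,
    "non_actionable_alert_pct": 60,
    "pr_cycle_business_days": 3.4,
    "enps": 11,
}
print(unmet_targets(baseline))
```

Metrics that were not measured are skipped instead of being reported as failures, so a missing data source does not look like a regression.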