Five Engineering Velocity Metrics and How to Fix Them

The companion case study, "Reading a Velocity Heat Map Live," shows how I turned a grid of these numbers into a single decision in a live session. This post is the reference behind it: what each metric means and what to do about it.

Most engineering organizations already have velocity data. Dashboards full of it. What they usually lack is the next sentence: given this number, what do we change on Monday. A metric you cannot act on is just anxiety with a y-axis.

Here are five delivery metrics I keep coming back to, each with a plain-English name, what a bad reading is actually telling you, and the fix that tends to move it. Then the part most teams skip: how to choose which one to work on first.

1. Cycle Time, or "Ship Faster"

What it measures: how long a unit of work takes from first commit to running in production.

What a bad number is telling you: work is sitting, not flowing. Long cycle time is almost never about typing speed. It is about wait states: waiting for review, waiting for a deploy window, waiting for a dependency, waiting for a decision. The code is done; the system around it is slow.

The fix that usually moves it: find the longest wait, not the longest task. Shrink batch size so changes are small and reviewable. Automate the deploy path so shipping is not an event. Look hard at the handoffs between "developer done" and "users have it," because that gap is where the days hide.
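To make "find the longest wait" concrete, here is a minimal sketch in Python. It assumes you can export one record per change with a timestamp for each stage; the stage names and the sample data are illustrative assumptions, not a standard schema or real measurements.

```python
from datetime import datetime
from statistics import median

# Hypothetical stage names, in the order a change passes through them.
STAGES = ["first_commit", "review_requested", "approved", "merged", "deployed"]


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def cycle_time_hours(change: dict) -> float:
    """Hours from first commit to running in production."""
    return hours_between(change["first_commit"], change["deployed"])


def median_wait_by_stage(changes: list[dict]) -> dict[str, float]:
    """Median hours spent between each pair of consecutive stages."""
    waits = {}
    for earlier, later in zip(STAGES, STAGES[1:]):
        waits[f"{earlier} -> {later}"] = median(
            hours_between(c[earlier], c[later]) for c in changes
        )
    return waits


def longest_wait(changes: list[dict]) -> str:
    """Name of the handoff with the largest median wait."""
    waits = median_wait_by_stage(changes)
    return max(waits, key=waits.get)


# Made-up sample data for illustration only.
CHANGES = [
    {
        "first_commit": datetime(2024, 1, 1, 9),
        "review_requested": datetime(2024, 1, 1, 11),
        "approved": datetime(2024, 1, 2, 11),
        "merged": datetime(2024, 1, 2, 12),
        "deployed": datetime(2024, 1, 2, 14),
    },
    {
        "first_commit": datetime(2024, 1, 3, 9),
        "review_requested": datetime(2024, 1, 3, 10),
        "approved": datetime(2024, 1, 4, 16),
        "merged": datetime(2024, 1, 4, 17),
        "deployed": datetime(2024, 1, 4, 20),
    },
]

print(longest_wait(CHANGES))
```

With this sample the coding itself takes an hour or two, while the wait for approval takes more than a day. The breakdown by handoff, not the total, tells you where to aim.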

2. PR Success Rate, or "Waste Less Work"

What it measures: the share of pull requests that merge cleanly versus those abandoned, reverted, or heavily reworked after the fact.

What a bad number is telling you: effort is going in and not coming out. A low success rate means people are building things that get thrown away, which is the most demoralizing kind of waste because it is invisible in a velocity-of-commits view. Often it points upstream: unclear requirements, missing alignment before code, or work that started before it was understood.

The fix that usually moves it: move the conversation earlier. Lightweight design alignment before a branch is opened, clearer acceptance criteria, and a quick "is this the right thing" check before the expensive "is this built right" review. You want to spend the disagreement before the code, not after.
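As a rough sketch of the arithmetic, assuming each pull request has already been labeled with one outcome: the labels below are hypothetical, and how you detect "reverted" or "heavily reworked" depends on your own tooling.

```python
from collections import Counter

# Assumed outcome labels; these are illustrative, not a standard taxonomy.
CLEAN = "merged_clean"
WASTED = ("abandoned", "reverted", "reworked")


def pr_success_rate(outcomes: list[str]) -> float:
    """Share of pull requests that merged cleanly."""
    if not outcomes:
        raise ValueError("no pull requests in the sample")
    return Counter(outcomes)[CLEAN] / len(outcomes)


def waste_breakdown(outcomes: list[str]) -> dict[str, int]:
    """Count of pull requests per kind of waste, to show where effort leaks."""
    counts = Counter(outcomes)
    return {label: counts[label] for label in WASTED}


# Made-up sample: 6 clean merges, 2 abandoned, 1 reverted, 1 reworked.
SAMPLE = [CLEAN] * 6 + ["abandoned"] * 2 + ["reverted", "reworked"]

print(pr_success_rate(SAMPLE))
print(waste_breakdown(SAMPLE))
```

The breakdown matters as much as the rate. A pile of abandoned PRs points at work started before it was understood, while reverts point at something closer to quality.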

3. PR Review Cycles, or "Avoid Approval Pong"

What it measures: how many back-and-forth rounds a pull request goes through before it merges.

What a bad number is telling you: the review process itself is the bottleneck. Three, four, five rounds of comments mean reviewers and authors are ping-ponging, and every round adds a context-switch and a wait. This is the metric I most often see hiding as the real constraint, because it taxes every single change the team makes.

The fix that usually moves it: make reviews smaller and faster. Cap PR size so a review fits in one sitting. Set a team norm on review turnaround. Separate blocking feedback from nice-to-have. Pair or mob on the gnarly changes so the review happens as the code is written instead of after. The goal is one or two clean rounds, not a negotiation.
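One way to count rounds, sketched under an assumption about what a "round" is: a reviewer responding to code that is waiting for review. Several reviews with no push in between count as a single round. The event names "review" and "push" are hypothetical stand-ins for whatever your code host records.

```python
def review_rounds(events: list[str]) -> int:
    """Count review rounds in a time-ordered list of 'review' and 'push' events."""
    rounds = 0
    awaiting_review = True  # a newly opened PR has code waiting for a reviewer
    for event in events:
        if event == "review" and awaiting_review:
            rounds += 1
            awaiting_review = False
        elif event == "push":
            awaiting_review = True
    return rounds


def over_target(prs: dict[str, list[str]], target: int = 2) -> list[str]:
    """Names of PRs that took more rounds than the team's target."""
    return [name for name, events in prs.items() if review_rounds(events) > target]


# Made-up sample data for illustration only.
PRS = {
    "small-fix": ["review"],
    "two-reviewers": ["review", "review", "push", "review"],
    "approval-pong": ["review", "push", "review", "push", "review", "push", "review"],
}

print(over_target(PRS))
```

The default target of 2 mirrors the "one or two clean rounds" goal above. Treat it as a team norm to tune, not a rule.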

4. Rework Rate, or "Fix and Adjust Less"

What it measures: how much recently shipped code gets changed again soon after, a proxy for churn and instability.

What a bad number is telling you: quality is leaking and you are paying for it twice. High rework means things ship before they are right, then come back. It can be thin testing, unclear requirements (again), or pressure to ship that defers the real work into a more expensive future. Some rework is healthy iteration; a lot of it is a tax.

The fix that usually moves it: invest where the churn concentrates. Better automated tests on the hot spots, tightening the definition of done, and protecting the time to do it right the first time. If a few files account for most of the rework, that is an architecture conversation, not a discipline one.
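To check whether rework concentrates in a few files, a sketch like the following can help. It assumes you already have, per file, the number of lines changed in a period and how many of those changes touched recently shipped code. The field names, and whatever window you use for "recently," are assumptions to adapt to your own data.

```python
from collections import defaultdict


def rework_rate(changes: list[dict]) -> float:
    """Share of changed lines that modified recently shipped code."""
    total = sum(c["lines_changed"] for c in changes)
    reworked = sum(c["lines_reworked"] for c in changes)
    return reworked / total if total else 0.0


def rework_by_file(changes: list[dict]) -> list[tuple[str, int]]:
    """Files ranked by reworked lines, highest first."""
    per_file: dict[str, int] = defaultdict(int)
    for c in changes:
        per_file[c["file"]] += c["lines_reworked"]
    return sorted(per_file.items(), key=lambda item: item[1], reverse=True)


def top_files_share(changes: list[dict], top_n: int = 2) -> float:
    """Share of all rework that lands in the top_n most-reworked files."""
    ranked = rework_by_file(changes)
    total = sum(lines for _, lines in ranked)
    if total == 0:
        return 0.0
    return sum(lines for _, lines in ranked[:top_n]) / total


# Made-up sample data for illustration only.
CHURN = [
    {"file": "billing.py", "lines_changed": 100, "lines_reworked": 60},
    {"file": "auth.py", "lines_changed": 100, "lines_reworked": 30},
    {"file": "ui.py", "lines_changed": 200, "lines_reworked": 10},
]

print(rework_rate(CHURN))
print(rework_by_file(CHURN)[0])
print(top_files_share(CHURN))
```

In this sample, two files carry most of the rework. That is the signal for an architecture conversation rather than a discipline one.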

5. Traceability, or "Stick to the Plan"

What it measures: how reliably shipped work connects back to a tracked intent: a ticket, an outcome, or a decision.

What a bad number is telling you: you cannot see the line from strategy to code. Low traceability means work is happening that no one can connect to a goal, which makes prioritization guesswork and makes it impossible to tell whether you are building the right things at all. It is the quiet one, but it undermines every other metric, because if you cannot trace work to intent you cannot trust the rest of the picture.

The fix that usually moves it: make the link cheap, not bureaucratic. Tie commits and PRs to the work item with the lightest touch that still creates the trail, and connect work items to outcomes, not just outputs. The aim is a visible thread from "why" to "what shipped," without turning engineers into clerks.
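A lightweight way to measure the trail, sketched under the assumption that work items use keys shaped like "PAY-142" (uppercase project prefix, hyphen, number) and that engineers put the key in the commit message. Both the key format and the sample messages are hypothetical.

```python
import re

# Assumed ticket key format: uppercase prefix, hyphen, digits (e.g. "PAY-142").
# Note this also matches strings like "UTF-8", so expect a few false positives.
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def is_traceable(message: str) -> bool:
    """True if the commit message mentions something shaped like a ticket key."""
    return TICKET_KEY.search(message) is not None


def traceability(messages: list[str]) -> float:
    """Share of commit messages that reference a work item."""
    if not messages:
        raise ValueError("no commit messages in the sample")
    return sum(is_traceable(m) for m in messages) / len(messages)


# Made-up sample data for illustration only.
MESSAGES = [
    "PAY-142 fix rounding in invoice totals",
    "fix typo",
    "Add retry to webhook sender (OPS-7)",
    "wip",
]

print(traceability(MESSAGES))
```

This only covers the link from code to work item. Whether the work item connects to an outcome still needs a human look at the tracker.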

The move that matters most: pick one

Here is the part teams get wrong. Faced with five metrics in the red, the instinct is to launch five workstreams. That is the fastest way to move none of them.

The better move is to find the single metric that is both the highest pain and the most feasible, where "feasible" means you could start tomorrow, and aim there. In practice the leverage point is often review cycles, because it taxes every change the team makes, and because it is usually fixable with team norms rather than a re-platforming. But the right answer depends on your numbers, not a template.

So:

Turn the grid into a picture. Color the painful cells. The diagnosis becomes visible before anyone argues.
Rename the metrics into things people remember and repeat. "Avoid Approval Pong" gets acted on; "PR review cycles" gets nodded at.
Plot pain against "can we start tomorrow," and let the board choose the target.
Write the fix as a falsifiable bet: "we believe doing X leads to faster releases at higher quality," then ship it and check.
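The "plot pain against can we start tomorrow" step can be sketched as a simple scoring pass. Multiplying the two scores is one possible way to combine them, not the only one, and the 1-to-5 scores below are placeholders a team would supply in the room, not measurements from the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    pain: int         # 1 (mild) to 5 (severe), as judged by the team
    feasibility: int  # 1 (needs a re-platforming) to 5 (can start tomorrow)


def pick_one(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the highest pain x feasibility; ties go to higher pain."""
    if not candidates:
        raise ValueError("nothing to choose from")
    return max(candidates, key=lambda c: (c.pain * c.feasibility, c.pain))


# Placeholder scores for illustration only.
BOARD = [
    Candidate("Ship Faster", pain=4, feasibility=2),
    Candidate("Waste Less Work", pain=3, feasibility=3),
    Candidate("Avoid Approval Pong", pain=4, feasibility=5),
    Candidate("Fix and Adjust Less", pain=5, feasibility=2),
    Candidate("Stick to the Plan", pain=2, feasibility=4),
]

print(pick_one(BOARD).name)
```

The point of the scoring is to end the argument with one target, so resist the urge to report a ranked list of five.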

Fifteen numbers are a table. A diagnosis is a decision. The metrics are only worth collecting if they end in a thing you change.

Want to work together?

If you want help reading your own delivery data and turning it into one clear next step, that is exactly the kind of working session I run. Bring me the grid and watch me whiteboard it; that is the Watch Me Think offer. To practice the framing yourself, the Whiteboard Prompt Translator scaffolds the moves from any prompt.
