Why You Shouldn't Compare Teams Using Metrics

In the right hands, a team's metrics are a mirror. In the wrong hands, the same numbers become a lever. An essay on why team-level metrics should never be used to compare teams — and what actually goes wrong when leaders reach down for the numbers.


Numbers can illuminate the work, or they can distort it. The difference is almost always about who's reading them, and why.


A measure is a mirror. Until it becomes a lever.

In the right hands, a team's metrics are a mirror. They reflect the shape of the work back to the people doing it. Cycle times. Throughput. Velocity. Rework. Mood. The quieter signals — where energy drains, where a conversation keeps stalling, where momentum returns after a difficult week.

These numbers are close to the work. They describe the lived reality of creating something. When the team reads them, the team sees itself more clearly.

In the wrong hands, the same numbers become a lever. Someone distant from the work reaches down, compares one team's throughput to another team's throughput, and asks the obvious-sounding question: why are they faster than you?

The moment that question is asked, the mirror stops reflecting. The team starts performing for the observer rather than examining itself for the learning. The signal degrades. The number improves. The outcome does not.

This is the specific failure mode I want to name.


Editor's note — where this sits

This article sits in the Physics layer of the Idea to Value system — the layer concerned with how ideas move toward value, and how measurement either clarifies or distorts that movement. It is a deep-dive on a specific sub-argument within the canonical piece on measurement, KPIs, Metrics and Measures. It is also Wiring work: the argument is ultimately about how signals designed for one audience get corrupted when they travel to another, and what happens to meaning in transit.

The Idea to Value system — five layers

  • The map: Direction & orientation. Where we're going and where we are.
  • The physics: How ideas move to value. This article.
  • The wiring: Communication & meaning. How signals travel; also relevant here.
  • The engine: Human creative intelligence. The full range of intelligence applied to work.
  • The flywheel: Habits & compounding practice. Small actions that build lasting capability.

What team-level metrics are actually for

Team-level metrics have one honest purpose: self-improvement. They exist to help the team ask three questions.

Where are we stuck? Where are we repeating ourselves? Where can we make this smoother?

That's it. That's the entire brief. The numbers are instruments for internal reflection, not scorecards for external judgement. When used that way, they help. When repurposed, they actively harm.

The harm comes from a specific structural confusion about what measurement is for at different levels of an organisation. Measurement at the team level is concerned with the mechanics of the work — how it flows, where it stalls, what's getting in the way. Measurement above the team level is concerned with the trajectory of the enterprise — milestones, financials, value realised, waypoints. Both are legitimate. Both are necessary. Neither should replace the other.

Blur those layers, and something predictable happens. The organisation starts optimising the report instead of the result. Leaders begin asking for team-level data to inform enterprise-level decisions. Teams begin shaping their team-level signals to survive enterprise-level scrutiny.

Work fragments. Tasks shrink. Movement gets mistaken for progress. Behaviour bends toward the metric instead of toward the value.

Nothing was built wrong. The mechanism is just being used for the opposite of its purpose.


Every team is incomparable

There's a quieter problem underneath the structural one. Even if you solved the layer confusion — even if leadership only ever looked at team-level data for genuine developmental reasons — the comparison itself is still broken.

Every team is different. Different domain. Different complexity. Different legacy code, legacy decisions, legacy people. Different history of trust. Different history of learning. Different problems that the organisation handed to them and different problems the team inherited without anyone noticing.

When you rank two teams by a single productivity measure, you're asserting that everything on the list above is either equal or irrelevant. Both claims are wrong. The numbers look comparable — they're in the same units, on the same dashboard, produced by the same tooling — but the underlying realities they describe are not.

This is the tempting mistake. Metrics look like they should be comparable because they are made of the same arithmetic. A team doing fifteen story points a week and a team doing eight story points a week appear, numerically, to be performing differently. But "story points" is a team's own internal unit for its own internal work on its own internal problems. Comparing them is like comparing two novels by word count.


Separating the views

Healthy organisations don't fight this problem. They design around it, by separating the layers of measurement cleanly:

  • Teams see the mechanics of their own work. Cycle time, flow, quality, learning, the texture of how they move. This data lives with them.
  • Leaders see the trajectory of the enterprise. Value realised, milestones hit, customer outcomes, financial health. This data lives at their level.
  • Planners see the flow between. What's moving through the pipeline, where bottlenecks appear, where investment is translating into outcome.

Each lens is valid. Each shows something the others can't. None of them should be forced to do the job of another.

The work of leadership, when it comes to measurement, is not to collapse these views into a single universal dashboard. It's to maintain the distinctness of them, and to know which one to look at when.

A leader asking for team-level metrics to inform an enterprise-level decision is usually asking the wrong question at the wrong altitude. A team handing up their internal flow data as if it were an enterprise performance report is usually being forced to solve the wrong problem.


The quiet point

The purpose of measurement isn't to accelerate activity. It's to improve outcomes. And outcomes aren't created by numbers. They're created by people, using numbers wisely, quietly, and mostly for themselves.

A good measure helps someone make a decision. If the measure isn't helping a specific person make a specific decision, it isn't a signal — it's just noise wearing the clothes of data.

At the team level, the person making the decision is the team. They're the only audience for their own mirror. The moment that mirror is held up to anyone else, it becomes something else entirely — and the thing it becomes is almost always worse than useless.


This article sits alongside the canonical piece on measurement:

KPIs, Metrics and Measures — How to Track What Actually Matters — the foundational essay on the four categories of measurement (financial, products and services, people, delivery) and why measurement should guide understanding, not control behaviour. This article is the deep-dive on one specific distortion within that framework.

When the Measure Becomes the Problem — the closely related essay on what happens when metrics drift from instruments to objectives.

Why Too Many Metrics Create Confusion — the companion piece on mixing metrics across layers, and why that mixing erodes clarity.
