Core Idea
Calibration is the cross-manager process that ensures performance designations mean the same thing across teams. Without it, “Senior Engineer” in Team A drifts away from “Senior Engineer” in Team B, and the entire performance system loses credibility. Larson provides four rules that prevent the most common calibration failures.
Why Calibration Is Needed
A career ladder and review cycle alone cannot produce fair outcomes. Without calibration, each manager applies the ladder through their own lens, shaped by advocacy for their own reports, their persuasion skills, and their team's local context. Calibration corrects this by creating a shared, structured comparison process.
What Calibration Is
- A structured meeting where managers review performance assessments across teams
- Purpose: ensure designations are applied consistently against the ladder, not against each other
- Participants: managers at the same level (e.g., all engineering managers in a department)
- Output: designation confirmations, adjustments, and documented rationale
Larson’s Four Rules
1. Shared quest, not a competition
- All participants seek to apply the ladder consistently — not to advocate for their own reports
- Failure mode prevented: advocacy bias — managers arguing to “win” promotions for their people rather than evaluating fairly
- Practical signal: if managers feel defensive about their assessments, calibration has become adversarial
2. Read, don’t present
- Written assessments are shared before the meeting; everyone reads in advance
- The meeting is for discussion and comparison — not for managers to pitch their reports
- Failure mode prevented: presentation bias — charismatic or senior managers get better outcomes simply through persuasion
3. Compare to the ladder, not to peers
- The question is always: “Does this person’s impact match the level description?”
- Never: “Is this person better or worse than that person?”
- Failure mode prevented: relative ranking — punishes engineers on strong teams and rewards those on weaker ones
- Academic grounding: criterion-referenced appraisal consistently outperforms norm-referenced appraisal for fairness and development outcomes
4. Study the distribution
- After calibration, review the population's distribution of designations (a minimal check is sketched after this list)
- Warning signs: 80%+ of the population at “Exceeds Expectations”; the share of engineers at senior levels far exceeding industry benchmarks
- Healthy distribution: roughly bell-curved around the middle designation
- Failure mode prevented: grade inflation — all managers grade high to avoid difficult conversations, eroding the system’s meaning
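A minimal sketch of what the rule-4 check could look like in Python. The designation labels, the 80% warning threshold, and the `distribution_check` helper are illustrative assumptions rather than anything Larson specifies; the threshold simply encodes the warning sign listed above.

```python
from collections import Counter

# Hypothetical three-point designation scale; substitute your ladder's actual labels.
SCALE = ["Below Expectations", "Meets Expectations", "Exceeds Expectations"]

def distribution_check(designations: list[str], top_threshold: float = 0.80) -> dict:
    """Tally post-calibration designations and flag grade-inflation warning signs.

    top_threshold is an assumed cutoff: 80%+ of the population at the top
    designation suggests inflation rather than excellence.
    """
    total = len(designations)
    counts = Counter(designations)
    shares = {label: counts.get(label, 0) / total for label in SCALE}
    warnings = []
    top_share = shares["Exceeds Expectations"]
    if top_share >= top_threshold:
        warnings.append(
            f"{top_share:.0%} of the population at the top designation; "
            "likely inflation, revisit against the ladder"
        )
    return {"shares": shares, "warnings": warnings}

# Example: a department of 20 engineers after calibration.
result = distribution_check(
    ["Exceeds Expectations"] * 17 + ["Meets Expectations"] * 3
)
print(result["shares"])    # Exceeds: 0.85, Meets: 0.15, Below: 0.0
print(result["warnings"])  # flags the 85% top-designation share
```

Rerun as cases are decided, the same tally is roughly what the facilitator's real-time distribution tracking in the meeting section below amounts to.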
What a Calibration Meeting Looks Like
- Assessments distributed to all participants 2-3 days before the meeting
- Participants read and note questions or comparisons
- Meeting opens with borderline or contested cases — not straightforward ones
- Discussion anchors on specific ladder criteria: “What evidence do we have for this impact at this scope?”
- Facilitator (usually the senior manager) tracks emerging distribution in real time
- Decisions documented with rationale; the manager communicates the outcome to the engineer (a sketch of such a record follows this list)
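The final step, documenting decisions with rationale, can be made concrete as a small record type. This is a hypothetical sketch, not a format from Larson; all field names and example values are invented for illustration. Note that the rationale is anchored to ladder criteria, per rule 3.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class CalibrationDecision:
    """One documented outcome from a calibration meeting (hypothetical schema)."""
    engineer: str
    level: str                  # ladder level under discussion, e.g. "Senior Engineer"
    designation: str            # confirmed or adjusted designation
    adjusted: bool              # True if calibration changed the manager's draft
    ladder_criteria: list[str]  # level criteria the evidence was weighed against
    evidence: list[str]         # concrete impact cited in the discussion
    rationale: str              # why the impact matches (or misses) the level description
    decided_on: date = field(default_factory=date.today)

# Example: a draft "Exceeds" pulled back because it rested on peer comparison.
record = CalibrationDecision(
    engineer="<name>",
    level="Senior Engineer",
    designation="Meets Expectations",
    adjusted=True,
    ladder_criteria=["technical scope", "cross-team influence"],
    evidence=["led a migration across three services", "mentored two mid-level engineers"],
    rationale=(
        "Impact matches the Senior level description; the draft 'Exceeds' "
        "rested on comparison to a weaker peer, not on ladder criteria."
    ),
)
```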
Connection to Designation Momentum
Past ratings create institutional memory. An engineer calibrated as “Exceeds” builds momentum that makes future downward adjustments socially difficult — even when warranted. This is why applying the rules rigorously from the start matters more than correcting drift later.
Related Concepts
- Larson-2019-An-Elegant-Puzzle
- Performance-Management-System
- Designation-Momentum
- Career-Level-Dynamics
Sources
- Larson, Will (2019). An Elegant Puzzle: Systems of Engineering Management. Stripe Press. ISBN: 978-1-7322651-8-9.
  - Chapter 6.5 — primary source for the four rules of calibration and the calibration meeting structure
- DeNisi, Angelo S. and Kevin R. Murphy (2017). “Performance Appraisal and Performance Management: 100 Years of Progress?” Journal of Applied Psychology, Vol. 102(3), pp. 421-433. DOI: 10.1037/apl0000085.
  - Comprehensive review of a century of appraisal research; documents leniency bias, halo effects, and distributional errors as persistent problems; supports criterion-referenced over norm-referenced approaches as more accurate and fair
- Colquitt, Jason A. (2001). “On the Dimensionality of Organizational Justice: A Construct Validation of a Measure.” Journal of Applied Psychology, Vol. 86(3), pp. 386-400. DOI: 10.1037/0021-9010.86.3.386.
  - Foundational study on procedural fairness (N=776); demonstrates that consistent, bias-suppressed, representative procedures drive perceived fairness independent of outcome; theoretical underpinning for why calibration rules matter
- Scullen, Steven E., Michael K. Mount, and Maynard Goff (2000). “Understanding the Latent Structure of Job Performance Ratings.” Journal of Applied Psychology, Vol. 85(6), pp. 956-970. DOI: 10.1037/0021-9010.85.6.956.
  - Analysis of variance in performance ratings; idiosyncratic rater effects account for 62% of variance — more than true performance — making cross-rater calibration essential for any fair system
- Orosz, Gergely (2022). “Performance Reviews for Software Developers – How I Do Them In a (Hopefully) Fair Way.” The Pragmatic Engineer. Available: https://blog.pragmaticengineer.com/performance-reviews-for-software-engineers/
  - Practitioner account of calibration practice at scale in tech; covers the “read before meeting” norm and distribution monitoring; corroborates Larson’s rules from industry experience
Note
This content was drafted with assistance from AI tools for research, organization, and initial content generation. All final content has been reviewed, fact-checked, and edited by the author to ensure accuracy and alignment with the author’s intentions and perspective.