Core Idea
Latency percentiles (P50, P90, P95, P99) are the correct lens for measuring system performance because latency distributions are long-tailed — averages hide the worst-case experiences that real users encounter. The Nth percentile means N% of requests complete within that time; the remaining (100-N)% are slower.
Definition
A latency percentile (Pn) is the response time threshold below which n% of requests fall:
- P50 (median): Half of requests complete within this time — your baseline, “typical” experience
- P90: 90% of requests are this fast or faster — the majority experience
- P95: Only 5% of requests are slower — an early-warning tier for emerging problems
- P99: Only 1% of requests are slower — tail latency, the worst-case for most users
- P99.9: Only 0.1% of requests are slower — extreme reliability requirements (financial, safety-critical)
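The ladder can be computed directly from a sample of response times. A minimal nearest-rank implementation in Python (the sample data is invented for illustration; real monitoring systems usually use interpolated or histogram-bucketed estimates):

```python
import math

def percentile(samples, p):
    """Nearest-rank Pn: the smallest sample value that at least p%
    of samples are less than or equal to."""
    ordered = sorted(samples)
    k = math.ceil(p / 100 * len(ordered)) - 1  # 0-based rank
    return ordered[k]

# Hypothetical sample: 100 response times of 1..100 ms.
latencies = list(range(1, 101))
for p in (50, 90, 95, 99):
    print(f"P{p} = {percentile(latencies, p)} ms")
# → P50 = 50 ms, P90 = 90 ms, P95 = 95 ms, P99 = 99 ms
```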
Why Averages Fail
Latency distributions are not normal curves — they are long-tailed:
- GC pauses (JVM stop-the-world events), cold starts, cache misses, and network jitter create rare but dramatic outliers
- These outliers drag the arithmetic mean upward without affecting most users
- Example: A system where 90% of requests complete in 5ms and 10% take 1,000ms has a mean of ~104ms — a number that describes nobody’s actual experience
- The mean obscures the bimodal reality: most users see 5ms, some users see 1,000ms
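The arithmetic in the example above can be checked directly; a sketch using Python's statistics module:

```python
from statistics import mean, quantiles

# 90% of requests at 5 ms, 10% at 1,000 ms
latencies = [5] * 90 + [1000] * 10

print(mean(latencies))               # 104.5: describes nobody's experience
cuts = quantiles(latencies, n=100)   # the 99 percentile cut points P1..P99
print(cuts[49])                      # P50 = 5.0  (what most users see)
print(cuts[98])                      # P99 = 1000.0 (the unlucky tail)
```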
The Percentile Ladder as a Diagnostic Tool
Each tier of the ladder signals something different:
- Stable P50, rising P99 → Rare pathological events (GC, lock contention, noisy neighbour) — not systemic load
- Rising P50 and P99 together → Systemic load problem — capacity or efficiency
- Wide P50-to-P99 gap → High variance/inconsistency; even if both pass SLO targets, the experience is unpredictable
- Narrow P50-to-P99 gap → Highly consistent system; the 99th percentile closely tracks the median
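One way to encode these heuristics is to compare P50 and P99 movement between two monitoring windows. The threshold below is an arbitrary illustration, not a standard:

```python
def diagnose(p50_change_pct, p99_change_pct, threshold=10.0):
    """Map window-over-window movement in P50/P99 (percent change) to the
    diagnostic ladder above. The 10% threshold is illustrative only."""
    p50_rising = p50_change_pct > threshold
    p99_rising = p99_change_pct > threshold
    if p50_rising and p99_rising:
        return "systemic load problem: capacity or efficiency"
    if p99_rising:
        return "rare pathological events: GC, lock contention, noisy neighbour"
    return "no significant shift"

print(diagnose(1.0, 45.0))   # stable P50, rising P99
print(diagnose(30.0, 35.0))  # both rising together
```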
The Coordinated Omission Problem
Naive benchmarking tools systematically underreport bad latency — a phenomenon called coordinated omission (Gil Tene, Azul Systems):
- When a benchmark tool stalls waiting for a slow response, it stops issuing new requests
- This means the stall period generates zero latency measurements instead of thousands of “slow” ones
- The resulting histogram looks excellent because the worst moments are invisible
- Gil Tene: “The number one indicator you should never get rid of is the maximum value. That is not noise, that is the signal.”
- Correct approach: record against the intended request schedule; tools like HdrHistogram back-fill the samples that a stalled load generator should have produced (e.g. via `recordValueWithExpectedInterval`)
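A simplified sketch of that correction in Python, modeled on HdrHistogram's `recordValueWithExpectedInterval`: when one response stalls past the intended issue interval, back-fill the latencies the requests scheduled during the stall would have seen:

```python
def record_corrected(samples, latency_ms, expected_interval_ms):
    """Record a measured latency, then back-fill the samples a constant-rate
    load generator would have produced during the stall
    (coordinated-omission correction)."""
    samples.append(latency_ms)
    missed = latency_ms - expected_interval_ms
    while missed >= expected_interval_ms:
        # A request issued mid-stall would have waited this long.
        samples.append(missed)
        missed -= expected_interval_ms

samples = []
record_corrected(samples, 1000, 10)  # one 1,000 ms stall at a 10 ms issue rate
print(len(samples))  # 100 samples instead of 1: the stall is no longer invisible
```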
The 99th Percentile Is More Common Than It Seems
A common misconception: “P99 only affects 1% of users, so it’s not important.”
- On a page that loads 200 resources (images, scripts, API calls), the probability that a user escapes P99 latency on every single one is 0.99^200 ≈ 13%
- In other words, ~87% of page loads will be affected by at least one P99 event
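The arithmetic behind the 200-resource example:

```python
n = 200                       # resources fetched per page load
p_escape = 0.99 ** n          # probability every fetch beats the P99 threshold
print(f"{p_escape:.1%}")      # ≈ 13.4%: page loads untouched by any P99 event
print(f"{1 - p_escape:.1%}")  # ≈ 86.6%: page loads hitting at least one
```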
- For microservices systems with fan-out calls, see Tail-Latency for how this compounds
The Skewed Call-Count Pitfall
Aggregate percentiles pool all requests together and say nothing about individual clients or tenants:
- A large enterprise customer generating 10× more requests than others (due to larger data volumes) may experience systematically worse latency
- Yet the aggregate P95 can still pass because their requests are diluted across the distribution
- Conversely, a high-volume client can dominate the distribution and mask the worse experience of lower-volume clients
- There is no canned solution: whether to measure aggregate, per-tenant, or per-endpoint percentiles is a business decision based on client contracts, data volume distribution, and accountability commitments
- For B2B SaaS with large enterprise customers, per-tenant percentiles are often the correct unit of accountability
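A sketch of how an aggregate P95 can mask a slow tenant; the tenant names and numbers here are invented:

```python
from collections import defaultdict
from statistics import quantiles

def p95(samples):
    return quantiles(samples, n=100)[94]  # the 95th of the 99 cut points

# Hypothetical request log: (tenant, latency_ms). The enterprise tenant
# is only 4% of traffic but systematically slower.
log = [("small_co", 20)] * 960 + [("big_enterprise", 180)] * 40

print("aggregate P95:", p95([ms for _, ms in log]))  # 20.0: the SLO "passes"

by_tenant = defaultdict(list)
for tenant, ms in log:
    by_tenant[tenant].append(ms)
for tenant, samples in by_tenant.items():
    print(f"{tenant} P95: {p95(samples)}")  # big_enterprise P95: 180.0
```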
Industry Benchmarks
| Domain | P99 Target | Rationale |
|---|---|---|
| AdTech (RTB) | < 100ms | Real-time bidding auction window |
| Financial trading | < 500μs | Order execution competitiveness |
| E-commerce page load | < 2s | Conversion rate preservation |
| Interactive APIs | < 200ms | Perceived instantaneousness threshold |
Google research found that 500ms of additional latency caused a 20% drop in search traffic; Amazon found that 100ms of extra latency cost 1% of revenue.
Related Concepts
- Tail-Latency — Why P99 values compound catastrophically in distributed fan-out calls
- Service-Level-Indicators — How percentiles become formalized SLI/SLO/SLA commitments
- Operational-Measures — Percentiles as operational runtime metrics
- Measuring-Architecture-Characteristics — The broader measurement framework
- Fitness Functions — Automated enforcement of latency targets in CI/CD
- Fallacy-Latency-Is-Zero — Why latency magnitude matters in distributed systems
Sources
- Aerospike (2024). “What Is P99 Latency? Understanding the 99th Percentile of Performance.” Aerospike Blog.
  - Comprehensive practitioner overview of P99 with industry benchmarks
  - Available: https://aerospike.com/blog/what-is-p99-latency/
- OneUptime (2025). “P50 vs P95 vs P99 Latency Explained: What Each Percentile Tells You.” OneUptime Blog.
  - Percentile ladder explanation with diagnostic interpretation guidance
  - Available: https://oneuptime.com/blog/post/2025-09-15-p50-vs-p95-vs-p99-latency-percentiles/view
- Tene, Gil (2014). “How NOT to Measure Latency.” Talk at Strange Loop Conference.
  - Original exposition of coordinated omission and why maximum values matter
  - Via: Grigorik, Ilya. “Everything You Know About Latency Is Wrong.” Brave New Geek.
  - Available: https://bravenewgeek.com/everything-you-know-about-latency-is-wrong/
- Gudigar, Anil (2024). “Mastering Latency Metrics: P90, P95, P99.” Javarevisited, Medium.
  - Practitioner guide to interpreting each percentile tier and setting targets
  - Available: https://medium.com/javarevisited/mastering-latency-metrics-p90-p95-p99-d5427faea879
- Gatling (2024). “Latency Percentiles for Load Testing Analysis.” Gatling Blog.
  - Load testing perspective on percentile selection and interpretation
  - Available: https://gatling.io/blog/latency-percentiles-for-load-testing-analysis
Note
This content was drafted with assistance from AI tools for research, organization, and initial content generation. All final content has been reviewed, fact-checked, and edited by the author to ensure accuracy and alignment with the author’s intentions and perspective.