Core Idea

Latency percentiles (P50, P90, P95, P99) are the correct lens for measuring system performance because latency distributions are long-tailed — averages hide the worst-case experiences that real users encounter. The Nth percentile means N% of requests complete within that time; the remaining (100-N)% are slower.

Definition

A latency percentile (Pn) is the response time threshold below which n% of requests fall:

  • P50 (median): Half of requests complete within this time — your baseline, “typical” experience
  • P90: 90% of requests are this fast or faster — the majority experience
  • P95: Only 5% of requests are slower — an early-warning tier for emerging problems
  • P99: Only 1% of requests are slower — tail latency; it sounds rare per request, but at scale most users still encounter it
  • P99.9: Only 0.1% of requests are slower — extreme reliability requirements (financial, safety-critical)
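The ladder above can be computed directly by sorting. A minimal sketch using the nearest-rank method (the sample latencies are invented for illustration; production systems typically use histogram-based estimators instead of exact sorts):

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: the smallest observed value such that
    at least p% of observations are <= it."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# 100 simulated requests: 90 fast, 9 moderate, 1 pathological
latencies = [5] * 90 + [40] * 9 + [900]

print(percentile(latencies, 50))   # 5   (median: the "typical" request)
print(percentile(latencies, 95))   # 40  (5% of requests are slower)
print(percentile(latencies, 100))  # 900 (the maximum - never discard it)
```

Note how P50 and P90 are identical here while P95 and the maximum differ by 20×: each rung of the ladder exposes a different slice of the distribution.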

Why Averages Fail

Latency distributions are not normal curves — they are long-tailed:

  • GC pauses (JVM stop-the-world events), cold starts, cache misses, and network jitter create rare but dramatic outliers
  • These outliers drag the arithmetic mean upward without affecting most users
  • Example: A system where 90% of requests complete in 5ms and 10% take 1,000ms has a mean of ~104ms — a number that describes nobody’s actual experience
  • The mean obscures the bimodal reality: most users see 5ms, some users see 1,000ms
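The bimodal example above is easy to reproduce; the simulated sample sizes are illustrative:

```python
import statistics

# 90% of requests at 5 ms, 10% at 1,000 ms (1,000 simulated requests)
latencies = [5] * 900 + [1000] * 100

mean = statistics.mean(latencies)            # 104.5 ms - describes nobody
p50 = sorted(latencies)[len(latencies) // 2]  # 5 ms - the typical request

print(f"mean={mean} ms, P50={p50} ms")
```

The mean lands in a region of the distribution where no request actually occurs, which is exactly why it fails as a summary statistic here.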

The Percentile Ladder as a Diagnostic Tool

Each tier of the ladder signals something different:

  • Stable P50, rising P99 → Rare pathological events (GC, lock contention, noisy neighbour) — not systemic load
  • Rising P50 and P99 together → Systemic load problem — capacity or efficiency
  • Wide P50-to-P99 gap → High variance/inconsistency; even if both pass SLO targets, the experience is unpredictable
  • Narrow P50-to-P99 gap → Highly consistent system; the 99th percentile closely tracks the median
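The ladder can be turned into a toy triage rule. This is a sketch only: the 10% rise threshold, parameter names, and messages are invented for illustration, not an operational standard:

```python
def diagnose(p50_delta_pct, p99_delta_pct, rise_threshold=10):
    """Classify a latency shift from week-over-week percentage changes
    in P50 and P99. Thresholds are illustrative assumptions."""
    p50_rising = p50_delta_pct > rise_threshold
    p99_rising = p99_delta_pct > rise_threshold
    if p99_rising and not p50_rising:
        return "tail event: suspect GC, lock contention, noisy neighbour"
    if p50_rising and p99_rising:
        return "systemic load: check capacity and efficiency"
    return "no significant shift"

print(diagnose(p50_delta_pct=2, p99_delta_pct=40))
# tail event: suspect GC, lock contention, noisy neighbour
```

A real alerting rule would also track the P50-to-P99 ratio over time to catch the variance signal described above.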

The Coordinated Omission Problem

Naive benchmarking tools systematically underreport bad latency — a phenomenon called coordinated omission (Gil Tene, Azul Systems):

  • When a benchmark tool stalls waiting for a slow response, it stops issuing new requests
  • This means the stall period generates zero latency measurements instead of thousands of “slow” ones
  • The resulting histogram looks excellent because the worst moments are invisible
  • Gil Tene: “The number one indicator you should never get rid of is the maximum value. That is not noise, that is the signal.”
  • Correct approach: Tools like HdrHistogram account for scheduled-but-not-yet-issued requests during stalls
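The correction can be sketched as back-filling the samples a stalled load generator failed to issue, in the spirit of HdrHistogram's expected-interval recording (function name, the 100 ms interval, and the 1,000 ms stall are illustrative assumptions):

```python
def corrected_samples(measured_ms, expected_interval_ms):
    """Yield the measured latency plus synthetic samples for the
    requests that should have been issued on schedule during the stall
    but never were - each would have waited progressively less long."""
    yield measured_ms
    backlog = measured_ms - expected_interval_ms
    while backlog >= expected_interval_ms:
        yield backlog
        backlog -= expected_interval_ms

# One 1,000 ms stall at a 100 ms issue rate hides nine slow requests:
samples = list(corrected_samples(1000, 100))
print(samples)  # [1000, 900, 800, 700, 600, 500, 400, 300, 200, 100]
```

A naive tool records the single 1,000 ms sample; the corrected histogram records ten slow samples, which is what a steady stream of real clients would actually have experienced.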

The 99th Percentile Is More Common Than It Seems

A common misconception: “P99 only affects 1% of users, so it’s not important.”

  • On a page that loads 200 resources (images, scripts, API calls), the probability that a user escapes P99 latency on every single one is 0.99^200 ≈ 13%
  • In other words, ~87% of page loads will be affected by at least one P99 event
  • For microservices systems with fan-out calls, see Tail-Latency for how this compounds
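The arithmetic behind the 0.99^200 claim:

```python
# Probability that all 200 independent resources avoid their P99 tail
p_clean = 0.99 ** 200
print(f"{p_clean:.1%} of page loads avoid every P99 event")  # ~13.4%
print(f"{1 - p_clean:.1%} hit at least one")                 # ~86.6%
```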

The Skewed Call-Count Pitfall

Aggregate percentiles pool all requests together and say nothing about individual clients or tenants:

  • A large enterprise customer generating 10× more requests than others (due to larger data volumes) may experience systematically worse latency
  • Yet the aggregate P95 can still pass because their requests are diluted across the distribution
  • Conversely, a high-volume client can dominate the distribution and mask the worse experience of lower-volume clients
  • There is no canned solution: whether to measure aggregate, per-tenant, or per-endpoint percentiles is a business decision based on client contracts, data volume distribution, and accountability commitments
  • For B2B SaaS with large enterprise customers, per-tenant percentiles are often the correct unit of accountability
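A minimal sketch of aggregate vs per-tenant P95, assuming request logs arrive as (tenant, latency_ms) pairs; tenant names, latencies, and the hypothetical 200 ms SLO are invented for illustration. Here a high-volume tenant dilutes the distribution so the aggregate passes while a smaller tenant's P95 fails badly:

```python
import math
from collections import defaultdict

def p95(values):
    """Nearest-rank P95 over a list of latencies."""
    ordered = sorted(values)
    rank = max(1, math.ceil(0.95 * len(ordered)))
    return ordered[rank - 1]

# Simulated log: 90 fast requests from a big tenant, 10 from a small
# tenant whose tail is bad (3 of its 10 requests take 400 ms)
requests = ([("bigco", 10)] * 90
            + [("acme", 50)] * 7 + [("acme", 400)] * 3)

by_tenant = defaultdict(list)
for tenant, latency in requests:
    by_tenant[tenant].append(latency)

aggregate = p95([latency for _, latency in requests])
per_tenant = {t: p95(v) for t, v in by_tenant.items()}

print(aggregate)   # 50  - passes a 200 ms SLO
print(per_tenant)  # {'bigco': 10, 'acme': 400} - acme fails it badly
```

The aggregate number is true but useless for acme: only the per-tenant breakdown reveals who is actually having a bad time.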

Industry Benchmarks

  Domain                 P99 Target   Rationale
  AdTech (RTB)           < 100 ms     Real-time bidding auction window
  Financial trading      < 500 μs     Order execution competitiveness
  E-commerce page load   < 2 s        Conversion rate preservation
  Interactive APIs       < 200 ms     Perceived instantaneousness threshold

Google research: 500ms additional latency caused a 20% drop in search traffic. Amazon: 100ms of extra latency costs 1% of revenue.

Note

This content was drafted with assistance from AI tools for research, organization, and initial content generation. All final content has been reviewed, fact-checked, and edited by the author to ensure accuracy and alignment with the author’s intentions and perspective.