Core Idea

Latency percentiles (P50, P90, P95, P99) are the correct lens for measuring system performance because latency distributions are long-tailed — averages hide the worst-case experiences that real users encounter. The Nth percentile means N% of requests complete within that time; the remaining (100-N)% are slower.

A latency percentile (Pn) is the response-time threshold at or below which n% of requests fall:

  • P50 (median): Half of requests complete within this time — baseline “typical” experience
  • P95: Only 5% of requests are slower — early-warning tier for emerging problems
  • P99: Only 1% of requests are slower — tail latency, the worst case for most users
  • P99.9: Only 0.1% are slower — extreme reliability requirements (financial, safety-critical)
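Under the nearest-rank definition, the ladder above can be computed directly from a sorted sample. A minimal sketch with made-up latency values:

```python
import math

def percentile(samples, p):
    """Nearest-rank Pn: smallest value with at least p% of samples at or below it."""
    xs = sorted(samples)
    rank = math.ceil(p * len(xs) / 100)  # 1-based rank
    return xs[rank - 1]

# Hypothetical latency samples in milliseconds
latencies = [12, 7, 9, 210, 8, 11, 10, 9, 95, 10] * 10
for p in (50, 90, 95, 99):
    print(f"P{p} = {percentile(latencies, p)} ms")
```

Real monitoring systems estimate percentiles from histograms rather than sorting raw samples, but the definition is the same.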

Why averages fail: Latency distributions are long-tailed, not normal curves. GC pauses, cold starts, cache misses, and network jitter create rare but dramatic outliers. Example: if 90% of requests complete in 5ms and 10% take 1,000ms, the mean is ~104ms—a number that describes nobody’s actual experience.
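The arithmetic is easy to check. A small sketch reproducing the example above:

```python
# 90% of requests at 5 ms, 10% at 1,000 ms, as in the text
latencies = [5] * 900 + [1000] * 100

mean = sum(latencies) / len(latencies)
p50 = sorted(latencies)[len(latencies) // 2 - 1]

print(f"mean = {mean} ms")  # ~104 ms: a value no actual request took
print(f"P50  = {p50} ms")   # 5 ms: what most users actually see
```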

The percentile ladder as a diagnostic tool:

  • Stable P50, rising P99: Rare pathological events (GC, lock contention, noisy neighbour)—not systemic load
  • Rising P50 and P99 together: Systemic load problem—capacity or efficiency
  • Wide P50-to-P99 gap: High variance; experience is unpredictable even if both pass SLO targets
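One crude way to operationalise the gap check is the P99/P50 ratio — a toy variance signal, not a standard metric:

```python
def nearest_rank(xs_sorted, frac):
    return xs_sorted[int(frac * len(xs_sorted)) - 1]  # simple nearest-rank estimate

def tail_ratio(latencies):
    """P99/P50 ratio: higher means a wider, less predictable tail."""
    xs = sorted(latencies)
    return nearest_rank(xs, 0.99) / nearest_rank(xs, 0.50)

steady = [10] * 98 + [12] * 2   # tight distribution
spiky  = [10] * 98 + [900] * 2  # same P50, huge tail
print(tail_ratio(steady), tail_ratio(spiky))
```

Both distributions have identical P50s and could pass a median-only SLO; only the ratio exposes the unpredictability.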

P99 is more common than it seems: On a page loading 200 resources, the probability a user escapes P99 latency on every single one is 0.99²⁰⁰ ≈ 13%—meaning ~87% of page loads are touched by at least one P99 event. In microservices with fan-out calls, this compounds further (see Tail-Latency).
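The compounding arithmetic can be sketched directly, for a few hypothetical fan-out sizes:

```python
def touched_by_tail(n, p_escape=0.99):
    """Probability that at least one of n independent fetches hits a P99-tail event."""
    return 1 - p_escape ** n

for n in (1, 10, 50, 200):
    print(f"{n:>3} resources: {touched_by_tail(n):.1%} of loads hit the tail")
```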

Coordinated omission: Naive benchmarks systematically underreport bad latency. When a tool stalls waiting for a slow response, it stops issuing new requests, generating zero measurements during the worst moments. As Gil Tene noted: “The number one indicator you should never get rid of is the maximum value. That is not noise, that is the signal.”
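A minimal simulation of the effect, with assumed parameters (5 ms service time, a single 1 s stall, one intended request every 10 ms): the closed-loop benchmark records the stall as one slow sample and reports a clean P99, while measuring from each intended send time shows the queued requests that absorbed it.

```python
STALL_START, STALL_END = 5_000, 6_000  # server frozen from 5 s to 6 s (ms)
NORMAL_MS = 5                          # normal service time
INTERVAL_MS = 10                       # intended gap between requests
DURATION_MS = 10_000

def latency(start_ms):
    """Requests arriving during the stall wait until it ends."""
    if STALL_START <= start_ms < STALL_END:
        return STALL_END - start_ms + NORMAL_MS
    return NORMAL_MS

# Closed-loop (naive) benchmark: waits for each response before sending the next,
# so it issues nothing during the stall and collects zero samples from it
closed, t = [], 0
while t < DURATION_MS:
    closed.append(latency(t))
    t += max(INTERVAL_MS, closed[-1])

# Corrected view: latency measured from every *intended* send time
intended = [latency(t) for t in range(0, DURATION_MS, INTERVAL_MS)]

def p99(xs):
    xs = sorted(xs)
    return xs[int(0.99 * len(xs)) - 1]  # simple nearest-rank estimate

print(f"closed-loop:  P99 = {p99(closed)} ms, max = {max(closed)} ms")
print(f"intended:     P99 = {p99(intended)} ms, max = {max(intended)} ms")
```

The closed-loop P99 stays at the normal service time even though the maximum clearly records the stall — which is exactly why the maximum is the signal.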

The skewed call-count pitfall: Aggregate percentiles say nothing about individual tenants. A large enterprise customer generating 10× more requests may experience systematically worse latency while the aggregate P95 still passes. Whether to measure aggregate, per-tenant, or per-endpoint percentiles is a business decision.
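A sketch of the aggregate-versus-per-tenant split, with fabricated traffic numbers chosen to make the divergence visible (one heavy tenant whose tail is slow, nine small fast tenants, and a hypothetical 200 ms SLO):

```python
from collections import defaultdict

SLO_MS = 200  # hypothetical latency SLO

# Fabricated traffic: the enterprise tenant sends 10x the requests of any
# small tenant, and 8% of its requests hit a 500 ms slow path
samples = [("enterprise", 50)] * 920 + [("enterprise", 500)] * 80
for i in range(9):
    samples += [(f"tenant-{i}", 50)] * 100

def p95(latencies):
    xs = sorted(latencies)
    return xs[int(0.95 * len(xs)) - 1]  # simple nearest-rank estimate

aggregate_p95 = p95([lat for _, lat in samples])

by_tenant = defaultdict(list)
for tenant, lat in samples:
    by_tenant[tenant].append(lat)

print(f"aggregate P95 = {aggregate_p95} ms "
      f"({'met' if aggregate_p95 <= SLO_MS else 'missed'})")
for tenant, lats in sorted(by_tenant.items()):
    status = "met" if p95(lats) <= SLO_MS else "missed"
    print(f"{tenant}: P95 = {p95(lats)} ms ({status})")
```

The aggregate P95 passes because the enterprise tenant's slow requests are diluted by everyone else's traffic; only the per-tenant breakdown reveals that the largest customer is missing the SLO.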

Note

This content was drafted with assistance from AI tools for research, organization, and initial content generation. All final content has been reviewed, fact-checked, and edited by the author to ensure accuracy and alignment with the author’s intentions and perspective.