Core Idea

The SLI/SLO/SLA hierarchy formalizes performance and reliability aspirations into measurable commitments. SLIs are the metrics you measure, SLOs are the targets you set for those metrics, and SLAs are the external contracts with consequences for missing them. Error budgets derived from SLOs turn reliability governance into a continuous decision-making framework rather than a periodic audit.

The Three-Tier Hierarchy

SLI — Service Level Indicator:

  • A quantitative measure of system behaviour
  • Examples: p95 request latency, error rate over a window, availability (successful requests / total requests)
  • SLIs are facts, not targets — they are measurements

SLO — Service Level Objective:

  • A target range or threshold for an SLI
  • Examples: “p95 latency < 200ms, measured over a 30-day rolling window” or “error rate < 0.1% of requests”
  • SLOs are internal commitments — missing them triggers internal action (deployment freeze, engineering sprint)
  • SLOs should maintain a safety margin below the external SLA: if your SLA promises 99.9% uptime, your internal SLO might target 99.95%

SLA — Service Level Agreement:

  • An externally committed contract, typically with financial or legal consequences for violation
  • Examples: customer contracts, cloud provider commitments (AWS EC2: 99.99% monthly uptime)
  • Key distinction: SLA violations have external consequences (refunds, penalties, churn); SLO violations trigger internal response

Common Latency SLIs

  • p50 request latency: Median; baseline experience
  • p95 request latency: Most users’ worst-case; preferred SLI for interactive services
  • p99 request latency: Tail latency; preferred for internal platforms and B2B services where all tenants matter equally
  • Error rate: Percentage of requests that return a 5xx response or time out
  • Availability ratio: (Successful requests / Total requests) × 100%
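The SLIs above can be computed directly from raw request records. A minimal sketch in Python, using a nearest-rank percentile and illustrative data (the request tuples and function names are assumptions, not from any particular monitoring system):

```python
# Sketch: computing latency-percentile and availability SLIs from raw
# request records. Data values are illustrative.
requests = [
    # (latency_ms, status_code)
    (120, 200), (95, 200), (340, 200), (80, 500),
    (210, 200), (1500, 504), (60, 200), (105, 200),
]

latencies = sorted(ms for ms, _ in requests)

def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest value covering p% of observations."""
    k = max(0, round(p / 100 * len(sorted_values)) - 1)
    return sorted_values[k]

p50 = percentile(latencies, 50)   # median: baseline experience
p95 = percentile(latencies, 95)   # most users' worst case
p99 = percentile(latencies, 99)   # tail latency

errors = sum(1 for _, code in requests if code >= 500)
error_rate = errors / len(requests)
availability = 1 - error_rate     # successful / total

print(f"p50={p50}ms p95={p95}ms p99={p99}ms")
print(f"error_rate={error_rate:.1%} availability={availability:.1%}")
```

Note how the single 1500 ms timeout dominates p95 and p99 while leaving p50 untouched, which is exactly why averages and medians alone cannot drive SLO compliance.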

Setting Realistic Latency SLOs

Start from user research, not current performance:

  • Ask: what latency causes users to abandon or complain? What does your competitor offer?
  • Avoid the anti-pattern of setting SLOs based on what the system currently delivers — this bakes in existing inefficiencies

Match the percentile to the service type:

  • User-facing interactive services: Target p95 — it bounds the worst-case experience for 95% of users
  • Internal platforms and APIs: Target p99 — internal clients have no alternative; all experience levels matter
  • Batch systems: Throughput-focused SLIs (jobs per hour, queue depth) rather than per-request latency

Maintain a safety margin:

  • Internal SLO target should be stricter than the external SLA commitment
  • Gives time to detect and respond before SLA violation occurs
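The headroom that a stricter internal SLO buys can be quantified. A small sketch, using the 99.9%/99.95% figures from the text and a 30-day window (the variable names are illustrative):

```python
# Sketch: detection headroom between an internal SLO and an external SLA
# over a 30-day window. Targets are the examples from the text.
WINDOW_HOURS = 30 * 24  # 720 hours

sla_target = 0.999    # external commitment: 99.9% uptime
slo_target = 0.9995   # stricter internal objective: 99.95% uptime

sla_downtime_allowed = (1 - sla_target) * WINDOW_HOURS   # hours before SLA breach
slo_downtime_allowed = (1 - slo_target) * WINDOW_HOURS   # hours before SLO breach

headroom_hours = sla_downtime_allowed - slo_downtime_allowed

print(f"SLA allows {sla_downtime_allowed:.2f} h, SLO allows {slo_downtime_allowed:.2f} h")
print(f"Headroom to detect and respond: {headroom_hours:.2f} h")
```

Here the internal SLO is exhausted roughly 22 minutes of downtime before the SLA would be, which is the window in which the team can detect and mitigate.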

Error Budgets

An error budget is the allowable unreliability derived from the SLO:

  • Error budget = 1 - SLO target
  • Example: SLO of 99.5% availability → 0.5% error budget
  • Over a 30-day window, 0.5% = approximately 3.6 hours of allowed downtime
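The arithmetic above, expressed as a short sketch (targets and window are the example figures from the text):

```python
# Sketch: error budget derived from an availability SLO.
slo_target = 0.995               # 99.5% availability SLO
error_budget = 1 - slo_target    # 0.5% allowed unreliability

window_hours = 30 * 24           # 30-day window = 720 hours
allowed_downtime_hours = error_budget * window_hours

print(f"error budget = {error_budget:.1%}")
print(f"allowed downtime over 30 days = {allowed_downtime_hours:.1f} hours")
```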

Error budgets as a governance mechanism:

  • Budget burning fast → Freeze non-critical deployments; prioritize reliability work
  • Budget ample → Accelerate feature velocity; the system has reliability headroom
  • This turns reliability from a subjective debate (“are we reliable enough?”) into a quantified business trade-off

Alerting on Burn Rate, Not Raw Thresholds

Anti-pattern: Alert when p99 latency exceeds 500ms for one minute

  • Produces alert storms during normal traffic spikes
  • Misses slow, gradual SLO degradation that stays below the threshold

Better approach: Alert on SLO burn rate

  • “Error budget is being consumed at 5× the expected rate for the past hour”
  • Catches both acute spikes and slow burns
  • Reduces page fatigue by alerting on trajectory, not instantaneous threshold crossings
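A burn-rate check can be sketched in a few lines. The 5× threshold and the example traffic numbers below are illustrative; production setups commonly pair a fast window (minutes) with a slow one (an hour or more) to catch both spikes and slow burns:

```python
# Sketch: alerting on error-budget burn rate rather than a raw threshold.
def burn_rate(bad_requests, total_requests, slo_target):
    """How fast the error budget is burning relative to the sustainable rate.

    A burn rate of 1.0 exhausts the budget in exactly one SLO window;
    5.0 exhausts it in one fifth of the window.
    """
    if total_requests == 0:
        return 0.0
    observed_error_rate = bad_requests / total_requests
    budget = 1 - slo_target
    return observed_error_rate / budget

# Last hour: 1,200 failures out of 40,000 requests against a 99.5% SLO.
rate = burn_rate(bad_requests=1_200, total_requests=40_000, slo_target=0.995)

ALERT_THRESHOLD = 5.0  # page when the budget burns at 5x the sustainable rate
if rate >= ALERT_THRESHOLD:
    print(f"PAGE: error budget burning at {rate:.1f}x the expected rate")
```

A 3% observed error rate against a 0.5% budget yields a burn rate of 6×, which pages; the same logic stays quiet during a brief latency spike that barely dents the budget.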

Anti-Patterns

  • SLOs from current performance: “We’re currently at 99.7% uptime, so let’s SLO at 99.5%” — bakes in existing problems and doesn’t reflect user needs
  • Monitoring averages instead of percentiles: Average latency hides tail latency that determines SLO compliance for real users (see Latency-Percentiles)
  • No error budget policy: Defining SLOs without agreed-upon responses to budget exhaustion makes them decoration, not governance

Sources

  • Beyer, Betsy, Chris Jones, Jennifer Petoff, and Niall Richard Murphy (2016). Site Reliability Engineering: How Google Runs Production Systems. O’Reilly Media. Chapter: “Service Level Objectives.” ISBN: 978-1-491-92912-4.

  • Uptrace (2025). “Defining SLA/SLO-Driven Monitoring Requirements in 2025.” Uptrace Blog.

  • Nobl9 (2024). “SLO Metrics: A Best Practices Guide.” Nobl9 Blog.

  • incident.io (2024). “SLOs, SLAs, and SLIs: A complete guide to service reliability metrics.” incident.io Blog.

Note

This content was drafted with assistance from AI tools for research, organization, and initial content generation. All final content has been reviewed, fact-checked, and edited by the author to ensure accuracy and alignment with the author’s intentions and perspective.