Product · SLO tracking
SLOs that say something useful.
Most SLO products are uptime checkboxes with extra fields. Our model is the one in the SRE workbook: a target, a window, an explicit good-event definition, and an error budget that gets spent when things go wrong. We page on burn rate, not on raw failures.
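To make the error-budget idea concrete, here is a minimal Python sketch of the arithmetic, assuming a request-based SLI. The `Slo` class and the `budget_remaining` function are hypothetical names chosen for illustration, not part of the product's API, and the 99.9% target and 28-day window are example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO model: a target over a rolling window."""
    target: float      # fraction of events that must be good, e.g. 0.999
    window_days: int   # rolling window length in days

    @property
    def budget_fraction(self) -> float:
        """Share of events that may be bad over the window."""
        return 1.0 - self.target

    @property
    def budget_minutes(self) -> float:
        """The same budget expressed as minutes of total failure."""
        return self.budget_fraction * self.window_days * 24 * 60


def budget_remaining(slo: Slo, good: int, total: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    if total == 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_bad = slo.budget_fraction * total
    actual_bad = total - good
    return 1.0 - actual_bad / allowed_bad


slo = Slo(target=0.999, window_days=28)

# 1,000,000 requests in the window allow about 1,000 bad ones.
# With 400 bad requests, roughly 60% of the budget is left.
remaining = budget_remaining(slo, good=999_600, total=1_000_000)
```

Expressed as time, the same budget is about 40 minutes of total failure per 28 days (`slo.budget_minutes`), which is often the easier number to reason about.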
yaml
name: payments-availability-99.9
target: 0.999
window: 28d
sli:
  source_monitor: payments-api-charge
  good_event:
    status_in: [200, 201, 204]
    latency_lt_ms: 1000
alert:
  rule: |
    burn_rate(short=1h) > 14.4
    AND
    burn_rate(long=6h) > 14.4
  channels: [pagerduty:payments-oncall]

Multi-window burn-rate alerting
The Google CRE pattern. The long window shows that the burn is large enough to matter; the short window shows that it is still happening. Both windows have to exceed the threshold before an alert fires, which suppresses most alerts caused by brief spikes.
Amaia’s burn-rate cheat-sheet is a useful pocket reference. The full doc is slo-multi-window-alerting.
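As a rough illustration of how a two-window rule behaves, the sketch below computes burn rate as the observed error rate divided by the error rate the SLO allows, then requires both windows to exceed a threshold. The function names `burn_rate` and `should_page` and the event counts are assumptions made up for this example; the product's own evaluation logic may differ.

```python
def burn_rate(bad: int, total: int, target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A burn rate of 1.0 spends the budget exactly over the SLO window;
    higher values spend it proportionally faster.
    """
    if total == 0:
        return 0.0  # no events, so no budget is being spent
    return (bad / total) / (1.0 - target)


def should_page(short: tuple[int, int], long: tuple[int, int],
                target: float, threshold: float) -> bool:
    """Page only when BOTH windows burn faster than the threshold.

    `short` and `long` are (bad_events, total_events) pairs counted
    over the short and long lookback windows respectively.
    """
    return (burn_rate(*short, target) > threshold
            and burn_rate(*long, target) > threshold)


TARGET = 0.999
THRESHOLD = 14.4

# Sustained incident: 2% errors in the last hour, 1.5% over six hours.
sustained = should_page(short=(200, 10_000), long=(900, 60_000),
                        target=TARGET, threshold=THRESHOLD)

# Brief spike: the last hour looks bad, but the six-hour view is calm.
blip = should_page(short=(200, 10_000), long=(210, 60_000),
                   target=TARGET, threshold=THRESHOLD)
```

In this sketch the sustained incident pages and the brief spike does not, because the long window's burn rate stays near 3.5. For scale, a burn rate of 14.4 held constant would exhaust a 28-day budget in just under two days (28 / 14.4).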
Status pages, included.
Every SLO can back a public or private status page component. NorthLoop runs a per-customer status page off this — see the NorthLoop case study for the operational shape.