SLO burn rates: a cheatsheet for picking thresholds
A pocket reference for the multi-window burn-rate thresholds you actually need to memorize.
the table
For a 28-day SLO window, these are the burn-rate thresholds we recommend (adapted from the Google SRE Workbook, with the rationale spelled out).
| Pattern | Short window | Long window | Burn-rate threshold | Detection time (worst case) |
|---|---|---|---|---|
| Fast burn | 1 hour | 6 hours | 14.4 | ~1 hour |
| Medium burn | 6 hours | 1 day | 6.0 | ~6 hours |
| Slow burn | 1 day | 3 days | 3.0 | ~1 day |
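A burn rate of 14.4 sustained for 1 hour uses up about 2.1% of a 28-day budget (14.4 × 1 h / 672 h; exactly 2% on a 30-day window). For a 28-day 99.9% SLO, whose whole budget is about 40 minutes of full downtime, that is roughly 52 seconds' worth of errors in a single hour: page-worthy.
A burn rate of 6.0 sustained for 6 hours uses up about 5.4% of a 28-day budget (6.0 × 6 h / 672 h; exactly 5% on a 30-day window). Slack-worthy for the day-shift on-call.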
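To sanity-check these numbers, the arithmetic can be sketched in a few lines of Python. The function name `budget_consumed` and the constant are illustrative assumptions, not part of any tool mentioned here; the only formula used is budget fraction = burn rate × hours sustained / SLO window in hours.

```python
# Illustrative sketch: how much error budget a sustained burn rate consumes.
SLO_WINDOW_HOURS = 28 * 24  # 28-day SLO window = 672 hours


def budget_consumed(burn_rate: float, hours: float,
                    slo_window_hours: float = SLO_WINDOW_HOURS) -> float:
    """Fraction of the total error budget used by burning at `burn_rate` for `hours`."""
    return burn_rate * hours / slo_window_hours


# (burn rate, hours sustained) pairs taken from the table above
for rate, hours in [(14.4, 1), (6.0, 6), (3.0, 24)]:
    share = budget_consumed(rate, hours)
    print(f"burn rate {rate:>4} for {hours:>2} h -> {share:.1%} of budget")
```

On a 28-day window this prints about 2.1%, 5.4%, and 10.7%; pass `slo_window_hours=720` to see the 30-day figures (2%, 5%, 10%).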
the why behind two windows
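A single short window catches fast outages but gives a lot of false positives, because one brief blip can push it over the threshold. A single long window gives a stable signal but takes ages to fire, and stays red long after the problem is fixed. Requiring both windows to exceed the same threshold (a logical AND) gets the best of each: the long window confirms the burn is big enough to matter, so a blip cannot fire the alert on its own, and the short window confirms the burn is still happening, so the alert clears soon after the problem stops.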
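The two-window check can be sketched as below. This is a minimal illustration, not any monitoring system's actual rule syntax: the function names are hypothetical, the error ratios are assumed to be pre-computed per window (failed requests / total requests), and the strict `>` comparison is an assumption.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows."""
    return error_ratio / (1.0 - slo_target)


def should_alert(short_error_ratio: float, long_error_ratio: float,
                 threshold: float, slo_target: float = 0.999) -> bool:
    """Fire only when BOTH windows burn faster than the threshold."""
    return (burn_rate(short_error_ratio, slo_target) > threshold
            and burn_rate(long_error_ratio, slo_target) > threshold)


# A brief blip: the short window spikes (50x) but the long window stays calm (2x).
print(should_alert(0.05, 0.002, threshold=14.4))   # False
# A sustained burn: both windows are above 14.4x (20x and 16x).
print(should_alert(0.02, 0.016, threshold=14.4))   # True
# After recovery: the short window is clean, so the alert clears.
print(should_alert(0.0, 0.02, threshold=14.4))     # False
```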
what to put in PagerDuty vs Slack
Roughly:
- 14.4 → PagerDuty (someone wakes up).
- 6.0 → Slack (someone fixes it during the day).
- 3.0 → Linear ticket auto-filed (someone fixes it this sprint).
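The routing above amounts to "most severe matching tier wins". A hedged sketch follows; the destination labels and the `route_for` helper are placeholders rather than real integration names, and in practice each tier would be evaluated against its own pair of windows before being routed.

```python
from typing import Optional

# Tiers ordered from most to least severe; labels are illustrative placeholders.
ROUTES = [
    (14.4, "pagerduty"),      # someone wakes up
    (6.0, "slack"),           # someone fixes it during the day
    (3.0, "linear-ticket"),   # someone fixes it this sprint
]


def route_for(burn_rate: float) -> Optional[str]:
    """Return the destination of the most severe tier this burn rate exceeds."""
    for threshold, destination in ROUTES:
        if burn_rate > threshold:
            return destination
    return None  # below every threshold: no notification


for rate in (20.0, 7.5, 3.2, 1.0):
    print(rate, "->", route_for(rate))
```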
The DevFlow alert calculator in the dashboard does this math for you against your specific window.
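For reference, the underlying arithmetic is small. The sketch below is not taken from the DevFlow calculator; `burn_rate_threshold` and its parameters are hypothetical names, and the 2% and 5% budget fractions are the ones the SRE Workbook uses to derive 14.4 and 6 on a 30-day window.

```python
def burn_rate_threshold(budget_fraction: float, alert_window_hours: float,
                        slo_window_days: float) -> float:
    """Burn rate that spends `budget_fraction` of the budget within the alert window."""
    return budget_fraction * slo_window_days * 24 / alert_window_hours


# Same budget fractions, evaluated on 30-day and 28-day SLO windows.
for days in (30, 28):
    fast = burn_rate_threshold(0.02, 1, days)    # 2% of budget in 1 hour
    medium = burn_rate_threshold(0.05, 6, days)  # 5% of budget in 6 hours
    print(f"{days}-day window: fast={fast:.2f}, medium={medium:.2f}")
```

Holding the budget fractions fixed, a 28-day window gives slightly lower thresholds (about 13.44 and 5.6) than the 30-day values of 14.4 and 6.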
— Amaia