Case study: Cleo Software
Background
Cleo Software operates a financial-coaching API serving over four million end users. The team's previous synthetic-monitoring stack ran 5-minute checks from a single region — adequate for the first product version, inadequate by Series B.
The problem
In Q3 2024 Cleo shipped two latency regressions that took over 40 minutes to detect — both because their previous monitoring ran every 5 minutes and only from us-east-1. By the time the on-call team saw the alert, customer support had already filed three tickets.
What we built together
Cleo migrated to DevFlow Watch in October 2024. They moved their critical revenue paths (login, balance-fetch, send) to 30-second checks across us-east-1, eu-west-1, and ap-southeast-1. They adopted SLO-based alerting with multi-window burn rates, replacing the per-failure pager culture they had previously.
Results
| Metric | Before | After |
|---|---|---|
| Time to detect (regression) | 40 min | 3 min |
| Pager events / week | 17 | 4 |
| SLO attainment (90-day) | 99.84% | 99.93% |
| On-call sentiment (NPS) | -12 | +34 |
“We caught a 700ms regression three minutes after deploy. The previous tool wouldn't have noticed for an hour.”
Cleo continues to use DevFlow as the source of truth for its public uptime page and its quarterly board reliability review. Their on-call rotation has shrunk from "everyone, sometimes" to "two people, predictably".