Case study: Cleo Software

Background

Cleo Software operates a financial-coaching API serving over four million end users. The team's previous synthetic-monitoring stack ran 5-minute checks from a single region — adequate for the first product version, inadequate by Series B.

The problem

In Q3 2024 Cleo shipped two latency regressions that took over 40 minutes to detect — both because their previous monitoring ran every 5 minutes and only from us-east-1. By the time the on-call team saw the alert, customer support had already filed three tickets.

What we built together

Cleo migrated to DevFlow Watch in October 2024. They moved their critical revenue paths (login, balance-fetch, send) to 30-second checks across us-east-1, eu-west-1, and ap-southeast-1. They adopted SLO-based alerting with multi-window burn rates, replacing the per-failure pager culture they had previously.

Results

Metric	Before	After
Time to detect (regression)	40 min	3 min
Pager events / week	17	4
SLO attainment (90-day)	99.84%	99.93%
On-call sentiment (NPS)	-12	+34

“We caught a 700ms regression three minutes after deploy. The previous tool wouldn't have noticed for an hour.”

— Lara Mendelsohn, Staff SRE, Cleo Software

Cleo continues to use DevFlow as the source of truth for its public uptime page and its quarterly board reliability review. Their on-call rotation has shrunk from "everyone, sometimes" to "two people, predictably".