Troubleshooting: flapping monitors
Flapping monitors
A flapping monitor goes red, green, red, green every few checks. It's almost always one of three causes.
1. timeout too tight
If the upstream's p95 is 1,800 ms and your timeout is 1,500 ms, more than 5% of checks will time out and go red. Raise the timeout to match real latency, or make the upstream faster.
timeout_ms: 5000 # was 1500
The dashboard's per-monitor latency panel shows p50 / p95 / p99 over the window. Pick a timeout above p99.
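As a rough illustration of "pick a timeout above p99", here is a minimal Python sketch. The latency samples, the 25% headroom, and the helper names are assumptions made up for this example, not DevFlow defaults. In practice, read p99 from the latency panel rather than computing it yourself.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def timeout_failure_rate(samples_ms, timeout_ms):
    """Fraction of samples that would have exceeded the given timeout."""
    return sum(1 for s in samples_ms if s > timeout_ms) / len(samples_ms)

def suggest_timeout_ms(samples_ms, headroom=1.25, round_to=100):
    """Timeout above the observed p99, rounded up to a multiple of round_to (headroom is an assumption)."""
    p99 = percentile(samples_ms, 99)
    return math.ceil(p99 * headroom / round_to) * round_to

# Hypothetical 100-check window: mostly fast, with a slow tail.
latencies_ms = [400] * 90 + [1700] * 4 + [1800] + [2500] * 4 + [3900]

print("p95:", percentile(latencies_ms, 95))
print("p99:", percentile(latencies_ms, 99))
print("red rate at 1500 ms:", timeout_failure_rate(latencies_ms, 1500))
print("suggested timeout_ms:", suggest_timeout_ms(latencies_ms))
```

In this made-up window the p95 is 1,800 ms, yet a 1,500 ms timeout fails 10% of checks, because every sample between the timeout and the p95 also counts as a failure.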
2. assertion too strict
A response that includes a generated request_id will fail an assertion that checks for an exact string match. Switch to a presence/type check (response-assertions):
- body_jsonpath:
    path: $.request_id
    type: string # was: eq: "REQ-12345"
3. retry policy off when it should be on
A 30-second monitor against an endpoint with a 0.5% transient error rate flaps unless you retry. Three attempts with exponential backoff usually fix it (retry-policy).
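To see why retries help, here is a back-of-the-envelope sketch in Python. It assumes attempts fail independently, which is optimistic. The function names are hypothetical, and the delay formula is the commonly described "full jitter" scheme (uniform between 0 and the exponential cap), which may not match how DevFlow implements `jitter: full`.

```python
import random

def checks_per_day(interval_s):
    """Number of checks a monitor runs per day at the given interval."""
    return 86_400 // interval_s

def failure_probability(error_rate, max_attempts):
    """Probability that every attempt fails, assuming independent failures."""
    return error_rate ** max_attempts

def full_jitter_delay_ms(retry_number, initial_delay_ms=250, rng=random):
    """Delay before retry `retry_number` (1-based): uniform in [0, initial * 2**(retry_number - 1)]."""
    cap = initial_delay_ms * 2 ** (retry_number - 1)
    return rng.uniform(0, cap)

runs = checks_per_day(30)  # 30-second monitor
for attempts in (1, 3):
    expected = runs * failure_probability(0.005, attempts)
    print(f"max_attempts={attempts}: about {expected:.4f} failed checks per day")

# With max_attempts=3 there are at most two retries, so at most two delays.
print([round(full_jitter_delay_ms(n)) for n in (1, 2)])
```

Without retries that is roughly 14 red checks per day. With three independent attempts, the chance that a single check fails drops to about 1 in 8 million. Real transient errors are often correlated over short intervals, which is what the backoff delay is meant to ride out.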
retry:
  max_attempts: 3
  backoff: exponential
  initial_delay_ms: 250
  jitter: full
what flapping is NOT
A flap is not "the upstream is down 1% of the time." That's a real reliability problem and the monitor is correctly reporting it. Use an SLO burn-rate alert (slo-overview) instead of a per-check page; the slo-multi-window-alerting pattern exists for exactly this.
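For intuition, a multi-window burn-rate check can be sketched in a few lines of Python. This illustrates the general pattern only, not DevFlow's alerting configuration. The 99.9% SLO target and the function names are assumptions, and the thresholds of 14.4 and 6 are the commonly cited values from Google's SRE Workbook.

```python
def burn_rate(error_rate, slo_target):
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    return error_rate / budget

def should_alert(long_window_error_rate, short_window_error_rate, slo_target, threshold):
    """Alert only when both the long and the short window burn faster than the threshold."""
    return (
        burn_rate(long_window_error_rate, slo_target) >= threshold
        and burn_rate(short_window_error_rate, slo_target) >= threshold
    )

slo_target = 0.999   # hypothetical 99.9% availability target
steady_errors = 0.01  # upstream failing 1% of the time in both windows

print("burn rate:", round(burn_rate(steady_errors, slo_target), 2))
print("fast-burn alert (14.4):", should_alert(steady_errors, steady_errors, slo_target, 14.4))
print("slow-burn alert (6):", should_alert(steady_errors, steady_errors, slo_target, 6))
```

Under these assumptions, a steady 1% error rate spends the budget about ten times faster than the target allows. That is enough to trip the slower threshold without paging on every individual red check.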
tools
devflow monitor inspect payments-api-charge --window 24h
Prints per-check timing, assertion outcomes, and the regions that disagreed.
still flapping?
If none of these apply, contact support@devflow.io with the monitor ID and a 24h window, and Kelly's team will take a look. We've seen DNS-anycast-related flapping before; it's rare but real.