Troubleshooting: false positives and noise

Owner Kelly Mendoza · Last updated 2026-03-16 · v2.4

troubleshooting · false-positive · noise · debug · tuning

False positives

A false positive is an alert that fired while the system was actually fine. It is a more general problem than flapping (see troubleshooting-flapping-monitors); the cures overlap, but it is worth treating as a separate problem.

diagnose first

Before tuning, ask: was the customer-facing request actually fine, or did our check just disagree?

Check the upstream's own metrics for the same window. If they show the upstream was 100% healthy, the alert was a genuine false positive.
Check the assertion that failed. Was it about the customer experience, or an implementation detail?
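To make the second question concrete, here is a minimal Python sketch contrasting an assertion on an implementation detail (exact body equality) with one on the contract customers depend on. The field names and the `REQUIRED_FIELDS` contract are hypothetical, and this is not how DevFlow evaluates assertions; it only illustrates the distinction.

```python
from typing import Any, Mapping

# Hypothetical contract: the fields (and types) customers actually rely on.
REQUIRED_FIELDS = {"id": str, "status": str}


def strict_check(body: Mapping[str, Any], expected: Mapping[str, Any]) -> bool:
    """Implementation-detail assertion: any change to the body fails."""
    return body == expected


def contract_check(
    body: Mapping[str, Any],
    required: Mapping[str, type] = REQUIRED_FIELDS,
) -> bool:
    """Customer-facing assertion: required fields exist with the right type.

    Extra fields are ignored, so a new optional field does not fail the check.
    """
    return all(isinstance(body.get(name), typ) for name, typ in required.items())


if __name__ == "__main__":
    baseline = {"id": "abc", "status": "ok"}
    with_new_field = {"id": "abc", "status": "ok", "trace_id": "t-1"}
    print(strict_check(with_new_field, baseline))  # fails on the extra field
    print(contract_check(with_new_field))          # passes, contract intact
```

If the failing assertion behaves like `strict_check`, it is probably testing an implementation detail rather than the customer experience.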

If the upstream had a real blip — even one your customers wouldn't notice — that is a true positive. The right fix is upstream, not in DevFlow.

fix candidates

Loosen the assertion to what customers actually depend on. A response that keeps the contract stable but adds a new optional field shouldn't fail your check.
Move from monitor-failure paging to SLO burn-rate paging. A 0.5% blip doesn't trip a 14.4× burn rate over a 1h window — see slo-multi-window-alerting.
Use multi-region checks with a fail-quorum (see multi-region-setup) so that trouble at a single edge doesn't page.
Use retry-policy for endpoints with known transient noise.
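The burn-rate point above can be checked with a little arithmetic. The sketch below assumes a 99.9% SLO over a 30-day period and reads "0.5% blip" as 0.5% of requests failing across the 1h window. Neither assumption comes from this page, so substitute your own SLO and period.

```python
PAGE_BURN_RATE = 14.4  # paging threshold quoted in the list above
WINDOW_HOURS = 1.0     # alert window quoted in the list above


def burn_rate(error_rate: float, slo_target: float) -> float:
    """Observed error rate divided by the error budget (1 - SLO target)."""
    error_budget = 1.0 - slo_target
    return error_rate / error_budget


def would_page(
    error_rate: float, slo_target: float, threshold: float = PAGE_BURN_RATE
) -> bool:
    """True when the window's burn rate reaches the paging threshold."""
    return burn_rate(error_rate, slo_target) >= threshold


def budget_fraction_spent(
    rate: float, window_hours: float, period_hours: float = 30 * 24
) -> float:
    """Share of the whole period's budget spent by burning at `rate` for the window."""
    return rate * window_hours / period_hours


if __name__ == "__main__":
    slo = 0.999  # assumed SLO target, not taken from this page
    print(round(burn_rate(0.005, slo), 2))
    print(would_page(0.005, slo))
    print(would_page(0.02, slo))
    print(round(budget_fraction_spent(PAGE_BURN_RATE, WINDOW_HOURS), 4))
```

Under those assumptions a 0.5% error rate is a burn rate of about 5×, well below 14.4×, so nobody is paged. Sustaining 14.4× for the full hour would spend roughly 2% of the 30-day budget, which is the kind of event worth waking someone for.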

TLS-related noise

If you keep getting handshake failures, check whether your server sends the intermediate certificate as part of its chain. Browsers often mask a missing intermediate by using cached copies; curl and our probe don't. See mtls for the related debug command.
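As a rough cross-check from your own machine, Python's standard `ssl` module also validates against only the certificates the server sends, so it fails in the same situation. The sketch below is an illustration, not the debug command documented under mtls; the function names are hypothetical. OpenSSL verify codes 20 and 21 usually accompany an incomplete chain, but they can also mean the root CA is missing from the local trust store, so treat the result as a hint.

```python
import socket
import ssl

# OpenSSL verify codes commonly seen when the issuer cannot be found:
# 20 = unable to get local issuer certificate
# 21 = unable to verify the first certificate
MISSING_ISSUER_CODES = {20, 21}


def looks_like_missing_intermediate(verify_code: int) -> bool:
    """Heuristic: does this verify code suggest an incomplete chain?"""
    return verify_code in MISSING_ISSUER_CODES


def check_chain(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Attempt a verified TLS handshake and summarize the outcome."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok: chain verified with the certificates the server sent"
    except ssl.SSLCertVerificationError as exc:
        if looks_like_missing_intermediate(exc.verify_code):
            return f"possible missing intermediate: {exc.verify_message}"
        return f"verification failed for another reason: {exc.verify_message}"
```

Calling `check_chain("your-service.example.com")` (a placeholder hostname) against the endpoint the monitor probes shows whether a client without cached intermediates can complete the handshake.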

rate-limit-related noise

If your service rate-limits anonymous traffic, DevFlow's checks may receive 429 (Too Many Requests) responses. The usual fix on your side is service-account auth plus an IP allow-list (see rest-api-authentication).

still seeing false positives?

```bash
devflow monitor inspect <name> --window 7d --include-assertions
```

This prints assertion-by-assertion outcomes for every check. Look for a pattern in what's failing; it's usually obvious.