Retry policy: exponential backoff, jitter, and idempotency
Retry policy
DevFlow can retry a failed check before counting it as a failure. Retry is opt-in and is off by default for most monitors.
why retry at all
Because real outages are rare, a single failure on a 30-second monitor is more likely to be a transient blip than a real problem. Retry trades a small amount of alert latency for a large reduction in false positives.
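As a rough illustration of that trade-off, the sketch below estimates how often a healthy target would still be reported as failing. It assumes transient failures on successive attempts are independent, which is a simplification and not something DevFlow guarantees; the function name and the 2% blip rate are hypothetical.

```python
def false_positive_rate(blip_probability: float, max_attempts: int) -> float:
    """Chance that every attempt hits a transient blip while the target is healthy.

    Assumes blips are independent across attempts (an illustrative simplification).
    """
    if not 0.0 <= blip_probability <= 1.0:
        raise ValueError("blip_probability must be between 0 and 1")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return blip_probability ** max_attempts


# Hypothetical 2% blip rate per attempt.
no_retry = false_positive_rate(0.02, 1)    # single attempt
with_retry = false_positive_rate(0.02, 3)  # original + two retries
```

Under that assumption, three attempts turn a 2% false-positive rate into roughly 0.0008%. Correlated failures, such as a network issue that outlasts the backoff window, would shrink the benefit.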
the policy
retry:
  max_attempts: 3
  backoff: exponential
  initial_delay_ms: 250
  max_delay_ms: 2000
  jitter: full
max_attempts: 3 means three total tries (the original plus two retries).
backoff: exponential doubles initial_delay_ms after each failure.
jitter: full applies AWS-style full jitter to spread retry storms across edges.
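The delay schedule these settings describe can be sketched in Python as follows. This is a minimal illustration, not DevFlow's implementation: it assumes the doubled delay is capped at max_delay_ms before jitter is applied, and the function name and the "none" jitter value are hypothetical.

```python
import random


def retry_delays_ms(max_attempts, initial_delay_ms, max_delay_ms, jitter="full", rng=random):
    """Return the wait before each retry, in milliseconds.

    max_attempts counts total tries, so there are max_attempts - 1 retries.
    """
    delays = []
    for retry_index in range(max_attempts - 1):
        # Exponential backoff: double per failure, capped at max_delay_ms (assumed).
        ceiling = min(max_delay_ms, initial_delay_ms * 2 ** retry_index)
        if jitter == "full":
            # Full jitter: pick uniformly between 0 and the backoff ceiling.
            delays.append(rng.uniform(0, ceiling))
        else:
            # No jitter: wait exactly the backoff ceiling.
            delays.append(float(ceiling))
    return delays


# Without jitter, the example config waits 250 ms and then 500 ms.
print(retry_delays_ms(3, 250, 2000, jitter="none"))
```

With full jitter each wait is a random value below its ceiling, so edges that fail at the same moment do not all retry at the same moment.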
idempotency
If the underlying call mutates state (for example, a POST that creates an order), retry only if the API supports idempotent requests, typically through an Idempotency-Key header. The http-monitor-basics doc shows how to use {{run_id}} for that.
headers:
  Idempotency-Key: "df-{{run_id}}"
Otherwise, set max_attempts: 1.
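The point of keying on the run rather than the attempt is that every retry carries the same key, so the server can recognise a repeat. A minimal Python sketch of that idea follows; send is a hypothetical callable standing in for the HTTP request, the UUID is a stand-in for {{run_id}}, and backoff sleeps are omitted for brevity.

```python
import uuid


def run_check_with_retry(send, max_attempts=3):
    """Call send(headers) up to max_attempts times, reusing one idempotency key.

    send is a hypothetical callable that returns True on success.
    """
    # Stand-in for {{run_id}}: generated once per run, not once per attempt.
    run_id = uuid.uuid4().hex
    headers = {"Idempotency-Key": f"df-{run_id}"}
    for attempt in range(1, max_attempts + 1):
        if send(headers):
            return attempt  # number of attempts it took
    return None  # every attempt failed


# Fake endpoint that fails twice, then succeeds, recording the keys it saw.
seen_keys = []


def flaky_send(headers):
    seen_keys.append(headers["Idempotency-Key"])
    return len(seen_keys) >= 3


attempts_used = run_check_with_retry(flaky_send)
```

Here the fake endpoint sees the same key on all three attempts, which is what lets an idempotent API create the order once even if an earlier attempt succeeded server-side but timed out on the client.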
interaction with frequency
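Retry costs latency budget: each retry adds its backoff delay plus the time the extra attempt itself takes. A 30-second monitor with max_attempts: 3 and max_delay_ms: 2000 waits up to 2 seconds before each of its two retries, and each retry can run as long as the check's timeout, so a failing check can take ~10 seconds longer than the base case. If your alerting target is sub-30-second, lower max_delay_ms or accept fewer retries; see monitor-frequency for the math.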
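A back-of-the-envelope estimate of that budget can be sketched in Python. The function name and the attempt_timeout_ms parameter are assumptions for illustration; the 5-second timeout is hypothetical and not a documented DevFlow default.

```python
def worst_case_added_ms(max_attempts, initial_delay_ms, max_delay_ms, attempt_timeout_ms):
    """Upper bound on the extra time spent before a failure is reported.

    Counts the backoff ceilings (full jitter can only shorten them) plus
    one full timeout per retry attempt.
    """
    retries = max_attempts - 1
    backoff_ms = sum(
        min(max_delay_ms, initial_delay_ms * 2 ** i) for i in range(retries)
    )
    return backoff_ms + retries * attempt_timeout_ms


# Example config from this page with a hypothetical 5-second check timeout.
added = worst_case_added_ms(3, 250, 2000, attempt_timeout_ms=5000)
```

With those numbers the bound is 10,750 ms: 750 ms of backoff and 10,000 ms of retry attempts. In this scenario the attempt timeout dominates, so shortening it recovers more budget than lowering max_delay_ms.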
interaction with multi-region
Multi-region quorum already absorbs single-region blips. If you've enabled fail-quorum across 3+ regions (multi-region-setup), retry is often unnecessary and adds noise to investigations.