Retry policy: exponential backoff, jitter, and idempotency

Owner Rohan Mehta · Last updated 2026-03-18 · v3.7

retry · backoff · jitter · idempotency · monitor

Retry policy

DevFlow can retry a failed check before counting it as a failure. Retry is opt-in: it is off by default for most monitors.

why retry at all

Real outages are rare, so a single failed check on a 30-second monitor is more likely to be a transient blip than a real problem. Retry trades a small amount of alert latency for a large reduction in false positives.

the policy

yaml

retry:
  max_attempts: 3
  backoff: exponential
  initial_delay_ms: 250
  max_delay_ms: 2000
  jitter: full

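max_attempts: 3 means three total tries: the original attempt plus two retries.
backoff: exponential doubles the delay after each failure, starting at initial_delay_ms and capped at max_delay_ms.
jitter: full applies AWS-style full jitter: each wait is drawn at random between 0 and the current backoff delay, so retries from different edges spread out in time instead of arriving as a synchronized storm.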
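The schedule these three settings imply can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DevFlow's implementation: the function name and the full_jitter flag are hypothetical, and it assumes the first retry waits up to initial_delay_ms rather than double that.

```python
import random


def backoff_delays_ms(max_attempts, initial_delay_ms, max_delay_ms, full_jitter=True):
    """Return the wait in ms before each retry (there are max_attempts - 1 retries)."""
    delays = []
    for retry_index in range(max_attempts - 1):
        # Exponential backoff: double the delay per retry, capped at max_delay_ms.
        ceiling = min(max_delay_ms, initial_delay_ms * (2 ** retry_index))
        if full_jitter:
            # Full jitter: draw the actual wait uniformly from [0, ceiling].
            delays.append(random.uniform(0, ceiling))
        else:
            delays.append(float(ceiling))
    return delays
```

Under these assumptions, the policy above yields two waits with ceilings of 250 ms and 500 ms; max_delay_ms only starts to matter from the fifth attempt onward, where doubling would otherwise exceed 2000 ms.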

idempotency

If the underlying call mutates state (a POST that creates an order, for example), enable retry only if the API is idempotent, typically by way of an Idempotency-Key header. The http-monitor-basics doc shows how to use {{run_id}} for that.

yaml

headers:
  Idempotency-Key: "df-{{run_id}}"

Otherwise, set max_attempts: 1 so the check is never retried.
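Why the key makes retries safe can be shown with a small in-memory simulation. Everything here is hypothetical (FakeOrderApi, run_check_with_retries, the payload), and it assumes {{run_id}} stays the same across all attempts of one run, which this page does not confirm.

```python
class FakeOrderApi:
    """Hypothetical in-memory server that dedupes POSTs by Idempotency-Key."""

    def __init__(self):
        self.orders_by_key = {}

    def create_order(self, headers, payload):
        key = headers["Idempotency-Key"]
        # A repeated key returns the stored order instead of creating a new one.
        if key not in self.orders_by_key:
            order_id = len(self.orders_by_key) + 1
            self.orders_by_key[key] = {"order_id": order_id, **payload}
        return self.orders_by_key[key]


def run_check_with_retries(api, run_id, max_attempts=3):
    # The key is built once per run, so every attempt sends the same value.
    headers = {"Idempotency-Key": f"df-{run_id}"}
    # Simulate the worst case: every attempt reaches the server.
    return [api.create_order(headers, {"sku": "demo"}) for _ in range(max_attempts)]
```

Three attempts against this fake server leave exactly one stored order. Without the header, or with a key that changes per attempt, each retry would create another.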

interaction with frequency

Retry costs latency budget. Each retry adds a backoff wait (at most max_delay_ms) plus the time the extra attempt itself takes, so a 30-second monitor that retries 3x with a 2-second max delay can take ~10 seconds longer than the base case. If your alerting target is sub-30-second, lower max_delay_ms or accept fewer retries; see monitor-frequency for the math.
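A rough upper bound on the added time is easy to compute. The sketch below is an assumption-laden estimate, not the formula from monitor-frequency: the function name and attempt_timeout_ms (how long one attempt may run before it is declared failed) are hypothetical, and jitter is ignored because full jitter can only shorten a wait.

```python
def worst_case_added_ms(max_attempts, initial_delay_ms, max_delay_ms, attempt_timeout_ms):
    """Upper bound on the extra time retries add before a failure is counted."""
    retries = max_attempts - 1
    # Longest possible backoff waits: doubling from initial_delay_ms, capped per wait.
    waits = sum(min(max_delay_ms, initial_delay_ms * 2 ** i) for i in range(retries))
    # Each retry can also run for a full attempt timeout before failing.
    return waits + retries * attempt_timeout_ms
```

With the policy above, the waits alone total at most 750 ms, so most of the added latency comes from the retried attempts themselves. Plugging in a hypothetical 5-second attempt timeout gives 10750 ms.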

interaction with multi-region

Multi-region quorum already absorbs single-region blips. If you've enabled fail-quorum across 3+ regions (see multi-region-setup), retry is often unnecessary, and the extra attempts add noise when you investigate an incident.
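The reason quorum absorbs blips can be seen in a minimal sketch. It assumes fail-quorum means "count the check as failed only when at least N regions fail"; the function name, region names, and that exact semantics are assumptions here, and multi-region-setup is the authority on the real behavior.

```python
def quorum_failed(region_ok, fail_quorum):
    """region_ok maps a region name to True if the check passed in that region."""
    failures = sum(1 for ok in region_ok.values() if not ok)
    # The check counts as failed only when enough regions agree it failed.
    return failures >= fail_quorum
```

With three regions and a quorum of two, a blip in one region does not count as a failure, which is the same protection retry offers within a single region.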