Skip to content

Retry Policies

Named retry policies provide preset configurations for common retry patterns instead of raw max_attempts values.

Policies

PolicyMax AttemptsBase DelayBackoffMax DelayUse Case
none1Steps that must not retry
standard31s2x exponential30sDefault for implementation steps
aggressive5200ms2x exponential30sAPI calls, fetches, publishes
patient35s3x exponential90sAnalysis, scanning, exploration

Usage

yaml
steps:
  - id: fetch
    retry:
      policy: aggressive        # 5 attempts, fast backoff

  - id: implement
    retry:
      policy: standard          # 3 attempts, balanced
      max_attempts: 5           # override: more attempts than default

  - id: analyze
    retry:
      policy: patient           # 3 attempts, slow backoff

Explicit fields override policy defaults — set policy for the base, then override individual fields as needed.

Model Tier Escalation on Retry

When a step retries, Wave automatically escalates the model one tier stronger along the cost ladder cheapest -> balanced -> strongest. The first retry moves up one tier, the second another, and once strongest is reached further retries stay there.

yaml
steps:
  - id: implement
    persona: craftsman
    model: cheapest          # attempt 1: cheapest -> haiku
    retry:
      policy: standard       # attempt 2: balanced -> adapter default
                             # attempt 3: strongest -> opus

Escalation only applies when the step's effective model is a recognized tier name (cheapest, balanced, strongest). Literal model IDs (e.g. claude-opus-4, gpt-4o-mini) are user-pinned overrides and are preserved verbatim across retries.

Set retry.no_escalate: true to disable escalation and reuse the same model across retries:

yaml
steps:
  - id: scan
    persona: navigator
    model: cheapest
    retry:
      policy: standard
      no_escalate: true      # keep using cheapest on every retry

Failure Classification

The retry system classifies failures into 6 categories:

ClassRetryable?Example
transientYes (auto-retry)API 429, timeout
deterministicNoInvalid API key, missing binary
budget_exhaustedNo (trigger fallback)Context window exceeded
contract_failureYes (rework)JSON schema mismatch
test_failureYes (fix loop)go test exit code 1
canceledNoSIGINT, timeout

Circuit Breaker

Repeated identical failures terminate the step, preventing infinite retry loops on persistent issues:

yaml
runtime:
  circuit_breaker:
    limit: 3
    tracked_classes: [deterministic, contract_failure, test_failure]

Failure fingerprinting: The circuit breaker tracks identical errors by creating a fingerprint from step ID, failure class, and error message. Only the same error repeated counts—not different errors.

tracked_classes: Configure which failure types count toward the limit:

  • deterministic — Invalid API keys, missing binaries (won't succeed on retry)
  • contract_failure — Schema mismatches, output validation failures
  • test_failure — Test suite failures
  • transient — Network timeouts, rate limits
  • budget_exhausted — Context window exceeded

vs max_visits: max_visits counts any step visit (same or different errors), useful for limiting total attempts. Circuit breaker only trips on repeated identical errors, useful for detecting persistent failures.

Stall Watchdog

Steps producing no progress events for 30 minutes are terminated:

yaml
runtime:
  stall_timeout: 1800s

Released under the MIT License.