Retry Policies

Named retry policies bundle preset values for attempt count, delay, and backoff, so a step can reference a common retry pattern by name instead of setting a raw max_attempts value.

Policies

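| Policy | Max Attempts | Base Delay | Backoff | Max Delay | Use Case |
| --- | --- | --- | --- | --- | --- |
| none | 1 | - | - | - | Steps that must not retry |
| standard | 3 | 1s | 2x exponential | 30s | Default for implementation steps |
| aggressive | 5 | 200ms | 2x exponential | 30s | API calls, fetches, publishes |
| patient | 3 | 5s | 3x exponential | 90s | Analysis, scanning, exploration |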
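
To make the table concrete, the sketch below computes the wait between attempts for each policy. It assumes the delay before retry n is base delay × factor^(n-1), capped at the max delay, with no jitter; the source does not state the exact formula, so `RetryPolicy`, `POLICIES`, and `backoff_delays` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int
    base_delay: float  # seconds
    factor: float      # backoff multiplier
    max_delay: float   # seconds


# Values copied from the table above; the data structure is an assumption.
POLICIES = {
    "none": RetryPolicy(1, 0.0, 1.0, 0.0),
    "standard": RetryPolicy(3, 1.0, 2.0, 30.0),
    "aggressive": RetryPolicy(5, 0.2, 2.0, 30.0),
    "patient": RetryPolicy(3, 5.0, 3.0, 90.0),
}


def backoff_delays(policy: RetryPolicy) -> list[float]:
    """Return the waits between attempts (one fewer than max_attempts).

    Assumed formula: min(base_delay * factor**i, max_delay) for i = 0, 1, ...
    """
    return [
        min(policy.base_delay * policy.factor ** i, policy.max_delay)
        for i in range(policy.max_attempts - 1)
    ]


if __name__ == "__main__":
    for name, policy in POLICIES.items():
        print(name, backoff_delays(policy))
```

Under these assumptions, standard waits 1s and then 2s, and patient waits 5s and then 15s. With the default attempt counts the max delay cap is never reached; it would only matter once max_attempts is overridden upward.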

Usage

yaml
steps:
  - id: fetch
    retry:
      policy: aggressive        # 5 attempts, fast backoff

  - id: implement
    retry:
      policy: standard          # 3 attempts, balanced
      max_attempts: 5           # override: more attempts than default

  - id: analyze
    retry:
      policy: patient           # 3 attempts, slow backoff

Explicit fields override policy defaults — set policy for the base, then override individual fields as needed.
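
The override rule can be pictured as a dictionary merge: start from the named policy's defaults, then let any explicitly set field win. The following is a minimal sketch, not the tool's actual resolver. `POLICY_DEFAULTS` and `resolve_retry` are hypothetical, and the field names `base_delay`, `backoff_factor`, and `max_delay` are assumptions (only `policy` and `max_attempts` appear in the examples above).

```python
# Hypothetical defaults mirroring the Policies table.
# Field names other than max_attempts are assumptions.
POLICY_DEFAULTS = {
    "none": {"max_attempts": 1},
    "standard": {"max_attempts": 3, "base_delay": "1s",
                 "backoff_factor": 2, "max_delay": "30s"},
    "aggressive": {"max_attempts": 5, "base_delay": "200ms",
                   "backoff_factor": 2, "max_delay": "30s"},
    "patient": {"max_attempts": 3, "base_delay": "5s",
                "backoff_factor": 3, "max_delay": "90s"},
}


def resolve_retry(retry: dict) -> dict:
    """Merge a step's retry block over its named policy's defaults."""
    policy = retry.get("policy")
    if policy not in POLICY_DEFAULTS:
        raise ValueError(f"unknown retry policy: {policy!r}")
    resolved = dict(POLICY_DEFAULTS[policy])  # copy so defaults stay untouched
    # Explicit fields win over whatever the policy supplied.
    resolved.update({k: v for k, v in retry.items() if k != "policy"})
    return resolved


if __name__ == "__main__":
    print(resolve_retry({"policy": "standard", "max_attempts": 5}))
```

For the `implement` step above, this yields five attempts while keeping the standard policy's delay settings.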

Failure Classification

The retry system classifies failures into 6 categories:

| Class | Retryable? | Example |
| --- | --- | --- |
| transient | Yes (auto-retry) | API 429, timeout |
| deterministic | No | Invalid API key, missing binary |
| budget_exhausted | No (trigger fallback) | Context window exceeded |
| contract_failure | Yes (rework) | JSON schema mismatch |
| test_failure | Yes (fix loop) | go test exit code 1 |
| canceled | No | SIGINT, timeout |
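
One way to read the table is as a lookup from failure class to the next action. The sketch below encodes exactly that mapping. The class names come from the table, while the `Action` enum, its member names, and `next_action` are hypothetical labels invented for illustration.

```python
from enum import Enum


class FailureClass(Enum):
    TRANSIENT = "transient"
    DETERMINISTIC = "deterministic"
    BUDGET_EXHAUSTED = "budget_exhausted"
    CONTRACT_FAILURE = "contract_failure"
    TEST_FAILURE = "test_failure"
    CANCELED = "canceled"


class Action(Enum):  # hypothetical action names
    AUTO_RETRY = "auto_retry"
    REWORK = "rework"
    FIX_LOOP = "fix_loop"
    FALLBACK = "fallback"
    STOP = "stop"


# Direct transcription of the "Retryable?" column.
NEXT_ACTION = {
    FailureClass.TRANSIENT: Action.AUTO_RETRY,
    FailureClass.DETERMINISTIC: Action.STOP,
    FailureClass.BUDGET_EXHAUSTED: Action.FALLBACK,
    FailureClass.CONTRACT_FAILURE: Action.REWORK,
    FailureClass.TEST_FAILURE: Action.FIX_LOOP,
    FailureClass.CANCELED: Action.STOP,
}

RETRYABLE = {Action.AUTO_RETRY, Action.REWORK, Action.FIX_LOOP}


def next_action(failure_class: str) -> Action:
    """Look up the action for a failure class name such as 'transient'."""
    return NEXT_ACTION[FailureClass(failure_class)]


def is_retryable(failure_class: str) -> bool:
    return next_action(failure_class) in RETRYABLE


if __name__ == "__main__":
    for fc in FailureClass:
        print(fc.value, next_action(fc.value).value, is_retryable(fc.value))
```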

Circuit Breaker

Repeated identical failures terminate the step, preventing infinite retry loops on persistent issues:

yaml
runtime:
  circuit_breaker:
    limit: 3
    tracked_classes: [deterministic, contract_failure, test_failure]

Failure fingerprinting: The circuit breaker identifies identical errors by building a fingerprint from the step ID, failure class, and error message. Only repetitions of the same fingerprint count toward the limit; different errors on the same step do not.

tracked_classes: Configure which failure types count toward the limit:

  • deterministic — Invalid API keys, missing binaries (won't succeed on retry)
  • contract_failure — Schema mismatches, output validation failures
  • test_failure — Test suite failures
  • transient — Network timeouts, rate limits
  • budget_exhausted — Context window exceeded

Circuit breaker vs max_visits: max_visits counts every visit to a step, whether the errors are the same or different, which makes it useful for capping total attempts. The circuit breaker trips only on repeated identical errors, which makes it useful for detecting persistent failures.
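
The circuit breaker behavior described in this section can be sketched as a counter keyed by fingerprint. This is an illustration under stated assumptions, not the actual implementation: the fingerprint is taken to be a SHA-256 hash of the three fields with no normalization of the error message, counts are assumed to accumulate over the whole run rather than only across consecutive failures, and the breaker is assumed to trip when a count reaches `limit`. The names `fingerprint` and `CircuitBreaker` are hypothetical.

```python
import hashlib
from collections import Counter

DEFAULT_TRACKED = ("deterministic", "contract_failure", "test_failure")


def fingerprint(step_id: str, failure_class: str, message: str) -> str:
    """Combine the three fields named in the text into one stable key."""
    raw = "\x00".join((step_id, failure_class, message))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class CircuitBreaker:
    def __init__(self, limit: int = 3, tracked_classes=DEFAULT_TRACKED):
        self.limit = limit
        self.tracked = frozenset(tracked_classes)
        self.counts = Counter()

    def record(self, step_id: str, failure_class: str, message: str) -> bool:
        """Record one failure; return True if the breaker trips."""
        if failure_class not in self.tracked:
            return False  # untracked classes never count toward the limit
        key = fingerprint(step_id, failure_class, message)
        self.counts[key] += 1
        return self.counts[key] >= self.limit


if __name__ == "__main__":
    breaker = CircuitBreaker(limit=3)
    for msg in ["exit code 1", "exit code 2", "exit code 1", "exit code 1"]:
        print(msg, breaker.record("build", "test_failure", msg))
```

In the demo loop the step is visited four times, which is what max_visits would count, but the breaker trips only on the final call, when the same fingerprint has been seen three times.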

Stall Watchdog

Steps producing no progress events for 30 minutes are terminated:

yaml
runtime:
  stall_timeout: 1800s
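
A stall watchdog of this kind usually comes down to remembering when the last progress event arrived and comparing that against the timeout. The sketch below shows the idea. `StallWatchdog` and its methods are hypothetical names, and the injectable clock is there only so the logic can be exercised without waiting 30 minutes.

```python
import time


class StallWatchdog:
    def __init__(self, stall_timeout: float = 1800.0, clock=time.monotonic):
        self.stall_timeout = stall_timeout  # seconds; 1800s = 30 minutes
        self._clock = clock
        self._last_progress = clock()

    def progress(self) -> None:
        """Call whenever the step emits a progress event."""
        self._last_progress = self._clock()

    def stalled(self) -> bool:
        """True once no progress has been seen for stall_timeout seconds."""
        return self._clock() - self._last_progress >= self.stall_timeout


if __name__ == "__main__":
    fake_now = [0.0]
    watchdog = StallWatchdog(clock=lambda: fake_now[0])
    fake_now[0] = 1799.0
    print(watchdog.stalled())  # still within the timeout window
    fake_now[0] = 1800.0
    print(watchdog.stalled())  # timeout reached with no progress event
```

A monotonic clock is used as the default so that wall-clock adjustments cannot cause a spurious timeout.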

Released under the MIT License.