
WAF-REL-050 – Circuit Breaker & Timeout Configuration

Description

All outgoing HTTP/gRPC calls MUST define explicit timeout values. Critical service dependencies MUST implement circuit breakers. Retry logic MUST use exponential backoff with jitter. Connection pools MUST define maximum sizes. No service may use default timeouts (undefined or infinite) for external calls.

Rationale

Cascading failures are a leading cause of major cloud outages. Without circuit breakers, a slow dependency gradually exhausts the thread pools and connection pools of its callers until each calling service fails in turn and the cascade spreads upstream. Explicit timeouts keep worker threads from waiting indefinitely on unresponsive services.
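As an illustration of explicit timeouts, the following minimal sketch sets the connect and read timeouts separately using only the Python standard library. The function name `fetch_status` and both timeout values are hypothetical and chosen for the example. The read timeout here bounds each individual socket operation, not the total duration of the call.

```python
import http.client

CONNECT_TIMEOUT_S = 1.0  # hypothetical value: max time to establish the TCP connection
READ_TIMEOUT_S = 2.0     # hypothetical value: max time to wait on each socket read/write


def fetch_status(host, port, path="/",
                 connect_timeout=CONNECT_TIMEOUT_S,
                 read_timeout=READ_TIMEOUT_S):
    """Issue a GET request and return the HTTP status code.

    Raises socket.timeout (an alias of TimeoutError on Python 3.10+)
    if the connect or any read exceeds its limit.
    """
    conn = http.client.HTTPConnection(host, port, timeout=connect_timeout)
    try:
        conn.connect()                      # bounded by connect_timeout
        conn.sock.settimeout(read_timeout)  # later socket operations use read_timeout
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

A caller that needs a hard upper bound on the whole request would have to add an overall deadline on top of these per-operation limits.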

Threat Context

| Risk | Description |
| --- | --- |
| Thread Pool Exhaustion | Slow external API leaves all handler threads waiting → service completely blocked. |
| Connection Pool Depletion | Shared DB connection pool is exhausted → all dependent services fail. |
| Retry Storm | 1000 clients retry synchronously without jitter → 1000x load spike on degraded service. |
| Optional Dep Brings Down Main Service | Non-critical enrichment API without circuit breaker → total failure instead of feature loss. |
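The optional-dependency risk can be sketched with a small hand-rolled circuit breaker and a fallback. This is not the API of Resilience4j or pybreaker. The class, the `enrich_order` helper, and the threshold values are assumptions made for illustration, and the breaker is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Illustrative breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=5, reset_timeout_s=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_timeout_s = reset_timeout_s      # hypothetical default
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open, call rejected")
            # Reset timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def enrich_order(order, breaker, fetch_enrichment):
    """Return the order with optional enrichment, degrading on failure."""
    try:
        extra = breaker.call(fetch_enrichment, order["id"])
    except Exception:
        extra = None  # feature loss instead of total failure
    return {**order, "enrichment": extra}
```

While the circuit is open, `enrich_order` returns immediately without calling the dependency, so handler threads are not held by the slow enrichment API.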

Requirement

  • Explicit timeouts for all outgoing calls (connect + read separately)

  • Circuit breaker for all critical synchronous dependencies

  • Retry: maximum 3 attempts, exponential backoff (100ms → 200ms → 400ms), jitter ±50%

  • Connection pool: maximum size per dependency class defined

  • Bulkhead: separate resource pools for different dependency classes

  • Load balancer: explicit idle_timeout – no provider default
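The bulkhead and pool-size requirements above can be sketched as one concurrency cap per dependency class. The `Bulkhead` class, the class names in `BULKHEADS`, and the limits are hypothetical. A real service would typically pair this with a separately sized connection pool per HTTP client.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a dependency class has no free slots."""


class Bulkhead:
    """Caps concurrent calls to one dependency class."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, func, *args, **kwargs):
        # Fail fast instead of queueing, so a slow dependency cannot
        # absorb threads that other dependency classes need.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name}: all slots in use")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()


# Hypothetical dependency classes and limits, one pool each.
BULKHEADS = {
    "payments": Bulkhead("payments", max_concurrent=20),
    "enrichment": Bulkhead("enrichment", max_concurrent=5),
}
```

With this layout, a stalled enrichment API can occupy at most its own five slots, and calls routed through the `payments` bulkhead are unaffected.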

Implementation Guidance

  1. Timeout audit: Check all outgoing HTTP clients for explicit timeout values

  2. Configure circuit breaker: Resilience4j, pybreaker, or service mesh outlierDetection

  3. Configure retry: maxAttempts=3, initialDelay=100ms, multiplier=2, jitter=0.5

  4. Connection pools: Separate HTTP client instances per dependency

  5. ALB idle_timeout: Set explicitly to match API latency (typically 30s for REST APIs)

  6. Chaos test: Validate circuit breaker through latency injection
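The retry parameters from step 3 can be sketched as follows. The function names and the set of retried exception types are assumptions. The sketch also assumes that maxAttempts=3 counts the initial call, which yields two delays (nominally 100ms and 200ms); a 400ms step would only occur if a fourth attempt were allowed.

```python
import random
import time


def backoff_delays(max_attempts=3, initial_delay_s=0.1, multiplier=2.0,
                   jitter=0.5, rng=random):
    """Yield the sleep before each retry (max_attempts - 1 values)."""
    for retry in range(max_attempts - 1):
        base = initial_delay_s * (multiplier ** retry)
        # jitter=0.5 spreads each delay uniformly over +/-50% of its base.
        yield base * rng.uniform(1.0 - jitter, 1.0 + jitter)


def call_with_retry(func, *, retry_on=(TimeoutError, ConnectionError),
                    sleep=time.sleep, **backoff):
    """Call func, retrying transient errors with jittered backoff."""
    delays = backoff_delays(**backoff)
    while True:
        try:
            return func()
        except retry_on:
            delay = next(delays, None)
            if delay is None:
                raise  # attempts exhausted: surface the last error
            sleep(delay)
```

Because each client draws its own random factor, clients that failed at the same moment retry at different times instead of hitting the degraded service in lockstep. Only idempotent calls should be retried this way.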

Maturity Levels

| Level | Name | Criteria |
| --- | --- | --- |
| 1 | No Timeouts | Default/infinite timeouts for external calls. |
| 2 | Timeouts Configured | Connect and read timeouts defined for external HTTP calls. |
| 3 | Circuit Breaker + Retry | CB for all critical deps; retry with backoff and jitter; connection pools. |
| 4 | Bulkheads + Service Mesh | Bulkhead isolation; Istio/Linkerd manages CB declaratively; chaos tests. |
| 5 | Adaptive Thresholds | CB thresholds auto-tuned; request hedging; complete resilience matrix. |

Terraform Checks

waf-rel-050.tf.aws.alb-idle-timeout

Checks: ALB has explicit idle_timeout – no provider default.

Compliant:

resource "aws_lb" "api" {
  name               = "payment-api-alb"
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.public_subnet_ids
  idle_timeout       = 30 # Explicitly set
  tags               = var.mandatory_tags
}

Non-Compliant:

resource "aws_lb" "api" {
  name               = "payment-api-alb"
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = var.public_subnet_ids
  # No idle_timeout: the AWS default of 60s is used
  # WAF-REL-050 violation
}

Remediation: Set idle_timeout explicitly. REST APIs: 30s; file uploads: 300s.
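One way such a check could be approximated outside the WAF++ tooling is to inspect the JSON form of a Terraform plan, where the `configuration` section lists only the attributes written in the source. The sketch below is an assumption about how to do this and not necessarily how the WAF++ check itself is implemented. The function name is hypothetical, and only the root module is examined.

```python
import json


def albs_without_idle_timeout(plan):
    """Return addresses of application load balancers lacking idle_timeout.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    Resources in child modules are not examined in this sketch.
    """
    root = plan.get("configuration", {}).get("root_module", {})
    findings = []
    for res in root.get("resources", []):
        if res.get("type") != "aws_lb":
            continue
        expressions = res.get("expressions", {})
        # load_balancer_type defaults to "application" when omitted; a
        # non-constant expression is conservatively treated as an ALB.
        lb_type = expressions.get("load_balancer_type", {}).get(
            "constant_value", "application")
        if lb_type == "application" and "idle_timeout" not in expressions:
            findings.append(res["address"])
    return findings


if __name__ == "__main__":
    # Hypothetical file name, produced by: terraform show -json plan.out > plan.json
    with open("plan.json") as fh:
        for address in albs_without_idle_timeout(json.load(fh)):
            print(f"WAF-REL-050 violation: {address} has no explicit idle_timeout")
```

Reading `configuration` instead of `planned_values` matters here, because the planned values already contain the provider default and would hide a missing setting.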

Evidence

| Type | Required | Description |
| --- | --- | --- |
| IaC | ✅ Required | Terraform or service mesh configuration with timeout and circuit breaker settings. |
| Config | ✅ Required | Application configuration files with explicit timeout values for all dependencies. |
| Process | Optional | Latency injection test results with circuit breaker activation documented. |