WAF-REL-050 – Circuit Breaker & Timeout Configuration
Description
All outgoing HTTP/gRPC calls MUST define explicit timeout values. Critical service dependencies MUST implement circuit breakers. Retry logic MUST use exponential backoff with jitter. Connection pools MUST define maximum sizes. No service may use default timeouts (undefined or infinite) for external calls.
Rationale
Cascading failures are a leading cause of major cloud outages. Without circuit breakers, a slow dependency gradually exhausts the thread pools and connection pools of its callers until the calling service itself fails, and the cascade continues upstream. Explicit timeouts prevent threads from waiting indefinitely on unresponsive services.
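
The exhaustion mechanism can be illustrated with Little's law (average in-flight requests = arrival rate × latency). The following is a minimal sketch; the pool size, request rate, and latencies are hypothetical numbers chosen for illustration, not measurements.

```python
def busy_threads(requests_per_second: float, latency_seconds: float) -> float:
    """Little's law: average number of threads blocked on the dependency."""
    return requests_per_second * latency_seconds


POOL_SIZE = 200   # hypothetical handler thread pool size
RATE = 100.0      # hypothetical request rate in requests per second

healthy = busy_threads(RATE, 0.05)            # dependency answers in 50 ms
degraded = busy_threads(RATE, 5.0)            # dependency slows to 5 s, no timeout
capped = busy_threads(RATE, min(5.0, 1.0))    # same slowdown with a 1 s timeout

print(f"healthy:  {healthy:.0f} of {POOL_SIZE} threads busy")
print(f"degraded: {degraded:.0f} threads needed, pool has {POOL_SIZE}")
print(f"capped:   {capped:.0f} of {POOL_SIZE} threads busy")
```

Under these assumed numbers, the slow dependency demands more threads than the pool holds, while an explicit timeout keeps the demand bounded.
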
Threat Context
| Risk | Description |
|---|---|
| Thread Pool Exhaustion | Slow external API leaves all handler threads waiting → service completely blocked. |
| Connection Pool Depletion | Shared DB connection pool is exhausted → all dependent services fail. |
| Retry Storm | 1000 clients retry synchronously without jitter → 1000x load spike on degraded service. |
| Optional Dep Brings Down Main Service | Non-critical enrichment API without circuit breaker → total failure instead of feature loss. |
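
The last risk in the table (an optional dependency taking down the main service) can be contained by a bulkhead with a fallback. The sketch below is a hand-rolled illustration using only the Python standard library; the `Bulkhead` class and its `call` method are hypothetical names, not part of any library mentioned in this control.

```python
import threading


class Bulkhead:
    """Caps concurrent calls to one dependency and fails fast when full."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, fallback):
        # Non-blocking acquire: a full compartment returns the fallback
        # immediately instead of queueing more threads behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            return fallback()
        try:
            return func()
        except Exception:
            # Degrade to the fallback (feature loss) rather than failing the request.
            return fallback()
        finally:
            self._slots.release()


# Hypothetical usage: enrichment is optional, so an empty result is acceptable.
enrichment_bulkhead = Bulkhead(max_concurrent=10)
profile_extras = enrichment_bulkhead.call(lambda: {"badge": "gold"}, lambda: {})
```

Giving each dependency class its own `Bulkhead` instance means a saturated enrichment API can consume at most its own slots, leaving threads available for critical paths.
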
Requirement
- Explicit timeouts for all outgoing calls (connect + read separately)
- Circuit breaker for all critical synchronous dependencies
- Retry: maximum 3 attempts, exponential backoff (100ms → 200ms → 400ms), jitter ±50%
- Connection pool: maximum size per dependency class defined
- Bulkhead: separate resource pools for different dependency classes
- Load balancer: explicit `idle_timeout` – no provider default
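
The retry requirement can be sketched as follows. This is a minimal illustration, not a reference implementation: `jittered_delay` and `call_with_retry` are hypothetical helper names, and the sketch assumes that "3 attempts" includes the initial call, so only the first two delays of the 100ms → 200ms → 400ms schedule are ever slept.

```python
import random
import time


def jittered_delay(retry_index, initial=0.1, multiplier=2.0, jitter=0.5):
    """Delay in seconds before retry number `retry_index` (0-based).

    Base delay grows as initial * multiplier**retry_index and is then scaled
    by a random factor in [1 - jitter, 1 + jitter], i.e. +/-50% by default.
    """
    base = initial * multiplier ** retry_index
    return base * random.uniform(1.0 - jitter, 1.0 + jitter)


def call_with_retry(func, max_attempts=3, sleep=time.sleep):
    """Call func up to max_attempts times, sleeping a jittered delay between tries."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Production code should retry only on transient errors
            # (timeouts, connection resets), not on every exception.
            if attempt == max_attempts - 1:
                raise
            sleep(jittered_delay(attempt))
```

Because each client draws its own random factor, retries from many clients spread out over time instead of arriving in synchronized waves.
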
Implementation Guidance
- Timeout audit: Check all outgoing HTTP clients for explicit timeout values
- Configure circuit breaker: Resilience4j, pybreaker, or service mesh `outlierDetection`
- Configure retry: `maxAttempts=3,initialDelay=100ms,multiplier=2,jitter=0.5`
- Connection pools: Separate HTTP client instances per dependency
- ALB idle_timeout: Set explicitly to match API latency (typically 30s for REST APIs)
- Chaos test: Validate circuit breaker through latency injection
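
To make the circuit breaker behaviour concrete, here is a deliberately small hand-rolled sketch of the closed / open / half-open state machine. It is not the API of Resilience4j, pybreaker, or any service mesh; the class name, thresholds, and the injectable `clock` parameter are assumptions for illustration. The sketch is not thread-safe and, for brevity, does not limit the number of trial calls in the half-open state.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0           # consecutive failures while closed
        self._opened_at = None       # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func):
        if self.state == "open":
            raise CircuitOpenError("dependency call rejected")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call in half-open state, or reaching the
            # threshold while closed, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In a latency-injection test, calls that exceed their explicit timeout raise, the failure count reaches the threshold, and subsequent calls are rejected immediately until `reset_timeout` elapses.
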
Maturity Levels
| Level | Name | Criteria |
|---|---|---|
| 1 | No Timeouts | Default/infinite timeouts for external calls. |
| 2 | Timeouts Configured | Connect and read timeouts defined for external HTTP calls. |
| 3 | Circuit Breaker + Retry | CB for all critical deps; retry with backoff and jitter; connection pools. |
| 4 | Bulkheads + Service Mesh | Bulkhead isolation; Istio/Linkerd manages CB declaratively; chaos tests. |
| 5 | Adaptive Thresholds | CB thresholds auto-tuned; request hedging; complete resilience matrix. |
Terraform Checks
waf-rel-050.tf.aws.alb-idle-timeout
Checks: ALB has explicit idle_timeout – no provider default.
| Compliant | Non-Compliant |
|---|---|
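| `aws_lb` resource sets `idle_timeout` explicitly. | `aws_lb` resource omits `idle_timeout` and inherits the provider default. |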
Remediation: Set idle_timeout explicitly. REST APIs: 30s; file uploads: 300s.
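
One way such a check could be automated is to inspect the `configuration` section of the JSON plan produced by `terraform show -json`, which lists only the arguments actually written in the configuration. The sketch below is an assumption about how the check might work, not the actual implementation of `waf-rel-050.tf.aws.alb-idle-timeout`; the function name is hypothetical, it only looks at root-module resources, and it does not distinguish application from network load balancers.

```python
def albs_without_idle_timeout(plan: dict) -> list:
    """Return addresses of root-module load balancers that do not set idle_timeout."""
    resources = (
        plan.get("configuration", {})
        .get("root_module", {})
        .get("resources", [])
    )
    offenders = []
    for res in resources:
        # aws_alb is an alias of aws_lb in the AWS provider.
        if res.get("type") not in ("aws_lb", "aws_alb"):
            continue
        # "expressions" contains only arguments set explicitly in the .tf files.
        if "idle_timeout" not in res.get("expressions", {}):
            offenders.append(res.get("address", "<unknown>"))
    return offenders


# Hypothetical plan fragment with one compliant and one non-compliant ALB.
sample_plan = {
    "configuration": {
        "root_module": {
            "resources": [
                {"address": "aws_lb.api", "type": "aws_lb",
                 "expressions": {"idle_timeout": {"constant_value": 30}}},
                {"address": "aws_lb.legacy", "type": "aws_lb",
                 "expressions": {"name": {"constant_value": "legacy"}}},
            ]
        }
    }
}
print(albs_without_idle_timeout(sample_plan))
```
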
Evidence
| Type | Required | Description |
|---|---|---|
| IaC | ✅ Required | Terraform or service mesh configuration with timeout and circuit breaker settings. |
| Config | ✅ Required | Application configuration files with explicit timeout values for all dependencies. |
| Process | Optional | Latency injection test results with circuit breaker activation documented. |