WAF-REL-020 – Health Checks & Readiness Probes Configured
Description
All production services MUST expose health check endpoints and configure readiness/liveness probes. Load balancers MUST use health checks with explicitly configured paths, intervals and thresholds – no cloud provider defaults.
No deployment without working health checks. This is a non-negotiable prerequisite for automatic failover and zero-downtime deployments.
Rationale
Without health checks, faulty instances continue to receive traffic. Kubernetes without a readiness probe sends traffic to pods that are not yet ready or have already failed. Load balancers without explicit health check configuration use defaults that are often too tolerant (30s interval, no path validation).
Threat Context
| Risk | Description |
|---|---|
Traffic to Faulty Instances |
Without health check, LB continues routing requests to instances returning errors. |
Deadlock Undetected |
Without liveness probe, a deadlocked process runs forever and blocks resources. |
Premature Traffic |
Without readiness probe, an uninitialized pod receives traffic and produces errors. |
Default Timeout Too Tolerant |
Cloud provider defaults are often 30s interval and 3 failures – too long for fast recovery. |
Requirement
All services MUST:
-
Expose
/health/liveendpoint (liveness: process is alive) -
Expose
/health/readyendpoint (readiness: traffic-capable, dependencies OK) -
Kubernetes: configure
readinessProbeandlivenessProbewith measuredinitialDelaySeconds -
Load balancer health checks: explicit path, interval, timeout, healthy/unhealthy threshold
-
No cloud provider defaults for health check configuration
Implementation Guidance
-
Implement endpoints:
/health/live(process only),/health/ready(check deps) -
Measure initialDelaySeconds: Measure service startup time, add buffer
-
Configure intervals:
interval=15s,timeout=5s,failureThreshold=3 -
ALB health check:
path=/health/ready,interval=15,matcher=200 -
Liveness ONLY for process liveness: Do not check external dependencies in liveness probe
-
Test: Simulate health check failure in staging and observe behavior
Maturity Levels
| Level | Name | Criteria |
|---|---|---|
1 |
No Health Checks |
No probes; LB uses TCP ping as health check. |
2 |
Basic LB Health Check |
ALB health check on "/" configured; no Kubernetes probes. |
3 |
ReadinessProbe + LivenessProbe |
Both probes with measured delays; LB checks /health/ready; failures generate alerts. |
4 |
Deep Health Checks |
Readiness checks real dependencies; StartupProbe for slow services. |
5 |
Synthetic Monitoring |
External validation of health endpoints; health check latency as SLI. |
Terraform Checks
waf-rel-020.tf.aws.alb-target-group-health-check
Checks: ALB Target Group has explicit health_check block with path, interval and thresholds.
| Compliant | Non-Compliant |
|---|---|
|
|
Remediation: Add health_check block with explicit path, interval, timeout,
healthy_threshold and unhealthy_threshold.
Evidence
| Type | Required | Description |
|---|---|---|
IaC |
✅ Required |
Terraform or Kubernetes manifests with readiness and liveness probe configuration. |
Config |
✅ Required |
Load balancer health check configuration with explicit path and thresholds. |
Process |
Optional |
Test results: health check failure simulated and documented in staging. |