WAF-PERF-020 – Auto-Scaling Configured & Tested
Description
All stateless production workloads with variable or unpredictable traffic MUST have auto-scaling configured. Auto-scaling policies MUST be based on meaningful metrics (request latency, request rate, queue depth). Auto-scaling configurations MUST be tested under realistic load before going into production.
No deployment to production without a validated scaling path.
Rationale
Static capacity forces a choice between two bad options: over-provisioning for peaks (expensive) or under-provisioning and degrading under unexpected load (risky). Auto-scaling removes this trade-off, but only if it is correctly configured. Wrong thresholds, missing cooldowns, or a missing instance warmup configuration lead to scaling failure at the critical moment.
Any auto-scaling configuration that has never been tested under load is effectively nonexistent.
Threat Context
| Risk | Description |
|---|---|
| Capacity Bottleneck During Load Spike | A service that cannot scale degrades or fails during traffic peaks. |
| Scaling Oscillation | Incorrectly configured cooldowns lead to constant scale-out/scale-in (cost + instability). |
| Scaling Too Late | Threshold too high or wrong scaling metric → scaling triggers only after the SLO is already violated. |
| Cold-Start Latency | Instance warmup missing → new instances are hit with full traffic before they are ready. |
Requirement
- All stateless production workloads MUST have auto-scaling configured (min >= 1, max >= 2)
- Scaling metrics MUST be based on application behavior, not just CPU
- Auto-scaling MUST be validated through load testing (evidence: test report)
- Scale-in cooldown MUST be >= scale-out cooldown (conservative scale-in)
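The four requirements above can be expressed as a simple policy check. The following Python sketch assumes a hypothetical, provider-neutral ScalingConfig record and a hypothetical allow-list of application metric names (APP_METRICS); a real check would read these values from the IaC definitions or the cloud provider's API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical allow-list of metrics that reflect application behavior.
APP_METRICS = {"request_count_per_target", "request_latency_p95", "queue_depth"}


@dataclass
class ScalingConfig:
    # Hypothetical, provider-neutral view of one workload's auto-scaling settings.
    min_size: int
    max_size: int
    metrics: list[str]
    scale_out_cooldown_s: int
    scale_in_cooldown_s: int
    load_test_report: str | None = None  # path/link to the load test evidence


def check_requirements(cfg: ScalingConfig) -> list[str]:
    """Return the violated WAF-PERF-020 requirements (empty list if compliant)."""
    findings = []
    if cfg.min_size < 1 or cfg.max_size < 2:
        findings.append("min must be >= 1 and max must be >= 2")
    # At least one scaling metric must reflect application behavior (not CPU only).
    if not any(m in APP_METRICS for m in cfg.metrics):
        findings.append("scaling must use an application metric, not only CPU")
    # Conservative scale-in: scale-in cooldown at least as long as scale-out.
    if cfg.scale_in_cooldown_s < cfg.scale_out_cooldown_s:
        findings.append("scale-in cooldown must be >= scale-out cooldown")
    if not cfg.load_test_report:
        findings.append("missing load test report as evidence")
    return findings
```

A configuration with min 2, max 6, a request-count metric, cooldowns of 60s/300s and a linked test report yields no findings; a CPU-only group with max 1 and no report fails several checks at once.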
Implementation Guidance
- Choose the scaling metric: ALB request count (HTTP APIs), queue depth (workers), custom metrics (special workloads)
- Derive thresholds from the load test: determine the request rate (requests/s per instance) at which P95 latency reaches half the latency SLO (SLO/2), and use it as the scaling target
- Configure min/max: min >= 2 for redundancy; max = 3–5x the capacity required for expected normal load
- Configure cooldowns: scale-out 60s, scale-in 300s (conservative scale-in)
- Configure instance warmup: 60–120s, so that new instances are not overloaded immediately
- Run a load test: ramp load gradually up to 2x the expected peak; verify that auto-scaling triggers
- Configure monitoring: alert when desired capacity reaches >= 80% of max capacity
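The threshold, sizing and alerting steps above reduce to a few lines of arithmetic. The following Python sketch is one possible reading, not a prescribed method: it assumes load-test results are available as hypothetical (requests/s per instance, P95 ms) pairs, interprets "max = 3–5x expected normal load" as 3–5 times the instance count needed at normal load, and fixes min at 2. All function names are hypothetical.

```python
import math


def derive_target_rps(samples, latency_slo_ms):
    """Return the highest measured per-instance request rate whose P95 latency
    stays at or below half the latency SLO (SLO/2).

    samples: iterable of (requests_per_s_per_instance, p95_latency_ms) pairs,
    one per load-test step.
    """
    budget_ms = latency_slo_ms / 2
    within_budget = [rps for rps, p95 in samples if p95 <= budget_ms]
    if not within_budget:
        raise ValueError("no load step kept P95 at or below SLO/2")
    return max(within_budget)


def size_group(normal_rps, target_rps, factor=3):
    """Return (min_size, max_size): min fixed at 2 for redundancy, max set to
    `factor` (3-5) times the instances needed for normal load."""
    if not 3 <= factor <= 5:
        raise ValueError("factor must be between 3 and 5")
    normal_instances = max(1, math.ceil(normal_rps / target_rps))
    return 2, math.ceil(factor * normal_instances)


def capacity_alert(desired_capacity, max_size, ratio=0.8):
    """True when desired capacity has reached >= 80% of max capacity."""
    return desired_capacity >= ratio * max_size
```

With hypothetical load-test steps [(50, 40), (100, 60), (150, 95), (200, 180)] and a 200 ms SLO, derive_target_rps returns 150; size_group(900, 150) then returns (2, 18), and the capacity alert fires once desired capacity reaches 15 of 18 instances.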
Maturity Levels
| Level | Name | Criteria |
|---|---|---|
| 1 | Static Capacity | No auto-scaling; manual adjustment during load spikes; typically over-provisioned without scrutiny. |
| 2 | Configured, Not Validated | ASG/VMSS configured; default CPU threshold; never tested under load. |
| 3 | Validated with Load Test | Correct metrics; validated by load test; documented limits; health check type configured. |
| 4 | Predictive & Event-Driven | Predictive scaling; queue-based scaling; scale-out duration measured and within SLO. |
| 5 | Autonomous Capacity Management | Fully automated; ML-based policies; SLO breaches predicted before they occur. |
Terraform Checks
waf-perf-020.tf.aws.autoscaling-group-policy
Checks: AWS Auto Scaling Groups must have min >= 1, max >= 2, and a health check type set.
Remediation: set min_size >= 2 for production redundancy and max_size >= 3 for scaling capability.
Add a scaling policy (policy_type TargetTrackingScaling or StepScaling) and set health_check_type to ELB.
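How the waf-perf-020.tf.aws.autoscaling-group-policy check is implemented is not described here. As an illustration only, the following Python sketch applies the same three conditions to the JSON produced by terraform show -json on a saved plan, assuming that output's planned_values / root_module / child_modules layout. iter_resources and check_asgs are hypothetical names; values known only after apply may be absent from planned_values and would be reported as missing by this sketch. It does not check for a scaling policy.

```python
import json
import sys


def iter_resources(module):
    """Yield resources of a planned_values module, recursing into child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


def check_asgs(plan):
    """Return (address, problem) pairs for non-compliant aws_autoscaling_group resources."""
    findings = []
    root = plan.get("planned_values", {}).get("root_module", {})
    for res in iter_resources(root):
        if res.get("type") != "aws_autoscaling_group":
            continue
        values = res.get("values", {})
        addr = res.get("address", "?")
        if (values.get("min_size") or 0) < 1:
            findings.append((addr, "min_size must be >= 1"))
        if (values.get("max_size") or 0) < 2:
            findings.append((addr, "max_size must be >= 2"))
        if not values.get("health_check_type"):
            findings.append((addr, "health_check_type must be set (ELB recommended)"))
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed): terraform plan -out=tfplan && terraform show -json tfplan > plan.json
    with open(sys.argv[1]) as f:
        for address, problem in check_asgs(json.load(f)):
            print(f"{address}: {problem}")
```

Walking child modules matters because Auto Scaling Groups are often defined inside reusable modules rather than in the root module.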
Evidence
| Type | Required | Description |
|---|---|---|
| IaC | ✅ Required | Auto-scaling configuration with min/max and scaling policy. |
| Process | ✅ Required | Load test results demonstrating that scaling triggers within the latency SLO. |
| Config | Optional | CloudWatch/Azure Monitor/GCP alerts configured for scaling events. |
| Governance | Optional | Runbook with documented scaling limits and known bottlenecks. |
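To make the Process evidence reviewable, a load-test run can be reduced to a pass/fail verdict. The sketch below assumes the test harness exports time-ordered samples of (seconds since start, P95 latency in ms, in-service instance count); that format and evaluate_load_test are hypothetical. It applies a strict reading of the requirement: the run passes only if capacity was added and P95 never exceeded the SLO.

```python
def evaluate_load_test(samples, latency_slo_ms):
    """Evaluate a load-test run against the latency SLO.

    samples: time-ordered (t_seconds, p95_ms, in_service_instances) tuples.
    Returns the time of the first capacity increase, the time of the first
    SLO breach (None if none occurred) and an overall pass/fail flag.
    """
    if not samples:
        raise ValueError("no samples")
    baseline = samples[0][2]  # in-service instances at the start of the run
    first_scale_out = next((t for t, _, n in samples if n > baseline), None)
    first_breach = next((t for t, p95, _ in samples if p95 > latency_slo_ms), None)
    return {
        "first_scale_out_t": first_scale_out,
        "first_breach_t": first_breach,
        # Strict: capacity must be added and P95 must stay within SLO throughout.
        "passed": first_scale_out is not None and first_breach is None,
    }
```

The first_scale_out_t value can also be recorded in the test report as a rough measure of how quickly scaling reacted to the ramp.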