WAF++

WAF-PERF-020 – Auto-Scaling Configured & Tested

Description

All stateless production workloads with variable or unpredictable traffic MUST have auto-scaling configured. Auto-scaling policies MUST be based on meaningful metrics (request latency, request rate, queue depth). Auto-scaling configurations MUST be tested under realistic load before going into production.

No deployment to production without a validated scaling path.

Rationale

Static capacity creates an unsolvable dilemma: over-provision for peaks (expensive) or under-provision and degrade under unexpected load (risky). Auto-scaling resolves this dilemma – but only if it is configured correctly. Wrong thresholds, missing cooldowns, or missing instance warmup configuration cause scaling to fail at the critical moment.

Any auto-scaling configuration that has never been tested under load is effectively nonexistent.

Threat Context

| Risk | Description |
|---|---|
| Capacity Bottleneck During Load Spike | A service that cannot scale degrades or fails during traffic peaks. |
| Scaling Oscillation | Incorrectly configured cooldowns cause constant scale-out/scale-in cycles, which drive up cost and cause instability. |
| Scaling Too Late | A threshold that is too high or the wrong scaling metric means scaling triggers only after the SLO has already been violated. |
| Cold-Start Latency | Without instance warmup, new instances receive full traffic before they are ready to serve it. |

Requirement

  • All stateless production workloads MUST have auto-scaling configured (min >= 1, max >= 2)

  • Scaling metrics MUST be based on application behavior, not just CPU

  • Auto-scaling MUST be validated through load testing (evidence: test report)

  • Scale-in cooldown MUST be >= scale-out cooldown (conservative scale-in)
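As an illustration of the metric and cooldown requirements, the following Terraform sketch configures target tracking for a hypothetical ECS service through Application Auto Scaling, which exposes separate scale-out and scale-in cooldowns. The cluster and service names (prod, api), the referenced aws_lb.api and aws_lb_target_group.api resources, and the target of 500 requests per target are assumptions for illustration only. EC2 Auto Scaling groups behave differently: their cooldown applies only to simple scaling policies, while target tracking and step scaling policies rely on instance warmup instead.

```hcl
# Sketch only: cluster "prod", service "api", aws_lb.api and
# aws_lb_target_group.api are hypothetical placeholders.
resource "aws_appautoscaling_target" "api" {
  service_namespace  = "ecs"
  resource_id        = "service/prod/api" # service/<cluster>/<service>
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 10
}

resource "aws_appautoscaling_policy" "api_requests" {
  name               = "api-request-count-per-target"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  resource_id        = aws_appautoscaling_target.api.resource_id
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension

  target_tracking_scaling_policy_configuration {
    # Application-level metric (requests per target) instead of CPU.
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # Format: <load-balancer-arn-suffix>/<target-group-arn-suffix>
      resource_label = "${aws_lb.api.arn_suffix}/${aws_lb_target_group.api.arn_suffix}"
    }
    target_value = 500 # placeholder; derive from a load test

    # Scale out quickly, scale in conservatively:
    # scale-in cooldown (300 s) >= scale-out cooldown (60 s).
    scale_out_cooldown = 60
    scale_in_cooldown  = 300
  }
}
```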

Implementation Guidance

  1. Choose the scaling metric: ALB request count per target (HTTP APIs), queue depth (workers), custom metrics (special workloads)

  2. Derive thresholds from the load test: find the request rate per instance at which P95 latency reaches half the latency SLO (SLO/2), and use that rate as the scaling target

  3. Configure min/max: min >= 2 for redundancy; max = 3–5x the instance count needed for normal load

  4. Configure cooldowns: scale-out 60s, scale-in 300s (conservative scale-in)

  5. Configure instance warmup: 60–120s, so new instances are not immediately overloaded

  6. Run a load test: ramp load gradually to 2x the expected peak and verify that auto-scaling triggers

  7. Configure monitoring: alert when desired capacity is >= 80% of max capacity
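The following Terraform sketch shows how steps 1, 3, 5 and 7 might look for an HTTP API running on an EC2 Auto Scaling group. All referenced resources and variables (aws_launch_template.api, aws_lb.api, aws_lb_target_group.api, aws_sns_topic.capacity_alerts, var.private_subnet_ids) are hypothetical. The target value of 400 requests per target stands in for the number derived in step 2, and max_size = 8 assumes that normal load needs 2 instances. The capacity alarm uses CloudWatch metric math over the group metrics, which are only published when enabled_metrics is set on the group.

```hcl
# Sketch only: aws_launch_template.api, aws_lb.api, aws_lb_target_group.api,
# aws_sns_topic.capacity_alerts and var.private_subnet_ids are hypothetical.
resource "aws_autoscaling_group" "api" {
  name                = "api"
  min_size            = 2 # step 3: redundancy
  max_size            = 8 # step 3: 4x the 2 instances assumed for normal load
  vpc_zone_identifier = var.private_subnet_ids
  target_group_arns   = [aws_lb_target_group.api.arn]
  health_check_type   = "ELB"

  # Step 5: new instances do not count toward the group's aggregated
  # metrics until 90 s after launch.
  default_instance_warmup = 90

  # Step 7: publish the group metrics used by the capacity alarm below.
  enabled_metrics = ["GroupDesiredCapacity", "GroupMaxSize"]

  launch_template {
    id      = aws_launch_template.api.id
    version = "$Latest"
  }
}

resource "aws_autoscaling_policy" "api_requests" {
  name                   = "api-request-count-per-target"
  autoscaling_group_name = aws_autoscaling_group.api.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    # Step 1: application-level metric for an HTTP API.
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label         = "${aws_lb.api.arn_suffix}/${aws_lb_target_group.api.arn_suffix}"
    }
    # Step 2: placeholder for the per-target rate at which P95 = SLO/2.
    target_value = 400
  }
}

# Step 7: alarm when desired capacity is >= 80% of max capacity.
resource "aws_cloudwatch_metric_alarm" "api_near_max" {
  alarm_name          = "api-desired-capacity-near-max"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 80
  evaluation_periods  = 3
  alarm_actions       = [aws_sns_topic.capacity_alerts.arn]

  metric_query {
    id          = "pct_of_max"
    expression  = "100 * desired / maxsize"
    label       = "Desired capacity as % of max"
    return_data = true
  }

  metric_query {
    id = "desired"
    metric {
      namespace   = "AWS/AutoScaling"
      metric_name = "GroupDesiredCapacity"
      period      = 60
      stat        = "Maximum"
      dimensions  = { AutoScalingGroupName = aws_autoscaling_group.api.name }
    }
  }

  metric_query {
    id = "maxsize"
    metric {
      namespace   = "AWS/AutoScaling"
      metric_name = "GroupMaxSize"
      period      = 60
      stat        = "Maximum"
      dimensions  = { AutoScalingGroupName = aws_autoscaling_group.api.name }
    }
  }
}
```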

Maturity Levels

| Level | Name | Criteria |
|---|---|---|
| 1 | Static Capacity | No auto-scaling; capacity adjusted manually during load spikes; typically over-provisioned without review. |
| 2 | Configured, Not Validated | ASG/VMSS configured with the default CPU threshold; never tested under load. |
| 3 | Validated with Load Test | Meaningful scaling metrics; validated by load test; documented limits; health check type configured. |
| 4 | Predictive & Event-Driven | Predictive scaling; queue-based scaling; scale-out duration measured and shown to stay within the SLO. |
| 5 | Autonomous Capacity Management | Fully automated; ML-based policies; SLO breaches predicted before they occur. |
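For the Level 4 criterion, a predictive scaling policy on an EC2 Auto Scaling group could look roughly like the following sketch. It assumes that aws_autoscaling_group.api, aws_lb.api and aws_lb_target_group.api exist; the target value and the buffer time are placeholders. Starting in ForecastOnly mode lets the forecasts be compared with observed load before the policy is switched to ForecastAndScale.

```hcl
# Sketch only: referenced resources and all numeric values are hypothetical.
resource "aws_autoscaling_policy" "api_predictive" {
  name                   = "api-predictive"
  autoscaling_group_name = aws_autoscaling_group.api.name
  policy_type            = "PredictiveScaling"

  predictive_scaling_configuration {
    # Forecast only at first; change to "ForecastAndScale" once validated.
    mode = "ForecastOnly"

    # Launch instances up to 300 s ahead of the forecast demand.
    scheduling_buffer_time = 300

    metric_specification {
      target_value = 400 # requests per target (placeholder)

      # ALB request count pair: total requests as load metric,
      # requests per target as scaling metric.
      predefined_metric_pair_specification {
        predefined_metric_type = "ALBRequestCount"
        resource_label         = "${aws_lb.api.arn_suffix}/${aws_lb_target_group.api.arn_suffix}"
      }
    }
  }
}
```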

Terraform Checks

waf-perf-020.tf.aws.autoscaling-group-policy

Checks: AWS Auto Scaling Groups must set min_size >= 1, max_size >= 2, and a health check type.

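Compliant:

resource "aws_autoscaling_group" "api" {
  min_size          = 2
  max_size          = 10
  desired_capacity  = 2
  health_check_type = "ELB"
  # launch_template, vpc_zone_identifier and target_group_arns omitted for brevity
}

resource "aws_autoscaling_policy" "scale" {
  name                   = "api-request-count-per-target"
  autoscaling_group_name = aws_autoscaling_group.api.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # Required for this metric type; aws_lb.api and aws_lb_target_group.api are assumed to exist
      resource_label         = "${aws_lb.api.arn_suffix}/${aws_lb_target_group.api.arn_suffix}"
    }
    target_value = 1000.0
  }
}

Non-Compliant:

resource "aws_autoscaling_group" "api" {
  min_size = 1
  max_size = 1
  # max_size = min_size = 1: the group cannot scale
  # WAF-PERF-020 violation
}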

Remediation: set min_size >= 2 for production redundancy and max_size >= 3 (greater than min_size) so the group can actually scale. Add a scaling policy (TargetTrackingScaling or StepScaling) and set health_check_type to "ELB".
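The remediation names StepScaling as an alternative to target tracking. For a queue worker, a step scaling policy driven by SQS queue depth might be sketched as follows. The worker group (aws_autoscaling_group.worker), the queue (aws_sqs_queue.jobs), the 1000-message threshold and the step sizes are all hypothetical. A matching scale-in policy and alarm would also be needed, and scaling on backlog per instance rather than raw queue depth is a common refinement.

```hcl
# Sketch only: aws_autoscaling_group.worker and aws_sqs_queue.jobs are hypothetical.
resource "aws_autoscaling_policy" "worker_scale_out" {
  name                   = "worker-queue-depth-scale-out"
  autoscaling_group_name = aws_autoscaling_group.worker.name
  policy_type            = "StepScaling"
  adjustment_type        = "ChangeInCapacity"

  # Bounds are offsets from the alarm threshold (1000 messages):
  # 1000-5000 visible messages adds 1 instance, more than 5000 adds 3.
  step_adjustment {
    metric_interval_lower_bound = 0
    metric_interval_upper_bound = 4000
    scaling_adjustment          = 1
  }

  step_adjustment {
    metric_interval_lower_bound = 4000
    scaling_adjustment          = 3
  }
}

resource "aws_cloudwatch_metric_alarm" "worker_queue_backlog" {
  alarm_name          = "worker-queue-backlog"
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  dimensions          = { QueueName = aws_sqs_queue.jobs.name }
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 2
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 1000
  alarm_actions       = [aws_autoscaling_policy.worker_scale_out.arn]
}
```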

Evidence

| Type | Required | Description |
|---|---|---|
| IaC | ✅ Required | Auto-scaling configuration with min/max and scaling policy. |
| Process | ✅ Required | Load test results demonstrating that scaling triggers while latency is still within the SLO. |
| Config | Optional | Alerts for scaling events configured in CloudWatch, Azure Monitor, or GCP. |
| Governance | Optional | Runbook with documented scaling limits and known bottlenecks. |
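For the optional Config evidence on AWS, scaling-event notifications can be wired up with the aws_autoscaling_notification resource, as in this sketch. The topic name and the aws_autoscaling_group.api reference are assumptions, and subscriptions to the SNS topic (email, chat, incident tooling) still have to be configured separately.

```hcl
# Sketch only: aws_autoscaling_group.api is assumed to exist.
resource "aws_sns_topic" "scaling_events" {
  name = "api-scaling-events"
}

# Publish launch/terminate events (including failures) to the SNS topic.
resource "aws_autoscaling_notification" "api_scaling_events" {
  group_names = [aws_autoscaling_group.api.name]
  topic_arn   = aws_sns_topic.scaling_events.arn

  notifications = [
    "autoscaling:EC2_INSTANCE_LAUNCH",
    "autoscaling:EC2_INSTANCE_TERMINATE",
    "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
    "autoscaling:EC2_INSTANCE_TERMINATE_ERROR",
  ]
}
```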