WAF-PERF-020 – Auto-Scaling Configured & Tested

Pillar: Performance Efficiency | Severity: High | Category: Auto-Scaling | Automatable: High

Description

All stateless production workloads with variable or unpredictable traffic MUST have auto-scaling configured. Auto-scaling policies MUST be based on meaningful metrics (request latency, request rate, queue depth). Auto-scaling configurations MUST be tested under realistic load before going into production.

No deployment to production without a validated scaling path.

Rationale

Static capacity forces a trade-off: over-provision for peaks (expensive) or under-provision and degrade under unexpected load (risky). Auto-scaling resolves this trade-off, but only if it is configured correctly. Wrong thresholds, missing cooldowns, or missing instance warmup configuration cause scaling to fail at the critical moment.

Any auto-scaling configuration that has never been tested under load is effectively nonexistent.

Threat Context

Risk	Description
Capacity Bottleneck During Load Spike	Non-scaling service degrades or fails during traffic peaks.
Scaling Oscillation	Incorrectly configured cooldowns lead to constant scale-out/scale-in (cost + instability).
Scaling Too Late	Threshold too high or scaling metric wrong → scaling triggers only after SLO is already violated.
Cold-Start Latency	Instance warmup missing → new instances are immediately hit with full traffic before they are ready.

Requirement

All stateless production workloads MUST have auto-scaling configured (min >= 1, max >= 2)
Scaling metrics MUST be based on application behavior, not just CPU
Auto-scaling MUST be validated through load testing (evidence: test report)
Scale-in cooldown MUST be >= scale-out cooldown (conservative scale-in)
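
The numeric requirements above can be enforced at plan time. The following is a minimal sketch using Terraform input validation; the variable name `scaling` and its fields are hypothetical and not part of this control.

```hcl
# Hypothetical input object; the field names are illustrative only.
variable "scaling" {
  type = object({
    min_size           = number
    max_size           = number
    scale_out_cooldown = number # seconds
    scale_in_cooldown  = number # seconds
  })

  # Requirement: min >= 1, max >= 2.
  validation {
    condition     = var.scaling.min_size >= 1 && var.scaling.max_size >= 2
    error_message = "WAF-PERF-020: min_size must be >= 1 and max_size must be >= 2."
  }

  # Requirement: conservative scale-in.
  validation {
    condition     = var.scaling.scale_in_cooldown >= var.scaling.scale_out_cooldown
    error_message = "WAF-PERF-020: scale-in cooldown must be >= scale-out cooldown."
  }
}
```

Grouping the values in one object lets each validation compare related fields without depending on cross-variable references, which older Terraform versions do not allow in validation blocks.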

Implementation Guidance

Choose scaling metric: ALB request count (HTTP APIs), queue depth (workers), custom metrics (special workloads)
Derive thresholds from load test: find the request rate per instance at which P95 latency reaches half the SLO (SLO/2), and use it as the scaling target
Configure min/max: min >= 2 for redundancy; max sized to handle 3–5x the expected normal load
Configure cooldowns: Scale-out 60s, scale-in 300s (conservative)
Configure instance warmup: 60–120s, so new instances are not immediately overloaded
Run load test: Gradual load profile to 2x peak; validate auto-scaling trigger
Configure monitoring: Alert when desired capacity >= 80% max capacity
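
The min/max, cooldown, warmup, and monitoring steps above could look roughly like the following sketch for an AWS Auto Scaling Group. The variables `launch_template_id` and `subnet_ids`, the value of `normal_load_instances`, the resource names, and the adjustment sizes are assumptions for illustration. Simple scaling policies are used here only because they carry an explicit per-policy cooldown; the CloudWatch alarms that trigger them are omitted.

```hcl
# Hypothetical inputs; supply values from your own environment.
variable "launch_template_id" { type = string }
variable "subnet_ids" { type = list(string) }

locals {
  normal_load_instances = 3 # assumed: instances needed at normal load
  headroom_factor       = 4 # within the 3-5x range from the guidance
  max_size              = local.normal_load_instances * local.headroom_factor
}

resource "aws_autoscaling_group" "api" {
  name                = "api"
  min_size            = 2 # redundancy
  max_size            = local.max_size
  vpc_zone_identifier = var.subnet_ids
  health_check_type   = "ELB"

  # Warmup period (seconds) applied to newly launched instances.
  default_instance_warmup = 120

  # Publish desired capacity so the alarm below has data.
  enabled_metrics = ["GroupDesiredCapacity"]

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }
}

# Scale out quickly: 60 s cooldown.
resource "aws_autoscaling_policy" "scale_out" {
  name                   = "api-scale-out"
  autoscaling_group_name = aws_autoscaling_group.api.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 2
  cooldown               = 60
}

# Scale in conservatively: 300 s cooldown.
resource "aws_autoscaling_policy" "scale_in" {
  name                   = "api-scale-in"
  autoscaling_group_name = aws_autoscaling_group.api.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = -1
  cooldown               = 300
}

# Alert when desired capacity reaches 80% of max capacity.
resource "aws_cloudwatch_metric_alarm" "near_max_capacity" {
  alarm_name          = "api-asg-near-max-capacity"
  namespace           = "AWS/AutoScaling"
  metric_name         = "GroupDesiredCapacity"
  dimensions          = { AutoScalingGroupName = aws_autoscaling_group.api.name }
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = ceil(local.max_size * 0.8)
}
```

With target tracking policies, cooldowns are not set this way, so the equivalent tuning would differ. The load test remains the step that confirms whether these values hold for a given workload.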

Maturity Levels

Level	Name	Criteria
1	Static Capacity	No auto-scaling; manual adjustment during load spikes; typically over-provisioned without review.
2	Configured, Not Validated	ASG/VMSS configured; default CPU threshold; never tested under load.
3	Validated with Load Test	Correct metrics; load test validation; documented limits; health check type configured.
4	Predictive & Event-Driven	Predictive scaling; queue-based scaling; scale-out duration measured within SLO.
5	Autonomous Capacity Management	Fully automated; ML-based policies; SLO breach prediction before occurrence.

Terraform Checks

waf-perf-020.tf.aws.autoscaling-group-policy

Checks: AWS Auto Scaling Groups must set min_size >= 1 and max_size >= 2 and must define health_check_type.

Compliant

resource "aws_autoscaling_group" "api" {
  min_size          = 2
  max_size          = 10
  desired_capacity  = 2
  health_check_type = "ELB"
  # Launch template, subnets, and target group attachment omitted for brevity.
}

resource "aws_autoscaling_policy" "scale" {
  name                   = "api-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.api.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # resource_label (ALB and target group ARN suffixes) is required
      # for this metric type; omitted here.
    }
    target_value = 1000.0
  }
}

Non-Compliant

resource "aws_autoscaling_group" "api" {
  min_size = 1
  max_size = 1
  # max = min = 1: no scaling possible
  # WAF-PERF-020 violation
}

Remediation: Set min_size >= 2 for production redundancy and max_size >= 3 so that max_size exceeds min_size and the group can actually scale. Add a scaling policy (TargetTracking or StepScaling). Set health_check_type to "ELB".
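
As an illustration of what this check looks for, the same conditions can be asserted inside the configuration with a Terraform `check` block (available from Terraform 1.5). This is a sketch, not the implementation of waf-perf-020.tf.aws.autoscaling-group-policy; it assumes the group is named `aws_autoscaling_group.api` as in the examples above, and the check name is hypothetical.

```hcl
# Hypothetical in-configuration assertion mirroring the documented check.
check "waf_perf_020_asg_bounds" {
  assert {
    condition = (
      aws_autoscaling_group.api.min_size >= 1 &&
      aws_autoscaling_group.api.max_size >= 2
    )
    error_message = "WAF-PERF-020: the ASG needs min_size >= 1 and max_size >= 2."
  }

  assert {
    condition     = aws_autoscaling_group.api.health_check_type == "ELB"
    error_message = "WAF-PERF-020: health_check_type should be ELB."
  }
}
```

A failed `check` assertion is reported as a warning and does not block the apply, so a policy gate in CI is still needed to enforce the control.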

Evidence

Type	Required	Description
IaC	✅ Required	Auto-scaling configuration with min/max and scaling policy.
Process	✅ Required	Load test results demonstrating that scaling triggers within the latency SLO.
Config	Optional	CloudWatch/Azure Monitor/GCP alerts for scaling events configured.
Governance	Optional	Runbook with documented scaling limits and known bottlenecks.

Related Controls

Best Practice

Configure Auto-Scaling →