WAF-PERF-020 – Auto-Scaling Configured & Tested

Description

All stateless production workloads with variable or unpredictable traffic MUST have auto-scaling configured. Auto-scaling policies MUST be based on meaningful metrics (request latency, request rate, queue depth). Auto-scaling configurations MUST be tested under realistic load before going into production.

No deployment to production without a validated scaling path.

Rationale

Static capacity forces a choice between two bad options: over-provisioning for peaks (expensive) or under-provisioning and degrading under unexpected load (risky). Auto-scaling resolves this dilemma, but only if it is configured correctly. Wrong thresholds, missing cooldowns, or missing instance warmup configuration cause scaling to fail at the critical moment.

Any auto-scaling configuration that has never been tested under load is effectively nonexistent.

Threat Context

| Risk | Description |
| --- | --- |
| Capacity Bottleneck During Load Spike | Non-scaling service degrades or fails during traffic peaks. |
| Scaling Oscillation | Incorrectly configured cooldowns lead to constant scale-out/scale-in (cost + instability). |
| Scaling Too Late | Threshold too high or scaling metric wrong → scaling triggers only after SLO is already violated. |
| Cold-Start Latency | Instance warmup missing → new instances are immediately hit with full traffic before they are ready. |
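The oscillation and late-scaling risks map to a small number of settings: the scaling metric, its target value, and the two cooldowns. The following is a minimal sketch for an ECS service using Application Auto Scaling. The variable names, capacity bounds, and target value are placeholders chosen for illustration, not values prescribed by WAF++; the target value in particular should come from a load test.

```hcl
# Hypothetical inputs; names are assumptions for this sketch.
variable "ecs_cluster_name" { type = string }
variable "ecs_service_name" { type = string }

# Format: app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>
variable "alb_resource_label" { type = string }

resource "aws_appautoscaling_target" "api" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/${var.ecs_cluster_name}/${var.ecs_service_name}"
  min_capacity       = 2
  max_capacity       = 10
}

resource "aws_appautoscaling_policy" "api_requests" {
  name               = "api-requests-per-target"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  resource_id        = aws_appautoscaling_target.api.resource_id

  target_tracking_scaling_policy_configuration {
    # Scale on request rate per target rather than CPU.
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label         = var.alb_resource_label
    }

    target_value = 1000 # placeholder; derive from load test results

    # Scale out quickly, scale in slowly to avoid oscillation.
    scale_out_cooldown = 60
    scale_in_cooldown  = 300
  }
}
```

Keeping the scale-in cooldown longer than the scale-out cooldown means capacity is added faster than it is removed, which dampens the scale-out/scale-in flapping described above.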

Requirement

  • All stateless production workloads MUST have auto-scaling configured (min >= 1, max >= 2)

  • Scaling metrics MUST be based on application behavior, not just CPU

  • Auto-scaling MUST be validated through load testing (evidence: test report)

  • Scale-in cooldown MUST be >= scale-out cooldown (conservative scale-in)
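One way to make these requirements hard to bypass is to validate scaling inputs in Terraform before any resource is created. The sketch below assumes a hypothetical `scaling` input variable; its name and fields are illustrative and not defined by WAF++. The third rule (max above min) is an added assumption that goes slightly beyond the stated requirement.

```hcl
# Hypothetical input variable; the name and field layout are assumptions.
variable "scaling" {
  description = "Auto-scaling bounds and cooldowns (in seconds) for one stateless workload."
  type = object({
    min_size           = number
    max_size           = number
    scale_out_cooldown = number
    scale_in_cooldown  = number
  })

  # Requirement: min >= 1 and max >= 2.
  validation {
    condition     = var.scaling.min_size >= 1 && var.scaling.max_size >= 2
    error_message = "Auto-scaling bounds must satisfy min_size >= 1 and max_size >= 2."
  }

  # Requirement: conservative scale-in.
  validation {
    condition     = var.scaling.scale_in_cooldown >= var.scaling.scale_out_cooldown
    error_message = "The scale_in_cooldown must be greater than or equal to scale_out_cooldown."
  }

  # Extra assumption (not in the requirement text): max must exceed min,
  # otherwise the group has no room to scale.
  validation {
    condition     = var.scaling.max_size > var.scaling.min_size
    error_message = "The max_size must be greater than min_size so the workload can scale."
  }
}
```

With this in place, `terraform plan` fails early for an input such as `min_size = 1, max_size = 1` instead of silently creating a group that cannot scale.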

Implementation Guidance

  1. Choose scaling metric: ALB request count (HTTP APIs), queue depth (workers), custom metrics (special workloads)

  2. Derive thresholds from load test: find the requests/s value at which P95 latency reaches half the SLO target (SLO/2)

  3. Configure min/max: min >= 2 for redundancy; max sized for 3–5x expected normal load

  4. Configure cooldowns: Scale-out 60s, scale-in 300s (conservative)

  5. Configure instance warmup: 60–120s, so new instances are not immediately overloaded

  6. Run load test: Gradual load profile to 2x peak; validate auto-scaling trigger

  7. Configure monitoring: Alert when desired capacity >= 80% max capacity
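Steps 3, 5, and 7 above could look roughly like this for an EC2 Auto Scaling group. This is a sketch under stated assumptions: the variables, resource names, and the maximum of 10 instances are hypothetical, and the scaling policy and load balancer attachment are left out to keep the focus on warmup and the capacity alert.

```hcl
# Hypothetical inputs; names are assumptions for this sketch.
variable "launch_template_id" { type = string }
variable "subnet_ids" { type = list(string) }
variable "alert_topic_arn" { type = string }

locals {
  asg_max_size = 10 # placeholder; size for 3-5x expected normal load
}

resource "aws_autoscaling_group" "api" {
  name                = "api"
  min_size            = 2 # step 3: two instances for redundancy
  max_size            = local.asg_max_size
  vpc_zone_identifier = var.subnet_ids
  health_check_type   = "ELB" # target group attachment omitted here

  # Step 5: new instances do not count toward aggregated metrics until warmed up.
  default_instance_warmup = 120

  # Publish the group metric that the alarm below evaluates.
  enabled_metrics = ["GroupDesiredCapacity"]

  launch_template {
    id      = var.launch_template_id
    version = "$Latest"
  }
}

# Step 7: alert when desired capacity reaches 80% of the configured maximum.
resource "aws_cloudwatch_metric_alarm" "capacity_headroom" {
  alarm_name          = "api-asg-desired-capacity-high"
  namespace           = "AWS/AutoScaling"
  metric_name         = "GroupDesiredCapacity"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 0.8 * local.asg_max_size
  alarm_actions       = [var.alert_topic_arn]

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.api.name
  }
}
```

`desired_capacity` is deliberately not set, so Terraform does not reset the instance count that the scaling policy has chosen on each apply.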

Maturity Levels

| Level | Name | Criteria |
| --- | --- | --- |
| 1 | Static Capacity | No auto-scaling; manual adjustment during load spikes; typically uncritically over-provisioned. |
| 2 | Configured, Not Validated | ASG/VMSS configured; default CPU threshold; never tested under load. |
| 3 | Validated with Load Test | Correct metrics; load test validation; documented limits; health check type configured. |
| 4 | Predictive & Event-Driven | Predictive scaling; queue-based scaling; scale-out duration measured within SLO. |
| 5 | Autonomous Capacity Management | Fully automated; ML-based policies; SLO breach prediction before occurrence. |
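As an illustration of what Level 4 can involve on AWS, the sketch below adds a predictive scaling policy to an existing Auto Scaling group. The variable names and target value are hypothetical, and starting in forecast-only mode is a cautious choice made for this example rather than something the maturity model prescribes.

```hcl
# Hypothetical inputs; names are assumptions for this sketch.
variable "asg_name" { type = string }

# Format: app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>
variable "alb_resource_label" { type = string }

resource "aws_autoscaling_policy" "predictive" {
  name                   = "api-predictive"
  autoscaling_group_name = var.asg_name
  policy_type            = "PredictiveScaling"

  predictive_scaling_configuration {
    # Generate forecasts without acting on them; switch to
    # "ForecastAndScale" once the forecasts have been reviewed.
    mode = "ForecastOnly"

    metric_specification {
      target_value = 1000 # placeholder; derive from load test results

      # Forecast load from ALB request count instead of CPU.
      predefined_metric_pair_specification {
        predefined_metric_type = "ALBRequestCount"
        resource_label         = var.alb_resource_label
      }
    }
  }
}
```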

Terraform Checks

waf-perf-020.tf.aws.autoscaling-group-policy

Checks: AWS Auto Scaling Groups must have min_size >= 1, max_size >= 2, and a health check type set.

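Compliant:

resource "aws_autoscaling_group" "api" {
  # launch_template, vpc_zone_identifier, and target group attachment omitted for brevity
  min_size          = 2
  max_size          = 10
  desired_capacity  = 2
  health_check_type = "ELB"
}

resource "aws_autoscaling_policy" "scale" {
  name                   = "api-request-count"
  autoscaling_group_name = aws_autoscaling_group.api.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # Required for this metric type; aws_lb.api and aws_lb_target_group.api
      # are assumed to be defined elsewhere.
      resource_label = "${aws_lb.api.arn_suffix}/${aws_lb_target_group.api.arn_suffix}"
    }
    target_value = 1000.0
  }
}

Non-Compliant:

resource "aws_autoscaling_group" "api" {
  min_size = 1
  max_size = 1
  # max = min = 1: the group can never scale (WAF-PERF-020 violation)
}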

Remediation: Set min_size >= 2 for production redundancy and max_size >= 3 for scaling capability. Add a scaling policy (TargetTracking or StepScaling). Set the health check type to ELB.
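Where a step scaling policy fits better than target tracking, for example a worker fleet scaling on queue depth, a sketch might look like the following. The variable names, the 1000-message threshold, and the step sizes are hypothetical placeholders; only the scale-out side is shown, and a matching scale-in policy with a longer evaluation window would be needed in practice.

```hcl
# Hypothetical inputs; names are assumptions for this sketch.
variable "asg_name" { type = string }
variable "queue_name" { type = string }

resource "aws_autoscaling_policy" "queue_scale_out" {
  name                      = "worker-queue-scale-out"
  autoscaling_group_name    = var.asg_name
  policy_type               = "StepScaling"
  adjustment_type           = "ChangeInCapacity"
  estimated_instance_warmup = 120

  # Bounds are offsets from the alarm threshold (1000 messages):
  # 1000 to 4999 visible messages adds one instance.
  step_adjustment {
    metric_interval_lower_bound = 0
    metric_interval_upper_bound = 4000
    scaling_adjustment          = 1
  }

  # 5000 or more visible messages adds three instances.
  step_adjustment {
    metric_interval_lower_bound = 4000
    scaling_adjustment          = 3
  }
}

resource "aws_cloudwatch_metric_alarm" "queue_depth_high" {
  alarm_name          = "worker-queue-depth-high"
  namespace           = "AWS/SQS"
  metric_name         = "ApproximateNumberOfMessagesVisible"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 2
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 1000
  alarm_actions       = [aws_autoscaling_policy.queue_scale_out.arn]

  dimensions = {
    QueueName = var.queue_name
  }
}
```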

Evidence

| Type | Required | Description |
| --- | --- | --- |
| IaC | ✅ Required | Auto-scaling configuration with min/max and scaling policy. |
| Process | ✅ Required | Load test results demonstrating that scaling triggers within the latency SLO. |
| Config | Optional | CloudWatch/Azure Monitor/GCP alerts for scaling events configured. |
| Governance | Optional | Runbook with documented scaling limits and known bottlenecks. |
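For the optional Config evidence on AWS, scaling events can be forwarded to an SNS topic so that launches, terminations, and failures are visible to operators. This is a minimal sketch; the topic name and the `asg_name` variable are hypothetical, and subscriptions to the topic are left out.

```hcl
# Hypothetical input; the name is an assumption for this sketch.
variable "asg_name" { type = string }

resource "aws_sns_topic" "scaling_events" {
  name = "asg-scaling-events"
}

# Publish instance launch/terminate events, including failures, to the topic.
resource "aws_autoscaling_notification" "scaling_events" {
  group_names = [var.asg_name]
  topic_arn   = aws_sns_topic.scaling_events.arn

  notifications = [
    "autoscaling:EC2_INSTANCE_LAUNCH",
    "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
    "autoscaling:EC2_INSTANCE_TERMINATE",
    "autoscaling:EC2_INSTANCE_TERMINATE_ERROR",
  ]
}
```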

Regulatory Mapping

| Framework | Controls |
| --- | --- |
| ISO/IEC 25010:2011 | 8.3.2 – Performance efficiency; 8.3.2.1 – Time behaviour; 8.3.2.2 – Resource utilisation; 8.3.2.3 – Capacity |
| AWS Well-Architected Framework | Performance Efficiency Pillar – Select the right resource types and sizes |
| Azure Well-Architected Framework | Performance Efficiency – Choose the right resources |
| Google Cloud Architecture Framework | Performance optimization – Right-size your instances |
| TOGAF 10 | ADM Phase B – Business architecture; ADM Phase C – Application architecture |
| DORA | DORA 2024 – Technical practices; DORA 2024 – Performance monitoring |
| ISO/IEC 29119 | 4.4.3 – Test design techniques; 4.5.3 – Test execution |
| ISO/IEC 12207 | 8.2.2.3 – Design and development of software |
| ITIL 4 | SVS – Service value system; DP – Design principle |
| BSI C5:2020 | OPS-01 – Operational monitoring; OPS-02 – Operational control |
| CIS Controls v8 | CIS 8 – Continuous Vulnerability Management |
| NIST SP 800-53 | RA-1 – Security assessment policy; RA-2 – Security assessment controls |
| NIST CSF 2.0 | DE.CM – Continuous monitoring; DE.AE – Anomaly detection |
| FedRAMP | RA-2, RA-5 (Moderate/High baseline) |
| SOC 2 Type II | CC6.1 – Logical access security software; CC7.1 – Infrastructure and software monitoring |
| TISAX | Information security – Performance monitoring |
| ANSSI SecNumCloud | Domain – Performance monitoring |
| BIO | BIO – Prestatiedoelstellingen (performance objectives) |
| ENS High | op.exp.2 – Configuración de seguridad (security configuration) |
| UK NCSC CAF | B4 – System security; B5 – System performance |
| CMMC 2.0 | RA.L2-3.8.1 – Automated monitoring |
| IRAP | ISM – Performance monitoring |
| CCCS PBMM | RA-2 – Security assessment controls; RA-5 – Security assessments |
| MAS TRM | Ch.5 – Technology risk governance |
| ISMAP | Performance monitoring and validation |
| FISC | Technical measures – Performance monitoring |
