WAF++

Best Practice: Compute Sizing & Instance Selection

Context

Compute sizing is one of the most common sources of both resource waste and performance problems. Over-provisioned instances waste money; under-provisioned instances cause latency spikes and failures under load.

Common problems without structured sizing:

  • Instance types are chosen by gut feeling or "that’s how it’s always been"

  • Previous generations (t2, m4, c4) continue running for years without review

  • Average CPU utilization below 5% – a classic sign of unexamined over-provisioning

  • No sizing documentation: nobody remembers why a particular instance type was chosen

Target State

A mature sizing strategy:

  • Data-driven: Sizing decisions are based on measured CPU/memory/network baselines

  • Documented: Every production resource has a sizing rationale in an ADR or sizing sheet

  • Regularly reviewed: Sizing is revisited quarterly; cloud provider upgrade recommendations are tracked

  • Current generation: All resources use current instance generations

Technical Implementation

Step 1: Collect Baseline Data

Before making a sizing decision, 2–4 weeks of metrics must be available:

# AWS: Query CloudWatch CPU metrics for an EC2 instance
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2026-03-01T00:00:00Z \
  --end-time 2026-03-18T00:00:00Z \
  --period 3600 \
  --statistics Average Maximum \
  --query 'Datapoints[*].[Timestamp,Average,Maximum]' \
  --output table

# GCP: Machine type recommendation
gcloud recommender recommendations list \
  --project=my-project \
  --location=europe-west3-a \
  --recommender=google.compute.instance.MachineTypeRecommender
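The raw datapoints from either query still need to be condensed into a baseline for the sizing document. A minimal sketch in plain Python (no external dependencies; the datapoint shape mirrors the hourly CloudWatch output of the query above):

```python
import statistics

def summarize_cpu(datapoints):
    """Condense CloudWatch-style datapoints into a sizing baseline.

    `datapoints` is a list of dicts with "Average" and "Maximum" keys,
    one per --period interval (hourly in the query above).
    """
    averages = sorted(dp["Average"] for dp in datapoints)
    # Index of the 95th percentile in the sorted averages
    p95_index = max(0, int(len(averages) * 0.95) - 1)
    return {
        "cpu_average_pct": round(statistics.mean(averages), 1),
        "cpu_p95_pct": round(averages[p95_index], 1),
        "cpu_max_pct": round(max(dp["Maximum"] for dp in datapoints), 1),
    }

# Example with three hourly datapoints
baseline = summarize_cpu([
    {"Average": 15.0, "Maximum": 40.0},
    {"Average": 20.0, "Maximum": 55.0},
    {"Average": 19.0, "Maximum": 45.0},
])
```

The resulting dict maps directly onto the `measured_baseline` fields used in Step 2.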

Step 2: Create a Sizing Document

# docs/sizing/payment-api.yml
service: "payment-api"
last_reviewed: "2026-03-18"
reviewed_by: "platform-team"

current:
  provider: "aws"
  instance_type: "t3.medium"
  vcpu: 2
  memory_gb: 4

measured_baseline:
  period: "2026-02-15 to 2026-03-15"
  cpu_average_pct: 18
  cpu_p95_pct: 45
  memory_average_gb: 2.1
  memory_max_gb: 2.8
  network_avg_mbps: 12

assessment:
  status: "appropriately_sized"
  rationale: >
    CPU headroom adequate for 2.5x spikes before auto-scaling triggers.
    Memory utilization at 52% of available; sufficient headroom.
    Next review: 2026-06-18

auto_scaling:
  min_instances: 2
  max_instances: 10
  scale_out_trigger: "ALBRequestCountPerTarget > 800"
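The "headroom for 2.5x spikes" claim in the assessment can be checked mechanically rather than by eye. A hedged sketch – the 2.5 spike factor and the 80% ceiling are assumptions taken from this document, not fixed rules:

```python
def spike_headroom_ok(cpu_avg_pct, spike_factor=2.5, ceiling_pct=80.0):
    """Return True if average CPU multiplied by the expected spike
    factor stays below the ceiling at which we would want to scale
    or upgrade (assumed 80%, matching the review checklist)."""
    return cpu_avg_pct * spike_factor <= ceiling_pct

# payment-api baseline: 18% average CPU
# 18 * 2.5 = 45% – comfortably below an 80% ceiling
ok = spike_headroom_ok(18)
```

Running the same check against a 40% average would fail (40 × 2.5 = 100%), flagging the service as an upgrade candidate.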

Step 3: Use Current Generation in Terraform

# Compliant: Current generation, explicit sizing rationale as comment
locals {
  # t3.medium chosen based on docs/sizing/payment-api.yml
  # CPU avg 18%, P95 45% – 2 vCPU provides sufficient headroom for 2.5x spikes
  instance_type = "t3.medium"
}

resource "aws_launch_template" "app" {
  name_prefix   = "lt-payment-api-"
  image_id      = data.aws_ami.ubuntu.id
  instance_type = local.instance_type

  tag_specifications {
    resource_type = "instance"
    tags = {
      workload        = "payment-api"
      sizing-reviewed = "2026-03-18"
      owner           = "platform-team"
    }
  }
}

Step 4: Use AWS Compute Optimizer

# Activate Compute Optimizer enrollment
resource "aws_computeoptimizer_enrollment_status" "main" {
  status = "Active"
}

# Retrieve recommendations via CLI and include in sizing review
# aws compute-optimizer get-ec2-instance-recommendations \
#   --filters name=Finding,values=UNDER_PROVISIONED,OVER_PROVISIONED
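For the sizing review it helps to bucket the returned recommendations by finding. A sketch that operates on the `instanceRecommendations` list from the CLI/API response; the ARNs below are hypothetical placeholders:

```python
from collections import defaultdict

def group_by_finding(recommendations):
    """Bucket Compute Optimizer EC2 recommendations by finding.

    `recommendations` is the instanceRecommendations list from
    get-ec2-instance-recommendations; each entry carries a
    "finding" such as OVER_PROVISIONED, UNDER_PROVISIONED or
    OPTIMIZED.
    """
    buckets = defaultdict(list)
    for rec in recommendations:
        buckets[rec["finding"]].append(rec["instanceArn"])
    return dict(buckets)

# Hypothetical sample payload (placeholder ARNs)
sample = [
    {"instanceArn": "arn:aws:ec2:eu-central-1:111122223333:instance/i-aaa",
     "finding": "OVER_PROVISIONED"},
    {"instanceArn": "arn:aws:ec2:eu-central-1:111122223333:instance/i-bbb",
     "finding": "OPTIMIZED"},
]
grouped = group_by_finding(sample)
```

Each OVER_PROVISIONED or UNDER_PROVISIONED bucket then becomes a rightsizing ticket in Step 5.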

Step 5: Quarterly Review Process

# docs/processes/compute-sizing-review.yml
frequency: "quarterly"
next_review: "2026-06-18"

checklist:
  - Review Compute Optimizer recommendations
  - Identify instances with CPU avg < 10% → rightsizing candidates
  - Identify instances with CPU p95 > 80% → upgrade candidates
  - Identify previous-generation instances (t2, m4, c4) → migration plan
  - Update sizing documents

output:
  - Sizing review report (which changes were made)
  - Jira tickets for rightsizing actions
  - Updated sizing documents in docs/sizing/
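The "update sizing documents" checklist item is easy to automate: flag every document whose last review is older than a quarter. A dependency-free sketch that assumes the `last_reviewed` dates have already been parsed from docs/sizing/*.yml (YAML loading omitted for brevity):

```python
from datetime import date, timedelta

def overdue_reviews(sizing_docs, today, max_age_days=90):
    """Flag sizing documents whose last review is older than one
    quarter (assumed 90 days, matching the quarterly cadence).

    `sizing_docs` maps service name to its last_reviewed date.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        service for service, reviewed in sizing_docs.items()
        if reviewed < cutoff
    )

# Hypothetical inventory of parsed sizing documents
docs = {
    "payment-api": date(2026, 3, 18),
    "legacy-batch": date(2025, 9, 1),
}
stale = overdue_reviews(docs, today=date(2026, 6, 1))
```

Wired into CI, this turns missed reviews into failing checks instead of forgotten wiki pages.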

Common Anti-Patterns

  • "We needed more in the past": Historical peaks do not justify permanent over-provisioning. Auto-scaling handles peak load.

  • "t2.large was always good enough": t2 is a previous generation; t3 offers better performance at a lower price.

  • "We’d rather provision more to be safe": Vertical sizing increases are not a substitute for auto-scaling.

  • Sizing from AWS defaults: Default recommendations are not workload-specific.

Metrics

  • Average CPU utilization across all compute resources (target: 20–70%)

  • Proportion of resources with CPU avg < 10% (target: < 5% of resources)

  • Proportion of previous-generation instances (target: 0%)

  • Proportion of resources without sizing documentation (target: 0% for production)
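The four metrics above can be computed from a fleet inventory export. A sketch under an assumed record shape (`cpu_avg_pct`, `generation`, `has_sizing_doc` per instance – adapt the field names to whatever your inventory actually emits):

```python
def fleet_sizing_metrics(instances):
    """Compute the sizing KPIs for a fleet.

    `instances` is a list of dicts with cpu_avg_pct (float),
    generation ("current"/"previous") and has_sizing_doc (bool) –
    an assumed shape for an inventory export.
    """
    n = len(instances)
    return {
        "fleet_cpu_avg_pct": round(
            sum(i["cpu_avg_pct"] for i in instances) / n, 1),
        "underutilized_share_pct": round(
            100 * sum(i["cpu_avg_pct"] < 10 for i in instances) / n, 1),
        "previous_gen_share_pct": round(
            100 * sum(i["generation"] == "previous" for i in instances) / n, 1),
        "undocumented_share_pct": round(
            100 * sum(not i["has_sizing_doc"] for i in instances) / n, 1),
    }

# Hypothetical two-instance fleet
fleet = [
    {"cpu_avg_pct": 18, "generation": "current", "has_sizing_doc": True},
    {"cpu_avg_pct": 4, "generation": "previous", "has_sizing_doc": False},
]
metrics = fleet_sizing_metrics(fleet)
```

Comparing each value against the targets listed above gives a simple red/green dashboard per quarter.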

Maturity Level

Level 1 – No standard; sizing by intuition or historical values
Level 2 – Experience-based sizing; occasional reviews
Level 3 – Data-driven sizing with documented baselines; quarterly review
Level 4 – Compute Optimizer integration; automatic rightsizing tickets
Level 5 – ML-based predictive sizing; self-optimizing capacity