
Best Practice: Brownfield Cost Optimization

Context

Brownfield cost optimization is the most challenging discipline of the cost pillar. Existing infrastructure has dependencies, political history, and often a debt load that has built up over years. At the same time, it offers the greatest quick-win potential: idle instances, missing lifecycle policies, and unused reservations are often addressable with minimal effort.

This guide structures brownfield optimization into three phases: Discovery → Quick Wins → Structural Improvements.

All WAF-COST controls are relevant. For brownfield environments, address them in this order: WAF-COST-010 → WAF-COST-020 → WAF-COST-040 → WAF-COST-030 → WAF-COST-100 → WAF-COST-050.

Phase 1: Discovery (Weeks 1–4)

Step 1.1: Create Cost Baseline

Before optimizing, the current state must be clear:

#!/bin/bash
# scripts/cost-baseline.sh – AWS Cost Baseline

echo "=== Cost Baseline Report ==="
echo ""

# Total costs for the last 3 months
echo "--- Total Costs (last 3 months) ---"
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '3 months ago' +%Y-%m-01),End=$(date +%Y-%m-01) \
  --granularity MONTHLY \
  --metrics "BlendedCost" \
  --query 'ResultsByTime[].{Month:TimePeriod.Start, Cost:Total.BlendedCost.Amount}' \
  --output table

echo ""
echo "--- Top-10 Cost Drivers (Service) ---"
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '1 month ago' +%Y-%m-01),End=$(date +%Y-%m-01) \
  --granularity MONTHLY \
  --metrics "BlendedCost" \
  --group-by Type=DIMENSION,Key=SERVICE \
  --query 'sort_by(ResultsByTime[0].Groups, &to_number(Metrics.BlendedCost.Amount))[-10:] | reverse(@)[].{Service:Keys[0], Cost:Metrics.BlendedCost.Amount}' \
  --output table

echo ""
echo "--- Untagged Costs ---"
# Costs of resources that carry no "workload" tag (the tag must be activated
# as a cost allocation tag before Cost Explorer can filter on it)
aws ce get-cost-and-usage \
  --time-period Start=$(date -d '1 month ago' +%Y-%m-01),End=$(date +%Y-%m-01) \
  --granularity MONTHLY \
  --metrics "BlendedCost" \
  --filter '{"Tags":{"Key":"workload","MatchOptions":["ABSENT"]}}' \
  --query 'ResultsByTime[0].Total.BlendedCost.Amount' \
  --output text

Step 1.2: Waste Discovery

#!/bin/bash
# scripts/waste-discovery.sh – Identify common waste sources

echo "=== S3 Buckets Without Lifecycle Policy ==="
aws s3api list-buckets --query "Buckets[].Name" --output text | \
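  tr '\t' '\n' | while read -r BUCKET; do
    # Empty output means the bucket has no lifecycle configuration
    # (or the call was denied; check permissions if results look wrong)
    LIFECYCLE=$(aws s3api get-bucket-lifecycle-configuration --bucket "$BUCKET" 2>/dev/null)
    if [ -z "$LIFECYCLE" ]; then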
      SIZE=$(aws s3api list-objects-v2 --bucket "$BUCKET" \
        --query "sum(Contents[].Size)" --output text 2>/dev/null || echo "0")
      echo "NO LIFECYCLE: $BUCKET (${SIZE} bytes)"
    fi
  done

echo ""
echo "=== CloudWatch Log Groups Without Retention ==="
aws logs describe-log-groups \
  --query 'logGroups[?retentionInDays==`null`].{Name:logGroupName, StoredBytes:storedBytes}' \
  --output table

echo ""
echo "=== Unused Elastic IPs ==="
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==`null`].{AllocationId:AllocationId, PublicIp:PublicIp}' \
  --output table

echo ""
echo "=== Unused EBS Volumes ==="
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[].{VolumeId:VolumeId, Size:Size, CreateTime:CreateTime}' \
  --output table

Step 1.3: Tagging Audit

#!/bin/bash
# scripts/tagging-audit.sh – Check tagging compliance

# Requires jq. Mandatory tags checked below:
# cost-center, owner, environment, workload

# Check EC2 instances
echo "=== EC2 Tagging Compliance ==="
aws ec2 describe-instances \
  --query 'Reservations[].Instances[]' \
  --output json | \
  jq -r '.[] | {id: .InstanceId, tags: ([.Tags[]? | {(.Key): .Value}] | add)} |
    select(.tags["cost-center"] == null or .tags["owner"] == null or
           .tags["environment"] == null or .tags["workload"] == null) |
    "NON-COMPLIANT: \(.id)"'

Phase 2: Quick Wins (Months 1–3)

Quick wins are measures with high ROI and low implementation effort. They create immediate savings and momentum for the structural improvements.

Quick-Win Prioritization Matrix

| Category | Measure | Effort | Typical Saving | Control |
|---|---|---|---|---|
| Idle Shutdown | Identify and shut down dev/test instances | Low | 200–2,000 EUR/month | WAF-COST-030 |
| Set Log Retention | Configure CloudWatch Log Groups with retention_in_days != 0 | Low | 50–500 EUR/month | WAF-COST-040, WAF-COST-070 |
| S3 Lifecycle | Lifecycle policies for the top 10 largest buckets | Low–Medium | 100–1,000 EUR/month | WAF-COST-040 |
| Unused Resources | EIP, unattached EBS, unused load balancers | Low | 50–500 EUR/month | WAF-COST-030 |
| Non-Prod Auto-Shutdown | Schedule for dev/test: Mon–Fri 08:00–20:00 | Medium | 500–3,000 EUR/month | WAF-COST-030 |
| Check RI Usage | Identify unused reserved instances, reassign | Medium | Variable, often significant | WAF-COST-080 |
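
The Non-Prod Auto-Shutdown schedule listed above (Mon–Fri, 08:00 to 20:00) can be expressed as a small time-window check. The following is a minimal sketch, not a complete scheduler: the function name `should_run` is hypothetical, and it assumes the caller passes the ISO weekday and the hour in the intended local time zone.

```bash
#!/bin/bash
# Hypothetical helper: decide whether a non-prod instance should be running.
# Arguments: ISO weekday (1 = Monday ... 7 = Sunday) and hour of day (0-23).
# Returns 0 (true) for Mon-Fri between 08:00 and 19:59, 1 (false) otherwise.
should_run() {
  local dow="$1"
  local hour="$2"
  # Force base 10 so zero-padded hours such as "08" are not parsed as octal
  hour=$((10#$hour))
  [ "$dow" -ge 1 ] && [ "$dow" -le 5 ] && [ "$hour" -ge 8 ] && [ "$hour" -lt 20 ]
}
```

A cron job or scheduled pipeline could call `should_run "$(date +%u)" "$(date +%H)"` and stop the tagged dev/test instances when it returns false. How the instances are selected and stopped depends on the environment and is left out here.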

Set Up Budget Alerting Immediately (WAF-COST-020)

# Immediate action: budget alert for all accounts
resource "aws_budgets_budget" "monthly_total" {
  name         = "monthly-total-budget"
  budget_type  = "COST"
  limit_amount = var.monthly_budget_limit
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = [var.finops_team_email]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = [var.finops_team_email, var.engineering_manager_email]
  }
}

Phase 3: Structural Improvements (Months 4–18)

Structural improvements address architectural cost debt and require more coordination, architecture board involvement, and planning effort.

Structural Measures

| Measure | Description | Time Horizon | Control |
|---|---|---|---|
| Fully enforce tagging | Introduce mandatory tag module, activate CI gate, backfill legacy resources | 3–6 months | WAF-COST-010 |
| FinOps review cycle | Establish monthly and quarterly reviews, action item tracker | 1–2 months | WAF-COST-060 |
| Cost debt register | Document existing debt, assign owners, prioritize paydown | 1–3 months | WAF-COST-100 |
| Extend ADR process | Add cost impact section to ADR template, retroactively for critical decisions | 2–4 months | WAF-COST-050 |
| Reservation restructuring | Analyze RI portfolio, restructure misaligned RIs, introduce savings plans | 3–6 months | WAF-COST-080 |
| HA review | Review multi-AZ/multi-region services based on SLO, record oversized HA as cost debt | 6–12 months | WAF-COST-050, WAF-COST-100 |
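
The CI gate mentioned under "Fully enforce tagging" needs a way to report which mandatory tags a resource is missing. The sketch below shows one possible building block. The function name `missing_tags` is hypothetical, the tag list is the one used in the Step 1.3 audit, and the helper assumes tag keys contain no spaces.

```bash
#!/bin/bash
# Mandatory tag keys, as used in the Phase 1 tagging audit
MANDATORY_TAGS=("cost-center" "owner" "environment" "workload")

# Hypothetical helper: print the mandatory tag keys that are NOT among the
# tag keys passed as arguments. Prints an empty line if nothing is missing.
missing_tags() {
  local present=" $* "   # pad with spaces so each key matches as a whole word
  local missing=()
  local tag
  for tag in "${MANDATORY_TAGS[@]}"; do
    case "$present" in
      *" $tag "*) ;;             # key is present
      *) missing+=("$tag") ;;    # key is absent
    esac
  done
  echo "${missing[*]}"
}
```

For example, `missing_tags owner environment` prints `cost-center workload`. A gate could fail the pipeline whenever the output is non-empty. Where the tag keys come from (a Terraform plan, the AWS API, or a manifest) is an integration detail not covered by this sketch.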

Initialize Cost Debt Register for Brownfield

# docs/cost-debt-register.yml – Brownfield start
version: "1.0"
last_reviewed: "2025-03-01"
status: "initial-population"

note: >
  Initial population from Brownfield Discovery Phase (Jan–Mar 2025).
  All entries need owner confirmation by 2025-04-01.

entries:
  - id: CD-2025-INIT-001
    title: "DISCOVERY: S3 buckets without lifecycle (23 buckets, ~2 TB)"
    category: "infinite-retention"
    description: "Discovery phase identified 23 S3 buckets without lifecycle policy."
    detected: "2025-03-01"
    owner: "TBD - owner identification in progress"
    estimated_annual_impact_eur: 3600
    status: "monitoring"
    paydown_plan: "Lifecycle policies for all 23 buckets by Q2 2025"
    target_resolution: "2025-06-30"
    related_controls: [WAF-COST-040]

  - id: CD-2025-INIT-002
    title: "DISCOVERY: 12 CloudWatch Log Groups without retention"
    category: "infinite-retention"
    detected: "2025-03-01"
    owner: "infrastructure-team"
    estimated_annual_impact_eur: 800
    status: "paydown"
    paydown_plan: "Terraform config for all 12 log groups - Sprint 2025-03-2"
    target_resolution: "2025-04-01"
    related_controls: [WAF-COST-040, WAF-COST-070]
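
Once the register is populated, two numbers are useful in reviews: the total estimated annual impact and the number of entries that still lack a confirmed owner. The following sketch derives both with standard text tools. It assumes the flat layout shown above (one `key: value` pair per line) and is not a general YAML parser; the function names are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: sum all estimated_annual_impact_eur values in a register file.
register_total_eur() {
  awk '$1 == "estimated_annual_impact_eur:" { total += $2 } END { print total + 0 }' "$1"
}

# Hypothetical helper: count entries whose owner field still starts with "TBD".
register_unowned() {
  grep -cE '^[[:space:]]*owner:[[:space:]]*"?TBD' "$1"
}
```

For the two entries above, `register_total_eur docs/cost-debt-register.yml` would print `4400` and `register_unowned docs/cost-debt-register.yml` would print `1`.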

Brownfield-Specific Challenges

Political Coordination

Brownfield optimization touches teams whose infrastructure has grown over years.

Recommendations:

  • Make costs visible, not accusatory: showback before chargeback

  • Quick wins first: small successes build trust

  • Clarify ownership: it must be clear who pays and who decides

  • Architecture Board as a neutral authority: when cost debt register decisions come from the Architecture Board, the board carries the political burden instead of individual teams

Legacy Resources Without a Clear Owner

#!/bin/bash
# Identify resources without a clear owner
aws ec2 describe-instances \
  --query 'Reservations[].Instances[?!Tags[?Key==`owner`]].{
    InstanceId: InstanceId,
    LaunchTime: LaunchTime,
    InstanceType: InstanceType,
    Name: Tags[?Key==`Name`].Value|[0]
  }' \
  --output table

For resources without an owner, identify the last modifier in CloudTrail and assign the Architecture Board as default owner until ownership is clarified.
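
The CloudTrail lookup can be sketched with `aws cloudtrail lookup-events`, which searches management events from the last 90 days and returns the newest events first. The wrapper name `last_modifiers` and the limit of five events are illustrative assumptions. Older activity would have to be searched in an archived trail, if one exists.

```bash
#!/bin/bash
# Hypothetical helper: list the most recent CloudTrail events that reference
# a resource, e.g. an EC2 instance ID, to find out who last touched it.
last_modifiers() {
  local resource_id="$1"
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=ResourceName,AttributeValue=${resource_id}" \
    --max-items 5 \
    --query 'Events[].{Time:EventTime, User:Username, Event:EventName}' \
    --output table
}
```

Calling `last_modifiers <instance-id>` for each instance reported by the query above gives a starting point for owner identification. The user shown may be a role session or automation account rather than a person.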

Metrics (Brownfield-Specific)

  • Untagged cost rate: share of total cost without mandatory tags (target: reduce by 5% per month until the rate is below 5%)

  • Waste reduction: EUR/month vs. discovery baseline (target: -20% in 6 months)

  • Lifecycle coverage: % of S3 buckets with lifecycle policy (target: 100% in 3 months)

  • Cost debt register completeness: all discovery findings recorded with owner (target: 100% in 2 months)
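
The first two metrics reduce to simple ratios over the figures produced by the baseline script. The sketch below shows the arithmetic. The function names `pct` and `pct_change` are hypothetical, and the inputs are assumed to be plain numbers in the same currency.

```bash
#!/bin/bash
# Hypothetical helper: share of "part" in "total" as a percentage, one decimal.
# Example use: untagged cost rate = pct <untagged cost> <total cost>
pct() {
  awk -v part="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * part / total
  }'
}

# Hypothetical helper: change of "current" versus "baseline" as a percentage.
# Negative values are reductions, e.g. waste reduction vs. the discovery baseline.
pct_change() {
  awk -v base="$1" -v cur="$2" 'BEGIN {
    if (base <= 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * (cur - base) / base
  }'
}
```

With illustrative numbers, `pct 1250 10000` prints `12.5` (12.5% of costs untagged) and `pct_change 5000 4000` prints `-20.0`, which would meet the waste reduction target of -20%.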