Best Practice: Brownfield Cost Optimization
Context
Brownfield cost optimization is the most challenging discipline of the cost pillar. Existing infrastructure has dependencies, political history, and often a debt load that has built up over years. At the same time, it offers the greatest quick-win potential: idle instances, missing lifecycle policies, and unused reservations are often addressable with minimal effort.
This guide structures brownfield optimization into three phases: Discovery → Quick Wins → Structural Improvements.
Related Controls
All WAF-COST controls are relevant; prioritized order for brownfield: WAF-COST-010 → WAF-COST-020 → WAF-COST-040 → WAF-COST-030 → WAF-COST-100 → WAF-COST-050
Phase 1: Discovery (Weeks 1–4)
Step 1.1: Create Cost Baseline
Before optimizing, the current state must be clear:
#!/bin/bash
# scripts/cost-baseline.sh – AWS Cost Baseline
echo "=== Cost Baseline Report ==="
echo ""
# Total costs for the last 3 months
echo "--- Total Costs (last 3 months) ---"
aws ce get-cost-and-usage \
--time-period Start=$(date -d '3 months ago' +%Y-%m-01),End=$(date +%Y-%m-01) \
--granularity MONTHLY \
--metrics "BlendedCost" \
--query 'ResultsByTime[].{Month:TimePeriod.Start, Cost:Total.BlendedCost.Amount}' \
--output table
echo ""
echo "--- Top-10 Cost Drivers (Service) ---"
aws ce get-cost-and-usage \
--time-period Start=$(date -d '1 month ago' +%Y-%m-01),End=$(date +%Y-%m-01) \
--granularity MONTHLY \
--metrics "BlendedCost" \
--group-by Type=DIMENSION,Key=SERVICE \
--query 'sort_by(ResultsByTime[0].Groups, &to_number(Metrics.BlendedCost.Amount))[-10:] | reverse(@)[].{Service:Keys[0], Cost:Metrics.BlendedCost.Amount}' \
--output table
echo ""
echo "--- Untagged Costs ---"
aws ce get-cost-and-usage \
--time-period Start=$(date -d '1 month ago' +%Y-%m-01),End=$(date +%Y-%m-01) \
--granularity MONTHLY \
--metrics "BlendedCost" \
--filter '{"Tags":{"Key":"workload","Values":[""],"MatchOptions":["ABSENT"]}}' \
--query 'ResultsByTime[0].Total.BlendedCost.Amount' \
--output text
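The baseline report yields a total amount and an untagged amount, and the ratio of the two is the untagged cost rate tracked later in this guide. A minimal sketch of that arithmetic follows; the function name `untagged_share` and the sample figures are hypothetical and not part of the original scripts.

```shell
# Sketch: share of untagged spend in percent.
# Both amounts are assumed to be plain decimal numbers in the same currency,
# e.g. copied from the baseline report output.
untagged_share() {
  # $1 = total cost, $2 = untagged cost
  awk -v total="$1" -v untagged="$2" 'BEGIN {
    if (total <= 0) { print "n/a"; exit }
    printf "%.1f\n", untagged / total * 100
  }'
}

# Hypothetical figures for illustration only
untagged_share 12500.00 1875.00   # prints 15.0
```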
Step 1.2: Waste Discovery
#!/bin/bash
# scripts/waste-discovery.sh – Identify common waste sources
echo "=== S3 Buckets Without Lifecycle Policy ==="
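aws s3api list-buckets --query "Buckets[].Name" --output text | \
tr '\t' '\n' | while read -r BUCKET; do
  # The call fails for buckets without a lifecycle configuration, leaving LIFECYCLE empty
  LIFECYCLE=$(aws s3api get-bucket-lifecycle-configuration --bucket "$BUCKET" 2>/dev/null)
  if [ -z "$LIFECYCLE" ]; then
    # JSON output makes the CLI aggregate all pages before applying the query;
    # with text output the sum would be printed once per page
    SIZE=$(aws s3api list-objects-v2 --bucket "$BUCKET" \
      --query "sum(Contents[].Size)" --output json 2>/dev/null || echo "0")
    echo "NO LIFECYCLE: $BUCKET (${SIZE} bytes)"
  fi
done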
echo ""
echo "=== CloudWatch Log Groups Without Retention ==="
aws logs describe-log-groups \
--query 'logGroups[?retentionInDays==`null`].{Name:logGroupName, StoredBytes:storedBytes}' \
--output table
echo ""
echo "=== Unused Elastic IPs ==="
aws ec2 describe-addresses \
--query 'Addresses[?AssociationId==`null`].{AllocationId:AllocationId, PublicIp:PublicIp}' \
--output table
echo ""
echo "=== Unused EBS Volumes ==="
aws ec2 describe-volumes \
--filters Name=status,Values=available \
--query 'Volumes[].{VolumeId:VolumeId, Size:Size, CreateTime:CreateTime}' \
--output table
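To prioritize the findings, it can help to turn the raw numbers into a rough monthly amount. The following is only a sketch: `estimate_waste` is a hypothetical helper, and the unit prices are placeholders supplied by the caller, not AWS list prices. Actual rates depend on region and volume type.

```shell
# Sketch: rough monthly cost of unattached EBS volumes and idle Elastic IPs.
# The unit prices are ASSUMPTIONS passed in by the caller, not AWS list prices;
# look up the current rates for your region and volume type.
estimate_waste() {
  # $1 = total unattached EBS size in GiB, $2 = price per GiB-month
  # $3 = number of idle Elastic IPs,      $4 = price per IP-hour
  awk -v gib="$1" -v gib_price="$2" -v eips="$3" -v eip_price="$4" 'BEGIN {
    ebs = gib * gib_price
    eip = eips * eip_price * 730   # roughly 730 hours per month
    printf "EBS: %.2f  EIP: %.2f  Total: %.2f\n", ebs, eip, ebs + eip
  }'
}

# Hypothetical example: 500 GiB unattached, 4 idle EIPs, placeholder prices
estimate_waste 500 0.10 4 0.005
```

The total volume size can be summed from the Size column of the unused EBS volume listing, and the EIP count is the number of rows in the Elastic IP listing.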
Step 1.3: Tagging Audit
#!/bin/bash
# scripts/tagging-audit.sh – Check tagging compliance
# Mandatory tags as a JSON array so that jq can consume them directly
MANDATORY_TAGS='["cost-center","owner","environment","workload"]'
# Check EC2 instances
echo "=== EC2 Tagging Compliance ==="
aws ec2 describe-instances \
--query 'Reservations[].Instances[]' \
--output json | \
jq -r --argjson required "$MANDATORY_TAGS" '.[] |
{id: .InstanceId, tags: ([.Tags[]? | {(.Key): .Value}] | add // {})} |
($required - (.tags | keys)) as $missing |
select($missing | length > 0) |
"NON-COMPLIANT: \(.id) (missing: \($missing | join(", ")))"'
Phase 2: Quick Wins (Months 1–3)
Quick wins are measures with high ROI and low implementation effort. They create immediate savings and momentum for the structural improvements.
Quick-Win Prioritization Matrix
| Category | Measure | Effort | Typical Saving | Control |
|---|---|---|---|---|
| Idle Shutdown | Identify and shut down dev/test instances | Low | 200–2,000 EUR/month | WAF-COST-030 |
| Set Log Retention | Configure CloudWatch Log Groups with retention_in_days != 0 | Low | 50–500 EUR/month | WAF-COST-040, WAF-COST-070 |
| S3 Lifecycle | Lifecycle policies for the top 10 largest buckets | Low–Medium | 100–1,000 EUR/month | WAF-COST-040 |
| Unused Resources | EIP, unattached EBS, unused load balancers | Low | 50–500 EUR/month | WAF-COST-030 |
| Non-Prod Auto-Shutdown | Schedule for dev/test: Mon–Fri, 08:00–20:00 | Medium | 500–3,000 EUR/month | WAF-COST-030 |
| Check RI Usage | Identify unused reserved instances, reassign | Medium | Variable, often significant | WAF-COST-080 |
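The saving from the non-prod auto-shutdown row follows directly from the schedule: instances that run Monday to Friday for 12 hours a day are up 60 of the 168 hours in a week. A small sketch of that calculation, with `schedule_saving_pct` as a hypothetical helper name:

```shell
# Sketch: percentage of instance hours saved by a weekly on-schedule.
schedule_saving_pct() {
  # $1 = days per week the instances run, $2 = hours per day they run
  awk -v days="$1" -v hours="$2" 'BEGIN {
    on = days * hours            # scheduled running hours per week
    printf "%.1f\n", (168 - on) / 168 * 100
  }'
}

# Mon-Fri, 08:00-20:00: 5 days x 12 hours
schedule_saving_pct 5 12   # prints 64.3
```

The percentage applies to compute hours only. Attached storage and other resources that keep existing while an instance is stopped continue to incur cost.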
Set Up Budget Alerting Immediately (WAF-COST-020)
# Immediate action: budget alert for all accounts
resource "aws_budgets_budget" "monthly_total" {
  name         = "monthly-total-budget"
  budget_type  = "COST"
  limit_amount = var.monthly_budget_limit
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = [var.finops_team_email]
  }

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 100
    threshold_type             = "PERCENTAGE"
    notification_type          = "ACTUAL"
    subscriber_email_addresses = [var.finops_team_email, var.engineering_manager_email]
  }
}
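To sanity-check the thresholds before rollout, the notification logic above can be replayed against known spend figures. This is an illustrative sketch only: `budget_alerts` is a hypothetical helper that mimics the two GREATER_THAN percentage rules and does not call AWS.

```shell
# Sketch: which of the two notification thresholds (80% and 100%) would
# fire for a given actual spend. Mirrors comparison_operator GREATER_THAN.
budget_alerts() {
  # $1 = monthly budget limit, $2 = actual spend so far (same currency)
  awk -v limit="$1" -v actual="$2" 'BEGIN {
    if (limit <= 0) { print "invalid limit"; exit }
    pct = actual / limit * 100
    if (pct > 80)  print "80% threshold exceeded"
    if (pct > 100) print "100% threshold exceeded"
    if (pct <= 80) print "no alert"
  }'
}

# Hypothetical figures: limit 10000, actual spend 8500
budget_alerts 10000 8500
```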
Phase 3: Structural Improvements (Months 4–18)
Structural improvements address architectural cost debt and require more coordination, architecture board involvement, and planning effort.
Structural Measures
| Measure | Description | Time Horizon | Control |
|---|---|---|---|
| Fully enforce tagging | Introduce mandatory tag module, activate CI gate, backfill legacy resources | 3–6 months | WAF-COST-010 |
| FinOps review cycle | Establish monthly and quarterly reviews, action item tracker | 1–2 months | WAF-COST-060 |
| Cost debt register | Document existing debt, assign owners, prioritize paydown | 1–3 months | WAF-COST-100 |
| Extend ADR process | Add cost impact section to ADR template, retroactively for critical decisions | 2–4 months | WAF-COST-050 |
| Reservation restructuring | Analyze RI portfolio, restructure misaligned RIs, introduce savings plans | 3–6 months | WAF-COST-080 |
| HA review | Review multi-AZ/multi-region services based on SLO, record oversized HA as cost debt | 6–12 months | WAF-COST-050, WAF-COST-100 |
Initialize Cost Debt Register for Brownfield
# docs/cost-debt-register.yml – Brownfield start
version: "1.0"
last_reviewed: "2025-03-01"
status: "initial-population"
note: >
  Initial population from Brownfield Discovery Phase (Jan–Mar 2025).
  All entries need owner confirmation by 2025-04-01.
entries:
  - id: CD-2025-INIT-001
    title: "DISCOVERY: S3 buckets without lifecycle (23 buckets, ~2 TB)"
    category: "infinite-retention"
    description: "Discovery phase identified 23 S3 buckets without lifecycle policy."
    detected: "2025-03-01"
    owner: "TBD - owner identification in progress"
    estimated_annual_impact_eur: 3600
    status: "monitoring"
    paydown_plan: "Lifecycle policies for all 23 buckets by Q2 2025"
    target_resolution: "2025-06-30"
    related_controls: [WAF-COST-040]
  - id: CD-2025-INIT-002
    title: "DISCOVERY: 12 CloudWatch Log Groups without retention"
    category: "infinite-retention"
    detected: "2025-03-01"
    owner: "infrastructure-team"
    estimated_annual_impact_eur: 800
    status: "paydown"
    paydown_plan: "Terraform config for all 12 log groups - Sprint 2025-03-2"
    target_resolution: "2025-04-01"
    related_controls: [WAF-COST-040, WAF-COST-070]
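Once the register is populated, a quick total of the estimated impact helps with prioritizing paydown. The sketch below assumes each entry carries a single `estimated_annual_impact_eur: <number>` line, as in the register above. It matches lines and is not a full YAML parser, and `total_debt_impact` is a hypothetical helper name.

```shell
# Sketch: total the estimated annual impact across all register entries.
# Assumes one "estimated_annual_impact_eur: <number>" line per entry;
# this is a simple line match, not a YAML parser.
total_debt_impact() {
  # $1 = path to the register file
  awk '$1 == "estimated_annual_impact_eur:" { sum += $2 } END { print sum + 0 }' "$1"
}

# Demo with a throwaway file holding the two example entries
tmp=$(mktemp)
printf '%s\n' \
  '- id: CD-2025-INIT-001' '  estimated_annual_impact_eur: 3600' \
  '- id: CD-2025-INIT-002' '  estimated_annual_impact_eur: 800' > "$tmp"
total_debt_impact "$tmp"   # prints 4400
rm -f "$tmp"
```

Against the real file the call would be `total_debt_impact docs/cost-debt-register.yml`.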
Brownfield-Specific Challenges
Political Coordination
Brownfield optimization touches teams whose infrastructure has grown over years.
Recommendations:
- Make costs visible, not accusatory: showback before chargeback
- Quick wins first: small successes build trust
- Clarify ownership: who pays and who decides must be clear
- Architecture Board as neutral authority: when cost debt register decisions come from the AB, the AB carries the political burden instead of individual teams
Legacy Resources Without a Clear Owner
#!/bin/bash
# Identify resources without a clear owner
aws ec2 describe-instances \
--query 'Reservations[].Instances[?!Tags[?Key==`owner`]].{
InstanceId: InstanceId,
LaunchTime: LaunchTime,
InstanceType: InstanceType,
Name: Tags[?Key==`Name`].Value|[0]
}' \
--output table
For resources without an owner, identify the last modifier in CloudTrail and assign the Architecture Board as default owner until ownership is clarified.
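The CloudTrail lookup can be sketched with `aws cloudtrail lookup-events`. The wrapper name `last_modifiers` is hypothetical. Note that CloudTrail event history only covers the last 90 days of management events, so long-lived legacy resources may return nothing unless a trail delivers older logs to S3.

```shell
# Sketch: list recent CloudTrail events that reference a resource.
# Event history only covers the last 90 days of management events,
# so older resources may return no results.
last_modifiers() {
  # $1 = resource ID, e.g. an instance ID from the ownerless-instance listing
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=ResourceName,AttributeValue=$1" \
    --max-results 10 \
    --query 'Events[].{Time:EventTime, User:Username, Event:EventName}' \
    --output table
}
```

A call such as `last_modifiers i-0123456789abcdef0` (placeholder ID) prints the time, user, and event name of matching events, which is usually enough to find someone to ask about ownership.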
Metrics (Brownfield-Specific)
- Untagged cost rate: target monthly -5% until goal of < 5% is reached
- Waste reduction: EUR/month vs. discovery baseline (target: -20% in 6 months)
- Lifecycle coverage: % of S3 buckets with lifecycle policy (target: 100% in 3 months)
- Cost debt register completeness: all discovery findings recorded with owner (target: 100% in 2 months)
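The waste reduction metric compares current waste against the discovery baseline. A minimal sketch of the calculation follows, with `waste_reduction_pct` as a hypothetical helper and made-up figures:

```shell
# Sketch: waste reduction in percent relative to the discovery baseline.
waste_reduction_pct() {
  # $1 = baseline waste, $2 = current waste (same unit, e.g. EUR/month)
  awk -v base="$1" -v cur="$2" 'BEGIN {
    if (base <= 0) { print "n/a"; exit }
    printf "%.1f\n", (base - cur) / base * 100
  }'
}

# Hypothetical figures: 4000 EUR/month at discovery, 3100 EUR/month now
waste_reduction_pct 4000 3100   # prints 22.5
```

With these example numbers the 20% target would be met. A negative result means waste has grown since the baseline.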