Cost Optimization Principles
The following seven principles form the foundation of the Cost pillar in WAF++. They are formulated in a provider- and technology-agnostic way and apply to both brownfield and greenfield scenarios.
CP1 – Transparency First
Costs that are not visible cannot be optimized.
Every cloud resource must be clearly attributed to a workload, a team and an environment. Cost transparency is not optional – it is the fundamental prerequisite for all other cost controls.
Transparency means concretely:
- Tagging taxonomy defined and enforced via IaC (no deployment without mandatory tags)
- Cost allocation groups configured in cloud provider billing tools
- Chargeback or showback model established for internal cost distribution
- Cost anomalies are detected automatically – not discovered manually in the monthly report
Implication: "We see our total bill" is not transparency. A fully tagged, workload-segmented cost dashboard with alert thresholds is transparency.
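Automatic anomaly detection can start simple. The following is a minimal sketch, assuming a list of daily costs for one workload. It flags any day whose spend exceeds the mean of the preceding days by more than k standard deviations (a trailing z-score rule). The 14-day window and k = 3 are illustrative assumptions, not recommended values. Managed anomaly detection features in provider billing tools may use more sophisticated models.

```python
from statistics import mean, stdev


def detect_cost_anomalies(daily_costs, window=14, k=3.0):
    """Return indices of days whose cost exceeds mean + k * stdev of the
    preceding `window` days. `window` and `k` are illustrative defaults."""
    if window < 2:
        raise ValueError("window must contain at least two days")
    anomalies = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        threshold = mean(history) + k * stdev(history)
        # With a perfectly flat history stdev is 0, so any increase is flagged.
        if daily_costs[i] > threshold:
            anomalies.append(i)
    return anomalies


costs = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0,
         100.0, 101.0, 99.0, 100.0, 102.0, 98.0, 250.0, 101.0]
print(detect_cost_anomalies(costs))  # → [14]
```

A spike also inflates the threshold for the following days. This is acceptable for alerting, but it is one reason the flagged days should be reviewed rather than auto-remediated.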
Related controls: WAF-COST-010, WAF-COST-020
CP2 – Ownership
Every resource and every workload has a clear cost owner.
Without ownership, cost optimizations do not get implemented, because it is never clear who is responsible for acting on them. Cost responsibility must lie where architecture decisions are made: with the engineering team.
Ownership means concretely:
- The owner tag on every resource names a concrete team (not a department)
- Cost owners take part in FinOps review cycles and receive budget alerts
- Budget overruns trigger a direct escalation to the owning team
- Ownership is integrated into the onboarding process for new services
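Ownership only becomes actionable when costs and overruns can be traced to a team. The following minimal sketch sums cost line items per value of the owner tag (a showback view) and lists the teams whose spend exceeds their budget. That list is the input for the escalation step. The record shape, the lowercase owner tag key and the budget mapping are hypothetical. A real implementation would read the provider's cost export.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CostLineItem:
    resource_id: str
    tags: dict
    cost: float  # cost in the billing currency for the period


UNOWNED = "<unowned>"  # bucket for spend without an owner tag


def cost_by_owner(items):
    """Sum costs per value of the (assumed) `owner` tag."""
    totals = defaultdict(float)
    for item in items:
        totals[item.tags.get("owner", UNOWNED)] += item.cost
    return dict(totals)


def budget_escalations(items, budgets):
    """Return (owner, spend, budget) for every owner whose spend exceeds budget.

    Owners without a budget entry, including untagged spend, are always
    escalated, because nobody has accepted responsibility for that cost.
    """
    escalations = []
    for owner, spend in sorted(cost_by_owner(items).items()):
        budget = budgets.get(owner)
        if budget is None or spend > budget:
            escalations.append((owner, spend, budget))
    return escalations


items = [
    CostLineItem("i-1", {"owner": "team-payments"}, 1200.0),
    CostLineItem("i-2", {"owner": "team-search"}, 300.0),
    CostLineItem("vol-9", {}, 50.0),
]
print(budget_escalations(items, {"team-payments": 1000.0, "team-search": 500.0}))
# → [('<unowned>', 50.0, None), ('team-payments', 1200.0, 1000.0)]
```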
Implication: "The FinOps team is responsible for costs" is an anti-pattern. FinOps supports engineering teams – cost responsibility remains with the team.
Related controls: WAF-COST-010, WAF-COST-060
CP3 – Cost-Aware Architecture
Architecture decisions have long-term economic impacts. These must be assessed.
Every decision for an infrastructure component, a managed service or a deployment pattern brings a cost structure with it: fixed costs, variable costs, transfer costs, operational costs and exit costs. These do not arise by chance; they are the result of architectural decisions.
When these economic impacts are not assessed, Architectural Cost Debt accumulates: cost structures that become embedded in the architecture and can later only be changed with significant effort.
Cost-Aware Architecture means concretely:
- Every ADR with infrastructure impact includes a structured cost impact assessment
- TCO, lock-in risk, data transfer costs, operational effort and exit costs are explicitly assessed
- HA and multi-region decisions are based on SLOs, not on hypothetical scenarios
- Open source alternatives are evaluated on equal footing
- Known cost debts are documented in the cost debt register
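A structured cost impact assessment is easiest to enforce when it has a fixed shape. The following minimal sketch models the assessed dimensions as a record and reports which of them an ADR has left empty. The field names, the 1 to 5 lock-in scale and the idea of running this as a review check are assumptions for illustration. They are not a WAF++ schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CostImpactAssessment:
    """Hypothetical cost section of an ADR; None means 'not assessed'."""
    tco_3y: Optional[float] = None                 # total cost of ownership, 3 years
    lock_in_risk: Optional[int] = None             # assumed scale: 1 (low) to 5 (high)
    data_transfer_monthly: Optional[float] = None
    ops_effort_fte: Optional[float] = None         # ongoing operational effort in FTE
    exit_cost: Optional[float] = None              # estimated one-off migration cost
    open_source_alternative_evaluated: Optional[bool] = None


def validate(assessment):
    """Return review findings; an empty list means the assessment is complete."""
    findings = [f"missing: {f.name}" for f in fields(assessment)
                if getattr(assessment, f.name) is None]
    if assessment.lock_in_risk is not None and not 1 <= assessment.lock_in_risk <= 5:
        findings.append("lock_in_risk must be between 1 and 5")
    return findings


adr = CostImpactAssessment(tco_3y=180000.0, lock_in_risk=4, ops_effort_fte=0.5)
print(validate(adr))
# → ['missing: data_transfer_monthly', 'missing: exit_cost',
#    'missing: open_source_alternative_evaluated']
```

A value of 0.0 counts as assessed. This lets an ADR record "no transfer costs" explicitly instead of leaving the field blank.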
Implication: A decision for a high-priced managed service without an exit plan is not an architecture decision – it is cost debt that someone will have to pay later.
Related controls: WAF-COST-050, WAF-COST-100
Further details: Architectural Cost Debt
CP4 – Continuous Optimization
Costs are not a one-time optimization project. They are continuously reviewed and reduced.
Cloud infrastructure changes constantly: new services, growing data volumes, changed usage patterns. What is optimally sized today may be over-provisioned in six months.
Continuous Optimization means concretely:
- Monthly engineering reviews with concrete optimization actions and owners
- Quarterly architecture board reviews to assess structural cost drivers
- Compute resources carry a rightsizing tag with the date of the last review
- Cost debt register reviewed and updated quarterly
- Automated idle detection and rightsizing recommendations as input for reviews
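Automated findings give the monthly review a concrete agenda. The following minimal sketch builds a review queue from two signals: a rightsizing review date that is older than the review cycle, and a CPU peak that never exceeded a low threshold (a deliberately crude idle heuristic). The tag key rightsizing-reviewed is the one named in CP6. The ISO date format, the 31-day cycle and the 5% threshold are assumptions. Real utilization data would come from the provider's monitoring service, and CPU alone can misclassify memory- or I/O-bound workloads.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ComputeResource:
    resource_id: str
    tags: dict
    cpu_samples: list  # CPU utilization samples (percent) for the review window


def review_findings(resources, today, cycle_days=31, idle_peak_pct=5.0):
    """Return (resource_id, reason) pairs for the monthly review agenda."""
    findings = []
    for res in resources:
        reviewed = res.tags.get("rightsizing-reviewed")  # assumed ISO date string
        if reviewed is None:
            findings.append((res.resource_id, "never reviewed"))
        elif today - date.fromisoformat(reviewed) > timedelta(days=cycle_days):
            findings.append((res.resource_id, f"last review {reviewed}"))
        # Idle heuristic: CPU never exceeded the threshold in the window.
        if res.cpu_samples and max(res.cpu_samples) < idle_peak_pct:
            findings.append((res.resource_id, "idle candidate"))
    return findings


resources = [
    ComputeResource("vm-a", {"rightsizing-reviewed": "2024-01-10"}, [40.0, 55.0, 62.0]),
    ComputeResource("vm-b", {"rightsizing-reviewed": "2024-03-01"}, [1.0, 2.0, 3.0]),
    ComputeResource("vm-c", {}, [20.0]),
]
print(review_findings(resources, today=date(2024, 3, 15)))
# → [('vm-a', 'last review 2024-01-10'), ('vm-b', 'idle candidate'), ('vm-c', 'never reviewed')]
```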
Implication: A one-time rightsizing project before annual planning is not a process. A monthly review cycle with an action item tracker is a process.
Related controls: WAF-COST-030, WAF-COST-060, WAF-COST-100
CP5 – Automation First
Budget controls, alerts and optimization actions are automated – not manual.
Manual cost control is error-prone and does not scale. Budgets set manually in the cloud console UI are not reproducible, not versioned and not auditable.
Automation First means concretely:
- All budget definitions are IaC-managed (Terraform aws_budgets_budget on AWS, azurerm_consumption_budget_* on Azure, google_billing_budget on GCP)
- Alerts are automatically routed to owner channels (Slack, email, PagerDuty)
- CI gates check tagging compliance on every pull request
- Lifecycle policies for storage and logs are automated; no manual archiving
- Idle resources are automatically identified and proposed for shutdown
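A tagging CI gate can run against the machine-readable plan that terraform show -json produces. The following minimal sketch reads that JSON and considers resource changes whose actions include create or update. It fails when mandatory tags are missing. Several details are assumptions: the mandatory tag set, the script name check_tags.py, and the restriction to resource types with a tags or tags_all attribute. That restriction means AWS-style tagging; GCP resources use labels instead. Tags inherited from provider-level defaults can be incomplete or unknown at plan time, so a production gate needs to handle that case.

```python
import json
import sys

# Assumed taxonomy derived from CP1/CP2: workload, owning team, environment.
MANDATORY_TAGS = {"owner", "workload", "environment"}


def tag_violations(plan):
    """Return (address, missing_tags) for created/updated resources in a
    `terraform show -json` plan document that expose a tags attribute."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        after = change.get("after")
        if not {"create", "update"} & set(change.get("actions", [])):
            continue  # skip no-op, read and pure delete changes
        if not isinstance(after, dict) or ("tags" not in after and "tags_all" not in after):
            continue  # resource type without tags: out of scope for this sketch
        tags = after.get("tags_all") or after.get("tags") or {}
        missing = sorted(MANDATORY_TAGS - set(tags))
        if missing:
            violations.append((rc["address"], missing))
    return violations


def main(path):
    with open(path) as f:
        violations = tag_violations(json.load(f))
    for address, missing in violations:
        print(f"{address}: missing tags {', '.join(missing)}")
    return 1 if violations else 0  # non-zero exit code fails the CI job


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

In a pipeline this would run after terraform plan -out=plan.out and terraform show -json plan.out > plan.json, as python check_tags.py plan.json.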
Implication: A budget that is only visible in the billing dashboard provides no operational control. A budget as a Terraform resource with alert notification and automatic ticket creation is control.
Related controls: WAF-COST-020, WAF-COST-040, WAF-COST-070
CP6 – Right-Size, Not Over-Size
Resources are sized according to actual demand – not hypothetical peak scenarios.
Over-provisioning is the most common and costly form of cloud waste. Systems dimensioned for 10x growth that never materialized pay the price of that decision every month.
Right-Size, Not Over-Size means concretely:
- Sizing decisions are based on measured utilization (P95/P99), not estimates
- SLO/SLA requirements drive HA and redundancy decisions, not general caution
- Reservations are made only when utilization is at least 70% over 30 days, not as a default
- Spot/Preemptible Instances for interruption-tolerant, variable workloads; on-demand only for unpredictable peaks
- Rightsizing reviews are documented and traceable (tag rightsizing-reviewed with date)
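The two quantitative rules in this list can be expressed directly. The following minimal sketch derives a vCPU count from measured P95 utilization plus headroom. It recommends a reservation only if the resource ran at least 70% of the time over the last 30 days. Several choices are assumptions for illustration: the 20% headroom, the nearest-rank percentile method and the available sizes. "Utilization" is also interpreted here as the share of hours in use.

```python
import math


def p95(samples):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def recommend_vcpus(current_vcpus, cpu_util_pct, sizes=(2, 4, 8, 16, 32), headroom=0.2):
    """Smallest available size that covers P95 demand plus headroom."""
    needed = current_vcpus * p95(cpu_util_pct) / 100 * (1 + headroom)
    for size in sorted(sizes):
        if size >= needed:
            return size
    return max(sizes)


def reservation_recommended(hourly_usage, hours=30 * 24, threshold=0.70):
    """True if average hourly usage (0.0 = off, 1.0 = running) over the
    last `hours` hours is at least `threshold`."""
    window = hourly_usage[-hours:]
    if len(window) < hours:
        return False  # not enough history yet: stay on-demand
    return sum(window) / len(window) >= threshold


print(recommend_vcpus(16, [30.0] * 10))      # → 8 (16 * 0.30 * 1.2 = 5.76 vCPUs needed)
print(reservation_recommended([1.0] * 720))  # → True
```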
Implication: HA across three Availability Zones without an SLO requirement that demands more than one AZ is not a resilience investment – it is Architectural Cost Debt.
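A simple model can estimate whether an SLO actually demands more than one Availability Zone. Assume zone failures are independent and each zone offers availability a. A deployment that stays up while at least one zone is up then has availability 1 - (1 - a)^n. The following minimal sketch returns the smallest n that meets a target. Independence and the 99.9% zone figure in the usage lines are simplifying assumptions. Correlated failures, regional outages and application-level faults are ignored, so treat the result as an optimistic lower bound for discussion, not a design verdict.

```python
def availability(zone_availability, zones):
    """Availability of a service that stays up while at least one zone is up,
    assuming independent zone failures."""
    return 1 - (1 - zone_availability) ** zones


def min_zones_for_slo(zone_availability, slo, max_zones=3):
    """Smallest zone count whose modeled availability meets the SLO, or None."""
    for n in range(1, max_zones + 1):
        if availability(zone_availability, n) >= slo:
            return n
    return None


print(min_zones_for_slo(0.999, 0.995))   # → 1
print(min_zones_for_slo(0.999, 0.9999))  # → 2
```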
Related controls: WAF-COST-030, WAF-COST-080
CP7 – Full Cost View
The total cost of a workload includes infrastructure, licenses, operational effort, skills and exit costs.
Cloud bills show only a fraction of actual costs. Operational effort (in FTE hours), license costs, vendor management, training effort and potential exit costs are systematically invisible in many cost comparisons.
Full Cost View means concretely:
- TCO calculations include: infrastructure + licenses + FTE effort (ops + engineering) + vendor management + exit costs
- ROI assessments refer to the value of the workload (revenue, risk reduction, compliance), not just infrastructure costs
- Multi-cloud scenarios are assessed on actual total costs: data transfer between providers, duplicated operational competence, enterprise agreement losses
- Open source alternatives are assessed on total costs: license savings vs. operational effort, support costs, missing managed service convenience
- Lock-in costs are recorded as hidden liabilities: the higher the lock-in, the higher the notional exit costs
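The TCO formula in the first item can be made explicit. The following minimal sketch sums the listed components over a planning horizon and adds the one-off exit cost once. The field names, the fully loaded FTE cost, the three-year horizon and every figure in the usage lines are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TcoInput:
    infrastructure_monthly: float
    licenses_monthly: float
    ops_fte: float                 # FTE spent on operations
    engineering_fte: float         # FTE spent on ongoing engineering/maintenance
    fte_cost_yearly: float         # fully loaded cost of one FTE (assumption)
    vendor_management_yearly: float
    exit_cost: float               # one-off estimated cost to migrate away


def total_cost_of_ownership(t, years=3):
    """TCO = infrastructure + licenses + FTE effort + vendor management + exit costs."""
    run_costs = (t.infrastructure_monthly + t.licenses_monthly) * 12 * years
    people = (t.ops_fte + t.engineering_fte) * t.fte_cost_yearly * years
    vendor = t.vendor_management_yearly * years
    return run_costs + people + vendor + t.exit_cost


managed = TcoInput(10000.0, 0.0, 0.5, 0.5, 120000.0, 5000.0, 80000.0)
self_hosted = TcoInput(6000.0, 0.0, 2.0, 1.0, 120000.0, 0.0, 20000.0)
print(total_cost_of_ownership(managed))      # → 815000.0
print(total_cost_of_ownership(self_hosted))  # → 1316000.0
```

With these invented figures, the self-hosted option has the lower infrastructure bill. Its operational effort outweighs that saving, which is the gap the Full Cost View is meant to expose.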
Implication: "AWS is more expensive than on-premises" is often wrong when you include colocation, power, hardware amortization, staff and outage costs. Equally: "Open source is free" ignores operational effort.
Related controls: WAF-COST-050, WAF-COST-060, WAF-COST-100