WAF++

Performance Principles

The 7 performance principles of WAF++ are guidelines for technical decisions. They describe the why behind the controls and enable informed judgments in situations that no control directly addresses.

PP1 – Measure First

Tagline: Without measurement, optimization is guesswork.

Explanation

Performance optimization without measurement is one of the most common forms of resource waste in software development. Developers optimize code paths that were never measured and miss the actual bottlenecks. The first principle of WAF++ is: measure before optimizing.

Measuring means:

  • Collecting baselines: Document P50, P95, P99 latency, throughput, and error rate before every optimization

  • Identifying bottlenecks: Which layer is the actual constraint? CPU? Network? Database? Cache?

  • Defining success: What result counts as an improvement? Without a target, no progress can be measured

  • Continuous measurement: Collect metrics permanently, not only during incidents

Concrete implications

  • Before every sizing decision: collect 2 weeks of utilization data

  • Before every caching implementation: measure cache miss rate and query frequency

  • Before every index optimization: run EXPLAIN ANALYZE on the actual queries

  • Before every auto-scaling tuning: measure the load profile and derive scaling thresholds

Related controls:

  • WAF-PERF-050 – Performance Monitoring & SLO Definition

  • WAF-PERF-040 – Database Performance Baseline & Index Strategy
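The baseline metrics named above (P50, P95, P99) can be computed directly from raw per-request latency samples. A minimal sketch in Python, using only the standard library (`latency_baseline` is an illustrative helper name, not part of WAF++):

```python
import statistics

def latency_baseline(samples_ms: list[float]) -> dict[str, float]:
    """Compute a P50/P95/P99 latency baseline from raw per-request samples."""
    # quantiles(n=100) returns the 99 cut points P1..P99 (0-indexed list)
    cuts = statistics.quantiles(samples_ms, n=100)
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}

# Example: 1000 synthetic samples, mostly fast with a slow tail
samples = [10.0] * 950 + [200.0] * 50
baseline = latency_baseline(samples)
```

In practice these samples would come from the monitoring system over the two-week window mentioned above, not from a single test run.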


PP2 – Right Technology for the Job

Tagline: The best technology for a use case is not the most powerful – it’s the most suitable.

Explanation

Every technology decision has performance implications. A relational database for a key-value store is just as wrong as a NoSQL document store for complex transactions. The principle: choose the technology that is sufficiently powerful and optimally suited for the specific use case – not the theoretically fastest one.

Concrete implications

  • For variable load: evaluate serverless before classic compute (WAF-PERF-080)

  • For caching: Redis for structured data, CDN for content, in-process for read-heavy lookups

  • For storage: gp3 for general compute, io2 for high-load databases (WAF-PERF-090)

  • For global distribution: evaluate CDN before multi-region deployment (WAF-PERF-070)

Related controls:

  • WAF-PERF-010 – Compute Instance Type & Sizing Validated

  • WAF-PERF-080 – Serverless & Managed Services for Variable Load


PP3 – Scale Horizontally

Tagline: Many identical instances beat a single larger one.

Explanation

Vertical scaling (larger instance) has limits: eventually there is no larger instance. It also creates single points of failure, has downtime during resizing, and provides no redundancy. Horizontal scaling (more instances behind a load balancer) is elastic, has no hard upper limit, and provides natural redundancy.

Concrete implications

  • Stateless architecture is a prerequisite for horizontal scaling

  • Session state MUST be externalized (Redis, DynamoDB) – not kept in-process

  • Database scaling: read replicas for read scaling, sharding for write scaling

  • Stateful services (databases, message queues): use provider-level auto-scaling


PP4 – Cache Strategically

Tagline: Don’t cache everything – but cache everything that repeats.

Explanation

Caching is one of the most effective performance optimizations when applied correctly. Applied incorrectly, it creates hard-to-diagnose stale data problems. The principle: cache consciously and with documentation – with clear rules for candidates, TTLs, and invalidation.

Caching candidates:

  • Always cache: Static assets, immutable configuration, external API responses where staleness is tolerable

  • Cache with short TTL: Aggregated metrics, product catalogs, price lists

  • Do not cache: User-specific transaction data, real-time information, security decisions

Concrete implications

  • Caching strategy MUST be documented with data type, layer, TTL, and invalidation logic

  • Cache hit rate MUST be measured; target: >= 80% application cache, >= 95% CDN

  • Cache invalidation on data mutations is mandatory – no "we just won’t cache it"

  • Thundering herd protection for cache expiry events on high-frequency keys
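The rules above (explicit TTL, mandatory invalidation on mutation, measured hit rate) can be illustrated with a minimal TTL cache. This is a sketch, not a production cache; `TTLCache` and its method names are illustrative:

```python
import time

class TTLCache:
    """Minimal TTL cache: every entry carries a TTL, mutations
    invalidate explicitly, and the hit rate is measurable."""
    def __init__(self) -> None:
        self._store: dict[str, tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self.misses += 1
            return None
        self.hits += 1
        return entry[1]

    def put(self, key: str, value: object, ttl_s: float) -> None:
        self._store[key] = (time.monotonic() + ttl_s, value)

    def invalidate(self, key: str) -> None:
        # Called on every data mutation -- serving stale data is not an option
        self._store.pop(key, None)

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A real deployment would add thundering-herd protection (e.g. per-key locking or request coalescing) around the miss path for high-frequency keys.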


PP5 – Test Under Load

Tagline: An untested scaling path is not a scaling path.

Explanation

Every architecture behaves differently under load than at idle. Connection pools saturate, auto-scaling triggers are delayed, database query plans change with data volume. The principle: no production-critical system goes to production without load test validation.

Load tests reveal:

  • Auto-scaling configuration errors (thresholds too high, cooldowns too long)

  • Connection pool exhaustion under concurrent load

  • Database performance degradation from query plan changes under load

  • Memory leaks and garbage collection pauses under extended load

Concrete implications

  • Load tests are a deployment gate: production deployment without a passed load test is a violation

  • Acceptance criteria MUST be defined before the load test: P95 < X ms, error rate < Y%

  • Baseline for regression comparison MUST be updated after every successful test

  • Stress test (2x, 5x expected load) at least quarterly for critical services
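The deployment-gate rule can be expressed as a simple pass/fail check against acceptance criteria fixed before the test ran. A sketch (function and parameter names are illustrative):

```python
import statistics

def load_test_gate(latencies_ms: list[float], errors: int, requests: int,
                   p95_limit_ms: float, max_error_rate: float) -> bool:
    """Deployment gate: pass only if P95 latency and error rate both
    meet the acceptance criteria defined *before* the load test."""
    p95 = statistics.quantiles(latencies_ms, n=100)[94]  # P95 cut point
    error_rate = errors / requests
    return p95 <= p95_limit_ms and error_rate <= max_error_rate
```

In a CI pipeline this check would run against the load-test report and fail the deployment job when it returns false; the same numbers feed the regression baseline mentioned above.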


PP6 – Automate Scaling

Tagline: Manual capacity management is a 24/7 operational problem.

Explanation

Manual scaling requires human intervention at any time of day or night, is error-prone, slow, and creates response time gaps during unexpected traffic spikes. Automated scaling with validated configurations is not only more efficient – it is more reliable than any person in an on-call rotation.

Concrete implications

  • All stateless production workloads MUST have auto-scaling configured

  • Scaling metrics MUST be based on application behavior (latency, request rate, queue depth)

  • Scaling thresholds MUST be derived from load test data – not intuition

  • Scale-in MUST be more conservative than scale-out (cooldown: scale-out shorter than scale-in)
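The asymmetric cooldown rule (scale out fast, scale in conservatively) can be sketched as a small decision function. The thresholds and cooldowns here are placeholders, not recommendations; real values MUST come from load test data:

```python
from dataclasses import dataclass

@dataclass
class ScalingPolicy:
    """Placeholder numbers for illustration -- derive real values
    from load test data, never from intuition."""
    scale_out_above: float = 0.70    # e.g. utilization of the scaling metric
    scale_in_below: float = 0.30
    scale_out_cooldown_s: int = 60   # react quickly to rising load
    scale_in_cooldown_s: int = 300   # shed capacity conservatively

def scaling_decision(policy: ScalingPolicy, utilization: float,
                     seconds_since_last_action: int) -> str:
    """Return 'scale_out', 'scale_in', or 'hold' for one evaluation cycle."""
    if (utilization > policy.scale_out_above
            and seconds_since_last_action >= policy.scale_out_cooldown_s):
        return "scale_out"
    if (utilization < policy.scale_in_below
            and seconds_since_last_action >= policy.scale_in_cooldown_s):
        return "scale_in"
    return "hold"
```

Note how identical elapsed time after an action permits scale-out but still blocks scale-in: the longer scale-in cooldown prevents flapping when load briefly dips.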


PP7 – Performance as Architecture Concern (Document Performance Debt)

Tagline: Known performance limitations are debt – and debt grows with interest.

Explanation

Performance debt arises when architectural decisions or resource constraints lead to known performance limitations that are consciously accepted. This debt is rational – but only if it is documented and regularly reviewed.

Undocumented performance debt:

  • Is forgotten and accumulates unnoticed

  • Creates recurring incidents because root causes are not documented

  • Prevents informed decisions about performance investments

  • Is lost when employees leave the organization

Concrete implications

  • All known performance limitations MUST be documented in the register

  • Quarterly review MUST take place: which debts were resolved, which were prioritized?

  • Every performance incident MUST create a register entry

  • New features MUST be evaluated for performance debt creation (ADR performance section)

Related controls:

  • WAF-PERF-100 – Performance Debt Register & Quarterly Review

  • WAF-PERF-050 – Performance Monitoring & SLO Definition
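A performance debt register entry and the quarterly-review query can be sketched in a few lines. The field names here are illustrative, not mandated by WAF++:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DebtEntry:
    """One entry in a performance debt register (fields are illustrative)."""
    title: str
    impact: str        # e.g. "P95 +150ms on /search under load"
    accepted_on: date  # when the limitation was consciously accepted
    review_by: date    # next quarterly review at the latest
    resolved: bool = False

def due_for_review(register: list[DebtEntry], today: date) -> list[DebtEntry]:
    """Return open entries whose review date has passed -- the input
    for the quarterly review meeting."""
    return [e for e in register if not e.resolved and e.review_by <= today]
```

Wiring `due_for_review` into a scheduled job that opens tickets keeps the debt visible even when the engineers who accepted it have left the organization.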