# Design Principles – Performance Efficiency

Design principles describe concrete technical guidelines for architecture and implementation decisions. They complement the 7 Performance Principles with technical specificity.
## DP1 – Separate Workloads by Load Profile

Different load profiles require different scaling strategies. Synchronous APIs, asynchronous background jobs, batch processing, and event-driven workloads SHOULD be architecturally separated so that each component can use the scaling configuration that suits it.
Implications:

- API servers: stateless, horizontally scalable, auto-scaling on request rate
- Background jobs: queue-based, scaling on queue depth (SQS, Azure Service Bus, Pub/Sub)
- Batch processing: spot/preemptible instances, scheduled scaling
- Real-time pipelines: streaming services (Kinesis, Event Hubs, Dataflow)
```hcl
# Anti-pattern: everything on one EC2 instance
resource "aws_instance" "monolith" {
  instance_type = "m5.8xlarge" # Large enough for all workloads?
}

# Better: separate scaling domains
resource "aws_autoscaling_group" "api" {
  min_size = 2
  max_size = 20
}

# Workers scale on SQS queue depth
resource "aws_autoscaling_group" "workers" {
  min_size = 0
  max_size = 50
}
```
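The worker group above scales on queue depth; a common pattern is to publish "backlog per worker" as a custom metric that a target-tracking policy can follow. A minimal sketch using boto3 — the queue URL, namespace, and metric name are illustrative, not part of the original configuration:

```python
def backlog_per_worker(backlog: int, worker_count: int) -> float:
    # Scaling signal: messages waiting per running worker.
    # Guard against division by zero while the group is scaled to 0.
    return backlog / max(worker_count, 1)

def publish_backlog_metric(queue_url: str, worker_count: int) -> float:
    # Requires boto3 and AWS credentials at runtime.
    import boto3
    sqs = boto3.client("sqs")
    cloudwatch = boto3.client("cloudwatch")
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    value = backlog_per_worker(
        int(attrs["Attributes"]["ApproximateNumberOfMessages"]),
        worker_count,
    )
    # A target-tracking policy on the workers ASG tracks this metric.
    cloudwatch.put_metric_data(
        Namespace="Custom/Workers",
        MetricData=[{"MetricName": "BacklogPerWorker", "Value": value, "Unit": "Count"}],
    )
    return value
```

Scaling on backlog per worker (rather than raw queue length) keeps the target value meaningful regardless of how many workers are currently running.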
## DP2 – Externalise State Before Scaling

Before implementing auto-scaling, all in-process state MUST be externalised. Sessions, caches, temporary files, and locks MUST NOT be stored locally on an instance: they are lost during scale-in and unavailable on freshly started instances during scale-out.
State externalisation:

| State type | Anti-pattern | Correct solution |
|---|---|---|
| HTTP sessions | In-memory session store (default in many frameworks) | Redis/ElastiCache (`aws_elasticache_replication_group`) |
| File uploads | Local filesystem path | S3/Azure Blob/GCS, directly from the client via pre-signed URLs |
| Distributed locks | File lock or in-memory mutex | Redis SETNX, DynamoDB conditional writes |
| Temporary computation data | Local files for multi-step processes | S3 for stage output, a queue for stage coordination |
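The "Redis SETNX" row can be sketched as a small lock class; redis-py's `set(..., nx=True, ex=...)` provides the atomic acquire. The check-then-delete release shown here is a simplification — production code should release via a Lua script so the check and delete are atomic:

```python
import uuid

class RedisLock:
    """Minimal distributed-lock sketch over Redis SETNX semantics.

    `client` is any object with redis-py-compatible set/get/delete
    (assumes decode_responses=True, so get() returns str).
    """

    def __init__(self, client, name: str, ttl_seconds: int = 30):
        self.client = client
        self.name = name
        self.ttl = ttl_seconds
        self.token = str(uuid.uuid4())  # identifies this lock holder

    def acquire(self) -> bool:
        # SET key token NX EX ttl — succeeds only if the key is absent.
        # The TTL ensures a crashed holder cannot block others forever.
        return bool(self.client.set(self.name, self.token, nx=True, ex=self.ttl))

    def release(self) -> bool:
        # Only delete if we still hold the lock (check-then-delete;
        # use a Lua script in production to make this atomic).
        if self.client.get(self.name) == self.token:
            self.client.delete(self.name)
            return True
        return False
```

A second instance calling `acquire()` on the same name gets `False` until the holder releases or the TTL expires — exactly the behaviour a file lock or in-process mutex cannot provide across instances.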
## DP3 – Use Latency-Optimized Routing

Network latency accumulates at every service hop. In a microservices architecture, a single request can trigger 10+ internal service calls, so every hop MUST be latency-optimized.
Routing decisions:

- AZ affinity: Service-A in AZ-a preferably communicates with Service-B in AZ-a
- VPC endpoints: all cloud service APIs (S3, DynamoDB, SSM, ECR) via gateway/interface endpoints
- Service mesh: for high-frequency service-to-service communication (Istio, AWS App Mesh)
- gRPC instead of REST: for internal APIs with high call volume (binary protocol, HTTP/2)
```hcl
# VPC endpoints for all major AWS services
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}
```
## DP4 – Configure Connection Pools Explicitly

Connection pool exhaustion is one of the most common causes of performance degradation under load. Every service that calls a database or an external API MUST have explicitly configured connection pool sizes.
Pool sizing formula (per database server):

```
optimal pool size = (CPU cores * 2) + number of active disks
```

- Example: 4-core RDS instance + 2 disks = (4 * 2) + 2 = 10 connections per application instance
- With 10 app instances: max_connections_RDS = 100 (10 instances * 10 connections)
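The arithmetic above can be encoded as two small helpers (a sketch; note that any overflow allowance adds to the worst case the database must tolerate):

```python
def optimal_pool_size(cpu_cores: int, active_disks: int) -> int:
    # (CPU cores * 2) + number of active disks, per application instance
    return cpu_cores * 2 + active_disks

def required_db_connections(app_instances: int, pool_size: int,
                            max_overflow: int = 0) -> int:
    # Worst case the database must allow: every instance exhausts its
    # pool plus any temporary overflow connections.
    return app_instances * (pool_size + max_overflow)
```

For the 4-core example, `optimal_pool_size(4, 2)` is 10 and ten instances need `required_db_connections(10, 10)` = 100 — but with an overflow of 5 per instance (as in the SQLAlchemy configuration that follows) the peak rises to 150, which max_connections on the database must accommodate.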
```python
from sqlalchemy import create_engine

# SQLAlchemy – explicit pool configuration
engine = create_engine(
    DATABASE_URL,
    pool_size=10,        # Always-open connections
    max_overflow=5,      # Temporary additional connections
    pool_timeout=30,     # Seconds to wait for a free connection
    pool_recycle=3600,   # Recycle connections after 1h
    pool_pre_ping=True,  # Test connections before reuse
)
```
## DP5 – Define SLOs Before the First Deployment

SLOs MUST be defined before a service goes to production – not after. A service without an SLO has no objective criterion for "good enough performance".
SLO template:

```yaml
# docs/slos/payment-api.yml
service: "payment-api"
slos:
  - name: "availability"
    sli: "success_rate"
    target: 99.9              # 99.9% of requests succeed
    window: "30d"
  - name: "latency_p95"
    sli: "request_latency_p95"
    target: 200               # P95 < 200ms
    unit: "ms"
    window: "30d"
  - name: "latency_p99"
    sli: "request_latency_p99"
    target: 500               # P99 < 500ms
    unit: "ms"
    window: "30d"
error_budget:
  period: "30d"
  policy: "feature_freeze_on_exhaustion"
```
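An availability target implies an error budget, and the arithmetic behind the `error_budget` block is worth making explicit (a sketch, independent of any SLO tooling):

```python
def error_budget_minutes(target_pct: float, window_days: int = 30) -> float:
    # Minutes of "budget" (failed or out-of-SLO time) the window allows
    # before the error-budget policy (e.g. feature freeze) kicks in.
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_pct / 100)
```

A 99.9% target over 30 days leaves roughly 43.2 minutes of budget; 99.99% leaves only about 4.3 — a useful sanity check before committing to a target.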
## DP6 – Isolate I/O-Intensive Workloads

I/O-intensive workloads (database access, file I/O, external API calls) MUST be isolated from CPU-intensive workloads. For I/O-bound services, async I/O and non-blocking patterns matter more than horizontal scaling alone.
I/O isolation patterns:

- Async/non-blocking I/O: asyncio (Python), async/await (Node.js, .NET), reactive streams (Java)
- Bulkhead pattern: separate thread pools for internal and external calls
- Circuit breaker: prevents cascading failures from slow downstream services
- Timeout pyramid: outer timeout > inner timeout > DB timeout
```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Bulkhead: separate executors so slow external calls cannot
# starve internal work
internal_executor = ThreadPoolExecutor(max_workers=50)
external_executor = ThreadPoolExecutor(max_workers=10)  # Limits concurrent external calls

async def get_payment(payment_id: str):
    loop = asyncio.get_running_loop()
    # Internal DB query
    db_result = await loop.run_in_executor(internal_executor, db_query, payment_id)
    # External API call (bounded by the smaller pool)
    ext_result = await loop.run_in_executor(external_executor, external_api, payment_id)
    return merge(db_result, ext_result)
```
## DP7 – Performance Validation in CI/CD

Performance validation is a first-class citizen in the deployment process. Performance regressions MUST be taken as seriously as functional bugs.

CI/CD performance gate setup:
```yaml
# .github/workflows/deploy.yml
jobs:
  performance-validation:
    needs: [build, unit-test, integration-test]
    runs-on: ubuntu-latest
    steps:
      - name: Run k6 Load Test
        run: |
          k6 run \
            --vus 50 --duration 5m \
            --env BASE_URL=${{ env.STAGING_URL }} \
            --out json=results.json \
            tests/performance/payment-api.js
      - name: Check Acceptance Criteria
        run: |
          # P95 < 200ms, P99 < 500ms, error rate < 0.1%
          python scripts/validate-perf-results.py results.json \
            --p95-threshold 200 \
            --p99-threshold 500 \
            --error-rate-threshold 0.001
      - name: Compare to Baseline
        run: |
          # Fail if P99 > 110% of the last successful baseline
          python scripts/compare-to-baseline.py results.json \
            --regression-threshold 0.10
```
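`scripts/validate-perf-results.py` is referenced above but not shown. A minimal sketch of the gate logic, assuming the k6 results have already been reduced to a summary dict with `p95_ms`, `p99_ms`, and `error_rate` keys (the raw `--out json` stream is a series of data points and would need aggregating first):

```python
def check_thresholds(summary: dict, p95_ms: float, p99_ms: float,
                     max_error_rate: float) -> list:
    # Returns the list of violated criteria; an empty list means the gate passes.
    failures = []
    if summary["p95_ms"] > p95_ms:
        failures.append(f"P95 {summary['p95_ms']}ms exceeds {p95_ms}ms")
    if summary["p99_ms"] > p99_ms:
        failures.append(f"P99 {summary['p99_ms']}ms exceeds {p99_ms}ms")
    if summary["error_rate"] > max_error_rate:
        failures.append(f"error rate {summary['error_rate']:.4f} exceeds {max_error_rate}")
    return failures
```

A CI wrapper would load the summary JSON, call `check_thresholds(summary, 200, 500, 0.001)`, print any failures, and exit non-zero so the pipeline stops — making the acceptance criteria machine-enforced rather than a comment.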
## DP8 – Document Performance Decisions in ADRs

Every architectural decision with performance implications MUST include a performance section in the associated Architecture Decision Record (ADR).

ADR performance section template:
```markdown
## Performance Impact

### Expected Throughput
- Design target: 1000 req/s at P95 < 200ms
- Load test result: 1200 req/s at P95 = 145ms ✅

### Scaling Strategy
- Auto-scaling: target tracking on ALBRequestCountPerTarget
- Min instances: 2 (no cold start), max instances: 20

### Performance Debt Created
- None identified at this time

### Performance Debt Accepted
- CDN not configured in Phase 1 (estimated +50ms for static assets)
- Registered in the Performance Debt Register as PERF-DEBT-2026-003
- Target resolution: Q3 2026
```