# Design Principles – Performance Efficiency

Design principles describe concrete technical guidelines for architecture and implementation decisions. They complement the 7 Performance Principles with technical specificity.
## DP1 – Separate Workloads by Load Profile

Different load profiles require different scaling strategies. Synchronous APIs, asynchronous background jobs, batch processing, and event-driven workloads SHOULD be architecturally separated so that each component can use the scaling configuration that suits it.
Implications:

- API servers: stateless, horizontally scalable, auto-scaling on request rate
- Background jobs: queue-based, scaling on queue depth (SQS, Azure Service Bus, Pub/Sub)
- Batch processing: spot/preemptible instances, scheduled scaling
- Real-time pipelines: streaming services (Kinesis, Event Hubs, Dataflow)
```hcl
# Anti-pattern: everything on one EC2 instance
resource "aws_instance" "monolith" {
  instance_type = "m5.8xlarge" # Large enough for all workloads?
}

# Better: separate scaling domains
resource "aws_autoscaling_group" "api" {
  min_size = 2
  max_size = 20
}

# Workers scale on SQS queue depth
resource "aws_autoscaling_group" "workers" {
  min_size = 0
  max_size = 50
}
```
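The worker group above scales on queue depth; a common pattern is to publish "backlog per worker" as a custom metric that a target-tracking policy can follow. A minimal sketch using boto3 — the queue URL, namespace, and metric name are illustrative, not part of the original configuration:

```python
def backlog_per_worker(backlog: int, worker_count: int) -> float:
    # Scaling signal: messages waiting per running worker.
    # Guard against division by zero while the group is scaled to 0.
    return backlog / max(worker_count, 1)

def publish_backlog_metric(queue_url: str, worker_count: int) -> float:
    # Requires boto3 and AWS credentials at runtime.
    import boto3
    sqs = boto3.client("sqs")
    cloudwatch = boto3.client("cloudwatch")
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    value = backlog_per_worker(
        int(attrs["Attributes"]["ApproximateNumberOfMessages"]),
        worker_count,
    )
    # A target-tracking policy on the workers ASG tracks this metric.
    cloudwatch.put_metric_data(
        Namespace="Custom/Workers",
        MetricData=[{"MetricName": "BacklogPerWorker", "Value": value, "Unit": "Count"}],
    )
    return value
```

Scaling on backlog per worker (rather than raw queue length) keeps the target value meaningful regardless of how many workers are currently running.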
## DP2 – Externalise State Before Scaling

Before implementing auto-scaling, all in-process state MUST be externalised. Sessions, caches, temporary files, and locks MUST NOT be stored locally on an instance: they are lost during scale-in and unavailable on freshly started instances during scale-out.
State externalisation:

| State type | Anti-pattern | Correct solution |
|---|---|---|
| HTTP sessions | In-memory session store (default in many frameworks) | Redis/ElastiCache (`aws_elasticache_replication_group`) |
| File uploads | Local filesystem path | S3/Azure Blob/GCS, directly from the client via pre-signed URLs |
| Distributed locks | File lock or in-memory mutex | Redis SETNX, DynamoDB conditional writes |
| Temporary computation data | Local files for multi-step processes | S3 for stage output, a queue for stage coordination |
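The "Redis SETNX" row can be sketched as a small lock class; redis-py's `set(..., nx=True, ex=...)` provides the atomic acquire. The check-then-delete release shown here is a simplification — production code should release via a Lua script so the check and delete are atomic:

```python
import uuid

class RedisLock:
    """Minimal distributed-lock sketch over Redis SETNX semantics.

    `client` is any object with redis-py-compatible set/get/delete
    (assumes decode_responses=True, so get() returns str).
    """

    def __init__(self, client, name: str, ttl_seconds: int = 30):
        self.client = client
        self.name = name
        self.ttl = ttl_seconds
        self.token = str(uuid.uuid4())  # identifies this lock holder

    def acquire(self) -> bool:
        # SET key token NX EX ttl — succeeds only if the key is absent.
        # The TTL ensures a crashed holder cannot block others forever.
        return bool(self.client.set(self.name, self.token, nx=True, ex=self.ttl))

    def release(self) -> bool:
        # Only delete if we still hold the lock (check-then-delete;
        # use a Lua script in production to make this atomic).
        if self.client.get(self.name) == self.token:
            self.client.delete(self.name)
            return True
        return False
```

A second instance calling `acquire()` on the same name gets `False` until the holder releases or the TTL expires — exactly the behaviour a file lock or in-process mutex cannot provide across instances.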
## DP3 – Use Latency-Optimized Routing

Network latency accumulates at every service hop. In a microservices architecture, a single request can trigger 10+ internal service calls, so every hop MUST be latency-optimized.
Routing decisions:

- AZ affinity: Service-A in AZ-a preferably communicates with Service-B in AZ-a
- VPC endpoints: all cloud service APIs (S3, DynamoDB, SSM, ECR) via gateway/interface endpoints
- Service mesh: for high-frequency service-to-service communication (Istio, AWS App Mesh)
- gRPC instead of REST: for internal APIs with high call volume (binary protocol, HTTP/2)
```hcl
# VPC endpoints for all major AWS services
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}
```
## DP4 – Configure Connection Pools Explicitly

Connection pool exhaustion is one of the most common causes of performance degradation under load. Every service that calls a database or an external API MUST have explicitly configured connection pool sizes.
Pool sizing formula (per database server):

```
optimal pool size = (CPU cores * 2) + number of active disks
```

- Example: 4-core RDS instance + 2 disks = (4 * 2) + 2 = 10 connections per application instance
- With 10 app instances: max_connections_RDS = 100 (10 instances * 10 connections)
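The arithmetic above can be encoded as two small helpers (a sketch; note that any overflow allowance adds to the worst case the database must tolerate):

```python
def optimal_pool_size(cpu_cores: int, active_disks: int) -> int:
    # (CPU cores * 2) + number of active disks, per application instance
    return cpu_cores * 2 + active_disks

def required_db_connections(app_instances: int, pool_size: int,
                            max_overflow: int = 0) -> int:
    # Worst case the database must allow: every instance exhausts its
    # pool plus any temporary overflow connections.
    return app_instances * (pool_size + max_overflow)
```

For the 4-core example, `optimal_pool_size(4, 2)` is 10 and ten instances need `required_db_connections(10, 10)` = 100 — but with an overflow of 5 per instance (as in the SQLAlchemy configuration that follows) the peak rises to 150, which max_connections on the database must accommodate.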
```python
from sqlalchemy import create_engine

# SQLAlchemy – explicit pool configuration
engine = create_engine(
    DATABASE_URL,
    pool_size=10,        # Always-open connections
    max_overflow=5,      # Temporary additional connections
    pool_timeout=30,     # Seconds to wait for a free connection
    pool_recycle=3600,   # Recycle connections after 1h
    pool_pre_ping=True,  # Test connections before reuse
)
```
## DP5 – Define SLOs Before the First Deployment

SLOs MUST be defined before a service goes to production – not after. A service without an SLO has no objective criterion for "good enough performance".
SLO template:

```yaml
# docs/slos/payment-api.yml
service: "payment-api"
slos:
  - name: "availability"
    sli: "success_rate"
    target: 99.9              # 99.9% of requests succeed
    window: "30d"
  - name: "latency_p95"
    sli: "request_latency_p95"
    target: 200               # P95 < 200ms
    unit: "ms"
    window: "30d"
  - name: "latency_p99"
    sli: "request_latency_p99"
    target: 500               # P99 < 500ms
    unit: "ms"
    window: "30d"
error_budget:
  period: "30d"
  policy: "feature_freeze_on_exhaustion"
```
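An availability target implies an error budget, and the arithmetic behind the `error_budget` block is worth making explicit (a sketch, independent of any SLO tooling):

```python
def error_budget_minutes(target_pct: float, window_days: int = 30) -> float:
    # Minutes of "budget" (failed or out-of-SLO time) the window allows
    # before the error-budget policy (e.g. feature freeze) kicks in.
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_pct / 100)
```

A 99.9% target over 30 days leaves roughly 43.2 minutes of budget; 99.99% leaves only about 4.3 — a useful sanity check before committing to a target.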
## DP6 – Isolate I/O-Intensive Workloads

I/O-intensive workloads (database access, file I/O, external API calls) MUST be isolated from CPU-intensive workloads. For I/O-bound services, async I/O and non-blocking patterns matter more than horizontal scaling alone.
I/O isolation patterns:

- Async/non-blocking I/O: asyncio (Python), async/await (Node.js, .NET), reactive streams (Java)
- Bulkhead pattern: separate thread pools for internal and external calls
- Circuit breaker: prevents cascading failures from slow downstream services
- Timeout pyramid: outer timeout > inner timeout > DB timeout
```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Bulkhead: separate executors so slow external calls cannot
# starve internal work
internal_executor = ThreadPoolExecutor(max_workers=50)
external_executor = ThreadPoolExecutor(max_workers=10)  # Limits concurrent external calls

async def get_payment(payment_id: str):
    loop = asyncio.get_running_loop()
    # Internal DB query
    db_result = await loop.run_in_executor(internal_executor, db_query, payment_id)
    # External API call (bounded by the smaller pool)
    ext_result = await loop.run_in_executor(external_executor, external_api, payment_id)
    return merge(db_result, ext_result)
```
## DP7 – Performance Validation in CI/CD

Performance validation is a first-class citizen in the deployment process. Performance regressions MUST be taken as seriously as functional bugs.

CI/CD performance gate setup:
```yaml
# .github/workflows/deploy.yml
jobs:
  performance-validation:
    needs: [build, unit-test, integration-test]
    runs-on: ubuntu-latest
    steps:
      - name: Run k6 Load Test
        run: |
          k6 run \
            --vus 50 --duration 5m \
            --env BASE_URL=${{ env.STAGING_URL }} \
            --out json=results.json \
            tests/performance/payment-api.js
      - name: Check Acceptance Criteria
        run: |
          # P95 < 200ms, P99 < 500ms, error rate < 0.1%
          python scripts/validate-perf-results.py results.json \
            --p95-threshold 200 \
            --p99-threshold 500 \
            --error-rate-threshold 0.001
      - name: Compare to Baseline
        run: |
          # Fail if P99 > 110% of the last successful baseline
          python scripts/compare-to-baseline.py results.json \
            --regression-threshold 0.10
```
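`scripts/validate-perf-results.py` is referenced above but not shown. A minimal sketch of the gate logic, assuming the k6 results have already been reduced to a summary dict with `p95_ms`, `p99_ms`, and `error_rate` keys (the raw `--out json` stream is a series of data points and would need aggregating first):

```python
def check_thresholds(summary: dict, p95_ms: float, p99_ms: float,
                     max_error_rate: float) -> list:
    # Returns the list of violated criteria; an empty list means the gate passes.
    failures = []
    if summary["p95_ms"] > p95_ms:
        failures.append(f"P95 {summary['p95_ms']}ms exceeds {p95_ms}ms")
    if summary["p99_ms"] > p99_ms:
        failures.append(f"P99 {summary['p99_ms']}ms exceeds {p99_ms}ms")
    if summary["error_rate"] > max_error_rate:
        failures.append(f"error rate {summary['error_rate']:.4f} exceeds {max_error_rate}")
    return failures
```

A CI wrapper would load the summary JSON, call `check_thresholds(summary, 200, 500, 0.001)`, print any failures, and exit non-zero so the pipeline stops — making the acceptance criteria machine-enforced rather than a comment.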
## DP8 – Document Performance Decisions in ADRs

Every architectural decision with performance implications MUST include a performance section in the associated Architecture Decision Record (ADR).

ADR performance section template:
```markdown
## Performance Impact

### Expected Throughput
- Design target: 1000 req/s at P95 < 200ms
- Load test result: 1200 req/s at P95 = 145ms ✅

### Scaling Strategy
- Auto-scaling: target tracking on ALBRequestCountPerTarget
- Min instances: 2 (no cold start), max instances: 20

### Performance Debt Created
- None identified at this time

### Performance Debt Accepted
- CDN not configured in Phase 1 (estimated +50ms for static assets)
- Registered in the Performance Debt Register as PERF-DEBT-2026-003
- Target resolution: Q3 2026
```