WAF++

Best Practice: Load Tests & Stress Tests

Context

Without load tests, performance regressions are discovered only in production, typically through user complaints or alerts. A service without load test validation is an untested safety net: it sounds reassuring, but nobody knows whether it holds.

Typical problems without a load testing process:

  • Auto-scaling configuration never validated under load – trigger fires too late

  • Database query plan changes with data volume and becomes slow only at 10M entries

  • Memory leak only visible under extended load – not detected in unit tests

  • New feature causes 30% P99 regression – goes unnoticed into production

Target State

Fully integrated load testing process:

  • Automated: Load tests run automatically on every deployment

  • Acceptance criteria: Clearly defined and technically enforced

  • Regression detection: Baseline comparison detects > 10% P99 degradation

  • Stress test cadence: Quarterly determination of capacity limits

Technical Implementation

Step 1: Create k6 Load Test Script

// tests/performance/payment-api.js
import http from 'k6/http';
import { check, group, sleep } from 'k6';
import { Counter, Rate, Trend } from 'k6/metrics';

// Custom Metrics
const paymentErrors = new Counter('payment_errors');
const paymentDuration = new Trend('payment_duration', true);

// Acceptance criteria as thresholds
export const options = {
  stages: [
    { duration: '2m', target: 10 },   // Warm-up
    { duration: '5m', target: 50 },   // Normal load
    { duration: '10m', target: 100 }, // Peak load
    { duration: '3m', target: 0 },    // Ramp-down
  ],
  thresholds: {
    // SLO requirements as gate
    'http_req_duration{type:payment}': [
      'p(95)<200',   // P95 < 200ms
      'p(99)<500',   // P99 < 500ms
    ],
    'http_req_failed': ['rate<0.001'],  // < 0.1% errors
    'payment_errors': ['count<10'],     // Absolute max 10 errors
  },
  // Make p(99) available in the exported summary (the default trend stats stop at p(95))
  summaryTrendStats: ['avg', 'min', 'med', 'max', 'p(95)', 'p(99)'],
};

const BASE_URL = __ENV.BASE_URL || 'https://api-staging.payment.example.com';
const API_KEY = __ENV.API_KEY;

export function setup() {
  // Prepare test data
  return { userId: '550e8400-e29b-41d4-a716-446655440000' };
}

export default function (data) {
  group('Payment API', () => {
    // Scenario 1: Create payment
    group('create_payment', () => {
      const payload = JSON.stringify({
        amount: 100,
        currency: 'EUR',
        user_id: data.userId,
        idempotency_key: `k6-${__VU}-${__ITER}`,
      });

      const start = Date.now();
      const res = http.post(
        `${BASE_URL}/api/v1/payments`,
        payload,
        {
          headers: {
            'Content-Type': 'application/json',
            'X-API-Key': API_KEY,
          },
          tags: { type: 'payment' },
        }
      );
      paymentDuration.add(Date.now() - start);

      const success = check(res, {
        'status is 201': (r) => r.status === 201,
        'has payment_id': (r) => r.json('id') !== undefined,
        'response time < 200ms': (r) => r.timings.duration < 200,
      });

      if (!success) {
        paymentErrors.add(1);
      }
    });

    sleep(0.5);

    // Scenario 2: Query payment status
    group('get_payment_status', () => {
      const res = http.get(
        `${BASE_URL}/api/v1/payments?user_id=${data.userId}&status=pending`,
        {
          headers: { 'X-API-Key': API_KEY },
          tags: { type: 'payment' },
        }
      );

      check(res, {
        'status is 200': (r) => r.status === 200,
        'response time < 100ms': (r) => r.timings.duration < 100,
      });
    });
  });

  sleep(0.1);
}
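The threshold block above is, at its core, a percentile check over the recorded request durations. A minimal Python sketch of the same acceptance logic (illustrative only; `percentile` uses the nearest-rank method, not k6's exact estimator):

```python
# Re-implementation of the script's SLO gate in plain Python (illustrative).
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile over latency samples in milliseconds."""
    ordered = sorted(samples)
    rank = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[rank]

def passes_thresholds(durations_ms: list[float], failed: int) -> bool:
    """Mirror of the gate: P95 < 200ms, P99 < 500ms, error rate < 0.1%."""
    total = max(len(durations_ms), 1)
    return (
        percentile(durations_ms, 95) < 200
        and percentile(durations_ms, 99) < 500
        and failed / total < 0.001
    )

# 1000 requests at 50ms with one 450ms outlier and no failures pass the gate
samples = [50.0] * 999 + [450.0]
print(passes_thresholds(samples, failed=0))  # True
```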

Step 2: CI/CD Integration (GitHub Actions)

# .github/workflows/performance.yml
name: Performance Tests

on:
  push:
    branches: [main, staging]
  pull_request:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    needs: [build, integration-tests]
    env:
      STAGING_URL: https://api-staging.payment.example.com  # matches the k6 default

    steps:
      - uses: actions/checkout@v4

      - name: Deploy to Staging
        run: |
          # Staging deployment (Terraform or kubectl apply)
          terraform -chdir=environments/staging apply -auto-approve

      - name: Wait for Staging Readiness
        run: |
          for i in {1..30}; do
            if curl -sf "${STAGING_URL}/health"; then exit 0; fi
            sleep 10
          done
          echo "Staging did not become healthy within 5 minutes" >&2
          exit 1

      - name: Run k6 Load Test
        uses: grafana/k6-action@v0.3.1
        with:
          filename: tests/performance/payment-api.js
          flags: |
            --env BASE_URL=${{ env.STAGING_URL }}
            --env API_KEY=${{ secrets.STAGING_API_KEY }}
            --summary-export=results/k6-results.json

      - name: Upload Test Results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: k6-results-${{ github.sha }}
          path: results/k6-results.json
          retention-days: 90

      - name: Check Regression vs Baseline
        run: |
          python scripts/compare-baseline.py \
            --current results/k6-results.json \
            --baseline results/baseline.json \
            --regression-threshold 0.10  # 10% P99 degradation = Fail

      - name: Update Baseline on Success
        if: success() && github.ref == 'refs/heads/main'
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          cp results/k6-results.json results/baseline.json
          git add results/baseline.json
          git commit -m "chore: update performance baseline after successful load test"
          git push

Step 3: Baseline Comparison Script

#!/usr/bin/env python3
# scripts/compare-baseline.py

import json
import sys
import argparse

def load_k6_results(filename: str) -> dict:
    with open(filename) as f:
        data = json.load(f)
    metrics = data.get('metrics', {})

    def values(name: str) -> dict:
        metric = metrics.get(name, {})
        # k6 --summary-export writes stats at the top level of each metric;
        # handleSummary output nests them under "values". Support both shapes.
        return metric.get('values', metric)

    return {
        'p95': values('http_req_duration').get('p(95)', 0),
        'p99': values('http_req_duration').get('p(99)', 0),
        'error_rate': values('http_req_failed').get('rate', 0),
    }

def compare(current: dict, baseline: dict, threshold: float) -> bool:
    """Returns True if regression detected."""
    p99_change = (current['p99'] - baseline['p99']) / max(baseline['p99'], 1)
    p95_change = (current['p95'] - baseline['p95']) / max(baseline['p95'], 1)

    print(f"P95: {current['p95']:.0f}ms (baseline: {baseline['p95']:.0f}ms, change: {p95_change:+.1%})")
    print(f"P99: {current['p99']:.0f}ms (baseline: {baseline['p99']:.0f}ms, change: {p99_change:+.1%})")

    if p99_change > threshold:
        print(f"❌ REGRESSION: P99 increased by {p99_change:.1%} (threshold: {threshold:.1%})")
        return True

    print(f"✅ No regression detected (P99 change: {p99_change:+.1%})")
    return False

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--current', required=True)
    parser.add_argument('--baseline', required=True)
    parser.add_argument('--regression-threshold', type=float, default=0.10)
    args = parser.parse_args()

    current = load_k6_results(args.current)
    baseline = load_k6_results(args.baseline)
    regression = compare(current, baseline, args.regression_threshold)
    sys.exit(1 if regression else 0)
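In CI the exit code is the gate; the decision rule itself can be exercised directly with hypothetical numbers (the snippet below trims compare() to the P99 check):

```python
# The P99 decision rule from compare-baseline.py, trimmed for illustration.
def p99_regressed(current_p99: float, baseline_p99: float, threshold: float = 0.10) -> bool:
    """True if P99 degraded by more than the allowed fraction."""
    return (current_p99 - baseline_p99) / max(baseline_p99, 1) > threshold

print(p99_regressed(430.0, 400.0))  # False: +7.5% is within the 10% budget
print(p99_regressed(460.0, 400.0))  # True: +15% fails the gate
```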

Step 4: Stress Test Cadence

Quarterly stress tests determine capacity limits:

// tests/performance/stress-test.js
export const options = {
  stages: [
    { duration: '5m',  target: 100  },  // Normal load
    { duration: '5m',  target: 200  },  // 2x peak
    { duration: '5m',  target: 400  },  // 4x peak
    { duration: '10m', target: 600  },  // Until breaking point
    { duration: '5m',  target: 0    },  // Recovery
  ],
  // No threshold – we want to see where it breaks, not pass/fail
};
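The breaking point is then read off the per-stage results; one possible evaluation, sketched in Python with hypothetical stage data and an assumed 1% error-rate cutoff:

```python
# Determine the capacity limit from per-stage stress-test results.
# Stage data and the 1% error-rate cutoff are assumptions for illustration.
def capacity_limit(stages: list[tuple[int, float]], max_error_rate: float = 0.01) -> int:
    """Highest VU target whose error rate stayed below the cutoff."""
    limit = 0
    for target_vus, error_rate in stages:
        if error_rate >= max_error_rate:
            break
        limit = target_vus
    return limit

stages = [(100, 0.0002), (200, 0.0008), (400, 0.004), (600, 0.031)]
print(capacity_limit(stages))  # 400: at 600 VUs the error rate exceeds 1%
```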

Common Anti-Patterns

  • Load test only before go-live: Performance regressions occur with every change, not just at launch.

  • Too few VUs: 5 VUs do not test concurrency problems; derive realistic VU numbers from production analysis.

  • Testing against production: Stress testing in production without prior staging validation is risky.

  • Thresholds too loose: P99 < 5000ms is not a meaningful acceptance criterion.
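For the VU question, Little's law gives a defensible starting point: concurrency equals arrival rate times the time each iteration spends in the system. A sketch with assumed production numbers:

```python
import math

# Little's law: required VUs = request rate * (response time + think time).
# The production numbers below are assumptions for illustration.
def required_vus(requests_per_second: float, avg_response_s: float, think_time_s: float) -> int:
    return math.ceil(requests_per_second * (avg_response_s + think_time_s))

print(required_vus(80, avg_response_s=0.15, think_time_s=0.5))  # 52
```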

Metrics

  • Proportion of CI/CD runs with load test as a gate (target: 100% for production deployments)

  • Performance regression rate (proportion of deployments that fail the load test gate)

  • Time between deployment and load test result (target: < 20 minutes)

  • Stress test capacity reserve (current max load / expected peak: target >= 2x)
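The capacity-reserve metric is a plain ratio of stress-test headroom to expected demand; with assumed numbers:

```python
# Capacity reserve = sustained maximum load / expected peak load (target >= 2x).
# Both inputs are assumptions for illustration.
def capacity_reserve(max_sustained_rps: float, expected_peak_rps: float) -> float:
    return max_sustained_rps / expected_peak_rps

print(f"{capacity_reserve(400, 150):.1f}x")  # 2.7x meets the >= 2x target
```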

Maturity Level

Level 1 – No load tests; performance under load unknown
Level 2 – Manual load tests before releases; no CI integration
Level 3 – Automated CI/CD gate; acceptance criteria enforced
Level 4 – Regression detection; quarterly stress tests
Level 5 – Continuous performance tests; chaos engineering integrated