Best Practice: Load Tests & Stress Tests
Context
Without load tests, performance regressions are only discovered in production – often through user complaints or alerts. A service without load-test validation is an untested safety net: it sounds reassuring, but nobody knows whether it holds.
Typical problems without a load testing process:
- Auto-scaling configuration never validated under load – trigger fires too late
- Database query plan changes with data volume and becomes slow only at 10M entries
- Memory leak only visible under extended load – not detected in unit tests
- New feature causes a 30% P99 regression – goes unnoticed into production
Related Controls
- WAF-PERF-060 – Load & Stress Testing in CI/CD Pipeline
Target State
Fully integrated load testing process:
- Automated: Load tests run automatically on every deployment
- Acceptance criteria: Clearly defined and technically enforced
- Regression detection: Baseline comparison detects > 10% P99 degradation
- Stress test cadence: Quarterly determination of capacity limits
Technical Implementation
Step 1: Create k6 Load Test Script
// tests/performance/payment-api.js
import http from 'k6/http';
import { check, group, sleep } from 'k6';
import { Counter, Trend } from 'k6/metrics';

// Custom metrics
const paymentErrors = new Counter('payment_errors');
const paymentDuration = new Trend('payment_duration', true);

// Acceptance criteria as thresholds
export const options = {
  stages: [
    { duration: '2m', target: 10 },   // Warm-up
    { duration: '5m', target: 50 },   // Normal load
    { duration: '10m', target: 100 }, // Peak load
    { duration: '3m', target: 0 },    // Ramp-down
  ],
  thresholds: {
    // SLO requirements as gate
    'http_req_duration{type:payment}': [
      'p(95)<200', // P95 < 200ms
      'p(99)<500', // P99 < 500ms
    ],
    'http_req_failed': ['rate<0.001'], // < 0.1% errors
    'payment_errors': ['count<10'],    // Absolute max: 10 errors
  },
};

const BASE_URL = __ENV.BASE_URL || 'https://api-staging.payment.example.com';
const API_KEY = __ENV.API_KEY;

export function setup() {
  // Prepare test data
  return { userId: '550e8400-e29b-41d4-a716-446655440000' };
}

export default function (data) {
  group('Payment API', () => {
    // Scenario 1: Create payment
    group('create_payment', () => {
      const payload = JSON.stringify({
        amount: 100,
        currency: 'EUR',
        user_id: data.userId,
        idempotency_key: `k6-${__VU}-${__ITER}`,
      });

      const start = Date.now();
      const res = http.post(`${BASE_URL}/api/v1/payments`, payload, {
        headers: {
          'Content-Type': 'application/json',
          'X-API-Key': API_KEY,
        },
        tags: { type: 'payment' },
      });
      paymentDuration.add(Date.now() - start);

      const success = check(res, {
        'status is 201': (r) => r.status === 201,
        'has payment_id': (r) => r.json('id') !== undefined,
        'response time < 200ms': (r) => r.timings.duration < 200,
      });
      if (!success) {
        paymentErrors.add(1);
      }
    });

    sleep(0.5);

    // Scenario 2: Query payment status
    group('get_payment_status', () => {
      const res = http.get(
        `${BASE_URL}/api/v1/payments?user_id=${data.userId}&status=pending`,
        {
          headers: { 'X-API-Key': API_KEY },
          tags: { type: 'payment' },
        }
      );
      check(res, {
        'status is 200': (r) => r.status === 200,
        'response time < 100ms': (r) => r.timings.duration < 100,
      });
    });
  });

  sleep(0.1);
}
Step 2: CI/CD Integration (GitHub Actions)
# .github/workflows/performance.yml
name: Performance Tests

on:
  push:
    branches: [main, staging]
  pull_request:
    branches: [main]

jobs:
  load-test:
    runs-on: ubuntu-latest
    needs: [build, integration-tests]
    steps:
      - uses: actions/checkout@v4

      - name: Deploy to Staging
        run: |
          # Staging deployment (Terraform or kubectl apply)
          terraform -chdir=environments/staging apply -auto-approve

      - name: Wait for Staging Readiness
        run: |
          for i in {1..30}; do
            if curl -sf "${STAGING_URL}/health"; then break; fi
            sleep 10
          done

      - name: Run k6 Load Test
        uses: grafana/k6-action@v0.3.1
        with:
          filename: tests/performance/payment-api.js
          # Folded scalar (>-) so the flags reach k6 as a single command line
          flags: >-
            --env BASE_URL=${{ env.STAGING_URL }}
            --env API_KEY=${{ secrets.STAGING_API_KEY }}
            --out json=results/k6-results.json

      - name: Upload Test Results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: k6-results-${{ github.sha }}
          path: results/k6-results.json
          retention-days: 90

      - name: Check Regression vs Baseline
        run: |
          # 10% P99 degradation = fail
          python scripts/compare-baseline.py \
            --current results/k6-results.json \
            --baseline results/baseline.json \
            --regression-threshold 0.10

      - name: Update Baseline on Success
        if: success() && github.ref == 'refs/heads/main'
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          cp results/k6-results.json results/baseline.json
          git add results/baseline.json
          git commit -m "chore: update performance baseline after successful load test"
          git push
Step 3: Baseline Comparison Script
#!/usr/bin/env python3
# scripts/compare-baseline.py
# Expects k6 end-of-test summary JSON (e.g. written via handleSummary),
# not the streaming per-data-point output.
import argparse
import json
import sys


def load_k6_results(filename: str) -> dict:
    with open(filename) as f:
        data = json.load(f)
    metrics = data.get('metrics', {})

    def values(name: str) -> dict:
        m = metrics.get(name, {})
        # Depending on the export path, percentiles sit either under a
        # nested 'values' object or directly on the metric.
        return m.get('values', m)

    return {
        'p95': values('http_req_duration').get('p(95)', 0),
        'p99': values('http_req_duration').get('p(99)', 0),
        'error_rate': values('http_req_failed').get('rate', 0),
    }


def compare(current: dict, baseline: dict, threshold: float) -> bool:
    """Returns True if a regression was detected."""
    p99_change = (current['p99'] - baseline['p99']) / max(baseline['p99'], 1)
    p95_change = (current['p95'] - baseline['p95']) / max(baseline['p95'], 1)

    print(f"P95: {current['p95']:.0f}ms (baseline: {baseline['p95']:.0f}ms, change: {p95_change:+.1%})")
    print(f"P99: {current['p99']:.0f}ms (baseline: {baseline['p99']:.0f}ms, change: {p99_change:+.1%})")

    if p99_change > threshold:
        print(f"❌ REGRESSION: P99 increased by {p99_change:.1%} (threshold: {threshold:.1%})")
        return True
    print(f"✅ No regression detected (P99 change: {p99_change:+.1%})")
    return False


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--current', required=True)
    parser.add_argument('--baseline', required=True)
    parser.add_argument('--regression-threshold', type=float, default=0.10)
    args = parser.parse_args()

    current = load_k6_results(args.current)
    baseline = load_k6_results(args.baseline)
    regression = compare(current, baseline, args.regression_threshold)
    sys.exit(1 if regression else 0)
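The gate arithmetic can be sanity-checked with illustrative numbers (the latencies below are placeholders, not measurements):

```python
# Illustrative check of the regression-gate arithmetic from compare-baseline.py.
baseline_p99 = 420.0  # ms, from the stored baseline run (example value)
current_p99 = 480.0   # ms, from the current run (example value)
threshold = 0.10      # allow at most 10% P99 degradation

p99_change = (current_p99 - baseline_p99) / max(baseline_p99, 1)
print(f"P99 change: {p99_change:+.1%}")  # +14.3% -> above the 10% gate, build fails
```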
Step 4: Stress Test Cadence
Quarterly stress tests determine capacity limits:
// tests/performance/stress-test.js
export const options = {
  stages: [
    { duration: '5m', target: 100 },  // Normal load
    { duration: '5m', target: 200 },  // 2x peak
    { duration: '5m', target: 400 },  // 4x peak
    { duration: '10m', target: 600 }, // Until breaking point
    { duration: '5m', target: 0 },    // Recovery
  ],
  // No thresholds – we want to see where it breaks, not pass/fail
};
Common Anti-Patterns
- Load test only before go-live: Performance regressions occur with every change, not just at launch.
- Too few VUs: 5 VUs do not exercise concurrency problems; derive realistic VU numbers from production analysis.
- Testing against production: Stress testing in production without prior staging validation is risky.
- Thresholds too loose: P99 < 5000ms is not a meaningful acceptance criterion.
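For the VU-sizing point, Little's law gives a first approximation (the production figures below are placeholders):

```python
# Little's law: average concurrency = arrival rate x average time in system.
peak_rps = 120.0       # observed production peak, requests/second (placeholder)
avg_duration_s = 0.25  # average request duration in seconds (placeholder)

concurrent_users = peak_rps * avg_duration_s
print(round(concurrent_users))  # 30 -> size the normal-load stage near this value
```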
Metrics
- Proportion of CI/CD runs with load test as a gate (target: 100% for production deployments)
- Performance regression rate (proportion of deployments that fail the load test gate)
- Time between deployment and load test result (target: < 20 minutes)
- Stress test capacity reserve (current max load / expected peak; target: >= 2x)
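The capacity-reserve metric is a simple ratio; for example (numbers illustrative):

```python
# Capacity reserve = highest load that still met SLOs / expected production peak.
max_sustained_load = 400  # e.g. VUs or RPS from the latest stress test (placeholder)
expected_peak = 150       # expected production peak in the same unit (placeholder)

reserve = max_sustained_load / expected_peak
print(f"capacity reserve: {reserve:.1f}x")  # 2.7x -> meets the >= 2x target
```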
Maturity Level
Level 1 – No load tests; performance under load unknown
Level 2 – Manual load tests before releases; no CI integration
Level 3 – Automated CI/CD gate; acceptance criteria enforced
Level 4 – Regression detection; quarterly stress tests
Level 5 – Continuous performance tests; chaos engineering integrated