Evidence & Audit: Operational Excellence
This page lists the evidence required to audit the Operational Excellence pillar. Evidence is grouped into five types: IaC, configuration, process, governance, and metrics.
Evidence by Type
IaC Evidence (Infrastructure as Code)
| Description | Required | Related Control | Format |
|---|---|---|---|
| Pipeline definitions (`.github/workflows/`, `.gitlab-ci.yml`, `buildspec.yml`) | Required | WAF-OPS-010 | File link in Git |
| Terraform remote state configuration | Required | WAF-OPS-020 | Terraform code with `backend` block |
| S3 state bucket with versioning (or Azure/GCP equivalent) | Required | WAF-OPS-020 | Terraform resource |
| CloudWatch Log Groups with retention (or Azure/GCP equivalent) | Required | WAF-OPS-030 | Terraform resource |
| Load balancer configuration for blue/green or canary deployments | Optional | WAF-OPS-080 | Terraform resource |
| AWS Config Recorder / Azure Policy Assignment | Required | WAF-OPS-090 | Terraform resource |
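As a rough self-check for the WAF-OPS-020 rows, a team could scan a repository for a Terraform `backend` block and for local state files that should not exist once remote state is in use. The sketch below is a hypothetical helper, not part of any tool: it uses a regular expression instead of a real HCL parser (so unusual formatting or commented-out blocks can fool it), treats a `local` backend as non-compliant, and skips the `.terraform/` directory, whose `terraform.tfstate` file caches backend settings rather than real state.

```python
import re
from pathlib import Path

# Simplified detection: a regex over raw .tf text, not a real HCL parser.
# Group 1 captures the backend type, e.g. "s3", "azurerm", "gcs".
BACKEND_RE = re.compile(r'\bbackend\s+"([^"]+)"\s*\{')


def _outside_dot_terraform(rel_path: Path) -> bool:
    # .terraform/ holds provider/module caches plus a backend-settings cache
    # file named terraform.tfstate, which is not real state; skip it.
    return ".terraform" not in rel_path.parts


def check_terraform_state(repo_root: str) -> dict:
    """Summarize remote-state evidence for a repository (hypothetical helper)."""
    root = Path(repo_root)
    backend_files = []
    for tf in sorted(root.rglob("*.tf")):
        rel = tf.relative_to(root)
        if not _outside_dot_terraform(rel):
            continue
        text = tf.read_text(encoding="utf-8", errors="replace")
        # A "local" backend still writes state to disk, so it does not count.
        if any(m.group(1) != "local" for m in BACKEND_RE.finditer(text)):
            backend_files.append(rel.as_posix())
    # Local state files and their backups, including workspace state dirs.
    local_state_files = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
        and _outside_dot_terraform(p.relative_to(root))
        and p.name.endswith((".tfstate", ".tfstate.backup"))
    )
    return {
        "backend_files": backend_files,
        "local_state_files": local_state_files,
        "compliant": bool(backend_files) and not local_state_files,
    }
```

The output is a small dictionary that could be attached to the audit as supporting evidence next to the link to the Terraform code.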
Config Evidence (System Configuration)
| Description | Required | Related Control | Format |
|---|---|---|---|
| Branch protection configuration (minimum reviewers, CODEOWNERS) | Required | WAF-OPS-010, WAF-OPS-050 | GitHub/GitLab settings screenshot or API output |
| Alert definitions with symptom-based metrics | Required | WAF-OPS-040 | Alert rule YAML or Terraform code |
| Alert definitions with runbook URLs | Required | WAF-OPS-060 | Alert rule YAML including the runbook URL |
| AWS AppConfig / feature flag service configuration | Optional | WAF-OPS-080 | Terraform resource or API export |
| AWS CloudTrail configuration (multi-region, log file validation enabled) | Required | WAF-OPS-090 | Terraform resource |
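For the WAF-OPS-060 row, the runbook requirement can be checked mechanically once alert rules are exported. The sketch below assumes a Prometheus-style rule file already loaded into dictionaries (`groups` → `rules` → `alert`/`labels`/`annotations`), a `severity: page` label marking paging alerts, and a `runbook_url` annotation or a URL inside the `description` annotation; all of these conventions are assumptions, and `alerts_missing_runbooks` is a hypothetical name.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def alerts_missing_runbooks(rule_file: dict, paging_severity: str = "page") -> list[str]:
    """Return names of paging alerts without a runbook URL (hypothetical helper).

    Assumes a Prometheus-style rule file parsed into dicts, and assumes that
    paging alerts carry the label severity=<paging_severity>.
    """
    missing = []
    for group in rule_file.get("groups", []):
        for rule in group.get("rules", []):
            if "alert" not in rule:
                continue  # recording rules are not alerts
            if rule.get("labels", {}).get("severity") != paging_severity:
                continue  # only paging alerts are in scope
            annotations = rule.get("annotations", {})
            # Accept either a dedicated runbook_url annotation (assumed name)
            # or any http(s) URL embedded in the description.
            has_runbook = bool(annotations.get("runbook_url")) or bool(
                URL_RE.search(annotations.get("description", ""))
            )
            if not has_runbook:
                missing.append(rule["alert"])
    return missing
```

An empty result for every rule file is the kind of output that could back the "Required" status of this row.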
Process Evidence (Process Records)
| Description | Required | Related Control | Format |
|---|---|---|---|
| DORA metrics report (Deployment Frequency, Lead Time for Changes, MTTR, Change Failure Rate) | Optional | WAF-OPS-010 | Dashboard screenshot or CSV export |
| Runbook directory with all service runbooks | Required | WAF-OPS-060 | Wiki link or Git directory |
| Runbook coverage report (services with runbooks / total services) | Required | WAF-OPS-060 | Percentage report or table |
| Postmortem archive (last 3 months) | Required | WAF-OPS-070 | Wiki link or document list |
| Action item tracking from postmortems | Required | WAF-OPS-070 | Jira filter or GitHub Issues export |
| Quarterly operational debt review minutes | Required | WAF-OPS-100 | Meeting notes or ticket history |
| Alert noise report (pages/week, actionability rate) | Optional | WAF-OPS-040 | PagerDuty/Opsgenie analytics or CSV export |
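The runbook coverage report is a simple ratio: services with runbooks divided by total services. The sketch below assumes a service list taken from a service catalog and a runbook directory with one Markdown file per service named `<service>.md`; that naming convention and the `runbook_coverage` function are hypothetical.

```python
def runbook_coverage(services: list[str], runbook_files: list[str]) -> dict:
    """Runbook coverage = services with runbooks / total services (hypothetical helper).

    Assumes one runbook per service, stored as '<service>.md'.
    Files that do not match a known service (e.g. README.md) are ignored.
    """
    documented = {f.removesuffix(".md") for f in runbook_files if f.endswith(".md")}
    total = set(services)
    covered = sorted(total & documented)
    missing = sorted(total - documented)
    # With no services listed, coverage is undefined; report 0.0 here.
    pct = 100.0 * len(covered) / len(total) if total else 0.0
    return {"covered": covered, "missing": missing, "coverage_pct": round(pct, 1)}
```

Listing the `missing` services alongside the percentage gives auditors the table form mentioned in the Format column, not just a single number.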
Governance Evidence (Policies and Decision Records)
| Description | Required | Related Control | Format |
|---|---|---|---|
| Change management policy (categories, approval requirements, freezes) | Required | WAF-OPS-050 | Document link (Wiki, Confluence, PDF) |
| Post-incident review policy (trigger, timeline, template, publication) | Required | WAF-OPS-070 | Document link |
| Operational debt register (version-controlled) | Required | WAF-OPS-100 | Git file (`ops-debt-register.yml`) |
| SLO definitions for all critical services | Required | WAF-OPS-040 | Document link or YAML file |
| Deployment freeze policy (critical business periods) | Optional | WAF-OPS-050 | Calendar configuration or policy document |
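Because the operational debt register lives in Git, it can be validated before each quarterly review. This page does not define the register's schema, so the field names below (`id`, `title`, `owner`, `status`, `last_reviewed`) and the 90-day review window are hypothetical; the sketch takes entries already loaded from the YAML file as a list of dictionaries.

```python
from datetime import date, timedelta

# Hypothetical schema: these field names are illustrative, not a standard.
REQUIRED_FIELDS = ("id", "title", "owner", "status", "last_reviewed")


def validate_debt_register(entries: list[dict], today: date, max_age_days: int = 90) -> list[str]:
    """Return problems found in an operational debt register (hypothetical helper)."""
    problems = []
    for i, entry in enumerate(entries):
        label = entry.get("id") or f"entry #{i}"
        missing = [f for f in REQUIRED_FIELDS if not entry.get(f)]
        if missing:
            problems.append(f"{label}: missing {', '.join(missing)}")
            continue
        if entry["status"] == "closed":
            continue  # closed items need no further review
        reviewed = entry["last_reviewed"]
        # YAML loaders may already return a date; accept ISO strings too.
        if isinstance(reviewed, str):
            reviewed = date.fromisoformat(reviewed)
        # Open items should have been reviewed within the last quarter.
        if today - reviewed > timedelta(days=max_age_days):
            problems.append(f"{label}: not reviewed since {reviewed.isoformat()}")
    return problems
```

Running a check like this in CI would keep the register consistent between reviews, though the schema itself is a team decision.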
Metrics Evidence (Measurement Records)
| Description | Required | Related Control | Format |
|---|---|---|---|
| Drift detection log (last 90 days, with resolution times) | Optional | WAF-OPS-090 | CSV export or ticket history |
| Toil hours report (weekly, per engineer) | Optional | WAF-OPS-100 | Table or survey results |
| Repeat incident rate (same incident class recurring) | Optional | WAF-OPS-070 | Incident management system report |
| Sprint capacity allocation for debt reduction | Optional | WAF-OPS-100 | Sprint planning export |
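The repeat incident rate can be derived from an incident export once each incident is tagged with a class. The sketch assumes one reporting period and one definition among several possible ones: the first incident of a class is new, and every further incident of that class in the period is a repeat. The function name is hypothetical.

```python
from collections import Counter


def repeat_incident_rate(incident_classes: list[str]) -> dict:
    """Repeat incident rate for one reporting period (hypothetical helper).

    Assumed definition: the first incident of each class is new; every
    further incident of the same class in the period counts as a repeat.
    """
    counts = Counter(incident_classes)
    total = len(incident_classes)
    repeats = total - len(counts)  # occurrences beyond the first per class
    return {
        "total": total,
        "repeats": repeats,
        "rate": repeats / total if total else 0.0,
        "recurring_classes": sorted(c for c, n in counts.items() if n > 1),
    }
```

The `recurring_classes` list links this metric back to WAF-OPS-070: each recurring class suggests a postmortem action item that did not prevent recurrence.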
Audit Checklist
A quick checklist for auditors and self-assessing teams:
Automation & IaC
- Pipeline definitions are in version control and contain no inline secrets
- Terraform remote state is configured, and no local state file is present
- Branch protection prevents direct pushes to `main`/`master`
Observability & Alerting
- Log groups have retention policies (at least 30 days)
- Alerts reference symptom-based metrics (error rate, latency)
- All paging alerts have runbook URLs in their description
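The log retention item can be checked against exported log group data. The sketch assumes input shaped like a CloudWatch Logs `DescribeLogGroups` result (`logGroups` entries with `logGroupName` and an optional `retentionInDays`). A group without `retentionInDays` never expires; the sketch flags it anyway because the checklist asks for an explicit retention policy. The function name and that interpretation are assumptions.

```python
def log_groups_failing_retention(response: dict, min_days: int = 30) -> list[str]:
    """Flag log groups that fail the checklist's retention item (hypothetical helper).

    `response` is assumed to look like a DescribeLogGroups result:
    {"logGroups": [{"logGroupName": ..., "retentionInDays": ...}, ...]}.
    """
    failing = []
    for group in response.get("logGroups", []):
        days = group.get("retentionInDays")
        # Missing retentionInDays means "never expire": no explicit policy,
        # so it is treated as failing here (an interpretation, not a rule).
        if days is None or days < min_days:
            failing.append(group["logGroupName"])
    return failing
```

The API result is paginated; with boto3, each page from `get_paginator("describe_log_groups")` could be passed to this function in turn and the results combined.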