Best Practices: Operational Excellence
The following best practices translate the theoretical controls into concrete implementation guides. Each best practice includes Terraform examples, CI configurations, common anti-patterns, and maturity level indicators.
Overview of Best Practices
| Best Practice | Description | Related Controls |
|---|---|---|
| CI/CD Pipeline | Pipeline-as-Code, branch protection, approval gates, artifact versioning, deployment automation | |
| Infrastructure as Code | Terraform remote state, module libraries, drift detection, brownfield migration, GitOps | |
| Observability Stack | Structured logging, distributed tracing, RED metrics, OpenTelemetry, dashboards, log retention | |
| Symptom-Based Alerting | SLO definition, burn-rate alerting, runbook linking, alert fatigue management | |
| Runbooks | Runbook template, versioning, review cadence, operational debt register | |
| Postmortems | Postmortem process, blameless culture, action item tracking, trend analysis | |
| Safe Deployments | Progressive Delivery, feature flag management, automatic rollback, deployment strategy | |
Recommended Reading Order
For teams at the beginning of the OpsEx journey (Level 1 → 2)
- CI/CD Pipeline – Without a pipeline, no progress
- Observability Stack – Visibility as the next priority
- Runbooks – Codify knowledge before it is lost
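As a rough illustration of the structured logging topic covered by the Observability Stack practice, the sketch below emits one JSON object per log line using only the Python standard library. The names `JsonFormatter`, `build_logger`, and the `context` key are assumptions made for this example and are not taken from the linked guide.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"context": {...}} become top-level keys.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False     # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger
```

A call such as `build_logger("checkout", sys.stdout).info("order placed", extra={"context": {"order_id": "A-17"}})` would then produce a line that log pipelines can parse without regular expressions.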
For teams on the path to automation (Level 2 → 3)
- Infrastructure as Code – All infrastructure in code
- Symptom-Based Alerting – Combat alert fatigue
- Postmortems – Learn systematically from failures
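The burn-rate alerting mentioned under Symptom-Based Alerting can be sketched in a few lines. The burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). The example assumes a 99.9% SLO and a multi-window rule with a threshold of 14.4, a commonly cited value for a 1-hour window on a 30-day SLO. The function names and default values are illustrative assumptions, not settings prescribed by this guide.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is consumed (1.0 = exactly on budget)."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float = 0.999,   # assumed SLO, adjust per service
    threshold: float = 14.4,     # assumed fast-burn threshold
) -> bool:
    """Page only if both windows burn fast, so brief spikes do not alert."""
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )
```

Requiring the short window to agree with the long one is what reduces alert fatigue here: the alert clears soon after the symptom stops, instead of firing until the long window has drained.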
For teams on the path to continuous improvement (Level 3 → 5)
- Safe Deployments – Minimize blast radius
- Operational Debt Register – Make toil visible and reduce it
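To make the automatic rollback idea behind Safe Deployments more concrete, here is a minimal sketch of a canary decision. It compares the canary's error ratio against the baseline and returns a verdict. The type `WindowStats`, the function `canary_verdict`, and both thresholds are hypothetical and chosen only for illustration. Real progressive delivery tooling typically evaluates more signals than error ratio alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Request and error counts observed in one evaluation window."""
    requests: int
    errors: int

    @property
    def error_ratio(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(
    baseline: WindowStats,
    canary: WindowStats,
    max_ratio_increase: float = 0.01,  # assumed tolerance over baseline
    min_requests: int = 100,           # assumed minimum sample size
) -> str:
    """Return "wait", "rollback", or "promote" for the canary."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge yet
    if canary.error_ratio > baseline.error_ratio + max_ratio_increase:
        return "rollback"
    return "promote"
```

Keeping the canary's traffic share small while this check runs is what limits the blast radius: a bad release is rolled back after affecting only the requests routed to the canary.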