Best Practice: Building and Securing a CI/CD Pipeline
Context
A CI/CD pipeline is the foundation of every Operational Excellence initiative. Without automated deployments, all other OpsEx measures take more effort, are more error-prone, and are harder to scale.
This best practice describes how to build a production-ready pipeline, from the pipeline definition through approval gates and branch protection to artifact versioning.
Target State
A production-ready CI/CD pipeline:
- Is fully defined as code (YAML, HCL) and stored in version control
- Runs on every pull request (tests, linting, security scans)
- Deploys automatically to staging after merge to main
- Requires manual approval for production deployments
- Uses versioned, immutable artifacts
- Has a deployment freeze mechanism for critical periods
Technical Implementation
Step 1: Define the Pipeline Structure
# .github/workflows/ci-cd.yml (GitHub Actions)
name: CI/CD Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

env:
  AWS_REGION: eu-central-1
  ECR_REPOSITORY: payment-service

jobs:
  # ===== STAGE 1: Lint & Test =====
  test:
    name: Lint, Test & Security Scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint
        run: make lint
      - name: Unit Tests
        run: make test-unit
      - name: Security Scan (Trivy)
        # @master is a mutable ref; pin to a release tag or commit SHA in real use.
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scanners: 'vuln,secret'  # replaces the deprecated security-checks input
          exit-code: '1'
          severity: 'HIGH,CRITICAL'
  # ===== STAGE 2: Build & Publish =====
  build:
    name: Build & Push Container Image
    runs-on: ubuntu-latest
    needs: test
    if: github.ref == 'refs/heads/main'
    permissions:
      id-token: write  # needed when role-to-assume is used with GitHub OIDC
      contents: read
    outputs:
      image-tag: ${{ steps.meta.outputs.tags }}
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Login to ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2
      - name: Docker Meta (tag with Git SHA)
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ steps.ecr-login.outputs.registry }}/${{ env.ECR_REPOSITORY }}
          tags: |
            type=sha,prefix=sha-,format=short
      - name: Build and Push
        id: build
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ${{ steps.meta.outputs.tags }}
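  # ===== STAGE 3: Deploy to Staging =====
  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: build
    environment: staging
    permissions:
      id-token: write  # needed when role-to-assume is used with GitHub OIDC
      contents: read
    steps:
      - uses: actions/checkout@v4  # the smoke test needs the Makefile
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Deploy to ECS Staging
        # --force-new-deployment restarts tasks with the current task definition.
        # It rolls out the new image only if that task definition resolves to it;
        # with immutable sha- tags, register a new task definition revision first.
        run: |
          aws ecs update-service \
            --cluster payment-staging \
            --service payment-service \
            --force-new-deployment
      - name: Wait for Stable Deployment
        run: |
          aws ecs wait services-stable \
            --cluster payment-staging \
            --services payment-service
      - name: Smoke Test
        run: make smoke-test ENVIRONMENT=staging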
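  # ===== STAGE 4: Deploy to Production (Manual Gate) =====
  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: [build, deploy-staging]  # build is listed so needs.build.outputs resolves
    environment: production  # environment protection rules = approval gate
    permissions:
      id-token: write  # needed when role-to-assume is used with GitHub OIDC
      contents: read
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ env.AWS_REGION }}
      - name: Deploy to ECS Production (Canary)
        # `aws deploy create-deployment` has no --image or --container-* flags.
        # For ECS, the image is set in the task definition that the AppSpec names.
        # revision.json is a hypothetical file (AppSpecContent revision) that an
        # earlier step, not shown here, is assumed to render from IMAGE.
        env:
          IMAGE: ${{ needs.build.outputs.image-tag }}
        run: |
          aws deploy create-deployment \
            --application-name payment-service \
            --deployment-group-name production \
            --revision file://revision.json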
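The `type=sha,prefix=sha-,format=short` rule ties every image to exactly one commit. Scripts outside the workflow (rollback tooling, audits) sometimes need to rebuild the same reference from a commit SHA. The following is a minimal sketch of that convention, not part of the pipeline itself: `image_ref` is a hypothetical helper name, and it assumes the default 7-character short SHA.

```shell
# image_ref REGISTRY REPOSITORY GIT_SHA
# Prints REGISTRY/REPOSITORY:sha-<first 7 characters of GIT_SHA>.
image_ref() {
  ref_registry=$1
  ref_repository=$2
  ref_sha=$3
  # Accept only a full 40-character lowercase hex commit SHA.
  if ! printf '%s' "$ref_sha" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "image_ref: expected a full 40-character commit SHA" >&2
    return 1
  fi
  ref_short=$(printf '%s' "$ref_sha" | cut -c1-7)
  printf '%s/%s:sha-%s\n' "$ref_registry" "$ref_repository" "$ref_short"
}
```

Called as `image_ref "$REGISTRY" payment-service "$(git rev-parse HEAD)"`, it yields the tag the build job would have pushed for that commit, provided the short-SHA length was left at its default.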
Step 2: Configure Branch Protection
// GitHub Branch Protection (via Terraform)
resource "github_branch_protection" "main" {
  repository_id = github_repository.app.node_id
  pattern       = "main"

  required_status_checks {
    strict   = true
    contexts = ["Lint, Test & Security Scan"]
  }

  required_pull_request_reviews {
    required_approving_review_count = 1
    require_code_owner_reviews      = true
    dismiss_stale_reviews           = true
  }

  enforce_admins      = true
  allows_force_pushes = false
  allows_deletions    = false
}
Step 3: Define CODEOWNERS
# CODEOWNERS – each line: path owners
# Changes to critical paths require review from the listed teams
# Note: GitHub expects teams as @org/team-name; the short names below stand for that form.
# The last matching pattern wins, so more specific paths go further down.

# Terraform Infrastructure
/infrastructure/ @platform-team @security-team

# CI/CD Pipelines
/.github/workflows/ @platform-team

# Application Code
/src/ @payment-team

# Security-sensitive configuration
/infrastructure/security/ @security-team
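GitHub resolves CODEOWNERS by taking the last pattern that matches a path, which is why `/infrastructure/security/` is listed after `/infrastructure/`. The sketch below illustrates that rule for a quick local check. `owners_for` is a hypothetical helper, and it only handles anchored directory patterns like the ones above, not the full glob syntax GitHub supports.

```shell
# owners_for CODEOWNERS_FILE PATH
# Prints the owners of the last rule whose directory pattern is a prefix of PATH.
# PATH is given relative to the repository root, without a leading slash.
owners_for() {
  awk -v path="/$2" '
    /^[ \t]*#/ { next }    # skip comment lines
    NF == 0    { next }    # skip blank lines
    {
      if (index(path, $1) == 1) {    # pattern is a prefix of the path
        owners = ""
        for (i = 2; i <= NF; i++) owners = owners (i > 2 ? " " : "") $i
      }
    }
    END { print owners }   # later matches have overwritten earlier ones
  ' "$1"
}
```

With the file above, `owners_for CODEOWNERS infrastructure/security/iam.tf` prints only the security team, because the later, more specific rule replaces the earlier one. The file name `iam.tf` is just an example path.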
Step 4: Configure a Deployment Freeze
# GitHub Actions: Deployment Freeze via Environment Protection
# In GitHub Settings > Environments > production:
#   - Required reviewers: @tech-leads
#   - Deployment branches: main only
#   - Wait timer: 0 minutes (reviewer requirement only)

# Alternative: programmatic freeze check in the pipeline
- name: Check Deployment Freeze
  run: |
    FREEZE_START=$(date -d "2025-12-20" +%s)
    FREEZE_END=$(date -d "2026-01-05" +%s)
    NOW=$(date +%s)
    if [ "$NOW" -ge "$FREEZE_START" ] && [ "$NOW" -le "$FREEZE_END" ]; then
      echo "DEPLOYMENT FREEZE ACTIVE until 2026-01-05"
      echo "Emergency deployments require CTO approval: ops-emergency@company.com"
      exit 1
    fi
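Hard-coded dates mean the workflow must be edited for every freeze. One possible variation, sketched here under assumptions, separates the window comparison from the date parsing so the window can come from configuration. `in_freeze`, `check_freeze`, `FREEZE_START` and `FREEZE_END` are hypothetical names. `check_freeze` relies on GNU `date`, as the step above does.

```shell
# in_freeze NOW START END  (all Unix epoch seconds)
# Exit status 0 when START <= NOW <= END, non-zero otherwise.
in_freeze() {
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# check_freeze: reads the window from FREEZE_START and FREEZE_END
# (hypothetical variable names; ISO dates interpreted as UTC).
# Returns 1 during a freeze, 2 if a date cannot be parsed, 0 otherwise.
check_freeze() {
  freeze_start=$(date -u -d "$FREEZE_START" +%s) || return 2
  freeze_end=$(date -u -d "$FREEZE_END" +%s) || return 2
  if in_freeze "$(date -u +%s)" "$freeze_start" "$freeze_end"; then
    echo "DEPLOYMENT FREEZE ACTIVE until $FREEZE_END"
    return 1
  fi
  return 0
}
```

Keeping `in_freeze` free of clock access makes the boundary behaviour easy to test: both the start and the end second count as frozen.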
Common Anti-Patterns
| Anti-Pattern | Problem |
|---|---|
| Mutable image tag (such as `latest`) | Not reproducible; every new build changes the deployed image without a deployment |
| Secrets in pipeline code | GitHub Actions logs are visible; secrets in … |
| No branch protection for `main` | Engineers push directly; no review; no tests |
| Pipeline without security scan | Vulnerabilities are only discovered in production |
| Manual deployment steps after the build | "Semi-automated" is not automated; the first manual step negates all automation gains |
| Pipeline YAML not in repository | "Deployment scripts" on a server – no review, no version history |
Metrics
Measure these metrics to evaluate pipeline maturity:
- Deployment Frequency: How often is code deployed to production? (target: daily or more often)
- Lead Time for Changes: How long does a change take from commit to production? (target: < 1 hour for a hotfix, < 1 day for a feature)
- Pipeline throughput time: Total pipeline duration (target: < 15 minutes)
- Pipeline success rate: Percentage of pipeline runs that are green (target: > 90%)
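Two of these metrics reduce to simple arithmetic once the raw data is exported. The sketch below shows one way to compute them. `success_rate` and `lead_time_minutes` are hypothetical helpers, and the input format (one run conclusion per line, timestamps as epoch seconds) is an assumption.

```shell
# success_rate: reads one run conclusion per line on stdin
# (for example "success" or "failure") and prints the percentage of successes.
success_rate() {
  awk '
    NF { total++; if ($1 == "success") ok++ }
    END {
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * ok / total
    }
  '
}

# lead_time_minutes COMMIT_EPOCH DEPLOY_EPOCH
# Prints the whole minutes between a commit and its production deployment.
lead_time_minutes() {
  echo $(( ($2 - $1) / 60 ))
}
```

The conclusions could come, for example, from `gh run list --workflow ci-cd.yml --branch main --json conclusion --jq '.[].conclusion'` piped into `success_rate`. A result above 90 meets the target stated above.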
Maturity Levels
| Level | Characteristics |
|---|---|
| Level 1 | Deployments via SSH/console. No pipeline definition. No automated tests. |
| Level 2 | CI pipeline exists. Tests run automatically. Deployment scripts exist but are executed manually. |
| Level 3 | Full CI/CD. Branch protection. Approval gate for production. Artifacts versioned. |
| Level 4 | Deployment metrics measured. Canary/Blue-Green deployments. Automatic rollback. |
| Level 5 | DORA Elite: multiple times daily. Change Failure Rate < 5%. Continuous Deployment possible. |
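The Level 5 threshold for Change Failure Rate can be checked with integer arithmetic alone. This is a sketch under assumptions: `change_failure_rate` and `meets_level5_cfr` are hypothetical helpers, and what counts as a "failed" deployment (rollback, hotfix, incident) has to be defined by the team.

```shell
# change_failure_rate TOTAL_DEPLOYMENTS FAILED_DEPLOYMENTS
# Prints the share of failed deployments as a percentage.
change_failure_rate() {
  if [ "$1" -eq 0 ]; then
    echo "n/a"
    return 1
  fi
  awk -v total="$1" -v failed="$2" 'BEGIN { printf "%.1f\n", 100 * failed / total }'
}

# meets_level5_cfr TOTAL_DEPLOYMENTS FAILED_DEPLOYMENTS
# Exit status 0 when the Change Failure Rate is strictly below 5%.
meets_level5_cfr() {
  [ "$1" -gt 0 ] && [ $(( $2 * 100 )) -lt $(( $1 * 5 )) ]
}
```

For 40 deployments with 1 failure the rate is 2.5%, which passes. With 2 failures it is exactly 5% and no longer meets the "< 5%" criterion.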