Security Design Patterns
The following six design patterns are proven solution architectures for common security challenges in cloud environments. Each pattern is directly linked to WAF-SEC controls.
Pattern 1: Hub-and-Spoke Network
Central firewall and control, isolated VPCs for individual workloads.
Problem
In multi-account or multi-VPC environments without a central network architecture:
- Inconsistent security group configurations in every VPC
- Uncontrolled east-west traffic between workloads
- No central visibility into network flows
- Difficult enforcement of egress control
Solution
Hub VPC (Shared Services / Transit)
├── AWS Network Firewall (or Palo Alto / Fortinet)
├── NAT Gateway (central internet egress)
├── VPN Gateway / Direct Connect
├── DNS (Route 53 Resolver)
└── Transit Gateway

Spoke VPC 1 (Production)        Spoke VPC 2 (Staging)
├── App Tier (private)          ├── App Tier (private)
└── Data Tier (private)         └── Data Tier (private)

Traffic flow:
Spoke VPC → Transit Gateway → Hub VPC (Firewall) → Internet
Spoke VPC ← Transit Gateway ← Hub VPC (Firewall) ← Internet
Implementation (AWS)
- Transit Gateway: Connects all VPCs to the hub
- AWS Network Firewall: In the hub for centralized L7 filtering
- VPC Endpoints: In every spoke VPC for AWS services (no internet route needed)
- RAM (Resource Access Manager): For sharing subnets across accounts
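The building blocks above can be sketched in Terraform. This is a minimal, hedged excerpt, not a complete deployment; resource names such as `aws_vpc.spoke_prod` and `aws_subnet.spoke_prod_tgw` are illustrative and assumed to exist elsewhere in the configuration:

```hcl
# Transit Gateway as the central hub router
resource "aws_ec2_transit_gateway" "hub" {
  description = "Central hub for all spoke VPCs"

  # Disable default association/propagation so all routing is explicit
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
}

# Attach a spoke VPC (repeat per spoke; names are illustrative)
resource "aws_ec2_transit_gateway_vpc_attachment" "spoke_prod" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.spoke_prod.id
  subnet_ids         = aws_subnet.spoke_prod_tgw[*].id
}

# Spoke default route: all egress goes to the Transit Gateway
# (and from there through the hub firewall)
resource "aws_route" "spoke_prod_egress" {
  route_table_id         = aws_route_table.spoke_prod.id
  destination_cidr_block = "0.0.0.0/0"
  transit_gateway_id     = aws_ec2_transit_gateway.hub.id
}
```

Disabling the default route tables forces every spoke-to-spoke and spoke-to-internet path to be declared explicitly, which is what makes the firewall in the hub unavoidable.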
Related Controls
- WAF-SEC-050 – Network Segmentation & Security Group Hardening
Pattern 2: Just-in-Time (JIT) Privileged Access
No permanent admin rights. Elevated permissions are granted only for the duration of a specific task.
Problem
Standing admin roles are a standing attack surface:
- Compromised admin credentials allow complete account takeover
- Administrators can accidentally perform destructive actions
- Access reviews are difficult: "Does this person really need admin access?"
Solution
Normal state:
User → IAM role with minimal read permissions (ReadOnly)

JIT request:
User → JIT system → Request (justification, time period, scope)
      → Approval by a second person
      → Temporary role elevation (1-4 hours)
      → Automatic revocation after expiry
      → Audit log of the entire process

Technical implementation options:
├── AWS IAM Identity Center with Permission Set Elevation
├── HashiCorp Boundary + Vault for JIT credentials
├── Custom Lambda + SNS for approval workflow
└── CyberArk / BeyondTrust (enterprise solution)
Implementation (AWS)
# JIT role: time-limited admin permissions
resource "aws_iam_role" "jit_admin" {
  name = "jit-admin-role"

  # Sessions may last at most 4 hours (the default cap is 1 hour)
  max_session_duration = 14400

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::${var.account_id}:role/jit-approval-system" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# CloudWatch alarm: alert on JIT admin usage
resource "aws_cloudwatch_metric_alarm" "jit_admin_usage" {
  alarm_name = "jit-admin-role-assumed"
  # ... CloudTrail AssumeRole event for jit-admin-role
}
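The abbreviated alarm above could be completed with a metric filter on the CloudTrail log group. This is a sketch under assumptions: the log group resource name `aws_cloudwatch_log_group.cloudtrail` and the SNS topic `aws_sns_topic.security_alerts` are illustrative and must exist in your configuration:

```hcl
# Metric filter: count AssumeRole events for the JIT admin role
resource "aws_cloudwatch_log_metric_filter" "jit_admin_assume" {
  name           = "jit-admin-role-assumed"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name
  pattern        = "{ $.eventName = \"AssumeRole\" && $.requestParameters.roleArn = \"*jit-admin-role*\" }"

  metric_transformation {
    name      = "JitAdminAssumeCount"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}

# Alarm: every JIT elevation is expected, but should always be visible
resource "aws_cloudwatch_metric_alarm" "jit_admin_usage" {
  alarm_name          = "jit-admin-role-assumed"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "JitAdminAssumeCount"
  namespace           = "SecurityMetrics"
  period              = 60
  statistic           = "Sum"
  threshold           = 1
  alarm_actions       = [aws_sns_topic.security_alerts.arn]
}
```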
Related Controls
- WAF-SEC-010 – IAM Baseline
- WAF-SEC-020 – Least Privilege
Pattern 3: Secrets Injection Pattern
Secrets are loaded at runtime from a secrets store – never baked into images or hard-coded as ENV variables in deployment definitions.
Problem
Secrets end up in insecure places in many ways:
- Directly in Terraform code (and thus in the state file)
- As environment variables in task definitions (visible in plaintext in the task definition JSON)
- Baked into Docker images (visible in the image layers)
- Committed to Git repositories (in the history forever)
Solution
Container start time:
ECS Task → IAM Task Role (credentials via the task role; IRSA is the EKS equivalent)
         → Secrets Manager API (GetSecretValue)
         → Secret is injected as ENV variable INTO the container
         → Secret never leaves the AWS network (VPC endpoint)

OR: Sidecar pattern (Vault Agent):
Vault Agent Sidecar → Vault Server (AppRole/Kubernetes Auth)
                    → Reads secret, writes to shared volume
                    → App container reads secret from file
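For the sidecar variant, the Vault Agent could run with a configuration along these lines. This is a sketch, not a reference configuration: the Vault address, the Kubernetes auth role `myapp`, and the template paths are illustrative assumptions:

```hcl
# vault-agent.hcl – sidecar configuration (illustrative sketch)
vault {
  address = "https://vault.internal:8200" # Assumed internal Vault endpoint
}

auto_auth {
  # Authenticate via the pod's Kubernetes service account
  method "kubernetes" {
    config = {
      role = "myapp" # Illustrative Vault role name
    }
  }

  sink "file" {
    config = {
      path = "/vault/token" # Token sink for the agent itself
    }
  }
}

# Render the secret to a shared volume that the app container mounts
template {
  source      = "/vault/templates/db-creds.ctmpl"
  destination = "/vault/secrets/db-creds.json"
}
```

The app container then only reads a file from the shared volume and never talks to Vault directly.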
Implementation (AWS ECS)
# Secrets Manager secret
resource "aws_secretsmanager_secret" "db_credentials" {
  name                    = "prod/myapp/db-credentials"
  recovery_window_in_days = 30
  kms_key_id              = aws_kms_key.secrets.arn # CMK for secrets
}

# ECS task definition: secret as container secret (not an environment variable!)
resource "aws_ecs_task_definition" "app" {
  family             = "myapp"
  task_role_arn      = aws_iam_role.app_task.arn
  execution_role_arn = aws_iam_role.ecs_execution.arn

  container_definitions = jsonencode([{
    name  = "app"
    image = "${var.ecr_repository}:${var.image_tag}"

    # Correct: secrets from Secrets Manager
    secrets = [
      {
        name      = "DB_PASSWORD"
        valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:password::"
      },
      {
        name      = "DB_USERNAME"
        valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:username::"
      }
    ]

    # Not: environment = [{ name = "DB_PASSWORD", value = var.db_password }]
  }])
}
Related Controls
- WAF-SEC-060 – Secrets Management
Pattern 4: Immutable Infrastructure
Infrastructure is replaced by new versions – not repaired by manual changes. No SSH in production.
Problem
Mutable servers (directly configured) lead to:
- Configuration drift: what is in the IaC code no longer matches the live system
- Security gaps from forgotten, manually applied hotfixes
- Difficult forensics: what was changed before the incident?
- SSH as a permanent attack surface
Solution
Change cycle with immutable infrastructure:

Old way (mutable):
Developer → SSH access → Configuration change in the live system

New way (immutable):
Developer → Code change in the repository
          → CI/CD pipeline (terraform plan + apply)
          → New AMI baked (if EC2)
          → Blue/green deployment (swap to the new AMI)
          → Old AMI decommissioned

If debugging is needed:
- Analyze CloudWatch Logs (no SSH session)
- AWS Systems Manager Session Manager (if absolutely necessary, with a full audit log)
Implementation
# EC2: no SSH key; SSM instead of SSH for debugging
resource "aws_instance" "app" {
  ami                  = var.app_ami # Patched, scanned AMI
  instance_type        = "t3.medium"
  iam_instance_profile = aws_iam_instance_profile.ssm_access.name
  # No key_name = ... (no SSH key)
  # The SSM Agent is pre-installed in modern AMIs
  # Access via: aws ssm start-session --target <instance-id>

  metadata_options {
    http_tokens = "required" # Enforce IMDSv2
  }
}

# Security group: SSH port 22 is not open
resource "aws_security_group" "app" {
  # No ingress rule on port 22
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
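The "replace, don't repair" cycle can be automated with an Auto Scaling group and instance refresh. A sketch under assumptions: the subnet reference `aws_subnet.app_private` is illustrative, and sizing values are placeholders:

```hcl
# Launch template referencing the current baked, scanned AMI
resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = var.app_ami
  instance_type = "t3.medium"
}

# Auto Scaling group: a new AMI triggers a rolling replacement,
# never an in-place change to running instances
resource "aws_autoscaling_group" "app" {
  name_prefix         = "app-"
  min_size            = 2
  max_size            = 4
  desired_capacity    = 2
  vpc_zone_identifier = aws_subnet.app_private[*].id

  launch_template {
    id      = aws_launch_template.app.id
    version = aws_launch_template.app.latest_version
  }

  # Rolling replacement whenever the launch template changes
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }
}
```

Because `version` tracks `latest_version`, updating `var.app_ami` in a PR is enough to roll the whole fleet onto the new image after merge.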
Related Controls
- WAF-SEC-010 – IAM Baseline (SSM instead of SSH)
- WAF-SEC-050 – Network Segmentation (no SSH port)
Pattern 5: Security Event Pipeline
CloudTrail → CloudWatch → SIEM → Alerting: complete, tamper-proof security event pipeline.
Problem
Without a structured security event pipeline:
- Events are lost or cannot be evaluated
- Attacks are detected too late or not at all
- Forensics after an incident is impossible
- Compliance evidence is missing
Solution
Security Event Pipeline:

AWS CloudTrail (API audit logs)
├── → S3 (tamper-proof: log file validation, Object Lock)
└── → CloudWatch Logs
    ├── → Metric Filter → CloudWatch Alarm → SNS → PagerDuty/Slack
    └── → Log Group (retention: 365 days)

AWS GuardDuty (threat intelligence)
└── → Security Hub (aggregates all findings)
    ├── → EventBridge (automatic response)
    └── → SIEM (Splunk / Elastic / Datadog)

VPC Flow Logs
└── → S3 or CloudWatch Logs

AWS Config (configuration changes)
└── → Security Hub findings
Alerting:
Critical (immediate): Root login, GuardDuty High Severity → PagerDuty (24/7)
High (< 1h): IAM policy change, Security Group 0.0.0.0/0 → Slack #security-alerts
Medium (< 4h): Failed login attempts, unusual API calls → ticketing system
Implementation (Excerpt)
# CloudWatch metric filter: root account login
resource "aws_cloudwatch_log_metric_filter" "root_login" {
  name           = "root-account-login"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name
  pattern        = "{ $.userIdentity.type = \"Root\" && $.eventName = \"ConsoleLogin\" }"

  metric_transformation {
    name      = "RootLoginCount"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "root_login_alarm" {
  alarm_name          = "root-account-login-detected"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "RootLoginCount"
  namespace           = "SecurityMetrics"
  period              = 60
  statistic           = "Sum"
  threshold           = 1
  alarm_actions       = [aws_sns_topic.security_critical.arn]
}
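The "automatic response" branch of the pipeline could be wired up via EventBridge. This sketch forwards high-severity GuardDuty findings to the critical SNS topic; the severity threshold of 7 and the rule names are illustrative choices:

```hcl
# EventBridge rule: GuardDuty findings with severity >= 7 (High)
resource "aws_cloudwatch_event_rule" "guardduty_high" {
  name = "guardduty-high-severity"

  event_pattern = jsonencode({
    source      = ["aws.guardduty"]
    detail-type = ["GuardDuty Finding"]
    detail = {
      severity = [{ numeric = [">=", 7] }]
    }
  })
}

# Target: publish matching findings to the critical alerting topic
resource "aws_cloudwatch_event_target" "guardduty_to_sns" {
  rule = aws_cloudwatch_event_rule.guardduty_high.name
  arn  = aws_sns_topic.security_critical.arn
}
```

The same rule could instead target a Lambda function for automated containment (e.g. quarantining a security group), which is the usual next step once alerting is stable.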
Related Controls
- WAF-SEC-080 – Security Monitoring & Threat Detection
Pattern 6: Policy-as-Code Gateway
OPA or wafpass runs in CI/CD as a quality gate – security violations are blocked before merge.
Problem
Without automated security gate in CI/CD:
- Security vulnerabilities enter production unnoticed
- Security reviews are time-consuming and often superficial
- Inconsistent enforcement of security standards across teams
- Compliance evidence is missing (no audit trail of PR checks)
Solution
Pull Request → CI/CD Pipeline

Pipeline steps:
1. terraform fmt --check (formatting)
2. terraform validate (syntax)
3. wafpass check --pillar security (WAF++ controls)
   ├── Critical findings → Pipeline FAIL (PR is blocked)
   └── Medium findings → Warning (PR passes, with a comment)
4. tfsec / checkov (additional IaC security scanners)
5. terraform plan (what would change?)
6. OPA / Conftest (custom policies)
7. Manual review (for security-relevant resources)
8. terraform apply (after merge, automatic)
Implementation (GitHub Actions)
# .github/workflows/security-check.yml
name: Security Gate

on: [pull_request]

jobs:
  wafpass:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_GITHUB_ACTIONS_ROLE }}
          aws-region: eu-central-1

      - name: WAF++ Security Check
        run: |
          wafpass check \
            --pillar security \
            --path ./infrastructure \
            --format github-annotations \
            --fail-on critical

      - name: tfsec
        uses: aquasecurity/tfsec-action@v1
        with:
          soft_fail: true

      - name: checkov
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: infrastructure/
          framework: terraform
          soft_fail: false
          check: CKV_AWS_*
OPA/Conftest Custom Policies
# policies/no-public-s3.rego
package main

deny[msg] {
  resource := input.resource.aws_s3_bucket[name]
  resource.acl == "public-read"
  msg := sprintf("S3 bucket '%s' has a public-read ACL – forbidden", [name])
}

deny[msg] {
  resource := input.resource.aws_security_group[name]
  rule := resource.ingress[_]
  rule.cidr_blocks[_] == "0.0.0.0/0"
  rule.from_port == 0
  rule.to_port == 65535
  msg := sprintf("Security group '%s' has an open ingress rule – forbidden", [name])
}
Related Controls
- WAF-SEC-090 – Policy-as-Code & Compliance Automation