WAF++

Security Design Patterns

The following six design patterns are proven solution architectures for common security challenges in cloud environments. Each pattern is directly linked to WAF-SEC controls.


Pattern 1: Hub-and-Spoke Network

A central firewall and shared services in a hub VPC; isolated spoke VPCs for individual workloads.

Problem

In multi-account or multi-VPC environments without a central network architecture:

  • Inconsistent security group configurations in every VPC

  • Uncontrolled east-west traffic between workloads

  • No central visibility into network flows

  • Difficult enforcement of egress control

Solution

Hub VPC (Shared Services / Transit)
├── AWS Network Firewall (or Palo Alto / Fortinet)
├── NAT Gateway (central internet egress)
├── VPN Gateway / Direct Connect
├── DNS (Route 53 Resolver)
└── Transit Gateway

    Spoke VPC 1 (Production)         Spoke VPC 2 (Staging)
    ├── App Tier (private)           ├── App Tier (private)
    └── Data Tier (private)          └── Data Tier (private)

Traffic flow:
  Spoke VPC → Transit Gateway → Hub VPC (Firewall) → Internet
  Spoke VPC ← Transit Gateway ← Hub VPC (Firewall) ← Internet

Implementation (AWS)

  • Transit Gateway: Connects all VPCs to the hub

  • AWS Network Firewall: In the hub for centralized L7 filtering

  • VPC Endpoints: In every spoke VPC for AWS services (no internet route needed)

  • RAM (Resource Access Manager): For shared subnets between accounts

  • WAF-SEC-050 – Network Segmentation & Security Group Hardening
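
The routing described above can be sketched in Terraform. This is a minimal sketch; resource names, referenced VPCs/subnets, and the decision to disable default route table association are illustrative assumptions:

```hcl
# Transit Gateway connecting hub and spoke VPCs (names and references are illustrative)
resource "aws_ec2_transit_gateway" "hub" {
  description                     = "central-tgw"
  default_route_table_association = "disable" # explicit route tables per spoke
  default_route_table_propagation = "disable"
}

resource "aws_ec2_transit_gateway_vpc_attachment" "spoke_prod" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.spoke_prod.id # assumed to exist elsewhere
  subnet_ids         = aws_subnet.spoke_prod_tgw[*].id
}

# Spoke route table: send all non-local traffic to the TGW (and on to the hub firewall)
resource "aws_route" "spoke_default" {
  route_table_id         = aws_route_table.spoke_prod_private.id
  destination_cidr_block = "0.0.0.0/0"
  transit_gateway_id     = aws_ec2_transit_gateway.hub.id
}
```

With default association/propagation disabled, each spoke's reachability is defined explicitly in TGW route tables, which is what makes the firewall inspection path enforceable.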

When to Apply?

  • Organizations with multiple AWS accounts (landing zone)

  • Workloads with different compliance requirements

  • When central egress control and firewall policy are required


Pattern 2: Just-in-Time (JIT) Privileged Access

No permanent admin rights. Elevated permissions are granted only for the duration of a specific task.

Problem

Standing admin roles are a standing attack surface:

  • Compromised admin credentials allow complete account takeover

  • Administrators accidentally perform destructive actions

  • Access review is difficult: "Does this person really need admin access?"

Solution

Normal state:
  User → IAM role with minimal read permissions (ReadOnly)

JIT request:
  User → JIT system → Request (justification, time period, scope)
                    → Approval by second person
                    → Temporary role elevation (1-4 hours)
                    → Automatic revocation after expiry
                    → Audit log of the entire process

Technical implementation:
  ├── AWS IAM Identity Center with Permission Set Elevation
  ├── HashiCorp Boundary + Vault for JIT credentials
  ├── Custom Lambda + SNS for approval workflow
  └── CyberArk / BeyondTrust (enterprise solution)

Implementation (AWS)

# JIT role: time-limited admin permissions
resource "aws_iam_role" "jit_admin" {
  name = "jit-admin-role"

  # A session may last at most 4 hours
  max_session_duration = 14400

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::${var.account_id}:role/jit-approval-system" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# CloudWatch alarm: Alert on JIT admin usage
resource "aws_cloudwatch_metric_alarm" "jit_admin_usage" {
  alarm_name = "jit-admin-role-assumed"
  # ... CloudTrail AssumeRole event for jit-admin-role
}

When to Apply?

  • Wherever administrators hold privileged cloud access

  • Especially in regulated environments (ISO 27001, SOC 2, GDPR)

  • As a replacement for permanent admin roles


Pattern 3: Secrets Injection Pattern

Secrets are loaded at runtime from a secrets store – never baked into images or stored in plaintext in configuration and task definitions.

Problem

Secrets end up in insecure places in many ways:

  • Directly in Terraform code (then in the state file)

  • As plaintext environment variables in task definitions (visible to anyone who can read the task definition)

  • Baked into Docker images (visible in the image layers)

  • Committed to Git repositories (in history forever)

Solution

Container start time:
  ECS Task → IAM Task Role (or IRSA on EKS)
           → Secrets Manager API (GetSecretValue)
           → Secret is injected as ENV variable INTO the container
           → Secret never leaves the AWS network (VPC endpoint)

OR: Sidecar pattern (Vault Agent):
  Vault Agent Sidecar → Vault Server (AppRole/Kubernetes Auth)
                      → Reads secret, writes to shared volume
                      → App container reads secret from file
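
The Vault Agent sidecar mentioned above is driven by an HCL configuration. A minimal sketch, assuming Kubernetes auth and a role named `myapp` (paths, role, and template names are hypothetical):

```hcl
# Vault Agent sidecar configuration (sketch; paths and role names are assumptions)
auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "myapp"
    }
  }

  sink "file" {
    config = {
      path = "/vault/token"
    }
  }
}

# Render the secret to a shared volume that the app container reads from
template {
  source      = "/vault/templates/db-creds.ctmpl"
  destination = "/vault/secrets/db-creds.json"
}
```

The app container mounts the same volume read-only and never talks to Vault itself.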

Implementation (AWS ECS)

# Secrets Manager Secret
resource "aws_secretsmanager_secret" "db_credentials" {
  name                    = "prod/myapp/db-credentials"
  recovery_window_in_days = 30
  kms_key_id              = aws_kms_key.secrets.arn  # CMK for secrets
}

# ECS Task Definition: Secret as container secret (not environment variable!)
resource "aws_ecs_task_definition" "app" {
  family                   = "myapp"
  task_role_arn            = aws_iam_role.app_task.arn
  execution_role_arn       = aws_iam_role.ecs_execution.arn

  container_definitions = jsonencode([{
    name  = "app"
    image = "${var.ecr_repository}:${var.image_tag}"

    # Correct: secrets from Secrets Manager
    secrets = [
      {
        name      = "DB_PASSWORD"
        valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:password::"
      },
      {
        name      = "DB_USERNAME"
        valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:username::"
      }
    ]
    # Not: environment = [{ name = "DB_PASSWORD", value = var.db_password }]
  }])
}
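
So that `GetSecretValue` calls never traverse the public internet (as the flow above notes), a Secrets Manager interface endpoint can be added. A sketch; the VPC, subnet, and security group references are assumptions:

```hcl
# Interface endpoint: Secrets Manager traffic stays inside the VPC
resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = aws_vpc.app.id # assumed to exist
  service_name        = "com.amazonaws.eu-central-1.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true # SDKs resolve to the endpoint automatically
}
```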

When to Apply?

  • For every application that needs database passwords, API keys or certificates

  • In CI/CD pipelines (OIDC instead of static access keys)

  • In all container-based workloads


Pattern 4: Immutable Infrastructure

Infrastructure is replaced by new versions – not repaired by manual changes. No SSH in production.

Problem

Mutable servers (modified directly in place) lead to:

  • Configuration drift: what is in the IaC code no longer matches the live system

  • Security gaps from forgotten manually applied hotfixes

  • Difficult forensics: what was changed before the incident?

  • SSH as a permanent attack surface

Solution

Change cycle with Immutable Infrastructure:

Old way (mutable):
  Developer → SSH access → Configuration change in live system

New way (immutable):
  Developer → Code change in repository
             → CI/CD pipeline (Terraform plan + apply)
             → New AMI baked (if EC2)
             → Blue/green deployment (swap to new AMI)
             → Old AMI decommissioned

If debugging needed:
  Analyze CloudWatch Logs (no SSH session)
  AWS Systems Manager Session Manager (if absolutely necessary, with full audit log)
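
The "new AMI baked" step above can be automated with Packer (HCL2). A sketch under stated assumptions – the source AMI filter, region, and provisioner steps are illustrative:

```hcl
# packer/app-ami.pkr.hcl – bake a patched, immutable AMI (illustrative values)
locals {
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "app" {
  region        = "eu-central-1"
  instance_type = "t3.medium"

  source_ami_filter {
    filters = {
      name = "al2023-ami-*-x86_64"
    }
    owners      = ["amazon"]
    most_recent = true
  }

  ssh_username = "ec2-user"
  ami_name     = "myapp-${local.timestamp}"
}

build {
  sources = ["source.amazon-ebs.app"]

  provisioner "shell" {
    inline = [
      "sudo dnf -y update",          # apply latest patches at bake time
      "sudo systemctl enable myapp", # app install steps omitted
    ]
  }
}
```

SSH is used only during the bake, against a throwaway build instance; the resulting AMI is deployed without any SSH key.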

Implementation

# EC2: No SSH key, SSM instead of SSH for debugging
resource "aws_instance" "app" {
  ami                  = var.app_ami  # Patched, scanned AMI
  instance_type        = "t3.medium"
  iam_instance_profile = aws_iam_instance_profile.ssm_access.name

  # No key_name = ... (no SSH key)

  # SSM Agent is pre-installed in modern AMIs
  # Access via: aws ssm start-session --target instance-id

  metadata_options {
    http_tokens = "required"  # Enforce IMDSv2
  }
}

# Security group: no SSH port 22 open
resource "aws_security_group" "app" {
  # No ingress rule on port 22
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

When to Apply?

  • For all EC2-based production workloads

  • For ECS/EKS nodes (always)

  • For all environments where compliance requirements apply


Pattern 5: Security Event Pipeline

CloudTrail → CloudWatch → SIEM → Alerting: complete, tamper-proof security event pipeline.

Problem

Without a structured security event pipeline:

  • Events are lost or cannot be evaluated

  • Attacks are detected too late or not at all

  • Forensics after an incident is impossible

  • Compliance evidence is missing

Solution

Security Event Pipeline:

AWS CloudTrail (API audit logs)
  ├── → S3 (tamper-proof, log file validation, Object Lock)
  └── → CloudWatch Logs
          ├── Metric Filter → CloudWatch Alarm → SNS → PagerDuty/Slack
          └── → Log Group (Retention: 365 days)

AWS GuardDuty (threat intelligence)
  └── → Security Hub (aggregates all findings)
          ├── → EventBridge (automatic response)
          └── → SIEM (Splunk / Elastic / Datadog)

VPC Flow Logs
  └── → S3 or CloudWatch Logs

AWS Config (configuration changes)
  └── → Security Hub findings

Alerting:
  Critical (immediate): Root login, GuardDuty High Severity → PagerDuty (24/7)
  High (< 1h): IAM policy change, Security Group 0.0.0.0/0 → Slack #security-alerts
  Medium (< 4h): Failed login attempts, unusual API calls → ticketing system

Implementation (Excerpt)

# CloudWatch Metric Filter: Root account login
resource "aws_cloudwatch_log_metric_filter" "root_login" {
  name           = "root-account-login"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name
  pattern        = "{ $.userIdentity.type = \"Root\" && $.eventName = \"ConsoleLogin\" }"

  metric_transformation {
    name      = "RootLoginCount"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "root_login_alarm" {
  alarm_name          = "root-account-login-detected"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = "1"
  metric_name         = "RootLoginCount"
  namespace           = "SecurityMetrics"
  period              = "60"
  statistic           = "Sum"
  threshold           = "1"
  alarm_actions       = [aws_sns_topic.security_critical.arn]
}

  • WAF-SEC-080 – Security Monitoring & Threat Detection
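
The alarm above needs an SNS topic and a subscription to actually reach on-call. A sketch; the topic name and the PagerDuty endpoint variable are assumptions:

```hcl
# SNS topic referenced by the alarm above (names and endpoints are assumptions)
resource "aws_sns_topic" "security_critical" {
  name              = "security-critical"
  kms_master_key_id = "alias/aws/sns" # encrypt notifications at rest
}

# Forward critical alerts to PagerDuty via its HTTPS events endpoint
resource "aws_sns_topic_subscription" "pagerduty" {
  topic_arn = aws_sns_topic.security_critical.arn
  protocol  = "https"
  endpoint  = var.pagerduty_integration_url # assumed variable
}
```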

When to Apply?

  • In all production environments (no exceptions)

  • Ideally also in staging/dev for early detection


Pattern 6: Policy-as-Code Gateway

OPA or wafpass in CI/CD as a quality gate – security violations are blocked before the merge.

Problem

Without an automated security gate in CI/CD:

  • Security vulnerabilities enter production unnoticed

  • Security reviews are time-consuming and often superficial

  • Inconsistent enforcement of security standards across teams

  • Compliance evidence is missing (no audit trail of PR checks)

Solution

Pull Request → CI/CD Pipeline

Pipeline steps:
  1. terraform fmt -check (formatting)
  2. terraform validate (syntax)
  3. wafpass check --pillar security (WAF++ controls)
        ├── Critical findings → Pipeline FAIL (PR is blocked)
        └── Medium findings → Warning (PR passes but with comment)
  4. tfsec / checkov (additional IaC security scanners)
  5. terraform plan (what would change?)
  6. OPA / Conftest (custom policies)
  7. Manual review (for security-relevant resources)
  8. terraform apply (after merge, automatic)

Implementation (GitHub Actions)

# .github/workflows/security-check.yml
name: Security Gate

on: [pull_request]

jobs:
  wafpass:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
      pull-requests: write

    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_GITHUB_ACTIONS_ROLE }}
          aws-region: eu-central-1

      - name: WAF++ Security Check
        run: |
          wafpass check \
            --pillar security \
            --path ./infrastructure \
            --format github-annotations \
            --fail-on critical

      - name: tfsec
        uses: aquasecurity/tfsec-action@v1
        with:
          soft_fail: true

      - name: checkov
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: infrastructure/
          framework: terraform
          soft_fail: false
          check: CKV_AWS_*

OPA/Conftest Custom Policies

# policies/no-public-s3.rego
package main

deny[msg] {
  resource := input.resource.aws_s3_bucket[name]
  resource.acl == "public-read"
  msg := sprintf("S3 Bucket '%s' has public-read ACL – forbidden", [name])
}

deny[msg] {
  resource := input.resource.aws_security_group[name]
  rule := resource.ingress[_]
  rule.cidr_blocks[_] == "0.0.0.0/0"
  rule.from_port == 0
  rule.to_port == 65535
  msg := sprintf("Security Group '%s' has open ingress rule – forbidden", [name])
}

  • WAF-SEC-090 – Policy-as-Code & Compliance Automation

When to Apply?

  • In every infrastructure repository with IaC (Terraform, OpenTofu, Pulumi)

  • Ideally from the first commit (greenfield)

  • For brownfield: introduce incrementally, first in warning mode, then blocking