Security Design Patterns
The following six design patterns are proven solution architectures for common security challenges in cloud environments. Each pattern is directly linked to WAF-SEC controls.
Pattern 1: Hub-and-Spoke Network
Central firewall and control, isolated VPCs for individual workloads.
Problem
In multi-account or multi-VPC environments without a central network architecture:
- Inconsistent security group configurations in every VPC
- Uncontrolled east-west traffic between workloads
- No central visibility into network flows
- Difficult enforcement of egress control
Solution
Hub VPC (Shared Services / Transit)
├── AWS Network Firewall (or Palo Alto / Fortinet)
├── NAT Gateway (central internet egress)
├── VPN Gateway / Direct Connect
├── DNS (Route 53 Resolver)
└── Transit Gateway

Spoke VPC 1 (Production)        Spoke VPC 2 (Staging)
├── App Tier (private)          ├── App Tier (private)
└── Data Tier (private)         └── Data Tier (private)

Traffic flow:
Spoke VPC → Transit Gateway → Hub VPC (Firewall) → Internet
Spoke VPC ← Transit Gateway ← Hub VPC (Firewall) ← Internet
Implementation (AWS)
- Transit Gateway: Connects all VPCs to the hub
- AWS Network Firewall: In the hub for centralized L7 filtering
- VPC Endpoints: In every spoke VPC for AWS services (no internet route needed)
- RAM (Resource Access Manager): For sharing subnets across accounts
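The building blocks above can be sketched in Terraform. This is a minimal, hedged excerpt, not a complete deployment; resource names such as `aws_vpc.spoke_prod` and `aws_subnet.spoke_prod_tgw` are illustrative and assumed to exist elsewhere in the configuration:

```hcl
# Transit Gateway as the central hub router
resource "aws_ec2_transit_gateway" "hub" {
  description = "Central hub for all spoke VPCs"

  # Disable default association/propagation so all routing is explicit
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
}

# Attach a spoke VPC (repeat per spoke; names are illustrative)
resource "aws_ec2_transit_gateway_vpc_attachment" "spoke_prod" {
  transit_gateway_id = aws_ec2_transit_gateway.hub.id
  vpc_id             = aws_vpc.spoke_prod.id
  subnet_ids         = aws_subnet.spoke_prod_tgw[*].id
}

# Spoke default route: all egress goes to the Transit Gateway
# (and from there through the hub firewall)
resource "aws_route" "spoke_prod_egress" {
  route_table_id         = aws_route_table.spoke_prod.id
  destination_cidr_block = "0.0.0.0/0"
  transit_gateway_id     = aws_ec2_transit_gateway.hub.id
}
```

Disabling the default route tables forces every spoke-to-spoke and spoke-to-internet path to be declared explicitly, which is what makes the firewall in the hub unavoidable.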
Related Controls
- WAF-SEC-050 – Network Segmentation & Security Group Hardening
Pattern 2: Just-in-Time (JIT) Privileged Access
No permanent admin rights. Elevated permissions are granted only for the duration of a specific task.
Problem
Standing admin roles are a standing attack surface:
- Compromised admin credentials allow complete account takeover
- Administrators can accidentally perform destructive actions
- Access reviews are difficult: "Does this person really need admin access?"
Solution
Normal state:
User → IAM role with minimal read permissions (ReadOnly)

JIT request:
User → JIT system → Request (justification, time period, scope)
      → Approval by a second person
      → Temporary role elevation (1-4 hours)
      → Automatic revocation after expiry
      → Audit log of the entire process

Technical implementation options:
├── AWS IAM Identity Center with Permission Set Elevation
├── HashiCorp Boundary + Vault for JIT credentials
├── Custom Lambda + SNS for approval workflow
└── CyberArk / BeyondTrust (enterprise solution)
Implementation (AWS)
# JIT role: time-limited admin permissions
resource "aws_iam_role" "jit_admin" {
  name = "jit-admin-role"

  # Sessions may last at most 4 hours (the default cap is 1 hour)
  max_session_duration = 14400

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::${var.account_id}:role/jit-approval-system" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# CloudWatch alarm: alert on JIT admin usage
resource "aws_cloudwatch_metric_alarm" "jit_admin_usage" {
  alarm_name = "jit-admin-role-assumed"
  # ... CloudTrail AssumeRole event for jit-admin-role
}
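The abbreviated alarm above could be completed with a metric filter on the CloudTrail log group. This is a sketch under assumptions: the log group resource name `aws_cloudwatch_log_group.cloudtrail` and the SNS topic `aws_sns_topic.security_alerts` are illustrative and must exist in your configuration:

```hcl
# Metric filter: count AssumeRole events for the JIT admin role
resource "aws_cloudwatch_log_metric_filter" "jit_admin_assume" {
  name           = "jit-admin-role-assumed"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name
  pattern        = "{ $.eventName = \"AssumeRole\" && $.requestParameters.roleArn = \"*jit-admin-role*\" }"

  metric_transformation {
    name      = "JitAdminAssumeCount"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}

# Alarm: every JIT elevation is expected, but should always be visible
resource "aws_cloudwatch_metric_alarm" "jit_admin_usage" {
  alarm_name          = "jit-admin-role-assumed"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "JitAdminAssumeCount"
  namespace           = "SecurityMetrics"
  period              = 60
  statistic           = "Sum"
  threshold           = 1
  alarm_actions       = [aws_sns_topic.security_alerts.arn]
}
```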
Related Controls
- WAF-SEC-010 – IAM Baseline
- WAF-SEC-020 – Least Privilege
Pattern 3: Secrets Injection Pattern
Secrets are loaded at runtime from a secrets store – never baked into images or hard-coded as ENV variables in deployment definitions.
Problem
Secrets end up in insecure places in many ways:
- Directly in Terraform code (and thus in the state file)
- As environment variables in task definitions (visible in plaintext in the task definition JSON)
- Baked into Docker images (visible in the image layers)
- Committed to Git repositories (in the history forever)
Solution
Container start time:
ECS Task → IAM Task Role (credentials via the task role; IRSA is the EKS equivalent)
         → Secrets Manager API (GetSecretValue)
         → Secret is injected as ENV variable INTO the container
         → Secret never leaves the AWS network (VPC endpoint)

OR: Sidecar pattern (Vault Agent):
Vault Agent Sidecar → Vault Server (AppRole/Kubernetes Auth)
                    → Reads secret, writes to shared volume
                    → App container reads secret from file
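For the sidecar variant, the Vault Agent could run with a configuration along these lines. This is a sketch, not a reference configuration: the Vault address, the Kubernetes auth role `myapp`, and the template paths are illustrative assumptions:

```hcl
# vault-agent.hcl – sidecar configuration (illustrative sketch)
vault {
  address = "https://vault.internal:8200" # Assumed internal Vault endpoint
}

auto_auth {
  # Authenticate via the pod's Kubernetes service account
  method "kubernetes" {
    config = {
      role = "myapp" # Illustrative Vault role name
    }
  }

  sink "file" {
    config = {
      path = "/vault/token" # Token sink for the agent itself
    }
  }
}

# Render the secret to a shared volume that the app container mounts
template {
  source      = "/vault/templates/db-creds.ctmpl"
  destination = "/vault/secrets/db-creds.json"
}
```

The app container then only reads a file from the shared volume and never talks to Vault directly.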
Implementation (AWS ECS)
# Secrets Manager secret
resource "aws_secretsmanager_secret" "db_credentials" {
  name                    = "prod/myapp/db-credentials"
  recovery_window_in_days = 30
  kms_key_id              = aws_kms_key.secrets.arn # CMK for secrets
}

# ECS task definition: secret as container secret (not an environment variable!)
resource "aws_ecs_task_definition" "app" {
  family             = "myapp"
  task_role_arn      = aws_iam_role.app_task.arn
  execution_role_arn = aws_iam_role.ecs_execution.arn

  container_definitions = jsonencode([{
    name  = "app"
    image = "${var.ecr_repository}:${var.image_tag}"

    # Correct: secrets from Secrets Manager
    secrets = [
      {
        name      = "DB_PASSWORD"
        valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:password::"
      },
      {
        name      = "DB_USERNAME"
        valueFrom = "${aws_secretsmanager_secret.db_credentials.arn}:username::"
      }
    ]

    # Not: environment = [{ name = "DB_PASSWORD", value = var.db_password }]
  }])
}
Related Controls
- WAF-SEC-060 – Secrets Management
Pattern 4: Immutable Infrastructure
Infrastructure is replaced by new versions – not repaired by manual changes. No SSH in production.
Problem
Mutable servers (directly configured) lead to:
- Configuration drift: what is in the IaC code no longer matches the live system
- Security gaps from forgotten, manually applied hotfixes
- Difficult forensics: what was changed before the incident?
- SSH as a permanent attack surface
Solution
Change cycle with immutable infrastructure:

Old way (mutable):
Developer → SSH access → Configuration change in the live system

New way (immutable):
Developer → Code change in the repository
          → CI/CD pipeline (terraform plan + apply)
          → New AMI baked (if EC2)
          → Blue/green deployment (swap to the new AMI)
          → Old AMI decommissioned

If debugging is needed:
- Analyze CloudWatch Logs (no SSH session)
- AWS Systems Manager Session Manager (if absolutely necessary, with a full audit log)
Implementation
# EC2: no SSH key; SSM instead of SSH for debugging
resource "aws_instance" "app" {
  ami                  = var.app_ami # Patched, scanned AMI
  instance_type        = "t3.medium"
  iam_instance_profile = aws_iam_instance_profile.ssm_access.name
  # No key_name = ... (no SSH key)
  # The SSM Agent is pre-installed in modern AMIs
  # Access via: aws ssm start-session --target <instance-id>

  metadata_options {
    http_tokens = "required" # Enforce IMDSv2
  }
}

# Security group: SSH port 22 is not open
resource "aws_security_group" "app" {
  # No ingress rule on port 22
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
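The "replace, don't repair" cycle can be automated with an Auto Scaling group and instance refresh. A sketch under assumptions: the subnet reference `aws_subnet.app_private` is illustrative, and sizing values are placeholders:

```hcl
# Launch template referencing the current baked, scanned AMI
resource "aws_launch_template" "app" {
  name_prefix   = "app-"
  image_id      = var.app_ami
  instance_type = "t3.medium"
}

# Auto Scaling group: a new AMI triggers a rolling replacement,
# never an in-place change to running instances
resource "aws_autoscaling_group" "app" {
  name_prefix         = "app-"
  min_size            = 2
  max_size            = 4
  desired_capacity    = 2
  vpc_zone_identifier = aws_subnet.app_private[*].id

  launch_template {
    id      = aws_launch_template.app.id
    version = aws_launch_template.app.latest_version
  }

  # Rolling replacement whenever the launch template changes
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }
}
```

Because `version` tracks `latest_version`, updating `var.app_ami` in a PR is enough to roll the whole fleet onto the new image after merge.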
Related Controls
- WAF-SEC-010 – IAM Baseline (SSM instead of SSH)
- WAF-SEC-050 – Network Segmentation (no SSH port)
Pattern 5: Security Event Pipeline
CloudTrail → CloudWatch → SIEM → Alerting: complete, tamper-proof security event pipeline.
Problem
Without a structured security event pipeline:
- Events are lost or cannot be evaluated
- Attacks are detected too late or not at all
- Forensics after an incident is impossible
- Compliance evidence is missing
Solution
Security Event Pipeline:

AWS CloudTrail (API audit logs)
├── → S3 (tamper-proof: log file validation, Object Lock)
└── → CloudWatch Logs
    ├── → Metric Filter → CloudWatch Alarm → SNS → PagerDuty/Slack
    └── → Log Group (retention: 365 days)

AWS GuardDuty (threat intelligence)
└── → Security Hub (aggregates all findings)
    ├── → EventBridge (automatic response)
    └── → SIEM (Splunk / Elastic / Datadog)

VPC Flow Logs
└── → S3 or CloudWatch Logs

AWS Config (configuration changes)
└── → Security Hub findings
Alerting:
Critical (immediate): Root login, GuardDuty High Severity → PagerDuty (24/7)
High (< 1h): IAM policy change, Security Group 0.0.0.0/0 → Slack #security-alerts
Medium (< 4h): Failed login attempts, unusual API calls → ticketing system
Implementation (Excerpt)
# CloudWatch metric filter: root account login
resource "aws_cloudwatch_log_metric_filter" "root_login" {
  name           = "root-account-login"
  log_group_name = aws_cloudwatch_log_group.cloudtrail.name
  pattern        = "{ $.userIdentity.type = \"Root\" && $.eventName = \"ConsoleLogin\" }"

  metric_transformation {
    name      = "RootLoginCount"
    namespace = "SecurityMetrics"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "root_login_alarm" {
  alarm_name          = "root-account-login-detected"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods  = 1
  metric_name         = "RootLoginCount"
  namespace           = "SecurityMetrics"
  period              = 60
  statistic           = "Sum"
  threshold           = 1
  alarm_actions       = [aws_sns_topic.security_critical.arn]
}
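The "automatic response" branch of the pipeline could be wired up via EventBridge. This sketch forwards high-severity GuardDuty findings to the critical SNS topic; the severity threshold of 7 and the rule names are illustrative choices:

```hcl
# EventBridge rule: GuardDuty findings with severity >= 7 (High)
resource "aws_cloudwatch_event_rule" "guardduty_high" {
  name = "guardduty-high-severity"

  event_pattern = jsonencode({
    source      = ["aws.guardduty"]
    detail-type = ["GuardDuty Finding"]
    detail = {
      severity = [{ numeric = [">=", 7] }]
    }
  })
}

# Target: publish matching findings to the critical alerting topic
resource "aws_cloudwatch_event_target" "guardduty_to_sns" {
  rule = aws_cloudwatch_event_rule.guardduty_high.name
  arn  = aws_sns_topic.security_critical.arn
}
```

The same rule could instead target a Lambda function for automated containment (e.g. quarantining a security group), which is the usual next step once alerting is stable.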
Related Controls
- WAF-SEC-080 – Security Monitoring & Threat Detection
Pattern 6: Policy-as-Code Gateway
OPA or wafpass runs in CI/CD as a quality gate – security violations are blocked before merge.
Problem
Without automated security gate in CI/CD:
- Security vulnerabilities enter production unnoticed
- Security reviews are time-consuming and often superficial
- Inconsistent enforcement of security standards across teams
- Compliance evidence is missing (no audit trail of PR checks)
Solution
Pull Request → CI/CD Pipeline

Pipeline steps:
1. terraform fmt --check (formatting)
2. terraform validate (syntax)
3. wafpass check --pillar security (WAF++ controls)
   ├── Critical findings → Pipeline FAIL (PR is blocked)
   └── Medium findings → Warning (PR passes, with a comment)
4. tfsec / checkov (additional IaC security scanners)
5. terraform plan (what would change?)
6. OPA / Conftest (custom policies)
7. Manual review (for security-relevant resources)
8. terraform apply (after merge, automatic)
Implementation (GitHub Actions)
# .github/workflows/security-check.yml
name: Security Gate

on: [pull_request]

jobs:
  wafpass:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_GITHUB_ACTIONS_ROLE }}
          aws-region: eu-central-1

      - name: WAF++ Security Check
        run: |
          wafpass check \
            --pillar security \
            --path ./infrastructure \
            --format github-annotations \
            --fail-on critical

      - name: tfsec
        uses: aquasecurity/tfsec-action@v1
        with:
          soft_fail: true

      - name: checkov
        uses: bridgecrewio/checkov-action@v12
        with:
          directory: infrastructure/
          framework: terraform
          soft_fail: false
          check: CKV_AWS_*
OPA/Conftest Custom Policies
# policies/no-public-s3.rego
package main

deny[msg] {
  resource := input.resource.aws_s3_bucket[name]
  resource.acl == "public-read"
  msg := sprintf("S3 bucket '%s' has a public-read ACL – forbidden", [name])
}

deny[msg] {
  resource := input.resource.aws_security_group[name]
  rule := resource.ingress[_]
  rule.cidr_blocks[_] == "0.0.0.0/0"
  rule.from_port == 0
  rule.to_port == 65535
  msg := sprintf("Security group '%s' has an open ingress rule – forbidden", [name])
}
Related Controls
- WAF-SEC-090 – Policy-as-Code & Compliance Automation