Technical Design Principles (Security)

Technical design principles translate the abstract Security Principles into concrete architecture decisions. This page is aimed at platform engineers and architects who design secure cloud infrastructure.

Network Design

Private by Default

Every new resource is deployed in a private subnet. Public subnets are exceptions that must be explicitly justified and documented.

VPC (10.0.0.0/16)
├── Public Subnets (10.0.0.0/24, 10.0.1.0/24)
│   └── Only: Load Balancer, NAT Gateway, Bastion (if at all)
├── Private Subnets – Application (10.0.10.0/24, 10.0.11.0/24)
│   └── ECS/EKS Worker Nodes, EC2 Application Instances
├── Private Subnets – Data (10.0.20.0/24, 10.0.21.0/24)
│   └── RDS, ElastiCache, OpenSearch
└── Private Subnets – Management (10.0.30.0/24, 10.0.31.0/24)
    └── Bastion Host (if needed), Admin tools

Consequence: Application servers have no direct internet connectivity. Outbound traffic goes through a NAT Gateway or, for calls to AWS services, preferably through VPC endpoints.
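
A minimal Terraform sketch of the application subnets from the layout above, assuming the VPC and a NAT Gateway already exist. The variable names and the availability zones are assumptions made for illustration, not part of the original design.

```hcl
# Hypothetical inputs: the VPC and NAT Gateway are assumed to be created elsewhere.
variable "vpc_id" {
  type = string
}

variable "nat_gateway_id" {
  type = string
}

variable "azs" {
  type    = list(string)
  default = ["eu-central-1a", "eu-central-1b"] # assumed region
}

# Private application subnets: 10.0.10.0/24 and 10.0.11.0/24
resource "aws_subnet" "app" {
  count                   = length(var.azs)
  vpc_id                  = var.vpc_id
  cidr_block              = cidrsubnet("10.0.0.0/16", 8, 10 + count.index)
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = false # no public IPs in private subnets
}

# Default route points at the NAT Gateway, never at an internet gateway
resource "aws_route_table" "private" {
  vpc_id = var.vpc_id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = var.nat_gateway_id
  }
}

resource "aws_route_table_association" "app" {
  count          = length(var.azs)
  subnet_id      = aws_subnet.app[count.index].id
  route_table_id = aws_route_table.private.id
}
```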

Security Groups: Application-First, Minimal Scope

Security groups are structured according to application logic, not server type:

sg-alb-public – Only port 443 from 0.0.0.0/0 inbound
sg-app-tier – Only from sg-alb-public on app port inbound
sg-db-tier – Only from sg-app-tier on DB port inbound
sg-management – Only from VPN/bastion on SSH (22); SSM Session Manager needs no inbound rule, only outbound HTTPS (443)

Blanket rules like 0.0.0.0/0 in security groups are forbidden; the only exception is the port 443 inbound rule on sg-alb-public. Egress rules are specified explicitly, not left open.
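
The tier chaining can be sketched in Terraform by referencing security groups instead of CIDR ranges. This is a sketch under assumptions: the application port (8080), the database port (5432, PostgreSQL) and the variable names are illustrative and do not come from the original text.

```hcl
variable "vpc_id" {
  type = string
}

variable "app_port" {
  type    = number
  default = 8080 # assumed application port
}

variable "db_port" {
  type    = number
  default = 5432 # assumed: PostgreSQL
}

# Terraform removes the AWS default allow-all egress rule on new groups,
# so every group below starts with no egress until a rule is added.
resource "aws_security_group" "alb_public" {
  name   = "sg-alb-public"
  vpc_id = var.vpc_id
}

resource "aws_security_group" "app_tier" {
  name   = "sg-app-tier"
  vpc_id = var.vpc_id
}

resource "aws_security_group" "db_tier" {
  name   = "sg-db-tier"
  vpc_id = var.vpc_id
}

# The single permitted 0.0.0.0/0 rule: HTTPS to the load balancer
resource "aws_vpc_security_group_ingress_rule" "alb_https" {
  security_group_id = aws_security_group.alb_public.id
  cidr_ipv4         = "0.0.0.0/0"
  ip_protocol       = "tcp"
  from_port         = 443
  to_port           = 443
}

# App tier accepts traffic only from the load balancer group
resource "aws_vpc_security_group_ingress_rule" "app_from_alb" {
  security_group_id            = aws_security_group.app_tier.id
  referenced_security_group_id = aws_security_group.alb_public.id
  ip_protocol                  = "tcp"
  from_port                    = var.app_port
  to_port                      = var.app_port
}

# DB tier accepts traffic only from the app tier group
resource "aws_vpc_security_group_ingress_rule" "db_from_app" {
  security_group_id            = aws_security_group.db_tier.id
  referenced_security_group_id = aws_security_group.app_tier.id
  ip_protocol                  = "tcp"
  from_port                    = var.db_port
  to_port                      = var.db_port
}

# Explicit egress: the app tier may only talk to the DB tier on the DB port
resource "aws_vpc_security_group_egress_rule" "app_to_db" {
  security_group_id            = aws_security_group.app_tier.id
  referenced_security_group_id = aws_security_group.db_tier.id
  ip_protocol                  = "tcp"
  from_port                    = var.db_port
  to_port                      = var.db_port
}
```

Referencing groups rather than address ranges keeps the rules valid when instances are replaced or subnets are resized.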

VPC Endpoints Instead of Public Routes

AWS services (S3, KMS, Secrets Manager, SQS, etc.) are reached via VPC Interface Endpoints or Gateway Endpoints – not over the public internet:

Less NAT Gateway traffic → lower data processing costs
No internet gateway needed for AWS service calls
Traffic never leaves the AWS network
Enables restrictive egress policies
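
As a sketch, one Gateway Endpoint (S3) and one Interface Endpoint (Secrets Manager) could be declared as follows. The region default and all variable names are assumptions for illustration.

```hcl
variable "vpc_id" {
  type = string
}

variable "region" {
  type    = string
  default = "eu-central-1" # assumed region
}

variable "private_route_table_ids" {
  type = list(string)
}

variable "private_subnet_ids" {
  type = list(string)
}

variable "endpoint_security_group_id" {
  type = string # should allow 443 inbound from the application tier
}

# Gateway endpoint: adds routes for S3 to the private route tables
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}

# Interface endpoint: places network interfaces in the private subnets
resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [var.endpoint_security_group_id]
  private_dns_enabled = true # the standard service hostname resolves to the endpoint
}
```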

IAM Design

Roles Over Users

IAM users with long-lived access keys are forbidden in production.

Instead:

EC2/ECS/Lambda: Instance Profile / Task Role / Execution Role
CI/CD: OIDC Federation (no static access key in GitHub Secrets)
Developers: IAM Identity Center (formerly AWS SSO) with temporary credentials via aws sso login
Cross-account: AssumeRole via STS

# Correct: OIDC-based CI/CD role (no static access keys)
# Assumes aws_iam_openid_connect_provider.github is defined elsewhere.
resource "aws_iam_role" "github_actions" {
  name = "github-actions-deploy"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.github.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        # The token must have been issued for AWS STS
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        # Restrict to one repository; narrow "*" to a branch or environment where possible
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:myorg/myrepo:*"
        }
      }
    }]
  })
}

STS-Based Access and Credential Lifetime

All temporary credentials issued via AWS STS have a defined maximum lifetime:

Normal developer sessions: 8 hours (working day)
CI/CD sessions: 1-2 hours (duration of a pipeline run)
Just-in-time admin access: 1 hour (with post-use review)

Long-lived credentials (> 24 hours) are forbidden for automated systems. Exception: external systems without OIDC support (with rotation and monitoring).
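
These lifetimes can be enforced in configuration rather than by convention. The following is a sketch, assuming IAM Identity Center is used for human access and a dedicated role for CI/CD; the permission set names, the role name and the ci_trust_policy_json variable are hypothetical.

```hcl
data "aws_ssoadmin_instances" "this" {}

locals {
  sso_instance_arn = tolist(data.aws_ssoadmin_instances.this.arns)[0]
}

# Normal developer sessions: 8 hours
resource "aws_ssoadmin_permission_set" "developer" {
  name             = "developer"
  instance_arn     = local.sso_instance_arn
  session_duration = "PT8H" # ISO 8601 duration
}

# Just-in-time admin access: 1 hour
resource "aws_ssoadmin_permission_set" "jit_admin" {
  name             = "jit-admin"
  instance_arn     = local.sso_instance_arn
  session_duration = "PT1H"
}

# Hypothetical input: trust policy JSON for the CI/CD role (e.g. OIDC federation)
variable "ci_trust_policy_json" {
  type = string
}

# CI/CD sessions: capped at 2 hours
resource "aws_iam_role" "ci_deploy" {
  name                 = "ci-deploy"
  assume_role_policy   = var.ci_trust_policy_json
  max_session_duration = 7200 # seconds; the allowed range is 3600 to 43200
}
```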

Permission Boundaries and SCPs

Permission Boundaries: Cap the maximum permissions a role can have, no matter which identity policies are later attached to it
Service Control Policies (SCPs): Organization-wide guardrails that cannot be bypassed by local IAM policies

SCPs are suitable for:

Forbidding actions in unauthorized regions
Forbidding deactivation of CloudTrail and GuardDuty
Enforcing MFA for sensitive actions
Forbidding creation of IAM users with access keys
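
A sketch of an SCP covering three of these guardrails. The allowed regions, the list of exempted global services and the target OU variable are assumptions; a real region guardrail usually needs a longer exemption list.

```hcl
variable "allowed_regions" {
  type    = list(string)
  default = ["eu-central-1", "eu-west-1"] # assumed
}

variable "target_ou_id" {
  type = string # hypothetical: the organizational unit the SCP applies to
}

data "aws_iam_policy_document" "guardrails" {
  # Deny everything outside the approved regions, except global services
  statement {
    sid         = "DenyUnapprovedRegions"
    effect      = "Deny"
    not_actions = ["iam:*", "organizations:*", "sts:*", "support:*", "cloudfront:*", "route53:*"]
    resources   = ["*"]

    condition {
      test     = "StringNotEquals"
      variable = "aws:RequestedRegion"
      values   = var.allowed_regions
    }
  }

  # Deny deactivation of CloudTrail and GuardDuty
  statement {
    sid       = "ProtectAuditServices"
    effect    = "Deny"
    actions   = ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail", "guardduty:DeleteDetector"]
    resources = ["*"]
  }

  # Deny creation of IAM users and access keys
  statement {
    sid       = "DenyIamUsersAndKeys"
    effect    = "Deny"
    actions   = ["iam:CreateUser", "iam:CreateAccessKey"]
    resources = ["*"]
  }
}

resource "aws_organizations_policy" "guardrails" {
  name    = "security-guardrails"
  type    = "SERVICE_CONTROL_POLICY"
  content = data.aws_iam_policy_document.guardrails.json
}

resource "aws_organizations_policy_attachment" "guardrails" {
  policy_id = aws_organizations_policy.guardrails.id
  target_id = var.target_ou_id
}
```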

Encryption Design

CMK for Sensitive Data – Always

For all data of the classes pii, financial, health and restricted: Customer Managed Keys (CMK) via KMS, not provider-managed keys (SSE-S3/AES256).

Rationale: CMK enables:

Independent key rotation (automatic, annually)
Key deletion as a data erasure method (cryptographic erasure)
Granular key policy (who may use the key?)
Audit trail of all key usages in CloudTrail

KMS Key Hierarchy

KMS Customer Managed Key (CMK)
└── Encrypts: Data Encryption Keys (DEK)
    └── DEK encrypts: actual data (envelope encryption)

One separate CMK per data class:
  ├── cmk-pii (PII data: RDS, S3 customer data)
  ├── cmk-logs (audit logs, CloudTrail)
  ├── cmk-secrets (Secrets Manager secrets)
  └── cmk-backup (backup vault)

Rotation: All CMKs rotate automatically annually (enable_key_rotation = true). Rotation changes the key material, not the key ID – existing data remains decryptable.
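
One way to express the key-per-data-class layout in Terraform is sketched below. The 30-day deletion window is an assumption, and key policies are omitted, so each key would fall back to the default key policy until a class-specific one is added.

```hcl
locals {
  data_classes = ["pii", "logs", "secrets", "backup"]
}

# One CMK per data class, with automatic annual rotation
resource "aws_kms_key" "cmk" {
  for_each = toset(local.data_classes)

  description             = "CMK for data class ${each.key}"
  enable_key_rotation     = true
  deletion_window_in_days = 30 # assumed; the allowed range is 7 to 30 days
}

# Aliases such as alias/cmk-pii keep references stable and readable
resource "aws_kms_alias" "cmk" {
  for_each = aws_kms_key.cmk

  name          = "alias/cmk-${each.key}"
  target_key_id = each.value.key_id
}
```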

No Encryption Without Key Ownership Documentation

For every encrypted data store it must be documented:

Which CMK is used?
Who has access to the key?
What happens in case of key loss (recovery procedure)?

Secret Management Design

No Secrets in Environment Variables

Environment variables can be read from /proc/<pid>/environ, show up in container inspect output and are easily dumped into logs by debug output or crash reports. They are not a secure storage location for secrets.

Correct alternatives:

AWS Secrets Manager: For rotatable secrets (DB passwords, API keys)
Parameter Store SecureString: For configuration requiring confidentiality
HashiCorp Vault: For dynamic secrets, multi-cloud, fine-grained policies

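# Correct: reference the secret by ARN; ECS resolves it at container start.
# The plaintext value never appears in the task definition or in Terraform state.
# (Reading it through a data "aws_secretsmanager_secret_version" source would copy it into state.)
resource "aws_ecs_task_definition" "app" {
  # ...
  # The task execution role needs secretsmanager:GetSecretValue on this secret.
  container_definitions = jsonencode([{
    # Note: ECS still hands the resolved value to the container as DB_PASSWORD;
    # what is avoided is storing the plaintext in configuration.
    secrets = [{
      name      = "DB_PASSWORD"
      valueFrom = aws_secretsmanager_secret.db.arn
    }]
    # Not: environment = [{ name = "DB_PASSWORD", value = var.db_password }]
  }])
}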

Secret Rotation

All secrets are rotated on a fixed schedule, automatically wherever the service supports it:

Database passwords: automatic rotation via Secrets Manager Lambda (every 30 days)
API keys: manual rotation with documented process and reminder alarm (max. 90 days)
TLS certificates: ACM managed renewal (automatic)
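
The 30-day database password rotation could be configured as in the sketch below. The secret name and both variables are hypothetical, and the rotation Lambda itself is assumed to exist already.

```hcl
variable "rotation_lambda_arn" {
  type = string # hypothetical: ARN of an existing rotation function
}

variable "secrets_cmk_arn" {
  type = string # hypothetical: ARN of the CMK for the secrets data class
}

resource "aws_secretsmanager_secret" "db" {
  name       = "prod/app/db-password" # assumed naming scheme
  kms_key_id = var.secrets_cmk_arn
}

# Rotate automatically every 30 days
resource "aws_secretsmanager_secret_rotation" "db" {
  secret_id           = aws_secretsmanager_secret.db.id
  rotation_lambda_arn = var.rotation_lambda_arn

  rotation_rules {
    automatically_after_days = 30
  }
}
```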

Logging Design

Centralized and Tamper-Proof

All security-relevant logs are forwarded to a central, immutable log store:

CloudTrail → S3 with S3 Object Lock (WORM) and log file validation
CloudTrail → CloudWatch Logs (for alerting)
VPC Flow Logs → S3 or CloudWatch (for network forensics)
Application logs → central log aggregation system

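Tamper Protection: CloudTrail log file validation (SHA-256 hashes in digest files signed with SHA-256 with RSA) detects subsequent manipulation or deletion of log files. S3 Object Lock prevents object versions from being deleted or overwritten during the retention period.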
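
A sketch of the CloudTrail-to-S3 path with both protections enabled. The variable names, the trail name and the 365-day compliance retention are assumptions, and the bucket policy that lets CloudTrail write to the bucket is omitted for brevity.

```hcl
variable "audit_bucket_name" {
  type = string
}

variable "logs_cmk_arn" {
  type = string # hypothetical: ARN of the CMK for the logs data class
}

# Object Lock must be enabled when the bucket is created
resource "aws_s3_bucket" "audit_logs" {
  bucket              = var.audit_bucket_name
  object_lock_enabled = true
}

resource "aws_s3_bucket_versioning" "audit_logs" {
  bucket = aws_s3_bucket.audit_logs.id

  versioning_configuration {
    status = "Enabled"
  }
}

# WORM: object versions cannot be deleted or overwritten during retention
resource "aws_s3_bucket_object_lock_configuration" "audit_logs" {
  bucket     = aws_s3_bucket.audit_logs.id
  depends_on = [aws_s3_bucket_versioning.audit_logs]

  rule {
    default_retention {
      mode = "COMPLIANCE"
      days = 365 # assumed retention period
    }
  }
}

resource "aws_cloudtrail" "audit" {
  name                       = "audit-trail"
  s3_bucket_name             = aws_s3_bucket.audit_logs.id
  is_multi_region_trail      = true
  enable_log_file_validation = true # signed digest files for tamper detection
  kms_key_id                 = var.logs_cmk_arn
}
```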

Retention by Data Class

Log Type	Minimum Retention	Regulatory Requirement
CloudTrail (API audit logs)	365 days (hot), 7 years (cold/glacier)	SOC 2, ISO 27001, GDPR
VPC Flow Logs	90 days	Network forensics
Application logs	30 days (non-PII), 365 days (PII context)	GDPR Art. 5(1)(e)
IAM access logs	365 days	ISO 27001 A.9.2
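
Retention periods like these can be encoded directly in the log stores. The sketch below assumes that "7 years" means total object age (7 × 365 = 2555 days) and uses hypothetical bucket and log group names.

```hcl
variable "audit_bucket_name" {
  type = string
}

# CloudTrail: 365 days hot, then Glacier, deleted after 7 years in total
resource "aws_s3_bucket_lifecycle_configuration" "cloudtrail_retention" {
  bucket = var.audit_bucket_name

  rule {
    id     = "cloudtrail-hot-to-cold"
    status = "Enabled"

    filter {} # applies to all objects in the bucket

    transition {
      days          = 365
      storage_class = "GLACIER"
    }

    expiration {
      days = 2555 # assumed: 7 x 365 days of total object age
    }
  }
}

# Application logs without PII: 30 days
resource "aws_cloudwatch_log_group" "app" {
  name              = "/app/example" # hypothetical log group name
  retention_in_days = 30
}

# VPC Flow Logs delivered to CloudWatch: 90 days
resource "aws_cloudwatch_log_group" "vpc_flow" {
  name              = "/vpc/flow-logs" # hypothetical log group name
  retention_in_days = 90
}
```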

No Logging of Sensitive Data

Application logging must avoid sensitive data categories:

Credit card numbers, PINs, passwords must not appear in logs
Use log scrubbing libraries in applications
Configure CloudWatch Logs Data Masking for structured logs
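
As a sketch, masking in CloudWatch Logs is configured through a data protection policy on the log group. The log group variable is hypothetical, and the choice of the AWS-managed CreditCardNumber identifier is an assumption; the set of identifiers should follow the data classes actually in use.

```hcl
variable "app_log_group_name" {
  type = string
}

locals {
  # AWS-managed data identifier (assumed to be the class to mask in this sketch)
  masked_identifiers = ["arn:aws:dataprotection::aws:data-identifier/CreditCardNumber"]
}

resource "aws_cloudwatch_log_data_protection_policy" "app" {
  log_group_name = var.app_log_group_name

  policy_document = jsonencode({
    Name    = "mask-sensitive-data"
    Version = "2021-06-01"
    Statement = [
      {
        # Audit statement: records findings (no destination configured here)
        Sid            = "Audit"
        DataIdentifier = local.masked_identifiers
        Operation = {
          Audit = {
            FindingsDestination = {}
          }
        }
      },
      {
        # Deidentify statement: masks matches when log events are read
        Sid            = "Mask"
        DataIdentifier = local.masked_identifiers
        Operation = {
          Deidentify = {
            MaskConfig = {}
          }
        }
      }
    ]
  })
}
```

Masking is a safety net; the application should still avoid writing these values in the first place.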

Container Design

Distroless / Minimal Base Images

Container images should only contain what the application needs:

Distroless Images (gcr.io/distroless): No shell, no package manager, no OS userland beyond what the application needs
Alpine-based Images: Smaller attack surface than Debian/Ubuntu
Multi-stage builds: Build tools not in the final image

# Multi-stage build: Build tools not in production image
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o server .

FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/server /server
USER nonroot:nonroot
ENTRYPOINT ["/server"]

Non-Root, Read-Only Filesystem

Container processes run as non-privileged user (runAsNonRoot: true)
Root filesystem is read-only (readOnlyRootFilesystem: true)
Capabilities are reduced to the minimum (drop: ["ALL"])
Privileged containers are forbidden (privileged: false) and privilege escalation is disabled (allowPrivilegeEscalation: false)

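# Kubernetes Security Context (container level)
securityContext:
  privileged: false                # never run privileged containers
  runAsNonRoot: true               # refuse to start if the image would run as UID 0
  runAsUser: 1000
  readOnlyRootFilesystem: true     # writable paths need explicit volumes (e.g. emptyDir)
  allowPrivilegeEscalation: false  # process cannot gain more privileges than its parent
  capabilities:
    drop:
      - ALL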

Image Scanning and Signing

Automatic vulnerability scanning: in the pipeline before push (Trivy, Grype) and again in the registry on push (ECR scanning)
Critical CVEs (CVSS >= 9.0) block the build
Image signing with Sigstore/Cosign for provenance
Admission controller (OPA Gatekeeper, Kyverno) in Kubernetes blocks unsigned images
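
On the registry side, scan-on-push and immutable tags can be enabled as sketched below. The repository name is hypothetical. Note that this only produces scan findings; blocking the build on critical CVEs still has to happen in the pipeline.

```hcl
resource "aws_ecr_repository" "app" {
  name                 = "app"       # hypothetical repository name
  image_tag_mutability = "IMMUTABLE" # a pushed tag cannot be overwritten later

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "KMS" # uses the AWS-managed ECR key unless kms_key is set
  }
}
```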