WAF++

Best Practice: Optimize Network Performance

Context

Network latency is cumulative: in a microservices architecture where a request fans out over 10 internal service calls, cross-AZ hops of 1–2ms each already add 10–20ms of unnecessary latency. The absence of a CDN means that static assets are served from the origin, adding hundreds of milliseconds of latency for users outside the deployment region.

Typical network performance problems:

  • No VPC endpoints: S3, DynamoDB, SSM traffic routes over the internet

  • No CDN: all JS/CSS/images are loaded from origin (high latency + egress costs)

  • Cross-AZ traffic in critical paths without awareness of latency overhead

  • DNS resolution on every request without DNS caching (typically 10–50ms per resolution)

Target State

Latency-optimized network design:

  • VPC Endpoints: All cloud service APIs reachable over the private backbone

  • CDN active: >= 95% cache hit rate for static assets

  • AZ-aware: Frequently communicating services preferred in the same AZ

  • Measured: Network latency baseline documented per service pair

Technical Implementation

Step 1: VPC Endpoints for AWS Services

# Gateway Endpoints (free; for S3 and DynamoDB)
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [
    aws_route_table.private_a.id,
    aws_route_table.private_b.id,
    aws_route_table.private_c.id,
  ]
  tags = { Name = "vpce-s3" }
}

resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [
    aws_route_table.private_a.id,
    aws_route_table.private_b.id,
    aws_route_table.private_c.id,
  ]
}

# Interface Endpoints (paid; for other services)
locals {
  interface_endpoints = {
    "ecr.api"        = "com.amazonaws.${var.region}.ecr.api"
    "ecr.dkr"        = "com.amazonaws.${var.region}.ecr.dkr"
    "ssm"            = "com.amazonaws.${var.region}.ssm"
    "ssmmessages"    = "com.amazonaws.${var.region}.ssmmessages"
    "ec2messages"    = "com.amazonaws.${var.region}.ec2messages"  # required alongside ssm/ssmmessages for Session Manager
    "secretsmanager" = "com.amazonaws.${var.region}.secretsmanager"
    "monitoring"     = "com.amazonaws.${var.region}.monitoring"
    "logs"           = "com.amazonaws.${var.region}.logs"
  }
}

resource "aws_vpc_endpoint" "interface" {
  for_each = local.interface_endpoints

  vpc_id              = aws_vpc.main.id
  service_name        = each.value
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = { Name = "vpce-${each.key}" }
}

resource "aws_security_group" "vpc_endpoints" {
  name        = "sg-vpc-endpoints"
  description = "Allow HTTPS from private subnets to VPC Endpoints"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }
}
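Gateway endpoints also accept an endpoint policy, which limits what the private route can reach. A minimal hardening sketch (the bucket name is a placeholder, not from this document):

```hcl
# Restrict the S3 Gateway endpoint to known buckets (defense in depth).
resource "aws_vpc_endpoint_policy" "s3" {
  vpc_endpoint_id = aws_vpc_endpoint.s3.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
      Resource = [
        "arn:aws:s3:::payment-frontend-assets",
        "arn:aws:s3:::payment-frontend-assets/*"
      ]
    }]
  })
}
```

Without an explicit policy, the endpoint allows full access; whether to restrict it is a trade-off between blast-radius reduction and operational flexibility.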

Step 2: Configure CloudFront CDN

resource "aws_cloudfront_origin_access_control" "s3" {
  name                              = "oac-payment-frontend"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

resource "aws_cloudfront_distribution" "main" {
  enabled             = true
  is_ipv6_enabled     = true
  default_root_object = "index.html"
  price_class         = "PriceClass_100"  # North America + Europe

  # S3 origin for static assets
  origin {
    domain_name              = aws_s3_bucket.frontend.bucket_regional_domain_name
    origin_id                = "s3-frontend"
    origin_access_control_id = aws_cloudfront_origin_access_control.s3.id
  }

  # API origin for backend requests
  origin {
    domain_name = aws_lb.api.dns_name
    origin_id   = "alb-api"
    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  # Static assets: long-term caching
  ordered_cache_behavior {
    path_pattern           = "/static/*"
    target_origin_id       = "s3-frontend"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    compress               = true

    cache_policy_id = aws_cloudfront_cache_policy.static.id
  }

  # API: pass-through with short cache for GET requests
  ordered_cache_behavior {
    path_pattern           = "/api/*"
    target_origin_id       = "alb-api"
    viewer_protocol_policy = "https-only"
    allowed_methods        = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
    cached_methods         = ["GET", "HEAD"]
    compress               = true

    cache_policy_id          = aws_cloudfront_cache_policy.api.id
    origin_request_policy_id = aws_cloudfront_origin_request_policy.api.id
  }

  # Default: SPA routing
  default_cache_behavior {
    target_origin_id       = "s3-frontend"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    compress               = true
    cache_policy_id        = aws_cloudfront_cache_policy.html.id
  }

  restrictions {
    geo_restriction { restriction_type = "none" }
  }

  viewer_certificate {
    acm_certificate_arn      = var.certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
}
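An Origin Access Control only works if the bucket itself grants CloudFront read access. A sketch of the matching bucket policy (assuming `aws_s3_bucket.frontend` is defined elsewhere in the module, as the origin block above implies):

```hcl
# Allow only this distribution to read the frontend bucket via OAC.
resource "aws_s3_bucket_policy" "frontend" {
  bucket = aws_s3_bucket.frontend.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "AllowCloudFrontOAC"
      Effect    = "Allow"
      Principal = { Service = "cloudfront.amazonaws.com" }
      Action    = "s3:GetObject"
      Resource  = "${aws_s3_bucket.frontend.arn}/*"
      Condition = {
        StringEquals = {
          "AWS:SourceArn" = aws_cloudfront_distribution.main.arn
        }
      }
    }]
  })
}
```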

resource "aws_cloudfront_cache_policy" "static" {
  name        = "cache-static-assets"
  min_ttl     = 86400
  default_ttl = 31536000
  max_ttl     = 31536000
  parameters_in_cache_key_and_forwarded_to_origin {
    cookies_config  { cookie_behavior = "none" }
    headers_config  { header_behavior = "none" }
    query_strings_config { query_string_behavior = "none" }
    enable_accept_encoding_gzip   = true
    enable_accept_encoding_brotli = true
  }
}

resource "aws_cloudfront_cache_policy" "html" {
  name        = "cache-html"
  min_ttl     = 0
  default_ttl = 60      # 1 minute for HTML
  max_ttl     = 300
  parameters_in_cache_key_and_forwarded_to_origin {
    cookies_config  { cookie_behavior = "none" }
    headers_config  { header_behavior = "none" }
    query_strings_config { query_string_behavior = "none" }
    enable_accept_encoding_gzip = true
  }
}
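The `/api/*` behavior above references `aws_cloudfront_cache_policy.api` and `aws_cloudfront_origin_request_policy.api`, which are not shown. They could look roughly like this (a sketch; the TTLs and the choice of forwarded headers/cookies are assumptions and must match the API's caching semantics):

```hcl
resource "aws_cloudfront_cache_policy" "api" {
  name        = "cache-api-get"
  min_ttl     = 0
  default_ttl = 5    # short cache for idempotent GETs only
  max_ttl     = 30
  parameters_in_cache_key_and_forwarded_to_origin {
    cookies_config { cookie_behavior = "none" }
    headers_config {
      header_behavior = "whitelist"
      headers { items = ["Authorization"] }  # vary the cache per caller
    }
    query_strings_config { query_string_behavior = "all" }
    enable_accept_encoding_gzip = true
  }
}

resource "aws_cloudfront_origin_request_policy" "api" {
  name                 = "forward-api-requests"
  cookies_config       { cookie_behavior = "all" }
  headers_config       { header_behavior = "allViewer" }
  query_strings_config { query_string_behavior = "all" }
}
```

Note the split: the cache policy defines the cache key, while the origin request policy forwards everything else to the ALB without making it part of the key.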

Step 3: AZ Affinity for Kubernetes

# Prefer same AZ for service-to-service communication
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
spec:
  template:
    spec:
      topologySpreadConstraints:
        # Even distribution across AZs for HA
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: payment-processor
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            # Prefer placement near the payment-api (consumer)
            - weight: 80
              podAffinityTerm:
                topologyKey: topology.kubernetes.io/zone
                labelSelector:
                  matchLabels:
                    app: payment-api
---
# Service with TopologyAwareRouting: prefer local endpoints
apiVersion: v1
kind: Service
metadata:
  name: payment-processor
  annotations:
    service.kubernetes.io/topology-mode: "Auto"  # Prefer local AZ
spec:
  selector:
    app: payment-processor
  ports:
    - port: 8080
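On newer clusters (assumption: Kubernetes v1.31 or later, where the feature is on by default), the annotation can be replaced by the first-class `trafficDistribution` field:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-processor
spec:
  trafficDistribution: PreferClose  # prefer endpoints in the caller's zone
  selector:
    app: payment-processor
  ports:
    - port: 8080
```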

Step 4: Measure Network Latency Baseline

#!/bin/bash
# scripts/network-baseline.sh
# Measures RTT between services for latency baseline

SERVICES=(
  "payment-api:8080"
  "payment-processor:8080"
  "db-payment.internal:5432"
  "cache-payment.internal:6379"
)

echo "Service Pair RTT Baseline – $(date)"
echo "======================================="

for SERVICE in "${SERVICES[@]}"; do
  HOST=${SERVICE%%:*}
  PORT=${SERVICE##*:}

  # Take 10 TCP connect samples. TIMEFORMAT reduces `time` output to
  # plain seconds so the samples can be sorted numerically.
  RTT=$(for i in {1..10}; do
    { TIMEFORMAT='%R'; time nc -z -w1 "$HOST" "$PORT" >/dev/null 2>&1; } 2>&1
  done | sort -n | awk 'NR==5{print}')  # 5th of 10 sorted samples ≈ P50

  echo "$HOST:$PORT → P50 RTT: ${RTT}s"
done
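The median selection in the pipeline above can be sanity-checked in isolation with synthetic samples (values in seconds, chosen arbitrarily):

```shell
# Ten synthetic RTT samples; after a numeric sort, the 5th of 10
# values approximates the 50th percentile.
printf '%s\n' 0.012 0.003 0.009 0.005 0.004 0.011 0.006 0.008 0.007 0.010 \
  | sort -n | awk 'NR==5{print}'
# → 0.007
```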

Common Anti-Patterns

  • No VPC Endpoint for S3: S3 traffic leaves the private network and routes over the internet → latency + egress costs.

  • CDN only for staging, not production: "CDN is too complex, we’ll do that later" – and later never comes.

  • Service mesh without AZ awareness: All services routed via round-robin, regardless of AZ → unnecessary cross-AZ latency.

  • DNS lookups without caching: Every Kubernetes pod makes DNS lookups without sufficient caching.
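A common mitigation for the last point is lowering `ndots` in the pod spec so that fully qualified lookups skip the search-path expansion (by default, a single external hostname can trigger five extra queries). A sketch, assuming the workload mostly resolves fully qualified names:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
spec:
  template:
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "2"   # default is 5; cuts redundant search-path lookups
```

For cluster-wide relief, NodeLocal DNSCache adds a per-node caching resolver in front of CoreDNS.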

Metrics

  • CDN cache hit rate (target: >= 95% for static assets)

  • Cross-AZ traffic share for latency-sensitive paths (target: minimize)

  • VPC endpoint coverage (proportion of cloud service traffic over endpoints; target: 100%)

  • Origin response time for CDN cache misses (target: within SLO)

Maturity Level

Level 1 – No CDN; no VPC endpoints; no topology considerations
Level 2 – CDN for static assets; VPC partially configured
Level 3 – Complete VPC endpoints; CDN optimized; AZ affinity documented
Level 4 – Latency baseline measured; service mesh with AZ routing
Level 5 – Anycast; edge computing; intelligent routing optimization