Best Practice: Optimizing Network Performance
Context
Network latency is cumulative: in a microservices architecture with 10 internal service calls, cross-AZ hops of 1–2 ms each can already accumulate 20 ms of unnecessary latency. A missing CDN means static assets are served from origin – often a hundred milliseconds or more of additional latency for users outside the deployment region.
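That accumulation is simple arithmetic; a quick sanity check of the numbers above (call count and per-hop overhead are illustrative assumptions):

```shell
# Illustrative numbers: 10 internal calls, 1.5 ms cross-AZ overhead per hop
CALLS=10
HOP_OVERHEAD_US=1500                    # 1.5 ms per hop, in microseconds
TOTAL_US=$((CALLS * HOP_OVERHEAD_US))
echo "Accumulated cross-AZ latency: $((TOTAL_US / 1000)) ms"   # 15 ms
```

At 1–2 ms per hop this lands in the 10–20 ms range the paragraph above describes – per request, before any application work is done.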
Typical network performance problems:
- No VPC endpoints: S3, DynamoDB, and SSM traffic is routed over the internet
- No CDN: all JS/CSS/images are loaded from origin (high latency + egress costs)
- Cross-AZ traffic on critical paths without awareness of the latency overhead
- DNS resolution on every request without DNS caching (typically 10–50 ms per resolution)
Related Controls
- WAF-PERF-070 – Network Latency & Topology Optimization
Target State
A latency-optimized network design:
- VPC endpoints: all cloud service APIs reachable via the private backbone
- CDN active: >= 95% cache hit rate for static assets
- AZ-aware: frequently communicating services preferably placed in the same AZ
- Measured: a network latency baseline documented per service pair
Technical Implementation
Step 1: VPC Endpoints for AWS Services
# Gateway endpoints (free of charge; for S3 and DynamoDB)
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids = [
    aws_route_table.private_a.id,
    aws_route_table.private_b.id,
    aws_route_table.private_c.id,
  ]
  tags = { Name = "vpce-s3" }
}

resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids = [
    aws_route_table.private_a.id,
    aws_route_table.private_b.id,
    aws_route_table.private_c.id, # all private route tables, matching the S3 endpoint
  ]
}

# Interface endpoints (billed per AZ-hour and per GB; for all other services)
locals {
  interface_endpoints = {
    "ecr.api"        = "com.amazonaws.${var.region}.ecr.api"
    "ecr.dkr"        = "com.amazonaws.${var.region}.ecr.dkr"
    "ssm"            = "com.amazonaws.${var.region}.ssm"
    "ssmmessages"    = "com.amazonaws.${var.region}.ssmmessages"
    "secretsmanager" = "com.amazonaws.${var.region}.secretsmanager"
    "monitoring"     = "com.amazonaws.${var.region}.monitoring"
    "logs"           = "com.amazonaws.${var.region}.logs"
  }
}

resource "aws_vpc_endpoint" "interface" {
  for_each            = local.interface_endpoints
  vpc_id              = aws_vpc.main.id
  service_name        = each.value
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
  tags                = { Name = "vpce-${each.key}" }
}

resource "aws_security_group" "vpc_endpoints" {
  name        = "sg-vpc-endpoints"
  description = "Allow HTTPS from private subnets to VPC endpoints"
  vpc_id      = aws_vpc.main.id
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.main.cidr_block]
  }
}
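Once the endpoints are in place with private DNS enabled, the AWS API hostnames should resolve to addresses inside the VPC CIDR rather than public IPs. A crude pure-shell check of that property (assuming a 10.0.0.0/16 VPC; in practice the IP would come from resolving e.g. s3.<region>.amazonaws.com from within the VPC):

```shell
# Returns success if the IP lies in the (assumed) 10.0.0.0/16 VPC range
is_vpc_local() {
  case "$1" in
    10.0.*) return 0 ;;   # matches the hypothetical VPC CIDR
    *)      return 1 ;;
  esac
}

is_vpc_local "10.0.12.34"  && echo "10.0.12.34: private (endpoint in use)"
is_vpc_local "52.95.128.1" || echo "52.95.128.1: public (traffic bypasses the endpoint)"
```

A public answer from inside the VPC is the fastest signal that `private_dns_enabled` is off or the route tables are incomplete.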
Step 2: Configure the CloudFront CDN
resource "aws_cloudfront_origin_access_control" "s3" {
  name                              = "oac-payment-frontend"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

resource "aws_cloudfront_distribution" "main" {
  enabled             = true
  is_ipv6_enabled     = true
  default_root_object = "index.html"
  price_class         = "PriceClass_100" # North America + Europe

  # S3 origin for static assets
  origin {
    domain_name              = aws_s3_bucket.frontend.bucket_regional_domain_name
    origin_id                = "s3-frontend"
    origin_access_control_id = aws_cloudfront_origin_access_control.s3.id
  }

  # API origin for backend requests
  origin {
    domain_name = aws_lb.api.dns_name
    origin_id   = "alb-api"
    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  # Static assets: long-lived caching
  ordered_cache_behavior {
    path_pattern           = "/static/*"
    target_origin_id       = "s3-frontend"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    compress               = true
    cache_policy_id        = aws_cloudfront_cache_policy.static.id
  }

  # API: pass-through with a short cache for GET requests
  # (the "api" cache and origin-request policies are defined elsewhere)
  ordered_cache_behavior {
    path_pattern             = "/api/*"
    target_origin_id         = "alb-api"
    viewer_protocol_policy   = "https-only"
    allowed_methods          = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
    cached_methods           = ["GET", "HEAD"]
    compress                 = true
    cache_policy_id          = aws_cloudfront_cache_policy.api.id
    origin_request_policy_id = aws_cloudfront_origin_request_policy.api.id
  }

  # Default: SPA routing
  default_cache_behavior {
    target_origin_id       = "s3-frontend"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]
    compress               = true
    cache_policy_id        = aws_cloudfront_cache_policy.html.id
  }

  restrictions {
    geo_restriction { restriction_type = "none" }
  }

  viewer_certificate {
    acm_certificate_arn      = var.certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }
}

resource "aws_cloudfront_cache_policy" "static" {
  name        = "cache-static-assets"
  min_ttl     = 86400    # 1 day
  default_ttl = 31536000 # 1 year
  max_ttl     = 31536000
  parameters_in_cache_key_and_forwarded_to_origin {
    cookies_config { cookie_behavior = "none" }
    headers_config { header_behavior = "none" }
    query_strings_config { query_string_behavior = "none" }
    enable_accept_encoding_gzip   = true
    enable_accept_encoding_brotli = true
  }
}

resource "aws_cloudfront_cache_policy" "html" {
  name        = "cache-html"
  min_ttl     = 0
  default_ttl = 60 # 1 minute for HTML
  max_ttl     = 300
  parameters_in_cache_key_and_forwarded_to_origin {
    cookies_config { cookie_behavior = "none" }
    headers_config { header_behavior = "none" }
    query_strings_config { query_string_behavior = "none" }
    enable_accept_encoding_gzip = true
  }
}
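The >= 95% cache-hit target from the target state can be tracked as a simple ratio over request counters; a minimal sketch with made-up numbers (in practice the counts would come from CloudFront access logs or monitoring):

```shell
REQUESTS=20000   # total edge requests (hypothetical)
HITS=19200       # requests served from the edge cache (hypothetical)
HIT_RATE=$(awk -v h="$HITS" -v r="$REQUESTS" 'BEGIN { printf "%.1f", 100 * h / r }')
echo "Cache hit rate: ${HIT_RATE}%"   # 96.0% -> the >= 95% target is met
```

If the rate sits below target, the usual culprits are cache keys that include unnecessary query strings or cookies, or TTLs that are too short.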
Step 3: AZ Affinity for Kubernetes
# Prefer the same AZ for service-to-service communication
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
spec:
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      topologySpreadConstraints:
        # Even spread across AZs for HA
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: payment-processor
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            # Prefer placement close to payment-api (the consumer)
            - weight: 80
              podAffinityTerm:
                topologyKey: topology.kubernetes.io/zone
                labelSelector:
                  matchLabels:
                    app: payment-api
---
# Service with topology-aware routing: prefer local endpoints
apiVersion: v1
kind: Service
metadata:
  name: payment-processor
  annotations:
    service.kubernetes.io/topology-mode: "Auto" # prefer endpoints in the local AZ
spec:
  selector:
    app: payment-processor
  ports:
    - port: 8080
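Why this matters: with n AZs and zone-unaware round-robin, roughly (n-1)/n of all calls cross an AZ boundary. A quick sanity check of that fraction for a typical three-AZ cluster:

```shell
ZONES=3
# Share of cross-AZ calls under zone-unaware round-robin: (n - 1) / n
CROSS_AZ_PCT=$(awk -v z="$ZONES" 'BEGIN { printf "%.0f", 100 * (z - 1) / z }')
echo "Cross-AZ share without topology-aware routing: ${CROSS_AZ_PCT}%"   # 67%
```

Topology-aware routing pushes that share toward zero whenever healthy endpoints exist in the caller's zone, which is exactly the effect the annotation above enables.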
Step 4: Measure a Network Latency Baseline
#!/bin/bash
# scripts/network-baseline.sh
# Measures TCP connect RTT between service pairs to establish a latency baseline

SERVICES=(
  "payment-api:8080"
  "payment-processor:8080"
  "db-payment.internal:5432"
  "cache-payment.internal:6379"
)

echo "Service Pair RTT Baseline – $(date)"
echo "======================================="

TIMEFORMAT='%R'  # make the `time` keyword print plain seconds (e.g. 0.005)
for SERVICE in "${SERVICES[@]}"; do
  HOST=${SERVICE%:*}
  PORT=${SERVICE##*:}
  # Open 10 TCP connections and take the median connect time
  RTT=$(for i in {1..10}; do
    { time nc -z -w1 "$HOST" "$PORT" >/dev/null 2>&1; } 2>&1
  done | sort -n | awk 'NR == 5')  # 5th of 10 sorted samples ≈ P50
  echo "$HOST:$PORT → P50 RTT: ${RTT}s"
done
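The baseline output can then be checked against a latency budget. A small awk filter that flags service pairs above a hypothetical 2 ms threshold, run here against canned sample output:

```shell
# Sample baseline output (hypothetical values)
BASELINE='payment-api:8080 → P50 RTT: 0.001s
db-payment.internal:5432 → P50 RTT: 0.004s'

# Print every service pair whose P50 exceeds the 2 ms (0.002 s) budget
SLOW=$(echo "$BASELINE" | awk '{ rtt = $NF; gsub(/s/, "", rtt); if (rtt + 0 > 0.002) print $1 }')
echo "Over budget: $SLOW"   # Over budget: db-payment.internal:5432
```

Wiring this into CI or a cron job turns the documented baseline into an alert when topology changes (e.g. a reschedule into another AZ) silently add latency.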
Typical Anti-Patterns
- No VPC endpoint for S3: S3 traffic leaves the private network and is routed over the internet → latency + egress costs.
- CDN only in staging, not in production: "the CDN is too complex, we'll do it later" – and later never comes.
- Service mesh without AZ awareness: all services routed round-robin regardless of AZ → unnecessary cross-AZ latency.
- DNS lookups without caching: every Kubernetes pod performs DNS lookups without adequate caching.
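The caching fix for the last anti-pattern is conceptually just memoization: resolve once, reuse the answer until its TTL expires. A toy sketch of that idea (the resolver call is a stand-in; a real script would use `getent hosts` or `dig`, and Kubernetes clusters typically solve this with NodeLocal DNSCache instead):

```shell
# Memoize lookups in a bash associative array to avoid repeated resolutions
declare -A DNS_CACHE
LOOKUPS=0
resolve() {
  local host=$1
  if [[ -z ${DNS_CACHE[$host]} ]]; then
    LOOKUPS=$((LOOKUPS + 1))            # this is where the 10–50 ms round-trip happens
    DNS_CACHE[$host]="10.0.0.$LOOKUPS"  # stand-in for a real resolver answer
  fi
  echo "${DNS_CACHE[$host]}"
}
resolve payment-api >/dev/null
resolve payment-api >/dev/null   # served from the cache
echo "Resolver round-trips: $LOOKUPS"   # 1 instead of 2
```

Note that a real cache must also honor TTLs; this sketch deliberately omits expiry.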
Metrics
- CDN cache hit rate (target: >= 95% for static assets)
- Share of cross-AZ traffic on latency-sensitive paths (target: minimize)
- VPC endpoint coverage (share of cloud service traffic routed through endpoints; target: 100%)
- Origin response time for CDN cache misses (target: within SLO)
Maturity Levels
Level 1 – No CDN; no VPC endpoints; no topology considerations
Level 2 – CDN for static assets; VPC endpoints partially configured
Level 3 – Full VPC endpoint coverage; CDN tuned; AZ affinity documented
Level 4 – Latency baseline measured; service mesh with AZ-aware routing
Level 5 – Anycast; edge computing; intelligent routing optimization