WAF++

Best Practice: Retention Strategy

Context

"Storing things barely costs anything": this assumption is one of the most common triggers of unbounded retention cost debt. Measured per GB, storage looks cheap next to compute. But compute costs can be controlled through rightsizing and shutdown, whereas stored data keeps accumulating: without deletion, the monthly storage bill grows linearly with every month of ingestion and has no natural upper limit.

Without lifecycle policies, S3 buckets, log groups, and snapshots accumulate over years and become an invisible but significant cost burden.
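To see why unbounded retention turns into debt, it helps to model it. The Python sketch below is a simplification under stated assumptions: constant monthly ingestion, one flat price per GB-month, no tiering, and month-granular deletion. The function names and the example figures (500 GB/month, 0.02 per GB-month, 36 months) are hypothetical, not real cloud prices.

```python
from typing import List, Optional


def stored_gb_per_month(
    ingest_gb_per_month: float, months: int, retention_months: Optional[int]
) -> List[float]:
    """Stored volume at the end of each month (hypothetical helper).

    retention_months=None models "no lifecycle policy": nothing is ever deleted.
    """
    stored = []
    for month in range(1, months + 1):
        # With retention, only the most recent `retention_months` of data remain.
        kept_months = month if retention_months is None else min(month, retention_months)
        stored.append(ingest_gb_per_month * kept_months)
    return stored


def cumulative_cost(stored_gb: List[float], price_per_gb_month: float) -> float:
    """Total spend: each month is billed for the volume stored in that month."""
    return sum(gb * price_per_gb_month for gb in stored_gb)


if __name__ == "__main__":
    # Hypothetical figures: 500 GB ingested per month, 0.02 per GB-month, 3 years.
    unlimited = stored_gb_per_month(500, 36, None)
    bounded = stored_gb_per_month(500, 36, 1)
    print(f"Stored after 36 months: {unlimited[-1]:.0f} GB vs {bounded[-1]:.0f} GB")
    print(f"Cumulative cost: {cumulative_cost(unlimited, 0.02):.2f} vs "
          f"{cumulative_cost(bounded, 0.02):.2f}")
```

Under these assumptions, 36 months without deletion leave 18,000 GB stored and a cumulative spend of 6,660, against 500 GB and 360 with one month of retention. The monthly bill grows linearly, so the total grows roughly quadratically.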

Target State

  • No S3 bucket, CloudWatch log group, Azure Storage account, or GCS bucket without a lifecycle policy (for log groups: an explicit retention period)

  • Log tiering: Hot (7–30d), Warm (30–90d), Cold (90–365d), Archive (>365d)

  • No DEBUG-level logging in production without explicit justification

  • Retention strategy documented and versioned

Log Retention Tiering

The 4-Tier Model

Tier      Retention      Typical log types                                       Cost profile
Hot       0–30 days      Operational logs (errors, warnings, application logs)   Highest cost/GB, minimal volume
Warm      30–90 days     Security logs, audit trails, access logs                Medium cost, compliance-relevant data
Cold      90–365 days    Regulatory, financial, and GDPR-relevant logs           Cheap, rarely queried
Archive   > 365 days     Legal hold, long-term compliance                        Very cheap (e.g. S3 Glacier Deep Archive, Azure Archive, GCS Archive)
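Tooling that tags or audits data by tier needs the boundaries to be unambiguous, for example whether day 30 is still Hot. A minimal Python sketch that encodes the day ranges of the 4-tier model once, assuming upper bounds are inclusive; `tier_for_age` and the constant names are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bound (in days) of the hot, warm and cold tiers,
# following the 4-tier model; anything older is archive.
TIER_UPPER_BOUNDS = [30, 90, 365]
TIER_NAMES = ["hot", "warm", "cold", "archive"]


def tier_for_age(age_days: int) -> str:
    """Return the tier for data that is `age_days` old (hypothetical helper)."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    # bisect_left returns the index of the first bound >= age_days,
    # so an age equal to a bound (e.g. day 30) stays in the lower tier.
    return TIER_NAMES[bisect_left(TIER_UPPER_BOUNDS, age_days)]


print([tier_for_age(d) for d in (7, 45, 200, 400)])  # ['hot', 'warm', 'cold', 'archive']
```

Keeping the bounds in one list means a policy change (say, a 14-day hot tier) is a one-line edit rather than a search through every script.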

Configure CloudWatch Logs Retention

# Compliant: retention explicitly set (30 days = hot tier)
resource "aws_cloudwatch_log_group" "application" {
  name              = "/app/${var.environment}/application"
  retention_in_days = 30  # Hot Tier: Operational Logs
  kms_key_id        = aws_kms_key.logging.arn

  tags = merge(module.mandatory_tags.tags, {
    log-tier = "hot"
    log-type = "operational"
  })
}

resource "aws_cloudwatch_log_group" "audit" {
  name              = "/app/${var.environment}/audit"
  retention_in_days = 365  # Cold tier: audit logs (regulatory)
  kms_key_id        = aws_kms_key.logging.arn

  tags = merge(module.mandatory_tags.tags, {
    log-tier = "cold"
    log-type = "audit"
  })
}

# Non-Compliant: no retention set (= logs never expire)
resource "aws_cloudwatch_log_group" "application_noncompliant" {
  name = "/app/production/application"
  # retention_in_days not set: defaults to 0 = never expire
  # WAF-COST-040 and WAF-COST-070 violation
}
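The "log groups without retention" metric can be computed from the CloudWatch Logs DescribeLogGroups API, whose responses omit `retentionInDays` for groups set to never expire. A sketch assuming boto3 for the live call; the filtering logic is kept separate so it can be tested without AWS credentials. `log_groups_without_retention` and `fetch_pages` are hypothetical helper names.

```python
from typing import Iterable, List, Mapping


def log_groups_without_retention(pages: Iterable[Mapping]) -> List[str]:
    """Names of log groups that never expire (hypothetical helper).

    `pages` are DescribeLogGroups responses; a group without a
    `retentionInDays` key is configured to never expire.
    """
    return [
        group["logGroupName"]
        for page in pages
        for group in page.get("logGroups", [])
        if not group.get("retentionInDays")
    ]


def fetch_pages(region: str):
    """Live lookup via boto3 (needs AWS credentials); imported lazily."""
    import boto3

    client = boto3.client("logs", region_name=region)
    return client.get_paginator("describe_log_groups").paginate()


if __name__ == "__main__":
    # Offline sample shaped like a DescribeLogGroups response.
    sample = [{"logGroups": [
        {"logGroupName": "/app/production/application"},
        {"logGroupName": "/app/production/audit", "retentionInDays": 365},
    ]}]
    print(log_groups_without_retention(sample))  # ['/app/production/application']
```

Against a real account, `len(log_groups_without_retention(fetch_pages("eu-central-1")))` would give the count to track against the target of 0 (the region is only an example).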

S3 Lifecycle Policies

Standard Lifecycle for Data Buckets

resource "aws_s3_bucket_lifecycle_configuration" "data_lifecycle" {
  bucket = aws_s3_bucket.application_data.id

  rule {
    id     = "transition-to-ia"
    status = "Enabled"

    filter {
      prefix = "data/"
    }

    transition {
      days          = 30
      storage_class = "STANDARD_IA"  # After 30 days → Infrequent Access
    }

    transition {
      days          = 90
      storage_class = "GLACIER_IR"   # After 90 days → Glacier Instant Retrieval
    }

    transition {
      days          = 365
      storage_class = "DEEP_ARCHIVE"  # After 1 year → Glacier Deep Archive
    }
  }

  rule {
    id     = "delete-temp-files"
    status = "Enabled"

    filter {
      prefix = "tmp/"
    }

    expiration {
      days = 7  # Delete temporary files after 7 days
    }
  }

  rule {
    id     = "delete-old-versions"
    status = "Enabled"

    filter {}  # Empty filter: rule applies to all objects in the bucket

    noncurrent_version_expiration {
      noncurrent_days = 30  # Delete old versions 30 days after they become noncurrent
    }

    noncurrent_version_transition {
      noncurrent_days = 7
      storage_class   = "GLACIER_IR"  # Billed for a 90-day minimum, even though versions expire at day 30
    }
  }

  rule {
    id     = "abort-incomplete-uploads"
    status = "Enabled"

    filter {}

    abort_incomplete_multipart_upload {
      days_after_initiation = 3  # Clean up incomplete uploads after 3 days
    }
  }
}

# Non-Compliant: no lifecycle defined
resource "aws_s3_bucket" "data" {
  bucket = "acme-application-data"
  # No aws_s3_bucket_lifecycle_configuration attached – WAF-COST-040 violation
}
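Colder S3 storage classes bill a minimum storage duration, so an object that moves on or is deleted early is still charged for the full minimum. For example, moving noncurrent versions to GLACIER_IR after 7 days and expiring them after 30 days is billed as if they had stayed 90 days. A Python sketch that flags such shortfalls in a transition chain; the minimum durations are those S3 documents for each class, while the function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Minimum billable storage duration per S3 storage class, in days.
# Classes not listed (e.g. STANDARD) are treated as having no minimum.
MIN_STORAGE_DAYS: Dict[str, int] = {
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,
    "GLACIER": 90,
    "DEEP_ARCHIVE": 180,
}


def min_duration_shortfalls(
    transitions: List[Tuple[int, str]], expiration_days: Optional[int]
) -> List[str]:
    """Flag storage classes an object leaves before their minimum duration.

    transitions: (days, storage_class) pairs, as in a lifecycle rule.
    expiration_days: when the object is deleted, or None if never.
    """
    steps = sorted(transitions)
    shortfalls = []
    for i, (start, storage_class) in enumerate(steps):
        # The object leaves this class at the next transition or at expiration.
        end = steps[i + 1][0] if i + 1 < len(steps) else expiration_days
        minimum = MIN_STORAGE_DAYS.get(storage_class, 0)
        if end is not None and end - start < minimum:
            shortfalls.append(
                f"{storage_class}: {end - start} days stored, {minimum} days billed"
            )
    return shortfalls


# The data-bucket chain: no shortfalls.
print(min_duration_shortfalls([(30, "STANDARD_IA"), (90, "GLACIER_IR"), (365, "DEEP_ARCHIVE")], None))
# Noncurrent versions: GLACIER_IR after 7 days, expired after 30.
print(min_duration_shortfalls([(7, "GLACIER_IR")], 30))
```

A finding is a prompt to check, not automatically an error: depending on object sizes and the price gap to Standard, paying the minimum in a cheaper class can still cost less.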

Lifecycle for Log Buckets

resource "aws_s3_bucket_lifecycle_configuration" "logs_lifecycle" {
  bucket = aws_s3_bucket.application_logs.id

  rule {
    id     = "log-tiering"
    status = "Enabled"

    filter {}  # Empty filter: rule applies to all objects in the bucket

    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 90
      storage_class = "GLACIER_IR"
    }

    expiration {
      days = 365  # Delete operational logs after 1 year (adjust per compliance requirement)
    }
  }

  rule {
    id     = "abort-incomplete"
    status = "Enabled"

    filter {}

    abort_incomplete_multipart_upload {
      days_after_initiation = 1
    }
  }
}
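The effect of the log-bucket tiering can be estimated as the storage cost of 1 GB over its lifetime. A simplified Python sketch, assuming a 30-day month, objects that start in STANDARD, and hypothetical relative prices (not real AWS list prices); request, transition, and retrieval fees are ignored, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def lifetime_cost_per_gb(
    transitions: List[Tuple[int, str]],
    expiration_days: int,
    price_per_gb_month: Dict[str, float],
) -> float:
    """Storage cost of 1 GB from upload until expiration (hypothetical helper).

    Objects start in STANDARD and move through `transitions` (days, class).
    A month is approximated as 30 days; request, transition and retrieval
    fees are ignored.
    """
    steps = [(0, "STANDARD")] + sorted(transitions)
    total = 0.0
    for i, (start, storage_class) in enumerate(steps):
        end = steps[i + 1][0] if i + 1 < len(steps) else expiration_days
        if end < start:
            raise ValueError("expiration must come after the last transition")
        total += (end - start) / 30 * price_per_gb_month[storage_class]
    return total


# Hypothetical relative prices per GB-month (STANDARD = 1.0).
PRICES = {"STANDARD": 1.0, "STANDARD_IA": 0.5, "GLACIER_IR": 0.1}

tiered = lifetime_cost_per_gb([(30, "STANDARD_IA"), (90, "GLACIER_IR")], 365, PRICES)
standard_only = lifetime_cost_per_gb([], 365, PRICES)
print(f"tiered: {tiered:.2f}, standard only: {standard_only:.2f}")
```

With these illustrative ratios the tiered rule costs about 2.92 units per GB against 12.17 for keeping the logs in STANDARD for the same year. Plugging in current regional prices gives a realistic figure.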

Azure Storage Lifecycle

resource "azurerm_storage_management_policy" "lifecycle" {
  storage_account_id = azurerm_storage_account.main.id

  rule {
    name    = "tiering-rule"
    enabled = true

    filters {
      blob_types   = ["blockBlob"]
      prefix_match = ["data/"]
    }

    actions {
      base_blob {
        tier_to_cool_after_days_since_modification_greater_than    = 30
        tier_to_archive_after_days_since_modification_greater_than = 90
        delete_after_days_since_modification_greater_than          = 365
      }
      snapshot {
        delete_after_days_since_creation_greater_than = 30
      }
    }
  }
}

GCP Cloud Storage Lifecycle

resource "google_storage_bucket" "data" {
  name          = "acme-data-${var.environment}"
  location      = var.gcp_region
  force_destroy = false

  lifecycle_rule {
    condition {
      age = 30
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }

  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
  }

  lifecycle_rule {
    condition {
      age = 365
    }
    action {
      type          = "SetStorageClass"
      storage_class = "ARCHIVE"
    }
  }

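  lifecycle_rule {
    condition {
      age        = 30          # counted from object creation
      with_state = "ARCHIVED"  # noncurrent object versions, not the ARCHIVE storage class
    }
    action {
      type = "Delete"  # only has an effect if object versioning is enabled on the bucket
    }
  }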
}
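When several lifecycle rules match the same object, Cloud Storage applies precedence: Delete wins over SetStorageClass, and among SetStorageClass actions the one moving to the class with the lowest at-rest price wins. A simplified Python model of the four rules above, assuming only the `age` and `with_state` conditions are used and ignoring the object's current class; `Rule` and `resulting_action` are hypothetical names.

```python
from typing import List, NamedTuple, Optional

# GCS classes ordered from highest to lowest at-rest price.
CLASS_ORDER = ["STANDARD", "NEARLINE", "COLDLINE", "ARCHIVE"]


class Rule(NamedTuple):
    """Simplified lifecycle rule: only the age and with_state conditions."""
    age: int                            # days since object creation
    action: str                         # "SetStorageClass" or "Delete"
    storage_class: Optional[str] = None
    with_state: str = "ANY"             # "LIVE", "ARCHIVED" or "ANY"


def resulting_action(rules: List[Rule], age_days: int, is_live: bool) -> Optional[str]:
    """Action a simplified evaluator would pick for one object version."""
    state = "LIVE" if is_live else "ARCHIVED"
    matching = [r for r in rules if age_days >= r.age and r.with_state in ("ANY", state)]
    if any(r.action == "Delete" for r in matching):
        return "Delete"  # Delete takes precedence over SetStorageClass
    targets = [r.storage_class for r in matching if r.action == "SetStorageClass"]
    if not targets:
        return None
    # Among SetStorageClass actions, the cheapest (coldest) class wins.
    return "SetStorageClass:" + max(targets, key=CLASS_ORDER.index)


# The four rules of the bucket above.
RULES = [
    Rule(30, "SetStorageClass", "NEARLINE"),
    Rule(90, "SetStorageClass", "COLDLINE"),
    Rule(365, "SetStorageClass", "ARCHIVE"),
    Rule(30, "Delete", with_state="ARCHIVED"),
]
```

Because `age` counts from creation, `resulting_action(RULES, 40, is_live=False)` returns `"Delete"` for a version that may have become noncurrent only yesterday. If the intent is "30 days after becoming noncurrent", the Terraform provider also offers a `days_since_noncurrent_time` condition.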

Snapshot Management

# AWS: automated EBS snapshot management with AWS Backup
# (the resources to back up are assigned via a separate aws_backup_selection)
resource "aws_backup_plan" "main" {
  name = "main-backup-plan"

  rule {
    rule_name         = "daily-backup"
    target_vault_name = aws_backup_vault.main.name
    schedule          = "cron(0 2 * * ? *)"  # Daily at 02:00 UTC

    lifecycle {
      # No cold storage: recovery points must stay there for at least 90 days,
      # so cold_storage_after = 30 would require delete_after >= 120
      delete_after = 90   # After 90 days → Delete
    }
  }

  rule {
    rule_name         = "weekly-backup"
    target_vault_name = aws_backup_vault.main.name
    schedule          = "cron(0 2 ? * SUN *)"  # Sundays at 02:00 UTC

    lifecycle {
      cold_storage_after = 60    # After 60 days → Cold Storage
      delete_after       = 365   # At least 90 days after cold_storage_after
    }
  }
}
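AWS Backup requires recovery points moved to cold storage to stay there for at least 90 days, so `delete_after` must be at least `cold_storage_after + 90`. A combination such as `cold_storage_after = 30` with `delete_after = 90` is not accepted. A small Python sketch of that check, plus a flag for recovery points that never expire; `lifecycle_errors` is a hypothetical helper, not part of any AWS SDK.

```python
from typing import List, Optional

# AWS Backup keeps recovery points in cold storage for at least 90 days.
MIN_COLD_STORAGE_DAYS = 90


def lifecycle_errors(cold_storage_after: Optional[int], delete_after: Optional[int]) -> List[str]:
    """Check one aws_backup_plan lifecycle block (hypothetical helper)."""
    errors = []
    if delete_after is None:
        errors.append("delete_after not set: recovery points never expire")
    elif cold_storage_after is not None:
        required = cold_storage_after + MIN_COLD_STORAGE_DAYS
        if delete_after < required:
            errors.append(f"delete_after={delete_after} must be >= {required}")
    return errors


print(lifecycle_errors(30, 90))   # ['delete_after=90 must be >= 120']
print(lifecycle_errors(60, 365))  # []
```

Running such a check in CI against values extracted from the plan catches the mistake before `terraform apply` fails on it.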

Retention Strategy Document

# docs/retention-strategy.yml
version: "1.0"
effective_date: "2025-01-01"

log_retention:
  operational_logs:
    description: "Application logs (INFO, WARN, ERROR)"
    hot_tier_days: 30
    archive_after_days: null  # No archive – delete after hot tier
    regulatory_basis: "Operational requirement only"

  security_audit_logs:
    description: "CloudTrail, Security Group Changes, IAM Events"
    hot_tier_days: 90
    cold_tier_days: 365
    archive_after_days: 2555  # 7 years – BSI C5 requirement
    regulatory_basis: "BSI C5, ISO 27001"

  application_access_logs:
    description: "HTTP Access Logs, API Gateway Logs"
    hot_tier_days: 30
    cold_tier_days: 365
    archive_after_days: null
    regulatory_basis: "Internal policy"

storage_retention:
  customer_data:
    description: "Customer data (personal data)"
    deletion_policy: "On account deletion + 30 days"
    regulatory_basis: "GDPR Art. 17"

  backups:
    daily: 7     # daily recovery points kept
    weekly: 4    # weekly recovery points kept
    monthly: 12  # monthly recovery points kept
    regulatory_basis: "Business continuity requirement"
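A versioned strategy document is easiest to keep honest when CI checks it. A minimal Python sketch that validates an already-parsed document, for example the dict returned by PyYAML's `yaml.safe_load` (PyYAML is a third-party package). The two rules it checks, a non-empty `regulatory_basis` per entry and non-decreasing tier day values, are assumptions about what a reviewer would want, not part of the format; `validate_strategy` is a hypothetical name.

```python
from typing import Any, Dict, List

# Keys whose values must not decrease from one tier to the next (assumption).
TIER_KEYS = ("hot_tier_days", "cold_tier_days", "archive_after_days")


def validate_strategy(strategy: Dict[str, Any]) -> List[str]:
    """Return problems found in a parsed retention-strategy document."""
    problems = []
    for section in ("log_retention", "storage_retention"):
        for name, entry in (strategy.get(section) or {}).items():
            if not entry.get("regulatory_basis"):
                problems.append(f"{section}.{name}: regulatory_basis missing")
            # null (None) tiers are skipped; the remaining values must ascend.
            days = [entry[k] for k in TIER_KEYS if entry.get(k) is not None]
            if days != sorted(days):
                problems.append(f"{section}.{name}: tier days not in ascending order")
    return problems
```

Usage, with PyYAML installed: `validate_strategy(yaml.safe_load(open("docs/retention-strategy.yml")))` returns an empty list for the example document, because every entry names a regulatory basis and its tier values ascend.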

Common Anti-Patterns

  • retention_in_days = 0 (or unset) in CloudWatch: means "never expire", not "keep no logs"

  • Forgotten buckets without lifecycle: especially in older accounts

  • Snapshots without expiry: grow unnoticed to TB scale

  • Compliance as a blanket justification: "we must keep everything" rarely holds in that generality; retention obligations usually apply to specific data categories for specific periods
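Snapshots that no backup plan or lifecycle rule covers are the ones that grow unnoticed. A Python sketch that flags snapshots older than a maximum age. It works on dicts shaped like EC2 DescribeSnapshots results (`SnapshotId` and a timezone-aware `StartTime`), which boto3's `describe_snapshots(OwnerIds=["self"])` returns under the `"Snapshots"` key. The helper name, the 90-day threshold, and the snapshot IDs are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Mapping


def stale_snapshots(
    snapshots: Iterable[Mapping], max_age_days: int, now: datetime
) -> List[str]:
    """IDs of snapshots older than max_age_days (hypothetical helper).

    Items follow the shape of EC2 DescribeSnapshots results:
    a 'SnapshotId' and a timezone-aware 'StartTime'.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    sample = [
        {"SnapshotId": "snap-old", "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"SnapshotId": "snap-new", "StartTime": datetime(2025, 5, 20, tzinfo=timezone.utc)},
    ]
    print(stale_snapshots(sample, max_age_days=90, now=now))  # ['snap-old']
```

Passing `now` explicitly keeps the function deterministic and testable. In production it would be `datetime.now(timezone.utc)`, and large accounts would use the `describe_snapshots` paginator.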

Metrics

  • Storage growth rate: % month-over-month (target: < 5% without new workloads)

  • Log groups without retention: count (target: 0)

  • S3 buckets without lifecycle policy: count (target: 0)

  • Observability cost share: % of total cloud budget (target: < 20%)
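The metrics above reduce to simple ratios and counts. A Python sketch that computes them and reports which targets are missed, using the thresholds from the list (growth < 5%, zero unmanaged log groups and buckets, observability share < 20%). The function names are hypothetical, and the growth target only makes sense for months without new workloads, which the sketch does not model.

```python
from typing import List

# Targets from the metrics list; adjust to your own thresholds.
MAX_GROWTH_PCT = 5.0
MAX_OBSERVABILITY_SHARE_PCT = 20.0


def mom_growth_pct(previous_gb: float, current_gb: float) -> float:
    """Month-over-month storage growth in percent."""
    if previous_gb <= 0:
        raise ValueError("previous_gb must be positive")
    return (current_gb - previous_gb) * 100 / previous_gb


def share_pct(part: float, total: float) -> float:
    """Share of `part` in `total`, in percent (e.g. observability cost)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part * 100 / total


def missed_targets(growth_pct: float, log_groups_without_retention: int,
                   buckets_without_lifecycle: int, observability_share_pct: float) -> List[str]:
    """Names of the metrics that miss their target."""
    missed = []
    if growth_pct >= MAX_GROWTH_PCT:
        missed.append("storage growth")
    if log_groups_without_retention > 0:
        missed.append("log groups without retention")
    if buckets_without_lifecycle > 0:
        missed.append("buckets without lifecycle")
    if observability_share_pct >= MAX_OBSERVABILITY_SHARE_PCT:
        missed.append("observability cost share")
    return missed


print(missed_targets(mom_growth_pct(100, 104), 0, 0, share_pct(150, 1000)))  # []
```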