Glossary – Performance Efficiency
A
- Auto-Scaling
  Mechanism that automatically increases or decreases the number of compute resources based on defined metrics (CPU, request rate, queue depth).
- Availability Zone (AZ)
  Physically isolated data centers within a cloud region. For latency optimization, services that communicate frequently should be deployed in the same AZ.
B
- Baseline
  Measured performance reference of a system under defined load conditions. The foundation for regression testing and capacity planning.
- Bulkhead Pattern
  Isolation of resource pools (thread pools, connection pools) for different service categories to prevent cascading failures.
- Burst Balance
  AWS-specific concept for gp2 EBS volumes: credits accumulate at low I/O load and are consumed during load spikes. When exhausted, IOPS drop to baseline.
C
- Cache Hit Rate
  Percentage of requests that can be served from the cache without querying the origin source (database, API). Target: >= 80% for application caches.
- Cache Stampede / Thundering Herd
  Phenomenon in which many parallel requests simultaneously attempt to regenerate an expired cache entry, causing massive load on the origin source.
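A common mitigation is a per-key lock with a re-check after acquiring it, so that only one caller regenerates the expired entry while concurrent callers wait and then reuse the result. A minimal in-process sketch; the cache dict, lock registry, and `load_from_origin` callback are illustrative, not a specific library:

```python
import threading

_cache = {}          # illustrative in-process cache: key -> value
_locks = {}          # one lock per cache key
_locks_guard = threading.Lock()

def _lock_for(key):
    # Lazily create exactly one lock per key.
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def get_with_stampede_protection(key, load_from_origin):
    value = _cache.get(key)
    if value is not None:
        return value
    with _lock_for(key):
        # Re-check after acquiring the lock: another thread may have
        # refilled the entry while we were waiting.
        value = _cache.get(key)
        if value is None:
            value = load_from_origin(key)  # only one caller hits the origin
            _cache[key] = value
        return value
```

The double-check inside the lock is what turns N concurrent origin calls into one.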
- Circuit Breaker
  Software pattern that temporarily blocks further requests to a slow or failed downstream system to prevent cascading failures.
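A minimal sketch of the pattern; the thresholds and the simple closed/open/half-open handling are illustrative, not a specific library:

```python
import time

class CircuitBreaker:
    # Opens after `max_failures` consecutive failures, rejects calls for
    # `reset_timeout` seconds, then lets one trial request through.
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: downstream call rejected")
            self.opened_at = None  # half-open: allow one trial request
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

While open, callers fail fast instead of queuing behind a dead dependency, which is what stops the cascade.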
- Cold Start
  Initialization delay for serverless functions or containers that have been idle for a long period. The first request after a long idle phase is significantly slower than subsequent requests.
- Connection Pool
  A pre-maintained set of database connections reused by multiple threads/requests to avoid the overhead of establishing new connections.
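A minimal sketch using a blocking queue; the `connect` factory stands in for any real driver's connection constructor:

```python
import queue

class ConnectionPool:
    # Illustrative pool: connections are created once up front and
    # recycled, instead of being opened per request.
    def __init__(self, connect, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout=5.0):
        # Blocks until a connection is free rather than opening a new one.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for reuse by the next caller.
        self._pool.put(conn)
```

Real pools add health checks and reconnect-on-failure; the fixed upper bound also acts as a natural backpressure limit toward the database.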
D
- Distributed Cache
  Cache layer outside the application process, typically Redis or Memcached, which can be shared by multiple instances.
E
- Error Budget
  SRE concept: the tolerable proportion of SLO violations within a defined time window. A service with a 99.9% availability SLO has roughly 8.76 hours per year of error budget.
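The 8.76-hour figure follows directly from the SLO arithmetic; a one-line helper makes it explicit (the function name is illustrative):

```python
def error_budget_hours(slo_percent, window_hours=365 * 24):
    # Allowed downtime = (1 - SLO) * window length.
    # 99.9 % over one year: 0.001 * 8760 h = 8.76 h of budget.
    return (1 - slo_percent / 100) * window_hours
```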
- EBS gp3
  Current generation of AWS General Purpose SSD volumes. Provides 3,000 IOPS and 125 MB/s baseline without burst mechanics, at a 20% lower price than gp2.
F
- Full Table Scan
  Database operation in which all rows of a table must be read because no index exists for the query condition. Leads to high I/O and CPU load.
H
- Horizontal Scaling
  Increasing capacity by adding more identical instances behind a load balancer. Contrasts with vertical scaling (larger instance).
- HPA (Horizontal Pod Autoscaler)
  Kubernetes mechanism that automatically adjusts the number of pods in a deployment based on CPU utilization or custom metrics.
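The scaling decision itself follows the documented HPA formula desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric); sketched here in Python for illustration:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    # Kubernetes HPA scaling formula:
    #   desired = ceil(current * currentMetric / targetMetric)
    return math.ceil(current_replicas * current_metric / target_metric)
```

For example, four pods averaging 90% CPU against a 60% target scale out to ceil(4 * 1.5) = 6 pods.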
I
- IOPS (Input/Output Operations Per Second)
  Measure of the speed of storage systems. Relevant for database performance and data-intensive workloads.
- Index Strategy
  Documented plan of which database columns/fields are indexed to speed up frequent queries without creating unnecessary write overhead.
L
- Latency
  The time a single request requires from receipt to complete response. Typically measured in percentiles: P50 (median), P95, P99, P99.9.
- Load Balancer
  Component that distributes incoming requests across multiple backend instances to spread load evenly and avoid single points of failure.
- Load Testing
  Systematic verification of system behavior under defined, realistic load. Used to validate SLOs and auto-scaling configurations.
P
- P50/P95/P99/P99.9 (latency percentiles)
  Statistical measures for latency distributions: P95 = 95% of all requests are faster than this value; P99 = 99% of all requests are faster. Tail latency (P99, P99.9) is critical for user experience.
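Percentiles are computed from a sorted sample of observed latencies; a minimal nearest-rank implementation (interpolating variants, as used by some monitoring backends, differ slightly at small sample sizes):

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: the smallest sample such that at least
    # p percent of all samples are <= it.
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

On 100 evenly spread latency samples, P95 picks the 95th-smallest value; with heavy tails, P99 can sit far above P50 even when the median looks healthy.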
- Performance Debt
  Consciously accepted or unconsciously created performance limitations in architecture and implementation that must be documented, prioritized, and resolved.
- Provisioned Concurrency
  AWS Lambda feature that pre-initializes function instances and keeps them warm to eliminate cold start latency. Billed even during inactivity.
R
- Read Replica
  Read-only copy of a database that can take over read requests to offload the primary database server.
- Reserved Concurrency
  AWS Lambda feature that reserves a fixed portion of the account concurrency limit for a function, both guaranteeing it a minimum capacity and preventing it from overloading the account.
S
- Service Level Agreement (SLA)
  Contractually agreed performance guarantee between a service provider and customer. Basis: SLOs plus escalation/compensation rules.
- Service Level Indicator (SLI)
  Measurable quantity that quantifies the actually experienced service quality. Examples: P99 latency, success rate, availability.
- Service Level Objective (SLO)
  Internal target for an SLI. Example: P99 latency < 500 ms, measured over 30 days. SLOs are the foundation for error budget management.
- Slow Query Log
  Database feature that logs SQL queries that exceed a defined execution time. Fundamental tool for database performance analysis.
- SLO Burn Rate
  Rate at which the error budget is consumed. A burn rate > 1 means the budget is being consumed faster than allowed.
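Burn rate is simply the observed error rate divided by the error rate the SLO allows; a small helper (names are illustrative):

```python
def burn_rate(observed_error_rate, slo_percent):
    # Burn rate 1.0 = budget consumed exactly as fast as the SLO allows.
    allowed_error_rate = 1 - slo_percent / 100
    return observed_error_rate / allowed_error_rate
```

For example, a 0.5% error rate against a 99.9% SLO (0.1% allowed) burns the budget five times faster than permitted.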
- Stress Testing
  Load test with loads significantly above the expected maximum (typically 2x–5x) to identify capacity limits, failure modes, and system behavior at the limit.
T
- Throughput
  Number of processed requests or amount of data per unit of time. Typical units: requests per second (RPS/TPS) or MB/s.
- TTL (Time-to-Live)
  Lifetime of a cache entry. After expiry, the entry is removed from the cache and reloaded on the next request.
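A minimal in-process sketch of TTL-based expiry with lazy eviction on read; the class and its names are illustrative:

```python
import time

class TTLCache:
    # Entries expire `ttl` seconds after being set.
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired: evict, so the caller reloads from the origin.
            del self._store[key]
            return None
        return value
```

Lazy eviction (checking expiry on read) is how Memcached and, partly, Redis handle TTLs as well; a background sweep is only needed to bound memory for keys that are never read again.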
V
- Vertical Scaling
  Increasing capacity by upgrading to a larger instance. Has a hard upper limit; typically requires downtime.
- VPC Endpoint
  AWS feature that allows cloud service APIs (S3, DynamoDB, SSM, etc.) to be reached over the private AWS backbone without passing through the internet.
- VPC Peering
  Direct network connection between two VPCs that routes traffic over the AWS internal network instead of the internet.