Glossary – Performance Efficiency
A
- Auto-Scaling
  Mechanism that automatically increases or decreases the number of compute resources based on defined metrics (CPU, request rate, queue depth).
- Availability Zone (AZ)
  Physically isolated data centers within a cloud region. For latency optimization, services that communicate frequently should be deployed in the same AZ.
B
- Baseline
  Measured performance reference of a system under defined load conditions. The foundation for regression testing and capacity planning.
- Bulkhead Pattern
  Isolation of resource pools (thread pools, connection pools) for different service categories to prevent cascading failures.
- Burst Balance
  AWS-specific concept for gp2 EBS volumes: credits accumulate at low I/O load and are consumed during load spikes. When exhausted, IOPS drop to baseline.
C
- Cache Hit Rate
  Percentage of requests that can be served from the cache without querying the origin source (database, API). Target: >= 80% for application caches.
- Cache Stampede / Thundering Herd
  Phenomenon in which many parallel requests simultaneously attempt to regenerate an expired cache entry, causing massive load on the origin source.
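A common mitigation is a per-key lock with a re-check after acquiring it, so that only one caller regenerates the expired entry while concurrent callers wait and then reuse the result. A minimal in-process sketch; the cache dict, lock registry, and `load_from_origin` callback are illustrative, not a specific library:

```python
import threading

_cache = {}          # illustrative in-process cache: key -> value
_locks = {}          # one lock per cache key
_locks_guard = threading.Lock()

def _lock_for(key):
    # Lazily create exactly one lock per key.
    with _locks_guard:
        return _locks.setdefault(key, threading.Lock())

def get_with_stampede_protection(key, load_from_origin):
    value = _cache.get(key)
    if value is not None:
        return value
    with _lock_for(key):
        # Re-check after acquiring the lock: another thread may have
        # refilled the entry while we were waiting.
        value = _cache.get(key)
        if value is None:
            value = load_from_origin(key)  # only one caller hits the origin
            _cache[key] = value
        return value
```

The double-check inside the lock is what turns N concurrent origin calls into one.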
- Circuit Breaker
  Software pattern that temporarily blocks further requests to a slow or failed downstream system to prevent cascading failures.
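A minimal sketch of the pattern; the thresholds and the simple closed/open/half-open handling are illustrative, not a specific library:

```python
import time

class CircuitBreaker:
    # Opens after `max_failures` consecutive failures, rejects calls for
    # `reset_timeout` seconds, then lets one trial request through.
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: downstream call rejected")
            self.opened_at = None  # half-open: allow one trial request
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

While open, callers fail fast instead of queuing behind a dead dependency, which is what stops the cascade.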
- Cold Start
  Initialization delay for serverless functions or containers that have been idle for a long period. The first request after a long idle phase is significantly slower than subsequent requests.
- Connection Pool
  A pre-maintained set of database connections reused by multiple threads/requests to avoid the overhead of establishing new connections.
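A minimal sketch using a blocking queue; the `connect` factory stands in for any real driver's connection constructor:

```python
import queue

class ConnectionPool:
    # Illustrative pool: connections are created once up front and
    # recycled, instead of being opened per request.
    def __init__(self, connect, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout=5.0):
        # Blocks until a connection is free rather than opening a new one.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        # Return the connection for reuse by the next caller.
        self._pool.put(conn)
```

Real pools add health checks and reconnect-on-failure; the fixed upper bound also acts as a natural backpressure limit toward the database.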
D
- Distributed Cache
  Cache layer outside the application process, typically Redis or Memcached, which can be shared by multiple instances.
E
- Error Budget
  SRE concept: the tolerable proportion of SLO violations within a defined time window. A service with a 99.9% availability SLO has roughly 8.76 hours per year of error budget.
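The 8.76-hour figure follows directly from the SLO arithmetic; a one-line helper makes it explicit (the function name is illustrative):

```python
def error_budget_hours(slo_percent, window_hours=365 * 24):
    # Allowed downtime = (1 - SLO) * window length.
    # 99.9 % over one year: 0.001 * 8760 h = 8.76 h of budget.
    return (1 - slo_percent / 100) * window_hours
```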
- EBS gp3
  Current generation of AWS General Purpose SSD volumes. Provides 3,000 IOPS and 125 MB/s baseline without burst mechanics, at a 20% lower price than gp2.
F
- Full Table Scan
  Database operation in which all rows of a table must be read because no index exists for the query condition. Leads to high I/O and CPU load.
H
- Horizontal Scaling
  Increasing capacity by adding more identical instances behind a load balancer. Contrasts with vertical scaling (larger instance).
- HPA (Horizontal Pod Autoscaler)
  Kubernetes mechanism that automatically adjusts the number of pods in a deployment based on CPU utilization or custom metrics.
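The scaling decision itself follows the documented HPA formula desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric); sketched here in Python for illustration:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    # Kubernetes HPA scaling formula:
    #   desired = ceil(current * currentMetric / targetMetric)
    return math.ceil(current_replicas * current_metric / target_metric)
```

For example, four pods averaging 90% CPU against a 60% target scale out to ceil(4 * 1.5) = 6 pods.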
I
- IOPS (Input/Output Operations Per Second)
  Measure of the speed of storage systems. Relevant for database performance and data-intensive workloads.
- Index Strategy
  Documented plan of which database columns/fields are indexed to speed up frequent queries without creating unnecessary write overhead.
L
- Latency
  The time a single request requires from receipt to complete response. Typically measured in percentiles: P50 (median), P95, P99, P99.9.
- Load Balancer
  Component that distributes incoming requests across multiple backend instances to spread load evenly and avoid single points of failure.
- Load Testing
  Systematic verification of system behavior under defined, realistic load. Used to validate SLOs and auto-scaling configurations.
P
- P50/P95/P99/P99.9 (latency percentiles)
  Statistical measures for latency distributions: P95 = 95% of all requests are faster than this value; P99 = 99% of all requests are faster. Tail latency (P99, P99.9) is critical for user experience.
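Percentiles are computed from a sorted sample of observed latencies; a minimal nearest-rank implementation (interpolating variants, as used by some monitoring backends, differ slightly at small sample sizes):

```python
import math

def percentile(samples, p):
    # Nearest-rank percentile: the smallest sample such that at least
    # p percent of all samples are <= it.
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

On 100 evenly spread latency samples, P95 picks the 95th-smallest value; with heavy tails, P99 can sit far above P50 even when the median looks healthy.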
- Performance Debt
  Consciously accepted or unconsciously created performance limitations in architecture and implementation that must be documented, prioritized, and resolved.
- Provisioned Concurrency
  AWS Lambda feature that pre-initializes function instances and keeps them warm to eliminate cold start latency. Billed even during inactivity.
R
- Read Replica
  Read-only copy of a database that can take over read requests to offload the primary database server.
- Reserved Concurrency
  AWS Lambda feature that reserves a fixed portion of the account concurrency limit for a function, both guaranteeing it a minimum capacity and preventing it from overloading the account.
S
- Service Level Agreement (SLA)
  Contractually agreed performance guarantee between a service provider and customer. Basis: SLOs plus escalation/compensation rules.
- Service Level Indicator (SLI)
  Measurable quantity that quantifies the actually experienced service quality. Examples: P99 latency, success rate, availability.
- Service Level Objective (SLO)
  Internal target for an SLI. Example: P99 latency < 500 ms, measured over 30 days. SLOs are the foundation for error budget management.
- Slow Query Log
  Database feature that logs SQL queries that exceed a defined execution time. Fundamental tool for database performance analysis.
- SLO Burn Rate
  Rate at which the error budget is consumed. A burn rate > 1 means the budget is being consumed faster than allowed.
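Burn rate is simply the observed error rate divided by the error rate the SLO allows; a small helper (names are illustrative):

```python
def burn_rate(observed_error_rate, slo_percent):
    # Burn rate 1.0 = budget consumed exactly as fast as the SLO allows.
    allowed_error_rate = 1 - slo_percent / 100
    return observed_error_rate / allowed_error_rate
```

For example, a 0.5% error rate against a 99.9% SLO (0.1% allowed) burns the budget five times faster than permitted.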
- Stress Testing
  Load test with loads significantly above the expected maximum (typically 2x–5x) to identify capacity limits, failure modes, and system behavior at the limit.
T
- Throughput
  Number of processed requests or amount of data per unit of time. Typical units: requests per second (RPS/TPS) or MB/s.
- TTL (Time-to-Live)
  Lifetime of a cache entry. After expiry, the entry is removed from the cache and reloaded on the next request.
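A minimal in-process sketch of TTL-based expiry with lazy eviction on read; the class and its names are illustrative:

```python
import time

class TTLCache:
    # Entries expire `ttl` seconds after being set.
    def __init__(self, ttl=60.0):
        self.ttl = ttl
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired: evict, so the caller reloads from the origin.
            del self._store[key]
            return None
        return value
```

Lazy eviction (checking expiry on read) is how Memcached and, partly, Redis handle TTLs as well; a background sweep is only needed to bound memory for keys that are never read again.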
V
- Vertical Scaling
  Increasing capacity by upgrading to a larger instance. Has a hard upper limit; typically requires downtime.
- VPC Endpoint
  AWS feature that allows cloud service APIs (S3, DynamoDB, SSM, etc.) to be reached over the private AWS backbone without passing through the internet.
- VPC Peering
  Direct network connection between two VPCs that routes traffic over the AWS internal network instead of the internet.