Global Load Balancing: Architecture Guide for Multi-Region Deployments
Global load balancing is the key to delivering low-latency, high-availability applications worldwide. This guide covers the architecture, implementation, and operational considerations for routing users to the optimal backend—whether that's the nearest region, the healthiest endpoint, or a specific deployment.
What is Global Load Balancing?
Global load balancing distributes traffic across multiple geographic regions or data centers. Unlike regional load balancers that work within a single location, global load balancers make routing decisions at the internet edge based on:
- Geographic proximity: Route users to the nearest healthy region
- Latency: Route based on measured or estimated latency
- Health status: Automatically avoid unhealthy backends
- Capacity: Balance load across regions based on available capacity
- Policy: Route based on business rules (compliance, cost, feature flags)
DNS-Based vs. Anycast Global Load Balancing
There are two fundamental approaches to global load balancing, each with tradeoffs:
DNS-Based Global Load Balancing
DNS-based GLB returns different IP addresses based on the resolver's location or the health of the backends:
- How it works: DNS resolver queries authoritative nameserver; nameserver returns IP(s) optimized for that resolver's location
- Failover speed: Limited by DNS TTL (typically 60-300 seconds)
- Granularity: Based on resolver location, not actual client location (unless the resolver forwards EDNS Client Subnet data)
Pros: Works with any backend, no special infrastructure required
Cons: Slow failover, clients may cache DNS, resolver location != client location
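To make the failover limitation concrete, here is a minimal sketch of how a DNS-based GLB might choose an answer, plus an estimate of how long clients can keep hitting a failed region. The region names, the IP addresses (taken from documentation ranges), the `preference` map, and both helper functions are illustrative assumptions, not any provider's API.

```javascript
// Hypothetical region table; IPs come from the RFC 5737 documentation ranges.
const regions = [
  { name: 'us-east', ip: '192.0.2.10', healthy: true },
  { name: 'eu-west', ip: '198.51.100.10', healthy: true },
];

// Assumed mapping from resolver location to regions, in order of preference.
const preference = {
  NA: ['us-east', 'eu-west'],
  EU: ['eu-west', 'us-east'],
};

// Return the IP of the first healthy region preferred for the resolver's location.
function pickAnswer(resolverLocation, regionTable) {
  const order = preference[resolverLocation] || regionTable.map((r) => r.name);
  for (const name of order) {
    const region = regionTable.find((r) => r.name === name);
    if (region && region.healthy) return region.ip;
  }
  return null; // no healthy region to return
}

// Rough upper bound: failure detection (interval x threshold) plus one full TTL of caching.
function worstCaseFailoverSeconds(checkIntervalSec, failureThreshold, ttlSec) {
  return checkIntervalSec * failureThreshold + ttlSec;
}
```

With a 10-second check interval, a threshold of 3, and a 60-second TTL, the estimate is 90 seconds. Clients or resolvers that cache beyond the TTL can take longer.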
Anycast Global Load Balancing
Anycast GLB uses a single IP address advertised from multiple edge locations:
- How it works: Same IP address announced via BGP from multiple points of presence; internet routing delivers packets to the nearest one
- Failover speed: Seconds (BGP withdrawal/convergence)
- Granularity: Based on actual network path, not resolver
Pros: Fast failover (seconds rather than minutes), accurate proximity routing, DDoS absorption
Cons: Requires edge infrastructure (cloud provider or CDN)
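The Anycast behavior described above can be approximated with a toy model in which every point of presence announces the same IP and traffic lands on the lowest-cost announcer. This is only a sketch: real BGP path selection depends on AS-path length, local preference, and peering policy rather than a single number, and the `pops` table and its `routeCost` values are invented for illustration.

```javascript
// Toy model: each PoP announces the same IP; the network delivers to the lowest-cost announcer.
const pops = [
  { name: 'fra', announcing: true, routeCost: { berlin: 1, tokyo: 9 } },
  { name: 'nrt', announcing: true, routeCost: { berlin: 9, tokyo: 1 } },
  { name: 'iad', announcing: true, routeCost: { berlin: 5, tokyo: 7 } },
];

// Return the name of the PoP that would receive packets from the given client network.
function deliverTo(clientNetwork, popTable) {
  const candidates = popTable.filter((p) => p.announcing);
  if (candidates.length === 0) return null;
  return candidates.reduce((best, p) =>
    p.routeCost[clientNetwork] < best.routeCost[clientNetwork] ? p : best
  ).name;
}

// Model a BGP withdrawal: the PoP stops announcing, so traffic shifts to the next-best PoP.
function withdraw(popName, popTable) {
  return popTable.map((p) => (p.name === popName ? { ...p, announcing: false } : p));
}
```

In this model, withdrawing `fra` moves Berlin clients to `iad` without any DNS change, which is why failover is bounded by routing convergence rather than by a TTL.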
Cloud Provider Global Load Balancing Services
AWS Global Accelerator
Global Accelerator provides Anycast IP addresses that route traffic to AWS edge locations, then through AWS's private backbone to your regional endpoints:
- Static IPs: Two fixed Anycast IPs regardless of backend changes
- Endpoint types: ALB, NLB, EC2, Elastic IP
- Traffic dials: Percentage-based traffic distribution
- Health checks: Configurable thresholds and intervals
# Terraform example: Global Accelerator
resource "aws_globalaccelerator_accelerator" "main" {
  name            = "my-global-accelerator"
  ip_address_type = "IPV4"
}

# TCP listener on port 443; TLS is terminated at the endpoints, not by the accelerator
resource "aws_globalaccelerator_listener" "https" {
  accelerator_arn = aws_globalaccelerator_accelerator.main.id
  protocol        = "TCP"

  port_range {
    from_port = 443
    to_port   = 443
  }
}

resource "aws_globalaccelerator_endpoint_group" "us_east" {
  listener_arn                  = aws_globalaccelerator_listener.https.id
  endpoint_group_region         = "us-east-1"
  health_check_interval_seconds = 10
  threshold_count               = 3
  traffic_dial_percentage       = 50

  # aws_lb.us_east is assumed to be defined elsewhere in the configuration
  endpoint_configuration {
    endpoint_id = aws_lb.us_east.arn
    weight      = 100
  }
}
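The interaction between `traffic_dial_percentage` and `weight` in the configuration above can be sketched as simple arithmetic. This is a simplified reading, assuming the dial scales the traffic that would otherwise reach the endpoint group and that weights split that traffic proportionally within the group. It ignores health-based redistribution, and the `endpointShares` helper and endpoint IDs are hypothetical.

```javascript
// Estimate each endpoint's share of the traffic that would otherwise reach its region.
// trafficDialPercentage: 0-100 for the endpoint group; endpoints: [{ id, weight }].
function endpointShares(trafficDialPercentage, endpoints) {
  const totalWeight = endpoints.reduce((sum, e) => sum + e.weight, 0);
  const shares = {};
  for (const e of endpoints) {
    shares[e.id] =
      totalWeight === 0 ? 0 : (trafficDialPercentage / 100) * (e.weight / totalWeight);
  }
  return shares;
}

// Hypothetical group: dial at 50%, two load balancers weighted 3:1.
const exampleShares = endpointShares(50, [
  { id: 'alb-a', weight: 192 },
  { id: 'alb-b', weight: 64 },
]);
```

Under these assumptions `alb-a` receives 37.5% and `alb-b` 12.5% of the region's would-be traffic, with the remaining 50% routed to other endpoint groups.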
Google Cloud Load Balancing
GCP's global external HTTP(S) Load Balancing exposes a single Anycast IP when used with the Premium network tier:
- Single global IP: Works for HTTP(S), SSL proxy, TCP proxy
- Cross-region backends: Add backend services from any region
- Cloud CDN integration: Enable caching at Google's edge
- Health checks: TCP, HTTP, HTTPS, HTTP/2 with configurable probes
Azure Front Door
Azure Front Door provides global HTTP load balancing with edge caching:
- Anycast entry: Microsoft's global edge network
- Backend pools: Add backends from any region or external
- Routing rules: Path-based, header-based routing
- WAF integration: Built-in web application firewall
Cloudflare Load Balancing
Cloudflare offers load balancing across their 300+ edge locations:
- Origin pools: Define backends with health checks
- Steering policies: Geo, proximity, random, off
- Session affinity: Cookie-based, IP-and-cookie-based, or header-based
- Deep integration: Works with Workers, Cache, WAF
Health Checking Strategies
Health checks are critical for global load balancing—they determine when to remove unhealthy backends:
Health Check Types
- TCP: Port is responding (minimal—doesn't verify application)
- HTTP: Specific path returns expected status code
- HTTPS: Same as HTTP, but over TLS; whether the certificate is validated depends on the provider
- gRPC: gRPC health check protocol
- Custom: Execute scripts or complex validation
Health Check Best Practices
- Deep health checks: Your /health endpoint should verify database connectivity, cache access, and critical dependencies, not just return 200
- Avoid expensive checks: Health checks run frequently, so verify each dependency with a lightweight probe (a ping or trivial query) rather than a heavy database query
- Multiple probers: Check from different locations to avoid false positives from network issues
- Appropriate thresholds: Require 2-3 failures before marking unhealthy to avoid flapping
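// Example: Comprehensive health check endpoint
// Assumes an Express `app`; checkDatabase() and checkRedis() are
// application-defined helpers that resolve to true or false.
app.get('/health', async (req, res) => {
  try {
    // Run the dependency checks in parallel to keep the probe fast
    const [database, cache] = await Promise.all([checkDatabase(), checkRedis()]);
    const healthy = Boolean(database && cache);
    res.status(healthy ? 200 : 503).json({ database, cache, timestamp: Date.now() });
  } catch (err) {
    // A check that throws counts as unhealthy instead of leaving the probe hanging
    res.status(503).json({ error: err.message, timestamp: Date.now() });
  }
});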
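The threshold advice above (require several consecutive failures before marking a backend unhealthy) can be sketched as a small state tracker. The `createHealthTracker` helper and its default thresholds are illustrative assumptions; real load balancers expose the same idea as configuration rather than code.

```javascript
// Track consecutive probe results and flip state only after a threshold is crossed.
function createHealthTracker({ unhealthyThreshold = 3, healthyThreshold = 2 } = {}) {
  let healthy = true;
  let streak = 0; // consecutive results that disagree with the current state

  return {
    // Record one probe result and return the (possibly updated) health state.
    record(probeSucceeded) {
      if (probeSucceeded === healthy) {
        streak = 0; // result agrees with the current state, so reset the counter
      } else {
        streak += 1;
        const needed = healthy ? unhealthyThreshold : healthyThreshold;
        if (streak >= needed) {
          healthy = !healthy;
          streak = 0;
        }
      }
      return healthy;
    },
    isHealthy: () => healthy,
  };
}
```

A single failed probe followed by a success leaves the backend in rotation, which is what prevents flapping on transient network errors.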
Routing Policies
Geoproximity Routing
Route users to the nearest region based on geographic location:
- Uses resolver location (DNS) or actual network path (Anycast)
- Bias settings can shift traffic toward or away from regions
- Useful for latency reduction
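As a rough illustration of geoproximity routing with bias, the sketch below picks the healthy region with the smallest great-circle distance, scaled by a per-region bias. The bias formula here (multiplying distance by `1 - bias`) is an assumption chosen for simplicity and is not how any particular provider computes it; the `haversineKm` and `nearestRegion` names are likewise illustrative.

```javascript
const EARTH_RADIUS_KM = 6371;
const toRadians = (deg) => (deg * Math.PI) / 180;

// Great-circle distance between two { lat, lon } points, in kilometers.
function haversineKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLon = toRadians(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Assumed bias model: bias in (-1, 1). Positive values shrink the effective
// distance (attracting traffic); negative values grow it (repelling traffic).
function nearestRegion(client, regionList) {
  let best = null;
  for (const region of regionList) {
    if (!region.healthy) continue;
    const effective = haversineKm(client, region) * (1 - (region.bias || 0));
    if (best === null || effective < best.effective) {
      best = { name: region.name, effective };
    }
  }
  return best ? best.name : null;
}
```

Setting a positive bias on a region with spare capacity pulls in clients that would otherwise be routed to a closer but busier region.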
Latency-Based Routing
Route based on measured latency rather than geography:
- More accurate than geographic routing for internet topology
- Accounts for peering relationships and congestion
- Requires latency measurement infrastructure
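One common way to turn raw latency measurements into a routing decision is an exponentially weighted moving average (EWMA) per region. The sketch below assumes round-trip samples arrive from some measurement system that is not shown; `createLatencyTable` and the default smoothing factor are illustrative choices rather than a provider's algorithm.

```javascript
// Keep a smoothed latency estimate per region; alpha controls how quickly old samples decay.
function createLatencyTable(alpha = 0.3) {
  const ewma = new Map();
  return {
    // Fold a new round-trip-time sample (milliseconds) into the region's estimate.
    observe(region, rttMs) {
      const prev = ewma.get(region);
      ewma.set(region, prev === undefined ? rttMs : alpha * rttMs + (1 - alpha) * prev);
    },
    // Return the region with the lowest smoothed latency, or null if nothing was observed.
    fastest() {
      let best = null;
      for (const [region, value] of ewma) {
        if (best === null || value < best.value) best = { region, value };
      }
      return best ? best.region : null;
    },
    get: (region) => ewma.get(region),
  };
}
```

Smoothing keeps a single slow sample from redirecting traffic, at the cost of reacting more slowly to genuine congestion.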
Weighted Routing
Distribute traffic by percentage across regions:
- Useful for canary deployments (10% to new version)
- Gradual migration between regions
- Capacity-based distribution
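Weighted routing reduces to a weighted random choice. The following minimal sketch shows one way to implement it; the `pickWeighted` helper and the `stable`/`canary` names are hypothetical, and the random value is passed in as a parameter so the behavior can be checked deterministically.

```javascript
// Pick a target given a { name: weight } map and a number u in [0, 1).
function pickWeighted(weights, u = Math.random()) {
  const entries = Object.entries(weights).filter(([, w]) => w > 0);
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  if (total === 0) return null;

  // Walk the cumulative weight ranges until u falls inside one of them.
  let threshold = u * total;
  for (const [name, w] of entries) {
    if (threshold < w) return name;
    threshold -= w;
  }
  return entries[entries.length - 1][0]; // guard against floating-point edge cases
}

// Canary-style split: roughly 10% of requests go to the new version.
const canaryWeights = { stable: 90, canary: 10 };
```

Because only the ratio matters, shifting traffic gradually is a matter of updating the weights, for example from 90/10 to 50/50 to 0/100.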
Failover Routing
Primary/secondary configuration for disaster recovery:
- All traffic to primary while healthy
- Automatic failover to secondary on primary failure
- Depends on health checks to detect primary failure and trigger the switch
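The primary/secondary logic can be sketched in a few lines. This version assumes automatic failback as soon as the primary is healthy again and keeps the last active target when both regions are down; both behaviors are design choices made for the example, and `createFailoverRouter` is a hypothetical helper.

```javascript
// Route to the primary while it is healthy; otherwise fall back to the secondary.
function createFailoverRouter(primary, secondary) {
  let active = primary;
  return {
    // health is a { regionName: boolean } map produced by the health checker.
    update(health) {
      if (health[primary]) {
        active = primary; // fail back as soon as the primary recovers
      } else if (health[secondary]) {
        active = secondary;
      }
      // If both are unhealthy, keep the last active target instead of dropping all traffic.
      return active;
    },
    current: () => active,
  };
}
```

In practice the `health` input would come from a thresholded checker so that one failed probe does not trigger a region switch.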
Multi-Region Architecture Patterns
Active-Active
         ┌─────────────────┐
         │    Global LB    │
         └────────┬────────┘
     ┌────────────┼────────────┐
     ▼            ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ US-East  │ │ EU-West  │ │   APAC   │
│ (Active) │ │ (Active) │ │ (Active) │
└──────────┘ └──────────┘ └──────────┘
- All regions serve traffic simultaneously
- Best for latency and availability
- Requires data replication strategy
Active-Passive
          ┌─────────────────┐
          │    Global LB    │
          └────────┬────────┘
                   │
     ┌─────────────┴─────────────┐
     ▼ (100%)                    ▼ (0%)
┌──────────┐                ┌──────────┐
│ US-East  │                │ US-West  │
│ (Active) │                │ (Standby)│
└──────────┘                └──────────┘
- Secondary only receives traffic during primary failure
- Simpler data consistency (async replication to standby)
- Higher latency for users far from primary
Session Persistence Across Regions
Maintaining user sessions across global infrastructure requires careful design:
Options
- Sticky sessions: Route user to same region based on cookie/header
- Distributed session store: Redis/Memcached replicated across regions
- Stateless architecture: JWT tokens eliminate server-side sessions
- Database-backed sessions: Globally distributed database (DynamoDB Global Tables, Spanner)
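Of the options above, sticky sessions are the simplest to illustrate. The sketch below pins a user to a region through a cookie and falls back to another region when the pinned one is unhealthy. The cookie name `glb_region`, the `chooseRegion` helper, and the minimal cookie parser are assumptions made for the example; a production setup would typically rely on the load balancer's built-in affinity.

```javascript
const COOKIE_NAME = 'glb_region'; // hypothetical cookie name

// Parse a Cookie header into a plain object (minimal: no decoding or quoted values).
function parseCookies(header = '') {
  const out = {};
  for (const part of header.split(';')) {
    const idx = part.indexOf('=');
    if (idx > 0) out[part.slice(0, idx).trim()] = part.slice(idx + 1).trim();
  }
  return out;
}

// Honor the pinned region while it is healthy; otherwise route to the fallback and re-pin.
function chooseRegion(cookieHeader, healthyRegions, fallbackRegion) {
  const pinned = parseCookies(cookieHeader)[COOKIE_NAME];
  if (pinned && healthyRegions.includes(pinned)) {
    return { region: pinned, setCookie: null };
  }
  return {
    region: fallbackRegion,
    setCookie: `${COOKIE_NAME}=${fallbackRegion}; Path=/; HttpOnly`,
  };
}
```

Note that re-pinning after a failover moves the user to a region that may not hold their session data, which is why sticky sessions are often paired with a replicated session store or stateless tokens.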
Cost Considerations
- Global Accelerator: $0.025/hour fixed fee per accelerator + $0.015-0.035/GB data transfer premium (varies by region)
- GCP HTTP(S) LB: $0.008/hour + $0.008-0.012/GB
- Azure Front Door: $0.01/GB ingress, $0.08-0.24/GB egress
- Cloudflare: Included in plans; enterprise pricing varies
For cost optimization strategies, see our cost optimization framework.
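For services priced as an hourly fee plus a per-GB charge, a monthly estimate is simple arithmetic. The sketch below uses the Global Accelerator figures listed above with an assumed 10 TB (10,000 GB) of monthly traffic and 730 hours per month; the `monthlyCostUsd` helper is illustrative, and real bills include charges (standard data transfer, requests, rules) that it does not model.

```javascript
// Estimate monthly cost for an "hourly fee + per-GB fee" pricing model.
function monthlyCostUsd({ hourlyFee, perGbFee, gbPerMonth, hoursPerMonth = 730 }) {
  return hourlyFee * hoursPerMonth + perGbFee * gbPerMonth;
}

// Assumed workload: 10,000 GB per month at the low end of the per-GB range.
const acceleratorEstimate = monthlyCostUsd({
  hourlyFee: 0.025,
  perGbFee: 0.015,
  gbPerMonth: 10000,
});
```

Here the fixed fee contributes about $18.25 and data transfer about $150, for roughly $168.25 per month, so per-GB charges dominate at any significant traffic volume.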
Key Takeaways
- Choose Anycast for fast failover (seconds); DNS for simpler setups
- Health checks should verify application functionality, not just port availability
- Geoproximity routing reduces latency; weighted routing enables traffic shifting
- Active-Active provides best availability but requires data replication strategy
- Consider session management architecture before deploying globally
Need Global Load Balancing Architecture?
We design and implement multi-region architectures. Contact us for a consultation.