
⚖️ Load Balancing

Multiple checkout lanes at a store

The Checkout Lane Analogy

At a grocery store:

One lane: Long line, slow checkout.
Multiple lanes: Shorter lines, faster service.

A store manager directs customers to lanes with shorter lines.

A load balancer is that manager. It distributes requests across multiple servers so no single server gets overwhelmed.


What Is Load Balancing?

Distribute incoming traffic across multiple servers.

Without load balancer:
  All traffic → One server → Overloaded!

With load balancer:
  Traffic → Load Balancer → Server 1
                          → Server 2
                          → Server 3

Spread the load, handle more traffic.

Why Load Balance?

1. Performance

One server: 1000 requests/second max
Three servers: up to 3000 requests/second!

More servers = More capacity.

2. Reliability

Server 1 dies?
Load balancer removes it from rotation.
Traffic continues to servers 2 and 3.

No single point of failure.

3. Scaling

Traffic increasing?
Add more servers.
The load balancer adds them to the rotation.

Scale horizontally.

How It Works

                ┌─────────────────┐
                │     Clients     │
                └────────┬────────┘
                         │
                         ▼
                ┌─────────────────┐
                │  Load Balancer  │
                │                 │
                │ - Health checks │
                │ - Routing algo  │
                └────────┬────────┘
                         │
         ┌───────────────┼───────────────┐
         │               │               │
         ▼               ▼               ▼
    ┌─────────┐    ┌─────────┐    ┌─────────┐
    │Server 1 │    │Server 2 │    │Server 3 │
    └─────────┘    └─────────┘    └─────────┘

Load Balancing Algorithms

Round Robin

Request 1 → Server 1
Request 2 → Server 2
Request 3 → Server 3
Request 4 → Server 1 (repeat)

Simple, fair, even distribution.
Doesn't account for server capacity.
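The rotation above can be sketched in a few lines of Python (the server names are placeholders):

```python
from itertools import cycle

# Hypothetical backend pool.
servers = ["server1", "server2", "server3"]

# cycle() repeats the list forever: 1, 2, 3, 1, 2, 3, ...
rr = cycle(servers)

# First four requests land on servers 1, 2, 3, then wrap back to 1.
assignments = [next(rr) for _ in range(4)]
```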

Weighted Round Robin

Server 1: 4 cores (weight 4)
Server 2: 2 cores (weight 2)
Server 3: 2 cores (weight 2)

With these weights, Server 1 gets twice the traffic of each smaller server (half the total).
Weights account for different server sizes.
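A naive way to sketch weighted round robin, using the weights above, is to repeat each server in the cycle according to its weight (real balancers interleave more smoothly):

```python
from itertools import cycle

# Hypothetical weights, roughly proportional to CPU cores.
weights = {"server1": 4, "server2": 2, "server3": 2}

# Each server appears `weight` times per cycle of 8 requests.
expanded = [s for s, w in weights.items() for _ in range(w)]
wrr = cycle(expanded)

first_cycle = [next(wrr) for _ in range(8)]
```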

Least Connections

Server 1: 10 active connections
Server 2: 5 active connections
Server 3: 8 active connections

Next request → Server 2 (least busy)

Better for long-running connections.
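Picking the least-busy server is just a minimum over the connection counts; a sketch using the numbers above:

```python
# Active connection counts per server (from the example above).
active = {"server1": 10, "server2": 5, "server3": 8}

def least_connections(active):
    # min() over the server names, compared by their connection count.
    return min(active, key=active.get)

target = least_connections(active)  # server2 has the fewest
active[target] += 1                 # it now carries the new connection
```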

IP Hash

hash(client IP) → Server

Same client often hits the same server.
Useful for session affinity.
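A simple modulo-based sketch of the idea (note that production balancers often use consistent hashing instead, so adding a server doesn't remap every client):

```python
import hashlib

servers = ["server1", "server2", "server3"]

def ip_hash(client_ip: str, servers: list) -> str:
    # Use a stable digest, not Python's built-in hash(),
    # which is randomized per process.
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client IP maps to the same server,
# as long as the server list doesn't change.
a = ip_hash("203.0.113.7", servers)
b = ip_hash("203.0.113.7", servers)
```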

Random

Pick server randomly.
Simple.
Works well with many servers.

Types of Load Balancers

Layer 4 (Transport)

Works at TCP/UDP level.
Routes based on IP and port.
Very fast.

Doesn't understand HTTP content.
Can't route based on URL or headers.

Layer 7 (Application)

Works at HTTP level.
Routes based on URL, headers, cookies.
More flexible.

/api/* → API servers
/static/* → CDN

Slightly more overhead.
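The prefix rules above could be sketched like this (the pool names are made up for illustration):

```python
# Hypothetical Layer 7 routing table: first matching prefix wins.
RULES = [
    ("/api/", "api-servers"),
    ("/static/", "cdn"),
]

def route(path: str) -> str:
    for prefix, pool in RULES:
        if path.startswith(prefix):
            return pool
    return "web-servers"  # default pool for everything else
```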

Comparison

Aspect              Layer 4         Layer 7
Speed               Faster          Slightly slower
Routing             IP and port     URL, headers, cookies
SSL termination     Typically no    Yes
Content inspection  No              Yes

Health Checks

Load balancer checks if servers are healthy.

/health endpoint returns 200 OK? Healthy!
Returns 500 or timeout? Unhealthy!

Unhealthy server removed from rotation.
When healthy again, added back.

Health Check Configuration

Check interval: every few seconds
Timeout: short
Unhealthy threshold: 3 failures
Healthy threshold: 2 successes

Fine-tune based on your needs.
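The thresholds above amount to a small state machine: a server's status flips only after enough consecutive failures or successes. A sketch, with defaults mirroring the example config:

```python
class HealthTracker:
    """Flip a server's health status only after a streak of
    consecutive results that contradict the current status."""

    def __init__(self, unhealthy_after=3, healthy_after=2):
        self.unhealthy_after = unhealthy_after
        self.healthy_after = healthy_after
        self.healthy = True
        self.streak = 0  # consecutive results opposing current status

    def record(self, check_passed: bool) -> bool:
        if check_passed == self.healthy:
            self.streak = 0  # result agrees with status; reset counter
        else:
            self.streak += 1
            needed = self.unhealthy_after if self.healthy else self.healthy_after
            if self.streak >= needed:
                self.healthy = not self.healthy
                self.streak = 0
        return self.healthy
```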

Session Persistence (Sticky Sessions)

The Problem

User logs in → Server 1 (session created)
Next request → Server 2 (no session!)
User logged out!

Solutions

1. Sticky sessions:
   Same user often goes to the same server.
   Uses cookie or IP hash.

2. Shared session store:
   Sessions in Redis/database.
   Any server can read session.

3. Stateless design:
   JWT tokens contain all user info.
   No server-side sessions.

#3 is often a strong option for scalability.
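As a rough illustration of option 3, here is a tiny signed-token sketch using only the standard library. A real deployment would use a proper JWT library, and the secret here is a placeholder; the point is that any server holding the secret can verify the token, so no shared session store is needed:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # placeholder; load from config in practice

def sign_token(payload: dict) -> str:
    # Encode the payload and append an HMAC signature.
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig

def verify_token(token: str):
    # Any server with SECRET can verify, with no session lookup.
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered or forged
    return json.loads(base64.urlsafe_b64decode(body))
```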

Popular Load Balancers

Tool             Type      Often Used For
Nginx            Software  General purpose
HAProxy          Software  High performance
AWS ALB/NLB      Cloud     AWS applications
Google Cloud LB  Cloud     GCP applications
Traefik          Software  Container-native
F5               Hardware  Enterprise

Common Patterns

Active-Passive

One active load balancer, with a hot standby waiting.
Active fails → Passive takes over.

Active-Active

Multiple load balancers.
DNS or another balancer distributes.
Full redundancy.

Global Load Balancing

Route to nearest datacenter.

US user → US datacenter
EU user → EU datacenter

Uses DNS or Anycast.

Common Mistakes

1. No Health Checks

Dead server still receiving traffic.
Users get errors.
Always configure health checks.

2. Session Problems

Forgot about session affinity.
Users randomly logged out.
Use sticky sessions or shared store.

3. Single Load Balancer

Load balancer becomes single point of failure!
Use multiple LBs or managed cloud service.

4. Ignoring Capacity

Round robin with mixed server sizes.
Small server overwhelmed.
Use weighted algorithms.

FAQ

Q: Hardware vs software load balancer?

Software (Nginx, HAProxy) is flexible and cheap. Hardware (F5) for extreme performance.

Q: Can I load balance databases?

Yes, but carefully. Read replicas can be load balanced. Writes usually go to primary.

Q: How many servers behind a load balancer?

Depends on traffic. Start with 2-3 for redundancy. Scale as needed.

Q: What about WebSockets?

Use sticky sessions or Layer 7 load balancer that understands WebSocket upgrades.


Summary

Load balancing distributes traffic across servers for performance, reliability, and scalability.

Key Takeaways:

  • Spread load across multiple servers
  • Algorithms: round robin, least connections, etc.
  • Layer 4 (fast) vs Layer 7 (flexible)
  • Health checks remove failing servers
  • Handle sessions with sticky or shared stores
  • Make the load balancer itself redundant, too

Load balancing is the foundation of scalable web architecture!
