Load Balancers

Understanding load balancers, their algorithms, and how they distribute traffic across servers to ensure high availability and performance.

The Problem

Imagine a single server handling all incoming requests. As traffic grows, the server becomes overwhelmed: response times spike, requests start timing out, and eventually users are left waiting. This is the single point of failure that every growing application faces.


The Solution

A load balancer is a component that sits between clients and a group of servers. Its job is simple: receive every incoming request and decide which server should handle it. This way, instead of one server drowning in traffic, the load is spread across many, improving response times, increasing availability, and eliminating single points of failure.

If one server goes down, the load balancer stops sending requests to it and redirects traffic to the healthy ones.

Several tools do this: Nginx, HAProxy, Traefik, Envoy, AWS ALB and Google Cloud Load Balancing. In this article, we will use Nginx for the configuration examples.
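The failover behavior described above can be sketched in a few lines of Python. This is a hypothetical model, not how any particular load balancer is implemented: servers (reusing the article's example IPs) are marked healthy or unhealthy, and the balancer simply skips unhealthy ones when picking a target.

```python
import itertools

# Health status per server (hypothetical; IPs match the configs below).
servers = {"10.0.0.1": True, "10.0.0.2": True, "10.0.0.3": True}
order = itertools.cycle(servers)

def pick():
    # Try each server at most once per request; skip any marked unhealthy.
    for _ in range(len(servers)):
        s = next(order)
        if servers[s]:
            return s
    raise RuntimeError("no healthy servers available")

servers["10.0.0.2"] = False          # a health check failed
picks = [pick() for _ in range(4)]   # traffic flows only to healthy servers
```

Once `10.0.0.2` is marked down, every pick lands on `10.0.0.1` or `10.0.0.3`; when the server recovers, flipping the flag back puts it into rotation again.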


Algorithms

Round Robin

Think of a bakery with three cashiers. Each new customer goes to the next cashier in line: first to cashier 1, then 2, then 3, then back to 1. Nobody checks who is faster or who has a longer queue. It just follows the order.

Pros: simple to implement, perfectly even distribution, no overhead.

Cons: if one server is slower or handling heavier requests, it accumulates a backlog while still receiving the same number of new requests as the faster servers.

Use case: A CDN serving images, CSS and JS files. Every request takes roughly the same time (read file, send bytes), so distributing them one by one keeps all servers equally busy. Cloudflare uses this approach across its edge nodes for static asset delivery.

Nginx:
upstream backend {
  server 10.0.0.1;
  server 10.0.0.2;
  server 10.0.0.3;
  # round-robin is the default
}
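The rotation itself is just a cycling counter. A minimal Python sketch, using the same three example IPs as the config above:

```python
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
next_server = cycle(servers)  # 1 -> 2 -> 3 -> 1 -> ...

# Six incoming requests: each server receives exactly two.
assignments = [next(next_server) for _ in range(6)]
```

No per-server state is consulted, which is exactly why a slow server keeps receiving its full share.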

Weighted Round Robin

Same bakery, but now cashier 1 has a brand-new register and handles customers twice as fast. So you send 2 customers to cashier 1 for every 1 you send to the others. The faster register gets more work because it can handle it.

Pros: accounts for different server capacities, predictable distribution.

Cons: weights need to be set manually and updated whenever you add or upgrade servers. If a server slows down due to a memory leak or disk issue, its weight stays the same and it keeps receiving traffic it cannot handle.

Use case: You are migrating from old 2-core VMs to new 8-core machines. During the transition, both run side by side, so you assign weight=4 to the new servers and weight=1 to the old ones. AWS does this with Target Groups when mixing instance types like t3.micro and c5.xlarge.

Nginx:
upstream backend {
  server 10.0.0.1 weight=4;
  server 10.0.0.2 weight=1;
  server 10.0.0.3 weight=1;
}
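The proportions implied by those weights can be illustrated in Python. Note this naive "repeat each server `weight` times" expansion only models the per-cycle ratios; Nginx itself uses a smooth weighted round-robin that interleaves picks so the heavy server is not hit four times in a row.

```python
from itertools import cycle

# weight=4 for the new 8-core machine, weight=1 for the old VMs
servers = {"10.0.0.1": 4, "10.0.0.2": 1, "10.0.0.3": 1}

# Expand each server into the rotation according to its weight.
rotation = [s for s, w in servers.items() for _ in range(w)]
picker = cycle(rotation)

# One full cycle of 4 + 1 + 1 = 6 requests:
window = [next(picker) for _ in range(6)]
counts = {s: window.count(s) for s in servers}
```

Out of every six requests, four land on the heavy server and one on each of the others, matching the 4:1:1 weights.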

Least Connections

Now imagine a hospital emergency room. Instead of calling the next patient in a fixed order, the receptionist checks which doctor has the fewest patients right now and sends the next one there. Doctors who finish faster naturally receive more patients.

Pros: adapts to real-time load, handles variable request durations well.

Cons: the load balancer needs to track how many open connections each server has. A new server with zero connections will receive a burst of traffic all at once until it catches up.

Use case: An API where some endpoints return in 50ms (fetching a user profile) and others take 5s (generating a PDF report). Netflix uses least connections for their internal microservices so that a server stuck processing a heavy request stops receiving new ones until it frees up.

Nginx:
upstream backend {
  least_conn;
  server 10.0.0.1;
  server 10.0.0.2;
  server 10.0.0.3;
}
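The selection rule is a minimum over live connection counts. A small Python sketch with a hypothetical snapshot of open connections:

```python
# Hypothetical snapshot of active connections per server.
connections = {"10.0.0.1": 3, "10.0.0.2": 1, "10.0.0.3": 5}

def pick(conns):
    # Choose the server with the fewest open connections right now.
    return min(conns, key=conns.get)

target = pick(connections)   # "10.0.0.2" — the least loaded server
connections[target] += 1     # the new request opens a connection there
```

A server stuck on a 5-second PDF export keeps its connection open, so its count stays high and new requests naturally flow elsewhere.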

IP Hash

Like a restaurant where you always get the same waiter. Every time you visit, a host looks at your face (your IP) and sends you to "your" table. The waiter already knows your preferences, no need to explain your allergies again.

Pros: session persistence, no shared session storage needed.

Cons: a single corporate network behind one IP can overload one server while others sit idle. Adding or removing a server changes the hash mapping and breaks existing sessions.

Use case: An e-commerce checkout flow. The user adds items to the cart (step 1), fills in shipping info (step 2) and pays (step 3). The session lives in server memory. If step 2 hits a different server, the cart is empty. Shopify uses sticky sessions with IP hashing to keep the entire checkout on one server.

Nginx:
upstream backend {
  ip_hash;
  server 10.0.0.1;
  server 10.0.0.2;
  server 10.0.0.3;
}
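The core idea, hash the client address and take it modulo the server count, can be sketched in Python. This is illustrative only: Nginx's `ip_hash` actually hashes just the first three octets of an IPv4 address, not an MD5 of the whole string.

```python
import hashlib

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick(client_ip):
    # Hash the client IP and map it onto the server list.
    # The same IP always yields the same index, hence the same server.
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

a = pick("203.0.113.7")  # example client IP
b = pick("203.0.113.7")  # same client, same server every time
```

The modulo also shows the downside: change `len(servers)` and most IPs remap to a different server, breaking their sessions.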

Random

Picture a festival food court with 10 identical stalls selling the same thing. You just pick one at random. With so many options and so many people, the lines end up roughly the same length without anyone coordinating.

Pros: zero state required, simple implementation, good at scale.

Cons: with bad luck, one server can receive several requests in a row while others are idle. There is no guarantee of even distribution in the short term, and no way to maintain session affinity.

Use case: A logging ingestion pipeline receiving millions of events/second from thousands of services. Each event is independent, order does not matter, and any server can process it. Google uses random distribution in some internal systems where the sheer volume of requests guarantees statistical fairness.

Nginx:
upstream backend {
  random;
  server 10.0.0.1;
  server 10.0.0.2;
  server 10.0.0.3;
}
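A quick Python sketch shows why random works at volume: each pick is an independent `random.choice`, yet over many requests the shares converge to roughly a third each.

```python
import random

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick():
    return random.choice(servers)  # no state, no coordination

# Simulate a large burst of independent requests.
rng_counts = {s: 0 for s in servers}
for _ in range(30_000):
    rng_counts[pick()] += 1
# each count ends up near 10,000 — uneven in the short term,
# statistically fair in aggregate
```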

Interactive Playground

Experiment with different algorithms, server counts, and request rates. Watch how each algorithm handles traffic distribution in real time.


2026 — lucasquin.dev