NGINX Load Balancing: Upstream Config, Health Checks And Failover

NGINX load balancing is what turns “one server serving a site” into “a fleet of backends with health checks, failover, and capacity headroom.” This guide covers the NGINX upstream configuration toolkit on Debian and Ubuntu: every built-in load balancing algorithm, passive and active health checks, keepalive connection pools, failover behaviour, weighted distribution, sticky sessions, and the TCP and UDP stream variants that people who know NGINX only as an HTTP proxy often forget exist.

This is the practical NGINX load balancer setup most teams actually need. Five algorithms, sensible defaults, and the gotchas your monitoring will eventually find for you the hard way if you skip them.

The Five NGINX Load Balancing Algorithms

NGINX ships with five built-in load balancing algorithms. Picking the right one for your traffic pattern matters more than people expect — the wrong choice can leave one backend overloaded while two others sit idle.

1. Round-robin (the default)

upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}

Requests rotate through the backends one by one. Simple, fair, no configuration. Right answer for stateless backends where every server is roughly the same size and every request takes roughly the same time. Wrong answer when one backend has a slow database, a fat connection pool, or just happens to be in a different availability zone.

2. Least-connections

upstream backend {
    least_conn;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}

NGINX sends the next request to whichever backend currently has the fewest active connections. The right choice when your requests take wildly different amounts of time (file downloads, long-running API calls, anything WebSocket-shaped). If you also use keepalive in the same upstream block, declare least_conn before it.

3. IP hash (sticky by client IP)

upstream backend {
    ip_hash;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}

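The same client IP lands on the same backend for as long as that backend is available, which is useful for session affinity when your backends store session state locally (which they really should not, but here we are). For IPv4, NGINX hashes only the first three octets of the address, so every client in the same /24 shares a backend. Falls apart behind a corporate NAT where 5,000 users share one IP. Falls apart again with CGNAT on mobile carriers.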
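A small sketch of how to take one backend out of an ip_hash group for maintenance, using the same illustrative addresses. The NGINX documentation recommends marking the server with the down parameter rather than deleting the line, so the hash mapping of the remaining clients stays put.

```nginx
# Lives in the http context.
upstream backend {
    ip_hash;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080 down;   # temporarily out of rotation; other clients keep their backend
    server 10.0.0.3:8080;
}
```

Clients that hashed to the downed server are redistributed; everyone else stays where they were. Remove the down flag and reload when maintenance is over.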

4. Hash on any variable

upstream backend {
    hash $request_uri consistent;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}

Like IP hash but on any variable — request URI, cookie, custom header, geo region. consistent enables consistent hashing, which keeps cache hit rates sane when you add or remove a backend (only ~1/N of keys move, not all of them). The right algorithm for upstream caching tiers (Varnish farms, image processors keyed on URL).
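One detail worth sketching: $request_uri includes the query string, so /logo.png?utm_source=a and /logo.png?utm_source=b hash to different keys. If the query string does not change the response, hashing on the normalised path may give better cache locality. The upstream name image_tier below is purely illustrative.

```nginx
# Lives in the http context.
upstream image_tier {
    # $uri is the normalised path without arguments; $host keeps
    # identical paths on different virtual hosts apart.
    hash $host$uri consistent;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}
```

Whether to include the query string depends on the application: keep $request_uri when arguments select different content (image sizes, pagination), switch to $uri when they are mostly tracking parameters.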

5. Random-two (open-source NGINX 1.15.1+)

upstream backend {
    random two least_conn;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}

Pick two backends at random, send the request to whichever has fewer connections. Statistically very close to least-connections but cheaper to compute on big upstream groups (dozens of backends). The Power of Two Choices algorithm — small lookup, very smooth distribution. Use it once your upstream block has more than about ten servers.

Weighted Distribution: When Backends Aren’t Equal

Got one beefy 16-core backend and two older 8-core ones? Tell NGINX about it:

upstream backend {
    server 10.0.0.1:8080 weight=3;   # 3x the share of traffic
    server 10.0.0.2:8080 weight=1;
    server 10.0.0.3:8080 weight=1;
}

Weights are relative — what matters is the ratio between them, not the absolute values. Combine with least_conn and weight still applies. Brilliantly useful during gradual hardware refreshes when you are migrating from old boxes to new ones.
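As a worked example of the ratio arithmetic during a hardware refresh, here is a sketch with two old boxes and one new box being phased in. The address 10.0.0.4 and the specific weights are assumptions chosen to make the sums easy.

```nginx
# Lives in the http context. Total weight = 4 + 4 + 2 = 10.
upstream backend {
    server 10.0.0.1:8080 weight=4;   # old box: 4/10 = 40% of requests
    server 10.0.0.2:8080 weight=4;   # old box: 4/10 = 40% of requests
    server 10.0.0.4:8080 weight=2;   # new box being phased in: 2/10 = 20%
}
```

Raising the new box's weight step by step (and reloading NGINX each time) shifts traffic gradually; weights of 40/40/20 would produce exactly the same split as 4/4/2.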

Health Checks: Passive (Free) and Active (Better)

Passive health checks — included in open-source NGINX

upstream backend {
    server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.3:8080 max_fails=3 fail_timeout=30s backup;
}

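NGINX marks a backend “unavailable” once max_fails attempts have failed within a fail_timeout window (the failures do not need to be consecutive). The backend then stays out of rotation for fail_timeout, after which NGINX tries it again with live traffic. Connection errors, timeouts and malformed responses always count as failures; HTTP error statuses such as 502 or 503 count only if you list them in proxy_next_upstream. If you omit the parameters, the defaults are max_fails=1 and fail_timeout=10s. The backup flag means “only use this server when every primary is down.”

Passive checks are free, but they only learn from real traffic: up to max_fails real requests have to hit a dying backend before NGINX notices. Idempotent requests are normally retried on another server, so users mostly see added latency rather than errors, but non-idempotent ones such as POST are not retried once they have been sent to a backend.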
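What counts as a "failed" attempt for passive checks is governed by proxy_next_upstream. A minimal sketch, assuming the backend upstream defined above and that a 502, 503 or 504 from a backend should be treated as a failure:

```nginx
# Lives in the http context, alongside the upstream block.
server {
    listen 80;

    location / {
        proxy_pass http://backend;

        # "error" and "timeout" are the defaults. The http_5xx codes are
        # opt-in: once listed, they count towards max_fails and trigger
        # a retry on the next server.
        proxy_next_upstream error timeout http_502 http_503 http_504;
    }
}
```

Be careful with http_500 on applications that return it for genuine application errors: retrying would simply replay the failing request on every backend in turn.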

Active health checks — Angie, NGINX Plus, or the third-party module

upstream backend {
    zone backend 64k;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    server 10.0.0.3:8080;
}

server {
    location / {
        proxy_pass http://backend;
        health_check interval=5s fails=2 passes=2 uri=/healthz;
    }
}

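NGINX probes /healthz on each backend every five seconds, marks it down after two consecutive failures, and brings it back after two consecutive passes. Because detection no longer depends on live traffic, a dead backend is usually pulled from rotation before a user request reaches it. The health_check syntax shown here is the NGINX Plus (paid) form. Angie implements active probes with its own upstream_probe directive (check the Angie documentation for which edition includes it), and the third-party nginx_upstream_check_module uses a check directive inside the upstream block, so the snippet needs adapting for either.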
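A probe that only checks for a TCP connection or any response can miss a backend that is up but unhealthy. In NGINX Plus, a match block lets the probe assert on the response. The sketch below assumes the /healthz endpoint returns status 200 with the text "ok" in its body; both the block name healthz_ok and that response format are illustrative assumptions.

```nginx
# NGINX Plus only. Lives in the http context.
match healthz_ok {
    status 200;          # anything else counts as a failed probe
    body ~ "ok";         # response body must match this regex
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        health_check interval=5s fails=2 passes=2 uri=/healthz match=healthz_ok;
    }
}
```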

Keepalive: The Single Biggest Performance Win

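Without keepalive, NGINX opens a new TCP connection to your backend on every request. Three-way handshake, slow start, all of it. With keepalive, it pools connections and reuses them. For an HTTP/1.1 backend such as a Node.js service the latency improvement can be dramatic, especially on short requests where connection setup is a large share of the total. (PHP-FPM speaks FastCGI rather than HTTP, so there the switch is fastcgi_keep_conn on rather than the proxy_* pair shown here.)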

upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    keepalive 64;                # Cache up to 64 idle connections per worker
    keepalive_requests 1000;     # Recycle each connection after 1000 requests
    keepalive_timeout 60s;       # Close idle connections after 60s
}

server {
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;          # Required for keepalive
        proxy_set_header Connection "";  # Clear hop-by-hop header
    }
}

The proxy_http_version 1.1 and the empty Connection header are non-negotiable — without them, NGINX still opens a fresh TCP connection per request. This is the single most common reason “my NGINX load balancer feels slow” turns out to have a one-line fix.
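The two proxy_* directives apply to HTTP backends. PHP-FPM is reached over FastCGI, where the equivalent switch is fastcgi_keep_conn. A minimal sketch, assuming PHP-FPM pools listening on TCP port 9000 at illustrative addresses and a document root of /var/www/html:

```nginx
# Lives in the http context.
upstream php_fpm {
    server 10.0.0.1:9000;
    server 10.0.0.2:9000;
    keepalive 16;                # illustrative pool size
}

server {
    listen 80;
    root /var/www/html;

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass php_fpm;
        fastcgi_keep_conn on;    # required for upstream keepalive over FastCGI
    }
}
```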

Failover: Graceful Degradation Patterns

Real failover is more than just “if backend dies, send traffic elsewhere.” You usually want a tiered model — primary backends serve normally, secondary backends only kick in when the primaries are exhausted:

upstream backend {
    server 10.0.0.1:8080;            # Primary
    server 10.0.0.2:8080;            # Primary
    server 10.0.0.3:8080 backup;     # Secondary, only used if both primaries are down
    server fallback.example.com:8080 backup;  # Off-cluster emergency; must speak the same scheme as proxy_pass (plain HTTP here)
}

Combine with passive health checks and you have a system that survives one or two backend failures without operator intervention. Combine with active health checks and it does so without a single user-visible error.
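Failover also needs an upper bound, otherwise one slow request can walk through every server in the group before giving up. A sketch of how retries might be capped, with illustrative values:

```nginx
# Lives in the http context, alongside the upstream block.
server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout;   # when to move to the next server
        proxy_next_upstream_tries 3;         # at most 3 attempts per request
        proxy_next_upstream_timeout 10s;     # stop retrying after 10s in total
    }
}
```

Both limits default to 0, meaning no limit, so setting at least one of them is worth considering on any group with more than a couple of servers.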

Sticky Sessions Without Storing State Anywhere Sensible

Sometimes your application really does need session affinity (you should fix this in the app, but that is a longer conversation). Two patterns:

# IP-hash sticky (free, fragile behind NAT)
upstream backend {
    ip_hash;
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# Cookie-based sticky (NGINX Plus / Angie)
upstream backend {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
    sticky cookie srv_id expires=1h domain=.example.com path=/;
}

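Cookie sticky is the right answer if your platform supports it: it survives NAT, works across mobile carriers, and is easy to invalidate. The sticky cookie syntax above is the NGINX Plus and Angie form, and the Angie sticky module is included in the free Angie packages. In open-source NGINX you need the third-party nginx-sticky-module-ng, which has to be compiled in and takes its own parameters (for example sticky name=srv_id rather than sticky cookie srv_id).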
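If neither the sticky directive nor a third-party module is available, stock open-source NGINX can approximate cookie routing with a map and one upstream group per backend. This is only a sketch, and it rests on a loud assumption: the application itself sets a cookie (here hypothetically named route, with values a or b) identifying the backend that owns the session.

```nginx
# Lives in the http context.

# Full pool, used for clients that have no route cookie yet.
upstream backend_all {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080;
}

# One group per backend. The other server is a backup, so a dead
# backend degrades to "session lost" rather than "site down".
upstream backend_a {
    server 10.0.0.1:8080;
    server 10.0.0.2:8080 backup;
}

upstream backend_b {
    server 10.0.0.2:8080;
    server 10.0.0.1:8080 backup;
}

# "route" is a hypothetical cookie set by the application.
map $cookie_route $sticky_pool {
    a       backend_a;
    b       backend_b;
    default backend_all;
}

server {
    listen 80;

    location / {
        # A variable in proxy_pass is matched against upstream group names first.
        proxy_pass http://$sticky_pool;
    }
}
```

This scales poorly past a handful of backends, since every server needs its own group and map entry, which is a fair argument for the sticky directive where it is available.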

TCP and UDP Load Balancing (Stream Module)

NGINX is not just an HTTP load balancer. The stream module handles arbitrary TCP and UDP — load balance MySQL replicas, Redis sentinels, DNS resolvers, gRPC, Postfix SMTP, anything that speaks TCP or UDP:

stream {
    upstream mysql_read {
        least_conn;
        server 10.0.0.10:3306;
        server 10.0.0.11:3306;
        server 10.0.0.12:3306;
    }

    server {
        listen 3306;
        proxy_pass mysql_read;
        proxy_connect_timeout 5s;
    }

    # UDP example: DNS load balancing
    upstream dns_servers {
        server 10.0.0.20:53;
        server 10.0.0.21:53;
    }

    server {
        listen 53 udp reuseport;
        proxy_pass dns_servers;
        proxy_responses 1;
    }
}

Same algorithms (round-robin, least_conn, hash), same weighting, same passive health checks. UDP load balancing with reuseport is the foundation of HTTP/3 in a multi-instance NGINX deployment — see our HTTP/3 on NGINX guide for the QUIC variant.
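A sketch of client affinity plus passive health checks in the stream context, using SMTP relays at illustrative addresses. The stream module has no ip_hash directive, so the usual substitute is a hash on $remote_addr. Depending on how NGINX was packaged, stream may be a separate dynamic module (libnginx-mod-stream on Debian and Ubuntu) that has to be installed and loaded first.

```nginx
# Top-level block, a sibling of http { }, not nested inside it.
stream {
    upstream smtp_relays {
        hash $remote_addr consistent;   # stream equivalent of ip_hash
        server 10.0.0.30:25 max_fails=2 fail_timeout=15s;
        server 10.0.0.31:25 max_fails=2 fail_timeout=15s;
    }

    server {
        listen 25;
        proxy_pass smtp_relays;
        proxy_connect_timeout 5s;   # give up on an unreachable relay quickly
        proxy_timeout 5m;           # close sessions idle for 5 minutes
    }
}
```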

Picking the Right Algorithm Quickly

Stateless API backends, uniform response times — round-robin (default).
Variable response times, file downloads, long polls — least_conn.
Session affinity required, every client has a unique IP — ip_hash (or cookie sticky if you can).
Cache tier keyed on URL — hash $request_uri consistent.
Large upstream pool (10+ backends) — random two least_conn.

Logging That Tells You Which Backend Served What

Add the upstream variables to your access log so post-mortems are actually possible:

log_format upstreamlog '$remote_addr $upstream_addr "$request" '
                       '$status $body_bytes_sent '
                       'rt=$request_time uct=$upstream_connect_time '
                       'urt=$upstream_response_time';

access_log /var/log/nginx/upstream.log upstreamlog;

$upstream_addr tells you which backend handled the request. $upstream_response_time tells you how long the backend took (independent of network round-trip time to the client). If a request was retried on another server, both variables hold one value per attempt, separated by commas. Both are invaluable when one of the three backends is slowly going wrong.
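If the logs are shipped to a log aggregator, a JSON variant of the same format may be easier to parse. A sketch, assuming NGINX 1.11.8 or later for escape=json; the format name and log path are illustrative. Every value is quoted as a string because the upstream variables can contain several comma-separated values or a lone "-".

```nginx
# Lives in the http context.
log_format upstream_json escape=json
    '{"remote_addr":"$remote_addr",'
    '"upstream_addr":"$upstream_addr",'
    '"request":"$request",'
    '"status":"$status",'
    '"upstream_status":"$upstream_status",'
    '"request_time":"$request_time",'
    '"upstream_connect_time":"$upstream_connect_time",'
    '"upstream_response_time":"$upstream_response_time"}';

access_log /var/log/nginx/upstream.json.log upstream_json;
```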

Common Mistakes That Bite Eventually

Forgetting proxy_http_version 1.1: kills keepalive silently and adds a TCP handshake (plus a TLS handshake for HTTPS upstreams) to every request.
No explicit timeouts: proxy_connect_timeout, proxy_read_timeout, and proxy_send_timeout all default to 60 seconds, so a hung backend can tie up a client request and its connection for a minute or more. Set all three explicitly.
Sticky sessions without health checks: once a backend dies, every user it was stuck to loses their session. Sticky + health check is the minimum.
Round-robin on heterogeneous hardware: the old box becomes the bottleneck. Use weight or least_conn.
Passive health checks only on critical services: for anything critical, add active health checks through NGINX Plus, Angie, or a third-party module.
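The timeout advice above can be sketched as follows. The values are illustrative starting points, not recommendations; the right numbers depend on how long your slowest legitimate request takes.

```nginx
# Lives in the http context, alongside the upstream block.
server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_connect_timeout 3s;    # time allowed to establish the TCP connection
        proxy_send_timeout    15s;   # max gap between two writes to the backend
        proxy_read_timeout    30s;   # max gap between two reads from the backend
    }
}
```

Note that the send and read timeouts apply between successive operations, not to the whole transfer, so a slow but steady download is not cut off.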

Frequently Asked Questions

Which NGINX load balancing algorithm is best for WordPress?

For PHP-FPM backends behind NGINX, least_conn is usually the right answer. WordPress requests vary wildly in cost (a static page vs a logged-in WooCommerce cart vs a search query), so distributing by active connection count keeps each backend honest. Round-robin works fine until one backend gets a slow query.

Do passive health checks work in free open-source NGINX?

Yes. max_fails and fail_timeout on the upstream server directives are part of free, open-source NGINX. They learn from real traffic, so a dying backend sees a few real requests fail before it is marked unavailable. Active health checks (probing /healthz on a schedule) require NGINX Plus, Angie (check which edition includes them), or a third-party module.

How do I do session affinity / sticky sessions?

Two options. ip_hash is free and works when each client has a unique IP, but breaks behind NAT and on mobile carrier CGNAT. Cookie-based sticky is much more robust but needs the sticky directive, which is in NGINX Plus, Angie, or the nginx-sticky-module-ng third-party module.

What is the right keepalive value for an NGINX upstream?

A reasonable rule of thumb: keepalive equal to the maximum number of concurrent connections you expect a single NGINX worker to hold open to that upstream, divided by 2. For most teams that’s somewhere between 32 and 128. Always pair it with proxy_http_version 1.1 and proxy_set_header Connection ""; without those, keepalive does nothing.

Can I load balance TCP and UDP with NGINX, not just HTTP?

Yes — the stream module handles arbitrary TCP and UDP. Same algorithms, same weights, same passive health checks. Common uses: MySQL read replicas, Redis sentinels, DNS resolvers, gRPC. UDP load balancing with reuseport is the foundation of HTTP/3 in a multi-instance NGINX deployment.

Does Angie support the same load balancing features as NGINX?

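Yes, and more. Angie is a free NGINX fork maintained by former NGINX developers. It bundles features that are paid NGINX Plus extras in upstream NGINX, such as cookie-based sticky sessions and a JSON status API for monitoring; active health probes are also available, though you should check which edition includes them. Angie accepts standard NGINX configuration, so existing nginx.conf files generally work unchanged, while the Angie-specific features use their own directives.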

How do I tell which backend served a given request?

Add $upstream_addr to your NGINX log_format. You can also expose it as a response header for debugging with add_header X-Upstream $upstream_addr — useful during deployments to confirm a specific backend is or isn’t getting traffic. Remove the header in production once you’ve finished debugging; you don’t want to leak backend IPs to clients.
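One way to keep the debug header without leaking backend addresses is to emit it only for trusted clients. A sketch, assuming 10.0.0.0/8 is your internal network and that NGINX sees the real client address in $remote_addr; the variable names are illustrative.

```nginx
# Lives in the http context.

# 1 for clients on the (assumed) internal network, 0 for everyone else.
geo $debug_client {
    default     0;
    10.0.0.0/8  1;
}

# Empty for ordinary clients; add_header skips headers whose value is empty.
map $debug_client $debug_upstream {
    1        $upstream_addr;
    default  "";
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        add_header X-Upstream $debug_upstream always;
    }
}
```

Keep in mind that add_header directives in a location replace, rather than extend, any inherited from the server or http level.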

NGINX Reverse Proxy Configuration Guide — proxy_pass, caching, and security headers for the front end of the load balancer.
NGINX Rate Limiting Guide — protect each backend from abuse before the load balancer ever reaches them.
HTTP/3 on NGINX for Debian and Ubuntu — the UDP load balancing piece that makes HTTP/3 scale horizontally.
Angie Web Server: The Complete Guide — free active health checks, sticky sessions and JSON metrics.
WordPress NGINX + PHP-FPM Configuration Guide — the canonical workload behind these upstream blocks.
