Trying out nginx as a load balancer in a container environment

I’m currently playing around with load balancing traffic from a proxy server to multiple worker services. Everything runs in a plain Docker environment, so all I can use is a Compose file. This was my first approach:

---
version: '3.9'

services:
  web:
    image: nginx
    deploy:
      replicas: 4

  proxy:
    image: nginx
    ports:
      - 8080:80
    volumes:
      - type: bind
        source: ./nginx.conf
        target: /etc/nginx/conf.d/default.conf
        read_only: true

This spawns four nginx containers serving nothing but the default welcome page. Good enough for me, as I’m not interested in the content, but in the load balancing result. Another nginx container gets a special config mounted:

server {
    listen 80;
    listen [::]:80;

    access_log off;
        
    location / {
        # "web" is the Compose service name; Docker's embedded DNS
        # resolves it to the IPs of the running replicas.
        proxy_pass http://web;
    }
}

After doing a docker compose up I was able to query the page. To watch the load balancing I fired requests at the proxy in a loop and inspected the logs printed to stdout.
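
The loop looked roughly like this (a sketch; I didn’t preserve the exact command, but the three-second pacing matches the log timestamps):

while true; do
    curl -s http://localhost:8080/ > /dev/null
    sleep 3
done

And there I noticed something: the proxy container would only forward traffic to two of the web containers: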

scale-test-web-2 | 172.24.0.3 - - [28/Jan/2023:14:42:36 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-2 | 172.24.0.3 - - [28/Jan/2023:14:42:39 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-3 | 172.24.0.3 - - [28/Jan/2023:14:42:42 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-2 | 172.24.0.3 - - [28/Jan/2023:14:42:45 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-3 | 172.24.0.3 - - [28/Jan/2023:14:42:48 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-2 | 172.24.0.3 - - [28/Jan/2023:14:42:51 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-3 | 172.24.0.3 - - [28/Jan/2023:14:42:54 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-2 | 172.24.0.3 - - [28/Jan/2023:14:42:57 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-3 | 172.24.0.3 - - [28/Jan/2023:14:43:01 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"

Well, that’s odd. When I entered the proxy container’s shell and did an nslookup (after installing the dnsutils package via apt), it returned four IPs. So I suspected a race condition: when the stack starts, the proxy container resolves the web hostname, but it only receives the IPs of the service containers that have already started. Containers that come up later would never be used.
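
The lookup is quick to reproduce (a sketch; the IPs are illustrative, 127.0.0.11 is Docker’s embedded DNS resolver):

$ docker compose exec proxy bash
# apt update && apt install -y dnsutils
# nslookup web
Server:         127.0.0.11
Address:        127.0.0.11#53

Non-authoritative answer:
Name:   web
Address: 172.24.0.2
Name:   web
Address: 172.24.0.4
Name:   web
Address: 172.24.0.5
Name:   web
Address: 172.24.0.7
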
To test my theory I set the replica count to 30 and started the stack. Again: only two service containers were used to balance the requests.
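
As an aside, the replica count can also be overridden from the CLI without touching the Compose file (an equivalent alternative, not what I ran back then):

docker compose up -d --scale web=30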

So I downed the stack again and added the following config to the proxy service:

depends_on:
  - web
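
For context, the complete proxy service now looks like this (just the pieces from above merged together):

  proxy:
    image: nginx
    ports:
      - 8080:80
    depends_on:
      - web
    volumes:
      - type: bind
        source: ./nginx.conf
        target: /etc/nginx/conf.d/default.conf
        read_only: true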

This ensures that the proxy service is started after the web containers. Note: this does not ensure that the web containers are operational in any way before the proxy starts! But for my test setup this was okay. Now the proxy service balances requests across multiple web containers:

scale-test-web-4 | 172.24.0.6 - - [28/Jan/2023:14:53:05 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-3 | 172.24.0.6 - - [28/Jan/2023:14:53:08 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-1 | 172.24.0.6 - - [28/Jan/2023:14:53:11 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-2 | 172.24.0.6 - - [28/Jan/2023:14:53:14 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-4 | 172.24.0.6 - - [28/Jan/2023:14:53:17 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-3 | 172.24.0.6 - - [28/Jan/2023:14:53:20 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-1 | 172.24.0.6 - - [28/Jan/2023:14:53:23 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"
scale-test-web-2 | 172.24.0.6 - - [28/Jan/2023:14:53:26 +0000] "GET / HTTP/1.0" 200 615 "-" "curl/7.85.0" "-"

Okay, this looks like round-robin DNS based load balancing. So far, so good. What happens when I stop one of the web containers? At first nothing, until the proxy tries to connect to the downed container. The connection hangs until my curl runs into a timeout. Can I somehow tell nginx to time out sooner? Sure! By adding a proxy_connect_timeout 1; to the location part of the config:

location / {
    proxy_pass http://web;
    proxy_connect_timeout 1;
}
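
Stopping one of the web containers and hitting the proxy now looks roughly like this (illustrative; to my understanding nginx answers with a 504 Gateway Time-out when the connect timeout fires):

$ docker stop scale-test-web-2
$ curl -sS -o /dev/null -w '%{http_code}\n' http://localhost:8080/
504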

Now, a second after an unsuccessful connection attempt, an error is returned to my browser. That’s … uncool. I wish nginx would try another upstream before reporting an error to the requester. But what’s even worse: the broken upstream is not taken out of the rotation, so after three more requests the game repeats. Can nginx be blamed for this? Yes and no. Nginx has an upstream module (🖇️ 🔐) for distributing requests to upstream servers. However, it can only be used if you know the IPs beforehand, and in the container world that is not always the case. The paid nginx subscription also provides health checks and more.
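
For completeness, here’s a minimal sketch of what that looks like with the upstream module, assuming fixed, known addresses (the IPs are placeholders):

upstream web_backend {
    # With static entries nginx can do passive health checking:
    # after max_fails failed attempts a server is marked down
    # and skipped for fail_timeout seconds.
    server 172.24.0.2 max_fails=1 fail_timeout=10s;
    server 172.24.0.4 max_fails=1 fail_timeout=10s;
    server 172.24.0.5 max_fails=1 fail_timeout=10s;
    server 172.24.0.7 max_fails=1 fail_timeout=10s;
}

server {
    listen 80;

    location / {
        proxy_pass http://web_backend;
        proxy_connect_timeout 1;
        # On connection errors or timeouts, try the next server
        # instead of returning the error to the client.
        proxy_next_upstream error timeout;
    }
}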

So what’s the conclusion? Yes, nginx can be used as a load balancer. DNS based load balancing is probably the oldest way to balance requests. But for modern applications you probably want more features:

- health checks, so broken upstreams are taken out of the rotation automatically
- retries against another upstream before an error is reported to the client
- dynamic discovery of upstreams as containers come and go

In the end it was fun to play with it and try it out.


You have a comment, a wish, or an improvement? Write me an e-mail! The details can be found here.

🖇️ = link to another website
🔐 = website uses HTTPS (encrypted transport)