I’m getting a lot of 504 Gateway Timeout errors when sending requests to my load balancer, especially when POSTing data.

The LB receives about 30 requests/second, well under the 200 requests/second mentioned here.

These errors don’t seem to be correctly reflected as 5xx error codes in the load balancer’s graphs: the number of 5xx responses shown is lower than the number I can generate just by hitting the endpoint manually.

When the request does get through, response time is always under 150ms.

I’m seeing 0 in the Queue metric for the droplets, and I have about 20 concurrent connections according to the LB.

Behind the LB are three 2 vCPU / 2 GB droplets running nginx as a reverse proxy to Node.js (pm2 clusters with one instance per vCPU, so six instances of the application in total).

My questions are:
1 - What steps would you recommend I follow to isolate the cause of the 504 errors?
2 - Why does the Queue metric never move from 0 even though I have tons of timeouts?

I’m happy to provide any clarification needed.
Thanks in advance

Antoine

3 answers

Hello,

Even though I believe this is more of an issue with the backend servers, have you tried increasing the timeout on the load balancer itself to see if the error persists?

Also, have you checked the logs on your backend servers? I would recommend starting with the Nginx logs; there might be some useful information there.
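For example, a quick way to tally status codes from an nginx access log in `combined` format. In practice you would stream `/var/log/nginx/access.log` (the default path on Debian/Ubuntu; yours may differ); a small inline sample is used here to keep the sketch self-contained:

```javascript
// Tally status codes from nginx "combined"-format access log lines.
// The sample lines below are made up for illustration.
const sample = [
  '10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "POST /api HTTP/1.1" 504 0 "-" "curl"',
  '10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "curl"',
  '10.0.0.3 - - [01/Jan/2024:00:00:03 +0000] "POST /api HTTP/1.1" 499 0 "-" "Mozilla"',
  '10.0.0.1 - - [01/Jan/2024:00:00:04 +0000] "POST /api HTTP/1.1" 504 0 "-" "curl"',
].join("\n");

function tallyStatuses(log) {
  const counts = {};
  for (const line of log.split("\n")) {
    // The status code is the three-digit field right after the quoted request.
    const m = line.match(/" (\d{3}) /);
    if (m) counts[m[1]] = (counts[m[1]] || 0) + 1;
  }
  return counts;
}

console.log(tallyStatuses(sample)); // → { '200': 1, '499': 1, '504': 2 }
```

A spike of 504s confined to a few URLs (or to POSTs only, as described above) narrows down which handlers are slow.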

Regards,
Bobby

Hey @bobbyiliev,

I’m using DigitalOcean’s load balancers, not hosting one myself, and there’s no timeout setting (except for health checks) that I’m aware of.

In the nginx access log I can see a lot of 499 and 504. The 499s make sense considering the minute-long timeout: people probably moved on to another page and their browser cancelled the POST request. I still can’t figure out why there’s such a delay, but it does sound like the problem is in my backend.
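If the backend is the suspect, nginx can show where the time goes. A sketch of a log format with per-request timing (the format name and log path are placeholders; `$request_time` and `$upstream_response_time` are standard nginx variables):

```nginx
# In the http {} block: log total time spent in nginx
# and time spent waiting on the upstream (the Node app).
log_format timed '$remote_addr "$request" $status '
                 'rt=$request_time urt=$upstream_response_time';

# In the relevant server {} or location {} block:
access_log /var/log/nginx/access_timed.log timed;
```

A large `rt` with a matching `urt` points at the Node app; a large `rt` with a small (or empty) `urt` points at nginx itself or the client connection.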

Does anyone know how to get the Queue metric to reflect something in the DO GUI? I read https://www.digitalocean.com/community/questions/do-load-balancer-what-is-the-queue but I don’t see a concrete answer to my question in there.

Thanks,
Antoine

  • hey @antoinetheriaultbrunet sounds like we are in the same situation. We were using exec_mode: "cluster" with pm2 (http://pm2.keymetrics.io/docs/usage/cluster-mode/) and experiencing the same DigitalOcean load balancer issues you described. We found out that enabling exec_mode: "cluster" was the culprit.

    We’re also experiencing some other issues using instances: "max", and we’re not sure if it’s related, but in general it’s difficult to debug and test, as we can only do so reactively (servers getting hit with traffic and experiencing downtime).

    I was wondering if you had any advice, recommendations, or updates on how you went about this?
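    A minimal pm2 ecosystem file sketch for the fork-mode workaround mentioned above (the app name and entry point are placeholders for your own):

    ```javascript
    // ecosystem.config.js -- run the app in fork mode instead of cluster mode.
    // "api" and "app.js" are placeholders.
    module.exports = {
      apps: [
        {
          name: "api",
          script: "app.js",
          exec_mode: "fork", // cluster mode was reported above as the culprit
          instances: 1,      // fork mode runs one process per app entry
        },
      ],
    };
    ```

    Applied with `pm2 start ecosystem.config.js` (or `pm2 reload` after editing).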

I’m having the same issue with a PHP application with max_execution_time set to 300s. The load balancer returns an HTTP 504 after exactly 60s.

Neither the interface nor the API for load balancers provide a way to change this limit.

Arguably a web application shouldn’t take more than 60s to answer, but of course there is always this very special case…
