Suddenly experiencing huge wait times before requests hit our server

March 15, 2016 1.6k views
Nginx Ruby on Rails Scaling Load Balancing Networking Server Optimization

Three or four days ago I noticed that our app was running much slower than it had been even one day before, and we kept getting alerts from New Relic saying the site was offline, though it wasn't. I assume the pings were taking so long to come back that New Relic treated the site as down.

In New Relic Synthetics I can see waits, sometimes upwards of 40 seconds, that are classified only as 'waiting'. By monitoring the output of tail -f /var/log/{apache2,httpd,nginx}/{access,error}.log I can see that those waits correspond to how long it takes incoming requests to reach the server. Not every request is affected: some traffic is still served promptly, but more requests now seem to hang for 10+ seconds before being handled than are served in what had been the customary time until a few days ago.

This is a Rails app running Unicorn & Nginx on a 2GB droplet. I have previously optimized my Unicorn configuration according to this article: [](http://), and our memory usage tends to hover around 50%. Nonetheless, I also noticed a bunch of these in that same log output from above:

2016/03/15 06:52:36 [error] 9460#0: *1110377 connect() to unix:/tmp/unicorn.streamfeed.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client:, server:, request: "GET /watch HTTP/1.1", upstream: "http://unix:/tmp/unicorn.streamfeed.sock:/watch", host: "", referrer: ""
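To see whether these errors cluster around the slow periods, the log can be tallied per minute. The following is only a minimal sketch: it assumes nginx's default error-log timestamp format (as in the line above), and the helper name `eagain_counts_per_minute` is made up for illustration.

```ruby
# Count nginx "(11: Resource temporarily unavailable)" upstream connect
# failures per minute. Assumes the default error-log timestamp format
# "YYYY/MM/DD HH:MM:SS" seen in the log line quoted in this post.
EAGAIN_PATTERN = %r{
  \A(\d{4}/\d{2}/\d{2}\ \d{2}:\d{2}):\d{2}\ \[error\]   # timestamp, minute captured
  .*connect\(\)\ to\ unix:\S+\ failed\                   # failed connect to a unix socket
  \(11:\ Resource\ temporarily\ unavailable\)            # errno 11
}x

# Hypothetical helper: takes any enumerable of log lines and returns a
# Hash mapping "YYYY/MM/DD HH:MM" to the number of matching errors.
def eagain_counts_per_minute(lines)
  counts = Hash.new(0)
  lines.each do |line|
    match = EAGAIN_PATTERN.match(line)
    counts[match[1]] += 1 if match
  end
  counts
end

# Example usage (the log path is an assumption, adjust to your setup):
#   eagain_counts_per_minute(File.foreach("/var/log/nginx/error.log"))
#     .sort.each { |minute, count| puts "#{minute}  #{count}" }
```

Comparing the busiest minutes from this tally against the slow periods in New Relic would show whether the socket errors and the long waits actually coincide.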

Without any better ideas, I figured that error might mean we're getting more traffic than our Unicorn workers can handle (our traffic has been increasing steadily and we just had our busiest day ever the other day). So I upgraded our droplet to 4GB and doubled the number of Unicorn workers. Those errors no longer appear in the logs, but the app is running even slower than before.
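A rough way to sanity-check the worker count is some back-of-the-envelope arithmetic. The sketch below takes `worker_processes 11` and the 260 MB worker-killer ceiling from the unicorn.rb further down; the 0.5 second average response time is purely a placeholder, not a measured value, and both function names are made up for illustration.

```ruby
# Back-of-the-envelope capacity check for a Unicorn setup.
# The worker count and OOM ceiling come from the unicorn.rb in this post;
# the average response time used below is a made-up placeholder.

MEGABYTE = 1024**2

# Upper bound on combined worker RSS (in MB) if every worker sits right
# at the worker-killer ceiling.
def worst_case_memory_mb(workers, oom_max_bytes)
  workers * oom_max_bytes / MEGABYTE
end

# Each Unicorn worker handles one request at a time, so sustained
# throughput is bounded by workers / average response time.
def max_requests_per_second(workers, avg_response_seconds)
  workers / avg_response_seconds.to_f
end

workers = 11
oom_max = 260 * MEGABYTE

puts worst_case_memory_mb(workers, oom_max)    # => 2860
puts max_requests_per_second(workers, 0.5)     # => 22.0
```

If the real average response time (from New Relic, for example) gives a ceiling below the incoming request rate, requests will queue on the socket no matter how Nginx is tuned.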

I'm pretty much clueless how to proceed here, except that the one relevant-seeming thing I haven't done is optimize Nginx. I've been reading about that but am overwhelmed about where and how to begin. If anyone can make a suggestion, or can offer another avenue to pursue, I'd be very grateful. Obviously our site can't be taking 30-40 seconds to serve every single request, and I don't even know how to diagnose the problem, much less solve it.

This is my unicorn.rb:

root = "/home/deployer/apps/streamfeed/current"
working_directory root
pid "#{root}/tmp/pids/"
stderr_path "#{root}/log/unicorn.log"
stdout_path "#{root}/log/unicorn.log"

listen "/tmp/unicorn.streamfeed.sock"
worker_processes 11
timeout 1000

if ENV['RAILS_ENV'] == 'production' 
  require 'unicorn/worker_killer'

  max_request_min =  500
  max_request_max =  600

  # Max requests per worker
  use Unicorn::WorkerKiller::MaxRequests, max_request_min, max_request_max

  oom_min = (240) * (1024**2)
  oom_max = (260) * (1024**2)

  # Max memory size (RSS) per worker
  use Unicorn::WorkerKiller::Oom, oom_min, oom_max
end

require ::File.expand_path('../config/environment',  __FILE__)
run Rails.application

(obviously I have unicorn-worker-killer installed)

and nginx.conf:

upstream unicorn {
  server unix:/tmp/unicorn.streamfeed.sock fail_timeout=0;
}

server {
  rewrite ^(.*)$1 permanent;
}

server {
  listen 80 default_server deferred;
  # server_name;
  root /home/deployer/apps/streamfeed/current/public;

  location ^~ /assets/ {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
    location ~* \.(js|css)$ {
      add_header Access-Control-Allow-Origin *;
    }
  }

  location ^~ /fonts/ {
    gzip_static on;
    expires max;
    add_header Cache-Control public;
    location ~* \.(ttf|ttc|otf|eot|woff|svg|font.css)$ {
      add_header Access-Control-Allow-Origin *;
    }
  }

  try_files $uri/index.html $uri @unicorn;
  location @unicorn {
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_pass http://unicorn;
  }

  error_page 500 502 503 504 /500.html;
  client_max_body_size 4G;
  keepalive_timeout 10;
}
