Varnish SSL and hitch SSL termination super bad performance

March 2, 2017
Caching Scaling Ubuntu 16.04

I'm running load tests with loader.io and noticed that SSL termination in front of Varnish performs very badly.

My DigitalOcean graph seems to show disk I/O maxed at 1.21 MB/s (isn't this incredibly low? My M4 SSD runs at around 1.500 MB/s, which is not the same as 1.5, right?)

Cache-Control:  max-age=333s

I have set up Hitch as the SSL terminator like so:

sudo nano /etc/hitch/hitch.conf
# ADD:
## Basic hitch config for use with Varnish and Acmetool

# Listening
frontend = "[*]:443"
ciphers = "EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH"

# Send traffic to the Varnish backend using the PROXY protocol
backend = "[::1]:6086"
write-proxy-v2 = on

# List of PEM files, each with key, certificates and dhparams
pem-file = "/var/lib/acme/live/website.io/haproxy"
# END ADD
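After saving, you can sanity-check the configuration before restarting Hitch (a sketch; the --test flag parses the config and exits without starting the daemon):

```shell
# Validate the configuration; exits non-zero on a syntax error.
sudo hitch --test --config /etc/hitch/hitch.conf

# Apply the new configuration.
sudo systemctl restart hitch
sudo systemctl status hitch
```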

And Varnish like so:

sudo nano /etc/varnish/acmetool.vcl
# ADD: 
backend acmetool {
   .host = "127.0.0.1";
   .port = "402";
}

sub vcl_recv {
    if (req.url ~ "^/.well-known/acme-challenge/") {
        set req.backend_hint = acmetool;
        return(pass);
    }
}
# END ADD
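To confirm the challenge routing works, you can request a dummy token once Varnish is running (a sketch; website.io is the domain from the config above, and even a 404 from acmetool shows the request reached port 402 rather than the default backend):

```shell
# Any token will do; unless a challenge is actually in progress,
# acmetool answers 404, which still proves the path is routed correctly.
curl -i http://website.io/.well-known/acme-challenge/test-token
```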


# include acmetool settings in default.vcl
cp /dev/null /etc/varnish/default.vcl
sudo nano /etc/varnish/default.vcl
# ADD:
vcl 4.0;
import std;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (std.port(local.ip) == 80) {
        set req.http.x-redir = "https://" + req.http.host + req.url;
        return(synth(850, "Moved permanently"));
    }
}
sub vcl_synth {
    if (resp.status == 850) {
        set resp.http.Location = req.http.x-redir;
        set resp.status = 301;
        return (deliver);
    }
}

include "/etc/varnish/acmetool.vcl";
# END ADD
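You can then load the VCL and verify the HTTP-to-HTTPS redirect (a sketch, assuming varnishadm access and the website.io domain from above; the label redirect01 is arbitrary):

```shell
# Compile and activate the new VCL without restarting Varnish.
sudo varnishadm vcl.load redirect01 /etc/varnish/default.vcl
sudo varnishadm vcl.use redirect01

# A plain HTTP request should come back as a 301 with an https:// Location.
curl -sI http://website.io/ | grep -i -e '^HTTP' -e '^Location'
```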

What's wrong with my setup and how can I improve performance?

3 Answers

Your graph shows the maximum your droplet used, not the maximum available to it.

@Woet That makes sense. It's just a bit confusing when the CPU and similar graphs show 0 - 100%.

My loader.io stats

  • @braminator

    What sort of test(s) are you running via loader.io (i.e. the specifics)?

    As far as the disk I/O showing 1.21MB/s: that is indeed the maximum being used, not a cap, and reads/writes can go higher. Unless you're seeing high I/O wait time in the output of top, disk performance most likely isn't the issue.

    To gauge performance using Hitch, Varnish, and NGINX, I set up a 2GB Droplet.

    loader.io test setup

    • Test Type: Clients per Second
    • Clients: 1,000
    • Duration: 1 minute

    ...

    Hitch Configuration

    frontend = "[*]:443"
    ciphers  = "EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH"
    
    backend        = "[::1]:6081"
    write-proxy-v2 = on
    
    pem-file = "/var/lib/acme/live/domain.com/haproxy"
    

    Varnish systemd Service

    [Unit]
    Description=Varnish HTTP accelerator
    Documentation=https://www.varnish-cache.org/docs/4.1/ man:varnishd
    
    [Service]
    Type=simple
    LimitNOFILE=131072
    LimitMEMLOCK=82000
    ExecStart=/usr/sbin/varnishd -j unix,user=vcache -F -a '[::1]:6081,PROXY' -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m
    ExecReload=/usr/share/varnish/reload-vcl
    ProtectSystem=full
    ProtectHome=true
    PrivateTmp=true
    PrivateDevices=true
    
    [Install]
    WantedBy=multi-user.target
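    After editing the unit file, systemd needs to re-read it before the new ExecStart takes effect (a sketch):

    ```shell
    # Pick up the edited unit file and restart Varnish.
    sudo systemctl daemon-reload
    sudo systemctl restart varnish

    # Confirm Varnish is listening where Hitch expects it ([::1]:6081).
    ss -tlnp | grep 6081
    ```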
    

    Varnish Default VCL

    vcl 4.0;
    import std;
    
    # Default backend definition. Set this to point to your content server.
    backend default {
        .host = "127.0.0.1";
        .port = "8080";
    }
    
    include "/etc/varnish/acmetool.vcl";
    

    Acme Tool VCL

    backend acmetool {
       .host = "127.0.0.1";
       .port = "402";
    }
    
    sub vcl_recv {
        if (req.url ~ "^/.well-known/acme-challenge/") {
            set req.backend_hint = acmetool;
            return(pass);
        }
    }
    

    NGINX default Server Block

    server {
        listen 8080 default_server;
        listen [::]:8080 default_server;
    
        root /var/www/html;
    
        index index.html;
    
        server_name domain.com;
    
        location / {
            try_files $uri $uri/ =404;
        }
    }
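    After dropping in the server block, validate and reload NGINX, then check that the backend answers directly on port 8080, bypassing Varnish:

    ```shell
    # Validate the configuration and reload without dropping connections.
    sudo nginx -t
    sudo systemctl reload nginx

    # The default index.html should be served straight from NGINX.
    curl -sI http://127.0.0.1:8080/ | head -n 1
    ```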
    

    Using Default index.html

    It's important to keep in mind that repeatedly hitting such a basic HTML file is not a clear indication of performance, so the results in the screenshot below are not the best indicator of how well a setup performs.

    http://imgur.com/a/f5Gp3

    • Thanks for testing along. I set up loader.io to test at 3,000 requests per second. The issue is the timeouts: in my test, almost 50% of the requests timed out. While stress testing, I also timed out myself when trying to visit the page.

      • @braminator

        At that rate, the timeouts you're experiencing are most likely due to saturation of the available ports. When your system is running a "stock" configuration without any tweaks, this happens often.

        You can tweak /etc/sysctl.conf to your liking to squeeze a little extra out of your Droplet, though if you were really taking on that many connections per second in production, you'd be far better off offloading connections to a load balancer that distributes them across multiple backends.

        You'd still want to make some configuration changes, but distribution of load is going to take you much further than trying to configure an entire stack on a single or even a few droplets.

        sysctl.conf

        If you run:

        sysctl net.ipv4.ip_local_port_range
        

        You should see something like:

        net.ipv4.ip_local_port_range = 32768    60999
        

        ... which means that roughly ~28k ports are available for connections. If you are running 3,000 connections/second over 1 minute, that's 60 * 3000, or 180k connections.

        So what you can do is reduce 32768 to something such as 16000, which frees up another ~16k ports for connections. You'd still be a ways off from handling 180k connections in a minute, but it's a step in the right direction.
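        The arithmetic can be sketched as a quick shell calculation, using the numbers from this thread:

        ```shell
        # Ephemeral ports available after lowering the bottom of the range.
        ports=$((60999 - 16000))
        # Connections opened by a 1-minute test at 3,000 clients per second.
        needed=$((3000 * 60))
        echo "available ports: $ports"       # 44999
        echo "connections needed: $needed"   # 180000
        ```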

        The next thing you can run is:

        sysctl net.ipv4.tcp_fin_timeout
        

        You should see something like:

        net.ipv4.tcp_fin_timeout = 60
        

        That number (60) is how long the kernel keeps an orphaned connection in the FIN-WAIT-2 state before tearing it down, i.e. how long a closing connection can tie up a port before it's usable again. With rapid connection churn like that, 60 seconds isn't going to cut it, so you'd want to reduce it to a value you're comfortable with.
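        During a test you can watch how many sockets are tied up waiting to close (a diagnostic sketch; ss ships with iproute2 on Ubuntu 16.04):

        ```shell
        # Count sockets in TIME_WAIT; a count approaching the size of the
        # ephemeral port range means new connections will start failing.
        ss -tan state time-wait | tail -n +2 | wc -l
        ```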

        You can make these changes temporary by using:

        sysctl net.ipv4.ip_local_port_range="16000 60999"
        

        and

        sysctl net.ipv4.tcp_fin_timeout="30"
        

        If you need these to persist, you'd want to modify /etc/sysctl.conf and set the same values per option.
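        For example, appending the values discussed above (a sketch; adjust the numbers to taste):

        ```shell
        # Persist the tuned values across reboots ...
        echo 'net.ipv4.ip_local_port_range = 16000 60999' | sudo tee -a /etc/sysctl.conf
        echo 'net.ipv4.tcp_fin_timeout = 30' | sudo tee -a /etc/sysctl.conf

        # ... and apply them immediately without a reboot.
        sudo sysctl -p
        ```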

        There are quite a few options when it comes to tweaking/tuning your system, though as noted above, this is less than ideal for a setup of only one or two servers. Ideally you'd distribute the load across multiple servers.
