Migrating to a Scalable WordPress Solution

July 11, 2018
WordPress Nginx Chef Terraform Docker MySQL Block Storage Ubuntu

TL;DR: I want to manage and configure multiple LEMP stacks with Terraform and Chef/Ansible to host WordPress sites without losing data when a droplet is destroyed.

Here are the questions that I am looking to have answered:

  • Are Terraform and Chef/Ansible the right direction to move in to help me manage multiple WordPress sites?
  • Is it better to update the NGINX configuration by destroying and rebuilding a droplet with Terraform, or to change the file across all servers with Chef/Ansible?
  • If I use Terraform to configure NGINX, PHP, WordPress config files, etc., how do I preserve the WordPress database and uploads data so that when a droplet is destroyed by Terraform, the site remains live without interruption?
  • Is Block Storage or GlusterFS a better solution for managing the data of hundreds of different WordPress sites?
  • Is Chef or Ansible better for this purpose?

Here is the context for those questions:
I design and host WordPress websites in my area. For lack of a deeper understanding of automation, I have been manually deploying the sites on DigitalOcean droplets and configuring each one independently. The only difference between the servers is the WordPress database and uploads data, which I feel justifies automating deployment through Terraform. I guess what I am looking for is a critical ear to interpret my plan and guide me in the right direction.

The plan is to set up a centralized management server which will host Terraform for infrastructure and Docker automation, Chef (or Ansible) for configuration management, and an OpenVPN server for secure access to the WordPress servers. All servers are built on a LEMP stack and have carefully secured and optimized NGINX configurations.

The goal is that when a client requests a website, I can easily spin up a new droplet through Terraform which containerizes the LEMP stack and configures everything (like the NGINX files) with Chef or Ansible. I would also like to be able to update server files in one place and have the change propagate across all servers. This is where I could really use some guidance.
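
For illustration, here is a minimal sketch of what that per-client Terraform step might look like (every name and value below is a hypothetical placeholder, not my actual config):

```bash
# Write a Terraform definition for the new client's droplet, then apply it.
# Resource names, slugs, and sizes here are made-up examples.
cat > client-acme.tf <<'EOF'
resource "digitalocean_droplet" "acme" {
  image  = "ubuntu-18-04-x64"
  name   = "wp-acme"
  region = "nyc3"
  size   = "s-1vcpu-1gb"
}
EOF
terraform init    # fetch the DigitalOcean provider
terraform plan    # preview what will be created
terraform apply   # create the droplet
```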

Does it make sense to update configuration files (like NGINX or PHP) in place with Chef/Ansible, or to rebuild the droplets with Terraform so there is no configuration drift? If I go the Terraform route, the next problem is keeping the WordPress sites and data live when Terraform destroys a droplet. There are some pretty good tutorials on here about setting up GlusterFS or block storage, so I could potentially host all of the WordPress data on separate servers and use Terraform to manage the processing servers.
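
To make the two approaches concrete, here is a rough sketch of each (resource, inventory, and playbook names are hypothetical):

```bash
# Immutable approach: mark the droplet for rebuild so the next apply
# destroys and recreates it from a known-good configuration.
terraform taint digitalocean_droplet.wp_client
terraform apply

# In-place approach: push the changed NGINX config to every existing
# server with a configuration-management run.
ansible-playbook -i production nginx.yml --limit wordpress_servers
```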

I appreciate the community taking the time to read this and any feedback or guidance would be greatly appreciated.

2 Answers
jarland MOD July 11, 2018
Accepted Answer

(Some mistakes were made in this reply, see comments that follow it)

Hello friend!

That sounds like a lot of fun any way you spin it. I'm going to do my best to answer what I can, and hope that others feel compelled to weigh in as well. My perspective alone won't cover all of the advice you're looking for, but I still have some thoughts I'd like to share.

My position on Terraform, Chef, and Ansible is "maybe" on all points. I think someone who uses them more intensively than I do might have reasons to lean in a more clear direction. I hope someone will see the opening I'm leaving there and run with it.

What I think I have the most insight on is this:

  • Making individual servers irrelevant, and their destruction non-destructive

Here's what I'm thinking, just as a rough idea:

  • 2x MySQL with master-master replication behind HAProxy
  • 2x LEMP stacks with data in sync via GlusterFS, both behind HAProxy (or our Load Balancer service)
  • Automation which spins up a new server, adds it to the GlusterFS cluster, connects it to the MySQL HAProxy instance, then adds it to the load balancer serving the LEMP stacks (a rough sketch follows this list)
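
As a loose sketch of that automation (every hostname and ID below is made up):

```bash
# 1. Create the new LEMP droplet
terraform apply

# 2. Join it to the GlusterFS pool (run from an existing node) and
#    mount the shared site data on the new server
gluster peer probe lemp-03
mount -t glusterfs storage-01:/wpdata /var/www

# 3. Point wp-config.php's DB_HOST at the MySQL HAProxy address, then
#    add the droplet to the load balancer fronting the LEMP stacks
doctl compute load-balancer add-droplets <lb-id> --droplet-ids <droplet-id>
```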

You can use our block storage for the data stores, but a volume can only be mounted to one droplet at a time. You would end up needing a volume for each stack, with something like GlusterFS keeping the volumes in sync across those droplets. Destroying one droplet would inevitably make its attached volume fall out of sync with the rest, requiring a fresh sync with a new droplet. Unless you need the additional storage, using storage volumes here seems to add complexity for no particular gain.

Now one might imagine having 2x data stores in the cluster and mounting them over the network to multiple servers at once. In my experience, this is more functional in theory than in practice, and I have never personally found a protocol that lives up to the theory of how great it could be. The most functional implementation I have had on that path has been a davfs Docker volume.

I hope that at least helps you with some considerations on this :)

Kind Regards,
Jarland

  • Hey Jarland!

    Thanks for taking the time to write such a thoughtful response; I hope it helps many others with similar questions! :D

    I have to admit I'm a bit confused about how the LEMP stacks and GlusterFS play together in this configuration to facilitate, as you aptly put it, "non-destructive destruction."

    From what I understand:

    • When a user visits one of the clients' websites, the load balancer serves them from whichever of the two LEMP stacks is most available.
    • The LEMP stacks are then configured in a multi-site setup to serve the WordPress files from the GlusterFS cluster and the most available MySQL database.

    Where I fall short of understanding is how the automated servers fit in with GlusterFS, and how configuration management for NGINX, PHP, etc. would be handled in this environment.

    Do the servers deployed through automation each host their own web server, serving the client's website through the load balancer, or are they simply nodes which add more storage to the GlusterFS cluster?

    I really appreciate the response you've given me. It has put me on a better course than I was on before!

    Cheers,
    Curtis

    • Glad I can at least help some! So about that GlusterFS, here's kind of a different outline with more focus on that, and a course correction from my previous reply:

      Top level: Load Balancer
      Under the LB:

      • LEMP server with /var/www set up as GlusterFS directory
      • LEMP server with /var/www set up as GlusterFS directory

      Under another LB:

      • MySQL master
      • MySQL secondary master (master-master replication)

      Outside of the LB:

      • Storage server with /sync set up as GlusterFS directory
      • Storage backup server with /sync set up as GlusterFS directory

      The LEMP servers would be connected together with the storage GlusterFS cluster, and then perhaps something like a Chef script would add a new LEMP server into the cluster by pushing your current NGINX configuration to it and then adding it to the GlusterFS storage group. The exact steps are a bit relative, but something similar to this:

      https://www.digitalocean.com/community/tutorials/how-to-create-a-redundant-storage-pool-using-glusterfs-on-ubuntu-servers
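
      As a rough idea of the Gluster-specific steps in that script (volume and host names are hypothetical):

      ```bash
      # From an existing node in the pool: accept the new LEMP server as a peer
      gluster peer probe lemp-new

      # On the new LEMP server: mount the replicated volume where NGINX
      # serves from, and persist the mount across reboots
      mount -t glusterfs storage-01:/wpdata /var/www
      echo 'storage-01:/wpdata /var/www glusterfs defaults,_netdev 0 0' >> /etc/fstab
      ```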

      Now when I wrote my previous reply, I was thinking of GlusterFS as a multi-directional sync, not a remote mount with sync. You'll have to forgive me here; I haven't used Gluster in some time and my memory failed me for a bit. It is a remote mount. This kind of nullifies my original thought that a block storage volume would only add confusion; it's not that bad when each LEMP server doesn't need its own. You were thinking more in the right direction than I was there, apologies. I'm going to make it a point to build a GlusterFS cluster again this weekend as a refresher.

      One key thing my memory does serve me well on: GlusterFS is not fast. I really wouldn't do geographic redundancy here; Gluster with high latency can be rough. I also would not use it if you expect constant writes and need the sync to have absolutely no delay. A GlusterFS sync should be fine for someone writing a blog post every day with a few pictures, but bad for someone who automates a thousand posts per hour with 1,500 pictures attached (I've seen this, and it's not pretty).

      • I guess what's really confusing me is whether the two LEMP servers you mentioned would be set up as the primary web servers for all of the clients, or whether that was just an example with two clients, with each website hosted on its own LEMP stack.

        Let's say there are 50 different websites, each with their own unique content. There would of course be the load-balanced MySQL servers where the WordPress databases are stored for each site. What would the rest of the infrastructure look like in terms of LEMP stacks and storage servers?

        1. Would there be 50 servers which each host their own LEMP stack and serve as a member of the GlusterFS storage cluster?
        2. Or would there be 2 identical LEMP servers, each with 50 server blocks, and a cluster of storage servers which scale to address space needs (similar to this here)?

        In the first example, each client/site gets its own server and the infrastructure is scaled at the individual-client level. I'm not exactly confident how storage works in this configuration.

        In the second example, there are two primary web servers which handle all of the requests, and a cluster of GlusterFS nodes (or two servers with redundant block storage attached) which store the data. In this configuration the infrastructure is divided into compute servers and storage servers, which are scaled based on the overall usage of all clients.

        Both of them have their benefits, but I'm curious which one you are referring to.

        Again, thank you so much for the help you have already given me and I would love to hear about how setting up GlusterFS turns out for you :D

        Cheers,
        Curtis

Have you considered Docker at all? It's pretty awesome and solves your specific use case nicely. I host quite a few WordPress sites, plus a number of more complex WordPress installs that use things like decoupled React frontends via the REST API. We also manage hosting for Fortune 500 clients, and having a site go down or losing data is simply not an option; it could cost some of our clients hundreds of thousands, if not millions, of dollars if their site went down for a day or we lost any data. We set up WordPress to operate as closely to a twelve-factor application as possible (https://12factor.net).

We also utilize Docker in our development environment, which gives us 100% parity between development and production. When adding things like plugins/themes, we always add those to the filesystem locally and build them into the Docker image rather than uploading via wp-admin. Plugin and theme updates and additions are all handled in our local dev environment and fully tested before pushing to staging and eventually production. We generally disable plugin/theme installs and updates inside wp-admin as well.
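
As a rough example of that workflow (the image name and tag are made up):

```bash
# Plugins and themes are committed to the repo and baked into the image,
# so the running container never drifts via wp-admin.
git add wp-content/plugins/new-plugin wp-content/themes/client-theme
git commit -m "Add client plugin and theme"
docker build -t registry.example.com/client-site:1.4.2 .
docker push registry.example.com/client-site:1.4.2   # deploy the new tag from here
```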

Images, videos, etc. are offloaded to DigitalOcean's new Spaces storage using the iLab Media Cloud plugin. Historically we always used Amazon S3 for this, but that can get pretty expensive. Spaces plus a good CDN like imgix (or DO's upcoming Spaces CDN functionality) is all you need to make sure uploads aren't lost when a server or container goes down, and you get the added benefit of globally available, fast image loading.

We currently use a set of external MySQL servers set up with master/master replication behind a load balancer, just as Jarland describes above. We also have an internal project that uses a MariaDB cluster within the Docker environment, but we aren't 100% keen on running production databases within Docker just yet.

Rancher has been our Docker orchestration platform of choice up to this point. It's a pretty incredible system that handles a lot of the tedious DevOps work and is really easy to set up and use. You can add custom application stacks or templates to its catalog that developers can launch quickly without DevOps knowledge. Rancher also has a full REST API, which lets you automate practically everything.

DigitalOcean's upcoming Kubernetes release will make container management even easier and should fulfill your requirements out of the box, integrating DO's Spaces object storage, load balancers, DNS, etc. for a complete managed experience. You'll simply launch your apps via Kubernetes YAML files, and they will handle the management of the Kubernetes cluster itself. Much easier than wrangling Chef, Ansible, GlusterFS, and so on. You also get the added benefit of quickly launching highly available apps with Let's Encrypt SSL certs, monitoring, and more across $5 DO droplets, and of autoscaling as traffic and server load increase. You can literally run a three-node cluster for $15 a month this way. The ability to automate everything and manage costs more granularly are the driving factors behind our agency moving to Docker almost exclusively.
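
For a feel of that workflow (the file names are hypothetical):

```bash
# The whole app is declared in YAML and handed to the managed cluster.
kubectl apply -f wordpress-deployment.yaml   # NGINX + PHP-FPM + WordPress pods
kubectl apply -f wordpress-service.yaml      # load-balanced entry point
kubectl get pods --watch                     # watch the site come up
```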

  • Hey blakeevanmoor!

    Thank you for your response.

    I didn't consider Docker for configuration management, but after you mentioned it I looked into it a bit more, and it seems like a promising way to add a better level of security and assurance to the droplet management process.

    Configuration automation seemed like the way to go for managing hundreds of different WordPress sites, but the more I think about it, you can't really do server-wide configuration updates for NGINX, PHP, or WordPress, because each site has its own unique plugins/themes which depend on certain versions of PHP or WordPress to function properly. If there is some way to get around this, like containerizing NGINX and PHP separately, I would be super excited to hear it :D
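
    Something like this, perhaps? (Image tags and paths below are just guesses on my part.) Each site runs its own php-fpm container pinned to whatever PHP version its plugins need, behind a single NGINX container:

    ```bash
    # Each site pins its own PHP version; one NGINX container fronts them all.
    docker run -d --name site-a-php -v /srv/site-a:/var/www/html php:7.2-fpm
    docker run -d --name site-b-php -v /srv/site-b:/var/www/html php:5.6-fpm
    docker run -d --name edge -p 80:80 \
      --link site-a-php --link site-b-php \
      -v /srv/nginx/conf.d:/etc/nginx/conf.d:ro nginx:stable
    ```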

    It's making more sense to use Terraform to manage the deployment of droplets provisioned with firewall rules, user credentials, Docker, and a LEMP stack, and then use Docker to containerize the LEMP stack for each individual WordPress site.

    In this configuration each website would have its own droplet with a containerized LEMP stack, where MySQL is served by load-balanced master-master MariaDB servers. I would still have to update PHP/NGINX configurations on a per-site basis, but at least the server setup is automated through Terraform, and there is 100% parity between environments to minimize the risk of downtime.

    I'm not sure this is the most optimal and secure way to do it, but my limited understanding of Docker is the bottleneck in finding a highly available, redundant, and easily scalable version of this.
