Question

Looking for the best load balancing solution for a very busy WordPress site

Posted October 6, 2021 82 views
WordPress, Load Balancing, Kubernetes

Hi

I have read numerous articles and tutorials here on WordPress load balancing, and I see a mixture of solutions, ranging from traditional round-robin multi-droplet setups behind a DO load balancer to GlusterFS with Docker Swarm, or Kubernetes.

My current setup is outgrowing a single box. I have 150,000 members on a membership site and blog that can’t handle anything greater than 1,000 requests per minute. This happens after an email promoting a particularly “interesting” article.

I want to be able to load share the blog and maintain database access for the memberships - perhaps using DO database.

These types of setups are hard and time-consuming to build and test, and I wondered if there are articles covering the issues I may be facing. Assets need to be shared, so using Spaces as part of the solution may be prudent, and the blog pages must be served by some round-robin type balancer that is easily scaled. Additionally, the membership site is hampered by having chosen BuddyBoss as the platform. The database activity is massive. I had to shut down the communications package included with BuddyBoss because the number of slow queries was swamping the 40 processors.

I am interested in possible Kubernetes solutions as they can be scaled inside scaling blocks and spaces – however I am interested most in knowing if there is a ‘best’ approach.

Thank you for any help and for suggesting articles.

Jack

1 answer

Jack,

Having worked on scaling these types of sites, including with WP and BuddyPress dozens of times, I can tell you that it will require a multi-faceted approach beyond just trying to horizontally scale processes using k8s orchestration.

As you’ve already experienced, your biggest bottleneck is the database. You’re going to need to look into object and query caching, potentially at both the app and db level, to pull some strain off the db. You should also analyze the db logs to see whether the database can be optimized, either through a larger query cache or additional indexes.

At the end of the day, the key-value style of WordPress’s meta tables (such as wp_postmeta and wp_usermeta) can be challenging to scale, especially since the problems only grow as data accumulates.

To handle traffic spikes, you might want to look into reverse proxy caching in a micro-caching type of architecture, using something like Varnish. Even caching pages for as little as 5 seconds can help you withstand far more traffic, because during a spike most requests end up being cache hits. Also, memcached and Redis are your friends.
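To make the micro-caching point concrete, here is a minimal Python sketch of the idea, not a Varnish configuration. It assumes a single anonymous page, evenly spaced requests, and a hypothetical `render_page` function standing in for WordPress; `MicroCache` and `simulate` are names invented for this illustration.

```python
from typing import Callable, Dict, Tuple


class MicroCache:
    """Tiny TTL cache keyed by URL; a stand-in for what a reverse proxy does."""

    def __init__(self, ttl_seconds: float, origin: Callable[[str], str]) -> None:
        self.ttl_seconds = ttl_seconds
        self.origin = origin  # the slow path that renders the page
        self.origin_calls = 0
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str, now: float) -> str:
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]  # cache hit: the origin is not touched
        body = self.origin(url)  # cache miss: render once, then reuse
        self.origin_calls += 1
        self._store[url] = (now, body)
        return body


def render_page(url: str) -> str:
    # Hypothetical origin; on the real site this is PHP plus database queries.
    return f"<html>{url}</html>"


def simulate(requests_per_minute: int, ttl_seconds: float) -> int:
    """Spread requests evenly over 60 s for one URL; return origin renders."""
    cache = MicroCache(ttl_seconds, render_page)
    for i in range(requests_per_minute):
        now = i * 60.0 / requests_per_minute
        cache.get("/interesting-article", now)
    return cache.origin_calls


if __name__ == "__main__":
    print(simulate(1000, 5.0))  # origin renders with a 5-second TTL
    print(simulate(1000, 0.0))  # origin renders with caching disabled
```

Under these assumptions, 1,000 requests in a minute to one article turn into 12 page renders with a 5-second TTL, versus 1,000 without caching. This only applies to pages that are identical for every visitor; logged-in member pages would need to bypass the cache.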

I don’t think you’re going to find a single article that gives you a beginning to end, tailor made solution for what you’re experiencing, unfortunately. However, it’s a very good problem to have!

  • Thank you for taking the time to respond. I found it very tricky to set up caching using Redis and even the PHP object cache (which is still functioning). I had to turn Redis off because of membership logins. Logins were being cached, and eventually people were either unable to log in or became another user when they did log in. I wrote to BuddyBoss about the problem and they suggested I turn off Redis.

    As you noted caching is extremely important on a site like this and I was eventually able to tune wp-optimize to provide image and script caching while not caching membership links.

    I am reading about Varnish now, but I also find the Kubernetes install worth a look. I am also wondering if using DO’s database could provide better caching of repeated queries.

    There is a solution somewhere here that will likely emerge. I appreciate any further input you might have …

    Jack

    • Orchestration is not the issue or resolution. Your architecture, whether deployed through k8s or static servers, will experience the same challenges. The main thing k8s will afford you is the ability to scale horizontally more quickly, and perhaps give you a “single concern” framework to clarify each layer of your stack.

      Fix your architecture first, then look into k8s to help with orchestration/deployment/scaling. Shifting over before you’ve resolved your architecture issue will just add more complexity and more potential sources of issues.

      When you use Redis or memcached you need to make sure you’re “tuning” it properly, and filtering out certain objects, such as authentication data. It should work out of the box for most installations, but perhaps BuddyBoss is leveraging some proprietary cache mechanism.
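      As a rough illustration of filtering authentication objects out of a shared cache, here is a minimal Python sketch. It is not how WordPress or BuddyBoss implement caching; the class name and the group names ("posts", "user_sessions") are hypothetical, and a plain dict stands in for Redis or memcached. The idea is that excluded groups live only in per-request memory and never reach the shared backend.

```python
from typing import Any, Dict, Optional, Set, Tuple

CacheKey = Tuple[str, str]  # (group, key)


class FilteredObjectCache:
    """Object cache that keeps selected groups out of the shared backend."""

    def __init__(self, shared_backend: Dict[CacheKey, Any],
                 excluded_groups: Set[str]) -> None:
        self.shared = shared_backend  # stands in for Redis/memcached
        self.excluded = excluded_groups
        self.local: Dict[CacheKey, Any] = {}  # lives for one request only

    def _target(self, group: str) -> Dict[CacheKey, Any]:
        # Excluded groups are stored locally so they are never shared.
        return self.local if group in self.excluded else self.shared

    def set(self, group: str, key: str, value: Any) -> None:
        self._target(group)[(group, key)] = value

    def get(self, group: str, key: str) -> Optional[Any]:
        return self._target(group).get((group, key))


if __name__ == "__main__":
    backend: Dict[CacheKey, Any] = {}
    request_a = FilteredObjectCache(backend, {"user_sessions"})
    request_b = FilteredObjectCache(backend, {"user_sessions"})

    request_a.set("posts", "42", "rendered post body")
    request_a.set("user_sessions", "alice", "session-token")

    print(request_b.get("posts", "42"))             # shared across requests
    print(request_b.get("user_sessions", "alice"))  # not visible to request_b
```

      WordPress core exposes a similar idea through `wp_cache_add_non_persistent_groups()`, although whether that is enough for BuddyBoss would need testing.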

      Refactor your static files so they are served directly by nginx or Varnish and never go through the WP instance. That will significantly reduce the number of requests that reach WordPress.

      DO doesn’t have any “magic sauce” beyond your own capabilities to tune the database. There are tuning tools that let you review the slow query log and get statistics such as cache hit rates, to help you decide where to adjust query cache and other parameters. DO also isn’t going to magically add indexes you might be missing. Start off by getting a list of all long-running queries (start with those over 5 seconds, then work down to those over 1 second). If this doesn’t get you satisfactory performance, the next step would be creating followers (read replicas) and balancing read queries across them. This shouldn’t be necessary, however: I’ve worked with databases with tables of 1 billion+ records that return complex joins in milliseconds. It really is about identifying where the bottleneck is.
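      The "start at 5 seconds, then work down to 1 second" triage can be sketched in Python as below. This is a minimal sketch, assuming the MySQL slow query log is enabled and uses the standard text format with `# Query_time:` header lines. `SAMPLE_LOG` is made-up data for illustration only, and `parse_slow_log` and `slower_than` are hypothetical helper names.

```python
import re
from typing import List, NamedTuple

# Header line that MySQL writes before each slow-log entry.
HEADER = re.compile(r"^# Query_time: (?P<secs>[\d.]+).*Rows_examined: (?P<rows>\d+)")

# Made-up sample in the slow query log text format (illustration only).
SAMPLE_LOG = """\
# Time: 2021-10-06T12:00:01.000000Z
# User@Host: wp[wp] @ localhost []  Id:    12
# Query_time: 6.500000  Lock_time: 0.000100 Rows_sent: 10  Rows_examined: 500000
SET timestamp=1633521601;
SELECT * FROM wp_usermeta WHERE meta_value = 'x';
# Time: 2021-10-06T12:00:02.000000Z
# User@Host: wp[wp] @ localhost []  Id:    13
# Query_time: 1.800000  Lock_time: 0.000050 Rows_sent: 1  Rows_examined: 90000
SET timestamp=1633521602;
SELECT COUNT(*) FROM wp_posts WHERE post_status = 'publish';
# Time: 2021-10-06T12:00:03.000000Z
# User@Host: wp[wp] @ localhost []  Id:    14
# Query_time: 0.400000  Lock_time: 0.000020 Rows_sent: 5  Rows_examined: 200
SET timestamp=1633521603;
SELECT option_value FROM wp_options WHERE autoload = 'yes';
"""


class SlowQuery(NamedTuple):
    seconds: float
    rows_examined: int
    sql: str


def parse_slow_log(text: str) -> List[SlowQuery]:
    """Collect (time, rows examined, SQL) for each entry in the log text."""
    entries: List[SlowQuery] = []
    current = None  # [seconds, rows_examined, sql_lines] for the open entry
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            if current and current[2]:
                entries.append(SlowQuery(current[0], current[1], " ".join(current[2])))
            current = [float(match.group("secs")), int(match.group("rows")), []]
        elif (current is not None and line.strip()
              and not line.startswith(("#", "SET timestamp", "use "))):
            current[2].append(line.strip())  # part of the SQL statement
    if current and current[2]:
        entries.append(SlowQuery(current[0], current[1], " ".join(current[2])))
    return entries


def slower_than(entries: List[SlowQuery], threshold: float) -> List[SlowQuery]:
    """Entries above the threshold, slowest first."""
    hits = [e for e in entries if e.seconds > threshold]
    return sorted(hits, key=lambda e: e.seconds, reverse=True)


if __name__ == "__main__":
    queries = parse_slow_log(SAMPLE_LOG)
    for threshold in (5.0, 1.0):  # work down, as suggested above
        print(f"over {threshold}s:")
        for q in slower_than(queries, threshold):
            print(f"  {q.seconds:.1f}s rows_examined={q.rows_examined} {q.sql}")
```

      A high rows-examined count next to a small result set is often a hint that an index is missing, which is where running `EXPLAIN` on the query would be the next step.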

      You could move to k8s and fire up dozens of front end servers to distribute the load, but in the end, if they’re all having to query the DB, you don’t gain anything.

      Please understand the only reason I’m gently pushing back is because I think you’ll get the most dividends by focusing on the db. Tackle it first, then start looking at other issues.