Thoughts on high availability

March 2, 2015

I have a DO-hosted solution that allows members of client organizations to sign in for events (among other things). Having the system down while an event is starting causes a number of problems, so I have opted to configure redundant servers in two New York locations. The application uses a LAMP stack, and I have master-master DB replication working between the servers with no issues. The challenge now is to find the best strategy for maintaining a user's session across the two servers.

Using the memcache session handler for PHP would allow sessions to persist across both servers. In normal operation the browser can go to either address supplied by the A records, and that server will get the current session data from memcached. In a failover scenario the browser will encounter an IP address that fails for the domain and should then try the other IP address supplied in the DNS lookup. The browser should continue to use that IP address until it expires and a new one is requested.
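To make the client-side fallback I am counting on concrete, here is a minimal Python sketch of the behaviour: resolve every A record for the name, then try each address in turn until one accepts a connection. The function names are my own and purely illustrative. Real browsers implement their own retry and caching logic, so this only models the idea and does not reproduce what any particular browser does.

```python
import socket


def resolve_ipv4(host, port):
    """Return every distinct IPv4 address the resolver supplies for host."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for info in infos:
        ip = info[4][0]  # sockaddr is (ip, port) for AF_INET
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def connect_first_reachable(addresses, port, timeout=3.0,
                            connect=socket.create_connection):
    """Try each address in order; return (ip, connection) for the first success.

    `connect` is injectable so the fallback logic can be exercised without
    touching the network.
    """
    errors = []
    for ip in addresses:
        try:
            return ip, connect((ip, port), timeout)
        except OSError as exc:
            # This address failed (refused, unreachable, timed out): try the next.
            errors.append((ip, exc))
    raise ConnectionError(f"no address reachable: {errors}")
```

In use, something like `connect_first_reachable(resolve_ipv4("example.com", 443), 443)` would return the first address that answers. The timeout matters here: a server that is down hard may not refuse the connection quickly, so the user waits out the timeout before the second address is tried.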

Using HAProxy and NGINX as a proxy to connect the user to the correct server is another viable option. With this approach HAProxy and NGINX run on both servers and proxy each request to the server that holds the user's session. In a failover scenario we depend on the browser using the other supplied IP address, and on the proxy then knowing not to forward to the failed server, based on a failed previous request or a missing heartbeat.

I am interested in thoughts on which approach is best from an overall performance and failover perspective. Note that the redundancy is for failover and not for load distribution. Either server could currently support the full client load.

It seems that the proxy-based approach is slightly more complicated, in that we depend on the browser to potentially fail over to the alternate IP address AND we depend on the proxy knowing that the other server is not available.
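The second dependency, the proxy knowing that the other server is gone, could be sketched roughly as follows in Python. It is loosely modelled on the rise/fall counters that HAProxy health checks use: a peer is marked down only after several consecutive failed heartbeats and marked up again only after several consecutive successes. The class and function names are hypothetical and this is not how HAProxy or NGINX is actually configured, only an illustration of the decision.

```python
class BackendHealth:
    """Track heartbeat results for one peer server."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall      # consecutive failures needed to mark the peer down
        self.rise = rise      # consecutive successes needed to mark it up again
        self.up = True
        self._streak = 0      # consecutive results contradicting the current state

    def record(self, ok):
        """Record one heartbeat result and return the resulting up/down state."""
        if ok == self.up:
            # Result agrees with the current state: reset the counter.
            self._streak = 0
        else:
            self._streak += 1
            threshold = self.fall if self.up else self.rise
            if self._streak >= threshold:
                self.up = ok
                self._streak = 0
        return self.up


def choose_backend(session_server, local_server, health):
    """Proxy to the server holding the session only while it is healthy.

    `health` maps peer server names to BackendHealth objects.
    """
    if session_server == local_server:
        return local_server
    if health[session_server].up:
        return session_server
    # The peer is down: serve locally and rely on the shared session store.
    return local_server
```

The thresholds trade detection speed against flapping. With `fall=3` and a heartbeat every second, requests would still be proxied to a dead peer for about three seconds before the proxy gives up on it.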

Both approaches add overhead in the form of additional connections.

There are almost always surprises beyond what is documented, so I'm looking for anybody with thoughts and/or experience to share.


P.S. Could we possibly get a keyword for High Availability?

1 Answer

Failover is really hard, and in my opinion there is no option that is 100% reliable, but maybe you can try this:

Use memcached for sharing sessions, and use Cloudflare as your nameserver. Create a monitoring service (any language you know is fine) that talks to Cloudflare via its API. Cloudflare DNS changes take effect almost immediately, unlike most DNS servers, which need time to propagate record changes. When your monitoring script detects a failure, have it ask Cloudflare to change your A record from the failed server's IP to the backup server's IP. If you want to use both servers for load balancing instead, just ask Cloudflare to remove the failed server's IP from the A records. When the server comes back up, add its IP to the A records again.
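A rough Python sketch of such a monitoring script is below. It is a starting point under stated assumptions, not a finished tool: the health URLs, the token, `zone_id`, and `record_id` are placeholders you would supply, the function names are made up for illustration, and the endpoint path and TTL value should be checked against the current Cloudflare API documentation before relying on them.

```python
import http.client
import json
import urllib.request

# Assumed base URL of the Cloudflare v4 API; verify against their docs.
API_BASE = "https://api.cloudflare.com/client/v4"


def is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the health URL answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS errors, and HTTP errors.
        return False


def plan_a_record(primary_ip, backup_ip, primary_ok, backup_ok, current_ip):
    """Decide which IP the A record should point at (failover only)."""
    if primary_ok:
        return primary_ip
    if backup_ok:
        return backup_ip
    # Both look down, which may mean the monitor itself is cut off:
    # leave DNS alone rather than flap.
    return current_ip


def build_update_request(token, zone_id, record_id, name, ip, ttl=120):
    """Build (but do not send) the request that repoints an existing A record."""
    body = json.dumps(
        {"type": "A", "name": name, "content": ip, "ttl": ttl, "proxied": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/zones/{zone_id}/dns_records/{record_id}",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The script would run in a loop: probe both servers with `is_healthy`, call `plan_a_record`, and only when the result differs from the current IP send the request with `urllib.request.urlopen(build_update_request(...))`. Run the monitor somewhere other than the two servers it watches, otherwise it goes down together with the server it is supposed to replace.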
