Clustering adds high availability to your database by distributing changes to different servers. In the event that one of the instances fails, others are quickly available to continue serving.
Clusters come in two general configurations, active-passive and active-active. In active-passive clusters, all writes are done on a single active server and then copied to one or more passive servers that are poised to take over only in the event of an active server failure. Some active-passive clusters also allow SELECT operations on passive nodes. In an active-active cluster, every node is read-write and a change made to one is replicated to all.
In this guide, we will configure an active-active MariaDB Galera cluster. For demonstration purposes, we will configure and test three nodes, the smallest configurable cluster.
To follow along, you will need:
Three Ubuntu 16.04 servers, each with a non-root user with sudo
privileges and private networking, if it’s available to you.
For assistance setting up a user with these privileges, follow our Initial Server Setup with Ubuntu 16.04 guide.
For help setting up private networking, see How To Set Up And Use DigitalOcean Private Networking .
Once all of these prerequisites are in place, we’re ready to install MariaDB.
MariaDB 10.1 isn’t included in the default Ubuntu repositories, so we’ll start by adding the external Ubuntu repository maintained by the MariaDB project to all three of our servers.
Note: MariaDB is a well-respected provider, but not all external repositories are reliable. Be sure to install only from trusted sources.
First, we’ll add the MariaDB repository key with the apt-key
command, which apt will use to verify that the package is authentic.
- sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
Once we have the trusted key in the database, we can add the repository. We’ll need to run apt-get update
afterward in order to include package manifests from the new repository:
- sudo add-apt-repository 'deb [arch=amd64,i386,ppc64el] http://nyc2.mirrors.digitalocean.com/mariadb/repo/10.1/ubuntu xenial main'
- sudo apt-get update
Note: You must run update
after adding the repository. Otherwise, you’ll install version 10.0 from the Ubuntu packages, which does not contain the Galera package.
Once the repositories are updated on all three servers, we’re ready to install MariaDB. One thing to note about MariaDB is that it originated as a drop-in replacement for MySQL, so in many configuration files and startup scripts, you’ll see mysql
rather than mariadb
. For consistency’s sake, we use mysql
in this guide where either could work.
Beginning with version 10.1, the MariaDB Server and MariaDB Galera Server packages are combined, so installing mariadb-server
will automatically install Galera and several dependencies:
- sudo apt-get install mariadb-server
During the installation, you will be asked to set a password for the MariaDB administrative user. No matter what you choose, this root password will be overwritten with the password from the first node once replication begins.
We should have all of the pieces necessary to begin configuring the cluster, but since we’ll be relying on rsync
in later steps, let’s make sure it’s installed.
- sudo apt-get install rsync
This will confirm that the newest version of rsync
is already available or prompt you to upgrade or install it.
Once we have installed MariaDB on each of the three servers, we can begin configuration.
Each node in the cluster needs to have a nearly identical configuration. Because of this, we will do all of the configuration on our first machine, and then copy it to the other nodes.
By default, MariaDB is configured to check the /etc/mysql/conf.d
directory to get additional configuration settings for from ending in .cnf
. We will create a file in this directory with all of our cluster-specific directives:
- sudo nano /etc/mysql/conf.d/galera.cnf
Copy and paste the following configuration into the file. You will need to change the settings highlighted in red. We’ll explain what each section means below.
[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
# Galera Provider Configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
# Galera Cluster Configuration
wsrep_cluster_name="test_cluster"
wsrep_cluster_address="gcomm://first_ip,second_ip,third_ip"
# Galera Synchronization Configuration
wsrep_sst_method=rsync
# Galera Node Configuration
wsrep_node_address="this_node_ip"
wsrep_node_name="this_node_name"
The first section modifies or re-asserts MariaDB/MySQL settings that will allow the cluster to function correctly. For example, Galera Cluster won’t work with MyISAM or similar non-transactional storage engines, and mysqld
must not be bound to the IP address for localhost. You can learn about the settings in more detail on the Galera Cluster system configuration page.
The “Galera Provider Configuration” section configures the MariaDB components that provide a WriteSet replication API. This means Galera in our case, since Galera is a wsrep (WriteSet Replication) provider. We specify the general parameters to configure the initial replication environment. This doesn’t require any customization, but you can learn more about Galera configuration options.
The “Galera Cluster Configuration” section defines the cluster, identifying the cluster members by IP address or resolvable domain name and creating a name for the cluster to ensure that members join the correct group. You can change the wsrep_cluster_name
to something more meaningful than test_cluster
or leave it as-is, but you must update wsrep_cluster_address
with the addresses of your three servers. If your servers have private IP addresses, use them here.
The “Galera Synchronization Configuration” section defines how the cluster will communicate and synchronize data between members. This is used only for the state transfer that happens when a node comes online. For our initial setup, we are using rsync
, because it’s commonly available and does what we need for now.
The “Galera Node Configuration” section clarifies the IP address and the name of the current server. This is helpful when trying to diagnose problems in logs and for referencing each server in multiple ways. The wsrep_node_address
must match the address of the machine you’re on, but you can choose any name you want in order to help you identify the node in log files.
When you are satisfied with your cluster configuration file, copy the contents into your clipboard, save and close the file.
On each of the remaining nodes, open the configuration file:
- sudo nano /etc/mysql/conf.d/galera.cnf
Paste in the configuration you copied from the first node, then update the “Galera Node Configuration” to use the IP address or resolvable domain name for the specific node you’re setting up. Finally, update its name, which you can set to whatever helps you identify the node in your log files:
. . .
# Galera Node Configuration
wsrep_node_address="this_node_ip"
wsrep_node_name="this_node_name"
. . .
Save and exit the file on each server. We’re almost ready to bring up the cluster, but before we do, we’ll want to make sure that the appropriate ports are open.
On every server, let’s check the status of the firewall:
- sudo ufw status
In this case, only SSH is allowed through:
OutputStatus: active
To Action From
-- ------ ----
OpenSSH ALLOW Anywhere
OpenSSH (v6) ALLOW Anywhere (v6)
Since only ssh traffic is permitted in this case, we’ll need to add rules for MySQL and Galera traffic. If we tried to start the cluster, we would fail because of firewall rules.
Galera can make use of four ports:
In our example, we’ll open all four ports while we do our setup. Once we’ve confirmed that replication is working, we’d want to close any ports we’re not actually using and restrict traffic to just servers in the cluster.
Open the ports with the following command:
- sudo ufw allow 3306,4567,4568,4444/tcp
- sudo ufw allow 4567/udp
Note: Depending on what else is running on your servers you might want restrict access right away. The UFW Essentials: Common Firewall Rules and Commands guide can help with this.
To begin, we need to stop the running MariaDB service so that our cluster can be brought online.
Use the command below on all three servers to stop mysql so that we can bring them back up in a cluster:
- sudo systemctl stop mysql
systemctl
doesn’t display the outcome of all service management commands, so to be sure we succeeded, we’ll use the following command:
- sudo systemctl status mysql
If the last line looks something like the following, the command was successful.
Output . . .
Aug 19 02:55:15 galera-01 systemd[1]: Stopped MariaDB database server.
Once we’ve shut down mysql
on all of the servers, we’re ready to proceed.
To bring up the first node, we’ll need to use a special startup script. The way we’ve configured our cluster, each node that comes online tries to connect to at least one other node specified in its galera.cnf
file to get its initial state. Without using the galera_new_cluster
script that allows systemd to pass the the --wsrep-new-cluster
parameter, a normal systemctl start mysql
would fail because there are no nodes running for the first node to connect with.
- sudo galera_new_cluster
When this script succeeds, the node is registered as part of the cluster, and we can see it with the following command:
- mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
Output+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 1 |
+--------------------+-------+
On the remaining nodes, we can start mysql
normally. They will search for any member of the cluster list that is online, so when they find one, they will join the cluster.
Start mysql
:
- sudo systemctl start mysql
We should see our cluster size increase as each node comes online:
- mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
Output+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 2 |
+--------------------+-------+
Start mysql
:
- sudo systemctl start mysql
If everything is working well, the cluster size should be set to three:
- mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
Output+--------------------+-------+
| Variable_name | Value |
+--------------------+-------+
| wsrep_cluster_size | 3 |
+--------------------+-------+
At this point, the entire cluster should be online and communicating. Before we test replication, however, there’s one more configuration detail to attend to.
Currently, Ubuntu and Debian’s MariaDB servers do routine maintenance such as log rotation as a special maintenance user. When we installed MariaDB, the credentials for that user were randomly generated, stored in /etc/mysql/debian.cnf
, and inserted into the MariaDB’s mysql
database.
As soon as we brought up our cluster, the password from the first node was replicated to the other nodes, so the value in debian.cnf
no longer matches the password in the database. This means anything that uses the maintenance account will try to connect to the database with the password in the configuration file and will fail on all but the first node.
To correct this, we’ll copy our first node’s debian.cnf
to the remaining nodes.
Open the debian.cnf
file with your text editor:
- sudo nano /etc/mysql/debian.cnf
The file should look something like:
[client]
host = localhost
user = debian-sys-maint
password = 03P8rdlknkXr1upf
socket = /var/run/mysqld/mysqld.sock
[mysql_upgrade]
host = localhost
user = debian-sys-maint
password = 03P8rdlknkXr1upf
socket = /var/run/mysqld/mysqld.sock
basedir = /usr
Copy the information into your clipboard.
On your second node, open the same file:
- sudo nano /etc/mysql/debian.cnf
Despite the warning at the top of the file that says “DO NOT TOUCH!” we need to make the change for the cluster to work. You can confidently delete the current information and paste the contents from the first node’s configuration. They should be exactly the same now. Save and close the file.
On your third node, open the same file:
- sudo nano /etc/mysql/debian.cnf
Delete the current information and paste the contents from the first node’s configuration. Save and close the file.
The mismatch wouldn’t have affected our ability to test replication, but it’s best to take care of early on to avoid failures later.
Note: After you’re done, you can test that the maintenance account is able to connect by supplying the password from the local debian.conf
as follows:
- sudo cat /etc/mysql/debian.cnf
Copy the password from the output. Then connect to mysql
:
- mysql -u debian-sys-maint -p
At the prompt, supply the password that you copied. If you can connect, all is well.
If not, verify the password in the file is the same as the first node, then substitute below:
- update mysql.user set password=PASSWORD('password_from_debian.cnf') where User='debian-sys-maint';
Repeat to test remaining nodes.
We’ve gone through the steps up to this point so that our cluster can perform replication from any node to any other node, known as active-active or master-master replication. Let’s test to see if the replication is working as expected.
We’ll start by making database changes on our first node. The following commands will create a database called playground
and a table inside of this called equipment
.
- mysql -u root -p -e 'CREATE DATABASE playground;
- CREATE TABLE playground.equipment ( id INT NOT NULL AUTO_INCREMENT, type VARCHAR(50), quant INT, color VARCHAR(25), PRIMARY KEY(id));
- INSERT INTO playground.equipment (type, quant, color) VALUES ("slide", 2, "blue");'
We now have one value in our table.
Next, we’ll look at the second node to verify that replication is working:
- mysql -u root -p -e 'SELECT * FROM playground.equipment;'
If replication is working, the data we entered on the first node will be visible here on the second:
Output+----+-------+-------+-------+
| id | type | quant | color |
+----+-------+-------+-------+
| 1 | slide | 2 | blue |
+----+-------+-------+-------+
From this same node, we can write data to the cluster:
- mysql -u root -p -e 'INSERT INTO playground.equipment (type, quant, color) VALUES ("swing", 10, "yellow");'
From the third node, we can read all of this data by querying the again:
- mysql -u root -p -e 'SELECT * FROM playground.equipment;'
Output +----+-------+-------+--------+
| id | type | quant | color |
+----+-------+-------+--------+
| 1 | slide | 2 | blue |
| 2 | swing | 10 | yellow |
+----+-------+-------+--------+
Again, we can add another value from this node:
- mysql -u root -p -e 'INSERT INTO playground.equipment (type, quant, color) VALUES ("seesaw", 3, "green");'
Back on the first node, we can verify that our data is available everywhere:
- mysql -u root -p -e 'SELECT * FROM playground.equipment;'
Output +----+--------+-------+--------+
| id | type | quant | color |
+----+--------+-------+--------+
| 1 | slide | 2 | blue |
| 2 | swing | 10 | yellow |
| 3 | seesaw | 3 | green |
+----+--------+-------+--------+
We’ve tested we can write to all of the nodes and that replication is being performed properly.
At this point, you should have a working three-node Galera test cluster configured. If you plan on using a Galera cluster in a production situation, it’s recommended that you begin with no fewer than five nodes.
Before production use, you may want to take a look at some of the other state snapshot transfer (sst) agents like “xtrabackup" which allows you to set up new nodes very quickly and without large interruptions to your active nodes. This does not affect the actual replication, but is a concern when nodes are being initialized.
Finally, if your cluster members are not on a private network, you will also need to set up SSL to protect your data as it moves between servers.
Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.
This textbox defaults to using Markdown to format your answer.
You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!
Sign up for Infrastructure as a Newsletter.
Working on improving health and education, reducing inequality, and spurring economic growth? We'd like to help.
Get paid to write technical tutorials and select a tech-focused charity to receive a matching donation.
This is great - thanks! Would you consider maybe writing a tutorial on adding more nodes (at a later date, once the nodes from this tutorial are in production)? The guide you linked to is definitely not as good as this one and a bit confusing, so it’d be great to have you explain it a bit better.
Thanks!
Hey Melissa,
i had problems when restarting the server: [ERROR] Can’t start server : Bind on unix socket: No such file or directory
I had to create the folder: /var/run/mysqld/
I was working on a fresh and clean Ubuntu 16.04 installation.
@melissaanderson can you do same tutorial but for MySQL? Thanks
Hi, Melissa! This is a great tutorial I’m working on my thesis related with Galera Cluster and this is helping me a lot.
Anyway, do you know how to measure delay time in replicate database process?
Great tutorial.
If you get stuck with ‘/usr/bin/galera_new_cluster: 8: [: unexpected operator’ when you run sudo galera_new_cluster, this will sort you out - https://jira.mariadb.org/browse/MDEV-13032.
Only issue is, after I run sudo galera_new_cluster, I’m now stuck with…
Job for mariadb.service failed because the control process exited with error code. See “systemctl status mariadb.service” and “journalctl -xe” for details.
Any advice, anyone?
Fantastic tutorial @MelissaAnderson !
Would be nice if you could also add instructions for adding new nodes. I know it’s very important to always keep an odd numbers of Galera node.
Thanks.
Starting the cluster gives me an error. If I remove this part of server 1, server 2 and server 3.
Wsrep_cluster_address = “gcomm: //138.197.51.40,67.207.68.152,188.166.206.22”
All start well but it is no longer cluster. I do not know how to put this part to work.
Error:
ERROR 2002 (HY000): Can’t connect to local MySQL server through socket ‘/var/run/mysqld/mysqld.sock’ (2 “No such file or directory”)
After following all the steps of a tutorial to make a cluster with galley and mariadb. Everything worked fine until I made a copy and then create a new drop.
No matter what I do I always encounter this error when restarting mysql.
If I never turn off the drop or never recover with a previous image does not give the bug.
But I turn off the drop to make it bigger. Or I just turn it off and on.
It gives me this error and I did not step from this step.
Did anyone else tube this error?
If you get this error, ERROR 1524 (HY000): Plugin ‘unix_socket’ is not loaded
after entering this command sudo galera_new_cluster in step 6
sudo service mysql stop
sudo mv /etc/mysql/conf.d/galera.cnf /root
sudo mysqld_safe --skip-grant-tables &
sudo mysql -uroot
https://mariadb.com/kb/en/mariadb/unix_socket-authentication-plugin/
mysql> INSTALL PLUGIN unix_socket SONAME ‘auth_socket’; mysql> quit
sudo mysqladmin shutdown
sudo cp /root/galera.cnf /etc/mysql/conf.d/galera.cnf
Repeat again step 6
Setting up mariadb-server-10.1 (10.1.19+maria-1~xenial) … Job for mariadb.service failed because the control process exited with error code. See “systemctl status mariadb.service” and “journalctl -xe” for details. invoke-rc.d: initscript mysql, action “start” failed. dpkg: error processing package mariadb-server-10.1 (–configure): subprocess installed post-installation script returned error exit status 1 No apport report written because the error message indicates its a followup error from a previous failure. dpkg: dependency problems prevent configuration of mariadb-server: mariadb-server depends on mariadb-server-10.1 (= 10.1.19+maria-1~xenial); however: Package mariadb-server-10.1 is not configured yet.
dpkg: error processing package mariadb-server (–configure): dependency problems - leaving unconfigured Errors were encountered while processing: mariadb-server-10.1 mariadb-server E: Sub-process /usr/bin/dpkg returned an error code (1)
$ apt install -f
did not help either
and
$ sudo galera_new_cluster $ mysql -u root -p -e “SHOW STATUS LIKE ‘wsrep_cluster_size’”
ERROR 1524 (HY000): Plugin ‘unix_socket’ is not loaded