This article covers a version of Ubuntu that is no longer supported. If you currently operate a server running Ubuntu 12.04, we highly recommend upgrading or migrating to a supported version of Ubuntu.
Reason: Ubuntu 12.04 reached end of life (EOL) on April 28, 2017 and no longer receives security patches or updates. This guide is no longer maintained.
See Instead: This guide might still be useful as a reference, but may not work on other Ubuntu releases. If available, we strongly recommend using a guide written for the version of Ubuntu you are using. You can use the search functionality at the top of the page to find a more recent version.
Redundancy and high availability are necessary for a very wide variety of server activities. Having a single point of failure in terms of data storage is a very dangerous configuration for any critical data.
While many databases and other software allow you to spread data out in the context of a single application, other systems can operate on the filesystem level to ensure that data is copied to another location whenever it is written to disk. A clustered storage solution like GlusterFS provides this exact functionality.
In this guide, we will be setting up a redundant GlusterFS cluster between two 64-bit Ubuntu 12.04 VPS instances. This will act similarly to a NAS server with mirrored RAID. We will then access the cluster from a third 64-bit Ubuntu 12.04 VPS.
A clustered environment allows you to pool resources (generally either computing or storage) so that you can treat various computers as a single, more powerful unit. With GlusterFS, we are able to pool the storage of various VPS instances and access it as if it were a single server.
GlusterFS allows you to create different kinds of storage configurations, many of which are functionally similar to RAID levels. For instance, you can stripe data across different nodes in the cluster, or you can implement redundancy for better data availability.
In this guide, we will be creating a redundant clustered storage array, also known as a distributed file system. Basically, this will allow us to have similar functionality to a mirrored RAID configuration over the network. Each independent server will contain its own copy of the data, allowing our applications to access either copy, which will help distribute our read load.
There are some steps that we will be taking on each VPS instance that we are using for this guide. We will need to configure DNS resolution between each host and set up the software sources that we will be using to install the GlusterFS packages.
In order for our different components to be able to communicate with each other easily, it is best to set up some kind of hostname resolution between each computer.
If you have a domain name that you would like to configure to point at each system, you can follow this guide to set up domain names with DigitalOcean.
If you do not have a spare domain name, or if you just want to set up something quickly and easily, you can instead edit the hosts file on each computer.
Open this file with root privileges on your first computer:
sudo nano /etc/hosts
You should see something that looks like this:
127.0.0.1 localhost gluster2
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Below the local host definition, you should add each VPS’s IP address followed by the long and short names you wish to use to reference it.
It should look something like this when you are finished:
127.0.0.1 localhost hostname
first_ip gluster0.droplet.com gluster0
second_ip gluster1.droplet.com gluster1
third_ip gluster2.droplet.com gluster2
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
The gluster0.droplet.com and gluster0 portions of the lines can be changed to whatever name you would like to use to access each droplet. We will be using these settings for this guide.
When you are finished, copy the lines you added and add them to the /etc/hosts files on your other VPS instances. Each /etc/hosts file should contain the lines that link your IPs to the names you’ve selected.
Save and close each file when you are finished.
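If you would rather script this step than edit each file by hand, the following is a minimal sketch of one way to do it. The add_host_entry helper is hypothetical (it is not part of any package), and the 192.0.2.x addresses are documentation placeholders that you would replace with your own VPS addresses. The demonstration writes to a scratch file rather than the real /etc/hosts so that nothing on the system changes until you have reviewed the result.

```shell
#!/bin/sh
# Hypothetical helper: append "IP FQDN SHORT_NAME" to a hosts file,
# but only if the FQDN is not already listed there.
# Usage: add_host_entry HOSTS_FILE IP FQDN SHORT_NAME
add_host_entry() (
    hosts_file=$1; ip=$2; fqdn=$3; short=$4
    # -w matches whole words only, -F treats the name as a literal string
    grep -qwF -- "$fqdn" "$hosts_file" 2>/dev/null ||
        printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$hosts_file"
)

# Demonstration on a scratch file. The IP addresses are placeholders.
demo_hosts=$(mktemp)
printf '127.0.0.1 localhost\n' > "$demo_hosts"
add_host_entry "$demo_hosts" 192.0.2.10 gluster0.droplet.com gluster0
add_host_entry "$demo_hosts" 192.0.2.11 gluster1.droplet.com gluster1
add_host_entry "$demo_hosts" 192.0.2.12 gluster2.droplet.com gluster2
cat "$demo_hosts"
rm -f "$demo_hosts"
```

Because the helper skips names that are already present, running it a second time leaves the file unchanged. To apply it to the real file, you would run the script with root privileges and pass /etc/hosts as the first argument.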
Although Ubuntu 12.04 contains GlusterFS packages, they are fairly out-of-date, so we will be using the latest stable version as of the time of this writing (version 3.4) from the GlusterFS project.
We will be setting up the software sources on all of the computers that will function as nodes within our cluster, as well as on the client computer.
We will actually be adding a PPA (personal package archive) that the project recommends for Ubuntu users. This will allow us to manage our packages with the same tools as other system software.
First, we need to install the python-software-properties package, which will allow us to manage PPAs easily with apt:
sudo apt-get update
sudo apt-get install python-software-properties
Once the PPA tools are installed, we can add the PPA for the GlusterFS packages by typing:
sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.4
With the PPA added, we need to refresh our local package database so that our system knows about the new packages available from the PPA:
sudo apt-get update
Repeat these steps on all of the VPS instances that you are using for this guide.
In this guide, we will be designating two of our machines as cluster members and the third as a client.
We will be configuring the computers we labeled as gluster0 and gluster1 as the cluster components. We will use gluster2 as the client.
On our cluster member machines (gluster0 and gluster1), we can install the GlusterFS server package by typing:
sudo apt-get install glusterfs-server
Once this is installed on both nodes, we can begin to set up our storage volume.
On one of the hosts, we need to peer with the second host. It doesn’t matter which server you use, but we will be performing these commands from our gluster0 server for simplicity:
sudo gluster peer probe gluster1.droplet.com
peer probe: success
This means that the peering was successful. We can check that the nodes are communicating at any time by typing:
sudo gluster peer status
Number of Peers: 1
Hostname: gluster1.droplet.com
Port: 24007
Uuid: 7bcba506-3a7a-4c5e-94fa-1aaf83f5729b
State: Peer in Cluster (Connected)
At this point, our two servers are communicating and they can set up storage volumes together.
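If you want to check the peering from a script instead of reading the output by eye, one option is to count the peers reported as connected. This is a rough sketch that assumes the output format shown above; the count_connected_peers name is our own, and other GlusterFS versions may word the state line differently.

```shell
#!/bin/sh
# Hypothetical helper: read `gluster peer status` output on stdin and
# print how many peers are in the connected state.
count_connected_peers() {
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -cF 'State: Peer in Cluster (Connected)' || true
}

# Demonstration using the sample output from this guide.
# On a real node you would run: sudo gluster peer status | count_connected_peers
count_connected_peers <<'EOF'
Number of Peers: 1
Hostname: gluster1.droplet.com
Port: 24007
Uuid: 7bcba506-3a7a-4c5e-94fa-1aaf83f5729b
State: Peer in Cluster (Connected)
EOF
```

With the two-node setup in this guide, each node should report one connected peer.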
Now that we have our pool of servers available, we can make our first volume.
Because we are interested in redundancy, we will set up a volume that has replica functionality. This will allow us to keep multiple copies of our data, saving us from a single point-of-failure.
Since we want one copy of data on each of our servers, we will set the replica option to “2”, which is the number of servers we have. The general syntax we will be using to create the volume is this:
sudo gluster volume create volume_name replica num_of_servers transport tcp domain1.com:/path/to/data/directory domain2.com:/path/to/data/directory … force
The exact command we will run is this:
sudo gluster volume create volume1 replica 2 transport tcp gluster0.droplet.com:/gluster-storage gluster1.droplet.com:/gluster-storage force
volume create: volume1: success: please start the volume to access data
This will create a volume called volume1. It will store the data from this volume in directories on each host at /gluster-storage. If this directory does not exist, it will be created.
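To make the relationship between the general syntax and the exact command more concrete, here is a small sketch that assembles the command line from a volume name and a list of storage locations. The build_volume_create function is purely illustrative. It sets the replica count to the number of locations given, which matches the fully mirrored layout used in this guide but is not the only valid layout. It only prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: print a `gluster volume create` command for a fully
# mirrored volume, with one replica per storage location.
# Usage: build_volume_create VOLUME_NAME HOST:/PATH [HOST:/PATH ...]
build_volume_create() {
    volume=$1
    shift
    # "$#" is now the number of storage locations, used as the replica count
    printf 'gluster volume create %s replica %s transport tcp' "$volume" "$#"
    for brick in "$@"; do
        printf ' %s' "$brick"
    done
    printf ' force\n'
}

build_volume_create volume1 \
    gluster0.droplet.com:/gluster-storage \
    gluster1.droplet.com:/gluster-storage
```

The printed line matches the command we ran above, minus the leading sudo.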
At this point, our volume is created, but inactive. We can start the volume and make it available for use by typing:
sudo gluster volume start volume1
volume start: volume1: success
Our volume should now be online.
Now that we have our volume configured, it is available for use by our client machine.
Before we begin though, we need to actually install the relevant packages from the PPA we set up earlier.
On your client machine (gluster2 in this example), type:
sudo apt-get install glusterfs-client
This will install the client application, along with the FUSE filesystem tools needed to provide filesystem functionality outside of the kernel.
We are going to mount our remote storage volume on our client computer. In order to do that, we need to create a mount point. Traditionally, this is in the /mnt directory, but anywhere convenient can be used.
We will create a directory at /storage-pool:
sudo mkdir /storage-pool
With that step out of the way, we can mount the remote volume. To do this, we just need to use the following syntax:
sudo mount -t glusterfs domain1.com:volume_name path_to_mount_point
Notice that we are using the volume name in the mount command. GlusterFS abstracts the actual storage directories on each host. We are not looking to mount the /gluster-storage directory, but the volume1 volume.
Also notice that we only have to specify one member of the storage cluster.
The actual command that we are going to run is this:
sudo mount -t glusterfs gluster0.droplet.com:/volume1 /storage-pool
This should mount our volume. If we use the df command, we will see that our GlusterFS volume is mounted at the correct location.
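As an alternative to scanning the df output, a script can look the mount point up in the kernel's mount table. The sketch below assumes that the volume appears in /proc/mounts with the filesystem type fuse.glusterfs, which is how FUSE-based GlusterFS mounts are normally listed; the is_gluster_mount name is our own.

```shell
#!/bin/sh
# Hypothetical helper: succeed if MOUNT_POINT is listed as a GlusterFS
# FUSE mount. The mount table defaults to /proc/mounts.
# Usage: is_gluster_mount MOUNT_POINT [MOUNTS_FILE]
is_gluster_mount() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # Field 2 is the mount point and field 3 is the filesystem type
    awk -v mp="$mount_point" \
        '$2 == mp && $3 == "fuse.glusterfs" { found = 1 } END { exit !found }' \
        "$mounts_file" 2>/dev/null
}

if is_gluster_mount /storage-pool; then
    echo "/storage-pool is a GlusterFS mount"
else
    echo "/storage-pool is not a GlusterFS mount"
fi
```

A check like this can be useful at the top of a script that writes to the pool, so that files do not end up in the empty local directory when the mount is missing.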
Now that we have set up our client to use our pool of storage, let’s test the functionality.
On our client machine (gluster2), we can type this to add some files into our storage-pool directory:
cd /storage-pool
sudo touch file{1..20}
This will create 20 files in our storage pool.
If we look at our /gluster-storage directories on each storage host, we will see that all of these files are present on each system:
# on gluster0.droplet.com and gluster1.droplet.com
cd /gluster-storage
ls
file1 file10 file11 file12 file13 file14 file15 file16 file17 file18 file19 file2 file20 file3 file4 file5 file6 file7 file8 file9
As you can see, this has written the data from our client to both of our nodes.
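With more than a handful of files, comparing the two listings by eye gets tedious. The sketch below shows one way to compare them mechanically. It assumes you have saved the output of ls from each storage host into a text file (for example over SSH, which this guide has not set up); the compare_listings helper and the file names in the comments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print file names that appear in only one of two
# listings. Each argument is a text file with one file name per line.
compare_listings() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # comm needs both inputs sorted with the same collation
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    # comm -3 hides lines common to both inputs, leaving only the differences
    LC_ALL=C comm -3 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

# Example of gathering the listings, assuming SSH access to both hosts:
#   ssh gluster0.droplet.com 'ls /gluster-storage' > gluster0.list
#   ssh gluster1.droplet.com 'ls /gluster-storage' > gluster1.list
#   compare_listings gluster0.list gluster1.list
```

Empty output means both storage directories list the same file names. Any name that is printed exists on only one of the two hosts.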
If one of the nodes in your storage cluster is ever down while changes are made to the filesystem, doing a read operation on the client mount point after the node comes back online should prompt it to fetch any missing files:
ls /storage-pool
Now that we have verified that our storage pool can be mounted and replicate data to both of the machines in the cluster, we should lock down our pool.
Currently, any computer can connect to our storage volume without any restrictions. We can change this by setting an option on our volume.
On one of your storage nodes, type:
sudo gluster volume set volume1 auth.allow gluster_client_IP_addr
You will have to substitute the IP address of your cluster client (gluster2) in this command. Currently, at least with /etc/hosts configuration, domain name restrictions do not work correctly. If you set a restriction this way, it will block all traffic. You must use IP addresses instead.
If you need to remove the restriction at any point, you can type:
sudo gluster volume set volume1 auth.allow '*'
This will allow connections from any machine again. This is insecure, but may be useful for debugging issues.
If you have multiple clients, you can specify their IP addresses at the same time, separated by commas:
sudo gluster volume set volume1 auth.allow gluster_client1_ip,gluster_client2_ip
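If you keep your client addresses in a script or a list, you can build the comma-separated value rather than typing it out. This is a small sketch; the join_ips helper is our own, and the 192.0.2.x addresses are documentation placeholders standing in for your client IPs.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments into one comma-separated string.
# The function body runs in a subshell so the IFS change does not leak out.
join_ips() (
    IFS=,
    # "$*" joins the arguments using the first character of IFS
    printf '%s\n' "$*"
)

allowed=$(join_ips 192.0.2.20 192.0.2.21)
echo "$allowed"

# On a storage node you could then run:
#   sudo gluster volume set volume1 auth.allow "$allowed"
```

Quoting "$allowed" in the final command keeps the whole list as a single argument.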
When you begin changing some of the settings for your GlusterFS storage, you might get confused about what options you have available, which volumes are live, and which nodes are associated with each volume.
There are a number of different commands that are available on your nodes to retrieve this data and interact with your storage pool.
If you want information about each of your volumes, type:
sudo gluster volume info
Volume Name: volume1
Type: Replicate
Volume ID: 3634df4a-90cd-4ef8-9179-3bfa43cca867
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster0.droplet.com:/gluster-storage
Brick2: gluster1.droplet.com:/gluster-storage
Options Reconfigured:
auth.allow: 111.111.1.11
Similarly, to get information about the peers that this node is connected to, you can type:
sudo gluster peer status
Number of Peers: 1
Hostname: gluster0.droplet.com
Port: 24007
Uuid: 6f30f38e-b47d-4df1-b106-f33dfd18b265
State: Peer in Cluster (Connected)
If you want detailed information about how each node is performing, you can profile a volume by typing:
sudo gluster volume profile volume_name start
When this command is complete, you can obtain the information that was gathered by typing:
sudo gluster volume profile volume_name info
Cumulative Stats:
%-latency Avg-latency Min-Latency Max-Latency No. of calls Fop
0.00 0.00 us 0.00 us 0.00 us 20 RELEASE
0.00 0.00 us 0.00 us 0.00 us 6 RELEASEDIR
10.80 113.00 us 113.00 us 113.00 us 1 GETXATTR
28.68 150.00 us 139.00 us 161.00 us 2 STATFS
60.52 158.25 us 117.00 us 226.00 us 4 LOOKUP
Duration: 8629 seconds
Data Read: 0 bytes
Data Written: 0 bytes
. . .
You will receive a lot of information about each node with this command.
For a list of all of the GlusterFS associated components running on each of your nodes, you can type:
sudo gluster volume status
Status of volume: volume1
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick gluster0.droplet.com:/gluster-storage 49152 Y 2808
Brick gluster1.droplet.com:/gluster-storage 49152 Y 2741
NFS Server on localhost 2049 Y 3271
Self-heal Daemon on localhost N/A Y 2758
NFS Server on gluster0.droplet.com 2049 Y 3211
Self-heal Daemon on gluster0.droplet.com N/A Y 2825
There are no active volume tasks
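If you want a script to flag storage directories that are not online, you can filter this output. The sketch below assumes the column layout shown above, where the second-to-last field of each Brick line is the Online flag; the offline_bricks helper is our own, and the layout may differ in other GlusterFS versions.

```shell
#!/bin/sh
# Hypothetical helper: read `gluster volume status` output on stdin and
# print every brick whose Online column is not "Y".
offline_bricks() {
    # On Brick lines, field 2 is host:/path and the second-to-last field
    # is the Online flag
    awk '$1 == "Brick" && $(NF-1) != "Y" { print $2 }'
}

# Demonstration using the sample output from this guide.
# On a real node you would run: sudo gluster volume status | offline_bricks
offline_bricks <<'EOF'
Status of volume: volume1
Gluster process Port Online Pid
Brick gluster0.droplet.com:/gluster-storage 49152 Y 2808
Brick gluster1.droplet.com:/gluster-storage 49152 Y 2741
NFS Server on localhost 2049 Y 3271
EOF
```

For the sample above, both bricks are online, so nothing is printed.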
If you are going to be administering your GlusterFS storage volumes, it may be a good idea to drop into the GlusterFS console. This will allow you to interact with your GlusterFS environment without needing to type sudo gluster before everything:
sudo gluster
This will give you a prompt where you can type your commands. This is a good one to get yourself oriented:
help
When you are finished, exit like this:
exit
At this point, you should have a redundant storage system that allows you to write to two separate servers simultaneously. This can be useful for a great number of applications and can ensure that your data is available even when one server goes down.
By Justin Ellingwood
I’d avoid using domain names that may already exist like “droplet.com” and instead use something like “example.com” or “example.org” which are both reserved for use in documentation.
If you want to use an internal name, I would choose a non-existent TLD to avoid conflicts which may eventually break functionality.
Has anyone used this and would report how they found it on Digital Ocean for performance. I read this week that DO are also trialing a block storage solution of their own.
how is this redundant if the first server goes down?. wouldnt it be better to monitor the network drive, and move files out to local disk?. And how would that be accomplished?
Any idea how I can map it as a network drive on windows?
Is this a HA cluster? In your example, data is redundant. Client (gluster2) connects to the storage via gluster0. But what if gluster0 goes down?
What I’m looking for is to connect to a redundant gluster volume with High Availability.
Hi!
Great instructions. I’m wondering if you have been able to successfully create a Volume from Brick(s) that are CIFS-mounted ZFS datasets? For instance, say you have ZFS-server:/dataset/subset mounted to /localhost/mnt, you then want to create a volume using “/localhost/mnt” as the Brick. Have you been able to successfully create a volume over a mounted directory?
Thanks!
Really enjoy this tutorial, condense and to the point. Thank you.
One question: is it OK if the Gluster client is on the same machine as the Gluster server? Take the example from this tutorial, gluster0 and gluster1 are the server. I’m wondering whether it’s valid to setup client on gluster0 or gluster1? Will there be potential problems in this setup?
Hello! Thanks for tut!
Any advices on how to configure iptables with this setup?
I’m running ubuntu 14.04 servers in VirtualBox for testing and found that the probing would never connect. I then tried again without using the ppa in this tutorial:
sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.4
It now works and I am running version:
glusterfs 3.4.2 built on Jan 14 2014 18:05:35
Thanks guys for the tutorial, I really like all your writings.
Unfortunately I had an error following this one, so if you have the error :
E [mount.c:267:gf_fuse_mount] 0-glusterfs-fuse: cannot open /dev/fuse (No such file or directory) E [xlator.c:390:xlator_init] 0-fuse: Initialization of volume ‘fuse’ failed, review your volfile again
when trying to mount the volume on the client, type this command before mounting the volume:
mknod /dev/fuse c 10 229
Cheers.