
How To Create a Riak Cluster on an Ubuntu VPS

Posted Jul 16, 2013


Riak is a distributed database that offers highly available, fault-tolerant, scalable data management.

This guide will cover how to install and configure a Riak cluster using 64-bit Ubuntu 12.04 VPS instances. We will be using 5 separate cloud servers.

The Riak website recommends using machines with a minimum of 4GB of RAM for best performance, so we will be using cloud servers of that size.

We will be configuring each VPS as root. Be sure to log into each VPS as root or use "su" to obtain the appropriate privileges.


The following installation steps will be required on each node you will be setting up.

There are pre-compiled binary packages available for Ubuntu that can be downloaded from Riak's website.

First, we will configure apt-get to trust the Riak apt repository and add it to our sources:

curl http://apt.basho.com/gpg/basho.apt.key | apt-key add -
bash -c "echo deb http://apt.basho.com $(lsb_release -sc) main > /etc/apt/sources.list.d/basho.list"

We can now update the apt-get database and install Riak:

apt-get update
apt-get install riak

We now have Riak installed. Remember to repeat this step on the other machines you will be using.
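
Since the same commands have to be run on every node, it may help to collect them in one place. The following is only a sketch that wraps the steps above in a function; the names "basho_repo_line" and "install_riak" are made up for this example, and the "-y" flag is added so that apt-get does not prompt for confirmation.

```shell
#!/bin/sh
# Sketch only: bundles the installation commands from this section.
# basho_repo_line and install_riak are hypothetical helper names.

# Print the apt source line for a given Ubuntu codename (e.g. "precise").
basho_repo_line() {
    printf 'deb http://apt.basho.com %s main\n' "$1"
}

# Run the same steps as above; must be executed as root on each node.
install_riak() {
    curl http://apt.basho.com/gpg/basho.apt.key | apt-key add - || return 1
    basho_repo_line "$(lsb_release -sc)" > /etc/apt/sources.list.d/basho.list || return 1
    apt-get update && apt-get install -y riak
}

# Example (as root): install_riak
```

Nothing is installed until "install_riak" is called explicitly.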

Configuring Riak

Now that Riak has been installed, each node will need to be configured. We will complete the following steps on each machine.

Modifying app.config

Ensure that there are no instances of Riak currently running, change into the Riak configuration directory, and open the primary configuration file:

riak stop
cd /etc/riak
nano app.config

We will be changing two values to reflect the network settings of this machine.

Search for the line that reads "{pb, [ {"127.0.0.1", 8087 } ]}". Change "127.0.0.1" to the IP address of your machine:

{pb, [ {"Your.IP.Address", 8087 } ]},

Next, perform a similar replacement on the line that reads "{http, [ {"127.0.0.1", 8098 } ]}". Again, use the IP address of your machine:

{http, [ {"Your.IP.Address", 8098 } ]},

Save and close the file.
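
If you would rather not edit the file by hand on five machines, the same two replacements could be scripted. This is a rough sketch using GNU sed; "set_riak_listen_ip" is a hypothetical helper name, and it assumes the pb and http entries are laid out as shown above (one address per entry, ports 8087 and 8098). Back up the file before trying it.

```shell
# set_riak_listen_ip FILE IP
# Hypothetical helper: replaces whatever address is currently quoted in the
# pb (port 8087) and http (port 8098) entries with the given IP.
# Assumes the entries are formatted as in this guide; uses GNU sed's -i flag.
set_riak_listen_ip() {
    file=$1
    ip=$2
    sed -i \
        -e 's/\({pb, *\[ *{"\)[^"]*\(", *8087 *}\)/\1'"$ip"'\2/' \
        -e 's/\({http, *\[ *{"\)[^"]*\(", *8098 *}\)/\1'"$ip"'\2/' \
        "$file"
}

# Example (as root, with Riak stopped):
#   cp /etc/riak/app.config /etc/riak/app.config.bak
#   set_riak_listen_ip /etc/riak/app.config Your.IP.Address
```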

Modifying vm.args

Next, we will be modifying the "vm.args" file:

nano vm.args

Find and modify the line specifying the node name. It should read "-name riak@127.0.0.1". Keep everything the same except the IP address:

-name riak@Your.IP.Address

Save and close the file.
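
The node name change can be scripted in the same spirit. The sketch below assumes the file contains a single line beginning with "-name riak@", as described above; "set_riak_node_name" is a hypothetical helper name and the "-i" flag assumes GNU sed.

```shell
# set_riak_node_name FILE IP
# Hypothetical helper: rewrites the "-name riak@..." line to use the given IP.
# Other lines in the file are left untouched.
set_riak_node_name() {
    file=$1
    ip=$2
    sed -i 's/^-name riak@.*/-name riak@'"$ip"'/' "$file"
}

# Example (as root, with Riak stopped):
#   set_riak_node_name /etc/riak/vm.args Your.IP.Address
```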

Starting Riak

Starting the Riak nodes is simple:

riak start
!!!! WARNING: ulimit -n is 1024; 4096 is the recommended minimum.

You will probably get the warning above. Let's fix it temporarily for the current shell session. We will make the change permanent later:

riak stop
ulimit -n 65536

Now we can restart Riak to see if the ulimit warning goes away.

riak start
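
To confirm the new limit without relying on the warning, you can compare the shell's current value with the recommended minimum of 4096 from the message above. This is a small sketch; "fd_limit_ok" is a hypothetical helper name.

```shell
# fd_limit_ok LIMIT [MINIMUM]
# Hypothetical helper: succeeds if LIMIT is "unlimited" or at least MINIMUM.
# MINIMUM defaults to 4096, the value named in Riak's warning.
fd_limit_ok() {
    limit=$1
    minimum=${2:-4096}
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -ge "$minimum" ]
}

# Check the open file limit of the current shell.
if fd_limit_ok "$(ulimit -n)"; then
    echo "open file limit is sufficient"
else
    echo "open file limit is below the recommended minimum"
fi
```

Keep in mind that this checks the shell you run it in, so run it in the same session you use to start Riak.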

Creating a Cluster

If you have been following the guide, you should now have five nodes configured and running.

However, they are currently operating independently. Each node handles 100% of its own data set and does not communicate with the others. We will merge them into a cluster in this section.

The following steps will join all of the Riak nodes to our first node. Riak will redistribute the data between them automatically when complete.

On our second node, tell the local Riak instance to join the first Riak node:

riak-admin cluster join riak@First.Riak.IP
Success: staged join request for 'riak@Second.Riak.IP' to 'riak@First.Riak.IP'

This will set up the action of joining, but it will not execute yet. We must view the planned changes first:

riak-admin cluster plan

This will show you the results of the planned change. Riak makes you view the proposed changes before it executes the action.

If the proposal looks correct, commit the changes:

riak-admin cluster commit
Cluster changes committed

We can see the new cluster group by typing:

riak-admin member-status

Repeat the procedure for the other nodes to form a full cluster group.
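
Because the join, plan, and commit sequence is identical on every remaining node, it could be wrapped in a small function. The sketch below is one possible way to do that, not an official tool: "join_cluster" and the "RIAK_ADMIN" variable are invented for this example, and the prompt is there so that the plan is still reviewed before anything is committed.

```shell
# Command used to talk to the local node; override for a dry run,
# e.g. RIAK_ADMIN=echo (hypothetical convenience, not a Riak setting).
RIAK_ADMIN=${RIAK_ADMIN:-riak-admin}

# join_cluster FIRST_NODE_IP
# Hypothetical helper: stages the join, shows the plan, and commits only
# after the operator answers "y".
join_cluster() {
    first_node=$1
    "$RIAK_ADMIN" cluster join "riak@$first_node" || return 1
    "$RIAK_ADMIN" cluster plan || return 1
    printf 'Commit these changes? [y/N] '
    read -r answer
    [ "$answer" = "y" ] || return 1
    "$RIAK_ADMIN" cluster commit
}

# Example, run on each node other than the first:
#   join_cluster First.Riak.IP
```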

Optimizing Settings

Now that we are set up, it is important that we go back and fix some settings that are not ideal for our purposes.

One thing we need to change is the "ulimit" setting that we were warned about when starting Riak. We will create a file to permanently change this setting:

nano /etc/default/riak

Add the following line, which will be executed each time the system starts Riak:

ulimit -n 65536

Save and close the file.

Next, we need to see what Riak thinks we should optimize:

riak-admin diag
[critical] vm.swappiness is 60, should be no more than 0
[critical] net.core.wmem_default is 229376, should be at least 8388608
[critical] net.core.rmem_default is 229376, should be at least 8388608
[critical] net.core.netdev_max_backlog is 1000, should be at least 10000
[critical] net.core.somaxconn is 128, should be at least 4000
[critical] net.ipv4.tcp_max_syn_backlog is 2048, should be at least 40000
[critical] net.ipv4.tcp_fin_timeout is 60, should be no more than 15
[critical] net.ipv4.tcp_tw_reuse is 0, should be 1
[notice] Data directory /var/lib/riak/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.

There is a chance that you will also see a large list of messages, the first of which starts with:

[warning] The following preflists do not satisfy the n_val:

This means that your cluster does not yet have enough nodes to spread your data out correctly. Once you join more nodes to the cluster, these messages will disappear.

We will work on adjusting all of the "critical" notices. They can all be adjusted like this:

sysctl setting=value

Each command will depend on the output of the "riak-admin diag" program, but will follow the same format.
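
As an illustration, the "[critical]" lines can be turned into sysctl commands mechanically, since the setting name is the second word and the recommended value is the last word on each line. This sketch assumes the diag output has the format shown above; "diag_to_sysctl" is a hypothetical helper name. It only prints the commands so you can review them first.

```shell
# diag_to_sysctl
# Hypothetical helper: reads "riak-admin diag" output on standard input and
# prints one "sysctl setting=value" command per [critical] line.
# Assumes lines look like: [critical] <setting> is <current>, should be ... <value>
diag_to_sysctl() {
    awk '$1 == "[critical]" && $3 == "is" { print "sysctl " $2 "=" $NF }'
}

# Example:
#   riak-admin diag | diag_to_sysctl
# For the output shown above this would print, among others:
#   sysctl vm.swappiness=0
#   sysctl net.core.somaxconn=4000
```

Lines that are not marked "[critical]", such as the "noatime" notice, are skipped.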

Re-run the diagnostic command to see if the values are fixed:

riak-admin diag
[notice] Data directory /var/lib/riak/bitcask is not mounted with 'noatime'. Please remount its disk with the 'noatime' flag to improve performance.

We can safely ignore the notice message. Our new values have fixed the issues with our node.

These values will only exist for the current session. To make the values persist, we need to edit the "sysctl.conf" file:

nano /etc/sysctl.conf

Search for each of the different keys and adjust the values as suggested by the "riak-admin diag" command. If the settings don't exist, add them to the bottom of the list.
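
The "adjust it if present, otherwise append it" rule can also be expressed as a short function. This is a sketch only; "set_sysctl_conf" is a hypothetical helper name, it writes entries in "key = value" form, and it treats commented-out entries as absent. The "-i" flag assumes GNU sed.

```shell
# set_sysctl_conf KEY VALUE [FILE]
# Hypothetical helper: updates KEY in FILE if an active (uncommented) entry
# exists, otherwise appends a new entry. FILE defaults to /etc/sysctl.conf.
set_sysctl_conf() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    # Escape dots so the key is matched literally by grep and sed.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
        sed -i "s/^[[:space:]]*${pattern}[[:space:]]*=.*/$key = $value/" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example (as root), using a value suggested by "riak-admin diag":
#   set_sysctl_conf net.core.somaxconn 4000
```

After editing the file, "sysctl -p" reloads the values from /etc/sysctl.conf without a reboot.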


Our node is now configured correctly. Repeat the above steps on each machine to continue.

Testing the Cluster

We can add a file to test our cluster easily. First, get an image you'd like to use. We will use an image off of the DigitalOcean website:

cd ~
wget https://www.digitalocean.com/assets/v2/footer_mascott.png

Now we can put the image into our cluster with the following command.

Replace "IPAddress" with your node's IP address and "Port" with the http port from the "/etc/riak/app.config" file. By default, the port should be "8098":

curl -XPUT http://IPAddress:Port/riak/images/sammy.png -H "Content-type: image/png" --data-binary @footer_mascott.png

Now, you should be able to see your image by pointing your browser to the URL from the command:

http://IPAddress:Port/riak/images/sammy.png

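
If no browser is handy, the stored object can also be checked from the command line by downloading it again and comparing it with the original file. The following is a sketch under the assumptions of this section (bucket "images", key "sammy.png", local file "footer_mascott.png" in the current directory); "riak_object_url" and "verify_upload" are hypothetical helper names.

```shell
# riak_object_url HOST PORT BUCKET KEY
# Hypothetical helper: builds the HTTP URL used in the curl command above.
riak_object_url() {
    printf 'http://%s:%s/riak/%s/%s\n' "$1" "$2" "$3" "$4"
}

# verify_upload HOST PORT
# Hypothetical helper: fetches images/sammy.png and succeeds only if it is
# byte-for-byte identical to the local footer_mascott.png.
verify_upload() {
    url=$(riak_object_url "$1" "$2" images sammy.png)
    tmp=$(mktemp) || return 1
    if curl -s -f -o "$tmp" "$url" && cmp -s footer_mascott.png "$tmp"; then
        status=0
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Example:
#   verify_upload IPAddress 8098 && echo "stored image matches"
```

Pointing the check at a different node than the one you uploaded to is a simple way to see that the object is reachable through any member of the cluster.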


You should now have a Riak cluster installed and configured correctly. Your cluster will now automatically distribute your data among the configured nodes.

By Justin Ellingwood


Creative Commons License