Tutorial

How To Create a Beowulf Cluster Using Ubuntu 12.04 VPS Instances


This tutorial is out of date and no longer maintained.

Status: Deprecated

This article covers a version of Ubuntu that is no longer supported. If you are currently operating a server running Ubuntu 12.04, we highly recommend upgrading or migrating to a supported version of Ubuntu:

Reason: Ubuntu 12.04 reached end of life (EOL) on April 28, 2017 and no longer receives security patches or updates. This guide is no longer maintained.

See Instead: This guide might still be useful as a reference, but may not work on other Ubuntu releases. If available, we strongly recommend using a guide written for the version of Ubuntu you are using. You can use the search functionality at the top of the page to find a more recent version.

Introduction


Cloud computing with VPS instances provides a number of possibilities not readily available to home computer users. One of these is the concept of clustering.

With easily deployable server instances, clustered computing is easy to set up and expand. In this guide, we will discuss how to configure a Beowulf cluster for distributed processing between nodes.

Prerequisites


In this tutorial, we will be using 4 Ubuntu 12.04 VPS instances. The majority of the configuration will be the same throughout the nodes, so we will use a bootstrap process to set up an initial environment, then leverage DigitalOcean snapshots to deploy this to the other nodes.

This configuration will also take advantage of DigitalOcean Private Networking, which is currently available in the NYC2 region. Be sure to enable private networking when creating your droplet.

We will be creating one control node and then 3 worker nodes to actually do the work.

We will be using 4 GB droplets in order to take advantage of the higher processing power, but you can use smaller nodes.

A description of our hardware and networking configuration:

  • Control node:
    • Hostname: command
    • Private IP Address: 1.1.1.1

The rest of the nodes should not be created initially. They will be created later by copying the control node’s configuration.

  • Worker node 1:
    • Hostname: work1
    • Private IP Address: 1.1.1.2
  • Worker node 2:
    • Hostname: work2
    • Private IP Address: 1.1.1.3
  • Worker node 3:
    • Hostname: work3
    • Private IP Address: 1.1.1.4

At this point, you should have your control droplet created with an Ubuntu 12.04 image with private networking enabled. You should create a user and give it sudo privileges. We will use this user for this tutorial.
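If you have not created that administrative user yet, a minimal sketch of the usual Ubuntu 12.04 steps looks like this (run as root; the username sammy is only an example placeholder):

adduser sammy
adduser sammy sudo    # membership in the "sudo" group grants sudo privileges on Ubuntu 12.04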

Initial Configuration of Control Node


Log into your control node droplet using SSH.

Create a Cluster User


The first thing that we will do is create an additional, unprivileged user to operate our cluster (this should be separate from the user you use with sudo). We will name our user cluster:

sudo adduser cluster --uid 900

The --uid parameter specifies the user id that will be associated with the account. A number below 1000 indicates a system user that should not be used for regular tasks.

Give the cluster user a password and feel free to press “ENTER” through the rest of the prompts.
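If you want to double-check that the account was created with the UID you requested, you can optionally run:

id cluster

The output should begin with uid=900(cluster).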

Create SSH Credentials


Next, we need to create SSH credentials for our user. Our cluster nodes will communicate over SSH and share information through an NFS-mounted home directory. We will need to set up an SSH key pair so that all of the nodes can communicate without the use of passwords.

First, change users to the new cluster user. Supply the password you set during creation:

su - cluster

Now, we can generate RSA keys with the following command:

ssh-keygen

Press “ENTER” through all of the prompts (including the passphrase prompt) to create the key pair.

We can now copy the public key into our own authorized keys file. This usually wouldn’t accomplish much, but since we will be mounting this home directory with NFS later, it will be shared between the nodes and allow them to connect to each other seamlessly:

ssh-copy-id localhost

Type “yes” to accept the key. Enter the cluster user’s password.
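Before moving on, you can optionally confirm that key-based authentication is working. This should print the droplet’s hostname without asking for a password:

ssh localhost hostname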

Exit back into your regular user by typing:

exit

Install the MPI Implementation


Our node clusters will communicate with a system called Message Passing Interface, more commonly known as MPI. This allows parallel processes to communicate easily and share work and status information.

We will use the MPICH2 implementation, which is a popular, well-supported version.

Install the software by typing:

sudo apt-get install mpich2

The MPI interface should now be installed.
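If you would like to verify the installation, you can check that the launcher is on your path. The -info flag is specific to MPICH2’s Hydra process manager and should print its build details:

which mpiexec
mpiexec -info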

Deploy Worker Nodes from Control Node


We will create our worker nodes by taking a snapshot of our current control node configuration and then diverging from that point. Starting in October 2016, snapshots cost $0.05 per gigabyte per month, based on the amount of utilized space within the filesystem.

Create a Snapshot of the Control Node


To create a snapshot, begin by powering down your droplet. While it’s possible to take a snapshot of a live system, powering down ensures that the filesystem is in a consistent state. In the command line, type:

sudo shutdown -h now

In the DigitalOcean control panel, select your control node droplet. Under the Snapshots menu, enter the name you would like to use for your snapshot and click “Take Snapshot”:

DigitalOcean control node snapshot

This may take a few minutes.

Launch Worker Nodes from Snapshot


When your snapshot is complete, you can use the snapshot image as the base for your worker nodes. We will be creating 3 additional nodes, called work1, work2, and work3.

Click the “Create” button in the DigitalOcean control panel. Choose a name and the droplet size you would like, and select a region with Private Networking (NYC2, for example).

When selecting your base image, click on “My Images” and select the snapshot name you just created.

DigitalOcean base image selection

Make sure you select the “Private Networking” check box before you create the droplet:

DigitalOcean private networking

Create your droplet.

Repeat this step for the additional worker nodes.

Gather Private Networking Information


You will need the private networking IP address of each of the nodes. The easiest way to find this information is through the DigitalOcean control panel.

Click on the droplet name. Click on the “Settings” menu. There is a “Private Network” section that contains your Private IP address:

DigitalOcean private IP address

Write down the Private IP address and the associated host name of each node. You will need this information momentarily.

Complete Control Node Configuration


We now need to complete the control node configuration. Up until now, we were doing generic configuration so that our changes would be applicable to our worker nodes. Now we will start differentiating our control node.

Log back into the control node.

As we mentioned, this configuration will use NFS to share the home directory between all of our nodes. The control droplet will have the NFS server. Install it with these commands:

sudo apt-get update && sudo apt-get install nfs-kernel-server

We will be exporting our cluster user’s home directory to all of the nodes:

sudo nano /etc/exports

Add this line at the bottom of the file:

/home/cluster *(rw,sync,no_subtree_check)

We will restart our NFS server with the following command:

sudo service nfs-kernel-server restart
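As an optional check on the control node itself, you can list the active exports and their options (rw for read-write, sync for synchronous writes, no_subtree_check to disable subtree checking):

sudo exportfs -v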

Node Configuration


Now that we have the Private IP addresses and the associated hostnames from the DigitalOcean control panel, we can edit the hosts file on each node (master and workers) to reference each other.

On Control Node and Worker Nodes


On each node, edit the /etc/hosts file and add the information for each node in this format. The IP addresses here reflect the dummy values mentioned in the prerequisites section. Substitute the values you found in your Control Panel Settings pages:

1.1.1.1         command
1.1.1.2         work1
1.1.1.3         work2
1.1.1.4         work3

Open the hosts file with this command:

sudo nano /etc/hosts

Copy and paste the above information below the localhost definition on each droplet:

127.0.0.1       localhost command
1.1.1.1         command
1.1.1.2         work1
1.1.1.3         work2
1.1.1.4         work3

Save and close the file.
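To confirm that each name now resolves to the private address you entered (an optional check), you can query the hosts database directly:

getent hosts command work1 work2 work3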

On the Worker Nodes


Next, we need to install and configure the NFS components on the worker nodes. We can do this with apt-get.

On each worker node, install the NFS tools:

sudo apt-get install nfs-common -y

We can now see the NFS exports that we configured on our control node:

sudo showmount -e command

Export list for command:
/home/cluster *

This means that your shares are being exported correctly from the command droplet. If you run into trouble, you can try restarting the NFS server on the command droplet by typing:

sudo service nfs-kernel-server restart

Back on your worker droplets, we can now mount the cluster user’s home directory on the command droplet onto the cluster user’s home directory on the worker droplets.

On each of the worker nodes, type the following:

sudo mount command:/home/cluster /home/cluster

This will mount the control droplet’s home directory for this session. To make this happen automatically on startup, add this configuration to the /etc/fstab file.

Open the file on each worker node with administrative privileges:

sudo nano /etc/fstab

Add this line at the bottom of the file to make the mount happen automatically at boot:

command:/home/cluster /home/cluster nfs

Save and close the file.
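A quick, optional way to test the new entry without rebooting is to have mount process the fstab file and then check that the share shows up:

sudo mount -a
df -h /home/cluster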

Complete Final Configuration Steps


We now have a control droplet that is sharing its cluster user’s home directory through NFS. Because the cluster user’s SSH credentials live in that shared home directory, it can log into the worker nodes without a password.

We should test that our nodes can SSH into each other without a password. This will give us the opportunity to accept each host definition so that SSH won’t complain about an unknown host when we try to run jobs later.

Change to your cluster user on the control droplet:

su - cluster

SSH into each node (master and workers) in turn:

ssh command

Type “yes” to accept the host definition for each node. Exit back into the node you were in:

exit

Repeat this with each of the nodes by name (ssh work1, ssh work2, etc). Make sure they can SSH into one another without any prompting.
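Because the cluster user’s home directory (and therefore its known_hosts file) is shared over NFS, each host key only needs to be accepted once. A small loop like the one below, run as the cluster user, can speed this up; it simply connects to each node defined above and prints its hostname:

for host in command work1 work2 work3; do
    ssh "$host" hostname
done

Type “yes” whenever a new host key is presented. Each hostname should print back without any password prompt.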

Create the Hosts File


We will create a hosts file (different from the /etc/hosts file) in order to list the nodes that should be used for work in the cluster.

With our current setup, we will use the control droplet (named command) to issue commands to the cluster. The 3 worker nodes (work1, work2, and work3) will distribute the task between themselves.

We will not add the control droplet to the list of hosts. This will allow it to remain responsive in case our cluster is under heavy load.

On the control droplet, log into the cluster user if you have not already done so:

su - cluster

Create a hosts file with this command:

nano ~/hosts

List the names of your worker nodes, one per line:

work1
work2
work3

Save and close the file.

Lastly, we can create a local bin directory to contain our cluster applications:

mkdir ~/bin

Test the Cluster


The MPICH2 package that handles message passing between our nodes includes some sample applications that we can use to test our cluster. Unfortunately, these are not built by default.

We will have to compile them ourselves.

On the control droplet, as a regular user with sudo privileges, get the build dependencies of the package we already installed:

sudo apt-get build-dep mpich2

Now that we have the appropriate dependencies, we can acquire the source files from the project’s website:

wget http://www.mpich.org/static/downloads/1.4.1/mpich2-1.4.1.tar.gz

Unzip the file and change into the resulting directory:

tar xzvf mpich*
cd mpich*

Now, we can configure and make the package:

./configure && make

This will take quite a bit of time. When it is done, change into the cluster user:

su - cluster

Copy the example program that we compiled into the bin directory that we created:

cp /home/regular_user/mpich2-1.4.1/examples/cpi /home/cluster/bin

We can now test our cluster using this example program.

We will reference the hosts file we created and specify the number of processes to run. We will also specify the interface that the nodes should connect on, since DigitalOcean’s Private Networking uses interface eth1 instead of the regular eth0.

mpiexec -f hosts -iface eth1 -n 12 /home/cluster/bin/cpi

Process 6 of 12 is on work1
Process 2 of 12 is on work3
Process 9 of 12 is on work1
Process 11 of 12 is on work3
Process 0 of 12 is on work1
Process 5 of 12 is on work3
Process 8 of 12 is on work3
Process 3 of 12 is on work1
Process 7 of 12 is on work2
Process 10 of 12 is on work2
Process 4 of 12 is on work2
Process 1 of 12 is on work2
pi is approximately 3.1415926544231256, Error is 0.0000000008333325
wall clock time = 0.003485

As you can see, 12 processes are spawned. If you go through each process sequentially, you’ll notice that each worker is used in a round robin manner. This test proves our cluster is working correctly.
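By default, Hydra hands processes to the hosts in your hosts file in round-robin order, as seen above. If your worker droplets have different numbers of cores, MPICH2’s Hydra host file format also accepts a host:count syntax to control how many processes each host receives (the counts below are purely illustrative):

work1:2
work2:2
work3:2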

Conclusion


You now have a fully functional Beowulf clustered environment. You can easily add nodes by installing the necessary software, adding additional hosts to the hosts file in the cluster home directory, and filling in the /etc/hosts file.

Any MPI application can be used in the same way. MPI is a standard, so you can find documentation online with detailed explanations of how to write MPI applications. Any MPI application can use this cluster to distribute its processes among multiple computers.
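As a sketch, launching your own program follows the same pattern as the test above, assuming the mpicc compiler wrapper is available (for example from the libmpich2-dev package or from the MPICH2 source tree you built earlier) and that my_program.c is a placeholder for your own source file:

mpicc -o /home/cluster/bin/my_program my_program.c
mpiexec -f hosts -iface eth1 -n 12 /home/cluster/bin/my_program

Placing the compiled binary in the cluster user’s shared bin directory ensures that every node can reach it over the NFS mount.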

By Justin Ellingwood

