How To Implement Replication Sets in MongoDB on an Ubuntu VPS

Published on December 4, 2013

By Justin Ellingwood

Status: Deprecated

This article covers a version of Ubuntu that is no longer supported. If you currently operate a server running Ubuntu 12.04, we highly recommend upgrading or migrating to a supported version of Ubuntu.

Reason: Ubuntu 12.04 reached end of life (EOL) on April 28, 2017 and no longer receives security patches or updates. This guide is no longer maintained.

See Instead: This guide might still be useful as a reference, but may not work on other Ubuntu releases. If available, we strongly recommend using a guide written for the version of Ubuntu you are using. You can use the search functionality at the top of the page to find a more recent version.

Introduction

MongoDB is an extremely popular NoSQL database. It is often used to store and manage application data and website information. MongoDB boasts a dynamic schema design, easy scalability, and a data format that is easily accessible programmatically.

In this guide, we will discuss how to set up data replication in order to ensure high availability of data and create a robust failover system. This is important in any production environment where a database going down would have a negative impact on your organization or business.

We will assume that you have already installed MongoDB on your system. If you have not, install MongoDB on Ubuntu 12.04 before continuing.

What is a MongoDB Replication Set?

MongoDB handles replication through an implementation called “replication sets” (the official MongoDB documentation uses the term “replica sets”). Replication sets in their basic form are somewhat similar to nodes in a master-slave configuration. A single primary member is used as the base for applying changes to secondary members.

The difference between a replication set and master-slave replication is that a replication set has an intrinsic automatic failover mechanism in case the primary member becomes unavailable.

Primary member: The primary member is the default access point for transactions with the replication set. It is the only member that can accept write operations.

Each replication set can have only one primary member at a time. This is because replication happens by copying the primary’s “oplog” (operations log) and repeating the changes on the secondary’s dataset. Multiple primaries accepting write operations would lead to data conflicts.

Secondary members: A replication set can contain multiple secondary members. Secondary members reproduce changes from the oplog on their own data.

Although by default applications will query the primary member for both read and write operations, you can configure your setup to read from one or more of the secondary members. A secondary member can become the primary if the primary goes offline or steps down.

Note: Because data is transferred asynchronously, reads from secondary nodes can return stale data. If this is a concern for your use case, you should not enable this functionality.

Arbiter: An arbiter is an optional member of a replication set that does not take part in the actual replication process. It is added to the replication set to perform a single, limited function: to act as a tie-breaker in elections.

In the event that the primary member becomes unavailable, an automated election process happens among the remaining members to choose a new primary. If the set contains an even number of voting members, this could result in an inability to elect a new primary due to a voting impasse. The arbiter votes in these situations to ensure a decision is reached.

If a replication set has only one secondary member, an arbiter is required for automatic failover. Without it, the surviving member cannot gather a majority of the votes on its own.
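
The arithmetic behind this rule can be sketched in plain JavaScript, the language the mongo shell uses. This is a simplified model that assumes a candidate needs votes from a strict majority of all voting members, whether or not they are reachable. The function names are made up for illustration and are not part of MongoDB.

```javascript
// Simplified model of elections: a candidate needs votes from a strict
// majority of ALL voting members, including the ones that are unreachable.
// These helper names are hypothetical, not MongoDB APIs.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can the members that are still reachable elect a primary?
function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majorityNeeded(votingMembers);
}

// Primary plus one secondary, primary goes down: 1 of 2 votes is not a majority.
console.log(canElectPrimary(2, 1)); // false

// Same set plus an arbiter: the secondary and the arbiter hold 2 of 3 votes.
console.log(canElectPrimary(3, 2)); // true
```

Under this model, adding the arbiter does not add a copy of the data. It only adds the vote that lets the surviving secondary win.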

Secondary Member Customization Options

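There are instances where you may not want all of your secondary members to follow the standard rules for a replication set. A replication set can have up to 12 members, and up to 7 of them can vote in an election. These limits apply to the MongoDB releases that were current when this guide was written; later releases raised the member limit.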

Priority 0 Replication Members

There are some situations where the election of certain set members to the primary position could have a negative impact on your application’s performance.

For instance, if you are replicating data to a remote datacenter or a specific member’s hardware is inadequate to perform as the main access point for the set, setting priority 0 can ensure that this member will not become a primary but can continue copying data.
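
As a rough sketch, this is what the change looks like on a replication set's configuration document. The object below is a hand-written stand-in for what `rs.conf()` returns in the mongo shell, the hostnames are example values, and treating the third member as the remote one is an assumption made for illustration.

```javascript
// Stand-in for the document returned by rs.conf() in the mongo shell.
// The hostnames are example values; the third member is assumed to be
// the one running in a remote datacenter.
const cfg = {
  _id: "rs0",
  version: 1,
  members: [
    { _id: 0, host: "mongo0.example.com:27017" },
    { _id: 1, host: "mongo1.example.com:27017" },
    { _id: 2, host: "mongo2.example.com:27017" }
  ]
};

// Priority 0: the member keeps replicating but is never elected primary.
cfg.members[2].priority = 0;

// In the mongo shell, connected to the primary, the change would be
// applied with:  rs.reconfig(cfg)
console.log(JSON.stringify(cfg.members[2]));
```

In a real session you would start from `cfg = rs.conf()` rather than typing the document by hand, so that the other fields stay exactly as MongoDB stored them.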

Hidden Replication Members

Some situations require you to separate the main set of members accessible and visible to your clients from the background members that have separate purposes and should not interfere.

For instance, you may need a secondary member to be the base for analytics work, which would benefit from an up-to-date dataset but would cause a strain on working members. By setting this member to hidden, it will not interfere with the general operations of the replication set.

Hidden members are necessarily set to priority 0 to avoid becoming the primary member, but they do vote in elections.
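
Because the two settings go together, one way to avoid forgetting either is to set them in a single step. The helper below is hypothetical (it is not a MongoDB function) and works on a hand-written stand-in for the configuration document, returning a modified copy.

```javascript
// Hypothetical helper: returns a copy of a replication set config in which
// one member is hidden. Hidden members must also have priority 0, so both
// fields are set together.
function hideMember(cfg, memberId) {
  return {
    ...cfg,
    members: cfg.members.map((m) =>
      m._id === memberId ? { ...m, priority: 0, hidden: true } : m
    )
  };
}

// Stand-in for the output of rs.conf(); hostnames are example values.
const cfg = {
  _id: "rs0",
  version: 1,
  members: [
    { _id: 0, host: "mongo0.example.com:27017" },
    { _id: 1, host: "mongo1.example.com:27017" },
    { _id: 2, host: "mongo2.example.com:27017" }
  ]
};

// Hide the member that will serve analytics queries (assumed to be _id 2).
const updated = hideMember(cfg, 2);
console.log(JSON.stringify(updated.members[2]));

// In the mongo shell the result would be applied with rs.reconfig(updated).
```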

Delayed Replication Members

By setting the delay option for a secondary member, you can control how long the secondary waits to perform each action it copies from the primary’s oplog.

This is useful if you would like to safeguard against accidental deletions or recover from destructive operations. For instance, if you delay a secondary by a half-day, it would not immediately perform accidental operations on its own set of data and could be used to revert changes.

Delayed members cannot become primary members, but can vote in elections. In the vast majority of situations, they should be hidden to prevent processes from reading data that is out-of-date.
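
A half-day delay, for example, could be expressed on a member document roughly like this. The sketch assumes the `slaveDelay` field name used by MongoDB releases of this guide's era (MongoDB 5.0 renamed it to `secondaryDelaySecs`), and the hostname is an example value.

```javascript
// The delay is given in seconds: 12 hours * 60 minutes * 60 seconds.
const HALF_DAY_SECONDS = 12 * 60 * 60;

// Member entry for a delayed secondary. It cannot become primary
// (priority 0) and is hidden so clients do not read its stale data.
// "slaveDelay" is the field name in older MongoDB releases; newer ones
// use "secondaryDelaySecs" instead.
const delayedMember = {
  _id: 2,
  host: "mongo2.example.com:27017",
  priority: 0,
  hidden: true,
  slaveDelay: HALF_DAY_SECONDS
};

console.log(delayedMember.slaveDelay); // 43200
```

This entry would replace the matching element of `members` in the document obtained from `rs.conf()`, which is then passed to `rs.reconfig()`.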

How to Configure a Replication Set

To demonstrate how to configure replication sets, we will configure a simple set with a primary and two secondaries. This means that you will need three VPS instances to follow along. We will be using Ubuntu 12.04 machines.

You will need to install MongoDB on each of the machines that will be members of the set. You can follow this tutorial to learn how to install MongoDB on Ubuntu 12.04.

Once you have installed MongoDB on all three of the server instances, we need to configure some things that will allow our droplets to communicate with each other.

The following steps assume that you are logged in as the root user.

Set Up DNS Resolution

In order for our MongoDB instances to communicate with each other effectively, we will need to configure our machines to resolve the proper hostname for each member. You can either do this by configuring subdomains for each replication member or through editing the /etc/hosts file on each computer.

It is probably better to use subdomains in the long run, but for the sake of getting off the ground quickly, we will do this through the hosts file.

On each of the soon-to-be replication members, edit the /etc/hosts file:

nano /etc/hosts

After the first line that configures the localhost, you should add an entry for each of the replication set's members. These entries take the form of:

ip_address mongohost0.example.com

You can get the IP addresses of the members of your set in the DigitalOcean control panel. The name you choose as a hostname for that computer is arbitrary, but should be descriptive.

For our example, our /etc/hosts would look something like this:

127.0.0.1 localhost mongo0
123.456.789.111 mongo0.example.com
123.456.789.222 mongo1.example.com
123.456.789.333 mongo2.example.com

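This file should be the same across all of the hosts in your set, with one exception: the 127.0.0.1 line should carry each machine's own short name (mongo0 in the example above, mongo1 and mongo2 on the other two machines). Save and close the file on each of your members.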
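
To make the per-machine difference concrete, here is a small JavaScript sketch that prints the hosts file for each member. The member list and the `hostsFileFor` helper are hypothetical, and the addresses are the same placeholders used in the example, not real IP addresses.

```javascript
// Hypothetical member list. The addresses are placeholders, not real IPs.
const members = [
  { ip: "123.456.789.111", fqdn: "mongo0.example.com", shortName: "mongo0" },
  { ip: "123.456.789.222", fqdn: "mongo1.example.com", shortName: "mongo1" },
  { ip: "123.456.789.333", fqdn: "mongo2.example.com", shortName: "mongo2" }
];

// Build the /etc/hosts content for one machine. Only the first line
// (the 127.0.0.1 entry) differs from machine to machine.
function hostsFileFor(self, allMembers) {
  const lines = [`127.0.0.1 localhost ${self.shortName}`];
  for (const m of allMembers) {
    lines.push(`${m.ip} ${m.fqdn}`);
  }
  return lines.join("\n");
}

for (const m of members) {
  console.log(`# /etc/hosts on ${m.fqdn}`);
  console.log(hostsFileFor(m, members));
  console.log("");
}
```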

Next, you need to set the hostname of your droplet to reflect these new changes. The command on each VPS will reflect the name you gave that specific machine in the /etc/hosts file. You should issue a command on each server that looks like:

hostname mongo0.example.com

Modify this command on each server to reflect the name you selected for it in the file.

Edit the /etc/hostname file to reflect this as well:

nano /etc/hostname

mongo0.example.com

These steps should be performed on each node.

Prepare for Replication in the MongoDB Configuration File

The first thing we need to do to begin the MongoDB configuration is stop the MongoDB process on each server.

On each server, type:

service mongodb stop

Now, we need to configure a directory that will be used to store our data. Create a directory with the following command:

mkdir /mongo-metadata

Now that we have the data directory created, we can modify the configuration file to reflect our new replication set configuration:

nano /etc/mongodb.conf

In this file, we need to specify a few parameters. First, adjust the dbpath variable to point to the directory we just created:

dbpath=/mongo-metadata

Remove the comment from in front of the port number specification to ensure that it is started on the default port:

port = 27017

Towards the bottom of the file, remove the comment from in front of the replSet parameter. Change the value of this variable to something that will be easy for you to recognize. Use the same value on every member of the set.

replSet = rs0

Finally, you should make the process fork so that you can use your shell after spawning the server instance. Add this to the bottom of the file:

fork = true

Save and close the file. Start the replication member by issuing the following command:

mongod --config /etc/mongodb.conf

These steps must be repeated on each member of the replication set.
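
Since the same four settings go into the file on every member, it can help to see them side by side. The JavaScript sketch below just prints them in the `key = value` style of /etc/mongodb.conf. The `renderMongoConf` helper is hypothetical, and the values are the ones chosen in this guide.

```javascript
// Hypothetical helper that renders settings in the key = value style
// used by /etc/mongodb.conf.
function renderMongoConf(settings) {
  return Object.entries(settings)
    .map(([key, value]) => `${key} = ${value}`)
    .join("\n");
}

// The values chosen in this guide; they are identical on every member.
const conf = renderMongoConf({
  dbpath: "/mongo-metadata",
  port: 27017,
  replSet: "rs0",
  fork: true
});

console.log(conf);
```

The rest of the configuration file stays as the package installed it. These lines only show the parameters this section changes.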

Start the Replication Set and Add Members

Now that you have configured each member of the replication set and started the mongod process on each machine, you can initiate the replication and add each member.

On one of your members, type:

mongo

This will give you a MongoDB prompt for the current member.

Start the replication set by entering:

rs.initiate()

This will initiate the replication set and add the server you are currently connected to as the first member of the set. You can see this by typing:

rs.conf()

{
    "_id" : "rs0",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongo0.example.com:27017"
        }
    ]
}

Now, you can add the additional nodes to the replication set by referencing the hostname you gave them in the /etc/hosts file:

rs.add("mongo1.example.com")

{ "ok" : 1 }

Do this for each of your remaining replication members. Your replication set should now be up and running.
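
As an alternative to adding members one at a time, `rs.initiate()` also accepts a complete configuration document. The sketch below builds such a document in plain JavaScript, assuming the three example hostnames from this guide and the default port. It only prints the document, since the `rs` helpers exist only inside the mongo shell.

```javascript
// Example hostnames from this guide's /etc/hosts file.
const hosts = [
  "mongo0.example.com",
  "mongo1.example.com",
  "mongo2.example.com"
];

// Build a configuration document with one entry per host.
const config = {
  _id: "rs0", // must match the replSet value in /etc/mongodb.conf
  members: hosts.map((host, i) => ({ _id: i, host: `${host}:27017` }))
};

// In the mongo shell you could pass this to rs.initiate(config) instead of
// calling rs.initiate() followed by rs.add() for each member, and then run
// rs.status() to check that one member reports PRIMARY and the others
// report SECONDARY.
console.log(JSON.stringify(config, null, 2));
```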

Conclusion

By properly configuring replication sets for each of your data storage targets, your databases will be protected to some degree from unavailability and hardware failure. This is essential for any production system.

Replication sets provide a seamless interface with applications because they are essentially invisible to the outside. All replication mechanics are handled internally. If you plan on implementing MongoDB sharding, it is a good idea to implement replication sets for each of the shard server components.

patrick385057

January 29, 2014

There is a big bug in this documentation. The hosts file, yes, is mostly the same on all, but not entirely. If someone wants to try this and has no idea what the hosts file does and what the differences should be, he will never get this to work properly.

You should at least mention what part of the hosts file on each host should be different (the 127.0.0.1 line).

Kamal Nasser

January 30, 2014

@patrick: Thanks, I’ve updated the article to highlight it in red.

karthik

March 28, 2014

In hosts file, if private networking is enabled, Is it ok to save something like this

10.xxx.xxx.xxx mongo0.example.com

-or-

10.xxx.xxx.xxx mongo0 (since it's arbitrary)

Kamal Nasser

April 5, 2014

@karthik: I think that should work fine. Try it out and let me know how it goes!

jjain91

April 26, 2014

After installing MongoDB on Ubuntu VPS, the command

sudo service mongodb start

doesn't work. But the command

sudo service mongod start

works fine. So I guess correction is needed in the article.

evansims

August 30, 2014

This was tremendously helpful, thanks! One small note: if you’re setting up your node members to communicate across a private network, you may need to alter or remove the “bind_ip 127.0.0.1” line in your /etc/mongod.conf file before they will communicate.

Mine were reporting “exception: need most members up to reconfigure, not ok” prior to doing this and restarting the servers.

Thanks again for the great guide.

seattledba

November 19, 2014

replSet = rs0 is this to identify each individual data set? So it should be the same on each host? Thank you for providing awesome documentation

pravinlogicap

June 2, 2015

Nice article. Can we add multiple instances on EC2? I want to run mongod on port 27017 and port 27019. If yes, then how can we start both mongo instances?