This article covers a version of Ubuntu that is no longer supported. If you currently operate a server running Ubuntu 12.04, we highly recommend upgrading or migrating to a supported version of Ubuntu.
Reason: Ubuntu 12.04 reached end of life (EOL) on April 28, 2017 and no longer receives security patches or updates. This guide is no longer maintained.
See Instead: This guide might still be useful as a reference, but may not work on other Ubuntu releases. If available, we strongly recommend using a guide written for the version of Ubuntu you are using. You can use the search functionality at the top of the page to find a more recent version.
MongoDB is an extremely popular NoSQL database. It is often used to store and manage application data and website information. MongoDB boasts a dynamic schema design, easy scalability, and a data format that is easily accessible programmatically.
In this guide, we will discuss how to set up data replication in order to ensure high availability of data and create a robust failover system. This is important in any production environment where a database going down would have a negative impact on your organization or business.
We will assume that you have already installed MongoDB on your system. If you have not, install MongoDB on Ubuntu 12.04 before continuing.
MongoDB handles replication through an implementation called “replication sets” (MongoDB’s own documentation calls them “replica sets”). Replication sets in their basic form are somewhat similar to nodes in a master-slave configuration. A single primary member is used as the base for applying changes to secondary members.
The difference between a replication set and master-slave replication is that a replication set has an intrinsic automatic failover mechanism in case the primary member becomes unavailable.
Primary member: The primary member is the default access point for transactions with the replication set. It is the only member that can accept write operations.
Each replication set can have only one primary member at a time. This is because replication happens by copying the primary’s “oplog” (operations log) and repeating the changes on the secondary’s dataset. Multiple primaries accepting write operations would lead to data conflicts.
Secondary members: A replication set can contain multiple secondary members. Secondary members reproduce changes from the oplog on their own data.
Although by default applications will query the primary member for both read and write operations, you can configure your setup to read from one or more of the secondary members. A secondary member can become the primary if the primary goes offline or steps down.
Note: Because data is transferred asynchronously, reads from secondary nodes can return stale data. If this is a concern for your use case, you should not enable this functionality.
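To make the oplog mechanism and the stale-read caveat concrete, here is a toy in-memory model in plain JavaScript. It is only a sketch: the `Primary` and `Secondary` classes and their methods are invented for illustration and are not MongoDB APIs.

```javascript
// Toy model of oplog-based replication (illustrative only, not MongoDB code).
class Primary {
  constructor() {
    this.data = {};  // the primary's dataset
    this.oplog = []; // ordered log of every write applied
  }

  // Only the primary accepts writes; each one is applied and then logged.
  write(key, value) {
    this.data[key] = value;
    this.oplog.push({ key, value });
  }
}

class Secondary {
  constructor() {
    this.data = {};
    this.applied = 0; // number of oplog entries already replayed
  }

  // Replay, in order, any oplog entries this member has not yet seen.
  sync(primary) {
    for (const op of primary.oplog.slice(this.applied)) {
      this.data[op.key] = op.value;
      this.applied += 1;
    }
  }
}

const primary = new Primary();
const secondary = new Secondary();

primary.write("visits", 1);
secondary.sync(primary);    // the secondary catches up
primary.write("visits", 2); // a new write that has not been replicated yet

const staleRead = secondary.data.visits; // still the old value
secondary.sync(primary);
const freshRead = secondary.data.visits; // now matches the primary
```

The gap between `staleRead` and `freshRead` is the window in which a read from a secondary can return old data.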
Arbiter: An arbiter is an optional member of a replication set that does not take part in the actual replication process. It is added to the replication set to participate in only a single, limited function: to act as a tie-breaker in elections.
In the event that the primary member becomes unavailable, an automated election process happens among the secondary nodes to choose a new primary. If the secondary member pool contains an even number of nodes, this could result in an inability to elect a new primary due to a voting impasse. The arbiter votes in these situations to ensure a decision is reached.
If a replication set has only one secondary member, an arbiter is required.
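The arithmetic behind this can be sketched in a few lines of JavaScript. The function names below are made up for illustration; the only rule assumed is that a new primary needs votes from a strict majority of the set's voting members.

```javascript
// Votes needed to elect a primary: a strict majority of all voting members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can the surviving members still elect a primary after `failed` members go down?
function canElectPrimary(votingMembers, failed) {
  return votingMembers - failed >= majorityNeeded(votingMembers);
}

// Primary + one secondary: losing the primary leaves 1 vote, but 2 are needed.
const twoMembers = canElectPrimary(2, 1);

// Primary + one secondary + arbiter: 2 votes remain and 2 are needed.
const withArbiter = canElectPrimary(3, 1);

console.log({ twoMembers, withArbiter });
```

In the mongo shell, an arbiter is added with `rs.addArb()`, for example `rs.addArb("arbiter.example.com:27017")`, where the hostname is a placeholder.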
There are instances where you may not want all of your secondary members to follow the standard rules for a replication set. A replication set can have up to 12 members, of which up to 7 can vote in an election.
There are some situations where the election of certain set members to the primary position could have a negative impact on your application’s performance.
For instance, if you are replicating data to a remote datacenter or a specific member’s hardware is inadequate to perform as the main access point for the set, setting priority 0 can ensure that this member will not become a primary but can continue copying data.
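As a rough sketch, priority is a field on each member's entry in the set's configuration document. The helper below is hypothetical (it is not part of MongoDB) and simply returns a copy of a config with one member's `priority` set to 0; the sample config mirrors the shape that `rs.conf()` returns.

```javascript
// Return a copy of a replication set config with one member's priority set to 0.
// `memberId` is the member's "_id" value inside the config document.
function withPriorityZero(config, memberId) {
  return {
    ...config,
    members: config.members.map((m) =>
      m._id === memberId ? { ...m, priority: 0 } : m
    ),
  };
}

// Hypothetical config, shaped like the document rs.conf() returns.
const priorityExample = {
  _id: "rs0",
  version: 1,
  members: [
    { _id: 0, host: "mongo0.example.com:27017" },
    { _id: 1, host: "mongo1.example.com:27017" },
    { _id: 2, host: "mongo2.example.com:27017" },
  ],
};

const priorityUpdated = withPriorityZero(priorityExample, 2);
console.log(JSON.stringify(priorityUpdated.members[2]));

// In the mongo shell on the primary, the same change could be applied with:
//   rs.reconfig(withPriorityZero(rs.conf(), 2))
```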
Some situations require you to separate the main set of members accessible and visible to your clients from the background members that have separate purposes and should not interfere.
For instance, you may need a secondary member to be the base for analytics work, which would benefit from an up-to-date dataset but would put a strain on working members. Setting this member to hidden keeps it from interfering with the general operations of the replication set.
Hidden members are necessarily set to priority 0 to avoid becoming the primary member, but they do vote in elections.
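In configuration terms, a hidden member is a member entry with `hidden: true` and `priority: 0`. The sketch below uses a hypothetical helper and a made-up two-member config to show the resulting document; it is an illustration, not a prescribed procedure.

```javascript
// Return a copy of the config with one member marked hidden.
// Hidden members must also have priority 0, so both fields are set together.
function withHiddenMember(config, memberId) {
  return {
    ...config,
    members: config.members.map((m) =>
      m._id === memberId ? { ...m, priority: 0, hidden: true } : m
    ),
  };
}

// Hypothetical config; member 1 will serve analytics queries only.
const hiddenExample = {
  _id: "rs0",
  version: 1,
  members: [
    { _id: 0, host: "mongo0.example.com:27017" },
    { _id: 1, host: "mongo1.example.com:27017" },
  ],
};

const hiddenUpdated = withHiddenMember(hiddenExample, 1);
console.log(JSON.stringify(hiddenUpdated.members[1]));

// In the mongo shell on the primary:
//   rs.reconfig(withHiddenMember(rs.conf(), 1))
```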
By setting the delay option for a secondary member, you can control how long the secondary waits to perform each action it copies from the primary’s oplog.
This is useful if you would like to safeguard against accidental deletions or recover from destructive operations. For instance, if you delay a secondary by a half-day, it would not immediately perform accidental operations on its own set of data and could be used to revert changes.
Delayed members cannot become primary members, but can vote in elections. In the vast majority of situations, they should be hidden to prevent processes from reading data that is out-of-date.
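A delayed member can be sketched the same way. In MongoDB releases from the era of this guide the delay is the `slaveDelay` field, given in seconds; newer releases rename this field, so check the documentation for your version. The helper and the sample config below are hypothetical.

```javascript
// Half a day expressed in seconds, matching the example in the text.
const HALF_DAY_SECONDS = 12 * 60 * 60;

// Return a copy of the config with one member delayed by `delaySeconds`.
// The member is also set to priority 0 and hidden, as recommended above.
function withDelayedMember(config, memberId, delaySeconds) {
  return {
    ...config,
    members: config.members.map((m) =>
      m._id === memberId
        ? { ...m, priority: 0, hidden: true, slaveDelay: delaySeconds }
        : m
    ),
  };
}

// Hypothetical config; member 2 becomes the delayed copy.
const delayExample = {
  _id: "rs0",
  version: 1,
  members: [
    { _id: 0, host: "mongo0.example.com:27017" },
    { _id: 1, host: "mongo1.example.com:27017" },
    { _id: 2, host: "mongo2.example.com:27017" },
  ],
};

const delayUpdated = withDelayedMember(delayExample, 2, HALF_DAY_SECONDS);
console.log(JSON.stringify(delayUpdated.members[2]));
```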
To demonstrate how to configure replication sets, we will configure a simple set with a primary and two secondaries. This means that you will need three VPS instances to follow along. We will be using Ubuntu 12.04 machines.
You will need to install MongoDB on each of the machines that will be members of the set. You can follow this tutorial to learn how to install MongoDB on Ubuntu 12.04.
Once you have installed MongoDB on all three of the server instances, we need to configure some things that will allow our droplets to communicate with each other.
The following steps assume that you are logged in as the root user.
In order for our MongoDB instances to communicate with each other effectively, we will need to configure our machines to resolve the proper hostname for each member. You can either do this by configuring subdomains for each replication member or through editing the /etc/hosts file on each computer.
It is probably better to use subdomains in the long run, but for the sake of getting off the ground quickly, we will do this through the hosts file.
On each of the soon-to-be replication members, edit the /etc/hosts file:
nano /etc/hosts
After the first line that configures the localhost, you should add an entry for each of the replication set's members. These entries take the form of:
ip_address mongohost0.example.com
You can get the IP addresses of the members of your set in the DigitalOcean control panel. The name you choose as a hostname for that computer is arbitrary, but should be descriptive.
For our example, our /etc/hosts would look something like this:
127.0.0.1 localhost mongo0
123.456.789.111 mongo0.example.com
123.456.789.222 mongo1.example.com
123.456.789.333 mongo2.example.com
This file should be mostly the same across all of the hosts in your set; only the short name on the localhost line (mongo0 in the example) changes to match each machine. Save and close the file on each of your members.
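If you keep the member list in one place, the shared entries can be generated rather than typed on each machine. This is a small sketch with a hypothetical helper; the addresses are the placeholders from the example above and must be replaced with your members' real IP addresses.

```javascript
// Build the /etc/hosts lines shared by every replication member.
function hostsLines(members) {
  return members.map((m) => `${m.ip} ${m.hostname}`);
}

// Placeholder addresses copied from the example above; substitute your own.
const hostMembers = [
  { ip: "123.456.789.111", hostname: "mongo0.example.com" },
  { ip: "123.456.789.222", hostname: "mongo1.example.com" },
  { ip: "123.456.789.333", hostname: "mongo2.example.com" },
];

const hostsBlock = hostsLines(hostMembers).join("\n");
console.log(hostsBlock);
```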
Next, you need to set the hostname of your droplet to reflect these new changes. The command on each VPS will reflect the name you gave that specific machine in the /etc/hosts file. You should issue a command on each server that looks like:
hostname mongo0.example.com
Modify this command on each server to reflect the name you selected for it in the file.
Edit the /etc/hostname file to reflect this as well:
nano /etc/hostname
The file should contain only the hostname you chose for that machine:
mongo0.example.com
These steps should be performed on each node.
The first thing we need to do to begin the MongoDB configuration is stop the MongoDB process on each server.
On each server, type:
service mongodb stop
Now, we need to configure a directory that will be used to store our data. Create a directory with the following command:
mkdir /mongo-metadata
Now that we have the data directory created, we can modify the configuration file to reflect our new replication set configuration:
nano /etc/mongodb.conf
In this file, we need to specify a few parameters. First, adjust the dbpath variable to point to the directory we just created:
dbpath=/mongo-metadata
Remove the comment from in front of the port number specification to ensure that it is started on the default port:
port = 27017
Towards the bottom of the file, remove the comment from in front of the replSet parameter. Change the value of this variable to something that will be easy for you to recognize. Use the same value on every member of the set.
replSet = rs0
Finally, you should make the process fork so that you can use your shell after spawning the server instance. Add this to the bottom of the file:
fork = true
Save and close the file. Start the replication member by issuing the following command:
mongod --config /etc/mongodb.conf
These steps must be repeated on each member of the replication set.
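As a recap, the four settings changed above are identical on every member. The sketch below uses a hypothetical helper to render them in the old-style `key = value` format used by /etc/mongodb.conf in this guide. It covers only the lines this guide touches, not the rest of the file.

```javascript
// The settings this guide changes in /etc/mongodb.conf on every member.
const replSettings = {
  dbpath: "/mongo-metadata",
  port: 27017,
  replSet: "rs0",
  fork: true,
};

// Render the settings as "key = value" lines, one per setting.
function renderConf(settings) {
  return Object.entries(settings)
    .map(([key, value]) => `${key} = ${value}`)
    .join("\n");
}

const confText = renderConf(replSettings);
console.log(confText);
```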
Now that you have configured each member of the replication set and started the mongod process on each machine, you can initiate the replication and add each member.
On one of your members, type:
mongo
This will give you a MongoDB prompt for the current member.
Start the replication set by entering:
rs.initiate()
This will initiate the replication set and add the server you are currently connected to as the first member of the set. You can see this by typing:
rs.conf()
{
    "_id" : "rs0",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongo0.example.com:27017"
        }
    ]
}
Now, you can add the additional nodes to the replication set by referencing the hostname you gave them in the /etc/hosts file:
rs.add("mongo1.example.com")
{ "ok" : 1 }
Do this for each of your remaining replication members. Your replication set should now be up and running.
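As an alternative to adding members one at a time, `rs.initiate()` also accepts a full configuration document. The sketch below builds such a document for the three example hosts; `buildReplSetConfig` is a hypothetical helper, and the hostnames and set name are the ones assumed throughout this guide.

```javascript
// Build a replication set config document from a set name and a list of hosts.
function buildReplSetConfig(setName, hosts, port = 27017) {
  return {
    _id: setName,
    members: hosts.map((host, index) => ({
      _id: index,
      host: `${host}:${port}`,
    })),
  };
}

const rsConfig = buildReplSetConfig("rs0", [
  "mongo0.example.com",
  "mongo1.example.com",
  "mongo2.example.com",
]);
console.log(JSON.stringify(rsConfig, null, 2));

// In the mongo shell on one member, before any set has been initiated:
//   rs.initiate(rsConfig)
//   rs.status()   // lists every member along with its current state
```

Whichever approach you use, `rs.status()` is a quick way to confirm that every member has joined the set.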
By properly configuring replication sets for each of your data storage targets, your databases will be protected to some degree from unavailability and hardware failure. This is essential for any production system.
Replication sets provide a seamless interface with applications because they are essentially invisible to the outside. All replication mechanics are handled internally. If you plan on implementing MongoDB sharding, it is a good idea to implement replication sets for each of the shard server components.
Thanks for this great tutorial. #HeartedAlready I had some issues during my setup which I think should be mentioned here.
The new conf file is located at /etc/mongod.conf and is in YAML format, so you have to make some syntax changes:
dbpath: mongo-metadata
// remove the comment before “replication” and then under it add replSetName: rs0
// remove the comment before “processManagement” and then under it add fork: true
Note the spaces in YAML
I’m getting error for
fork = true
Hello, it’s been a long time. I wonder, is this tutorial’s content still valid with the new MongoDB version? And how would I set this up with private local IPs? Thanks
Hi, I’m rather new to configuring production environments on a VPS. This tutorial says “you will need three VPS instances” which I assume means that I need to create three different droplets. However, in the Set Up DNS Resolution section, the 4 different IP addresses confuse me.
127.0.0.1 localhost mongo0
123.456.789.111 mongo0.example.com
123.456.789.222 mongo1.example.com
123.456.789.333 mongo2.example.com
Are 3 instances ONLY for serving the databases, and the localhost serves the app (Node.js, etc.)?
Can a master-slave Redis cluster be used on the same IP addresses as my replication sets?
I just want some clarity before proceeding.
Thanks!
To answer @seattledba 's question: This param should be the same for every host. replSet = rs0 for every host.
Nice article. Can we run multiple instances on EC2? I want to run mongod on port 27017 and port 27019. If yes, how can we start both mongo instances?
replSet = rs0 is this to identify each individual data set? So it should be the same on each host? Thank you for providing awesome documentation
This was tremendously helpful, thanks! One small note: if you’re setting up your set members to communicate across a private network, you may need to alter or remove the “bind_ip 127.0.0.1” line in your /etc/mongod.conf file before they will communicate.
Mine were reporting “exception: need most members up to reconfigure, not ok” prior to doing this and restarting the servers.
Thanks again for the great guide.
After installing MongoDB on Ubuntu VPS, the command
sudo service mongodb start
doesn’t work. But the command
sudo service mongod start
works fine. So I guess correction is needed in the article.
@karthik: I think that should work fine. Try it out and let me know how it goes!