Understanding MongoDB: Advantages of a Document-Oriented NoSQL Database

Introduction

Data has become a driving force of technology in recent years, as modern applications and websites need to manage an ever-increasing amount of data. Traditionally, database management systems organize data based on the relational model. As organizations’ data needs have changed, however, a number of new types of databases have been developed.

These new types of databases often don’t rely on the traditional table structure of relational databases, and can thus allow for far more flexibility than that rigid structure permits. Additionally, they typically don’t use Structured Query Language (SQL), which most relational database systems employ to let users define and interact with data. This has led many of these new non-relational databases to be referred to generally as NoSQL databases.

First released in 2009, MongoDB — also known as Mongo — is a document-oriented NoSQL database used in many modern web applications. This conceptual article provides a high-level overview of the features that set MongoDB apart from other database management systems and make it a valuable tool across many different use cases.

A Brief Overview of MongoDB

As mentioned in the introduction, MongoDB is considered to be a NoSQL database since it doesn’t depend on the relational model. Every database management system is designed around a certain type of data model that defines how the data within the database will be organized. The relational model involves storing data in tables — more formally known as relations — made up of rows and columns.

MongoDB, on the other hand, stores its data records in structures known as documents. Mongo allows you to group multiple documents into a structure known as a collection, which can be further grouped into separate databases.

A document is written in BSON, a binary representation of JSON. Like objects in JSON, MongoDB documents begin and end with curly brackets ({ and }), and contain a number of field-and-value pairs which typically take the form of field: value. A field’s value can be any one of the data types used in BSON, or even other structures like documents and arrays.
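As a rough illustration of this structure, the following Python sketch builds a document-like dictionary and prints it as JSON text. The field names and values are hypothetical, and the output is plain JSON rather than BSON, since encoding to BSON would require a MongoDB driver library.

```python
import json

# Hypothetical document; field names and values are illustrative only.
document = {
    "name": "Sammy",
    "languages": ["English", "Spanish"],               # an array value
    "address": {"city": "New York", "zip": "10001"},   # an embedded document
    "active": True,
}

# Serialize to JSON text to show the curly-bracket, field: value layout.
as_json = json.dumps(document, indent=2)
print(as_json)
```

Note how a field’s value can itself be an array or another document, which is what lets a single record hold nested data.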

Security

MongoDB comes installed with a number of features that can help to prevent data loss as well as access by unauthorized users. Some of these features can be found on other database management systems. For instance, Mongo, like many modern DBMSs, allows you to encrypt data as it traverses a network (sometimes called data in transit). It does this through Transport Layer Security (TLS), a cryptographic protocol that serves as the successor to Secure Sockets Layer (SSL), and it can be configured to require that all connections to the database use TLS.

Also like other DBMSs, Mongo manages authorization — the practice of setting rules for a given user or group of users to define what actions they can perform and what resources they can access — through a computer security concept known as role-based access control, or RBAC. Whenever you create a MongoDB user, you have the option to provide them with one or more roles.

A role defines what privileges a user has, including what actions they can perform on a given database, collection, set of collections, or cluster. For example, you can assign a user the readWrite role on a given database, meaning that the user can read and modify the data held in that database. A user can hold the readWrite role on any number of databases, but can only read and modify data in the databases over which you’ve granted it. In addition to its built-in roles, Mongo also allows you to define custom roles, giving you even more control over what resources users can access on your system.
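The idea of roles scoped to particular databases can be sketched with a small toy model. This is not MongoDB’s implementation: the role names mirror the built-in read and readWrite roles, but the action sets, the user, and the database names are simplified assumptions made up for illustration.

```python
# Toy model of role-based access control; not MongoDB's implementation.
# The action sets below are simplified assumptions.
ROLE_ACTIONS = {
    "read": {"find"},
    "readWrite": {"find", "insert", "update", "remove"},
}

# Hypothetical user: each grant pairs a role with the database it applies to.
user_grants = {
    "sammy": [("readWrite", "inventory"), ("read", "reports")],
}


def is_allowed(user, action, database):
    """Return True if any role granted on `database` permits `action`."""
    return any(
        db == database and action in ROLE_ACTIONS.get(role, set())
        for role, db in user_grants.get(user, [])
    )
```

In this sketch, the hypothetical user sammy can insert into inventory but can only read from reports, and has no access to any database where no role was granted.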

Since the release of version 4.2, MongoDB supports client-side field level encryption. This involves encrypting certain fields within a document before the data gets written to the database. Any client or application that tries to read it later on must first present the correct encryption keys to be able to decrypt the data in these fields.

To illustrate, say your database holds a document with the following fields and values:

{
  "name" : "Sammy",
  "phone" : "555-555-1234",
  "creditcard" : "1234567890123456"
}

It could be dangerous to store sensitive information like this (namely, a person’s phone and credit card numbers) unencrypted in a real-world application. Even if you’ve put limits on who can access the database, anyone with privileges to access it could see and take advantage of your users’ sensitive information. When properly configured, though, these fields would look something like this if they were written with client-side field level encryption:

{
  "name" : "Sammy",
  "phone" : BinData(6,"quas+eG4chuolau6ahq=i8ahqui0otaek7phe+Miexoo"),
  "creditcard" : BinData(6,"rau0Teez=iju4As9Eeyiu+h4coht=ukae8ahFah4aRo=")
}

For a more thorough overview of MongoDB’s security features, along with some general strategies for keeping a Mongo database secure, we encourage you to check out our series on MongoDB Security: Best Practices to Keep Your Data Safe.

Flexibility

Another characteristic of MongoDB that has helped drive its adoption is the flexibility it provides when compared with more traditional database management systems. This flexibility is rooted in MongoDB’s document-based design, since collections in Mongo do not enforce a specific structure that every document within them must follow. This contrasts with the rigid structure imposed by tables in a relational database.

Whenever you create a table in a relational database, you must explicitly define the set of columns the table will hold along with their data types. Following that, every row of data you add must conform to that specific structure. On the other hand, MongoDB documents in the same collection can have different fields, and even if they share a given field it can hold different data types in different documents.
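To make the contrast concrete, here is a minimal sketch that uses a plain Python list as a stand-in for a collection. The documents and their values are made up for illustration; the point is only that two documents can carry different fields, and that a shared field can hold different types.

```python
# A Python list standing in for a MongoDB collection; all values are made up.
collection = [
    {"name": "Sammy", "phone": "555-555-1234"},
    {"name": "Jesse", "phone": 5555559876, "email": "jesse@example.com"},
]

# Collect the set of Python types seen for each field across all documents.
field_types = {}
for doc in collection:
    for field, value in doc.items():
        field_types.setdefault(field, set()).add(type(value).__name__)

print(field_types)
```

Here the phone field is a string in one document and an integer in the other, and only one document has an email field, neither of which a relational table with fixed column types would accept.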

This rigidity imposed by the relational model isn’t necessarily a bad thing. In fact, it makes relational databases quite useful for storing data that neatly conforms to a predefined structure. But it can become limiting in cases where you need to store unstructured data — data that doesn’t easily fit into predefined data models or isn’t easily searchable by conventional tools.

Examples of unstructured data include media content, like videos or photos, communications data, or text files. Sometimes, unstructured data is generalized as qualitative data: in other words, data that may be human readable but is difficult for computers to adequately parse. MongoDB’s versatile document-oriented design, however, makes it a great choice for storing and analyzing unstructured data as well as structured and semi-structured data.

Another example of Mongo’s flexibility is how it offers multiple avenues for interacting with one’s data. For example, you can run the mongo shell, a JavaScript-based interface that comes installed with the MongoDB server, which allows you to interact with your data from the command line.

Mongo also supports a number of official drivers that can help you connect a database to your application. Mongo provides these libraries for a variety of popular programming languages, including PHP, Java, JavaScript, and Python. These drivers also provide support for the data types found in their respective host languages, expanding on the BSON data types available by default.

High Availability

Any computer-based database system depends on its underlying hardware to function and serve the needs of an application or client. If the machine on which it’s running fails for any reason, the data held within the database won’t be accessible until the machine is back up and running. If a database management system is able to remain in operation for a higher than normal period of time, it’s said to be highly available.

One way many databases remain highly available is through a practice known as replication. Replication involves synchronizing data across multiple different databases running on separate machines. This results in multiple copies of the same data and provides redundancy in case one of the database servers fails. This ensures that the synchronized data always remains available to the applications or clients that depend on it.

In MongoDB, a group of servers that maintain the same data set through replication is referred to as a replica set. Each running instance of MongoDB that’s part of a given replica set is referred to as one of its members. Every replica set must have one primary member and at least one secondary member.

One advantage that MongoDB’s replica sets have over the replication implementations in some other database systems is Mongo’s automatic failover mechanism. In the event that the primary member becomes unavailable, the remaining members automatically hold an election to choose a new primary from among the secondaries.
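The failover idea can be sketched with a deliberately simplified toy model. MongoDB’s real election protocol involves voting, priorities, and timeouts that are not modeled here; this sketch only assumes that a new primary needs a majority of members to be reachable, and it picks the reachable member with the most replication progress. The Member class and its last_applied counter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    reachable: bool
    last_applied: int  # hypothetical counter standing in for replication progress


def elect_primary(members):
    """Pick a new primary, or return None if no majority is reachable."""
    reachable = [m for m in members if m.reachable]
    # A strict majority of all members must be reachable to hold an election.
    if len(reachable) <= len(members) // 2:
        return None
    # Simplifying assumption: prefer the member that is most caught up.
    return max(reachable, key=lambda m: m.last_applied).name
```

For example, in a three-member set where the old primary is unreachable, the two remaining members form a majority and the more up-to-date one is chosen; if two of the three are unreachable, no primary is elected.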

Scalability

As a core component of modern applications, it’s important for a database to be able to respond to changes in the amount of work it must perform. After all, an application can see sudden surges in its number of users, or perhaps experience periods of particularly heavy workloads.

Scalability refers to a computer system’s ability to handle an ever-growing amount of work, and the practice of increasing this capacity is called scaling. There are two ways one can scale a computer system:

  • Vertical scaling, also called scaling up, involves adding more computing resources to a given system, typically by increasing its storage capacity or memory
  • Horizontal scaling, also called scaling out, involves splitting the workload across multiple computing nodes which, all together, make up a single logical system

To vertically scale a MongoDB database, one could back up its data and migrate it to another machine with more computing resources. This is generally the same procedure for vertically scaling any database management system, including relational databases. However, scaling up like this can have drawbacks. Larger and larger machines can become prohibitively expensive over time and, no matter how powerful a single machine is, there is always an upper limit to how much data it can store.

Sharding is a strategy some administrators employ for scaling out a database. If you’d like a thorough explanation of sharding, we encourage you to read our conceptual article on Understanding Database Sharding. For the purposes of this article, though, understand that sharding is the process of breaking up a data set based on a given set of rules, and distributing the resulting pieces of data across multiple separate database nodes. A single node that holds part of a sharded cluster’s data set is known as a shard.

Database management systems don’t always include sharding capabilities as a built-in feature, so oftentimes sharding is implemented at the application level. MongoDB, however, does include a built-in sharding feature which allows you to shard data at the collection level. As of version 3.6, every MongoDB shard must be deployed as a replica set to ensure that the shard’s data remains highly available.

To shard data in Mongo, you must select one or more fields in a given collection’s documents to function as the shard key. MongoDB then takes the range of shard key values and divides them into non-overlapping ranges, known as chunks, and each chunk is assigned to a given shard.

Following that, Mongo reads each document’s shard key value, determines what chunk the document belongs to, and then distributes the document to the appropriate shard. MongoDB actively monitors how chunks are distributed across shards, and will attempt to migrate chunks from one shard to another to keep the distribution even.
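The routing step can be sketched as a lookup over sorted split points. This is a minimal illustration of ranged partitioning, not MongoDB’s own code: the numeric shard key, the split points, and the shard names are all assumptions, and each chunk is treated as including its lower bound and excluding its upper bound.

```python
from bisect import bisect_right

# Hypothetical split points on a numeric shard key. They define four
# non-overlapping chunks: below 100, [100, 200), [200, 300), and 300 and up.
split_points = [100, 200, 300]

# Hypothetical assignment of each chunk (by index) to a shard.
chunk_to_shard = ["shardA", "shardB", "shardA", "shardB"]


def shard_for(shard_key_value):
    """Find the chunk containing the key, then return the shard that owns it."""
    chunk_index = bisect_right(split_points, shard_key_value)
    return chunk_to_shard[chunk_index]
```

Because the chunks are kept in sorted order, finding the right one is a binary search, and rebalancing in this model amounts to changing an entry in chunk_to_shard rather than recomputing where every document lives.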

The main drawback of sharding is that it adds a degree of operational complexity to a database system. However, once you have a working MongoDB sharded cluster, the process of adding more shards to scale the system horizontally is fairly straightforward, and a properly configured replica set can be added as a shard with a single command. This makes MongoDB an appealing choice for applications that need to scale out quickly.

Is MongoDB Right for My Application?

Relational database management systems still see wider use than databases that employ a NoSQL model. With that said, though, MongoDB continues to gain ground thanks to the features described throughout this guide. In particular, it’s become a common choice of database for a number of use cases.

For example, its scaling capabilities and high availability make it a popular database for e-commerce and gaming applications where the number of users being served can increase quickly and dramatically. Likewise, its flexible schema and ability to handle large amounts of unstructured data make it a great choice for content management applications which need to manage an ever-evolving catalog of assets, ranging from text to video, images, and audio files. It has also seen strong adoption among mobile application developers, thanks again to its powerful scaling as well as its data analysis capabilities.

When deciding whether you should use MongoDB in your next application, you should first ask yourself what the application’s specific data needs are. If your application will store data that rigidly adheres to a predefined structure, you may not get much additional value from Mongo’s schemaless design and you might be better off using a relational database.

Then, weigh how much data you expect your application will need to store and use. MongoDB’s document-oriented design makes it a great choice for applications that need to store large amounts of unstructured data. Similarly, MongoDB’s scalability and high availability make it a perfect fit for applications that serve a large and ever-growing number of clients. However, these features could be excessive in cases that aren’t as data intensive.

Conclusion

By reading this article, you’ll have gained a better understanding of the features that set MongoDB apart from other database management systems. Although MongoDB is a powerful, flexible, and secure database management system that can be the right choice of database in certain use cases, it may not always be the best choice. While its document-based and schemaless design may not supplant the relational database model any time soon, Mongo’s rapid growth highlights its value as a tool worth understanding.

For more information about MongoDB, we encourage you to check out DigitalOcean’s entire library of MongoDB content. Additionally, the official MongoDB documentation serves as a valuable resource of information on working with Mongo.
