MongoDB Part 1: Key Values
Over the last forty years, relational databases have become the preferred method of persistent data storage for large organizations. They have become so successful that they are used not only by most governments, enterprises, and large websites, but also in your mobile phone and tablet.
Recently, the Headend Group has adopted MongoDB as our persistent data storage technology for a range of new server products. MongoDB is one of a growing number of NoSQL databases. This article is the first in a series on MongoDB. Future articles will examine how MongoDB works, the issues related to its deployment, and a customer case study.
Since many of you are unfamiliar with non-relational databases, I am going to start this series by providing a general introduction that highlights the historical, technical, and commercial reasons behind the NoSQL movement.
Historical Perspective
Non-relational databases have been around since the start of digital computing. After all, NoSQL is just a fancy way to say hash table, dictionary, or key-value store, and there is almost nothing you can accomplish with a NoSQL database that you could not do with a comma-separated values (CSV) file and a bit of coding. But with the notable exceptions of Lotus Notes and Telelogic DOORS, relational databases quickly eclipsed alternative technologies.
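To make the key-value idea concrete, here is a minimal sketch in Python that builds a dictionary-backed store from CSV text. The sample data, the column names, and the load_store helper are all hypothetical and chosen only for illustration; this is not how MongoDB or any other NoSQL product is implemented.

```python
import csv
import io

# Hypothetical sample data: user preferences keyed by user ID.
CSV_TEXT = "user_id,language,theme\nu1,en,dark\nu2,fr,light\n"

def load_store(csv_text):
    """Build a key-value store (a plain dict) from CSV text."""
    store = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row.pop("user_id")   # the key is the user ID
        store[key] = dict(row)     # the value is the rest of the row
    return store

store = load_store(CSV_TEXT)

# "Put": add or overwrite a value under a key.
store["u3"] = {"language": "de", "theme": "dark"}

# "Get": look a value up by its key; a missing key yields None here.
print(store["u1"])
print(store.get("u9"))
```

Everything a basic key-value store offers is visible here: values are written and read by key, and nothing else about their structure is enforced.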
The Acid Test
Relational databases dominate data storage because they solved many of the issues associated with managing large volumes of data. Perhaps more importantly, they also include safeguards that protect the data they store. Nearly all relational databases include mechanisms that ensure data is not accidentally deleted, overwritten, or corrupted. They also log who changed the database, and when and where the change was made. At the heart of any relational database is a set of principles called Atomicity, Consistency, Isolation, and Durability (ACID).
…a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction. (Wikipedia)
Since relational databases are ACID compliant, organizations of any size, but especially large conservative organizations like banks, insurers, multinational corporations, and governments, not only trust them but rely on them to store their mission-critical data.
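The funds transfer in the quotation above can be sketched with SQLite, which ships with Python. The accounts table, the account names, and the transfer function are assumptions made for this example. The credit is deliberately applied before the debit so that a failed debit shows the earlier credit being rolled back.

```python
import sqlite3

def transfer(conn, source, target, amount):
    """Move `amount` between two accounts as a single transaction."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

# A valid transfer: both updates are committed together.
transfer(conn, "alice", "bob", 30)

# An overdraft: the debit violates the CHECK constraint, so the credit
# that was already applied to bob is rolled back as well.
try:
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```

After the failed second transfer, the balances are exactly what the first transfer left behind. That all-or-nothing behavior is the atomicity in ACID.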
Old Assumptions
ACID compliant relational databases were designed for a world dominated by mainframes and minicomputers. Many of their technical assumptions were based on the computer technology available at the time:
- Disk storage was both expensive and unreliable
- Systems supported a limited number of concurrent users
- Solid state random access memory (RAM) was relatively new and expensive. In other words, most data could not be kept in memory, but had to be written to and read from disk. Modern relational databases, by contrast, try to keep as much data in RAM as possible.
- Users accessed databases via character-based terminals
As a result, relational databases were optimized for writing data at the expense of reading data. With improvements in storage technology, graphics, memory, and processing power, these restrictions have become less relevant over time.
The Rise of the Web
Improvements in computer technology were not the only reasons for the emergence of alternative data stores. The main catalyst behind NoSQL was the rise of the World Wide Web. In many cases, the web turns ACID upside down. For example:
- The number of users served by a large web site was previously inconceivable; Facebook by itself has close to 1 billion registered users.
- Most web applications read far more data than they write.
- For most web surfers, data availability and access speed are more important than data integrity and consistency. For example, a search engine doesn’t need to hold an accurate copy of every single web page to return useful search results.
In this environment, ACID compliant technologies become bottlenecks and restrict system performance. It is therefore no surprise that Google was one of the early pioneers of NoSQL technology. From a historical perspective, you could trace the start of the current NoSQL movement to a 2006 paper written by Google engineers that describes their Bigtable database.
Additional Factors
In addition to the cost of ACID compliance, NoSQL proponents can cite numerous technical and non-technical reasons for adopting it, such as:
- The market is dominated by Oracle and Microsoft. Their licensing and support fees are complex and expensive.
- Oracle supports most platforms, but Microsoft SQL Server only runs on Windows.
- MySQL and Postgres provide free alternatives to commercial databases, but support issues confine them to open-source and startup projects. Also, MySQL is now owned by Oracle.
- Although Microsoft has tried to simplify and automate database management, it is still very hard to manage large databases without a qualified database administrator (DBA). Good DBAs are both rare and expensive.
- Relational databases store data in a flat table structure. Modern object-oriented programming organizes data into hierarchical trees. To pass data between a database and an application, it must go through a translation layer, and just like humor, data can get lost in translation.
- While it is possible to change the underlying structure of a relational database, it is not recommended, and making ad-hoc changes could corrupt a database. This means that once you finalize your data model, it is almost impossible to change it without breaking it. This frustrates developers who like to make changes as and when they see fit.
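The translation layer between objects and tables can be sketched as follows. The UserPreferences and Channel classes, their fields, and the two implied tables are hypothetical. The sketch only contrasts flattening an object into rows with keeping it whole as a document, which is roughly the shape a document store such as MongoDB works with.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class Channel:
    name: str
    favorite: bool = False

@dataclass
class UserPreferences:
    user_id: str
    language: str
    channels: list = field(default_factory=list)

def to_rows(prefs):
    """Flatten the object into rows for two hypothetical tables."""
    user_row = (prefs.user_id, prefs.language)
    # Each child object becomes a row that points back to its parent.
    channel_rows = [
        (prefs.user_id, c.name, c.favorite) for c in prefs.channels
    ]
    return user_row, channel_rows

def to_document(prefs):
    """Keep the hierarchy intact as one nested, JSON-style document."""
    return asdict(prefs)

prefs = UserPreferences("u1", "en", [Channel("news", True), Channel("sport")])

user_row, channel_rows = to_rows(prefs)
document = to_document(prefs)

print(user_row, channel_rows)
print(json.dumps(document, indent=2))
```

In the relational form, the application must split the object on every write and join the rows back together on every read. In the document form, the nested structure is stored and returned as it is.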
Winners and Losers
In this game, there are no winners or losers. Just because there are alternatives to relational databases, it does not mean that they have been superseded. So for the foreseeable future, our CA systems will store entitlements in a relational database.
On the other hand, for storing other types of information, like user preferences or recommendations, there are clear advantages to using non-relational databases, like MongoDB. At the end of the day, we should always choose the most suitable tool for the job, and not the newest and shiniest in the toolbox.
As a side point, it is interesting to note that as NoSQL enters the mainstream, Google is moving in the opposite direction. In October 2012, Google Research released a white paper describing Spanner, its replacement for Bigtable. Spanner includes many features from relational databases, which, ironically, include an SQL-based query language.
In the next article in this series, I will be taking an in-depth look at MongoDB. The article will focus on how MongoDB stores data and how users retrieve it.