MongoDB, MMAPv1, WiredTiger, Locking, and Queues

In December 2014, MongoDB acquired WiredTiger, the company behind the storage engine of the same name, and began integrating it into the MongoDB architecture. In March 2015, MongoDB released version 3.0, which introduced a pluggable storage engine architecture, with the option to store your data using either the original (MMAPv1) storage engine or WiredTiger. This was kind of a Big Deal, and MongoDB developers and DBAs (myself included) rejoiced, for two main reasons:

  • WiredTiger provided on-disk data compression, reducing both the space requirements of your database and its I/O requirements, which yielded some nice performance wins (at the expense of some extra CPU usage). The legacy (MMAPv1) storage engine was somewhat notorious for gobbling up tons of disk space (and never giving it back to the operating system). There’s a small configuration sketch just after this list.
  • WiredTiger introduced document-level locking to MongoDB (well, not exactly, but more on that later). A big issue with MongoDB historically was the use of a server-level lock for all database writes. While server-level locking made it easy to reason about concurrent behavior, it meant that MongoDB’s write performance would always trail (a long way) behind its read performance.
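
For reference, here’s a minimal sketch of what opting into WiredTiger’s compression could look like from Python. It assumes pymongo and a MongoDB 3.0+ server already started with the WiredTiger engine; the connection string, database, and collection names are hypothetical, and snappy (not zlib) was the default block compressor:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # hypothetical connection string
    db = client.mydb                                   # hypothetical database name

    # Confirm which storage engine the server is actually running.
    print(client.admin.command("serverStatus")["storageEngine"]["name"])

    # Ask WiredTiger to use zlib (better compression, more CPU) instead of the
    # default snappy for this one collection.
    db.create_collection(
        "events",  # hypothetical collection name
        storageEngine={"wiredTiger": {"configString": "block_compressor=zlib"}},
    )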

So version 3.0 and WiredTiger seemed to solve all our problems. The truth was a little more complicated…

Locking & Sharding

In addition to all the WiredTiger goodness, MongoDB 3.0 introduced collection-level locking for writes in the MMAPv1 engine. While this improved the concurrency story somewhat, things were still a little cumbersome. Pre-3.0, if you wanted to improve your write performance, you had two options:

  • Make your individual server faster with vertical scaling (more CPU, RAM, faster disk, etc.)
  • Shard (partition) your database (horizontal scaling)

The promise of MongoDB was always that you could shard your database easily. While it’s true that sharding MongoDB is fairly straightforward, it does add some complexity over an un-sharded deployment. In particular, in moving from a replica set to even a single-shard deployment, you’ll need to add three config servers and a mongos routing server. You can co-locate some of these servers on the same physical hardware to save costs, but you’re still adding complexity to the system. In addition to the “ops” side of sharding, you’ll need to choose a shard key for each sharded collection, which MongoDB will use to partition your data.
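
As a rough illustration of that last step, enabling sharding and choosing a shard key look something like the sketch below, using pymongo against a mongos router. The database name, namespace, and shard key here are hypothetical, not the exact commands from my deployment:

    from pymongo import MongoClient

    # Connect to a mongos router, not directly to a shard (hypothetical address).
    client = MongoClient("mongodb://mongos.example.com:27017")

    # Enable sharding for a database, then shard a collection on a chosen key.
    client.admin.command("enableSharding", "jobs")
    client.admin.command(
        "shardCollection",
        "jobs.messages",                  # hypothetical namespace
        key={"queue_name": 1, "_id": 1},  # hypothetical shard key
    )

Once the collection is sharded, MongoDB splits it into chunks by shard-key range, and the balancer spreads those chunks across the shards.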

The good news (from a complexity standpoint) was that once you’d done all that, you could just add new shards to your cluster and MongoDB would magically move data around to take advantage of the new hardware. Of course, the magic sometimes didn’t work completely magically, but usually things would “just work,” particularly once your data was “balanced” amongst your shards.

But there was also a second benefit to moving to shards: you basically busted up the server-level lock, turning it into a per-shard lock. That benefit actually led some MongoDB users to co-locate multiple shards on the same machine; these DBAs weren’t interested in additional hardware, they just wanted better write concurrency.

One drawback to sharding is that, once you have more than one shard, you do pay a performance penalty, as the mongos server has to decide which shard needs to handle your reads and writes. So if you’re creating multiple shards on a single machine, you’ll actually see your single-client performance drop (though it will scale better with multiple clients).

WiredTiger to the Rescue?

Into this environment, MongoDB introduced WiredTiger. Suddenly the write lock was no more! Write concurrency no longer required sharding, and all of us using MongoDB rejoiced. Well, at least I rejoiced. I moved all my databases over to the WiredTiger storage engine pretty much immediately and saw performance gains almost everywhere.

Almost everywhere. Everywhere but my job queue, one of the more performance-critical parts of my infrastructure. Weird. It looked like it was slower. Much slower. Well, crud — back to MMAPv1 it was, and performance went back to where it had been in MongoDB 2.6. I tucked that info away for a few months, knowing that WiredTiger is awesome except when it’s not, and I’d found one of those edge cases.

“Document-level Locking?”

In June of 2015, I found myself at MongoDB World (which was, by the way, awesome, and you should totally go), where many of the talks unsurprisingly focused on the new WiredTiger storage engine. One of them covered how WiredTiger handles concurrency. Now, I’m not a database engineer, so I might get some of the details wrong here, but the basic idea is that it isn’t really locking at all: it uses multiversion concurrency control (MVCC) to perform writes optimistically, without taking a lock. It then checks whether another thread is updating the same document, and if so, only one write “wins”; the loser rolls back its changes and tries again later.
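
To make that concrete, here’s a toy sketch of that optimistic pattern as I understand it. It’s written against a tiny in-memory store rather than WiredTiger’s internals, so treat the names and structure as illustrative only:

    import random
    import threading
    import time

    class WriteConflict(Exception):
        """Another writer committed first (a toy stand-in for an engine-level conflict)."""

    class VersionedStore:
        """A tiny versioned key-value store, used only to illustrate the idea."""
        def __init__(self):
            self._data = {}                 # key -> (value, version)
            self._latch = threading.Lock()  # protects only the commit step

        def read(self, key):
            return self._data.get(key, (None, 0))

        def commit(self, key, value, expected_version):
            with self._latch:
                _, current = self._data.get(key, (None, 0))
                if current != expected_version:
                    raise WriteConflict(key)
                self._data[key] = (value, current + 1)

    def optimistic_update(store, key, modify, max_retries=20):
        """Snapshot-read, modify without holding a lock, and commit only if nobody
        else changed the value in the meantime; otherwise discard the work, back
        off, and retry."""
        for attempt in range(max_retries):
            value, version = store.read(key)
            try:
                store.commit(key, modify(value), expected_version=version)
                return
            except WriteConflict:
                time.sleep(random.uniform(0, 0.005) * (attempt + 1))
        raise RuntimeError("gave up after repeated write conflicts")

With little contention, the commit succeeds on the first try almost every time; with many writers fighting over the same key, most of the work gets thrown away and retried, which is exactly the failure mode I was about to recognize in my own system.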

Suddenly things started to fall into place. In my job queue (and in most any priority queue implemented in MongoDB), the core operation is a findAndModify that atomically finds the highest-priority message in the queue and modifies it to reflect that it’s being worked on. If most of your worker threads are busy doing useful work, this operation is contention-free and things move along fine. But what happens when you have more than one worker thread asking for work at the same time?
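
Concretely, the reservation step looks something like the sketch below, using pymongo’s find_one_and_update (which issues a findAndModify under the hood). The database, collection, and field names here are hypothetical, not Chapman’s actual schema:

    from pymongo import MongoClient, ReturnDocument

    client = MongoClient()
    queue = client.chapman.messages  # hypothetical database/collection names

    def reserve_next(worker_id):
        """Atomically claim the best available message, or return None."""
        return queue.find_one_and_update(
            {"state": "ready"},                                # only unclaimed work
            {"$set": {"state": "busy", "worker": worker_id}},  # mark it as ours
            sort=[("priority", -1), ("scheduled", 1)],         # highest priority, then oldest
            return_document=ReturnDocument.AFTER,
        )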

In the MMAPv1 engine, since locking happened at the collection level (or the database or server level, depending on the version), only one thread could perform the operation at a time, and everything proceeded swimmingly. In WiredTiger, all the workers looking for a message would findAndModify the same message, and then all but one of them would roll back their changes and try again. So as contention increased, performance tanked.
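
If you want to see the effect for yourself, a quick-and-dirty reproduction is to hammer that reservation call from a bunch of competing threads and compare the two engines. This builds on the hypothetical reserve_next sketch above and is nowhere near a rigorous benchmark:

    import threading
    import time

    def hammer(n_workers=16, attempts_per_worker=500):
        """Run several competing consumers and report reservation throughput."""
        counts = [0] * n_workers

        def worker(i):
            for _ in range(attempts_per_worker):
                if reserve_next("worker-%d" % i) is not None:
                    counts[i] += 1
                else:
                    time.sleep(0.001)  # queue drained; don't spin hot

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
        started = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print("reserved %d messages in %.1fs" % (sum(counts), time.time() - started))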

Lessons Learned (tl;dr)

So what did I learn from my little adventure?

  • WiredTiger is awesome except when there is a high degree of write contention over a single document (like the highest priority message in a queue).
  • You should either performance-test major infrastructure changes or be ready to roll them back really fast if things go poorly.
  • Sometimes you want a big lock to control your concurrency.

Finally, if you’d like more information on the job queue I used, I spoke at MongoDB World 2015 (where I learned all the performance caveats about WiredTiger) on Chapman, the distributed task service we built.