The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we’ll publish excerpts from selected posts.

Follow us on Twitter at http://twitter.com/blogCACM

DOI:10.1145/1721654.1721659 http://cacm.acm.org/blogs/blog-cacm

Communications of the ACM | April 2010 | Vol. 53 | No. 4

SQL Databases v. NoSQL Databases

Michael Stonebraker considers several performance arguments in favor of NoSQL databases, and finds them insufficient.

From Michael Stonebraker’s “The NoSQL Discussion Has Nothing to Do With SQL”
http://cacm.acm.org/blogs/blog-cacm/50678

Recently, there has been a lot of buzz about NoSQL databases. In fact, there were at least two conferences on the topic in 2009, one on each coast. Seemingly, this buzz comes from people who are proponents of:

• document-style stores in which a database record consists of a collection of key-value pairs plus a payload. Examples of this class of system include CouchDB and MongoDB, and we call such systems document stores for simplicity;

• key-value stores whose records consist of key-payload pairs. Usually, these are implemented by distributed hash tables, and we call these key-value stores for simplicity. Examples include MemcacheDB and Dynamo.

In either case, one usually gets a low-level, record-at-a-time database management system (DBMS) interface instead of SQL. Hence, this group identifies itself as advocating NoSQL.

There are two possible reasons to move to either of these alternate DBMS technologies: performance and flexibility.

The performance argument goes something like the following: I started with MySQL for my data storage needs and over time found performance to be inadequate. My options were:

1. “Shard” my data to partition it across several sites, giving me a serious headache managing distributed data in my application or

2. Abandon MySQL and pay big licensing fees for an enterprise SQL DBMS or move to something other than a SQL DBMS.

The flexibility argument goes something like the following: My data does not conform to a rigid relational schema. Hence, I can’t be bound by the structure of an RDBMS and need something more flexible.

This blog posting considers the performance argument; a subsequent posting will address the flexibility argument.

For simplicity, we will focus this discussion on the workloads for which NoSQL databases are most often considered: update- and lookup-intensive online transaction processing (OLTP) workloads, not query-intensive, data-warehousing workloads. We do not consider document repositories or other specialized workloads for which NoSQL systems may be well suited.

There are two ways to improve OLTP performance; namely, provide automatic sharding over a shared-nothing processing environment and improve per-server OLTP performance.

In the first case, one improves performance by providing scalability as nodes are added to a computing environment; in the second case, one improves the performance of individual nodes.

Every serious SQL DBMS written in the last 10 years (such as Greenplum, Aster Data, Vertica, ParAccel, and others) has provided shared-nothing scalability, and any new effort would be remiss if it did not do likewise. Hence, this component of performance should be “table stakes” for any DBMS. In my opinion, nobody should ever run a DBMS that does not provide automatic sharding over computing nodes.

As a result, this posting continues about the other component; namely, single-node OLTP performance. The overhead associated with OLTP databases in traditional SQL systems has little to do with SQL, which is why “NoSQL” is such a misnomer.

Instead, the major overhead in an OLTP SQL DBMS is communicating with the DBMS using ODBC or JDBC. Essentially all applications that are performance-sensitive use a stored-procedure interface to run application logic inside the DBMS and avoid the crippling overhead of back-and-forth communication between the application and the DBMS. The other alternative is to run the DBMS in the same address space as the application, thereby giving up any pretense of access control or security. Such embeddable DBMSs are reasonable in some environments, but not for mainstream OLTP, where security is a big deal.

Using either stored procedures or embedding, the useful work component is a very small percentage of the total transaction cost for today’s OLTP databases, which usually fit in main memory. Instead, a recent paper [1] calculated that total OLTP time was divided almost equally between the following four overhead components:

Logging
Traditional databases write everything twice: once to the database and once to the log. Moreover, the log must be forced to disk, to guarantee transaction durability. Logging is, therefore, an expensive operation.

Locking
Before touching a record, a transaction must set a lock on it in the lock table. This is an overhead-intensive operation.

Latching
Updates to shared data structures, such as B-trees, the lock table, and resource tables, must be done carefully in a multithreaded environment. Typically, this is done with short-term duration latches, which are another considerable source of overhead.

Buffer Management
Data in traditional systems is stored on fixed-size disk pages. A buffer pool manages which set of disk pages is cached in memory at any given time. Moreover, records must be located on pages and the field boundaries identified. Again, these operations are overhead intensive.

If you eliminate any one of the above overhead components, you speed up a DBMS by 25%. Eliminate three and your speedup is limited by a factor of two. You must get rid of all four to run a DBMS a lot faster.

Although the NoSQL systems have a variety of different features, there are some common themes. First, many NoSQL systems manage data that is distributed across multiple sites, and provide the “table stakes” noted above. Obviously, a well-designed multisite system, whether based on SQL or something else, is way more scalable than a single-site system.

Second, many NoSQL systems are disk-based and retain a buffer pool as well as a multithreaded architecture. This will leave intact two of the four sources of overhead noted above.

Concerning transactions, there is often support for only single-record transactions and an eventual consistency replica system, which assumes that transactions are commutative. In effect, the “gold standard” of ACID transactions is sacrificed for performance.

However, the net-net is that the single-node performance of a NoSQL, disk-based, non-ACID, multithreaded system is limited to being a modest factor faster than a well-designed stored-procedure SQL OLTP engine. In essence, ACID transactions are jettisoned for a modest performance boost, and this performance boost has nothing to do with SQL.

However, it is possible to have one’s cake and eat it too. To go fast, one needs to have a stored-procedure interface to a runtime system, which compiles a high-level language (for example, SQL) into low-level code. Moreover, one has to get rid of all of the above four sources of overhead.

A recent project [2] clearly indicated that this is doable, and showed blazing performance on TPC-C. Watch for commercial versions of these and similar ideas with open source packaging. Hence, I fully expect very high speed, open-source SQL engines in the near future that provide automatic sharding. Moreover, they will continue to provide ACID transactions along with the increased programmer productivity, lower maintenance, and better data independence afforded by SQL. Hence, high performance does not require jettisoning either SQL or ACID transactions.

In summary, blinding performance depends on removing overhead. Such overhead has nothing to do with SQL, but instead revolves around traditional implementations of ACID transactions, multithreading, and disk management. To go wildly faster, one must remove all four sources of the overhead discussed above. This is possible in either a SQL context or some other context.

References
1. S. Harizopoulos et al., “OLTP Through the Looking Glass, and What We Found There,” Proc. 2008 SIGMOD Conference, Vancouver, B.C., June 2008.
2. M. Stonebraker et al., “The End of an Architectural Era (It’s Time for a Complete Rewrite),” Proc. 2007 VLDB Conference, Vienna, Austria, Sept. 2007.

Disclosure: Michael Stonebraker is associated with four startups that are either producers or consumers of database technology. Hence, his opinions should be considered in this light.

Reader’s comment
You seem to leave out several other subcategories of the NoSQL movement in your discussion. For example: Google’s BigTable (and clones) as well as graph databases. Considering those in addition, would that change your point of view?
-- Johannes Ernst

Blogger’s Reply
I am a huge fan of “One size does not fit all.” There are several implementations of SQL engines with very different performance characteristics, along with a plethora of other engines. Besides the ones you mention, there are array stores such as Rasdaman and RDF stores such as Freebase. I applaud efforts to build DBMSs that are oriented toward particular market needs.

The purpose of the blog entry was to discuss the major actors in the NoSQL movement (as I see it) as they relate to bread-and-butter online transaction processing (OLTP). My conclusion is that “NoSQL” really means “No disk” or “No ACID” or “No threading,” i.e., speed in the OLTP market does not come from abandoning SQL. The efforts you describe, as well as the ones in the above paragraphs, are not focused on OLTP. My blog comments were restricted to OLTP, as I thought I made clear.
-- Michael Stonebraker

Michael Stonebraker is an adjunct professor at the Massachusetts Institute of Technology.

© 2010 ACM 0001-0782/10/0400 $10.00
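
As a rough illustration of the "automatic sharding over a shared-nothing processing environment" that the post treats as table stakes, the sketch below routes each record to one of several nodes by hashing its key, so the application never chooses a node itself. Everything here is an assumption made for illustration: the class name `ShardedStore`, the use of in-memory dictionaries as stand-ins for servers, and the hash-modulo placement rule are hypothetical and are not taken from the post or from any particular DBMS.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that partitions records across in-memory 'nodes'.

    Hypothetical illustration only; not the design of any real DBMS.
    """

    def __init__(self, num_nodes):
        if num_nodes < 1:
            raise ValueError("need at least one node")
        # Each dict stands in for one shared-nothing server.
        self.nodes = [{} for _ in range(num_nodes)]

    def node_for(self, key):
        # Use a stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        # The store, not the application, decides where the record lives.
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self.nodes[self.node_for(key)].get(key)


store = ShardedStore(num_nodes=4)
for account_id in range(1000):
    store.put(account_id, {"balance": 100})
```

Calling `store.get(42)` then looks up the record on whichever node the hash selects. Simple modulo placement moves most keys whenever the node count changes, which is one reason this is only a sketch and not a recommendation.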
