Feature: Cloud Computing

A Survey of Cloud Database Systems

Ganesh Chandra Deka, Ministry of Labor & Employment, Government of India

This survey of 15 popular cloud databases gives an overview of each system, including its storage platform, license type, and the programming language in which it is written. It also considers features such as data-handling techniques and billing practices.

IT Pro March/April 2014, Published by the IEEE Computer Society, 1520-9202/14/$31.00 © 2014 IEEE

The exponential growth of the Internet has resulted in an explosion of data sources, creating storage and data-usability problems. Furthermore, an increase in the number of data types has created challenges in terms of storing and manipulating unstructured data.

These issues have led companies and open source communities to build new tools, known as NoSQL systems or "key-value-store" systems, which aim to offer, on a massive scale, on-demand services and simplified application development and deployment. NoSQL databases are useful for applications that deal with very large semistructured and unstructured data. The growing popularity of big data will compel many companies to use NoSQL databases [1] instead of traditional databases, so you can expect to see vendors offering simplified rollouts and additional support for NoSQL solutions [2].

According to nosql-database.org, there are at least 150 NoSQL databases, with various features to meet the requirements of different user groups. It's not possible to discuss all of them here, so I selected 15 popular NoSQL databases that are representative (rather than inclusive), and I review some of their interesting characteristics.

Surveyed Systems

NoSQL databases can be divided into two groups, depending on the elasticity level. The first group contains truly elastic databases, such as MongoDB, which allows the addition of new nodes to a cluster without any observable downtime for the clients. The second group contains rigidly defined BigTable-based NoSQL databases (such as Cassandra and HBase), which have significant downtime when new nodes are added to the cluster.

Constant data availability when nodes are added to or removed from the cluster is made possible by routing mechanisms and algorithms that decide when to move data chunks. For example, when data must be moved to newly added nodes, the data is served from the original location during the copying process. When the new node has an up-to-date version of the data, the routing processes start to send requests to this node [1].

Cassandra

The Apache Cassandra database offers good scalability and high availability without compromising performance. Its demonstrated fault tolerance on commodity hardware (cloud infrastructures) and linear scalability make it the ideal platform for mission-critical data. Cassandra supports replication across multiple datacenters, which lowers latency and keeps data available during regional outages. Cassandra's ColumnFamily information model offers the convenience of column indexes with the performance of log-structured updates, strong support for materialized views (also known as snapshots), and powerful built-in caching.

Netflix, Twitter, Urban Airship, Reddit, Cisco, OpenX, Digg, CloudKick, and Ooyala are some of the companies that use Cassandra to deal with huge, active, online interactive datasets. The largest known Cassandra cluster has over 300 Tbytes of information in over 400 machines (http://cassandra.apache.org).

BigTable

Google's BigTable maps two arbitrary string values (row key and column key) and a time stamp (creating a 3D mapping) into an associated arbitrary byte array. BigTable can be characterized as a light, distributed, multidimensional sorted map. It was developed to scale to the petabyte range across numerous machines and to make it easy to add machines, automatically taking advantage of those resources without any reconfiguration [3]. When sizes threaten to grow beyond a specified limit, the tablets are compressed using the BMDiff [4,5] algorithm and the Zippy open source compression algorithm [6], publicly known as Snappy [5], which is a less space-optimal variation of the LZ77 algorithm but more efficient in terms of computing time (www.aosabook.org/en/posa/infinispan.html).

To get a specific row stored in BigTable, a new client must connect to all levels of the tree, but the information obtained on the upper levels is cached, meaning that further requests for data stored in tablets already looked up will be made directly to the last level of the tree.

HBase

HBase is an open source, distributed, versioned, column-oriented data store modeled after Google's BigTable. Basically, it's a clone of BigTable, providing a real-time, structured database on top of the Hadoop distributed file system. HBase is suitable for applications requiring random, real-time read/write access to big data. HBase's goal is to host very large tables with billions of rows and millions of columns on top of clusters of commodity hardware.

HBase provides linear and modular scalability (the ability of a database to handle a growing amount of data), consistent reads and writes, automatic and configurable sharding (horizontal partitioning) of tables, and automatic failover support between RegionServers. It also offers convenient base classes for backing Hadoop MapReduce jobs with HBase tables, an easy-to-use Java API for client access, block cache and Bloom filters for real-time queries, and query-predicate pushdown via server-side filters. Finally, it has an extensible JRuby-based shell and supports exporting metrics via the Hadoop metrics subsystem to files, or via Java Management Extensions.

MongoDB

MongoDB is an open source, schema-free, document-oriented, scalable NoSQL database system. This high-performance, fault-tolerant, persistent system provides a complex query language as well as an implementation of MapReduce.

MongoDB offers

• document-oriented storage: JavaScript Object Notation (JSON)-style documents with dynamic schemas offer simplicity and power;
• full index support: it can index any attribute;
• data availability: it can mirror across LANs and WANs for scalability; and
• autosharding: it scales horizontally without compromising functionality.

It also supports rich, document-based queries; atomic modifiers for contention-free performance; flexible aggregation and data processing; and GridFS (a MongoDB file format for storing files larger than 16 MB), so it can store files of any size without complicating your stack.

Pnuts

The Pnuts system is a massive-scale hosted, centrally managed database system shared by multiple applications (see Figure 1). It supports Yahoo's data-serving Web applications rather than complex queries (such as offline analysis of Web crawls) [7].

Figure 1. The Pnuts data storage architecture. Multiple applications share this massive-scale centrally managed database system. (Diagram labels: requests, routers, tablet controller with the master copy of the partition table/tablet mapping, tablet servers, tablets.)

Pnuts provides data management as a service. This significantly reduces application development time, because developers don't have to architect and implement their own scalable, reliable data-management solutions [4]. Consolidating multiple applications into a single service lets users amortize operations costs over multiple applications and apply the same best practices to the data management of many different applications. Moreover, having a shared service means you can keep resources (servers, disks, and so on) in reserve and quickly assign them to applications experiencing a sudden surge in popularity [6].

Hypertable

Hypertable is a high-performance, distributed, open source, NoSQL cloud-based data-storage system designed to support applications requiring maximum performance, scalability, and reliability. Hypertable is useful for organizations that must manage rapidly evolving data to support online real-time applications. Modeled after Google's BigTable project, Hypertable is designed to manage information storage and processing on a large cluster of commodity servers, providing resilience to machine and component failures.

CouchDB

Apache CouchDB is a document-oriented database written in Erlang, a robust functional programming language ideal for building concurrent distributed systems. CouchDB can be queried and indexed using JavaScript in a MapReduce fashion. It also offers incremental replication with bidirectional conflict detection and resolution. It provides a RESTful JSON API that can be accessed from any environment that allows HTTP requests. Many third-party client libraries are available, so you can use it from the programming language of your choice. CouchDB's built-in Web administration console speaks directly to the database using HTTP requests issued from the browser.

Voldemort

Voldemort is a fully distributed key-value storage system. Each node is independent, with no central point of failure or coordination. Voldemort is designed for use as a simple storage system that's fast enough to avoid needing a caching layer on top of it. The software architecture is made of several layers, each of which implements the put, get, and delete operations. Each layer is responsible for a specific function, such as TCP/IP communications, routing, or conflict resolution.

With Voldemort, data is automatically replicated over multiple servers as well as automatically partitioned, so each server contains only a subset of the total data. Server failure is handled transparently, and pluggable serialization is supported to allow rich keys and values, including lists and tuples with named fields. Voldemort can be integrated with common serialization frameworks, such as Protocol Buffers, Thrift, Avro, and Java Serialization.
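The layered design described above, in which every layer exposes the same put, get, and delete operations, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Voldemort's actual code or API: the class names MemoryStore, RoutingLayer, and SerializationLayer, the CRC32-based placement, and the use of JSON for serialization are all hypothetical choices made for this example.

```python
import json
import zlib
from typing import Any, Dict, List, Optional


class MemoryStore:
    """Innermost layer (hypothetical): keeps raw bytes in a dict,
    standing in for a real storage engine on one server."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        return self._data.pop(key, None) is not None


class RoutingLayer:
    """Partitions keys across nodes and replicates each write,
    so every node holds only a subset of the data."""

    def __init__(self, nodes: List[MemoryStore], replicas: int = 2) -> None:
        self.nodes = nodes
        self.replicas = min(replicas, len(nodes))

    def _owners(self, key: str) -> List[MemoryStore]:
        # Assumption: simple CRC32 placement, chosen only for brevity.
        start = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key: str, value: bytes) -> None:
        for node in self._owners(key):
            node.put(key, value)

    def get(self, key: str) -> Optional[bytes]:
        # Return the first replica that has the key.
        for node in self._owners(key):
            value = node.get(key)
            if value is not None:
                return value
        return None

    def delete(self, key: str) -> bool:
        deleted = False
        for node in self._owners(key):
            deleted = node.delete(key) or deleted
        return deleted


class SerializationLayer:
    """Outermost layer: turns rich values (lists, dicts) into bytes
    and back. JSON is an assumption made for this sketch."""

    def __init__(self, inner: RoutingLayer) -> None:
        self.inner = inner

    def put(self, key: str, value: Any) -> None:
        self.inner.put(key, json.dumps(value).encode("utf-8"))

    def get(self, key: str) -> Any:
        raw = self.inner.get(key)
        return None if raw is None else json.loads(raw.decode("utf-8"))

    def delete(self, key: str) -> bool:
        return self.inner.delete(key)


def build_store(node_count: int = 3, replicas: int = 2):
    """Stack the layers and also return the nodes for inspection."""
    nodes = [MemoryStore() for _ in range(node_count)]
    return SerializationLayer(RoutingLayer(nodes, replicas)), nodes
```

Because each layer has the same three-operation interface, layers can be stacked or swapped (for example, replacing JSON with another serialization format) without changing the callers.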


Furthermore, in Voldemort,

• data items are versioned to maximize data integrity in failure scenarios without compromising system availability;
• a single node can perform 10,000–20,000 operations per second, depending on the machines, network, disk system, and data-replication factor; and
• pluggable data placement strategies can support features such as distribution across datacenters that are geographically far apart.

Voldemort is used at LinkedIn for high-scalability storage features, where simple functional partitioning isn't sufficient. However, it often produces error messages, likely caused by uncaught bugs. The source code is available under the Apache 2.0 license [5].

Infinispan

Infinispan is a peer-to-peer, in-memory data grid (IMDG) platform, written for the Java virtual machine (www.aosabook.org/en/posa/infinispan.html). It's an extremely scalable, highly available, open source data grid platform. It's a distributed, in-memory key-value NoSQL store, designed to make the most of up-to-date multiprocessor/multicore architectures and to provide distributed cache capabilities. Infinispan exposes a cache interface that extends java.util.Map and is optionally backed by a peer-to-peer network architecture that distributes state efficiently around a data grid.

Dynomite

Dynomite provides integrated storage and distribution, requiring developers to adopt a simple, key-value data model to improve availability and scalability. By separating availability and scalability, developers can take advantage of Dynomite's sophisticated distribution and scaling techniques while still having great flexibility in choosing the data model (https://github.com/cliffmoon/dynomite/wiki/dynomite-framework).

Written in Erlang, Dynomite is a consistent, distributed, key-value-store NoSQL database system. The design is based on Amazon's Dynamo paper [8]. Dynomite currently implements vector clocks, Merkle trees, and consistent hashing. It also supports tunable quorum sensing, membership gossiping, gossiped synchronization of partitions, pluggable storage engines, a Thrift interface, and a Web console with canvas visualizations.

Redis

Redis (remote dictionary server) is an easy-to-use, open source, advanced key-value in-memory database in which all keys stay in memory (RAM). It's often referred to as a data-structure server, because keys can contain strings, hashes, lists, sets, and sorted sets [9]. Its node-to-node protocol is binary, optimized for bandwidth and speed, but its values can't be bigger than 512 Mbytes, and it doesn't use the operating system's virtual memory.

Xeround

Xeround offers its own elastic database service based on MySQL. It hosts the service at Amazon Web Services datacenters located in Europe and North America, so customers can choose whichever location is closest [10]. Xeround's database is host agnostic, so users can easily migrate between cloud service providers.

Xeround's two-tier architecture comprises access nodes and data nodes. Access nodes receive application requests, communicate with data nodes, perform computations, and deliver request results, while data nodes store the data. Xeround stores data in virtual partitions that aren't bound to the underlying hardware infrastructure. Each partition is replicated to different data nodes, located on separate servers, providing high availability and full resiliency. In addition to offering multiple geographic locations, Xeround is planning to offer its services to other cloud providers, including GoGrid and Rackspace.

Xeround Basic offers data sizes up to 500 Mbytes, supports up to 40 concurrent connections (Xeround Pro supports up to 4,800 connections), and has a maximum throughput of 8 Mbytes per second.

SimpleDB

Amazon Web Services has its own NoSQL cloud database service called SimpleDB, which can support basic use cases, and minimal use is free. The SimpleDB source code is written in Erlang, allowing for flexible design, easy scalability, and extensibility [4].
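The Dynomite description above names consistent hashing among the techniques it implements. As an illustration of the general idea only, the following Python sketch places nodes and keys on a hash ring so that adding a node moves just a fraction of the keys. It is not Dynomite's Erlang implementation: the HashRing class, the vnodes parameter (virtual nodes per server), and the choice of MD5 for ring positions are assumptions made for this example.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(value: str) -> int:
    """Map a string to a ring position. MD5 is used here only because
    it spreads values evenly; it is an assumption of this sketch."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """A hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._points: List[int] = []      # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node claims several positions to even out the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring position after the key."""
        if not self._points:
            raise ValueError("ring has no nodes")
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]
```

With this structure, a key changes owner after a node is added only if the new node claimed the ring position immediately following that key, so every relocated key lands on the new node and all other keys stay where they were.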


Dynamo

Amazon Dynamo helps manage the state of services that have high reliability requirements and require tight control over the tradeoffs between availability, consistency, cost-effectiveness, and performance. Dynamo provides a simple primary-key-only interface to accommodate application requirements. It uses a combination of well-established techniques to achieve scalability and availability: data is partitioned and replicated using consistent hashing [11], and consistency is facilitated by object versioning [12]. Replica consistency is maintained during updates using a quorum-like technique and a decentralized replica synchronization protocol. Dynamo is fully decentralized, keeping administration costs to a minimum. Storage nodes are automatically added and removed without requiring manual administration or redistribution.

Dynamo has been the fundamental storage technology for several core services in Amazon's e-commerce platform. It can scale to extreme peak loads efficiently without any downtime during holiday shopping seasons. For example, the service that maintains the shopping cart has served millions of requests, resulting in over three million checkouts in a single day, and the service that manages the session state has handled thousands of concurrently active sessions [9].

ClearDB

ClearDB also offers a hosted relational database. This globally distributed MySQL database has high availability, survivability, and performance [13]. It offers multiregional read/write mirroring with 100 percent uptime, even if networks or disks fail. It's also cloud agnostic: its hybrid configuration can span multiple clouds as well as physical datacenters scattered over a wide geographical area.

System Capability Comparisons

Here, I consider the systems based on two main features.

Data-Handling Techniques

BigTable and similar systems, such as Cassandra and HBase, attempt to perform sequential I/O for updates, because this approach never overwrites records on a disk. Instead, updates are written to a buffer in memory, and the entire buffer is written sequentially to disk. Multiple updates to the same record can be flushed at different times to different parts of the disk. As a result, to read a record, BigTable-type NoSQL databases must perform multiple I/Os to retrieve and combine the various updates. Because all write operations are sequential, the writes are fast, but reads are correspondingly slower.

In contrast, the traditional buffer-pool architecture of Pnuts overwrites records when they're updated. Because updates require random I/O, they're comparatively slower than in BigTable and similar systems, but reads are fast because a single I/O can retrieve the entire latest record.

HBase doesn't synchronize log updates to disk, which results in low-latency updates and high throughput. This is appropriate for HBase's target use, which is to run batch analytics on serving data rather than to guarantee durability for such data. For such a system, high-throughput sequential reads and writes are favored over durability for random updates. Pnuts always forces log updates to disk when committing a transaction, although this log force can be disabled.

Cassandra and Pnuts support asynchronous replication, which allows wide-area replication without adding significant overhead to the update call itself. In this model, writes are allowed anywhere, and conflicting writes to the same object are resolved afterward.

Synchronous replication ensures freshness of replicas and is used in HBase and Cassandra. Column storage is beneficial for an application that must access a known subset of columns for each request. BigTable, HBase, and Cassandra all provide the capability to declare column groups or families and to add columns. Each group/family is physically stored separately. On the other hand, if requests typically want the entire row, or arbitrary subsets of it, partitioning that keeps the entire row physically together is best. This can be done with row storage in Pnuts or by using a single column group/family in a column store like Cassandra.

To conclude, get, put, and delete functions are best supported by key-value systems, such as Hypertable and Voldemort, while data aggregation becomes much easier using column-oriented NoSQLs (such as Cassandra or BigTable). Mapping data becomes easier using document-oriented NoSQL databases, such as MongoDB.

Billing Practices

Cassandra lets the client specify on a per-call basis whether the write is durably persistent. Amazon DynamoDB is a service from Amazon that provides a NoSQL database with seamless scalability. It lets a user launch a new Amazon DynamoDB database table and scale the request capacity up or down for the table without downtime or performance degradation. With Amazon Elastic Compute Cloud (EC2), a user can get root access but pays for idle time, when no computation is being done. Users can calculate the cost of computing on the Amazon website at http://calculator.s3.amazonaws.com/calc5.html.

Each month, Amazon SimpleDB users pay no charges on the first 25 machine hours and 1 Gbyte of storage consumed. Amazon DynamoDB users pay no charges on the first 100 Mbytes of storage, as well as five writes per second and 10 reads per second of ongoing throughput capacity (http://calculator.s3.amazonaws.com/index.html).

MongoDB subscription rates are as follows: the basic subscription is US $2,500 per server, the standard subscription is $5,000 per server, and the enterprise subscription is $7,500 per server (www.mongodb.com/products/mongodb-subscriptions/pricing).

Memory (RAM) is one of the key resources that NoSQL databases rely on heavily for performance. By sharing RAM across multiple servers, NoSQL systems gain easy scale-out operations while also benefiting from increased redundancy or higher availability through fault tolerance. NoSQL's scale-out strategy also brings additional cost benefits, because most implementations are open source. In fact, almost all the NoSQLs discussed here are either open source or have an open source version with limited features.

Organizations planning to use NoSQL databases must understand their storage requirements as well as the long-term implications, in addition to the cost factor. Some of the factors for using NoSQLs include

• large, unstructured, sparse, and growing data;
• a less rigid schema; and
• a preference for performance and availability over redundancy.

Another important factor is cost effectiveness, which is achieved when clusters of cheap commodity servers are used to manage the exploding volume of data and transactions.

Observations

Advancements in the NoSQL architecture motivated Yahoo to develop criteria for quantitatively evaluating NoSQL systems. Its Cloud Serving Benchmark (YCSB) is a well-known benchmarking framework used to evaluate many different NoSQL databases, and it can be extended to support varying workload levels (http://code.google.com/appengine/docs/python/runtime.html).

Table 1 lists the salient features of the 15 systems discussed in this article.

Table 1. Salient features of the 15 NoSQL systems surveyed in this article.*

NoSQL system   Storage type/platform   License type                          Programming language
Cassandra      Column                  Open source                           Java
BigTable       Column                  Proprietary                           C
HBase          Column                  Open source                           Java
MongoDB        Document                Open source/General Public License    C++
Pnuts          Column                  Proprietary                           Java/Java virtual machine
Hypertable     Key-value               General Public License/open source    C++
CouchDB        Document                Open source                           Erlang
Voldemort      Key-value               Open source                           Java
Infinispan     Data grid cloud         Open source                           Java
Dynomite       Key-value               Open source                           Erlang
Redis          Key-value/tuple         Open source                           C
Xeround        MySQL based             General Public License/open source    C/C++†
SimpleDB       Document                Proprietary                           Erlang
Dynamo         Key-value               Open source                           Erlang
ClearDB        MySQL based             Open source                           C†

*Source: http://nosql.findthebest.com and http://nosql-database.org
†Developed in C or C++

The two main issues for selecting a particular database are

• addressing the workload requirements, which involves weighing read-optimized against write-optimized designs; and
• weighing latency versus durability (for example, if developers know that they can lose a small fraction of writes, such as Web poll votes, then they can acknowledge successful writes without waiting for them to be synched to the disk).

Database.com is a database engine for cloud application developers. It's a pilot project developed by Salesforce.com to power its website. It offers high scalability and availability; a relational data store, suitable for many enterprise needs; file storage for documents, videos, and images; SOAP and REST APIs; and various toolkits for Java, .NET, Ruby, Python, iOS, Android, Google App Engine, Google Data, Amazon Web Services, Facebook, Twitter, and Adobe Flash/Flex [14].

RedHat has transformed the world of data-storage software the way it revolutionized the market for the Unix-based operating system (www.redhat.com/database_availability). Use of multiple cloud infrastructures, coupled with the automation of PaaS, will make cloud database services the natural choice, preferred over do-it-yourself or managed databases [11].

Data management has become critical, considering the wide range of storage locations and plethora of mobile devices. Data has already escaped from the IT department's control, moving into the wider reaches of cloud-based services, mobile devices, and social networking sites. The proliferation and maturity of cloud computing will increase the need for reliable cloud-database services. Of the 15 cloud databases surveyed here, Cassandra, HBase, and MongoDB are the most widely used and thus the most representative of the NoSQL world. Instead of supporting ACID (atomicity, consistency, isolation, durability) transactions, many cloud databases support BASE (basically available, soft state, eventual consistency) principles. As an extension of this work, I plan to perform a "BASE" analysis of Cassandra, HBase, and MongoDB.

References

1. T. Dory, "Study and Comparison of Elastic Cloud Databases: Myth or Reality?" master's thesis, Computer Eng. Dept., Université Catholique de Louvain, 2011; www.info.ucl.ac.be/~pvr/MemoireThibaultDory.pdf.
2. S. Sakr et al., "A Survey of Large Scale Data Management Approaches in Cloud Environments," IEEE Communications Surveys & Tutorials, vol. 13, no. 3, 2011, pp. 311–336.
3. "Xeround—Cloud Database Frequently Asked Questions (FAQs)," GetApp.com, 2014; www.getapp.com/q/xeround-cloud-database-application.
4. "Major Technology Players Look to Open Source Model for New Data Center Project," EnterpriseDB.com, 27 Dec. 2011; http://enterprisedb.com/news-events/news/major-technology-players-look-open-source-model-new-data-center-project.
5. J. Rogers, O. Papaemmanouil, and U. Cetintemel, "A Generic Auto-Provisioning Framework for Cloud Databases," Proc. IEEE 26th Int'l Conference Data Eng. Workshops (ICDEW 10), IEEE, 2010, pp. 63–68; doi: 10.1109/ICDEW.2010.5452746.
6. F. Chang et al., "Bigtable: A Distributed Storage System for Structured Data," ACM Trans. Computer Systems, vol. 26, no. 2, 2008, article 4; doi: 10.1145/1365815.1365816.
7. B.F. Cooper et al., "Benchmarking Cloud Serving Systems with YCSB," Proc. 1st ACM Symp. Cloud Computing (SoCC 10), 2010, pp. 143–154; http://ipij.aei.polsl.pl/django-media/lecture_file/ycsb.pdf.


8. G. DeCandia et al., "Dynamo: Amazon's Highly Available Key-Value Store," Proc. 21st ACM SIGOPS Symp. Operating Systems Principles (SOSP 07), 2007, pp. 205–220; doi: 10.1145/1294261.1294281.
9. A. Masudianpour, "An Introduction to Redis Server, An Advanced Key Value Database," SlideShare, 9 Aug. 2013; www.slideshare.net/masudianpour/redis-25088079.
10. N. Vekiarides, "Ten Hot Trends in Cloud Data for 2012," Cloud Computing J., 3 Jan. 2012; http://cloudcomputing.sys-con.com/node/2109293.
11. D.C. Giuseppe et al., "Dynamo: Amazon's Highly Available Key-Value Store," Proc. 21st ACM Symp. Operating Systems Principles (SOSP-07), 2007, pp. 205–220.
12. L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System," Comm. ACM, vol. 21, no. 7, 1978, pp. 558–565.
13. C. Coleman, "Welcome to the New ClearDB," ClearDB, 17 Oct. 2011; www.cleardb.com/blog/entry?id=beta/welcome.
14. K. Subramanian, "Database.com: Salesforce's New RDBMS as a Service Offering," Cloud Ave., 6 Dec. 2010; www.cloudave.com/8542/database-com-salesforces-new-rdbms-as-a-service-offering.

Ganesh Chandra Deka is an assistant director of training at DGE&T, Ministry of Labor & Employment, Government of India. His research interests include cloud computing, e-governance, and speech processing. Contact him at [email protected].
