Oracle NoSQL Database Compared to Cassandra and HBase

Overview

. Oracle NoSQL Database is licensed under AGPL while Cassandra and HBase are Apache 2.0 licensed.

. Oracle NoSQL Database, as a NoSQL implementation that leverages Berkeley DB in its storage layer, is in many respects a commercialization of the early NoSQL implementations that led to the adoption of this category of technology. Several of the earliest NoSQL solutions were based on Berkeley DB, and some still are today, e.g. LinkedIn’s Voldemort. Oracle NoSQL Database is a key-value store implementation that supports a value abstraction layer currently implementing Binary and JSON types. Its key structure is designed to facilitate large-scale distribution and storage locality with range-based search and retrieval. The implementation uniquely supports built-in cluster load balancing and a full range of transaction semantics, from ACID to relaxed eventual consistency. In addition, the technology is integrated with important open source technologies like Hadoop / MapReduce and with an increasing number of Oracle software solutions and tools, and it can be found on Oracle Engineered Systems.
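For illustration only, the sketch below shows how the major/minor key structure and relaxed read consistency surface in the Oracle NoSQL Database Java key/value API. It is a minimal sketch, not the product documentation; the store name, host:port, and key components are placeholder assumptions.

    import oracle.kv.Consistency;
    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.Key;
    import oracle.kv.Value;
    import oracle.kv.ValueVersion;

    import java.util.Arrays;

    public class OndbKeySketch {
        public static void main(String[] args) {
            // Placeholder store name and helper host:port.
            KVStore store = KVStoreFactory.getStore(
                    new KVStoreConfig("kvstore", "node01:5000"));

            // The major path ("users", "u42") determines the shard; the minor
            // path ("profile") distinguishes records under that major key.
            Key key = Key.createKey(Arrays.asList("users", "u42"),
                                    Arrays.asList("profile"));

            // Write using the store's default durability.
            store.put(key, Value.createValue("{\"name\":\"Ada\"}".getBytes()));

            // Read with relaxed (eventual) consistency: any replica may answer.
            // Timeout 0 / null means "use the store's default timeout".
            ValueVersion vv = store.get(key, Consistency.NONE_REQUIRED, 0, null);
            System.out.println(new String(vv.getValue().getValue()));

            store.close();
        }
    }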

. Cassandra is a key-value store that supports a single value abstraction known as a table structure. It uses partition-based hashing over a ring-based architecture in which every node in the system can handle any read or write request; a node acts as the coordinator of a request when it does not actually hold the data involved in the operation.
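As a rough illustration of how the coordinator and per-request consistency look from a client's point of view, here is a hedged sketch using the DataStax Java driver (3.x-style API); the contact point, keyspace, table, and key value are placeholder assumptions.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class CassandraCoordinatorSketch {
        public static void main(String[] args) {
            // Any node can be the contact point; the node that receives a
            // request coordinates it, whether or not it owns the data.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")      // placeholder
                    .build();
            Session session = cluster.connect("demo_ks");  // placeholder keyspace

            // Per-request consistency: ONE returns after a single replica acks,
            // QUORUM waits for a majority of replicas.
            SimpleStatement read = new SimpleStatement(
                    "SELECT * FROM users WHERE id = ?", "u42");
            read.setConsistencyLevel(ConsistencyLevel.QUORUM);

            ResultSet rs = session.execute(read);
            System.out.println(rs.one());

            cluster.close();
        }
    }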

. HBase is a key-value store that supports a single value abstraction known as a table structure (popularly referred to as a column family). It is based on the Google BigTable design and is written entirely in Java. HBase is designed to work on top of the HDFS file system. Unlike Hive, HBase does not use MapReduce in its implementation; it accesses HDFS storage blocks directly and stores a natively managed file type. The physical storage is similar to a column-oriented database and as such works particularly well for queries involving aggregations, similar to shared-nothing analytic databases such as Aster Data, Greenplum, etc.
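To make the column-family model and rowkey-ordered storage concrete, here is a hedged sketch using the HBase Java client (2.x-style API); the table name, column family, and row keys are placeholder assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseColumnFamilySketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("metrics"))) {

                // A cell lives at (rowkey, column family, qualifier).
                Put put = new Put(Bytes.toBytes("host01#2014-01-01"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("cpu"),
                              Bytes.toBytes("0.42"));
                table.put(put);

                // Rows are stored in rowkey order, so range scans with partial
                // start/stop keys are natural.
                Scan scan = new Scan()
                        .withStartRow(Bytes.toBytes("host01#"))
                        .withStopRow(Bytes.toBytes("host01$"));
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result r : scanner) {
                        System.out.println(Bytes.toString(r.getRow()));
                    }
                }
            }
        }
    }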

Comparison

The comparison below gives a high-level, point-by-point view of Oracle NoSQL Database, Cassandra, and HBase features and capabilities. Low-level details can be found in the Oracle, Cassandra, and HBase online documentation.

Each point below lists the HBase, Cassandra, and ONDB (Oracle NoSQL Database) behavior in turn.

Foundations
  HBase: HBase is based on BigTable (Google).
  Cassandra: Cassandra is based on Dynamo (Amazon). It was initially developed at Facebook by former Amazon engineers, which is one reason why Cassandra supports multi-data-center deployments. Rackspace is a big contributor to Cassandra due to its multi-data-center support.
  ONDB: ONDB is based on Oracle Berkeley DB Java Edition, a mature log-structured, high-performance, transactional database.

Infrastructure
  HBase: HBase uses the Hadoop infrastructure (Zookeeper, NameNode, HDFS). Organizations that will deploy Hadoop anyway may be comfortable leveraging Hadoop knowledge by using HBase.
  Cassandra: Cassandra started and evolved separately from Hadoop, and its infrastructure and operational knowledge requirements are different from Hadoop's. However, for analytics, many Cassandra deployments use Cassandra + Storm (which uses Zookeeper) and/or Cassandra + Hadoop.
  ONDB: ONDB has simple infrastructure requirements and does not use Zookeeper. Hadoop-based analytics are supported via an ONDB/Hadoop connector.

Infrastructure Simplicity and SPOF
  HBase: The HBase-Hadoop infrastructure has several "moving parts" consisting of Zookeeper, Name Node, HBase Master, and Data Nodes. Zookeeper is clustered and naturally fault tolerant. The Name Node needs to be clustered to be fault tolerant.
  Cassandra: Cassandra uses a single node type. All nodes are equal and perform all functions. Any node can act as a coordinator, ensuring no SPOF. Adding Storm or Hadoop, of course, adds complexity to the infrastructure.
  ONDB: ONDB uses a single node type to store data and satisfy read requests. Any node can accept a request and forward it if necessary. There is no SPOF. In addition, there is a simple watchdog process (the Storage Node Agent, or SNA for short) on each machine to ensure high availability and automatically restart any data storage node in case of process-level failures. The SNA also helps with administration of the store.

Read Intensive Use Cases
  HBase: HBase is optimized for reads, supported by a single-write master and the resulting strict consistency model, as well as the use of Ordered Partitioning, which supports row scans. HBase is well suited for doing range-based scans.
  Cassandra: Cassandra has excellent single-row read performance as long as eventual consistency semantics are sufficient for the use case. Cassandra quorum reads, which are required for strict consistency, will naturally be slower than HBase reads. Cassandra does not support range-based row scans, which may be limiting in certain use cases. Cassandra is well suited for supporting single-row queries, or selecting multiple rows based on a column-value index.
  ONDB: ONDB provides: 1) strict consistency reads at the master, 2) eventual consistency reads, with optional time constraints on the recency of data, and 3) application-level read-your-writes consistency. All reads contact just a single storage node, making read operations very efficient. ONDB also supports range-based scans.

Multi-Data Center Support and Disaster Recovery
  HBase: HBase provides for asynchronous replication of an HBase cluster across a WAN. HBase clusters cannot be set up to achieve zero RPO, but in steady state HBase should be roughly failover-equivalent to any other DBMS that relies on asynchronous replication over a WAN. Fall-back processes and procedures (e.g. after failover) are TBD.
  Cassandra: Cassandra Random Partitioning provides for row-replication of a single row across a WAN, either asynchronous (write.ONE, write.LOCAL_QUORUM) or synchronous (write.QUORUM, write.ALL). Cassandra clusters can therefore be set up to achieve zero RPO, but each write will require at least one WAN ACK back to the coordinator to achieve this capability.
  ONDB: [ Release 3.0 provides for asynchronous cascaded replication across data centers. ]

Write.ONE Durability
  HBase: Writes are replicated in a pipeline fashion: the first data node for the region persists the write, and then sends the write to the next natural endpoint, and so on in a pipeline fashion. HBase's commit log "acks" a write only after *all* of the nodes in the pipeline have written the data to their OS buffers. The first Region Server in the pipeline must also have persisted the write to its WAL.
  Cassandra: Cassandra's coordinators will send parallel write requests to all natural endpoints. The coordinator will "ack" the write after exactly one natural endpoint has "acked" the write, which means that node has also persisted the write to its WAL. The write may or may not have committed to any other natural endpoint.
  ONDB: ONDB considers a request with ReplicaAckPolicy.NONE (the ONDB equivalent of Write.ONE) as having completed after the change has been written to the master's log buffer; the change is propagated to the other members of the replication group via an efficient asynchronous stream-based protocol.

Ordered Partitioning
  HBase: HBase only supports Ordered Partitioning. This means that rows for a CF are stored in RowKey order in HFiles, where each HFile contains a "block" or "shard" of all the rows in a CF. HFiles are distributed across all data nodes in the cluster.
  Cassandra: Cassandra officially supports Ordered Partitioning, but no production user of Cassandra uses Ordered Partitioning due to the "hot spots" it creates and the operational difficulties such hot spots cause. Random Partitioning is the only recommended Cassandra partitioning scheme, and rows are distributed across all nodes in the cluster.
  ONDB: ONDB only supports random partitioning. Prevailing experience indicates that other forms of partitioning are really hard to administer in practice.

RowKey Range Scans
  HBase: Because of ordered partitioning, HBase queries can be formulated with partial start and end row keys, and can locate rows inclusive of, or exclusive of, these partial row keys. The start and end row keys in a range scan need not even exist in HBase.
  Cassandra: Because of random partitioning, partial row keys cannot be used with Cassandra. RowKeys must be known exactly. Counting rows in a CF is complicated. It is highly recommended that for these types of use cases, data should be stored in columns in Cassandra, not in rows.
  ONDB: ONDB range requests can be defined with partial start and end row keys. The start and end row keys in a range scan need not exist in the store.

Linear Scalability for Large Tables and Range Scans
  HBase: Due to Ordered Partitioning, HBase will easily scale horizontally while still supporting rowkey range scans.
  Cassandra: If data is stored in columns in Cassandra to support range scans, the practical limit on row size in Cassandra is tens of megabytes. Rows larger than that cause problems with compaction overhead and time.
  ONDB: There are no limits on range scans across major or minor keys. Range scans across major keys require access to each shard in the store. Release 3 will support major key and index range scans that are parallelized across all the nodes in the store. Minor key scans are serviced by the single shard that contains the data associated with the minor key range.
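As an illustrative companion to the range-scan points above, the hedged sketch below shows a minor-key range retrieval under a single major key with the Oracle NoSQL Database key/value API; the store name, host:port, and key paths are placeholder assumptions.

    import oracle.kv.Depth;
    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.Key;
    import oracle.kv.KeyRange;
    import oracle.kv.ValueVersion;

    import java.util.Arrays;
    import java.util.Map;
    import java.util.SortedMap;

    public class OndbRangeScanSketch {
        public static void main(String[] args) {
            KVStore store = KVStoreFactory.getStore(
                    new KVStoreConfig("kvstore", "node01:5000"));  // placeholders

            // All records under this major key live in a single shard, so the
            // range scan is serviced by that shard alone.
            Key parent = Key.createKey(Arrays.asList("users", "u42"));

            // Partial start/end minor keys; they need not exist in the store.
            KeyRange range = new KeyRange("2014-01", true, "2014-06", false);

            SortedMap<Key, ValueVersion> results =
                    store.multiGet(parent, range, Depth.PARENT_AND_DESCENDANTS);
            for (Map.Entry<Key, ValueVersion> e : results.entrySet()) {
                System.out.println(e.getKey().toString());
            }
            store.close();
        }
    }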
Atomic Compare and Set
  HBase: HBase supports Atomic Compare and Set. HBase supports transactions within a row.
  Cassandra: Cassandra does not support Atomic Compare and Set. Counters require dedicated counter column families which, because of eventual consistency, require that all replicas in all natural endpoints be read and updated with an ACK. However, hinted-handoff mechanisms can make even these built-in counters suspect for accuracy. FIFO queues are difficult (if not impossible) to implement with Cassandra.
  ONDB: ONDB supports atomic compare and set, making it simple to implement counters. ONDB also supports atomic modification of multiple minor key/value pairs under the same major key.

Read Load Balancing - Single Row
  HBase: HBase does not support read load balancing against a single row. A single row is served by exactly one region server at a time. Other replicas are used only in case of a node failure. Scalability is primarily supported by partitioning, which statistically distributes reads of different rows across multiple data nodes.
  Cassandra: Cassandra will support read load balancing against a single row. However, this is primarily supported by Read.ONE, and eventual consistency must be taken into consideration. Scalability is primarily supported by partitioning, which distributes reads of different rows across multiple data nodes.
  ONDB: ONDB supports read load balancing. Only absolute consistency reads need to be directed to the master; eventual consistency reads may be served by any replica that can satisfy the read consistency requirements of the request.

Bloom Filters
  HBase: Bloom filters can be used in HBase as another form of indexing. They work on the basis of RowKey or RowKey+ColumnName to reduce the number of data blocks that HBase has to read to satisfy a query. (Bloom filters may exhibit false positives (reading too much data), but never false negatives (reading not enough data).)
  Cassandra: Cassandra uses bloom filters for key lookup.
  ONDB: Bloom filters are used to minimize reads to SST files that do not contain a requested key in the LSM-tree-based storage underlying HBase and Cassandra. There is no need to create and maintain bloom filters in the log-structured storage architecture used by ONDB.

Triggers
  HBase: Triggers are supported by the CoProcessor capability in HBase. They allow HBase to observe the get/put/delete events on a table (CF) and then execute the trigger logic. Triggers are coded as Java classes.
  Cassandra: Cassandra does not support coprocessor-like functionality (as far as we know).
  ONDB: ONDB does not support triggers.

Secondary Indexes
  HBase: HBase does not natively support secondary indexes, but one use case of triggers is that a trigger on a "put" can automatically keep a secondary index up to date, and therefore not put the burden on the application (client).
  Cassandra: Cassandra supports secondary indexes on column families where the column name is known. (Not on dynamic columns.)
  ONDB: Release 3.0 will support secondary indexes.

Simple Aggregation
  HBase: HBase CoProcessors support simple aggregations out of the box: SUM, MIN, MAX, AVG, STD. Other aggregations can be built by defining Java classes to perform the aggregation.
  Cassandra: Aggregations in Cassandra are not supported by the Cassandra nodes - the client must provide aggregations. When the aggregation requirement spans multiple rows, random partitioning makes aggregations very difficult for the client. The recommendation is to use Storm or Hadoop for aggregations.
  ONDB: Aggregation is not supported by ONDB.

HIVE Integration
  HBase: HIVE can access HBase tables directly (it uses de-serialization under the hood that is aware of the HBase file format).
  Cassandra: Work in progress (https://issues.apache.org/jira/browse/CASSANDRA-4131).
  ONDB: No HIVE integration currently.

PIG Integration
  HBase: PIG has native support for writing into/reading from HBase.
  Cassandra: Cassandra 0.7.4+.
  ONDB: No PIG integration currently.

CAP Theorem Focus
  HBase: Consistency, Availability.
  Cassandra: Availability, Partition-Tolerance.
  ONDB: Consistency, Availability; limited Partition-Tolerance if there is a simple majority of nodes on one side of a partition (https://sleepycat.oracle.com/tra/wiki/JEKV/CAP has a detailed discussion).

Consistency
  HBase: Strong.
  Cassandra: Eventual (strong is optional).
  ONDB: Offers different read consistency models: 1) strict consistency reads at the master, 2) eventual consistency reads, with optional time constraints on the recency of data, and 3) read-your-writes consistency.

Single Write Master
  HBase: Yes.
  Cassandra: No (R + W > N to get strong consistency).
  ONDB: Yes.

Optimized For
  HBase: Reads.
  Cassandra: Writes.
  ONDB: Both reads and writes. Log-structured storage permits append-only writes, with each change being written once to disk. Reads can be serviced at any replica based upon the read consistency requirements associated with the request. Reads can be satisfied at a single node, by a single request to disk. There are no bloom filters to maintain and no risk of false positives causing multiple disk reads.

Main Data Structure
  HBase: CF, RowKey, name-value pair set.
  Cassandra: CF, RowKey, name-value pair set.
  ONDB: Major key, or minor key with its associated value.

Dynamic Columns
  HBase: Yes.
  Cassandra: Yes.
  ONDB: Provides equivalent functionality. Multiple minor keys can be dynamically associated with a major key.

Column Names as Data
  HBase: Yes.
  Cassandra: Yes.
  ONDB: Provides equivalent functionality via minor keys, which can be treated as data.

Static Columns
  HBase: No.
  Cassandra: Yes.
  ONDB: [ R3.0 will support static columns ]

RowKey Slices
  HBase: Yes.
  Cassandra: No.
  ONDB: No.

Static Column Value Indexes
  HBase: No.
  Cassandra: Yes.
  ONDB: [ R3.0 ]

Sorted Column Names
  HBase: Yes.
  Cassandra: Yes.
  ONDB: [ R3.0 ]

Cell Versioning Support
  HBase: Yes.
  Cassandra: No.
  ONDB: No.

Bloom Filters
  HBase: Yes.
  Cassandra: Yes (only on key).
  ONDB: Not necessary for ONDB.

CoProcessors
  HBase: Yes.
  Cassandra: No.
  ONDB: No.

Triggers
  HBase: Yes (part of CoProcessor).
  Cassandra: No.
  ONDB: No.

Push Down Predicates
  HBase: Yes (part of CoProcessor).
  Cassandra: No.
  ONDB: No.

Atomic Compare and Set
  HBase: Yes.
  Cassandra: No.
  ONDB: Yes.

Explicit Row Locks
  HBase: Yes.
  Cassandra: No.
  ONDB: No.

Row Key Caching
  HBase: Yes.
  Cassandra: Yes.
  ONDB: Yes.

Partitioning Strategy
  HBase: Ordered Partitioning.
  Cassandra: Random Partitioning recommended.
  ONDB: Random partitioning.

Rebalancing
  HBase: Automatic.
  Cassandra: Not needed with Random Partitioning.
  ONDB: Not needed with random partitioning.

Availability
  HBase: N replicas across nodes.
  Cassandra: N replicas across nodes.
  ONDB: N replicas across nodes.

Data Node Failure
  HBase: Graceful degradation.
  Cassandra: Graceful degradation.
  ONDB: Graceful degradation, as described in the availability section.

Data Node Failure - Replication
  HBase: N replicas preserved.
  Cassandra: (N-1) replicas preserved + hinted handoff.
  ONDB: (N-1) replicas preserved.

Data Node Restoration
  HBase: Same as node addition.
  Cassandra: Requires node repair admin action.
  ONDB: The node catches up automatically by replaying changes from a member of the replication group.

Data Node Addition
  HBase: Rebalancing is automatic.
  Cassandra: Rebalancing requires token-assignment adjustment.
  ONDB: New nodes are added through the Admin service, which automatically redistributes data across the new nodes.

Data Node Management
  HBase: Simple (roll in, roll out).
  Cassandra: Human admin action required.
  ONDB: Human admin action required.

Cluster Admin Nodes
  HBase: Zookeeper, NameNode, HMaster.
  Cassandra: All nodes are equal.
  ONDB: ONDB has a highly available Admin service for administrative actions, e.g. adding new nodes, replacing failed nodes, software updates, etc., but it is not required for steady-state operation of the service. There is a lightweight SNA process (described earlier) on each machine to ensure high availability and restart any data storage node in case of failure.

SPOF
  HBase: Now, all the admin nodes are fault tolerant.
  Cassandra: All nodes are equal.
  ONDB: There is no SPOF, as described in the availability section.

Write.ANY
  HBase: No, but replicas are node agnostic.
  Cassandra: Yes (writes never fail if this option is used).
  ONDB: No.

Write.ONE
  HBase: Standard; HA; strong consistency.
  Cassandra: Yes (often used); HA; weak consistency.
  ONDB: Yes. Requires that the master be reachable.

Write.QUORUM
  HBase: No (not required).
  Cassandra: Yes (often used with Read.QUORUM for strong consistency).
  ONDB: Yes. This is the default.

Write.ALL
  HBase: Yes (performance penalty).
  Cassandra: Yes (performance penalty, not HA).
  ONDB: Yes (performance penalty, not HA).

Asynchronous WAN Replication
  HBase: Yes, but it needs testing on corner cases.
  Cassandra: Yes (replicas can span data centers).
  ONDB: Asynchronous replication is routine in ONDB. Nodes local to the master will typically keep up, and nodes separated by high-latency WANs will have the changes replayed asynchronously via an efficient stream-based protocol.

Synchronous WAN Replication
  HBase: No.
  Cassandra: Yes, with Write.QUORUM or Write.EACH_QUORUM.
  ONDB: Yes, for requests that require acknowledgements (ReplicaAckPolicy.SIMPLE_MAJORITY or ReplicaAckPolicy.ALL). The acknowledging nodes will be synchronized with the master.

Compression Support
  HBase: Yes.
  Cassandra: Yes.
  ONDB: No.
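Since atomic compare and set comes up in several of the points above, here is a hedged sketch of a version-based check-and-update (a simple counter) with the Oracle NoSQL Database key/value API; the store name, host:port, and key are placeholder assumptions, and the retry loop is one possible way to use the conditional put operations.

    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.Key;
    import oracle.kv.Value;
    import oracle.kv.ValueVersion;
    import oracle.kv.Version;

    import java.util.Arrays;

    public class OndbCompareAndSetSketch {
        public static void main(String[] args) {
            KVStore store = KVStoreFactory.getStore(
                    new KVStoreConfig("kvstore", "node01:5000"));  // placeholders

            Key counterKey = Key.createKey(Arrays.asList("counters", "pageviews"));

            // Retry loop: read the current value and its version, then update
            // only if no other writer has modified the record in the meantime.
            while (true) {
                ValueVersion vv = store.get(counterKey);
                long current = (vv == null)
                        ? 0L
                        : Long.parseLong(new String(vv.getValue().getValue()));
                Value next = Value.createValue(Long.toString(current + 1).getBytes());

                Version success;
                if (vv == null) {
                    // Only create the record if it still does not exist.
                    success = store.putIfAbsent(counterKey, next);
                } else {
                    // Only overwrite if the version is unchanged (compare and set).
                    success = store.putIfVersion(counterKey, next, vv.getVersion());
                }
                if (success != null) {
                    break;  // the conditional write succeeded
                }
                // Otherwise another writer won the race; re-read and retry.
            }
            store.close();
        }
    }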