Oracle NoSQL Database Compared to Cassandra and HBase

Overview

. Oracle NoSQL Database is licensed under AGPL while Cassandra and HBase are Apache 2.0 licensed.

. Oracle NoSQL Database, as a NoSQL implementation that leverages Berkeley DB in its storage layer, is in many respects a commercialization of the early NoSQL implementations that led to the adoption of this category of technology. Several of the earliest NoSQL solutions were based on Berkeley DB, and some still are today, e.g. LinkedIn’s Voldemort. Oracle NoSQL Database is a key-value store implementation that supports a value abstraction layer currently implementing Binary and JSON types. Its key structure is designed to facilitate large-scale distribution and storage locality with range-based search and retrieval. The implementation uniquely supports built-in cluster load balancing and a full range of transaction semantics, from ACID to relaxed eventual consistency. In addition, the technology is integrated with important open source technologies like Hadoop / MapReduce and with an increasing number of Oracle software solutions and tools, and it can be found on Oracle Engineered Systems.
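For illustration only, the sketch below shows how the major/minor key structure and relaxed read consistency surface in the Oracle NoSQL Database Java key/value API. It is a minimal sketch, not the product documentation; the store name, host:port, and key components are placeholder assumptions.

    import oracle.kv.Consistency;
    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.Key;
    import oracle.kv.Value;
    import oracle.kv.ValueVersion;

    import java.util.Arrays;

    public class OndbKeySketch {
        public static void main(String[] args) {
            // Placeholder store name and helper host:port.
            KVStore store = KVStoreFactory.getStore(
                    new KVStoreConfig("kvstore", "node01:5000"));

            // The major path ("users", "u42") determines the shard; the minor
            // path ("profile") distinguishes records under that major key.
            Key key = Key.createKey(Arrays.asList("users", "u42"),
                                    Arrays.asList("profile"));

            // Write using the store's default durability.
            store.put(key, Value.createValue("{\"name\":\"Ada\"}".getBytes()));

            // Read with relaxed (eventual) consistency: any replica may answer.
            // Timeout 0 / null means "use the store's default timeout".
            ValueVersion vv = store.get(key, Consistency.NONE_REQUIRED, 0, null);
            System.out.println(new String(vv.getValue().getValue()));

            store.close();
        }
    }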

. Cassandra is a key-value store that supports a single value abstraction known as a table structure. It uses partition-based hashing over a ring-based architecture in which every node in the system can handle any read or write request; a node acts as the coordinator of a request when it does not actually hold the data involved in the operation.
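As a rough illustration of how the coordinator and per-request consistency look from a client's point of view, here is a hedged sketch using the DataStax Java driver (3.x-style API); the contact point, keyspace, table, and key value are placeholder assumptions.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class CassandraCoordinatorSketch {
        public static void main(String[] args) {
            // Any node can be the contact point; the node that receives a
            // request coordinates it, whether or not it owns the data.
            Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")      // placeholder
                    .build();
            Session session = cluster.connect("demo_ks");  // placeholder keyspace

            // Per-request consistency: ONE returns after a single replica acks,
            // QUORUM waits for a majority of replicas.
            SimpleStatement read = new SimpleStatement(
                    "SELECT * FROM users WHERE id = ?", "u42");
            read.setConsistencyLevel(ConsistencyLevel.QUORUM);

            ResultSet rs = session.execute(read);
            System.out.println(rs.one());

            cluster.close();
        }
    }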

. HBase is a key-value store that supports a single value abstraction known as a table structure (popularly referred to as a column family). It is based on the Google BigTable design and is written entirely in Java. HBase is designed to work on top of the HDFS file system. Unlike Hive, HBase does not use MapReduce in its implementation; it accesses HDFS storage blocks directly and stores a natively managed file type. The physical storage is similar to a column-oriented database and as such works particularly well for queries involving aggregations, similar to shared-nothing analytic databases such as Aster Data, Greenplum, etc.
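To make the column-family model and rowkey-ordered storage concrete, here is a hedged sketch using the HBase Java client (2.x-style API); the table name, column family, and row keys are placeholder assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseColumnFamilySketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("metrics"))) {

                // A cell lives at (rowkey, column family, qualifier).
                Put put = new Put(Bytes.toBytes("host01#2014-01-01"));
                put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("cpu"),
                              Bytes.toBytes("0.42"));
                table.put(put);

                // Rows are stored in rowkey order, so range scans with partial
                // start/stop keys are natural.
                Scan scan = new Scan()
                        .withStartRow(Bytes.toBytes("host01#"))
                        .withStopRow(Bytes.toBytes("host01$"));
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result r : scanner) {
                        System.out.println(Bytes.toString(r.getRow()));
                    }
                }
            }
        }
    }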

Comparison

The comparison below gives a high-level, point-by-point view of Oracle NoSQL Database, Cassandra, and HBase features and capabilities. Low-level details can be found in the Oracle, Cassandra, and HBase online documentation.

Each point below lists the HBase, Cassandra, and ONDB (Oracle NoSQL Database) behavior in turn.

Foundations
  HBase: HBase is based on BigTable (Google).
  Cassandra: Cassandra is based on Dynamo (Amazon). It was initially developed at Facebook by former Amazon engineers, which is one reason why Cassandra supports multi-data-center deployments. Rackspace is a big contributor to Cassandra due to its multi-data-center support.
  ONDB: ONDB is based on Oracle Berkeley DB Java Edition, a mature log-structured, high-performance, transactional database.

Infrastructure
  HBase: HBase uses the Hadoop infrastructure (Zookeeper, NameNode, HDFS). Organizations that will deploy Hadoop anyway may be comfortable leveraging Hadoop knowledge by using HBase.
  Cassandra: Cassandra started and evolved separately from Hadoop, and its infrastructure and operational knowledge requirements are different from Hadoop's. However, for analytics, many Cassandra deployments use Cassandra + Storm (which uses Zookeeper) and/or Cassandra + Hadoop.
  ONDB: ONDB has simple infrastructure requirements and does not use Zookeeper. Hadoop-based analytics are supported via an ONDB/Hadoop connector.

Infrastructure Simplicity and SPOF
  HBase: The HBase-Hadoop infrastructure has several "moving parts" consisting of Zookeeper, Name Node, HBase Master, and Data Nodes. Zookeeper is clustered and naturally fault tolerant. The Name Node needs to be clustered to be fault tolerant.
  Cassandra: Cassandra uses a single node type. All nodes are equal and perform all functions. Any node can act as a coordinator, ensuring no SPOF. Adding Storm or Hadoop, of course, adds complexity to the infrastructure.
  ONDB: ONDB uses a single node type to store data and satisfy read requests. Any node can accept a request and forward it if necessary. There is no SPOF. In addition, there is a simple watchdog process (the Storage Node Agent, or SNA for short) on each machine to ensure high availability and automatically restart any data storage node in case of process-level failures. The SNA also helps with administration of the store.

Read Intensive Use Cases
  HBase: HBase is optimized for reads, supported by a single-write master and the resulting strict consistency model, as well as the use of Ordered Partitioning, which supports row scans. HBase is well suited for doing range-based scans.
  Cassandra: Cassandra has excellent single-row read performance as long as eventual consistency semantics are sufficient for the use case. Cassandra quorum reads, which are required for strict consistency, will naturally be slower than HBase reads. Cassandra does not support range-based row scans, which may be limiting in certain use cases. Cassandra is well suited for supporting single-row queries, or selecting multiple rows based on a column-value index.
  ONDB: ONDB provides: 1) strict consistency reads at the master, 2) eventual consistency reads, with optional time constraints on the recency of data, and 3) application-level read-your-writes consistency. All reads contact just a single storage node, making read operations very efficient. ONDB also supports range-based scans.

Multi-Data Center Support and Disaster Recovery
  HBase: HBase provides for asynchronous replication of an HBase cluster across a WAN. HBase clusters cannot be set up to achieve zero RPO, but in steady state HBase should be roughly failover-equivalent to any other DBMS that relies on asynchronous replication over a WAN. Fall-back processes and procedures (e.g. after failover) are TBD.
  Cassandra: Cassandra Random Partitioning provides for row-replication of a single row across a WAN, either asynchronous (write.ONE, write.LOCAL_QUORUM) or synchronous (write.QUORUM, write.ALL). Cassandra clusters can therefore be set up to achieve zero RPO, but each write will require at least one WAN ACK back to the coordinator to achieve this capability.
  ONDB: [ Release 3.0 provides for asynchronous cascaded replication across data centers. ]

Write.ONE Durability
  HBase: Writes are replicated in a pipeline fashion: the first data node for the region persists the write, and then sends the write to the next natural endpoint, and so on in a pipeline fashion. HBase's commit log "acks" a write only after *all* of the nodes in the pipeline have written the data to their OS buffers. The first Region Server in the pipeline must also have persisted the write to its WAL.
  Cassandra: Cassandra's coordinators will send parallel write requests to all natural endpoints. The coordinator will "ack" the write after exactly one natural endpoint has "acked" the write, which means that node has also persisted the write to its WAL. The write may or may not have committed to any other natural endpoint.
  ONDB: ONDB considers a request with ReplicaAckPolicy.NONE (the ONDB equivalent of Write.ONE) as having completed after the change has been written to the master's log buffer; the change is propagated to the other members of the replication group via an efficient asynchronous stream-based protocol.

Ordered Partitioning
  HBase: HBase only supports Ordered Partitioning. This means that rows for a CF are stored in RowKey order in HFiles, where each HFile contains a "block" or "shard" of all the rows in a CF. HFiles are distributed across all data nodes in the cluster.
  Cassandra: Cassandra officially supports Ordered Partitioning, but no production user of Cassandra uses Ordered Partitioning due to the "hot spots" it creates and the operational difficulties such hot spots cause. Random Partitioning is the only recommended Cassandra partitioning scheme, and rows are distributed across all nodes in the cluster.
  ONDB: ONDB only supports random partitioning. Prevailing experience indicates that other forms of partitioning are really hard to administer in practice.

RowKey Range Scans
  HBase: Because of ordered partitioning, HBase queries can be formulated with partial start and end row keys, and can locate rows inclusive of, or exclusive of, these partial row keys. The start and end row keys in a range scan need not even exist in HBase.
  Cassandra: Because of random partitioning, partial row keys cannot be used with Cassandra. RowKeys must be known exactly. Counting rows in a CF is complicated. It is highly recommended that for these types of use cases, data should be stored in columns in Cassandra, not in rows.
  ONDB: ONDB range requests can be defined with partial start and end row keys. The start and end row keys in a range scan need not exist in the store.

Linear Scalability for Large Tables and Range Scans
  HBase: Due to Ordered Partitioning, HBase will easily scale horizontally while still supporting rowkey range scans.
  Cassandra: If data is stored in columns in Cassandra to support range scans, the practical limit on row size in Cassandra is tens of megabytes. Rows larger than that cause problems with compaction overhead and time.
  ONDB: There are no limits on range scans across major or minor keys. Range scans across major keys require access to each shard in the store. Release 3 will support major key and index range scans that are parallelized across all the nodes in the store. Minor key scans are serviced by the single shard that contains the data associated with the minor key range.
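As an illustrative companion to the range-scan points above, the hedged sketch below shows a minor-key range retrieval under a single major key with the Oracle NoSQL Database key/value API; the store name, host:port, and key paths are placeholder assumptions.

    import oracle.kv.Depth;
    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.Key;
    import oracle.kv.KeyRange;
    import oracle.kv.ValueVersion;

    import java.util.Arrays;
    import java.util.Map;
    import java.util.SortedMap;

    public class OndbRangeScanSketch {
        public static void main(String[] args) {
            KVStore store = KVStoreFactory.getStore(
                    new KVStoreConfig("kvstore", "node01:5000"));  // placeholders

            // All records under this major key live in a single shard, so the
            // range scan is serviced by that shard alone.
            Key parent = Key.createKey(Arrays.asList("users", "u42"));

            // Partial start/end minor keys; they need not exist in the store.
            KeyRange range = new KeyRange("2014-01", true, "2014-06", false);

            SortedMap<Key, ValueVersion> results =
                    store.multiGet(parent, range, Depth.PARENT_AND_DESCENDANTS);
            for (Map.Entry<Key, ValueVersion> e : results.entrySet()) {
                System.out.println(e.getKey().toString());
            }
            store.close();
        }
    }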
Atomic Compare and Set
  HBase: HBase supports Atomic Compare and Set. HBase supports transactions within a row.
  Cassandra: Cassandra does not support Atomic Compare and Set. Counters require dedicated counter column families which, because of eventual consistency, require that all replicas in all natural endpoints be read and updated with an ACK. However, hinted-handoff mechanisms can make even these built-in counters suspect for accuracy. FIFO queues are difficult (if not impossible) to implement with Cassandra.
  ONDB: ONDB supports atomic compare and set, making it simple to implement counters. ONDB also supports atomic modification of multiple minor key/value pairs under the same major key.

Read Load Balancing - Single Row
  HBase: HBase does not support read load balancing against a single row. A single row is served by exactly one region server at a time. Other replicas are used only in case of a node failure. Scalability is primarily supported by partitioning, which statistically distributes reads of different rows across multiple data nodes.
  Cassandra: Cassandra will support read load balancing against a single row. However, this is primarily supported by Read.ONE, and eventual consistency must be taken into consideration. Scalability is primarily supported by partitioning, which distributes reads of different rows across multiple data nodes.
  ONDB: ONDB supports read load balancing. Only absolute consistency reads need to be directed to the master; eventual consistency reads may be served by any replica that can satisfy the read consistency requirements of the request.

Bloom Filters
  HBase: Bloom filters can be used in HBase as another form of indexing. They work on the basis of RowKey or RowKey+ColumnName to reduce the number of data blocks that HBase has to read to satisfy a query. (Bloom filters may exhibit false positives (reading too much data), but never false negatives (reading not enough data).)
  Cassandra: Cassandra uses bloom filters for key lookup.
  ONDB: Bloom filters are used to minimize reads to SST files that do not contain a requested key in the LSM-tree-based storage underlying HBase and Cassandra. There is no need to create and maintain bloom filters in the log-structured storage architecture used by ONDB.

Triggers
  HBase: Triggers are supported by the CoProcessor capability in HBase. They allow HBase to observe the get/put/delete events on a table (CF) and then execute the trigger logic. Triggers are coded as Java classes.
  Cassandra: Cassandra does not support coprocessor-like functionality (as far as we know).
  ONDB: ONDB does not support triggers.

Secondary Indexes
  HBase: HBase does not natively support secondary indexes, but one use case of triggers is that a trigger on a "put" can automatically keep a secondary index up to date, and therefore not put the burden on the application (client).
  Cassandra: Cassandra supports secondary indexes on column families where the column name is known. (Not on dynamic columns.)
  ONDB: Release 3.0 will support secondary indexes.

Simple Aggregation
  HBase: HBase CoProcessors support simple aggregations out of the box: SUM, MIN, MAX, AVG, STD. Other aggregations can be built by defining Java classes to perform the aggregation.
  Cassandra: Aggregations in Cassandra are not supported by the Cassandra nodes - the client must provide aggregations. When the aggregation requirement spans multiple rows, random partitioning makes aggregations very difficult for the client. The recommendation is to use Storm or Hadoop for aggregations.
  ONDB: Aggregation is not supported by ONDB.

HIVE Integration
  HBase: HIVE can access HBase tables directly (it uses de-serialization under the hood that is aware of the HBase file format).
  Cassandra: Work in progress (https://issues.apache.org/jira/browse/CASSANDRA-4131).
  ONDB: No HIVE integration currently.

PIG Integration
  HBase: PIG has native support for writing into/reading from HBase.
  Cassandra: Cassandra 0.7.4+.
  ONDB: No PIG integration currently.

CAP Theorem Focus
  HBase: Consistency, Availability.
  Cassandra: Availability, Partition-Tolerance.
  ONDB: Consistency, Availability; limited Partition-Tolerance if there is a simple majority of nodes on one side of a partition (https://sleepycat.oracle.com/tra/wiki/JEKV/CAP has a detailed discussion).

Consistency
  HBase: Strong.
  Cassandra: Eventual (strong is optional).
  ONDB: Offers different read consistency models: 1) strict consistency reads at the master, 2) eventual consistency reads, with optional time constraints on the recency of data, and 3) read-your-writes consistency.

Single Write Master
  HBase: Yes.
  Cassandra: No (R + W > N to get strong consistency).
  ONDB: Yes.

Optimized For
  HBase: Reads.
  Cassandra: Writes.
  ONDB: Both reads and writes. Log-structured storage permits append-only writes, with each change being written once to disk. Reads can be serviced at any replica based upon the read consistency requirements associated with the request. Reads can be satisfied at a single node, by a single request to disk. There are no bloom filters to maintain and no risk of false positives causing multiple disk reads.

Main Data Structure
  HBase: CF, RowKey, name-value pair set.
  Cassandra: CF, RowKey, name-value pair set.
  ONDB: Major key, or minor key with its associated value.

Dynamic Columns
  HBase: Yes.
  Cassandra: Yes.
  ONDB: Provides equivalent functionality. Multiple minor keys can be dynamically associated with a major key.

Column Names as Data
  HBase: Yes.
  Cassandra: Yes.
  ONDB: Provides equivalent functionality via minor keys, which can be treated as data.

Static Columns
  HBase: No.
  Cassandra: Yes.
  ONDB: [ R3.0 will support static columns ]

RowKey Slices
  HBase: Yes.
  Cassandra: No.
  ONDB: No.

Static Column Value Indexes
  HBase: No.
  Cassandra: Yes.
  ONDB: [ R3.0 ]

Sorted Column Names
  HBase: Yes.
  Cassandra: Yes.
  ONDB: [ R3.0 ]

Cell Versioning Support
  HBase: Yes.
  Cassandra: No.
  ONDB: No.

Bloom Filters
  HBase: Yes.
  Cassandra: Yes (only on key).
  ONDB: Not necessary for ONDB.

CoProcessors
  HBase: Yes.
  Cassandra: No.
  ONDB: No.

Triggers
  HBase: Yes (part of CoProcessor).
  Cassandra: No.
  ONDB: No.

Push Down Predicates
  HBase: Yes (part of CoProcessor).
  Cassandra: No.
  ONDB: No.

Atomic Compare and Set
  HBase: Yes.
  Cassandra: No.
  ONDB: Yes.

Explicit Row Locks
  HBase: Yes.
  Cassandra: No.
  ONDB: No.

Row Key Caching
  HBase: Yes.
  Cassandra: Yes.
  ONDB: Yes.

Partitioning Strategy
  HBase: Ordered Partitioning.
  Cassandra: Random Partitioning recommended.
  ONDB: Random partitioning.

Rebalancing
  HBase: Automatic.
  Cassandra: Not needed with Random Partitioning.
  ONDB: Not needed with random partitioning.

Availability
  HBase: N replicas across nodes.
  Cassandra: N replicas across nodes.
  ONDB: N replicas across nodes.

Data Node Failure
  HBase: Graceful degradation.
  Cassandra: Graceful degradation.
  ONDB: Graceful degradation, as described in the availability section.

Data Node Failure - Replication
  HBase: N replicas preserved.
  Cassandra: (N-1) replicas preserved + hinted handoff.
  ONDB: (N-1) replicas preserved.

Data Node Restoration
  HBase: Same as node addition.
  Cassandra: Requires node repair admin action.
  ONDB: The node catches up automatically by replaying changes from a member of the replication group.

Data Node Addition
  HBase: Rebalancing is automatic.
  Cassandra: Rebalancing requires token-assignment adjustment.
  ONDB: New nodes are added through the Admin service, which automatically redistributes data across the new nodes.

Data Node Management
  HBase: Simple (roll in, roll out).
  Cassandra: Human admin action required.
  ONDB: Human admin action required.

Cluster Admin Nodes
  HBase: Zookeeper, NameNode, HMaster.
  Cassandra: All nodes are equal.
  ONDB: ONDB has a highly available Admin service for administrative actions, e.g. adding new nodes, replacing failed nodes, software updates, etc., but it is not required for steady-state operation of the service. There is a lightweight SNA process (described earlier) on each machine to ensure high availability and restart any data storage node in case of failure.

SPOF
  HBase: Now, all the admin nodes are fault tolerant.
  Cassandra: All nodes are equal.
  ONDB: There is no SPOF, as described in the availability section.

Write.ANY
  HBase: No, but replicas are node agnostic.
  Cassandra: Yes (writes never fail if this option is used).
  ONDB: No.

Write.ONE
  HBase: Standard; HA; strong consistency.
  Cassandra: Yes (often used); HA; weak consistency.
  ONDB: Yes. Requires that the master be reachable.

Write.QUORUM
  HBase: No (not required).
  Cassandra: Yes (often used with Read.QUORUM for strong consistency).
  ONDB: Yes. This is the default.

Write.ALL
  HBase: Yes (performance penalty).
  Cassandra: Yes (performance penalty, not HA).
  ONDB: Yes (performance penalty, not HA).

Asynchronous WAN Replication
  HBase: Yes, but it needs testing on corner cases.
  Cassandra: Yes (replicas can span data centers).
  ONDB: Asynchronous replication is routine in ONDB. Nodes local to the master will typically keep up, and nodes separated by high-latency WANs will have the changes replayed asynchronously via an efficient stream-based protocol.

Synchronous WAN Replication
  HBase: No.
  Cassandra: Yes, with Write.QUORUM or Write.EACH_QUORUM.
  ONDB: Yes, for requests that require acknowledgements (ReplicaAckPolicy.SIMPLE_MAJORITY or ReplicaAckPolicy.ALL). The acknowledging nodes will be synchronized with the master.

Compression Support
  HBase: Yes.
  Cassandra: Yes.
  ONDB: No.
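Since atomic compare and set comes up in several of the points above, here is a hedged sketch of a version-based check-and-update (a simple counter) with the Oracle NoSQL Database key/value API; the store name, host:port, and key are placeholder assumptions, and the retry loop is one possible way to use the conditional put operations.

    import oracle.kv.KVStore;
    import oracle.kv.KVStoreConfig;
    import oracle.kv.KVStoreFactory;
    import oracle.kv.Key;
    import oracle.kv.Value;
    import oracle.kv.ValueVersion;
    import oracle.kv.Version;

    import java.util.Arrays;

    public class OndbCompareAndSetSketch {
        public static void main(String[] args) {
            KVStore store = KVStoreFactory.getStore(
                    new KVStoreConfig("kvstore", "node01:5000"));  // placeholders

            Key counterKey = Key.createKey(Arrays.asList("counters", "pageviews"));

            // Retry loop: read the current value and its version, then update
            // only if no other writer has modified the record in the meantime.
            while (true) {
                ValueVersion vv = store.get(counterKey);
                long current = (vv == null)
                        ? 0L
                        : Long.parseLong(new String(vv.getValue().getValue()));
                Value next = Value.createValue(Long.toString(current + 1).getBytes());

                Version success;
                if (vv == null) {
                    // Only create the record if it still does not exist.
                    success = store.putIfAbsent(counterKey, next);
                } else {
                    // Only overwrite if the version is unchanged (compare and set).
                    success = store.putIfVersion(counterKey, next, vv.getVersion());
                }
                if (success != null) {
                    break;  // the conditional write succeeded
                }
                // Otherwise another writer won the race; re-read and retry.
            }
            store.close();
        }
    }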