Oracle NoSQL Database Compared to Riak
Overview

Oracle NoSQL Database is licensed under the AGPL, while Riak is licensed under Apache 2.0.

Oracle NoSQL Database uses BerkeleyDB in its storage layer. In many respects it is a commercialization of the early NoSQL implementations that led to the adoption of this category of technology. Several of the earliest NoSQL solutions were based on BerkeleyDB, and some still are to this day, e.g. LinkedIn's Voldemort.

Oracle NoSQL Database is a Java-based key-value store. It supports a value abstraction layer that currently implements Binary and JSON types. Its key structure is designed to facilitate large-scale distribution and storage locality, with range-based search and retrieval. The implementation uniquely supports built-in cluster load balancing and a full range of transaction semantics, from ACID to relaxed eventual consistency. In addition, it is integrated with important open source technologies such as Hadoop/MapReduce and with an increasing number of Oracle software solutions and tools. It is also available on Oracle Engineered Systems.

Riak is an implementation of the Dynamo design. It uses vector clocks for consistency and partition-based hashing, and adds functionality such as links, MapReduce, secondary indexes, and full-text search. Riak is written in Erlang, a language originally developed at Ericsson.

Comparison

The sections below compare Oracle NoSQL Database and Riak features and capabilities at a high level, one feature/capability at a time. Low-level details are found in the Oracle and Riak online documentation, listed by title under "See also".

Data Model
Oracle NoSQL Database: Oracle NoSQL Database has a flexible key-value data model that uses a value abstraction layer. The value abstractions supported at this time are Binary and JSON (Avro). A table-structure value abstraction is coming soon.
Riak: Riak stores key/value pairs in a higher-level namespace called a bucket.
See also: Record Design Considerations; Avro Schemas; Buckets, Keys, and Values.

Storage Model
Oracle NoSQL Database: The storage model is a write-ahead logging implementation proven in millions of BerkeleyDB deployments. It is append-only, which enables efficient write throughput, and uses background compaction for space reclamation. The user controls write durability. At one end, writes can be acknowledged from memory on multiple nodes without an fsync. At the other, writes can be fully durable and synced to disk. Data is hashed into a fixed set of partitions, with shards as a logical overlay. Partitions can move between logical shards, but data must move at the granularity of whole partitions.
Riak: Riak has a modular, extensible local storage system that features pluggable backend stores designed to fit a variety of use cases. The default Riak backend store is Bitcask. You can also write your own storage backend for Riak using Riak's backend API.
See also: BDB Storage - NoSQL before NoSQL was cool; The evolution of BerkeleyDB; Riak Supported Storage Backends.

Data Access and Application Programming Interfaces (APIs)
Oracle NoSQL Database: Oracle NoSQL Database has client library APIs for Java and C. A Command Line Interface and a Javascript API are in the works.
Riak: In addition to raw Erlang access, Riak offers two primary APIs: HTTP and Protocol Buffers. Riak client libraries are wrappers around these APIs, and client support exists for dozens of languages.
See also: Client APIs; Client-Libraries; Community Developed Libraries and Projects.

Query Types and Query-ability
Oracle NoSQL Database: Oracle NoSQL Database provides key access methods (put, get, delete), including multi-key variations with streaming support for large result sets. The database can also be accessed using SQL, as an external table from within a relational database. It is integrated with, and can participate in, MapReduce operations from a Hadoop environment.
Riak: There are currently four ways to query Riak: primary key operations (GET, PUT, DELETE, UPDATE), MapReduce, Secondary Indexes, and Riak Search.
See also: Searching in Oracle NoSQL; External Table Support; NoSQL and MapReduce; Using Range Queries; Comparing MapReduce, Search, and Secondary Indexes.

Data Versioning and Consistency
Oracle NoSQL Database: Oracle NoSQL Database provides control over consistency and durability at the operation level. At one extreme, an operation can be fully ACID, flushing and syncing all data to disk before taking a quorum on the operation. At the other, it can be a fire-and-forget write into local or remote memory. Read consistency is obtained through quorum control. This ranges from requiring all holders of a copy of the data to agree, to simply taking the result from the first responder. This gives the developer fine-grained control for building both transactional and eventually consistent applications.
Riak: Riak uses a data structure called a vector clock to reason about the causality and staleness of stored values. Vector clocks let clients always write to the database. In exchange, consistency conflicts are resolved at read time by either application or client code. Vector clock growth can be bounded by pruning settings based on the clock's size and age. There is also an option to disable vector clocks and fall back to simple timestamp-based "last-write-wins".
See also: Flexible Consistency options; Vector Clocks; Why Vector Clocks Are Easy; Why Vector Clocks Are Hard.

Concurrency
Oracle NoSQL Database: Concurrency is controlled through replication groups with an elected master. Reads can be serviced from any node in a replication group. Writes are performed at the currently elected master and then replicated along a chain to the other replicas in the group. Read consistency is tied to concurrency and is controlled by quorum, version, timestamp, or all.
Riak: In Riak, any node in the cluster can coordinate a read/write operation for any other node. Riak stresses availability for writes and reads, and puts the burden of resolution on the client at read time.
See also: Durability Guarantees.

Replication
Oracle NoSQL Database: Oracle NoSQL Database supports replication for both availability and scalability. It uses a consistent hashing algorithm over a fixed, highly granular partition definition. Partitions are replicated in groups according to the latency demands of the application, as configured by a replication factor. A topology-aware driver is linked with the client application. Writes use the driver to hash inserts to the currently elected master. A cascading replication then occurs to the replicas belonging to the replication group where that master resides. Two things can be configured on a per-operation basis: how many replications must occur, and whether each replica receives the data in memory or on disk.
Riak: Riak's replication system is heavily influenced by the Dynamo paper and Dr. Eric Brewer's CAP Theorem. Riak uses consistent hashing to replicate and distribute N copies of each value around a Riak cluster composed of any number of physical machines. Under the hood, Riak uses virtual nodes to handle the distribution and dynamic rebalancing of data, thus decoupling the data distribution from physical assets. The Riak APIs expose tunable consistency and availability parameters that let you select the level of configuration best suited to your use case. Replication is configurable at the bucket level when data is first stored in Riak. Subsequent reads and writes to that data can have request-level parameters.
See also: Replication; Replication configuration; Clustering; Topologies; Reading, Writing, and Updating Data.

Scaling Out and In
Oracle NoSQL Database: Oracle NoSQL Database scales out by redistributing data partitions to newly added hardware resources. When new hardware is added to the system, an administrator can issue a request to rebalance the cluster through a browser-based console or the CLI. The administrator can simply let the rebalance run, or can throttle it, run it only during certain windows of time, pause it, and so on.
Riak: Riak allows you to elastically grow and shrink your cluster while evenly balancing the load on each machine. No node in Riak is special or has any particular role; in other words, all nodes are masterless. When you add a physical machine to Riak, the cluster is made aware of its membership via gossiping of ring state. Once the new node is a member of the ring, it is assigned an equal percentage of the partitions. It then takes ownership of the data belonging to those partitions. Removing a machine is the inverse of this process. Riak also ships with a comprehensive suite of command line tools to help make node operations simple and straightforward.
See also: Managing Topology Changes; Adding and Removing Nodes; Command Line Tools.

Multi-Datacenter Replication and Awareness
Oracle NoSQL Database: Oracle NoSQL Database supports multiple data centers through a non-electable replication group strategy. Read requests use local nodes, thanks to latency awareness in the client driver. Write availability is achieved with a local quorum, while data is also replicated to non-electable nodes in other data centers. As a result, a failure in a given data center has no impact on the read availability of the cluster as a whole, only possibly some increased latency. Writes are always performed at the currently elected master.
Riak: Riak features two distinct types of replication. Users can replicate to any number of nodes in one cluster (usually contained within one datacenter over a LAN) using the Apache 2.0 licensed database. Riak Enterprise, Basho's commercial extension to Riak, is required for multi-datacenter deployments, meaning the ability to run active Riak clusters in N datacenters.
See also: Riak Enterprise.
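The Multi-Datacenter Replication and Awareness row says two things about Oracle NoSQL Database. Reads use nearby nodes because of latency awareness in the client driver, and writes always go to the currently elected master. The following is a minimal Python sketch of that routing rule. It assumes a single replication group and a latency figure the client has measured for each node. The node names (rg1-rn1, rg1-rn2, rg1-rn3) and the ReplicationNode class are hypothetical. This is not the Oracle driver's API or its actual selection algorithm.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ReplicationNode:
    name: str          # hypothetical node identifier
    datacenter: str
    is_master: bool    # currently elected master of the replication group
    electable: bool    # False for remote-datacenter nodes; not used for routing here
    latency_ms: float  # client-observed round-trip time (assumed metric)


def route_read(nodes: List[ReplicationNode], reachable: Set[str]) -> str:
    """Reads may be served by any reachable node; prefer the lowest latency."""
    candidates = [n for n in nodes if n.name in reachable]
    if not candidates:
        raise RuntimeError("no node in the replication group is reachable")
    return min(candidates, key=lambda n: n.latency_ms).name


def route_write(nodes: List[ReplicationNode], reachable: Set[str]) -> str:
    """Writes always go to the currently elected master."""
    for n in nodes:
        if n.is_master and n.name in reachable:
            return n.name
    raise RuntimeError("master unreachable; the write must wait for a new election")


group = [
    ReplicationNode("rg1-rn1", "east", is_master=True, electable=True, latency_ms=1.2),
    ReplicationNode("rg1-rn2", "east", is_master=False, electable=True, latency_ms=0.9),
    ReplicationNode("rg1-rn3", "west", is_master=False, electable=False, latency_ms=38.0),
]
everyone = {n.name for n in group}

print(route_read(group, everyone))     # rg1-rn2 (local replica, lowest latency)
print(route_read(group, {"rg1-rn3"}))  # rg1-rn3 (east is down: still readable, slower)
print(route_write(group, everyone))    # rg1-rn1 (always the master)
```

The second read shows the behavior the row describes. Losing a data center does not make reads fail; they fall back to a remote, non-electable copy at higher latency.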
Graphical Monitoring/Admin Console
Oracle NoSQL Database: Oracle NoSQL Database provides proprietary, SNMP, and JMX based protocols for monitoring the cluster. The proprietary protocols are supported through both browser-based and CLI interfaces. SNMP and JMX facilitate integration into monitoring systems such as BMC and Ganglia.
Riak: Riak ships with Riak Control, an open source graphical console for monitoring and managing Riak clusters.
See also: Visual Admin Console; Standardized Monitoring Protocols; Command Line Admin; Riak Control; Introducing Riak Control.
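The Storage Model, Replication, and Scaling Out and In rows describe how Oracle NoSQL Database distributes data. Keys are hashed over a fixed, highly granular set of partitions. Rebalancing then moves whole partitions, never individual keys, onto new hardware. The following is a minimal Python sketch of that idea under stated assumptions: a tiny partition count, MD5 as the hash function, round-robin initial placement, and hypothetical shard names rg1 to rg4. The real store chooses its own hash, partition count, and placement policy.

```python
import hashlib
from typing import Dict, List, Tuple

NUM_PARTITIONS = 12  # fixed when the store is created (tiny value for illustration)


def partition_for(key: str) -> int:
    """Hash a key into one of the fixed partitions (MD5 chosen only for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def initial_assignment(shards: List[str]) -> Dict[int, str]:
    """Spread partitions round-robin over the initial shards."""
    return {p: shards[p % len(shards)] for p in range(NUM_PARTITIONS)}


def rebalance(assignment: Dict[int, str], new_shard: str) -> Tuple[Dict[int, str], List[int]]:
    """Move whole partitions from overloaded shards to new_shard.

    Keys never change partition; only the partition-to-shard mapping changes,
    so data moves at partition granularity.
    """
    shards = sorted(set(assignment.values()) | {new_shard})
    target = NUM_PARTITIONS // len(shards)
    counts = {s: 0 for s in shards}
    for owner in assignment.values():
        counts[owner] += 1

    updated = dict(assignment)
    moved = []
    for p in sorted(assignment):
        if counts[new_shard] >= target:
            break
        owner = updated[p]
        if counts[owner] > target:
            updated[p] = new_shard
            counts[owner] -= 1
            counts[new_shard] += 1
            moved.append(p)
    return updated, moved


before = initial_assignment(["rg1", "rg2", "rg3"])
after, moved = rebalance(before, "rg4")
p = partition_for("users/42")
print(p, before[p], after[p])  # same partition; its owning shard may have changed
print(moved)                   # [0, 1, 2]
```

Because the partition count is fixed, adding a shard changes only the partition-to-shard map. This is why, in this sketch, an administrator-driven rebalance can be throttled or paused between partition moves. The source describes Riak's virtual nodes as playing a similar role, decoupling data distribution from physical machines.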
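In the Data Versioning and Consistency row, Riak uses vector clocks so that clients can always write, with conflicts resolved at read time. The following is a minimal Python sketch of a textbook vector clock. It shows how two writes based on the same version are detected as concurrent (siblings) and how a merged, resolved write supersedes both. The actor IDs "client-a" and "client-b" are hypothetical. This is not Riak's internal representation, which also prunes clocks.

```python
from typing import Dict

VClock = Dict[str, int]  # actor id -> counter


def increment(clock: VClock, actor: str) -> VClock:
    """Return a copy of the clock with actor's counter bumped (a new write)."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: VClock, b: VClock) -> bool:
    """True if a has seen every event recorded in b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    a_has_b, b_has_a = descends(a, b), descends(b, a)
    if a_has_b and b_has_a:
        return "equal"
    if a_has_b:
        return "a_newer"
    if b_has_a:
        return "b_newer"
    return "concurrent"  # siblings: the application must resolve them


def merge(a: VClock, b: VClock) -> VClock:
    """Pairwise maximum: the clock of a value that has seen both a and b."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


base = increment({}, "client-a")  # {'client-a': 1}
x = increment(base, "client-a")   # client A updates
y = increment(base, "client-b")   # client B updates the same version concurrently
print(compare(x, y))              # concurrent
resolved = increment(merge(x, y), "client-a")
print(compare(resolved, x), compare(resolved, y))  # a_newer a_newer
```

With the "last-write-wins" option mentioned in the same row, there is no sibling detection. The write with the later timestamp would simply replace the other.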
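Both sides of the Data Versioning and Consistency row lean on quorums. Oracle NoSQL Database read consistency ranges from "all holders agree" to "first responder", and Riak exposes tunable request-level parameters. The usual Dynamo-style rule uses three quantities: N replicas, W replicas that must acknowledge a write, and R replicas consulted on a read. Under this rule, a read is guaranteed to include the latest acknowledged write exactly when R + W > N. The following is a minimal Python sketch that checks this rule exhaustively, assuming plain integer versions. N, W, and R are generic notation here. Riak's corresponding properties are named n_val, w, and r. The functions are illustrations, not either product's API.

```python
from itertools import combinations
from typing import List


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Exhaustively check whether every W-replica write set meets every R-replica read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def quorum_read(versions: List[int], responders: List[int]) -> int:
    """Return the newest version among the replicas that answered the read."""
    return max(versions[i] for i in responders)


# N = 3 replicas; the latest write (version 7) reached replicas 0 and 1 (W = 2).
versions = [7, 7, 6]
print(quorums_always_overlap(3, 2, 2))  # True: R + W > N
print(quorum_read(versions, [2, 0]))    # 7: any two responders include a fresh copy
print(quorums_always_overlap(3, 2, 1))  # False: a first-responder read may be stale
print(quorum_read(versions, [2]))       # 6
```

R = N corresponds to "all holders of a copy must agree", and R = 1 to "first responder". The latter trades freshness for latency, which is the trade-off both products let the developer choose per operation or per request.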