JETIR Research Journal
© 2018 JETIR October 2018, Volume 5, Issue 10 www.jetir.org (ISSN-2349-5162)

QUALITATIVE COMPARISON OF KEY-VALUE BIG DATA DATABASES

1Ahmad Zia Atal, 2Anita Ganpati
1M.Tech Student, 2Professor, 1Department of Computer Science, 1Himachal Pradesh University, Shimla, India

Abstract: Companies are increasingly looking to big data to deliver valuable business insights that cannot be handled by the traditional Relational Database Management System (RDBMS). As a result, a variety of big data database options have emerged. For the past 30 years, traditional RDBMSs have been used in companies, but they are now being replaced by big data technologies. All big data technologies are intended to overcome the limitations of RDBMS by enabling organizations to extract value from their data. In this paper, three key-value databases are discussed and compared on the basis of some general database features and system performance features.

Keywords: Big data, NoSQL, RDBMS, Riak, Redis, Hibari.

I. INTRODUCTION

Systems that are designed to store big data are often called NoSQL databases, since they do not necessarily depend on the SQL query language used by RDBMS. NoSQL today is the term used for the class of databases that do not follow RDBMS principles and are specifically designed to handle the speed and scale of the likes of Google, Facebook, Yahoo, Twitter and many more [1]. Different types of NoSQL database are designed for different use cases. The major categories of NoSQL databases are key-value stores, column family stores, document databases and graph databases. Each of these technologies has its own benefits, and big data use cases in general benefit from them.
In this paper, section I contains the introduction, section II describes big data databases, section III presents the features used to compare Redis, Riak and Hibari, section IV presents the analysis and final results, and section V presents the conclusion.

II. BIG DATA DATABASES

As mentioned earlier, owing to the rapid rise in technology and the ever-increasing influx of new users, organizations require big data to deliver valuable business insights and to keep pace with the massive demand of the industry. The traditional Relational Database Management Systems (RDBMS) that have been the standard for the past three decades have severe limitations in handling the new requirements, and hence a variety of big data database options have emerged. While these new technologies differ from the RDBMS in many ways, they are all designed to overcome the limitations of RDBMS and enable organizations to extract value from their data. NoSQL databases, in particular, can perform at very high speeds because of features such as greater flexibility and scalability, schema-free architecture, easy replication support, simple APIs and eventual consistency (BASE). Big data databases are classified into four categories, namely key-value stores, column family stores, document databases and graph databases. In this review paper, only three key-value big data databases are compared, namely Redis, Riak and Hibari.

1. Redis

Redis is an in-memory key-value data store written in ANSI C by Salvatore Sanfilippo. Redis supports not only the string data type but also lists, sets, sorted sets and hashes, and provides a rich set of operations for working with these types. To achieve its outstanding performance, Redis works with an in-memory dataset. Depending on the use case, it can persist the data either by dumping the dataset to disk every once in a while or by appending each command to a log.
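The two persistence styles just described (periodic dumps of the whole dataset versus a log of every command) can be illustrated with a toy store. This is a minimal Python sketch under stated assumptions, not Redis's actual implementation: the class `TinyStore`, its methods, the file names and the JSON on-disk formats are all hypothetical and chosen only for illustration.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory key-value store (hypothetical, not Redis itself)."""

    def __init__(self, snapshot_path, log_path):
        self.data = {}
        self.snapshot_path = snapshot_path
        self.log_path = log_path

    def set(self, key, value):
        self.data[key] = value
        # Append-only style: record every write command as one line.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(["SET", key, value]) + "\n")

    def snapshot(self):
        # Snapshot style: dump the whole dataset to disk in one go.
        with open(self.snapshot_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)

    def restore_from_snapshot(self):
        # Load the dataset as it was at the time of the last dump.
        with open(self.snapshot_path, encoding="utf-8") as f:
            self.data = json.load(f)

    def restore_from_log(self):
        # Rebuild the dataset by replaying each logged command in order.
        self.data = {}
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                op, key, value = json.loads(line)
                if op == "SET":
                    self.data[key] = value


def demo():
    """Write one key, snapshot, write another key, then restore both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        dump = os.path.join(tmp, "dump.json")
        log = os.path.join(tmp, "commands.log")
        store = TinyStore(dump, log)
        store.set("user:1", "alice")
        store.snapshot()
        store.set("user:2", "bob")  # written after the snapshot was taken

        from_snapshot = TinyStore(dump, log)
        from_snapshot.restore_from_snapshot()
        from_log = TinyStore(dump, log)
        from_log.restore_from_log()
        return from_snapshot.data, from_log.data
```

In this sketch, the store restored from the snapshot misses the write made after the dump, while the store restored from the command log recovers both keys. That is the usual trade-off between the two styles: snapshots are cheap to take occasionally but can lose recent writes, whereas a command log costs a disk append on every write.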
Persistence can optionally be disabled if only a feature-rich, networked, in-memory cache is needed. In that role Redis is very comparable to Memcached (an in-memory object caching system), but Redis can be thought of as Memcached++: it not only supports rich data types, but also supports data replication and can save data to disk [2].

2. Riak

Riak is a distributed NoSQL database designed to deliver maximum data availability by spreading data across multiple servers. As long as a Riak client can reach one Riak server, it should be able to write data. The key/value design delivers a powerful yet simple data model for storing massive amounts of unstructured data. Big data applications face a variety of challenges, including tracking user or session information, storing connected device data and replicating data across the globe. Riak KV is built to handle these challenges. To achieve fast performance and robust business continuity with a masterless architecture, Riak KV automates data distribution across the cluster, which ensures high availability, and it scales near-linearly on commodity hardware so that capacity can be added quickly without an enormous operational burden [3].

Even though Riak's design is suited to many applications, there are inevitable tradeoffs regarding the query options and data types that are available. There is no concept of columns or rows in a key/value model; therefore Riak has no join operations. Riak can be queried directly via HTTP, via the Protocol Buffers API, or through various client libraries. However, no SQL or SQL-like language is currently available [4].

3. Hibari

Hibari is a database for highly reliable, highly available storage of so-called big data.
Hibari can be used in cloud computing applications such as webmail, social networking services (SNS) and other services that require storage of terabytes or petabytes of new data daily. Hibari, developed by Gemini, is based on the distributed non-relational database technologies of a key-value store and chain replication. These technologies bring the benefits of low cost and high reliability by enabling data storage on tens or hundreds of PC servers instead of expensive special-purpose storage appliances such as Storage Area Networks (SANs). Development started in 2005, and Hibari has since been deployed at some large telecom operators, storing everything from SNS digital goods to Cloud Mail for millions of users. Hibari is written in Erlang and is released under the Apache license. Hibari provides versatile APIs including Amazon S3, JSON-RPC-RFC4627 and Universal Binary Protocol. Hibari supports Java, C/C++, Python, Ruby and Erlang [58]. Hibari is a distributed, ordered key-value store with a strong consistency guarantee [5].

III. COMPARISON FEATURES

Some of the relevant database features considered in this study are API, availability, scalability, MapReduce and durability. API stands for Application Programming Interface; it is the language and message format used by an application program to communicate with a database management system. Availability of a system means that it is likely to operate continuously without failure for a long time. Scalability refers to the ability of a system to handle a growing amount of work, or to perform more total work in the same elapsed time when processing power is expanded to accommodate growth. MapReduce is a programming model for processing and generating large data sets on clusters of computers; it consists of two main tasks, namely Map and Reduce.
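The Map and Reduce tasks can be illustrated with the classic word-count problem. The following is a minimal single-process Python sketch, assuming a small list of strings as input; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and a real MapReduce system would distribute these phases across a cluster rather than run them in one loop.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce sequentially over a list of documents."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["big data", "big key value data", "key"])` returns `{"big": 2, "data": 2, "key": 2, "value": 1}`. Because each document is mapped independently and each key is reduced independently, both phases can in principle run in parallel on different machines.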
Durability in databases is the property that ensures that committed transactions are saved permanently and do not accidentally disappear or get deleted, even during a database crash. It is usually achieved by saving all transactions to a non-volatile storage medium.

IV. RESULTS

The API used in Redis is JSON and in Riak is HTTP, whereas the APIs used in Hibari are Amazon S3 and JSON. The secondary database models of Redis are document store, graph DBMS and time-series DBMS, whereas the secondary database model for the other two is the key-value store. In Table 4.1, ten general features, namely API, primary database model, secondary database models, SQL, in-memory capabilities, user concepts, MapReduce, implementation language, supported operating systems and supported programming languages, are used to distinguish the Redis, Riak and Hibari big data key-value database systems.

Table 4.1: Comparative analysis of Redis, Riak and Hibari based on general features

| S. No. | Database Features | Redis | Riak | Hibari |
|---|---|---|---|---|
| 1 | API | JSON | HTTP, built for web page [6] | Amazon S3, JSON RPC, universal binary protocol |
| 2 | Primary database model | Key-value store [7] | Key-value store [8] | Key-value store [10] |
| 3 | Secondary database models | Document store, Graph DBMS, Time series DBMS [7] | Key-value store [8] | Key-value store [10] |
| 4 | SQL | No [7] | No [8] | No [12] |
| 5 | In-memory capabilities | Yes [7] | Yes [9] | Yes [12] |
| 6 | User concepts | Simple password-based access control | No | No |
| 7 | MapReduce | No | Yes [67] | No |
| 8 | Implementation language | C [13] | Erlang [9] | Erlang [11] |
| 9 | Supported operating systems | Ubuntu, Oracle Linux, Amazon Linux (all 64-bit versions) [13] | Amazon Web Services, Debian & Ubuntu, Mac OS X, Windows Azure [6] | Linux, FreeBSD, Mac OS X, Windows; Hibari will run on any operating system that supports the Erlang VM [12] |
| 10 | Supported programming languages | C, C#, C++, Clojure, Crystal, Dart, Elixir, Tcl, Erlang, Fancy, Go, Haskell, Haxe, Java, D, JavaScript, Lisp, Lua, MatLab, Objective-C, OCaml, Pascal, Perl, PHP, Prolog, Pure Data, Python, Rebol, Ruby, Rust, Swift, Scala, Scheme, Smalltalk, Visual Basic [14] | C, C#, C++, Clojure, Dart, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Perl, PHP, Python, Ruby, Scala, Smalltalk [6] | Java, C/C++, Python, Ruby and Erlang [12] |

The system performance features considered are data storage, query language, fault tolerance, user access, availability, scalability, durability and in-memory capability. The application programming interfaces (APIs) supported are JSON for Redis, HTTP for Riak, and Amazon S3, JSON RPC, etc. for Hibari.