© 2018 JETIR October 2018, Volume 5, Issue 10 www.jetir.org (ISSN-2349-5162)

QUALITATIVE COMPARISON OF KEY-VALUE BIG DATA DATABASES

¹Ahmad Zia Atal, ²Anita Ganpati
¹M.Tech Student, ²Professor, ¹Department of Computer Science, ¹Himachal Pradesh University, Shimla, India

Abstract: Companies are increasingly looking to big data to deliver valuable business insights that cannot be handled by a traditional Relational Database Management System (RDBMS). As a result, a variety of big data database options have emerged. For the past 30 years companies have relied on RDBMSs, but these are now being replaced by big data technologies. All big data technologies are intended to overcome the limitations of the RDBMS by enabling organizations to extract value from their data. In this paper, three key-value databases are discussed and compared on the basis of general database features and system performance features.

Keywords: Big data, NoSQL, RDBMS, Riak, Redis, Hibari.

I. INTRODUCTION

Systems designed to store big data are often called NoSQL databases, since they do not necessarily depend on the SQL query language used by RDBMSs. Today NoSQL is the term for the class of databases that do not follow RDBMS principles and are specifically designed to handle the speed and scale of the likes of Google, Facebook, Yahoo, Twitter and many more [1]. Different types of NoSQL database are designed for different use cases. The major categories are key-value stores, column family stores, document databases and graph databases. Each of these technologies has its own benefits, and big data use cases in general benefit from them. The rest of this paper is organized as follows: Section II describes big data databases, Section III presents the features used to compare Redis, Riak and Hibari, Section IV presents the analysis and results, and Section V gives the conclusion.

II. BIG DATA DATABASES

As mentioned earlier, rapid advances in technology and an ever-increasing influx of new users mean that organizations need big data to deliver valuable business insights and to keep pace with the massive demands of the industry. The traditional RDBMS, which has been the standard for the past three decades, has severe limitations in handling these new requirements, and hence a variety of big data database options have emerged. While these new technologies differ from the RDBMS in many ways, they are all designed to overcome its limitations and enable organizations to extract value from their data. NoSQL databases in particular can perform at very high speeds because of features such as greater flexibility and scalability, a schema-free architecture, easy replication support, a simple API and eventual consistency (BASE). Big data databases are classified into four categories, namely key-value stores, column family stores, document databases and graph databases. In this review paper, only three key-value big data databases are compared, namely Redis, Riak and Hibari.

1. Redis

Redis is an in-memory key-value data store written in ANSI C by Salvatore Sanfilippo. Redis supports not only the string data type but also lists, sets, sorted sets and hashes, and provides a rich set of operations for working with these types. To achieve its high performance, Redis works with an in-memory dataset. Depending on the use case, the dataset can be persisted either by dumping it to disk every once in a while or by appending each command to a log. Persistence can optionally be disabled if only a feature-rich, networked, in-memory cache is needed. Used in this way Redis is very comparable to Memcached (an in-memory object caching system), but Redis is Memcached++: it not only supports rich data types, but also supports data replication and can save data to disk [2].

2. Riak

Riak is a distributed NoSQL database designed to deliver maximum data availability by spreading data across multiple servers. As long as a Riak client can reach one Riak server, it should be able to write data. The key/value design delivers a powerful yet simple data model for storing massive amounts of unstructured data. Big data applications face a variety of challenges, including tracking user or session information, storing connected-device data and replicating data across the globe. Riak KV is built to handle these challenges. To achieve fast performance and robust business continuity, Riak KV uses a masterless architecture and automates data distribution across the cluster, which ensures high availability and near-linear scaling on commodity hardware, so that capacity can be added quickly without an enormous operational burden [3].
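The two Redis persistence modes described in the Redis subsection (dumping the dataset to disk from time to time, or appending each command to a log) can be illustrated with a small sketch. This is not Redis code: the class `TinyStore`, its methods, the JSON file formats and the file names are all hypothetical and chosen only to show the trade-off between the two approaches.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory key-value store with two optional persistence modes.

    Hypothetical illustration only; it does not reproduce Redis internals.
    """

    def __init__(self, snapshot_path=None, log_path=None):
        self.data = {}
        self.snapshot_path = snapshot_path  # target of the periodic full dump
        self.log_path = log_path            # append-only command log

    def set(self, key, value):
        self.data[key] = value
        if self.log_path:
            # Log mode: append every write command as one JSON line.
            with open(self.log_path, "a", encoding="utf-8") as log:
                log.write(json.dumps(["SET", key, value]) + "\n")

    def get(self, key):
        return self.data.get(key)

    def save_snapshot(self):
        # Snapshot mode: write the whole dataset to disk in one go.
        with open(self.snapshot_path, "w", encoding="utf-8") as snap:
            json.dump(self.data, snap)

    @classmethod
    def recover(cls, snapshot_path=None, log_path=None):
        """Rebuild the in-memory dataset after a restart."""
        store = cls()
        if snapshot_path and os.path.exists(snapshot_path):
            with open(snapshot_path, encoding="utf-8") as snap:
                store.data = json.load(snap)
        if log_path and os.path.exists(log_path):
            # Replay the logged commands in the order they were written.
            with open(log_path, encoding="utf-8") as log:
                for line in log:
                    op, key, value = json.loads(line)
                    if op == "SET":
                        store.data[key] = value
        store.snapshot_path, store.log_path = snapshot_path, log_path
        return store


def demo():
    """Write with each mode, simulate a restart, return what each recovers."""
    with tempfile.TemporaryDirectory() as tmp:
        snap = os.path.join(tmp, "dump.json")
        log = os.path.join(tmp, "commands.log")

        a = TinyStore(snapshot_path=snap)
        a.set("user:1", "alice")
        a.save_snapshot()
        a.set("user:2", "bob")  # written after the last snapshot
        from_snapshot = TinyStore.recover(snapshot_path=snap).data

        b = TinyStore(log_path=log)
        b.set("user:1", "alice")
        b.set("user:2", "bob")
        from_log = TinyStore.recover(log_path=log).data
        return from_snapshot, from_log
```

In this sketch the snapshot-only store loses the write made after the last dump, while the log-based store recovers both writes at the cost of a disk append on every command.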

JETIR1810401 Journal of Emerging Technologies and Innovative Research (JETIR) www.jetir.org 1


Even though Riak's design suits many applications, there are inevitable tradeoffs in the query options and data types available. A key/value model has no concept of columns or rows, so Riak has no join operations. Riak can be queried either directly via its HTTP API or through various client libraries. However, no SQL or SQL-like language is currently available [4].
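As a rough illustration of querying Riak directly over HTTP, the sketch below only builds the requests and does not send them. It assumes a node reachable at `127.0.0.1` on port 8098 and the `/buckets/<bucket>/keys/<key>` URL layout of Riak KV's HTTP interface; the helper names, the bucket `sessions` and the key are made up for the example.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumed local node address; adjust for a real cluster.
RIAK_BASE = "http://127.0.0.1:8098"


def object_url(bucket, key, base=RIAK_BASE):
    """URL of one object: the bucket and key are the only 'query' there is."""
    return f"{base}/buckets/{quote(bucket, safe='')}/keys/{quote(key, safe='')}"


def store_request(bucket, key, value):
    """Build (but do not send) a PUT that stores a plain-text value."""
    return Request(
        object_url(bucket, key),
        data=value.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain"},
    )


def fetch_request(bucket, key):
    """Build (but do not send) a GET that reads the value back by key."""
    return Request(object_url(bucket, key), method="GET")
```

Against a running node, either request could be sent with `urllib.request.urlopen(...)`. Every operation addresses a single bucket and key, which is why there is no equivalent of a join or a `WHERE` clause.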

3. Hibari

Hibari is a database for highly reliable, highly available storage of so-called big data. Hibari can be used in cloud computing applications such as webmail, social networking services (SNS) and other services that must store terabytes or petabytes of new data every day. Hibari, developed by Gemini, is based on the distributed non-relational database technologies of a key-value store and chain replication. These technologies bring the benefits of low cost and high reliability by enabling data storage on tens or hundreds of PC servers instead of expensive special-purpose storage appliances such as Storage Area Networks (SANs). Development started in 2005, and Hibari has since been deployed at some large telecom operators, storing everything from SNS digital goods to cloud mail for millions of users. Hibari is developed in Erlang and is released under the Apache license. Hibari provides a versatile set of client APIs, including Amazon S3, JSON-RPC (RFC 4627) and Universal Binary Protocol. Hibari supports Java, C/C++, Python, Ruby and Erlang [58]. Hibari is a distributed, ordered key-value store with a strong consistency guarantee [5].
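The idea behind chain replication, on which Hibari is based, can be sketched in a few lines. The following is a simplified single-process model, not Hibari's Erlang implementation: the `Brick` and `Chain` classes and their methods are hypothetical, and real systems also handle acknowledgements, chain repair and reconfiguration over the network.

```python
class Brick:
    """One storage server in a chain."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Chain:
    """Toy chain replication: writes enter at the head, reads hit the tail."""

    def __init__(self, names):
        self.bricks = [Brick(n) for n in names]

    def put(self, key, value):
        # The write is applied brick by brick, head first and tail last.
        # It is acknowledged only after the tail has stored it.
        for brick in self.bricks:
            brick.data[key] = value
        return "ok"

    def get(self, key):
        # Reading from the tail returns only fully replicated writes,
        # which is the source of the strong consistency guarantee.
        return self.bricks[-1].data.get(key)

    def fail(self, name):
        # Drop a failed brick; the survivors still form a valid chain.
        self.bricks = [b for b in self.bricks if b.name != name]

    def scan(self, start, end):
        # Return keys in sorted order, as an ordered key-value store would.
        tail = self.bricks[-1].data
        return [(k, tail[k]) for k in sorted(tail) if start <= k < end]
```

Because every brick in the chain holds every acknowledged write, losing any single brick (including the tail) leaves the data readable from the new tail.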

III. COMPARISON FEATURES

Some of the relevant database features considered in this study are API, availability, scalability, MapReduce and durability. API stands for Application Programming Interface: the language and message format used by an application program to communicate with the database management system. Availability means that a system is robust and likely to operate continuously, without failure, for a long time. Scalability refers to the ability of a system to handle a growing amount of work, or to perform more total work in the same elapsed time when processing power is added to accommodate growth. MapReduce is a programming model for processing and generating large data sets on clusters of computers; it consists of two main tasks, namely Map and Reduce. Durability is the property that ensures committed transactions are saved permanently and do not unintentionally disappear or get deleted, even during a database crash. It is usually achieved by saving all transactions to a non-volatile storage medium.
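The Map and Reduce tasks mentioned above can be illustrated with the classic word-count example. The sketch below runs in a single process and is independent of any particular database; the function names are chosen for the illustration, and a real MapReduce framework would distribute the map and reduce calls across the nodes of a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map over every document, then reduce each group of values."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big value"])` counts `big` twice and `data` and `value` once each.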

IV. RESULTS

The API used in Redis is JSON and in Riak is HTTP, whereas the APIs used in Hibari are Amazon S3 and JSON-RPC. The secondary database models of Redis are document store, graph DBMS and time-series DBMS, whereas the secondary database model of the other two is the key-value store. Table 4.1 uses ten general features to distinguish the Redis, Riak and Hibari big data key-value database systems: API, primary database model, secondary database models, SQL, in-memory capabilities, user concepts, MapReduce, implementation language, supported operating systems and supported programming languages.

Table 4.1: Comparative analysis of Redis, Riak and Hibari based on general features

| S. No. | Features | Redis | Riak | Hibari |
|---|---|---|---|---|
| 1 | API | JSON | HTTP, built for web page [6] | Amazon S3, JSON RPC, universal binary protocol |
| 2 | Primary database model | Key-value store [7] | Key-value store [8] | Key-value store [10] |
| 3 | Secondary database models | Document store, Graph DBMS, Time series DBMS [7] | Key-value store [8] | Key-value store [10] |
| 4 | SQL | No [7] | No [8] | No [12] |
| 5 | In-memory capabilities | Yes [7] | Yes [9] | Yes [12] |
| 6 | User concepts | Simple password-based access control | No | No |
| 7 | MapReduce | No | Yes [67] | No |
| 8 | Implementation language | C [13] | Erlang [9] | Erlang [11] |
| 9 | Supported operating systems | , Oracle , Amazon Linux, (All 64 bit & version) [13] | Amazon Web Services, Ubuntu, Mac OS X, Windows Azure [6] | Linux, FreeBSD, Mac OS X, Windows. Hibari will run on any operating system that supports the Erlang VM [12] |
| 10 | Supported programming languages | C, C#, C++, Clojure, Crystal, Dart, Elixir, Tcl, Erlang, Fancy, Go, Haskell, Haxe, Java, D, JavaScript, Lisp, Lua, MatLab, Objective-C, OCaml, Pascal, Perl, PHP, Prolog, Pure Data, Python, Rebol, Ruby, Rust, Swift, Scala, Scheme, Smalltalk, Visual Basic [14] | C, C#, C++, Clojure, Dart, Erlang, Go, Groovy, Haskell, Java, JavaScript, Lisp, Perl, PHP, Python, Ruby, Scala, Smalltalk [6] | Java, C/C++, Python, Ruby and Erlang [12] |

The system performance features considered are data storage, query language, fault tolerance, user access, availability, scalability, durability and in-memory capability. The application programming interfaces (APIs) supported by Redis, Riak and Hibari are JSON; HTTP; and Amazon S3, JSON-RPC, etc., respectively. The data storage used in Redis is volatile memory and the file system, and in Riak is Bitcask, LevelDB or volatile memory, whereas in Hibari keys are stored in RAM and values can be stored in RAM or on disk. All three databases support the Linux, Mac and Windows operating systems, and Redis supports more programming languages than Riak and Hibari. The remaining system performance features are given in Table 4.2.

Table 4.2: Comparative analysis of Redis, Riak and Hibari based on system performance features

| No. | Features | Redis | Riak | Hibari |
|---|---|---|---|---|
| 1 | Data storage | Volatile memory, file system | Bitcask, LevelDB, volatile memory [6] | Keys are stored in RAM; values can be stored in RAM or on disk [12] |
| 2 | Query language | API calls, Lua [15] | HTTP, JavaScript, Erlang, Protocol Buffers | HTTP, Erlang APIs |
| 3 | Fault tolerance | As Redis has a master/slave architecture, all the slaves contain exactly the same data as the master [63] | You can lose access to many nodes due to network partition or hardware failure without losing data [8] | High fault tolerance by replicating data between servers. Data is repaired automatically after a server failure [11] |
| 4 | User access | Simple password-based access control | Not password based [9] | Not password based |
| 5 | Availability | Due to its in-memory nature, the project maintainer's commitment to keeping complexity at a minimum and an event-based programming model, Redis boasts exceptional performance for read and write operations [15] | Data is replicated and retrieved intelligently by Riak so that data is available for read and write operations [8] | Each key can be replicated multiple times. If the admin server fails and is restarted on a standby node, the rest of the cluster can continue normal operation. If another brick fails while the admin server is restarting, then clients may see service interruptions [11] |
| 6 | Scalability | An open source key-value store. It uses an in-memory dataset to give the best performance [15] | Riak automatically distributes data around the cluster and yields a near-linear performance increase as you add capacity [8] | The total storage and processing capacity of a Hibari cluster increases linearly as machines are added to the cluster [11] |
| 7 | Durability | Yes [7] | Yes [9] | Yes |
| 8 | In-memory capabilities | Yes [16] | Yes [17] | Yes [11] |

It can be inferred from Table 4.1 and Table 4.2 that Riak is better than Hibari because it supports MapReduce and, as shown in Table 4.1, supports more programming languages. Redis is better than Riak because it is implemented in C, which is a very familiar language, whereas Riak is implemented in Erlang, which is less familiar. Therefore, Redis is the most popular of the three databases.

V. CONCLUSION

Big data is an extensive volume of both structured and unstructured data that is so large that it is hard to process using traditional database systems and software techniques. Big data databases have become crucial in the present scenario because a large amount of data is generated daily from social media posts, multimedia, etc. The objectives of this study were to gain a deeper understanding of big data concepts and dimensions and of the different big data database systems and their classification, and to perform a qualitative comparative study of key-value big data database systems. Each system has its specific use, and none fits all requirements. The research methodology followed a theoretical approach to the study of big data, the dimensions of big data, big data databases, the various types of key-value databases and their comparison based on database parameters. This work specifically studied key-value databases. The three key-value databases considered were Redis, Riak and Hibari, and a qualitative comparison of the three was made based on different database features. It was concluded that the Redis key-value database is better than the other two on the majority of the parameters.

VI. FUTURE WORK

In this work, three key-value databases were compared. In future, more key-value databases such as Memcached, Hazelcast, Aerospike and Oracle NoSQL can be compared. Since only key-value databases were considered, multi-model databases such as Amazon DynamoDB and OrientDB can also be considered later. This work made a qualitative comparison of three key-value databases, so a quantitative, tool-based comparison can be made later.

REFERENCES

[1] Vaish, Gaurav. Getting Started with NoSQL. Packt Publishing Ltd, 2013.
[2] Ahmed, Jeelani. "A novel Redis security best practices for databases", International Journal of Engineering Sciences & Research Technology, vol. 5, issue 3, pp. 730-736, 2016.
[3] http://docs.basho.com/riak/kv Accessed on 4/04/2018 at 10:45 A.M.
[4] Han, Jing, et al. "Survey on NoSQL database." Pervasive Computing and Applications (ICPCA), 2011 6th International Conference on. IEEE, 978-1-4577-0208-2, 2011.
[5] https://github.com/hibari/hibari Accessed on 16/03/2018 at 11:30 P.M.
[6] Gaspar, Drazena and Ivica Coric, eds. Bridging Relational and NoSQL Databases. IGI Global, 2017.
[7] Das, Vinoo. Learning Redis. Packt Publishing Ltd, 2015.
[8] Frampton, Mike. Complete Guide to Open Source Big Data Stack. Apress, ISBN 978-1-4842-2149-5, 2017.
[9] Fowler, Adam. NoSQL for Dummies. John Wiley & Sons, 2015.
[10] Hibari, https://db-engines.com/en/system/Hibari Accessed on 14/04/2018 at 11:30 P.M.
[11] https://github.com/hibari/hibari Accessed on 20/04/2018 at 10:30 P.M.
[12] Comparison of Hibari and Riak, https://www.itqlick.com/Compare/hibari/riak Accessed on 2/04/2018 at 11:30 P.M.
[13] Harrison, Guy. Next Generation Databases: NoSQL, NewSQL, and Big Data.
[14] https://redis.io/clients Accessed on 17/04/2018 at 9:43 P.M.
[15] https://www.credera.com/blog/technology-insights/java/redis-explained-5-minutes-less Accessed on 13/04/2018 at 9:15 A.M.
[16] https://en.wikipedia.org/wiki/In-memory_processing Accessed on 15/04/2018 at 7:13 A.M.
[17] http://docs.basho.com/riak/kv/2.2.3/setup/planning/backend/memory/ Accessed on 13/04/2018 at 6:30 A.M.
