60 Informatica Economică vol. 21, no. 1/2017

A Maturity Analysis of Big Data Technologies

Radu BONCEA, Ionuț PETRE, Dragoș-Marian SMADA, Alin ZAMFIROIU
ANAGRAMA
[email protected], [email protected], [email protected], [email protected]

In recent years Big Data technologies have been developed at a faster pace due to the increase in demand from applications that generate and process vast amounts of data. Cloud Computing and the Internet of Things are the main drivers for developing enterprise solutions that support Business Intelligence, which in turn creates new opportunities and new business models. An enterprise can now collect data about its internal processes, process this data to gain new insights and business value, and make better decisions. This is the reason why Big Data is now seen as a vital component in any enterprise architecture. In this article the maturity of several Big Data technologies is put under analysis. For each technology several aspects are considered, such as development status, market usage, licensing policies, availability of certifications, adoption, and support for cloud computing and the enterprise.
Keywords: Big Data Technologies, Big Data maturity, Big Data comparative analysis

1 Big Data Overview
Driven by the need to generate business value, the enterprise has started to adopt Big Data as a solution, migrating from classical data stores which lack flexibility and are not optimized enough [1]. The changes in the environment make big data analytics attractive to all types of organizations, while the market conditions make it practical. The combination of simplified models for development, commoditization, a wider palette of data management tools, and low-cost utility computing has effectively lowered the barrier to entry [2]. The concept addresses large volumes of complex, rapidly growing data sets that may come from different autonomous sources.
In recent approaches, Big Data is characterized by principles known as the 4V – Volume, Variety, Velocity and Veracity [3]. There are opinions about accepting other principles as Big Data characteristics, such as Value.
Each day more businesses realize that Big Data is relevant, as applications generate large volumes of data automatically, from different data sources, centralized or autonomous. As traditional databases hit limitations when this data needs to be analyzed, dedicated solutions must be considered.

Important Big Data solutions:
 Apache HBase/Hadoop is based on Google's BigTable distributed storage system and runs on top of Hadoop as a distributed and scalable big data store. This means that HBase can leverage the distributed processing paradigm of the Hadoop Distributed File System (HDFS) and benefit from Hadoop's MapReduce programming model. It combines the scalability of Hadoop with real-time data access as a key/value store and the deep analytic capabilities of MapReduce [4].
HBase allows querying for individual records as well as deriving aggregate analytic reports across a massive amount of data. It can host large tables with billions of rows and millions of columns, running across a cluster of commodity hardware. HBase is composed of three types of servers in a master-slave type of architecture. Region servers are responsible for serving data for reads and writes. When accessing data, clients communicate with HBase Region Servers directly. Region assignment and DDL operations (create, delete tables) are handled by the HBase Master process.
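The split of responsibilities in HBase described above (clients going to region servers for reads and writes, the master handling region assignment) can be illustrated with a minimal Python sketch. The class and method names here are illustrative stand-ins, not HBase's actual API:

```python
import bisect

class RegionServer:
    """Serves reads and writes for one contiguous range of row keys."""
    def __init__(self):
        self.rows = {}

    def put(self, row_key, value):
        self.rows[row_key] = value

    def get(self, row_key):
        return self.rows.get(row_key)

class HBaseMaster:
    """Handles region assignment; it is not on the read/write path."""
    def __init__(self, split_keys):
        # Region i serves row keys in [split_keys[i-1], split_keys[i])
        self.split_keys = split_keys
        self.servers = [RegionServer() for _ in range(len(split_keys) + 1)]

    def region_for(self, row_key):
        # Clients resolve (and cache) the region, then talk to it directly
        return self.servers[bisect.bisect_right(self.split_keys, row_key)]

master = HBaseMaster(split_keys=["g", "p"])   # three regions: [..g), [g..p), [p..)
server = master.region_for("monkey")          # locate the region once...
server.put("monkey", {"cf:animal": "yes"})    # ...then read/write directly
```

The point of the design is visible even in the toy: once a client knows the key-to-region mapping, the master is out of the data path entirely.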

DOI: 10.12948/issn14531305/21.1.2017.05

 Apache Cassandra is a distributed database used for the administration and management of large amounts of structured data across multiple servers, while providing highly available service with no single point of failure. It provides features such as continuous availability, linear scale performance, and data distribution across multiple data centers and cloud availability zones. Cassandra inherits its data architecture from Google's BigTable and borrows its distribution mechanisms from Amazon's Dynamo.
The nodes in a Cassandra cluster are completely symmetrical, all having identical responsibilities. Cassandra also employs consistent hashing to partition and replicate data. It has the capability of handling large amounts of data and thousands of concurrent users or operations per second across multiple data centers.
Cassandra has a hierarchy of caching mechanisms and carefully orchestrated disk I/O which ensures speed and data safety. Write operations are sent first to a persistent commit log (ensuring a durable write), then to a write-back cache called a memTable; when the memTable fills, it is flushed to a sorted string table – SSTable – on disk. A Cassandra cluster is organized as a ring, and it uses a partitioning strategy to distribute data evenly.

 VoltDB is a distributed SQL database intended for high-throughput transactional workloads on datasets which fit entirely in memory, where the data is automatically partitioned based on a column specified by the application developer. All data is stored in RAM; disk snapshots are periodically used to back up data, and an on-disk recovery log provides crash durability. Data is replicated to at least n+1 nodes to tolerate n failures. Tables may be replicated to every node for fast local reads, or sharded for linear storage scalability. VoltDB determines where each record goes without the user having to specify it.
All data operations in VoltDB are single-threaded, running each operation to completion before starting the next. VoltDB partitions both the data and the work. For best performance, the database tables and associated queries need to be partitioned so that the most common transactions can be run in parallel. Since each site operates independently, each transaction can be performed without the overhead of locking individual records that usually consumes processing time in traditional databases. VoltDB balances the requirements of maximum performance with the flexibility to accommodate less intense but equally important queries that cross partitions [5].

 MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling. In order to provide these features, MongoDB has a document-oriented data model that permits it to split up data across multiple servers, balance the load across a cluster, re-distribute documents automatically and route user requests. In case there is a need for more capacity, MongoDB allows the addition of new machines and can automatically manage data on the new machines. MongoDB doesn't require stored procedures, and the model and stored data have the same structure – BSON, which is similar to JSON. MongoDB lacks a series of features that are usually common amongst traditional relational databases, most notably joins and complex multi-row transactions; however, the decision not to implement these features has led to a greater scalability of MongoDB. The technology is widely used at this moment and has proved in recent years to be a fast and easy to use solution for handling Big Data.

 Redis represents an in-memory data structure store used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs and geospatial indexes with radius queries. Redis stores all data in RAM, allowing lightning fast reads and writes. It runs extremely efficiently in memory and handles high-velocity data, needing only simple standard servers to deliver millions of operations per second with sub-millisecond latency.
Redis is schema-less, but when one of its data structures (like a Hash or a Sorted Set) is used, users can take advantage of the in-memory operations to accelerate the way data is processed. The Sorted Set is a structure that combines the features of a hash table with those of a sorted tree. Each entry in a Sorted Set is a combination of a string "member" and a double "score". The member acts as a key in the hash, with the score acting as the sorted value in the tree. With this combination, data can be accessed directly either by member or by score value.
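The member/score duality of the Sorted Set described above can be sketched in a few lines of Python. This is a deliberate simplification: the real Redis commands are ZADD, ZSCORE and ZRANGEBYSCORE, and Redis keeps the score-ordered side in a skip list rather than sorting on demand:

```python
class SortedSet:
    """Toy model of a Redis Sorted Set: a hash from member to score gives
    O(1) access by member; an ordered view gives access by score range."""
    def __init__(self):
        self.scores = {}                      # member -> score

    def zadd(self, member, score):
        self.scores[member] = float(score)

    def zscore(self, member):                 # direct access by member
        return self.scores.get(member)

    def zrangebyscore(self, lo, hi):          # ordered access by score range
        ordered = sorted(self.scores.items(), key=lambda kv: (kv[1], kv[0]))
        return [m for m, s in ordered if lo <= s <= hi]

board = SortedSet()
board.zadd("alice", 120)
board.zadd("bob", 75)
board.zadd("carol", 90)
# board.zscore("bob") answers by member; board.zrangebyscore(80, 130)
# answers by score, in ascending score order
```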

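Cassandra's ring organization and consistent hashing, described earlier, can also be sketched. The details are simplified assumptions here — real Cassandra uses partitioners such as Murmur3 and pluggable replication strategies — but the placement idea is the same: a key's token determines its first owner on the ring, and replicas follow clockwise:

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Map a node name or partition key onto the ring (simplified hash)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    """Toy token ring: a key lands on the first node whose token is at or
    after the key's token, plus the next nodes clockwise as replicas."""
    def __init__(self, nodes, replication_factor=3):
        self.nodes = sorted(nodes, key=token)        # ring order by token
        self.tokens = [token(n) for n in self.nodes]
        self.rf = min(replication_factor, len(nodes))

    def replicas(self, key):
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42")   # three distinct replica nodes for this key
```

Because every node computes the same placement, any node can coordinate any request — which is what makes the symmetrical, no-single-point-of-failure design possible.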
2 The Maturity Model for BigData Solutions
Big Data management systems receive data from various sources, gathering a huge volume of data often represented by heterogeneous and diverse dimensionalities [3]. Different information collectors prefer their own schemata or protocols for data recording, and the nature of different applications also results in diverse data representations. Autonomous data sources with distributed and decentralized controls are a main characteristic of Big Data applications [6]. As a result, an analysis of some of the most popular Big Data technologies aims at supporting companies in selecting the proper solution for their needs.
In an adaptive organization, measurement and analysis can be valuable tools for understanding the present environment and evaluating the effectiveness of our actions. Advances in Internet and mobile technologies have dramatically expanded the scope and rate at which certain types of information can be collected [7].
This paper proposes to analyze the maturity of some of the most used Big Data technologies. In order to achieve its goal, the analysis is performed considering the following model:

Technical Maturity
 Development status – describes the current development status of the solution. The frequency of major and minor releases and the frequency of patches are noted; other indicators such as the number of active contributors and the average time for solving critical and blocking issues are also worth mentioning;
 Support libraries that can be used to integrate the solution using the most popular programming languages;
 Guidelines and documentation – should answer the question whether there is sufficient documentation for installation and administration;
 Cross-platform support – checks if the solution can be deployed on popular operating systems (Unix/Linux/Windows). Since the Internet of Things gained momentum, an important question to answer is whether the solution can be deployed on resource-constrained environments such as the Raspberry Pi;
 Professional Certification – which are the certification programs that give individuals or businesses valid certifications in the respective technology.

Business Maturity
 Market usage – measures the adoption by the enterprise;
 Cloud computing support – whether the Big Data solution is offered as a cloud service by the largest cloud computing providers;
 Licensing policy – analyzes the conditions for use – free or commercial;
 Enterprise support – does the solution offer enterprise support, and under which terms.

3 Technical Maturity of Big Data Solutions
This chapter presents an analysis regarding the software versions, minor and major releases, solving time of issues, number of contributors in technology development, support libraries for building client interfaces, documentation available to users and the offer of trainings and certifications.


3.1 Development Status
Definitions of terms as considered in this paper:
Major versions supported – represent the major releases of the software that are currently supported by the producers.
Minor versions – represent the minor releases of the software, versions that are released as improvements to a major version. Only stable releases are considered.
Active contributors – the number of contributors to the development of the software.
Average time for resolving critical issues (in days/issue) – represents the average period of time needed by the developers to solve a reported critical issue. It is calculated as the average solving time for the latest 20 reported issues. We used Atlassian JIRA to calculate the solving time for MongoDB, Hadoop, Cassandra and VoltDB, and GitHub for Redis. In the case of Redis we considered 10 critical bugs, as the other reported bugs dated from before 2012. Cases with exceptional solving times (more than 5 times the average) were considered exceptional situations and were excluded from the calculation.
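The averaging rule above amounts to a mean over the latest reported issues with exceptional cases excluded. A minimal sketch — the 5× cut-off follows the text, while the sample durations are invented for illustration:

```python
def average_resolution_days(durations, outlier_factor=5):
    """Mean solving time in days/issue, excluding exceptional issues that
    took more than `outlier_factor` times the initial average."""
    mean = sum(durations) / len(durations)
    kept = [d for d in durations if d <= outlier_factor * mean]
    return sum(kept) / len(kept)

sample = [2, 3, 1, 4, 2, 90]            # one exceptional 90-day issue
print(average_resolution_days(sample))  # 2.4 once the outlier is dropped
```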

Table 1. Development status of BigData solutions

                                            MongoDB  HBase/Hadoop  Cassandra  VoltDB  Redis
Number of major versions supported                2        2              2       2      2
Number of minor versions (last 12 months)        14        2             23       8      9
Number of minor versions (last 24 months)        22        3             52      33     17
Number of active contributors                   280      104/87         200      65    220
Average time for resolving critical
issues [days/issue]                             2.4        7            6.3    27.5     18

3.2 Support Libraries
This chapter describes the support provided for developing client interfaces in order to access the solutions. A client interface allows programs written in various programming languages to interact with the database by using native functions of the language. The client interface handles the requests and translates them into a standardized communication protocol with the database.
MongoDB uses the term driver for a client library that manages the interaction with the database in a language appropriate for the respective application. Beside the officially supported libraries, there are also two community-supported drivers, for Go and Erlang.
Hadoop has native implementations of certain components, for performance reasons and because of the non-availability of Java implementations. These components are available in a single, dynamically-linked native library called the native Hadoop library. The main components are HDFS, MapReduce and Yarn. It offers web service REST APIs, a set of URI resources that give access to the cluster, nodes and applications (the MapReduce Application Master and MapReduce History Server REST APIs).
For Cassandra, the recommended practice is to use CQL – the Cassandra Query Language. However, there are also client libraries that are supported by Cassandra.
VoltDB uses Java for stored procedures and for the primary client interface. Nevertheless, client interfaces can be written in a large variety of programming languages. A part of the client interfaces are developed and packaged inside the distribution kit; there are also client interfaces which are compiled and distributed in the form of a separate kit. Available client interfaces: C#, C++, Erlang, Go, Java (packaged with VoltDB), JDBC (packaged with VoltDB), JSON (packaged with VoltDB), Node.js, ODBC, PHP and Python.
There are many client libraries supported by Redis. For certain languages, there are clients that are recommended by Redis for use. New clients can be developed and added to the Redis repository; the Redis documentation provides instructions on how to create a client.
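Whatever the language, these client libraries ultimately speak the database's wire protocol. Redis's RESP, listed in Table 2, is simple enough to sketch: a command is sent as an array of bulk strings. The encoder below is a simplified, request-only illustration of that framing:

```python
def encode_resp_command(*args) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings:
    *<n>CRLF, then per argument $<len>CRLF<bytes>CRLF."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)

# encode_resp_command("SET", "key", "value")
# -> b'*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n'
```

Parsing the server's replies (simple strings, errors, integers, bulk strings, arrays) follows the same length-prefixed pattern, which is why Redis clients exist for so many languages.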

Table 2. Support libraries for BigData solutions

MongoDB
  Access methods: proprietary protocol on BSON/JSON
  Client libraries: C, C++, C#, Java, Node.js, Perl, PHP, Python, Ruby, Scala (Casbah provides a pure Scala driver)

HBase/Hadoop
  Access methods: Java API, RESTful HTTP API, Apache Thrift
  Client libraries: Java – Apache Yarn Client; Python – Hadoop YARN API, Snakebite (pure Python HDFS client); Ruby – webhdfs, a client for Hadoop WebHDFS; PHP – PHP-Hadoop-HDFS, a wrapper for WebHDFS and the CLI hadoop fs

Cassandra
  Access methods: CQL (Cassandra Query Language), RESTful HTTP/JSON API, JDBC
  Client libraries: Java – Achilles, Astyanax, Casser, Datastax Java driver, Kundera, PlayORM; Python – Datastax Python driver; Ruby – Datastax Ruby driver; C#/.NET – Cassandra Sharp, Datastax C# driver, Fluent Cassandra; NodeJS – Datastax Nodejs driver, Node-Cassandra-CQL; PHP – CQL | PHP, Datastax PHP driver, PHP-Cassandra, PHP Library for Cassandra; C++ – Datastax C++ driver, libQTCassandra; Scala – Datastax Spark connector, Phantom, Quill; Clojure – Alia, Cassaforte, Hayt; Erlang – CQerl, Erlcass; Go – CQLc, Gocassa, GoCQL; Haskell – Cassy; Rust – RustCQL

VoltDB
  Access methods: Java API
  Client libraries: C#, C++, Erlang, Go, Java, JDBC, JSON, Node.js, ODBC, PHP, Python

Redis
  Access methods: RESP – REdis Serialization Protocol
  Client libraries: ActionScript, Bash, C, C#, C++, Clojure, Common Lisp, Crystal, D, Dart, Delphi, Elixir, emacs lisp, Erlang, Fancy, gawk, GNU Prolog, Go, Haskell, Haxe, Io, Java, Julia, Lasso, Lua, Matlab, mruby, Nim, Node.js, Objective-C, OCaml, Pascal, Perl, PHP, PL/SQL, Pure Data, Python, R, Racket, Rebol, Ruby, Rust, Scala, Scheme, Smalltalk, Swift, Tcl, VB, VCL

3.3 Guidelines and Documentation
An analysis was performed on the documentation provided by each solution regarding installation, administration and other features.
MongoDB provides manuals for both installation and administration of the solution. The installation guidelines are given independently for Windows, Linux and OS X, both for MongoDB Community Edition and Enterprise. There are also resources available on installation using MongoDB Cloud Manager. The administration documentation addresses the ongoing operation and maintenance of MongoDB instances and deployments. It includes both high-level overviews as well as tutorials that cover specific procedures and processes for operating MongoDB. [8]


Developers can find detailed instructions on configuration, maintenance, upgrading, monitoring and backup. A guide on how to get started with MongoDB quickly is available in editions for the mongo Shell, Python, Java, Ruby, NodeJS, C++ and C#.
Hadoop/HBase provides documentation about the latest release notes, the native libraries and the security mode. It also provides documentation for the architectures, commands and tools of the main components (HDFS, MapReduce, Yarn), as well as detailed guidelines and documentation for integration with other systems, authentication methods and other useful tools.
VoltDB comes as either pre-built distributions or as source code. The installation documentation specifies the system requirements for running VoltDB, covers installation and upgrading, and describes the resources included in the distribution kit. The administration guidelines are detailed, providing information on how to properly design, develop and run an application on VoltDB; the manual addresses the vast majority of instructions required to properly administer the database. There is also documentation available regarding best practices, a Getting Started tutorial, performance and customization, the Java API and the client wire protocol.

Table 3. Guidelines and documentation of BigData solutions

MongoDB
  Platforms: Windows Vista / Server 2008 R2 and later, Windows Server 2012 & 2012 R2; OS X v10.7 and later; Linux – Amazon Linux 2013.03 and later, Debian 7 & 8, RHEL/CentOS 6.2 and later, Oracle Linux 6.5, SLES 11 & 12, Solaris 11 64-bit, Ubuntu LTS 12.04 & 14.04
  IoT platform support (Raspberry Pi, Arduino): Yes

HBase/Hadoop
  Platforms: supports a variety of platforms, but the following are recommended in production: Amazon Linux, Debian 7.1, RHEL/CentOS 6.2+, SLES 11+, Ubuntu LTS 12.04 & 14.04
  Other dependencies: JDK v8 (Java Development Kit)
  IoT platform support: Yes

Cassandra
  Platforms: Ubuntu 14.04 & 16.04, 64-bit; CentOS/RHEL 6.5, 6.6, 6.7, 64-bit; CentOS/RHEL 7.0, 7.1, 7.2, 64-bit; Oracle Linux 6.5, 64-bit
  Other dependencies: the latest version of Java 8, either Oracle Java Standard Edition 8 or OpenJDK 8; for using cqlsh, the latest version of Python 2.7
  IoT platform support: Yes

VoltDB
  Platforms: cross-platform
  IoT platform support: No

Redis
  Platforms: cross-platform
  IoT platform support: Yes


Cassandra provides documentation divided into several sections such as Installation, Operation, Data Modeling, Cassandra Tools, CQL and Configuration. As a drawback for the unexperienced user, the documentation is currently a work-in-progress and there are a number of sections which lack the necessary instructions. The installation manual contains instructions regarding the correct install method, the prerequisites required to run, and instructions for configuring a cluster of nodes. The administration manual still has some chapters that are in progress and do not contain information, such as Repair, Read Repair or Hints.
Redis provides a Quick Start manual for installation, with instructions on how to get it running properly. The complete Redis documentation is vast and divided into chapters such as Programming, Tutorials & FAQ, Administration, Troubleshooting and Redis Cluster. A list containing all the commands implemented by Redis, with documentation about each of them, has also proven to be a very useful document for developers.

3.4 Professional Certification
MongoDB provides a Professional Certification Program which includes online classes and offers both public and private training. Registered attendees can take a Certification Exam. The certifications offered are:
 Developer Associate: adequate for those with knowledge of the fundamentals of application design and build using MongoDB – software engineers with a good knowledge of the fundamentals and with professional experience in app development.
 DBA Associate: adequate for administrators that understand MongoDB concepts and mechanics – operations professionals that master the theoretical fundamentals and already have experience in managing MongoDB.
Hbase/Hadoop – there are several certification partners of Apache, the most important being Cloudera [9] and MapR Academy [10]. Cloudera offers three certification levels – CCA Spark and Hadoop Developer, CCA Data Analyst, and Cloudera Certified Administrator for Apache Hadoop (CCAH). MapR offers courses on-demand or instructor-led.

Table 4. Certifications for BigData solutions

MongoDB:       own certification system
HBase/Hadoop:  provided in partnership – Cloudera, MapR (MCHBD); certification practice questions
Cassandra:     provided in partnership – DataStax, which offers DataStax Enterprise, production-certified Apache Cassandra with 24 x 7 x 365 support
VoltDB:        own certification system
Redis:         does not provide certifications

Cassandra – training and certifications are offered by DataStax. There are also partnerships between DataStax and third parties, such as O'Reilly, to provide certifications.
 Self-paced course – beginner courses offered online. They address the basic challenges encountered when it comes to scaling relational databases, and introduce Cassandra fundamentals – consistency, replication, anti-entropy operations, data modeling and the use of DataStax


Enterprise.
 Instructor-led training – this is considered an expert-level training, performed on-site, delivering a practical approach. The course also focuses on team-work based development. [11]
VoltDB offers courses and certification. The VoltDB University Certification is designed to provide training and certifications for partner organizations, customers or individuals. Basic lessons and presentations are available online.
Redis – at this moment Redis does not provide an official certification program. There are trainings offered by third parties, but without the possibility of obtaining an official certification from Redis.

4 Business Maturity of Big Data Solutions
4.1 Market Usage
In this chapter the market adoption of Big Data technologies is highlighted, along with cases of successful developments of Big Data analytics systems.
MongoDB is widely used and has proved in recent years to be a fast and easy to deploy solution to handle Big Data. It is vastly used in major log systems, mobile applications, content-centric applications and Cloud applications. Some well-known implementations are [12]:
 LinkedIn used MongoDB's intuitive schema design and quick document search to develop LearnIn, an internal learning platform.
 Forbes now offers an end-to-end publishing platform built on MongoDB as a turnkey solution to other publishers, both as SaaS and as an on-premise solution, driving new revenue and expanding business.
 Gov.uk – the United Kingdom government website uses MongoDB to power its content API, providing data storage in the cloud. MongoDB was chosen for its scaling and data processing capabilities.
 Royal Bank of Scotland – MongoDB supports the global bank's enterprise data service, which is underpinning several core trading systems.
Hbase/Hadoop
 Facebook uses HBase for its messaging platform;
 Spotify uses HBase as a base for Hadoop and machine learning jobs;
 Sophos uses HBase for some of their back-end systems.
Cassandra
 CERN used a Cassandra-based prototype for its ATLAS experiment to archive the online DAQ system's monitoring information;
 Facebook used Cassandra to power Inbox Search, with over 200 nodes deployed;
 IBM has done research in building a scalable email system based on Cassandra;
 Discord uses Cassandra to store over 120 million messages per day;
 Wikimedia uses Cassandra as backend storage for its public-facing REST Content API.
VoltDB has a wide area of application, including high-velocity applications that thrive on fast streaming data, such as telecom policy and billing applications, sensor applications like smart grid power systems, real-time digital advertising platforms, analytics for online gaming, and applications for risk and fraud detection. VoltDB is successfully used by large companies [13]:
 Nokia Networks has deployed VoltDB to provide real-time data solutions that enhance mobile subscriber services through the Nokia Open Telecom Application Server.
 MAXCDN uses VoltDB to provide real-time analytics on their system.
 AsiaInfo – a leading supplier of IT solutions and services, chose VoltDB as a key component of its Big Data analytics product – Veris C3 – to perform real-time analytics and decision-making on fast-moving data. It uses a real-time data feed to deliver up-to-date information in order to optimize customers' experience of mobile marketing campaigns.
Redis is adequate as a highly scalable data store shared by multiple processes, multiple applications, or multiple servers.


Communication cross-platform, cross-server, or cross-application makes it a great choice for many use cases:
 GitHub uses Redis as a persistent key/value store for the routing information and a variety of other data;
 Digg rolled out a new feature, cumulative page event counters (page views plus clicks), that is using Redis as its underlying solution;
 StackOverflow uses Redis as a caching layer for the entire network.

4.2 Cloud Computing Support
In this chapter we analyze the availability of the given Big Data solutions on four major cloud providers. Cloud computing provides a reliable, fault-tolerant, available and scalable environment to harbour big data distributed management systems. Storing and processing big volumes of data requires scalability, fault tolerance and availability, and cloud computing delivers all these through hardware virtualization. Thus, big data and cloud computing are two compatible concepts, as the cloud enables big data to be available, scalable and fault tolerant. [14]
The list of the major cloud computing vendors is based on the ranking from Synergy Research Group. The market share of the four cloud providers is estimated at about 63% of the total market, with Amazon in the lead with 40% of the total market [15].

Table 5. Cloud support for BigData solutions

                      MongoDB  HBase/Hadoop  Cassandra  VoltDB  Redis
Amazon Web Services      x          x            x         x       x
Microsoft                x          x            x         x       x
IBM                      x          x            x         x       x
Google                   x          x            x         x       x

Cloud computing is another paradigm which promises theoretically unlimited on-demand services to its users. The virtualization of resources allows abstracting hardware, requiring little interaction with cloud service providers and enabling users to access terabytes of storage, high processing power, and high availability in a pay-as-you-go model [16]. While all the technologies analyzed in this paper are offered by the major cloud providers, the deployment methods differ.

4.3 Licensing Policy
The technologies presented in this paper are distributed under the following licenses:
The Apache License (ASL) is a permissive free software license written by the Apache Software Foundation (ASF). The license allows the user of the software the freedom to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software, under the terms of the license. Apache License 2.0 was released in January 2004 and includes easier usage for non-ASF projects, improved compatibility with GPL-based software, and allows the license to be included by reference instead of listed in every file.
The BSD 3-clause license allows the user almost unlimited freedom and redistribution for any purpose with the software, as long as its copyright notices and the license's disclaimers of warranty are maintained.
The GNU Affero General Public License is a free, copyleft license for software, specifically designed to ensure cooperation with the community in the case of network server software. Version 3 allows the transfer of a work formed by linking code licensed under the one license against code licensed under the other license, despite the licenses otherwise not allowing relicensing under the terms of each other.
The GNU General Public License is a widely used free software license, which guarantees end users the freedom to run, study, share and modify the software. Version 3 brings

changes in relation to software patents, free software license compatibility, the definition of "source code", and hardware restrictions on software modification.

Table 6. Licensing policy for BigData solutions

MongoDB:       AGPL version 3
HBase/Hadoop:  Apache License 2
Cassandra:     Apache License 2
VoltDB:        GPL version 3
Redis:         BSD 3-clause

4.4 Enterprise Support
Cassandra is distributed under the Apache License 2 and commercial distributions are provided by DataStax. DataStax Enterprise contains the only production-certified version of Cassandra on the market, along with other enterprise components like integrated analytics, search, multi-model functionality (including graph) and much more.
Impetus Technologies is a thought leader in Big Data working exclusively for ISVs and large enterprises. Impetus offers services around Cassandra and has supported multiple organizations in rolling out Cassandra-based solutions. Instaclustr provides managed Apache Cassandra hosting on Amazon Web Services, Google Cloud Platform, Microsoft Azure, and SoftLayer; Instaclustr also provides expert-level consultancy. [17]
Redis is open source software released under the terms of the three-clause BSD license. The commercial licensing model is subscription based: Redis Cloud is priced according to data capacity (both fixed and pay-as-you-go plans are available), whereas RLEC is licensed according to the number of shards (Redis processes) in a cluster. Redis Cloud offers a free-for-life tier that is limited to a single database of up to 30MB; RLEC is available as a free download that is only soft-limited by the number of shards.
A number of companies provide products which include Apache Hadoop, commercial support, and/or tools and utilities related to Hadoop:
 Amazon offers a version of Apache Hadoop on their EC2 infrastructure, sold as Amazon Elastic MapReduce;
 Apache Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem;
 Cloudera offers its enterprise customers a group of products and services that complement the open-source Apache Hadoop platform;
 DataStax provides a product which fully integrates Apache Hadoop with Apache Cassandra and Apache Solr in its DataStax Enterprise platform;
 IBM InfoSphere BigInsights brings the power of Apache Hadoop to the enterprise;
 Emblocsoft delivers an enterprise Hadoop edition based on Apache Hadoop to meet the demands of enterprise data processing: Big Data visualization and advanced analytics, real-time stream processing, machine learning at scale, and enterprise integration;
 DataTorrent is certified on Apache Hadoop and all leading distributions. The DataTorrent platform includes built-in fault tolerance and auto-scaling and can process billions of events/second. [18]
VoltDB is available in both an open source and an enterprise edition. The open source, or community, edition provides basic database functionality with all the transactional performance benefits of VoltDB. The enterprise edition provides additional features needed to support production environments, such as high availability, durability, and export integrations. The scaling is done automatically, as opposed to the community edition where the scaling must be performed manually. The enterprise edition also gives the benefit of unlimited customer support. [19]

DOI: 10.12948/issn14531305/21.1.2017.05

5 Conclusions
In a world where everything tends to be connected, Big Data has conquered the spotlight when it comes to innovation and competition, providing both challenges and opportunities in the global IT landscape. Collecting and analyzing huge volumes of data can empower business decisions and create competitive advantages for those who choose the appropriate Big Data solutions.
The vast majority of large companies have already adopted Big Data technologies to perform analytics. The present paper analyses five major technologies, widely adopted in the market, from two perspectives: technical maturity and business maturity.
The selection process for the appropriate technology stack can be difficult, but the existence of an entire class of performant solutions lowers the risk of making a bad decision. The solution should map best to the particular objectives of the company and solve its core issues, which is more complex than a regular price/performance evaluation. Another important consideration is the personnel's knowledge level, which will dictate what type of support is required.
This paper provides a perspective on Big Data solutions based on customer needs; depending on the customer's goals, the appropriate Big Data solution can be adopted. This involves matching the business requirements to the advantages and disadvantages of each solution. It is evidently unwise, and not recommended, to commit to a Big Data technology without first assessing its potential value and characteristics.

Acknowledgment
This work has been carried out as part of the project ID_34_462, SMIS 2014+, from Programul Operațional Competitivitate 2014-2020, Axa prioritară 1 – Crearea de laboratoare privind cercetarea datelor de mari dimensiuni în vederea dezvoltării unor produse inovative și a unor aplicații în domeniul internetul viitorului (Creating research laboratories for Big Data in order to develop innovative products and applications in the field of Future Internet).

References
[1] H. Hashem, D. Ranc, Pre-Processing and Modeling Tools for Big Data, Foundations of Computing and Decision Sciences, Vol. 41, No. 3, 2016, ISSN 0867-6356
[2] D. Loshin, Big Data Analytics, 2013, ISBN: 978-0-12-417319-4
[3] J. Zhang, M. L. Huang, Z. P. Meng, Visual Analytics for Big Data Variety and Its Behaviours, Computer Science and Information Systems, 12(4):1171–1191
[4] I. Lahmer, N. Zhang, Towards a Virtual Domain Based Authentication on MapReduce, IEEE Access, vol. 4, pp. 1658–1675, 2016
[5] VoltDB documentation, https://docs.voltdb.com/
[6] X. Wu et al., Data Mining with Big Data, IEEE Transactions on Knowledge and Data Engineering, 2014
[7] Big Data Now: 2014 Edition, Current Perspectives from O'Reilly Media, 2015, ISBN: 978-1-491-91736-7
[8] MongoDB Manual, docs.mongodb.com/manual/administration
[9] Cloudera certification, https://www.cloudera.com/more/training/certification.html
[10] MapR certification, www.mapr.com/services/mapr-academy/MapR-Certified-HBase-Developer
[11] DataStax courses, https://academy.datastax.com/courses
[12] Who uses MongoDB, www.mongodb.com/who-uses-mongodb
[13] VoltDB partners, www.voltdb.com/partners
[14] P. Caldeira-Neves, B. Schmerl, J. Bernardino, J. Cámara, Big Data in Cloud Computing: Features and Issues, International Conference on Internet of Things and Big Data, 2016
[15] Synergy Research Group, https://www.srgresearch.com/articles/microsoft-google-and-ibm-charge-public-cloud-expense-smaller-providers
[16] J. González-Martínez et al., Cloud Computing and Education: A State-of-the-Art Survey, Computers & Education, 80, pp. 132–151, 2015
[17] Third-party support for Cassandra, https://wiki.apache.org/cassandra/ThirdPartySupport
[18] Hadoop support, https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support
[19] VoltDB editions, https://www.voltdb.com/edition

Radu BONCEA is a Researcher at I.C.I. Bucharest and Project Manager at Anagrama. He has been involved in several large European projects such as SPOCS, eSENS, Cloud for Europe and The Once Only Principle. As a Ph.D. student in Electronics, Telecommunications and Information Technology, he is interested in IoT and Cloud Computing related technologies.

Ionuț PETRE graduated from the Faculty of Electronics, Telecommunications and Information Technology in 2005. He is a PhD student at the Lucian Blaga University of Sibiu, Faculty of Management. Currently he works as a Researcher at I.C.I. Bucharest and Anagrama. His main areas of interest are the Internet of Things, e-Government, Big Data, software engineering, digital libraries and transport research. He is involved in research projects specific to the Information Society. His research has been published in journal articles and conference proceedings.

Dragoș SMADA graduated from the Faculty of Electronics, Telecommunications and Information Technology in 2005. In 2012 he graduated from the Documents Information Management Master program organized by the University of Bucharest. Currently he works as a Senior Researcher at I.C.I. Bucharest and Anagrama. His main areas of interest are Big Data, the Internet of Things, software engineering, information security and software architecture. He has participated in both national and international research projects in the IT&C field. He has published, as author and co-author, journal articles and scientific presentations at conferences.

Alin ZAMFIROIU graduated from the Faculty of Cybernetics, Statistics and Economic Informatics in 2009. In 2011 he graduated from the Economic Informatics Master program organized by the Bucharest University of Economic Studies, and in 2014 he finished his PhD research in Economic Informatics at the same university. Currently he works as a Senior Researcher at I.C.I. Bucharest and Anagrama. He has published, as author and co-author, journal articles and scientific presentations at conferences.
