An Intelligent Approach for Handling Complexity by Migrating from Conventional Databases to Big Data
Total Page:16
File Type:pdf, Size:1020Kb
S S symmetry Article An Intelligent Approach for Handling Complexity by Migrating from Conventional Databases to Big Data Shabana Ramzan 1, Imran Sarwar Bajwa 1,* and Rafaqut Kazmi 2 1 Department of Computer Science & IT, Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan; [email protected] 2 School of Computing, University of Technology Malaysia, Johor 81310, Malaysia; [email protected] * Correspondence: [email protected] Received: 26 October 2018; Accepted: 14 November 2018; Published: 3 December 2018 Abstract: Handling complexity in the data of information systems has emerged into a serious challenge in recent times. The typical relational databases have limited ability to manage the discrete and heterogenous nature of modern data. Additionally, the complexity of data in relational databases is so high that the efficient retrieval of information has become a bottleneck in traditional information systems. On the side, Big Data has emerged into a decent solution for heterogenous and complex data (structured, semi-structured and unstructured data) by providing architectural support to handle complex data and by providing a tool-kit for efficient analysis of complex data. For the organizations that are sticking to relational databases and are facing the challenge of handling complex data, they need to migrate their data to a Big Data solution to get benefits such as horizontal scalability, real-time interaction, handling high volume data, etc. However, such migration from relational databases to Big Data is in itself a challenge due to the complexity of data. In this paper, we introduce a novel approach that handles complexity of automatic transformation of existing relational database (MySQL) into a Big data solution (Oracle NoSQL). The used approach supports a bi-fold transformation (schema-to-schema and data-to-data) to minimize the complexity of data and to allow improved analysis of data. A software prototype for this transformation is also developed as a proof of concept. The results of the experiments show the correctness of our transformations that outperform the other similar approaches. Keywords: big data; complexity; NoSQL databases; Oracle NoSQL; data migration 1. Introduction The modern information systems have to deal with high-dimension data in terms of gigantic size, and the heterogenous and complex nature of the data. Similarly, the cloud applications and social media applications also have to store, manage and process a massive amount of data. However, the Relational Databases (RDBs) have fixed schema and allow storage and handling of only structured data in the form of tuples or relations [1]. Additionally, the RDBs only provide vertical scalability (vertical scalability allows only vertical growth of a data-structure by adding only new records at run-time.) at higher hardware cost but no horizontal scalability (horizontal scalability allows horizontal growth of a data-structure by also allowing the addition of fields at run-time.) is provided by the RDBs. Since horizontal scalability is needed by today’s software applications to handle high-speed heterogenous data; currently, the relational databases have to face various challenges at the application development level and operational level. At the application development level, the system developer needs high coding velocity to handle large number of users; however, such capability is not available in relational databases. Additionally, modern complex and heterogenous data needs horizontal scaling Symmetry 2018, 10, 698; doi:10.3390/sym10120698 www.mdpi.com/journal/symmetry SymmetrySymmetry2018, 102018, 698, 10, x FOR PEER REVIEW 2 of 202 of 20 databases and consequently, they fail to cope with the needs of modern data-intensive software but thatapplications. feature is also not provided by the relational databases and consequently, they fail to cope with the needsOnce of the modern key challenges data-intensive in recent software times applications. has been to handle high-speed data, as there is a rapid increaseOnce of thein digital key challenges information, in recent exponentially times has growing been to handle (see Figure high-speed 1) to Petabytes data, as there (PB) isPB a = rapid 1000 TB) increasefrom in Terabytes digital information, (TB) 1TB = exponentially1000 GB, and growingeven to Exab (seeytes Figure (EB)1) to1EB Petabytes = 1000 TB (PB) as PBshown = 1000 in Figure TB) 1. fromJohn Terabytes Gantz (TB) and 1TB David = 1000 Reinse GB, andalso even predicted to Exabytes this phenomenon (EB) 1EB = 1000 [2]. TBTypical as shown relational in Figure database1. Johnsystems Gantz and have David shown Reinse their also limits predicted for such this exponentia phenomenonl growth [2]. Typical of data. relational The sh databaseortcomings systems of typical haverelational shown their databases limits for are such addressed exponential by Big growth data ofsolutions data. The such shortcomings as NoSQL of databases typical relational [3–5]. Here, databasesNoSQL are stands addressed for “Not by Big Only data SQL”. solutions Such such databases as NoSQL are databasescurrently [the3–5 ].main Here, focus NoSQL of research stands for due to “Notthe Only fast SQL”. and Suchpersistent databases growth are of currently data. The the NoSQ mainL focus was ofintroduced research due in 1998 to the by fast Carlo, and persistentand the name growthgiven of data.to his Therelational NoSQL database was introduced solution that in 1998 was bydue Carlo, to not and using the Structured name given Query to his Language relational (SQL) database[6]. The solution idea thatof NoSQL was due was to redefined not using Structuredin 2009 and Query became Language the competitor (SQL) [6 ].of The RDBs. idea Now of NoSQL they have was redefinedbecome the in backbone 2009 and of became large-sized the competitor enterprises of such RDBs. as NowGoogle, they Twitter, have become Facebook, the Amazon, backbone etc. of due large-sizedto its peculiar enterprises features such such as Google, as high Twitter, availability Facebook, (when Amazon, a data etc.is automatically due to its peculiar distributed features evenly suchacross as high a availabilitycluster with (when no single a data master.), is automatically efficient performance, distributed evenly horizontal across sc a clusteralability, with and no the single support master.),of a efficientvariety performance,of data models horizontal and queries. scalability, More andover, the the support rapid of growth a variety of of cloud data modelscomputing and has queries.highlighted Moreover, the the problems rapid growth that ofare cloud endured computing in handling has highlighted large volumes the problems of data. that However, are endured NoSQL in handlingdatabases large can volumes handle of“Big data. Data” However, problems NoSQL efficiently databases rather can than handle RDBs. “Big These Data” problemsdatabases are efficientlybecoming rather popular than RDBs. because These they databasesare providing are becominga high level popular of scalability. because Additionally, they are providing they are a very highefficient level of in scalability. handling Additionally,the unstructured they data are to very facilitate efficient universal in handling data communication the unstructured [7] data in modern to facilitateinformation universal systems. data communication Relational [databases7] in modern follow information the ACID systems. (Atomicity, Relational Consistency, databases follow Isolation, the ACIDDurability) (Atomicity, and BASE Consistency, (Basically Isolation, available, Durability) Soft-state, and Eventual-consistency BASE (Basically available,) properties. Soft-state, Whereas, Eventual-consistency)NoSQL databases properties.exist in a spectrum Whereas, between NoSQL ACID databases and existBASE in alliance. a spectrum between ACID and BASE alliance. Source: IDC's Digital University Study,sponosored by EMC, June 2011 8000 6000 4000 2000 0 2005 2010 2015 Figure 1. Exponential growth of digital information, going to Exabyte. Figure 1. Exponential growth of digital information, going to Exabyte. The NoSQL databases have typically four different models: (1) key-value store, (2) column store, (3) graphThe store, NoSQL and (4) databases document have store typically [8]. Each four NoSQL different database models: model (1) key-value has its own store, distinct (2) column schema store, of storing(3) graph data store, [9]. The and most (4) document simple and store flexible [8]. Each model NoSQL is a key database value storemodel that has is its used own in distinct our study schema and anof storing overview data of key-value[9]. The most stores simple is given and below:flexible model is a key value store that is used in our study and an overview of key-value stores is given below: 1.1. Key-Value Stores 1.1.A Key-value Key-Value store Stores is a database that stores data in the form of associative arrays known as a hash or a dictionary.A Key-value Each dictionary store is a hasdatabase a collection that stores of records data in that the have form different of associative fields arrays of data. known They storeas a hash dataor as a key-valuedictionary. (record) Each dictionary pair as shown has a incollection Figure2. of Each records value that is stored have anddifferent retrieved fields through of data. a They uniquestore key. data A valueas a key-value is a dataof (record) an arbitrary pair as type, shown size in and Figure structure. 2. Each Here, value value is stored can be and anything retrieved