International Research Journal of Modernization in Engineering Technology and Science Volume: 01/Issue: 01/December-2019 www.irjmets.com

AN OVERVIEW OF BIG DATA STORAGE AND MANAGEMENT TOOLS (SQL, NoSQL, NEWSQL DATABASES)

1Satish Chandra Reddy Nandipati, 2Chew XinYing*, 3Mohd Adib Omar 1,2,3School of Computer Sciences, 11800, Universiti Sains Malaysia, Pulau Pinang, Malaysia

ABSTRACT

Big data, the large volume of structured, semi-structured and unstructured data produced by various organizations, is said to have great potential in a wide range of sectors. Big data is stored and managed by relational (SQL and NewSQL) and non-relational (NoSQL) database management systems for easy data access and analysis. This paper presents a brief literature review to elicit the general background of the storage and management tools, analytical platforms, and analysis and visualization tools that are used to handle big data and its management systems. A brief history and evolution, some characteristic features, a comparison, the advantages, strengths and weaknesses, and the performance of the three classes of databases (SQL, NoSQL and NewSQL) with respect to functional and non-functional features are explained. The paper then turns to the application of these databases in healthcare, education and transportation. Besides advantages, these databases have disadvantages, some of which have been overcome by upgrades to existing databases and by emerging databases such as NewSQL; in view of this, the future of databases is illustrated in brief.

KEYWORDS: Big Data Management, SQL Databases, NoSQL Databases, New SQL Databases, Application of Databases.

I. INTRODUCTION

The years 1937-1943 are known as the early history of data projects, i.e., the time of the Second World War, when the British worked to interpret Nazi codes. A large amount of data is produced and shared by different methods by different organizations such as non-profit sectors, industry, scientific research, public administrations and businesses, together with data related to the earth, oceans and astronomy. These data take the form of structured data (spreadsheets, relational data), semi-structured data (CSV files, JSON documents, XML files) and unstructured data (doc, pdf, email, audio, video and social media) [1-2]. The basic differences between traditional data and big data are shown in Table 1.

Table 1. Comparison between Big data and traditional [3]

| | Big Data | Traditional |
|---|---|---|
| Type of data | Semi and Unstructured | Structured |
| Rate of data generation | Rapid | More time |
| Sources of data | Multiple sources | Centralized |
| Volume of data | Peta and Zetta bytes | Mega & Giga byte |
| Data storage | NoSQL, Hadoop Distributed File System | RDBMS |

The characteristics of big data are described as 3Vs [volume, velocity and variety], 4Vs [volume, velocity, variety and variability], 6Vs [volume, velocity, variety, veracity, variability and value], 7Vs, 10Vs and 42Vs [2-3]. The data produced by different domains have great potential to improve decision-making in health care, the prediction of natural catastrophes, productivity, energy futures and economics [4]. Apart from these advantages, capturing, pre-processing, storage and management, sharing, data exploration, security and privacy have been important challenges in big data analysis [5]. The first supercomputers were not able to process big data, which made handling it a great challenge. Advances in computer technology have made it possible to manage huge data without a supercomputer and at lower cost, by storing the data over a network. The storage and management of big datasets with reliable and available data access is referred to as ‘big data storage and management’. Based on differences in interfaces and functions, data storage and management applications are divided into two kinds: file systems (a file system organizes information on a hard drive and controls the storage and retrieval of data) and databases (organized collections of data stored in a computer that are easily accessible). A software package that lets users and programmers create, update, retrieve and manage data is referred to as a database management system (DBMS). The four types of DBMS are hierarchical databases, relational databases, object-oriented databases and network databases [6]. A database in which data are identified and accessed in relation to other data in the database is referred to as a ‘relational database’. The collection of programs that maintains the data and allows a relational database to be created, updated and administered is referred to as a ‘relational database management system’ (RDBMS). Traditional RDBMSs use Structured Query Language (SQL) as the means of communication for structured data analysis and management, and this approach requires more expensive hardware. In addition, traditional RDBMSs were not able to handle the heterogeneity and huge volume of big data arising from semi-structured and unstructured sources. To overcome this obstacle, the research community has put forward different perspectives and proposed distributed file systems and NoSQL databases as good choices for managing semi-structured and unstructured data [2].
Apart from the databases mentioned above, and to overcome some of their disadvantages, a modern class of RDBMS called NewSQL has emerged. NewSQL provides the scalable performance of NoSQL for read-write workloads, i.e., online transaction processing (OLTP), while maintaining the ACID properties of an SQL database system (i.e., scalable performance of NoSQL + ACID properties of RDBMS = NewSQL) [7]. Some big data storage and management tools, analytics platforms, and analysis and visualization tools are given in Table 2 [3].

Table 2. Databases for Big Data Storage, Analysis and Visualization Tools [3]

| Category | Tools |
|---|---|
| Storage & management tools | Apache Cassandra & HBase, CouchDB, Hive, Hypertable, Infinispan, MongoDB, Neo4j, Errastore, CockroachDB, NuoDB, Altibase |
| Analytics platforms | Cloudera, Amazon Web Service, Dreamer, Hadoop, IBM Big Data, KNIME, MaPR, Microsoft Azure, Open Refine |
| Analysis tools | Apache Storm, GridGain, HPCC, Pivotal GemFire XD, VoltDB |
| Visualization tools | ChartBlocks, Datawrapper, Jolicharts, Microsoft Power BI, Plotty, Tableau, Weave, ZohoReports |

II. LITERATURE REVIEW

This section covers the history and evolution, highlights and advantages of the three classes of databases (SQL, NoSQL and NewSQL).

a) History and Evolution of SQL Databases

The initial development of SQL was the work of Donald D. Chamberlin and Raymond F. Boyce, who met Edgar F. Codd in 1972 at the IBM T.J. Watson Research Center in Yorktown Heights, New York. At that time E.F. Codd had named his new way of organizing data the “relational data model”. Chamberlin and Boyce then decided to make a relational language more accessible to users who were not familiar with mathematics or computer programming. They identified two levels (the mathematical notation and the semantic level) at which the problem had to be resolved, and addressed the notation by replacing symbols with keywords (i.e., replacing the projection symbol with ‘project’ and ∀ with ‘for all’). Apart from this, the query language had no scope for extension to updates and to administrative tasks such as the creation of new tables and views. After Codd’s symposium, Chamberlin and Boyce spent almost a year designing the language. Later, to work on the System R project, they moved to the San Jose Research Laboratory and began a new language called Sequel (Structured English Query Language). They hoped that, with a little practice, users could read queries as easily as English prose. Sequel is a declarative language, since it describes the information to be retrieved rather than how to retrieve it. In 1974, after presenting a paper on Sequel at a technical conference in Michigan, Boyce died of a ruptured brain aneurysm. After his death, the Sequel language continued to be part of the System R project at the San Jose Research Laboratory. Later, System R was installed at three IBM customer sites for experimental purposes. Based on the experience collected from early users and implementers, the complete Sequel language was designed and published in 1976. In 1977, the name Sequel was changed to SQL (Structured Query Language) due to a trademark issue [8]. In 1979, ahead of IBM's own products (1981 and 1983), a commercial SQL product named Oracle was released by a company called Relational Software, Inc. In 1981, SQL/Data System, the first IBM product based on SQL, was released, followed in 1983 by DB2, which supports many IBM platforms. SQL databases provide a set of properties for database transactions known as ACID (Atomicity, Consistency, Isolation and Durability).

b) Highlights of SQL Databases

SQL is a standard query language supported by open-source as well as commercial systems; queries are written much like English sentences and are therefore easy to learn. SQL acts as a medium of communication between the user and the DBMS, giving easy and quick access to the data in the database. With the advances in SQL, it can now handle pools of data of all sizes, and SQL technology has evolved to run on laptops, PCs, servers, tablets, etc. Some of the features of SQL databases are shown in Table 3. SQL can be used for the following purposes.

1. Queries: An SQL query consists of query blocks, which in turn consist of clauses (Select, From, Where, Group by, Having and so on).
2. Data Manipulation: SQL can be used to manipulate data with Insert, Delete and Update statements.
3. Database Administration: SQL provides facilities for performing database administration tasks. These tasks fall into three general categories: data definition, access control, and active data features.
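The three purposes listed above can be illustrated with a short sketch. It uses Python's built-in sqlite3 module only as a convenient stand-in for any SQL database; the table name `enrolment`, its columns and the sample rows are hypothetical and are not taken from the cited sources.

```python
import sqlite3


def run_sql_demo():
    """Run one statement from each SQL category on an in-memory database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Database administration (data definition): create a table.
    cur.execute(
        "CREATE TABLE enrolment ("
        " student TEXT NOT NULL,"
        " course TEXT NOT NULL,"
        " marks INTEGER NOT NULL)"
    )

    # Data manipulation: INSERT, UPDATE and DELETE statements.
    cur.executemany(
        "INSERT INTO enrolment (student, course, marks) VALUES (?, ?, ?)",
        [
            ("Asha", "Databases", 82),
            ("Ben", "Databases", 58),
            ("Chen", "Networks", 71),
            ("Dina", "Networks", 39),
            ("Eli", "Compilers", 90),
        ],
    )
    cur.execute("UPDATE enrolment SET marks = 60 WHERE student = 'Ben'")
    cur.execute("DELETE FROM enrolment WHERE marks < 40")

    # Query: one query block built from SELECT, FROM, WHERE, GROUP BY, HAVING.
    cur.execute(
        "SELECT course, COUNT(*) AS n, AVG(marks) AS mean_marks"
        " FROM enrolment"
        " WHERE marks >= 50"
        " GROUP BY course"
        " HAVING COUNT(*) >= 2"
        " ORDER BY course"
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for course, n, mean_marks in run_sql_demo():
        print(course, n, mean_marks)
```

Only courses with at least two remaining students pass the HAVING clause, which shows how the clauses of a query block filter rows first (WHERE) and groups afterwards (HAVING).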

Table 3. Features of the SQL Databases [9-10]

| Features | MySQL | PostgreSQL | Microsoft SQL Server |
|---|---|---|---|
| License | Open source | Open source | Commercial |
| Language | C and C++ | C | C++ |
| Partitioning | Horizontal | Declarative | Horizontal |
| Replication | Master-master / Master-slave | Master-slave | Based on SQL Server edition |
| Consistency | Immediate | Immediate | Immediate |
| User concepts | Fine-grained authorization | Fine-grained access, as per SQL standard | Fine-grained access, as per SQL standard |
| Developer / Release | Oracle / 1995 | PostgreSQL Group / 1989 | Microsoft / 1989 |

c) Advantages of SQL Databases

1. Coding not required: Managing a database does not require writing a considerable amount of code.
2. High speed: SQL queries can retrieve a large amount of data from a database quickly and efficiently when compared to other query languages.
3. Defined standards: SQL follows long-established standards adopted by the American National Standards Institute (ANSI) in 1986 and the International Organization for Standardization (ISO) in 1987.
4. The emergence of ORDBMS: The older SQL databases were identical to relational databases. With the emergence of the Object-Relational DBMS (ORDBMS), the capability of storing objects has been extended to relational databases.
5. Online analytic processing (OLAP): This feature of SQL is used to analyze large volumes of business-related data.
6. Other features, related to recursion, XML and multimedia, and functions and procedures, are discussed in [11].


d) History and Evolution of NoSQL Databases

SQL is one of the main components of relational databases, which scale vertically and are remarkably complex, rigid and limited in their ability to handle large datasets such as unstructured data. They do not handle fast Create, Read, Update and Delete (CRUD) operations on such datasets well and are not cost-effective. To overcome this problem, a new model named NoSQL (read as ‘Not only SQL’) was introduced, which can store data in a more flexible way (i.e., it provides horizontal scalability for datasets of any scale). The term NoSQL was coined by Carlo Strozzi in 1998, and the fundamentals of NoSQL such as CAP (Consistency, Availability and Partition tolerance) and BASE (Basically Available, Soft state and Eventual consistency) were introduced by Eric Brewer in 2000. Later, in 2002, Seth Gilbert and Nancy Lynch formalized CAP, which led to the classification of systems as CA (consistency with availability), AP (availability with partition tolerance) and CP (consistency with partition tolerance) [12].

e) Highlights of NoSQL Databases

NoSQL databases evolved to overcome limitations of RDBMSs such as the storage of semi-structured and unstructured data. NoSQL databases, also called non-relational or distributed databases, have gained a lot of attention because they are easy to scale, access and manage, offer high performance and fault tolerance, and support multiple data structures with horizontal scalability. The four general models of NoSQL databases and their descriptions are given in Table 4. Some of the features of NoSQL databases are shown in Table 5 [13-14].

Table 4. Description of Four Models of NoSQL Databases

| Model | Databases | Description |
|---|---|---|
| Key-value stores | Redis, Berkeley DB, LevelDB, DynamoDB | Store basic information in the form of key and value. Example: shopping cart data, user profiles, and sessions. |
| Column store | HBase, Cassandra, Hypertable | Stores large volumes of unstructured data. Mostly used for log aggregation, blogging platforms, etc. |
| Document databases | MongoDB, CouchDB | Store complex structures, such as web-based and e-commerce applications, in the form of documents. |
| Graph databases | Neo4j, HyperGraphDB | Capture data relationships, as in social networks and geospatial data. |
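To make the four models in Table 4 concrete, the following minimal sketch represents each one with plain Python data structures. It is an illustration of the data shapes only, not of how Redis, Cassandra, MongoDB or Neo4j store data internally; all keys, field names and helper functions are hypothetical.

```python
# Key-value store: an opaque value looked up by its key (e.g. a session).
key_value = {"session:42": "user=asha;cart=3"}

# Document store: nested, self-describing documents whose fields may differ.
documents = [
    {"_id": 1, "name": "Asha", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "name": "Ben", "email": "ben@example.org"},
]

# Column store: row key -> column family -> column -> value.
column_family = {
    "sensor-7": {
        "readings": {"t0": 21.5, "t1": 21.9},
        "meta": {"unit": "C"},
    },
}

# Graph store: nodes connected by labelled, directed edges.
graph_edges = [("asha", "FOLLOWS", "ben"), ("ben", "FOLLOWS", "chen")]


def find_documents(docs, field, value):
    """Return the documents whose field equals value (missing fields never match)."""
    return [doc for doc in docs if doc.get(field) == value]


def neighbours(edges, node, label):
    """Return the nodes reachable from node over edges carrying the given label."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


if __name__ == "__main__":
    print(key_value["session:42"])
    print(find_documents(documents, "name", "Ben"))
    print(column_family["sensor-7"]["readings"]["t1"])
    print(neighbours(graph_edges, "asha", "FOLLOWS"))
```

Note that the two documents carry different fields, which is the schema flexibility the table attributes to document databases, while the graph model answers relationship questions by following edges rather than by joining tables.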

Table 5. Features Comparison of NoSQL Databases

| Features | Redis | Cassandra | MongoDB | Neo4j |
|---|---|---|---|---|
| Data model | Key-value | Column | Document | Graph store |
| Read-Write-Delete operations | Fast | Slow-Fast-Fast | Fast | Data dependent |
| Implemented language | C | Java | C++ | Scala and Java |
| Partitioning | Hash, Range | Range | Range | Cache-based |
| Replication | Master-slave | Masterless | Master-slave | Master-slave |
| Consistency | Eventual | Configurable | Immediate | Eventual |
| Developer / Initial release | Salvatore Sanfilippo / 2009 | Apache Software Foundation / 2008 | MongoDB, Inc. / 2009 | Neo Technology / 2007 |

f) Advantages of NoSQL

1. Schema-free: NoSQL databases do not require a fixed schema.
2. Key-value pairs are the main attributes of NoSQL databases.
3. NoSQL databases include key-value stores, column stores, document stores, graph stores, XML stores and so on.
4. Apart from storing simple string values, some NoSQL database models allow developers to store serialized objects in the database.
5. Open source: NoSQL databases do not require expensive licensing fees and can run on low-resource hardware, making their deployment cost-effective.
6. Compared with relational databases, NoSQL databases offer easier and cheaper scaling, because they scale horizontally and distribute the load across all nodes, whereas relational database systems scale vertically.
7. NoSQL databases support cloud computing and storage; a database such as Cassandra is designed to scale across multiple data centers out of the box.

g) History and Evolution of NewSQL Databases

With the increasing demand for financial enterprise systems and transactions from multiple applications, a need arose for new database systems (NewSQL). The term NewSQL was coined by Matthew Aslett in a 2011 paper. NewSQL is a technology that aims to make current relational SQL more scalable, elegant, easier to learn, well defined and consistent; it provides the scalable OLTP performance of NoSQL while maintaining the ACID properties of traditional SQL database systems [7].

h) Highlights of NewSQL Databases

SQL and NoSQL databases are unable to cope with huge amounts of complex data while maintaining the important features of traditional database systems, a task that calls for more scalability, availability and improved performance. NewSQL databases were introduced to overcome this limitation. The internal design of the data structures used by NewSQL databases differs from that of relational databases, for example in the use of main memory for storage and in favoring consistency over availability [15]. A feature comparison of NewSQL databases is shown in Table 6.

Table 6. Features Comparison of the NewSQL Databases [16-18]

| Features | VoltDB | ClustrixDB | NuoDB |
|---|---|---|---|
| Category | New Architecture | New Architecture | New Architecture |
| License | Open source | Commercial | Commercial |
| Language | Java and C++ | C/C++ | C++ |
| Partitioning | Sharding | Yes | Dynamic storage of data, read/write cached on the nodes |
| Replication | Master/Master or Master/Slave | Master/Master or Master/Slave | Yes |
| Consistency | Strong | Strong | Immediate |
| Concurrency control | No, single-threaded model | MVCC + 2PL | Yes, multi-version concurrency control (MVCC) |
| Storage type | Yes, in memory | No | Yes, in main memory |
| Developer / Release | VoltDB Inc / 2010 | MariaDB Corporation AB / 2006 | NuoDB, Inc. / 2013 |

i) Advantages of NewSQL Databases [16]

1. They offer the best of both SQL and NoSQL databases: the ACID transactional support of SQL databases and the scalability and speed of NoSQL.
2. Their architecture can handle complex data.
3. They are available as transparent sharding middleware and as Database-as-a-Service (DBaaS); they have cloud support and can also be used for OLTP.
4. They support SQL, although query complexity is very high.
5. The required information is provided by a single host through partitioning and replication.
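The partitioning and replication mentioned in the list above can be sketched as follows. This is a minimal illustration of hash-based sharding with ring-order replicas, assuming an in-process list of dictionaries in place of real nodes; it is not the mechanism used by VoltDB, ClustrixDB or NuoDB, and the names `shard_for`, `replicas_for` and `ShardedStore` are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key, num_shards, replication_factor):
    """Return the primary shard followed by the next shards in ring order."""
    if not 1 <= replication_factor <= num_shards:
        raise ValueError("replication_factor must be between 1 and num_shards")
    primary = shard_for(key, num_shards)
    return [(primary + i) % num_shards for i in range(replication_factor)]


class ShardedStore:
    """Toy key-value store that partitions keys and copies them to replicas."""

    def __init__(self, num_shards=4, replication_factor=2):
        self.num_shards = num_shards
        self.replication_factor = replication_factor
        # Each dict stands in for one node of the cluster.
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        # Write the value to the primary shard and to each replica.
        for index in replicas_for(key, self.num_shards, self.replication_factor):
            self.shards[index][key] = value

    def get(self, key):
        # A single shard (here the primary) holds everything needed for the key.
        return self.shards[shard_for(key, self.num_shards)].get(key)


if __name__ == "__main__":
    store = ShardedStore()
    store.put("order:1001", {"total": 25.0})
    print(store.get("order:1001"))
    print(replicas_for("order:1001", 4, 2))
```

Because the hash decides the shard, a read for one key touches one host only, while the replicas keep a second copy available if that host is lost.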


III. METHODOLOGY

a) Comparison of Three SQL Databases

A feature comparison of SQL, NoSQL and NewSQL databases is shown in Table 7. The comparison of the strengths and weaknesses of SQL, NoSQL and NewSQL databases shows that they complement each other and that each has its own role in the storage and analysis of data (Table 8).

Table 7. Features Comparison of SQL, NoSQL, and NewSQL databases [7, 19]

| Features | SQL | NoSQL | NewSQL |
|---|---|---|---|
| Database type / SQL supported | Relational / Yes | Non-relational / No | Relational / Yes |
| Data storage | Medium to large | Large | Large |
| Data store orientation | Row-oriented | Column-oriented | Row and column |
| Data type | Relational (structured data) | Non-relational (unstructured data) | Both |
| Data structure for storing (schema) | Predefined | Not necessary | Not necessary |
| Scalability | With additional hardware | Unlimited data | Unlimited data |
| ACID transactions | Supported | Not supported | Supported |
| OLTP support | Not fully supported | Supported | Fully supported |
| Data integrity | Maintained | Compromised with a large dataset | Fully supported |
| Cost | High cost | Low cost | Low cost |
| Platforms | Amazon, Hana, IBM DB2, MySQL, Oracle RDBMS, PostgreSQL, SQL Server, Sybase | Redis, Neo4j, HyperGraphDB, MongoDB, CouchDB, HBase, Cassandra | Apache Trafodion, Altibase, MapR, ClustrixDB, CockroachDB, MemSQL, VoltDB, NuoDB and TIBCO ActiveSpaces |

Table 8. Comparison of the Strengths and Weaknesses of Three SQL Databases [20-22]

| Class | Database | Description / suited for | Strengths | Weaknesses |
|---|---|---|---|---|
| SQL | MySQL | Robust database management tool for budget organizations | Freely available; variety of user interfaces can be implemented; works with Oracle and IBM DB2 | Incremental backup creation; requires a lot of time; OLAP and XML are not supported |
| SQL | PostgreSQL | For limited-budget organizations that need the ability to use JSON and select their interface | Scalable and handles terabytes of data; JSON supported; presence of predefined functions; availability of a number of interfaces | Spotty documentation, users need to figure out how to do things; confusing configuration; speed reduces under heavy read queries |
| SQL | SAP HANA | Organizations with constrained budgets that pull data for application purposes | Supports SQL, OLAP and OLTP; memory stores reduce access times; availability of inventory management and real-time reporting; works with a number of other applications | High cost of licensing; newcomer |
| SQL | DB2 | To make the most of the available resources and handle large databases | Enormous databases can be handled with BLU Acceleration; supports cloud deployment; task scheduler can run multiple jobs; error codes and exit codes are available | High cost; clusters or multiple secondary nodes can only be made with third-party tools; three years of basic support |
| NoSQL | MongoDB | For applications that use both structured and unstructured data | Easy and fast to use; JSON and NoSQL are supported; handles any data structure with easy and quick access; schema can be written without downtime | Translation tools are needed to go from SQL to MongoDB; default settings are not secure |
| NoSQL | CouchDB | Uses the same file notation format, JSON | Easy to use; REST-oriented interface | Cannot handle a high number of documents |
| NewSQL | CockroachDB | For high-speed transactions and for building great apps | Freely available; provides a graphical interface for statistics and data analysis; performs better for read operations | Performance is lower than NuoDB; no load balancer for distributing load between available nodes |
| NewSQL | NuoDB | Enterprise-class tool for security, backup and administration support | Performs well for write and join operations; provides a load balancer | Not freely available; no graphical interface; performance decreases with increasing workload |
| NewSQL | MemSQL | For real-time transactions and analytics | High performance for billions of queries; real-time data analysis; supports the MySQL wire protocol; supports geospatial indexes | Not freely available; no scalability beyond a single node |

b) Comparison of SQL Databases Based On Analysis

Some of the analyses performed to compare SQL and NoSQL databases measure Create, Read, Update and Delete (CRUD) operations on small and large datasets. The results show that MySQL is better for small datasets, that MySQL and MongoDB differ only slightly for large datasets, and that MongoDB performs significantly better for insertion time, for non-relational data or data with significant gaps, and for complex queries involving multiple joins. Based on this, one can decide which database to use for a given dataset [23]. In another study, MySQL and MongoDB were evaluated on a hypermarket application based on execution time. The results show that for smaller datasets MySQL and MongoDB have the same execution time. As the number of records or clients increases, however, MongoDB's execution time is lower and it performs better than MySQL [24]. A third study compared SQL (MySQL), NoSQL (MongoDB) and NewSQL (VoltDB) to determine how these databases perform under an increasing load of diverse and large amounts of data (i.e., Internet of Things, IoT, data). The best write performance was shown by the NewSQL database VoltDB. Similarly, in read-intensive systems VoltDB performed best, followed by MySQL and MongoDB. Compared with MySQL, MongoDB's performance improved as the size of the data and the number of clients increased. Overall, the results showed that VoltDB performs better for IoT data [25].
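The kind of CRUD timing used in these studies can be sketched with a small harness. The example below times each phase on an in-memory SQLite table purely as a stand-in; the cited studies measured MySQL, MongoDB and VoltDB, and this sketch does not reproduce their setups or results. The function name `time_crud` and the table `item` are hypothetical.

```python
import sqlite3
import time


def time_crud(num_rows):
    """Time the create, read, update and delete phases for num_rows rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, price REAL)")
    timings = {}

    # Create: insert num_rows rows in one batch.
    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO item (id, price) VALUES (?, ?)",
        ((i, i * 0.5) for i in range(num_rows)),
    )
    conn.commit()
    timings["create"] = time.perf_counter() - start

    # Read: count the rows that match a simple predicate.
    start = time.perf_counter()
    matched = conn.execute(
        "SELECT COUNT(*) FROM item WHERE price >= 0"
    ).fetchone()[0]
    timings["read"] = time.perf_counter() - start

    # Update: modify every row.
    start = time.perf_counter()
    conn.execute("UPDATE item SET price = price + 1")
    conn.commit()
    timings["update"] = time.perf_counter() - start

    # Delete: remove every row.
    start = time.perf_counter()
    conn.execute("DELETE FROM item")
    conn.commit()
    timings["delete"] = time.perf_counter() - start

    remaining = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
    conn.close()
    return matched, remaining, timings


if __name__ == "__main__":
    for size in (1_000, 100_000):
        matched, remaining, timings = time_crud(size)
        print(size, {phase: round(sec, 4) for phase, sec in timings.items()})
```

Running the harness for a small and a large row count mirrors the small-versus-large dataset comparison described above; a fair comparison between systems would additionally need identical hardware, data and client counts.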

c) Application of SQL Databases in Big Data

Healthcare: The digitization of hard-copy patient data into electronic health records (EHRs) or electronic medical records (EMRs) has been one of the important technologies in healthcare. The most commonly used kind of database in healthcare is the OLTP (online transaction processing) database, which takes the form of a relational database. The OLTP database structure in healthcare consists of EHRs (physicians' written prescriptions and notes, or research lab reports), and SQL is used to access these data. The healthcare domain also produces a large amount of semi-structured and unstructured data; for example, electrophysiological (EEG) data are stored in Cloudwave, a Hadoop-based data processing module. HBase is used to store time-series data from clinical sensors, while the HBase data schema is stored in MongoDB for readability and accessibility. In China, a Hadoop/HBase infrastructure and a hybrid XML database were used to analyze a huge amount of online heart disease data [26]. Similarly, advances in NewSQL databases can be applied to medical data mining.

Education: Educational institutions need to develop and deploy a student database system that deals with many applications, such as student assessments and tuition payment processing. These heterogeneous data sources can be viewed as a unified data source through Integrated Information Systems (IIS). With SQL, the IIS can be changed to suit an organization's requirements [27]. Nowadays, internet technology lets students obtain information from articles and publications, e-books, audio and video. Big data analytics tools associated with NoSQL, such as MapReduce, play a role in studying student academic performance based on different structured datasets. As the participants and modules of e-learning portals increase, standard analytical tools become slow; to overcome this, NoSQL and related big data technologies (Hadoop, Cassandra and MongoDB) provide a basis for extracting relevant patterns in educational data analysis [28].

Transportation: The huge data produced by the transport industry provide information about vehicle movements and passengers, revealing the routes in highest demand, weather affecting the transport system, bad road conditions and real-time incidents through GPS systems, etc. The performance and departures of public transport vehicles depend on the network and the timetable. The public transport networks of 25 cities have been analyzed with SQLite databases [29]. An intelligent transport system (ITS) collects information from maps, vehicles, timetables and stops, and uses classical relational databases to analyze the data. The task of finding optimal travel routes connecting two stops can be carried out with graph databases (NoSQL), thus reducing the travel time [30].
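The route-finding task mentioned for transportation can be sketched in plain Python. A graph database would express this in its own query language; the example below only illustrates the underlying shortest-path computation using Dijkstra's algorithm, which is an assumed choice and not necessarily the method of [30]. The stop names and travel times in `stops` are hypothetical.

```python
import heapq


def shortest_route(graph, source, target):
    """Dijkstra's algorithm: return (total_minutes, stops) or (inf, []) if unreachable."""
    best = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        cost, stop = heapq.heappop(queue)
        if stop == target:
            break
        if cost > best.get(stop, float("inf")):
            continue  # Stale queue entry: a cheaper path to this stop was found.
        for next_stop, minutes in graph.get(stop, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(next_stop, float("inf")):
                best[next_stop] = new_cost
                previous[next_stop] = stop
                heapq.heappush(queue, (new_cost, next_stop))
    if target not in best:
        return float("inf"), []
    # Walk back from the target to the source to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path


# Directed stop graph: stop -> {neighbouring stop: travel time in minutes}.
stops = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

if __name__ == "__main__":
    print(shortest_route(stops, "A", "D"))
```

In this toy network the direct-looking route A, B, D takes 9 minutes, while the route through C takes 8, which is the kind of saving that route optimization over a stop graph is meant to find.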

IV. DISCUSSION

SQL (relational) and NoSQL (non-relational) databases perform different operations, and each has its own role in data storage. Along with their benefits, some SQL versions have disadvantages such as a difficult interface, partial control, implementation effort and cost. NoSQL, in turn, is less mature and demands considerable technical skill for both installation and maintenance; most NoSQL databases do not perform ACID transactions, some versions are not good at sharding, and some have no secure default settings. Some of these disadvantages are addressed by NewSQL, a modern RDBMS that provides the scalable performance of NoSQL for read-write workloads or online transaction processing (OLTP) while maintaining the ACID properties of an SQL database system. The data structures used by NewSQL databases differ in their internal design, but all of them are RDBMSs that run on SQL. They ingest new information using SQL and execute many transactions at the same time, each of which modifies the contents of the database. Thus, advances in the versions of each database and in NewSQL databases have overcome some of the disadvantages, and further advances in the near future could change the architecture of databases.
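The role of ACID transactions discussed above can be illustrated with a minimal atomicity sketch. It uses Python's built-in sqlite3 module as a stand-in for any ACID-compliant SQL or NewSQL system; the `account` table, the CHECK constraint and the helper names are hypothetical. The credit is applied first on purpose, so that a failing debit shows the earlier change being rolled back.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two accounts that may not go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO account (name, balance) VALUES (?, ?)",
        [("alice", 100), ("bob", 20)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one transaction; undo both if either fails."""
    try:
        with conn:  # Commits on success, rolls back if an exception is raised.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was rolled back.
        return False
    return True


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "alice", "bob", 30), balances(bank))
    print(transfer(bank, "alice", "bob", 500), balances(bank))
```

The second transfer fails as a whole: the credit to the receiving account never becomes visible, which is the all-or-nothing behaviour that systems without ACID transactions leave to the application.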

V. CONCLUSION

In this paper, the general comparison between traditional data and big data has been explained. The general background of various technologies, such as storage, analytical and visualization tools, that are used with SQL databases to handle big data and its management systems has been covered. A brief history and evolution, characteristic features, a comparison, the strengths and weaknesses, and the performance of the three classes of databases (SQL, NoSQL and NewSQL) have been explained. Finally, the application of these databases in health care, education and transportation has been illustrated. Despite many challenges related to data searching, migration, query mechanisms and security across the three classes of databases, these databases still provide an opportunity to analyze big data. A glimpse of the literature and of the future of databases has also been given in this paper.


VI. REFERENCES

[1] J. Li, Z. Xu, Y. Jiang, R. Zhang, “The Overview of Big Data Storage and Management,” International Conference on Cognitive Informatics and Cognitive Computing, pp. 510-513, 2014.
[2] A. Agrahari, Rao. DTVD, “A Review Paper on Big Data: Technologies, Tools and Trends,” International Research Journal of Engineering and Technology, vol. 4, no. 10, pp. 640-649, Oct. 2017.
[3] Z. Nabeel, A.H. Abdullah, M.K. Sufian, “Cloud Computing and Big Data is there a Relation between the Two: A Study,” International Journal of Applied Engineering Research, vol. 12, no. 17, pp. 6970-6982, 2017.
[4] U. Sivarajah, M.K. Muhammad, I. Zahir, W. Vishanth, “Critical analysis of Big Data challenges and analytical methods,” Journal of Business Research, vol. 70, pp. 263-286, January 2017.
[5] R.A. Rishika, K.P. Suresh, “Predictive Big Data Analytics in Healthcare,” Second International Conference on Computational Intelligence & Communication Technology, pp. 623-626, 2016.
[6] R.A. Ramadan, “Big Data Tools-An Overview,” International Journal of Computer Software Engineering, vol. 125, no. 2, pp. 2-15, 2017.
[7] A. Almassabi, O. Bawazeer, S. Adam, “Top NewSQL Databases and Features Classification,” International Journal of Database Management Systems (IJDMS), vol. 10, no. 2, pp. 11-31, 2018.
[8] D.D. Chamberlin, “Early History of SQL,” IEEE Annals of the History of Computing, vol. 43, no. 4, pp. 78-82, 2012.
[9] https://www.mssqltips.com/sqlservertip/5745/compare-sql-server-mysql-and-postgresql-features/
[10] https://db-engines.com/en/system/Microsoft+SQL+Server%3BMySQL%3BPostgreSQL
[11] D. Chamberlin, “Encyclopedia of Database Systems,” 2009.
[12] J. Kurpanik, M. Pańkowska, “NoSQL problem literature review,” Studia Ekonomiczne, vol. 234, pp. 80-100, 2015.
[13] J.K. Chen, W.Z. Lee, “An Introduction of NoSQL Databases Based on Their Categories and Application Industries,” Algorithms, 12, 106, pp. 1-16, 2019.
[14] Gupta, S. Tyagi, N. Panwar, S. Sachdeva, “NoSQL databases: critical analysis and comparison,” 2017 International Conference on Computing and Communication Technologies for Smart Nation (IC3TSN), Gurgaon.
[15] S. Duggirala, “NewSQL Databases and Scalable In-Memory Analytics,” Advances in Computers, vol. 109, pp. 49-76, 2018.
[16] A. Pavlo, M. Aslett, “What's really new with NewSQL?,” SIGMOD Record, vol. 45, no. 2, pp. 45-55, 2016.
[17] https://db-engines.com/en/systems
[18] K. Kaur, M. Sachdeva, “Performance evaluation of NewSQL databases,” 2017 International Conference on Inventive Systems and Control (ICISC), Coimbatore, pp. 1-5, 2017. DOI: 10.1109/ICISC.2017.8068585.
[19] https://www.whizlabs.com/blog/nosql-vs-sql/
[20] https://www.keycdn.com/blog/popular-databases
[21] https://www.agilelab.it/newsql-the-new-era-of-relational-databases/
[22] https://www.quora.com/How-does-MemSQL-compare-to-VoltDB
[23] A. Rajat, M. Sumeet, C. Rahul, C. Siddhant, B. Navdeep, “A comprehensive comparison of SQL and MongoDB databases,” International Journal of Scientific and Research Publications, vol. 5, no. 2, pp. 1-2, 2015.
[24] D.B. Dipina, S. Shirin, M.V. Surekha, “Performance evaluation of MySQL and MongoDB database,” International Journal on Cybernetics & Informatics, vol. 5, no. 2, pp. 387-394, 2016.
[25] H. Fatima, K. Wasnik, “Comparison of SQL, NoSQL and NewSQL databases for internet of things,” IEEE Bombay Section Symposium, pp. 1-6, 2016.
[26] J. Luo, M. Wu, D. Gopukumar, Y. Zhao, “Big Data Application in Biomedical Research and Health Care: A Literature Review,” Biomed Inform Insights, vol. 19, no. 8, pp. 1-10, 2016.


[27] K.T. Wisdom, K.A. Wisdom, “Student Database System for Higher Education: A Case Study at School of Public Health, University of Ghana,” American Journal of Software Engineering and Applications, vol. 4, no. 2, pp. 23-34, 2015.
[28] T.W. Jyotsna, “Discovering Big Data Modelling for Educational World,” Procedia-Social and Behavioral Sciences, vol. 176, pp. 642-649, 2015.
[29] K. Rainer, W. Christoffer, K.D. Richard, N.M. Miloš, S. Jari, “A collection of public transport network data sets for 25 cities,” Scientific Data, vol. 5, no. 180089, pp. 1-14, 2018.
[30] A. Czerepicki, “Application of graph databases for transport purposes,” Bulletin of the Polish Academy of Sciences: Technical Sciences, vol. 64, no. 3, pp. 457-466, 2016.
