algorithms

Article
An Introduction of NoSQL Databases Based on Their Categories and Application Industries †

Jeang-Kuo Chen * and Wei-Zhe Lee

Department of Information Management, Chaoyang University of Technology, Taichung 41349, Taiwan; [email protected]
* Correspondence: [email protected]
† This paper is an extended version of the conference paper (ID: 1070) presented at IS3C2018, Taichung, Taiwan, 6–8 December 2018.

Received: 31 January 2019; Accepted: 9 May 2019; Published: 16 May 2019

Abstract: The popularization of big data requires enterprises to store more and more data. Enterprise data must be accessed as fast as possible, but the Relational Database (RDB) is limited in speed by the join operation. Many enterprises have therefore switched to NoSQL databases, which can meet the requirement of fast data access. However, there are hundreds of NoSQL databases. Selecting a suitable NoSQL database for a given enterprise is important because this decision affects the performance of the enterprise's operations. In this paper, fifteen categories of NoSQL databases are introduced to identify the characteristics of each category. Some principles and examples are proposed for choosing an appropriate NoSQL database for different industries.

Keywords: big data; NoSQL database; RDB; HBase; MongoDB; Neo4j

1. Introduction

The Relational Database (RDB) has been under development since the 1970s. Supported by a powerful Relational Database Management System (RDBMS), the RDB is easy to use and maintain and has become a widely used kind of database [1]. Due to the popularization of big data acquisition technologies and applications, enterprises need to store more data than ever before, and they want their databases to be accessed as fast as possible. To obtain complex information from multiple relations, an RDB sometimes needs to perform SQL join operations that merge two or more relations at the same time, which can lead to performance bottlenecks. Besides the relational format, other data storage formats have been proposed for many applications, such as key-value pairs, document-oriented, time series, etc. As a result, more and more enterprises have decided to use NoSQL databases to store big data [2–4]. However, there are more than 225 NoSQL databases [2]. Choosing an appropriate NoSQL database for a specific enterprise is very important because a change of database may affect the performance of the enterprise's business operations. This paper introduces the basic concepts, compares the data formats and features, and lists some actual products for every category of NoSQL databases. In addition, this paper proposes principles and key points that help different types of enterprises choose an appropriate NoSQL database to solve their business problems and challenges.

2. Related Work

Algorithms 2019, 12, 106; doi:10.3390/a12050106 www.mdpi.com/journal/algorithms

2.1. Relational Data Model (RDM)

Developed by E.F. Codd in the 1970s, the RDM [1] contains elements such as data structure, integrity constraints, and so on. The details are described as follows. An RDB is a collection of relations, which are the data structures of the RDM. A relation is a two-dimensional, normalized table used to organize data. Each relation consists of two parts, the relation schema and the relation instances. The relation schema includes the name of the relation, the names of the attributes in the relation, and their domains; the relation instances are the data records stored in the relation at a specific time. Table 1 shows an example of a relation as follows:

1. The name of this relation is Students;
2. The names of the attributes in this relation are SID, name, telephone, and birthday, respectively;
3. The domain of each attribute is a collection of acceptable data values for the attribute, for example, the acceptable data value of attribute birthday is date;
4. There are five data records in this relation.

Table 1. An example of a relation.

Table Name: Students

SID: Char(5)    Name: Char(10)    Telephone: Char(11)    Birthday: Date (dd/mm/yy)
S0001           Alice             05-65976597            10/4/1994
S0002           Dora              06-45714571            15/5/1995
S0003           Ella              07-57865869            20/6/1996
S0004           Kevin             06-57995611            22/7/1997
S0005           Leo               05-12899821            26/8/1998

As part of an RDB design, integrity constraints are used to check the correctness of the data input into the database. Integrity constraints not only prevent authorized users from storing illegal data into the database but also avoid data inconsistency between relations. There are four common kinds of integrity constraints described as follows:

1. Key constraint: A relation must have a unique and minimal primary key;
2. Domain constraint: An attribute value of a relation must be an atomic one belonging to the corresponding domain of the attribute;
3. Entity integrity constraint: Some principles for a primary key;
4. Referential integrity constraint: Some principles for a foreign key.
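The four kinds of constraints can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The Students relation follows Table 1; the Selections relation, its c_no column, the helper rejected, and the rejected sample rows are assumptions introduced only for this illustration, and the length checks are just one possible way to approximate the Char(n) domains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.executescript("""
CREATE TABLE Students (
    SID       TEXT NOT NULL PRIMARY KEY      -- key and entity integrity constraints
              CHECK (length(SID) = 5),       -- domain constraint, Char(5)
    name      TEXT NOT NULL CHECK (length(name) <= 10),
    telephone TEXT CHECK (length(telephone) <= 11),
    birthday  TEXT
);
CREATE TABLE Selections (                    -- hypothetical referencing relation
    SID  TEXT NOT NULL REFERENCES Students(SID),  -- referential integrity constraint
    c_no TEXT NOT NULL,
    PRIMARY KEY (SID, c_no)
);
""")


def rejected(sql, params=()):
    """Run one statement and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


insert_student = "INSERT INTO Students VALUES (?, ?, ?, ?)"
results = {
    "valid row": rejected(insert_student, ("S0001", "Alice", "05-65976597", "10/4/1994")),
    "duplicate key": rejected(insert_student, ("S0001", "Bob", None, None)),
    "bad domain": rejected(insert_student, ("S1", "Bob", None, None)),
    "null key": rejected(insert_student, (None, "Bob", None, None)),
    "dangling foreign key": rejected(
        "INSERT INTO Selections VALUES (?, ?)", ("S9999", "C001")
    ),
}
for case, was_rejected in results.items():
    print(f"{case}: {'rejected' if was_rejected else 'accepted'}")
```

In this sketch only the first insert is accepted; each of the other four statements violates one of the constraint kinds listed above and is refused by the database rather than by application code.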

2.2. Entity-Relationship Model (ER-Model)

As a tool for database analysis and design, the entity-relationship model (ER-model) [1] uses geometric symbols for entities, relationships, and attributes to show the blueprint of a conceptual database. An entity is an object recognized from the real world, such as a person, event, product, or supplier, while a relationship is an association between two or more entities. An attribute represents a feature of an entity or a relationship. A database design diagram drawn with the ER-model is called an entity-relationship diagram (ERD). The common geometries of the ER-model elements are shown in Table 2.


Table 2. The common geometries of the ER-model.

[Table 2 lists the following ER-model elements, each with its graphic symbol: Entity; Weak Entity; Relationship-Entity (Bridge Entity); Relationship; Identifying Relationship; Attribute; Key Attribute; Composite Attribute; Multivalued Attribute.]

An ERD example is shown in Figure 1. This is an ERD of a simple school database with four entities: Students, Selections, Courses, and Employees, where the bridge entity Selections is converted from a many-to-many relationship. The relationships between the entities are described as follows:

1. A student can select many courses and vice versa;
2. An employee can teach many courses, but a course can only be taught by one employee.

[Figure 1: ERD with the entities Courses, Employees, and Students and the bridge entity Selections. The relationship "teaches" links Courses (N) to Employees (1); Selections links Courses (M) and Students (N). Attribute labels in the figure: c_no, title, credits, classroom, time, remark, EID, name, office, salary, position, ext, SID, name, address, telephone, e-mail, parent_name, grades.]

Figure 1. An entity-relationship diagram (ERD) example.

2.3. Big Data

What is big data? Different ages have different answers. Today, big data refers to data that are difficult to store in RDBs and cannot be processed by stand-alone data analysis and statistical tools. Such data need to be stored in a large parallel system with tens or hundreds of machines. NoSQL database systems have exactly these features: they are suitable for storing big data and can quickly access data for various application processing. Big data applies to all areas of daily life (such as social networking, e-commerce, etc.) and scientific research (such as astronomical meteorology, clinical medicine, etc.), and the continued growth of data has forced people to reconsider the storage and management of data [3,4]. The features of big data (i.e., 4V) are described as follows [3].

1. Volume: The large-scale growth of the data volume faced by enterprises.
2. Variety: The types of data, including a variety of texts, videos, pictures, geographic locations, and information generated by sensors.
3. Value: The commercial value of the data after analysis. For example, in one hour of continuous video monitoring, the useful information may last only one or two seconds. Therefore, how to extract the value of data more quickly through powerful machine learning algorithms is an important issue of big data.
4. Velocity: Enterprises not only need to know how to collect data quickly, but must also know how to process and analyze the data and pass the results back to users to meet their immediate needs.

2.4. NoSQL Databases

The definition of NoSQL can be found on the official website [2] as follows. NoSQL databases are next generation databases mostly addressing some of the points: being non-relational, distributed, open-source, and horizontally scalable [2]. The original intention of NoSQL development was to provide modern web-scale databases. The development began in early 2009 and is growing rapidly. Often more characteristics apply to NoSQL databases, such as: schema-free, easy replication support, simple API, eventually consistent/BASE (basically available, soft-state, eventual consistency [3]), a huge amount of data, and more. In addition, the misleading term "NoSQL" can also be read as "Not Only SQL," which means that an RDB should be used where it is suitable and an alternative should be used where it is not [3]. The features of NoSQL databases are described as follows [1–5].

1. Non-relational: NoSQL databases do not use the relational database model and do not support SQL join operations. Unlike RDBs, which obtain advanced data through join operations, NoSQL databases need related data to be stored together to improve the speed of data access.
2. Distributed: Data in NoSQL databases is usually stored on different servers, and the locations of the stored data are managed by metadata.
3. Open-source: Unlike most RDBs, which require a fee to purchase, most NoSQL databases are open source and free to download.
4. Horizontally scalable: Ordinary servers can be added or removed to match the data processing capacity required of the NoSQL database.
5. Schema-free: Unlike RDBs, which need a database schema to be defined before data is inserted, NoSQL databases do not need one. Therefore, NoSQL databases can add data flexibly.
6. Easy replication support: NoSQL databases mostly support master-slave replication or peer-to-peer replication, making it easier for NoSQL databases to ensure high availability.
7. Simple API: The NoSQL database provides APIs for network delivery, data collection, etc. for programmers to use, so that programmers do not need to design additional programs, which makes writing programs easier.
8. BASE is an abbreviation for "basically available, soft-state, and eventual consistency," and the meanings are described as follows.

(1) Basically available: The DB system can execute and always provide services. Some parts of the DB system may have partial failures while the rest of the DB system continues to operate. Some NoSQL DBs typically keep several copies of specific data on different servers, which allows the DB system to respond to all queries even if a few of the servers fail.
(2) Soft-state: The DB system does not require a state of strong consistency. Strong consistency means that no matter which replica of a certain data item is updated, all later read operations on that data must be able to obtain the latest information.
(3) Eventual consistency: The DB system needs to meet the consistency requirement after a certain time. Sometimes the DB may be in an inconsistent state. For example, some NoSQL DBs keep multiple copies of certain data on multiple servers. These copies may be inconsistent for a short time, which happens when one copy of the data is updated while the other copies still hold the old version. Eventually, the replication mechanism in the NoSQL DB system will update all replicas to be consistent.
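The BASE behavior described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not how any particular NoSQL product replicates data: the classes Replica and Cluster, the version counter, the queue of pending updates, and the key "stock" are all hypothetical names invented for the example.

```python
from collections import deque


class Replica:
    """One server holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def read(self, key):
        # A replica always answers with whatever copy it currently holds.
        return self.data.get(key, (0, None))[1]

    def apply(self, key, version, value):
        # Keep only the newest version of a value.
        if version > self.data.get(key, (0, None))[0]:
            self.data[key] = (version, value)


class Cluster:
    """Toy replication: a write lands on one replica, the others catch up later."""

    def __init__(self, size):
        self.replicas = [Replica(f"r{i}") for i in range(size)]
        self.pending = deque()  # updates not yet delivered to other replicas
        self.version = 0

    def write(self, key, value, at=0):
        self.version += 1
        self.replicas[at].apply(key, self.version, value)
        for index in range(len(self.replicas)):
            if index != at:
                self.pending.append((index, key, self.version, value))

    def sync(self):
        # The replication mechanism eventually delivers every pending update.
        while self.pending:
            index, key, version, value = self.pending.popleft()
            self.replicas[index].apply(key, version, value)


cluster = Cluster(3)
cluster.write("stock", 10)
cluster.sync()

cluster.write("stock", 9, at=0)
before_sync = [replica.read("stock") for replica in cluster.replicas]
cluster.sync()
after_sync = [replica.read("stock") for replica in cluster.replicas]

print("before sync:", before_sync)
print("after sync: ", after_sync)
```

Before sync is called, the replica that took the write returns the new value while the other two still return the old one (soft-state, with every replica still able to answer reads); after sync all three agree, which is the eventual consistency described in item (3).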

According to the statistics of the NoSQL database official website [2], there are currently more than 225 NoSQL databases. Moreover, some NoSQL databases are widely used in many famous enterprises such as Google, Yahoo, Facebook, Twitter, Taobao, Amazon, and so on [3].

2.5. The Survey Papers of NoSQL Databases

Several survey papers discuss the technologies, features, and examples related to NoSQL databases, as well as several factors that affect the applicability of NoSQL databases. The focuses of these papers are described as follows.

(1) Hecht and Jablonski [6] evaluated the relevant technologies of some of the four common NoSQL database categories (i.e., key value store, document store, wide column store, and graph databases) to assist users in selecting an appropriate NoSQL database. Related technologies include data models, queries, concurrency controls, partitions, and replication.
(2) Lourenço et al. [7] compared several quality attributes for several NoSQL databases. The evaluated NoSQL databases are Aerospike, Cassandra, Couchbase, CouchDB, HBase, MongoDB, and Voldemort, while the quality attributes include availability, consistency, durability, maintainability, read and write performance, recovery time, reliability, robustness, scalability, and stabilization time.
(3) Corbellini et al. [8] reviewed the basic concepts of four common categories of NoSQL databases and compared some databases for each category. In addition, this paper also discussed how to select an appropriate NoSQL database from existing databases. The decision-making factors include data analysis, hardware scalability (horizontally scalable and BASE [3,4]), schema flexibility, fast deployment of servers (replication and sharding configuration), distributed technology, etc.
(4) Khazaei et al. [9] illustrated the basic concepts of four popular NoSQL database models and evaluated some databases for each model. In this paper, the authors discussed several factors to be considered in order to select an appropriate NoSQL database, such as data model, access patterns, queries, and non-functional requirements (including data access performance, replication, partition, horizontal scalability, BASE [3,4], software development and maintenance, etc.).
(5) Gessert et al. [10] linked the functional and non-functional requirements of NoSQL databases to the technologies used, and provided decision trees to assist users in selecting the appropriate NoSQL database, where:
(a) Functional requirements include sorting, full-text search, and so on;
(b) Non-functional requirements include data scalability, elasticity, and so on;
(c) Used technologies include sharding, replication, storage management, and query processing;
(d) Evaluated NoSQL databases are MongoDB, Redis, HBase, Riak, and Cassandra.
(6) Davoudian et al. [11] clarified four factors for deciding on a suitable NoSQL database, namely data model, consistency model, data partitioning, and CAP theorem, and further explained the available strategies, features, advantages, and disadvantages for them. This is helpful for selecting an appropriate NoSQL database.

3. The Categories of NoSQL Databases According to the classification of the NoSQL database official website [2], there are 15 categories of NoSQL databases such as wide column store, document store, key value store, graph databases, and so on, which are based on different data models. This section will explain the basic concepts of each category of the NoSQL database and analyze the characteristics of the data that each category of the NoSQL database is suitable for processing.

3.1. Wide Column Store This category of NoSQL databases has a complex table schema described as follows [5,12–14].

1. A row key is an identifier with a unique value used to identify a specific record, similar to the primary key of a relation in RDB.
2. A timestamp (abbreviated as ts) is an integer used to identify a specific version of a data value.
3. At least one column family that has the format of “Family: Qualifier = Value,” where “Family” is the name of a column family, “Qualifier” is the name of a column qualifier, and “Value” is the actual value of a column qualifier stored as text.
4. The name of a column family needs to be defined when the table is created, but the name of a column qualifier does not.
5. Users can find the actual data value through the value of a specific row key, the name of a specific column family, the name of a specific column qualifier, and the value of a specific timestamp.

An example is illustrated as follows. An inventory table of 3C products in a wide column store database is shown in Table 3, where:

1. Products_Inventory is the name of the inventory table, which contains two column families, Products and Inventory, and has three records with the product codes P001, P002, and P003 as the values of the three row keys, respectively;
2. An increasing integer ti (i = 1, 2, ... , 18) is the value of the timestamp for each column qualifier when a data value of a column qualifier is inserted into the table;
3. Column family Products includes four column qualifiers, Classes, Title, Descriptions, and Price, and their data values, for example, are “TV”, “SONY 55 inch 4K OLED Smart Networked TV”, “TBD”, and “24999”, respectively;
4. Column family Inventory includes two column qualifiers, Quantity and Place, and their data values, for example, are “10” and “1A”, respectively.

Table 3. An example of a data table in wide column store.

Table Name: Products_Inventory

| Row Key | ts | Column Family Products | Column Family Inventory |
|---|---|---|---|
| P001 | t1 | Products: Classes = “TV” | |
| P001 | t2 | Products: Title = “SONY 55 inch 4K OLED Smart Networked TV” | |
| P001 | t3 | Products: Descriptions = “TBD” | |
| P001 | t4 | Products: Price = “24999” | |
| P001 | t5 | | Inventory: Quantity = “10” |
| P001 | t6 | | Inventory: Place = “1A” |
| P002 | t7 | Products: Classes = “Laptop” | |
| P002 | t8 | Products: Title = “ACER SF514 14-inch laptop” | |
| P002 | t9 | Products: Descriptions = “TBD” | |
| P002 | t10 | Products: Price = “31000” | |
| P002 | t11 | | Inventory: Quantity = “20” |
| P002 | t12 | | Inventory: Place = “2A” |
| P003 | t13 | Products: Classes = “Mobile phone” | |
| P003 | t14 | Products: Title = “ZenFone 5Z” | |
| P003 | t15 | Products: Descriptions = “TBD” | |
| P003 | t16 | Products: Price = “5000” | |
| P003 | t17 | | Inventory: Quantity = “8” |
| P003 | t18 | | Inventory: Place = “2B” |

According to the statistics of the DB-Engines Ranking website [15], Apache Cassandra and Apache HBase are the more widely discussed ones of the wide column store databases.
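
The lookup path described in this subsection (row key, column family, column qualifier, timestamp) can be sketched with nested Python dictionaries. This is only an illustration of the data model and not the API of HBase or Cassandra; the helper `get_cell`, the integer timestamps standing in for t1, t2, ..., and the extra version written at timestamp 19 are assumptions made for the example.

```python
# Nested dicts standing in for a wide column table:
# row key -> column family -> column qualifier -> {timestamp: value}
products_inventory = {
    "P001": {
        "Products": {
            "Classes": {1: "TV"},
            "Price": {4: "24999"},
        },
        "Inventory": {
            "Quantity": {5: "10"},
            "Place": {6: "1A"},
        },
    },
}


def get_cell(table, row_key, family, qualifier, ts=None):
    """Return the value stored at ts, or the newest version when ts is None."""
    versions = table[row_key][family][qualifier]
    if ts is None:
        ts = max(versions)  # the newest version has the largest timestamp
    return versions[ts]


# Writing a new version of a value adds another timestamp entry
# (timestamp 19 is hypothetical; Table 3 stops at t18).
products_inventory["P001"]["Inventory"]["Quantity"][19] = "9"

print(get_cell(products_inventory, "P001", "Inventory", "Quantity"))     # newest version
print(get_cell(products_inventory, "P001", "Inventory", "Quantity", 5))  # version written at t5
```

Keeping every version under its own timestamp is what lets a reader ask for either the current value or an earlier one with the same four-part address.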

3.2. Document Store The terms related to the database model of the document store are described below [5].

• A collection is a group of documents. The documents within a collection are usually related to the same subject, such as employees, products, and so on.
• A document is a set of ordered key-value pairs, where key is a string used to reference a particular value, and value can be either a string or a document.
• JSON (JavaScript Object Notation), BSON (Binary JSON), and XML (eXtensible Markup Language) are formats commonly used to define documents.
• Embedded documents are documents within documents. An embedded document enables users to store related data in a single document to improve database performance.
• Document store databases do not require users to formally specify the structure of documents prior to adding documents to a collection. Therefore, document databases are called schemaless ones. Application programs should verify rules about the structure of a document.

An example of a collection in a document store database is shown in Figure 2. Written in JSON format, this collection stores school curriculum data. There are three courses, “Accounting”, “Economics”, and “Computer Science”, in this file. Each course contains four fields, c_no, title, credits, and instructor.

[
  { "c_no": "C001", "title": "Accounting", "credits": 3, "instructor": "Zoe" },
  { "c_no": "C002", "title": "Economics", "credits": 3, "instructor": "Wendy" },
  { "c_no": "C003", "title": "Computer Science", "credits": 3, "instructor": "Cathy" }
]

Figure 2. An example of a collection in a document store database.

According to the statistics of the DB-Engines Ranking website [15], MongoDB and Couchbase Server are the more widely discussed ones of the document store databases.
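
As a minimal sketch of the points above (schemaless collections, embedded documents, and structure checks left to the application), the courses of Figure 2 can be loaded with Python's standard `json` module. Writing the collection as a JSON array, the embedded `instructor` document with its `name` and `office` keys, and the helpers `is_valid_course` and `instructor_name` are assumptions made for illustration.

```python
import json

# One collection of course documents. The third document embeds an
# instructor sub-document (hypothetical fields) to show schemaless storage.
collection_json = """
[
  {"c_no": "C001", "title": "Accounting", "credits": 3, "instructor": "Zoe"},
  {"c_no": "C002", "title": "Economics", "credits": 3, "instructor": "Wendy"},
  {"c_no": "C003", "title": "Computer Science", "credits": 3,
   "instructor": {"name": "Cathy", "office": "TBD"}}
]
"""
courses = json.loads(collection_json)

REQUIRED_KEYS = {"c_no", "title", "credits", "instructor"}


def is_valid_course(doc):
    """Application-side structure check; the store itself enforces no schema."""
    return REQUIRED_KEYS.issubset(doc) and isinstance(doc["credits"], int)


def instructor_name(doc):
    """Accept both a plain string and an embedded instructor document."""
    inst = doc["instructor"]
    return inst["name"] if isinstance(inst, dict) else inst


valid = [d for d in courses if is_valid_course(d)]
names = [instructor_name(d) for d in valid]
print(names)
```

Because documents in one collection may differ in shape, the reading code has to tolerate both forms of `instructor`; that burden is the price of not declaring a schema.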

3.3. Key Value Store The data in this category of NoSQL databases is stored with the format of “Key → Value” [5], where

1. Key is a string used to identify a unique value;
2. Value is an object whose value can be a simple string, numeric value, or a complex BLOB (binary large object), JSON object, image, audio, and so on;
3. In key value store databases, operations on values are derived from keys. Users can retrieve, set, and delete a value by a key;
4. A namespace is a logical data structure that can contain any number of key-value pairs.

Suppose that an online shopping website uses a key value store database to store data as shown in Figure 3. This database includes several namespaces, such as “Products” and “Customers” [5], where

1. The key in the namespace “Products” is the ID of products, and the value is the details about products;
2. The key in the namespace “Customers” is the ID of customers, and the value is the details about customers.

Namespace: Products

| Key | Value |
|---|---|
| P001 | { "classes": "TV", "title": "LG 55 inch 4K LED TV", "price": 32000 } |
| P002 | { "classes": "Laptop", "title": "ASUS FX503VD i7 gaming laptop", "price": 36000 } |

Namespace: Customers

| Key | Value |
|---|---|
| C001 | { "username": "Jack", "telephone": "0939-619997", "rank": "Normal" } |
| C002 | { "username": "Cindy", "telephone": "0939-519973", "rank": "Platinum" } |

Figure 3. An example of two namespaces in a key value store database.

According to the statistics of the DB-Engines Ranking website [15], both Redis and DynamoDB are the more widely discussed ones of the key value store databases.
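
The three key-based operations (retrieve, set, and delete) and the role of namespaces can be sketched as a toy in-memory store. The class `KeyValueStore` and its method names are hypothetical and do not correspond to the client API of Redis, DynamoDB, or any other product; the stored values are taken from Figure 3.

```python
class KeyValueStore:
    """Toy in-memory store: namespace -> {key: value}."""

    def __init__(self):
        self._namespaces = {}

    def set(self, namespace, key, value):
        # Create the namespace on first use, then store or overwrite the value.
        self._namespaces.setdefault(namespace, {})[key] = value

    def get(self, namespace, key, default=None):
        return self._namespaces.get(namespace, {}).get(key, default)

    def delete(self, namespace, key):
        # Return True if the key existed and was removed, otherwise False.
        ns = self._namespaces.get(namespace, {})
        if key in ns:
            del ns[key]
            return True
        return False


store = KeyValueStore()
store.set("Products", "P001",
          {"classes": "TV", "title": "LG 55 inch 4K LED TV", "price": 32000})
store.set("Customers", "C001",
          {"username": "Jack", "telephone": "0939-619997", "rank": "Normal"})

print(store.get("Products", "P001")["price"])
store.delete("Customers", "C001")
print(store.get("Customers", "C001"))
```

Every operation starts from a key, so the store never needs to inspect the value; this is why the value can be an arbitrary object such as a JSON document or a BLOB.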

3.4. Graph Databases The graph database model (GDM) is composed of vertices and edges [5], where

1. A vertex is an entity instance, which is equivalent to a tuple in RDM;
2. An edge is used to define the relationship between vertices;
3. Each vertex and edge contains any number of attributes that store the actual data value.

An Oceania airline is illustrated as an example. The airline needs to store flight hours among some cities. The data can be stored in a graph database as shown in Figure 4. In this graph database, each vertex contains some data such as nation, city, and A2C_time (time from an airport to a city center), and each edge represents the flight duration between two cities [5].

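[Figure 4: a graph with three vertices and three edges.
Vertices: (Nation: United States, City: Hawaii, A2C_time: 25 min.); (Nation: Australia, City: Sydney, A2C_time: 30 min.); (Nation: New Zealand, City: Auckland, A2C_time: 45 min.).
Edges: Hawaii to Sydney, Duration: 13 hr.; Hawaii to Auckland, Duration: 12 hr.; Sydney to Auckland, Duration: 3 hr.]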

Figure 4. An example of data stored in a graph database.
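
A minimal sketch of this model in Python keeps vertex attributes and edge attributes in separate dictionaries. It is not the query interface of any graph database; the attribute names, the helpers `neighbors` and `flight_hours`, and the pairing of each duration with a pair of cities (read from the layout of Figure 4) are assumptions.

```python
# Vertices keyed by city; each vertex carries its own attributes.
vertices = {
    "Hawaii": {"nation": "United States", "a2c_time_min": 25},
    "Sydney": {"nation": "Australia", "a2c_time_min": 30},
    "Auckland": {"nation": "New Zealand", "a2c_time_min": 45},
}

# Undirected edges with a duration attribute in hours.
# Which duration joins which pair of cities is assumed from the figure layout.
edges = {
    frozenset({"Hawaii", "Sydney"}): {"duration_hr": 13},
    frozenset({"Hawaii", "Auckland"}): {"duration_hr": 12},
    frozenset({"Sydney", "Auckland"}): {"duration_hr": 3},
}


def neighbors(city):
    """Cities directly connected to `city`, mapped to the flight duration in hours."""
    result = {}
    for pair, attrs in edges.items():
        if city in pair:
            (other,) = pair - {city}  # the other endpoint of the edge
            result[other] = attrs["duration_hr"]
    return result


def flight_hours(a, b):
    """Duration stored on the edge between two cities."""
    return edges[frozenset({a, b})]["duration_hr"]


print(neighbors("Sydney"))
print(flight_hours("Auckland", "Sydney"))
```

Relationships are stored as data in their own right, so following an edge is a direct lookup instead of a join between two relations.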

According to the statistics of the DB-Engines Ranking website [15], Neo4j and FlockDB are the more widely discussed ones of the graph databases.

3.5. Multimodel Databases The data format of this category of NoSQL databases contains more than two data formats of the other categories of NoSQL databases [16]. According to the statistics of the DB-Engines Ranking website [15], OrientDB and ArangoDB are the more widely discussed ones of multimodel databases. OrientDB contains the data formats of object database, document store, graph database, and key value store, while ArangoDB contains the data formats of document store, graph database, and key value store [2].

3.6. Object Databases This category of NoSQL databases combines the functions of object-oriented programming languages and traditional databases [1]. A web-based application system that lets users order lunch boxes is illustrated as an example. The data in the object database are described in the form of a class diagram as shown in Figure 5 [17]. In Figure 5, each rectangle is an object that includes both data items and data processing functions. For example, the object Customers has four data items (account, password, telephone, and e-mail) and two data processing functions (readData() and writeData()). According to the statistics of the DB-Engines Ranking website [15], db4o and Versant are the more widely discussed ones of the object databases.

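[Figure 5: class diagram with four classes. Data items: Customers (account, password, telephone, e-mail); Orders (ID, date); Lunch_boxes (ID, title, price); Order_details (quantity). Customers provides readData() and writeData(); the other classes additionally provide deleteData(), and the diagram also lists calculateAmount(). Customers and Orders are linked 1…n; Orders and Lunch_boxes are linked n…n, with Order_details attached to that link.]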

Figure 5. An example of a class diagram in an object database.
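
The idea that an object bundles data items with its data processing functions can be sketched with a Python dataclass for the Customers object. The behavior given to `read_data` and `write_data` is an assumption, since the source names readData() and writeData() but does not describe them, and the sample values are invented.

```python
from dataclasses import dataclass, asdict


@dataclass
class Customer:
    # The four data items of the Customers object in Figure 5.
    account: str
    password: str
    telephone: str
    email: str

    def read_data(self):
        """Return the data items as a plain dict (stand-in for readData())."""
        return asdict(self)

    def write_data(self, **changes):
        """Update named data items in place (stand-in for writeData())."""
        for name, value in changes.items():
            if name not in self.__dataclass_fields__:
                raise AttributeError(name)  # reject unknown data items
            setattr(self, name, value)


# Invented sample values, for illustration only.
c = Customer("amy01", "secret", "0900-000000", "amy@example.com")
c.write_data(telephone="0911-111111")
print(c.read_data()["telephone"])
```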


3.7. Grid and Cloud Database Solutions This category of NoSQL databases stores recently accessed data in random access memory (RAM) and uses grid computing to speed up data access [2]. According to the statistics of the DB-Engines Ranking website [15], Hazelcast and Oracle Coherence are the more widely discussed ones of grid and cloud database solutions.

3.8. XML Databases The files stored in this category of NoSQL databases are based on the XML format [18]. An example of a school curriculum file stored in an XML database is shown in Figure 6. In this XML file, there are three courses, Internet of Things, Artificial Neural Network, and Big Data, which have course numbers (c_no) C001, C002, and C003, credits 3, 4, and 2, and instructors Amy, Zoe, and Mary, respectively. According to the statistics of the DB-Engines Ranking website [15], Oracle Berkeley DB and BaseX are the more widely discussed ones of the XML databases.

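[Figure 6: an XML file with three course records (markup not recoverable).
C001: Internet of Things, 3 credits, instructor Amy
C002: Artificial Neural Network, 4 credits, instructor Zoe
C003: Big Data, 2 credits, instructor Mary]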

Figure 6. An example of a data file in an XML database.
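
A curriculum file of this kind can be queried with Python's standard `xml.etree.ElementTree` module, as in the sketch below. Apart from c_no, which the text names, the element names (`courses`, `course`, `title`, `credits`, `instructor`) are assumptions, because the markup of Figure 6 is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Element names other than c_no are assumed for this illustration.
xml_text = """
<courses>
  <course><c_no>C001</c_no><title>Internet of Things</title>
          <credits>3</credits><instructor>Amy</instructor></course>
  <course><c_no>C002</c_no><title>Artificial Neural Network</title>
          <credits>4</credits><instructor>Zoe</instructor></course>
  <course><c_no>C003</c_no><title>Big Data</title>
          <credits>2</credits><instructor>Mary</instructor></course>
</courses>
""".strip()

root = ET.fromstring(xml_text)

# Sum the credits over all course elements.
total_credits = sum(int(c.findtext("credits")) for c in root.iter("course"))

# Map each instructor to the title of the course he or she teaches.
by_instructor = {c.findtext("instructor"): c.findtext("title")
                 for c in root.iter("course")}

print(total_credits)
print(by_instructor["Zoe"])
```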

3.9. Multidimensional Databases The data in this category of NoSQL databases is stored in a multidimensional array in order to analyze the value of each array element. Suppose a printing company stores data in a multidimensional database as shown in Figure 7 [19]. The printing company needs to analyze the total sales amount of printed products based on three dimensions: products, branches, and customer rank. For example, the company has two branches, Taipei and Tainan, three products, copy paper, photo paper, and poster, and two customer ranks, platinum member and normal member. The boss of the printing company wants the total sales amount of each branch, each product, and each customer rank. According to the statistics of the DB-Engines Ranking website [15], InterSystems Caché and GT.M are the more widely discussed ones of the multidimensional databases.
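
The analysis the boss asks for amounts to summing the array along one dimension at a time. The sketch below assumes a sparse representation keyed by (branch, product, customer rank); the sales amounts are made-up placeholders, since the source gives no figures, and `total_by` is a hypothetical helper.

```python
from collections import defaultdict

# Each cell: (branch, product, customer rank) -> total sales amount.
# The amounts are invented placeholders for illustration.
sales = {
    ("Taipei", "Copy paper", "Platinum member"): 100,
    ("Taipei", "Photo paper", "Normal member"): 50,
    ("Tainan", "Copy paper", "Normal member"): 70,
    ("Tainan", "Poster", "Platinum member"): 30,
}
DIMENSIONS = ("branch", "product", "rank")


def total_by(dimension):
    """Sum the sales amounts along every axis except the named one."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, amount in sales.items():
        totals[key[axis]] += amount
    return dict(totals)


print(total_by("branch"))
print(total_by("product"))
print(total_by("rank"))
```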

[Figure 7: a three-dimensional array whose axes are Branch (Taipei, Tainan), Products (Copy paper, Photo paper, Poster), and Customer Rank (Platinum member, Normal member); each cell holds a total sales amount.]

Figure 7. An example of a three-dimensional array in a multidimensional database.

3.10. Multivalue Databases This category of NoSQL databases is suitable for storing data of multivalued attributes or composite attributes [20]. An example of student data is illustrated in a table of a multivalue database as shown in Table 4. The schema of the table is students (SID, name, and society), where name is a composite attribute composed of the two attributes First_name and Last_name, and society is a multivalued attribute. There are six records in this data table. The name of each student is divided into two parts, which are saved into the attributes First_name and Last_name, respectively, and the attending societies of each student can have more than one value. According to the statistics of the DB-Engines Ranking website [15], jBASE and Model 204 Database are the more widely discussed ones of the multivalue databases.

Table 4. An example of a data table in a multivalue database.

Table Name: Students

| SID | Name: First_Name | Name: Last_Name | Society |
|---|---|---|---|
| S001 | Frank | Lee | {Pop music, Choir} |
| S002 | George | Lu | {Choir, Poetry} |
| S003 | Hank | Wu | {Computer, Guitar} |
| S004 | Ivy | Lin | {Computer, Pop music} |
| S005 | Jack | Chen | {Pop music, Guitar} |
| S006 | Kevin | Yang | {Computer, Poetry} |
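
The records of Table 4 can be sketched in Python with a nested dictionary for the composite attribute and a set for the multivalued attribute. The key names and the helper `members_of` are assumptions for illustration and are not the storage format of jBASE or Model 204.

```python
# name is a composite attribute (first, last); society is multivalued (a set).
students = {
    "S001": {"name": {"first": "Frank", "last": "Lee"}, "society": {"Pop music", "Choir"}},
    "S002": {"name": {"first": "George", "last": "Lu"}, "society": {"Choir", "Poetry"}},
    "S003": {"name": {"first": "Hank", "last": "Wu"}, "society": {"Computer", "Guitar"}},
    "S004": {"name": {"first": "Ivy", "last": "Lin"}, "society": {"Computer", "Pop music"}},
    "S005": {"name": {"first": "Jack", "last": "Chen"}, "society": {"Pop music", "Guitar"}},
    "S006": {"name": {"first": "Kevin", "last": "Yang"}, "society": {"Computer", "Poetry"}},
}


def members_of(society):
    """SIDs of all students who attend the given society, in SID order."""
    return sorted(sid for sid, rec in students.items() if society in rec["society"])


print(members_of("Choir"))
print(members_of("Computer"))
```

In an RDB the society attribute would normally be moved to a separate relation and joined back; here each record keeps all of its values together.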

3.11. Event Sourcing This category of NoSQL databases is suitable for storing events that occurred in the past in order to track the status of a specific event. An example of a lecture registration system that stores its data in an event sourcing database is shown in Table 5. In this table, the first two fields, time and person, can be considered as an event, and the last field, current enrolment number, is used to track the number of people currently enrolled in the lecture [21]. According to the statistics of the DB-Engines Ranking website [15], Event Store is the most widely discussed one of the event sourcing databases.

Table 5. An example of a data table in an event sourcing database.

| Time (dd/mm/yyyy hh:mm) | Person | Current Enrolment Number |
|---|---|---|
| 22/12/2018 12:30 | Amy | 1 |
| 25/12/2018 10:40 | Ruby | 2 |
| 28/12/2018 13:20 | Cindy | 3 |
| 29/12/2018 14:10 | John | 4 |
| 30/12/2018 15:00 | Mary | 5 |
| 31/12/2018 16:50 | Zoe | 6 |
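
In an event-sourced design, the current enrolment number need not be stored at all; it can be derived by replaying the stored events. The sketch below assumes each row of Table 5 is one registration event and uses a hypothetical helper `enrolment_at`; it is not the API of Event Store.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M"

# Append-only log of registration events (time, person) from Table 5.
events = [
    ("22/12/2018 12:30", "Amy"),
    ("25/12/2018 10:40", "Ruby"),
    ("28/12/2018 13:20", "Cindy"),
    ("29/12/2018 14:10", "John"),
    ("30/12/2018 15:00", "Mary"),
    ("31/12/2018 16:50", "Zoe"),
]


def enrolment_at(moment):
    """Replay the log and count the registrations made up to `moment`."""
    cutoff = datetime.strptime(moment, FMT)
    return sum(1 for ts, _ in events if datetime.strptime(ts, FMT) <= cutoff)


print(enrolment_at("28/12/2018 13:20"))
print(enrolment_at("31/12/2018 23:59"))
```

Because past events are never overwritten, the state at any earlier moment can be reconstructed with the same replay.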

3.12. Time Series Databases (TSDBs) This category of NoSQL databases is designed to handle time series data [22,23]. An example of air quality data is illustrated as follows. Assume that an observing station measures the air quality index (AQI) and the density of PM2.5 once an hour and transmits the measurement result to a time series database (TSDB), and the results in 2018 are shown in Table 6 [24]. According to the statistics of the DB-Engines Ranking website [15], Informix Time Series Solution and influxdata are the more widely discussed ones of the TSDBs.

Table 6. An example of a data table in a time series database (TSDB).

| Measurement Time (dd/mm/yyyy hh:mm) | Air Quality Index (AQI) | The Density of PM2.5 |
|---|---|---|
| 01/01/2018 00:00 | 156 | 45 |
| 01/01/2018 01:00 | 101 | 29 |
| 01/01/2018 02:00 | 97 | 19 |
| ... | ... | ... |
| 31/12/2018 21:00 | 133 | 34 |
| 31/12/2018 22:00 | 135 | 36 |
| 31/12/2018 23:00 | 141 | 43 |
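
Time series workloads are dominated by queries over a time window. As a hedged sketch, the six rows shown in Table 6 can be kept sorted by measurement time and queried with binary search from Python's standard `bisect` module; the helper `window` is hypothetical and no real TSDB client is used.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

FMT = "%d/%m/%Y %H:%M"

# (measurement time, AQI, PM2.5 density): the rows visible in Table 6.
rows = [
    ("01/01/2018 00:00", 156, 45),
    ("01/01/2018 01:00", 101, 29),
    ("01/01/2018 02:00", 97, 19),
    ("31/12/2018 21:00", 133, 34),
    ("31/12/2018 22:00", 135, 36),
    ("31/12/2018 23:00", 141, 43),
]
readings = sorted((datetime.strptime(t, FMT), aqi, pm) for t, aqi, pm in rows)
times = [r[0] for r in readings]


def window(start, end):
    """Readings whose time lies in the closed interval [start, end]."""
    lo = bisect_left(times, datetime.strptime(start, FMT))
    hi = bisect_right(times, datetime.strptime(end, FMT))
    return readings[lo:hi]


first_day = window("01/01/2018 00:00", "01/01/2018 23:59")
avg_aqi = sum(r[1] for r in first_day) / len(first_day)
print(avg_aqi)
```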

3.13. Scientific and Specialized DBs This category of NoSQL databases is designed to solve scientific and professional issues. For example, BayesDB allows users who have not been statistically trained to solve basic science problems, and GPUdb is a database suitable for distributed computing [2].

3.14. Other NoSQL Related Databases The NoSQL databases in this category appear to fit several of the categories mentioned earlier, but the official website of NoSQL database [2] places them in this special category without explaining the characteristics of the category. Therefore, we have no way to know why this category is needed or why these NoSQL databases are assigned to it. According to the statistics of the DB-Engines Ranking website [15], eXtremeDB is the most widely discussed one of other NoSQL related databases.

3.15. Unresolved and Uncategorized Any NoSQL database is assigned to this category if it cannot be classified into any of the previously mentioned categories of NoSQL databases. According to the statistics of the DB-Engines Ranking website [15], Adabas and CodernityDB are the more widely discussed ones of the unresolved and uncategorized databases. Note that the characteristics of the two categories, unresolved and uncategorized and other NoSQL related databases, are similar. Both categories come from the classification of the NoSQL database official website [2]; we do not know the basis of that classification.

3.16. Summary The basic concepts of each category of NoSQL databases have been described. Each category was then analyzed to identify the data features it is suitable for processing. The results are summarized in Table 7.

Table 7. Summary of suitable data features for NoSQL databases.

| Categories of NoSQL Databases | Suitable Data Features |
|---|---|
| Wide Column Store | Three-dimensional data. Applications that often search for specific field data. |
| Document Store | Semi-structured files, such as XML, JSON, and so on. |
| Key Value Store | One-dimensional data, which is stored in key-value pairs. |
| Graph Databases | Data stored in a graphic structure. Suitable for data of social network relations, recommendation systems, and so on. |
| Multimodel Databases | Determine data features suitable for processing based on the data format of a specific database. |
| Object Databases | The object-oriented concepts are used to describe the data itself and the relationship among the data. Suitable for computer aided design (CAD) and office automation. |
| Grid and Cloud Database Solutions | Applications that need to search recently accessed data frequently. |
| XML Databases | Data stored in XML files. |
| Multidimensional Databases | Applications that often analyze data in multiple dimensions. |
| Multivalue Databases | Data with multivalued attributes or composite attributes. |
| Event Sourcing | Data with events that occurred in the past for tracking the status of something. |
| Time Series Databases | Data related to time series. |
| Other NoSQL Related Databases | Unable to know. |
| Scientific and Specialized DBs | Data suitable for scientific research or computing. |
| Unresolved and Uncategorized | Data based on the data format of a specific database. |

4. Choose an Appropriate Database

4.1. The Principles of Database Selection If an enterprise prepares to choose a NoSQL database, it should work through the following steps according to the culture and characteristics of the enterprise.

1. Understand the current problems, goals, and challenges of the corporate operation database.
2. The engineers of the IT center or database administrators (DBAs) must decide whether to continue using the current RDB or change to a NoSQL database based on the needs of the enterprise and their expertise.
3. If changing to a NoSQL database, the IT engineers or DBAs first select a suitable category of NoSQL databases based on the features and formats of the enterprise’s operating data.
4. When deciding which NoSQL database to choose, the IT engineers or DBAs can make a decision according to the needs of the enterprise, the characteristics of each database, as well as the reputation and popularity of each database on websites (for example, DB-Engines Ranking website [15], vschart [25]). The more websites are consulted for this information, the more reliable the picture of each database's reputation and popularity, and the easier it is to find the right NoSQL database.

4.2. Database Selection Case 1 Suppose that a 3C shopping website has used an RDB to store data for a long period of time, and this RDB generates 300,000 records per day. Users report that the website has become slow and hope that data processing will be as fast as possible, so the business owner asks the information department staff to solve this problem. Following the owner's instructions, the head of the information department traced the causes and found that the slow data access is due not only to the large amount of data generated every day but also to many users' queries needing to join several large RDB tables. Therefore, the head of the information department recommends using a NoSQL database as a solution, because a NoSQL database can store the data of some RDB tables merged in advance, so that the desired data can be read quickly at query time without waiting for join operations. After the business owner agrees, the head of the information department decides which NoSQL database to use. The decision process is as follows.

1. The most suitable category of NoSQL database is the wide column store because access to the database often requires searching for data in a specific field.
2. According to the DB-Engines Ranking website [15], the wide column store databases that are more commonly discussed on the internet are Apache Cassandra and Apache HBase.
3. According to the experimental results of Chen et al. [26], the time of Apache HBase to read data is less than that of Apache Cassandra. Therefore, Apache HBase is recommended as the NoSQL database used by the enterprise.
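
The argument in this case, that merging tables in advance removes the join from the read path, can be sketched in plain Python. The two lists stand in for normalized RDB relations with made-up rows in the spirit of Table 3, and `join_at_read` and `merged` are hypothetical names; the sketch shows the idea only and makes no claim about how HBase stores data.

```python
# Two normalized "relations" (made-up rows for illustration).
products = [
    {"code": "P001", "title": "TV", "price": 24999},
    {"code": "P002", "title": "Laptop", "price": 31000},
]
inventory = [
    {"code": "P001", "quantity": 10, "place": "1A"},
    {"code": "P002", "quantity": 20, "place": "2A"},
]


def join_at_read(code):
    """RDB style: scan both relations and merge them on every query."""
    p = next(r for r in products if r["code"] == code)
    i = next(r for r in inventory if r["code"] == code)
    return {**p, **i}


# NoSQL style: merge once at write time into one record per row key.
merged = {}
for p in products:
    merged[p["code"]] = dict(p)
for i in inventory:
    merged.setdefault(i["code"], {}).update(i)

# Reading is now a single lookup by row key, with no join at query time.
print(merged["P002"])
```

The trade-off is that the merged record must be kept up to date whenever either source changes, which moves work from reads to writes.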

4.3. Database Selection Case 2

To understand the news reporting strategies of its peers, a famous newspaper must collect online news from various newspapers and media, so tens of thousands of online news items are stored for analysis every day. The information staff found that the RDB could not provide timely access to such a large amount of data, and therefore recommended a NoSQL database to solve the problem of slow access. In response, the head of the information department must decide which NoSQL database the newspaper company should use. The decision-making process is as follows.

1. Since the newspaper must collect a large volume of continuously generated content every day, such as tens of thousands of online news items and the related readers’ messages, it is necessary to replace the RDB with a NoSQL database.
2. Of the fifteen categories of NoSQL databases available, the category found to be suitable for storing multimedia news materials is the document store.
3. According to the DB-Engines Ranking website [15], the two document store databases most often discussed on the internet are MongoDB and Couchbase Server. Since the former has a higher market share than the latter, MongoDB is recommended as the NoSQL database for the company.
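Why a document store suits such news materials can be shown with a small sketch in plain Python. It does not use the MongoDB API; the field names, the sample articles, and the helper `find_by_tag` are hypothetical. The point is only that each document carries its own nested structure, so items with different fields can sit in the same collection.

```python
import json

# Two illustrative news documents with different shapes: the first embeds
# reader comments, the second embeds media metadata instead.
articles = [
    {
        "_id": "news-001",
        "source": "Example Daily",
        "title": "Sample headline",
        "tags": ["economy"],
        "comments": [{"reader": "r1", "text": "Interesting."}],
    },
    {
        "_id": "news-002",
        "source": "Example Times",
        "title": "Another headline",
        "tags": ["sports", "economy"],
        "media": {"type": "video", "seconds": 42},
    },
]


def find_by_tag(docs, tag):
    """Return the ids of all documents whose tag list contains the given tag."""
    return [doc["_id"] for doc in docs if tag in doc.get("tags", [])]


economy_ids = find_by_tag(articles, "economy")
# Optional fields are simply absent from documents that do not need them.
with_media = [doc["_id"] for doc in articles if "media" in doc]
# Each document serializes to JSON as a whole, nested parts included.
serialized = json.dumps(articles[1])
```

In an RDB the comments and media metadata would usually live in separate tables and be joined at read time; here they travel with the article they belong to.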

4.4. Database Selection Case 3

An American multinational retail enterprise faces the following challenges with its website [27].

1. The corporation intends to promote “online recommendation system optimization” in order to recommend suitable products to each customer who visits its website.
2. To achieve this goal, the database must quickly join a large amount of customer and product data so that customers’ needs and product trends can be analyzed in time.
3. However, the RDB used by the company cannot meet this requirement.

For these reasons, the head of the information department in this corporation decides to replace the RDB with a NoSQL database. The decision process is as follows.

1. The most suitable category of NoSQL database for the enterprise is graph databases, because graph databases are the most suitable for recommendation systems, as described in Table 7.
2. According to the statistics of the DB-Engines Ranking website [15], the most discussed NoSQL databases among graph databases are Neo4j and FlockDB.
3. Since Neo4j has the largest market share among all graph databases [27], Neo4j is recommended as the NoSQL database for this enterprise.
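The reason a graph model fits recommendation can be seen in a small sketch in plain Python. It uses neither Neo4j nor its query language, and the customers, products, and the function `recommend` are hypothetical. Purchases are treated as edges between customers and products, and a recommendation is a two-hop walk: from a customer to the products they bought, to other customers who bought the same products, to the other products those customers bought.

```python
from collections import Counter

# Illustrative purchase edges: customer -> set of products bought.
purchases = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "keyboard"},
    "carol": {"mouse", "keyboard", "monitor"},
}


def recommend(customer, purchases):
    """Rank products bought by customers who share a purchase with `customer`."""
    owned = purchases[customer]
    counts = Counter()
    for other, items in purchases.items():
        # Skip the customer themselves and anyone with no product in common.
        if other == customer or not (owned & items):
            continue
        # Count each product the neighbor bought that the customer lacks.
        for item in items - owned:
            counts[item] += 1
    # Most frequently co-purchased first; ties broken alphabetically.
    return sorted(counts, key=lambda item: (-counts[item], item))


recs = recommend("alice", purchases)
```

For `alice`, both `bob` and `carol` share a purchase with her, so `keyboard` is counted twice and `monitor` once. In an RDB the same walk requires joining the purchase table with itself, which is the kind of join the case describes as too slow.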

5. Conclusions

The main contents of this paper are as follows. First, we introduce the basic characteristics of the fifteen categories of NoSQL database (such as the wide column store, document store, key value store, and graph databases) listed on the NoSQL database official website [2]. Then we analyze the characteristics of the data that each category is suited to process. Next, we propose principles and key points to help enterprises find an appropriate NoSQL database among more than 225 candidates when they intend to move from an RDB to a NoSQL database. Finally, we illustrate three cases, a 3C shopping website, a newspaper, and a US retailer, to demonstrate how a particular company can choose a suitable NoSQL database to improve its competitiveness and customer services.

In summary, a company that abandons its RDB and switches to a NoSQL DB needs to consider the characteristics of its data in order to find the right DB. The transaction data of the e-commerce industry often need to be related to one another; the suitable NoSQL DB category is the wide column store, and Apache HBase is a good choice. The news materials of the news industry are semi-structured; the suitable NoSQL DB category is the document store, and the better choice is MongoDB. Retailer data need to be used by a recommendation system; the suitable NoSQL DB category is graph databases, and the best choice is Neo4j. We hope that these principles and examples will help decision makers change databases correctly.

Author Contributions: Conceptualization, J.-K.C.; methodology, J.-K.C. and W.-Z.L.; writing—original draft preparation, W.-Z.L.; writing—review and editing, J.-K.C.; supervision, J.-K.C.; project administration, J.-K.C. Funding: This research received no external funding. Acknowledgments: Thanks to the reviewers for providing a lot of valuable comments to make this paper more complete. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Chen, H.A. Database System: Concept, Design, and Implementation, 3rd ed.; XBOOK MARKETING Co., Ltd.: Taipei, Taiwan, 2013. (In Chinese)
2. NoSQL Databases. Available online: http://nosql-database.org/ (accessed on 20 January 2019).
3. Pi, S.J. Establish the Cornerstone of Big Data: NoSQL Database Technique, 2nd ed.; TopTeam Information Co., Ltd.: Taipei, Taiwan, 2016. (In Chinese)
4. Lu, J.H. Challenge Big Data, How to Process Big Data in Facebook, Google, Amazon? Use NoSQL to Get 10 Billion Annual Hard Disk Data, 2nd ed.; TopTeam Information Co., Ltd.: Taipei, Taiwan, 2015. (In Chinese)
5. Sullivan, D. NoSQL for Mere Mortals, 1st ed.; Pearson P T R: London, UK, 2015.
6. Hecht, R.; Jablonski, S. NoSQL Evaluation: A Use Case Oriented Survey. In Proceedings of the 2011 International Conference on Cloud and Service Computing, Hong Kong, China, 12–14 December 2011.
7. Lourenço, J.R.; Cabral, B.; Carreiro, P.; Vieira, M.; Bernardino, J. Choosing the right NoSQL database for the job: A quality attribute evaluation. J. Big Data 2015, 2, 18:1–18:26. [CrossRef]
8. Corbellini, A.; Mateos, C.; Zunino, A.; Godoy, D.; Schiaffino, S. Persisting big-data: The NoSQL landscape. Inf. Syst. 2016, 63, 1–23. [CrossRef]
9. Khazaei, H.; Fokaefs, M.; Zareian, S.; Beigi-Mohammadi, N.; Ramprasad, B.; Shtern, M.; Gaikwad, P.; Litoiu, M. How do I Choose the Right NoSQL Solution? A Comprehensive Theoretical and Experimental Survey. Big Data Inf. Anal. 2016, 1, 185–216.
10. Gessert, F.; Wingerath, W.; Friedrich, S.; Ritter, N. NoSQL database systems: A survey and decision guidance. Softw.-Intensiv. Cyber-Phys. Syst. 2017, 32, 353–365. [CrossRef]
11. Davoudian, A.; Chen, L.; Liu, M. A Survey on NoSQL Stores. ACM Comput. Surv. (CSUR) 2018, 51, 40:1–40:43. [CrossRef]
12. Dimiduk, N.; Khurana, A. HBase in Action, 1st ed.; Oreilly & Associates Inc.: New York, NY, USA, 2012.
13. Lu, J.H. Hadoop: Practical Technical Handbook, 2nd ed.; TopTeam Information Co., Ltd.: Taipei, Taiwan, 2014. (In Chinese)
14. George, L. HBase: The Definitive Guide, 1st ed.; Oreilly & Associates Inc.: New York, NY, USA, 2011.
15. DB-Engines Ranking. Available online: https://db-engines.com/en/ranking (accessed on 4 March 2018).
16. Multi-Model Databases (Wikipedia). Available online: https://en.wikipedia.org/wiki/Multi-model_database (accessed on 15 June 2018).
17. Wu, R.H. Object-Oriented System Analysis and Design: An MDA Approach with UML, 4th ed.; BestWise Co., Ltd.: Taipei, Taiwan, 2013. (In Chinese)
18. Document-Oriented Database (Wikipedia). Available online: https://en.wikipedia.org/wiki/Document-oriented_database (accessed on 15 June 2018).
19. Multidimensional Databases. Available online: https://docs.oracle.com/cd/E12478_01/rpas/pdf/150/html/classic_client_user_guide/basic_rpas_concepts/multidimensional_databases.htm (accessed on 5 May 2018).
20. MultiValue (Wikipedia). Available online: https://en.wikipedia.org/wiki/MultiValue (accessed on 15 June 2018).
21. Introducing to Event Sourcing. Available online: https://msdn.microsoft.com/en-us/library/jj591559.aspx#sec1 (accessed on 16 January 2018).
22. Time Series Database (Wikipedia). Available online: https://en.wikipedia.org/wiki/Time_series_database (accessed on 16 January 2018).
23. Time Series (Wikipedia). Available online: https://en.wikipedia.org/wiki/Time_series (accessed on 16 January 2018).
24. Central Weather Bureau. Available online: https://www.cwb.gov.tw/eng/index.htm (accessed on 10 July 2018).
25. vsChart.com: The Comparison Wiki: Database List. Available online: http://vschart.com/list/database/ (accessed on 18 February 2019).
26. Chen, C.Y.; Chang, B.R.; Tsai, H.F.; Guo, C.L. Empirical Analysis of High Efficient Remote Cloud Data Center Backup Using HBase and Cassandra. Sci. Progr. 2014, 2015, 1–10.
27. Neo4j: Walmart Case Study. Available online: https://neo4j.com/case-studies/walmart/ (accessed on 10 December 2018).

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).