Data Modeling for NoSQL Document-Oriented Databases

Harley Vera, Wagner Boaventura, Maristela Holanda, Valeria Guimarães, Fernanda Hondo
Department of Computer Science, University of Brasília, Brasília, Brasil.
{harleyve,wagnerbf}@gmail.com, [email protected], {valeriaguimaraes,fernandahondo}@hotmail.com

Abstract

In database technologies, some of the new issues increasingly debated concern non-conventional applications, including NoSQL (Not only SQL) databases, which were initially created in response to the need for better scalability, lower latency and higher flexibility in an era of big data and cloud computing. These non-functional aspects are the main reason for using NoSQL databases. However, there are currently no systematic studies on data modeling for NoSQL databases, especially the document-oriented ones. Therefore, this article proposes a NoSQL data modeling standard in the form of ER diagrams, introducing modeling techniques that can be used on document-oriented databases. The purpose of this article is not to structure the data using the proposed model, but to help with the visualization of the data. In addition, to validate the proposed model, a case study was implemented using genomic data.

1 Introduction

Huge amounts of data are produced daily. Smartphones, social networks, bank transactions and sensor-equipped machines that are part of the Internet of Things generate information that is growing exponentially. The management of this data is currently performed in most cases by relational databases, which provide centralized control of data, redundancy control and elimination of inconsistencies (Elmasri and Navathe, 2010); however, some of these factors restrict the use of alternative database models. Consequently, certain limiting factors have led to alternative models of databases in these scenarios. Primarily motivated by the issue of system scalability, a new generation of databases, known as NoSQL, is gaining strength and space in information systems. NoSQL databases emerged in the late 1990s, from a database solution that did not provide an SQL interface. Later, the term came to represent solutions that promote an alternative to the Relational Model, becoming an abbreviation for Not Only SQL.

The purpose of NoSQL solutions, therefore, is not to replace the Relational Model as a whole, but only in cases in which there is a need for scalability and big data. In recent years, a variety of NoSQL databases has been developed, mainly by practitioners looking to fit their specific requirements regarding scalability, performance, maintenance and feature set. Subsequently, there have been various approaches to classifying NoSQL databases, each with different categories and subcategories, such as key-value stores, column-oriented, graph and document-oriented databases. MongoDB (MongoDB, 2015), Neo4j (Partner et al., 2013), Cassandra (D. Borthakur et al., 2011) and HBase (F. Chang et al., 2008) are examples of NoSQL databases. This article applies only to NoSQL document-oriented databases, because of the heterogeneous characteristics of each NoSQL database classification.

Nonetheless, data modeling still has an important role to play in NoSQL environments. The data modeling process (Elmasri and Navathe, 2010) involves the creation of a diagram that represents the meaning of the data and the relationships between the data elements. Thus, understanding is a fundamental aspect of data modeling (R. F. Lans, 2008), yet few contributions offer a pattern for this kind of representation for NoSQL databases.

Addressing this issue, this article proposes a standard for NoSQL data modeling. The proposal targets NoSQL document-oriented databases, aiming to introduce modeling techniques that can be used on databases with document features.

The remainder of the paper is organized as follows: Section 2 presents related works. Section 3 explores the concepts of modeling for NoSQL databases based on documents, introducing the different types of relationships and associations. Section 4 shows the proposed model in the context of NoSQL databases based on documents. Section 5 presents the case study used to validate the proposed model. Finally, Section 6 presents the conclusion of the research and future work.

2 Related Works

Katsov (H. Scalable, 2015) presents a study of techniques and patterns for data modeling using different categories of NoSQL databases. However, the approach is generic and does not define a specific modeling engine for each database.

Arora and Aggarwal (R. Arora and R. Aggarwal, 2013) propose a data modeling approach, but one restricted to the MongoDB document database, describing a UML class diagram and the JSON format to represent the documents.

Similarly, Banker (K. Banker, 2011) provides some ideas on data modeling, but they are limited to the MongoDB database and always refer to the JSON (D. Crockford, 2006) format as a modeling solution.

Kaur and Rani (K. Kaur, K. Rani, 2013) present a work on modeling and querying data in NoSQL databases; specifically, they present a case study for document-oriented and graph-based data models. For the document-oriented case, they propose a data modeling approach restricted to the MongoDB document database, describing the data model with a UML class diagram to represent documents.

3 Data Modeling For Document-Oriented Database

An important step in database implementation is data modeling, because it facilitates the understanding of the project through key features that can prevent programming and operation errors. For relational databases, data modeling uses the Entity-Relationship Model (Elmasri and Navathe, 2010). For NoSQL, it depends on the database category. The focus of this article is NoSQL document-oriented databases, where the data format of the documents can be JSON, BSON, or XML (S. J. Pramod, 2012).

Basically, the documents are stored in collections. A parallel can be drawn with relational databases: the equivalent of a collection is the relation (table), and the equivalent of a document is the record (tuple). Documents can store completely different sets of attributes, and can be mapped directly to a file format that can be easily manipulated by a programming language. However, it is difficult to abstract the modeling of documents into the entity-relationship model (R. F. Lans, 2008).

3.1 Modeling Paradigm for document-oriented Database

The relational model designed for SQL has some important features such as integrity, consistency, type validation, transactional guarantees, schemas and referential integrity. However, some applications do not need all of these features. Eliminating them has an important influence on the performance and scalability of data storage, bringing new meaning to data modeling.

Document-oriented databases offer some significant improvements, e.g., index management by the database itself, flexible layouts and advanced indexed search engines (H. Scalable, 2015). By associating these improvements (among them denormalization and aggregation) with the basic principles of data modeling in NoSQL, it is possible to identify some generic modeling standards associated with document-oriented databases. Analyzing the documentation of the main document-oriented databases, MongoDB (MongoDB, 2015) and CouchDB (CouchDB, 2015), similar representations of data mapping relationships can be found: References and Embedded Documents, structures that allow associating one document with another while retaining the advantages of specific performance needs and data retrieval patterns.

3.2 References Relationship

This type of relationship stores the data by including links, or references, from one document to another. Applications can resolve these references to access the related data in the structure of the document itself (MongoDB, 2015). Figure 1 shows two documents, one for Fastq files and the other for Activities.

Figure 1: Example of documents referenced

3.3 Embedded Documents

This type of relationship stores the related data in a single document structure, where the embedded documents are placed in a field or an array. These denormalized data models allow data manipulation in a single database operation (MongoDB, 2015). Figure 2 shows a document of a genome Project with an embedded Activity document.

Figure 2: Example of Embedded documents

4 Proposal For Document-Oriented Databases Viewing

Unlike traditional relational databases, which have a simple arrangement in rows and columns, a document-oriented database stores information in text format, which consists of collec- [...] conceptual representation modeling type, such as:

• Ensuring a single way of modeling for the several NoSQL document-oriented databases.
• Simplifying and facilitating the understanding of a document-oriented database through its conceptual model, leveraging the abstraction and making the correct decisions about the data storage.
• Providing an accurate, unambiguous and concise pattern, so that database administrators have substantial gains in abstraction and understanding.
• Presenting different types of relationships between collections, defined as References and Embedded documents.
• Assisting the recognition and arrangement of the objects, as well as their features and relationships with other objects.

The following subsections present the concepts and graphical notation used to build a conceptual model for NoSQL document-oriented databases.

4.1 Assumptions

Before starting the discussion about the approach of each type of the conceptual modeling repre-
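
As an illustration of the two relationship types described in Sections 3.2 and 3.3, the following minimal Python sketch contrasts a reference with an embedded document, using plain dictionaries in place of a real database. All collection names, field names and values here are hypothetical and chosen only for illustration; they are not taken from Figures 1 and 2, and no specific database API is assumed.

```python
import json

# Hypothetical documents: every name and value below is illustrative only.

# References (Section 3.2): each document lives in its own collection, and
# the Fastq file document stores only the identifier of its activity.
activities = {
    "act-1": {"_id": "act-1", "name": "filtering"},
}
fastq_files = {
    "fq-1": {"_id": "fq-1", "file_name": "sample_01.fastq", "activity_id": "act-1"},
}


def resolve_activity(fastq_doc, activity_collection):
    """Follow the reference held in a Fastq document (a second lookup)."""
    return activity_collection[fastq_doc["activity_id"]]


# Embedded documents (Section 3.3): the project document carries its
# activities inline, in an array field, so one read returns everything.
project = {
    "_id": "proj-1",
    "name": "genome project",
    "activities": [
        {"name": "filtering"},
        {"name": "alignment"},
    ],
}


def activity_names(project_doc):
    """List the names of the activities embedded in a project document."""
    return [activity["name"] for activity in project_doc["activities"]]


referenced = resolve_activity(fastq_files["fq-1"], activities)
embedded = activity_names(project)

# Documents of this shape map directly to JSON text and back.
round_tripped = json.loads(json.dumps(project))
```

In this sketch the referenced form needs two lookups to reach the activity, while the embedded form returns the project and its activities together, which is the trade-off the two sections describe.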
