Science, EngineeringDimitar Pilev, & Education, Ventzislav 1, Nikolov (1), 2016, 75-82

Developing of Node.js module for working with an effective temporal model in NoSQL data bases

Dimitar Pilev, Ventzislav Nikolov *

University of Chemical Technology and Metallurgy, 8 Kl. Ohridski, 1756 Sofia, Bulgaria

Received 15 March 2016, Accepted 05 September 2016

ABSTRACT

The ever growing user requirements to access information demands using temporal database models. In such cases not only the data but the time intervals of their validity should be saved. The present communication describes the develop- ment of Node.js software module to process temporal data in the non-relational DBMS MongoDB. Asynchronous software model leads to extremely fast processing of large data amounts of temporal databases. Keywords: temporal databases, Node.js, NoSQL database, MongoDB, asynchronous programming.

known Java EE [1], Apache PHP [2] and ASP. INTRODUCTION NET [3], as well as the new, but recently increas- ingly popular Node.js [4, 5] are among them. The continuous increase of the number of The Node.js [6] platform is quickly gaining users utilizing the internet as means of com- popularity amongst web developers thanks to the munication, payment, shopping, etc. imposes ease and speed of processing user queries. The the use of new technologies when implementing use of the Node.js technology is exponentially web-based applications. The latter should direct increasing. Leading companies such as PayPal, a large number of user queries towards the large LinkedIn, WalMart [7] have rewritten their ap- amount of information stored in the databases. plications using Node.js. One of the main requirements towards any In a number of cases, the data base appears contemporary web-application is related to its as the primary source of delay in processing user quick response time. The main objective is to queries. The delay increases proportionately to decrease the response time of submitted user que- the processed data amount. This is the reason why ries regardless of their number and complexity. leading companies such as Google, Facebook and At present there is a number of platforms Amazon are making the transition towards using for building web-based applications. The well- a new type of databases called non-relational

* Correspondence to: Ventzislav Nikolov, University of Chemical Technology and Metallurgy, 8 Kl. Ohridski, 1756 Sofia, Bulgaria, E-mail: [email protected]

75 Science, Engineering & Education, 1, (1), 2016

databases (NoSQL - Not only Structured Query the extreme popularity of the problem, however, Language) [8, 9]. commercial DBMS such as Oracle and Teradata The NoSQL database provides a mechanism have already published [10], [11] new specifica- for storage and restoration of data using a free, tions of DBMS with temporal support. An addi- coordinated model of data unlike the relational tional surface layer has been developed in [12] model of DB. A non-relational database is an op- to DBMS MySQL to implement it. timized depository containing information of the The purpose of the development described in type key-value. Its design purpose is to ease the this communication is to implement a module for processes of restoration and addition of informa- the Node.js platform providing a effective time tion aiming to optimize the efficiency under the temporal support of data stored in MongoDB conditions of introduction and storage of large database. The module allows the processing of amounts of data. queries related to addition, deletion, modifica- Traditional models of DB and DBMS are tion and searching of temporal information in capable of storing and processing only the pres- the database. ent state of the modeled subject area. These are normally present states, whose old values are An effective temporal model destroyed when change is necessary. The constantly increasing demand of infor- The effective temporal model (ETM) is us- mation imposes the use of temporal DB, where ing effective time representing a combination of not only data is stored, but also the period of valid and transactional time. The beginning of its validity. One of the major reasons for non- the effective period is set at the current time, i.e. reporting the change of data states in the course the moment of recording the cortege in DB (the of time refers to the lack of adequate maintenance beginning of the period marking the transactional of the temporal factor in DBMS. Until recently time, Ts), and its end is set at the time marking the existing DBMS would not allow temporal the end of the period of the valid time, Ve. processing of data. Taking into consideration The addition of data to the relation under

( , ( 1, … , ), ) = β€² 𝑛𝑛 𝑒𝑒 𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖{(π‘Ÿπ‘Ÿ1,π‘Žπ‘Žβ€¦ , |π‘Žπ‘Ž )}𝑑𝑑 ( 1, … , | ) ( 1, … , | ) ( , ) β€² β€² π‘Ÿπ‘Ÿ βˆͺ π‘Žπ‘Ž π‘Žπ‘Žπ‘›π‘› 𝑑𝑑𝑒𝑒 𝑖𝑖𝑖𝑖 βˆ„π‘‘π‘‘π‘’π‘’ οΏ½ π‘Žπ‘Ž π‘Žπ‘Žπ‘›π‘› 𝑑𝑑𝑒𝑒 ∈ π‘Ÿπ‘ŸοΏ½ ⋁ βˆƒπ‘‘π‘‘π‘’π‘’ οΏ½ π‘Žπ‘Ž π‘Žπ‘Žπ‘›π‘› 𝑑𝑑𝑒𝑒 ∈ π‘Ÿπ‘Ÿ β‹€ βˆ„π‘‘π‘‘ ∈ π‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œ 𝑑𝑑𝑒𝑒 𝑑𝑑𝑒𝑒 οΏ½ ⎧ {( 1, … , | )} {( 1, … , | ( , ))} ( 1, … , | ) ( , ) βŽͺ β€² β€² 𝑛𝑛 𝑒𝑒 𝑛𝑛 𝑒𝑒 𝑒𝑒 𝑒𝑒 𝑛𝑛 𝑒𝑒 𝑒𝑒 𝑒𝑒 π‘Ÿπ‘Ÿ βˆ’ π‘Žπ‘Ž π‘Žπ‘Ž 𝑑𝑑 βˆͺ π‘Žπ‘Ž π‘Žπ‘Ž 𝑐𝑐 β„Ž π‘Žπ‘Ž π‘Žπ‘Ž π‘Žπ‘Ž π‘Žπ‘Ž π‘Žπ‘Žπ‘Žπ‘Žπ‘Žπ‘Ž 𝑑𝑑 𝑑𝑑 𝑖𝑖𝑖𝑖 βˆƒ 𝑑𝑑 οΏ½ π‘Žπ‘Ž π‘Žπ‘Ž 𝑑𝑑 ∈ π‘Ÿπ‘Ÿ β‹€ π‘šπ‘š π‘šπ‘šπ‘šπ‘šπ‘šπ‘š π‘šπ‘š 𝑑𝑑 𝑑𝑑 οΏ½ ⎨ βŽͺ a) addition of data βŽ©π‘Ÿπ‘Ÿ

( , ( 1, … , )) =

𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑 π‘Ÿπ‘Ÿ π‘Žπ‘Ž π‘Žπ‘Žπ‘›π‘› {( , … , | )} {( , … , | ( , ))} ( , … , | ) 1 1 1 in opposite case π‘Ÿπ‘Ÿ βˆ’ π‘Žπ‘Ž π‘Žπ‘Žπ‘›π‘› 𝑑𝑑𝑒𝑒 βˆͺ π‘Žπ‘Ž π‘Žπ‘Žπ‘›π‘› 𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑 𝑑𝑑𝑒𝑒 𝐢𝐢𝐢𝐢 π‘–π‘–π‘–π‘–βˆƒπ‘‘π‘‘π‘’π‘’ οΏ½ π‘Žπ‘Ž π‘Žπ‘Žπ‘›π‘› 𝑑𝑑𝑒𝑒 ∈ π‘Ÿπ‘Ÿ β‹€ 𝐢𝐢𝐢𝐢 ∈ 𝑑𝑑𝑒𝑒 οΏ½ οΏ½ π‘Ÿπ‘Ÿ b) deletion of data

( , ( 1, … , ), ) = ( ( , ( 1, … , )), ( 1, … , ), ) β€² β€² 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 π‘Ÿπ‘Ÿ π‘Žπ‘Ž π‘Žπ‘Žπ‘›π‘› 𝑑𝑑𝑒𝑒 𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖𝑖 𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑 π‘Ÿπ‘Ÿ π‘Žπ‘Ž π‘Žπ‘Žπ‘›π‘› π‘Žπ‘Ž π‘Žπ‘Žπ‘›π‘› 𝑑𝑑𝑒𝑒 Fig. 1. Updating data in relation.

76 Dimitar Pilev, Ventzislav Nikolov

ETM is implemented in case facts that are not is still active), then data modification, rather than recorded in DB (a1, … , an) have to be registered. data addition, is required. The latter are usually valid for a definite period of The updating of the values of data (a1, … , an) time. The effective time coincides with the valid which is part of the current state of the rela- time, which starts with the recording of the facts tion, requires the use of the following function: in DB. This means that the facts are valid in the changeETP(te, te) - this function changes the reality modeled being recorded in DB. The data effective time period,β€² during which the data in recording brings back a new updated version of the cortege is identically marked regardless of the the relation. meeting or overlapping of the old and the new

Functions referring to work with effective periods (te and te β€˜). temporal intervals are defined in the model with The deletion of a certain cortege consists of its the aim to present the semantics of data updat- logical removal from the current state of the rela- ing. They are: tion (Fig. 1b). The logical removal of the cortege l the function meets(te, te’ ) performing a test is performed by correcting the time period during for meeting of two intervals: which the non-temporal attributes (a1, … , an) are - ( , ) marked. Such correction is performed by setting β€² the time for ending the effective time period, 𝑒𝑒 𝑒𝑒 ( = ) π‘šπ‘šπ‘šπ‘šπ‘šπ‘šπ‘šπ‘šπ‘šπ‘š 𝑑𝑑 𝑑𝑑 which time is current at the moment of deletion. β€² βˆ’ π‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿ 𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑 𝑖𝑖𝑖𝑖 𝐸𝐸 𝑒𝑒 𝐸𝐸 𝑠𝑠 For this purpose we use the function ( ). l the function ( , ) performing a βˆ’ π‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿ π‘“π‘“π‘Žπ‘Žπ‘Žπ‘Žπ‘Žπ‘Žπ‘Žπ‘Ž 𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 [ , ) ( 𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑) 𝑑𝑑𝑒𝑒 test for partial overlap of two intervals:β€² π‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œ 𝑑𝑑𝑒𝑒 𝑑𝑑𝑒𝑒 - ( , ) βˆ’ π‘Ÿπ‘Ÿ π‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿ 0𝐸𝐸 𝑠𝑠 𝐢𝐢 𝐢𝐢 𝑖𝑖𝑖𝑖 else𝐢𝐢𝐢𝐢 ∈ 𝑑𝑑 𝑒𝑒 β€² π‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œπ‘œ 𝑑𝑑 𝑒𝑒 𝑑𝑑𝑒𝑒 ( < ) Ifβˆ’ theπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿ valuesπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿ of attributes in the cortege

β€² (a1, … , an) do not exist or, if they do exist but βˆ’ π‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿ 𝑑𝑑𝑑𝑑𝑑𝑑𝑑𝑑 𝑖𝑖𝑖𝑖 𝐸𝐸 𝑠𝑠 𝐸𝐸 𝑒𝑒 are not part of the current state of the relation, then βˆ’ π‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘Ÿπ‘ŸThereπ‘Ÿπ‘Ÿπ‘›π‘› are𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓𝑓 three cases𝑒𝑒𝑒𝑒𝑒𝑒 𝑒𝑒referring to the addition the deletion of the cortege will produce no effect. of new data to the relation (Fig. 1Π°). The modification of an existing cortege is Case one: Let the values of attributes performed through the β€œupdate” operation (Fig. (a1, … , an) , which do not depend on time, be 1c). It is defined as consecutive execution of the unrecorded in the relation until now, or comprise operations β€œdelete” and β€œinsert”. The proposed part of its previous states (Fig. 1Π°). In that case ETM reduces significantly the stored data vol- there is no record when the effective period of ume. There is significant reduction in the excess time overlaps the new period. In that situation we of information stored in the DB. At the same time add a new cortege to the relation. we can obtain a precise evaluation of the data Case two: If the values of attributes (a1, … , an) validity over time. are recorded in the relation and a record exists, An asynchronous programming model with then its period of validity te meeting the new te’, i.e. the effective period which is used to mark the Node.js cortege, will be updated to [Es, Ee). Case three: If the values of attributesβ€² are part Node.js is a platform built on the basis of of the current state of the relation (the effective Google V8 JavaScript [13] for easy building of period, when the data in the cortege is marked, quick and scalable network applications. Node.

77 Science, Engineering & Education, 1, (1), 2016 js uses an β€œevent-driven” asynchronous input/ also switching between separate threads. output model of operation based on a single The Node.js platform is only an environ- process, which makes it easy and efficient for ment, which means that the programmer must real-time applications. do everything by himself. By default there is Node.js allows the development of applica- neither HTTP, nor any other server. A single tions written in JavaScript, which are executed by script contains the entire communication with the server. The platform contains the most popular the users. This significantly reduces the number operation systems such as Windows, Mac and of resources used by the web application. . The environment provides an opportunity to interact with input/output devices through its A data model at MongoDB API written in C++. This allows a connection to other external libraries written in different lan- MongoDB [14] is a document-oriented data- guages by using JavaScript code. base management system, which stores structured Node.js has a module architecture to simplify information in JSON format [15] with dynamic the creation of complex applications. It contains scheme. This makes the information integration built-in asynchronous input/output libraries for in certain applications much easier and faster. working with files, sockets and HTTP commu- MongoDB has gradually become one of the most nication. popular non-relational DB, often referred to as With traditional and well-known platforms of NoSQL. Java EE and PHP queries are processed by creating Table 1 shows the terminology used with a separate thread or process for each query. Unlike MongoDB compared to that of the relational them, Node.js implements an entirely different model. technology operating with a single thread based on Data in MongoDB is stored in the form of asynchronous queries processing (Fig. 2). For this documents set in JSON format (Fig. 3). Within purpose the so-called event cycle is performed, each separate document, the value of a certain which provides not only processing of tenths field may be of a random type including another of thousands simultaneous queries without any document, data set or data sets. concerns related to limited RAM memory, but MongoDB stores all documents in collections. A collection (Fig. 4) is a group of related docu- ments having a set of shared common indexes. Collections are analogous to the tables in the relational databases. The advantage of MongoDB is that it stores information for a particular object in a single document (Fig. 5), unlike the dividing and storing it in several tables typical for the relational DB model. The disadvantage of the non-relational

Fig. 2. Node.js event cycle. Fig. 3. MongoDB document.

78 Dimitar Pilev, Ventzislav Nikolov

Table 1. Terminology used with MongoDB.

SQL MongoDB

Database Database

Table collectionhttp://docs.mongodb.org/manual/reference/glossary/ - term-collection

Row Document

Column Fieldhttp://docs.mongodb.org/manual/reference/glossary/ - term-field

index index

Connection of tables Built-in documents

Primary key Primary key

DB, refers however to the lack of, or to the document from the collection is referred to ef- partially minimum support of connections. The fective time. For this purpose an additional field functionality of the extracting information stored (characteristic) β€œperiod” is added to document in several different collections is performed by data, which sets the beginning and end of the ef- the application. fective time period marking the document (Fig. 6). DBMS of MongoDB uses an asynchronous The first step in Node.js module development is model for processing of queries submitted to the connected with the creation of the file package.. database, which increases significantly the effi- It provides information on the functionality of the ciency of the system working with large data sets. This makes it suitable for storage and processing of temporal data.

Design and implementation of the modele for working with Π΅Ρ‚ΠΌ under MongoDB

During the development of the module, a normalized data model is selected where each

Collection Fig. 5. MongoDB storage of data regarding an Fig. 4. MongoDB collection. issued invoice. 79 Science, Engineering & Education, 1, (1), 2016

Fig. 6. Storage of data under the temporal module Fig. 7. Contents of the package.json file describ- supported model. ing the module developed. module developed. the data, and the callback function. Fig. 7 shows the contents of the package.json l remove(rΠ΅mΠΎvΠ΅Obj, callback) – for deletion file used. The first two rows contain the name and of data from the collection. Two parameters are the. version, i.e. the mandatory fields for each set: criteria used to perform the deletion and a Node.js module. The description is followed by callback function typical for the asynchronous the keywords and name of developer. Node.js programming model; interface MongoDB is used in order to perform a l update(oldObj, newObj, pe, callback) – for connection MongoDB. The interface is included modification of the data in the collection. Criteria in the developed module as dependancy (Fig. 7, concerning the update, the new values, the end of row 16). the marking time period and the callback func- The functionality of the module is distributed tion are set; among the following functions: l find(query, options, callback) – for extrac- l connect(mongoUrl, collection, callback) – tion of temporal data from the collection. The for establishing a connection to the MongoDB β€œquery” parameter sets criteria for extraction of server of DB. The submitted input data consists of data from the collection. The second parameter the parameters of connection to MongoDB server β€œoptions” is optional and sets additional settings as well as the collection and callback function such as sorting, restriction of the number of docu- which starts after the connection is established; ments returned by the query, etc. The callback l insert(insObj, pe, callback) – for addition function does the processing of the query result; of data marked with effective time to the col- l close() – for terminating the connection to lection. The execution of the function requires the MongoDB server of DB. the data added to the collection (in the form of a All operations related to data updating and/or document), the end of the time period marking extraction are performed on a single collection.

80 Dimitar Pilev, Ventzislav Nikolov

var update = function(oldObj, newObj, pe, callback){ var etm = require("./lib/index.js");

remove(oldObj,function(res){ var mongoUrl = 'mongodb://127.0.0.1:27017/test';

if(res == 'Done' || res.result.ok==1) etm.connect(mongoUrl, 'proba', function(){

insert(newObj,pe,callback); console.log('Connected correctly to server');

else etm.insert({x:1,y:2}, new Date(2016,2,5),

callback('Cannot update collection!'); function(res){ }); console.log(res.result); } etm.find({}, function(docs){

The implementation of the β€œupdate” func- docs.toArray(function(err,data){ tion is in compliance with the requirements of ETM and the asynchronous programming model console.log(data); of Node.js In an ETM operation, β€œupdate” is etm.close(); restricted to consecutive execution of the opera- tions β€œdelete” and β€œinsert”. An addition of new }); data will be performed in the callback function of the β€œremove” function to observe the sequence }); of execution of both operations. This guarantees }); that the data new values are added only after the deletion of existing ones. }); Fig. 8 shows the programming code which demonstrates the use of the module developed. Fig. 8. Usage of the d module developed. A connection to the MongoDB server of DB is established after calling the module and setting chronous processing. A review is performed on the parameters of connection (row 1 and 2). Then the advantages and disadvantages of one of the an anonymous callback function is executed to most popular non-relational DBMS MongoDB. add the document {x:1,y:2} marked with a time A module has been developed for the Node.js period from the current time of execution of the platform providing to process temporal data in an query to 15.12.2016 (row 6). The β€œfind” function environment of MongoDB. The module allows is executed (row 8) only after the set document the update and extraction of data marked with is stored in DB. Thus all documents stored in the effective time. An asynchronous software model β€œproba” collection are printed. leads to extremely fast processing of large data amounts of temporal databases. CONCLUSIONS REFERENCES The present article introduces a new platform for building network applications - Node.js. The 1. Oracle, Java EE Documentation, available: environment allows quick and effective pro- http://www.oracle.com/technetwork/java/ cessing of user queries because of their asyn- javaee/documentation/index.html

81 Science, Engineering & Education, 1, (1), 2016

2. The Apache Software Foundation, available: base.org/ http://www.apache.org/ 10. Database, O., Workspace Manager Devel- 3. PHP, PHP Documentation, available: http:// oper’s Guide. September 2010. php.net/docs.php 11. Database, T., Temporal Table Support. Tera- 4. ASP.NET, Get Started with ASP.NET, avail- data Labs, 2012 able: http://www.asp.net/get-started 12. D. Pilev, A. Georgieva, Effective Time Tempo- 5. NodeJS, available: http://nodejs.org/ ral Database Model, International Journal on 6. JSConf Berlin,The European JavaScript Confer- Information Technologies and Security, ence, Berlin, November 7 & 8, 2009, available: 2012. N2: 33-46, ISSN 1313-8251. http://jsconf.eu/2009/ 13. Google V8, V8 JavaScript Engine, available: 7. WalMart, available: http://www.walmart.com/ https://code.google.com /p/v8/ 8. P. Sadalage, NoSQL Databases: An Overview, 14. MongoDB, available: https://www.mongodb. ThoughtWorks, October 2014 org/ 9. Nosql Database, availeble:http://nosql-data- 15. Introducing JSON, available: http://json.org/

82