Big Data Management: What to Keep from the Past to Face Future Challenges? Genoveva Vargas-Solar, J Zechinelli-Martini, J Espinosa-Oviedo
Total Page:16
File Type:pdf, Size:1020Kb
Big Data Management: What to Keep from the Past to Face Future Challenges? Genoveva Vargas-Solar, J Zechinelli-Martini, J Espinosa-Oviedo To cite this version: Genoveva Vargas-Solar, J Zechinelli-Martini, J Espinosa-Oviedo. Big Data Management: What to Keep from the Past to Face Future Challenges?. Data Science and Engineering, Springer, 2017, Data Science and Engineering, 25, pp.38 - 38. 10.1007/s41019-017-0043-3. hal-01583807 HAL Id: hal-01583807 https://hal.archives-ouvertes.fr/hal-01583807 Submitted on 8 Sep 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Data Sci. Eng. DOI 10.1007/s41019-017-0043-3 Big Data Management: What to Keep from the Past to Face Future Challenges? 1 2 3 G. Vargas-Solar • J. L. Zechinelli-Martini • J. A. Espinosa-Oviedo Received: 19 August 2016 / Revised: 24 April 2017 / Accepted: 1 August 2017 Ó The Author(s) 2017. This article is an open access publication Abstract The emergence of new hardware architectures, Data aware systems (intelligent systems, decision making, and the continuous production of data open new challenges virtual environments, smart cities, drug personalization). for data management. It is no longer pertinent to reason These functions, must respect QoS properties (e.g., secu- with respect to a predefined set of resources (i.e., com- rity, reliability, fault tolerance, dynamic evolution and puting, storage and main memory). Instead, it is necessary adaptability) and behavior properties (e.g., transactional to design data processing algorithms and processes con- execution) according to application requirements. Mature sidering unlimited resources via the ‘‘pay-as-you-go’’ and novel system architectures propose models and model. According to this model, resources provision must mechanisms for adding these properties to new efficient consider the economic cost of the processes versus the use data management and processing functions delivered as and parallel exploitation of available computing resources. services. This paper gives an overview of the different In consequence, new methodologies, algorithms and tools architectures in which efficient data management functions for querying, deploying and programming data manage- can be delivered for addressing Big Data processing ment functions have to be provided in scalable and elastic challenges. architectures that can cope with the characteristics of Big Keywords Big data Á Service-based data management systems Á NoSQL Á Data analytics stack 1 Introduction This work has been partially granted by the CONCERNS project granted by Microsoft, the AMBED and Multicloud projects of the ARC6 and ARC7 programs and by the CONACyT of the Mexican Database management systems (DBMS) emerged as a government. flexible and cost-effective solution to information organi- zation, maintenance and access problems found in orga- & J. A. Espinosa-Oviedo nizations (e.g., business, academia and government). [email protected] DBMS addressed these problems under the following G. Vargas-Solar conditions: (i) with models and long-term reliable data [email protected] storage capabilities; (ii) providing retrieval and manipula- J. L. Zechinelli-Martini tion facilities for stored data for multiple concurrent users [email protected] or transactions [40]. The concept of data model (most 1 Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, LAFMIA, notably the relational models like the one proposed by 38000 Grenoble, France Codd [29]; and the object-oriented data models [8]; Cattell 2 Fundacio´n Universidad de las Ame´ricas, Puebla, Puebla, and Barry [25]), the Structured Query Language (SQL, Mexico Melton and Simon [81]), and the concept of transaction 3 Barcelona Supercomputing Center, LAFMIA, Barcelona, (Gray and Reuter [57]) are crucial ingredients of successful Spain data management in current enterprises. 123 G. Vargas-Solar et al. History of DBs & DBMS Active WEB-based Spatial DBMS Temporal XML Embedded Multimedia Object & OR Internet Integration Data Semantics Deductive Data Services J.R. Abrial Parallel DB Hierarchical ( ) Documents P. Chen Mobile & Network E-R ( ) Services DBMS Data Mobile Internet OLTP Warehouse Dataspace WWW Relational OLCP Mining Sensors OLAP Email Files (T. Codd) Distributed DB DSMS Research network 1960 1970 1980 1990 2000 2005 2010 people to people to machines to people machines machines Fig. 1 Historical outline of DBMS [1] 1970 1980 1995 2001 2009 Today, the data management market is dominated by Fig. 2 Shifts in the use of the Web [79] major object-relational database management systems (OR-DBMS) like Oracle,1 DB2,2 or SQLServer.3 These outside of the control of a single DBMS. This resulted in an systems arose from decades of corporate and academic increased use of data integration techniques and exchange research pioneered by the creators of System R [6] and data formats such as XML. However, despite its recent Ingres [97]. development, the Web itself has experienced significant Since their emergence, innovations and extensions have evolution, resulting in a feedback loop between database been proposed to enhance DBMS in power, usability, and and Web technologies, whereby both depart from their spectrum of applications (see Fig. 1). traditional dwellings into new application domains. The introduction of the relational model [29], prevalent Figure 2 depicts the major shifts in the use of the Web. today, addressed the shortcomings of earlier data models. A first phase saw the Web as a mean to facilitate com- Subsequent data models, in turn, were relegated or became munication between people, in the spirit of traditional complementary to the relational model. Further develop- media and mainly through email. Afterward, the WWW ments focused on transaction processing and on extending made available vast amounts of information in the form of DBMS to support new types of data (e.g., spatial, multi- HTML documents, which can be regarded as people-to- media, etc.), data analysis techniques and systems (e.g., machines communication. Advances in mobile communi- data warehouses, OLAP systems, data mining). The evo- cation technologies extended this notion to mobile devices lution of data models and the consolidation of distributed and a much larger number of users, the emergence of IoT systems made it possible to develop mediation infrastruc- environments increases the number of data producers to tures [109] that enable transparent access to multiple data thousands of devices. A more recent trend exemplified by sources through querying, navigation and management Web Services, and later Semantic Web Services, as well as facilities. Examples of such systems are multi-databases, Cloud computing,4 consists of machine-to-machine com- data warehouses, Web portals deployed on Internet/In- munication. Thus, the Web has also become a platform for tranets, polyglot persistence solutions [78]. Common issues applications and a plethora of devices to interoperate, share tackled by such systems are (i) how to handle diversity of data and resources. data representations and semantics? (ii) how to provide a Most recent milestones in data management (cf. Fig. 1) global view of the structure of the information system have addressed data streams leading to data stream man- while respecting access and management constraints? (iii) agement systems (DSMS). Besides, mobile data providers how to ensure data quality (i.e., freshness, consistency, and consumers have also led to data management systems completeness, correctness)? dealing with mobile queries and mobile objects. Finally, Besides, the Web of data has led to Web-based DBMS the last 7 years concern challenges introduced by the XXL and XML data management systems serving as pivot phenomenon including the volume of data to be managed models for integrating data and documents (e.g., active (i.e., Big Data volume) that makes research turn back the XML). The emergence of the Web marked a turning point, eyes toward DBMS architectures (cloud, in-memory, GPU since the attention turned to vast amounts of new data DBMSs), data collection construction5 and parallel data 1 http://www.oracle.com. 4 Cloud computing enables on-demand network access to computing 2 http://www-01.ibm.com/software/data/db2/. resources managed by external parties. 3 http://www.microsoft.com/sqlserver/2008/en/us/. 5 See http://www.datascienceinstitute.org. 123 Big Data Management: What to Keep from the Past to Face Future Challenges? processing. In this context, it seems that Big Data pro- – (ii) deal with exhaustive, partial and approximate cessing must profit from available computing resources by answers; applying parallel execution models, thereby achieving – (iii) use execution models that consider accessing results in ‘‘acceptable’’ times. services as data providers and that include as a Relational queries are ideally suited to parallel execu- source the wisdom of the crowd; tion because they consist of uniform operations applied to – (iv) integrate ‘‘cross-layer’’ optimization that uniform streams of data. Each operator produces