Performance Regression Detection in DevOps

Jinfu Chen

A Thesis in The Department of Computer Science and Software Engineering

Presented in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy (Software Engineering) at Concordia University, Montréal, Québec, Canada

November 2020

© Jinfu Chen, 2020

CONCORDIA UNIVERSITY
School of Graduate Studies

This is to certify that the thesis prepared

By: Jinfu Chen
Entitled: Performance Regression Detection in DevOps

and submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Software Engineering) complies with the regulations of this University and meets the accepted standards with respect to originality and quality.

Signed by the Final Examining Committee:

Chair: Dr. Youmin Zhang
External Examiner: Dr. Andy Zaidman
Examiner: Dr. Nikolaos Tsantalis
Examiner: Dr. Jinqiu Yang
Examiner: Dr. Yan Liu
Supervisor: Dr. Weiyi Shang

Approved by: Dr. Leila Kosseim, Graduate Program Director
November 9, 2020
Dr. Mourad Debbabi, Dean, Gina Cody School of Engineering and Computer Science

Abstract

Performance Regression Detection in DevOps
Jinfu Chen, Ph.D.
Concordia University, 2020

Performance is an important aspect of software quality. Performance goals are typically defined by setting upper and lower bounds for the response time and throughput of a system and for physical-level measurements such as CPU, memory, and I/O. To meet such performance goals, several performance-related activities are needed in development (Dev) and operations (Ops). Large software system failures are often due to performance issues rather than functional bugs. One of the most important performance issues is performance regression. Although not all performance regressions are bugs, they often have a direct impact on users' experience of the system. Detecting performance regressions in development and operations faces several challenges.
First, the detection of performance regressions is conducted after the fact, i.e., after the system is built and deployed in the field or in dedicated performance testing environments. Large amounts of resources are required to detect, locate, understand, and fix performance regressions at such a late stage in the development cycle. Second, even if we can detect a performance regression, it is extremely hard to fix because other changes are applied to the system after the regression is introduced. These challenges call for further in-depth analyses of performance regressions.

In this thesis, to prevent performance regressions from slipping into operation, we first perform an exploratory study on the source code changes that introduce performance regressions in order to understand the root causes of performance regressions at the source code level. Second, we propose an approach that automatically predicts whether a test would manifest performance regressions in a code commit. Most performance issues are related to configurations. Therefore, third, we propose an approach that predicts whether a configuration option manifests a performance variation issue. Finally, to assist practitioners in analyzing system performance with operational data, we propose an approach to recovering field-representative workloads that can be used to detect performance regressions.

Related Publications

The following publications are related to the materials presented in this thesis:

• Jinfu Chen and Weiyi Shang. An Exploratory Study of Performance Regression Introducing Code Changes. In 2017 IEEE International Conference on Software Maintenance and Evolution, ICSME 2017, Shanghai, China, September 17-22, 2017. 341-352.
• Jinfu Chen, Weiyi Shang, Ahmed E. Hassan, Yong Wang, and Jiangbin Lin. An Experience Report of Generating Load Tests Using Log-Recovered Workloads at Varying Granularities of User Behaviour. In 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, San Diego, CA, USA, November 11-15, 2019. IEEE, 669-681.
• Jinfu Chen. Performance Regression Detection in DevOps. In the 42nd International Conference on Software Engineering Doctoral Symposium (ICSE DS). 2020.
• Jinfu Chen, Weiyi Shang, and Emad Shihab. PerfJIT: Test-level Just-in-time Prediction for Performance Regression Introducing Commits. IEEE Transactions on Software Engineering, 2020.
• Jinfu Chen, Mohammed Sayagh, Heng Li, Weiyi Shang, and Bram Adams. An Empirical Study on the Inconsistent Options Performance Variation. Empirical Software Engineering, 2020. [Under Review]

The following publications are not directly related to the material presented in this thesis but were conducted as parallel work to the research presented in this thesis.

• Yi Zeng, Jinfu Chen, Weiyi Shang, and Tse-Hsun Chen. Studying the characteristics of logging practices in mobile apps: a case study on F-Droid. Empirical Software Engineering, pp. 1-41, 2019.
• Zishuo Ding, Jinfu Chen, and Weiyi Shang. Towards the Use of the Readily Available Tests from the Release Pipeline as Performance Tests. Are We There Yet? In the 42nd International Conference on Software Engineering (ICSE). 2020. ACM SIGSOFT Distinguished Paper Award.
• Lizhi Liao, Jinfu Chen, Heng Li, Yi Zeng, Weiyi Shang, Jianmei Guo, Catalin Sporea, Andrei Toma, and Sarah Sajedi. Using black-box performance models to detect performance regressions under varying workloads: an empirical study. Empirical Software Engineering, 1-31, 2020.
• Lizhi Liao, Jinfu Chen, Heng Li, Yi Zeng, Weiyi Shang, Catalin Sporea, Andrei Toma, and Sarah Sajedi. Locating Performance Regression Root Causes in the Field for Web-based Systems. IEEE Transactions on Software Engineering, 2020. [Under Review]

Statement of Originality

I, Jinfu Chen, hereby declare that I am the sole author of this thesis.
All ideas and inventions attributed to others have been properly referenced. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Dedications

To my family.

Acknowledgments

First, I would like to express my greatest gratitude to my supervisor, Dr. Weiyi (Ian) Shang, for all his guidance and unconditional support. This thesis would have been impossible to complete without your aid and support. Ian, thank you for giving me the opportunity to tackle this challenge and for believing in me. You were there every time I hesitated to take the next step, and you guided me to success. I have learned a great deal from you, as a researcher and beyond.

I would also like to show my sincere gratitude to my examination committee members, Dr. Nikolaos Tsantalis, Dr. Emad Shihab, Dr. Jinqiu Yang, Dr. Yan Liu, and Dr. Andy Zaidman, for taking their precious time to consider my work and offer insightful comments, support, and guidance. Many thanks to my examiners for their valuable feedback.

I extend my gratitude to the professors and collaborators with whom I worked the closest for my degree and research: Dr. Tse-Hsun (Peter) Chen, Dr. Nikolaos Tsantalis, Dr. Emad Shihab, Dr. Jinqiu Yang, Dr. Heng Li, Dr. Ahmed E. Hassan, Dr. Mohammed Sayagh, Dr. Bram Adams, and Dr. Jianmei Guo. It was a delight to follow your teachings and guidance.

An immense thank you to all the members of the Software Engineering and System Engineering (SENSE) lab: Zishuo Ding, Kundi Yao, Guilherme Bicalho de Padua, Maxime Lamothe, Mehran Hassani, Muhammad Moiz Arif, Armin Najafi, Zhenhao Li, Joy Zeng, Hetong Dai, Sophia Quach, Lizhi Liao, Haonan Zhang, Roger Xia, and Nian Liu. I would also like to thank the members of our peer labs DAS and SPEAR: Dr. Rabe Abdalkareem, Dr. Diego Elias Costa, Suhaib Mujahid, Giancarlo Sierra Monge, Sultan Wehaibi, Xiaowei Chen, Ahmad Abdellatif, Olivier Noury, Zi Peng, Steven Locke, and An Ran Chen. I am glad to call you not only my colleagues but also my friends.

While pursuing this Ph.D. degree, I received the constant support of my closest friends outside academia. Thank you for always being there.

Last but not least, I would like to thank my family for their unconditional support. A special thank you to my wife, Yuanhui. Despite the distance that sets us apart, I never felt alone on this journey with you by my side. You are the light that keeps me going.

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Introduction
  1.2 Research hypothesis
  1.3 Thesis Overview
    1.3.1 Chapter 2: Background and Literature Review
    1.3.2 Chapter 3: What are the prevalence and root-causes of performance regressions introducing code changes?
    1.3.3 Chapter 4: Can we predict tests that manifest performance regressions at the commit level?
    1.3.4 Chapter 5: Can we predict whether a configuration option manifests performance variation?
    1.3.5 Chapter 6: Can we generate load tests using log-recovered workloads at varying granularities of user behavior?
  1.4 Thesis Contributions
  1.5 Thesis organization
2 Background and Literature Review
  2.1 Background
    2.1.1 DevOps
    2.1.2 Software performance
    2.1.3 Performance regression
    2.1.4 Performance testing
  2.2 Literature review