Identifying and Evaluating Software Architecture Erosion

Total Page:16

File Type:pdf, Size:1020Kb

Identifying and Evaluating Software Architecture Erosion SCUOLA DI DOTTORATO UNIVERSITÀ DEGLI STUDI DI MILANO-BICOCCA Department of Informatics Systems and Communication PhD program Computer Science Cycle XXX IDENTIFYING AND EVALUATING SOFTWARE ARCHITECTURE EROSION Surname Roveda Name Riccardo Registration number 723299 Tutor: Prof. Alberto Ottavio Leporati Supervisor: Prof. Francesca Arcelli Fontana Coordinator: Prof. Stefania Bandini ACADEMIC YEAR 2016/2017 CONTENTS 1 introduction1 1.1 Main contributions of the thesis . 2 1.2 Research Questions . 4 1.3 Publications . 5 1.3.1 Published papers . 5 1.3.2 Submitted papers . 6 1.3.3 To be submitted papers . 6 1.3.4 Published papers not strictly related to the thesis . 6 2 related work8 2.1 Architectural smells definitions . 8 2.2 Architectural smells detection . 19 2.3 Code smells and architectural smells correlations . 20 2.3.1 Code smells correlations . 20 2.3.2 Code smells and architectural smell correlations . 21 2.4 Studies on software quality prediction and evolution . 22 2.4.1 Studies on software quality prediction . 22 2.4.2 Studies on software quality evolution . 23 2.5 Technical Debt Indexes . 23 2.5.1 CAST . 24 2.5.2 inFusion . 25 2.5.3 Sonargraph . 26 2.5.4 SonarQube . 26 2.5.5 Structure101 .......................... 27 2.6 Architectural smell refactoring . 27 3 experience reports on the detection of architectu- ral issues through different tools 29 3.1 Tools for evaluating code and architectural issues . 29 3.2 Tool support for evaluating architectural debt . 31 3.2.1 Evaluating the results inspection of the tools . 32 3.2.2 Evaluating the extracted data by the tools . 35 3.3 Detecting and repairing software architecture erosion . 38 3.3.1 Detecting and Repairing Design Erosion with Sonar- graph . 39 3.3.2 Detecting and Repairing Design Erosion with Struc- ture101 ............................. 45 3.3.3 Discussion and Lessons Learned . 50 3.4 The impact evaluation of architectural problems refactoring . 52 3.4.1 Study Setup . 53 3.4.2 Results . 55 3.4.3 Observations on the results . 59 3.4.4 Threats of Validity . 59 3.5 Conclusions . 61 3.5.1 First Study of Section 3.2 .................. 61 contents iii 3.5.2 Second Study of Section 3.3 ................. 62 3.5.3 Third Study of Section 3.4 .................. 63 4 architectural smell detection through arcan 65 4.1 Architecture of Arcan . 65 4.2 Architectural Smells . 68 4.2.1 Cyclic Dependency (CD) . 68 4.2.2 Unstable Dependency (UD) . 69 4.2.3 Hub-Like Dependency (HL) . 70 4.2.4 Specification-Implementation Violation (SIV) . 71 4.2.5 Multiple Architectural Smell (MAS) . 72 4.2.6 Implicit Cross Package Dependency (ICPD) . 72 4.3 Conclusions . 74 5 evaluation and validation of arcan results 76 5.1 Initial evaluation of Arcan results . 76 5.1.1 Unstable Dependency Smell Results . 77 5.1.2 Hub-Like Results . 78 5.1.3 Cyclic Dependency Results . 80 5.1.4 Implicit Cross Package Dependency Results . 80 5.2 Architectural Smells Validation: An Industrial Case Study . 83 5.3 Architectural Smells Validation: A Mixed-Method Study . 85 5.3.1 Study Variables and Data Extraction . 86 5.3.2 Data Extracted to Answer the Research Questions . 89 5.3.3 Empirical Study Results . 89 5.3.4 Threats to Validity . 90 5.4 Conclusions . 91 6 empirical analysis with arcan 93 6.1 Architectural Smell Prediction and Evolution . 93 6.1.1 Definition and setup of the case study . 95 6.1.2 Results . 98 6.1.3 Threats to validity . 106 6.1.4 Conclusions . 107 6.2 Code smells and Architectural smells correlations . 108 6.2.1 Background . 109 6.2.2 Case Study Design . 112 6.2.3 Data Collection . 113 6.2.4 Data Analysis . 115 6.2.5 Results . 116 6.2.6 Discussion . 120 6.2.7 Lessons Learned . 123 6.2.8 Threads to Validity . 124 6.2.9 Conclusions . 125 7 proposal of a new architectural debt index 126 7.1 Discussion on the main TDI features . 126 7.2 A new Architectural Debt Index . 129 7.3 Which Index should be defined? . 129 7.4 An Architectural Debt Index . 131 7.4.1 ASIS . 132 contents iv 7.4.2 The ASIS evaluation . 134 7.4.3 History . 138 7.4.4 Architectural Debt Index evaluation . 139 7.4.5 Architectural Debt Index Profiles . 141 7.5 Conclusion . 142 8 architectural smells refactoring: a preliminary study144 8.1 Set up of the Study . 145 8.1.1 Analyzed Projects . 145 8.1.2 Data collection . 146 8.2 Architectural Smells Refactoring Results . 147 8.2.1 Refactoring Results of the Hub-Like Dependency (HL) smell . 149 8.2.2 Refactoring Results of the Unstable Dependency (UD) smell . 149 8.2.3 Refactoring Results of the Cyclic Dependency (CD) smell150 8.2.4 Refactoring Results of all the three AS . 151 8.2.5 Impact of Refactoring on Quality indexes . 152 8.3 Discussion . 152 8.3.1 Hints on how to refactor the AS . 152 8.3.2 Hints on how to improve the AS detection through Arcan153 8.3.3 Hints on which AS remove first . 154 8.4 Threats to Validity . 154 8.4.1 Threats to Internal Validity . 154 8.4.2 Threats to External Validity . 155 8.4.3 Threats to Conclusion Validity . 155 8.5 Conclusions Remarks . 155 9 conclusions and future developments 157 9.1 Conclusions . 157 9.2 Future Developments . 158 bibliography 161 a appendix2 a.1 The analyzed projects . 2 LISTOFFIGURES Figure 3.1 Sonargraph - package exploration view . 33 Figure 3.2 inFusion - SAP Breaker instance inspection . 34 Figure 3.3 Sonargraph - Cycles view . 36 Figure 3.4 SonarQube - dependency matrix among non-maven modules . 37 Figure 3.5 Sonargraph - Example of cycle to cut in the Cycles View on RepoFinder . 39 Figure 3.6 Sonargraph - Exploration view on RepoFinder . 40 Figure 3.7 Sonargraph - Architecture Violations in RepoFinder . 42 Figure 3.8 Structure101 - Structural over-complexity view on Ant 46 Figure 3.9 Structure101 - Exploration View on Ant . 46 Figure 3.10 Structure101 - Architecture Diagram of Ant . 47 Figure 3.11 S101 - Structural over-Complexity . 60 Figure 4.1 Arcan Architecture . 66 Figure 4.2 Example of Git history log . 67 Figure 4.3 Cycles shapes [2]...................... 69 Figure 4.4 Hub-Like Dependency smell example (on classes) . 71 Figure 4.5 Specific-Implementation Violation example . 72 Figure 4.6 Implicit Cross Package Dependency example . 73 Figure 5.1 Filtered Unstable Dependency Results . 78 Figure 5.2 Example of Junit Hub-Like class . 79 Figure 5.3 Commit of Tomcat by month and number of modified Java files . 81 Figure 5.4 Max ICPD of Tomcat by month . 81 Figure 5.5 Commit of JGit by month and number of modified Java files . 81 Figure 5.6 Max ICPD of JGit by month . 82 Figure 5.7 ICPD trend changes for Support and Strength in Tomcat 82 Figure 5.8 ICPD trend changes for Support and Strength in JGit . 83 Figure 5.9 Architecture Issues in the Dataset, a pie chart. 90 Figure 6.1 Evolution of architectural smells of Commons-Math by month . 104 Figure 6.2 Evolution of architectural smells of JGit by month . 105 Figure 6.3 Evolution of architectural smells of JUnit by month . 105 Figure 6.4 Evolution of architectural smells of Tomcat by month . 105 Figure 6.5 Data Collection Process . 114 Figure 6.6 Number of packages infected by code smells or archi- tectural smells . 117 Figure 6.7 Number of code smells and architectural smells per package . 118 Figure 7.1 ADI workflow . 131 Figure 7.2 Example of AS evolution . 138 list of figures vi Figure 7.3 Evolution of ADI by project . 139 Figure 8.1 Sonargraph indexes . 153 LISTOFTABLES Table 3.1 Tools feature overview . 30 Table 3.2 Sonargraph results overview . 36 Table 3.3 SonarQube results overview . 37 Table 3.4 Sonargraph - Extracted metrics . 40 Table 3.5 RepoFinder components definition patterns . 41 Table 3.6 Ant components definition patterns . 43 Table 3.7 Structure101 - Detected Tangles . 48 Table 3.8 Density of tangled and fat item in Ant . ..
Recommended publications
  • Empirical Study on the Usage of Graph Query Languages in Open Source Java Projects
    Empirical Study on the Usage of Graph Query Languages in Open Source Java Projects Philipp Seifer Johannes Härtel Martin Leinberger University of Koblenz-Landau University of Koblenz-Landau University of Koblenz-Landau Software Languages Team Software Languages Team Institute WeST Koblenz, Germany Koblenz, Germany Koblenz, Germany [email protected] [email protected] [email protected] Ralf Lämmel Steffen Staab University of Koblenz-Landau University of Koblenz-Landau Software Languages Team Koblenz, Germany Koblenz, Germany University of Southampton [email protected] Southampton, United Kingdom [email protected] Abstract including project and domain specific ones. Common applica- Graph data models are interesting in various domains, in tion domains are management systems and data visualization part because of the intuitiveness and flexibility they offer tools. compared to relational models. Specialized query languages, CCS Concepts • General and reference → Empirical such as Cypher for property graphs or SPARQL for RDF, studies; • Information systems → Query languages; • facilitate their use. In this paper, we present an empirical Software and its engineering → Software libraries and study on the usage of graph-based query languages in open- repositories. source Java projects on GitHub. We investigate the usage of SPARQL, Cypher, Gremlin and GraphQL in terms of popular- Keywords Empirical Study, GitHub, Graphs, Query Lan- ity and their development over time. We select repositories guages, SPARQL, Cypher, Gremlin, GraphQL based on dependencies related to these technologies and ACM Reference Format: employ various popularity and source-code based filters and Philipp Seifer, Johannes Härtel, Martin Leinberger, Ralf Lämmel, ranking features for a targeted selection of projects.
    [Show full text]
  • Apache Oozie Apache Oozie Get a Solid Grounding in Apache Oozie, the Workflow Scheduler System for “In This Book, the Managing Hadoop Jobs
    Apache Oozie Apache Oozie Apache Get a solid grounding in Apache Oozie, the workflow scheduler system for “In this book, the managing Hadoop jobs. In this hands-on guide, two experienced Hadoop authors have striven for practitioners walk you through the intricacies of this powerful and flexible platform, with numerous examples and real-world use cases. practicality, focusing on Once you set up your Oozie server, you’ll dive into techniques for writing the concepts, principles, and coordinating workflows, and learn how to write complex data pipelines. tips, and tricks that Advanced topics show you how to handle shared libraries in Oozie, as well developers need to get as how to implement and manage Oozie’s security capabilities. the most out of Oozie. ■ Install and confgure an Oozie server, and get an overview of A volume such as this is basic concepts long overdue. Developers ■ Journey through the world of writing and confguring will get a lot more out of workfows the Hadoop ecosystem ■ Learn how the Oozie coordinator schedules and executes by reading it.” workfows based on triggers —Raymie Stata ■ Understand how Oozie manages data dependencies CEO, Altiscale ■ Use Oozie bundles to package several coordinator apps into Oozie simplifies a data pipeline “ the managing and ■ Learn about security features and shared library management automating of complex ■ Implement custom extensions and write your own EL functions and actions Hadoop workloads. ■ Debug workfows and manage Oozie’s operational details This greatly benefits Apache both developers and Mohammad Kamrul Islam works as a Staff Software Engineer in the data operators alike.” engineering team at Uber.
    [Show full text]
  • Unravel Data Systems Version 4.5
    UNRAVEL DATA SYSTEMS VERSION 4.5 Component name Component version name License names jQuery 1.8.2 MIT License Apache Tomcat 5.5.23 Apache License 2.0 Tachyon Project POM 0.8.2 Apache License 2.0 Apache Directory LDAP API Model 1.0.0-M20 Apache License 2.0 apache/incubator-heron 0.16.5.1 Apache License 2.0 Maven Plugin API 3.0.4 Apache License 2.0 ApacheDS Authentication Interceptor 2.0.0-M15 Apache License 2.0 Apache Directory LDAP API Extras ACI 1.0.0-M20 Apache License 2.0 Apache HttpComponents Core 4.3.3 Apache License 2.0 Spark Project Tags 2.0.0-preview Apache License 2.0 Curator Testing 3.3.0 Apache License 2.0 Apache HttpComponents Core 4.4.5 Apache License 2.0 Apache Commons Daemon 1.0.15 Apache License 2.0 classworlds 2.4 Apache License 2.0 abego TreeLayout Core 1.0.1 BSD 3-clause "New" or "Revised" License jackson-core 2.8.6 Apache License 2.0 Lucene Join 6.6.1 Apache License 2.0 Apache Commons CLI 1.3-cloudera-pre-r1439998 Apache License 2.0 hive-apache 0.5 Apache License 2.0 scala-parser-combinators 1.0.4 BSD 3-clause "New" or "Revised" License com.springsource.javax.xml.bind 2.1.7 Common Development and Distribution License 1.0 SnakeYAML 1.15 Apache License 2.0 JUnit 4.12 Common Public License 1.0 ApacheDS Protocol Kerberos 2.0.0-M12 Apache License 2.0 Apache Groovy 2.4.6 Apache License 2.0 JGraphT - Core 1.2.0 (GNU Lesser General Public License v2.1 or later AND Eclipse Public License 1.0) chill-java 0.5.0 Apache License 2.0 Apache Commons Logging 1.2 Apache License 2.0 OpenCensus 0.12.3 Apache License 2.0 ApacheDS Protocol
    [Show full text]
  • Persisting Big-Data the Nosql Landscape
    Information Systems 63 (2017) 1–23 Contents lists available at ScienceDirect Information Systems journal homepage: www.elsevier.com/locate/infosys Persisting big-data: The NoSQL landscape Alejandro Corbellini n, Cristian Mateos, Alejandro Zunino, Daniela Godoy, Silvia Schiaffino ISISTAN (CONICET-UNCPBA) Research Institute1, UNICEN University, Campus Universitario, Tandil B7001BBO, Argentina article info abstract Article history: The growing popularity of massively accessed Web applications that store and analyze Received 11 March 2014 large amounts of data, being Facebook, Twitter and Google Search some prominent Accepted 21 July 2016 examples of such applications, have posed new requirements that greatly challenge tra- Recommended by: G. Vossen ditional RDBMS. In response to this reality, a new way of creating and manipulating data Available online 30 July 2016 stores, known as NoSQL databases, has arisen. This paper reviews implementations of Keywords: NoSQL databases in order to provide an understanding of current tools and their uses. NoSQL databases First, NoSQL databases are compared with traditional RDBMS and important concepts are Relational databases explained. Only databases allowing to persist data and distribute them along different Distributed systems computing nodes are within the scope of this review. Moreover, NoSQL databases are Database persistence divided into different types: Key-Value, Wide-Column, Document-oriented and Graph- Database distribution Big data oriented. In each case, a comparison of available databases
    [Show full text]
  • Assessment of Multiple Ingest Strategies for Accumulo Key-Value Store
    Assessment of Multiple Ingest Strategies for Accumulo Key-Value Store by Hai Pham A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Master of Science Auburn, Alabama May 7, 2016 Keywords: Accumulo, noSQL, ingest Copyright 2016 by Hai Pham Approved by Weikuan Yu, Co-Chair, Associate Professor of Computer Science, Florida State University Saad Biaz, Co-Chair, Professor of Computer Science and Software Engineering, Auburn University Sanjeev Baskiyar, Associate Professor of Computer Science and Software Engineering, Auburn University Abstract In recent years, the emergence of heterogeneous data, especially of the unstructured type, has been extremely rapid. The data growth happens concurrently in 3 dimensions: volume (size), velocity (growth rate) and variety (many types). This emerging trend has opened a new broad area of research, widely accepted as Big Data, which focuses on how to acquire, organize and manage huge amount of data effectively and efficiently. When coping with such Big Data, the traditional approach using RDBMS has been inefficient; because of this problem, a more efficient system named noSQL had to be created. This thesis will give an overview knowledge on the aforementioned noSQL systems and will then delve into a more specific instance of them which is Accumulo key-value store. Furthermore, since Accumulo is not designed with an ingest interface for users, this thesis focuses on investigating various methods for ingesting data, improving the performance and dealing with numerous parameters affecting this process. ii Acknowledgments First and foremost, I would like to express my profound gratitude to Professor Yu who with great kindness and patience has guided me through not only every aspect of computer science research but also many great directions towards my personal issues.
    [Show full text]
  • Towards an Integrated Graph Algebra for Graph Pattern Matching with Gremlin
    Towards an Integrated Graph Algebra for Graph Pattern Matching with Gremlin Harsh Thakkar1, S¨orenAuer1;2, Maria-Esther Vidal2 1 Smart Data Analytics Lab (SDA), University of Bonn, Germany 2 TIB & Leibniz University of Hannover, Germany [email protected], [email protected] Abstract. Graph data management (also called NoSQL) has revealed beneficial characteristics in terms of flexibility and scalability by differ- ently balancing between query expressivity and schema flexibility. This peculiar advantage has resulted into an unforeseen race of developing new task-specific graph systems, query languages and data models, such as property graphs, key-value, wide column, resource description framework (RDF), etc. Present-day graph query languages are focused towards flex- ible graph pattern matching (aka sub-graph matching), whereas graph computing frameworks aim towards providing fast parallel (distributed) execution of instructions. The consequence of this rapid growth in the variety of graph-based data management systems has resulted in a lack of standardization. Gremlin, a graph traversal language, and machine provide a common platform for supporting any graph computing sys- tem (such as an OLTP graph database or OLAP graph processors). In this extended report, we present a formalization of graph pattern match- ing for Gremlin queries. We also study, discuss and consolidate various existing graph algebra operators into an integrated graph algebra. Keywords: Graph Pattern Matching, Graph Traversal, Gremlin, Graph Alge- bra 1 Introduction Upon observing the evolution of information technology, we can observe a trend arXiv:1908.06265v2 [cs.DB] 7 Sep 2019 from data models and knowledge representation techniques being tightly bound to the capabilities of the underlying hardware towards more intuitive and natural methods resembling human-style information processing.
    [Show full text]
  • Licensing Information User Manual
    Oracle® Database Express Edition Licensing Information User Manual 18c E89902-02 February 2020 Oracle Database Express Edition Licensing Information User Manual, 18c E89902-02 Copyright © 2005, 2020, Oracle and/or its affiliates. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or “commercial computer software documentation” pursuant to the applicable Federal
    [Show full text]
  • Large Scale Querying and Processing for Property Graphs Phd Symposium∗
    Large Scale Querying and Processing for Property Graphs PhD Symposium∗ Mohamed Ragab Data Systems Group, University of Tartu Tartu, Estonia [email protected] ABSTRACT Recently, large scale graph data management, querying and pro- cessing have experienced a renaissance in several timely applica- tion domains (e.g., social networks, bibliographical networks and knowledge graphs). However, these applications still introduce new challenges with large-scale graph processing. Therefore, recently, we have witnessed a remarkable growth in the preva- lence of work on graph processing in both academia and industry. Querying and processing large graphs is an interesting and chal- lenging task. Recently, several centralized/distributed large-scale graph processing frameworks have been developed. However, they mainly focus on batch graph analytics. On the other hand, the state-of-the-art graph databases can’t sustain for distributed Figure 1: A simple example of a Property Graph efficient querying for large graphs with complex queries. Inpar- ticular, online large scale graph querying engines are still limited. In this paper, we present a research plan shipped with the state- graph data following the core principles of relational database systems [10]. Popular Graph databases include Neo4j1, Titan2, of-the-art techniques for large-scale property graph querying and 3 4 processing. We present our goals and initial results for querying ArangoDB and HyperGraphDB among many others. and processing large property graphs based on the emerging and In general, graphs can be represented in different data mod- promising Apache Spark framework, a defacto standard platform els [1]. In practice, the two most commonly-used graph data models are: Edge-Directed/Labelled graph (e.g.
    [Show full text]
  • HDP 3.1.4 Release Notes Date of Publish: 2019-08-26
    Release Notes 3 HDP 3.1.4 Release Notes Date of Publish: 2019-08-26 https://docs.hortonworks.com Release Notes | Contents | ii Contents HDP 3.1.4 Release Notes..........................................................................................4 Component Versions.................................................................................................4 Descriptions of New Features..................................................................................5 Deprecation Notices.................................................................................................. 6 Terminology.......................................................................................................................................................... 6 Removed Components and Product Capabilities.................................................................................................6 Testing Unsupported Features................................................................................ 6 Descriptions of the Latest Technical Preview Features.......................................................................................7 Upgrading to HDP 3.1.4...........................................................................................7 Behavioral Changes.................................................................................................. 7 Apache Patch Information.....................................................................................11 Accumulo...........................................................................................................................................................
    [Show full text]
  • Introduction to Graph Database with Cypher & Neo4j
    Introduction to Graph Database with Cypher & Neo4j Zeyuan Hu April. 19th 2021 Austin, TX History • Lots of logical data models have been proposed in the history of DBMS • Hierarchical (IMS), Network (CODASYL), Relational, etc • What Goes Around Comes Around • Graph database uses data models that are “spiritual successors” of Network data model that is popular in 1970’s. • CODASYL = Committee on Data Systems Languages Supplier (sno, sname, scity) Supply (qty, price) Part (pno, pname, psize, pcolor) supplies supplied_by Edge-labelled Graph • We assign labels to edges that indicate the different types of relationships between nodes • Nodes = {Steve Carell, The Office, B.J. Novak} • Edges = {(Steve Carell, acts_in, The Office), (B.J. Novak, produces, The Office), (B.J. Novak, acts_in, The Office)} • Basis of Resource Description Framework (RDF) aka. “Triplestore” The Property Graph Model • Extends Edge-labelled Graph with labels • Both edges and nodes can be labelled with a set of property-value pairs attributes directly to each edge or node. • The Office crew graph • Node �" has node label Person with attributes: <name, Steve Carell>, <gender, male> • Edge �" has edge label acts_in with attributes: <role, Michael G. Scott>, <ref, WiKipedia> Property Graph v.s. Edge-labelled Graph • Having node labels as part of the model can offer a more direct abstraction that is easier for users to query and understand • Steve Carell and B.J. Novak can be labelled as Person • Suitable for scenarios where various new types of meta-information may regularly
    [Show full text]
  • SAS 9.4 Hadoop Configuration Guide for Base SAS And
    SAS® 9.4 Hadoop Configuration Guide for Base SAS® and SAS/ACCESS® Second Edition SAS® Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS® 9.4 Hadoop Configuration Guide for Base SAS® and SAS/ACCESS®, Second Edition. Cary, NC: SAS Institute Inc. SAS® 9.4 Hadoop Configuration Guide for Base SAS® and SAS/ACCESS®, Second Edition Copyright © 2015, SAS Institute Inc., Cary, NC, USA All rights reserved. Produced in the United States of America. For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc. For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication. The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others' rights is appreciated. U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a) and DFAR 227.7202-4 and, to the extent required under U.S.
    [Show full text]
  • The Query Translation Landscape: a Survey
    The Query Translation Landscape: a Survey Mohamed Nadjib Mami1, Damien Graux2,1, Harsh Thakkar3, Simon Scerri1, Sren Auer4,5, and Jens Lehmann1,3 1Enterprise Information Systems, Fraunhofer IAIS, St. Augustin & Dresden, Germany 2ADAPT Centre, Trinity College of Dublin, Ireland 3Smart Data Analytics group, University of Bonn, Germany 4TIB Leibniz Information Centre for Science and Technology, Germany 5L3S Research Center, Leibniz University of Hannover, Germany October 2019 Abstract Whereas the availability of data has seen a manyfold increase in past years, its value can be only shown if the data variety is effectively tackled —one of the prominent Big Data challenges. The lack of data interoperability limits the potential of its collective use for novel applications. Achieving interoperability through the full transformation and integration of diverse data structures remains an ideal that is hard, if not impossible, to achieve. Instead, methods that can simultaneously interpret different types of data available in different data structures and formats have been explored. On the other hand, many query languages have been designed to enable users to interact with the data, from relational, to object-oriented, to hierarchical, to the multitude emerging NoSQL languages. Therefore, the interoperability issue could be solved not by enforcing physical data transformation, but by looking at techniques that are able to query heterogeneous sources using one uniform language. Both industry and research communities have been keen to develop such techniques, which require the translation of a chosen ’universal’ query language to the various data model specific query languages that make the underlying data accessible. In this article, we survey more than forty query translation methods and tools for popular query languages, and classify them according to eight criteria.
    [Show full text]