Aalborg Universitet

Modeling, Annotating, and Querying Geo-Semantic Data Warehouses

Gür, Nurefsan

Publication date: 2020

Document Version: Publisher's PDF, also known as Version of Record

Link to publication from Aalborg University

Citation for published version (APA): Gür, N. (2020). Modeling, Annotating, and Querying Geo-Semantic Data Warehouses. Aalborg Universitetsforlag. Ph.d.-serien for Det Tekniske Fakultet for IT og Design, Aalborg Universitet

General rights: Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.
• You may freely distribute the URL identifying the publication in the public portal.

Take down policy: If you believe that this document breaches copyright please contact us at [email protected] providing details, and we will remove access to the work immediately and investigate your claim.



Modeling, Annotating, and Querying Geo-Semantic Data Warehouses Nurefşan Gür


Modeling, Annotating, and Querying Geo-Semantic Data Warehouses

Ph.D. Dissertation
Nurefşan Gür

Dissertation submitted January, 2020

A thesis submitted to the Technical Faculty of IT and Design at Aalborg University (AAU) and the Faculty of Engineering at Université Libre de Bruxelles (ULB), in partial fulfilment of the requirements within the scope of the IT4BI-DC programme for the joint Ph.D. degree in Computer Science. The thesis is not submitted to any other organization at the same time.

Dissertation submitted: January, 2020

AAU PhD Supervisor: Prof. Torben Bach Pedersen Aalborg University, Denmark

AAU PhD Co-Supervisor: Prof. Katja Hose Aalborg University, Denmark

ULB PhD Supervisor: Prof. Esteban Zimányi Université Libre de Bruxelles, Belgium

PhD committee:
Associate Professor Kristian Torp (chairman), Aalborg University
Professor Lars Harrie, GIS Centre, Lund University
Researcher Sandro Bimonte, National Research Institute of Science and Technology for Environment and Agriculture (IRSTEA)

ULB PhD Committee:
Assoc. Prof. Stijn Vansummeren, Université Libre de Bruxelles, Belgium
Assoc. Prof. Mahmoud Sakr, Université Libre de Bruxelles, Belgium
Assoc. Prof. Kristian Torp, Aalborg University, Denmark
Professor Lars Harrie, GIS Centre, Lund University
Dr. Sandro Bimonte, National Research Institute of Science and Technology for Environment and Agriculture (IRSTEA), France

PhD Series: Technical Faculty of IT and Design, Aalborg University
Department: Department of Computer Science
ISSN (online): 2446-1628
ISBN (online): 978-87-7210-587-1

Published by: Aalborg University Press Langagervej 2 | DK – 9220 Aalborg Ø Phone: +45 99407140 | [email protected] | forlag.aau.dk

© Copyright: Nurefşan Gür

Printed in Denmark by Rosendahls, 2020

Abstract

Due to recent advances in Semantic Web (SW) technologies and the world-wide movement of publishing Linked Open Data (LOD) by following a set of principles on the SW, many governmental organizations and public agencies have been publishing large volumes of geospatial data on the SW. This large amount of spatial data on the SW yields a need for advanced analysis. In the traditional database world, data warehouses and On-Line Analytical Processing (OLAP) have proven to be a well-constructed and efficient way of storing and querying massive amounts of data with advanced analytical perspectives. Current state-of-the-art SW technologies support only multi-dimensional annotation of data warehouses and querying with traditional OLAP operators over non-spatial data. To reveal new perspectives over SW data, it is important to utilize the spatial information of the data sets being published while modeling and querying the (SW) data warehouses. Even though there are many studies around the geospatial SW, none of them addresses the multi-dimensional aspects of the spatial data being published or supports querying with spatial OLAP (SOLAP) operators. We believe a large number of spatial data sets on the SW can be utilized to their full potential by addressing the lack of methods, tools, and applications for spatial data warehouses on the SW. Therefore, the thesis addresses several challenges related to geo-semantic data warehouses in order to model, query, and derive analytical perspectives over huge amounts of geo-semantic data with SOLAP operators and spatial multi-dimensional enrichment of SW data.

This thesis first presents best practices for publishing Danish governmental data sets from the agricultural, spatial, and business domains on the SW as an initial effort to publish open governmental data as Linked Open Data in Denmark. The published LOD endpoint is queried with standard and aggregate query templates and evaluated against three different query processing scenarios with native RDF, relational, and virtual strategies. Query performance for aggregate query templates on native RDF demonstrated significantly faster response times than the other strategies. The results establish a promising basis for geo-semantic data warehouses, as aggregate queries are fundamental in multi-dimensional models and in data warehousing.

Next, the thesis proposes a multi-dimensional cube vocabulary for SOLAP, QB4SOLAP, as an initial step towards a foundation for geo-semantic data warehouses. The thesis defines full formalizations of the multi-dimensional spatial concepts from QB4SOLAP (such as spatial dimension hierarchies, spatial measures, aggregate functions, and topological relations) in RDF format. By using the QB4SOLAP vocabulary, spatial data can be annotated and published on the SW with complete spatial multi-dimensional concepts, which were not available before. Spatial multi-dimensional annotation of RDF data with QB4SOLAP overcomes several limitations of spatial analytical querying of SW data by enabling a set of SOLAP operators (e.g., s-roll-up and s-slice), which reveal new analytical perspectives to the users. SOLAP operators create a dynamic interpretation of the multi-dimensional concepts (i.e., a dynamic spatial hierarchy) by employing spatial functions (e.g., distance) or topological relations (e.g., within). Because queries typically involve nesting of SOLAP operators, which makes it almost impossible for non-SW experts to formulate nested SOLAP queries in the native RDF query language (SPARQL), the thesis proposes algorithms for translating individual and nested SOLAP queries from high-level multi-dimensional expressions into SPARQL query syntax.

Furthermore, using the generation algorithms and the formalizations of QB4SOLAP and the SOLAP operators, a geo-semantic OLAP tool for the SW, GeoSemOLAP, is introduced to remove the entry barrier for data warehouse users who want to query geo-semantic data warehouses without knowledge of RDF/SPARQL. The GeoSemOLAP tool is tested and demonstrated with a non-trivial spatial multi-dimensional use case, which is modeled with the QB4SOLAP vocabulary. By using the QB4SOLAP multi-dimensional schema definitions, GeoSemOLAP parses the use case data with the corresponding spatial multi-dimensional concepts, such as spatial levels, spatial attributes, and spatial measures, which can be used as input parameters to the SOLAP operations from an intuitive and easy-to-use graphical user interface provided with interactive maps.

The QB4SOLAP vocabulary is validated on several use cases, including Danish governmental, environmental, agricultural, farming, and spatial data sets, which can be annotated with the spatial multi-dimensional concepts of QB4SOLAP. By applying QB4SOLAP to complex real-world data sets from various organizations and resources, a comprehensive insight is provided into spatial and multi-dimensional modeling of governmental data, together with best practices, discussions, and perspectives. In order to exploit the data with new analytical perspectives that were not available before, the use cases are queried with a set of common (individual and nested) SOLAP operators in SPARQL.

Finally, the thesis addresses the lack of automated methods for spatial multi-dimensional annotation on the Semantic Web and proposes an enrichment framework, RDF2SOLAP, to enrich existing RDF data cubes. The thesis proposes a set of hierarchical enrichment algorithms and a set of factual enrichment algorithms within the scope of the RDF2SOLAP enrichment process. In each set, there are algorithms for detecting explicit relations and discovering implicit relations between spatial level members (hierarchical enrichment) and between fact and level members (factual enrichment).
Moreover, in factual enrichment, the fact schema is automatically redefined from the outcome of the instance-level enrichment algorithms over fact members and level members. The RDF2SOLAP enrichment process allows spatial data warehouse users to query existing spatial RDF endpoints with SOLAP operators without needing to download, convert, model, and store the data in an offline spatial data warehouse. Experimental evaluation results show that the proposed framework processes the enrichment algorithms efficiently directly on the RDF data.

Resumé

På grund af de seneste fremskridt omkring Semantic Web (SW) teknologier og den verdensomspændende bevægelse for at udgive Linked Open Data (LOD) ved at følge et sæt af principper på det semantiske web (SW), har mange regeringsorganisationer og offentlige myndigheder udgivet store mængder geo-spatielle data på det semantiske web. Denne store mængde af spatielle data kræver avanceret analyse. I den traditionelle database-verden er datavarehuse og On-Line Analytical Processing (OLAP) anerkendt for at være en velstruktureret og effektiv metode for at gemme og forespørge på enorme mængder af data med et avanceret analytisk perspektiv. De seneste SW teknologier understøtter kun multi-dimensionel annotering af datavarehuse og forespørgsler med traditionelle OLAP operatorer over ikke-spatiel data. For at kunne understøtte nye perspektiver og avanceret (spatiel) analyse over SW data, når man modellerer og forespørger på (SW) datavarehuse, er det vigtigt at anvende de datasæt med spatiel information, der bliver udgivet imens. Selvom der er mange studier vedrørende det geo-spatielle semantiske web, er der ingen der adresserer det multi-dimensionelle aspekt af spatiel data, der bliver udgivet, eller understøtter forespørgsel af spatiel OLAP (SOLAP) operatorer. Vi mener at mange spatielle datasæt på det semantiske web kan blive anvendt til deres fulde potentiale ved at adressere manglen på metoder, værktøjer og applikationer af datavarehuse på det semantiske web. Derfor adresserer afhandlingen adskillige udfordringer relateret til geo-semantiske datavarehuse til at modellere, forespørge på og aflede analytiske perspektiver over enorme mængder af geo-semantiske data.

Først præsenterer afhandlingen den bedste praksis for at udgive danske offentlige datasæt fra landbrugs-, spatielle- og forretningsdomæner på det semantiske web som en initiel indsats til at udgive offentlige regeringsdata som Linked Open Data (LOD) i Danmark. Det udgivne LOD endpoint bliver forespurgt på standard og aggregerede forespørgselsskabeloner og evalueret mod tre forskellige forespørgselsprocesseringsscenarier med RDF, relationelle og virtuelle strategier. Effektiviteten af forespørgslerne for aggregerede forespørgselsskabeloner på RDF demonstrerede betydelig hurtigere responstid end andre strategier. Resultaterne etablerer en lovende basis for geo-semantiske

datavarehuse, da aggregerede forespørgsler er fundamentale i multi-dimensionelle modeller og i datavarehusning.

Derefter foreslår afhandlingen en multi-dimensionel kubeontologi for SOLAP, QB4SOLAP, som det første skridt for et fundament for geo-semantiske datavarehuse. Afhandlingen definerer fuld formalisering af multi-dimensionelle spatielle koncepter fra QB4SOLAP (fx. spatielle dimensionshierarkier, spatielle målinger, aggregerede funktioner og topologiske relationer) i RDF format. Ved at anvende QB4SOLAP ontologien kan spatielle data blive annoteret og udgivet på det semantiske web med komplette spatielle multi-dimensionelle koncepter, som ikke tidligere var tilgængelige. Spatielle multi-dimensionelle annoteringer af RDF dataen med QB4SOLAP overvinder adskillige begrænsninger med spatielle analyseforespørgsler af SW data med et sæt af SOLAP operatorer (fx. s-roll-up og s-slice), hvor SOLAP operatorerne giver nye analytiske perspektiver for brugerne. SOLAP operatorerne laver en dynamisk fortolkning af de multi-dimensionelle koncepter (dvs. et dynamisk spatielt hierarki) ved at anvende spatielle funktioner (fx. afstand) eller topologiske relationer (fx. indenfor). Pga. at datavarehusforespørgsler typisk involverer indlejring af SOLAP operatorer, som gør det næsten umuligt for ikke-SW-eksperter at formulere indlejrede SOLAP forespørgsler på RDF forespørgselssprog (SPARQL), foreslår afhandlingen algoritmer til at oversætte individuelle og indlejrede SOLAP forespørgsler fra højniveau multi-dimensionelle udtryk til SPARQL forespørgselssyntaks.

Yderligere, ved at anvende genereringsalgoritmerne og formaliseringerne af QB4SOLAP og SOLAP operatorer, introduceres et geo-semantisk OLAP værktøj for SW, GeoSemOLAP, som fjerner de indgangsbarrierer datavarehusbrugere har i forhold til at forespørge geo-semantiske datavarehuse uden viden om RDF/SPARQL. GeoSemOLAP værktøjet er testet og demonstreret med et ikke-trivielt spatielt multi-dimensionelt eksempel for anvendelse, som er modelleret med QB4SOLAP ontologien. Ved at anvende QB4SOLAPs multi-dimensionelle skemadefinitioner fortolker GeoSemOLAP dataen for anvendelseseksemplet med tilsvarende spatielle multi-dimensionelle koncepter, som spatielle niveauer, spatielle attributter og spatielle målinger, som kan blive anvendt som inputparametre til SOLAP operationerne fra en intuitiv og letanvendelig grafisk brugerflade, som er forsynet med interaktive kort.

QB4SOLAP ontologien er valideret på adskillige eksempler for anvendelse. Disse inkluderer offentlige, miljømæssige, landbrugs- og spatielle datasæt, som kan blive annoteret med spatielle multi-dimensionelle koncepter for QB4SOLAP. Ved at anvende QB4SOLAP på komplekse datasæt fra den virkelige verden fra forskellige organisationer og ressourcer, er en omfattende indsigt opnået på spatiel og multi-dimensionel modellering af offentlig data med bedste praksis, diskussioner og perspektiver. For at udnytte dataen med nye analytiske perspektiver, der ikke tidligere var tilgængelige, er anvendelseseksemplerne forespurgt på et sæt af fælles (individuelle og indlejrede) SOLAP operatorer i SPARQL.

Til slut adresserer afhandlingen manglen på automatiserede metoder for spatiel multi-dimensionel annotering på det semantiske web og foreslår et berigelsessystem, RDF2SOLAP, til at berige de eksisterende RDF datakuber. Afhandlingen foreslår et sæt af hierarkiske berigelsesalgoritmer og et sæt af faktuelle berigelsesalgoritmer inden for rammerne af RDF2SOLAP berigelsesprocessen. I hvert sæt er der algoritmer til at detektere eksplicitte relationer og til at opdage implicitte relationer mellem spatielle niveaumedlemmer (hierarkisk berigelse) og mellem fakta- og niveaumedlemmer (faktuel berigelse). Derudover er faktaskemaet i faktuel berigelse automatisk redefineret ud fra udfaldet af instansniveau-berigelsesalgoritmerne over faktamedlemmer og niveaumedlemmer. RDF2SOLAP berigelsesprocessen tillader spatielle datavarehusbrugere at forespørge på eksisterende spatielle RDF endpoints med SOLAP operatorer uden at være nødsaget til at hente, konvertere, modellere og gemme dataen i et offline spatielt datavarehus. Eksperimentelle evalueringsresultater viser, at det foreslåede system demonstrerer effektiv processering af berigelsesalgoritmerne direkte på RDF dataen.

Résumé

En raison des récents progrès des technologies du Web sémantique (WS) et du mouvement mondial de publication de données ouvertes liées (Linked Open Data ou LOD en anglais) en suivant un ensemble de principes sur le WS, de nombreuses organisations gouvernementales et agences publiques ont publié de grands volumes de données géospatiales sur le WS. Une telle quantité de données géospatiales sur le WS entraîne un besoin d'analyse avancée. Dans le domaine des bases de données, les entrepôts de données (Data Warehouses en anglais) et le traitement analytique en ligne (OnLine Analytical Processing ou OLAP en anglais) se sont avérés être un moyen bien conçu et efficace de stocker et d'interroger des quantités massives de données avec des perspectives avancées d'analyse. Les technologies logicielles de pointe actuelles ne prennent en charge l'annotation des entrepôts de données multidimensionnels et l'interrogation avec des opérateurs OLAP traditionnels que sur des données non spatiales. Pour révéler de nouvelles perspectives sur les données du WS, il est important d'utiliser l'information spatiale des jeux de données publiés, tout en modélisant et en interrogeant les entrepôts de données sur le WS. Même s'il existe de nombreuses études sur le WS géospatial, aucune d'entre elles n'aborde les aspects multidimensionnels de gestion des données spatiales publiées ni ne permet d'interroger les données à l'aide d'opérateurs OLAP spatiaux (Spatial OLAP ou SOLAP en anglais). Nous croyons qu'un grand nombre de jeux de données spatiales sur le WS peuvent être utilisés à leur plein potentiel en remédiant au manque de méthodes, d'outils et d'applications des entrepôts de données spatiales sur le WS. Par conséquent, cette thèse aborde plusieurs défis liés aux entrepôts de données géo-sémantiques afin de modéliser, d'interroger et de dériver des perspectives analytiques sur une énorme quantité de données géo-sémantiques avec les opérateurs SOLAP et l'enrichissement spatial et multidimensionnel des données sémantiques.

Dans un premier temps, la thèse présente les meilleures pratiques pour la publication de jeux de données gouvernementales danoises sur les domaines agricole, spatial et commercial sur le WS, comme un premier effort pour publier des données gouvernementales ouvertes sous forme de données ouvertes

liées au Danemark. Le point d'accès (endpoint en anglais) LOD est interrogé via des modèles de requête standard et agrégés, et évalué par rapport à trois différents scénarios de traitement de requêtes, à savoir, des solutions RDF natives, des solutions relationnelles, et des solutions virtuelles. Les modèles de requête agrégés sur RDF natif ont démontré un temps de réponse significativement plus rapide que les autres stratégies. Les résultats établissent une base prometteuse pour les entrepôts de données géo-sémantiques car les requêtes agrégées sont fondamentales pour les modèles multidimensionnels et pour l'entreposage de données.

Ensuite, la thèse propose un vocabulaire pour des cubes multidimensionnels et SOLAP, QB4SOLAP, comme étape initiale d'une fondation pour les entrepôts de données géo-sémantiques. La thèse définit des formalisations complètes des concepts spatiaux multidimensionnels de QB4SOLAP (tels que les hiérarchies de dimensions spatiales, les mesures spatiales, les fonctions agrégées, les relations topologiques, etc.) en format RDF. En utilisant le vocabulaire de QB4SOLAP, les données spatiales peuvent être annotées et publiées sur le WS avec des concepts multidimensionnels spatiaux complets, qui n'étaient pas disponibles auparavant. L'annotation spatiale des données RDF avec QB4SOLAP permet de surmonter plusieurs limitations de l'interrogation analytique spatiale des données sémantiques via un ensemble d'opérateurs SOLAP (p. ex. s-roll-up, s-slice, etc.) qui révèlent de nouvelles perspectives analytiques aux utilisateurs. Les opérateurs SOLAP créent une interprétation dynamique des concepts multidimensionnels (c.-à-d. une hiérarchie spatiale dynamique) en employant des fonctions spatiales (p. ex., la distance) ou des relations topologiques (p. ex., à l'intérieur). Du fait que dans les entrepôts de données les requêtes impliquent typiquement l'imbrication des opérateurs SOLAP et que cela rend presque impossible pour les non-experts en entrepôts de données de formuler des requêtes SOLAP imbriquées en langage de requêtes RDF natif (SPARQL), la thèse propose des algorithmes pour traduire des requêtes SOLAP individuelles et imbriquées à partir d'expressions multidimensionnelles de haut niveau en syntaxe SPARQL.

De plus, en utilisant les algorithmes de génération et les formalisations des opérateurs QB4SOLAP et SOLAP, un outil OLAP géo-sémantique pour le WS, GeoSemOLAP, est proposé pour supprimer la barrière d'entrée pour les utilisateurs d'entrepôts de données sans connaissance de RDF/SPARQL lorsqu'ils interrogent des entrepôts de données géo-sémantiques. L'outil GeoSemOLAP est testé et illustré avec un cas d'utilisation spatial multidimensionnel non trivial, qui est modélisé avec QB4SOLAP. En utilisant les définitions du schéma multidimensionnel de QB4SOLAP, GeoSemOLAP analyse les données du cas d'utilisation avec les concepts multidimensionnels spatiaux correspondants, tels que les niveaux spatiaux, les attributs spatiaux et les mesures spatiales, qui peuvent être utilisés comme paramètres d'entrée pour les opérations SOLAP à partir d'une interface utilisateur graphique intuitive et facile à utiliser, fournie avec des cartes interactives.

Le vocabulaire QB4SOLAP est validé dans plusieurs cas d'utilisation, y compris des jeux de données gouvernementales, environnementales, agricoles et spatiales du Danemark, qui peuvent être annotés avec les concepts multidimensionnels spatiaux de QB4SOLAP. En appliquant QB4SOLAP sur des jeux de données complexes du monde réel provenant de diverses organisations et ressources, un aperçu complet est apporté à la modélisation spatiale et multidimensionnelle des données gouvernementales avec des meilleures pratiques, discussions et perspectives. Afin d'exploiter les données avec de nouvelles perspectives analytiques, les cas d'utilisation sont interrogés avec un ensemble d'opérateurs SOLAP (individuels et imbriqués) communs en SPARQL.

Enfin, la thèse aborde le manque de méthodes automatisées pour les annotations multidimensionnelles spatiales sur le WS et propose un cadre d'enrichissement, à savoir RDF2SOLAP, pour enrichir les cubes de données RDF existants. La thèse propose un ensemble d'algorithmes d'enrichissement hiérarchique et un ensemble d'algorithmes d'enrichissement factuel dans le cadre du processus d'enrichissement RDF2SOLAP. Dans chaque ensemble, il y a des algorithmes pour détecter les relations explicites et découvrir les relations implicites entre les membres de niveau spatial (enrichissement hiérarchique) et entre les membres de niveau factuel (enrichissement factuel). De plus, dans l'enrichissement factuel, le schéma de faits est automatiquement redéfini à partir du résultat des algorithmes d'enrichissement au niveau de l'instance sur les membres d'un fait et les membres d'un niveau. Le processus d'enrichissement RDF2SOLAP permet aux utilisateurs de l'entrepôt de données spatiales d'interroger les « endpoints » RDF spatiaux existants avec les opérateurs SOLAP sans avoir besoin de télécharger, convertir, modéliser et stocker les données dans un entrepôt de données spatiales hors ligne. Les résultats de l'évaluation expérimentale montrent que le cadre proposé démontre l'efficacité du traitement des algorithmes d'enrichissement directement sur les données RDF.

Acknowledgements

I would like to thank a number of people who have helped and supported me during the years of pursuing my Ph.D. studies. Firstly, I would like to express my sincere gratitude to my supervisor Prof. Torben Bach Pedersen for giving me the opportunity to work with him. I would like to thank him for his constructive feedback, continuous support, and motivation. He has always been patient and available to help me under any circumstance, which I am very grateful for. Besides his professional and technical guidance, his motivational advice helped me during the research process and in completing the writing of this thesis. Next, I would like to thank my co-supervisor Prof. Katja Hose for her invaluable support during my Ph.D. studies. She has been a great advisor and a mentor for improving the quality and presentation of the whole research. I would also like to thank Prof. Esteban Zimányi as my co-supervisor at Université Libre de Bruxelles (ULB), Belgium. I would like to thank him for the inspiring discussions and his generous support during my visit. Collaborating with him during the year of my visit to ULB helped me to build one of the important cornerstones of my research and the thesis. Further, I would like to thank all my colleagues and friends both at Aalborg University and Université Libre de Bruxelles. I am very grateful to have had the chance to work in the Database, Programming and Web Technologies (DPW) research group, which has been intellectually very stimulating and never boring. In particular, I would like to thank Kim Ahlstrøm M. Mathiassen, from the very first year of my research, for being a helpful and enthusiastic colleague and a great friend. My special thanks go to İlkcan Keleş for his friendship and support, for our endless discussions and his valuable advice; I would also especially like to thank his wife Emel for her friendship and for being with us during sleepless nights while we were working. In addition, I would like to thank the students with whom I co-authored research papers: Alex B. Andersen, Kim Ahlstrøm, and Jacob Nielsen. Special thanks go to my significant other Mikael Midtgaard for writing the Danish abstract and co-authoring one of the journal papers during the Ph.D. studies. I am very grateful to Luis Galárraga for writing the French abstract within such a short

notice. I would also like to express my gratitude and appreciation to the administrative staff Helle Schroll and Helle Westmark, who have always been very helpful. In addition, I would like to thank my employer and colleagues at Accobat A/S, where I have been working as a BI consultant during the last two years of my Ph.D. studies, for their support. Especially, I would like to thank Mikael Iuel-Brockdorff and Jannick Raunow for accommodating and supporting me to complete this thesis. Finally, I would like to thank my dearest friend Yasemin Gödel for always being there for me, believing in me, and supporting me mentally throughout writing this thesis and at every step of the Ph.D. studies. I would also like to thank Sofie Orry Amby for her friendship, support, and for providing endless fun during the years in Aalborg and in Copenhagen. I thank my partner and my dearest, Mikael, for comforting me, listening to me, and helping me in every way he can. I am also very thankful to his family for their support. Last but not least, I would like to thank all my family members (my mom, my dad, my brothers Şamil and Münir and their wives, and my friends Merve and Mine) for their unconditional love and support.

Nurefşan Gür
Aalborg, January 15, 2020

This research has been funded by the European Commission through the Erasmus Mundus Joint Doctorate “Information Technologies for Business Intelligence - Doctoral College” (IT4BI-DC).

Contents

Abstract
Resumé
Résumé
Acknowledgements
Thesis Details

I Thesis Summary

Thesis Summary
1 Introduction
   1.1 Background and Motivation
   1.2 Geo-semantic Data Warehouses and Spatial OLAP
   1.3 Organization
2 Publishing Spatial and Governmental Linked Open Data
   2.1 Motivation and Problem Statement
   2.2 Use Case Data Lifecycle
   2.3 Technical Details and Query Runtimes
   2.4 Conclusion and Discussion
3 Enabling SOLAP Operations in SPARQL over Spatial Multidimensional Data Cubes on the SW with the QB4SOLAP Vocabulary
   3.1 Motivation and Problem Statement
   3.2 State of the Art
   3.3 Spatial Multidimensional Data Semantics
   3.4 SOLAP Query Generation
   3.5 Conclusion and Discussion
4 The GeoSemOLAP Framework
   4.1 Motivation and Problem Statement
   4.2 Understanding Spatial Semantic Data Warehouse Queries
   4.3 System Architecture and GeoSemOLAP Workflow
   4.4 Demonstration and Discussion
5 Use Case: Spatial OLAP over Environmental and Farming Data with QB4SOLAP
   5.1 Motivation and Problem Statement
   5.2 Spatial Data Cube of Livestock Holdings in Danish Farms
   5.3 SOLAP Examples over GeoFarmHerdState in SPARQL
   5.4 Discussion and Perspectives
6 Spatial Enrichment of Semantic Web Cubes with RDF2SOLAP Framework
   6.1 Motivation and Problem Statement
   6.2 Enrichment Approach
   6.3 RDF2SOLAP Enrichment Algorithms
   6.4 Implementation Details
   6.5 Implementation Results
   6.6 Evaluation and Discussion
7 Conclusion and Summary of Contributions
8 Future Work
References

II Papers

A Publishing Danish Agricultural Government Data as Semantic Web Data
   1 Introduction
   2 Use Case
   3 Data Annotation and Reconciliation
   4 Experiments
   5 Conclusion
   References

B Modeling and Querying Spatial Data Warehouses on the Semantic Web
   1 Introduction
   2 State of the Art
   3 Spatial and OLAP Operations
      3.1 Spatial Operations
      3.2 SOLAP Operations
   4 Semantics of Spatial MD Data and OLAP Operations
      4.1 Defining MD Data in QB4OLAP
      4.2 Defining Spatially Enhanced MD Data in QB4SOLAP
      4.3 SOLAP Operators
   5 Use Case Scenario: GeoNorthwind Data Warehouse
   6 Querying the GeoNorthwind DW in SPARQL
   7 Conclusion and Future Work
   References

C A Foundation for Spatial Data Warehouses on the Semantic Web
   1 Introduction
   2 Related work
   3 Preliminary concepts
      3.1 Spatial objects
      3.2 Spatial operations
      3.3 Data cubes
      3.4 Spatial data cubes
      3.5 OLAP operators
      3.6 Spatial OLAP operators
   4 The QB4SOLAP vocabulary
      4.1 Defining spatial data cube schemas with QB4SOLAP
      4.2 Defining spatial data cube members with QB4SOLAP
   5 Semantics of SOLAP operators
   6 Generating SOLAP queries in SPARQL via QB4SOLAP
      6.1 Generation algorithms
      6.2 Nested SOLAP operations to SPARQL
   7 Conclusions and future work
   A Appendix
      A.1 Query Run Times
      A.2 Table of Contents
   References

D GeoSemOLAP: Geospatial OLAP on the Semantic Web Made Easy
   1 Introduction
   2 Queries for Spatial Semantic Data Warehouses
   3 System Overview
      3.1 GeoSemOLAP Workflow
      3.2 GeoSemOLAP Architecture
   4 Demonstration Scenario
   5 Perspectives and Future Work
   References

E Enabling Spatial OLAP over Environmental and Farming Data with QB4SOLAP
   1 Introduction
   2 Background and Motivation
   3 State of the Art
   4 Source Data
   5 Publishing Spatial Data Cubes with QB4SOLAP
      5.1 GeoFarmHerdState Cube Schema in RDF
      5.2 GeoFarmHerdState Cube Instances in RDF
   6 SOLAP Operators over GeoFarmHerdState Cube
      6.1 SOLAP Operators
      6.2 Nested SOLAP Operations
   7 Discussion and Perspectives
   8 Conclusion and Future Work
   References

F Multidimensional Enrichment of Spatial RDF Data for SOLAP
   1 Introduction
   2 Preliminaries
      2.1 Spatial Data Warehouses and SOLAP
      2.2 QB4SOLAP: Spatial RDF Data Cube Vocabulary for SOLAP Operations
   3 System Architecture
   4 RDF2SOLAP Enrichment Algorithms
      4.1 Hierarchical enrichment phase
      4.2 Factual enrichment phase
   5 Implementation
      5.1 QB4SOLAP triples generation
      5.2 Detecting explicit topological relations
      5.3 Discovering implicit topological relations
      5.4 Generating fact schema
      5.5 Implementation Choices
   6 Experimental Evaluation
      6.1 Quantitative Evaluation
      6.2 Comparison Baselines
      6.3 Qualitative Evaluation
      6.4 Technical Lessons
      6.5 Experimental Summary
   7 Related work
   8 Conclusion and Future Work
   References

Thesis Details

Thesis Title: Modeling, Annotating, and Querying Geo-semantic Data Warehouses
Ph.D. Student: Nurefşan Gür
Supervisors: Prof. Torben Bach Pedersen, Aalborg University
Prof. Katja Hose, Aalborg University
Prof. Esteban Zimányi, Université Libre de Bruxelles

The main body of the thesis consists of the following papers. [A] Alex B. Andersen, Nuref¸sanGür, Katja Hose, Kim A. Jakobsen, and Torben Bach Pedersen. “Publishing Danish Agricultural Government Data as Semantic Web Data”. In : Semantic Technology - 4th Joint Inter- national Conference, JIST 2014, Chiang Mai, Thailand, Vol. 8943, pp. 178- 186, Springer LNCS, 2014.

[B] Nurefşan Gür, Katja Hose, Torben Bach Pedersen, and Esteban Zimányi. “Modeling and Querying Spatial Data Warehouses on the Semantic Web”. In: Semantic Technology - 5th Joint International Conference, JIST 2015, Yichang, China, Vol. 9544, pp. 3-22, Springer LNCS, 2015.

[C] Nurefşan Gür, Torben Bach Pedersen, Esteban Zimányi, and Katja Hose. “A Foundation for Spatial Data Warehouses on the Semantic Web”. In: Semantic Web Journal, Vol. 9, No. 5, pp. 557-587, IOS Press, 2018.

[D] Nurefşan Gür, Jacob Nielsen, Katja Hose, and Torben Bach Pedersen. “GeoSemOLAP: Geospatial OLAP on the Semantic Web Made Easy”. In: Proceedings of the 26th International Conference on World Wide Web Companion (WWW 2017), Perth, Australia, pp. 213-217, ACM, 2017. (Awarded as Best Demo by Reviewer's Choice)


[E] Nurefşan Gür, Katja Hose, Torben Bach Pedersen, and Esteban Zimányi. “Enabling Spatial OLAP over Environmental and Farming Data with QB4SOLAP”. In: Semantic Technology - 6th Joint International Conference, JIST 2016, Singapore, Singapore, Vol. 10055, pp. 287-304, Springer LNCS, 2016.

[F] Nurefşan Gür, Torben Bach Pedersen, Katja Hose, and Mikael Midtgaard. “Multidimensional Enrichment of Spatial RDF Data for SOLAP”. In preparation for submission to: Semantic Web Journal, 2020.

This thesis has been submitted for assessment in partial fulfilment of the PhD degree. The thesis is based on the submitted or published scientific papers, which are listed above. Parts of the papers are used directly or indirectly in the summary of the thesis. As a part of the assessment, co-author statements have been made available to the assessment committee, and they are also available at the Technical Faculty of IT and Design at Aalborg University and the Faculty of Engineering at Université Libre de Bruxelles. The permission for using the published articles in the thesis has been obtained from the corresponding publishers with the conditions that they are cited and DOI pointers and/or copyrights/credits are placed prominently in the references.

Nurefşan Gür
Aalborg University, January 15, 2020

Part I

Thesis Summary


Thesis Summary

1 Introduction

1.1 Background and Motivation

The increasing popularity of Open Data continuously leads governmental organizations and agencies in many countries to share their content formally on the Semantic Web (SW). The prevalence of public data on the SW has made huge amounts of data from different disciplines, such as environment, health, agriculture, industry, statistics, and geography, accessible in a non-proprietary open format: the Resource Description Framework (RDF)1. In many of these disciplines, the data sets contain spatial information with geographical coordinates. Geographical data on the SW needs to be treated differently, as the spatial format of the data allows users to derive new perspectives with spatial analysis. Therefore, publishing and processing geographical/spatial RDF data have been very interesting research areas in the SW community. This increasing amount of spatial data from different disciplines on the geo-semantic web needs to be analyzed efficiently. In the non-semantic web database world, data warehouses are the established way to analyze relational data. Data warehouses require certain multi-dimensional (MD) modeling and analytical operators (On-Line Analytical Processing, OLAP) for querying the data warehouse. Similarly, spatially extended MD models and spatial OLAP (SOLAP) operators are available in the traditional (non-semantic web) database world. Adapting these methods of building and querying spatial data warehouses to the SW requires building SW ontologies/vocabularies for modeling spatial MD data, SOLAP query operators, and techniques supporting the SW query language SPARQL, which stands for SPARQL Protocol and RDF Query Language. By following SW standards and using SW tools and technologies, we can truly structure, build, and query geo-semantic data warehouses.

1 The RDF format allows data publishers to formulate statements about resources, where each statement consists of a subject, a predicate, and an object, which compose a triple.

To build and publish geo-semantic data warehouses, it is first essential to understand the geospatial semantic web and the existing relevant research and technologies. Thus, as an initial effort, the GovAgriBus Denmark RDF data set was published to the Linked Open Data (LOD) cloud (Fig. 1). The data set contains governmental, agricultural, business, and geographical data. The geographical data contains spatial information, namely the coordinates of the agricultural fields, which enables interesting spatial queries (e.g., queries with spatial containment relationships or distance functions) across the data set. As an example of such a query, suppose the market demand is for seasonal beer, Christmas trees, or pumpkins, and a supplier company would like to find specific organic crops from a local region within North Jutland in Denmark. Such a query can be built by using the within containment relationship between the organic fields of the crops in demand and the region border of North Jutland. The supplier company would also like to know the proximity of an organic field to its own location, where a spatial distance function can be utilized for calculating the distance between the location of the company and the field.
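As an illustration, a minimal SPARQL sketch of this kind of query is given below. It assumes a GeoSPARQL-capable endpoint; the ex: prefix, the class and property names (ex:OrganicField, ex:crop, ex:hasGeometry), the region label, and the supplier location literal are hypothetical placeholders rather than the exact terms of the published data set.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX ex:   <http://example.org/agri/>    # hypothetical prefix, not the real namespace

# Organic fields growing a given crop that lie within the North Jutland region,
# ordered by their distance (in metres) to the supplier's own location.
SELECT ?field ?dist
WHERE {
  ?field  a ex:OrganicField ;
          ex:crop        "Pumpkin" ;       # illustrative property and value
          ex:hasGeometry ?fieldGeo .
  ?region a ex:Region ;
          rdfs:label     "Nordjylland" ;
          ex:hasGeometry ?regionGeo .
  # Supplier location as a WKT point literal (illustrative coordinates).
  BIND ("POINT(9.92 57.05)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> AS ?companyGeo)
  FILTER (geof:sfWithin(?fieldGeo, ?regionGeo))                       # spatial containment
  BIND  (geof:distance(?fieldGeo, ?companyGeo, uom:metre) AS ?dist)   # spatial distance function
}
ORDER BY ?dist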

Fig. 1: GovAgriBus in LOD Cloud (2014)

As a general principle of publishing Linked Open Data, the GovAgriBus data is linked with unique references (URIs) to external ontologies (such as GeoNames [36] and AGROVOC [31]), so that more related resources and things can be discovered via the URIs. Besides, this type of governmental data


(i.e., agricultural, environmental, and geographical data) is regularly published and updated from public portals (at minimum on a yearly basis), which raises the question of how to exploit this kind of huge, open geographic data with advanced analytical perspectives on the SW. This question led us to investigate how spatial data is handled in organizational data warehouses (DWs) with existing business intelligence (BI) technologies. Extending multi-dimensional (MD) models with spatial concepts and complex geometry features (i.e., continuous fields), defining an extended MD algebra that supports spatial data types, and extending On-Line Analytical Processing (OLAP) operations with spatial features are a few of the existing advancements in traditional spatial data warehouses. In order to query spatial data warehouses efficiently with advanced analytical queries, the concept of spatial OLAP (SOLAP) and spatially extended MD models have been widely implemented in non-semantic spatial data warehouses. Carrying these concepts to the geo-semantic web will be a significant improvement and key to answering our question.

Fig. 2: Related work areas

The keyword (geo)spatial appears in the intersection of the two research areas mentioned above: Geospatial Semantic Web and Spatial DW/OLAP (Fig. 2). Before attempting to implement the geo-semantic data warehouse concepts, let us briefly set aside the spatial aspect of the two research areas and look into the intersection of DW/OLAP and SW to understand the current state of BI on the SW. Several approaches have been considered with the RDF Data Cube (QB) Vocabulary2 and the QB4OLAP Vocabulary [10]. QB4OLAP addresses the limitations of the QB vocabulary by introducing complex MD features such as dimensions with hierarchies, hierarchy steps, and

2QB Implementations: https://www.w3.org/2011/gld/wiki/Data_Cube_Implementations

supporting OLAP queries directly on the RDF data. To support these advanced MD features on QB data sets, the QB2OLAP enrichment module has been proposed [33] for a semi-automatic transformation of QB data sets with QB4OLAP semantics. Even though there are promising research efforts and technologies around data warehouses and OLAP on the SW, none of them supports spatial data warehouses or SOLAP operators over RDF data. Still, we use the existing technologies as a starting point for building geo-semantic data warehouses. This way, we acknowledge and utilize existing technologies and prevent redundant work. To build geo-semantic data warehouses, we have studied the recent work thoroughly in the three main areas depicted in Fig. 2. Our work lies in the intersection of these three areas for building spatial data cubes and enabling SOLAP on the SW.

SOLAP operators utilize the geometry attributes of data warehouse members by including a spatial condition (such as a spatial function like closest distance or a spatial boolean predicate like within) in the analytical SOLAP query. Including spatial operations in a SOLAP query creates a dynamic interpretation of the data warehouse members; e.g., dynamic spatial hierarchy levels can be used to reveal new perspectives. For instance, a traditional OLAP operator that aggregates a numerical measure (e.g., population) from the post-code area level to the city level does not require any spatial operations in the query. However, assuming that both the post-code area and city level members have geometry attributes representing their borders as polygon geometries (with a set of coordinates), we can reveal new perspectives when calculating the total population of cities. Thus, we modify the query to aggregate the measures to the city spatial level only from the post-code areas that do not neighbor another city. This means that we calculate the population only in the inner cities; therefore, we have to use a spatial operation to exclude the post-code areas where the border of the area touches a city border, which can be done with SOLAP operators. Similar examples of SOLAP operators can be reproduced by calculating distances between the center points of the geometries.
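A minimal SPARQL sketch of this inner-city roll-up is given below, assuming a GeoSPARQL-capable endpoint. The schema prefix and property names (ex:postcodeArea, ex:population, ex:inCity, ex:polygon) are hypothetical placeholders, not terms from an actual cube, and the sketch follows one possible reading where a post-code area is excluded if its border touches the border of another city.

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex:   <http://example.org/cube/>   # hypothetical schema prefix

# Total population per city, summing only post-code areas whose border does not
# touch the border of another city (the dynamic "inner-city" hierarchy level).
SELECT ?city (SUM(?population) AS ?totalPopulation)
WHERE {
  ?obs  ex:postcodeArea ?area ;        # fact observations at post-code area level
        ex:population   ?population .
  ?area ex:inCity       ?city ;        # roll-up relation: post-code area -> city
        ex:polygon      ?areaGeo .
  FILTER NOT EXISTS {                  # spatial condition excluding border areas
    ?otherCity ex:polygon ?otherGeo .
    FILTER (?otherCity != ?city)
    FILTER (geof:sfTouches(?areaGeo, ?otherGeo))
  }
}
GROUP BY ?city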

Moreover, we can ensure the explicit annotation of topological relations between level members in geo-semantic data warehouses. For example, in a spatial hierarchy, some cities might belong to two geographical regions, meaning that the city borders intersect with two different geographical regions. This creates a many-to-many relation between the child-parent (city-region) spatial levels in a data warehouse hierarchy, which might cause measures to be aggregated incorrectly between these levels. In a geo-semantic data warehouse, spatially enriched schema definitions allow us to explicitly annotate when a city is within a region or when a city intersects a region. In order to prevent aggregating measures (e.g., population) incorrectly from the city spatial level to the region spatial level, we first need a spatial drill-down of measures to a lower spatial level, such as postal area, and then sum the total population of the postal areas that are within the region.

By building geo-semantic data warehouses and addressing challenges such as querying with SOLAP operators or automated geo-semantic and MD annotation of existing RDF data sets, we aim to deliver a robust methodology with advanced technologies and tools, which are not available in the current state of the (geospatial) semantic web. Our approach aims at allowing BI and SW enthusiasts to query existing RDF data sets with SOLAP operators directly (from SPARQL endpoints), without needing to download the RDF data, model the offline data with spatial data warehouse concepts, and load it into a spatial data warehouse to query with SOLAP operators, which is impractical, isolated, and fundamentally against the rationale of the semantic web and linked open data principles. We now give an overview of the contributions of the thesis, with further details given in Sections 2-6.

1.2 Geo-semantic Data Warehouses and Spatial OLAP

We propose a cube (QB) vocabulary for spatial OLAP on the Semantic Web, QB4SOLAP, which is a generic and extensible vocabulary suitable for MD modeling of all kinds of geographical domain data sets to be published on the SW. QB4SOLAP is based on the latest stable version of the QB4OLAP vocabulary. We validate QB4SOLAP by modeling two non-trivial use cases with QB4SOLAP, which have complex geometries and spatial MD concepts such as spatial hierarchy steps (between spatial dimension levels) that require annotation of topological relations between the level members.

Together with QB4SOLAP, common SOLAP operators are formally defined with MD semantics over RDF data. The previously modeled use cases are published as geo-semantic data warehouses in RDF format, where we can test the SOLAP operators. When a SOLAP operator is written in SPARQL, it can be very complicated for new BI users on the SW. As a common BI practice, OLAP/SOLAP queries are written in nested form, which makes writing nested SOLAP queries in SPARQL extremely long and overwhelming for inexperienced users. Thus, we have outlined a vision of geo-semantic data warehouses with a set of frameworks and tools that can address these and similar challenges (Fig. 3). In this line of work, we provide algorithms for translating individual or nested SOLAP operators to SPARQL queries by using the spatial MD model semantics and the formal semantics of the defined SOLAP operators. On top of these algorithms, the GeoSemOLAP tool is implemented for users to formulate SOLAP queries interactively using a graphical user interface and maps, and these queries are instantly translated to SPARQL. In order to fully utilize the potential of the semantic web, existing RDF

data sets that are also suitable for spatial and MD modeling (e.g., QB4OLAP RDF data sets with level attributes that have geographical data) are enriched in an automated way with QB4SOLAP semantics. This is achieved by the RDF2SOLAP enrichment module. RDF2SOLAP defines two families of algorithms, for hierarchical enrichment and for factual enrichment. In each family, there are algorithms for 1) utilizing existing QB4OLAP roll-up relations to detect the spatial relations between level members and between fact members and level members, and 2) assuming there are no explicit roll-up relations, discovering the spatial relations between all members.

Fig. 3: Geo-Semantic Data Warehouses Vision, adapted from [15] (components: user, GeoSemOLAP, SOLAP-to-SPARQL translation, QB4SOLAP, the RDF2SOLAP module, external geo-vocabularies, spatial RDF endpoints, and spatial RDF data warehouses)
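As a rough illustration of the hierarchical-enrichment idea (not the actual RDF2SOLAP algorithms), the SPARQL Update sketch below materializes a discovered containment relation between spatial level members; the ex: prefix and the names ex:PostcodeArea, ex:City, ex:polygon, and ex:withinCity are hypothetical stand-ins for the real QB4SOLAP and cube terms.

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex:   <http://example.org/cube/>   # hypothetical prefix and property names

# Discover which city polygon each post-code-area level member lies within and
# materialize it as an explicit, topologically annotated roll-up triple.
INSERT { ?area ex:withinCity ?city }
WHERE {
  ?area a ex:PostcodeArea ; ex:polygon ?areaGeo .
  ?city a ex:City ;         ex:polygon ?cityGeo .
  FILTER (geof:sfWithin(?areaGeo, ?cityGeo))   # implicit relation discovered spatially
}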

1.3 Organization

The rest of the thesis summary is organized as follows. Section 2 summarizes Paper A, which is an initial attempt to publish Danish governmental and spatial data sets as Linked Open Data in order to derive new spatial relations and analytical insights with SPARQL aggregate queries. Section 3 presents a detailed overview of Paper B and Paper C, where the QB4SOLAP vocabulary is introduced with multi-dimensional and spatial RDF semantics, the SOLAP operators are explained with high-level semantics, and the SOLAP-to-SPARQL generation algorithms are given with examples. Section 4 gives an overview of Paper D and explains the implementation and workflow of the GeoSemOLAP tool, which is implemented on top of the SOLAP generation algorithms. A non-trivial nested set of SOLAP operations is demonstrated with GeoSemOLAP in Section 4. Section 5 summarizes Paper E with

a real-world QB4SOLAP use case from Danish governmental, environmental, and geographical data sets, together with interesting SOLAP operations and new analytical perspectives in SPARQL. Section 6 summarizes Paper F, introduces the RDF2SOLAP enrichment framework, and outlines the spatial multi-dimensional enrichment process for various cases with enrichment algorithms. Finally, Section 7 gives a summary of the contributions, and Section 8 concludes the thesis summary and presents directions for future work.

2 Publishing Spatial and Governmental Linked Open Data

This section gives an overview of Paper A [3].

2.1 Motivation and Problem Statement

The increasing popularity of publishing Open Data on the Semantic Web and the Linked Open Data (LOD)3 movement have attracted many public, governmental, and non-governmental organizations. The Linked Open Data movement encourages data providers to publish and connect distributed data across the web by following a list of best practices [19] and using common web standards (i.e., RDF, SPARQL, and HTTP URIs). Using these standards while publishing Open Data on the web ensures discoverability, makes the data easily dereferenceable (via URIs), and makes it relatable to other available public resources. Especially for governments, the main goal of publishing Open Data is to improve collaboration, inspire innovative applications, and attract entrepreneurs, which can lead to growth in the prosperity of the country and the resourcefulness of the governments, besides providing transparency to the public at the governmental data level. The Danish government also started publishing Open Data on the web at the end of 2012 [5], where several ministries and governmental agencies from various domains (e.g., agriculture, forestry and fishery, statistics, and the geographical and environmental domains) made their data publicly available. However, the publishers made their raw data available in heterogeneous formats such as PDF, CSV, XML, SHP (for geospatial data), XLS, etc., but none of these data sets were available on the Semantic Web as Linked Open Data. Our primary goal is to publish Danish governmental Open Data (from a selected range of non-trivial domains) on the Semantic Web by following the best practices of publishing Linked Open Data. We also assess the challenges and share our experiences in order to provide guidelines for publishing Danish governmental data on the Semantic Web.

3Linked Open Data (LOD) Cloud: http://lod-cloud.net/

2.2 Use Case Data Lifecycle

We primarily choose to focus on the agricultural domain, since it is a non-trivial and interesting use case. It is non-trivial because agricultural fields and areas have spatial information; thus, we encounter the complexity of the spatial data format, and we can additionally link the spatial information with external geographical data. It is interesting because the majority of Denmark's land, 60%-70%, is used for agriculture4. Secondly, we decided to combine the agricultural data with company data, which can provide interesting query patterns for the agricultural industry that are not readily available when considering the data sets separately.

Fig. 4: Relational schema of the use case data sets (adapted from [4])

GovAgriBusDenmark Raw Data Sets

The Linked Open Data set of the GovAgriBusDenmark use case is composed of Danish governmental, agricultural, and business data sets. The initial raw data for the governmental and agricultural data is published by the Ministry of Food, Agriculture, and Fisheries of Denmark (FVM) in geospatial data (SHP) format. The business data is published by the Central Company Register (CVR) in CSV format. We give an overview of the relational database schema of the use case data sets from the agricultural and business domains in Fig. 4. The agricultural data has three main data sets in SHP format (with POLYGON coordinates): Field, Organic field, and Field block. The Organic field and Field block data sets have referential integrity through the fieldBlockId key, so they can be easily related and linked. Field and Organic field are related by using spatial joins (computed from the spatial coordinates) through a GIS tool.

4https://tradingeconomics.com/denmark/agricultural-land-percent-of-land-area-wb-data.html


The business data has two main data sets in CSV format: Company and Participant. These data sets have referential integrity through the LegalUnitIdentifier, which is the same as the unique CVR number of a company. A company is constituted by legal units and production units. Every legal unit has at least one production unit. The relationship between legal units and production units is illustrated in Fig. 5. A participant can be a person with a unique Danish social security number (CPR number) or a legal unit with a CVR number.

Fig. 5: Company structure (adapted from [4])

In Table 1, we give an overview of the number of records and attributes of the raw (tabular) data sets from the selected use case domains.

Table 1: Use Case Raw Data Profile

Domain       Data set                    Number of columns (attributes)   Number of rows (records)
Gov. Agri.   Field                       9                                 641,081
Gov. Agri.   Organic Field               12                                52,060
Gov. Agri.   Field Block                 12                                314,648
Bus.         Company - Legal Unit        59                                >600,000
Bus.         Company - Production Unit   59                                650,000
Bus.         Participant                 7                                 350,000

Data annotation and reconciliation

We follow an iterative transformation and integration process of importing, analyzing, refining, linking, and publishing the data. The details of these main activities are as follows:

Import. The first step is to extract the raw data from the public resources in its original formats (SHP and CSV) and import it into a common operational environment such as a relational database. At this step, the spatial data sets are first imported into a GIS tool in order to implement spatial joins and associate disjoint spatial data sets with referential integrity before loading them into a relational database.

Analyze. During data analysis, we profile the data sets, collect statistics, and get a better understanding of the attributes (columns), in order to formalize an ontology for data annotation. During the analysis, we decide on re-using existing ontologies related to our research domain, as given in the following.

• WGS84: Used for defining the spatial objects/coordinates [7].

• GeoNames: Similarly, used for defining spatial objects and external linking of place names (municipality names) [36].

• AGROVOC: Used for defining fields and crops by linking them with AGROVOC [31].

• FOAF: Used in the business data, where organization classes are related via rdfs:subClassOf to foaf:organization [1].

Refine. The refinement step is the last step before linking and publishing the data, where we remove corrupt data, inconsistent fields, etc. with cleansing and conversion scripts.

Link and Publish. We generate the mappings by using Virtuoso Open Source [29], from which the RDF data is generated and directly published. We distinguish two different linking methods: internal linking and external linking. Internal linking is done while we create our GovAgriBusDenmark ontology at the schema level, where we re-use external vocabularies. Internal linking works at the conceptual level, where we define the relationships between the classes and concepts that were identified during the Analyze stage.

Example 1 (Internal Linking) The following example represents the internal linking of the Field and Field Block classes using the geonames:contains predicate [3].

agri:contains rdf:type owl:ObjectProperty ;   # a field block contains fields
    rdfs:domain        agri:FieldBlock ;
    rdfs:range         agri:Field ;
    rdfs:subPropertyOf geonames:contains .    # specializes the GeoNames containment predicate

External linking is implemented at the instance level, where we link, e.g., the URIs of the place names to their equivalent geographical names from external ontologies and gazetteers.

Example 2 (External Linking) The following example represents the external linking of the place name Aalborg municipality (from the business data set) to the GeoNames URI of Aalborg with the owl:sameAs predicate.


bus:Aalborg rdf:type bus:municipality ;
    wgs:lat "57.048"^^xsd:float ;
    wgs:long "9.9187"^^xsd:float ;
    owl:sameAs geonames:26224886 .

In total, 32,457,657 triples are generated and published via the SPARQL endpoint http://extbi.lab.aau.dk/sparql. The final Linked Open Data sets and ontologies are also published to the LOD cloud and are available for download from datahub.io5.

2.3 Technical Details and Query Runtimes

Finally, we present our system architecture and the set-up of the experiments. The system architecture is given in Fig. 6. The source layer is located on various machines of the authors, and the hardware set-up of the source layer is not relevant for (native) RDF query runtimes. The hardware set-up of the transformation and conceptual layers is an Ubuntu 13.10 Saucy server with a 3.4 GHz Intel Core i7-2600 processor and 8 GB RAM, running OpenLink Virtuoso 07.00.3203. The Import and Analyze (and partially Refine) stages are hosted on the source layer. In the transformation layer, we generate the RDB to RDF mappings by using Quad (Map) Patterns. In order to map the raw literal data values (float, string, etc.), IRI classes are used. The conceptual layer receives queries from the presentation layer sent by the end-users. If the queries are already cached, they are answered directly from the LOD repository cache; otherwise, they are passed to the native RDF quad store. If a query can be answered by the quad store, Virtuoso provides the ontologies to annotate the data and rewrite the query using inference in the given ontology [4].
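The RDB-to-RDF mappings mentioned above are defined with Virtuoso's Quad Map Patterns, which are not reproduced here. As a rough illustration of the same mapping idea, the following sketch expresses a comparable mapping in the standard W3C R2RML vocabulary instead of the Virtuoso syntax actually used; the table name, column names, and the agri: namespace/IRI template are illustrative assumptions, not the actual use case schema.

@prefix rr:   <http://www.w3.org/ns/r2rml#> .
# Hypothetical namespace and relational schema, for illustration only.
@prefix agri: <http://extbi.lab.aau.dk/ontology/agri/> .

<#FieldMap> a rr:TriplesMap ;
    # assumed source table and key column
    rr:logicalTable [ rr:tableName "Field" ] ;
    rr:subjectMap   [ rr:template "http://extbi.lab.aau.dk/resource/field/{fieldId}" ;
                      rr:class agri:Field ] ;
    # map the foreign key to a plain literal; a second template could mint field block IRIs instead
    rr:predicateObjectMap [ rr:predicate agri:fieldBlockId ;
                            rr:objectMap [ rr:column "fieldBlockId" ] ] .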

Experiments

We have described three materialization strategies: virtual, relational materialization, and native RDF. Figure 7 shows the different data flows for these strategies, with solid lines for processing the integration and load of the data and dashed lines for processing the queries. The virtual strategy follows the arrows 1, 2, and 3 during query processing (Fig. 7). We cleanse the data in relational SQL views and map to RDF data from these views. In order to optimize the performance, indexes on keys and spatial attributes are created. The relational materialization strategy is represented through the arrows 4 during load, and 3 and 5 during query processing. Relational tables in this strategy are created from the previous SQL views with similar indexes.

5https://datahub.io/dataset/govagribus-denmark

Fig. 6: Multi-layered system architecture (adapted from [4])

Fig. 7: Data flow for the materialization strategies (adapted from [3])

Finally, the native RDF strategy is created by following the arrows 4, 5, and 6 during load and 7 during query processing. This strategy runs natively on RDF data in a triple store. The RDF data is created from the materialized views by annotating with the selected ontologies and mapping them into the triple format.

Tables 2 and 3 show the processing times for data loading/transformation and for running the queries. We provide query runtimes for a number of query templates, where AQT stands for aggregated query templates and SQT stands for standard query templates. Aggregated query templates use aggregate functions (e.g., MAX, SUM, COUNT) with grouping, whereas standard query templates use basic transactional query constructs and SPARQL 1.0 constructs6.
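To make the distinction concrete, the following is a hypothetical aggregated query in the style of the AQTs (it is not one of the published templates): it counts fields per field block, re-using the agri:FieldBlock class and the agri:contains property from Example 1.

# Hypothetical AQT-style query; identifiers beyond Example 1 are illustrative.
SELECT ?fieldBlock (COUNT(?field) AS ?noOfFields)
WHERE {
  ?fieldBlock rdf:type agri:FieldBlock ;
              agri:contains ?field .
}
GROUP BY ?fieldBlock
ORDER BY DESC(?noOfFields)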

6Supporting aggregate queries on SPARQL started in March 2013 with the release of SPARQL 1.1 – https://www.w3.org/TR/sparql11-query/


Table 2: Load times in seconds (adapted from [3])

Step            Virt.   Rel.    Nat.
Data Cleansing  74.92   603.35  603.35
Load Ontology   1.01    1.01    1.01
Load Mappings   8.76    12.35   12.35
Dump RDF        0.00    0.00    4684.82
Load RDF        0.00    0.00    840.04
Total           84.68   616.70  6141.56

Table 3: Query runtimes in seconds (adapted from [3])

Query    Virt.    Rel.   Nat.
AQT 1    5.92     3.39   1.04
AQT 2    13.32    7.00   0.23
AQT 3    10.81    7.70   0.05
AQT 4    –        –      0.14
AQT 5    –        20.37  0.86
SQT 1    –        –      2.35
SQT 2    0.09     0.12   0.10
SQT 3    2188.85  1.81   0.40
SQT 4    6.57     2.35   1.63
SQT 5    –        23.79  3.29
Average  370.93   8.31   1.01

By looking at the processing times, we conclude that the virtual strategy has the fastest load times (Table 2); however, it is the least efficient in answering the queries. The native RDF strategy, on the other hand, is the most efficient in answering even the aggregated queries fast, while it demands more time during the preparation and load of the RDF data, as expected. The relational strategy can be seen as a trade-off between load times and query processing times: it is ten times faster in load times and eight times slower in query runtimes than the native RDF strategy. For rapidly changing data, where loading the data on a frequent basis is required, the virtual or the relational strategy would be the best fit. On the other hand, for large volumes of data where aggregated querying is also required, the relational or the native RDF strategy would be most efficient, judging by the query runtimes.

2.4 Conclusion and Discussion

Prompted by the popularity of the Open Data movement and the opportunities that the Semantic Web can provide for governmental data publishers, we have investigated how to publish Danish governmental data sets (from the agricultural domain) on the Semantic Web. Furthermore, we have shown how to link this domain with business data and external sources (GeoNames), and how to annotate it by using existing vocabularies.

In order to present the best practices, we have tested different extract, transform, and load (ETL) strategies for preparing the Semantic Web data in RDF format from heterogeneous sources in several formats. We have selected the governmental agricultural domain with data sets in SHP file format and the business domain with data sets in CSV file format. The large part of our implementation and the system architecture is based on OpenLink Virtuoso, where both native and RDBMS-based RDF storage is supported. We have reconciled the use case data from SQL Server views/tables to Virtuoso RDF store graphs. We have mainly used SQL views and scripts for data cleansing; besides, we have used other third-party tools such as OpenRefine7 (formerly known as Google Refine) for data cleanup and transformation. In order to map the relational data to RDF, we initially tried third-party ontology mapping (RDB2RDF) tools such as D2RQ8; however, we eventually decided to proceed with a built-in RDFizer middleware, Virtuoso Sponger, which semi-automatically extracts the source data from various formats and maps it to RDF directly on the Virtuoso server (Fig. 6). In order to annotate the use case data, we have developed a new ontology, in which we reused existing ontologies. An interesting challenge we encountered was deriving spatial containment relationships between field and field block data that were not encoded in the original datasets (Fig. 4), where we had to implement spatial joins using a third-party GIS tool.

Next, we have presented our experiences with different load times and query processing scenarios. The native RDF strategy demonstrated the most efficient query runtimes compared to the two other strategies (relational and virtual); however, the native strategy decouples the RDF data from the relational data, thus it is well suited for static data, considering the expensive load times. The native RDF strategy requires technical improvements to provide more efficient transform and load times. The RDF data and all our experiments with a number of aggregated and standard query templates (with and without built-in spatial functions) are provided through our project web site9.

The work demonstrated within the scope of this paper is our first small step to create “Geo-semantic Data Warehouses”. We have examined the available data sets from different domains with non-trivial spatial data, and then published RDF data from the agricultural domain. As proof of concept, we measured query performance for aggregated query templates, which are fundamental in data warehousing and multi-dimensional models. We acquired promising results from our experiments for AQTs that run on native RDF data. The next step to get closer to creating Geo-semantic Data Warehouses is to define conceptual models/vocabularies for annotating spatial multi-dimensional data and show how we can make a difference with spatial multi-dimensional query templates, as explained in Section 3.

7http://openrefine.org/
8https://www.w3.org/2001/sw/wiki/D2RQ
9http://extbi.cs.aau.dk/govAgriBus/

3 Enabling SOLAP operations in SPARQL over Spatial Multidimensional Data Cubes on the SW with QB4SOLAP Vocabulary

This section gives an overview of Paper B [12] and Paper C [15]. Paper B is a subset of Paper C. Paper C extends Paper B as a journal paper, where the semantics of MD data cubes with QB4SOLAP are improved and the formal definitions of SOLAP operators are revised. Moreover, we have provided algorithms in Paper C for generating SPARQL queries from high-level SOLAP operators.

3.1 Motivation and Problem Statement

Paper A demonstrated a use case on how to publish Danish Agricultural and Business data on the Semantic Web (SW). In the experiments of Paper A, it is shown that aggregated queries demonstrated good performance on the Semantic Web for a non-trivial set of domain data sets with complex data types. The advances in Semantic Web technologies make it possible to query Linked Open Data with On-Line Analytical Processing (OLAP) style queries containing aggregate operators. OLAP relies on multi-dimensional (MD) data models and schemas used in Data Warehouse (DW) systems.

Fig. 8: QB4SOLAP approach to SOLAP on the SW (adapted from [15])

The increasing popularity of publishing LOD data sets from various domains (e.g., environmental, agricultural, climate, and statistical data) with spatial information on the SW brings opportunities for advanced (spatial and multi-dimensional) analysis of these data sets. Several interesting spatial RDF

endpoints are published to the LOD cloud [15]10,11,12,13. We can reveal interesting results for decision-makers and domain experts by querying these data sets with analytical queries and spatial OLAP (SOLAP); however, the data sets should be modeled and annotated with the spatial and MD concepts that are fundamental for SOLAP operations before being published on the SW. The existing SW technologies only support annotating non-spatial multi-dimensional data, which limits online analytical processing (OLAP) queries over the non-spatial SW data sets. Fig. 8 shows a general overview of our approach for enabling SOLAP and spatial multi-dimensional data warehouses (a.k.a. data cubes) on the Semantic Web. In the current state of the SW, users and decision-makers cannot query spatial RDF endpoints with SOLAP operations, unless the RDF data is downloaded, mapped to, e.g., a relational data model, and imported into a traditional data warehouse that supports spatial functions. This is a labor-intensive and cumbersome process, which eventually isolates the published linked open data in a non-open format in data silos and defeats the purpose of publishing linked open data in the first place. We propose QB4SOLAP, a vocabulary for annotating spatial multi-dimensional data on the Semantic Web. Thus, users can query Semantic Web data with SOLAP operations.

3.2 State of the Art

The following research areas are a recap from Fig. 2, with related work references in each area.

DW/OLAP on the SW. The practice of building multi-dimensional (MD) data warehouses (DWs) with RDF data and performing OLAP operations on RDF data in SPARQL has been an active research topic [10, 18, 35]. However, none of these works focus on spatial data warehouses or spatial OLAP (SOLAP) operations on the Semantic Web. The state of the art of multi-dimensional modeling and analysis on the SW is restricted to non-spatial Semantic Web data.

Spatial DW/SOLAP. In a non-SW context, staging spatial data in data warehouses and querying spatial data warehouses with OLAP operations extended with spatial functions is a prevalent research field. The term spatial OLAP (SOLAP) was first introduced in 1997 [6] for spatial data warehouses, and many researchers have since followed up with valuable contributions to the community [8, 20, 26].

10EuroStat: http://ec.europa.eu/eurostat
11UK Environmental Data: http://environment.data.gov.uk
12Danish Agricultural Data: https://datahub.io/dataset/govagribus-denmark
13Australian Climate Observations: http://datahub.io/dataset/acorn-sat


Traditional DWs annotate the location/place dimension as a conventional (static) dimension without spatial features or attributes (e.g., coordinates) but with alphanumeric attributes (e.g., a nominal reference to a place name). Similarly, measures do not have spatial information such as coordinates. Therefore, it is not possible in non-spatial DWs to carry out advanced spatial analysis, such as deriving topological relations between hierarchy levels or defining and operating with spatial aggregate functions on spatial measures [15]. In spatial data warehouses, through the geographical coordinates of the location data, we can also derive new perspectives (e.g., dynamic spatial hierarchies) during query runtime.

Geospatial SW. The Semantic Web accommodates large volumes of geospatial data, with the increasing popularity of publishing Linked Open Data by several organizations. Various projects and organizations publish spatial data on the Semantic Web (e.g., LinkedGeoData [30] interactively transforms OpenStreetMap to RDF data, and GeoKnow [28] links geospatial data from heterogeneous resources on the SW), and we have also published Danish agricultural and business data (with spatial coordinates) to the LOD cloud as GovAgriBus in Paper A (Section 2).

Spatial DW/SOLAP on the SW (OUR WORK). We have investigated the state of the art for the three distinct research areas highlighted above. Our rationale for enabling spatial OLAP on the Semantic Web lies in the intersection of these three areas. Carrying spatial data warehouses and SOLAP to the SW context can provide methods for advanced analysis of geospatial data on the SW and improve the existing models and tools for a thorough analysis of multi-dimensional data with spatial support.

3.3 Spatial Multidimensional Data Semantics

The main fundamentals of spatial data warehouse (a.k.a. spatial cube) concepts should be defined precisely with an explicit vocabulary on the SW in order to benefit from the spatial analysis capabilities of SOLAP operators. During our survey on the state of the art of DW/OLAP on the SW, we found the QB4OLAP vocabulary [10] the most prominent in structure and technical advances to support modeling and annotating (non-spatial) data warehouses on the Semantic Web. QB4OLAP is extended from the RDF Data Cube (QB) Vocabulary14 and supports OLAP operations with a full-bodied MD metamodel, while the QB vocabulary is limited to statistical data models without OLAP dimensions and hierarchies.

14RDF Data Cube (QB): https://www.w3.org/TR/vocab-data-cube/

As a starting point, we decided to propose a spatially extended metamodel on top of the QB4OLAP vocabulary: QB4SOLAP.

QB4SOLAP Vocabulary Version History. During the design of the QB4SOLAP vocabulary, there was an earlier (non-stable) version of the QB4OLAP vocabulary (v1.1) available and under development. Our initial design (QB4SOLAP V1.1) was based on this version of QB4OLAP, which was later stabilized in version v1.2 15. In Paper B we published QB4SOLAP V1.2 (Fig. B.3), based on the latest stable version of QB4OLAP (v1.2) at the time. With minor revisions of RDF instance and RDF class representations from V1.2 to V1.3, the latest version of the QB4SOLAP vocabulary is published with Paper C and is given in Figure 9 [15]. All versions of QB4SOLAP are available from our project website16.

Fig. 9: The QB4SOLAP vocabulary (V1.3) (adapted from [15])

Multidimensional Data Cube. The QB4OLAP vocabulary defines all the concepts from multi-dimensional models in RDF terms.

15QB4OLAP: http://purl.org/qb4olap/cubes_v1.2
16QB4SOLAP: http://extbi.cs.aau.dk/QB4SOLAP/index.php#qb4solapversions

Multidimensional models are usually interpreted as data cubes [12, 15], where cells of the cube correspond to observation facts, which are the center of MD analysis. Facts have a set of attributes known as measures, with defined aggregate functions. In a multi-dimensional space, a data cube has n dimensions (with contextual information) to provide different analysis perspectives, where each dimension can have hierarchies with levels. Levels (of dimensions) allow users to aggregate measures (of facts) at different levels of detail (a.k.a. granularity). In order to support this kind of analysis at different granularities, dimensions and facts are related in the structure and definition of a data cube. Levels also have attributes, which define the basic characteristics of the level members17.

We distinguish the MD data cube elements at two definition levels for defining and annotating with QB4SOLAP: the schema level and the instance level, respectively. Cube elements such as dimensions, hierarchies, hierarchy steps, levels, attributes, fact observations, measures, etc. are the schema-level cube elements. By using the QB4SOLAP vocabulary, these MD concepts (cube elements) and their roles and relations to each other are annotated at the schema level in RDF triple format. From a relational point of view, schema-level (RDF) cube elements correspond to tables and columns (of tables) in an MD star schema model. Fact members with measure values and level members with attribute values are instance-level MD concepts, which we annotate by using the QB4SOLAP vocabulary at the instance level. These instance-level (RDF) cube elements correspond to the actual data rows/records, which are annotated with QB4SOLAP in triple format. Fig. 10 gives the MD conceptual schema of a sample use case, GeoNorthwind18, which is used in [12, 15].

In Paper B, we give the formal RDF semantics of each MD concept (the above-mentioned cube elements) and the spatial extensions (Sections 4.1 and 4.2), with respect to the common definition of an RDF triple19. In Paper C, we have improved the presentation and explanation of the formal definitions and spatial extensions (Sections 4.1 and 4.2). The following definition (Def. 1) for hierarchy steps is reproduced from Paper C. An overview of a hierarchy step (at the schema level), qb4o:HierarchyStep, is depicted in Fig. 9 (bottom right).

Definition 1. (Hierarchy steps - Reproduced from [15]) A hierarchy h has a set of hierarchy steps HS(h) = {hs_1, ..., hs_q}, which define the structure of the hierarchy in relation with its corresponding levels. A hierarchy step

17Both Papers B [12] and C [15] give this general definition of MD cube concepts with brief examples in the introduction sections.
18The GeoNorthwind data warehouse is an extended version of the Northwind database with spatial data, modeled in a multi-dimensional way. Northwind Database: https://github.com/microsoft/sql-server-samples/tree/master/samples//northwind-pubs
19An RDF triple t = (s, p, o) consists of three components: s is the subject, p is the predicate, and o is the object, and is defined as t ∈ (I ∪ B) × I × (I ∪ B ∪ L), where the set of IRIs is I, the set of blank nodes is B, and the set of literals is L.

hs_i = (l_c, l_p, card) ∈ HS(h) entails a roll-up relation between a lower (child) level l_c and an upper (parent) level l_p with a cardinality card. The cardinality card ∈ {1-1, 1-n, n-1, n-n} describes the number of members in one level that can be related to a member in the other level, for both the child and the parent levels. Each hierarchy step hs_i is defined in the cube schema graph G^S as a blank node _:hs_i ∈ B with the qb4o:HierarchyStep predicate. Each hierarchy step is linked to its hierarchy with the qb4o:inHierarchy property. The child and parent levels are linked in a hierarchy step with the qb4o:childLevel and qb4o:parentLevel properties, respectively. The cardinality card of a hierarchy step is defined by the qb4o:pcCardinality property. The RDF graph formulation of the hierarchy steps HS(h) is represented as

G^S_{HS(h)} = ⋃_{i=1}^{q} G^S_{hs_i}

where

G^S_{hs_i} = {(_:hs_i rdf:type qb4o:HierarchyStep)}
           ∪ {(_:hs_i qb4o:inHierarchy id^S(h))}
           ∪ {(_:hs_i qb4o:parentLevel id^S(l_p))}
           ∪ {(_:hs_i qb4o:childLevel id^S(l_c))}
           ∪ {(_:hs_i qb4o:pcCardinality id^S(card))}

Spatial Extensions. Our spatial extensions in QB4SOLAP are inherited from GeoSPARQL20 definitions, which are used for topological relations and spatial aggregate functions. In Fig. 9, in the bottom-left corner, topological relations are given as a spatial extension to the QB4OLAP vocabulary's hierarchy steps. Hierarchy steps occur in a hierarchy, where they relate the levels of a hierarchy from a lower (child) level to a higher (parent) level. If a level has a (spatial) geometry, the spatial coordinates of the level members are recorded as geometry data types at the instance level (directly on the data). Therefore, the topological relations between the (parent-child) level members can be derived and annotated both at the schema level and the instance level. In the top-right corner of Fig. 9, spatial aggregate functions are given as a subclass of the QB4OLAP aggregate functions. The spatial extensions use the qb4so: prefix. By using these extended schema-level definitions, we can annotate any MD data cube with QB4SOLAP in RDF format. Spatial aggregate functions

20GeoSPARQL Ontology (http://www.opengis.net/ont/geosparql#) is an Open Geospatial Consortium (OGC) standard for annotating spatial data and functions in RDF.

are defined over spatial measures, where a measure should have a geometry data type with coordinates at the instance level. By using these spatial extensions of the QB4SOLAP vocabulary, we can re-define a hierarchy step as a spatial hierarchy step. The following extension (Ext. 1) for spatial hierarchy steps is reproduced from Paper C. The spatial hierarchy steps formalization uses the topological relations given in Fig. 9 (bottom left).

Extension 1. (Spatial hierarchy steps - Reproduced from [15]) A hierarchy step is spatial if it relates a spatial child level l_{c_s} and a spatial parent level l_{p_s}, in which case it entails a topological relationship between these spatial levels. A spatial hierarchy step is then a tuple hs_{i_s} = (l_{c_s}, l_{p_s}, card, topoRel), where the topological relation topoRel belongs to the T_rel class (Def. 2). The topological relation between the parent-child levels of a spatial hierarchy step is defined by the qb4so:pcTopoRel property. The RDF graph formulation of the spatial hierarchy steps HS_s(h) (w.r.t. Def. 9) is represented as

G^S_{HS_s(h)} = ⋃_{i=1}^{q} G^S_{hs_{i_s}}

where

G^S_{hs_{i_s}} = G^S_{hs_i} ∪ {(_:hs_i qb4so:pcTopoRel id^S(topoRel))}

Example 3 (Spatial Hierarchy Step Example - Reproduced from [15]) The triples below show how the hierarchy steps of the Geography spatial hierarchy in the Customer dimension of the GeoNorthwind DW (Fig. 10) are represented in RDF using Def. 1 and Ext. 1. Note that all hierarchy steps are spatial and have an associated topological relation, qb4so:Within. These topological relations are denoted in the conceptual schema figure at each spatial hierarchy step with white-in-black circle signs.

# Hierarchy steps
_:customerGeography_hs1 a qb4o:HierarchyStep ;
    qb4o:inHierarchy gnw:customerGeography ;
    qb4o:childLevel gnw:customer ;
    qb4o:parentLevel gnw:city ;
    qb4o:pcCardinality qb4o:ManyToOne ;
    qb4so:pcTopoRel qb4so:Within .

_:customerGeography_hs2 a qb4o:HierarchyStep ;
    qb4o:inHierarchy gnw:customerGeography ;
    qb4o:childLevel gnw:city ;
    qb4o:parentLevel gnw:state ;
    qb4o:pcCardinality qb4o:ManyToOne ;
    qb4so:pcTopoRel qb4so:Within .

_:customerGeography_hs3 a qb4o:HierarchyStep ;
    qb4o:inHierarchy gnw:customerGeography ;
    qb4o:childLevel gnw:state ;
    qb4o:parentLevel gnw:country ;
    qb4o:pcCardinality qb4o:ManyToOne ;
    qb4so:pcTopoRel qb4so:Within .
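Example 3 covers the schema level only. At the instance level, the same hierarchy is populated by level members carrying attribute and geometry values. The following minimal sketch uses the properties that appear in the SPARQL examples later in this section (qb4o:memberOf, skos:broader, gnw:customerGeo, gnw:cityGeo), while the member IRIs, literal values, and the WKT geometry encoding are illustrative assumptions rather than the actual GeoNorthwind triples; prefix declarations are omitted, as in the listings above.

# Hypothetical instance-level members (IRIs, values, and geometry encoding are assumed).
gnw:customer_42 qb4o:memberOf gnw:customer ;
    gnw:customerGeo "POINT(13.40 52.52)"^^geo:wktLiteral ;
    skos:broader gnw:city_Berlin .

gnw:city_Berlin qb4o:memberOf gnw:city ;
    gnw:cityGeo "POINT(13.41 52.52)"^^geo:wktLiteral ;
    skos:broader gnw:state_Berlin .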

Fig. 10: Conceptual multi-dimensional schema of the use case data, the GeoNorthwind data warehouse. The center of analysis is the Sales fact cube in the middle. Measures are listed within the fact cube. Fork-shaped arrows link dimensions to the fact through the base levels. Level attributes are listed within each level. Hierarchies are given in ellipse boxes attached to the base level of the dimension. Spatial levels and attributes are given with point and polygon signs. Spatial hierarchy steps are denoted with a sign (white in black circle) to represent the within relations (adapted from [15])


SOLAP Operations

We have grouped the spatial operations into three classes in Paper B: spatial aggregation S_agg, topological relation T_rel, and numeric operation N_op [12]. SOLAP operations are built on top of classical OLAP operations (e.g., slice, dice, roll-up, drill-down) with the constraint that they should include at least one spatial condition from the spatial operation classes given above.

Spatial aggregation operators aggregate spatial (data) objects and return a new composite spatial object. Some of the spatial aggregation operators are union, convex hull, minimum bounding rectangle (MBR), etc. These operators can be annotated at the schema level with QB4SOLAP (Fig. 9) and used at the instance level (applied on spatial measures) if the triple store supports spatial aggregation over spatial data types. Topological relations are spatial Boolean predicates, such as intersects, within, contains, crosses, etc., that can be applied on two spatial objects and return true or false as a result. Topological relations can be annotated at the schema level with QB4SOLAP (Fig. 9) between child and parent levels at a hierarchy step. Prior to annotating the topological relations at the schema level, topological relations should be derived between the child-parent level members at the instance level, by applying the set of topological relations from the QB4SOLAP definitions. Afterward, the satisfied topological relations (the ones that return true) should be annotated at the schema level. Finally, numerical operations are spatial functions that are applied on spatial objects and return a numeric value, e.g., perimeter, area, distance, etc. Numerical operations can be applied directly at the instance level to spatial level attributes (with a spatial data type) or to spatial measures (with a spatial data type), and they are not annotated at the schema level. By using operations in this class we can reveal dynamic spatial hierarchy levels and new spatial members, as we explain in the following example (Ex. 4) and show in Fig. 11.
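The derive-then-annotate step for topological relations can be sketched as a SPARQL CONSTRUCT query over the instance data; this is only an illustration of the idea, not the procedure from Paper C. It uses bif:st_within, the Virtuoso function also used in Ex. 5 (with an assumed distance threshold as a rough proxy for containment), and the GeoSPARQL relation geo:sfWithin as an illustrative target predicate for the derived relation.

# Sketch: derive a within relation between child (customer) and parent (city) members.
# geo:sfWithin as the target predicate and the 5 (km) threshold are assumptions.
CONSTRUCT { ?cust geo:sfWithin ?city }
WHERE {
  ?cust qb4o:memberOf gnw:customer ;
        skos:broader ?city ;
        gnw:customerGeo ?custGeo .
  ?city qb4o:memberOf gnw:city ;
        gnw:cityGeo ?cityGeo .
  FILTER (bif:st_within(?custGeo, ?cityGeo, 5))
}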

                        Supplier Sales
City        Customer    s1       s2       s3       Total Sales
Düsseldorf  c1          8pcs.    –        3pcs.    11pcs.
Düsseldorf  c2          10pcs.   –        –        10pcs.
Dortmund    c3          7pcs.    4pcs.    –        11pcs.
Dortmund    c4          –        20pcs.   3pcs.    23pcs.
Münster     c5          –        –        30pcs.   30pcs.

Table 4: Sample (Instance) Data for Sales (adapted from [12, 15])

Fig. 11: Dyn. Hier. – dynamic hierarchy with levels All, Country, City / ClosestCity and base levels Customer and Supplier linked by a distance function (adapted from [12, 15])

Fig. 12: Example Map of Sales (Instance) Data (adapted from [12, 15])

Example 4 (SOLAP example (s-roll-up) - Reproduced from [12, 15]) In this example we give a spatial extension to the classical OLAP operator roll-up, which is used to aggregate measures in order to get the data at a higher granularity level. Consider a sales fact cube with measures such as number of sales, sales price, etc. The fact cube has (spatial) dimensions, for instance, customer, supplier, etc., with the spatial hierarchy geography, which has city, state, country, continent, etc. as spatial levels. Using the classical roll-up operator, a user can aggregate the total amount of sales to customers up to the city level (shown with straight arrows in Fig. 11). Moreover, by using a spatial operation with roll-up we can query with s-roll-up, which can reveal new perspectives, such as the total sales to customers by the city of the closest suppliers. By using this s-roll-up operation, we create a dynamic spatial hierarchy during the query, with a spatial constraint to aggregate measures to the closest city of the suppliers (shown with curved arrows in Fig. 11). In order to demonstrate the difference between the classical roll-up and s-roll-up, we give the instance data on the map in Fig. 12, with the number of sales made (in parcels) to each customer from the corresponding suppliers. The map also represents the distances with arrows between the customer and supplier locations in each city. Sales data instances between customers and suppliers are summarized in Table 4. We also give the calculated distances between customer and supplier locations in Table 5. Finally, in Tables 6 and 7, we give the results of the roll-up and s-roll-up operations, respectively. As shown in Table 5, customer c3 is in the city of Dortmund, though its closest supplier is not in Dortmund but in Düsseldorf, which is


Cust. City   Cust.   s1 (Düsseldorf)   s2 (Dortmund)   s3 (Münster)
Düsseldorf   c1      15 km             45 km           30 km
Düsseldorf   c2      15 km             60 km           60 km
Dortmund     c3      15 km             30 km           45 km
Dortmund     c4      45 km             15 km           15 km
Münster      c5      60 km             45 km           15 km

Table 5: Customer to Supplier Distance (km) (adapted from [12, 15])

supplier s1. Similarly, customer c4 from Dortmund is closer to supplier s3, which is not from the city of Dortmund but from Münster. The dynamic spatial hierarchy changes the aggregation level (city) in s-roll-up; therefore, we can have new analysis perspectives, which are normally not possible to reveal with traditional roll-up. Due to these new perspectives we observe different results in Tables 6 and 7.

Table 6: Roll-up (adapted from [12, 15])

City        Sales
Düsseldorf  21pcs.
Dortmund    34pcs.
Münster     30pcs.

Table 7: S-Roll-up (adapted from [12, 15])

City        Sales
Düsseldorf  25pcs.
Dortmund    20pcs.
Münster     33pcs.
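Tracing the numbers from Tables 4 and 5 makes the difference explicit. In the classical roll-up (Table 6), sales are grouped by the customers' own city: Düsseldorf gets c1 + c2 = 11 + 10 = 21 pcs. In the s-roll-up (Table 7), sales roll up to the city of the closest supplier: c1, c2, and c3 all have s1 (Düsseldorf) as their closest supplier, so their sales through s1 (8 + 10 + 7 = 25 pcs.) are aggregated to Düsseldorf; c4 is equally close (15 km) to s2 and s3, so its sales through s2 (20 pcs.) roll up to Dortmund, while its sales through s3 (3 pcs.), together with c5's 30 pcs., give Münster 33 pcs.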

SOLAP Semantics

We have described the formal SOLAP semantics of four common spatial OLAP operators, s-slice, s-dice, s-roll-up, and s-drill-down, in detail in Paper C. These formal semantics are necessary for re-use in the SOLAP (to SPARQL) query generator algorithms. In order to give an overview of the SOLAP semantics, the following remark (Remark 1) gives the definition of the traditional dice operator, the following definition (Def. 2) gives the definition of s-dice with high-level SOLAP semantics, and finally Ex. 5 exemplifies the s-dice SOLAP operator for different cases; all are reproduced from [15].

Remark 1. (Dice - Reproduced from [15]) The traditional dice operator takes a cube and a Boolean condition φ and returns a new cube containing only

the cells that satisfy the Boolean condition φ. The dice operation is analogous to the relational algebra selection σ_φ(R), but the argument is a cube, not a relation. For example, the query “sales to customers of type LLC (Limited Liability Company)” is a dice operation (the cube is sales, the dimension is customer, and the Boolean condition is whether the customer type is LLC).

Definition 2. (S-Dice - Reproduced from [15]) Similarly, the s-dice operator takes an n-dimensional cube C as an argument, which has the cube schema CS = (D, M, F) with the fact members f ∈ FM as given in Def. 16 (Paper C). As a parameter, s-dice takes a spatial Boolean predicate, denoted by φ^S. The s-dice operator keeps the cells of the cube C that satisfy the spatial predicate over spatial dimension levels l_s, attributes a_s, and measures m. The semantics of the operator is defined as SD(C)[φ^S] = C′, where the spatial predicate φ^S can be applied on spatial level member values φ^S(v_{l_s}), spatial attribute values φ^S(v_{a_s}), measure values φ^S(v_m), and/or a combination of these. The SD operator returns a subcube C′ ⊆ C, which has the schema CS = (D′, M′, F′) where D′ = D, M′ = M, and F′ = F. Unlike the s-slice operator, s-dice keeps all the dimensions D in the output cube C′. The set of measures M and the fact type F also remain the same, though the new cube C′ is a subset of the original cube C with filtered fact members f ∈ FM′, which is explained in the following. The s-dice operator selects a subset FM′ of the fact members' set, FM′ ⊆ FM, with respect to the spatial predicate φ^S on level members as follows:

1. Spatial predicate on level values: FM′ = { f ∈ FM | ∃ v_{l_b} ∈ LM(l_b), v_{l_s} ∈ LM(l_s) : f is associated with the base level member v_{l_b}, v_{l_b} rolls up to v_{l_s}, and φ^S(v_{l_s}) holds }.

2. Spatial predicate on level attribute values: FM′ = { f ∈ FM | ∃ v_{l_b} ∈ LM(l_b), v_{l_s} ∈ LM(l_s) : f is associated with v_{l_b}, v_{l_b} rolls up to v_{l_s}, v_{l_s} has the attribute value v_{a_s}, and φ^S(v_{a_s}) holds }.

Note that the filtering of the facts through level members can be done by level values v_{l_s} or by attribute values v_{a_s}, applying the spatial predicate φ^S. Finally, the filtering of the facts on associated measure values is defined in the following:

3. Spatial predicate on measure values of m_s: FM′ = { f ∈ FM | ∃ v_{m_s} ∈ Codomain(m_s) : f is associated with the measure value v_{m_s} and φ^S(v_{m_s}) holds }.

For complex cases, i.e., combinations of these three types, the result set follows by combining the basic result sets.

Example 5 (S-Dice Example - Adapted from [15]) The s-dice operator can be implemented on the level and attribute values by filtering level members in the cube, or on measures by filtering the facts in the cube. In both cases, the spatial predicate φ^S is used.


The query for the s-dice operator could be “sales to customers, which are located within 5 km distance from their city center”, where the s-dice is on level members by filtering the customer level. The spatial predicate φ^S can be interpreted in two different ways (see Table 8 for a comparison of their query runtimes in the conclusion and discussion section, Sect. 3.5).

S-Dice (1) The first method assumes a buffer area of 5 km from the coordinates of the city center and checks the customers' locations with the within operator from the topological relations φ^S ∈ T_rel to see if the condition is met. The following SPARQL query shows the implementation of this method on level members.

SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust .
  ?cust qb4o:memberOf gnw:customer ;
        skos:broader ?city ;
        gnw:customerGeo ?custGeo .
  ?city gnw:cityGeo ?cityCentGeo .
  FILTER (bif:st_within (?custGeo, ?cityCentGeo, 5))}

S-Dice (2) The second method checks if the distance from a customer location to the corresponding city center is less than 5 km, by using the distance function from the numeric operations f^S ∈ N_op. In this case, the spatial predicate φ^S is a combination of a spatial function f^S and a regular Boolean predicate φ. The spatial function is distance from the numeric operations, and the predicate is less than (<). The following SPARQL query shows the implementation of this method for s-dice on level members.

SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust .
  ?cust qb4o:memberOf gnw:customer ;
        skos:broader ?city ;
        gnw:customerGeo ?custGeo .
  ?city gnw:cityGeo ?cityCentGeo .
  BIND (bif:st_distance (?custGeo, ?cityCentGeo) AS ?distance)
  FILTER ( ?distance < 5 ) }

3.4 SOLAP Query Generation

In the previous section we mentioned two SOLAP operators: s-roll-up in Ex. 4, to give a general understanding of SOLAP operators, and s-dice in Ex. 5, to clarify the given semantics of the spatially extended dice operator with Remark 1 and Def. 2. In order to keep it simple, we have chosen only one SOLAP query generation algorithm to be explained in this summary, and that is s-roll-up21.

The SPARQL queries that are generated from SOLAP operators are denoted by Q and have the form “Q = SELECT R WHERE GP”, where GP represents a graph pattern that contains triple patterns and R is the set of parameters that are returned as the result of the query. As can be observed from the above s-dice examples (Ex. 5), the skos:broader property is used to define the roll-up relation between levels (e.g., {?cust skos:broader ?city}). In order to represent roll-up paths for the hierarchy levels of a dimension, we defined a helper function RUPath (Algorithm 1), which creates these roll-up paths included in GP in the body of the WHERE clause.

Algorithm 1: RUPath(G^S(C), l_b, l_s, aID, ?a_s, ?f) : GP - Adapted from [15]

Input: G^S(C), l_b, l_s, aID, ?a_s, ?f
Output: GP
1 begin
2   GP = (?f rdf:type qb:Observation)
3   GP = GP ∪ (?f id^S(aID) ?l_b ∧ ?l_b qb4o:memberOf id^S(l_b))
4   foreach (id^S(l_c), id^S(l_p)) ∈ G^S(C) | l_p ⊑ l_s do
5     GP = GP ∪ (?l_b skos:broader ?l_p ∧ ?l_p skos:broader ?l_s)
6   let GP = GP ∪ (?l_s id^S(a_s) ?a_s)
7   return GP

Roll-up Path. The roll-up path triple pattern is created as a path-shaped join of triples of the following form: {(s_1 p_1 o_1), (o_1 p_2 o_2), (o_2 p_2 o_3), ..., (o_{n-1} p_2 o_n)}. The root of the pattern is s_1, which corresponds to fact members (a.k.a. observations) (Line 2 in Algorithm 1). The predicate p_1 is id^S(aID), the identifier of attribute IDs, which links the facts with the identifiers of the base level members id^S(l_b) (Line 3 in Algorithm 1). The first level member variable (the base level member, ?l_b) corresponds to o_1, which rolls up to its parent level member o_2 with the predicate p_2 (skos:broader).

21All of the query generation algorithms for the four main SOLAP operators can be found in detail in Paper C.


Algorithm 2: SRUGenerator(G^I(C), f^S(L(d_s)), agg(m)) : Q - Adapted from [15]

Input: G^I(C), f^S(L(d_i)), agg(m)
Output: Q
1 begin
2   Q = ∅ ; GP = RUPath(G^S(C), l_b, l_s, aID, ?a_s, ?f)
3   GP = GP ∪ (?f id^S(m) ?m)
4   for f^S(L(d_s)) do
5     GP′ = RUPath(G^S(C), l_b, l_s, aID, ?a_s, ?f) ; Q′ = ∅
6     GP′ = GP′ ∪ (BIND f^S(x) AS ?v_x)
7     Q′ = SELECT ?x (AGG(?v_x) AS ?v_y) WHERE GP′ GROUP BY ?x
8     GP = GP ∪ Q′ ∪ (FILTER ?x = ?a_s && f^S(x) = ?v_y)
9     let l′_s = l_s
10  return Q = SELECT ?f ?l′_s AGG(?m) WHERE GP GROUP BY ?f ?l′_s

This continues for each child-parent level member (Line 4 in Algorithm 1) until the target level l_s is reached, which is the last variable (?l_s in Line 5) in the path and corresponds to o_n in the triple pattern. The target level l_s defines the granularity of the results to be returned from the (spatial) OLAP operation. Finally, the graph pattern GP is extended with the required spatial attribute parameters and returned in Line 6 (see footnote 22). The helper function RUPath is used in all of the SOLAP generator algorithms to create the graph pattern returned by the SPARQL query.
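For concreteness, the generic path {(s_1 p_1 o_1), (o_1 p_2 o_2), ...} instantiated on the GeoNorthwind cube (using only the properties already shown in the examples of this section) looks as follows; the comments map each triple pattern to its role in the path.

?obs      rdf:type       qb:Observation ;   # s_1: fact member (observation)
          gnw:customerID ?customer .        # p_1 = id^S(aID): links the fact to the base level member (o_1)
?customer skos:broader   ?city .            # p_2 = skos:broader: child member rolls up to its parent (o_2)
?city     skos:broader   ?country .         # p_2 again, up to the target level l_s (o_n)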

S-Roll-up Generator. A high-level SOLAP expression is defined in Paper C (Def. 19) with SOLAP semantics as SRU(C)[f^S(L(d_i)), agg(m)]. The parameter f^S(L(d_i)) represents a spatial function on spatial level members of a dimension level (L(d_i)), and agg(m) represents an aggregate function on measures. In order to illustrate the algorithm steps, we use the s-roll-up example given in Ex. 4: “Total amount of sales to customers by city of the closest suppliers”. The following text is adapted from [15], which sketches the main steps line by line from Algorithm 2.

Lines 2, 3 (Adapted from [15]). Build the roll-up path using the helper function RUPath. In addition to the variables given in the RUPath function, we

22In order to represent varying (changeable) parameters of the triple pattern at the instance level, such as fact/level members and parameter values given by the user, and to distinguish them from the other parameters in the algorithm, we represent those with question marks before the variable name, e.g., ?f, ?l_x, ?a_s, etc.

also need to consider measures and measure value variables (Line 3) since we aggregate the measures. A measure is specified in the following listing of the running example as gnw:salesAmount. The following lines are added to the graph pattern GP:

{?obs rdf:type qb:Observation ;
      gnw:customerID ?cust ;
      gnw:supplierID ?sup ;
      gnw:salesAmount ?sales .
 ?cust qb4o:memberOf gnw:customer ;
       gnw:customerGeo ?custGeo ;
       skos:broader ?city .
 ?sup qb4o:memberOf gnw:supplier ;
      gnw:supplierGeo ?supGeo ;
      skos:broader ?city .
 ?city qb4o:memberOf gnw:city .

Line 4 (Adapted from [15]). Build the inner select subquery to apply the spatial function f^S on the spatial level members L(d_i) (i.e., Customer, Supplier). In the example, we will use this information to create a dynamic spatial hierarchy from the Customer to the City level.

Line 5 (Adapted from [15]). Call RUPath for the inner select subquery to link the geometry attributes of the base level members with different variables and create a graph pattern GP′ for the inner select. The following lines are added to the graph pattern GP′:

{?obs rdf:type qb:Observation ;
      gnw:customerID ?cust1 ;
      gnw:supplierID ?sup1 .
 ?sup1 gnw:supplierGeo ?sup1Geo .
 ?cust1 gnw:customerGeo ?cust1Geo .

Line 6 (Adapted from [15]). Build the bind statement in order to calculate the spatial function f^S(L(d_s)) on the spatial level members. For the running example the spatial function is st_distance. The following lines are added to the graph pattern GP′:

BIND (bif:st_distance(?cust1Geo, ?sup1Geo) AS ?distance)}

Line 7 (Adapted from [15]). Generate the inner select query Q′ using the graph pattern GP′ (Lines 5 and 6). Select the corresponding level members (the Customer level for the running example) and group them in a group by statement on the selected level members. Note that this is where


the spatial function f^S(L(d_i)) is called with a wrapper expression (e.g., MIN, MAX, etc.) to find the closest distance. The following lines illustrate the inner select query Q′:

Q′ = {SELECT ?cust1 (MIN(?distance) AS ?minDistance)
      WHERE GP′
      GROUP BY ?cust1}

Lines 8, 9 (Adapted from [15]). Build the filter statement for the whole query based on the output of the spatial function, which is calculated in the inner select subquery. Then, add the filter and the inner select subquery to the main graph pattern GP (Line 8). The filter statement for the running example is:

FILTER (?cust = ?cust1 && bif:st_distance (?custGeo, ?supGeo) = ?minDistance)}

Note that in Line 9, the spatial target level l_s (City) is altered to a dynamic spatial level l′_s, since applying the spatial function creates a dynamic hierarchy.

Line 10 (Adapted from [15]). Generate the query Q for computing the facts f ∈ FM′ based on the graph pattern GP created in the previous steps. The measures are also aggregated at the spatial target level (the closest City, which is dynamically selected). The group by statement is applied to the fact members and target level members. In our running example, we obtain the generated s-roll-up query Q given below.

Finally, the complete SPARQL query generated from the above steps is given in the listing in Ex. 6.

Example 6 (S-Roll-up in SPARQL - Adapted from [15]) The graph pattern GP′ (for the inner select subquery) is created in Lines 15-22, and the graph pattern GP (for the whole query) is created in Lines 3-24.

1  Q = SELECT ?obs ?city (SUM(?sales) AS
2      ?totalSales) WHERE
3  { ?obs rdf:type qb:Observation ;
4         gnw:customerID ?cust ;
5         gnw:supplierID ?sup ;
6         gnw:salesAmount ?sales .
7    ?cust qb4o:memberOf gnw:customer ;
8          gnw:customerGeo ?custGeo ;
9          gnw:customerName ?custName ;
10         skos:broader ?city .
11   ?city qb4o:memberOf gnw:city .
12   ?sup gnw:supplierGeo ?supGeo .
# Inner Select for the distance function
13   { SELECT ?cust1 (MIN(?distance) AS
14       ?minDistance) WHERE
15     { ?obs rdf:type qb:Observation ;
16            gnw:customerID ?cust1 ;
17            gnw:supplierID ?sup1 .
18       ?sup1 gnw:supplierGeo ?sup1Geo .
19       ?cust1 gnw:customerGeo ?cust1Geo .
20       BIND (bif:st_distance( ?cust1Geo, ?sup1Geo )
21             AS ?distance)}
22     GROUP BY ?cust1 }
23   FILTER (?cust = ?cust1 && bif:st_distance
24           (?custGeo, ?supGeo) = ?minDistance)}
25   GROUP BY ?city ?obs

Nested SOLAP. It is very common among data warehouse users to query data warehouses with nested (spatial) OLAP queries. Generally, a nested set of SOLAP operators can be created as an expression with an s-dice on top of several s-roll-ups, which are on top of one or more s-slices, on top of an s-dice, e.g. [15], (s-dice_2(s-roll-up_1(... s-roll-up_k(s-slice_1(... s-slice_n(s-dice_1(C))))))). In order to perform nested SOLAP operations in SPARQL by using the SOLAP generation algorithms, we have defined a set of principles to be considered in Algorithm 3. These principles are necessary for two reasons: 1) to follow the general order of the SOLAP operators in a nested SOLAP expression, as explained in the previous paragraph, and 2) to address the syntactic details in the SPARQL queries and graph patterns that are generated from the SOLAP generator algorithms, i.e., the ordering of SPARQL clauses and functions (SELECT, FILTER, BIND, GROUP BY, etc.). The following principles are adapted from [15].

Principle 1: Perform s-dice in the beginning or at the end.

Principle 2: If there are several s-roll-up or s-slice operations, call their generator algorithms repeatedly.

Principle 3: Always separate FILTER clauses when a SOLAP generator algorithm is used. Enumerate separated FILTER clauses. If a SOLAP operator is the final function added to the graph pattern, do not separate the FILTER clause.

Principle 4: Build the final graph pattern with the separated and enumerated FILTER clauses with respect to Principle 3.

34 3. Enabling SOLAP operations in SPARQL over Spatial Multidimensional Data Cubes on the SW with QB4SOLAP Vocabulary

Principle 5: Drop the main SELECT clause from each SOLAP generator algorithm and build only one SELECT that is added to the query at the end.

Principle 6: Separate the GROUP BY clause and AGG functions from the s-roll-up generator algorithms (and enumerate them), and add them to the main (outer) SELECT clause at the end.

As a starting point to write a nested SOLAP query, we have formulated a simple nested form, which is the most typical pattern: (s-roll-up (s-slice (s-dice(C)))). By following the above principles, the WriteSPARQL algorithm is given as pseudo-code that takes the high-level SOLAP operator definitions as input (Algorithm 3). In Algorithm 3, Lines 4 and 6, separating and enumerating FILTER clauses can be observed with respect to Principle 3. In accordance with Principle 4, the final graph pattern is built by the union of the separated and enumerated FILTER clauses in Line 8. In Lines 4, 6, and 9, SELECT statements are dropped from the SOLAP operator algorithms (S-DiceGenerator, S-SliceGenerator, and SRUGenerator) with respect to Principle 5, and a single SELECT is built at the end and added to the query in Line 10. Similarly, the GROUP BY clause and AGG functions are dropped from the SRUGenerator algorithm in Line 9 and built into the outer SELECT with the main query in Line 10, as suggested in Principle 6.

Algorithm 3: WriteSPARQL((SRU(C)[f^S(L(d_i)), agg(m)](SS(C)[l_b, l_s, v_{a_s}](SD(C)[φ^S])))) : Q - Adapted from [15]

Input: (SRU(C)[f^S(L(d_s)), agg(m)](SS(C)[l_b, l_s, v_{a_s}](SD(C)[φ^S])))
Output: Q
1 begin
2   Q = ∅ ; GP = RUPath(G^S(C), l_b, l_s, aID, ?a_s, ?f)
3   GP = GP ∪ (?f id^S(m) ?m)
4   GP^1 = S-DiceGenerator(G^I(C), φ^S) \ FILTER^1 \ SELECT
5   GP = GP ∪ GP^1
6   GP^2 = S-SliceGenerator(G^I(C), v_s, l_b, l_s) \ FILTER^2 \ SELECT
7   GP = GP ∪ GP^2
8   GP = GP ∪ FILTER^1 ∪ FILTER^2 ∪
9        SRUGenerator(G^I(C), f^S(L(d_s)), agg(m)) \ SELECT \ GROUP BY^1 \ AGG^1
10  return Q = SELECT ?l′_s AGG^1(?m) WHERE GP GROUP BY ?l′_s

Note that in the algorithm, the main graph pattern GP is initiated with

the RUPath helper function in Line 2 and incremented with a triple pattern for the selected measures coming from the SRUGenerator algorithm in Line 3. After dropping the FILTER, SELECT, etc. clauses from the SOLAP generator algorithms in each step, sub-graph patterns are created and enumerated (GP^1, GP^2), which are incrementally added to the main graph pattern GP in Lines 5 and 7. In the following, an example from Paper C shows the output of the WriteSPARQL algorithm for generating a nested SOLAP query [15]. We use the running examples for s-roll-up and s-dice given earlier (Examples 4, 5, and 2).

Example 7 (³s-roll-up (²s-slice (¹s-dice(C))) - Adapted from [15]) ¹Get the subcube graph of customers that are located within a 5 km distance from their city center, ²slice on the customers of the largest country (which drops the dimension and leaves out all the other countries), and ³get the total amount of sales of customers by the city of their closest suppliers (aggregating the measure sales amount from the Customer to the Closest City level). The query is written starting from the innermost operator, s-dice, to the outermost operator, s-roll-up.

1  SELECT ?city (SUM(?sales) AS ?totalSales)
2  WHERE {
3    ?obs rdf:type qb:Observation ;
4         gnw:customerID ?cust ;
5         gnw:supplierID ?sup ;
6         gnw:salesAmount ?sales .
7    ?cust qb4o:memberOf gnw:customer ;
8          gnw:customerGeo ?custGeo ;
9          skos:broader ?city .
10   ?sup qb4o:memberOf gnw:supplier ;
11        gnw:supplierGeo ?supGeo ;
12        skos:broader ?city .
13   ?city qb4o:memberOf gnw:city ;
14         gnw:cityGeo ?cityCentGeo ;
15         skos:broader ?country .
16   ?country qb4o:memberOf gnw:country ;
17            gnw:countryGeo ?countGeo .
18   ?city gnw:cityGeo ?cityGeo .
### 1. Inner select for (S-SLICE)
### Find the largest country
19   {SELECT ?x (MAX(?area) as ?maxArea)
20    WHERE {
21      ?obs rdf:type qb:Observation ;
22           gnw:customerID ?cust .
23      ?cust qb4o:memberOf gnw:customer ;
24            skos:broader ?city .
25      ?city skos:broader ?country .
26      ?country gnw:countryGeo ?x .
27      BIND(bif:st_area(?x) as ?area)}}
### 2. Inner select for (S-ROLL-UP)
### Find the closest suppliers to customers
28   { SELECT ?cust1 (MIN(?distance) AS
29       ?minDistance) WHERE {
30       ?obs rdf:type qb:Observation ;
31            gnw:customerID ?cust1 ;
32            gnw:supplierID ?sup1 .
33       ?sup1 gnw:supplierGeo ?sup1Geo .
34       ?cust1 gnw:customerGeo ?cust1Geo .
35       BIND (bif:st_distance(?cust1Geo, ?sup1Geo) AS ?distance)}
36     GROUP BY ?cust1 }
### FILTER for S-DICE, to get a subcube
37   FILTER (bif:st_within (?custGeo, ?cityCentGeo, 5))
### FILTER for S-SLICE, the 1st inner SELECT
38   FILTER (?countGeo = ?x)
### FILTER for S-ROLL-UP, the 2nd inner SELECT
39   FILTER (?cust = ?cust1 && bif:st_distance (?custGeo, ?supGeo) = ?minDistance)}
40   GROUP BY ?city

3.5 Conclusion and Discussion

In order to query with SOLAP operators, a well-defined spatial multi-dimensional data model is required. We have introduced the QB4SOLAP vocabulary, a spatial extension to the QB4OLAP multi-dimensional RDF vocabulary. We have explained the highlights of the MD concepts of QB4OLAP and the spatial extensions introduced with QB4SOLAP. We have briefly described our journey of building up the QB4SOLAP vocabulary, developed from the earlier two versions. Describing the spatial extensions of a data warehouse model on the Semantic Web requires good knowledge of MD concepts and their availability in prominent RDF vocabularies (i.e., the RDF Data Cube Vocabulary and QB4OLAP), and a good understanding of spatial functions, topological relation models23, and their standards for the Semantic Web (i.e., GeoSPARQL as an OGC standard). We had to make design decisions for the spatial extensions of QB4SOLAP and the definition of SOLAP operations, which are based on a classification of spatial operations into three groups: spatial aggregation, topological

23RCC8 - Region Connection Calculus [27] and DE-9DIM - Dimensionally Extended Nine-Intersection [9] models describe possible Boolean relations of two geometries, in Euclidean space and two-dimensional space, respectively.

relation, and numeric operation. After introducing our definition of SOLAP, we explained the possible new analysis perspectives with a running use case example. For writing similar SOLAP queries in SPARQL, the use case data should be annotated with QB4SOLAP both at the schema level and the instance level, since the OLAP query structure uses explicit schema concepts of MD models (e.g., aggregate/roll-up measures to a higher level along a hierarchy, slice/remove a dimension from the fact cube, etc.). The QB4SOLAP vocabulary allows users to annotate and publish spatial multi-dimensional RDF data. Thus, users can query spatial RDF endpoints with SOLAP operators in SPARQL. Due to the complexity of the SPARQL query language for inexperienced users, high-level SOLAP operators are required to be parsed and generated in SPARQL. In order to achieve that, we have formalized the RDF semantics of the multi-dimensional cube concepts of QB4SOLAP schema and instance data, which is exemplified with a running use case. As an example, we have shown the hierarchy steps in QB4SOLAP terms in Ex. 3. We have also defined the high-level SOLAP semantics of four common SOLAP operators (s-slice, s-dice, s-roll-up, and s-drill-down), which are given together with SPARQL example queries from the running use case. An example of a SOLAP operator is given with Def. 2 and Ex. 5 for the s-dice operator. Finally, by re-using the semantics of SOLAP operators and the multi-dimensional RDF data cube definitions in QB4SOLAP, we have provided algorithms for generating spatially extended SPARQL queries from single or nested SOLAP operators. The query runtimes (in seconds), tested against an instance of the running use case data set (GeoNorthwind), are given in Table 8 below.

Table 8: Runtimes in seconds (Reproduced from [15])

Examples                        SOLAP Operators                     Query Runtime
Ex. 12 - Paper C                (s-slice(C))                        0.07
Ex. 5 (1), Ex. 13.1 - Paper C   (s-dice(C))                         0.09
Ex. 5 (2), Ex. 13.2             (s-dice(C))                         1.01
Ex. 6, Ex. 14 - Paper C         (s-roll-up(C))                      2.03
Ex. 15 - Paper C                (s-drill-down(C))                   1.86
Ex. 7, Ex. 16 - Paper C         (s-roll-up (s-slice (s-dice(C))))   3.04

The GeoNorthwind dataset (whose conceptual schema is given in Fig. 10) is annotated with QB4SOLAP; in total, 48,677 triples are produced and published to an RDF endpoint24 [15]. Each SOLAP operator in Table 8 takes the defined QB4SOLAP cube as a parameter C. This corresponds to the published

24RDF Endpoint: http://lod.cs.aau.dk:8890/sparql. Software set-up: Virtuoso Open Source Edition (column store and multi-threaded) Version 7.2.5 running on an Ubuntu 14.04 server with a 2.30 GHz CPU and 16 GB RAM. SOLAP Queries in SPARQL: http://extbi.cs.aau.dk/SOLAP4SW/queries


GeoNorthwind RDF triples. As can be seen from the table above, s-slice took 0.07 seconds to execute, which makes it quite an efficient SOLAP operator since it contains a FILTER statement at a specified spatial level. The queries given in Ex. 5 for the s-dice operator differ: 0.09 sec. for (1) and 1.01 sec. for (2). The first s-dice query in SPARQL is more efficient than the second one, even though the results are the same. The first s-dice is performed by filtering the geometry instances (of customers) that are within a 5 km buffer area from a specified given point (the city center geometry). In this way of implementing the s-dice SOLAP operator, a topological relation T_rel (st_within) is used. The second s-dice is performed by calculating every customer instance's distance to the city center with a spatial numerical operation N_op (st_distance), and then a filter is applied to retrieve those that have less than 5 km distance to the city center. The second approach to s-dice is more expensive since we have to measure all the distances between each customer and the city center. The s-roll-up example given in Ex. 6 has a runtime of 2.03 seconds. The reason why it has a longer response time than the s-dice and s-slice operators is its complexity, where both spatial functions (e.g., st_distance to calculate the distance between customers and suppliers) and aggregate operators (e.g., SUM of sales) are used in the query. The next operator, s-drill-down, works in a very similar way to s-roll-up and has a close runtime measurement of 1.86 seconds. The nested SOLAP query given in Ex. 7 has the longest runtime, measured as 3.04 seconds, due to the nesting of several SOLAP operators in the query. Naturally, the number of triple patterns in a nested query is much larger than for a single SOLAP operator; thus, the query runtime has the highest measurement.

Our future vision of SOLAP and its tools on the Semantic Web is depicted in Fig. 13. In conclusion, we built the conceptual and the architectural groundwork for Geo-semantic (Spatial RDF) Data Warehouses by proposing the QB4SOLAP vocabulary and demonstrating examples and high-level SOLAP semantics on an applied spatial use case. We have also shown how the high-level SOLAP operators can be translated into SPARQL by the SOLAP generator algorithms given in pseudo-code (Paper C).

Even though the QB4SOLAP vocabulary, the defined SOLAP semantics, and the SPARQL query generator algorithms are important contributions towards Geo-semantic Data Warehouses, they are not enough by themselves, since annotating and querying spatial data on the Semantic Web is limited to experts with SPARQL and RDF knowledge. Tools such as GeoSemOLAP, which can generate SPARQL queries from high-level SOLAP expressions interactively via a GUI, are required to allow users to perform SOLAP on the SW without knowledge of SPARQL. Another tool that is required is RDF2SOLAP, to automate the QB4SOLAP annotation from existing RDF endpoints. Thus, in the upcoming stages within the scope of this project, we develop and offer solutions for the practical and technical implementation of Geo-semantic Data Warehouses and SOLAP operations on the Semantic Web by developing these tools.

Fig. 13: Our future vision of SOLAP on the SW (Adapted from [15])

4 The GeoSemOLAP Framework

This section gives an overview of (demo paper) Paper D [13], where we demonstrate the implementation and workflow of the SOLAP generator algorithms from Section 3.

4.1 Motivation and Problem Statement

The increasing popularity of publishing Linked Open Data (LOD) on the SW from the public sector has made data sets from spatial and governmental domains available in RDF format. In order to query such data sets with spatial and analytical DW queries (a.k.a. SOLAP), users and decision-makers need good knowledge of the SPARQL query language and syntax along with SOLAP semantics. We have shown in Section 3.3 that translating SOLAP operator semantics to SPARQL queries requires good knowledge of MD concepts and RDF parsing. Moreover, the generated SOLAP operators in SPARQL query syntax are not intuitive or easily readable. Potential decision-makers and users are often not fully familiar with SPARQL syntax and thus cannot readily perform SOLAP operators (in SPARQL) over a spatial MD RDF data endpoint. Mostly, the users are well aware of the MD concepts of the data model and know how to query DWs with high-level spatial OLAP operators, which are similar in principle to Multidimensional Data Expressions (MDX) syntax25. In order to achieve our vision of a tool-oriented future for SOLAP on the SW (depicted in Fig. 13), we have to develop and provide tools (such as GeoSemOLAP) for end-users that lower the entry barrier for querying spatial RDF endpoints with SOLAP operators.

In the following, we recap the SOLAP example (s-roll-up), which is very similar to Ex. 4 except for the city names. In addition, to show the complexity of SOLAP on the SW, we present the s-roll-up query in SPARQL, highlighted with facts and level members. The highlighted fact and level members in the SPARQL query can be identified as MD concepts of a high-level SOLAP expression for s-roll-up.


Fig. 14: Example map of sales data (Reproduced from [13])

Example 8 (SOLAP example (s-roll-up) in SPARQL - Adapted from [13])
The example scenario is very similar to the previously given SOLAP example (Ex. 4). Fig. 14 shows a map with the amounts of sales between customers (c1, c2, . . . , c5) and suppliers (s1, s2, s3) in three Danish cities (Sorø, Holbæk, Ringsted). In the background story of this scenario, an export company records its sales data26 in a spatial data warehouse. The company decides to publish the spatial data warehouse on the Semantic Web for further analysis possibilities and to provide transparent access to all of its branches and customers.

An end-user/analyst would like to obtain: "The total sales to customers grouped by cities of their closest supplier". The dataset contains, for each sale event, the corresponding information about the involved customer and supplier, their city, location, and the number of sold goods. However, the dataset does not contain information about the distances between customers and suppliers; hence, we have to find the distances between customer and supplier points using a spatial function (i.e., distance) during query runtime. Based on the information acquired from the spatial function (the closest distances between customer-supplier points), we can aggregate the total sales. Technically, this corresponds to the SOLAP operator s-roll-up.

As can be seen from the following SPARQL query, performing this s-roll-up operation on the SW requires a good understanding of the SPARQL query syntax, involving several triple patterns with many variables. Moreover, including and handling the spatial functions makes the SPARQL query even more complicated, which can easily be overwhelming for inexperienced users (on the SW).

25MDX is an industry-standard query and calculation language used to retrieve data from OLAP databases [23].

1  SELECT ?obs ?supCity (SUM(?sales) AS ?totalSales)
2  WHERE {?obs rdf:type qb:Observation ;
3           gnw:customerID ?cust ;
4           gnw:supplierID ?sup ;
5           gnw:salesAmount ?sales .
6         ?cust qb4o:memberOf gnw:customer ;
7           gnw:customerGeo ?custGeo .
8         ?sup qb4o:memberOf gnw:supplier ;
9           gnw:supplierGeo ?supGeo ;
10          skos:broader ?supCity .
11        ?supCity qb4o:memberOf gnw:city .
          # Inner select for the distance function
12        {SELECT ?cust1 (MIN(?distance) AS ?minDistance)
13         WHERE {?obs rdf:type qb:Observation ;
14                  gnw:customerID ?cust1 ;
15                  gnw:supplierID ?sup1 .
16                ?sup1 gnw:supplierGeo ?sup1Geo .
17                ?cust1 gnw:customerGeo ?cust1Geo .
18                BIND (bif:st_distance(?cust1Geo, ?sup1Geo)
19                      AS ?distance)}
20         GROUP BY ?cust1 }
21        FILTER (?cust = ?cust1 && bif:st_distance
22                (?custGeo, ?supGeo) = ?minDistance)}
23 GROUP BY ?supCity ?obs


Motivated by the need for an easy-to-use tool (for non-SW experts) to query spatial semantic data warehouses via SOLAP operators, we have developed the GeoSemOLAP framework.

4.2 Understanding Spatial Semantic Data Warehouse Queries

GeoSemOLAP requires semantic information about the schema of the spatial multi-dimensional data set in order to generate the SPARQL queries in an automated way. Therefore, GeoSemOLAP is developed using QB4SOLAP semantics (Section 3.3). QB4SOLAP27 describes both multi-dimensional concepts and spatial concepts and is built on top of the existing RDF Data Cube (QB)28 and QB4OLAP29 vocabularies.


Fig. 15: Northwind spatial data cube members (symbols next to level names represent spatial characteristics of level members, e.g., point, polygon, and multi-polygon) - (Reproduced from [13])

27QB4SOLAP: https://w3id.org/qb4solap#
28RDF Data Cube: https://w3.org/TR/vocab-data-cube/
29QB4OLAP: https://lorenae.github.io/qb4olap/

In Fig. 15, a sample schema of the running use case data is illustrated with spatial concepts and MD concepts, focused on Ex. 8. This schema is derived from the example given earlier in Fig. 10. Sales are the center of analysis, called observations (facts), and have associated measures such as Amount and Location. These measures are defined with corresponding aggregate functions: SUM for Amount and ConvexHull for Location. Location is a spatial measure; therefore, the corresponding aggregate function is defined as ConvexHull, where location points can be aggregated into a convex-shaped envelope. These functions are used for aggregating measure values during roll-up operations to an upper level in a dimension, e.g., from the City to the State level. (Fig. 15 captures only the spatial dimensions (Customer and Supplier) from the full conceptual schema of the Northwind spatial data warehouse that was given in Fig. 10.) Each roll-up relation between levels is defined as a hierarchy step, where spatial hierarchy steps are defined with a topological relation between their levels, e.g., the City level is WITHIN the State level, the State level INTERSECTS the Country level, etc.

The high-level SOLAP representation of the running query (Ex. 8) "Total sales to customers grouped by the city of their closest supplier" is S-ROLL-UP (Sales, [DISTANCE(Customer, Supplier)] → ClosestCity, SUM(SalesAmount)). This way of formulating an s-roll-up query is common and more intuitive for decision-makers, as it avoids tackling the SPARQL triple patterns, variables, and complex syntax. The first line in the given query example (Ex. 8) specifies the variables to be returned as output from the outer SELECT: the sales observations ?obs, the supplier city ?supCity, and the total amount of sales given with the aggregate function SUM on the measure ?sales. The triple patterns in Lines 2-11 show a roll-up path, as described with Algorithm 1, as a path-shaped join of triples. The roll-up path links the initial triple pattern, i.e., the center of analysis (the sales observations ?obs in Line 2), to the target levels customer and supplier (?cust, ?sup) in Lines 3 and 4. These levels are the base levels of the previously mentioned spatial dimensions Customer and Supplier. The sales amount measure is also linked to the graph pattern30 (?sales) in Line 5. In order to find the closest supplier cities to the customers, we need the geometry attributes of the spatial levels customer and supplier; thus, we acquire them as well in the graph pattern (Lines 7 and 9). Lines 6, 8, and 11 mediate the spatial levels of the spatial hierarchy (Fig. 15). The supplier city is the target level to roll up to (Line 10). In order to find the closest supplier cities, the remainder of the query, with the inner SELECT, calculates the distances between customer and supplier geometries (Lines 18 and 19) to find the suppliers with the closest distance to the customers (Lines 21 and 22).

30Remark: a set of RDF triples and a set of triple patterns are called an RDF graph and a graph pattern, respectively (Def. 4).
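As an illustration of the metadata GeoSemOLAP relies on, the following is a minimal QB4SOLAP sketch of the Supplier-to-City hierarchy step from Fig. 15; the gnw: identifiers are illustrative and may differ from the published Geo-Northwind schema, but the annotation pattern (hierarchy, hierarchy step, cardinality, and topological relation) is the one defined by QB4SOLAP.

# Sketch of QB4SOLAP metadata for the Supplier-to-City roll-up (illustrative names)
gnw:supplierGeography rdf:type qb4o:Hierarchy ;
    qb4o:inDimension gnw:supplierDim ;
    qb4o:hasLevel gnw:supplier , gnw:city .
_:supplier_hs1 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gnw:supplierGeography ;
    qb4o:childLevel gnw:supplier ;
    qb4o:parentLevel gnw:city ;
    qb4o:pcCardinality qb4o:ManyToOne ;
    qb4so:pcTopoRel qb4so:Within .

From hierarchy steps like this one, the roll-up path of Lines 2-11 in the query above (via skos:broader between level members) can be generated automatically.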


4.3 System Architecture and GeoSemOLAP Workflow

The system architecture of GeoSemOLAP is depicted in Fig. 16. The front-end architectural component of GeoSemOLAP is the Graphical User Interface (GUI), through which end-users interact with the tool. The Metadata Manager uses the QB4SOLAP vocabulary and the use case data set schema in order to generate queries from the high-level SOLAP concepts (spatial MD query elements) entered by the end-user via the GUI. The Query Generator works in a well-integrated manner with the Metadata Manager. The last two architectural components of GeoSemOLAP, at the back-end, are the Data Processor and the SPARQL Endpoint. GeoSemOLAP is developed with JavaScript, HTML, and CSS. The Leaflet API31 is used to visualize the spatial data and maps, and the Virtuoso Triple Store (Open Source Edition 7.2) is used to store and query the RDF data from an endpoint.


Fig. 16: GeoSemOLAP architecture (Reproduced from [13])

Fig. 17 illustrates the workflow through which users interact with GeoSemOLAP. Initially, the user selects a SOLAP operator. Based on the selected SOLAP operator, several options for spatial and MD elements (e.g., spatial levels, attributes, and spatial functions such as distance and within) are displayed in drop-down menus in order to complete the operator. Once the SOLAP operator is completed, these steps can be repeated to write a nested SOLAP query. Since some SOLAP operators have the option to take coordinates from a map as input (such as s-slice in Fig. 18b), GeoSemOLAP displays a snippet of a map when such an operator is selected. The user can then click on the desired location on the map to indicate the coordinates that the query should take as input. The third step is carried out entirely automatically by GeoSemOLAP when the user finishes selecting the SOLAP query elements: a SPARQL query is generated from the SOLAP operator(s) formulated by the user. If the user is familiar with SPARQL syntax, she can easily edit the query and its parameters in the generated query template (Fig. 18c) in the fourth step. When the user presses the Run Query button, the generated query is sent to the SPARQL endpoint and executed.

31http://leafletjs.com


Fig. 17: Workflow diagram (Reproduced from [13])

Finally, the results of the query are shown to the user (Fig. 18d). From this final step backward, the user may aggregate and disaggregate the results or edit the query in the template and re-run it. An overview of GeoSemOLAP is given in Fig. 18. A screen-cast video on the actual use of GeoSemOLAP is also available online via our project site32.

4.4 Demonstration and Discussion

Every MD element and spatial concept given in the high-level query "Total sales to customers grouped by the city of their closest supplier", i.e., S-ROLL-UP (Sales, [DISTANCE(Customer, Supplier)] → ClosestCity, SUM(SalesAmount)), is present in SPARQL terms, as we highlighted and explained in Section 4.1. This is achieved by mapping and annotating the spatial and MD elements explicitly with the QB4SOLAP vocabulary in RDF terms. Once the meta-data is in place, implementing the SOLAP query generator algorithms to produce SPARQL queries for any given use case data is possible.

As a proof of concept, we have demonstrated GeoSemOLAP with the running use case example, the Geo-Northwind data set. A graphical and easy-to-understand representation of the use case data is given in GeoSemOLAP for users to easily interact with the system (Fig. 18a). Since our primary focus and contribution is around spatial concepts, we have depicted only the spatial dimensions, spatial hierarchy levels, and spatial attributes, which are necessary for SOLAP operations. The center of the analysis is the Sales fact (cube), given with its measures in orange boxes in the figure. The Supplier and Customer dimensions are given in double-lined green ellipses, which also represent the base levels of the dimensions. The dimension hierarchies roll up to the levels City, State, Region, Country, and Continent, which are given in green single-line ellipses.

32http://extbi.cs.aau.dk/GeoSemOLAP


Every level has spatial attributes, given in purple boxes in the figure. The demo presents the running query example s-roll-up (Ex. 8) on top of an s-slice as a nested query (s-roll-up(s-slice(DW))). The s-slice operator removes a dimension from a cube by choosing a single spatial level attribute. In the GUI of GeoSemOLAP, the user can specify the slice level by choosing the geometry of a level attribute through a map. Therefore, the operator comes with an option to select the geometry from a map (Fig. 18b). Next, the operator requires two spatial parameters: one to specify the spatial location and another to define the slice level with respect to the specified location. In the demo, we can see that the user clicked a location in Germany and specified the slice level as country to make a projection on the sales observations in Germany (Fig. 18b). After the s-slice operator, an s-roll-up is added to the query by the user, as seen at the bottom of Fig. 18b. Thus, the user can aggregate the measures (total sales) and derive new perspectives for the selected location. We have chosen the same parameters and spatial functions as in the running use case query (Ex. 8) to formulate the s-roll-up operator.

The overall demo initially depicts an s-slice (at the country level) on a location clicked on the map, which is in Germany. This can be formulated in an intuitive way as "Project the sales (observations) at the country level in Germany". Before adding an s-roll-up operator, the user can aggregate the measures for the chosen country level to see a general overview, which adds triple patterns to the SPARQL query to get aggregated values of all the measures (price, quantity, amount, discount, freight), as given in Fig. 19. After disaggregating and clearing the tables view in the interface, the s-roll-up can be added on top of the s-slice, which can be formulated as "Total sales to customers by the city of their closest suppliers in Germany". A sample snapshot of the results is given at the bottom of the figure (Fig. 18d).

By looking at the results in Fig. 18b, we can see that some customers in Germany have closer suppliers from cities outside Germany, such as Lyngby, which is in Denmark. This is a new pattern that SOLAP reveals dynamically by using spatial functions with OLAP operators interactively. With the help of GeoSemOLAP, this kind of SOLAP operator can now be easily formulated and mapped to SPARQL for querying spatial RDF endpoints. GeoSemOLAP considerably pushes the boundaries of advanced spatial analysis on the SW for spatial data cubes. The only remaining limitation is that the spatial data cubes must be annotated and published on the SW using the QB4SOLAP vocabulary for GeoSemOLAP to be able to communicate with the spatial RDF endpoints and extract their metadata for query generation. We have depicted our future vision of SOLAP on the Semantic Web in Section 3.5, Fig. 13, where an RDF2SOLAP module is provisioned as a tool that can semi-automatically annotate MD spatial RDF endpoints with QB4SOLAP, which would lower the remaining barriers for end-users and business intelligence analysts to easily interact with Semantic Web data using SOLAP operations.
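As an illustration of what the generated query does for the s-slice step, the following is a minimal SPARQL sketch that keeps only the observations whose customers lie in the country containing the clicked point; the gnw: property names and the example point in Germany are illustrative, and the query actually produced by GeoSemOLAP (Fig. 18c) may differ.

# s-slice at the Country level around a point clicked on the map (illustrative)
SELECT ?obs ?sales WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust ;
       gnw:salesAmount ?sales .
  ?cust gnw:customerGeo ?custGeo .
  ?country qb4o:memberOf gnw:country ;
           gnw:countryGeo ?countryGeo .
  # keep the country that contains the clicked point, then keep the
  # customers (and their observations) located inside that country
  FILTER (bif:st_within(bif:st_geomfromtext("POINT(10.45 51.16)"), ?countryGeo)
          && bif:st_within(?custGeo, ?countryGeo)) }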

Fig. 18: Screenshots of GeoSemOLAP (Reproduced from [13]): (a) graphical representation of an example use-case schema; (b) SOLAP operator configuration; (c) generated SPARQL query for nested SOLAP; (d) example result for a nested SOLAP query (s-roll-up(s-slice()))


Fig. 19: Screenshot of s-slice projection and aggregation

5 Use Case: Spatial OLAP over Environmental and Farming Data with QB4SOLAP

This section gives an overview of Paper E [11].

5.1 Motivation and Problem Statement

In Section 2, we explained that governmental organizations and agencies are making huge amounts of data publicly accessible on the Semantic Web in Linked Open Data (LOD) format [3]. The Danish government is among those that release their digital raw material as Open Data [5], starting from 2012. As an initial effort, GovAgriBus was published on the SW in 201433; it contains governmental agricultural and business data sets with spatial information in LOD format. Therefore, users can formulate interesting SPARQL queries across these domains, including spatial functions and containment relationships. Moreover, we have confirmed that advanced analytical queries are also possible with the support of aggregate functions in SPARQL (Section 2). However, the lack of well-established multi-dimensional (MD) and spatial models has been an impediment to analyzing this kind of interesting LOD data with spatial OLAP, which is the prominent query technique in spatial data warehouses for business intelligence (BI) users and decision-makers.

33https://datahub.io/dataset/govagribus-denmark

Responding to SOLAP queries on the SW requires well-defined vocabularies and meta-models to facilitate the spatial functions and OLAP operators in SOLAP. Motivated by this need, QB4SOLAP [12, 15] is developed to support spatial MD analysis and SOLAP operations on the SW (explained in Section 3). QB4SOLAP is tested on a non-trivial MD dataset with spatial concepts (Geo-Northwind, Fig. 10) to show how we can support interesting SPARQL queries on the SW by providing multi-dimensional and spatial context to SOLAP operators. However, QB4SOLAP has never been tested on complex real-world data sets published by governmental and public organizations. Since real-world data sets can bring different challenges and problems, such as corrupt data formats, complex spatial data types, and noisy data, we have decided to apply QB4SOLAP to a complex real-world data set from various domains, including livestock farming and environmental data, where all domains contain spatial information.

5.2 Spatial Data Cube of Livestock Holdings in Danish Farms

In the following, we first summarize the data sources for the different domains. The raw data sets are retrieved in various formats from different governmental agencies in Denmark. Next, we give an overview of the spatial MD data cube, GeoFarmHerdState, which is modeled and created (in RDF format) with the QB4SOLAP vocabulary by using data from the source data sets.

Source Data Sets

In order to find an interesting set of data from different resources that has spatial information, we have investigated real-world problems in Danish farms. The Danish Ministry of Environment regulates livestock units (LU or LSU)34 per area in order to keep nitrate leaching under control in vulnerable areas, such as areas in close proximity to drinking water sources and supply facilities, or ammonia-vulnerable areas [21]. Therefore, we find it particularly interesting to build our use case around the domains listed in Table 9. The data source and the approximate size (as a number of records) of the data sets for each domain are given in the table. The data format of the data sets from these three domains is Shapefile, with geographical coordinates.

34A livestock unit is a reference unit that facilitates the aggregation of livestock from various species [11]. It is simply used to produce statistics describing the number of livestock on farms. https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Livestock_unit_(LSU)


Table 9: Use case data domains and sources

Domain                   Source                        Size
Livestock Farming Data   http://jordbruganalyser.dk    240,000 records
Environmental Data       http://www.miljoeportal.dk    30,000 records
Spatial Data             http://www.geodata-info.dk    2,300 records

Livestock Farming (CHR) Data. The Ministry of Environment and Food of Denmark is responsible for maintaining and publishing the central livestock farming database. This database contains the central husbandry (livestock) registry (CHR) data and is updated yearly. We have downloaded the data for around 40,000 farms, where the state of the farms was recorded between the years 2010-2015. Some interesting attributes in this data set are the CHR number, CVR number (Central Company Registry), LSU/LU (livestock unit), address of the farm, coordinates of the farm, types of the herd, number of animals (per herd), animal usage code and purpose, etc.

Environmental Data. Denmark's environment portal hosts public environmental data about soil quality, vulnerable sites, and nitrate catchment areas. We have downloaded three different data sets containing information about nitrogen reduction potentials, phosphor classifications, and nitrate classifications of the soil in the whole of Denmark. The soil measurements contain data from 2008 to 2015. The most interesting attributes from these data sets are the nitrogen reduction potentials in the areas, the nitrate classification type, and the phosphor classification type.

Spatial Data. In order to build a spatial data cube and enrich the analysis, we have added geographical data sets to our use case, which contain the parish and drainage areas of Denmark. Some attributes from these data sets are the drainage area name and code, total area, parish name and ID, etc.

GeoFarmHerdState Cube in RDF

By using the aforementioned use case data sets from the spatial, environmental, and livestock farming domains, we have created a spatial data cube of livestock holdings that we refer to as GeoFarmHerdState. The data cube is created after thoroughly analyzing the interesting attribute columns (from the Shape files) and joining them on referential integrity constraints or by using spatial joins through the geographical coordinates. The highlights and challenges of the transformation and conciliation process for generating the GeoFarmHerdState spatial data cube in RDF are discussed in Section 5.4. We have used the QB4SOLAP [12, 15] vocabulary for defining the GeoFarmHerdState cube schema and data cube instance members in RDF.

The GeoFarmHerdState schema contains the meta-data, which is used to describe spatial and MD concepts such as spatial dimensions, spatial hierarchies, spatial attributes, and measures. The GeoFarmHerdState data cube instances are the members of the cube schema, such as level and attribute members, fact members, and measure values, which represent the actual data records. Fig. 20 depicts a conceptual schema of the GeoFarmHerdState data cube. In the following, we give examples of the cube schema concepts in RDF terms with QB4SOLAP.

GeoFarmHerdState Cube schema concepts. A cube schema is described with the multi-dimensional concepts of data warehouses, such as dimensions, levels, attributes, hierarchies, hierarchy steps, and measures. A spatially extended cube schema has spatial extensions for all these concepts, and all of them can be annotated with the QB4SOLAP vocabulary. The examples for these concepts are given below, reproduced from Paper E.


Fig. 20: GeoFarmHerdState – Conceptual MD schema of livestock holdings data (Adapted from [11])

Example 9 ((Spatial) Dimensions - Adapted from [11])
The MD conceptual schema in Fig. 20 has four dimensions: Herd, Time, Parish, and Farm. Two of them (Farm and Parish) are spatial dimensions and are given in RDF terms as follows:


gfs:farmDim rdf:type qb:DimensionProperty ;
    qb4o:hasHierarchy gfs:ownership , gfs:address .
gfs:parishDim rdf:type qb:DimensionProperty ;
    qb4o:hasHierarchy gfs:geography .

A dimension is a spatial dimension if it has at least one spatial level. The Farm (base) level is a spatial level in the Farm dimension, since a farm has its location as coordinates, and the Parish (base) level and the DrainageArea level are spatial levels (with polygon coordinates) in the Parish dimension.

Example 10 ((Spatial) Hierarchies - Adapted from [11])
Hierarchies are found in dimensions, as given in Ex. 9 with the qb4o:hasHierarchy property. Hierarchies are composed of levels, and a hierarchy is spatial if it has at least one spatial level. In the following, we give three hierarchies, Geography, Usage, and Address, from the GeoFarmHerdState cube, which are given in ellipses in Fig. 20.

gfs:geography rdf:type qb4o:Hierarchy ;
    qb4o:inDimension gfs:parishDim ;
    qb4o:hasLevel gfs:drainageArea .
gfs:usage rdf:type qb4o:Hierarchy ;
    qb4o:inDimension gfs:herdDim ;
    qb4o:hasLevel gfs:product , gfs:purpose .
gfs:address rdf:type qb4o:Hierarchy ;
    qb4o:inDimension gfs:farmDim ;
    qb4o:hasLevel gfs:zipCode , gfs:commune .

The Geography hierarchy is a non-strict spatial hierarchy. A spatial hierarchy is non-strict if it has at least one (n − n) relationship between its levels. In the Geography hierarchy (Fig. 20), the (n − n) cardinality represents that a parish may belong to more than one drainage area. Usually, non-strict spatial hierarchies arise when a partial containment relationship exists, which is given as Intersects in our use case. The Usage hierarchy is a generalized hierarchy with non-exclusive paths to the splitting levels (Product and Purpose) and has no joining level other than the top level All. Finally, the Address and Ownership hierarchies are parallel dependent hierarchies. Parallel hierarchies arise when a dimension has several hierarchies sharing some levels. Note that the Address hierarchy has different paths from the Company and Farm levels (Fig. 20) [11].

Example 11 ((Spatial) Levels - Adapted from [11])
We present a spatial level (Parish) as an example with its attributes. Attributes and spatial attributes of levels are given in Ex. 12. A level is spatial if it has an associated geometry. The Parish spatial level has an associated polygon geometry, given with the geo:hasGeometry property. Some other spatial characteristics of a level can be recorded in the spatial attributes of the level, such as the center point of the parish (gfs:parishCenter), as given in the next example (Ex. 12).

gfs:parish rdf:type qb4o:LevelProperty ;
    qb4o:hasAttribute gfs:parishID ;
    qb4o:hasAttribute gfs:parishName ;
    qb4o:hasAttribute gfs:parishArea ;
    qb4o:hasAttribute gfs:parishCenter ;
    geo:hasGeometry gfs:parishPolygon .

Example 12 (Spatial and non-spatial level attributes - Adapted from [11])
Attributes keep the characteristics of a level as a value, i.e., as strings or literals. Spatial attributes (and attribute values) are defined over a spatial domain. For example, non-spatial attributes are defined as ranging over XSD literals35, whereas spatial attributes must range over spatial literals, i.e., well-known text literals (WKT) from OGC schemas36. Spatial attributes are a sub-property of the geo:Geometry class. Further, the domain of the spatial attribute should be specified with rdfs:domain, which must be a geometry. Finally, the spatial attribute must be specified as an instance of geo:SpatialObject with the rdfs:subClassOf property. The example below shows the spatial and non-spatial attributes of the Parish level.

gfs:parishID rdf:type qb4o:LevelAttribute ;
    qb4o:inLevel gfs:parish ;
    rdfs:range xsd:positiveInteger .
gfs:parishName rdf:type qb4o:LevelAttribute ;
    qb4o:inLevel gfs:parish ;
    rdfs:range xsd:string .
gfs:parishCenter rdf:type qb4o:LevelAttribute ;
    rdfs:subPropertyOf geo:Geometry ;
    qb4o:inLevel gfs:parish ;
    rdfs:domain geo:Point ;
    rdfs:subClassOf geo:SpatialObject ;
    rdfs:range geo:wktLiteral , virtrdf:Geometry .

In Ex. 11, it is mentioned that spatial levels are defined through their associated geometries, which are not given as level attributes. For the Parish level, we present the following example of the corresponding geometry.

gfs:parishPolygon rdf:type geo:Geometry ;
    rdfs:domain geo:MultiSurface ;
    rdfs:subClassOf geo:SpatialObject ;
    rdfs:range geo:wktLiteral , virtrdf:Geometry .

Example 13 (Spatial Hierarchy steps - Adapted from [11])
Hierarchy steps define the structure of a hierarchy in relation to its corresponding levels. A hierarchy step entails a roll-up relation between a lower (child) level and an upper (parent) level with a cardinality. The cardinality (1 − 1, 1 − n, n − 1, n − n) describes the number of members in one level that can be related to a member in the other level, for both child and parent levels. A hierarchy step is spatial if it relates a spatial child level and a spatial parent level, in which case it entails a topological relationship between these spatial levels. Both spatial and non-spatial hierarchy steps are defined as blank nodes of type qb4o:HierarchyStep and linked to their hierarchies with the qb4o:inHierarchy property. The parent and child levels are linked to hierarchy steps with the qb4o:childLevel and qb4o:parentLevel properties.


The cardinality of a hierarchy step is defined by the qb4o:pcCardinality property. Finally, the topological relationship of a hierarchy step is defined by the qb4so:pcTopoRel property. The following illustrates the hierarchy steps of the spatial hierarchy Geography and of the non-spatial hierarchy Address, which has different paths from the child levels Farm and Company.

## Geography hierarchy structure ##
_:geography_hs1 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gfs:geography ;
    qb4o:childLevel gfs:parish ;
    qb4o:parentLevel gfs:drainageArea ;
    qb4o:pcCardinality qb4o:ManyToMany ;
    qb4so:pcTopoRel qb4so:Intersects , qb4so:Within .
## Address hierarchy structure ##
_:farm_address_hs1 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gfs:address ;
    qb4o:childLevel gfs:farm ;
    qb4o:parentLevel gfs:zipCode ;
    qb4o:pcCardinality qb4o:ManyToOne .
_:farm_address_hs2 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gfs:address ;
    qb4o:childLevel gfs:zipCode ;
    qb4o:parentLevel gfs:commune ;
    qb4o:pcCardinality qb4o:ManyToOne .
_:company_address_hs1 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gfs:address ;
    qb4o:childLevel gfs:company ;
    qb4o:parentLevel gfs:zipCode ;
    qb4o:pcCardinality qb4o:ManyToOne .
_:company_address_hs2 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gfs:address ;
    qb4o:childLevel gfs:zipCode ;
    qb4o:parentLevel gfs:commune ;
    qb4o:pcCardinality qb4o:ManyToOne .

Example 14 (Spatial and non-spatial measures - Adapted from [11])
Measures record the values of a phenomenon being observed. Measures and spatial measures are defined with qb:MeasureProperty. A measure is spatial if it is defined over a spatial domain. Similarly to attributes (Ex. 12), measures are defined as ranging over XSD literals, and spatial measures must range over spatial literals. The following shows an example of a spatial measure (Location) and a non-spatial measure (NumberOfAnimals).

gfs:location rdf:type qb:MeasureProperty ;
    rdfs:subPropertyOf sdmx-measure:obsValue ;
    rdfs:subClassOf geo:SpatialObject ;
    rdfs:domain geo:Point ;
    rdfs:range geo:wktLiteral , virtrdf:Geometry .
gfs:numberOfAnimals rdf:type qb:MeasureProperty ;
    rdfs:subPropertyOf sdmx-measure:obsValue ;
    rdfs:range xsd:decimal .

Example 15 (Fact schema - Adapted from [11])
In a fact schema, the data structure definition (DSD) of the cube is annotated with qb:DataStructureDefinition. The dimensions are given as components and defined with the qb4o:level property, as the dimensions are linked to the fact at the lowest granularity level. A fact is spatial if it relates two or more spatial levels. Similarly, measures are given as components of the

fact and are defined with the qb:measure property. Aggregation functions on measures and spatial aggregation functions on spatial measures are also defined in the DSD with qb4o:aggregateFunction. Fact-level cardinality relationships and topological relationships are defined with qb4o:cardinality and qb4so:topologicalRelation in the DSD. The following shows the data structure definition of the GeoFarmHerdState cube, which is defined with the corresponding measures and dimensions.

# GeoFarmHerdState Cube - Definition of the Fact FarmHerdState
gfs:GeoFarmHerdState rdf:type qb:DataStructureDefinition ;
    # Lowest level for each dimension in the cube
    qb:component [ qb4o:level gfs:herd ; qb4o:cardinality qb4o:ManyToOne ] ;
    qb:component [ qb4o:level gfs:time ; qb4o:cardinality qb4o:ManyToOne ] ;
    qb:component [ qb4o:level gfs:farm ; qb4o:cardinality qb4o:ManyToOne ;
                   qb4so:topologicalRelation qb4so:Equals ] ;
    qb:component [ qb4o:level gfs:parish ; qb4o:cardinality qb4o:ManyToMany ;
                   qb4so:topologicalRelation qb4so:Within ] ;
    # Measures in the cube
    qb:component [ qb:measure gfs:numberOfAnimals ; qb4o:aggregateFunction qb4o:Sum ] ;
    qb:component [ qb:measure gfs:location ; qb4o:aggregateFunction qb4so:ConvexHull ] ;
    qb:component [ qb:measure gfs:nitrogenReduction ; qb4o:aggregateFunction qb4o:Avg ] ;
    qb:component [ qb:measure gfs:nitrateClass ; qb4o:aggregateFunction qb4o:Avg ] ;
    qb:component [ qb:measure gfs:phosphorClass ; qb4o:aggregateFunction qb4o:Avg ] .

GeoFarmHerdState Cube instance members. Recall from Section 3 that we distinguish the MD data cube elements at two levels: defining with QB4SOLAP at the schema level and annotating with QB4SOLAP at the instance level. The GeoFarmHerdState cube schema concepts are defined with QB4SOLAP in Examples 9-15 above. Fact members (with measure values) and level members (with attribute values) are annotated with QB4SOLAP and exemplified as GeoFarmHerdState cube instance members in the following.

Example 16 (Fact members - Adapted from [11])
Fact members (i.e., facts of GeoFarmHerdState) are instances of the qb:Observation class. Each fact member is related to a set of dimension base level members and has a set of measure values. Every fact member has a unique identifier (IRI), which is prefixed with gfsi:. The following shows an example of a single fact member, which represents the state of a farm with CHR no. 39679 in the year 2015 that has the herd code 15.

gfsi:farm_39679_2015 rdf:type qb:Observation ;
    ## Dimension levels and base level members associated with the fact member
    gfs:herdCode gfsi:herd_15 ;
    gfs:year gfsi:year_2015 ;
    gfs:chrNumber gfsi:farm_39679 ;
    gfs:parishID gfsi:parish_8311 ;
    ## Measures associated with the fact member
    gfs:numberOfAnimals "100.0"^^xsd:decimal ;
    gfs:nitrateClass "3"^^xsd:integer ;
    gfs:nitrogenReduction "0.75"^^xsd:decimal ;
    gfs:phosphorClass "3"^^xsd:integer ;
    gfs:location "POINT(8.3713 56.7912)"^^geo:wktLiteral .

Fig. 21: GeoFarmHerdState – Fact members and Level members of Ex. 17 marked (Adapted from [11])

Example 17 (Level Members - Adapted from [11])
Level members are defined with qb4o:LevelMember. They are linked to their corresponding levels from the schema with the qb4o:memberOf property. For each level member there is a set of attribute values. Due to the roll-up relations between the levels of hierarchy steps (Ex. 13), the skos:broader property relates a child level member to its parent level member. For spatial level members in QB4SOLAP, we extend the skos:broader relations with explicit topological relations, which describe precisely the nature of the spatial relation between two level members, e.g., qb4so:intersects or qb4so:within.

The following shows an example of a child level member in the Parish level and one of its parent level members in the DrainageArea level from the Geography hierarchy. Fig. 21 presents a map snapshot for fact members and level members. The Parish level member "Astrup" is highlighted, and the DrainageArea level member "Mariager Inderfjord" is marked with red borders. Note that Astrup also intersects with another drainage area, "Langerak"; therefore, it links to two different parent level members with the skos:broader property. We have explicitly given the topological relation between the level members with the qb4so:intersects predicate for parish level member (1) in the blue lines of the listing. Similarly, parish level member (2), Oue, is given with an explicit topological relation qb4so:within to its parent level member "Mariager Inderfjord" in the blue lines of the listing.

## Parish level member (1) Astrup
gfsi:parish_8460 rdf:type gfs:parish ;
    qb4o:memberOf gfs:parish ;
    skos:broader gfsi:water_3710 , gfsi:water_159 ;
    qb4so:intersects gfsi:water_3710 , gfsi:water_159 ;
    gfs:parishID 8460 ;
    gfs:parishName "Astrup" ;
    gfs:parishArea "28857338.30518"^^xsd:double ;
    gfs:parishCenter "POINT(10.095657 57.476879)"^^geo:wktLiteral ;
    gfs:parishPolygon "POLYGON((10.4038 56.7963, 10.3984 56.7721, 10.3411 56.7372, 8.3078 56.7281, 10.2987 56.7601, 10.2563 56.7763, 10.3511 56.8137, 10.4038 56.7963))"^^geo:wktLiteral .
## Parish level member (2) Oue
gfsi:parish_8309 rdf:type gfs:parish ;
    qb4o:memberOf gfs:parish ;
    skos:broader gfsi:water_159 ;
    qb4so:within gfsi:water_159 ;
    gfs:parishID 8309 ;
    gfs:parishName "Oue" ;
    gfs:parishArea "33297796.91284"^^xsd:double ;
    gfs:parishCenter "POINT(8.2552 56.8176)"^^geo:wktLiteral ;
    gfs:parishPolygon "POLYGON((8.4038 56.7963, 8.3984 56.7721, 8.3411 56.7372, 8.3078 56.7281, 8.2987 56.7601, 8.2563 56.7763, 8.3511 56.8137, 8.4038 56.7963))"^^geo:wktLiteral .
## DrainageArea level member
gfsi:water_159 rdf:type gfs:drainageArea ;
    qb4o:memberOf gfs:drainageArea ;
    gfs:waterID 159 ;
    gfs:waterName "Mariager Inderfjord" ;
    gfs:waterArea 267477 ;
    gfs:drainageGeo "POLYGON((8.6048 56.9843, 8.5908 56.8969, 8.5707 56.8664, 8.5975 56.8519, 8.5215 56.8483, 8.3959 56.7625, 8.3938 56.7340, 8.3613 56.6802, 8.2584 56.7764, 8.2475 56.7051, 8.2175 56.7232, 8.3121 56.8441, 8.2806 56.8659, 8.3602 56.9569, 8.4786 56.9713, 8.5474 56.9905, 8.6048 56.9843))"^^geo:wktLiteral .

5.3 SOLAP examples over GeoFarmHerdState in SPARQL

Spatial OLAP (SOLAP) can be applied to spatial data cubes and helps users to query and analyze data with the spatially enhanced capabilities of OLAP, by benefiting from the spatial information in the spatial data cube. Principally, a SOLAP operation should include a spatial condition or a spatial function (Sect. 3.3, SOLAP Operations). Spatial conditions are spatial Boolean predicates that define spatial constraints on the geometries of spatial cube level member attributes and measures. These can be interpreted as topological relations, where the relations between two spatial geometry objects are defined topologically and the result of the topological relation/spatial Boolean predicate is binary (True/False). On the other hand, spatial functions can be spatial numerical operations (e.g., distance, area) or spatial aggregate functions (e.g., spatial union, intersection) and return new data from the cube members. These operations can be used to build dynamic spatial hierarchies in SOLAP operations. Finally, we present two common SOLAP operators (s-dice and s-roll-up) in the following, with non-trivial and interesting query examples on the GeoFarmHerdState cube.


Example 18 (S-Dice - Adapted from [11])
The s-dice operator keeps the cells of the cube that satisfy a spatial predicate over dimension levels, attributes, or measures. It returns a subset of the cube with filtered members. The following items and listings show two different s-dice examples, filtering with a spatial predicate on levels and on measures.
1. Filter the farms located within a 5 km buffer from the center of a drainage area.
2. Filter the farms located within a 2 km distance from the center of their parish, which is in a nitrate class I area.

# 1 - s-dice on dimension levels #
SELECT ?obs
WHERE { ?obs rdf:type qb:Observation ;
             gfs:farmID ?farm ;
             gfs:parishID ?parish .
        ?farm gfs:farmLocation ?farmGeo .
        ?parish qb4o:memberOf gfs:parish ;
                skos:broader ?drainageArea .
        ?drainageArea gfs:waterPolygon ?drainagePoly .
        BIND (bif:st_centroid (?drainagePoly) AS ?drainageCenter)
        FILTER (bif:st_within(?drainageCenter, ?farmGeo, 5)) }

# 2 - s-dice on measures #
SELECT ?obs
WHERE { ?obs rdf:type qb:Observation ;
             gfs:location ?farmLocation ;
             gfs:nitrateClass ?nitClass ;
             gfs:parishID ?parish .
        ?parish gfs:parishCenter ?parishCent .
        BIND (bif:st_distance (?farmLocation, ?parishCent) AS ?distance)
        FILTER (?distance < 2 && ?nitClass = 1) }

The first s-dice operation uses a spatial function that is applied on the level members of the DrainageArea level to get the center of their polygon geometries. The level members of the Farm level are filtered with a spatial Boolean predicate with respect to the farm locations that are within a 5 km buffer area of the center of the drainage areas. The second s-dice operation uses a spatial function that is applied to the spatial measure (farm location) in order to get the distance of the farms from the center of their parish, which is followed by two Boolean predicates: 1) to filter the farms that are less than 2 km away from the center of their parishes, and 2) to filter the farms that are in nitrate class I areas.

Example 19 (S-Roll-up - Adapted from [11])
The s-roll-up operator aggregates the measures of a given cube by using an aggregate function and a spatial function (or a spatial predicate) along a spatial dimension's hierarchy. It returns a cube with measures at a coarser granularity for a given dimension. In the following, we present two examples of the s-roll-up operator.
1. The total number of animals on the farms that are closest to their parishes' center.
2. The average percentage of nitrogen reduction potentials in the parishes that are within and/or intersect the drainage area "Nibe-Bredning".

# 1 - s-roll-up #
SELECT ?parish (SUM(?animalCount) AS ?totalAnimals)
WHERE { ?obs rdf:type qb:Observation ;
             gfs:numberOfAnimals ?animalCount ;
             gfs:farmID ?farm ;
             gfs:parishID ?parish .
        ?farm gfs:farmLocation ?farmGeo .
        ?parish gfs:parishCenter ?parishCent .
        # Inner select for finding the
        # closest farms to the parish centers #
        {SELECT ?farm1 (MIN(?distance) AS ?minDistance)
         WHERE { ?obs rdf:type qb:Observation ;
                      gfs:farmID ?farm1 ;
                      gfs:parishID ?parish1 .
                 ?farm1 gfs:farmLocation ?farm1Geo .
                 ?parish1 gfs:parishCenter ?parish1Cent .
                 BIND (bif:st_distance (?farm1Geo, ?parish1Cent) AS ?distance) }
         GROUP BY ?farm1 }
        FILTER (?farm = ?farm1 && bif:st_distance (?farmGeo, ?parishCent) = ?minDistance) }
GROUP BY ?parish

# 2 - s-roll-up #
SELECT ?drainageArea (AVG(?nitRed) AS ?avgNitRed)
WHERE { ?obs rdf:type qb:Observation ;
             gfs:location ?farmLocation ;
             gfs:nitrogenReduction ?nitRed ;
             gfs:parishID ?parish .
        ?parish qb4o:memberOf gfs:parish ;
                gfs:parishPolygon ?parishGeo ;
                skos:broader ?drainageArea .
        ?drainageArea qb4o:memberOf gfs:drainageArea ;
                      gfs:waterPolygon ?drainageGeo ;
                      gfs:waterName ?drainageName .
        FILTER (bif:st_within(?parishGeo, ?drainageGeo) ||
                bif:st_intersects(?parishGeo, ?drainageGeo) &&
                ?drainageName = "Nibe-Bredning") }
GROUP BY ?drainageArea

In the first s-roll-up operator, measures are aggregated to the Parish level after selecting the farms with respect to their proximity to the center of their parish with a spatial function. This creates a dynamic spatial hierarchy by defining the aggregation level members of the parent level (parish) through the proximity of the child level members (farms) to the center of the parent level's geometry. In the second s-roll-up operator, measures are aggregated to a specified drainage area ("Nibe-Bredning") at the DrainageArea level. We select all the possible topological cases where a parish intersects or is within the drainage area.

5.4 Discussion and Perspectives

In order to demonstrate that QB4SOLAP can be applied to a complex real-world use case, we have selected various interesting domains (i.e., geographical data, farming data, and environmental data) to link, relate, conciliate, and annotate across domains with QB4SOLAP. In achieving this goal, we came across several challenges, which we discuss, together with our perspectives, in the following steps.

Data specification, analysis, and modeling process

Initially, we had to find interesting and relatable domains and specify the scope of the data in these domains by analyzing the available data sources. We have given the data sources and explained our use case domains of livestock farming (CHR) data, environmental data, and spatial data in Sect. 5.2. Finding correct environmental and spatial relations across these data sets requires certain domain knowledge and expertise. Therefore, understanding and specifying the domain interests and a comprehensive pre-analysis were required steps before starting to model our use case schema. We pursued comprehensive research from domain resources ([22] and [21]) to shape our use case GeoFarmHerdState. For example, we have chosen farms and the number of livestock on farms, together with the soil quality measurements of the farms, as the center of analysis. This decision was made with respect to the domain interest in analyzing nitrate leaching to ground and surface water from livestock farming [22].

In order to provide insight into these analyses, we have also decided to include the animal herd type and the usage of animal products as non-spatial dimensions and hierarchies. In our spatial data domain, we have selected parishes and (groundwater) drainage areas as the geographical areas with which to build our spatial hierarchy. This means that farm measures and statistics can be aggregated and analyzed along the parish and drainage area levels. We have not built a geographical hierarchy with farm cities, communes, and regions, since the domain interest is not to analyze the effects of livestock farming on soil quality at the city or region level but directly at the groundwater and drainage area level. Parishes (church regions) have been selected as the intermediary level between farms and drainage areas, since administratively and historically farms are related to parishes but not directly to communes or cities.

After getting domain knowledge and specifying the scope of our data sets and domain, we conceptually modeled a spatial data cube, GeoFarmHerdState, by using the principles of spatially extended MultiDim models [32]. The final conceptual MD schema of the use case is depicted in Fig. 20. The QB4SOLAP vocabulary allows us to annotate spatially extended MultiDim conceptual models, and it supports state-of-the-art semantic spatial data cubes. By using QB4SOLAP, we can annotate all the spatial cube concepts, such as spatial measures, spatial hierarchies, topological relations at spatial hierarchy steps, etc. Therefore, we decided to use QB4SOLAP for transforming the use case data and generating it in RDF format in order to publish it on the Semantic Web in the following steps.

Data conciliation, transformation, and generation

The line of tasks in this step involves the technical process of RDF data generation. Since we use data sets from different domains in different formats, our starting point was to relate and conciliate the data sets by using unique identifiers (where available), in order to create a relational implementation of the GeoFarmHerdState spatial data cube. For example, the CHR number, CVR number, and postal number are unique identifiers that can be used to relate different tables. Where no unique identifiers are available, we utilized spatial joins by overlaying the spatial coordinates of different data sets with a GIS tool and joining attributes from one geometry feature to other features based on containment relationships. For example, we used spatial joins to get the soil quality attributes of farmlands by intersecting the point coordinates of the farm location data set with the polygon coordinates of the environmental data sets, from which we derived soil quality measurements for nitrogen reduction potentials, and phosphor and nitrate classifications of each farm. We have also used spatial joins for building up the spatial hierarchy between parishes and drainage areas by overlaying parish and drainage area polygons. This way we can find the exact parishes that are related to the

exact drainage areas as child-parent level members in a hierarchy step and annotate them accordingly. Moreover, we can also find the exact relationship in topological terms, i.e., intersects or within, to explicitly annotate the relation. Since there is an (n − n) cardinality between parishes and drainage areas, where a parish may intersect with more than one drainage area and vice versa, it is important to know the exact relation between the level members of the hierarchy.

Relating the data sets via referential integrity keys is done after importing the raw data into an RDBMS. All of the spatial data conciliation steps are pursued in a GIS tool. Once we had derived the new spatial containment relationships, we also imported the spatial data into the RDBMS with spatial support. After conciliating the data sets from the different domains with respect to the conceptual model of the GeoFarmHerdState cube given in Figure 20, we exported 12 tables (in CSV format) as the relational representation of an MD cube schema, such as a snowflake schema, which is the tabular cube form of GeoFarmHerdState. The tables are composed of: one fact table (GeoFarmHerdState) with measure values and foreign keys to the dimension base levels, four dimension base level tables, of which two are spatial dimensions, and seven tables for the remaining levels along the hierarchies of the four dimensions. The (level) tables along the same hierarchy are also related to each other with referential integrity constraints. The tables also contain level attributes and attribute values describing the characteristics of the level members. The final RDF files are generated with ad-hoc C# code mapping the CSV files into RDF by using the relations and QB4SOLAP vocabulary annotations.

Data publication and exploitation process

The final prepared RDF files are published to a public endpoint by using a triple store of our choice. We have chosen the Virtuoso Open Source Universal Server (Version 07.20.3217) as the triple store. The main goal of this paper was to demonstrate how we can re-use open spatial government data and enrich the data modeling with QB4SOLAP by publishing a spatial multi-dimensional data set and querying it with SOLAP operators. In order to make it easier for users and readers, we have given the SPARQL endpoint and example queries on our project page37.

In our SOLAP query example (Ex. 19), there are two possible issues with the second SOLAP query38. The first one is that we use two built-in functions for finding the topological relations (bif:st_intersects and

37http://extbi.cs.aau.dk/GeoFarmHerdState
38Remark Query: Average percentage of nitrogen reduction potentials (measure in farm fact member) in the parishes (child level members in spatial hierarchy) that are within or intersect (topological relations) the drainage area Nibe-Bredning (parent level member in spatial hierarchy).

bif:st_within) between the parish and drainage area level members. The built-in functions depend on the spatial capability of the chosen triple store. Even though some triple stores support spatial geometries and functions to an extent, it is not very efficient to process queries with spatial Boolean predicates between all level members at query processing time, due to missing spatial indexes and the complexity of spatial geometries such as polygons and multi-polygons. In order to prevent this, the RDF data can be published with explicit topological relations in QB4SOLAP using the qb4so:within and qb4so:intersects predicates; then the filter clause in the SOLAP query can refer to these predicates for finding the topological cases.

The second issue is that the query selects all the possible topological cases where a parish intersects or is within the drainage area (Nibe-Bredning) in order to aggregate the measure (nitrogen reduction potentials), which is annotated at the granularity of the fact members (farms). This means that when a parish member intersects with the drainage area (Nibe-Bredning), measures for the farms that are outside Nibe-Bredning are also aggregated to this drainage area. In order to prevent this, the query should also include an s-drill-down operator from the parish members to the farm fact members with the spatial Boolean predicate within for the drainage area (Nibe-Bredning), and aggregate the measures for the cases satisfying the predicate. Or, ideally, the relation between the fact members (farms) and the drainage area (parent level members) can be annotated with QB4SOLAP before publishing, so that the predicate for the relationship can also be included in the filter clause of the SOLAP query.

Since it is very common to query data warehouses with nested (spatial) OLAP queries, an elaborate nested SOLAP query can be created to fully exploit the GeoFarmHerdState data cube by selecting a sub-cube with s-dice, which can be followed by an s-slice on top of an s-roll-up. For example, the pattern (³s-roll-up(²s-slice(¹s-dice(GeoFarmHerdState)))) represents a typical nested SOLAP operation that can be paraphrased for the use case GeoFarmHerdState as follows: ¹filter the farm states located within a 2 km distance from the center of their parish, ²slice on the parish which has the largest number of topological relations (intersects, within) with a drainage area, and ³average the nitrogen reduction potential of the drainage areas intersecting with the parish (Adapted from [11]).
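As a sketch of the first remedy above, the second s-roll-up can be reformulated against pre-annotated topological predicates (qb4so:within and qb4so:intersects, as annotated in Ex. 17) instead of computing the topological relations with built-in spatial functions at query time; the exact formulation below is illustrative.

# s-roll-up (2) over explicit topological predicates (no geometry computation at query time)
SELECT ?drainageArea (AVG(?nitRed) AS ?avgNitRed)
WHERE { ?obs rdf:type qb:Observation ;
             gfs:nitrogenReduction ?nitRed ;
             gfs:parishID ?parish .
        ?parish qb4o:memberOf gfs:parish ;
                (qb4so:within|qb4so:intersects) ?drainageArea .
        ?drainageArea gfs:waterName "Nibe-Bredning" . }
GROUP BY ?drainageArea

This avoids evaluating bif:st_within and bif:st_intersects on polygon geometries for every pair of level members during query processing.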

Final remarks on conclusion and future work

The GeoFarmHerdState data cube, as a QB4SOLAP vocabulary use case, demonstrates how interesting topical domains from governmental and spatial data sets can come together and be modeled in a multi-dimensional way in order to be published on the Semantic Web, enabling SOLAP queries (in SPARQL) on the Semantic Web. The GeoSemOLAP tool (Section 4) already provides a GUI for data warehouse users to perform high-level SOLAP operations that can be translated into SPARQL to query MD spatial Semantic Web data.

The issues we came across during the data conciliation and data exploitation processes inspired us with new challenges to handle within the scope of this research. An automated enrichment module that can automatically detect MD and spatial concepts in existing Semantic Web data and annotate these concepts with QB4SOLAP would be a significant contribution, making existing RDF endpoints ready to be queried with SOLAP operators.

6 Spatial Enrichment of Semantic Web Cubes with RDF2SOLAP Framework

This section gives an overview of Paper F [14].

6.1 Motivation and Problem Statement

The future vision of SOLAP was given in Section 3 and depicted in Fig. 13. In that line of work, we have built the foundation for geo-semantic (RDF) data warehouses with the QB4SOLAP vocabulary and SOLAP operators in the SPARQL query language, and provided GeoSemOLAP [13], which can automatically generate complex SPARQL queries from non-trivial nested SOLAP operators. In order to utilize GeoSemOLAP, RDF data on the Semantic Web is required to be annotated with QB4SOLAP. In the current state of the Semantic Web, there are many multi-dimensional data sets with spatial information that are published with the RDF Data Cube (QB) Vocabulary [34]. The QB4OLAP vocabulary addresses the MD modeling challenges of the QB Vocabulary [10] and complies with tools like the QB2OLAP enrichment module (QB2OLAPem) [33], which can enrich and annotate existing RDF QB endpoints with QB4OLAP, allowing users to query the RDF data with traditional OLAP operators. However, if a data warehouse user would like to query existing RDF data (containing natively spatial information) with spatial OLAP (SOLAP) operators, it has until now been necessary to migrate the data into a traditional spatial data warehouse with spatial annotations on the MD concepts. This migration process is labor-intensive and not a preferable approach, as it removes the data from the Semantic Web and keeps it in the closed, proprietary format of an in-house (spatial) data warehouse. (Semi-)automatic spatial enrichment of existing Semantic Web cubes (QB, QB4OLAP) with the QB4SOLAP vocabulary can minimize the user effort for querying the RDF data with SOLAP operators. Therefore, an established way of annotating existing RDF endpoints with QB4SOLAP is required, taking existing SW technologies and tools into consideration.

In this line of work, enrichment algorithms for different scenarios and a tool (RDF2SOLAP) that implements the algorithms are proposed.

6.2 Enrichment Approach

The spatial enrichment approach employs the GeoFarmHerdState use case (from Section 5) as a proof of concept. We focus only on the spatial concepts of the use case, which are depicted in Figure 22, omitting the non-spatial dimensions from Figure 20. As can be seen in the figure, the GeoFarmHerdState cube has two spatial dimensions: Parish and Farm. The Parish dimension has a spatial hierarchy (Geography), which is composed of two spatial levels: the Parish level and the DrainageArea level. The Farm dimension does not have a spatial hierarchy and has a single spatial level: Farm. The GeoFarmHerdState fact is the center of analysis and represents the state of the farms for a certain time period (via the Time dimension) with measures such as NumberOfAnimals on the farm, NitrogenReductionPotentials of the farm land (soil), and FarmLocation (which is a spatial measure).


Fig. 22: GeoFarmHerdState (spatial) MD schema of livestock holdings data (Adapted from [14])

Parish, Farm, and Drainage Area instances from the use case are depicted on a map (Figure 21) and explained in Example 17 in Section 5. A hierarchy example built from the instances in the map figure is depicted as a graph in Figure 23. The graph shows the topological relations between level members (parish-drainage area) and between fact members and level members (farms-parish).

SOLAP operators aggregate measures (from farm states) at different granularity levels of the spatial hierarchy. In order to aggregate the measures correctly during an s-roll-up operation, the topological relation between the level members should be taken into consideration, where two relations are available: within and intersects (black and red arrows in Figure 23).

[Figure 23 shows the drainage areas "Mariager Ind." and "Langerak" and the parishes "Oue" and "Astrup", together with farm sets {f1,...,fn}, {e1,...,en}, and {d1,...,dn}, connected by contains, within, and intersects relations.]

Fig. 23: Spatial Hierarchy Example for SOLAP (Adapted from [14])

For example, if we would like to aggregate the total number of animals from parishes to drainage areas, there is a great chance that we aggregate the measure values incorrectly from Astrup (parish) to the drainage areas (Mariager Inderfjord and Langerak) by counting them twice, since the parish Astrup intersects with both drainage areas. Due to the polygon geometry of the level members, the roll-up relation encounters a many-to-many (cardinality) relationship when a parish intersects with a drainage area. To prevent falsely aggregating the measure values, a SOLAP operation with a spatial drill-down from drainage area members to fact members, using the contains topological relation, is required (blue arrows in Figure 23). Another way to prevent this issue is to de-normalize the fact table with redundant links from farm (fact) members to drainage area (level) members, where we create those links explicitly with direct within relations; a query sketch over such links is given below.

The QB4SOLAP vocabulary allows us to annotate such topological relations between level members and between fact members and level members with well-defined predicates, whereas QB4OLAP only supports the skos:broader predicate between level members of a hierarchy and referential integrity keys between fact members and level members.

In RDF2SOLAP spatial enrichment, we consider two approaches: 1) Hierarchical enrichment: we enrich the annotation of hierarchy steps between level members with direct topological relations from the QB4SOLAP vocabulary instead of using the skos:broader predicate. 2) Factual enrichment: we de-normalize the logical arrangement of spatial levels by directly annotating the topological relations from fact members to a higher (parent) level member.
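As a sketch of the de-normalized roll-up mentioned above: once farm-state fact members carry direct qb4so:within links to drainage area members (as the factual enrichment in Section 6.3 produces), the total number of animals can be aggregated per drainage area without passing through the intersecting parishes and without double counting. The measure IRI gfs:numberOfAnimals is a hypothetical name for the NumberOfAnimals measure; prefixes follow Section 5.

  SELECT ?da (SUM(?animals) AS ?totalAnimals)
  WHERE {
    ?obs a qb:Observation ;
         gfs:numberOfAnimals ?animals ;   # hypothetical measure predicate
         qb4so:within ?da .               # direct fact-to-drainage-area link (factual enrichment)
    ?da  qb4o:memberOf gfs:drainageArea .
  }
  GROUP BY ?da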

67 "POLYGON((8.6048 56.9843, 8.5908 56.8969, 8.5707 56.8664, 8.5975 56.8519, 8.5215 56.8483, 8.3959 56.7625, 8.3938, 56.7340, 8.3613 56.6802, 8.2584 56.7764, 8.2475 56.7051, 8.2175 56.7232, 8.3121 56.8441, 8.2806 56.8659, 8.3602 56.9569, 8.4786 56.9713, 8.5474 56.9905, 8.6048 56.9843))"^^:spatialLiteral gfs:waterPolygon qb4o:memberOf "Mariager "Langerak" Inderfjord"

gfs:waterName gfs:waterName

qb4o:memberOf gfs:drainageArea gfsi:water_3710 gfsi:water_159 qb4o:parentLevel

_:hs qb4o:pcCardinality qb4o:ManyToMany skos:broader skos:broader skos:broader qb4o:childLevel gfs:parish qb4o:memberOf gfsi:parish_8648 gfsi:parish_8517 gfs:parishPolygon

gfs:parishName gfs:parishName

"Astrup" "Oue" qb4o:memberOf "POLYGON((8.4038 56.7963, 8.3984 56.7721, 8.3689 56.7410, 8.3411 56.7372, 8.3078 56.7281, 8.2987 56.7601, 8.2563 56.7763, 8.3112 56.8087, 8.3511 56.8137, 8.4038 56.7963))"^^:spatialLiteral

Fig. 24: Hierarchy steps in QB4OLAP before MD enrichment (Adapted from [14])

In Figure 24, the QB4OLAP annotation of the hierarchy step from the gfs:parish level to the gfs:drainageArea level is given. We use similar conventions as before, prefixing the schema members (levels, fact, etc.) with gfs: and the instance members (level members, fact members) with gfsi:, as explained in Section 5. The relations between the level members are given with the skos:broader predicate (highlighted in red boxes in the figure). The geometry attribute values (highlighted in blue boxes) are an indicator that we can enrich the hierarchy steps between the level members with well-defined topological relations by utilizing these geometry attributes (in spatial Boolean functions).

Figure 25 depicts the QB4SOLAP annotation of the hierarchy step from Fig. 24 after the enrichment with topological relations (highlighted in green boxes) from the QB4SOLAP vocabulary. Note that the enrichment process also re-defines the fact schema: we enrich the data structure definition (DSD) with the topological relations found between the level members, which can be seen on the left part of the figure with the elements highlighted with a green border.

6.3 RDF2SOLAP Enrichment Algorithms

Spatial enrichment in RDF2SOLAP occurs in two phases, as briefly mentioned in our approach above (Section 6.2). The first one is the hierarchical enrichment phase and the second one is the factual enrichment phase. Before giving details on the enrichment phases and the algorithms within them, the spatial helper functions are briefly explained below.


"POLYGON((8.6048 56.9843, 8.5908 56.8969, 8.5707 56.8664, 8.5975 56.8519, 8.5215 56.8483, 8.3959 56.7625, 8.3938, 56.7340, 8.3613 56.6802, 8.2584 56.7764, 8.2475 56.7051, 8.2175 56.7232, 8.3121 56.8441, 8.2806 56.8659, 8.3602 56.9569, 8.4786 56.9713, 8.5474 56.9905, 8.6048 56.9843))"^^:spatialLiteral gfs:waterPolygon qb4o:memberOf "Mariager "Langerak" Inderfjord" qb4so:Within

gfs:waterName gfs:waterName

qb4o:memberOf gfs:drainageArea gfsi:water_3710 gfsi:water_159 qb4o:parentLevel opoRel _:shs qb4o:pcCardinality qb4o:ManyToMany qb4so:intersects qb4so:within qb4so:intersects qb4so:pcT qb4o:childLevel gfs:parish qb4o:memberOf gfsi:parish_8648 gfsi:parish_8517 gfs:parishPolygon

gfs:parishName gfs:parishName qb4so:Intersects "Astrup" "Oue" qb4o:memberOf "POLYGON((8.4038 56.7963, 8.3984 56.7721, 8.3689 56.7410, 8.3411 56.7372, 8.3078 56.7281, 8.2987 56.7601, 8.2563 56.7763, 8.3112 56.8087, 8.3511 56.8137, 8.4038 56.7963))"^^:spatialLiteral

Fig. 25: Spatial hierarchy steps in QB4SOLAP after MD enrichment (Adapted from [14])

Spatial Helper Functions

Spatial helper functions are utilized in both enrichment phases to identify the spatial values of the geometries from the input data and use those values to find a topological relation as an output. We have implemented two spatial helper functions.

The first one is getSpatialValues(G^I_A(lm)) : V_s(a), which takes an RDF graph of attributes of level members (or fact members) as an input, scans the input data to find spatial geometry values, and returns the set of spatial values filtered from the attributes of the level members (or the measure values of the fact members). These spatial values are coordinates describing the geographical location and shape of the level member or fact member, as an attribute or measure value correspondingly.

The second one is relateSpatialValues(v_ac, v_ap) : topoRel_i, which takes a pair of spatial values (from a child and a parent level member) and returns the topological relation between those values. In order to design and implement the second helper function we have used Table 10. The table contains topological relations from the DE-9DIM (Dimensionally Extended Nine-Intersection) model [9], which describes the possible Boolean relations of two geometries in two-dimensional space.

We focus on three main simple geometry types: point, line, and polygon, which can represent spatial attribute values of level members or spatial measure values of fact members.

39 A roll-up relation can occur from child level to parent level between level members, or from fact to (base) level between fact members and level members.

Table 10: Topological relations (✓: hierarchically and topologically applicable, ×: topologically not applicable, -: hierarchically not applicable) (Adapted from [14])

Topological relation    child level:    point (pt.)        line (ln.)         polygon (po.)
(roll-up)               parent level:   pt.   ln.   po.    pt.   ln.   po.    pt.   ln.   po.
within                                  ×     ✓     ✓      -     ✓     ✓      -     -     ✓
contains                                -     -     -      -     -     -      -     -     -
intersects                              ✓     ✓     ✓      -     ✓     ✓      -     -     ✓
touches                                 ×     ×     ×      -     ✓     ✓      -     -     ✓
overlaps                                ×     ×     ×      -     ✓     ✓      -     -     ✓
crosses                                 ×     ×     ×      -     ✓     ✓      -     -     ×
coveredBy                               ×     ×     ×      -     ×     ✓      -     -     ✓
covers                                  -     -     -      -     -     -      -     -     -
equals                                  ✓     ×     ×      -     ✓     ×      -     -     ✓

In a roll-up relation39, hierarchically and topologically applicable relations are given with a checkmark (✓) sign. Top-down topological relations such as contains and covers are considered hierarchically not applicable, since a lower level member (i.e., a child level member) cannot contain or cover a higher level member (i.e., a parent level member); therefore the entire rows for the contains and covers topological relations are given with a dash (-) sign. Another case of hierarchically not applicable relations occurs between line-point, polygon-point, and polygon-line geometries, since a parent level member needs to have a geometry type of the same or higher dimensionality than the child level member, where a point geometry is 0-dimensional, a line is 1-dimensional, and a polygon is 2-dimensional [14]. Topologically not applicable relations are inherited from DE-9DIM and marked with a cross (×) sign accordingly.

Figure 26 depicts the topologically and hierarchically applicable relations with geometry pairs from Table 10. The topological relations are simplified in the figure, where we generalize some of the relations when possible. For example, if a line crosses a polygon at two points or a line touches a polygon at a single point, both cases are generalized as a line intersecting a polygon (Figure 26(e)). These generalizations are used to design and implement the relateSpatialValues(v_ac, v_ap) : topoRel_i algorithm.

Hierarchical Enrichment

We create the hierarchical enrichment algorithms by exploiting the existing non-spatial QB4OLAP semantics. We distinguish two cases in our algorithms, which are explained along with the algorithms (Alg. 4 and Alg. 5).

The first case is to find explicit (spatial) hierarchy steps, where there are direct roll-up relations between the level members that are annotated with the skos:broader predicate (as shown in Figure 24).


[Figure 26 panels: (a) point-point, (b) point-line, (c) point-polygon, (d) line-line, (e) line-polygon, (f) polygon-polygon; each panel shows the candidate topological relations (e.g., within, intersects, equals, touches, crosses, overlaps, coveredBy) that are generalized for that geometry pair.]

Fig. 26: Simplifying topological relations (Adapted from [14])

So we can easily detect these explicit hierarchy steps and use their geometry attributes, if there are any, to find the topological relations between those level members. The algorithm (Alg. 4) detectSpatialHS(G^I_RU(hs), G^I_A(lm)) : G^I_RU(shs) takes the explicit roll-up relations of the hierarchy steps (G^I_RU(hs)) and the attributes of the level members (G^I_A(lm)) and returns the detected spatial hierarchy steps (G^I_RU(shs)) as a result. In Line 4 of the algorithm, in the foreach loop, explicit skos:broader relations are filtered and the attribute values of the level members (for parent and child levels) are retrieved. These attribute values (of the level members) are given as an input (in Lines 6 and 9) to the first helper function getSpatialValues to find the geometry attributes. If there are geometry attributes for both the child and the parent level member, pairs of these spatial values are given as an input to the second helper function relateSpatialValues (Line 12). The resulting RDF graph of roll-up relations with detected spatial hierarchy steps (G^I_RU(shs)) is incrementally extended with triple patterns of the identifier of the child level member, the topological relation, and the identifier of the parent level member (Line 14), and returned as the result in Line 15.

The second case is to find implicit (spatial) hierarchy steps between the level members, where there are no direct roll-up relations annotated between the level members.

Algorithm 4: detectSpatialHS(G^I_RU(hs), G^I_A(lm)) : G^I_RU(shs) (Adapted from [14])
Input: G^I_A(lm), G^I_RU(hs)
Output: G^I_RU(shs)
 1  begin
 2      G^I_RU(shs) := ∅                 /* initialize output graph as empty set */
 3      G^I_A(lmc) := ∅; G^I_A(lmp) := ∅; V_s(ac) := ∅; V_s(ap) := ∅; topoRel_i := null   /* temporary variable and sets */
 4      foreach pair of attribute triples ((id^I(lm_c) id^S(a_c) v_ac), (id^I(lm_p) id^S(a_p) v_ap)) ∈ G^I_A(lm)
              such that (id^I(lm_c) skos:broader id^I(lm_p)) ∈ G^I_RU(hs) do
 5          G^I_A(lmc) := {(id^I(lm_c) id^S(a_c) v_ac)}
 6          V_s(ac) := getSpatialValues(G^I_A(lmc))
 7          if V_s(ac) ≠ ∅ then
 8              G^I_A(lmp) := {(id^I(lm_p) id^S(a_p) v_ap)}
 9              V_s(ap) := getSpatialValues(G^I_A(lmp))
10              if V_s(ap) ≠ ∅ then
11                  foreach (v_ac, v_ap) ∈ V_s(ac) × V_s(ap) do
12                      topoRel_i := relateSpatialValues(v_ac, v_ap)
13                      if topoRel_i ≠ null then
14                          G^I_RU(shs) := G^I_RU(shs) ∪ {(id^I(lm_c) topoRel_i id^I(lm_p))}
15      return G^I_RU(shs)
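Since RDF2SOLAP fetches its input by querying the SPARQL endpoint and processing the bindings in Node.js (Section 6.4), the explicit hierarchy steps and geometry attributes consumed by Alg. 4 could, for the parish-drainage area case, be retrieved with a query of roughly the following shape. This is a sketch only, not the exact query used in the implementation; prefixes follow Section 5.

  SELECT ?childMember ?childGeo ?parentMember ?parentGeo
  WHERE {
    ?childMember  skos:broader      ?parentMember ;   # explicit hierarchy step (QB4OLAP)
                  qb4o:memberOf     gfs:parish ;
                  gfs:parishPolygon ?childGeo .
    ?parentMember qb4o:memberOf     gfs:drainageArea ;
                  gfs:waterPolygon  ?parentGeo .
  }

The geometry literals bound to ?childGeo and ?parentGeo are then passed to the getSpatialValues and relateSpatialValues helper functions outside the triple store.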

In order to handle this situation, where there are no explicit hierarchy steps between the level members, we benefit from the QB4OLAP schema graphs for dimensions, hierarchies, and levels. This way, we can classify dimensions and hierarchies and make sure that we are extracting the level member pairs for levels in the same hierarchy of the same dimension, and comparing only the attribute values of these level members to find the topological relations of the spatial hierarchy steps. The algorithm (Alg. 5) discoverSpatialHS(G^S_D, G^S_H(d), G^S_L(h), G^I_LM(l), G^I_A(lm)) : G^I_RU(shs) takes the dimensions (G^S_D), the hierarchies in the dimension (G^S_H(d)), and the levels in the hierarchy (G^S_L(h)) from the schema graphs, and the level members of the level (G^I_LM(l)) and the attributes of the level members (G^I_A(lm)) from the instance graphs, as inputs, and returns the discovered spatial hierarchy steps (G^I_RU(shs)) as a result.


In each foreach loop (Lines 5, 6, and 7) over the schema elements, we iterate through the dimensions, the hierarchies, and the levels in the hierarchy, to create level pairs (Line 8) and get the level members of these level pairs by filtering with the qb4o:memberOf predicate (Line 9). The algorithm then iterates through the level member pairs and the attributes of these level members, where we follow a similar logic as in the previous algorithm to filter the spatial attribute values from the level members' attributes by using the getSpatialValues helper function (Line 13). If there are geometry attributes for both of the level members (Line 14), pairs of these spatial values are given as an input to the second helper function relateSpatialValues (Lines 15 and 16). The resulting RDF graph of roll-up relations with discovered spatial hierarchy steps (G^I_RU(shs)) is incrementally extended with triple patterns of the identifier of level member n, the topological relation, and the identifier of level member k (Line 18), and returned as the result in Line 19.
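As a sketch of the schema-driven part of this discovery (Lines 5-10), the candidate level member pairs could be pulled from the endpoint with a query along the following lines, after which their attribute values are fetched and passed to the helper functions in Node.js. The query shape is an assumption, but the predicates are the QB4OLAP ones named above.

  SELECT ?level1 ?member1 ?level2 ?member2
  WHERE {
    ?dim  qb4o:hasHierarchy ?hier .
    ?hier qb4o:inDimension  ?dim ;
          qb4o:hasLevel     ?level1, ?level2 .
    FILTER (?level1 != ?level2)
    ?member1 qb4o:memberOf ?level1 .
    ?member2 qb4o:memberOf ?level2 .
  }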

Factual Enrichment

In principle, the factual enrichment phase is very similar to the hierarchical enrichment phase, although the roll-up path includes not only the roll-up relations (hierarchy steps) between level members but also the roll-up relations between fact members and level members. In a multi-dimensional fact schema, a roll-up path between fact members and level members is defined through explicit (direct) relations with referential integrity keys between a fact member and the base level member of the dimensions. Through the roll-up paths between fact members and the base level members of the dimensions, we can derive new perspectives by aggregating measures in SOLAP operations. In spatial data warehouses, facts usually have spatial measures, where we can detect topological relations through the explicit roll-up paths between fact members and the base level members of the dimension. QB4SOLAP allows users to represent fact-level topological relations both at the schema level (in the DSD) and at the instance level. Examples of a (spatial) fact schema (DSD) and fact members are given in Section 5 in Examples 15 and 16. In the following, we recap these examples with the RDF2SOLAP spatial enrichment shown in the highlighted lines, before explaining the factual enrichment algorithms.

Example 20 (Fact Schema and Fact Members (Adapted from [14]))
Factual enrichment at the schema level occurs when the DSD is re-defined with spatial concepts. QB4SOLAP represents the topological relations between fact and base level members in addition to the cardinality relations. For example, the cardinality relation between the fact farm-state and the level parish is many-to-one (Line 4), and the topological relation is defined as an instance of the qb4so:Within topological relation class (Line 5).

Algorithm 5: discoverSpatialHS(G^S_D, G^S_H(d), G^S_L(h), G^I_LM(l), G^I_A(lm)) : G^I_RU(shs) (Adapted from [14])
Input: G^S_D, G^S_H(d), G^S_L(h), G^I_LM(l), G^I_A(lm)
Output: G^I_RU(shs)
 1  begin
 2      G^I_RU(shs) := ∅; topoRel_i := null      /* initialize the output graph as an empty set and a temporary variable as null */
 3      V_s(an) := ∅; V_s(ak) := ∅               /* temporary sets for keeping spatial attribute values */
 4      G^I_A(lmn) := ∅; G^I_A(lmk) := ∅         /* empty sets to keep triple patterns for attributes of level members */
 5      foreach (id^S(d) qb4o:hasHierarchy id^S(h)) ∈ G^S_D do            /* iterate through the dimensions */
 6          foreach (id^S(h) qb4o:inDimension id^S(d)) ∈ G^S_H(d) do      /* iterate through the hierarchies */
 7              foreach (id^S(h) qb4o:hasLevel id^S(l)) ∈ G^S_H(d) do     /* iterate through the levels in the hierarchy; get level pairs next */
 8                  foreach (id^S(l_i), id^S(l_j)) ∈ G^S_L(h) × G^S_L(h) with id^S(l_i) ≠ id^S(l_j) and
 9                          (id^I(lm) qb4o:memberOf id^S(l_i)), (id^I(lm) qb4o:memberOf id^S(l_j)) ∈ G^I_LM(l) do
                            /* in each level pair, iterate through their level members and get pairs (id^I(lm_n), id^I(lm_k)) where each member comes from a different level */
10                      foreach (id^I(lm_n), id^I(lm_k)) ∈ G^I_LM(l_i) × G^I_LM(l_j) with id^I(lm_n) ≠ id^I(lm_k) do
11                          foreach ((id^I(lm_n) id^S(a_i) v_ai), (id^I(lm_k) id^S(a_j) v_aj)) ∈ G^I_A(lm) × G^I_A(lm) do   /* iterate through the pairs of level members' attributes */
12                              G^I_A(lmn) := {(id^I(lm_n) id^S(a_i) v_ai)}; G^I_A(lmk) := {(id^I(lm_k) id^S(a_j) v_aj)}
13                              V_s(an) := getSpatialValues(G^I_A(lmn)); V_s(ak) := getSpatialValues(G^I_A(lmk))
14                              if V_s(an) ≠ ∅ ∧ V_s(ak) ≠ ∅ then         /* make sure there are spatial values in the temporary sets */
15                                  foreach (v_ai, v_aj) ∈ V_s(an) × V_s(ak) do
16                                      topoRel_i := relateSpatialValues(v_ai, v_aj)
17                                      if topoRel_i ≠ null then          /* make sure a topological relation was found */
18                                          G^I_RU(shs) := G^I_RU(shs) ∪ {(id^I(lm_n) topoRel_i id^I(lm_k))}
19      return G^I_RU(shs)


This relation can be derived only if the fact members (at the instance level) are enriched and annotated with explicit topological relations, as given in Line 12. QB4SOLAP also extends the fact schema with spatial aggregate functions that are defined over spatial measures (if there are any). For example, an aggregate function for the farm location spatial measure (with point geometry) can be defined as an instance of the qb4so:ConvexHull class (Line 7).

##Spatial Fact Schema in QB4SOLAP##

1 gfs:GeoFarmHerdState a qb:DataStructureDefinition ;
  #Lowest spatial level for each dimension in the cube#
2   qb:component [qb4o:level gfs:farm ; qb4o:cardinality qb4o:ManyToOne ;
3     qb4so:topologicalRelation qb4so:Equals] ;
4   qb:component [qb4o:level gfs:parish ; qb4o:cardinality qb4o:ManyToOne ;
5     qb4so:topologicalRelation qb4so:Within] ;
  #Example of a spatial measure in the cube#
6   qb:component [qb:measure gfs:farmLocation ;
7     qb4o:aggregateFunction qb4so:ConvexHull] .

Factual enrichment at the instance level between fact members and level members is exemplified in Line 12 for explicit relations (because there is a direct reference in Line 9) and in Line 13 for implicit relations.

##GeoFarmHerdState cube: observation fact example##

8  gfsi:farmState_103850_12_2015 a qb:Observation ;
9    gfs:farm gfsi:farm_103850 ; gfs:parish gfsi:parish_8648 ;
10   gfs:livestockUnit "4.2699999999999996"^^xsd:double ;
11   gfs:farmLocation "POINT (8.31941 56.75822)"^^geo:spatialLiteral ;
12   qb4so:equals gfsi:farm_103850 ; qb4so:within gfsi:parish_8648 ;
13   qb4so:within gfsi:water_3770 .
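For the factual enrichment algorithms discussed next, the fact members, their base level members, and the geometries needed to detect fact-level topological relations could be fetched with a query of roughly the following shape. This is a sketch; the predicate names are taken from the listing above and from Figure 24, and the exact extraction query of the implementation may differ.

  SELECT ?fact ?factGeo ?parish ?parishGeo
  WHERE {
    ?fact a qb:Observation ;
          gfs:farmLocation  ?factGeo ;    # spatial measure of the fact member
          gfs:parish        ?parish .     # referential integrity link to the base level member
    ?parish gfs:parishPolygon ?parishGeo .
  }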

Factual enrichment at the instance level has two cases similar to those in hierarchical enrichment. The first case is to find explicit fact-level relations, where there is a direct link between the fact member and the base level member via referential integrity keys. The algorithm detectFactLevelRelations(G^I_FM(F), G^I_A(lm)) : G^I_FM(Fs) (Alg. 5 in Paper F) uses a similar approach as Alg. 4, which detects hierarchy steps between the level members. Instead of the explicit roll-up relation between parent and child level members, we get the fact members where there is a direct relation from the fact member to the base level members, as given in the listing above in Ex. 20 (Line 9). By using this explicit relation we get pairs of fact and base level members and detect the topological relations between the spatial measure of the fact member and the spatial attribute of the level members; the detected relations are added to the existing fact members and returned as a result (Line 12).

The second case is to find implicit fact-level relations, where there is no direct link between the fact member and the base level member via referential integrity keys, or to find an implicit topological relation between fact members and higher granularity (parent) level members. The algorithm discoverFactLevelRelations(G^I_FM(F), G^I_LM(l), G^I_A(lm), G^S_D, G^S_H(d), G^S_HS(h)) : G^I_FM(Fs) (Alg. 6 in Paper F) uses an approach similar to discovering implicit hierarchy steps (Alg. 5), by utilizing the QB4OLAP schema definitions of multi-dimensional concepts such as dimensions (G^S_D), hierarchies in the dimension (G^S_H(d)), and hierarchy steps in the hierarchy (G^S_HS(h)). This way we can identify all the base level members40 and find the missing roll-up relations between the fact members and level members, and annotate those with topological relations (Line 12). The other case in which we would like to establish a direct link with topological relations between a fact member and a level member that is not a base level is when there are many-to-many relations between the levels in a hierarchy. As explained in Section 6.2 along with Fig. 23, with many-to-many relations between levels we encounter the problem of aggregating measures incorrectly. Therefore, fact-level de-normalization, by creating redundant links that define the direct relation between level members and fact members, can remedy the problem. By using the QB4OLAP annotations for hierarchy steps we can also reveal the levels that have many-to-many relations41. After identifying the level members at a higher granularity (the drainage area level in this case) that can benefit from direct links to fact members, we use the same approach to discover topological relations (as in the previous algorithms) and enrich the fact members by annotating them with the discovered topological relations (Line 13 in Ex. 20).

Finally, factual enrichment at the schema level occurs by re-defining the fact schema using the results from the factual enrichment at the instance level. The algorithm (Alg. 6) defineSpatialFactDSD(G^I_FM(Fs), G^S_F) : G^S_Fs takes the fact schema (G^S_F) and the spatially enriched fact members (G^I_FM(Fs)), which result from detectFactLevelRelations and discoverFactLevelRelations (the previous two algorithms), and returns the new fact schema (G^S_Fs) that is spatially enriched. An example of the (non-spatial) fact schema is given in Ex. 20 in the listing lines 1-7 (please ignore the blue lines 3, 5, and 7 for now). In the first foreach statement in Line 3 (Alg. 6) we get the spatially enriched fact members, and for every topological relation from the next foreach loop (Line 4), we annotate it in the fact schema (Line 5).

40 In QB4OLAP hierarchy steps, levels are annotated as parent level or child level (see Ex. 3); if a level has never been annotated as a parent level, it is a base level and all the level members in this level are base-level members.
41 In QB4OLAP hierarchy steps, levels are also annotated with cardinality relations (Ex. 3).

Thus, the blue lines 3 and 5 are added to the example listing (Ex. 20). The next foreach statement over the fact members (Line 6) finds the spatial measures, and based on the geometry data type of the spatial measures, an aggregate function42 is suggested to be added to the enriched fact schema. As a result of this, the blue annotation in Line 7 is added to the example listing (Ex. 20), showing that the aggregate function over the spatial measure (farm location), which has a point geometry (Ex. 20, Line 11 in the instance data), is suggested as convex hull.

6.4 Implementation Details

In order to implement our approach (presented with the enrichment algorithms) on top of triple stores using Semantic Web technologies, we had to make some implementation decisions. We stored the QB4OLAP (instance and schema) data in native RDF format in a triple store (Virtuoso version 07.20.3217). The triple store we chose to store the RDF data (Virtuoso Open Source) supports many geometry data types (e.g., POLYGON, MULTIPOLYGON); however, it does not support all the spatial Boolean functions from the DE-9DIM model given in Table 10. The open-source version of Virtuoso only supports intersects, contains, and within with built-in database functions. Therefore, we decided to implement RDF2SOLAP independently of the spatial capabilities of the triple stores, by using a third-party web tool for spatial analysis. We implemented RDF2SOLAP on the Node.js platform and, as the third-party web tool, used a JavaScript library called Turf.js. This way, we can ensure that RDF2SOLAP can be used on top of any triple store, since the JavaScript library provides the spatial analysis capabilities and a flexible development environment, independent of any choice of triple store. The details of the implementation, SPARQL endpoints, and RDF data sets can be found on our project page43. The code repository for the whole implementation can be found on GitHub44.

6.5 Implementation Results

The RDF2SOLAP implementation results for the hierarchical enrichment algorithms (detectSpatialHS and discoverSpatialHS) and for the factual enrichment algorithms (detectFactLevelRelations and discoverFactLevelRelations) over the instance data are summarized in Table 11.

42 In QB4OLAP, the qb4o:AggregateFunction class has only instances (e.g., the qb4o:Avg and qb4o:Sum functions) for numerical measures. QB4SOLAP extends this class with a subclass qb4so:SpatialAggregateFunction, which has instances of spatial aggregate functions (e.g., qb4so:ConvexHull, qb4so:Union) for spatial measures [12, 15] (Adapted from [14]).
43 Project Page: http://extbi.cs.aau.dk/RDF2SOLAP
44 RDF2SOLAP Repository: https://github.com/lopno/rdf2solap

Algorithm 6: defineSpatialFactDSD(G^I_FM(Fs), G^S_F) : G^S_Fs (Adapted from [14])
Input: G^I_FM(Fs), G^S_F
Output: G^S_Fs
 1  begin
 2      G^S_Fs := G^S_F; aggFunc_i := null       /* initialize the output graph and a temporary variable */
 3      foreach (id^I(f_i) rdf:type qb:Observation) ∈ G^I_FM(Fs) do
 4          foreach (id^I(f_i) topoRel_i id^I(lm_j)) ∈ G^I_FM(Fs) with (id^I(f_i) id^S(l_n) id^I(lm_j)) ∈ G^I_FM(Fs) do
                /* each topoRel_i in the fact member triples goes into the DSD with its corresponding level l_n */
 5              G^S_Fs := G^S_Fs ∪ {(id^S(F) qb:component [qb4o:level id^S(l_n), qb4so:topologicalRelation id^S(topoRel_i)])}
 6          foreach v_mk ∈ (id^I(f_i) id^S(m_k) v_mk) do        /* find the spatial measures from the fact triples */
 7              if v_mk is a geo:spatialLiteral then
 8                  switch geoType(v_mk) do                     /* geoType(v) returns the geometry type of a given value */
 9                      case POINT do                           /* point geometry measures are supported with the ConvexHull function */
10                          aggFunc_i := qb4so:ConvexHull
11                      case LINE do                            /* line geometry measures are supported with the Union function */
12                          aggFunc_i := qb4so:Union
13                      case POLYGON do                         /* polygon geometry measures are supported with Union, Centroid, or MBR */
14                          aggFunc_i := qb4so:Union ∨ qb4so:Centroid ∨ qb4so:MBR
15                  G^S_Fs := G^S_Fs ∪ {(id^S(F) qb:component [qb:measure id^S(m_k), qb4o:aggregateFunction id^S(aggFunc_i)])}
16      return G^S_Fs
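Algorithm 6's schema-level output (the blue Lines 3, 5, and 7 in Ex. 20) could, for instance, be materialized at the endpoint with a SPARQL Update of the following shape. This is only a sketch; whether RDF2SOLAP writes back via SPARQL Update or serializes the enriched schema to a file is an implementation detail not asserted here, and the prefixes are those of Ex. 20.

  INSERT DATA {
    # suggested spatial aggregate function for the point-geometry measure farmLocation
    gfs:GeoFarmHerdState qb:component [ qb:measure gfs:farmLocation ;
                                        qb4o:aggregateFunction qb4so:ConvexHull ] .
  }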


Table 11: Implementation Results (f.s.= farm states, p.= parishes, d.a.= drainage areas) - Adapted from [14]

                             |------------------- INPUT -------------------|  |------------- OUTPUT --------------|
Algorithm                    Child members#   Parent members#   Explicit rel.#   TopoRel (# found)                  Run time
detectSpatialHS              p.: 2,180        d.a.: 134         2,683            intersects: 636, within: 2,046     29 s
detectFactLevelR.            f.s.: 40,039     p.: 2,180         39,800           within: 39,334                     7 s
discoverSpatialHS            p.: 2,180        d.a.: 134         NONE             intersects: 1,088, within: 3,392   2,622 s
discoverFactLevelR.          f.s.: 40,039     p.: 2,180         NONE             within: 39,998                     1,920 s
discoverFactLevelR.          f.s.: 40,039     d.a.: 134         NONE             within: 39,845                     525 s

For each algorithm, we created test cases in RDF2SOLAP and queried the SPARQL endpoint of the triple store, extracting the data in JSON format to Node.js. As depicted in Figure 23, parishes and drainage areas have a multi-polygon data type. From a practical point of view, we have kept this multi-polygon data in the triple store as multi-part polygons, which implies that several parishes or drainage areas with the same unique URI have different polygon coordinates. When parishes or drainage areas are grouped by their unique IDs (URIs), the different polygons compose and represent the multi-polygon geometry of the parish or the drainage area. We have implemented a bounding box function on the multi-part polygon data, where we group the polygons by their unique URI and put them in a bounding box, so that we can call the spatial Boolean functions (i.e., within, intersects) between parish and drainage area members in the hierarchical enrichment algorithms.

The first two algorithms (prefixed with detect...) in Table 11 present the results for detecting explicit relations, where a direct relation between the level members (via the skos:broader predicate) or between the fact members and base level members (via referential integrity keys) is also provided as an INPUT. The OUTPUT column presents the number of topological relations found together with the run times (in seconds). The last two algorithms (prefixed with discover...) present the results for discovering implicit relations, where no direct (explicit) relations between child members and parent members are provided as an INPUT (marked NONE in the table). The input data sets comprise 2,180 parish members, 134 drainage area members, and 40,039 farm-state members. Between parish members (polygon) and drainage area members (polygon), 2,683 explicit relations are provided, while 39,800 explicit relations are provided between the farm-state members (point) and parish members (polygon).

Next, we briefly discuss the algorithm run times. The evaluation of the overall performance of RDF2SOLAP enrichment is given in the quantitative evaluation in Section 6.6. Looking at the results in the table, we notice that the most expensive algorithm is discoverSpatialHS, with a run time of 2,622 seconds.

Since no explicit relations are provided, the algorithm calls the spatial Boolean functions 134 × 2,180 = 292,120 times for each function. Note that we also group the unique URIs of the polygon coordinates to create bounding boxes, since in this algorithm both of the input data sets (parish-drainage area) have polygon geometries. On the other hand, the algorithm detectSpatialHS, with explicit relations, finishes in 29 seconds, since it utilizes the given relations and checks the spatial Boolean functions only 2,683 times. The algorithm detectFactLevelRelations (between fact members/farm states and base level members/parishes) performs the fastest, finishing in 7 seconds. Out of the 39,800 explicit relations provided between farm states and parishes, 39,334 topological relations are detected as a result of the algorithm. On the other hand, the algorithm discoverFactLevelRelations performs slower, as expected (1,920 seconds), and discovers more topological relations between farm states and parishes, since there might be missing links that were not provided as explicit relations.

6.6 Evaluation and Discussion

To evaluate the performance of our approach, we measure and present the total time needed to get similar results from the RDF data in two different non-SW environments. The results are presented under the quantitative evaluation. We also compare the RDF2SOLAP enrichment results in terms of accuracy (number of topological relations found) and coverage against these two environments in the qualitative evaluation subsection. Finally, we discuss the technical lessons and summarize our work.

Quantitative Evaluation

We compare the query run times of the algorithms in RDF2SOLAP with two different query platforms: an RDBMS tool and a GIS tool. Since these tools cannot process RDF data natively, we consider the data preparation and load times as development costs (Table 12). The development cost is given in hours and involves extracting the data and loading it into these (non-SW) environments in their native formats. We assume that the developer has basic knowledge of the domain, the data set, and the schema of the data set, is able to extract data with SPARQL queries, can perform SQL queries on the RDBMS to get results similar to those of the algorithms, and knows how to operate the interface of the GIS tool. The development cost excludes the preparation (downloading and installing) of the environments. The development cost of RDF2SOLAP is a configuration set-up, where the user should point to the SPARQL endpoint where the instance triples are located and specify the RDF cube schema of the use case data.


Table 12: Performance Evaluation Results (f.s.= farm states, p.= parishes, d.a.= drainage areas) - Adapted from [14]

Query                             Platform     Run time    Development cost
detectSpatialHS (p.-d.a.)         RDF2SOLAP    29 s        5 min.
                                  RDBMS        < 1 s       1-1.5 days
detectFactLevelR. (f.s.-p.)       RDF2SOLAP    7 s         5 min.
                                  RDBMS        < 1 s       1-1.5 days
discoverSpatialHS (p.-d.a.)       RDF2SOLAP    2,622 s     5 min.
                                  RDBMS        43 s        1-1.5 days
                                  GIS          45 s        2 days
discoverFactLevelR. (f.s.-p.)     RDF2SOLAP    1,920 s     5 min.
                                  RDBMS        95 s        1-1.5 days
                                  GIS          72 s        2 days
discoverFactLevelR. (f.s.-d.a.)   RDF2SOLAP    525 s       5 min.
                                  RDBMS        48 s        1-1.5 days
                                  GIS          41 s        2 days

In the RDBMS, after the corresponding data sets are extracted from the triple store, loaded into the RDBMS in a relational format, and queries are prepared to get results similar to those of the algorithms, the query run times for detecting explicit relations are less than 1 second. However, the preparation steps take around 1-1.5 working days, while the RDF2SOLAP configuration for fetching the data sets from the endpoint is prepared natively within 5 minutes. Note that the GIS tool is not involved in the algorithms detecting explicit relations, since the tool employs spatial joins instead of joining through the referential integrity of explicit relations. In the GIS tool, preparation times are somewhat longer than for the RDBMS, since we converted the relational data into shape format and prepared layers from the corresponding data sets to perform spatial joins. In total, we spent around 2 working days to make the data ready for query processing. Once the data is ready, the GIS tool mostly outperforms the others in discovering topological relations. Query run times in RDF2SOLAP involve parsing the RDF data in JSON, calling the helper functions, and returning bounding box objects for multi-part polygon data. Therefore, RDF2SOLAP demonstrated adequate performance at a very low development cost compared to the non-SW query platforms, where the configuration of the RDF2SOLAP enrichment process can be done within 5 minutes. The configuration only requires pointing to the SPARQL endpoint where the RDF cube schema namespace URI is located; then, the test cases for running the RDF2SOLAP enrichment process are created by passing the input parameters to the enrichment algorithms. RDF2SOLAP thus provides a significant contribution to users by cutting down the development costs and preparation times by 2 to 3 orders of magnitude, where 1.5-2 days of development cost are reduced to 5 minutes.

Besides, RDF2SOLAP can natively operate over a wide range of SPARQL endpoints, where the choice of triple store at the back-end of the endpoint does not have any implications for the enrichment process, as we use a third-party JavaScript library as a wrapper for the enrichment algorithms.

Qualitative Evaluation

In the qualitative evaluation, we compare the number of topological relations found as a result of running the algorithms and queries on three different platforms: GIS, RDBMS, and RDF2SOLAP. Table 11 presented the results for RDF2SOLAP. In Table 13 we repeat the RDF2SOLAP results to present them together with the two non-SW tools. As mentioned earlier, the GIS tool does not use explicit relations between the members but employs spatial joins. Therefore, the results for the algorithms detecting explicit relations are marked as N/A in the GIS column. Both the RDBMS and RDF2SOLAP detected the same number of within relations (39,334) between farm states and parishes (point-polygon). However, RDF2SOLAP detected 47% more within relations (2,046) than the RDBMS detected (785) between parishes and drainage areas (polygon-polygon).

Table 13: Comparisons of number of topological relations found in each tool (f.s. = farm states, p. = parishes, d.a. = drainage areas) - Adapted from [14]

Algorithm                         TopoRel       GIS       RDBMS     RDF2SOLAP
detectSpatialHS (p.-d.a.)         intersects    N/A       1,897     636
                                  within        N/A       785       2,046
detectFactLevelR. (f.s.-p.)       within        N/A       39,334    39,334
discoverSpatialHS (p.-d.a.)       intersects    2,556     2,802     1,088
                                  within        1,039     785       3,392
discoverFactLevelR. (f.s.-p.)     within        39,805    39,984    39,998
discoverFactLevelR. (f.s.-d.a.)   within        39,441    39,845    39,845

Similarly, for discovering spatial hierarchy steps between parishes and drainage areas (polygon-polygon), RDF2SOLAP discovered 47% more within relations than the GIS tool and 54% more than the RDBMS. This is due to the generalization of the polygon data into bounding boxes in RDF2SOLAP, whereas in the native spatial RDBMS and the GIS tool, the multi-part polygon data is processed in its original format. In this case, the GIS tool presents the most accurate results (2,556 intersects relations), and the RDBMS presents results similar to the GIS tool (2,802 intersects relations), with an 8% difference.

For discovering fact-level relations between farm states and parishes (point-polygon), the GIS tool discovered 39,805 within relations, the RDBMS discovered 39,984 within relations, and RDF2SOLAP discovered 39,998 within relations, which is only 0.03% more than the RDBMS and 0.4% more than the GIS tool. For the same algorithm between farm states and drainage areas, the RDBMS and RDF2SOLAP discovered the same number of within relations (39,845), and the GIS tool discovered 39,441 within relations, which is 1% less than RDF2SOLAP and the RDBMS; all the tools present very similar results. We summarize the topological relations between farm states, parishes, and drainage areas on the map given in Figure 27. The results for parish-drainage area relations are grouped in the legend with a green scale from 1 to 5, where 1 with a white symbol shows the number of within relations (1st tile of the figure), and the green symbols represent the intersects relations with 2, 3, 4, and 5 drainage areas (2nd tile of the figure). The pink symbol represents the 15 parishes that cannot be related to a drainage area with a topological relation (3rd tile of the figure).

[Figure 27 map tiles: (1) Parishes_within_DrainageAreas, (2) Parishes_intersect_DrainageAreas, (3) Parishes_UNKNOWN_DrainageAreas, (4) Farms_within_Parishes, (5) Farms_within_DrainageAreas. Legend counts: FarmStates [40,039], Farms [41,295], Parishes [2,180] (within one drainage area: 1,039; intersecting 2, 3, 4, and 5 drainage areas: 868, 215, 40, and 3; unrelated: 15), DrainageAreas [134].]

Fig. 27: Topological Relations summarized

Technical Lessons and Summary

In RDF2SOLAP, the implementation results for the algorithms that take polygon geometry types as inputs (detectSpatialHS and discoverSpatialHS) showed a small deviation compared to the other two non-SW tools. This can be prevented by using multi-polygon data in its original form instead of generalizing it with bounding boxes, either by using the spatial capabilities of a triple store where multi-polygon data and spatial Boolean operators are natively supported, or by using a third-party library that can operate with spatial Boolean operators over multi-polygon data natively, without generalizing it into bounding boxes. On the other hand, RDF2SOLAP demonstrated accurate results, with less than 1% deviation in the resulting topological relations compared to the RDBMS and the GIS tool, for the algorithms that take point-polygon data types as inputs (detectFactLevelRelations and discoverFactLevelRelations).

In terms of productivity and performance, RDF2SOLAP significantly reduces the overhead work needed to operate with SW data compared to in-house, non-SW proprietary platforms (RDBMS and GIS) by 2 to 3 orders of magnitude, requiring significantly less time than the RDBMS and the GIS tool. Further spatial improvements to the underlying Semantic Web technologies (triple stores and JavaScript libraries) can facilitate an improved development environment for RDF2SOLAP with better coverage and accuracy. Even though the spatial support of triple stores is increasing, studies show that there still exist challenges and restrictions in supporting common standards (e.g., GeoSPARQL) and full coverage of complex spatial data types in the current state of the most common, well-known triple stores [16, 17]. Improvements to the RDF2SOLAP query processing times can also be achieved by utilizing spatial indexes on the RDF data in triple stores that support them, or by building an R-tree in memory in Node.js.

7 Conclusion and Summary of Contributions

The thesis provides the latest state-of-the-art methods for modeling, annotating, and querying spatial data warehouses on the Semantic Web. The thesis also provides algorithms for translating high-level SOLAP operators to SPARQL queries, and spatial multi-dimensional enrichment algorithms for RDF data cubes on the Semantic Web. In the following, we first give the conclusion of the thesis with the rationale of the papers presented in Sections 2-6, and then summarize the contributions.

In order to investigate the applicability of geo-semantic data warehouses, Paper A presents a use case study (GovAgriBus) on spatial and governmental data domains on the Semantic Web, which is published following common RDF data principles and queried with aggregate queries in SPARQL, which are an essential query structure for data warehouses.

The aggregate query templates are evaluated with several strategies besides the native RDF strategy, and native RDF is the most efficient strategy for answering aggregated queries. The GovAgriBus use case was not modeled with multi-dimensional concepts, so it cannot be considered a basis for geo-semantic data warehouses, yet it proves an important point on the applicability of aggregate queries and basic spatial functions in SPARQL. Therefore, Paper B proposes the QB4SOLAP vocabulary (v1) for modeling geo-semantic data warehouses with spatial and multi-dimensional concepts as a snowflake schema and publishing them to the Semantic Web. In Paper B, the QB4SOLAP vocabulary is validated with a realistic use case with spatial concepts. In order to strengthen our case and the validation of QB4SOLAP, Paper E presents a geo-semantic data warehouse from real-world governmental, spatial, and environmental data domains, which is modeled and published with QB4SOLAP (v2). Before QB4SOLAP, modeling and annotating spatial MD concepts such as spatial level attributes, spatial measures with spatial aggregate functions, spatial hierarchies, and spatial hierarchy steps with topological relations was not possible with existing Semantic Web ontologies and vocabularies. Explicit annotation of these spatial multi-dimensional concepts is essential in a spatial data warehouse in order to query with advanced spatial and analytical (a.k.a. SOLAP) queries.

Paper C improves the QB4SOLAP vocabulary to its latest stable version (v2) and presents a foundation of spatial data warehouses on the Semantic Web with SOLAP-to-SPARQL generator algorithms, which are designed using the formal semantics of spatial multi-dimensional RDF concepts from QB4SOLAP. The SOLAP-to-SPARQL generator algorithms are implemented in the GeoSemOLAP tool in Paper D and tested both as individual operations and as complex nested sets of SOLAP operations. The Semantic Web is a new outlet for BI and DW users who would like to query geo-semantic data warehouses with spatial OLAP operations, but the native RDF query language (SPARQL) has a complicated syntax for new users. The SOLAP-to-SPARQL generation algorithms and GeoSemOLAP lower the entry barrier for decision-makers and BI users to utilize geo-semantic data warehouses by providing an intuitive user interface, where users can interactively build SOLAP queries (e.g., s-slice, s-roll-up) from drop-down menus and interactive maps. This has become possible as a result of QB4SOLAP and its formal semantics, and the SOLAP-to-SPARQL query generation algorithms. The algorithms cover the most common SOLAP operators (s-slice, s-dice, s-roll-up, and s-drill-down) with a generic approach, where new algorithms can be introduced using the QB4SOLAP semantics and helper functions such as roll-up-path (which finds the path-shaped join of triples with relations from the lowest granularity fact members to higher granularity level members).

It is important to note that, to efficiently query geo-semantic data warehouses with SOLAP operators, modeling and annotating the spatial MD concepts of the RDF data with QB4SOLAP is necessary.

SOLAP operators use the explicit semantics of QB4SOLAP defined over use case data that has a multi-dimensional model designed as a snowflake schema. For example, many-to-many (n-n) relations between spatial levels along a dimension imply that each spatial level member should have direct relations defining the nature of the relation, such as annotations with topological relations (e.g., intersects, within). This way, we can ensure that the roll-up paths are created correctly in SOLAP operations.

The Semantic Web accommodates many multi-dimensional, statistical, and spatial data sets that have not been annotated with QB4SOLAP, so the SOLAP operators cannot be utilized over these data sets. Paper F proposes the RDF2SOLAP enrichment framework to address this issue by annotating existing RDF data with QB4SOLAP, where the data is feasible for multi-dimensional modeling and has spatial values that could benefit from SOLAP operations. The RDF2SOLAP enrichment framework is built with two phases: hierarchical enrichment and factual enrichment. In each phase, there are two algorithms that find explicit and implicit relations to annotate spatial multi-dimensional concepts (i.e., hierarchy steps or fact-level relations), which compose a roll-up path. Explicit relations are interpreted from the existing RDF data through referential integrity keys and the join of triples between level member URIs via explicit predicates. By utilizing the explicit links between members, the RDF2SOLAP enrichment algorithms detect the precise topological relation between spatial members and annotate it in the output (e.g., within, intersects). Implicit relations are not defined; thus the enrichment algorithms discover all possible topological relations between spatial members that can be annotated within a dimension hierarchy and output the annotated results. The factual enrichment algorithms address both fact-base level spatial annotations and fact-parent level annotations when there is an n-n cardinality relation between a child level and a parent level. This way, a drill-down from the parent level member to the correct lowest member (fact member) becomes possible through the explicitly annotated topological relation. Finally, the factual enrichment algorithms re-define the fact schema from the spatial concepts derived during enrichment, such as spatial measures and their aggregate functions.

The quantitative evaluation of the enrichment algorithms demonstrates a significant improvement for BI users on the SW, since RDF2SOLAP handles the data extraction, query processing, and annotation at once within a reasonable time frame; in contrast, users would need to spend significant effort to get similar results from operational RDBMS and GIS environments (as they have to manually extract the RDF data set from the endpoints, convert the data to the native formats of the systems, prepare and process the queries to derive topological relations, and annotate the enrichment results in the data). The qualitative evaluation of the enrichment algorithms presents comparable results, in terms of the number of detected/discovered topological relations, against the RDBMS and GIS systems, where there is room for improvement in handling multi-polygon data structures.

Overall, RDF2SOLAP is built as an important key for the Semantic Web, annotating geo-semantic data warehouses on the fly, directly over RDF data.

To sum up, the papers in the thesis make the following contributions.

• Paper A [3] presents the best practices for publishing Danish Agricultural Open Data sets on the Semantic Web as an initial effort to publish open governmental data sets in RDF format in Denmark. The use case is selected from non-trivial open governmental data sets covering the agricultural, business, and geographical domains. The paper also delivers different extract, transform, and load (ETL) strategies for preparing the Semantic Web data in RDF format from heterogeneous sources in several file formats. Finally, the paper introduces different query templates for aggregate queries and standard queries to evaluate different query processing scenarios with native RDF, relational, and virtual strategies. The evaluation of the query processing suggests that the query performance for aggregated query templates on native RDF is significantly faster than with the other strategies, which is a promising step for Semantic Web data warehousing, since aggregate queries are fundamental in multi-dimensional models and in data warehousing. Therefore, this paper provides the motivation for the other papers.

• Paper B [12] proposes a multi-dimensional vocabulary, QB4SOLAP, for modeling and annotating spatial data warehouses on the Semantic Web. In order to propose a state-of-the-art vocabulary for spatial data warehouses (on the SW), the paper thoroughly presents the related work and the recent relevant vocabularies and Semantic Web technologies. As a result, the paper introduces the QB4SOLAP vocabulary as an extension of the most recent non-spatial multi-dimensional vocabulary for modeling traditional data warehouses on the SW, QB4OLAP. The proposed QB4SOLAP vocabulary is applied to a non-trivial, realistic use case with spatial data. Together with the use case, the notion of Spatial OLAP (SOLAP) operators on the Semantic Web is introduced, and how to write SOLAP queries in SPARQL is shown in detail with examples.

• Paper C [15] extends Paper B [12] by giving full formal definitions of the multi-dimensional spatial concepts from QB4SOLAP in RDF format. The key concepts of the QB4SOLAP vocabulary, such as spatial dimensions, spatial levels, spatial hierarchies and hierarchy steps, spatial measures, spatial aggregate functions, and topological relations, as well as the common SOLAP operators, are defined formally in the paper,

87 which can easily be referenced and encapsulated in algorithms. The pa- per provides, algorithms for generating individual and nested SOLAP queries in SPARQL from high-level multi-dimensional expressions. The (SPARQL query) generation algorithms allow data warehouse users to query geo-semantic data warehouses without knowledge of RDF/S- PARQL.

• Paper D [13] introduces the GeoSemOLAP tool, which is implemented using the generation algorithms from Paper C [15]. GeoSemOLAP allows users to query the SW with individual or nested SOLAP operations without writing a single line of SPARQL. GeoSemOLAP provides end users with interactive maps, where DW users can click on a map to select spatial attributes or levels for a SOLAP operator, in addition to the possibility of composing the high-level SOLAP query (e.g., spatial slice, dice, roll-up) from drop-down menus showing the multi-dimensional schema elements.

• Paper E [11] applies QB4SOLAP to a non-trivial open governmental and spatial data collection from Danish governmental organizations, including environmental, agricultural, geographical, and business data sets. The paper also presents the challenges and best practices for publishing geo-semantic data warehouses, with discussions and perspectives. The use case published as a geo-semantic data warehouse with the QB4SOLAP vocabulary is also validated in the paper with non-trivial nested SOLAP queries in SPARQL.

• Paper F [14] addresses the lack of spatial multi-dimensional annotations on the Semantic Web and proposes the RDF2SOLAP enrichment framework to enrich existing RDF data cubes. The RDF2SOLAP enrichment model introduces the enrichment process with hierarchical enrichment algorithms and factual-level enrichment algorithms. The hierarchical enrichment algorithms cover detecting explicit relations between level members that are already linked through non-spatial hierarchy steps, and discovering implicit relations where no direct links exist between the level members. Both implicit and explicit relations are annotated as topological relations and enriched in the data set to provide essential analytical insight (a sketch of such an annotation query is given after this list). The factual enrichment algorithms apply the same two approaches, detecting explicit relations and discovering implicit relations, between fact and level members. In addition, factual enrichment is capable of redefining the fact schema in an automated manner from the outcome of the first two algorithms. The framework is applied to a non-trivial use case, and the experimental evaluation findings suggest that the proposed framework handles and processes the enrichment and annotation algorithms efficiently, directly on the RDF data, while presenting quality comparable to the results obtained from non-SW spatial query tools and products.
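As a minimal sketch of the kind of query that hierarchical enrichment can issue when discovering an implicit relation, the SPARQL update below tests containment between the members of two spatial levels and materializes the detected relation. The ex: level IRIs are hypothetical, geo:/geof: terms are standard GeoSPARQL, and the schema-level annotation that RDF2SOLAP additionally attaches to the corresponding hierarchy step is only hinted at in the comment; this is an illustration under those assumptions, not the exact output of the framework.

  PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
  PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
  PREFIX qb4o: <http://purl.org/qb4olap/cubes#>      # QB4OLAP namespace
  PREFIX ex:   <http://example.org/cube#>             # hypothetical level IRIs

  INSERT {
    ?city geo:sfWithin ?country .   # materialize the discovered topological relation
  }
  WHERE {
    ?city    qb4o:memberOf ex:city    ; geo:hasGeometry/geo:asWKT ?cWkt .
    ?country qb4o:memberOf ex:country ; geo:hasGeometry/geo:asWKT ?coWkt .
    FILTER ( geof:sfWithin(?cWkt, ?coWkt) )
    # The corresponding hierarchy step can then be annotated in the schema,
    # e.g., with qb4so:topologicalRelation qb4so:Within from the QB4SOLAP vocabulary.
  }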

In conclusion, QB4SOLAP has proven to be the spatial multi-dimensional vocabulary for modeling geo-semantic data warehouses, so that advanced analytical queries with spatial perspectives over Semantic Web data become possible with SOLAP operators, which was not possible before. By using the proposed SOLAP-to-SPARQL generation algorithms and the formal QB4SOLAP semantics, the GeoSemOLAP tool allows inexperienced Semantic Web users to query geo-semantic data warehouses with SOLAP operations without writing a single line of SPARQL. To derive spatial, multi-dimensional, and analytical perspectives from existing RDF data sets, the RDF2SOLAP enrichment framework, with its factual and hierarchical enrichment phase algorithms, provides a considerably easier way of annotating geo-semantic data warehouses directly from existing RDF data sets.

8 Future Work

The rationale and result of the thesis is to research and present the best practices, tools, and algorithms for modeling, annotating, and querying geo-semantic data warehouses. The thesis also opens several interesting research areas.

The research presented in Section 2 on publishing Danish governmental agricultural data sets as Linked Open Data can be automated by following the best practices every year, since the Danish Ministry of Agriculture updates the agricultural data yearly. Updating the GovAgriBus Denmark data set on a temporal basis would also require linking and annotating the use case data with temporal vocabularies such as the W3C Time Ontology. Along this line of work, new concepts and standards can be developed for easily and efficiently creating spatial and spatio-temporal relationships to link and publish agricultural and spatial data that changes over time.

In order to enable business intelligence analytics on the Semantic Web for spatial data, QB4SOLAP is introduced in Section 3 with a set of common SOLAP operators, and we described how to query the SW by using these SOLAP operators, which can be translated into the SW query language SPARQL by the GeoSemOLAP tool (Section 4). Extending the scope of these analytical operators with more complex ones, such as spatial drill-across and spatial aggregation over spatial measures, would be a challenging and interesting research direction that pushes the technical limitations of the Semantic Web in representing and analytically querying spatial data. Finally, it would also be interesting to optimize and increase the efficiency of our SOLAP query techniques, for instance through materialization, optimizing the layout of query templates, or employing federated processing of spatial analytical queries.

By using QB4SOLAP, Section 5 demonstrates a proof-of-concept, non-trivial governmental data use case that is published on the Semantic Web. A semantic ETL tool that can semi-automatically generate LOD annotated with QB4SOLAP would be a handy tool to publish geo-semantic cubes directly from governmental public open data portals. In Section 6, we introduced the RDF2SOLAP multi-dimensional enrichment process, which takes non-spatial QB and QB4OLAP cubes on the SW as a basis. Developing methods and algorithms for a fully functional geo-semantic ETL process from heterogeneous non-SW geo data portals can complement the RDF2SOLAP enrichment process by extending the source basis to non-SW spatial data. Moreover, semantic extensions can be introduced on top of the RDF2SOLAP enrichment process by linking the base data with external geographical vocabularies and data sets. External geographical data on the SW can bring new dimensional perspectives over the existing data, such as new spatial hierarchy levels and new spatial level members. Another line of interesting future work around the RDF2SOLAP enrichment algorithms is creating a comprehensive benchmark by employing the algorithms to query SPARQL endpoints of different RDF stores and exploiting the spatial capabilities (e.g., spatial indexes) of these stores. Moreover, it is important to develop query optimization techniques for OLAP queries on semantic DW RDF data, similar to the ones developed for cubes and XML data [24, 25, 37]. Furthermore, to achieve scalable querying and runtime optimization, new research directions can be taken with binary serializations of the QB4SOLAP RDF data such as Header Dictionary Triples (HDT), a compact data structure that can be compressed and kept in memory, thus enabling high-performance (and concurrent) querying.

The contributions of the thesis can also inspire spatio-temporal data warehouses on the Semantic Web, where the temporal dimension of the data warehouses and its challenges can be addressed in a way similar to the spatial dimension. The analogue of spatial Boolean algebra (with topological relations) in spatial data warehouses is the calculus for temporal reasoning (with Allen's interval algebra [2]) in temporal data warehouses. This would bring new modeling and annotation opportunities on top of QB4SOLAP using existing temporal ontologies. Furthermore, it would be interesting to extend the coverage of our algorithms (query generation and enrichment) to handle highly dynamic spatio-temporal queries.

References

[1] “The Friend of a Friend (FOAF) Project,” http://www.foaf-project.org/.


[2] J. Allen, "Maintaining Knowledge about Temporal Intervals," 1983.
[3] A. B. Andersen, N. Gür, K. Hose, K. A. Jakobsen, and T. B. Pedersen, "Publishing Danish Agricultural Government Data as Semantic Web Data," in Semantic Technology: 4th Joint International Semantic Technology Conference (JIST'14), vol. 8943. Springer, 2014, pp. 178–186, https://dx.doi.org/10.1007/978-3-319-15615-6_13.
[4] A. B. Andersen, N. Gur, K. Hose, K. A. Jakobsen, and T. B. Pedersen, "Publishing Danish Agricultural Government Data as Semantic Web Data," in DBTR-35. Aalborg University, 2014, p. 16, http://dbtr.cs.aau.dk/DBPublications/DBTR-35.pdf.
[5] J. B. Arendt, "Denmark releases its digital raw material," http://uk.fm.dk/news/, Ministry of Finance of Denmark, October 2012.
[6] Y. Bédard, E. Bernier, S. Larrivée, M. Nadeau, M. Proulx, and S. Rivest, "Spatial OLAP," in Forum annuel sur la RD, Géomatique VI: Un monde accessible, 1997, pp. 13–14.
[7] W.-D. Brickley, "W3C Semantic Web Interest Group: Geo," http://www.w3.org/2003/01/geo/wgs84_pos, www.wgs84.com.
[8] E. Edoh-Alove, S. Bimonte, and F. Pinet, "An UML Profile and SOLAP Datacubes Multidimensional Schemas Transformation Process for Datacubes Risk-Aware Design," International Journal of Data Warehousing and Mining (IJDWM), vol. 11, no. 4, pp. 64–83, 2015, https://dx.doi.org/10.4018/ijdwm.2015100104.
[9] M. J. Egenhofer and J. Herring, "A mathematical framework for the definition of topological relationships," in Fourth International Symposium on Spatial Data Handling, Zurich, Switzerland, 1990, pp. 803–813.
[10] L. Etcheverry, A. Vaisman, and E. Zimányi, "Modeling and Querying Data Warehouses on the Semantic Web using QB4OLAP," in Data Warehousing and Knowledge Discovery (DaWaK'14), vol. 8646. Springer, 2014, pp. 45–56, https://dx.doi.org/10.1007/978-3-319-10160-6_5.
[11] N. Gür, K. Hose, T. B. Pedersen, and E. Zimányi, "Enabling Spatial OLAP over Environmental and Farming Data with QB4SOLAP," in Semantic Technology: 6th Joint International Semantic Technology Conference (JIST'16), vol. 10055. Springer, 2016, pp. 287–304, https://dx.doi.org/10.1007/978-3-319-50112-3_22.
[12] N. Gür, K. Hose, E. Zimányi, and T. B. Pedersen, "Modeling and Querying Spatial Data Warehouses on the Semantic Web," in Semantic Technology: 5th Joint International Semantic Technology Conference (JIST'15), vol. 9544. Springer, 2015, pp. 1–20, https://dx.doi.org/10.1007/978-3-319-31676-5_1.
[13] N. Gür, J. Nielsen, K. Hose, and T. B. Pedersen, "GeoSemOLAP: SOLAP on the Semantic Web Made Easy," in Proceedings of the 26th International Conference Companion on World Wide Web (WWW'17). ACM, 2017, https://dx.doi.org/10.1145/3041021.3054731.
[14] N. Gür, T. B. Pedersen, and K. Hose, "Multidimensional Enrichment of Spatial RDF Data for SOLAP," Semantic Web Journal, under submission, 2019, http://www.semantic-web-journal.net/content/multidimensional-enrichment-spatial-rdf-data-solap-0.
[15] N. Gür, T. B. Pedersen, E. Zimányi, and K. Hose, "A Foundation for Spatial Data Warehouses on the Semantic Web," Semantic Web Journal, vol. 9, no. 5, pp. 557–587, 2018.
[16] W. Huang, S. A. Raza, O. Mirzov, and L. Harrie, "Assessment and benchmarking of spatially enabled RDF stores for the next generation of spatial data infrastructure," ISPRS International Journal of Geo-Information, vol. 8, no. 7, p. 310, 2019.
[17] T. Ioannidis, G. Garbis, K. Kyzirakos, K. Bereta, and M. Koubarakis, "Evaluating geospatial RDF stores using the benchmark Geographica 2," arXiv preprint arXiv:1906.01933, 2019.
[18] B. Kämpgen, S. O'Riain, and A. Harth, "Interacting with Statistical Linked Data via OLAP Operations," in The Semantic Web: ESWC 2012 Satellite Events, vol. 7540. Springer, 2012, pp. 87–101, https://dx.doi.org/10.1007/978-3-662-46641-4_7.
[19] T. B. Lee, "Design issues," W3C, July 2006, http://www.w3.org/DesignIssues/LinkedData.html.
[20] I. V. Lopez, R. T. Snodgrass, and B. Moon, "Spatiotemporal aggregate computation: A survey," IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 17, no. 2, pp. 271–286, 2005, https://doi.org/10.1109/TKDE.2005.34.
[21] Danish Ministry of the Environment, "Consolidated Act on Livestock Farming Environmental Approvals," 2012, http://eng.mst.dk/media.
[22] Nitrates Directive, "Danish nitrate action programme 2008-2015 regarding the nitrates directive; 91/676/EEC," http://eng.mst.dk/media/mst/Attachments/DanishNitrateActionProgramme2008201507092012.pdf, Tech. Rep., 2012.
[23] Microsoft, "Multidimensional Data Expressions," https://docs.microsoft.com/en-us/sql/analysis-services/multidimensional-models/mdx.
[24] D. Pedersen, J. Pedersen, and T. B. Pedersen, "Integrating XML data in the TARGIT OLAP system," in Proceedings of the 20th International Conference on Data Engineering. IEEE, 2004, pp. 778–781.
[25] D. Pedersen, K. Riis, and T. B. Pedersen, "Query optimization for OLAP-XML federations," in Proceedings of the 5th International Workshop on Data Warehousing and OLAP (DOLAP'02). ACM, 2002, pp. 57–64.
[26] T. B. Pedersen and N. Tryfona, "Pre-aggregation in Spatial Data Warehouses," in Proceedings of the 7th International Symposium on Advances in Spatial and Temporal Databases (SSTD'01). Springer, 2001, pp. 460–478, http://dx.doi.org/10.1007/3-540-47724-1_24.
[27] D. A. Randell, Z. Cui, and A. G. Cohn, "A spatial logic based on regions and connection," in Principles of Knowledge Representation and Reasoning, vol. 92, 1992, pp. 165–176.
[28] G. Rojas, G. Giannopoulos, and J. J. L. Daniel Hladky, "Managing Geospatial Linked Data in the GeoKnow Project," in The Semantic Web in Earth and Space Science. Current Status and Future Directions, vol. 20. IOS Press, 2015, p. 51.
[29] O. Software, "Virtuoso RDF Views – Getting Started Guide," June 2007, http://www.openlinksw.co.uk/virtuoso/Whitepapers/pdf/Virtuoso_SQL_to_RDF_Mapping.pdf.
[30] C. Stadler, J. Lehmann, K. Höffner, and S. Auer, "LinkedGeoData: A Core for a Web of Spatial Open Data," Semantic Web Journal (SWJ), vol. 3, pp. 333–354, 2012, https://dx.doi.org/10.3233/SW-2011-0052.
[31] A. I. M. Standards, "AGROVOC Linked Open Data," http://aims.fao.org/aos/agrovoc/.
[32] A. Vaisman and E. Zimányi, "Spatial data warehouses," in Data Warehouse Systems: Design and Implementation. Springer, 2014.
[33] J. Varga, L. Etcheverry, A. A. Vaisman, O. Romero, T. B. Pedersen, and C. Thomsen, "QB2OLAP: Enabling OLAP on Statistical Linked Open Data," in 32nd IEEE International Conference on Data Engineering, 2016, pp. 1346–1349.
[34] W3C, "Data Cube Implementations," 2014, https://www.w3.org/2011/gld/wiki/Data_Cube_Implementations.
[35] X. Wang, S. Staab, and T. Tiropanis, "ASPG: generating OLAP queries for SPARQL benchmarking," in Semantic Technology - 6th Joint International Conference, JIST 2016, Singapore, November 2-4, 2016, Revised Selected Papers, 2016, pp. 171–185. [Online]. Available: https://doi.org/10.1007/978-3-319-50112-3_13
[36] M. Wick, "GeoNames Ontology," http://www.geonames.org/ontology/documentation.html.
[37] X. Yin and T. B. Pedersen, "Evaluating XML-extended OLAP queries based on physical algebra," Journal of Database Management (JDM), vol. 17, no. 2, pp. 85–116, 2006.


Part II

Papers


Paper A

Publishing Danish Agricultural Government Data as Semantic Web Data

Alex B. Andersen, Nurefşan Gür, Katja Hose, Kim A. Jakobsen, and Torben Bach Pedersen

The paper has been published in the Proceedings of the 4th Joint International Semantic Technology Conference, Vol. 8943, pp. 178–186, 2014. DOI: 10.1007/978-3-319-15615-6_13

Abstract

Recent advances in Semantic Web technologies have led to a growing popularity of the Linked Open Data movement. Only recently, the Danish government has joined the movement and published several datasets as Open Data. These raw datasets are difficult to process automatically and combine with other data sources on the Web. Hence, our goal is to convert such data into RDF and make it available to a broader range of users and applications as Linked Open Data. In this paper, we discuss our experiences based on the particularly interesting use case of agricultural data as agriculture is one of the most important industries in Denmark. We describe the pro- cess of converting the data and discuss the particular problems that we encountered with respect to the considered datasets. We additionally evaluate our result based on several queries that could not be answered based on existing sources before.

© 2014 Springer International Publishing AG. Reprinted, with permission from Alex B. Andersen, Nurefşan Gür, Katja Hose, Kim A. Jakobsen, and Torben Bach Pedersen. Publishing Danish Agricultural Government Data as Semantic Web Data. In: Semantic Technology, JIST 2014. Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-319-15615-6_13 The layout has been revised.

1 Introduction

In recent years, more and more structured data has become available on the Web, driven by the increasing popularity of both the Semantic Web and Open Data movement, which aim at making data publicly available and free of charge. Several governments have been driving forces of the Open Data movement, most prominently data.gov.uk (UK) and data.gov (USA), which publish Open Data from departments and agencies in the areas of agriculture, health, education, employment, transport, etc. The goal is to enable collaboration, advanced technologies, and applica- tions that would otherwise be impossible or very expensive, thus inspiring new services and companies. Especially for governments, it is important to inspire novel applications, which will eventually increase the wealth and prosperity of the country. While publication of raw data is a substantial progress, the difficulty in interpreting the data as well as the heterogeneity of publication formats, such as spreadsheets, relational database dumps, and XML files, represent major obstacles that need to be overcome [1–3] – especially because the schema is rarely well documented and explained for non-experts. Furthermore, it is not possible to evaluate queries over one or multiple of these datasets. The Linked (Open) Data movement (http://linkeddata.org/) encour- ages the publication of data following the Web standards along with links to other data sources providing semantic context to enable easy access and interpretation of structured data on the Web. Hence, publishing data as Linked Data (LD) [4,5] entails the usage of certain standards such as HTTP, RDF, and SPARQL as well as HTTP URIs as entity identifiers that can be dereferenced, making LD easily accessible on the Web. RDF allows formulating statements about resources, each statement consists of subject, predicate, and object – referred to as a triple. Extending the dataset and adding new data is very convenient due to the self-describing nature of RDF and its flexibility. In late 2012, the Danish government joined the Open Data movement by making several raw digital datasets [6] freely available. Among others, these datasets cover transport, tourism, fishery, companies, forestry, and agricul- ture. To the best of our knowledge, they are currently only available in their raw formats and have not yet been converted to LD. We choose agriculture as a use case, as it is one of the main sectors in Denmark, with 66% of Denmark’s land surface being farmland1. Thus, there is significant potential in providing free access to such data and enabling efficient answering of sophisticated queries over it. In this paper, we show how we made Danish governmental Open Data

1http://www.statistikbanken.dk/AREALAN1

available as LD and evaluate the challenges in doing so. Our approach is to transform the agricultural datasets into RDF and add explicit relationships among them using links. Furthermore, we integrate the agricultural data with company information, thus enabling queries on new relationships not contained in the original data. This paper presents the process to transform and link the data as well as the challenges encountered and how they were met. It further discusses how these experiences can provide guidelines for similar projects. We developed our own ontology while still making use of existing ontologies whenever possible. A particular challenge is deriving spatial containment relationships not encoded in the original datasets. For a detailed discussion about the whole process, we refer the reader to the extended version of this paper [7]. The resulting LOD datasets are accessible via a SPARQL endpoint (http://extbi.lab.aau.dk/sparql) as well as for download (http://extbi.cs.aau.dk/). The remainder of this paper is structured as follows: Section 2 describes our use case datasets and discusses the main challenges. Then, Section 3 describes the process and its application to the use case. Section 4 evaluates alternative design choices, while Section 5 concludes and summarizes the paper.

2 Use Case

We have found the agricultural domain to be particularly interesting as it represents a non-trivial use case that covers spatial attributes and can be extended with temporal information. By combining the agricultural data with company data, we can process and answer queries that were not possible before as the original data was neither linked nor in a queryable format. Late 2012, the Ministry of Food, Agriculture, and Fisheries of Denmark (FVM) (http://en.fvm.dk/) made geospatial data of all fields in Denmark freely available – henceforth we refer to this collection of data as agricultural data. This dataset combined with the Central Company Registry (CVR) data (http://cvr.dk/) about all Danish companies allows for evaluating queries about fields and the companies owning them. In total, we have converted 5 datasets provided by FVM and CVR into Linked Open Data. We downloaded the data on October 1, 2013 from FVM [8] and from CVR. Agricultural Data. The agricultural data collection is available in Shape for- mat [9], this means that each Field, Field Block, and Organic Field is described by several coordinate points forming a polygon. Field. The Field dataset has 9 attributes and contains all registered fields in Denmark. In total, this dataset contains information about 641,081 fields. Organic Field. This dataset has 12 attributes and contains information

about 52,060 organic fields. The dataset has attributes that we can relate to the company data, i.e., the CVR attribute is unique for the owner of the field and references the CVR dataset that we explain below. The fieldBlockId attribute describes which "Field Block" a field belongs to.

Field Block. The Field Block dataset has 12 attributes for 314,648 field blocks, each of which contains a number of fields [10]. Field Blocks are used to calculate the funds the farmers receive in the EU area support scheme.

Central Company Registry (CVR) Data. The CVR is the central registry of all Danish companies and provides its data in CSV format. There are two datasets available that we refer to as Company and Participant.

Company. This dataset has 59 attributes [11] and contains information, such as a company's name, contact details, business format, and activity, for more than 600,000 companies and 650,000 production units.

Participant. This dataset describes the relations that exist between a participant and a legal unit. A participant is a person or legal unit that is responsible for a legal unit in the company dataset, i.e., a participant is an owner of a company. The Participant dataset describes more than 350,000 participants with 7 attributes.

The use case data comes in different formats and contains only a few foreign keys. Further, there is little cross-reference and links between the datasets and no links to Web sources in general. Spatial relationships are even more difficult to represent in the data, and querying data based on the available polygons is a complex problem. In particular, to enable queries that have not been possible before, we cleanse and link the (Organic) Field datasets to the Company dataset so that we can query fields and crops of companies related to agriculture. The particular challenges that we address are:

• Disparate data sources without common format

• Lack of unique identifiers to link different but related data sources

• Language (Danish)

• Lack of ontologies and their use

3 Data Annotation and Reconciliation

In this section, we outline the process that we followed to publish the datasets described in Section 2. The complete procedure with its main activities is depicted in Fig. A.1. All data in the data repository undergoes an iterative integration process consisting of several main activities:


Fig. A.1: Process overview

Import: Extract the data from the original sources

Analyze: Gain an understanding of the data and create an ontology

Refine: Refine the source data by cleansing it and converting it to RDF

Link: Link the data to internal and external data

Data that has been through the integration process at least once may be pub- lished and thus become Linked Open Data that others can use and link to. In the remainder of this section, we will discuss these steps in more detail. Import. The raw data is extracted from its original source into the repository and stored in a common format such that it is available for the later activities. The concrete method used for importing a dataset depends on the format of the raw data. The agriculture datasets and CVR datasets introduced in Section2 are available in Shape and CSV formats. Shape files are processed in ArcGIS2 to compute the spatial joins of the fields and organic fields, thus creating foreign keys between the datasets. As the common format we use a relational database. Analyze. The goal of this step is to acquire a deeper understanding of the data and formalize it as an ontology. As a result of our analysis we con- structed a URI scheme for our use case data based on Linked Data Prin- ciples [4]. We strive to use existing ontologies as a base of our own on- tologies. To do this, we make use of predicates such as rdfs:subClassOf, rdfs:subPropertyOf, and owl:equivalentClass, which can link our classes and properties to known ontologies. Fig. A.2 provides an overview of the on- tology that we developed for our use case with all classes and properties. All

2http://www.esri.com/software/arcgis


Fig. A.2: Overview of the ontology for our use case

arrows are annotated with predicates. The arrows with black tips represent relations between the data instances. The arrows with white tips represent relations between the classes. In short, we designed the ontology such that a Field is contained within a Field Block, which is expressed with the property agri:contains and is determined by a spatial join of the data. Organic Field is a subclass of Field and therefore transitively connects Field to Company. Field is also defined as being equivalent to the UN's definition of European fields from the AGROVOC [12] vocabulary. In addition we make use of other external ontologies and vocabularies, such as GeoNames [13], WGS84 [14], and FOAF (Friend of a Friend) [15].

Refine. The Refine activity is based on the understanding gained in the Analyze activity and consists of data cleansing and conversion. Fig. A.1 illustrates the data cleansing process where imported data and ontologies are used to produce cleansed data. In our use case, we implemented data cleansing by using views that filter out inconsistent data as well as correct invalid attribute values, inconsistent strings, and invalid coordinates. Then we use Virtuoso Opensource [16] mappings to generate RDF data.

Link. The Link activity consists of two steps: internal linking and external linking, which converts the refined data into integrated data. The Link

activity materializes the relationships between concepts and classes identified in the Analyze activity as triples. The example below shows our internal linking of the Field and the Field Block classes using the geonames:contains predicate.

  @prefix agri: <...> .
  @prefix geonames: <...> .

  agri:contains rdf:type owl:ObjectProperty ;
      rdfs:domain        agri:FieldBlock ;
      rdfs:range         agri:Field ;
      rdfs:subPropertyOf geonames:contains .
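For illustration, a query along this internal link could count the fields of each field block; the sketch below assumes the agri: namespace IRI (which is abbreviated in the listing above) and is only indicative of how the agri:contains property can be exploited at query time.

  PREFIX agri: <http://example.org/agri#>   # placeholder namespace IRI
  SELECT ?block (COUNT(?field) AS ?numFields)
  WHERE {
    ?block a agri:FieldBlock ;        # field blocks from the Field Block dataset
           agri:contains ?field .     # internal link materialized in the Link activity
    ?field a agri:Field .
  }
  GROUP BY ?block
  ORDER BY DESC(?numFields)
  LIMIT 10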

External linking involves linking to remote sources on instance and ontology level. On the ontology level, this means inserting triples using predicates such as rdfs:subClassOf, rdfs:subPropertyOf, and owl:equivalentClass that link URIs from our local ontology to URIs from remote sources. On instance level, we link places mentioned in the CVR data to equivalent places in GeoNames [13] using triples with the owl:sameAs predicate as illustrated in Fig. A.3.

Fig. A.3: External linking on instance level
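On the instance level, such an external link boils down to a single owl:sameAs triple per matched place. The SPARQL update below is a minimal sketch of this; both IRIs are placeholders (a city resource from our data set and the corresponding GeoNames resource), not actual identifiers from the published data.

  PREFIX owl: <http://www.w3.org/2002/07/owl#>

  INSERT DATA {
    # Placeholder IRIs: a place from the CVR data linked to its GeoNames entry.
    <http://example.org/resource/city/Aalborg>
        owl:sameAs <http://sws.geonames.org/0000000/> .
  }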

The overall process has provided us with analyzed, refined, and linked data; in total 32,457,657 triples were created. The result of completing this process is published and registered on datahub.io3. In case we wish to integrate additional sources, we simply have to reiterate through the process.

4 Experiments

In the following, we first describe three alternative design choices in the materialization of the data. They represent trade-offs between data load time and query time. We then discuss the results of our experimental evaluation, for which we ran an OpenLink Virtuoso 07.00.3203 server on a 3.4 GHz Intel Core i7-2600 processor with 8 GB RAM operated by Ubuntu 13.10, Saucy. The materialization strategies that we have considered are: Virtual, relational

3http://datahub.io/dataset/govagribus-denmark

materialization, and native. Fig. A.4 shows the different paths that data travels on, starting as raw data and ending at the user who issued a query. The solid lines represent data flow during the integration process whereas dashed lines represent data flow at query time.

Fig. A.4: Data flow for the materialization strategies

Virtual. In the virtual strategy we perform data cleansing based on SQL views in the relational database. RDF mappings are formulated on top of these cleansing views to make the data accessible as RDF. To increase performance, we create a number of indexes on primary keys, foreign keys, and spatial attributes. In Fig. A.4, using this strategy data flows through the arrows marked with 1, 2, and 3 at query time.

Relational Materialization. Here, we materialize the above mentioned SQL views as relational tables. We create similar indexes as above but on the obtained tables. In Fig. A.4, data flows through arrows 4, 5, and 3 – with 4 during load time and 3 and 5 during query time.

Native RDF. In this strategy, we extract all RDF triples from the materialized views and mappings and load them into a triple store. In Fig. A.4, data flows through arrows 4, 5, and 6 during load time and arrow 7 during query time.

To test our setup, we created a number of query templates that we can instantiate with different entities and that are based on insights in agricultural contracting gained from field experts. Some of them contained aggregation and grouping (Aggregate Query Templates, AQT), others only standard SPARQL 1.0 constructs (Standard Query Templates, SQT). For the virtual and relational materialization strategies we measured the load times for each step during loading – the results are shown in Table A.1. Table A.2 shows the execution times for our query templates on the three materialization strategies. Queries that run into a timeout are marked by a dash. As we can see, the native RDF strategy is faster than the two others, and relational materialized is generally faster than virtual. There is obviously a notable overhead when using views and mappings. On the other hand, the virtual strategy has very


Step             Virtual   Materialized    Native
Data Cleansing     74.92         603.35    603.35
Load Ontology       1.01           1.01      1.01
Load Mappings       8.76          12.35     12.35
Dump RDF            0.00           0.00   4684.82
Load RDF            0.00           0.00    840.04
Total              84.68         616.70   6141.56

Table A.1: Load times in seconds

Query      Virtual   Materialized   Native
AQT 1         5.92           3.39     1.04
AQT 2        13.32           7.00     0.23
AQT 3        10.81           7.70     0.05
AQT 4            –              –     0.14
AQT 5            –          20.37     0.86
SQT 1            –              –     2.35
SQT 2         0.09           0.12     0.10
SQT 3      2188.85           1.81     0.40
SQT 4         6.57           2.35     1.63
SQT 5            –          23.79     3.29
Average     370.93           8.31     1.01

Table A.2: Runtimes in seconds

fast load time compared to the other strategies since no data has to be moved or extracted – in fact, the cleansing is delayed until query time. The relational materialized strategy is one order of magnitude faster in load time than the native strategy as it has less overhead during loading. We can therefore conclude that the virtual strategy is well suited for rapidly changing data as it has minimum load time, the materialized strategy represents a trade-off between load time and query time and is suitable for data with low update rates, and the native strategy decouples RDF data from the relational data and is very suitable for static data.
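To give an impression of the shape of such an aggregate query template (AQT), the sketch below counts organic fields per owning company. The class and property names, including the link from fields to companies derived from the CVR attribute, are hypothetical stand-ins rather than the identifiers of the published ontology.

  PREFIX agri: <http://example.org/agri#>     # placeholder namespaces
  PREFIX bus:  <http://example.org/business#>

  SELECT ?companyName (COUNT(?field) AS ?organicFields)
  WHERE {
    ?field a agri:OrganicField ;
           bus:ownedBy ?company .        # hypothetical link based on the CVR attribute
    ?company bus:name ?companyName .
  }
  GROUP BY ?companyName
  ORDER BY DESC(?organicFields)
  LIMIT 10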

5 Conclusion

Motivated by the increasing popularity of both the Semantic Web and the Open (Government) Data movement as well as the recent availability of interesting open government data in Denmark, this paper investigated how to make Danish agricultural data available as Linked Open Data. We chose the most interesting agricultural datasets among a range of options, transformed

them into RDF format, and created explicit links between those datasets by matching them on a spatial level. Furthermore, the agricultural data was integrated with data from the central company registry. All these additional links enable queries that were not possible directly on the original data. The paper presents best practices and a process for transforming and linking the data. It also discusses the challenges encountered and how they were met. As a result, we not only obtained an RDF dataset but also a new ontology that also makes use of existing ontologies. A particularly interesting challenge was how to derive spatial containment relationships not contained in the original datasets because existing standards and tools do not provide sufficient support. The resulting LOD datasets were made available for download and as a SPARQL endpoint.

References

[1] F. Maali, R. Cyganiak, and V. Peristeras, “A publishing pipeline for linked government data,” in The Semantic Web: ESWC’12, 2012, pp. 778– 792. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-30284-8_ 59

[2] B. Villazón-Terrazas, L. Vilches-Blázquez, O. Corcho, and A. Gómez- Pérez, “Methodological Guidelines for Publishing Government Linked Data,” in Linking Government Data. Springer New York, 2011, pp. 27–49. [Online]. Available: http://dx.doi.org/10.1007/978-1-4614-1767-5_2

[3] M. G. Skjæveland, E. H. Lian, and I. Horrocks, “Publishing the Norwegian Petroleum Directorate’s FactPages as Semantic Web Data,” in International Semantic Web Conference: ISWC’13, 2013, pp. 162–177. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-41338-4_11

[4] T. B. Lee, “Design issues,” W3C, July 2006, http://www.w3.org/ DesignIssues/LinkedData.html.

[5] T. Heath and C. Bizer, Linked Data: Evolving the Web into a Global Data Space, ser. Synthesis Lectures on the Semantic Web. Morgan & Claypool Publishers, 2011.

[6] J. B. Arendt, “Denmark releases its digital raw material,” http://uk. fm.dk/news/, Ministry of Finance of Denmark, Denmark Ministry of Finance, October 2012.

[7] A. B. Andersen, N. Gur, K. Hose, K. A. Jakobsen, and T. B. Ped- ersen, “Publishing Danish Agricultural Government Data as Seman- tic Web Data,” in DBTR-35. Aalborg University, 2014, p. 16, http: //dbtr.cs.aau.dk/DBPublications/DBTR-35.pdf.


[8] A. Ministry of Food and F. of Denmark, “FVM Geodata Download,” https://kortdata.fvm.dk/download/index.html.

[9] ESRI, “Shapefile technical description,” An ESRI White Paper, 1998, http: //www.esri.com/library/whitepapers/pdfs/shapefile.pdf.

[10] D. M. of the Environment, “Markblokkort (datasæt),” http://www.geodata-info.dk/Portal/ShowMetadata.aspx?id= 1eb89ebb-f674-4ad1-9e53-d1e252226596.

[11] Erhvervsstyrelsen, “Record layout: Juridiske enheder og P- enheder,” http://www.cvr.dk/Site/Resources/Files/Media/ RecordlayoutABO110.pdf.

[12] A. I. M. Standards, “AGROVOC Linked Open Data,” http://aims.fao. org/aos/agrovoc/.

[13] M. Wick, “GeoNames Ontology,” http://www.geonames.org/ ontology/documentation.html.

[14] W.-D. Brickley, “W3C Semantic Web Interest Group: Geo,” http://www. w3.org/2003/01/geo/wgs84_pos, www.wgs84.com.

[15] “The Friend of a Friend (FOAF) Project,” http://www.foaf-project.org/.

[16] O. Software, “Virtuoso RDF Views – Getting Started Guide,” June 2007, http://www.openlinksw.co.uk/virtuoso/Whitepapers/pdf/ Virtuoso_SQL_to_RDF_Mapping.pdf.

Paper B

Modeling and Querying Spatial Data Warehouses on the Semantic Web

Nurefşan Gür, Katja Hose, Torben Bach Pedersen, and Esteban Zimányi

The paper has been published in the Proceedings of the 5th Joint International Semantic Technology Conference, Vol. 9544, pp. 3–22, 2015. DOI: 10.1007/978-3-319-31676-5_1

Abstract

The Semantic Web (SW) has drawn the attention of data enthusiasts, and also in- spired the exploitation and design of multidimensional data warehouses, in an uncon- ventional way. Traditional data warehouses (DW) operate over static data. However multidimensional (MD) data modeling approach can be dynamically extended by defining both the schema and instances of MD data as RDF graphs. The importance and applicability of MD data warehouses over RDF is widely studied yet none of the works support a spatially enhanced MD model on the SW. Spatial support in DWs is a desirable feature for enhanced analysis, since adding encoded spatial information of the data allows to query with spatial functions. In this paper we propose to empower the spatial dimension of data warehouses by adding spatial data types and topological relationships to the existing QB4OLAP vocabulary, which already supports the rep- resentation of the constructs of the MD models in RDF. With QB4SOLAP, spatial constructs of the MD models can be also published in RDF, which allows to imple- ment spatial and metric analysis on spatial members along with OLAP operations. In our contribution, we describe a set of spatial OLAP (SOLAP) operations, demon- strate a spatially extended metamodel as, QB4SOLAP, and apply it on a use case scenario. Finally, we show how these SOLAP queries can be expressed in SPARQL.

© 2015 Springer International Publishing AG. Reprinted, with permission from Nurefşan Gür, Katja Hose, Torben Bach Pedersen, and Esteban Zimányi. Modeling and Querying Spatial Data Warehouses on the Semantic Web. In: Semantic Technology, JIST 2015. Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-319-31676-5_1 The layout has been revised.

1 Introduction

The evolution of the Semantic Web (SW) and its tools allow to employ com- plex analysis over multidimensional (MD) data models via On-Line Analyti- cal Processing (OLAP) style queries. OLAP emerges when executing complex queries over data warehouses (DW) to support decision making. DWs store large volumes of data which are designed with MD modeling approach and usually perceived as data cubes. Cells of the cube represent the observation facts for analysis with a set of attributes called measures (e.g. a sales fact cube with measures of product quantity and prices). Facts are linked to dimensions which give contextual information (e.g. sales date, product, and location). Dimensions are perspectives which are used to analyze data, organized into hierarchies and levels that allow users to analyze and aggregate measures at different levels of detail. Levels have a set of attributes that describe the char- acteristics of the level members. In traditional DWs, the “location” dimension is widely used as a con- ventional dimension which is represented in an alphanumeric manner with only nominal reference to the place names. This neither allow manipulating location-based data nor deriving topological relations among the hierarchy levels of the location dimension. This issue yields a demand for truly spatial DWs for better analysis purposes. Including encoded geometric information of the location data significantly improves the analysis process (i.e. proxim- ity analysis of the locations) with comprehensive perspectives by revealing dynamic spatial hierarchy levels and new spatial members. The scope of this work is first focuses on enhancing the spatial characteristics of the cube members on the SW, and then describing and utilizing SOLAP operators for advanced analysis and decision making. In our approach we consider enabling SOLAP capabilities directly over Resource Description Framework (RDF) data on the SW. Importance and ap- plicability of performing OLAP operations directly over RDF data is studied in [1,2]. To perform SOLAP over the SW consistently, an explicit and pre- cise vocabulary is needed for the modeling process. The key concepts of spatial cube members need to be defined in advance to realize SOLAP opera- tions since they employ spatial measures with spatial aggregate functions (e.g. union, buffer, and, convex-hull) and topological relations among spatial di- mension and hierarchy level members (e.g. within, intersects, and, overlaps). Current state of the art RDF and OLAP technologies is limited to support conventional dimension schema and analysis along it’s levels. Spatial dimen- sion schema and SOLAP require an advanced specialized data model. As a first effort to overcome the limitations of modeling and querying spatial data warehouses on the Semantic Web we give our contributions in the following.


Contributions. We propose an extended metamodel solution that enables representation and RDF implementation of spatial DWs. We base our meta- model on the most recent QB4OLAP vocabulary and present an extension to support the spatial functions and spatial elements of the MD cubes. We discuss the notion of a SOLAP operator and observe it with examples, then we give the semantics of each SOLAP operator formally and finally, show how to implement them in SPARQL by using sub-queries and nested set of operators. In the remainder of the paper, we first present the state of the art, in Sect. 2. As a prerequisite for our contribution in Sect.3, we give the preliminary concepts and explain the structure of a SOLAP operator. Then, in Sect.4 we define the semantics of MD data cube elements in RDF, present QB4SOLAP and formalize the SOLAP operators over MD data cube elements. We present a QB4SOLAP use case in Sect.5 and then, we show how to write the defined SOLAP queries over this use case in SPARQL in Sect.6. Finally, in Sect.7, we conclude and remark to the future work directions.

2 State of the Art

DW and OLAP technologies have been proven a successful approach for anal- ysis purposes on large volumes of data [3]. Aligning DW/OLAP technologies with RDF data makes external data sources available and brings up dynamic scenarios for analysis. The following studies are found concerning DW/O- LAP with the SW. DW/OLAP and Semantic Web: The potential of OLAP to analyze SW data is recognized in several approaches, thus MD modeling from ontologies is studied in the works of [4,5]. However these approaches do not support standard querying of RDF data in SPARQL but require a MD or a relational database query engine, which limits the access to frequently updated RDF data. Kämpgen et al. propose an extended model [2] on top of RDF Data Cube Vocabulary (QB) [6] for interacting with statistical linked data via OLAP operations in SPARQL, but it has the limitations of the QB and thus cannot support full OLAP dimension structure with aggregate functions. It also has only limited support for complete MD data model members (e.g. hierarchies and levels). Etcheverry et al. introduce QB4OLAP [1] as an extended vocab- ulary of QB with a full MD metamodel, which supports OLAP operations directly over RDF data with SPARQL queries. However, none of these vocab- ularies and approaches support spatial DWs, unlike our proposal. Spatial DW and OLAP: The constraint representation of spatial data has been focus in many fields from databases to AI [7]. Extending OLAP with spatial features has attracted the attention of data warehousing communities as well. Several conceptual models are proposed for representing spatial data in data

warehouses. Stefanovic et al. [8] investigate the construction and materialization of spatial cubes in their proposed model. The MultiDim conceptual model is introduced by Malinowski and Zimányi [9], which copes with spatial features and is extended in [10] to include complex geometric features (i.e., continuous fields), with a set of operations and an MD calculus supporting spatial data types. Gómez et al. [11] propose an algebra and a very general framework for OLAP cube analysis on discrete and continuous spatial data. Even though spatial data warehousing is widely studied, it has not yet been implemented on the Semantic Web.

Geospatial Semantic Web: The Open Geospatial Consortium (OGC) pursues an important line of work for the geospatial SW with GeoSPARQL [12] as a vocabulary to represent and query spatial data in RDF with an extension to SPARQL. Kyzirakos et al. present a comprehensive survey of data models and query languages for linked geospatial data in [13], and propose a semantic geospatial data store, Strabon, with an extensive query language, stSPARQL, in [14], which is yet limited to a specific environment. LinkedGeoData is a significant contribution on interactively transforming OpenStreetMap1 data to RDF data [15]. GeoKnow [16] is a more recent project with a focus on linking geospatial data from heterogeneous sources.

These studies show that SW and RDF technologies evolve to give better functionality and standards for spatial data representation and querying. It is also argued above that spatial data is very much needed for DW/OLAP applications. However, modeling and querying of spatial DWs on the SW is not addressed in any of the above papers. There are recent efforts on creating an Extract-Load-Transform (ETL) framework from semantic data warehouses [17] and on publishing/converting open spatial data as Linked Open Data [18], which motivates modeling and querying spatial data warehouses on the Semantic Web. Spatial data requires specific treatment techniques, particular encoding, special functions, and different manipulation methods, which should be considered during the design and modeling process. The current state of the art of the geospatial Semantic Web focuses on techniques for publishing, linking, and querying spatial data; however, it does not elaborate on analytical spatial queries for MD data. In order to address these issues we propose a generic and extensible metamodel based on the best practices of MD data publishing in RDF. Then we show how to create spatial analytical queries with SOLAP on MD data models. We base ourselves on existing works by extending the most recent version of the QB4OLAP vocabulary with spatial concepts. Furthermore, we introduce the new concept of SOLAP operators that navigate along dynamic spatial hierarchy levels and implement these analytical spatial queries in SPARQL.

1http://www.openstreetmap.org


3 Spatial and OLAP Operations

In this section we define the spatial and the spatial OLAP (SOLAP) operations.

3.1 Spatial Operations

In order to understand spatial operations, it is important to understand what a spatial object is. A spatial object is the data or information that identifies a real-world entity such as geographic features, boundaries, places, etc. Spatial objects can be represented in object/vector or image/raster mode. Database applications that can store spatial objects need to specify the spatial characteristics, encoded as specific information such as the geometry data type, which is the most common and supports planar or Euclidean (flat-earth) data. Point, Line, and Polygon are the basic instantiable types of the geometry data type. Geometries are associated with a spatial reference system (SRS) which describes the coordinate space in which the geometry is defined. There are several SRSs and each of them is identified with a spatial reference system identifier (SRID). The World Geodetic System (WGS) is the most well-known SRS and the latest version is called WGS84, which is also used in our use case. Spatial data types have a set of operators that can function among applications. We grouped these operations into classes. Our classification is based on the common functionality of the operators. These classes are defined as follows:

Spatial Aggregation. The operators in the spatial aggregation class, Sagg, aggregate two or more spatial objects. The result of these operators is a new composite spatial object. Union, Intersection, Buffer, ConvexHull, and MBR – Minimum Bounding Rectangle are example operators of this class.

Topological Relation. The operators in the topological relation class, Trel, are commonly contained in the RCC82 and DE-9DIM3 models. Topological relations are standardized by OGC as Boolean operators which specify how two spatial objects are related to each other with a set of spatial predicates, for example: Intersects, Disjoint, Equals, Overlaps, Contains, Within, Touches, Covers, CoveredBy, and Crosses.

Numeric Operation. The operators in the class of numeric operations, Nop, take one or more spatial objects and return a numeric value. Perimeter, Area, # of Interior Rings, Distance, Haversine Distance, Nearest Neighbor (NN), and # of Geometries are some example operators of this class.

2RCC8 – Region Connection Calculus describes regions in Euclidean space, or in a topological space by their possible relations to each other. 3DE-9DIM – Dimensionally Extended Nine-Intersection Model is a topological model that describes spatial relations of two geometries in two-dimensions.
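Assuming the spatial objects expose their geometries as WKT literals via GeoSPARQL, one operator from each of these classes can be invoked in SPARQL roughly as follows. The geo:/geof:/uom: terms are standard GeoSPARQL and OGC identifiers, while the data itself is left generic; this is only an illustrative sketch, not part of the formal model.

  PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
  PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
  PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>

  SELECT ?x ?y ?within ?dist ?merged
  WHERE {
    ?x geo:hasGeometry/geo:asWKT ?gx .
    ?y geo:hasGeometry/geo:asWKT ?gy .
    FILTER ( ?x != ?y )
    BIND ( geof:sfWithin(?gx, ?gy)            AS ?within )  # topological relation (Trel)
    BIND ( geof:distance(?gx, ?gy, uom:metre) AS ?dist )    # numeric operation (Nop)
    BIND ( geof:union(?gx, ?gy)               AS ?merged )  # spatial aggregation (Sagg) of two geometries
  }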


3.2 SOLAP Operations

OLAP operations emerge when executing complex queries over multidimensional (MD) data models. OLAP operations let us interpret data from different perspectives at different levels of detail. Spatially extended multidimensional models incorporate spatial data during the analysis and decision making process by revealing new patterns, which are difficult to discover otherwise. In connection with our definition of MD models in the first paragraph of Sect. 1, hereafter we enhance and describe the spatially extended MD data cube elements. A spatially extended MD model contains both conventional and spatial dimensions. A spatial dimension is a dimension which includes at least one spatial hierarchy. Dimensions usually have more than one level, which are represented through hierarchies, and there is always a unique top level All with just one member. A hierarchy is a spatial hierarchy if it has at least one spatial level in which the application should store the spatial characteristics of the data, which are captured by its geometry and can be recorded in the spatial attributes of the level. A spatial fact is a fact that relates several dimensions of which two or more are spatial. For example, consider a "Sales" spatial fact cube, which has "Customer" and "Supplier" (company) as spatial dimensions with a spatial hierarchy "Geography" that expands into the (hierarchic) spatial levels "City → State → Country → Continent → All" from the customer's and supplier's location. All these spatial levels record the spatial characteristics, e.g., with a spatial attribute of a city (center) as "point" coordinates. Measures in the cube express additional and essential information for each MD data cell which is not exhibited through the dimensions and level attributes. Typically, spatially extended MD models have spatial measures which are represented by a geometry, i.e., point, polygon, etc.

Spatial OLAP operates on spatially extended MD models. SOLAP enhances the analytical capabilities of OLAP with the spatial information of the cube members. The term SOLAP is used in [19] and similar works for a visual platform, which is designed to analyze huge volumes of geo-referenced data by providing visualization of pivot tables, graphical displays, and interactive maps. We define the term SOLAP concisely as a platform (and query language) independent high-level concept, which is applicable on any spatial multidimensional data. In the following we explain and exemplify how SOLAP operators are interpreted. Each operator in SOLAP should include at least one spatial condition using the aforementioned operators from the spatial operation classes defined in Sect. 3.1. Spatial operations in SOLAP create a dynamic interpretation of the cube members as a dynamic spatial hierarchy or level. These interpretations allow new perspectives to analyze the spatial MD data which cannot be accessed in a traditional MD model. For instance, the classical OLAP

operator roll-up aggregates measures along a hierarchy to obtain data at a coarser granularity. In the spatial dimension schema of Fig. B.1, the (classical) roll-up relation from Customer to City is shown with black straight arrows. On the other hand, in SOLAP, a new "dynamic spatial hierarchy" is created on the fly to roll up among spatial levels by a spatial condition (closest distance), which is given as Customer to Closest-City (of the Supplier), shown with curved arrows in gray. The details of this operator, comparing OLAP and SOLAP, are given in the following example.

Example: Roll-up. The user wants to sum the total amount of the sales to customers up to the city level with the roll-up operator. The instance data for the Sales fact is given in Tab. B.1 and shown on the map in Fig. B.2. The amounts of the sales are shown in parentheses along with the quantities of the sold parcels (from supplier to customer). The arrows on the map, between the supplier and customer locations, are used to represent the distance. The summarized data for sale instances (Tab. B.1) does not originally contain records of the supplier–customer distance (as given in Tab. B.4), which would lead to an increase in storage space. If there are no sales to customers from the corresponding suppliers, a dash (–) is used in Tab. B.1. The syntax of the traditional roll-up operator is ROLLUP(Sales, (Customer → City), SUM(SalesAmount)), which aggregates the "total sales to customers up to city level" (results in Tab. B.2). Alternatively, the user may like to view the "total sales to customers by city of the closest suppliers", in which some customers can be closer to suppliers from other cities, as emphasized in Tab. B.4. This query is possible with traditional OLAP only if Tab. B.4 is recorded in the base data, which requires extra storage space. For better support and flexibility we define a spatial roll-up operator that aggregates the total sales along

Fig. B.1: S-Dim. Schema (levels All – Country – City/ClosestCity – Customer, with Supplier and a distance function between Customer and Supplier)

Fig. B.2: Example Map of Sales (Instance) Data

the dynamic spatial hierarchy, which is created based on a spatial condition (in this example, the distance between customer and supplier locations). The syntax for s-roll-up is: S-ROLLUP(Sales, [CLOSEST(Customer,Supplier)] → City', SUM(SalesAmount)). The spatial condition transforms the customer → city hierarchy into a dynamic spatial hierarchy, which depends on the proximity of the suppliers and is calculated at runtime. The user has the flexibility to make analyses with different spatial operations in SOLAP. Spatial extensions of the common OLAP operators (roll-up, drill-down, slice, and dice) are formally defined in Sect. 4.3.

                         Supplier
City         Customer   s1       s2       s3       Total Sales
Düsseldorf   c1          8pcs.   –         3pcs.   11pcs.
             c2         10pcs.   –         –       10pcs.
Dortmund     c3          7pcs.    4pcs.    –       11pcs.
             c4          –       20pcs.    3pcs.   23pcs.
Münster      c5          –        –       30pcs.   30pcs.

Table B.1: Sample (Instance) Data for Sales

City         Sales
Düsseldorf   21pcs.
Dortmund     34pcs.
Münster      30pcs.

Table B.2: Roll-up

City         Sales
Düsseldorf   25pcs.
Dortmund     20pcs.
Münster      33pcs.

Table B.3: S-Roll-up

             Sup. City:   Düsseldorf   Dortmund   Münster
City         Cust./Sup.       s1           s2        s3
Düsseldorf   c1              15 km        45 km     30 km
             c2              15 km        60 km     60 km
Dortmund     c3              15 km        30 km     45 km
             c4              45 km        15 km     15 km
Münster      c5              60 km        45 km     15 km

Table B.4: Customer to Supplier Distance (km.s)


4 Semantics of Spatial MD Data and OLAP Operations

In this section, we first present our approach on how to support spatial MD data in RDF by using the QB4SOLAP vocabulary. Afterwards, we define the general semantics of each SOLAP operator to be implemented in SPARQL as a proof of concept for the QB4SOLAP metamodel. The concepts introduced in this metamodel are an extension to the most recent QB4OLAP vocabulary [1]. Figure B.3 shows the proposed and extended QB4OLAP vocabulary for the cube schema RDF triples. Capitalized terms in the figure represent RDF classes and non-capitalized terms represent RDF properties. Classes in external vocabularies are depicted in light gray background and font. RDF Cube (QB), QB4OLAP, and QB4SOLAP classes are shown with white, light gray, and dark gray backgrounds, respectively. Original QB terms are prefixed with qb:. QB4OLAP and QB4SOLAP classes and properties are prefixed with qb4o: and qb4so:. In order to represent spatial classes and properties, an external prefix from OGC, geo:, is used in the metamodel. Since QB4OLAP and QB4SOLAP are RDF-based multidimensional model schemas, we first define formally what an RDF triple is, and then discuss the basics of describing MD data using QB4OLAP and spatially enhanced MD data in QB4SOLAP.

An RDF triple t consists of three components: the subject s, the predicate p, and the object o. It is defined as t = (s, p, o) ∈ T = (I ∪ B) × I × (I ∪ B ∪ Li), where I is the set of IRIs, B is the set of blank nodes, and Li is the set of literals. Given an MD element x of the cube schema CS, CS(x) ∈ (I ∪ B ∪ Li) returns a set of triples T with IRIs, blank nodes, and literals. The notation of the triples in the following definitions is given as (x rdf:type ex:SomeProperty). If a concept is defined with a set of triples, after the first triple we use a semicolon “;" to link further predicates (p) and objects (o) to the subject (s) concept x. Blank nodes B are expressed as _: and nested unlabeled blank nodes are abbreviated in square brackets “[]".
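To illustrate this notation, the following is a minimal Turtle sketch (the ex: namespace and resource names are hypothetical and not part of QB4SOLAP; prefix declarations are omitted as in the other listings of this thesis) showing one subject with several predicate–object pairs linked by semicolons and a nested, unlabeled blank node in square brackets.

# One subject (x) described by several triples; ";" repeats the subject
ex:x rdf:type    ex:SomeProperty ;
     ex:hasLabel "an example resource" ;
     # an unlabeled, nested blank node abbreviated with "[ ]"
     ex:hasPart  [ ex:partName "part 1" ] .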

4.1 Defining MD Data in QB4OLAP

In order to explain the spatially enhanced MD models, we first describe the MD elements introduced in Sect. 1 with the RDF formalization in the QB4OLAP vocabulary.

Cube Schema. A data structure definition (DSD) specifies the schema of a data set (i.e., a cube, which is an instance of the class qb:DataSet). The DSD can be shared among different data sets. The DSD of a data set represents dimensions, levels, measures, and attributes with component properties. The DSD is defined through a conceptual MD cube schema CS, which has a set of dimension types D, a set of measures M, and a fact type F, i.e., CS = (D, M, F). For example, a cube schema CS can be used to define the physical structure of a company's sales data to be represented as an MD data cube. We define the cube schema elements in the following definitions.


[Fig. B.3 (diagram): the QB4SOLAP vocabulary metamodel. QB classes are shown in white, QB4OLAP classes in light gray, and QB4SOLAP classes in dark gray. QB4SOLAP adds the class qb4so:SpatialAggregateFunction (qb4so:Centroid, qb4so:Union, qb4so:ConvexHull, qb4so:Intersection, qb4so:MBR), the class qb4so:TopologicalRelation (qb4so:Intersects, qb4so:Within, qb4so:Disjoint, qb4so:Touches, qb4so:Equals, qb4so:Covers, qb4so:Overlaps, qb4so:CoveredBy, qb4so:Contains, qb4so:Crosses), the property qb4so:topologicalRelation, and the OGC class geo:Geometry linked via geo:hasGeometry.]

Fig. B.3: QB4SOLAP Vocabulary Meta-Model

Attributes. An attribute a ∈ A = {a1, a2, ..., an} has a domain (a : dom) in the cube schema CS with a set of triples ta ∈ T, where ta is encoded as (a rdf:type qb:AttributeProperty; rdfs:domain xsd:Schema).


The domain of the attribute is given with the property rdfs:domain from the corresponding schema, and rdfs:range defines what values the property can take, i.e., integer, decimal, etc., from the given xsd:Schema elements. Attributes are the finest granular elements of the cube, which exist in levels to describe the characteristics of level members; e.g., customer level attributes could be name, id, address, etc.
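As a purely illustrative sketch (the ex: attribute names are hypothetical and not taken from the use case; prefixes as elsewhere in the thesis), two conventional attributes with alphanumeric ranges could be declared as follows.

# Conventional (non-spatial) level attributes and the values they can take
ex:customerName a qb:AttributeProperty ;
    rdfs:range xsd:string .
ex:customerID a qb:AttributeProperty ;
    rdfs:range xsd:integer .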

Levels. A level l ∈ L = {l1, l2, ..., ln} consists of a set of attributes Al, which is defined by a schema l(a1 : dom1, ..., an : domn), where l is the level and each attribute a is defined over the domain dom. For each level l ∈ L in the cube schema CS, there is a set of triples tl ∈ T, which is encoded as (l rdf:type qb4o:LevelProperty; qb4o:hasAttribute a). Relevant levels for customer data include the customer level, city level, country level, etc.

Hierarchies. A hierarchy h ∈ H = {h1, h2, ..., hn} in the cube schema CS is defined with a set of triples th ∈ T and encoded as (h rdf:type qb4o:HierarchyProperty; qb4o:hasLevel l; qb4o:inDimension D). Each hierarchy h ∈ H is defined as h = (Lh, Rh), with a set of (hierarchy) levels Lh, which is a subset of the set of levels Ld of the dimension D, where Lh ⊆ Ld ∈ D. Ld contains the initial base level of the dimension in addition to the hierarchy levels Lh. For example, the customer–location hierarchy can be defined by the levels customer, city, country, etc., where customer is the base level and contained only in Ld. Due to the nature of the hierarchies, a hierarchy entails a roll-up relation Rh between its levels, Rh = (Lc, Lp, card), where Lc and Lp are respectively the child and parent levels; the lower level is called child and the higher level is called parent. The cardinality card ∈ {1−1, 1−n, n−1, n−n} describes the minimum and maximum number of members in one level that can be related to a member in another level; e.g., Rh = (city, country, many-to-one) shows that the roll-up relation between the child level city and the parent level country is many-to-one, which means that each country can have many cities. In order to represent cardinalities between the child and the parent levels, blank nodes are created as hierarchy steps, _:hhs ∈ B. Hierarchy steps relate the levels of the hierarchy from a bottom (child) level to an upper (parent) level, which is defined with a set of triples ths ∈ T and encoded as (_:hhs rdf:type qb4o:HierarchyStep; qb4o:childLevel lhc; qb4o:parentLevel lhp; qb4o:cardinality card), where lhc ∈ Lc, lhp ∈ Lp, and card ∈ {1−1, 1−n, n−1, n−n}.

Dimensions. An n-dimensional cube schema has a set of dimensions D = {d1, d2, ..., dn}, and each d ∈ D is defined as a tuple d = (L, H), with a set of levels Ld organized into hierarchies Hd. Dimensions inherently have all the levels of their hierarchies, plus an initial base level. For each

4 RDF Schema: http://www.w3.org/TR/rdf-schema/   5 XML Schema: http://www.w3.org/TR/xmlschema11-1/


dimension d ∈ D in the cube schema CS, there is a set of triples td ∈ T, which is encoded as (d rdf:type qb:DimensionProperty; qb4o:hasHierarchy h). For example, customerDim is a dimension with a location and a customer type hierarchy, where location expands to levels of the customer's location (e.g., city, country, etc.) and customer type expands to levels of the customer's type (e.g., profession, branch, etc.).

Measures. A measure m ∈ M = {m1, m2, ..., mn} is a property which is associated to the facts. Measures are given in the cube schema CS with a set of triples tm ∈ T, which is encoded as (m rdf:type qb:MeasureProperty; rdfs:subPropertyOf sdmx-measure:obsValue; rdfs:domain xsd:Schema). Measures are defined with a sub-property from the Statistical Data and Metadata Exchange (SDMX) definitions, sdmx-measure:obsValue, which is the value of a particular variable for a particular observation. Similarly to the attributes, rdfs:domain specifies the schema of the measure property and rdfs:range defines what values the property can take, i.e., integer, decimal, etc., in the instances. For example, quantity and price are measures of a fact (e.g., sales) where the instance values can be given respectively in the form "13"^^xsd:positiveInteger and "42.40"^^xsd:decimal. Measures are associated with observations (facts) and related to dimension levels in the DSD as explained in the following.

Facts. A fact f ∈ F = {f1, f2, ..., fn} is related to values of dimensions and measures. The relation is described by components at the schema level of the fact cube definition, by a set of triples tf ∈ T, which is encoded as (F rdf:type qb:DataStructureDefinition; qb:component [qb4o:level l; qb4o:cardinality card]; qb:component [qb:measure m; qb4o:aggregateFunction BIF]). The cardinality card ∈ {1−1, 1−n, n−1, n−n} represents the cardinality of the relationship between facts and level members. The specification of the aggregate functions for measures is required in the definition of the cube schema. The standard way of representing typical aggregate functions is defined by QB4OLAP, namely the built-in functions BIF ∈ {Sum, Avg, Count, Min, Max}. For example, a fact schema F can be the sales of a company, which has associated dimensions and measures defined as components, e.g., product and price. Finally, the facts F = {f1, f2, ..., fn} are given at the instance level, where each fact f has a unique IRI I; these are observations. This is encoded as (f rdf:type qb:Observation). An example of a fact instance f with its relation to measure values and dimension levels is a "sale" transacted to customer "John" (value of a dimension level), for a product "chocolate" (value of another dimension level), which has a unit price of "29.99" (value of a measure) euros and a quantity of "20" (value of another measure) boxes. The cardinality of the dimension level customer and the fact member is many-to-one, where several

6 http://sdmx.org/

sales can be transacted to the same customer (i.e., John). The aggregate function for the measure unit price is specified as "average", while quantity can be specified as "sum". We gave the cube schema CS = (D, M, F) members above, where dimensions d ∈ D are defined as a tuple of dimension levels Ld and hierarchies H, d = (Ld, H); a hierarchy h ∈ H is defined with hierarchy levels Lh such that h = (Lh), where Lh ⊆ Ld; and a level l contains attributes Al as l = (Al).
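For illustration only, a minimal Turtle sketch of the fact instance described above could look as follows; the ex: identifiers are hypothetical, and the dimension levels and measures are assumed to be declared in the corresponding DSD.

# A sales fact (observation) linked to level members and measure values
ex:sale100 a qb:Observation ;
    qb:dataSet   ex:salesDataSet ;           # the data set (cube) it belongs to
    ex:customer  ex:customer_John ;          # value of the customer dimension level
    ex:product   ex:product_Chocolate ;      # value of the product dimension level
    ex:unitPrice "29.99"^^xsd:decimal ;      # measure, aggregated with Avg
    ex:quantity  "20"^^xsd:positiveInteger . # measure, aggregated with Sum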

4.2 Defining Spatially Enhanced MD Data in QB4SOLAP

QB4SOLAP adds a new concept to the metamodel, the geo:Geometry class from the OGC schemas. We define the QB4SOLAP extension to the cube schema in the description of the following spatial MD data elements, which were explained in Sect. 3.2.

Spatial Attributes. Each attribute is defined over a domain (Sect. 4.1). Every attribute with a geometry domain (a : domg ∈ A) is a member of the geo:Geometry class; such attributes are called spatial attributes as, which are defined in the cube schema CS by a set of triples tag ∈ T and encoded as (as rdf:type qb:AttributeProperty; rdfs:domain geo:Geometry). The type (point, polygon, line, etc.) of each spatial attribute is assigned with the rdfs:range predicate in the instances. For example, a spatial attribute can be the "capital city" of a country, which is represented through a point geometry.

Spatial Levels. Each spatial level ls ∈ L is defined with a set of triples tls ∈ T in the cube schema CS and encoded as (ls rdf:type qb4o:LevelProperty; qb4o:hasAttribute a, as; geo:hasGeometry geo:Geometry). Spatial levels must be a member of the geo:Geometry class and might have spatial attributes. For example, the country level is a spatial level which has a polygon geometry and might also record the geometry of the capital city in its level attributes as a point type.

Spatial Hierarchies. A hierarchy hs ∈ H is spatial if it relates two or more spatial levels ls. A spatial hierarchy step defines the relation between the spatial levels with a roll-up relation, as in conventional hierarchy steps (Sect. 4.1). QB4SOLAP introduces topological relations Trel (Sect. 3.1) besides cardinalities in the roll-up relation, which is encoded as R = (Lc, Lp, card, Trel) for the spatial hierarchy steps. Let tshs ∈ T be a set of triples representing a hierarchy step for spatial levels in hierarchies, which is given with a blank node _:shhs ∈ B and encoded as (_:shhs rdf:type qb4o:HierarchyStep; qb4o:childLevel slhc; qb4o:parentLevel slhp; qb4o:cardinality card; qb4so:hasTopologicalRelation Trel), where slhc ∈ Lc and slhp ∈ Lp. For example, a spatial hierarchy

7 OGC Schemas: http://schemas.opengis.net/

is "geography", which should have spatial levels (e.g., customer, city, country, and continent) with the roll-up relation Rh = (city, country, many-to-one, within), which also specifies that the child level city is "within" the parent level country, in addition to the hierarchy steps from Sect. 4.1.

Spatial Dimensions. Dimensions are identified as spatial only if they have at least one spatial hierarchy. More than one dimension can share the same spatial hierarchy and the spatial levels which belong to that hierarchy. QB4SOLAP uses the same schema definitions of the dimensions as in Sect. 4.1. For example, a spatial dimension is the customer dimension, which has a spatial hierarchy geography.

Spatial Measures. Each spatial measure ms ∈ M is defined in the cube schema CS by a set of triples tms ∈ T and encoded as (ms rdf:type qb:MeasureProperty; rdfs:subPropertyOf sdmx-measure:obsValue; rdfs:domain geo:Geometry). The class of the measure value is given with the property rdfs:domain, and rdfs:range assigns the values from the geo:Geometry class, i.e., point, polygon, etc., at the instance level. Spatial measures are represented by a geometry, thus they use a different schema than conventional (numeric) measures. The schemas for spatial measures follow common geometry serialization standards that are used in OGC schemas. For example, a spatial measure is the coordinates of an accident location, which is given as a point geometry type and associated to an observation fact of accidents.
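Since the GeoNorthwind use case in Sect. 5 only has conventional measures, the following hedged Turtle sketch (hypothetical ex: names, prefixes as elsewhere in the thesis) shows how the accident-location measure mentioned above could be declared at the schema level, mirroring the range declarations used for spatial attributes in Sect. 5.

# A spatial measure whose values are point geometries (accident locations)
ex:accidentLocation a qb:MeasureProperty ;
    rdfs:subPropertyOf sdmx-measure:obsValue ;
    rdfs:domain geo:Geometry ;
    rdfs:range  geo:Point , geo:wktLiteral .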

Spatial Facts. A spatial fact Fs relates several dimensions of which two or more are spatial. If there are several spatial dimension levels (ls) related to the fact, topological relations Trel (Sect. 3.1) between the spatial members of the fact instance may be required, which is not necessarily imposed for all spatial fact cubes. Ideally, a spatial fact cube has spatial measures (ms) as its members, which makes it possible to aggregate along spatial measures with the spatial aggregation functions Sagg (Sect. 3.1). The representation of a complete spatial fact cube at the schema level in RDF is given by a set of triples tfs ∈ T and encoded as (Fs rdf:type qb:DataStructureDefinition; qb:component [qb4o:level ls; rdfs:subPropertyOf sdmx-dimension:refArea; qb4o:cardinality card; qb4so:TopologicalRelation Trel]; qb:component [qb:measure ms, sdmx-measure:obsValue; qb4o:aggregateFunction BIF′]). QB4SOLAP extends the built-in functions of QB4OLAP with spatial aggregation functions as BIF′ = BIF ∪ Sagg, which is added with the class qb4so:SpatialAggregateFunction to the metamodel in Fig. B.3. An example of a spatial fact instance fs with its relation to measure values and dimension levels is a traffic "accident" that occurred on the highway "E–45"

8 The Well Known Text (WKT) serialization aligns the geometry types with ISO 19125 Simple Features [ISO 19125-1], and the GML serialization aligns the geometry types with the [ISO 19107] Spatial Schema.


(value of the highway spatial dimension level) with the coordinate points of the location "57.013, 9.939" (value of the location spatial measure). The cardinality of the dimension level highway and the fact member is many-to-one, where several accidents might take place on the same highway. The spatial aggregate function for the spatial measure location (coordinate points) can be specified as the "convex hull" area of the accident locations.
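A corresponding instance-level sketch of such a spatial fact could be written as below; the ex: identifiers are hypothetical, and the exact literal datatype for the point geometry depends on the triple store (cf. the serialization standards in footnote 8).

# A spatial fact (observation) for the accident example above
ex:accident42 a qb:Observation ;
    qb:dataSet ex:accidentDataSet ;
    ex:highway ex:highway_E45 ;            # spatial dimension level member
    # spatial measure value as a WKT point (longitude latitude, WGS84)
    ex:accidentLocation "POINT(9.939 57.013)"^^geo:wktLiteral .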

4.3 SOLAP Operators

The proposed vocabulary QB4SOLAP allows publishing spatially enhanced multidimensional RDF data, which allows us to query with SOLAP operations. Subqueries and aggregation functions in SPARQL 1.1 make it possible to operate with OLAP queries on multidimensional RDF data. Moreover, spatially enhanced RDF stores provide functions, to an extent, for querying with topological relations and spatial numeric operations. In the following, we define common OLAP operators with spatial conditions in order to formalize spatial OLAP query classes. Spatial conditions can be selected from a range of operation classes that can be applied on spatial data types (Sect. 3.1). Let S be any spatial operation where S = (Sagg ∪ Trel ∪ Nop) to represent a spatial condition in a SOLAP operation. The following OLAP operators are given with a spatial extension to the well-known OLAP operators defined over cubes, based on the Cube Algebra operators [20].

S–Roll–up. Given a cube C, a dimension d ∈ C, and an upper dimension level lu ∈ d, such that l(a : domg) →* lu, where l(a : domg) represents the level in dimension d with attributes (as) whose domain is a geometry type. Let Rs be the spatial roll-up relation, which comprises S and the traditional roll-up relation R, such that Rs = S(d, l(a : domg)) ∪ R(C, d, lu) → C′.

Initially, in the semantics of S–Roll–up above, the spatial constraint S is applied over a dimension d on the spatial attributes as along levels l. As a result of the roll-up relation R, the measures are aggregated up to level lu along d, which returns a new cube C′. Note that applying S on the spatial level attributes as of dimension d operates on the hierarchy step l → lu with a dynamic spatial hierarchy (see Sect. 3.2). For example, the query “total sales to customers by city of the closest suppliers" implies an S-Roll-up operator.

S–Drill–down. Analogously, S–Drill–down is the inverse operation of S–Roll–up, which disaggregates previously summarized data down to a child level. For example, the query “average sales of the employees from the biggest city in its country" implies an S-Drill-down operator by disaggregating data from the (parent) country level to the (child) city level while imposing a spatial condition (the area from Nop to choose the biggest city).

S–Slice. Given a cube C with n dimensions D = {d1, d2, ..., dn} ∈ C, let S′

9 http://www.w3.org/TR/sparql11-query/

be the traditional slice operator, which removes a dimension d from the cube C, and let Ss be the spatial slice operator, which comprises S′ and the spatial function S that fixes a single value in a level of L = {l1, l2, ..., ln} ∈ d, defined as follows: Ss = S′(C, d) ∪ S(d, l(a : domg)) → C′.

Note that the spatial function is applied on the spatial attributes of the selected level, and measures are aggregated along dimension d up to level All. The result is a new cube C′ with n − 1 dimensions D′ = {d1, d2, ..., dn−1} ∈ C′. For example, the query “total sales to the customers located in the city within a 10 km buffer area from a given point" implies an S-Slice operator, which dynamically defines the city level by (fixing) a specified buffer area around a given custom point in the city.

S–Dice. The dice operation is analogous to selection in relational algebra, σφ(R), except that the argument is a cube C: σφ(C). In SOLAP, dice is not a plain select operation but rather a nested “select” and a “spatial filter” operation. S-Dice Ds keeps the cells of a cube C that satisfy a spatial Boolean condition S(φ) over spatial dimension levels, attributes, and measures, and is defined as Ds = (C, S(φ)) → C′, where S(φ) = S(σaφb(C)) ∨ S(σaφv(C)), a and b are spatial levels (ls), geometry attributes (a : domg), or measures (m, ms), v is a constant value, and the result is a sub-cube C′ ⊂ C. For example, the query “total sales to the customers which are located less than 5 km from their city" implies an S-Dice operator.

In this paper, we focus on direct querying of single data cubes. The integration of several cubes through S-Drill-across or set-oriented operations such as Union, Intersection, and Difference [20] is out of scope and remains future work. The actual use of these query classes in SPARQL with the instance data is given in Sect. 6.

5 Use Case Scenario: GeoNorthwind Data Warehouse

Figure B.4 shows the conceptual schema of the GeoNorthwind DW use case. GeoNorthwind DW has synthetic data about companies and their sales; however, it is well suited for representing MD data modeling concepts due to its rich dimensions and hierarchies. It is a good proof-of-concept use case to show how to implement spatial data cube concepts on the SW. We show next how to express the conceptual schema of GeoNorthwind in QB4SOLAP.

In the use case, measures are given in the Sales cube. All measures are conventional. The members of the GeoNorthwind DW are given with the gnw: prefix.


[Fig. B.4 (diagram): the Sales fact has the measures Quantity, UnitPrice (Avg), Discount (Avg), SalesAmount, Freight, and NetAmount, and is linked to the dimensions Time (Calendar hierarchy: Date → Month → Quarter → Year), Customer and Supplier (shared Geography hierarchy: City → State → Country), Order, Employee, and Product (Categories hierarchy: Product → Category).]

Fig. B.4: Conceptual MD Schema of the GeoNorthwind DW

The underlying syntax for the RDF representation is given in Turtle syntax in the boxes. An example of a measure in the cube schema is given in the following, as defined in Sect. 4.1.

gnw:quantity a rdf:Property, qb:MeasureProperty;
    rdfs:subPropertyOf sdmx-measure:obsValue;
    rdfs:range xsd:integer.

In the following, a spatial attribute of the spatial level gnw:state is given along with the level and attribute properties. The spatial level has a geometry gnw:statePolygon and, independently, a spatial attribute gnw:capitalGeo. Each spatial attribute in the schema is defined separately by using common RDF and standard spatial schemas to represent its domain and data type, as described in Sect. 4.2.

gnw:state a qb4o:LevelProperty;
    qb4o:hasAttribute gnw:stateName, gnw:stateType, gnw:stateCapital, gnw:capitalGeo;

10 Turtle: http://www.w3.org/TR/turtle/   11 For our tests we used Virtuoso Universal Server; virtrdf:Geometry is a special RDF typed literal which is used for geometry objects in Virtuoso. Normally, WGS84 (EPSG:4326) is the SRID of any such geometry.


    geo:hasGeometry gnw:statePolygon.

gnw:capitalGeo a qb:AttributeProperty;
    rdfs:domain geo:Geometry;
    rdfs:range geo:Point, geo:wktLiteral, virtrdf:Geometry.

In the next listing, an example of a spatial dimension from the use case data is gnw:customerDim, which is given with its spatial hierarchy gnw:geography (Sect. 4.1, 4.2). The spatial hierarchy is organized into levels (i.e., city, state, country, etc.), where the qb4o:hasLevel predicate indicates the levels that compose the hierarchy. Each hierarchy in dimensions is represented with the qb4o:inDimension predicate, referring to the dimension(s) it belongs to. The levels given in the dimension hierarchy are all spatial, and a sample representation of a spatial level is given above.

gnw:customerDim a rdf:Property, qb:DimensionProperty;
    qb4o:hasHierarchy gnw:geography.

gnw:geography a qb4o:HierarchyProperty;
    qb4o:hasLevel gnw:city, gnw:state, gnw:region, gnw:country, gnw:continent;
    qb4o:inDimension gnw:customerDim, gnw:supplierDim.

Each hierarchy step is added to the schema as a blank node (_:hsi) with the qb4o:HierarchyStep property, in which the cardinality and topological relationships are represented between the child and parent levels as follows:

_:hs1 a qb4o:HierarchyStep;
    qb4o:inHierarchy gnw:geography;
    qb4o:childLevel gnw:customer, gnw:supplier;
    qb4o:parentLevel gnw:city;
    qb4o:cardinality qb4o:ManyToOne;
    qb4so:hasTopologicalRelation qb4so:Within.

The components of the facts are described at the schema level in the cube definition. The dimension level gnw:customer is given with the sdmx-dimension:refArea property, which indicates the spatial characteristic of the dimension. Measures require the specification of their aggregate functions in the cube definition. As there are only numeric measures in the use case data, the aggregate function for the sample measure gnw:quantity is given as qb4o:sum. A general overview of the cube schema CS is given with the related components as follows:

### Cube definition ###
gnw:GeoNorthwind rdf:type qb:DataStructureDefinition;
    ### Lowest level for each dimension in the cube ###
    qb:component [qb4o:level gnw:customer, sdmx-dimension:refArea;
                  qb4o:cardinality qb4o:ManyToOne];
    ### Measures in the cube ###
    qb:component [qb:measure gnw:quantity;
                  qb4o:aggregateFunction qb4o:sum].

According to QB4SOLAP, a spatial fact cube may contain spatial measure components besides spatial dimensions. The implementation scope of this work

covers only spatial facts, with spatial dimension and numerical measure components.

6 Querying the GeoNorthwind DW in SPARQL

We show next how some of the spatial OLAP queries from Sect. 4.3 can be expressed in SPARQL.

Query 1 (S-Roll-Up): Total sales to customers by city of the closest suppliers.

SELECT ?city (SUM(?sales) AS ?totalSales)
WHERE { ?o a qb:Observation; gnw:customerID ?cust;
           gnw:supplierID ?sup; gnw:salesAmount ?sales.
        ?cust qb4o:inLevel gnw:customer; gnw:customerGeo ?custGeo;
              gnw:customerName ?custName; skos:broader ?city.
        ?city qb4o:inLevel gnw:city.
        ?sup gnw:supplierGeo ?supGeo.
        # Inner Select: Distance to the closest supplier of the customer
        { SELECT ?cust1 (MIN(?distance) AS ?minDistance)
          WHERE { ?o a qb:Observation; gnw:customerID ?cust1;
                     gnw:supplierID ?sup1.
                  ?sup1 gnw:supplierGeo ?sup1Geo.
                  ?cust1 gnw:customerGeo ?cust1Geo.
                  BIND (bif:st_distance(?cust1Geo, ?sup1Geo) AS ?distance) }
          GROUP BY ?cust1 }
        FILTER (?cust = ?cust1 &&
                bif:st_distance(?custGeo, ?supGeo) = ?minDistance) }
GROUP BY ?city
ORDER BY ?totalSales

The query above shows the spatial roll-up operation example from Sect. 3.2 with the actual use case data. The semantics of the s-roll-up operator were explained in Sect. 4.3. The inner select verifies the spatial condition in order to find the closest distance to suppliers from the customers. The outer select prepares the traditional roll-up of the total sales from the customer (child) level to the city (parent) level. The filter on the customer and supplier distance creates the aforementioned dynamic spatial hierarchy based on the proximity of the suppliers.

Query 2 (S-Slice): Total sales to the customers located in the city within a 10 km buffer area from a given point.

SELECT ?custName ?cityName (SUM(?sales) AS ?totalSales)
WHERE { ?o rdf:type qb:Observation; gnw:customerID ?cust;
           gnw:salesAmount ?sales.
        ?cust gnw:customerName ?custName; skos:broader ?city.
        ?city gnw:cityGeo ?cityGeo; gnw:cityName ?cityName.
        FILTER (bif:st_within(?cityGeo, bif:st_point(2.3522, 48.856), 10)) }
GROUP BY ?custName ?cityName
ORDER BY ?custName

The semantics of the above s-slice operator were given in Sect. 4.3. The traditional slice operator removes a dimension by fixing a single value in a level of the dimension (e.g., CityName = "Paris"). On the

12 SPARQL endpoint is available at: http://extbi.ulb.ac.be:8890/sparql.

other hand, s-slice dynamically defines the city level by a specified buffer area around a given custom point in the city. Thus, s-slice removes the dimension customer and its instances in the city Paris, but keeps only the customer instances within the 10 km buffer area of the desired location. The project content with the corresponding data sets and full query examples is available at: http://extbi.cs.aau.dk/QB4SOLAP/index.php.
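For completeness, the S-Dice example from Sect. 4.3 (“total sales to the customers which are located less than 5 km from their city”) could be sketched along the same lines. The following is an illustrative, untested sketch rather than one of the published queries; it reuses the gnw: properties of Queries 1 and 2 and assumes, as in Query 1, that bif:st_distance returns distances in kilometers for these WGS84 geometries.

SELECT ?custName (SUM(?sales) AS ?totalSales)
WHERE { ?o a qb:Observation; gnw:customerID ?cust; gnw:salesAmount ?sales.
        ?cust gnw:customerName ?custName; gnw:customerGeo ?custGeo;
              skos:broader ?city.
        ?city gnw:cityGeo ?cityGeo.
        # Spatial Boolean condition of the s-dice (customer-to-city distance)
        FILTER (bif:st_distance(?custGeo, ?cityGeo) < 5) }
GROUP BY ?custName
ORDER BY ?custName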

7 Conclusion and Future Work

In this paper, we studied the modeling issues of spatially enhanced MD data cubes in RDF, defined the concept of SOLAP operators, and implemented them in SPARQL. We showed that, in order to model spatial DWs on the SW, an extended representation of MD cube elements was required. We based our representation on the most recent QB4OLAP vocabulary and made it viable for spatially enhanced MD data models through the new QB4SOLAP metamodel. This allows users to publish spatial MD data in RDF format. Then, we defined well-known OLAP operations on data cubes with spatial conditions in order to introduce spatial OLAP query classes and formally define their semantics. Subsequently, we presented a use case and implemented real-world SOLAP queries in SPARQL to validate our approach. Future work will be conducted in two areas: 1) defining complete formal techniques and algorithms for generating SOLAP queries in SPARQL based on a high-level MD Cube Algebra as in [20], and extending the coverage of SOLAP operations over multiple RDF cubes in SPARQL, i.e., to support S-Drill-Across; 2) implementing our QB4SOLAP approach on a more complex case study with spatial measures and facts, which can support a spatial aggregation (S-Aggregation) operator over measures with geometries. In order to support this S-Aggregation operator in SPARQL, we will also investigate creating user-defined SPARQL functions.

References

[1] L. Etcheverry, A. Vaisman, and E. Zimányi, “Modeling and Querying Data Warehouses on the Semantic Web using QB4OLAP,” in Data Warehousing and Knowledge Discovery (DaWaK’14), vol. 8646. Springer, 2014, pp. 45–56, https://dx.doi.org/10.1007/978-3-319-10160-6_5.

[2] B. Kämpgen, S. O’Riain, and A. Harth, “Interacting with Statistical Linked Data via OLAP Operations,” in The Semantic Web: ESWC 2012 Satellite Events, vol. 7540. Springer, 2012, pp. 87–101, https://dx.doi.org/10.1007/978-3-662-46641-4_7.


[3] A. Abelló, O. Romero, T. Pedersen, R. Berlanga Llavori, V. Nebot, M. Aramburu, and A. Simitsis, “Using Semantic Web Technologies for Exploratory OLAP: A Survey,” IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 27, no. 2, pp. 571–588, 2014, https://doi.org/10.1109/TKDE.2014.2330822.

[4] C. Diamantini and D. Potena, “Semantic Enrichment of Strategic Datacubes,” in Proceedings of the 11th International Workshop on Data Warehousing and OLAP. ACM, 2008, pp. 81–88.

[5] V. Nebot, R. Berlanga, J. M. Pérez, M. J. Aramburu, and T. B. Pedersen, “Multidimensional Integrated Ontologies: A Framework for Designing Semantic Data Warehouses,” in Journal on Data Semantics XIII. Springer, 2009, pp. 1–36.

[6] R. Cyganiak, D. Reynolds, and J. Tennison, “The RDF Data Cube Vocabulary,” 2014.

[7] P. Revesz, Introduction to Databases: From Biological to Spatio-Temporal. Springer, 2009.

[8] J. Han, N. Stefanovic, and K. Koperski, “Selective Materialization: An Efficient Method for Spatial Data Cube Construction,” in Research and Development in Knowledge Discovery and Data Mining (PAKDD’98). Springer, 1998, pp. 144–158, https://dx.doi.org/10.1007/3-540-64383-4_13.

[9] E. Malinowski and E. Zimányi, Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications. Data-Centric Systems and Applications. Springer, 2008, https://dx.doi.org/10.1007/978-3-540-74405-4.

[10] A. Vaisman and E. Zimányi, “A Multidimensional Model Representing Continuous Fields in Spatial Data Warehouses,” in Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS’09). ACM, 2009, pp. 168–177, https://doi.acm.org/10.1145/1653771.1653797.

[11] L. I. Gómez, S. A. Gómez, and A. A. Vaisman, “A Generic Data Model and Query Language for Spatiotemporal OLAP Cube Analysis,” in Proceedings of the 15th International Conference on Extending Database Technology (EDBT’12). ACM, 2012, pp. 300–311, https://doi.acm.org/10.1145/2247596.2247632.

[12] R. Battle and D. Kolas, “Enabling the Geospatial Semantic Web with Parliament and GeoSPARQL,” Semantic Web Journal (SWJ), vol. 3, no. 4, pp. 355–370, 2012, https://dx.doi.org/10.3233/SW-2012-0065.


[13] K. Kyzirakos, M. Karpathiotakis, and M. Koubarakis, “Strabon: A Semantic Geospatial DBMS,” in The Semantic Web: 11th International Semantic Web Conference (ISWC’12). Springer, 2012, pp. 295–311, https://dx.doi.org/10.1007/978-3-642-35176-1_19.

[14] M. Koubarakis, M. Karpathiotakis, K. Kyzirakos, C. Nikolaou, and M. Sioutis, “Data Models and Query Languages for Linked Geospatial Data,” in Reasoning Web. Semantic Technologies for Advanced Query Answering. Springer, 2012, pp. 290–328, https://dx.doi.org/10.1007/978-3-642-33158-9_8.

[15] C. Stadler, J. Lehmann, K. Höffner, and S. Auer, “LinkedGeoData: A Core for a Web of Spatial Open Data,” Semantic Web Journal (SWJ), vol. 3, pp. 333–354, 2012, https://dx.doi.org/10.3233/SW-2011-0052.

[16] G. Rojas, G. Giannopoulos, and J. J. L. Daniel Hladky, “Managing Geospatial Linked Data in the GeoKnow Project,” in The Semantic Web in Earth and Space Science. Current Status and Future Directions, vol. 20. IOS Press, 2015, p. 51.

[17] R. P. Deb Nath, K. Hose, and T. B. Pedersen, “Towards a Programmable Semantic Extract-Transform-Load Framework for Semantic Data Warehouses,” in Proceedings of the 18th International Workshop on Data Warehousing and OLAP (DOLAP’15). ACM, 2015, pp. 15–24, https://doi.org/10.1145/2811222.2811229.

[18] A. B. Andersen, N. Gür, K. Hose, K. A. Jakobsen, and T. B. Pedersen, “Publishing Danish Agricultural Government Data as Semantic Web Data,” in Semantic Technology: 4th Joint International Semantic Technology Conference (JIST’14), vol. 8943. Springer, 2014, pp. 178–186, https://dx.doi.org/10.1007/978-3-319-15615-6_13.

[19] S. Bimonte, F. Johany, and S. Lardon, “A First Framework for Mutually Enhancing Chorem and Spatial OLAP Systems,” in DATA, 2015.

[20] C. Ciferri, L. Gómez, M. Schneider, A. Vaisman, and E. Zimányi, “Cube Algebra: A Generic User-centric Model and Query Language for OLAP Cubes,” International Journal of Data Warehousing and Mining (IJDWM), vol. 9, no. 2, pp. 39–65, 2013, http://dx.doi.org/10.4018/jdwm.2013040103.


Paper C

A Foundation for Spatial Data Warehouses on the Semantic Web

Nurefşan Gür, Torben Bach Pedersen, Esteban Zimányi, and Katja Hose

The paper is an extended version of Paper B. It has been published in the Semantic Web Journal, Vol. 9, No. 5, pp. 557–587, 2018. DOI: 10.3233/SW-170281

Abstract

Large volumes of geospatial data are being published on the Semantic Web (SW), yielding a need for advanced analysis of such data. However, existing SW technologies only support advanced analytical concepts such as multidimensional (MD) data warehouses and Online Analytical Processing (OLAP) over non-spatial SW data. To remedy this need, this paper presents the QB4SOLAP vocabulary, which supports spatially enhanced MD data cubes over RDF data. The paper also defines a number of Spatial OLAP (SOLAP) operators over QB4SOLAP cubes and provides algorithms for generating spatially extended SPARQL queries from the SOLAP operators. The proposals are validated by applying them to a realistic use case.

© 2018 IOS Press and the authors. All rights reserved. Reprinted, with permission, from Nurefşan Gür, Torben Bach Pedersen, Esteban Zimányi, and Katja Hose. A Foundation for Spatial Data Warehouses on the Semantic Web. In: Semantic Web Journal, 2018. https://doi.org/10.3233/SW-170281
The layout has been revised.

1 Introduction

The Semantic Web (SW) has evolved, from focusing mostly on data publishing to also supporting increasingly complex queries such as interactive analytical queries. Simultaneously, the data available on the SW has evolved from being simple, mostly alphanumeric data, to also include complex data such as spatial data. Indeed, geospatial data is now common on the SW, but it remains difficult to analyze it.

In a non-SW context, the main tools for interactive data analyses have been Data Warehouses (DWs) and Online Analytical Processing (OLAP) tools and queries. DWs store large volumes of data and are designed with a multidimensional (MD) modeling approach, which has shown itself to be intuitive for interactive data analytics. Concretely, DWs consist of MD data cubes. The cells of the cube represent the topic of analysis, and associate observation facts with numerical measures that can be aggregated. For example, a sales fact cube has measures such as QuantitySold and SalesPrice. Facts are linked to dimensions, which provide contextual information, e.g., sales date, product, and location. Dimensions are perspectives, which are used to analyze data, and are organized into hierarchies with levels, e.g., Store, City, and Region, that allow users to analyze and aggregate measures at different levels of detail. Levels have a set of attributes that describe the characteristics of the level members.

In traditional DWs, the location dimension is widely used, but as a conventional dimension with alphanumeric data and thus only nominal reference to spatial concepts such as areas and places. This does not allow manipulating the location data spatially or deriving topological relations among the hierarchy levels of the location dimension. This yields a demand for truly spatial DWs for better analysis purposes. Including the geometric information of the location data significantly improves the analysis process (i.e., proximity analysis of the locations) with additional perspectives by revealing dynamic spatial hierarchy levels and new spatial members.

Similarly, providing deep spatial analytics support for spatial SW data is very valuable. Spatial data requires specific treatment techniques, in particular encoding, special functions, and different manipulation methods, which should be considered in the modeling process and querying. The current state of the art for the geospatial Semantic Web focuses on techniques for publishing, linking, and querying spatial data, but supports only “plain” spatial SW data (without support for spatial DW concepts such as spatial hierarchies, levels, and measures) and does not consider analytical queries over spatial RDF data (see Section 2 for details).

Problem Definition. The proliferation of open geospatial data on the SW creates possibilities for advanced analysis of such data. Many examples exist

of spatial Linked Open Data (LOD) published on the SW as RDF (footnotes 1–4). These datasets have observations and measures that are well suited for analytical queries (e.g., water/air quality measurements, immigration rates, EU subsidies in agriculture, crop revenue, etc.). However, such datasets are typically not modeled with spatial dimension levels and hierarchies. Thus, they cannot be queried with interactive spatial analytical queries (a.k.a. SOLAP) on the SW. In the current state of the SW, if a (spatial) DW user would like to query the existing spatial RDF data from the SW with SOLAP operations, the user needs to download the RDF data, map it to a relational data model (i.e., with a snowflake schema), and then import it into a traditional spatial data warehouse in order to query with SOLAP, which is slow, labor-intensive, and stores the data in a non-open format.

[Fig. C.1 (diagram): a user issues SOLAP operations, which are translated to SPARQL and evaluated over spatial RDF endpoints that are annotated with QB4SOLAP to form spatial RDF data warehouses.]

Fig. C.1: QB4SOLAP approach to SOLAP on the SW

Our Approach. On the contrary, annotating spatial RDF datasets with QB4SOLAP allows users to define spatial multidimensional concepts on top of existing RDF data [1, 2]. Hence, the user can create and publish spatial data warehouses on the Semantic Web, which can be easily queried with SOLAP operations. Fig. C.1 depicts the general workflow scenario, where the spatial RDF datasets from endpoints can be annotated with QB4SOLAP. This makes it possible for end users to use SOLAP queries. However, writing a SOLAP query in SPARQL can be very complicated for users inexperienced with SPARQL (e.g., traditional DW users). Due to the lack of MD semantics of spatial RDF data and the lack of translation techniques from high-level SOLAP expressions to SPARQL, there is a considerable entry barrier for advanced spatial data analysis on the SW for data warehouse users.

1 EuroStat: http://ec.europa.eu/eurostat   2 UK Environmental Data: http://environment.data.gov.uk   3 Danish Agricultural Data: https://datahub.io/dataset/govagribus-denmark   4 Australian Climate Observations: https://datahub.io/dataset/acorn-sat


Contributions. In order to address these issues, this paper makes a number of contributions. First, we propose QB4SOLAP, a generic and extensible vocabulary (metamodel) for spatial DWs on the SW. QB4SOLAP extends the most recent stable version of the QB4OLAP vocabulary with spatial concepts. We provide a full formalization of QB4SOLAP. The key concepts of spatial cube members, spatial hierarchies and levels, spatial measures, spatial aggregate functions (e.g., union, buffer, and convex hull), and topological relations among spatial dimension and hierarchy level members (e.g., within, intersects, and overlaps) are defined. Second, we define a number of analytical Spatial OLAP (SOLAP) operators over the model, including their formal semantics. The operators support advanced analytical queries over MD geospatial SW data. Third, we provide algorithms for generating spatially extended SPARQL queries for individual and nested SOLAP operators, which allows writing SOLAP queries without knowledge of RDF/SPARQL. Fourth, we validate the vocabulary, operators, and query generation algorithms by applying them to a realistic use case.

Paper structure. The remainder of the paper is structured as follows. Section 2 discusses related work. Section 3 defines preliminary spatial and OLAP concepts. Section 4 defines the QB4SOLAP vocabulary, while Section 5 defines the SOLAP operators. Section 6 provides the SPARQL query generation algorithms. Finally, Section 7 concludes the paper and points to future research.

2 Related work

DW and OLAP technologies have been successful for analyzing large volumes of data [3]. Combining DW/OLAP technologies with RDF data makes RDF data sources more easily available for interactive analysis. The following work concerns the integration of DW/OLAP with the SW.

DW/OLAP and Semantic Web. Using OLAP to analyze SW data is considered in several approaches. Kämpgen et al. propose an extended model [4] on top of the RDF Data Cube Vocabulary (QB) [5] for interacting with statistical linked data via OLAP operations directly in SPARQL. However, it has the inherent limitations of QB and thus cannot support OLAP dimensions with hierarchies and levels, and built-in aggregate functions. Etcheverry et al. introduce QB4OLAP [6] as an extended vocabulary based on QB, with a full MD metamodel, supporting OLAP operations directly over RDF data with SPARQL queries. Nath et al. consider creating an Extract–Transform–Load (ETL) framework for semantic data warehouses [7]. Varga et al. present a comprehensive methodology for dimensional enrichment of statistical LOD by using QB4OLAP and provide a SW-based OLAP engine for traditional


DW users [8]. However, these approaches and vocabularies neither support spatial DWs nor provide SOLAP operators for the SW.

Spatial DW and OLAP. The constraint representation of spatial data has been the focus in many fields from databases to AI [9]. Extending OLAP with spatial features has also attracted the attention of the data warehousing community. Bédard et al. first introduced the term SOLAP [10] in 1997. SOLAP systems [11, 12] have since been significantly improved. Respectively, various papers improve the spatial aggregation functions and techniques [13–16]. Several conceptual models have been proposed for representing spatial data in data warehouses. Stefanovic et al. [17] consider constructing and materializing spatial cubes in their proposed model. The MultiDim conceptual model, introduced by Malinowski and Zimányi [18], copes with spatial features and is extended in [19] to include complex geometric features (continuous fields), with a set of operations and an MD calculus supporting spatial data types. Gómez et al. [20] propose an algebra and a general framework for OLAP cube analysis on discrete and continuous spatial data. Even though spatial data warehousing is thus widely studied, those studies are limited to traditional non-semantic spatial data warehouses and SOLAP techniques. The work above considered neither Semantic Web data nor spatial analytical querying in SPARQL.

Geospatial Semantic Web. The Open Geospatial Consortium (OGC) has proposed GeoSPARQL [21] as a vocabulary to represent and query spatial data in RDF using an extension to SPARQL. Kyzirakos et al. present a comprehensive survey of data models and query languages for linked geospatial data in [22], and propose a semantic geospatial data store called Strabon in [23]. Strabon has an extensive query language called stSPARQL, which is however limited to the specific environment. LinkedGeoData is a significant contribution on interactive transformation of OpenStreetMap data to RDF data [24]. GeoKnow [25] is a more recent project with a focus on linking geospatial data from heterogeneous sources. Andersen et al. consider publishing/converting open spatial data as Linked Open Data [26]. However, none of these works consider the MD aspects of geospatial data or allow querying with SOLAP on the SW, unlike QB4SOLAP. The QB4SOLAP vocabulary is validated with both the running example use case, the GeoNorthwind data cube, and a substantial real-world use case, the GeoFarmHerdState data cube [2]. GeoFarmHerdState is a spatial data cube about livestock holdings in Denmark, which integrates environmental and geographical open data from several sources, thus enabling a range of interesting SOLAP queries.

In summary, none of the related work surveyed in the fields of “DW/OLAP and the SW", “Spatial DW and OLAP", and “Geospatial Semantic Web" provides a substantial foundation for modeling and querying

5 http://www.openstreetmap.org

spatial data warehouses on the Semantic Web, unlike the QB4SOLAP vocabulary, SOLAP operators, and SPARQL generation algorithms presented in this paper.

3 Preliminary concepts

In this section, we describe the spatial objects and the spatial operations that manipulate them. Then, we introduce data cubes and their spatial enhancement as spatial data cubes. Finally, we show the traditional OLAP operations, which manipulate data cubes, and explain the Spatial OLAP (SOLAP) operators, which manipulate spatial data cubes.

[Fig. C.2 (diagram): a three-dimensional cube for Sales data with the dimensions Customer (City level: Bremen, Hamburg, Aarhus, Odense), Time (Quarter level: Q1–Q4), and Product (Category level: Beverages, Condiments, Cereals, Seafood); the cells hold the measure values. Fig. C.3 (diagram): the same cube after rolling up the Customer dimension to the Country level (Germany, Denmark).]

Fig. C.2: A three-dimensional cube for Sales data       Fig. C.3: Roll-up to the Country level

3.1 Spatial objects

A spatial object represents a real-world object whose geographic features are important for an application. These geographic features are encoded using the geometry data type. Point, Line, and Polygon are the basic instantiable types of the geometry data type. Coordinates for the geometry data type are generally given in two dimensions with X, Y values. Geometries are associated with a spatial reference system (SRS), which describes the coordinate space in which the geometry is defined. There are several SRSs, and each of them is identified with a spatial reference system identifier (SRID). The World Geodetic System (WGS) is the most well-known SRS and the latest version is called WGS84, which is also used in our use case.
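As a small, hedged illustration (the ex: resource is hypothetical), a point geometry in WGS84 can be attached to a resource as a GeoSPARQL WKT literal; without an explicit CRS IRI prepended to the literal, WGS84 longitude/latitude order is assumed by default.

# A point geometry serialized as a WKT literal (longitude latitude)
ex:cityCenter geo:asWKT "POINT(9.939 57.013)"^^geo:wktLiteral .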

3.2 Spatial operations

There is a set of spatial operations that can be applied on spatial data. We grouped these operations into classes based on the common functionality of

the operators. These classes are defined next.

Definition 1. (Spatial aggregation) The operators in the spatial aggregation class Sagg aggregate two or more spatial objects and return a new spatial object. Union, Intersection, ConvexHull, and MinimumBoundingRectangle (MBR) are example operators of this class. Some spatial functions such as ConvexHull or MBR can also be interpreted as unary spatial functions with a single parameter, but here we only consider the aggregate versions of the functions. In order to make this clear, the aggregate versions of those functions are given with the prefix “Aggr" in the QB4SOLAP vocabulary (Fig. C.6). For our purpose, it is enough to group all spatial aggregate functions into a single group, although more fine-grained classification proposals for spatial aggregate functions exist [15].

Definition 2. (Topological relations) The operators in the topological relation class Trel are commonly expressed in the RCC8 and DE-9DIM models [27, 28]. Topological relations are Boolean predicates that specify how two spatial objects are related to each other. Examples of topological relations are Intersects, Disjoint, Equals, Overlaps, Contains, Within, Touches, Covers, CoveredBy, and Crosses.

Definition 3. (Numeric operations) The operators in the numeric operation class Nop take one or more spatial objects and return a numeric value. Perimeter, Area, NoOfInteriorRings, Distance, HaversineDistance, and NoOfGeometries are example operators of this class.
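To give a flavor of how these operation classes appear in queries, the following hedged SPARQL fragment combines a numeric operation and a topological-style condition as filters; the ex:hasGeometry property is hypothetical, and the bif: functions are the Virtuoso built-ins used in the query examples elsewhere in this thesis. Spatial aggregation (Sagg), in contrast, typically requires engine-specific or user-defined aggregate functions and is therefore not shown.

SELECT ?a ?b ?dist
WHERE { ?a ex:hasGeometry ?geomA .
        ?b ex:hasGeometry ?geomB .
        # Numeric operation (Nop): distance between the two geometries
        BIND (bif:st_distance(?geomA, ?geomB) AS ?dist)
        # Topological-style condition (Trel): ?geomA within 10 km of ?geomB
        FILTER (bif:st_within(?geomA, ?geomB, 10)) }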

3.3 Data cubes

Data warehouses store large volumes of data for decision support. They are based on the multidimensional model, which views data in an n-dimensional space, usually called a data cube. The cells of the cube represent the observation facts for analysis with a set of attributes called measures (e.g., a sales fact cube with measures product quantity and price). Facts are linked to dimensions, which provide perspectives to analyze data (e.g., sales date, product, and customer location). Dimensions are organized into hierarchies, which allow users to aggregate measures at various levels of detail. Hierarchies are composed of levels and there is always a unique top level All with just one member all. Levels have a set of attributes that describe the characteristics of the level members.

6 RCC8 (Region Connection Calculus) describes regions in Euclidean space or in a topological space by their possible relations to each other.
7 DE-9DIM (Dimensionally Extended Nine-Intersection Model) is a topological model that describes spatial relations of two geometries in two dimensions.


[Fig. C.4 (diagrams): (a) Categories hierarchy in the Product dimension (Product → Category → All); (b) Calendar hierarchy in the Time dimension (Time → Month → Quarter → Year → All); (c) spatial Geography hierarchies in the Customer and Supplier dimensions (Customer/Supplier → City → State → Country → All); (d) dynamic spatial Geography hierarchy in the Customer dimension (Customer → Closest City of the Supplier → State → Country → All).]

Fig. C.4: Dimension hierarchies

An example of a data cube with three dimensions (Customer, Time, and Product) and one measure (Quantity) is given in Fig. C.2. Each cell in the cube is an observation fact, which is characterized by dimension and measure values. The hierarchies of this cube are given in Fig. C.4(a)–(c). Thus, in the cube shown in Fig. C.2, the Product dimension is given at the Category level, the Time dimension at the Quarter level, and the Customer dimension at the City level. Measure values represent the measure Quantity of the sold products.


3.4 Spatial data cubes

A spatial data cube contains both conventional and spatial dimensions. A spatial dimension is a dimension which includes at least one spatial level, in which the application should store the spatial characteristics of the members. Similarly, a hierarchy is a spatial hierarchy if it has at least one spatial level. The spatial characteristics of the levels are captured by their geometries and can be recorded in the spatial attributes of the level. A spatial fact is a fact that relates several dimensions of which two or more are spatial. For example, consider a Sales spatial fact, which has spatial dimensions Customer and Supplier, each with a spatial hierarchy Geography composed of spatial levels City → State → Country → All (Fig. C.4(c)). These spatial levels record the spatial characteristics of their members with spatial attributes: Customer, Supplier, and City use a point spatial data type, whereas State and Country use a multi-polygon spatial data type.

Following the rules of the spatially extended MultiDim conceptual model [19], MD concepts such as levels are considered to be spatial only if they record the spatial characteristics of the concepts as geometries. For instance, “continent" might be considered as a spatial object, in theory or in other vocabularies. However, if there is no information about the geometry of the continents in the schema and in the instance data, continent does not become a spatial level (Ext. 7), although continent might still be a traditional level (Def. 7) of the spatial hierarchy Geography with alphanumeric attributes (i.e., continent name, code, etc.).

Spatial data cubes typically have spatial measures, which are also represented by a spatial data type. An example is a SalesPoint measure that stores the location of sales. Fig. C.7 shows the multidimensional schema of the GeoNorthwind data warehouse, which is used as the running example in the paper.

3.5 OLAP operators

OLAP operators are used for expressing queries over data cubes. The traditional OLAP operators are given next. The slice operator removes a dimension from a cube by selecting one instance in a dimension level. An example is “slice on City is equal to Odense”. The dice operator selects the cells in a cube that satisfy a Boolean condition. An example is “dice on the first and last quarter of the year”. The roll-up operator aggregates measures along a hierarchy to obtain data at a coarser granularity. An example is “roll-up to the Country level” (Fig. C.3). Finally, the drill-down operator disaggregates measures along a hierarchy to obtain data at a finer granularity. It is the inverse operation of roll-up. Starting from the cube in Fig. C.3, an example is “drill-down to the City

level”.

3.6 Spatial OLAP operators

Spatial OLAP (SOLAP) operates on spatial data cubes. SOLAP increases the analytical capabilities of OLAP by taking into account the spatial information in the cube. SOLAP operators involve spatial conditions or spatial functions by using the spatial operators defined in Sect. 3.2. Spatial conditions specify constraints on the geometries associated to cube members or measures, while spatial functions derive new data from the cube, which can be used, e.g., to derive dynamic spatial hierarchies or levels, as explained in the following example. Spatial extensions of the common OLAP operators are formally defined in Sect. 5.

Table C.1: Sample (instance) data for the Sales cube

                            Supplier
Customer City   Customer    s1     s2     s3     Total Sales
Düsseldorf      c1          8      –      3      11
Düsseldorf      c2          10     –      –      10
Dortmund        c3          7      4      –      11
Dortmund        c4          –      20     3      23
Münster         c5          –      –      30     30

Table C.2: Roll-up of the Sales cube          Table C.3: S-Roll-up of the Sales cube

Customer City   Sales         Closest Supplier City   Sales
Düsseldorf      21            Düsseldorf              25
Dortmund        34            Dortmund                20
Münster         30            Münster                 33

Table C.4: Customer to Supplier distance

Supplier City Supplier Customer City Düsseldorf Dortmund Münster Customer s1 s2 s3 c1 15 km 45 km 30 km Düsseldorf c2 15 km 60 km 60 km c3 15 km 30 km 45 km Dortmund c4 45 km 15 km 15 km Münster c5 60 km 45 km 15 km


Fig. C.5: Example map of Sales (instance) data

Example 1. Consider the summarized data for the Sales cube given in Table C.1, where '–' is used if there are no sales to customers from the corresponding suppliers. The data in Table C.1 is shown on the map in Fig. C.5, where the arrows between the supplier and customer locations represent the distance. The quantities of sold products are shown along these arrows.
The hierarchies in Fig. C.4(a)–(c) can be used to perform classical roll-up operations, where measures are aggregated from a child to a parent level. An example of such a roll-up operation is expressed by the query "total sales to customers by city", whose result is given in Table C.2.
On the other hand, as shown in Table C.4 and Fig. C.5, some customers may be closer to suppliers from other cities. For example, customer c3 is related to its city Dortmund by the traditional Geography hierarchy, but the customer is closer to the city Düsseldorf of supplier s1. Similarly, customer c4 in city Dortmund is closer to the city Münster of supplier s3. Fig. C.4(d) shows a new dynamic spatial hierarchy that can be obtained with a spatial roll-up (s-roll-up) operator that expresses the query "total sales to customers by city of the closest supplier". Such queries cannot be expressed on conventional hierarchies with traditional OLAP. The hierarchy in Fig. C.4(d) is created on the fly with the help of a spatial function computing the distance between customer and supplier locations. Therefore, using the s-roll-up operator, sales to customers are aggregated by the city of the closest suppliers, where Dortmund has a significant drop in the quantity of the sales from 34 (Table C.2) to 20 (Table C.3).
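To make the aggregation step concrete, the following minimal, self-contained SPARQL sketch reproduces the classical roll-up of Table C.2 from the toy data of Table C.1; the inline VALUES block and its literals are illustrative only and are not part of the GeoNorthwind endpoint.

SELECT ?city (SUM(?sales) AS ?totalSales)
WHERE {
  # Toy data: each customer with its city and total sales (Table C.1)
  VALUES (?cust ?city ?sales) {
    ("c1" "Düsseldorf" 11) ("c2" "Düsseldorf" 10)
    ("c3" "Dortmund"   11) ("c4" "Dortmund"   23)
    ("c5" "Münster"    30)
  }
}
GROUP BY ?city

Grouping by the customers' own city yields 21, 34, and 30 (Table C.2); the s-roll-up of Table C.3 instead groups by a city derived dynamically with a spatial distance function, as formalized in Sect. 5.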

4 The QB4SOLAP vocabulary

In this section, we formally define how to represent (spatial) data cubes in RDF. We use the GeoNorthwind data warehouse, whose conceptual schema is given in Fig. C.7, as the running example.

Fig. C.6: The QB4SOLAP vocabulary

The QB4OLAP [6] vocabulary allows defining cube schemas and cube instances as RDF triples. QB4OLAP is an extension of the RDF Data Cube Vocabulary (QB) [5] with multidimensional concepts in order to support OLAP operations directly over RDF data with SPARQL queries. We extended QB4OLAP (v1.2)8 with spatial concepts to obtain QB4SOLAP [1]. We based our extension on GeoSPARQL [29], a standard from the Open Geospatial Consortium (OGC) for representing and querying geospatial linked data for the Semantic Web. Since our base vocabulary QB4OLAP uses the MultiDim conceptual model to describe the multidimensional concepts, we base our definitions on a spatially extended version of the MultiDim model [19] for

8QB4OLAP v1.2: https://github.com/lorenae/qb4olap/blob/master/rdf/qb4olap.1.2. ttl

the spatial extension of the MD concepts. Fig. C.6 shows the QB4SOLAP vocabulary for representing a spatial cube schema and spatial cube members as RDF triples. A cube schema defines the structure of the cube in terms of dimension levels, measures, aggregation functions (e.g., SUM, AVG, COUNT) on measures, spatial aggregation functions (S_agg in Def. 1) on spatial measures, dimension hierarchies, and parent–child relationships between levels (including their cardinality and topological relationships for spatial levels). These schema-level metadata are used to define multidimensional datasets in RDF. Cube members are the instances of a cube schema that represent level members, facts, and measure values. As we will show in Sect. 6, we use the schema-level metadata to produce SPARQL queries that implement SOLAP operators on cube members.
Terms with capitalized initials and non-italic font in Fig. C.6 represent RDF classes, terms with capitalized initials and italic font represent RDF instances, and terms with non-capitalized initials represent RDF properties. Classes in external vocabularies are depicted with a light gray background and font. RDF Cube (QB), QB4OLAP, and QB4SOLAP classes are shown, respectively, with white, light gray, and dark gray backgrounds. Original QB terms are prefixed with qb:9. QB4OLAP and QB4SOLAP terms are prefixed, respectively, with qb4o:10 and qb4so:11. Spatial classes and properties are prefixed with geo:12.
In what follows, we first formally define RDF triples, and then discuss how to describe (spatial) multidimensional data using the QB4OLAP and QB4SOLAP vocabularies.

Definition 4. (RDF triple) An RDF triple t = (s, p, o) consists of three components: s is the subject, p is the predicate, and o is the object. RDF triples are defined over T = (I ∪ B) × I × (I ∪ B ∪ L), where I is the set of IRIs, B is the set of blank nodes, and L is the set of literals.

A set of RDF triples is referred to as a graph. We denote a QB4SOLAP graph by G, where G ⊂ T. The cube schema and cube instances are subsets of this graph and are denoted, respectively, by G^S and G^I, where G^S ⊂ G and G^I ⊂ G. Given an MD element x ∈ (I ∪ B) in a schema or instance graph G, we denote by G_x the subgraph of G for x, where G_x ⊂ G. We define the function id(x) : G → I that, given an MD element x, returns its identifier in I from the graph G. We use superscript notation to indicate the type of the identifier

9 RDF cube: http://purl.org/linked-data/cube#
10 QB4OLAP: http://purl.org/qb4olap/cubes#
11 QB4SOLAP: http://w3id.org/qb4solap#
12 GeoSPARQL: http://www.opengis.net/ont/geosparql#

from the cube schema graph (G^S) or the cube instance graph (G^I), e.g., id^S(x) for a cube schema identifier and id^I(x) for a cube instance identifier.
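For readability, prefix declarations are omitted from the SPARQL and RDF listings in the remainder of the paper. A query against the examples below would assume declarations along the following lines; note that the gnw: and gnwi: namespace IRIs are example-specific placeholders and not part of any published vocabulary.

PREFIX qb:    <http://purl.org/linked-data/cube#>
PREFIX qb4o:  <http://purl.org/qb4olap/cubes#>
PREFIX qb4so: <http://w3id.org/qb4solap#>
PREFIX geo:   <http://www.opengis.net/ont/geosparql#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX xsd:   <http://www.w3.org/2001/XMLSchema#>
PREFIX gnw:   <http://example.org/gnw#>    # placeholder schema namespace
PREFIX gnwi:  <http://example.org/gnwi#>   # placeholder instance namespace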

4.1 Defining spatial data cube schemas with QB4SOLAP

An n-dimensional cube schema CS is a tuple CS = (D, M, F), with a set of dimensions D, a set of measures M, and a fact F. A dimension d ∈ D has a set of hierarchies H(d). Each hierarchy h ∈ H(d) is organized into a set of levels L(h). Each level l ∈ L(h) has a set of attributes A(l). Each attribute a ∈ A(l) is defined over a domain. Each measure m ∈ M is also defined over a domain.
We define next how to represent a cube schema CS in RDF by using the QB4SOLAP vocabulary. We denote the RDF graph of the cube schema by G^S. In the examples we prefix the elements of G^S with gnw:. We follow a similar naming convention for schema elements as in QB4OLAP. If there is a possibility of confusion between different MD concepts with the same schema name, e.g., the customer dimension and the customer level, we suffix the dimensions with Dim (e.g., gnw:customerDim for the dimension and gnw:customer for the level). The subgraph of G^S that refers to a specific schema element x is denoted by G^S_x, and the unique identifier of x is denoted by id^S(x).

Definition 5. (Dimensions) An n-dimensional cube schema CS has a set of dimensions D = {d_1, ..., d_n} and each dimension d_i has a set of hierarchies H(d_i) (Def. 6). Each dimension d_i ∈ D is defined in the cube schema graph G^S with qb:DimensionProperty. Each hierarchy h ∈ H(d_i) is linked to its dimension d_i with the qb4o:hasHierarchy property. The RDF graph formulation of the dimensions D is represented as

G^S_D = ⋃_{i=1}^{n} G^S_{d_i}, where

G^S_{d_i} = {(id^S(d_i) rdf:type qb:DimensionProperty)}
          ∪ ⋃_{h ∈ H(d_i)} {(id^S(d_i) qb4o:hasHierarchy id^S(h))}

Extension 5. (Spatial dimensions) A dimension is spatial if it has at least one spatial level. A spatial dimension d_{i_s} belongs to the set of spatial dimensions D_s, which is a subset of the set of dimensions D, such that d_{i_s} ∈ D_s ⊆ D.


Fig. C.7: Conceptual multidimensional schema of the GeoNorthwind data warehouse
[Figure: the Sales fact (measures Quantity, UnitPrice: Avg, Discount: Avg, SalesAmount, Freight, SalesPoint) related to the Time (Calendar hierarchy: Date → Month → Quarter → Year, via OrderDate and DueDate), Customer, Supplier, Employee, and Product (Categories hierarchy: Product → Category) dimensions; Customer and Supplier share the spatial Geography hierarchy City → State → Country.]

Example 2. The triples below show how some of the dimensions of the GeoNorthwind DW (Fig. C.7) are represented in RDF using Def. 5 and Ext. 5. As we will see below, the Customer and Supplier dimensions are spatial as they both have a spatial hierarchy Geography.

# Dimensions
gnw:customerDim rdf:type qb:DimensionProperty ;
    qb4o:hasHierarchy gnw:customerGeography .
gnw:supplierDim rdf:type qb:DimensionProperty ;
    qb4o:hasHierarchy gnw:supplierGeography .
gnw:productDim rdf:type qb:DimensionProperty ;
    qb4o:hasHierarchy gnw:categories .
gnw:employeeDim rdf:type qb:DimensionProperty .

Definition 6. (Hierarchies) A dimension d has a set of hierarchies H(d) = {h_1, ..., h_m}, where each hierarchy h_i has a set of levels L(h_i) (Def. 7). Each hierarchy h_i ∈ H(d) is defined in the cube schema graph G^S with the qb4o:Hierarchy predicate and is linked with its dimension d by the qb4o:inDimension property. Each level l ∈ L(h_i) that belongs to a hierarchy h_i is defined with the qb4o:hasLevel property. The RDF graph formulation of the hierarchies H(d) is represented as

G^S_{H(d)} = ⋃_{i=1}^{m} G^S_{h_i}, where

G^S_{h_i} = {(id^S(h_i) rdf:type qb4o:Hierarchy)}
          ∪ {(id^S(h_i) qb4o:inDimension id^S(d))}
          ∪ ⋃_{l ∈ L(h_i)} {(id^S(h_i) qb4o:hasLevel id^S(l))}

Extension 6. (Spatial hierarchies) A hierarchy is spatial if it contains at least one spatial level. A spatial hierarchy h_{i_s} belongs to the set of spatial hierarchies H_s(d), which is a subset of the set of hierarchies H(d), such that h_{i_s} ∈ H_s(d) ⊆ H(d).

Example 3. The triples below show how some of the hierarchies from the GeoNorthwind DW (Fig. C.7) are represented in RDF using Def. 6 and Ext. 6. As we will see below, the Geography hierarchies in the Customer and Supplier dimensions are spatial since they have spatial levels (City, State, etc.).

# Hierarchies
gnw:customerGeography rdf:type qb4o:Hierarchy ;
    qb4o:inDimension gnw:customerDim ;
    qb4o:hasLevel gnw:customer, gnw:city, gnw:state, gnw:country .
gnw:supplierGeography rdf:type qb4o:Hierarchy ;
    qb4o:inDimension gnw:supplierDim ;
    qb4o:hasLevel gnw:supplier, gnw:city, gnw:state, gnw:country .
gnw:categories rdf:type qb4o:Hierarchy ;
    qb4o:inDimension gnw:productDim ;
    qb4o:hasLevel gnw:product, gnw:category .

Definition 7. (Levels) A hierarchy h has a set of levels L(h) = {l_1, ..., l_k} and each level l_i has a set of attributes A(l_i) (Def. 8). Each level l_i ∈ L(h) is defined in the cube schema graph G^S with the qb4o:LevelProperty predicate. Each attribute a ∈ A(l_i) is linked to its level l_i with the qb4o:hasAttribute property. The RDF graph formulation of the levels L(h) is represented as

G^S_{L(h)} = ⋃_{i=1}^{k} G^S_{l_i}, where

G^S_{l_i} = {(id^S(l_i) rdf:type qb4o:LevelProperty)}
          ∪ ⋃_{a ∈ A(l_i)} {(id^S(l_i) qb4o:hasAttribute id^S(a))}

Extension 7. (Spatial levels) A level is spatial if it has an associated geometry. A spatial level l_{i_s} belongs to the set of spatial levels L_s(h), which is a subset of the set of levels L(h), such that l_{i_s} ∈ L_s(h) ⊆ L(h). The geometry of a spatial level is defined in the cube schema graph G^S with the geo:hasGeometry property. The RDF graph formulation of the spatial levels L_s(h) is represented as

G^S_{L_s(h)} = ⋃_{i=1}^{k} G^S_{l_{i_s}}, where

G^S_{l_{i_s}} = {(id^S(l_{i_s}) rdf:type qb4o:LevelProperty)}
             ∪ {(id^S(l_{i_s}) geo:hasGeometry geo:Geometry)}
             ∪ ⋃_{a ∈ A(l_{i_s})} {(id^S(l_{i_s}) qb4o:hasAttribute id^S(a))}

Example 4. The triples below show how some of the levels of the GeoNorthwind DW (Fig. C.7) are represented in RDF using Def. 7 and Ext. 7. Note that the Customer and City levels are spatial as they have a geometry that is specified at the level definition.

# Levels
gnw:customer rdf:type qb4o:LevelProperty ;
    qb4o:hasAttribute gnw:customerID ;
    qb4o:hasAttribute gnw:customerName ;
    qb4o:hasAttribute gnw:address ;
    qb4o:hasAttribute gnw:postalCode ;
    geo:hasGeometry gnw:customerGeometry .
gnw:city rdf:type qb4o:LevelProperty ;
    qb4o:hasAttribute gnw:cityName ;
    geo:hasGeometry gnw:cityGeometry .

Definition 8. (Attributes) A level l has a set of attributes A(l) = {a_1, ..., a_p}, which define the characteristics of the level members. One of these attributes, denoted a_ID, specifies a surrogate key for the level, i.e., the value of a_ID uniquely identifies the members of the level. For simplicity, we assume that it is the first attribute in the set of attributes A(l), i.e., a_1 = a_ID. Each attribute a_i ∈ A(l) is defined in the cube schema graph G^S with the qb4o:LevelAttribute predicate and is linked to its level l with the qb4o:inLevel property. Each attribute a_i is defined as ranging over XSD literals L using the rdfs:range property. The RDF graph formulation of the attributes A(l) is represented as

G^S_{A(l)} = ⋃_{i=1}^{p} G^S_{a_i}, where

G^S_{a_i} = {(id^S(a_i) rdf:type qb4o:LevelAttribute)}
          ∪ {(id^S(a_i) qb4o:inLevel id^S(l))}
          ∪ {(id^S(a_i) rdfs:range L)}

Extension 8. (Spatial attributes) An attribute is spatial if it is defined over a spatial domain. A spatial attribute a_{i_s} belongs to the set of spatial attributes A_s(l), which is a subset of the set of attributes A(l), such that a_{i_s} ∈ A_s(l) ⊆ A(l). The RDF graph formulation of the spatial attributes is similar to that of Def. 8. However, the attribute must range over spatial literals L_s, i.e., a well-known text (WKT) literal from the OGC schemas. Further, the domain of the attribute should be specified with the rdfs:domain property, which must be a geometry. Finally, the attribute must be specified as a spatial object with the rdfs:subClassOf property. The RDF graph formulation of the spatial attributes A_s(l) is represented as

G^S_{A_s(l)} = ⋃_{i=1}^{p} G^S_{a_{i_s}}, where

G^S_{a_{i_s}} = {(id^S(a_{i_s}) rdf:type qb4o:LevelAttribute)}
             ∪ {(id^S(a_{i_s}) qb4o:inLevel id^S(l))}
             ∪ {(id^S(a_{i_s}) rdfs:range L_s)}
             ∪ {(id^S(a_{i_s}) rdfs:subPropertyOf geo:Geometry)}
             ∪ {(id^S(a_{i_s}) rdfs:subClassOf geo:SpatialObject)}

Example 5. The triples below show how some of the attributes of the GeoNorthwind DW (Fig. C.7) are represented in RDF using Def. 8 and Ext. 8. Note that the Customer level has a spatial attribute (Customer geometry). It is represented as a WKT literal that defines a Point type from the Geometry class, which is a subclass of Spatial Object.

# Attributes
gnw:customerID rdf:type qb4o:LevelAttribute ;
    qb4o:inLevel gnw:customer ;
    rdfs:range xsd:integer .
gnw:customerName rdf:type qb4o:LevelAttribute ;
    qb4o:inLevel gnw:customer ;
    rdfs:range xsd:string .
gnw:address rdf:type qb4o:LevelAttribute ;
    qb4o:inLevel gnw:customer ;
    rdfs:range xsd:string .
gnw:postalCode rdf:type qb4o:LevelAttribute ;
    qb4o:inLevel gnw:customer ;
    rdfs:range xsd:string .
gnw:customerGeometry rdf:type qb4o:LevelAttribute ;
    rdfs:subPropertyOf geo:Geometry ;
    qb4o:inLevel gnw:customer ;
    rdfs:range geo:wktLiteral ;
    rdfs:domain geo:Point ;
    rdfs:subClassOf geo:SpatialObject .

Definition 9. (Hierarchy steps) A hierarchy h has a set of hierarchy steps HS(h) = {hs_1, ..., hs_q}, which define the structure of the hierarchy in relation with its corresponding levels. A hierarchy step hs_i = (l_c, l_p, card) ∈ HS(h) entails a roll-up relation between a lower (child) level l_c and an upper (parent) level l_p with a cardinality card. The cardinality card ∈ {1-1, 1-n, n-1, n-n} describes the number of members in one level that can be related to a member in the other level, for both the child and the parent levels.
Each hierarchy step hs_i is defined in the cube schema graph G^S as a blank node _:hs_i ∈ B with the qb4o:HierarchyStep predicate. Each hierarchy step is linked to its hierarchy with the qb4o:inHierarchy property. The child and parent levels are linked in a hierarchy step with the qb4o:childLevel and qb4o:parentLevel properties, respectively. The cardinality card of a hierarchy step is defined by the qb4o:pcCardinality property. The RDF graph formulation of the hierarchy steps HS(h) is represented as

G^S_{HS(h)} = ⋃_{i=1}^{q} G^S_{hs_i}, where

G^S_{hs_i} = {(_:hs_i rdf:type qb4o:HierarchyStep)}
           ∪ {(_:hs_i qb4o:inHierarchy id^S(h))}
           ∪ {(_:hs_i qb4o:parentLevel id^S(l_p))}
           ∪ {(_:hs_i qb4o:childLevel id^S(l_c))}
           ∪ {(_:hs_i qb4o:pcCardinality id^S(card))}

Extension 9. (Spatial hierarchy steps) A hierarchy step is spatial if it relates a spatial child level l_{c_s} and a spatial parent level l_{p_s}, in which case it entails a topological relationship between these spatial levels. A spatial hierarchy step is then a tuple hs_{i_s} = (l_{c_s}, l_{p_s}, card, topoRel), where the topological relation topoRel belongs to the T_rel class (Def. 2). The topological relation between the parent and child levels of a spatial hierarchy step is defined by the qb4so:pcTopoRel property. The RDF graph formulation of the spatial hierarchy steps HS_s(h) (w.r.t. Def. 9) is represented as

G^S_{HS_s(h)} = ⋃_{i=1}^{q} G^S_{hs_{i_s}}, where

G^S_{hs_{i_s}} = G^S_{hs_i} ∪ {(_:hs_i qb4so:pcTopoRel id^S(topoRel))}

Example 6. The triples below show how the hierarchy steps of the Geography spatial hierarchy in the Customer dimension of the GeoNorthwind DW (Fig. C.7) are represented in RDF using Def. 9 and Ext. 9. Note that all hierarchy steps are spatial and have an associated topological relation.

# Hierarchy steps
_:customerGeography_hs1 a qb4o:HierarchyStep ;
    qb4o:inHierarchy gnw:customerGeography ;
    qb4o:childLevel gnw:customer ;
    qb4o:parentLevel gnw:city ;
    qb4o:pcCardinality qb4o:ManyToOne ;
    qb4so:pcTopoRel qb4so:Within .
_:customerGeography_hs2 a qb4o:HierarchyStep ;
    qb4o:inHierarchy gnw:customerGeography ;
    qb4o:childLevel gnw:city ;
    qb4o:parentLevel gnw:state ;
    qb4o:pcCardinality qb4o:ManyToOne ;
    qb4so:pcTopoRel qb4so:Within .
_:customerGeography_hs3 a qb4o:HierarchyStep ;
    qb4o:inHierarchy gnw:customerGeography ;
    qb4o:childLevel gnw:state ;
    qb4o:parentLevel gnw:country ;
    qb4o:pcCardinality qb4o:ManyToOne ;
    qb4so:pcTopoRel qb4so:Within .

Definition 10. (Partial order on levels) The hierarchy steps HS(h) of a hierarchy h define a partial order on the levels l ∈ L(h). The reflexive and transitive closure of the partial order is denoted as ⊑, with a unique base level (l_b) and a unique top level (All), where all levels l are such that l_b ⊑ l and l ⊑ All.

Definition 11. (Measures) An n-dimensional cube schema has a set of measures M = {m_1, ..., m_r}, which record the values of the phenomena being observed. Each measure m_i ∈ M is defined in the cube schema graph G^S with the qb:MeasureProperty predicate. Similarly to attributes, each measure m_i is defined as ranging over XSD literals L with the rdfs:range property. The RDF graph formulation of the measures M is represented as

G^S_M = ⋃_{i=1}^{r} G^S_{m_i}, where

G^S_{m_i} = {(id^S(m_i) rdf:type qb:MeasureProperty)}
          ∪ {(id^S(m_i) rdfs:range L)}

Extension 11. (Spatial measures) A measure is spatial if it is defined over a spatial domain, as for spatial attributes (Ext. 8). A spatial measure m_{i_s} belongs to the set of spatial measures M_s, which is a subset of the set of measures M, such that m_{i_s} ∈ M_s ⊆ M. The RDF formulation of the spatial measures is similar to that of Def. 11. However, the measure should range over spatial literals L_s. The RDF graph formulation of the spatial measures M_s (w.r.t. Def. 11) is represented as

G^S_{M_s} = ⋃_{i=1}^{r} G^S_{m_{i_s}}, where

G^S_{m_{i_s}} = {(id^S(m_{i_s}) rdf:type qb:MeasureProperty)}
             ∪ {(id^S(m_{i_s}) rdfs:range L_s)}
             ∪ {(id^S(m_{i_s}) rdfs:subClassOf geo:SpatialObject)}


Example 7. The triples below show how the measures of the GeoNorthwind DW (Fig. C.7) are represented in RDF using Def. 11 and Ext. 11. Note that SalesPoint is a spatial measure, which records the location of the stores in which the sales occurred. It is defined over the Geometry domain as a Point type with a WKT literal.

# Measures
gnw:quantity rdf:type qb:MeasureProperty ;
    rdfs:range xsd:integer .
gnw:unitPrice rdf:type qb:MeasureProperty ;
    rdfs:range xsd:decimal .
gnw:discount rdf:type qb:MeasureProperty ;
    rdfs:range xsd:decimal .
gnw:salesAmount rdf:type qb:MeasureProperty ;
    rdfs:range xsd:decimal .
gnw:freight rdf:type qb:MeasureProperty ;
    rdfs:range xsd:decimal .
gnw:salesPoint rdf:type qb:MeasureProperty ;
    rdfs:domain geo:Point ;
    rdfs:range geo:wktLiteral ;
    rdfs:subClassOf geo:SpatialObject .

Definition 12. (Fact) In an n-dimensional cube schema CS = (D, M, F), the fact F defines the structure of a cube with the qb:DataStructureDefinition property. The dimensions are given as components of the fact and are defined with the qb4o:level property. We assume that the fact F links the dimensions at the lowest granularity level; therefore qb4o:level links the lowest (base) level l_b of each dimension d_i, which is denoted as l_b(d_i). The cardinality card of the relationship between a dimension level and a fact is represented with the qb4o:cardinality property. Similarly, the measures are given as components of the fact and are defined with the qb:measure property. The aggregate function aggr associated with each measure is represented with the qb4o:aggregateFunction property. The RDF graph formulation of the fact F is given in the following equation.

G^S_F = {(id^S(F) rdf:type qb:DataStructureDefinition)}
      ∪ ⋃_{d_i ∈ D} {(id^S(F) qb:component [qb4o:level id^S(l_b(d_i)); qb4o:cardinality id^S(card)])}
      ∪ ⋃_{m_i ∈ M} {(id^S(F) qb:component [qb:measure id^S(m_i); qb4o:aggregateFunction id^S(aggr)])}


Extension 12. (Spatial fact) A fact is spatial if it relates several levels, of which two or more are spatial. A spatial fact may also have a topological relation topoRel that must be satisfied by the related spatial levels, which is represented with the qb4so:topologicalRelation object property. This property allows specifying a topological relation in the fact–level relationship of spatial facts. The RDF graph formulation of such a fact is obtained simply by adding the fact–level topological relation property next to the cardinality property, as given in the following equation.

G^S_{F_s} = {(id^S(F_s) rdf:type qb:DataStructureDefinition)}
          ∪ ⋃_{d_i ∈ D} {(id^S(F_s) qb:component [qb4o:level id^S(l_b(d_i)); qb4o:cardinality id^S(card); qb4so:topologicalRelation id^S(topoRel)])}
          ∪ ⋃_{m_i ∈ M} {(id^S(F_s) qb:component [qb:measure id^S(m_i); qb4o:aggregateFunction id^S(aggr)])}

Example 8. The triples below show how the fact of the GeoNorthwind DW (Fig. C.7) is represented in RDF using Def. 12. The Sales fact does not impose any topological relation between its spatial dimensions Supplier and Customer. SalesPoint is a spatial measure, which has a spatial aggregate function (AggrConvexHull).

# Cube definition
gnw:GeoNorthwind rdf:type qb:DataStructureDefinition ;
    # Lowest level for each dimension
    qb:component [qb4o:level gnw:customer ;
        qb4o:cardinality qb4o:ManyToOne] ;
    qb:component [qb4o:level gnw:supplier ;
        qb4o:cardinality qb4o:ManyToOne] ;
    qb:component [qb4o:level gnw:product ;
        qb4o:cardinality qb4o:ManyToOne] ;
    ...
    # Cube measures
    qb:component [qb:measure gnw:quantity ;
        qb4o:aggregateFunction qb4o:Sum] ;
    qb:component [qb:measure gnw:unitPrice ;
        qb4o:aggregateFunction qb4o:Avg] ;
    qb:component [qb:measure gnw:discount ;
        qb4o:aggregateFunction qb4o:Avg] ;
    ...
    qb:component [qb:measure gnw:salesPoint ;
        qb4o:aggregateFunction qb4so:AggrConvexHull] .

4.2 Defining spatial data cube members with QB4SOLAP

We have explained in Sect. 4.1 how a data cube schema can be represented in RDF with QB4SOLAP. We show next how to use this schema to represent the instances of the GeoNorthwind DW (Fig. C.7) in RDF. We denote by G^I the RDF graph of the data cube instances. In the examples, we prefix the elements of G^I with gnwi:. The subgraph of G^I that refers to a specific cube instance x is denoted by G^I_x, and the unique identifier of x is denoted by id^I(x).

Definition 13. (Level members) A level l has a set of level members LM(l) = {lm_1, ..., lm_y}. Each level member lm_i has a unique IRI id^I(lm_i) ∈ I, which is linked in the cube instance graph G^I with the qb4o:LevelMember predicate. A level member is related to its level by the qb4o:memberOf property. The RDF graph formulation of the level members LM(l) is represented as

G^I_{LM(l)} = ⋃_{i=1}^{y} G^I_{lm_i}, where

G^I_{lm_i} = {(id^I(lm_i) rdf:type qb4o:LevelMember)}
           ∪ {(id^I(lm_i) qb4o:memberOf id^S(l))}

Definition 14. (Attributes of level members) A level member lm has a set of attributes A(lm) = {a_1, ..., a_p}, which are used to describe the characteristics of the level member (Def. 8). Each attribute a_i is linked to the level member with the identifier id^S(a_i). We write lm ↝ v_{a_i} when a level member lm associates the value v_{a_i} with attribute a_i. This value is given as a literal L such that v_{a_i} ∈ L. The RDF graph formulation of the attributes A(lm) is represented as

G^I_{A(lm)} = ⋃_{i=1}^{p} G^I_{a_i}, where

G^I_{a_i} = {(id^I(lm) id^S(a_i) v_{a_i}) | lm ↝ v_{a_i}}


Definition 15. (Partial order on level members) A hierarchy step hs = (l_c, l_p, card) between a child level l_c and a parent level l_p defines a set of roll-up relations RU(hs) = {r_1, ..., r_k}, where each r_i = lm_{c_i} ⊑ lm_{p_i} relates a child level member lm_{c_i} ∈ LM(l_c) to a parent level member lm_{p_i} ∈ LM(l_p). These roll-up relations define a partial order between level members with regard to Def. 10 and are expressed using the property skos:broader. The RDF graph formulation of the roll-up relations RU(hs) is represented as

G^I_{RU(hs)} = ⋃_{i=1}^{k} G^I_{r_i}, where

G^I_{r_i} = {(id^I(lm_c) skos:broader id^I(lm_p)) | r_i = lm_{c_i} ⊑ lm_{p_i}}

Example 9. The triples below show how some level members of the GeoNorthwind DW (Fig. C.7) are represented in RDF using Defs. 13–14.

gnwi:customer_1 rdf:type qb4o:LevelMember ;
    qb4o:memberOf gnw:customer ;
    gnw:customerID 1 ;
    gnw:customerName "Alfreds Futterkiste" ;
    gnw:address "Obere Str. 57" ;
    gnw:postalCode "12209" ;
    gnw:customerGeo "POINT(13.099 52.401)"^^geo:wktLiteral ;
    skos:broader gnwi:city_6 .
gnwi:city_6 rdf:type qb4o:LevelMember ;
    qb4o:memberOf gnw:city ;
    gnw:cityName "Berlin" ;
    gnw:cityGeo "POINT(13.4060 52.519)"^^geo:wktLiteral ;
    skos:broader gnwi:state_224 .
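As a brief illustration of Defs. 13–15, the following SPARQL sketch follows the skos:broader roll-up path of the Geography hierarchy from customer members up to their country. It is a minimal sketch; the attribute IRIs gnw:stateName and gnw:countryName are assumed to follow the same naming convention as the attributes shown in Example 5.

SELECT ?customer ?cityName ?stateName ?countryName
WHERE {
  ?customer qb4o:memberOf gnw:customer ;
            skos:broader ?city .        # customer ⊑ city
  ?city gnw:cityName ?cityName ;
        skos:broader ?state .           # city ⊑ state
  ?state gnw:stateName ?stateName ;     # assumed attribute IRI
         skos:broader ?country .        # state ⊑ country
  ?country gnw:countryName ?countryName .  # assumed attribute IRI
}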

Definition 16. (Fact members) A fact F has a set of fact members FM(F) = {f_1, ..., f_t}, which are the instances of the data cube. Each fact f_i ∈ FM has a unique IRI id^I(f_i) ∈ I, which is linked in the cube instance graph G^I with the qb:Observation predicate.
A fact member f_i is related to a set of dimension levels L(f_i) = {l_1, ..., l_r} and has a set of measures M(f_i) = {m_1, ..., m_s}. Each dimension level l_j is linked to the level member with the identifier id^S(l_j), and each measure m_k is linked to the measure value with the identifier id^S(m_k). We denote by f ↝ v_{l_j} and f ↝ v_{m_k}, respectively, the dimension values and measure values associated with a fact f. The value v_{l_j} ∈ I is the identifier of a level member in LM(l_j). Further, the value v_{m_k} for every measure m_k is a literal such that v_{m_k} ∈ L. The RDF graph formulation of the fact members FM(F) is represented as

G^I_{FM(F)} = ⋃_{i=1}^{t} G^I_{f_i}, where

G^I_{f_i} = {(id^I(f_i) rdf:type qb:Observation)}
          ∪ ⋃_{l_j ∈ L(f_i)} {(id^I(f_i) id^S(l_j) id^I(v_{l_j})) | f_i ↝ v_{l_j}}
          ∪ ⋃_{m_k ∈ M(f_i)} {(id^I(f_i) id^S(m_k) id^I(v_{m_k})) | f_i ↝ v_{m_k}}

Example 10. The triples below show how a fact member of the GeoNorthwind DW (Fig. C.7) is represented in RDF using Defs. 12–16. Note that the fact member and the corresponding level members relating to dimensions are given with the prefix gnwi:. id^S(a_ID) is the surrogate key (Def. 8) that links the fact member to the corresponding dimensions' base level members.

gnwi:sale_10613_1 rdf:type qb:Observation ;
    gnw:customer gnwi:customer_1 ;
    gnw:supplier gnwi:supplier_6 ;
    gnw:product gnwi:product_13 ;
    ...
    gnw:quantity 8 ;
    gnw:unitPrice "6.00"^^xsd:decimal ;
    gnw:discount "0.10"^^xsd:decimal ;
    gnw:salesPoint "POINT(23.08 42.34)"^^geo:wktLiteral .

5 Semantics of SOLAP operators

This section defines a formal algebra for SOLAP operators. Examples of the operators are provided after their definitions. The complete SPARQL query examples are given at our website13 and can be tested at our public endpoint14. The query runtimes for each SOLAP operator are given in Appendix A1, Table C.5, for the use case dataset GeoNorthwind (Sect. 4.1, Fig. C.7). These operators can be applied on spatially enhanced multidimensional data

13 http://extbi.cs.aau.dk/SOLAP4SW/queries
14 http://extbi.lab.aau.dk/sparql

cubes (Sect. 3.3). The presentation defines the semantics of a SOLAP operator by logically specifying the typical OLAP operators with spatial functions and conditions. Spatial functions and conditions can be selected from a range of operation classes, which can be applied on spatial data types (Sect. 3.1). Let S be the set of spatial operators, where S = (S_agg ∪ T_rel ∪ N_op), used to represent a spatial predicate φ^S ∈ S or a spatial function f^S ∈ S appearing in a SOLAP operator. The following SOLAP operators are defined as spatial extensions of the well-known OLAP operators, which are given in the remarks.
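Before the individual operators, it is worth noting the two ways a spatial operator typically appears in SPARQL: as a Boolean predicate from T_rel used directly in a FILTER, or as a numeric function from N_op whose result is bound and then compared with a regular predicate. The following is a minimal sketch, not tied to a particular operator, using the Virtuoso-style bif: functions that the later examples also rely on.

SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust ;
       gnw:supplierID ?sup .
  ?cust gnw:customerGeo ?custGeo ;
        skos:broader ?city .
  ?city gnw:cityGeo ?cityGeo .
  ?sup  gnw:supplierGeo ?supGeo .
  # Topological predicate from T_rel, used directly as a Boolean condition
  FILTER (bif:st_within(?custGeo, ?cityGeo))
  # Numeric spatial function from N_op, bound and then tested with a regular predicate
  BIND (bif:st_distance(?custGeo, ?supGeo) AS ?distance)
  FILTER (?distance < 5)
}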

Remark 17. (Slice) The slice operator removes a dimension from a cube C by selecting one instance in a dimension level. For example, the query “slice on customers in the city of Odense” is a slice operation. (Cube is the sales, dimension is the customer, level in dimension is the city and the value is Odense, which is sliced out from the cube).
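For contrast with the spatial variant defined next, the conventional slice of Remark 17 can already be expressed over QB4SOLAP data by filtering on an alphanumeric level attribute. A minimal sketch, reusing the query pattern of the later examples, is:

SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust .
  ?cust qb4o:memberOf gnw:customer ;
        skos:broader ?city .
  ?city gnw:cityName ?cityName .
  # slice on the City level with a plain text literal
  FILTER (?cityName = "Odense")
}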

Definition 17. (S-Slice) The s-slice operator removes a dimension from a cube C by choosing a single spatial attribute value v_s ∈ L_s (Ext. 8) in a spatial level l_s (Ext. 7).
As for the semantics, s-slice takes an n-dimensional cube C as an argument. We assume that the cube has the cube schema CS = (D, M, F), with the fact members f ∈ FM as given in Def. 16. As parameters, s-slice takes a spatial literal value v_s, the base level l_b, and the target (spatial) level l_s of a dimension d_i. The base level l_b specifies the dimension d_i (Def. 16). The target spatial level l_s is the level to which the spatial literal value v_s is related.
The operator is defined as SS(C)[l_b, l_s, v_s] = C', which returns a cube C' with n − 1 dimensions and the schema CS = (D', M', F'), where D' = D \ {d_i}, M' = M, and F' = F. The measures M and the fact type F remain the same, though the new cube C' has one dimension less.
The s-slice operator selects a subset FM' of the set of fact members FM (FM' ⊆ FM) with respect to the given parameter v_s. Assuming that the granularity of the fact members is at the (lowest) base level of the dimension l_b ∈ L(d_i) in the given cube, a partial order exists among the levels from the bottom level to the target spatial level l_s such that l_b ⊑ l_s. The given parameter v_s is related to a level member of the level l_s. We say that the fact members are characterized by dimension values, which is written as f ↝ v_{d_i}, where v_{d_i} ≡ v_{l_b(d_i)} (Def. 16). In other words, dimensions are associated with the fact members by the values of the dimensions' base level members v_{l_b(d_i)}. When the dimension d_i is clear from the context, we write v_{l_b} for simplicity.
To sum up, the subset FM' of facts is selected with regard to the partial order on levels from the base level l_b to the target level l_s. The value v_{l_s} in the target level l_s is specified with respect to the given spatial literal value v_s. The value v_s might be equal to a spatial attribute value in the target level l_s; thus v_{l_s} is characterized by the attribute value v_s, written as v_{l_s} ↝ v_s (Ex. 11). Or, v_s is an arbitrary spatial literal that entails a topological relation T_rel (i.e., within) with a value of the target spatial level v_{l_s}, written as ∃ v_{l_s} : φ^S(v_s), where φ^S is a spatial Boolean predicate that represents a topological relation (Ex. 12). After applying the s-slice operator on cube C, the new (sub)set of fact members is defined for the two cases, respectively, as follows:

FM' = {f ∈ FM | ∃ v_{l_b} ∈ LM(l_b), v_{l_s} ∈ LM(l_s) : f ↝ v_{l_b} ∧ v_{l_b} ⊑ v_{l_s} ∧ v_{l_s} ↝ v_s},

FM' = {f ∈ FM | ∃ v_{l_b} ∈ LM(l_b), v_{l_s} ∈ LM(l_s) : f ↝ v_{l_b} ∧ v_{l_b} ⊑ v_{l_s} ∧ v_{l_s} : φ^S(v_s)}.

Example 11. With regard to the traditional slice query "slice on customers in the city of Odense", in s-slice the user could specify a geometry extent (e.g., the polygon coordinates of the city of Odense) as a spatial literal for slicing, instead of giving a text literal (e.g., "Odense"). So the s-slice query would be "slice on customers of the city, which has the geometry "POLYGON((10.43951 55.47006, 10.439472 55.470036, 10.439240(...))"". More intuitively, instead of the specified spatial literal v_s ∈ L_s, the user can pass a function call as a parameter to s-slice, e.g., by querying "slice on customers in the largest city of (southern) Denmark by land area". The function call should calculate the area of the cities from their geometries, where the largest city is selected as required by the s-slice operator. Both cases are given in the following.

Example 11.1. The following SPARQL query shows an s-slice operator, which filters with the spatial literal given by the user.

SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust .
  ?cust qb4o:memberOf gnw:customer ;
        skos:broader ?city .
  ?city gnw:cityGeo ?cityGeo .
  FILTER (?cityGeo = "POLYGON((10.439517 55.470064, 10.4394729 55.4700361, 10.4392403 (...))") }

Example 11.2. The following SPARQL query shows the s-slice operator, which filters with a function call (largest city) returned from an inner select. Given the current limitations of SPARQL, there is no built-in function for calculating the area of a spatial object's geometry at query run time; however, we give the query with a notional built-in bif:st_area function.

SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust .
  ?cust qb4o:memberOf gnw:customer ;
        skos:broader ?city .
  ?city gnw:cityGeo ?cityGeo .
  # Inner select for finding the largest city
  {SELECT ?x (MAX(?area) as ?maxArea) WHERE {
     ?obs rdf:type qb:Observation ;
          gnw:customerID ?cust .
     ?cust qb4o:memberOf gnw:customer ;
           skos:broader ?city .
     ?city gnw:cityGeo ?x .
     BIND(bif:st_area(?x) as ?area)}
  }
  FILTER (?cityGeo = ?x) }

Example 12. With regard to the traditional slice query "slice on customers in the city of Odense", in this example of s-slice the user gives a point geometry (i.e., X, Y coordinates of a point as a spatial literal) and filters at the given level (i.e., the City level) whose member contains the given point. So the s-slice query would be "slice on customers of the city, in which the given "POINT(10.43951 55.47006)" is within". The following SPARQL query shows an s-slice operator, which filters at the specified level with the spatial literal given by the user.

SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust .
  ?cust qb4o:memberOf gnw:customer ;
        skos:broader ?city .
  ?city gnw:cityGeo ?cityGeo .
  FILTER (bif:st_within("POINT(10.43951 55.47006)", ?cityGeo)) }

Note that the s-slice can be operated in different ways based on the geometry given to the query. In both Exs. 11 and 12, the slice level is given as City; however, in Ex. 12 an arbitrary X, Y point is given that falls into the target city. Therefore, we need to use within from the topological relationships (T_rel) class in order to verify and filter that city.

Remark 18. (Dice) The traditional dice operator takes a cube and a Boolean condition φ and returns a new cube containing only the cells that satisfy the Boolean condition φ. The dice operation is analogous to the relational algebra selection σ_φ(R), but the argument is a cube, not a relation. For example, the query "sales to customers of type LLC (Limited Liability Company)" is a dice operation. (The cube is Sales, the dimension is Customer, and the Boolean condition is whether the customer type is LLC.)
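The GeoNorthwind schema of Fig. C.7 does not include a customer-type attribute, so the following minimal sketch illustrates a conventional dice with a Boolean condition over measure values instead; the thresholds are arbitrary and chosen only for illustration.

SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:quantity ?quantity ;
       gnw:discount ?discount .
  # Boolean condition over two measures of the cell
  FILTER (?quantity > 20 && ?discount > 0)
}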


Definition 18. (S-Dice) Similarly, the s-dice operator takes an n-dimensional cube C as an argument, which has the cube schema CS = (D, M, F) with the fact members f ∈ FM as given in Def. 16. As a parameter, s-dice takes a spatial Boolean predicate, which is denoted by φ^S. The s-dice operator keeps the cells of the cube C that satisfy the spatial predicate over spatial dimension levels l_s, attributes a_s, and measures m. The semantics of the operator is defined as SD(C)[φ^S] = C', where the spatial predicate φ^S can be applied on spatial level member values φ^S(v_{l_s}), spatial attribute values φ^S(v_{a_s}), and measure values φ^S(v_m), or a combination of these.
The SD operator returns a sub-cube C' ⊆ C, which has the schema CS = (D', M', F') where D' = D, M' = M, and F' = F. Unlike the s-slice operator, s-dice keeps all the dimensions D in the output cube C'. The set of measures M and the fact type F also remain the same, though the new cube C' is a subset of the original cube C with filtered fact members f ∈ FM', which is explained in the following.
The s-dice operator selects a subset FM' of the set of fact members (FM' ⊆ FM) with respect to the spatial predicate φ^S on level members as follows:

1. Spatial predicate on level values: FM' = {f ∈ FM | ∃ v_{l_b} ∈ LM(l_b), v_{l_s} ∈ LM(l_s) : f ↝ v_{l_b} ∧ v_{l_b} ⊑ v_{l_s} ∧ φ^S(v_{l_s})}.

2. Spatial predicate on level attribute values: FM' = {f ∈ FM | ∃ v_{l_b} ∈ LM(l_b), v_{l_s} ∈ LM(l_s), v_{a_s} : f ↝ v_{l_b} ∧ v_{l_b} ⊑ v_{l_s} ∧ v_{l_s} ↝ v_{a_s} ∧ φ^S(v_{a_s})}.

Note that the filtering of the facts through level members can be done by level values v_{l_s} or by attribute values v_{a_s}, applying the spatial predicate φ^S. Finally, filtering of the facts on associated measure values is defined as follows:

3. Spatial predicate on measure values of m_s: FM' = {f ∈ FM | ∃ v_{m_s} ∈ Codomain(m_s) : f ↝ v_{m_s} ∧ φ^S(v_{m_s})}.

For complex cases, i.e., combinations of these three types, the result set is likewise obtained by combining the basic result sets.

Example 13. The s-dice operator can be implemented on level and attribute values by filtering level members in the cube, or on measures by filtering the facts in the cube. In both cases the spatial predicate φ^S is used. A query for the s-dice operator could be "sales to customers, which are located within 5 km distance from their city center", where the s-dice is on level members by filtering the customer level. The spatial predicate φ^S can be interpreted in two different ways (see Appendix A1 for a comparison of their query run times).

Example 13.1. The first method assumes a buffer area of 5 km around the coordinates of the city center and checks the customers' locations with the within operator from the topological relations, φ^S ∈ T_rel, to see whether they meet the condition. The following SPARQL query shows the implementation of this method on level members.

SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust .
  ?cust qb4o:memberOf gnw:customer ;
        skos:broader ?city ;
        gnw:customerGeo ?custGeo .
  ?city gnw:cityGeo ?cityCentGeo .
  FILTER (bif:st_within (?custGeo, ?cityCentGeo, 5))}

Example 13.2. The second method checks whether the distance from a customer location to the corresponding city center is less than 5 km, using the distance function from the numeric operations, f^S ∈ N_op. In this case the spatial predicate φ^S is a combination of a spatial function f^S and a regular Boolean predicate φ: the spatial function is distance from the numeric operations and the predicate is less than (<). The following SPARQL query shows the implementation of this method for s-dice on level members.

SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust .
  ?cust qb4o:memberOf gnw:customer ;
        skos:broader ?city ;
        gnw:customerGeo ?custGeo .
  ?city gnw:cityGeo ?cityCentGeo .
  BIND (bif:st_distance (?custGeo, ?cityCentGeo) AS ?distance)
  FILTER ( ?distance < 5 ) }

Remark 19. (Roll-up) The traditional roll-up operator aggregates measures according to a dimension hierarchy (by using an aggregate function) in order to obtain measures at a coarser granularity for a given dimension. For example, the query "total amount of sales to customers by city" is a classical roll-up operation. (The cube is Sales, the dimension is Customer, the level in the dimension to roll up to is City such that customer ⊑ city, the measure is the sales amount, and the aggregate function is the sum in order to calculate the total sales.) A plain SPARQL formulation of this conventional roll-up is sketched below.
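The following is a minimal sketch of the conventional roll-up over the GeoNorthwind cube: no spatial function is involved, and the grouping city is simply the customer's own city reached via skos:broader.

SELECT ?city (SUM(?sales) AS ?totalSales)
WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust ;
       gnw:salesAmount ?sales .
  ?cust qb4o:memberOf gnw:customer ;
        skos:broader ?city .        # static roll-up: customer ⊑ city
  ?city qb4o:memberOf gnw:city .
}
GROUP BY ?city

The s-roll-up of Def. 19 and Ex. 14 keeps this overall shape but replaces the static skos:broader grouping level with a level derived by a spatial function.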

Definition 19. (S-Roll-up) Similarly to the roll-up operator, s-roll-up aggregates measures m ∈ M of a given cube C by using an aggregate function and a spatial function f^S ∈ S (Sect. 3.1) along a spatial dimension's hierarchy h_s (Ext. 6), which should have spatial levels l_s (Ext. 7). However, in s-roll-up the dimension hierarchy is created dynamically on levels by the spatial function f^S. We call this hierarchy a dynamic spatial hierarchy, conceptually from a base level l_b to the dynamically created target level l'_s such that l_b ⊑_d l'_s. The instances of the target level l'_s are obtained by the spatial function f^S(l_s) that is applied on the spatial dimension levels.
As for the semantics, s-roll-up takes an n-dimensional cube C as an argument, which has the cube schema CS = (D, M, F) with the fact members f ∈ FM as given in Def. 16. As parameters, s-roll-up takes a spatial function f^S ∈ S to operate on the levels L(d_i) and an aggregate function agg to calculate a measure m at the higher target level. For simplicity of explanation and without loss of generality, we initially assume that there is only one measure m. The extension of the operator to several measures m ∈ M is explained in the last paragraph. The s-roll-up operator is formulated as SRU(C)[f^S(L(d_i)), agg(m)] = C', which returns a cube C' with n dimensions and the schema CS = (D', M', F'), where F' = F, M' = M, and D' = {d_1, ..., d'_i, ..., d_n} with L'(d_i) = L(d_i) \ (l_b ⊑_d ... ⊏_d l'_s) ∪ {l'_s}. After the s-roll-up operation, the number of dimensions in D' remains the same, although the base level and the levels below the target level (l_b ⊑_d ... ⊏_d l'_s) of the corresponding dimension d_i are left out and the new target level l'_s is added to the set of dimension levels L'(d_i) of d'_i.
The set of level members of the level l'_s is selected with respect to the spatial function on the base level members of a spatial dimension such that LM(l'_s) = {f^S(v_{l_b}) | v_{l_b} ∈ LM(l_b)}, where l_b ⊑_d l'_s ⟺ f^S(v_{l_b}) = v_{l'_s}, which means that the base level l_b rolls up along the dynamic spatial hierarchy (⊑_d) to the new target spatial level l'_s if and only if the spatial function on the base level, f^S(v_{l_b}) = v_{l'_s}, produces the new spatial level members v_{l'_s}. Even though the set of measures M remains the same, the s-roll-up operator obtains the measure values associated with the fact members f' at the coarser granularity l'_s, which alters the set of facts, FM' ⊈ FM. In order to create the new set of facts FM' at the new granularity level l'_s, the Group operator [30] is used to group the facts characterized by the same level members v_{l'_s} ∈ LM(l'_s) such that Group(v_{l'_s}) = {f ∈ FM | ∃ v_{l_b} ∈ LM(l_b) : f ↝ v_{l_b} ∧ v_{l_b} ⊑_d v_{l'_s}}. The output of the Group operator on level members is a new fact instance f'. In order to aggregate the measure values v_m that are associated with the fact members f, we use an aggregate function agg such that agg({f_1, ..., f_k}) = agg(v_{m_1}, ..., v_{m_k}), where f_i ↝ v_{m_i}, i = 1, ..., k. Finally, the set of the new facts f' ∈ FM' is constructed, given with the associated new level members and aggregated measure values as FM' = {f' = Group(v_{l'_s}) | ∃ v_{l'_s} ∈ LM(l'_s) : f' ↝ v_{l'_s} ∧ f' ↝ agg(Group(v_{l'_s}))}. The extension to multiple measures is similar and is done by providing and using a separate aggregate function for each measure m ∈ M.

Example 14. The following SPARQL query shows the s-roll-up operator exemplified in Sect. 3.6. The query is "total amount of sales to customers by city of the closest supplier". Note that the measures are aggregated up to a new city level from the customer level of the Customer dimension, which is specified as the Closest City. The hierarchy step from customer to city is defined dynamically by a spatial function f^S (distance from the numeric operations N_op ⊂ S), which is then used in a wrapper function to find the closest distance between suppliers and customers. The levels and level members (of customer) that are below the newly defined level (Closest City) are left out of the result.

SELECT ?city (SUM(?sales) AS ?totalSales) WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:customerID ?cust ;
       gnw:supplierID ?sup ;
       gnw:salesAmount ?sales .
  ?cust qb4o:memberOf gnw:customer ;
        gnw:customerGeo ?custGeo ;
        gnw:customerName ?custName ;
        skos:broader ?city .
  ?city qb4o:memberOf gnw:city .
  ?sup gnw:supplierGeo ?supGeo .
  # Inner Select for the total sales to
  # the closest supplier of the customer
  { SELECT ?cust1 (MIN(?distance) AS ?minDistance) WHERE {
      ?obs rdf:type qb:Observation ;
           gnw:customerID ?cust1 ;
           gnw:supplierID ?sup1 .
      ?sup1 gnw:supplierGeo ?sup1Geo .
      ?cust1 gnw:customerGeo ?cust1Geo .
      BIND (bif:st_distance( ?cust1Geo, ?sup1Geo ) AS ?distance)}
    GROUP BY ?cust1 }
  FILTER (?cust = ?cust1 &&
          bif:st_distance (?custGeo, ?supGeo) = ?minDistance)}
GROUP BY ?city

Remark 20. (Drill-down) Drill-down is the inverse operator of roll-up; it disaggregates previously summarized data to a child level in order to obtain measures at a finer granularity of a given dimension. For example, the roll-up query given in Remark 19 ("total amount of sales to customers by city") aggregates sales by summing up the sales amount from the customer level to the city level along a hierarchy. As the drill-down operator performs the opposite operation to roll-up, an example would be "average amount of sales of each supplier, drilled down from the city level to the supplier level". (The cube is the same Sales cube and the hierarchy is the same, but the dimension is Supplier, so the child level to drill down to from the city level is the supplier, such that City ⊒ Supplier.) Conceptually, a drill-down to level l_i on a cube C corresponds to a roll-up to the same level l_i on the base cube of C, denoted as BaseCube(C). A plain SPARQL sketch of this conventional drill-down is given below.
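A minimal sketch of the conventional drill-down query above, which simply re-aggregates the measure at the finer Supplier level of the same hierarchy, is:

SELECT ?sup (AVG(?sales) AS ?averageSales)
WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:supplierID ?sup ;
       gnw:salesAmount ?sales .
  ?sup qb4o:memberOf gnw:supplier .   # finer granularity: Supplier instead of City
}
GROUP BY ?sup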


Definition 20. (S-Drill-down) Analogously to the drill-down operator, s-drill-down disaggregates measures m ∈ M of a given cube C by using an aggregate function and a spatial function f^S (Sect. 3.1) along a spatial dimension's hierarchy h_s (Ext. 6), which should have spatial levels (i.e., l_s) (Ext. 7).
Conceptually, in s-drill-down the dimension hierarchy is created dynamically on levels by the spatial function f^S, as in s-roll-up. This is similar to the dynamic spatial hierarchy defined in Def. 19, that is, from a spatial parent level l_{p_s} to a dynamically created spatial child level l'_{c_s} such that l_{p_s} ⊒_d l'_{c_s}. The target spatial child level l'_{c_s} is the output of the spatial function f^S on the spatial levels l_{i_s} ∈ L(d_i) of the spatial dimension. Applying s-drill-down to a child level l'_{c_s} from a parent level l_{p_s} on a cube C corresponds to applying s-roll-up to the same level l'_{c_s} from the base level l_b on the base cube of C. Therefore, the semantics of s-drill-down is described in the same way as s-roll-up and the operator is formulated as SDD(C)[f^S(L(d_i)), agg(m)] = SRU(BaseCube(C))[f^S(L(d_i)), agg(m)].

Example 15. In order to exemplify an s-drill-down, starting from the result cube graph of Ex. 14 ("total amount of sales to customers by city of the closest supplier"), which is at the granularity of the City level, we drill down to the child level Supplier with the query "average amount of sales of the furthest suppliers to their city center, drilled down from the City level to the Supplier level". The following SPARQL query shows the given example.

SELECT ?sup (AVG(?sales) AS ?averageSales)
       (MAX(?distance) AS ?maxDistance)
WHERE {
  ?obs rdf:type qb:Observation ;
       gnw:supplierID ?sup ;
       gnw:salesAmount ?sales .
  ?sup qb4o:memberOf gnw:supplier ;
       gnw:supplierGeo ?supGeo ;
       gnw:supplierName ?supName ;
       skos:broader ?city .
  ?city qb4o:memberOf gnw:city ;
        gnw:cityGeo ?cityCentGeo .
  BIND (bif:st_distance(?supGeo, ?cityCentGeo) AS ?distance)
  FILTER (?distance = ?maxDistance)}
GROUP BY ?sup

In this paper, we focus on direct querying of single data cubes with the main SOLAP operators in SPARQL. The integration of several cubes through s-drill-across or set-oriented operations such as union, intersection, and difference [31] is out of scope and remains future work.


6 Generating SOLAP queries in SPARQL via QB4SOLAP

After having defined the high-level SOLAP operators in Sect. 5, this section first describes how to generate SPARQL queries for each of these operators by using the QB4SOLAP metamodel (Sect. 4). Afterwards, this section describes how to create more complex SPARQL queries for nested SOLAP operations.

6.1 Generation algorithms

The generated SPARQL queries Q are of the form "Q = SELECT R WHERE GP", where GP is a graph pattern containing triple patterns and R is the (set of) variable(s) that are returned in the result of the query. Triple patterns are based on triples of the form (s, p, o) (Def. 4), where triple components are replaced by variables. A set of triple patterns defines a graph pattern GP. Given an RDF graph G, a graph pattern GP is used to search for subgraphs G(R) ⊆ G matching the pattern. In our algorithm, the graph pattern is initially empty, GP = ∅, and the triple patterns are added incrementally to the body of the WHERE clause: GP = GP ∪ (s p o).
RDF datasets published with the QB4SOLAP vocabulary use the predicate skos:broader to define the roll-up relation from a child level to a parent level (Defs. 13 and 15). As this is the case for all hierarchy levels in a dimension, every OLAP query contains such roll-up paths, which we need to consider as part of GP in the WHERE clause. Thus, we define a helper function RUPath (Algorithm 1) that we can use in the SOLAP query generation algorithms.

Algorithm 1: RUPath(G^S(C), l_b, l_s, a_ID, ?a_s, ?f) : GP
Input: G^S(C), l_b, l_s, a_ID, ?a_s, ?f
Output: GP
1 begin
2   GP = (?f rdf:type qb:Observation)
3   GP = GP ∪ (?f id^S(a_ID) ?l_b ∧ ?l_b qb4o:memberOf id^S(l_b))
4   foreach (id^S(l_c), id^S(l_p)) ∈ G^S(C) | l_p ⊑ l_s do
5     GP = GP ∪ (?l_b skos:broader ?l_p ∧ ?l_p skos:broader ?l_s)
6   let GP = GP ∪ (?l_s id^S(a_s) ?a_s)
7   return GP

Build roll-up path (RUPath). The helper function RUPath returns a graph pattern that we can use in the body of the WHERE clause. The roll-up path pattern is created as a path-shaped join of triples with the partial order (⊑, skos:broader) (Defs. 10 and 15). The triple pattern is of the form {(s_1 p_1 o_1), (o_1 p_2 o_2), (o_2 p_2 o_3), ..., (o_{n−1} p_2 o_n)}, where s_1 is the root of the graph pattern and corresponds to the fact members f from the QB4SOLAP schema (Def. 16), p_1 is the predicate id^S(a_ID) that associates facts with level members v_{l_i} (f ↝ v_{l_i}, Def. 16), o_1 is the variable for the first level member that rolls up to its parent level o_2 such that o_1 ⊑ o_2, and so on, and the p_2 predicate corresponds to the skos:broader property. The last variable in the path, o_n, corresponds to the target level l_s in order to represent the level member variables at the target level. The roll-up path starts at the fact instances f (Def. 16). Afterwards, the partial order on level members (Def. 15) from the base level l_b to the target level l_s is applied. Algorithm 1 sketches the helper function for building the roll-up path for dimensions, from facts to dimension levels, with the predicates and cube member IRIs defined in the cube schema. An example of the pattern that RUPath produces for another dimension is sketched below.
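For instance, for the Supplier dimension with the target spatial level State and the State geometry as the target attribute, RUPath would produce a graph pattern along the following lines. This is a sketch only: the attribute IRI gnw:stateGeo is assumed to follow the naming convention of the instance examples, and the pattern is left as an open fragment that the generator later extends, as in the Customer example further below.

{?obs rdf:type qb:Observation ;
      gnw:supplierID ?sup .
 ?sup qb4o:memberOf gnw:supplier ;
      skos:broader ?city .
 ?city skos:broader ?state .
 ?state gnw:stateGeo ?stateGeo .   # gnw:stateGeo: assumed attribute IRI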

In order to represent varying parameters at the instance level, such as fact members, level members, or parameter values given by the user, and to distinguish these parameters from other parameters in the algorithm, we represent them using variable names with question marks. We use a FILTER expression to restrict the output data by using a (spatial) Boolean predicate φ^S. A FILTER expression is part of the WHERE clause in a SPARQL query. Therefore, it is added to the body of the WHERE clause in the graph pattern GP as GP = GP ∪ (FILTER φ^S). In the cases where there is a spatial function f^S(x) in the SOLAP operator, it is given in the BIND clause, which is technically a part of the WHERE clause and is therefore added to the body of the WHERE clause in a graph pattern GP as GP = GP ∪ (BIND f^S(x)). SPARQL 1.1 defines aggregate expressions15, such as SUM, MIN, MAX, AVG, etc. We apply them on measure values or use them as wrappers in spatial functions. In the following, we often write AGG to represent them.
In the following, we present the SPARQL query generation algorithms for the SOLAP operators defined in Sect. 5. The algorithms take the input parameters and arguments of the SOLAP operator and return a SPARQL query Q that can be executed.
S-Slice generator. To generate a SPARQL query for the s-slice operator SS(C)[l_b, l_s, v_s] (Def. 17), we use Algorithm 2. Parameter v_s is a spatial literal value v_s ∈ L_s (i.e., POINT or POLYGON) that should be related to a spatial level l_s (Ext. 7). This means that v_s is defined as a polygon geometry that corresponds to a spatial attribute value in the target level l_s (Ex. 11), or v_s is defined as a point geometry that is spatially contained in a spatial attribute value of the target level l_s (Ex. 12). Note that in Ex. 11.1, the given spatial

15https://www.w3.org/TR/sparql11-query/#aggregates

literal has the geometry data type polygon, which corresponds to a spatial level attribute a_s (Ext. 8) at a spatial level l_s. Similarly, the spatial function call f^S(x) in Ex. 11.2 returns a polygon that corresponds to a spatial level attribute a_s.

Algorithm 2: S-SliceGenerator(G^I(C), v_s, l_b, l_s) : Q
Input: G^I(C), v_s, l_b, l_s
Output: Q
1  begin
2    Q = ∅; GP = RUPath(G^S(C), l_b, l_s, a_ID, ?a_s, ?f)
3    if v_s is a POINT then
4      GP = GP ∪ (FILTER(st_within v_s, ?a_s))
5    else if v_s = f^S(x) then
6      Q' = ∅; GP' = RUPath(G^S(C), l_b, l_s, a_ID, ?a_s, ?f)
7      GP' = GP' ∪ (BIND f^S(x) AS ?v_x)
8      Q' = SELECT ?x AGG(?v_x) WHERE GP'
9      GP = GP ∪ Q' ∪ (FILTER ?x = ?a_s)
10   else
11     GP = GP ∪ (FILTER v_s = ?a_s)
12   return Q = SELECT ?f WHERE GP

On the other hand, the given spatial literal in Ex. 12 has the geometry data type point, which corresponds to the spatial level l_s via topological relations (T_rel). We consider all these possibilities in the s-slice generator algorithm. We explain them in the following, where the steps reference the line numbers in Algorithm 2.

Line 2. Get the path for dimension d_s (e.g., Customer) from the observation facts f to the base level l_b, and build path-shaped triple pattern paths from the dimension's base level l_b to the target spatial level l_s (e.g., the City level). Finally, get level attribute IRIs and variables for the spatial attributes a_s (e.g., the City geometry). All this is done by the RUPath function (Algorithm 1) that is used by the s-slice generator. The following shows an example result of this step that is added to GP:

{?obs rdf:type qb:Observation ;
      gnw:customerID ?cust .
 ?cust qb4o:memberOf gnw:customer ;
       skos:broader ?city .
 ?city gnw:cityGeo ?cityGeo .


Line 3. Check if the spatial literal v_s is of a point geometry type. If true, create a FILTER statement with a spatial Boolean predicate (Line 4) and go to the result (Line 12).

Line 4. Build the FILTER statement based on the spatial literal v_s and the spatial attribute a_s (Ex. 12). As a result, the following lines might be added to the GP:

FILTER (bif:st_within("POINT(10.43951 55.47006)", ?cityGeo))}

Line 5. Check if v_s is a function call f^S(x). If true (Ex. 11.2), construct an inner select query to compute the spatial function f^S(x), then go to the result (Line 12).

Line 6. Call the RUPath function in order to link the a_s variables with the fact instances (this time for the inner select query Q'). This step creates a graph pattern GP' for the inner select query Q', for example:

{?obs rdf:type qb:Observation ;
      gnw:customerID ?cust .
 ?cust qb4o:memberOf gnw:customer ;
       skos:broader ?city .
 ?city gnw:cityGeo ?x .

Line 7. Build a bind statement on the a_s variables for calculating spatial functions (e.g., compute areas). For example, the following line might be added to graph pattern GP':

BIND (bif:st_area(?x) as ?area) }

Line 8. Generate the inner select query Q′ based on GP′ generated in Lines 6 and 7. For example (Q′ finds the geometry of the largest city):

Q′ = {SELECT ?x (MAX(?area) as ?maxArea) WHERE
      {?obs rdf:type qb:Observation ;
            gnw:customerID ?cust .
       ?cust qb4o:memberOf gnw:customer ;
             skos:broader ?city .
       ?city gnw:cityGeo ?x .
       BIND (bif:st_area(?x) as ?area) }

Line 9. Build the filter statement with the output of the spatial function fS(x), construct GP (which includes Q′) for the outer query, and go to the result (Line 12). At this stage, GP has been constructed in Lines 2 and 8. The following filter statement is added to GP:


FILTER (?cityGeo = ?x) }

Line 11. If a spatial literal vs ∈ Ls is given as the parameter instead, build a filter statement that checks whether vs is equal to the spatial attribute as values, and go to the result (Line 12). For example, the following filter condition might be added to graph pattern GP:

FILTER (?cityGeo = "POLYGON((10.43951 55.47006, 10.439472 55.470036, 10.439240 (...))")}

Line 12. Finally, the algorithm generates query Q, which can be executed over the fact members FM′. In our running examples, we obtain the following cases for the generated s-slice query Q.

S-Slice operator with a given spatial value as point data type: The following listing corresponds to the SPARQL output of the running example where the spatial value is given as POINT data type (Ex. 12) and filters the level attributes with a spatial predicate within a given level. The graph pattern GP for the query is created in Lines 2 to 7.

1 Q = SELECT ?obs WHERE
2 {?obs rdf:type qb:Observation ;
3       gnw:customerID ?cust .
4  ?cust qb4o:memberOf gnw:customer ;
5        skos:broader ?city .
6  ?city gnw:cityGeo ?cityGeo .
7  FILTER (bif:st_within("POINT(10.43951 55.47006)", ?cityGeo)) }

S-Slice operator with a spatial function call: The following listing corresponds to the SPARQL output of the running example where the spatial value is returned from a function call (Ex. 11.2). The graph pattern GP′ for the spatial function call is created in Lines 8 to 13. The graph pattern GP for the whole query is created in Lines 2 to 12.

1 Q = SELECT ?obs WHERE
2 { ?obs rdf:type qb:Observation ;
3        gnw:customerID ?cust .
4   ?cust qb4o:memberOf gnw:customer ;
5         skos:broader ?city .
6   ?city gnw:cityGeo ?cityGeo .
    # When there is a spatial function call we apply
    # an inner select for finding the largest city
7   {SELECT ?x (MAX(?area) as ?maxArea) WHERE
8    { ?obs rdf:type qb:Observation ;
9           gnw:customerID ?cust .
10     ?cust qb4o:memberOf gnw:customer ;
11           skos:broader ?city .
12     ?city gnw:cityGeo ?x .
13     BIND(bif:st_area(?x) as ?area)}}
    # Then we apply the filter on the output
    # coming from the spatial function
14  FILTER (?cityGeo = ?x) }

S-Slice operator with a given spatial value as polygon data type: The following listing corresponds to the SPARQL output of the running example where the spatial value is given as a POLYGON data type (Ex. 11.1) corresponding to a level attribute. The graph pattern GP for the query is created in Lines 2 to 7.

1 Q = SELECT ?obs WHERE
2 {?obs rdf:type qb:Observation ;
3       gnw:customerID ?cust .
4  ?cust qb4o:memberOf gnw:customer ;
5        skos:broader ?city .
6  ?city gnw:cityGeo ?cityGeo .
7  FILTER (?cityGeo = "POLYGON((10.43951 55.47006, 10.43947 55.47003, 10.43924 (...))")}
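To make the branching of Algorithm 2 concrete, the following is a minimal Python sketch of the three cases above. It is only illustrative: the RUPath output is hard-coded as the GeoNorthwind triple patterns, and all helper names (ru_path, s_slice) are ours, not part of QB4SOLAP.

# Sketch of the branching in Algorithm 2 (S-SliceGenerator); names are illustrative only.

def ru_path():
    # Stand-in for RUPath (Algorithm 1): path from observations to the City geometry.
    return ["?obs rdf:type qb:Observation ;",
            "     gnw:customerID ?cust .",
            "?cust qb4o:memberOf gnw:customer ;",
            "      skos:broader ?city .",
            "?city gnw:cityGeo ?cityGeo ."]

def s_slice(v_s, is_point=False, spatial_fn=None, agg="MAX"):
    gp = ru_path()
    if is_point:                                 # Lines 3-4: point literal, topological FILTER
        gp.append(f'FILTER (bif:st_within("{v_s}", ?cityGeo))')
    elif spatial_fn is not None:                 # Lines 5-9: value comes from a spatial function
        inner = ru_path()[:-1] + ["?city gnw:cityGeo ?x .",
                                  f"BIND ({spatial_fn}(?x) AS ?v)"]
        gp.append("{ SELECT ?x (" + agg + "(?v) AS ?aggV) WHERE { "
                  + " ".join(inner) + " } }")
        gp.append("FILTER (?cityGeo = ?x)")
    else:                                        # Line 11: polygon literal, equality FILTER
        gp.append(f'FILTER (?cityGeo = "{v_s}")')
    return "SELECT ?obs WHERE {\n  " + "\n  ".join(gp) + "\n}"

print(s_slice("POINT(10.43951 55.47006)", is_point=True))   # Ex. 12
print(s_slice(None, spatial_fn="bif:st_area"))               # Ex. 11.2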

S-Dice generator. To generate a SPARQL query for the s-dice operator SD(C)[φS] (Def. 18 – parameter φS represents a spatial predicate), we follow the steps sketched in Algorithm 3. The algorithm takes parameter φS as input, which corresponds to a spatial predicate that could represent a topological relation from the Trel set or a combination of a spatial function (a numeric operation from the Nop set) and a regular predicate φ. For illustration, we use the example query that we introduced in Sect. 5 for s-dice (Ex. 13): "sales to customers that are located within a 5 km distance from their city center". In the following, we discuss the main steps of Algorithm 3 with the running example, where the steps reference the line numbers in Algorithm 3.

Line 3. The algorithm runs through the levels from the base level lb to the target spatial level ls, which are both given in the spatial Boolean predicate φS.

Line 4. Build the roll-up path for those levels using the helper function RUPath. Note that, when we apply the roll-up path to the target level, we can also link the level attributes for the target (spatial) level – as, for example, in the last line of the following listing. The output of function RUPath is added to graph pattern GP:


{?obs rdf:type qb:Observation ;
      gnw:customerID ?cust .
 ?cust qb4o:memberOf gnw:customer ;
       skos:broader ?city ;
       gnw:customerGeo ?custGeo .
 ?city gnw:cityGeo ?cityCentGeo .

Line 5. Check if φS is to be implemented as a spatial predicate from the topological relations Trel, as interpreted in Ex. 13.1.

Line 6. Create a filter statement with a spatial predicate and the spatial level attribute as, which is referenced in the roll-up path (Line 4). For our running example, the filter statement is applied on customers that are located within a buffer area of 5 km from their city centers. The spatial predicate st_within is used from the topological relations. The following lines are added to graph pattern GP:

FILTER (bif:st_within(?custGeo, ?cityCentGeo, 5))}

Line 7. Check if φS is to be implemented as a combination of a spatial function fS(x) and a regular predicate φ, as interpreted in Ex. 13.2.

Algorithm 3: S-DiceGenerator(G(C)I, φS) : Q
Input: G(C)I, φS
Output: Q
 1 begin
 2   Q = ∅; GP = ∅
 3   for lb, ls ∈ φS do
 4     GP = GP ∪ RUPath(G(C), lb, ls, aID, ?as, ?fS)
 5   if φS is a spatial predicate then
 6     GP = GP ∪ (FILTER φS(?as))
 7   else if φS uses a spatial function fS(x) and a regular Boolean predicate φ then
 8     GP = GP ∪ (BIND fS(x) AS ?vx)
 9     GP = GP ∪ (FILTER φ(?vx))
10   return Q = SELECT ?f WHERE GP

Lines 8, 9. Create a bind statement based on a spatial function (i.e., calculate st_distance between customers and city center) and a filter statement based on the assigned values with a regular predicate (i.e., less than 5 km). The following lines are added to graph pattern GP:


BIND (bif:st_distance (?custGeo, ?cityCentGeo) AS ?distance) FILTER ( ?distance < 5 )}

Line 10. Generate query Q for selecting the facts f ∈ FM′ matching the incrementally created graph pattern GP from the previous steps. In our running examples, we obtain the following cases for the generated s-dice query Q.

S-Dice operator with φS : The following listing is the SPARQL query generated for the running example (Ex. 13.1), where the spatial predicate is interpreted as a topological relation. The graph pattern GP for the query is created in Lines 2 to 8.

1 Q = SELECT ?obs WHERE
2 { ?obs rdf:type qb:Observation ;
3        gnw:customerID ?cust .
4   ?cust qb4o:memberOf gnw:customer ;
5         skos:broader ?city ;
6         gnw:customerGeo ?custGeo .
7   ?city gnw:cityGeo ?cityCentGeo .
8   FILTER (bif:st_within (?cityCentGeo, ?custGeo, 5))}

S-Dice operator with fS(x) and a regular Boolean predicate φ: The following listing is the SPARQL query generated for the running example (Ex. 13.2), where the spatial predicate is interpreted as a combination of a spatial function and a regular predicate. The graph pattern GP for the query is created in Lines 2 to 9.

1 Q = SELECT ?obs WHERE
2 { ?obs rdf:type qb:Observation ;
3        gnw:customerID ?cust .
4   ?cust qb4o:memberOf gnw:customer ;
5         skos:broader ?city ;
6         gnw:customerGeo ?custGeo .
7   ?city gnw:cityGeo ?cityCentGeo .
8   BIND (bif:st_distance (?custGeo, ?cityCentGeo) AS
9         ?distance) FILTER ( ?distance < 5 ) }
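As with s-slice, the two branches of Algorithm 3 can be summarized in a few lines of Python. This is a hypothetical sketch: the roll-up path is hard-coded for the running example and the helper names are ours.

# Sketch of the two branches in Algorithm 3 (S-DiceGenerator); illustrative names only.

GP_BASE = ["?obs rdf:type qb:Observation ;",
           "     gnw:customerID ?cust .",
           "?cust qb4o:memberOf gnw:customer ;",
           "      skos:broader ?city ;",
           "      gnw:customerGeo ?custGeo .",
           "?city gnw:cityGeo ?cityCentGeo ."]

def s_dice(spatial_predicate=None, spatial_fn=None, regular_predicate=None):
    gp = list(GP_BASE)
    if spatial_predicate is not None:    # Lines 5-6: topological relation, e.g. st_within
        gp.append(f"FILTER ({spatial_predicate})")
    else:                                # Lines 7-9: spatial function + regular predicate
        gp.append(f"BIND ({spatial_fn} AS ?v)")
        gp.append(f"FILTER (?v {regular_predicate})")
    return "SELECT ?obs WHERE {\n  " + "\n  ".join(gp) + "\n}"

# Ex. 13.1: customers within 5 km of their city centre, via a topological predicate.
print(s_dice(spatial_predicate="bif:st_within(?custGeo, ?cityCentGeo, 5)"))
# Ex. 13.2: the same condition via st_distance and a regular comparison.
print(s_dice(spatial_fn="bif:st_distance(?custGeo, ?cityCentGeo)", regular_predicate="< 5"))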

S-Roll-up Generator. To generate a SPARQL query for the s-roll-up operator from a high-level SOLAP expression SRU(C)[fS(L(di)), agg(m)] (Def. 19), where parameter fS(L(di)) denotes a spatial function on spatial level members and agg(m) is an aggregate function on measures, we follow the main steps sketched in Algorithm 4. For illustration, we use the query example given in Sect. 5 for s-roll-up (Ex. 14): "total amount of sales to customers by city of the closest suppliers".


Lines 2, 3. Build the roll-up path using helper function RUPath. In addition to the variables given in the RUPath function, we also need to consider measures and measure value variables (Line 3) since we aggregate the measures. A measure is specified in the following listing of the running example as gnw:salesAmount. The following lines are added to the graph pattern GP:

{?obs rdf:type qb:Observation ;
      gnw:customerID ?cust ;
      gnw:supplierID ?sup ;
      gnw:salesAmount ?sales .
 ?cust qb4o:memberOf gnw:customer ;
       gnw:customerGeo ?custGeo ;
       skos:broader ?city .
 ?sup qb4o:memberOf gnw:supplier ;
      gnw:supplierGeo ?supGeo ;
      skos:broader ?city .
 ?city qb4o:memberOf gnw:city .

Line 4. Build an inner select subquery to apply the spatial function fS on the spatial level members L(di) (i.e., Customer, Supplier). In the example, we will use this information to create a dynamic spatial hierarchy from the Customer to the City level.

Algorithm 4: SRUGenerator(G(C)I, fS(L(ds)), agg(m)) : Q
Input: G(C)I, fS(L(di)), agg(m)
Output: Q
 1 begin
 2   Q = ∅; GP = RUPath(G(C), lb, ls, aID, ?as, ?fS)
 3   GP = GP ∪ (?f id(m) ?m)
 4   for fS(L(ds)) do
 5     GP′ = RUPath(G(C), lb, ls, aID, ?as, ?fS); Q′ = ∅
 6     GP′ = GP′ ∪ (BIND fS(x) AS ?vx)
 7     Q′ = SELECT ?x (AGG(?vx) AS ?vy) WHERE GP′ GROUP BY ?x
 8     GP = GP ∪ Q′ ∪ (FILTER ?x = ?as && fS(x) = ?vy)
 9     let l′s = ls
10   return Q = SELECT ?f ?l′s AGG(?m) WHERE GP GROUP BY ?f ?l′s

Line 5. Call RUPath for the inner select subquery to link the geometry attributes of base level members with different variables and create a graph pattern GP′ for the inner select. The following lines are added to the graph pattern GP′:

{?obs rdf:type qb:Observation ;
      gnw:customerID ?cust1 ;
      gnw:supplierID ?sup1 .
 ?sup1 gnw:supplierGeo ?sup1Geo .
 ?cust1 gnw:customerGeo ?cust1Geo .

Line 6. Build the bind statement in order to calculate the spatial function fS(L(ds)) on spatial level members. For the running example, the spatial function is st_distance. The following lines are added to the graph pattern GP′:

BIND (bif:st_distance(?cust1Geo, ?sup1Geo) AS ?distance)}

Line 7. Generate the inner select query Q′ using graph pattern GP′ (Lines 5 and 6). Select the corresponding level members (Customer level for the running example) and group them in a group by statement on the selected level members. Note that this is where the spatial function fS(L(di)) is called with a wrapper expression (e.g., MIN, MAX, etc.) to find the closest distance. The following lines illustrate the inner select query Q′:

Q′ = {SELECT ?cust1 (MIN(?distance) AS ?minDistance)
      WHERE GP′ GROUP BY ?cust1}

Lines 8, 9. Build the filter statement for the whole query based on the output of the spatial function, which is calculated in the inner select subquery. Then, add the filter and the inner select subquery to the main graph pattern GP (Line 8). The filter statement for the running example is:

FILTER (?cust = ?cust1 && bif:st_distance (?custGeo, ?supGeo) = ?minDistance)}

Note that in Line 9, the spatial target level ls (City) is altered to a dynamic spatial level l′s since applying the spatial function creates a dynamic hierarchy.

Line 10. Generate query Q for computing the facts f ∈ FM′ based on graph pattern GP created in the previous steps. The measures are also aggregated at the spatial target level (closest City, which is dynamically selected). The group by statement is applied on the fact members and target level members. In our running example, we obtain the following case for the generated s-roll-up query Q.

S-Roll-up operator: The following listing shows the generated SPARQL query. Graph pattern GP′ for the inner select subquery is created in Lines 15 to 22 and the graph pattern GP for the whole query is created in Lines 3 to 24.

1 Q = SELECT ?obs ?city (SUM(?sales) AS
2     ?totalSales) WHERE
3 { ?obs rdf:type qb:Observation ;
4        gnw:customerID ?cust ;
5        gnw:supplierID ?sup ;
6        gnw:salesAmount ?sales .
7   ?cust qb4o:memberOf gnw:customer ;
8         gnw:customerGeo ?custGeo ;
9         gnw:customerName ?custName ;
10        skos:broader ?city .
11  ?city qb4o:memberOf gnw:city .
12  ?sup gnw:supplierGeo ?supGeo .
    # Inner Select for the distance function
13  { SELECT ?cust1 (MIN(?distance) AS
14    ?minDistance) WHERE
15    { ?obs rdf:type qb:Observation ;
16           gnw:customerID ?cust1 ;
17           gnw:supplierID ?sup1 .
18      ?sup1 gnw:supplierGeo ?sup1Geo .
19      ?cust1 gnw:customerGeo ?cust1Geo .
20      BIND (bif:st_distance( ?cust1Geo, ?sup1Geo )
21      AS ?distance)}
22    GROUP BY ?cust1 }
23  FILTER (?cust = ?cust1 && bif:st_distance
24    (?custGeo, ?supGeo) = ?minDistance)}
25  GROUP BY ?city ?obs
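The structure of Algorithm 4 (an inner select computing the spatial function per base-level member, a FILTER linking its result back to the outer pattern, and a final GROUP BY at the dynamic target level) can likewise be sketched in a few lines of Python. The helper names and hard-coded triple patterns below are illustrative only.

# Sketch of Algorithm 4 (SRUGenerator) for the running example: total sales by city
# of the closest supplier. Names and string templates are illustrative only.

OUTER_PATH = ["?obs rdf:type qb:Observation ;",
              "     gnw:customerID ?cust ;",
              "     gnw:supplierID ?sup ;",
              "     gnw:salesAmount ?sales .",
              "?cust gnw:customerGeo ?custGeo ;",
              "      skos:broader ?city .",
              "?sup  gnw:supplierGeo ?supGeo ."]

INNER_PATH = ["?obs rdf:type qb:Observation ;",
              "     gnw:customerID ?cust1 ;",
              "     gnw:supplierID ?sup1 .",
              "?cust1 gnw:customerGeo ?cust1Geo .",
              "?sup1  gnw:supplierGeo ?sup1Geo ."]

def s_rollup(spatial_fn="bif:st_distance", agg="SUM", measure="?sales"):
    # Line 7: inner select applies the spatial function per base-level member.
    inner = ("{ SELECT ?cust1 (MIN(?d) AS ?minD) WHERE { "
             + " ".join(INNER_PATH)
             + f" BIND ({spatial_fn}(?cust1Geo, ?sup1Geo) AS ?d) }} GROUP BY ?cust1 }}")
    # Line 8: filter the outer pattern with the aggregated function value.
    gp = OUTER_PATH + [inner,
                       f"FILTER (?cust = ?cust1 && {spatial_fn}(?custGeo, ?supGeo) = ?minD)"]
    # Line 10: aggregate the measure at the (dynamic) target level.
    return (f"SELECT ?city ({agg}({measure}) AS ?total) WHERE {{\n  "
            + "\n  ".join(gp) + "\n} GROUP BY ?city")

print(s_rollup())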

S-Drill-down Generator. The semantics of the s-drill-down operator are defined in the same way as for the s-roll-up operator with the condition that the input cube C for s-roll-up is obtained using a function BaseCube such that SDD(C)[fS(L(di)), agg(m)] = SRU(BaseCube(C))[fS(L(di)), agg(m)] (Def. 20). Therefore, no generator algorithm and steps are specified since an s-drill-down operator corresponds to a rewriting of an s-roll-up operator, which is obtained with a Base function that calls the base cube graph in SRUGenerator as SDDGenerator = SRUGenerator(Base(G(C)I), fS(L(di)), agg(m)).
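In a generator implementation this rewriting amounts to a single delegation. The following hypothetical Python sketch illustrates the idea; base_cube and the generator parameter are stand-ins and not part of the paper's algorithms.

# S-drill-down corresponds to s-roll-up on the base cube (Def. 20); hypothetical sketch.

def base_cube(cube_graph):
    # Stand-in for BaseCube: return the cube graph at its finest granularity.
    return cube_graph

def s_drilldown(cube_graph, spatial_fn, agg, measure, s_rollup_generator):
    # Delegate directly to the s-roll-up generator over the base cube.
    return s_rollup_generator(base_cube(cube_graph), spatial_fn, agg, measure)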


6.2 Nested SOLAP operations to SPARQL

We now show how a SPARQL query can be generated for a nested SOLAP expression. In general, a nested set of SOLAP operators can be rewritten into an expression with an additional s-dice, on top of a series of s-roll-ups, on top of one or more s-slices, on top of an s-dice, i.e., (s-dice2(s-roll-up1(. . . s-roll-upk(s-slice1(. . . s-slicen(s-dice1(C))))))).
Let us begin with a simpler nested form that shows the most typical pattern, namely (s-roll-up (s-slice (s-dice(C)))), where initially a subcube graph is selected by s-dice. Afterwards, an s-slice is performed on a higher level of a dimension. Then, an s-roll-up is applied, which aggregates the measures in the sliced cube from a lower level to a higher level. Finally, we could also perform another s-dice for filtering the measures. There may be several s-slices and s-roll-ups in between.
We formulate the nested SOLAP query as (3: s-roll-up (2: s-slice (1: s-dice(C)))) and apply our running examples such that the enumeration of operators can be interpreted as follows: (1) get the subcube graph of customers that are located within a 5 km distance from their city center, (2) slice on the customers of the largest country (which drops the dimension and leaves out all the other countries), and (3) get the total amount of sales for customers by the city of their closest suppliers (aggregates the measure Sales amount from Customer to Closest City level). Finally, we may also perform another (s-)dice on measures, e.g., filtering the total amount of sales greater than 10500.
To perform nested SOLAP operators, we identify a set of principles to be considered by the algorithm.

Principle 1: Perform s-dice in the beginning or at the end.

Principle 2: If there are several s-roll-up or s-slice operations call their generator algorithms repeatedly.

Principle 3: Always separate FILTER clauses when a SOLAP generator algorithm is used. Enumerate separated FILTER clauses. If a SOLAP operator is the final function added to the graph pattern, do not separate the FILTER clause.

Principle 4: Build the final graph pattern with the separated and enumerated FILTER clauses with respect to Principle 3.

Principle 5: Drop the main SELECT clause from each SOLAP generator algorithm and build only one SELECT that is added to the query at the end.

Principle 6: Separate the GROUP BY clause and AGG functions from the s-roll-up generator algorithms (and enumerate them), and add them to the main (outer) SELECT clause at the end.


Algorithm 5: WriteSPARQL((SRU(C)[fS(L(di)), agg(m)](SS(C)[lb, ls, vas](SD(C)[φS])))) : Q
Input: (SRU(C)[fS(L(ds)), agg(m)](SS(C)[lb, ls, vas](SD(C)[φS])))
Output: Q
 1 begin
 2   Q = ∅; GP = RUPath(G(C), lb, ls, aID, ?as, ?fS)
 3   GP = GP ∪ (?f id(m) ?m)
 4   GP1 = S-DiceGenerator(G(C)I, φS) \ FILTER1 \ SELECT
 5   GP = GP ∪ GP1
 6   GP2 = S-SliceGenerator(G(C)I, vs, lb, ls) \ FILTER2 \ SELECT
 7   GP = GP ∪ GP2
 8   GP = GP ∪ FILTER1 ∪ FILTER2 ∪
 9        SRUGenerator(G(C)I, fS(L(ds)), agg(m)) \ SELECT \ GROUP BY1 \ AGG1
10   return Q = SELECT ?l′s AGG1(?m) WHERE GP GROUP BY1 ?l′s

To separate the FILTER clauses, we call the SOLAP generator algorithms without their FILTER clause and enumerate each FILTER clause for each SOLAP generator algorithm that is used, i.e., S-DiceGenerator(G(C)I, φS) \ FILTER1 (Algorithm 5, Line 4). Then, we build the final graph pattern with these separated FILTER clauses, i.e., GP = GP ∪ FILTER1 ∪ FILTER2 (Line 8). When the last SOLAP generator algorithm is called, the output is directly added to the graph pattern without separating its FILTER clause (Line 9). Throughout the algorithm, all the SELECT clauses are omitted and combined into one SELECT in the output on Line 10. According to Principle 6, if there are any GROUP BY clauses and AGG functions (on measures) in inner selects, we eliminate them with "\" from the inner selects (Line 9) and finally build the main (outer) select query (Line 10). Note that in the algorithm, the general graph pattern GP is initially created by the RUPath function (Line 2) and incremented with triple patterns for the selected measures (Line 3).

Example 16. ((3: s-roll-up (2: s-slice (1: s-dice(C))))): (1) get the subcube graph of customers that are located within a 5 km distance from their city center, (2) slice on the customers of the largest country (which drops the dimension and leaves out all the other countries), and (3) get the total amount of sales of customers by the city of their closest suppliers (aggregates the measure Sales amount from Customer to Closest City level). The query is written starting from the innermost operator s-dice to the outermost operator s-roll-up.


1 SELECT ?city (SUM(?sales) AS ?totalSales)
2 WHERE {
3   ?obs rdf:type qb:Observation ;
4        gnw:customerID ?cust ;
5        gnw:supplierID ?sup ;
6        gnw:salesAmount ?sales .
7   ?cust qb4o:memberOf gnw:customer ;
8         gnw:customerGeo ?custGeo ;
9         skos:broader ?city .
10  ?sup qb4o:memberOf gnw:supplier ;
11       gnw:supplierGeo ?supGeo ;
12       skos:broader ?city .
13  ?city qb4o:memberOf gnw:city ;
14        gnw:cityGeo ?cityCentGeo ;
15        skos:broader ?country .
16  ?country qb4o:memberOf gnw:country ;
17           gnw:countryGeo ?countGeo .
18  ?city gnw:cityGeo ?cityGeo .
    ### 1. Inner select for (S-SLICE)
    ### Find the largest country
19  {SELECT ?x (MAX(?area) as ?maxArea)
20   WHERE {
21     ?obs rdf:type qb:Observation ;
22          gnw:customerID ?cust .
23     ?cust qb4o:memberOf gnw:customer ;
24           skos:broader ?city .
25     ?city skos:broader ?country .
26     ?country gnw:countryGeo ?x .
27     BIND(bif:st_area(?x) as ?area)}}
    ### 2. Inner select for (S-ROLL-UP)
    ### Find the closest suppliers to customers
28  { SELECT ?cust1 (MIN(?distance) AS
29    ?minDistance) WHERE {
30      ?obs rdf:type qb:Observation ;
31           gnw:customerID ?cust1 ;
32           gnw:supplierID ?sup1 .
33      ?sup1 gnw:supplierGeo ?sup1Geo .
34      ?cust1 gnw:customerGeo ?cust1Geo .
35      BIND (bif:st_distance(?cust1Geo, ?sup1Geo) AS ?distance)}
36    GROUP BY ?cust1 }
    ### FILTER for S-DICE, to get a subcube
37  FILTER (bif:st_within (?custGeo, ?cityCentGeo, 5))
    ### FILTER for S-SLICE, the 1st inner SELECT
38  FILTER (?countGeo = ?x)
    ### FILTER for S-ROLL-UP, the 2nd inner SELECT
39  FILTER (?cust = ?cust1 && bif:st_distance (?custGeo, ?supGeo) = ?minDistance)}
40  GROUP BY ?city

The graph pattern GP is initially created with the RUPath function for the corresponding levels and level attributes (Lines 3 to 18 of the generated SPARQL query in the listing above). The first operator is called by function S-DiceGenerator, where the first FILTER clause of the outer select is added to the query (Line 37). The second operator is called by the S-SliceGenerator function excluding its FILTER clause (Lines 19 to 27), which is followed by the SRUGenerator function without GROUP BY and AGG statements (Lines 28 to 36). Note that in Line 36, GROUP BY is applied on the lower level Customer, and the actual GROUP BY for the target City level is applied in the last line (Line 40). Separated FILTER clauses for the S-DiceGenerator and S-SliceGenerator functions are later added to the graph pattern (Algorithm 5, Line 8), corresponding to Lines 37 and 38 in the above example. The main outer select query is defined in the first line by specifying the target level (City) and the aggregate function on measures (sum of the total Sales).
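A compact way to see Principles 3 to 6 at work is to treat each generator's output as a pair of a graph pattern and its separated FILTER clauses, and to keep only the outermost SELECT and GROUP BY when recombining them. The following hypothetical Python sketch illustrates this composition idea; it does not reproduce the full query strings, and all names are ours.

# Sketch of the composition idea behind Algorithm 5 (WriteSPARQL); illustrative only.

from dataclasses import dataclass, field
from typing import List

@dataclass
class GeneratedPart:
    graph_pattern: List[str]                              # triple patterns and inner selects
    filters: List[str] = field(default_factory=list)      # separated FILTER clauses

def compose_nested(parts: List[GeneratedPart], select: str, group_by: str) -> str:
    gp: List[str] = []
    filters: List[str] = []
    for part in parts:                    # Principles 2-4: concatenate the patterns and
        gp.extend(part.graph_pattern)     # keep the FILTERs aside, adding them last
        filters.extend(part.filters)
    body = "\n  ".join(gp + filters)
    # Principles 5-6: a single outer SELECT and GROUP BY for the whole query.
    return f"SELECT {select} WHERE {{\n  {body}\n}} GROUP BY {group_by}"

s_dice_part  = GeneratedPart(["?cust gnw:customerGeo ?custGeo ."],
                             ["FILTER (bif:st_within(?custGeo, ?cityCentGeo, 5))"])
s_slice_part = GeneratedPart(["{ SELECT ?x (MAX(?area) AS ?maxArea) WHERE { ... } }"],
                             ["FILTER (?countGeo = ?x)"])
print(compose_nested([s_dice_part, s_slice_part],
                     "?city (SUM(?sales) AS ?totalSales)", "?city"))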

7 Conclusions and future work

Motivated by the need for a formal foundation for spatial data warehouses on the Semantic Web, this paper made a number of contributions. First, it proposed the QB4SOLAP vocabulary (metamodel), which supports spatially enhanced multidimensional (MD) data cubes over RDF data. This allows users to publish MD spatial data in RDF format. Second, the paper defines a number of spatial OLAP (SOLAP) operators over the defined QB4SOLAP cubes, allowing spatial analytical queries over RDF data, and gives their formal semantics. Third, the paper provides algorithms for generating spatially extended SPARQL queries from individual and nested SOLAP operators, allowing users to write their spatial analytical queries in our high-level SOLAP language instead of the lower-level and more complex SPARQL. Fourth, the vocabulary, operators, and query generation algorithms are validated by applying them to a realistic use case.
Fig. C.8 presents our future vision of SOLAP on the Semantic Web with regard to the current state of the art and ongoing work. In order to verify the algorithms, which are defined in this paper, we developed GeoSemOLAP [32], a SPARQL query generation tool. GeoSemOLAP allows users to perform SOLAP operations and generate SPARQL queries interactively through a GUI. We refer to our screencast16 for a detailed demonstration of GeoSemOLAP.

16https://youtu.be/Pc3RBPPgBhA



Fig. C.8: Our future vision of SOLAP on the SW

Publishing spatial DWs on the SW allows users to also exploit existing external geo-vocabularies (e.g., GeoNames, etc.)17 by defining spatial levels and hierarchies from external open data sources. Our main ongoing work thus focuses on developing an RDF2SOLAP enrichment module that performs the multidimensional annotation of existing spatial RDF datasets with QB4SOLAP in a (semi-)automatic fashion.
Additional interesting aspects of future work would be, for instance, extending the formal techniques and algorithms for generating SOLAP queries in SPARQL to work over multiple RDF cubes, i.e., to support s-drill-across, and supporting spatial aggregation (s-aggregation) with user-defined functions over spatial measures. It would also be interesting to increase efficiency by extending our spatial data warehouse with techniques that have been developed in the context of RDF data cubes and SPARQL analytical queries in general, e.g., materialization and optimizing the physical layout [33–35], and to enable efficient support of a broad range of external sources by considering aspects such as federated processing of analytical queries [36] and schema heterogeneity [37]. We will also consider more efficient representations of the data, e.g., by removing redundancies. Furthermore, it would be interesting to extend QB4SOLAP and GeoSemOLAP [32] to handle highly dynamic spatio-temporal data and queries, as for instance, found in large-scale transport analytics [38].

17GeoNames: http://www.geonames.org/
Global Administrative Areas: http://gadm.geovocab.org/
NUTS – EU's Nomenclature of Territorial Units for Statistics: http://nuts.geovocab.org/


A Appendix

A.1 Query Run Times

Table C.5 presents the query runtimes for the SOLAP operator examples (Ex. 12, Ex. 13.1, Ex. 13.2, Ex. 14, and Ex. 15) given in Sect. 5 and a nested SOLAP operator example (Ex. 16) given in Sect. 6.2. The SOLAP queries are tested against an instance of the use case dataset (GeoNorthwind) that we have discussed in Sect. 4. In total, 48677 triples are obtained from the GeoNorthwind dataset. The published RDF graph instance is denoted as C in Table C.5. We used Virtuoso Open Source Edition (Column Store and multi-threaded) Version 7.2.5 on an Ubuntu 14.04 server with 2.30GHz CPU and 16 GB RAM to publish the triples at the SPARQL endpoint http://lod.cs.aau.dk:8890/sparql. The actual queries used in the paper are available at http://extbi.cs.aau.dk/SOLAP4SW/queries.

Table C.5: Runtimes in seconds

SOLAP Operators                              Query Runtime
Ex. 12 (s-slice(C))                          0.07
Ex. 13.1 (s-dice(C))                         0.09
Ex. 13.2 (s-dice(C))                         1.01
Ex. 14 (s-roll-up(C))                        2.03
Ex. 15 (s-drill-down(C))                     1.86
Ex. 16 (s-roll-up (s-slice (s-dice(C))))     3.04

The query runtime of the s-slice query in Ex. 12 is measured as 0.07 seconds; s-slice is an efficient SOLAP operator to execute, since filtering on the given spatial literal is done only for the records of the specified level instances.
The query runtimes of the s-dice queries in Ex. 13.1 and Ex. 13.2 are measured as 0.09 and 1.01 seconds, respectively, for the two alternative ways of implementing s-dice in SPARQL. It is clear that the first use of the s-dice operator is more efficient compared to the second, even though both return the same results. The first one performs s-dice by filtering the instances of the customers through their geometries that are within a buffer area of a circle with a 5 km radius and the city center geometry as the center point of the circle. The second one performs s-dice by measuring the distances between each customer's location and their city center, and then applies a filter on those measurements to find the ones that are less than 5 km. This is a more

expensive approach due to measuring the distances between each customer and city instance with a spatial distance function.
The query runtime of the s-roll-up query in Ex. 14 is measured as 2.03 seconds. The s-roll-up operator has a longer response time than the other operators as it combines a spatial distance function with aggregate operators. Due to its complexity, s-roll-up requires an inner select where it binds the distances of specified spatial level attributes (e.g., customer and supplier geometry), which are calculated with a spatial distance function. Those distance measurements are then filtered in the outer select with respect to the aggregate function.
The query runtime of the s-drill-down query in Ex. 15 is measured as 1.86 seconds. S-drill-down operates in a very similar manner as s-roll-up. Theoretically, s-drill-down operates on an already spatially rolled-up data cube, whilst it is implemented in practice as an s-roll-up on the base cube. The RDF data is at the lowest granularity, which is equal to the definition of the base cube. The SPARQL query of the s-drill-down operator is very similar to the s-roll-up operator. The difference in query runtime between Ex. 14 and Ex. 15 is directly related to the number of instances of the specified levels and the complexity of the spatial functions.
The query runtime for the nested SOLAP query in Ex. 16 is measured as 3.04 seconds. The nesting in the query is the main reason why it has a longer response time. Moreover, the nested query has more triple patterns and clauses that need to be evaluated during query execution. Hence, it takes longer than evaluating a single SOLAP operator.
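Runtimes of this kind can be reproduced by timing the queries against the public endpoint. The following sketch uses the SPARQLWrapper library for Python; it is not the measurement harness that was used for Table C.5, and the prefix declarations (qb:, qb4o:, gnw:, skos:, bif:) are omitted for brevity.

# Hypothetical timing sketch for the SOLAP queries (not the harness used for Table C.5).
# Requires: pip install SPARQLWrapper

import time
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://lod.cs.aau.dk:8890/sparql"

def run_timed(query: str) -> float:
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    start = time.perf_counter()
    sparql.query().convert()          # execute the query and parse the result
    return time.perf_counter() - start

s_slice_query = """
SELECT ?obs WHERE {
  ?obs  rdf:type qb:Observation ; gnw:customerID ?cust .
  ?cust qb4o:memberOf gnw:customer ; skos:broader ?city .
  ?city gnw:cityGeo ?cityGeo .
  FILTER (bif:st_within("POINT(10.43951 55.47006)", ?cityGeo))
}"""
print(f"s-slice: {run_timed(s_slice_query):.2f} s")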

A.2 Table of Contents

Table C.6 summarizes a list of all the definitions, extensions, remarks, and algorithms. Definitions are enumerated starting from "1". Extensions are enumerated starting from "5" to be consistent with the base definition of the extension, e.g., Def. 5 (Dimensions) is followed by Ext. 5 (Spatial dimensions). Remarks are enumerated starting from "17" for OLAP operators, which are given before the definitions of the corresponding SOLAP operator, e.g., Remark 17 (Slice) is followed by Def. 17 (S-Slice). Algorithms are enumerated starting from "1".

References

[1] N. Gür, K. Hose, E. Zimányi, and T. B. Pedersen, "Modeling and Querying Spatial Data Warehouses on the Semantic Web," in Semantic Technology: 5th Joint International Semantic Technology Conference (JIST'15), vol. 9544. Springer, 2015, pp. 1–20, https://dx.doi.org/10.1007/978-3-319-31676-5_1.

[2] N. Gür, K. Hose, T. B. Pedersen, and E. Zimányi, "Enabling Spatial OLAP over Environmental and Farming Data with QB4SOLAP," in Semantic Technology: 6th Joint International Semantic Technology Conference (JIST'16), vol. 10055. Springer, 2016, pp. 287–304, https://dx.doi.org/10.1007/978-3-319-50112-3_22.

[3] A. Abelló, O. Romero, T. Pedersen, R. Berlanga Llavori, V. Nebot, M. Aramburu, and A. Simitsis, "Using Semantic Web Technologies for Exploratory OLAP: A Survey," IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 27, no. 2, pp. 571–588, 2014, https://doi.org/10.1109/TKDE.2014.2330822.

[4] B. Kämpgen, S. O'Riain, and A. Harth, "Interacting with Statistical Linked Data via OLAP Operations," in The Semantic Web: ESWC 2012 Satellite Events, vol. 7540. Springer, 2012, pp. 87–101, https://dx.doi.org/10.1007/978-3-662-46641-4_7.

[5] R. Cyganiak, D. Reynolds, and J. Tennison, "The RDF Data Cube Vocabulary," 2014.

[6] L. Etcheverry, A. Vaisman, and E. Zimányi, "Modeling and Querying Data Warehouses on the Semantic Web using QB4OLAP," in Data Warehousing and Knowledge Discovery (DaWaK'14), vol. 8646. Springer, 2014, pp. 45–56, https://dx.doi.org/10.1007/978-3-319-10160-6_5.

[7] R. P. Deb Nath, K. Hose, and T. B. Pedersen, "Towards a Programmable Semantic Extract-Transform-Load Framework for Semantic Data Warehouses," in Proceedings of the 18th International Workshop on Data Warehousing and OLAP (DOLAP'15). ACM, 2015, pp. 15–24, https://doi.org/10.1145/2811222.2811229.

[8] J. Varga, A. A. Vaisman, O. Romero, L. Etcheverry, T. B. Pedersen, and C. Thomsen, “Dimensional enrichment of statistical linked open data,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 40, pp. 22–51, 2016, https://dx.doi.org/10.1016/j.websem.2016.07.003.

[9] P. Revesz, Introduction to Databases: From Biological to Spatio-Temporal. Springer, 2009.

[10] Y. Bédard, E. Bernier, S. Larrivée, M. Nadeau, M. Proulx, and S. Rivest, “Spatial OLAP,” in Forum annuel sur la RD, Géomatique VI: Un monde accessible, 1997, pp. 13–14.


[11] S. Rivest, Y. Bédard, and P. Marchand, "Toward better support for spatial decision making: defining the characteristics of spatial on-line analytical processing (SOLAP)," GEOMATICA, vol. 55, no. 4, pp. 539–555, 2001, Canadian Institute of Geomatics.

[12] E. Edoh-Alove, S. Bimonte, and F. Pinet, "An UML Profile and SOLAP Datacubes Multidimensional Schemas Transformation Process for Datacubes Risk-Aware Design," International Journal of Data Warehousing and Mining (IJDWM), vol. 11, no. 4, pp. 64–83, 2015, https://dx.doi.org/10.4018/ijdwm.2015100104.

[13] T. B. Pedersen and N. Tryfona, "Pre-aggregation in Spatial Data Warehouses," in Proceedings of the 7th International Symposium on Advances in Spatial and Temporal Databases (SSTD'01). Springer, 2001, pp. 460–478, http://dx.doi.org/10.1007/3-540-47724-1_24.

[14] I. V. Lopez, R. T. Snodgrass, and B. Moon, "Spatiotemporal aggregate computation: A survey," IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 17, no. 2, pp. 271–286, 2005, https://doi.org/10.1109/TKDE.2005.34.

[15] J. da Silva, V. C. Times, A. C. Salgado, C. Souza, R. d. N. Fidalgo, and A. G. de Oliveira, "A set of aggregation functions for spatial measures," in Proceedings of the 11th International Workshop on Data Warehousing and OLAP (DOLAP'08). ACM, 2008, pp. 25–32, https://doi.acm.org/10.1145/1458432.1458438.

[16] L. I. Gómez, S. Haesevoets, B. Kuijpers, and A. A. Vaisman, "Spatial aggregation: Data model and implementation," Information Systems (IS), vol. 34, no. 6, pp. 551–576, 2009, https://dx.doi.org/10.1016/j.is.2009.03.002.

[17] J. Han, N. Stefanovic, and K. Koperski, "Selective Materialization: An Efficient Method for Spatial Data Cube Construction," in Research and Development in Knowledge Discovery and Data Mining (PAKDD'98). Springer, 1998, pp. 144–158, https://dx.doi.org/10.1007/3-540-64383-4_13.

[18] E. Malinowski and E. Zimányi, Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications. Data-Centric Systems and Applications. Springer, 2008, https://dx.doi.org/10.1007/978-3-540-74405-4.

[19] A. Vaisman and E. Zimányi, "A Multidimensional Model Representing Continuous Fields in Spatial Data Warehouses," in Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (GIS'09). ACM, 2009, pp. 168–177, https://doi.acm.org/10.1145/1653771.1653797.

[20] L. I. Gómez, S. A. Gómez, and A. A. Vaisman, "A Generic Data Model and Query Language for Spatiotemporal OLAP Cube Analysis," in Proceedings of the 15th International Conference on Extending Database Technology (EDBT'12). ACM, 2012, pp. 300–311, https://doi.acm.org/10.1145/2247596.2247632.

[21] R. Battle and D. Kolas, “Enabling the Geospatial Semantic Web with Parliament and GeoSPARQL,” Semantic Web Journal (SWJ), vol. 3, no. 4, pp. 355–370, 2012, https://dx.doi.org/10.3233/SW-2012-0065.

[22] K. Kyzirakos, M. Karpathiotakis, and M. Koubarakis, "Strabon: A Semantic Geospatial DBMS," in The Semantic Web: 11th International Semantic Web Conference (ISWC'12). Springer, 2012, pp. 295–311, https://dx.doi.org/10.1007/978-3-642-35176-1_19.

[23] M. Koubarakis, M. Karpathiotakis, K. Kyzirakos, C. Nikolaou, and M. Sioutis, "Data Models and Query Languages for Linked Geospatial Data," in Reasoning Web. Semantic Technologies for Advanced Query Answering. Springer, 2012, pp. 290–328, https://dx.doi.org/10.1007/978-3-642-33158-9_8.

[24] C. Stadler, J. Lehmann, K. Höffner, and S. Auer, “LinkedGeoData: A Core for a Web of Spatial Open Data,” Semantic Web Journal (SWJ), vol. 3, pp. 333–354, 2012, https://dx.doi.org/10.3233/SW-2011-0052.

[25] G. Rojas, G. Giannopoulos, and J. J. L. Daniel Hladky, “Managing Geospatial Linked Data in the GeoKnow Project,” in The Semantic Web in Earth and Space Science. Current Status and Future Directions, vol. 20. IOS Press, 2015, p. 51.

[26] A. B. Andersen, N. Gür, K. Hose, K. A. Jakobsen, and T. B. Pedersen, "Publishing Danish Agricultural Government Data as Semantic Web Data," in Semantic Technology: 4th Joint International Semantic Technology Conference (JIST'14), vol. 8943. Springer, 2014, pp. 178–186, https://dx.doi.org/10.1007/978-3-319-15615-6_13.

[27] D. A. Randell, Z. Cui, and A. G. Cohn, “A spatial logic based on regions and connection,” in Principles of Knowledge Representation and Reasoning, vol. 92, 1992, pp. 165–176.

[28] M. J. Egenhofer and J. Herring, "A mathematical framework for the definition of topological relationships," in Fourth international symposium on spatial data handling. Zurich, Switzerland, 1990, pp. 803–813.

[29] M. Perry and J. Herring, “GeoSPARQL: A Geographic Query Language for RDF Data,” OGC Implementation Standard, 2012.

[30] T. B. Pedersen, C. S. Jensen, and C. E. Dyreson, "A Foundation for Capturing and Querying Complex Multidimensional Data," Information Systems (IS), vol. 26, no. 5, pp. 383–423, 2001, http://dx.doi.org/10.1016/S0306-4379(01)00023-0.

[31] C. Ciferri, L. Gómez, M. Schneider, A. Vaisman, and E. Zimányi, "Cube algebra: A Generic User-centric Model and Query Language for OLAP Cubes," International Journal of Data Warehousing and Mining (IJDWM), vol. 9, no. 2, pp. 39–65, 2013, http://dx.doi.org/10.4018/jdwm.2013040103.

[32] N. Gür, J. Nielsen, K. Hose, and T. B. Pedersen, "GeoSemOLAP: SOLAP on the Semantic Web Made Easy," in Proceedings of the 26th International Conference Companion on World Wide Web (WWW'17). ACM, 2017, https://dx.doi.org/10.1145/3041021.3054731.

[33] F. Goasdoué, K. Karanasos, J. Leblay, and I. Manolescu, “View Selection in Semantic Web Databases,” VLDB Endowment, vol. 5, no. 2, pp. 97–108, 2011, https://dx.doi.org/10.14778/2078324.2078326.

[34] D. Ibragimov, K. Hose, T. B. Pedersen, and E. Zimányi, "Optimizing Aggregate SPARQL Queries Using Materialized RDF Views," in The Semantic Web: 15th International Semantic Web Conference (ISWC'16). Springer, 2016, pp. 341–359, https://dx.doi.org/10.1007/978-3-319-46523-4_21.

[35] K. A. Jakobsen, A. B. Andersen, K. Hose, and T. B. Pedersen, "Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries," in Proceedings of the 6th International Workshop on Consuming Linked Data (COLD'15), 2015, http://ceur-ws.org/Vol-1426/paper-02.pdf.

[36] D. Ibragimov, K. Hose, T. B. Pedersen, and E. Zimányi, "Processing Aggregate Queries in a Federation of SPARQL Endpoints," in The Semantic Web: 12th European Semantic Web Conference (ESWC'15). Springer, 2015, pp. 269–285, https://dx.doi.org/10.1007/978-3-319-18818-8_17.

[37] J. Rouces, G. de Melo, and K. Hose, "FrameBase: Representing N-Ary Relations Using Semantic Frames," in The Semantic Web: 12th European Semantic Web Conference (ESWC'15). Springer, 2015, pp. 505–521, https://dx.doi.org/10.1007/978-3-319-18818-8_31.

[38] G. Gidofalvi, T. B. Pedersen, T. Risch, and E. Zeitler, "Highly scalable trip grouping for large-scale collective transportation systems," in Proceedings of the 11th International Conference on Extending Database Technology (EDBT'08). ACM, 2008, pp. 678–689, https://doi.acm.org/10.1145/1353343.1353425.


Table C.6: Table of Contents

Type        No.  Topic                             Sect.  Page
Definition  1    Spatial aggregation               3.2    140
Definition  2    Topological relations             3.2    140
Definition  3    Numeric operations                3.2    140
Definition  4    RDF triple                        4      146
Definition  5    Dimensions                        4.1    147
Extension   5    Spatial dimensions                4.1    147
Definition  6    Hierarchies                       4.1    148-149
Extension   6    Spatial hierarchies               4.1    149
Definition  7    Levels                            4.1    149-150
Extension   7    Spatial levels                    4.1    150
Definition  8    Attributes                        4.1    150-151
Extension   8    Spatial attributes                4.1    151
Definition  9    Hierarchy steps                   4.1    152-153
Extension   9    Spatial hierarchy steps           4.1    153
Definition  10   Partial order on levels           4.1    154
Definition  11   Measures                          4.1    154
Extension   11   Spatial measures                  4.1    154
Definition  12   Fact                              4.1    155
Extension   12   Spatial fact                      4.1    156
Definition  13   Level members                     4.2    157
Definition  14   Attributes of level members       4.2    157
Definition  15   Partial order on level members    4.2    158
Definition  16   Fact members                      4.2    158-159
Remark      17   Slice                             5      160
Definition  17   S-Slice                           5      160
Remark      18   Dice                              5      162
Definition  18   S-Dice                            5      163
Remark      19   Roll-up                           5      164
Definition  19   S-Roll-up                         5      164-165
Remark      20   Drill-down                        5      166
Definition  20   S-Drill-down                      5      167
Algorithm   1    RUPath                            6.1    168
Algorithm   2    S-SliceGenerator                  6.1    170
Algorithm   3    S-DiceGenerator                   6.1    174
Algorithm   4    SRUGenerator                      6.1    176
Algorithm   5    WriteSPARQL                       6.2    180

Paper D

GeoSemOLAP: Geospatial OLAP on the Semantic Web Made Easy

Nuref¸sanGür, Jacob Nielsen, Katja Hose, and Torben Bach Pedersen

The (demo) paper has been published in the Proceedings of the 26th International Conference on World Wide Web and awarded as Best Demo by Reviewer's Choice. ACM ID: 3054731, pp. 213–217, 2017. DOI: 10.1145/3041021.3054731

Abstract

The proliferation of spatial data and the publication of multidimensional (MD) data on the Semantic Web (SW) has led to new opportunities for spatial On-Line Analytical Processing (SOLAP) over spatial data using SPARQL. However, formulating such queries results in verbose statements and can easily become very difficult for inexperienced users. Hence, we have developed GeoSemOLAP to enable users without detailed knowledge of RDF and SPARQL to query the SW with SOLAP. GeoSemOLAP generates SPARQL queries based on high-level SOLAP operators and allows the user to interactively formulate queries using a graphical interface with interactive maps.

© 2017 International World Wide Web Conference Committee (IW3C2), published under Creative Commons CC BY 4.0 License. Reprinted, with permission from Nurefşan Gür, Jacob Nielsen, Katja Hose, and Torben Bach Pedersen. GeoSemOLAP: Geospatial OLAP on the Semantic Web Made Easy. In: WWW'17 Companion, ACM 978-1-4503-4913-0/17/04. The layout has been revised.

1 Introduction

The data that is currently available on the Semantic Web (SW) has evolved from being simple, mostly alphanumeric, to also include more complex data such as spatial data [1]. Although spatial data is common on the SW, it remains difficult to utilize and analyze because spatial data requires special techniques for encoding it in RDF and evaluating spatial functions, which are often not supported by standard triple stores. On the other hand, the support of spatial data is more common in the area of relational databases, where spatial data warehouses are typically based on a multidimensional (MD) relational model involving spatial dimensions. Efficiently processing spatial data in this context is typically realized by Online Analytical Processing (OLAP) extended to support spatial analyses (SOLAP) [2].
However, with the growing popularity of the Linked Open Data (LOD) movement in the public sector, more and more spatial datasets from governmental domains [3] are becoming available on the SW. The availability of such datasets has led to novel opportunities for decision makers to analyze the growing public data with analytical data warehouse queries on SW data [4]. However, these potential users are only in very rare cases sufficiently familiar with the underlying technologies that are necessary to actually perform the desired analyses.


Fig. D.1: Example map of sales data

Let us consider an example scenario: Fig. D.1 shows a map highlighting the amount of sales between customers (c1, c2, . . . , c5) and suppliers (s1, s2, s3) in three Danish cities (Sorø, Holbæk, Ringsted). Let us assume that


an analyst wants to obtain the total sales of customers grouped by cities of their closest supplier – Query D.1 shows the corresponding example query formulated in SPARQL. This means that, first, for each customer we need to determine the closest supplier and the city in which the supplier is located. Then, we create one group for each of these cities and compute the total sales per city.
For each sales event, the dataset contains information about the involved customer and supplier; and for each customer and supplier the dataset contains the city they are located in. As the dataset does not contain any information about the distances between customers and suppliers, we have to use a spatial function (distance in this example) and evaluate it during runtime. Based on the obtained information, we can aggregate the results as described above. Technically, this corresponds to a SOLAP operator: s-roll-up [2,5].
As we can see in Query D.1, formulating a roll-up to a higher level, e.g., from sales by supplier to sales by city, in a SPARQL query involves several triple patterns. Extending such a query with spatial aspects (as necessary for s-roll-up) requires even more triple patterns and spatial functions (such as distance), which can easily become overwhelming for inexperienced users.

1 SELECT ?obs ?supCity (SUM(?sales) AS ?totalSales)
2 WHERE {?obs rdf:type qb:Observation ;
3        gnw:customerID ?cust ;
4        gnw:supplierID ?sup ;
5        gnw:salesAmount ?sales .
6   ?cust qb4o:memberOf gnw:customer ;
7         gnw:customerGeo ?custGeo .
8   ?sup qb4o:memberOf gnw:supplier ;
9        gnw:supplierGeo ?supGeo ;
10       skos:broader ?supCity .
11  ?supCity qb4o:memberOf gnw:city .
    # Inner select for the distance function
12  {SELECT ?cust1 (MIN(?distance) AS ?minDistance)
13   WHERE {?obs rdf:type qb:Observation ;
14          gnw:customerID ?cust1 ;
15          gnw:supplierID ?sup1 .
16     ?sup1 gnw:supplierGeo ?sup1Geo .
17     ?cust1 gnw:customerGeo ?cust1Geo .
18     BIND (bif:st_distance(?cust1Geo, ?sup1Geo)
19     AS ?distance)}
20   GROUP BY ?cust1 }
21  FILTER (?cust = ?cust1 && bif:st_distance
22    (?custGeo, ?supGeo) = ?minDistance)}
23  GROUP BY ?supCity ?obs

Query D.1: Query with s-roll-up formulated in SPARQL


The fact that data warehouse (DW) queries typically involve nesting of (S)OLAP operators, for example (s-roll-up(s-slice(s-dice(DW)))), makes it almost impossible for non-experts to formulate such queries with SPARQL.
Hence, we have developed GeoSemOLAP, a framework with an easy-to-use graphical interface that allows non-experts to query spatial semantic data warehouses using high-level SOLAP operators [5].

2 Queries for Spatial Semantic Data Warehouses

To formulate SPARQL queries automatically, GeoSemOLAP needs some information about what data is contained in the spatial data warehouse. Our current implementation uses metadata using the QB4SOLAP vocabulary1 [4, 5] for this purpose. In addition to MD data elements, QB4SOLAP also describes spatial concepts and builds upon existing vocabularies: QB2 and QB4OLAP3.
A QB4SOLAP data cube contains a set of observations. Its structure is defined through a data structure definition (DSD) describing standard concepts, such as dimensions, measures, hierarchies, hierarchy steps, levels, and level attributes, as well as spatial concepts, such as spatial aggregate functions, topological relations, spatial attributes, and spatial measures.
Let us consider an example of a company (Northwind) that exports a number of goods. The company records its sales data4 in a spatial data warehouse, which is based on an MD model enabling analyses of sales according to their geographical distribution. The company decides to share the data warehouse on the Semantic Web for further analysis of its sales (e.g., involving economic and demographic data published as Open Data on the Web) and to provide access to all branches as well as customers and suppliers. Fig. D.2 illustrates an example schema of the company's spatial data warehouse.
The example query from Section 1 (Query D.1: "Total sales to customers grouped by city of their closest supplier") can on a high level be represented as: S-ROLL-UP (Sales, [DISTANCE(Customer, Supplier)] → ClosestCity, SUM(SalesAmount)). The query's s-roll-up operator takes the sales observations as input. Each sale observation has a set of associated measures representing quantitative descriptions, e.g., Sales Amount and Sales Location – the latter corresponds to a spatial measure. Measures have aggregation functions (e.g., SUM for Sales Amount and Convex Hull for Sales Location) that can be used to combine several measure values when rolling up to a higher level, such as from City level to Country level.

1QB4SOLAP: https://w3id.org/qb4solap#
Vocabulary Structure: http://extbi.cs.aau.dk/QB4SOLAP
2RDF Data Cube: https://w3.org/TR/vocab-data-cube/
3QB4OLAP: https://lorenae.github.io/qb4olap/
4http://northwinddatabase.codeplex.com/


All  All  Customers Suppliers

Country Country Hierarchy Steps (Intersects) State State (Within)

Hierarchy StepsHierarchy City City (Within) Customer Supplier Name Name Geometry Sales Geometry Amount (SUM) Location (ConvexHull)

Fig. D.2: Northwind spatial data cube members (symbols next to level names represent spatial characteristics of level members, e.g., point, polygon, and multi-polygon.)

In Fig. D.2 both Customer and Supplier are spatial dimensions. The graph pattern in Lines 2–11 of Query D.1 describes the roll-up path as a path-shaped join of triple patterns connecting observations to target levels but also to the corresponding measures and attributes. Line 1 in Query D.1 selects the sales observations (?obs) and the spatial level (?supCity) as output. It also specifies to use aggregation function SUM on the measure (?sales). Lines 2–5 describe a star-shaped join of triple patterns connecting sales observations (?obs) to the Sales Amount measure (?sales) as well as to the base level members of two spatial dimensions: Customer (?cust) and Supplier (?sup).
City→State→Country→All in Fig. D.2 depicts spatial hierarchies for the Customer and Supplier dimensions. In Query D.1, Lines 6, 8, and 11 query intermediate spatial levels of these hierarchies. For each spatial level we query the spatial attributes (Lines 7 and 9), e.g., Customer and Supplier have points. Line 10 describes the roll-up from the lowest level in dimension Supplier to

a higher level City. Each roll-up between levels is defined as a hierarchy step. Spatial hierarchy steps have a topological relation between the levels (e.g., Customer→(Within)City or State→(Intersects)Country, Fig. D.2), which is annotated with QB4SOLAP in the DSD schema.
The rest of Query D.1 (Lines 12–23) represents an inner select operation for calculating the distances between Customer and Supplier locations, which is necessary to find the closest Supplier cities for the SOLAP operation. The inner select binds the calculated distances to the existing schema members from the outer select. Thus, it is very similar in structure to the first part of the query besides the spatial function (st_distance).
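To make the high-level form above concrete, a tool-facing representation of the s-roll-up expression could look roughly like the following. This is a hypothetical Python structure, not GeoSemOLAP's internal API, and the field names are ours.

# Hypothetical representation of the high-level s-roll-up expression;
# GeoSemOLAP's actual internal structures may differ.

from dataclasses import dataclass

@dataclass
class SRollUp:
    cube: str            # the data cube the operator is applied to
    spatial_fn: str      # spatial function over level members
    fn_args: tuple       # spatial levels the function is applied to
    target_level: str    # (dynamic) level to roll up to
    aggregate: str       # aggregate function over the measure
    measure: str

query = SRollUp(cube="Sales",
                spatial_fn="DISTANCE", fn_args=("Customer", "Supplier"),
                target_level="ClosestCity",
                aggregate="SUM", measure="SalesAmount")
print(query)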

3 System Overview

3.1 GeoSemOLAP Workflow

The workflow of querying spatial semantic data warehouses with GeoSemOLAP consists of six main steps, as illustrated in Fig. D.3. First, the user selects a SOLAP operator. Depending on which SOLAP operator was selected, the user can choose several items from a drop-down menu to complete the definition of the operator, e.g., schema elements (spatial levels, attributes, etc.) and spatial operations (distance, within, etc.).


Fig. D.3: Workflow diagram

As some operators (e.g., s-slice, Fig. D.5b) require spatial coordinates as input, GeoSemOLAP displays snippets of geographic maps so that the user can click on a position to indicate spatial coordinates that the query should operate on. Afterwards, the user can decide to select another SOLAP operator that shall be applied on the results of the previous operator (nesting) and set its parameters.


Then, the SPARQL query is automatically generated by GeoSemOLAP. The query can optionally be edited by the user before GeoSemOLAP sends the query to a SPARQL endpoint (local or remote) for execution. Finally, the result of the query is displayed to the user. The user may aggregate the results or decide to continue editing the query by, for example, adding additional SOLAP operators and repeating the steps mentioned above.

3.2 GeoSemOLAP Architecture

GeoSemOLAP consists of five architectural components: GUI, Metadata Manager, Query Generator, Data Processor, and SPARQL Endpoint. The system architecture is illustrated in Fig. D.4. GeoSemOLAP is implemented in Javascript, HTML, and CSS. It uses Leaflet5 for visualizing maps and Virtuoso Open Source Edition (7.2) for running the endpoint.


Fig. D.4: GeoSemOLAP architecture
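As a rough illustration of how the Metadata Manager might read cube metadata from the endpoint to populate the GUI, the following sketch queries the data structure definition for its components. It is only an assumption-laden sketch in Python using the SPARQLWrapper library; GeoSemOLAP itself is implemented in Javascript, so this does not reflect its actual code.

# Hypothetical sketch of fetching DSD components (QB terms) from the endpoint.
# Requires: pip install SPARQLWrapper

from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://lod.cs.aau.dk:8890/sparql"

def fetch_components():
    """Read dimensions and measures from the data structure definition."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery("""
        PREFIX qb: <http://purl.org/linked-data/cube#>
        SELECT DISTINCT ?dim ?measure WHERE {
          ?dsd a qb:DataStructureDefinition ; qb:component ?c .
          OPTIONAL { ?c qb:dimension ?dim }
          OPTIONAL { ?c qb:measure   ?measure }
        }""")
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["results"]["bindings"]

for row in fetch_components():
    print({k: v["value"] for k, v in row.items()})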

4 Demonstration Scenario

We will demonstrate GeoSemOLAP using two datasets (GeoFarmHerdState [4] and Geo-Northwind), which are both also available at our public endpoint6. The Geo-Northwind data cube has already been explained in Sect. 2. GeoFarmHerdState [4] is a spatial data cube about livestock holdings in Denmark. To enable interesting analyses, we have integrated data about livestock holdings with environmental and geographical data.
At the conference, we will demonstrate GeoSemOLAP by allowing attendees to interact with the system and formulate queries. To make it easier for the audience to understand the data and the cube structure, the graphical interface displays a high-level graphical representation at the top of the page – as illustrated in Fig. D.5a.

5http://leafletjs.com
6http://lod.cs.aau.dk:8890/sparql


(a) Graphical representation of an example use-case schema

(b) SOLAP operator configuration (c) Generated SPARQL query for nested SOLAP

(d) Example result for a nested SOLAP query (S-Roll-up (S-Slice()))

Fig. D.5: Screenshot of GeoSemOLAP

Hence, conference attendees will be able to directly interact with the system, which during the demonstration will run on a laptop connected to an external screen. Although GeoSemOLAP can use a remote endpoint to execute the formulated queries, the demonstration will use a local endpoint (running on the same laptop as the graphical interface) to avoid problems

with an unstable Internet connection.
Fig. D.5 shows a compact screenshot of GeoSemOLAP. At the top, we see a graphical representation of the used data cube. As mentioned above, it shows the most important concepts that are necessary to formulate a SOLAP query, e.g., dimensions, hierarchies, measures, spatial levels, and level attributes.
In this example (Fig. D.5), the user has first selected the operator s-slice. The menu on the left-hand side (Fig. D.5b) allows the user to set the parameters so that the s-slice operator selects geometries from a map or spatial levels from the schema. S-slice requires two spatial parameters; the first one to define a spatial location and the second one to define the slice level with respect to the given location. Hence, to help the user better select coordinates to define a location, a map is displayed so that the user can simply click on a point, which is then automatically extracted. In Fig. D.5b, we can see that the user clicked on a point in Germany and then selected Country from the spatial levels to make a projection on the observations in this country.
In addition to the s-slice operator, the user has added an s-roll-up operator to aggregate measures and discover new perspectives on sales with respect to their spatial location – the s-roll-up operator with its parameters is displayed under s-slice in Fig. D.5b. The obtained result (from s-roll-up) is similar to our running example introduced in Sect. 1 (Query 1).
Based on the provided operators and parameters, GeoSemOLAP will automatically create and display the corresponding SPARQL query (Fig. D.5c). The user can then decide to run the query and view the results. The results for the example query are displayed at the bottom of the page (Fig. D.5d). We refer to our screencast7 for a more detailed explanation of GeoSemOLAP.

5 Perspectives and Future Work

Fig. D.6 sketches our vision of a tool-oriented future for SOLAP on the SW based on the models and algorithms proposed in [5].
As mentioned in Sect. 1, there is a growing popularity of spatial LOD datasets from various governmental domains8,9,10,11. These datasets contain observations and measures that are well-suited for analytical queries (e.g., water/air quality measurements, immigration rates, EU subsidies in agriculture, crop revenue, etc.). However, as observed in [5], such datasets are typically not modeled with spatial dimension levels and hierarchies. Thus, they cannot directly be queried with SOLAP on the SW.

7https://youtu.be/Pc3RBPPgBhA
8https://ec.europa.eu/eurostat
9https://environment.data.gov.uk
10https://datahub.io/dataset/govagribus-denmark
11https://datahub.io/dataset/acorn-sat



Fig. D.6: Vision of SOLAP on the Semantic Web [5]

With currently available SW technologies, a user who would like to query available spatial RDF data with SOLAP needs to download the datasets, map them to a relational data model, and then import the result into a traditional spatial data warehouse. Obviously, this workflow is not only slow but it is also time-consuming and requires storing the data in a non-open format on a local system.
GeoSemOLAP considerably lowers the entry barrier for advanced spatial analysis on the SW by providing a user-friendly interface to formulate SOLAP queries in SPARQL. Our future work strives to lower the entry barrier even further by developing (semi-)automatic techniques for annotating existing spatial RDF data on the SW with QB4SOLAP and defining spatial levels and hierarchies using available datasets, such as GeoNames12. Furthermore, it would be interesting to extend the model proposed in [5] and GeoSemOLAP to handle highly dynamic spatio-temporal data and queries as, for instance, found in large-scale transport analytics [6].

12http://www.geonames.org/

References

[1] M. Koubarakis, M. Karpathiotakis, K. Kyzirakos, C. Nikolaou, and M. Sioutis, "Data Models and Query Languages for Linked Geospatial Data," in Reasoning Web. Semantic Technologies for Advanced Query Answering. Springer, 2012, pp. 290–328, https://dx.doi.org/10.1007/978-3-642-33158-9_8.

[2] S. Bimonte, A. Tchounikine, M. Miquel, and F. Pinet, "When spatial analysis meets OLAP: Multidimensional model and operators," IJDWM, 2011.

[3] A. B. Andersen, N. Gür, K. Hose, K. A. Jakobsen, and T. B. Pedersen, "Publishing Danish Agricultural Government Data as Semantic Web Data," in Semantic Technology: 4th Joint International Semantic Technology Conference (JIST'14), vol. 8943. Springer, 2014, pp. 178–186, https://dx.doi.org/10.1007/978-3-319-15615-6_13.

[4] N. Gür, K. Hose, T. B. Pedersen, and E. Zimányi, "Enabling Spatial OLAP over Environmental and Farming Data with QB4SOLAP," in Semantic Technology: 6th Joint International Semantic Technology Conference (JIST'16), vol. 10055. Springer, 2016, pp. 287–304, https://dx.doi.org/10.1007/978-3-319-50112-3_22.

[5] N. Gür, T. B. Pedersen, E. Zimányi, and K. Hose, “A Foundation for Spatial Data Warehouses on the Semantic Web,” Semantic Web Journal, vol. 9, no. 5, pp. 557–587, 2018.

[6] G. Gidofalvi, T. B. Pedersen, T. Risch, and E. Zeitler, “Highly scalable trip grouping for large-scale collective transportation systems,” in EDBT, 2008.

Acknowledgements

This research is partially funded by the European Commission through the Erasmus Mundus Joint Doctorate Information Technologies for Business Intelligence (EM IT4BI-DC) and the Danish Council for Independent Research (DFF) under grant agreement no. DFF-4093-00301.

Paper E

Enabling Spatial OLAP over Environmental and Farming Data with QB4SOLAP

Nurefşan Gür, Katja Hose, Torben Bach Pedersen, and Esteban Zimányi

The paper has been published in the Proceedings of the 6th Joint International Semantic Technology Conference, Vol. 10055, pp. 287–304, 2016. DOI: 10.1007/978-3-319-50112-3_22

Abstract

Governmental organizations and agencies have been making large amounts of spatial data available on the Semantic Web (SW). However, we still lack efficient techniques for analyzing such large amounts of data as we know them from relational database systems, e.g., multidimensional (MD) data warehouses and On-line Analytical Processing (OLAP). A basic prerequisite to enable such advanced analytics is a well-defined schema, which can be defined using the QB4SOLAP vocabulary that provides sufficient context for spatial OLAP (SOLAP). In this paper, we address the challenging problem of MD querying with SOLAP operations on the SW by applying QB4SOLAP to a non-trivial spatial use case based on real-world open governmental data sets across various spatial domains. We describe the process of combining, interpreting, and publishing disparate spatial data sets as a spatial data cube on the SW and show how to query it with SOLAP operators.

© 2016 Springer International Publishing AG. Reprinted, with permission from Nurefşan Gür, Katja Hose, Torben Bach Pedersen, and Esteban Zimányi. Enabling Spatial OLAP over Environmental and Farming Data with QB4SOLAP. In: Semantic Technology, JIST 2016. Lecture Notes in Computer Science. https://doi.org/10.1007/978-3-319-50112-3_22 The layout has been revised.

1 Introduction

In late 2012, the Danish government joined the Open Data movement by making several raw digital data sets [1] freely available at no charge. These data sets span domains such as environmental data, geospatial data, business data from transport to tourism, fishery, forestry, and agriculture. GovAgriBus Denmark1 was an initial effort in 2014 to make Danish government Open Data from various domains available as Linked Open Data (LOD) [2] on the Semantic Web in order to pose queries across domains. If the corresponding domains can be related through space and location, spatial attributes of these data sets become particularly interesting as we can derive spatial joins and containment relationships that were not encoded in the original data sets. Danish government organizations and agencies continue publishing data sets for new domains and update the corresponding data sets regularly on a yearly basis, which brings opportunities in querying the expanding spatial data with analytical perspectives on the Semantic Web. Responding to such queries is a complex task, which requires well-defined schemas to facilitate OLAP operations on the Semantic Web. QB4SOLAP [3] aims to support intelligent multidimensional querying in SPARQL by providing context to SOLAP and its elements on the SW. However, QB4SOLAP has not been applied on complex real-world data yet. This could bring particular challenges with the use of real-world spatial data. In this paper, we address the challenging problem of multidimensional querying with SOLAP operations on the SW by applying QB4SOLAP on real-world open governmental data sets from various domains. These domains span from livestock farming to environment, where many of them have spatial information.

In this paper, we design a spatial data cube schema with data from livestock farming, environment, and geographical domains. Every data set is downloaded from different governmental sources in various formats. The downloaded data is prepared and conciliated with a spatial data cube schema in order to publish it on the SW with QB4SOLAP. We use the common SOLAP operators [4] on the spatial data cube for advanced analytical queries. These analytical queries give perspective on the use case data sets that are linked and published with QB4SOLAP as a unified spatial data cube. Having the use case data sets with spatial attributes also allows us to reveal patterns across the use case domains that were not possible before. We share our experiences with the best practices and methods together with the lessons learned. Finally, we show how to formulate and execute SPARQL queries with individual and nested SOLAP operations.

The remainder of this paper is structured as follows. Section 2 presents the background and motivation, Section 3 discusses related work and presents

1https://datahub.io/dataset/govagribus-denmark

the state-of-the-art spatial data cubes on the SW. Section 4 presents the data sources for the use case while Section 5 describes how to annotate and publish the use case data as a spatial data cube on the SW. Section 6 presents SOLAP operators and their SPARQL implementation. Section 7 presents a brief overview of the process and reflects on the problems and improvements. Finally, Section 8 concludes the paper with an outlook to future work.

2 Background and Motivation

The Semantic Web supports intelligent querying via SPARQL with active inference and reasoning on the data in addition to capturing its semantics. Linked Open Data on the Semantic Web is an important source to support Business Intelligence (BI). Multidimensional data warehouses and OLAP are advanced analytical tools for analyzing complex BI data. State-of-the-art SW technologies support advanced analytics over non-spatial SW data. QB4SOLAP supports intelligent multidimensional querying in SPARQL by providing context to spatial data warehouses and their concepts. The variety of data is an intriguing concept both on the Semantic Web and in complex BI systems. The variety of the data and heterogeneous representation formats (e.g., CSV, JSON, PDF, XML, and SHP) require underlying conceptualizations and data models with well-defined spatial (and temporal) dependencies, which can be modeled with QB4SOLAP in order to answer complex analytical queries. Complex queries cannot be answered from within one domain alone but span over multiple disciplines and various data sources. As a result, this paper is driven by the motivation of using QB4SOLAP as a proof of concept for spatial data warehouses on the Semantic Web by using open (government) data of various domains from different sources, which creates a non-trivial spatial use case.

3 State of the Art

Data warehouses and OLAP technologies have been successful for analyzing large volumes of data [5], including integrating with external data such as XML [6]. Combining DW/OLAP technologies with RDF data makes RDF data sources more available for interactive analysis. Kämpgen et al. propose an extended model [7] on top of the RDF Data Cube Vocabulary (QB) [8] for interacting with statistical linked data via OLAP operations directly in SPARQL. In OLAP4LD [9], Kämpgen et al. suggest enhancing the query performance of OLAP operations expressed as SPARQL queries by using RDF aggregate views. The W3C published a list of RDF cube implementations [10]. However, they all have inherent limitations of QB and thus cannot support


OLAP dimensions with hierarchies and levels, and built-in aggregate functions. Etcheverry et al. [11] introduce QB4OLAP as an extended vocabulary based on QB, with a full MD metamodel, supporting OLAP operations directly over RDF data with SPARQL queries. Matei et al. [12] use QB and QB4OLAP as a basis to support OLAP queries in Graph Cube [13] with the IGOLAP vocabulary. Jakobsen et al. [14] study OLAP query optimization techniques over QB4OLAP data cubes. However, none of these approaches and vocabularies support spatial DWs.

QB4SOLAP(v1) [3] is the first attempt to model and query spatial DWs on the SW, and QB4SOLAP(v2) [4] is a foundation for spatial data warehouses and SOLAP operators on the SW, which is currently under submission with completely revised formal semantics of SOLAP operators and SPARQL query generation algorithms. QB4SOLAP is an extension of QB4OLAP with spatial concepts. The QB4OLAP vocabulary is compatible with the QB vocabulary. Therefore QB4SOLAP provides backward compatibility with other statistical or MD data cube vocabularies in addition to providing spatial context for querying with SOLAP. Fig. E.1 depicts the QB4SOLAP(v2) vocabulary. Capitalized terms with non-italic font represent RDF classes, capitalized terms with italic font represent RDF instances, and non-capitalized terms represent RDF properties. Classes in external vocabularies are depicted in light gray background and font. QB, QB4OLAP, and QB4SOLAP classes are shown with white, light gray, and dark gray backgrounds. Original QB terms are prefixed with qb:2. QB4OLAP and QB4SOLAP terms are prefixed with qb4o:3 and qb4so:4. Spatial classes are prefixed with geo:5, where the spatial extension to QB4SOLAP is based on the GeoSPARQL [15] standard from the Open Geospatial Consortium (OGC) for representing and querying geospatial linked data on the SW.

QB4SOLAP is a promising approach for modeling, publishing, and querying spatial data warehouses on the SW. However, it has only been validated with a synthetic use case. Andersen et al. [2] consider publishing/converting open Danish governmental spatial data as Linked Open Data without considering the MD aspects of geospatial data. In this paper, however, we validate QB4SOLAP with a non-trivial use case, which is created as a spatial data cube from open Danish government spatial data. Furthermore, we show how to exploit multidimensional spatial linked data on the SW, which is not solely about adding semantics and linking disparate data sets on the SW, but also about enabling analytical queries by modeling them as spatial data cubes.

2RDF Cube: http://purl.org/linked-data/cube#
3QB4OLAP: http://purl.org/qb4olap/cubes#
4QB4SOLAP: http://w3id.org/qb4solap#
5GeoSPARQL: http://www.opengis.net/ont/geosparql#


Fig. E.1: QB4SOLAP Vocabulary

4 Source Data

In order to create a spatial data cube of livestock holdings in Danish farms, we have gathered data that is published by different agencies in Denmark. We have found these domains to be particularly interesting as they represent a non-trivial use case that covers spatial attributes and measures, which can be modeled in a spatial data cube for multidimensional analysis. In the following we first give a brief overview of the flat data and their sources, and then represent the whole use case data set as a spatial data cube in Section 5.

The environmental protection agency under the Ministry of Environment and Food of Denmark regulates the livestock units (DE)6 per area in order to keep nitrate leaching under control in vulnerable areas. Prohibition rules against the establishment of livestock farms and the siting of animal housing are determined with respect to livestock units and distance to specific natural habitats (e.g., ammonia vulnerable areas, water courses, and water supply facilities etc.) [16].

6Livestock units are used to produce statistics describing the number of livestock in farms.


Livestock Farming (CHR) Data. The Ministry of Environment and Food of Denmark (http://en.mfvm.dk) publishes the central husbandry (livestock) registry (CHR) data, which is the central database used for registration of holdings and animals. We refer to this set of data as CHR data. We have downloaded several relevant data sets from http://jordbrugsanalyser.dk in the livestock farming domain. The CHR data collection is downloaded in SHP format as 6 data sets, where each data set represents the state of the farms for a year between 2010 and 2015. SHP format is used for shapefiles, which store geometry information of the spatial features in a data set. In each shapefile, there is information about more than 40,000 farms. In total, the CHR data collection contains around 240,000 records. Farm locations are given as (X,Y) point coordinates. Each data set has 24 attributes, of which the most important ones are: CHR - Central Husbandry (animal) Registry (holding) number, CVR - Central Company Registry number (owner company of the holding), DE (Livestock unit), Address of the holding (Postnr and Commune), Geographical position of the holding (X and Y coordinates), Different types of normalized herds, Number of animals for each herd, Animal code and label, Animal usage code and purpose.

Environmental Data. Public environmental data is published on Denmark’s environment portal http://www.miljoeportal.dk/, where we can find information about nitrate catchment areas and vulnerable sites. The soil measurements contain data from 2008 to 2015 [17]. We downloaded the data sets in SHP format, which have recently become available on the portal. The files record measurements of the soil quality across Denmark. The environmental data collection contains 3 data sets about nitrogen reduction potentials and phosphor and nitrate classifications of the soil. Temporal validity of the soil measurement data is recorded in the attributes with timestamps. Each data set keeps records of polygon areas. In total, the environmental data collection contains around 30,000 records. Datasets have attribute fields about the area of the polygons, CVR number of the data provider agency or company, responsible person name, etc. The important attributes, which record the soil measurement data, are: Nitrate class type, Nitrogen reduction potentials, Phosphor class type.

Geographical (Regions) Data. The primary use case data is built around livestock farming (CHR) and environmental data as mentioned above. In order to pursue richer analysis upon this use case we enrich the spatiality of the use case data by adding two geographical data sets: parishes and drainage areas of Denmark. These data sets are spatially and topically relevant since we have

found pre-aggregated maps created by the Ministry of Environment and Food of Denmark at parish and drainage area levels for livestock farming data. We downloaded parishes and drainage areas of Denmark as SHP files from http://www.geodata-info.dk/. The total number of records in the geographical data collection is 2,300. These data sets have attribute fields such as: Drainage area name, Parish name, Total area, etc.

Central Company Registry (CVR) data. Danish companies, agencies and industries are registered in the Central Company Register (CVR). Every livestock holding is owned by a company and has a CVR number. Environmental data also records the CVR number of the corresponding data provider agencies. Through this CVR number, we can access detailed information of the companies and contact details of the responsible people. This collection allows evaluating interesting queries with the selected domains given above. The CVR data is published at http://cvr.dk and can be accessed via a web service with a Danish social security number log-in. We accessed and downloaded only the data in CSV format that are accredited for publishing. This data includes attributes such as: Company name, Phone number, and Address etc.

5 Publishing Spatial Data Cubes with QB4SOLAP

The QB4SOLAP vocabulary allows defining cube schemas and cube instances. A cube schema defines the structure of a cube as an instance of the class qb:DataStructureDefinition in terms of dimension levels, measures, aggregation functions (e.g., SUM, AVG, and COUNT) on measures, spatial aggregation functions7 on spatial measures, fact-level cardinality relationships, and topological relationships. The properties used to express these relationships are: qb4o:level, qb:measure, qb4o:aggregateFunction8, qb4o:cardinality, and qb4so:topologicalRelation, respectively (Fig. E.1). These schema level metadata are used to define MD data sets in RDF. Cube instances are the members of a cube schema that represent level members, facts and measure values. We describe the cube schema elements in Section 5.1 and the cube instances in Section 5.2 with their examples.

7Spatial aggregation functions aggregate two or more spatial objects and return a new spatial object, e.g., union, buffer, and convexHull etc.
8SpatialAggregateFunction is a subclass of AggregateFunction. Thus, measures and spatial measures use the same property qb4o:aggregateFunction.


5.1 GeoFarmHerdState Cube Schema in RDF

As our use case we create a spatial data cube of livestock holdings that we refer to as GeoFarmHerdState. The use case data cube is created from the flat data sets of the livestock farming (CHR) data, the environmental data, the geographical (regions) data, and the company registry (CVR) data collections, which are explained in Section 4. We create this data cube by thoroughly analyzing the attributes of the flat data sets from the collected relevant domains and conciliating them by foreign keys or by overlaying the SHP files of the spatial data sets and deriving new attributes from the intersected areas. After deriving useful spatial information across use case domains, we generalize the tabular data of the several use case data sets into the GeoFarmHerdState data cube. Fig. E.2 shows the multidimensional conceptual schema of the GeoFarmHerdState spatial cube. The multidimensional elements of the cube are explained in Remarks 1–7 followed by their examples in RDF. The underlying syntax for RDF examples is given in Turtle. We prefix the schema elements of the GeoFarmHerdState cube with gfs:.


Fig. E.2: GeoFarmHerdState – Conceptual MD schema of livestock holdings data

Remark 1. (Dimensions) Dimensions provide perspectives to analyze the data. The GeoFarmHerdState cube has four dimensions, of which two are spatial (FarmDim, ParishDim). All dimensions in the cube are defined with qb:DimensionProperty. A dimension is spatial if it has at least one spatial level (see Remark 3). Dimension hierarchies are defined with the qb4o:hasHierarchy property. Hierarchies and their types are explained later in Remark 2.

Example 1 (Dimensions) We give two spatial dimensions as an example.

gfs:farmDim rdf:type qb:DimensionProperty ;
    qb4o:hasHierarchy gfs:ownership , gfs:address .
gfs:parishDim rdf:type qb:DimensionProperty ;
    qb4o:hasHierarchy gfs:geography .

Remark 2. (Hierarchies) Hierarchies allow users to aggregate measures at various levels of detail. Hierarchies are composed of levels. A hierarchy is spatial if it has at least one spatial level (see Remark 3). Hierarchies of the GeoFarmHerdState cube are given in ellipses (Fig. E.2). Each hierarchy is defined with qb4o:Hierarchy and linked to its dimension with the qb4o:inDimension property. Levels that belong to the hierarchy are defined with the qb4o:hasLevel property.

Example 2 (Hierarchies) We present the most interesting hierarchies from the GeoFarmHerdState cube as an example.

gfs:geography rdf:type qb4o:Hierarchy ;
    qb4o:inDimension gfs:parishDim ;
    qb4o:hasLevel gfs:drainageArea .
gfs:usage rdf:type qb4o:Hierarchy ;
    qb4o:inDimension gfs:animalDim ;
    qb4o:hasLevel gfs:product , gfs:purpose .
gfs:address rdf:type qb4o:Hierarchy ;
    qb4o:inDimension gfs:farmDim ;
    qb4o:hasLevel gfs:zipCode , gfs:commune .

The Geography hierarchy is a non-strict spatial hierarchy. A spatial hierarchy is non-strict if it has at least one (n − n) relationship between its levels. In the Geography hierarchy (Fig. E.2) the (n − n) cardinality represents that a parish may belong to more than one drainage area. Usually, non-strict spatial hierarchies arise when a partial containment relationship exists, which is given as Intersects in our use case. The Usage hierarchy is a generalized hierarchy with non-exclusive paths to splitting levels (Product and Purpose) and has no joining level but the top level All. Finally, the Address and Ownership hierarchies are parallel dependent hierarchies. Parallel hierarchies arise when a dimension has several hierarchies sharing some levels. Note that the Address hierarchy has different paths from the Company and Farm levels (Fig. E.2).

Remark 3. (Levels) Levels have a set of attributes (see Remark 4) that describe the characteristics of the level members (see Remark 9). Levels are defined with the qb4o:LevelProperty and their attributes are linked with the qb4o:hasAttribute property. A level is spatial if it has an associated geometry. Therefore, spatial levels have the property geo:hasGeometry, which defines the geometry of the spatial level in QB4SOLAP.

Example 3 (Levels and Level attributes) We present a spatial level (Parish) as an example with its attributes. Attributes and spatial attributes of levels are further described in Remark 4.

gfs:parish rdf:type qb4o:LevelProperty ;
    qb4o:hasAttribute gfs:parishID ;
    qb4o:hasAttribute gfs:parishName ;
    qb4o:hasAttribute gfs:parishArea ;
    qb4o:hasAttribute gfs:parishCenter ;
    geo:hasGeometry gfs:parishPolygon .

Note that the Parish level is defined as a spatial level because it has an associated polygon geometry (gfs:parishPolygon), which is specified with the geo:hasGeometry property. Some other spatial characteristics of the levels can be recorded in the spatial attributes of the level such as the center point of the parish (gfs:parishCenter).

Remark 4. (Attributes) Attributes and spatial attributes are defined with the qb4o:LevelAttribute property and linked to their levels with the qb4o:inLevel property. An attribute is spatial if it is defined over a spatial domain. Attributes are defined as ranging over XSD literals9 and spatial attributes must be ranging over spatial literals, i.e., well-known text literals (WKT) from OGC schemas10. Spatial attributes are a sub-property of the geo:Geometry class. Further, the domain of the spatial attribute should be specified with rdfs:domain, which must be a geometry. Finally, the spatial attribute must be specified as an instance of geo:SpatialObject with the rdfs:subClassOf property. Examples of attributes are given in the following.

Example 4 (Spatial and non-spatial attributes) We present spatial and some non-spatial attributes of the Parish level.

gfs:parishID rdf:type qb4o:LevelAttribute ;
    qb4o:inLevel gfs:parish ;
    rdfs:range xsd:positiveInteger .
gfs:parishName rdf:type qb4o:LevelAttribute ;
    qb4o:inLevel gfs:parish ;
    rdfs:range xsd:string .
gfs:parishCenter rdf:type qb4o:LevelAttribute ;
    rdfs:subPropertyOf geo:Geometry ;
    qb4o:inLevel gfs:parish ;
    rdfs:domain geo:Point ;
    rdfs:subClassOf geo:SpatialObject ;
    rdfs:range geo:wktLiteral , virtrdf:Geometry .

We have mentioned in Remark 3 that spatial levels are defined through their associated geometries, which are not given as a level attribute. For the Parish level we present the following example of the corresponding geometry.

9XML Schema Definition: http://www.w3.org/TR/xmlschema11-1/
10OGC Schemas: http://schemas.opengis.net/


gfs:parishPolygon rdf:type geo:Geometry ;
    rdfs:domain geo:MultiSurface ;
    rdfs:subClassOf geo:SpatialObject ;
    rdfs:range geo:wktLiteral , virtrdf:Geometry .

Remark 5. (Hierarchy Steps) Hierarchy steps define the structure of the hierarchy in relation to its corresponding levels. A hierarchy step entails a roll-up relation between a lower (child) level and an upper (parent) level with a cardinality. The cardinality (1 − 1, 1 − n, n − 1, n − n) describes the number of members in one level that can be related to a member in the other level for both child and parent levels. A hierarchy step is spatial if it relates a spatial child level and a spatial parent level, in which case it entails a topological relationship between these spatial levels. Both spatial and non-spatial hierarchy steps are defined as a blank node with the qb4o:HierarchyStep property and linked to their hierarchies with the qb4o:inHierarchy property. The parent and child levels are linked to hierarchy steps with the qb4o:childLevel property and the qb4o:parentLevel property. The cardinality of a hierarchy step is defined by the qb4o:pcCardinality property. Finally, the topological relationship11 of a hierarchy step is defined by the qb4so:pcTopoRel property.

Example 5 (Spatial Hierarchy steps) The following illustrates the hierarchy steps of the spatial hierarchy Geography and the non-spatial hierarchy Address, which has different paths from the child levels Farm and Company.

## Geography hierarchy structure ##
_:geography_hs1 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gfs:geography ;
    qb4o:childLevel gfs:parish ;
    qb4o:parentLevel gfs:drainageArea ;
    qb4o:pcCardinality qb4o:ManyToMany ;
    qb4so:pcTopoRel qb4so:Intersects , qb4so:Within .

## Address hierarchy structure ##
_:farm_address_hs1 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gfs:address ;
    qb4o:childLevel gfs:farm ;
    qb4o:parentLevel gfs:zipCode ;
    qb4o:pcCardinality qb4o:ManyToOne .
_:farm_address_hs2 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gfs:address ;
    qb4o:childLevel gfs:zipCode ;
    qb4o:parentLevel gfs:commune ;
    qb4o:pcCardinality qb4o:ManyToOne .
_:company_address_hs1 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gfs:address ;
    qb4o:childLevel gfs:company ;
    qb4o:parentLevel gfs:zipCode ;
    qb4o:pcCardinality qb4o:ManyToOne .
_:company_address_hs2 rdf:type qb4o:HierarchyStep ;
    qb4o:inHierarchy gfs:address ;
    qb4o:childLevel gfs:zipCode ;
    qb4o:parentLevel gfs:commune ;
    qb4o:pcCardinality qb4o:ManyToOne .

Remark 6. (Measures) Measures record the values of a phenomenon being observed. Measures and spatial measures are defined with qb:MeasureProperty.

11Topological relations are Boolean predicates that specify how two spatial objects are related to each other, e.g., within, intersects, touches, and crosses etc.


A measure is spatial if it is defined over a spatial domain. Similarly to attributes (Remark 4), measures are defined ranging over XSD literals and spatial measures must be ranging over spatial literals.

Example 6 (Spatial and non-spatial measures) The following shows an example of a spatial measure (Location) and a non-spatial measure (NumberOfAnimals).

gfs:location rdf:type qb:MeasureProperty ;
    rdfs:subPropertyOf sdmx-measure:obsValue ;
    rdfs:subClassOf geo:SpatialObject ;
    rdfs:domain geo:Point ;
    rdfs:range geo:wktLiteral , virtrdf:Geometry .
gfs:numberOfAnimals rdf:type qb:MeasureProperty ;
    rdfs:subPropertyOf sdmx-measure:obsValue ;
    rdfs:range xsd:decimal .

Remark 7. (Fact) Fact defines the data structure (DSD) of the cube with qb:DataStructureDefinition. The dimensions are given as components and defined with the qb4o:level property as the dimensions are linked to the fact at the lowest granularity level. A fact is spatial if it relates two or more spatial levels. Similarly, measures are given as components of the fact and are defined with the qb:measure property. Aggregation functions on measures and spatial aggregation functions on spatial measures are also defined in the DSD with qb4o:aggregateFunction. Fact-level cardinality relationships and topological relationships are defined with qb4o:cardinality and qb4so:topologicalRelation in the DSD, respectively (Fig. E.1).

Example 7 (Fact schema) The following shows the data structure definition of the cube GeoFarmHerdState, which is defined with corresponding measures and dimensions.

## GeoFarmHerdState Cube - Definition of the Fact FarmHerdState ##
gfs:GeoFarmHerdState rdf:type qb:DataStructureDefinition ;
    # Lowest level for each dimension in the cube
    qb:component [ qb4o:level gfs:herd ; qb4o:cardinality qb4o:ManyToOne ] ;
    qb:component [ qb4o:level gfs:time ; qb4o:cardinality qb4o:ManyToOne ] ;
    qb:component [ qb4o:level gfs:farm ; qb4o:cardinality qb4o:ManyToOne ;
        qb4so:topologicalRelation qb4so:Equals ] ;
    qb:component [ qb4o:level gfs:parish ; qb4o:cardinality qb4o:ManyToMany ;
        qb4so:topologicalRelation qb4so:Within ] ;
    # Measures in the cube
    qb:component [ qb:measure gfs:numberOfAnimals ; qb4o:aggregateFunction qb4o:Sum ] ;
    qb:component [ qb:measure gfs:location ; qb4o:aggregateFunction qb4so:ConvexHull ] ;
    qb:component [ qb:measure gfs:nitrogenReduction ; qb4o:aggregateFunction qb4o:Avg ] ;
    qb:component [ qb:measure gfs:nitrateClass ; qb4o:aggregateFunction qb4o:Avg ] ;
    qb:component [ qb:measure gfs:phosphorClass ; qb4o:aggregateFunction qb4o:Avg ] .


5.2 GeoFarmHerdState Cube Instances in RDF

Cube instances are the members of a cube schema that represent level members and facts (members), which are explained in Remarks 8 and 9 below. We prefix the instances of the GeoFarmHerdState cube with gfsi:.

Remark 8. (Fact members) Fact members (i.e., facts of FarmHerdState) are instances of the qb:Observation class. Each fact member is related to a set of dimension base level members and has a set of measure values. Every fact member has a unique identifier (IRI) which is prefixed with gfsi:.

Example 8 (Fact members) The following shows an example of a single fact member, which represents the state of a farm with CHR no. 39679 in the year 2015 that has the herd code 15.

gfsi:farm_39679_2015 rdf:type qb:Observation ;
    ## Dimension levels and base level members associated with the fact member
    gfs:herdCode gfsi:herd_15 ;
    gfs:year gfsi:year_2015 ;
    gfs:chrNumber gfsi:farm_39679 ;
    gfs:parishID gfsi:parish_8311 ;
    ## Measures associated with the fact member
    gfs:numberOfAnimals "100.0"^^xsd:decimal ;
    gfs:nitrateClass "3"^^xsd:integer ;
    gfs:nitrogenReduction "0.75"^^xsd:decimal ;
    gfs:phosphorClass "3"^^xsd:integer ;
    gfs:location "POINT(8.3713 56.7912)"^^geo:wktLiteral .

Remark 9. (Level Members) Level members are defined in the qb4o:LevelMember class. They are linked to their corresponding levels from the schema with the qb4o:memberOf property. For each level member there is a set of attribute values. Due to the roll-up relations between levels of hierarchy steps (Remark 5), the skos:broader property relates a child level member to its parent level member.

Example 9 (Level Members) The following shows an example of a child level member in the Parish level and one of its parent level members in the DrainageArea level from the Geography dimension. Fig. E.3 presents a map snapshot for fact members and level members. Parish level member "Astrup" is highlighted and DrainageArea level member "Mariager Inderfjord" is marked with red borders. Note that Astrup intersects another drainage area "Langerak", therefore it links to two parent level members via skos:broader.

## Parish level member
gfsi:parish_8648 rdf:type gfs:parish ;
    qb4o:memberOf gfs:parish ;
    skos:broader gfsi:water_3710 , gfsi:water_159 ;
    gfs:parishID 8311 ;
    gfs:parishName "Astrup" ;
    gfs:parishArea 46,118 ;
    gfs:parishCenter "POINT(8.2552 56.8176)"^^geo:wktLiteral ;
    gfs:parishPolygon "POLYGON((8.4038 56.7963, 8.3984 56.7721, 8.3411 56.7372, 8.3078 56.7281, 8.2987 56.7601, 8.2563 56.7763, 8.3511 56.8137, 8.4038 56.7963))"^^geo:wktLiteral .

## DrainageArea level member
gfsi:water_159 rdf:type gfs:drainageArea ;
    qb4o:memberOf gfs:drainageArea ;
    gfs:waterID 159 ;
    gfs:waterName "Mariager Inderfjord" ;
    gfs:waterArea 267,477 ;
    gfs:drainageGeo "POLYGON((8.6048 56.9843, 8.5908 56.8969, 8.5707 56.8664, 8.5975 56.8519, 8.5215 56.8483, 8.3959 56.7625, 8.3938 56.7340, 8.3613 56.6802, 8.2584 56.7764, 8.2475 56.7051, 8.2175 56.7232, 8.3121 56.8441, 8.2806 56.8659, 8.3602 56.9569, 8.4786 56.9713, 8.5474 56.9905, 8.6048 56.9843))"^^geo:wktLiteral .

Fig. E.3: GeoFarmHerdState – Fact members and Level members of Ex. 9 marked

6 SOLAP Operators over GeoFarmHerdState cube

Spatial OLAP (SOLAP) operates on spatial data cubes. SOLAP increases the analytical capabilities of OLAP by taking into account the spatial information in the cube. SOLAP operators involve spatial conditions or spatial functions. Spatial conditions specify constraints (i.e., spatial Boolean predicates) on the geometries associated to cube members or measures, while spatial functions derive new data from the cube, which can be used, e.g., to derive dynamic spatial hierarchies.
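To make this distinction concrete before the individual operators are presented, the following fragment is a small illustrative sketch (it is not one of the original examples) that reuses the GeoFarmHerdState schema and the Virtuoso bif: spatial built-ins used throughout this section; the BIND applies a spatial function to derive a new value, while the FILTER applies a spatial condition:

# Illustrative sketch: spatial function vs. spatial condition
SELECT ?obs ?distance WHERE {
  ?obs rdf:type qb:Observation ;
       gfs:location ?farmLoc ;       # spatial measure of the fact member
       gfs:parishID ?parish .
  ?parish gfs:parishCenter ?parishCent ;
          gfs:parishPolygon ?parishGeo .
  # Spatial function: derives new data (distance between farm and parish center)
  BIND (bif:st_distance(?farmLoc, ?parishCent) AS ?distance)
  # Spatial condition: Boolean predicate constraining the associated geometries
  FILTER (bif:st_within(?farmLoc, ?parishGeo)) }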

6.1 SOLAP operators

In what follows we present common SOLAP operators and examples of these operators on the GeoFarmHerdState cube.


Remark 10. (S-Slice) The s-slice operator removes a dimension from a cube by choosing a single spatial value in a spatial level. It returns a cube with one dimension less.

Example 10 (S-Slice) We can perform an s-slice operation in different ways:

1. Slice on farms (state of the farms) of the largest parish.
2. Slice on farms (state of the farms) of the drainage area containing "POINT(10.43951 55.47006)".

The first one applies a spatial function call (for finding the largest parish by area) on a spatial level Parish and performs the slice. The second one applies a spatial predicate (for finding where a given point is within a particular drainage area) in a spatial level DrainageArea and performs the operation. The corresponding SPARQL queries are:

# 1 - s-slice with spatial function #
SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gfs:parishID ?parish .
  ?parish gfs:parishPolygon ?parishGeo .
  # Inner select for finding the largest parish
  { SELECT ?x (MAX(?area) AS ?maxArea)
    WHERE { ?obs rdf:type qb:Observation ;
                 gfs:parishID ?parish .
            ?parish gfs:parishPolygon ?x .
            BIND (bif:st_area(?x) AS ?area) } }
  FILTER (?parishGeo = ?x) }

# 2 - s-slice with spatial predicate #
SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gfs:parishID ?parish .
  ?parish qb4o:memberOf gfs:parish ;
          skos:broader ?drainageArea .
  ?drainageArea gfs:drainageGeo ?drainageGeo .
  FILTER (bif:st_within("POINT(10.43951 55.47006)", ?drainageGeo)) }

Remark 11. (S-Dice) The s-dice operator keeps the cells of the cube that satisfy the spatial predicate over dimension levels, attributes, or measures. It returns a subset of the cube with filtered members of the cube.


Example 11 (S-Dice) In the following we show two examples of the s-dice operator:

1. Filter the farms located within 5 km buffer from the center of a drainage area.
2. Filter the farms located within 2 km distance from the center of their parish, which are in the nitrate class I areas.

In the first s-dice operation, initially, a spatial function is applied on level members of the DrainageArea level to get the center of their polygon geometries. Then, the level members of the Farm level are filtered with a spatial Boolean predicate with respect to the farm locations that are within a 5 km buffer area of the center of the drainage areas. In the second s-dice operation, a spatial function is applied to the spatial measure farm location to get the distance of the farms from the center of their parish, which is followed by Boolean predicates to filter the farms that are less than 2 km away from the center of their parishes and are on nitrate class I areas.

# 1 - s-dice on dimension levels #
SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gfs:farmID ?farm ;
       gfs:parishID ?parish .
  ?farm gfs:farmLocation ?farmGeo .
  ?parish qb4o:memberOf gfs:parish ;
          skos:broader ?drainageArea .
  ?drainageArea gfs:waterPolygon ?drainagePoly .
  BIND (bif:st_centroid(?drainagePoly) AS ?drainageCenter)
  FILTER (bif:st_within(?drainageCenter, ?farmGeo, 5)) }

# 2 - s-dice on measures #
SELECT ?obs WHERE {
  ?obs rdf:type qb:Observation ;
       gfs:location ?farmLocation ;
       gfs:nitrateClass ?nitClass ;
       gfs:parishID ?parish .
  ?parish gfs:parishCenter ?parishCent .
  BIND (bif:st_distance(?farmLocation, ?parishCent) AS ?distance)
  FILTER (?distance < 2 && ?nitClass = 1) }

Remark 12. (S-Roll-up) The s-roll-up operator aggregates measures of a given cube by using an aggregate function and a spatial function along a spatial dimension's hierarchy. It returns a cube with measures at a coarser granularity for a given dimension.

Example 12 (S-Roll-up) In the following, we present two examples of the s-roll-up operator:

1. Total amount of animals in the farms, which are closest to their parishes' center.
2. Average percentage of nitrogen reduction potentials in the parishes that are within and/or intersect the drainage area "Nibe-Bredning".

In the first s-roll-up operator, measures are aggregated to the Parish level after selecting the farms with respect to their proximity to the center of the parish with a spatial function. In the second s-roll-up operator, measures are aggregated to a specified drainage area ("Nibe-Bredning") at the DrainageArea level. We select all the possible topological cases where a parish intersects or is within the drainage area, which means measures from the farms that are outside Nibe-Bredning are also aggregated to the level of this drainage area. In order to prevent this, the query needs to include an s-drill-down operator (Remark 13) to farms from the Parish level and apply a spatial Boolean predicate to select the farms within the drainage area and then aggregate.

# 1 - s-roll-up #
SELECT ?parish (SUM(?animalCount) AS ?totalAnimals)
WHERE {
  ?obs rdf:type qb:Observation ;
       gfs:numberOfAnimals ?animalCount ;
       gfs:farmID ?farm ;
       gfs:parishID ?parish .
  ?farm gfs:farmLocation ?farmGeo .
  ?parish gfs:parishCenter ?parishCent .
  # Inner select for finding the closest farms to the parish centers #
  { SELECT ?farm1 (MIN(?distance) AS ?minDistance)
    WHERE {
      ?obs rdf:type qb:Observation ;
           gfs:farmID ?farm1 ;
           gfs:parishID ?parish1 .
      ?farm1 gfs:farmLocation ?farm1Geo .
      ?parish1 gfs:parishCenter ?parish1Cent .
      BIND (bif:st_distance(?farm1Geo, ?parish1Cent) AS ?distance) }
    GROUP BY ?farm1 }
  FILTER (?farm = ?farm1 &&
          bif:st_distance(?farmGeo, ?parishCent) = ?minDistance) }
GROUP BY ?parish


# 2 - s-roll-up #
SELECT ?drainageArea (AVG(?nitRed) AS ?avgNitRed)
WHERE {
  ?obs rdf:type qb:Observation ;
       gfs:location ?farmLocation ;
       gfs:nitrogenReduction ?nitRed ;
       gfs:parishID ?parish .
  ?parish qb4o:memberOf gfs:parish ;
          gfs:parishPolygon ?parishGeo ;
          skos:broader ?drainageArea .
  ?drainageArea qb4o:memberOf gfs:drainageArea ;
                gfs:waterPolygon ?drainageGeo ;
                gfs:waterName ?drainageName .
  FILTER ((bif:st_within(?parishGeo, ?drainageGeo) ||
           bif:st_intersects(?parishGeo, ?drainageGeo)) &&
          ?drainageName = "Nibe-Bredning") }
GROUP BY ?drainageArea

Remark 13. (S-Drill-down) The s-drill-down operator disaggregates measures of a given cube by using an aggregate function and a spatial function along a spatial dimension’s hierarchy. It is the inverse operator of s-roll-up, therefore s-drill-down disaggregates the previously summarized data to a child level in order to obtain measures at a finer granularity.
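The original examples do not include an s-drill-down query; the following is a minimal, hedged sketch (assuming the schema prefixes and Virtuoso bif: functions used above) that disaggregates the average nitrogen reduction potential from the DrainageArea level of the second s-roll-up example back down to the Parish level:

# Illustrative s-drill-down sketch: from DrainageArea back down to Parish level
SELECT ?parish (AVG(?nitRed) AS ?avgNitRed)
WHERE {
  ?obs rdf:type qb:Observation ;
       gfs:nitrogenReduction ?nitRed ;
       gfs:parishID ?parish .
  ?parish qb4o:memberOf gfs:parish ;
          skos:broader ?drainageArea .
  # Restrict to the drainage area used in the s-roll-up example
  ?drainageArea gfs:waterName "Nibe-Bredning" . }
GROUP BY ?parish

Grouping at the finer Parish level instead of the DrainageArea level yields the measure at a finer granularity, as described in the remark.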

6.2 Nested SOLAP Operations

A nested set of SOLAP operators can be designed with the pattern (s-dice₂(s-roll-up₁(... s-roll-upₖ(s-slice₁(... s-sliceₙ(s-dice₁(DataCube))))))). Initially, a sub-cube is selected from the (spatial) data cube with the first s-dice. Afterwards, a number of s-slices can be applied, which is followed by a series of s-roll-ups. Finally, the expression ends with another s-dice for getting the final sub-cube at a coarser granularity by filtering the aggregated measures. In the following, we present a nested SOLAP operation example for the running case GeoFarmHerdState spatial data cube.

Example 13 (³s-roll-up(²s-slice(¹s-dice(GeoFarmHerdState)))) This pattern represents a typical nested SOLAP operation that can be paraphrased for the running use case as follows: ¹filter the farm states located within a 2 km distance from the center of their parish and ²slice on the parish which has the largest number of topological relations (intersects, within) with a drainage area, then ³average the nitrogen reduction potential of the drainage areas intersecting with the parish.
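No SPARQL query is given for this nested operation; the sketch below is a simplified, illustrative approximation only, assuming that the skos:broader links between Parish and DrainageArea members stand in for the topological relations mentioned in step 2, and reusing the Virtuoso bif: functions from the previous examples:

# Simplified sketch of the nested SOLAP operation of Example 13
SELECT ?drainageArea (AVG(?nitRed) AS ?avgNitRed)
WHERE {
  ?obs rdf:type qb:Observation ;
       gfs:location ?farmLoc ;
       gfs:nitrogenReduction ?nitRed ;
       gfs:parishID ?parish .
  ?parish gfs:parishCenter ?parishCent ;
          skos:broader ?drainageArea .
  # (1) s-dice: farm states within 2 km of their parish center
  FILTER (bif:st_distance(?farmLoc, ?parishCent) < 2)
  # (2) s-slice: keep only the parish related to the most drainage areas
  { SELECT ?parish (COUNT(?da) AS ?daCount)
    WHERE { ?parish qb4o:memberOf gfs:parish ;
                    skos:broader ?da . }
    GROUP BY ?parish
    ORDER BY DESC(?daCount)
    LIMIT 1 } }
# (3) s-roll-up: average nitrogen reduction per related drainage area
GROUP BY ?drainageArea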


7 Discussion and Perspectives

In the following, we describe and evaluate the steps of our process with respect to the guidelines for publishing governmental linked data [18]. We discuss the particular challenges that we encountered and possible future improvements.

1) Specification. The first step is to specify the scope of the data by identifying and analyzing the data sources. We identified the data sources for the domains of CHR data, Environmental data, Geographical data, and CVR data as described in Section 4. In order to find the correct relations between these domains we had to search the documentation (i.e., [17] and [16]) and acquire knowledge about the domains' interests. As the purpose is to publish open data, the definition of an Open Data license is also required at this level.

2) Modeling. We used the spatially extended MultiDim model [19] for designing the MD conceptual schema of the use case spatial data cube (Fig. E.2) from the collected flat data sets. This process requires good knowledge of spatial data warehouses and their concepts. In order to model the spatial data cube in RDF, we use QB4SOLAP, which provides the state of the art for semantic spatial data cubes. Therefore, we annotate the designed use case conceptual schema with QB4SOLAP. Modeling the RDF data with QB4SOLAP provides all the core concepts of spatial data warehouses (i.e., spatial dimensions, spatial levels, and spatial hierarchies) for spatial data on the SW. Therefore, QB4SOLAP conveniently handles the conceptual modeling process of DWs on the SW and clearly describes the relations that should be considered during the logical modeling process (e.g., cardinality and topological relationships to create integrity constraints for ER models).

3) Generation. This step of the overall process involves the most complex tasks. In order to fully generate a spatial data cube in RDF the following sub-processes are performed: transformation and data conciliation.

3.1) Transformation. The RDF triples were generated with ad-hoc C# code for mapping from relational CSV files to RDF. In total, 12 CSV files were organized based on the relational representation (snowflake schema) of an MD conceptual model as follows: we obtained one fact table with foreign keys of the related dimension (base) levels and measures, four tables with each dimension's base level, and seven tables for the remaining levels along the hierarchies. These 12 tables are related by referential integrity constraints. Every level table also records the level attributes and attribute values. In order to create these relational CSV files, we pursued a number of data conciliation activities as described in the following item.

3.2) Data Conciliation. Initial data sets are downloaded in different formats, i.e., SHP format for CHR, Environmental, and Geographical data; CSV

format for CVR data. In order to create the desired relational implementation of the use case spatial data cube, we used the unique identifiers (i.e., CVR and CHR numbers) or utilized spatial joins by joining attributes from one geometry feature to another based on the spatial containment relationship. For instance, we overlaid the point coordinates of the farms from the CHR data and the polygon coordinates of the three environmental data sets in order to intersect and find the soil quality measurements for NitrogenReduction, NitrateClass, and PhosphorClass of each farm. Another interesting spatial join is utilized for relating the Parish level members with DrainageArea level members. Since there is an (n − n) cardinality relationship, some parishes intersect with more than one drainage area, thus we used topological relationships (intersects and within) to find the related child and parent level members. For interacting with and handling the spatial data, we used QGIS with integration to PostGIS12. We used PostgreSQL to create and export relational tables.

The lack of tools for mapping a spatial multidimensional model to RDF has been an impediment since we have to use topological relationships where there is an (n − n) cardinality relationship. Therefore, semantic ETL for data warehouses [20] is an important research topic, which requires improvements also for spatial data. Semi-automated tool support of geo-semantic ETL for publishing data warehouses on the SW is a promising improvement for handling the above processes such that the spatial joins can be processed efficiently. Before publishing the final RDF data, a comprehensive data cleansing step is essential for removing redundant columns and cleaning the noise due to unescaped characters, denormalized spatial literals, and encoding problems.

4) Publication. In order to store and publish the RDF data we chose the Virtuoso Universal Server as a triple store. The details about the SPARQL endpoint can be found on the project page http://extbi.cs.aau.dk/GeoFarmHerdState. Publication of metadata in Danish and English should be completed. Also, for enabling efficient discovery of published spatial data cubes, adding an entry of the data in the CKAN repository (datahub.io) is required.

5) Exploitation. The goal of our research is to re-use open government data and publish it as spatial data cubes on the SW for advanced multidimensional analysis. Therefore, we show how to query in SPARQL with SOLAP operators. We recognize the need for non-expert SW users to write their spatial analytical queries in our high-level SOLAP language instead of the lower-level complex SPARQL language. Thus, a query system with a GUI that can interpret spatial data cube schemas for allowing users to perform high-level SOLAP operations is ongoing work. Performing SOLAP queries in SPARQL over multiple RDF cubes with s-drill-across and supporting spatial aggregation (s-aggregation) over spatial measures are other important improvements on the exploitation of spatial data cubes.

12QGIS: http://www.qgis.org/, PostGIS: http://postgis.net/

8 Conclusion and Future Work

The need for spatial analytical queries on the Semantic Web increases constantly with regularly published open government data, but there is a lack of effective solutions and efficient models. As a first attempt to publish spatial data cubes from open data, we have shown that the QB4SOLAP vocabulary can be used to link Danish government data that is published in different domains. First, we have studied the use case data sets thoroughly with the corresponding regulations and requirements in order to satisfy cross-domain interests (e.g., tracking soil quality in livestock farms and farm animal density on drainage areas etc.). Second, we have conciliated the flat data sets in order to model the MD concepts of a spatial data cube. Third, we described the most popular individual SOLAP operators and a nested SOLAP operation pattern with examples and their SPARQL implementation.

In this paper, the QB4SOLAP vocabulary is validated by a non-trivial spatial use case. As a proof of concept, we showed that linking spatial (governmental open) data on the Semantic Web can be achieved at an advanced level, not solely linking spatial open data on the SW but also modeling this data for advanced analytical queries with SOLAP operations.

Several directions are interesting for future research: developing a geo-semantic ETL tool to support the process of creating spatial data cubes on the SW, a GUI for non-expert users to perform SOLAP operations on SW spatial cubes, and extending the use case and implementing advanced SOLAP queries, such as s-drill-across and s-aggregation on the SW.

References

[1] J. B. Arendt, "Denmark releases its digital raw material," http://uk.fm.dk/news/, Ministry of Finance of Denmark, October 2012.

[2] A. B. Andersen, N. Gür, K. Hose, K. A. Jakobsen, and T. B. Pedersen, "Publishing Danish Agricultural Government Data as Semantic Web Data," in Semantic Technology: 4th Joint International Semantic Technology Conference (JIST'14), vol. 8943. Springer, 2014, pp. 178–186, https://dx.doi.org/10.1007/978-3-319-15615-6_13.

[3] N. Gür, K. Hose, E. Zimányi, and T. B. Pedersen, "Modeling and Querying Spatial Data Warehouses on the Semantic Web," in Semantic Technology: 5th Joint International Semantic Technology Conference (JIST'15), vol. 9544. Springer, 2015, pp. 1–20, https://dx.doi.org/10.1007/978-3-319-31676-5_1.

[4] N. Gür, T. B. Pedersen, E. Zimányi, and K. Hose, “A Foundation for Spatial Data Warehouses on the Semantic Web,” Semantic Web Journal, vol. 9, no. 5, pp. 557–587, 2018.

[5] A. Abelló, O. Romero, T. Pedersen, R. Berlanga Llavori, V. Nebot, M. Aramburu, and A. Simitsis, "Using Semantic Web Technologies for Exploratory OLAP: A Survey," IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 27, no. 2, pp. 571–588, 2014, https://doi.org/10.1109/TKDE.2014.2330822.

[6] D. Pedersen, K. Riis, and T. B. Pedersen, "Query optimization for OLAP-XML federations," in Proceedings of the 5th International Workshop on Data Warehousing and OLAP (DOLAP'02). ACM, 2002, pp. 57–64.

[7] B. Kämpgen, S. O'Riain, and A. Harth, "Interacting with Statistical Linked Data via OLAP Operations," in The Semantic Web: ESWC 2012 Satellite Events, vol. 7540. Springer, 2012, pp. 87–101, https://dx.doi.org/10.1007/978-3-662-46641-4_7.

[8] R. Cyganiak, D. Reynolds, and J. Tennison, "The RDF Data Cube Vocabulary," 2014.

[9] B. Kämpgen and A. Harth, "OLAP4LD – A framework for building analysis applications over governmental statistics," in The Semantic Web: ESWC'14. Springer, 2014, pp. 389–394.

[10] W3C, "Data Cube Implementations," 2014, https://www.w3.org/2011/gld/wiki/Data_Cube_Implementations.

[11] L. Etcheverry, A. Vaisman, and E. Zimányi, "Modeling and Querying Data Warehouses on the Semantic Web using QB4OLAP," in Data Warehousing and Knowledge Discovery (DaWaK'14), vol. 8646. Springer, 2014, pp. 45–56, https://dx.doi.org/10.1007/978-3-319-10160-6_5.

[12] A. Matei, K.-M. Chao, and N. Godwin, “OLAP for Multidimensional Semantic Web Databases,” BIRTE, pp. 81–96, 2015.

[13] P. Zhao, X. Li, D. Xin, and J. Han, “Graph Cube: On Warehousing and OLAP Multidimensional Networks,” SIGMOD, pp. 853–864, 2011.

[14] K. A. Jakobsen, A. B. Andersen, K. Hose, and T. B. Pedersen, "Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries," COLD, 2015.


[15] M. Perry and J. Herring, “GeoSPARQL: A Geographic Query Language for RDF Data,” OGC Implementation Standard, 2012.

[16] Danish Ministry of the Environment, “Consolidated Act on Livestock Farming Environmental Approvals,” 2012, http://eng.mst.dk/media.

[17] Nitrates Directive, "Danish nitrate action programme 2008-2015 regarding the nitrates directive; 91/676/EEC," http://eng.mst.dk/media/mst/Attachments/DanishNitrateActionProgramme2008201507092012.pdf, Nitrates Directive, Tech. Rep., 2012.

[18] B. Villazón-Terrazas, L. Vilches-Blázquez, O. Corcho, and A. Gómez-Pérez, "Methodological Guidelines for Publishing Government Linked Data," in Linking Government Data. Springer New York, 2011, pp. 27–49. [Online]. Available: http://dx.doi.org/10.1007/978-1-4614-1767-5_2

[19] A. Vaisman and E. Zimányi, "Spatial data warehouses," in Data Warehouse Systems: Design and Implementation. Springer, 2014.

[20] R. P. Deb Nath, K. Hose, and T. B. Pedersen, "Towards a Programmable Semantic Extract-Transform-Load Framework for Semantic Data Warehouses," in Proceedings of the 18th International Workshop on Data Warehousing and OLAP (DOLAP'15). ACM, 2015, pp. 15–24, https://doi.org/10.1145/2811222.2811229.

Paper F

Multidimensional Enrichment of Spatial RDF Data for SOLAP

Nurefşan Gür, Torben Bach Pedersen, Katja Hose, and Mikael Midtgaard

The paper is under preparation for submission to the Semantic Web Journal.

Abstract

The large volumes of spatial data and multidimensional (MD) data being published on the Semantic Web (SW) have led to new opportunities for advanced analysis such as spatial Online Analytical Processing (SOLAP). The RDF Data Cube (QB) and QB4OLAP vocabularies have been widely used for annotating and publishing statistical and MD RDF data. Even though such statistical data sets might have spatial information (i.e., coordinates), the lack of spatial semantics and spatial MD concepts in QB and QB4OLAP does not allow users to employ SOLAP queries over spatial data, either directly in SPARQL or via a third-party SPARQL query generator tool for SOLAP operators such as GeoSemOLAP. The QB4SOLAP vocabulary fully supports annotating spatial and MD data on the SW, which allows users to query endpoints with GeoSemOLAP or with SOLAP operators in SPARQL. In order to enable SOLAP on existing QB and QB4OLAP data on the SW, we propose an RDF2SOLAP enrichment module that can automatically annotate spatial MD concepts in QB4SOLAP. We present a wide range of enrichment algorithms and apply them on a non-trivial real-world use case from governmental open data sets with complex geometry types.

© 2020 IOS Press and the authors. All rights reserved. Reprinted, with permission from Nurefşan Gür, Torben Bach Pedersen, and Katja Hose. Multidimensional Enrichment of Spatial RDF Data for SOLAP. Under preparation for submission to: Semantic Web Journal, 2020. The layout has been revised.

1 Introduction

Data warehouses (DWs), Online Analytical Processing (OLAP) tools, and queries are well-established for interactive data analyses. DWs have multidimensional (MD) models and store large volumes of data. MD models locate data in an n-dimensional space, usually called a data cube. The cells of the cube represent the topic of analysis, and associate observation facts with (numerical) measures that can be aggregated. Spatial data cubes can also contain spatial measures, which can be aggregated with spatial functions. For example, a fact cube for farms has a numerical measure 'number of animals' in the farm and has 'farm's coordinates' as a spatial measure. Facts are linked to dimensions, which provide contextual information, e.g., farm production, farm location, and farm livestock. Dimensions are organized into hierarchies with levels, e.g., parish of the farm or herd type of livestock, that allow users to analyze and aggregate measures at different levels of detail. Levels have a set of attributes that describe the characteristics of the level members.

In traditional DWs, the location dimension is generally used as a conventional (non-spatial) dimension with alphanumeric data and thus given only with a nominal reference to places and areas, e.g., parish name. This does not allow applying spatial operations over the true spatial location data or deriving topological relations among the hierarchy levels of the location dimension, which are essential for enabling spatial OLAP (SOLAP) analysis. By including the geometric information of location data in MD models, we can significantly improve the analysis process (e.g., proximity analysis of locations) with additional perspectives by revealing dynamic spatial hierarchy levels and new spatial level members in SOLAP operations (details and examples in [1, 2]). Moreover, by using the geometry attributes of level members, topological relations between levels, and between levels and facts, can be implicitly specified. These topological relations are essential in order to correctly aggregate measures between levels with many-to-many (N:M) cardinality relations.

The Semantic Web (SW) has evolved from prominently focusing on data publishing to also supporting complex queries such as interactive analytical queries. Simultaneously, the data available on the SW has evolved from being simple, mostly alphanumerical data, to include complex data types such as geospatial data. There are many examples of governmental and statistical Linked Open Data (LOD) sets with geographical attributes. However, such datasets are not typically modeled with multidimensional (MD) concepts. Thus, they cannot be queried with interactive analytical queries (OLAP) on the SW. Even though several platforms and tools for Business Intelligence (BI) and data warehouses have emerged in recent years [3], there is still a lack of a common standard to model and publish (geo-)semantic cubes on the


SW [1]. Some of the statistical datasets on the SW, which have observations and measures (and are thus well suited for analytical queries), are published using the RDF Data Cube Vocabulary (QB) [4], the current W3C standard. However, QB lacks the underlying structural metadata for multidimensional models and OLAP operations (explained in detail in Section 7). Well-defined structural metadata is required in order to translate OLAP queries into the underlying technology, which is SPARQL 1.1 [5], since the focus of interest is MD data on the SW [2]. QB4ST [6] is a recent attempt to define extensions for spatio-temporal components to the RDF Data Cube (QB). However, it has the inherent multidimensional modeling limitations of the QB vocabulary. In order to address the MD modeling challenges of the QB vocabulary, QB4OLAP [7] has been proposed, which reuses QB definitions and adds the required MD schema semantics. There is already a significant number of data sets published using the QB vocabulary. The QB2OLAP enrichment module [8] helps users to semi-automatically produce a QB4OLAP description of a QB data cube by adding the necessary MD semantics (e.g., the hierarchical structure of the dimensions) and the corresponding instances to populate the dimension levels. However, QB4OLAP annotation only covers non-spatial MD data cube concepts and their operations. Even though such statistical data sets have spatial information, not annotating the spatial MD concepts (e.g., spatial hierarchy levels such as administrative regions) hinders querying the data with interesting spatial OLAP operations.

[Figure F.1 placeholder – spatial RDF endpoints and external geo-vocabularies feed the RDF2SOLAP module, which produces QB4SOLAP data; users query spatial RDF data warehouses either through GeoSemOLAP or directly with SOLAP-to-SPARQL operators.]

Fig. F.1: Future vision of SOLAP on the SW


Problem Definition. In the current state of the Semantic Web, spatial OLAP (SOLAP) queries are not possible through the existing spatial RDF endpoints. If a (spatial) data warehouse user would like to query spatial RDF data from the Semantic Web with SOLAP operations, the user needs to download the RDF data, map it to a relational data model (e.g., with a snowflake schema), and then import it into a traditional spatial data warehouse in order to query it with SOLAP queries, which is slow, labor-intensive, and stores the data in a non-open format. There are existing tools and vocabularies for (spatial) data warehouse users on the Semantic Web. The QB4SOLAP vocabulary [9] allows spatial data warehouse users to publish their data on the Semantic Web with spatial multidimensional concepts. High-level SOLAP operators and how to translate them into SPARQL are defined with query generation algorithms [1]. Based on these algorithms, GeoSemOLAP [2] was developed to allow users to query with SOLAP operations on the Semantic Web, without knowledge of SPARQL or RDF, via a graphical user interface. However, GeoSemOLAP is restricted to RDF data sets that are annotated with QB4SOLAP. In order to minimize the user effort for querying existing spatial RDF endpoints (that are already published in other vocabularies, e.g., the RDF Data Cube Vocabulary or the QB4OLAP vocabulary) with spatial analytical queries (SOLAP), an automated way of annotating spatial metadata with QB4SOLAP from the existing endpoints is necessary. Therefore, we propose an RDF2SOLAP enrichment module that can operate at the back-end of GeoSemOLAP.

Contributions. In order to address this issue, our main contributions are:

* We show that the QB4SOLAP vocabulary meets the need for fully-fledged spatial data warehouse concepts by demonstrating running use case examples from real-world governmental open data sets from various domains (i.e., environment and farming) with complex geometry types.

* A detailed explanation and comparison of RDF data examples, depicted as graphs and annotated in both the QB4OLAP and QB4SOLAP vocabularies, and, based on this comparison, an identification of the required spatial MD metadata and concepts (e.g., spatial hierarchies and topological relations) for SOLAP analysis.

* Hierarchical enrichment algorithms for (1) detecting topological relations at explicit hierarchy steps with direct links between the level members; and (2) discovering topological relations at implicit hierarchy steps (without direct links between the level members).


* Factual enrichment algorithms for both implicit and explicit fact-level relations between the fact and level members.

* An automated way of re-defining a fact schema after factual enrichment, and the association of spatial aggregate functions with spatial measures.

* An evaluation of our approach in terms of accuracy by comparing the number of topological relations found by the RDF2SOLAP framework against two different environments (an RDBMS and a GIS tool) that can operate on spatial data types.

Paper organization. The remainder of this paper is organized as follows. Section 2 defines the preliminary concepts used throughout the paper with a running use case example. Section 3 presents the system architecture for the MD enrichment process. Section 4 defines the RDF2SOLAP enrichment algorithms with the necessary helper functions and a formalization of (spatial) RDF data. Section 5 presents the implementation details along with interesting examples and discusses the challenges and implemented solutions. Section 6 presents the qualitative and quantitative evaluation with comparison baselines. Finally, Section 7 discusses related work and Section 8 concludes the paper with an outlook to future work.

2 Preliminaries

In this section, we explain the preliminary concepts of spatial data warehouses and spatial OLAP (SOLAP) (Section 2.1) and how to deploy them on the Semantic Web (Section 2.2) using the QB4SOLAP vocabulary.

2.1 Spatial Data Warehouses and SOLAP

Data cubes and spatially extended cube concepts

Data warehouses (DWs) are based on the multidimensional (MD) model, which models data in an n-dimensional space, and they are often referred to as data cubes. A cube schema defines the structure of a cube with MD concepts. The cells of the cube represent (observation) facts with a set of attributes called measures. Facts are linked to dimensions, which are the axes of the MD space and provide perspectives to analyze the data. Dimensions are organized into hierarchies, which allow users to aggregate measures at different granularities along the levels of a hierarchy. Hierarchies are composed of levels, which have a set of attributes that describe the characteristics of the level members. Each level member is defined by its attributes and attribute values.


Fig. F.2: GeoFarmHerdState – Parish, Farm, and Drainage area instances

Cube members are MD concepts defined at the instance level; they comprise level members, attributes of level members, a partial order on level members, and fact members. A hierarchy step between two levels (a child level and a parent level) defines a set of roll-up relations, where each relation relates a child level member to a parent level member. These roll-up relations define a partial order between level members with a cardinality relation. The cardinality (1:1, 1:N, N:1, N:M) describes the number of members in one level that can be related to a member in the other level, for both child and parent levels. Spatial data warehouses (SDWs) extend a DW by storing geometries such as points, lines, and polygons in the values of spatial measures and in the values of level attributes for spatial dimensions. The spatially extended MD schema of an SDW has spatial dimensions, spatial hierarchies, spatial levels [10], spatial hierarchy steps, and topological relations1 (in addition to cardinality relations) between spatial levels for each spatial hierarchy step [9]. Similar to conventional DWs, facts of an SDW can be associated with numeric measures, which are aggregated using functions such as SUM, AVG, etc. A fully extended spatial MD schema of an SDW should also define spatial measures, which have geometries, and spatial aggregate functions, such as UNION, CONVEX HULL, etc., on those spatial measures. For a detailed explanation of SDW concepts we refer the reader to [12].

1 Topological relations are Boolean spatial predicates that specify how two spatial objects are related to each other, e.g., within, intersects, touches, crosses, etc. [11].


[Figure F.3 placeholder – conceptual schema of the GeoFarmHerdState fact (measures: NumberOfAnimals, FarmLocation, NitrogenReduction, NitrateClass, PhosphorClass, LivestockUnit) linked to the FarmDim, ParishDim, HerdDim, and TimeDim dimensions. ParishDim has a Geography hierarchy from Parish (ParishID, ParishName, ParishArea, ParishPolygon) to DrainageArea (WaterID, WaterName, WaterArea, WaterPolygon), annotated with within/intersects topological relations and N:1/N:M cardinality relations; non-additive measures are marked separately.]

Fig. F.3: GeoFarmHerdState – Conceptual MD schema of livestock holdings data (Spatial concepts)

OLAP and spatial OLAP operations

DWs are commonly used to store large volumes of data for decision support with On-Line Analytical Processing (OLAP) operations. Spatial OLAP (SOLAP) integrates the features of OLAP tools and geographical information systems (GIS) [13]. SOLAP enables advanced analytical processing by taking the spatial information in the cube into account. For example, a spatial data cube of livestock holdings in farms (referred to as GeoFarmHerdState in the rest of this paper) defines the farm location as a spatial measure, which is linked to the observation facts. In order to derive perspectives and relations on the state of the farms' livestock holdings (herds), two spatial levels are defined: parishes and drainage areas. A sample set of the corresponding spatial data cube members is given in Figure F.2. The spatial MD concepts of the data cube are defined in the conceptual schema in Figure F.3, which depicts a simplified version of the GeoFarmHerdState spatial data cube without its non-spatial dimensions (the full version of the GeoFarmHerdState cube is described in our previous work [14]). The cube has two spatial dimensions: FarmDim and ParishDim. The latter has a spatial hierarchy (Geography) with two spatial levels: Parish and DrainageArea. FarmDim, on the other hand, does not have a spatial hierarchy, despite its spatial (base) level: Farm. The GeoFarmHerdState cube has spatial fact members recording the state of the farms within a time frame with different kinds of measures, i.e., numeric measures: NumberOfAnimals in the farm and the NitrogenReduction potential of the farm land/soil, and a spatial measure: FarmLocation (Figure F.3).2


[Figure F.4 placeholder – within, intersects, and contains relations between the drainage areas "Mariager Inderfjord" and "Langerak", the parishes "Oue" and "Astrup", and their farms.]

Fig. F.4: Hierarchy example for SOLAP

To evaluate SOLAP operations, spatial levels such as Parish and DrainageArea are used to aggregate measures at different levels of detail. Due to the polygon geometry of the spatial level members, there are two different roll-up relations for the hierarchy step between the Parish and DrainageArea levels: a parish can be completely contained within a drainage area, or a parish and a drainage area can intersect. For example, parish "Oue" is within drainage area "Mariager Inderfjord". Thus all the farms that are within "Oue" are also within "Mariager Inderfjord". In contrast, parish "Astrup" intersects with the drainage areas "Mariager Inderfjord" and "Langerak". Therefore, some farms that are within "Astrup" are within "Mariager Inderfjord", while the rest of the farms are within "Langerak". Figure F.2 displays a sample set of Parish and DrainageArea level members. The possible roll-up relations for the example above are depicted in Figure F.4 with black and red arrows, respectively representing the topological relations within and intersects. Blue arrows show the topological relation contains, which corresponds to drill-down (the inverse of roll-up) relations from the DrainageArea level to the Farm level. Topological relations between the levels, and between levels and facts, can be implicitly specified through the geometry attributes of their instances (level members and fact members). The relations between spatial levels enable processing spatial roll-up and drill-down through range queries with spatial predicates [15]. In terms of cardinality, there is an N:M relationship between the level members, since a parish may intersect with more than one drainage area

2 Non-additive measures are also numeric measures, which are given in percentages or classified in numbers; therefore they cannot be meaningfully summarized by all aggregate functions, e.g., SUM. However, depending on the semantics, other aggregate functions can be associated with them, e.g., AVG of NitrogenReduction potential or MAX of NitrateClass.

and vice versa. This induces the problem of computing measures incorrectly when a roll-up operation goes through an N:M relationship, which is precisely the case between the Parish level and the DrainageArea level. For example, suppose we would like to aggregate the measure NumberOfAnimals from the Parish level to the DrainageArea level with a roll-up query. In such a roll-up query, we might falsely aggregate the number of animals in farms that are contained within the parish but not contained within the drainage area, since the parish intersects with another drainage area. In order to refine such an analysis, SOLAP operations are required, where a (spatial) drill-down should be applied to the lowest granularity (from Parish level members to GeoFarmHerdState fact members), and then a spatial roll-up (with the within predicate) can be applied from the fact members (Farm instances) to the DrainageArea level members. This prevents falsely aggregating the number of animals from farms that are (spatially) disjoint from the corresponding drainage area.

[Figure F.5 placeholder – RDF graph of the hierarchy step _:hs between gfs:parish (child level) and gfs:drainageArea (parent level) with qb4o:pcCardinality qb4o:ManyToMany; the level members gfsi:parish_8648 ("Astrup") and gfsi:parish_8517 ("Oue") roll up to gfsi:water_159 ("Mariager Inderfjord") and gfsi:water_3710 ("Langerak") via skos:broader, and the parish/drainage-area polygons are attached via gfs:parishPolygon and gfs:waterPolygon.]

Fig. F.5: Hierarchy steps in QB4OLAP before multidimensional enrichment
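To make the refined SOLAP analysis described above concrete, the following minimal JavaScript sketch (not taken from the paper's implementation) aggregates NumberOfAnimals per drainage area by testing each farm location directly against the drainage-area polygon with Turf.js, i.e., drilling down to the facts and rolling up spatially with a within test instead of summing per parish. The farm and drainage-area objects and their coordinates are hypothetical.

```javascript
// Minimal sketch: aggregate NumberOfAnimals per drainage area by testing each
// farm location directly against the drainage-area polygon (drill down to the
// facts, then spatial roll-up with "within"), instead of summing per parish.
// Data and coordinates are hypothetical; Turf.js performs the spatial test.
const turf = require('@turf/turf');

const farms = [
  { id: 'farm_103850', numberOfAnimals: 42,
    location: turf.point([8.31941, 56.75822]) },
  // ... more farm fact members
];

const drainageAreas = [
  { name: 'Mariager Inderfjord',
    polygon: turf.polygon([[[8.60, 56.98], [8.25, 56.70], [8.21, 56.72], [8.60, 56.98]]]) },
  // ... more drainage-area level members
];

function animalsPerDrainageArea(farms, areas) {
  const totals = {};
  for (const area of areas) {
    // Farms counted for an area only if their location is inside that area,
    // so a parish intersecting two areas cannot cause double counting.
    totals[area.name] = farms
      .filter((farm) => turf.booleanPointInPolygon(farm.location, area.polygon))
      .reduce((sum, farm) => sum + farm.numberOfAnimals, 0);
  }
  return totals;
}

console.log(animalsPerDrainageArea(farms, drainageAreas)); // { 'Mariager Inderfjord': 42 }
```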

2.2 QB4SOLAP: Spatial RDF Data Cube Vocabulary for SOLAP operations

There is an increasing amount of Linked Open Data (LOD) on the Semantic Web containing spatial information and numerical (statistical) data. This has led to new opportunities for OLAP over spatial data using Semantic Web technologies and standards. Datasets on the SW use a standardized format: RDF


(Resource Description Framework). In order to enable SOLAP operations on the Semantic Web, a comprehensive RDF vocabulary is needed, i.e., one that supports the annotation of spatial hierarchy steps with topological relations. QB4SOLAP [1] is an RDF vocabulary that allows MD data users to define cube schemas and cube instances as RDF triples. The QB4SOLAP vocabulary is an extension of QB4OLAP [7] capturing the semantics of spatial MD concepts (i.e., spatial hierarchy steps) that are essential for SOLAP operations. The QB4SOLAP vocabulary is available on our project website3. The latest version is QB4SOLAP V1.3 and it is available via a persistent URL4. A comprehensive foundation on spatial data warehouses on the Semantic Web can be found in [1], which includes detailed definitions with semantics of spatial MD concepts in RDF, both at the schema level and the instance level, using QB4SOLAP.

[Figure F.6 placeholder – the same RDF graph as Figure F.5 after enrichment: the spatial hierarchy step _:shs carries qb4so:pcTopoRel qb4so:Within, qb4so:Intersects at the schema level, and the skos:broader links between parish and drainage-area level members are replaced by explicit qb4so:within and qb4so:intersects predicates at the instance level.]

Fig. F.6: Spatial hierarchy steps in QB4SOLAP after multidimensional enrichment

In the following, we depict an example of a hierarchy step from the gfs:parish child level to the gfs:drainageArea parent level (Figure F.5). In the figure, we prefix the schema elements (attributes, levels, etc.) of the (GeoFarmHerdState) cube with gfs: and instance data from the cube with gfsi:. The left-center part of Figure F.5 shows the hierarchy structure _:hs between the gfs:parish and gfs:drainageArea levels at the schema level with the QB4OLAP vocabulary. QB4OLAP objects, classes, and properties are prefixed with qb4o:. The levels (gfs:parish and gfs:drainageArea) are linked to the instances

3 https://extbi.cs.aau.dk/QB4SOLAP
4 https://w3id.org/qb4solap#

of level members (e.g., gfsi:parish_8648, gfsi:water_3710, etc.) by the qb4o:memberOf property. The polygon geometry attributes are highlighted in blue boxes at the top and the bottom of the figure. The coordinates recorded in the geometry attributes can be used to derive the topological relation between the level members by applying Boolean spatial predicates (e.g., intersects?, within?) on the polygon geometries of the parish and drainage area level members. However, QB4OLAP does not support annotating the topological relations that might exist between the level members at a hierarchy step. QB4OLAP uses only the skos:broader property from the SKOS (Simple Knowledge Organization System) [16] semantic relations for capturing the roll-up relations at hierarchy steps. The roll-up relations with the skos:broader property are highlighted in red boxes in Figure F.5. The skos:broader property does not describe the nature of the roll-up relation with topological relations for spatial hierarchies. Therefore, QB4OLAP cannot capture the topological relations in a hierarchy step from the Parish level to the DrainageArea level or between these levels' members. QB4SOLAP, on the other hand, can define topological relations both at the schema level and at the instance level. In Figure F.6, we prefix QB4SOLAP objects, classes, and properties with qb4so: and highlight them in green. The left-center part of the figure shows the spatial hierarchy structure _:shs, which has a QB4SOLAP property qb4so:pcTopoRel with two QB4SOLAP class instances, qb4so:Within and qb4so:Intersects. This means that when we compare the geometry attributes of parish level members and drainage area level members, we discover two different topological relations (within and intersects) across the (spatial) hierarchy steps between the parish and drainage area levels. These relations are annotated at the schema level in the left-center part of Figure F.6. Similarly, the gfs:parish and gfs:drainageArea levels are linked to the instances of level members (e.g., gfsi:parish_8648) by the qb4o:memberOf property. The explicit topological relations between the level members along a spatial hierarchy step are depicted in the figure with the qb4so:intersects or qb4so:within predicates, highlighted in green boxes (e.g., gfsi:parish_8648 intersects with gfsi:water_159 and gfsi:water_3710, etc.). In conclusion, QB4SOLAP can enable SOLAP operations by defining the semantics of spatial MD concepts both at the schema level and at the instance level. These semantics are essential for SOLAP operations and are defined as extensions to the QB4OLAP vocabulary.


3 System Architecture

The importance of SOLAP for obtaining accurate results in operations over spatial data warehouses was explained in Section 2.1. However, the RDF data cubes (with spatial attributes) on the Semantic Web are not always annotated with vocabularies that allow users to formulate SOLAP queries. In this section, we present an overview of the MD enrichment flow from RDF QB to QB4OLAP data cubes and from QB4OLAP to QB4SOLAP data cubes, so that users can query the RDF data cubes with SOLAP queries.

[Figure F.7 placeholder – three architectural layers: an Interface layer (select/load QB, QB4OLAP, or QB4SOLAP graphs, GeoSemOLAP, triple download), an Enrichment Modules layer (QB2OLAPem and the RDF2SOLAP enrichment module with its hierarchical enrichment, factual enrichment, and triple generation phases), and a SPARQL Endpoints layer (source triple stores accessed via SPARQL queries).]

Fig. F.7: Multidimensional Enrichment Process

The multidimensional enrichment process flow is illustrated in Figure F.7 with three main architectural layers: Interface, Enrichment Modules, and SPARQL Endpoints. The first layer facilitates user interaction with the enrichment modules (i.e., QB2OLAPem) and third-party tools (i.e., GeoSemOLAP). Our main contribution in this paper is the RDF2SOLAP enrichment module, which is the core of the second layer. The RDF2SOLAP enrichment module operates on QB4OLAP triples that either already exist in the original data or have been generated by the QB2OLAPem enrichment module [8]. QB2OLAPem allows users to enrich an RDF QB dataset with QB4OLAP concepts and returns a graph of QB4OLAP triples.


The internal process flow of the RDF2SOLAP enrichment module consists of three phases: the hierarchical enrichment phase, the factual enrichment phase, and the triple generation phase. The hierarchical and factual enrichment phases iteratively perform the enrichment algorithms explained in Section 4. Both of these enrichment phases allow interaction with external SPARQL endpoints to enhance the enrichment process via potential spatial and multidimensional concepts that can be retrieved externally. The third phase is the triple generation, which creates QB4SOLAP triples that can be used in third-party tools such as GeoSemOLAP. GeoSemOLAP allows users without knowledge of RDF and SPARQL to query with SOLAP operations by interactively formulating the queries using a GUI with interactive maps [2]. The third layer (SPARQL endpoints) allows both user-endpoint interaction for retrieving QB or QB4OLAP graphs and system-endpoint interaction, where the RDF2SOLAP enrichment module queries the external triple stores to enhance the hierarchical and factual enrichment. RDF2SOLAP is implemented in JavaScript on the Node.js platform, with the N3.js library for parsing RDF triples in JavaScript and the Turf.js library for spatial analysis5.
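As a rough illustration of this technology stack, the sketch below parses a few Turtle triples with N3.js, converts the WKT polygon literals to GeoJSON (assuming a WKT-to-GeoJSON helper such as the wellknown package, which is not named in the paper), and tests a topological predicate with Turf.js. The namespace IRIs, geometries, and control flow are placeholders, not the module's actual code.

```javascript
// Minimal sketch of the parsing and spatial plumbing: N3.js parses Turtle into
// quads, a WKT parser converts the polygon literals to GeoJSON, and Turf.js
// evaluates a Boolean spatial predicate. Illustrative only.
const N3 = require('n3');
const turf = require('@turf/turf');
const wellknown = require('wellknown'); // assumption: WKT -> GeoJSON parser

const ttl = `
@prefix gfs:  <http://example.org/gfs#> .
@prefix gfsi: <http://example.org/gfsi#> .
gfsi:parish_8648 gfs:parishPolygon "POLYGON((8.40 56.79, 8.30 56.72, 8.31 56.80, 8.40 56.79))" .
gfsi:water_159   gfs:waterPolygon  "POLYGON((8.60 56.98, 8.21 56.72, 8.25 56.99, 8.60 56.98))" .
`;

const { namedNode } = N3.DataFactory;
const store = new N3.Store();

new N3.Parser().parse(ttl, (error, quad) => {
  if (error) throw error;
  if (quad) { store.addQuad(quad); return; }

  // All triples parsed: fetch the two polygons and test a topological relation.
  const wktOf = (s, p) =>
    store.getQuads(namedNode(s), namedNode(p), null, null)[0].object.value;
  const parish = wellknown.parse(wktOf('http://example.org/gfsi#parish_8648',
                                       'http://example.org/gfs#parishPolygon'));
  const water  = wellknown.parse(wktOf('http://example.org/gfsi#water_159',
                                       'http://example.org/gfs#waterPolygon'));

  if (turf.booleanWithin(parish, water)) {
    console.log('qb4so:within');
  } else if (!turf.booleanDisjoint(parish, water)) {
    console.log('qb4so:intersects'); // the kind of relation RDF2SOLAP would annotate
  }
});
```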

4 RDF2SOLAP Enrichment Algorithms

In this section, the algorithms of the RDF2SOLAP enrichment module are presented. Our MD enrichment approach builds upon QB4OLAP triples that either already exist in the original data or have been generated by the QB2OLAPem enrichment module [8], as depicted in Figure F.7. QB4OLAP defines only the non-spatial multidimensional semantics of RDF data, whereas QB4SOLAP enriches the MD semantics of RDF data with spatial concepts. The high-level formalizations of spatial and non-spatial multidimensional RDF data are defined in [1]. We use these formalizations for the input and output variables of the algorithms. In the following, we briefly explain the basic notations in RDF terms that are necessary for the algorithms. The basic construct of RDF is a triple t = (s, p, o) consisting of three components: s is the subject, p is the predicate, and o is the object. RDF triples are defined over T = (I ∪ B) × I × (I ∪ B ∪ L), where I is the set of IRIs (Internationalized Resource Identifiers), B is the set of blank nodes, and L is the set of literals. An object value can be a literal (i.e., a string, spatial literal6, integer, etc.). Subjects and objects can be represented by a blank node for anonymous resources. Predicates are always represented by IRIs. A set of RDF triples is referred to as an RDF graph G. We use superscript notation

5 N3.js: https://github.com/rdfjs/N3.js, Turf.js: http://turfjs.org/
6 Spatial literals are represented as L_s.


to represent the type of a QB4OLAP graph, either as a schema graph G^S or an instance graph G^I. The instance graph has entities from a use-case dataset as a set of RDF triples. The schema graph describes the structure (schema) of the dataset recorded in the instance graph. We use subscript notation to represent the MD concepts in RDF terms as a graph, as exemplified in the following: G^I_A(lm) is the RDF instance graph for attributes of level members; in the use case example this graph corresponds to the set of triples in Listing 2, Lines 3-6 or Lines 9-13 and Lines 17-22. G^S_HS(h) is the RDF schema graph for hierarchy steps; in the use case example this graph corresponds to the set of triples in Listing 1. We define the function id(x) : G → I that, given an MD element x, returns its identifier I from the graph G. Similarly, we use superscript notation to indicate the type of the identifier from the schema graph (G^S) and the instance graph (G^I), e.g., id^S(l) for a schema identifier of a level (gfs:parish in Listing 2, Line 2 or in Listing 1, Line 2) and id^I(lm) for an instance identifier of a level member (gfsi:parish_8648 in Listing 2, Line 1 or Line 8). The main contributions to the MD enrichment process are provided by two enrichment phases of the RDF2SOLAP enrichment module: the hierarchical enrichment phase and the factual enrichment phase. These phases are explained in Section 4.1 and Section 4.2, respectively, with the corresponding algorithms.

4.1 Hierarchical enrichment phase

The hierarchical enrichment phase is built around spatial levels and their level members, which form the spatial hierarchy of a dimension. Thus, by identifying the spatial relations between the spatial levels and their level members, we can find the spatial hierarchy steps and the possible topological relations for these hierarchy steps. Each spatial hierarchy corresponds to a path of roll-up relationships between child levels and parent levels, and each of these roll-up relationships corresponds to a spatial hierarchy step (Section 2.1). An example of a (spatial) hierarchy with QB4SOLAP is given in Listing 1. Line 4 extends the QB4OLAP schema definitions by enriching the hierarchy step with the possibility to annotate spatial hierarchy steps with topological relations (see Section 2.2 for details and examples).

## Spatial hierarchies in QB4SOLAP with topological relations ##

1 _:shs rdf:type qb4o:HierarchyStep ; qb4o:inHierarchy gfs:geography ;
2   qb4o:childLevel gfs:parish ; qb4o:parentLevel gfs:drainageArea ;
3   qb4o:pcCardinality qb4o:ManyToMany ;
4   qb4so:pcTopoRel qb4so:Within , qb4so:Intersects .

Listing F.1: Spatial Hierarchy structure in QB4SOLAP


In Listing 2, we give GeoFarmHerdState spatial level members from the Parish and DrainageArea levels. Lines 1-7 (Listing 2) represent the QB4OLAP annotation of a child level member from the Parish level before multidimensional enrichment (with skos:broader), as depicted in Figure F.5. Lines 8-14 represent the QB4SOLAP annotation of the same Parish level member after the multidimensional enrichment with topological relations, as depicted in Figure F.6. Lines 15-22 represent the annotation of a parent level member from the DrainageArea level, which remains the same before and after multidimensional enrichment, since hierarchy steps are defined with bottom-up relationships from the child level to the parent level, and the roll-up relations (and thus also the topological relations) are annotated at the child level members of the hierarchy step.

## Parish (child) Level member before hierarchical enrichment##

1  gfsi:parish_8648 rdf:type qb4o:LevelMember ;
2    qb4o:memberOf gfs:parish ;
3    gfs:parishID 8648 ; gfs:parishName "Astrup" ;
4    gfs:parishArea 47,969 ; gfs:parishPolygon "POLYGON((8.438 56.796,
5    8.3984 56.7721, 8.3689 56.7410, 8.3411 56.7372, 8.3078 56.7281,
6    8.3112 56.8087, 8.3511 56.8137, 8.438 56.796))"^^geo:spatialLiteral ;
7    skos:broader gfsi:water_3710 , gfsi:water_159 .

## Parish (child) Level member after hierarchical enrichment ##

8   gfsi:parish_8648 rdf:type qb4o:LevelMember ;
9     qb4o:memberOf gfs:parish ;
10    gfs:parishID 8648 ; gfs:parishName "Astrup" ;
11    gfs:parishArea 47,969 ; gfs:parishPolygon "POLYGON((8.438 56.796,
12    8.3984 56.7721, 8.3689 56.7410, 8.3411 56.7372, 8.3078 56.7281,
13    8.3112 56.8087, 8.3511 56.8137, 8.438 56.796))"^^geo:spatialLiteral ;
14    qb4so:intersects gfsi:water_3710 , gfsi:water_159 .

## DrainageArea (parent) Level member ##

15  gfsi:water_159 rdf:type qb4o:LevelMember ;
16    qb4o:memberOf gfs:drainageArea ;
17    gfs:waterName "Mariager Inderfjord" ; gfs:waterArea 267,477 ;
18    gfs:waterPolygon "POLYGON((8.6048 56.9843, 8.5908 56.8969,
19    8.5707 56.8664, 8.5975 56.8519, 8.5215 56.8483,
20    8.3959 56.7625, 8.3938 56.7340, 8.3613 56.6802,
21    8.2584 56.7764, 8.2475 56.7051, 8.2175 56.7232,
22    8.5474 56.9905, 8.6048 56.9843))"^^geo:spatialLiteral .

Listing F.2: GeoFarmHerdState level members, attributes, and spatial roll-up relations

We exploit QB4OLAP semantics such as non-spatial hierarchy steps and levels as a starting point to find the spatial hierarchy steps. We distinguish two cases:

Case 1: Finding explicit spatial hierarchy steps for QB4OLAP levels with skos:broader roll-up relations between their child-parent level members, by detecting spatial hierarchy steps (see the subsection "Detecting spatial hierarchy steps" below).


Algorithm 1: getSpatialValues(G^I_A(lm)) : V_s(a)
  Input:  G^I_A(lm)
  Output: V_s(a)
1 begin
2   V_s(a) = ∅ ;                                   /* initialize output set as empty set */
3   foreach (id^I(lm) id^S(a_i) v_ai) ∈ G^I_A(lm) do
4     if v_ai is a geo:spatialLiteral then
5       V_s(a) ∪= {v_ai} ;
6   return V_s(a)

For this case, we assume that level members have direct skos:broader relations, as depicted in Figure F.5 and in Listing 2, Line 7, with the skos:broader property.

Case 2: Finding implicit spatial hierarchy steps from QB4OLAP levels without direct roll-up relations through the skos:broader property. In this case, we assume that the level members are only defined by the qb4o:memberOf property, as shown in Listing 2, Line 2, but do not have the skos:broader roll-up relation given in Line 7. It is still possible to discover spatial hierarchy steps by finding the spatial (topological) relations between level members through their attributes, as explained in the subsection "Discovering spatial hierarchy steps" below.

Spatial helper functions

In order to address the cases explained above, we need two spatial helper functions: one for retrieving spatial values from a set of given attribute values (Algorithm 1, getSpatialValues) and one for relating the spatial attribute values (Algorithm 2, relateSpatialValues). They are explained as follows.

Algorithm 1 (getSpatialValues). The first helper function gets as input a graph of attributes of level members G^I_A(lm) and returns a set of spatial attribute values V_s(a). For example, the function could receive Lines 3-6 from

Listing 2 as an input. In the algorithm, Lines 3 and 4 check the value v_ai of each attribute id^S(a_i) (e.g., gfs:parishName, gfs:parishArea, etc.); if the value is of type geo:spatialLiteral (e.g., the POLYGON geometry value linked to the gfs:parishPolygon attribute), then the value is incrementally added to the output set V_s(a) in Line 5.7

7 Note that a level member might have a polygon geometry for the parish borders and a point geometry for the parish center; therefore, a set of spatial values is required.
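A compact JavaScript counterpart of Algorithm 1 could look as follows. It is only a sketch, assuming the attribute triples are available as N3.js quads and that spatial values are recognizable by a geo:spatialLiteral-style datatype; the namespace IRIs are placeholders.

```javascript
// Sketch of Algorithm 1 (getSpatialValues): from the attribute triples of one
// level member, keep only the values typed as spatial literals. The datatype
// test mirrors the geo:spatialLiteral convention of the listings.
const N3 = require('n3');
const { namedNode, literal, quad } = N3.DataFactory;

function getSpatialValues(attributeQuads) {
  return attributeQuads
    .filter((q) => q.object.termType === 'Literal'
                && /spatialLiteral$/i.test(q.object.datatype.value))
    .map((q) => q.object.value);
}

// Hypothetical attribute triples of gfsi:parish_8648 (cf. Listing 2, Lines 3-6).
const GFS  = 'http://example.org/gfs#';
const GFSI = 'http://example.org/gfsi#';
const GEO  = 'http://example.org/geo#';
const attributeQuads = [
  quad(namedNode(GFSI + 'parish_8648'), namedNode(GFS + 'parishName'), literal('Astrup')),
  quad(namedNode(GFSI + 'parish_8648'), namedNode(GFS + 'parishPolygon'),
       literal('POLYGON((8.438 56.796, 8.3984 56.7721, 8.3112 56.8087, 8.438 56.796))',
               namedNode(GEO + 'spatialLiteral'))),
];

console.log(getSpatialValues(attributeQuads)); // -> [ 'POLYGON((8.438 56.796, ...))' ]
```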


Table F.1: Topological relations for hierarchy steps (X: hierarchically and topologically applicable, ×: topologically not applicable, –: hierarchically not applicable)

Topological relation   child level:   point (pt.)      line (ln.)       polygon (po.)
                       parent level:  pt.  ln.  po.    pt.  ln.  po.    pt.  ln.  po.
within                                ×    X    X      –    X    X      –    –    X
contains                              –    –    –      –    –    –      –    –    –
intersects                            X    X    X      –    X    X      –    –    X
touches                               ×    ×    ×      –    X    X      –    –    X
overlaps                              ×    ×    ×      –    X    X      –    –    X
crosses                               ×    ×    ×      –    X    X      –    –    ×
coveredBy                             ×    ×    ×      –    ×    X      –    –    X
covers                                –    –    –      –    –    –      –    –    –
equals                                X    ×    ×      –    X    ×      –    –    X

Algorithm 2 (relateSpatialValues). The next helper function is designed based on Table F.1, w.r.t. the geometry values of the child-parent level members and the structure of a hierarchy step. We prepared Table F.1 with topological relations based on DE-9IM8. We consider only the three simple geometry types (point, line, and polygon) as the spatial attribute values of child-parent level members in roll-up relations, excluding complex geometry types such as multi-polygon, multi-point, etc. The possible topological relations that can occur in a spatial hierarchy step with a roll-up relation from a child level to a parent level are marked with a check sign (X) in the table. Topological relations such as contains and covers are not hierarchically applicable, since a spatial child level member cannot contain or cover a spatial parent level member. For these relations, we mark the complete rows with a minus sign (–) in the table. Similarly, we mark the complete columns of line-point, polygon-point, and polygon-line roll-up relations with the minus sign (–), since these are also not hierarchically applicable. This is because we assume that, in the instance data, a parent level member should always have a spatial attribute of a geometry type of the same or higher dimensionality than its child level member (a point is 0-dimensional, a line is 1-dimensional, and a polygon is 2-dimensional). For example, a child level member with a spatial attribute of line geometry can only have parent level members with spatial attributes of line or polygon geometries, but not point geometry. We mark the topologically not applicable relations with a cross sign (×) according to the DE-9IM model (e.g., a line cannot overlap a polygon). In Figure F.8, we depict the hierarchically and topologically applicable relations from Table F.1.

8 DE-9IM (Dimensionally Extended Nine-Intersection Model) is a topological model that describes spatial relations of two geometries in two dimensions [11].


[Figure F.8 placeholder – panels (a) point-point: equals?, intersects?; (b) point-line: within?, intersects?; (c) point-polygon: within?, intersects?; (d) line-line: touches?, crosses?, overlaps?, intersects?; (e) line-polygon: coveredBy?, within?, touches?, crosses?, overlaps?, intersects?; (f) polygon-polygon: coveredBy?, within?, overlaps?, touches?, intersects?.]

Fig. F.8: Simplifying Topological Relations

We simplified them by generalizing the possible relations (e.g., if a line touches or crosses another line at one point, both cases are classified as intersects in Fig. F.8(d)). The most general relations are underlined in Fig. F.8 for each pair of geometry types (Fig. F.8(a), (b), (c), (d), (e), and (f)). In Algorithm 2 (relateSpatialValues), we only consider these general topological relations, which have the highest probability of satisfying the corresponding spatial predicates. For example, the topological relation intersects has the highest probability of being satisfied in the DE-9IM matrix [11]. We generalize similar spatial predicates to the ones that have a higher probability of occurring in a 2-dimensional space. For example, the relation "a line overlaps (along the border of) a polygon" can be generalized to "a line crosses a polygon at a minimum of two points", which can in turn be generalized to "a line intersects a polygon at a (minimum) single point", as in Figure F.8(e). Similarly, "a line touches a polygon at a single point" can be generalized to "a line intersects a polygon at a (minimum) single point". The topological relation coveredBy requires an area of a geometry; therefore, it is applicable only in line-polygon and polygon-polygon relations (Figures F.8(e) and F.8(f)). For simplicity reasons, we choose to generalize them as the within topological relation.

In the algorithm, we also prioritize checking the topological relations based on the compared geometry types. If the spatial attribute values to relate have point and polygon geometry types, as in Fig. F.8(c), it is more likely in the instance data that a point is within a polygon than that a point only intersects a polygon. Therefore, we check the more probable relation first in the algorithm. For example, for the point-polygon case in Algorithm 2, Line 10, the within spatial predicate is checked first in the if statement (Line 11), and then the intersects spatial predicate is checked in the else-if statement (Line 13). After checking all the possible combinations of the spatial attribute values in the switch cases, a topological relation is returned from the algorithm (Line 30). Now that we have given the necessary spatial helper functions, we present the main algorithms for finding the spatial hierarchy steps in the following two subsections (detecting and discovering spatial hierarchy steps).
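These generalized, prioritized checks of Algorithm 2 could be sketched in JavaScript as follows, assuming WKT spatial literals parsed to GeoJSON with the wellknown package (an assumed helper, not named in the paper) and Turf.js Boolean predicates; it is an illustration, not the module's actual code.

```javascript
// Sketch of the generalized checks in Algorithm 2 (relateSpatialValues),
// following the prioritization described above (more probable relation first).
const turf = require('@turf/turf');
const wellknown = require('wellknown'); // assumption: WKT -> GeoJSON parser

function relateSpatialValues(childWkt, parentWkt) {
  const child = wellknown.parse(childWkt);
  const parent = wellknown.parse(parentWkt);
  if (!child || !parent) return null;
  const pair = `${child.type},${parent.type}`;
  const intersects = !turf.booleanDisjoint(child, parent);

  switch (pair) {
    case 'Point,Point':
      return turf.booleanEqual(child, parent) ? 'qb4so:equals' : null;
    case 'Point,LineString':
      return intersects ? 'qb4so:intersects' : null;
    case 'Point,Polygon':
    case 'LineString,Polygon':
    case 'Polygon,Polygon':
      if (turf.booleanWithin(child, parent)) return 'qb4so:within'; // more probable first
      return intersects ? 'qb4so:intersects' : null;
    case 'LineString,LineString':
      if (intersects) return 'qb4so:intersects';
      return turf.booleanOverlap(child, parent) ? 'qb4so:overlaps' : null;
    default:
      return null; // child geometry of higher dimension than the parent: not applicable
  }
}

// Example: a parish polygon (child) related to a drainage-area polygon (parent).
console.log(relateSpatialValues(
  'POLYGON((8.40 56.79, 8.31 56.73, 8.26 56.78, 8.40 56.79))',
  'POLYGON((8.0 56.6, 8.8 56.6, 8.8 57.1, 8.0 57.1, 8.0 56.6))')); // -> 'qb4so:within'
```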

Detecting spatial hierarchy steps

In this section, we present the algorithm for Case 1, given in the beginning of Section 4.1, which finds the explicit spatial hierarchy steps for QB4OLAP levels with skos:broader roll-up relations between their child-parent level members.

Algorithm 3 (detectSpatialHS). The input variables for Algorithm 3 are the instance graphs of attributes of level members G^I_A(lm) and of roll-up relations of the hierarchy steps G^I_RU(hs) between the level members. The RDF graph formulation of the attributes of the level members G^I_A(lm) is: G^I_A(lm) = ⋃_{i=1}^{p} {(id^I(lm) id^S(a_i) v_ai) | lm ⇝ v_ai}. Here, we denote by lm ⇝ v_ai that a level member lm has value v_ai for attribute a_i (e.g., Listing 2, Lines 3-6, Lines 9-13, and Lines 17-22). The RDF graph formulation of the roll-up relations G^I_RU(hs) is: G^I_RU(hs) = ⋃_{i=1}^{k} {(id^I(lm_c) skos:broader id^I(lm_p)) | lm_ci ⊑ lm_pi}. Here, we denote by lm_ci ⊑ lm_pi the partial order between level members, where a child level member lm_ci rolls up to a parent level member lm_pi9 (e.g., Listing 2, Line 7). The output of Algorithm 3 is the instance graph of roll-up relations for the detected spatial hierarchy steps G^I_RU(shs) (e.g., Listing 2, Line 14). In Line 2, the output graph is initialized as an empty set. Next, in Line 3 we create two temporary graphs, G^I_A(lmc) and G^I_A(lmp), as empty sets10, to keep triple patterns separately for attributes of child and parent level members. We also create two temporary sets, V_s(ac) and V_s(ap), for keeping

9 We use subscripts c and p to distinguish values for child and parent level members.
10 Remark: a set of RDF triples is referred to as an RDF graph.


Algorithm 2: relateSpatialValues(v_ac, v_ap) : topoRel_i
  Input:  v_ac, v_ap
  Output: topoRel_i
1 begin
2   topoRel_i = null ;          /* geoType(v_a) returns the geometry type of a given attribute value */
3   switch (geoType(v_ac), geoType(v_ap)) do
4     case (POINT, POINT) do
5       if equals?(v_ac, v_ap) then
6         topoRel_i = qb4so:equals
7     case (POINT, LINE) do
8       if intersects?(v_ac, v_ap) then
9         topoRel_i = qb4so:intersects
10    case (POINT, POLYGON) do
11      if within?(v_ac, v_ap) then
12        topoRel_i = qb4so:within
13      else if intersects?(v_ac, v_ap) then
14        topoRel_i = qb4so:intersects
15    case (LINE, LINE) do
16      if intersects?(v_ac, v_ap) then
17        topoRel_i = qb4so:intersects
18      else if overlaps?(v_ac, v_ap) then
19        topoRel_i = qb4so:overlaps
20    case (LINE, POLYGON) do
21      if within?(v_ac, v_ap) then
22        topoRel_i = qb4so:within
23      else if intersects?(v_ac, v_ap) then
24        topoRel_i = qb4so:intersects
25    case (POLYGON, POLYGON) do
26      if within?(v_ac, v_ap) then
27        topoRel_i = qb4so:within
28      else if intersects?(v_ac, v_ap) then
29        topoRel_i = qb4so:intersects
30  return topoRel_i


Algorithm 3: detectSpatialHS(G^I_RU(hs), G^I_A(lm)) : G^I_RU(shs)
  Input:  G^I_A(lm), G^I_RU(hs)
  Output: G^I_RU(shs)
1 begin
2   G^I_RU(shs) = ∅ ;                               /* initialize output graph as empty set */
3   G^I_A(lmc) = ∅ ; G^I_A(lmp) = ∅ ; V_s(ac) = ∅ ; V_s(ap) = ∅ ; topoRel_i = null ;
                                                    /* temporary variable and sets */
4   foreach ((id^I(lm_c) id^S(a_c) v_ac), (id^I(lm_p) id^S(a_p) v_ap)) |
        (id^I(lm_c) id^S(a_c) v_ac), (id^I(lm_p) id^S(a_p) v_ap) ∈ G^I_A(lm)
        ∧ (id^I(lm_c) skos:broader id^I(lm_p)) ∈ G^I_RU(hs)
        ∧ lm_c ⇝ v_ac ∧ lm_p ⇝ v_ap ∧ lm_c ⊑ lm_p do
5     G^I_A(lmc) = {(id^I(lm_c) id^S(a_c) v_ac)} ;
6     V_s(ac) = getSpatialValues(G^I_A(lmc)) ;
7     if V_s(ac) ≠ ∅ then
8       G^I_A(lmp) = {(id^I(lm_p) id^S(a_p) v_ap)} ;
9       V_s(ap) = getSpatialValues(G^I_A(lmp)) ;
10      if V_s(ap) ≠ ∅ then
11        foreach (v_ac, v_ap) ∈ V_s(ac) × V_s(ap) do
12          topoRel_i = relateSpatialValues(v_ac, v_ap) ;
13          if topoRel_i ≠ null then
14            G^I_RU(shs) ∪= {(id^I(lm_c) topoRel_i id^I(lm_p))} ;
15  return G^I_RU(shs)

the spatial attribute values from the child and parent level members; they are also initialized as empty sets in Line 3. A set of spatial attribute values is defined over spatial literals L_s as V_s(a) = {v_a1, ..., v_ai, ..., v_an | 1 ≤ i ≤ n ∧ v_ai ∈ L_s}. In the foreach loop in Line 4, we go through the elements of the input graphs G^I_A(lm) and G^I_RU(hs) that fulfill a specific criterion, namely having an explicit skos:broader relation between child and parent level members. In Line 5, while iterating through the foreach loop, we assign the set of triples of child level members and their attributes to the temporary graph G^I_A(lmc). This temporary graph is given in Line 6 as input to the helper function getSpatialValues (Algorithm 1), which finds the spatial attribute values in the given graph and returns a set of spatial attribute values (i.e.,


V_s(ac)) found in the input graph. The output of the helper function (V_s(ac)) keeps the spatial attribute values of the child level member id^I(lm_c).

Next, in Line 7, if V_s(ac) is not empty and thus holds some spatial values of id^I(lm_c), we populate the next temporary graph G^I_A(lmp) with the parent level member id^I(lm_p) and its attributes in Line 8. Similar to Line 6, Line 9 calls the helper function getSpatialValues with the input graph G^I_A(lmp), and the output of the function is assigned to the temporary set V_s(ap). If this set is also not empty (Line 10), we go through the pairs of values (v_ac, v_ap) of the child-parent level members (Line 11), which are selected from the temporary graphs G^I_A(lmc) and G^I_A(lmp). In this loop, we call the next helper function, relateSpatialValues (Algorithm 2), where the input is the pair of spatial values. The output of this function is the topological relation between the corresponding child and parent level members, and it is assigned to the initially created temporary variable topoRel_i (Line 12). If this value is not null (checked in Line 13), relateSpatialValues has returned a topological relation (Line 12) that is satisfied, as shown with a check mark (X) in Table F.1. Finally, the output graph for spatial hierarchy steps G^I_RU(shs) is incrementally generated by adding the triple pattern with the topological relation (Line 14), and the output graph for the detected spatial hierarchy steps is returned (Line 15).
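A simplified JavaScript sketch of this detection step is shown below. It reuses the getSpatialValues and relateSpatialValues sketches given earlier, assumes the QB4OLAP instance triples are loaded into an N3.Store, and writes the discovered relations under the qb4so: namespace (the property IRIs are assumptions); it is not the module's actual code.

```javascript
// Sketch of Algorithm 3 (detectSpatialHS): follow each explicit skos:broader
// link between level members, compare their spatial attribute values, and
// record the discovered topological relation as a new triple.
const N3 = require('n3');
const { namedNode, quad } = N3.DataFactory;

const SKOS_BROADER = namedNode('http://www.w3.org/2004/02/skos/core#broader');
const QB4SO = 'https://w3id.org/qb4solap#'; // persistent URL of QB4SOLAP (see footnote 4)

function detectSpatialHS(store) {
  const enrichment = [];
  for (const ru of store.getQuads(null, SKOS_BROADER, null, null)) {
    // Attribute triples of the child and parent level members.
    const childValues = getSpatialValues(store.getQuads(ru.subject, null, null, null));
    if (childValues.length === 0) continue; // child member is not spatial
    const parentValues = getSpatialValues(store.getQuads(ru.object, null, null, null));
    for (const vc of childValues) {
      for (const vp of parentValues) {
        const rel = relateSpatialValues(vc, vp); // e.g. 'qb4so:intersects'
        if (rel) {
          enrichment.push(quad(ru.subject, namedNode(QB4SO + rel.split(':')[1]), ru.object));
        }
      }
    }
  }
  return enrichment; // e.g. (gfsi:parish_8648, qb4so:intersects, gfsi:water_159)
}
```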

Discovering spatial hierarchy steps

In this section, we present the algorithm for Case 2, given in the beginning of Section 4.1, which finds the implicit spatial hierarchy steps from QB4OLAP levels that do not have direct (skos:broader) roll-up relations. In this algorithm we have to handle situations where there are no explicit hierarchy steps between the level members. Therefore, we benefit from the schema graphs of dimensions, hierarchies, and levels for iterating through the RDF triples, and we compare the spatial attribute values of the level members to find the topological relations within the same dimension.

Algorithm 4 (discoverSpatialHS). The input variables for Algorithm 4 are the schema graphs of dimensions G^S_D, hierarchies of the dimensions G^S_H(d), and levels of the hierarchies G^S_L(h), and the instance graphs of level members of levels G^I_LM(l) and attributes of level members G^I_A(lm). Each dimension d ∈ D has a set of hierarchies H(d), which is shown in the RDF graph formulation for a dimension d ∈ D as: G^S_d = ⋃_{h∈H(d)} {(id^S(d) qb4o:hasHierarchy id^S(h))}. Each hierarchy h ∈ H(d) belongs to a dimension d and has a set of levels L(h), which is shown in the RDF graph formulation for a hierarchy h ∈ H(d)


Algorithm 4: discoverSpatialHS(G^S_D, G^S_H(d), G^S_L(h), G^I_LM(l), G^I_A(lm)) : G^I_RU(shs)
  Input:  G^S_D, G^S_H(d), G^S_L(h), G^I_LM(l), G^I_A(lm)
  Output: G^I_RU(shs)
1 begin
2   G^I_RU(shs) = ∅ ; topoRel_i = null ;   /* initialize the output graph as an empty set and a temporary variable as null */
3   V_s(an) = ∅ ; V_s(ak) = ∅ ;            /* initialize temporary sets for keeping spatial attribute values */
4   G^I_A(lmn) = ∅ ; G^I_A(lmk) = ∅ ;      /* initialize empty sets to keep triple patterns for attributes of level members */
5   foreach (id^S(d) qb4o:hasHierarchy id^S(h)) ∈ G^S_D do        /* iterate through the dimensions */
6     foreach (id^S(h) qb4o:inDimension id^S(d)) ∈ G^S_H(d) do    /* iterate through the hierarchies */
7       foreach (id^S(h) qb4o:hasLevel id^S(l)) ∈ G^S_H(d) do     /* iterate through the levels in the hierarchy; get level pairs next */
8         foreach (id^S(l_i), id^S(l_j)) ∈ G^S_L(h) × G^S_L(h) | id^S(l_i) ≠ id^S(l_j) ∧
9             ((id^I(lm) qb4o:memberOf id^S(l_i)), (id^I(lm) qb4o:memberOf id^S(l_j))) ∈ G^I_LM(l) for lm ∈ LM(l) do
              /* in each level pair, get pairs of level members (id^I(lm_n), id^I(lm_k)), where each level member comes from a different level */
10          foreach (id^I(lm_n), id^I(lm_k)) ∈ G^I_LM(l) × G^I_LM(l) | id^I(lm_n) ≠ id^I(lm_k) ∧
                id^I(lm_n) ∈ G^I_LM(li) ⟹ id^I(lm_k) ∈ G^I_LM(lj) | G^I_LM(li) ⊂ G^I_LM(l) ∧ G^I_LM(lj) ⊂ G^I_LM(l) ∧ G^I_LM(li) ≠ G^I_LM(lj) do
                /* iterate through the pairs of level members */
11            foreach ((id^I(lm_n) id^S(a_i) v_ai), (id^I(lm_k) id^S(a_j) v_aj)) ∈ G^I_A(lm) × G^I_A(lm) do
                /* iterate through the pairs of level members' attributes */
12              G^I_A(lmn) = {(id^I(lm_n) id^S(a_i) v_ai)} ; G^I_A(lmk) = {(id^I(lm_k) id^S(a_j) v_aj)} ;
13              V_s(an) = getSpatialValues(G^I_A(lmn)) ; V_s(ak) = getSpatialValues(G^I_A(lmk)) ;
14              if V_s(an) ≠ ∅ ∧ V_s(ak) ≠ ∅ then      /* make sure there are spatial values in the temporary sets */
15                foreach (v_ai, v_aj) ∈ V_s(an) × V_s(ak) do
16                  topoRel_i = relateSpatialValues(v_ai, v_aj) ;
17                  if topoRel_i ≠ null then            /* make sure a topological relation was assigned */
18                    G^I_RU(shs) ∪= {(id^I(lm_n) topoRel_i id^I(lm_k))} ;
19  return G^I_RU(shs)


as: G^S_h = {(id^S(h) qb4o:inDimension id^S(d))} ∪ ⋃_{l∈L(h)} {(id^S(h) qb4o:hasLevel id^S(l))}. Each level l has a set of level members LM(l) = {lm_1, ..., lm_y}, which is shown in the RDF graph formulation for a level member lm ∈ LM(l) as: G^I_lm = {(id^I(lm) qb4o:memberOf id^S(l))}. Each level member lm has a set of attributes A(lm). The RDF graph formulation of attributes of level members G^I_A(lm) was already given in the subsection "Detecting spatial hierarchy steps". In Listing 2, examples of triple patterns for level members and attributes of level members are given in Lines 1-6, Lines 8-13, and Lines 15-22, without explicit roll-up relations (Line 7). The output of Algorithm 4 is the instance graph of roll-up relations for the discovered spatial hierarchy steps G^I_RU(shs) (e.g., Listing 2, Line 14). In Line 2, the output graph is initialized as an empty set, and a temporary variable (topoRel_i) for keeping the discovered topological relations is initialized as null. In Line 4, we create two temporary graphs, G^I_A(lmn) and G^I_A(lmk), as empty sets, similar to Algorithm 3. We also create two temporary sets, V_s(an) and V_s(ak), for storing spatial attribute values and initialize them as empty sets in Line 3. In order to discover the spatial hierarchy steps, we need to get the attributes of all the level members from the instance graph (G^I_A(lm)) and compare their spatial attribute values in pairs, where the pairs of level member attributes should come from two different levels in the same dimension hierarchy. Therefore, before getting the attributes of the level members, we need to classify the level members as they are grouped in different levels of a dimension hierarchy. To achieve this, we use the schema definitions readily available in QB4OLAP by looping, in Algorithm 4, through nested loops of dimensions (Line 5), hierarchies in the dimension (Line 6), and levels in the hierarchy (Line 7). This allows us to determine the levels in a dimension hierarchy, where we can get level pairs from the same hierarchy (Line 8). While looping through the level pairs, we can identify the level members via the qb4o:memberOf property (Line 9). We get a pair of level members, where each level member comes from a different level, and iterate through these pairs of level members (Line 10). Then, we get the triple patterns for the attributes of the level members in each pair and iterate through those pairs of triple patterns (Line 11). While iterating through the triple patterns, we insert them into the temporary graphs G^I_A(lmn) and G^I_A(lmk) (Line 12), which were created earlier as empty sets in Line 4, so that we can filter the spatial values from the triple patterns kept in the temporary graphs by calling the helper function getSpatialValues (Algorithm 1) with those input graphs G^I_A(lmn) and G^I_A(lmk) (Line 13).


Next, we call the helper function getSpatialValues (Algorithm 1) twice, with the input graphs G^I_A(lmn) and G^I_A(lmk). The output of each (helper) function call is assigned to the temporary sets V_s(an) and V_s(ak), respectively (Line 13). If these sets are not empty (Line 14), it means that getSpatialValues identified spatial values in the triple patterns of the input graphs. We then iterate through the spatial value pairs retrieved from the two sets (Line 15). In this loop, we call the next helper function, relateSpatialValues (Algorithm 2), where the input is the pair of spatial values. The output of this function is the topological relation between the corresponding level members, and it is assigned to the initially created temporary variable topoRel_i (Line 16). Finally, if this topoRel_i value is not null (Line 17), the output graph for the spatial hierarchy steps G^I_RU(shs) is incrementally generated by adding the triple pattern with the topological relation (Line 18), and the output graph for the discovered spatial hierarchy steps is returned in Line 19.
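Analogously, the schema-driven pairing of Algorithm 4 could be sketched in JavaScript as follows, again reusing the earlier helper sketches; the qb4o: namespace IRI is written out as commonly published for QB4OLAP and, like the qb4so: IRI, should be treated as an assumption.

```javascript
// Sketch of Algorithm 4 (discoverSpatialHS): when no skos:broader links exist,
// pair the level members of two different levels of the same hierarchy (via
// qb4o:hasLevel and qb4o:memberOf) and compare their spatial attribute values.
const N3 = require('n3');
const { namedNode, quad } = N3.DataFactory;

const QB4O  = 'http://purl.org/qb4olap/cubes#'; // assumed QB4OLAP namespace
const QB4SO = 'https://w3id.org/qb4solap#';     // QB4SOLAP persistent URL

function discoverSpatialHS(store) {
  const enrichment = [];
  const hasLevel = namedNode(QB4O + 'hasLevel');
  const memberOf = namedNode(QB4O + 'memberOf');

  for (const hierarchy of store.getSubjects(hasLevel, null, null)) {
    const levels = store.getObjects(hierarchy, hasLevel, null);
    for (const li of levels) {
      for (const lj of levels) {
        if (li.equals(lj)) continue; // only pairs of two different levels
        for (const lmN of store.getSubjects(memberOf, li, null)) {
          const vN = getSpatialValues(store.getQuads(lmN, null, null, null));
          if (vN.length === 0) continue;
          for (const lmK of store.getSubjects(memberOf, lj, null)) {
            const vK = getSpatialValues(store.getQuads(lmK, null, null, null));
            for (const a of vN) {
              for (const b of vK) {
                const rel = relateSpatialValues(a, b); // e.g. 'qb4so:within'
                if (rel) enrichment.push(quad(lmN, namedNode(QB4SO + rel.split(':')[1]), lmK));
              }
            }
          }
        }
      }
    }
  }
  return enrichment;
}
```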

4.2 Factual enrichment phase

The factual enrichment phase is built around the observation facts and their spatial attributes, a.k.a. spatial measures, and the fact-dimension relations (Section 2.1). In a QB4OLAP data structure schema, facts are linked to the dimensions at the lowest granularity level, which is the base level of the dimensions. For example, the GeoFarmHerdState cube has two spatial base levels linked to the cube: the Parish level and the Farm level. The GeoFarmHerdState cube also has a spatial measure listed in the cube: FarmLocation (Figure F.3). In QB4OLAP, a fact schema defines the structure of a cube with the qb:DataStructureDefinition property (Listing 3, Line 1). Base levels (Lines 2 and 4) and measures (Line 6) are given as qb:components of the fact (Listing 3). The cardinality relationship between a base level and the fact can also be represented with qb4o:cardinality in QB4OLAP, as given in Lines 2 and 4 in Listing 3. With QB4SOLAP, on the other hand, we can also represent fact-level topological relations, which are similar to the topological relations between the child-parent levels at the hierarchy steps. Fact-level topological relations are given in the spatial fact schema in blue in Lines 3 and 5 (Listing 3). QB4SOLAP also extends the (cube) schema with spatial aggregate functions, which are defined over spatial measures, as highlighted in blue (Listing 3, Line 7).

## Spatial Fact Schema in QB4SOLAP ##
1 gfs:GeoFarmHerdState a qb:DataStructureDefinition ;
  # Lowest spatial level for each dimension in the cube #
2   qb:component [qb4o:level gfs:farm ; qb4o:cardinality qb4o:ManyToOne ;
3     qb4so:topologicalRelation qb4so:Equals] ;
4   qb:component [qb4o:level gfs:parish ; qb4o:cardinality qb4o:ManyToOne ;
5     qb4so:topologicalRelation qb4so:Within] ;
  # Example of a spatial measure in the cube #
6   qb:component [qb:measure gfs:farmLocation ;
7     qb4o:aggregateFunction qb4so:ConvexHull] .

Listing F.3: GeoFarmHerdState fact schema definition in QB4SOLAP

An example of an observation fact (fact member) at the instance level is given in Listing 4. A fact member is a qb:Observation (Line 1), which is related to the base levels (Line 2) with respect to the data structure definition (DSD) of the fact schema, and it has a set of measures (Lines 3 and 4), where some measures (Line 4) might have spatial values (Listing 4). In order to define a QB4SOLAP fact schema, we first need to enrich the fact members by annotating them with topological relations, as highlighted in blue in Line 5. We can derive topological relations between the fact members and the (base) level members by comparing the spatial measures of the fact members and the spatial attributes of the (base) level members with Boolean spatial predicates. The links between the fact members and base level members are already given explicitly in Line 2 (Listing 4). However, these links are simple references between the fact and base level members, which do not describe the nature of the topological relation. By applying Boolean spatial predicates on fact and level members, we can find the exact topological relations, i.e., whether a fact member intersects with the level member or whether a fact member is within the level member. We explain how to detect these explicit fact-level (topological) relations in the subsection "Detecting explicit fact-level relations" below. Moreover, there might also be some missing links between the (observation) fact members and the corresponding base level members. For this case, we need to find all the base level members that are spatial and derive the links between the spatial measure values and the spatial attribute values (of the base level members) by using Boolean spatial predicates. We explain how to discover fact-level (topological) relations that are not explicitly linked between observation facts and base level members in a later subsection. There are also cases where we would like to establish a direct (topological) relation between the fact members and higher-granularity (parent) level members, which are not at the base level of the dimension. We explained with the example depicted in Figure F.4 that wrongly aggregating the measures (i.e., double counting) becomes a problem when we roll up between levels that have many-to-many (N:M) cardinality relations (as for the Parish and DrainageArea levels). Therefore, it is necessary to drill down to the lowest granularity (fact members) and find the direct relation between the observation fact members and the corresponding level members of the higher level in many-to-many cardinality relations. In order to prevent this problem, we address the issue in our algorithm for discovering and annotating fact-level (topological) relations between observation fact members and level members of a higher level in


an N:M cardinality relation, which we present in a later subsection. For example, such a relation is given in green in Line 6 (Listing 4), which shows a topological relation between an observation fact member (a farm state) and a higher-level member (a drainage area) that is not a base level member.

##GeoFarmHerdState cube: observation fact example##

1 gfsi:farmState_103850_12_2015 a qb:Observation ;
2 gfs:farm gfsi:farm_103850 ; gfs:parish gfsi:parish_8648 ;
3 gfs:livestockUnit "4.2699999999999996"^^xsd:double ;
4 gfs:farmLocation "POINT (8.31941 56.75822)"^^geo:spatialLiteral ;
5 qb4so:equals gfsi:farm_103850 ; qb4so:within gfsi:parish_8648 ;
6 qb4so:within gfsi:water_3770 .

Listing F.4: GeoFarmHerdState fact member, with base levels and measures

Finally, in Section 21 we explain how to define a data structure definition (DSD) of a spatial fact schema using a QB4OLAP fact schema and the spatial fact member instances derived in the previous two algorithms.
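As an illustration of such a Boolean spatial predicate, the following minimal snippet tests whether the point measure of the fact member in Listing F.4 lies within a parish polygon, using the Turf.js predicate that our implementation employs later (Section 5); the parish coordinates here are hypothetical and only serve the example.

const turf = require("@turf/turf");

// Point measure of the fact member in Listing F.4 (Line 4).
const farmLocation = turf.point([8.31941, 56.75822]);

// A hypothetical parish polygon (illustrative coordinates only).
const parishPolygon = turf.polygon([[
  [8.30, 56.75], [8.34, 56.75], [8.34, 56.77], [8.30, 56.77], [8.30, 56.75]]]);

if (turf.booleanPointInPolygon(farmLocation, parishPolygon)) {
  // The corresponding annotation would be the one in Listing F.4, Line 5:
  // gfsi:farmState_103850_12_2015 qb4so:within gfsi:parish_8648 .
}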

Detecting explicit fact-level relations

In this section we present an algorithm for detecting explicit fact-level topological relations between observation fact members and base level members, where there is a direct reference between the fact member and the base level member. In order to derive these topological relations, we need to get the spatial attributes of fact members (spatial measures) and base level members.

Algorithm 5 (detectFactLevelRelations). The input variables for Algorithm 5 are the instance graphs of fact members G^I_FM(F), level members G^I_LM(l), and attributes of level members G^I_A(lm). Every fact member fi ∈ FM has an IRI id^I(fi) and is defined as a qb:Observation. The RDF graph formulation of a fact member fi is:

G^I_fi = ⋃_{lj ∈ L(fi)} {(id^I(fi) id^S(lj) id^I(lmj)) | fi ↝ lmj} ∪ ⋃_{mk ∈ M(fi)} {(id^I(fi) id^S(mk) vmk) | fi ↝ vmk}.

Here, we denote by fi ↝ lmj that a fact member fi has an explicit link to a level member lmj (e.g., Listing F.4, Line 2). Remark that we denote by lm ↝ vai that a level member lm has value vai for attribute ai (Section 30), which is used in Algorithm 5, Line 12, in order to get the attribute values of the linked level members. Moreover, we denote by fi ↝ vmk that a fact member fi has value vmk for measure mk (e.g., Listing F.4, Lines 3 and 4). The RDF graph formulations of the other input variables, the attributes of level members G^I_A(lm) and the level members G^I_LM(l), are already given, respectively, in Sections 30 and 19.
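For instance, for the fact member in Listing F.4, G^I_fi contains the level-link triples (gfsi:farmState_103850_12_2015 gfs:farm gfsi:farm_103850) and (gfsi:farmState_103850_12_2015 gfs:parish gfsi:parish_8648), and the measure-value triples (gfsi:farmState_103850_12_2015 gfs:livestockUnit "4.2699999999999996") and (gfsi:farmState_103850_12_2015 gfs:farmLocation "POINT (8.31941 56.75822)").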


Algorithm 5: detectFactLevelRelations(G^I_FM(F), G^I_A(lm)) : G^I_FM(Fs)
Input: G^I_FM(F), G^I_A(lm)
Output: G^I_FM(Fs)
 1 begin
 2   G^I_FM(Fs) = G^I_FM(F); topoRel_i = null; G^I_A(fi mk) = ∅;
 3   G^I_A(lmj) = ∅; V_s(mk) = ∅; V_s(ai) = ∅;
       /* initialize the output graph, temporary variable and sets */
 4   foreach  /* get each observation fact (fact member) */
 5     (id^I(fi) rdf:type qb:Observation) ∈ G^I_FM(F) do
 6     foreach  /* get measure-level member pairs */
 7       ((id^I(fi) id^S(mk) vmk), (id^I(fi) id^S(lj) id^I(lmj)))
 8         ∈ G^I_FM(F) × G^I_FM(F) | fi ↝ vmk ∧ lmj ↝ vai ∧
 9         (id^I(lmj) id^S(ai) vai) ∈ G^I_A(lm)
           /* get measure and attribute values of level members */ do
10       G^I_A(fi mk) = {(id^I(fi) id^S(mk) vmk)};
11       V_s(mk) = getSpatialValues(G^I_A(fi mk));
12       if V_s(mk) ≠ ∅ then
13         G^I_A(lmj) = {(id^I(lmj) id^S(ai) vai)};
14         V_s(ai) = getSpatialValues(G^I_A(lmj));
15         if V_s(ai) ≠ ∅ then
16           foreach (vmk, vai) ∈ V_s(mk) × V_s(ai)  /* foreach spatial value pair */ do
17             topoRel_i = relateSpatialValues(vmk, vai);
18             if topoRel_i ≠ null then
19               G^I_FM(Fs) ∪= {(id^I(fi) topoRel_i id^I(lmj))};
20   return G^I_FM(Fs)


The output of Algorithm 5 is the enriched instance graph of fact members with topological relations G^I_FM(Fs). In Line 2, we initialize the output graph as the input graph of fact members (without topological relations) so that we can gradually enrich it with the detected topological relations (Line 22). Initially, the topological relation variable topoRel_i is set to null. We also create two temporary graphs, G^I_A(lmj) and G^I_A(fi mk), as empty sets to keep triple patterns separately in two graphs for attributes of level members and (measures of) fact members. We also create two temporary sets, V_s(mk) and V_s(ai), for keeping the spatial values from the fact and level members, and initialize them also as empty sets in Line 3. In the first foreach loop (Lines 4 and 5) we retrieve the observation fact members from the input graph of fact members, which corresponds to Line 1 in Listing F.4. Getting the fact members allows us to access each of their measures in Line 6 and level members in Line 7 (Algorithm 5). In the next foreach loop (Line 9) we match each measure-level member pair, where we can already retrieve the measure values from the input graph of fact members G^I_FM(F) (Line 10) and, through the input graph for attributes of the level members G^I_A(lm) (Lines 11 and 12), we can retrieve the attribute values. In Line 13, we assign the set of triples for measure attributes of fact members to a temporary graph G^I_A(fi mk) created earlier in Line 2. This temporary graph is given as an input to the helper function getSpatialValues (Algorithm 1) in Line 14 (Algorithm 5). The helper function returns the spatial attribute (measure) values of the fact members, which are kept in the temporary set V_s(mk). If this set is not empty (checked in Line 15) and has some spatial measures of fact member id^I(fi), we repeat the same procedure for retrieving the spatial attribute values of level member id^I(lmj) in Lines 16 and 17. If the output set for spatial attribute values V_s(ai) is also not empty (Line 18), then we go through the pairs of spatial values (vmk, vai) in Line 19. In this loop, we call the next helper function relateSpatialValues (Algorithm 2), where the input is a spatial value pair. The output value of this function is the topological relation between the corresponding fact and level members, which is assigned to the variable topoRel_i (Line 20).

Discovering implicit fact-level relations

In this section, we present an algorithm for discovering fact-level (topological) relations, where there are no direct links between the fact and level members. In this algorithm we have to handle the following situations: 1) Finding the topological relations between observation facts and base level members; 2) Finding the topological relations between observation facts and parent level members in an N:M cardinality relation. In both cases there are no direct links between the observation facts and level members. Therefore, we benefit

from the (QB4OLAP) schema graphs of dimensions, hierarchies, and levels for iterating through the RDF triples to distinguish the base level members, and to find the parent level members when there is an N:M cardinality relation between the levels of a hierarchy at a hierarchy step.

Algorithm 6 (discoverFactLevelRelations). The input variables at the schema level for Algorithm 6 are the schema graphs of dimensions G^S_D, hierarchies of the dimensions G^S_H(d), levels of the hierarchies G^S_L(h), and hierarchy steps of the hierarchies G^S_HS(h). The RDF graph formulations of the schema level input variables (dimensions G^S_D, hierarchies G^S_H(d), and levels G^S_L(h)) are already given in Section 19. Therefore, we only explain the structure of a hierarchy step in the schema graph. Each hierarchy step hsi is defined in the schema graph G^S_HS(h) as a blank node _:hsi ∈ B. Each hierarchy step is linked to a hierarchy id^S(h) with the qb4o:inHierarchy predicate and has a child level id^S(lc), a parent level id^S(lp), and a cardinality relation id^S(card), which are given, respectively, with the qb4o:childLevel, qb4o:parentLevel, and qb4o:pcCardinality predicates in Line 6.

The input variables at the instance level are the instance graphs of fact members G^I_FM(F), level members of levels G^I_LM(l), and attributes of level members G^I_A(lm). We have already explained the RDF graph formulations of the instance level input variables (fact members G^I_FM(F), level members G^I_LM(l), and attributes of level members G^I_A(lm)) in Section 6.

The output of Algorithm 6 is the enriched instance graph of fact members with the topological relations G^I_FM(Fs). In Line 2, we initialize the output graph as the input graph of fact members (without topological relations) so that we can gradually enrich it with the detected topological relations (Line 22). Initially, the topological relation variable topoRel_i is set to null. We also create two temporary graphs, G^I_A(lmj) and G^I_A(fi mk), as empty sets to keep triple patterns separately in two graphs for attributes of level members and (measures of) fact members. We also create two temporary sets, V_s(mk) and V_s(ai), for keeping the spatial values from the fact and level members, and initialize them also as empty sets in Line 3.

In order to find the topological relations between observation facts (with spatial measures) and base level members (with spatial attributes), first, we need to find all the base levels, since there is no direct link between the fact and level members. To achieve this in Algorithm 6, we use the schema definitions readily available in QB4OLAP. In Line 4, we iterate through the nested loops of dimensions to get the hierarchies, and in Line 5 we iterate through the nested loops of hierarchies to get the hierarchy levels. In order to find the base level of a hierarchy, we have to iterate through the hierarchy steps, where


Algorithm 6: discoverFactLevelRelations(G^I_FM(F), G^I_LM(l), G^I_A(lm), G^S_D, G^S_H(d), G^S_HS(h)) : G^I_FM(Fs)
Input: G^I_FM(F), G^I_LM(l), G^I_A(lm), G^S_D, G^S_H(d), G^S_HS(h)
Output: G^I_FM(Fs)
 1 begin
 2   G^I_FM(Fs) = G^I_FM(F); topoRel_i = null;
 3   G^I_A(fi mk) = ∅; G^I_A(lmj) = ∅; V_s(mk) = ∅; V_s(ai) = ∅;
 4   foreach (id^S(d) qb4o:hasHierarchy id^S(h)) ∈ G^S_D do
 5     foreach (id^S(h) qb4o:inDimension id^S(d)) ∈ G^S_H(d) do
 6       foreach (id^S(h) qb4o:hasLevel id^S(ln)) ∈ G^S_H(d) do
 7         foreach (_:hsi qb4o:inHierarchy id^S(h)) ∈ G^S_HS(h) |
               (_:hsi qb4o:childLevel id^S(lc)) ∈ G^S_HS(h) ∧
               (_:hsi qb4o:parentLevel id^S(lp)) ∈ G^S_HS(h) ∧
               (_:hsi qb4o:pcCardinality id^S(card)) ∈ G^S_HS(h) do
 8           if (id^S(ln) ≠ id^S(lp)) ∨ (id^S(ln) = id^S(lp) ∧ id^S(card) = qb4o:ManyToMany) then
 9             foreach (id^I(lmj) qb4o:memberOf id^S(ln)) ∈ G^I_LM(l) do
10               foreach ((id^I(lmj) qb4o:memberOf id^S(ln)), (id^I(fi) rdf:type qb:Observation))
11                 ∈ G^I_LM(l) × G^I_FM(F) | ⋃_{mk ∈ M(fi)} (id^I(fi) id^S(mk) vmk) ∈ G^I_FM(F) ∧ ⋃_{ai ∈ A(lm)}
12                 (id^I(lmj) id^S(ai) vai) ∈ G^I_A(lm) do
13                 foreach ((id^I(fi) id^S(mk) vmk), (id^I(lmj) id^S(ai) vai)) ∈ G^I_FM(F) × G^I_A(lm) do
14                   G^I_A(fi mk) = {(id^I(fi) id^S(mk) vmk)}; G^I_A(lmj) = {(id^I(lmj) id^S(ai) vai)};
15                   V_s(mk) = getSpatialValues(G^I_A(fi mk)); V_s(ai) = getSpatialValues(G^I_A(lmj));
16                   if V_s(mk) ≠ ∅ ∧ V_s(ai) ≠ ∅ then
17                     foreach (vmk, vai) ∈ V_s(mk) × V_s(ai) do
18                       topoRel_i = relateSpatialValues(vmk, vai);
19                       if topoRel_i ≠ null then
20                         G^I_FM(Fs) ∪= {(id^I(fi) topoRel_i id^I(lmj))};
21   return G^I_FM(Fs)

each hierarchy step describes a child level, a parent level, and a cardinality relation between the levels (Line 6). If a level id^S(ln) has never been assigned as a parent level with the qb4o:parentLevel predicate in any of the hierarchy steps in a hierarchy h from the schema graph G^S_HS(h), then ln is the base level of the hierarchy h (Line 7). Thus, we can retrieve the level members of level ln from the instance graph of level members G^I_LM(l) (Line 8). In the next foreach loop we can pair the level members from the instance graph G^I_LM(l) and the observation facts from the instance graph of fact members G^I_FM(F) (Line 9). We can retrieve a set of attributes (measures) for fact members from the fact members graph (Line 10), and a set of attributes for level members from the instance graph G^I_A(lm) (Line 11). Then, in the next foreach loop in Line 12, we get the triple patterns with each measure value of the fact member and attribute value of the level member in pairs. While iterating through the (pairs of) triple patterns, we insert each member of the pair into the temporary graphs for measures of fact members G^I_A(fi mk) and attributes of level members G^I_A(lmj) (Line 13), which were created earlier as empty sets in Line 3. Then, we can filter the spatial values from the triple patterns kept in the temporary graphs by calling the helper function getSpatialValues (Algorithm 1) with those input graphs G^I_A(fi mk) and G^I_A(lmj) (Line 14). We call the helper function getSpatialValues (Algorithm 1) twice, with the input graphs G^I_A(fi mk) and G^I_A(lmj), where the outputs of each (helper) function call are assigned to the temporary sets V_s(mk) and V_s(ai), correspondingly (Line 14). If these sets are not empty (Line 15), it means that getSpatialValues identified spatial values in the triple patterns of the input graphs. Then, we iterate through the spatial value pairs retrieved from each of the sets (Line 16). In this loop, we call the next helper function relateSpatialValues (Algorithm 2), where the input is a spatial value pair. The output value of this function is the topological relation between the corresponding fact and level members, and it is assigned to the initially created temporary variable topoRel_i (Line 17). If this topoRel_i value is not null (Line 18), the output graph for the spatial fact members is incrementally enriched by adding the triple pattern with the topological relation (Line 19).

In order to find the topological relations between the observation facts and parent level members in an N:M cardinality relation, we check in Line 20 whether level id^S(ln) is assigned as a parent level in a hierarchy step with the qb4o:parentLevel predicate and the hierarchy step entails an N:M relation with the qb4o:ManyToMany predicate. If that is the case, we repeat the same steps from Lines 8 to 19. Finally, the output graph for the spatial fact members with discovered fact-level (topological) relations is returned in Line 22.
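The base level check described above can be illustrated with a small sketch (an illustration only; the function name and the binding structure are assumptions, not the paper's actual code): a level that never appears as the object of qb4o:parentLevel in any hierarchy step of a hierarchy is a base level of that hierarchy.

const findBaseLevels = hierarchyStepBindings => {
  // hierarchyStepBindings: one entry per hierarchy step, with the objects of
  // qb4o:childLevel and qb4o:parentLevel (assumed SPARQL JSON binding shape).
  const childLevels = new Set();
  const parentLevels = new Set();
  hierarchyStepBindings.forEach(hs => {
    childLevels.add(hs.childLevel.value);
    parentLevels.add(hs.parentLevel.value);
  });
  return [...childLevels].filter(level => !parentLevels.has(level));
};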

Defining spatial fact DSD

In this section, we present an algorithm for re-defining the fact schema data structure definition (DSD) by enriching the DSD with fact-level topological relations. An example of a fact schema in QB4OLAP is given in the black-colored lines of Listing F.3 (for now, please ignore Lines 3, 5, and 7). We re-define the spatial fact schema in QB4SOLAP (Listing F.3, Lines 1-7) by using the enriched fact members that are generated via Algorithms 5 and 6.

Algorithm 7 (defineSpatialFactDSD). The input variables for Algorithm 7 are the instance graph of spatial fact members G^I_FM(Fs) and the schema graph of the QB4OLAP fact schema G^S_F. Spatial fact members in the instance graph G^I_FM(Fs) must be annotated with QB4SOLAP or can be generated by using Algorithms 5 and 6 from QB4OLAP fact members. A QB4OLAP fact schema G^S_F has the (base) levels and measures of the cube as qb:components and defines the fact-level cardinality relation with the qb4o:cardinality predicate and aggregate functions on (numerical) measures with the qb4o:aggregateFunction predicate11.

The output of Algorithm 7 is the enriched fact schema graph G^S_Fs, annotating the fact-level relations with QB4SOLAP topological relations and the measures with spatial aggregate functions. In Line 2, we initialize the output graph as the input schema graph so that we can gradually enrich it with QB4SOLAP schema annotations (Lines 5 and 15). Initially, an aggregate function variable aggFunc_i is created and set to null (Line 2).

The first foreach loop iterates through the fact members graph G^I_FM(Fs) and finds each fact member fi by using the triple pattern (id^I(fi) rdf:type qb:Observation). The second foreach loop gets every distinct topological relation topoRel_i of the fact member fi (Line 4). Then the output schema is annotated with the identifier of these topological relations (Line 5). Next, we get every measure vmk of the fact member fi (Line 6), and check if it is a spatial measure (Line 7). If it is a spatial measure, we find the geometry type with the geoType function (Line 8). We have appointed the corresponding spatial aggregate functions (Lines 10, 12, and 14) with regard to the geometry type of the spatial measure (Lines 9, 11, and 13). Finally, the output schema G^S_Fs is annotated with the identifier of these spatial aggregate functions (Line 15) and returned (Line 16).

11In QB4OLAP, qb4o:AggregateFunction class has only instances (e.g., qb4o:Avg, qb4o:Sum functions) for numerical measures. QB4SOLAP extends this class with a subclass qb4so:SpatialAggregateFunction, which has instances of spatial aggregate functions (e.g., qb4so:ConvexHull, qb4so:Union) for spatial measures [1,9].


Algorithm 7: defineSpatialFactDSD(G^I_FM(Fs), G^S_F) : G^S_Fs
Input: G^I_FM(Fs), G^S_F
Output: G^S_Fs
 1 begin
 2   G^S_Fs = G^S_F; aggFunc_i = null;  /* initialize the output graph and temporary variable */
 3   foreach (id^I(fi) rdf:type qb:Observation) ∈ G^I_FM(Fs) do
 4     foreach (id^I(fi) topoRel_i id^I(lmj)) ∈ G^I_FM(Fs) | ln ∈ L(fi) ∧ (id^I(fi) id^S(ln) id^I(lmj)) ∈ G^I_FM(Fs)
           /* each topoRel_i in the fact member triples goes into the DSD with its corresponding level ln */ do
 5       G^S_Fs ∪= {(id^S(F) qb:component [qb4o:level id^S(ln), qb4so:topologicalRelation id^S(topoRel_i)])};
 6     foreach vmk ∈ (id^I(fi) id^S(mk) vmk)  /* find the spatial measures from the fact triples */ do
 7       if vmk is a geo:spatialLiteral then
 8         switch (geoType(vmk))  /* geoType(va) returns the geometry type of a given attribute value */ do
 9           case (POINT)  /* point geometry measures are supported to be aggregated with the ConvexHull function */ do
10             aggFunc_i = qb4so:ConvexHull
11           case (LINE)  /* line geometry measures are supported to be aggregated with the Union function */ do
12             aggFunc_i = qb4so:Union
13           case (POLYGON)  /* polygon geometry measures are supported to be aggregated with Union, Centroid, or MBR functions */ do
14             aggFunc_i = qb4so:Union ∨ qb4so:Centroid ∨ qb4so:MBR
15       G^S_Fs ∪= {(id^S(F) qb:component [qb:measure id^S(mk), qb4o:aggregateFunction id^S(aggFunc_i)])};
16   return G^S_Fs


5 Implementation

In this section, we first give the details on how the algorithms from Section 4 are implemented in order to generate spatially enriched RDF triples with QB4SOLAP (Sections 5.1, 5.2, 5.3, and 5.4). Afterwards, we discuss our implementation choices in Section 5.5 and present the results of applying the algorithms on the use case data (GeoFarmHerdState) in Section 6 (Table F.4).

5.1 QB4SOLAP triples generation

In order to implement the algorithms given in Section 4, we have chosen a use case data set, which can be annotated with multi-dimensional concepts in QB4OLAP and has the required spatial properties to be enriched as a fully spatial multidimensional cube with QB4SOLAP. The required spatial properties are: 1) Level members in a (spatial) hierarchy must have spatial attributes, where the geometry of the attributes should be different than only a simple point geometry type (e.g., polygon, line, etc.), so that we can implement the hierarchical enrichment (Section 4.1). 2) The fact members should have spatial measures, so that we can implement the factual enrichment (Section 4.2). Therefore, we have chosen the GeoFarmHerdState use case, which is given as the running example throughout the paper. In Section 2, we give the spatial multi-dimensional concepts of the GeoFarmHerdState data cube and in Section 4, we give the RDF triple snippet examples of those concepts: (a) spatial hierarchy structure with QB4SOLAP (Listing F.1), (b) level members annotated with QB4OLAP and with QB4SOLAP after hierarchical enrichment (Listing F.2), (c) spatial fact schema (Listing F.3), and (d) spatial fact members with spatial measures (Listing F.4). A full overview of the GeoFarmHerdState cube with spatial and non-spatial dimensions can be found in our previous work [14] and on our project website http://extbi.cs.aau.dk/GeoFarmHerdState/.

Note that we use the non-spatial annotation of the GeoFarmHerdState data cube with QB4OLAP as an input to our algorithms, which is publicly available from our SPARQL endpoint12 with corresponding namespaces for schema data triples13 and instance data triples14. We query the endpoint and extract the RDF data in JSON format in order to use it as an input to our implementation of the four main enrichment algorithms: Algorithm 3 (detectSpatialHS), Algorithm 4 (discoverSpatialHS), Algorithm 5 (detectFactLevel), and Algorithm 6 (discoverFactLevel). In the following, we give the implementation highlights of each algorithm and helper function along with the code snippets.

12SPARQL Endpoint: http://lod.cs.aau.dk:8890/sparql
13QB4OLAP schema: http://extbi.cs.aau.dk/geofarm/qb4olap/farm-qb4olap-schema.ttl
14QB4OLAP instances: http://extbi.cs.aau.dk/geofarm/qb4olap/farm-qb4olap-input.tar.gz


5.2 Detecting explicit topological relations

Detecting explicit topological relations is addressed in the following algorithms: Algorithm 3 (detectSpatialHS) and Algorithm 5 (detectFactLevel). In both cases, the source data has explicitly defined roll-up relations, which means there is a direct relation between level members with skos:broader for hierarchy steps (e.g., Listing F.2, Line 7) and there is a direct relation between a fact member and the base level member's foreign key URI (e.g., Listing F.4, Line 2).

The input variables for Algorithm 3 (detectSpatialHS) are the triples with roll-up relations of the hierarchy steps (G^I_RU(hs)) and the attributes of level members (G^I_A(lm)) from the instance data graph. Explicit skos:broader relations are annotated in the instance graph of hierarchy steps (G^I_RU(hs)). Therefore, we query the endpoint by filtering with the explicit skos:broader relations between all the level members. We fetch the results of the query in Node.js in JSON format.

The input variables for Algorithm 5 (detectFactLevel) are the triples with fact members (G^I_FM(F)) and the attributes of level members (G^I_A(lm)) from the instance data graph. Explicit fact-level relations (referring to the foreign key URI of level members) are annotated in the instance graph of fact members (G^I_FM(F)). Therefore, we query the endpoint with all the fact members and the corresponding attributes of level members. We fetch the results of the query in Node.js in JSON format.

Initially, we need to provide the explicit (roll-up) relations between the level members and fact-level members to implement Algorithms 3 and 5 for detecting the (explicit) topological relations. As mentioned in the above two paragraphs, we provide these relations from the data set by querying the endpoint and fetching the results of the query in Node.js in JSON format. The next step is to retrieve, respectively, the spatial attribute and measure values from the attributes of the level members and fact members.
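For completeness, the following is a minimal sketch of how such a query can be issued from Node.js against our SPARQL endpoint; the HTTP client (node-fetch) and the exact query text are illustrative assumptions, while the returned structure (results.bindings) is the standard SPARQL JSON result format consumed by the functions below.

const fetch = require("node-fetch");

const endpoint = "http://lod.cs.aau.dk:8890/sparql";
const query = `
  PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
  SELECT ?s ?o WHERE { ?s skos:broader ?o }`;

const fetchExplicitRelations = async () => {
  const response = await fetch(
    `${endpoint}?query=${encodeURIComponent(query)}`,
    { headers: { Accept: "application/sparql-results+json" } });
  // response.json().results.bindings holds one {s, o} binding per explicit relation
  return response.json();
};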

Retrieving attribute and measure values. In this step, we retrieve the (spatial) attribute values and measure values of those level members and fact members by accessing the object (o) of each triple pattern t = (s, p, o) from the instance graphs of attributes of level members (G^I_A(lm)) and fact members (G^I_FM(F)), as given in Listing F.5, which is followed by passing the getLevelMemberAttributes and getMeasures constants to the getSpatialValues constant15, as explained in the next paragraph (filtering spatial values) and given in Listing F.6.

15We differentiate measure and level attribute values in separate constants, since a measure is annotated as qb:MeasureProperty and a level attribute is annotated as qb4o:LevelAttribute in the schema graph.


1 const getLevelMemberAttributes = val =>
2   val.substring(val.indexOf("(") + 1,
3     val.indexOf(")"));
4 const getMeasures = mval =>
5   mval.substring(mval.indexOf("(") + 1,
6     mval.indexOf(")"));

Listing F.5: Get level member attributes and fact member measures

Filtering spatial values. Before employing spatial analysis functions, we have to filter the spatial attributes of level members and spatial measures of fact members. Spatial values are always an object (o) value in a triple pattern t = (s, p, o), which is defined as a spatial literal Ls (Section 4). Therefore, we have retrieved the attribute and measure values as objects as mentioned in the above paragraph. We have given the helper function Algorithm 1 (getSpatialValues), which is used in the main algorithms. We have implemented this helper function on Node.js by filtering the WKT geometries from the input JSON data, as exemplified in Listing F.6. We create a locationString constant, which takes a string value from getLevelMemberAttributes (Line 2). The string value is the last index location of a triple pattern constructed in getLevelMemberAttributes16.

1 const getSpatialValues = value => {
2   const locationString =
3     getLevelMemberAttributes(value);
4   if (value.startsWith("POLYGON")) {
5     const polygons =
6       generatePolygonPoints(locationString);
7     return turf.polygon([polygons]); }
8   if (value.startsWith("LINE")) {
9     const lines = locationString;
10    return turf.lineString([lines]); }
11  if (value.startsWith("POINT")) {
12    const points = locationString;
13    return turf.point([points]); }
14  return null; };

Listing F.6: Filtering spatial data types

Finding topological relations. Each of the four main enrichment algorithms (Algorithms 3, 4, 5, and 6) returns an instance graph of level members or fact members with topological relations annotated in QB4SOLAP. In order to find these topological relations, we have given a helper function in Algorithm 2 (relateSpatialValues).

16Similarly, we create a second locationString(2) for spatial measure values that takes the string value from getMeasures, which is not repeated in Listing F.6.

This algorithm is implemented by using Boolean functions (spatial predicates) from the Turf.js library for relating spatial values and finding the appropriate topological relations. The library supports the following topological relations with corresponding predicates between certain spatial data types, as given in Table F.2. A complete list of functions and details of the functions can be found online at http://turfjs.org/docs.

Table F.2: Turf.js Spatial Boolean Functions

EQUALS:
  #booleanEqual (equals): between POINT-POINT, LINE-LINE, POLYGON-POLYGON
WITHIN:
  #booleanWithin (within): between LINE-POLYGON, POLYGON-POLYGON
  #booleanPointInPolygon (within): between POINT-POLYGON
INTERSECTS:
  #booleanCrosses (crosses): between LINE-POLYGON
  #booleanOverlap (overlaps): between POLYGON-POLYGON
  #booleanPointOnLine (intersects): between POINT-LINE

We grouped the available Turf.js spatial Boolean functions in Table F.2 under three main topological relations (EQUALS, WITHIN, INTERSECTS), with respect to the simplification rules for grouping topological relations, which are given in Section 22 and explained along with Figure F.8 and Table F.1. In Table F.2, Turf.js built-in functions (predicates) are given with the #boolean prefix. In parentheses, we give how we have named them in our implementation by using the corresponding built-in functions. The following listing (Listing F.7) gives an overview of the implementation of the Boolean functions from Table F.2 that are called in the main function for relating spatial values (relateSpatialValues), given in Listing F.8. We give one example for each of the main topological relations (EQUALS, WITHIN, INTERSECTS). The first spatial Boolean function in Listing F.7 is equals (Lines 1-8), which can be between any pair of the same spatial data type (Table F.2). We have grouped child level spatial (attribute) values and parent level spatial (attribute) values by their unique id (URI) for each spatial level attribute. This allows us to use the Javascript array prototype (instance) methods, e.g., every or some, where we can create our own spatial predicate equals with the condition to satisfy that every (grouped) child level attribute value should be equal to every (grouped) parent level attribute value. This ensures that multi-point, multi-line, and multi-polygon data types can be covered in our implementation.


// equals function

1 const equals = (childLevelSpatialValues,
2   parentLevelSpatialValues) =>
3   childLevelSpatialValues.every(
4     childLevelSpatialValue =>
5       parentLevelSpatialValues.every(
6         parentLevelSpatialValue =>
7           turf.booleanEqual(childLevelSpatialValue,
8             parentLevelSpatialValue)));
// within function (POLYGON-POLYGON)

9 const within = (childLevelSpatialValues,
10   parentLevelSpatialValues) => {
11   const parentLevelMultipolygonBoundingBox
12     = turf.bboxPolygon(
13       turf.bbox(
14         turf.multiPolygon([
15           parentLevelSpatialValues.map(
16             parentLevelSpatialValue =>
17               pathOr([], [0], turf.getCoords(
18                 parentLevelSpatialValue)))])));
// all child level values are within the parent level
// polygon (simplified with bounding box)

19   return childLevelSpatialValues.every(
20     childLevelSpatialValue => {
21       return turf.booleanWithin(
22         childLevelSpatialValue,
23         parentLevelMultipolygonBoundingBox);});};
// crosses function (LINE-POLYGON)

24 const crosses = (childLevelSpatialValues,
25   parentLevelSpatialValues) =>
26   childLevelSpatialValues.some(
27     childLevelSpatialValue =>
28       parentLevelSpatialValues.some(
29         parentLevelSpatialValue =>
30           turf.booleanCrosses(childLevelSpatialValue,
31             parentLevelSpatialValue)));

Listing F.7: Spatial Boolean Functions

For example, in the source data, we had multi-polygons for drainage areas, where each unique drainage area is a multi-polygon composed of several polygons. For simplicity reasons, we did not store the multi-polygon data type in RDF triples. Instead, we have annotated each unique drainage area as several polygons (of the multi-polygon), where each polygon of the drainage area is bound to its drainage area via the unique id (URI) of the drainage area. This means that, in the instance graph of parent level members G^I_A(lmp) (drainage areas), there will be triple patterns t = (s, p, o), where many different polygons (objects o) have the same subject (s), the URI of a unique drainage area, to represent the multi-polygon. In order to handle these multi-polygons, we gather them in a bounding box

by using the turf.bboxPolygon and turf.bbox functions in Listing F.7, Lines 13-14. In Listing F.7, Lines 10-18 depict how several polygons of the same parent level can be put into a bounding box, which is passed as a parameter to our second spatial Boolean function, within. Finally, the function returns in Lines 19-23 with the condition to satisfy that every (grouped) child level attribute value should be within the simplified parent level polygon, parentLevelMultipolygonBoundingBox (Line 23). The third spatial Boolean function in Listing F.7 is crosses (Lines 24-31), where we re-use the Turf.js spatial predicate booleanCrosses. This function is very similar to overlaps in implementation. The only difference is that crosses occurs between LINE-POLYGON, while overlaps occurs between POLYGON-POLYGON. For both cases, the condition to satisfy is that some of the (grouped) child level attribute values should cross/overlap some of the (grouped) parent level attribute values. In the following listing (Listing F.8), we use our own spatial predicates (explained above) in order to implement the helper function Algorithm 2 (relateSpatialValues). Remark that we have followed the simplification rules for grouping topological relations (Figure F.8), aligned with the switch cases for spatial data type pairs from Algorithm 2, in our implementation. In our implementation given in Listing F.8, we create two functions, childLevelGeoType (Line 3) and parentLevelGeoType (Line 5), which return the geometry type of a given attribute value. This way we can implement the switch(geoType(vac), geoType(vap)) cases from Algorithm 2 (relateSpatialValues).

Detecting topological relations. Finally, we have implemented the algorithms for detecting topological relations (Algorithms 3 and 5) with a bottom-up approach after implementing the core helper functions. In the following, we give the function implemented on Node.js for detecting topological relations between level members (Listing F.9), which is covered in Algorithm 3. The same approach, with minor differences (in parameter passing), is used in our implementation for detecting topological relations between fact-level members, which is covered in Algorithm 5.


1 const relateSpatialValues = (childLevelSpatialValues,
2   parentLevelSpatialValues) => {
3   const childLevelGeoType = pathOr(
4     null, [0, "geometry", "type"], childLevelSpatialValues);
5   const parentLevelGeoType = pathOr(
6     null, [0, "geometry", "type"], parentLevelSpatialValues);
7   if (childLevelGeoType === "Point" &&
8     parentLevelGeoType === "Point") {
9     if (equals(childLevelSpatialValues, parentLevelSpatialValues)) {
10      return "qb4so:equals";}
11  } else if (childLevelGeoType === "Point" &&
12    parentLevelGeoType === "LineString") {
13    if (intersects(childLevelSpatialValues, parentLevelSpatialValues)) {
14      return "qb4so:intersects";}
15  } else if (childLevelGeoType === "Point" &&
16    parentLevelGeoType === "Polygon") {
17    if (pointWithin(childLevelSpatialValues, parentLevelSpatialValues)) {
18      return "qb4so:within";}
19  } else if (childLevelGeoType === "LineString"
20    && parentLevelGeoType === "LineString") {
21    if (crosses(childLevelSpatialValues, parentLevelSpatialValues)) {
22      return "qb4so:intersects";}
23    if (overlaps(childLevelSpatialValues, parentLevelSpatialValues)) {
24      return "qb4so:overlaps";}
25  } else if (childLevelGeoType === "LineString"
26    && parentLevelGeoType === "Polygon") {
27    if (within(childLevelSpatialValues, parentLevelSpatialValues)) {
28      return "qb4so:within";}
29    if (crosses(childLevelSpatialValues, parentLevelSpatialValues)) {
30      return "qb4so:overlaps";}
31  } else if (childLevelGeoType === "Polygon"
32    && parentLevelGeoType === "Polygon") {
33    const isWithin = within(
34      childLevelSpatialValues,
35      parentLevelSpatialValues);
36    const isOverlaps = overlaps(
37      childLevelSpatialValues, parentLevelSpatialValues);
38    if (isWithin) {
39      return "qb4so:within";}
40    if (isOverlaps) {
41      return "qb4so:overlaps";}}
42  return null;};

Listing F.8: Relating spatial values

Listing F.9 is constructed with the main function detectSpatialHierarchySteps with the parameters parentLevelMembers, childLevelMembers, and explicitRelations17. In Line 3, the constant spatialHierarchySteps takes

17We do not repeat a similar listing in the paper for detecting topological relations between fact-level members (Algorithm 5), where the parameter childLevelMembers from Listing F.9 corresponds to fact members and parentLevelMembers corresponds to base level members in the implementation of detecting topological relations between fact-level members.

the explicitRelations between child level and parent level members, and creates constants for those in Lines 6 and 7. The next step is to get the spatial values of the level members (child level members in Lines 8-12 and parent level members in Lines 13-17), where we utilize the helper function getSpatialValues, which is described in Listing F.6. In Line 20, we create a constant topoRel, which takes the helper function relateSpatialValues (Listing F.8) with the two parameters childLevelSpatialValues and parentLevelSpatialValues that are created in Lines 8 and 13, respectively. Next, we return the topological relations (topoRel) as predicates (p) between Lines 22-24. If a topological relation is not found, we keep the explicit relation as skos:broader (Line 23). Finally, we return the new results by replacing the explicitRelations with spatialHierarchySteps (Lines 25-29).

1 const detectSpatialHierarchySteps = (
2   parentLevelMembers, childLevelMembers, explicitRelations) => {
3   const spatialHierarchySteps =
4     explicitRelations.results.bindings.map(
5       binding => {
6         const childLevelMemberId = binding.s.value;
7         const parentLevelMemberId = binding.o.value;
8         const childLevelSpatialValues = pathOr([],
9           [childLevelMemberId], childLevelMembers
10          ).map(childLevelMember =>
11            utils.getSpatialValues(
12              childLevelMember.value));
13        const parentLevelSpatialValues = pathOr([],
14          [parentLevelMemberId], parentLevelMembers
15          ).map(parentLevelMember =>
16            utils.getSpatialValues(
17              parentLevelMember.value));
18        const topoRel = utils.relateSpatialValues(
19          childLevelSpatialValues,
20          parentLevelSpatialValues);
21        return {
22          ...binding,
23          p: {type: "uri", value: topoRel || "skos:broader"}};
24      });
25    return {
26      ...explicitRelations,
27      results: { ...explicitRelations.results,
28        bindings: spatialHierarchySteps}
29    };
30  };

Listing F.9: Detecting topological relations (between level members)

We give the implementation results in Section 6.3, Table F.4 for both cases covered in Algorithms 3 and 5, together with the number of input level members and fact members.

5.3 Discovering implicit topological relations

Discovering implicit topological relations is addressed in the following algorithms: Algorithm 4 (discoverSpatialHS) and Algorithm 6 (discoverFactLevel). In both cases, the source data does not have any defined roll-up relations (with skos:broader), or has missing spatial hierarchy steps between level members. Similarly, a fact member has no defined relation link to any spatial level member of its dimensions.

The input variables for Algorithm 4 (discoverSpatialHS) are the triples with dimensions (G^S_D), hierarchies in dimensions (G^S_H(d)), and levels in hierarchies (G^S_L(h)) from the schema graph, and level members of levels (G^I_LM(l)) and the attributes of level members (G^I_A(lm)) from the instance data graph. Therefore, we query the endpoint by filtering with the schema elements qb4o:hasHierarchy, qb4o:inDimension, and qb4o:hasLevel. We fetch the results of the query in Node.js in JSON format.

The input variables for Algorithm 6 (discoverFactLevel) are the triples with dimensions (G^S_D), hierarchies in dimensions (G^S_H(d)), and levels in hierarchies (G^S_L(h)) from the schema graph, and fact members (G^I_FM(F)), level members of levels (G^I_LM(l)), and the attributes of level members (G^I_A(lm)) from the instance data graph. Therefore, we query the endpoint by filtering with the schema elements qb4o:hasHierarchy, qb4o:inDimension, and qb4o:hasLevel. We fetch the results of the query in Node.js in JSON format.

The following listing (Listing F.10) shows how we implement a schema wrapper by filtering the schema graph at our endpoint with predicates for schema elements (Lines 3, 7, 11, and 14). Once we get to the levels, we filter the level members in each level with the qb4o:memberOf predicate (Line 11). Afterwards, we group level members by level that are in the same hierarchy and pass these grouped level members as inputs to a function similar to the one in Listing F.9, which is called detectSpatialHierarchyStepsExpensive. This function takes only two parameters, without explicit relations (two sets of level members grouped by level: parentLevelMembers and childLevelMembers). We run this algorithm several times for each pair of grouped level members (by level) within the same hierarchy, as our approach is discovering implicit relations between level members and fact-level members. For fact members, we similarly use one parameter (i.e., parentLevelMembers) as the grouped level members (by level), and the other parameter is the fact members (i.e., childLevelMembers), which are annotated as qb:Observation. In the detectSpatialHierarchyStepsExpensive function we utilize the same helper functions that are implemented with the child-parent topological relations and simplification rules defined in Section 22 along with Figure F.8 and Table F.1. This ensures that the spatial Boolean predicates are applied (on geometries of level members and fact members) with the relateSpatialValues helper function only between the appropriate spatial data types given in Tables F.1 and F.2.

Since there are no explicit relations in the detectSpatialHierarchyStepsExpensive function, the relateSpatialValues helper function is called Num.OfChildLev.Mem. × Num.OfParentLev.Memb. times in one iteration, whereas with the detectSpatialHierarchySteps function, the helper function is called only Num.OfExplicitRel. times. We give the implementation results in Section 6.3, Table F.4 for both cases covered in Algorithms 4 and 6, together with the number of input level members and fact members.
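Before turning to the schema wrapper in Listing F.10, the following is a rough sketch of the structure of detectSpatialHierarchyStepsExpensive (an illustration under the assumptions above, not the verbatim implementation; the grouped member structure and the utils helpers follow Listings F.6, F.8, and F.9). The nested iteration makes the Num.OfChildLev.Mem. × Num.OfParentLev.Memb. call count explicit.

const detectSpatialHierarchyStepsExpensive = (
  parentLevelMembers, childLevelMembers) => {
  const bindings = [];
  // Compare every child level member against every parent level member.
  Object.keys(childLevelMembers).forEach(childLevelMemberId => {
    const childLevelSpatialValues = childLevelMembers[childLevelMemberId]
      .map(member => utils.getSpatialValues(member.value));
    Object.keys(parentLevelMembers).forEach(parentLevelMemberId => {
      const parentLevelSpatialValues = parentLevelMembers[parentLevelMemberId]
        .map(member => utils.getSpatialValues(member.value));
      const topoRel = utils.relateSpatialValues(
        childLevelSpatialValues, parentLevelSpatialValues);
      if (topoRel) {
        bindings.push({
          s: { type: "uri", value: childLevelMemberId },
          p: { type: "uri", value: topoRel },
          o: { type: "uri", value: parentLevelMemberId } });
      }
    });
  });
  return { results: { bindings } };
};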

1 const discoverSpatialHierarchySteps = schema =>
2   schema.results.bindings.filter(binding =>
3     binding.p.value === "qb4o:hasHierarchy")
4   .map(hierarchyBinding =>
5     schema.results.bindings.filter(binding =>
6       hierarchyBinding.o.value === binding.s.value
7       && binding.p.value === "qb4o:hasLevel"))
8   .map(levelBinding =>
9     schema.results.bindings.filter(binding =>
10      levelBinding.o.value === binding.s.value
11      && binding.p.value === "qb4o:memberOf"));
12 const inDimension =
13   schema.results.bindings.filter(binding =>
14     binding.p.value === "qb4o:inDimension");
15 module.exports = {
16   wrapper: discoverSpatialHierarchySteps};

Listing F.10: Discovering topological relations (schema wrapper)

5.4 Generating fact schema

Finally, we implement the enrichment of the fact schema, based on the spatially enriched fact instances (members). In order to extract the input variables for Algorithm 7 (defineSpatialFactDSD), we use the spatially enriched fact members (generated by Algorithms 5 and 6) and the non-spatial fact schema. The first step of generating the fact schema is to look for detected and discovered topological relations between the fact and level members and then annotate each of them with qb4so:topologicalRelation in the fact schema, as given in Listing F.3. The next step is to identify the spatial data types with the helper functions getMeasures and getSpatialValues (Listings F.5 and F.6). Finally, for each identified spatial data type we annotate the fact schema with the corresponding spatial aggregate function, e.g., the spatial data type POINT can have the ConvexHull aggregate function, LINE can have Union, etc. In our implementation of detecting and discovering topological relations between fact members and level members, we have only encountered the qb4so:

within topological relation. Thus, the fact schema enrichment implementation generates Lines 4 and 5 as exemplified in Listing F.3. As spatial measures in fact members, we have found the POINT spatial data type. Therefore, the fact schema enrichment implementation generates Lines 6 and 7, annotating that the spatial measure has the qb4so:ConvexHull aggregate function, as exemplified in Listing F.3. After the spatial enrichment is fully completed, both the schema18 and the instance19 data are published from the same SPARQL endpoint with QB4SOLAP. Table F.4 shows the results of our implementation and we discuss them in detail in the Qualitative Evaluation (Section 6.3).
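To make the last step concrete, the following small sketch (with assumed helper names, not the paper's actual code) shows how the QB4SOLAP components of Listing F.3 can be serialized once a topological relation and the geometry type of a measure are known.

const spatialAggregateFor = geoType => ({
  Point: "qb4so:ConvexHull",   // as generated for gfs:farmLocation
  LineString: "qb4so:Union",
  Polygon: "qb4so:Union"       // qb4so:Centroid or qb4so:MBR are also supported
}[geoType]);

const factLevelComponent = (level, topoRel) =>
  `qb:component [ qb4o:level ${level} ; qb4o:cardinality qb4o:ManyToOne ; ` +
  `qb4so:topologicalRelation ${topoRel} ] ;`;

const measureComponent = (measure, geoType) =>
  `qb:component [ qb:measure ${measure} ; ` +
  `qb4o:aggregateFunction ${spatialAggregateFor(geoType)} ] .`;

// For GeoFarmHerdState this reproduces the QB4SOLAP components of Listing F.3:
console.log(factLevelComponent("gfs:parish", "qb4so:Within"));
console.log(measureComponent("gfs:farmLocation", "Point"));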

5.5 Implementation Choices

After thoroughly describing the necessary steps and enrichment algorithms, here we briefly present our implementation choices, both in technical and strategic terms, for implementing our approach.

In order to answer the question "Can this approach be reasonably implemented on top of triple stores by directly using web and semantic web technologies?", we have come across a number of challenges where specific choices had to be made. These will be discussed next.

As triple store, we used Virtuoso version 07.20.3217 on Linux (x86_64-ubuntu-linux-gnu), Single Server Edition. Even though Virtuoso supports several shape types (e.g., POLYGON, MULTIPOLYGON, etc.), it has a limited number of spatial Boolean functions available as built-in functions. Therefore, we have decided to use a third-party Javascript library for spatial analysis called Turf.js. We implemented RDF2SOLAP on the Node.js platform. The hardware set-up of the Node.js machine is given in Table F.3.

Table F.3: Hardware Setup (Node.js Machine)

Processor Name: Intel Core i7
Processor Speed: 2,8 GHz
Num. of Processors: 1
Num. of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB

For each test case we have queried the triple store's SPARQL endpoint and extracted the triple data in JSON format to Node.js, where it is kept in memory.

18http://extbi.cs.aau.dk/geofarm/qb4solap/geofarm-qb4solap-schema.ttl
19http://extbi.cs.aau.dk/geofarm/qb4solap/geofarm-qb4solap-output.tar.gz

Using Node.js and a Javascript library for spatial analysis provides us with a flexible development environment, independent from any choice of triple store. It was mentioned earlier in Section 5.2 that we have multi-part POLYGON data (for drainage areas and parishes), which means that, when several polygons are grouped by a unique (parish or water) URI, they can compose a MULTIPOLYGON for a single parish or drainage area instance. From the implementation point of view, we had to implement a bounding box function for multi-part POLYGON data in order to call the spatial Boolean functions (within and intersects) between the correct parish and drainage area instances, and then annotate the topological relations between their unique URIs. If triple stores already provided overall support for complex spatial data types, spatial indices, and a complete set of built-in spatial functions, decoupling the triple stores during the development of RDF2SOLAP would not have been necessary. We could then directly use the spatial capabilities of the triple stores that were required for developing RDF2SOLAP. However, to the best of our knowledge, a third-party spatial analysis library was needed to fully implement our RDF2SOLAP (spatial) multi-dimensional enrichment algorithms given in Section 4. The details of our approach, endpoints, and data sets can be found on our project page20. The code repository for the whole implementation can be found on GitHub21.

6 Experimental Evaluation

Table F.4: Implementation Results of Detecting and Discovering Topological Relations

Section | Alg. | Child Members (INPUT) | Parent Members (INPUT) | Explicit Relations (INPUT) | Topological Relations (OUTPUT) | Run times (OUTPUT)
5.2 | Alg. 3 | parishes: 2180 | drainageAreas: 134 | 2683 | intersects: 636, within: 2046 | 29 s
5.2 | Alg. 5 | farmStates: 40039 | parishes: 2180 | 39800 | within: 39334 | 7 s
5.3 | Alg. 4 | parishes: 2180 | drainageAreas: 134 | NONE | intersects: 1088, within: 3392 | 2622 s
5.3 | Alg. 6 | farmStates: 40039 | parishes: 2180 | NONE | within: 39998 | 1920 s
5.3 | Alg. 6 | farmStates: 40039 | drainageAreas: 134 | NONE | within: 39845 | 525 s

The rationale for developing an RDF2SOLAP enrichment module is to enrich and re-annotate existing RDF data on the Semantic Web with spatial and multi-dimensional data warehouse concepts. Then, the spatial RDF data becomes available for querying with SOLAP operations in SPARQL without losing its triple (RDF) format. We do not expect superior performance of our implementation due to the limited spatial and multidimensional technologies available in the RDF/SW stack.

20Project Page: http://extbi.cs.aau.dk/RDF2SOLAP
21RDF2SOLAP Repository: https://github.com/lopno/rdf2solap

Instead, as long as we achieve comparable performance and results, our proposal will give much more flexibility and analytical power, without needlessly spending large amounts of time on extracting and converting the data from RDF to GIS and back.

First, we present the run times of the algorithms given in the previous section, for assessing the performance of our approach, in the quantitative evaluation section (Section 6.1). In order to evaluate the performance of our approach, we present the total time for getting similar results over RDF data in two different (non-SW) environments in the quantitative evaluation section. Next, we give the comparison baselines in Section 6.2, describing those two different environments (GIS, RDBMS) that we compare our results against. Then, in the qualitative evaluation section we compare our results with those two environments in terms of accuracy and coverage (Section 6.3). Finally, we share the technical lessons learned in Section 6.4 and summarize our findings in Section 6.5.

6.1 Quantitative Evaluation

The results of applying our algorithms on the running use case are summarized in Table F.4. The results show the number of topological relationships found between the level members in spatial hierarchies and between the base level members and fact members. We distinguish the results for explicit and implicit relations as implemented in the separate algorithms for spatial hierarchies (Alg. 3 and 4) and fact-level relations (Alg. 5 and 6).

The input parameters and figures for each algorithm are given in Table F.4 under the INPUT column(s). The input data sets to the algorithms are 2180 parish members, 40039 farm state members, and 134 drainage area members. The OUTPUT columns show the number of topological relations found and the run times of the algorithms. In this section, we only focus on evaluating the performance of our implementation and discuss the output coverage for the number of found topological relations in the qualitative evaluation section. For all the algorithms that we have implemented, we provided the test cases in the GitHub repository, where the results can be re-generated. Each experiment given in Table F.4 was run (on Node.js) on a MacBook Pro 14,3 in a single process. The hardware details of the machine are given in Table F.3.

In Table F.4, we can see that the most expensive algorithm is Alg. 4 (discoverSpatialHS), which runs in 2622 seconds. The algorithm takes input instances of parish and drainage area with the POLYGON data type, without explicit relations as in Alg. 3 (detectSpatialHS). In Alg. 3, with distinct explicit relations (2683 given), the algorithm checks the designated spatial Boolean functions (within and intersects) just 2683 times for each Boolean function.


Table F.5: Performance Evaluation Results (f.s.= farm states, p.= parishes, d.a.= drainage areas)

Query | Platform | Run times | Development cost
detectSpatialHS (p. – d.a.) | RDF2SOLAP | 29 s | 5 min.
detectSpatialHS (p. – d.a.) | RDBMS | < 1 s | 1-1.5 days
detectFactLevelR. (f.s. – p.) | RDF2SOLAP | 7 s | 5 min.
detectFactLevelR. (f.s. – p.) | RDBMS | < 1 s | 1-1.5 days
discoverSpatialHS (p. – d.a.) | RDF2SOLAP | 2,622 s | 5 min.
discoverSpatialHS (p. – d.a.) | RDBMS | 43 s | 1-1.5 days
discoverSpatialHS (p. – d.a.) | GIS | 45 s | 2 days
discoverFactLevelR. (f.s. – p.) | RDF2SOLAP | 1,920 s | 5 min.
discoverFactLevelR. (f.s. – p.) | RDBMS | 95 s | 1-1.5 days
discoverFactLevelR. (f.s. – p.) | GIS | 72 s | 2 days
discoverFactLevelR. (f.s. – d.a.) | RDF2SOLAP | 525 s | 5 min.
discoverFactLevelR. (f.s. – d.a.) | RDBMS | 48 s | 1-1.5 days
discoverFactLevelR. (f.s. – d.a.) | GIS | 41 s | 2 days

However, in Alg. 4, the algorithm calls the spatial Boolean functions (within and intersects) 134 × 2180 = 292120 times for each function. Similarly, compared to Alg. 5, Alg. 6 is more expensive because it runs without explicit relations, although it is much faster than Alg. 4, since Alg. 6 calls the spatial Boolean functions between the POINT data type (farm states) and the POLYGON data type (parishes and drainage areas).

In Table F.5, we compare the run times of our algorithms with two different query platforms (RDBMS and GIS) for detecting and discovering topological relations. From these platforms, Alg. 3 and 5 (to detect explicit topological relations) are only implemented on the RDBMS, since the GIS tool employs spatial joins instead of joining through the referential integrity of explicit relations. The RDBMS demonstrates great performance for both Alg. 3 and 5, processing the queries in less than 1 sec. each. However, these query processing times do not include extracting the input data sets from our RDF endpoint, loading the data into the RDBMS, and writing the SQL queries. To discover implicit topological relations in Alg. 4 and 6, the GIS tool outperformed the RDBMS in terms of processing time, excluding the preparation and load times. RDF2SOLAP performs slower in query processing time compared to the RDBMS and GIS; however, to implement the enrichment approach on native SW/RDF data, RDF2SOLAP is still the most efficient, since the preparation and load time is 5 minutes and users do not have to write SPARQL/SQL queries or know how to use a GIS tool. Even excluding the preparation and load times, RDF2SOLAP demonstrated a comparable performance to the other query platforms,

considering that the run times (in Tables F.4 and F.5) for RDF2SOLAP also cover the query processing times, parsing the RDF data in JSON, calling the helper functions when necessary, and returning bounding box objects for multi-part POLYGON data.

6.2 Comparison Baselines

In order to elaborate on the performance and accuracy of our approach with the chosen technologies for implementing RDF2SOLAP, we compare our results (Table F.4) against two different environments: a leading GIS tool and a leading RDBMS. The software versions of the tools and the hardware of the machine running these tools are given in Table F.622. We load the WKT data (spatial attributes of level members) used in our experiments, with the same decimal precision of the coordinates, into the GIS tool and into a geo-database on the RDBMS from CSV files. We extract topological relations between the child and parent members by using spatial joins in the GIS tool and the built-in spatial functions of the RDBMS.

Since neither the GIS tool nor the RDBMS can process RDF data in its native format, we have to prepare the data to load into these environments. The preparation and load times of the data are given as development cost in Table F.5. The preparation and load time is calculated assuming that the developer has basic knowledge of the domain, can extract RDF data with SPARQL queries, can write SQL queries, and knows how to use RDBMS and GIS tools. We extracted the spatial level members (farm states, parishes, and drainage areas) used in the algorithms from our RDF endpoint in CSV format. In order to prepare the data to be loaded into the GIS tool and the RDBMS, we also have to use the relational schema defined by QB4SOLAP.

22We cannot disclose the names of the GIS tool and the RDBMS due to license restrictions.

Table F.6: Hardware and Software Setup (RDBMS Server and GIS Platform)

Hardware
  Processor Name: Intel Core i7
  Processor Speed: 2,7 - 2,9 GHz
  Num. of Processors: 4
  Memory: 32 GB
Software
  Operating System: Windows 10 Enterprise (10.0) 64-bit
  RDBMS Server: 64-bit
  RDBMS Server Memory: 16287 MB
  GIS Tool Version: 64 bit 2.18.21


On the GIS tool we saved CSV data layers (for each level: farm states, parishes, and drainage areas) and converted these into the native GIS format, which is shape files. Then we ran the Join Attributes By Location function, which is a built-in data management process. We ran this function as a batch process for parishes-drainage areas (Alg. 4), farm states-parishes, and farm states-drainage areas (Alg. 6), as given in Table F.5.

Table F.7: Comparisons of number of topological relations found in each tool (f.s. = farm states, p. = parishes, d.a. = drainage areas)

Algorithm | Relation | GIS | RDBMS | RDF2SOLAP
Alg. 3 (p. – d.a.) | intersects | N/A | 1897 | 636
Alg. 3 (p. – d.a.) | within | N/A | 785 | 2046
Alg. 5 (f.s. – p.) | within | N/A | 39334 | 39334
Alg. 4 (p. – d.a.) | intersects | 2556 | 2802 | 1088
Alg. 4 (p. – d.a.) | within | 1039 | 785 | 3392
Alg. 6 (f.s. – p.) | within | 39805 | 39984 | 39998
Alg. 6 (f.s. – d.a.) | within | 39441 | 39845 | 39845

Measuring and comparing the run times (between Tables F.4 and F.5) is not the only scope of the evaluation, since the algorithms in the implementation of RDF2SOLAP run in several steps with helper functions for extracting the correct data (level members, hierarchy steps), finding the spatial (attribute) values, etc., while on the GIS tool and the RDBMS only the “finding the topological relations” part of the algorithms is run. Converting RDF data into GIS and RDBMS native formats, loading it into these environments, and preparing batch processes and SQL queries require considerable manual work. Therefore, the overall cost of using offline GIS and RDBMS tools for RDF data is very expensive in terms of developer time compared to our RDF2SOLAP tool. Hence, we use the comparison baselines for scoping out the accuracy and coverage of the number of topological relations found for each algorithm in these three different environments. We give the number of topological relations in Table F.7 and discuss the results in the qualitative evaluation section (Section 6.3).

6.3 Qualitative Evaluation

On the GIS tool we tested only finding (discovering) implicit topological relations (discoverSpatialHS and discoverFactLevelRelations), where we did not use direct links between farms, parishes, and drainage areas (through the referential integrity defining the explicit relations), but instead used the spatial

join functionality of the (GIS data management) tool. Therefore, Alg. 3 and Alg. 5 are N/A in Table F.7 for the GIS tool. On the RDBMS, we tested both (detecting and discovering topological relations), where we queried with joins on the unique IDs if they were present (drainage area foreign key in parishes, parish foreign key in farm states), and with spatial joins by using the STWithin, STIntersects, and STOverlaps built-in functions. In the RDBMS, we found fewer within relations and more intersects relations compared to the GIS tool (Table F.7, Alg. 4). On the contrary, the RDF2SOLAP implementation finds more within relations and fewer intersects relations than found in the GIS tool and the RDBMS. In Alg. 3, RDF2SOLAP detects 2046 within relations, which is 47% more than the RDBMS, where 785 within relations are detected. Similarly, in Alg. 4, which is also between parishes and drainage areas, RDF2SOLAP finds 47% more than GIS and 54% more than the RDBMS (Table F.7). This is due to generalizing the multi-part POLYGON data as bounding boxes.

We can see that the results from RDF2SOLAP for finding implicit and explicit relations between POINT-POLYGON data types with Alg. 5 (farm states-parishes: 39334) and Alg. 6 (farm states-parishes: 39998 and farm states-drainage areas: 39845) are very similar to the relations found in the RDBMS, which can be observed in Table F.7. We found the exact same number of topological relations (within) in RDF2SOLAP and the RDBMS for Alg. 5 (farm states-parishes: 39334) and Alg. 6 (farm states-drainage areas: 39845). For Alg. 6 (farm states-parishes), we found 39984 within relations in the RDBMS and 39998 within relations in RDF2SOLAP, a difference of only 14 (0,03%) (Table F.7). There is a slight divergence between the number of (within) relations found in the GIS tool and in RDF2SOLAP in Alg. 6. There are fewer relations found in the GIS tool than in RDF2SOLAP, where the difference is 193 (0,4%) for farm states-parishes and 404 (1%) for farm states-drainage areas. This can be tolerated compared to the discrepancy between the GIS tool and RDF2SOLAP (or the RDBMS and RDF2SOLAP) for POLYGON-POLYGON relations.

6.4 Technical Lessons

The deviation of RDF2SOLAP for POLYGON-POLYGON relations can be prevented by using the multi-part POLYGON and MULTIPOLYGON data in its original form instead of generalizing it as bounding boxes. In practice, however, storing the multi-part POLYGON data as MULTIPOLYGONs in a triple store, loading the data into Node.js in JSON format, and applying the spatial Boolean functions from the Turf.js library failed at several levels. We encountered performance and formatting problems while loading the MULTIPOLYGON data type into Virtuoso, where the debugger could not provide a stack trace indicating where the error occurred. This led to missing data in the triple store for the drainage areas. Even assuming that the MULTIPOLYGON data had been loaded successfully, Turf.js could not handle POLYGON-MULTIPOLYGON within relations, which is normally possible in the GIS tool or the RDBMS. Since keeping the multi-part POLYGON and MULTIPOLYGON data in its original form was not feasible with the Web/Semantic Web technologies used (the Turf.js API and the RDF store), we accepted a trade-off: implementing the POLYGON-POLYGON relations on generalized bounding boxes at the cost of deviating from the results found in the two other environments.
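The bounding-box generalization described above can be sketched as follows with Turf.js. The multi-part drainage area and the parish feature below are hypothetical examples; the sketch is only meant to show why the generalized geometry yields more within relations than the exact one.

```javascript
// Sketch of the bounding-box generalization of a multi-part POLYGON.
// Replacing the original geometry by its bounding box enlarges it, so
// "within" tests succeed more often than on the exact geometry.
const turf = require('@turf/turf');

// Hypothetical multi-part drainage area (two disjoint polygon parts).
const drainageAreaParts = turf.multiPolygon([
  [[[9.80, 56.95], [9.90, 56.95], [9.90, 57.05], [9.80, 57.05], [9.80, 56.95]]],
  [[[10.00, 57.05], [10.10, 57.05], [10.10, 57.15], [10.00, 57.15], [10.00, 57.05]]]
]);

// Generalize the multi-part geometry as a single bounding-box polygon,
// which the Turf.js boolean predicates can handle like any plain POLYGON.
const generalized = turf.bboxPolygon(turf.bbox(drainageAreaParts));

const parish = turf.polygon([[[9.92, 57.00], [9.98, 57.00],
                              [9.98, 57.04], [9.92, 57.04], [9.92, 57.00]]]);

// The parish lies between the two parts, so it is NOT within the exact
// multi-part geometry, but it IS within the generalized bounding box.
console.log(turf.booleanWithin(parish, generalized)); // true
```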

6.5 Experimental Summary

RDF2SOLAP demonstrated accurate results in comparison with the two other tools (GIS and RDBMS) in terms of the topological relations found between POINT-POLYGON data types. Due to the generalization of multi-part POLYGON data, RDF2SOLAP shows a minor divergence in the number of detected/discovered topological relations and does not exactly match the results of the two other tools for detecting and discovering topological relations between POLYGON-POLYGON data types. In terms of productivity, RDF2SOLAP is far ahead of the in-house GIS and RDBMS tools for enriching spatial RDF data by finding hierarchy steps and fact-level relations, since it does not require manual work to prepare and process the data, but operates on native RDF/SW data in both the implicitly and the explicitly defined test cases. With the RDF2SOLAP enrichment tool, development costs can be significantly reduced from days of preparing data, query platforms, scripts, and queries to the one-time cost of pointing to the SPARQL endpoint and fetching the data sets in the test cases, which can be done in five minutes.

Comparing and evaluating the technical capabilities of triple stores, APIs, and libraries for spatial data handling is beyond the scope of this paper. Improvements in the underlying technologies can provide a better development environment for implementing the RDF2SOLAP (spatial) enrichment algorithms (Section 4) with better performance and accuracy. Further improvements to the performance (processing times) of RDF2SOLAP can be achieved either by implementing spatial indexes directly on the RDF data in triple stores or by building an in-memory R-tree in Node.js alongside the Turf.js library.
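As an illustration of the last point, the sketch below shows how an in-memory R-tree could prefilter candidate pairs by bounding box before the exact Turf.js predicates are evaluated. It assumes the rbush package (a common in-memory R-tree for Node.js, used here in addition to Turf.js) and hypothetical feature arrays; it indicates a possible direction rather than part of the current RDF2SOLAP implementation.

```javascript
// Sketch of an in-memory R-tree prefilter for the pairwise topological checks.
// Assumes the 'rbush' package (a widely used R-tree for Node.js) in addition
// to Turf.js; the feature arrays are hypothetical placeholders.
const turf = require('@turf/turf');
const RBush = require('rbush');

// Index the parent level members (e.g., parishes) by their bounding boxes.
function buildIndex(features) {
  const tree = new RBush();
  tree.load(features.map((feature, i) => {
    const [minX, minY, maxX, maxY] = turf.bbox(feature);
    return { minX, minY, maxX, maxY, index: i };
  }));
  return tree;
}

// Only evaluate the exact predicate for pairs whose bounding boxes overlap.
function findWithinPairs(children, parents) {
  const index = buildIndex(parents);
  const pairs = [];
  for (const child of children) {
    const [minX, minY, maxX, maxY] = turf.bbox(child);
    for (const hit of index.search({ minX, minY, maxX, maxY })) {
      if (turf.booleanWithin(child, parents[hit.index])) {
        pairs.push([child, parents[hit.index]]);
      }
    }
  }
  return pairs;
}
```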

7 Related work

Utilizing DW/OLAP technologies on the Semantic Web with RDF data makes RDF data sources more easily available for interactive analysis. There has been a lot of work on OLAP and data warehousing possibilities on the Semantic Web (SW), as summarized by Abelló et al. [17]. Our scope of work is spatial OLAP (SOLAP) and spatial data warehouses (SDW) on the Semantic Web, which is not yet a comprehensively studied research topic. We focus on performing spatial OLAP analysis directly over multi-dimensional data published on the Semantic Web. Therefore, we review the related work and classify the relevant approaches under the following topics: (1) data modeling and representation (on the SW for multi-dimensional and spatial data), and (2) metadata enrichment and MD analysis (OLAP-like analysis over RDF data).

Data modeling and representation. The RDF Data Cube (QB) vocabulary [4] is the W3C recommendation for publishing statistical data and its metadata in RDF. Thus, QB is commonly used to publish raw or already aggregated multidimensional data sets. However, QB lacks the underlying metadata for multidimensional models and OLAP operations. MD concepts such as hierarchy levels along a cube dimension, the semantics of the relationships between levels, and the semantics and definitions of aggregate functions are missing from the QB vocabulary, although they are essential in an MD schema to enable OLAP analysis. Therefore, Kämpgen et al. define an OLAP data model on top of QB by using SKOS [16] extensions (http://www.w3.org/2011/gld/wiki/ISO_Extensions_to_SKOS) to support multi-dimensional hierarchies [18, 19]. However, the proposed model has the limitation that a level can exist in only one hierarchy: OLAP operations are made available on the data cubes, but the cubes are restricted to one hierarchy per dimension. Etcheverry et al. propose QB4OLAP [7] as an extension to the QB vocabulary, which supports modeling a complete MD data cube and querying the cube with OLAP operations on the Semantic Web.

Modeling of MD data on the Semantic Web motivated the publication of data sets from several domains (e.g., statistical data sets from EuroStat and the World Bank, AirBase air quality data, and many other environmental and governmental open data) as RDF data cubes [20]. The need for fully multi-dimensional semantic data warehouses (where OLAP operations are enabled in SPARQL) made the QB4OLAP vocabulary prominent. Therefore, RDF data cubes from statistical and environmental domains [14, 21, 22] are published with this extended QB vocabulary. Moreover, semantic Extract-Transform-Load (ETL) tools automate and ease the process of annotating and publishing open data with QB4OLAP on the Semantic Web [23]. Therefore, we can see more and more multi-dimensional data sets annotated with QB4OLAP on the Semantic Web.

These multi-dimensional semantic modeling approaches and querying with OLAP on the Semantic Web lead us to find ways of modeling, publishing, and querying spatial data warehouses in particular, since modeling and querying spatial data bring new challenges. QB4SOLAP [9], a spatial extension to the fully multi-dimensional QB4OLAP vocabulary, addresses the need for modeling and publishing geo-semantic data warehouses on the Semantic Web.

Modeling and publishing (non-multi-dimensional) spatial data on the Semantic Web has been a focus of many communities and research groups. Some of the efforts for standardizing and aligning vocabularies to describe spatial data (e.g., locations, geometries, etc.) are GeoSPARQL [24] by the Open Geospatial Consortium (OGC), the Basic Geo (WGS84 lat/long) Vocabulary by the W3C Semantic Web Interest Group [25], the NeoGeo Vocabularies by the GeoVocab working group [26], INSPIRE Directive metadata on the Semantic Web [27], and the GeoNames Ontology [28], among many others. These standards have been commonly used in a wide range of projects.

The Government Linked Data (GLD) working group listed some of these geo-vocabularies as standards for publishing governmental linked data sets [29]. Andersen et al. re-use some of these vocabularies for publishing governmental and spatial data on the Semantic Web [30]. LinkedGeoData is a major contribution to the Semantic Web, which interactively transforms OpenStreetMap data to RDF [31]. The GeoKnow project focuses on linking geospatial data from heterogeneous sources [32]. More recent work by Kyzirakos et al. on transforming geospatial data into RDF graphs using R2RML mappings [33] and by Neumaier et al. on geo-semantic labelling of open data [34] shows that spatial data on the Semantic Web will keep growing. However, none of these standards considers the MD aspects of spatial data for geo-semantic data warehouses.

The large volume of spatial data on the Semantic Web yields a need for advanced modeling and analysis of such data. As mentioned earlier, QB4SOLAP [9] remedies this need. Aggregate functions, cardinality relationships, and topological relations are a rich source of knowledge in spatial data cubes for querying with spatial OLAP operations in SPARQL [1]. QB4ST [6] is a recent attempt to define spatio-temporal extensions to the RDF Data Cube (QB) vocabulary. However, it inherits the limitations of QB in supporting OLAP dimensions with hierarchies, levels, and aggregate functions. The lack of OLAP hierarchies and aggregate functions in QB4ST hinders defining and operating with topological relations at hierarchy steps or spatial aggregate functions on spatial measures, which are essential MD concepts for SOLAP operators. These spatial MD concepts for geo-semantic data warehouses are defined together with SOLAP-to-SPARQL query mappings in [1].

Metadata enrichment and MD analysis. The increasing popularity of RDF data cubes and MD OLAP cubes on the Semantic Web has raised interest in tools and frameworks that can ease the annotation and querying of MD data on the Semantic Web from existing RDF sources.

Ibragimov et al. present a framework for exploratory OLAP over Linked Open Data (LOD), where the MD schema of the data cube is annotated with QB4OLAP [35]. Based on this MD schema, they propose a system that is capable of querying data sources, extracting and aggregating data, and building OLAP cubes in RDF. Similarly, Gallinucci et al. propose an exploratory OLAP approach, namely iMOLD, for interactive MD modeling of linked data [36]. Their approach allows users to enrich RDF cubes with aggregation hierarchies through a user-guided process. During this interactive process, recurring modeling patterns that express roll-up relationships between RDF concepts are recognized in the LOD; these patterns are then translated into aggregation hierarchies to enrich the RDF cube. Varga et al. enable OLAP analysis with the QB2OLAP tool [21] over statistical data published with the QB vocabulary, by applying the dimensional enrichment steps described thoroughly in [8]. The proposed enrichment steps allow users to enrich a QB data set with QB4OLAP concepts such as fully-fledged dimension hierarchies. However, none of these frameworks and approaches supports spatial data warehouses and SOLAP operations.

In this paper, we propose a framework where OLAP cubes in RDF can be enriched with spatial MD concepts from the QB4SOLAP vocabulary by employing the RDF2SOLAP enrichment algorithms over QB4OLAP triples. This allows users to query MD cubes with SOLAP operators in SPARQL. Optionally, users can utilize the GeoSemOLAP [2] tool on top of QB4SOLAP data sets, which helps them formulate SOLAP queries in SPARQL.

8 Conclusion and Future Work

Motivated by the need to conciliate MD/OLAP RDF data cubes and spatial data on the Semantic Web as geo-semantic data warehouses, we have presented a number of contributions in this paper. As a first attempt to enrich RDF data cubes with spatial concepts, we have shown that the QB4SOLAP vocabulary (built on top of the non-spatial QB4OLAP and RDF Data Cube (QB) vocabularies) meets the need for fully-fledged spatial data warehouse concepts, by demonstrating the running use case examples on real-world governmental open data sets from various domains (i.e., environment, farming) with complex geometry types. We give the running use case examples annotated in both the QB4OLAP and QB4SOLAP vocabularies as RDF triples and formalize the RDF triples as parameters used in the enrichment algorithms. Second, we have positioned our conceptual architecture in relation to existing semantic (spatial) OLAP tools (e.g., on top of the QB2OLAPem enrichment module and at the back-end of GeoSemOLAP). Third, we have provided hierarchical enrichment algorithms for two cases, which cover finding explicit hierarchy steps with direct links between the level members and finding implicit hierarchy steps (without direct links between the level members) by comparing geometry attributes of the level members. We have defined and deployed the necessary algorithms with spatial helper functions for finding spatial attributes and comparing these attributes to derive topological relations. Fourth, we have presented the factual enrichment phase for both implicit and explicit fact-level relations between the fact members and the level members. Moreover, we have presented how to re-define the fact schema after the factual enrichment phase in an automated manner. Re-defining the fact schema also includes finding the spatial measures and associating them with spatial aggregate functions. Finally, we have evaluated our experiences and the accuracy of our approach and its implementation with the underlying technologies by comparing the number of topological relations found by the RDF2SOLAP framework (between the level members in spatial hierarchies and between the level members and the fact members, during the hierarchical and factual enrichment phases, respectively) against two different environments. In conclusion, RDF2SOLAP facilitates the spatial enrichment of RDF data cubes and fills an important gap in our vision of SOLAP on the Semantic Web.

Several directions are interesting for future research: creating a comprehensive benchmark by implementing the RDF2SOLAP enrichment algorithms on different platforms and testing them on different use cases; deriving spatial hierarchy levels and level member instances from external geo-vocabularies; and extending our approach in QB4SOLAP, GeoSemOLAP, and RDF2SOLAP to handle highly dynamic spatio-temporal data and multi-dimensional analytical queries [37]. Another line of future work is runtime optimization for scalable querying of spatial data warehouses [38]. Moreover, it is important to develop query optimization techniques for OLAP queries on semantic DW RDF data, similar to the ones developed for cubes and XML data [39–41]. Furthermore, to achieve scalable querying and runtime optimization, new research directions can be taken with binary serialization of the QB4SOLAP RDF data, such as Header Dictionary Triples (HDT), a compact data structure that can be compressed and kept in memory, thus enabling high-performance (and also concurrent) querying.

References

[1] N. Gür, T. B. Pedersen, E. Zimányi, and K. Hose, “A Foundation for Spatial Data Warehouses on the Semantic Web,” Semantic Web Journal, vol. 9, no. 5, pp. 557–587, 2018.


[2] N. Gür, J. Nielsen, K. Hose, and T. B. Pedersen, “GeoSemOLAP: SOLAP on the Semantic Web Made Easy,” in Proceedings of the 26th International Conference Companion on World Wide Web (WWW’17). ACM, 2017, https://dx.doi.org/10.1145/3041021.3054731.

[3] E. Malinowski and E. Zimányi, Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications. Data-Centric Systems and Applications. Springer, 2008, https://dx.doi.org/10.1007/978-3-540-74405-4.

[4] R. Cyganiak, D. Reynolds, and J. Tennison, “The RDF Data Cube Vocabulary,” W3C Recommendation, 2014.

[5] World Wide Web Consortium, “SPARQL Query Language for RDF,” W3C Recommendation, 2008, https://www.w3.org/TR/sparql11-query/.

[6] R. Atkinson, “QB4ST: RDF Data Cube extensions for spatio-temporal components,” W3C Working Group Note, 2017, https://www.w3.org/TR/qb4st/.

[7] L. Etcheverry, A. Vaisman, and E. Zimányi, “Modeling and Querying Data Warehouses on the Semantic Web using QB4OLAP,” in Data Warehousing and Knowledge Discovery (DaWaK’14), vol. 8646. Springer, 2014, pp. 45–56, https://dx.doi.org/10.1007/978-3-319-10160-6_5.

[8] J. Varga, A. A. Vaisman, O. Romero, L. Etcheverry, T. B. Pedersen, and C. Thomsen, “Dimensional enrichment of statistical linked open data,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 40, pp. 22–51, 2016, https://dx.doi.org/10.1016/j.websem.2016.07.003.

[9] N. Gür, K. Hose, E. Zimányi, and T. B. Pedersen, “Modeling and Querying Spatial Data Warehouses on the Semantic Web,” in Semantic Technology: 5th Joint International Semantic Technology Conference (JIST’15), vol. 9544. Springer, 2015, pp. 1–20, https://dx.doi.org/10.1007/978-3-319-31676-5_1.

[10] E. Malinowski and E. Zimányi, “Representing spatiality in a conceptual multidimensional model,” in Proceedings of the 12th Annual ACM International Workshop on Geographic Information Systems, ser. GIS ’04. New York, NY, USA: ACM, 2004, pp. 12–22. [Online]. Available: http://doi.acm.org/10.1145/1032222.1032226

[11] M. J. Egenhofer and J. Herring, “A mathematical framework for the definition of topological relationships,” in Fourth International Symposium on Spatial Data Handling. Zurich, Switzerland, 1990, pp. 803–813.


[12] A. Vaisman and E. Zimányi, “Spatial data warehouses,” in Data Warehouse Systems: Design and Implementation. Springer, 2014.

[13] S. Rivest, Y. Bédard, M.-J. Proulx, M. Nadeau, F. Hubert, and J. Pastor, “SOLAP technology: Merging business intelligence with geospatial technology for interactive spatio-temporal exploration and analysis of data,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 60, no. 1, pp. 17–33, 2005.

[14] N. Gür, K. Hose, T. B. Pedersen, and E. Zimányi, “Enabling Spatial OLAP over Environmental and Farming Data with QB4SOLAP,” in Semantic Technology: 6th Joint International Semantic Technology Conference (JIST’16), vol. 10055. Springer, 2016, pp. 287–304, https://dx.doi.org/10.1007/978-3-319-50112-3_22.

[15] V. Gaede and O. Günther, “Multidimensional access methods,” ACM Computing Surveys (CSUR), vol. 30, no. 2, pp. 170–231, 1998.

[16] A. Miles and S. Bechhofer, “SKOS Simple Knowledge Organization System Namespace Document,” W3C Recommendation, 2009.

[17] A. Abelló, O. Romero, T. Pedersen, R. Berlanga Llavori, V. Nebot, M. Aramburu, and A. Simitsis, “Using Semantic Web Technologies for Exploratory OLAP: A Survey,” IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 27, no. 2, pp. 571–588, 2014, https://doi.org/10.1109/TKDE.2014.2330822.

[18] B. Kämpgen, S. O’Riain, and A. Harth, “Interacting with Statistical Linked Data via OLAP Operations,” in The Semantic Web: ESWC 2012 Satellite Events, vol. 7540. Springer, 2012, pp. 87–101, https://dx.doi.org/10.1007/978-3-662-46641-4_7.

[19] B. Kämpgen and A. Harth, “No Size Fits All – Running the Star Schema Benchmark with SPARQL and RDF Aggregate Views,” in Extended Semantic Web Conference. Springer, 2013, pp. 290–304.

[20] W3C, “Data Cube Implementations,” 2014, https://www.w3.org/2011/gld/wiki/Data_Cube_Implementations.

[21] J. Varga, L. Etcheverry, A. A. Vaisman, O. Romero, T. B. Pedersen, and C. Thomsen, “QB2OLAP: Enabling OLAP on Statistical Linked Open Data,” in 32nd IEEE International Conference on Data Engineering, 2016, pp. 1346–1349.

[22] L. Galárraga, K. A. M. Mathiassen, and K. Hose, “QBOAirbase: The European Air Quality Database as an RDF Cube,” in International Semantic Web Conference (Posters, Demos & Industry Tracks), 2017.


[23] R. P. D. Nath, K. Hose, T. B. Pedersen, and O. Romero, “SETL: A programmable semantic extract-transform-load framework for semantic data warehouses,” Information Systems, vol. 68, pp. 17–43, 2017.

[24] M. Perry and J. Herring, “GeoSPARQL: A Geographic Query Language for RDF Data,” OGC Implementation Standard, 2012.

[25] D. Brickley, “Basic Geo (WGS84 lat/long) Vocabulary,” W3C Semantic Web Interest Group, 2003, https://www.w3.org/2003/01/geo/.

[26] J. Salas and A. Harth, “NeoGeo Vocabulary Specification,” GeoVocab Working Group, 2012, http://geovocab.org/doc/neogeo/.

[27] K. Patroumpas, N. Georgomanolis, T. Stratiotis, M. Alexakis, and S. Athanasiou, “Exposing INSPIRE on the Semantic Web,” Web Semantics, vol. 35, no. P1, pp. 53–62, Dec. 2015. [Online]. Available: http://dx.doi.org/10.1016/j.websem.2015.09.003

[28] M. Wick, “GeoNames Ontology,” http://www.geonames.org/ontology/documentation.html.

[29] B. Hyland and B. Villazón-Terrazas, “Cookbook for open government linked data,” W3C Government Linked Data Working Group, 2011, https://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook.

[30] A. B. Andersen, N. Gür, K. Hose, K. A. Jakobsen, and T. B. Pedersen, “Publishing Danish Agricultural Government Data as Semantic Web Data,” in Semantic Technology: 4th Joint International Semantic Technology Conference (JIST’14), vol. 8943. Springer, 2014, pp. 178–186, https://dx.doi.org/10.1007/978-3-319-15615-6_13.

[31] C. Stadler, J. Lehmann, K. Höffner, and S. Auer, “LinkedGeoData: A core for a web of spatial open data,” Semantic Web, vol. 3, no. 4, pp. 333–354, 2012.

[32] G. Rojas, G. Giannopoulos, and J. J. L. Daniel Hladky, “Managing Geospatial Linked Data in the GeoKnow Project,” in The Semantic Web in Earth and Space Science: Current Status and Future Directions, vol. 20. IOS Press, 2015, p. 51.

[33] K. Kyzirakos, D. Savva, I. Vlachopoulos, A. Vasileiou, N. Karalis, M. Koubarakis, and S. Manegold, “GeoTriples: Transforming geospatial data into RDF graphs using R2RML and RML mappings,” Journal of Web Semantics, vol. 52, pp. 16–32, 2018.

[34] S. Neumaier and A. Polleres, “Geo-semantic labelling of open data,” in SEMANTiCS 2018 – 14th International Conference on Semantic Systems, Procedia Computer Science, 2018.


[35] D. Ibragimov, K. Hose, T. B. Pedersen, and E. Zimányi, “Towards Exploratory OLAP over Linked Open Data – A Case Study,” in Enabling Real-Time Business Intelligence. Springer, 2015, pp. 114–132.

[36] E. Gallinucci, M. Golfarelli, S. Rizzi, A. Abelló, and O. Romero, “Interactive multidimensional modeling of linked data for exploratory OLAP,” Information Systems, 2018.

[37] K. A. Jakobsen, A. B. Andersen, K. Hose, and T. B. Pedersen, “Optimizing RDF Data Cubes for Efficient Processing of Analytical Queries,” in Proceedings of the 6th International Workshop on Consuming Linked Data (COLD’15), 2015, http://ceur-ws.org/Vol-1426/paper-02.pdf.

[38] L. Galárraga, K. Ahlstrøm, K. Hose, and T. B. Pedersen, “Answering provenance-aware queries on RDF data cubes under memory budgets,” in The Semantic Web – ISWC 2018, D. Vrandečić, K. Bontcheva, M. C. Suárez-Figueroa, V. Presutti, I. Celino, M. Sabou, L.-A. Kaffee, and E. Simperl, Eds. Cham: Springer International Publishing, 2018, pp. 547–565.

[39] D. Pedersen, K. Riis, and T. B. Pedersen, “Query optimization for OLAP-XML federations,” in Proceedings of the 5th International Workshop on Data Warehousing and OLAP (DOLAP’02). ACM, 2002, pp. 57–64.

[40] X. Yin and T. B. Pedersen, “Evaluating XML-extended OLAP queries based on physical algebra,” Journal of Database Management (JDM), vol. 17, no. 2, pp. 85–116, 2006.

[41] D. Pedersen, J. Pedersen, and T. B. Pedersen, “Integrating XML data in the TARGIT OLAP system,” in Proceedings of the 20th International Conference on Data Engineering. IEEE, 2004, pp. 778–781.
