Query Processing in Spatial Network Databases


Dimitris Papadias
Department of Computer Science
Hong Kong University of Science and Technology
Clearwater Bay, Hong Kong
[email protected]

Jun Zhang
Department of Computer Science
Hong Kong University of Science and Technology
Clearwater Bay, Hong Kong
[email protected]

Nikos Mamoulis
Department of Computer Science and Information Systems
University of Hong Kong
Pokfulam Road, Hong Kong
[email protected]

Yufei Tao
Department of Computer Science
City University of Hong Kong
Tat Chee Avenue, Hong Kong
[email protected]

Abstract

Despite the importance of spatial networks in real-life applications, most of the spatial database literature focuses on Euclidean spaces. In this paper we propose an architecture that integrates network and Euclidean information, capturing pragmatic constraints. Based on this architecture, we develop a Euclidean restriction and a network expansion framework that take advantage of location and connectivity to efficiently prune the search space. These frameworks are successfully applied to the most popular spatial queries, namely nearest neighbors, range search, closest pairs and e-distance joins, in the context of spatial network databases.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
Proceedings of the 29th VLDB Conference, Berlin, Germany, 2003

1. Introduction

Spatial databases have been well studied in the last 20 years resulting in the development of numerous conceptual models, multi-dimensional indexes and query processing techniques [RSV02]. Surprisingly, most of existing work considers Cartesian (typically, Euclidean) spaces, where the distance between two objects is determined solely by their relative position in space. However, in practice, objects can usually move only on a pre-defined set of trajectories as specified by the underlying network (road, railway, river etc.). Thus, the important measure is the network distance, i.e., the length of the shortest trajectory connecting two objects, rather than their Euclidean distance. Previous work on spatial network databases (SNDB) is scarce and too restrictive for emerging applications such as mobile computing and location-based commerce. This necessitates the development of novel and comprehensive query processing methods for SNDB.

Every conventional spatial query type (e.g., nearest neighbors, range search, spatial joins and closest pairs) has a counterpart in SNDB. Consider, for instance, the road network of Figure 1.1, where the rectangles correspond to hotels. If a user at location q poses the range query "find the hotels within a 15km range", the result will contain a, b and c (the numbers in the figure correspond to network distance). Similarly, a nearest neighbor query will return hotel b. Note that the results of the corresponding conventional queries are different (e.g., the Euclidean nearest neighbor is d, which is actually the farthest hotel in the network). Furthermore, queries may combine both location and network aspects, such as "find the nearest hotel to the south" (e.g., hotel a).

Figure 1.1: Road network query example

A crucial pre-requisite for solving these queries in SNDB is a realistic architecture, which captures spatial entities (e.g., hotels) and the underlying network, preserving both Cartesian co-ordinates and connectivity. In addition, this architecture must take into account real-life constraints, such as unidirectional roads, “off-network” (but still reachable) entities, etc. Furthermore, although the network is almost static, the entities may change with high frequencies (e.g., a new/existing hotel opens/closes). It is also possible that entire entity sets are added as more information or services become available (e.g., a new "restaurant" dataset is incorporated in the system).

In this paper we propose a flexible architecture for SNDB, by separating the network from the entity datasets. In particular, we employ a disk-based network representation that preserves connectivity and location, while spatial entities are indexed by respective spatial access methods for supporting Euclidean queries and dynamic updates. Using this architecture, we develop two frameworks, Euclidean restriction and network expansion, for processing the most common spatial queries, namely nearest neighbors, range search, closest pairs and distance joins. The resulting algorithms expand conventional processing techniques by integrating connectivity and location information for efficient pruning of the search space. To the best of our knowledge, this is the first work dealing with the efficient processing of spatial queries in SNDB.

The rest of the paper is organized as follows: Section 2 overviews related work. Section 3 describes our architecture and its application in real-life scenarios. Sections 4 and 5 present algorithms for nearest neighbor and range search queries, respectively, while Sections 6 and 7 discuss closest pairs and spatial joins. Section 8 evaluates the proposed techniques with comprehensive experiments, and Section 9 concludes the paper with a discussion.

2. Related Work

In this section we overview previous work related to location (Section 2.1) and connectivity (Section 2.2) representation and processing in databases.

2.1 Spatial Query Processing in the Euclidean Space

R-trees [G84, SRF87, BKSS90] are the most popular indexes for Euclidean query processing due to their simplicity and efficiency. The R-tree can be viewed as a multi-dimensional extension of the B-tree. Figure 2.1 shows an exemplary R-tree for a set of points {a,b,…,j} assuming a capacity of three entries per node. Points that are close in space (e.g., a,b) are clustered in the same leaf node (E3) represented as a minimum bounding rectangle (MBR). Nodes are then recursively grouped together following the same principle until the top level, which consists of a single root.

Figure 2.1: An R-tree example

R-trees (like most spatial access methods) were motivated by the need to efficiently process range queries, where the range usually corresponds to a rectangular window or a circular area around a query point. The R-tree answers the query q (shaded area) in Figure 2.1 as follows. The root is first retrieved and the entries (e.g., E1, E2) that intersect the range are recursively searched because they may contain qualifying points. Non-intersecting entries (e.g., E3) are skipped. Note that for non-point data (e.g., lines, polygons), the R-tree provides just a filter step to prune non-qualifying objects. The output of this phase has to pass through a refinement step that examines the actual object representation to determine the actual result. The concept of filter and refinement steps applies to all spatial queries on non-point objects.

A nearest neighbor (NN) query retrieves the (k≥1) data point(s) closest to a query point q. The R-tree NN algorithm proposed in [HS99] keeps a heap with the entries of the nodes visited so far. Initially, the heap contains the entries of the root sorted according to their minimum distance (mindist) from q. The entry with the minimum mindist in the heap (E1 in Figure 2.1) is expanded, i.e., it is removed from the heap and its children (E3, E4, E5) are added together with their mindist. The next entry visited is E2 (its mindist is currently the minimum in the heap), followed by E6, where the actual result (h) is found and the algorithm terminates, because the mindist of all entries in the heap is greater than the distance of h. The algorithm can be easily extended for the retrieval of k nearest neighbors (kNN). Furthermore, it is optimal (it visits only the nodes necessary for obtaining the nearest neighbors) and incremental, i.e., it reports neighbors in ascending order of their distance to the query point, and can be applied when the number k of nearest neighbors to be retrieved is not known in advance.

An intersection join retrieves all intersecting object pairs (s,t) from two datasets S and T. If both S and T are indexed by R-trees, the R-tree join algorithm [BKS93] traverses synchronously the two trees, following entry pairs that overlap; non-intersecting pairs cannot lead to solutions at the lower levels. Several spatial join algorithms have been proposed for the case where only one of the inputs is indexed by an R-tree or no input is indexed [RSV02]. For point datasets, where intersection joins are meaningless, the corresponding problem is the e-distance join, which finds all pairs of objects (s,t) s ∈ S, t ∈ T within (Euclidean) distance e from each other. R-tree join can be applied in this case as well, the only difference being that a pair of intermediate entries is followed if their distance is below (or equal to) e. The intersection join can be considered as a special case of the e-distance join, where e=0.

Finally, a closest-pairs query outputs the (k≥1) pairs of objects (s,t) s ∈ S, t ∈ T with the smallest (Euclidean) distance. The algorithms for processing such queries [CMTV00] combine spatial joins with nearest neighbor search.
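
The heap-based nearest neighbor search described in Section 2.1 can be illustrated with a minimal Python sketch. It is not the implementation of [HS99]: the `Node` class, the helper names (`leaf`, `inner`, `mindist`, `incremental_nn`), the hand-built two-level tree and all coordinates are assumptions made up for this example, and they do not reproduce the layout of Figure 2.1. The sketch only shows how a priority queue ordered by mindist yields points in ascending order of Euclidean distance.

```python
import heapq
import math
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """Hypothetical in-memory stand-in for an R-tree node."""
    mbr: tuple                                     # (xlo, ylo, xhi, yhi)
    children: list = field(default_factory=list)   # child Nodes (inner node)
    points: list = field(default_factory=list)     # (name, (x, y)) pairs (leaf)


def leaf(points):
    """Build a leaf whose MBR tightly encloses its points."""
    xs = [p[0] for _, p in points]
    ys = [p[1] for _, p in points]
    return Node((min(xs), min(ys), max(xs), max(ys)), points=points)


def inner(children):
    """Build an inner node whose MBR encloses the MBRs of its children."""
    return Node((min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
                 max(c.mbr[2] for c in children), max(c.mbr[3] for c in children)),
                children=children)


def mindist(q, mbr):
    """Minimum Euclidean distance from point q to a rectangle (0 if q is inside)."""
    xlo, ylo, xhi, yhi = mbr
    dx = max(xlo - q[0], 0.0, q[0] - xhi)
    dy = max(ylo - q[1], 0.0, q[1] - yhi)
    return math.hypot(dx, dy)


def incremental_nn(root, q):
    """Yield (name, distance) pairs in ascending distance from q."""
    tie = count()  # tie-breaker so the heap never compares a Node with a tuple
    heap = [(mindist(q, root.mbr), next(tie), root)]
    while heap:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            # Expand the entry: push its children keyed by mindist,
            # and its points keyed by their actual distance.
            for child in item.children:
                heapq.heappush(heap, (mindist(q, child.mbr), next(tie), child))
            for name, p in item.points:
                d = math.hypot(q[0] - p[0], q[1] - p[1])
                heapq.heappush(heap, (d, next(tie), (name, p)))
        else:
            # A point at the top of the heap is no farther than anything left.
            yield item[0], dist


# Made-up example data (assumption, not taken from the paper).
root = inner([
    inner([leaf([("p1", (1, 2)), ("p2", (2, 1)), ("p3", (3, 3))]),
           leaf([("p4", (2, 6)), ("p5", (3, 7))])]),
    inner([leaf([("p6", (7, 5)), ("p7", (8, 7))]),
           leaf([("p8", (9, 9)), ("p9", (10, 8))])]),
])
q = (6, 4)

if __name__ == "__main__":
    for name, dist in incremental_nn(root, q):
        print(f"{name}: {dist:.3f}")
```

Because `incremental_nn` is a generator, a caller can stop after the first result for a plain NN query, or take the first k results for kNN, without fixing k in advance; this mirrors the incremental property noted above. A real R-tree would read nodes from disk pages rather than hold them as Python objects.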