Geo-Tagging Non-Spatial Concepts

Geo-Tagging Non-Spatial Concepts Amgad Madkour Walid G. Aref Purdue University Purdue University West Lafayette, USA West Lafayette, USA [email protected] [email protected] Mohamed Mokbel Saleh Basalamah University of Minnesota - Twin Umm Al-Qura University Cities Makkah, KSA Minneapolis, USA [email protected] [email protected] ABSTRACT Categories and Subject Descriptors Concept Geo-tagging is the process of assigning a textual H.2 [Database Management]: Database Applications identifier that describes a real-world entity to a physical ge- ographic location. A concept can either be a spatial concept where it possesses a spatial presence or be a non-spatial con- General Terms cept where it has no explicit spatial presence. Geo-tagging Algorithms locations with non-spatial concepts that have no direct relation is a very useful and important operation but is also very Keywords challenging. The reason is that, being a non-spatial concept, e.g., crime, makes it hard to geo-tag it. This paper proposes geotagging, relatedness using the semantic information associated with concepts and locations such as the type as a mean for identifying these 1. INTRODUCTION relations. The co-occurrence of spatial and non-spatial con- The Web of Data [6] is mostly comprised of a set of single cepts within the same textual resources, e.g., in the web, can concepts or real-world things termed concepts. Some of the be an indicator of a relationship between these spatial and concepts in the Web of Data have an associated spatial di- non-spatial concepts. Techniques are presented for learn- mension or location, e.g., the White House or the San Diego ing and modeling relations among spatial and non-spatial Zoo. We refer to these concepts as spatial concepts. We as- concepts from web textual resources. Co-occurring concepts sume that each location that is itself a concept and we refer are extracted and modeled as a graph of relations. This to it as a spatial concept. In contrast, other concepts do graph is used to infer the location types related to a con- not have an associated spatial dimension or location, e.g., cept. A location type can be a hospital, restaurant, an edu- pollution, crime, traffic, and health. We refer to these con- cational facility and so forth. Due to the immense number cepts as non-spatial concepts. Non-spatial concepts can have of relations that are generated from the extraction process, an implicit relation with other spatial concepts. For exam- a semantically-guided query processing algorithm is intro- ple, \crime", a non-spatial concept, can be related to spatial duced to prune the graph to the most relevant set of related concepts that have the following types, e.g., bus-stops and concepts. For each concept, a set of most relevant types are avenues. \Crime" and bus stops do not conceptually belong matched against the location types. Experiments evaluate to the same type. The question that this paper addresses the proposed algorithm based on its filtering efficiency and is the following: Given a non-spatial query concept, say X, the relevance of the discovered relationships. Performance how can we identify spatial concept types that are related to results illustrate how semantically-guided query processing X? For example, consider the following query: \Find Pol- can outperform the baseline in terms of efficiency and rele- lution in NYC". In the query, \Pollution" is the non-spatial vancy. The proposed approach achieves an average precision query concept. The answer to the query is a list of spa- of 74% across three different datasets. tial results that have the following types, e.g., bus stops, railroads, and garages. Given a location of interest such as \NYC" of the query, if we can find the locations of the bus stops, the railroads, and the garages, we can now geotag Permission to make digital or hard copies of all or part of this work for the non-spatial concept \pollution" on the matching location personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear types in NYC. this notice and the full citation on the first page. Copyrights for components The co-occurrence of these concepts within textual re- of this work owned by others than ACM must be honored. Abstracting with sources provides evidence for identifying the implicit rela- credit is permitted. To copy otherwise, or republish, to post on servers or to tions between spatial and non-spatial concepts. Most keyword- redistribute to lists, requires prior specific permission and/or a fee. Request based search engines retrieve relevant results based on the permissions from [email protected]. query keywords occurring in the textual resources. For ex- MobiGIS ’15 November 03-06, 2015, Bellevue, WA, USA c 2015 ACM. ISBN 978-1-4503-3977-3/15/11 ...$15.00 ample, answering the query: \Find Education in Seattle"can DOI: http://dx.doi.org/10.1145/2834126.2834138 obtain spatial results, e.g., schools and Universities that can only have the keyword \Education" (or its derivative forms) Los Angeles. The location of interest helps restrict the re- appearing explicitly in the corresponding textual resources. sults to a specific region. These two parameters are passed to In this paper, we propose to answer this type of queries by a query processing algorithm that learns the types of spatial identifying the relation between the non-spatial query con- concepts that are most related to the query. This learning cept and the spatial concepts types. We refer to this pro- process takes place on the global graph in the knowledge cess as type relatedness. Some notable applications that can store. First, the query processor filters out the spatial con- make of this proposal are semantic search over spatial data cepts that do not belong in the location of interest from the and a web directory for spatial locations similar to DMOZ global graph. Next, the query processor filters the remain- for websites. For example, consider the query: \Find Pollu- ing concepts based on a proposed set of semantic predicates tion in Indiana". The spatial concept \Wabash Valley Power (i.e., relations). Finally, the query processor identifies the Authority" that has Type \Power Plant" is related to the types of the remaining set. These types are used to geotag non-spatial query concept \Pollution", and hence needs to the non-spatial query concept with spatial concepts in the considered by the query. This work focuses on concepts as location specified by the query. defied by Wikipedia where concepts will have a Wikipedia article describing the concept. This implies also the appli- 1.1 Contribution cability of such approach for other languages as long as the The contributions of this paper are as follows. concept contains a multilingual entry in Wikipedia. In or- • We propose CGTag, a system for geotagging a non- der to be able to answer these types of queries, this paper spatial concept query with spatial concepts based on addresses the following two challenges. The first challenge type relatedness. is related to representing the co-occurrences of spatial and • We propose a semantic query-processing algorithm that non-spatial concepts within the same textual resources. We uses several Linked-Data filters. propose to create an undirected weighted graph that con- • We propose an evaluation method for type relatedness tains an edge between concepts that occur in the same tex- in addition to a baseline to determine the correctness tual resource. A textual resource can be a sentence, para- of the results. graph, article, web page or even a microblog entry. The The rest of paper proceeds as follows. Section2 presents second challenge is related to the traversal of the graph to the related work. Section3 illustrates how CGTag repre- infer the types of spatial concepts that are semantically re- sents the relation between co-occurring concepts. Section4 lated to the non-spatial concept in the query. We propose presents the architecture of CGTag and discusses its main a series of Linked-Data filters for pruning the results and components. Sections5 and6 present the experimental presenting the user with the most relevant types that relate setup and experimentalresults, respectively. Finally, Sec- spatial concepts to the non-spatial query concept. tion7 contains concluding remarks. This paper introduces a system for Geo-tagging concepts. Geo-tagging being the process of assigning a textual identi- 2. RELATED WORK fier, namely a concept, to a location.The proposed system, There has been a variety of studies performed for con- termed Concept Geotagger, dubbed \CGTag", operates in structing large knowledge bases. Some of these knowledge two phases:( i) an offline phase, and( ii) an online phase. bases are constructed in an automated fashion. These knowl- In the offline phase, CGTag extracts the co-occurring con- edge bases utilize unsupervised techniques that process web cepts in every textual resource. Then, CGTag creates a resources. For example, Linkedgeodata [15] converts data clique graph among these co-occurring concepts for every from OpenStreetMap to an RDF model. Linkedgeodata de- textual resource. We refer to these clique graphs as the local rives a lightweight ontology from the OpenStreetMap data. graphs. An edge between two concepts indicates the exis- Linkedgeodata also provides an interlinking dataset that tence of a co-occurrence relation and is assigned a weight of links its concepts with DBpedia, GeoNames, and other datasets. how frequent the relation has appeared across documents. Linkedgeodata also provides simple spatial semantic predi- The weight does not include the number of occurrences of cates (i.e., relations) based on proximity and the contain- the relation in the same document.

Load more