A Logical Framework for Frequent Pattern Discovery in Spatial Data

Donato Malerba, Floriana Esposito, Francesca A. Lisi
Dipartimento di Informatica - Università degli Studi di Bari
via Orabona 4 - 70126 Bari
{malerba | esposito | lisi}@di.uniba.it

From: FLAIRS-01 Proceedings. Copyright © 2001, AAAI (www.aaai.org). All rights reserved.
Copyright © 2000, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

In recent times, several extensions of data mining methods and techniques have been explored with the aim of dealing with advanced databases. Many promising applications of inductive logic programming (ILP) to knowledge discovery in databases have also emerged in order to benefit from the semantics and inference rules of first-order logic. In this paper, an ILP framework for frequent pattern discovery in spatial data is presented. The pattern discovery algorithm operates on first-order logic descriptions computed by an initial step of feature extraction from a spatial database. The algorithm benefits from the available background knowledge on the spatial domain and systematically explores the hierarchical structure of task-relevant geographic layers. Preliminary results have been obtained by running the algorithm SPADA on spatial data from an Italian province.

1 Introduction

In recent times, several extensions of data mining methods and techniques have been explored to deal with advanced databases such as spatial databases, temporal databases, object-oriented databases and multimedia databases. Progress in spatial databases, such as spatial data structures (Güting 1994), spatial reasoning (Egenhofer 1991), computational geometry (Preparata and Shamos 1985), etc., paved the way for the study of knowledge discovery in spatial databases, which aims at the extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases (Koperski, Adhikary and Han 1996). Generally speaking, a spatial pattern is a pattern showing the interaction of two or more spatial objects or space-depending attributes according to a particular spacing or set of arrangements (DeMers 2000). For instance, cities across nations are often clustered near lakes, oceans and streams. Such an arrangement reveals a spatial association, meaning that one spatial pattern is totally or partially related to some other spatial pattern. Furthermore, questions can be raised about the causes not only of single distributions but also of spatially correlated distributions of phenomena. For instance, we may explain that the tendency of cities to cluster near water bodies is driven by the need for sources of drinking water and recreation and for commerce facilities. Thus, once the language of geography has been acquired, the major tasks among geographers are to observe the relevant spatial features, to identify spatial patterns, to describe and quantify spatial associations and to elicit explanations for pattern interactions. With the advent of geographical information systems (GIS), advanced functionalities in spatial data mining such as frequent pattern discovery are of great interest to GIS users.

The design of algorithms for frequent pattern discovery has turned out to be a popular topic in data mining. This is not surprising given the relevance of data and patterns in the definition of data mining as a core step in the KDD process (Fayyad, Piatetsky-Shapiro, Smyth 1996). The blueprint for most algorithms proposed in the literature is the levelwise method by Mannila and Toivonen (1997), which is based on a breadth-first search in the lattice spanned by a generality order between patterns. The space is searched one level at a time, starting from the most general patterns and iterating between candidate generation and candidate evaluation phases. Frequent patterns are commonly not considered useful for presentation to the user as such. They can be efficiently post-processed into rules that exceed given threshold values. In the case of association rules, the threshold values of support and confidence offer a natural way of pruning weak and rare rules (Agrawal and Srikant 1994).

In this paper, we propose a logical framework for frequent pattern discovery in spatial data. The main novelty with respect to previous contributions to spatial data mining (Koperski and Han 1995) is the expressive power of the language chosen for representing both data and patterns. Indeed, the research to date in the field has generally taken the path of merely embedding spatial constructs on top of well-established statistical techniques in order to accommodate the space dimension (Roddick and Spiliopoulou 1999). We advocate the application of Inductive Logic Programming (ILP) methods and techniques (Lavrac and Dzeroski 1994) to knowledge discovery in spatial databases in order to benefit from the semantics and inference rules of first-order logic.

The paper is organized as follows. Section 2 introduces the task of mining spatial association rules, viewed as the context for frequent pattern discovery in spatial data. In Section 3, representation, problem and algorithmic issues in the ILP approach to the task at hand are discussed and illustrated by means of a sample task of frequent pattern discovery in data of an Italian province. Conclusions and future work are given in Section 4.

2 The mining task

The discovery of spatial association rules is a descriptive mining task aiming at the detection of associations between reference objects and some task-relevant objects. The former are the main subject of the description, while the latter are spatial objects that are relevant for the task at hand and spatially related to the former. The discovery process may be activated by a user query expressed in a database mining query language, such as

    MINE ASSOCIATIONS DESCRIBING "large_towns"
    WITH RESPECT TO topology(T.geo, R.geo), R.name,
                    topology(T.geo, W.geo), W.name,
                    topology(T.geo, B.geo), B.admin_region2
    FROM town T, road R, water W, boundary B
    WHERE T.type = "large"
      AND distance(T.geo, R.geo) < "5 km"
      AND distance(T.geo, W.geo) < "5 km"
      AND distance(T.geo, B.geo) < "30 km"

where large towns play the role of reference objects, while roads, water bodies and boundaries play the role of geographic layers from which task-relevant objects are taken. Query processing involves massive spatial computation to extract spatial relations from the underlying spatial database. Some kind of taxonomic knowledge on task-relevant geographic layers may also be taken into account to get descriptions at different concept levels (multiple-level association rules). As usual in the problem setting of association rule mining, we search for associations with large support and high confidence (strong rules).

Formally, the problem can be stated as follows:

Given
  • a spatial database SDB,
  • a set of reference objects S,
  • some task-relevant geographic layers R_k, 1 ≤ k ≤ m, together with spatial hierarchies defined on them,
  • a pair of thresholds for each level l in the spatial hierarchies, minsup[l] and minconf[l]
Find strong multiple-level spatial association rules.

The most representative work in the literature for the mining task of interest is the progressive-refinement method by Koperski and Han (1995). It relies on the so-called attribute-value (AV) approach to data mining, [...] 1998). Anyway, no insight into the algorithmic issues has been provided. A proposal of a logical framework inspired by the work on mining association rules from multiple relations by Dehaspe and De Raedt (1997) is sketched in the following section.

3 The logical framework

The basic idea in our proposal of a logical framework is that a spatial database boils down to a deductive relational database (DDB) once the spatial relationships between reference objects and task-relevant objects have been extracted. Indeed, DDBs define relations both extensionally as ground facts (extensional database, EDB) and intensionally as rules (intensional database, IDB). Thus, the expressive power of first-order logic in databases allows one to specify background knowledge (BK) such as spatial hierarchies, spatial constraints and rules for spatial qualitative reasoning.

3.1 Representation issues

Let L = {a1, a2, ..., an} be a set of Datalog atoms of the form p(t1, ..., tk), where each term tj may be either a variable or a constant (Ceri, Gottlob and Tanca 1989). A conjunction of atoms is named an atomset. In our framework, patterns are represented as atomsets. Since the ILP approach operates in the context of a DDB, we denote the DDB at hand by D(S), to mean that it is obtained by adding the spatial relations extracted from SDB for the set of reference objects S to the previously supplied BK. The tuples in D(S) can be grouped into distinct subsets. Each group, uniquely identified by the corresponding reference object s ∈ S, is called a spatial observation and denoted O[s]. A spatial observation is multi-key, namely it contains not only spatial relations between the reference object s ∈ S and some task-relevant object r_j ∈ R_k but also spatial relations between r_j and some s' ∈ S. Thus, a spatial observation is given by

    O[s] = O[s|s] ∪ {O[r_j|s] | ∃ tuple θ ∈ D(S): θ(s, r_j)}

where O[r_j|s] is the observation with key r_j given s.

Example 1. Suppose the mining task is to discover associations relating large towns (S) with water bodies (R1), roads (R2) and province boundaries (R3) in the Province of Bari, Italy. We are also given a BK including the spatial hierarchies of interest (see Figure 1 for a graphical representation of the layer of roads).

    spatial_hierarchy(town, 1, null, [town]).
    spatial_hierarchy(town, 2, town, [large_town,
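
As a rough illustration of the spatial observation O[s] defined in Section 3.1, the following Python sketch groups ground facts by reference object. It is not the SPADA implementation. The `Fact` type, the function `spatial_observation`, the predicate names and the objects `town_a`, `road_1`, etc. are all hypothetical and invented for this example. The sketch also assumes that each O[r_j|s] collects the relations linking r_j to reference objects other than s.

```python
from typing import NamedTuple


class Fact(NamedTuple):
    """A ground spatial relation between a reference object and a task-relevant object."""
    predicate: str
    ref_obj: str
    task_obj: str


def spatial_observation(s: str, facts: set[Fact]) -> set[Fact]:
    """Return a sketch of O[s] for the reference object s (assumed reading of the definition)."""
    # O[s|s]: relations in which s itself is the reference object.
    own = {f for f in facts if f.ref_obj == s}
    # Task-relevant objects r_j that are spatially related to s.
    related = {f.task_obj for f in own}
    # Union of the O[r_j|s]: relations linking those r_j to other reference objects s'.
    linked = {f for f in facts if f.task_obj in related and f.ref_obj != s}
    return own | linked


# Hypothetical facts, invented for illustration only (not taken from the paper's data).
FACTS = {
    Fact("intersects", "town_a", "road_1"),
    Fact("adjacent_to", "town_a", "water_1"),
    Fact("intersects", "town_b", "road_1"),
    Fact("intersects", "town_c", "road_2"),
}

obs_a = spatial_observation("town_a", FACTS)
obs_c = spatial_observation("town_c", FACTS)
```

Here `obs_a` contains the two facts about `town_a` plus the fact linking `road_1` to `town_b`, which is what makes the observation multi-key, while `obs_c` contains only the single fact about `town_c`.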
