Research on Multi-Level Association Rules Based on Geosciences Data
JOURNAL OF SOFTWARE, VOL. 8, NO. 12, DECEMBER 2013

Dongmei Han
School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, P.R. China
Shanghai Financial Information Technology Key Research Laboratory, Shanghai 200433, P.R. China

Yiyin Shi, Wen Wang and Yonghui Dai
School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China

Abstract—This paper proposes a framework for multi-source geo-knowledge discovery with association rules. Because spatial data are semantically fuzzy and the conversion between qualitative concepts and quantitative descriptions is uncertain, our study uses a conceptual partition algorithm and a membership grade judgment algorithm based on the cloud model. Meanwhile, concepts in the field of geoscience are correlated in many ways, and the underlying correlations need to be found with different membership grade functions; therefore, a method of multi-level association rule mining is proposed. To make frequent item set discovery more efficient, an improved FP (Frequent Pattern)-Growth algorithm is presented. The algorithm was applied in an empirical study on judging fault properties in the south of the Longmen Mountains in Chengdu, China. The empirical results show that the improved FP-Growth model performs better in frequent item set mining.

Index Terms—geosciences data, multi-level association rules, cloud model

This work was supported in part by the National Natural Science Foundation of China (No.41174007) and the Graduate Innovation Fund Program (No.CXJJ-2013-436, No.CXJJ-2013-445) of Shanghai University of Finance and Economics.
Corresponding author: Yonghui Dai.
© 2013 ACADEMY PUBLISHER doi:10.4304/jsw.8.12.3269-3276

I. INTRODUCTION

With the development of geophysical exploration, geochemical exploration, land surveying and remote sensing, a vast amount of associated data has been collected, which makes it possible for scientists to reveal the internal constitution of the earth. Scholarly contributions to multi-source geospatial knowledge discovery include research on geospatial data knowledge discovery and research on association rule algorithms. Fauvel, et al. (2012) studied a spatial–spectral kernel-based approach for the classification of remote-sensing images; the proposed method makes joint use of the spatial and spectral information provided by the images [1]. Linear features of remote sensing images were recognized with a Bayes classifier and a genetic algorithm [2]. Quintero, et al. (2012) presented FERD, a methodology that aims to automatically identify, extract and describe relevant spatial objects contained in raster spatial datasets [3]. Zheng and Zhou (2011) predicted location information with Warrants theory [4]. Other research on geospatial data knowledge discovery covers gravity flow among gravity anomalies using graph theory and shortest path theory, the visualization technologies used in spatial data mining and their typical applications, the seismicity of the Longmen Mountains fault zone and its vicinity before the 12 May 2008 Wenchuan Ms8.0 earthquake, etc. [5]-[7]. In addition, research on association rule algorithms mainly covers three aspects: algorithm efficiency, algorithm adaptability and geospatial uncertainty, described as follows.

1. Research on the efficiency of association rule algorithms. The Apriori algorithm was first proposed by Agrawal, who mined association rules based on candidate item sets [8]. Many scholars subsequently proposed improvements to the algorithm to make it more efficient. The classical ones are the AprioriTid algorithm, the AprioriHybrid algorithm, the transaction reduction technique and so on [9]-[10]. In addition, many Chinese scholars have used different data structures to further improve the efficiency of association rule discovery algorithms [11]-[13].

2. Research on algorithm adaptability. Studies on the applicability of the algorithm mainly cover multi-value attribute association rules, multilevel association rule mining, association rules with constraints, positive and negative association rules, classification association rules, and sequence association rules. Among these, association rules can be divided into two kinds: the boolean type (such as the Apriori algorithm) and the multi-value type. Li, et al. (2004) extended the original transactional database by the method of similar attribute sets, making association rules applicable to multi-value attributes [14]. Fdez, et al. (2009) presented a new fuzzy data-mining algorithm for extracting both fuzzy association rules and membership functions by means of a genetic learning of the membership functions and a basic method for mining fuzzy association rules [15]. Research on multilevel association rule mining mainly follows two kinds of model: one is the attribute-oriented induction method, the other is the method based on concept trees. Wang, et al. (2009) discussed the determination of attribute induction thresholds and multilayer multidimensional attribute generalization, and proposed the Hastu concept climb method based on concept lattice development to solve the problem that the granularity is too coarse or too fine to mine [16]-[17]. Manda, et al. (2013) presented a data mining approach called Multi-Ontology data mining at All Levels (MOAL) [18]. Hsu, et al. (2004) focused on using the correlation of multiple reference attribute thresholds to improve the efficiency of generalization and the accuracy of knowledge discovery, and gave the general path and approach of attribute induction. Knowledge discovery with the attribute-oriented generalization method focuses on how to control the information granularity to meet the requirements of rule discovery [19].

3. Research on geospatial uncertainty. Geospatial data bear many kinds of uncertainty, such as uncertainty caused by approximation in data sampling and model abstraction, by conversion between spatial concepts and spatial data, and so on. Faced with these unavoidable problems, many scholars began research that can be divided into two categories: the uncertainty of spatial relationships and spatial reasoning. Research on the uncertainty of spatial relations is mainly based on the following theories: probability theory, rough set theory, evidence theory, and cloud theory [20]-[22]. In geometric space, uncertainty reasoning between geometries is the main research direction of spatial reasoning. Viard and Lévy (2011) reviewed a typical example of decision making under uncertainty, where uncertainty visualization methods can actually make a difference [23]. Yao and Li (2008) first defined different geometric space features, then introduced spatial relations to define the relations between geometric bodies, and established a relevant inference mechanism [24], while Justice, et al. (2011) combined spatial attribute and non-spatial attribute inference mechanisms [25]. Up to now, many scholars have theoretically studied on spatial knowledge

adoption of visual translation, affected the efficiency of data processing.

(3) Visual interpretation is subjective; the interpretation results of different specialists are difficult to quantify and popularize.

Secondly, spatial data are fuzzy and uncertain. We often need expressions such as "around the top, 100 meters south of landslide displacement", yet the traditional method of data discretization is a hard division that assigns each concept to either one class or the other, so the uncertainty is not described accurately.

Thirdly, traditional association rule mining is only applicable to single-level data. Since geospatial data can often be processed hierarchically, for example by elevation, it is necessary to discover knowledge at different information granularities. For example, mining at a coarse granularity level finds the common meaning of knowledge, but the granularity is too coarse and may lead to over-generalization of the rules discovered. Therefore, one problem of traditional single-level association rules is that it is very difficult to mine effectively at the appropriate level to find useful knowledge.

Finally, the Apriori algorithm, which is based on candidate item sets, is widely used to generate frequent item sets, but its efficiency suffers greatly because the database must be scanned many times while generating them. The FP-Growth algorithm needs to scan the database only twice to generate frequent item sets thanks to its unique data structure (the FP-Tree), but it repeatedly traverses parent node paths while constructing the FP-Tree, so the FP-Tree grows explosively in the face of massive data, which still affects the efficiency of data mining.

II. KNOWLEDGE DISCOVERY FRAMEWORK BASED ON MULTI-SOURCE GEOSPATIAL DATA

To address the problems mentioned above in the field of multi-source geoscience knowledge discovery, this paper puts forward a complete geological knowledge discovery framework based on association rule mining, as shown in Fig. 1.
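
The efficiency contrast drawn above between candidate-based Apriori mining and FP-Growth comes down to how often the transaction database is read. The following is a minimal Python sketch of level-wise (Apriori-style) mining that counts its own database passes. It is not the paper's improved algorithm; the function names, the toy transactions, their item labels and the `min_support` value are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def count_support(transactions, candidates):
    """One full pass over the database, counting candidate occurrences."""
    counts = {}
    for t in transactions:
        if candidates is None:
            # First pass: every single item is a candidate.
            hits = (frozenset([item]) for item in t)
        else:
            hits = (c for c in candidates if c <= t)
        for c in hits:
            counts[c] = counts.get(c, 0) + 1
    return counts


def apriori_with_scan_count(transactions, min_support):
    """Level-wise mining; returns (frequent item sets, number of passes)."""
    transactions = [frozenset(t) for t in transactions]
    frequent, scans, k, candidates = {}, 0, 1, None
    while candidates is None or candidates:
        counts = count_support(transactions, candidates)
        scans += 1  # one database pass per candidate length k
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-item sets into k-item candidates, then prune
        # candidates that contain an infrequent (k-1)-item subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


# Hypothetical toy database; the labels are illustrative, not from the paper.
toy = [
    {"high_gravity", "steep_slope", "thrust_fault"},
    {"high_gravity", "steep_slope", "thrust_fault"},
    {"high_gravity", "steep_slope"},
    {"low_gravity", "gentle_slope"},
]
frequent, scans = apriori_with_scan_count(toy, min_support=2)
print(scans)  # 3 passes: one each for item sets of length 1, 2 and 3
```

In this sketch the number of passes grows with the length of the longest frequent item set, whereas FP-Growth, as described above, reads the database twice regardless of that length: once to count single items and once to build the FP-Tree.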