The Application of Apriori Algorithm in Predicting Flood Areas

The Application of Apriori Algorithm in Predicting Flood Areas

View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by International Journal on Advanced Science, Engineering and Information Technology Vol.7 (2017) No. 3 ISSN: 2088-5334 The Application of Apriori Algorithm in Predicting Flood Areas Nur Ashikin Harun#1, Mokhairi Makhtar#2, Azwa Abd Aziz#3, Zahrahtul Amani Zakaria#4, Fadzli Syed Abdullah#5, Julaily Aida Jusoh#6 #Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, Tembila, Besut, Terengganu, Malaysia E-mail: [email protected], [email protected], [email protected], [email protected], [email protected], 6 [email protected] Abstract — The changing of physical characteristics of the hydrological system have caused a lot of natural phenomenon, which leads to flooding as one of the major problems that cause economic damages and affect people’s life. Therefore, the need for a systematic and comprehensive approach to flood area prediction is needed. This research proposed a flood area prediction model with the application of Apriori algorithm towards hydrological data sets. Department of Irrigation and Drainage Malaysia supply the data sets and flood report from year 2009 to 2015 (November until January) which consist of 7 district. The research begins with the data selection, pre-process the data, and data transformation, then the cleaned data will be tested with the Apriori algorithm. The rules will be evaluating using support, confidence and lift value to rank either it is best rules or not. The results show that each district generates best and crucial rules which consist the association of the villages and water level. Thus, hopefully the result can be use in flood management and can give early an early warning to the villagers at flood risk area. Keywords— data mining; Apriori algorithm; association rule; flood disaster; knowledge discovery in database unpredictable. I. INTRODUCTION Therefore, this research applies data mining toward Floods have been befalling throughout the Earth history hydrological data to obtain new knowledge. Data mining is a and are estimated to endlessly occur as the water cycle step that applies data analysis and algorithms to produce a continues to run. According to the United Nations Office for particular enumeration of patterns or model over the data [8]. Disaster Risk Reduction (UNISDR), flooding is the biggest Data mining has various methods and techniques that could natural disaster around the world for the year 1980 until be applied in many fields of research as a problem solver and 2011 with mark 3455 events all around the world. Besides, also important to decision makers. Association rules are one flood is the most devastating disaster and cause a lot of of the major techniques of data mining. In [9] discovered the damages and trauma towards sufferers [1], [2]. The factors association rules when they need to do research on sales that cause flood at a certain area are geographical condition, pattern on a large database. For instance, association rules metrological condition, planning problem, hydrological are mining large datasets to find frequent item sets by condition and environmental status due to human activities considering the minimum support and minimum confidence [3]. Due to the damages and loss, the effective management [10]. Apriori Algorithm is the algorithm to mine frequent of flood risk is a spark issue all around the globe to have item sets that satisfied support and confidence by generating better management and prevention [4], [5]. candidates in the process of joining and pruning [11]. This In Malaysia, there are always flood occurrence event, research uses Apriori Algorithm compare to other algorithms especially during monsoon season. In Malaysia, there are because it reduces the number of scans in the database to distinct dry and rainy seasons with rainfall annually 3000 m, extract frequent item sets. Therefore, it maximizes the and an average of humidity is 80%. There is a total of 189 computational workload [12]. The Apriori algorithm had river basins in Malaysia including Sabah and Sarawak, been applied in many fields such as medical [13], [14], which the main channels are flowing to the South China Sea education [15], weather forecasting [16] and disaster and 85 of them are disposed to erratic flood [6]. For this management [17], [18]. research, Terengganu which is one of the low-lying area [7] This research aims to identify the association between were chosen as the research area. Terengganu experienced water level and flood area during the monsoon season. Northeast monsoon season, which exposed to heavy rainfall Besides, a model was developed in this study by every early November and ends in March. Due to this implementing association rule mining to predict the flood phenomenon, floods always occur which sometimes area in Terengganu. In flood forecasting, past research 763 focuses more on the application of remote sensing [19]. Thus, B. Research Workflow this research applies Apriori Algorithm to predict flood area In order to accomplish the research objectives, a process based on attributes and instances. Despite the vast number of flow is designed with the research approach. studies available in the literature, the current study that uses According to Fig. 2, the research starts with the collecting data mining approaches can contribute to flood area flood data from Department of Irrigation and Drainage prediction using hydrological data with cost effectiveness, Malaysia, Department Irrigation and Drainage Terengganu reliable results and help in flood management in research and Terengganu Flood Portal. The data is selected to create area [20]. the desired datasets. Then, this dataset will be integrated to develop a dataset to be mined using the algorithm. The II. MATERIAL AND METHOD research finds that the output of phase one is the flood This section explains the study area, the process of the datasets, which is the collation of some (or all) data from the research and also the data used and the algorithm that had aforementioned data resources. In fact, in phase two, the been applied. association rule algorithm will be run using the flood datasets to generate association rule of flood area. Finally, A. Study Area the rules will be used at phase three to create a model of The research focuses on Terengganu that consists of 7 flood area prediction. districts. The districts are Marang, Dungun, Setiu, Kemaman, Moreover, this study follows the Knowledge Discovery in Besut, Kuala Terengganu and Hulu Terengganu. Terengganu Database process flow as to apply the data mining in the is located in the North East Malaysia with the latitude of 3o research. The Knowledge Discovery Database has five 53’U-5o 50’U and longitude of 102o 23’T-103o 30’T. Fig. 1 phases which are the selection of data, data pre-processing, shows the study area. Each district will generate best rules data transformation, data mining and lastly the interpretation which show the association of village during flood happens. of the data into desire reports. The phase could be loop or Thus, using the rules will we see the correlation of the rules iterative to produce the output to be used in the next phase. with the water level at the main river and selected stations. Fig. 3 illustrates the phases of the KDD model. Fig. 1 Research area Fig. 2 Proposed model of flood prediction 764 Fig. 3 KDD process This research used secondary data which come from For this research, we used Apriori algorithm to test on the reports, newspaper and also data storage. The raw data for datasets collected. Association rules have lots of other this research mainly come from the Malaysian Irrigation and algorithms, but the Apriori is the most suitable algorithm for Drainage Department. In order to extract the data, the this research because of the particular data set and also the selection of data set and subset requires an understanding of rules extraction solve the research problems. The advantage the domain. In this research, the main emphasis is on finding of this algorithm is it will shrink the search space in term of a correlation between river flow and flood area and “if an item set Z is not frequent then for any item A, Z U A developing a model to predict flood area by implementing will not be frequent”. association rule. At first, the data selection [22] phase is the In obtaining the rules, there are two major steps applied initial phase to create a target data. At pre-processing phase, [21] which are: the target data will be cleaned from the missing values, • Find all sets of items that have support value greater outliers, inconsistent and much more. Then, at data than the minimum support. These item sets are called transformation, the data will be transforming into the format large item sets. All others are called small item sets. of destination data. While this research uses WEKA as the • Use the large item sets to generate the desired rules. It aided tools, the data are in “.csv” format to make sure the begins with finding all non-empty subsets of every WEKA can process and load it. In data mining phase, the large item set L. For every such subset A generate a association rules are applied to the data sets. Basically, the rule of the form A => (L-A) if the ratio of the support association rules apply if/then statements to discover the (A) is at least minconf. All subsets of L must be relationship between unrelated data in the data repository. considered to generate rules with multiple The basic formula of association rules is consequences. Additionally, we set up the rank of rules that will be A => B (1) generated according to lift metric.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    7 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us