Identifying Effective Features and Classifiers for Short Term Rainfall Forecast Using Rough Sets Maximum Frequency Weighted Feature Reduction Technique
CIT. Journal of Computing and Information Technology, Vol. 24, No. 2, June 2016, 181–194
doi: 10.20532/cit.2016.1002715

Sudha Mohankumar1 and Valarmathi Balasubramanian2
1Information Technology Department, School of Information Technology and Engineering, VIT University, Vellore, India
2Software and Systems Engineering Department, School of Information Technology and Engineering, VIT University, Vellore, India

Precise rainfall forecasting is a common challenge across the globe in meteorological predictions. As rainfall forecasting involves rather complex dynamic parameters, the demand for novel approaches that improve forecasting accuracy has increased. Recently, Rough Set Theory (RST) has attracted a wide variety of scientific applications and is extensively adopted in decision support systems. Although there are several weather prediction techniques in the existing literature, identifying significant input for modelling effective rainfall prediction is not addressed in the present mechanisms. Therefore, this investigation has examined the feasibility of using rough set based feature selection and data mining methods, namely Naïve Bayes (NB), Bayesian Logistic Regression (BLR), Multi-Layer Perceptron (MLP), J48, Classification and Regression Tree (CART), Random Forest (RF), and Support Vector Machine (SVM), to forecast rainfall. Feature selection, or reduction, is the process of identifying a significant feature subset, in which the generated subset must characterize the information system as a complete feature set. This paper introduces a novel rough set based Maximum Frequency Weighted (MFW) feature reduction technique for finding an effective feature subset for modelling an efficient rainfall forecast system. The experimental analysis and the results indicate substantial improvements of prediction models when trained using the selected feature subset. CART and J48 classifiers have achieved an improved accuracy of 83.42% and 89.72%, respectively. From the experimental study, relative humidity2 (a4) and solar radiation (a6) have been identified as the effective parameters for modelling rainfall prediction.

ACM CCS (2012) Classification: Computing methodologies → Machine learning → Machine learning algorithms → Feature selection; Information systems → Information retrieval → Retrieval tasks and goals → Clustering and classification; Applied computing → Operations research → Forecasting

Keywords: rainfall prediction, rough set, maximum frequency, optimal reduct, core features and accuracy

1. Introduction

Rainfall forecast serves as an important disaster prevention tool. Agricultural yields and agriculture based industrial development, more often than not, rely on natural water resources like rain and rain-bound static water bodies for productivity. Rainfall prediction or forecasting is not a simple process; rather, it is a demanding scientific task, and precise prediction is a challenging job for meteorological scientists across the world. Prediction demands attention as the atmospheric factors that determine the rainfall event are highly dynamic. Thus, determining the future from the available historical data is not a simple task, as it is influenced by the current dynamics of the atmosphere. [1] reiterated the importance of predicting torrential downpours in order to minimize natural disasters before the events occur. [2] reported the Weather Research Forecast (WRF) model as a suitable heavy rainfall event prediction model. The experimental results proved that their proposed model could predict rainfall in significant consistency with real measurements. [3] described a feature selection approach using a genetic algorithm for heavy rainfall predictions in South Korea. They used weather data collected from the European Medium Range Weather Forecast centre between 1989 and 2009. Despite the existence of several works related to rainfall prediction, there is always a demand for better prediction techniques due to the magnitude of the impact rainfall has on the environment.

The foremost intent of this research is to accomplish improved prediction accuracy through the use of effective weather parameters, using the proposed rough set feature selection and data mining methods. This research is different from previous work as it delves into identifying the influencing features that enable efficient rainfall forecast. The proposed model focuses not only on feature reduction but also on finding a list of parameters contributing to precision, instead of just finding a subset or attaining dimensionality reduction.

2. Foundation

Rough Set Theory (RST), as proposed by Z. Pawlak in 1982, is a mathematical model that deals with vague and imperfect knowledge. RST does not require any prior knowledge or additional information about the data [4] [5] [6]. In rough set, data analysis starts from a table referred to as decision or information table [...]

[...] $r \in A$. $S : X \times A \rightarrow V$ is an information function, which designates each object feature value in $V$ that constitutes an information system.

Definition 2. Decision Table

It is a finite set of objects that constitute an information system description represented as a table. The decision table consists of a finite set of conditional and decision features. The sample information table of this rainfall forecast system is presented in Table 1. The dataset has eight conditional attributes or features represented as (a1, a2, a3, a4, a5, a6, a7, and a8) and one decision variable (Rf).

Definition 3. Approximation Space

In an information system I = {X, A, V, S}, the rough set concept can be defined by means of topological operations, i.e., interior and closure, called approximations. The key concept of the presented rough set approach is the mathematical formulation of the concept of equality of sets defined as approximation space [6].

Definition 4. Discernible and Indiscernible Relation

For an information system I = (O, A), O is a non-empty set of objects and A is a non-empty set of features, where each {SA} is a subset of the features of {A}. 'R' is the equivalence relation, called the indiscernibility relation, on the set {I} that contains the elements with similar feature values; its complement is the discernible relation. This equivalence re- [...]

[...] accuracy is determined using the lower and upper approximation space.

Definition 6. Reduct and Core

The two fundamental concepts of RST are core and reduct. The reduct set {R} is the indispensable part of an information system with all possible subsets of features. A reduct set can discern all objects as the complete feature set {A} does. The core is the indispensable feature of reducts, where

$$\text{Core} = \{R_1\} \cap \{R_2\} \cap \{R_3\} \cap \dots \cap \{R_n\} \qquad (1)$$

The proposed rough set based Maximum Frequency Weighted (MFW) feature reduction model is designed based on RST. This model is used to reduce the feature space of the meteorological (rainfall) dataset and to identify the significant features. Initially, the required possible combinations of feature reduct sets are generated using Rosetta (http://rosetta.org), an open source Rough Set Toolkit for data analysis [11]. The required target input for the proposed algorithm is determined using Rosetta's genetic algorithm based feature reduction technique. These sets of reducts are then used in identifying the effective features based on the novel feature ranking based frequency weighted feature reduction approach. In the next phase, the impact of the succeeding set of feature subsets is evaluated based on the performance of classification models. The proposed algorithm background and implementation details are described in subsection 5.2. The de- [...]

[...] features in the dataset based on discernibility matrix. [14] described a feature selection algorithm based on a modified binary discernibility matrix that dealt directly with inconsistent decision-making systems. The ordering and simple link techniques in the algorithm reduced the size of the input table, which, in turn, reduced the computation and storage space. [15] reported a two-step feature selection technique. A discretisation process that reduces the domain of continuous features with an irreducible and optimal set of cuts was adopted, based on the discernibility matrix. [16] described a reduct optimization method based on the conditional features to generate representative data to simplify the discernibility matrix. [17] described a reduct construction method based on discernibility matrix simplification. Elements of a minimum discernibility matrix are either the empty set or singleton subsets, whose union derives a reduct using heuristic reduct algorithms. [18] investigated the Support Vector Machine (SVM) classifier as a suitable model for rainfall forecasting. [19] utilized a new fuzzy based feature selection approach for a medical dataset in tumour diagnosis. In this approach, a feature selection method based on the fuzzy gain ratio under the framework of fuzzy RST was described with experimental results. They demonstrated the effectiveness of the proposed approach by comparing it with several other approaches on three real world tumour datasets in gene expression.