RKEEL: Using KEEL in R Code

RKEEL: Using KEEL in R Code Jose Moyano Luciano Sanchez University of Cordoba University of Oviedo Cordoba, Spain Oviedo, Spain Email: [email protected] Email: [email protected] Abstract—KEEL is a popular Java suite for a large number to concatenate experiments of other R packages or other R of different knowledge data discovery tasks. R is a emerging and utilities, which is a strong advantage. quite used programming language and environment for statistical The rest of the article is organized as follows: Section computing. In order to make a system where the advantages of KEEL and R language could be taken, it is created an R package II presents the preliminaries, Section III shows the general called RKEEL, allowing to use KEEL algorithms in simple R structure of RKEEL, Section IV explains how to use the code. The implemented R code layer between R and KEEL makes RKEEL package and its advantages, and Section V gives the easy both using KEEL algorithms in R as implementing new guidelines to implement new interfaces for KEEL algorithms algorithms for RKEEL in a very simple way. Thus allows making in RKEEL. Finally, Section VI presents the main conclusions a more complete experimentation process. and future work. I. INTRODUCTION II. PRELIMINARIES KEEL (Knowledge Extraction based on Evolutionary Learn- In this section, first it is analysed in detail the KEEL tool, ing) [1] is an open source suite [2] implemented in Java, de- focusing on its structure and how it works. Then, it is presented signed to use a large number of different knowledge data dis- RWeka, an R interface for using Weka. covery tasks. One of the strengths of KEEL is that it allows to A. KEEL design experiments based on data flow, with different datasets and computational intelligence algorithms, in order to assess As mentioned before, KEEL is a suite to assess algorithms the performance of the algorithms. It contains a wide variety for data mining problems including regression, classification, of classical knowledge algorithms, preprocessing techniques, clustering, pattern mining and so on, paying special attention computational intelligence based learning algorithms, hybrid to evolutionary algorithms. models, statistical methodologies for contrasting experiments The current version of KEEL (v3.0) has the following and so forth. So, with all these features, the user can perform a blocks: complete analysis of new computational intelligence proposals ∙ Data management: this part is composed by a set of tools to compare to existing ones. that can be used to build new data, export and import R [3] is a programming language and environment for data in other formats to KEEL format, data edition and statistical computing and graphics. It provides a wide variety visualization, apply transformations to data, and so on. of statistical (as linear and non-linear modelling, classical ∙ Experiments: the aim of this part is the design of statistical tests, classification, clustering, ...) and graphical experiments over some selected data sets. It provides techniques. One of the principal features of R is that it can be some types of experiments, as classification, regression, easily extended via packages. Any user can publish a package unsupervised learning and subgroup discovery, and a that extends the R basic configuration. They are available wide number of algorithms for the distinct types of principally through the CRAN (Comprehensive R Archive experiments. Figure 1 shows an example of an experiment Network). designed in KEEL. KEEL provides a simple GUI, useful for the design of ∙ Educational experiments: this part is similar to the Ex- experiments in a easy and intuitive way, but it is an in- periments one, but it allows to debug step-by-step the convenient if we want to use the KEEL algorithms from experiments in order to use it as a guideline of the source code, because KEEL does not have an API to use its learning process, to use with educational objectives. functionalities. In this way, one of our objectives is to link ∙ Modules: this block implements other functionalities, KEEL and R, providing an interface to use KEEL features as design of experiments of imbalanced learning, semi- in R source code, creating the RKEEL package and making supervised learning or multiple instance learning. Also it it available via CRAN repository. Thereby, with RKEEL we offers the possibility of doing non-parametric statistical could preprocess the data, use algorithms from KEEL, get analysis. the results and draw a plot with them, for example, all from Altogether, KEEL has implemented more than 500 algo- our R code. Also, handling with KEEL algorithms in R rithms of multiple types, as data preprocessing, classification makes the experiment process much more complete, allowing or regression among others. 978-1-5090-0626-7/16/$31.00 c 2016 IEEE 257 these learners classes. In conclusion, this package is a bridge between R and Weka classes to create Weka’s object in R. III. RKEEL STRUCTURE In this section, the main structure of RKEEL is explained, focusing in its main classes and its functionality. A. General structure of RKEEL RKEEL is an R package, which code is completely written in R and only uses the java .jar executable files of KEEL. The main objective of the package is to create a layer between R and the KEEL tool, to use its functionalities in an experiment Fig. 1. Example of KEEL experiment designed in R. As mentioned before, for a designed experiment, KEEL creates a folder with all the .jars and configuration files that are Once an experiment is generated and saved, KEEL creates necessary, and then the user have to execute the RunKeel.jar. a folder with the following structure: Thus, the main objective of our proposal is to create the full ∙ exe: it contains the .jar files of the algorithms included in experiment folder with all the necessary .jar and configuration the experiment. They are executed from the correspondent files, execute the experiment, get the results and remove the configuration files of each algorithm in the experiment. experiment folder. Thereby, in our R code, we create a learner ∙ scripts: it contains the configuration files for each algo- corresponding to an algorithm, we execute it and we get the rithm, the XML configuration file for the whole experi- results, but the process of the experiment folder creation is ment and the RunKeel.jar file, which is used to run the transparent to the user. experiment. Figure 2 shows the class diagram of RKEEL. As it can be ∙ datasets: it contains the dataset files required for the ex- seen, there is an abstract class called KeelAlgorithm which periment. Each dataset is in a separately sub-folder. Also, is the base class for all the implemented KEEL algorithms. it will store the results of the preprocessing algorithms. From this class will inherit the classes that defines the different ∙ results: it will contain the output files of each algorithm, types of algorithms, as ClassificationAlgorithm, RegressionAl- storing the results. gorithm or PreprocessAlgorithm. Then, all the implemented So, when the user wants to execute the experiment, he has interfaces will inherit from its corresponding class, as KNN to run the RunKeel.jar, which will read the configuration files or C45 from ClassificationAlgorithm for example, each one and execute the corresponding experiments (by its .jar files) declaring its parameters. In addition, they are included the with the corresponding datasets. When each algorithm finishes, classes KeelUtils (which contains several functions to deal the results are stored in the results folder as text, and the user with different aspects of the software, as parallelizing the can read them. experiment, dataset load, data handling and more), Classifica- Also, KEEL has a data repository on its web site where tionResults (which helps to obtain some classification results more than 900 datasets could be downloaded [4], including from the actual and predicted classes, which are the outputs of datasets for supervised classification, regression datasets, un- the classification algorithms) and RegressionResults (to obtain supervised learning datasets, low quality datasets and more. and store results from a regression algorithm). Most datasets can be downloaded full or in various types of B. Dependencies partitions as 5 or 10 folds cross-validation. The RKEEL package has some dependences to ensure its B. RWeka performance. First of all, and given by KEEL, this package As similar to our proposal, it is noteworthy to mention needs at least, Java version 7 installed on the computer. RWeka [5]. RWeka is an R interface to Weka [6], which Secondly, the RKEEL package uses some R packages from code is (mostly) implemented in R. Weka is a collection of the CRAN repository: machine learning algorithms for data mining tasks written in ∙ XML [8]: it includes many approaches for both reading Java, containing tools for data pre-processing, classification, and creating XML documents. Version 3.98-1.3 of XML regression, clustering, association rules, and visualization. So, package also needs at least version 2.1 of R. this package contains the interface code to use Weka function- ∙ R6 [9]: the R6 package allows the creation of classes alities from R code. with reference semantics, similar to R’s built-in reference It uses the rJava package [7] for low-level R/Java interfacing classes, but in a simpler and lighter-weight way, and they to provide access to Weka’s functionality. Thereby, the user are not built on S4 classes so they do not require the can create R versions of Weka’s classes with the usual “R methods package. These classes allow public and private look and feel”, because Weka provides abstract core classes for members, and they support inheritance, even when the its learners and a consistent functional methods interface for classes are defined in different packages.

Load more