The WEKA Workbench

Eibe Frank, Mark A. Hall, and Ian H. Witten

Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, Morgan Kaufmann, Fourth Edition, 2016

Contents

1 Introduction to Weka
   1.1 What's in WEKA?
   1.2 How do you use it?
   1.3 What else can you do?
   1.4 How do you get it?
   1.5 The package management system

2 The Explorer
   2.1 Getting started
      2.1.1 Preparing the data
      2.1.2 Loading the data into the Explorer
      2.1.3 Building a decision tree
      2.1.4 Examining the output
      2.1.5 Doing it again
      2.1.6 Working with models
      2.1.7 When things go wrong
   2.2 Exploring the Explorer
      2.2.1 Loading and filtering files
      2.2.2 Converting files to ARFF
      2.2.3 Using filters
      2.2.4 Training and testing learning schemes
      2.2.5 Using a metalearner
      2.2.6 Clustering and association rules
      2.2.7 Attribute selection
      2.2.8 Visualization
   2.3 Filtering algorithms
      2.3.1 Unsupervised attribute filters
         2.3.1.1 Adding and removing attributes
         2.3.1.2 Changing values
         2.3.1.3 Conversions
         2.3.1.4 String conversion
         2.3.1.5 Time series
         2.3.1.6 Randomizing the attributes
      2.3.2 Unsupervised instance filters
         2.3.2.1 Randomizing and subsampling
         2.3.2.2 Sparse instances
      2.3.3 Supervised filters
         2.3.3.1 Supervised attribute filters
         2.3.3.2 Supervised instance filters
   2.4 Learning algorithms
      2.4.1 Bayesian classifiers
      2.4.2 Trees
      2.4.3 Rules
      2.4.4 Functions
      2.4.5 Neural networks
      2.4.6 Lazy classifiers
      2.4.7 Miscellaneous classifiers
      2.4.8 Metalearning algorithms
         2.4.8.1 Bagging and randomization
         2.4.8.2 Boosting
         2.4.8.3 Combining classifiers
         2.4.8.4 Cost-sensitive learning
         2.4.8.5 Optimizing performance
         2.4.8.6 Retargeting classifiers for different tasks
   2.5 Clustering algorithms
   2.6 Association rule learners
   2.7 Attribute selection
      2.7.1 Attribute subset evaluators
      2.7.2 Single-attribute evaluators
      2.7.3 Search methods

3 The Knowledge Flow Interface
   3.1 Getting started
   3.2 Knowledge Flow components
   3.3 Configuring and connecting the components
   3.4 Incremental learning

4 The Experimenter
   4.1 Getting started
   4.2 Running an experiment
   4.3 Analyzing the results
   4.4 Simple setup
   4.5 Advanced setup
   4.6 The Analyze panel
   4.7 Distributing processing over several machines

5 The Command-Line Interface
   5.1 Getting started
      5.1.1 weka.Run
   5.2 The structure of WEKA
      5.2.1 Classes, instances, and packages
      5.2.2 The weka.core package
      5.2.3 The weka.classifiers package
      5.2.4 Other packages
      5.2.5 Javadoc indexes
   5.3 Command-line options
      5.3.1 Generic options
      5.3.2 Scheme-specific options

6 Embedded Machine Learning
   6.1 A simple data mining application
   6.2 main()
   6.3 messageClassifier()
   6.4 updateData()
   6.5 classifyMessage()

7 Writing New Learning Schemes
   7.1 An example classifier
      7.1.1 buildClassifier()
      7.1.2 makeTree()
      7.1.3 computeInfoGain()
      7.1.4 classifyInstance()
      7.1.5 toSource()
      7.1.6 main()
   7.2 Conventions for implementing classifiers
      7.2.1 Capabilities

Chapter 1: Introduction to Weka

The WEKA workbench is a collection of machine learning algorithms and data preprocessing tools that includes virtually all the algorithms described in our book. It is designed so that you can quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning. As well as a wide variety of learning algorithms, it includes a wide range of preprocessing tools. This diverse and comprehensive toolkit is accessed through a common interface so that its users can compare different methods and identify those that are most appropriate for the problem at hand.

WEKA was developed at the University of Waikato in New Zealand; the name stands for Waikato Environment for Knowledge Analysis. Outside the university, the weka, pronounced to rhyme with Mecca, is a flightless bird with an inquisitive nature found only on the islands of New Zealand. The system is written in Java and distributed under the terms of the GNU General Public License. It runs on almost any platform and has been tested under Linux, Windows, and Macintosh operating systems.

1.1 What’s in WEKA?

WEKA provides implementations of learning algorithms that you can easily apply to your dataset. It also includes a variety of tools for transforming datasets, such as algorithms for discretization and sampling. You can preprocess a dataset, feed it into a learning scheme, and analyze the resulting classifier and its performance—all without writing any program code at all.

The workbench includes methods for the main data mining problems: regression, classification, clustering, association rule mining, and attribute selection. Getting to know the data is an integral part of the work, and many data visualization facilities and data preprocessing tools are provided. All algorithms take their input in the form of a single relational table that can be read from a file or generated by a database query.

One way of using WEKA is to apply a learning method to a dataset and analyze its output to learn more about the data. Another is to use learned models to generate predictions on new instances. A third is to apply several different learners and compare their performance in order to choose one for prediction. In the interactive WEKA interface you select the learning method you want from a menu. Many methods have tunable parameters, which you access through a property sheet or object editor. A common evaluation module is used to measure the performance of all classifiers.

Implementations of actual learning schemes are the most valuable resource that WEKA provides. But tools for preprocessing the data, called filters, come a close second. Like classifiers, you select filters from a menu and tailor them to your requirements.

1.2 How do you use it?

The easiest way to use WEKA is through a graphical user interface called the Explorer. This gives access to all of its facilities using menu selection and form filling. For example, you can quickly read in a dataset from a file and build a decision tree from it. The Explorer guides you by presenting options as forms to be filled out. Helpful tool tips pop up as the mouse passes over items on the screen to explain what they do. Sensible default values ensure that you can get results with a minimum of effort—but you will have to think about what you are doing to understand what the results mean.

There are three other graphical user interfaces to WEKA. The Knowledge Flow interface allows you to design configurations for streamed data processing. A fundamental disadvantage of the Explorer is that it holds everything in main memory—when you open a dataset, it immediately loads it all in. That means that it can only be applied to small- to medium-sized problems. However, WEKA contains some incremental algorithms that can be used to process very large datasets. The Knowledge Flow interface lets you drag boxes representing learning algorithms and data sources around the screen and join them together into the configuration you want. It enables you to specify a data stream by connecting components representing data sources, preprocessing tools, learning algorithms, evaluation methods, and visualization modules. If the filters and learning algorithms are capable of incremental learning, data will be loaded and processed incrementally.

WEKA's third interface, the Experimenter, is designed to help you answer a basic practical question when applying classification and regression techniques: Which methods and parameter values work best for the given problem? There is usually no way to answer this question a priori, and one reason we developed the workbench was to provide an environment that enables WEKA users to compare a variety of learning techniques. This can be done interactively using the Explorer. However, the Experimenter allows you to automate the process by making it easy to run classifiers and filters with different parameter settings on a corpus of datasets, collect performance statistics, and perform significance tests. Advanced users can employ the Experimenter to distribute the computing load across multiple machines using Java remote method invocation. In this way you can set up large-scale statistical experiments and leave them to run.

The fourth interface, called the Workbench, is a unified graphical user interface that combines the other three (and any plugins that the user has installed) into one application. The Workbench is highly configurable, allowing the user to specify which applications and plugins will appear, along with settings relating to them.

Behind these interactive interfaces lies the basic functionality of WEKA. This can be accessed in raw form by entering textual commands, which gives access to all features of the system. When you fire up WEKA you have to choose among five different user interfaces via the WEKA GUI Chooser: the Explorer, Knowledge Flow, Experimenter, Workbench, and command-line interfaces. Most people choose the Explorer, at least initially.

1.3 What else can you do?

An important resource when working with WEKA is the online documentation, which has been automatically generated from the source code and concisely reflects its structure. We will explain how to use this documentation. We will also identify WEKA's major building blocks, highlighting which parts contain supervised learning methods, which contain tools for data preprocessing, and which contain methods for other learning schemes. The online documentation gives the only complete list of available algorithms because WEKA is continually growing and—being generated automatically from the source code—the online documentation is always up to date. Moreover, it becomes essential if you want to proceed to the next level and access the library from your own Java programs or write and test learning schemes of your own.

In most data mining applications, the machine learning component is just a small part of a far larger software system. If you intend to write a data mining application, you will want to access the programs in WEKA from inside your own code. By doing so, you can solve the machine learning subproblem of your application with a minimum of additional programming. We show how to do that by presenting an example of a simple data mining application in Java. This will enable you to become familiar with the basic data structures in WEKA, representing instances, classifiers, and filters.

If you intend to become an expert in machine learning algorithms (or, indeed, if you already are one), you will probably want to implement your own algorithms without having to address such mundane details as reading the data from a file, implementing filtering algorithms, or providing code to evaluate the results. If so, we have good news for you: WEKA already includes all this. To make full use of it, you must become acquainted with the basic data structures. To help you reach this point, we will describe these structures in more detail and explain an illustrative implementation of a classifier.
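To give a first impression of what calling WEKA from your own code looks like, here is a minimal sketch that loads a dataset, builds a decision tree, and classifies one instance. It uses WEKA's public API (DataSource, Instances, and the J48 classifier); the file name is illustrative, and the full treatment appears in the later chapters on embedded machine learning.

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EmbeddingSketch {
  public static void main(String[] args) throws Exception {
    // Load a dataset (ARFF, CSV, ...) using WEKA's converter utilities.
    Instances data = DataSource.read("weather.arff"); // illustrative path
    // Declare the last attribute to be the class attribute.
    data.setClassIndex(data.numAttributes() - 1);

    // Build a J4.8 decision tree on the full dataset.
    Classifier tree = new J48();
    tree.buildClassifier(data);

    // Classify the first instance and print the predicted class label.
    Instance first = data.instance(0);
    double prediction = tree.classifyInstance(first);
    System.out.println("Predicted: " + data.classAttribute().value((int) prediction));
  }
}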

1.4 How do you get it?

WEKA is available from http://www.cs.waikato.ac.nz/ml/weka. You can download either a platform-specific installer or an executable Java jar file that you run in the usual way if Java is installed. We recommend that you download and install it now, and follow through the examples in the upcoming sections.

1.5 The package management system

The WEKA software has evolved considerably since the third edition of this book was published. Many new algorithms and features have been added to the system, a number of which have been contributed by the community. With so many algorithms on offer we felt that the software could be considered overwhelming to the new user. Therefore a number of algorithms and community contributions were removed and placed into plugin packages. A package management system was added that allows the user to browse for, and selectively install, packages of interest.

Another motivation for introducing the package management system was to make the process of contributing to the WEKA software easier, and to ease the maintenance burden on the WEKA development team. A contributor of a plugin package is responsible for maintaining its code and hosting the installable archive, while WEKA simply tracks the package metadata. The package system also opens the door to the use of third-party libraries, something that we would have discouraged in the past in order to keep a lightweight footprint for WEKA.

Figure 1.1 shows the main window of the graphical package manager, which can be accessed from the Tools menu in the GUI Chooser panel shown in Figure 1.2. The very first time the package manager is accessed it will download information about the currently available packages. This requires an internet connection; however, once the package metadata has been downloaded it is possible to use the package manager to browse package information while offline. Of course, an internet connection is still required to actually install a package.

Figure 1.1: The graphical package manager.

Figure 1.2: The GUI Chooser.

The package manager presents a list of packages near the top of its window and a panel at the bottom that displays information on the currently selected package in the list. The user can choose to display packages that are available but not yet installed, only packages that are installed, or all packages. The list presents the name of each package, the broad category that it belongs to, the version currently installed (if any), the most recent version of the package available that is compatible with the version of WEKA being used, and a field that, for installed packages, indicates whether the package has been loaded successfully by WEKA or not.

Figure 1.3: Choosing a version of the DMNBtext package to install.

Although not obvious at first glance, it is possible to install older versions of a particular package. The Repository version field in the list is actually a drop-down box; Figure 1.3 shows a version of the DMNBtext package being chosen for installation. The list of packages can be sorted, in ascending or descending order, by clicking on either the package or category column header.

Figure 1.4: Additional information for the DMNBtext package.

(a) Primary package. (b) Dependency.

Figure 1.5: Installing a package with dependencies.

The information panel at the bottom of the window has clickable links for each version of a given package. “Latest” always refers to the latest version of the package, and is the same as the highest version number available. Clicking one of these links displays further information, such as the author of the package, its license, where the installable archive is located, and its dependencies. Figure 1.4 shows this information for the DMNBtext package. The information about each package is also browsable at the Web location where WEKA's package metadata is hosted. At the time of writing this can be found at http://weka.sourceforge.net/packageMetaData/.

All packages have at least one dependency listed—the minimum version of the core WEKA system that they can work with. Some packages list further dependencies on other packages. For example, the multiInstanceLearning package depends on the multiInstanceFilters package. When installing multiInstanceLearning, and assuming that multiInstanceFilters is not already installed, the system will inform the user that multiInstanceFilters is required and will be installed automatically. Figure 1.5 shows the dialogs displayed by the package manager when installing multiInstanceLearning.

The package manager displays what are known as official packages for WEKA. These are packages that have been submitted to the WEKA team for a cursory review and have had their metadata added to the official central metadata repository. For one reason or another, an author of a package might decide to make it available in an unofficial capacity. These packages do not appear in the official list on the web, or in the list displayed by the graphical package manager. If the user knows the URL to an archive containing an unofficial package, it can be installed by using the button in the upper right-hand corner of the package manager window.

Figure 1.6: New/updated packages available.

Whenever a new package, or new version of an existing one, becomes available the package manager informs the user by displaying a large yellow warning icon. Hovering over this icon displays a tool-tip popup that lists the new packages and prompts the user to click the Refresh repository cache button. Clicking this button downloads a fresh copy of all the package information to the user's computer. Figure 1.6 shows what is displayed in the situation where there is one new package available and one upgrade to an existing package.

Figure 1.7: Changing the load status of a package.

The Install and Uninstall buttons at the top of the package manager's window do exactly as their names suggest. More than one package can be installed or uninstalled in one go by selecting multiple entries in the list. By default, WEKA attempts to load all installed packages, and if a package cannot be loaded for some reason a message will be displayed in the Loaded column of the list. The user can opt to prevent a particular package from being loaded by selecting it and then clicking the Toggle load button. This will mark the package as one that should not be loaded the next time that WEKA is started. This can be useful if an unstable package is generating errors, conflicting with another package (perhaps due to third-party libraries), or otherwise preventing WEKA from operating properly. Figure 1.7 shows what is displayed when a package's load status is toggled.

Chapter 2: The Explorer

Figure 2.1: The Explorer interface.

WEKA’s main graphical user interface, the Explorer, gives access to all its facilities using menu selection and form filling. It is illustrated in Figure 2.1. To begin, there are six different panels, selected by the tabs at the top, corresponding to the various data mining tasks that WEKA supports. Further panels can become available by installing appropriate packages.


2.1 Getting started

Suppose you have some data and you want to build a decision tree from it. First, you need to prepare the data, then fire up the Explorer and load in the data. Next you select a decision tree construction method, build a tree, and interpret the output. It is easy to do it again with a different tree construction algorithm or a different evaluation method. In the Explorer you can flip back and forth between the results you have obtained, evaluate the models that have been built on different datasets, and visualize graphically both the models and the datasets themselves—including any classification errors the models make.

(a) Spreadsheet. (b) CSV.

(c) ARFF.

Figure 2.2: Weather data.

2.1.1 Preparing the data

The data is often presented in a spreadsheet or database. However, WEKA's native data storage method is ARFF format. You can easily convert from a spreadsheet to ARFF. The bulk of an ARFF file consists of a list of the instances, and the attribute values for each instance are separated by commas. Most spreadsheet and database programs allow you to export data into a file in comma-separated value (CSV) format as a list of records with commas between items. Having done this, you need only load the file into a text editor or word processor; add the dataset's name using the @relation tag, the attribute information using @attribute, and a @data line; and save the file as raw text. For example, Figure 2.2 shows an Excel spreadsheet containing the weather data, the data in CSV form loaded into Microsoft Word, and the result of converting it manually into an ARFF file. However, you do not actually have to go through these steps to create the ARFF file yourself, because the Explorer can read CSV spreadsheet files directly, as described later.
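For concreteness, here is a sketch of what the ARFF version of the weather data looks like. The attribute declarations follow the description above; only the first few of the 14 data rows are shown, as an abridged illustration rather than a full reproduction of the figure.

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
...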

2.1.2 Loading the data into the Explorer

(a) Choosing the Explorer interface. (b) Reading in the weather data.

Figure 2.3: Weather data in the Explorer.

Let us load this data into the Explorer and start analyzing it. Fire up WEKA to get the GUI Chooser panel in Figure 2.3a. Select Explorer from the five choices on the right-hand side. (The others were mentioned earlier: Simple CLI is the old-fashioned command-line interface.)

What you see next is the main Explorer screen, shown in Figure 2.3b. Actually, the figure shows what it will look like after you have loaded in the weather data. The six tabs along the top are the basic operations that the Explorer supports: right now we are on Preprocess. Click the Open file button to bring up a standard dialog through which you can select a file. Choose the weather.arff file. If you have it in CSV format, change from ARFF data files to CSV data files. When you specify a .csv file it is automatically converted into ARFF format.

Having loaded the file, the screen will be as shown in Figure 2.3b. This tells you about the dataset: it has 14 instances and five attributes (center left); the attributes are called outlook, temperature, humidity, windy, and play (lower left). The first attribute, outlook, is selected by default (you can choose others by clicking them) and has no missing values, three distinct values, and no unique values; the actual values are sunny, overcast, and rainy and they occur five, four, and five times, respectively (center right). A histogram at the lower right shows how often each of the two values of the class, play, occurs for each value of the outlook attribute. The attribute outlook is used because it appears in the box above the histogram, but you can draw a histogram of any other attribute instead. Here play is selected as the class attribute; it is used to color the histogram, and any filters that require a class value use it too.

The outlook attribute in Figure 2.3b is nominal. If you select a numeric attribute, you see its minimum and maximum values, mean, and standard deviation. In this case the histogram will show the distribution of the class as a function of this attribute.

You can delete an attribute by clicking its checkbox and using the Remove button. All selects all the attributes, None selects none, Invert inverts the current selection, and Pattern selects those attributes whose names match a user-supplied regular expression. You can undo a change by clicking the Undo button. The Edit button brings up an editor that allows you to inspect the data, search for particular values and edit them, and delete instances and attributes. Right-clicking on values and column headers brings up corresponding context menus.

2.1.3 Building a decision tree

(a) Finding J4.8 in the classifiers list. (b) The Classify tab.

Figure 2.4: J4.8 in the Explorer.

To see what the C4.5 decision tree learner does with this dataset, use J4.8, which is WEKA's implementation of this learner. (J4.8 actually implements a later and slightly improved version called C4.5 revision 8, which was the last public version of this family of algorithms before C5.0 was released.) Click the Classify tab to get a screen that looks like Figure 2.4b. Actually, the figure shows what it will look like after you have analyzed the weather data.

First select the classifier by clicking the Choose button at the top left, opening up the trees section of the hierarchical menu in Figure 2.4a, and finding J48. The menu structure represents the organization of the WEKA code into modules, and the items you need to select are always at the lowest level. Once selected, J48 appears in the line beside the Choose button as shown in Figure 2.4b, along with its default parameter values. If you click that line, the J4.8 classifier's object editor opens up and you can see what the parameters mean and alter their values if you wish. The Explorer generally chooses sensible defaults.

Having chosen the classifier, invoke it by clicking the Start button. WEKA works for a brief period—when it is working, the little bird at the lower right of Figure 2.4b jumps up and dances—and then produces the output shown in the main panel of Figure 2.4b.

2.1.4 Examining the output

Figure 2.5 shows the full output (Figure 2.4b only gives the lower half). At the beginning is a summary of the dataset, and the fact that 10-fold cross-validation was used to evaluate it. That is the default, and if you look closely at Figure 2.4b you will see that the Cross-validation box at the left is checked. Then comes a pruned decision tree in textual form. The model that is shown here is always one generated from the full dataset available from the Preprocess panel. The first split is on the outlook attribute, and then, at the second level, the splits are on humidity and windy, respectively.

In the tree structure, a colon introduces the class label that has been assigned to a particular leaf, followed by the number of instances that reach that leaf, expressed as a decimal number because of the way the algorithm uses fractional instances to handle missing values. If there were incorrectly classified instances (there aren't in this example) their number would appear too: thus 2.0/1.0 means that two instances reached that leaf, of which one is classified incorrectly. Beneath the tree structure the number of leaves is printed; then the total number of nodes (Size of the tree). There is a way to view decision trees more graphically, which we will encounter later.

The next part of the output gives estimates of the tree's predictive performance. In this case they are obtained using stratified cross-validation with 10 folds, the default in Figure 2.4b. As you can see, more than 30% of the instances (5 out of 14) have been misclassified in the cross-validation. This indicates that the results obtained from the training data are optimistic compared with what might be obtained from an independent test set from the same source. From the confusion matrix at the end observe that 2 instances of class yes have been assigned to class no and 3 of class no are assigned to class yes.

As well as the classification error, the evaluation module also outputs the Kappa statistic, the mean absolute error, and the root mean-squared error of the class probability estimates assigned by the tree. The root mean-squared error is the square root of the average squared loss. The mean absolute error is calculated in a similar way using the absolute instead of the squared difference. It also outputs relative errors, which are based on the prior probabilities (i.e., those obtained by the ZeroR learning scheme described later). Finally, for each class it also outputs various statistics. Also reported is the per-class average of each statistic, weighted by the number of instances from each class. All of these evaluation measures are discussed in Chapter 5 of the book.
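In the usual notation, if $p_i$ is the predicted value and $a_i$ the actual value for test instance $i$ (for classification these are computed from the class probability estimates), the two error measures correspond to

\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert p_i - a_i\rvert,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - a_i\right)^2}.
\]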

2.1.5 Doing it again

You can easily run J4.8 again with a different evaluation method. Select Use training set (near the top left in Figure 2.4b) and click Start again. The classifier output is quickly replaced to show how well the derived model performs on the training set, instead of showing the cross-validation results. This evaluation is highly optimistic. It may still be useful, because it generally represents an upper bound to the model's performance on fresh data.

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     weather
Instances:    14
Attributes:   5
              outlook
              temperature
              humidity
              windy
              play
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)

Number of Leaves : 5

Size of the tree : 8

Time taken to build model: 0.27 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          9               64.2857 %
Incorrectly Classified Instances        5               35.7143 %
Kappa statistic                         0.186
Mean absolute error                     0.2857
Root mean squared error                 0.4818
Relative absolute error                60      %
Root relative squared error            97.6586 %
Total Number of Instances              14

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
               0.778     0.6       0.7         0.778    0.737       0.789      yes
               0.4       0.222     0.5         0.4      0.444       0.789      no
Weighted Avg.  0.643     0.465     0.629       0.643    0.632       0.789

=== Confusion Matrix ===

 a b   <-- classified as
 7 2 | a = yes
 3 2 | b = no

Figure 2.5: Output from the J4.8 decision tree learner.

In this case, all 14 training instances are classified correctly. In some cases a classifier may decide to leave some instances unclassified, in which case these will be listed as Unclassified Instances. This does not happen for most learning schemes in WEKA.

The panel in Figure 2.4b has further test options: Supplied test set, in which you specify a separate file containing the test set, and Percentage split, with which you can hold out a certain percentage of the data for testing. You can output the predictions for each instance by clicking the More options button and checking the appropriate entry. There are other useful options, such as suppressing some output and including other statistics such as entropy evaluation measures and cost-sensitive evaluation. For the latter you must enter a cost matrix: type the number of classes into the Classes box (and terminate it with the Enter or Return key) to get a default cost matrix, then edit the values as required.

2.1.6 Working with models

The small pane at the lower left of Figure 2.4b, which contains one highlighted line, is a history list of the results. The Explorer adds a new line whenever you run a classifier. Because you have now run the classifier twice, the list will contain two items. To return to a previous result set, click the corresponding line and the output for that run will appear in the Classifier Output pane. This makes it easy to explore different classifiers or evaluation schemes and revisit the results to compare them.

The result history list is the entry point to some powerful features of the Explorer. When you right-click an entry a menu appears that allows you to view the results in a separate window, or save the result buffer. More importantly, you can save the model that WEKA has generated in the form of a Java object file. You can reload a model that was saved previously, which generates a new entry in the result list. If you now supply a test set, you can reevaluate the old model on that new set.

Several items on the right-click menu allow you to visualize the results in various ways. At the top of the Explorer interface is a separate Visualize tab, but that is different: it shows the dataset, not the results for a particular model. By right-clicking an entry in the history list you can see the classifier errors. If the model is a tree or a Bayesian network you can see its structure. You can also view the margin curve and various cost and threshold curves, including the cost/benefit analysis tool. For all of these you must choose a class value from a submenu. The Visualize threshold curve menu item allows you to see the effect of varying the probability threshold above which an instance is assigned to that class. You can select from a wide variety of curves that include the ROC and recall-precision curves. To see these, choose the X- and Y-axes appropriately from the menus given. For example, set X to False Positive Rate and Y to True Positive Rate for an ROC curve or X to Recall and Y to Precision for a recall-precision curve.

Figure 2.6 shows two ways of looking at the result of using J4.8 to classify the iris dataset—we use this rather than the weather data because it produces more interesting pictures. Figure 2.6a shows the tree. Right-click a blank space in this window to bring up a menu enabling you to automatically scale the view or force the tree into the window. Drag the mouse to pan around the space. It's also possible to visualize the instance data at any node, if it has been saved by the learning algorithm.

Figure 2.6b shows the classifier errors on a two-dimensional plot. You can choose which attributes to use for X and Y using the selection boxes at the top. Alternatively, click one of the speckled horizontal strips to the right of the plot: left-click for X and right-click for Y. Each strip shows the spread of instances along that attribute. X and Y appear beside the ones you have chosen for the axes. The data points are colored according to their class: blue, red, and green for Iris setosa, Iris versicolor, and Iris virginica, respectively (there is a key at the bottom of the screen). Correctly classified instances are shown as crosses; incorrectly classified ones appear as boxes (of which there are three in Figure 2.6b). You can click on an instance to bring up relevant details: its instance number, the values of the attributes, its class, and the predicted class.
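For readers who want to script this outside the GUI, the following sketch shows one way of saving and reloading a trained model as a serialized Java object using WEKA's SerializationHelper class; the dataset and file names are illustrative.

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class SaveLoadSketch {
  public static void main(String[] args) throws Exception {
    Instances train = DataSource.read("iris.arff");   // illustrative path
    train.setClassIndex(train.numAttributes() - 1);

    // Train a model and save it as a serialized Java object file.
    Classifier tree = new J48();
    tree.buildClassifier(train);
    SerializationHelper.write("j48.model", tree);

    // Later, or in another program: reload the model and reuse it.
    Classifier loaded = (Classifier) SerializationHelper.read("j48.model");
    System.out.println(loaded);
  }
}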

2.1.7 When things go wrong

Beneath the result history list, at the bottom of Figure 2.3b, is a status line that says, simply, OK. Occasionally this changes to See error log, an indication that something has gone wrong. For example, there may be constraints among the various selections you can make in a panel. Most of the time the interface grays out inappropriate selections and refuses to let you choose them. But occasionally the interactions are more complex, and you can end up selecting an incompatible set of options. In this case, the status line changes when WEKA discovers the incompatibility—typically when you press Start. To see the error, click the Log button to the left of the bird in the lower right-hand corner of the interface.

WEKA also writes a detailed log to a file, called weka.log, under the wekafiles directory in the user's home directory. This often contains more information about the causes of problems than the Explorer's Log window because it captures debugging output directed to the standard out and error channels.

2.2 Exploring the Explorer

We have briefly investigated two of the six tabs at the top of the Explorer window in Figure 2.3b and Figure 2.4b. In summary, here’s what all of the tabs do:

1. Preprocess: Choose the dataset and modify it in various ways.

2. Classify: Train learning schemes that perform classification or regression and evaluate them.

3. Cluster: Learn clusters from the dataset.

4. Associate: Learn association rules for the data and evaluate them.

5. Select attributes: Select the most relevant aspects of the dataset.

6. Visualize: View different two-dimensional plots of the data and interact with them.

Each tab gives access to a whole range of facilities. In our tour so far, we have barely scratched the surface of the Preprocess and Classify panels.

At the bottom of every panel is a Status box and a Log button. The status box displays messages that keep you informed about what's going on. For example, if the Explorer is busy loading a file, the status box will say so. Right-clicking anywhere inside this box brings up a little menu with two options: display the amount of memory available to WEKA, and run the Java garbage collector. Note that the garbage collector runs constantly as a background task anyway. Clicking the Log button opens a textual log of the actions that WEKA has performed in this session, with timestamps.

As noted earlier, the little bird at the lower right of the window jumps up and dances when WEKA is active. The number beside the × shows how many concurrent processes are running. If the bird is standing but stops moving, it's sick! Something has gone wrong, and you may have to restart the Explorer.

2.2.1 Loading and filtering files

Along the top of the Preprocess panel in Figure 2.3b are buttons for opening files, URLs, and databases. Initially, only files whose names end in .arff appear in the file browser; to see others, change the Format item in the file selection box.

2.2.2 Converting files to ARFF

WEKA has converters for the following file formats:

• spreadsheet files with extension .csv,

• C4.5’s native file format with extensions .names and .data,

• serialized instances with extension .bsi,

• LIBSVM format files with extension .libsvm,

• SVM-Light format files with extension .dat,

• XML-based ARFF format files with extension .xrff,

• JSON-based ARFF format files with extension .json,

• ASCII Matlab files with extension .m.

The appropriate converter is used based on the file extension. If WEKA cannot load the data, it tries to interpret it as ARFF. If that fails, it pops up the box shown in Figure 2.7a. This is a generic object editor, used throughout WEKA for selecting and configuring objects. For example, when you set parameters for a classifier, you use the same kind of box.

The CSVLoader for .csv files is selected by default, and the More button gives you more information about it, shown in Figure 2.7b. It is always worth looking at the documentation! In this case, it explains that the spreadsheet's first row determines the attribute names and gives a brief description of the CSVLoader's options. Click OK to use this converter. For a different one, click Choose to select from the list in Figure 2.7c.

The ArffLoader is the first option, and we reached this point only because it failed. The second option is for the C4.5 format, in which there are two files for a dataset, one giving field names and the other giving the actual data. The third option, CSVLoader, is the default, and we clicked Choose because we want a different one. The fourth option is for reading from a database rather than a file; however, the SQLViewer tool, shown in Figure 2.8 and accessible by pressing the Open DB button on the Preprocess panel, is a more user-friendly route for accessing a database. The “serialized instances” option is for reloading datasets that have been saved as Java serialized objects. Any Java object can be saved in this format and reloaded. As a native Java format, it is quicker to load than an ARFF file, which must be parsed and checked. When repeatedly reloading a large dataset it may be worth saving it in this form. The tenth menu item is for importing a directory containing plain text files for the purposes of text mining. The imported directory is expected to have a specific structure—namely a set of subdirectories, each containing one or more text files with the extension .txt. Each text file becomes one instance in the dataset, with a string attribute holding the contents of the file and a nominal class attribute holding the name of the subdirectory that it came from. This dataset can then be further processed into word frequencies using the StringToWordVector filter (covered in the next section). The last option is for loading data files in XRFF, the XML Attribute Relation File format.

As the name suggests, this gives ARFF header and instance information in the XML markup language.

Further features of the generic object editor in Figure 2.7a are Save, which saves a configured object, and Open, which opens a previously saved one. These are not useful for this particular kind of object. But other generic object editor panels have many editable properties, and having gone to some trouble to set them up you may want to save the configured object to reuse later.

Files on your computer are not the only source of datasets for WEKA. You can open a URL, and WEKA will use the hypertext transfer protocol (HTTP) to download an ARFF file from the Web. Or you can open a database (Open DB)—any database that has a Java database connectivity (JDBC) driver—and retrieve instances using the SQL Select statement. This returns a relation that WEKA reads in as an ARFF file. To make this work with your database, you may need to modify the file weka/experiment/DatabaseUtils.props in the WEKA distribution by adding your database driver to it. (To access this file, expand the weka.jar file in the WEKA distribution.) Figure 2.8 shows the SQLViewer tool that appears when Open DB is clicked. In this example, the iris dataset has been extracted from a single database table.

Data can be saved in all these formats (with the exception of the directory containing text files) using the Save button in the Preprocess panel (Figure 2.3b). It is also possible to generate artificial data using the Generate button. Artificial data suitable for classification can be generated from decision lists, radial-basis function networks and Bayesian networks, as well as the classic LED24 domain. Artificial regression data can be generated according to mathematical expressions. There are also several generators for producing artificial data for clustering purposes.

Apart from loading and saving datasets, the Preprocess panel also allows you to filter them. Filters are an important component of WEKA.
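Before moving on to filters, note that the CSV-to-ARFF conversion described at the start of this section can also be scripted. The sketch below uses the CSVLoader and ArffSaver converter classes; the file names are illustrative.

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class CsvToArffSketch {
  public static void main(String[] args) throws Exception {
    // Read the CSV file; the first row supplies the attribute names.
    CSVLoader loader = new CSVLoader();
    loader.setSource(new File("weather.csv"));   // illustrative input file
    Instances data = loader.getDataSet();

    // Write the same instances back out in ARFF format.
    ArffSaver saver = new ArffSaver();
    saver.setInstances(data);
    saver.setFile(new File("weather.arff"));     // illustrative output file
    saver.writeBatch();
  }
}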

2.2.3 Using filters

Clicking Choose (near the top left) in Figure 2.3b gives a list of filters like that in Figure 2.9a. Actually, you get a collapsed version: click on an arrow to open up its contents. We will describe how to use a simple filter to delete specified attributes from a dataset, in other words, to perform manual attribute selection. The same effect can be achieved more easily by selecting the relevant attributes using the tick boxes and pressing the Remove button. Nevertheless we describe the equivalent filtering operation explicitly, as an example.

Remove is an unsupervised attribute filter, and to see it you must scroll further down the list. When selected, it appears in the line beside the Choose button, along with its parameter values—in this case the line reads simply “Remove”. Click that line to bring up a generic object editor with which you can examine and alter the filter's properties. (You did the same thing earlier by clicking the J48 line in Figure 2.4b to open the J4.8 classifier's object editor.) The object editor for the Remove filter is shown in Figure 2.9b. To learn about it, click More to show the information in Figure 2.9c. This explains that the filter removes a range of attributes from the dataset. It has an option, attributeIndices, that specifies the range to act on and another called invertSelection that determines whether the filter selects attributes or deletes them. There are boxes for both of these in the object editor shown in Figure 2.9b, and in fact we have already set them to 1,2 (to affect attributes 1 and 2, namely outlook and temperature) and False (to remove rather than retain them). Click OK to set these properties and close the box. Notice that the line beside the Choose button now reads Remove -R 1,2. In the command-line version of the Remove filter, the option -R is used to specify which attributes to remove. After configuring an object it's often worth glancing at the resulting command-line formulation that the Explorer sets up.

Figure 2.9 demonstrates a further feature of the generic object editor, namely “capabilities.”

Algorithms in WEKA may provide information about what data characteristics they can handle, and, if they do, a Capabilities button appears underneath More in the generic object editor (Figure 2.9b). Clicking it brings up Figure 2.9d, which gives information about what the method can do. Here it states that Remove can handle many attribute characteristics, such as different types (nominal, numeric, relational, etc.) and missing values. It also shows the minimum number of instances that Remove requires in order to operate.

Figure 2.9e shows a list of selected constraints on capabilities, which is obtained by clicking the Filter button at the bottom of Figure 2.9a. If the current dataset exhibits some characteristic that is ticked in Figure 2.9e but missing from the capabilities for the Remove filter (Figure 2.9d), the Apply button to the right of Choose in Figure 2.3b will be grayed out, as will the entry in the list in Figure 2.9a. Although you cannot apply it, you can nevertheless select a grayed-out entry to inspect its options, documentation and capabilities using the generic object editor. You can release individual constraints by deselecting them in Figure 2.9e, or click the Remove filter button to clear all the constraints.

Apply the filter by clicking Apply (at the right-hand side of Figure 2.3b). Immediately the screen in Figure 2.10 appears—just like the one in Figure 2.3b but with only three attributes, humidity, windy, and play. At this point the fifth button in the row near the top becomes active. Undo reverses the filtering operation and restores the original dataset, which is useful when you experiment with different filters. The first attribute, humidity, is selected and a summary of its values appears on the right. As a numeric attribute, the minimum and maximum values, mean, and standard deviation are shown. Below is a histogram that shows the distribution of the play attribute. Unfortunately, this display is impoverished because the attribute has so few different values that they fall into two equal-sized bins. More realistic datasets yield more informative histograms.
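The Remove -R 1,2 configuration set up above has a direct programmatic counterpart. The sketch below mirrors the object editor settings and applies the filter to a dataset loaded from an illustrative file.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class RemoveFilterSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("weather.arff");  // illustrative path

    // Configure the filter exactly as in the object editor: -R 1,2.
    Remove remove = new Remove();
    remove.setAttributeIndices("1,2");   // affect outlook and temperature
    remove.setInvertSelection(false);    // remove rather than retain them
    remove.setInputFormat(data);

    // Apply the filter to obtain a dataset with the remaining attributes.
    Instances filtered = Filter.useFilter(data, remove);
    System.out.println(filtered.numAttributes() + " attributes remain");
  }
}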

2.2.4 Training and testing learning schemes

The Classify panel lets you train and test learning schemes that perform classification or regression. Section 2.1 explained how to interpret the output of a decision tree learner and showed the performance figures that are automatically generated by the evaluation module. The interpretation of these is the same for all models that predict a categorical class. However, when evaluating models for numeric prediction, WEKA produces a different set of performance measures.

As an example, in Figure 2.11a the CPU performance dataset has been loaded into WEKA. You can see the histogram of values of the first attribute, MYCT, at the lower right. In Figure 2.11b the model tree inducer M5' has been chosen as the classifier by going to the Classify panel, clicking the Choose button at the top left, opening up the trees section of the hierarchical menu shown in Figure 2.4a, finding M5P, and clicking Start. The hierarchy helps to locate particular classifiers by grouping items with common functionality.

Figure 2.12 shows the output. The pruned model tree contains splits on three of the six attributes in the data. The root splits on the CHMIN attribute, yielding a linear model at the leaf on the left-hand branch and the remaining structure in the right-hand branch. There are five leaves in all, each with a corresponding linear model. The first number in parentheses at each leaf is the number of instances that reach it; the second is the root mean squared error of the predictions from the leaf's linear model for those instances, expressed as a percentage of the standard deviation of the class attribute computed over all the training data. The description of the tree is followed by several figures that measure its performance. These are derived from the test option chosen in Figure 2.11b, 10-fold cross-validation (not stratified, because stratification does not apply when the class is numeric).

Ordinary linear regression, another scheme for numeric prediction, is found under LinearRegression in the functions section of the menu in Figure 2.4a.

It builds a single linear regression model rather than the five in Figure 2.12; not surprisingly, its performance is slightly worse.

To get a feel for their relative performance, let us visualize the errors these schemes make, as we did for the iris dataset in Figure 2.6b. Right-click the entry in the history list and select Visualize classifier errors to bring up the two-dimensional plot of the data in Figure 2.13. The points are color coded by class—but in this case the color varies continuously because the class is numeric. In Figure 2.13 the MMAX attribute has been selected for the X-axis and CHMAX has been chosen for the Y-axis because this gives a reasonable spread of points. Each data point is marked by a cross whose size indicates the absolute value of the error for that instance. The smaller crosses in Figure 2.13a (for M5'), when compared with those in Figure 2.13b (for linear regression), show that M5' is superior.
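The performance figures behind such comparisons can also be computed programmatically. The following sketch runs the same kind of 10-fold cross-validation using WEKA's Evaluation class, here with the M5' model tree learner on a numeric-class dataset; the file name is illustrative.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.M5P;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CrossValidationSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("cpu.arff");   // illustrative path
    data.setClassIndex(data.numAttributes() - 1);   // numeric class attribute

    // 10-fold cross-validation of the M5' model tree learner.
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(new M5P(), data, 10, new Random(1));

    // Print the same summary statistics that the Explorer reports.
    System.out.println(eval.toSummaryString());
  }
}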

2.2.5 Using a metalearner

Metalearners take simple classifiers and turn them into more powerful learners. For example, to boost decision stumps in the Explorer, go to the Classify panel and choose the classifier AdaBoostM1 from the meta section of the hierarchical menu. When you configure this classifier by clicking it, the object editor shown in Figure 2.14 appears. This has its own classifier field, which we set to DecisionStump (as shown). This method could itself be configured by clicking its entry, except that DecisionStump happens to have no editable properties. Click OK to return to the main Classify panel and Start to try out boosting decision stumps up to 10 times. It turns out that this mislabels only 7 of the 150 instances in the iris data—good performance considering the rudimentary nature of decision stumps and the rather small number of boosting iterations.
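Expressed in code, the same boosting setup looks roughly as follows; the class names are WEKA's, but the snippet is an illustrative sketch rather than the book's worked example.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BoostingSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("iris.arff");   // illustrative path
    data.setClassIndex(data.numAttributes() - 1);

    // Boost decision stumps for up to 10 iterations, as in the text.
    AdaBoostM1 boost = new AdaBoostM1();
    boost.setClassifier(new DecisionStump());
    boost.setNumIterations(10);

    // Estimate performance with 10-fold cross-validation.
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(boost, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}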

2.2.6 Clustering and association rules

Use the Cluster and Associate panels to invoke clustering algorithms and methods for finding association rules. When clustering, WEKA shows the number of clusters and how many instances each cluster contains. For some algorithms the number of clusters can be specified by setting a parameter in the object editor. For probabilistic clustering methods, WEKA measures the log-likelihood of the clusters on the training data: the larger this quantity, the better the model fits the data. Increasing the number of clusters normally increases the likelihood, but may overfit.

The controls on the Cluster panel are similar to those for Classify. You can specify some of the same evaluation methods—use training set, supplied test set, and percentage split (the last two are used with the log-likelihood). A further method, classes to clusters evaluation, compares how well the chosen clusters match a preassigned class in the data. You select an attribute (which must be nominal) that represents the “true” class. Having clustered the data, WEKA determines the majority class in each cluster and prints a confusion matrix showing how many errors there would be if the clusters were used instead of the true class. If your dataset has a class attribute, you can ignore it during clustering by selecting it from a pull-down list of attributes, and see how well the clusters correspond to actual class values. Finally, you can choose whether or not to store the clusters for visualization. The only reason not to do so is to conserve space. As with classifiers, you visualize the results by right-clicking on the result list, which allows you to view two-dimensional scatter plots like the one in Figure 2.6b. If you have chosen classes to clusters evaluation, the class assignment errors are shown. For the Cobweb clustering scheme, you can also visualize the tree.

The Associate panel is simpler than Classify or Cluster. WEKA contains several algorithms for determining association rules but no methods for evaluating such rules. Figure 2.15 shows the output from the Apriori program for association rules on the nominal version of the weather data. Despite the simplicity of the data, several rules are found. The number before the arrow is the number of instances for which the antecedent is true; that after the arrow is the number of instances for which the consequent is true also; and the confidence (in parentheses) is the ratio between the two. Ten rules are found by default: you can ask for more by using the object editor to change numRules.
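A corresponding sketch for mining association rules with Apriori, on a dataset whose attributes are all nominal, might look like this; the file name is illustrative.

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AprioriSketch {
  public static void main(String[] args) throws Exception {
    // Apriori requires nominal attributes, e.g. the nominal weather data.
    Instances data = DataSource.read("weather.nominal.arff");  // illustrative path

    Apriori apriori = new Apriori();
    apriori.setNumRules(20);          // ask for more than the default 10 rules
    apriori.buildAssociations(data);

    // The textual output lists the rules found, much as in Figure 2.15.
    System.out.println(apriori);
  }
}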

2.2.7 Attribute selection

The Select attributes panel gives access to several methods for attribute selection. This involves an attribute evaluator and a search method. Both are chosen in the usual way and configured with the object editor. You must also decide which attribute to use as the class. Attribute selection can be performed using the full training set or using cross-validation. In the latter case it is done separately for each fold, and the output shows how many times—that is, in how many of the folds—each attribute was selected. The results are stored in the history list. When you right-click an entry here you can visualize the dataset in terms of the selected attributes (choose Visualize reduced data).
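Attribute selection can likewise be invoked from code by pairing an evaluator with a search method. The sketch below combines CfsSubsetEval with BestFirst search, one common combination; the file name is illustrative.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AttributeSelectionSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("iris.arff");   // illustrative path
    data.setClassIndex(data.numAttributes() - 1);

    // Combine an attribute subset evaluator with a search method.
    AttributeSelection selector = new AttributeSelection();
    selector.setEvaluator(new CfsSubsetEval());
    selector.setSearch(new BestFirst());
    selector.SelectAttributes(data);

    // Indices of the selected attributes (the class index is included).
    int[] chosen = selector.selectedAttributes();
    System.out.println(java.util.Arrays.toString(chosen));
  }
}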

2.2.8 Visualization

The Visualize panel helps you visualize a dataset—not the result of a classification or clustering model, but the dataset itself. It displays a matrix of two-dimensional scatter plots of every pair of attributes. Figure 2.16a shows the iris dataset. You can select an attribute—normally the class—for coloring the data points using the controls at the bottom. If it is nominal, the coloring is discrete; if it is numeric, the color spectrum ranges continuously from blue (low values) to orange (high values). Data points with no class value are shown in black.

You can change the size of each plot, the size of the points, and the amount of jitter, which is a random displacement applied to X and Y values to separate points that lie on top of one another. Without jitter, a thousand instances at the same data point would look just the same as one instance. You can reduce the size of the matrix of plots by selecting certain attributes, and you can subsample the data for efficiency. Changes in the controls do not take effect until the Update button is clicked.

Click one of the plots in the matrix to enlarge it. For example, clicking on the top left plot brings up the panel in Figure 2.16b. You can zoom in on any area of this panel by choosing Rectangle from the menu near the top right and dragging out a rectangle on the viewing area like that shown. The Submit button near the top left rescales the rectangle into the viewing area.

2.3 Filtering algorithms

Now we take a detailed look at the filtering algorithms implemented within WEKA. These are accessible from the Explorer, and also from the Knowledge Flow and Experimenter interfaces described later. All filters transform the input dataset in some way. When a filter is selected using the Choose button, its name appears in the line beside that button. Click that line to get a generic object editor to specify its properties. What appears in the line is the command-line version of the filter, and the parameters are specified with minus signs. This is a good way of learning how to use the WEKA commands directly.

There are two kinds of filter: unsupervised and supervised. This seemingly innocuous distinction masks a rather fundamental issue. Filters are often applied to a training dataset and then also applied to the test file. If the filter is supervised—for example, if it uses class values to derive good intervals for discretization—applying it to the test data will bias the results. It is the discretization intervals derived from the training data that must be applied to the test data.

When using supervised filters you must be careful to ensure that the results are evaluated fairly, an issue that does not arise with unsupervised filters. We treat WEKA's unsupervised and supervised filtering methods separately. Within each type there is a further distinction between attribute filters, which work on the attributes in the datasets, and instance filters, which work on the instances. To learn more about a particular filter, select it in the WEKA Explorer and look at its associated object editor, which defines what the filter does and the parameters it takes.
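One way to ensure fair evaluation with a supervised filter is to wrap the filter and the learning scheme in WEKA's FilteredClassifier, so that during cross-validation the filter is rebuilt from the training folds only. The sketch below illustrates the idea with supervised discretization feeding a decision tree; the file name is illustrative.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.supervised.attribute.Discretize;

public class FilteredClassifierSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("iris.arff");   // illustrative path
    data.setClassIndex(data.numAttributes() - 1);

    // The supervised Discretize filter is applied inside the classifier,
    // so its intervals are derived only from the training data of each fold.
    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(new Discretize());
    fc.setClassifier(new J48());

    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(fc, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}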

2.3.1 Unsupervised attribute filters

We will first discuss WEKA’s unsupervised attribute filters, which do not require a class attribute to be set.

2.3.1.1 Adding and removing attributes

Add inserts an attribute at a given position, whose value is declared to be missing for all instances. Use the generic object editor to specify the attribute’s name, where it will appear in the list of attributes, and its possible values (for nominal attributes); for date attributes you can also specify the date format. Copy copies existing attributes so that you can preserve them when experimenting with filters that overwrite attribute values. Several attributes can be copied together using an expression such as 1-3 for the first three attributes, or first-3,5,9-last for attributes 1, 2, 3, 5, 9, 10, 11, 12 .... The selection can be inverted, affecting all attributes except those specified. These features are shared by many filters. AddUserFields can be used to add multiple new attributes to a dataset in one go. The user can specify the name, type and value to use for each attribute. AddID inserts a numeric identifier attribute at the user-specified index in the list of attributes. An identifier attribute is useful for keeping track of individual instances after a dataset has been processed in various ways, such as being transformed by other filters, or having the order of the instances randomized. Remove has already been described. Similar filters are RemoveType, which deletes all attributes of a given type (nominal, numeric, string, date, or relational), and RemoveUseless, which deletes constant attributes and nominal attributes whose values are different for almost all instances. You can decide how much variation is tolerated before an attribute is deleted by specifying the number of distinct values as a percentage of the total number of values. Note that some unsupervised attribute filters behave differently if the menu in the Preprocess panel has been used to set a class attribute. (By default, the last attribute is the class attribute.) For example, RemoveType and RemoveUseless both skip the class attribute. InterquartileRange adds new attributes that indicate whether the values of instances can be considered outliers, or extreme values. The definition of outlier and extreme value is based on the difference between the 25th and 75th quartile of an attribute’s values. Values are flagged as extreme if they exceed the 75th quartile (or fall below the 25th quartile) by the product of the user-specified extreme value factor and the interquartile range. Values that are not extreme values but exceed the 75th quartile (or fall below the 25th quartile) by the product of the outlier factor and the interquartile range are flagged as outliers. The filter can be configured to flag an instance as an outlier or extreme if any of its attribute values are deemed outliers or extreme, or to generate an outlier/extreme indicator pair for each attribute. It is also possible to flag all extreme values as outliers, and to output attributes that indicate by how many interquartile ranges an attribute’s value deviates from the median. AddCluster applies a clustering algorithm to the data before filtering it. You use the object editor to choose the clustering algorithm. Clusterers are configured just as filters are.

The AddCluster object editor contains its own Choose button for the clusterer, and you configure the clusterer by clicking its line and getting another object editor panel, which must be filled in before returning to the AddCluster object editor. This is probably easier to understand when you do it in practice than when you read about it here! At any rate, once you have chosen a clusterer, AddCluster uses it to assign a cluster number to each instance, as a new attribute. The object editor also allows you to ignore certain attributes when clustering, specified as described previously for Copy. ClusterMembership uses a clusterer, again specified in the filter’s object editor, to generate membership values. A new version of each instance is created whose attributes are these values. The class attribute, if set, is ignored during clustering. AddExpression creates a new attribute by applying a mathematical function to numeric attributes. The expression can contain attribute references and constants; arithmetic operators +, –, *, /, and ^; the functions log and exp, abs and sqrt, floor, ceil and rint¹, sin, cos and tan; and parentheses. Attributes are specified by the prefix a, for example, a7 is the seventh attribute. An example expression is

a1^2*a5/log(a7*4.0)

There is a debug option that replaces the new attribute’s value with a postfix parse of the supplied expression. MathExpression is similar to AddExpression but can be applied to multiple attributes. Rather than creating a new attribute it replaces the original values with the result of the expression in situ; because of this, the expression cannot reference the value of other attributes. All the operators that apply to AddExpression can be used, as well as the minimum, maximum, mean, sum, sum-squared and standard deviation of the attribute being processed. Furthermore, simple if-then-else expressions involving the operators and functions can be applied as well. Whereas AddExpression and MathExpression apply mathematical functions specified in textual form, NumericTransform performs an arbitrary transformation by applying a given Java function to selected numeric attributes. The function can be anything that takes a double as its argument and returns another double, for example, sqrt() in java.lang.Math. One parameter is the name of the Java class that implements the function (which must be a fully qualified name); another is the name of the transformation method itself. Normalize scales all numeric values in the dataset to lie between 0 and 1. The normalized values can be further scaled and translated with user-supplied constants. Center and Standardize transform them to have zero mean; Standardize gives them unit variance too. All three skip the class attribute, if set. RandomSubset randomly selects a subset of the attributes to include in the output; the user specifies how many (as a percentage of the number of attributes). The class attribute, if set, is always included. CartesianProduct produces a new attribute with values resulting from performing the Cartesian product between two or more nominal attributes. The name of the new attribute is a concatenation of the names of the original attributes. PartitionedMultiFilter is a special filter that applies a set of filters to a corresponding set of attribute ranges in the input dataset. The user supplies and configures each filter, and defines the range of attributes for them to work with. There is an option to delete attributes that are not covered by any of the ranges. Only filters that operate on attributes are allowed. The output of the individual filters is assembled into a new dataset. Reorder alters the order of the attributes in the data; the new order is specified by supplying a list of attribute indices. By omitting or duplicating indices it is possible to delete attributes or make several copies of them.

¹ The rint function rounds to the closest integer.
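As an illustration of the expression-based filters just described, the following brief sketch (assuming the standard WEKA API; the input file and the attribute name "derived" are hypothetical) adds the example expression above as a new attribute:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.AddExpression;

public class AddExpressionDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("cpu.arff");   // placeholder path

    AddExpression add = new AddExpression();
    add.setExpression("a1^2*a5/log(a7*4.0)");       // the expression shown above
    add.setName("derived");                         // name of the new attribute
    add.setInputFormat(data);

    Instances withDerived = Filter.useFilter(data, add);
    System.out.println(withDerived.attribute(withDerived.numAttributes() - 1));
  }
}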

2.3.1.2 Changing values

SwapValues swaps the positions of two values of a nominal attribute. The order of values is entirely cosmetic—it does not affect learning at all—but if the class is selected, changing the order affects the layout of the confusion matrix. MergeTwoValues merges values of a nominal attribute into a single category. The new value’s name is a concatenation of the two original ones, and every occurrence of either of the original values is replaced by the new one. The index of the new value is the smaller of the original indices. For example, if you merge the first two values of the outlook attribute in the weather data—in which there are five sunny, four overcast, and five rainy instances—the new outlook attribute will have values sunny_overcast and rainy; there will be nine sunny_overcast instances and the original five rainy ones. MergeManyValues is similar to MergeTwoValues but allows any number of values of a nominal attribute to be replaced by a single category. The user can specify the name of the new category to create. MergeInfrequentValues also merges several nominal values into one category, but in this case the process is controlled by a user-supplied minimum frequency. Values that occur fewer times than this minimum are replaced with the new category. As with the other merging filters, the new value’s name is a concatenation of the original ones. The user can opt to have a short identifier, based on a hash code, used as the new value instead—this is useful if the concatenated names become very long. One way of dealing with missing values is to replace them globally before applying a learning scheme. ReplaceMissingValues replaces each missing value by the mean for numeric attributes and the mode for nominal ones. If a class is set, missing values of that attribute are not replaced by default, but this can be changed. ReplaceMissingWithUserConstant is another filter that can replace missing values. In this case, it allows the user to specify a constant value to use. This constant can be specified separately for numeric, nominal and date attributes. ReplaceWithMissingValue is a filter that does the opposite of the ones we have just discussed—it replaces non-missing values, for a user-specified range of attributes, with missing values at random. The probability that a given value will be replaced with a missing value can be set as an option. NumericCleaner replaces the values of numeric attributes that are too small, too large or too close to a particular value with default values. A different default can be specified for each case, along with thresholds for what is considered too large or too small; a tolerance value for defining "too close" can be specified as well. AddValues adds any values from a user-supplied list that are not already present in a nominal attribute. The labels can optionally be sorted. ClassAssigner can be used to set or unset a dataset’s class attribute. The user supplies the index of the new class attribute; a value of zero unsets the existing class attribute. SortLabels can be used to sort the values of selected nominal attributes.

2.3.1.3 Conversions

Many filters convert attributes from one form to another. Discretize uses equal-width or equal-frequency binning to discretize a range of numeric attributes, specified in the usual way. For the former method the number of bins can be specified or chosen automatically by maximizing the likelihood using leave-one-out cross-validation. It is also possible to create several binary attributes instead of one multi-valued one. For equal-frequency discretization, the desired number of instances per interval can be changed. PKIDiscretize discretizes numeric attributes using equal-frequency binning; the number of bins is the square root of the number of values (excluding missing values). Both these filters skip the class attribute by default.
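For example, the following minimal sketch (assuming the standard WEKA API; the dataset path is a placeholder) applies unsupervised equal-frequency binning with 10 bins to all numeric attributes:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("cpu.arff");   // placeholder path

    Discretize disc = new Discretize();
    disc.setBins(10);                     // number of bins
    disc.setUseEqualFrequency(true);      // equal-frequency rather than equal-width binning
    disc.setAttributeIndices("first-last");
    disc.setInputFormat(data);

    Instances discretized = Filter.useFilter(data, disc);
    System.out.println(discretized.numAttributes() + " attributes after discretization");
  }
}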

MakeIndicator converts a nominal attribute into a binary indicator attribute and can be used to transform a multiclass dataset into several two-class ones. It substitutes a binary attribute for the chosen nominal one, whose values for each instance are 1 if a particular original value was present and 0 otherwise. The new attribute is declared to be numeric by default, but it can be made nominal if desired. For some learning schemes, such as support vector machines, multivalued nominal attributes must be converted to binary ones. The NominalToBinary filter transforms all specified multivalued nominal attributes in a dataset into binary ones, replacing each attribute with k values by k binary attributes using a simple one-per-value (“one-hot”) encoding. The new attributes will be numeric by default. Attributes that are already binary are left untouched. NumericToBinary converts all numeric attributes into nominal binary ones (except the class, if set). If the value of the numeric attribute is exactly 0, the new attribute will be 0, and if it is missing, the new attribute will be missing; otherwise, the value of the new attribute will be 1. These filters also skip the class attribute. NumericToNominal converts numeric attributes to nominal ones by simply adding every distinct numeric value to the list of nominal values. This can be a useful filter to apply after importing a .csv file—WEKA’s CSV import facility creates a numeric attribute for any data column whose values can all be parsed as numbers, unless configured to do otherwise, but it might make sense to interpret the values of an integer attribute as discrete instead. FirstOrder takes a range of N numeric attributes and replaces them with N – 1 numeric attributes whose values are the differences between consecutive attribute values from the original instances. For example, if the original attribute values were 3, 2, and 1, the new ones will be –1 and –1. KernelFilter converts the data to a kernel matrix: it outputs a new dataset, containing the same number of instances as before, in which each value is the result of evaluating a kernel function on a pair of the original instances. By default, all attributes are transformed to center them around zero before the kernels are computed, though they are not rescaled to unit variance. However, different filters can be specified. PrincipalComponents performs a principal components transformation on the dataset. First, any multi-valued nominal attributes are converted to binary, and missing values are replaced by means. The data is normalized (by default). The number of components is normally determined based on the user-specified proportion of variance to be covered, but it is also possible to specify the number of components explicitly. Transpose, as the name suggests, transposes the data—instances become attributes and attributes become instances. If the first attribute in the original data is a nominal or string identifier, this will be used to create attribute names in the transposed data. Any other attributes must be numeric. The attribute names in the original data are used to create a string identifier attribute in the transposed data.

2.3.1.4 String conversion

A string attribute has an unspecified number of values. StringToNominal converts it to nominal with a set number of values. You should ensure that all string values that will appear in potential test data are represented in the dataset. NominalToString converts the other way. StringToWordVector produces numeric attributes that represent the frequency of words in the value of a string attribute. The set of words—that is, the new attribute set—is determined from the full set of values in the string attribute. The new attributes can be named with a user-determined prefix to keep attributes derived from different string attributes distinct. There are many options that affect tokenization. Words can be formed from contiguous alphabetic sequences or separated by a given set of delimiter characters. In the latter case, they can be further split into n-grams (with user-supplied minimum and maximum length), or be processed by a stemming algorithm. They can be converted to lowercase before being added to the dictionary, or all words on a supplied list of stopwords can be ignored. Words that are not among the top k words ranked by frequency can be discarded (slightly more than k words will be retained if there are ties at the kth position). If a class attribute has been assigned, the top k words for each class will be kept (this can be turned off by the user). The value of each word attribute reflects its presence or absence in the string, but this can be changed. A count of the number of times the word appears in the string can be used instead. Word frequencies can be normalized to give each document’s attribute vector the same Euclidean length—this length is not chosen to be 1, to avoid the very small numbers that would entail, but is the average length of all documents that appear as values of the original string attribute. Alternatively, the frequencies f_ij for word i in document j can be transformed using log(1 + f_ij) or the TFIDF measure. FixedDictionaryStringToWordVector is similar to StringToWordVector but allows the user to provide a dictionary file containing the words that will form the attributes, rather than building a dictionary on the fly. The format of the dictionary file is that of one word per line, with each followed by an optional count (separated by a comma) of how many documents in the corpus from which the dictionary was derived contained that word. ChangeDateFormat alters the formatting string that is used to parse date attributes. Any format supported by Java’s SimpleDateFormat class can be specified.
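A typical text-preprocessing configuration might look like the following sketch (assuming the standard WEKA API; the ARFF file, which is hypothetical, is expected to contain a string attribute and a nominal class as its last attribute):

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TextVectorDemo {
  public static void main(String[] args) throws Exception {
    Instances docs = DataSource.read("messages.arff");   // placeholder path
    docs.setClassIndex(docs.numAttributes() - 1);         // class set, so top words are kept per class

    StringToWordVector s2wv = new StringToWordVector();
    s2wv.setWordsToKeep(1000);        // keep roughly the top 1000 words
    s2wv.setLowerCaseTokens(true);    // fold tokens to lowercase
    s2wv.setOutputWordCounts(true);   // counts rather than presence/absence
    s2wv.setTFTransform(true);        // log(1 + f_ij) transformation
    s2wv.setIDFTransform(true);       // combine with IDF to obtain TFIDF weights
    s2wv.setInputFormat(docs);

    Instances vectors = Filter.useFilter(docs, s2wv);
    System.out.println(vectors.numAttributes() + " attributes in the transformed data");
  }
}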

2.3.1.5 Time series

Two filters work with time series data. TimeSeriesTranslate replaces the values of an attribute (or attributes) in the current instance with the equivalent value in some other (previous or future) instance. TimeSeriesDelta replaces attribute values in the current instance with the difference between the current value and the value in some other instance. In both cases instances in which the time-shifted value is unknown may be removed, or missing values used.

2.3.1.6 Randomizing the attributes

Other attribute filters degrade the data. AddNoise takes a nominal attribute and changes a given percentage of its values. Missing values can be retained or changed along with the rest. Obfuscate anonymizes data by renaming the relation, attribute names, and nominal and string attribute values. RandomProjection projects the dataset on to a lower-dimensional subspace using a random matrix with columns of unit length. The class attribute is not included in the projection.

2.3.2 Unsupervised instance filters

WEKA’s instance filters affect all instances in a dataset rather than all values of a particular attribute or set of attributes.

2.3.2.1 Randomizing and subsampling

Randomize does just what the name suggests—it randomizes the order of the instances in the dataset. The user can specify a random number seed to use in this process. There are various ways of generating subsets of the data. Use Resample to produce a random sample by sampling with or without replacement, or RemoveFolds to split the data into a given number of cross-validation folds and reduce it to just one of them. If a random number seed is provided, the dataset will be shuffled before the subset is extracted. ReservoirSample uses the reservoir sampling algorithm to produce a random sample (without replacement) from a dataset. When used from the Knowledge Flow interface (see Chapter 3) or from the command line, the dataset is read incrementally, so that datasets that exceed main memory can be sampled. RemovePercentage removes a given percentage of instances, and RemoveRange removes a certain range of instance numbers. To remove all instances that have certain values for nominal attributes, or numeric values above or below a certain threshold, use RemoveWithValues. By default all instances are deleted that exhibit one of a given set of nominal attribute values (if the specified attribute is nominal) or a numeric value below a given threshold (if it is numeric). However, the matching criterion can be inverted. The attribute information is normally left unchanged, but this can be altered so that corresponding values of a nominal attribute are deleted from its definition. RemoveFrequentValues can be used to remove those instances containing the most- or least-frequent values of a particular nominal attribute; the user can specify how many frequent or infrequent values to remove. SubsetByExpression selects all instances that satisfy a user-supplied logical expression. The expression can involve mathematical operators and functions, such as those used by AddExpression and MathExpression, and logical operators (such as and, or and not), applied to attribute values. For example, the expression

(CLASS is ’mammal’) and (ATT14 > 2)

selects only those instances whose class attribute has the value mammal and whose 14th attribute exceeds 2. You can remove outliers by applying a classification method to the dataset (specifying it just as the clustering method was specified previously for AddCluster) and use RemoveMisclassified to delete the instances that it misclassifies. The process is normally repeated until the data is fully cleansed, but a maximum number of iterations can be specified instead. Cross-validation can be used rather than evaluation on the training data, and for numeric classes an error threshold can be specified.

2.3.2.2 Sparse instances

The NonSparseToSparse and SparseToNonSparse filters convert between the regular representation of a dataset and its sparse representation.

2.3.3 Supervised filters

Supervised filters are available from the Explorer’s Preprocess panel, just as unsupervised ones are. As mentioned earlier, you need to be careful with them because, despite appearances, they are not really preprocessing operations. We noted this earlier with regard to discretization—it must not use the test data’s class values because these are supposed to be unknown—and this is true for supervised filters in general. Because of popular demand, WEKA allows you to invoke supervised filters as a preprocessing operation, just like unsupervised filters. However, if you intend to use them for classification you should adopt a different methodology. A metalearner is provided so that the process of building the filtering model becomes part of the learning process. This filters the test data using the filter that has been created from the training data. It is also useful for some unsupervised filters. For example, in StringToWordVector the dictionary will be created from the training data alone: words that are novel in the test data will be discarded. To use a supervised filter in this way, invoke the FilteredClassifier metalearning scheme from the meta section of the menu displayed by the Classify panel’s Choose button. Figure 2.17a shows the object editor for this metalearning scheme. With it you choose a classifier and a filter. Figure 2.17b shows the menu of filters. Supervised filters, just like unsupervised ones, are divided into attribute and instance filters.
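To make this concrete, here is a minimal sketch (assuming the standard WEKA API; the dataset path is a placeholder) that wraps J48 and the supervised Discretize filter in FilteredClassifier and evaluates the combination by cross-validation, so that the discretization intervals are recomputed from the training part of each fold:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.supervised.attribute.Discretize;

public class FilteredClassifierDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("diabetes.arff");   // placeholder path
    data.setClassIndex(data.numAttributes() - 1);

    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(new Discretize());   // supervised discretization as the preprocessing step
    fc.setClassifier(new J48());      // base learner applied to the filtered data

    // The filter is rebuilt from the training data of each fold, never from the test data
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(fc, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}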

2.3.3.1 Supervised attribute filters

Discretize, highlighted in Figure 2.17, uses the MDL method of supervised discretization. You can specify a range of attributes or force the discretized attribute to be binary. The class must be nominal. By default Fayyad and Irani’s (1993) criterion is used, but Kononenko’s method (1995) is an option. There is a supervised version of the NominalToBinary filter that transforms all multivalued nominal attributes to binary ones. In this version, the transformation depends on whether the class is nominal or numeric. If nominal, the same method as before is used: an attribute with k values is transformed into k binary attributes. If the class is numeric, however, the method used by the model tree learner M5’ is applied. In either case the class itself is not transformed. ClassOrder changes the ordering of the class values. The user determines whether the new ordering is random or in ascending or descending class frequency. This filter must not be used with the FilteredClassifier metalearning scheme! AddClassification adds to the data the predictions of a given classifier, which can be either trained on the input dataset or loaded from a file as a serialized object. New attributes can be added that hold the predicted class value, the predicted probability distribution (if the class is nominal), or a flag that indicates misclassified instances (or, for numeric classes, the difference between the predicted and actual class values). AttributeSelection applies a given attribute selection technique in order to reduce the number of attributes in the data. The user can choose which attribute or subset evaluator to use and combine it with a search strategy in order to implement an attribute selection method. ClassConditionalProbabilities converts the values of nominal and numeric attributes into class conditional probability estimates. For each original attribute A converted, it creates as many new attributes as there are class values, where the value of the new attribute corresponding to class Ck is P(A|Ck). This is essentially the same as what the naive Bayes classifier computes and, in fact, the implementation uses naive Bayes internally. Like the unsupervised filters that merge nominal values, this filter can help when dealing with nominal attributes that have a large number of distinct values.

2.3.3.2 Supervised instance filters

There are four supervised instance filters. Resample is like the eponymous unsupervised instance filter except that it maintains the class distribution in the subsample. Alternatively, it can be configured to bias the class distribution towards a uniform one. Sampling can be performed with (default) or without replacement. SpreadSubsample also produces a random subsample, but the frequency difference between the rarest and the most common class can be controlled—for example, you can specify at most a 2 : 1 difference in class frequencies. You can also limit the number of instances in any class by specifying an explicit maximum count. SMOTE is another filter that samples the data and alters the class distribution (Chawla et al., 2002). Like SpreadSubsample, it can be used to adjust the relative frequency between minority and majority classes in the data—but it takes care not to undersample majority classes and it oversamples the minority class by creating synthetic instances using a k-nearest neighbor approach. The user can specify the oversampling percentage and the number of neighbors to use when creating synthetic instances. Like the unsupervised instance filter RemoveFolds, StratifiedRemoveFolds outputs a specified cross-validation fold for the dataset, except that this time the fold is stratified.

2.4 Learning algorithms

On the Classify panel, when you select a learning algorithm using the Choose button the command-line version of the classifier appears in the line beside the button, including the parameters specified with minus signs. To change them, click that line to get an appropriate object editor. The classifiers in WEKA are divided into Bayesian classifiers, trees, rules, functions, lazy classifiers and a final miscellaneous category. We describe them briefly here, along with their parameters. To learn more, choose one in the WEKA Explorer interface and examine its object editor. A further kind of classifier, the metalearner, is described in the next section.

2.4.1 Bayesian classifiers

NaiveBayes implements the probabilistic Naive Bayes classifier. NaiveBayesSimple uses the normal distribution to model numeric attributes. NaiveBayes can use kernel density estimators, which improve performance if the normality assumption is grossly incorrect; it can also handle numeric attributes using supervised discretization. Figure 2.18 shows the output of NaiveBayes on the weather data. The salient difference between this and the output in Figure 2.5 of J48 on the same data is that instead of a tree in textual form, here the parameters of the Naive Bayes model are presented in a table. The first column shows attributes and the other two show class values; entries are either frequency counts of nominal values or parameters of normal distributions for numeric attributes. For example, Figure 2.18 shows that the mean temperature value for instances of class yes is 72.9697, while for instances for which windy = yes the values true and false occur 4 and 7 times respectively. The grand total of the yes and no counts for windy is, surprisingly, 18—more than the 14 instances in the weather data (the situation for outlook is even worse, totaling 20). The reason is that NaiveBayes avoids zero frequencies by applying the Laplace correction, which involves initializing each count to 1 rather than to 0. NaiveBayesUpdateable is an incremental version that processes one instance at a time; it can use a kernel estimator but not discretization. NaiveBayesMultinomial implements the multinomial Bayes classifier; NaiveBayesMultinomialUpdateable is an incremental version. NaiveBayesMultinomialText is a version of incremental multinomial naive Bayes that can work directly on string attributes. It effectively performs the function of the StringToWordVector filter, minus the TF/IDF transformation, on the fly. BayesNet learns Bayesian nets under the assumptions made in Section 6.7: nominal attributes (numeric ones are prediscretized) and no missing values (any such values are replaced globally). There are four different algorithms for estimating the conditional probability tables of the network. Search is done using the K2 or the TAN algorithms, or more sophisticated methods based on hill-climbing, simulated annealing, tabu search, and genetic algorithms. Optionally search speed can be improved using AD trees. There are also two algorithms that use conditional independence tests to learn the structure of the network; alternatively, the network structure can be loaded from an XML file. More details on the implementation of Bayesian networks in WEKA can be found in Bouckaert (2004). You can observe the network structure by right-clicking the history item and selecting Visualize graph. Figure 2.19a shows the graph for the nominal version of the weather data, which in fact corresponds to the Naive Bayes result with all probabilities conditioned on the class value. This is because the search algorithm defaults to K2 with the maximum number of parents of a node set to one. Reconfiguring this to three by clicking on K2 in the configuration panel yields the more interesting network in Figure 2.19b. Clicking on a node shows its probability distribution—Figure 2.19c is obtained by clicking on the windy node in Figure 2.19b.
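As a quick illustration, the following sketch (assuming the standard WEKA API; the file name is a placeholder) builds NaiveBayes with kernel density estimation enabled and evaluates it by cross-validation:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NaiveBayesDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("weather.arff");    // placeholder path
    data.setClassIndex(data.numAttributes() - 1);

    NaiveBayes nb = new NaiveBayes();
    nb.setUseKernelEstimator(true);   // kernel densities instead of single Gaussians

    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(nb, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}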

2.4.2 Trees

Of the tree classifiers in WEKA, we have already seen how to use J4.8, which reimplements C4.5. To see the options, click the line beside the Choose button in Figure 2.4b to bring up the object editor in Figure 2.20. You can build a binary tree instead of one with multiway branches. You can set the confidence threshold for pruning (default 0.25), and the minimum number of instances permissible at a leaf (default 2). Instead of standard C4.5 pruning you can choose reduced-error pruning. The numFolds parameter (default 3) determines the size of the pruning set: the data is divided equally into that number of parts and the last one used for pruning. When visualizing the tree it is nice to be able to consult the original data points, which you can do if saveInstanceData has been turned on (it is off, or False, by default to reduce memory requirements). You can suppress subtree raising, yielding a more efficient algorithm; force the algorithm to use the unpruned tree instead of the pruned one; or use Laplace smoothing for predicted probabilities. DecisionStump, designed for use with the boosting methods described later, builds one-level binary decision trees for datasets with a categorical or numeric class, dealing with missing values by treating them as a separate value and extending a third branch from the stump. Trees built by RandomTree consider a given number of random features at each node, performing no pruning. RandomForest constructs random forests by bagging ensembles of random trees. REPTree builds a decision or regression tree using information gain/variance reduction and prunes it using reduced-error pruning. Optimized for speed, it only sorts values for numeric attributes once. It deals with missing values by splitting instances into pieces, as C4.5 does. You can set the minimum number of instances per leaf, maximum tree depth (useful when boosting trees), minimum proportion of training set variance for a split (numeric classes only), and number of folds for pruning. M5P is the model tree learner M5’ described in Chapter 7 of the book. LMT builds logistic model trees. It can deal with binary and multiclass target variables, numeric and nominal attributes, and missing values. When fitting the logistic regression functions at a node using the LogitBoost algorithm, it uses cross-validation to determine how many iterations to run just once and employs the same number throughout the tree instead of cross-validating at every node. This heuristic (which you can switch off) improves the run time considerably, with little effect on accuracy. Alternatively, you can set the number of boosting iterations to be used throughout the tree manually, or use a fast heuristic based on the Akaike Information Criterion instead of cross-validation. Weight trimming during the boosting process can be used to further improve run time. Normally, it is the misclassification error that cross-validation minimizes, but the mean absolute error of the probability estimates can be chosen instead. The splitting criterion can be based on C4.5’s information gain (the default) or on the LogitBoost residuals, striving to improve the purity of the residuals. LMT employs the minimal cost-complexity pruning mechanism to produce a compact tree structure. HoeffdingTree is an implementation of the incremental decision tree algorithm described in Chapter 13 of the book. It offers options to create splits based on information gain or the Gini index.
Predictions at the leaves of the tree can be made by either majority class or naive Bayes models.

2.4.3 Rules

DecisionTable builds a decision table classifier. It evaluates feature subsets using best-first search and can use cross-validation for evaluation (Kohavi 1995b). An option uses the nearest-neighbor method to determine the class for each instance that is not covered by a decision table entry, instead of the table’s global majority, based on the same set of features. OneR is the 1R classifier with one parameter: the minimum bucket size for discretization. Figure 2.21 shows its output for the labor negotiations data. The Classifier model part shows that wage-increase-first-year has been identified as the basis of the rule produced, with a split at the value 2.9 dividing bad outcomes from good ones (the class is also good if the value of that attribute is missing). Beneath the rules the fraction of training instances correctly classified by the rules is given in parentheses. Part obtains rules from partial decision trees. It builds the tree using C4.5’s heuristics with the same user-defined parameters as J4.8. Figure 2.22 shows the output of Part for the labor negotiations data. Three rules are found, and are intended to be processed in order, the prediction generated for any test instance being the outcome of the first rule that fires. The last, “catch-all,” rule will always fire if no other rule fires. As with J48, the numbers in parentheses that follow each rule give the number of instances that are covered by the rule followed by the number that are misclassified (if any). M5Rules obtains regression rules from model trees built using M5’. Ridor learns rules with exceptions by generating the default rule, using incremental reduced-error pruning to find exceptions with the smallest error rate, finding the best exceptions for each exception, and iterating. JRip implements the RIPPER algorithm, including heuristic global optimization of the rule set (Cohen 1995). NNge is a nearest-neighbor method for generating rules using nonnested generalized exemplars.

2.4.4 Functions

Algorithms that fall into the functions category include an assorted group of classifiers that can be written down as mathematical equations in a reasonably natural way. Other methods, such as decision trees and rules, cannot (there are exceptions: Naive Bayes has a simple mathematical formulation). Four of them implement linear regression. SimpleLinearRegression learns a linear regression model based on a single attribute—it chooses the one that yields the smallest squared error. Missing values and nonnumeric attributes are not allowed. Figure 2.23 shows the output of SimpleLinearRegression for the CPU performance data. The attribute that has the smallest squared error in this case is MMAX. LinearRegression performs least-squares multiple linear regression including attribute selection, either greedily using backward elimination or by building a full model from all attributes and dropping terms one by one in decreasing order of their standardized coefficients until a stopping criterion is reached. Both methods use a version of the AIC termination criterion. Attribute selection can be turned off. The implementation has two further refinements: a heuristic mechanism for detecting collinear attributes (which can be turned off) and a ridge parameter that stabilizes degenerate cases and can reduce overfitting by penalizing large coefficients. Technically, LinearRegression implements ridge regression. SMO implements the sequential minimal optimization algorithm for training a support vector classifier (Platt, 1998; Keerthi et al., 2001), using kernel functions such as polynomial or Gaussian kernels. Missing values are replaced globally, nominal attributes are transformed into binary ones, and attributes are normalized by default—note that the coefficients in the output are based on the normalized data. Normalization can be turned off, or the input standardized to zero mean and unit variance. Pairwise classification is used for multiclass problems. Logistic regression models can be fitted to the support vector machine output to obtain probability estimates. In the multiclass case the predicted probabilities will be combined using pairwise coupling (Hastie and Tibshirani, 1998). When working with sparse instances, turn normalization off for faster operation. Figure 2.24 shows the output of SMO on the iris data. A polynomial kernel with an exponent of 1 has been used, making the model a linear support vector machine. Since the iris data contains three class values, three binary SMO models have been output—one hyperplane to separate each possible pair of class values. Furthermore, since the machine is linear, the hyperplanes are expressed as functions of the attribute values in the original (albeit normalized) space. Figures 2.25 and 2.26 show the result when the exponent of the polynomial kernel is set to 2, making the support vector machine nonlinear. There are three binary SMO models as before, but this time the classifiers are expressed as functions of the support vectors. Each support vector is shown enclosed in angle brackets, along with the value of its coefficient α. The value of the offset parameter, b, is shown as the last component of each function. SMOreg implements the sequential minimal optimization algorithm for learning a support vector regression model (Smola and Schölkopf, 1998). VotedPerceptron is the voted perceptron algorithm. Winnow modifies the basic perceptron to use multiplicative updates.
The implementation allows for a second multiplier β—different from 1/α—to be used, which makes the algorithm slightly more flexible than the version of balanced Winnow discussed in the book. GaussianProcesses implements the Bayesian Gaussian process technique for non-linear regression. Users can specify the kernel function, along with a “noise” regularization parameter for controlling the closeness of fit. They can choose to have the training data normalized or standardized before learning the regression. For point estimates, this method is equivalent to kernel ridge regression. SGD implements stochastic gradient descent learning of linear models. Models can be learned in either a batch or incremental fashion. The loss function chosen determines the type of model learned. Available loss functions include the hinge loss (linear support vector classifier), log loss (logistic regression), squared loss (least squares linear regression), epsilon-insensitive loss (support vector regression) and Huber loss (robust regression). SGDText, like NaiveBayesMultinomialText, is a version that is designed for text classification and can operate directly on string attributes. SGDText is limited to just the hinge and log loss. SimpleLogistic builds logistic regression models, fitting them using LogitBoost with simple regression functions as base learners and determining how many iterations to perform using cross-validation—which supports automatic attribute selection. SimpleLogistic generates a degenerate logistic model tree comprising a single node, and supports the options given earlier for LMT. Logistic is an alternative implementation for building and using a multinomial logistic regression model with a ridge estimator to guard against overfitting by penalizing large coefficients, based on work by le Cessie and van Houwelingen (1992). Figure 2.27 shows its output on the iris data. The coefficients of the regression functions are shown in tabular form, one for each class value except for the last class. Given m input attributes and k classes, the probability predicted for class j (with the exception of the last class) is given by

\[
P(j \mid a_1, a_2, \ldots, a_m) = \frac{\exp(w_0^j + w_1^j a_1 + w_2^j a_2 + \cdots + w_m^j a_m)}{1 + \sum_{l=1}^{k-1} \exp(w_0^l + w_1^l a_1 + w_2^l a_2 + \cdots + w_m^l a_m)}
\]

where \(w_0^j\) is the intercept term for class j. The probability of the last class k is given by

\[
1 - \sum_{j=1}^{k-1} P(j \mid a_1, a_2, \ldots, a_m)
\]

Beneath the table of regression coefficients is a second table giving an estimate of the odds ratio for each input attribute and class value. For a given attribute, this value gives an indication of its influence on the class when the values of the other attributes are held fixed.

2.4.5 Neural networks

MultilayerPerceptron is a neural network that is trained using back propagation. Although listed under functions, it differs from the other schemes because it has its own user interface. If you load up the numeric version of the weather data, invoke MultilayerPerceptron, set GUI to True in its object editor, and run the network by clicking Start on the Classify panel, the diagram in Figure 2.28 appears in a separate window. This network has three layers: an input layer on the left with one rectangular box for each attribute (colored green); a hidden layer next to it (red) to which all the input nodes are connected; and an output layer at the right (orange). The labels at the far right show the classes that the output nodes represent. Output nodes for numeric classes are automatically converted to unthresholded linear units. Before clicking Start to run the network, you can alter its structure by adding nodes and connections. Nodes can be selected or deselected. All six nodes in the hidden and output layers in Figure 2.28 are deselected, indicated by the gray color of their center. To select a node, simply click on it. This changes the color of its center from gray to bright yellow. To deselect a node, right-click in an empty space. To add a node, ensure that none is selected and left-click anywhere in the panel; the new node will be selected automatically. In Figure 2.28, a new node has been added at the lower center. To connect two nodes, select the start node and then click on the end one. If several start nodes are selected, they are all connected to the end node. If you click in empty space instead, a new node is created as the end node. Notice that connections are directional (although the directions are not shown). The start nodes remain selected; thus you can add an entire hidden layer with just a few clicks, as shown in Figure 2.28b. To remove a node, ensure that no nodes are selected and right-click it; this also removes all connections to it. To remove a single connection, select one node and right-click the node at the other end. As well as configuring the structure of the network, you can control the learning rate, the momentum, and the number of passes that will be taken through the data, called epochs. The network begins to train when you click Start, and a running indication of the current epoch and the error for that epoch is shown at the lower left of the panel in Figure 2.28. Note that the error is based on a network that changes as the value is computed. For numeric classes the error value depends on whether the class is normalized. The network stops when the specified number of epochs is reached, at which point you can accept the result or increase the desired number of epochs and press Start again to continue training. MultilayerPerceptron need not be run through the graphical interface. Several parameters can be set from the object editor to control its operation. If you are using the graphical interface they govern the initial network structure, which you can override interactively. With autoBuild set, hidden layers are added and connected up. The default is to have the one hidden layer shown in Figure 2.28a, but without autoBuild this would not appear and there would be no connections. The hiddenLayers parameter defines what hidden layers are present and how many nodes each one contains.
Figure 2.28a is generated by a value of 4 (one hidden layer with four nodes), and although 2.28b was created by adding nodes interactively, it could have been generated by setting hiddenLayers to 4,5 (one hidden layer with four nodes and another with five). The value is a comma-separated list of integers; 0 gives no hidden layers. Furthermore, there are predefined values that can be used instead of integers: i is the number of attributes, o the number of class values, a the average of the two, and t their sum. The default, a, was used to generate Figure 2.28a.

The options learningRate and momentum set values for these parameters, which can be overridden in the graphical interface. A decay parameter causes the learning rate to decrease with time: it divides the starting value by the epoch number to obtain the current rate. This sometimes improves performance and may stop the network from diverging. The reset parameter automatically resets the network with a lower learning rate and begins training again if it is diverging from the answer (this option is only available if the graphical interface is not used). The trainingTime parameter sets the number of training epochs. Alternatively, a percentage of the data can be set aside for validation (using validationSetSize): then training continues until performance on the validation set starts to deteriorate consistently—or until the specified number of epochs is reached. If the percentage is set to zero, no validation set is used. The validationThreshold parameter determines how many consecutive times the validation set error can deteriorate before training is stopped. The NominalToBinaryFilter filter is specified by default in the MultilayerPerceptron object editor; turning it off may improve performance on data in which the nominal attributes are really ordinal. The attributes can be normalized (with normalizeAttributes), and a numeric class can be normalized too (with normalizeNumericClass). Both may improve performance; these options are turned on by default.
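The same configuration can also be set programmatically. The following is a minimal sketch (assuming the standard WEKA API; the dataset path is a placeholder) that reproduces the two-hidden-layer structure discussed above without the graphical interface:

import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MLPDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("weather.numeric.arff");  // placeholder path
    data.setClassIndex(data.numAttributes() - 1);

    MultilayerPerceptron mlp = new MultilayerPerceptron();
    mlp.setGUI(false);              // run without the interactive window
    mlp.setHiddenLayers("4,5");     // one hidden layer of four nodes and another of five
    mlp.setLearningRate(0.3);
    mlp.setMomentum(0.2);
    mlp.setTrainingTime(500);       // number of training epochs

    mlp.buildClassifier(data);
    System.out.println(mlp);
  }
}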

2.4.6 Lazy classifiers

Lazy learners store the training instances and do no real work until classification time. The simplest lazy learner is the k-nearest-neighbor classifier, which is implemented by IBk. A variety of different search algorithms can be used to speed up the task of finding the nearest neighbors. A linear search is the default, but other options include kD-trees, ball trees and so-called “cover trees” (Beygelzimer et al., 2006). The distance function used is a parameter of the search method. The default is the same as for IB1, that is, the Euclidean distance; other options include Chebyshev, Manhattan and Minkowski distances. The number of nearest neighbors (default k = 1) can be specified explicitly in the object editor or determined automatically using leave-one-out cross-validation, subject to an upper limit given by the specified value. Predictions from more than one neighbor can be weighted according to their distance from the test instance, and two different formulas are implemented for converting the distance into a weight. The number of training instances kept by the classifier can be restricted by setting the window size option. As new training instances are added, the oldest ones are removed to maintain the number of training instances at this size. KStar is a nearest-neighbor method with a generalized distance function based on transformations, discussed in Chapter 7 of the book. LWL is a general algorithm for locally weighted learning. It assigns weights using an instance-based method and builds a classifier from the weighted instances. The classifier is selected in LWL’s object editor: a good choice is naive Bayes or logistic regression for classification problems and linear regression for regression problems. You can set the number of neighbors used, which determines the kernel bandwidth, and the kernel shape to use for weighting—linear, inverse, or Gaussian. Attribute normalization is turned on by default.
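For example, the following sketch (assuming the standard WEKA API; the file name is a placeholder) lets IBk pick the number of neighbors by leave-one-out cross-validation, up to a maximum of 10:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IBkDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("glass.arff");      // placeholder path
    data.setClassIndex(data.numAttributes() - 1);

    IBk ibk = new IBk();
    ibk.setKNN(10);               // upper limit on k
    ibk.setCrossValidate(true);   // choose k <= 10 by leave-one-out cross-validation

    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(ibk, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}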

2.4.7 Miscellaneous classifiers

The “Misc.” category includes just two classifiers, unless further corresponding WEKA packages have been installed. SerializedClassifier loads a model that has been serialized to a file and uses it for prediction. Providing a new training dataset has no effect, because it encapsulates a static model. Similarly, performing cross-validation using SerializedClassifier makes little sense. InputMappedClassifier wraps a base classifier (or model that has been serialized to a file) and constructs a mapping between the attributes present in the incoming test data and those that were seen when the model was trained. Values for attributes present in the test data but not in the training data are simply ignored. Values for attributes present at training time but not present in the test data receive missing values. Similarly, missing values are used for incoming nominal values not seen during training.

2.4.8 Metalearning algorithms

Metalearning algorithms take classifiers and turn them into more powerful learners. One parameter specifies the base classifier(s); others specify the number of iterations for iterative schemes such as bagging and boosting and an initial seed for the random number generator. We already met FilteredClassifier in Section 2.3.3: it runs a classifier on data that has been passed through a filter, which is a parameter. The filter’s own parameters are based exclusively on the training data, which is the appropriate way to apply a supervised filter to test data.

2.4.8.1 Bagging and randomization

Bagging bags a classifier to reduce variance. This implementation works for both classification and regression, depending on the base learner. In the case of classification, predictions are generated by averaging probability estimates, not by voting. One parameter is the size of the bags as a percentage of the training set. Another is whether to calculate the out-of-bag error, which gives the average error of the ensemble members (Breiman 2001). RandomCommittee is even simpler: it builds an ensemble of base classifiers and averages their predictions. Each one is based on the same data but uses a different random number seed. This only makes sense if the base classifier is randomized; otherwise, all classifiers would be the same. RandomSubSpace builds an ensemble of classifiers, each trained using a randomly selected subset of the input attributes. Aside from the number of iterations and random seed to use, it provides a parameter to control the size of the attribute subsets. RotationForest implements the rotation forest ensemble learner. Although the classic paper on rotation forests, Rodriguez et al. (2006), uses random subspaces and principal components to create an ensemble of decision trees, WEKA’s implementation allows the base classifier to be any classification or regression scheme. The principal components transformation is performed by WEKA’s filter of the same name. RotationForest can be configured to use other projections such as random projections or partial least squares. Other parameters control the size of the subspaces and the number of instances that are input to the projection filter.

2.4.8.2 Boosting

AdaBoostM1 implements the classic boosting algorithm. It can be accelerated by specifying a threshold for weight pruning. AdaBoostM1 resamples if the base classifier cannot handle weighted instances (you can also force resampling anyway). Whereas AdaBoostM1 only applies to nominal classes, AdditiveRegression enhances the performance of a regression learner. There are two parameters: shrinkage, which governs the learning rate, and the maximum number of models to generate. If the latter is infinite, work continues until the error stops decreasing. LogitBoost performs additive logistic regression. Like AdaBoostM1, it can be accelerated by specifying a threshold for weight pruning. The appropriate number of iterations can be determined using internal cross-validation; there is a shrinkage parameter that can be tuned to prevent overfitting; and you can choose resampling instead of reweighting.
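A typical boosting setup is sketched below (assuming the standard WEKA API; the dataset path is a placeholder): AdaBoostM1 applied to DecisionStump for 100 iterations, evaluated by cross-validation.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BoostingDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("breast-cancer.arff");  // placeholder path
    data.setClassIndex(data.numAttributes() - 1);

    AdaBoostM1 boost = new AdaBoostM1();
    boost.setClassifier(new DecisionStump());  // weak base learner
    boost.setNumIterations(100);               // number of boosting rounds

    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(boost, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}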

2.4.8.3 Combining classifiers

Vote provides a baseline method for combining classifiers. The default scheme is to average their probability estimates or numeric predictions, for classification and regression respectively. Other combination schemes are available, for example, using majority voting for classification. Stacking combines classifiers using stacking for both classification and regression problems. You specify the base classifiers, the metalearner, and the number of cross-validation folds.

2.4.8.4 Cost-sensitive learning

There is one metalearner, called CostSensitiveClassifier, for cost-sensitive learning. The cost matrix can be supplied as a parameter or loaded from a file in the directory set by the onDemandDirectory property, named by the relation name and with the extension cost. CostSensitiveClassifier either reweights training instances according to the total cost assigned to each class or predicts the class with the minimum expected misclassification cost rather than the most likely one.

2.4.8.5 Optimizing performance

Five metalearners use the wrapper technique to optimize the base classifier’s performance. AttributeSelectedClassifier selects attributes, reducing the data’s dimensionality before passing it to the classifier. You can choose the attribute evaluator and search method as in the Select attributes panel described in Section 2.2. CVParameterSelection optimizes performance by using cross-validation to select parameters. For each parameter you give a string containing its lower and upper bounds and the desired number of increments. For example, to vary parameter –P from 1 to 10 in increments of 1, use

P 1 10 10
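For instance, the following sketch (assuming the standard WEKA API; the dataset and the choice of parameter are illustrative) wraps J48 and searches its –M option, the minimum number of instances per leaf, over the range 1 to 10 in 10 steps:

import weka.classifiers.meta.CVParameterSelection;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CVParameterDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("soybean.arff");    // placeholder path
    data.setClassIndex(data.numAttributes() - 1);

    CVParameterSelection ps = new CVParameterSelection();
    ps.setClassifier(new J48());
    ps.addCVParameter("M 1 10 10");   // vary -M from 1 to 10 in 10 steps
    ps.setNumFolds(5);                // folds for the internal cross-validation

    ps.buildClassifier(data);         // picks the best -M and retrains on all the data
    System.out.println(ps);
  }
}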

The number of cross-validation folds to be used can be specified. MultiScheme selects a classifier to use from among several by using either the resubstitution error or cross-validation on the training data. Performance is measured using percentage correct in the case of classification and mean squared error for regression. IterativeClassifierOptimizer optimizes the number of iterations performed by iterative classifiers, such as boosting methods, using cross-validation on the training data. Performance can be measured using any of the metrics described in Chapter 5 of the book. The user can specify parameters such as how many cross-validation folds to use, how many iterations to look ahead in order to find a better solution, and how often to perform the evaluation (if evaluation of every iteration is proving too time consuming). The fifth metalearner, ThresholdSelector, optimizes one of a number of different evaluation metrics, including F-measure, precision, recall, accuracy and true positive rate, by selecting a probability threshold on the classifier’s output. Performance can be measured on the training data, on a holdout set, or by cross-validation. The probabilities returned by the base learner can be rescaled into the full range [0,1], which is useful if the scheme’s probabilities are restricted to a narrow subrange. The metalearner can be applied to multiclass problems by specifying the class value for which the optimization is performed as

1. The first class value.

2. The second class value.

3. Whichever value is least frequent.

4. Whichever value is most frequent.

5. The first class named yes, pos(itive), or 1.
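To make the parameter-string format shown earlier concrete, the following sketch (illustrative only; the path and the choice of J48’s -C and -M options are assumptions) lets CVParameterSelection choose parameter values by cross-validation.

import weka.classifiers.meta.CVParameterSelection;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class ParameterTuningExample {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("path/to/labor.arff"); // placeholder path
    data.setClassIndex(data.numAttributes() - 1);
    CVParameterSelection ps = new CVParameterSelection();
    ps.setClassifier(new J48());
    ps.setNumFolds(10);                  // folds for the internal cross-validation
    ps.addCVParameter("C 0.1 0.5 5");    // try 5 confidence-factor values from 0.1 to 0.5
    ps.addCVParameter("M 1 10 10");      // try 10 minimum-leaf-size values from 1 to 10
    ps.buildClassifier(data);
    System.out.println("Best options: " + Utils.joinOptions(ps.getBestClassifierOptions()));
  }
}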

2.4.8.6 Retargeting classifiers for different tasks Three metalearners adapt learners designed for one kind of task to another. ClassificationViaRegression performs classification using a regression method by binarizing the class and building a regression model for each value. RegressionByDiscretization is a regression scheme that discretizes the class attribute into a specified number of bins using equal-width discretization and then employs a classifier. The predictions are the weighted average of the mean class value for each discretized interval, with weights based on the predicted probabilities for the intervals. MultiClassClassifier handles multiclass problems with two-class classifiers using any of these methods:

1. One versus all the rest.

2. Pairwise classification using voting to predict.

3. Exhaustive error-correcting codes.

4. Randomly selected error-correcting codes.

Random code vectors are known to have good error-correcting properties: a parameter specifies the length of the code vector as a factor that is multiplied by the number of classes in the data. For pairwise classification, pairwise coupling of probability estimates can be turned on.
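A rough Java sketch of wrapping a two-class learner in MultiClassClassifier follows; VotedPerceptron and the file path are arbitrary illustrative choices, and the decomposition method is left at its default (one versus all the rest).

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.VotedPerceptron;
import weka.classifiers.meta.MultiClassClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MultiClassExample {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("path/to/iris.arff"); // placeholder path (3-class problem)
    data.setClassIndex(data.numAttributes() - 1);
    MultiClassClassifier mcc = new MultiClassClassifier();
    mcc.setClassifier(new VotedPerceptron()); // a two-class base learner
    // The default decomposition is one-versus-all-the-rest; the other methods listed
    // above can be selected through the classifier's "method" property.
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(mcc, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}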

2.5 Clustering algorithms

The Cluster panel provides access to clustering algorithms. Cobweb and SimpleKMeans are described in Chapter 4 of the book; EM is described in Chapter 9. For the EM implementation you can specify how many clusters to generate, or the algorithm can decide by cross-validating the loglikelihood—in which case the number of folds is fixed at 10 (unless there are fewer than 10 training instances). You can specify the maximum number of iterations and set the minimum allowable standard deviation for the normal density calculation. Clusters are Gaussian distributions with diagonal covariance matrices.

SimpleKMeans clusters data using k-means; the number of clusters is specified by a parameter. The user can choose between the Euclidean and Manhattan distance metrics. In the latter case the algorithm is actually k-medians instead of k-means, and the centroids are based on medians rather than means in order to minimize the within-cluster distance function.

Figure 2.29 shows the output of SimpleKMeans for the weather data, with default options: two clusters and Euclidean distance. The result of clustering is shown as a table whose rows are attribute names and whose columns correspond to the cluster centroids; an additional column at the beginning shows the entire data set. The number of instances in each cluster appears in parentheses at the top of its column. Each table entry is either the mean (numeric attribute) or mode (nominal attribute) of the corresponding attribute for the cluster in that column; users can choose to show standard deviations (numeric attributes) and frequency counts (nominal attributes) as well. The bottom of the output shows the result of applying the learned clustering model. In this case, it assigned each training instance to one of the clusters, showing the same result as the parenthetical numbers at the top of each column. An alternative is to use a separate test set or a percentage split of the training data, in which case the figures would be different.

Figure 2.30 shows the output of EM for the same data, with the number of clusters set to two. Although there is no notion of the number of instances in each cluster, each column is again headed by the cluster’s prior probability in parentheses. The table entries show the parameters of normal distributions for numeric attributes or frequency counts for the values of nominal attributes—and here the fractional count values reveal the “soft” nature of the clusters produced by the EM algorithm, in that any instance can be split between several clusters. At the bottom the loglikelihood of the model (again with respect to the training data) is shown, as well as the number of instances assigned to each cluster when the learned model is applied to the data as a classifier.

Cobweb implements both the Cobweb algorithm for nominal attributes and the Classit algorithm for numeric attributes. The ordering and priority of the merging and splitting operators differs from the original Cobweb and Classit papers (where it is somewhat ambiguous). This implementation always compares four different ways of treating a new instance and chooses the best: adding it to the best host, making it into a new leaf, merging the two best hosts and adding it to the merged node, and splitting the best host and adding it to one of the splits. Acuity and cutoff are parameters.

HierarchicalClusterer implements agglomerative (bottom-up) generation of hierarchical clusters (Section 6.8 of the book).
Several different link types, which are ways of measuring the distance between clusters, are available as options.

FarthestFirst implements the farthest-first traversal algorithm of Hochbaum and Shmoys (1985), cited by Sanjoy Dasgupta (2002): a fast, simple, approximate clusterer modeled on k-means.

MakeDensityBasedClusterer is a metaclusterer that wraps a clustering algorithm to make it return a probability distribution and density. To each cluster and attribute it fits a discrete distribution or a symmetric normal distribution (whose minimum standard deviation is a parameter).

Canopy implements the canopy clustering algorithm of McCallum, Nigam and Ungar (2000). Canopy clustering is often used as an initialization strategy for k-means or as a fast approximate clustering method for use on large datasets, where applying another algorithm directly might be impractical. The algorithm partitions the input data into proximity regions (canopies) in the form of hyperspheres defined by a loose distance, T1, from the region’s center. A second tight distance, T2 (where T2 < T1), is used to control how many canopies are formed by the algorithm. At most two passes over the data are required. The first pass creates the canopies, and it starts by choosing one instance at random to form the center of the first canopy. Following this, each of the remaining instances is considered in turn by computing its distance to the center of each existing canopy. If a given instance is outside the T2 distance to all canopies then it forms a new canopy; otherwise it is discarded. The second pass over the data assigns instances to canopies and makes use of the loose T1 distance. Because canopies can overlap according to T1 distance, it is possible for a given instance to belong to more than one canopy.
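For readers who prefer the API, the following minimal Java sketch (file path assumed) reproduces the kind of run shown in Figure 2.29: it builds SimpleKMeans with two clusters and prints the centroids and cluster assignments.

import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClusteringExample {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("path/to/weather.arff"); // placeholder path; no class attribute is set
    SimpleKMeans km = new SimpleKMeans();
    km.setNumClusters(2);        // k
    km.buildClusterer(data);
    System.out.println(km);      // centroid table, as in Figure 2.29
    ClusterEvaluation eval = new ClusterEvaluation();
    eval.setClusterer(km);
    eval.evaluateClusterer(data);                      // assign the training instances to clusters
    System.out.println(eval.clusterResultsToString());
  }
}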

2.6 Association rule learners

WEKA has three association rule learners. Apriori implements the Apriori algorithm. It starts with a minimum support of 100% of the data items and decreases this in steps of 5% until there are at least 10 rules with the required minimum confidence of 0.9 or until the support has reached a lower bound of 10%, whichever occurs first. (These default values can be changed.) There are four alternative metrics for ranking rules: Confidence, which is the proportion of the examples covered by the premise that are also covered by the consequent; Lift, which is determined by dividing the confidence by the support; Leverage, which is the proportion of additional examples covered by both the premise and the consequent beyond those expected if the premise and consequent were statistically independent; and Conviction, a measure defined by Brin et al. (1997). You can also specify a significance level, and rules will be tested for significance at this level.

Apriori has an option to limit the rules found to those that contain just the value of a single attribute in the consequence of the rule. Such rules are called “class” association rules—i.e., classification rules. In order to process market-basket data with Apriori, where we are interested in knowing (from the items present in shoppers’ baskets) which items are purchased together, it is necessary to encode the input ARFF data in a specific way. In particular, since we are not interested in co-occurrence of items not present in shopping baskets, the attributes corresponding to items should be declared as single-valued nominal ones in the ARFF file. Missing values can be used to indicate the absence of an item from a shopping basket.

FPGrowth implements the frequent-pattern tree mining algorithm. Because it is designed for market-basket data, it does not use Apriori’s special encoding for this type of data. All attributes are expected to be binary nominal ones, and the user can specify which of the two values is to be treated as positive, i.e., indicates presence in the basket (the default is to use the second value). FPGrowth can operate on either standard or sparse instances. Most of its options are the same as for Apriori. It finds the requested number of rules in the same way—by iteratively decreasing the minimum support. Optionally, the user can have FPGrowth find all the rules that meet the lower bound for the minimum support and the minimum value set for the ranking metric (confidence, lift, leverage or conviction).

FilteredAssociator allows the data to be passed through a filter before it reaches an associator. Both the filter and the base associator are options that the user can configure.
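As an illustration of the market-basket encoding described above for Apriori, an ARFF file might look like the following sketch; the relation, attribute, and item names are invented, with t marking the presence of an item and a missing value (?) marking its absence.

@relation basket

@attribute bread {t}
@attribute milk {t}
@attribute beer {t}

@data
t,?,t
?,t,?
t,t,t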

2.7 Attribute selection

Figure 2.31 shows that part of WEKA’s attribute selection panel where you specify the attribute evaluator and search method. Attribute selection is normally done by searching the space of attribute subsets, evaluating each one. A potentially faster but less accurate approach is to evaluate the attributes individually and sort them, discarding attributes that fall below a chosen cut-off point.

2.7.1 Attribute subset evaluators

Subset evaluators take a subset of attributes and return a numerical measure that guides the search. They are configured like any other WEKA object. CfsSubsetEval assesses the predictive ability of each attribute individually and the degree of redundancy among them, preferring sets of attributes that are highly correlated with the class but with low intercorrelation. An option iteratively adds attributes that have the highest correlation with the class, provided that the set does not already contain an attribute whose correlation with the attribute in question is even higher. Missing values can be treated as a separate value, or their counts can be distributed among other values in proportion to their frequency. Whereas CfsSubsetEval is a filter method of attribute selection, WrapperSubsetEval implements a wrapper method. WrapperSubsetEval uses a classifier to evaluate attribute sets and employs cross-validation to estimate the accuracy of the learning scheme for each set.
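In code, a subset evaluator is paired with a search method (described in Section 2.7.3). The following is a minimal sketch, with a placeholder path, that runs CfsSubsetEval with BestFirst search and prints the selected subset.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SubsetSelectionExample {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("path/to/labor.arff"); // placeholder path
    data.setClassIndex(data.numAttributes() - 1);
    AttributeSelection selector = new AttributeSelection();
    selector.setEvaluator(new CfsSubsetEval());  // attribute subset evaluator
    selector.setSearch(new BestFirst());         // search method (see Section 2.7.3)
    selector.SelectAttributes(data);             // note the capitalised method name in WEKA's API
    System.out.println(selector.toResultsString());
  }
}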

2.7.2 Single-attribute evaluators Single-attribute evaluators are used with the Ranker search method to generate a ranked list from which Ranker discards a given number (explained in the next subsection). ReliefFAttributeEval is instance-based: it samples instances randomly and checks neighboring instances of the same and different classes. It operates on discrete and continuous class data. Parameters specify the number of instances to sample, the number of neighbors to check, whether to weight neighbors by distance, and an exponential function that governs how rapidly weights decay with distance. InfoGainAttributeEval evaluates attributes by measuring their information gain with respect to the class. It discretizes numeric attributes first using the MDL-based discretization method (it can be set to binarize them instead). This method, along with the next three, can treat missing as a separate value or distribute the counts among other values in proportion to their frequency. ChiSquaredAttributeEval evaluates attributes by computing the chi-squared statistic with respect to the class. GainRatioAttributeEval evaluates attributes by measuring their gain ratio with respect to the class. SymmetricalUncertAttributeEval evaluates an attribute by measuring its symmetrical uncertainty with respect to the class. OneRAttributeEval uses the simple accuracy measure adopted by the OneR classifier. It can use the training data for evaluation, as OneR does, or it can apply internal cross-validation: the number of folds is a parameter. It adopts OneR’s simple discretization method: the minimum bucket size is a parameter. Unlike other single-attribute evaluators, PrincipalComponents transforms the set of attributes. The new attributes are ranked in order of their eigenvalues. Optionally, a subset is selected by choosing sufficient eigenvectors to account for a given proportion of the variance (95% by default). Finally, the reduced data can be transformed back to the original space.

2.7.3 Search methods Search methods traverse the attribute space to find a good subset. Quality is measured by the chosen attribute subset evaluator. Each search method can be configured with WEKA’s object editor. BestFirst performs greedy hill climbing with backtracking; you can specify how many consecutive nonimproving nodes must be encountered before the system backtracks. It can search forward from the empty set of attributes, backward from the full set, or start at an intermediate point (specified by a list of attribute indexes) and search in both directions by considering all possible single-attribute additions and deletions. Subsets that have been evaluated are cached for efficiency; the cache size is a parameter. GreedyStepwise searches greedily through the space of attribute subsets. Like BestFirst, it may progress forward from the empty set or backward from the full set. Unlike BestFirst, it does not backtrack but terminates as soon as adding or deleting the best remaining attribute decreases the evaluation metric. In an alternative mode, it ranks attributes by traversing the space from empty to full (or vice versa) and recording the order in which attributes are selected. You can specify the number of attributes to retain or set a threshold below which attributes are discarded. Finally we describe Ranker, which as noted earlier is not a search method for attribute subsets but a ranking scheme for individual attributes. It sorts attributes by their individual evaluations and must be used in conjunction with one of the single-attribute evaluators—not an attribute subset evaluator. Ranker not only ranks attributes but also performs attribute selection by removing the lower-ranking ones. You can set a cut-off threshold below which attributes are discarded, or specify how many attributes to retain. You can specify certain attributes that must be retained regardless of their rank.
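A single-attribute evaluator is used with Ranker in the same way. The sketch below (placeholder path; the choice of InfoGainAttributeEval and of retaining five attributes is arbitrary) prints a ranked list of attributes.

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class RankingExample {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("path/to/labor.arff"); // placeholder path
    data.setClassIndex(data.numAttributes() - 1);
    AttributeSelection selector = new AttributeSelection();
    selector.setEvaluator(new InfoGainAttributeEval()); // single-attribute evaluator
    Ranker ranker = new Ranker();
    ranker.setNumToSelect(5);        // keep the five highest-ranked attributes
    selector.setSearch(ranker);
    selector.SelectAttributes(data);
    System.out.println(selector.toResultsString());
  }
}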

(a) J4.8 tree for the iris data.

(b) J4.8 ’s errors on the iris data.

Figure 2.6: Visualizing the result of J4.8 on the iris data.

(a) Generic Object Editor. (b) More information.

(c) Choosing a converter (click Choose).

Figure 2.7: The Generic Object Editor.

Figure 2.8: The SQLViewer tool.

(a) The filters menu.

(b) An object editor. (c) More information (click More).

(d) Information about the filter’s capabilities (click Capabilities). (e) Constraints on capabilities.

Figure 2.9: Choosing a filter.

Figure 2.10: The weather data with two attributes removed.

(a) CPU data in the Preprocess panel.

(b) M5’’s performance on the CPU data.

Figure 2.11: Processing the CPU performance data with M5’.

=== Run information ===

Scheme: weka.classifiers.trees.M5P -M 4.0 Relation: cpu Instances: 209 Attributes: 7 MYCT MMIN MMAX CACH CHMIN CHMAX class Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

M5 pruned model tree: (using smoothed linear models)

CHMIN <= 7.5 : LM1 (165/12.903%)
CHMIN >  7.5 :
|   MMAX <= 28000 :
|   |   MMAX <= 13240 :
|   |   |   CACH <= 81.5 : LM2 (6/18.551%)
|   |   |   CACH >  81.5 : LM3 (4/30.824%)
|   |   MMAX >  13240 : LM4 (11/24.185%)
|   MMAX >  28000 : LM5 (23/48.302%)

LM num: 1 class = -0.0055 * MYCT + 0.0013 * MMIN + 0.0029 * MMAX + 0.8007 * CACH + 0.4015 * CHMAX + 11.0971

LM num: 2 class = -1.0307 * MYCT + 0.0086 * MMIN + 0.0031 * MMAX + 0.7866 * CACH - 2.4503 * CHMIN + 1.1597 * CHMAX + 70.8672

LM num: 3 class = -1.1057 * MYCT + 0.0086 * MMIN + 0.0031 * MMAX + 0.7995 * CACH - 2.4503 * CHMIN + 1.1597 * CHMAX + 83.0016

LM num: 4 class = -0.8813 * MYCT + 0.0086 * MMIN + 0.0031 * MMAX + 0.6547 * CACH - 2.3561 * CHMIN + 1.1597 * CHMAX + 82.5725

LM num: 5 class = -0.4882 * MYCT + 0.0218 * MMIN + 0.003 * MMAX + 0.3865 * CACH - 1.3252 * CHMIN + 3.3671 * CHMAX - 51.8474

Number of Rules : 5

Time taken to build model: 0.14 seconds

=== Cross-validation ===
=== Summary ===

Correlation coefficient 0.9274 Mean absolute error 29.8309 Root mean squared error 60.7112 Relative absolute error 30.9967 % Root relative squared error 37.7434 % Total Number of Instances 209

Figure 2.12: Output from the M5’ program for numeric prediction.

(a) M5’.

(b) Linear regression.

Figure 2.13: Visualizing errors.

Figure 2.14: Configuring a metalearner for boosting decision stumps.

1. outlook=overcast 4 ==> play=yes 4    <conf:(1)> lift:(1.56) lev:(0.1) [1] conv:(1.43)
2. temperature=cool 4 ==> humidity=normal 4    <conf:(1)> lift:(2) lev:(0.14) [2] conv:(2)
3. humidity=normal windy=FALSE 4 ==> play=yes 4    <conf:(1)> lift:(1.56) lev:(0.1) [1] conv:(1.43)
4. outlook=sunny play=no 3 ==> humidity=high 3    <conf:(1)> lift:(2) lev:(0.11) [1] conv:(1.5)
5. outlook=sunny humidity=high 3 ==> play=no 3    <conf:(1)> lift:(2.8) lev:(0.14) [1] conv:(1.93)
6. outlook=rainy play=yes 3 ==> windy=FALSE 3    <conf:(1)> lift:(1.75) lev:(0.09) [1] conv:(1.29)
7. outlook=rainy windy=FALSE 3 ==> play=yes 3    <conf:(1)> lift:(1.56) lev:(0.08) [1] conv:(1.07)
8. temperature=cool play=yes 3 ==> humidity=normal 3    <conf:(1)> lift:(2) lev:(0.11) [1] conv:(1.5)
9. outlook=sunny temperature=hot 2 ==> humidity=high 2    <conf:(1)> lift:(2) lev:(0.07) [1] conv:(1)
10. temperature=hot play=no 2 ==> outlook=sunny 2    <conf:(1)> lift:(2.8) lev:(0.09) [1] conv:(1.29)

Figure 2.15: Output from the Apriori program for association rules.

(a) Scatter plot matrix.

(b) Zoomed in on a cell from the matrix.

Figure 2.16: Visualizing the iris data.

(a) Configuring FilteredClassifier. (b) The menu of filters.

Figure 2.17: Using WEKA’s metalearner for discretization.

=== Run information ===

Scheme: weka.classifiers.bayes.NaiveBayes Relation: weather Instances: 14 Attributes: 5 outlook temperature humidity windy play Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

Naive Bayes Classifier

                  Class
Attribute           yes      no
                 (0.63)  (0.38)
================================
outlook
  sunny             3.0     4.0
  overcast          5.0     1.0
  rainy             4.0     3.0
  [total]          12.0     8.0

temperature
  mean          72.9697 74.8364
  std. dev.      5.2304   7.384
  weight sum          9       5
  precision      1.9091  1.9091

humidity
  mean          78.8395 86.1111
  std. dev.      9.8023  9.2424
  weight sum          9       5
  precision      3.4444  3.4444

windy
  TRUE              4.0     4.0
  FALSE             7.0     3.0
  [total]          11.0     7.0

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 9 64.2857 % Incorrectly Classified Instances 5 35.7143 % Kappa statistic 0.1026 Mean absolute error 0.4649 Root mean squared error 0.543 Relative absolute error 97.6254 % Root relative squared error 110.051 % Total Number of Instances 14

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class 0.889 0.800 0.667 0.889 0.762 0.122 0.444 0.633 yes 0.200 0.111 0.500 0.200 0.286 0.122 0.444 0.397 no Weighted Avg. 0.643 0.554 0.607 0.643 0.592 0.122 0.444 0.548

=== Confusion Matrix ===

 a b   <-- classified as
 8 1 | a = yes
 4 1 | b = no

Figure 2.18: Output of NaiveBayes for the weather data.

(a) Default output. (b) Maximum number of parents set to three in the search algorithm.

(c) The probability distribution table for the windy node in (b).

Figure 2.19: Visualizing a Bayesian network for the weather data (nominal version).

Figure 2.20: Changing the parameters for J4.8.

=== Run information ===

Scheme: weka.classifiers.rules.OneR -B 6 Relation: labor-neg-data Instances: 57 Attributes: 17 duration wage-increase-first-year wage-increase-second-year wage-increase-third-year cost-of-living-adjustment working-hours pension standby-pay shift-differential education-allowance statutory-holidays vacation longterm-disability-assistance contribution-to-dental-plan bereavement-assistance contribution-to-health-plan class Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

wage-increase-first-year:
	< 2.9	-> bad
	>= 2.9	-> good
	?	-> good
(48/57 instances correct)

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 41 71.9298 % Incorrectly Classified Instances 16 28.0702 % Kappa statistic 0.3382 Mean absolute error 0.2807 Root mean squared error 0.5298 Relative absolute error 61.3628 % Root relative squared error 110.9627 % Total Number of Instances 57

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class 0.450 0.135 0.643 0.450 0.529 0.349 0.657 0.482 bad 0.865 0.550 0.744 0.865 0.800 0.349 0.657 0.731 good Weighted Avg. 0.719 0.404 0.709 0.719 0.705 0.349 0.657 0.644

=== Confusion Matrix ===

  a  b   <-- classified as
  9 11 | a = bad
  5 32 | b = good

Figure 2.21: Output of OneR for the labor negotiations data.

=== Run information ===

Scheme: weka.classifiers.rules.PART -M 2 -C 0.25 -Q 1 Relation: labor-neg-data Instances: 57 Attributes: 17 duration wage-increase-first-year wage-increase-second-year wage-increase-third-year cost-of-living-adjustment working-hours pension standby-pay shift-differential education-allowance statutory-holidays vacation longterm-disability-assistance contribution-to-dental-plan bereavement-assistance contribution-to-health-plan class Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

PART decision list ------

wage-increase-first-year > 2.5 AND longterm-disability-assistance = yes AND statutory-holidays > 10: good (25.67)

wage-increase-first-year <= 4 AND working-hours > 36: bad (19.4/1.58)

: good (11.93/2.18)

Number of Rules : 3

Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 45 78.9474 % Incorrectly Classified Instances 12 21.0526 % Kappa statistic 0.5378 Mean absolute error 0.2884 Root mean squared error 0.4339 Relative absolute error 63.0507 % Root relative squared error 90.8836 % Total Number of Instances 57

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class 0.700 0.162 0.700 0.700 0.700 0.538 0.726 0.613 bad 0.838 0.300 0.838 0.838 0.838 0.538 0.726 0.758 good Weighted Avg. 0.789 0.252 0.789 0.789 0.789 0.538 0.726 0.707

=== Confusion Matrix ===

  a  b   <-- classified as
 14  6 | a = bad
  6 31 | b = good

Figure 2.22: Output of Part for the labor negotiations data.

=== Run information ===

Scheme: weka.classifiers.functions.SimpleLinearRegression Relation: cpu Instances: 209 Attributes: 7 MYCT MMIN MMAX CACH CHMIN CHMAX class Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

Linear regression on MMAX

0.01 * MMAX - 34

Predicting 0 if attribute value is missing.

Time taken to build model: 0 seconds

=== Cross-validation ===
=== Summary ===

Correlation coefficient 0.7844 Mean absolute error 53.8054 Root mean squared error 99.5674 Relative absolute error 55.908 % Root relative squared error 61.8997 % Total Number of Instances 209

Figure 2.23: Output of SimpleLinearRegression on the CPU performance data.

=== Run information ===

Scheme: weka.classifiers.functions.SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K ‘‘weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007’’ -calibrator ‘‘weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -num-decimal-places 4’’ Relation: iris Instances: 150 Attributes: 5 sepallength sepalwidth petallength petalwidth class Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

SMO

Kernel used:
  Linear Kernel: K(x,y) = <x,y>

Classifier for classes: Iris-setosa, Iris-versicolor

BinarySMO Machine linear: showing attribute weights, not support vectors.

0.6829 * (normalized) sepallength + -1.523 * (normalized) sepalwidth + 2.2034 * (normalized) petallength + 1.9272 * (normalized) petalwidth - 0.7091

Number of kernel evaluations: 352 (70.32% cached)

Classifier for classes: Iris-setosa, Iris-virginica

BinarySMO Machine linear: showing attribute weights, not support vectors.

0.5886 * (normalized) sepallength + -0.5782 * (normalized) sepalwidth + 1.6429 * (normalized) petallength + 1.4777 * (normalized) petalwidth - 1.1668

Number of kernel evaluations: 284 (68.996% cached)

Classifier for classes: Iris-versicolor, Iris-virginica

BinarySMO Machine linear: showing attribute weights, not support vectors.

0.3176 * (normalized) sepallength + -0.863 * (normalized) sepalwidth + 3.0543 * (normalized) petallength + 4.0815 * (normalized) petalwidth - 4.5924

Number of kernel evaluations: 453 (61.381% cached)

Time taken to build model: 0.03 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 144 96 % Incorrectly Classified Instances 6 4 % Kappa statistic 0.94 Mean absolute error 0.2311 Root mean squared error 0.288 Relative absolute error 52 % Root relative squared error 61.101 % Total Number of Instances 150

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Iris-setosa 0.980 0.050 0.907 0.980 0.942 0.913 0.965 0.896 Iris-versicolor 0.900 0.010 0.978 0.900 0.938 0.910 0.970 0.930 Iris-virginica Weighted Avg. 0.960 0.020 0.962 0.960 0.960 0.941 0.978 0.942

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 | a = Iris-setosa
  0 49  1 | b = Iris-versicolor
  0  5 45 | c = Iris-virginica

Figure 2.24: Output of SMO on the iris data.

=== Run information ===

Scheme: weka.classifiers.functions.SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K ‘‘weka.classifiers.functions.supportVector.PolyKernel -E 2.0 -C 250007’’ -calibrator ‘‘weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -num-decimal-places 4’’ Relation: iris Instances: 150 Attributes: 5 sepallength sepalwidth petallength petalwidth class Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

SMO

Kernel used:
  Poly Kernel: K(x,y) = <x,y>^2.0

Classifier for classes: Iris-setosa, Iris-versicolor

BinarySMO

1 * <0.333333 0.166667 0.457627 0.375 > * X] - 1 * <0.222222 0.541667 0.118644 0.166667 > * X] - 1 * <0.138889 0.416667 0.067797 0.083333 > * X] - 1 * <0.166667 0.416667 0.067797 0.041667 > * X] + 1 * <0.222222 0.208333 0.338983 0.416667 > * X] - 1 * <0.055556 0.125 0.050847 0.083333 > * X] - 1 * <0.027778 0.375 0.067797 0.041667 > * X] + 1 * <0.166667 0.166667 0.389831 0.375 > * X] + 1 * <0.361111 0.208333 0.491525 0.416667 > * X] + 1 * <0.194444 0 0.423729 0.375 > * X] - 1 * <0.194444 0.416667 0.101695 0.041667 > * X] - 1 * <0.138889 0.458333 0.101695 0.041667 > * X] + 1 * <0.194444 0.125 0.389831 0.375 > * X] + 0.3697 * <0.361111 0.375 0.440678 0.5 > * X] - 0.4599 * <0.138889 0.416667 0.067797 0 > * X] - 0.9098 * <0.194444 0.625 0.101695 0.208333 > * X] + 1 * <0.333333 0.166667 0.474576 0.416667 > * X] + 1 * <0.388889 0.25 0.423729 0.375 > * X] - 0.8085

Number of support vectors: 18

Number of kernel evaluations: 2416 (72.564% cached)

Classifier for classes: Iris-setosa, Iris-virginica

BinarySMO 1 * <0.166667 0.208333 0.59322 0.666667 > * X] - 0.856 * <0.055556 0.125 0.050847 0.083333 > * X] + 0.1315 * <0.555556 0.333333 0.694915 0.583333 > * X] - 1 * <0.222222 0.541667 0.118644 0.166667 > * X] + 1 * <0.472222 0.083333 0.677966 0.583333 > * X] - 0.2756 * <0.194444 0.625 0.101695 0.208333 > * X] - 1.0183

Number of support vectors: 6

Number of kernel evaluations: 1364 (60.726% cached)

Classifier for classes: Iris-versicolor, Iris-virginica

BinarySMO 1 * <0.555556 0.208333 0.677966 0.75 > * X] - 1 * <0.305556 0.416667 0.59322 0.583333 > * X] - 1 * <0.666667 0.458333 0.627119 0.583333 > * X] - 1 * <0.472222 0.583333 0.59322 0.625 > * X] + 1 * <0.444444 0.416667 0.694915 0.708333 > * X] - 1 * <0.527778 0.083333 0.59322 0.583333 > * X] + 1 * <0.416667 0.291667 0.694915 0.75 > * X] - 1 * <0.472222 0.291667 0.694915 0.625 > * X] + 0.4559 * <0.555556 0.375 0.779661 0.708333 > * X] - 1 * <0.666667 0.416667 0.677966 0.666667 > * X] + 1 * <0.611111 0.416667 0.762712 0.708333 > * X] - 1 * <0.5 0.375 0.627119 0.541667 > * X] - 1 * <0.722222 0.458333 0.661017 0.583333 > * X] + 1 * <0.472222 0.083333 0.677966 0.583333 > * X] + 1 * <0.583333 0.458333 0.762712 0.708333 > * X] + 1 * <0.611111 0.5 0.694915 0.791667 > * X] + 1 * <0.5 0.416667 0.661017 0.708333 > * X] - 1 * <0.694444 0.333333 0.644068 0.541667 > * X] - 1 * <0.5 0.416667 0.610169 0.541667 > * X] + 1 * <0.416667 0.291667 0.694915 0.75 > * X] + 1 * <0.527778 0.333333 0.644068 0.708333 > * X] - 1 * <0.444444 0.5 0.644068 0.708333 > * X] + 1 * <0.5 0.25 0.779661 0.541667 > * X] + 1 * <0.555556 0.291667 0.661017 0.708333 > * X] + 1 * <0.361111 0.333333 0.661017 0.791667 > * X] - 1 * <0.555556 0.208333 0.661017 0.583333 > * X] - 0.4559 * <0.555556 0.125 0.576271 0.5 > * X] + 1 * <0.555556 0.333333 0.694915 0.583333 > * X] + 1 * <0.166667 0.208333 0.59322 0.666667 > * X] + 1 * <0.805556 0.416667 0.813559 0.625 > * X] - 1 * <0.555556 0.541667 0.627119 0.625 > * X] + 1 * <0.472222 0.416667 0.644068 0.708333 > * X] - 1 * <0.361111 0.416667 0.59322 0.583333 > * X] - 1 * <0.583333 0.5 0.59322 0.583333 > * X] - 1 * <0.472222 0.375 0.59322 0.583333 > * X] - 1 * <0.611111 0.333333 0.610169 0.583333 > * X] - 3.5378

Number of support vectors: 36

Number of kernel evaluations: 3524 (66.711% cached)

Time taken to build model: 0.01 seconds

Figure 2.25: Output of SMO with a nonlinear kernel on the iris data.

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 144 96 % Incorrectly Classified Instances 6 4 % Kappa statistic 0.94 Mean absolute error 0.2311 Root mean squared error 0.288 Relative absolute error 52 % Root relative squared error 61.101 % Total Number of Instances 150

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Iris-setosa 0.940 0.030 0.940 0.940 0.940 0.910 0.955 0.904 Iris-versicolor 0.940 0.030 0.940 0.940 0.940 0.910 0.972 0.916 Iris-virginica Weighted Avg. 0.960 0.020 0.960 0.960 0.960 0.940 0.976 0.940

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 | a = Iris-setosa
  0 47  3 | b = Iris-versicolor
  0  3 47 | c = Iris-virginica

Figure 2.26: Output of SMO with a nonlinear kernel on the iris data, cont’d.

=== Run information ===

Scheme: weka.classifiers.functions.Logistic -R 1.0E-8 -M -1 -num-decimal-places 4 Relation: iris Instances: 150 Attributes: 5 sepallength sepalwidth petallength petalwidth class Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

Logistic Regression with ridge parameter of 1.0E-8 Coefficients... Class Variable Iris-setosa Iris-versicolor ======sepallength 21.8065 2.4652 sepalwidth 4.5648 6.6809 petallength -26.3083 -9.4293 petalwidth -43.887 -18.2859 Intercept 8.1743 42.637

Odds Ratios... Class Variable Iris-setosa Iris-versicolor ======sepallength 2954196659.8892 11.7653 sepalwidth 96.0426 797.0304 petallength 0 0.0001 petalwidth 0 0

Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 144 96 % Incorrectly Classified Instances 6 4 % Kappa statistic 0.94 Mean absolute error 0.0287 Root mean squared error 0.1424 Relative absolute error 6.456 % Root relative squared error 30.2139 % Total Number of Instances 150

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 Iris-setosa 0.920 0.020 0.958 0.920 0.939 0.910 0.970 0.933 Iris-versicolor 0.960 0.040 0.923 0.960 0.941 0.911 0.975 0.933 Iris-virginica Weighted Avg. 0.960 0.020 0.960 0.960 0.960 0.940 0.982 0.955

=== Confusion Matrix ===

  a  b  c   <-- classified as
 50  0  0 | a = Iris-setosa
  0 46  4 | b = Iris-versicolor
  0  2 48 | c = Iris-virginica

Figure 2.27: Output of Logistic for the iris data.

(a) Beginning the process of editing the network to add a second hidden layer.

(b) The finished network with two hidden layers.

Figure 2.28: Using WEKA’s neural-network graphical user interface.

=== Run information ===

Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A ‘‘weka.core.EuclideanDistance -R first-last’’ -I 500 -num-slots 1 -S 10 Relation: weather Instances: 14 Attributes: 5 outlook temperature humidity windy play Test mode: evaluate on training data

=== Clustering model (full training set) ===

kMeans
======

Number of iterations: 3 Within cluster sum of squared errors: 16.237456311387238

Initial starting points (random):

Cluster 0: rainy,75,80,FALSE,yes Cluster 1: overcast,64,65,TRUE,yes

Missing values globally replaced with mean/mode

Final cluster centroids: Cluster# Attribute Full Data 0 1 (14.0) (9.0) (5.0) ======outlook sunny sunny overcast temperature 73.5714 75.8889 69.4 humidity 81.6429 84.1111 77.2 windy FALSE FALSE TRUE play yes yes yes

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 9 ( 64%) 1 5 ( 36%)

Figure 2.29: Output of SimpleKMeans for the weather data.

=== Run information ===

Scheme: weka.clusterers.EM -I 100 -N 2 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100 Relation: weather Instances: 14 Attributes: 5 outlook temperature humidity windy play Test mode: evaluate on training data

=== Clustering model (full training set) ===

EM ==

Number of clusters: 2 Number of iterations performed: 7

Cluster Attribute 0 1 (0.35) (0.65) ======outlook sunny 3.8732 3.1268 overcast 1.7746 4.2254 rainy 2.1889 4.8111 [total] 7.8368 12.1632 temperature mean 76.9173 71.8054 std. dev. 5.8302 5.8566

humidity mean 90.1132 77.1719 std. dev. 3.8066 9.1962

windy TRUE 3.14 4.86 FALSE 3.6967 6.3033 [total] 6.8368 11.1632 play yes 2.1227 8.8773 no 4.7141 2.2859 [total] 6.8368 11.1632

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 4 ( 29%) 1 10 ( 71%)

Log likelihood: -9.13037

Figure 2.30: Output of EM for the weather data.

Figure 2.31: Attribute selection: specifying an evaluator and a search method.

Chapter 3

The Knowledge Flow Interface

With the Knowledge Flow interface, users select WEKA components from a tool bar, place them on a layout canvas, and connect them into a directed graph that processes and analyzes data. It provides an alternative to the Explorer for those who like thinking in terms of how data flows through the system. It also allows the design and execution of configurations for streamed data processing, which the Explorer cannot do. You invoke the Knowledge Flow interface by selecting KnowledgeFlow from the choices on the panel shown in Figure 2.3a.

3.1 Getting started

Here is a step-by-step example that loads an ARFF file and performs a cross-validation using J48. We describe how to build up the final configuration shown in Figure 3.1. First create a source of data by expanding the DataSources folder in the Design palette and selecting ARFFLoader. The mouse cursor changes to crosshairs to signal that you should next place the component. Do this by clicking anywhere on the canvas, whereupon a copy of the ARFF loader icon appears there. To connect it to an ARFF file, right-click it to bring up the pop-up menu shown in Figure 3.2a. Click Configure to get the dialog in Figure 3.2b, from which you can either browse for an ARFF file by clicking the Browse button, or type the path to one in the Filename field. Now we specify which attribute is the class using a ClassAssigner object. This is found under the Evaluation folder in the Design palette, so expand the Evaluation folder, select the ClassAssigner, and place it on the canvas. To connect the data source to the class assigner, right-click the data source icon and select dataset from the menu, as shown in Figure 3.2a. A rubber-band line appears. Move the mouse over the class assigner component and left-click. A red line labeled dataset appears, joining the two components. Having connected the class assigner, choose the class by right-clicking it, selecting Configure, and entering the location of the class attribute. We will perform cross-validation on the J48 classifier. In the data flow model, we first connect the CrossValidationFoldMaker to create the folds on which the classifier will run, and then pass its output to an object representing J48. CrossValidationFoldMaker is in the Evaluation folder. Select it, place it on the canvas, and connect it to the class assigner by right-clicking the latter and selecting dataset from the menu (which is similar to that in Figure 3.2a). Next select J48 from the trees folder under the Classifiers folder and place a J48 component on the canvas. Connect J48 to the cross-validation fold maker in the usual way, but make the connection twice by first choosing trainingSet and then testSet from the pop-up menu for the cross-validation fold maker. The next step is to select a ClassifierPerformanceEvaluator from the Evaluation


Figure 3.1: The Knowledge Flow interface.

(a) Right-click menu. (b) File browser obtained from the Configure menu.

Figure 3.2: Configuring a data source in the Knowledge Flow.

folder and connect J48 to it by selecting the batchClassifier entry from the pop-up menu for J48. Finally, from the Visualization folder we place a TextViewer component on the canvas. Connect the classifier performance evaluator to it by selecting the text entry from the pop-up menu for the performance evaluator. At this stage the configuration is as shown in Figure 3.1 except that there is as yet no graph viewer. Start the flow of execution by clicking one of the two triangular-shaped “play” buttons at the left side of the main toolbar. The leftmost play button launches all data sources present in the flow in parallel; the other play button launches the data sources sequentially, where a particular order of execution can be specified by including a number at the start of the component’s name (a name can be set via the Set name entry on the pop-up menu). For a small dataset things happen quickly. Progress information appears in the status area at the bottom of the interface. The entries in the status area show the progress of each step in the flow, along with their parameter settings (for learning schemes) and elapsed time. Any errors that occur in a processing step are shown in the status area by highlighting the corresponding row in red. Choosing Show results from the text viewer’s pop-up menu brings the results of cross-validation up in a separate window, in the same form as for the Explorer. To complete the example, add a GraphViewer and connect it to J48’s graph output to see a graphical representation of the trees produced for each fold of the cross-validation. Once you have redone the cross-validation with this extra component in place, selecting Show results from its pop-up menu produces a list of trees, one for each cross-validation fold. By creating cross-validation folds and passing them to the classifier, the Knowledge Flow model provides a way to hook into the results for each fold. The flow that you’ve just laboriously created is actually available (minus the GraphViewer) as a built-in template. Example templates can be accessed from the Template button, which is the third icon from the right in the toolbar at the top of the Knowledge Flow interface. There are a number of templates that come with WEKA, and certain packages, once installed via the package manager, add further ones to the menu. The majority of template flows can be executed without further modification as they have been configured to load datasets that come with the WEKA distribution.

3.2 Knowledge Flow components

Most of the Knowledge Flow components will be familiar from the Explorer. The Classifiers folder contains all of WEKA’s classifiers, the Filters folder contains the filters, the Clusterers folder holds the clusterers, the AttSelection folder contains evaluators and search methods for attribute selection, and the Associations panel holds the association rule learners. All components in the Knowledge Flow are run in a separate thread of execution, except in the case where data is being processed incrementally—in this case a single thread of execution is used because, generally, the amount of processing done per data point is small, and launching a separate thread to process each one would incur a significant overhead. Possible data sources are ARFF files, XML ARFF files, JSON ARFF files, CSV files exported from spreadsheets, the C4.5 file format, databases, serialized instances, LibSVM and SVMLight data formats, and a special loader (TextDirectoryLoader) to load a directory of plain text files into a single set of instances. There is also a data source called DataGrid that allows the user to define attributes and enter the values of instances via a graphical interface. There is a data sink that corresponds to each data source, with the exception of the TextDirectoryLoader and DataGrid. In addition, there are data sinks that can save raw text, static data and WEKA serialized models. Under the Visualization folder, the DataVisualizer pops up a panel for visualizing data in a two-dimensional scatter plot as in Figure 2.6b, in which you can select the attributes you would like to see. ScatterPlotMatrix pops up a matrix of two-dimensional scatter plots for every pair of attributes, shown in Figure 2.16a. AttributeSummarizer gives a matrix of histograms, one for each attribute, like that in the lower right-hand corner of Figure 2.3b. ModelPerformanceChart draws ROC curves and other threshold curves. CostBenefitAnalysis allows interactive exploration of the tradeoffs in cost or benefit arising from different cost matrices. GraphViewer, used above, pops up a panel for visualizing tree-based models, as in Figure 2.6a. As before, you can zoom, pan, and visualize the instance data at a node (if it has been saved by the learning algorithm).

StripChart is a new visualization component designed for use with incremental learning. In conjunction with the IncrementalClassifierEvaluator described in the next paragraph it displays a learning curve that plots accuracy—both the percentage accuracy and the root mean-squared probability error—against time. It shows a fixed-size time window that scrolls horizontally to reveal the latest results.

Under the Evaluation folder, TrainingSetMaker and TestSetMaker make a dataset into the corresponding kind of set. The CrossValidationFoldMaker constructs cross-validation folds from a dataset; the TrainTestSplitMaker splits it into training and test sets by holding part of the data out for the test set. The ClassAssigner allows you to decide which attribute is the class. With ClassValuePicker you choose a value that is treated as the positive class when generating ROC and other threshold curves. The ClassifierPerformanceEvaluator collects evaluation statistics: it can send the textual evaluation to a text viewer and the threshold curves to a performance chart. The IncrementalClassifierEvaluator performs the same function for incremental classifiers: it computes running squared errors and so on. There is also a ClustererPerformanceEvaluator, which is similar to the ClassifierPerformanceEvaluator. The PredictionAppender takes a classifier and a dataset and appends the classifier’s predictions to the dataset.

Components under the Flow folder tend to affect the flow of data in some way. FlowByExpression enables incoming data to be sent to two downstream components based on the evaluation of a logical expression. The expression can involve one or more clauses that test the value of an attribute in the incoming data against a constant or the value of another attribute. Multiple clauses can be and’ed or or’ed together. Block, when connected between two components, delays the passing of data to the downstream component until a user-specified component has finished processing. InstanceStreamToBatchMaker collects instances arriving in a stream from an incoming “instance” connection and produces a batch dataset when the last instance has arrived. This is particularly useful when placed after a reservoir sampling filter—it allows the instances output by reservoir sampling to be used to train batch learning schemes. The Join component performs an inner join on two incoming dataset or instance streams. The user specifies the key fields to join on for both datasets, and both are expected to arrive in sorted order of their key fields.

Under the Tools folder, Sorter sorts instances within batch datasets or instance streams according to the values of one or more user-specified attributes. It can handle datasets larger than can fit into main memory by using instance connections and specifying an in-memory buffer size. The component implements a merge sort by writing the sorted in-memory buffer to a file when full, and then interleaving instances from the disk-based file(s) when the incoming instance stream has finished. SubstringReplacer replaces occurrences of substring or regular expression matches in incoming string attributes with user-specified values. SubstringLabeler matches substrings in a similar fashion, but instead of replacing them it creates a new attribute in the data that holds the value of a label specified by the user for each match condition.

3.3 Configuring and connecting the components

You establish the knowledge flow by configuring the individual components and connecting them up. The menus that are available by right-clicking various component types have up to three sections: Edit, Connections, and Actions. The Edit operations delete components and open up their configuration panel. You can give a component a name by choosing Set name from the pop-up menu. Classifiers and filters are configured just as in the Explorer. Data sources are configured by opening a file (as we saw previously) or by setting a database connection, and evaluation components are configured by setting parameters such as the number of folds for cross-validation. The Connections operations are used to connect components together by selecting the type of connection from the source component and then clicking on the target object. Not all targets are suitable; applicable ones are highlighted. Items on the connections menu are disabled (grayed out) until the component receives other connections that render them applicable. There are two kinds of connection from data sources: dataset connections and instance connections. The former are for batch operations such as classifiers like J48; the latter are for stream operations such as NaiveBayesUpdateable. A data source component cannot provide both types of connection: once one is selected, the other is disabled. When a dataset connection is made to a batch classifier, the classifier needs to know whether the dataset is intended to serve as a training set or a test set. To do this, you first make the data source into a test or training set using the TestSetMaker or TrainingSetMaker components from the Evaluation panel. On the other hand, an instance connection to an incremental classifier is made directly: there is no distinction between training and testing because the instances that flow update the classifier incrementally. In this case a prediction is made for each incoming instance and incorporated into the test results; then the classifier is trained on that instance. If you make an instance connection to a batch classifier it will be used as a test instance because training cannot possibly be incremental whereas testing always can be. Conversely, it is quite possible to test an incremental classifier in batch mode using a dataset connection. Connections from a filter component are enabled when it receives input from a data source, whereupon follow-on dataset or instance connections can be made. Instance connections cannot be made to supervised filters or to unsupervised filters that cannot handle data incrementally (such as Discretize). To get a test or training set out of a filter, you need to put the appropriate kind in. The classifier menu has two types of connection. The first type, namely graph and text connections, provides graphical and textual representations of the classifier’s learned state and is only activated when it receives a training set input. The other type, namely batchClassifier and incrementalClassifier connections, makes data available to a performance evaluator and is only activated when a test set input is present too. Which one is activated depends on the type of the classifier. Evaluation components are a mixed bag. TrainingSetMaker and TestSetMaker turn a dataset into a training or test set. CrossValidationFoldMaker turns a dataset into both a training set and a test set. ClassifierPerformanceEvaluator generates textual and graphical output for visualization components.
Other evaluation components operate like filters: they enable follow-on dataset, instance, training set, or test set connections depending on the input (e.g., ClassAssigner assigns a class to a dataset). Visualization components do not have connections, although some have actions such as Show results and Clear results.

3.4 Incremental learning

In most respects the Knowledge Flow interface is functionally similar to the Explorer: you can do similar things with both. It does provide some additional flexibility—for example, you can see the tree that J48 makes for each cross-validation fold. But its real strength is the potential for incremental operation. WEKA has several classifiers that can handle data incrementally: SGD, SGDText, HoeffdingTree, NaiveBayesMultinomialText, a version of Naive Bayes (NaiveBayesUpdateable), instance-based learners (IBk, KStar, LWL), and NaiveBayesMultinomialUpdateable. All filters that work instance by instance are incremental: Add, AddExpression, AddValues, ChangeDateFormat, ClassAssigner, Copy, FirstOrder, MakeIndicator, MergeTwoValues, MergeManyValues, NominalToBinary, NonSparseToSparse, NumericToBinary, NumericTransform, NumericCleaner, Obfuscate, PartitionedMultiFilter, RandomSubset, Remove, RemoveByName, RemoveType, RemoveWithValues, RenameAttribute, RenameNominalValues, Reorder, ReplaceMissingWithUserConstant, ReservoirSample, SortLabels, SparseToNonSparse, and SwapValues. If all components connected up in the Knowledge Flow interface operate incrementally, so does the resulting learning system. It does not read in the dataset before learning starts, as the Explorer does. Instead, the data source component reads the input instance by instance and passes it through the Knowledge Flow chain. Figure 3.3a shows a configuration that works incrementally. An instance connection is made from the loader to a class assigner component, which, in turn, is connected to the updatable Naive Bayes classifier. The classifier’s text output is taken to a viewer that gives a textual description of the model. Also, an incrementalClassifier connection is made to the corresponding performance evaluator. This produces an output of type chart, which is piped to a StripChart visualization component to generate a scrolling data plot. Figure 3.3b shows the strip chart output. It plots accuracy, Kappa, and the root mean-squared probability error against time. As time passes, the whole plot (including the axes) moves leftward to make room for new data at the right. When the vertical axis representing time 0 can move left no farther, it stops and the time origin starts to increase from 0 to keep pace with the data coming in at the right. Thus when the chart is full it shows a window of the most recent time units. The strip chart can be configured to alter the number of instances shown on the x axis. This particular Knowledge Flow configuration can process input files of any size, even ones that do not fit into the computer’s main memory. However, it all depends on how the classifier operates internally. For example, although they are incremental, many instance-based learners store the entire dataset internally.
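The incremental classifiers listed above can also be driven directly from Java, outside the Knowledge Flow. The following minimal sketch (placeholder path) uses ArffLoader to read one instance at a time and updates NaiveBayesUpdateable as the data streams in; this is the same instance-by-instance processing that the Knowledge Flow performs.

import java.io.File;
import weka.classifiers.bayes.NaiveBayesUpdateable;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

public class IncrementalExample {
  public static void main(String[] args) throws Exception {
    ArffLoader loader = new ArffLoader();
    loader.setFile(new File("path/to/weather.arff"));  // placeholder path
    Instances structure = loader.getStructure();        // header only; no instances read yet
    structure.setClassIndex(structure.numAttributes() - 1);
    NaiveBayesUpdateable nb = new NaiveBayesUpdateable();
    nb.buildClassifier(structure);                       // initialise the model from the header
    Instance current;
    while ((current = loader.getNextInstance(structure)) != null) {
      nb.updateClassifier(current);                      // learn one instance at a time
    }
    System.out.println(nb);
  }
}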

(a) The configuration.

(b) StripChart output.

Figure 3.3: A Knowledge Flow that operates incrementally.

Chapter 4

The Experimenter

The Explorer and Knowledge Flow environments help you determine how well machine learning schemes perform on given datasets. But serious investigative work involves substantial experiments—typically running several learning schemes on different datasets, often with various parameter settings—and these interfaces are not really suitable for this. The Experimenter enables you to set up large-scale experiments, start them running, leave them, and come back when they have finished and analyze the performance statistics that have been collected. It automates the experimental process. The statistics can be stored in ARFF format, and can themselves be the subject of further data mining. You invoke this interface by selecting Experimenter from the choices at the side of the panel in Figure 2.3a. Whereas the Knowledge Flow transcends limitations of space by allowing machine learning runs that do not load in the whole dataset at once, the Experimenter transcends limitations of time. It contains facilities for advanced users to distribute the computing load across multiple machines using Java RMI. You can set up big experiments and just leave them to run.

4.1 Getting started

As an example, we will compare the J4.8 decision tree method with the baseline methods OneR and ZeroR on the iris dataset. The Experimenter has three panels: Setup, Run and Analyze. Figure 4.1a shows the first: you select the others from the tabs at the top. Here, the experiment has already been set up. To do this, first click New (toward the right at the top) to start a new experiment (the other two buttons in that row save an experiment and open a previously saved one). Then, on the line below, select the destination for the results—in this case the file Experiment1 —and choose CSV file. Underneath, select the datasets—we have only one, the iris data. To the right of the datasets, select the algorithms to be tested—we have three. Click Add new to get a standard WEKA object editor from which you can choose and configure a classifier. Repeat this operation to add the three classifiers. Now the experiment is ready. The other settings in Figure 4.1a are all default values. If you want to reconfigure a classifier that is already in the list, you can use the Edit selected button. You can also save the options for a particular classifier in XML format for later reuse. You can right-click on an entry to copy the configuration to the clipboard, and add or enter a configuration from the clipboard.


(a) Setting it up.

Key_Dataset,Key_Run,Key_Fold,Key_Scheme,Key_Scheme_options,Key_Scheme_version_ID,Date_time, Number_of_training_instances,Number_of_testing_instances,Number_correct,Number_incorrect, Number_unclassified,Percent_correct,Percent_incorrect,Percent_unclassified,Kappa_statistic, Mean_absolute_error,Root_mean_squared_error,Relative_absolute_error,Root_relative_squared_error, SF_prior_entropy,SF_scheme_entropy,SF_entropy_gain,SF_mean_prior_entropy,SF_mean_scheme_entropy, SF_mean_entropy_gain,KB_information,KB_mean_information,KB_relative_information,True_positive_rate, Num_true_positives,False_positive_rate,Num_false_positives,True_negative_rate,Num_true_negatives, False_negative_rate,Num_false_negatives,IR_precision,IR_recall,F_measure,Matthews_correlation, Area_under_ROC,Area_under_PRC,Weighted_avg_true_positive_rate,Weighted_avg_false_positive_rate, Weighted_avg_true_negative_rate,Weighted_avg_false_negative_rate,Weighted_avg_IR_precision, Weighted_avg_IR_recall,Weighted_avg_F_measure,Weighted_avg_matthews_correlation, Weighted_avg_area_under_ROC,Weighted_avg_area_under_PRC,Unweighted_macro_avg_F_measure, Unweighted_micro_avg_F_measure,Elapsed_Time_training,Elapsed_Time_testing,UserCPU_Time_training, UserCPU_Time_testing,UserCPU_Time_millis_training,UserCPU_Time_millis_testing,Serialized_Model_Size, Serialized_Train_Set_Size,Serialized_Test_Set_Size,Coverage_of_Test_Cases_By_Regions, Size_of_Predicted_Regions,Summary,measureTreeSize,measureNumLeaves,measureNumRules iris,1,1,weka.classifiers.trees.J48,’-C 0.25 -M 2’,-217733168393644444,2.01604080413E7,135.0,15.0, 14.0,1.0,0.0,93.33333333333333,6.666666666666667,0.0,0.9,0.0450160137965016,0.1693176548766098, 10.128603104212857,35.917698581356284,23.77443751081735,2.632715099281766,21.141722411535582, 1.5849625007211567,0.17551433995211774,1.4094481607690388,21.615653599867994,1.4410435733245328, 1363.79589990507,1.0,5.0,0.0,0.0,1.0,10.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.9333333333333333, 0.03333333333333333,0.9666666666666667,0.06666666666666667,0.9444444444444445,0.9333333333333333, 0.9326599326599326,0.9062760897356267,1.0,1.0,0.9326599326599326,0.9333333333333333,0.008,0.001, 0.0062889999999999995,5.809999999999999E-4,6.289,0.581,4236.0,10633.0,2353.0,100.0,35.55555555555555, ’Number of leaves: 4\nSize of the tree: 7\n’,7.0,4.0,4.0

(b) The results file.

(c) Spreadsheet with results.

Figure 4.1: An experiment.

4.2 Running an experiment

To run the experiment, click the Run tab, which brings up a panel that contains a Start button (and little else); click it. A brief report is displayed when the operation is finished. The file Experiment1.csv contains the results. The first two lines are shown in Figure 4.1b: they are in CSV format and can be read directly into a spreadsheet, the first part of which appears in Figure 4.1c. Each row represents 1 fold of a 10-fold cross-validation (see the Fold column). The cross-validation is run 10 times (the Run column) for each classifier (the Scheme column). Thus the file contains 100 rows for each classifier, which makes 300 rows in all (plus the header row). Each row contains plenty of information, including the options supplied to the machine learning scheme; the number of training and test instances; the number (and percentage) of correct, incorrect, and unclassified instances; the mean absolute error, root mean-squared error, and many more. There is a great deal of information in the spreadsheet, but it is hard to digest. In particular, it is not easy to answer the question posed previously: how does J4.8 compare with the baseline methods OneR and ZeroR on this dataset? For that we need the Analyze panel.

4.3 Analyzing the results

The reason that we generated the output in CSV format was to show the spreadsheet in Figure 4.1c. The Experimenter normally produces its output in ARFF format. You can also leave the file name blank, in which case the Experimenter stores the results in a temporary file.

Figure 4.2: Statistical test results for the experiment of Figure 4.1a.

The Analyze panel is shown in Figure 4.2. To analyze the experiment that has just been performed, click the Experiment button at the right near the top; otherwise, supply a file that contains the results of another experiment. Then click Perform test (near the bottom on the left). The result of a statistical significance test of the performance of the first learning scheme (J48) versus the other two (OneR and ZeroR) will be displayed in the large panel on the right. We are comparing the percent correct statistic: this is selected by default as the comparison field shown toward the left of Figure 4.2. The three methods are displayed horizontally, numbered (1), (2), and (3), as the heading of a little table. The labels for the columns are repeated at the bottom—trees.J48, rules.OneR, and rules.ZeroR—in case there is insufficient space for them in the heading. The inscrutable integers beside the scheme names identify which version of the scheme is being used. They are present by default to avoid confusion among results generated using different versions of the algorithms. The value in brackets at the beginning of the iris row (100) is the number of experimental runs: 10 times 10-fold cross-validation. The percentage correct for the three schemes is shown in Figure 4.2: 94.73% for method 1, 92.53% for method 2, and 33.33% for method 3. The symbol placed beside a result indicates that it is statistically better (v) or worse (*) than the baseline scheme—in this case J4.8—at the specified significance level (0.05, or 5%). The corrected resampled t-test is used here. Here, method 3 is significantly worse than method 1, because its success rate is followed by an asterisk. At the bottom of columns 2 and 3 are counts (x/y/z) of the number of times the scheme was better than (x), the same as (y), or worse than (z) the baseline scheme on the datasets used in the experiment. In this case there is only one dataset; method 2 was equivalent to method 1 (the baseline) once, and method 3 was worse than it once. (The annotation (v/ /*) is placed at the bottom of column 1 to help you remember the meanings of the three counts x/y/z.) The output in the Analyze panel can be saved into a file by clicking the “Save output” button. It is also possible to open a WEKA Explorer window to further analyze the experimental results obtained, by clicking on the “Open Explorer” button.

4.4 Simple setup

In the Setup panel of Figure 4.1a we left most options at their default values. The experiment type is a 10-fold cross-validation repeated 10 times. You can alter the number of folds in the box at center left and the number of repetitions in the box at center right. The experiment type is classification; you can specify regression instead. You can choose several datasets, in which case each algorithm is applied to each dataset, and change the order of iteration using the Data sets first and Algorithm first buttons. The alternative to cross-validation is the holdout method. There are two variants, depending on whether the order of the dataset is preserved or the data is randomized. You can specify the percentage split (the default is two-thirds training set and one-third test set). Experimental setups can be saved and reopened. You can make notes about the setup by pressing the Notes button, which brings up an editor window. Serious WEKA users soon find the need to open up an experiment and rerun it with some modifications—perhaps with a new dataset or a new learning algorithm. It would be nice to avoid having to recalculate all the results that have already been obtained! If the results have been placed in a database rather than an ARFF or CSV file, this is exactly what happens. You can choose JDBC database in the results destination selector and connect to any database that has a JDBC driver. You need to specify the database’s URL and enter a username and password. To make this work with your database you may need to modify the weka/experiment/DatabaseUtils.props file in the WEKA distribution. If you alter an experiment that uses a database, WEKA will reuse previously computed results whenever they are available. This greatly simplifies the kind of iterative experimentation that typically characterizes data mining research.
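The details of that modification depend on your database, but the edit is usually small. As a hedged illustration only, the following sketch shows the kind of entries you might change in weka/experiment/DatabaseUtils.props for a MySQL server; the driver class, host, and database name are hypothetical and must be replaced with values for your own installation.

# Sketch of two entries in weka/experiment/DatabaseUtils.props (hypothetical MySQL setup).
# jdbcDriver names the JDBC driver class to load; jdbcURL points at the database
# that will hold the experiment results.
jdbcDriver=com.mysql.jdbc.Driver
jdbcURL=jdbc:mysql://dbserver.example.com:3306/weka_experiments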

4.5 Advanced setup

Figure 4.3: Setting up an Experimenter in advanced mode.

The Experimenter has an advanced mode. Select Advanced from the drop-down box near the top of the panel shown in Figure 4.1a to obtain the more formidable version of the panel shown in Figure 4.3. This enlarges the options available for controlling the experiment—including, for example, the ability to generate learning curves. However, the advanced mode is hard to use, and the simple version suffices for most purposes. For example, in advanced mode you can set up an iteration to test an algorithm with a succession of different parameter values, but the same effect can be achieved in simple mode by putting the algorithm into the list several times with different parameter values. One thing you can do in advanced mode but not in simple mode is run experiments using clustering algorithms. Here, experiments are limited to those clusterers that can compute probability or density estimates, and the main evaluation measure for comparison purposes is the log-likelihood. To set this up quickly, first click the Result generator to bring up an object editor for the CrossValidationResultProducer. Then click the Choose button for the split evaluator and select DensityBasedClustererSplitEvaluator from the list. At this point the panel on the lower right that contained the list of classifiers goes blank and the Generator properties drop-down box displays Disabled. Re-enable this, and a new window appears with a list of properties (Figure 4.4a).

(a) Generator properties. (b) Two clustering schemes.

(c) Analyze panel.

Figure 4.4: An experiment in clustering.

Expand the splitEvaluator entry, select clusterer (as shown in the figure), and click the Select button. Now the active list will reappear in the bottom right-hand panel, along with the ability to add clustering schemes, just as we did with classifiers. Figure 4.4b shows a setup with two clustering schemes configured: EM and the MakeDensityBasedClusterer wrapped around SimpleKMeans. After running this experiment, these two can be compared in the Analyze panel. The comparison field is not set up with a meaningful default, so choose Log likelihood from the drop-down box before pressing the Perform test button. Figure 4.4c shows the results for these clustering algorithms. Another thing you may need the advanced mode for is to set up distributed experiments, which we describe in Section 4.7.

4.6 The Analyze panel

Our walkthrough used the Analyze panel to perform a statistical significance test of one learning scheme (J48) versus two others (OneR and ZeroR). The test was on the percent correct statistic—the Comparison field in Figure 4.2. Other statistics can be selected from the drop-down menu instead: percentage incorrect, percentage unclassified, root mean-squared error, the remaining error measures discussed in Chapter 5 of the book, and various entropy figures. Moreover, you can see the standard deviation of the attribute being evaluated by ticking the Show std deviations checkbox. Use the Test base menu to change the baseline scheme from J4.8 to one of the other learning schemes. For example, selecting OneR causes the others to be compared with this scheme. In fact, that would show that there is a statistically significant difference between OneR and ZeroR but not between OneR and J48. Apart from the learning schemes, there are two other choices in the Test base menu: Summary and Ranking. The former compares each learning scheme with every other scheme and prints a matrix whose cells contain the number of datasets on which one is significantly better than the other. The latter ranks the schemes according to the total number of datasets that represent wins (>) and losses (<) and prints a league table. The first column in the output gives the difference between the number of wins and the number of losses. The Row and Column fields determine the dimensions of the comparison matrix. Clicking Select brings up a list of all the features that have been measured in the experiment—in other words, the column labels of the spreadsheet in Figure 4.1c. You can select which to use as the rows and columns of the matrix. (The selection does not appear in the Select box because more than one parameter can be chosen simultaneously.) Figure 4.5 shows which items are selected for the rows and columns of Figure 4.2. The two lists show the experimental parameters (the columns of the spreadsheet). Dataset is selected for the rows (and there is only one in this case, the iris dataset), and Scheme, Scheme options, and Scheme version ID are selected for the column (the usual convention of shift-clicking selects multiple entries). All three can be seen in Figure 4.2—in fact, they are more easily legible in the key at the bottom. If the row and column selections were swapped and the Perform test button pressed again, the matrix would be transposed, giving the result in Figure 4.5c. There are now three rows, one for each algorithm, and one column, for the single dataset. If instead the row of Dataset were replaced by Run and the test were performed again, the result would be as in Figure 4.5d. Run refers to the runs of the cross-validation, of which there are 10, so there are now 10 rows. The number in parentheses after each row label (100 in Figure 4.5c and 10 in Figure 4.5d) is the number of results corresponding to that row—in other words, the number of measurements that participate in the averages displayed by the cells in that row. There is a button that allows you to select a subset of columns to display (the baseline column is always included), and another that allows you to select the output format: plain text (default), output for the LaTeX typesetting system, CSV format, HTML, data and script suitable for input to the GNUPlot graph plotting software, and just the significance symbols in plain text format.

(a) Row field. (b) Column field.

(c) Result of swapping the row and column selections. (d) Substituting Run for Dataset as the rows.

Figure 4.5: Configuring rows and columns.

It is also possible to show averages and abbreviate filter class names in the output. There is an option to choose whether to use the paired corrected t-test or the standard t-test for computing significance. The way the rows are sorted in the results table can be changed by choosing the Sorting (asc.) by option from the drop-down box. The default is to use natural ordering, presenting them in the order that the user entered the dataset names in the Setup panel. Alternatively the rows can be sorted according to any of the measures that are available in the Comparison field.

4.7 Distributing processing over several machines

A remarkable feature of the Experimenter is that it can split up an experiment and distribute it across several processors. This is for advanced WEKA users and is only available from the advanced version of the Setup panel. Some users avoid working with this panel by setting the experiment up on the simple version and switching to the advanced version to distribute it, because the experiment’s structure is preserved when you switch. However, distributing an experiment is an advanced feature and is often difficult. For example, file and directory permissions can be tricky to set up. Distributing an experiment works best when the results are all sent to a central database by selecting JDBC database as the results destination in Figure 4.1a. It uses the RMI facility, and works with any database that has a JDBC driver. It has been tested on several freely available databases. Alternatively, you could instruct each host to save its results to a different ARFF file and merge the files afterwards. To distribute an experiment across multiple hosts, each host must (a) have Java installed, (b) have access to whatever datasets you are using, and (c) be running the weka.experiment.RemoteEngine experiment server. If results are sent to a central database, the appropriate JDBC drivers must be installed on each host. Getting all this right is the difficult part of running distributed experiments. To initiate a remote engine experiment server on a host machine, first copy remoteExperimentServer.jar from the WEKA distribution to a directory on the host. Unpack it with

jar -xvf remoteExperimentServer.jar

It expands to three files: remoteEngine.jar, an executable jar file that contains the experiment server, remote.policy, and remote.policy.example. The remote.policy file grants the remote engine permission to perform certain operations, such as connecting to ports or accessing a directory. It needs to be edited to specify correct paths in some of the permissions; this is self-explanatory when you examine the file. By default, it specifies that code can be downloaded on HTTP port 80 from anywhere on the Web, but the remote engines can also load code from a file URL instead. To arrange this, either uncomment the example in remote.policy or tailor remote.policy.example to suit your needs. The latter file contains a complete example for a fictitious user (johndoe) under a Linux operating system. The remote engines also need to be able to access the datasets used in an experiment (see the first entry in remote.policy). The paths to the datasets are specified in the Experimenter (i.e., the client), and the same paths must be applicable in the context of the remote engines. To facilitate this it may be necessary to specify relative pathnames by selecting the Use relative paths tick box shown in the Setup panel of the Experimenter.
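To give a flavor of what editing remote.policy involves, here is a hedged sketch of the kind of grant entries it contains; the dataset path is hypothetical, and the exact permissions you need are spelled out in the comments of remote.policy and remote.policy.example themselves.

grant {
  // Let the remote engine read datasets below a (hypothetical) local directory.
  permission java.io.FilePermission "/home/johndoe/datasets/-", "read";
  // Let it open the sockets needed for RMI communication and code download.
  permission java.net.SocketPermission "*:1024-", "accept,connect,listen,resolve";
};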

To start the remote engine server, type

java -classpath remoteEngine.jar: -Djava.security.policy=remote.policy weka.experiment.RemoteEngine

from the directory containing remoteEngine.jar. All going well you will see this message (or something like it):

user@ml:remote_engine>Host name : ml.cs.waikato.ac.nz
Attempting to start RMI registry on port 1099 ...
RemoteEngine bound in RMI registry

This indicates that the remote engine has started the RMI registry on port 1099 and is running successfully. You can run more than one remote engine on a given machine, and it makes sense to do so if the machine in question has multiple processors or a multi-core processor. To do so, start each remote engine as above, but instead of the default port (1099) specify a different one using a command-line option (–p) to the remote engine. Repeat the process for all hosts. Now start the Experimenter by typing

java -Djava.rmi.server.codebase= weka.gui.experiment.Experimenter

The URL specifies where the remote engines can find the code to be executed. If it denotes a directory (i.e., one that contains the WEKA directory) rather than a jar file, it must end with a path separator (e.g., /). The Experimenter’s advanced Setup panel in Figure 4.3 contains a small pane at center left that determines whether an experiment will be distributed or not. This is normally inactive. To distribute the experiment click the checkbox to activate the Hosts button; a window will pop up asking for the machines over which to distribute the experiment. Host names should be fully qualified (e.g., ml.cs.waikato.ac.nz). If a host is running more than one remote engine, enter its name into the window multiple times, along with the port number if it is not the default. For example:

ml.cs.waikato.ac.nz
ml.cs.waikato.ac.nz:5050

tells the Experimenter that the host ml.cs.waikato.ac.nz is running two remote engines, one at the default port of 1099 and a second at port 5050. Having entered the hosts, configure the rest of the experiment in the usual way (better still, configure it before switching to the advanced setup mode). When the experiment is started using the Run panel, the progress of the subexperiments on the various hosts is displayed, along with any error messages. Distributing an experiment involves splitting it into subexperiments that RMI sends to the hosts for execution. By default, experiments are partitioned by dataset, in which case there can be no more hosts than there are datasets. Then each subexperiment is self-contained: it applies all schemes to a single dataset. An experiment with only a few datasets can be partitioned by run instead. For example, a 10 times 10-fold cross-validation would be split into 10 subexperiments, 1 per run.
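As a concrete illustration of the setup just described, the following pair of commands, issued on the host ml.cs.waikato.ac.nz, would start one remote engine on the default port and a second one on port 5050, matching the two-line host list shown above (the policy file is assumed to have been edited as described earlier):

java -classpath remoteEngine.jar: -Djava.security.policy=remote.policy weka.experiment.RemoteEngine
java -classpath remoteEngine.jar: -Djava.security.policy=remote.policy weka.experiment.RemoteEngine -p 5050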

Chapter 5

The Command-Line Interface

Lurking behind WEKA’s interactive interfaces—the Explorer, the Knowledge Flow, the Experimenter and the Workbench—lies its basic functionality. This can be accessed more directly through a command-line interface. Select Simple CLI from the interface choices at the right of Figure 2.3a to bring up a plain textual panel with a line at the bottom on which you enter commands. Alternatively, use the operating system’s command-line interface to run the classes in weka.jar, in which case you must first set the CLASSPATH environment variable as explained in WEKA’s README file.

5.1 Getting started

At the beginning of Section 2.1.3 we used the Explorer to invoke the J4.8 learner on the weather data. To do the same thing in the command-line interface, type java weka.classifiers.trees.J48 -t data/weather.arff into the line at the bottom of the text panel. This incantation calls the Java virtual machine (in the Simple CLI, Java is already loaded) and instructs it to execute J4.8. WEKA is organized in packages (not to be confused with the plugin packages that are managed by WEKA’s package manager) that correspond to a directory hierarchy. The program to be executed is called J48 and resides in the trees package, which is a subpackage of classifiers, which is part of the overall weka package. The next section gives more details of the package structure. The –t option signals that the next argument is the name of the training file: we are assuming that the weather data resides in a data subdirectory of the directory from which you fired up WEKA. The result resembles the text shown in Figure 2.5. In the Simple CLI it appears in the panel above the line where you typed the command.

5.1.1 weka.Run

Typing in fully qualified class names can become a chore after a while, especially if you are using metalearning schemes that themselves require various base learners to be specified. Using the weka.Run command line tool allows you to type in shortened versions of scheme names, and WEKA will attempt to find a match by scanning the names of all the schemes it knows about. Using weka.Run, the previous J4.8 invocation becomes

java weka.Run .J48 -t data/weather.arff


A small saving perhaps, but consider the following:

java weka.Run .Stacking -M .Logistic -B .J48 -B ".FilteredClassifier -F \".Remove -R 1\" -W .NaiveBayes" -B .OneR -t data/iris.arff

Without weka.Run’s scheme matching this would look like:

java weka.classifiers.meta.Stacking -M weka.classifiers.functions.Logistic -B weka.classifiers.trees.J48 -B "weka.classifiers.meta.FilteredClassifier -F \"weka.filters.unsupervised.attribute.Remove -R 1\" -W weka.classifiers.bayes.NaiveBayes" -B weka.classifiers.rules.OneR -t data/iris.arff

If you are not using the Simple CLI, but instead are using your operating system’s command-line interface, then weka.Run provides another benefit—it automatically adds all the classes from plugin packages into the CLASSPATH. This saves having to modify the CLASSPATH every time you install a new plugin package.

5.2 The structure of WEKA

We have explained how to invoke filtering and learning schemes with the Explorer and connect them together with the Knowledge Flow interface. To go further, it is necessary to learn something about how WEKA is put together. Detailed, up-to-date information can be found in the online documentation included in the distribution. This is more technical than the descriptions of the learning and filtering schemes given by the More button in the Explorer and Knowledge Flow’s object editors. It is generated directly from comments in the source code using the Javadoc utility. To understand its structure, you need to know how Java programs are organized.

5.2.1 Classes, instances, and packages

Every Java program is implemented as a class or collection of classes. In object-oriented programming, a class is a collection of variables along with some methods that operate on them. Together, they define the behavior of an object belonging to the class. An object is simply an instantiation of the class that has values assigned to all the class’s variables. In Java, an object is also called an instance of the class. Unfortunately, this conflicts with the terminology used in this book, where the terms class and instance appear in the quite different context of machine learning. From now on, you will have to infer the intended meaning of these terms from their context. This is not difficult—and sometimes we will use the word object instead of Java’s instance to make things clear. In WEKA, the implementation of a particular learning algorithm is encapsulated in a class—which may depend on other classes for some of its functionality. For example, the J48 class described previously builds a C4.5 decision tree. Each time the Java virtual machine executes J48, it creates an instance of this class by allocating memory for building and storing a decision tree classifier. The algorithm, the classifier it builds, and a procedure for outputting the classifier are all part of that instantiation of the J48 class. Larger programs are usually split into more than one class. The J48 class, for example, does not actually contain any code for building a decision tree. It includes references to instances of other classes that do most of the work. When there are a lot of classes—as in WEKA—they become difficult to comprehend and navigate. Java allows classes to be organized into packages. A package is just a directory containing a collection of related classes: for example, the trees package mentioned previously contains the classes that implement decision trees. Packages are organized in a hierarchy that corresponds to the directory hierarchy: trees is a subpackage of the classifiers package, which is itself a subpackage of the overall weka package. When you consult the online documentation generated by Javadoc from your Web browser, the first thing you see is an alphabetical list of all the packages in WEKA, as shown in Figure 5.1. (If you view the Javadoc with frames, you will see more than this. Click on NO FRAMES to remove the extra information.) Here we introduce a few of them in order of importance.

5.2.2 The weka.core package

The core package is central to the WEKA system, and its classes are accessed from almost every other class. You can determine what they are by clicking on the weka.core hyperlink, which brings up the web page shown in Figure 5.2. This web page is divided into several parts, the main ones being the interface summary and the class summary. The latter is a list of classes contained within the package, and the former lists the interfaces it provides. An interface is similar to a class, the only difference being that it does not actually do anything by itself—it is merely a list of methods without actual implementations. Other classes can declare that they “implement” a particular interface and then provide code for its methods. For example, the OptionHandler interface defines those methods that are implemented by all classes that can process command-line options, including all classifiers. The key classes in the core package are Attribute, Instance, and Instances. An object of class Attribute represents an attribute. It contains the attribute’s name, its type, and, in the case of a nominal or string attribute, its possible values. An object of class Instance contains the attribute values of a particular instance; and an object of class Instances holds an ordered set of instances, in other words, a dataset. You can learn more about these classes by clicking their hyperlinks; we return to them in Chapter 6 when we show how to invoke machine learning schemes from other Java code. However, you can use WEKA from the command line without knowing the details. Clicking the Overview hyperlink in the upper left corner of any documentation page returns you to the listing of all the packages in WEKA that is shown in Figure 5.1.
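To make these three classes a little more concrete, here is a minimal sketch of how they can be used from Java code; it assumes the weather data is available as data/weather.arff and anticipates the fuller treatment in Chapter 6.

import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CoreClassesDemo {
  public static void main(String[] args) throws Exception {
    // An Instances object holds a dataset; here it is loaded from an ARFF file.
    Instances data = DataSource.read("data/weather.arff");
    data.setClassIndex(data.numAttributes() - 1); // treat the last attribute as the class

    // Attribute objects describe the columns of the dataset.
    for (int i = 0; i < data.numAttributes(); i++) {
      System.out.println(data.attribute(i).name());
    }

    // An Instance object holds the attribute values of a single example.
    Instance first = data.instance(0);
    System.out.println("Class of first instance: " + first.stringValue(data.classIndex()));
  }
}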

5.2.3 The weka.classifiers package

The classifiers package contains implementations of most of the algorithms for classification and numeric prediction described in this book. (Numeric prediction is included in classifiers: it is interpreted as prediction of a continuous class.) The most important class in this package is AbstractClassifier, which implements the Classifier interface. Classifier defines the general structure of any scheme for classification or numeric prediction and declares three important methods, buildClassifier(), classifyInstance(), and distributionForInstance(). In the terminology of object-oriented programming, the learning algorithms are represented by subclasses of AbstractClassifier and therefore automatically inherit these three methods. Every scheme redefines them according to how it builds a classifier and how it classifies instances. This gives a uniform interface for building and using classifiers from other Java code. Hence, for example, the same evaluation module can be used to evaluate the performance of any classifier in WEKA. To see an example, click on weka.classifiers.trees and then on DecisionStump, which is a class for building a simple one-level binary decision tree (with an extra branch for missing values). Its documentation page, shown in Figure 5.3, shows the fully qualified name of this class, weka.classifiers.trees.DecisionStump, near the top. You have to use this rather lengthy name whenever you build a decision stump from the command line, unless you use weka.Run.

Figure 5.1: Javadoc: the front page.

Figure 5.2: Javadoc: the weka.core package.

Figure 5.2: Javadoc: the weka.core package (cont.).

Figure 5.2: Javadoc: the weka.core package (cont.).

Figure 5.2: Javadoc: the weka.core package (cont.).

Figure 5.2: Javadoc: the weka.core package (cont.).

The class name is sited in a small tree structure showing the relevant part of the class hierarchy. As you can see, DecisionStump is a subclass of weka.classifiers.AbstractClassifier, which is itself a subclass of java.lang.Object. The Object class is the most general one in Java: all classes are automatically subclasses of it. After some generic information about the class—brief documentation, its version, and the author—Figure 5.3 gives an index of the constructors and methods of this class. A constructor is a special kind of method that is called whenever an object of that class is created, usually initializing the variables that collectively define its state. The index of methods lists the name of each one, the type of parameters it takes, and a short description of its functionality. Beneath those indexes, the web page gives more details about the constructors and methods. We return to these details later. As you can see, DecisionStump overrides distributionForInstance() from AbstractClassifier: the default implementation of classifyInstance() in AbstractClassifier then uses this method to produce its classifications. In addition, DecisionStump contains the getCapabilities(), getRevision(), globalInfo(), toSource(), toString(), and main() methods. We discuss getCapabilities() shortly. The getRevision() method simply returns the revision number of the classifier. There is a utility class in the weka.core package that prints it to the screen, which is used by WEKA maintainers when diagnosing and debugging problems reported by users. The globalInfo() method returns a string describing the classifier, which, along with the scheme’s options, is displayed by the More button in the generic object editor in WEKA’s graphical user interfaces. The toString() method returns a textual representation of the classifier, used whenever it is printed on the screen, while the toSource() method is used to obtain a source code representation of the learned classifier. (This is only implemented if the classifier implements the Sourcable interface.) The main() method is called when you ask for a decision stump from the command line, in other words, every time you enter a command beginning with

java weka.classifiers.trees.DecisionStump

The presence of a main() method in a class indicates that it can be run from the command line: all learning methods and filter algorithms implement it. The getCapabilities() method is called by the generic object editor to provide information about the capabilities of a learning scheme (Figure 2.9d). The training data is checked against the learning scheme’s capabilities when the buildClassifier() method is called, and an error is raised when the classifier’s stated capabilities do not match the data’s characteristics. The getCapabilities() method is present in the AbstractClassifier class and, by default, enables all capabilities (i.e., imposes no constraints). This makes it easier for new WEKA programmers to get started because they need not learn about and specify capabilities initially. Capabilities are covered in more detail in Chapter 7.
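As a brief illustration of this uniform interface, the following sketch builds a DecisionStump and queries it through the methods declared by Classifier; the dataset path is an assumption, and any other WEKA classifier could be substituted without changing the rest of the code.

import weka.classifiers.Classifier;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifierInterfaceDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("data/weather.arff");
    data.setClassIndex(data.numAttributes() - 1);

    // Any learning scheme can be handled through the Classifier interface.
    Classifier scheme = new DecisionStump();
    scheme.buildClassifier(data);                                  // learn the model
    double predicted = scheme.classifyInstance(data.instance(0));  // predicted class index
    double[] dist = scheme.distributionForInstance(data.instance(0));

    System.out.println("Predicted class: " + data.classAttribute().value((int) predicted));
    System.out.println("Probability of first class value: " + dist[0]);
  }
}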

Figure 5.3: DecisionStump: a class of the weka.classifiers.trees package.

Figure 5.3: DecisionStump: a class of the weka.classifiers.trees package (cont.).

5.2.4 Other packages

Several other packages listed in Figure 5.1 are worth mentioning: weka.associations, weka.clusterers, weka.datagenerators, weka.estimators, weka.filters, and weka.attributeSelection. The weka.associations package contains association rule learners. These have been placed in a separate package because association rules are fundamentally different from classifiers. The weka.clusterers package contains methods for unsupervised learning. Artificial data can be generated using the classes in weka.datagenerators. The weka.estimators package contains subclasses of a generic Estimator class, which computes different types of probability distributions. These subclasses are used by the naive Bayes algorithm (among others). In the weka.filters package, the Filter class defines the general structure of classes containing filter algorithms, which are all implemented as subclasses of Filter. Like classifiers, filters can be used from the command line: we will see how shortly. The weka.attributeSelection package contains several classes for attribute selection. These are used by the AttributeSelectionFilter in weka.filters.supervised.attribute, but can also be invoked separately.

5.2.5 Javadoc indexes

As mentioned previously, all classes are automatically subclasses of Object. To examine the tree that corresponds to WEKA’s hierarchy of classes, select the Overview link from the top of any page of the online documentation. Click Tree to display the overview as a tree that shows which classes are subclasses or superclasses of a particular class—for example, which classes inherit from AbstractClassifier. The online documentation contains an index of all classes, packages, publicly accessible variables (called fields) and methods in WEKA—in other words, all fields and methods that you can access from your own Java code. To view it, click Overview and then Index. Suppose you want to check which WEKA classifiers and filters are capable of operating incrementally. Searching for the word incremental in the index would soon lead you to the keyword UpdateableClassifier. In fact, this is a Java interface; interfaces are listed after the classes in the overview tree. You are looking for all classes that implement this interface. Clicking any occurrence of it in the documentation brings up a page that describes the interface and lists the classifiers that implement it. To find the filters is a little trickier unless you know the keyword StreamableFilter, which is the name of the interface that streams data through a filter: again its page lists the filters that implement it. You would stumble across that keyword if you knew any example of a filter that could operate incrementally.
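For example, a classifier found this way can be trained incrementally through the UpdateableClassifier interface. The sketch below uses NaiveBayesUpdateable; the dataset path is again an assumption, and the same pattern applies to any classifier listed as implementing the interface.

import weka.classifiers.UpdateableClassifier;
import weka.classifiers.bayes.NaiveBayesUpdateable;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IncrementalDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("data/weather.arff");
    data.setClassIndex(data.numAttributes() - 1);

    // Initialize the classifier with an empty dataset that has the same header,
    // then feed the training instances to it one at a time.
    NaiveBayesUpdateable scheme = new NaiveBayesUpdateable();
    scheme.buildClassifier(new Instances(data, 0));
    UpdateableClassifier incremental = scheme;
    for (int i = 0; i < data.numInstances(); i++) {
      incremental.updateClassifier(data.instance(i));
    }
    System.out.println(scheme);
  }
}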

5.3 Command-line options

In the preceding example, the –t option was used on the command line to communicate the name of the training file to the learning algorithm. There are many other options that can be used with any learning scheme, and also scheme-specific ones that apply only to particular schemes. If you invoke a learning scheme with the –h or –help option, or without any command-line options at all, it displays the applicable options: first the general options, then the scheme-specific ones. In the command-line interface, type:

java weka.Run .J48 -h

You will see a list of the options common to all learning schemes, shown in Table 5.1, followed by those that apply only to J48, shown in Table 5.2. A notable one is –info, which outputs a very brief description of the scheme. We will explain the generic options and then briefly review the scheme-specific ones.

Option                      Function
–h or –help                 Print help information
–synopsis or –info          In combination with –h or –help, prints the information from the “More” button in a classifier’s generic object editor
–t                          Specify training file
–T                          Specify test file. If none, a cross-validation is performed on the training data
–c                          Specify index of class attribute
–x                          Specify number of folds for cross-validation
–s                          Specify random number seed for cross-validation
–no-cv                      Don’t perform cross-validation
–split-percentage           Specify percentage of the data to use for the training set in a train-test split
–preserve-order             Preserve original order of the data when performing a train-test split
–m                          Specify file containing cost matrix
–l                          Specify input file for model
–d                          Specify output file for model
–v                          Output no statistics for training data
–o                          Output statistics only, not the classifier
–k                          Output information-theoretic statistics
–p                          Output predictions for test instances
–distribution               In combination with –p, output the full probability distribution for discrete class data instead of just the predicted label
–r                          Output cumulative margin distribution
–z                          Output the source representation of the classifier
–g                          Output the graph representation of the classifier
–xml <filename> |           Set scheme-specific options from XML encoded options stored in a file or in a supplied string
–threshold-file <file>      Save threshold data (for ROC curves, etc.) to a file
–threshold-label            Specify the class label to treat as the positive class when saving threshold data

Table 5.1: Generic options for learning schemes in WEKA

5.3.1 Generic options

The options in Table 5.1 determine which data is used for training and testing, how the classifier is evaluated, and what kind of statistics are displayed. For example, the –T option is used to provide the name of the test file when evaluating a learning scheme on an independent test set. By default the class is the last attribute in an ARFF file, but you can declare another one to be the class using –c followed by the position of the desired attribute, 1 for the first, 2 for the second, and so on. When cross-validation is performed (the default if a test file is not provided), the data is randomly shuffled first. To repeat the cross-validation several times, each time reshuffling the data in a different way, set the random number seed with –s (default value 1). With a large dataset you may want to reduce the number of folds for the cross-validation from the default value of 10 using –x. If performance on the training data alone is required, –no-cv can be used to suppress cross-validation; –v suppresses output of performance on the training data. As an alternative to cross-validation, a train-test split of the data specified with the –t option can be performed by supplying a percentage to use as the new training set with –split-percentage (the remaining data is used as the test set). Randomization of the data can be suppressed when performing a train-test split by specifying –preserve-order. In the Explorer, cost-sensitive evaluation is invoked as described in Section 2.1.5. To achieve the same effect from the command line, use the –m option to provide the name of a file containing the cost matrix. Here is a cost matrix for the weather data:

2 2   % Number of rows and columns in the matrix
0 10  % If true class yes and prediction no, penalty is 10.
1 0   % If true class no and prediction yes, penalty is 1.

The first line gives the number of rows and columns, that is, the number of class values. Then comes the matrix of penalties. Comments introduced by % can be appended to the end of any line. It is also possible to save and load models. If you provide the name of an output file using –d, WEKA saves the classifier generated from the training data. To evaluate the same classifier on a new batch of test data, you load it back using –l instead of rebuilding it. If the classifier can be updated incrementally, you can provide both a training file and an input file, and WEKA will load the classifier and update it with the given training instances. If you wish only to assess the performance of a learning scheme, use –o to suppress output of the model. Use –k to compute information-theoretic measures from the probabilities derived by a learning scheme. People often want to know which class values the learning scheme actually predicts for each test instance. The –p option prints each test instance’s number, the index of its class value and the actual value, the index of the predicted class value and the predicted value, a “+” if the class was misclassified, and the probability of the predicted class value. The probability predicted for each of the possible class labels of an instance can be output by using the –distribution flag in conjunction with –p. In this case, “*” is placed beside the probability in the distribution that corresponds to the predicted class value. The –p option also outputs attribute values for each instance and must be followed by a specification of the range (e.g., 1–2)—use 0 if you do not want any attribute values. You can also output the cumulative margin distribution for the training data, which shows the distribution of the margin measure. Finally, you can output the classifier’s source representation (if the classifier implements the Sourcable interface), and a graphical representation if the classifier can produce one.
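For example, a model could be saved and later re-evaluated on fresh data with a pair of commands along the following lines; the file names weather.model and weather.test.arff are hypothetical.

java weka.Run .J48 -t data/weather.arff -d weather.model
java weka.Run .J48 -l weather.model -T data/weather.test.arff -p 0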

Data relating to performance graphs such as ROC and recall-precision curves can be sent to a file using the –threshold-file option. The class label to treat as the positive class for generating the data can be specified with –threshold-label. The next section discusses how scheme-specific options are supplied on the command line; they can also be set from an XML file or string using the –xml option.

Option                            Function
–U                                Use unpruned tree
–O                                Do not collapse tree
–C                                Specify confidence threshold for pruning
–M                                Specify minimum number of instances in any leaf
–R                                Use reduced-error pruning
–N                                Specify number of folds for reduced-error pruning; one fold is used as pruning set
–B                                Use binary splits only
–S                                Don’t perform subtree raising
–L                                Retain instance information
–A                                Smooth the probability estimates using Laplace smoothing
–J                                Prevent the use of the MDL correction for the information gain of numeric splits
–Q                                Seed for shuffling the data
–doNotMakeSplitPointActualValue   Do not find a split point that is an actual value in the training set

Table 5.2: Scheme-specific options

5.3.2 Scheme-specific options

Table 5.2 shows the options specific to J4.8. You can force the algorithm to use the unpruned tree instead of the pruned one. You can suppress subtree raising, which increases efficiency. You can set the confidence threshold for pruning and the minimum number of instances permissible at any leaf. As well as C4.5’s standard pruning procedure, reduced-error pruning can be performed. The –N option governs the size of the holdout set: the dataset is divided equally into that number of parts and the last is held out (default value 3). You can smooth the probability estimates using the Laplace technique, set the random number seed for shuffling the data when selecting a pruning set, and store the instance information for future visualization. It is possible to turn off the MDL-based correction for the information gain on numeric attributes, which sometimes results in inappropriate pre-pruning. To reduce runtime on large datasets with numeric attributes, we can also turn off C4.5’s mechanism that refines the initial split point on a numeric attribute to be a value that actually occurs in the training set. Finally, to build a binary tree instead of one with multiway branches for nominal attributes, use –B.
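For example, the following invocation (a sketch, assuming the weather data is in the usual place) grows an unpruned tree with binary splits and at least five instances at each leaf:

java weka.Run .J48 -t data/weather.arff -U -B -M 5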

Chapter 6

Embedded Machine Learning

In most practical applications of data mining the learning component is an integrated part of a far larger software environment. If the environment is written in Java, you can use WEKA to solve the learning problem without writing any machine learning code yourself. When invoking learning schemes from the graphical user interfaces or the command line, there is no need to know anything about programming in Java. In this section we show how to access these algorithms from your own Java code. From now on, we assume that you have at least some rudimentary knowledge of Java.

6.1 A simple data mining application

We present a simple data mining application for learning a model that classifies text files into two categories, hit and miss. The application works for arbitrary documents: we refer to them as messages. The implementation uses the StringToWordVector filter mentioned in Section 2.3.1.4 to convert the messages into attribute vectors. We assume that the program is called every time a new file is to be processed. If the user provides a class label for the file, the system uses it for training; if not, it classifies it. The decision tree classifier J48 is used to do the work. Figure 6.1 shows the source code for the application program, implemented in a class called MessageClassifier. The command-line arguments that the main() method accepts are the name of a text file (given by –m), the name of a file holding an object of class MessageClassifier (–t), and, optionally, the classification of the message in the file (–c). If the user provides a classification, the message will be converted into an example for training; if not, the MessageClassifier object will be used to classify it as hit or miss.

6.2 main()

The main() method reads the message into a Java StringBuffer and checks whether the user has provided a classification for it. Then it reads a MessageClassifier object from the file given by –t, and creates a new object of class MessageClassifier if this file does not exist. In either case the resulting object is called messageCl. After checking for illegal command-line options, the program calls the method updateData() to update the training data stored in messageCl if a classification has been provided; otherwise it calls classifyMessage() to classify it. Finally, the messageCl object is saved back into the file, because it may have changed. In the following, we first describe how a new MessageClassifier object is created by the constructor


/**
 * Java program for classifying short text messages into two classes.
 */

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.Utils;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.Serializable;
import java.util.ArrayList;

public class MessageClassifier implements Serializable {

  /** for serialization. */
  private static final long serialVersionUID = -123455813150452885L;

  /** The training data gathered so far. */
  private Instances m_Data = null;

  /** The filter used to generate the word counts. */
  private StringToWordVector m_Filter = new StringToWordVector();

  /** The actual classifier. */
  private Classifier m_Classifier = new J48();

  /** Whether the model is up to date. */
  private boolean m_UpToDate;

  /**
   * Constructs empty training dataset.
   */
  public MessageClassifier() {
    String nameOfDataset = "MessageClassificationProblem";

    // Create vector of attributes.
    ArrayList attributes = new ArrayList(2);

    // Add attribute for holding messages.
    attributes.add(new Attribute("Message", (ArrayList) null));

    // Add class attribute.
    ArrayList classValues = new ArrayList(2);
    classValues.add("miss");
    classValues.add("hit");
    attributes.add(new Attribute("Class", classValues));

    // Create dataset with initial capacity of 100, and set index of class.
    m_Data = new Instances(nameOfDataset, attributes, 100);
    m_Data.setClassIndex(m_Data.numAttributes() - 1);
  }

Figure 6.1: The message classifier.

  /**
   * Updates model using the given training message.
   *
   * @param message the message content
   * @param classValue the class label
   */
  public void updateData(String message, String classValue) {
    // Make message into instance.
    Instance instance = makeInstance(message, m_Data);

    // Set class value for instance.
    instance.setClassValue(classValue);

    // Add instance to training data.
    m_Data.add(instance);

    m_UpToDate = false;
  }

  /**
   * Classifies a given message.
   *
   * @param message the message content
   * @throws Exception if classification fails
   */
  public void classifyMessage(String message) throws Exception {
    // Check whether classifier has been built.
    if (m_Data.numInstances() == 0)
      throw new Exception("No classifier available.");

    // Check whether classifier and filter are up to date.
    if (!m_UpToDate) {
      // Initialize filter and tell it about the input format.
      m_Filter.setInputFormat(m_Data);

      // Generate word counts from the training data.
      Instances filteredData = Filter.useFilter(m_Data, m_Filter);

      // Rebuild classifier.
      m_Classifier.buildClassifier(filteredData);

      m_UpToDate = true;
    }

    // Make separate little test set so that message
    // does not get added to string attribute in m_Data.
    Instances testset = m_Data.stringFreeStructure();

    // Make message into test instance.
    Instance instance = makeInstance(message, testset);

    // Filter instance.
    m_Filter.input(instance);
    Instance filteredInstance = m_Filter.output();

    // Get index of predicted class value.
    double predicted = m_Classifier.classifyInstance(filteredInstance);

    // Output class value.
    System.err.println("Message classified as : "
      + m_Data.classAttribute().value((int) predicted));
  }

Figure 6.1: The message classifier (cont.).

  /**
   * Method that converts a text message into an instance.
   *
   * @param text the message content to convert
   * @param data the header information
   * @return the generated Instance
   */
  private Instance makeInstance(String text, Instances data) {
    // Create instance of length two.
    Instance instance = new DenseInstance(2);

    // Set value for message attribute.
    Attribute messageAtt = data.attribute("Message");
    instance.setValue(messageAtt, messageAtt.addStringValue(text));

    // Give instance access to attribute information from the dataset.
    instance.setDataset(data);

    return instance;
  }

  /**
   * Main method. The following parameters are recognized:
   *
   * -m messagefile
   *   Points to the file containing the message to classify or use for
   *   updating the model.
   * -c classlabel
   *   The class label of the message if model is to be updated. Omit for
   *   classification of a message.
   * -t modelfile
   *   The file containing the model. If it doesn't exist, it will be
   *   created automatically.
   *
   * @param args the commandline options
   */
  public static void main(String[] args) {
    try {
      // Read message file into string.
      String messageName = Utils.getOption('m', args);
      if (messageName.length() == 0)
        throw new Exception("Must provide name of message file ('-m ').");
      FileReader m = new FileReader(messageName);
      StringBuffer message = new StringBuffer();
      int l;
      while ((l = m.read()) != -1)
        message.append((char) l);
      m.close();

      // Check if class value is given.
      String classValue = Utils.getOption('c', args);

      // If model file exists, read it, otherwise create new one.
      String modelName = Utils.getOption('t', args);
      if (modelName.length() == 0)
        throw new Exception("Must provide name of model file ('-t ').");
      MessageClassifier messageCl;
      try {
        messageCl = (MessageClassifier) SerializationHelper.read(modelName);
      } catch (FileNotFoundException e) {
        messageCl = new MessageClassifier();
      }

      // Check if there are any options left.
      Utils.checkForRemainingOptions(args);

      // Process message.
      if (classValue.length() != 0)
        messageCl.updateData(message.toString(), classValue);
      else
        messageCl.classifyMessage(message.toString());

      // Save message classifier object only if it was updated.
      if (classValue.length() != 0)
        SerializationHelper.write(modelName, messageCl);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Figure 6.1: The message classifier (cont.).

MessageClassifier() and then explain how the two methods updateData() and classifyMessage() work.

6.3 messageClassifier()

Each time a new MessageClassifier is created, objects for holding the filter and classifier are generated automatically. The only nontrivial part of the process is creating a dataset, which is done by the constructor MessageClassifier(). First the dataset’s name is stored as a string. Then an Attribute object is created for each attribute, one to hold the string corresponding to a text message and the other for its class. These objects are stored in a dynamic array of type ArrayList. Attributes are created by invoking one of the constructors in the class Attribute. This class has a constructor that takes one parameter—the attribute’s name—and creates a numeric attribute. However, the constructor we use here takes two parameters: the attribute’s name and a reference to an ArrayList. If this reference is null, as in the first application of this constructor in our program, WEKA creates an attribute of type string. Otherwise, a nominal attribute is created. This is how we create a class attribute with two values hit and miss: by passing the attribute’s name (class) and its values—stored in an ArrayList—to Attribute(). To create a dataset from this attribute information, MessageClassifier() must create an object of the class Instances from the core package. The constructor of Instances used by MessageClassifier() takes three arguments: the dataset’s name, an ArrayList containing the attributes, and an integer indicating the dataset’s initial capacity. We set the initial capacity to 100; it is expanded automatically if more instances are added. After constructing the dataset, MessageClassifier() sets the index of the class attribute to be the index of the last attribute.

6.4 updateData()

Now you know how to create an empty dataset, consider how the MessageClassifier object actually incorporates a new training message. The method updateData() does this. It first converts the given message into a training instance by calling makeInstance(), which begins by creating an object of class Instance that corresponds to an instance with two attributes. The constructor of the Instance object sets all the instance’s values to be missing and its weight to 1. The next step in makeInstance() is to set the value of the string attribute holding the text of the message. This is done by applying the setValue() method of the Instance object, providing it with the attribute whose value needs to be changed, and a second parameter that corresponds to the new value’s index in the definition of the string attribute. This index is returned by the addStringValue() method, which adds the message text as a new value to the string attribute and returns the position of this new value in the definition of the string attribute. Internally, an Instance stores all attribute values as double-precision floating-point numbers regardless of the type of the corresponding attribute. In the case of nominal and string attributes this is done by storing the index of the corresponding attribute value in the definition of the attribute. For example, the first value of a nominal attribute is represented by 0.0, the second by 1.0, and so on. The same method is used for string attributes: addStringValue() returns the index corresponding to the value that is added to the definition of the attribute. Once the value for the string attribute has been set, makeInstance() gives the newly created instance access to the data’s attribute information by passing it a reference to the dataset. In WEKA, an Instance object does not store the type of each attribute explicitly; instead, it stores a reference to a dataset with the corresponding attribute information. 112 CHAPTER 6. EMBEDDED MACHINE LEARNING

Returning to updateData(), once the new instance has been returned from makeInstance() its class value is set and it is added to the training data. We also initialize m_UpToDate, a flag indicating that the training data has changed and the predictive model is hence not up to date.

6.5 classifyMessage()

Now let’s examine how MessageClassifier processes a message whose class label is unknown. The classifyMessage() method first checks whether a classifier has been built by determining whether any training instances are available. It then checks whether the classifier is up to date. If not (because the training data has changed) it must be rebuilt. However, before doing so the data must be converted into a format appropriate for learning using the StringToWordVector filter. First, we tell the filter the format of the input data by passing it a reference to the input dataset using setInputFormat(). Every time this method is called, the filter is initialized—that is, all its internal settings are reset. In the next step, the data is transformed by useFilter(). This generic method from the Filter class applies a filter to a dataset. In this case, because StringToWordVector has just been initialized, it computes a dictionary from the training dataset and then uses it to form a word vector. After returning from useFilter(), all the filter’s internal settings are fixed until it is initialized by another call of setInputFormat(). This makes it possible to filter a test instance without updating the filter’s internal settings (in this case, the dictionary). Once the data has been filtered, the program rebuilds the classifier—in our case a J48 decision tree—by passing the training data to its buildClassifier() method. Then it sets m_UpToDate to true. It is an important convention in WEKA that the buildClassifier() method completely initializes the model’s internal settings before generating a new classifier. Hence we do not need to construct a new J48 object before we call buildClassifier(). Having ensured that the model stored in m_Classifier is current, we proceed to classify the message. Before makeInstance() is called to create an Instance object from it, a new Instances object is created to hold the new instance and passed as an argument to makeInstance(). This is done so that makeInstance() does not add the text of the message to the definition of the string attribute in m_Data. Otherwise, the size of the m_Data object would grow every time a new message was classified, which is clearly not desirable—it should only grow when training instances are added. Hence a temporary Instances object is created and discarded once the instance has been processed. This object is obtained using the method stringFreeStructure(), which returns a copy of m_Data with an empty string attribute. Only then is makeInstance() called to create the new instance. The test instance must also be processed by the StringToWordVector filter before being classified. This is easy: the input() method enters the instance into the filter object, and the transformed instance is obtained by calling output(). Then a prediction is produced by passing the instance to the classifier’s classifyInstance() method. As you can see, the prediction is coded as a double value. This allows WEKA’s evaluation module to treat models for categorical and numeric prediction similarly. In the case of categorical prediction, as in this example, the double variable holds the index of the predicted class value. To output the string corresponding to this class value, the program calls the value() method of the dataset’s class attribute. There is at least one way in which our implementation could be improved. The classifier and the StringToWordVector filter could be combined using the FilteredClassifier metalearner described in Section 2.3.3.
This classifier would then be able to deal with string attributes directly, without explicitly calling the filter to transform the data. We have not done this here because we wanted to demonstrate how filters can be used programmatically.
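For illustration, here is a minimal sketch of this alternative design. The class name FilteredMessageClassifier and the variable names are hypothetical; the sketch simply wraps J48 and StringToWordVector in a FilteredClassifier so that string data is converted automatically whenever the model is built or applied.

import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class FilteredMessageClassifier {

  /** Builds a J48 tree on raw string data and classifies a single message.
      FilteredClassifier applies StringToWordVector internally, so there is no
      explicit call to Filter.useFilter() or to input()/output(). */
  public static double classify(Instances train, Instance message) throws Exception {
    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(new StringToWordVector()); // dictionary is built from the training data
    fc.setClassifier(new J48());
    fc.buildClassifier(train);
    return fc.classifyInstance(message);    // the message is filtered with the same dictionary
  }
}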

Chapter 7

Writing New Learning Schemes

Suppose you need to implement a special-purpose learning algorithm that is not included in WEKA. Or suppose you are engaged in machine learning research and want to investigate a new learning scheme. Or suppose you just want to learn more about the inner workings of an induction algorithm by actually programming it yourself. This section uses a simple example to show how to make full use of WEKA’s class hierarchy when writing classifiers.

Scheme                                     Description              Book section
weka.classifiers.bayes.NaiveBayesSimple    Probabilistic learner    4.2
weka.classifiers.trees.Id3                 Decision tree learner    4.3
weka.classifiers.rules.Prism               Rule learner             4.4
weka.classifiers.lazy.IB1                  Instance-based learner   4.7

Table 7.1: Simple learning schemes in WEKA

The plugin package called simpleEducationalLearningSchemes includes the elementary learning schemes listed in Table 7.1. None take any scheme-specific command-line options. They are all useful for understanding the inner workings of a classifier. As an example, we describe the weka.classifiers.trees.Id3 scheme, which implements the ID3 decision tree learner. Other schemes, such as clustering algorithms and association rule learners, are organized in a similar manner.
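For example, the package can be installed from the command line using WEKA’s package manager (a sketch, assuming WEKA 3.8 and a working internet connection; the graphical package manager described in Chapter 1 can be used instead):

  java weka.core.WekaPackageManager -install-package simpleEducationalLearningSchemes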

7.1 An example classifier

Figure 7.1 gives the source code of weka.classifiers.trees.Id3, which, as you can see from the code, extends the AbstractClassifier class. Every classifier in WEKA does so, whether it predicts a nominal class or a numeric one. It also implements two interfaces, TechnicalInformationHandler and Sourcable, which allow the implementing class to provide bibliographical references for display in WEKA’s graphical user interface and a source code representation of its learned model respectively. The first method in weka.classifiers.trees.Id3 is globalInfo(): we mention it here before moving on to the more interesting parts. It simply returns a string that is displayed in WEKA’s graphical user interface when this scheme is selected. Part of the string includes information generated by the second method, getTechnicalInformation(), which formats a bibliographic reference for the ID3 algorithm. The third method, getCapabilities(), returns information on the data characteristics that ID3 can handle, namely nominal attributes and a nominal class—and


import weka.classifiers.*;
import weka.core.*;
import java.util.Enumeration;

/** * Class implementing an Id3 decision tree learner */ public class Id3 extends AbstractClassifier implements TechnicalInformationHandler, Sourcable {

/** for serialization */ static final long serialVersionUID = -2693678647096322561L;

/** The node’s successors. */ private Id3[] m_Successors;

/** Attribute used for splitting. */ private Attribute m_Attribute;

/** Class value if node is leaf. */ private double m_ClassValue;

/** Class distribution if node is leaf. */ private double[] m_Distribution;

/** Class attribute of dataset. */ private Attribute m_ClassAttribute;

/** * Returns a string describing the classifier. * * @return a description suitable for the GUI. */ public String globalInfo() { return "Class for constructing an unpruned decision tree based on the ID3 " + "algorithm. Can only deal with nominal attributes. No missing values " + "allowed. Empty leaves may result in unclassified instances. For more " + "information see: \n\n" + getTechnicalInformation().toString(); }

/** * Returns an instance of a TechnicalInformation object, containing detailed * information about the technical background of this class, e.g., paper * reference or book this class is based on. * * @return the technical information about this class */ @Override public TechnicalInformation getTechnicalInformation() { TechnicalInformation result = new TechnicalInformation(Type.ARTICLE); result.setValue(Field.AUTHOR, "R. Quinlan"); result.setValue(Field.YEAR, "1986"); result.setValue(Field.TITLE, "Induction of decision trees"); result.setValue(Field.JOURNAL, "Machine Learning"); result.setValue(Field.VOLUME, "1"); result.setValue(Field.NUMBER, "1"); result.setValue(Field.PAGES, "81-106"); return result; }

/** * Returns default capabilities of the classifier. * * @return the capabilities of this classifier */ @Override public Capabilities getCapabilities() { Capabilities result = super.getCapabilities(); result.disableAll();

// attributes result.enable(Capability.NOMINAL_ATTRIBUTES);

// class result.enable(Capability.NOMINAL_CLASS); result.enable(Capability.MISSING_CLASS_VALUES);

// instances result.setMinimumNumberInstances(0);

return result; }

Figure 7.1: The source code for the ID3 decision tree learner.

/** * Builds Id3 decision tree classifier. * * @param data the training data * @exception Exception if classifier can’t be built successfully */ @Override public void buildClassifier(Instances data) throws Exception {

// can classifier handle the data? getCapabilities().testWithFail(data);

// remove instances with missing class data = new Instances(data); data.deleteWithMissingClass();

makeTree(data); }

/** * Method for building an Id3 tree. * * @param data the training data * @exception Exception if decision tree can’t be built successfully */ private void makeTree(Instances data) throws Exception {

// Check if no instances have reached this node. if (data.numInstances() == 0) { m_Attribute = null; m_ClassValue = Utils.missingValue(); m_Distribution = new double[data.numClasses()]; return; }

// Compute attribute with maximum information gain. double[] infoGains = new double[data.numAttributes()]; Enumeration<Attribute> attEnum = data.enumerateAttributes(); while (attEnum.hasMoreElements()) { Attribute att = attEnum.nextElement(); infoGains[att.index()] = computeInfoGain(data, att); } m_Attribute = data.attribute(Utils.maxIndex(infoGains));

// Make leaf if information gain is zero. // Otherwise create successors. if (Utils.eq(infoGains[m_Attribute.index()], 0)) { m_Attribute = null; m_Distribution = new double[data.numClasses()]; Enumeration<Instance> instEnum = data.enumerateInstances(); while (instEnum.hasMoreElements()) { Instance inst = instEnum.nextElement(); m_Distribution[(int) inst.classValue()]++; } Utils.normalize(m_Distribution); m_ClassValue = Utils.maxIndex(m_Distribution); m_ClassAttribute = data.classAttribute(); } else { Instances[] splitData = splitData(data, m_Attribute); m_Successors = new Id3[m_Attribute.numValues()]; for (int j = 0; j < m_Attribute.numValues(); j++) { m_Successors[j] = new Id3(); m_Successors[j].makeTree(splitData[j]); } } }

Figure 7.1: The source code for the ID3 decision tree learner cont.

/** * Classifies a given test instance using the decision tree. * * @param instance the instance to be classified * @return the classification * @throws NoSupportForMissingValuesException if instance has missing values */ @Override public double classifyInstance(Instance instance) throws NoSupportForMissingValuesException {

if (instance.hasMissingValue()) { throw new NoSupportForMissingValuesException("Id3: no missing values, " + "please."); } if (m_Attribute == null) { return m_ClassValue; } else { return m_Successors[(int) instance.value(m_Attribute)] .classifyInstance(instance); } }

/** * Computes class distribution for instance using decision tree. * * @param instance the instance for which distribution is to be computed * @return the class distribution for the given instance * @throws NoSupportForMissingValuesException if instance has missing values */ @Override public double[] distributionForInstance(Instance instance) throws NoSupportForMissingValuesException {

if (instance.hasMissingValue()) { throw new NoSupportForMissingValuesException("Id3: no missing values, " + "please."); } if (m_Attribute == null) { return m_Distribution; } else { return m_Successors[(int) instance.value(m_Attribute)] .distributionForInstance(instance); } }

/** * Prints the decision tree using the private toString method from below. * * @return a textual description of the classifier */ @Override public String toString() {

if ((m_Distribution == null) && (m_Successors == null)) { return "Id3: No model built yet."; } return "Id3\n\n" + toString(0); }

/** * Computes information gain for an attribute. * * @param data the data for which info gain is to be computed * @param att the attribute * @return the information gain for the given attribute and data * @throws Exception if computation fails */ private double computeInfoGain(Instances data, Attribute att) throws Exception {

double infoGain = computeEntropy(data); Instances[] splitData = splitData(data, att); for (int j = 0; j < att.numValues(); j++) { if (splitData[j].numInstances() > 0) { infoGain -= ((double) splitData[j].numInstances() / (double) data .numInstances()) * computeEntropy(splitData[j]); } } return infoGain; }

Figure 7.1: The source code for the ID3 decision tree learner cont.

/** * Computes the entropy of a dataset. * * @param data the data for which entropy is to be computed * @return the entropy of the data’s class distribution * @throws Exception if computation fails */ private double computeEntropy(Instances data) throws Exception {

double[] classCounts = new double[data.numClasses()]; Enumeration<Instance> instEnum = data.enumerateInstances(); while (instEnum.hasMoreElements()) { Instance inst = instEnum.nextElement(); classCounts[(int) inst.classValue()]++; } double entropy = 0; for (int j = 0; j < data.numClasses(); j++) { if (classCounts[j] > 0) { entropy -= classCounts[j] * Utils.log2(classCounts[j]); } } entropy /= data.numInstances(); return entropy + Utils.log2(data.numInstances()); }

/** * Splits a dataset according to the values of a nominal attribute. * * @param data the data which is to be split * @param att the attribute to be used for splitting * @return the sets of instances produced by the split */ private Instances[] splitData(Instances data, Attribute att) {

Instances[] splitData = new Instances[att.numValues()]; for (int j = 0; j < att.numValues(); j++) { splitData[j] = new Instances(data, data.numInstances()); } Enumeration<Instance> instEnum = data.enumerateInstances(); while (instEnum.hasMoreElements()) { Instance inst = instEnum.nextElement(); splitData[(int) inst.value(att)].add(inst); } for (Instances element : splitData) { element.compactify(); } return splitData; }

/** * Outputs a tree at a certain level. * * @param level the level at which the tree is to be printed * @return the tree as string at the given level */ private String toString(int level) {

StringBuffer text = new StringBuffer();

if (m_Attribute == null) { if (Utils.isMissingValue(m_ClassValue)) { text.append(": null"); } else { text.append(": " + m_ClassAttribute.value((int) m_ClassValue)); } } else { for (int j = 0; j < m_Attribute.numValues(); j++) { text.append("\n"); for (int i = 0; i < level; i++) { text.append("| "); } text.append(m_Attribute.name() + " = " + m_Attribute.value(j)); text.append(m_Successors[j].toString(level + 1)); } } return text.toString(); }

Figure 7.1: The source code for the ID3 decision tree learner cont.

/** * Adds this tree recursively to the buffer. * * @param id the unique id for the method * @param buffer the buffer to add the source code to * @return the last ID being used * @throws Exception if something goes wrong */ protected int toSource(int id, StringBuffer buffer) throws Exception { int result; int i; int newID; StringBuffer[] subBuffers;

buffer.append("\n"); buffer.append(" protected static double node" + id + "(Object[] i) {\n");

// leaf? if (m_Attribute == null) { result = id; if (Double.isNaN(m_ClassValue)) { buffer.append(" return Double.NaN;"); } else { buffer.append(" return " + m_ClassValue + ";"); } if (m_ClassAttribute != null) { buffer.append(" // " + m_ClassAttribute.value((int) m_ClassValue)); } buffer.append("\n"); buffer.append(" }\n"); } else { buffer.append(" checkMissing(i, " + m_Attribute.index() + ");\n\n"); buffer.append(" // " + m_Attribute.name() + "\n");

// subtree calls subBuffers = new StringBuffer[m_Attribute.numValues()]; newID = id; for (i = 0; i < m_Attribute.numValues(); i++) { newID++;

buffer.append(" "); if (i > 0) { buffer.append("else "); } buffer.append("if (((String) i[" + m_Attribute.index() + "]).equals(\"" + m_Attribute.value(i) + "\"))\n"); buffer.append(" return node" + newID + "(i);\n");

subBuffers[i] = new StringBuffer(); newID = m_Successors[i].toSource(newID, subBuffers[i]); } buffer.append(" else\n"); buffer.append(" throw new IllegalArgumentException(\"Value ’\" + i[" + m_Attribute.index() + "] + \"’ is not allowed!\");\n"); buffer.append(" }\n");

// output subtree code for (i = 0; i < m_Attribute.numValues(); i++) { buffer.append(subBuffers[i].toString()); } subBuffers = null;

result = newID; }

return result; }

Figure 7.1: The source code for the ID3 decision tree learner cont.

/** * Returns a string that describes the classifier as source. The classifier * will be contained in a class with the given name (there may be auxiliary * classes), and will contain a method with the signature: * * public static double classify(Object[] i); * * where the array i contains elements that are either Double or String, * with missing values represented as null. The generated code is * public domain and comes with no warranty. * Note: works only if class attribute is the last attribute in the dataset. * * @param className the name that should be given to the source class. * @return the object source described by a string * @throws Exception if the source can’t be computed */ @Override public String toSource(String className) throws Exception { StringBuffer result; int id;

result = new StringBuffer();

result.append("class " + className + " {\n"); result .append(" private static void checkMissing(Object[] i, int index) {\n"); result.append(" if (i[index] == null)\n"); result.append(" throw new IllegalArgumentException(\"Null values " + "are not allowed!\");\n"); result.append(" }\n\n"); result.append(" public static double classify(Object[] i) {\n"); id = 0; result.append(" return node" + id + "(i);\n"); result.append(" }\n"); toSource(id, result); result.append("}\n");

return result.toString(); }

/** * Returns the revision string. * * @return the revision */ @Override public String getRevision() { return RevisionUtils.extract("$Revision: 10390 $"); }

/** * Main method. * * @param args the options for the classifier */ public static void main(String[] args) { runClassifier(new Id3(), args); } }

Figure 7.1: The source code for the ID3 decision tree learner cont.

the fact that it can deal with missing class values and data that contains no instances (although the latter does not produce a useful model!). Capabilities are described in Section 7.2.

7.1.1 buildClassifier()

The buildClassifier() method constructs a classifier from a training dataset. In this case it first checks the data’s characteristics against ID3’s capabilities. Characteristics of the training data, such as numeric attributes or missing attribute values, will cause the Capabilities class to raise an exception, because the ID3 algorithm cannot handle these. It then makes a copy of the training set (to avoid changing the original data) and calls a method from weka.core.Instances to delete all instances with missing class values, because these instances are useless in the training process. Finally, it calls makeTree(), which actually builds the decision tree by recursively generating all subtrees attached to the root node.

7.1.2 makeTree()

The first step in makeTree() is to check whether the dataset is empty. If it is, a leaf is created by setting m_Attribute to null. The class value m_ClassValue assigned to this leaf is set to be missing, and the estimated probability for each of the dataset’s classes in m_Distribution is initialized to 0.

If training instances are present, makeTree() finds the attribute that yields the greatest information gain for them. It first creates a Java enumeration of the dataset’s attributes. If the index of the class attribute is set—as it will be for this dataset—the class is automatically excluded from the enumeration. Inside the enumeration, each attribute’s information gain is computed by computeInfoGain() and stored in an array. We will return to this method later. The index() method from weka.core.Attribute returns the attribute’s index in the dataset, which is used to index the array. Once the enumeration is complete, the attribute with the greatest information gain is stored in the instance variable m_Attribute. The maxIndex() method from weka.core.Utils returns the index of the greatest value in an array of integers or doubles. (If there is more than one element with maximum value, the first is returned.) The index of this attribute is passed to the attribute() method from weka.core.Instances, which returns the corresponding attribute. You might wonder what happens to the array field corresponding to the class attribute. We need not worry about this because Java automatically initializes all elements in an array of numbers to zero, and the information gain is always greater than or equal to zero.

If the maximum information gain is zero, makeTree() creates a leaf. In that case m_Attribute is set to null, and makeTree() computes both the distribution of class probabilities and the class with greatest probability. (The normalize() method from weka.core.Utils normalizes an array of nonnegative doubles to sum to one.) When it makes a leaf with a class value assigned to it, makeTree() stores the class attribute in m_ClassAttribute. This is because the method that outputs the decision tree needs to access this to print the class label.

If an attribute with nonzero information gain is found, makeTree() splits the dataset according to the attribute’s values and recursively builds subtrees for each of the new datasets. To make the split it calls the method splitData(). This creates as many empty datasets as there are attribute values, stores them in an array (setting the initial capacity of each dataset to the number of instances in the original dataset), and then iterates through all instances in the original dataset and allocates them to the new dataset that corresponds to the attribute’s value. It then reduces memory requirements by compacting the Instances objects. Returning to makeTree(), the resulting array of datasets is used for building subtrees. The method creates an array of

Id3 objects, one for each attribute value, and calls makeTree() on each one by passing it the corresponding dataset.

7.1.3 computeInfoGain()

Returning to computeInfoGain(), the information gain associated with an attribute and a dataset is calculated using a straightforward implementation of the formula from Chapter 4 in the book. First, the entropy of the dataset is computed. Then, splitData() is used to divide it into subsets, and computeEntropy() is called on each one. Finally, the difference between the former entropy and the weighted sum of the latter ones—the information gain—is returned. The method computeEntropy() uses the log2() method from weka.core.Utils to obtain the logarithm (to base 2) of a number.
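For reference, the quantity these two methods jointly compute is the standard information gain, restated here; in this notation D is the dataset, A the attribute under consideration, D_v the subset of D in which A takes value v, and p_c the proportion of instances of class c in D:

\[
IG(D, A) = H(D) - \sum_{v \in \mathrm{values}(A)} \frac{|D_v|}{|D|} \, H(D_v),
\qquad
H(D) = - \sum_{c} p_c \log_2 p_c .
\]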

7.1.4 classifyInstance()

Having seen how ID3 constructs a decision tree, we now examine how it uses the tree structure to predict class values and probabilities. Every classifier must implement the classifyInstance() method or the distributionForInstance() method (or both). The AbstractClassifier superclass contains default implementations for both methods. The default implementation of classifyInstance() calls distributionForInstance(). If the class is nominal, it predicts the class with maximum probability, or a missing value if all probabilities returned by distributionForInstance() are zero. If the class is numeric, distributionForInstance() must return a single-element array that holds the numeric prediction, and this is what classifyInstance() extracts and returns. Conversely, the default implementation of distributionForInstance() wraps the prediction obtained from classifyInstance() into a single-element array. If the class is nominal, distributionForInstance() assigns a probability of one to the class predicted by classifyInstance() and a probability of zero to the others. If classifyInstance() returns a missing value, all probabilities are set to zero. To give you a better feeling for just what these methods do, the weka.classifiers.trees.Id3 class overrides them both.

Let’s look first at classifyInstance(), which predicts a class value for a given instance. As mentioned in the previous section, nominal class values, like nominal attribute values, are coded and stored in double variables, representing the index of the value’s name in the attribute declaration. This is used in favor of a more elegant object-oriented approach to increase speed of execution. In the implementation of ID3, classifyInstance() first checks whether there are missing attribute values in the instance to be classified; if so, it throws an exception. The class attribute is skipped in this check, otherwise the classifier could not make predictions for new data, where the class is unknown. Otherwise, it descends the tree recursively, guided by the instance’s attribute values, until a leaf is reached. Then it returns the class value m_ClassValue stored at the leaf. Note that this might be a missing value, in which case the instance is left unclassified. The method distributionForInstance() works in exactly the same way, returning the probability distribution stored in m_Distribution.

Most machine learning models, and in particular decision trees, serve as a more or less comprehensible explanation of the structure found in the data. Accordingly, each of WEKA’s classifiers, like many other Java objects, implements a toString() method that produces a textual representation of itself in the form of a String variable. ID3’s toString() method outputs a decision tree in roughly the same format as J4.8 (Figure 2.5). It recursively prints the tree structure into a String variable by accessing the attribute information stored at the nodes. To obtain each attribute’s name and values, it uses the name() and value() methods from weka.core.Attribute. Empty leaves without a class value are indicated by the string null.
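To make the relationship between the two default methods concrete, here is a simplified sketch of the logic described above. It is an illustration only, not the actual AbstractClassifier source, and the class name is hypothetical.

import weka.core.Instance;
import weka.core.Utils;

// A simplified sketch of how a default classifyInstance() can be derived
// from distributionForInstance(), as described in the text.
public abstract class DefaultPredictionSketch {

  // subclasses supply the class distribution for an instance
  public abstract double[] distributionForInstance(Instance instance) throws Exception;

  public double classifyInstance(Instance instance) throws Exception {
    double[] dist = distributionForInstance(instance);
    if (instance.classAttribute().isNominal()) {
      int maxIndex = Utils.maxIndex(dist);
      // if all probabilities are zero, no prediction can be made
      return dist[maxIndex] > 0 ? maxIndex : Utils.missingValue();
    }
    return dist[0]; // numeric class: the single-element array holds the prediction
  }
}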

7.1.5 toSource()

weka.classifiers.trees.Id3 implements the Sourcable interface. Classifiers that implement this interface can produce a source code representation of the learned model, which can be output on the command line by using the –z option (Section 5.3). WEKA produces a Java class (whose name is provided by the –z option) that can be used to make predictions independently of the WEKA libraries. Also output is a class called WekaWrapper that extends AbstractClassifier and uses the named class to make predictions. This class can be used for testing the source code representation of the model within the WEKA framework—i.e., it can be run from the command line or used in the Explorer. Because the actual classifier is hard-coded, the buildClassifier() method of the WekaWrapper does nothing more than check the data against the capabilities.

Figures 7.2 and 7.3 show the code produced by weka.classifiers.trees.Id3 when run on the nominal-attribute version of the weather data. The name Id3Weather was provided with the –z option, and a class of this name is shown in Figure 7.2. All its methods are static, meaning that they can be used without having to instantiate an Id3Weather object. The first method is called classify and takes as argument an array of objects that give the attribute values for the instance to be classified. Each node in the ID3 tree that has been learned is represented in the source code by a static method. The classify() method passes the instance to be classified to the method corresponding to the root of the tree, node0(). The node0() method corresponds to a test on the outlook attribute. Because it is not a leaf, it calls a node representing one of the subtrees—which one depends on the value of outlook in the instance being classified. Processing of a test instance continues in this fashion until a method corresponding to a leaf node is called, at which point a classification is returned.

Figure 7.3 shows the WekaWrapper class that uses Id3Weather. Its classifyInstance() method constructs an array of objects to pass to the classify() method in Id3Weather. The attribute values of the test instance are copied into the array and mapped to either String objects (for nominal values) or Double objects (for numeric values). If an attribute value is missing in the test instance, its corresponding entry in the array of objects is set to null.
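As a concrete illustration (a sketch only; it assumes the simpleEducationalLearningSchemes package has been installed and that weather.nominal.arff is in the current directory), the code shown in Figures 7.2 and 7.3 can be regenerated with a command such as:

  java weka.Run weka.classifiers.trees.Id3 -t weather.nominal.arff -z Id3Weather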

7.1.6 main()

Apart from getRevision(), which is optional and simply returns a version identifier, the only method in Id3 that has not been described yet is main(), which is called whenever the class is executed from the command line. As you can see, it is simple: it calls the method runClassifier() from its superclass (AbstractClassifier) with a new instance of Id3 and the given command-line options. runClassifier() tells WEKA’s Evaluation class to evaluate the supplied classifier with the given command-line options and prints the resulting string. The one-line expression in runClassifier() that does this calls the evaluation() method in WEKA’s Evaluation class and is enclosed in a try-catch statement, which catches the various exceptions that can be thrown by WEKA’s routines or other Java methods.

The evaluation() method in weka.classifiers.Evaluation interprets the generic scheme-independent command-line options described in Section 5.3 and acts appropriately. For example, it takes the –t option, which gives the name of the training file, and loads the corresponding dataset. If there is no test file it performs a cross-validation by creating a classifier object and repeatedly calling buildClassifier() and classifyInstance() or distributionForInstance() on different subsets of the training data. Unless the user suppresses output of the model by setting the corresponding command-line option, it also calls the toString() method to output the model built from the full training dataset.

What happens if the scheme needs to interpret a specific option such as a pruning parameter? This is accomplished using the OptionHandler interface in weka.core. A classifier that implements

class Id3Weather { private static void checkMissing(Object[] i, int index) { if (i[index] == null) throw new IllegalArgumentException("Null values are not allowed!"); }

public static double classify(Object[] i) { return node0(i); }

protected static double node0(Object[] i) { checkMissing(i, 0);

// outlook if (((String) i[0]).equals("sunny")) return node1(i); else if (((String) i[0]).equals("overcast")) return node4(i); else if (((String) i[0]).equals("rainy")) return node5(i); else throw new IllegalArgumentException("Value ’" + i[0] + "’ is not allowed!"); }

protected static double node1(Object[] i) { checkMissing(i, 2);

// humidity if (((String) i[2]).equals("high")) return node2(i); else if (((String) i[2]).equals("normal")) return node3(i); else throw new IllegalArgumentException("Value ’" + i[2] + "’ is not allowed!"); }

protected static double node2(Object[] i) { return 1.0; // no }

protected static double node3(Object[] i) { return 0.0; // yes }

protected static double node4(Object[] i) { return 0.0; // yes }

protected static double node5(Object[] i) { checkMissing(i, 3);

// windy if (((String) i[3]).equals("TRUE")) return node6(i); else if (((String) i[3]).equals("FALSE")) return node7(i); else throw new IllegalArgumentException("Value ’" + i[3] + "’ is not allowed!"); }

protected static double node6(Object[] i) { return 1.0; // no }

protected static double node7(Object[] i) { return 0.0; // yes } }

Figure 7.2: Source code (Id3Weather class) produced by weka.classifiers.trees.Id3 for the weather data.

import weka.core.Attribute; import weka.core.Capabilities; import weka.core.Capabilities.Capability; import weka.core.Instance; import weka.core.Instances; import weka.core.RevisionUtils; import weka.classifiers.Classifier; import weka.classifiers.AbstractClassifier;

public class WekaWrapper extends AbstractClassifier {

/** * Returns only the toString() method. * * @return a string describing the classifier */ public String globalInfo() { return toString(); }

/** * Returns the capabilities of this classifier. * * @return the capabilities */ public Capabilities getCapabilities() { weka.core.Capabilities result = new weka.core.Capabilities(this);

result.enable(weka.core.Capabilities.Capability.NOMINAL_ATTRIBUTES); result.enable(weka.core.Capabilities.Capability.NOMINAL_CLASS); result.enable(weka.core.Capabilities.Capability.MISSING_CLASS_VALUES);

result.setMinimumNumberInstances(0);

return result; }

/** * only checks the data against its capabilities. * * @param i the training data */ public void buildClassifier(Instances i) throws Exception { // can classifier handle the data? getCapabilities().testWithFail(i); }

/** * Classifies the given instance. * * @param i the instance to classify * @return the classification result */ public double classifyInstance(Instance i) throws Exception { Object[] s = new Object[i.numAttributes()];

for (int j = 0; j < s.length; j++) { if (!i.isMissing(j)) { if (i.attribute(j).isNominal()) s[j] = new String(i.stringValue(j)); else if (i.attribute(j).isNumeric()) s[j] = new Double(i.value(j)); } }

// set class value to missing s[i.classIndex()] = null;

return Id3Weather.classify(s); }

Figure 7.3: Source code (WekaWrapper class) produced by weka.classifiers.trees.Id3 for the weather data.

/** * Returns the revision string. * * @return the revision */ public String getRevision() { return RevisionUtils.extract("1.0"); }

/** * Returns only the classnames and what classifier it is based on. * * @return a short description */ public String toString() { return "Auto-generated classifier wrapper, based on weka.classifiers.trees.Id3" + "(generated with WEKA 3.8.0).\n" + this.getClass().getName() + "/Id3Weather"; }

/** * Runs the classfier from commandline. * * @param args the commandline arguments */ public static void main(String args[]) { runClassifier(new WekaWrapper(), args); } }

Figure 7.3: Source code (WekaWrapper class) produced by weka.classifiers.trees.Id3 for the weather data cont.

this interface contains three methods, listOptions(), setOptions(), and getOptions(), which can be used to list all the classifier’s scheme-specific options, to set some of them, and to get the options that are currently set. The evaluation() method in Evaluation automatically calls these methods if the classifier implements the OptionHandler interface. Once the scheme-independent options have been processed, it calls setOptions() to process the remaining options before using buildClassifier() to generate a new classifier. When it outputs the classifier, it uses getOptions() to output a list of the options that are currently set. For a simple example of how to implement these methods, look at the source code for weka.classifiers.rules.OneR.

OptionHandler makes it possible to set options from the command line. To set them from within the graphical user interfaces, WEKA uses the Java beans framework. All that is required are set...() and get...() methods for every parameter used by the class. For example, the methods setPruningParameter() and getPruningParameter() would be needed for a pruning parameter. There should also be a pruningParameterTipText() method that returns a description of the parameter for the graphical user interface. Again, see weka.classifiers.rules.OneR for an example. In recent versions of WEKA, explicit implementation of listOptions(), setOptions(), and getOptions() is no longer necessary. Instead, each set...() and get...() pair can be annotated with the information for the corresponding command-line option using the Java annotation mechanism. AbstractClassifier, etc., take care of implementing appropriate OptionHandler methods.

Some classifiers can be incrementally updated as new training instances arrive; they do not have to process all the data in one batch. In WEKA, incremental classifiers implement the UpdateableClassifier interface in weka.classifiers. This interface declares only one method, namely updateClassifier(), which takes a single training instance as its argument. For an example of how to use this interface, look at the source code for weka.classifiers.lazy.IBk.

If a classifier is able to make use of instance weights, it should implement the WeightedInstancesHandler interface from weka.core. Then other algorithms, such as those for boosting, can make use of this property.

In weka.core are many other useful interfaces for classifiers—for example, interfaces for classifiers that are randomizable, summarizable, drawable, and graphable. For more information on these and other interfaces, look at the Javadoc for the classes in weka.core.
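To make the bean-style convention concrete, here is a minimal sketch of the accessor methods for the hypothetical pruning parameter mentioned above. The class and field names are illustrative only and are not part of WEKA.

public class PruningParameterExample {

  /** A scheme-specific option; by convention it must not be reset by buildClassifier(). */
  protected double m_PruningParameter = 0.25;

  /** Called by the GUI object editor (and typically by setOptions()) to change the parameter. */
  public void setPruningParameter(double value) {
    m_PruningParameter = value;
  }

  /** Called by the GUI object editor (and typically by getOptions()) to read the current setting. */
  public double getPruningParameter() {
    return m_PruningParameter;
  }

  /** Returned string is shown as help text for this parameter in the GUI object editor. */
  public String pruningParameterTipText() {
    return "The parameter used when pruning the tree.";
  }
}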

7.2 Conventions for implementing classifiers

There are some conventions that you must obey when implementing classifiers in WEKA. If you do not, things will go awry. For example, WEKA’s evaluation module might not compute the classifier’s statistics properly when evaluating it. The CheckClassifier class can be used to check the basic behaviour of a classifier, although it cannot catch all problems.

The first convention has already been mentioned: each time a classifier’s buildClassifier() method is called, it must reset the model. The CheckClassifier class performs tests to ensure that this is the case. When buildClassifier() is called on a dataset, the same result must always be obtained, regardless of how often the classifier has previously been applied to the same or other datasets. However, buildClassifier() must not reset instance variables that correspond to scheme-specific options, because these settings must persist through multiple calls of buildClassifier(). Also, calling buildClassifier() must never change the input data.

Two other conventions have also been mentioned. One is that when a classifier cannot make a prediction, its classifyInstance() method must return Utils.missingValue() and its distributionForInstance() method must return probabilities of zero for all classes. The ID3 implementation in Figure 7.1 does this. Another convention is that with classifiers for numeric prediction, classifyInstance() returns the numeric value that the classifier predicts. Some classifiers, however, are able to predict nominal classes and their class probabilities, as well as numeric class values—weka.classifiers.lazy.IBk is an example. These implement the distributionForInstance() method, and if the class is numeric they return an array of size 1 whose only element contains the predicted numeric value. Another convention—not absolutely essential, but useful nonetheless—is that every classifier implements a toString() method that outputs a textual description of itself.

7.2.1 Capabilities

We close this chapter with a closer look at the idea of “capabilities.” As mentioned earlier, these allow a learning scheme to indicate what data characteristics it can handle. This information is displayed in the object editor when the user presses the Capabilities button; it also serves to disable the application of a scheme in the Explorer when the current data does not match its stated capabilities.

To ease the programming burden for those new to developing with WEKA, the superclasses of the major types of learning scheme (AbstractClassifier, AbstractClusterer and AbstractAssociator) disable all capabilities constraints by default. This allows programmers to focus on the main task of implementing the learning functionality, without having to bother with capabilities. However, once satisfied that the scheme is working correctly, the programmer should specify capabilities that reflect the scheme’s ability to handle various data characteristics by overriding the superclass’s getCapabilities() method. This method returns a weka.core.Capabilities object that encapsulates the characteristics that the scheme can handle.

In Figure 7.1, Id3’s getCapabilities() method first obtains a Capabilities object by calling super.getCapabilities(). This returns a Capabilities object without any constraints. The best way to proceed is to call the disableAll() method on the Capabilities object and then enable the relevant characteristics—those that the scheme can handle. Id3 does just this, enabling the ability to handle nominal attributes, a nominal class attribute and missing class values. It also specifies that a minimum of zero training instances are required. For the most part, individual capabilities are turned on or off by calling the enable() or disable() method of the Capabilities object. These methods take constants that are defined in the enumeration shown in Figure 7.4, which is part of the Capabilities class.

Figure 7.4: Javadoc for the Capability enumeration.