WEKA Manual for Version 3-7-8

Remco R. Bouckaert
Eibe Frank
Mark Hall
Richard Kirkby
Peter Reutemann
Alex Seewald
David Scuse

January 21, 2013

© 2002-2013 University of Waikato, Hamilton, New Zealand
Alex Seewald (original Command-line primer)
David Scuse (original Experimenter tutorial)

This manual is licensed under the GNU General Public License version 3. More information about this license can be found at http://www.gnu.org/licenses/gpl-3.0-standalone.html

Contents

I The Command-line

1 A command-line primer
  1.1 Introduction
  1.2 Basic concepts
    1.2.1 Dataset
    1.2.2 Classifier
    1.2.3 weka.filters
    1.2.4 weka.classifiers
  1.3 Examples
  1.4 Additional packages and the package manager
    1.4.1 Package management
    1.4.2 Running installed learning algorithms

II The Graphical User Interface

2 Launching WEKA

3 Package Manager
  3.1 Main window
  3.2 Installing and removing packages
    3.2.1 Unofficial packages
  3.3 Using a http proxy
  3.4 Using an alternative central package meta data repository
  3.5 Package manager property file

4 Simple CLI
  4.1 Commands
  4.2 Invocation
  4.3 Command redirection
  4.4 Command completion

5 Explorer
  5.1 The user interface
    5.1.1 Section Tabs
    5.1.2 Status Box
    5.1.3 Log Button
    5.1.4 WEKA Status Icon
    5.1.5 Graphical output
  5.2 Preprocessing
    5.2.1 Loading Data
    5.2.2 The Current Relation
    5.2.3 Working With Attributes
    5.2.4 Working With Filters
  5.3 Classification
    5.3.1 Selecting a Classifier
    5.3.2 Test Options
    5.3.3 The Class Attribute
    5.3.4 Training a Classifier
    5.3.5 The Classifier Output Text
    5.3.6 The Result List
  5.4 Clustering
    5.4.1 Selecting a Clusterer
    5.4.2 Cluster Modes
    5.4.3 Ignoring Attributes
    5.4.4 Working with Filters
    5.4.5 Learning Clusters
  5.5 Associating
    5.5.1 Setting Up
    5.5.2 Learning Associations
  5.6 Selecting Attributes
    5.6.1 Searching and Evaluating
    5.6.2 Options
    5.6.3 Performing Selection
  5.7 Visualizing
    5.7.1 The scatter plot matrix
    5.7.2 Selecting an individual 2D scatter plot
    5.7.3 Selecting Instances

6 Experimenter
  6.1 Introduction
  6.2 Standard Experiments
    6.2.1 Simple
      6.2.1.1 New experiment
      6.2.1.2 Results destination
      6.2.1.3 Experiment type
      6.2.1.4 Datasets
      6.2.1.5 Iteration control
      6.2.1.6 Algorithms
      6.2.1.7 Saving the setup
      6.2.1.8 Running an Experiment
    6.2.2 Advanced
      6.2.2.1 Defining an Experiment
      6.2.2.2 Running an Experiment
      6.2.2.3 Changing the Experiment Parameters
      6.2.2.4 Other Result Producers
  6.3 Cluster Experiments
  6.4 Remote Experiments
    6.4.1 Preparation
    6.4.2 Database Server Setup
    6.4.3 Remote Engine Setup
    6.4.4 Configuring the Experimenter
    6.4.5 Multi-core support
    6.4.6 Troubleshooting
  6.5 Analysing Results
    6.5.1 Setup
    6.5.2 Saving the Results
    6.5.3 Changing the Baseline Scheme
    6.5.4 Statistical Significance
    6.5.5 Summary Test
    6.5.6 Ranking Test

7 KnowledgeFlow
  7.1 Introduction
  7.2 Features
  7.3 Components
    7.3.1 DataSources
    7.3.2 DataSinks
    7.3.3 Filters
    7.3.4 Classifiers
    7.3.5 Clusterers
    7.3.6 Evaluation
    7.3.7 Visualization
  7.4 Examples
    7.4.1 Cross-validated J48
    7.4.2 Plotting multiple ROC curves
    7.4.3 Processing data incrementally
  7.5 Plugins
    7.5.1 Flow components
    7.5.2 Perspectives

8 ArffViewer
  8.1 Menus
  8.2 Editing

9 Bayesian Network Classifiers
  9.1 Introduction
  9.2 Local score based structure learning
    9.2.1 Local score metrics
    9.2.2 Search algorithms
  9.3 Conditional independence test based structure learning
  9.4 Global score metric based structure learning
  9.5 Fixed structure 'learning'
  9.6 Distribution learning
  9.7 Running from the command line
  9.8 Inspecting Bayesian networks
  9.9 Bayes Network GUI
  9.10 Bayesian nets in the experimenter
  9.11 Adding your own Bayesian network learners
  9.12 FAQ
  9.13 Future development

III Data

10 ARFF
  10.1 Overview
  10.2 Examples
    10.2.1 The ARFF Header Section
    10.2.2 The ARFF Data Section
  10.3 Sparse ARFF files
  10.4 Instance weights in ARFF files

11 XRFF
  11.1 File extensions
  11.2 Comparison
    11.2.1 ARFF
    11.2.2 XRFF
  11.3 Sparse format
  11.4 Compression
  11.5 Useful features
    11.5.1 Class attribute specification
    11.5.2 Attribute weights
    11.5.3 Instance weights

12 Converters
  12.1 Introduction
  12.2 Usage
    12.2.1 File converters
    12.2.2 Database converters

13 Stemmers
  13.1 Introduction
Details
File type: PDF
Content language: English
Pages: 327