Rapidminer-4.3-Tutorial.Pdf

RapidMiner 4.3 User Guide Operator Reference Developer Tutorial 2 Rapid-I GmbH Stockumer Str. 475 44227 Dortmund, Germany http://www.rapidminer.com/ Copyright 2001-2008 by Rapid-I November 22, 2008 Contents 1 Introduction 27 1.1 Modeling Knowledge Discovery Processes as Operator Trees.. 28 1.2 RapidMiner as a Data Mining Interpreter........... 28 1.3 Different Ways of Using RapidMiner .............. 30 1.4 Multi-Layered Data View Concept................ 30 1.5 Transparent Data Handling.................... 31 1.6 Meta Data............................. 31 1.7 Large Number of Built-in Data Mining Operators........ 31 1.8 Extending RapidMiner ..................... 32 1.9 Example Applications....................... 33 1.10 How this tutorial is organized................... 34 2 Installation and starting notes 35 2.1 Download............................. 35 2.2 Installation............................. 35 2.2.1 Installing the Windows executable............ 35 2.2.2 Installing the Java version (any platform)........ 36 2.3 Starting RapidMiner ...................... 36 2.4 Memory Usage........................... 38 2.5 Plugins............................... 38 2.6 General settings.......................... 38 2.7 External Programs......................... 39 2.8 Database Access.......................... 39 3 4 CONTENTS 3 First steps 43 3.1 First example........................... 43 3.2 Process configuration files..................... 46 3.3 Parameter Macros......................... 47 3.4 File formats............................ 48 3.4.1 Data files and the attribute description file........ 49 3.4.2 Model files......................... 53 3.4.3 Attribute construction files................ 53 3.4.4 Parameter set files..................... 54 3.4.5 Attribute weight files................... 54 3.5 File format summary....................... 55 4 Advanced processes 57 4.1 Feature selection.......................... 57 4.2 Splitting up Processes....................... 59 4.2.1 Learning a model..................... 59 4.2.2 Applying the model.................... 59 4.3 Parameter and performance analysis............... 61 4.4 Support and tips.......................... 64 5 Operator reference 67 5.1 Basic operators.......................... 68 5.1.1 ModelApplier....................... 68 5.1.2 ModelGrouper....................... 68 5.1.3 ModelUngrouper...................... 69 5.1.4 ModelUpdater....................... 70 5.1.5 OperatorChain....................... 70 5.2 Core operators........................... 72 5.2.1 CommandLineOperator.................. 72 5.2.2 DataMacroDefinition................... 73 5.2.3 Experiment........................ 74 November 22, 2008 CONTENTS 5 5.2.4 FileEcho.......................... 75 5.2.5 IOConsumer........................ 76 5.2.6 IOMultiplier........................ 76 5.2.7 IORetriever........................ 77 5.2.8 IOSelector......................... 78 5.2.9 IOStorer.......................... 79 5.2.10 MacroDefinition...................... 80 5.2.11 MaterializeDataInMemory................. 81 5.2.12 MemoryCleanUp...................... 81 5.2.13 Process.......................... 82 5.2.14 SQLExecution....................... 83 5.2.15 SingleMacroDefinition................... 83 5.3 Input/Output operators...................... 85 5.3.1 AccessExampleSource................... 85 5.3.2 ArffExampleSetWriter................... 86 5.3.3 ArffExampleSource.................... 86 5.3.4 AttributeConstructionsLoader............... 88 5.3.5 AttributeConstructionsWriter............... 89 5.3.6 AttributeWeightsLoader.................. 90 5.3.7 AttributeWeightsWriter.................. 90 5.3.8 BibtexExampleSource................... 91 5.3.9 C45ExampleSource.................... 92 5.3.10 CSVExampleSetWriter.................. 94 5.3.11 CSVExampleSource.................... 95 5.3.12 CachedDatabaseExampleSource............. 96 5.3.13 ChurnReductionExampleSetGenerator.......... 98 5.3.14 ClusterModelReader.................... 99 5.3.15 ClusterModelWriter.................... 99 5.3.16 DBaseExampleSource................... 100 5.3.17 DatabaseExampleSetWriter................ 100 The RapidMiner 4.3 Tutorial 6 CONTENTS 5.3.18 DatabaseExampleSource................. 101 5.3.19 DirectMailingExampleSetGenerator............ 104 5.3.20 ExampleSetGenerator................... 104 5.3.21 ExampleSetWriter..................... 105 5.3.22 ExampleSource...................... 107 5.3.23 ExcelExampleSetWriter.................. 109 5.3.24 ExcelExampleSource.................... 110 5.3.25 GnuplotWriter....................... 111 5.3.26 IOContainerReader.................... 112 5.3.27 IOContainerWriter..................... 113 5.3.28 IOObjectReader...................... 113 5.3.29 IOObjectWriter...................... 114 5.3.30 MassiveDataGenerator.................. 115 5.3.31 ModelLoader........................ 115 5.3.32 ModelWriter........................ 116 5.3.33 MultipleLabelGenerator.................. 117 5.3.34 NominalExampleSetGenerator............... 118 5.3.35 ParameterSetLoader.................... 119 5.3.36 ParameterSetWriter.................... 119 5.3.37 PerformanceLoader.................... 120 5.3.38 PerformanceWriter.................... 121 5.3.39 ResultWriter........................ 121 5.3.40 SPSSExampleSource................... 122 5.3.41 SalesExampleSetGenerator................ 123 5.3.42 SimpleExampleSource................... 124 5.3.43 SparseFormatExampleSource............... 126 5.3.44 StataExampleSource................... 127 5.3.45 TeamProfitExampleSetGenerator............. 128 5.3.46 ThresholdLoader...................... 129 5.3.47 ThresholdWriter...................... 129 November 22, 2008 CONTENTS 7 5.3.48 TransfersExampleSetGenerator.............. 130 5.3.49 UpSellingExampleSetGenerator.............. 131 5.3.50 WekaModelLoader..................... 131 5.3.51 XrffExampleSetWriter................... 132 5.3.52 XrffExampleSource.................... 133 5.4 Learning schemes......................... 136 5.4.1 AdaBoost......................... 136 5.4.2 AdditiveRegression.................... 137 5.4.3 AgglomerativeClustering................. 138 5.4.4 AssociationRuleGenerator................. 139 5.4.5 AttributeBasedVote.................... 140 5.4.6 Bagging.......................... 140 5.4.7 BasicRuleLearner..................... 141 5.4.8 BayesianBoosting..................... 142 5.4.9 BestRuleInduction..................... 144 5.4.10 Binary2MultiClassLearner................. 145 5.4.11 CHAID........................... 146 5.4.12 ClassificationByRegression................ 147 5.4.13 ClusterModel2ExampleSet................ 148 5.4.14 CostBasedThresholdLearner................ 149 5.4.15 DBScanClustering..................... 150 5.4.16 DecisionStump...................... 151 5.4.17 DecisionTree........................ 152 5.4.18 DefaultLearner....................... 153 5.4.19 EvoSVM.......................... 154 5.4.20 ExampleSet2ClusterModel................ 156 5.4.21 ExampleSet2Similarity................... 157 5.4.22 ExampleSet2SimilarityExampleSet............ 157 5.4.23 FPGrowth......................... 158 5.4.24 FlattenClusterModel.................... 160 The RapidMiner 4.3 Tutorial 8 CONTENTS 5.4.25 GPLearner......................... 160 5.4.26 HyperHyper........................ 161 5.4.27 ID3............................. 162 5.4.28 ID3Numerical....................... 163 5.4.29 IteratingGSS........................ 164 5.4.30 JMySVMLearner...................... 166 5.4.31 KMeans.......................... 168 5.4.32 KMedoids......................... 168 5.4.33 KernelKMeans....................... 169 5.4.34 KernelLogisticRegression................. 171 5.4.35 LibSVMLearner...................... 172 5.4.36 LinearRegression...................... 174 5.4.37 LogisticRegression..................... 175 5.4.38 MetaCost......................... 176 5.4.39 MultiCriterionDecisionStump............... 177 5.4.40 MyKLRLearner...................... 178 5.4.41 NaiveBayes........................ 179 5.4.42 NearestNeighbors..................... 180 5.4.43 NeuralNet......................... 181 5.4.44 OneR........................... 183 5.4.45 Perceptron......................... 183 5.4.46 PolynomialRegression................... 184 5.4.47 PsoSVM.......................... 185 5.4.48 RVMLearner........................ 187 5.4.49 RandomFlatClustering................... 188 5.4.50 RandomForest....................... 189 5.4.51 RandomTree........................ 190 5.4.52 RelativeRegression..................... 191 5.4.53 RelevanceTree....................... 192 5.4.54 RuleLearner........................ 193 November 22, 2008 CONTENTS 9 5.4.55 Similarity2ExampleSet................... 194 5.4.56 Stacking.......................... 195 5.4.57 SubgroupDiscovery.................... 196 5.4.58 SupportVectorClustering................. 197 5.4.59 TopDownClustering.................... 198 5.4.60 TransformedRegression.................. 199 5.4.61 Tree2RuleConverter.................... 200 5.4.62 Vote............................ 201 5.4.63 W-ADTree......................... 202 5.4.64 W-AODE......................... 203 5.4.65 W-AODEsr........................ 204 5.4.66 W-AdaBoostM1...................... 205 5.4.67 W-AdditiveRegression................... 206 5.4.68 W-Apriori......................... 207 5.4.69 W-BFTree......................... 208 5.4.70 W-BIFReader....................... 209 5.4.71 W-Bagging........................ 210 5.4.72 W-BayesNet........................ 211 5.4.73 W-BayesNetGenerator................... 212 5.4.74 W-BayesianLogisticRegression.............

Rapidminer-4.3-Tutorial.Pdf

Feature Selection for Gender Classification in TUIK Life Satisfaction Survey A

Effect of Distance Measures on Partitional Clustering Algorithms

An Analysis on the Performance of Rapid Miner and R Programming Language As Data Pre-Processing Tools for Unsupervised Form of Insurance Claim Dataset

Açık Kaynak Kodlu Veri Madenciliği Yazılımlarının Karşılaştırılması

Industry Applications of Machine Learning and Data Science

Mining the Web of Linked Data with Rapidminer

Rapidminer Operator Reference Manual ©2014 by Rapidminer

ML-Flex: a Flexible Toolbox for Performing Classification Analyses

Machine Learning

A Comparative Study on Various Data Mining Tools for Intrusion Detection

The Forrester Wave

Magic Quadrant for Data Science Platforms Published: 14 February 2017 ID: G00301536 Analyst(S): Alexander Linden, Peter Krensky, Jim Hare, Carlie J