Fareed Akthar, Caroline Hahne RapidMiner 5 Operator Reference


24th August 2012

Rapid-I www.rapid-i.com © 2012 by Rapid-I GmbH. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means electronic, mechanical, photocopying, or otherwise, without prior written permission of Rapid-I GmbH.

Preface

Welcome to the RapidMiner Operator Reference, the final result of a long working process. When we first started to plan this reference, we had an extensive discussion about its purpose. Would anybody want to read the whole book? Starting with AdaBoost and reading the description of every single operator up to X-Validation? Or would it only serve for looking up particular operators, although that is also possible in the program interface itself? We decided for the latter, and with growing progress in the making of this book we realized how futile this entire discussion had been. It was not long until the book reached the 600-page mark, and now we have nearly hit 1000 pages, which is far beyond what anybody would want to read in its entirety. Even if there were a great structure, explaining the usage of groups of operators as guiding transitions between the explanations of single operators, nobody could comprehend all that. The reader would have long forgotten about the Loop Clusters operator by the time he gets to know about cross-validation. So we did not put any effort into that, and hence the book has become a pure reference. For getting to know RapidMiner itself, this is not a suitable document; we would rather recommend reading the manual as a starting point. There are other documents available for particular scenarios, like using RapidMiner as a researcher or extending its functionality. Please take a look at our website rapid-i.com for an overview of the available documentation.

From that, we can derive some suggestions about how to read this book: whenever you want to know about a particular operator, just open the index at the end of this book and jump directly to the operator. The order of the

operators in this book is determined by the group structure in the operator tree, as you will immediately see when taking a look at the contents. As operators for similar tasks are grouped together in RapidMiner, these operators are also near to each other in this book. So if you are interested in broadening your perspective of RapidMiner beyond an already known operator, you can continue reading a few pages before and after the operator you picked from the index. Once you have read the description of an operator, you can jump to the tutorial process, which explains a possible use case. Often the functionality of an operator can be understood more easily in the context of a complete process. All these processes are also available in RapidMiner: you simply need to open the description of the operator in the help view and scroll down. After clicking the respective link, the process will be opened and you can inspect the details, execute it and analyse the results using breakpoints. Apart from that, the explanation of the parameters will give you a good insight into what the operator is capable of and what it can be configured for.

I think there is nothing left to say except wishing you a lot of illustrative encounters with the various operators. And if you really read it from start to end, please tell us, as we have bets running on that. Of course we will verify that by checking whether you found all the Easter eggs...

Sebastian Land

Contents

1 Process Control 1
Remember...... 1
Recall...... 4
Multiply...... 7
Join Paths...... 10
Handle Exception...... 12
Throw Exception...... 15
1.1 Parameter...... 17
Set Parameters...... 17
Optimize Parameters (Grid)...... 21
Optimize Parameters (Evolutionary)...... 26
1.2 Loop...... 32
Loop...... 32
Loop Attributes...... 36
Loop Values...... 42
Loop Examples...... 46
Loop Clusters...... 49
Loop Data Sets...... 51
Loop and Average...... 54
Loop Parameters...... 57
Loop Files...... 62
X-Prediction...... 65
1.3 Branch...... 71
Branch...... 71


Select Subprocess...... 75
1.4 Collections...... 79
Collect...... 79
Select...... 82
Loop Collection...... 85

2 Utility 89
Subprocess...... 89
2.1 Macros...... 93
Set Macro...... 93
Set Macros...... 98
Generate Macro...... 102
Extract Macro...... 110
2.2 Logging...... 118
Log...... 118
Provide Macro as Log Value...... 125
Log to Data...... 128
2.3 Execution...... 132
Execute Process...... 132
Execute Script...... 135
Execute SQL...... 141
Execute Program...... 145
2.4 Files...... 148
Write as Text...... 149
Copy File...... 151
Rename File...... 153
Delete File...... 156
Move File...... 158
Create Directory...... 161
2.5 Data Generation...... 163
Generate Data...... 163
Generate Nominal Data...... 165
Generate Direct Mailing Data...... 167
Generate Sales Data...... 169


Add Noise...... 171
2.6 Miscellaneous...... 177
Materialize Data...... 177
Free Memory...... 179

3 Repository Access 183
Retrieve...... 183
Store...... 185

4 Import 189
4.1 Data...... 189
Read csv...... 189
Read Excel...... 194
Read SAS...... 197
Read Access...... 199
Read AML...... 201
Read ARFF...... 206
Read Database...... 210
Stream Database...... 215
Read SPSS...... 219
4.2 Models...... 222
Read Model...... 222
4.3 Attributes...... 224
Read Weights...... 224

5 Export 227
Write...... 227
5.1 Data...... 229
Write AML...... 229
Write Arff...... 232
Write Database...... 236
Update Database...... 240
Write Special Format...... 244
5.2 Models...... 248
Write Model...... 248


Write Clustering...... 250
5.3 Attributes...... 252
Write Weights...... 252
Write Constructions...... 255
5.4 Results...... 257
Write Performance...... 257
5.5 Other...... 259
Write Parameters...... 259
Write Threshold...... 261

6 Data Transformation 265
6.1 Name and Role Modification...... 265
Rename...... 265
Rename by Replacing...... 267
Set Role...... 273
Exchange Roles...... 277
6.2 Type Conversion...... 279
Numerical to Binominal...... 279
Numerical to Polynominal...... 285
Numerical to Real...... 290
Real to Integer...... 295
Nominal to Binominal...... 300
Nominal to Text...... 306
Nominal to Numerical...... 310
Nominal to Date...... 318
Text to Nominal...... 327
Date to Numerical...... 331
Date to Nominal...... 336
Parse Numbers...... 343
Format Numbers...... 349
Guess Types...... 356
6.2.1 Discretization...... 361
Discretize by Size...... 361
Discretize by Binning...... 368


Discretize by Frequency...... 375
Discretize by User Specification...... 382
Discretize by Entropy...... 389
6.3 Attribute Set Reduction and Transformation...... 395
6.3.1 Generation...... 395
Generate ID...... 396
Generate Empty Attribute...... 397
Generate Copy...... 400
Generate Attributes...... 402
Generate Concatenation...... 417
Generate Aggregation...... 419
Generate Function Set...... 425
Optimization...... 429
Optimize by Generation (YAGGA)...... 429
6.3.2 Transformation...... 434
Principal Component Analysis...... 434
Principal Component Analysis (Kernel)...... 437
Independent Component Analysis...... 442
Generalized Hebbian Algorithm...... 445
Singular Value Decomposition...... 448
Self-Organizing Map...... 452
6.3.3 Selection...... 455
Select Attributes...... 455
Select by Weights...... 461
Remove Attribute Range...... 465
Remove Useless Attributes...... 467
Remove Correlated Attributes...... 472
Work on Subset...... 476
Optimization...... 483
Forward Selection...... 483
Backward Elimination...... 487
Optimize Selection...... 492
Optimize Selection (Evolutionary)...... 498


6.4 Value Modification...... 504
Set Data...... 504
Declare Missing Value...... 507
6.4.1 Numerical Value Modification...... 512
Normalize...... 512
Scale by Weights...... 518
6.4.2 Nominal Value Modification...... 521
Map...... 521
Replace...... 527
Cut...... 532
Split...... 537
Merge...... 544
Remap Binominals...... 546
6.5 Data Cleansing...... 552
Replace Missing Values...... 552
Fill Data Gaps...... 558
6.5.1 Outlier Detection...... 560
Detect Outlier (Distances)...... 560
Detect Outlier (Densities)...... 563
Detect Outlier (LOF)...... 566
Detect Outlier (COF)...... 569
6.6 Filtering...... 575
Filter Examples...... 575
Remove Duplicates...... 580
Filter Example Range...... 585
6.6.1 Sampling...... 588
Sample...... 588
Sample (Stratified)...... 591
Sample (Bootstrapping)...... 595
Split Data...... 598
6.7 Sorting...... 601
Sort...... 601
6.8 Rotation...... 604
Pivot...... 604


De-Pivot...... 608
Transpose...... 612
6.9 Aggregation...... 615
Aggregate...... 615
6.10 Set Operations...... 622
Append...... 622
Join...... 624
Set Minus...... 630
Union...... 633
Superset...... 637

7 Modeling 641
7.1 Classification and Regression...... 641
7.1.1 Lazy Modeling...... 641
Default Model...... 641
K-NN...... 643
7.1.2 Bayesian Modeling...... 651
Naive Bayes...... 651
Naive Bayes (Kernel)...... 655
7.1.3 Tree Induction...... 659
Decision Tree...... 659
ID3...... 665
CHAID...... 669
Decision Stump...... 673
Random Tree...... 675
Random Forest...... 680
7.1.4 Rule Induction...... 684
Rule Induction...... 684
Subgroup Discovery...... 687
Tree to Rules...... 692
7.1.5 Neural Net Training...... 694
Perceptron...... 694
Neural Net...... 696


7.1.6 Function Fitting...... 701
Linear Regression...... 701
Polynomial Regression...... 705
7.1.7 Logistic Regression...... 709
Logistic Regression...... 709
7.1.8 Support Vector Modeling...... 714
Support Vector Machine...... 714
Support Vector Machine (LibSVM)...... 720
Support Vector Machine (Evolutionary)...... 726
Support Vector Machine (PSO)...... 733
7.1.9 Discriminant Analysis...... 738
Linear Discriminant Analysis...... 738
7.1.10 Meta Modeling...... 742
Vote...... 742
Polynomial by Binomial Classification...... 744
Classification by Regression...... 747
Bayesian Boosting...... 749
AdaBoost...... 754
Bagging...... 757
Stacking...... 761
MetaCost...... 765
7.2 Attribute Weighting...... 769
Weight by Information Gain...... 769
Weight by Information Gain Ratio...... 771
Weight by Correlation...... 774
Weight by Chi Squared Statistic...... 777
Weight by Relief...... 780
Weight by SVM...... 782
Weight by PCA...... 785
Weight by Component Model...... 788
Data to Weights...... 791
Weights to Data...... 793
7.2.1 Optimization...... 795
Optimize Weights (Evolutionary)...... 795


7.3 Clustering and Segmentation...... 800
K-Means...... 800
K-Means (Kernel)...... 805
K-Medoids...... 810
DBSCAN...... 817
Expectation Maximization Clustering...... 823
Support Vector Clustering...... 828
Agglomerative Clustering...... 832
Top Down Clustering...... 838
Extract Cluster Prototypes...... 841
7.4 Association and Item Set Mining...... 844
FP-Growth...... 844
Create Association Rules...... 848
Generalized Sequential Patterns...... 852
7.5 Correlation and Dependency Computation...... 856
Correlation Matrix...... 856
7.6 Similarity Computation...... 859
Data to Similarity...... 859
Cross Distances...... 864
7.7 Model Application...... 870
Apply Model...... 870
Group Models...... 873
7.7.1 Thresholds...... 876
Find Threshold...... 876
Create Threshold...... 880
Apply Threshold...... 883
7.7.2 Confidences...... 886
Drop Uncertain Predictions...... 886

8 Evaluation 891
8.1 Validation...... 891
Split Validation...... 891
X-Validation...... 896
Bootstrapping Validation...... 903


8.2 Performance Measurement...... 908
Performance...... 908
Extract Performance...... 912
Combine Performances...... 915
8.2.1 Classification and Regression...... 918
Performance (Classification)...... 918
Performance (Binominal Classification)...... 924
Performance (Regression)...... 932
Performance (Costs)...... 937
8.2.2 Clustering...... 941
Cluster Distance Performance...... 941
Cluster Density Performance...... 944
Item Distribution Performance...... 947
8.3 Significance...... 950
T-Test...... 950
ANOVA...... 954
8.4 Visual Evaluation...... 958
Create Lift Chart...... 958
Compare ROCs...... 962

1 Process Control

Remember

This operator stores the given object in the object store of the process. The stored object can be retrieved from the store by using the Recall operator.

Description

The Remember operator can be used to store the input object in the object store of the process under a specified name. The name of the object is specified through the name parameter. The io object parameter specifies the class of the object. The stored object can later be restored by the Recall operator by using the same name and class (i.e. the name and class that were used to store it with the Remember operator). There is no scoping mechanism in RapidMiner processes; therefore objects can be stored (using the Remember operator) and retrieved (using the Recall operator) at any nesting level. But care should be taken that the execution order of operators is such that the Remember operator for an object always executes before the Recall operator for that object. The combination of these two operators can be used to build complex processes where an input object is used in completely different parts or loops of the process.
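Conceptually, the object store described above behaves like a process-wide key-value store with no scoping. The following Python sketch is purely illustrative — the class and method names are invented for this example and are not RapidMiner API:

```python
# Hypothetical sketch of the Remember/Recall idea: a flat (unscoped),
# process-wide key-value store. Illustrative only, not RapidMiner code.
class ObjectStore:
    def __init__(self):
        self._store = {}

    def remember(self, name, obj):
        # Store the object under the given name; return it unchanged,
        # like the Remember operator's output port.
        self._store[name] = obj
        return obj

    def recall(self, name, remove_from_store=False):
        # Retrieve a stored object. Execution order matters:
        # remember() must have run before recall() for the same name.
        if remove_from_store:
            return self._store.pop(name)
        return self._store[name]

store = ObjectStore()
store.remember("Testset", {"examples": 5})
testset = store.recall("Testset")
```

As in RapidMiner, recalling without removal leaves the object in the store, so it can be retrieved again later in the process.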


Differentiation

Recall The Remember operator is always used in combination with the Recall operator. The Remember operator stores the required object in the object store and the Recall operator retrieves the stored object when required. See page 4 for details.

Input Ports

store (sto) Any object can be provided here. This object will be stored in the object store of the process. Make sure that the class of this object is selected in the io object parameter.

Output Ports

stored (sto) The object that was given as input is passed through to the output without change. It is not compulsory to attach this port to any other port; the object will be stored even if this port is left unconnected.

Parameters

name (string) The name under which the input object is stored is specified through this parameter. The same name will be used for retrieving this object through the Recall operator.

io object (selection) The class of the input object is selected through this parameter.

Related Documents

Recall (4)

Tutorial Processes

Introduction to Remember and Recall operators

This process uses the combination of the Remember and Recall operators to display the testing data set of the Split Validation operator. The testing data set is present in the testing subprocess of the Split Validation operator but it is not available outside the Split Validation operator.

The 'Golf' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it. The test set size parameter is set to 5 and the training set size parameter is set to -1. Thus the test set in the testing subprocess will be composed of 5 examples. The Default Model operator is used in the training subprocess to train a model. The testing data set is available at the tes port of the testing subprocess. The Remember operator is used to store the testing data set in the object store of the process. The name and io object parameters are set to 'Testset' and 'ExampleSet' respectively. The Apply Model and Performance operators are applied in the testing subprocess later. In the main process, the Recall operator is used to retrieve the testing data set. The name and io object parameters of the Recall operator are set to 'Testset' and 'ExampleSet' respectively to retrieve the object that was stored by the Remember operator. The output of the Recall operator is connected to the result port of the process. Therefore the testing data set can be seen in the Results Workspace.


Recall

This operator retrieves the specified object from the object store of the process. The objects can be stored in the object store by using the Remember operator.

Description

The Recall operator can be used for retrieving the specified object from the object store of the process. The name of the object is specified through the name parameter. The io object parameter specifies the class of the required object. The Recall operator is always used in combination with operators like the Remember operator. For the Recall operator to retrieve an object, the object must first have been stored in the object store by an operator like the Remember operator. The name and class of the object are specified when the object is stored using the Remember operator. The same name (in the name parameter) and class (in the io object parameter) should be specified in the Recall operator to retrieve that object. The same stored object can be retrieved multiple times if the remove from store parameter of the Recall operator is not set to true. There is no scoping mechanism in RapidMiner processes; therefore objects can be stored (using the Remember operator) and retrieved (using the Recall operator) at any nesting level. But care should be taken that the execution

order of operators is such that the Remember operator for an object always executes before the Recall operator for that object. The combination of these two operators can be used to build complex processes where an input object is used in completely different parts or loops of the process.

Differentiation

Remember The Recall operator is always used in combination with the Remember operator. The Remember operator stores the required object in the object store and the Recall operator retrieves the stored object when required. See page 1 for details.

Output Ports

result (res) The specified object is retrieved from the object store of the process and is delivered through this output port.

Parameters

name (string) The name of the required object is specified through this parameter. This should be the same name that was used when storing the object in an earlier part of the process.

io object (selection) The class of the required object is selected through this parameter. This should be the same class that was used when storing the object in an earlier part of the process.

remove from store (boolean) If this parameter is set to true, the specified object is removed from the object store after it has been retrieved. In that case the object can be retrieved just once. If this parameter is set to false, the object remains in the object store even after retrieval. Thus the object can be retrieved multiple times (by using the Recall operator multiple times).


Related Documents

Remember (1)

Tutorial Processes

Introduction to Remember and Recall operators

This process uses the combination of the Remember and Recall operators to display the testing data set of the Split Validation operator. The testing data set is present in the testing subprocess of the Split Validation operator but it is not available outside the Split Validation operator.

The 'Golf' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it. The test set size parameter is set to 5 and the training set size parameter is set to -1. Thus the test set in the testing subprocess will be composed of 5 examples. The Default Model operator is used in the training subprocess to train a model. The testing data set is available at the tes port of the testing subprocess. The Remember operator is used to store the testing data set in the object store of the process. The Apply Model and Performance operators are applied in the testing subprocess later. In the main process, the Recall operator is used to retrieve the testing data set. The name and io object parameters of the Recall operator are set to 'Testset' and 'ExampleSet' respectively to retrieve the object that was stored by the Remember operator. The output of the Recall operator is connected to the result port of the process. Therefore the testing data set can be seen in the Results Workspace.

Multiply

This operator copies its input object to all connected output ports. It does not modify the input object.

Description

The Multiply operator copies the object at its input port to its output ports multiple times. As more ports are connected, more copies are generated. The input object is copied by reference; hence the underlying data of an ExampleSet is never copied (unless the Materialize Data operator is used). As copy-by-reference is usually much cheaper than copy-by-value, copying objects through this operator is cheap. When copying ExampleSets, only the references to attributes are copied. It is very important to note that when attributes are changed or added in one copy of the ExampleSet, this change has no effect on other copies. However, if data is modified in one copy, it is also modified in the other copies generated by the Multiply operator.
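The copy semantics described above can be illustrated with plain Python data structures. This is a sketch of the general idea only, not RapidMiner code: each "copy" gets its own attribute list, but all copies share one underlying data table.

```python
# Illustrative sketch of copy-by-reference: per-copy attribute lists,
# shared underlying data. Names here are invented for this example.
data = [[1.0, 2.0], [3.0, 4.0]]          # shared underlying data table
attributes = ["a1", "a2"]                # attribute list

def multiply(example_set, n):
    # Shallow copies: each copy gets a NEW attribute list,
    # but the SAME reference to the underlying data.
    return [{"attributes": list(example_set["attributes"]),
             "data": example_set["data"]} for _ in range(n)]

original = {"attributes": attributes, "data": data}
copy1, copy2 = multiply(original, 2)

copy1["attributes"].append("id")         # attribute change: local to copy1
copy2["data"][0][0] = 99.0               # data change: visible in all copies
```

After running this, only copy1 has the extra attribute, but the changed data value is visible through every copy and the original, mirroring the behaviour of the Multiply operator described above.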


Input Ports

input (inp) This port can take various kinds of objects as input, e.g. an ExampleSet or even a model.

Output Ports

output (out) There can be many output ports. As soon as one output port is connected, another output port is created for further connections. All ports deliver unchanged copies of the input object.

Tutorial Processes

Multiplying data sets

In this Example Process the Retrieve operator is used to load the Labor-Negotiations data set. A breakpoint is inserted after this operator so that the data can be viewed before applying the Multiply operator. You can see that this data set has many missing values. Press the green-colored Run button to continue the process.

4 copies of the data set are generated using the Multiply operator. The Replace Missing Values operator is applied on the first copy. The Select Attributes operator is applied on the second copy. The Generate ID operator is applied on the third copy and the fourth copy is connected directly to the results port.

The Replace Missing Values operator replaces all the missing values with the average value of that attribute. As this is a change in data instead of a change in attributes, this change is made in all the copies.

The Select Attributes operator selects one attribute i.e. the duration attribute.

Please note that special attributes are also selected even if they are not explicitly specified. Thus the special attribute of the Labor-Negotiations data set (i.e. the Class attribute) is automatically selected. Results from this operator show only two attributes: the Duration attribute and a special attribute (the Class attribute). As this is a change in attributes instead of a change in data, it is only applied to this copy and all other copies are not affected. Similarly the Generate ID operator adds a new attribute (the id attribute) to the data set. As this is not a change in data (it is a change in attributes), it is relevant only for this copy and other copies are not affected.

The last copy generated by the Multiply operator is connected directly to the results port without applying any operator. This copy is not the same as the input ExampleSet: it has no missing values. This is because the Replace Missing Values operator made a change in the data, and changes in data are reflected in all copies.

Multiplying models


In this Example Process the Retrieve operator is used to load the Golf data set. The k-NN operator is applied on it to learn a classification model. This model is given as input to the Multiply operator. 2 copies are generated by the Multiply operator. One copy of the model is used to apply the model on the Golf-Testset data set and the other copy is used to apply the model on the Golf data set. This simple Example Process was added to show that the Multiply operator can multiply different objects, e.g. models.

Join Paths

This operator delivers the first non-null input to its output.

Description

The Join Paths operator can have multiple inputs but it has only one output. This operator returns the first non-null input that it receives. This operator can

be useful when some parts of the process are susceptible to producing null results, which could halt the entire process. In such a scenario the Join Paths operator can be used to rule out this possibility.
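The behaviour described above amounts to a simple "first non-null wins" rule. A minimal, purely illustrative Python sketch of that idea (not RapidMiner code):

```python
# Sketch of the Join Paths idea: return the first input that is not
# null (None in Python). Function and variable names are invented.
def join_paths(*inputs):
    for obj in inputs:
        if obj is not None:
            return obj
    return None

# Two null inputs (e.g. from an empty Subprocess), then two data sets:
result = join_paths(None, None, "Golf", "Polynomial")
```

Here `result` would be `"Golf"`, matching the tutorial process below in which the first two inputs are null and the 'Golf' data set is the first non-null input.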

Input Ports

input (inp) This operator can have multiple inputs. When one input is connected, another input port becomes available, ready to accept another input (if any). Multiple inputs can be provided but only the first non-null object will be returned by this operator.

Output Ports

output (out) The first non-null object that this operator receives is returned through this port.

Tutorial Processes

Returning the first non-null object

This Example Process starts with the Subprocess operator. Two outputs of the Subprocess operator are attached to the first two input ports of the Join Paths operator, but both these inputs are null because the Subprocess operator has no inner operators. The 'Golf' and 'Polynomial' data sets are loaded using Retrieve operators. The Join Paths operator has four inputs but it returns only the 'Golf' data set because it is the first non-null input that it received.


Handle Exception

This is a nested operator that is used for exception handling. This operator executes the operators in its Try subprocess and returns its results if there is no error. If there is an error in the Try subprocess, the Catch subprocess is executed instead and its results are returned.

Description

The Handle Exception operator is a nested operator, i.e. it has two subprocesses: Try and Catch. This operator first tries to execute the Try subprocess. If there is no error, i.e. the execution is successful, then this operator returns the results of the Try subprocess. In case there is an error, the process is not stopped; instead, the Catch subprocess is executed and its results are delivered by this operator. The error message can be saved using the exception macro parameter. This Try/Catch concept is like the exception handling construct used in many programming languages. You need a basic understanding of macros if you want to save the error message using the exception macro; please study the documentation of the Extract Macro operator for a basic understanding of macros. For more information regarding subprocesses please study the Subprocess operator. Please use this operator with care since it will also catch unexpected errors.
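Since the operator mirrors the try/catch construct of programming languages, a small Python sketch may clarify the control flow, including the role of the exception macro. Everything here is illustrative; the function names and macro dictionary are invented for this example and are not RapidMiner API:

```python
# Sketch of the Try/Catch idea with an "exception macro": run the Try
# branch; on any error, record the message and run the Catch branch.
def handle_exception(try_branch, catch_branch, macros, macro_name):
    try:
        return try_branch()
    except Exception as e:
        macros[macro_name] = str(e)   # like the exception macro parameter
        return catch_branch()

macros = {}

def rename_attributes():
    # Stands in for a Try subprocess that fails, e.g. a Rename
    # operator with a mandatory parameter left empty.
    raise ValueError("mandatory parameter 'old name' is not set")

def pass_through():
    # Stands in for a Catch subprocess delivering the unchanged input.
    return "original ExampleSet"

result = handle_exception(rename_attributes, pass_through, macros, "error")
```

As in the tutorial process below, the failing Try branch causes the Catch branch's result to be delivered, while the error message remains accessible through the macro.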

Input Ports

in (in) This operator can have multiple inputs. When one input is connected, another in port becomes available, ready to accept another input (if any). The order of inputs remains the same: the object supplied at the first in port of this operator is available at the first in port of the nested chain (inside the subprocess). Do not forget to connect all inputs in the correct order, and make sure that you have connected the right number of ports at the subprocess level.

Output Ports

out (out) This operator can have multiple out ports. When one output is connected, another out port becomes available, ready to deliver another output (if any). The order of outputs remains the same: the object delivered at the first out port of the subprocess is delivered at the first out port of the outer process. Do not forget to connect all outputs in the correct order, and make sure that you have connected the right number of ports at all levels of the chain.

Parameters

exception macro (string) This parameter specifies the name of the macro which will store the error message (if any). This macro can be accessed by other operators by using the '%{macro name}' syntax.


Tutorial Processes

Using Try/Catch for exception handling

The goal of this Example Process is to deliver the 'Iris' data set after renam- ing its attributes. In case there is an error, the original 'Iris' data set should be delivered along with the error message.

The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that there are four regular attributes a1, a2, a3 and a4. The Handle Exception operator is applied next. Have a look at the subprocesses of this operator. The Rename operator is applied in the Try subprocess for renaming the attributes. The old name and new name parameters of the Rename operator are deliberately left empty, which is an error because these are mandatory parameters. The Catch subprocess takes the original ExampleSet, applies the Log operator and delivers the ExampleSet without modifications to the output. The Log operator is used for recording the error message in case there is an error.

Execute the process and switch to the Results Workspace. You can see that the attributes of the ExampleSet have not been renamed. This is because there was an error in the Try subprocess, so its results were not delivered. The error message can be seen in the log; it says that values for mandatory parameters of the Rename operator were not provided. The original ExampleSet can be seen in the Results Workspace because when the Handle Exception operator encountered the error in the Try subprocess, it stopped its execution, executed the Catch subprocess instead and delivered its results.

Now set the old name and new name parameters of the Rename operator to 'label' and 'new label' respectively and run the process again. This time you can see that the attributes have been renamed and there is no error message. This is because the Handle Exception operator first tried to execute the Try subprocess; since there was no error in it, its results were delivered and the Catch subprocess was not executed.

14 Throw Exception

This operator throws an exception every time it is executed.

Description

The Throw Exception operator throws an exception with a user-defined message as soon as it is executed. This will cause the process to fail. It can be useful if, for example, a certain result should be treated as a failure.

Input Ports through (thr) Data delivered to this port will be passed to the output port without any modification. Whenever an input port gets occupied, two new input and output ports become available. The order remains the same. Data delivered at the first input port is available at the first output port. It's not necessary to connect this port, the exception will be thrown anyway.


Output Ports

through (thr) This port provides the data which was delivered at the corresponding input port without any changes. Whenever an input port is occupied, a new pair of input and output ports becomes available. The order remains the same: data delivered at the first input port is available at the first output port. It is not necessary to connect this port; the exception will be thrown anyway.

Parameters

message (string) The error message that should be shown/logged is specified through this parameter.

Tutorial Processes

Throw exception if no examples pass

The 'Iris' data set is loaded with the Retrieve operator. The Filter Examples operator is applied on the data; it keeps only the examples whose attribute a1 has a value greater than 10.

The ExampleSet is passed on to a Branch operator. If there is at least one example, the data is passed on unchanged. If there are zero examples, the Throw Exception operator makes the process fail with the entered message. Because the filter condition matches no example of the data set, the process will fail, i.e. the Throw Exception operator will be executed.
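The control flow of this tutorial process can be sketched in a few lines of Python. This is an illustration of the branch-then-fail pattern only; the function names, the sample values and the message are invented for this example:

```python
# Sketch of the tutorial above: fail with a user-defined message
# when a filter leaves zero examples. Illustrative only.
def throw_exception(message):
    raise RuntimeError(message)

examples = [5.1, 4.9, 6.3]                    # sample values of attribute a1
filtered = [v for v in examples if v > 10]    # filter condition: a1 > 10

try:
    if not filtered:                          # the "zero examples" branch
        throw_exception("No examples passed the filter")
except RuntimeError as e:
    failure_message = str(e)                  # the message shown/logged
```

Since no value exceeds 10, the exception branch fires, just as the Throw Exception operator is executed in the tutorial process.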

1.1 Parameter

Set Parameters

This operator applies a set of parameters to the specified operators. It is mostly used for applying an optimal parameter set found for one operator to another, similar operator.

Description

The Set Parameters operator takes a set of parameters as input. Operators like Optimize Parameters (Grid) or Read Parameters can be used as a source of the parameter set. The Set Parameters operator takes this parameter set and assigns its values to the parameters of the specified operators, which can be specified through the name map parameter. This parameter has two columns: 'set operator name' and 'operator name'. The 'set operator name' column is used for specifying the name of the operator (or the name in the parameter set) that generated the parameter set, and the 'operator name' column specifies the name of the target operator, i.e. the operator that will use this parameter set.

This operator can be very useful, e.g. in the following scenario. If one wants to find the best parameters for a certain learning scheme, one is usually also interested in the model generated with these optimal parameters. Finding the best parameter set is easily possible with operators like the Optimize Parameters (Grid) operator. But generating a model with these optimal parameters is not possible because such parameter optimization operators do not return the IOObjects produced within, but only a parameter set and a performance vector. This is because the parameter optimization operators know nothing about models, but only about the performance vectors produced within; and producing performance vectors does not necessarily require a model. To solve this problem, one can use the Set Parameters operator.

Usually, a process with the Set Parameters operator contains at least two operators of the same type, typically a learner. One learner may be an inner operator of a parameter optimization scheme and may be named 'Learner', whereas a second learner of the same type named 'OptimalLearner' comes after the parameter optimization scheme and should use the optimal parameter set returned by the optimization scheme. The Set Parameters operator takes the parameter set returned by the optimization scheme as input. The name 'Learner' is specified in the 'set operator name' column and the name 'OptimalLearner' is specified in the 'operator name' column of the name map parameter. Each entry in this list maps the name of an operator that was used during the optimization (in our case this is 'Learner') to an operator that should now use these parameters (in our case this is 'OptimalLearner'). An important thing to note here is the sequence of operators. The 'Learner' operator should be first in the sequence, followed by the Set Parameters operator and finally the 'OptimalLearner' operator.
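The 'Learner' / 'OptimalLearner' pattern can be sketched in plain Python; all names are illustrative stand-ins, not RapidMiner API, and the toy error function is invented for the example:

```python
def grid_optimize(train, evaluate, grid):
    """Stand-in for Optimize Parameters (Grid): return the best parameter set.
    Lower score is better (like a classification error)."""
    best_params, best_score = None, float("inf")
    for params in grid:
        model = train(params)       # the inner 'Learner'
        score = evaluate(model)
        if score < best_score:
            best_params, best_score = params, score
    return best_params              # only the parameter set comes back, not the model

def set_parameters(param_set, target):
    """Stand-in for Set Parameters: copy the optimal values onto 'OptimalLearner'."""
    target.update(param_set)
    return target

def train(params):       # a toy model is just its parameters here
    return dict(params)

def evaluate(model):     # invented error, minimal at C=250, degree=3
    return abs(model["C"] - 250) + abs(model["degree"] - 3)

grid = [{"C": c, "degree": d} for c in (0.0, 250) for d in (1, 3)]
best = grid_optimize(train, evaluate, grid)
optimal_learner = set_parameters(best, {"C": 0.0, "degree": 1})
print(optimal_learner)  # {'C': 250, 'degree': 3}
```

A second learner is needed precisely because `grid_optimize` returns only the parameter set: the winning model itself is discarded inside the optimization.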

Differentiation

Clone Parameters The Clone Parameters operator is a very generic form of the Set Parameters operator. It differs from the Set Parameters operator because it does not require a ParameterSet as input. It simply reads a parameter value from a source and uses it to set the parameter value of a target parameter. This operator is more generic than the Set Parameters operator and could completely replace it. See page ?? for details.


Input Ports

parameter set (par) This input port expects a parameter set. It is the output of the Optimize Parameters (Grid) operator in the attached Example Process. The output of other operators like the Read Parameters operator can also be used as a source of a parameter set.

Parameters

name map (list) The Set Parameters operator takes a parameter set and assigns its values to the parameters of the target operator, which is specified through the name map parameter. This menu has two columns i.e. 'set operator name' and 'operator name'. The 'set operator name' column is used for specifying the operator that generated this parameter set (or the name of the parameter set) and the 'operator name' column specifies the name of the target operator i.e. the operator that will use this parameter set.

Related Documents

Clone Parameters (??)

Tutorial Processes

Building a model using an optimal parameter set

The 'Polynomial' data set is loaded using the Retrieve operator. The Optimize Parameters (Grid) operator is applied on it to produce an optimal set of parameters. The X-Validation operator is applied in the subprocess of the Optimize Parameters (Grid) operator. The X-Validation operator uses an SVM (LibSVM) operator for training a model. This SVM (LibSVM) operator is named 'Learner'. After execution of the Optimize Parameters (Grid) operator, it returns a parameter set with optimal values of the C and degree parameters of the SVM (LibSVM) operator. This parameter set is provided to the Set Parameters operator. The Set Parameters operator provides these optimal parameters to the SVM (LibSVM) operator in the main process (outside the Optimize Parameters (Grid) operator). This SVM (LibSVM) operator is named 'OptimalLearner' and it comes after the Set Parameters operator in the execution sequence. Have a look at the name map parameter of the Set Parameters operator. The name 'Learner' is specified in the 'set operator name' column and the name 'OptimalLearner' is specified in the 'operator name' column of the name map parameter. Each entry in this list maps the name of an operator that was used during the optimization (in our case this is 'Learner') to an operator that should now use these parameters (in our case this is 'OptimalLearner'). The 'OptimalLearner' uses the optimal parameters of 'Learner' and applies the model on the 'Polynomial' data set. Have a look at the parameters of 'OptimalLearner' before the execution of the process. The degree and C parameters are set to 1 and 0.0 respectively. These values are changed to 3 and 250 after execution of the process because these are the optimal values for these parameters generated by the Optimize Parameters (Grid) operator and then these parameters were used by 'OptimalLearner'. These parameters can also be seen during the process execution (at the breakpoint after the Optimize Parameters (Grid) operator).


Optimize Parameters (Grid)

This operator finds the optimal values of the selected parameters of the operators in its subprocess.

Description

The Optimize Parameters (Grid) operator has a subprocess in it. It executes the subprocess for all combinations of selected values of the parameters and then delivers the optimal parameter values through the parameter port. The performance vector for optimal values of parameters is delivered through the performance port. Any additional results of the subprocess are delivered through the result ports.

The entire configuration of this operator is done through the edit parameter settings parameter. A complete description of this parameter is given in the parameters section.


Please note that selecting a large number of parameters and/or a large number of steps (or possible values of parameters) results in a huge number of combinations. For example, if you select 3 parameters and 25 steps for each parameter, then the total number of combinations is 17576 (i.e. 26 x 26 x 26, because 25 steps yield 26 values per parameter). The subprocess is executed for all possible combinations. Running a subprocess for such a huge number of iterations will take a lot of time. So always carefully limit the parameters and their steps.
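The combinatorial growth is easy to compute; a small sketch, using the convention from this reference that n steps over a range select n + 1 values:

```python
def grid_combinations(steps_per_parameter):
    """Number of subprocess runs for a grid search: each parameter with
    n steps contributes n + 1 candidate values, and the grid is the
    product over all parameters."""
    total = 1
    for steps in steps_per_parameter:
        total *= steps + 1
    return total

print(grid_combinations([10, 10]))      # 121   (two parameters, 11 values each)
print(grid_combinations([25, 25, 25]))  # 17576 (three parameters, 26 values each)
```
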

This operator returns an optimal parameter set which can also be written to a file with the Write Parameters operator. This parameter set can be read in another process using the Read Parameters operator.

Other parameter optimization schemes are also available. The Optimize Parameters (Evolutionary) operator might be useful if the best ranges and dependencies are not known at all. Another operator which works similarly to this parameter optimization operator is the Loop Parameters operator. In contrast to the optimization operators, this operator simply iterates through all parameter combinations. This might be especially useful for plotting purposes.

Differentiation

Optimize Parameters (Evolutionary) The Optimize Parameters (Evolutionary) operator finds the optimal values for a set of parameters using an evolutionary approach which is often more appropriate than a grid search (as in the Optimize Parameters (Grid) operator) or a greedy search (as in the Optimize Parameters (Quadratic) operator) and leads to better results. The Optimize Parameters (Evolutionary) operator might be useful if the best ranges and dependencies are not known at all. See page 26 for details.


Input Ports

input (inp) This operator can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The Object supplied at the first input port of this operator is available at the first input port of the nested chain (inside the subprocess). Do not forget to connect all inputs in correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output Ports

performance (per) This port delivers the Performance Vector for the optimal values of the selected parameters. A Performance Vector is a list of performance criteria values.

parameters (par) This port delivers the optimal values of the selected parameters. This optimal parameter set can also be written to a file with the Write Parameters operator. The written parameter set can be read in another process using the Read Parameters operator.

result (res) Any additional results of the subprocess are delivered through the result ports. This operator can have multiple outputs. When one result port is connected, another result port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The Object delivered at the first result port of the subprocess is delivered at the first result port of the operator. Don't forget to connect all outputs in correct order. Make sure that you have connected the right number of ports.

Parameters

edit parameter settings The parameters are selected through the edit parameter settings menu. You can select the parameters and their possible values through this menu. This menu has an Operators window which lists all the operators in the subprocess of this operator. When you click on any operator in the Operators window, all parameters of that operator are listed in the Parameters window. You can select any parameter through the arrow keys of the menu. The selected parameters are listed in the Selected Parameters window. Only those parameters should be selected for which you want to find optimal values. This operator finds optimal values of the parameters in the specified range. The range of every selected parameter should be specified. When you click on any selected parameter (parameter in the Selected Parameters window) the Grid/Range and Value List options are enabled. These options allow you to specify the range of values of the selected parameters. The Min and Max fields are for specifying the lower and upper bounds of the range respectively. As all values within this range cannot be checked, the steps field allows you to specify the number of values to be checked from the specified range. Finally the scale option allows you to select the pattern of these values. You can also specify the values in the form of a list.

Related Documents

Optimize Parameters (Evolutionary) (26)

Tutorial Processes

Finding optimal values of parameters of the SVM operator

The 'Weighting' data set is loaded using the Retrieve operator. The Optimize Parameters (Grid) operator is applied on it. Have a look at the Edit Parameter Settings parameter of the Optimize Parameters (Grid) operator. You can see in the Selected Parameters window that the C and gamma parameters of the SVM operator are selected. Click on the SVM.C parameter in the Selected Parameters window; you will see that the range of the C parameter is set from 0.001 to 100000. 11 values are selected (in 10 steps) logarithmically. Now, click on the SVM.gamma parameter in the Selected Parameters window; you will see that the range of the gamma parameter is set from 0.001 to 1.5. 11 values are selected (in 10 steps) logarithmically. There are 11 possible values for each of the 2 parameters, thus there are 121 (i.e. 11 x 11) combinations. The subprocess will be executed for all combinations of these values, thus it will iterate 121 times. In every iteration, the value of the C and/or gamma parameters of the SVM (LibSVM) operator is changed. The value of the C parameter is 0.001 in the first iteration. The value is increased logarithmically until it reaches 100000 in the last iteration. Similarly, the value of the gamma parameter is 0.001 in the first iteration. The value is increased logarithmically until it reaches 1.5 in the last iteration.
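The logarithmic value selection described above can be reproduced with a short sketch (illustrative Python, not RapidMiner code): 10 steps between the bounds give 11 values, each a constant multiple of the previous one.

```python
def log_scale_values(min_value, max_value, steps):
    """Return steps + 1 values spaced evenly on a logarithmic scale,
    as the 'logarithmic' scale option of Edit Parameter Settings does."""
    ratio = (max_value / min_value) ** (1.0 / steps)
    return [min_value * ratio ** i for i in range(steps + 1)]

c_values = log_scale_values(0.001, 100000, 10)
print(len(c_values))        # 11
print(c_values[0])          # 0.001
print(round(c_values[-1]))  # 100000
```
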

Have a look at the subprocess of the Optimize Parameters (Grid) operator. First the data is split into two equal partitions using the Split Data operator. The SVM (LibSVM) operator is applied on one partition. The resultant classification model is applied using two Apply Model operators on both the partitions. The statistical performance of the SVM model on both testing and training partitions is measured using the Performance (Classification) operators. At the end the Log operator is used to store the required results.

The log parameter of the Log operator stores five things.

1. The iterations of the Optimize Parameters (Grid) operator are counted by the apply-count of the SVM operator. This is stored in a column named 'Count'.

2. The value of the classification error parameter of the Performance (Classi- fication) operator that was applied on the Training partition is stored in a column named 'Training Error'.

3. The value of the classification error parameter of the Performance (Classi- fication) operator that was applied on the Testing partition is stored in a column named 'Testing Error'.

4. The value of the C parameter of the SVM (LibSVM) operator is stored in a column named 'SVM C'.


5. The value of the gamma parameter of the SVM (LibSVM) operator is stored in a column named 'SVM gamma'.

Also note that the stored information will be written into a file as specified in the filename parameter.

At the end of the process, the Write Parameters operator is used for writing the optimal parameter set in a file. This file can be read using the Read Parameters operator to use these parameter values in another process.

Run the process and turn to the Results Workspace. You can see that the optimal parameter set has the following values: SVM.C = 100000.0 and SVM.gamma = 0.0010. Now have a look at the values saved by the Log operator to verify these values. Switch to Table View to see the stored values in tabular form. You can see that the minimum Testing Error is 0.008 (in the 11th iteration). The values of the C and gamma parameters for this iteration are the same as given in the optimal parameter set.

Optimize Parameters (Evolutionary)

This operator finds the optimal values of the selected parameters of the operators in its subprocess. It uses an evolutionary computation approach.


Description

This operator finds the optimal values for a set of parameters using an evolutionary approach which is often more appropriate than a grid search (as in the Optimize Parameters (Grid) operator) or a greedy search (as in the Optimize Parameters (Quadratic) operator) and leads to better results. This is a nested operator i.e. it has a subprocess. It executes its subprocess multiple times to find optimal values for the specified parameters.

This operator delivers the optimal parameter values through the parameter port which can also be written into a file with the Write Parameters operator. This parameter set can be read in another process using the Read Parameters operator. The performance vector for optimal values of parameters is delivered through the performance port. Any additional results of the subprocess are delivered through the result ports.

Other parameter optimization schemes are also available in RapidMiner. The Optimize Parameters (Evolutionary) operator might be useful if the best ranges and dependencies are not known at all. Another operator which works similar to this parameter optimization operator is the Loop Parameters operator. In contrast to the optimization operators, this operator simply iterates through all parameter combinations. This might be especially useful for plotting purposes.
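A minimal evolutionary parameter search of the kind described can be sketched as follows. This is a toy illustration of mutation plus elitist selection for a single numeric parameter, not the operator's actual algorithm; the fitness function and all names are invented for the example:

```python
import random

def evolve(fitness, low, high, population_size=20, generations=30, seed=42):
    """Toy evolutionary optimisation of one numeric parameter.
    fitness: lower is better (like a classification error)."""
    rng = random.Random(seed)  # a local random seed makes runs reproducible
    population = [rng.uniform(low, high) for _ in range(population_size)]
    for _ in range(generations):
        # Gaussian mutation of every individual, clipped to the allowed range.
        mutated = [min(high, max(low, x + rng.gauss(0, (high - low) * 0.1)))
                   for x in population]
        # Elitist selection: keep the best of parents + offspring.
        population = sorted(population + mutated, key=fitness)[:population_size]
    return population[0]

# Search the gamma range from the tutorial; the toy error is minimal at 0.115.
best = evolve(lambda x: (x - 0.115) ** 2, 0.001, 1.5)
print(round(best, 3))  # close to 0.115
```

Unlike a grid search, no fixed list of candidate values is needed; the population concentrates around good regions of the range, which is why this approach suits parameters whose best ranges are unknown.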

Differentiation

Optimize Parameters (Grid) The Optimize Parameters (Grid) operator executes its subprocess for all combinations of the selected values of the parameters and then delivers the optimal parameter values. See page 21 for details.


Input Ports

input (inp) This operator can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The Object supplied at the first input port of this operator is available at the first input port of the nested chain (inside the subprocess). Do not forget to connect all inputs in correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output Ports

performance (per) This port delivers the Performance Vector for the optimal values of the selected parameters. A Performance Vector is a list of performance criteria values.

parameter (par) This port delivers the optimal values of the selected parameters. This optimal parameter set can be written into a file with the Write Parameters operator. The written parameter set can be read in another process using the Read Parameters operator.

result (res) Any additional results of the subprocess are delivered through the result ports. This operator can have multiple outputs. When one result port is connected, another result port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The Object delivered at the first result port of the subprocess is delivered at the first result port of the operator. Don't forget to connect all outputs in correct order. Make sure that you have connected the right number of ports.

Parameters

edit parameter settings (menu) The parameters are selected through the edit parameter settings menu. You can select the parameters and their possible values through this menu. This menu has an Operators window which lists all the operators in the subprocess of this operator. When you click on any operator in the Operators window, all parameters of that operator are listed in the Parameters window. You can select any parameter through the arrow keys of the menu. The selected parameters are listed in the Selected Parameters window. Only those parameters should be selected for which you want to find optimal values. This operator finds optimal values of the parameters in the specified range. The range of every selected parameter should be specified. When you click on any selected parameter (parameter in the Selected Parameters window) the Grid/Range option is enabled. This option allows you to specify the range of values of the selected parameters. The Min and Max fields are for specifying the lower and upper bounds of the range respectively. The steps and scale options are disabled for this operator.

max generations (integer) This parameter specifies the number of generations after which the algorithm should be terminated.

use early stopping (boolean) This parameter enables early stopping. If not set to true, the maximum number of generations is always performed.

generations without improval (integer) This parameter is only available when the use early stopping parameter is set to true. It specifies the stop criterion for early stopping i.e. the algorithm stops after n generations without improvement in the performance, where n is specified by this parameter.

specify population size (boolean) This parameter specifies whether the population size should be set explicitly. If it is not set to true, one individual per example of the given ExampleSet is used.

population size (integer) This parameter is only available when the specify population size parameter is set to true. It specifies the population size i.e. the number of individuals per generation.

keep best (boolean) This parameter specifies if the best individual should survive. This is also called elitist selection. Retaining the best individuals of a generation unchanged in the next generation is called elitism or elitist selection.

mutation type (selection) This parameter specifies the type of the mutation operator.

selection type (selection) This parameter specifies the selection scheme of this evolutionary algorithm.

tournament fraction (real) This parameter is only available when the selection type parameter is set to 'tournament'. It specifies the fraction of the current population which should be used as tournament members.

crossover prob (real) The probability for an individual to be selected for crossover is specified by this parameter.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

show convergence plot (boolean) This parameter indicates if a dialog with a convergence plot should be drawn.
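The early-stopping rule controlled by the use early stopping and generations without improval parameters can be sketched as a simple check over the history of per-generation errors (illustrative Python, not the operator's code):

```python
def should_stop(error_history, generations_without_improval):
    """Early-stopping check: stop when the best error has not improved
    for the given number of consecutive generations."""
    if len(error_history) <= generations_without_improval:
        return False  # not enough generations observed yet
    best_so_far = min(error_history[:-generations_without_improval])
    recent = error_history[-generations_without_improval:]
    return min(recent) >= best_so_far  # no recent generation beat the old best

print(should_stop([0.30, 0.20, 0.10, 0.10, 0.10], 2))  # True: stalled for 2 generations
print(should_stop([0.30, 0.20, 0.10, 0.10, 0.05], 2))  # False: still improving
```
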

Related Documents

Optimize Parameters (Grid) (21)

Tutorial Processes

Finding optimal values of parameters of the SVM operator through the Optimize Parameters (Evolutionary) operator

The 'Weighting' data set is loaded using the Retrieve operator. The Optimize Parameters (Evolutionary) operator is applied on it. Have a look at the Edit Parameter Settings parameter of the Optimize Parameters (Evolutionary) operator. You can see in the Selected Parameters window that the C and gamma parameters of the SVM operator are selected. Click on the SVM.C parameter in the Selected Parameters window; you will see that the range of the C parameter is set from 0.001 to 100000. Now, click on the SVM.gamma parameter in the Selected Parameters window; you will see that the range of the gamma parameter is set from 0.001 to 1.5. In every iteration of the subprocess, the value of the C and/or gamma parameters of the SVM (LibSVM) operator is changed in search of optimal values.

Have a look at the subprocess of the Optimize Parameters (Evolutionary) operator. First the data is split into two equal partitions using the Split Data operator. The SVM (LibSVM) operator is applied on one partition. The resultant classification model is applied using two Apply Model operators on both the partitions. The statistical performance of the SVM model on both testing and training partitions is measured using the Performance (Classification) operators. At the end the Log operator is used to store the required results.

The log parameter of the Log operator stores five things.

1. The iterations of the Optimize Parameters (Evolutionary) operator are counted by the apply-count of the SVM operator. This is stored in a column named 'Count'.

2. The value of the classification error parameter of the Performance (Classi- fication) operator that was applied on the Training partition is stored in a column named 'Training Error'.

3. The value of the classification error parameter of the Performance (Classi- fication) operator that was applied on the Testing partition is stored in a column named 'Testing Error'.

4. The value of the C parameter of the SVM (LibSVM) operator is stored in a column named 'SVM C'.

5. The value of the gamma parameter of the SVM (LibSVM) operator is stored in a column named 'SVM gamma'.

Also note that the stored information will be written into a file as specified in the filename parameter.

At the end of the process, the Write Parameters operator is used for writing the optimal parameter set in a file. This file can be read using the Read Parameters operator to use these parameter values in another process.

Run the process and turn to the Results Workspace. You can see that the optimal parameter set has the following values: SVM.C = 56462 and SVM.gamma = 0.115 approximately. Now have a look at the values saved by the Log operator to verify these values. Switch to Table View to see the stored values in tabular form. You can see that the minimum Testing Error is 0.064 (in the 20th iteration). The values of the C and gamma parameters for this iteration are the same as given in the optimal parameter set.

Loop

This operator iterates over its subprocess for a specified number of times. The subprocess can use a macro that increments after every iteration.

Description

The subprocess of the Loop operator executes n times where n is the value of the iterations parameter. Optionally, a macro can be generated for the loop that increments after every iteration. The set iteration macro parameter should be set to true to define the iteration macro. The name and the start value of the macro can be specified by the macro name and macro start value parameters respectively. To prevent the loop from executing for a very long period of time, the limit time parameter can be used to define a timeout interval. The loop stops the execution as soon as the timeout interval has elapsed (if the process is not already finished). You need to have a basic understanding of macros in order to apply this operator. Please study the documentation of the Extract Macro operator for a basic understanding of macros. The Extract Macro operator is also used in the attached Example Process. For more information regarding subprocesses please study the Subprocess operator.
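The behaviour just described, a fixed iteration count, an incrementing iteration macro, an optional timeout, and collected results, can be sketched in plain Python (illustrative stand-in, not RapidMiner code):

```python
import time

def loop(subprocess, iterations, macro_start_value=1,
         limit_time=False, timeout_minutes=1):
    """Stand-in for the Loop operator: run the subprocess `iterations`
    times, passing an incrementing iteration macro, collect all results,
    and abort early when the optional timeout has elapsed."""
    deadline = time.monotonic() + timeout_minutes * 60
    results = []
    macro = macro_start_value
    for _ in range(iterations):
        if limit_time and time.monotonic() > deadline:
            break  # limit time: abort the remaining iterations
        results.append(subprocess(macro))
        macro += 1  # the iteration macro increments after every iteration
    return results  # a collection of objects, one per iteration

print(loop(lambda i: i * i, 4))  # [1, 4, 9, 16]
```
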

The results of the subprocess are collected and returned as a collection of objects. You can access the elements of this collection either with an operator that has a flexible number of ports like the Append Operator or with specialized operators for accessing collections. They can be found at 'Process Control / Collections' in the Operators Window.

This operator can be considered one of the most general looping operators. Most looping operators loop over a certain criterion and are optimized for that purpose. For example the Loop Examples and the Loop Attributes operators loop over examples and selected attributes respectively. The Loop operator should be used if the required functionality is not present in another, more specific looping operator. For example, for looping over all examples of an ExampleSet the Loop Examples operator should be used instead of the Loop operator. Although the Loop operator can also be used for iterating over all examples, the Loop Examples operator makes this task easier. The attached Example Process describes how the Loop operator can be used for looping over all examples.

Input Ports

input (inp) This operator can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The Object supplied at the first input port of this operator is available at the first input port of the nested chain (inside the subprocess). Do not forget to connect all inputs in correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output Ports

output (out) The Loop operator can have multiple output ports. When one output is connected, another output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The Object delivered at the first output port of the subprocess is delivered at the first output of the outer process. Don't forget to connect all outputs in correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Parameters

set iteration macro (boolean) This parameter specifies if a macro should be defined for the loop. The macro value will increment after every iteration. The name and start value of the macro can be specified by the macro name and macro start value parameters respectively.

macro name (string) This parameter is only available when the set iteration macro parameter is set to true. This parameter specifies the name of the macro.

macro start value (integer) This parameter is only available when the set iteration macro parameter is set to true. This parameter specifies the starting value of the macro. The value of the macro increments after every iteration of the loop.

iterations (integer) This parameter specifies the required number of iterations of the subprocess of the Loop operator.

limit time (boolean) This parameter gives the option of a timeout. If checked, the loop will be aborted after a specified interval of time (if it has not finished before this interval).

timeout (integer) This parameter is only available when the limit time parameter is set to true. This parameter specifies the timeout in minutes.


Tutorial Processes

Subtracting the average of an attribute from all examples

This Example Process describes how the Loop operator can be used for looping over all examples. The Loop Examples operator should be preferred for this task. This process does exactly what the Example Process of the Loop Examples operator does. The only difference is that this process uses the Loop operator instead of the Loop Examples operator. It can be easily seen that using the Loop Examples operator is much easier because it has been designed for looping over examples.

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. The Extract Macro operator is applied on it. The Extract Macro operator generates a macro named 'avg temp' which stores the average value of the Temperature attribute of the 'Golf' data set. The Extract Macro operator is applied again to store the number of examples in the 'Golf' data set in a macro named 'count'. The Loop operator is applied next. The Loop operator iterates over all the examples of the 'Golf' data set and sets the value of the Temperature attribute in each example to the difference between the existing value of the Temperature attribute and the average value (stored in the 'avg temp' macro). The set iteration macro parameter is set to true for defining a macro that can be accessed in the subprocess and that increments after every iteration. The macro name parameter of the Loop operator is set to 'example' and the macro start value parameter of the Loop operator is set to 1. The iterations parameter is set to '%{count}', thus the subprocess executes n times where n is the value of the 'count' macro which is equal to the number of examples in the ExampleSet.

Have a look at the subprocess of the Loop operator. The Filter Example Range operator is applied first to filter the required example from the ExampleSet. The first example and last example parameters are set to '%{example}' to filter one example of the ExampleSet. The resultant ExampleSet is composed of just a single example i.e. the example of the current iteration. The Extract Macro operator is applied next to store the value of the Temperature attribute of the example of the current iteration in a macro named 'temp'. Then the Generate Macro operator is applied to generate a new macro from the 'temp' macro. The name of the new macro is 'new temp' and it stores the difference of the current temperature value (stored in the 'temp' macro) and the average temperature value (stored in the 'avg temp' macro). Finally the Set Data operator is applied. The value parameter is set to '%{new temp}' to store the value of the 'new temp' macro in the current example.

This subprocess is executed once for each example and in every iteration it re- places the value of the Temperature attribute of the example of that iteration with the value of the 'new temp' macro. As a result of this operator a collection of ExampleSets is generated. Each ExampleSet is composed of a single example. The Append operator is used in the main process to unite all these ExampleSets into a single ExampleSet. The resultant ExampleSet can be seen in the Results Workspace. You can see that all values of the Temperature attribute have been replaced with new values.
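The whole pipeline, extract the average, loop over single-example sets, rewrite the value, and append the pieces, amounts to mean-centering and can be sketched in plain Python (illustrative stand-in with invented names; the comments map each step to the operators above):

```python
def subtract_average(examples, attribute):
    """Sketch of what the tutorial's Loop achieves: replace each value of
    the attribute with (value - average)."""
    # Extract Macro: 'avg temp' holds the attribute's average value.
    avg = sum(e[attribute] for e in examples) / len(examples)
    pieces = []
    for i in range(len(examples)):        # iteration macro 'example' = i + 1
        row = dict(examples[i])           # Filter Example Range: one example
        row[attribute] = row[attribute] - avg  # Generate Macro 'new temp' + Set Data
        pieces.append([row])              # a single-example ExampleSet
    return [r for piece in pieces for r in piece]  # Append: unite the pieces

golf = [{"Temperature": 85}, {"Temperature": 80}, {"Temperature": 72}]
print(subtract_average(golf, "Temperature"))
# [{'Temperature': 6.0}, {'Temperature': 1.0}, {'Temperature': -7.0}]
```
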

Loop Attributes

This operator selects a subset (one or more attributes) of the input ExampleSet and iterates over its subprocess for all the selected attributes. The subprocess can access the attribute of the current iteration by a macro.

Description

The Loop Attributes operator has a number of parameters that allow you to select the required attributes of the input ExampleSet. Once the attributes are selected, the Loop Attributes operator applies its subprocess for each attribute, i.e. the subprocess executes n times where n is the number of selected attributes. In all iterations the attribute of the current iteration can be accessed using the macro specified in the iteration macro parameter. You need to have a basic understanding of macros in order to apply this operator. Please study the documentation of the Extract Macro operator for a basic understanding of macros. The Extract Macro operator is also used in the attached Example Process. For more information regarding subprocesses please study the Subprocess operator.

Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports example set (exa) The resultant ExampleSet is delivered through this port.


Parameters attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting attributes. It has the following options:

• all This option simply selects all the attributes of the ExampleSet; no attributes are removed. This is the default option.

• single This option allows the selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

• subset This option allows the selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter (attributes) becomes visible in the Parameter View.

• regular expression This option allows you to specify a regular expression for the attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

• value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The user should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

• block type This option is similar in working to the value type option. This option allows the selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

• no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

• numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes that will make it to the output port; all other attributes will be removed.

regular expression (string) The attributes whose names match this expression will be selected. A regular expression is a very powerful tool but needs a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, numeric, integer, real, text, binominal, polynominal, file path, date time, date, time.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, numeric, integer, real, text, binominal, polynominal, file path, date time, date, time.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: single value, value series, value series start, value series end, value matrix, value matrix start, value matrix end, value matrix row start.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type. One of the following block types can be selected here: single value, value series, value series start, value series end, value matrix, value matrix start, value matrix end, value matrix row start.

numeric condition (string) The numeric condition for testing examples of numeric attributes is mentioned here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But '&&' and '||' cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both '&&' and '||'. Use a blank space after the operator, e.g. '>5' will not work; use '> 5' instead.

include special attributes (boolean) Special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are delivered to the output port irrespective of the specified conditions. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes are selected that satisfy the conditions.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are removed and the previously removed attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is removed before this parameter is set, then after setting this parameter 'att1' will be removed and 'att2' will be selected.

iteration macro (string) This parameter specifies the name of the macro which holds the name of the current attribute in each iteration.
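The numeric condition syntax described above can be sketched as a small checker. This is a hedged sketch of the assumed semantics only (the function name is hypothetical, and RapidMiner's actual parser may differ): terms joined by '&&' must all hold, terms joined by '||' need any to hold, and mixing the two is rejected.

```python
def check_numeric_condition(value, condition):
    """Evaluate a single numeric condition string against a value."""
    if "&&" in condition and "||" in condition:
        raise ValueError("'&&' and '||' cannot be mixed in one condition")
    op_fns = {">": lambda v, n: v > n, "<": lambda v, n: v < n,
              ">=": lambda v, n: v >= n, "<=": lambda v, n: v <= n,
              "=": lambda v, n: v == n}
    def single(term):
        op, num = term.split()  # the blank space after the operator is required
        return op_fns[op](value, float(num))
    if "&&" in condition:
        return all(single(t.strip()) for t in condition.split("&&"))
    if "||" in condition:
        return any(single(t.strip()) for t in condition.split("||"))
    return single(condition.strip())

print(check_numeric_condition(8, "> 6 && < 11"))   # True
print(check_numeric_condition(12, "> 6 && < 11"))  # False
```

Note how the mandatory blank space after the operator makes each term trivially splittable into an operator and a number.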

Tutorial Processes

Generating new attributes in the Loop Attributes operator

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet before application of the Loop Attributes operator. Have a look at the parameters of the Loop Attributes operator. The attribute filter type parameter is set to 'value type', the value type parameter is set to 'numeric' and the include special attributes parameter is set to true. Thus all numeric attributes are selected from the 'Golf' data set, i.e. the Temperature and Humidity attributes are selected. Therefore the subprocess of the Loop Attributes operator will iterate twice. In each iteration the current attribute can be accessed by the 'loop attribute' macro defined by the iteration macro parameter. Now have a look at the subprocess of the Loop Attributes operator. The Extract Macro operator is applied first. The parameters of the Extract Macro operator are adjusted such that the 'avg' macro holds the average or mean of the attribute of the current iteration. Please note how the 'loop attribute' macro is used in the parameters of the Extract Macro operator. Next, the Generate Attributes operator is applied. It generates a new attribute from the attribute of the current iteration. The new attribute holds the deviation of examples from the mean of that attribute. The mean was stored in the 'avg' macro. Please note carefully the use of macros in the function descriptions parameter of the Generate Attributes operator.

Thus the subprocess of the Loop Attributes operator executes twice, once for each selected attribute. In the first iteration a new attribute is created with the name 'Deviation(Temperature)' which holds the deviations of the Temperature values from the mean of the Temperature attribute. In the second iteration a new attribute is created with the name 'Deviation(Humidity)' which holds the deviations of the Humidity values from the mean of the Humidity attribute.
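The two iterations above can be sketched in plain Python. This is a hedged illustration with hypothetical names; deviation is assumed to mean value minus mean.

```python
def loop_attributes_deviation(examples, selected_attributes):
    """One iteration per selected attribute ('loop attribute' macro)."""
    for att in selected_attributes:
        avg = sum(e[att] for e in examples) / len(examples)  # the 'avg' macro
        for e in examples:
            e[f"Deviation({att})"] = e[att] - avg  # Generate Attributes step
    return examples

golf = [{"Temperature": 85, "Humidity": 85},
        {"Temperature": 75, "Humidity": 65}]
loop_attributes_deviation(golf, ["Temperature", "Humidity"])
print(golf[0]["Deviation(Temperature)"], golf[0]["Deviation(Humidity)"])
```

For the two sample rows the means are 80 and 75, so the first row's deviations are 5.0 and 10.0.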

Loop Values

This operator iterates over its subprocess for all the possible values of the selected attribute. The subprocess can access the attribute value of the current iteration by a macro.

Description

The Loop Values operator has a parameter named attribute that allows you to select the required attribute of the input ExampleSet. Once the attribute is selected, the Loop Values operator applies its subprocess for each possible value of the selected attribute, i.e. the subprocess executes n times where n is the number of possible values of the selected attribute. In all iterations the attribute value of the current iteration can be accessed using the macro specified in the iteration macro parameter. You need to have a basic understanding of macros in order to apply this operator. Please study the documentation of the Extract Macro operator for a basic understanding of macros. The Extract Macro operator is also used in the attached Example Process. For more information regarding subprocesses please study the Subprocess operator.

It is important to note that the subprocess of the Loop Values operator executes for all possible values of the selected attribute. Suppose the selected attribute has three possible values and the ExampleSet has 100 examples. The Loop Values operator will iterate only three times (not 100 times); once for each possible value of the selected attribute. This operator is usually applied on nominal attributes.
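The iteration count described above can be made concrete with a minimal sketch (attribute and value names are illustrative, not from any real data set):

```python
# 100 examples, but only 3 distinct values of the selected attribute.
examples = [{"att1": ["v1", "v2", "v3"][i % 3]} for i in range(100)]
distinct_values = sorted({e["att1"] for e in examples})
# Loop Values iterates once per distinct value, not once per example.
print(len(examples), len(distinct_values))  # 100 3
```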

Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports output (out) The Loop Values operator can have multiple output ports. When one output is connected, another output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first output port of the subprocess is delivered at the first output of the outer process. Don't forget to connect all outputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.


Parameters attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known. iteration macro (string) This parameter specifies the name of the macro which holds the current value of the selected attribute in each iteration.

Tutorial Processes

The use of the Loop Values operator in complex preprocessing

This Example Process is also part of the RapidMiner tutorial. It is included here to show the usage of the Loop Values operator in complex preprocessing. This process will cover a number of concepts of macros including redefining macros, the macro of the Loop Values operator and the use of the Extract Macro opera- tor. This process starts with a subprocess which is used to generate data. What is happening inside this subprocess is not relevant to the use of macros, so it is not discussed here. A breakpoint is inserted after this subprocess so that you can view the ExampleSet. You can see that the ExampleSet has 12 examples and 2 attributes: 'att1' and 'att2'. 'att1' is nominal and has 3 possible values: 'range1', 'range2' and 'range3'. 'att2' has real values.

The Loop Values operator is applied on the ExampleSet. The attribute parameter is set to 'att1', therefore the Loop Values operator iterates over the values of the specified attribute (i.e. att1) and applies the inner operators on the given ExampleSet. The current value can be accessed via the macro defined by the iteration macro parameter, which is set to 'loop value'; thus the current value can be accessed by specifying %{loop value} in parameter values. As att1 has 3 possible values, Loop Values will iterate 3 times, once for each possible value of att1.

Here is an explanation of what happens inside the Loop Values operator. It is provided with an ExampleSet as input. The Filter Examples operator is applied on it. The condition class parameter is set to 'attribute value filter' and the parameter string is set to 'att1 = %{loop value}'. Note the use of the loop value macro here. Only those examples are selected where the value of att1 is equal to the value of the loop value macro. A breakpoint is inserted here so that you can view the selected examples. Then the Aggregate operator is applied on the selected examples. It is configured to take the average of the att2 values of the selected examples. This average value is stored in a new ExampleSet in the attribute named 'average(att2)'. A breakpoint is inserted here so that you can see the average of the att2 values of the selected examples. The Extract Macro operator is applied on this new ExampleSet to store this average value in a macro named 'current average'. The originally selected examples are passed to the Generate Attributes operator that generates a new attribute named 'att2 abs avg' which is defined by the expression 'abs(att2 - %{current average})'. Note the use of the current average macro here. Its value is subtracted from all values of att2 and the absolute difference is stored in a new attribute named 'att2 abs avg'. The resultant ExampleSet is delivered at the output of the Loop Values operator. A breakpoint is inserted here so that you can see the ExampleSet with the 'att2 abs avg' attribute. This output is fed to the Append operator in the main process. It merges the results of all the iterations into a single ExampleSet which is visible at the end of this process in the Results Workspace.
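The pipeline just described (filter per value, aggregate, extract the macro, generate the attribute, append) can be sketched in plain Python. This is a hedged illustration with hypothetical names and data, not RapidMiner's implementation.

```python
def loop_values_abs_deviation(examples, loop_att, num_att):
    """One iteration per distinct value of loop_att ('loop value' macro)."""
    merged = []
    for value in sorted({e[loop_att] for e in examples}):
        selected = [e for e in examples if e[loop_att] == value]   # Filter Examples
        current_average = sum(e[num_att] for e in selected) / len(selected)
        for e in selected:                                         # Generate Attributes
            out = dict(e)
            out[f"{num_att} abs avg"] = abs(e[num_att] - current_average)
            merged.append(out)
    return merged  # the Append operator merges the per-iteration results

data = [{"att1": "range1", "att2": 1.0}, {"att1": "range1", "att2": 3.0},
        {"att1": "range2", "att2": 5.0}]
print(loop_values_abs_deviation(data, "att1", "att2"))
```

For 'range1' the group average is 2.0, so both members get an absolute deviation of 1.0; the lone 'range2' example gets 0.0.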

Here is what you see when you run the process.

• ExampleSet generated by the first Subprocess operator. Then the process enters the Loop Values operator and iterates 3 times.

Iteration 1:

• ExampleSet where the 'att1' value is equal to the current value of the loop value macro, i.e. 'range1'

• Average of 'att2' values for the selected examples. The average is -1.161.


• ExampleSet with the 'att2 abs avg' attribute for iteration 1.

Iteration 2:

• ExampleSet where the 'att1' value is equal to the current value of the loop value macro, i.e. 'range2'

• Average of 'att2' values for the selected examples. The average is -1.656.

• ExampleSet with the 'att2 abs avg' attribute for iteration 2.

Iteration 3:

• ExampleSet where the 'att1' value is equal to the current value of the loop value macro, i.e. 'range3'

• Average of 'att2' values for the selected examples. The average is 1.340.

• ExampleSet with the 'att2 abs avg' attribute for iteration 3.

Now the process comes out of the Loop Values operator and the Append operator merges the final ExampleSets of all three iterations into a single ExampleSet that you can see in the Results Workspace.

Loop Examples

This operator iterates over its subprocess for all the examples of the given ExampleSet. The subprocess can access the index of the example of the current iteration by a macro.


Description

The subprocess of the Loop Examples operator executes n times where n is the total number of examples in the given ExampleSet. In all iterations, the index of the example of the current iteration can be accessed using the macro specified in the iteration macro parameter. You need to have a basic understanding of macros in order to apply this operator. Please study the documentation of the Extract Macro operator for a basic understanding of macros. The Extract Macro operator is also used in the attached Example Process. For more information regarding subprocesses please study the Subprocess operator.

One important thing to note about this operator is the behavior of the example set output port of its subprocess. The subprocess is given the ExampleSet provided at the outer example set input port in the first iteration. If the example set output port of the subprocess is connected, the ExampleSet delivered at this port in one iteration will be used as input for the following iteration. If it is not connected, the original ExampleSet will be delivered in all iterations.

It is important to note that the subprocess of the Loop Examples operator is executed for all examples of the given ExampleSet. If you want to iterate for possible values of a particular attribute please use the Loop Values operator. The subprocess of the Loop Values operator is executed for all possible values of the selected attribute. Suppose the selected attribute has three possible values and the ExampleSet has 100 examples. The Loop Values operator will iterate only three times (not 100 times); once for each possible value of the selected attribute. The Loop Examples operator, on the other hand, will iterate 100 times on this ExampleSet.
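The port behavior described above can be sketched as follows. This is a hedged sketch under stated assumptions: all names are hypothetical, and the subprocess is modeled as a function receiving the current ExampleSet and the iteration index.

```python
def loop_examples(example_set, subprocess, output_connected):
    """Run subprocess once per example; chain results only if connected."""
    current = example_set
    for i in range(1, len(example_set) + 1):  # the iteration macro holds i
        delivered = subprocess(current, i)
        if output_connected:
            current = delivered               # feeds the next iteration
    return current if output_connected else example_set

def add_one_to_example(es, i):
    """Toy subprocess: increment the i-th example's value on a copy."""
    es = [dict(e) for e in es]
    es[i - 1]["x"] += 1
    return es

data = [{"x": 0}, {"x": 0}, {"x": 0}]
print(loop_examples(data, add_one_to_example, True))   # every change is kept
print(loop_examples(data, add_one_to_example, False))  # original set returned
```

With the port connected, each iteration sees the previous iteration's result, so all three increments survive; disconnected, every iteration starts from the original set and the original is returned.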

Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Extract Macro operator in the attached Example Process. The output of other operators can also be used as input.


Output Ports example set (exa) The ExampleSet that is connected at the example set output port of the inner subprocess is delivered through this port. If no ExampleSet is connected then the original ExampleSet is delivered. output (out) The Loop Examples operator can have multiple output ports. When one output is connected, another output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The Object delivered at the first output port of the subprocess is delivered at the first output of the outer process. Don't forget to connect all outputs in correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Parameters iteration macro (string) This parameter specifies the name of the macro which holds the index of the example of the current iteration in each iteration.

Tutorial Processes

Subtracting the average of an attribute from all examples

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. The Extract Macro operator is applied on it. The Extract Macro operator generates a macro named 'avg temp' which stores the average value of the Temperature attribute of the 'Golf' data set. The Loop Examples operator is applied next. The Loop Examples operator iterates over all the examples of the 'Golf' data set and sets the value of the Temperature attribute in each example to the difference of the existing value of the Temperature attribute and the average value (stored in the 'avg temp' macro). The iteration macro parameter of the Loop Examples operator is set to 'example'. Thus in each iteration of the subprocess the index of the current example can be accessed by the 'example' macro.

Have a look at the subprocess of the Loop Examples operator. The Extract Macro operator is applied first to store the value of the Temperature attribute of the example of the current iteration in a macro named 'temp'. Then the Generate Macro operator is applied to generate a new macro from the 'temp' macro. The name of the new macro is 'new temp' and it stores the difference of the current temperature value (stored in 'temp' macro) and the average temperature value (stored in 'avg temp' macro). Finally the Set Data operator is applied. The example index parameter is set to '%{example}' to access the example of the current iteration. The value parameter is set to '%{new temp}' to store the value of the 'new temp' macro in the current example.

This subprocess is executed once for each example and in every iteration it replaces the value of the Temperature attribute of the example of that iteration with the value of the 'new temp' macro. The resultant ExampleSet can be seen in the Results Workspace. You can see that all values of the Temperature attribute have been replaced with new values.


Loop Clusters

This operator iterates over its subprocess for each cluster in the input ExampleSet. In each iteration the subprocess receives the examples belonging to the cluster of that iteration.

Description

The Loop Clusters operator is a nested operator, i.e. it has a subprocess. The subprocess of the Loop Clusters operator executes n times, where n is the number of clusters in the given ExampleSet. The given ExampleSet must have a cluster attribute. Numerous clustering operators are available in RapidMiner that generate a cluster attribute, e.g. the K-Means operator. The subprocess executes on the examples of one cluster in one iteration, on the examples of the next cluster in the next iteration, and so on. Please study the attached Example Process for a better understanding.
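The per-cluster iteration can be sketched in plain Python (a hedged illustration; the function, cluster labels and the toy subprocess are hypothetical):

```python
def loop_clusters(examples, subprocess):
    """Run subprocess once per cluster, on that cluster's examples only."""
    results = []
    for cluster in sorted({e["cluster"] for e in examples}):
        members = [e for e in examples if e["cluster"] == cluster]
        results.append(subprocess(members))  # one iteration per cluster
    return results

data = [{"cluster": "cluster_0"}, {"cluster": "cluster_1"}, {"cluster": "cluster_0"}]
print(loop_clusters(data, len))  # the toy subprocess just counts examples
```

With two distinct cluster values the subprocess runs exactly twice, regardless of how many examples each cluster contains.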

Input Ports example set (exa) This input port expects an ExampleSet. It is compulsory that the ExampleSet has an attribute with the cluster role. It is the output of the K-Means operator in the attached Example Process. in (in) This operator can have multiple in input ports. When one input is connected, another in input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object delivered at the first in port of the operator is available at the first in port of the subprocess. Don't forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.


Output Ports out (out) This operator can have multiple out output ports. When one output is connected, another out output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first out port of the subprocess is delivered at the first out output port of the outer process. Don't forget to connect all outputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Tutorial Processes

Introduction to the Loop Clusters operator

The 'Ripley-Set' data set is loaded using the Retrieve operator. The K-Means operator is applied on it for generating a cluster attribute. A breakpoint is inserted here so that you can have a look at the clustered ExampleSet. You can see that there is an attribute with the cluster role. It has two possible values, i.e. cluster 0 and cluster 1. This means that there are two clusters in the ExampleSet. The Loop Clusters operator is applied next. The subprocess of the Loop Clusters operator executes twice; once for each cluster. Have a look at the subprocess of the Loop Clusters operator. The Log operator is applied in the subprocess. A breakpoint is inserted before the Log operator so that you can see the examples of each iteration. In the first iteration you will see that all examples belong to cluster 1 while in the second iteration all examples belong to cluster 0.


Loop Data Sets

This operator iterates over its subprocess for every ExampleSet given at its input ports.

Description

The subprocess of the Loop Data Sets operator executes n times where n is the number of ExampleSets provided as input to this operator. You must have a basic understanding of subprocesses in order to understand this operator. For more information regarding subprocesses please study the Subprocess operator. For each input ExampleSet the Loop Data Sets operator executes the inner operators of the subprocess like an operator chain. This operator can be used to conduct a process consecutively on a number of different data sets. If the only best parameter is set to true then only the results generated during the iteration with the best performance are delivered as output. For this option it is compulsory to attach a performance vector to the performance port in the subprocess of this operator. The Loop Data Sets operator uses this performance vector to select the iteration with the best performance.
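The only best behavior can be sketched as follows. This is a hedged illustration: the subprocess is modeled as a function returning a (result, performance) pair, and the names and accuracy figures are hypothetical.

```python
def loop_data_sets_only_best(example_sets, subprocess):
    """Run the subprocess on every input; keep the best iteration's result."""
    best_result, best_performance = None, float("-inf")
    for es in example_sets:
        result, performance = subprocess(es)  # performance vector is required
        if performance > best_performance:
            best_result, best_performance = result, performance
    return best_result

# Pretend each data set yields (data set name, accuracy) from the subprocess.
runs = {"Golf": 0.25, "Golf-Testset": 0.50, "Iris": 0.9333}
best = loop_data_sets_only_best(runs.keys(), lambda name: (name, runs[name]))
print(best)  # Iris
```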

Input Ports example set (exa) This operator can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another ExampleSet (if any). The order of inputs remains the same. The ExampleSet supplied at the first input port of this operator is available at the first input port of the nested chain (inside the subprocess). Do not forget to connect all inputs in correct order. Make sure that you have connected the right number of ports at the subprocess level.


Output Ports output (out) The Loop Data Sets operator can have multiple output ports. When one output is connected, another output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The Object delivered at the first output port of the subprocess is delivered at the first output of the outer process. Don't forget to connect all outputs in correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Parameters only best (boolean) If the only best parameter is set to true then only the results generated during the iteration with best performance are delivered as output. For this option it is compulsory to attach a performance vector to the performance port in the subprocess of this operator. The Loop Data Sets operator uses this performance vector to select the iteration with best performance.

Tutorial Processes

Selecting the ExampleSet with best performance

This Example Process explains the usage of the only best parameter of the Loop Data Sets operator. The 'Golf', 'Golf-Testset' and 'Iris' data sets are loaded using Retrieve operators. All these ExampleSets are provided as input to the Loop Data Sets operator. Have a look at the subprocess of the Loop Data Sets operator. The Split Validation operator is used for training and testing a K-NN model on the given ExampleSet. The Split Validation operator returns the performance vector of the model. This performance vector is used by the Loop Data Sets operator for finding the iteration with the best performance. The results of the iteration with the best performance are delivered because the only best parameter is set to true.

When this process is executed, the 'Iris' data set is delivered as the result. This is because the iteration with the 'Iris' data set had the best performance vector. If you insert a breakpoint after the Split Validation operator and run the process again, you can see that the 'Golf', 'Golf-Testset' and 'Iris' data sets have 25%, 50% and 93.33% accuracy respectively. As the iteration with the 'Iris' data set had the best performance, its results are returned by this operator (remember that the only best parameter is set to true). This operator can also return other objects, such as models.

Loop and Average

This operator iterates over its subprocess the specified number of times and delivers the average of the inner results.


Description

The Loop and Average operator is a nested operator, i.e. it has a subprocess. The subprocess of the Loop and Average operator executes n times, where n is the value of the iterations parameter specified by the user. The subprocess of this operator must always return a performance vector. These performance vectors are averaged and returned as the result of this operator. For more information regarding subprocesses please study the Subprocess operator.

Differentiation

Loop and Deliver Best This operator iterates over its subprocess the specified number of times and delivers the results of the iteration that has the best performance. See page ?? for details.

Input Ports in (in) This operator can have multiple inputs. When one input is connected, another in port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first in port of this operator is available at the first in port of the nested chain (inside the subprocess). Do not forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output Ports averagable (ave) This operator can have multiple averagable output ports. When one output is connected, another averagable output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The Average Vector delivered at the first averagable port of the subprocess is delivered at the first averagable output port of the outer process. Don't forget to connect all outputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Parameters iterations (integer) This parameter specifies the number of iterations of the subprocess of this operator. average performances only (boolean) This parameter indicates if only performance vectors or all types of averagable result vectors should be averaged.

Related Documents

Loop and Deliver Best (??)

Tutorial Processes

Taking average of performance vectors

The 'Golf' data set is loaded using the Retrieve operator. The Loop and Average operator is applied on it. The iterations parameter is set to 3; thus the subprocess of the Loop and Average operator will be executed three times. A performance vector will be generated in every iteration and the average of these performance vectors will be delivered as the result of this operator.

Have a look at the subprocess of the Loop And Average operator. The Split Validation operator is used for training and testing a Naive Bayes model. A breakpoint is inserted after the Split Validation operator so that the performance vector can be seen in each iteration. Run the process. You will see the performance vector of the first iteration. It has 25% accuracy. Continue the process; you will see the performance vectors of the second and third iterations (with 75% and 100% accuracy respectively). The Loop And Average operator takes the average of these three results and delivers it through its output port. The average of these three results is 66.67% (i.e. (25% + 75% + 100%) / 3). The resultant average vector can be seen in the Results Workspace.
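The final averaging step is plain arithmetic; the tutorial's numbers can be reproduced in a two-line sketch:

```python
# Accuracies delivered by the three iterations of the subprocess.
accuracies = [0.25, 0.75, 1.00]

# Loop And Average delivers the arithmetic mean of the collected vectors.
average = sum(accuracies) / len(accuracies)
print(round(average * 100, 2))  # → 66.67
```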

Loop Parameters

This operator iterates over its subprocess for all of the defined parameter combinations. The parameter combinations can be set via the wizard provided in the parameters.

Description

The Loop Parameters operator has a subprocess in it. It executes the subprocess for all combinations of selected values of the parameters. This can be very useful for plotting or logging purposes and sometimes for simply configuring the parameters for the inner operators as a sort of meta step (e.g. learning curve generation). Any results of the subprocess are delivered through the result ports.


The entire configuration of this operator is done through the edit parameter settings parameter. A complete description of this parameter is given in the Parameters section.

Please note that this operator has two modes, synchronized and non-synchronized, depending on the setting of the synchronize parameter. In the non-synchronized mode, all parameter combinations are generated and the subprocess is executed for each combination. In the synchronized mode, no combinations are generated; instead the parameter values are combined index-wise, i.e. the i-th iteration uses the i-th value of every selected parameter. For the iteration over a single parameter there is no difference between the two modes. Please note that the number of parameter possibilities must be the same for all parameters in the synchronized mode.

If the synchronize parameter is not set to true, selecting a large number of parameters and/or a large number of steps (or possible values of parameters) results in a huge number of combinations. For example, if you select 3 parameters with 25 values each, then the total number of combinations is 15625 (i.e. 25 x 25 x 25). The subprocess is executed for all possible combinations. Running a subprocess for such a huge number of iterations will take a lot of time. So always carefully limit the parameters and their steps.
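The two modes can be sketched in a few lines (the parameter names and value lists below are purely illustrative, not taken from any real process):

```python
from itertools import product

# Illustrative value lists for three hypothetical selected parameters.
params = {
    "SVM.C":       [0.001, 0.1, 10.0],
    "SVM.gamma":   [0.01, 0.1, 1.0],
    "SVM.epsilon": [0.1, 0.5, 1.0],
}

# Non-synchronized mode: the full cross product of all value lists.
grid = list(product(*params.values()))
print(len(grid))    # 3 x 3 x 3 = 27 iterations

# Synchronized mode: values are combined index-wise, which is why all
# value lists must have the same length.
synced = list(zip(*params.values()))
print(len(synced))  # 3 iterations
```

This also makes the combinatorial explosion visible: the grid grows multiplicatively with every added parameter, while the synchronized mode grows only with the list length.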

Differentiation

Optimize Parameters (Grid) The Optimize Parameters (Grid) operator executes the subprocess for all combinations of selected values of the parameters and then delivers the optimal parameter values. The Loop Parameters operator, in contrast to the optimization operators, simply iterates through all parameter combinations. This might be especially useful for plotting purposes. See page 21 for details.


Input Ports input (inp) This operator can have multiple inputs. When one input is connected, another input port becomes available, ready to accept another input (if any). The order of inputs remains the same. The Object supplied at the first input port of this operator is available at the first input port of the nested chain (inside the subprocess). Do not forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output Ports result (res) Any results of the subprocess are delivered through the result ports. This operator can have multiple outputs. When one result port is connected, another result port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The Object delivered at the first result port of the subprocess is delivered at the first result port of the operator. Don't forget to connect all outputs in correct order. Make sure that you have connected the right number of ports.

Parameters

edit parameter settings (menu) The parameters are selected through the edit parameter settings menu. You can select the parameters and their possible values through this menu. This menu has an Operators window which lists all the operators in the subprocess of this operator. When you click on any operator in the Operators window, all parameters of that operator are listed in the Parameters window. You can select any parameter through the arrow keys of the menu. The selected parameters are listed in the Selected Parameters window. Only those parameters should be selected over which you want to iterate the subprocess. This operator iterates through parameter values in the specified range. The range of every selected parameter should be specified. When you click on any selected parameter (i.e. a parameter in the Selected Parameters window) the Grid/Range and Value List options are enabled. These options allow you to specify the range of values of the selected parameters. The Min and Max fields are for specifying the lower and upper bounds of the range respectively. As all values within this range cannot be checked, the steps field allows you to specify the number of values to be checked from the specified range. Finally the scale option allows you to select the pattern of these values. You can also specify the values in the form of a list.

synchronize (boolean) This operator has two modes, synchronized and non-synchronized, depending on the setting of the synchronize parameter. If the synchronize parameter is set to false, all parameter combinations are generated and the inner operators are applied for each combination. If the synchronize parameter is set to true, no combinations are generated; instead the parameter values are combined index-wise, i.e. the i-th iteration uses the i-th value of every selected parameter. For the iteration over a single parameter there is no difference between the two modes. Please note that the number of parameter possibilities must be the same for all parameters in the synchronized mode.

Related Documents

Optimize Parameters (Grid) (21)

Tutorial Processes

Iterating through the parameters of the SVM operator

The 'Weighting' data set is loaded using the Retrieve operator. The Loop Parameters operator is applied on it. Have a look at the Edit Parameter Settings parameter of the Loop Parameters operator. You can see in the Selected Parameters window that the C and gamma parameters of the SVM operator are selected.


Click on the SVM.C parameter in the Selected Parameters window; you will see that the range of the C parameter is set from 0.001 to 100000. 11 values are selected (in 10 steps) logarithmically. Now click on the SVM.gamma parameter in the Selected Parameters window; you will see that the range of the gamma parameter is set from 0.001 to 1.5. 11 values are selected (in 10 steps) logarithmically. There are 11 possible values for each of the 2 parameters, thus there are 121 (i.e. 11 x 11) combinations. The subprocess will be executed for all combinations of these values because the synchronize parameter is set to false; thus it will iterate 121 times. In every iteration, the value of the C and/or gamma parameter of the SVM (LibSVM) operator is changed. The value of the C parameter is 0.001 in the first iteration. The value is increased logarithmically until it reaches 100000 in the last iteration. Similarly, the value of the gamma parameter is 0.001 in the first iteration. The value is increased logarithmically until it reaches 1.5 in the last iteration.
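The logarithmic grid described above can be reproduced with a short sketch (this mirrors the idea of a log scale with 10 steps; it is not RapidMiner's internal implementation):

```python
def log_steps(lo, hi, steps):
    """Return steps + 1 logarithmically spaced values from lo to hi."""
    ratio = (hi / lo) ** (1.0 / steps)
    return [lo * ratio ** i for i in range(steps + 1)]

# 10 steps from 0.001 to 100000, as for SVM.C in the tutorial.
values = log_steps(0.001, 100000, 10)
print(len(values))           # 11 values
print(round(values[0], 3))   # 0.001 in the first iteration
print(round(values[-1]))     # 100000 in the last iteration
```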

Have a look at the subprocess of the Loop Parameters operator. First the data is split into two equal partitions using the Split Data operator. The SVM (Lib- SVM) operator is applied on one partition. The resultant classification model is applied using two Apply Model operators on both the partitions. The statis- tical performance of the SVM model on both testing and training partitions is measured using the Performance (Classification) operators. At the end the Log operator is used to store the required results.

The log parameter of the Log operator stores five things.

1. The iterations of the Loop Parameters operator are counted by apply-count of the SVM operator. This is stored in a column named 'Count'.

2. The value of the classification error parameter of the Performance (Classi- fication) operator that was applied on the Training partition is stored in a column named 'Training Error'.

3. The value of the classification error parameter of the Performance (Classi- fication) operator that was applied on the Testing partition is stored in a column named 'Testing Error'.

4. The value of the C parameter of the SVM (LibSVM) operator is stored in a column named 'SVM C'.

5. The value of the gamma parameter of the SVM (LibSVM) operator is stored in a column named 'SVM gamma'.

Also note that the stored information will be written into a file as specified in the filename parameter.

Run the process and turn to the Results Workspace. Now have a look at the values saved by the Log operator. Switch to Table View to see the stored values in tabular form.

Loop Files

This operator iterates over its subprocess for all the files in the specified directory and delivers the file names, paths and parent paths as macros to the inner operators.

Description

The Loop Files operator is a nested operator, i.e. it has a subprocess. The subprocess of the Loop Files operator executes for all the files in the specified directory. The directory is specified through the directory parameter. The file name, path and the path of the file's parent are available to the operators in the subprocess through the corresponding macros. If you want the subprocess to iterate over some specific files only, say files with a particular extension, you can use the filter parameter for specifying the desired condition in the form of a regular expression. This operator can iterate through the subdirectories of the specified directory if the iterate over subdirs parameter is set to true. This operator is even capable of iterating recursively through all the subdirectories (and subdirectories of subdirectories and so on) if the recursive parameter is set to true. You need a basic understanding of macros in order to apply this operator. Please study the documentation of the Extract Macro operator for a basic understanding of macros. For more information regarding subprocesses please study the Subprocess operator.
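Conceptually, the operator behaves like the following sketch (the function name and its exact matching semantics are illustrative assumptions, not RapidMiner internals):

```python
import os
import re

def loop_files(directory, pattern=".*", recursive=False):
    """Yield (file name, file path, parent path) for every matching file,
    mimicking the three macros that Loop Files sets for its subprocess."""
    regex = re.compile(pattern)
    for parent, dirs, files in os.walk(directory):
        for name in files:
            if regex.fullmatch(name):
                yield name, os.path.join(parent, name), parent
        if not recursive:
            break  # without 'recursive', visit only the top-level directory

# Example: iterate over all '.rmp' process files under the current directory.
for name, path, parent in loop_files(".", r".*\.rmp", recursive=True):
    print(name, parent)
```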

Input Ports input (inp) This operator can have multiple inputs. When one input is connected, another input port becomes available, ready to accept another input (if any). The order of the inputs remains the same. The Object supplied at the first input port of this operator is available at the first input port of the nested chain (inside the subprocess). Do not forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output Ports output (out) The Loop Files operator can have multiple output ports. When one output is connected, another output port becomes available which is ready to deliver another output (if any). The order of the outputs remains the same. The Object delivered at the first output port of the subprocess is delivered at the first output of the outer process. Don't forget to connect all outputs in correct order. Make sure that you have connected the right number of ports at all levels of the chain.


Parameters

directory (filename) This parameter specifies the directory to iterate over.

filter (string) This parameter specifies a regular expression which is used as a filter for the file and directory names, e.g. 'a.*b' for all files starting with 'a' and ending with 'b'.

filtered string (selection) This parameter specifies which part of the file name and path is matched against the filter expression specified by the filter parameter.

file name macro (string) This parameter specifies the name of the macro which delivers the name (without path) of the current file. This macro can be accessed by the operators of the subprocess by using the '%{macro name}' syntax.

file path macro (string) This parameter specifies the name of the macro which delivers the absolute path and name of the current file. This macro can be accessed by the operators of the subprocess by using the '%{macro name}' syntax.

parent path macro (string) This parameter specifies the name of the macro which delivers the absolute path of the current file's directory. This macro can be accessed by the operators of the subprocess by using the '%{macro name}' syntax.

recursive (boolean) This parameter indicates if this operator should recursively iterate over all the files/directories of the subdirectories.

iterate over files (boolean) If this parameter is set to true, this operator will iterate over the files in the specified directory and set their path and name macros.

iterate over subdirs (boolean) If this parameter is set to true, this operator will iterate over the subdirectories in the given directory and set their path and name macros.

Tutorial Processes

Generating an ExampleSet with names of all files in a direc- tory

This Example Process shows how the Loop Files operator can be used for iterating over the files in a directory. You need at least a basic understanding of macros and logs in order to understand this process completely. The goal of this process is simply to provide the list of all files in the specified directory in the form of an ExampleSet. This process starts with the Loop Files operator. All parameters are used with default values. The Log operator is used in the subprocess of the Loop Files operator to store the names of the files in the log table in every iteration. The Provide Macro As Log Value operator is used before the Log operator to make the file name macro of the Loop Files operator available to the Log operator. The name of the file name macro (specified through the file name macro parameter) is 'file name', therefore the macro name parameter of the Provide Macro As Log Value operator is set to 'file name'. After the execution of the Loop Files operator, the names of all the files in the specified directory are stored in the log table. To convert this data into an ExampleSet, the Log to Data operator is applied. The resultant ExampleSet is connected to the result port of the process and can be viewed in the Results Workspace. As the path of the RapidMiner repository was specified in the directory parameter of the Loop Files operator, the ExampleSet has the names of all the files in your RapidMiner repository. You can see the names of all the files, including files with '.properties' and '.rmp' extensions. If you want only the file names with the '.rmp' extension, you can set the filter parameter to '.*rmp'.


X-Prediction

This operator works like the X-Validation operator but it returns a labeled ExampleSet instead of a performance vector.

Description

The X-Prediction operator is a nested operator. It has two subprocesses: a training subprocess and a testing subprocess. The training subprocess is used for training a model. The trained model is then applied in the testing subprocess. The testing subprocess returns a labeled ExampleSet. It is compulsory for the training subprocess to deliver a model and it is compulsory for the testing subprocess to deliver a labeled ExampleSet.

The input ExampleSet is partitioned into k subsets of equal size. Of the k subsets, a single subset is retained as the testing data set (i.e. the input of the testing subprocess), and the remaining k - 1 subsets are used as the training data set (i.e. the input of the training subprocess). The model is trained on the training data set in the training subprocess and then applied on the testing data set in the testing subprocess. This process is repeated k times, with each of the k subsets used exactly once as the testing data. The value of k can be adjusted using the number of validations parameter.
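The partition-train-apply cycle can be sketched as follows (the train and predict callables are hypothetical stand-ins for the two subprocesses; linear sampling is assumed):

```python
def cross_predict(examples, labels, train, predict, k=10):
    """Sketch of X-Prediction: every example receives a prediction from a
    model that was not trained on it. Linear sampling is assumed, i.e. the
    folds are consecutive blocks of near-equal size."""
    n = len(examples)
    predictions = [None] * n
    folds = [range(i * n // k, (i + 1) * n // k) for i in range(k)]
    for fold in folds:
        test_idx = set(fold)
        train_idx = [i for i in range(n) if i not in test_idx]
        model = train([examples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        for i in fold:
            predictions[i] = predict(model, examples[i])
    return predictions  # one prediction per example: the labeled result

# Demo with a trivial mean predictor standing in for the trained model.
train = lambda xs, ys: sum(ys) / len(ys)
predict = lambda model, x: model
labeled = cross_predict(list(range(6)), [1.0, 1.0, 1.0, 3.0, 3.0, 3.0],
                        train, predict, k=2)
print(labeled)  # → [3.0, 3.0, 3.0, 1.0, 1.0, 1.0]
```

Note how each half is labeled by a model fitted on the other half, which is exactly why the result differs from a performance vector: the output is the data itself, now labeled.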

Differentiation

X-Validation The X-Validation and X-Prediction operators work in the same way. The major difference is the objects returned by these operators. The X-Validation operator returns a performance vector whereas the X-Prediction operator returns a labeled ExampleSet. See page 896 for details.


Input Ports example set (exa) This input port expects an ExampleSet for training a model. The same ExampleSet will be used during the testing subprocess for testing the model.

Output Ports labelled data (lab) The labeled ExampleSet is delivered through this port. This is the major difference between the X-Prediction and X-Validation operators because the latter returns a performance vector.

Parameters

leave one out (boolean) As the name suggests, the leave one out parameter involves using a single example from the original ExampleSet as the testing data (in the testing subprocess) and the remaining examples as the training data (in the training subprocess). This is repeated so that each example in the ExampleSet is used once as the testing data. Thus, it is repeated 'n' times, where 'n' is the total number of examples in the ExampleSet. This is the same as applying the X-Prediction operator with the number of validations parameter set equal to the number of examples in the original ExampleSet. This is usually very expensive for large ExampleSets from a computational point of view, because the training process is repeated a large number of times (as many times as there are examples). If this parameter is set to true, the number of validations parameter is ignored.

number of validations (integer) This parameter specifies the number of subsets the ExampleSet should be divided into (each subset has an equal number of examples). The same number of iterations will take place. Each iteration involves training a model and testing that model. If this parameter is set equal to the total number of examples in the ExampleSet, it is equivalent to setting the leave one out parameter to true.

sampling type (selection) The X-Prediction operator can use several types of sampling for building the subsets. The following options are available:

- linear sampling: Linear sampling simply divides the ExampleSet into partitions without changing the order of the examples, i.e. subsets with consecutive examples are created.

- shuffled sampling: Shuffled sampling builds random subsets of the ExampleSet. Examples are chosen randomly for making the subsets.

- stratified sampling: Stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole ExampleSet. For example, in the case of a binominal classification, stratified sampling builds random subsets such that each subset contains roughly the same proportions of the two values of the class label.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomizing the examples of a subset. Using the same value of the local random seed will produce the same subsets. Changing the value of this parameter changes the way examples are randomized, thus the subsets will have a different set of examples. This parameter is only available if shuffled or stratified sampling is selected. It is not available for linear sampling because linear sampling requires no randomization; examples are selected in sequence.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Related Documents

X-Validation (896)


Tutorial Processes

Introduction to the X-Prediction operator

The 'Golf' data set is loaded using the Retrieve operator. The Generate ID operator is applied on it to uniquely identify the examples. This is done so that you can understand this process easily; otherwise IDs are not required here. A breakpoint is added after this operator so that you can preview the data before the X-Prediction operator is applied. Double-click the X-Prediction operator and you will see the training and testing subprocesses. The Decision Tree operator is used in the training subprocess. The trained model (i.e. the Decision Tree) is passed to the testing subprocess through the model ports. The testing subprocess receives the testing data from the testing port.

Now have a look at the parameters of the X-Prediction operator. The number of validations parameter is set to 3 and the sampling type parameter is set to linear sampling. The remaining parameters have default values. As the number of validations parameter is set to 3, 3 subsets of the 'Golf' data set will be created. You will observe later that these three subsets are created:

- sub1: examples with IDs 1 to 5 (5 examples)

- sub2: examples with IDs 6 to 9 (4 examples)

- sub3: examples with IDs 10 to 14 (5 examples)

You can see that all examples in a subset are consecutive (i.e. with consecutive IDs). This is because linear sampling is used. Also note that all subsets have an almost equal number of elements. An exactly equal number of elements was not possible because 14 examples cannot be divided equally into 3 subsets.
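The way 14 examples end up in subsets of sizes 5, 4 and 5 can be illustrated with a small sketch (the rounding rule below is an assumption chosen to reproduce the tutorial's split, not necessarily RapidMiner's exact code):

```python
def linear_partition(ids, k):
    """Split a list into k consecutive chunks of near-equal size
    (a sketch of linear sampling with rounded fold boundaries)."""
    n = len(ids)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [ids[bounds[i]:bounds[i + 1]] for i in range(k)]

subsets = linear_partition(list(range(1, 15)), 3)
print([len(s) for s in subsets])  # → [5, 4, 5], matching the tutorial
print(subsets[1])                 # → [6, 7, 8, 9]
```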

As the number of validations parameter is set to 3, there will be three iterations.

- Iteration 1: A model (decision tree) will be trained on sub2 and sub3 during the training subprocess. The trained model will be applied on sub1 during the testing subprocess.

- Iteration 2: A model (decision tree) will be trained on sub1 and sub3 during the training subprocess. The trained model will be applied on sub2 during the testing subprocess.

- Iteration 3: A model (decision tree) will be trained on sub1 and sub2 during the training subprocess. The trained model will be applied on sub3 during the testing subprocess.

Breakpoints are inserted to help you understand the process. Here is what happens when you run the process:

- First the 'Golf' data set is displayed with all rows uniquely identified using the ID attribute. There are 14 rows with IDs 1 to 14. Press the green-colored Run button to continue.

- Now a Decision Tree is shown. This was trained on a subset (the combination of sub2 and sub3) of the 'Golf' data set. Press the Run button to continue.

- The Decision Tree was applied on the testing data. The testing data for this iteration was sub1. Here you can see the results after the application of the Decision Tree model. Have a look at the IDs of the testing data here. They are 1 to 5. This means that the tree was trained on the remaining examples, i.e. examples with IDs 6 to 14, i.e. sub2 + sub3.

- Now you can see a different Decision Tree. It was trained on another subset of the 'Golf' data set; that is why it is different from the previous decision tree. Press the Run button and you will see the testing data for this tree. This process will repeat 3 times, meaning three iterations, because the number of validations parameter was set to 3.

- At the end of the 3 iterations, you will see the complete labeled ExampleSet in the Results Workspace.

You can run the same process with different values of the sampling type parameter. If linear sampling is used, as in our example process, you will see that the IDs of the examples in the subsets will be consecutive values. If shuffled sampling is used, you will see that the IDs of the examples in the subsets will be randomized. If stratified sampling is used, you will also see randomized IDs, but the class distribution in the subsets will be nearly the same as in the whole 'Golf' data set.

Branch

This operator consists of two subprocesses but it executes only one subprocess at a time, depending upon a condition. This operator is similar to the 'if-then-else' statement, where one of the two options is selected depending upon the result of the specified condition. It is important to have an understanding of subprocesses in order to use this operator effectively.

Description

The Branch operator tests the condition specified in the parameters (mostly through the condition type and condition value parameters) on the object supplied at the condition input port. If the condition is satisfied, the first subprocess i.e. the 'Then' subprocess is executed otherwise the second subprocess i.e. the 'Else' subprocess is executed.

It is very important to have a good understanding of the use of subprocesses in RapidMiner to understand this operator completely. A subprocess introduces a process within a process. Whenever a subprocess is reached during process execution, first the entire subprocess is executed. Once the subprocess execution is complete, the flow is returned to the process (the parent process). A subprocess can be considered a small unit of a process; as in a process, all operators and combinations of operators can be applied in a subprocess. That is why a subprocess can also be defined as a chain of operators that is subsequently applied. For more detail about subprocesses please study the Subprocess operator.

Double-click on the Branch operator to go inside and view the subprocesses. The subprocesses are then shown in the same Process View. Here you can see the two subprocesses: the 'Then' and the 'Else' subprocess. The 'Then' subprocess is executed if the condition specified in the parameters evaluates to true. The 'Else' subprocess is executed if the condition evaluates to false. To go back to the parent process, click the blue-colored up arrow button in the Process View toolbar. This works like files and folders work in operating systems. Subprocesses can have subprocesses in them just like folders can have folders in them.

The Branch operator is similar to the Select Subprocess operator because both have multiple subprocesses but execute only one subprocess at a time. The Select Subprocess operator can have more than two subprocesses and the subprocess to be executed is specified in the parameters. In contrast, the Branch operator has only two subprocesses and the subprocess to be executed depends upon the result of the condition specified in the parameters. The condition is specified through the condition type and condition value parameters. Macros can be provided in the condition value parameter, so the subprocess to be executed can be controlled by using macros. If this operator is placed in any Loop operator, it will be executed multiple times. The true power of this operator comes into play when it is used with other operators like the various Macro and Loop operators. For example, if this operator is placed in a Loop operator and the condition value parameter is controlled by a macro, then this operator can be used to dynamically change the process setup. This might be useful in order to test different layouts.
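As a sketch, Branch is an if-then-else over two callables (this mirrors the 'min performance value' condition used in the tutorial process below; all names and the >= comparison are illustrative assumptions):

```python
# Hypothetical stand-ins for the two subprocesses of the Branch operator.
def then_subprocess(inputs):   # runs when the condition is satisfied
    return "performance vector"

def else_subprocess(inputs):   # runs otherwise
    return "K-NN model"

def branch(performance, threshold, inputs):
    """Sketch of Branch with condition type 'min performance value':
    exactly one of the two subprocesses is executed."""
    if performance >= threshold:
        return then_subprocess(inputs)
    return else_subprocess(inputs)

print(branch(64.29, 70, inputs=None))  # → K-NN model
```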


Input Ports

condition (con) Any object can be supplied at this port. The condition specified in the parameters is tested on this object. If the condition is satisfied, the 'Then' subprocess is executed, otherwise the 'Else' subprocess is executed.

input (inp) The Branch operator can have multiple inputs. When one input is connected, another input port becomes available, ready to accept another input (if any). The order of inputs remains the same. The Object supplied at the first input port of the Branch operator is available at the first input port of the nested chain (inside the subprocess). Don't forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Output Ports input (inp) The Branch operator can have multiple outputs. When one output is connected, another output port becomes available, ready to deliver another output (if any). The order of outputs remains the same. The Object delivered at the first port of the subprocess is delivered at the first port of the Branch operator. Don't forget to connect all outputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Parameters

condition type (selection) The type of condition is selected through this parameter.

condition value The value of the selected condition type is specified through this parameter. The condition type and condition value parameters together specify the condition statement. This condition will be tested on the object provided at the condition input port.

io object (selection) This parameter is only available when the condition type parameter is set to 'input exists'. This parameter specifies the class of the object which should be checked for existence.

return inner output (boolean) This parameter indicates if the outputs of the inner subprocess should be delivered through this operator.

Tutorial Processes

Applying different subprocesses on Golf data set depending upon the performance value

The 'Golf' data set is loaded using the Retrieve operator. The Default Model operator is applied on it. The resultant model is applied on the 'Golf-Testset' data set through the Apply Model operator. The performance of this model is measured by the Performance operator. A breakpoint is inserted here so that you can have a look at this performance vector. You can see that its accuracy value is 64.29%. It is provided at the condition port of the Branch operator. Thus the condition specified in the parameters of the Branch operator will be tested on this performance vector. The 'Golf' data set is also provided to the Branch operator (through the first input port).

Now have a look at the subprocesses of the Branch operator. The 'Then' subprocess simply connects the condition port to the input port without applying any operator. Thus if the condition specified in the parameters is true, the condition object, i.e. the performance vector, will be delivered by the Branch operator. The 'Else' subprocess does not use the object at the condition port. Instead, it applies the K-NN operator on the object at the first input port, i.e. the 'Golf' data set. Thus if the condition specified in the parameters is false, the K-NN operator will be applied on the object at the first input port, i.e. the 'Golf' data set, and the resultant model will be delivered by the Branch operator.

Now have a look at the parameters of the Branch operator. The condition type parameter is set to 'min performance value' and the condition value parameter is set to 70. Thus if the performance of the performance vector is greater than 70, the condition will be true.

Overall in this process, the Default Model is trained on the 'Golf' data set; if its performance on the 'Golf-Testset' data set is more than 70%, the performance vector will be delivered, otherwise the K-NN model trained on the 'Golf' data set will be delivered.

Select Subprocess

This operator consists of multiple subprocesses but it executes only one subprocess at a time. This operator is similar to a switch, where numerous options exist but only one option is selected at a time. It is important to have a good understanding of subprocesses in order to use this operator effectively.


Description

It is very important to have a good understanding of the use of subprocesses in RapidMiner to understand this operator completely. A subprocess introduces a process within a process. Whenever a subprocess is reached during a process execution, first the entire subprocess is executed. Once the subprocess execution is complete, flow is returned to the process (the parent process). A subprocess can be considered as a small unit of a process, like in a process, all operators and combination of operators can be applied in a subprocess. That is why a subprocess can also be defined as a chain of operators that is subsequently applied. For more details about subprocesses please study the Subprocess operator.

Double-click on the Select Subprocess operator to go inside and view the subprocesses. The subprocesses are then shown in the same Process View. Here you can see the options to add or remove subprocesses. To go back to the parent process, click the blue-colored up arrow button in the Process View toolbar. This works like files and folders work in operating systems. Subprocesses can have subprocesses in them just like folders can have folders in them.

The Select Subprocess operator consists of multiple subprocesses but it executes only one subprocess at a time. The number of subprocesses can be easily controlled: you can easily add or remove subprocesses. The subprocess to be executed is selected by the select which parameter. Macros can be provided in the select which parameter, thus the subprocess to be executed can be controlled by using macros. If this operator is placed in any Loop operator, it will be executed multiple times. The true power of this operator comes into play when it is used with other operators like various Macro and Loop operators. For example, if this operator is placed in any Loop operator and the select which parameter is controlled by a macro, then this operator can be used to dynamically change the process setup. This might be useful in order to test different layouts, e.g. the gain by using different preprocessing steps or the quality of a certain learner.
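Conceptually, the operator behaves like a switch over interchangeable process branches. The following Python sketch illustrates the idea (hypothetical names only; the real operator is configured via the select which parameter in the GUI):

```python
# Switch-like dispatch over subprocesses, mirroring the 'select which'
# parameter; names are hypothetical, not RapidMiner API.

subprocesses = {
    1: lambda data: "k-NN model trained on " + data,
    2: lambda data: "Naive Bayes model trained on " + data,
    3: lambda data: "Decision Tree model trained on " + data,
    4: lambda data: data,  # input connected directly to the output
}

def select_subprocess(select_which, data):
    # Exactly one subprocess is executed per call.
    return subprocesses[select_which](data)

print(select_subprocess(1, "Golf"))  # k-NN model trained on Golf
print(select_subprocess(4, "Golf"))  # Golf
```

Driving select_which from a loop counter or a macro-controlled value is what makes the dynamic process setups mentioned above possible.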


Input Ports

input (inp) The Select Subprocess operator can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first input port of the Select Subprocess operator is available at the first input port of the nested chain (inside the subprocess). Don't forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Output Ports

output (out) The Select Subprocess operator can have multiple outputs. When one output is connected, another output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first output port of the subprocess is delivered at the first output of the Select Subprocess operator. Don't forget to connect all outputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Parameters

select which (integer) This parameter indicates which subprocess should be applied. The true power of this operator comes into play when the select which parameter is specified through a macro.


Tutorial Processes

Applying different classification operators on Golf data set using the Select Subprocess operator

The 'Golf' data set is loaded using the Retrieve operator. The Select Subprocess operator is applied on it. Double-click on the Select Subprocess operator to see the subprocesses in it. As you can see, there are four subprocesses:

• Subprocess 1: The k-NN operator is applied on the input and the resulting model is passed to the output.

• Subprocess 2: The Naive Bayes operator is applied on the input and the resulting model is passed to the output.

• Subprocess 3: The Decision Tree operator is applied on the input and the resulting model is passed to the output.

• Subprocess 4: The input is directly connected to the output.

Only one of these subprocesses can be executed at a time. The subprocess to be executed can be controlled by the select which parameter. The select which parameter is set to 1, thus the first subprocess will be executed. When you run the process you will see the model created by the k-NN operator in the Results workspace. To execute the second subprocess set the select which parameter to 2 and run the process again. You will see the model generated by the Naive Bayes operator in the Results Workspace. To execute the third subprocess set the select which parameter to 3 and run the process again. You will see the model generated by the Decision Tree operator in the Results Workspace. To execute the fourth subprocess set the select which parameter to 4 and run the process again. Now you will see the 'Golf' data set in the Results Workspace because no operator was applied in the fourth subprocess.

1.4. Collections

Collect

This operator combines multiple input objects into a single collection.

Description

The Collect operator combines a variable number of input objects into a single collection. It is important to know that all input objects should be of the same IOObject class. In the Process View, collections are indicated by double lines. If the input objects are collections themselves, then the output of this operator would be a collection of collections. However, if the unfold parameter is set to true, then the output will be the union of all elements of the input collections. After combining objects into a collection, the Loop Collection operator can be used to iterate over this collection. The Select operator can be used to retrieve the required element of the collection.
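The unfold behavior can be pictured with a small Python sketch that uses lists as stand-ins for collections (a hypothetical helper, not RapidMiner API):

```python
def collect(inputs, unfold=False):
    # Combine the inputs into one collection (modeled here as a list).
    # With unfold=True, input collections contribute their elements
    # instead of being nested as collections inside the result.
    if not unfold:
        return list(inputs)
    result = []
    for obj in inputs:
        if isinstance(obj, list):
            result.extend(obj)
        else:
            result.append(obj)
    return result

print(collect([["a", "b"], ["c"]]))               # [['a', 'b'], ['c']]
print(collect([["a", "b"], ["c"]], unfold=True))  # ['a', 'b', 'c']
```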

Collections can be useful when you want to apply the same operations on a number of objects. The Collect operator will allow you to collect the required objects into a single collection, the Loop Collection operator will allow you to iterate over the elements of the collection and finally you can separate the objects from the collection by individually selecting the required element by using the Select operator.

Input Ports

input (inp) This operator can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first input port of the Collect operator becomes the first element of the resultant collection. It is important to note that all input objects should be of the same IOObject class.

Output Ports

collection (col) All the input objects are combined into a single collection and the resultant collection is delivered through this port.

Parameters

unfold (boolean) This parameter is only applicable when the input objects are collections. This parameter specifies whether collections received at the input ports should be unfolded. If the input objects are collections themselves and the unfold parameter is set to false, then the output of this operator would be a collection of collections. However, if the unfold parameter is set to true, then the output will be the union of all elements of the input collections.

Tutorial Processes

Introduction to collections


This Example Process explains a number of important ideas related to collections. It shows how objects can be collected into a collection, then some preprocessing is applied on the collection and finally individual elements of the collection are separated as required.

The 'Golf' and 'Golf-Testset' data sets are loaded using the Retrieve operator. Both ExampleSets are provided as inputs to the Subprocess operator. The subprocess performs some preprocessing on the ExampleSets and then returns them through its output ports. The first output port returns the preprocessed 'Golf' data set which is then used as training set for the Decision Tree operator. The second output port delivers the preprocessed 'Golf-Testset' data set which is used as testing set for the Apply Model operator which applies the Decision Tree model. The performance of this model is measured and it is connected to the results port. The training and testing ExampleSets can also be seen in the Results Workspace.

Now have a look at the subprocess of the Subprocess operator. First of all, the Collect operator combines the two ExampleSets into a single collection. Note the double line output of the Collect operator which indicates that the result is a collection. Then the Loop Collection operator is applied on the collection. The Loop Collection operator iterates over the elements of the collection and performs some preprocessing (renaming an attribute in this case) on them. You can see in the subprocess of the Loop Collection operator that the Rename operator is used for changing the name of the Temperature attribute to 'New Temperature'. It is important to note that this renaming is performed on both ExampleSets of the collection. The resultant collection is supplied to the Multiply operator which generates two copies of the collection. The first copy is used by the Select operator (with index parameter = 1) to select the first element of the collection i.e. the preprocessed 'Golf' data set. The second copy is used by the second Select operator (with index parameter = 2) to select the second element of the collection i.e. the preprocessed 'Golf-Testset' data set.


Select

This operator returns the specified single object from the given collection of objects.

Description

Operators like the Collect operator combine a variable number of input objects into a single collection. In the Process View, collections are indicated by double lines. The Select operator can be used for selecting an individual object from this collection. The index parameter specifies the index of the required object. If the objects of the given collection are collections themselves, the output of this operator would be the collection at the specified index. However, if the unfold parameter is set to true, the index refers to the index in the flattened list, i.e. the list obtained from the input list by replacing all nested collections by their elements.
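Using lists as stand-ins for collections, the effect of the index and unfold parameters can be sketched as follows (a hypothetical helper, not RapidMiner API; the index is 1-based, as in the attached Example Process):

```python
def select(collection, index, unfold=False):
    # Return the element at the given 1-based index. With unfold=True,
    # nested collections are first replaced by their elements.
    if unfold:
        flat = []
        for obj in collection:
            if isinstance(obj, list):
                flat.extend(obj)
            else:
                flat.append(obj)
        collection = flat
    return collection[index - 1]

nested = [["golf", "golf_test"], ["iris"]]
print(select(nested, 2))               # ['iris']
print(select(nested, 2, unfold=True))  # golf_test
```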

Collections can be useful when you want to apply the same operations on a number of objects. The Collect operator will allow you to collect the required objects into a single collection, the Loop Collection operator will allow you to iterate over the elements of the collection and finally you can separate the objects from the collection by individually selecting the required element by using the Select operator.

Input Ports

collection (col) This port expects a collection of objects as input. Operators like the Collect operator combine a variable number of input objects into a single collection.

Output Ports

selected (sel) The object at the index specified by the index parameter is returned through this port.

Parameters

index (integer) This parameter specifies the index of the required object within the collection of objects.

unfold (boolean) This parameter is only applicable when the objects of the given collection are collections themselves. This parameter specifies whether collections received at the input ports should be considered unfolded for selection. If the input objects are collections themselves and the unfold parameter is set to false, then the index parameter will refer to the collection at the specified index. However, if the unfold parameter is set to true, then the index refers to the index in the flattened list, i.e. the list obtained from the input list by replacing all nested collections by their elements.

Tutorial Processes

Introduction to collections

This Example Process explains a number of important ideas related to collections. It shows how objects can be collected into a collection, then some preprocessing is applied on the collection and finally individual elements of the collection are separated as required.

The 'Golf' and 'Golf-Testset' data sets are loaded using the Retrieve operator. Both ExampleSets are provided as inputs to the Subprocess operator. The subprocess performs some preprocessing on the ExampleSets and then returns them through its output ports. The first output port returns the preprocessed 'Golf' data set which is then used as training set for the Decision Tree operator. The second output port delivers the preprocessed 'Golf-Testset' data set which is used as testing set for the Apply Model operator which applies the Decision Tree model. The performance of this model is measured and it is connected to the results port. The training and testing ExampleSets can also be seen in the Results Workspace.

Now have a look at the subprocess of the Subprocess operator. First of all, the Collect operator combines the two ExampleSets into a single collection. Note the double line output of the Collect operator which indicates that the result is a collection. Then the Loop Collection operator is applied on the collection. The Loop Collection operator iterates over the elements of the collection and performs some preprocessing (renaming an attribute in this case) on them. You can see in the subprocess of the Loop Collection operator that the Rename operator is used for changing the name of the Temperature attribute to 'New Temperature'. It is important to note that this renaming is performed on both ExampleSets of the collection. The resultant collection is supplied to the Multiply operator which generates two copies of the collection. The first copy is used by the Select operator (with index parameter = 1) to select the first element of the collection i.e. the preprocessed 'Golf' data set. The second copy is used by the second Select operator (with index parameter = 2) to select the second element of the collection i.e. the preprocessed 'Golf-Testset' data set.

Loop Collection

This operator iterates over a collection of objects. It is a nested operator and its subprocess executes once for each object of the given collection.


Description

Objects can be grouped into a collection using the Collect operator. In the Process View, collections are indicated by double lines. The Loop Collection operator loops over its subprocess once for every object in the input collection. The output of this operator is also a collection; any additional results of the subprocess can also be delivered through its output ports (as collections). If the unfold parameter is set to true, then the output will be the union of all elements of the input collections.
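The looping behavior amounts to applying the same subprocess to every element and collecting the results, which can be sketched in Python (hypothetical names; the renaming mirrors the attached Example Process):

```python
def loop_collection(collection, subprocess):
    # The subprocess is executed once for each object of the collection;
    # the results are delivered as a collection again.
    return [subprocess(obj) for obj in collection]

def rename_temperature(attributes):
    # Stand-in for the Rename operator used in the Example Process.
    return ["New Temperature" if a == "Temperature" else a for a in attributes]

golf = ["Outlook", "Temperature", "Humidity"]
golf_testset = ["Outlook", "Temperature", "Wind"]
print(loop_collection([golf, golf_testset], rename_temperature))
# [['Outlook', 'New Temperature', 'Humidity'], ['Outlook', 'New Temperature', 'Wind']]
```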

Collections can be useful when you want to apply the same operations on a number of objects. The Collect operator will allow you to collect the required objects into a single collection, the Loop Collection operator will allow you to iterate over the elements of the collection and finally you can separate the objects from the collection by individually selecting the required element by using the Select operator.

Input Ports

collection (col) This input port expects a collection. It is the output of the Collect operator in the attached Example Process.

Output Ports

output (out) This operator can have multiple outputs. When one output is connected, another output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object supplied at the first output port of the subprocess of the Loop Collection operator is delivered through the first output port of this operator. The objects are delivered as collections.


Parameters

unfold (boolean) This parameter specifies whether collections received at the input ports should be unfolded. If the unfold parameter is set to true, then the output will be the union of all elements of the input collections.

Tutorial Processes

Introduction to collections

This Example Process explains a number of important ideas related to collections. It shows how objects can be collected into a collection, then some preprocessing is applied on the collection and finally individual elements of the collection are separated as required.

The 'Golf' and 'Golf-Testset' data sets are loaded using the Retrieve operator. Both ExampleSets are provided as inputs to the Subprocess operator. The subprocess performs some preprocessing on the ExampleSets and then returns them through its output ports. The first output port returns the preprocessed 'Golf' data set which is then used as training set for the Decision Tree operator. The second output port delivers the preprocessed 'Golf-Testset' data set which is used as testing set for the Apply Model operator which applies the Decision Tree model. The performance of this model is measured and it is connected to the results port. The training and testing ExampleSets can also be seen in the Results Workspace.

Now have a look at the subprocess of the Subprocess operator. First of all, the Collect operator combines the two ExampleSets into a single collection. Note the double line output of the Collect operator which indicates that the result is a collection. Then the Loop Collection operator is applied on the collection. The Loop Collection operator iterates over the elements of the collection and performs some preprocessing (renaming an attribute in this case) on them. You can see in the subprocess of the Loop Collection operator that the Rename operator is used for changing the name of the Temperature attribute to 'New Temperature'. It is important to note that this renaming is performed on both ExampleSets of the collection. The resultant collection is supplied to the Multiply operator which generates two copies of the collection. The first copy is used by the Select operator (with index parameter = 1) to select the first element of the collection i.e. the preprocessed 'Golf' data set. The second copy is used by the second Select operator (with index parameter = 2) to select the second element of the collection i.e. the preprocessed 'Golf-Testset' data set.

2. Utility

Subprocess

This operator introduces a process within a process. Whenever a Subprocess operator is reached during process execution, first the entire subprocess is executed. Once the subprocess execution is complete, flow is returned to the process (the parent process). A subprocess can be considered as a small unit of a process; just as in a process, any operator or combination of operators can be applied in a subprocess. That is why a subprocess can also be defined as a chain of operators that is subsequently applied.

Description

Double-click on the Subprocess operator to go inside the subprocess. The subprocess is then shown in the same Process View. To go back to the parent process, click the blue-colored up arrow button in the Process View toolbar. This works like files and folders work in operating systems. Subprocesses can have subprocesses in them just like folders can have folders in them. The order of execution in case of nested subprocesses is the same as a depth-first search through a tree structure. When a subprocess is reached, all operators inside it are executed and then the execution flow returns to the parent process and the operator that is located after the Subprocess operator (in the parent process) is executed. This description can be easily understood by studying the attached Example Process.
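The depth-first execution order can be illustrated with nested Python functions standing in for nested subprocesses (a hypothetical structure, not RapidMiner API):

```python
# Record the order in which (sub)processes are entered and left.
order = []

def run(name, children=()):
    order.append("enter " + name)  # all inner operators execute first ...
    for child in children:
        child()
    order.append("leave " + name)  # ... then flow returns to the parent

run("parent process", [
    lambda: run("subprocess", [lambda: run("nested subprocess")]),
    lambda: run("operator after subprocess"),
])
print(order)
# ['enter parent process', 'enter subprocess', 'enter nested subprocess',
#  'leave nested subprocess', 'leave subprocess',
#  'enter operator after subprocess', 'leave operator after subprocess',
#  'leave parent process']
```

Note how the nested subprocess finishes completely before the operator that follows its parent subprocess is reached.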

A Subprocess can be considered as a simple operator chain which can have an arbitrary number of inner operators. The operators are subsequently applied and their output is used as input for the succeeding operators. The input of the Subprocess is used as input for the first operator in the Subprocess and the output of the last operator in the Subprocess is used as the output of the Subprocess. Subprocesses make a process more manageable but don't forget to connect all inputs and outputs in the correct order. Also make sure that you have connected the right number of ports at all levels of the chain.

Subprocesses are useful in many ways. They give a structure to the entire process: process complexity is reduced and processes become easier to understand and modify. Many operators have a subprocess as an integral part, e.g. the X-Validation operator, which is also shown in the attached Example Process. It should be noted that connecting the input of a Subprocess directly to its output without applying any operator in between, or using an empty Subprocess, gives no results.

Input Ports

input (inp) A Subprocess can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first input port of the Subprocess is available at the first input port of the nested chain (inside the subprocess). Subprocesses make a process more manageable but don't forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Output Ports

output (out) A Subprocess can have multiple outputs. When one output is connected, another output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first output port of the subprocess is delivered at the first output of the outer process. Subprocesses make a process more manageable but don't forget to connect all outputs in the correct order. Make sure that you have connected the right number of ports at all levels of the chain.

Tutorial Processes

Using subprocesses to structure a process

The 'Golf' data set is loaded using the Retrieve operator. It is attached to the first input of the Subprocess operator. Double-click on the Subprocess operator to see what is inside this subprocess. The first input of the subprocess is attached to a Decision Tree operator. The output of the Decision Tree operator is given to the first output port. Now, go back to the main process. You will see that the first output port of the Subprocess operator is attached to the first result port. This explains 'Tree (decision tree (golf))' in the Results Workspace. This is how it works: the 'Golf' data set enters the subprocess through the first input port, then the Decision Tree operator is applied on it in the subprocess, and the resulting model is delivered to the results via the first output port of the subprocess.

In the main process, the 'Transactions' data set is loaded using the Retrieve operator. It is attached to the second input port of the Subprocess operator. Double-click on the Subprocess operator to see what is inside this subprocess. The second input port of the subprocess is attached directly to the second output port without applying any operator. Now, go back to the main process. You will see that the second output port of the Subprocess operator is attached to the second result port.


But, as no operator is applied to the 'Transactions' data set in the subprocess, it fails to produce any results (not even the result of the Retrieve operator is shown in the Results Workspace). This explains why we have three results in the Results Workspace even though four outputs are attached to the results ports in the main process.

In the subprocess, the 'Iris' data set is loaded using the Retrieve operator. It is connected to the Decision Tree operator and the resultant model is attached to the third output port of the subprocess, which is in turn attached to the third results port in the main process. This explains 'Tree (decision tree (Iris))' in the Results Workspace.

In the subprocess, the 'Weighting' data set is loaded using the Retrieve operator. It is connected to the X-Validation operator and the resultant performance vector is attached to the fourth output port of the subprocess, which is in turn attached to the fourth results port in the main process. This explains 'performanceVector (Performance)' in the Results Workspace. The X-Validation operator is itself composed of a subprocess; double-click on the X-Validation operator and you will see the subprocess within this operator. An explanation of what is going on inside X-Validation would be a diversion here. This operator was added just to show how various operators can be composed of subprocesses. To know more about the X-Validation operator you can read its description.

Note: This Example Process is just for highlighting different perspectives of the Subprocess operator. It may not be very useful in real scenarios. The Example Process of the Performance operator is also a good example of the usage of the Subprocess operator.

2.1. Macros

Set Macro

This operator can be used to define a macro which can be used by %{macro name} in parameter values of succeeding operators of the current process. The macro value will NOT be derived from any ExampleSet. A macro can be considered as a value that can be used by all operators of the current process that come after the macro has been defined. This operator can also be used to re-define an existing macro.

Description

This operator can be used to define a macro which can be used in parameter values of succeeding operators of the current process. Once the macro has been defined, the value of that macro can be used as a parameter value in succeeding operators by writing the macro name in %{macro name} format in the parameter value, where 'macro name' is the name of the macro specified when the macro was defined. In the Set Macro operator, the macro name is specified by the macro parameter and the macro value is specified by the value parameter. The macro will be replaced in the value strings of parameters by the macro's value. This operator can also be used to re-define an existing macro.
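The substitution of %{macro name} in parameter strings can be sketched with a small Python helper (a hypothetical illustration; RapidMiner performs this replacement internally):

```python
import re

def expand_macros(parameter_value, macros):
    # Replace every %{name} occurrence by the macro's value; macros
    # that are not defined are left untouched in this sketch.
    return re.sub(r"%\{([^}]+)\}",
                  lambda m: str(macros.get(m.group(1), m.group(0))),
                  parameter_value)

macros = {"number": 1, "process name": "golf_demo"}
print(expand_macros("%{number}", macros))                    # 1
print(expand_macros("results/%{process name}.res", macros))  # results/golf_demo.res
```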

This operator sets the value of a macro irrespective of any ExampleSet. That is why this operator can also exist on its own, i.e. without being connected to any other operator. If you want to create a single macro from properties of a given input ExampleSet, the Extract Macro operator is the right operator.

Macros

A macro can be considered as a value that can be used by all operators of the current process that come after the macro has been defined. Whenever using macros, make sure that the operators are in the correct sequence. It is compulsory that a macro is defined before it can be used in parameter values. Macros are one of the advanced topics of RapidMiner; please study the attached Example Process to develop a better understanding of macros. The Example Processes of the Extract Macro operator are also useful for understanding the concepts related to macros.

There are also some predefined macros:

• %{process name}: will be replaced by the name of the process (without path and extension)

• %{process file}: will be replaced by the file name of the process (with extension)

• %{process path}: will be replaced by the complete absolute path of the process file

• Several other short macros also exist, e.g. %{a} for the number of times the current operator was applied.


Please note that other operators like many of the loop operators (e.g. Loop Values, Loop Attributes) also add specific macros.

During runtime, the defined macros can be observed in the macro viewer.

Differentiation

Set Macros The Set Macros operator is like the Set Macro operator with only one difference: the Set Macros operator can be used for setting the values of multiple macros, whereas the Set Macro operator can be used for setting the value of just a single macro. See page 98 for details.

Input Ports

through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Set Macro operator is available at the first through output port.

Output Ports

through (thr) Objects that were given as input are passed to the output through this port without any modification. It is not compulsory to attach this port to any other port; the macro value is set even if this port is left without connections. The Set Macro operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object supplied at the first through input port of the Set Macro operator is delivered at the first through output port.

Parameters

macro This parameter is used to specify the name of the macro. The macro can be accessed in succeeding operators of the current process by writing the macro's name in %{macro name} format, where 'macro name' is the name of the macro specified in this parameter.

value This parameter is used to specify the value of the macro. When the macro is accessed in succeeding operators of the current process by writing the macro's name in %{macro name} format, it is replaced by the value of the macro specified by this parameter.

Related Documents

Set Macros (98)

Tutorial Processes

Introduction to the Set Macro operator

This is a very basic process that demonstrates the use of the Set Macro operator. First of all, the Set Macro operator is used. One macro is defined using the macro and value parameters. The macro is named 'number' and it is given the value 1. Note that this operator is not connected to any other operator; it can exist on its own. Always make sure that a macro is defined before it is used in parameter values.

The 'Golf' data set is loaded using the Retrieve operator. The Select Subprocess operator is applied on it. Double-click on the Select Subprocess operator to see the subprocesses in it. As you can see, there are four subprocesses:

• Subprocess 1: The k-NN operator is applied on the input and the resulting model is passed to the output.

• Subprocess 2: The Naive Bayes operator is applied on the input and the resulting model is passed to the output.

• Subprocess 3: The Decision Tree operator is applied on the input and the resulting model is passed to the output.

• Subprocess 4: The input is directly connected to the output.

Only one of these subprocesses can be executed at a time. The subprocess to be executed can be controlled by the select which parameter of the Select Subprocess operator. The select which parameter is set using the 'number' macro defined by the Set Macro operator: it is set to '%{number}'. When the process is executed, '%{number}' will be replaced with the value of the 'number' macro, i.e. it will be replaced by 1. Thus the select which parameter is set to 1 and the first subprocess will be executed. When you run the process you will see the model created by the k-NN operator in the Results Workspace. As the value of the select which parameter is provided by the macro created by the Set Macro operator, changing the value of the macro will change the value of the select which parameter. To execute the second subprocess set the value parameter of the Set Macro operator to 2 and run the process again. You will see the model generated by the Naive Bayes operator in the Results Workspace. To execute the third subprocess set the value parameter of the Set Macro operator to 3 and run the process again. You will see the model generated by the Decision Tree operator in the Results Workspace. To execute the fourth subprocess set the value parameter of the Set Macro operator to 4 and run the process again. Now you will see the 'Golf' data set in the Results Workspace because no operator was applied in the fourth subprocess.


Set Macros

This operator can be used to define multiple macros which can be used by %{macro name} in parameter values of succeeding operators of the current process. The macro values will NOT be derived from any ExampleSet. A macro can be considered as a value that can be used by all operators of the current process that come after the macro has been defined. This operator can also be used to re-define existing macros.

Description

This operator can be used to define multiple macros which can be used in parameter values of succeeding operators of the current process. Once a macro has been defined, the value of that macro can be used as a parameter value in succeeding operators by writing the macro name in %{macro name} format in a parameter value, where 'macro name' is the name of the macro specified when the macro was defined. In the Set Macros operator, the macro names and values are specified by the macros parameter. A macro will be replaced in the value strings of parameters by the macro's value. This operator can also be used to re-define existing macros.

98 2.1. Macros

This operator sets the value of multiple macros irrespective of any ExampleSet. That is why this operator can also exist on its own, i.e. without being connected to any other operator. If you want to create a single macro from properties of a given input ExampleSet, the Extract Macro operator is the right operator.

Macros

A macro can be considered as a value that can be used by all operators of the current process that come after the macro has been defined. Whenever using macros, make sure that the operators are in the correct sequence. It is compulsory that the macro should be defined before it can be used in parameter values. The macro is one of the advanced topics of RapidMiner, please study the attached Example Process to develop a better understanding of macros. The Example Processes of the Extract Macro operator are also useful for understanding the concepts related to the macros.

There are also some predefined macros:

- %{process name}: will be replaced by the name of the process (without path and extension)

- %{process file}: will be replaced by the file name of the process (with extension)

- %{process path}: will be replaced by the complete absolute path of the process file

- Several other short macros also exist, e.g. %{a} for the number of times the current operator was applied.

Please note that other operators like many of the loop operators (e.g. Loop Values , Loop Attributes) also add specific macros.


Input Ports

through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Set Macros operator is available at the first through output port.

Output Ports

through (thr) Objects that were given as input are passed without changing to the output through this port. It is not compulsory to attach this port to any other port; the macro value is set even if this port is left without connections. The Set Macros operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Set Macros operator is delivered at the first through output port.

Parameters

macros This parameter is used to specify the names and values of the macros. Macros can be accessed in succeeding operators of the current process by writing the macro's name in %{macro name} format, where 'macro name' is the name of the macro specified in this parameter.


Tutorial Processes

Introduction to the Set Macros operator

This is a very basic process that demonstrates the use of macros and the Set Macros operator. The Set Macros operator is used first of all. Two macros are defined in the macros parameter. They are named 'min' and 'max'. 'min' is given the value 1 and 'max' is given the value 4. Note that this operator is not connected to any other operator; it can exist on its own. Always make sure that a macro is defined before it is used in parameter values.

The 'Golf' data set is loaded using the Retrieve operator. The Filter Example Range operator is applied on it. The first example parameter is set to '%{min}' and the last example parameter is set to '%{max}'. When the process is executed, '%{min}' and '%{max}' will be replaced with the values of the respective macros, i.e. by 1 and 4 respectively. Thus the Filter Example Range operator will deliver only the first four examples of the 'Golf' data set.

At the end, the Write CSV operator is applied to store the output of the Filter Example Range operator in a CSV file. Note that the csv file parameter is set to 'D:\%{process name}'. Here '%{process name}' is a predefined macro which will be replaced by the name of the current process (without file extension). Thus the output of the Filter Example Range operator will be written to a CSV file in the D drive of your computer. The name of the file will be the same as the name of this process.


Generate Macro

This operator can be used to calculate new macros from existing macros. A macro can be used by writing %{macro name} in parameter values of succeeding operators of the current process. A macro can be considered as a value that can be used by all operators of the current process that come after the macro has been defined. This operator can also be used to re-define existing macros.

Description

This operator can be used to define macros which can be used in parameter values of succeeding operators of the current process. Once the macro has been defined, the value of that macro can be used as a parameter value in succeeding operators by writing the macro name in %{macro name} format in the parameter value, where 'macro name' is the name of the macro specified when it was defined. In the Generate Macro operator, macro names and function descriptions are specified in the function descriptions parameter. The macro will be replaced in the value strings of parameters by the macro's value. This operator can also be used to re-define existing macros by specifying the name of that macro as the name in the function descriptions parameter.


A large number of operations are supported, which allows you to write rich expressions. Here is a list of the supported operations along with explanations and examples. In these examples M, N and O are used as macro names. These are just simple examples; more complicated expressions can be created by using multiple operations. Parentheses can be used to nest operations. The description of all operations follows this format:

Name of operation (syntax of operation): brief description; examples: example1, example2

The following basic operations are supported:

- Addition (+): Calculates the addition of the two terms surrounding this operator; examples: %{M}+7, %{M}+%{N}, %{M}+%{N}+9

- Subtraction (-): Calculates the subtraction of the second term from the first one; examples: 15-%{M}, %{M}-%{N}, (%{M}-%{N})-%{O}

- Multiplication (*): Calculates the multiplication of the two terms surrounding this operator; examples: 5*%{N}, %{M}*%{N}, %{M}*%{N}*%{O}

- Division (/): Calculates the division of the first term by the second one; examples: %{N}/4, %{M}/%{N}, 4/2

- Power (^): Calculates the first term to the power of the second one; examples: 2^3, %{M}^2, %{M}^%{N}

- Modulus (%): Calculates the modulus of the first term by the second one; examples: 11%2, %{M}%5, %{N}%%{O}

- Less Than (<): Delivers true if the first term is less than the second; examples: %{M}<4, %{N}<%{M}

- Greater Than (>): Delivers true if the first term is greater than the second; examples: %{M}>3, %{N}>%{O}

- Less or Equal (<=): Delivers true if the first term is less than or equal to the second; examples: %{M}<=5, %{O}<=%{N}

- More or Equal (>=): Delivers true if the first term is greater than or equal to the second; examples: %{M}>=4, 78>=%{N}

- Equal (==): Delivers true if the first term is equal to the second; examples: %{M}==%{N}, %{M}==12

- Not Equal (!=): Delivers true if the first term is not equal to the second; examples: %{M}!=%{N}, %{M}!=25

- Boolean Not (!): Delivers true if the following term is false and vice versa; examples: !(%{M}>2), !(%{N}<=25)

- Boolean And (&&): Delivers true if both surrounding terms are true; example: (%{M}>2)&&(%{N}<=%{O})

- Boolean Or (||): Delivers true if at least one of the surrounding terms is true; example: (%{M}>2)||(%{N}<=%{O})
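To see how such an expression behaves once the macros have been substituted, here is a rough Python model. The `evaluate` helper is an invention for this sketch: it simply maps the operator spellings above onto Python's own operators and is not RapidMiner's actual expression parser.

```python
# Suppose macros M=7, N=3, O=10 have been substituted into the
# expression "(%{M}>2)&&(%{N}<=%{O})", giving "(7>2)&&(3<=10)".
def evaluate(expr):
    """Evaluate a tiny arithmetic/boolean expression by rewriting the
    &&, || and ^ spellings into Python syntax (illustrative only)."""
    return eval(expr.replace("&&", " and ")
                    .replace("||", " or ")
                    .replace("^", "**"))

print(evaluate("(7>2)&&(3<=10)"))  # True
print(evaluate("2^3"))             # 8
print(evaluate("11%2"))            # 1
```

The same substitute-then-evaluate order applies in the real operator: macros are replaced first, and only then is the resulting expression computed.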

The following log and exponential functions are supported:

- Natural Logarithm (ln(x)): Calculates the logarithm of the argument to the base e; examples: ln(5), ln(%{M})

- Logarithm Base 10 (log(x)): Calculates the logarithm of the argument to the base 10; examples: log(25), log(%{M})

- Logarithm Base 2 (ld(x)): Calculates the logarithm of the argument to the base 2; it is also called logarithm dualis; examples: ld(24), ld(%{M})

- Exponential (exp(x)): Calculates the value of the constant e to the power of the argument; it is equivalent to e^x; examples: exp(12), exp(%{M})

- Power (pow(x,y)): Calculates the first term to the power of the second one; examples: pow(%{M},%{N}), pow(%{M},3)

The following trigonometric functions are supported:

- Sine (sin(x)): Calculates the sine of the given argument; examples: sin(%{M}), sin(45)

- Cosine (cos(x)): Calculates the cosine of the given argument; examples: cos(%{M}), cos(45)

- Tangent (tan(x)): Calculates the tangent of the given argument; examples: tan(%{M}), tan(45)

- Arc Sine (asin(x)): Calculates the inverse sine of the given argument; examples: asin(%{M}), asin(0.50)

- Arc Cosine (acos(x)): Calculates the inverse cosine of the given argument; examples: acos(%{M}), acos(0.50)

- Arc Tangent (atan(x)): Calculates the inverse tangent of the given argument; examples: atan(%{M}), atan(1)

- Arc Tangent with 2 parameters (atan2(x,y)): Calculates the inverse tangent based on the two given arguments; example: atan2(%{M}, 0.5)

- Hyperbolic Sine (sinh(x)): Calculates the hyperbolic sine of the given argument; example: sinh(%{M})

- Hyperbolic Cosine (cosh(x)): Calculates the hyperbolic cosine of the given argument; example: cosh(%{M})

- Hyperbolic Tangent (tanh(x)): Calculates the hyperbolic tangent of the given argument; example: tanh(%{M})

- Inverse Hyperbolic Sine (asinh(x)): Calculates the inverse hyperbolic sine of the given argument; example: asinh(%{M})

- Inverse Hyperbolic Cosine (acosh(x)): Calculates the inverse hyperbolic cosine of the given argument; example: acosh(%{M})

- Inverse Hyperbolic Tangent (atanh(x)): Calculates the inverse hyperbolic tangent of the given argument; example: atanh(%{M})

The following statistical functions are supported:

- Round (round(x)): Rounds the given number to the nearest integer. If two arguments are given, the first one is rounded to the number of digits indicated by the second argument; examples: round(%{M}), round(%{M}, 3). round(%{M}, 3) rounds the macro '%{M}' to 3 decimal places.

- Floor (floor(x)): Calculates the largest integer less than or equal to the given argument, e.g. floor(4.7) would be 4; examples: floor(%{M}), floor(23.34)

- Ceiling (ceil(x)): Calculates the smallest integer greater than or equal to the given argument, e.g. ceil(23.4) would be 24; examples: ceil(%{M}), ceil(23.34)

- Average (avg(x,y,z...)): Calculates the average of the given arguments; examples: avg(%{M},%{N}), avg(%{M},%{N},%{O}), avg(%{M},%{N},25)

- Minimum (min(x,y,z...)): Calculates the minimum of the given arguments; examples: min(%{M},%{N}), min(%{M},%{N},%{O}), min(%{M},0,%{N})

- Maximum (max(x,y,z...)): Calculates the maximum of the given arguments; examples: max(%{M},%{N}), max(%{M},%{N},%{O}), max(%{M},%{N},45)

The following miscellaneous functions are supported:

- If-Then-Else (if(condition, true-evaluation, false-evaluation)): The first argument specifies the condition. If the condition is evaluated as true then the result of the second argument is delivered, otherwise the result of the third argument is delivered; example: if(%{M}>5, 7*%{M}, %{N}/2)

- Absolute (abs(x)): Delivers the absolute value of the argument; example: abs(%{M})

- Square Root (sqrt(x)): Delivers the square root of the given argument; examples: sqrt(16), sqrt(%{M})

- Signum (sgn(x)): Delivers -1 or +1 depending on the sign of the argument; example: sgn(-5) delivers -1.

- Random (rand()): Delivers a random number between 0 and 1. It can be used to generate random numbers in any range when used with other functions like multiplication, addition, floor etc.; example: floor((rand()*10) + 15) produces random integers between 15 and 24.

- Modulus (mod(x, y)): Calculates the modulus of the first term by the second one; examples: mod(11, 2), mod(%{M}, 3)

- Sum (sum(x, y, z, ...)): Calculates the sum of all arguments; example: sum(%{M},%{N}, 42)

- Binomial (binom(x, y)): Calculates the binomial coefficients; example: binom(5, 2)

- Number to string (str(x)): Changes the number given as argument to a string; example: str(%{M})

This operator also supports the constants 'pi' and 'e' if the use standard constants parameter is set to true. You can also use strings in operations but the string values should be enclosed in double quotes (").
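A few of these functions can be checked against Python's standard library. The sketch below models ld(x) with math.log2 and verifies the floor((rand()*10) + 15) trick from the function list; the names and values are illustrative, not RapidMiner code.

```python
import math
import random

# ld(x), the "logarithm dualis", is just the base-2 logarithm:
def ld(x):
    return math.log2(x)

print(ld(8))            # 3.0
print(math.ceil(23.4))  # 24

# floor((rand()*10) + 15): rand() is in [0, 1), so the expression is in
# [15, 25) before flooring, which yields integers from 15 to 24.
draws = {math.floor(random.random() * 10 + 15) for _ in range(10000)}
print(min(draws), max(draws))
```

The upper bound is 24, not 25, because rand() never reaches 1 exactly; scale by 11 if you need 25 included.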

The functions available in the Generate Macro operator behave similarly to the functions of the Generate Attributes operator. Please study the Example Process of the Generate Attributes operator to understand the use of these functions.

Macros

A macro can be considered as a value that can be used by all operators of the current process that come after the macro has been defined. Whenever using macros, make sure that the operators are in the correct sequence. It is compulsory that the macro should be defined before it can be used in parameter values. The macro is one of the advanced topics of RapidMiner, please study the attached Example Processes to develop a better understanding of macros.

There are also some predefined macros:

- %{process name}: will be replaced by the name of the process (without path and extension)

- %{process file}: will be replaced by the file name of the process (with extension)

- %{process path}: will be replaced by the complete absolute path of the process file

- Several other short macros also exist, e.g. %{a} for the number of times the current operator was applied.

Please note that other operators like many of the loop operators (e.g. Loop Values , Loop Attributes) also add specific macros.

Input Ports

through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Generate Macro operator is available at the first through output port.

Output Ports

through (thr) The objects that were given as input are passed without changing to the output through this port. It is not compulsory to attach this port to any other port; the macro value is calculated even if this port is left without connections. The Generate Macro operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Generate Macro operator is delivered at the first through output port.


Parameters

function descriptions The list of macro names together with the expressions which define the new macros is specified through this parameter.

use standard constants (boolean) This parameter indicates if standard constants like 'e' or 'pi' should be available. If checked, these constants can be used in expressions for generating new macros.

Tutorial Processes

Generating a new macro from an already existing macro

This Example Process discusses an imaginary scenario to explain how the Generate Macro operator can be used to define a new macro from an existing macro. Suppose we want to apply the k-NN operator on an ExampleSet such that the value of the k parameter of the k-NN operator is set dynamically to some ratio of the number of examples in the input ExampleSet. Let us assume that the required ratio is k = n * 0.025 where n is the number of examples in the input ExampleSet.

The Polynomial data set is loaded using the Retrieve operator. The Extract Macro operator is applied on it to create a new macro. The macro parameter is set to 'example count' and the macro type parameter is set to 'number of examples'. Thus a macro is created with the name 'example count'. The value of this macro is equal to the number of examples in the Polynomial data set. Then the Generate Macro operator is applied. Only one macro is defined in the function descriptions parameter. The macro is named 'value k' and it is defined by this expression: ceil(%{example count} * 0.025). Note the use of the 'example count' macro here. When this process is executed, '%{example count}' is replaced by the value of the 'example count' macro, which is equal to the number of examples in the input ExampleSet, i.e. 200. Thus at run-time the expression that defines the 'value k' macro is evaluated as ceil(200 * 0.025). The result of this expression is 5. The k-NN operator is applied on the Polynomial data set with the parameter k set to '%{value k}'. At run-time '%{value k}' will be replaced by the actual value of the 'value k' macro, which is 5. Thus the k-NN operator will be applied on the Polynomial data set with the parameter k set to 5. This can be verified by running the process and viewing the results in the Results Workspace.
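The arithmetic of this tutorial can be checked directly; the short Python sketch below just reproduces the ceil(200 * 0.025) computation that defines the 'value k' macro.

```python
import math

# The 'value k' macro from the tutorial: k = ceil(n * 0.025), where n is
# the number of examples (200 for the Polynomial data set).
example_count = 200
value_k = math.ceil(example_count * 0.025)
print(value_k)  # 5
```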

Extract Macro

This operator can be used to define a single macro which can be used by %{macro name} in parameter values of succeeding operators of the current process. The macro value will be derived from the input ExampleSet. A macro can be considered as a value that can be used by all operators of the current process that come after the macro has been defined. This operator can also be used to re-define an existing macro.

Description

This operator can be used to define a single macro which can be used in parameter values of succeeding operators of the current process. Once the macro has been defined, the value of that macro can be used as a parameter value in succeeding operators by writing the macro name in %{macro name} format in the parameter value, where 'macro name' is the name of the macro specified when it was defined. In the Extract Macro operator the macro name is specified by the macro parameter. The macro will be replaced in the value strings of parameters by the macro's value. This operator can also be used to re-define an existing macro.

This operator sets the value of a single macro from properties of a given input ExampleSet, such as the number of examples or number of attributes of the input ExampleSet. A specific data value of the input ExampleSet can also be used to set the value of the macro, as can various statistical properties of the input ExampleSet, e.g. the average, minimum or maximum value of an attribute. All these options can be understood by studying the parameters and the attached Example Processes. The Set Macro operator can also be used to define a macro but it does not set the value of the macro from properties of a given input ExampleSet.

Macros

A macro can be considered as a value that can be used by all operators of the current process that come after it has been defined. Whenever using macros, make sure that the operators are in the correct sequence. It is compulsory that the macro should be defined before it can be used in parameter values. The macro is one of the advanced topics of RapidMiner, please study the attached Example Processes to develop a better understanding of macros.

There are also some predefined macros:

- %{process name}: will be replaced by the name of the process (without path and extension)

- %{process file}: will be replaced by the file name of the process (with extension)

- %{process path}: will be replaced by the complete absolute path of the process file

- Several other short macros also exist, e.g. %{a} for the number of times the current operator was applied.

Please note that other operators like many of the loop operators (e.g. Loop Values , Loop Attributes) also add specific macros.

Input Ports

example set input (exa) This input port expects an ExampleSet. The macro value will be extracted from this ExampleSet.

Output Ports

example set output (exa) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators. It is not compulsory to attach this port to any other port; the macro value is set even if this port is left without connections.

Parameters

macro (string) This parameter is used to name the macro. The macro can be accessed in succeeding operators of the current process by writing the macro name in %{macro name} format, where 'macro name' is the name of the macro specified by this parameter.

macro type This parameter indicates the way the input ExampleSet should be used to define the macro.

- number of examples If this option is selected, the macro value is set to the total number of examples in the input ExampleSet.

- number of attributes If this option is selected, the macro value is set to the total number of attributes in the input ExampleSet.

- data value If this option is selected, the macro value is set to the value of the specified attribute at the specified index. The attribute is specified using the attribute name parameter and the index is specified using the example index parameter.

- statistics If this option is selected, the macro value is set to the value obtained by applying the selected statistical operation on the specified attribute. The attribute is specified using the attribute name parameter and the statistical operator is selected using the statistics parameter.

statistics This parameter is only available when the macro type parameter is set to 'statistics'. This parameter allows you to select the statistical operator to be applied on the attribute specified by the attribute name parameter.

attribute name (string) This parameter is only available when the macro type parameter is set to 'statistics' or 'data value'. This parameter allows you to select the required attribute.

attribute value (string) This parameter is only available when the macro type parameter is set to 'statistics' and the statistics parameter is set to 'count'. This parameter is used to specify a particular value of the specified attribute. The macro value will be set to the number of occurrences of this value in the specified attribute. The attribute is specified by the attribute name parameter.

example index (integer) This parameter is only available when the macro type parameter is set to 'data value'. This parameter allows you to select the index of the required example of the attribute specified by the attribute name parameter and the optional additional macros parameter.

additional macros This parameter is only available when the macro type parameter is set to 'data value'. This optional parameter allows you to add an unlimited number of additional macros. Note that the value of the example index parameter is used for all macros in this list.


Tutorial Processes

Introduction to the Extract Macro operator

This is a very basic process that demonstrates the use of macros and the Extract Macro operator. The 'Golf' data set is loaded using the Retrieve operator. The Extract Macro operator is applied on it. The macro is named 'avg temp'. The macro type parameter is set to 'statistics', the statistics parameter is set to 'average' and the attribute name parameter is set to 'Temperature'. Thus the value of the avg temp macro is set to the average of the values of the 'Golf' data set's Temperature attribute, which over all 14 examples is 73.571. In this process, wherever %{avg temp} is used in parameter values, it will be replaced by the value of the avg temp macro, i.e. 73.571. Note that the output port of the Extract Macro operator is not connected to any other operator but the avg temp macro has still been created.
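The averaging step can be mimicked in plain Python. The 14 Temperature values below are taken from the classic weather data on which the 'Golf' set is based and reproduce the 73.571 average quoted above; the test-set values are invented for illustration.

```python
# Temperatures of the classic 14-example weather ('Golf') data set:
golf_temps = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

# Plays the role of the 'avg temp' macro set by Extract Macro:
avg_temp = sum(golf_temps) / len(golf_temps)
print(round(avg_temp, 3))  # 73.571

# Hypothetical 'Golf-Testset' temperatures, filtered like
# Filter Examples with 'Temperature > %{avg temp}':
testset_temps = [85, 80, 83, 70, 68, 65]
kept = [t for t in testset_temps if t > avg_temp]
print(kept)  # [85, 80, 83]
```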

The 'Golf-Testset' data set is loaded using the Retrieve operator. The Filter Examples operator is applied on it. The condition class parameter is set to 'attribute value filter'. The parameter string parameter is set to 'Temperature > %{avg temp}'. Note the use of the avg temp macro. When this process is run, %{avg temp} will be replaced by the value of the avg temp macro, i.e. 73.571. Thus only those examples of the Golf-Testset data set will make it to the output port where the value of the Temperature attribute is greater than the average value of the Temperature attribute of the Golf data set (i.e. 73.571). You can verify this by viewing the results in the Results Workspace.


Redefining a macro using the Extract Macro operator

The focus of this Example Process is to show how macros can be redefined using the Extract Macro operator. This process is almost the same as the first Example Process. The only difference is that after the avg temp macro has been defined, the same macro is redefined using the 'Golf' data set and the Extract Macro operator. The 'Golf' data set is loaded again and is provided to the second Extract Macro operator. In this Extract Macro operator the macro parameter is set to 'avg temp' and the macro type parameter is set to 'number of examples'. As the avg temp macro already exists, no new macro is created; the already existing macro is redefined. As the number of examples in the 'Golf' data set is 14, avg temp is redefined as 14. Thus in the Filter Examples operator the value of the Temperature attribute of the 'Golf-Testset' data set is compared with 14 instead of 73.571. This can be verified by viewing the results in the Results Workspace. Please note that macros are redefined depending on their order of execution.


Use of the Extract Macro operator in complex preprocessing

This Example Process is also part of the RapidMiner tutorial. It is included here to show the usage of the Extract Macro operator in complex preprocessing. This process covers a number of concepts related to macros, including redefining macros, the macro of the Loop Values operator and the use of the Extract Macro operator. This process starts with a subprocess which is used to generate data. What happens inside this subprocess is not relevant to the use of macros, so it is not discussed here. A breakpoint is inserted after this subprocess so that you can view the ExampleSet. You can see that the ExampleSet has 12 examples and 2 attributes: 'att1' and 'att2'. 'att1' is nominal and has 3 possible values: 'range1', 'range2' and 'range3'. 'att2' has real values.

The Loop Values operator is applied on the ExampleSet. It iterates over the values of the specified attribute (i.e. att1) and applies the inner operators on the given ExampleSet, while the current value can be accessed via the macro defined by the iteration macro parameter, which is set to 'loop value'; thus the current value can be accessed by specifying %{loop value} in parameter values. As att1 has 3 possible values, Loop Values will iterate 3 times, once for each possible value of att1.


Here is an explanation of what happens inside the Loop Values operator. It is provided with an ExampleSet as input. The Filter Examples operator is applied on it. The condition class parameter is set to 'attribute value filter' and the parameter string is set to 'att1 = %{loop value}'. Note the use of the loop value macro here. Only those examples are selected where the value of att1 is equal to the value of the loop value macro. A breakpoint is inserted here so that you can view the selected examples. Then the Aggregation operator is applied on the selected examples. It is configured to take the average of the att2 values of the selected examples. This average value is stored in a new ExampleSet in the attribute named 'average(att2)'. A breakpoint is inserted here so that you can see the average of the att2 values of the selected examples. The Extract Macro operator is applied on this new ExampleSet to store this average value in a macro named 'current average'. The originally selected examples are passed to the Generate Attributes operator that generates a new attribute named 'att2 abs avg' which is defined by the expression 'abs(att2 - %{current average})'. Note the use of the current average macro here. Its value is subtracted from all values of att2 and the absolute differences are stored in the new 'att2 abs avg' attribute. The resultant ExampleSet is delivered at the output of the Loop Values operator. A breakpoint is inserted here so that you can see the ExampleSet with the 'att2 abs avg' attribute. This output is fed to the Append operator in the main process. It merges the results of all the iterations into a single ExampleSet which is visible at the end of this process in the Results Workspace.
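The mechanics of this loop (filter by the current value, average, subtract, append) can be sketched in plain Python. The att2 numbers below are made up; only the structure mirrors the process.

```python
# Toy stand-in for the Loop Values walkthrough: group the examples by
# 'att1', take each group's mean of 'att2', and derive 'att2_abs_avg'.
examples = [
    {"att1": "range1", "att2": -1.0}, {"att1": "range1", "att2": -1.3},
    {"att1": "range2", "att2": -2.0}, {"att1": "range2", "att2": -1.3},
    {"att1": "range3", "att2": 1.5},  {"att1": "range3", "att2": 1.2},
]

result = []
for value in sorted({e["att1"] for e in examples}):  # Loop Values iterations
    # Filter Examples: keep rows where att1 equals the current loop value.
    selected = [e for e in examples if e["att1"] == value]
    # Aggregate + Extract Macro: the group's average plays 'current average'.
    current_average = sum(e["att2"] for e in selected) / len(selected)
    # Generate Attributes: abs(att2 - %{current average}).
    for e in selected:
        result.append({**e, "att2_abs_avg": abs(e["att2"] - current_average)})

print(len(result))  # 6 -- Append merges all iterations into one set
```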

Here is what you see when you run the process.

- ExampleSet generated by the Generate Data subprocess. Then the process enters the Loop Values operator and iterates 3 times.

Iteration 1:

- ExampleSet where the 'att1' value is equal to the current value of the loop value macro, i.e. 'range1'

- Average of 'att2' values for the selected examples. The average is -1.161.

- ExampleSet with the 'att2 abs avg' attribute for iteration 1.

Iteration 2:

- ExampleSet where the 'att1' value is equal to the current value of the loop value macro, i.e. 'range2'

- Average of 'att2' values for the selected examples. The average is -1.656.

- ExampleSet with the 'att2 abs avg' attribute for iteration 2.

Iteration 3:

- ExampleSet where the 'att1' value is equal to the current value of the loop value macro, i.e. 'range3'

- Average of 'att2' values for the selected examples. The average is 1.340.

- ExampleSet with the 'att2 abs avg' attribute for iteration 3.

Now the process comes out of the Loop Values operator and the Append operator merges the final ExampleSets of all three iterations into a single ExampleSet that you can see in the Results Workspace.

Log

This operator stores information into the log table. This information can be almost anything including parameter values of operators, apply-count of operators, execution time etc. The stored information can be plotted by the GUI when the process execution is complete. Moreover, the information can also be written into a file.

Description

The Log operator is mostly used when you want to see values calculated during the execution of the process that are otherwise not visible. For example, you may want to see the values of different parameters in all iterations of a Loop operator. In such scenarios the ideal operator is the Log operator. A large variety of information can be stored using this operator. Values of all parameters of all operators in the current process can be stored. Other information like apply-count, cpu-time, execution-time, loop-time etc. can also be stored. The information stored in the log table can be viewed using the Table View in the Results Workspace. The information can also be analyzed in the form of various graphs using the Plot View in the Results Workspace. The information can also be written directly into a file using the filename parameter.

The log parameter is used for specifying the information to be stored. The column name option specifies the name of the column in the log table (and/or file). Then you can select any operator from the drop down menu. Once you have selected an operator, you have two choices. You can either store a parameter value or store other values. If you opt for the parameter value, you can choose any parameter of the selected operator through the drop down menu. If you opt for other values, you can choose any value like apply-count, cpu-time etc from the last drop down menu.

Each time the Log operator is applied, all the values and parameters specified by the log parameter are collected and stored in a data row. When the process finishes, the operator writes the collected data rows into a file (if the filename parameter has a valid path). In GUI mode, 2D or 3D plots are automatically generated and displayed in the Results Workspace. Please study the attached Example Processes to understand the working of this operator.
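This collect-a-row-per-application, write-at-the-end behavior can be modeled with a minimal Python sketch; the column names and values below are invented for illustration and have nothing to do with RapidMiner's internals.

```python
import csv

# Minimal stand-in for the Log operator: each "application" appends one
# data row of the tracked values; at the end the rows go to a CSV file.
columns = ["iteration", "applycount", "performance"]
rows = []

def log(**values):
    """Called once per Log application: collect one data row."""
    rows.append([values[c] for c in columns])

for i in range(3):
    log(iteration=i, applycount=i + 1, performance=0.8 + i * 0.05)

# "When the process finishes": write the collected rows to a file.
with open("log_table.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerows(rows)

print(len(rows))  # 3
```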

2. Utility

Input Ports

through (thr) It is not compulsory to connect any object to this port. Any object connected at this port is delivered without any modification to the output port. This operator can have multiple inputs: when one input is connected, another through input port becomes available, ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Log operator is available at the first through output port.

Output Ports

through (thr) The objects that were given as input are passed to the output through this port without any modification. It is not compulsory to attach this port to any other port. The Log operator can have multiple outputs: when one output is connected, another through output port becomes available, ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Log operator is delivered at the first through output port.

Parameters

filename (filename) This parameter is used if you want to write the stored values into a file. The path of the file is specified here.

log (list) This is the most important parameter of this operator. It specifies the values that should be stored by the Log operator. The column name option specifies the name of the column in the log table (and/or file). Then you can select any operator from the drop-down menu. Once you have selected an operator, you have two choices: you can either store a parameter value or store other values. If you opt for the parameter value, you can choose any parameter of the selected operator through the drop-down menu. If you opt for other values, you can choose any value like apply-count or cpu-time from the last drop-down menu.

sorting type (selection) This parameter indicates if the logged values should be sorted according to the specified dimension.

sorting dimension (string) This parameter is only available when the sorting type parameter is set to 'top-k' or 'bottom-k'. It specifies the dimension that is to be used for sorting.

sorting k (integer) This parameter is only available when the sorting type parameter is set to 'top-k' or 'bottom-k'. Only k results will be kept.

persistent (boolean) This is an expert parameter. It indicates if the results should be written to the specified file immediately.

Tutorial Processes

Introduction to the Log operator

This Example Process shows the usage of the Log and Extract Macro operators in complex preprocessing. Besides concepts related to the Log operator, this process covers a number of macro concepts, including redefining macros, the macro of the Loop Values operator and the use of the Extract Macro operator. The process starts with a subprocess which is used to generate data. What happens inside this subprocess is not relevant to the use of the Log operator, so it is not discussed here. A breakpoint is inserted after this subprocess so that you can view the ExampleSet. You can see that the ExampleSet has 12 examples and 2 attributes: 'att1' and 'att2'. 'att1' is nominal and has 3 possible values: 'range1', 'range2' and 'range3'. 'att2' has real values.

The Loop Values operator is applied on the ExampleSet. It iterates over the values of the specified attribute (i.e. att1) and applies the inner operators on the given ExampleSet while the current value can be accessed via the macro defined by the iteration macro parameter. The iteration macro parameter is set to 'loop value', thus the current value can be accessed by specifying %{loop value} in the parameter values. As att1 has 3 possible values, the Loop Values operator will iterate 3 times, once for each possible value of att1.

Here is an explanation of what happens inside the Loop Values operator. The Loop Values operator is provided with an ExampleSet as input. The Filter Examples operator is applied on it. The condition class parameter is set to 'attribute value filter' and the parameter string is set to 'att1 = %{loop value}'. Note the use of the loop value macro here: only those examples are selected where the value of att1 is equal to the value of the loop value macro. A breakpoint is inserted here so that you can view the selected examples. Then the Aggregate operator is applied on the selected examples. It is configured to take the average of the att2 values of the selected examples. This average value is stored in a new ExampleSet in an attribute named 'average(att2)'. A breakpoint is inserted here so that you can see the average of the att2 values of the selected examples. The Extract Macro operator is applied on this new ExampleSet to store this average value in a macro named 'current average'. The originally selected examples are passed to the Generate Attributes operator that generates a new attribute named 'att2 abs avg'. This attribute is defined by the expression 'abs(att2 - %{current average})'. Note the use of the current average macro here: the value of the current average macro is subtracted from each value of att2 and the absolute value of the difference is stored in the new attribute 'att2 abs avg'. The resultant ExampleSet is delivered at the output of the Loop Values operator. A breakpoint is inserted here so that you can see the ExampleSet with the 'att2 abs avg' attribute. This output is fed to the Append operator in the main process. The Append operator merges the results of all the iterations into a single ExampleSet which is visible at the end of this process in the Results Workspace.
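The per-iteration computation (filter by the loop value, average att2, take absolute deviations, append) can be sketched in a self-contained way. This is an illustrative analog, not RapidMiner code, and the data values are invented:

```python
# Illustrative analog of the Loop Values subprocess: for each value of
# att1, average att2 over the matching examples and replace each value
# by its absolute deviation from that group average. Data is invented.
examples = [
    ("range1", 1.0), ("range1", 3.0),
    ("range2", 2.0), ("range2", 6.0),
]

result = []
for group in sorted({a1 for a1, _ in examples}):          # one "iteration" per att1 value
    selected = [a2 for a1, a2 in examples if a1 == group]  # Filter Examples
    avg = sum(selected) / len(selected)                    # Aggregate + Extract Macro
    result.extend(abs(v - avg) for v in selected)          # Generate Attributes
# result now plays the role of the Append operator's merged output

print(result)  # → [1.0, 1.0, 2.0, 2.0]
```

Here 'range1' has average 2.0 and 'range2' has average 4.0, so each value becomes its distance from its own group's mean, just as 'att2 abs avg' does per iteration.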

Note the Log operator in the subprocess of the Loop Values operator. Three columns are created using the log parameter. The 'Average att2' column stores the value of the macro of the Extract Macro operator. The 'Iteration' column stores the apply-count of the Aggregate operator, which is the same as the number of iterations of the Loop Values operator. The 'att1 value' column stores the value of att1 in the current iteration. At the end of the process, you will see that the Log operator stores a lot of information that was not directly accessible. Moreover, it displays all the required information at the end of the process, so breakpoints are not required.

Also note that the filename parameter of the Log operator is set to 'D:\log.txt'. Thus a text file named 'log.txt' is created on your 'D' drive. This file contains the information stored during this process by the Log operator.

Here is what you see when you run the process:

- The ExampleSet generated by the Generate Data subprocess. Then the process enters the Loop Values operator and iterates 3 times.

Iteration 1:

- The ExampleSet where the 'att1' value is equal to the current value of the loop value macro, i.e. 'range1'.

- The average of the 'att2' values for the selected examples. The average is -1.161.

- The ExampleSet with the 'att2 abs avg' attribute for iteration 1.

Iteration 2:

- The ExampleSet where the 'att1' value is equal to the current value of the loop value macro, i.e. 'range2'.

- The average of the 'att2' values for the selected examples. The average is -1.656.

- The ExampleSet with the 'att2 abs avg' attribute for iteration 2.

Iteration 3:

- The ExampleSet where the 'att1' value is equal to the current value of the loop value macro, i.e. 'range3'.

- The average of the 'att2' values for the selected examples. The average is 1.340.

- The ExampleSet with the 'att2 abs avg' attribute for iteration 3.

Now the process comes out of the Loop Values operator and the Append operator merges the final ExampleSets of all three iterations into a single ExampleSet that you can see in the Results Workspace.

Now have a look at the results of the Log operator. You can see all the required values in tabular form using the Table View; all the values that were viewed using breakpoints are available in a single table. You can see the results in the Plot View as well. Also have a look at the file stored on the 'D' drive. This file contains exactly the same information.

Viewing Training vs Testing error using the Log operator

The 'Weighting' data set is loaded using the Retrieve operator. The Loop Parameters operator is applied on it. The parameters of the Loop Parameters operator are set such that this operator loops 25 times; thus its subprocess is executed 25 times. In every iteration, the value of the C parameter of the SVM (LibSVM) operator is changed. The value of the C parameter is 0.001 in the first iteration and is increased logarithmically until it reaches 100000 in the last iteration.
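A logarithmic sweep from 0.001 to 100000 in 25 steps can be reproduced with a short self-contained sketch (this illustrates the parameter grid only, not RapidMiner's Loop Parameters mechanics):

```python
# Sketch of the logarithmic parameter grid used by Loop Parameters:
# 25 values of C from 1e-3 to 1e5, evenly spaced on a log scale.
steps = 25
low, high = 1e-3, 1e5
ratio = (high / low) ** (1 / (steps - 1))   # constant multiplicative step
grid = [low * ratio**i for i in range(steps)]

print(grid[0], grid[-1])   # 0.001 ... 100000 (up to floating-point rounding)
```

Each iteration multiplies C by the same ratio, which is what "increased logarithmically" means here: equal steps in log space rather than equal additive steps.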

Have a look at the subprocess of the Loop Parameters operator. First the data is split into two equal partitions using the Split Data operator. The SVM (LibSVM) operator is applied on one partition. The resultant classification model is applied using two Apply Model operators on both partitions. The statistical performance of the SVM model on both the testing and training partitions is measured using the Performance (Classification) operators. At the end the Log operator is used to store the required results.


The log parameter of the Log operator stores four things.

1. The iterations of the Loop Parameters operator are counted by the apply-count of the SVM operator. This is stored in a column named 'Count'.

2. The value of the classification error of the Performance (Classification) operator that was applied on the training partition is stored in a column named 'Training Error'.

3. The value of the classification error of the Performance (Classification) operator that was applied on the testing partition is stored in a column named 'Testing Error'.

4. The value of the C parameter of the SVM (LibSVM) operator is stored in a column named 'SVM C'.

Also note that the stored information will be written into a file as specified in the filename parameter.

Run the process and turn to the Results Workspace. You can see all the values in the Table View. This table can be used to study how the classification errors on the training and testing partitions behave as the value of the C parameter of the SVM (LibSVM) operator increases. To view these results in graphical form, switch to the Plot View and select an appropriate plotter. You can use the 'Series Multiple' plotter with 'SVM C' as the 'Index Dimension' and 'Training Error' and 'Testing Error' selected in the 'Plot Series'. The 'Scatter Multiple' plotter can also be used. Now you can analyze how the training and testing errors behave as the parameter C increases.


Provide Macro as Log Value

This operator reads the value of the specified macro and provides the macro value for logging. This operator is only useful if the required macro value cannot be logged directly.

Description

The Provide Macro as Log Value operator can be used to log the current value of the specified macro. The name of the required macro is specified in the macro name parameter. Most operators provide the macro they define as a loggable value and in these cases this value can be logged directly. But in all other cases where the operator does not provide a loggable value for the defined macro, this operator may be used to do so. Please note that the value will be logged as a nominal value even if it is actually a numerical value.

You must be familiar with the basic concepts of macros and logging in order to understand this operator completely. Some basics of macros and logging are discussed in the coming paragraphs.

A macro can be considered as a value that can be used by all operators of the current process that come after the macro has been defined. Once the macro has been defined, its value can be used as a parameter value in subsequent operators by writing the macro name in %{macro name} format in the parameter value, where 'macro name' is the name of the macro. Please note that many operators, like many of the loop operators (e.g. Loop Values, Loop Attributes), also define specific macros. For more information regarding macros please study the Set Macro operator.
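The %{macro name} substitution can be illustrated with a small self-contained sketch. This is a plain string replacement, not RapidMiner's actual expansion engine:

```python
import re

# Illustrative sketch of %{name} macro expansion in parameter values.
# RapidMiner's real engine is more involved; this only shows the idea.
def expand_macros(parameter_value, macros):
    # replace every %{name} with the macro's current value
    return re.sub(r"%\{([^}]+)\}",
                  lambda m: str(macros[m.group(1)]),
                  parameter_value)

macros = {"loop value": "range1"}
print(expand_macros("att1 = %{loop value}", macros))  # → att1 = range1
```

This is the substitution used, for example, by the Filter Examples parameter string 'att1 = %{loop value}' in the Log operator's tutorial process.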

Logging and log-related operators store information in the log table. This information can be almost anything, including parameter values of operators, the apply-count of operators, execution time etc. The Log operator is mostly used when you want to see values calculated during the execution of the process that are otherwise not visible, for example the values of different parameters in the iterations of a Loop operator. For more information regarding logging please study the Log operator.

Input Ports

through (thr) It is not compulsory to connect any object to this port. Any object connected at this port is delivered without any modification to the output port. This operator can have multiple inputs: when one input is connected, another through input port becomes available, ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Provide Macro as Log Value operator is available at the first through output port.

Output Ports

through (thr) The object that was given as input is passed to the output through this port without any modification. It is not compulsory to attach this port to any other port. The Provide Macro as Log Value operator can have multiple outputs: when one output is connected, another through output port becomes available, ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Provide Macro as Log Value operator is delivered at the first through output port.

Parameters

macro name (string) This parameter specifies the name of the macro whose value should be provided for logging.


Tutorial Processes

Logging the value of a macro with or without the Provide Macro as Log Value operator

This Example Process shows how macro values can be logged with or without the Provide Macro as Log Value operator. The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. As you can see, there are 14 examples in the ExampleSet. The Extract Macro operator is applied on the ExampleSet. The macro of the Extract Macro operator is named 'eCount' and it stores the number of examples in the ExampleSet, i.e. 14. The Provide Macro as Log Value operator is applied to provide the value of the 'eCount' macro as a loggable value. Finally the Log operator is applied to store the value of the 'eCount' macro in the log table. Have a look at the log parameter settings in the Log operator. Two columns have been defined: 'Direct' and 'Indirect'. The 'Direct' column gets the value of the 'eCount' macro directly from the Extract Macro operator; this only works if the macro is provided in loggable form by the operator. The 'Indirect' column gets the value of the 'eCount' macro from the Provide Macro as Log Value operator; this is how a macro value should be logged if it cannot be logged directly. Run the process and you will see two columns in the Table View of the results of the Log operator. Both columns have the value of the 'eCount' macro (14).


Log to Data

This operator transforms the data generated by the Log operator into an ExampleSet which can then be used by other operators of the pro- cess.

Description

The Log operator stores information in the log table. This information can be almost anything, including parameter values of operators, the apply-count of operators, execution time etc. The Log operator is mostly used when you want to see values calculated during the execution of the process that are otherwise not visible, for example the values of different parameters in the iterations of a Loop operator. In such scenarios the Log operator is ideal, and a large variety of information can be stored with it. The information stored in the log table can be viewed using the Table View in the Results Workspace, but it is not directly accessible in the process. To solve this problem, the Log to Data operator provides the information in the log table in the form of an ExampleSet. This ExampleSet can be used in the process like any other ExampleSet. RapidMiner automatically guesses the types of the attributes of this ExampleSet and all attributes have a regular role. The type and role can be changed by using the corresponding operators.
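The conversion from a log table into an example table with guessed attribute types can be pictured with this self-contained sketch. The type-guessing rule here (try float, otherwise keep strings) is an invented simplification, not RapidMiner's actual heuristic:

```python
# Illustrative sketch of Log to Data: turn collected log rows into
# columns and guess a type per column. The guessing rule (try float,
# else keep nominal strings) is a simplification for this example.
log_rows = [
    {"Count": "1", "SVM C": "0.001"},
    {"Count": "2", "SVM C": "0.01"},
]

def to_example_set(rows):
    columns = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        try:
            values = [float(v) for v in values]   # numerical attribute
        except ValueError:
            pass                                  # nominal attribute
        columns[name] = values
    return columns

example_set = to_example_set(log_rows)
print(example_set)  # → {'Count': [1.0, 2.0], 'SVM C': [0.001, 0.01]}
```

The result is a column-oriented table that downstream operators could consume, which is exactly the role the Log to Data output plays in a process.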

Input Ports

through (thr) It is not compulsory to connect any object to this port. Any object connected at this port is delivered without any modification to the output port. This operator can have multiple inputs: when one input is connected, another through input port becomes available, ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Log to Data operator is available at the first through output port.

Output Ports

example set (exa) The data generated by the Log operator is delivered as an ExampleSet through this port.

through (thr) The objects that were given as input are passed to the output through this port without any modification. It is not compulsory to attach this port to any other port. The Log to Data operator can have multiple outputs: when one output is connected, another through output port becomes available, ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Log to Data operator is delivered at the first through output port.

Parameters

log name (string) This parameter specifies the name of the Log operator that generated the log data which should be returned as an ExampleSet. If this parameter is left blank then the first data table found is returned as an ExampleSet.

Tutorial Processes

Accessing Training vs Testing error using the Log and Log to Data operators

The 'Weighting' data set is loaded using the Retrieve operator. The Loop Parameters operator is applied on it. The parameters of the Loop Parameters operator are set such that this operator loops 25 times; thus its subprocess is executed 25 times. In every iteration, the value of the C parameter of the SVM (LibSVM) operator is changed. The value of the C parameter is 0.001 in the first iteration and is increased logarithmically until it reaches 100000 in the last iteration.

Have a look at the subprocess of the Loop Parameters operator. First the data is split into two equal partitions using the Split Data operator. The SVM (LibSVM) operator is applied on one partition. The resultant classification model is applied using two Apply Model operators on both partitions. The statistical performance of the SVM model on both the testing and training partitions is measured using the Performance (Classification) operators. At the end the Log operator is used to store the required results.

The log parameter of the Log operator stores four things.

1. The iterations of the Loop Parameters operator are counted by the apply-count of the SVM operator. This is stored in a column named 'Count'.

2. The value of the classification error of the Performance (Classification) operator that was applied on the training partition is stored in a column named 'Training Error'.

3. The value of the classification error of the Performance (Classification) operator that was applied on the testing partition is stored in a column named 'Testing Error'.

4. The value of the C parameter of the SVM (LibSVM) operator is stored in a column named 'SVM C'.

In the main process, the Log to Data operator is used to provide the log values in the form of an ExampleSet. The resultant ExampleSet is connected to the result port of the process and can be seen in the Results Workspace. You can see the meta data of the ExampleSet in the Meta Data View and the values of the ExampleSet in the Data View. This ExampleSet can be used to study how the classification errors on the training and testing partitions behave as the value of the C parameter of the SVM (LibSVM) operator increases. To view these results in graphical form, switch to the Plot View and select an appropriate plotter. You can use the 'Series Multiple' plotter with 'SVM C' as the 'Index Dimension' and 'Training Error' and 'Testing Error' selected in the 'Plot Series'.

The 'Scatter Multiple' plotter can also be used. Now you can analyze how the training and testing errors behave as the parameter C increases. More importantly, this ExampleSet is available in the process, so the information stored in it can be used by other operators of the process.

Execute Process

This operator embeds a complete process (previously written into a file) into the current process.

Description

This operator can be used to embed a complete process definition of a saved process into the current process definition. The saved process will be loaded and executed when the current process reaches this operator. Optionally, the input of this operator can be used as input for the embedded process. In both cases, the output of the saved process will be delivered as output of this operator. Please note that validation checks will not work for a process containing an operator of this type since the check cannot be performed without actually loading the process. The use of this operator can be easily understood by studying the attached Example Process.

2.3. Execution

Input Ports

input (inp) The Execute Process operator can have multiple inputs: when one input port is connected, another input port becomes available, ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first input port of the Execute Process operator is available at the first input port of the embedded process. Don't forget to connect all inputs in the correct order and make sure that you have connected the right number of ports.

Output Ports

result (res) The Execute Process operator can have multiple outputs: when one result port is connected, another result port becomes available, ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first output port of the embedded process is delivered at the first result port of the Execute Process operator. Don't forget to connect all outputs in the correct order and make sure that you have connected the right number of ports.

Parameters

process location The location of the process to be embedded is provided here.

use input (boolean) This is an expert parameter. It indicates if the input of this operator should be used as input for the embedded process. This should always be set to true if you want to provide input to the embedded process through the current process.

store output (boolean) This is an expert parameter. It indicates if the operator output should be stored. This applies only if the context of the embedded process defines output locations.

propagate metadata recursively (boolean) This is an expert parameter. It determines whether meta data is propagated through the included process.

cache process (boolean) This is an expert parameter. It determines if the process should be cached. If it is checked, the process is loaded once and not reloaded at every execution.

macros This is an expert parameter. It defines macros for this sub-process.

Tutorial Processes

The process to be used as an embedded process in the next Example Process

This process does not use the Execute Process operator; rather it is used as an embedded process in the second Example Process. This process uses the Decision Tree operator twice. In both cases no input is provided to the operators within this process. The input ports of the operators are connected to the input ports of the process, thus these operators will receive their inputs via another process, as we shall see soon. Such a process cannot work on its own because inputs are missing. Only the mod port is connected to the output in both operators. Always make sure that the input ports of this process are connected in the right way whenever you want this process to receive inputs from other processes.

Executing an embedded process


The Execute Process operator is used in this process. The process location parameter supplies the location of the first Example Process. Make sure you provide the same location here that you used to save the first Example Process. The Retrieve operator is used twice: first it loads the 'Golf' data set and then it loads the 'Labor-Negotiations' data set. These data sets are sent as input to the embedded process. The object connected to the first input port of the Execute Process operator is received at the first input port of the embedded process. Thus the 'Golf' data set is provided as input to the first Decision Tree operator. Similarly, the 'Labor-Negotiations' data set is provided as input to the second Decision Tree operator. Passing input to the embedded process was possible because the use input parameter was checked; if you uncheck it and run the process again, you will get an error message. Outputs of the embedded process are delivered as outputs of the Execute Process operator in the current process. The order of outputs remains the same. The Decision Tree model of the 'Labor-Negotiations' data set is connected to the second res port of the first Example Process, thus it is available at the second res port of the Execute Process operator in the current process.

Execute Script

This operator executes Java code and/or Groovy scripts. This basically means that users can write their own operators directly within the process by specifying Java code and/or a Groovy script which will be interpreted and executed during the process runtime.

Description

This is a very powerful operator because it allows you to write your own script. However, this operator should only be used if the task you want to perform cannot be performed by existing RapidMiner operators, because writing scripts can be time-consuming and error-prone.

Groovy is an agile and dynamic language for the Java Virtual Machine. It builds upon the strengths of Java but has additional power features inspired by languages like Python, Ruby and Smalltalk. Groovy integrates well with all existing Java classes and libraries because it compiles straight to Java bytecode, so you can use it anywhere you can use Java. For a complete reference of Groovy scripts please refer to http://groovy.codehaus.org/.

In addition to the usual scripting code elements from Groovy, the RapidMiner scripting operator defines some special scripting elements:

- If the standard imports parameter is set to true, all important types like Example, ExampleSet, Attribute, Operator etc. as well as the most important Java types like collections are automatically imported and can directly be used within the script. Hence, there is no need to import them in your script. However, you can import any other class you want and use it in your script.

- The current operator (the scripting operator for which you define the script) is referenced by operator.

– Example: operator.log("text")

- All operator methods like log (see above) that access the input or the complete process can directly be used by writing a preceding operator.

– Example: operator.getProcess()

- The input of the operator can be retrieved via the input method getInput(Class) of the surrounding operator.

– Example: ExampleSet exampleSet = operator.getInput(ExampleSet.class)

- You can iterate over examples with the following construct:

– for (Example example : exampleSet) { ... }

- You can retrieve example values with the shortcut:

– In case of non-numeric values: String value = example["attribute name"];

– In case of numeric values: double value = example["attribute name"];

- You can set example values with the shortcut:

– In case of non-numeric values: example["attribute name"] = "value";

– In case of numeric values: example["attribute name"] = 5.7;

Please study the attached Example Processes for a better understanding. Please note that scripts written for this operator may access Java code; scripts may hence become incompatible with future releases of RapidMiner.

Input Ports

input (inp) The Script operator can have multiple inputs: when one input is connected, another input port becomes available, ready to accept another input (if any).

Output Ports

output (out) The Script operator can have multiple outputs: when one output is connected, another output port becomes available, ready to deliver another output (if any).


Parameters

script The script to be executed is specified through this parameter.

standard imports (boolean) If the standard imports parameter is set to true, all important types like Example, ExampleSet, Attribute, Operator etc. as well as the most important Java types like collections are automatically imported and can directly be used within the script. Hence, there is no need to import them in your script. However, you can import any other class you want and use it in your script.

Tutorial Processes

Iterating over attributes for changing the attribute names to lower case

The 'Transactions' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. Note that the names of all attributes of the ExampleSet are in upper case letters. The Script operator is applied on the ExampleSet. The script changes the attribute names to lower case letters. This can be verified by viewing the results in the Results Workspace.

Here is a brief description of what happens in the script. First the input of the operator is retrieved via the input method getInput(Class). Then the for loop iterates over all attributes and uses the toLowerCase() method to change the names of the attributes to lower case letters. At the end, the modified ExampleSet is returned.

Please note that this is a very simple script; it was included here just to introduce you to the working of this operator. This operator can be used to perform very complex tasks.


Iterating over all examples for changing the attribute values to upper case

The 'Transactions' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. Note that the values of all attributes of the ExampleSet are in lower case letters. The Script operator is applied on the ExampleSet. The script changes the attribute values to upper case letters. This can be verified by viewing the results in the Results Workspace.

Here is a brief description of what happens in the script. First the input of the operator is retrieved via the input method getInput(Class). Then the outer for loop iterates over all attributes and stores the name of the current attribute in a string variable. Then the inner for loop iterates over all the examples of the current attribute and changes the values from lower to upper case using the toUpperCase() method. At the end, the modified ExampleSet is returned.

Please note that this is a very simple script; it was included here just to introduce you to the working of this operator. This operator can be used to perform very complex tasks.


Subtracting the mean of numerical attributes from attribute values

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. Note the values of the 'Temperature' and 'Humidity' attributes. The Script operator is applied on the ExampleSet. The script subtracts the mean of each numerical attribute from all values of that attribute. This can be verified by viewing the results in the Results Workspace.

Here is a brief description of what happens in the script. First the input of the operator is retrieved via the getInput(Class) method. Then the outer for loop iterates over all attributes and stores the name of the current attribute in a string variable and the mean of this attribute in a double type variable. Then the inner for loop iterates over all the examples of the current attribute and subtracts the mean from the current value of the example. At the end, the modified ExampleSet is returned.
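The mean-subtraction step for a single attribute can be sketched in self-contained Java (illustrative values only, not the actual 'Golf' data set, and not the RapidMiner script itself):

```java
import java.util.Arrays;

public class CenterColumn {

    // Subtract the column mean from every value of a numerical attribute,
    // as the script's inner loop does for each example.
    static double[] subtractMean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double[] centered = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            centered[i] = values[i] - mean;
        }
        return centered;
    }

    public static void main(String[] args) {
        double[] temperature = {3, 7, 5}; // mean is 5
        System.out.println(Arrays.toString(subtractMean(temperature))); // → [-2.0, 2.0, 0.0]
    }
}
```

After centering, the values of each attribute sum to (approximately) zero.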

Please note that this is a very simple script; it was included here just to introduce you to the working of this operator. This operator can be used to perform very complex tasks.


Execute SQL

This operator executes the specified SQL statement on the specified database.

Description

The Execute SQL operator executes the specified SQL statement on the specified SQL database. The SQL query can be specified through the query parameter. If the SQL query is in a file then the path of that file can be specified through the query file parameter. Please note that this operator cannot be used for loading data from databases. It can be used for executing SQL statements like CREATE or ADD etc. In order to load data from an SQL database, please use the Read Database operator. You need to have at least a basic understanding of databases, database connections and queries in order to use this operator properly. Please go through the parameters and the attached Example Process to understand the working of this operator.


Differentiation

Read Database The Read Database operator is used for loading data from a database into RapidMiner. The Execute SQL operator cannot be used for loading data from databases; it can be used for executing SQL statements like CREATE or ADD etc. on the database. See page 210 for details.

Input Ports through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Execute SQL operator is available at the first through output port.

Output Ports through (thr) The objects that were given as input are passed without changing to the output through this port. It is not compulsory to connect this port to any other port; the SQL command is executed even if this port is left without connections. The Execute SQL operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Execute SQL operator is delivered at the first through output port.


Parameters

define connection (selection) This parameter indicates how the database connection should be specified. It gives you three options: predefined, url and jndi.

connection (selection) This parameter is only available when the define connection parameter is set to predefined. It is used for connecting to a database using a predefined connection. You can have many predefined connections. You can choose one of them using the drop down list. You can add new connections or modify previous connections using the button next to the drop down list. You may also accomplish this by clicking on Manage Database Connections... from the Tools menu in the main window. A new window appears. This window asks for several details e.g. Host, Port, Database system, schema, username and password. The Test button in this new window allows you to check whether the connection can be made. Save the connection once the test is successful. After saving a new connection, it can be chosen from the drop down list of the connection parameter. You need to have a basic understanding of databases for configuring a connection.

database system (selection) This parameter is only available when the define connection parameter is set to url. It is used for selecting the database system in use. It can have one of the following values: MySQL, PostgreSQL, Sybase, HSQLDB, ODBC Bridge (e.g. Access), Microsoft SQL Server (JTDS), Ingres, Oracle.

database url (string) This parameter is only available when the define connection parameter is set to url. It is used for defining the URL connection string for the database, e.g. 'jdbc:://foo.bar:portnr/database'.

username (string) This parameter is only available when the define connection parameter is set to url. It is used for specifying the username of the database.

password (string) This parameter is only available when the define connection parameter is set to url. It is used for specifying the password of the database.

jndi name (string) This parameter is only available when the define connection parameter is set to jndi. It is used for specifying the JNDI name for a data source.

query (string) This parameter is used for specifying the SQL query which will be executed on the specified database.

query file (filename) This parameter is used for selecting the file that contains the SQL query which will be executed on the specified database. Long queries are usually stored in files. Storing queries in files can also enhance reusability.

prepare statement (boolean) If checked, the statement is prepared, and '?' placeholders can be filled in using the parameters parameter.

parameters (enumeration) This parameter specifies the parameters to insert into '?' placeholders when the statement is prepared.

Related Documents

Read Database (210)

Tutorial Processes

Creating a new table in a MySQL database

The Execute SQL operator is used for creating a new table in an existing MySQL database. The define connection parameter is set to predefined. The connection was configured using the button next to the drop down list. The name of the connection was set to 'mySQLconn'. The following values were set in the connection parameter's wizard. The Database system was set to 'MySQL'. The Host was set to 'localhost'. The Port was set to '3306'. The Database schema was set to 'golf'; this is the name of the database. The User was set to 'root'. No password was provided. You will need a password if your database is password protected. Set all the values and test the connection. Make sure that the connection works.

The query parameter is set to the following query: 'CREATE TABLE Weather(Temperature INTEGER)'. This query creates a new table named Weather in the 'golf' database. This table has one integer attribute named Temperature. Run the process; you will not see any results in RapidMiner because this operator did not return anything. It simply executed the query on the specified database. In order to see the changes you can open the database and verify that a new table has been created.

Execute Program

This operator simply executes a command in the shell of the underlying operating system. It can execute any system command or external program.

Description

This operator executes a system command. The command and all its arguments are specified by the command parameter. Please note that the command is system dependent. The standard output stream of the process can be redirected to the log file by enabling the log stdout parameter. The standard error stream of the process can be redirected to the log file by enabling the log stderr parameter.

In Windows / MS DOS, simple commands should be preceded by a 'cmd /c' call, e.g. 'cmd /c notepad'. Just writing 'notepad' in the command parameter will also work if you are executing a program and not just a shell command. Windows then opens a new shell, executes the command, and closes the shell again.


However, Windows 7 may not open a new shell; it just executes the command. Another option is to precede the command with 'cmd /c start', which opens the shell and keeps it open. The rest of the process will not be executed until the shell is closed by the user. Please study the attached Example Processes for more information.

The Java Runtime.exec(String) method is used for executing the command. Characters that have a special meaning on the shell, e.g. the pipe symbol or brackets and braces, do not have a special meaning to Java. Please note that this Java method parses the string into tokens before it is executed. These tokens are not interpreted by a shell. If the desired command involves piping, redirection or other shell features, it is best to create a small shell script to handle this.
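The tokenization behaviour can be observed with a short, self-contained Java sketch. ProcessBuilder receives the command as already-split tokens, just like Runtime.exec(String) after its whitespace tokenization; on Unix systems, 'sh -c' plays the role that 'cmd /c' plays on Windows:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class ExecDemo {

    // Run a command given as pre-split tokens and capture its output.
    static String run(String... command) {
        try {
            Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            p.waitFor();
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Without a shell, the pipe symbol is handed to 'echo' as a literal
        // argument; no pipeline is built:
        System.out.print(run("echo", "a", "|", "tr", "a", "b")); // → a | tr a b
        // Wrapping the command in a shell makes the pipe work:
        System.out.print(run("sh", "-c", "echo a | tr a b"));    // → b
    }
}
```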

Input Ports through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Execute Program operator is available at the first through output port.

Output Ports through (thr) The objects that were given as input are passed without changing to the output through this port. It is not compulsory to connect this port to any other port; the command is executed even if this port is left without connections. The Execute Program operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Execute Program operator is delivered at the first through output port.

Parameters

command (string) This parameter specifies the command to be executed.

log stdout (boolean) If set to true, the stdout stream (standard output stream) of the command is redirected to the log file.

log stderr (boolean) If set to true, the stderr stream (standard error stream) of the command is redirected to the log file.

Tutorial Processes

Introduction to the Execute Program operator

This Example Process uses the Execute Program operator to execute commands in the shell of Windows 7. Two Execute Program operators are used. The command parameter of the first Execute Program operator is set to 'cmd /c java -version'. The command parameter of the second Execute Program operator is set to 'cmd /c notepad'. When the process is executed, first the Java version is written to the log window. Then Notepad is opened. The process waits for Notepad to close and proceeds once the user has closed it. Please note that setting the command parameter to just 'notepad' would have also worked here.


Opening Internet Explorer by the Execute Program operator

This Example Process uses the Execute Program operator to open the Internet Explorer browser by using the shell commands of Windows 7. The command parameter of the Execute Program operator is set to 'cmd /c start "C:\Program Files\Internet Explorer\iexplore.exe"'. When the process is executed, the Internet Explorer browser opens. The process waits for the Internet Explorer browser to be closed by the user and proceeds once it is closed.

148 2.4. Files

Write as Text

This operator writes the given results to the specified file. This operator can be used at each point in an operator chain to write results at every step of the process into a file.

Description

The Write as Text operator writes the given results to the specified file. The file is specified through the result file parameter. This operator does not modify the results; it just writes them to the file and then delivers the unchanged results through its output ports. Every input object which implements the ResultObject interface (which is the case for almost all objects generated by the core RapidMiner operators) will write its results to the file specified by the result file parameter. If the result file parameter is not set then the global result file parameter with the same name of the ProcessRootOperator (the root of the process) will be used. If this file is also not specified then the results are simply written to the console (standard out).
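The three-level fallback (operator parameter, then the root operator's global parameter, then standard out) can be sketched as a tiny hypothetical helper. The method and argument names below are invented for this illustration; RapidMiner's actual implementation differs:

```java
public class ResultTarget {

    // Resolve where results should be written, following the fallback order
    // described for Write as Text. A null argument means "parameter not set".
    static String resolveTarget(String operatorResultFile, String rootResultFile) {
        if (operatorResultFile != null) {
            return operatorResultFile; // 1. the operator's own result file parameter
        }
        if (rootResultFile != null) {
            return rootResultFile;     // 2. the ProcessRootOperator's global parameter
        }
        return "stdout";               // 3. the console (standard out)
    }

    public static void main(String[] args) {
        System.out.println(resolveTarget(null, "global.txt")); // → global.txt
        System.out.println(resolveTarget(null, null));         // → stdout
    }
}
```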

Input Ports input (inp) Any results connected at this port are written to the specified file and then delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The result supplied at the first input port of the Write as Text operator is available at its first output port.


Output Ports input (inp) The results that were given as input are passed without changing to the output through this port. It is not compulsory to attach this port to any other port; the results are written into the file even if this port is left without connections. The Write as Text operator can have multiple outputs. When one output is connected, another output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The result connected at the first input port of the Write as Text operator is delivered through the first output port.

Parameters

result file (filename) The results are written into the file specified through this parameter.

encoding (selection) This is an expert parameter. There are different options; users can choose any of them.

Tutorial Processes

Writing multiple results into a file

The 'Golf' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it with default values of all parameters. The Default Model operator trains a model on the 'Golf' data set in the training subprocess. The trained model is applied on the testing data set in the testing subprocess by the Apply Model operator. The Performance operator measures the performance of the model. The Split Validation operator delivers multiple results i.e. a trained model, the input ExampleSet and the performance vector. All these results are connected to the Write as Text operator to write them into a file. The result file parameter is set to 'D:\results.txt', thus a text file named 'results' is created in the D drive of the computer. If the file already exists then the results are appended to the file. The results are written into the specified file but they are not as detailed as the results in the Results Workspace.

Copy File

This operator copies the chosen file to the specified destination.

Description

The Copy File operator copies the file specified in its parameters to the indicated destination. If the inserted path does not already exist the needed folders and the file are created. It is also possible to overwrite an existing document.
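Outside of RapidMiner, the same behaviour (create missing parent folders, optionally overwrite an existing target) maps naturally onto java.nio.file. A minimal sketch, assuming Java 11+ for Files.writeString/readString:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CopyFileDemo {

    // Copy source to target, creating missing parent directories,
    // mirroring the operator's overwrite parameter.
    static void copy(Path source, Path target, boolean overwrite) {
        try {
            if (target.getParent() != null) {
                Files.createDirectories(target.getParent());
            }
            if (overwrite) {
                Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            } else {
                Files.copy(source, target); // fails if the target already exists
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("copy-demo");
        Path source = Files.writeString(dir.resolve("a.txt"), "hello");
        copy(source, dir.resolve("test/a.txt"), false); // 'test' folder is created
        System.out.println(Files.readString(dir.resolve("test/a.txt"))); // → hello
    }
}
```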

Input Ports through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Copy File operator is available at the first through output port.

Output Ports through (thr) The objects that were given as input are passed without changing to the output through this port. It is not compulsory to attach this port to any other port; the file is copied even if this port is left without connections. The Copy File operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Copy File operator is delivered at the first through output port.

Parameters

source file (file) The file that should be copied is specified through this parameter.

new file (file) The copied file is saved under the name and at the target specified through this parameter.

overwrite (boolean) If this parameter is set to true, a file specified in the new file parameter that already exists is replaced by the file specified in the source file parameter.

Tutorial Processes

Writing the Labor-Negotiations data set into an Excel file and copying it


This Example Process shows how the Copy File operator can be used to copy a specified file. For this Example Process you first need a file to copy. An Excel file is created by loading the 'Labor-Negotiations' data set with the Retrieve operator and writing it into an Excel file with the Write Excel operator. The file is saved as 'D:\Labor data set.xls' if this file does not already exist. For further understanding of this operator please read the description of the Write Excel operator.

The Copy File operator is inserted without any connections. The source file parameter is set to 'D:\Labor data set.xls' and the new file parameter to 'D:\test\Labor data set.xls'. Run the process and two files are created in your D drive: the 'Labor data set.xls' of the Write Excel operator and the copied 'Labor data set.xls' in a separate folder named 'test'.

Rename File

This operator renames a file or a folder.


Description

The Rename File operator assigns a new name to the selected file or folder. Please ensure that the new name of your file has the right ending, e.g. '.xls' as in our Example Process.

Input Ports through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Rename File operator is available at the first through output port.

Output Ports through (thr) The objects that were given as input are passed without changing to the output through this port. It is not compulsory to attach this port to any other port; the file is renamed even if this port is left without connections. The Rename File operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Rename File operator is delivered at the first through output port.


Parameters

file (file) The file or folder that should be renamed is specified through this parameter.

new name (string) The new name of the file or folder is specified through this parameter.

Tutorial Processes

Writing the Labor-Negotiations data set into an Excel file and renaming it

This Example Process shows how the Rename File operator can be used to rename a specified file. For this Example Process you first need a file to rename. An Excel file is created by loading the 'Labor-Negotiations' data set with the Retrieve operator and writing it into an Excel file with the Write Excel operator. The file is saved as 'D:\Labor data set.xls' if this file does not already exist. A breakpoint is inserted so you can have a look at the Excel file in your D drive. For further understanding of this operator please read the description of the Write Excel operator.

The Rename File operator is inserted without any connections. The file parameter is set to 'D:\Labor data set.xls' and the new name parameter to 'labor data set.xls'. Run the process and the file in your D drive which was formerly named 'Labor data set.xls' will now be named 'labor data set.xls'.


Delete File

This operator deletes a file at the specified location.

Description

The Delete File operator deletes the selected file if possible; otherwise an error message is shown. You can also specify that an error message should be shown if the file that should be deleted does not exist.

Input Ports through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Delete File operator is available at the first through output port.

Output Ports through (thr) The objects that were given as input are passed without changing to the output through this port. It is not compulsory to attach this port to any other port; the file is deleted even if this port is left without connections. The Delete File operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Delete File operator is delivered at the first through output port.

Parameters

file (file) The file that should be deleted is specified through this parameter.

fail if missing (boolean) Determines whether an exception should be generated if the file is missing, e.g. because it was already deleted in the last run. If set to false, nothing happens if this error occurs.
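The fail if missing behaviour maps directly onto the two deletion methods in java.nio.file; a minimal sketch of the distinction:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteFileDemo {

    // Delete the file. With failIfMissing true, a missing file raises an
    // error (like the operator's parameter); otherwise it is silently ignored.
    static boolean delete(Path file, boolean failIfMissing) {
        try {
            if (failIfMissing) {
                Files.delete(file); // throws NoSuchFileException if absent
                return true;
            }
            return Files.deleteIfExists(file); // false if nothing was there
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("delete-demo", ".txt");
        System.out.println(delete(file, true));  // → true
        System.out.println(delete(file, false)); // → false (already gone, no error)
    }
}
```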

Tutorial Processes

Writing the Labor-Negotiations data set into an Excel file and deleting it

This Example Process shows how the Delete File operator can be used to delete a specified file. For this Example Process you first need a file to delete. An Excel file is created by loading the 'Labor-Negotiations' data set with the Retrieve operator and writing it into an Excel file with the Write Excel operator. The file is saved as 'D:\Labor data set.xls' if this file does not already exist. A breakpoint is inserted so you can have a look at the Excel file in your D drive. For further understanding of this operator please read the description of the Write Excel operator.

The Delete File operator is inserted without any connections. The file parameter is set to 'D:\Labor data set.xls' and the fail if missing parameter to true. Run the process and the file in your D drive will be deleted. If you delete the Retrieve and the Write Excel operator and run the process again an error message will be shown telling you that the file could not be deleted as it does not exist.

Move File

This operator moves the chosen file to the specified destination.


Description

The Move File operator moves the selected file from its original directory to the chosen location. If the inserted path does not already exist the needed folders and the file are created. It is also possible to overwrite an existing document.

Input Ports through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Move File operator is available at the first through output port.

Output Ports through (thr) The objects that were given as input are passed without changing to the output through this port. It is not compulsory to attach this port to any other port; the file is moved even if this port is left without connections. The Move File operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Move File operator is delivered at the first through output port.


Parameters

file (file) The file that should be moved.

destination (file) The new location of the file.

overwrite (boolean) Determines whether an already existing file should be overwritten.

Tutorial Processes

Writing the Labor-Negotiations data set into an Excel file and moving it

This Example Process shows how the Move File operator can be used to move a specified file. For this Example Process you first need a file to move. An Excel file is created by loading the 'Labor-Negotiations' data set with the Retrieve operator and writing it into an Excel file with the Write Excel operator. The file is saved as 'D:\Labor data set.xls' if this file does not already exist. A breakpoint is inserted so you can have a look at the Excel file in your D drive. For further understanding of this operator please read the description of the Write Excel operator.

The Move File operator is inserted without any connections. The file parameter is set to 'D:\Labor data set.xls' and the destination parameter to 'C:\Data\Labor data set.xls'. Run the process and the file in your D drive will be moved to a newly created folder named 'Data' in your C drive.


Create Directory

This operator creates a directory at the specified location.

Description

The Create Directory operator creates a directory at the chosen location in the file system, if the folder does not already exist. If the inserted path does not exist the needed folders are created.

Input Ports through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Create Directory operator is available at the first through output port.

Output Ports through (thr) The objects that were given as input are passed without changing to the output through this port. It is not compulsory to attach this port to any other port; the directory is created even if this port is left without connections. The Create Directory operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object delivered at the first through input port of the Create Directory operator is delivered at the first through output port.

Parameters

location (file) The location where the new directory is built is specified through this parameter.

name (string) The name of the new directory is specified through this parameter.

Tutorial Processes

Creating a new directory in the D drive

The Create Directory operator is inserted in the process. The location parameter is set to 'D:\new' and the name parameter to 'folder'. Run the process and a folder named 'folder' is created in your D drive inside the newly created folder 'new'.

162 2.5. Data Generation

Generate Data

This operator generates an ExampleSet based on numerical attributes. The number of attributes, number of examples, lower and upper bounds of attributes, and target function can be specified by the user.

Description

The Generate Data operator generates an ExampleSet with a specified number of numerical attributes, which is controlled by the number of attributes parameter. Please note that in addition to the specified number of regular attributes, the label attribute is automatically generated by applying the function selected by the target function parameter. The selected target function is applied on the attributes to generate the label attribute. For example, if the number of attributes parameter is set to 3 and the target function is set to 'sum', then three regular numerical attributes will be created. In addition to these regular attributes a label attribute will be generated automatically. As the target function is set to 'sum', the label attribute value will be the sum of all three regular attribute values.
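The generation scheme for the 'sum' target function can be sketched in self-contained Java. This is an illustration of the idea only, not RapidMiner's actual generator; the fixed seed plays the role of the local random seed parameter:

```java
import java.util.Random;

public class GenerateDataSketch {

    // Generate 'rows' examples with 'cols' uniform attributes in
    // [lower, upper]; the extra last column is the 'sum' label.
    static double[][] generate(int rows, int cols,
                               double lower, double upper, long seed) {
        Random random = new Random(seed); // fixed seed → reproducible data
        double[][] data = new double[rows][cols + 1];
        for (int r = 0; r < rows; r++) {
            double sum = 0;
            for (int c = 0; c < cols; c++) {
                data[r][c] = lower + random.nextDouble() * (upper - lower);
                sum += data[r][c];
            }
            data[r][cols] = sum; // label = sum of the regular attributes
        }
        return data;
    }

    public static void main(String[] args) {
        double[][] set = generate(100, 3, -10, 10, 1992);
        // The label of the first example equals the sum of its attributes:
        System.out.println(set[0][3] == set[0][0] + set[0][1] + set[0][2]); // → true
    }
}
```

Running with the same seed reproduces the same ExampleSet; changing the seed changes the randomization, as described for the local random seed parameter.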


Output Ports output (out) The Generate Data operator generates an ExampleSet based on numerical attributes which is delivered through this port. The meta data is also delivered along with the data. This output is the same as the output of the Retrieve operator.

Parameters

target function (selection) This parameter specifies the target function for generating the label attribute. There are different options; users can choose any of them.

number examples (integer) This parameter specifies the number of examples to be generated.

number of attributes (integer) This parameter specifies the number of regular attributes to be generated. Please note that the label attribute is generated automatically besides these regular attributes.

attributes lower bound (real) This parameter specifies the minimum possible value for the attributes to be generated. In other words this parameter specifies the lower bound of the range of possible values of regular attributes.

attributes upper bound (real) This parameter specifies the maximum possible value for the attributes to be generated. In other words this parameter specifies the upper bound of the range of possible values of regular attributes.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same ExampleSet. Changing the value of this parameter changes the way examples are randomized, thus the ExampleSet will have a different set of values.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

data management (selection) This is an expert parameter. A long list is provided; users can select any option from this list.

Tutorial Processes

Introduction to the Generate Data operator

The Generate Data operator is applied for generating an ExampleSet. The target function parameter is set to 'sum', thus the label attribute will be the sum of all attributes' values. The number examples parameter is set to 100, thus the ExampleSet will have 100 examples. The number of attributes parameter is set to 3, thus three numerical attributes will be generated besides the label attribute. The attributes lower bound and attributes upper bound parameters are set to -10 and 10 respectively, thus values of the regular attributes will be within this range. You can verify this by viewing the results in the Results Workspace. The use local random seed parameter is set to false in this Example Process. Set the use local random seed parameter to true and run the process with different values of local random seed. You will see that changing the values of local random seed changes the randomization.

Generate Nominal Data

This operator generates an ExampleSet based on nominal attributes. The number of examples, number of attributes, and number of values can be specified by the user.


Description

The Generate Nominal Data operator generates an ExampleSet with the specified number of nominal attributes which is controlled by the number of attributes parameter. Please note that in addition to the specified number of regular attributes, the label attribute is automatically generated. The label attribute generated by this operator has only two possible values i.e. positive and negative. This operator is used for generating a random ExampleSet for testing purposes.
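A compact Java sketch of this scheme (illustration only, not RapidMiner's generator; the value naming is invented for the example):

```java
import java.util.Random;

public class GenerateNominalSketch {

    // Generate 'rows' examples with 'cols' nominal attributes drawing one
    // of 'values' levels each; the extra last column is the binary label.
    static String[][] generate(int rows, int cols, int values, long seed) {
        Random random = new Random(seed);
        String[][] data = new String[rows][cols + 1];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                data[r][c] = "value" + random.nextInt(values);
            }
            data[r][cols] = random.nextBoolean() ? "positive" : "negative";
        }
        return data;
    }

    public static void main(String[] args) {
        String[][] set = generate(100, 3, 5, 42);
        System.out.println(set.length + " x " + set[0].length); // → 100 x 4
    }
}
```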

Output Ports

output (out) The Generate Nominal Data operator generates an ExampleSet based on nominal attributes which is delivered through this port. The meta data is also delivered along with the data. This output is the same as the output of the Retrieve operator.

Parameters

number examples (integer) This parameter specifies the number of examples to be generated.

number of attributes (integer) This parameter specifies the number of regular attributes to be generated. Please note that the label attribute is generated automatically besides these regular attributes.

number of values (integer) This parameter specifies the number of unique values of the attributes.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same ExampleSet. Changing the value of this parameter changes the way examples are randomized, thus the ExampleSet will have a different set of values.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Tutorial Processes

Introduction to the Generate Nominal Data operator

The Generate Nominal Data operator is applied for generating an ExampleSet. The number examples parameter is set to 100, thus the ExampleSet will have 100 examples. The number of attributes parameter is set to 3, thus three nominal attributes will be generated besides the label attribute. The number of values parameter is set to 5, thus each attribute will have 5 possible values. You can verify this by viewing the results in the Results Workspace. The use local random seed parameter is set to false in this Example Process. Set the use local random seed parameter to true and run the process with different values of the local random seed. You will see that changing the values of the local random seed changes the randomization.
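A rough sketch of what such a nominal generator produces. The value names and the purely random two-class label here are assumptions for illustration; RapidMiner's internal generation may differ:

```python
import random

def generate_nominal_data(n_examples=100, n_attributes=3,
                          n_values=5, seed=None):
    """Nominal attributes, each with n_values possible values, plus a
    two-class label ('positive'/'negative'). Illustrative sketch only;
    the value names and random label assignment are assumptions."""
    rng = random.Random(seed)
    values = ["value%d" % i for i in range(n_values)]
    rows = [[rng.choice(values) for _ in range(n_attributes)]
            for _ in range(n_examples)]
    labels = [rng.choice(["positive", "negative"]) for _ in range(n_examples)]
    return rows, labels
```

As with the numerical generator, fixing the seed makes the generated nominal data reproducible.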

Generate Direct Mailing Data

This operator generates an ExampleSet that represents direct mailing data. The number of examples can be specified by the user.


Description

The Generate Direct Mailing Data operator generates an ExampleSet that represents direct mailing data. This ExampleSet can be used when you do not have a data set that represents real direct mailing data; it can serve as a placeholder for such a requirement. This data set has 8 regular attributes and 1 special attribute. The regular attributes are name (nominal), age (integer), lifestyle (nominal), zip code (integer), family status (nominal), car (nominal), sports (nominal) and earnings (integer). The special attribute is label (nominal). The number of examples in the data set can be set by the number examples parameter. To have a look at this ExampleSet, just run the attached Example Process.

Output Ports

output (out) The Generate Direct Mailing Data operator generates an ExampleSet which is delivered through this port. The meta data is also delivered along with the data. This output is the same as the output of the Retrieve operator.

Parameters

number examples (integer) This parameter specifies the number of examples to be generated.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same ExampleSet. Changing the value of this parameter changes the way examples are randomized, thus the ExampleSet will have a different set of values.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.


Tutorial Processes

Introduction to the Generate Direct Mailing Data operator

The Generate Direct Mailing Data operator is applied for generating an ExampleSet that represents direct mailing data. The number examples parameter is set to 10000, thus the ExampleSet will have 10000 examples. You can see the ExampleSet in the Results Workspace. The use local random seed parameter is set to false in this Example Process. Set the use local random seed parameter to true and run the process with different values of the local random seed. You will see that changing the values of the local random seed changes the randomization.

Generate Sales Data

This operator generates an ExampleSet that represents sales data. The number of examples can be specified by the user.


Description

The Generate Sales Data operator generates an ExampleSet that represents sales data. This ExampleSet can be used when you do not have a data set that represents actual sales data. This ExampleSet can be used as a placeholder for such a requirement. This data set has 7 regular attributes and 1 special attribute. The regular attributes are store id (nominal), customer id (nominal), product id (integer), product category (nominal), date (date), amount (integer) and single price (real). The special attribute is transaction id (integer) which is an id attribute. The number of examples in the data set can be set by the number examples parameter. To have a look at this ExampleSet, just run the attached Example Process.

Output Ports

output (out) The Generate Sales Data operator generates an ExampleSet which is delivered through this port. The meta data is also delivered along with the data. This output is the same as the output of the Retrieve operator.

Parameters

number examples (integer) This parameter specifies the number of examples to be generated.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same ExampleSet. Changing the value of this parameter changes the way examples are randomized, thus the ExampleSet will have a different set of values.

local random seed (integer) This parameter specifies the local random seed. This parameter is available only if the use local random seed parameter is set to true.


Tutorial Processes

Introduction to the Generate Sales Data operator

The Generate Sales Data operator is applied for generating an ExampleSet that represents sales data. The number examples parameter is set to 10000, thus the ExampleSet will have 10000 examples. You can see the ExampleSet in the Results Workspace. The use local random seed parameter is set to false in this Example Process. Set the use local random seed parameter to true and run the process with different values of local random seed. You will see that changing the values of local random seed changes the randomization.

Add Noise

This operator adds noise to the given ExampleSet by adding random attributes to the ExampleSet and by adding noise to the existing attributes.

Description

The Add Noise operator provides a number of parameters for selecting the attributes to which noise is added. This operator can add noise to the label attribute or to the regular attributes separately. In case of a numerical label, the given label noise (specified by the label noise parameter) is the percentage of the label range which defines the standard deviation of the normally distributed noise which is added to the label attribute. For nominal labels the label noise parameter defines the probability to randomly change the nominal label value. In case of adding noise to regular attributes, the default attribute noise parameter simply defines the standard deviation of normally distributed noise without using the attribute value range. Using the parameter list is also possible for setting different noise levels for different attributes (by using the noise parameter). However, it is not possible to add noise to nominal attributes.

The Add Noise operator can add random attributes to the ExampleSet. The number of random attributes is specified by the random attributes parameter. New random attributes are simply filled with random data which is not correlated to the label at all. The offset and linear factor parameters are available for adjusting the values of new random attributes.
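The two noise rules described above — for a numerical label, Gaussian noise whose standard deviation is the label noise fraction of the label range; for regular attributes, Gaussian noise whose standard deviation is the noise value itself — can be sketched as follows. These are illustrative helper functions, not RapidMiner code:

```python
import random

def add_label_noise(labels, label_noise=0.05, seed=None):
    """Numerical label: Gaussian noise with std = label_noise * label range."""
    rng = random.Random(seed)
    sd = label_noise * (max(labels) - min(labels))
    return [y + rng.gauss(0.0, sd) for y in labels]

def add_attribute_noise(values, attribute_noise=0.06, seed=None):
    """Regular numerical attribute: Gaussian noise with std = attribute_noise
    directly (the attribute value range is NOT used, per the description)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, attribute_noise) for v in values]
```

Note the asymmetry this makes visible: label noise scales with the label range, while attribute noise is an absolute standard deviation.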

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set output (exa) Noise is added to the given ExampleSet and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.


Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the required attributes. It has the following options:

- all This option simply selects all the attributes of the ExampleSet. This is the default option.

- single This option allows the selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

- subset This option allows the selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

- regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

- value type This option allows the selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

- block type This option is similar in working to the value type option. This option allows the selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.


- no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

- numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which this operator will act; all other attributes will remain unchanged.

regular expression (string) The attributes whose name matches this expression will be selected. A regular expression is a very powerful tool but needs a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the selected expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '>5' will not work, so use '> 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is checked, then after checking it 'att1' will be unselected and 'att2' will be selected.

random attributes (integer) This parameter specifies the required number of new random attributes to add to the input ExampleSet.

label noise (real) This parameter specifies the noise to be added to the label attribute. In case of a numerical label the given label noise is the percentage of the label range which defines the standard deviation of the normally distributed noise which is added to the label attribute. For nominal labels the label noise parameter defines the probability to randomly change the nominal label value.

default attribute noise (real) This parameter specifies the default noise for all the selected regular attributes. The default attribute noise parameter simply defines the standard deviation of normally distributed noise without using the attribute value range.

noise (list) This parameter gives the flexibility of adding different noise to different attributes by providing a list of noise levels for the attributes.

offset (real) The offset value is added to the values of all the random attributes created by this operator.

linear factor (real) The linear factor value is multiplied with the values of all the random attributes created by this operator.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Tutorial Processes

Adding noise to the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. The Add Noise operator is applied on it. The attribute filter type parameter is set to 'all', thus noise will be added to all the attributes of the ExampleSet. The label noise and default attribute noise parameters are set to 0.05 and 0.06 respectively. The random attributes parameter is set to 1, thus a new random attribute will be added to the ExampleSet. The ExampleSet with noise and the original ExampleSet are connected to the result ports of the process. Both ExampleSets can be seen in the Results Workspace. You can see that there is a new random attribute in the ExampleSet generated by the Add Noise operator. By comparing the values of both ExampleSets you can see how noise was added by the Add Noise operator.

Materialize Data

This operator creates a fresh and clean copy of the data in the memory.

Description

The Materialize Data operator creates a fresh and clean copy of the data in the memory. It might be useful after large preprocessing chains with a lot of views or even data copies. In such cases, it can be especially useful in combination with a memory cleanup operator e.g. the Free Memory operator.

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input.


Output Ports

example set output (exa) The fresh and clean copy of the ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

data management (selection) This is an expert parameter. There are different options; users can choose any of them.

Tutorial Processes

Creating a fresh copy of an ExampleSet

This is a very simple Example Process which just shows how to use the Materialize Data operator. The 'Labor-Negotiations' data set is loaded using the Retrieve operator. The Subprocess operator is applied on it. No operator is applied in the Subprocess operator because it is used as a dummy operator here. Suppose we have large preprocessing chains with a lot of views or even data copies in the subprocess and we want a fresh copy of the data after the subprocess is complete. The Materialize Data operator is applied after the Subprocess operator to create a fresh and clean copy of the data. No large preprocessing tasks were performed in this Example Process because this Process was intended to discuss only the way this operator can be applied.


Free Memory

This operator frees unused memory. It might be useful after large preprocessing chains with a lot of currently unused views or even data copies.

Description

The Free Memory operator cleans up unused memory resources. It might be very useful in combination with the Materialize Data operator after large preprocessing trees using a lot of views or data copies. It can be very useful after the data set was materialized in memory. Internally, this operator simply invokes a garbage collection from the underlying Java programming language.

Please note that RapidMiner keeps data sets in memory as long as possible. So if there is any memory left, RapidMiner will not discard previous results of the process or data at the ports. The Free Memory operator can be useful if you get an OutOfMemoryException. Also note that operators like the Remember operator put objects into the Store. The Free Memory operator does not clean up the Store. This operator will only free memory which is no longer needed, which is not the case if the object is in the Store.

After process execution has been completed, everything including the Stores is freed, but only if needed. So you can take a look at your previous results in the Result History as long as they fit into the memory. RapidMiner will automatically discard all these results if a currently running process or a new result needs free memory. So the memory usage will constantly grow over time until it reaches a peak value and RapidMiner starts to discard previous results.

Input Ports through (thr) It is not compulsory to connect any object with this port. Any object connected at this port is delivered without any modifications to the output port. This operator can have multiple inputs. When one input is connected, another through input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through input port of the Free Memory operator is available at the first through output port.

Output Ports

through (thr) The objects that were given as input are passed without change to the output through this port. It is not compulsory to attach this port to any other port; the memory is freed even if this port is left without connections. The Free Memory operator can have multiple outputs. When one output is connected, another through output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The object supplied at the first through input port of the Free Memory operator is delivered at the first through output port.

Tutorial Processes

Introduction to the Free Memory operator


This is a very simple Example Process which just shows how to use the Free Memory operator. First the Subprocess operator is applied. Suppose we have a memory intensive task in the subprocess and we want to free unused memory after the subprocess is complete. The Free Memory operator is applied after the Subprocess operator to free the unused memory. No memory intensive task is performed in this Example Process. This Process was intended to discuss only the way this operator can be applied. Please make sure that the operators are applied in correct sequence. Also note that the Free Memory operator is not connected to any other operator but still it can perform its task.


3 Repository Access

Retrieve

This operator reads an object from the data repository.

Description

This operator can be used to access the repositories. It should replace all file access, since it provides full meta data processing, which eases the usage of RapidMiner a lot. In contrast to accessing a raw file, it provides the complete meta data of the data, so all meta data transformations are possible.

An easier way to load an object from the repository is to drag and drop the required object from the Repositories View. This will automatically insert a Retrieve operator with the correct path of the desired object.

This operator has no input port. All it requires is a valid value in the repository entry parameter.

Output Ports

output (out) It returns the object whose path was specified in the repository entry parameter.


Parameters

repository entry (string) A valid path should be specified here in order to load an object. This parameter references an entry in the repository which will be returned as the output of this operator. Repository locations are resolved relative to the repository folder containing the current process. Folders in the repository are separated by a forward slash (/); '..' references the parent folder. A leading forward slash references the root folder of the repository containing the current process. A leading double forward slash is interpreted as an absolute path starting with the name of a repository.

- 'MyData' looks up an entry 'MyData' in the same folder as the current process.

- '../Input/MyData' looks up an entry 'MyData' located in a folder 'Input' next to the folder containing the current process.

- '/data/Model' looks up an entry 'Model' in a top-level folder 'data' in the repository holding the current process.

- '//Samples/data/Iris' looks up the Iris data set in the 'Samples' repository.
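The resolution rules above can be mimicked with a small helper. This is an illustrative sketch of the documented behaviour, not RapidMiner's actual resolver:

```python
from posixpath import normpath

def resolve_entry(process_folder, entry):
    """Resolve a repository entry relative to the folder of the current
    process. process_folder is an absolute location such as
    '//LocalRepository/processes' (the repository name is hypothetical)."""
    if entry.startswith("//"):            # absolute: starts with a repository name
        return normpath(entry)
    if entry.startswith("/"):             # root of the process's own repository
        repo = "//" + process_folder.lstrip("/").split("/", 1)[0]
        return normpath(repo + entry)
    # relative to the process folder; normpath collapses '..' components
    return normpath(process_folder + "/" + entry)
```

For example, `resolve_entry("//Local/processes", "../Input/MyData")` yields `"//Local/Input/MyData"`, matching the second rule in the list above.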

Tutorial Processes

Retrieving Golf from Repository

The Example Process loads the Golf data set from the repository. The repository entry parameter is provided with the path '//Samples/data/Golf', thus the Golf data set is returned from the Samples repository. As can be seen in the Results Workspace, the Retrieve operator loads both data and meta data.

Store

This operator stores an IO Object in the data repository.

Description

This operator stores an IO Object at a location in the data repository. The loca- tion of the object to be stored is specified through the repository entry parameter. The stored object can be used by other processes by using the Retrieve operator. Please see the attached Example Processes to understand the basic working of this operator. The Store operator is used to store an ExampleSet and a model in the Example Processes.

Input Ports input (inp) This port expects an IO Object. In the attached Example Processes an ExampleSet and a model are provided as input.

Output Ports

through (thr) The IO Object provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same IO Object in further operators of the process.


Parameters repository entry (string) This parameter is used to specify the location where the input IO Object is to be stored.

Tutorial Processes

Storing an ExampleSet using the Store operator

This Process shows how the Store operator can be used to store an ExampleSet. The 'Golf' data set and the 'Golf-Testset' data set are loaded using the Retrieve operator. These ExampleSets are merged using the Append operator. The resultant ExampleSet is named 'Golf-Complete' and stored using the Store operator. The stored ExampleSet is used in the third Example Process.

Storing a model using the Store operator

This Process shows how the Store operator can be used to store a model. The 'Golf' data set is loaded using the Retrieve operator. The Naive Bayes operator is applied on it and the resultant model is stored in the repository using the Store operator. The model is stored with the name 'Golf-Naive-Model'. The stored model is used in the third Example Process.

Using the objects stored by the Store operator

This Process shows how a stored IO Object can be used. The 'Golf-Complete' data set stored in the first Example Process and the 'Golf-Naive-Model' stored in the second Example Process are loaded using the Retrieve operator. The Apply Model operator is used to apply the 'Golf-Naive-Model' to the 'Golf-Complete' data set. The resultant labeled ExampleSet can be viewed in the Results Workspace.


4 Import

Read CSV

This operator is used to read CSV files.

Description

A comma-separated values (CSV) file stores tabular data (numbers and text) in plain-text form. A CSV file holds all values of an example in one line, and the values of different attributes are separated by a constant separator. The CSV name suggests that the attribute values are separated by commas, but other separators can also be used.

For a complete understanding of this operator read the parameters section thoroughly. The easiest and shortest way to import a CSV file is to use the import configuration wizard from the Parameters View. The best way, which may require some extra effort, is to first set all the parameters in the Parameters View and then use the wizard. Please make sure that the CSV file is read correctly before building a process using it.


Output Ports

output (out) This port delivers the CSV file in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

Import Configuration Wizard (menu) This option allows you to configure this operator by means of a wizard. This user-friendly wizard makes the use of this operator easy.

csv file (string) The path of the CSV file is specified here. It can be selected using the choose a file button.

column separators (string) Column separators for CSV files can be specified here in regular expression format. A good understanding of regular expressions can be developed from studying the Select Attributes operator's description and Example Processes.

trim lines (boolean) This option indicates if lines should be trimmed (empty spaces are removed at the beginning and the end) before the column split is performed. This option might be problematic if TABs are used as separators.

use quotes (boolean) This option indicates if quotes should be regarded. Quotes can be used to store special characters like column separators. For example if (,) is set as the column separator and (") is set as the quotes character, then (a,b,c,d) will be translated as 4 values for 4 columns. On the other hand ("a,b,c,d") will be translated as a single column value a,b,c,d. If this option is set to false, the quotes character parameter and the escape character for quotes parameter cannot be defined.

quotes character (char) This option defines the quotes character.

escape character for quotes (char) This is the character that is used to escape quotes. For example if (") is used as the quotes character and (\) is used as the escape character, then ("yes") will be translated as (yes) and (\"yes\") will be translated as ("yes").

skip comments (boolean) The skip comments option is used to ignore comments in the csv file. This is useful only if the csv file has comments. If this option is set to true, a comment character should be defined using the comment characters parameter.

comment characters (string) Lines beginning with these characters are ignored. If this character is present in the middle of the line, anything that comes in that line after this character is ignored. Remember that the comment character itself is also ignored.

parse numbers (boolean) Specifies whether numbers are parsed or not.

decimal character (char) This character is used as the decimal character.

grouped digits (boolean) This option decides whether grouped digits should be parsed or not. If this option is set to true, the grouping character parameter should be specified.

grouping character (char) This character is used as the grouping character. If this character is found between numbers, the numbers are combined and this character is ignored. For example if '22-14' is present in the csv file and '-' is set as the grouping character, then '2214' will be stored.

date format (string) The date and time format is specified here. Many predefined options exist; users can also specify a new format. If text in a csv file column matches this date format, that column is automatically converted to date type. Some corrections are automatically made in date type values. For example a value '32-March' will automatically be converted to '1-April'. Columns containing values which can't be interpreted as numbers will be interpreted as nominal, as long as they don't match the date and time pattern of the date format parameter. If they do, this column of the csv file will be automatically parsed as date and the according attribute will be of date type.

first row as names (boolean) If this option is set to true, it is assumed that the first line of the csv file has the names of the attributes. The attributes are then automatically named and the first line of the csv file is not treated as a data line.
annotations (menu) If first row as names is not set to true, annotations can be added using the 'Edit List' button of this parameter, which opens a new menu. This menu allows you to select any row and assign an annotation to it. Name, Comment and Unit annotations can be assigned. If row 0 is assigned the Name annotation, it is equivalent to setting the first row as names parameter to true. If you want to ignore any rows you can annotate them as Comment. Remember that the row number in this menu does not count commented lines.

191 4. Import

time zone (selection) This is an expert parameter. A long list of time zones is provided; users can select any of them.

locale (selection) This is an expert parameter. A long list of locales is provided; users can select any of them.

encoding (selection) This is an expert parameter. A long list of encodings is provided; users can select any of them.

data set meta data information (menu) This option is an important one. It allows you to adjust the meta data of the csv file. Column index, name, type and role can be specified here. The Read CSV operator tries to determine an appropriate type for the attributes by reading the first few lines and checking the occurring values. If all values are integers, the attribute will become an integer. Similarly, if all values are real numbers, the attribute will become of type real. Columns containing values which can't be interpreted as numbers will be interpreted as nominal, as long as they don't match the date and time pattern of the date format parameter. If they do, this column of the csv file will be automatically parsed as date and the according attribute will be of type date. Automatically determined types can be overridden using this parameter.

read not matching values as missings (boolean) If this value is set to true, values that do not match the expected value type are considered as missing values and are replaced by '?'. For example, if 'abc' is written in an integer column, it will be treated as a missing value. A question mark (?) in the CSV file is also read as a missing value.

data management (selection) This is an expert parameter. A long list is provided; users can select any option from this list.
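The lenient date correction ('32-March' becoming '1-April') and the grouped-digits behavior described above can be sketched in Python. The helper names parse_lenient and parse_grouped are illustrative, not RapidMiner functions:

```python
from datetime import datetime, timedelta

# lenient date correction: excess days roll over into the next month,
# so day 32 of March becomes April 1
def parse_lenient(day, month, year):
    overflow = 0
    while True:
        try:
            return datetime(year, month, day) + timedelta(days=overflow)
        except ValueError:
            day -= 1       # back off to a valid day ...
            overflow += 1  # ... and carry the excess forward

print(parse_lenient(32, 3, 2012).strftime('%d-%B'))  # 01-April

# grouped digits: with '-' as the grouping character, '22-14' becomes 2214
def parse_grouped(text, grouping_char='-'):
    return int(text.replace(grouping_char, ''))

print(parse_grouped('22-14'))  # 2214
```

Python's datetime is strict, so the rollover is simulated here; the underlying Java date parser RapidMiner uses performs this correction natively in lenient mode.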

Tutorial Processes

Reading a CSV file

Save the following text in a text file and load it with the given Read CSV Example Process. Run the process and compare the results in the Results Workspace (data view) with the csv file.

att1,att2,att3,att4 # row 1

80.6, yes , 1996.JAN.21 ,22-14 # row 2

12.43,"yes",1997.MAR.30,23-22 # row 3

13.5,\"no\",1998.AUG.22,23-14 # row 4

23.3,yes,1876.JAN.32,42-65 # row 5

21.6,yes,2001.JUL.12,xyz # row 6

12.56,", ?",2002.SEP.18,15-90 # row 7

Here is some explanation of what happens in this process:

• '#' is defined as the comment character, so the 'row no.' comment is ignored in all rows.

• As the first row as names parameter is set to true, att1, att2, att3 and att4 are set as the names of the attributes.

• att1 is set as real, att2 as polynominal, att3 as date and att4 as real.

• In attribute att4, the '-' characters are ignored because the grouped digits parameter is set to true and the grouping character is '-'.

• In row 2, white spaces at the start and end of values are ignored because the trim lines parameter is set to true.

• In row 3, quotes are used but they are ignored because the escape character is not used.

• In row 4, the escape character for quotes is used, so the quotes are not ignored.

• In row 5, the date value is automatically corrected: 'JAN.32' is changed to 'FEB.1'.

• In row 6, the invalid real value in the fourth column is replaced by '?' because the read not matching values as missings parameter is set to true.

• In row 7, quotes are used to store special characters including the column separator and the question mark.
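The order of the preprocessing steps this tutorial demonstrates (comment stripping, trimming, column split) can be sketched as follows; split_line is a hypothetical helper, not part of RapidMiner:

```python
def split_line(line, sep=',', comment_char='#', trim=True):
    # 1. drop everything from the comment character onward
    idx = line.find(comment_char)
    if idx != -1:
        line = line[:idx]
    # 2. trim the whole line before the column split
    if trim:
        line = line.strip()
    # 3. split into columns and trim the individual values
    return [v.strip() for v in line.split(sep)]

print(split_line('80.6, yes , 1996.JAN.21 ,22-14 # row 2'))
# ['80.6', 'yes', '1996.JAN.21', '22-14']
```

Applied to row 2 of the sample file, this reproduces the trimming behavior described in the bullet list above.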

Read Excel

This operator reads an ExampleSet from the specified Excel file.

Description

This operator can be used to load data from Microsoft Excel spreadsheets. It is able to read data from Excel 95, 97, 2000, XP, and 2003. The user has to define which of the spreadsheets in the workbook should be used as the data table. The table must have a format such that each row is an example and each column represents an attribute. Please note that the first row of the Excel sheet might be used for attribute names, which can be indicated by a parameter. The data table can be placed anywhere on the sheet and can contain arbitrary formatting instructions, empty rows and empty columns. Missing data values in Excel should be indicated by empty cells or by cells containing only "?".

For a complete understanding of this operator, read the parameters section. The easiest and shortest way to import an Excel file is to use the import configuration wizard from the Parameters View. The best way, which may require some extra effort, is to first set all the parameters in the Parameters View and then use the wizard. Please make sure that the Excel file is read correctly before building a process using it.


Output Ports

output (out) This port delivers the Excel file in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

import configuration wizard This option allows you to configure this operator by means of a wizard. This user-friendly wizard makes the use of this operator easy.

excel file The path of the Excel file is specified here. It can be selected using the choose a file button.

sheet number (integer) The number of the sheet which you want to import should be specified here.

imported cell range This is a mandatory parameter. The range of cells to be imported from the specified sheet is given here. It is specified in 'xm:yn' format, where 'x' is the column of the first cell of the range, 'm' is the row of the first cell, 'y' is the column of the last cell and 'n' is the row of the last cell. 'A1:E10' will select all cells of the first five columns from row 1 to 10.

first row as names (boolean) If this option is set to true, it is assumed that the first line of the Excel file has the names of the attributes. Then the attributes are automatically named and the first line of the Excel file is not treated as a data line.

annotations If the first row as names parameter is not set to true, annotations can be added using the 'Edit List' button of this parameter, which opens a new menu. This menu allows you to select any row and assign an annotation to it. Name, Comment and Unit annotations can be assigned. If row 0 is assigned the Name annotation, it is equivalent to setting the first row as names parameter to true. If you want to ignore any rows you can annotate them as Comment.

date format The date and time format is specified here. Many predefined options exist; users can also specify a new format. If the text in an Excel file column matches this date format, that column is automatically converted to date type. Some corrections are automatically made in the date type values. For example, a value '32-March' will automatically be converted to '1-April'. Columns containing values which can't be interpreted as numbers will be interpreted as nominal, as long as they don't match the date and time pattern of the date format parameter. If they do, this column of the Excel file will be automatically parsed as date and the according attribute will be of date type.

time zone This is an expert parameter. A long list of time zones is provided; users can select any of them.

locale This is an expert parameter. A long list of locales is provided; users can select any of them.

data set meta data information This option is an important one. It allows you to adjust the meta data of the ExampleSet created from the specified Excel file. Column index, name, type and role can be specified here. The Read Excel operator tries to determine an appropriate type for the attributes by reading the first few lines and checking the occurring values. If all values are integers, the attribute will become an integer. Similarly, if all values are real numbers, the attribute will become of type real. Columns containing values which can't be interpreted as numbers will be interpreted as nominal, as long as they don't match the date and time pattern of the date format parameter. If they do, this column of the Excel file will be automatically parsed as date and the according attribute will be of type date. Automatically determined types can be overridden using this parameter.

read not matching values as missings (boolean) If this value is set to true, values that do not match the expected value type are considered as missing values and are replaced by '?'. For example, if 'abc' is written in an integer column, it will be treated as a missing value. A question mark (?) or an empty cell in the Excel file is also read as a missing value.

data management This is an expert parameter. A long list is provided; users can select any option from this list.
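The 'xm:yn' range notation can be illustrated by converting it to 1-based column and row indices in Python (parse_cell and parse_range are hypothetical helpers, not RapidMiner functions):

```python
import re

def parse_cell(cell):
    # split 'E10' into column letters and the row number
    letters, row = re.match(r'([A-Z]+)(\d+)', cell).groups()
    col = 0
    for ch in letters:  # base-26 column index: A=1, ..., Z=26, AA=27
        col = col * 26 + (ord(ch) - ord('A') + 1)
    return col, int(row)

def parse_range(cell_range):
    first, last = cell_range.split(':')
    return parse_cell(first), parse_cell(last)

print(parse_range('A1:E10'))  # ((1, 1), (5, 10)) -> columns 1-5, rows 1-10
```

So 'A1:E10' spans columns 1 through 5 and rows 1 through 10, matching the description of the imported cell range parameter above.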

Tutorial Processes

Reading an ExampleSet from an Excel file


For this Example Process you need an Excel file first. The one used in this Example Process was created by copying the 'Golf' data set present in the Repositories into a new Excel file which was named 'golf'. The data set was copied onto sheet 1 of the Excel file, thus the sheet number parameter is given the value 1. Make sure that you provide the correct location of the file in the Excel file parameter. The first cell of the sheet is A1 and the last required cell is E15, thus the imported cell range parameter is given the value 'A1:E15'. As the first row of the sheet contains the names of the attributes, the first row as names parameter is checked. The remaining parameters were used with default values. Run the process; you will see almost the same results as you would have gotten from using the Retrieve operator to retrieve the 'Golf' data set from the Repository. You will see a difference in the meta data though; for example, here the types and roles of the attributes are different from those in the 'Golf' data set. You can change the role and type of attributes using the data set meta data information parameter. It is always good to make sure that all attributes are of the desired role and type. In this example one important change that you would like to make is to change the role of the Play attribute. Its role should be changed to label if you want to use any classification operators on this data set.

Read SAS

This operator is used for reading an SAS file.


Description

This operator can read SAS (Statistical Analysis System) files. Please study the attached Example Process for understanding the use of this operator. Please note that when an SAS file is read, the roles of all the attributes are set to regular. Numeric columns use the "real" data type, nominal columns use the "polynominal" data type in RapidMiner.

Output Ports

output (out) This port delivers the SAS file in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

file (string) The path of the SAS file is specified here. It can be selected using the choose a file button.

Tutorial Processes

Use of the SAS operator

An SAS file is loaded using the Open File operator and then read via the Read SAS operator.


Read Access

This operator reads an ExampleSet from a Microsoft Access database.

Description

The Read Access operator is used for reading an ExampleSet from the specified Microsoft Access database (.mdb extension). You need to have at least a basic understanding of databases, database connections and queries in order to use this operator properly. Go through the parameters and the Example Process to understand the flow of this operator.

Output Ports

output (out) This port delivers the result of the query on the database in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

username (string) This parameter is used to specify the username of the database (if any).

password (string) This parameter is used to specify the password of the database (if any).

define query (selection) A query is a statement that is used to select the required data from the database. This parameter specifies whether the database query should be defined directly, through a file or implicitly by a given table name. The SQL query can be auto-generated given a table name, passed to RapidMiner via a parameter or, in case of long SQL statements, in a separate file. The desired behavior can be chosen using the define query parameter. Please note that column names are often case sensitive and might need quoting.

query (string) This parameter is only available when the define query parameter is set to 'query'. This parameter is used to define the SQL query to select the desired data from the specified database.

query file (filename) This parameter is only available when the define query parameter is set to 'query file'. This parameter is used to select a file that contains the SQL query to select the desired data from the specified database. Long queries are usually stored in files. Storing queries in files can also enhance reusability.

table name (string) This parameter is only available when the define query parameter is set to 'table name'. This parameter is used to select the required table from the specified database.

database file (filename) This parameter specifies the path of the Access database, i.e. the mdb file.
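The difference between the 'table name' and 'query' options can be sketched with Python's built-in sqlite3 module (SQLite stands in for an Access database here, and build_query is an illustrative helper, not RapidMiner's API):

```python
import sqlite3

def build_query(define_query, query=None, table_name=None):
    # mimic the 'define query' parameter: explicit SQL or auto-generated
    if define_query == 'query':
        return query
    if define_query == 'table name':
        return 'SELECT * FROM "{}"'.format(table_name)  # quote the table name
    raise ValueError(define_query)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE golf (outlook TEXT, play TEXT)')
conn.execute("INSERT INTO golf VALUES ('sunny', 'yes'), ('rain', 'no')")

# read the whole table, as with define query = 'table name'
rows = conn.execute(build_query('table name', table_name='golf')).fetchall()
print(rows)  # [('sunny', 'yes'), ('rain', 'no')]

# read a selected portion, as with define query = 'query'
rows = conn.execute(build_query('query',
                                query="SELECT * FROM golf WHERE play = 'yes'")).fetchall()
print(rows)  # [('sunny', 'yes')]
```

Quoting the auto-generated table name reflects the note above that names are often case sensitive and might need quoting.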

Tutorial Processes

Writing and then reading data from an Access database

Before the execution of this process a database was manually created in Microsoft Access. The name of the database is 'golf db.mdb' and it is placed in the D drive of the system. An empty table named 'golf' is also created in the database.

The 'Golf' data set is loaded using the Retrieve operator. The Write Access op- erator is used for writing this ExampleSet into the golf table of the 'golf db.mdb' database. The database file parameter is provided with the path of the database


file 'D:\golf db.mdb' and the name of the desired table is specified in the table name parameter (i.e. it is set to 'golf'). A breakpoint is inserted here. No results are visible in RapidMiner at this stage, but you can see that at this point of the execution the previously empty golf table has been filled with the examples of the 'Golf' data set.

Now the Read Access operator is used for reading the golf table from the 'golf db.mdb' database. The database file parameter is provided with the path of the database file 'D:\golf db.mdb'. The define query parameter is set to 'table name'. The table name parameter is set to 'golf', which is the name of the required table. Continue the process; you will see the entire golf table in the Results Workspace. The define query parameter is set to 'table name' if you want to read an entire table from the database. You can also read a selected portion of the database by using queries. Set the define query parameter to 'query' and specify a query in the query parameter.


Read AML

This operator reads an ExampleSet from one or more files. This operator can be configured to read almost all line-based file formats.

Description

Files with the '.aml' extension are attribute description files. They store the meta data for an ExampleSet in standard XML format. This operator reads an ExampleSet from one or more files. You can use the wizard of this operator or the Attribute Editor tool in order to create '.aml' files that contain the meta data for your datasets. The Attribute Editor tool can be accessed by clicking the button right next to the attributes parameter textbox. This operator supports the reading of data from multiple source files. Each attribute (including special attributes like labels, weights etc.) might be read from another file. Please note that only the minimum number of lines of all files will be read, i.e. if one of the data source files has fewer lines than the others, only this number of examples will be read from all files. Go through the parameters and the Example Process to get a better understanding of this operator.

Output Ports

output (out) This port delivers the required file in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

start data loading wizard (menu) This option allows you to configure this operator by means of a wizard. This user-friendly wizard makes the use of this operator easy. It guides you through the process of data loading and meta data definition.

attributes The path and filename of the XML attribute description file is specified here. An attribute description file (extension: .aml) is required to retrieve the meta data of the ExampleSet. This file is a simple XML document defining the properties of the attributes (like their name and range) and their source files. The data may be spread over several files. This file also contains the names of the files to read the data from. Therefore, the actual data files do not have to be specified as a parameter of this operator. For further information on file formats please study the First steps/File formats section.

The Attribute Editor is a very useful tool placed next to the attributes parameter textbox. This tool allows you to open existing '.aml' files, modify meta data, save the current meta data description in an '.aml' file, append multiple data files together, save content in a data file, and select the range of attributes and examples.

sample ratio (real) This parameter is used only when the sample size parameter is set to -1. This parameter specifies the fraction of the data set which should be read. It can have a value from 0 to 1. Value=1 would read the complete data set, value=0.5 would read half of the data set and so on.

sample size (integer) This parameter specifies the exact number of examples which should be read. If it is set to -1, then the sample ratio parameter is used to select a fraction of examples from the ExampleSet. If the value of the sample size parameter is not -1, the sample ratio parameter will be ignored and the number of examples specified in the sample size parameter will be read.

permute (boolean) Indicates if the loaded data should be permuted. If this parameter is set to true, the sequence of examples is changed in the resulting ExampleSet.

decimal point character (string) This character is used as the decimal character.

column separators (string) Column separators for files can be specified here in a regular expression format. A good understanding of regular expressions can be developed from studying the Select Attributes operator's description and Example Processes. The default split parameter ",\s*|;\s*|\s+" should work for most file formats. This regular expression describes the following column separators:

• the character "," followed by a whitespace of arbitrary length (also no whitespace)

• the character ";" followed by a whitespace of arbitrary length (also no whitespace)

• a whitespace of arbitrary length (minimum length: 1)

A logical OR is defined by "|". Other useful separators might be "\t" for tabs, " " for a single whitespace, and "\s" for any whitespace character.

use comment characters (boolean) The skip comments option is used to ignore comments in a file. This is useful only if the file has comments. If this option is set to true, a comment character should be defined using the comment chars parameter.

comment chars (string) Lines beginning with these characters are ignored. Remember that the comment character itself is also ignored.

use quotes (boolean) This option indicates if quotes should be regarded. Quotes can be used to store special characters like column separators. For example, suppose (,) is set as the column separator and (") is set as the quotes character. Then (a,b,c,d) will be translated as 4 values for 4 columns. On the other hand ("a,b,c,d") will be translated as a single column value, i.e. a,b,c,d. If this option is set to true, the quote character parameter and the quoting escape character parameter can be defined.

quote character (char) This option defines the quotes character.

quoting escape character (char) This is the character that is used to escape quotes. For example, suppose (") is used as the quotes character and (\) is used as the escape character. Then ("yes") will be translated as (yes) and (\"yes\") will be translated as ("yes").

trim lines (boolean) This option indicates if lines should be trimmed (empty spaces are removed at the beginning and the end) before the column split is performed. This option might be problematic if TABs are used as separators.

skip error lines (boolean) If this parameter is set to true, lines which cannot be read will be skipped instead of letting this operator fail its execution.

data management (selection) This is an expert parameter. A long list is provided; users can select any option from this list.

encoding (selection) This is an expert parameter. A long list of encodings is provided; users can select any of them.
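The default column-separator expression described above can be tried out directly with Python's re module:

```python
import re

# the default split pattern: ',' or ';' each followed by optional
# whitespace, or a run of whitespace on its own
SEP = r',\s*|;\s*|\s+'

print(re.split(SEP, 'a, b;c d'))  # ['a', 'b', 'c', 'd']
print(re.split(SEP, '1\t2  3'))   # ['1', '2', '3']
```

Both a comma with trailing space, a bare semicolon, a tab and a run of spaces act as column boundaries, matching the three alternatives listed above.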

use local random seed (boolean) Indicates if a local random seed should be used for randomizing the examples of an ExampleSet. Using the same value of local random seed will produce the same ExampleSet. Changing the value of this parameter changes the way the examples are randomized, thus the ExampleSet will have a different set of examples.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.
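A sketch of how sample size, sample ratio, permute and local random seed interact, under the assumption that the ratio simply truncates the example list; the helper function and the default seed value are illustrative, not RapidMiner's implementation:

```python
import random

def sample_examples(examples, sample_size=-1, sample_ratio=1.0,
                    permute=False, local_random_seed=1992):
    # sample size takes precedence; -1 means "use sample ratio"
    if sample_size != -1:
        n = sample_size
    else:
        n = int(round(sample_ratio * len(examples)))
    result = examples[:n]
    if permute:
        # the same seed always yields the same ordering
        random.Random(local_random_seed).shuffle(result)
    return result

data = list(range(10))
print(sample_examples(data, sample_ratio=0.5))  # [0, 1, 2, 3, 4]
print(sample_examples(data, sample_size=3))     # [0, 1, 2]
```

With permute=True, repeated calls using the same seed return the same shuffled ExampleSet, which is the reproducibility property the parameter description promises.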

Tutorial Processes

Writing a file

This process writes an 'aml' file. This 'aml' file will be read in the second Example Process using the Read AML operator. This process is shown here just to show you an attribute description file and how it is used by the Read AML operator. The Retrieve operator is used to retrieve the 'golf' data set from the Repository. Then the Write AML operator is used to write the 'golf' data set into a file. Create two text files before moving further. In this Example Process a file named 'golf data' and another file named 'golf att' were created on the desktop before the execution of this process. The 'golf data' file will be used to store the data of the 'golf' data set and the 'golf att' file will store information regarding the meta data of the 'golf' data set. In other words, 'golf data' will become our data file and 'golf att' will be the attribute description file. Give the path and name of the data file in the example set file parameter and give the path and name of the attribute description file in the attribute description file parameter. Now run the process and have a look at both files. You can see that the 'golf data' file has the examples of the 'golf' data set whereas the 'golf att' file has the meta data of the 'golf' data set.


Reading a file

In this Example Process, the 'golf data' and 'golf att' files created in the first Example Process will be read using the Read AML operator. The easiest way to configure this operator is to use the start data loading wizard. The path and file name of the 'golf data' file are provided as the path of the data file at the start of the wizard. The path and filename of the 'golf att' file are provided as the path of the attribute description file at the last step of the wizard. The sample ratio parameter was set to '0.5', thus half of the data set will be read. The permute parameter is also checked, thus the examples of the resultant data set will be shuffled. All other parameters were used with default values. Run the process and you will see that half of the 'golf data' file is read and the examples are shown out of sequence.

Read ARFF

This operator is used for reading an ARFF file.


Description

This operator can read ARFF (Attribute-Relation File Format) files known from the machine learning library Weka. An ARFF file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software. Please study the attached Example Process for understanding the basics and structure of the ARFF file format. Please note that when an ARFF file is written, the roles of the attributes are not stored. Similarly, when an ARFF file is read, the roles of all the attributes are set to regular.

Output Ports

output (out) This port delivers the ARFF file in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

data file (filename) The path of the ARFF file is specified here. It can be selected using the choose a file button.

encoding (selection) This is an expert parameter. A long list of encodings is provided; users can select any of them.

read not matching values as missings (boolean) This is an expert parameter. If this parameter is set to true, values that do not match the expected value type are considered as missing values and are replaced by '?'. For example, if 'abc' is written in an integer column, it will be treated as a missing value. A question mark (?) in the ARFF file is also read as a missing value.

decimal character (char) This character is used as the decimal character.

grouped digits (boolean) This parameter decides whether grouped digits should be parsed or not. If this parameter is set to true, the grouping character parameter should be specified.

grouping character (char) This parameter is available only when the grouped digits parameter is set to true. This character is used as the grouping character. If it is found between numbers, the numbers are combined and this character is ignored. For example, if "22-14" is present in the ARFF file and "-" is set as the grouping character, then "2214" will be stored.

Tutorial Processes

The basics of the ARFF

The 'Iris' data set is loaded using the Retrieve operator. The Write ARFF operator is applied on it to write the 'Iris' data set into an ARFF file. The example set file parameter is set to 'D:\Iris'. Thus an ARFF file is created in the 'D' drive of your computer with the name 'Iris'. Open this file to see the structure of an ARFF file.

ARFF files have two distinct sections. The first section is the Header information, which is followed by the Data information. The Header of the ARFF file contains the name of the Relation and a list of the attributes. The name of the Relation is specified after the @RELATION statement. The Relation is ignored by RapidMiner. Each attribute definition starts with the @ATTRIBUTE statement followed by the attribute name and its type. The resultant ARFF file of this Example Process starts with the Header. The name of the relation is 'RapidMinerData'. After the name of the Relation, six attributes are defined.

Attribute declarations take the form of an ordered sequence of @ATTRIBUTE statements. Each attribute in the data set has its own @ATTRIBUTE statement which uniquely defines the name of that attribute and its data type. The order of declaration of the attributes indicates the column position in the data section of the file. For example, in the resultant ARFF file of this Example Process the 'label' attribute is declared at the end of all other attribute declarations. Therefore values of the 'label' attribute are in the last column of the Data section.


The possible attribute types in ARFF are:

• numeric

• integer

• real

• {nominalValue1,nominalValue2,...} for nominal attributes

• string for nominal attributes without distinct nominal values (it is however recommended to use the nominal definition above as often as possible)

• date [date-format] (currently not supported by RapidMiner)

You can see in the resultant ARFF file of this Example Process that the attributes 'a1', 'a2', 'a3' and 'a4' are of real type. The attributes 'id' and 'label' are of nominal type. The distinct nominal values are also specified with these nominal attributes.

The ARFF Data section of the file contains the data declaration line @DATA followed by the actual example data lines. Each example is represented on a single line, with carriage returns denoting the end of the example. Attribute values for each example are delimited by commas. They must appear in the order that they were declared in the Header section (i.e. the data corresponding to the n-th @ATTRIBUTE declaration is always the n-th field of the example line). Missing values are represented by a single question mark (?).

A percent sign (%) introduces a comment and will be ignored during reading. Attribute names or example values containing spaces must be quoted with single quotes ('). Please note that in RapidMiner the sparse ARFF format is currently only supported for numerical attributes. Please use one of the other options for sparse data files provided by RapidMiner if you also need sparse data files for nominal attributes.
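The structure described above can be illustrated by parsing a tiny ARFF file in Python (a minimal sketch; a real ARFF reader additionally handles quoting, sparse data and more):

```python
ARFF = """% a tiny ARFF example
@RELATION RapidMinerData

@ATTRIBUTE a1 real
@ATTRIBUTE label {yes,no}

@DATA
4.9,yes
5.1,?
"""

attributes, data, in_data = [], [], False
for line in ARFF.splitlines():
    line = line.strip()
    if not line or line.startswith('%'):  # '%' introduces a comment
        continue
    upper = line.upper()
    if upper.startswith('@ATTRIBUTE'):
        _, name, arff_type = line.split(None, 2)
        attributes.append((name, arff_type))
    elif upper.startswith('@DATA'):
        in_data = True
    elif in_data:
        # '?' denotes a missing value; fields are in declaration order
        data.append([None if v == '?' else v for v in line.split(',')])

print(attributes)  # [('a1', 'real'), ('label', '{yes,no}')]
print(data)        # [['4.9', 'yes'], ['5.1', None]]
```

Note that the @RELATION line falls through all branches and is skipped, mirroring the statement above that the Relation is ignored by RapidMiner.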


Reading an ARFF file using the Read ARFF operator

The ARFF file that was written in the first Example Process using the Write ARFF operator is retrieved in this Example Process using the Read ARFF operator. The data file parameter is set to 'D:\Iris'. Please make sure that you specify the correct path. All other parameters are used with default values. Run the process. You will see that the results are very similar to the original 'Iris' data set of the RapidMiner repository. Please note that the role of all the attributes is regular in the results of the Read ARFF operator. Even the roles of the 'id' and 'label' attributes are set to regular. This is because ARFF files do not store information about the roles of the attributes.

Read Database

This operator reads an ExampleSet from a SQL database.


Description

The Read Database operator is used for reading an ExampleSet from the specified SQL database. You need to have at least a basic understanding of databases, database connections and queries in order to use this operator properly. Go through the parameters and the Example Process to understand the flow of this operator.

When this operator is executed, the table delivered by the query will be copied into the memory of your computer. This gives all subsequent operators fast access to the data. Even learning schemes like the Support Vector Machine, with their high number of random accesses, will run fast. If the table is too big for your main memory, you may use the Stream Database operator. It will hold only a part of the table in memory, at the cost of access that is several orders of magnitude slower when the desired example isn't cached.

The Java ResultSetMetaData interface does not provide information about the possible values of nominal attributes. The internal indices that the nominal values are mapped to will depend on the order in which they appear in the table. This may cause problems only when processes are split up into a training process and a testing process. It is not a problem for learning schemes which are capable of handling nominal attributes. If a learning scheme like the SVM is used with nominal data, RapidMiner pretends that the nominal attributes are numerical and uses the indices of the nominal values as their numerical values. The SVM may perform well if there are only two possible values. But if a test set is read in another process, the nominal values may be assigned different indices, and hence the trained SVM is useless. This is not a problem for label attributes, since the classes can be specified using the classes parameter, and hence all learning schemes intended to be used with nominal data are safe to use. You can avoid this problem by first combining both ExampleSets using the Append operator and then splitting them again using two Filter Examples operators.
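The index-mapping pitfall can be demonstrated in a few lines of Python (a sketch of the mechanism, not RapidMiner's internals):

```python
def nominal_indices(values):
    # map each nominal value to an index in order of first appearance,
    # as a mapping built while reading a table would
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return mapping

# the training table lists 'sunny' first, the test table lists 'rain' first
train_idx = nominal_indices(['sunny', 'rain', 'sunny', 'overcast'])
test_idx = nominal_indices(['rain', 'overcast', 'sunny'])
print(train_idx)  # {'sunny': 0, 'rain': 1, 'overcast': 2}
print(test_idx)   # {'rain': 0, 'overcast': 1, 'sunny': 2}
# the same value receives different numerical indices in the two
# processes, so a model trained on the first mapping is useless on the second
```

This is exactly the situation the Append/Filter Examples workaround above avoids: reading both sets together yields one shared mapping.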


Differentiation

Execute SQL The Read Database operator is used for loading data from a database into RapidMiner. The Execute SQL operator cannot be used for loading data from databases; it is used for executing SQL statements (e.g. CREATE or ALTER) on the database. See page 141 for details.

Stream Database In contrast to the Read Database operator, which loads the data into the main memory, the Stream Database operator keeps the data in the database and performs the data reading in batches. This allows RapidMiner to access data sets of arbitrary sizes without any size restrictions. See page 215 for details.

Output Ports

output (out) This port delivers the result of the query on the database in tabular form along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

define connection (selection) This parameter indicates how the database connection should be specified. It gives you three options: predefined, url and jndi.

connection (string) This parameter is only available when the define connection parameter is set to predefined. It is used to connect to the database using a predefined connection. You can have many predefined connections. You can choose one of them using the drop down box. You can add a new connection or modify previous connections using the button next to the drop down box. You may also accomplish this by clicking on Manage Database Connections... in the Tools menu of the main window. A new window appears, asking for several details e.g. Host, Port, Database system, schema, username and password. The Test button in this window allows you to check whether the connection can be made. Save the connection once the test is successful. After saving a new connection, it can be chosen from the drop down box of the connection parameter. You need a basic understanding of databases to configure a connection.

database system (selection) This parameter is only available when the define connection parameter is set to url. It is used to select the database system in use. It can have one of the following values: MySQL, PostgreSQL, Sybase, HSQLDB, ODBC Bridge (e.g. Access), Microsoft SQL Server (JTDS), Ingres, Oracle.

database url (string) This parameter is only available when the define connection parameter is set to url. It is used to define the URL connection string for the database, e.g. 'jdbc:mysql://foo.bar:portnr/database'.

username (string) This parameter is only available when the define connection parameter is set to url. It specifies the username for the database.

password (string) This parameter is only available when the define connection parameter is set to url. It specifies the password for the database.

jndi name (string) This parameter is only available when the define connection parameter is set to jndi. It specifies the JNDI name of a data source.

define query (selection) A query is a statement that selects the required data from the database. This parameter specifies whether the database query should be defined directly, through a file or implicitly by a given table name. The SQL query can be auto-generated from a table name, passed to RapidMiner via a parameter or, in case of long SQL statements, provided in a separate file. Please note that column names are often case sensitive and might need quoting.

query (string) This parameter is only available when the define query parameter is set to query. It defines the SQL query that selects the desired data from the specified database.

query file (filename) This parameter is only available when the define query parameter is set to query file. It selects a file that contains the SQL query for selecting the desired data from the specified database. Long queries are usually stored in files. Storing queries in files can also enhance reusability.

table name (string) This parameter is only available when the define query parameter is set to table name. It selects the required table from the specified database.

prepare statement (boolean) If checked, the statement is prepared and the '?' placeholders can be filled in using the parameters parameter.

parameters (enumeration) The values to insert into the '?' placeholders when the statement is prepared.
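Prepared statements work the same way in any SQL interface: the '?' placeholders are part of the compiled statement and are filled with values at execution time. A minimal, self-contained illustration using Python's built-in sqlite3 module (RapidMiner itself talks to the database via JDBC; the table and values here are made up):

```python
import sqlite3

# In-memory database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE golf (Outlook TEXT, Play TEXT)")
conn.executemany("INSERT INTO golf VALUES (?, ?)",
                 [("sunny", "yes"), ("rain", "no")])

# A prepared statement: the '?' is filled with a parameter value at
# execution time, which is what the prepare statement / parameters
# options of this operator expose.
rows = conn.execute("SELECT * FROM golf WHERE Outlook = ?",
                    ("sunny",)).fetchall()
print(rows)  # [('sunny', 'yes')]
```

Besides convenience, passing values as parameters rather than pasting them into the query string avoids quoting problems and SQL injection.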

Related Documents

Execute SQL (141) Stream Database (215)

Tutorial Processes

Reading an ExampleSet from a MySQL database

The Read Database operator is used to read a MySQL database. The define connection parameter is set to predefined and was configured using the button next to the drop down box. The name of the connection was set to 'mySQLconn'. The following values were set in the connection parameter's wizard: the Database system was set to 'MySQL', the Host to 'localhost', the Port to '3306', the Database scheme to 'golf' (this is the name of the database) and the User to 'root'. No password was provided; you will need a password if your database is password protected. Set all the values and test the connection. Make sure that the connection works.

The define query parameter was set to 'table name'. The table name parameter was set to 'golf table', which is the name of the required table in the 'golf' database. Run the process and you will see the entire 'golf table' in the Results Workspace. The define query parameter is set to 'table name' if you want to read an entire table from the database. You can also read a selected portion of the database by using queries. Set the define query parameter to 'query' and specify a query in the query parameter. One sample query is already defined in this example. This query reads only those examples from 'golf table' where the 'Outlook' attribute has the value 'sunny'.
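For reference, the wizard values used above (Host 'localhost', Port '3306', database 'golf') correspond to the JDBC URL format quoted in the database url parameter description. A small sketch of that composition, using a hypothetical helper for illustration:

```python
def mysql_jdbc_url(host, port, database):
    """Compose a MySQL JDBC URL following the
    'jdbc:mysql://host:port/database' pattern."""
    return f"jdbc:mysql://{host}:{port}/{database}"

print(mysql_jdbc_url("localhost", 3306, "golf"))
# jdbc:mysql://localhost:3306/golf
```

If you switch define connection to 'url', this is the string you would enter in the database url parameter.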

Stream Database

This operator reads an ExampleSet from an SQL database by incrementally caching it (recommended).

Description

The Stream Database operator is used for reading an ExampleSet from the specified SQL database. You need to have at least a basic understanding of databases and database connections in order to use this operator properly. Please go through the parameter descriptions and the attached Example Process to understand the working of this operator.

This operator reads an ExampleSet from an SQL database. The data is loaded from a single table which is defined by the table name parameter. Please note that table and column names are often case sensitive. The most convenient way of defining the necessary parameters is through the configuration wizard. The most important parameters (database URL and username) will be automatically determined by this wizard. You can define special attributes like labels, ids and weights through the corresponding parameters.

In contrast to the Read Database operator, which loads the data into the main memory, this operator keeps the data in the database and performs the data reading in batches. This allows RapidMiner to access data sets of arbitrary size.

Please note the following important restrictions and notes:

- Only manifested tables (no views) are allowed as the base for this data caching operator.

- If no primary key and index are present, a new column named RM_INDEX is created and automatically used as the primary key.

- If a primary key is already present in the specified table, a new table named RM_MAPPED_INDEX is created which maps a new index column RM_INDEX to the original primary key.

- Users can provide the primary key column RM_INDEX themselves. It should be an integer-valued index attribute whose counting starts from 1, without any gaps or missing values for all rows.

Besides the new index column or the creation of the mapping table, no writing actions are performed in the database. Moreover, data sets built on top of a cached database table do not support writing actions at all. Users have to materialize the data, change it, and write it back into a new table of the database (e.g. with the Write Database operator).
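Conceptually, the batched reading keyed on the RM_INDEX column can be sketched with ranged queries. The following standalone sketch uses Python's sqlite3 module purely as an illustration; the real operator works through JDBC, and its exact batching strategy is an assumption here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (RM_INDEX INTEGER PRIMARY KEY, val TEXT)")
# The index counts from 1 without gaps, as required by the operator.
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 11)])

def read_in_batches(conn, batch_size):
    """Fetch rows batch by batch via ranged queries on RM_INDEX,
    so only one batch is held in memory at a time."""
    start = 1
    while True:
        batch = conn.execute(
            "SELECT val FROM t WHERE RM_INDEX >= ? AND RM_INDEX < ?",
            (start, start + batch_size)).fetchall()
        if not batch:
            break
        yield [v for (v,) in batch]
        start += batch_size

batches = list(read_in_batches(conn, 4))
print(batches)  # three batches: sizes 4, 4 and 2
```

Because each query touches only one index range, the table can be arbitrarily large without exhausting main memory.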

Differentiation

Execute SQL
The Stream Database operator is used for reading data from a database. The Execute SQL operator cannot be used for reading data from databases; it is used for executing SQL statements such as CREATE or ALTER on the database. See page 141 for details.

Read Database
In contrast to the Read Database operator, which loads the data into the main memory, the Stream Database operator keeps the data in the database and performs the data reading in batches. This allows RapidMiner to access data sets of arbitrary size. See page 210 for details.

Output Ports

output (out) This port delivers the database table in the form of an ExampleSet along with the meta data. This output is similar to the output of the Retrieve operator.

Parameters

define connection (selection) This parameter indicates how the database connection should be specified. The following options are available: predefined, url and jndi.

connection (string) This parameter is only available when the define connection parameter is set to 'predefined'. It is used for connecting to the database using a predefined connection. You can have many predefined connections. You can choose one of them using the drop down box. You can add a new connection or modify previous connections using the button next to the drop down box. You may also accomplish this by clicking on Manage Database Connections... in the Tools menu of the main window. A new window appears, asking for several details e.g. Host, Port, Database system, schema, username and password. The Test button in this window allows you to check whether the connection can be made. Save the connection once the test is successful. After saving a new connection, it can be chosen from the drop down box of the connection parameter. You need a basic understanding of databases to configure a connection.

database system (selection) This parameter is only available when the define connection parameter is set to 'url'. It is used for selecting the database system in use. It can have one of the following values: MySQL, PostgreSQL, Sybase, HSQLDB, ODBC Bridge (e.g. Access), Microsoft SQL Server (JTDS), Ingres, Oracle.

database url (string) This parameter is only available when the define connection parameter is set to 'url'. It is used for defining the URL connection string for the database, e.g. 'jdbc:mysql://foo.bar:portnr/database'.

username (string) This parameter is only available when the define connection parameter is set to 'url'. It specifies the username for the database.

password (string) This parameter is only available when the define connection parameter is set to 'url'. It specifies the password for the database.

jndi name (string) This parameter is only available when the define connection parameter is set to 'jndi'. It specifies the JNDI name of a data source.

table name (string) This parameter is used for selecting the required table from the specified database.

recreate index (boolean) This parameter indicates if recreation of the index or index mapping table should be forced.

label attribute (string) The name (case sensitive) of the label attribute is specified through this parameter.

id attribute (string) The name (case sensitive) of the id attribute is specified through this parameter.

weight attribute (string) The name (case sensitive) of the weight attribute is specified through this parameter.

Related Documents

Execute SQL (141) Read Database (210)


Tutorial Processes

Reading an ExampleSet from a MySQL database

The Stream Database operator is used in this Example Process for reading a MySQL database. The define connection parameter is set to predefined and was configured using the button next to the drop down box. The name of the connection was set to 'mySQLconn'. The following values were set in the connection parameter's wizard: the Database system was set to 'MySQL', the Host to 'localhost', the Port to '3306', the Database scheme to 'golf' (this is the name of the database) and the User to 'root'. No password was provided; you will need a password if your database is password protected. Set all the values and test the connection. Make sure that the connection works.

The table name parameter is set to 'golf table', which is the name of the required table in the 'golf' database. The label attribute parameter is set to 'Play'. Run the process and you will see the entire 'golf table' in the Results Workspace.

Read SPSS

This operator is used for reading SPSS files.


Description

The Read SPSS operator can read the data files created by SPSS (Statistical Package for the Social Sciences), an application used for statistical analysis. SPSS files are saved in a proprietary binary format and contain a dataset as well as a dictionary that describes the dataset. These files save data by 'cases' (rows) and 'variables' (columns).

These files have a '.SAV' file extension. SAV files are often used for storing datasets extracted from databases and Microsoft Excel spreadsheets. SPSS datasets can be manipulated in a variety of ways, but they are most commonly used to perform statistical analysis tests such as regression analysis, analysis of variance, and factor analysis.

Input Ports

file (fil) This optional port expects a file object.

Output Ports

output (out) Data from the SPSS file is delivered through this port, mostly in the form of an ExampleSet.

Parameters

filename (filename) This parameter specifies the path of the SPSS file. It can be selected using the choose a file button.

datamanagement (selection) This expert parameter determines how the data is represented internally. Users can choose any of the available options.

attribute naming mode (selection) This parameter determines which SPSS variable properties should be used for naming the attributes.

use value labels (boolean) This parameter specifies if the SPSS value labels should be used as values.

recode user missings (boolean) This parameter specifies if the SPSS user missings should be recoded to missing values.

sample ratio (real) This parameter specifies the fraction of the data set which should be read. If it is set to 1, the complete data set is read. If it is set to -1, the sample size parameter is used for determining the size of the data to read.

sample size (integer) This parameter specifies the exact number of samples which should be read. If it is set to -1, the sample ratio parameter is used for determining the size of the data to read. If both are set to -1, the complete data set is read.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed. It is only available if the use local random seed parameter is set to true.
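The interaction of the sample ratio and sample size parameters amounts to a small decision rule. The sketch below is a hypothetical helper that follows the -1 convention from the parameter descriptions; the default values are assumptions:

```python
def rows_to_read(total_rows, sample_ratio=1.0, sample_size=-1):
    """Resolve how many rows to read: -1 means 'defer to the other
    parameter'; if both are -1, the complete data set is read."""
    if sample_ratio == -1:
        # Ratio disabled: fall back to sample size,
        # or read everything if that is -1 as well.
        return total_rows if sample_size == -1 else min(sample_size, total_rows)
    return round(total_rows * sample_ratio)

print(rows_to_read(100))                                   # 100 (complete set)
print(rows_to_read(100, sample_ratio=0.25))                # 25
print(rows_to_read(100, sample_ratio=-1, sample_size=10))  # 10
```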

Tutorial Processes

Reading an SPSS file

You need to have an SPSS file for this process. In this process, the name of the SPSS file is airline passengers.sav and it is placed in the D drive of the computer. The file is read using the Read SPSS operator. All parameters are used with default values. After execution of the process you can see the resultant ExampleSet in the Results Workspace.


Read Model

This operator reads a model from a file. This operator is often used for reading models that were written using the Write Model operator.

Description

The Read Model operator reads a model from the file specified by the model file parameter. Since models are often written into files and then read for applying them in other processes or applications, this operator can read models written in different writing modes. The Write Model operator can write models in three modes: XML, XML Zipped and Binary. Once a model is read, it can be applied on any ExampleSet using the Apply Model operator.
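The three writing modes can in principle be told apart from a file's first bytes. The sketch below is a heuristic illustration only; the exact on-disk formats (for instance, whether 'XML Zipped' is gzip-compressed) are assumptions, not documented behavior:

```python
import gzip

def sniff_model_format(data: bytes) -> str:
    """Guess the writing mode from the file's first bytes
    (illustrative heuristic; the concrete formats are assumed)."""
    if data[:2] == b"\x1f\x8b":    # gzip magic number
        return "XML Zipped"
    if data.lstrip()[:1] == b"<":  # looks like plain XML
        return "XML"
    return "Binary"

xml = b"<object-stream><model/></object-stream>"
print(sniff_model_format(xml))                 # XML
print(sniff_model_format(gzip.compress(xml)))  # XML Zipped
```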

Output Ports

output (out) The model is read from the specified file and the resultant model is delivered through this port.

4.2. Models

Parameters

model file (filename) This parameter is used for specifying the model file.

Tutorial Processes

Writing the K-NN model into an XML file

This Example Process shows how the Write Model operator can be used for writing a model. The model file written in this Example Process will be read in the next Example Process. The 'Golf' data set is loaded using the Retrieve operator. The K-NN operator is applied on it. The resultant model is provided as input to the Write Model operator. The model file parameter is set to 'D:\model' and the output type parameter is set to XML. Thus an XML file named 'model' is created (if it does not already exist) in the 'D' drive of your computer. You can open the written model file and make changes in it. The model file can be used in RapidMiner by the Read Model operator.

Reading the K-NN model from a file


This Example Process shows how the Read Model operator can be used to read a model from a file. This Example Process reads the model file written by the previous Example Process. The 'Golf-Testset' data set is loaded using the Retrieve operator. The K-NN model written in the previous Example Process is loaded using the Read Model operator. The model file parameter is set to 'D:\model' to select the model file written in the previous process. The resultant model is provided as input to the Apply Model operator which applies it on the 'Golf-Testset' data set and delivers the labeled data and model which can be seen in the Results Workspace.

Read Weights

This operator reads the weights of all attributes of an ExampleSet from the specified file. This operator is often used for reading weights that were written using the Write Weights operator.

4.3. Attributes

Description

The Read Weights operator reads the weights of all attributes of an ExampleSet from the file specified through the attribute weights file parameter. Each line of the file must hold the name and the weight of one attribute. This operator delivers these weights as a new AttributeWeights IOObject. This object can be used for performing various weight-related tasks, e.g. in operators like Scale by Weights and Select by Weights.
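Since each line of the file holds one attribute name and its weight, such a file can be parsed generically. A sketch (the exact on-disk syntax, e.g. quoting of names, is simplified here):

```python
def read_weights(lines):
    """Parse 'name weight' lines into a dict of attribute weights.
    The weight is the last whitespace-separated token, so names
    containing spaces still parse."""
    weights = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, value = line.rsplit(None, 1)
        weights[name] = float(value)
    return weights

sample = ["Temperature 0.83", "Humidity 0.41", ""]
print(read_weights(sample))  # {'Temperature': 0.83, 'Humidity': 0.41}
```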

Output Ports output (out) The attribute weights are read from the specified file and the resultant AttributeWeights IOObject is delivered through this port.

Parameters attribute weights file (filename) This parameter is used for specifying the attribute weights file.

Tutorial Processes

Writing and then Reading attribute weights from a file

This Example Process shows how the Write Weights operator can be used for writing attribute weights and how the Read Weights operator can be used for reading them back from the file. The 'Sonar' data set is loaded using the Retrieve operator. The Weight by Correlation operator is applied on it. A breakpoint is inserted here so that you can have a look at the attribute weights. The resultant weights are provided as input to the Write Weights operator. The attribute weights file parameter is set to 'D:\sonar weights', thus a file named 'sonar weights' is created (if it does not already exist) in the 'D' drive of your computer. You can open the written weights file and make changes in it (if required). The attribute weights file is then read using the Read Weights operator. Its attribute weights file parameter is set to 'D:\sonar weights' to read the same file that was written by the Write Weights operator. The resultant weights are connected to the result port of the process. The attribute weights can be seen in the Results Workspace.

5 Export

Write

This is the most generic export operator. It can write almost any kind of IO Object into a file.

Description

The Write operator can write almost any kind of IO Object into the specified file. The path of the file is specified through the object file parameter. The output type parameter specifies the type of the output. The Read operator can be used for reading almost any kind of IO Object from a file.

Input Ports

object (obj) This input port expects an IO Object, which will be written into the specified file.


Output Ports

object (obj) The IO Object that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same object in further operators of the process.

Parameters

object file (filename) The path of the IO Object file is specified here. It can be selected using the choose a file button.

output type (selection) This parameter specifies the type of the output. The following output types are supported: XML, XML Zipped and Binary.

continue on error (boolean) This parameter specifies if the writing process should continue if an error occurs during writing.

Tutorial Processes

Generic Write and Read operators

This Example Process demonstrates the use of the most generic writing and reading operators i.e. the Write and Read operators respectively. This Example Process shows how these operators can be used to write and read an ExampleSet but these operators are very generic and can be used for writing and reading almost any IO Object available in RapidMiner.

The 'Golf' data set is loaded using the Retrieve operator. This ExampleSet is provided as input to the Write operator. The object file parameter is set to 'D:\golf', thus a file named 'golf' is created (if it does not already exist) in the 'D' drive of your computer. You can open the written file and make changes in it (if required). The Read operator is applied next. Its object file parameter is set to 'D:\golf' to read the file that was just written using the Write operator. The io object parameter is set to 'ExampleSet' because this Read operator is being used for reading an ExampleSet. The resultant ExampleSet can be seen in the Results Workspace.

Write AML

This operator writes an ExampleSet into a file in dense or sparse format. The file written by this operator can be read by operators like the Read AML and Read Sparse operators.

Description

The Write AML operator writes the values of all examples in an ExampleSet into a file. Both dense and sparse formats can be generated depending on the setting of the format parameter. Files written in dense format can be read using the Read AML operator; files written in sparse format can be read using the Read Sparse operator. A brief description of the dense and sparse formats is given below. For a better understanding it is recommended that you study the First steps/File formats section.

dense format

Each line of the generated data file is of the form:

regular attributes <special attributes>

For example, each line could have the form:

value1 value2 ... valueN <id> <label> <prediction> ... <confidences>

The values in the angle brackets are optional and they are only written if they are available. The confidences are only given for nominal predictions. Other special attributes can also be written, e.g. the example weight or the cluster number.

sparse format

Only non-zero values are written into the file, prefixed by a column index. If almost all the values in a data file are zero or have a default nominal value, the sparse format may prove to be a suitable option.

Files with the '.aml' extension are attribute description files. They store the meta data for an ExampleSet in standard XML format. Go through the parameters and the Example Process to get a better understanding of this operator.
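The dense-to-sparse relationship can be sketched as follows: only non-zero values survive, each prefixed by its column index. This is an illustration only; the index base (1-based here) and the ':' separator are assumptions, so consult the File formats section for the exact syntax:

```python
def dense_to_sparse(row):
    """Keep only non-zero values, each written as an
    'index:value' pair (1-based column indices assumed)."""
    return " ".join(f"{i}:{v}" for i, v in enumerate(row, start=1) if v != 0)

print(dense_to_sparse([0, 0, 3.5, 0, 1]))  # 3:3.5 5:1
```

For mostly-zero data, lines shrink dramatically, which is exactly why the sparse format pays off.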


Input Ports

input (inp) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data because the attributes are specified in the meta data. This ExampleSet will be written into the specified file by this operator.

Output Ports

through (thr) The ExampleSet that was given as input is passed on without changes through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

example set file (filename) The ExampleSet is written into the file specified through this parameter. This is the only mandatory parameter of this operator. All other parameters are expert parameters.

attribute description file (filename) The attribute descriptions are written into the file specified through this parameter. This file holds information about the meta data of the ExampleSet.

format (selection) This parameter specifies the format which should be used for writing the file. The basics of these formats have been discussed in the description of this operator.

zipped (boolean) This parameter indicates if the data file content should be zipped or not.

overwrite mode (selection) This parameter indicates if an existing file should be overwritten or data should be appended at the end of the file.

encoding (selection) This is an expert parameter. There are different options; users can choose any of them.


Tutorial Processes

Writing a file using the Write AML operator

This process writes an 'aml' file. This 'aml' file can be read in other processes using the Read AML operator. The Retrieve operator is used to retrieve the 'Golf' data set from the Repository. Then the Write AML operator is used to write the 'Golf' data set into a file. Give the path and name of the data file in the example set file parameter and give the path and name of the attribute description file in the attribute description file parameter. In this Example Process 'D:\golf data.txt' and 'D:\golf att.txt' are specified in the example set file and attribute description file parameters respectively. The 'golf data' file will be used to store the data of the 'Golf' data set and the 'golf att' file will store information regarding the meta data of the 'Golf' data set. In other words, 'golf data' will become our data file and 'golf att' will be the attribute description file. Now run the process and have a look at both files. You can see that the 'golf data' file has the examples of the 'Golf' data set whereas the 'golf att' file has the meta data of the 'Golf' data set.

Write Arff

This operator is used for writing an ARFF file.


Description

This operator can write data in the form of ARFF (Attribute-Relation File Format) files, known from the machine learning library Weka. An ARFF file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software. Please study the attached Example Processes for understanding the basics and structure of the ARFF file format. Please note that when an ARFF file is written, the roles of the attributes are not stored. Similarly, when an ARFF file is read, the roles of all the attributes are set to regular.

Input Ports input (inp) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

through (thr) The ExampleSet that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same ExampleSet in further operators of the process.

file (fil) This port buffers the file object for passing it to reader operators.

Parameters

example set file (filename) The path of the ARFF file is specified here. It can be selected using the choose a file button.

encoding (selection) This is an expert parameter. A long list of encodings is provided; users can select any of them.


Tutorial Processes

The basics of ARFF

The 'Iris' data set is loaded using the Retrieve operator. The Write ARFF operator is applied on it to write the 'Iris' data set into an ARFF file. The example set file parameter is set to 'D:\Iris.txt'. Thus an ARFF file is created in the 'D' drive of your computer with the name 'Iris'. Open this file to see the structure of an ARFF file.

ARFF files have two distinct sections. The first section is the Header information, which is followed by the Data information. The Header of the ARFF file contains the name of the Relation and a list of the attributes. The name of the Relation is specified after the @RELATION statement. The Relation is ignored by RapidMiner. Each attribute definition starts with the @ATTRIBUTE statement followed by the attribute name and its type. The resultant ARFF file of this Example Process starts with the Header. The name of the relation is 'RapidMinerData'. After the name of the Relation, six attributes are defined.

Attribute declarations take the form of an ordered sequence of @ATTRIBUTE statements. Each attribute in the data set has its own @ATTRIBUTE statement which uniquely defines the name of that attribute and its data type. The order of declaration of the attributes indicates the column position in the data section of the file. For example, in the resultant ARFF file of this Example Process the 'label' attribute is declared at the end of all other attribute declarations. Therefore values of the 'label' attribute are in the last column of the Data section.

The possible attribute types in ARFF are:

- numeric

- integer

- real

- {nominalValue1,nominalValue2,...} for nominal attributes

- string for nominal attributes without distinct nominal values (it is however recommended to use the nominal definition above as often as possible)

- date [date-format] (currently not supported by RapidMiner)

You can see in the resultant ARFF file of this Example Process that the attributes 'a1', 'a2', 'a3' and 'a4' are of real type. The attributes 'id' and 'label' are of nominal type. The distinct nominal values are also specified with these nominal attributes.

The ARFF Data section of the file contains the data declaration line @DATA followed by the actual example data lines. Each example is represented on a single line, with carriage returns denoting the end of the example. Attribute values for each example are delimited by commas. They must appear in the order that they were declared in the Header section (i.e. the data corresponding to the n-th @ATTRIBUTE declaration is always the n-th field of the example line). Missing values are represented by a single question mark (?).

A percent sign (%) introduces a comment and will be ignored during reading. Attribute names or example values containing spaces must be quoted with single quotes ('). Please note that in RapidMiner the sparse ARFF format is currently only supported for numerical attributes. Please use one of the other options for sparse data files provided by RapidMiner if you also need sparse data files for nominal attributes.
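The header/data layout described above can be reproduced by hand. Below is a minimal writer sketch for illustration; it simplifies the real format (no quoting of names with spaces, no comments) and only follows the @RELATION / @ATTRIBUTE / @DATA structure discussed in this section:

```python
def write_arff(relation, attributes, rows):
    """Build a minimal ARFF document: @RELATION, one @ATTRIBUTE line
    per column, then @DATA with comma-separated values and '?' for
    missing values."""
    lines = [f"@RELATION {relation}", ""]
    for name, atype in attributes:
        lines.append(f"@ATTRIBUTE {name} {atype}")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines)

arff = write_arff("RapidMinerData",
                  [("Temperature", "real"), ("Play", "{yes,no}")],
                  [(85.0, "no"), (None, "yes")])
print(arff)
```

Note how the column order of the data lines follows the declaration order of the @ATTRIBUTE statements, exactly as required by the format.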

Reading an ARFF file using the Read ARFF operator


The ARFF file that was written in the first Example Process using the Write ARFF operator is retrieved in this Example Process using the Read ARFF operator. The data file parameter is set to 'D:\Iris.txt'. Please make sure that you specify the correct path. All other parameters are used with default values. Run the process. You will see that the results are very similar to the original Iris data set of the RapidMiner repository. Please note that the role of all the attributes is regular in the results of the Read ARFF operator. Even the roles of the 'id' and 'label' attributes are set to regular. This is because ARFF files do not store information about the roles of the attributes.

Write Database

This operator writes an ExampleSet to an SQL database.

Description

The Write Database operator is used for writing an ExampleSet to the specified SQL database. You need to have at least a basic understanding of databases and database connections in order to use this operator properly. Go through the parameters and the attached Example Process to understand the flow of this operator.

The user can specify the database connection and a table name. Please note that the table will be created during writing if it does not exist. The most convenient way of defining the necessary parameters is the Manage Database Connections wizard. The most important parameters (database URL and user name) will be

automatically determined by this wizard. At the end, you only have to define the table name. This operator only supports the writing of the complete ExampleSet consisting of all regular and special attributes and all examples. If this is not desired, apply preprocessing operators like the Select Attributes or Filter Examples operators before applying the Write Database operator. Data from database tables can be read in RapidMiner by using the Read Database operator.

Input Ports

input (inp) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

through (thr) The ExampleSet that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same ExampleSet in further operators of the process.

Parameters

define connection (selection) This parameter indicates how the database connection should be specified. It gives you three options: predefined, url and jndi.

connection (string) This parameter is only available when the define connection parameter is set to predefined. This parameter is used for connecting to the database using a predefined connection. You can have many predefined connections. You can choose one of them using the drop down box. You can add a new connection or modify previous connections using the button next to the drop down box. You may also accomplish this by clicking on Manage Database Connections... from the Tools menu in the main window. A new window appears. This window asks for several details e.g. Host, Port, Database system,

schema, username and password. The Test button in this new window will allow you to check whether the connection can be made. Save the connection once the test is successful. After saving a new connection, it can be chosen from the drop down box of the connection parameter. You need to have a basic understanding of databases for configuring a connection.

database system (selection) This parameter is only available when the define connection parameter is set to url. This parameter is used for selecting the database system in use. It can have one of the following values: MySQL, PostgreSQL, Sybase, HSQLDB, ODBC Bridge (e.g. Access), Microsoft SQL Server (JTDS), Ingres, Oracle.

database url (string) This parameter is only available when the define connection parameter is set to url. This parameter is used for defining the URL connection string for the database, e.g. 'jdbc:mysql://foo.bar:portnr/database'.

username (string) This parameter is only available when the define connection parameter is set to url. This parameter is used for specifying the username of the database.

password (string) This parameter is only available when the define connection parameter is set to url. This parameter is used for specifying the password of the database.

jndi name (string) This parameter is only available when the define connection parameter is set to jndi. This parameter is used for specifying the JNDI name of a data source.

table name This parameter is used for selecting the required table from the specified database. Please note that you can also write a table name here; if the table does not exist it will be created during writing.

overwrite mode (selection) This parameter indicates if an existing table should be overwritten or data should be appended to the existing data.

set default varchar length (boolean) This parameter allows you to set varchar columns to default length.
default varchar length (integer) This parameter is only available when the set default varchar length parameter is set to true. This parameter specifies the default length of varchar columns.

add generated primary keys (boolean) This parameter indicates whether a new attribute holding the auto generated primary keys should be added to the

table in the database.

db key attribute name (string) This parameter is only available when the add generated primary keys parameter is set to true. This parameter specifies the name of the attribute for the auto generated primary keys.

batch size (integer) This parameter specifies the number of examples which are written at once with one single query to the database. Larger values can greatly improve the speed. However, too large values can drastically decrease the performance. Moreover, some databases have restrictions on the maximum number of values written at once.
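The effect of the batch size parameter can be sketched in plain Python, with sqlite3 standing in for the JDBC-connected target database. The table and column names here are made up for the example; RapidMiner itself issues the equivalent batched INSERT statements over JDBC.

```python
import sqlite3

# Batched writing: rather than one INSERT per example, examples are
# grouped and sent with a single query per batch.
BATCH_SIZE = 2  # illustrative; practical values are usually much larger

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE golf_table (outlook TEXT, play TEXT)")

examples = [("sunny", "no"), ("overcast", "yes"), ("rain", "yes")]
for start in range(0, len(examples), BATCH_SIZE):
    batch = examples[start:start + BATCH_SIZE]
    conn.executemany("INSERT INTO golf_table VALUES (?, ?)", batch)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM golf_table").fetchone()[0])  # 3
```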

Tutorial Processes

Writing an ExampleSet to a MySQL database

The 'Golf' data set is loaded using the Retrieve operator. The Write Database operator is used for writing this data set to a MySQL database. The define connection parameter is set to predefined and it is configured using the button next to the drop down box. The name of the connection is set to 'mySQLconn'. The following values are set in the connection parameter's wizard: the Database system is set to 'mySQL'. The Host is set to 'localhost'. The Port is set to '3306'. The Database scheme is set to 'golf'; this is the name of the database. The User is set to 'root'. No password is provided. You will need a password if your database is password protected. Set all the values and test the connection. Make sure that the connection works.

The table name parameter is set to 'golf table' which is the name of the required table in the 'golf' database. Run the process; you will see the entire 'golf table' in the Results Workspace. You can also check the 'golf' database in phpMyAdmin to see the 'golf table'. You can read this table from the database using the Read Database operator. Please study the Example Process of the Read Database operator for more information.


Update Database

This operator updates the values of all examples with matching ID values in a database.

Description

The Update Database operator is used for updating an existing table in the specified SQL database. You need to have at least a basic understanding of databases and database connections in order to use this operator properly. Go through the parameters and the attached Example Process to understand the flow of this operator.

The user can specify the database connection, a table name and ID column names. The most convenient way of defining the necessary parameters is the Manage Database Connections wizard. The most important parameters (database URL and user name) will be automatically determined by this wizard.

The row(s) to update are specified via the db id attribute name parameter. If the id columns of the table do not match all the id values of any given example, the row will be inserted instead. The ExampleSet attribute names must be a subset

of the table column names, otherwise the operator will fail.
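The update-or-insert behaviour described above can be sketched with sqlite3 standing in for the target database. The table and column names are illustrative; RapidMiner performs the equivalent logic over JDBC.

```python
import sqlite3

# Rows whose id matches are updated; otherwise a new row is inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO test VALUES (1, 'old')")

def update_or_insert(conn, row_id, value):
    cur = conn.execute("UPDATE test SET value = ? WHERE id = ?",
                       (value, row_id))
    if cur.rowcount == 0:            # no matching id: insert instead
        conn.execute("INSERT INTO test VALUES (?, ?)", (row_id, value))

update_or_insert(conn, 1, "updated")   # id 1 matches: row is updated
update_or_insert(conn, 2, "new")       # no match: row is inserted
print(conn.execute("SELECT id, value FROM test ORDER BY id").fetchall())
# [(1, 'updated'), (2, 'new')]
```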

Input Ports

input (inp) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

through (thr) The ExampleSet that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same ExampleSet in further operators of the process.

Parameters

define connection (selection) This parameter indicates how the database connection should be specified. It gives you three options: predefined, url and jndi.

connection (string) This parameter is only available when the define connection parameter is set to predefined. This parameter is used for connecting to the database using a predefined connection. You can have many predefined connections. You can choose one of them using the drop down box. You can add a new connection or modify previous connections using the button next to the drop down box. You may also accomplish this by clicking on Manage Database Connections... from the Tools menu in the main window. A new window appears. This window asks for several details e.g. Host, Port, Database system, schema, username and password. The Test button in this new window will allow you to check whether the connection can be made. Save the connection once the test is successful. After saving a new connection, it can be chosen from the drop down box of the connection parameter. You need to have a basic understanding of databases for configuring a connection.

database system (selection) This parameter is only available when the define connection parameter is set to url. This parameter is used for selecting the database system in use. It can have one of the following values: MySQL, PostgreSQL, Sybase, HSQLDB, ODBC Bridge (e.g. Access), Microsoft SQL Server (JTDS), Ingres, Oracle.

database url (string) This parameter is only available when the define connection parameter is set to url. This parameter is used for defining the URL connection string for the database, e.g. 'jdbc:mysql://foo.bar:portnr/database'.

username (string) This parameter is only available when the define connection parameter is set to url. This parameter is used for specifying the username of the database.

password (string) This parameter is only available when the define connection parameter is set to url. This parameter is used for specifying the password of the database.

jndi name (string) This parameter is only available when the define connection parameter is set to jndi. This parameter is used for specifying the JNDI name of a data source.

table name This parameter is used for selecting the required table from the specified database. Please note that you can also write a table name here; if the table does not exist it will be created during writing.

attribute filter type (selection) This parameter allows you to select the ID attributes whose values ALL have to match in the ExampleSet and the database for the row to be updated. It has the following options:

- all: This option does not make sense in this context; do not use it, as it will break the process.

- single: This option allows the selection of a single id attribute.

- subset: This option allows the selection of multiple id attributes through a list. This option will not work if the meta data is not known.

- regular expression: This option allows you to specify a regular expression for the id attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

- value type: This option allows selection of all the id attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The user should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

- block type: This option is similar in working to the value type option. This option allows the selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

- no missing values: This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

- numeric value filter: When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

Tutorial Processes

Updating an ExampleSet in a MySQL database

The 'Iris' data set is loaded using the Retrieve operator. The Update Database operator is used to update an existing database table named 'Test' in the MySQL database defined by the 'My connection' connection. Rows in the ExampleSet and the database table which match on their 'ID' column will be updated. If no match can be found, the row will be inserted instead.

Write Special Format

This operator writes an ExampleSet or subset of an ExampleSet in a special user defined format.

Description

The path of the file is specified through the example set file parameter. The special format parameter is used for specifying the exact format. The character following the $ character introduces a command. Additional arguments to this command may be supplied by enclosing them in square brackets. The following commands can be used in the special format parameter:

- $a : This command writes all attributes separated by the default separator.

- $a[separator] : This command writes all attributes separated by a separator (the separator is specified as an argument in brackets).

- $s[separator][indexSeparator] : This command writes in sparse format. The separator and indexSeparator are provided as first and second arguments respectively. For all non-zero attributes the following strings are concatenated: the column index, the value of the indexSeparator, the attribute value. The attributes are separated by the specified separator.

- $v[name] : This command writes the values of a single attribute. The attribute name is specified as an argument. This command can be used for writing both regular and special attributes.

- $k[index] : This command writes the values of a single attribute. The attribute index is specified as an argument. The indices start from 0. This command can be used for writing only regular attributes.

- $l : This command writes the values of the label attribute.

- $p : This command writes the values of the predicted label attribute.

- $d : This command writes all prediction confidences for all classes in the form 'conf(class)=value'.

- $d[class] : This command writes the prediction confidence for the defined class as a simple number. The required class is provided as an argument.

- $i : This command writes the values of the id attribute.

- $w : This command writes the example weights.

- $b : This command writes the batch number.

- $n : This command writes the newline character, i.e. a newline is inserted when this command is reached.

- $t : This command writes the tabulator character, i.e. a tab is inserted when this command is reached.

- $$ : This command writes the dollar sign.

- $[ : This command writes the '[' character, i.e. the opening square bracket.

- $] : This command writes the ']' character, i.e. the closing square bracket.

Please make sure that the format string ends with $n or that the add line separator parameter is set to true if you want examples to be separated by newlines.
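A toy interpreter for a subset of these commands ($v, $l, $p, $t, $n, $[, $], $$) may clarify how a format string is expanded for one example. This is a sketch, not RapidMiner's actual writer, and the example dictionary is made up.

```python
def expand(fmt, example):
    """Expand a subset of the special format commands for one example."""
    out, i = [], 0
    while i < len(fmt):
        if fmt[i] != '$':                 # literal character
            out.append(fmt[i])
            i += 1
            continue
        cmd = fmt[i + 1]
        i += 2
        if cmd == 'v':                    # $v[name]: value of a named attribute
            end = fmt.index(']', i)
            out.append(str(example[fmt[i + 1:end]]))
            i = end + 1
        elif cmd == 'l':                  # $l: label value
            out.append(str(example['label']))
        elif cmd == 'p':                  # $p: predicted label value
            out.append(str(example['prediction']))
        elif cmd == 't':                  # $t: tabulator
            out.append('\t')
        elif cmd == 'n':                  # $n: newline
            out.append('\n')
        elif cmd in '[]$':                # $[, $], $$: literal bracket/dollar
            out.append(cmd)
    return ''.join(out)

example = {'label': 'yes', 'prediction': 'no', 'Outlook': 'sunny'}
print(expand('$[$l$]$t$p', example))   # '[yes]\tno'
```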


Input Ports

input (inp) This input port expects an ExampleSet. It is the output of the Apply Model operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

through (thr) The ExampleSet that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same ExampleSet in further operators of the process.

Parameters

example set file (filename) The ExampleSet is written into the file specified through this parameter.

special format (string) This parameter specifies the exact format of the file. Many commands are available for specifying the format. These commands are discussed in the description of this operator.

fraction digits (integer) This parameter specifies the number of fraction digits in the output file. This parameter is used for rounding off real numbers. Setting this parameter to -1 will write all possible digits, i.e. no rounding off is done.

quote nominal values (boolean) This parameter indicates if nominal values should be quoted with double quotes.

add line separator (boolean) This parameter indicates if each example should be followed by a line break or not. If set to true, each example is followed by a line break automatically.

zipped (boolean) This parameter indicates if the data file content should be zipped or not.

overwrite mode (selection) This parameter indicates if an existing file should be overwritten or data should be appended.

encoding (selection) This is an expert parameter. There are different options; users can choose any of them.

Tutorial Processes

Writing labeled data set in a user-defined format

The k-NN classification model is trained on the 'Golf' data set. The trained model is then applied on the 'Golf-Testset' data set using the Apply Model operator. The resulting labeled data set is written into a file using the Write Special Format operator. Have a look at the parameters of the Write Special Format operator. You can see that the ExampleSet is written into a file named 'special'. The special format parameter is set to '$[ $l $] $t $p $t $d[yes] $t $d[no]'. This format string is composed of a number of commands; it can be interpreted as: '[label] predicted label confidence (yes) confidence (no)'. This format string states that four attributes shall be written to the file, i.e. 'label', 'predicted label', 'confidence (yes)' and 'confidence (no)'. Each attribute is separated by a tab. The label attribute is enclosed in square brackets. Run the process and see the written file for verification.


Write Model

This operator writes a model into a file. The model can be written in three modes i.e. XML, XML Zipped and Binary.

Description

The Write Model operator writes the input model into the file specified by the model file parameter. Since models are often written into files and then loaded for applying them in other processes or applications, this operator offers three different writing modes for models. The writing mode is controlled by the output type parameter. Explanation of all the available writing modes is given in the parameters section. This operator is also able to keep old files if the overwrite existing file parameter is set to false. However, this could also be achieved by using some of the parameter macros provided by RapidMiner like %{t} or %{a}. The model file written by this operator can be read by the Read Model operator.

Input Ports

input (inp) This port expects a model. In the attached Example Process the K-NN classification model is provided at the input.

Output Ports

through (thr) The model provided at the input port is delivered through this output port without any modifications. This port is usually used for reusing the same model in further operators of the process.


Parameters

model file (filename) This parameter is used to specify the file where the input model is to be written.

overwrite existing file (boolean) If this parameter is set to true, the file specified in the model file parameter is overwritten.

output type (selection) This operator offers three different writing modes for models:

- xml: If this mode is selected, the models are written as plain text XML files. The file size is usually the largest in this mode as compared to other modes. The file size may be several hundred megabytes, so you should be cautious. But this model type has the advantage that the user can inspect and change the model files.

- xml zipped: If this mode is selected, the models are written as zipped XML files. Users can simply unzip the files and read or change the contents. The file size is usually the smallest in this mode compared to other modes. For these reasons, this mode is the default writing mode for models. Please note that due to the XML parsing and unzipping, the loading times for this mode are the longest compared to other modes.

- binary: If this mode is selected, the models are written in binary format. The resulting model files cannot be inspected by the user and the file sizes are usually slightly larger than the zipped XML files. The loading time, however, is shorter than the time needed for the other modes.
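The trade-offs between the three modes can be mimicked in Python with stand-ins: a plain text form, a gzipped form, and a pickled binary form. RapidMiner's real files are produced by its own Java serialization, so only the qualitative comparison carries over; the toy model and XML-like string below are made up.

```python
import gzip
import pickle

# A toy "model" and an XML-like textual representation of it.
model = {"k": 5, "weights": [0.1] * 1000}
xml_like = "<model k='5'>" + " ".join("0.1" for _ in range(1000)) + "</model>"

plain = xml_like.encode()        # xml: human-readable and editable, largest
zipped = gzip.compress(plain)    # xml zipped: smallest, but must be unzipped and parsed
binary = pickle.dumps(model)     # binary: compact but not inspectable

print(len(zipped) < len(plain) and len(zipped) < len(binary))  # True
```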

Tutorial Processes

Writing the K-NN model into an XML file

This Example Process shows how the Write Model operator can be used to write

a model. The Golf data set is loaded using the Retrieve operator. The K-NN operator is applied on it. The resultant model is provided as input to the Write Model operator. The model file parameter is set to 'D:\model' and the output type parameter is set to XML. Thus an XML file named 'model' is created (if it does not already exist) in the 'D' drive of your computer. You can open the written model file and make changes in it. The model file can be used in RapidMiner by the Read Model operator.

Write Clustering

This operator writes the given cluster model into the specified file.

Description

The Write Clustering operator writes the cluster model provided at the input port into the specified file. The path of the file is specified through the cluster model file parameter. The Read Clustering operator can be used for reading the cluster model written by this operator. There are numerous clustering operators that create a cluster model e.g. the K-Means, K-Medoids etc. Clustering operators are located at 'Modeling/Clustering and Segmentation' in the Operators window.


Input Ports

input (inp) This input port expects a cluster model. It is the output of the K-Means operator in the attached Example Process.

Output Ports

through (thr) The cluster model that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same cluster model in further operators of the process.

Parameters

cluster model file (filename) The path of the file where the cluster model is to be written is specified here. It can be selected using the choose a file button.

Tutorial Processes

Writing and then Reading a cluster model from a file

This Example Process shows how the Write Clustering operator can be used for writing a cluster model and how the Read Clustering operator can be used for reading the cluster model from this file. The 'Sonar' data set is loaded using the Retrieve operator. The K-Means operator is applied on it for generating a cluster model. A breakpoint is inserted here so that you can have a look at the cluster model. The resultant cluster model is provided as input to the Write Clustering operator. The cluster model file parameter is set to 'D:\cluster model' thus a file named 'cluster model' is created (if it does not already exist) in the


'D' drive of your computer. This cluster model file is then read by using the Read Clustering operator. The cluster model file parameter is set to 'D:\cluster model' to read the same file that was written by the Write Clustering operator. The resultant cluster model is connected to the result port of the process and it can be seen in the Results Workspace.

Write Weights

This operator writes the given attribute weights into the specified file.

Description

The Write Weights operator writes the attribute weights provided at the input port into the specified file. The path of the file is specified through the attribute weights file parameter. Each line in the file holds the name of one attribute and

its weight. The Read Weights operator can be used to read the weights written by this operator.

The attribute weights specify the relevance of the attributes with respect to the label attribute. There are numerous operators that create attribute weights e.g. the Weight by Correlation, Weight by PCA etc. Operators that generate attribute weights are located at 'Modeling/Attribute Weighting' in the Operators Window.
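Assuming a simple name-and-weight-per-line layout with a tab delimiter (the exact delimiter in RapidMiner's file may differ), writing and reading back such a weights file could look like this sketch; the attribute names are made up.

```python
# One attribute name and its weight per line, tab-separated (assumed layout).
weights = {"attribute_1": 0.83, "attribute_2": 0.12}

text = "\n".join(f"{name}\t{w}" for name, w in weights.items())

# Reading the weights back from the same text:
restored = {}
for line in text.splitlines():
    name, w = line.split("\t")
    restored[name] = float(w)

print(restored == weights)  # True
```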

Input Ports

input (inp) This input port expects attribute weights. It is the output of the Weight by Correlation operator in the attached Example Process.

Output Ports

through (thr) The attribute weights that were provided at the input port are delivered through this output port without any modifications. This is usually used to reuse the same attribute weights in further operators of the process.

Parameters

attribute weights file (filename) The path of the file where the attribute weights are to be written is specified here. It can be selected using the choose a file button.

encoding (selection) This is an expert parameter. There are different options; users can choose any of them.


Tutorial Processes

Writing and then Reading attribute weights from a file

This Example Process shows how the Write Weights operator can be used for writing attribute weights and how the Read Weights operator can be used for reading attribute weights from this file. The 'Sonar' data set is loaded using the Retrieve operator. The Weight by Correlation operator is applied on it. A breakpoint is inserted here so that you can have a look at the attribute weights. The resultant weights are provided as input to the Write Weights operator. The attribute weights file parameter is set to 'D:\sonar weights', thus a file named 'sonar weights' is created (if it does not already exist) in the 'D' drive of your computer. You can open the written weights file and make changes in it (if required). The attribute weights file is then read by using the Read Weights operator. The attribute weights file parameter is set to 'D:\sonar weights' to read the same file that was written by the Write Weights operator. The resultant weights are connected to the result port of the process. The attribute weights can be seen in the Results Workspace.


Write Constructions

This operator writes all attributes of the given ExampleSet into the specified file.

Description

The Write Constructions operator writes all attributes of the ExampleSet given at the input port into the specified file. The path of the file is specified through the attribute constructions file parameter. Each line in the file holds the construction description of one attribute. The Read Constructions operator can be used for reading this file.


Input Ports

input (inp) This input port expects an ExampleSet as input. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

through (thr) The ExampleSet that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same ExampleSet in further operators of the process.

Parameters

attribute constructions file (filename) The path of the file where the attribute construction descriptions are to be written is specified here. It can be selected using the choose a file button.

encoding (selection) This is an expert parameter. There are different options; users can choose any of them.

Tutorial Processes

Introduction to the Write Constructions operator

This Example Process shows how the Write Constructions operator can be used for writing attribute constructions of an ExampleSet into a file. The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. This ExampleSet is provided as input to the Write Constructions operator. The attribute constructions file parameter

is set to 'D:\attributes' thus a file named 'attributes' is created (if it does not already exist) in the 'D' drive of your computer. You can open the written file and make changes in it (if required). This file can be read into RapidMiner by the Read Constructions operator.

Write Performance

This operator writes the given performance vector into a file.

Description

The Write Performance operator writes the performance vector provided at the input port into the specified file. The path of the file is specified through the performance file parameter. The files written by this operator are usually not very readable for humans. The Read Performance operator can be used to read the files written by this operator.

The performance vector has information about the statistical performance of a process or model. It stores information about various performance measuring

criteria. There are numerous operators that produce performance vectors e.g. the Performance, Performance (Classification), Performance (User-Based) operators.

Input Ports

input (inp) This input port expects a performance vector. It is the output of the Performance (Classification) operator in the attached Example Process.

Output Ports

through (thr) The performance vector that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same performance vector in further operators of the process.

Parameters

performance file (filename) The path of the file where the performance vector is to be written is specified here. It can be selected using the choose a file button.

Tutorial Processes

Writing the performance of a classification model into a file

The 'Golf' data set is loaded using the Retrieve operator. The Default Model operator is applied on it. The Default Model operator generates a classification model. The Apply Model operator is used for applying this model on the 'Golf-Testset' data set. The Performance (Classification) operator is then used to

measure the performance of the classification model. The Performance (Classification) operator returns a performance vector that contains information about various performance measuring criteria. The Write Performance operator is applied on it to write this performance vector into a file. The performance file parameter is provided with this path: 'D:\performance.txt'. Thus a text file named 'performance' is created in the 'D' drive of your computer.

Write Parameters

This operator writes the given set of parameters into the specified file.

Description

The Write Parameters operator writes the parameter set provided at the input port into the specified file. The path of the file is specified through the param- eter file parameter. Each line in the file holds the name of the operator, one

parameter and its value. The Read Parameters operator can be used to read the parameter set written by this operator. There are numerous operators that generate a parameter set e.g. the Optimize Parameters (Grid), Optimize Parameters (Evolutionary) operators etc.
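Assuming an 'operator.parameter = value' line layout (the exact syntax of RapidMiner's parameter files may differ, and the operator and parameter names below are made up), writing and re-reading such a parameter set could be sketched as:

```python
# One line per parameter: operator name, parameter name, and value.
parameter_set = [
    ("SVM", "C", "100.0"),
    ("SVM", "gamma", "0.01"),
]

lines = [f"{op}.{param} = {value}" for op, param, value in parameter_set]
text = "\n".join(lines)
print(text)

# Parsing the file contents back into (operator, parameter, value) triples:
parsed = []
for line in text.splitlines():
    left, value = line.split(" = ")
    op, param = left.split(".")
    parsed.append((op, param, value))

print(parsed == parameter_set)  # True
```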

Input Ports

input (inp) This input port expects a parameter set. It is the output of the Optimize Parameters (Grid) operator in the attached Example Process.

Output Ports

through (thr) The parameter set that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same parameter set in further operators of the process.

Parameters

parameter file (filename) The path of the file where the parameter set is to be written is specified here. It can be selected using the choose a file button.

encoding (selection) This is an expert parameter. There are different options; users can choose any of them.

Tutorial Processes

Writing and then Reading a parameter set from a file

This Example Process shows how the Write Parameters operator can be used

for writing a parameter set into a file and how the Read Parameters operator can be used for reading a parameter set from this file. The 'Weighting' data set is loaded using the Retrieve operator. The Optimize Parameters (Grid) operator is applied on it. A breakpoint is inserted here so that you can have a look at the parameter set generated by the Optimize Parameters (Grid) operator; it is not relevant for now how it generates this parameter set. For more information please study the documentation of the Optimize Parameters (Grid) operator. The parameter set is provided as input to the Write Parameters operator. The parameter file parameter is set to 'D:\parameters.txt', thus a file named 'parameters' is created (if it does not already exist) in the 'D' drive of your computer. You can open the parameter set file and make changes in it (if required). The parameter set file is then read by using the Read Parameters operator. The parameter file parameter is set to 'D:\parameters.txt' to read the same file that was written by the Write Parameters operator. The resultant parameter set is connected to the result port of the process. The parameter set can be seen in the Results Workspace.

Write Threshold

This operator writes the given threshold into the specified file.


Description

The Write Threshold operator writes the threshold provided at the input port into the specified file. The path of the file is specified through the threshold file parameter. The file stores the values of the threshold, first class and second class parameters. The Read Threshold operator can be used to read the threshold written by this operator.

The threshold is used for crisp classification based on the prediction confidences (soft predictions). Operators like the Create Threshold operator create thresholds which can be applied on a labeled ExampleSet using the Apply Threshold operator. For more information about thresholds please study the Create Threshold operator.
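The mapping from soft confidences to a crisp prediction can be sketched as follows. This is an illustrative model, not RapidMiner code: it assumes the threshold is compared against the confidence of the second class, with the first class predicted otherwise.

```python
def apply_threshold(confidence_second, threshold, first_class, second_class):
    """Crisp classification: predict the second class only when the
    confidence (soft prediction) for it exceeds the threshold."""
    return second_class if confidence_second > threshold else first_class

# With threshold 0.7 and classes 'negative'/'positive' (the values used
# in the tutorial process of this operator):
print(apply_threshold(0.85, 0.7, "negative", "positive"))  # positive
print(apply_threshold(0.40, 0.7, "negative", "positive"))  # negative
```

Writing the threshold to a file and reading it back does not change this mapping; it only lets another process reuse the same cut-off.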

Input Ports

input (inp) This input port expects a threshold. It is the output of the Create Threshold operator in the attached Example Process.

Output Ports

through (thr) The threshold that was provided at the input port is delivered through this output port without any modifications. This is usually used to reuse the same threshold in further operators of the process.

Parameters

threshold file (filename) The path of the file where the threshold is to be written is specified here. It can be selected using the choose a file button.
encoding (selection) This is an expert parameter. There are different options; users can choose any of them.

Tutorial Processes

Writing and then Reading a threshold from a file

This Example Process shows how the Write Threshold operator can be used for writing a threshold into a file and how the Read Threshold operator can be used for reading a threshold from this file. This process starts with the Create Threshold operator which is used for creating a threshold. The threshold parameter is set to 0.700 and the first class and second class parameters are set to 'negative' and 'positive' respectively. A breakpoint is inserted here so that you can see the threshold in the Results Workspace. This threshold is provided as input to the Write Threshold operator. The threshold file parameter is set to 'D:\threshold.txt', thus a file named 'threshold' is created (if it does not already exist) in the 'D' drive of your computer. You can open the written threshold file and make changes in it (if required). The threshold file is then read by using the Read Threshold operator. The threshold file parameter is set to 'D:\threshold.txt' to read the same file that was written by the Write Threshold operator. The resultant threshold is connected to the result port of the process and it can be seen in the Results Workspace.


6 Data Transformation

Rename

This operator can be used to rename one or more attributes of an ExampleSet.

Description

The Rename operator is used for renaming one or more attributes of the input ExampleSet. Please keep in mind that attribute names must be unique. The Rename operator has no impact on the type or role of an attribute. For example, if you have an attribute named 'alpha' of integer type and regular role, renaming it to 'beta' will just change its name; it will retain its integer type and regular role. To change the role of an attribute, use the Set Role operator. Many type conversion operators are available for changing the type of an attribute at 'Data Transformation/Type Conversion'.


Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in its meta data. The Retrieve operator provides meta data along with the data.

Output Ports

example set (exa) The ExampleSet with renamed attributes is the output of this port.
original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

old name (string) This parameter is used to select the attribute whose name is to be changed.
new name (string) The new name of the attribute is specified through this parameter. The name can also include special characters.
rename additional attributes (string) To rename more than one attribute click on the Edit List button. Here you can select attributes and assign new names to them.


Tutorial Processes

Renaming multiple attributes

The 'Golf' data set is used in this Example Process. The 'Play' attribute is renamed to 'Game' and the 'Wind' attribute is renamed to '#*#'. The 'Wind' attribute is renamed to '#*#' just to show that special characters can also be used to rename attributes. However, attribute names should always be meaningful and should be relevant to the type of information stored in them.

Rename by Replacing

This operator can be used to rename a set of attributes by replacing parts of the attribute names by a specified replacement.

Description

The Rename by Replacing operator replaces parts of the attribute names by the specified replacement. This operator is mostly used for removing unwanted parts of attribute names like whitespaces, parentheses, or other unwanted characters. The replace what parameter defines the part of the attribute name that should be replaced. It can be defined as a regular expression, which is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. The replace by parameter can be defined as an arbitrary string. Empty strings are also allowed. Capturing groups of the regular expression of the replace what parameter can be accessed with $1, $2, $3 etc. Please study the attached Example Process for more understanding.

Please keep in mind that attribute names must be unique. The Rename by Replacing operator has no impact on the type or role of an attribute. For example, if you have an attribute named 'alpha' of integer type and regular role, renaming it to 'beta' will just change its name; it will retain its integer type and regular role. To change the role of an attribute, use the Set Role operator. Many type conversion operators are available for changing the type of an attribute at 'Data Transformation/Type Conversion'.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data.

Output Ports

example set output (exa) The ExampleSet with renamed attributes is the output of this port.
original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting attributes. It has the following options:

• all This option simply selects all the attributes of the ExampleSet. This is the default option.

• single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

• subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

• regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

• value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

• block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

• no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

• numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.
attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes.
regular expression (string) The attributes whose name matches this expression will be selected. A regular expression is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.
use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.
except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (the regular expression that was specified in the regular expression parameter).
value type (selection) The type of attributes to be selected can be chosen from a drop down list.
use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View.
except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type, i.e. the value type parameter's value.
block type (selection) The block type of attributes to be selected can be chosen from a drop down list.
use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.
except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type, i.e. the block type parameter's value.
numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value of greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<'; e.g. '<5' will not work, so use '< 5' instead.
include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions specified in this operator. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes are selected that satisfy the conditions.
invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking of this parameter, after checking of this parameter 'att1' will be unselected and 'att2' will be selected.
replace what (string) The replace what parameter defines the part of the attribute name that should be replaced. It can be defined as a regular expression. Capturing groups of the regular expression of the replace what parameter can be accessed in the replace by parameter with $1, $2, $3 etc.
replace by (string) The replace by parameter can be defined as an arbitrary string. Empty strings are also allowed. Capturing groups of the regular expression of the replace what parameter can be accessed with $1, $2, $3 etc.
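The numeric condition syntax can be emulated with a small evaluator. This is a hypothetical sketch of the semantics described above (clauses joined with either '&&' or '||', never both), not the parser RapidMiner actually uses:

```python
import operator

# Map the comparison tokens of a numeric condition to Python operators.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "=": operator.eq}

def check(value, clause):
    """Evaluate one clause like '> 6' against a numeric value."""
    op, threshold = clause.split()
    return OPS[op](value, float(threshold))

def matches(value, condition):
    """Evaluate a condition that combines clauses with '&&' OR '||' (not both)."""
    if "&&" in condition:
        return all(check(value, c.strip()) for c in condition.split("&&"))
    if "||" in condition:
        return any(check(value, c.strip()) for c in condition.split("||"))
    return check(value, condition.strip())

print(matches(8, "> 6 && < 11"))   # True
print(matches(12, "> 6 && < 11"))  # False
print(matches(-1, "<= 5 || < 0"))  # True
```

The blank-space rule above also falls out of this sketch: `clause.split()` needs a space between the operator and the number, so '<5' would fail while '< 5' works.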

Tutorial Processes

Renaming attributes of the Sonar data set

The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. You can see that the ExampleSet has 60 regular attributes with names like attribute 1, attribute 2 etc. The Rename by Replacing operator is applied on it. The attribute filter type parameter is set to 'all', thus all attributes can be renamed by this operator. The replace what parameter is set to the regular expression '(att)ribute '. The brackets are used for specifying the capturing group which can be accessed in the replace by parameter with $1. The replace by parameter is set to '$1-'. Wherever 'attribute ' is found in the names of the 'Sonar' attributes, it is replaced by the first capturing group and a dash, i.e. 'att-'. Thus attributes are renamed to the format att-1, att-2 and so on. This can be verified by viewing the results in the Results Workspace.
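The same capturing-group replacement can be reproduced with any regular-expression engine; note that Python's re module writes backreferences as \1 where RapidMiner's replace by parameter uses $1. A small sketch of the tutorial's transformation:

```python
import re

# Attribute names as in the tutorial process above.
names = ["attribute 1", "attribute 2", "attribute 3"]

# replace what: '(att)ribute '  /  replace by: '$1-'  (written \1- in Python)
renamed = [re.sub(r"(att)ribute ", r"\1-", n) for n in names]
print(renamed)  # ['att-1', 'att-2', 'att-3']
```

The capturing group keeps 'att' from the original name, and the literal '-' in the replacement supplies the dash.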


Set Role

This operator is used to change the role of one or more attributes.

Description

The role of an attribute reflects the part played by that attribute in an ExampleSet. Changing the role of an attribute may change the part played by that attribute in a process. One attribute can have exactly one role. This operator is used to change the role of one or more attributes of the input ExampleSet. This is a very simple operator: all you have to do is select an attribute and select a new role for it. Different learning operators require attributes with different roles. This operator is frequently used to set the right roles for attributes before applying the desired operator. The change in role is only for the current process, i.e. the role of the attribute is not changed permanently in the ExampleSet. The Set Role operator should not be confused with the Rename operator or type conversion operators. The Rename operator is used to change the name of an attribute. Many type conversion operators are available (at 'Data Transformation/Type Conversion') to change the type of attributes, e.g. the Nominal to Binominal operator, the Numerical to Polynominal operator and many more.

Broadly, roles are classified into two types: regular and special. Regular attributes simply describe the examples. Regular attributes are usually used during learning processes. One ExampleSet can have numerous regular attributes. Special attributes are those which identify the examples separately. Special attributes have some specific task. Special roles are: label, id, prediction, cluster, weight, and batch. An ExampleSet can have numerous special attributes but one special role cannot be repeated. If one special role is assigned to more than one attribute in an ExampleSet, all such attributes will be dropped except the last one. This concept can be understood easily by studying the attached Example Process. An explanation of the various roles is given in the parameters section.
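The "only one attribute per special role, last assignment wins" behaviour can be modelled as a mapping from attribute to role. This is a conceptual sketch with invented helper names, not RapidMiner's implementation:

```python
SPECIAL_ROLES = {"label", "id", "prediction", "cluster", "weight", "batch"}

def set_role(roles, attribute, role):
    """roles maps attribute name -> role. A special role may be held by at
    most one attribute, so any previous holder of that role is dropped."""
    if role in SPECIAL_ROLES:
        for attr, r in list(roles.items()):
            if r == role:
                del roles[attr]  # the earlier attribute with this role is dropped
    roles[attribute] = role
    return roles

roles = {"name": "regular"}
set_role(roles, "name", "label")
set_role(roles, "standby-pay", "label")  # 'name' loses the label role and is dropped
print(roles)  # {'standby-pay': 'label'}
```

This mirrors the tutorial process below, where the name attribute is dropped once standby-pay receives the label role.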


Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in our Example Process. The output of other operators may also be used as input. It is essential that meta data should be attached with the data for the input because the role of an attribute is specified in the meta data of the ExampleSet. The Retrieve operator provides meta data along with the data.

Output Ports

example set (exa) The ExampleSet with modified role(s) is the output of this port.
original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

name (string) The name of the attribute whose role should be changed is specified through this parameter. You can select the attribute either from the drop down list or type it manually.
target role (string) The target role of the selected attribute is the new role assigned to it. The following target roles are possible:

• regular Attributes without a special role, i.e. those which simply describe the examples, are called regular attributes; in most cases you can just leave out the role designation. Regular attributes are used as input variables for learning tasks.

• id This is a special role; the attribute acts as the id attribute of the ExampleSet and its value is usually unique in every example. The id role is used to clearly identify the examples of the concerned ExampleSet. In this case the attribute adopts the role of an identifier and is called ID for short. Unique ids can be given to all the examples using the Generate ID operator.

• label This is a special role; the attribute acts as a target attribute for learning operators, e.g. the Decision Tree operator. Labels identify the examples in some way and they must be predicted for new examples that are not yet characterized in such a manner. The label is also called the 'goal variable'.

• prediction This is a special role; the attribute acts as the predicted attribute of a learning scheme. For example, when a predictive model is learnt through any learning operator and then applied using the Apply Model operator, the output contains a new attribute with the prediction role which holds the label values predicted by the given model. The label and prediction attributes are also used for evaluating the performance of a model.

• cluster This is a special role; it indicates the membership of an example of the ExampleSet in a particular cluster. For example, the output of the k-Means operator adds a column with the cluster role.

• weight This is a special role; it indicates the weight of the examples with regard to the label. Weights are used in learning processes to give different importance to examples with different weights. Attribute weights are used in numerous operators, e.g. the Select by Weights operator. Weights can also be used in evaluating the performance of models, e.g. the Performance operator has a use example weights parameter to consider the weight of examples during the performance evaluation process.

• batch This is a special role; it indicates the membership of an example in a batch.

• user defined Any role can be assigned by directly typing it in the textbox instead of selecting a role from the dropdown menu. If 'ignore' is written in the textbox, that attribute will be ignored by subsequent operators in the process. This is also a special role, thus it needs to be unique. To ignore multiple attributes, unique roles can be assigned like ignore01, ignore02, ignore03 and so on.

set additional roles (menu) Click this button to modify the roles of more than one attribute. A click on this button opens a new menu which allows you to select any attribute and assign any role to it. It also allows assigning multiple roles to the same attribute. But, as an attribute can have exactly one role, only the last role assigned to that attribute is actually assigned and all previously assigned roles are ignored.

Tutorial Processes

Setting roles of attributes

In this Example Process, the Labor-Negotiation data set is loaded using the Retrieve operator. The roles of its attributes are changed using the Set Role operator. Here is an explanation of what happens when this process is executed:

• The name and shift-differential attributes are dropped because standby-pay is also given the label role. As label is a special role and only one attribute with the same special role can exist, the earlier attributes are dropped and the last attribute (standby-pay) is assigned the label role.

• duration is assigned the weight role.

• wage-inc-1st, longterm-disability-assistance, pension, bereavement-assistance and wage-inc-2nd are given the regular role. They were regular attributes even before reassignment of the same role, thus assigning the same role makes no change. As there can be numerous regular attributes, no attribute is dropped.

• The roles of wage-inc-3rd and working-hours were not modified, thus they retain their original role, i.e. the regular role.

• col-adj is assigned the id role.

• education-allowance is assigned the batch role.


• statutory-holidays and vacations are assigned the ignore0 and ignore1 roles respectively.

• contrib-to-dental-plan is assigned the prediction role. contrib-to-health-plan is assigned the cluster role.

Some attributes are dropped as explained earlier, but note that the number of examples remains the same. The roles assigned in this Example Process were chosen just to show how the Set Role operator works; in real scenarios such assignments of roles may not be very useful. This also highlights another point: Set Role is not a context-aware operator. It assigns the roles set by the users irrespective of the context, so users must know which role to assign in which scenario. Thanks to the Problems View and quick fixes, it becomes easy to set the right roles before applying different learning operators. Note that the Problems View displays two warnings even in this Example Process.

Exchange Roles

This operator exchanges the roles of two attributes.


Description

The Exchange Roles operator exchanges the roles of the two specified attributes i.e. it assigns the role of the first attribute to the second attribute and vice versa. This can be useful, for example, to exchange the roles of a label with a regular attribute (or vice versa), or a label with a batch attribute, a label with a cluster etc. For more information about roles please study the description of the Set Role operator.
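Modelling roles as a simple attribute-to-role mapping again, the exchange amounts to a swap. A minimal sketch (hypothetical helper, not the RapidMiner API):

```python
def exchange_roles(roles, first, second):
    """Assign the role of the first attribute to the second and vice versa."""
    roles[first], roles[second] = roles[second], roles[first]
    return roles

# As in the tutorial process below: Play carries the label role,
# Outlook is a regular attribute.
roles = {"Play": "label", "Outlook": "regular"}
exchange_roles(roles, "Play", "Outlook")
print(roles)  # {'Play': 'regular', 'Outlook': 'label'}
```

Because both attributes keep their values and only the role assignment moves, the data itself is untouched.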

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set output (exa) The roles of the specified attributes are exchanged and the resultant ExampleSet is delivered through this port.
original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

first attribute (string) This parameter specifies the name of the first attribute for the attribute role exchange.
second attribute (string) This parameter specifies the name of the second attribute for the attribute role exchange.


Tutorial Processes

Exchanging roles of attributes of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the roles of the Play and Outlook attributes are label and regular respectively. The Exchange Roles operator is applied on the 'Golf' data set to exchange the roles of these attributes. The first attribute and second attribute parameters are set to 'Play' and 'Outlook' respectively. The resultant ExampleSet can be seen in the Results Workspace. You can see that now the role of the Play attribute is regular and the role of the Outlook attribute is label.

Numerical to Binominal

This operator changes the type of the selected numeric attributes to a binominal type. It also maps all values of these attributes to corresponding binominal values.


Description

The Numerical to Binominal operator changes the type of numeric attributes to a binominal type (also called binary). This operator not only changes the type of selected attributes but it also maps all values of these attributes to corresponding binominal values. Binominal attributes can have only two possible values i.e. 'true' or 'false'. If the value of an attribute is between the specified minimal and maximal value, it becomes 'false', otherwise 'true'. Minimal and maximal values can be specified by the min and max parameters respectively. If the value is missing, the new value will be missing. The default boundaries are both set to 0.0, thus only 0.0 is mapped to 'false' and all other values are mapped to 'true' by default.
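The value mapping described above is easy to state in code. A sketch under the stated defaults, with a missing value modelled as None (an assumption for illustration):

```python
def numerical_to_binominal(value, min_val=0.0, max_val=0.0):
    """Map a numeric value to 'false' if it lies in [min_val, max_val],
    to 'true' otherwise; a missing value (None) stays missing."""
    if value is None:
        return None
    return "false" if min_val <= value <= max_val else "true"

print(numerical_to_binominal(0.0))   # false  (inside the default [0.0, 0.0] range)
print(numerical_to_binominal(3.7))   # true
print(numerical_to_binominal(None))  # None   (missing stays missing)
```

With the defaults min = max = 0.0, only an exact 0.0 becomes 'false', which matches the behaviour stated in the description.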

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. The ExampleSet should have at least one numeric attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports

example set (exa) The ExampleSet with the selected numeric attributes converted to binominal type is the output of this port.
original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting the attributes that you want to convert to binominal form. It has the following options:

• all This option simply selects all the attributes of the ExampleSet. This is the default option.

• single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

• subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

• regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

• value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

• block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

• no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.

ˆ numeric value filter When this option is selected another parameter (nu- meric condition) becomes visible in the Parameter View. All numeric at- tributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespec- tive of the given numerical condition. attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of parameter attribute if the meta data is known. attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. Attributes can be shifted to the right list which is the list of selected at- tributes on which the conversion from nominal to binominal will take place; all other attributes will remain unchanged. regular expression (string) The attributes whose name match this expression will be selected. Regular expression is very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously. This will enhance your concept of regular expressions. use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (ex- cept regular expression) becomes visible in the Parameter View. except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (expression that was specified in the regular expression parameter). value type (selection) The type of attributes to be selected can be chosen from

a drop down list.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<': e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Numerical to Binominal operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Numerical to Binominal operator and only those attributes that satisfy the conditions are selected.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example if attribute 'att1'

is selected and attribute 'att2' is removed before this parameter is set, then after setting this parameter 'att1' will be removed and 'att2' will be selected.

min (real) This parameter sets the lower bound of the range. The attribute values that fall in the range from min to max are mapped to 'false'; the attribute values outside this range are mapped to 'true'.

max (real) This parameter sets the upper bound of the range. The attribute values that fall in the range from min to max are mapped to 'false'; the attribute values outside this range are mapped to 'true'.
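The min/max mapping described above can be sketched in a few lines of Python. This is a hypothetical re-implementation for illustration only, not RapidMiner code; the function name and the representation of missing values as None are assumptions.

```python
# Sketch of the Numerical to Binominal mapping (hypothetical
# re-implementation for illustration; not RapidMiner code).
# Values inside [min_val, max_val] become 'false', all others 'true';
# missing values stay missing.

def numeric_to_binominal(values, min_val=0.0, max_val=0.0):
    result = []
    for v in values:
        if v is None:                  # missing value stays missing
            result.append(None)
        elif min_val <= v <= max_val:  # inside the range -> 'false'
            result.append("false")
        else:                          # outside the range -> 'true'
            result.append("true")
    return result

print(numeric_to_binominal([0.0, 0.005, 0.2, None], 0.0, 0.01))
# -> ['false', 'false', 'true', None]
```

With min = 0.0 and max = 0.01, this reproduces the behavior of the tutorial process below.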

Tutorial Processes

Converting numeric attributes of the Sonar data set to bi- nominal attributes

This Example Process focuses on the min and max parameters. The remaining parameters mostly serve for selecting the attributes. The Select Attributes operator also has many similar parameters for the selection of attributes. You can study the Example Process of the Select Attributes operator if you want an understanding of these parameters.

The 'Sonar' data set is loaded using the Retrieve operator. The Numerical to Binominal operator is applied on it. The min parameter is set to 0.0 and the max parameter is set to 0.01. All other parameters are used with default values. The attribute filter type parameter is set to 'all', thus all numeric attributes of the 'Sonar' data set will be converted to binominal type. As you can see in the Results Workspace, before the application of the Numerical to Binominal operator all attributes were of real type. After the application of this operator they are all changed to binominal type. All attribute values that fall in the range from 0.0 to 0.01 are mapped to 'false'; all other values are mapped to 'true'.

284 6.2. Type Conversion

Numerical to Polynominal

This operator changes the type of selected numeric attributes to a polynominal type. It also maps all values of these attributes to corresponding polynominal values. This operator simply changes the type of the selected attributes; if you need a more sophisticated conversion please use the discretization operators.

Description

The Numerical to Polynominal operator is used for changing the type of numeric attributes to a polynominal type. This operator not only changes the type of the selected attributes but also maps all values of these attributes to corresponding polynominal values: every numeric value is simply considered to be another possible value of the polynominal attribute. In other words, each numeric value is used as a nominal value of the new attribute. As numeric attributes can have a huge number of different values even in a small range, converting such an attribute to polynominal form can generate a huge number of possible values for the new attribute. Such a polynominal attribute may not be very useful and it may increase memory usage significantly. If you need a more sophisticated conversion, please use the discretization operators, which can be found at 'Data Transformation/Type Conversion/Discretization'.
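The verbatim reuse of numeric values as nominal values can be sketched as follows. This is a hypothetical re-implementation for illustration only, not RapidMiner code.

```python
# Sketch of the Numerical to Polynominal mapping (hypothetical
# re-implementation for illustration; not RapidMiner code).
# Every numeric value is reused verbatim as a nominal (string) value,
# so an attribute with many distinct values yields many nominal values.

def numeric_to_polynominal(values):
    return [None if v is None else str(v) for v in values]

column = [0.02, 0.37, 0.02, 0.9]
nominal = numeric_to_polynominal(column)
print(nominal)               # ['0.02', '0.37', '0.02', '0.9']
print(sorted(set(nominal)))  # distinct nominal values: ['0.02', '0.37', '0.9']
```

Note how every distinct number becomes its own nominal value, which is why a real-valued attribute like those in the 'Sonar' data set blows up into a huge value set.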

285 6. Data Transformation

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data because attributes are specified in the meta data. The Retrieve operator provides meta data along with the data. The ExampleSet should have at least one numeric attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports

example set (exa) The ExampleSet with the selected numeric attributes converted to nominal type is the output of this port.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the attributes that you want to convert to polynominal form. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the given numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the numeric condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. Attributes can be shifted to the right list, which is the list of selected attributes on which the conversion from numeric to polynominal will take place; all other attributes will remain unchanged.

regular expression (string) The attributes whose names match this expression will be selected. A regular expression is a very powerful tool but needs some practice for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu, which gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list.
use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned

block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<': e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Numerical to Polynominal operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Numerical to Polynominal operator and only those attributes that satisfy the conditions are selected.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example if attribute 'att1' is selected and attribute 'att2' is removed before this parameter is set, then after setting this parameter 'att1' will be removed and 'att2' will be selected.
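The numeric condition syntax explained above can be illustrated with a small evaluator for a single value. This is only a hypothetical sketch of the stated rules (one operator per clause, all-&& or all-||, a blank space after each operator), not RapidMiner's actual parser.

```python
# Hypothetical sketch of evaluating a numeric condition such as
# '> 6 && < 11' or '<= 5 || < 0' against one value.
# Not RapidMiner's actual parser; for illustration only.
import operator

OPS = {">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "<=": operator.le, "=": operator.eq}

def holds(condition, value):
    # Mixing && and || in one condition is not allowed.
    if "&&" in condition and "||" in condition:
        raise ValueError("cannot combine && and || in one condition")
    combine, sep = (all, "&&") if "&&" in condition else (any, "||")
    checks = []
    for clause in condition.split(sep):
        op, threshold = clause.split()  # relies on the blank space after the operator
        checks.append(OPS[op](value, float(threshold)))
    return combine(checks)

print(holds("> 6 && < 11", 8))   # True
print(holds("<= 5 || < 0", 7))   # False
```

Note that `clause.split()` fails on a condition like '<5', which mirrors the rule that a blank space must follow the operator.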

Tutorial Processes

Converting numeric attributes of the Sonar data set to poly- nominal attributes

This Example Process mostly focuses on the working of this operator. All parameters of this operator mostly serve for selecting the attributes. The Select Attributes operator also has many similar parameters for the selection of attributes. You can study the Example Process of the Select Attributes operator if you want an understanding of these parameters.

The 'Sonar' data set is loaded using the Retrieve operator. The Numerical to Polynominal operator is applied on it. All parameters are used with default values. The attribute filter type parameter is set to 'all', thus all numeric attributes of the 'Sonar' data set will be converted to nominal type. As you can see in the Results Workspace, before the application of the Numerical to Polynominal operator all attributes were of real type. After the application of this operator they are all changed to nominal type. But if you have a look at the examples, they are exactly the same, i.e. just the type of the values has been changed, not the actual values. Every numeric value is considered to be another possible value of the polynominal attribute. In other words, each numeric value is simply used as a nominal value of the new attribute. As there is a very large number of different values for almost all attributes in the 'Sonar' data set, converting these attributes to polynominal form generates a huge number of possible values for the new attributes. These new polynominal attributes may not be very useful and they may increase memory usage significantly. In such a scenario it is always better to use a more sophisticated conversion, i.e. the discretization operators.

Numerical to Real

This operator changes the type of the selected numerical attributes to real type. It also maps all values of these attributes to real values.


Description

The Numerical to Real operator converts selected numerical attributes (especially the integer attributes) to real valued attributes. Each integer value is simply used as a real value of the new attribute. If the value is missing, the new value will be missing.
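The conversion just described amounts to reading each integer as a floating-point number. A minimal sketch in Python (hypothetical re-implementation for illustration, not RapidMiner code):

```python
# Sketch of the Numerical to Real conversion (hypothetical
# re-implementation for illustration; not RapidMiner code).
# Each integer value is reused as a real (float) value;
# a missing value stays missing.

def numeric_to_real(values):
    return [None if v is None else float(v) for v in values]

print(numeric_to_real([65, 70, None, 85]))  # [65.0, 70.0, None, 85.0]
```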

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. The ExampleSet should have at least one non-real numerical attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports

example set output (exa) The ExampleSet with the selected numerical attributes converted to real type is the output of this port.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the attributes on which you want to apply the numerical to real conversion. It has the following options:


ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When it is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the given numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the numeric condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the conversion from numerical to real will take place; all other attributes will remain unchanged.

regular expression (string) The attributes whose names match this expression will be selected. A regular expression is a very powerful tool but needs some practice for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu, which gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.
except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<': e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is checked, then after checking this parameter 'att1' will be unselected and 'att2' will be selected.

Tutorial Processes

Integer to real conversion of attributes of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the type of the Humidity and Temperature attributes is integer. The Numerical to Real operator is applied on the 'Golf' data set to convert the type of these integer attributes to real. All parameters are used with default values. The resultant ExampleSet can be seen in the Results Workspace. You can see that the type of these attributes is now real.

Real to Integer

This operator changes the type of the selected real attributes to integer type. It also maps all values of these attributes to integer values.

Description

The Real to Integer operator converts selected real attributes to integer valued attributes. Each real value is either truncated or rounded and then used as an integer value of the new attribute. This behavior is controlled by the round values parameter: if it is set to false, the decimal portion of the real value is simply truncated; otherwise the value is rounded. If the real value is missing, the new integer value will be missing.
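The truncate-versus-round behavior can be sketched as follows. This is a hypothetical re-implementation for illustration only, not RapidMiner code, and the choice of floor(v + 0.5) as the rounding rule is an assumption.

```python
# Sketch of the Real to Integer conversion (hypothetical
# re-implementation for illustration; not RapidMiner code).
# With round_values=False the decimal portion is truncated,
# otherwise the value is rounded; missing values stay missing.
import math

def real_to_integer(values, round_values=False):
    out = []
    for v in values:
        if v is None:
            out.append(None)
        elif round_values:
            out.append(math.floor(v + 0.5))  # conventional rounding (assumed rule)
        else:
            out.append(math.trunc(v))        # cut off the decimal portion
    return out

print(real_to_integer([5.1, 5.7, -2.7, None]))                     # [5, 5, -2, None]
print(real_to_integer([5.1, 5.7, -2.7, None], round_values=True))  # [5, 6, -3, None]
```

Note how truncation and rounding disagree on 5.7 and on negative values such as -2.7.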


Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. The ExampleSet should have at least one real attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports

example set output (exa) The ExampleSet with the selected real attributes converted to integer type is the output of this port.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the required attributes. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of type hierarchy when selecting attributes through this option. When it is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the given numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the numeric condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the conversion from real to integer will take place; all other attributes will

remain unchanged.

regular expression (string) The attributes whose names match this expression will be selected. A regular expression is a very powerful tool but needs some practice for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu, which gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified.
When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<': e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is checked, then after checking this parameter 'att1' will be unselected and 'att2' will be selected.

round values (boolean) This parameter indicates if the values should be rounded off for the conversion from real to integer. If not set to true, the decimal portion of the real values is simply truncated to convert them to integer values.

Tutorial Processes

Real to integer conversion of attributes of the Iris data set

The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has four real attributes, i.e. a1, a2, a3 and a4. The Real to Integer operator is applied on the 'Iris' data set to convert the type of these real attributes to integer. All parameters are used with default values. The resultant ExampleSet can be seen in the Results Workspace. You can see that the type of these attributes is now integer.

299 6. Data Transformation

Nominal to Binominal

This operator changes the type of selected nominal attributes to a binominal type. It also maps all values of these attributes to binominal values.

Description

The Nominal to Binominal operator is used for changing the type of nominal attributes to a binominal type. This operator not only changes the type of the selected attributes but also maps all values of these attributes to binominal values, i.e. true and false. For example, if a nominal attribute with the name 'costs' and the possible nominal values 'low', 'moderate' and 'high' is transformed, the result is a set of three binominal attributes 'costs = low', 'costs = moderate' and 'costs = high'. For each example, only the value of one of these attributes is true; the values of the other attributes are false. For examples of the original ExampleSet where the 'costs' attribute had the value 'low', the new ExampleSet will have the value of the 'costs = low' attribute set to 'true' and the values of the 'costs = moderate' and 'costs = high' attributes set to 'false'. Numeric attributes of the input ExampleSet remain unchanged.
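The 'costs' example above is a dummy coding of a nominal attribute, which can be sketched as follows. This is a hypothetical re-implementation for illustration only, not RapidMiner code; the sorted ordering of the generated attributes is an assumption.

```python
# Sketch of the Nominal to Binominal dummy coding (hypothetical
# re-implementation for illustration; not RapidMiner code).
# A nominal attribute 'costs' with values low/moderate/high becomes
# three binominal attributes 'costs = low', 'costs = moderate' and
# 'costs = high'; exactly one of them is 'true' per example.

def nominal_to_binominal(name, values):
    possible = sorted(set(values))  # distinct nominal values (order assumed)
    columns = {}
    for p in possible:
        columns[f"{name} = {p}"] = ["true" if v == p else "false"
                                    for v in values]
    return columns

for attr, col in nominal_to_binominal("costs",
                                      ["low", "high", "moderate", "low"]).items():
    print(attr, col)
```

Each generated column is 'true' exactly where the original attribute took the corresponding value, matching the description above.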


Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data because attributes are specified in the meta data. The Retrieve operator provides meta data along with the data. The ExampleSet should have at least one nominal attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports

example set (exa) The ExampleSet with the selected nominal attributes converted to binominal type is the output of this port.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters

create view (boolean) It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would normally be performed directly on the data will then be computed every time a value is requested, and the result is returned without changing the data.

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the attributes that you want to convert to binominal form. It has the following options:

301 6. Data Transformation

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; the required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list; attributes can be shifted to the right list, which is the list of selected attributes on which the conversion from nominal to binominal will take place. All other attributes remain unchanged.

regular expression (string) The attributes whose names match this expression will be selected. A regular expression is a very powerful tool, but beginners may need a detailed explanation. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '>5' will not work, so use '> 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Nominal to Binominal operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Nominal to Binominal operator and only those attributes that satisfy the conditions are selected.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and the previously unselected attributes are selected. For example, if attribute 'att1' was selected and attribute 'att2' was unselected before this parameter was set, then afterwards 'att1' will be unselected and 'att2' will be selected.

transform binominal (boolean) This parameter indicates whether attributes which are already binominal should be dichotomized, i.e. split into two columns with the values true and false.

use underscore in name (boolean) This parameter indicates whether underscores should be used in the new attribute names instead of empty spaces and '='. Although the resulting names are harder for humans to read, it might be more appropriate to use them if the data should be written into a database system.
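As a rough illustration of how the numeric value filter and its numeric condition behave, here is a hypothetical Python sketch. The function names and the column-dict data layout are invented for the sketch, and only a single comparison per condition is parsed, whereas RapidMiner's condition syntax also supports && and || combinations.

```python
import operator

# Hypothetical sketch of the numeric value filter: a numeric attribute is
# kept only if every one of its example values satisfies the condition;
# nominal (string-valued) attributes are kept regardless.
OPS = {">": operator.gt, "<": operator.lt, "<=": operator.le,
       ">=": operator.ge, "=": operator.eq}

def satisfies(value, condition):
    op_symbol, threshold = condition.split()   # e.g. "> 6"
    return OPS[op_symbol](value, float(threshold))

def numeric_value_filter(columns, condition):
    """columns: dict mapping attribute name to its list of example values."""
    selected = []
    for name, values in columns.items():
        if all(isinstance(v, str) for v in values):      # nominal: always kept
            selected.append(name)
        elif all(satisfies(v, condition) for v in values):
            selected.append(name)
    return selected

cols = {"Outlook": ["sunny", "rain"],
        "Temperature": [85, 80],
        "Humidity": [5, 90]}
numeric_value_filter(cols, "> 6")   # keeps 'Outlook' and 'Temperature'
```

Note that the sketch also reproduces the blank-space rule: `"> 6".split()` works, while `">6"` would not split into an operator and a threshold.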

Tutorial Processes

Nominal to Binominal conversion of attributes of Golf data set

This Example Process focuses on the transform binominal parameter. The remaining parameters are mostly for selecting attributes. The Select Attributes operator has many similar parameters for the selection of attributes. You can study the Example Process of the Select Attributes operator if you want an understanding of these parameters.

The Retrieve operator is used to load the Golf data set. A breakpoint is inserted at this point so that you can have a look at the data set before the application of the Nominal to Binominal operator. You can see that the 'Outlook' attribute has three possible values, i.e. 'sunny', 'rain' and 'overcast'. The 'Wind' attribute has two possible values, i.e. 'true' and 'false'. All parameters of the Nominal to Binominal operator are used with default values. Run the process. First you will see the Golf data set. Press the run button again and you will see the final results. You can see that the 'Outlook' attribute is replaced by three binominal attributes, one for each possible value of the original 'Outlook' attribute. These attributes are 'Outlook = sunny', 'Outlook = rain', and 'Outlook = overcast'. Only one of these attributes is true for a specific example; the others are false. Examples whose 'Outlook' attribute had the value 'sunny' in the original ExampleSet will have the attribute 'Outlook = sunny' set to 'true' in the new ExampleSet, while the 'Outlook = overcast' and 'Outlook = rain' attributes will be 'false'. The numeric attributes of the input ExampleSet remain unchanged.

The 'Wind' attribute was not replaced by two binominal attributes (one for each possible value) because this attribute is already binominal. If you still want to split it into two separate binominal attributes, this can be done by setting the transform binominal parameter to true.

Nominal to Text

This operator changes the type of selected nominal attributes to text. It also maps all values of these attributes to corresponding string values.

Description

The Nominal to Text operator converts all nominal attributes to string attributes. Each nominal value is simply used as a string value of the new attribute. If the value is missing in the nominal attribute, the new value will also be missing.
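The conversion itself is simple; a minimal Python sketch (the function name is hypothetical, and None stands in for a missing value):

```python
# Hypothetical sketch of the Nominal to Text conversion: every nominal
# value is carried over as a plain string, and missing values (None here)
# stay missing in the new attribute.
def nominal_to_text(values):
    return [None if v is None else str(v) for v in values]

nominal_to_text(["sunny", "rain", None])   # ['sunny', 'rain', None]
```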

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data, because attributes are specified in the meta data. The ExampleSet should have at least one nominal attribute; if there is no such attribute, the use of this operator does not make sense.


Output Ports

example set (exa) The ExampleSet with the selected nominal attributes converted to text is output of this port.

original (ori) The ExampleSet that was given as input is passed through this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the attributes to which you want to apply the nominal to text conversion. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; the required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the conversion from nominal to text will take place; all other attributes will remain unchanged.

regular expression (string) The attributes whose names match this expression will be selected. A regular expression is a very powerful tool, but beginners may need a detailed explanation. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type, i.e. the block type parameter's value.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '>5' will not work, so use '> 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in this operator. If this parameter is set to true, special attributes are also tested against the conditions specified in this operator and only those attributes that satisfy the conditions are selected.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and the previously unselected attributes are selected. For example, if attribute 'att1' was selected and attribute 'att2' was unselected before this parameter was checked, then afterwards 'att1' will be unselected and 'att2' will be selected.
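The interplay of the regular expression and except regular expression parameters can be sketched as follows. This is hypothetical Python: the function name is invented, and matching the full attribute name (rather than a substring) is an assumption for the sketch.

```python
import re

# Hypothetical sketch of regular-expression attribute selection with the
# 'except regular expression' option: attributes matching the first
# expression are selected unless they also match the exception.
def select_by_regex(attribute_names, expression, except_expression=None):
    selected = []
    for name in attribute_names:
        if re.fullmatch(expression, name):
            if except_expression and re.fullmatch(except_expression, name):
                continue   # filtered out by the exception
            selected.append(name)
    return selected

names = ["Outlook", "Temperature", "Humidity", "Wind"]
select_by_regex(names, ".*i.*", except_expression="H.*")
# '.*i.*' matches 'Humidity' and 'Wind'; the exception removes 'Humidity',
# so only 'Wind' is selected
```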

Tutorial Processes

Applying the Nominal to Text operator on the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is in- serted after the Retrieve operator so that you can have a look at the 'Golf' data set before application of the Nominal to Text operator. You can see that the 'Golf' data set has three nominal attributes i.e. 'Play', 'Outlook' and 'Wind'. The Nominal to Text operator is applied on this data set. The attribute filter type parameter is set to 'single' and the attribute parameter is set to 'Outlook'. Thus this operator converts the type of the 'Outlook' attribute to text. You can verify this by seeing the results in the Meta Data View in the Results Workspace.


Nominal to Numerical

This operator changes the type of selected non-numeric attributes to a numeric type. It also maps all values of these attributes to numeric values.

Description

The Nominal to Numerical operator is used for changing the type of non-numeric attributes to a numeric type. This operator not only changes the type of the selected attributes but also maps all values of these attributes to numeric values. Binary attribute values are mapped to 0 and 1. Numeric attributes of the input ExampleSet remain unchanged. This operator provides three modes for the conversion from nominal to numeric; the mode is selected by the coding type parameter. The coding types are explained in the parameters section and in the attached Example Process.

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data, because attributes are specified in the meta data. The Retrieve operator provides meta data along with the data. The ExampleSet should have at least one non-numeric attribute; if there is no such attribute, the use of this operator does not make sense.


Output Ports

example set (exa) The ExampleSet with the selected non-numeric attributes converted to numeric types is output of this port.

original (ori) The ExampleSet that was given as input is passed through this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

preprocessing model (pre) This port delivers the preprocessing model, which holds the information regarding the parameters of this operator in the current process.

Parameters

create view (boolean) It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would normally be performed directly on the data will then be computed every time a value is requested, and the result is returned without changing the data.

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the attributes to which you want to apply the nominal to numeric conversion. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; the required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the conversion from nominal to numeric will take place; all other attributes will remain unchanged.

regular expression (string) The attributes whose names match this expression will be selected. A regular expression is a very powerful tool, but beginners may need a detailed explanation. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '>5' will not work, so use '> 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and the previously unselected attributes are selected. For example, if attribute 'att1' was selected and attribute 'att2' was unselected before this parameter was checked, then afterwards 'att1' will be unselected and 'att2' will be selected.

coding type (selection) This parameter indicates the coding which will be used for transforming nominal attributes to numerical attributes. There are three available options, i.e. unique integers, dummy coding and effect coding. You can easily understand these options by studying the attached Example Process.

ˆ unique integers If this option is selected, the values of the nominal attribute are seen as equally ranked; the nominal attribute will simply be turned into a real valued attribute, with the old values mapped to equidistant real values.

ˆ dummy coding If this option is selected, a new attribute is created for every value of the nominal attribute, excluding the comparison group. The comparison group can be defined using the comparison groups parameter. In every example, the new attribute which corresponds to the actual nominal value of that example gets the value 1 and all other new attributes get the value 0. If the value of the nominal attribute of an example corresponds to the comparison group, all new attributes are set to 0. Note that the comparison group is an optional parameter with 'dummy coding'. If no comparison group is defined, a new attribute is created for every value; in that case there will be no example where all new attributes get the value 0. This can be easily understood by studying the attached Example Process.

ˆ effect coding If this option is selected, a new attribute is created for every value of the nominal attribute, excluding the comparison group. The comparison group can be defined using the comparison groups parameter. In every example, the new attribute which corresponds to the actual nominal value of that example gets the value 1 and all other new attributes get the value 0. If the value of the nominal attribute of an example corresponds to the comparison group, all new attributes are set to -1. Note that the comparison group is a mandatory parameter with 'effect coding'. This can be easily understood by studying the attached Example Process.

use comparison groups (boolean) This parameter is available only when the coding type parameter is set to dummy coding. If checked, a value has to be specified in the comparison groups parameter for each selected attribute in the ExampleSet; no separate new column for this value will appear in the final result set. If not checked, all values of the selected attributes will result in an indicator attribute in the resulting ExampleSet.

comparison groups This parameter defines the comparison group for each selected non-numeric attribute. Only one comparison group can be specified per attribute. When the coding type parameter is set to 'effect coding', it is compulsory to define a comparison group for all selected attributes.

use underscore in name (boolean) This parameter indicates whether underscores should be used in the names of the new attributes instead of empty spaces and '='. Although the resulting names are harder for humans to read, it might be more appropriate to use them if the data is to be written into a database system.
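The three coding types can be sketched in Python. The function names and the list-based layout are hypothetical, and the value order is fixed explicitly here, whereas RapidMiner derives it from the data.

```python
# Hypothetical sketch of the three coding types of Nominal to Numerical,
# shown for a single nominal attribute with a fixed value order.
def unique_integers(values, order):
    # every nominal value becomes one integer code
    return [order.index(v) for v in values]

def dummy_coding(values, order, comparison_group=None):
    # one indicator column per value (minus the comparison group);
    # the comparison group, if defined, is encoded as all zeros
    columns = [v for v in order if v != comparison_group]
    return [[1 if v == c else 0 for c in columns] for v in values]

def effect_coding(values, order, comparison_group):
    # like dummy coding, but the comparison group is encoded as all -1
    columns = [v for v in order if v != comparison_group]
    return [[-1] * len(columns) if v == comparison_group
            else [1 if v == c else 0 for c in columns]
            for v in values]

outlook = ["sunny", "rain", "overcast"]
order = ["sunny", "overcast", "rain"]
unique_integers(outlook, order)          # [0, 2, 1]
dummy_coding(outlook, order, "sunny")    # [[0, 0], [0, 1], [1, 0]]
effect_coding(outlook, order, "sunny")   # [[-1, -1], [0, 1], [1, 0]]
```

With 'sunny' as the comparison group, only the 'Outlook = overcast' and 'Outlook = rain' columns are produced, and a 'sunny' example becomes all zeros under dummy coding but all -1 under effect coding, matching the descriptions above.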

Tutorial Processes

Nominal to Numeric conversion through different coding types


This Example Process focuses on the coding type and comparison groups parameters. The remaining parameters are mostly for selecting attributes. The Select Attributes operator has many similar parameters for the selection of attributes. You can study its Example Process if you want an understanding of these parameters.

The Retrieve operator is used to load the 'Golf' data set. The Nominal to Numerical operator is applied on it. The 'Outlook' and 'Wind' attributes are selected for this operator for changing them to numeric attributes. Initially, the coding type parameter is set to 'unique integers'. Thus, the nominal attributes will simply be turned into real valued attributes; the old values result in equidistant real values. As you can see in the Results Workspace, all occurrences of the value 'sunny' for the 'Outlook' attribute are replaced by 2. Similarly, 'overcast' and 'rain' are replaced by 1 and 0 respectively. In the same way, all occurrences of the value 'false' in the 'Wind' attribute are replaced by 1 and occurrences of 'true' are replaced by 0.

Now, change the coding type parameter to 'dummy coding' and run the process again. As dummy coding is selected, a new attribute is created for every value of the nominal attribute. In every example, the new attribute which corresponds to the actual nominal value of that example gets the value 1 and all other new attributes get the value 0. As you can see in the Results Workspace, the 'Wind = true' and 'Wind = false' attributes are created in place of the 'Wind' attribute. In all examples where the 'Wind' attribute had the value 'true', the 'Wind = true' attribute gets 1 and the 'Wind = false' attribute gets 0. Similarly, in all examples where the 'Wind' attribute had the value 'false', the 'Wind = true' attribute gets the value 0 and the 'Wind = false' attribute gets the value 1. The same principle applies to the 'Outlook' attribute.

Now, keep the coding type parameter as 'dummy coding' and also set the use comparison groups parameter to true. Run the process again. You can see in the comparison groups parameter that 'sunny' and 'true' are defined as comparison groups for the 'Outlook' and 'Wind' attributes respectively. As dummy coding is used together with comparison groups, a new attribute is created for all values of the nominal attribute, excluding the comparison group. In every example, the new attribute which corresponds to the actual nominal value of that example gets value 1 and all other new attributes get value 0. If the value of the nominal attribute of this example corresponds to the comparison group, all new attributes are set to 0. This is why 'Outlook=rain' and 'Outlook=overcast' attributes are created but an 'Outlook=sunny' attribute is not created this time. In examples where the 'Outlook' attribute had value 'sunny', all new Outlook attributes get value 0. You can see this in the Results Workspace. The same rule is applied to the 'Wind' attribute.

Now, change the coding type parameter to 'effect coding' and run the process again. You can see in the comparison groups parameter that 'sunny' and 'true' are defined as comparison groups for the 'Outlook' and 'Wind' attributes respectively. As effect coding is selected, a new attribute is created for all values of the nominal attribute, excluding the comparison group. In every example, the new attribute which corresponds to the actual nominal value of that example gets value 1 and all other new attributes get value 0. If the value of the nominal attribute of this example corresponds to the comparison group, all new attributes are set to -1. This is why 'Outlook=rain' and 'Outlook=overcast' attributes are created but an 'Outlook=sunny' attribute is not created this time. In examples where the 'Outlook' attribute had value 'sunny', all new Outlook attributes get value -1. You can see this in the Results Workspace. The same rule is applied to the 'Wind' attribute.
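The coding schemes walked through above can be sketched in a few lines of Java. This is an illustrative re-implementation, not RapidMiner's actual code; the class and method names are invented, and the attribute values and comparison group mirror the 'Golf' example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NominalCoding {
    // Sketch of dummy coding with a comparison group vs. effect coding.
    // The comparison group maps to all 0s under dummy coding and to all
    // -1s under effect coding; every other value gets a single 1 in its
    // own indicator column (there is no column for the comparison group).
    static int[] encode(String value, List<String> nominalValues,
                        String comparisonGroup, boolean effectCoding) {
        List<String> columns = new ArrayList<>(nominalValues);
        columns.remove(comparisonGroup);              // no column for the comparison group
        int[] row = new int[columns.size()];
        if (value.equals(comparisonGroup)) {
            if (effectCoding) Arrays.fill(row, -1);   // dummy coding leaves all 0s
        } else {
            row[columns.indexOf(value)] = 1;
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> outlook = Arrays.asList("sunny", "overcast", "rain");
        // comparison group 'sunny': columns are Outlook=overcast, Outlook=rain
        System.out.println(Arrays.toString(encode("rain", outlook, "sunny", false)));  // [0, 1]
        System.out.println(Arrays.toString(encode("sunny", outlook, "sunny", false))); // [0, 0]
        System.out.println(Arrays.toString(encode("sunny", outlook, "sunny", true)));  // [-1, -1]
    }
}
```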


Nominal to Date

This operator converts the selected nominal attribute into the selected date time type. The nominal values are transformed into date and/or time values. This conversion is done with respect to the specified date format string.

Description

The Nominal to Date operator converts the selected nominal attribute of the input ExampleSet into the selected date and/or time type. The attribute is selected by the attribute name parameter. The type of the resultant date and/or time attribute is specified by the date type parameter. The nominal values are transformed into date and/or time values. This conversion is done with respect to the date format string that is specified by the date format parameter. The old nominal attribute will be removed and replaced by a new date and/or time attribute if the keep old attribute parameter is not set to true.

Date and Time Patterns

This section explains the date and time patterns. An understanding of date and time patterns is especially necessary for specifying the date format string in the date format parameter. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters that represent the components of a date or time. Text can be quoted using single quotes (') to avoid interpretation as date or time components. All other characters are not interpreted as date or time components; they are simply matched against the input string during parsing.

Here is a brief description of the defined pattern letters. The format types like 'Text', 'Number', 'Year', 'Month' etc. are described in detail after this section.


• G: This pattern letter is the era designator, for example AD or BC. This pattern letter follows the rules of the 'Text' format type.

• y: This pattern letter represents the year. yy represents the year in two digits, e.g. 96, and yyyy represents the year in four digits, e.g. 1996. This pattern letter follows the rules of the 'Year' format type.

• M: This pattern letter represents the month of the year. This pattern letter follows the rules of the 'Month' format type. A month can be represented, for example, as March, Mar or 03.

• w: This pattern letter represents the week number of the year. This pattern letter follows the rules of the 'Number' format type. For example, the first week of January can be represented as 01 and the last week of December can be represented as 52.

• W: This pattern letter represents the week number of the month. This pattern letter follows the rules of the 'Number' format type. For example, the first week of January can be represented as 01 and the fourth week of December can be represented as 04.

• D: This pattern letter represents the day number of the year. This pattern letter follows the rules of the 'Number' format type. For example, the first day of January can be represented as 01 and the last day of December can be represented as 365 (or 366 in case of a leap year).

• d: This pattern letter represents the day number of the month. This pattern letter follows the rules of the 'Number' format type. For example, the first day of January can be represented as 01 and the last day of December can be represented as 31.

• F: This pattern letter represents the day of the week in the month, e.g. 2 for the second Monday of a month. This pattern letter follows the rules of the 'Number' format type.

• E: This pattern letter represents the name of the day of the week, for example Tuesday or Tue. This pattern letter follows the rules of the 'Text' format type.


• a: This pattern letter represents the AM/PM portion of the 12-hour clock. This pattern letter follows the rules of the 'Text' format type.

• H: This pattern letter represents the hour of the day (from 0 to 23). This pattern letter follows the rules of the 'Number' format type.

• k: This pattern letter represents the hour of the day (from 1 to 24). This pattern letter follows the rules of the 'Number' format type.

• K: This pattern letter represents the hour of the day for the 12-hour clock (from 0 to 11). This pattern letter follows the rules of the 'Number' format type.

• h: This pattern letter represents the hour of the day for the 12-hour clock (from 1 to 12). This pattern letter follows the rules of the 'Number' format type.

• m: This pattern letter represents the minutes of the hour (from 0 to 59). This pattern letter follows the rules of the 'Number' format type.

• s: This pattern letter represents the seconds of the minute (from 0 to 59). This pattern letter follows the rules of the 'Number' format type.

• S: This pattern letter represents the milliseconds of the second (from 0 to 999). This pattern letter follows the rules of the 'Number' format type.

• z: This pattern letter represents the time zone. This pattern letter follows the rules of the 'General Time Zone' format type. Examples include Pacific Standard Time, PST, GMT-08:00 etc.

• Z: This pattern letter represents the time zone. This pattern letter follows the rules of the 'RFC 822 Time Zone' format type. An example is -0800.

Please note that all other characters from 'A' to 'Z' and from 'a' to 'z' are reserved. Pattern letters are usually repeated, as their number determines the exact presentation. Here is the explanation of the various format types:

• Text: For formatting, if the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used (if available). For parsing, both forms are acceptable independent of the number of pattern letters.

• Number: For formatting, the number of pattern letters is the minimum number of digits. Numbers that are shorter than this minimum number of digits are zero-padded to this amount. For example, if the minimum number of digits is 3 then the number 5 will be changed to 005. For parsing, the number of pattern letters is ignored unless it is needed to separate two adjacent fields.

• Year: If the underlying calendar is the Gregorian calendar, the following rules are applied:

– For formatting, if the number of pattern letters is 2, the year is truncated to 2 digits; otherwise it is interpreted as a 'Number' format type.

– For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern 'MM/dd/yyyy', the string '01/11/12' parses to 'Jan 11, 12 A.D.'.

– For parsing with the abbreviated year pattern ('y' or 'yy'), this operator must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the operator is created. For example, using a pattern of 'MM/dd/yy' and an operator created on Jan 1, 1997, the string '01/11/12' would be interpreted as Jan 11, 2012 while the string '05/04/64' would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits will be parsed into the default century. Any other numeric string, such as a one digit string, a three or more digit string, or a two digit string that is not all digits (for example, '-1'), is interpreted literally. So '01/02/3' or '01/02/003' are parsed, using the same pattern, as 'Jan 2, 3 AD'. Likewise, '01/02/-3' is parsed as 'Jan 2, 4 BC'.

Otherwise, if the underlying calendar is not the Gregorian calendar, calendar system specific forms are applied. If the number of pattern letters is 4 or more, a calendar specific long form is used. Otherwise, a calendar short or abbreviated form is used.

• Month: If the number of pattern letters is 3 or more, the month is interpreted as a 'Text' format type; otherwise, it is interpreted as a 'Number' format type.

• General time zone: Time zones are interpreted as a 'Text' format type if they have names. It is also possible to specify a time zone as a GMT offset value. RFC 822 time zones are also acceptable.

• RFC 822 time zone: For formatting, the RFC 822 4-digit time zone format (e.g. -0800) is used. General time zones are also acceptable.
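The abbreviated-year rule described under 'Year' can be demonstrated with Java's SimpleDateFormat, which is what these pattern letters come from. The helper below is a hypothetical sketch, not RapidMiner code; `set2DigitYearStart` pins the 100-year parsing window explicitly so the example is deterministic (by default the window starts 80 years before the moment the parser is created).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class TwoDigitYear {
    // Parse a date string and return its year, with an explicit start of
    // the 100-year window used for two-digit years.
    static int yearOf(String text, String pattern, int windowStart) {
        try {
            SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
            sdf.set2DigitYearStart(
                new GregorianCalendar(windowStart, Calendar.JANUARY, 1).getTime());
            Calendar cal = Calendar.getInstance();
            cal.setTime(sdf.parse(text));
            return cal.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // window 1917..2016, mimicking a parser created on Jan 1, 1997
        System.out.println(yearOf("01/11/12", "MM/dd/yy", 1917));  // 2012
        System.out.println(yearOf("05/04/64", "MM/dd/yy", 1917));  // 1964
        // a three-digit year is interpreted literally, even with 'yy'
        System.out.println(yearOf("01/02/003", "MM/dd/yy", 1917)); // 3
    }
}
```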

This operator also supports localized date and time pattern strings by defining the locale parameter. In these strings, the pattern letters described above may be replaced with other, locale-dependent pattern letters.

The following examples show how date and time patterns are interpreted in the U.S. locale. The given date and time are 2001-07-04 12:08:56 local time in the U.S. Pacific Time time zone.

• "yyyy.MM.dd G 'at' HH:mm:ss z": 2001.07.04 AD at 12:08:56 PDT

• "EEE, MMM d, ''yy": Wed, Jul 4, '01

• "h:mm a": 12:08 PM

• "hh 'o''clock' a, zzzz": 12 o'clock PM, Pacific Daylight Time

• "K:mm a, z": 0:08 PM, PDT

• "yyyy.MMMMM.dd GGG hh:mm aaa": 2001.July.04 AD 12:08 PM

• "EEE, d MMM yyyy HH:mm:ss Z": Wed, 4 Jul 2001 12:08:56 -0700

• "yyMMddHHmmssZ": 010704120856-0700

• "yyyy-MM-dd'T'HH:mm:ss.SSSZ": 2001-07-04T12:08:56.235-0700
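A few of these outputs can be reproduced with Java's SimpleDateFormat, formatting the same fixed instant (2001-07-04 12:08:56.235, US Pacific time). The class and helper below are invented for illustration only.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

public class PatternDemo {
    // Format the fixed instant above with a given pattern, using the
    // U.S. locale and the US Pacific time zone.
    static String format(String pattern) {
        TimeZone pacific = TimeZone.getTimeZone("America/Los_Angeles");
        Calendar cal = Calendar.getInstance(pacific, Locale.US);
        cal.set(2001, Calendar.JULY, 4, 12, 8, 56);
        cal.set(Calendar.MILLISECOND, 235);
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        sdf.setTimeZone(pacific);
        return sdf.format(cal.getTime());
    }

    public static void main(String[] args) {
        System.out.println(format("yyyy.MM.dd G 'at' HH:mm:ss z")); // 2001.07.04 AD at 12:08:56 PDT
        System.out.println(format("EEE, MMM d, ''yy"));             // Wed, Jul 4, '01
        System.out.println(format("h:mm a"));                       // 12:08 PM
        System.out.println(format("yyyy-MM-dd'T'HH:mm:ss.SSS"));    // 2001-07-04T12:08:56.235
    }
}
```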


Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The ExampleSet should have at least one nominal attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports

example set (exa) The selected nominal attribute is converted to date type and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute name (string) The name of the nominal attribute that is to be converted to date type is specified here.

date type (selection) This parameter specifies the type of the resultant attribute.

• date If the date type parameter is set to 'date', the resultant attribute will be of date type. The time portion (if any) of the nominal attribute will be ignored.

• time If the date type parameter is set to 'time', the resultant attribute will be of time type. The date portion (if any) of the nominal attribute will be ignored.

• date time If the date type parameter is set to 'date time', the resultant attribute will be of date time type.

date format This is the most important parameter of this operator. It specifies the date time format of the selected nominal attribute. Date format strings are discussed in detail in the description of this operator.

time zone (selection) This is an expert parameter. A long list of time zones is provided; users can select any of them.

locale (selection) This is an expert parameter. A long list of locales is provided; users can select any of them.

keep old attribute (boolean) This parameter indicates if the original nominal attribute should be kept or if it should be discarded.

Tutorial Processes

Introduction to the Nominal to Date operator

This Example Process starts with a subprocess. The subprocess delivers an ExampleSet with just a single attribute. The name of the attribute is 'deadline date'. The type of the attribute is nominal. A breakpoint is inserted here so that you can view the ExampleSet. As you can see, all the examples of this attribute have both date and time information. The Nominal to Date operator is applied on this ExampleSet to change the type of the 'deadline date' attribute from nominal to date type. Have a look at the parameters of the Nominal to Date operator. The attribute name parameter is set to 'deadline date'. The date type parameter is set to 'date'. Thus the 'deadline date' attribute will be converted from nominal to date type (not date time), therefore the time portion of the value will not be available in the resultant attribute. The date format parameter is set to 'EEEE, MMMM d, yyyy h:m:s a z'; here is an explanation of this date format string:

• 'E' is the pattern letter used for the representation of the name of the day of the week. As explained in the description, if the number of pattern letters is 4 or more, the full form is used. Thus 'EEEE' is used for representing the day of the week in full form, e.g. Monday, Tuesday etc.

• 'M' is the pattern letter used for the representation of the name of the month of the year. As explained in the description, if the number of pattern letters is 4 or more, the full form is used. Thus 'MMMM' is used for representing the month of the year in full form, e.g. January, February etc.

• 'y' is the pattern letter used for the representation of the year portion of the date. 'yyyy' represents the year of the date in four digits, like 2011, 2012 etc.

• 'h' is the pattern letter used for the representation of the hour portion of the time. 'h' can represent multiple digit hours as well, e.g. 10, 11 etc. The difference between 'hh' and 'h' is that 'hh' represents single digit hours by prepending a 0, e.g. 01, 02 and so on, but 'h' represents single digits without any modification, e.g. 1, 2 and so on.

• 'm' is the pattern letter used for the representation of the minute portion of the time. 'm' can represent multiple digit minutes as well, e.g. 51, 52 etc. The difference between 'mm' and 'm' is that 'mm' represents single digit minutes by prepending a 0, e.g. 01, 02 and so on, but 'm' represents single digits without any modification, e.g. 1, 2 and so on.

• 's' is the pattern letter used for the representation of the second portion of the time. 's' can represent multiple digit seconds as well, e.g. 40, 41 etc. The difference between 'ss' and 's' is that 'ss' represents single digit seconds by prepending a 0, e.g. 01, 02 and so on, but 's' represents single digits without any modification, e.g. 1, 2 and so on.

• 'a' is the pattern letter used for the representation of the 'AM/PM' portion of the 12-hour date and time.

• 'z' is the pattern letter used for the representation of the time zone.

Please note that this date format string represents the date format of the nominal values of the selected nominal attribute of the input ExampleSet. The date format string helps RapidMiner to understand which portions of the nominal value represent which component of the date or time e.g. year, month etc.
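Parsing with this format string can be sketched with Java's SimpleDateFormat, which uses exactly these pattern letters. The class name and the sample value below are invented for illustration; any string matching the pattern would work (the tutorial's actual data is not shown in the book).

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

public class NominalToDateDemo {
    // Parse a nominal value shaped like the tutorial's format string
    // 'EEEE, MMMM d, yyyy h:m:s a z' and return it as a GMT calendar.
    static Calendar parse(String value) {
        try {
            SimpleDateFormat sdf =
                new SimpleDateFormat("EEEE, MMMM d, yyyy h:m:s a z", Locale.US);
            Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
            cal.setTime(sdf.parse(value));
            return cal;
        } catch (ParseException e) {
            throw new IllegalArgumentException("value does not match the pattern", e);
        }
    }

    public static void main(String[] args) {
        Calendar c = parse("Monday, March 5, 2012 4:23:42 PM GMT");
        System.out.println(c.get(Calendar.YEAR));        // 2012
        System.out.println(c.get(Calendar.MONTH));       // 2 (March; Calendar months are 0-based)
        System.out.println(c.get(Calendar.HOUR_OF_DAY)); // 16 (4 PM)
    }
}
```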


Text to Nominal

This operator changes the type of selected text attributes to nominal. It also maps all values of these attributes to corresponding nominal values.

Description

The Text to Nominal operator converts all text attributes to nominal attributes. Each text value is simply used as a nominal value of the new attribute. If the value is missing in the text attribute, the new value will also be missing.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The ExampleSet should have at least one text attribute because if there is no such attribute, the use of this operator does not make sense.


Output Ports

example set output (exa) The selected text attributes are converted to nominal and the resultant ExampleSet is the output of this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting the attributes on which you want to apply the text to nominal conversion. It has the following options:

• all This option simply selects all the attributes of the ExampleSet. This is the default option.

• single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

• subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

• regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

• value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

• block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

• no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

• numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the conversion from text to nominal will take place; all other attributes will remain unchanged.

regular expression (string) The attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (the regular expression that was specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type, i.e. the block type parameter's value.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value of greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<'; e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Select Attributes operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Select Attributes operator and only those attributes are selected that satisfy the conditions.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking this parameter, then after checking this parameter 'att1' will be unselected and 'att2' will be selected.

Tutorial Processes

Introduction to the Text to Nominal operator

This Example Process starts with the Subprocess operator which provides an ExampleSet. A breakpoint is inserted here so that you can have a look at the Ex- ampleSet. You can see that the ExampleSet has three text attributes i.e. 'att1', 'att2' and 'att3'. The Text to Nominal operator is applied on this data set. The attribute filter type parameter is set to 'single' and the attribute parameter is set to 'att1'. Thus this operator converts the type of the 'att1' attribute from text to nominal. You can verify this by seeing the results in the Meta Data View in the Results Workspace.


Date to Numerical

This operator changes the type of the selected date attribute to a numeric type. It also maps all values of this attribute to numeric values. You can specify exactly which component of date or time should be extracted. You can also specify relative to which date or time component information should be extracted.

Description

The Date to Numerical operator provides a lot of flexibility when it comes to selecting a component of date or time. The following components can be selected: millisecond, second, minute, hour, day, week, month, quarter, half year, and year. The most important thing is that these components can be selected relative to other components. For example it is possible to extract the day relative to the week, relative to the month or relative to the year. Suppose the date is 15/Feb/2012. Then the day relative to the month would be 15 because it is the 15th day of the month. And the day relative to the year would be 46 because this is the 46th day of the year. All date and time components can be extracted relative to the most common parent components, e.g. the month can be calculated relative to the quarter or the year. Similarly the second can be calculated relative to the minute, the hour or the day. All date and time components can be extracted relative to the Epoch, where Epoch is defined as the date '01-01-1970 00:00'. If the date attribute has no time information then all calculations on time components will result in 0. All these things can be understood easily by studying the attached Example Process.
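The distinction between "day relative to month" and "day relative to year" for 15/Feb/2012 can be checked with java.util.Calendar. This is a sketch of the idea, not RapidMiner's implementation; the class and method names are invented.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

public class RelativeDay {
    // Build a calendar for 15/Feb/2012 with no time information.
    static Calendar feb15() {
        Calendar cal = new GregorianCalendar(Locale.US);
        cal.clear();
        cal.set(2012, Calendar.FEBRUARY, 15);
        return cal;
    }

    public static void main(String[] args) {
        System.out.println(feb15().get(Calendar.DAY_OF_MONTH)); // 15 (day relative to the month)
        System.out.println(feb15().get(Calendar.DAY_OF_YEAR));  // 46 (day relative to the year)
    }
}
```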

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Generate Data operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in the meta data. The Generate Data operator provides meta data along with the data. The ExampleSet should have at least one date attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports

example set (exa) The ExampleSet with the selected date attribute converted to numeric type is the output of this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute name (string) This parameter specifies the attribute of the input ExampleSet that should be converted from date to numerical form.

time unit (selection) This parameter specifies the unit in which the time is measured. In other words, this parameter specifies the component of the date that should be extracted. The following components can be extracted: millisecond, second, minute, hour, day, week, month, quarter, half year, and year.

millisecond relative to (selection) This parameter is only available when the time unit parameter is set to 'millisecond'. It specifies the component relative to which the milliseconds should be extracted. The following options are available: second, epoch.

second relative to (selection) This parameter is only available when the time unit parameter is set to 'second'. It specifies the component relative to which the seconds should be extracted. The following options are available: minute, hour, day, epoch.

minute relative to (selection) This parameter is only available when the time unit parameter is set to 'minute'. It specifies the component relative to which the minutes should be extracted. The following options are available: hour, day, epoch.

hour relative to (selection) This parameter is only available when the time unit parameter is set to 'hour'. It specifies the component relative to which the hours should be extracted. The following options are available: day, epoch.

day relative to (selection) This parameter is only available when the time unit parameter is set to 'day'. It specifies the component relative to which the days should be extracted. The following options are available: week, month, year, epoch.

week relative to (selection) This parameter is only available when the time unit parameter is set to 'week'. It specifies the component relative to which the weeks should be extracted. The following options are available: month, year, epoch.

month relative to (selection) This parameter is only available when the time unit parameter is set to 'month'. It specifies the component relative to which the months should be extracted. The following options are available: quarter, year, epoch.

quarter relative to (selection) This parameter is only available when the time unit parameter is set to 'quarter'. It specifies the component relative to which the quarters should be extracted. The following options are available: year, epoch.

half year relative to (selection) This parameter is only available when the time unit parameter is set to 'half year'. It specifies the component relative to which the half years should be extracted. The following options are available: year, epoch.

year relative to (selection) This parameter is only available when the time unit parameter is set to 'year'. It specifies the component relative to which the years should be extracted. The following options are available: epoch, era.

keep old attribute (boolean) This is an expert parameter. It indicates if the original date attribute of the input ExampleSet should be kept. It is set to false by default, thus the original date attribute is removed from the input ExampleSet.


Tutorial Processes

Introduction to the Date to Numerical operator

The Generate Data by User Specification operator is used in this Example Process to create a date type attribute. The attribute is named 'Date' and it is defined by the expression 'date parse("04/21/2012")'. Thus an attribute named 'Date' is created with just a single example. The value of the date is 21/April/2012. Please note that no information about time is given. The Date to Numerical operator is applied on this ExampleSet. The 'Date' attribute is selected in the attribute name parameter.

• If the time unit parameter is set to 'year' and the year relative to parameter is set to 'era' then the result is 2012. This is so because this is the 2012th year relative to the era.

• If the time unit parameter is set to 'year' and the year relative to parameter is set to 'epoch' then the result is 42. This is so because the year of the epoch date is 1970 and the difference between 2012 and 1970 is 42.

• If the time unit parameter is set to 'half year' and the half year relative to parameter is set to 'year' then the result is 1. This is so because April is in the first half of the year.

• If the time unit parameter is set to 'quarter' and the quarter relative to parameter is set to 'year' then the result is 2. This is so because April is the fourth month of the year and it comes in the second quarter of the year.

• If the time unit parameter is set to 'month' and the month relative to parameter is set to 'year' then the result is 4. This is so because April is the fourth month of the year.

• If the time unit parameter is set to 'month' and the month relative to parameter is set to 'quarter' then the result is 1. This is so because April is the first month of the second quarter of the year.


- If the time unit parameter is set to 'week' and the week relative to parameter is set to 'year' then the result is 16. This is so because the 21st of April falls in the 16th week of the year.

- If the time unit parameter is set to 'week' and the week relative to parameter is set to 'month' then the result is 3. This is so because the 21st day of the month falls in the 3rd week of the month.

- If the time unit parameter is set to 'day' and the day relative to parameter is set to 'year' then the result is 112. This is so because 21st April is the 112th day of the year.

- If the time unit parameter is set to 'day' and the day relative to parameter is set to 'month' then the result is 21. This is so because 21st April is the 21st day of the month.

- If the time unit parameter is set to 'day' and the day relative to parameter is set to 'week' then the result is 7. This is so because 21st April 2012 is a Saturday; Saturday is the seventh day of the week and Sunday is the first.

- If the time unit parameter is set to 'hour' and the hour relative to parameter is set to 'day' then the result is 0. This is so because no time information was provided for this date attribute, so all time fields default to 00.
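Since RapidMiner runs on the JVM, the component values listed above mirror java.util.Calendar semantics. The following standalone Java sketch (an illustration, not RapidMiner code) reproduces a few of the values for 21 April 2012:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

public class DateComponents {
    public static void main(String[] args) {
        // 21 April 2012, no time information (all time fields default to 0)
        Calendar cal = new GregorianCalendar(2012, Calendar.APRIL, 21);

        System.out.println(cal.get(Calendar.MONTH) + 1);    // 4   (month relative to year)
        System.out.println(cal.get(Calendar.DAY_OF_MONTH)); // 21  (day relative to month)
        System.out.println(cal.get(Calendar.DAY_OF_YEAR));  // 112 (day relative to year; 2012 is a leap year)
        System.out.println(cal.get(Calendar.HOUR_OF_DAY));  // 0   (no time was provided)
    }
}
```

Week-based fields such as WEEK_OF_YEAR are omitted here because their values depend on locale-specific first-day-of-week settings.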


Date to Nominal

This operator parses the date values of the specified date attribute with respect to the given date format string and transforms the values into nominal values.

Description

The Date to Nominal operator transforms the specified date attribute and writes a new nominal attribute in a user-specified format. This conversion is done with respect to the date format string that is specified by the date format parameter. This operator can be useful for time-based OLAP, e.g. to change the granularity of time stamps from day to week or month. The date attribute is selected by the attribute name parameter. The old date attribute will be removed and replaced by a new nominal attribute if the keep old attribute parameter is not set to true. Understanding date and time patterns is very important for using this operator properly.

Date and Time Patterns

This section explains the date and time patterns. Understanding of date and time patterns is necessary especially for specifying the date format string in the date format parameter. Within date and time pattern strings, unquoted letters from 'A' to 'Z' and from 'a' to 'z' are interpreted as pattern letters that represent the components of a date or time. Text can be quoted using single quotes (') to avoid interpretation as date or time components. All other characters are not interpreted as date or time components; they are simply matched against the input string during parsing.

Here is a brief description of the defined pattern letters. The format types like 'Text', 'Number', 'Year', 'Month' etc are described in detail after this section.


- G: This pattern letter is the era designator, for example AD or BC. It follows the rules of the 'Text' format type.

- y: This pattern letter represents the year. yy represents the year in two digits, e.g. 96, and yyyy represents the year in four digits, e.g. 1996. This pattern letter follows the rules of the 'Year' format type.

- M: This pattern letter represents the month of the year. It follows the rules of the 'Month' format type. A month can be represented, for example, as March, Mar or 03.

- w: This pattern letter represents the week number of the year. It follows the rules of the 'Number' format type. For example, the first week of January can be represented as 01 and the last week of December can be represented as 52.

- W: This pattern letter represents the week number of the month. It follows the rules of the 'Number' format type. For example, the first week of January can be represented as 01 and the fourth week of December can be represented as 04.

- D: This pattern letter represents the day number of the year. It follows the rules of the 'Number' format type. For example, the first day of January can be represented as 01 and the last day of December can be represented as 365 (or 366 in case of a leap year).

- d: This pattern letter represents the day number of the month. It follows the rules of the 'Number' format type. For example, the first day of January can be represented as 01 and the last day of December can be represented as 31.

- F: This pattern letter represents the day of the week in the month, e.g. 2 for the second Friday of the month. It follows the rules of the 'Number' format type.

- E: This pattern letter represents the name of the day of the week. It follows the rules of the 'Text' format type. For example, Tuesday or Tue.

- a: This pattern letter represents the AM/PM marker of the 12-hour clock. It follows the rules of the 'Text' format type.

- H: This pattern letter represents the hour of the day (from 0 to 23). It follows the rules of the 'Number' format type.

- k: This pattern letter represents the hour of the day (from 1 to 24). It follows the rules of the 'Number' format type.

- K: This pattern letter represents the hour of the 12-hour clock (from 0 to 11). It follows the rules of the 'Number' format type.

- h: This pattern letter represents the hour of the 12-hour clock (from 1 to 12). It follows the rules of the 'Number' format type.

- m: This pattern letter represents the minutes of the hour (from 0 to 59). It follows the rules of the 'Number' format type.

- s: This pattern letter represents the seconds of the minute (from 0 to 59). It follows the rules of the 'Number' format type.

- S: This pattern letter represents the milliseconds of the second (from 0 to 999). It follows the rules of the 'Number' format type.

- z: This pattern letter represents the time zone. It follows the rules of the 'General Time Zone' format type. Examples include Pacific Standard Time, PST and GMT-08:00.

- Z: This pattern letter represents the time zone. It follows the rules of the 'RFC 822 Time Zone' format type. An example is -0800.

Please note that all other characters from 'A' to 'Z' and from 'a' to 'z' are reserved. Pattern letters are usually repeated, as their number determines the exact presentation. Here is an explanation of the various format types:

- Text: For formatting, if the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used (if available). For parsing, both forms are accepted, independent of the number of pattern letters.

- Number: For formatting, the number of pattern letters is the minimum number of digits. Numbers that are shorter than this minimum are zero-padded to this amount. For example, if the minimum number of digits is 3, the number 5 will be changed to 005. For parsing, the number of pattern letters is ignored unless it is needed to separate two adjacent fields.

- Year: If the underlying calendar is the Gregorian calendar, the following rules are applied:

  – For formatting, if the number of pattern letters is 2, the year is truncated to 2 digits; otherwise it is interpreted as a 'Number' format type.

  – For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern 'MM/dd/yyyy', the string '01/11/12' parses to 'Jan 11, 12 AD'.

  – For parsing with the abbreviated year pattern ('y' or 'yy'), this operator must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the operator is created. For example, using the pattern 'MM/dd/yy' and an operator created on Jan 1, 1997, the string '01/11/12' would be interpreted as Jan 11, 2012 while the string '05/04/64' would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits will be parsed into the default century. Any other numeric string, such as a one-digit string, a three or more digit string, or a two-digit string that is not all digits (for example, '-1'), is interpreted literally. So '01/02/3' or '01/02/003' are parsed, using the same pattern, as 'Jan 2, 3 AD'. Likewise, '01/02/-3' is parsed as 'Jan 2, 4 BC'.

  Otherwise, if the underlying calendar is not the Gregorian calendar, calendar system specific forms are applied. If the number of pattern letters is 4 or more, a calendar specific long form is used; otherwise a calendar specific short or abbreviated form is used.

- Month: If the number of pattern letters is 3 or more, the month is interpreted as a 'Text' format type; otherwise, it is interpreted as a 'Number' format type.

- General time zone: Time zones are interpreted as a 'Text' format type if they have names. It is also possible to specify time zones as GMT offset values. RFC 822 time zones are also accepted.

- RFC 822 time zone: For formatting, the RFC 822 4-digit time zone format is used. General time zones are also accepted.
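The two-digit year rule described above can be observed directly with java.text.SimpleDateFormat, which implements the same pattern semantics; a minimal sketch:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class TwoDigitYearDemo {
    public static void main(String[] args) throws ParseException {
        Calendar cal = Calendar.getInstance();

        // With 'yyyy', a numeric year is interpreted literally: '12' means 12 AD
        cal.setTime(new SimpleDateFormat("MM/dd/yyyy").parse("01/11/12"));
        System.out.println(cal.get(Calendar.YEAR));  // 12

        // With 'yy', exactly two digits are mapped into the default century
        // (80 years back / 20 years forward from the current date)
        cal.setTime(new SimpleDateFormat("MM/dd/yy").parse("01/11/12"));
        System.out.println(cal.get(Calendar.YEAR));  // a recent year, e.g. 2012
    }
}
```

The second result depends on when the code is run, which is exactly the point of the default-century rule.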

This operator also supports localized date and time pattern strings by defining the locale parameter. In these strings, the pattern letters described above may be replaced with other, locale-dependent pattern letters.

The following examples show how date and time patterns are interpreted in the U.S. locale. The given date and time are 2001-07-04 12:08:56 local time in the U.S. Pacific Time time zone.

- 'yyyy.MM.dd G 'at' HH:mm:ss z': 2001.07.04 AD at 12:08:56 PDT

- 'EEE, MMM d, yy': Wed, Jul 4, '01

- 'h:mm a': 12:08 PM

- 'hh 'o''clock' a, zzzz': 12 o'clock PM, Pacific Daylight Time

- 'K:mm a, z': 0:08 PM, PDT

- 'yyyy.MMMMM.dd GGG hh:mm aaa': 2001.July.04 AD 12:08 PM

- 'EEE, d MMM yyyy HH:mm:ss Z': Wed, 4 Jul 2001 12:08:56 -0700

- 'yyMMddHHmmssZ': 010704120856-0700

- 'yyyy-MM-dd'T'HH:mm:ss.SSSZ': 2001-07-04T12:08:56.235-0700
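These pattern strings are the ones used by java.text.SimpleDateFormat, so any line above can be reproduced directly; for example the RFC 822 variant:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class PatternDemo {
    public static void main(String[] args) {
        // 2001-07-04 12:08:56 in the U.S. Pacific time zone (PDT in July)
        TimeZone pacific = TimeZone.getTimeZone("America/Los_Angeles");
        Calendar cal = Calendar.getInstance(pacific);
        cal.set(2001, Calendar.JULY, 4, 12, 8, 56);
        Date date = cal.getTime();

        SimpleDateFormat fmt = new SimpleDateFormat("EEE, d MMM yyyy HH:mm:ss Z", Locale.US);
        fmt.setTimeZone(pacific);
        System.out.println(fmt.format(date));  // Wed, 4 Jul 2001 12:08:56 -0700
    }
}
```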


Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input. The ExampleSet should have at least one date attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports

example set output (exa) The selected date attribute is converted to a nominal attribute according to the specified date format string and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute name (string) The name of the date attribute is specified here. The attribute name can be selected from the drop-down box of the attribute name parameter if the meta data is known.

date format This is the most important parameter of this operator. It specifies the date time format of the desired nominal attribute. This date format string specifies what portion of the date attribute should be stored in the nominal attribute. Date format strings are discussed in detail in the description of this operator.

locale (selection) This is an expert parameter. A long list of locales is provided; users can select any of them.

keep old attribute (boolean) This parameter indicates if the original date attribute should be kept or discarded.


Tutorial Processes

Introduction to the Date to Nominal operator

This Example Process starts with a Subprocess operator. The subprocess delivers an ExampleSet with just a single attribute. The name of the attribute is 'deadline date'. The type of the attribute is date. A breakpoint is inserted here so that you can view the ExampleSet. As you can see, all the examples of this attribute have both date and time information. The Date to Nominal operator is applied on this ExampleSet to change the type of the 'deadline date' attribute from date to nominal type. Have a look at the parameters of the Date to Nominal operator. The attribute name parameter is set to 'deadline date'. The date format parameter is set to 'EEEE'; here is an explanation of this date format string:

'E' is the pattern letter used for the representation of the name of the day of the week. As explained in the description, if the number of pattern letters is 4 or more, the full form is used. Thus 'EEEE' is used for representing the day of the week in full form, e.g. Monday, Tuesday etc. The date attribute is therefore changed to a nominal attribute which has only the names of the days of the week as possible values. Please note that this date format string is used for specifying the format of the nominal values of the new nominal attribute of the input ExampleSet.
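The effect of the pattern length can be checked with java.text.SimpleDateFormat, which uses the same pattern rules (an illustrative sketch, not RapidMiner code):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class DayNameDemo {
    public static void main(String[] args) {
        Date date = new GregorianCalendar(2012, Calendar.APRIL, 21).getTime();

        // Four or more pattern letters select the full form, fewer the short form
        System.out.println(new SimpleDateFormat("EEEE", Locale.US).format(date)); // Saturday
        System.out.println(new SimpleDateFormat("EEE",  Locale.US).format(date)); // Sat
    }
}
```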


Parse Numbers

This operator changes the type of selected nominal attributes to a numeric type. It also maps all values of these attributes to numeric values by parsing the numbers if possible.

Description

The Parse Numbers operator is used for changing the type of nominal attributes to a numeric type. This operator not only changes the type of the selected attributes but also maps all values of these attributes to numeric values by parsing the numbers if possible. In contrast to the Nominal to Numerical operator, this operator directly parses the numbers from values that were wrongly encoded as nominal values. The Nominal to Numerical operator is used when the values are actually nominal but you want to change them to numerical values. On the other hand, the Parse Numbers operator is used when the values should actually be numerical but are wrongly stored as nominal values. Please note that this operator will first check the stored nominal mappings for all attributes. If (old) mappings are still stored which actually are nominal (without the corresponding data being part of the ExampleSet), the attribute will not be converted. Please use the Guess Types operator in these cases.
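Conceptually, the operator attempts a plain numeric parse of each nominal value; in Java terms it behaves roughly like the following sketch (an illustration of the idea, not RapidMiner's actual implementation):

```java
public class ParseNumbersSketch {
    public static void main(String[] args) {
        // Nominal values that actually encode numbers
        String[] nominalValues = {"12", "3.5", "-7"};

        double sum = 0;
        for (String value : nominalValues) {
            // A truly nominal value (e.g. "red") would throw NumberFormatException here
            sum += Double.parseDouble(value);
        }
        System.out.println(sum);  // 8.5
    }
}
```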

Differentiation

Nominal to Numerical The Nominal to Numerical operator provides various coding types to convert nominal attributes to numerical attributes. On the other hand the Parse Numbers operator is used when the values should actually be numerical but they are wrongly stored as nominal values. See page 311 for details.


Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input. The ExampleSet should have at least one nominal attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports

example set output (exa) The ExampleSet with the selected nominal attributes converted to numeric types is the output of this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the required attributes. It has the following options:

- all This option simply selects all the attributes of the ExampleSet. This is the default option.

- single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

- subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

- regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

- value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

- block type This option is similar in working to the value type option. It allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

- no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

- numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop-down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the conversion from nominal to numeric will take place; all other attributes will remain unchanged.

regular expression (string) The attributes whose name matches this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the selected expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop-down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop-down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<'; e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking this parameter, then after checking it 'att1' will be unselected and 'att2' will be selected.

decimal character (char) This character is used as the decimal character.

grouped digits (boolean) This option decides whether grouped digits should be parsed or not. If this option is set to true, the grouping character parameter should be specified.

grouping character (char) This character is used as the grouping character. If this character is found between numbers, the numbers are combined and this character is ignored. For example if "22-14" is present in the nominal attribute and '-' is set as the grouping character, then "2214" will be stored in the corresponding numerical attribute.
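The grouping-character behavior can be sketched in plain Java: strip the grouping character, then parse the remainder (an illustration of the documented behavior, not RapidMiner's implementation):

```java
public class GroupingCharSketch {
    public static void main(String[] args) {
        // Hypothetical nominal value "22-14" with '-' as the grouping character:
        // the grouping character is dropped before the number is parsed
        String nominal = "22-14";
        char grouping = '-';

        String cleaned = nominal.replace(String.valueOf(grouping), "");
        System.out.println(Double.parseDouble(cleaned));  // 2214.0
    }
}
```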

Related Documents

Nominal to Numerical (311)


Tutorial Processes

Nominal to Numeric conversion by the Parse Numbers op- erator

This Example Process starts with a Subprocess operator. The Subprocess operator provides an ExampleSet as its output. The ExampleSet has some nominal attributes; however, these attributes actually store numerical values that were wrongly encoded as nominal values. A breakpoint is inserted here so that you can have a look at the ExampleSet. The type of these attributes should be numerical. To convert these nominal attributes to numerical attributes the Parse Numbers operator is applied. All parameters are used with default values. The resultant ExampleSet can be seen in the Results Workspace. You can see that the type of all attributes has been changed from nominal to numerical type.

Format Numbers

This operator reformats the selected numerical attributes according to the specified format and changes the attributes to nominal.


Description

This operator parses numerical values and formats them into the specified format. The format is specified by the format type parameter. It supports different kinds of number formats including integers (e.g. 123), fixed-point numbers (e.g. 123.4), scientific notation (e.g. 1.23E4), percentages (e.g. 12%), and currency amounts (e.g. $123). Please note that this operator only works on numerical attributes and the result will in any case be a nominal attribute, even if the resulting format is a number that could be parsed again.

If the format type parameter is set to 'pattern', the pattern parameter is used for defining the format. If two different formats for positive and negative numbers should be used, those formats can be defined by separating them by a semi-colon ';'. The pattern parameter provides a lot of flexibility for defining the pattern. Important structures that can be used for defining a pattern are listed below. The structures in brackets are optional.

- pattern := subpattern{;subpattern}

- subpattern := {prefix}integer{.fraction}{suffix}

- prefix := any character combination including whitespace

- suffix := any character combination including whitespace

- integer := #* 0* 0

- fraction := 0* #*

0* and #* stand for multiple 0 or # characters respectively. 0 and # perform similar functions, but 0 ensures that the length of all numbers is the same, i.e. if a digit is missing it is replaced by 0. For example, 54 will be formatted as 0054 with the pattern '0000' and as 54 with the pattern '####'.
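java.text.DecimalFormat implements exactly this pattern grammar, so the '0' versus '#' behavior and the ';'-separated negative subpattern can be tried directly (US symbols are fixed here so the output does not depend on the default locale):

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class PatternPaddingDemo {
    public static void main(String[] args) {
        DecimalFormatSymbols us = new DecimalFormatSymbols(Locale.US);

        // '0' zero-pads to a fixed width, '#' simply drops absent digits
        System.out.println(new DecimalFormat("0000", us).format(54));  // 0054
        System.out.println(new DecimalFormat("####", us).format(54));  // 54

        // A ';' separates the positive and negative subpatterns
        System.out.println(new DecimalFormat("#0;(#0)", us).format(-15));  // (15)
    }
}
```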

The following placeholders can be used within the pattern parameter:

- . placeholder for the decimal separator.

- , placeholder for the grouping separator.

- E separates the mantissa and exponent for exponential formats.

- - default negative prefix.

- % multiplies by 100 and shows the value as a percentage.

- ' used to quote special characters in a prefix or suffix.

The locale parameter is ignored when the format type parameter is set to 'pattern'. In other cases it plays its role e.g. if the format type parameter is set to 'currency' then the locale parameter specifies the notation for that currency (i.e. dollar, euro etc).
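The locale's role for the 'currency' format type can be seen with java.text.NumberFormat, which provides this kind of locale-aware formatting on the JVM (a sketch; the exact German string may vary slightly between Java versions):

```java
import java.text.NumberFormat;
import java.util.Locale;

public class CurrencyLocaleDemo {
    public static void main(String[] args) {
        double value = 123.0;

        // The same number rendered under two different locales
        System.out.println(NumberFormat.getCurrencyInstance(Locale.US).format(value));      // $123.00
        System.out.println(NumberFormat.getCurrencyInstance(Locale.GERMANY).format(value)); // e.g. 123,00 €
    }
}
```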

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Generate Data operator in the attached Example Process. The output of other operators can also be used as input. The ExampleSet should have at least one numerical attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports

example set output (exa) The selected numerical attributes are reformatted, converted to nominal, and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the required attributes. It has the following options:

- all This option simply selects all the attributes of the ExampleSet. This is the default option.

- single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

- subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

- regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

- value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

- block type This option is similar in working to the value type option. It allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.


- no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

- numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop-down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the conversion will take place; all other attributes will remain unchanged.

regular expression (string) The attributes whose name matches this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the selected expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop-down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop-down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<'; e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.
invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking this parameter, then after checking it 'att1' will be unselected and 'att2' will be selected.

format type (selection) This parameter specifies the type of formatting to perform on the selected numerical attributes.

pattern (string) This parameter is only available when the format type parameter is set to 'pattern'. It specifies the pattern for formatting the numbers. Various structures and replacement patterns for this parameter have been discussed in the description of this operator.

locale (selection) This is an expert parameter. A long list of locales is provided; users can select any of them.

use grouping (boolean) This parameter indicates if a grouping character should be used for larger numbers.
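The numeric condition syntax described above can be illustrated with a short sketch. The following Python snippet is not RapidMiner code; the function name and the parsing approach are hypothetical, and it only mimics the documented rules (a blank space after the comparator, '&&' or '||' but never both in one condition):

```python
# An illustrative sketch (not RapidMiner's parser) of the numeric
# condition syntax: comparisons joined by either '&&' or '||',
# but never both in one condition.

def matches(value, condition):
    """Test a numeric value against a condition like '> 6 && < 11'."""
    if "&&" in condition and "||" in condition:
        raise ValueError("'&&' and '||' cannot be mixed in one condition")
    if "&&" in condition:
        joiner, parts = all, condition.split("&&")
    else:
        joiner, parts = any, condition.split("||")
    ops = {">": lambda a, b: a > b, "<": lambda a, b: a < b,
           ">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b,
           "=": lambda a, b: a == b}
    results = []
    for part in parts:
        op, num = part.split()  # relies on the blank space after the operator
        results.append(ops[op](value, float(num)))
    return joiner(results)

print(matches(8, "> 6 && < 11"))   # True
print(matches(12, "<= 5 || < 0"))  # False
```

Note how the sketch rejects mixed conditions, matching the restriction stated in the parameter description.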

Tutorial Processes

Changing numeric values to currency format

This process starts with the Generate Data operator which generates a random ExampleSet with a numeric attribute named 'att1'. A breakpoint is inserted here so that you can have a look at the ExampleSet. The Format Numbers operator is applied on it to change the format of this attribute to a currency format. The attribute filter type parameter is set to 'single' and the attribute parameter is set to 'att1' to select the required attribute. The format type parameter is set to 'currency'. Run the process and switch to the Results Workspace. You can see that the 'att1' attribute has been changed from numeric to nominal type and its values have a '$' sign at the beginning because they have been converted to a currency format. The locale parameter specifies the required currency. In this process the locale parameter was set to 'English (United States)', therefore the numeric values were converted to the currency of the United States (i.e. the dollar).
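The effect of this tutorial process can be approximated outside RapidMiner. The Python sketch below is hypothetical (the operator performs this internally via its locale and format type parameters); it only illustrates how numeric values become nominal currency strings:

```python
# A minimal sketch (not RapidMiner code) of what Format Numbers does
# with format type 'currency' and locale 'English (United States)':
# each numeric value becomes a nominal (string) value with a '$' sign.

def format_currency(values, symbol="$", grouping=True):
    """Render numeric values as en-US style currency strings."""
    out = []
    for v in values:
        s = f"{v:,.2f}" if grouping else f"{v:.2f}"
        out.append(symbol + s)
    return out

att1 = [1234.5, 7.25, 0.4]
print(format_currency(att1))  # ['$1,234.50', '$7.25', '$0.40']
```

The grouping flag mirrors the use grouping parameter: when enabled, a grouping character separates every three digits of larger numbers.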


Guess Types

This operator (re-)guesses the value types of all attributes of the input ExampleSet and changes them accordingly.

Description

The Guess Types operator can be used to (re-)guess the value types of the at- tributes of the input ExampleSet. This might be useful after some preprocessing transformations and purification of some of the attributes. This operator can be useful especially if nominal attributes can be handled as numerical attributes after some preprocessing. It is not necessary to (re-)guess the type of all the attributes with this operator. You can select the attributes whose type is to be (re-)guessed. Please study the attached Example Process for more information. Please note that this operator has no impact on the values of the ExampleSet.
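The idea of (re-)guessing a value type can be sketched as follows. This Python snippet is an illustration only, not RapidMiner's implementation; guess_type is a hypothetical helper that picks the most specific type fitting every value of an attribute:

```python
# A minimal sketch (not RapidMiner's implementation) of what "guessing"
# a value type means: inspect every value of an attribute and pick the
# most specific type that fits all of them.

def guess_type(values):
    """Return 'integer', 'real' or 'nominal' for a list of string values."""
    def fits(cast):
        for v in values:
            try:
                cast(v)
            except ValueError:
                return False
        return True

    if fits(int):
        return "integer"
    if fits(float):
        return "real"
    return "nominal"

print(guess_type(["1", "2", "3"]))    # integer
print(guess_type(["1.5", "2", "3"]))  # real
print(guess_type(["id 1", "id 2"]))   # nominal
```

This mirrors the tutorial process below: after the Cut operator strips the 'id ' prefix, the remaining strings all parse as integers, so the attribute's guessed type changes from nominal to integer.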

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.


Output Ports example set output (exa) The type of the selected attributes of the input ExampleSet is (re-)guessed and the resultant ExampleSet is delivered through this output port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting attributes. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of ExampleSet are present in the list; required attributes can be easily selected. This option will not work if meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. Attributes can be shifted to the right list, which is the list of selected attributes.

regular expression (string) The attributes whose name matches this expression will be selected. Regular expression is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (the regular expression that was specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type i.e. the value type parameter's value.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type i.e. the block type parameter's value.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<'; e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Select Attribute operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Select Attribute operator and only those attributes that satisfy the conditions are selected.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking this parameter, then after checking it 'att1' will be unselected and 'att2' will be selected.

first character index (integer) This parameter specifies the index of the first character of the substring which should be kept. Please note that the counting starts with 1.

last character index (integer) This parameter specifies the index of the last character of the substring which should be kept. Please note that the counting starts with 1.

decimal point character (char) The character specified by this parameter is used as the decimal character.

number grouping character (char) The character specified by this parameter is used as the grouping character. It is used for grouping the numbers: if this character is found between numbers, the numbers are combined and this character is ignored. For example, if '22-14' is present in the ExampleSet and '-' is set as the number grouping character, then the number will be considered to be '2214'.

Tutorial Processes

Guessing the type of an attribute after preprocessing

The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. Please note the 'id' attribute. The 'id' attribute is of nominal type and it has values of the format 'id 1', 'id 2' and so on. The Cut operator is applied on the ExampleSet to remove the substring 'id ' from the start of the 'id' attribute values. A breakpoint is inserted after the Cut operator. You can see that now the values of the 'id' attribute are of the form '1', '2', '3' and so on but the type of this attribute is still nominal. The Guess Types operator is applied on this ExampleSet. The attribute filter type parameter is set to 'single', the attribute parameter is set to 'id' and the include special attributes parameter is also set to 'true'. Thus the Guess Types operator will re-guess the type of the 'id' attribute. A breakpoint is inserted after the Guess Types operator. You can see that the type of the 'id' attribute has now been changed to integer.

Discretize by Size

This operator converts the selected numerical attributes into nominal attributes by discretizing the numerical attribute into bins of user-specified size. Thus each bin contains a user-defined number of examples.


Description

This operator discretizes the selected numerical attributes to nominal attributes. The size of bins parameter is used for specifying the required size of bins. This discretization is performed by binning examples into bins containing the same, user-specified number of examples. Each bin range is named automatically. The naming format of the range can be changed by using the range name type parameter. The values falling in the range of a bin are named according to the name of that range.

It should be noted that if the number of examples is not evenly divisible by the requested number of examples per bin, the actual result may slightly differ from the requested bin size. Similarly, if a run of examples cannot be split because their numerical values are identical, either all of them or none of them can be assigned to a bin. This may lead to further deviations from the requested bin size.

This operator is closely related to the Discretize By Frequency operator. There you have to specify the number of bins you need (say x) and the operator automatically creates them with an almost equal number of values (i.e. n/x values where n is the total number of values). In the Discretize by Size operator you have to specify the number of values you need in each bin (say y) and the operator automatically creates n/y bins with y values each.
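The n/y relationship above can be illustrated with a sketch. This Python function is hypothetical and simplified: unlike the real operator it does not keep runs of identical values together, so it may split such runs across bins:

```python
# A simplified sketch (not RapidMiner code) of discretization by size:
# sort the values, then assign consecutive runs of `size` examples to
# the same bin. With n values this yields roughly n/size bins.

def discretize_by_size(values, size):
    """Map each value to a bin label; each bin holds `size` sorted examples."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, i in enumerate(order):
        labels[i] = "range%d" % (rank // size + 1)
    return labels

# 14 temperature values with size of bins = 5 -> bins of 5, 5 and 4 examples
temps = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
print(discretize_by_size(temps, 5))
```

As in the tutorial process below, 14 examples cannot be divided evenly into bins of 5, so the last bin ends up smaller.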

Differentiation

Discretize by Binning The Discretize By Binning operator creates bins in such a way that their ranges are (almost) equal. See page 368 for details. Discretize by Frequency The Discretize By Frequency operator creates bins in such a way that the number of unique values in all bins is (almost) equal. See page 375 for details. Discretize by Entropy The discretization is performed by selecting bin boundaries so that the entropy is minimized in the induced partitions. See page 389 for details. Discretize by User Specification This operator discretizes the selected numerical attributes into user-specified classes. See page 382 for details.

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. Please note that there should be at least one numerical attribute in the input ExampleSet, otherwise the use of this operator does not make sense.

Output Ports example set output (exa) The selected numerical attributes are converted into nominal attributes by discretization and the resultant ExampleSet is delivered through this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace. preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters create view (boolean) It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would normally be performed directly on the data will then be computed every time a value is requested and the result is returned without changing the data.

attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting the required attributes. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. It allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all attributes of the Exam- pleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the discretization will take place; all other attributes will remain unchanged.

regular expression (string) The attributes whose name matches this expression will be selected. Regular expression is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.
use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<'; e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking this parameter, then after checking it 'att1' will be unselected and 'att2' will be selected.

size of bins (integer) This parameter specifies the required size of bins, i.e. the number of examples contained in a bin.

sorting direction (selection) This parameter indicates if the values should be sorted in increasing or decreasing order.

range name type (selection) This parameter is used for changing the naming format for ranges. 'long', 'short' and 'interval' formats are available.

automatic number of digits (boolean) This is an expert parameter. It is only available when the range name type parameter is set to 'interval'. It indicates if the number of digits should be automatically determined for the range names.

number of digits (integer) This is an expert parameter. It is used to specify the minimum number of digits used for the interval names.

Related Documents

Discretize by Binning (368)
Discretize by Frequency (375)
Discretize by Entropy (389)
Discretize by User Specification (382)

Tutorial Processes

Discretizing the Temperature attribute of the ’Golf’ data set

The focus of this Example Process is the discretization procedure. For understanding the parameters related to attribute selection please study the Example Process of the Select Attributes operator.

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the 'Temperature' attribute is a numerical attribute. The Discretize by Size operator is applied on it. The 'Temperature' attribute is selected for discretization. The size of bins parameter is set to 5. Run the process and switch to the Results Workspace. You can see that the 'Temperature' attribute has been changed from numerical to nominal form. The values of the 'Temperature' attribute have been divided into three ranges. You can see that 'range1' and 'range3' have 4 examples each while 'range2' has 6 examples. The bins do not all have exactly 5 examples because 14 examples cannot be divided evenly into bins of 5. Moreover, in 'range2' the 'Temperature' values 72 and 75 each occur twice, so essentially 4 unique numerical values are present in 'range2'.

Discretize by Binning

This operator discretizes the selected numerical attributes into a user-specified number of bins. Bins of equal range are automatically generated; the number of values in different bins may vary.

Description

This operator discretizes the selected numerical attributes to nominal attributes. The number of bins parameter is used to specify the required number of bins. This discretization is performed by simple binning: the range of numerical values is partitioned into segments of equal size, and each segment represents a bin. Numerical values are assigned to the bin representing the segment covering the numerical value. Each range is named automatically. The naming format for ranges can be changed using the range name type parameter. Values falling in the range of a bin are named according to the name of that range. This operator also allows you to apply binning only on a range of values. This can be enabled by using the define boundaries parameter. The min value and max value parameters are used for defining the boundaries of the range. If there are any values that are less than the min value parameter, a separate range is created for them. Similarly, if there are any values that are greater than the max value parameter, a separate range is created for them. Then the discretization by binning is performed only on the values that are within the specified boundaries.
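The equal-width segmentation described above can be sketched as follows. This Python function is an illustration, not RapidMiner code; it ignores the define boundaries option and simply splits the interval from the minimum to the maximum value into n equal segments:

```python
# A minimal sketch (not RapidMiner code) of discretization by binning:
# the value range is split into `n` equal-width segments and each value
# is mapped to the segment that covers it.

def discretize_by_binning(values, n):
    """Map each numeric value to one of n equal-width bin labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    labels = []
    for v in values:
        # the maximum value falls into the last bin, not a new one
        idx = min(int((v - lo) / width), n - 1) if width > 0 else 0
        labels.append("range%d" % (idx + 1))
    return labels

print(discretize_by_binning([64, 70, 75, 81, 85], 3))
# ['range1', 'range1', 'range2', 'range3', 'range3']
```

Note the contrast with Discretize by Size: here the bin *ranges* are equal (width 7 in this example) while the number of values per bin varies.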

Differentiation

Discretize by Frequency The Discretize By Frequency operator creates bins in such a way that the number of unique values in all bins is (almost) equal. See page 375 for details. Discretize by Size The Discretize By Size operator creates bins in such a way that each bin has a user-specified size (i.e. number of examples). See page 361 for details. Discretize by Entropy The discretization is performed by selecting bin boundaries such that the entropy is minimized in the induced partitions. See page 389 for details. Discretize by User Specification This operator discretizes the selected numerical attributes into user-specified classes. See page 382 for details.

Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. Note that there should be at least one numerical attribute in the input ExampleSet, otherwise the use of this operator does not make sense.


Output Ports example set (exa) The selected numerical attributes are converted into nominal attributes by binning and the resultant ExampleSet is delivered through this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace. preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters create view (boolean) It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would normally be performed directly on the data will then be computed every time a value is requested and the result is returned without changing the data.

attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting attributes. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes.

371 6. Data Transformation regular expression (string) The attributes whose name match this expression will be selected. Regular expression is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously. use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (ex- cept regular expression) becomes visible in the Parameter View. except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (regular expression that was specified in the regular expression parameter). value type (selection) The type of attributes to be selected can be chosen from a drop down list. use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View. except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type i.e. value type param- eter's value. block type (selection) The block type of attributes to be selected can be chosen from a drop down list. use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (ex- cept block type) becomes visible in the Parameter View. except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type i.e. block type parameter's value. 
numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric

condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<': for example '<5' will not work, so use '< 5' instead.

include special attributes (boolean) Special attributes are attributes with special roles which identify the examples; in contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Select Attributes operator. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes that satisfy the conditions are selected.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is checked, then after checking it 'att1' will be unselected and 'att2' will be selected.

number of bins (integer) This parameter specifies the number of bins which should be used for each attribute.

define boundaries (boolean) The Discretize by Binning operator allows you to apply binning only on a range of values. If this parameter is set to true, discretization by binning is performed only on the values that are within the specified boundaries. The lower and upper limits of the boundary are specified by the min value and max value parameters respectively.

min value (real) This parameter is only available when the define boundaries parameter is set to true. It is used to specify the lower limit value for the binning range.
max value (real) This parameter is only available when the define boundaries parameter is set to true. It is used to specify the upper limit value for the binning range.

range name type (selection) This parameter is used to change the naming format for ranges. 'long', 'short' and 'interval' formats are available.

automatic number of digits (boolean) This is an expert parameter. It is only available when the range name type parameter is set to 'interval'. It indicates if

the number of digits should be automatically determined for the range names.

number of digits (integer) This is an expert parameter. It is used to specify the minimum number of digits used for the interval names.
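As an illustration of the numeric condition syntax described above, the following sketch evaluates such conditions against a single value. The function name and parsing approach are assumptions for explanation only; this is not how RapidMiner itself is implemented.

```python
def matches(value, condition):
    """Evaluate a numeric condition string such as '> 6 && < 11' or
    '<= 5 || < 0' against one value. As documented, a condition may use
    either && or ||, but never both."""
    if "&&" in condition and "||" in condition:
        raise ValueError("&& and || cannot be used together in one condition")
    if "&&" in condition:
        combine, parts = all, condition.split("&&")
    else:
        combine, parts = any, condition.split("||")
    ops = {">": lambda v, t: v > t, "<": lambda v, t: v < t,
           ">=": lambda v, t: v >= t, "<=": lambda v, t: v <= t,
           "=": lambda v, t: v == t}
    results = []
    for part in parts:
        # Relies on the documented blank space between operator and number.
        op, threshold = part.split()
        results.append(ops[op](value, float(threshold)))
    return combine(results)

print(matches(8, '> 6 && < 11'))   # True: 8 satisfies both clauses
```

Note how `part.split()` depends on the blank space between the operator and the number, mirroring the documentation's advice to write '< 5' rather than '<5'.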

Related Documents

Discretize by Frequency (375) Discretize by Size (361) Discretize by Entropy (389) Discretize by User Specification (382)

Tutorial Processes

Discretizing numerical attributes of the ’Golf’ data set by Binning

The focus of this Example Process is the binning procedure. For understanding the parameters related to attribute selection please study the Example Process of the Select Attributes operator.

The 'Golf' data set is loaded using the Retrieve operator. The Discretize by Binning operator is applied on it. The 'Temperature' and 'Humidity' attributes are selected for discretization. The number of bins parameter is set to 2. The define boundaries parameter is set to true. The min value and max value parameters are set to 70 and 80 respectively. Thus binning will be performed only in the range from 70 to 80. As the number of bins parameter is set to 2, the range will be divided into two equal segments. Approximately speaking, the first segment of the range will be from 70 to 75 and the second segment of the range will be from 76 to 80. These are not exact values, but they are good enough for the explanation of this process. There will be a separate range for all those values that are less than the min value parameter i.e. less than 70. This range is automatically named 'range1'. The first and second segments of the binning range are named 'range2' and 'range3' respectively. There will be a separate range for all those values that are greater than the max value parameter i.e. greater than 80. This range is automatically named 'range4'. Run the process and compare the original data set with the discretized one. You can see that the values less than or equal to 70 in the original data set are named 'range1' in the discretized data set. The values greater than 70 and less than or equal to 75 are named 'range2'. The values greater than 75 and less than or equal to 80 are named 'range3'. The values greater than 80 are named 'range4'.
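The binning arithmetic of this tutorial can be sketched in plain Python. The function below is a simplified illustration under the parameter values used here (2 bins, boundaries 70 and 80); the names are assumptions, and the exact treatment of values that fall precisely on a segment edge may differ from RapidMiner's.

```python
def discretize_by_binning(values, number_of_bins=2, min_value=70.0, max_value=80.0):
    """Equal-width binning with explicit boundaries: values at or below
    min_value fall into 'range1', the interval [min_value, max_value] is
    split into number_of_bins equal-width segments ('range2', 'range3', ...),
    and values above max_value get one final range."""
    width = (max_value - min_value) / number_of_bins
    labels = []
    for v in values:
        if v <= min_value:
            index = 1                       # below the lower boundary
        elif v > max_value:
            index = number_of_bins + 2      # above the upper boundary
        else:
            segment = int((v - min_value) / width)
            index = min(segment + 2, number_of_bins + 1)
        labels.append("range%d" % index)
    return labels

print(discretize_by_binning([85, 72, 70, 81]))   # ['range4', 'range2', 'range1', 'range4']
```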

Discretize by Frequency

This operator converts the selected numerical attributes into nominal attributes by discretizing the numerical values into a user-specified number of bins. Bins of equal frequency are automatically generated; the ranges of the bins may vary.

375 6. Data Transformation

Description

This operator discretizes the selected numerical attributes to nominal attributes. The number of bins parameter is used to specify the required number of bins. The number of bins can also be specified by using the use sqrt of examples parameter. If the use sqrt of examples parameter is set to true, the number of bins is calculated as the square root of the number of examples with non-missing values (calculated separately for every attribute). The discretization is performed by equal-frequency binning, i.e. the thresholds of all bins are selected in such a way that all bins contain the same number of numerical values. Numerical values are assigned to the bin representing the range segment covering the numerical value. Each range is named automatically. The naming format for the ranges can be changed using the range name type parameter. Values falling in the range of a bin are named according to the name of that range.

Other discretization operators are also available in RapidMiner. The Discretize By Frequency operator creates bins in such a way that each bin contains (almost) the same number of unique values. In contrast, the Discretize By Binning operator creates bins in such a way that the range of all bins is (almost) equal.

Differentiation

Discretize by Binning The Discretize By Binning operator creates bins in such a way that the range of all bins is (almost) equal. See page 368 for details.

Discretize by Size The Discretize By Size operator creates bins in such a way that each bin has a user-specified size (i.e. number of examples). See page 361 for details.

Discretize by Entropy The discretization is performed by selecting bin boundaries such that the entropy is minimized in the induced partitions. See page 389 for details.

Discretize by User Specification This operator discretizes the selected numerical attributes into user-specified classes. See page 382 for details.

376 6.2. Type Conversion

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. Please note that there should be at least one numerical attribute in the input ExampleSet, otherwise the use of this operator does not make sense.

Output Ports

example set (exa) The selected numerical attributes are converted into nominal attributes by discretization (by frequency) and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters

create view (boolean) It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would normally be performed directly on the data will then be computed every time a value is requested and the result is returned without changing the data.

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting attributes. It has the following options:

• all This option simply selects all the attributes of the ExampleSet. This is the default option.

• single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

• subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

• regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

• value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

• block type This option is similar in working to the value type option. It allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

• no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

• numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter attribute if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes.

regular expression (string) The attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (the one specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type i.e.
the value type parameter's value.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type i.e. the block type parameter's value.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<': for example '<5' will not work, so use '< 5' instead.

include special attributes (boolean) Special attributes are attributes with special roles which identify the examples; in contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Select Attributes operator. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes that satisfy the conditions are selected.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is checked, then after checking it 'att1' will be unselected and 'att2' will be selected.
use sqrt of examples (boolean) If set to true, the number of bins is determined by the square root of the number of non-missing values instead of using the number of bins parameter.

number of bins (integer) This parameter is available only when the use sqrt of examples parameter is not set to true. It specifies the number of bins which should be used for each attribute.

range name type (selection) This parameter is used for changing the naming format for ranges. 'long', 'short' and 'interval' formats are available.

automatic number of digits (boolean) This is an expert parameter. It is only

available when the range name type parameter is set to 'interval'. It indicates if the number of digits should be automatically determined for the range names.

number of digits (integer) This is an expert parameter. It is used to specify the minimum number of digits used for the interval names.

Related Documents

Discretize by Binning (368) Discretize by Size (361) Discretize by Entropy (389) Discretize by User Specification (382)

Tutorial Processes

Discretizing the Temperature attribute of the ’Golf’ data set by Frequency

The focus of this Example Process is the discretization (by frequency) procedure. For understanding the parameters related to attribute selection please study the Example Process of the Select Attributes operator.

The 'Golf' data set is loaded using the Retrieve operator. The Discretize by Frequency operator is applied on it. The 'Temperature' attribute is selected for discretization. The number of bins parameter is set to 3. Run the process and switch to the Results Workspace. You can see that the 'Temperature' attribute has been changed from numerical to nominal form. The values of the 'Temperature' attribute have been divided into three ranges. Each range has an equal number of unique values. You can see that 'range1' and 'range3' have 4 examples while 'range2' has 6 examples. But in 'range2' the 'Temperature' values 72 and 75 occur twice, so essentially 4 unique numerical values are present in 'range2'.
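For illustration, equal-frequency binning can be sketched as follows. The helper name and the sample values are assumptions (they only resemble the 'Temperature' column), and RapidMiner's handling of ties at bin borders may differ in detail.

```python
def discretize_by_frequency(values, number_of_bins=3):
    """Equal-frequency binning sketch: thresholds are chosen from the sorted
    values so that each bin receives (roughly) the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Upper threshold of each bin, taken from the sorted values.
    thresholds = [ordered[(i + 1) * n // number_of_bins - 1]
                  for i in range(number_of_bins)]
    thresholds[-1] = float("inf")  # the last bin is open-ended
    labels = []
    for v in values:
        for i, t in enumerate(thresholds):
            if v <= t:
                labels.append("range%d" % (i + 1))
                break
    return labels

# 14 temperature-like values, 3 bins: the counts come out 4 / 6 / 4,
# matching the tutorial (72 and 75 each occur twice in the middle bin).
labels = discretize_by_frequency([64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85])
```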


Discretize by User Specification

This operator discretizes the selected numerical attributes into user-specified classes. The selected numerical attributes will be changed to either nominal or ordinal attributes.

Description

This operator discretizes the selected numerical attributes to either nominal or ordinal attributes. The numerical values are mapped to the classes according to the thresholds specified by the user in the classes parameter. The user can define the classes by specifying the upper limit of each class. The lower limit of every class is automatically defined as the upper limit of the previous class. The lower limit of the first class is assumed to be negative infinity. 'Infinity' can be used to specify positive infinity as upper limit in the classes parameter. This is usually done in the last class. If a class is named as '?', the numerical values falling in this class will be replaced by unknown values in the resulting attributes.


Differentiation

Discretize by Binning The Discretize By Binning operator creates bins in such a way that the range of all bins is (almost) equal. See page 368 for details.

Discretize by Frequency The Discretize By Frequency operator creates bins in such a way that each bin contains (almost) the same number of unique values. See page 375 for details.

Discretize by Size The Discretize By Size operator creates bins in such a way that each bin has a user-specified size (i.e. number of examples). See page 361 for details.

Discretize by Entropy The discretization is performed by selecting bin boundaries such that the entropy is minimized in the induced partitions. See page 389 for details.

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. Note that there should be at least one numerical attribute in the input ExampleSet, otherwise the use of this operator does not make sense.

Output Ports

example set (exa) The selected numerical attributes are converted into nominal or ordinal attributes according to the user-specified classes and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet

in further operators or to view the ExampleSet in the Results Workspace.

preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters

create view (boolean) It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would normally be performed directly on the data will then be computed every time a value is requested and the result is returned without changing the data.

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting attributes. It has the following options:

• all This option simply selects all the attributes of the ExampleSet. This is the default option.

• single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

• subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

• regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

• value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real


and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

• block type This option is similar in working to the value type option. It allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

• no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

• numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the parameter attribute if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list. Attributes can be shifted to the right list, which is the list of selected attributes.

regular expression (string) The attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular

expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (the one specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type i.e. the value type parameter's value.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type i.e. the block type parameter's value.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<': for example
'<5' will not work, so use '< 5' instead.

include special attributes (boolean) Special attributes are attributes with special roles; they identify the examples. In contrast regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Select Attributes operator. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes that satisfy the conditions are selected.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is checked, then after checking it 'att1' will be unselected and 'att2' will be selected.

attribute type The type of the discretized attributes is specified here. They can be either nominal or ordinal:

• nominal If this option is selected the discretized attributes will be of nominal type in the resultant ExampleSet.

• ordinal If this option is selected the discretized attributes will be of ordinal type in the resultant ExampleSet.

classes This is the most important parameter of this operator. It is used to specify the classes into which the numerical values will be mapped. The names and upper limits of the classes are specified here. The numerical values are mapped to the classes according to the thresholds specified by the user in this parameter. The user can define the classes by specifying the upper limit of each class. The lower limit of every class is automatically defined as the upper limit of the previous class. The lower limit of the first class is assumed to be negative infinity. 'Infinity' can be used to specify positive infinity as the upper limit in the classes parameter. This is usually done in the last class. If a class is named '?', the numerical values falling in this class will be replaced by unknown values in the resulting attributes.


Related Documents

Discretize by Binning (368) Discretize by Frequency (375) Discretize by Size (361) Discretize by Entropy (389)

Tutorial Processes

Discretizing numerical attributes of the Golf data set

The focus of this Example Process is the classes parameter. Almost all parameters other than the classes parameter are for the selection of attributes on which discretization is to be performed. For understanding these parameters please study the Example Process of the Select Attributes operator.

The 'Golf' data set is loaded using the Retrieve operator. The Discretize by User Specification operator is applied on it. The 'Temperature' and 'Humidity' attributes are selected for discretization. As you can see in the classes parameter, four classes have been specified. The values from negative infinity to 70 will be mapped to the 'low' class. The values above 70 up to 80 will be mapped to the 'average' class. The values above 80 up to 90 will be mapped to the 'high' class. The values above 90 will be considered unknown (missing) values. This can be verified by running the process and viewing the results in the Results Workspace. Note that the values of the 'Humidity' attribute were 96 and 95 in Row No. 4 and Row No. 8 respectively. In the discretized attributes these values are replaced by unknown values because of the last class defined in the classes parameter.
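The class mapping of this tutorial can be sketched in plain Python. The helper name is an assumption, `math.inf` plays the role of 'Infinity' in the classes parameter, and missing values are represented as None for illustration.

```python
import math

def discretize_by_user_spec(values, classes):
    """Map numerical values to user-specified classes. `classes` is a list
    of (name, upper_limit) pairs ordered by upper limit; the lower limit of
    each class is the previous class's upper limit, and the first class
    starts at negative infinity. A class named '?' yields missing values
    (represented as None here)."""
    labels = []
    for v in values:
        for name, upper in classes:
            if v <= upper:
                labels.append(None if name == "?" else name)
                break
    return labels

# The classes from this tutorial: low <= 70 < average <= 80 < high <= 90 < '?'
classes = [("low", 70), ("average", 80), ("high", 90), ("?", math.inf)]
print(discretize_by_user_spec([96, 85, 70, 75], classes))   # [None, 'high', 'low', 'average']
```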


Discretize by Entropy

This operator converts the selected numerical attributes into nominal attributes. The boundaries of the bins are chosen so that the entropy is minimized in the induced partitions.

Description

This operator discretizes the selected numerical attributes to nominal attributes. The discretization is performed by selecting a bin boundary that minimizes the entropy in the induced partitions. Each bin range is named automatically. The naming format of the range can be changed using the range name type parameter. The values falling in the range of a bin are named according to the name of that range.

This method is applied recursively to both new partitions until the stopping criterion is reached. For more information please study:

• Multi-interval discretization of continuous-valued attributes for classification learning (Fayyad, Irani)


• Supervised and Unsupervised Discretization (Dougherty, Kohavi, Sahami).

This operator can automatically remove all attributes with only one range i.e. those attributes which are not actually discretized since the entropy criterion is not fulfilled. This behavior can be controlled by the remove useless parameter.
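A single boundary-selection step of this entropy-minimization idea can be sketched as follows. This is an illustrative sketch only: the operator applies such splits recursively and decides when to stop using the Fayyad-Irani criterion, which is omitted here, and the helper names are assumptions.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * math.log2(p)
    return result

def best_split(pairs):
    """Choose the boundary that minimizes the weighted entropy of the two
    induced partitions. `pairs` is a list of (value, label) sorted by value.
    Returns (boundary, weighted_entropy); boundary is None if no split helps."""
    n = len(pairs)
    labels = [label for _, label in pairs]
    best = (None, entropy(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a boundary between equal values
        left, right = labels[:i], labels[i:]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if weighted < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, weighted)
    return best

# Splitting between 3 and 10 yields two pure partitions: (6.5, 0.0)
print(best_split([(1, 'no'), (2, 'no'), (3, 'no'), (10, 'yes'), (11, 'yes')]))
```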

Differentiation

Discretize by Binning The Discretize By Binning operator creates bins in such a way that the range of all bins is (almost) equal. See page 368 for details.

Discretize by Frequency The Discretize By Frequency operator creates bins in such a way that each bin contains (almost) the same number of unique values. See page 375 for details.

Discretize by Size The Discretize By Size operator creates bins in such a way that each bin has a user-specified size (i.e. number of examples). See page 361 for details.

Discretize by User Specification This operator discretizes the selected numerical attributes into user-specified classes. See page 382 for details.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. Please note that there should be at least one numerical attribute in the input ExampleSet, otherwise the use of this operator does not make sense.

Output Ports

example set output (exa) The selected numerical attributes are converted into nominal attributes by discretization and the resultant ExampleSet is delivered

390 6.2. Type Conversion through this port. original (ori) The ExampleSet that was given as input is passed without chang- ing to the output through this port. This is usually used to reuse the same Exam- pleSet in further operators or to view the ExampleSet in the Results Workspace. preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters

create view (boolean) It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would normally be performed directly on the data will then be computed every time a value is requested and the result is returned without changing the data.

attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting the required attributes. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the discretization will be applied; all other attributes will remain unchanged.

regular expression (string) The attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good

idea of regular expressions. This menu also allows you to try different expressions and preview the results simultaneously. This will enhance your understanding of regular expressions.

use except expression (boolean) If enabled, an exception to the selected expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here.
For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed

because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is checked, then after checking it 'att1' will be unselected and 'att2' will be selected.

remove useless (boolean) This parameter indicates if useless attributes, i.e. attributes containing only a single range, should be removed. If this parameter is set to true then all those attributes that are not actually discretized, because the entropy criterion is not fulfilled, are removed.

range name type (selection) This parameter is used for changing the naming format of the ranges. The 'long', 'short' and 'interval' formats are available.

automatic number of digits (boolean) This is an expert parameter. It is only available when the range name type parameter is set to 'interval'. It indicates if the number of digits should be automatically determined for the range names.

number of digits (integer) This is an expert parameter. It is used to specify the minimum number of digits used for the interval names.
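The numeric condition syntax above can be sketched as a small parser and evaluator. This is a hypothetical Python illustration of the described rules (one combinator per condition, space-separated operators), not RapidMiner's actual parser.

```python
import re

def parse_condition(cond):
    """Parse a numeric condition such as '> 6 && < 11' into a list of
    (operator, threshold) clauses plus the combinator joining them."""
    combinator = "&&" if "&&" in cond else "||"
    clauses = []
    for part in re.split(r"&&|\|\|", cond):
        op, value = part.split()  # a space after the operator is required
        clauses.append((op, float(value)))
    return clauses, combinator

def satisfies(x, cond):
    """Check a single numeric value against the condition."""
    clauses, combinator = parse_condition(cond)
    ops = {">": lambda a, b: a > b, "<": lambda a, b: a < b,
           ">=": lambda a, b: a >= b, "<=": lambda a, b: a <= b,
           "=": lambda a, b: a == b}
    results = [ops[op](x, v) for op, v in clauses]
    return all(results) if combinator == "&&" else any(results)

# An attribute is kept only if every example satisfies the condition.
values = [7, 8, 10]
print(all(satisfies(v, "> 6 && < 11") for v in values))  # True
```

Mixing '&&' and '||' in one condition, which the operator forbids, would require precedence rules that this flat clause list deliberately cannot express.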

Related Documents

Discretize by Binning (368) Discretize by Frequency (375) Discretize by Size (361) Discretize by User Specification (382)


Tutorial Processes

Discretizing the ’Sonar’ data set by entropy

The focus of this Example Process is the discretization procedure. For under- standing the parameters related to attribute selection please study the Example Process of the Select Attributes operator.

The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that this data set has 60 regular attributes (all of real type). The Discretize by Entropy operator is applied on it. The attribute filter type parameter is set to 'all', thus all the numerical attributes will be discretized. The remove useless parameter is set to true, thus attributes with only one range are removed from the ExampleSet. Run the process and switch to the Results Workspace. You can see that the 'Sonar' data set has been reduced to just 22 regular attributes. These numerical attributes have been discretized to nominal attributes.


Generate ID

This operator adds a new attribute with id role in the input ExampleSet. Each example in the input ExampleSet is tagged with an incremented id. If an attribute with id role already exists, it is overridden by the new id attribute.

Description

This operator adds a new attribute with id role in the input ExampleSet. It assigns a unique id to each example. This operator is usually used to uniquely identify each example. Each example in the input ExampleSet is tagged with an incremented id. The number from where the ids start can be controlled by the offset parameter. Numerical and integer ids can be assigned. If an attribute with id role already exists in the input ExampleSet, it is overridden by the new id attribute.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

example set output (exa) The ExampleSet with an id attribute is the output of this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

create nominal ids (boolean) This parameter indicates if nominal ids should be created instead of integer ids. By default this parameter is not checked, thus integer ids are created by default. Nominal ids are of the format id_1, id_2, id_3 and so on.

offset (integer) This is an expert parameter. It is used if you want to start the ids from a number other than 1. This parameter sets the offset value. It is 0 by default, thus ids start from 1 by default.
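The interaction of the two parameters can be sketched in Python. This is an illustrative model only; the function name and the dict-per-example representation are assumptions for the example, not RapidMiner's API.

```python
def generate_id(examples, nominal=False, offset=0):
    """Tag each example (a dict) with an 'id' attribute, overriding any
    existing id. Ids start at offset + 1 and increment by one."""
    for i, ex in enumerate(examples, start=offset + 1):
        ex["id"] = "id_%d" % i if nominal else i
    return examples

rows = [{"x": 1.0}, {"x": 2.5}, {"x": 0.3}]
generate_id(rows, nominal=True, offset=10)
print([ex["id"] for ex in rows])  # ['id_11', 'id_12', 'id_13']
```

With the defaults (nominal=False, offset=0) the same three examples would receive the integer ids 1, 2 and 3, matching the Tutorial Process below.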

Tutorial Processes

Overriding the id attribute of the ’Iris’ data set

The 'Iris' data set is loaded using the Retrieve operator. The Generate ID operator is applied on it. All parameters are used with default values. The 'Iris' data set already has an id attribute. The old id attribute is overridden when the Generate ID operator is applied. Run the process and you can see the ExampleSet with the new id attribute. The type of this new attribute is integer. Set the create nominal ids parameter to true and run the process again; you will see that the ids are in nominal form now (i.e. id_1, id_2 and so on). The offset parameter is set to 0, which is why the ids start from 1. Now set the offset parameter to 10 and run the process again. You can see that the ids now start from 11.


Generate Empty Attribute

This operator adds a new attribute of specified name and type to the input ExampleSet.

Description

The Generate Empty Attribute operator creates an empty attribute whose name and type are specified by the name and the value type parameters respectively. One of the following types can be selected: nominal, numeric, integer, real, text, binominal, polynominal, file path, date time, date, time. Please note that all values are missing right after creation of the attribute. Operators like the Set Data operator can be used to fill in values of this attribute. Please note that the name of the attribute can be changed later by the Rename operator, and many type conversion operators are available for changing the type of the attribute. Please note that this operator creates an empty attribute independent of the input ExampleSet; if you want to generate an attribute from the existing attributes of the input ExampleSet you can use the Generate Attributes operator.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set output (exa) An empty attribute of the specified name and type is added to the input ExampleSet and the resultant ExampleSet is delivered through this output port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

name (string) This parameter specifies the name of the new attribute. Please note that the names of attributes should be unique. Please make sure that the input ExampleSet does not have an attribute with the same name.

value type (selection) The type of the new attribute is specified by this parameter. One of the following types can be selected: nominal, numeric, integer, real, text, binominal, polynominal, file path, date time, date, time.
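The behavior described above, a new column whose values are all missing and whose declared type is only metadata, can be sketched as follows. The function and the metadata dict are illustrative assumptions, not RapidMiner's internal representation.

```python
def generate_empty_attribute(examples, types, name, value_type):
    """Add an attribute whose values are all missing (None); the declared
    type is recorded in a separate name -> type metadata dict."""
    if name in types:
        raise ValueError("attribute names must be unique: %r" % name)
    types[name] = value_type
    for ex in examples:
        ex[name] = None  # all values are missing right after creation
    return examples

rows = [{"Outlook": "sunny"}, {"Outlook": "rain"}]
meta = {"Outlook": "nominal"}
generate_empty_attribute(rows, meta, "label_candidate", "nominal")
print(rows[0])  # {'Outlook': 'sunny', 'label_candidate': None}
```

Filling the None placeholders afterwards corresponds to applying an operator like Set Data, as the Description notes.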

Tutorial Processes

Adding an empty attribute to the ’Golf’ data set

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the input ExampleSet. As you can see, the 'Golf' data set has 5 attributes: Play, Outlook, Temperature, Humidity and Wind. The Generate Empty Attribute operator is applied on the 'Golf' data set. The name parameter is set to 'name' and the value type parameter is set to 'nominal'. When the process execution is complete, you can see the ExampleSet in the Results Workspace. This ExampleSet has one attribute more than the 'Golf' data set. The name and type of the attribute are the same as specified in the parameters of the Generate Empty Attribute operator. Please note that all values of this new attribute are missing. These values can be filled by using operators like the Set Data operator. Please note that the created empty attribute is independent of the input ExampleSet; if you want to generate an attribute from the existing attributes of the input ExampleSet you can use the Generate Attributes operator.

Generate Copy

This operator generates a copy of an attribute. The original attribute remains unchanged.

Description

The Generate Copy operator adds a copy of the selected attribute to the input ExampleSet. Please note that the original attribute remains unchanged, just a new attribute is added to the ExampleSet. The attribute whose copy is required is specified by the attribute name parameter. The name of the new attribute is specified through the new name parameter. Please note that the names of attributes of an ExampleSet should be unique. Please note that only the view on the data column is copied, not the data itself.
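The note that only the view on the data column is copied, not the data itself, can be sketched with a minimal column-oriented table in Python. The `ExampleSet` class here is a made-up illustration, not RapidMiner's API: the new attribute name simply becomes another reference to the same underlying column.

```python
class ExampleSet:
    """Minimal column-oriented table: attribute name -> list of values."""
    def __init__(self, columns):
        self.columns = columns

def generate_copy(es, attribute_name, new_name):
    """Add the new attribute as another view on the same data column;
    the underlying list is shared, not duplicated."""
    es.columns[new_name] = es.columns[attribute_name]
    return es

es = ExampleSet({"Temperature": [85, 80, 72]})
generate_copy(es, "Temperature", "New Temperature")
print(es.columns["New Temperature"])  # [85, 80, 72]
print(es.columns["New Temperature"] is es.columns["Temperature"])  # True
```

Sharing the column keeps the operation cheap regardless of how many examples the ExampleSet contains.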

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.


Output Ports

example set output (exa) The ExampleSet with the new attribute, which is a copy of the specified attribute, is the output of this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute name (string) The attribute whose copy is required is specified by this parameter.

new name (string) The name of the new attribute is specified through this parameter. Please note that the names of attributes of an ExampleSet should be unique.

Tutorial Processes

Generating a copy of the Temperature attribute of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. The Generate Copy operator is applied on it. The attribute name parameter is set to 'Temperature'. The new name parameter is set to 'New Temperature'. Run the process. You will see that an attribute named 'New Temperature' has been added to the 'Golf' data set. The new attribute has the same values as the 'Temperature' attribute. The 'Temperature' attribute remains unchanged.


Generate Attributes

This operator constructs new user-defined attributes using mathematical expressions.

Description

The Generate Attributes operator constructs new attributes from the attributes of the input ExampleSet and arbitrary constants using mathematical expressions. The attribute names of the input ExampleSet may be used as variables in the mathematical expressions for new attributes. When this operator is applied, the expressions are evaluated on each example and the variables are filled with the example's attribute values. Thus this operator not only creates new columns for new attributes, but also fills those columns with the corresponding values. If a variable is undefined in an expression, the entire expression becomes undefined and '?' is stored at its location.
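The per-example evaluation and the '?' rule for undefined variables can be sketched in Python. This is an illustrative model (the function name and the use of `eval` are assumptions for the sketch, not how RapidMiner evaluates expressions):

```python
def generate_attribute(examples, new_name, expression, variables):
    """For each example, bind the listed attribute values to variables and
    evaluate the expression; any missing input makes the whole result the
    missing value '?' for that example."""
    for ex in examples:
        bound = {v: ex.get(v) for v in variables}
        if any(val is None for val in bound.values()):
            ex[new_name] = "?"  # undefined variable -> undefined expression
        else:
            ex[new_name] = eval(expression, {"__builtins__": {}}, bound)
    return examples

rows = [{"A": 3, "B": 4}, {"A": None, "B": 10}]
generate_attribute(rows, "C", "A * A + B", ["A", "B"])
print([ex["C"] for ex in rows])  # [13, '?']
```

The second example shows the propagation rule: because A is missing, the entire expression A * A + B becomes '?', regardless of B's value.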

Please note that there are some restrictions for the attribute names in order to let this operator work properly:

ˆ Attribute names containing parentheses are not allowed.

ˆ Attribute names having dashes '-' are not allowed. You can use underscores '_' instead.

ˆ Attribute names containing blanks are not allowed.


ˆ Attribute names with function names or operator names are not allowed.

ˆ If the standard constants are usable, i.e. the use standard constants parameter is checked, attribute names equal to the names of standard constants are not allowed, e.g. 'e' or 'pi' are not allowed.

If you want to apply this operator but the attributes of your ExampleSet do not fulfill above mentioned conditions you can rename attributes with the Rename operator before application of the Generate Attributes operator. When replacing several attributes following a certain schema, the Rename by Replacing operator might prove useful.

A large number of operations is supported, which allows you to write rich expressions. Here is a list of supported operations along with explanations and examples. In these examples A, B and C are used as attribute names. These are just simple examples; more complicated expressions can be created by using multiple operations. Parentheses can be used to nest operations. The description of all operations follows this format:

Name of operation (syntax of operation): brief description; examples: example1, example2

The following basic operations are supported:

ˆ Addition (+): Calculates the addition of the two terms surrounding this operator; examples: A+7 , A+B, A+B+9

ˆ Subtraction (-): Calculates the subtraction of the second term from the first one; examples: 15-A, A-B, (A-B)-C

ˆ Multiplication (*): Calculates the multiplication of the two terms surround- ing this operator; examples: 5 * B, A*B, A*B*C

ˆ Division(/): Calculates the division of the first term by the second one; examples: B/4 , A/B, 4/2

ˆ Power(^): Calculates the first term to the power of the second one; exam- ples: 2^3 , A^2 , A^B

403 6. Data Transformation

ˆ Modulus(%): Calculates the modulus of the first term by the second one; examples: 11%2, A%5, B%C

ˆ Less Than (<): Delivers true if the first term is less than the second; examples: A < 4, B < A

ˆ Greater Than (>): Delivers true if the first term is greater than the second; examples: A > 3, B > C

ˆ Less or Equal (<=): Delivers true if the first term is less than or equal to the second; examples: A <= 5, C <= B

ˆ More or Equal (>=): Delivers true if the first term is greater than or equal to the second; examples: A >= 4, 78 >= B

ˆ Equal (==): Delivers true if the first term is equal to the second; examples: A==B, A==12

ˆ Not Equal (!=): Delivers true if the first term is not equal to the second; examples: A!=B, A!=25

ˆ Boolean Not (!): Delivers true if the following term is false or vice versa; examples: !(A > 2), !(B <= 25)

ˆ Boolean And (&&): Delivers true if both surrounding terms are true; example: (A > 2) && (B <= C)

ˆ Boolean Or (||): Delivers true if at least one of the surrounding terms is true; example: (A > 2) || (B <= C)

The following log and exponential functions are supported:

ˆ Natural Logarithm(ln(x)): Calculates the logarithm of the argument to the base e; examples: ln(5), ln(A)

ˆ Logarithm Base 10 (log(x)): Calculates the logarithm of the argument to the base 10; examples: log(25), log(A)

ˆ Logarithm Base 2 (ld(x)): Calculates the logarithm of the argument to the base 2; it is also called logarithm dualis; examples: ld(24), ld(A)

ˆ Exponential(exp(x)): Calculates the value of the constant e to the power of the argument, it is equivalent to e^x; examples: exp(12), exp(A)

ˆ Power (pow(x,y)): Calculates the first term to the power of the second one; examples: pow(A, B), pow(A,3)

The following trigonometric functions are supported:

ˆ Sine (sin(x)): Calculates the sine of the given argument; examples: sin(A), sin(45)

ˆ Cosine (cos(x)): Calculates the cosine of the given argument; examples: cos(A), cos(45)

ˆ Tangent (tan(x)): Calculates the tangent of the given argument; examples: tan(A), tan(45)

ˆ Arc Sine (asin(x)): Calculates the inverse sine of the given argument; ex- amples: asin(A), asin(0.50)

ˆ Arc Cosine (acos(x)): Calculates the inverse cosine of the given argument; examples: acos(A), acos(0.50)

ˆ Arc Tangent (atan(x)): Calculates the inverse tangent of the given argu- ment; examples: atan(A), atan(1)

ˆ Arc Tangent with 2 parameters (atan2(x,y)): Calculates the inverse tangent based on the two given arguments; example: atan2(A, 0.5)

ˆ Hyperbolic Sine (sinh(x)): Calculates the hyperbolic sine of the given ar- gument; examples: sinh(A)

ˆ Hyperbolic Cosine (cosh(x)): Calculates the hyperbolic cosine of the given argument; examples: cosh(A)

ˆ Hyperbolic Tangent (tanh(x)): Calculates the hyperbolic tangent of the given argument; examples: tanh(A)


ˆ Inverse Hyperbolic Sine (asinh(x)): Calculates the inverse hyperbolic sine of the given argument; examples: asinh(A)

ˆ Inverse Hyperbolic Cosine (acosh(x)): Calculates the inverse hyperbolic cosine of the given argument; examples: acosh(A)

ˆ Inverse Hyperbolic Tangent (atanh(x)): Calculates the inverse hyperbolic tangent of the given argument; examples: atanh(A)

The following statistical functions are supported:

ˆ Round (round(x)): Rounds the given number to the nearest integer. If two arguments are given, the first one is rounded to the number of digits indicated by the second argument; examples: round(A), round(A, 3). round(A, 3) rounds attribute 'A' to 3 decimal places.

ˆ Floor (floor(x)): Calculates the largest integer less than or equal to the given argument, e.g. floor(4.7) would be 4; examples: floor(A), floor(23.34)

ˆ Ceiling (ceil(x)): Calculates the smallest integer greater than or equal to the given argument, e.g. ceil(23.4) would be 24; examples: ceil(A), ceil(23.34)

ˆ Average (avg(x,y,z...)): Calculates the average of the given arguments; ex- amples: avg(A,B), avg(A,B,C), avg(A,B,25)

ˆ Minimum (min(x,y,z...)): Calculates the minimum of the given arguments; examples: min(A,B), min(A,B,C) , min(A,0,B)

ˆ Maximum (max(x,y,z...)): Calculates the maximum of the given arguments; examples: max(A,B), max(A,B,C) , max(A,B,45)

The following text functions are supported:

ˆ To String (str(a)): Transforms the given number into a string (i.e. nominal value); example: str(17)

ˆ To Number (parse(a)): Transforms the given string (nominal value) into a number by parsing it; example: parse(A)


ˆ Cut (cut(attribute or string, start index, length)): Cuts the substring of the given length from the given start index of a string. The index starts from 0; examples: cut("Text", 1, 2) delivers 'ex', cut("String", 0, 3) delivers 'Str', cut(A, 0, 4) delivers the first four characters of attribute A's values.

ˆ Concatenation (concat(a, b)): Concatenates the given arguments (the + operator can also be used for this); examples: both concat("At", "om") and "At"+"om" deliver "Atom". concat(A, B) delivers the concatenation of attribute A's value and attribute B's value.

ˆ Replace (replace(string or attribute, string, replacement)): Replaces the first occurrence of a string by the defined replacement; example: replace(A, "am", "pm") replaces the first "am" in each value of attribute 'A' by "pm".

ˆ Replace All (replaceAll(a, string, replacement)): Replaces all occurrences of a string by the defined replacement; example: replaceAll(A, "am", "pm") replaces all "am" in each value of attribute 'A' by "pm".

ˆ Lower (lower(a)): Transforms the given argument into lower case characters; example: lower(A) transforms all values of attribute 'A' to lower case.

ˆ Upper (upper(a)): Transforms the given argument into upper case characters; example: upper(B) transforms all values of attribute 'B' to upper case.

ˆ Index (index(a, b)): Delivers the first position of the given search string in the text. If the string is not found, '-1' is delivered; examples: index("Text", "e") delivers 1, index("Text", "p") delivers -1.

ˆ Length (length(a)): Delivers the length of the given argument; example: length(A)

ˆ Character At (char(a, x)): Delivers the character at the specified position; examples: char(A, 3) delivers the character at the fourth place in attribute 'A' values. char("text", 2) delivers 'x'.

ˆ Compare (compare(a, b)): Compares the two arguments and delivers a negative value if the first argument is lexicographically smaller; examples: compare(A, B), compare(A, "yes")

ˆ Contains (contains(a, b)): Delivers true if the second argument is part of the first one; example: contains(A, "pa") delivers true whenever an attribute 'A' value has the string 'pa' in it. The 'finds(a, b)' function performs a similar task. In comparison to the 'contains(a, b)' function, 'finds(a, b)' allows you to specify regular expression patterns, which are more powerful.

ˆ Equals (equals(a, b)): Delivers true if the two arguments are lexicographically equal to each other; examples: equals(A, B), equals("yes", B)

ˆ Starts With (starts(a, b)): Delivers true if the first argument starts with the second. The comparison is case-sensitive; examples: starts(A, "GE"), starts(A, B)

ˆ Ends With (ends(a, b)): Delivers true if the first argument ends with the second. The comparison is case-sensitive; examples: ends(B, "ON"), ends(B, A)

ˆ Matches (matches(a, b)): Delivers true if the first argument matches the regular expression defined by the second argument; example: matches(A, ".*mm.*"). You should have a good understanding of regular expressions in order to apply this operation. The documentation of the Filter Examples operator may provide a good introduction to regular expressions.

ˆ Finds (finds(a, b)): Delivers true if, and only if, a subsequence of the first argument matches the regular expression defined by the second argument; example: finds(A, ".*OM.*"). In comparison to the 'contains(a, b)' function, 'finds(a, b)' allows you to specify regular expression patterns, which are more powerful.

ˆ Suffix (suffix(a, length)): Delivers the suffix of the specified length; example: suffix(A, 2) delivers the last two characters of attribute 'A' values.

ˆ Prefix (prefix(a, length)): Delivers the prefix of the specified length; example: prefix(B, 3) delivers the first three characters of attribute 'B' values.

ˆ Trim (trim(a)): Removes all leading and trailing white space characters; example: trim(A)

ˆ Escape HTML (escape_html(a)): Escapes the given string with HTML entities; example: escape_html(A) where 'A' is an attribute.
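Most of the text functions above correspond directly to ordinary string operations. The sketch below shows a few of them as plain Python, keeping the RapidMiner names; the bodies are just slicing and standard string methods, offered as a reading aid rather than a reimplementation.

```python
# Plain-Python equivalents of some of the text functions above.
def cut(s, start, length):
    return s[start:start + length]      # cut("Text", 1, 2) -> 'ex'

def index(s, sub):
    return s.find(sub)                  # -1 when the substring is absent

def replace(s, old, new):
    return s.replace(old, new, 1)       # first occurrence only

def replace_all(s, old, new):
    return s.replace(old, new)          # every occurrence

def prefix(s, length):
    return s[:length]

def suffix(s, length):
    return s[-length:]

print(cut("Text", 1, 2), prefix("String", 3), suffix("String", 2))  # ex Str ng
```

Note the zero-based indexing: both cut and index count positions from 0, which is why index("Text", "e") delivers 1 rather than 2.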

The following date functions are supported:

ˆ Parse Date (date_parse(a)): Parses the given string or double to a date. The type of the new attribute will be date time instead of string; example: date_parse("12/01/88") is parsed as 'Dec 01, 1988'.

ˆ Parse Date with Locale (date_parse_loc(a, "locale")): Parses the given string or double to a date with the given locale, which should be specified via a lowercase two-letter ISO-639 code; example: date_parse_loc(A, "en")

ˆ Parse Custom Date (date_parse_custom(date1, "format", "locale")): Parses the given date string to a date using a custom pattern and the given locale, which should be specified via a lowercase two-letter ISO-639 code; examples: date_parse_custom("01|09|88", "dd|MM|yy", "en"), date_parse_custom(A, "dd-MM-yy", "de") where attribute 'A' has dates in the same format as 'dd-MM-yy'

ˆ Date Before (date_before(date1, date2)): Determines if the first date is strictly earlier than the second date; examples: date_before(A, B), date_before(A, "Dec 1, 2011")

ˆ Date After (date_after(date1, date2)): Determines if the first date is strictly later than the second date; examples: date_after(A, B), date_after(A, "Dec 1, 2011")

ˆ Date to String (date_str(date1, DATE_FORMAT_CONSTANT, DATE_SHOW_CONSTANT)): Changes a date to a string using the specified format; example: date_str(A, DATE_FULL, DATE_SHOW_DATE_AND_TIME)

ˆ Date to String with Locale (date_str_loc(date1, DATE_FORMAT_CONSTANT, DATE_SHOW_CONSTANT, "locale")): Changes a date to a string using the specified format and the given locale, which should be specified via a lowercase two-letter ISO-639 code; example: date_str_loc(A, DATE_MEDIUM, DATE_SHOW_TIME_ONLY, "us")

ˆ Date to String with custom pattern (date_str_custom(date1, "format", "locale")): Changes a date to a string using the specified custom format pattern and the given locale, which is an optional argument here. The locale should be specified via a lowercase two-letter ISO-639 code; example: date_str_custom(A, "dd|MM|yy", "us")

ˆ Create Date (date_now()): Creates the current date; example: date_now()

ˆ Date Difference (date_diff(date1, date2, "locale", "timezone")): Calculates the elapsed time between two dates. The locale and time zone arguments are optional; example: date_diff(timeStart, timeEnd) where timeStart and timeEnd are date type attributes.

ˆ Add Time (date_add(date1, value, DATE_UNIT_CONSTANT, "locale", "timezone")): Adds a custom amount of time to a given date. Note that only the integer portion of a given value will be used. The locale and timezone arguments are optional; example: date_add(date1, 15, DATE_UNIT_DAY) adds 15 days to the 'date1' attribute values.

ˆ Set Time (date_set(date1, value, DATE_UNIT_CONSTANT, "locale", "timezone")): Allows to set a custom value for a portion of a given date, e.g. set the day to 23. Note that only the integer portion of a given value will be used. The locale and timezone arguments are optional; example: date_set(date1, 12, DATE_UNIT_DAY) sets the day of every 'date1' attribute value to 12.

ˆ Get Time (date_get(date1, DATE_UNIT_CONSTANT, "locale", "timezone")): Allows to get a portion of a given date, e.g. get only the day of the month or only the month of the year. The locale and timezone arguments are optional; example: date_get(date1, DATE_UNIT_DAY) gets the days of the 'date1' attribute values.
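Rough Python analogues of several date functions above are shown below using the standard datetime module. Note one assumption: Python's strptime uses `%d-%m-%y` style codes, not the `dd-MM-yy` pattern letters the operator expects, so the patterns are not interchangeable.

```python
from datetime import datetime, timedelta

# date_parse_custom analogue (strptime codes, not "dd-MM-yy" letters)
d = datetime.strptime("01-09-88", "%d-%m-%y")

# date_add(.., 15, DATE_UNIT_DAY) analogue
later = d + timedelta(days=15)

print(d.day, d.month)        # date_get with DATE_UNIT_DAY / _MONTH -> 1 9
print((later - d).days)      # date_diff measured in days -> 15
print(d < later)             # date_before -> True
```

As in the operator, comparisons like date_before are strict: a date is never "before" itself.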

The following process functions are supported:

• Parameter (param("operator", "parameter")): Delivers the specified parameter of the specified operator; example: param("Read Excel", "file")

410 6.3. Attribute Set Reduction and Transformation

The following miscellaneous functions are supported:

• If-Then-Else (if(condition, true-evaluation, false-evaluation)): The first argument specifies the condition. If the condition is evaluated as true then the result of the second argument is delivered, otherwise the result of the third argument is delivered; example: if(A > 5, 7*A, B/2)
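
A sketch of the same conditional in plain Python (not the expression language; note that both branches are evaluated eagerly here, unlike a real conditional):

```python
# Semantics of if(condition, true-evaluation, false-evaluation).
def if_then_else(condition, true_value, false_value):
    return true_value if condition else false_value

# example: if(A > 5, 7*A, B/2) with A = 6 and B = 10
A, B = 6, 10
result = if_then_else(A > 5, 7 * A, B / 2)
```

With A = 6 the condition holds, so the second argument 7*A is delivered.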

• Constant (const(a)): Delivers the argument as a numerical constant value; example: const(A)

• Square Root (sqrt(x)): Delivers the square root of the given argument; examples: sqrt(16), sqrt(A)

• Signum (sgn(x)): Delivers -1 or +1 depending on the sign of the argument; example: sgn(-5) delivers -1.

• Random (rand()): Delivers a random number between 0 and 1. It can be used to generate random numbers in any range when combined with other functions like multiplication, addition and floor; example: floor((rand()*10) + 15) produces random integers from 15 to 24.
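
The range of floor((rand()*10) + 15) can be checked with a quick Python simulation; this is a sketch assuming rand(), like Python's random(), is uniform on [0, 1):

```python
import math
import random

def rand_int_expr(rng):
    # floor((rand()*10) + 15): rand() in [0, 1) gives a value in [15, 25),
    # so flooring yields an integer from 15 to 24.
    return math.floor(rng.random() * 10 + 15)

rng = random.Random(0)
values = {rand_int_expr(rng) for _ in range(10_000)}
```

Over many draws every integer from 15 through 24 appears, but 25 itself is never produced because rand() stays strictly below 1.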

• Modulus (mod(x, y)): Calculates the modulus of the first term by the second one; examples: 11%2, A%3

• Sum (sum(x, y, z, ...)): Calculates the sum of all arguments; example: sum(A, B, 42)

• Binomial (binom(x, y)): Calculates the binomial coefficient; example: binom(5, 2)
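
A sketch of the binomial coefficient in Python; math.comb computes the same "x choose y" quantity:

```python
import math

# binom(x, y): the number of ways to choose y items out of x.
def binom(x, y):
    return math.comb(x, y)
```

For instance binom(5, 2) yields 10, i.e. 5! / (2! * 3!).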

• Missing (missing(a)): Checks if the given value is missing. This function returns true for every example in which the value of the given attribute is missing; example: missing(A) returns true wherever attribute 'A' has a missing value.

The following CONSTANTS are supported:

• Full Date Format (DATE_FULL): This constant is used to specify the most detailed date format. Example: Tuesday, January 3, 2012 1:46:33 AM PKT


• Long Date Format (DATE_LONG): This constant is used to specify a very detailed date format. Example: January 3, 2012 1:48:51 AM PKT

• Medium Date Format (DATE_MEDIUM): This constant is used to specify a medium detailed date format. Example: Jan 3, 2012 1:49:52 AM

• Short Date Format (DATE_SHORT): This constant is used to specify a date format with little detail. Example: 1/3/12 1:51 AM

• Show Date and Time (DATE_SHOW_DATE_AND_TIME): This constant is used to specify that the output should contain both the date and the time. Example: Tuesday, January 3, 2012 1:52:13 AM PKT

• Show Date Only (DATE_SHOW_DATE_ONLY): This constant is used to specify that the output should contain the date but not the time. Example: Tuesday, January 3, 2012

• Show Time Only (DATE_SHOW_TIME_ONLY): This constant is used to specify that the output should contain only the time. Example: 1:54:42 AM PKT

• Date Unit Year (DATE_UNIT_YEAR): This constant is used to specify the date unit as year. It is used in operations like 'add time', 'set time' and 'get time'.

• Date Unit Month (DATE_UNIT_MONTH): This constant is used to specify the date unit as month. It is used in operations like 'add time', 'set time' and 'get time'.

• Date Unit Week (DATE_UNIT_WEEK): This constant is used to specify the date unit as week. It is used in operations like 'add time', 'set time' and 'get time'.

• Date Unit Day (DATE_UNIT_DAY): This constant is used to specify the date unit as day of the month. It is used in operations like 'add time', 'set time' and 'get time'.

• Date Unit Hour (DATE_UNIT_HOUR): This constant is used to specify the date unit as hour of the day. It is used in operations like 'add time', 'set time' and 'get time'.

• Date Unit Minute (DATE_UNIT_MINUTE): This constant is used to specify the date unit as minute. It is used in operations like 'add time', 'set time' and 'get time'.

• Date Unit Second (DATE_UNIT_SECOND): This constant is used to specify the date unit as second. It is used in operations like 'add time', 'set time' and 'get time'.

• Date Unit Millisecond (DATE_UNIT_MILLISECOND): This constant is used to specify the date unit as millisecond. It is used in operations like 'add time', 'set time' and 'get time'.

This operator also supports the constants 'pi' and 'e' if the use standard constants parameter is set to true. You can also use strings in operations, but string values should be enclosed in double quotes (").

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Rename operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set (exa) The ExampleSet with new attributes is the output of this port.
original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

function descriptions The list of functions for generating new attributes is provided here.
use standard constants (boolean) This parameter indicates if standard constants like 'e' or 'pi' should be available. If checked, these constants can be used in expressions for generating new attributes.
keep all (boolean) If set to true, all the original attributes are kept, otherwise they are removed from the output ExampleSet.

Tutorial Processes

Generating attributes through different function descriptions

The 'Labor-Negotiations' data set is loaded using the Retrieve operator. The Rename operator is used to rename attribute names of the 'Labor-Negotiations' data set. This is done to ensure that all attribute names satisfy the conditions of the Generate Attributes operator. Attributes are renamed as:

• The 'wage-inc-1st' attribute is renamed to 'w1'

• The 'wage-inc-2nd' attribute is renamed to 'w2'

• The 'wage-inc-3rd' attribute is renamed to 'w3'

• The 'working-hours' attribute is renamed to 'hours'

• The 'education-allowance' attribute is renamed to 'allowance'

• The 'shift-differential' attribute is renamed to 'shift'

• The 'standby-pay' attribute is renamed to 'standby'

• The 'statutory-holidays' attribute is renamed to 'holidays'

You can see that all new names satisfy the naming convention conditions of the Generate Attributes operator. Now have a look at the Generate Attributes operator's parameters. The use standard constants parameter is checked, which allows us to use constants like 'e'. The keep all parameter is also checked, thus all attributes of the 'Labor-Negotiations' data set are also kept along with attributes generated by the Generate Attributes operator.

Click on the Edit List button of the function descriptions parameter to have a look at the descriptions of the functions defined for generating new attributes. 14 new attributes are generated; there might be better ways of generating these attributes, but they are written here to explain the usage of the different types of functions available in the Generate Attributes operator. Please read the function description of each attribute and then see the values of the corresponding attribute in the Results Workspace to understand it completely. Here is a description of the attributes created by this operator:

• The 'average wage-inc' attribute takes the sum of the w1, w2 and w3 attribute values and divides it by 3. This gives the average of the wage increments. There are better ways of doing this; this example is just meant to clarify the use of some basic functions.

• The 'neglected worker bool' attribute is a boolean attribute, i.e. it has only two possible values, '0' and '1'. It was created here to show the usage of logical operations like 'AND' and 'OR' in the Generate Attributes operator. This attribute takes the value '1' if three conditions are satisfied: first, the hours attribute has a value of 35 or more; second, the allowance attribute is not equal to 'yes'; third, the vacation attribute has the value 'average' OR 'below-average'. If any of these conditions is not satisfied, the new attribute gets the value '0'.

• The 'logarithmic attribute' attribute shows the usage of the base-10 logarithm and natural logarithm functions.

• The 'trigno attribute' attribute shows the usage of various trigonometric functions like sine and cosine.

• The 'rounded average wage-inc' attribute uses the avg function to take the average of the wage increments and then uses the round function to round the resultant values.

• The 'vacations' attribute uses the replaceAll function to replace all occurrences of the value 'generous' with 'above-average' in the 'vacation' attribute.

• The 'deadline' attribute shows the usage of the If-then-Else and date functions. This attribute takes the value of the current date plus 25 days if the class attribute has the value 'good'; otherwise it stores the current date plus 10 days.

• The 'shift complete' attribute shows the usage of the If-then-Else, random, floor and missing functions. This attribute takes the values of the shift attribute but has no missing values: missing values are replaced with a random number between 0 and 25.

• The 'remaining holidays' attribute stores the difference between 15 and the holidays attribute value.

• The 'remaining holidays percentage' attribute uses the 'remaining holidays' attribute to find the percentage of remaining holidays. This attribute was created to show that attributes created in this Generate Attributes operator can be used to generate new attributes in the same Generate Attributes operator.

• The 'constants' attribute was created to show the usage of constants like 'e' and 'pi'.

• The 'cut' attribute shows the usage of the cut function. If you want to specify a string, you should place it in double quotes (") as in the last term of this attribute's expression. If you want to specify the name of an attribute, you should not place it in quotes. The first term of the expression cuts the first two characters of the 'class' attribute values, because the attribute name is not placed in quotes. The last term of the expression selects the first two characters of the string 'class'. As the first two characters of the string 'class' are 'cl', 'cl' is appended at the end of this attribute's values. The middle term is used to concatenate a blank space between the first and last term's results.

• The 'index' attribute shows the usage of the index function. If the 'class' attribute has the value 'no', 1 is stored because 'o' is at the first index. If the 'class' attribute has the value 'yes', -1 is stored because 'o' is not present in this value.

• The 'date constants' attribute shows the usage of the date constants. It shows the date of the 'deadline' attribute in full format, but only the time is selected for display.
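
The string handling in the 'cut' expression above can be sketched in Python. Here cut and concat are stand-ins for the expression functions, and cut is modeled with a 0-based start index plus a length — an assumption for illustration; only the "first two characters" behaviour from the description is relied on:

```python
# Hypothetical stand-ins for the expression functions cut and concat.
def cut(text, start, length):
    return text[start:start + length]

def concat(*parts):
    return "".join(parts)

# first term: cut the 'class' attribute value (unquoted name -> attribute);
# last term: cut the quoted string "class"; middle term: a blank space.
class_value = "good"
result = concat(cut(class_value, 0, 2), " ", cut("class", 0, 2))
```

For a 'class' value of 'good' this yields 'go cl': the attribute contributes 'go' and the literal string 'class' contributes 'cl'.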

Generate Concatenation

This operator merges two attributes into a single new attribute by concatenating their values. The new attribute is of nominal type. The original attributes remain unchanged.

Description

The Generate Concatenation operator merges two attributes of the input ExampleSet into a single new nominal attribute by concatenating the values of the two attributes. If the resultant attribute is actually of numerical type, it can be converted from nominal to numerical type by using the Nominal to Numeric operator. The original attributes remain unchanged; just a new attribute is added to the ExampleSet. The two attributes to be concatenated are specified by the first attribute and second attribute parameters.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

example set output (exa) The ExampleSet with the new attribute that has the concatenated values of the specified attributes is the output of this port.
original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

first attribute (string) This parameter specifies the first attribute to be concatenated.
second attribute (string) This parameter specifies the second attribute to be concatenated.
separator (string) This parameter specifies the string which is used as separation of the values of the first and second attribute, i.e. the string that is concatenated between the two values.
trim values (boolean) This parameter indicates if the values of the first and second attribute should be trimmed, i.e. leading and trailing whitespace should be removed, before the concatenation is performed.
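
The interplay of the separator and trim values parameters can be sketched as plain Python — an illustration of the semantics, not RapidMiner code:

```python
def generate_concatenation(first, second, separator="_", trim_values=False):
    # Optionally strip leading/trailing whitespace, then join the two
    # nominal values with the separator in between.
    if trim_values:
        first, second = first.strip(), second.strip()
    return f"{first}{separator}{second}"

combined = generate_concatenation(" average ", "11", separator="_", trim_values=True)
```

With trimming enabled, ' average ' and '11' become 'average_11'; without it, the surrounding spaces would survive into the new nominal value.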


Tutorial Processes

Generating a concatenated attribute in the Labor-Negotiations data set

The 'Labor-Negotiations' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. The 'vacation' and 'statutory-holidays' attributes will be concatenated to form a new attribute. The Generate Concatenation operator is applied on the Labor-Negotiations data set. The first attribute and second attribute parameters are set to 'vacation' and 'statutory-holidays' respectively. The separator parameter is set to '_'. Thus the values of the 'vacation' and 'statutory-holidays' attributes will be merged with a '_' between them. You can verify this by seeing the resultant ExampleSet in the Results Workspace. The 'vacation' and 'statutory-holidays' attributes remain unchanged. A new attribute named 'vacation_statutory-holidays' is created. The type of the new attribute is nominal.

Generate Aggregation

This operator generates a new attribute by performing the specified aggregation function on every example of the selected attributes.


Description

This operator can be considered a blend of the Generate Attributes operator and the Aggregate operator. It generates a new attribute which is a function of several other attributes. These 'other' attributes can be selected by the attribute filter type parameter and other associated parameters. The aggregation function is selected through the aggregation function parameter. Several aggregation functions are available, e.g. count, minimum, maximum, average and mode. The attribute name parameter specifies the name of the new attribute. If you think this operator is close to your requirement but not exactly what you need, have a look at the Aggregate and the Generate Attributes operators because they perform similar tasks.

Differentiation

Aggregate This operator performs the aggregation functions known from SQL. It provides a lot of functionality in the same format as the SQL aggregation functions. SQL aggregation functions and GROUP BY and HAVING clauses can be imitated using this operator. See page 615 for details.

Generate Attributes It is a very powerful operator for generating new attributes from existing attributes. It even supports regular expressions and conditional statements for specifying the new attributes. See page 402 for details.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.


Output Ports

example set output (exa) The ExampleSet with the additional attribute generated after applying the specified aggregation function is the output of this port.
original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute name (string) The name of the resulting attribute is specified through this parameter.
attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the required attributes. It has the following options:

• all This option simply selects all the attributes of the ExampleSet. This is the default option.

• single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

• subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

• regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.


• value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical; for example, the real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

• block type This option works similarly to the value type option. It allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

• no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

• numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.
attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the aggregation will take place; all other attributes will remain unchanged.
regular expression (string) The attributes whose name matches this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously, which will enhance your understanding of regular expressions.
use except expression (boolean) If enabled, an exception to the first expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.
except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter).
value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.
use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.
except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.
block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'.
use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.
except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.
numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible, e.g. '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition: conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead.
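The numeric condition rules can be sketched as a tiny evaluator; this is a hypothetical re-implementation for illustration, not the actual parser:

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "=": operator.eq}

def holds(value, condition):
    # Evaluate a condition such as "> 6 && < 11" against a single value.
    # As the parameter description states, && and || may not be mixed.
    joiner = "&&" if "&&" in condition else "||"
    parts = [p.strip() for p in condition.split(joiner)]
    results = []
    for part in parts:
        op, threshold = part.split()  # relies on the blank space after the operator
        results.append(OPS[op](value, float(threshold)))
    return all(results) if joiner == "&&" else any(results)
```

For example, holds(8, '> 6 && < 11') is true, holds(12, '> 6 && < 11') is false, and holds(-1, '<= 5 || < 0') is true.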

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.
invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking this parameter, then after checking it 'att1' will be unselected and 'att2' will be selected.
aggregation function (selection) This parameter specifies the function for aggregating the values of the selected attributes. Numerous options are available, e.g. average, variance, standard deviation, count, minimum, maximum, sum, mode, median and product.
keep all (boolean) This parameter indicates if all old attributes should be kept. If this parameter is set to false then all the selected attributes (i.e. attributes that are used for aggregation) are removed.
ignore missings (boolean) This parameter indicates if missing values should be ignored and the aggregation function should only be applied to existing values. If this parameter is not set to true, the aggregated value will be a missing value in the presence of missing values in the selected attribute.

Related Documents

Aggregate (615)
Generate Attributes (402)

Tutorial Processes

Generating an attribute holding the average of the integer attributes of the Golf data set


The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has three nominal and two integer attributes. The Generate Aggregation operator is applied on this ExampleSet to generate a new attribute from the integer attributes of the ExampleSet, i.e. the Temperature and Humidity attributes. The attribute name parameter is set to 'Avg', thus the new attribute will be named 'Avg'. The attribute filter type parameter is set to 'value type' and the value type parameter is set to 'integer', thus the new attribute will be created from the integer attributes of the ExampleSet. The aggregation function parameter is set to 'average', thus the new attribute will be the average of the selected attributes. The resultant ExampleSet can be seen in the Results Workspace. You can see that there is a new attribute named 'Avg' in the ExampleSet that has the average value of the Temperature and Humidity attributes.
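
The row-wise averaging performed in this tutorial can be sketched in Python; None models a missing value, and the selection of attributes by value type is omitted for brevity:

```python
def aggregate_average(rows, attributes, ignore_missings=True):
    # For each example, average the values of the selected attributes,
    # mirroring the 'average' aggregation function of Generate Aggregation.
    result = []
    for row in rows:
        values = [row[name] for name in attributes]
        if ignore_missings:
            values = [v for v in values if v is not None]
        if values and None not in values:
            result.append(sum(values) / len(values))
        else:
            result.append(None)  # aggregated value is missing
    return result

rows = [{"Temperature": 85, "Humidity": 85},
        {"Temperature": 80, "Humidity": None}]
avg = aggregate_average(rows, ["Temperature", "Humidity"])
```

The first example averages to 85.0; the second averages only the existing value (80.0). With ignore_missings=False the second aggregate would itself be missing.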

Generate Function Set

This is an attribute generation operator which generates new attributes by applying a set of selected functions on all attributes.


Description

This operator applies a set of selected functions on all attributes of the input ExampleSet for generating new attributes. Numerous functions are available, including summation, difference, multiplication, division, reciprocal, square root, power, sine, cosine, tangent, arc tangent, absolute, minimum, maximum, ceiling, floor and round. It is important to note that functions with two arguments will be applied on all possible pairs. For example, suppose an ExampleSet with three numerical attributes A, B and C. If the summation function is applied on this ExampleSet then three new attributes will be generated with the values A+B, A+C and B+C. Similarly, non-commutative functions will be applied on all possible permutations. This is a useful attribute generation operator, but if it does not meet your requirements please try the Generate Attributes operator, which is a very powerful attribute generation operator.
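
The pairing behaviour can be sketched with itertools. Whether self-pairs such as A^A are generated is inferred from the tutorial's 4 x 4 = 16 count rather than stated explicitly in the description, so including them below is an assumption:

```python
from itertools import combinations, product

attributes = ["A", "B", "C", "D"]

# Commutative functions (e.g. summation) act on unordered pairs:
sums = [f"{x}+{y}" for x, y in combinations(attributes, 2)]

# Non-commutative functions (e.g. power) act on ordered pairs,
# here including self-pairs, giving 4 x 4 = 16 new attributes:
powers = [f"{x}^{y}" for x, y in product(attributes, repeat=2)]
```

For four attributes this yields six sums (A+B, A+C, ..., C+D) and sixteen powers.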

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set output (exa) New attributes are created by application of the selected functions and the resultant ExampleSet is delivered through this port.
original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

keep all (boolean) This parameter indicates if the original attributes should be kept.
use plus (boolean) This parameter indicates if the summation function should be applied for generation of new attributes.
use diff (boolean) This parameter indicates if the difference function should be applied for generation of new attributes.
use mult (boolean) This parameter indicates if the multiplication function should be applied for generation of new attributes.
use div (boolean) This parameter indicates if the division function should be applied for generation of new attributes.
use reciprocals (boolean) This parameter indicates if the reciprocal function should be applied for generation of new attributes.
use square roots (boolean) This parameter indicates if the square root function should be applied for generation of new attributes.
use power functions (boolean) This parameter indicates if the power function should be applied for generation of new attributes.
use sin (boolean) This parameter indicates if the sine function should be applied for generation of new attributes.
use cos (boolean) This parameter indicates if the cosine function should be applied for generation of new attributes.
use tan (boolean) This parameter indicates if the tangent function should be applied for generation of new attributes.
use atan (boolean) This parameter indicates if the arc tangent function should be applied for generation of new attributes.
use exp (boolean) This parameter indicates if the exponential function should be applied for generation of new attributes.
use log (boolean) This parameter indicates if the logarithmic function should be applied for generation of new attributes.
use absolute values (boolean) This parameter indicates if the absolute value function should be applied for generation of new attributes.
use min (boolean) This parameter indicates if the minimum function should be applied for generation of new attributes.
use max (boolean) This parameter indicates if the maximum function should be applied for generation of new attributes.
use ceil (boolean) This parameter indicates if the ceiling function should be applied for generation of new attributes.
use floor (boolean) This parameter indicates if the floor function should be applied for generation of new attributes.
use rounded (boolean) This parameter indicates if the round function should be applied for generation of new attributes.

Tutorial Processes

Using the power function for attribute generation

The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 4 real attributes. The Generate Function Set operator is applied on this ExampleSet for generation of new attributes; only the power function is used. It is not a commutative function, e.g. 2 raised to the power 3 is not the same as 3 raised to the power 2. Non-commutative functions are applied for all possible permutations. As there are 4 original attributes, there are 16 (i.e. 4 x 4) possible permutations. Thus 16 new attributes are created as a result of this operator. The resultant ExampleSet can be seen in the Results Workspace. As the keep all parameter was set to true, the original attributes of the ExampleSet are not discarded.


Optimize by Generation (YAGGA)

This operator may select some attributes from the original attribute set and it may also generate new attributes from the original attribute set. YAGGA (Yet Another Generating Genetic Algorithm) does not change the original number of attributes unless adding or removing (or both) attributes prove to have a better fitness.

Description

Sometimes the selection of features alone is not sufficient. In these cases other transformations of the feature space must be performed. The generation of new attributes from the given attributes extends the feature space; maybe a hypothesis can be easily found in the extended feature space. This operator can be considered a blend of attribute selection and attribute generation procedures. It may select some attributes from the original set of attributes and it may also generate new attributes from the original attributes. The (generating) mutation can do one of the following things with different probabilities:

• Probability p/4: Add a newly generated attribute to the feature vector.

• Probability p/4: Add a randomly chosen original attribute to the feature vector.

• Probability p/2: Remove a randomly chosen attribute from the feature vector.

Thus it is guaranteed that the length of the feature vector can both grow and shrink. On average it will keep its original length, unless longer or shorter individuals prove to have a better fitness.
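
The three mutation cases and their probabilities can be sketched as follows; this is a simplified model in which attributes are plain strings, generated attributes are faked with a random suffix, and p is the mutation probability from the description:

```python
import random

def yagga_mutation(feature_vector, original_attributes, rng, p=1.0):
    # One generating-mutation step: add a generated attribute (prob. p/4),
    # add a random original attribute (prob. p/4), or remove a random
    # attribute (prob. p/2); otherwise leave the vector unchanged.
    vector = list(feature_vector)
    r = rng.random()
    if r < p / 4:
        vector.append(f"generated_{rng.randrange(1000)}")
    elif r < p / 2:
        vector.append(rng.choice(original_attributes))
    elif r < p:
        vector.remove(rng.choice(vector))
    return vector

lengths = [len(yagga_mutation(["A", "B", "C"], ["A", "B", "C", "D"], random.Random(seed)))
           for seed in range(200)]
```

On average the vector keeps its length: the two add cases together are exactly as likely as the remove case, matching the p/4 + p/4 versus p/2 split above.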


A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. For the basic algorithm of a genetic algorithm, please study the description of the Optimize Selection (Evolutionary) operator.

This operator is a nested operator, i.e. it has a subprocess. The subprocess must return a performance vector. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

Differentiation

Optimize by Generation (YAGGA2) The YAGGA2 operator is an improved version of the usual YAGGA operator: it allows more feature generators and provides several techniques for redundancy prevention. This leads to smaller ExampleSets containing less redundant features. See page ?? for details.

Input Ports

example set in (exa) This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.

Output Ports

example set out (exa) The genetic algorithm is applied on the input ExampleSet. The resultant ExampleSet is delivered through this port.

attribute weights out (att) The attribute weights are delivered through this port.

performance out (per) This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.

Parameters

limit max total number of attributes (boolean) This parameter indicates if the total number of attributes in all generations should be limited. If set to true, the maximum number is specified by the max total number of attributes parameter.
max total number of attributes (integer) This parameter is only available when the limit max total number of attributes parameter is set to true. It specifies the maximum total number of attributes in all generations.
use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.
local random seed (integer) This parameter specifies the local random seed. It is available only if the use local random seed parameter is set to true.
show stop dialog (boolean) This parameter determines if a dialog with a stop button should be displayed which stops the search for the best feature space. If the search is stopped, the best individual found until then is returned.
maximal fitness (real) This parameter specifies the maximal fitness. The optimization will stop if the fitness reaches this value.
population size (integer) This parameter specifies the population size i.e. the number of individuals per generation.
maximum number of generations (integer) This parameter specifies the number of generations after which the algorithm should be terminated.
use plus (boolean) This parameter indicates if the summation function should be applied for the generation of new attributes.
use diff (boolean) This parameter indicates if the difference function should be applied for the generation of new attributes.
use mult (boolean) This parameter indicates if the multiplication function should be applied for the generation of new attributes.
use div (boolean) This parameter indicates if the division function should be applied for the generation of new attributes.
use reciprocals (boolean) This parameter indicates if the reciprocal function should be applied for the generation of new attributes.
use early stopping (boolean) This parameter enables early stopping. If not set to true, the maximum number of generations is always performed.
generations without improval (integer) This parameter is only available when the use early stopping parameter is set to true. It specifies the stop criterion for early stopping i.e. the search stops after n generations without improvement in the performance, where n is specified by this parameter.
tournament size (real) This parameter specifies the fraction of the current population which should be used as tournament members.
start temperature (real) This parameter specifies the scaling temperature.
dynamic selection pressure (boolean) If this parameter is set to true, the selection pressure is increased to the maximum during the complete optimization run.
keep best individual (boolean) If set to true, the best individual of each generation is guaranteed to be selected for the next generation.
p initialize (real) The initial probability for an attribute to be switched on is specified by this parameter.
p crossover (real) The probability for an individual to be selected for crossover is specified by this parameter.
crossover type (selection) The type of the crossover can be selected by this parameter.
use heuristic mutation probability (boolean) If this parameter is set to true, the probability for mutations will be chosen as 1/n where n is the number of attributes. Otherwise the probability for mutations should be specified through the p mutation parameter.
p mutation (real) The probability for an attribute to be changed is specified by this parameter. If set to -1, the probability will be set to 1/n where n is the total number of attributes.


Related Documents

Optimize by Generation (YAGGA2) (??)

Tutorial Processes

Applying YAGGA on the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes other than the label attribute. The Optimize by Generation (YAGGA) operator is applied on the ExampleSet. Optimize by Generation (YAGGA) is a nested operator i.e. it has a subprocess. It is necessary for the subprocess to deliver a performance vector. This performance vector is used by the underlying Genetic Algorithm. Have a look at the subprocess of this operator. The Split Validation operator is used there, which is itself a nested operator. Have a look at the subprocesses of the Split Validation operator. The Linear Regression operator is used in the 'Training' subprocess to train a model. The trained model is applied using the Apply Model operator in the 'Testing' subprocess. The performance is measured through the Performance (Regression) operator and the resultant performance vector is used by the underlying algorithm. Run the process and switch to the Results Workspace. You can see that the ExampleSet that had 5 attributes now has 6 attributes. The attributes 'a1' and 'a2' were selected from the original attribute set and the attributes 'gensym2', 'gensym35', 'gensym63' and 'gensym72' were generated. The number of resultant attributes is not necessarily less than the number of original attributes because YAGGA is not an attribute reduction operator. It may (or may not) increase or decrease the number of attributes depending on what proves to have a better fitness.


Principal Component Analysis

This operator performs a Principal Component Analysis (PCA) using the covariance matrix. The user can specify the amount of variance to cover in the original data while retaining the best number of principal components. The user can also manually specify the number of principal components.

Description

Principal component analysis (PCA) is an attribute reduction procedure. It is useful when you have obtained data on a number of attributes (possibly a large number of attributes), and believe that there is some redundancy in those attributes. In this case, redundancy means that some of the attributes are correlated with one another, possibly because they are measuring the same construct. Because of this redundancy, you believe that it should be possible to reduce the observed attributes into a smaller number of principal components (artificial attributes) that will account for most of the variance in the observed attributes.

Principal Component Analysis is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated attributes into a set of values of uncorrelated attributes called principal components. The number of principal components is less than or equal to the number of original attributes. This transformation is defined in such a way that the first principal component's variance is as high as possible (accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it should be orthogonal to (uncorrelated with) the preceding components.

Please note that PCA is sensitive to the relative scaling of the original attributes. This means that whenever different attributes have different units (like temperature and mass), PCA is a somewhat arbitrary method of analysis. Different results would be obtained if one used Fahrenheit rather than Celsius, for example.
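The procedure described above can be sketched in a few lines of NumPy. This is an illustrative sketch of PCA on the covariance matrix, not RapidMiner's code; it mirrors both the 'keep variance' and 'fixed number' reduction options of this operator.

```python
import numpy as np

def pca_covariance(X, variance_threshold=None, n_components=None):
    """PCA via eigendecomposition of the covariance matrix (a sketch).

    X is an (examples x attributes) array of numerical values. Either
    keep enough components to cover `variance_threshold` of the total
    variance, or keep a fixed `n_components`.
    """
    Xc = X - X.mean(axis=0)                    # center the attributes
    cov = np.cov(Xc, rowvar=False)             # covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigval)[::-1]           # sort by variance, descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    if n_components is None:
        cum = np.cumsum(eigval) / eigval.sum() # cumulative variance covered
        n_components = int(np.searchsorted(cum, variance_threshold) + 1)
    return Xc @ eigvec[:, :n_components]       # the principal components
```

Projecting onto the eigenvectors of the covariance matrix is exactly what makes the resulting components uncorrelated, which the tutorial process below verifies with a second Covariance Matrix operator.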

Input Ports

example set (exa) This input port expects an ExampleSet. It is output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. Please note that this operator cannot handle nominal attributes; it works on numerical attributes.

Output Ports

example set (exa) The Principal Component Analysis is performed on the input ExampleSet and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.


Parameters

dimensionality reduction (selection) This parameter indicates which type of dimensionality reduction (reduction in the number of attributes) should be applied.

• none if this option is selected, no component is removed from the ExampleSet.

• keep variance if this option is selected, all the components with a cumulative variance greater than the given threshold are removed from the ExampleSet. The threshold is specified by the variance threshold parameter.

• fixed number if this option is selected, only a fixed number of components are kept. The number of components to keep is specified by the number of components parameter.

variance threshold (real) This parameter is available only when the dimensionality reduction parameter is set to 'keep variance'. All the components with a cumulative variance greater than the variance threshold are removed from the ExampleSet.
number of components (integer) This parameter is only available when the dimensionality reduction parameter is set to 'fixed number'. The number of components to keep is specified by this parameter.

Tutorial Processes

Dimensionality reduction of the Polynomial data set using the Principal Component Analysis operator

The 'Polynomial' data set is loaded using the Retrieve operator. The Covariance Matrix operator is applied on it. A breakpoint is inserted here so that you can have a look at the ExampleSet and its covariance matrix. The Covariance Matrix operator is applied only for this purpose; it is not otherwise required here. The Principal Component Analysis operator is applied on the 'Polynomial' data set. The dimensionality reduction parameter is set to 'fixed number' and the number of components parameter is set to 4. Thus the resultant ExampleSet will be composed of 4 principal components. As mentioned in the description, the principal components are uncorrelated with each other, thus their covariance should be zero. The Covariance Matrix operator is applied on the output of the Principal Component Analysis operator. You can see the covariance matrix of the resultant ExampleSet in the Results Workspace. As you can see, the covariance of the components is zero.

Principal Component Analysis (Kernel)

This operator performs Kernel Principal Component Analysis (PCA) which is a non-linear extension of PCA.


Description

Kernel principal component analysis (kernel PCA) is an extension of principal component analysis (PCA) using techniques of kernel methods. Using a kernel, the originally linear operations of PCA are done in a reproducing kernel Hilbert space with a non-linear mapping. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to the input space by some nonlinear map. The result will be the set of data points in a non-linearly transformed space. Please note that in contrast to the usual linear PCA the kernel variant also works for large numbers of attributes but will become slow for large numbers of examples.

RapidMiner provides the Principal Component Analysis operator for applying linear PCA. Principal Component Analysis is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated attributes into a set of values of uncorrelated attributes called principal components. This transformation is defined in such a way that the first principal component's variance is as high as possible (accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it should be orthogonal to (uncorrelated with) the preceding components.

Differentiation

Principal Component Analysis Kernel principal component analysis (kernel PCA) is an extension of principal component analysis (PCA) using techniques of kernel methods. In contrast to the usual linear PCA the kernel variant also works for large numbers of attributes but will become slow for large numbers of examples. See page 434 for details.


Input Ports

example set input (exa) This input port expects an ExampleSet. It is output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. Please note that this operator cannot handle nominal attributes; it works on numerical attributes.

Output Ports

example set output (exa) The kernel-based Principal Component Analysis is performed on the input ExampleSet and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

preprocessing model (pre) This port delivers the preprocessing model, which has the information regarding the parameters of this operator in the current process.

Parameters

kernel type (selection) The type of the kernel function is selected through this parameter. The following kernel types are supported: dot, radial, polynomial, neural, anova, epachnenikov, gaussian combination, multiquadric.

• dot The dot kernel is defined by k(x, y) = x*y, i.e. it is the inner product of x and y.

• radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma; it is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

• polynomial The polynomial kernel is defined by k(x, y) = (x*y+1)^d where d is the degree of the polynomial and it is specified by the kernel degree parameter. The polynomial kernels are well suited for problems where all the training data is normalized.

• neural The neural kernel is defined by a two layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

• anova The anova kernel is defined by the summation of exp(-g (x-y)) raised to the power d, where g is gamma and d is degree. They are adjusted by the kernel gamma and kernel degree parameters respectively.

• epachnenikov The epachnenikov kernel is the function (3/4)(1-u^2) for u between -1 and 1 and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.

• gaussian combination This is the gaussian combination kernel. It has the adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

• multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has the adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the kernel parameter gamma. It is only available when the kernel type parameter is set to radial or anova.
kernel sigma1 (real) This is the kernel parameter sigma1. It is only available when the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.
kernel sigma2 (real) This is the kernel parameter sigma2. It is only available when the kernel type parameter is set to gaussian combination.
kernel sigma3 (real) This is the kernel parameter sigma3. It is only available when the kernel type parameter is set to gaussian combination.
kernel shift (real) This is the kernel parameter shift. It is only available when the kernel type parameter is set to multiquadric.
kernel degree (real) This is the kernel parameter degree. It is only available when the kernel type parameter is set to polynomial, anova or epachnenikov.
kernel a (real) This is the kernel parameter a. It is only available when the kernel type parameter is set to neural.
kernel b (real) This is the kernel parameter b. It is only available when the kernel type parameter is set to neural.
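The simpler kernel formulas quoted above can be written out directly. These NumPy one-liners are a sketch following the formulas in the text, not RapidMiner's implementation; the anova and epachnenikov kernels are omitted because the text does not state their formulas fully.

```python
import numpy as np

def dot_kernel(x, y):
    # k(x, y) = x*y, the inner product of x and y
    return float(np.dot(x, y))

def radial_kernel(x, y, gamma=1.0):
    # k(x, y) = exp(-g ||x-y||^2), g set by the kernel gamma parameter
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def polynomial_kernel(x, y, degree=2.0):
    # k(x, y) = (x*y + 1)^d, d set by the kernel degree parameter
    return float((np.dot(x, y) + 1.0) ** degree)

def neural_kernel(x, y, a=1.0, b=0.0):
    # two-layered neural net kernel tanh(a x*y + b)
    return float(np.tanh(a * np.dot(x, y) + b))

def multiquadric_kernel(x, y, c=1.0):
    # square root of ||x-y||^2 + c^2
    return float(np.sqrt(np.sum((x - y) ** 2) + c ** 2))
```

Note how the radial kernel evaluates to 1 whenever x equals y, which is the reason it is often read as a similarity measure.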

Related Documents

Principal Component Analysis (434)

Tutorial Processes

Introduction to the Principal Component Analysis (Kernel) operator

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes. The Principal Component Analysis (Kernel) operator is applied on this ExampleSet with default values of all parameters. The kernel type parameter is set to 'radial' and the kernel gamma parameter is set to 1.0. The resultant ExampleSet can be seen in the Results Workspace. You can see that this ExampleSet has a different set of attributes.


Independent Component Analysis

This operator performs the Independent Component Analysis (ICA) of the given ExampleSet using the FastICA algorithm of Hyvärinen and Oja.

Description

Independent component analysis (ICA) is a very general-purpose statistical technique in which observed random data are linearly transformed into components that are maximally independent from each other, and simultaneously have "interesting" distributions. Such a representation seems to capture the essential structure of the data in many applications, including feature extraction. ICA is used for revealing hidden factors that underlie sets of random variables or measurements. ICA is superficially related to principal component analysis (PCA) and factor analysis. ICA is a much more powerful technique, however, capable of finding the underlying factors or sources when these classic methods fail completely. This operator implements the FastICA algorithm of A. Hyvärinen and E. Oja. The FastICA algorithm has most of the advantages of neural algorithms: It is parallel, distributed, computationally simple, and requires little memory space.
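To make the algorithm concrete, here is a compact sketch of the deflation variant of FastICA: whitening followed by a fixed-point iteration with a tanh nonlinearity. It is a didactic approximation under simplifying assumptions (a fixed tanh contrast function, per-component convergence test), not the operator's actual implementation.

```python
import numpy as np

def fast_ica(X, n_components, max_iter=200, tol=1e-4, seed=0):
    """FastICA, deflation variant (a sketch after Hyvärinen and Oja).

    X is an (examples x attributes) array; returns the estimated
    independent components as an (examples x n_components) array.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # whiten: decorrelate the attributes and scale them to unit variance
    d, E = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = (Xc @ E) / np.sqrt(d)
    n = Z.shape[0]
    W = np.zeros((n_components, Z.shape[1]))
    for i in range(n_components):
        w = rng.standard_normal(Z.shape[1])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            g = np.tanh(Z @ w)
            w_new = (Z.T @ g) / n - (1.0 - g ** 2).mean() * w  # fixed-point step
            w_new -= W[:i].T @ (W[:i] @ w_new)  # deflation: remove found directions
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[i] = w
    return Z @ W.T
```

The deflation step is what the 'deflation' value of the algorithm type parameter below refers to: components are extracted one at a time, each forced orthogonal to those already found.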


Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. Please note that this operator cannot handle nominal attributes; it works on numerical attributes.

Output Ports

example set output (exa) The Independent Component Analysis is performed on the input ExampleSet and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters

dimensionality reduction (selection) This parameter indicates which type of dimensionality reduction (reduction in the number of attributes) should be applied.

• none if this option is selected, dimensionality reduction is not performed.

• fixed number if this option is selected, only a fixed number of components are kept. The number of components to keep is specified by the number of components parameter.

number of components (integer) This parameter is only available when the dimensionality reduction parameter is set to 'fixed number'. The number of components to keep is specified by this parameter.
algorithm type (selection) This parameter specifies the type of algorithm to be used.

• parallel If the parallel option is selected, the components are extracted simultaneously.

• deflation If the deflation option is selected, the components are extracted one at a time.

function (selection) This parameter specifies the functional form of the G function to be used in the approximation to neg-entropy.
alpha (real) This parameter specifies the alpha constant in the range [1, 2] which is used in the approximation to neg-entropy.
row norm (boolean) This parameter indicates whether rows of the data matrix should be standardized beforehand.
max iteration (integer) This parameter specifies the maximum number of iterations to perform.
tolerance (real) This parameter specifies a positive scalar giving the tolerance at which the un-mixing matrix is considered to have converged.
use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.
local random seed (integer) This parameter specifies the local random seed. It is only available if the use local random seed parameter is set to true.

Tutorial Processes

Dimensionality reduction of the Sonar data set using the Independent Component Analysis operator


The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 60 attributes. The Independent Component Analysis operator is applied on the 'Sonar' data set. The dimensionality reduction parameter is set to 'fixed number' and the number of components parameter is set to 10. Thus the resultant ExampleSet will be composed of 10 components (artificial attributes). You can see the resultant ExampleSet in the Results Workspace and verify that it has only 10 attributes. Please note that these attributes are not original attributes of the 'Sonar' data set. These attributes were created using the ICA procedure.

Generalized Hebbian Algorithm

This operator is an implementation of the Generalized Hebbian Algorithm (GHA), an iterative method for computing principal components. The user can manually specify the required number of principal components.


Description

The Generalized Hebbian Algorithm (GHA) is a linear feedforward neural network model for unsupervised learning with applications primarily in principal components analysis. From a computational point of view, it can be advantageous to solve the eigenvalue problem by iterative methods which do not need to compute the covariance matrix directly. This is useful when the ExampleSet contains many attributes (hundreds or even thousands).

Principal Component Analysis (PCA) is an attribute reduction procedure. It is useful when you have obtained data on a number of attributes (possibly a large number of attributes), and believe that there is some redundancy in those attributes. In this case, redundancy means that some of the attributes are correlated with one another, possibly because they are measuring the same construct. Because of this redundancy, you believe that it should be possible to reduce the observed attributes into a smaller number of principal components (artificial attributes) that will account for most of the variance in the observed attributes. Principal Component Analysis is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated attributes into a set of values of uncorrelated attributes called principal components. The number of principal components is less than or equal to the number of original attributes. This transformation is defined in such a way that the first principal component's variance is as high as possible (accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it should be orthogonal to (uncorrelated with) the preceding components.
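As an illustration of such an iterative method, Sanger's formulation of the GHA update can be sketched as follows. The parameter names echo this operator's parameters (number of iterations, learning rate), but the code is a simplified sketch, not the actual implementation; note that the covariance matrix is never formed.

```python
import numpy as np

def gha(X, n_components, number_of_iterations=50, learning_rate=0.05, seed=0):
    """Generalized Hebbian Algorithm (Sanger's rule), a sketch.

    Iteratively learns the leading principal components from examples
    one at a time, without computing the covariance matrix.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    W = 0.1 * rng.standard_normal((n_components, Xc.shape[1]))
    for _ in range(number_of_iterations):
        for x in Xc:  # one Hebbian update per example
            y = W @ x
            # Sanger's rule: dW = lr * (y x^T - lower_triangular(y y^T) W)
            W += learning_rate * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return Xc @ W.T
```

The lower-triangular term subtracts the contribution of the components already learned, which is what makes the rows of W converge toward successive eigenvectors of the covariance matrix rather than all toward the first one.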

Input Ports

example set (exa) This input port expects an ExampleSet. It is output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. Please note that this operator cannot handle nominal attributes; it works on numerical attributes.

Output Ports

example set (exa) The Generalized Hebbian Algorithm is performed on the input ExampleSet and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

preprocessing model (pre) This port delivers the GHA model.

Parameters

number of components (integer) The number of components to keep is specified by this parameter. If set to -1, the number of principal components in the resultant ExampleSet is equal to the number of attributes in the original ExampleSet.
number of iterations (integer) This parameter specifies the number of iterations to apply the update rule.
learning rate (real) This parameter specifies the learning rate of the GHA.
use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization.
local random seed (integer) This parameter specifies the local random seed. It is available only if the use local random seed parameter is set to true.


Tutorial Processes

Dimensionality reduction of the Polynomial data set using the GHA operator

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes. The Generalized Hebbian Algorithm operator is applied on the 'Polynomial' data set. The number of components parameter is set to 3. Thus the resultant ExampleSet will be composed of 3 principal components. All other parameters are used with default values. Run the process and you will see that the ExampleSet that had 5 attributes has been reduced to an ExampleSet with 3 principal components.

Singular Value Decomposition

This operator performs a dimensionality reduction of the given ExampleSet based on Singular Value Decomposition (SVD). The user can specify the required number of dimensions or specify the cumulative variance threshold. In the latter case all components having cumulative variance above this threshold are discarded.

Description

Singular Value Decomposition (SVD) can be used to better understand an ExampleSet by showing the number of important dimensions. It can also be used to simplify the ExampleSet by reducing the number of attributes of the ExampleSet. This reduction removes attributes that are linearly dependent from the point of view of linear algebra and are therefore unnecessary. It is useful when you have obtained data on a number of attributes (possibly a large number of attributes), and believe that there is some redundancy in those attributes. In this case, redundancy means that some of the attributes are correlated with one another, possibly because they are measuring the same construct. Because of this redundancy, you believe that it should be possible to reduce the observed attributes into a smaller number of components (artificial attributes) that will account for most of the variance in the observed attributes. For example, imagine an ExampleSet which contains an attribute that stores the water's temperature on several samples and another that stores its state (solid, liquid or gas). It is easy to see that the second attribute is dependent on the first attribute and, therefore, SVD could easily show us that it is not important for the analysis.

RapidMiner provides various dimensionality reduction operators e.g. the Principal Component Analysis operator. The Principal Component Analysis technique is a specific case of SVD. It is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated attributes into a set of values of uncorrelated attributes called principal components. The number of principal components is less than or equal to the number of original attributes. This transformation is defined in such a way that the first principal component's variance is as high as possible (accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it should be orthogonal to (uncorrelated with) the preceding components.
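The reduction itself can be sketched with NumPy's SVD routine. This is an illustrative sketch of the two reduction modes this operator offers (a fixed number of dimensions or a cumulative variance threshold), not the operator's code.

```python
import numpy as np

def svd_reduce(X, dimensions=None, percentage_threshold=None):
    """Dimensionality reduction via SVD (a sketch).

    Keeps either a fixed number of `dimensions`, or enough singular
    vectors to cover `percentage_threshold` of the total variance.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    if dimensions is None:
        explained = s ** 2 / np.sum(s ** 2)       # variance per component
        cum = np.cumsum(explained)                # cumulative variance
        dimensions = int(np.searchsorted(cum, percentage_threshold) + 1)
    # project the examples onto the leading right-singular directions
    return U[:, :dimensions] * s[:dimensions]
```

In the water example above, the state attribute is (nearly) a function of the temperature attribute, so its singular value would be close to zero and the corresponding dimension would be among the first discarded.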


Differentiation

Principal Component Analysis PCA is a dimensionality reduction procedure. PCA is a specific case of SVD. See page 434 for details.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. Please note that this operator cannot handle nominal attributes; it works on numerical attributes.

Output Ports

example set output (exa) The Singular Value Decomposition is performed on the input ExampleSet and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters

dimensionality reduction (selection) This parameter indicates which type of dimensionality reduction (reduction in the number of attributes) should be applied.

450 6.3. Attribute Set Reduction and Transformation

ˆ none if this option is selected, dimensionality reduction is not performed.

ˆ keep percentage if this option is selected, all the components with a cumulative variance greater than the given threshold are removed from the ExampleSet. The threshold is specified by the percentage threshold parameter.

ˆ fixed number if this option is selected, only a fixed number of components are kept. The number of components to keep is specified by the dimensions parameter.

percentage threshold (real) This parameter is only available when the dimensionality reduction parameter is set to 'keep percentage'. All the components with a cumulative variance greater than the percentage threshold are removed from the ExampleSet.

dimensions (integer) This parameter is only available when the dimensionality reduction parameter is set to 'fixed number'. The number of components to keep is specified by the dimensions parameter.
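The two reduction modes can be sketched as follows (hypothetical helper functions, not RapidMiner code; component variances are assumed to be sorted in descending order, as SVD/PCA produces them):

```python
import numpy as np

# Sketch of the two reduction modes described above.
def keep_fixed_number(variances, dimensions):
    # 'fixed number': keep the first k components
    return list(range(dimensions))

def keep_percentage(variances, percentage_threshold):
    # 'keep percentage': keep components whose cumulative variance
    # ratio stays at or below the threshold; drop the rest
    ratios = np.cumsum(variances) / np.sum(variances)
    kept = [i for i, r in enumerate(ratios) if r <= percentage_threshold]
    return kept or [0]   # always keep at least one component

variances = np.array([4.0, 2.0, 1.0, 0.5, 0.5])
print(keep_fixed_number(variances, 2))   # → [0, 1]
print(keep_percentage(variances, 0.9))   # → [0, 1, 2]
```

With the sample variances above, the first three components cover 87.5% of the total variance, so a 0.9 threshold keeps exactly those three.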

Related Documents

Principal Component Analysis (434)

Tutorial Processes

Dimensionality reduction of the Sonar data set using the Singular Value Decomposition operator

The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 60 attributes. The Singular Value Decomposition operator is applied on the 'Sonar' data set. The dimensionality reduction parameter is set to 'fixed number' and the dimensions parameter is set to 10. Thus the resultant ExampleSet will be composed of 10 dimensions (artificial attributes). You can see the resultant ExampleSet in the Results Workspace and verify that it has only 10 attributes. Please note that these attributes are not original attributes of the 'Sonar' data set. These attributes were created using the SVD procedure.

Self-Organizing Map

This operator performs a dimensionality reduction of the given ExampleSet based on a self-organizing map (SOM). The user can specify the required number of dimensions.

Description

A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. Self-organizing maps are different from other artificial neural networks in the sense that they use a neighborhood function to preserve the topological properties of the input space. This makes SOMs useful for visualizing low-dimensional views of high-dimensional data, akin to multidimensional scaling. The model was first described as an artificial neural network by Teuvo Kohonen, and is sometimes called a Kohonen map.

Like most artificial neural networks, SOMs operate in two modes: training and mapping. Training builds the map using input examples. Mapping automatically classifies a new input vector. A self-organizing map consists of components called nodes or neurons. Associated with each node is a weight vector of the same dimension as the input data vectors and a position in the map space. The usual arrangement of nodes is a regular spacing in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space. The procedure for placing a vector from data space onto the map is to first find the node with the closest weight vector to the vector taken from data space. Once the closest node is located it is assigned the values from the vector taken from the data space.

While it is typical to consider this type of network structure as related to feedforward networks where the nodes are visualized as being attached, this type of architecture is fundamentally different in arrangement and motivation.
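The training procedure described above — find the node with the closest weight vector (the best-matching unit), then adapt it and its grid neighbours toward the input — can be sketched like this (an illustrative Python sketch, not RapidMiner's implementation; the grid size and input vector are made up):

```python
import numpy as np

# Minimal SOM sketch: a 4x4 grid of weight vectors; one training step
# finds the best-matching unit and pulls nearby nodes toward the input.
rng = np.random.default_rng(1)
grid = rng.normal(size=(4, 4, 3))   # net_size x net_size x input_dim

def train_step(grid, x, learning_rate, radius):
    # best-matching unit: the node with the closest weight vector
    dists = np.linalg.norm(grid - x, axis=2)
    bi, bj = np.unravel_index(np.argmin(dists), dists.shape)
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            grid_dist = np.hypot(i - bi, j - bj)
            if grid_dist <= radius:   # only nodes inside the sphere adapt
                grid[i, j] += learning_rate * (x - grid[i, j])
    return (bi, bj)

x = np.array([0.5, -0.2, 1.0])
old = grid.copy()
bmu = train_step(grid, x, learning_rate=0.5, radius=1.0)
# after the step, the best-matching unit has moved closer to the input
assert np.linalg.norm(grid[bmu] - x) < np.linalg.norm(old[bmu] - x)
```

Real SOM training repeats this step for many rounds while shrinking the learning rate and neighbourhood radius, which is exactly what the parameters below control.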

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. Please note that this operator cannot handle nominal attributes; it works on numerical attributes.

Output Ports example set output (exa) The dimensionality reduction of the given ExampleSet is performed based on a self-organizing map and the resultant ExampleSet is delivered through this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace. preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters

return preprocessing model (boolean) This parameter indicates if the preprocessing model should be returned.

number of dimensions (integer) This parameter specifies the number of dimensions to keep, i.e. the number of attributes of the resultant ExampleSet.

net size (integer) This parameter specifies the size of the SOM net, by setting the length of every edge of the net.

training rounds (integer) This parameter specifies the number of training rounds.

learning rate start (real) This parameter specifies the strength of an adaption in the first round. The strength will decrease every round until it reaches the learning rate end in the last round.

learning rate end (real) This parameter specifies the strength of an adaption in the last round. The strength will decrease to this value in the last round, beginning with learning rate start in the first round.

adaption radius start (real) This parameter specifies the radius of the sphere around a stimulus in the first round. This radius decreases every round, starting at adaption radius start in the first round, down to adaption radius end in the last round.

adaption radius end (real) This parameter specifies the radius of the sphere around a stimulus in the last round. This radius decreases every round, starting at adaption radius start in the first round, down to adaption radius end in the last round.
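The decay of the learning rate and adaption radius across rounds can be illustrated with a small sketch (a linear schedule is assumed here purely for illustration; RapidMiner's actual decay curve may differ):

```python
# Sketch of a start-to-end decay schedule over the training rounds
# (hypothetical helper; linear interpolation is an assumption).
def schedule(start, end, rounds):
    if rounds == 1:
        return [start]
    step = (start - end) / (rounds - 1)
    return [start - r * step for r in range(rounds)]

lr = schedule(start=0.8, end=0.01, rounds=5)
radius = schedule(start=3.0, end=1.0, rounds=5)
# lr begins at 0.8 in the first round and ends at 0.01 in the last;
# radius shrinks 3.0 -> 2.5 -> 2.0 -> 1.5 -> 1.0
```

Whatever the exact curve, the contract stated by the parameters holds: the first round uses the start value and the last round uses the end value.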


Tutorial Processes

Dimensionality reduction of the Sonar data set using the Self-Organizing Map operator

The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 60 attributes. The Self-Organizing Map operator is applied on the 'Sonar' data set. The number of dimensions parameter is set to 10. Thus the resultant ExampleSet will be composed of 10 dimensions (artificial attributes). You can see the resultant ExampleSet in the Results Workspace and verify that it has only 10 attributes. Please note that these attributes are not original attributes of the 'Sonar' data set. These attributes were created using the SOM procedure.

Select Attributes

This operator selects which attributes of an ExampleSet should be kept and which attributes should be removed. This is used in cases when not all attributes of an ExampleSet are required; it helps you to select required attributes.


Description

The need to select attributes often arises before applying certain operators. This is especially true for large and complex data sets. The Select Attributes operator lets you select the required attributes conveniently. Different filter types are provided to make attribute selection easy. Only the selected attributes will be delivered from the output port; the rest will be removed from the ExampleSet.

Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data.

Output Ports example set (exa) The ExampleSet with the selected attributes is the output of this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting attributes. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet; no attributes are removed. This is the default option.

ˆ single This option allows the selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows the selection of multiple attributes through a list. All attributes of ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for the attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The user should have a basic understanding of type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows the selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes that will make it to the output port; all other attributes will be removed.

regular expression (string) The attributes whose names match this expression will be selected. A regular expression is a very powerful tool but needs a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, numeric, integer, real, text, binominal, polynominal, file path, date time, date, time.
use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, numeric, integer, real, text, binominal, polynominal, file path, date time, date, time.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: single value, value series, value series start, value series end, value matrix, value matrix start, value matrix end, value matrix row start.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type. One of the following block types can be selected here: single value, value series, value series start, value series end, value matrix, value matrix start, value matrix end, value matrix row start.

numeric condition (string) The numeric condition for testing examples of numeric attributes is mentioned here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) Special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.
By default all special attributes are delivered to the output port irrespective of the conditions in the Select Attributes operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Select Attributes operator and only those attributes that satisfy the conditions are selected.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are removed and the previously removed attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is removed prior to setting this parameter, then after setting this parameter 'att1' will be removed and 'att2' will be selected.
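The interplay of the regular expression and except regular expression parameters can be sketched as follows (hypothetical helper, not RapidMiner code; full-match semantics are assumed, and the attribute names are made up):

```python
import re

# Sketch of regular-expression selection with an optional exception,
# mirroring the 'regular expression' / 'except regular expression'
# parameters described above.
def select_by_regex(names, expression, except_expression=None):
    keep = [n for n in names if re.fullmatch(expression, n)]
    if except_expression is not None:
        keep = [n for n in keep if not re.fullmatch(except_expression, n)]
    return keep

names = ["att1", "att2", "temperature", "humidity"]
print(select_by_regex(names, r"att.*"))              # → ['att1', 'att2']
print(select_by_regex(names, r"att.*", r".*[0-9]"))  # → [] (both end in a digit)
```

The exception pattern filters attributes back out even when they matched the first expression, which is exactly the behaviour documented for except regular expression.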

Tutorial Processes

Selecting attributes by specifying regular expressions matching their names

In the given Example Process the Labor-Negotiations ExampleSet is loaded using the Retrieve operator. Then the Select Attributes operator is applied on it. Have a look at the Parameter View of the Select Attributes operator. Here is a stepwise explanation of this process.

ˆ See that at the bottom of the Parameters View the include special attributes parameter is set to true. This means that all special attributes will also be checked against all the given conditions; they will appear in the output only if they pass all the conditions. The only special attribute in this ExampleSet is the 'class' attribute. Though 'class' is a special attribute, it will make it to the output port only if it passes the conditions because the include special attributes parameter is set to true.

ˆ The regular expression specified is: w.*|.*y.*

– w.* means all attribute names starting with the letter 'w'.

– wage-inc-1st, wage-inc-2nd, wage-inc-3rd, working-hours satisfy this condition

– .*y.* means all attributes that have a 'y' in their name.

– standby-pay, statutory-holidays, longterm-disability-assistance satisfy this condition.

– | means the logical OR operator. So any attribute whose name starts with 'w' or contains a 'y' satisfies this expression and is selected.


– Following attributes of the Labor-Negotiations data set satisfy this expression: wage-inc-1st, wage-inc-2nd, wage-inc-3rd, working-hours, standby-pay, statutory-holidays, longterm-disability-assistance.

ˆ The use except expression parameter is also set to true which means attributes that satisfy the condition in the except regular expression parameter would be removed.

– The regular expression for except regular expression is: .*[0-9].*

– This expression means any attribute whose name contains a digit.

– Three attributes satisfy this condition: wage-inc-1st, wage-inc-2nd, wage-inc-3rd. Thus these three attributes do not make it to the output port even though they satisfied the regular expression of the regular expression parameter.

ˆ Finally we are left with the following four attributes: working-hours, standby-pay, statutory-holidays, longterm-disability-assistance. These four attributes make it to the output port.

ˆ Notice that the invert selection parameter was not set to true. If it was set to true, all attributes other than these four attributes would have made it to the output port.
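The selection steps above can be re-traced in a few lines (a sketch only; full-match semantics and the exact list of attribute names are assumptions):

```python
import re

# Re-tracing the tutorial's two-stage selection on the attribute
# names mentioned above.
names = ["wage-inc-1st", "wage-inc-2nd", "wage-inc-3rd", "working-hours",
         "standby-pay", "statutory-holidays", "longterm-disability-assistance",
         "pension", "class"]

# stage 1: regular expression — starts with 'w' or contains a 'y'
selected = [n for n in names if re.fullmatch(r"w.*|.*y.*", n)]
# stage 2: except regular expression — drop names containing a digit
selected = [n for n in selected if not re.fullmatch(r".*[0-9].*", n)]

print(selected)
# → ['working-hours', 'standby-pay', 'statutory-holidays',
#    'longterm-disability-assistance']
```

The four survivors match the four attributes the tutorial says reach the output port.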


Select by Weights

This operator selects only those attributes of an input ExampleSet whose weights satisfy the specified criterion with respect to the input weights.

Description

This operator selects only those attributes of an input ExampleSet whose weights satisfy the specified criterion with respect to the input weights. Input weights are provided through the weights input port. The criterion for attribute selection by weights is specified by the weight relation parameter.

Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data. weights (wei) This port expects the attribute weights. There are numerous operators that provide attribute weights. The Weight by Correlation operator is used in the Example Process.

Output Ports example set (exa) The ExampleSet with the selected attributes is the output of this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace. weights (wei) The attribute weights that were provided at the weights input port are delivered through this output port.

Parameters weight relation Only those attributes are selected whose weights satisfy this relation.

ˆ greater Attributes whose weights are greater than the weight parameter are selected.

ˆ greater equals Attributes whose weights are equal or greater than the weight parameter are selected.

ˆ equals Attributes whose weights are equal to the weight parameter are selected.

ˆ less equals Attributes whose weights are equal or less than the weight parameter are selected.

ˆ less Attributes whose weights are less than the weight parameter are selected.

ˆ top k The k attributes with highest weights are selected. k is specified by the k parameter.

ˆ bottom k The k attributes with lowest weights are selected. k is specified by the k parameter.

ˆ all but top k All attributes other than the k attributes with highest weights are selected. k is specified by the k parameter.

ˆ all but bottom k All attributes other than k attributes with lowest weights are selected. k is specified by the k parameter.

ˆ top p% The top p percent attributes with highest weights are selected. p is specified by the p parameter.

ˆ bottom p% The bottom p percent attributes with lowest weights are selected. p is specified by the p parameter.

weight This parameter is available only when the weight relation parameter is set to 'greater', 'greater equals', 'equals', 'less equals' or 'less'. This parameter is used to compare weights.

k This parameter is available only when the weight relation parameter is set to 'top k', 'bottom k', 'all but top k' or 'all but bottom k'. It is used to count the number of attributes to select.

p This parameter is available only when the weight relation parameter is set to 'top p%' or 'bottom p%'. It is used to specify the percentage of attributes to select.

deselect unknown This is an expert parameter. This parameter indicates if attributes whose weight is unknown should be removed from the ExampleSet.

use absolute weights This is an expert parameter. This parameter indicates if the absolute values of the weights should be used for comparison.
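A few of the weight relations listed above can be sketched as follows (hypothetical helper, not RapidMiner code; only three of the relations are shown):

```python
# Sketch of weight-relation selection: given a mapping from attribute
# name to weight, pick names according to the chosen relation.
def select_by_weights(weights, relation, k=None, weight=None):
    ordered = sorted(weights, key=weights.get, reverse=True)
    if relation == "top k":
        return set(ordered[:k])
    if relation == "bottom k":
        return set(ordered[-k:])
    if relation == "greater":
        return {a for a, w in weights.items() if w > weight}
    raise ValueError("unsupported relation in this sketch")

weights = {"att1": 0.9, "att2": 0.1, "att3": 0.5, "att4": 0.3}
print(select_by_weights(weights, "top k", k=2))           # → {'att1', 'att3'}
print(select_by_weights(weights, "bottom k", k=2))        # → {'att2', 'att4'}
print(select_by_weights(weights, "greater", weight=0.4))  # → {'att1', 'att3'}
```

The remaining relations ('all but top k', 'top p%', etc.) follow the same pattern of slicing or filtering the weight-ordered attribute list.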

Tutorial Processes

Selecting attributes from Sonar data set

The 'Sonar' data set is loaded using the Retrieve operator. The Weight by Correlation operator is applied on it to generate attribute weights. A breakpoint is inserted here. You can see the attributes with their weights here. The Select by Weights operator is applied next. The 'Sonar' data set is provided at the example set port and the weights calculated by the Weight by Correlation operator are provided at the weights input port. The weight relation parameter is set to 'bottom k' and the k parameter is set to 4. Thus the 4 attributes with minimum weights are selected. As you can see, 'attribute 57', 'attribute 17', 'attribute 30' and 'attribute 16' have the lowest weights, thus these four attributes are selected. Also note that the label attribute 'class' is also selected. This is because attributes with special roles are selected irrespective of the weights condition.

Remove Attribute Range

This operator removes a range of attributes from the given ExampleSet.

Description

The Remove Attribute Range operator removes the attributes within the specified range. The first and last attribute of the range are specified by the first attribute and last attribute parameters. All attributes in this range (including first and last attribute) will be removed from the ExampleSet. It is important to note that the attribute range starts from 1. This is a little different from the way attributes are counted in the Table Index where counting starts from 0. So, first and last attributes should be specified carefully.

Differentiation

Select Attributes Provides a lot of options for selecting desired attributes, e.g. on the basis of type, block, numerical value or even regular expressions. See page 455 for details. Remove Correlated Attributes Selects attributes on the basis of correlations of the attributes. See page 472 for details. Remove Useless Attributes Selects attributes on the basis of usefulness. Different usefulness measures are available, e.g. numerical attributes with minimum deviation etc. See page 468 for details.

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports example set output (exa) The ExampleSet with the selected attributes removed from the original ExampleSet is the output of this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

first attribute (integer) The first attribute of the attribute range which should be removed is specified through this parameter. The counting of attributes starts from 1. last attribute (integer) The last attribute of the attribute range which should be removed is specified through this parameter. The counting of attributes starts from 1.
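The 1-indexed, inclusive range removal described by these parameters can be sketched like this (hypothetical helper; a plain list of column names stands in for the ExampleSet's attributes):

```python
# Sketch of Remove Attribute Range semantics: both ends of the range
# are removed, and counting starts at 1 (not 0 as in the Table Index).
def remove_attribute_range(columns, first_attribute, last_attribute):
    return columns[:first_attribute - 1] + columns[last_attribute:]

cols = ["Outlook", "Temperature", "Humidity", "Wind", "Play"]
print(remove_attribute_range(cols, 1, 2))   # → ['Humidity', 'Wind', 'Play']
```

Note the off-by-one trap the Description warns about: attribute 1 here is the column with Table Index 0.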


Related Documents

Select Attributes (455) Remove Correlated Attributes (472) Remove Useless Attributes (468)

Tutorial Processes

Removing the first two attributes of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the Table Index of the Outlook attribute is 0. The Table Index column can be seen if the Show column 'Table Index' option is selected in the Meta Data View tab. The Table Index of the Temperature attribute is 1. The Remove Attribute Range operator is applied on the 'Golf' data set to remove the first two attributes. The first attribute and last attribute parameters are set to 1 and 2 respectively to remove the first two attributes. They were not set to 0 and 1 respectively because here attribute counting starts from 1 (instead of 0). The resultant ExampleSet can be seen in the Results Workspace. You can see that the Outlook and Temperature attributes have been removed from the ExampleSet.


Remove Useless Attributes

This operator removes useless attributes from an ExampleSet. The thresholds for useless attributes are specified by the user.

Description

The Remove Useless Attributes operator removes four kinds of useless attributes:

1. Such nominal attributes where the most frequent value is contained in more than the specified ratio of all examples. The ratio is specified by the nominal useless above parameter. This ratio is defined as the number of examples with most frequent attribute value divided by the total number of examples. This property can be used for removing such nominal attributes where one value dominates all other values.

2. Such nominal attributes where the most frequent value is contained in less than the specified ratio of all examples. The ratio is specified by the nominal useless below parameter. This ratio is defined as the number of examples with most frequent attribute value divided by the total number of examples. This property can be used for removing nominal attributes with too many possible values.

3. Such numerical attributes where the Standard Deviation is less than or equal to a given deviation threshold. The numerical min deviation parameter specifies the deviation threshold. The Standard Deviation is a measure of how spread out values are. Standard Deviation is the square root of the Variance which is defined as the average of the squared differences from the Mean.

4. Such nominal attributes where the value of all examples is unique. This property can be used to remove id-like attributes.

Please note that this is not an intelligent operator, i.e. it cannot figure out on its own whether an attribute is useless or not. It simply removes those attributes that satisfy the criteria for uselessness defined by the user.

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Filter Examples operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports example set output (exa) The attributes that satisfy the user-defined criteria for useless attributes are removed from the ExampleSet and this ExampleSet is delivered through this output port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

numerical min deviation (real) The numerical min deviation parameter specifies the deviation threshold. Such numerical attributes where the Standard Deviation is less than or equal to this deviation threshold are removed from the input ExampleSet. The Standard Deviation is a measure of how spread out values are. Standard Deviation is the square root of the Variance which is defined as the average of the squared differences from the Mean.

nominal useless above (real) The nominal useless above parameter specifies the ratio of the number of examples with the most frequent value to the total number of examples. Such nominal attributes where this ratio is more than the given threshold are removed from the input ExampleSet. This property can be used to remove such nominal attributes where one value dominates all other values.

nominal remove id like (boolean) If this parameter is set to true, all such nominal attributes where the value of all examples is unique are removed from the input ExampleSet. This property can be used to remove id-like attributes.

nominal useless below (real) The nominal useless below parameter specifies the ratio of the number of examples with the most frequent value to the total number of examples. Such nominal attributes where this ratio is less than the given threshold are removed from the input ExampleSet. This property can be used to remove nominal attributes with too many possible values.
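The uselessness tests described above can be sketched as follows (hypothetical helpers, not RapidMiner code; population standard deviation is used here, whereas the operator may compute the sample standard deviation):

```python
import statistics

# Sketch of the nominal and numerical uselessness tests.
def is_useless_nominal(values, useless_above=1.0, useless_below=0.0):
    # ratio of the most frequent value to the total number of examples
    top = max(values.count(v) for v in set(values))
    ratio = top / len(values)
    return ratio > useless_above or ratio < useless_below

def is_useless_numeric(values, min_deviation=0.0):
    # standard deviation at or below the threshold means "useless"
    return statistics.pstdev(values) <= min_deviation

wind = ["false"] * 7 + ["true"] * 3                       # ratio = 0.7
outlook = ["rain"] * 4 + ["sunny"] * 4 + ["overcast"] * 2  # ratio = 0.4
print(is_useless_nominal(wind, useless_above=0.6))         # → True (0.7 > 0.6)
print(is_useless_nominal(outlook, useless_below=0.5))      # → True (0.4 < 0.5)
print(is_useless_numeric([5.0, 5.0, 5.0], min_deviation=0.1))  # → True
```

The two nominal examples deliberately reproduce the ratios (0.7 and 0.4) worked out in the tutorial below.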

Tutorial Processes

Removing useless nominal attributes from an ExampleSet

This Example Process explains how the nominal useless above and nominal useless below parameters can be used to remove useless nominal attributes. Please keep in mind that the Remove Useless Attributes operator removes those attributes that satisfy the user-defined criteria for useless attributes.

The 'Golf' data set is loaded using the Retrieve operator. The Filter Examples operator is applied on it to filter the first 10 examples. This is done just to simplify the calculations for understanding this process. A breakpoint is inserted after the Filter Examples operator so that you can see the ExampleSet before the application of the Remove Useless Attributes operator. You can see that the ExampleSet has 10 examples. There are 2 regular nominal attributes: 'Outlook' and 'Wind'. The most frequent values of the 'Outlook' attribute are 'rain' and 'sunny'; they occur in 4 out of 10 examples each. Thus their ratio is 0.4. The most frequent value of the 'Wind' attribute is 'false'; it occurs in 7 out of 10 examples. Thus its ratio is 0.7.

The Remove Useless Attributes operator is applied on the ExampleSet. The nominal useless above parameter is set to 0.6. Thus attributes where the ratio of the most frequent value to the total number of examples is above 0.6 are removed from the ExampleSet. As the ratio of the most frequent value in the Wind attribute is greater than 0.6, it is removed from the ExampleSet.

The nominal useless below parameter is set to 0.5. Thus attributes where the ratio of the most frequent value to the total number of examples is below 0.5 are removed from the ExampleSet. As this ratio for the Outlook attribute (0.4) is below 0.5, the Outlook attribute is removed from the ExampleSet.

This can be verified by viewing the results in the Results Workspace.

Removing useless numerical attributes from an ExampleSet

This Example Process explains how the numerical min deviation parameter can be used to remove useless numerical attributes. The numerical min deviation parameter specifies the deviation threshold. Numerical attributes where the Standard Deviation is less than or equal to this threshold are removed from the input ExampleSet. The Standard Deviation is a measure of how spread out the values are. It is the square root of the Variance, which is defined as the average of the squared differences from the Mean. Please keep in mind that the Remove Useless Attributes operator removes those attributes that satisfy the user-defined criteria for useless attributes.
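The Standard Deviation as defined above can be sketched as follows. Whether RapidMiner divides by n or n-1 is not stated here, so both variants are shown; the temperature values are hypothetical.

```python
import math

def std_dev(values, sample=True):
    """Square root of the variance, i.e. of the average squared difference
    from the mean. With sample=True the sum is divided by n-1 instead of n
    (the usual estimator on a sample of data)."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_diffs / (n - 1 if sample else n))

# Hypothetical temperature values; any attribute whose deviation is at or
# below the numerical min deviation threshold would be removed.
temperature = [64, 65, 68, 69, 70, 71, 72, 75, 80, 85]
print(std_dev(temperature) <= 9.0)  # True: deviation is roughly 6.5
```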

The 'Golf' data set is loaded using the Retrieve operator. The Filter Examples operator is applied on it to filter the first 10 examples. This is done just to simplify the calculations for understanding this process. A breakpoint is inserted after the Filter Examples operator so that you can see the ExampleSet before the application of the Remove Useless Attributes operator. You can see that it has 10 examples. There are 2 regular numerical attributes: 'Temperature' and 'Humidity'. The Aggregate operator is applied on the ExampleSet to calculate and display the Standard Deviations of both numerical attributes. This operator is inserted here only so that you can see the Standard Deviations without calculating them yourself; otherwise it is not required. You can see that the Standard Deviations of the 'Temperature' and 'Humidity' attributes are 7.400 and 10.682 respectively.

The Remove Useless Attributes operator is applied on the original ExampleSet (the ExampleSet with the first 10 examples of the 'Golf' data set). The numerical min deviation parameter is set to 9.0. Thus the numerical attributes where the Standard Deviation is less than 9.0 are removed from the ExampleSet. As the Standard Deviation of the Temperature attribute is less than 9.0, it is removed from the ExampleSet.

This can be verified by viewing the results in the Results Workspace.

Remove Correlated Attributes

This operator removes correlated attributes from an ExampleSet. The correlation threshold is specified by the user. Correlation is a statistical technique that can show whether, and how strongly, pairs of attributes are related.

Description

A correlation is a number between -1 and +1 that measures the degree of association between two attributes (call them X and Y). A positive value for the correlation implies a positive association. In this case large values of X tend to be associated with large values of Y and small values of X tend to be associated with small values of Y. A negative value for the correlation implies a negative or inverse association. In this case large values of X tend to be associated with small values of Y and vice versa.

Suppose we have two attributes X and Y, with means X' and Y' respectively and standard deviations S(X) and S(Y) respectively. The correlation is computed as:

r = [ sum from i = 1 to n of (X(i)-X').(Y(i)-Y') ] / [ (n-1).S(X).S(Y) ]

where n is the total number of examples and i is the summation index. There can be other formulas and definitions but let us stick to this one for simplicity.
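As a sketch, the formula above translates directly into code (using sample standard deviations, matching the n-1 term in the denominator):

```python
import math

def pearson(x, y):
    """Sum of (x_i - x_mean) * (y_i - y_mean), divided by (n-1) * s(x) * s(y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov_sum = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov_sum / ((n - 1) * sx * sy)

print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 6))   # 1.0: perfect positive association
print(round(pearson([1, 2, 3, 4], [8, 6, 4, 2]), 6))   # -1.0: perfect negative association
```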

As discussed earlier a positive value for the correlation implies a positive association. Suppose that an X value was above average, and that the associated Y value was also above average. Then the product (X(i)-X').(Y(i)-Y') would be the product of two positive numbers, which would be positive. If the X value and the Y value were both below average, then the product above would be of two negative numbers, which would also be positive. Therefore, a positive correlation is evidence of a general tendency that large values of X are associated with large values of Y and small values of X are associated with small values of Y.

As discussed earlier a negative value for the correlation implies a negative or inverse association. Suppose that an X value was above average, and that the associated Y value was instead below average. Then the product (X(i)-X').(Y(i)-Y') would be the product of a positive and a negative number, which would make the product negative. If the X value was below average and the Y value was above average, then the product above would also be negative. Therefore, a negative correlation is evidence of a general tendency that large values of X are associated with small values of Y and small values of X are associated with large values of Y.

This operator can be used for removing correlated or uncorrelated attributes depending on the setting of the parameters, especially the filter relation parameter. The procedure is quadratic in the number of attributes, i.e. for m attributes an m x m matrix of correlations is calculated. Please note that this operator might fail to filter out some attributes that should be removed. For example, it might not remove all negatively correlated attributes, because the correlations of the complete m x m matrix are not recalculated, and a pair is not checked if one of its attributes was already marked for removal. This means that for three attributes X, Y and Z it might be that Y was already ruled out by its negative correlation with X and can now no longer rule out Z. The correlation function used in this operator is the Pearson correlation. In order to get more stable results the original, random, and reverse orders of attributes are available.

Correlated attributes are usually removed because they are similar in behavior and have a similar impact in prediction calculations, so keeping all of them is redundant. Removing correlated attributes saves space and the computation time of complex algorithms. Moreover, it also makes processes easier to design, analyze and understand.
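The limitation described above can be illustrated with a sketch of the single-pass removal. This is an assumed reconstruction from the description, not the operator's actual code: with the 'greater' filter relation and threshold 0.8, Y is removed because of its correlation with X, and since pairs involving removed attributes are skipped, Y can no longer rule out Z.

```python
def remove_correlated(corr, names, threshold=0.8, use_absolute=False):
    """Single pass over a precomputed correlation matrix: a pair is skipped
    when either attribute is already marked for removal, so a removed
    attribute can no longer rule out other attributes."""
    removed = set()
    for i in range(len(names)):
        if names[i] in removed:
            continue
        for j in range(i + 1, len(names)):
            if names[j] in removed:
                continue
            r = abs(corr[i][j]) if use_absolute else corr[i][j]
            if r > threshold:          # the 'greater' filter relation
                removed.add(names[j])  # keep the earlier attribute in the order
    return [n for n in names if n not in removed]

# Y correlates 0.9 with X and 0.95 with Z; X and Z correlate only 0.3.
corr = [[1.0, 0.9, 0.3],
        [0.9, 1.0, 0.95],
        [0.3, 0.95, 1.0]]
print(remove_correlated(corr, ['X', 'Y', 'Z']))  # ['X', 'Z'] -- Z survives
```

Since the outcome depends on the order in which pairs are visited, changing the attribute order parameter can produce a different surviving set.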

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.


Output Ports

example set output (exa) The (un-)correlated attributes are removed from the ExampleSet and this ExampleSet is delivered through this output port.
original (ori) The ExampleSet that was given as input is passed without any modifications to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

correlation (real) This parameter specifies the correlation threshold for filtering attributes. A correlation is a number between -1 and +1 that measures the degree of association between two attributes (call them X and Y). A positive value for the correlation implies a positive association. In this case large values of X tend to be associated with large values of Y and small values of X tend to be associated with small values of Y. A negative value for the correlation implies a negative or inverse association. In this case large values of X tend to be associated with small values of Y and vice versa.
filter relation (selection) Correlations of two attributes are compared one pair at a time. One of the two attributes is removed if their correlation fulfills the relation specified by this parameter.
attribute order (selection) The algorithm takes this attribute order to calculate correlations and to filter the attributes.
use absolute correlation (boolean) This parameter indicates if the absolute value of the correlations should be used for comparison.

Tutorial Processes

Removing correlated attributes from the Sonar data set


The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet before further operators are applied on it. You can see that the 'Sonar' data set has 60 numerical attributes. The Correlation Matrix operator is applied on it so that you can view the correlation matrix of the 'Sonar' data set; otherwise this operator is not required here. The Remove Correlated Attributes operator is applied on the 'Sonar' data set. The correlation parameter is set to 0.8. The filter relation parameter is set to 'greater' and the attribute order parameter is set to 'original'. Run the process and you will see in the Results Workspace that 19 of the 60 numerical attributes of the 'Sonar' data set have been removed. Now have a look at the correlation matrix generated by the Correlation Matrix operator. You can see that most of the attributes with correlations above 0.8 have been removed from the data set. Some such attributes survive because this operator might fail to filter them out: the correlations of the complete m x m matrix are not recalculated, so a pair is not checked if one of its attributes was already marked for removal. Change the value of the attribute order parameter to 'random' and run the process again. Compare these results with the previous ones. This time a different set of attributes is removed from the data set. So the order in which the attributes are processed may change the output.


Work on Subset

This operator selects a subset (one or more attributes) of the input ExampleSet and applies the operators in its subprocess on the selected subset.

Description

The Work on Subset operator can be considered, to some extent, as a blend of the Select Attributes and Subprocess operators. The attributes are selected in the same way as in the Select Attributes operator, and the subprocess of this operator works in the same way as the Subprocess operator. A subprocess can be considered as a small unit of a process where any combination of operators can be applied. That is why a subprocess can also be defined as a chain of operators that is subsequently applied. For more information about subprocesses please study the Subprocess operator. Although the Work on Subset operator has similarities with the Select Attributes and Subprocess operators, it provides some functionality that cannot be achieved by a combination of those two operators. Most importantly, this operator can merge the results of its subprocess with the input ExampleSet such that the original subset is overwritten by the subset received after processing in the subprocess. This merging is controlled by the keep subset only parameter. This parameter is set to false by default, thus merging is done by default. If this parameter is set to true, then only the result of the subprocess is returned by this operator and no merging is done. In that case this operator behaves very similarly to the combination of the Select Attributes and Subprocess operators. This can be understood easily by studying the attached Example Process.

This operator can also deliver the additional results of the subprocess if desired. This can be controlled by the deliver inner results parameter. Please note that this is a very powerful operator. It can be used to create new preprocessing schemes by combining it with other preprocessing operators. However, there are two major restrictions:

- Since the result of the subprocess will be combined with the rest of the input ExampleSet, the number of examples is not allowed to be changed inside the subprocess.

- The changes in the role of an attribute will not be delivered outside the subprocess.

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set (exa) The result of the subprocess will be combined with the rest of the input ExampleSet and delivered through this port. However, if the keep subset only parameter is set to true then only the result of the subprocess will be delivered.
through (thr) This operator can also deliver the additional results of the subprocess if desired. This can be controlled by the deliver inner results parameter. This port is used for delivering the additional results of the subprocess. The Work on Subset operator can have multiple through ports. When one through port is connected, another through port becomes available, ready to deliver another output (if any). The order of outputs remains the same: the object passed to the first through port inside the subprocess of the Work on Subset operator is delivered at the first through port of the operator.


Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting attributes. It has the following options:

- all This option simply selects all the attributes of the ExampleSet. This is the default option.

- single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

- subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; the required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter (attributes) becomes visible in the Parameter View.

- regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

- value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example, the real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

- block type This option is similar in working to the value type option. It allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example, the value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

- no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

- numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.
attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes.
regular expression (string) The attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.
use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.
except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (the one specified in the regular expression parameter).
value type (selection) The type of attributes to be selected can be chosen from a drop down list.
use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View.
except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type, i.e. the value type parameter's value.
block type (selection) The block type of attributes to be selected can be chosen from a drop down list.
use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.
except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type, i.e. the block type parameter's value.
numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<'; e.g. '<5' will not work, so use '< 5' instead.
include special attributes (boolean) The special attributes are attributes with special roles, which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the specified conditions. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes that satisfy the conditions are selected.
invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and the previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is set, then after setting it 'att1' will be unselected and 'att2' will be selected.

keep subset only (boolean) The Work on Subset operator can merge the results of its subprocess with the input ExampleSet such that the original subset is overwritten by the subset received after processing in the subprocess. This merging is controlled by the keep subset only parameter. This parameter is set to false by default, thus merging is done by default. If this parameter is set to true, then only the result of the subprocess is returned by this operator and no merging is done.
deliver inner results (boolean) This parameter indicates if the additional results (other than the input ExampleSet) of the subprocess should also be returned. If this parameter is set to true then the additional results are delivered through the through ports.

Tutorial Processes

Working on a subset of Golf data set

The 'Golf' data set is loaded using the Retrieve operator. Then the Work on Subset operator is applied on it. The attribute filter type parameter is set to subset. The attributes parameter is used for selecting the 'Temperature' and 'Humidity' attributes. Double-click on the Work on Subset operator to see its subprocess. All the operations in the subprocess will be performed only on the selected attributes, i.e. the 'Temperature' and 'Humidity' attributes. The Normalize operator is applied in the subprocess. The attribute filter type parameter of the Normalize operator is set to 'all'. Please note that the Normalize operator will not be applied on 'all' the attributes of the input ExampleSet but rather on 'all' the selected attributes, i.e. the 'Temperature' and 'Humidity' attributes. Run the process. You will see that the normalized 'Humidity' and 'Temperature' attributes are combined with the rest of the input ExampleSet. Now set the keep subset only parameter to true and run the process again. This time you will see that only the results of the subprocess are delivered by the Work on Subset operator. This Example Process just explains the basic usage of this operator. This operator can be used for creating new preprocessing schemes by combining it with other preprocessing operators.

Forward Selection

This operator selects the most relevant attributes of the given ExampleSet through a highly efficient implementation of the forward selection scheme.

Description

The Forward Selection operator is a nested operator i.e. it has a subprocess. The subprocess of the Forward Selection operator must always return a performance vector. For more information regarding subprocesses please study the Subprocess operator.

The Forward Selection operator starts with an empty selection of attributes and, in each round, it adds each unused attribute of the given ExampleSet in turn. For each added attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute giving the highest increase of performance is added to the selection. Then a new round is started with the modified selection. This implementation avoids any additional memory consumption besides the memory originally used for storing the data and the memory which might be needed for applying the inner operators. The stopping behavior parameter specifies when the iteration should be aborted. There are three different options:

- without increase: The iteration runs as long as there is any increase in performance.

- without increase of at least: The iteration runs as long as the increase is at least as high as specified, either relative or absolute. The minimal relative increase parameter is used for specifying the minimal relative increase if the use relative increase parameter is set to true. Otherwise, the minimal absolute increase parameter is used for specifying the minimal absolute increase.

- without significant increase: The iteration stops as soon as the increase is not significant to the level specified by the alpha parameter.

The speculative rounds parameter defines how many rounds will be performed in a row after the first time the stopping criterion is fulfilled. If the performance increases again during the speculative rounds, the selection will be continued. Otherwise all additionally selected attributes will be removed, as if no speculative rounds had been executed. This might help to avoid getting stuck in local optima.
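The selection loop described above, with the simple 'without increase' stopping behavior, can be sketched as follows. The evaluate function stands in for the inner subprocess (e.g. a cross-validated learner returning a performance value), and the toy scoring at the end is purely hypothetical.

```python
def forward_selection(attributes, evaluate, max_attributes=None):
    """Start empty; each round, tentatively add every unused attribute and
    keep only the addition with the best performance. Stop as soon as no
    addition increases the performance ('without increase')."""
    selected = []
    best = evaluate(selected)
    while attributes and (max_attributes is None or len(selected) < max_attributes):
        score, winner = max((evaluate(selected + [a]), a) for a in attributes)
        if score <= best:  # no increase in performance
            break
        best = score
        selected.append(winner)
        attributes = [a for a in attributes if a != winner]
    return selected

# Toy performance measure: two attributes help, the others slightly hurt.
gain = {'att1': 0.5, 'att3': 0.3}
evaluate = lambda subset: sum(gain.get(a, -0.1) for a in subset)
print(forward_selection(['att1', 'att2', 'att3', 'att4'], evaluate))
# ['att1', 'att3']
```

Note that each round evaluates every remaining attribute once, which is what makes the whole procedure quadratic in the number of attributes at worst.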

Feature selection, i.e. the question of finding the most relevant features for classification or regression problems, is one of the main data mining tasks. A wide range of search methods has been integrated into RapidMiner, including evolutionary algorithms. For all search methods we need a performance measurement which indicates how well a search point (a feature subset) will probably perform on the given data set.

Differentiation

Backward Elimination The Backward Elimination operator starts with the full set of attributes and, in each round, it removes each remaining attribute of the given ExampleSet. For each removed attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute giving the least decrease of performance is finally removed from the selection. Then a new round is started with the modified selection. See page 488 for details.

Input Ports

example set (exa) This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.

Output Ports

example set (exa) The feature selection algorithm is applied on the input ExampleSet. The resultant ExampleSet with reduced attributes is delivered through this port.
attribute weights (att) The attribute weights are delivered through this port.
performance (per) This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.

Parameters

maximal number of attributes (integer) This parameter specifies the maximal number of attributes to be selected through forward selection.
speculative rounds (integer) This parameter specifies the number of times the stopping criterion may be consecutively ignored before the selection is actually stopped. A number higher than one might help to avoid getting stuck in local optima.
stopping behavior (selection) The stopping behavior parameter specifies when the iteration should be aborted. There are three different options:


- without increase The iteration runs as long as there is any increase in performance.

- without increase of at least The iteration runs as long as the increase is at least as high as specified, either relative or absolute. The minimal relative increase parameter is used for specifying the minimal relative increase if the use relative increase parameter is set to true. Otherwise, the minimal absolute increase parameter is used for specifying the minimal absolute increase.

- without significant increase The iteration stops as soon as the increase is not significant to the level specified by the alpha parameter.

use relative increase (boolean) This parameter is available only when the stopping behavior parameter is set to 'without increase of at least'. If the use relative increase parameter is set to true the minimal relative increase parameter will be used, otherwise the minimal absolute increase parameter will be used.
minimal absolute increase (real) This parameter is available only when the stopping behavior parameter is set to 'without increase of at least' and the use relative increase parameter is set to false. If the absolute performance increase compared to the last step drops below this threshold, the selection will be stopped.
minimal relative increase (real) This parameter is available only when the stopping behavior parameter is set to 'without increase of at least' and the use relative increase parameter is set to true. If the relative performance increase compared to the last step drops below this threshold, the selection will be stopped.
alpha (real) This parameter is available only when the stopping behavior parameter is set to 'without significant increase'. This parameter specifies the probability threshold which determines if differences are considered significant.

Related Documents

Backward Elimination (488)


Tutorial Processes

Feature reduction of the Polynomial data set through For- ward Selection

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes other than the label attribute. The Forward Selection operator is applied on the ExampleSet; it is a nested operator, i.e. it has a subprocess. It is necessary for the subprocess to deliver a performance vector. This performance vector is used by the underlying feature reduction algorithm. Have a look at the subprocess of this operator. The X-Validation operator is used there, which itself is a nested operator. Have a look at the subprocesses of the X-Validation operator. The K-NN operator is used in the 'Training' subprocess to train a model. The trained model is applied using the Apply Model operator in the 'Testing' subprocess. The performance is measured through the Performance operator and the resultant performance vector is used by the underlying algorithm. Run the process and switch to the Results Workspace. You can see that the ExampleSet that had 5 attributes has now been reduced to 3 attributes.


Backward Elimination

This operator selects the most relevant attributes of the given ExampleSet through an efficient implementation of the backward elimination scheme.

Description

The Backward Elimination operator is a nested operator i.e. it has a subprocess. The subprocess of the Backward Elimination operator must always return a performance vector. For more information regarding subprocesses please study the Subprocess operator.

The Backward Elimination operator starts with the full set of attributes and, in each round, it removes each remaining attribute of the given ExampleSet. For each removed attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute giving the least decrease of performance is finally removed from the selection. Then a new round is started with the modified selection. This implementation avoids any additional memory consumption besides the memory used originally for storing the data and the memory which might be needed for applying the inner operators. The stopping behavior parameter specifies when the iteration should be aborted. There are three different options:

- with decrease: The iteration runs as long as there is no decrease in performance.

- with decrease of more than: The iteration runs as long as the decrease is less than the specified threshold, either relative or absolute. The maximal relative decrease parameter is used for specifying the maximal relative decrease if the use relative decrease parameter is set to true. Otherwise, the maximal absolute decrease parameter is used for specifying the maximal absolute decrease.


- with significant decrease: The iteration stops as soon as the decrease is significant to the level specified by the alpha parameter.

The speculative rounds parameter defines how many rounds will be performed in a row after the first time the stopping criterion is fulfilled. If the performance increases again during the speculative rounds, the elimination will be continued. Otherwise all additionally eliminated attributes will be restored, as if no speculative rounds had been executed. This might help to avoid getting stuck in local optima.

Feature selection, i.e. the question for the most relevant features for classification or regression problems, is one of the main data mining tasks. A wide range of search methods has been integrated into RapidMiner, including evolutionary algorithms. All search methods need a performance measurement which indicates how well a search point (a feature subset) will probably perform on the given data set.

Differentiation

Forward Selection The Forward Selection operator starts with an empty selection of attributes and, in each round, it adds each unused attribute of the given ExampleSet. For each added attribute, the performance is estimated using the inner operators, e.g. a cross-validation. Only the attribute giving the highest increase of performance is added to the selection. Then a new round is started with the modified selection. See page 483 for details.

Input Ports

example set (exa) This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.

489 6. Data Transformation

Output Ports

example set (exa) The feature selection algorithm is applied on the input ExampleSet. The resultant ExampleSet with reduced attributes is delivered through this port.

attribute weights (att) The attribute weights are delivered through this port.

performance (per) This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.

Parameters

maximal number of eliminations (integer) This parameter specifies the maximal number of backward eliminations.

speculative rounds (integer) This parameter specifies the number of times the stopping criterion may be consecutively ignored before the elimination is actually stopped. A number higher than one might help to avoid getting stuck in local optima.

stopping behavior (selection) The stopping behavior parameter specifies when the iteration should be aborted. There are three different options:

• with decrease The iteration runs as long as there is any increase in performance.

• with decrease of more than The iteration runs as long as the decrease is less than the specified threshold, either relative or absolute. The maximal relative decrease parameter is used for specifying the maximal relative decrease if the use relative decrease parameter is set to true. Otherwise, the maximal absolute decrease parameter is used for specifying the maximal absolute decrease.

• with significant decrease The iteration stops as soon as the decrease is significant to the level specified by the alpha parameter.

use relative decrease (boolean) This parameter is only available when the stopping behavior parameter is set to 'with decrease of more than'. If the use relative decrease parameter is set to true the maximal relative decrease parameter will be used, otherwise the maximal absolute decrease parameter.

maximal absolute decrease (real) This parameter is only available when the stopping behavior parameter is set to 'with decrease of more than' and the use relative decrease parameter is set to false. If the absolute performance decrease compared to the last step exceeds this threshold, the elimination will be stopped.

maximal relative decrease (real) This parameter is only available when the stopping behavior parameter is set to 'with decrease of more than' and the use relative decrease parameter is set to true. If the relative performance decrease compared to the last step exceeds this threshold, the elimination will be stopped.

alpha (real) This parameter is only available when the stopping behavior parameter is set to 'with significant decrease'. It specifies the probability threshold which determines if differences are considered significant.

Related Documents

Forward Selection (483)

Tutorial Processes

Feature reduction of the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes other than the label attribute. The Backward Elimination operator, which is a nested operator (i.e. it has a subprocess), is applied on the ExampleSet. It is necessary for the subprocess to deliver a performance vector. This performance vector is used by the underlying feature reduction algorithm. Have a look at the subprocess of this operator. The X-Validation operator is used there, which itself is a nested operator. Have a look at the subprocesses of the X-Validation operator. The K-NN operator is used in the 'Training' subprocess to train a model. The trained model is applied using the Apply Model operator in the 'Testing' subprocess. The performance is measured through the Performance operator and the resultant performance vector is used by the underlying algorithm. Run the process and switch to the Results Workspace. You can see that the ExampleSet that had 5 attributes has now been reduced to 3 attributes.

Optimize Selection

This operator selects the most relevant attributes of the given ExampleSet. Two deterministic greedy feature selection algorithms, 'forward selection' and 'backward elimination', are used for feature selection.

Description

Feature selection, i.e. the question for the most relevant features for classification or regression problems, is one of the main data mining tasks. A wide range of search methods has been integrated into RapidMiner, including evolutionary algorithms. All search methods need a performance measurement which indicates how well a search point (a feature subset) will probably perform on the given data set.


A deterministic algorithm is an algorithm which, in informal terms, behaves predictably. Given a particular input, it will always produce the same output, and the underlying machine will always pass through the same sequence of states.

A greedy algorithm is an algorithm that follows the problem solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. On some problems, a greedy strategy may not produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution.

This operator realizes the two deterministic greedy feature selection algorithms 'forward selection' and 'backward elimination'. However, we have added some enhancements to the standard algorithms, which are described below:

Forward Selection

1. Create an initial population with n individuals where n is the number of attributes in the input ExampleSet. Each individual will use exactly one of the features.

2. Evaluate the attribute sets and select only the best k.

3. For each of the k attribute sets do: If there are j unused attributes, make j copies of the attribute set and add exactly one of the previously unused attributes to each copy.

4. As long as the performance improved in the last p iterations, go to step 2.

Backward Elimination

1. Start with an attribute set which uses all features.


2. Evaluate all attribute sets and select the best k.

3. For each of the k attribute sets do: If there are j attributes used, make j copies of the attribute set and remove exactly one of the used attributes from each copy.

4. As long as the performance improved in the last p iterations, go to step 2.

The parameter k can be specified by the keep best parameter, and the parameter p can be specified by the generations without improval parameter. Both parameters have a default value of 1, which means that the standard selection algorithms are used. Using other values increases the runtime but might help to avoid local extrema in the search for the global optimum.
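The enhanced forward selection above can be sketched in Python. This is a hedged illustration, not the operator's implementation: `evaluate` stands in for the inner performance estimation, `keep_best` corresponds to k, and `generations_without_improval` to p:

```python
def forward_selection(attributes, evaluate, keep_best=1,
                      generations_without_improval=1):
    """Forward selection keeping the best k attribute sets per generation.

    evaluate(attrs) stands in for the inner performance estimation
    (e.g. a cross-validation); higher return values are better.
    """
    # Step 1: one individual per single attribute.
    population = [frozenset([a]) for a in attributes]
    best_perf, best_set = max((evaluate(s), s) for s in population)
    stale = 0  # generations without improvement
    while stale < generations_without_improval:
        # Step 2: evaluate the attribute sets and keep only the best k.
        population = sorted(population, key=evaluate,
                            reverse=True)[:keep_best]
        # Step 3: extend each kept set by every previously unused attribute.
        population = [s | {a} for s in population
                      for a in attributes if a not in s]
        if not population:  # all attributes are already used
            break
        perf, candidate = max((evaluate(s), s) for s in population)
        # Step 4: continue as long as the performance improves.
        if perf > best_perf:
            best_perf, best_set = perf, candidate
            stale = 0
        else:
            stale += 1
    return set(best_set), best_perf
```

With keep best = 1 and generations without improval = 1 this degenerates to the standard greedy forward selection, as the text describes.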

Another unusual parameter is the maximum number of generations parameter. This parameter bounds the number of iterations to this maximum of feature selections / de-selections. In combination with the generations without improval parameter, this allows several different selection schemes (which are described for forward selection; backward elimination works analogously):

maximum number of generations = m and generations without improval = p:

Selects at most m features. The selection stops if no performance improvement was measured in the last p generations.

maximum number of generations = -1 and generations without improval = p:

Tries to select new features until no performance improvement was measured in the last p generations.

maximum number of generations = m and generations without improval = -1:

Selects at most m features. The selection is not stopped until all combinations with at most m features have been tried. However, the result might contain fewer features than that.

maximum number of generations = -1 and generations without improval = -1:

Tests all combinations of attributes (brute force; this might take a very long time and should only be applied to small attribute sets).

Differentiation

Optimize Selection (Evolutionary) This is also an attribute set reduction operator but it uses a genetic algorithm for this purpose. See page 499 for details.

Input Ports

example set in (exa) This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.

through (thr) This operator can have multiple through ports. When one input is connected with the through port, another through port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through port of this operator is available at the first through port of the nested chain (inside the subprocess). Do not forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output Ports

example set out (exa) The feature selection algorithm is applied on the input ExampleSet. The resultant ExampleSet with reduced attributes is delivered through this port.

weights (wei) The attribute weights are delivered through this port.

performance (per) This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.

Parameters

selection direction (selection) This parameter specifies which of the 'forward selection' and 'backward elimination' algorithms should be used.

limit generations without improval (boolean) This parameter indicates if the optimization should be aborted if this number of generations showed no improvement. If unchecked, the maximal number of generations is always used.

generations without improval (integer) This parameter is only available when the limit generations without improval parameter is set to true. It specifies the stop criterion for early stopping, i.e. the optimization stops after n generations without improvement in the performance, where n is specified by this parameter.

limit number of generations (boolean) This parameter indicates if the number of generations should be limited to a specific number.

keep best (integer) The best n individuals are kept in each generation, where n is the value of this parameter.

maximum number of generations (integer) This parameter is only available when the limit number of generations parameter is set to true. It specifies the number of generations after which the algorithm should be terminated.

normalize weights (boolean) This parameter indicates if the final weights should be normalized. If set to true, the final weights are normalized such that the maximum weight is 1 and the minimum weight is 0.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed and is only available if the use local random seed parameter is set to true.

show stop dialog (boolean) This parameter determines if a dialog with a stop button should be displayed which stops the search for the best feature space. If the search for the best feature space is stopped, the best individual found until then will be returned.

user result individual selection (boolean) If this parameter is set to true, it allows the user to select the final result individual from the last population.

show population plotter (boolean) This parameter determines if the current population should be displayed in the performance space.

plot generations (integer) This parameter is only available when the show population plotter parameter is set to true. The population plotter is updated in these generations.

constraint draw range (boolean) This parameter is only available when the show population plotter parameter is set to true. It determines if the draw range of the population plotter should be constrained between 0 and 1.

draw dominated points (boolean) This parameter is only available when the show population plotter parameter is set to true. It determines if only points which are not Pareto dominated should be drawn on the population plotter.

population criteria data file (filename) This parameter specifies the path to the file in which the criteria data of the final population should be saved.
maximal fitness (real) This parameter specifies the maximal fitness. The optimization will stop if the fitness reaches this value.


Related Documents

Optimize Selection (Evolutionary) (499)

Tutorial Processes

Feature reduction of the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes other than the label attribute. The Optimize Selection operator, which is a nested operator (i.e. it has a subprocess), is applied on the ExampleSet. It is necessary for the subprocess to deliver a performance vector. This performance vector is used by the underlying feature reduction algorithm. Have a look at the subprocess of this operator. The Split Validation operator has been used there, which itself is a nested operator. Have a look at the subprocesses of the Split Validation operator. The SVM operator is used in the 'Training' subprocess to train a model. The trained model is applied using the Apply Model operator in the 'Testing' subprocess. The performance is measured through the Performance operator and the resultant performance vector is used by the underlying algorithm. Run the process and switch to the Results Workspace. You can see that the ExampleSet that had 5 attributes has now been reduced to 2 attributes.


Optimize Selection (Evolutionary)

This operator selects the most relevant attributes of the given Exam- pleSet. A Genetic Algorithm is used for feature selection.

Description

Feature selection, i.e. the question for the most relevant features for classification or regression problems, is one of the main data mining tasks. A wide range of search methods has been integrated into RapidMiner, including evolutionary algorithms. All search methods need a performance measurement which indicates how well a search point (a feature subset) will probably perform on the given data set.

A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.

In a genetic algorithm for feature selection, 'mutation' means switching features on and off and 'crossover' means interchanging used features. Selection is done by the scheme chosen via the selection scheme parameter. A genetic algorithm works as follows:

Generate an initial population consisting of p individuals. Each attribute is switched on with probability p_i. The numbers p and p_i can be adjusted by the population size and p initialize parameters respectively.

For all individuals in the population

1. Perform mutation, i.e. set used attributes to unused with probability p_m and vice versa. The probability p_m can be adjusted by the p mutation parameter.


2. Choose two individuals from the population and perform crossover with probability p_c. The probability p_c can be adjusted by the p crossover parameter. The type of crossover can be selected by the crossover type parameter.

3. Perform selection, i.e. map all individuals to selection probabilities according to their fitness and draw p individuals at random according to these probabilities, where p is the population size which can be adjusted by the population size parameter.

4. As long as the fitness improves, go to step number 2.

If the ExampleSet contains value series attributes with block numbers, the whole block will be switched on and off. The exact, minimum or maximum number of attributes in the combinations to be tested can be specified by the appropriate parameters. Many other options are also available for this operator. Please study the parameters section for more information.
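The loop described above can be sketched in Python over attribute bitmasks. This is a hedged illustration, not the operator's implementation: the `fitness` callback, the roulette-wheel selection, and the one-point crossover are simplified stand-ins for the configurable schemes listed in the parameters section:

```python
import random

def genetic_feature_selection(n_attrs, fitness, population_size=5,
                              p_initialize=0.5, p_mutation=None,
                              p_crossover=0.5, generations=20, seed=0):
    """Sketch of a genetic algorithm over attribute bitmasks.

    fitness(mask) stands in for the inner subprocess; it must return a
    non-negative performance value for the attribute subset `mask`.
    """
    rng = random.Random(seed)  # local random seed for reproducibility
    if p_mutation is None:
        p_mutation = 1.0 / n_attrs  # the operator's -1 default: 1/n
    pop = [[int(rng.random() < p_initialize) for _ in range(n_attrs)]
           for _ in range(population_size)]
    best = list(max(pop, key=fitness))
    for _ in range(generations):
        # Mutation: switch each attribute on/off with probability p_mutation.
        for ind in pop:
            for i in range(n_attrs):
                if rng.random() < p_mutation:
                    ind[i] = 1 - ind[i]
        # Crossover: interchange used features between two individuals.
        if len(pop) >= 2 and rng.random() < p_crossover:
            a, b = rng.sample(range(len(pop)), 2)
            cut = rng.randrange(1, n_attrs)  # one-point crossover
            pop[a][cut:], pop[b][cut:] = pop[b][cut:], pop[a][cut:]
        # Selection: draw individuals with probability proportional to fitness.
        weights = [fitness(ind) for ind in pop]
        if sum(weights) > 0:
            pop = [list(ind) for ind in
                   rng.choices(pop, weights=weights, k=population_size)]
        best = list(max(pop + [best], key=fitness))
    return best
```

Because the sketch uses a fixed local random seed, repeated runs with the same seed return the same attribute subset, mirroring the use local random seed parameter.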

Input Ports

example set in (exa) This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.

attribute weights in (att) This port expects attribute weights. It is not compulsory to use this port.

through (thr) This operator can have multiple through ports. When one input is connected with the through port, another through port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through port of this operator is available at the first through port of the nested chain (inside the subprocess). Do not forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at the subprocess level.


Output Ports

example set out (exa) The genetic algorithm is applied on the input ExampleSet. The resultant ExampleSet with reduced attributes is delivered through this port.

weights (wei) The attribute weights are delivered through this port.

performance (per) This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.

Parameters

use exact number of attributes (boolean) This parameter determines if only combinations containing an exact number of attributes should be tested. The exact number is specified by the exact number of attributes parameter.

exact number of attributes (integer) This parameter is only available when the use exact number of attributes parameter is set to true. Only combinations containing this number of attributes will be generated and tested.

restrict maximum (boolean) If set to true, the maximum number of attributes whose combinations will be generated and tested can be restricted. Otherwise all combinations of all attributes are generated and tested. This parameter is only available when the use exact number of attributes parameter is set to false.

min number of attributes (integer) This parameter determines the minimum number of features used for the combinations to be generated and tested.

max number of attributes (integer) This parameter determines the maximum number of features used for the combinations to be generated and tested. It is only available when the restrict maximum parameter is set to true.

population size (integer) This parameter specifies the population size, i.e. the number of individuals per generation.

maximum number of generations (integer) This parameter specifies the number of generations after which the algorithm should be terminated.

use early stopping (boolean) This parameter enables early stopping. If not set to true, the maximum number of generations is always performed.

generations without improval (integer) This parameter is only available when the use early stopping parameter is set to true. It specifies the stop criterion for early stopping, i.e. the optimization stops after n generations without improvement in the performance, where n is specified by this parameter.

normalize weights (boolean) This parameter indicates if the final weights should be normalized. If set to true, the final weights are normalized such that the maximum weight is 1 and the minimum weight is 0.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed. It is only available if the use local random seed parameter is set to true.

show stop dialog (boolean) This parameter determines if a dialog with a stop button should be displayed which stops the search for the best feature space. If the search for the best feature space is stopped, the best individual found until then will be returned.

user result individual selection (boolean) If this parameter is set to true, it allows the user to select the final result individual from the last population.

show population plotter (boolean) This parameter determines if the current population should be displayed in the performance space.

plot generations (integer) This parameter is only available when the show population plotter parameter is set to true. The population plotter is updated in these generations.

constraint draw range (boolean) This parameter is only available when the show population plotter parameter is set to true. It determines if the draw range of the population plotter should be constrained between 0 and 1.

draw dominated points (boolean) This parameter is only available when the show population plotter parameter is set to true. It determines if only points which are not Pareto dominated should be drawn on the population plotter.

population criteria data file (filename) This parameter specifies the path to the file in which the criteria data of the final population should be saved.

maximal fitness (real) This parameter specifies the maximal fitness. The optimization will stop if the fitness reaches this value.

selection scheme (selection) This parameter specifies the selection scheme of this evolutionary algorithm.

tournament size (real) This parameter is only available when the selection scheme parameter is set to 'tournament'. It specifies the fraction of the current population which should be used as tournament members.

start temperature (real) This parameter is only available when the selection scheme parameter is set to 'Boltzmann'. It specifies the scaling temperature.

dynamic selection pressure (boolean) This parameter is only available when the selection scheme parameter is set to 'Boltzmann' or 'tournament'. If set to true the selection pressure is increased to maximum during the complete optimization run.

keep best individual (boolean) If set to true, the best individual of each generation is guaranteed to be selected for the next generation.

save intermediate weights (boolean) This parameter determines if the intermediate best results should be saved.

intermediate weights generations (integer) This parameter is only available when the save intermediate weights parameter is set to true. The intermediate best results are saved every k generations, where k is specified by this parameter.

intermediate weights file (filename) This parameter specifies the file into which the intermediate weights should be saved.

p initialize (real) The initial probability for an attribute to be switched on is specified by this parameter.

p mutation (real) The probability for an attribute to be changed is specified by this parameter. If set to -1, the probability will be set to 1/n where n is the total number of attributes.

p crossover (real) The probability for an individual to be selected for crossover is specified by this parameter.

crossover type (selection) The type of the crossover can be selected by this parameter.


Tutorial Processes

Feature reduction of the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes other than the label attribute. The Optimize Selection (Evolutionary) operator, which is a nested operator (i.e. it has a subprocess), is applied on the ExampleSet. It is necessary for the subprocess to deliver a performance vector. This performance vector is used by the underlying Genetic Algorithm. Have a look at the subprocess of this operator. The Split Validation operator has been used there, which itself is a nested operator. Have a look at the subprocesses of the Split Validation operator. The SVM operator is used in the 'Training' subprocess to train a model. The trained model is applied using the Apply Model operator in the 'Testing' subprocess. The performance is measured through the Performance operator and the resultant performance vector is used by the underlying algorithm. Run the process and switch to the Results Workspace. You can see that the ExampleSet that had 5 attributes has now been reduced to 3 attributes.

Set Data

504 6.4. Value Modification

This operator sets the value of one or more attributes of the specified example.

Description

The Set Data operator sets the value of one or more attributes of the specified example of the input ExampleSet. The example is specified by the example index parameter. The attribute name parameter specifies the attribute whose value is to be set. The value parameter specifies the new value. Values of other attributes of the same example can be set by the additional values parameter. Please note that the values should be consistent with the type of the attribute e.g. specifying a string value is not allowed for an integer type attribute.
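This behavior can be sketched in plain Python. The helper below is hypothetical, not RapidMiner code: the ExampleSet is modeled as a list of dicts, and only a simple numeric-type consistency check is shown:

```python
def set_data(example_set, example_index, values, count_backwards=False):
    """Set attribute values of one example (row) of an ExampleSet.

    A plain-Python sketch: the ExampleSet is a list of dicts and
    `values` maps attribute name -> new value. Indices start at 1.
    """
    if count_backwards:
        example = example_set[-example_index]      # index 1 = last example
    else:
        example = example_set[example_index - 1]   # index 1 = first example
    for name, value in values.items():
        # Values must be consistent with the attribute's type.
        if (isinstance(example[name], (int, float))
                and not isinstance(value, (int, float))):
            raise ValueError("non-numeric value for numeric attribute " + name)
        example[name] = value
    return example_set

# Mirrors the Example Process: Temperature -> 50, Wind -> 'fast' in example 1.
golf = [{"Temperature": 85, "Wind": "false"},
        {"Temperature": 80, "Wind": "true"}]
set_data(golf, 1, {"Temperature": 50, "Wind": "fast"})
# golf[0] is now {"Temperature": 50, "Wind": "fast"}
```

As in the operator, assigning a string to the numeric Temperature attribute raises an error, while the nominal Wind attribute accepts any value.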

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

example set output (exa) The ExampleSet with the new values of the selected example's attributes is the output of this port.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

example index (integer) This parameter specifies the index of the example whose value should be set. Please note that counting starts from 1.

attribute name (string) This parameter specifies the name of the attribute whose value should be set.

count backwards (boolean) If set to true, the counting order is reversed. The last example is addressed by index 1, the second last example is addressed by index 2, and so on.

value (string) This parameter specifies the new value of the selected attribute (selected by the attribute name parameter) of the specified example (specified by the example index parameter).

additional values The values of other attributes of the same example can be set by this parameter.

Tutorial Processes

Introduction to the Set Data operator

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the data set before the application of the Set Data operator. You can see that the values of the Temperature and Wind attributes are '85' and 'false' respectively in the first example. The Set Data operator is applied on the 'Golf' data set. The example index parameter is set to 1, the attribute name parameter is set to 'Temperature' and the value parameter is set to 50. Thus the value of the Temperature attribute will be set to 50 in the first example. Similarly, the value of the Wind attribute in the first example is set to 'fast' using the additional values parameter. You can verify this by running the process and seeing the results in the Results Workspace. Please note that a string value cannot be set for the Temperature attribute because it is of integer type. An integer value can be specified for the Wind attribute (nominal type) but it will be stored as a nominal value.


Declare Missing Value

This operator declares the specified values of the selected attributes as missing values.

Description

The Declare Missing Value operator replaces the specified values of the selected attributes with Double.NaN, so that these values become missing values. They will be treated as missing values by subsequent operators. The desired values can be selected through the nominal, numeric or regular expression mode. This behavior can be controlled by the mode parameter.
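A plain-Python sketch of this replacement (a hypothetical helper, not RapidMiner code; the `matches` callback stands in for the operator's nominal, numeric and regular expression modes):

```python
import re

def declare_missing(example_set, attributes, matches):
    """Replace matching values of the selected attributes with NaN.

    A plain-Python sketch: the matches(value) callback stands in for
    the operator's nominal / numeric / regular expression modes.
    """
    for example in example_set:
        for name in attributes:
            if matches(example[name]):
                example[name] = float("nan")  # Double.NaN in RapidMiner
    return example_set

data = [{"Outlook": "sunny", "Humidity": 85},
        {"Outlook": "N/A",   "Humidity": -1}]
# Regular expression mode: declare "N/A" in Outlook as missing.
declare_missing(data, ["Outlook"],
                lambda v: re.fullmatch(r"N/A", str(v)) is not None)
# Numeric mode: declare the value -1 in Humidity as missing.
declare_missing(data, ["Humidity"], lambda v: v == -1)
```

After the two calls, the "N/A" and -1 placeholders are NaN and are recognized as missing by downstream processing, while all other values stay untouched.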

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.


Output Ports

example set output (exa) The specified values of the selected attributes are replaced by missing values and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the required attributes. It has the following options:

• all This option simply selects all the attributes of the ExampleSet. This is the default option.

• single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

• subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

• regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

• value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example, the real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When it is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

• block type This option is similar in working to the value type option. It allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

• no missing values This option simply selects all the attributes of the ExampleSet which do not contain a missing value in any example. Attributes that have even a single missing value are removed.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the given numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the numeric condition. attribute (string) The desired attribute can be selected through this parameter. The attribute name can be chosen from the drop-down box of the attribute parameter if the meta data is known. attributes (string) The required attributes can be selected through this parameter. It opens a new window with two lists. All attributes are present in the left list; they can be shifted to the right list, which holds the selected attributes on which the replacement by missing values will take place. All other attributes remain unchanged. regular expression (string) Attributes whose names match this expression will be selected. Regular expressions are powerful but can be difficult for beginners, so it is a good idea to specify the expression through the edit and preview regular expression menu, which lets you try different expressions and preview their results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View. except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter). value type (selection) The type of attributes to be selected can be chosen from a drop-down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path. use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View. except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path. block type (selection) The block type of attributes to be selected can be chosen from a drop-down list. The only possible value here is 'single value'. use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View. except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type. numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example.
A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead. include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected before this parameter is checked, then after checking it 'att1' will be unselected and 'att2' will be selected. mode (selection) This parameter specifies the type of the value that should be set to missing. The type can be nominal or numeric, or it can be specified through a regular expression. numeric value (real) This parameter specifies the numerical value that should be declared as a missing value. nominal value (string) This parameter specifies the nominal value that should be declared as a missing value. expression value (string) This parameter specifies, through a regular expression, the values that should be declared as missing.
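The semantics of the numeric value filter described above can be sketched in plain Python (this is an illustrative emulation, not RapidMiner code; the function and data names are made up for the example). A numeric attribute is kept only if every one of its examples satisfies the condition, while nominal attributes always pass:

```python
# Illustrative sketch of the 'numeric value filter' semantics: a numeric
# attribute is kept only if EVERY example satisfies the condition; nominal
# attributes are always kept regardless of the condition.
def satisfies(value, op, threshold):
    return {">": value > threshold,
            "<": value < threshold,
            "=": value == threshold,
            ">=": value >= threshold,
            "<=": value <= threshold}[op]

def numeric_value_filter(columns, op, threshold):
    """columns: dict mapping attribute name -> list of example values."""
    kept = {}
    for name, values in columns.items():
        if all(isinstance(v, (int, float)) for v in values):
            if all(satisfies(v, op, threshold) for v in values):
                kept[name] = values
        else:
            kept[name] = values  # nominal attributes pass unconditionally
    return kept

data = {"Temperature": [85, 80, 72], "Humidity": [65, 90, 70],
        "Outlook": ["sunny", "rain", "overcast"]}
print(sorted(numeric_value_filter(data, ">", 70)))  # ['Outlook', 'Temperature']
```

Here 'Humidity' is dropped because one of its values (65) violates '> 70', while the nominal 'Outlook' attribute is kept irrespective of the condition.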

Tutorial Processes

Declaring a nominal value as missing value

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the 'Outlook' attribute has three possible values i.e. 'sunny', 'rain' and 'overcast'. The Declare Missing Value operator is applied on this ExampleSet to change the 'overcast' value of the 'Outlook' attribute to a missing value. The attribute filter type parameter is set to 'single' and the attribute parameter is set to 'Outlook'. The mode parameter is set to 'nominal' and the nominal value parameter is set to 'overcast'. Run the process and compare the resultant ExampleSet with the original ExampleSet. You can clearly see that the value 'overcast' has been replaced

by missing values.
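The effect of this tutorial process can be sketched in a few lines of Python (a hypothetical illustration of the operator's behaviour, not RapidMiner code; the row data and the `declare_missing` helper are invented for the example, with `None` standing in for a missing value):

```python
# Sketch of Declare Missing Value on the 'Outlook' attribute: every
# 'overcast' value becomes a missing value (represented here by None).
golf = [
    {"Outlook": "sunny", "Play": "no"},
    {"Outlook": "overcast", "Play": "yes"},
    {"Outlook": "rain", "Play": "yes"},
]

def declare_missing(rows, attribute, nominal_value):
    """Return a copy of rows where attribute == nominal_value is set to None."""
    out = []
    for row in rows:
        row = dict(row)
        if row[attribute] == nominal_value:
            row[attribute] = None
        out.append(row)
    return out

result = declare_missing(golf, "Outlook", "overcast")
print([r["Outlook"] for r in result])  # ['sunny', None, 'rain']
```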

Normalize

This operator normalizes the attribute values of the selected attributes.

Description

Normalization is a preprocessing technique used to rescale attribute values to fit in a specific range. Normalization of the data is very important when dealing with attributes of different units and scales. For example, some data mining techniques use the Euclidean distance, so all attributes should have the same scale for a fair comparison between them. In other words, normalization is a technique used to level the playing field when looking at attributes that vary widely in size as a result of the units selected for representation. This operator performs normalization of the selected attributes. Four normalization methods are provided. These methods are explained in the parameters.


Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data because the attributes are specified in their meta data. The Retrieve operator provides meta data along with the data.

Output Ports example set (exa) The ExampleSet with the selected attributes in normalized form is the output of this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace. preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters create view (boolean) It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would normally be performed directly on the data will then be computed every time a value is requested and the result is returned without changing the data. attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting the attributes you want to normalize. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.


ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example, the real and integer types both belong to the numeric type. The user should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the given numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the numeric condition. attribute (string) The required attribute can be selected through this parameter. The attribute name can be chosen from the drop-down box of the attribute parameter if the meta data is known. attributes (string) The required attributes can be selected through this parameter. It opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes. regular expression (string) Attributes whose names match this expression will be selected. Regular expressions are powerful but can be difficult for beginners, so it is a good idea to specify the expression through the edit and preview regular expression menu, which lets you try different expressions and preview their results simultaneously. use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View. except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter). value type (selection) The type of attributes to be selected can be chosen from a drop-down list. use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View. except value type (selection) Attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value.
block type (selection) The block type of the attributes to be selected can be chosen from a drop-down list. use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) Attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type. numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead. include special attributes (boolean) Special attributes are attributes with special roles; they identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Select Attributes operator. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes are selected that satisfy the conditions. invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is removed before this parameter is set, then afterwards 'att1' will be removed and 'att2' will be selected. method (selection) Four methods are provided here for normalizing data. These methods are also explained in the attached Example Process.

ˆ z transformation This is also called statistical normalization. Its purpose is to convert the data into a normal distribution with mean = 0 and variance = 1. The formula of statistical normalization is Z = (X − μ)/σ: if you take your attribute values as a vector X, subtract the mean μ of the attribute values, and divide the difference by the standard deviation σ, you get another vector Z that has a normal distribution with zero mean and unit variance, also called the standard normal distribution N(0,1). Note, however, that the range of the standard normal distribution is not [0,1] but roughly −3 to +3 (in principle −∞ to +∞, but the interval from −3 to +3 already captures about 99.7% of the data).

ˆ range transformation When this method is selected, two other parameters (min, max) appear in the Parameter View. Range transformation normalizes all attribute values into the specified range [min,max]. min and max are specified using the min and max parameters respectively.

ˆ proportion transformation Each attribute value is normalized as a proportion of the total sum of the respective attribute, i.e. each value is divided by the sum of all values of that attribute.

ˆ interquartile range Normalization is performed using interquartile range. The range is the difference between the largest and the smallest value in the data set. Since the range only takes into account two values from the entire data set, it may be heavily influenced by outliers in the data. Therefore, another criterion - the interquartile range - is commonly used. It is the distance between the 25th and 75th percentiles (Q3 - Q1). The interquartile range is essentially the range of the middle 50% of the data. Because it uses the middle 50%, the interquartile range is not affected by outliers or extreme values. min (real) This parameter is available only when the method parameter is set to 'range transformation'. It is used to specify the minimum point of the range. max (real) This parameter is available only when the method parameter is set to 'range transformation'. It is used to specify the maximum point of the range.
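The first three methods above can be sketched as small Python functions (an illustrative sketch, not RapidMiner code; the function names are invented, and the interquartile range method is omitted because the operator's exact formula for it is not spelled out here):

```python
# Sketch of three of the normalization methods described above.
from statistics import mean, stdev

def z_transform(xs):
    # Z = (X - mean) / standard deviation, i.e. the formula above;
    # this uses the sample standard deviation (n - 1 in the denominator).
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

def range_transform(xs, lo=0.0, hi=1.0):
    # rescale all values linearly into the range [lo, hi]
    mn, mx = min(xs), max(xs)
    return [lo + (x - mn) * (hi - lo) / (mx - mn) for x in xs]

def proportion_transform(xs):
    # each value divided by the column total
    total = sum(xs)
    return [x / total for x in xs]

print(range_transform([2, 4, 6]))       # [0.0, 0.5, 1.0]
print(proportion_transform([1, 1, 2]))  # [0.25, 0.25, 0.5]
```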

Tutorial Processes

Different methods of normalization

The focus of this process is to show different methods available for normalization. All parameters other than the method parameter are for selection of attributes

on which normalization is to be applied. To understand these parameters please study the Example Process of the Select Attributes operator.

In this process the Retrieve operator is used to load the 'golf' data set from the Repository. The Filter Examples operator is applied on it to select just four examples of the 'golf' data set. This is done just to simplify the calculations. A breakpoint is inserted after this operator so that you can have a look at the examples. There are four examples with 'Humidity' attribute values 65, 70, 70 and 70. The 'Humidity' attribute is selected for normalization in the Normalize operator.

The method parameter is set to 'proportion transformation'. All values of the 'Humidity' attribute are divided by the sum of all values of the 'Humidity' at- tribute. The sum is 275 (65+70+70+70). Thus the values after normalization are 0.236 (65/275) and 0.255 (70/275).

Now run the process again with the method parameter set to 'z-transformation'. The mean of the four 'Humidity' attribute values (65, 70, 70, 70,) is 68.75. The Standard deviation of these values is calculated to be 2.5. Now for each attribute value, subtract the mean from the attribute value and divide the result by the standard deviation. You will see that results are the same as in the Results Workspace.
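The z-transformation arithmetic of this tutorial can be checked with a short Python sketch (illustrative only, not RapidMiner code; it assumes the sample standard deviation with n − 1 in the denominator, which matches the value 2.5 computed above):

```python
# Verify the tutorial arithmetic for the four 'Humidity' values.
from statistics import mean, stdev

humidity = [65, 70, 70, 70]
m = mean(humidity)   # 68.75, as stated in the text
s = stdev(humidity)  # 2.5, the sample standard deviation
z = [(x - m) / s for x in humidity]
print(m, s, [round(v, 2) for v in z])  # 68.75 2.5 [-1.5, 0.5, 0.5, 0.5]
```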

Select the 'Temperature' attribute and set the method parameter to 'range transformation'. Use 0 and 1 for the min and max parameters. Run the process. You will see that all values of the 'Temperature' attribute are in the range [0,1].


Scale by Weights

This operator scales the input ExampleSet according to the given weights. It deselects attributes with weight 0 and calculates new values for numeric attributes according to the given weights.

Description

The Scale by Weights operator selects the attributes with non-zero weight. The values of the remaining numeric attributes are recalculated based on the weights delivered at the weights input port: the new value of a numeric attribute is calculated by multiplying its original value by the weight of that attribute. This operator is not well suited for merely selecting a subset of attributes according to weights determined by a former weighting scheme. For that purpose the Select by Weights operator should be used, which selects only those attributes that fulfill a specified weight relation.

Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Weight by Chi Squared Statistic operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data because the attributes are specified in their meta data. weights (wei) This port expects the attribute weights. There are numerous operators that provide attribute weights. The Weight by Chi Squared Statistic operator is used in the Example Process.


Output Ports example set (exa) The attributes with weight 0 are removed from the input ExampleSet. The values of the remaining numeric attributes are recalculated based on the weights provided at the weights input port. The resultant ExampleSet is delivered through this port.

Tutorial Processes

Applying the Scale by Weights operator on the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. The Weight by Chi Squared Statistic operator is applied on it to generate attribute weights. A breakpoint is inserted here so you can see the attributes with their weights. You can see that the Wind, Humidity, Outlook and Temperature attributes have weights 0, 0.438, 0.450 and 1 respectively. The Scale by Weights operator is applied next. The 'Golf' data set is provided at the example set input port and the weights calculated by the Weight by Chi Squared Statistic operator are provided at the weights input port. The Scale by Weights operator removes the attributes with weight 0, i.e. the Wind attribute is removed. The values of the remaining numeric attributes (i.e. the Temperature and Humidity attributes) are recalculated based on their weights. The weight of the Temperature attribute is 1, thus its values remain unchanged. The weight of the Humidity attribute is 0.438, thus its new values are calculated by multiplying the original values by 0.438. This can be verified by viewing the results in the Results Workspace.
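The behaviour described in this tutorial can be sketched in Python using the weights from the process above (an illustrative emulation, not RapidMiner code; the column data and the `scale_by_weights` helper are invented for the example):

```python
# Sketch of Scale by Weights: attributes with weight 0 are dropped, numeric
# values are multiplied by their weight, nominal values stay unchanged.
weights = {"Wind": 0.0, "Humidity": 0.438, "Outlook": 0.450, "Temperature": 1.0}
columns = {"Wind": ["true", "false"], "Humidity": [65.0, 70.0],
           "Outlook": ["sunny", "rain"], "Temperature": [85.0, 80.0]}

def scale_by_weights(cols, w):
    out = {}
    for name, values in cols.items():
        if w[name] == 0:
            continue  # deselect zero-weight attributes (here: Wind)
        if all(isinstance(v, float) for v in values):
            out[name] = [v * w[name] for v in values]
        else:
            out[name] = values  # nominal values are not rescaled
    return out

scaled = scale_by_weights(columns, weights)
print(sorted(scaled))                            # ['Humidity', 'Outlook', 'Temperature']
print([round(v, 2) for v in scaled["Humidity"]])  # [28.47, 30.66]
```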


Map

This operator maps specified values of selected attributes to new values. This operator can be applied on both numerical and nominal attributes.

Description

This operator can be used to replace nominal values (e.g. replace the value 'green' by the value 'green color') as well as numerical values (e.g. replace all values '3' by '-1'). However, a single application of this operator can map values of attributes of only one type. A single mapping can be specified using the parameters replace what and replace by, as in the Replace operator. Multiple mappings can be specified through the value mappings parameter. Additionally, the operator allows defining a default mapping. This operator allows you to select the attributes to map values in, and to specify a regular expression; attribute values of the selected attributes that match this regular expression are mapped by the specified value mapping. Please go through the parameters and the Example Process to develop a better understanding of this operator.


Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data because the attributes are specified in their meta data. The Retrieve operator provides meta data along with the data.

Output Ports example set (exa) The ExampleSet with the value mappings applied is the output of this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting the attributes on which you want to apply mappings. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for the attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. The user should have a basic understanding of type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the given numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the numeric condition. attribute (string) The required attribute can be selected through this parameter. The attribute name can be chosen from the drop-down box of the attribute parameter if the meta data is known. attributes (string) The required attributes can be selected through this parameter. It opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes.

regular expression (string) Attributes whose names match this expression will be selected. Regular expressions are powerful but can be difficult for beginners, so it is a good idea to specify the expression through the edit and preview regular expression menu, which lets you try different expressions and preview their results simultaneously. use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View. except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression specified in the regular expression parameter). value type (selection) The type of attributes to be selected can be chosen from a drop-down list. use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View. except value type (selection) Attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. block type (selection) The block type of attributes to be selected can be chosen from a drop-down list. use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View. except block type (selection) Attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.
numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition.


Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead. include special attributes (boolean) Special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Select Attributes operator. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes are selected that satisfy the conditions. invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is removed before this parameter is set, then afterwards 'att1' will be removed and 'att2' will be selected. value mappings Multiple mappings can be specified through this parameter. If only a single mapping is required, it can be done using the parameters replace what and replace by, as in the Replace operator. Old values and new values can be easily specified through this parameter. Multiple mappings can be defined for the same old value but only the new value corresponding to the first mapping is taken as the replacement. Regular expressions can also be used here if the consider regular expressions parameter is set to true. replace what (string) This parameter specifies what is to be replaced. It can be specified using regular expressions. This parameter is useful only if a single mapping is to be done.
For multiple mappings use the value mappings parameter. replace by (string) Regions matching the regular expression of the replace what parameter are replaced by the value of the replace by parameter. This parameter is useful only if a single mapping is to be done. For multiple mappings use the value mappings parameter. consider regular expressions (boolean) This parameter enables matching based on regular expressions; old values (the original values, i.e. what the replace what parameter refers to) may be specified as regular expressions. If this parameter is enabled, old values are replaced by the new values if the old values match the given regular expressions. The

value corresponding to the first matching regular expression in the mappings list is taken as the replacement. add default mapping (boolean) If set to true, all values that occur in the selected attributes of the ExampleSet but are not listed in the value mappings list are mapped to the value of the default value parameter. default value (string) This parameter is only available if the add default mapping parameter is checked. If add default mapping is set to true and the default value is properly set, all values that occur in the selected attributes of the ExampleSet but are not listed in the value mappings list are replaced by the default value. This may be helpful in cases where only some values should be mapped explicitly and many unimportant values should be mapped to a default value (e.g. 'other').

Tutorial Processes

Mapping multiple values

The focus of this Example Process is the use of the value mappings parameter and the default value parameter. Use of the replace what and replace by parameters can be seen in the Example Process of the Replace operator. Almost all other parameters of the Map operator are also part of the Select Attributes operator; their use can be better understood by studying the Select Attributes operator and its Example Process.

The 'Golf' data set is loaded using the Retrieve operator. The Map operator is applied on it. The 'Wind' and 'Outlook' attributes are selected for mapping. Thus, the effect of the Map operator will be limited to just these two attributes. Four value mappings are specified in the value mappings parameter. 'true', 'false', 'overcast' and 'sunny' are replaced by 'yes', 'no', 'bad' and 'good' respectively. The add default mapping parameter is set to true and 'other' is specified in the default value parameter. The 'Wind' attribute has only two possible values, i.e. 'true' and 'false'. Both of them were mapped in the mappings list. The 'Outlook'

attribute has three possible values, i.e. 'sunny', 'overcast' and 'rain'. 'sunny' and 'overcast' were mapped in the mappings list but 'rain' was not. As the add default mapping parameter is set to true, 'rain' will be mapped to the default value, i.e. 'other'.
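The behaviour described in this process can be sketched in a few lines of plain Python. This is an illustrative sketch of the mapping semantics only, not RapidMiner code; the function name map_value is invented for this example.

```python
# Mappings from the tutorial: old value -> new value.
value_mappings = {"true": "yes", "false": "no",
                  "overcast": "bad", "sunny": "good"}

def map_value(value, mappings, default=None):
    """Replace a value via the mappings list; when 'add default
    mapping' is enabled, unlisted values fall back to the default."""
    if value in mappings:
        return mappings[value]
    return default if default is not None else value

# 'rain' is not in the mappings list, so it is mapped to 'other'.
outlook = ["sunny", "overcast", "rain"]
print([map_value(v, value_mappings, default="other") for v in outlook])
# ['good', 'bad', 'other']
```

Without the default argument, unlisted values such as 'rain' would simply pass through unchanged, which corresponds to leaving the add default mapping parameter unchecked.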

Replace

This operator replaces parts of the values of selected nominal attributes matching a specified regular expression by a specified replacement.

Description

This operator allows you to select attributes to make replacements in and to specify a regular expression. Attribute values of the selected attributes that match this regular expression are replaced by the specified replacement. The replacement can be empty and can contain capturing groups. Please keep in mind that although regular expressions are much more powerful than simple strings, you may also simply enter plain characters to search for.
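The two special cases mentioned above, an empty replacement and capturing groups, can be illustrated with Python's re.sub, which follows the same substitution semantics. This is a hedged sketch for illustration, not RapidMiner's implementation.

```python
import re

# An empty replacement simply deletes every matched region.
print(re.sub("-", "", "over-cast"))                    # 'overcast'

# Capturing groups: \1 and \2 refer back to the matched groups,
# here swapping 'last, first' into 'first last'.
print(re.sub(r"(\w+), (\w+)", r"\2 \1", "Doe, John"))  # 'John Doe'
```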

Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data.

Output Ports example set (exa) An ExampleSet with replacements is the output of this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting attributes in which you want to make replacements. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter (attributes) becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition. attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known. attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes. regular expression (string) Attributes whose name matches this expression will be selected. Regular expression is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. It gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously. This will enhance your understanding of regular expressions. use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View. except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter). value type (selection) The type of attributes to be selected can be chosen from a drop down list. use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View. except value type (selection) Attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. block type (selection) The block type of attributes to be selected can be chosen from a drop down list. use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View. except block type (selection) Attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type. numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'.
But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead. include special attributes (boolean) Special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Select Attribute operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Select Attribute operator and only those attributes are selected that satisfy the conditions. invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is removed prior to setting this parameter, then after setting it 'att1' will be removed and 'att2' will be selected. replace what (string) This parameter specifies what is to be replaced. It can be specified using regular expressions. The edit regular expression menu can assist you in specifying the right regular expression. replace by (string) The regions matching the regular expression of the replace what parameter are replaced by the value of the replace by parameter.

Tutorial Processes

Use of replace what and replace by parameters

The focus of this process is to show the use of the replace what and replace by parameters. All other parameters are for the selection of attributes on which the replacement is to be made. For understanding these parameters please study the Example Process of the Select Attributes operator.

The 'Golf' data set is loaded using the Retrieve operator. The attribute filter type parameter is set to 'all' and the include special attributes parameter is also checked. Thus, replacements are made on all attributes including special attributes. The replace what parameter is provided with the regular expression '.*e.*' which matches any attribute value that has the character 'e' in it. The replace by parameter is given the value 'E'. Run the process. You will see that 'E' is placed in place of 'yes', 'overcast', 'true' and 'false'. This is because all these values have an 'e' in them. You can see the power of this operator. Now set the regular expression of the replace what parameter to 'e'. Run the process again. This time you will see that the entire values are not replaced by 'E'; instead only the character 'e' is replaced by 'E'. Thus the new values of 'yes', 'overcast', 'true' and 'false' are 'yEs', 'ovErcast', 'truE' and 'falsE' respectively. You can see the power of this operator and of regular expressions. Thus it should be made sure that the correct regular expression is provided. If you leave the replace by parameter empty or write '?' in it, the null value is used as the replacement.
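The two runs described in this tutorial can be reproduced with Python's re.sub, which behaves the same way for these patterns. This is an illustrative sketch, not RapidMiner code.

```python
import re

values = ["yes", "overcast", "true", "false"]

# Run 1: replace what = '.*e.*' matches each entire value,
# so every whole value is replaced by 'E'.
print([re.sub(".*e.*", "E", v) for v in values])
# ['E', 'E', 'E', 'E']

# Run 2: replace what = 'e' matches only the single character,
# so only that character is replaced by 'E'.
print([re.sub("e", "E", v) for v in values])
# ['yEs', 'ovErcast', 'truE', 'falsE']
```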

Cut

This operator cuts the nominal values of the specified regular attributes. The resultant attributes have values that are substrings of the original attribute values.

Description

The Cut operator creates new attributes from nominal attributes where the new attributes contain only substrings of the original values. The range of characters to be cut is specified by the first character index and last character index parameters. The first character index parameter specifies the index of the first character and the last character index parameter specifies the index of the last character to be included. All characters of the attribute values that are at an index equal to or greater than the first character index and less than or equal to the last character index are included in the resulting substring. Please note that the counting starts with 1 and that the first and the last character will be included in the resulting substring. For example, if the value is 'RapidMiner' and the first index is set to 6 and the last index is set to 9 the result will be 'Mine'. If the last index is larger than the length of the word, the resulting substrings will end with the last character.
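The index semantics above, 1-based, inclusive at both ends, and clipped to the end of the value, can be sketched in plain Python. This is an illustration of the semantics only, not RapidMiner code; the function name cut is invented here.

```python
def cut(value, first_index, last_index):
    """Keep characters from first_index to last_index, counting
    from 1 and including both ends, as the Cut operator does."""
    # Python slices are 0-based and exclusive at the end, so shift
    # the 1-based inclusive range; slicing past the end of the
    # string is already safe in Python.
    return value[first_index - 1:last_index]

print(cut("RapidMiner", 6, 9))    # 'Mine'
print(cut("Iris-setosa", 6, 20))  # 'setosa' (last index past the end)
```

The second call mirrors the Iris tutorial below: an over-long last index simply keeps everything up to the final character.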

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports example set output (exa) The ExampleSet with new attributes that have values that are substrings of the original attributes is the output of this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting attributes. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter (attributes) becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have basic understanding of type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

ˆ numeric value filter When this option is selected another parameter (nu- meric condition) becomes visible in the Parameter View. All numeric at- tributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespec- tive of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known. attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes. regular expression (string) The attributes whose name matches this expression will be selected. Regular expression is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously. use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View. except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (the regular expression that was specified in the regular expression parameter). value type (selection) The type of attributes to be selected can be chosen from a drop down list. use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View. except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type, i.e. the value type parameter's value. block type (selection) The block type of attributes to be selected can be chosen from a drop down list.
use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View. except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type, i.e. the block type parameter's value. numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead. include special attributes (boolean) Special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the conditions in the Select Attribute operator. If this parameter is set to true, special attributes are also tested against the conditions specified in the Select Attribute operator and only those attributes are selected that satisfy the conditions. invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking this parameter, then after checking it 'att1' will be unselected and 'att2' will be selected. first character index (integer) This parameter specifies the index of the first character of the substring which should be kept. Please note that the counting starts with 1. last character index (integer) This parameter specifies the index of the last character of the substring which should be kept. Please note that the counting starts with 1.

Tutorial Processes

Applying the Cut operator on label of the Iris data set


The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the data set before the application of the Cut operator. You can see that the label attribute has three possible values: 'Iris-setosa', 'Iris-versicolor' and 'Iris-virginica'. If we want to remove the 'Iris-' substring from the start of all the label values we can use the Cut operator. The Cut operator is applied on the Iris data set. The first character index parameter is set to 6 because we want to remove the first 5 characters ('Iris-'). The last character index parameter can be set to any value greater than the length of the longest possible value. Thus the last character index parameter can be safely set to 20 because if the last index is larger than the length of the word, the resulting substrings will end with the last character. Run the process and you can see that the substring 'Iris-' has been removed from the start of all possible values of the label attribute.

Split

This operator creates new attributes from the selected nominal attributes by splitting the nominal values into parts according to the specified split criterion.

Description

The Split operator creates new attributes from the selected nominal attributes by splitting the nominal values into parts according to the split criterion, which is specified through the split pattern parameter in the form of a regular expression.


This operator provides two different modes for splitting; the desired mode can be selected by the split mode parameter. The two splitting modes are explained with an imaginary ExampleSet with a nominal attribute named 'att', assuming that the split pattern parameter is set to ',' (comma). Suppose the ExampleSet has three examples:

1. value1

2. value2, value3

3. value3

Ordered Splits

In the case of an ordered split the resulting attributes get the name of the original attribute together with a number indicating the order. In our example scenario there will be two attributes named 'att 1' and 'att 2' respectively. After splitting, the three examples will have the following values for 'att 1' and 'att 2' (described in the form of tuples):

1. (value1,?)

2. (value2,value3)

3. (value3,?)

This mode is useful if the original values indicated some order like, for example, a preference.

Unordered Splits

In the case of an unordered split the resulting attributes get the name of the original attribute together with the value for each of the occurring values. In our example scenario there will be three attributes named 'att value1', 'att value2' and 'att value3' respectively. All these new attributes are boolean. After splitting, the three examples will have the following values for 'att value1', 'att value2' and 'att value3' (described in the form of tuples):

1. (true, false, false)

2. (false, true, true)

3. (false, false, true)

This mode is useful if the order is not important but the goal is a basket like data set containing all occurring values.
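Both split modes on the three-example 'att' attribute above can be sketched in plain Python. This is an illustrative sketch of the semantics, not RapidMiner code; the variable names are invented for this example.

```python
import re

examples = ["value1", "value2, value3", "value3"]
# Split on ',' with optional surrounding blanks.
parts = [re.split(r"\s*,\s*", e) for e in examples]

# Ordered split: attributes 'att 1', 'att 2', ... with '?'
# (a missing value) where an example has fewer parts.
width = max(len(p) for p in parts)
ordered = [tuple(p[i] if i < len(p) else "?" for i in range(width))
           for p in parts]
print(ordered)
# [('value1', '?'), ('value2', 'value3'), ('value3', '?')]

# Unordered split: one boolean attribute per occurring value.
values = sorted({v for p in parts for v in p})
unordered = [tuple(v in p for v in values) for p in parts]
print(values)     # ['value1', 'value2', 'value3']
print(unordered)
# [(True, False, False), (False, True, True), (False, False, True)]
```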

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input. The ExampleSet should have at least one nominal attribute because if there is no such attribute, the use of this operator does not make sense.

Output Ports example set output (exa) The selected nominal attributes are split into new attributes and the resultant ExampleSet is delivered through this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters attribute filter type (selection) This parameter allows you to select the attribute selection filter; the method you want to use for selecting the required attributes. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter (attributes) becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of type hierarchy when selecting attributes through this option. When it is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.


ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition. attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known. attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the splitting will take place; all other attributes will remain unchanged. regular expression (string) The attributes whose name matches this expression will be selected. Regular expression is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions. This menu also allows you to try different expressions and preview the results simultaneously. This will enhance your understanding of regular expressions. use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View. except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter). value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View. except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path. block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'. use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View. except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type. numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead. include special attributes (boolean) Special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.
invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected prior to checking this parameter, then after checking it 'att1' will be unselected and 'att2' will be selected. split pattern (string) This parameter specifies the pattern which is used for dividing the nominal values into different parts. It is specified in the form of a regular expression. Regular expression is a very powerful tool but needs a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions. split mode (selection) This parameter specifies the split mode for splitting. The two options of this parameter are explained in the description of this operator.

Tutorial Processes

Ordered and unordered splits

This Example Process starts with a Subprocess operator. The operator chain inside the Subprocess operator generates an ExampleSet for this process. The explanation of this inner chain of operators is not relevant here. A breakpoint is inserted here so that you can have a look at the ExampleSet before the application of the Split operator. You can see that this ExampleSet is the same ExampleSet that is described in the description of this operator. The Split operator is applied on it with default values of all parameters. The split mode parameter is set to 'ordered split' by default. Run the process and compare the results with the explanation of ordered split in the description section of this document. Now change the split mode parameter to 'unordered split' and run the process again. You can understand the results by studying the description of unordered split in the description of this operator.


Merge

This operator merges two nominal values of the specified regular attribute.

Description

The Merge operator is used for merging two nominal values of the specified attribute of the input ExampleSet. Please note that this operator can merge only the values of regular attributes. The required regular attribute is specified using the attribute name parameter. The first value parameter is used for specifying the first value to be merged. The second value parameter is used for specifying the second value to be merged. The two values are merged in 'first second' format where first is the value of the first value parameter and second is the value of the second value parameter. It is not compulsory for the first value and second value parameters to have values from the range of possible values of the selected attribute. However, at least one of the first value and second value parameters should have a value from the range of possible values of the selected attribute. Otherwise this operator will have no effect on the input ExampleSet.
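The 'first second' merging rule can be sketched in plain Python: every occurrence of either value is replaced by the concatenation of the two. This is an illustrative sketch, not RapidMiner code; the function name merge is invented for this example.

```python
def merge(column, first, second):
    """Replace occurrences of either value by 'first second',
    as the Merge operator does for a nominal attribute."""
    merged = first + " " + second
    return [merged if v in (first, second) else v for v in column]

outlook = ["sunny", "overcast", "rain", "sunny"]
print(merge(outlook, "sunny", "rain"))
# ['sunny rain', 'overcast', 'sunny rain', 'sunny rain']
```

If neither value occurs in the column, as when both parameters are set to values outside the attribute's range, the list is returned unchanged, matching the "no effect" case described above.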

544 6.4. Value Modification

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

example set output (exa) The ExampleSet with the merged attribute values is the output of this port.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute name (string) The required nominal attribute whose values are to be merged is selected through this parameter. This operator can be applied only on regular attributes.

first value (string) This parameter specifies the first value to be merged. It is not compulsory for the first value parameter to have a value from the range of possible values of the selected attribute.

second value (string) This parameter specifies the second value to be merged. It is not compulsory for the second value parameter to have a value from the range of possible values of the selected attribute.

Tutorial Processes

Introduction to the Merge operator


The Golf data set is loaded using the Retrieve operator. The Merge operator is applied on it. The attribute name parameter is set to 'Outlook'. The first value parameter is set to 'sunny' and the second value parameter is set to 'hot'. All occurrences of the value 'sunny' are replaced by 'sunny hot' in the Outlook attribute of the resultant ExampleSet. Now set the value of the second value parameter to 'rain' and run the process again. As 'rain' is also a possible value of the Outlook attribute, all occurrences of 'sunny' and 'rain' in the Outlook attribute are replaced by 'sunny rain' in the resultant ExampleSet. This Example Process is just meant to explain the basic working of the Merge operator.

Remap Binominals

This operator modifies the internal value mapping of binominal attributes according to the specified negative and positive values.

Description

The Remap Binominals operator modifies the internal mapping of binominal attributes according to the specified positive and negative values. The positive and negative values are specified by the positive value and negative value parameters respectively. If the internal mapping differs from the specified values, the internal mapping is switched. If the internal mapping contains values other than the specified ones, the mapping is not changed and the attribute is simply skipped. Please note that this operator changes the internal mapping, so the changes are not explicitly visible in the ExampleSet. This operator can be applied only on binominal attributes. Please note that if there is a nominal attribute in the ExampleSet with only two possible values, this operator will still not be applicable on it. This operator requires the attribute to be explicitly defined as binominal in the meta data.
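The remapping rule can be sketched in standalone Python. This is not RapidMiner code: the internal mapping is modeled as a two-element list (index 0 = negative/false, index 1 = positive/true), and the function name and status strings are illustrative assumptions. The "skipped" branch mirrors the behavior when the specified values do not match the attribute's values.

```python
def remap_binominal(mapping, negative, positive):
    # Sketch of the rule above: 'mapping' is the internal value order,
    # index 0 = negative/false value, index 1 = positive/true value.
    if set(mapping) != {negative, positive}:
        return mapping, "skipped"        # values do not match -> no change
    if mapping == [negative, positive]:
        return mapping, "already correct"
    return [negative, positive], "switched"

print(remap_binominal(["true", "false"], "false", "true"))
# → (['false', 'true'], 'switched')
print(remap_binominal(["true", "false"], "a", "b"))
# → (['true', 'false'], 'skipped')
```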

Input Ports

example set input (exa) This input port expects an ExampleSet. Please note that there should be at least one binominal attribute in the input ExampleSet.

Output Ports

example set output (exa) The resultant ExampleSet is the output of this port. Externally this data set is the same as the input ExampleSet; only the internal mappings may be changed.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting attributes. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.


ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of ExampleSet are present in the list; required attributes can be easily selected. This option will not work if meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have basic understanding of type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to the value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to the value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected through this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected through this option. It opens a new window with two lists. All attributes are present in the left list. Attributes can be shifted to the right list, which is the list of selected attributes.

regular expression (string) The attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (the regular expression that was specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type, i.e. the value type parameter's value.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list.
use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type, i.e. the block type parameter's value.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after the comparison operator, e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the specified conditions. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes are selected that satisfy the conditions.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and the previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected before checking this parameter, then after checking it 'att1' will be unselected and 'att2' will be selected.

negative value (string) This parameter specifies the internal mapping for the negative or false value of the selected binominal attributes.
positive value (string) This parameter specifies the internal mapping for the positive or true value of the selected binominal attributes.
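The numeric condition grammar described above can be sketched in standalone Python. This is not RapidMiner's parser; the function name is hypothetical and only the rules stated in the text are modeled: simple comparisons joined by '&&' (all must hold) or '||' (any must hold), never both in one condition, with a blank space after the operator.

```python
def satisfies(value, condition):
    # Sketch of the numeric-condition rules above (not RapidMiner code).
    if "&&" in condition and "||" in condition:
        raise ValueError("'&&' and '||' cannot be used together")
    combine = all if "&&" in condition else any
    parts = condition.replace("||", "&&").split("&&")

    def check(part):
        part = part.strip()
        # Longer operators first so '<=' is not parsed as '<'.
        for sym, fn in (("<=", lambda a, b: a <= b), (">=", lambda a, b: a >= b),
                        ("<", lambda a, b: a < b), (">", lambda a, b: a > b),
                        ("=", lambda a, b: a == b)):
            if part.startswith(sym):
                return fn(value, float(part[len(sym):]))
        raise ValueError(f"cannot parse {part!r}")

    return combine(check(p) for p in parts)

print(satisfies(8, "> 6 && < 11"))   # → True
print(satisfies(8, "<= 5 || < 0"))   # → False
```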

Tutorial Processes

Changing mapping of the Wind attribute of the Golf data set


The 'Golf' data set is loaded using the Retrieve operator. In this Example Process we shall change the internal mapping of the 'Wind' attribute of the 'Golf' data set. A breakpoint is inserted after the Retrieve operator so that you can view the 'Golf' data set. As you can see, the 'Wind' attribute of the 'Golf' data set is nominal but it has only two possible values. The Remap Binominals operator cannot be applied on such an attribute; it requires that the attribute be explicitly declared as binominal in the meta data. To accomplish this, the Nominal to Binominal operator is applied on the 'Golf' data set to convert the 'Wind' attribute to binominal type. A breakpoint is inserted here so that you can view the ExampleSet. Now that the 'Wind' attribute has been converted to binominal type, the Remap Binominals operator can be applied on it. The 'Wind' attribute is selected in the Remap Binominals operator. The negative value and positive value parameters are set to 'true' and 'false' respectively. Run the process and the internal mapping is changed. This change is an internal one, so it will not be explicitly visible in the Results Workspace. Now change the values of the positive value and negative value parameters to 'a' and 'b' respectively and run the complete process. Have a look at the log. You will see the following message: "WARNING: Remap Binominals: specified values do not match values of attribute Wind, attribute is skipped." This log shows that, as the values 'a' and 'b' are not values of the 'Wind' attribute, no change in the mapping is made.


Replace Missing Values

This operator replaces missing values in examples of selected attributes by a specified replacement.

Description

This operator replaces missing values in examples of selected attributes by a specified replacement. Missing values can be replaced by the minimum, maximum or average value of that attribute, or by zero. Alternatively, any specified replenishment value can be used as the replacement for missing values.
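The replacement strategies listed above can be sketched in standalone Python. This is not RapidMiner code: a plain list models one attribute column, None marks a missing value, and the function name and the strategy strings are illustrative assumptions mirroring the operator's default parameter options.

```python
def replace_missing(column, how, value=None):
    # Sketch of the replacement rule above; None marks a missing value.
    present = [v for v in column if v is not None]
    if how == "minimum":
        fill = min(present)
    elif how == "maximum":
        fill = max(present)
    elif how == "average":
        fill = sum(present) / len(present)
    elif how == "zero":
        fill = 0
    elif how == "value":
        fill = value          # corresponds to the replenishment value parameter
    else:
        return column         # 'none': leave missing values untouched
    return [fill if v is None else v for v in column]

hours = [35, None, 40, None, 38]
print(replace_missing(hours, "minimum"))
# → [35, 35, 40, 35, 38]
```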

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data. The Retrieve operator provides meta data along with the data.

Output Ports

example set (exa) The ExampleSet with missing values replaced by the specified replacement is the output of this port.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

6.5. Data Cleansing

preprocessing model (pre) This port delivers the preprocessing model, which has information regarding the parameters of this operator in the current process.

Parameters

create view (boolean) It is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would normally be performed directly on the data will then be computed every time a value is requested, and the result is returned without changing the data.

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the attributes in which you want to replace missing values. It has the following options:

ˆ all This option simply selects all the attributes of the ExampleSet. This is the default option.

ˆ single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in Parameter View.

ˆ subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

ˆ regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in Parameter View.

ˆ value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

ˆ block type This option is similar in working to value type option. This option allows selection of all the attributes of a particular block type. It should be noted that block types may be hierarchical. For example value series start and value series end block types both belong to value series block type. When this option is selected some other parameters (block type, use block type exception) become visible in Parameter View.

ˆ no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are not selected.

ˆ numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected through this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected through this option. It opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes.

regular expression (string) Attributes whose names match this expression will be selected. Regular expressions are a very powerful tool but need a detailed explanation to beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) Attributes matching this type will be removed from the final output even if they match the previously mentioned type, i.e. the value type parameter's value.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) Attributes matching this block type will be removed from the final output even if they match the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after the comparison operator, e.g. '<5' will not work, so use '< 5' instead.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and the previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is removed before selecting this parameter, then after selecting it 'att1' will be removed and 'att2' will be selected.

include special attributes (boolean) Special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are delivered to the output port irrespective of the specified conditions. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes are selected that satisfy the conditions.

default (selection) The function to apply to all columns that are not explicitly specified by the columns parameter.

ˆ none If this option is selected, no function is applied by default i.e. missing values are not replaced by default.

ˆ minimum If this option is selected, by default missing values are replaced by the minimum value of that attribute.

ˆ maximum If this option is selected, by default missing values are replaced by the maximum value of that attribute.

ˆ average If this option is selected, by default missing values are replaced by the average value of that attribute.

ˆ zero If this option is selected, by default missing values are replaced by zero.

ˆ value If this option is selected, by default missing values are replaced by the value specified in the replenishment value parameter.

columns (list) Different attributes can be provided with different types of replacements through this parameter. The default function selected by the default parameter is applied on attributes that are not explicitly mentioned in the columns parameter.

replenishment value (string) This parameter is available for replacing missing values by a specified value.


Tutorial Processes

Replacing missing values of the Labor Negotiations data set

The focus of this process is to show the use of the default and columns parameters. All other parameters are for the selection of attributes on which the replacement is to be applied. For understanding these parameters please study the Example Process of the Select Attributes operator.

The 'Labor Negotiations' data set is loaded using the Retrieve operator. A breakpoint is inserted at this point so that you can view the data before the application of the Replace Missing Values operator. The Replace Missing Values operator is applied on it. The attribute filter type parameter is set to 'no missing values' and the invert selection parameter is also checked, thus all attributes with missing values are selected. In the columns parameter the 'wage-inc-1st', 'wage-inc-2nd', 'wage-inc-3rd' and 'working hours' attributes are set to 'minimum', 'maximum', 'zero' and 'value' respectively. The minimum value of the 'wage-inc-1st' attribute is 2.000, thus its missing values are replaced with 2.000. The maximum value of the 'wage-inc-2nd' attribute is 7.000, thus its missing values are replaced with 7.000. Missing values of 'wage-inc-3rd' are replaced by 0. The replenishment value parameter is set to 35, thus missing values of the 'working hours' attribute are set to 35. The default parameter is set to 'average', thus missing values of all other attributes are replaced by the average value of the respective attribute.


Fill Data Gaps

This operator fills the gaps (based on the ID attribute) in the given ExampleSet by adding new examples in the gaps. The new examples will have null values.

Description

The Fill Data Gaps operator fills the gaps (based on gaps in the ID attribute) in the given ExampleSet by adding new examples in the gaps. The new examples will have null values for all attributes (except the id attribute) which can be replenished by operators like the Replace Missing Values operator. It is ideal that the ID attribute should be of integer type. This operator performs the following steps:

ˆ The data is sorted according to the ID attribute.

ˆ All occurring distances between consecutive ID values are calculated.

ˆ The greatest common divisor (GCD) of all distances is calculated.

ˆ All rows which would have an ID value which is a multiple of the GCD but are missing are added to the data set.
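The steps above can be sketched in standalone Python. This is not RapidMiner code; the function name is hypothetical, only the id column is modeled, and at least two ids are assumed so that distances exist. The id values are chosen to match the attached Example Process (ids 3, 6, 8 and 10 missing).

```python
from math import gcd
from functools import reduce

def fill_id_gaps(ids):
    # Sketch of the steps above: sort the ids, compute the distances
    # between consecutive ids, take their GCD, and emit every multiple
    # of the GCD between the first and last id.
    ids = sorted(ids)
    distances = [b - a for a, b in zip(ids, ids[1:])]
    step = reduce(gcd, distances)
    return list(range(ids[0], ids[-1] + 1, step))

present = [1, 2, 4, 5, 7, 9, 11, 12, 13, 14]   # ids 3, 6, 8, 10 are missing
print(fill_id_gaps(present))
# → [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```

In a full implementation the newly added rows would carry null values for all other attributes, as described above.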


Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data should be attached with the data for the input because attributes are specified in their meta data.

Output Ports

example set output (exa) The gaps in the ExampleSet are filled with new examples and the resulting ExampleSet is the output of this port.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

use gcd for step size (boolean) This parameter indicates if the greatest common divisor (GCD) should be calculated and used as the underlying distance between all data points.

step size (integer) This parameter is only available when the use gcd for step size parameter is set to false. It specifies the step size to be used for filling the gaps.

start (integer) This parameter can be used for filling the gaps at the beginning (if they occur), before the first data point. For example, if the ID attribute of the given ExampleSet starts with 3 and the start parameter is set to 1, then this operator will fill the gaps at the beginning by adding rows with ids 1 and 2.

end (integer) This parameter can be used for filling the gaps at the end (if they occur), after the last data point. For example, if the ID attribute of the given ExampleSet ends with 100 and the end parameter is set to 105, then this operator will fill the gaps at the end by adding rows with ids 101 to 105.

Tutorial Processes

Introduction to the Fill Data Gaps operator

This Example Process starts with the Subprocess operator which delivers an ExampleSet. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 10 examples. Have a look at the id attribute of the ExampleSet. You will see that certain ids are missing i.e. ids 3, 6, 8 and 10 are missing. The Fill Data Gaps operator is applied on this ExampleSet to fill these data gaps with examples that have the appropriate ids. You can see the resultant ExampleSet in the Results Workspace. You can see that this ExampleSet has 14 examples. New examples with ids 3, 6, 8 and 10 have been added. But these examples have missing values for all attributes (except id attribute) which can be replenished by using operators like the Replace Missing Values operator.

Detect Outlier (Distances)


This operator identifies n outliers in the given ExampleSet based on the distance to their k nearest neighbors. The variables n and k can be specified through parameters.

Description

This operator performs outlier search according to the outlier detection approach recommended by Ramaswamy, Rastogi and Shim in "Efficient Algorithms for Mining Outliers from Large Data Sets". In their paper, a formulation for distance-based outliers is proposed that is based on the distance of a point from its k-th nearest neighbor. Each point is ranked on the basis of its distance to its k-th nearest neighbor and the top n points in this ranking are declared to be outliers. The values of k and n can be specified by the number of neighbors and number of outliers parameters respectively. This search is based on the simple and intuitive distance-based definition of outliers by Knorr and Ng, which in simple words is: 'A point p in a data set is an outlier with respect to two parameters k and d if no more than k points in the data set are at a distance of d or less from p'.

This operator adds a new boolean attribute named 'outlier' to the given ExampleSet. If the value of this attribute is true, that example is an outlier and vice versa. n examples will have the value true in the 'outlier' attribute (where n is the value specified in the number of outliers parameter). Different distance functions are supported by this operator. The desired distance function can be selected by the distance function parameter.

An outlier is an example that is numerically distant from the rest of the examples of the ExampleSet. An outlying example is one that appears to deviate markedly from other examples of the ExampleSet. Outliers are often (not always) indicative of measurement error. In this case such examples should be discarded.
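The ranking approach described above can be sketched in standalone Python. This is not RapidMiner code: the function name is hypothetical, Euclidean distance is assumed as the distance function, and the brute-force O(n²) neighbor search is for illustration only.

```python
def knn_distance_outliers(points, k, n):
    # Sketch of the approach above: rank each point by the distance to its
    # k-th nearest neighbour and flag the top n points as outliers.
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    kth = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        kth.append(d[k - 1])                      # distance to the k-th NN
    ranking = sorted(range(len(points)), key=lambda i: kth[i], reverse=True)
    outliers = set(ranking[:n])
    return [i in outliers for i in range(len(points))]  # the 'outlier' column

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(knn_distance_outliers(pts, k=2, n=1))
# → [False, False, False, False, True]
```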


Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Generate Data operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set output (exa) A new boolean attribute 'outlier' is added to the given ExampleSet and the ExampleSet is delivered through this output port.

original (ori) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

number of neighbors (integer) This parameter specifies the k value for the k-th nearest neighbor to be analyzed. The minimum and maximum values for this parameter are 1 and 1 million respectively.

number of outliers (integer) This parameter specifies the number of top-n outliers to be looked for. The resultant ExampleSet will have n examples that are considered outliers. The minimum and maximum values for this parameter are 2 and 1 million respectively.

distance function (selection) This parameter specifies the distance function that will be used for calculating the distance between two examples.


Tutorial Processes

Detecting outliers from an ExampleSet

The Generate Data operator is used for generating an ExampleSet. The target function parameter is set to 'gaussian mixture clusters'. The number examples and number of attributes parameters are set to 200 and 2 respectively. A breakpoint is inserted here so that you can view the ExampleSet in the Results Workspace. A good plot of the ExampleSet can be seen by switching to the 'Plot View' tab. Set the Plotter to 'Scatter', the x-Axis to 'att1' and the y-Axis to 'att2' to view the scatter plot of the ExampleSet.

The Detect Outlier (Distances) operator is applied on this ExampleSet. The number of neighbors and number of outliers parameters are set to 4 and 12 respectively. Thus 12 examples of the resultant ExampleSet will have the value true in the 'outlier' attribute. This can be verified by viewing the ExampleSet in the Results Workspace. For better understanding switch to the 'Plot View' tab. Set the Plotter to 'Scatter', the x-Axis to 'att1', the y-Axis to 'att2' and the Color Column to 'outlier' to view the scatter plot of the ExampleSet (the outliers are marked red).

Detect Outlier (Densities)

This operator identifies outliers in the given ExampleSet based on the data density. All objects that have at least p proportion of all objects farther away than distance D are considered outliers.

Description

The Detect Outlier (Densities) operator is an outlier detection algorithm that calculates the DB(p,D)-outliers for the given ExampleSet. A DB(p,D)-outlier is an object which is at least D distance away from at least p proportion of all objects. The two real-valued parameters p and D can be specified through the proportion and distance parameters respectively. The DB(p,D)-outliers are distance-based outliers according to Knorr and Ng. This operator implements a global homogeneous outlier search.

This operator adds a new boolean attribute named 'outlier' to the given ExampleSet. If the value of this attribute is true, that example is an outlier and vice versa. Different distance functions are supported by this operator. The desired distance function can be selected by the distance function parameter.

An outlier is an example that is numerically distant from the rest of the examples of the ExampleSet. An outlying example is one that appears to deviate markedly from other examples of the ExampleSet. Outliers are often (not always) indicative of measurement error. In this case such examples should be discarded.
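The DB(p,D) definition above can be sketched in standalone Python. This is not RapidMiner code: the function name is hypothetical, Euclidean distance is assumed as the distance function, and the whole pairwise scan is for illustration only.

```python
def db_outliers(points, p, d):
    # Sketch of the DB(p,D) definition above: a point is an outlier if at
    # least proportion p of all objects lie farther away than distance d.
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    flags = []
    for a in points:
        farther = sum(1 for b in points if dist(a, b) > d)
        flags.append(farther / len(points) >= p)   # the 'outlier' column
    return flags

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(db_outliers(pts, p=0.8, d=5.0))
# → [False, False, False, False, True]
```

Unlike the distance-based variant above, the number of flagged outliers is not fixed in advance; it depends on how many points satisfy the DB(p,D) condition.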

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Generate Data operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports example set output (exa) A new boolean attribute 'outlier' is added to the given ExampleSet and the ExampleSet is delivered through this output port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

distance (real) This parameter specifies the distance D parameter for the calculation of the DB(p,D)-outliers.

proportion (real) This parameter specifies the proportion p parameter for the calculation of the DB(p,D)-outliers.

distance function (selection) This parameter specifies the distance function that will be used for calculating the distance between two examples.

Tutorial Processes

Detecting outliers from an ExampleSet

The Generate Data operator is used for generating an ExampleSet. The target function parameter is set to 'gaussian mixture clusters'. The number examples and number of attributes parameters are set to 200 and 2 respectively. A breakpoint is inserted here so that you can view the ExampleSet in the Results Workspace. A good plot of the ExampleSet can be seen by switching to the 'Plot View' tab. Set Plotter to 'Scatter', x-Axis to 'att1' and y-Axis to 'att2' to view the scatter plot of the ExampleSet.

The Detect Outlier (Densities) operator is applied on the ExampleSet. The distance and proportion parameters are set to 4.0 and 0.8 respectively. The resultant ExampleSet can be viewed in the Results Workspace. For better understanding switch to the 'Plot View' tab. Set Plotter to 'Scatter', x-Axis to 'att1', y-Axis to 'att2' and Color Column to 'outlier' to view the scatter plot of the ExampleSet (the outliers are marked red). The number of outliers may differ depending on the randomization; if the random seed parameter of the process is set to 1997, you will see 5 outliers.

Detect Outlier (LOF)

This operator identifies outliers in the given ExampleSet based on local outlier factors (LOF). The LOF is based on a concept of a local density, where locality is given by the k nearest neighbors, whose distance is used to estimate the density. By comparing the local density of an object to the local densities of its neighbors, one can identify regions of similar density, and points that have a substantially lower density than their neighbors. These are considered to be outliers.

Description

This operator performs a LOF outlier search. LOF outliers, or outliers with a local outlier factor per object, are density-based outliers according to Breunig, Kriegel, et al. As indicated by the name, the local outlier factor is based on a concept of a local density, where locality is given by the k nearest neighbors, whose distance is used to estimate the density. By comparing the local density of an object to the local densities of its neighbors, one can identify regions of similar density, and points that have a substantially lower density than their neighbors. These are considered to be outliers. The local density is estimated by the typical distance at which a point can be 'reached' from its neighbors. The definition of 'reachability distance' used in LOF is an additional measure to produce more stable results within clusters.

The approach to find the outliers is based on measuring the density of objects and its relation to each other (referred to as local reachability density). Based on the average ratio of the local reachability density of an object and its k-nearest neighbors (i.e. the objects in its k-distance neighborhood), a local outlier factor (LOF) is computed. The approach takes a parameter MinPts (actually specifying the 'k') and it uses the maximum LOFs for objects in a MinPts range (lower bound and upper bound to MinPts).

This operator supports cosine, inverted cosine, angle and squared distance in addition to the usual euclidean distance, which can be specified by the distance function parameter. In the first step, the objects are grouped into containers. For each object, using a radius screening of all other objects, all the available distances between that object and another object (or group of objects) on the same radius given by the distance are associated with a container. That container then has the distance information as well as the list of objects within that distance (usually only a few) and the information about how many objects are in the container.

In the second step, three things are done:

1. The containers for each object are counted in ascending order according to the cardinality of the object list within the container (= that distance) to find the k-distances for each object and the objects in that k-distance (all objects in all the subsequent containers with a smaller distance).

2. Using this information, the local reachability densities are computed by using the maximum of the actual distance and the k-distance for each object pair (object and objects in k-distance) and averaging it by the cardinality of the k-neighborhood and then taking the reciprocal value.

3. The LOF is computed for each MinPts value in the range (actually for all up to upper bound) by averaging the ratio between the MinPts-local reachability-density of all objects in the k-neighborhood and the object itself. The maximum LOF in the MinPts range is passed as final LOF to each object.


Afterwards, the LOFs are added as values of a special real-valued outlier attribute in the ExampleSet which the operator returns.

An outlier is an example that is numerically distant from the rest of the examples of the ExampleSet. An outlying example is one that appears to deviate markedly from the other examples of the ExampleSet. Outliers are often (though not always) indicative of measurement error; in such cases these examples should be discarded.

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Generate Data operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports example set output (exa) A new attribute 'outlier' is added to the given ExampleSet which is then delivered through this output port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

minimal points lower bound (integer) This parameter specifies the lower bound for MinPts for the outlier test.

minimal points upper bound (integer) This parameter specifies the upper bound for MinPts for the outlier test.

distance function (selection) This parameter specifies the distance function that will be used for calculating the distance between two objects.


Tutorial Processes

Detecting outliers from an ExampleSet

The Generate Data operator is used for generating an ExampleSet. The target function parameter is set to 'gaussian mixture clusters'. The number examples and number of attributes parameters are set to 200 and 2 respectively. A breakpoint is inserted here so that you can view the ExampleSet in the Results Workspace. A good plot of the ExampleSet can be seen by switching to the 'Plot View' tab. Set Plotter to 'Scatter', x-Axis to 'att1' and y-Axis to 'att2' to view the scatter plot of the ExampleSet.

The Detect Outlier (LOF) operator is applied on this ExampleSet with default values for all parameters. The minimal points lower bound and minimal points upper bound parameters are set to 10 and 20 respectively. The resultant ExampleSet can be seen in the Results Workspace. For better understanding switch to the 'Plot View' tab. Set Plotter to 'Scatter', x-Axis to 'att1', y-Axis to 'att2' and Color Column to 'outlier' to view the scatter plot of the ExampleSet.

Detect Outlier (COF)

This operator identifies outliers in the given ExampleSet based on the Class Outlier Factors (COF).


Description

The main concept of the ECODB (Enhanced Class Outlier - Distance Based) algorithm is to rank each instance in the ExampleSet given the parameters N (top N class outliers) and K (the number of nearest neighbors). The rank of each instance is found using the formula:

COF = PCL(T,K) - norm(Deviation(T)) + norm(KDist(T))

- PCL(T,K) is the Probability of the Class Label of the instance T with respect to the class labels of its K nearest neighbors.

- norm(Deviation(T)) and norm(KDist(T)) are the normalized values of Deviation(T) and KDist(T) respectively and their values fall in the range [0 - 1].

- Deviation(T) is how much the instance T deviates from instances of the same class. It is computed by summing the distances between the instance T and every instance belonging to the same class.

- KDist(T) is the summation of the distances between the instance T and its K nearest neighbors.
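Putting the pieces together, the ranking formula can be sketched as follows. This is a hypothetical illustration under stated assumptions, not the operator's implementation: Euclidean distances, exactly K neighbors, and min-max scaling for norm(); instances with the smallest score rank as the strongest class outlier candidates:

```python
import numpy as np

def cof_scores(X, y, k):
    """Score instances by COF = PCL - norm(Deviation) + norm(KDist)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # ignore self-distances
    knn = np.argsort(d, axis=1)[:, :k]          # K nearest neighbors
    # PCL(T,K): fraction of the K neighbors sharing T's class label
    pcl = np.array([(y[knn[i]] == y[i]).mean() for i in range(n)])
    # Deviation(T): sum of distances to all other same-class instances
    idx = np.arange(n)
    dev = np.array([d[i, (y == y[i]) & (idx != i)].sum() for i in range(n)])
    # KDist(T): sum of distances to the K nearest neighbors
    kdist = np.array([d[i, knn[i]].sum() for i in range(n)])
    def norm(v):                                # min-max scaling to [0, 1]
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)
    return pcl - norm(dev) + norm(kdist)
```

A point sitting inside the wrong class's cluster gets a low PCL and a high Deviation, and therefore the lowest score.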

This operator adds a new boolean attribute named 'outlier' to the given ExampleSet. If the value of this attribute is true, that example is an outlier and vice versa. Another special attribute 'COF Factor' is also added to the ExampleSet. This attribute measures the degree of being a Class Outlier for an example.

An outlier is an example that is numerically distant from the rest of the examples of the ExampleSet. An outlying example is one that appears to deviate markedly from the other examples of the ExampleSet. Outliers are often (though not always) indicative of measurement error; in such cases these examples should be discarded.


Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Generate Data operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports example set output (exa) A new boolean attribute 'outlier' and a real attribute 'COF Factor' are added to the given ExampleSet and the ExampleSet is delivered through this output port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

number of neighbors (integer) This parameter specifies the k value for the k nearest neighbors to be analyzed. The minimum and maximum values for this parameter are 1 and 1 million respectively.

number of class outliers (integer) This parameter specifies the number of top-n Class Outliers to be looked for. The resultant ExampleSet will have n examples that are considered outliers. The minimum and maximum values for this parameter are 2 and 1 million respectively.

measure types (selection) This parameter is used for selecting the type of measure to be used for measuring the distance between points. The following options are available: mixed measures, nominal measures, numerical measures and Bregman divergences.

mixed measure (selection) This parameter is available when the measure type parameter is set to 'mixed measures'. The only available option is the 'Mixed Euclidean Distance'.

nominal measure (selection) This parameter is available when the measure type parameter is set to 'nominal measures'. This option cannot be applied if the input ExampleSet has numerical attributes. In this case the 'numerical measure' option should be selected.

numerical measure (selection) This parameter is available when the measure type parameter is set to 'numerical measures'. This option cannot be applied if the input ExampleSet has nominal attributes. If the input ExampleSet has nominal attributes the 'nominal measure' option should be selected.

divergence (selection) This parameter is available when the measure type parameter is set to 'bregman divergences'.

kernel type (selection) This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance'. The type of the kernel function is selected through this parameter. The following kernel types are supported:

- dot The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.

- radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma that is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

- polynomial The polynomial kernel is defined by k(x,y) = (x*y+1)^d where d is the degree of the polynomial and it is specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.

- neural The neural kernel is defined by a two-layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

- sigmoid This is the sigmoid kernel. Please note that the sigmoid kernel is not valid under some parameters.

- anova This is the anova kernel. It has adjustable parameters gamma and degree.

- epachnenikov The Epanechnikov kernel is the function (3/4)(1-u^2) for u between -1 and 1 and zero for u outside that range. It has two adjustable parameters kernel sigma1 and kernel degree.

- gaussian combination This is the gaussian combination kernel. It has adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

- multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the SVM kernel parameter gamma. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the SVM kernel parameter sigma1. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the SVM kernel parameter sigma2. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the SVM kernel parameter sigma3. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the SVM kernel parameter shift. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to multiquadric.

kernel degree (real) This is the SVM kernel parameter degree. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the SVM kernel parameter a. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.

kernel b (real) This is the SVM kernel parameter b. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.
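As an illustration, three of the kernel formulas listed above can be written out directly. These are hypothetical sketches of the stated formulas, not the operator's internals; the function and parameter names are chosen for readability:

```python
import numpy as np

def dot_kernel(x, y):
    """dot: k(x,y) = x*y, the inner product of x and y."""
    return np.dot(x, y)

def radial_kernel(x, y, gamma=1.0):
    """radial: k(x,y) = exp(-g ||x-y||^2), g given by kernel gamma."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def polynomial_kernel(x, y, degree=2.0):
    """polynomial: k(x,y) = (x*y + 1)^d, d given by kernel degree."""
    return (np.dot(x, y) + 1.0) ** degree
```

For example, the radial kernel of any point with itself is exp(0) = 1, and the polynomial kernel of a unit vector with itself at degree 2 is (1 + 1)^2 = 4.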

Tutorial Processes

Detecting outliers from an ExampleSet

The Generate Data operator is used for generating an ExampleSet. The target function parameter is set to 'gaussian mixture clusters'. The number examples and number of attributes parameters are set to 200 and 2 respectively. A breakpoint is inserted here so that you can view the ExampleSet in the Results Workspace. A good plot of the ExampleSet can be seen by switching to the 'Plot View' tab. Set Plotter to 'Scatter', x-Axis to 'att1' and y-Axis to 'att2' to view the scatter plot of the ExampleSet.

The Detect Outlier (COF) operator is applied on the ExampleSet. The number of neighbors and number of class outliers parameters are set to 7. The resultant ExampleSet can be viewed in the Results Workspace. For better understanding, switch to the 'Plot View' tab. Set Plotter to 'Scatter', x-Axis to 'att1', y-Axis to 'att2' and Color Column to 'outlier' to view the scatter plot of the ExampleSet (the outliers are marked red).

6.6. Filtering

Filter Examples

This operator selects which examples (i.e. rows) of an ExampleSet should be kept and which examples should be removed. Examples satisfying the given condition are kept; the remaining examples are removed.

Description

This operator takes an ExampleSet as input and returns a new ExampleSet including only those examples that satisfy the specified condition. Several predefined conditions are provided; users can select any of them. Users can also define their own conditions to filter examples. This operator may reduce the number of examples in an ExampleSet but it has no effect on the number of attributes. The Select Attributes operator is used to select the required attributes.

The Filter Examples operator is frequently used to filter examples that have (or do not have) missing values. It is also frequently used to filter examples with correct or wrong predictions (usually after testing a learnt model).

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports example set output (exa) A new ExampleSet including only the examples that satisfy the specified condition is delivered through this port.

original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

condition class (selection) Various predefined conditions are available for filtering examples. Users can select any of them. Examples satisfying the selected condition are passed to the output port; the others are removed. The following conditions are available:

- all if this option is selected, no examples are removed.

- correct predictions if this option is selected, only those examples make it to the output port that have correct predictions, i.e. the values of the label attribute and the prediction attribute are the same.

- wrong predictions if this option is selected, only those examples make it to the output port that have wrong predictions, i.e. the values of the label attribute and the prediction attribute are not the same.

- no missing attributes if this option is selected, only those examples make it to the output port that have no missing values in their attribute values. Missing values or null values are usually shown by '?' in RapidMiner.

- missing attributes if this option is selected, only those examples make it to the output port that have some missing values in their attribute values.

- no missing labels if this option is selected, only those examples make it to the output port that do not have any missing values in their label attribute values. Missing values or null values are usually shown by '?' in RapidMiner.

- missing label if this option is selected, only those examples make it to the output port that have some missing values in their label attribute values.


- attribute value filter if this option is selected, another parameter (parameter string) is enabled in the Parameter View.

parameter string (string) Instead of using one of the predefined conditions, users can define their own conditions here. It is important to understand how to specify conditions here because the true power of this operator lies in defining your own conditions according to your requirements. For numerical attributes, conditions can be specified in the 'attribute op value' format, where 'attribute' is the name of the attribute, 'value' is a value that the attribute can take and 'op' is a binary logical operator like >, <, >=, <=, = or !=. For nominal attributes, conditions can be specified in the 'attribute op exp' format, where 'attribute' is the name of the attribute, 'op' can be either the '=' or the '!=' operator, and 'exp' stands for a regular expression. Users should have a good understanding of regular expressions. You can get a good idea of regular expressions if you use the Select Attributes operator with the attribute filter type parameter set to regular expression and then use the edit and preview regular expression menu.

Multiple conditions can be linked by using the logical AND (written as &&) or logical OR (written as ||) operators. Instead of writing multiple AND conditions you can use multiple Filter Examples operators in a row to reduce complexity.

Missing values or null values can be written as '?' for numerical attributes and as '\?' for nominal attributes. '\?' is used instead of '?' in nominal attributes because this is the way missing values are specified in regular expressions.
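The condition syntax described above can be illustrated with a minimal, hypothetical matcher. This is not RapidMiner's parser; it only handles flat &&/|| chains (|| binding last), treats examples as dictionaries, and assumes '>=' and '<=' spellings for the comparison operators:

```python
import re

def matches(example, condition):
    """Evaluate a Filter Examples-style condition string on one example.

    Numeric attributes use 'attribute op value'; nominal attributes use
    'attribute op exp' where exp is a regular expression and op is = or !=.
    """
    def atom(c):
        m = re.match(r"(\w+)\s*(<=|>=|!=|=|<|>)\s*(.+)", c.strip())
        attr, op, val = m.groups()
        left = example[attr]
        if isinstance(left, (int, float)):          # numeric comparison
            right = float(val)
            return {"=": left == right, "!=": left != right,
                    "<": left < right, ">": left > right,
                    "<=": left <= right, ">=": left >= right}[op]
        hit = re.fullmatch(val, str(left)) is not None   # regex match
        return hit if op == "=" else not hit
    # || combines groups of &&-linked atoms
    return any(all(atom(a) for a in part.split("&&"))
               for part in condition.split("||"))
```

With an example {"Outlook": "sunny", "Temperature": 75}, the condition "Outlook = .*n.* && Temperature > 70" holds, while "Temperature > 80" does not.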

For 'unknown attributes' the parameter string must be empty. This filter removes all examples containing attributes that have missing or illegal values. For 'unknown label' the parameter string must also be empty. This filter removes all examples with an unknown label value.

invert filter (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected examples are removed and the previously removed examples are selected. In other words it inverts the condition. For example, if the missing attributes option is selected in the condition class parameter and the invert filter parameter is also set to true, then the output port will deliver an ExampleSet with no missing values.


Tutorial Processes

Filtering correctly predicted examples

The Golf data set is loaded using the Retrieve operator and the k-NN operator is applied on it to generate a classification model. That model is then applied on the Golf-Testset data set using the Apply Model operator. The Apply Model operator applies the model learnt by the k-NN operator on the Golf-Testset data set and records the predicted values in a new attribute named 'prediction(Play)'. The labeled data from Apply Model serves as input to the Filter Examples operator. The correct predictions option is selected in the condition class parameter, which ensures that only those examples make it to the output port that have correct predictions. A correct prediction means the values of the label and prediction attributes are the same in that example. But, as the invert filter parameter is set to true, it reverses the selection and instead of the correct predictions, the wrong predictions are delivered through the output port. It can be seen in the Results Workspace that the label attribute (Play) and the prediction attribute (prediction(Play)) have opposite values in all the resultant examples. A breakpoint is inserted before the Filter Examples operator to have a look at the examples before the application of the Filter Examples operator. Press the green-colored Run button to continue with the process.


Filtering examples according to their values

The Golf data set is loaded using the Retrieve operator and Filter Examples is applied on it with the parameter string "Outlook = .*n.* && Temperature > 70". The Outlook attribute is a nominal attribute, thus a regular expression is used to describe it. The condition 'Outlook = .*n.*' means all examples that have the letter 'n' in their Outlook attribute value. 10 examples qualify; all have 'Outlook = rain' or 'Outlook = sunny'. The Temperature attribute is a numerical attribute so the 'attribute op value' syntax is used to select rows. 9 examples satisfy the condition where the Temperature attribute has a value greater than 70. As these two conditions are joined using the logical AND (&&), the finally selected examples are those that meet both conditions. Only 6 such rows are present that have an 'n' in their Outlook attribute value and a Temperature attribute value greater than 70. This can be seen clearly in the Results Workspace.

Filtering examples according to their values with an OR condition

The Labor-Negotiations data set is loaded using the Retrieve operator and Filter Examples is applied on it with the parameter string "duration = ? || pension != \?". The Duration attribute is a numerical attribute so the 'attribute op value' syntax is used to select rows. 1 example satisfies the condition where the Duration attribute has a missing value. The Pension attribute is a nominal attribute, thus a regular expression is used to describe it. The condition 'pension != \?' means all examples that do not have a missing value in their Pension attribute. 18 examples qualify; all have no missing values in their Pension attribute. Note that '?' is used for missing values of numerical attributes and '\?' is used for missing values of nominal attributes. The question mark must be escaped ("\?") for nominal values because, as noted above, a regular expression is expected in this case. As these two conditions are joined using the logical OR (||), the finally selected examples are those that meet at least one of the conditions. 18 such rows are present that have no missing values in their Pension attribute values or have a missing value in their Duration attribute value. This can be seen clearly in the Results Workspace.

Remove Duplicates

This operator removes duplicate examples from an ExampleSet by comparing all examples with each other on the basis of the specified attributes. Two examples are considered duplicate if the selected attributes have the same values in them.

Description

The Remove Duplicates operator removes duplicate examples from an ExampleSet by comparing all examples with each other on the basis of the specified attributes. This operator removes duplicate examples such that only one of all the duplicate examples is kept. Two examples are considered duplicate if the selected attributes have the same values in them. Attributes can be selected from the attribute filter type parameter and other associated parameters. Suppose two attributes 'att1' and 'att2' are selected and 'att1' and 'att2' have three and two possible values respectively. There are thus a total of 6 (i.e. 3 x 2) unique combinations of these two attributes, so the resultant ExampleSet can have 6 examples at most. This operator works on all attribute types.
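The idea can be sketched as follows; this is a hypothetical illustration treating examples as dictionaries, not the operator's actual implementation:

```python
def remove_duplicates(examples, keys):
    """Keep only the first example for each combination of values
    of the selected attributes."""
    seen = set()
    kept = []
    for ex in examples:
        signature = tuple(ex[k] for k in keys)  # values of selected attributes
        if signature not in seen:
            seen.add(signature)
            kept.append(ex)
    return kept
```

Two rows that agree on the selected attributes collapse to one, even if they differ on other attributes.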

Input Ports example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports example set output (exa) The duplicate examples are removed from the given ExampleSet and the resultant ExampleSet is delivered through this port. original (ori) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting the required attributes. It has the following options:

- all This option simply selects all the attributes of the ExampleSet. This is the default option.

- single This option allows selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

- subset This option allows selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; the required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

- regular expression This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

- value type This option allows selection of all the attributes of a particular type. It should be noted that types are hierarchical. For example real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

- block type This option works similarly to the value type option. It allows selection of all the attributes of a particular block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

- no missing values This option simply selects all the attributes of the ExampleSet which don't contain a missing value in any example. Attributes that have even a single missing value are removed.

- numeric value filter When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The desired attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. This opens a new window with two lists. All attributes are present in the left list and can be shifted to the right list, which is the list of selected attributes on which the operator will be applied; all other attributes will remain unchanged.

regular expression (string) The attributes whose name matches this expression will be selected. A regular expression is a very powerful tool but needs a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first expression (the expression that was specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list. One of the following types can be chosen: nominal, text, binominal, polynominal, file path.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is selected another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will be removed from the final output even if they matched the previously mentioned type, i.e. the value type parameter's value. One of the following types can be selected here: nominal, text, binominal, polynominal, file path.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list. The only possible value here is 'single value'.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will be removed from the final output even if they matched the previously mentioned block type.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition. Conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and the previously unselected attributes are selected. For example, if attribute 'att1' was selected and attribute 'att2' was unselected before checking this parameter, then after checking it 'att1' will be unselected and 'att2' will be selected.

treat missing values as duplicates (boolean) This parameter specifies whether missing values should be treated as duplicates or not. If set to true, missing values are considered duplicate values.

Tutorial Processes

Removing duplicate values from the Golf data set on the basis of the Outlook and Wind attributes

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the Outlook attribute has three possible values i.e. 'sunny', 'rain' and 'overcast'. The Wind attribute has two possible values i.e. 'true' and 'false'. The Remove Duplicates operator is applied on this ExampleSet to remove duplicate examples on the basis of the Outlook and Wind attributes. The attribute filter type parameter is set to 'value type' and the value type parameter is set to 'nominal', thus two examples that have the same values in their Outlook and Wind attributes are considered duplicates. Note that the Play attribute is not selected although its value type is nominal, because it is a special attribute (it has the label role). To select attributes with special roles the include special attributes parameter should be set to true. The Outlook and Wind attributes have 3 and 2 possible values respectively, thus the resultant ExampleSet will have at most 6 examples, i.e. one example for each possible combination of attribute values. You can see the resultant ExampleSet in the Results Workspace: it has 6 examples and all examples have a different combination of the Outlook and Wind attribute values.
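The duplicate-detection logic used in this process — keeping only the first example for each distinct combination of the selected attributes — can be sketched in plain Python. This is an illustrative sketch, not RapidMiner's implementation; the function name and the miniature data set are made up:

```python
def remove_duplicates(rows, keys):
    """Keep the first row for each unique combination of the key attributes."""
    seen = set()
    kept = []
    for row in rows:
        signature = tuple(row[k] for k in keys)
        if signature not in seen:
            seen.add(signature)
            kept.append(row)
    return kept

golf = [
    {"Outlook": "sunny",    "Wind": "false", "Play": "no"},
    {"Outlook": "sunny",    "Wind": "true",  "Play": "no"},
    {"Outlook": "overcast", "Wind": "false", "Play": "yes"},
    {"Outlook": "sunny",    "Wind": "false", "Play": "yes"},  # duplicate w.r.t. Outlook/Wind
    {"Outlook": "rain",     "Wind": "true",  "Play": "no"},
]

unique = remove_duplicates(golf, keys=["Outlook", "Wind"])
print(len(unique))  # 4 distinct Outlook/Wind combinations remain
```

Note that, as in the tutorial, an attribute outside the key list (Play here) does not influence which rows count as duplicates.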

Filter Example Range

This operator selects which examples (i.e. rows) of an ExampleSet should be kept and which should be removed. Examples within the specified index range are kept; the remaining examples are removed.


Description

This operator takes an ExampleSet as input and returns a new ExampleSet including only those examples that are within the specified index range. The lower and upper bounds of the index range are specified using the first example and last example parameters. This operator may reduce the number of examples in an ExampleSet but it has no effect on the number of attributes. The Select Attributes operator is used to select required attributes.

If you want to filter examples by criteria other than an index range, you may use the Filter Examples operator. It takes an ExampleSet as input and returns a new ExampleSet including only those examples that satisfy the specified condition. Several predefined conditions are provided; users can select any of them. Users can also define their own conditions to filter examples. The Filter Examples operator is frequently used to filter examples that have (or do not have) missing values. It is also frequently used to filter examples with correct or wrong predictions (usually after testing a learnt model).

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

example set output (exa) A new ExampleSet including only the examples that are within the specified index range is the output of this port.

original (ori) The ExampleSet that was given as input is passed without any modifications to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

first example (integer) This parameter is used to set the lower bound of the index range. The last example parameter is used to set the upper bound. Examples within this index range are delivered to the output port; examples outside it are discarded.

last example (integer) This parameter is used to set the upper bound of the index range. The first example parameter is used to set the lower bound. Examples within this index range are delivered to the output port; examples outside it are discarded.

invert filter (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected examples are removed and previously removed examples are selected. In other words it inverts the index range. For example, if the first example parameter is set to 1 and the last example parameter is set to 10, the output port will deliver an ExampleSet with all examples other than the first ten.
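The interplay of the first example, last example and invert filter parameters can be sketched in a few lines of Python. This is an illustration under assumed 1-based indexing, not RapidMiner's code:

```python
def filter_example_range(rows, first, last, invert=False):
    """Keep rows whose 1-based index lies in [first, last]; invert flips the selection."""
    return [row for i, row in enumerate(rows, start=1)
            if (first <= i <= last) != invert]

ids = list(range(1, 15))  # stand-in for 14 examples with ids 1..14
print(filter_example_range(ids, 5, 10))               # [5, 6, 7, 8, 9, 10]
print(filter_example_range(ids, 5, 10, invert=True))  # [1, 2, 3, 4, 11, 13, 12, 14][:0] or see below
```

With invert set to true, exactly the complement is delivered: ids 1 to 4 and 11 to 14, matching the tutorial below.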

Tutorial Processes

Filtering examples using the invert filter parameter

The 'Golf' data set is loaded using the Retrieve operator. The Generate ID operator is applied on it with the offset set to 0, thus all examples are assigned unique ids from 1 to 14. This is done so that the examples can be distinguished easily. A breakpoint is inserted here so that you can have a look at the data set before application of the Filter Example Range operator. In the Filter Example Range operator the first example parameter is set to 5 and the last example parameter is set to 10. The invert filter parameter is also set to true. Thus all examples other than those in the index range 5 to 10 are delivered through the output port. You can clearly identify the rows through their ids: rows with ids from 1 to 4 and from 11 to 14 make it to the output port.


Sample

This operator creates a sample from an ExampleSet by selecting examples randomly. The size of the sample can be specified on an absolute, relative or probability basis.

Description

This operator is similar to the Filter Examples operator in that it takes an ExampleSet as input and delivers a subset of the ExampleSet as output. The difference is that the Filter Examples operator filters examples on the basis of specified conditions, whereas the Sample operator focuses on the number of examples and the class distribution in the resultant sample. Moreover, the samples are generated randomly. The number of examples in the sample can be specified on an absolute, relative or probability basis depending on the setting of the sample parameter. The class distribution of the sample can be controlled by the balance data parameter.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.


Output Ports

example set output (exa) A randomized sample of the input ExampleSet is the output of this port.

original (ori) The ExampleSet that was given as input is passed without any modifications to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

sample (selection) This parameter determines how the amount of data is specified.

• absolute If the sample parameter is set to 'absolute' the sample is created with an exactly specified number of examples. The required number of examples is specified in the sample size parameter.

• relative If the sample parameter is set to 'relative' the sample is created as a fraction of the total number of examples in the input ExampleSet. The required ratio of examples is specified in the sample ratio parameter.

• probability If the sample parameter is set to 'probability' the sample is created on a probability basis. The required probability is specified in the sample probability parameter.

balance data (boolean) You can set this parameter to true if you need to sample differently for examples of a certain class. If this parameter is set to true, the sample size, sample ratio and sample probability parameters are replaced by the sample size per class, sample ratio per class and sample probability per class parameters respectively. These parameters allow you to specify different sample sizes for different values of the label attribute.

sample size (integer) This parameter specifies the exact number of examples which should be sampled. This parameter is only available when the sample parameter is set to 'absolute' and the balance data parameter is not set to true.

sample ratio (real) This parameter specifies the fraction of examples which should be sampled. This parameter is only available when the sample parameter is set to 'relative' and the balance data parameter is not set to true.

sample probability (real) This parameter specifies the sample probability for each example. This parameter is only available when the sample parameter is set to 'probability' and the balance data parameter is not set to true.

sample size per class This parameter specifies the absolute sample size per class. This parameter is only available when the sample parameter is set to 'absolute' and the balance data parameter is set to true.

sample ratio per class This parameter specifies the fraction of examples per class. This parameter is only available when the sample parameter is set to 'relative' and the balance data parameter is set to true.

sample probability per class This parameter specifies the probability of examples per class. This parameter is only available when the sample parameter is set to 'probability' and the balance data parameter is set to true.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomizing the examples of the sample. Using the same value of the local random seed will produce the same sample. Changing the value of this parameter changes the way the examples are randomized, so the sample will have a different set of examples.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.
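The three sampling modes can be sketched in Python. This is an illustration (the function name and seed are made up), not the operator's actual code; the per-class variants simply apply the same logic within each label group:

```python
import random

def sample(rows, mode, size=None, ratio=None, probability=None, seed=1992):
    """Draw a random sample on an absolute, relative or probability basis."""
    rng = random.Random(seed)  # a fixed seed plays the role of the local random seed
    if mode == "absolute":
        return rng.sample(rows, size)                   # exactly `size` examples
    if mode == "relative":
        return rng.sample(rows, round(len(rows) * ratio))
    if mode == "probability":
        return [row for row in rows if rng.random() < probability]  # size varies
    raise ValueError(f"unknown mode: {mode}")

data = list(range(250))
print(len(sample(data, "absolute", size=25)))    # 25
print(len(sample(data, "relative", ratio=0.2)))  # 50
```

Note that with the same seed the sample is reproducible, mirroring the local random seed parameter; the 'probability' mode flips an independent coin for each example, so its sample size is only expected, not exact.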

Tutorial Processes

Sampling the Ripley-Set data set

The 'Ripley-Set' data set is loaded using the Retrieve operator. The Generate ID operator is applied on it so that the examples can be identified uniquely. A breakpoint is inserted at this stage so that you can see the ExampleSet before the Sample operator is applied. You can see that there are 250 examples with two possible classes: 0 and 1. 125 examples have class 0 and 125 examples have class 1. Now, the Sample operator is applied on the ExampleSet. The sample parameter is set to 'relative'. The balance data parameter is set to true. The sample ratio per class parameter specifies two ratios.

1. Class 0 is assigned the ratio 0.2. Thus, of all the examples where the label attribute is 0, only 20 percent will be selected. There were 125 examples with class 0, so 25 (i.e. 20% of 125) examples will be selected.

2. Class 1 is assigned the ratio 1. Thus, of all the examples where the label attribute is 1, 100 percent will be selected. There were 125 examples with class 1, so all 125 (i.e. 100% of 125) examples will be selected.

Run the process and you can verify the results. Also note that the examples are taken randomly. The randomization can be changed by changing the local random seed parameter.

Sample (Stratified)

This operator creates a stratified sample from an ExampleSet. Stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole ExampleSet. This operator cannot be applied on data sets without a label or with a numerical label. The size of the sample can be specified on an absolute or relative basis.

Description

Stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole ExampleSet. For example, in the case of a binominal classification, stratified sampling builds random subsets such that each subset contains roughly the same proportions of the two values of the class label.

When there are different classes in an ExampleSet, it is sometimes advantageous to sample each class independently. Stratification is the process of dividing the examples of the ExampleSet into homogeneous subgroups (i.e. classes) before sampling. The subgroups should be mutually exclusive, i.e. every example in the ExampleSet must be assigned to only one subgroup (or class). The subgroups should also be collectively exhaustive, i.e. no example can be excluded. Then random sampling is applied within each subgroup. This often improves the representativeness of the sample by reducing the sampling error.

A real-world example of using stratified sampling would be a political survey. If the respondents needed to reflect the diversity of the population, the researcher would specifically seek to include participants of various minority groups such as race or religion, based on their proportionality to the total population as mentioned above. A stratified survey could thus claim to be more representative of the population than a survey based on simple random sampling or systematic sampling.

In contrast to the simple sampling operator (the Sample operator), this operator performs a stratified sampling of data sets with nominal label attributes, i.e. the class distribution remains (almost) the same after sampling. Hence, this operator cannot be applied on data sets without a label or with a numerical label. In these cases a simple sampling without stratification should be performed through the Sample operator.


This operator is similar to the Filter Examples operator in that it takes an ExampleSet as input and delivers a subset of the ExampleSet as output. The difference is that the Filter Examples operator filters examples on the basis of specified conditions, whereas this operator focuses on the number of examples and the class distribution in the resultant sample. Moreover, the samples are generated randomly. The number of examples in the sample can be specified on an absolute or relative basis depending on the setting of the sample parameter.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Filter Examples operator in the attached Example Process.

Output Ports

example set output (exa) A randomized sample of the input ExampleSet is the output of this port. The class distribution of the sample is (almost) the same as the class distribution of the complete ExampleSet.

original (ori) The ExampleSet that was given as input is passed without any modifications to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

sample (selection) This parameter determines how the amount of data is specified.

• absolute If the sample parameter is set to 'absolute' then the sample is created with an exactly specified number of examples. The required number of examples is specified in the sample size parameter.

• relative If the sample parameter is set to 'relative' then the sample is created as a fraction of the total number of examples in the input ExampleSet. The required ratio of examples is specified in the sample ratio parameter.

sample size (integer) This parameter specifies the exact number of examples which should be sampled. This parameter is only available when the sample parameter is set to 'absolute'.

sample ratio (real) This parameter specifies the fraction of examples which should be sampled. This parameter is only available when the sample parameter is set to 'relative'.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomizing the examples of the sample. Using the same value of the local random seed will produce the same sample. Changing the value of this parameter changes the way the examples are randomized, so the sample will have a different set of examples.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.
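The stratification logic — sampling each class independently so the class proportions survive — can be sketched in Python. This is an illustrative sketch with assumed names, not the operator's code; an 'absolute' mode would distribute a fixed total across the classes in the same proportions:

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_key, ratio, seed=1992):
    """Sample the given fraction within each class independently, so the
    class distribution of the sample matches the whole data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    picked = []
    for members in by_class.values():
        picked.extend(rng.sample(members, round(len(members) * ratio)))
    return picked

data = [{"label": "yes"}] * 6 + [{"label": "no"}] * 4  # 60% / 40%, as in the tutorial
result = stratified_sample(data, "label", ratio=0.5)
print(len(result))                                    # 5
print(sum(1 for r in result if r["label"] == "yes"))  # 3, i.e. still 60% 'yes'
```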

Tutorial Processes

Stratified Sampling of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. The Filter Example Range operator is applied on it to select the first 10 examples. This is done only to simplify the Example Process; otherwise the filtering would not be necessary here. A breakpoint is inserted here so that you can view the ExampleSet before the application of the Sample (Stratified) operator. As you can see, the ExampleSet has 10 examples. 6 examples (i.e. 60%) belong to class 'yes' and 4 examples (i.e. 40%) belong to class 'no'. The Sample (Stratified) operator is applied on the ExampleSet. The sample parameter is set to 'absolute' and the sample size parameter is set to 5. Thus the resultant sample will have only 5 examples. The sample will have the same class distribution as the input ExampleSet, i.e. 60% examples with class 'yes' and 40% with class 'no'. You can verify this by viewing the results of this process: 3 out of 5 examples (i.e. 60%) have class 'yes' and 2 out of 5 (i.e. 40%) have class 'no'.

Sample (Bootstrapping)

This operator creates a bootstrapped sample from an ExampleSet. Bootstrapped sampling uses sampling with replacement, thus the sample may not have all unique examples. The size of the sample can be specified on an absolute or relative basis.

Description

This operator is different from other sampling operators because it uses sampling with replacement. In sampling with replacement, at every step all examples have an equal probability of being selected. Once an example has been selected for the sample, it remains a candidate for selection and can be selected again in subsequent steps. Thus a sample with replacement can contain the same example multiple times. More importantly, a sample with replacement can be used to generate a sample that is greater in size than the original ExampleSet.
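Sampling with replacement is easy to sketch: each draw is independent and uniform over all examples, so repeats are possible and the sample can be larger than the data set. An illustrative Python sketch (names assumed, example weights ignored):

```python
import random

def bootstrap_sample(rows, size, seed=1992):
    """Sampling with replacement: every draw picks uniformly from all rows."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(size)]

golf_ids = list(range(1, 15))                # 14 examples, as in the tutorial below
boot = bootstrap_sample(golf_ids, size=140)  # 10 times the original size
print(len(boot))                             # 140
print(len(set(boot)))                        # at most 14 distinct values
```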


The number of examples in the sample can be specified on an absolute or relative basis depending on the setting of the sample parameter.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Generate ID operator in the attached Example Process.

Output Ports

example set output (exa) A bootstrapped sample of the input ExampleSet is the output of this port.

original (ori) The ExampleSet that was given as input is passed without any modifications to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

sample (selection) This parameter determines how the amount of data is specified.

• absolute If the sample parameter is set to 'absolute' the sample is created with an exactly specified number of examples. The required number of examples is specified in the sample size parameter.

• relative If the sample parameter is set to 'relative' the sample is created as a fraction of the total number of examples in the input ExampleSet. The required ratio of examples is specified in the sample ratio parameter.

sample size (integer) This parameter specifies the exact number of examples which should be sampled. This parameter is only available when the sample parameter is set to 'absolute'.

sample ratio (real) This parameter specifies the fraction of examples which should be sampled. This parameter is only available when the sample parameter is set to 'relative'.

use weights (boolean) If set to true, example weights will be considered during the bootstrapping if such weights are present.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomizing the examples of the sample. Using the same value of the local random seed will produce the same sample. Changing the value of this parameter changes the way the examples are randomized, so the sample will have a different set of examples.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Tutorial Processes

Bootstrapped Sampling of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. The Generate ID operator is applied on it to create an id attribute with ids starting from 1. This is done so that the examples can be identified uniquely; otherwise the id attribute would not be necessary here. A breakpoint is inserted here so that you can view the ExampleSet before the application of the Sample (Bootstrapping) operator. As you can see, the ExampleSet has 14 examples. The Sample (Bootstrapping) operator is applied on the ExampleSet. The sample parameter is set to 'absolute' and the sample size parameter is set to 140. Thus a sample 10 times the size of the original ExampleSet is generated. Instead of repeating each example of the input ExampleSet 10 times, examples are selected randomly. You can verify this by viewing the results of this process in the Results Workspace.


Split Data

This operator produces the desired number of subsets of the given ExampleSet. The ExampleSet is partitioned into subsets according to the specified relative sizes.

Description

The Split Data operator takes an ExampleSet as its input and delivers the subsets of that ExampleSet through its output ports. The number of subsets (or partitions) and the relative size of each partition are specified through the partitions parameter. The sum of the ratios of all partitions should be 1. The sampling type parameter decides how the examples should be shuffled in the resultant partitions. For more information about this operator please study the parameters section of this description. This operator is different from other sampling and filtering operators in the sense that it is capable of delivering multiple partitions of the given ExampleSet.


Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

partition (par) This operator can have multiple partition ports. The number of useful partition ports depends on the number of partitions (or subsets) this operator is configured to produce. The partitions parameter is used for specifying the desired number of partitions.

Parameters

partitions (enumeration) This is the most important parameter of this operator. It specifies the number of partitions and the relative ratio of each partition. The user only needs to specify the ratios of all partitions; the number of required partitions is not explicitly specified because it is calculated automatically from the number of ratios specified in this parameter. The ratios should be between 0 and 1 and their sum should be 1. For a better understanding of this parameter please study the attached Example Process.

sampling type (selection) The Split Data operator can use several types of sampling for building the subsets. The following options are available:

• Linear sampling Linear sampling simply divides the ExampleSet into partitions without changing the order of the examples, i.e. subsets with consecutive examples are created.

• Shuffled sampling Shuffled sampling builds random subsets of the ExampleSet. Examples are chosen randomly for making subsets.

• Stratified sampling Stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole ExampleSet. For example, in the case of a binominal classification, stratified sampling builds random subsets such that each subset contains roughly the same proportions of the two values of the class label.

use local random seed (boolean) Indicates if a local random seed should be used for randomizing the examples of a subset. Using the same value of the local random seed will produce the same subsets. Changing the value of this parameter changes the way examples are randomized, so the subsets will have a different set of examples. This parameter is only available if Shuffled or Stratified sampling is selected. It is not available for Linear sampling because it requires no randomization; examples are selected in sequence.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Tutorial Processes

Creating partitions of the Golf data set using the Split Data operator

The 'Golf' data set is loaded using the Retrieve operator. The Generate ID operator is applied on it so the examples can be identified uniquely. A breakpoint is inserted here so the ExampleSet can be seen before the application of the Split Data operator. It can be seen that the ExampleSet has 14 examples which can be uniquely identified by the id attribute. The examples have ids from 1 to 14. The Split Data operator is applied next. The sampling type parameter is set to 'linear sampling'. The partitions parameter is configured to produce two partitions with ratios 0.8 and 0.2 respectively. The partitions can be seen in the Results Workspace. The number of examples in each partition is calculated by this formula:

(Total number of examples) / (sum of ratios) * ratio of this partition


If the answer is a decimal number it is rounded off. The number of examples in each partition turns out to be:

• (14) / (0.8 + 0.2) * (0.8) = 11.2, which is rounded off to 11

• (14) / (0.8 + 0.2) * (0.2) = 2.8, which is rounded off to 3

It is good practice to adjust the ratios such that their sum is 1, but this operator also works if the sum of the ratios is lower or greater than 1. For example, if two partitions are created with ratios 1.0 and 0.4, the resultant partitions would be calculated as follows:

• (14) / (1.0 + 0.4) * (1.0) = 10

• (14) / (1.0 + 0.4) * (0.4) = 4
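The partition-size formula above can be checked with a few lines of Python. This is an illustration only; note that Python's round sends exact .5 ties to the nearest even number, which may differ from RapidMiner's rounding in edge cases:

```python
def partition_sizes(n_examples, ratios):
    """Size of each partition: n / sum(ratios) * ratio, rounded off."""
    total = sum(ratios)
    return [round(n_examples / total * r) for r in ratios]

print(partition_sizes(14, [0.8, 0.2]))  # [11, 3]
print(partition_sizes(14, [1.0, 0.4]))  # [10, 4]
```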

Sort

This operator sorts the input ExampleSet in ascending or descending order according to a single attribute.


Description

This operator sorts the ExampleSet provided at the input port. The complete data set is sorted according to a single attribute. This attribute is specified using the attribute name parameter. Sorting is done in increasing or decreasing direction depending on the setting of the sorting direction parameter.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

example set output (exa) The sorted ExampleSet is the output of this port.

original (ori) The ExampleSet that was given as input is passed without any modifications to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

attribute name (string) This parameter is used to specify the attribute which should be used for sorting the ExampleSet.

sorting direction This parameter indicates the direction of the sorting. The ExampleSet can be sorted in increasing (ascending) or decreasing (descending) order.


Tutorial Processes

Sorting the Golf data set according to Temperature

The 'Golf' data set is loaded using the Retrieve operator. The Sort operator is applied on it. The attribute name parameter is set to 'Temperature' and the sorting direction parameter is set to 'increasing'. Thus the 'Golf' data set is sorted in ascending order of the 'Temperature' attribute. The example with the smallest value of the 'Temperature' attribute becomes the first example and the example with the largest value becomes the last example of the ExampleSet.

Sorting on multiple attributes

This Example Process shows how two Sort operators can be used to sort an ExampleSet on two attributes. The 'Golf' data set is loaded using the Retrieve operator. The Sort operator is applied on it. The attribute name parameter is set to 'Temperature' and the sorting direction parameter is set to 'increasing'. Then another Sort operator is applied. This time the attribute name parameter is set to 'Humidity' and the sorting direction parameter is set to 'increasing'. Thus the 'Golf' data set is sorted in ascending order of the 'Humidity' attribute. The example with the smallest value of the 'Humidity' attribute becomes the first example and the example with the largest value becomes the last example of the ExampleSet. If some examples have the same value of the 'Humidity' attribute, they are sorted using the 'Temperature' attribute: among examples with the same 'Humidity' value, those with a smaller 'Temperature' value precede those with a higher one. This can be seen in the Results Workspace.
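This two-operator chain works because the second sort is stable: examples that tie on the second key keep the order produced by the first sort. The same trick in Python, with illustrative data (list.sort is guaranteed stable):

```python
rows = [
    {"Humidity": 80, "Temperature": 75},
    {"Humidity": 70, "Temperature": 68},
    {"Humidity": 80, "Temperature": 64},
    {"Humidity": 70, "Temperature": 72},
]

rows.sort(key=lambda r: r["Temperature"])  # first Sort operator (secondary key)
rows.sort(key=lambda r: r["Humidity"])     # second Sort operator (primary key)

# Sorted by Humidity; equal Humidity values keep ascending Temperature order.
print([(r["Humidity"], r["Temperature"]) for r in rows])
# [(70, 68), (70, 72), (80, 64), (80, 75)]
```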

Pivot

This operator rotates an ExampleSet by grouping multiple examples of the same group into single examples.

Description

The Pivot operator rotates the given ExampleSet by grouping multiple examples of the same group into single examples. The group attribute parameter specifies the grouping attribute (i.e. the attribute which identifies examples belonging to the groups). The resultant ExampleSet has n examples where n is the number of unique values of the group attribute. The index attribute parameter specifies the attribute whose values are used to identify the examples inside the groups. The values of this attribute are used to name the group attributes which are created during the pivoting. Typically the values of such an attribute capture subgroups or dates. The resultant ExampleSet has m regular attributes in addition to the group attribute, where m is the number of unique values of the index attribute. If the given ExampleSet contains example weights (i.e. an attribute with the weight role), these weights may be aggregated in each group to maintain the weightings among groups. This description can be easily understood by studying the attached Example Process.
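The grouping behaviour can be sketched in Python. This is an illustration only: the '<value>_<index>' column naming is a stand-in for the attribute names RapidMiner generates, the data is made up, and weight aggregation is omitted:

```python
from collections import defaultdict

def pivot(rows, group_key, index_key, value_key):
    """One output row per group value; one column per index value."""
    table = defaultdict(dict)
    for row in rows:
        column = f"{value_key}_{row[index_key]}"
        table[row[group_key]][column] = row[value_key]
    return [{group_key: g, **cols} for g, cols in sorted(table.items())]

sales = [
    {"customer": "A", "month": "jan", "amount": 10},
    {"customer": "A", "month": "feb", "amount": 20},
    {"customer": "B", "month": "jan", "amount": 5},
]
result = pivot(sales, "customer", "month", "amount")
print(len(result))  # 2 examples: one per unique customer (n = 2)
```

Note that customer 'B' has no 'feb' row, so its 'amount_feb' cell stays absent; in the same way pivoting can create missing values for index values that do not occur in a group.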

Differentiation

Transpose The Transpose operator simply rotates the given ExampleSet (i.e. interchanges rows and columns), whereas the Pivot operator provides additional options like grouping and handling weights. See page 612 for details.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process.

Output Ports

example set output (exa) The ExampleSet produced after pivoting is the output of this port.

original (ori) The ExampleSet that was given as input is passed without any modifications to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

group attribute (string) This parameter specifies the grouping attribute (i.e. the attribute which identifies examples belonging to the groups). The resultant ExampleSet has n examples, where n is the number of unique values of the group attribute.

index attribute (string) This parameter specifies the attribute whose values are used to identify the examples inside the groups. The values of this attribute are used to name the group attributes which are created during the pivoting. Typically the values of such an attribute capture subgroups or dates. The resultant ExampleSet has m regular attributes in addition to the group attribute, where m is the number of unique values of the index attribute.

consider weights (boolean) This parameter specifies whether attribute weights (if any) should be kept and aggregated or ignored.

weight aggregation (selection) This parameter is only available when the consider weights parameter is set to true. It specifies how example weights should be aggregated in the groups. It has the following options: average, variance, standard deviation, count, minimum, maximum, sum, mode, median, product.

skip constant attributes (boolean) This parameter specifies whether attributes should be skipped if their value never changes within a group.

data management (selection) This is an expert parameter. There are different options; users can choose any of them.

Related Documents

Transpose (612)

Tutorial Processes

Introduction to the Pivot operator


This Example Process starts with the Subprocess operator. There is a sequence of operators in this Subprocess operator that produces an ExampleSet that is easy to understand. A breakpoint is inserted after the Subprocess operator to show this ExampleSet. The Pivot operator is applied on this ExampleSet. The group attribute and index attribute parameters are set to 'group attribute' and 'index attribute' respectively. The consider weights parameter is set to true and the weight aggregation parameter is set to 'sum'. The group attribute has 5 possible values, therefore the pivoted ExampleSet has 5 examples, i.e. one for each possible value of the group attribute. The index attribute has 5 possible values, therefore the pivoted ExampleSet has 5 regular attributes (in addition to the group attribute). Here is an explanation of the values of the first example of the pivoted ExampleSet. The remaining examples follow the same idea.

The value of the group attribute of the first example of the pivoted ExampleSet is 'group0', therefore all values of this example will be derived from all examples of the input ExampleSet where the group attribute had the value 'group0'. The ids of examples with 'group0' in the input ExampleSet are 12, 16, 19 and 20. In the coming explanation these examples will be called group0 examples for simplicity.

- The value of the 'weight attribute' attribute of the pivoted ExampleSet is 11. It is the sum of the weights of the group0 examples, i.e. 4 + 4 + 0 + 3 = 11. The weights were added because the weight aggregation parameter is set to 'sum'.

- The value of the 'value attribute_index0' attribute of the pivoted ExampleSet is 4. Only two examples (ids 12 and 16) of the group0 examples had 'index0' in the index attribute. The value of the latter of these examples (id 16), i.e. 4, is selected.

- The value of the 'value attribute_index1' attribute of the pivoted ExampleSet is 1. Only one example (id 19) of the group0 examples had 'index1' in the index attribute, therefore its value (i.e. 1) is selected.

- The value of the 'value attribute_index2' attribute of the pivoted ExampleSet is undefined because no example of the group0 examples had 'index2' in the index attribute. Therefore its value is missing in the pivoted ExampleSet.

- The value of the 'value attribute_index3' attribute of the pivoted ExampleSet is 3. Only one example (id 20) of the group0 examples had 'index3' in the index attribute, therefore its value (i.e. 3) is selected.

- The value of the 'value attribute_index4' attribute of the pivoted ExampleSet is undefined because no example of the group0 examples had 'index4' in the index attribute. Therefore its value is missing in the pivoted ExampleSet.

De-Pivot

This operator transforms the ExampleSet by converting the values of the selected attributes (usually attributes that measure the same characteristic) into examples of a single attribute.

Description

This operator is usually used when your ExampleSet has multiple attributes that measure the same characteristic (possibly at different time intervals) and you want to merge these observations into a single attribute without loss of information. If the original ExampleSet has n examples and k attributes that measure the same characteristic, the ExampleSet will have k x n examples after the application of this operator. The k attributes will be combined into one attribute, which holds the n values of each of the k attributes. This can be easily understood by studying the attached Example Process.

In other words, this operator converts an ExampleSet by dividing examples which consist of multiple observations (at different times) into multiple examples, where each example covers one point in time. An index attribute is added to the ExampleSet, which denotes the actual point in time an example belongs to after the transformation.

The keep missings parameter specifies whether an example should be kept even if it has missing values for all series at a certain point in time. The create nominal index parameter is only applicable if only one time series per example exists. If it is set to true, the names of the attributes representing the single time points are used as index attribute values instead of a numeric index.

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Subprocess operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set output (exa) The selected attributes are converted into examples of a new attribute and the resultant ExampleSet is the output of this port.

original (ori) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

attribute name (list) This parameter maps a number of source attributes onto result attributes. It is used for specifying the group of attributes that you want to combine and the name of the new attribute. The attributes of a group are selected through a regular expression. There can be numerous groups, with each group having multiple attributes.

index attribute (string) This parameter specifies the name of the newly created index attribute. The index attribute is used for differentiating between examples of different attributes of a group after the transformation.

create nominal index (boolean) This parameter is only applicable if only one time series per example exists. If it is set to true, the names of the attributes representing the single time points are used as index attribute values instead of a numeric index.

keep missings (boolean) This parameter specifies whether an example should be kept even if it has missing values for all series at a certain point in time.

Tutorial Processes

Merging multiple attributes that measure the same characteristic into a single attribute

This process starts with the Subprocess operator which delivers an ExampleSet. The subprocess is used for creating a sample ExampleSet therefore it is not important to understand what is going on inside the subprocess. A breakpoint is inserted after the subprocess so that you can have a look at the ExampleSet. You can see that the ExampleSet has 14 examples and it has two attributes i.e. 'Morning' and 'Evening'. These attributes measure the temperature of an area in morning and evening respectively. We want to convert these attributes into a single attribute but we still want to be able to differentiate between morning and evening temperatures.

The De-Pivot operator is applied on this ExampleSet to perform this task. The attribute name parameter is used for specifying the group of attributes that you want to combine and the name of the new attribute. The attributes of a group are selected through a regular expression. There can be numerous groups with each group having multiple attributes. In our case, there is only one group which has all the attributes of the ExampleSet (i.e. both the 'Morning' and 'Evening' attributes). The new attribute is named 'Temperatures' and the regular expression '.*' is used for selecting all the attributes of the ExampleSet. The index attribute is used for differentiating between examples of different attributes of a group after the transformation. The name of the index attribute is set to 'Time'. The create nominal index parameter is also set to true so that the resultant ExampleSet is more self-explanatory.

Execute the process and have a look at the resultant ExampleSet. You can see that there are 28 examples in this ExampleSet. The original ExampleSet had 14 examples, and 2 attributes were grouped, therefore the resultant ExampleSet has 28 (i.e. 14 x 2) examples. There are 14 examples from the Morning attribute and 14 examples of the Evening attribute in the 'Temperatures' attribute. The 'Time' attribute explains whether an example measures morning or evening temperature.
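
The transformation above can be mimicked in plain Python. This is a sketch with invented temperature values, not the actual ExampleSet produced by the subprocess:

```python
# Two attributes measuring the same characteristic; here k = 2 and n = 2.
rows = [{"Morning": 10, "Evening": 20},
        {"Morning": 12, "Evening": 22}]

melted = []
for r in rows:
    for attr in ("Morning", "Evening"):
        # 'Time' plays the role of the nominal index attribute,
        # 'Temperatures' the role of the single combined attribute
        melted.append({"Time": attr, "Temperatures": r[attr]})

# k * n = 4 examples after de-pivoting
```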


Transpose

This operator transposes the input ExampleSet, i.e. the current rows become the columns of the output ExampleSet and the current columns become the rows of the output ExampleSet. This operator works very similarly to the well-known transpose operation for matrices.

Description

This operator transposes the input ExampleSet, i.e. the current rows become the columns of the output ExampleSet and the current columns become the rows of the output ExampleSet. In other words, every example or row becomes a column with attribute values and each attribute column becomes an example row. This operator works very similarly to the well-known transpose operation for matrices. The transpose of the transpose of a matrix is the same as the original matrix, but the same rule cannot be applied here, because the value types of the original ExampleSet and of the transpose of the transpose of an ExampleSet may be different.

If an id attribute is part of the input ExampleSet, the ids will become the names of the new attributes. The names of the old attributes will be transformed into the values of a new id attribute. All other new attributes will have the regular role after the transformation. You can use the Set Role operator after the Transpose operator to assign roles to the new attributes.

If all old attributes have the same value type, all new attributes will have the same value type. If at least one nominal attribute is part of the input ExampleSet, the type of all new attributes will be nominal. If the old attributes had mixed numerical types, the type of all new attributes will be real. This operator produces a copy of the data in the main memory. Therefore, it should not be used on very large data sets.
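
A rough sketch of this behavior in Python, on a hypothetical two-row table; converting every value to a string stands in for the promotion of all new attributes to the nominal type:

```python
# Column-oriented table with an id attribute
table = {"id": ["r1", "r2"], "a": [1, 2], "b": ["x", "y"]}

# The old attribute names become the values of the new id attribute
transposed = {"id": [k for k in table if k != "id"]}
for i, rid in enumerate(table["id"]):
    # old ids become the new attribute names; a transposed column can mix
    # value types, so every value is kept in its string (nominal) form here
    transposed[rid] = [str(table[k][i]) for k in table if k != "id"]
```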

Input Ports

example set input (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

example set output (exa) The transpose of the input ExampleSet is the output of this port.

original (ori) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Tutorial Processes

Different scenarios of Transpose

There are four different cases in this Example Process:

Case 1: The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet before the application of the Transpose operator. You can see that the 'Golf' data set has no id attribute. The attributes are of different types, including nominal attributes. Press the Run button to continue. Now the Transpose operator is applied on the 'Golf' data set. A breakpoint is inserted here so that you can see the ExampleSet after the application of the Transpose operator. Here you can see that a new attribute with the id role has been created. The values of the new id attribute are the names of the old attributes. The new attributes are named in a general format like 'att 1', 'att 2' etc. because the input ExampleSet had no id attribute. The type of all new attributes is nominal because the input ExampleSet contained attributes of different types, including at least one nominal attribute.

Case 2: The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet before the application of the Transpose operator. You can see that the 'Iris' data set has an id attribute. The attributes are of different types, including nominal attributes. Press the Run button to continue. Now the Transpose operator is applied on the 'Iris' data set. A breakpoint is inserted here so that you can see the ExampleSet after the application of the Transpose operator. Here you can see that a new attribute with the id role has been created. The values of the new id attribute are the names of the old attributes. The ids of the old ExampleSet become the names of the new attributes. The type of all new attributes is nominal because the input ExampleSet contained attributes of different types, including at least one nominal attribute.

Case 3: The 'Market-Data' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet before the application of the Transpose operator. You can see that the 'Market-Data' data set has no special attributes. The type of all attributes is integer. Press the Run button to continue. Now the Transpose operator is applied on the 'Market-Data' data set. A breakpoint is inserted here so that you can see the ExampleSet after the application of the Transpose operator. Here you can see that a new attribute with the id role has been created. The values of the new id attribute are the names of the old attributes. The new attributes are named in a general format like 'att 1', 'att 2' etc. because the input ExampleSet had no id attribute. The type of all new attributes is real because the attributes of the input ExampleSet had numerical types.


Case 4: The 'Golf-Testset' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet before the application of the Transpose operator. The Transpose operator is applied on the 'Golf-Testset' data set. Then the Transpose operator is applied on the output of the first Transpose operator. Note that the types of the attributes of the original ExampleSet and of the transpose of the transpose of the original data set are different.
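
The type change in Case 4 can be illustrated with a small Python sketch (hypothetical data; converting values to strings stands in for the promotion of new attributes to the nominal type):

```python
def transpose(table, id_col="id"):
    # ids become the new attribute names; every value is kept as a string
    # because a transposed column may mix value types
    out = {id_col: [k for k in table if k != id_col]}
    for i, rid in enumerate(table[id_col]):
        out[rid] = [str(table[k][i]) for k in table if k != id_col]
    return out

t = {"id": ["r1", "r2"], "num": [1, 2]}
twice = transpose(transpose(t))
# same shape as t, but the 'num' values are now nominal (strings)
```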

6.9. Aggregation

Aggregate

This operator performs the aggregation functions known from SQL. It provides a lot of functionality in the same format as the SQL aggregation functions. SQL aggregation functions as well as the GROUP BY and HAVING clauses can be imitated using this operator.

Description

The Aggregate operator creates a new ExampleSet from the input ExampleSet showing the results of the selected aggregation functions. Many aggregation functions are supported, including SUM, COUNT, MIN, MAX, AVERAGE and many other similar functions known from SQL. The functionality of the GROUP BY clause of SQL can be imitated by using the group by attributes parameter. You need a basic understanding of the GROUP BY clause of SQL to understand the use of this parameter because it works exactly the same way. If you want to imitate the HAVING clause known from SQL, you can do so by applying the Filter Examples operator after the Aggregate operator. This operator imitates the aggregation functions of SQL. It focuses on obtaining summary information, such as averages and counts. It can group the examples of an ExampleSet into smaller sets and apply aggregation functions on those sets. Please study the attached Example Process for a better understanding of this operator.
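
The core idea corresponds to a plain group-and-aggregate loop, sketched here in Python with an invented mini data set:

```python
from collections import defaultdict

# (Play, Temperature) pairs; hypothetical values
rows = [("yes", 85), ("yes", 75), ("no", 70)]

groups = defaultdict(list)          # imitates GROUP BY Play
for play, temperature in rows:
    groups[play].append(temperature)

# the AVERAGE aggregation function, applied per group
averages = {play: sum(v) / len(v) for play, v in groups.items()}
```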

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Filter Examples operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set (exa) The ExampleSet generated after applying the specified aggregation functions is the output of this port.

original (ori) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

use default aggregation (boolean) This parameter allows you to define the default aggregation for selected attributes. A number of parameters become available if this parameter is set to true. These parameters allow you to select the attributes and the corresponding default aggregation function.

attribute filter type (selection) This parameter allows you to select the attribute selection filter, i.e. the method you want to use for selecting attributes. It has the following options:

- all: This option simply selects all the attributes of the ExampleSet. This is the default option.

- single: This option allows the selection of a single attribute. When this option is selected another parameter (attribute) becomes visible in the Parameter View.

- subset: This option allows the selection of multiple attributes through a list. All attributes of the ExampleSet are present in the list; the required attributes can be easily selected. This option will not work if the meta data is not known. When this option is selected another parameter becomes visible in the Parameter View.

- regular expression: This option allows you to specify a regular expression for attribute selection. When this option is selected some other parameters (regular expression, use except expression) become visible in the Parameter View.

- value type: This option allows the selection of all attributes of a particular type. It should be noted that types are hierarchical; for example, the real and integer types both belong to the numeric type. Users should have a basic understanding of the type hierarchy when selecting attributes through this option. When this option is selected some other parameters (value type, use value type exception) become visible in the Parameter View.

- block type: This option works similarly to the value type option. It allows the selection of all attributes of a particular block type. It should be noted that block types may be hierarchical; for example, the value_series_start and value_series_end block types both belong to the value_series block type. When this option is selected some other parameters (block type, use block type exception) become visible in the Parameter View.

- no missing values: This option simply selects all attributes of the ExampleSet which do not contain a missing value in any example. Attributes that have even a single missing value are removed.

- numeric value filter: When this option is selected another parameter (numeric condition) becomes visible in the Parameter View. All numeric attributes whose examples all satisfy the mentioned numeric condition are selected. Please note that all nominal attributes are also selected irrespective of the given numerical condition.

attribute (string) The required attribute can be selected from this option. The attribute name can be selected from the drop down box of the attribute parameter if the meta data is known.

attributes (string) The required attributes can be selected from this option. It opens a new window with two lists. All attributes are present in the left list. Attributes can be shifted to the right list, which is the list of selected attributes.

regular expression (string) The attributes whose names match this expression will be selected. A regular expression is a very powerful tool but needs a detailed explanation for beginners. It is always good to specify the regular expression through the edit and preview regular expression menu. This menu gives a good idea of regular expressions and it also allows you to try different expressions and preview the results simultaneously.

use except expression (boolean) If enabled, an exception to the first regular expression can be specified. When this option is selected another parameter (except regular expression) becomes visible in the Parameter View.

except regular expression (string) This option allows you to specify a regular expression. Attributes matching this expression will be filtered out even if they match the first regular expression (the regular expression that was specified in the regular expression parameter).

value type (selection) The type of attributes to be selected can be chosen from a drop down list.

use value type exception (boolean) If enabled, an exception to the selected type can be specified. When this option is enabled, another parameter (except value type) becomes visible in the Parameter View.

except value type (selection) The attributes matching this type will not be selected even if they match the previously mentioned type, i.e. the value type parameter's value.

block type (selection) The block type of attributes to be selected can be chosen from a drop down list.

use block type exception (boolean) If enabled, an exception to the selected block type can be specified. When this option is selected another parameter (except block type) becomes visible in the Parameter View.

except block type (selection) The attributes matching this block type will not be selected even if they match the previously mentioned block type, i.e. the block type parameter's value.

numeric condition (string) The numeric condition for testing examples of numeric attributes is specified here. For example the numeric condition '> 6' will keep all nominal attributes and all numeric attributes having a value greater than 6 in every example. A combination of conditions is possible: '> 6 && < 11' or '<= 5 || < 0'. But && and || cannot be used together in one numeric condition; conditions like '(> 0 && < 2) || (> 10 && < 12)' are not allowed because they use both && and ||. Use a blank space after '>', '=' and '<', e.g. '<5' will not work, so use '< 5' instead.

include special attributes (boolean) The special attributes are attributes with special roles which identify the examples. In contrast, regular attributes simply describe the examples. Special attributes are: id, label, prediction, cluster, weight and batch. By default all special attributes are selected irrespective of the specified conditions. If this parameter is set to true, special attributes are also tested against the specified conditions and only those attributes that satisfy the conditions are selected.

invert selection (boolean) If this parameter is set to true, it acts as a NOT gate: it reverses the selection. In that case all the selected attributes are unselected and the previously unselected attributes are selected. For example, if attribute 'att1' is selected and attribute 'att2' is unselected prior to setting this parameter, then after setting it 'att1' will be unselected and 'att2' will be selected.

default aggregation function This parameter is only available when the use default aggregation parameter is set to true. It is used for specifying the default aggregation function for the selected attributes.

aggregation attributes This parameter is one of the most important parameters of the operator. It allows you to select attributes and the aggregation functions to apply on them. Many aggregation functions are available, including count, average, minimum, maximum, variance and many more.

group by attributes This operator can group the examples of the input ExampleSet into smaller groups using this parameter. The aggregation functions are applied on these groups. This parameter allows the Aggregate operator to replicate the functionality of the GROUP BY clause known from SQL.

count all combinations (boolean) This parameter indicates whether all possible combinations of the values of the group by attributes are counted, even if they do not occur. All possible combinations may result in a huge number, so handle this parameter carefully.

only distinct (boolean) This parameter indicates whether only examples with distinct values for the aggregation attribute should be used for the calculation of the aggregation function.

ignore missings (boolean) This parameter indicates whether missing values should be ignored and the aggregation functions should be applied only on existing values. If this parameter is not set to true then the aggregated value will be a missing value in the presence of missing values in the selected attribute.

Tutorial Processes

Imitating an SQL aggregation query using the Aggregate operator


This Example Process discusses an arbitrary scenario, describes how this scenario could be handled using SQL aggregation functions, and then imitates the SQL solution in RapidMiner. The Aggregate operator plays a key role in this process.

Let us assume a scenario where we want to apply certain aggregation functions on the Golf data set. We don't want to include examples where the Outlook attribute has the value 'overcast'. We group the remaining examples of the 'Golf' data set by values of the Play and Wind attributes. We wish to find the average Temperature and average Humidity for these groups. Once these averages have been calculated, we want to see only those examples where the average Temperature is above 71. Lastly, we want to see the results in ascending order of the average Temperature.

This problem can be solved by the following SQL query:

SELECT Play, Wind, AVG (Temperature), AVG (Humidity)

FROM Golf

WHERE Outlook NOT LIKE 'overcast'

GROUP BY Play, Wind

HAVING AVG (Temperature) > 71

ORDER BY AVG (Temperature)

The SELECT clause selects the attributes to be displayed. The FROM clause specifies the data set. The WHERE clause pre-excludes the examples where the Outlook attribute has value 'overcast'. The GROUP BY clause groups the data set according to the specified attributes. The HAVING clause filters the results after the aggregation functions have been applied. Finally the ORDER BY clause sorts the results in ascending order of the Temperature averages.

Here is how this scenario can be tackled using RapidMiner. First of all the Retrieve operator is used for loading the 'Golf' data set. This is similar to the FROM clause. Then the Select Attributes operator is applied on it to select the required attributes. This works a little differently from the SQL query. If we selected only the Play and Wind attributes as in the query, then the following operators could not be applied. Thus we select all attributes for now. You will see later that the attribute set is reduced automatically, so the Select Attributes operator is not really required here. Then the Filter Examples operator is applied to pre-exclude examples where the Outlook attribute has the value 'overcast'. This is similar to the WHERE clause of SQL. Then the Aggregate operator is applied on the remaining examples. The Aggregate operator performs a number of tasks here. Firstly, it specifies the aggregation functions using the aggregation attributes parameter. We need averages of the Temperature and Humidity attributes; this is specified using the aggregation attributes parameter. Secondly, we do not want the averages of the entire data set but the averages by groups, grouped by the Play and Wind attribute values. These groups are specified using the group by attributes parameter of the Aggregate operator. Thirdly, the required attributes are automatically filtered by this operator: only those attributes appear in the resultant data set that have been specified in the Aggregate operator. Next, we are interested only in those examples where the average Temperature is greater than 71. This condition can be applied using the Filter Examples operator. This step is similar to the HAVING clause. Lastly we want the results to be sorted. The Sort operator is used to do the required sorting. This step is very similar to the ORDER BY clause. Breakpoints are inserted after every operator in the Example Process so that you can understand the part played by each operator.
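
The whole chain (WHERE, GROUP BY, aggregation, HAVING, ORDER BY) can be sketched in plain Python. The rows below are invented stand-ins for the 'Golf' data set, so only the structure of the computation matters, not the numbers:

```python
from collections import defaultdict

golf = [
    # (Outlook, Play, Wind, Temperature, Humidity) -- hypothetical rows
    ("sunny",    "no",  "false", 85, 85),
    ("sunny",    "no",  "true",  80, 90),
    ("overcast", "yes", "false", 83, 78),
    ("rain",     "yes", "false", 70, 96),
    ("rain",     "yes", "true",  65, 70),
]

def avg(values):
    return sum(values) / len(values)

kept = [r for r in golf if r[0] != "overcast"]        # WHERE / Filter Examples
groups = defaultdict(list)
for outlook, play, wind, temp, hum in kept:           # GROUP BY / Aggregate
    groups[(play, wind)].append((temp, hum))
result = [(play, wind, avg([t for t, _ in v]), avg([h for _, h in v]))
          for (play, wind), v in groups.items()]
result = [r for r in result if r[2] > 71]             # HAVING / Filter Examples
result.sort(key=lambda r: r[2])                       # ORDER BY / Sort
```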

6.10. Set Operations

Append

This operator builds a merged ExampleSet from two or more compatible ExampleSets by adding all examples into a combined set.

Description

This operator builds a merged ExampleSet from two or more compatible ExampleSets by adding all examples into a combined set. All input ExampleSets must have the same attribute signature. This means that all ExampleSets must have the same number of attributes, and the names and roles of the attributes should be the same in all input ExampleSets. Please note that the merged ExampleSet is built in memory, so this operator might not be applicable for merging huge data set tables from a database. In that case other preprocessing tools should be used that aggregate, join and merge tables into one table which is then used by RapidMiner.
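
A minimal sketch of the compatibility requirement and the merge itself, in Python with two invented one-example sets:

```python
a = [{"Outlook": "sunny", "Play": "no"}]
b = [{"Outlook": "rain",  "Play": "yes"}]

# both inputs must share the same attribute signature (same names)
assert set(a[0]) == set(b[0])

merged = a + b   # all examples of all inputs, kept in memory
```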

Input Ports

example set (exa) The Append operator can have multiple inputs. When one input port is connected, another input port becomes available, ready to accept another input (if any). This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input. It is essential that meta data is attached to the input data because the attributes are specified in the meta data. The Retrieve operator provides meta data along with the data.

Output Ports

merged set (mer) The merged ExampleSet is delivered through this port.


Parameters

data management (selection) This is an expert parameter. A long list is provided; users can select any option from this list.

Tutorial Processes

Merging Golf and Golf-Testset data sets

In this process the 'Golf' data set and 'Golf-Testset' data set are loaded using Retrieve operators. Breakpoints are inserted after the Retrieve operators so that you can have a look at the input ExampleSets. When you run the process, first you see the 'Golf' data set. As you can see, it has 14 examples. When you continue the process, you will see the 'Golf-Testset' data set. It also has 14 examples. The Append operator is applied to merge these two ExampleSets into a single ExampleSet. The merged ExampleSet has all examples from all input ExampleSets, thus it has 28 examples. You can see that both input ExampleSets had the same number of attributes and the same names and roles of attributes. This is why the Append operator could produce a merged ExampleSet.
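The Append semantics can be sketched in a few lines of plain Python (not RapidMiner code), assuming each example is a dictionary keyed by attribute name; the attribute names and values below are invented for illustration:

```python
# Concatenate example sets that share the same attribute signature
# (same attribute names); otherwise the merge is rejected.
def append(*example_sets):
    signature = set(example_sets[0][0].keys())
    for es in example_sets:
        for example in es:
            if set(example.keys()) != signature:
                raise ValueError("incompatible attribute signature")
    # simply stack all examples into one combined set
    return [example for es in example_sets for example in es]

golf = [{"Outlook": "sunny", "Play": "yes"}, {"Outlook": "rain", "Play": "no"}]
golf_testset = [{"Outlook": "overcast", "Play": "yes"}]
merged = append(golf, golf_testset)  # 3 examples, same attributes
```

As in the tutorial, appending two 14-example sets with identical signatures would yield a single 28-example set.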


Join

This operator joins two ExampleSets using specified key attribute(s) of the two ExampleSets.

Description

The Join operator joins two ExampleSets using one or more attributes of the input ExampleSets as key attributes. Identical values of the key attributes indicate matching examples. An attribute with id role is selected as key by default but an arbitrary set of one or more attributes can be chosen as key. Four types of joins are possible: inner, left, right and outer join. All these types of joins are explained in the parameters section.

Input Ports

left (lef) The left input port expects an ExampleSet. This ExampleSet will be used as the left ExampleSet for the join.
right (rig) The right input port expects an ExampleSet. This ExampleSet will be used as the right ExampleSet for the join.

Output Ports

join (joi) The join of the left and right ExampleSets is delivered through this port.


Parameters

remove double attributes (boolean) This parameter indicates whether double attributes should be removed or renamed. Double attributes are attributes that are present in both ExampleSets. If this parameter is checked, of the attributes present in both ExampleSets only the one from the left ExampleSet is taken and the one from the right ExampleSet is discarded. The key attributes are always taken from the left ExampleSet. Please note that this check for double attributes is only applied to regular attributes. Special attributes of the right ExampleSet which do not exist in the left ExampleSet are simply added; if they already exist they are simply skipped.
join type (selection) This parameter specifies which join should be performed. You can easily understand these joins by studying the Example Process. Four types of joins are supported:

- inner The resultant ExampleSet will contain only those examples where the key attributes of both input ExampleSets match, i.e. have the same value.

- left This is also called left outer join. The resultant ExampleSet will contain all records from the left ExampleSet. If no matching records were found in the right ExampleSet, its fields will contain the null value, i.e. '?' will be stored there. The left join will always contain the results of the inner join; however, it can contain some examples that have no matching examples in the right ExampleSet.

- right This is also called right outer join. The resultant ExampleSet will contain all records from the right ExampleSet. If no matching records were found in the left ExampleSet, its fields will contain the null value, i.e. '?' will be stored there. The right join will always contain the results of the inner join; however, it can contain some examples that have no matching examples in the left ExampleSet.

- outer This is also called full outer join. This type of join combines the results of the left and the right join. All examples from both ExampleSets will be part of the resultant ExampleSet, whether the matching key attribute value exists in the other ExampleSet or not. If no matching key attribute value was found, the corresponding resultant fields will have a null value. The outer join will always contain the results of the inner join; however, it can contain some examples that have no matching examples in the other ExampleSet.

use id attribute as key (boolean) This parameter indicates whether the attribute with the id role should be used as the key attribute. This option is checked by default. If it is unchecked, you have to specify the key attributes for both the left and the right ExampleSet. Identical values of the key attributes indicate matching examples.
key attributes This parameter specifies the attribute(s) which are used as the key attributes. Identical values of the key attributes indicate matching examples. For each key attribute of the left ExampleSet a corresponding one from the right ExampleSet has to be chosen. Choosing appropriate key attributes is critical for obtaining the desired results. This parameter is only available when the use id attribute as key parameter is unchecked.
keep both join attributes If checked, both columns of a join pair are kept. Usually this is unnecessary since both attributes are identical. It may be useful to keep such a column if there are missing values on one side.
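The four join types can be illustrated with a small Python sketch (not RapidMiner code). '?' stands for the null value mentioned above; for simplicity the sketch assumes a single key attribute with unique values on each side:

```python
# Inner, left, right and full outer join on one key attribute.
def join(left, right, key, join_type="inner"):
    left_by_key = {l[key]: l for l in left}
    right_by_key = {r[key]: r for r in right}
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]
    keys = [l[key] for l in left]
    if join_type == "right":
        keys = [r[key] for r in right]
    elif join_type == "outer":
        keys = keys + [k for k in (r[key] for r in right) if k not in left_by_key]
    if join_type == "inner":
        keys = [k for k in keys if k in right_by_key]
    out = []
    for k in keys:
        row, l, r = {key: k}, left_by_key.get(k), right_by_key.get(k)
        for c in left_cols:
            row[c] = l[c] if l else "?"   # unmatched left side -> null
        for c in right_cols:
            row[c] = r[c] if r else "?"   # unmatched right side -> null
        out.append(row)
    return out

left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 2, "b": "u"}, {"id": 3, "b": "v"}]
```

With these two toy sets, the inner join keeps only id 2, the left join keeps ids 1 and 2 (with '?' for the missing 'b'), the right join keeps ids 2 and 3, and the outer join keeps all three ids.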

Tutorial Processes

Different types of join

The last operator of this process is the Join operator. The sequence of operators leading to the left input port of the Join operator is used to generate the left ExampleSet. Similarly, the sequence of operators leading to the right input port is used to generate the right ExampleSet. The sequences of operators leading to the left and right input ports of the Join operator are quite similar.

In both cases the Retrieve operator is used to load the 'Golf' data set. Then the


Generate Attribute operator is applied on it to generate a dummy attribute. All attributes of the 'Golf' data set other than the 'Play' attribute and the newly generated attribute are discarded because the keep all parameter is unchecked. Then the Generate ID operator is applied to generate an attribute with the id role. This attribute will later be used as the key attribute for joining.

One difference is that for the left ExampleSet the name of the attribute generated by the Generate Attribute operator is 'Golf 1 attribute', while for the right ExampleSet the name of this attribute is 'Golf 2 attribute'. The other major difference is the value of the offset parameter of the Generate ID operator: for the left ExampleSet it is set to 0 and for the right ExampleSet it is set to 7. Thus the left ExampleSet has ids from 1 to 14 and the right ExampleSet has ids from 8 to 21. Breakpoints are inserted after the Generate ID operators so that you can have a look at the left and right ExampleSets before the application of the Join operator.

The use id attribute as key parameter of the Join operator is set to true, thus the attributes with the id role are used to join the left and right ExampleSets. The remove double attributes parameter is also checked, thus regular attributes common to both input ExampleSets appear just once in the resultant ExampleSet. Only the 'Play' and 'id' attributes are common to both ExampleSets, but as they are not regular attributes the remove double attributes parameter has no effect on them. As mentioned earlier, the key attributes are always taken from the left ExampleSet. Please note that the check for double attributes is only applied to regular attributes. Special attributes of the right ExampleSet which do not exist in the left ExampleSet are simply added; if they already exist they are simply skipped.

In this example process the join type is set as inner join. You can change it to other values and run the process again. Here is an explanation of results that are obtained on each value of the join type parameter.

If inner join is selected as the join type, the resultant ExampleSet has examples with ids from 8 to 14. This is because the inner join delivers only those examples where the key attributes of both input ExampleSets have the same values. In this example process, the left ExampleSet has ids from 1 to 14 and the right ExampleSet has ids from 8 to 21. Thus the examples with ids from 8 to 14 have equal values of the key attribute (i.e. the id attribute).

If left join is selected as the join type, the resultant ExampleSet has examples with ids from 1 to 14. This is because the left join delivers all examples of the left ExampleSet with the corresponding values of the right ExampleSet. If there is no match present in the right ExampleSet, a null value is placed there. This is why the 'Golf 2 attribute' has null values for ids 1 to 7.

If right join is selected as the join type, the resultant ExampleSet has examples with ids from 8 to 21. This is because the right join delivers all examples of the right ExampleSet with the corresponding values of the left ExampleSet. If there is no match present in the left ExampleSet, a null value is placed there. This is why the 'Golf 1 attribute' has null values for ids 15 to 21.

If outer join is selected as the join type, the resultant ExampleSet has examples with ids from 1 to 21. This is because the outer join combines the results of the left and right join. All examples from both ExampleSets will be part of the resultant ExampleSet, whether the matching key attribute value exists in the other ExampleSet or not. If no matching key attribute value was found, the corresponding resultant fields have a null value. In this example process the left ExampleSet has ids from 1 to 14 and the right ExampleSet has ids from 8 to 21. Thus examples with ids from 1 to 21 are part of the resultant ExampleSet. The 'Golf 2 attribute' has null values in the examples with ids from 1 to 7. Similarly, the 'Golf 1 attribute' has null values in the examples with ids from 15 to 21. There are no null values in the examples with ids 8 to 14. The 'Play' attribute has null values in the examples with ids from 15 to 21. This is because special attributes are taken from the left ExampleSet, which in this example process has no values of the 'Play' attribute corresponding to ids 15 to 21.


Set Minus

This operator returns those examples of the ExampleSet (given at the example set input port) whose IDs are not contained in the other ExampleSet (given at the subtrahend port). Both ExampleSets must have an ID attribute, and the ID attributes of both ExampleSets must be of the same type.

Description

This operator performs a set minus on two ExampleSets on the basis of the ID attribute i.e. the resulting ExampleSet contains all the examples of the minuend ExampleSet (given at the example set input port) whose IDs do not appear in the subtrahend ExampleSet (given at the subtrahend port). It is important to note that the ExampleSets do not need to have the same number of columns or the same data types. The operation only depends on the ID attributes of the ExampleSets. It should be made sure that the ID attributes of both ExampleSets are of the same type i.e. either both are nominal or both are numerical.
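The ID-based set minus described above can be sketched in plain Python (attribute names invented; only the 'id' values matter, mirroring the tutorial's 1 to 14 versus 11 to 210 ranges):

```python
# Keep the minuend examples whose id does not occur in the subtrahend.
# The rest of the meta data (columns, types) is irrelevant to the result.
def set_minus(minuend, subtrahend):
    subtrahend_ids = {example["id"] for example in subtrahend}
    return [example for example in minuend if example["id"] not in subtrahend_ids]

golf = [{"id": i, "Outlook": "sunny"} for i in range(1, 15)]   # ids 1..14
poly = [{"id": i, "a1": 0.0} for i in range(11, 211)]          # ids 11..210
remaining = set_minus(golf, poly)                              # ids 1..10
```

Swapping the arguments, as in the tutorial, returns the 'Polynomial' examples with ids 15 to 210 instead.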


Differentiation

Intersect The Set Minus and Intersect operators can be considered as opposite of each other. The Intersect operator performs a set intersect on two ExampleSets on the basis of the ID attribute i.e. the resulting ExampleSet contains all the examples of the first ExampleSet whose IDs appear in the second ExampleSet. See page ?? for details.

Input Ports

example set input (exa) This input port expects an ExampleSet. In the attached Example Process it is the output of a Generate ID operator, because this operator only works if the ExampleSets have an ID attribute.
subtrahend (sub) This input port expects an ExampleSet. In the attached Example Process it is the output of a Generate ID operator, because this operator only works if the ExampleSets have an ID attribute.

Output Ports

example set output (exa) The ExampleSet with the remaining examples (i.e. the examples remaining after the set minus) of the minuend ExampleSet is the output of this port.
original (ori) The ExampleSet that was given as input (at the example set input port) is passed through this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Related Documents

Intersect (??)


Tutorial Processes

Introduction to the Set Minus operator

The 'Golf' data set is loaded using the Retrieve operator. The Generate ID operator is applied on it with the offset parameter set to 0. Thus the ids of the 'Golf' data set are from 1 to 14. A breakpoint is inserted here so you can have a look at the 'Golf' data set. The 'Polynomial' data set is loaded using the Retrieve operator. The Generate ID operator is applied on it with the offset parameter set to 10. Thus the ids of the 'Polynomial' data set are from 11 to 210. A breakpoint is inserted here so you can have a look at the 'Polynomial' data set.

The Set Minus operator is applied next. The 'Golf' data set is provided at the example set input port and the 'Polynomial' data set is provided at the subtrahend port. The order of ExampleSets is very important. The Set Minus operator compares the ids of the 'Golf' data set with the ids of the 'Polynomial' data set and then returns only those examples of the 'Golf' data set whose id is not present in the 'Polynomial' data set. The 'Golf' data set ids are from 1 to 14 and the 'Polynomial' data set ids are from 11 to 210. Thus 'Golf' data set examples with ids 1 to 10 are returned by the Set Minus operator. It is important to note that the meta data of both ExampleSets is very different but it does not matter because the Set Minus operator only depends on the ID attribute.

If the ExampleSets are switched at the input ports of the Set Minus operator the results will be very different. In this case the Set Minus operator returns only those examples of the 'Polynomial' data set whose id is not present in the 'Golf' data set. The 'Golf' data set ids are from 1 to 14 and the 'Polynomial' data set ids are from 11 to 210. Thus the 'Polynomial' data set examples with ids 15 to 210 are returned by the Set Minus operator.


Union

This operator builds the union of the input ExampleSets. The input ExampleSets are combined in such a way that the attributes and examples of both input ExampleSets are part of the resultant union ExampleSet.

Description

The Union operator builds the superset of the features of both input ExampleSets, such that all regular attributes of both ExampleSets are part of the superset. Attributes that are common to both ExampleSets are not repeated twice in the superset; a single attribute is created that holds the data of both ExampleSets. If the special attributes of both input ExampleSets are compatible with each other, only one special attribute is created in the superset, holding the examples of both input ExampleSets. If the special attributes are not compatible, the special attributes of the first ExampleSet are kept. If both ExampleSets have attributes with the same name, these should be compatible with each other; otherwise you will get an error message. This can be understood by studying the attached Example Process.
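A simplified, name-based sketch of the union semantics in plain Python (RapidMiner additionally reconciles attribute roles, which this sketch ignores): attribute names shared by both sets are kept once, and attributes missing in one set receive the null value '?':

```python
# Build the attribute superset, then stack examples from both sets.
def union(set1, set2):
    attrs1 = list(set1[0].keys())
    attrs2 = [a for a in set2[0].keys() if a not in attrs1]
    all_attrs = attrs1 + attrs2          # superset of attributes, no repeats
    out = []
    for es in (set1, set2):
        for example in es:
            # attributes absent from this set get the null value '?'
            out.append({a: example.get(a, "?") for a in all_attrs})
    return out

golf = [{"Outlook": "sunny", "Play": "yes"}]
iris = [{"a1": 5.1, "label": "Iris-setosa"}]
combined = union(golf, iris)   # 2 examples over the attribute superset
```

This mirrors the tutorials below: with compatible meta data the result behaves like Append, while with different meta data the examples of each set have '?' in the other set's attributes.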

Input Ports

example set 1 (exa) This input port expects an ExampleSet. In the attached Example Process it is the output of a Retrieve operator. The output of other operators can also be used as input. It is essential that meta data is attached to the input data, because attributes are specified in the meta data.
example set 2 (exa) This input port expects an ExampleSet. In the attached Example Process it is the output of a Retrieve operator. The output of other operators can also be used as input. It is essential that meta data is attached to the input data, because attributes are specified in the meta data.

Output Ports

union (uni) The union of the input ExampleSets is delivered through this port.

Tutorial Processes

Union of the Golf and Golf-Testset data sets

In this process the 'Golf' data set and the 'Golf-Testset' data set are loaded using Retrieve operators. Breakpoints are inserted after the Retrieve operators so that you can have a look at the input ExampleSets. When you run the process, first you see the 'Golf' data set. As you can see, it has 14 examples. When you continue the process, you will see the 'Golf-Testset' data set. It also has 14 examples. Note that the meta data of both ExampleSets is almost the same. The Union operator is applied to combine these two ExampleSets into a single ExampleSet. The combined ExampleSet has all attributes and examples of the input ExampleSets, thus it has 28 examples. You can see that both input ExampleSets had the same number of attributes and the same names and roles of attributes. This is why the Union ExampleSet also has the same number of attributes with the same names and roles. Here the Union operator behaves like the Append operator, i.e. it simply combines the examples of two ExampleSets with compatible meta data.

Union of the Golf and Iris data sets

In this process the 'Golf' data set and the 'Iris' data set are loaded using Retrieve operators. Breakpoints are inserted after the Retrieve operators so that you can have a look at the input ExampleSets. When you run the process, first you see the 'Golf' data set. As you can see, it has 14 examples. When you continue the process, you will see the 'Iris' data set. It has 4 regular and 2 special attributes with 150 examples. Note that the meta data of both ExampleSets is very different. The Union operator is applied to combine these two ExampleSets into a single ExampleSet. The combined ExampleSet has all attributes and examples of the input ExampleSets, thus it has 164 (14+150) examples. Note that the 'Golf' data set has an attribute with label role: the 'Play' attribute. The 'Iris' data set also has an attribute with label role: the 'label' attribute.


As these two label attributes are not compatible, only the label attribute of the first ExampleSet is kept. The examples of 'Iris' data set have null values in this attribute of the resultant Union ExampleSet.

Union of the Golf (with id attribute) and Iris data sets

In this process the 'Golf' data set and the 'Iris' data set are loaded using Retrieve operators. The Generate ID operator is applied on the 'Golf' data set to generate nominal ids starting from id 1. Breakpoints are inserted before the Union operator so that you can have a look at the input ExampleSets. When you run the process, first you see the 'Golf' data set. As you can see, it has 14 examples and two special attributes. When you continue the process, you will see the 'Iris' data set. It has 4 regular and 2 special attributes with 150 examples. Note that the meta data of both ExampleSets is very different. The Union operator is applied to combine these two ExampleSets into a single ExampleSet. The combined ExampleSet has all attributes and examples of the input ExampleSets, thus it has 164 (14+150) examples. Note that the 'Golf' data set has an attribute with label role: the 'Play' attribute. The 'Iris' data set also has an attribute with label role: the 'label' attribute. As these two label attributes are not compatible, only the label attribute of the first ExampleSet is kept. The examples of the 'Iris' data set have null values in this attribute of the union ExampleSet. Also note that both input ExampleSets have id attributes. The names of these attributes are the same and both have nominal values, thus these two attributes are compatible with each other and a single id attribute is created in the resultant Union ExampleSet. Also note that the values of the ids are not unique in the resultant ExampleSet.

Superset

This operator takes two ExampleSets as input and adds new features of the first ExampleSet to the second ExampleSet and vice versa to generate two supersets. The resultant supersets have the same set of attributes but the examples may be different.

Description

The Superset operator generates supersets of the given ExampleSets by adding new features of one ExampleSet to the other ExampleSet. The values of the new features are set to missing values in the supersets. This operator delivers two supersets as output:


1. The first has all attributes and examples of the first ExampleSet + all attributes of the second ExampleSet (with missing values)

2. The second has all attributes and examples of the second ExampleSet + all attributes of the first ExampleSet (with missing values)

Thus both supersets have the same set of regular attributes but the examples may be different. It is important to note that the supersets can have only one special attribute of a kind. By default this operator adds only new 'regular' attributes to the other ExampleSet for generating supersets. For example, if both input ExampleSets have a label attribute then the first superset will have all attributes of the first ExampleSet (including label) + all regular attributes of the second ExampleSet. The second superset will behave correspondingly. The include special attributes parameter can be used for changing this behavior. But it should be used carefully because even if this parameter is set to true, the resultant supersets can have only one special attribute of a kind. Please study the attached Example Process for better understanding.
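The two outputs described above can be sketched in plain Python (a simplified illustration that treats all attributes alike and ignores the special-attribute handling discussed above; attribute names are invented):

```python
# Each output keeps its own examples but gains the other set's new
# attributes, filled with the missing value '?'.
def superset(set1, set2):
    def extend(own, other):
        new_attrs = [a for a in other[0] if a not in own[0]]
        return [dict(example, **{a: "?" for a in new_attrs}) for example in own]
    return extend(set1, set2), extend(set2, set1)

golf = [{"Outlook": "sunny", "Play": "yes"}]
iris = [{"a1": 5.1, "a2": 3.5}]
sup1, sup2 = superset(golf, iris)   # same attribute set, different examples
```

Both outputs end up with the same attribute set, while each keeps only its own examples, exactly as the numbered points above describe.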

Input Ports

example set 1 (exa) This input port expects an ExampleSet. In the attached Example Process it is the output of a Retrieve operator. The output of other operators can also be used as input. It is essential that meta data is attached to the input data, because attributes are specified in the meta data.
example set 2 (exa) This input port expects an ExampleSet. In the attached Example Process it is the output of a Retrieve operator. The output of other operators can also be used as input. It is essential that meta data is attached to the input data, because attributes are specified in the meta data.


Output Ports

superset 1 (sup) The first superset of the input ExampleSets is delivered through this port.
superset 2 (sup) The second superset of the input ExampleSets is delivered through this port.

Parameters

include special attributes (boolean) This parameter indicates whether the special attributes should be included in the generation of the supersets. This parameter should be used carefully, especially if both ExampleSets have the same special attributes, because the resultant supersets can have only one special attribute of a kind.

Tutorial Processes

Generating supersets of the Golf and Iris data sets

In this process the 'Golf' and 'Iris' data sets are loaded using Retrieve operators. Breakpoints are inserted after the Retrieve operators so that you can have a look at the input ExampleSets. When you run the process, first you see the 'Golf' data set. It has four regular and one special attribute with 14 examples. When you continue the process, you will see the 'Iris' data set. It has four regular and two special attributes with 150 examples. Note that the meta data of both ExampleSets is very different. The Superset operator is applied to generate supersets of these two ExampleSets. The resultant supersets can be seen in the Results Workspace. You can see that one superset has all attributes and examples of the 'Iris' data set + 4 regular attributes of the 'Golf' data set (with missing values). The other superset has all attributes and examples of the 'Golf' data set + 4 regular attributes of the 'Iris' data set (with missing values).

7 Modeling

Default Model

This operator generates a model that provides the specified default value as prediction.

Description

The Default Model operator generates a model that predicts the specified default value for the label in all examples. The method used to compute the default value can be selected through the method parameter. The default value can be the median, mode or average of the label values. For a numeric label, a constant default value can be specified through the constant parameter. For a nominal label, the values of an attribute can be used as predictions; the attribute can be selected through the attribute parameter. This operator should not be used for 'actual' prediction tasks, but it can be used for comparing the results of 'actual' learning schemes with guessing.
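The idea can be sketched in a few lines of plain Python (not RapidMiner code; the function names are invented): 'training' computes a single default value, and 'prediction' returns it for every example. The methods shown mirror the mode, average and median options of the method parameter:

```python
# A default model: one value learned from the labels, predicted everywhere.
from statistics import mean, mode

def train_default_model(labels, method="mode"):
    if method == "mode":
        default = mode(labels)                      # most frequent label value
    elif method == "average":
        default = mean(labels)                      # numeric labels only
    elif method == "median":
        default = sorted(labels)[len(labels) // 2]  # simple median pick
    # the "model" ignores the attributes of each example entirely
    return lambda examples: [default for _ in examples]

model = train_default_model(["Mine", "Mine", "Rock"], method="mode")
predictions = model([{"a1": 0.1}, {"a1": 0.9}])  # every example -> 'Mine'
```

This is exactly why such a model is only useful as a baseline to compare real learners against guessing.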


Input Ports training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The default model is delivered from this output port. This model can now be applied on unseen data sets for the prediction of the label attribute. It should not be used for 'actual' prediction tasks, but it can be used for comparing the results of 'actual' learning schemes with guessing.
example set (exa) The ExampleSet that was given as input is passed through this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

method (selection) This parameter specifies the method for computing the default value. The default value can be the median, mode or average of the label values. For a numeric label, a constant default value can be specified through the constant parameter. For a nominal label, the values of an attribute can be used as predictions; the attribute can be selected through the attribute parameter.
constant (real) This parameter is only available when the method parameter is set to 'constant'. It specifies a constant default value for a numeric label.
attribute (string) This parameter is only available when the method parameter is set to 'attribute'. It specifies the attribute to take the predicted values from. This option is only applicable to nominal labels. It should be made sure that the selected attribute has the same set of possible values as the label.

7.1 Classification and Regression

Tutorial Processes

Using the Default Model operator with the 'mode' method

The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that there are two possible label values, i.e. 'Rock' and 'Mine'. The most frequently occurring label value is 'Mine'. The Split Validation operator is applied on this ExampleSet for training and testing a classification model. The Default Model operator is applied in the training subprocess of the Split Validation operator. The method parameter of the Default Model operator is set to 'mode', thus the most frequently occurring label value (i.e. 'Mine') will be used as the prediction in all examples. The Apply Model operator is used in the testing subprocess for applying the model generated by the Default Model operator. A breakpoint is inserted here so that you can have a look at the labeled ExampleSet. You can see that all examples have been predicted as 'Mine'. This labeled ExampleSet is used by the Performance operator for measuring the performance of the model. The default model and its performance vector are connected to the output and can be seen in the Results Workspace.


K-NN

This operator generates a k-Nearest Neighbor model from the input ExampleSet. This model can be a classification or regression model depending on the input ExampleSet.

Description

The k-Nearest Neighbor algorithm is based on learning by analogy, that is, by comparing a given test example with training examples that are similar to it. The training examples are described by n attributes. Each example represents a point in an n-dimensional space. In this way, all of the training examples are stored in an n-dimensional pattern space. When given an unknown example, a k-nearest neighbor algorithm searches the pattern space for the k training examples that are closest to the unknown example. These k training examples are the k 'nearest neighbors' of the unknown example. 'Closeness' is defined in terms of a distance metric, such as the Euclidean distance.

The k-nearest neighbor algorithm is amongst the simplest of all machine learning algorithms: an example is classified by a majority vote of its neighbors, with the example being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). If k = 1, the example is simply assigned to the class of its nearest neighbor. The same method can be used for regression, by simply assigning the label value for the example to be the average of the values of its k nearest neighbors. It can be useful to weight the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones.

The neighbors are taken from a set of examples for which the correct classification (or, in the case of regression, the value of the label) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

The basic k-Nearest Neighbor algorithm is composed of two steps:


1. Find the k training examples that are closest to the unseen example.

2. Take the most commonly occurring classification for these k examples (or, in the case of regression, take the average of these k label values).
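The two steps above can be sketched for classification with the Euclidean distance (an illustrative sketch, not RapidMiner code; there is no explicit training step, just the stored examples):

```python
# k-NN classification: find the k closest stored examples, then vote.
from collections import Counter
from math import dist

def knn_classify(train, labels, x, k=3):
    # step 1: indices of the k training examples closest to x
    nearest = sorted(range(len(train)), key=lambda i: dist(train[i], x))[:k]
    # step 2: majority vote among the labels of those neighbors
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

train = [(0, 0), (0, 1), (5, 5), (6, 5)]
labels = ["a", "a", "b", "b"]
```

For regression, step 2 would instead average the k neighbors' label values, as noted above.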

Input Ports

training set (tra) This input port expects an ExampleSet. In the attached Example Processes it is the output of the Select Attributes operator. The output of other operators can also be used as input.

Output Ports

model (mod) The k-Nearest Neighbor model is delivered from this output port. This model can now be applied on unseen data sets for prediction of the label attribute.
example set (exa) The ExampleSet that was given as input is passed through this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

k (integer) Finding the k training examples that are closest to the unseen example is the first step of the k-NN algorithm. If k = 1, the example is simply assigned to the class of its nearest neighbor. k is typically a small positive integer, and usually an odd value is chosen to avoid ties in the vote.
weighted vote (boolean) If this parameter is set, the weight of the examples is also taken into account. It can be useful to weight the contributions of the neighbors so that the nearer neighbors contribute more than the more distant ones.
measure types (selection) This parameter is used for selecting the type of measure to be used for finding the nearest neighbors. The following options are available: mixed measures, nominal measures, numerical measures and Bregman divergences.
mixed measure (selection) This parameter is available when the measure type parameter is set to 'mixed measures'. The only available option is the 'Mixed Euclidean Distance'.
nominal measure (selection) This parameter is available when the measure type parameter is set to 'nominal measures'. This option cannot be applied if the input ExampleSet has numerical attributes; in that case the 'numerical measure' option should be selected.
numerical measure (selection) This parameter is available when the measure type parameter is set to 'numerical measures'. This option cannot be applied if the input ExampleSet has nominal attributes; in that case the 'nominal measure' option should be selected.
divergence (selection) This parameter is available when the measure type parameter is set to 'bregman divergences'.
kernel type (selection) This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance'. The type of the kernel function is selected through this parameter. The following kernel types are supported:

• dot The dot kernel is defined by k(x, y) = x*y, i.e. it is the inner product of x and y.

• radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma, specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

• polynomial The polynomial kernel is defined by k(x, y) = (x*y + 1)^d where d is the degree of the polynomial, specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.

• neural The neural kernel is defined by a two-layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

• sigmoid This is the sigmoid kernel. Please note that it is not valid under some parameters.

• anova The anova kernel is defined by the summation of exp(-g (x-y)) raised to the power d, where g is gamma and d is degree. The two are adjusted by the kernel gamma and kernel degree parameters respectively.

• epachnenikov The Epanechnikov kernel is the function (3/4)(1-u^2) for u between -1 and 1 and zero for u outside that range. It has the two adjustable parameters kernel sigma1 and kernel degree.

• gaussian combination This is the gaussian combination kernel. It has the adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

• multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has the adjustable parameters kernel sigma1 and kernel sigma shift. kernel gamma (real) This is the SVM kernel parameter gamma. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to radial or anova. kernel sigma1 (real) This is the SVM kernel parameter sigma1. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric. kernel sigma2 (real) This is the SVM kernel parameter sigma2. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination. kernel sigma3 (real) This is the SVM kernel parameter sigma3. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the SVM kernel parameter shift. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to multiquadric. kernel degree (real) This is the SVM kernel parameter degree. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to polynomial, anova or epachnenikov. kernel a (real) This is the SVM kernel parameter a. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural. kernel b (real) This is the SVM kernel parameter b. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.
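To make the kernel formulas above concrete, here is a short Python sketch of three of them (dot, radial and polynomial). These are the textbook formulas, not RapidMiner's internal implementation; the function names are our own.

```python
import math

def dot_kernel(x, y):
    """k(x, y) = x * y, the inner product of x and y."""
    return sum(xi * yi for xi, yi in zip(x, y))

def radial_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-g ||x - y||^2); g corresponds to the 'kernel gamma' parameter."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def polynomial_kernel(x, y, degree=2.0):
    """k(x, y) = (x * y + 1)^d; d corresponds to the 'kernel degree' parameter."""
    return (dot_kernel(x, y) + 1.0) ** degree

print(dot_kernel([1, 2], [3, 4]))            # 11
print(radial_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0: the radial kernel is 1 at zero distance
```

Note that the radial kernel always evaluates to 1 when x = y, since the squared distance is zero.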

Tutorial Processes

Classification of the 'Golf-Testset' data set using the K-NN operator

The 'Golf' data set is loaded using the Retrieve operator. The Select Attributes operator is applied on it to select just the 'Play' and 'Temperature' attributes to simplify this example process. Then the K-NN operator is applied on it. All the parameters of the K-NN operator are used with default values. The resultant classification model is applied on the 'Golf-Testset' data set using the Apply Model operator. Note that the same two attributes of the 'Golf-Testset' data set were selected before the application of the classification model on it.

Run the process. You can see the 'Golf' data set and the labeled 'Golf-Testset' data set in the Results Workspace. As the k parameter was set to 1, each example of the 'Golf-Testset' data set is simply assigned the class of its nearest neighbor in the 'Golf' data set. To understand how examples were classified, simply select an example in the 'Golf-Testset' data set and note the value of the 'Temperature' attribute of that example; we call it t1 here. Now, have a look at the 'Golf' data set and find the example where the 'Temperature' value is closest to t1. The example in the 'Golf-Testset' data set is assigned the same class as the class of this example in the 'Golf' data set. For example, let us consider the first example of the 'Golf-Testset' data set. The value of the 'Temperature' attribute is 85 in this example. Now find the example in the 'Golf' data set where the 'Temperature' value is closest to 85. As you can see, the first example of the 'Golf' data set has a 'Temperature' value equal to 85. This example is labeled 'no', thus the corresponding example in the 'Golf-Testset' data set is also predicted as 'no'.
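The lookup described above can be sketched in a few lines of Python. This is an illustrative 1-NN classifier over a single numerical attribute; the training pairs below are made-up stand-ins for the 'Golf' data, except that the pair (85, 'no') matches the example discussed in the text.

```python
# Minimal 1-NN classification over one numerical attribute.
def knn_classify_1(train, test_value):
    """train: list of (temperature, label) pairs; returns the label of the nearest neighbor."""
    nearest = min(train, key=lambda ex: abs(ex[0] - test_value))
    return nearest[1]

# Illustrative (temperature, label) pairs in the style of the 'Golf' data set:
train = [(85, 'no'), (80, 'no'), (83, 'yes'), (70, 'yes'), (68, 'yes')]
print(knn_classify_1(train, 85))  # 'no' - the nearest neighbor has Temperature 85 and label 'no'
```

With k = 1 there is no voting step; the test example simply inherits the label of its single closest training example.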

Regression of the Polynomial data set using the K-NN operator

The 'Polynomial' data set is loaded using the Retrieve operator. The Filter Example Range operator is applied on it to select just the first 100 examples. Then the Select Attributes operator is applied on it to select just the 'label' and 'a1' attributes to simplify this example process. Then the K-NN operator is applied on it. The k parameter is set to 3, the measure types parameter is set to 'Numerical Measures' and the numerical measure parameter is set to 'Euclidean Distance'. The resultant regression model is applied on the last 100 examples of the 'Polynomial' data set using the Apply Model operator. Note that the same two attributes of the 'Polynomial' data set were selected before the application of the regression model on it.

Run the process. You can see the 'Polynomial' data set (first 100 examples) and the labeled 'Polynomial' data set (last 100 examples) in the Results Workspace. For convenience we call these data sets the 'Polynomial' first and 'Polynomial' last data sets respectively. As the k parameter was set to 3, each example of the 'Polynomial' last data set is simply assigned the average label value of its 3 nearest neighbors in the 'Polynomial' first data set. To understand how regression was performed, simply select an example in the 'Polynomial' last data set and note the value of the 'a1' attribute of that example; we call it a1 val here. Now, have a look at the 'Polynomial' first data set and find the 3 examples where the 'a1' attribute value is closest to a1 val. The 'label' attribute of the example of the 'Polynomial' last data set is assigned the average of these three label values of the 'Polynomial' first data set. For example, let us consider the last example of the 'Polynomial' last data set. The value of the 'a1' attribute is 4.788 in this example. Now find the 3 examples in the 'Polynomial' first data set where the value of the 'a1' attribute is closest to 4.788. These 3 examples are at Row No. 65, 71 and 86. The value of the 'label' attribute of these examples is 41.798, 124.250 and 371.814 respectively. The average of these three 'label' values is 179.288. Thus the value of the 'label' attribute in the last example of the 'Polynomial' last data set is predicted to be 179.288.
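The averaging step above can be sketched directly. The label values below are the three quoted in the text; the 'a1' values are made-up placeholders near 4.788, since the exact training values are not given.

```python
# Minimal k-NN regression: the prediction is the mean label of the k nearest neighbors.
def knn_regress(train, test_value, k=3):
    """train: list of (a1_value, label) pairs; returns the mean label of the k nearest neighbors."""
    neighbors = sorted(train, key=lambda ex: abs(ex[0] - test_value))[:k]
    return sum(label for _, label in neighbors) / k

# The three neighbors from the walkthrough (a1 values are illustrative placeholders):
neighbors = [(4.7, 41.798), (4.9, 124.250), (4.8, 371.814)]
prediction = knn_regress(neighbors, 4.788, k=3)
print(round(prediction, 3))  # 179.287 - matches the text's 179.288 up to rounding of the inputs
```

The tiny discrepancy with the text (179.287 vs. 179.288) comes from the label values being displayed rounded; RapidMiner averages the full-precision values.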


Naive Bayes

This operator generates a Naive Bayes classification model.

Description

A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be 'independent feature model'. In simple terms, a Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class (i.e. attribute) is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a Naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

The advantage of the Naive Bayes classifier is that it only requires a small amount of training data to estimate the means and variances of the variables necessary for classification. Because independent variables are assumed, only the variances of the variables for each label need to be determined and not the entire covariance matrix.

Input Ports training set (tra) The input port expects an ExampleSet. It is the output of the Select Attributes operator in our example process. The output of other operators can also be used as input.

Output Ports model (mod) The Naive Bayes classification model is delivered from this output port. This classification model can now be applied on unseen data sets for prediction of the label attribute. example set (exa) The ExampleSet that was given as input is passed without modification to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters laplace correction (boolean) This is an expert parameter. It indicates if Laplace correction should be used to prevent the high influence of zero probabilities. There is a simple trick to avoid zero probabilities: we can assume that our training set is so large that adding one to each count that we need would only make a negligible difference in the estimated probabilities, yet would avoid the case of zero probability values. This technique is known as Laplace correction.
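The add-one trick can be sketched as follows. This is an illustrative estimator of a conditional probability P(attribute = value | label), not RapidMiner's code; the function name and counts are our own.

```python
# Laplace (add-one) correction for a conditional probability estimate.
# Without it, an attribute value never observed with a given label yields
# probability 0, which zeroes out the whole Naive Bayes product.
def conditional_prob(count, class_total, n_values, laplace=True):
    """count: co-occurrences of (value, label); class_total: examples with that label;
    n_values: number of possible values of the attribute."""
    if laplace:
        return (count + 1) / (class_total + n_values)
    return count / class_total if class_total else 0.0

# An attribute with 3 possible values, one of them never seen with this label:
print(conditional_prob(0, 5, 3, laplace=False))  # 0.0 - kills the whole product
print(conditional_prob(0, 5, 3, laplace=True))   # 0.125 - small but nonzero
```

Adding 1 to each count (and the number of values to each denominator) keeps all estimates strictly positive while barely moving the well-supported ones.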

Tutorial Processes

Working of Naive Bayes

The Retrieve operator is used to load the 'Golf' data set. The Select Attributes operator is applied on it to select just the Outlook and Wind attributes. This is done to simplify the understanding of this Example Process. The Naive Bayes operator is applied on it and the resulting model is applied on the 'Golf-testset' data set. The same two attributes of the 'Golf-testset' data set were selected before application of the Naive Bayes model. A breakpoint is inserted after the Naive Bayes operator. Run the process and see the distribution table in the Results Workspace. We will use this distribution table to explain how Naive Bayes works. Hit the Run button again to continue with the process. Let us see how the first and last examples of the 'Golf-testset' data set were predicted by Naive Bayes. Note that 9 out of 14 examples of the training set had label = yes, thus the prior probability of label = yes is 9/14. Similarly the prior probability of label = no is 5/14.

Note that in the testing set, the attributes of the first example are Outlook = sunny and Wind = false. Naive Bayes performs this calculation for all possible label values and selects the label value that has the maximum calculated probability.

Calculation for label = yes

Find the product of the following:

• prior probability of label = yes (i.e. 9/14)

• value from distribution table when Outlook = sunny and label = yes (i.e. 0.223)

• value from distribution table when Wind = false and label = yes (i.e. 0.659)

Thus the answer = 9/14 * 0.223 * 0.659 = 0.094

Calculation for label = no

Find the product of the following:

• prior probability of label = no (i.e. 5/14)

• value from distribution table when Outlook = sunny and label = no (i.e. 0.581)

• value from distribution table when Wind = false and label = no (i.e. 0.397)

Thus the answer = 5/14 * 0.581 * 0.397 = 0.082

As the value for label = yes is the maximum of all possible label values, label is predicted to be yes.

Similarly, let us have a look at the last example of the 'Golf-testset' data set. Note that in the testing set, in this example Outlook = rain and Wind = true. Naive Bayes performs the same calculation for all possible label values and selects the label value that has the maximum calculated probability.

Calculation for label = yes

Find the product of the following:

• prior probability of label = yes (i.e. 9/14)

• value from distribution table when Outlook = rain and label = yes (i.e. 0.331)

• value from distribution table when Wind = true and label = yes (i.e. 0.333)

Thus the answer = 9/14 * 0.331 * 0.333 = 0.071

Calculation for label = no

Find the product of the following:

• prior probability of label = no (i.e. 5/14)

• value from distribution table when Outlook = rain and label = no (i.e. 0.392)

• value from distribution table when Wind = true and label = no (i.e. 0.589)

Thus the answer = 5/14 * 0.392 * 0.589 = 0.082

As the value for label = no is the maximum of all possible label values, label is predicted to be no.
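The two competing products above can be reproduced directly in a few lines of Python; the numbers are the rounded values quoted in the walkthrough.

```python
# Score each label: class prior times the distribution-table entries
# for the observed attribute values, then pick the larger product.
def naive_bayes_score(prior, likelihoods):
    score = prior
    for p in likelihoods:
        score *= p
    return score

# Last example of 'Golf-Testset': Outlook = rain, Wind = true
score_yes = naive_bayes_score(9 / 14, [0.331, 0.333])  # ~ 0.071
score_no = naive_bayes_score(5 / 14, [0.392, 0.589])   # ~ 0.082
prediction = 'yes' if score_yes > score_no else 'no'
print(prediction)  # 'no', as in the walkthrough
```

Note that the two scores are unnormalized; dividing each by their sum would give the confidence values RapidMiner reports, but the argmax is the same either way.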

Now run the process again, but this time uncheck the laplace correction parameter. Now you can see that as Laplace correction is not used for avoiding zero probabilities, there are numerous zeroes in the distribution table.

Naive Bayes (Kernel)

This operator generates a Kernel Naive Bayes classification model using estimated kernel densities.

655 7. Modeling

Description

A Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be the 'independent feature model'. In simple terms, a Naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class (i.e. attribute) is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4 inches in diameter. Even if these features depend on each other or upon the existence of the other features, a Naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple. The Naive Bayes classifier performs reasonably well even if the underlying assumption is not true.

The advantage of the Naive Bayes classifier is that it only requires a small amount of training data to estimate the means and variances of the variables necessary for classification. Because independent variables are assumed, only the variances of the variables for each label need to be determined and not the entire covariance matrix. In contrast to the Naive Bayes operator, the Naive Bayes (Kernel) operator can be applied on numerical attributes.

A kernel is a weighting function used in non-parametric estimation techniques. Kernels are used in kernel density estimation to estimate random variables' density functions, or in kernel regression to estimate the conditional expectation of a random variable.

Kernel density estimators belong to a class of estimators called non-parametric density estimators. In comparison to parametric estimators, where the estimator has a fixed functional form (structure) and the parameters of this function are the only information we need to store, non-parametric estimators have no fixed structure and depend upon all the data points to reach an estimate.
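A minimal Gaussian kernel density estimator makes the idea concrete. This is an illustrative sketch of the general technique, not the operator's implementation; the sample data are made up.

```python
import math

# Gaussian KDE: the density at x is the average of kernel bumps centered
# on each training point, scaled by the bandwidth h:
#   f(x) = (1 / (n * h)) * sum_i K((x - x_i) / h),  K = standard normal pdf
def kde(x, data, h):
    norm = 1.0 / math.sqrt(2 * math.pi)
    return sum(norm * math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / (len(data) * h)

data = [1.0, 1.2, 3.0, 3.1, 3.2]
# The estimated density is higher near the cluster of three points around 3
# than near the pair around 1:
print(kde(3.1, data, h=0.5) > kde(1.1, data, h=0.5))  # True
```

The bandwidth h plays the role of the bandwidth parameters described below: a very small h produces a spiky estimate that follows individual points, while a very large h smears all structure away.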


Input Ports training set (tra) The input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports model (mod) The Kernel Naive Bayes classification model is delivered from this output port. This classification model can now be applied on unseen data sets for prediction of the label attribute. example set (exa) The ExampleSet that was given as input is passed without modification to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters laplace correction (boolean) This parameter indicates if Laplace correction should be used to prevent the high influence of zero probabilities. There is a simple trick to avoid zero probabilities: we can assume that our training set is so large that adding one to each count that we need would only make a negligible difference in the estimated probabilities, yet would avoid the case of zero probability values. This technique is known as Laplace correction. estimation mode (selection) This parameter specifies the kernel density estimation mode. Two options are available.

• full If this option is selected, the bandwidth can be selected through a heuristic or a fixed bandwidth can be specified.

• greedy If this option is selected, you have to specify the minimum bandwidth and the number of kernels.

bandwidth selection (selection) This parameter is only available when the estimation mode parameter is set to 'full'. This parameter specifies the method to set the kernel bandwidth: the bandwidth can be selected through a heuristic or a fixed bandwidth can be specified. Please note that the bandwidth of the kernel is a free parameter which exhibits a strong influence on the resulting estimate. It is important to choose the most appropriate bandwidth, as a value that is too small or too large is not useful. bandwidth (real) This parameter is only available when the estimation mode parameter is set to 'full' and the bandwidth selection parameter is set to 'fix'. This parameter specifies the kernel bandwidth. minimum bandwidth (real) This parameter is only available when the estimation mode parameter is set to 'greedy'. This parameter specifies the minimum kernel bandwidth. number of kernels (integer) This parameter is only available when the estimation mode parameter is set to 'greedy'. This parameter specifies the number of kernels. use application grid (boolean) This parameter indicates if the kernel density function grid should be used in the model application. It speeds up model application at the expense of the density function precision. application grid size (integer) This parameter is only available when the use application grid parameter is set to true. This parameter specifies the size of the application grid.

Tutorial Processes

Introduction to the Naive Bayes (Kernel) operator

The 'Golf' data set is loaded using the Retrieve operator. The Naive Bayes (Kernel) operator is applied on it. All parameters of the Naive Bayes (Kernel) operator are used with default values. The model generated by the Naive Bayes (Kernel) operator is applied on the 'Golf-Testset' data set using the Apply Model operator. The results of the process can be seen in the Results Workspace. Please note that the parameters should be carefully chosen for this operator to obtain better performance. Especially the bandwidth should be selected carefully.

Decision Tree

Generates a Decision Tree for classification of both nominal and numerical data.

Description

A decision tree is a tree-like graph or model. It is more like an inverted tree because it has its root at the top and it grows downwards. Compared with other approaches, this representation of the data has the advantage of being meaningful and easy to interpret. The goal is to create a classification model that predicts the value of a target attribute (often called class or label) based on several input attributes of the ExampleSet. In RapidMiner an attribute with the label role is predicted by the Decision Tree operator. Each interior node of the tree corresponds to one of the input attributes. The number of edges of a nominal interior node is equal to the number of possible values of the corresponding input attribute. Outgoing edges of numerical attributes are labeled with disjoint ranges. Each leaf node represents a value of the label attribute given the values of the input attributes represented by the path from the root to the leaf. This description can be easily understood by studying the attached Example Process.

Decision Trees are generated by recursive partitioning. Recursive partitioning means repeatedly splitting on the values of attributes. In every recursion the algorithm performs the following steps:

• An attribute A is selected to split on. Making a good choice of attributes to split on at each stage is crucial to the generation of a useful tree. The attribute is selected depending upon a selection criterion which can be chosen by the criterion parameter.

• Examples in the ExampleSet are sorted into subsets, one for each value of the attribute A in the case of a nominal attribute. In the case of numerical attributes, subsets are formed for disjoint ranges of attribute values.

• A tree is returned with one edge or branch for each subset. Each branch has a descendant subtree or a label value produced by applying the same algorithm recursively.
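The attribute-selection step can be sketched with an information-gain computation, one of the criteria the criterion parameter offers. This is an illustrative sketch, not RapidMiner's implementation; the (value, label) pairs are made-up examples in the spirit of the 'Golf' data.

```python
import math
from collections import Counter

# Entropy of a list of label values.
def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Information gain of splitting on a nominal attribute:
# label entropy before the split minus the weighted entropy of the subsets.
def information_gain(examples):
    """examples: list of (attribute_value, label) pairs."""
    labels = [label for _, label in examples]
    base = entropy(labels)
    n = len(examples)
    remainder = 0.0
    for value in {v for v, _ in examples}:
        subset = [label for v, label in examples if v == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

examples = [('overcast', 'yes'), ('overcast', 'yes'),
            ('rain', 'yes'), ('rain', 'no'),
            ('sunny', 'no'), ('sunny', 'no')]
print(round(information_gain(examples), 3))  # 0.667
```

Here the 'overcast' and 'sunny' subsets are pure (entropy 0) and only the 'rain' subset remains mixed, so the split removes two thirds of the original 1-bit entropy.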

In general, the recursion stops when all the examples or instances have the same label value, i.e. the subset is pure. Recursion may also stop if most of the examples are of the same label value; this is a generalization of the first approach, with some error threshold. However, there are other halting conditions, such as:

• There are fewer than a certain number of instances or examples in the current subtree. This can be adjusted by using the minimal size for split parameter.

• No attribute reaches a certain threshold. This can be adjusted by using the minimal gain parameter.

• The maximal depth is reached. This can be adjusted by using the maximal depth parameter.

660 7.1. Classification and Regression

Pruning is a technique in which leaf nodes that do not add to the discriminative power of the decision tree are removed. This is done to convert an over-specific or over-fitted tree to a more general form in order to enhance its predictive power on unseen data sets. Pre-pruning is a type of pruning performed in parallel with the tree creation process. Post-pruning, on the other hand, is done after the tree creation process is complete.

Differentiation

CHAID The CHAID operator works exactly like the Decision Tree operator with one exception: it uses a chi-squared based criterion instead of the information gain or gain ratio criteria. Moreover, this operator cannot be applied on ExampleSets with numerical attributes. See page 669 for details.

Input Ports training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports model (mod) The Decision Tree is delivered from this output port. This classification model can now be applied on unseen data sets for the prediction of the label attribute. example set (exa) The ExampleSet that was given as input is passed without modification to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters criterion (selection) Selects the criterion on which attributes will be selected for splitting. It can have one of the following values:

• information gain The entropy of all the attributes is calculated. The attribute with minimum entropy is selected for the split. This method has a bias towards selecting attributes with a large number of values.

• gain ratio This is a variant of information gain. It adjusts the information gain for each attribute to allow for the breadth and uniformity of the attribute values.

• gini index This is a measure of the impurity of an ExampleSet. Splitting on a chosen attribute gives a reduction in the average gini index of the resulting subsets.

• accuracy The attribute is selected for the split that maximizes the accuracy of the whole tree. minimal size for split (integer) The size of a node is the number of examples in its subset. The size of the root node is equal to the total number of examples in the ExampleSet. Only those nodes are split whose size is greater than or equal to the minimal size for split parameter. minimal leaf size (integer) The size of a leaf node is the number of examples in its subset. The tree is generated in such a way that every leaf node subset has at least the minimal leaf size number of instances. minimal gain (real) The gain of a node is calculated before splitting it. The node is split if its gain is greater than the minimal gain. A higher value of minimal gain results in fewer splits and thus a smaller tree. Too high a value will completely prevent splitting and a tree with a single node is generated. maximal depth (integer) The depth of a tree varies depending upon the size and nature of the ExampleSet. This parameter is used to restrict the size of the Decision Tree. The tree generation process is not continued when the tree depth is equal to the maximal depth. If its value is set to '-1', the maximal depth parameter puts no bound on the depth of the tree; a tree of maximum depth is generated. If its value is set to '1', a tree with a single node is generated. confidence (real) This parameter specifies the confidence level used for the pessimistic error calculation of pruning. number of prepruning alternatives (integer) As prepruning runs in parallel with the tree generation process, it may prevent splitting at certain nodes when splitting at that node does not add to the discriminative power of the entire tree. In such a case alternative nodes are tried for splitting. This parameter adjusts the number of alternative nodes tried for splitting when a split is prevented by prepruning at a certain node. no prepruning (boolean) By default the Decision Tree is generated with prepruning. Setting this parameter to true disables the prepruning and delivers a tree without any prepruning. no pruning (boolean) By default the Decision Tree is generated with pruning. Setting this parameter to true disables the pruning and delivers an unpruned tree.

Related Documents

CHAID (669)

Tutorial Processes

Getting started with Decision Trees

The 'Golf' data set is retrieved using the Retrieve operator. Then the Decision Tree operator is applied on it. Click on the Run button to go to the Results Workspace. First take a look at the resultant tree; basic tree terminology is explained here using this resultant tree. The first node of the tree is called the root of the tree; 'Outlook' is the root in this case. As you can see from this tree, trees grow downwards. This example clearly shows why interpretation of data is easy through trees. Just take a glance at this tree and you will come to know that whenever the 'Outlook' attribute value is 'overcast', the 'Play' attribute will have the value 'yes'. Similarly whenever the 'Outlook' attribute value is 'rain' and the 'Wind' attribute has the value 'false', then the 'Play' attribute will have the value 'yes'. The Decision Tree operator predicts the value of an attribute with the label role; in this example the 'Play' attribute is predicted. The nodes that do not have child nodes are called the leaf nodes. All non-leaf nodes correspond to one of the input attributes. In this example the 'Outlook', 'Wind' and 'Humidity' attributes are non-leaf nodes. The number of edges of a nominal interior node is equal to the number of possible values of the corresponding input attribute. The 'Outlook' attribute has three possible values: 'overcast', 'rain' and 'sunny', thus it has three outgoing edges. Similarly the 'Wind' attribute has two outgoing edges. As 'Humidity' is a numerical attribute, its outgoing edges are labeled with disjoint ranges. Each leaf node represents a value of the label given the values of the input attributes represented by the path from the root to the leaf. That is why all leaf nodes assume possible values of the label attribute, i.e. 'yes' or 'no'.

In this Example Process the 'Gain ratio' is used as the selection criterion. However, using any other criterion on this ExampleSet produces the same tree. This is because this is a very simple data set. On large and complex data sets different selection criteria may produce different trees.

When the tree is split on the 'Outlook' attribute, it is divided into three subtrees, one for each value of the 'Outlook' attribute. The 'overcast' subtree is pure, i.e. all its label values are the same ('yes'), thus it is not split again. The 'rain' subtree and the 'sunny' subtree are split again in the next iteration. In this Example Process the minimal size for split parameter is set to 4. Set it to 10 and run the process again. You will see that you get a tree with a single node. This is because this time the nodes with size less than 10 cannot be split. As the size of all subtrees of the 'Outlook' attribute is less than 10, no splitting takes place, and we get a tree with just a single node.

The minimal gain parameter is set to 0.1. In general if you want to reduce the size of the tree, you can increase the minimal gain. Similarly if you want to increase the size of your tree, you can reduce the value of the minimal gain parameter. The maximal depth parameter is set to 20. The actual depth of the resultant tree is 3. You can set an upper bound on the depth of the tree using the maximal depth parameter. Prepruning is enabled in this Example Process. To disable it, check the no prepruning parameter. Now click the Run button. The resultant tree is much more complex than the previous one. The previous tree was more useful and comprehensible than this one.

ID3

This operator learns an unpruned Decision Tree from nominal data for classification. This decision tree learner works similarly to Quinlan's ID3.

Description

ID3 (Iterative Dichotomiser 3) is an algorithm used to generate a decision tree, invented by Ross Quinlan. ID3 is the precursor to the C4.5 algorithm. Very simply, ID3 builds a decision tree from a fixed set of examples. The resulting tree is used to classify future samples. The examples of the given ExampleSet have several attributes and every example belongs to a class (like yes or no). The leaf nodes of the decision tree contain the class name whereas a non-leaf node is a decision node. The decision node is an attribute test, with each branch (to another decision tree) being a possible value of the attribute. ID3 uses a feature selection heuristic to help it decide which attribute goes into a decision node. The required heuristic can be selected by the criterion parameter.

The ID3 algorithm can be summarized as follows:

1. Take all unused attributes and calculate their selection criterion (e.g. information gain)

2. Choose the attribute for which the selection criterion has the best value (e.g. minimum entropy or maximum information gain)

3. Make a node containing that attribute

ID3 searches through the attributes of the training instances and extracts the attribute that best separates the given examples. If the attribute perfectly classifies the training set, then ID3 stops; otherwise it recursively operates on the n (where n = number of possible values of an attribute) partitioned subsets to get their best attribute. The algorithm uses a greedy search, meaning it picks the best attribute and never looks back to reconsider earlier choices.
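The greedy choice can be sketched as follows: among the unused attributes, pick the one whose split leaves the lowest weighted label entropy (equivalently, the highest information gain). This is an illustrative sketch, not RapidMiner's code; the examples are made-up dicts in the spirit of the 'Golf' data.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Weighted entropy of the label after splitting on a nominal attribute.
def split_entropy(examples, attribute):
    n = len(examples)
    total = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex['label'] for ex in examples if ex[attribute] == value]
        total += len(subset) / n * entropy(subset)
    return total

# ID3's greedy step: minimum remaining entropy = maximum information gain.
def best_attribute(examples, attributes):
    return min(attributes, key=lambda a: split_entropy(examples, a))

examples = [
    {'Outlook': 'sunny', 'Wind': 'false', 'label': 'no'},
    {'Outlook': 'sunny', 'Wind': 'true', 'label': 'no'},
    {'Outlook': 'overcast', 'Wind': 'false', 'label': 'yes'},
    {'Outlook': 'overcast', 'Wind': 'true', 'label': 'yes'},
]
print(best_attribute(examples, ['Outlook', 'Wind']))  # 'Outlook' separates the labels perfectly
```

Because 'Outlook' splits these four examples into two pure subsets (remaining entropy 0) while 'Wind' leaves both subsets mixed, the greedy step places 'Outlook' at the decision node.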

Some major benefits of ID3 are:

• Understandable prediction rules are created from the training data.

• It builds a short tree in relatively little time.

• It only needs to test enough attributes until all data is classified.

• Finding leaf nodes enables test data to be pruned, reducing the number of tests.

ID3 may have some disadvantages in some cases e.g.

ˆ Data may be over-fitted or over-classified, if a small sample is tested.

ˆ Only one attribute at a time is tested for making a decision.


Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Generate Nominal Data operator in the attached Example Process. The ID3 operator cannot handle numerical attributes. The output of other operators can also be used as input.

Output Ports

model (mod) The Decision Tree is delivered from this output port. This classification model can now be applied on unseen data sets for the prediction of the label attribute.

example set (exa) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

criterion (selection) This parameter specifies the criterion on which attributes will be selected for splitting. It can have one of the following values:

- information gain The entropy of all the attributes is calculated. The attribute with minimum entropy is selected for the split. This method has a bias towards selecting attributes with a large number of values.

- gain ratio It is a variant of information gain. It adjusts the information gain for each attribute to allow for the breadth and uniformity of the attribute values.

- gini index This is a measure of impurity of an ExampleSet. Splitting on a chosen attribute gives a reduction in the average gini index of the resulting subsets.

- accuracy Such an attribute is selected for the split that maximizes the accuracy of the whole tree.

minimal size for split (integer) The size of a node is the number of examples in its subset. The size of the root node is equal to the total number of examples in the ExampleSet. Only those nodes are split whose size is greater than or equal to the minimal size for split parameter.

minimal leaf size (integer) The size of a leaf node is the number of examples in its subset. The tree is generated in such a way that every leaf node subset has at least the minimal leaf size number of instances.

minimal gain (real) The gain of a node is calculated before splitting it. The node is split if its gain is greater than the minimal gain. A higher value of minimal gain results in fewer splits and thus a smaller tree. A value that is too high will completely prevent splitting, and a tree with a single node is generated.
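To make the gini index criterion concrete, the impurity and the reduction a split achieves can be computed as below. This is a hedged sketch under the definitions given above, not RapidMiner's implementation:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_reduction(examples, labels, attr):
    """Reduction in the average gini index achieved by splitting on the
    attribute with index attr (the quantity the gini criterion ranks)."""
    total = len(labels)
    average_after = 0.0
    for value in set(e[attr] for e in examples):
        subset = [l for e, l in zip(examples, labels) if e[attr] == value]
        average_after += len(subset) / total * gini(subset)
    return gini(labels) - average_after
```

A pure subset has gini 0, so an attribute that perfectly separates the classes achieves the maximal reduction.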

Tutorial Processes

Getting started with ID3

To understand the basic terminology of trees, please study the Example Process of the Decision Tree operator.

The Generate Nominal Data operator is used for generating an ExampleSet with nominal attributes. It should be kept in mind that the ID3 operator cannot handle numerical attributes. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has three attributes and each attribute has three possible values. The ID3 operator is applied on this ExampleSet with default values of all parameters. The resultant Decision Tree model is delivered to the result port of the process and it can be seen in the Results Workspace.


CHAID

This operator generates a pruned decision tree based on the chi-squared attribute relevance test. This operator can be applied only on ExampleSets with nominal data.

Description

The CHAID decision tree operator works exactly like the Decision Tree operator with one exception: it uses a chi-squared based criterion instead of the information gain or gain ratio criteria. Moreover, this operator cannot be applied on ExampleSets with numerical attributes. It is recommended that you study the documentation of the Decision Tree operator for a basic understanding of decision trees.

CHAID stands for CHi-squared Automatic Interaction Detection. The chi-square statistic is a nonparametric statistical technique used to determine if a distribution of observed frequencies differs from the theoretical expected frequencies. Chi-square statistics use nominal data, thus instead of using means and variances, this test uses frequencies. CHAID's advantages are that its output is highly visual and easy to interpret. Because it uses multiway splits by default, it needs rather large sample sizes to work effectively, since with small sample sizes the respondent groups can quickly become too small for reliable analysis.
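The chi-squared statistic for rating a nominal attribute against the class label can be computed from observed versus expected contingency-table frequencies. The following is an illustrative sketch (hypothetical function, not RapidMiner's code):

```python
from collections import Counter

def chi_squared(attr_values, labels):
    """Pearson chi-squared statistic of a nominal attribute against the
    class label: sum over all contingency-table cells of (O - E)^2 / E,
    where E is the cell frequency expected under independence."""
    n = len(labels)
    attr_counts = Counter(attr_values)
    label_counts = Counter(labels)
    observed = Counter(zip(attr_values, labels))
    stat = 0.0
    for a, count_a in attr_counts.items():
        for l, count_l in label_counts.items():
            expected = count_a * count_l / n
            stat += (observed[(a, l)] - expected) ** 2 / expected
    return stat
```

An attribute independent of the label yields a statistic near 0; the more strongly an attribute's values predict the label, the larger the statistic, which is why it can serve as a split-relevance criterion.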

Compared with other approaches, this representation of the data has the advantage of being meaningful and easy to interpret. The goal is to create a classification model that predicts the value of the label based on several input attributes of the ExampleSet. Each interior node of the tree corresponds to one of the input attributes. The number of edges of an interior node is equal to the number of possible values of the corresponding input attribute. Each leaf node represents a value of the label given the values of the input attributes represented by the path from the root to the leaf. This description can be easily understood by studying the Example Process of the Decision Tree operator.

Pruning is a technique in which leaf nodes that do not add to the discriminative power of the decision tree are removed. This is done to convert an over-specific or over-fitted tree to a more general form in order to enhance its predictive power on unseen datasets. Pre-pruning is a type of pruning performed parallel to the tree creation process. Post-pruning, on the other hand, is done after the tree creation process is complete.

Differentiation

Decision Tree The CHAID operator works exactly like the Decision Tree operator with one exception: it uses a chi-squared based criterion instead of the information gain or gain ratio criteria. Moreover, this operator cannot be applied on ExampleSets with numerical attributes. See page 659 for details.

Decision Tree (Weight-Based) If the Weight by Chi Squared Statistic operator is applied for attribute weighting in the subprocess of the Decision Tree (Weight-Based) operator, it works exactly like the CHAID operator. See page ?? for details.


Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Generate Nominal Data operator in the attached Example Process. The output of other operators can also be used as input. This operator cannot handle numerical data, therefore the ExampleSet should not have numerical attributes.

Output Ports

model (mod) The CHAID Decision Tree is delivered from this output port. This classification model can now be applied on unseen data sets for the prediction of the label attribute.

example set (exa) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

minimal size for split (integer) The size of a node is the number of examples in its subset. The size of the root node is equal to the total number of examples in the ExampleSet. Only those nodes are split whose size is greater than or equal to the minimal size for split parameter.

minimal leaf size (integer) The size of a leaf node is the number of examples in its subset. The tree is generated in such a way that every leaf node subset has at least the minimal leaf size number of instances.

minimal gain (real) The gain of a node is calculated before splitting it. The node is split if its gain is greater than the minimal gain. A higher value of minimal gain results in fewer splits and thus a smaller tree. A value that is too high will completely prevent splitting, and a tree with a single node is generated.

maximal depth (integer) The depth of a tree varies depending upon the size and nature of the ExampleSet. This parameter is used to restrict the size of the Decision Tree. The tree generation process is not continued when the tree depth is equal to the maximal depth. If its value is set to '-1', the maximal depth parameter puts no bound on the depth of the tree; a tree of maximum depth is generated. If its value is set to '1', a tree with a single node is generated.

confidence (real) This parameter specifies the confidence level used for the pessimistic error calculation of pruning.

number of prepruning alternatives (integer) As prepruning runs parallel to the tree generation process, it may prevent splitting at certain nodes when splitting at that node does not add to the discriminative power of the entire tree. In such a case alternative nodes are tried for splitting. This parameter adjusts the number of alternative nodes tried for splitting when splitting is prevented by prepruning at a certain node.

no prepruning (boolean) By default the Decision Tree is generated with prepruning. Setting this parameter to true disables prepruning and delivers a tree without any prepruning.

no pruning (boolean) By default the Decision Tree is generated with pruning. Setting this parameter to true disables pruning and delivers an unpruned tree.

Related Documents

Decision Tree (659)

Decision Tree (Weight-Based) (??)

Tutorial Processes

Introduction to the CHAID operator

The Generate Nominal Data operator is used for generating an ExampleSet with 100 examples. There are three nominal attributes in the ExampleSet and every attribute has three possible values. A breakpoint is inserted here so that you can have a look at the ExampleSet. The CHAID operator is applied on this ExampleSet with default values of all parameters. The resultant model is connected to the result port of the process and it can be seen in the Results Workspace.

Decision Stump

This operator learns a Decision Tree with only a single split. This operator can be applied on both nominal and numerical data sets.

Description

The Decision Stump operator is used for generating a decision tree with only one single split. The resulting tree can be used for classifying unseen examples. This operator can be very efficient when boosted with operators like the AdaBoost operator. The examples of the given ExampleSet have several attributes and every example belongs to a class (like yes or no). The leaf nodes of a decision tree contain the class name whereas a non-leaf node is a decision node. The decision node is an attribute test with each branch (to another decision tree) being a possible value of the attribute. For more information about decision trees, please study the Decision Tree operator.
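A decision stump simply searches for the single attribute whose one split best separates the data. The following sketch picks the split by training accuracy (one of the criteria listed below); it is an illustration with hypothetical function names, not RapidMiner's code:

```python
from collections import Counter

def decision_stump(examples, labels):
    """Find the single attribute whose split gives the highest training
    accuracy; each branch predicts the majority class of its subset."""
    best_attr, best_acc, best_leaves = None, -1.0, None
    for attr in range(len(examples[0])):
        leaves = {}
        for value in set(e[attr] for e in examples):
            subset = [l for e, l in zip(examples, labels) if e[attr] == value]
            leaves[value] = Counter(subset).most_common(1)[0][0]
        correct = sum(leaves[e[attr]] == l for e, l in zip(examples, labels))
        acc = correct / len(examples)
        if acc > best_acc:
            best_attr, best_acc, best_leaves = attr, acc, leaves
    return best_attr, best_leaves

def predict(stump, example):
    """Classify an example with a (attribute, branch-to-class) stump."""
    attr, leaves = stump
    return leaves.get(example[attr])
```

Such a weak learner is rarely accurate on its own, which is exactly why it pairs well with boosting operators like AdaBoost.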


Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The Decision Tree with just a single split is delivered from this output port. This classification model can now be applied on unseen data sets for the prediction of the label attribute.

example set (exa) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

criterion (selection) This parameter specifies the criterion on which attributes will be selected for splitting. It can have one of the following values:

- information gain The entropy of all the attributes is calculated. The attribute with minimum entropy is selected for the split. This method has a bias towards selecting attributes with a large number of values.

- gain ratio It is a variant of information gain. It adjusts the information gain for each attribute to allow for the breadth and uniformity of the attribute values.

- gini index This is a measure of impurity of an ExampleSet. Splitting on a chosen attribute gives a reduction in the average gini index of the resulting subsets.


- accuracy Such an attribute is selected for the split that maximizes the accuracy of the whole tree.

minimal leaf size (integer) The size of a leaf node is the number of examples in its subset. The tree is generated in such a way that every leaf node subset has at least the minimal leaf size number of instances.

Tutorial Processes

Introduction to the Decision Stump operator

To understand the basic terminology of trees, please study the Example Process of the Decision Tree operator.

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. The Decision Stump operator is applied on this ExampleSet. The criterion parameter is set to 'information gain' and the minimal leaf size parameter is set to 1. The resultant decision tree model is connected to the result port of the process and it can be seen in the Results Workspace. You can see that this decision tree has just a single split.


Random Tree

This operator learns a decision tree, using only a random subset of attributes for each split.

Description

The Random Tree operator works exactly like the Decision Tree operator with one exception: for each split only a random subset of attributes is available. It is recommended that you study the documentation of the Decision Tree operator for basic understanding of decision trees.

This operator learns decision trees from both nominal and numerical data. Decision trees are powerful classification methods which can be easily understood. The Random Tree operator works similarly to Quinlan's C4.5 or CART, but it selects a random subset of attributes before it is applied. The size of the subset is specified by the subset ratio parameter.
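The attribute-subset selection that distinguishes this operator can be sketched as follows. This is an illustration under the behaviour described here (the base of the logarithm in the "guessed" size is an assumption, as the text does not specify it), not RapidMiner's code:

```python
import math
import random

def random_attribute_subset(attrs, subset_ratio=None, seed=None):
    """Pick the attributes available for one split of a Random Tree.
    With subset_ratio=None the size is 'guessed' as log(m) + 1, mirroring
    the guess subset ratio parameter described below; otherwise
    ratio * m attributes are drawn, with at least one."""
    rng = random.Random(seed)
    m = len(attrs)
    if subset_ratio is None:
        k = int(math.log(m)) + 1
    else:
        k = max(1, int(subset_ratio * m))
    return rng.sample(attrs, k)
```

The splitting criterion itself is then evaluated only over the returned subset, which is what injects randomness into otherwise deterministic tree induction.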

Representing the data as a tree has the advantage, compared with other approaches, of being meaningful and easy to interpret. The goal is to create a classification model that predicts the value of the label based on several input attributes of the ExampleSet. Each interior node of the tree corresponds to one of the input attributes. The number of edges of an interior node is equal to the number of possible values of the corresponding input attribute. Each leaf node represents a value of the label given the values of the input attributes represented by the path from the root to the leaf. This description can be easily understood by studying the Example Process of the Decision Tree operator.

Pruning is a technique in which leaf nodes that do not add to the discriminative power of the decision tree are removed. This is done to convert an over-specific or over-fitted tree to a more general form in order to enhance its predictive power on unseen datasets. Pre-pruning is a type of pruning performed parallel to the tree creation process. Post-pruning, on the other hand, is done after the tree creation process is complete.


Differentiation

Decision Tree The Random Tree operator works exactly like the Decision Tree operator with one exception: for each split only a random subset of attributes is available. See page 659 for details.

Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The Random Tree is delivered from this output port. This classification model can now be applied on unseen data sets for the prediction of the label attribute.

example set (exa) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

criterion (selection) This parameter selects the criterion on which attributes will be selected for splitting. It can have one of the following values:

- information gain The entropy of all the attributes is calculated. The attribute with minimum entropy is selected for the split. This method has a bias towards selecting attributes with a large number of values.


- gain ratio It is a variant of information gain. It adjusts the information gain for each attribute to allow for the breadth and uniformity of the attribute values.

- gini index This is a measure of impurity of an ExampleSet. Splitting on a chosen attribute gives a reduction in the average gini index of the resulting subsets.

- accuracy Such an attribute is selected for the split that maximizes the accuracy of the whole tree.

minimal size for split (integer) The size of a node in a tree is the number of examples in its subset. The size of the root node is equal to the total number of examples in the ExampleSet. Only those nodes are split whose size is greater than or equal to the minimal size for split parameter.

minimal leaf size (integer) The size of a leaf node in a tree is the number of examples in its subset. The tree is generated in such a way that every leaf node subset has at least the minimal leaf size number of instances.

minimal gain (real) The gain of a node is calculated before splitting it. The node is split if its gain is greater than the minimal gain. A higher value of minimal gain results in fewer splits and thus a smaller tree. A value that is too high will completely prevent splitting, and a tree with a single node is generated.

maximal depth (integer) The depth of a tree varies depending upon the size and nature of the ExampleSet. This parameter is used to restrict the size of the tree. The tree generation process is not continued when the tree depth is equal to the maximal depth. If its value is set to '-1', the maximal depth parameter puts no bound on the depth of the tree; a tree of maximum depth is generated. If its value is set to '1', a tree with a single node is generated.

confidence (real) This parameter specifies the confidence level used for the pessimistic error calculation of pruning.

number of prepruning alternatives (integer) As prepruning runs parallel to the tree generation process, it may prevent splitting at certain nodes when splitting at that node does not add to the discriminative power of the entire tree. In such a case alternative nodes are tried for splitting. This parameter adjusts the number of alternative nodes tried for splitting when a split is prevented by prepruning at a certain node.

no prepruning (boolean) By default the tree is generated with prepruning. Setting this parameter to true disables prepruning and delivers a tree without any prepruning.

no pruning (boolean) By default the tree is generated with pruning. Setting this parameter to true disables pruning and delivers an unpruned tree.

guess subset ratio (boolean) This parameter specifies if the subset ratio should be guessed. If set to true, log(m) + 1 features are used as the subset; otherwise a ratio has to be specified through the subset ratio parameter.

subset ratio (real) This parameter specifies the subset ratio of randomly chosen attributes.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Related Documents

Decision Tree (659)

Tutorial Processes

Introduction to the Random Tree operator

The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. The Random Tree operator is applied on this ExampleSet with default values of all parameters. The resultant tree is connected to the result port of the process and it can be seen in the Results Workspace.

Random Forest

This operator generates a set of a specified number of random trees, i.e. it generates a random forest. The resulting model is a voting model of all the trees.

Description

The Random Forest operator generates a set of random trees. The random trees are generated in exactly the same way as the Random Tree operator generates a tree. The resulting forest model contains a specified number of random tree models. The number of trees parameter specifies the required number of trees. The resulting model is a voting model of all the random trees. For more information about random trees please study the Random Tree operator.
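The voting step described above can be sketched in a few lines. This is an illustrative fragment (hypothetical function name; here each tree is represented as any callable mapping an example to a label), not RapidMiner's code:

```python
from collections import Counter

def forest_predict(trees, example):
    """Classify an example by majority vote over the predictions of the
    individual random trees in the forest."""
    votes = Counter(tree(example) for tree in trees)
    return votes.most_common(1)[0][0]
```

Because each tree sees a different random attribute subset at each split, the individual predictions differ, and aggregating them by vote typically reduces the variance of any single tree.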

Representing the data in the form of a tree has the advantage, compared with other approaches, of being meaningful and easy to interpret. The goal is to create a classification model that predicts the value of a target attribute (often called class or label) based on several input attributes of the ExampleSet. Each interior node of the tree corresponds to one of the input attributes. The number of edges of a nominal interior node is equal to the number of possible values of the corresponding input attribute. Outgoing edges of numerical attributes are labeled with disjoint ranges. Each leaf node represents a value of the label attribute given the values of the input attributes represented by the path from the root to the leaf. For a better understanding of the structure of a tree please study the Example Process of the Decision Tree operator.

Pruning is a technique in which leaf nodes that do not add to the discriminative power of the tree are removed. This is done to convert an over-specific or over- fitted tree to a more general form in order to enhance its predictive power on unseen datasets. Pre-pruning is a type of pruning performed parallel to the tree creation process. Post-pruning, on the other hand, is done after the tree creation process is complete.

Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The Random Forest model is delivered from this output port. This model can be applied on unseen data sets for the prediction of the label attribute. This model is a voting model of all the random trees.

example set (exa) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

number of trees (integer) This parameter specifies the number of random trees to generate.

criterion (selection) This parameter selects the criterion on which attributes will be selected for splitting. It can have one of the following values:

- information gain The entropy of all the attributes is calculated. The attribute with minimum entropy is selected for the split. This method has a bias towards selecting attributes with a large number of values.

- gain ratio It is a variant of information gain. It adjusts the information gain for each attribute to allow for the breadth and uniformity of the attribute values.

- gini index This is a measure of impurity of an ExampleSet. Splitting on a chosen attribute gives a reduction in the average gini index of the resulting subsets.

- accuracy Such an attribute is selected for the split that maximizes the accuracy of the whole tree.

minimal size for split (integer) The size of a node is the number of examples in its subset. The size of the root node is equal to the total number of examples in the ExampleSet. Only those nodes are split whose size is greater than or equal to the minimal size for split parameter.

minimal leaf size (integer) The size of a leaf node is the number of examples in its subset. The tree is generated in such a way that every leaf node subset has at least the minimal leaf size number of instances.

minimal gain (real) The gain of a node is calculated before splitting it. The node is split if its gain is greater than the minimal gain. A higher value of minimal gain results in fewer splits and thus a smaller tree. A value that is too high will completely prevent splitting, and a tree with a single node is generated.

maximal depth (integer) The depth of a tree varies depending upon the size and nature of the ExampleSet. This parameter is used to restrict the size of the trees. The tree generation process is not continued when the tree depth is equal to the maximal depth. If its value is set to '-1', the maximal depth parameter puts no bound on the depth of the tree; a tree of maximum depth is generated. If its value is set to '1', a tree with a single node is generated.

confidence (real) This parameter specifies the confidence level used for the pessimistic error calculation of pruning.

number of prepruning alternatives (integer) As prepruning runs parallel to the tree generation process, it may prevent splitting at certain nodes when splitting at that node does not add to the discriminative power of the entire tree. In such a case alternative nodes are tried for splitting. This parameter adjusts the number of alternative nodes tried for splitting when a split is prevented by prepruning at a certain node.

no prepruning (boolean) By default the trees are generated with prepruning. Setting this parameter to true disables prepruning and generates trees without any prepruning.

no pruning (boolean) By default the trees are generated with pruning. Setting this parameter to true disables pruning and generates unpruned trees.

guess subset ratio (boolean) If this parameter is set to true then log(m) + 1 attributes are used; otherwise a ratio should be specified by the subset ratio parameter.

subset ratio (real) This parameter specifies the ratio of randomly chosen attributes to test.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Tutorial Processes

Generating a set of random trees using the Random Forest operator

The 'Golf' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a classification model. The Random Forest operator is applied in the training subprocess of the Split Validation operator. The number of trees parameter is set to 10, thus this operator generates a set of 10 random trees. The resultant model is a voting model of all the random trees. The Apply Model operator is used in the testing subprocess to apply this model. The resultant labeled ExampleSet is used by the Performance operator for measuring the performance of the model. The random forest model and its performance vector are connected to the output and can be seen in the Results Workspace.

Rule Induction

This operator learns a pruned set of rules with respect to the information gain from the given ExampleSet.

Description

The Rule Induction operator works similarly to the propositional rule learner named 'Repeated Incremental Pruning to Produce Error Reduction' (RIPPER, Cohen 1995). Starting with the less prevalent classes, the algorithm iteratively grows and prunes rules until there are no positive examples left or the error rate is greater than 50%.

In the growing phase, conditions are greedily added to each rule until it is perfect (i.e. 100% accurate). The procedure tries every possible value of each attribute and selects the condition with the highest information gain.

In the pruning phase, final sequences of each rule's antecedent are pruned with the pruning metric p/(p+n).
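The pruning step can be sketched as follows: conditions are dropped from the end of the antecedent as long as the metric p/(p+n) does not decrease on the pruning set. This is an illustrative sketch of that idea with hypothetical names, not RapidMiner's implementation; `evaluate` stands in for counting the positives p and negatives n a candidate rule covers:

```python
def prune_metric(p, n):
    """RIPPER pruning metric for a rule: p / (p + n), where p and n are
    the positive and negative examples the rule covers."""
    return p / (p + n)

def prune_rule(antecedent, evaluate):
    """Greedily drop final conditions of the antecedent while the pruning
    metric does not decrease; evaluate maps a condition list to (p, n)."""
    best = list(antecedent)
    best_score = prune_metric(*evaluate(best))
    while len(best) > 1:
        candidate = best[:-1]
        score = prune_metric(*evaluate(candidate))
        if score < best_score:
            break
        best, best_score = candidate, score
    return best
```

Dropping a condition makes a rule more general, so it usually covers more examples of both kinds; the metric keeps the generalisation only when the extra positives outweigh the extra negatives.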

Rule Set learners are often compared to Decision Tree learners. Rule Sets have the advantage that they are easy to understand, representable in first-order logic (easy to implement in languages like Prolog), and prior knowledge can be added to them easily. The major disadvantages of Rule Sets were that they scaled poorly with training set size and had problems with noisy data. The RIPPER algorithm (which this operator implements) largely overcomes these disadvantages. The major problem with Decision Trees is overfitting, i.e. the model works very well on the training set but does not perform well on the validation set. Reduced Error Pruning (REP) is a technique that tries to overcome overfitting. Through various improvements and enhancements over time, REP evolved into IREP, IREP* and finally RIPPER.

Pruning in decision trees is a technique in which leaf nodes that do not add to the discriminative power of the decision tree are removed. This is done to convert an over-specific or over-fitted tree to a more general form in order to enhance its predictive power on unseen datasets. A similar concept of pruning applies to Rule Sets.

Input Ports training set (tra) This input port expects an ExampleSet. It is the output of the Discretize by Frequency operator in the attached Example Process. The output of other operators can also be used as input.


Output Ports

model (mod) The Rule Model is delivered from this output port. This model can now be applied on unseen data sets.

example set (exa) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

criterion (selection) This parameter specifies the criterion for selecting attributes and numerical splits. It can have one of the following values:

- information gain The entropy of all the attributes is calculated. The attribute with minimum entropy is selected for the split. This method has a bias towards selecting attributes with a large number of values.

- accuracy Such an attribute is selected for the split that maximizes the accuracy of the Rule Set.

sample ratio (real) This parameter specifies the sample ratio of training data used for growing and pruning.

pureness (real) This parameter specifies the desired pureness, i.e. the minimum ratio of the major class in a covered subset in order to consider the subset pure.

minimal prune benefit (real) This parameter specifies the minimum amount of benefit which must be exceeded over the unpruned benefit in order to prune.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.


Tutorial Processes

Introduction to the Rule Induction operator

The 'Golf' data set is loaded using the Retrieve operator. The Discretize by Frequency operator is applied on it to convert the numerical attributes to nominal attributes. This is done because the Rule Learners usually perform well on nominal attributes. The number of bins parameter of the Discretize by Frequency operator is set to 3. All other parameters are used with default values. A breakpoint is inserted here so that you can have a look at the ExampleSet before application of the Rule Induction operator. The Rule Induction operator is applied next. All parameters are used with default values. The resulting model is connected to the result port of the process. The Rule Set (RuleModel) can be seen in the Results Workspace after execution of the process.

Subgroup Discovery

This operator performs an exhaustive subgroup discovery. The goal of subgroup discovery is to find rules describing subsets of the population that are sufficiently large and statistically unusual.

687 7. Modeling

Description

This operator discovers subgroups (or induces a rule set) by generating hypotheses exhaustively. Generation is done by stepwise refining the empty hypothesis (which contains no literals). The loop for this task hence iterates over the depth of the search space, i.e. the number of literals of the generated hypotheses. The maximum depth of the search can be specified by the max depth parameter. Furthermore, the search space can be pruned by specifying a minimum coverage (by the min coverage parameter) of the hypothesis or by using only a given number of hypotheses which have the highest coverage. From the hypotheses, rules are derived according to the user's preference. This operator allows the derivation of positive rules and negative rules separately, their combination by deriving both rules, or only the rule which is the most probable given the examples covered by the hypothesis (hence: the actual prediction for that subset). This behavior can be controlled by the rule generation parameter. All generated rules are evaluated on the ExampleSet by a user-specified utility function (which is specified by the utility function parameter) and stored in the final rule set if:

ˆ They exceed a minimum utility threshold (which is specified by the min utility parameter) or

ˆ They are among the k best rules (where k is specified by the k best rules parameter).

The desired behavior can be specified by the mode parameter.

The problem of subgroup discovery has been defined as follows: Given a population of individuals and a property of those individuals we are interested in, find population subgroups that are statistically most interesting, e.g. are as large as possible and have the most unusual statistical (distributional) characteristics with respect to the property of interest. In subgroup discovery, rules have the form Class ← Cond, where the property of interest for subgroup discovery is the class value Class which appears in the rule consequent, and the rule antecedent Cond is a conjunction of features (attribute-value pairs) selected from the features describing the training instances. As rules are induced from labeled training instances (labeled positive if the property of interest holds, and negative otherwise), the process of subgroup discovery is targeted at uncovering properties of a selected target population of individuals with the given property of interest. In this sense, subgroup discovery is a form of supervised learning. However, in many respects subgroup discovery is a form of descriptive induction as the task is to uncover individual interesting patterns in data.

Rule learning is most frequently used in the context of classification rule learning and association rule learning. While classification rule learning is an approach to predictive induction (or supervised learning), aimed at constructing a set of rules to be used for classification and/or prediction, association rule learning is a form of descriptive induction (non-classification induction or unsupervised learning), aimed at the discovery of individual rules which define interesting patterns in data.

Let us emphasize the difference between subgroup discovery (as a task at the intersection of predictive and descriptive induction) and classification rule learning (as a form of predictive induction). The goal of standard rule learning is to generate models, one for each class, consisting of rule sets describing class characteristics in terms of properties occurring in the descriptions of training examples. In contrast, subgroup discovery aims at discovering individual rules or 'patterns' of interest, which must be represented in explicit symbolic form and which must be relatively simple in order to be recognized as actionable by potential users. Moreover, standard classification rule learning algorithms cannot appropriately address the task of subgroup discovery as they use the covering algorithm for rule set construction which hinders the applicability of classification rule induction approaches in subgroup discovery. Subgroup discovery is usually seen as different from classification, as it addresses different goals (discovery of interesting population subgroups instead of maximizing classification accuracy of the induced rule set).


Input Ports training set (tra) This input port expects an ExampleSet. It is the output of the Generate Nominal Data operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports model (mod) The Rule Set is delivered from this output port. example set (exa) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters mode (selection) This parameter specifies the discovery mode.

ˆ minimum utility If this option is selected the rules are stored in the final rule set if they exceed the minimum utility threshold specified by the min utility parameter.

ˆ k best rules If this option is selected the rules are stored in the final rule set if they are among the k best rules (where k is specified by the k best rules parameter).

utility function (selection) This parameter specifies the desired utility function.

min utility (real) This parameter specifies the minimum utility. This parameter is useful when the mode parameter is set to 'minimum utility'. The rules are stored in the final rule set if they exceed the minimum utility threshold specified by this parameter.

k best rules (integer) This parameter specifies the number of required best rules.


This parameter is useful when the mode parameter is set to 'k best rules'. The rules are stored in the final rule set if they are among the k best rules, where k is specified by this parameter.

rule generation (selection) This parameter determines which rules should be generated. This operator allows the derivation of positive rules and negative rules separately, their combination by deriving both rules, or only the rule which is the most probable given the examples covered by the hypothesis (hence: the actual prediction for that subset).

max depth (integer) This parameter specifies the maximum depth of the breadth-first search. The loop for this task iterates over the depth of the search space, i.e. the number of literals of the generated hypotheses.

min coverage (real) This parameter specifies the minimum coverage. Only the rules which exceed this coverage threshold are considered.

max cache (integer) This parameter bounds the number of rules which are evaluated (only the most supported rules are used).

Tutorial Processes

Introduction to the Subgroup Discovery operator

The Generate Nominal Data operator is used for generating an ExampleSet. The ExampleSet has two binominal attributes with 100 examples. The Subgroup Discovery operator is applied on this ExampleSet with default values of all parameters. The mode parameter is set to 'k best rules' and the k best rules parameter is set to 10. Moreover the utility function parameter is set to 'WRAcc'. Thus the Rule Set will be composed of the 10 best rules, where rules are evaluated by the WRAcc function. The resultant Rule Set can be seen in the Results Workspace. You can see that there are 10 rules and they are sorted in order of their WRAcc values.
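The WRAcc (weighted relative accuracy) utility function used in this tutorial can be sketched as follows. This is an illustrative formulation (coverage times the gain in precision over the class prior), not RapidMiner's code; the toy population is invented.

```python
def wracc(examples, cond, positive):
    """Weighted relative accuracy of the rule 'cond -> positive':
    coverage * (precision on the covered subset - prior of the class)."""
    n = len(examples)
    covered = [ex for ex in examples if cond(ex)]
    if not covered:
        return 0.0
    coverage = len(covered) / n
    precision = sum(ex["label"] == positive for ex in covered) / len(covered)
    prior = sum(ex["label"] == positive for ex in examples) / n
    return coverage * (precision - prior)

# Toy population: 4 'big' positives, 1 'small' positive, 5 'small' negatives.
examples = ([{"size": "big", "label": "pos"}] * 4
            + [{"size": "small", "label": "pos"}]
            + [{"size": "small", "label": "neg"}] * 5)
score = wracc(examples, lambda ex: ex["size"] == "big", "pos")
# coverage 0.4, precision 1.0, prior 0.5 -> WRAcc = 0.4 * (1.0 - 0.5) = 0.2
```

A subgroup scores highly when it is both large (coverage) and statistically unusual (precision far from the prior), which is exactly the trade-off described above.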


Tree to Rules

This operator is a meta learner. It uses an inner tree learner for creating a rule model.

Description

The Tree to Rules operator determines a set of rules from the given decision tree model. This operator is a nested operator, i.e. it has a subprocess. The subprocess must have a tree learner, i.e. an operator that expects an ExampleSet and generates a tree model. This operator builds a rule model using the tree learner provided in its subprocess. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

A decision tree is a predictive model which maps observations about an item to conclusions about the item's target value. In these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making.
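Turning a tree into rules amounts to collecting one rule per root-to-leaf path: the conjunction of branch tests becomes the rule body and the leaf class becomes the conclusion. The sketch below shows this idea on an invented nested-tuple tree encoding; it is not RapidMiner's internal representation.

```python
# A node is either ("split", attribute, {value: subtree}) or ("leaf", class).
def tree_to_rules(node, path=()):
    """Return one rule (list of attribute=value tests, predicted class)
    for every root-to-leaf path of the tree."""
    if node[0] == "leaf":
        return [(list(path), node[1])]
    _, attribute, children = node
    rules = []
    for value, child in children.items():
        rules += tree_to_rules(child, path + ((attribute, value),))
    return rules

tree = ("split", "Outlook", {
    "overcast": ("leaf", "yes"),
    "sunny": ("split", "Humidity",
              {"high": ("leaf", "no"), "normal": ("leaf", "yes")}),
})
rules = tree_to_rules(tree)
```

For the toy tree this yields three rules, e.g. "Outlook = sunny AND Humidity = high -> no".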


Input Ports training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports model (mod) The rule model is delivered from this output port which can now be applied on unseen data sets for prediction of the label attribute. example set (exa) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Tutorial Processes

Generating rules from a Decision Tree

The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at this ExampleSet. The Tree to Rules operator is applied on this ExampleSet. The Decision Tree operator is applied in the subprocess of the Tree to Rules operator. A breakpoint is inserted after the Decision Tree operator so that you can have a look at the Decision Tree. The Tree to Rules operator generates a rule model from this Tree. The resultant rule model can be seen in the Results Workspace.


Perceptron

This operator learns a linear classifier called Single Perceptron which finds a separating hyperplane (if one exists). This operator cannot handle polynominal attributes.

Description

The perceptron is a type of artificial neural network invented in 1957 by Frank Rosenblatt. It can be seen as the simplest kind of feed-forward neural network: a linear classifier. Biological analogies aside, the single layer perceptron is simply a linear classifier which is efficiently trained by a simple update rule: for all wrongly classified data points, the weight vector is either increased or decreased by the corresponding example values. The coming paragraphs explain the basic ideas about neural networks and feed-forward neural networks.

An artificial neural network (ANN), usually called neural network (NN), is a mathematical model or computational model that is inspired by the structure and functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation (the central connectionist principle is that mental phenomena can be described by interconnected networks of simple and often uniform units). In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Modern neural networks are usually used to model complex relationships between inputs and outputs or to find patterns in data.

A feed-forward neural network is an artificial neural network where connections between the units do not form a directed cycle. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) to the output nodes. There are no cycles or loops in the network. If you want to use a more sophisticated neural net, please use the Neural Net operator.

Input Ports training set (tra) The input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports model (mod) The Hyperplane model is delivered from this output port. This model can now be applied on unseen data sets for the prediction of the label attribute. example set (exa) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters rounds (integer) This parameter specifies the number of passes over the training data (data scans) used to adapt the hyperplane.

learning rate (real) This parameter determines how much the weights should be changed at each step. It should not be 0. The hyperplane will adapt to each example with this rate.
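The update rule described in the Description, together with the rounds and learning rate parameters, can be sketched as follows. This is a minimal illustration, not RapidMiner's implementation; the toy data is invented.

```python
def predict(w, x):
    """Sign of the dot product with an appended bias input of 1."""
    xs = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xs)) >= 0 else -1

def train_perceptron(examples, rounds=3, rate=0.05):
    """For every wrongly classified point, move the weight vector toward
    (label +1) or away from (label -1) the example values."""
    w = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(rounds):                    # one round = one data scan
        for x, y in examples:
            if predict(w, x) != y:
                xs = list(x) + [1.0]
                w = [wi + rate * y * xi for wi, xi in zip(w, xs)]
    return w

# Two linearly separable classes in the plane.
data = [([2.0, 0.0], 1), ([-2.0, 0.0], -1), ([3.0, 1.0], 1), ([-3.0, -1.0], -1)]
w = train_perceptron(data, rounds=10, rate=0.5)
```

Because the data is linearly separable, a few rounds suffice for the learned hyperplane to classify all training points correctly.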

Tutorial Processes

Introduction to Perceptron operator

The 'Ripley' data set is loaded using the Retrieve operator. A breakpoint is inserted here so you can see the data set before the application of the Perceptron operator. You can see that this data set has two regular attributes: att1 and att2. The label attribute has two possible values: 1 and 0. The Perceptron operator is applied on this ExampleSet. All parameters are used with default values. The rounds parameter is set to 3 and the learning rate parameter is set to 0.05. After running the process, you can see the resultant hyperplane model in the Results Workspace.


Neural Net

This operator learns a model by means of a feed-forward neural network trained by a back propagation algorithm (multi-layer perceptron). This operator cannot handle polynominal attributes.

Description

This operator learns a model by means of a feed-forward neural network trained by a back propagation algorithm (multi-layer perceptron). The coming paragraphs explain the basic ideas about neural networks, feed-forward neural networks, back propagation and the multi-layer perceptron.

An artificial neural network (ANN), usually called neural network (NN), is a mathematical model or computational model that is inspired by the structure and functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation (the central connectionist principle is that mental phenomena can be described by interconnected networks of simple and often uniform units). In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Modern neural networks are usually used to model complex relationships between inputs and outputs or to find patterns in data.

A feed-forward neural network is an artificial neural network where connections between the units do not form a directed cycle. In this network, the information moves in only one direction, forward, from the input nodes, through the hidden nodes (if any) to the output nodes. There are no cycles or loops in the network.

The back propagation algorithm is a supervised learning method which can be divided into two phases: propagation and weight update. The two phases are repeated until the performance of the network is good enough. In back propagation algorithms, the output values are compared with the correct answer to compute the value of some predefined error function. By various techniques, the error is then fed back through the network. Using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function by some small amount. After repeating this process for a sufficiently large number of training cycles, the network will usually converge to some state where the error of the calculations is small. In this case, one would say that the network has learned a certain target function.

A multilayer perceptron (MLP) is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. MLP utilizes back propagation for training the network. This class of networks consists of multiple layers of computational units, usually interconnected in a feed-forward way. In many applications the units of these networks apply a sigmoid function as an activation function.

In this operator the usual sigmoid function is used as the activation function. Therefore, the value ranges of the attributes should be scaled to -1 and +1. This can be done through the normalize parameter. The type of the output node is sigmoid if the learning data describes a classification task and linear if the learning data describes a numerical regression task.

Input Ports training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.


Output Ports model (mod) The Neural Net model is delivered from this output port. This model can now be applied on unseen data sets for prediction of the label attribute. example set (exa) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters hidden layers This parameter describes the name and the size of all hidden layers. The user can define the structure of the neural network with this parameter. Each list entry describes a new hidden layer. Each entry requires the name and size of the hidden layer. The layer name can be chosen arbitrarily. It is only used for displaying the model. Note that the actual number of nodes will be one more than the value specified as hidden layer size because an additional constant node will be added to each layer. This node will not be connected to the preceding layer. If the hidden layer size value is set to -1 the layer size would be calculated from the number of attributes of the input example set. In this case, the layer size will be set to (number of attributes + number of classes) / 2 + 1. If the user does not specify any hidden layers, a default hidden layer with sigmoid type and size equal to (number of attributes + number of classes) / 2 + 1 will be created and added to the net. If only a single layer without nodes is specified, the input nodes are directly connected to the output nodes and no hidden layer will be used.

training cycles (integer) This parameter specifies the number of training cycles used for the neural network training. In back propagation the output values are compared with the correct answer to compute the value of some predefined error function. The error is then fed back through the network. Using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function by some small amount. This process is repeated n times, where n can be specified using this parameter.

learning rate (real) This parameter determines how much the weights are changed at each step. It should not be 0.

momentum (real) The momentum simply adds a fraction of the previous weight update to the current one. This prevents local maxima and smoothes optimization directions.

decay (boolean) This is an expert parameter. It indicates if the learning rate should be decreased during learning.

shuffle (boolean) This is an expert parameter. It indicates if the input data should be shuffled before learning. Although it increases memory usage, it is recommended if the data is sorted.

normalize (boolean) This is an expert parameter. The Neural Net operator uses the usual sigmoid function as the activation function. Therefore, the value range of the attributes should be scaled to -1 and +1. This can be done through the normalize parameter. Normalization is performed before learning. Although it increases runtime, it is necessary in most cases.

error epsilon (real) The optimization is stopped if the training error gets below this epsilon value.

use local random seed (boolean) Indicates if a local random seed should be used for randomization.

local random seed (integer) This parameter specifies the local random seed. It is only available if the use local random seed parameter is set to true.

Tutorial Processes

Introduction to Neural Net

The 'Ripley' data set is loaded using the Retrieve operator. A breakpoint is inserted here so you can see the data set before the application of the Neural Net operator. You can see that this data set has two regular attributes, i.e. att1 and att2. The label attribute has two possible values, i.e. 1 or 0. Then the Neural Net operator is applied on it. All parameters are used with default values. When you run the process, you can see the neural net in the Results Workspace. There are x+1 nodes in the input layer, where x is the number of attributes in the input ExampleSet (other than the label attribute). The last node is the threshold node. There are y nodes in the output layer, where y is the number of classes in the input ExampleSet (i.e. the number of possible values of the label attribute). As no value was specified in the hidden layers parameter, the default value is used. Therefore, the number of nodes created in the hidden layer equals the default layer size = (number of attributes + number of classes) / 2 + 1 = (2+2)/2+1 = 3. The last node (4th node) is a threshold node. The connections between nodes are colored darker if the connection weight is high. You can click on a node in this visualization in order to see the actual weights.

This simple process just demonstrates the basic usage of this operator. In real scenarios all parameters should be chosen carefully.

Linear Regression

This operator calculates a linear regression model from the input ExampleSet.

Description

Regression is a technique used for numerical prediction. Regression is a statistical measure that attempts to determine the strength of the relationship between one dependent variable (i.e. the label attribute) and a series of other changing variables known as independent variables (regular attributes). Just like Classification is used for predicting categorical labels, Regression is used for predicting a continuous value. For example, we may wish to predict the salary of university graduates with 5 years of work experience, or the potential sales of a new product given its price. Regression is often used to determine how much specific factors such as the price of a commodity, interest rates, particular industries or sectors influence the price movement of an asset.

Linear regression attempts to model the relationship between a scalar variable and one or more explanatory variables by fitting a linear equation to observed data. For example, one might want to relate the weights of individuals to their heights using a linear regression model.

This operator calculates a linear regression model. It uses the Akaike criterion for model selection. The Akaike information criterion is a measure of the relative goodness of fit of a statistical model. It is grounded in the concept of information entropy, in effect offering a relative measure of the information lost when a given model is used to describe reality. It can be said to describe the tradeoff between bias and variance in model construction, or loosely speaking between accuracy and complexity of the model.
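The core of linear regression, fitting a linear equation to observed data by least squares, can be sketched for the single-attribute case. This is an illustration of the underlying mathematics, not the operator's implementation (which also performs feature selection and model selection via the Akaike criterion).

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = w0 + w1 * x (one regular attribute)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    w0 = mean_y - w1 * mean_x
    return w0, w1

# Noise-free example: the line y = 1 + 2x is recovered exactly.
w0, w1 = linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

w0 is the intercept (controlled by the use bias parameter) and w1 the coefficient of the single regular attribute.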

Differentiation

Polynomial Regression Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth order polynomial. See page 705 for details.

Input Ports training set (tra) This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can be applied on data sets with numeric attributes.


Thus you may often have to use the Nominal to Numerical operator before applying this operator.

Output Ports model (mod) The regression model is delivered from this output port. This model can now be applied on unseen data sets. example set (exa) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace. weights (wei) This port delivers the attribute weights.

Parameters feature selection (selection) This is an expert parameter. It indicates the feature selection method to be used during regression. The following options are available: none, M5 prime, greedy, T-Test, iterative T-Test.

alpha (real) This parameter is available only when the feature selection parameter is set to 'T-Test'. It specifies the value of alpha to be used in the T-Test feature selection.

max iterations (integer) This parameter is only available when the feature selection parameter is set to 'iterative T-Test'. It specifies the maximum number of iterations of the iterative T-Test for feature selection.

forward alpha (real) This parameter is only available when the feature selection parameter is set to 'iterative T-Test'. It specifies the value of forward alpha to be used in the T-Test feature selection.

backward alpha (real) This parameter is only available when the feature selection parameter is set to 'iterative T-Test'. It specifies the value of backward alpha to be used in the T-Test feature selection.

eliminate colinear features (boolean) This parameter indicates if the algorithm should try to delete collinear features during the regression or not.

min tolerance (real) This parameter is only available when the eliminate colinear features parameter is set to true. It specifies the minimum tolerance for eliminating collinear features.

use bias (boolean) This parameter indicates if an intercept value should be calculated or not.

ridge (real) This parameter specifies the ridge parameter to be used in ridge regression.

Related Documents

Polynomial Regression (705)

Tutorial Processes

Applying the Linear Regression operator on the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. The Filter Example Range operator is applied on it. The first example parameter of the Filter Example Range operator is set to 1 and the last example parameter is set to 100. Thus the first 100 examples of the 'Polynomial' data set are selected. The Linear Regression operator is applied on it with default values of all parameters. The regression model generated by the Linear Regression operator is applied on the last 100 examples of the 'Polynomial' data set using the Apply Model operator. Labeled data from the Apply Model operator is provided to the Performance (Regression) operator. The absolute error and the prediction average parameters are set to true. Thus the Performance Vector generated by the Performance (Regression) operator has information regarding the absolute error and the prediction average in the labeled data set. The absolute error is calculated by adding the absolute differences of all predicted values from the actual values of the label attribute, and dividing this sum by the total number of predictions. The prediction average is calculated by adding all actual label values and dividing this sum by the total number of examples. You can verify this from the results in the Results Workspace.

Polynomial Regression

This operator generates a polynomial regression model from the given ExampleSet. Polynomial regression is considered to be a special case of multiple linear regression.

Description

Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth order polynomial. In RapidMiner, y is the label attribute and x is the set of regular attributes that are used for the prediction of y. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x), and has been used to describe nonlinear phenomena such as the growth rate of tissues and the progression of disease epidemics. Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered to be a special case of multiple linear regression.

The goal of regression analysis is to model the expected value of a dependent variable y in terms of the value of an independent variable (or vector of independent variables) x. In simple linear regression, the following model is used: y = w0 + (w1 * x)

In this model, for each unit increase in the value of x, the conditional expectation of y increases by w1 units.

In many settings, such a linear relationship may not hold. For example, if we are modeling the yield of a chemical synthesis in terms of the temperature at which the synthesis takes place, we may find that the yield improves by increasing amounts for each unit increase in temperature. In this case, we might propose a quadratic model of the form: y = w0 + (w1 * x) + (w2 * x^2)

In this model, when the temperature is increased from x to x + 1 units, the expected yield changes by w1 + w2 + 2*(w2 * x). The fact that the change in yield depends on x is what makes the relationship nonlinear (this must not be confused with saying that this is nonlinear regression; on the contrary, this is still a case of linear regression). In general, we can model the expected value of y as an nth order polynomial, yielding the general polynomial regression model: y = w0 + (w1 * x^1) + (w2 * x^2) + ... + (wm * x^m)
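The claim that the change in yield depends on x can be checked numerically for the quadratic model; the coefficients and the temperature below are arbitrary illustrative values.

```python
def quadratic(w0, w1, w2, x):
    """y = w0 + (w1 * x) + (w2 * x^2)"""
    return w0 + w1 * x + w2 * x ** 2

# Arbitrary coefficients and temperature, chosen only for illustration.
w0, w1, w2, x = 1.0, 2.0, 0.5, 3.0
change = quadratic(w0, w1, w2, x + 1) - quadratic(w0, w1, w2, x)
# The change equals w1 + w2 + 2*(w2 * x), so it varies with x.
```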

Regression is a technique used for numerical prediction. It is a statistical measure that attempts to determine the strength of the relationship between one dependent variable (i.e. the label attribute) and a series of other changing variables known as independent variables (regular attributes). Just like Classification is used for predicting categorical labels, Regression is used for predicting a continuous value. For example, we may wish to predict the salary of university graduates with 5 years of work experience, or the potential sales of a new product given its price. Regression is often used to determine how much specific factors such as the price of a commodity, interest rates, particular industries or sectors influence the price movement of an asset.

Differentiation

Linear Regression Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth order polynomial. See page 701 for details.

Input Ports training set (tra) This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can be applied on data sets with numeric attributes. Thus you may often have to use the Nominal to Numerical operator before applying this operator.

Output Ports model (mod) The regression model is delivered from this output port. This model can now be applied on unseen data sets. example set (exa) The ExampleSet that was given as input is passed without any modifications to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

max iterations (integer) This parameter specifies the maximum number of iterations to be used for the model fitting.

replication factor (integer) This parameter specifies the number of times each input variable is replicated, i.e. how many different degrees and coefficients can be applied to each variable.

max degree (integer) This parameter specifies the maximal degree to be used for the final polynomial.

min coefficient (real) This parameter specifies the minimum number to be used for the coefficients and the offset.

max coefficient (real) This parameter specifies the maximum number to be used for the coefficients and the offset.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Related Documents

Linear Regression (701)

Tutorial Processes

Applying the Polynomial Regression operator on the Polynomial data set


The 'Polynomial' data set is loaded using the Retrieve operator. The Split Data operator is applied on it to split the ExampleSet into training and testing data sets. The Polynomial Regression operator is applied on the training data set with default values of all parameters. The regression model generated by the Polynomial Regression operator is applied on the testing data set of the 'Polynomial' data set using the Apply Model operator. The labeled data set generated by the Apply Model operator is provided to the Performance (Regression) operator. The absolute error and the prediction average parameters are set to true. Thus the Performance Vector generated by the Performance (Regression) operator has information regarding the absolute error and the prediction average in the labeled data set. The absolute error is calculated by summing the absolute differences between the predicted values and the actual values of the label attribute, and dividing this sum by the total number of predictions. The prediction average is calculated by adding all actual label values and dividing this sum by the total number of examples. You can verify this from the results in the Results Workspace.
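The two measures can be reproduced by hand. The sketch below uses made-up predicted and actual label values, not the values of the 'Polynomial' data set:

```python
# Hypothetical actual label values and model predictions for four examples.
actual = [10.0, 12.0, 9.0, 14.0]
predicted = [11.0, 11.5, 9.5, 13.0]

n = len(actual)
# absolute error: mean of the absolute differences |predicted - actual|
absolute_error = sum(abs(p - a) for p, a in zip(predicted, actual)) / n
# prediction average: mean of the actual label values
prediction_average = sum(actual) / n
```

For these numbers the absolute error is 0.75 and the prediction average is 11.25, which is exactly what the Performance (Regression) operator reports for its corresponding criteria.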

Logistic Regression


This operator is a Logistic Regression learner. It is based on the internal Java implementation of myKLR by Stefan Rueping.

Description

This learner uses the Java implementation of myKLR by Stefan Rueping. myKLR is a tool for large scale kernel logistic regression based on the algorithm of Keerthi et al. (2003) and the code of mySVM. For compatibility reasons, the model of myKLR differs slightly from that of Keerthi et al. (2003). As myKLR is based on the code of mySVM, the format of example files, parameter files and kernel definitions is identical. Please see the documentation of the SVM operator for further information. This learning method can be used for both regression and classification and provides a fast algorithm and good results for many learning tasks. mySVM works with linear or quadratic and even asymmetric loss functions.
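The idea behind kernel logistic regression can be illustrated in a few lines. The sketch below is not myKLR (which solves a specific dual formulation efficiently); it is a plain gradient-descent version of the same model, f(x) = sum_i alpha_i K(x_i, x), fitted on made-up data with illustrative step size and regularization settings:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Synthetic, nonlinearly separable data: label 1 outside the unit circle.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(float)

K = rbf_kernel(X, X)
alpha = np.zeros(len(X))
lam, lr = 1e-3, 0.2  # illustrative ridge penalty and learning rate
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(K @ alpha)))       # predicted probabilities
    grad = K @ (p - y) / len(X) + lam * (K @ alpha)  # logistic loss + ridge
    alpha -= lr * grad

train_acc = (((K @ alpha) > 0.0) == (y > 0.5)).mean()
```

Because the decision function lives in the kernel-induced feature space, the circular class boundary is learned even though logistic regression itself is a linear model.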

This operator supports various kernel types including dot, radial, polynomial, neural, anova, epachnenikov, gaussian combination and multiquadric. Explana- tion of these kernel types is given in the parameters section.

Input Ports training set (tra) This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can be applied on data sets with numeric attributes. Thus often you may have to use the Nominal to Numerical operator before application of this operator.

Output Ports model (mod) The Logistic Regression model is delivered from this output port. This model can now be applied on unseen data sets. weights (wei) This port delivers the attribute weights. This is only possible when the dot kernel type is used; it is not possible with other kernel types. example set (exa) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters kernel type (selection) The type of the kernel function is selected through this parameter. The following kernel types are supported: dot, radial, polynomial, neural, anova, epachnenikov, gaussian combination, multiquadric

ˆ dot The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.

ˆ radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma; it is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

ˆ polynomial The polynomial kernel is defined by k(x,y)=(x*y+1)^d where d is the degree of polynomial and it is specified by the kernel degree parameter. The polynomial kernels are well suited for problems where all the training data is normalized.

ˆ neural The neural kernel is defined by a two layered neural net tanh(a x*y+b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

ˆ anova The anova kernel is defined by the summation of exp(-g (x-y)) raised to the power d, where g is gamma and d is degree. gamma and degree are adjusted by the kernel gamma and kernel degree parameters respectively.


ˆ epachnenikov The epachnenikov kernel is the function (3/4)(1-u^2) for u between -1 and 1, and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.

ˆ gaussian combination This is the gaussian combination kernel. It has adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

ˆ multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the SVM kernel parameter gamma. This is only available when the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the SVM kernel parameter sigma1. This is only available when the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the SVM kernel parameter sigma2. This is only available when the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the SVM kernel parameter sigma3. This is only available when the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the SVM kernel parameter shift. This is only available when the kernel type parameter is set to multiquadric.

kernel degree (real) This is the SVM kernel parameter degree. This is only available when the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the SVM kernel parameter a. This is only available when the kernel type parameter is set to neural.

kernel b (real) This is the SVM kernel parameter b. This is only available when the kernel type parameter is set to neural.

kernel cache (real) This is an expert parameter. It specifies the size of the cache for kernel evaluations in megabytes.

C (real) This is the SVM complexity constant which sets the tolerance for misclassification, where higher C values allow for 'softer' boundaries and lower values create 'harder' boundaries. A complexity constant that is too large can lead to over-fitting, while values that are too small may result in over-generalization.

convergence epsilon (real) This is an optimizer parameter. It specifies the precision on the KKT conditions.

max iterations (integer) This is an optimizer parameter. It specifies the maximum number of iterations after which the optimization is stopped.

scale (boolean) This is a global parameter. If checked, the example values are scaled and the scaling parameters are stored for a test set.

Tutorial Processes

Introduction to the Logistic Regression operator

The 'Sonar' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a regression model. The Logistic Regression operator is applied in the training subprocess of the Split Validation operator. All parameters are used with default values. The Logistic Regression operator generates a regression model. The Apply Model operator is used in the testing subprocess to apply this model on the testing data set. The resultant labeled ExampleSet is used by the Performance operator for measuring the performance of the model. The regression model and its performance vector are connected to the output and can be seen in the Results Workspace.


Support Vector Machine

This operator is an SVM (Support Vector Machine) learner. It is based on the internal Java implementation of mySVM by Stefan Rueping.

Description

This learner uses the Java implementation of the support vector machine mySVM by Stefan Rueping. This learning method can be used for both regression and classification and provides a fast algorithm and good results for many learning tasks. mySVM works with linear or quadratic and even asymmetric loss functions.

This operator supports various kernel types including dot, radial, polynomial, neural, anova, epachnenikov, gaussian combination and multiquadric. Explana- tion of these kernel types is given in the parameters section.

Here is a basic description of the SVM. The standard SVM takes a set of input data and predicts, for each given input, which of the two possible classes comprises the input, making the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier. Whereas the original problem may be stated in a finite dimensional space, it often happens that the sets to discriminate are not linearly separable in that space. For this reason, it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier in that space. To keep the computational load reasonable, the mappings used by SVM schemes are designed to ensure that dot products may be computed easily in terms of the variables in the original space, by defining them in terms of a kernel function K(x,y) selected to suit the problem. The hyperplanes in the higher dimensional space are defined as the set of points whose inner product with a vector in that space is constant.
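The geometric picture above can be sketched for the linear case. The weight vector and offset below are made up for illustration; a real SVM would learn w and b by maximizing the margin:

```python
import numpy as np

# For a linear SVM the decision function is f(x) = w . x + b; an example
# is assigned to one category or the other depending on which side of the
# hyperplane it falls, i.e. the sign of f(x).
w = np.array([2.0, -1.0])  # illustrative weight vector (not learned)
b = -0.5                   # illustrative offset

def predict(x):
    return 1 if np.dot(w, x) + b >= 0 else -1

# The geometric margin of a correctly classified point x with label t
# (+1 or -1) is t * (w . x + b) / ||w||: its signed distance to the
# hyperplane, which SVM training tries to maximize over the training set.
def margin(x, t):
    return t * (np.dot(w, x) + b) / np.linalg.norm(w)
```

For example, the point (1, 0) lies on the positive side (f = 1.5) and the point (0, 1) on the negative side (f = -1.5); their distances to the hyperplane are 1.5 / ||w||.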

Input Ports training set (tra) This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can be applied on data sets with numeric attributes. Thus often you may have to use the Nominal to Numerical operator before application of this operator.

Output Ports model (mod) The SVM model is delivered from this output port. This model can now be applied on unseen data sets. example set (exa) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace. estimated performance (est) This port delivers a performance vector of the SVM model which gives an estimation of the statistical performance of this model. weights (wei) This port delivers the attribute weights. This is possible only when the dot kernel type is used; it is not possible with other kernel types.

Parameters kernel type (selection) The type of the kernel function is selected through this parameter. The following kernel types are supported: dot, radial, polynomial, neural, anova, epachnenikov, gaussian combination, multiquadric

ˆ dot The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.

ˆ radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma; it is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

ˆ polynomial The polynomial kernel is defined by k(x,y)=(x*y+1)^d where d is the degree of polynomial and it is specified by the kernel degree parameter. The polynomial kernels are well suited for problems where all the training data is normalized.

ˆ neural The neural kernel is defined by a two layered neural net tanh(a x*y+b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

ˆ anova The anova kernel is defined by the summation of exp(-g (x-y)) raised to the power d, where g is gamma and d is degree. gamma and degree are adjusted by the kernel gamma and kernel degree parameters respectively.

ˆ epachnenikov The epachnenikov kernel is the function (3/4)(1-u^2) for u between -1 and 1, and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.

ˆ gaussian combination This is the gaussian combination kernel. It has adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

ˆ multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the SVM kernel parameter gamma. This is available only when the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the SVM kernel parameter sigma1. This is available only when the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the SVM kernel parameter sigma2. This is available only when the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the SVM kernel parameter sigma3. This is available only when the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the SVM kernel parameter shift. This is available only when the kernel type parameter is set to multiquadric.

kernel degree (real) This is the SVM kernel parameter degree. This is available only when the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the SVM kernel parameter a. This is available only when the kernel type parameter is set to neural.

kernel b (real) This is the SVM kernel parameter b. This is available only when the kernel type parameter is set to neural.

kernel cache (real) This is an expert parameter. It specifies the size of the cache for kernel evaluations in megabytes.

C (real) This is the SVM complexity constant which sets the tolerance for misclassification, where higher C values allow for 'softer' boundaries and lower values create 'harder' boundaries. A complexity constant that is too large can lead to over-fitting, while values that are too small may result in over-generalization.

convergence epsilon (real) This is an optimizer parameter. It specifies the precision on the KKT conditions.

max iterations (integer) This is an optimizer parameter. It specifies the maximum number of iterations after which the optimization is stopped.
scale (boolean) This is a global parameter. If checked, the example values are scaled and the scaling parameters are stored for a test set.


L pos (real) A factor for the SVM complexity constant for positive examples. This parameter is part of the loss function.

L neg (real) A factor for the SVM complexity constant for negative examples. This parameter is part of the loss function.

epsilon (real) This parameter specifies the insensitivity constant. No loss occurs if the prediction lies this close to the true value. This parameter is part of the loss function.

epsilon plus (real) This parameter is part of the loss function. It specifies epsilon for positive deviation only.

epsilon minus (real) This parameter is part of the loss function. It specifies epsilon for negative deviation only.

balance cost (boolean) If checked, adapts Cpos and Cneg to the relative size of the classes.

quadratic loss pos (boolean) Use quadratic loss for positive deviation. This parameter is part of the loss function.

quadratic loss neg (boolean) Use quadratic loss for negative deviation. This parameter is part of the loss function.

Tutorial Processes

Getting started with SVM

This is a simple Example Process which gets you started with the SVM operator. The Retrieve operator is used to load the 'Golf' data set. The Nominal to Numerical operator is applied on it to convert its nominal attributes to numerical form. This step is necessary because the SVM operator cannot take nominal attributes; it can only classify using numerical attributes. The model generated from the SVM operator is then applied on the 'Golf-Testset' data set. The Nominal to Numerical operator was applied on this data set as well. This is necessary because the testing and training data sets should be in the same format. The statistical performance of this model is measured using the Performance operator. This is a very basic process. It is recommended that you develop a deeper understanding of SVM for getting better results through this operator. The support vector machine (SVM) is a popular classification technique. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps.

Using 'm' numbers to represent an m-category attribute is recommended. Only one of the 'm' numbers is 1, the others are 0. For example, a three-category attribute such as Outlook {overcast, sunny, rain} can be represented as (0,0,1), (0,1,0), and (1,0,0). This can be achieved by setting the coding type parameter to 'dummy coding' in the Nominal to Numerical operator. Generally, if the number of values in an attribute is not too large, this coding might be more stable than using a single number.
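The dummy coding described above can be sketched in a few lines. The category order here is arbitrary; the Nominal to Numerical operator determines its own column order:

```python
# Dummy-code a three-category attribute such as Outlook.
categories = ["overcast", "sunny", "rain"]

def dummy_code(value):
    # Exactly one of the m numbers is 1, the others are 0.
    return [1 if value == c else 0 for c in categories]
```

With this category order, "sunny" becomes [0, 1, 0] and "rain" becomes [0, 0, 1], so no artificial ordering between the categories is introduced, unlike coding them as a single number 0, 1, 2.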

To get a more accurate classification model from SVM, scaling is recommended. The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges. Another advantage is to avoid numerical difficulties during the calculation. Because kernel values usually depend on the inner products of feature vectors, e.g. the linear kernel and the polynomial kernel, large attribute values might cause numerical problems. Scaling should be performed on both training and testing data sets. In this process the scale parameter is checked. Uncheck the scale parameter and run the process again. You will see that this time it takes a lot longer than the time taken with scaling.
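The key point, that the same scaling parameters must be applied to training and testing data, can be sketched as follows. The tiny data arrays are made up; standardization to zero mean and unit variance is one common choice (the operator stores its scaling parameters for the test set in the same spirit):

```python
import numpy as np

# Two attributes with very different numeric ranges (made-up values).
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[2.0, 250.0]])

# Compute the scaling statistics on the training set ONLY...
mean, std = train.mean(axis=0), train.std(axis=0)
train_scaled = (train - mean) / std
# ...and reuse exactly those statistics for the test set.
test_scaled = (test - mean) / std
```

Computing a fresh mean and standard deviation on the test set instead would silently put training and test examples on different scales and distort the kernel values.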

You should have a good understanding of kernel types and the different parameters associated with each kernel type in order to get better results from this operator. The gaussian combination kernel was used in this example process. All parameters were used with default values. The accuracy of this model was just 35.71%. Try changing different parameters to get better results. If you change the parameter C to 1 instead of 0, you will see that the accuracy of the model rises to 64.29%. Thus, you can see how making small changes in parameters can have a significant effect on overall results. It is therefore very necessary to have a good understanding of the parameters of the kernel type in use. It is equally important to have a good understanding of different kernel types, and to choose the most suitable kernel type for your ExampleSet. Try using the polynomial kernel in this Example Process (also set the parameter C to 0); you will see that accuracy is around 71.43% with default values for all parameters. Change the value of the parameter C to 1 instead of 0. Doing this increased the accuracy of the model with the gaussian combination kernel, but here you will see that the accuracy of the model drops.

We used default values for most of the parameters. To get more accurate results these values should be carefully selected. Usually techniques like cross-validation are used to find the best values of these parameters for the ExampleSet under consideration.
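The cross-validation search mentioned above can be sketched generically. The function `train_and_score` is a hypothetical stand-in for training a model with given parameters and returning its accuracy on the held-out fold; it is not a RapidMiner API:

```python
import numpy as np
from itertools import product

def cross_val_grid_search(X, y, param_grid, train_and_score, k=5):
    # Try every parameter combination; score each one by k-fold
    # cross-validation and keep the combination with the best mean score.
    n = len(X)
    folds = np.array_split(np.arange(n), k)
    best_params, best_score = None, -np.inf
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        scores = []
        for fold in folds:
            mask = np.ones(n, dtype=bool)
            mask[fold] = False  # train on everything except this fold
            scores.append(train_and_score(X[mask], y[mask], X[fold], y[fold], params))
        score = float(np.mean(scores))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In RapidMiner the same pattern is expressed with the Optimize Parameters operator wrapped around a validation operator, rather than in code.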

Support Vector Machine (LibSVM)

This operator is an SVM (Support Vector Machine) learner. It is based on the internal Java implementation of libSVM.

Description

This operator applies the libsvm learner (http://www.csie.ntu.edu.tw/~cjlin/libsvm) by Chih-Chung Chang and Chih-Jen Lin. SVM is a powerful method for both classification and regression. This operator supports the C-SVC and nu-SVC SVM types for classification tasks as well as the epsilon-SVR and nu-SVR SVM types for regression tasks. Additionally the one-class SVM type is supported for distribution estimation. The one-class SVM type gives the possibility to learn from just one class of examples and later on test if new examples match the known ones. In contrast to other SVM learners, the libsvm supports internal multiclass learning and probability estimation based on Platt scaling for proper confidence values after applying the learned model on a classification data set.
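Platt scaling, which libsvm uses for its probability estimates, maps a raw SVM decision value to a confidence through a fitted sigmoid. In the sketch below the coefficients A and B are made-up numbers purely to illustrate the mapping; libsvm fits them on held-out decision values during training:

```python
import math

# Platt scaling: P(y = 1 | x) = 1 / (1 + exp(A * f(x) + B)), where f(x)
# is the raw SVM decision value. A is negative so that large positive
# decision values map to confidences near 1.
A, B = -1.5, 0.0  # illustrative values, not fitted

def platt_probability(decision_value):
    return 1.0 / (1.0 + math.exp(A * decision_value + B))
```

A point exactly on the hyperplane (decision value 0) gets confidence 0.5 with these coefficients, while points far from the hyperplane approach confidence 0 or 1, which is what turns an uncalibrated margin into a usable confidence value.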

Here is a basic description of SVM. The standard SVM takes a set of input data and predicts, for each given input, which of two possible classes comprises the input, making the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier. Whereas the original problem may be stated in a finite dimensional space, it often happens that the sets to discriminate are not linearly separable in that space. For this reason, it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier in that space. To keep the computational load reasonable, the mappings used by SVM schemes are designed to ensure that dot products may be computed easily in terms of the variables in the original space, by defining them in terms of a kernel function K(x,y) selected to suit the problem. The hyperplanes in the higher dimensional space are defined as the set of points whose inner product with a vector in that space is constant.

For more information regarding libsvm you can visit http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Input Ports training set (tra) This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can be applied on data sets with numeric attributes. Thus often you may have to use the Nominal to Numerical operator before applying this operator.

Output Ports model (mod) The SVM model is delivered from this output port. This model can now be applied on unseen data sets. example set (exa) The ExampleSet that was given as input is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters svm type (selection) The SVM type is selected through this parameter. This operator supports the C-SVC and nu-SVC SVM types for classification tasks. The epsilon-SVR and nu-SVR SVM types are for regression tasks. The one-class SVM type is for distribution estimation. The one-class SVM type gives the possibility to learn from just one class of examples and later on test if new examples match the known ones.

kernel type (selection) The type of the kernel function is selected through this parameter. The following kernel types are supported: linear, poly, rbf, sigmoid, precomputed. The rbf kernel type is the default value. In general, the rbf kernel is a reasonable first choice. Here are a few guidelines regarding different kernel types.

ˆ the rbf kernel nonlinearly maps samples into a higher dimensional space

ˆ the rbf kernel, unlike the linear kernel, can handle the case when the relation between class labels and attributes is nonlinear

ˆ the linear kernel is a special case of the rbf kernel

ˆ the sigmoid kernel behaves like the rbf kernel for certain parameters

ˆ the number of hyperparameters influences the complexity of model selection. The poly kernel has more hyperparameters than the rbf kernel

ˆ the rbf kernel has fewer numerical difficulties

ˆ the sigmoid kernel is not valid under some parameters

ˆ There are some situations where the rbf kernel is not suitable. In particular, when the number of features is very large, one may just use the linear kernel.

degree (real) This parameter is only available when the kernel type parameter is set to 'poly'. This parameter is used to specify the degree for a polynomial kernel function.

gamma (real) This parameter is only available when the kernel type parameter is set to 'poly', 'rbf' or 'sigmoid'. This parameter specifies gamma for the 'poly', 'rbf' and 'sigmoid' kernel functions. The value of gamma may play an important role in the SVM model. Changing the value of gamma may change the accuracy of the resulting SVM model. So, it is a good practice to use cross-validation to find the optimal value of gamma.

coef0 (real) This parameter is only available when the kernel type parameter is set to 'poly' or 'precomputed'. This parameter specifies coef0 for the 'poly' and 'precomputed' kernel functions.

C (real) This parameter is only available when the svm type parameter is set to 'c-SVC', 'epsilon-SVR' or 'nu-SVR'. This parameter specifies the cost parameter C for 'c-SVC', 'epsilon-SVR' and 'nu-SVR'. C is the penalty parameter of the error term.

nu (real) This parameter is only available when the svm type parameter is set to 'nu-SVC', 'one-class' or 'nu-SVR'. This parameter specifies the nu parameter for 'nu-SVC', 'one-class' and 'nu-SVR'. Its value should be between 0.0 and 0.5.

cache size (real) This is an expert parameter. It specifies the cache size in megabytes.

epsilon (real) This parameter specifies the tolerance of the termination criterion.

p (real) This parameter is only available when the svm type parameter is set to 'epsilon-SVR'. This parameter specifies the tolerance of the loss function of 'epsilon-SVR'.

class weights (list) This is an expert parameter. It specifies the weights 'w' for all classes. The Edit List button opens a new window with two columns. The first column specifies the class name and the second column specifies the weight for that class. The parameter C is calculated as the weight of the class multiplied by C. If the weight of a class is not specified, that class is assigned weight = 1.

shrinking (boolean) This is an expert parameter. It specifies whether to use the shrinking heuristics.

calculate confidences (boolean) This parameter indicates if proper confidence values should be calculated.

confidence for multiclass (boolean) This is an expert parameter. It indicates if the class with the highest confidence should be selected in the multiclass setting. Otherwise, a binary majority vote over all 1-vs-1 classifiers is used (the selected class need not be the one with the highest confidence in that case).

Tutorial Processes

SVM with rbf kernel

This is a simple Example Process which gets you started with the SVM (libSVM) operator. The Retrieve operator is used to load the 'Golf' data set. The Nominal to Numerical operator is applied on it to convert its nominal attributes to numerical form. This step is necessary because the SVM (libSVM) operator cannot take nominal attributes; it can only classify using numerical attributes. The model generated from the SVM (libSVM) operator is then applied on the 'Golf-Testset' data set using the Apply Model operator. The Nominal to Numerical operator was also applied on this data set. This is necessary because the testing and training data sets should be in the same format. The statistical performance of this model is measured using the Performance operator. This is a very basic process. It is recommended that you develop a deeper understanding of the SVM (libSVM) for getting better results through this operator. The support vector machine (SVM) is a popular classification technique. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps.

Using 'm' numbers to represent an m-category attribute is recommended. Only one of the 'm' numbers is 1, and the others are 0. For example, a three-category attribute such as Outlook {overcast, sunny, rain} can be represented as (0,0,1), (0,1,0), and (1,0,0). This can be achieved by setting the coding type parameter to 'dummy coding' in the Nominal to Numerical operator. Generally, if the number of values in an attribute is not too large, this coding might be more stable than using a single number.

This basic process omitted various essential steps that are necessary for getting acceptable results from this operator. For example, to get a more accurate classification model from SVM, scaling is recommended. The main advantage of scaling is to avoid attributes in greater numeric ranges dominating those in smaller numeric ranges. Another advantage is to avoid numerical difficulties during the calculation. Because kernel values usually depend on the inner products of feature vectors, e.g. the linear kernel and the polynomial kernel, large attribute values might cause numerical problems. Scaling should be performed on both training and testing data sets.

We have used the default values of the parameters C, gamma and epsilon. To get more accurate results these values should be carefully selected. Usually techniques like cross-validation are used to find the best values of these parameters for the ExampleSet under consideration.


Support Vector Machine (Evolutionary)

This operator is an SVM implementation that uses an evolutionary algorithm to solve the dual optimization problem of an SVM.

Description

The Support Vector Machine (Evolutionary) uses an Evolutionary Strategy for optimization. This operator is an SVM implementation that uses an evolutionary algorithm to solve the dual optimization problem of an SVM. It turns out that on many datasets this simple implementation is as fast and accurate as the usual SVM implementations. In addition, it is also capable of learning with kernels which are not positive semi-definite and can also be used for multi-objective learning, which makes the selection of the parameter C unnecessary before learning. For more information please study 'Evolutionary Learning with Kernels: A Generic Solution for Large Margin Problems' by Ingo Mierswa.

This operator supports various kernel types including dot, radial, polynomial, sigmoid, anova, epachnenikov, gaussian combination and multiquadric. An explanation of these kernel types is given in the parameters section.

Here is a basic description of the SVM. The standard SVM takes a set of input data and predicts, for each given input, which of the two possible classes comprises the input, making the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier. Whereas the original problem may be stated in a finite-dimensional space, it often happens that the sets to discriminate are not linearly separable in that space. For this reason, it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier in that space. To keep the computational load reasonable, the mappings used by SVM schemes are designed to ensure that dot products may be computed easily in terms of the variables in the original space, by defining them in terms of a kernel function K(x,y) selected to suit the problem. The hyperplanes in the higher-dimensional space are defined as the set of points whose inner product with a vector in that space is constant.
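How a trained kernel SVM uses these inner products at prediction time can be sketched as follows. The decision value is a kernel-weighted sum over the support vectors, f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; the support vectors, coefficients and bias below are made-up illustrative numbers, not output of any real training run.

```python
# Kernel SVM decision function: sign of a kernel-weighted sum over
# support vectors plus a bias term.
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def svm_predict(x, support_vectors, alphas, ys, b, kernel=dot):
    f = sum(a * y * kernel(sv, x)
            for sv, a, y in zip(support_vectors, alphas, ys)) + b
    return 1 if f >= 0 else -1

svs    = [[1.0, 1.0], [-1.0, -1.0]]   # illustrative support vectors
alphas = [0.5, 0.5]                   # illustrative dual coefficients
ys     = [1, -1]                      # their class labels
print(svm_predict([2.0, 2.0], svs, alphas, ys, b=0.0))    # 1
print(svm_predict([-2.0, -2.0], svs, alphas, ys, b=0.0))  # -1
```

Replacing `dot` with any other kernel function changes the shape of the decision boundary without changing this prediction machinery.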


Input Ports

training set (tra) This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can only be applied on data sets with numeric attributes. Thus you may often have to use the Nominal to Numerical operator before applying this operator.

Output Ports

model (mod) The SVM model is delivered from this output port. This model can now be applied on unseen data sets.

example set (exa) The ExampleSet that was given as input is passed to the output without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

kernel type (selection) The type of the kernel function is selected through this parameter. The following kernel types are supported: dot, radial, polynomial, sigmoid, anova, epachnenikov, gaussian combination, multiquadric.

- dot The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.

- radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma; it is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

- polynomial The polynomial kernel is defined by k(x,y) = (x*y+1)^d where d is the degree of the polynomial and it is specified by the kernel degree parameter. The polynomial kernels are well suited for problems where all the training data is normalized.

- sigmoid The sigmoid kernel is defined by a two-layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

- anova The anova kernel is defined by the summation of exp(-g (x-y)) raised to the power d, where g is gamma and d is degree. gamma and degree are adjusted by the kernel gamma and kernel degree parameters respectively.

- epachnenikov The epachnenikov kernel is the function (3/4)(1-u^2) for u between -1 and 1, and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.

- gaussian combination This is the gaussian combination kernel. It has the adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

- multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has the adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the kernel parameter gamma. This is only available when the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the kernel parameter sigma1. This is only available when the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the kernel parameter sigma2. This is only available when the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the kernel parameter sigma3. This is only available when the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the kernel parameter shift. This is only available when the kernel type parameter is set to multiquadric.

kernel degree (real) This is the kernel parameter degree. This is only available when the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the kernel parameter a. This is only available when the kernel type parameter is set to sigmoid.

kernel b (real) This is the kernel parameter b. This is only available when the kernel type parameter is set to sigmoid.

C (real) This is the complexity constant which sets the tolerance for misclassification, where higher C values allow for 'softer' boundaries and lower values create 'harder' boundaries. A complexity constant that is too large can lead to over-fitting, while values that are too small may result in over-generalization.

epsilon This parameter specifies the width of the regression tube loss function of the regression SVM.

start population type (selection) This parameter specifies the type of start population initialization.

max generations (integer) This parameter specifies the number of generations after which the algorithm should be terminated.

generations without improval (integer) This parameter specifies the stop criterion for early stopping, i.e. it stops after n generations without improvement in the performance. n is specified by this parameter.

population size (integer) This parameter specifies the population size, i.e. the number of individuals per generation. If set to -1, all examples are selected.

tournament fraction (real) This parameter specifies the fraction of the current population which should be used as tournament members.

keep best (boolean) This parameter specifies if the best individual should survive. Retaining the best individuals of a generation unchanged in the next generation is called elitism or elitist selection.

mutation type (selection) This parameter specifies the type of the mutation operator.

selection type (selection) This parameter specifies the selection scheme of this evolutionary algorithm.

crossover prob (real) The probability for an individual to be selected for crossover is specified by this parameter.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

hold out set ratio (real) This operator uses this amount as a hold-out set to estimate generalization error after learning.

show convergence plot (boolean) This parameter indicates if a dialog with a convergence plot should be drawn.

show population plot (boolean) This parameter indicates if the population plot in case of non-dominated sorting should be shown.

return optimization performance (boolean) This parameter indicates if the final optimization fitness should be returned as performance.
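For reference, a few of the kernels listed above can be written out as plain Python functions. This is a hedged sketch: the parameter names g, d, a, b and c mirror gamma, degree, kernel a, kernel b and the multiquadric constant, and the exact formulas used internally may differ in detail.

```python
# Plain-function sketches of some of the kernel types described above.
import math

def dot(x, y):
    """k(x,y) = x*y, the inner product."""
    return sum(xi * yi for xi, yi in zip(x, y))

def radial(x, y, g):
    """exp(-g ||x-y||^2), the radial (RBF) kernel."""
    return math.exp(-g * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def polynomial(x, y, d):
    """(x*y + 1)^d, the polynomial kernel of degree d."""
    return (dot(x, y) + 1) ** d

def sigmoid(x, y, a, b):
    """tanh(a x*y + b), the sigmoid (two-layered neural net) kernel."""
    return math.tanh(a * dot(x, y) + b)

def multiquadric(x, y, c):
    """sqrt(||x-y||^2 + c^2), the multiquadric kernel."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) + c ** 2)

x, y = [1.0, 2.0], [2.0, 0.0]
print(dot(x, y))              # 2.0
print(polynomial(x, y, 2))    # 9.0
print(radial(x, x, 0.5))      # 1.0
```

Note that radial(x, x, g) is always 1, since the distance of a point to itself is zero.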

Tutorial Processes

Introduction to the Support Vector Machine (Evolutionary) operator

This is a simple Example Process which gets you started with the SVM (Evolutionary) operator. The Retrieve operator is used to load the 'Golf' data set. The Nominal to Numerical operator is applied on it to convert its nominal attributes to numerical form. This step is necessary because the SVM (Evolutionary) operator cannot take nominal attributes; it can only classify using numerical attributes. The model generated from the SVM (Evolutionary) operator is then applied on the 'Golf-Testset' data set. The Nominal to Numerical operator was applied on this data set as well. This is necessary because the testing and training data sets should be in the same format. The statistical performance of this model is measured using the Performance operator. This is a very basic process. It is recommended that you develop a deeper understanding of SVM for getting better results through this operator. The support vector machine (SVM) is a popular classification technique. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps.


You should have a good understanding of kernel types and the different parameters associated with each kernel type in order to get better results from this operator. The gaussian combination kernel was used in this example process. All parameters were used with default values. The accuracy of this model was just 35.71%. Try changing different parameters to get better results. If you change the parameter C to 1 instead of 0, you will see that the accuracy of the model rises to 64.29%. Thus you can see how small changes in parameters can have a significant effect on the overall results. It is therefore very important to have a good understanding of the parameters of the kernel type in use. It is equally important to have a good understanding of the different kernel types, and to choose the most suitable kernel type for your ExampleSet. Try using the polynomial kernel in this Example Process (also set the parameter C to 0); you will see that accuracy is around 64.29% with default values for all parameters. Change the value of the parameter C to 1 instead of 0. Doing this increased the accuracy of the model with the gaussian combination kernel, but here you will see that the accuracy of the model drops.

Default values were used for most of the parameters in the process. To get more accurate results, these values should be carefully selected. Usually techniques like cross-validation are used to find the best values of these parameters for the ExampleSet under consideration.


Support Vector Machine (PSO)

This operator is a Support Vector Machine (SVM) Learner which uses Particle Swarm Optimization (PSO) for optimization. PSO is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality.

Description

This operator implements a hybrid approach which combines a support vector classifier with particle swarm optimization, in order to improve the strength of each individual technique and compensate for each other's weaknesses. Particle Swarm Optimization (PSO) is an evolutionary computation technique in which each potential solution is seen as a particle with a certain velocity flying through the problem space. Support Vector Machine (SVM) classification operates a linear separation in an augmented space by means of some defined kernels satisfying Mercer's condition. These kernels map the input vectors into a very high-dimensional space, possibly of infinite dimension, where linear separation is more likely. Then a linear separating hyperplane is found by maximizing the margin between the two classes in this space. Hence the complexity of the separating hyperplane depends on the nature and the properties of the used kernel.

Support Vector Machine (SVM)

Here is a basic description of the SVM. The standard SVM takes a set of input data and predicts, for each given input, which of the two possible classes comprises the input, making the SVM a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples into one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on. For more information about SVM please study the description of the SVM operator.

Particle Swarm Optimization (PSO)

Particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. PSO is a metaheuristic as it makes few or no assumptions about the problem being optimized and can search very large spaces of candidate solutions. However, metaheuristics such as PSO do not guarantee that an optimal solution is ever found. More specifically, PSO does not use the gradient of the problem being optimized, which means PSO does not require that the optimization problem be differentiable, as is required by most classic optimization methods. PSO can therefore also be used on optimization problems that are partially irregular, noisy, change over time, etc.
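The particle update described above can be sketched as a minimal PSO loop on a one-dimensional function. Each particle's velocity is pulled by inertia, by its own best position, and by the swarm's best position, which correspond to the inertia weight, local best weight and global best weight parameters of this operator; the constants below are illustrative.

```python
# Minimal particle swarm optimization of a 1-D function.
import random

def pso_minimize(f, n_particles=10, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [rng.uniform(-10, 10) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                # each particle's best position so far
    gbest = min(pos, key=f)       # the swarm's best position so far
    for _ in range(iters):
        for i in range(n_particles):
            vel[i] = (w * vel[i]                                   # inertia
                      + c1 * rng.random() * (pbest[i] - pos[i])    # local best pull
                      + c2 * rng.random() * (gbest - pos[i]))      # global best pull
            pos[i] += vel[i]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i]
            if f(pos[i]) < f(gbest):
                gbest = pos[i]
    return gbest

# The minimum of (x - 3)^2 is at x = 3; the result should land near 3.
print(pso_minimize(lambda x: (x - 3) ** 2))
```

Note that no gradient of f is ever computed, which is exactly why PSO also works on irregular or noisy objectives.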

Input Ports

training set (tra) This input port expects an ExampleSet. This operator cannot handle nominal attributes; it can only be applied on data sets with numeric attributes. Moreover, this operator can only be applied on ExampleSets with a binominal label.

Output Ports

model (mod) The SVM model is delivered from this output port. This model can now be applied on unseen data sets.

example set (exa) The ExampleSet that was given as input is passed to the output without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

show convergence plot (boolean) This parameter indicates if a dialog with a convergence plot should be drawn.

kernel type (selection) The type of the kernel function is selected through this parameter. The following kernel types are supported: dot, radial, polynomial, neural, anova, epachnenikov, gaussian combination, multiquadric.

- dot The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.

- radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma; it is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

- polynomial The polynomial kernel is defined by k(x,y) = (x*y+1)^d where d is the degree of the polynomial and it is specified by the kernel degree parameter. The polynomial kernels are well suited for problems where all the training data is normalized.

- neural The neural kernel is defined by a two-layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

- anova The anova kernel is defined by the summation of exp(-g (x-y)) raised to the power d, where g is gamma and d is degree. gamma and degree are adjusted by the kernel gamma and kernel degree parameters respectively.

- epachnenikov The epachnenikov kernel is the function (3/4)(1-u^2) for u between -1 and 1, and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.

- gaussian combination This is the gaussian combination kernel. It has the adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

- multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has the adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the SVM kernel parameter gamma. This is available only when the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the SVM kernel parameter sigma1. This is available only when the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the SVM kernel parameter sigma2. This is available only when the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the SVM kernel parameter sigma3. This is available only when the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the SVM kernel parameter shift. This is available only when the kernel type parameter is set to multiquadric.

kernel degree (real) This is the SVM kernel parameter degree. This is available only when the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the SVM kernel parameter a. This is available only when the kernel type parameter is set to neural.

kernel b (real) This is the SVM kernel parameter b. This is available only when the kernel type parameter is set to neural.

C (real) This is the SVM complexity constant which sets the tolerance for misclassification, where higher C values allow for 'softer' boundaries and lower values create 'harder' boundaries. A complexity constant that is too large can lead to over-fitting, while values that are too small may result in over-generalization.

max evaluation (integer) This is an optimizer parameter. It specifies to stop evaluations after the specified number of evaluations.

generations without improval (integer) This parameter specifies the stop criterion for early stopping, i.e. it stops after n generations without improvement in the performance. n is specified by this parameter.

population size (integer) This parameter specifies the population size, i.e. the number of individuals per generation.

inertia weight (real) This parameter specifies the (initial) weight for the old weighting.

local best weight (real) This parameter specifies the weight for the individual's best position during the run.

global best weight (real) This parameter specifies the weight for the population's best position during the run.

dynamic inertia weight (boolean) This parameter specifies if the inertia weight should be improved during the run.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.


Tutorial Processes

Introduction to the SVM (PSO) operator

The 'Ripley-Set' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a classification model. The SVM (PSO) operator is applied in the training subprocess of the Split Validation operator. The SVM (PSO) operator is applied with default values of all parameters. The Apply Model operator is used in the testing subprocess for applying the model generated by the SVM (PSO) operator. The resultant labeled ExampleSet is used by the Performance (Classification) operator for measuring the performance of the model. The classification model and its performance vector are connected to the output and they can be seen in the Results Workspace. The accuracy of this model turns out to be around 85%.

Default values were used for most of the parameters. To get more reliable results, these values should be carefully selected. Usually techniques like cross-validation are used to find the best values of these parameters for the ExampleSet under consideration.

Linear Discriminant Analysis


This operator performs linear discriminant analysis (LDA). This method tries to find the linear combination of features which best separates two or more classes of examples. The resulting combination is then used as a linear classifier. Discriminant analysis is used to determine which variables discriminate between two or more naturally occurring groups; it may have a descriptive or a predictive objective.

Description

This operator performs linear discriminant analysis (LDA). This method tries to find the linear combination of features which best separates two or more classes of examples. The resulting combination is then used as a linear classifier. LDA is closely related to ANOVA (analysis of variance) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. In the other two methods, however, the dependent variable is a numerical quantity, while for LDA it is a categorical variable (i.e. the class label). LDA is also closely related to principal component analysis (PCA) and factor analysis in that both look for linear combinations of variables which best explain the data. LDA explicitly attempts to model the difference between the classes of data. PCA on the other hand does not take into account any difference in class.

Discriminant analysis is used to determine which variables discriminate between two or more naturally occurring groups. For example, an educational researcher may want to investigate which variables discriminate between high school graduates who decide (1) to go to college or (2) NOT to go to college. For that purpose the researcher could collect data on numerous variables prior to students' graduation. After graduation, most students will naturally fall into one of the two categories. Discriminant Analysis could then be used to determine which variable(s) are the best predictors of students' subsequent educational choice. Computationally, discriminant function analysis is very similar to analysis of variance (ANOVA). For example, suppose the same student graduation scenario. We could have measured students' stated intention to continue on to college one year prior to graduation. If the means for the two groups (those who actually went to college and those who did not) are different, then we can say that intention to attend college as stated one year prior to graduation allows us to discriminate between those who are and are not college bound (and this information may be used by career counselors to provide the appropriate guidance to the respective students). The basic idea underlying discriminant analysis is to determine whether groups differ with regard to the mean of a variable, and then to use that variable to predict group membership (e.g., of new cases).

Discriminant Analysis may be used for two objectives: either we want to assess the adequacy of classification, given the group memberships of the objects under study; or we wish to assign objects to one of a number of (known) groups of objects. Discriminant Analysis may thus have a descriptive or a predictive objective. In both cases, some group assignments must be known before carrying out the Discriminant Analysis. Such group assignments, or labeling, may be arrived at in any way. Hence Discriminant Analysis can be employed as a useful complement to Cluster Analysis (in order to judge the results of the latter) or Principal Components Analysis.
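The "use group means to predict membership" idea can be sketched as a simplified two-class LDA. For brevity the sketch assumes a diagonal pooled covariance (full LDA uses the complete pooled covariance matrix), and the data are illustrative.

```python
# Simplified two-class LDA: discriminant direction w = Sigma^-1 (mu1 - mu0)
# with a diagonal pooled covariance; a point is assigned to the class whose
# projected mean its own projection is closer to.
def mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_var(rows0, rows1, m0, m1):
    n = len(rows0) + len(rows1)
    var = []
    for j in range(len(m0)):
        s = sum((r[j] - m0[j]) ** 2 for r in rows0)
        s += sum((r[j] - m1[j]) ** 2 for r in rows1)
        var.append(s / (n - 2))
    return var

def lda_fit(rows0, rows1):
    m0, m1 = mean(rows0), mean(rows1)
    var = pooled_var(rows0, rows1, m0, m1)
    w = [(b - a) / v for a, b, v in zip(m0, m1, var)]
    return w, m0, m1

def lda_predict(x, w, m0, m1):
    proj = lambda p: sum(wi * pi for wi, pi in zip(w, p))
    return 1 if abs(proj(x) - proj(m1)) < abs(proj(x) - proj(m0)) else 0

class0 = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.2]]
class1 = [[4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
w, m0, m1 = lda_fit(class0, class1)
print(lda_predict([4.1, 5.1], w, m0, m1))   # 1
print(lda_predict([1.0, 2.1], w, m0, m1))   # 0
```

The direction w weights each feature by how well its group means are separated relative to its within-group spread, which is the core of the discriminant idea described above.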

Differentiation

Quadratic Discriminant Analysis The QDA operator performs a quadratic discriminant analysis (QDA). QDA is closely related to linear discriminant analysis (LDA), where it is assumed that the measurements are normally distributed. Unlike LDA however, in QDA there is no assumption that the covariance of each of the classes is identical. See page ?? for details.

Regularized Discriminant Analysis The RDA operator performs a regularized discriminant analysis (RDA), which is a generalization of LDA and QDA. Both algorithms are special cases of this algorithm. If the alpha parameter is set to 1, the RDA operator performs LDA. Similarly, if the alpha parameter is set to 0, the RDA operator performs QDA. See page ?? for details.


Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The Discriminant Analysis is performed and the resultant model is delivered from this output port.

example set (exa) The ExampleSet that was given as input is passed to the output without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Related Documents

Quadratic Discriminant Analysis (??) Regularized Discriminant Analysis (??)

Tutorial Processes

Introduction to the LDA operator

The 'Sonar' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at this ExampleSet. The Linear Discriminant Analysis operator is applied on this ExampleSet. The Linear Discriminant Analysis operator performs the discriminant analysis and the resultant model can be seen in the Results Workspace.

Vote

This operator uses a majority vote (for classification) or the average (for regression) on top of the predictions of the inner learners (i.e. learning operators in its subprocess).

Description

The Vote operator is a nested operator i.e. it has a subprocess. The subprocess must have at least two learners, called base learners. This operator builds a classification or regression model depending upon the ExampleSet and learners. This operator uses a majority vote (for classification) or the average (for regression) on top of the predictions of the base learners provided in its subprocess. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses. All the operator chains in the subprocess must accept an ExampleSet and return a model.

In case of a classification task, all the operators in the subprocess of the Vote operator accept the given ExampleSet and generate a classification model. For prediction of an unknown example, the Vote operator applies all the classification models from its subprocess and assigns the predicted class with the maximum votes to the unknown example. Similarly, in case of a regression task, all the operators in the subprocess of the Vote operator accept the given ExampleSet and generate a regression model. For prediction of an unknown example, the Vote operator applies all the regression models from its subprocess and assigns the average of all predicted values to the unknown example.
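The combination step described above can be sketched directly: a majority vote over the base learners' class predictions for classification, and a plain average for regression. This is an illustration of the idea, not the operator's actual implementation.

```python
# The two combination rules of a simple vote model.
from collections import Counter

def vote_classify(predictions):
    """predictions: one predicted class per base learner; majority wins."""
    return Counter(predictions).most_common(1)[0][0]

def vote_regress(predictions):
    """predictions: one predicted value per base learner; average them."""
    return sum(predictions) / len(predictions)

print(vote_classify(["yes", "no", "yes"]))   # yes
print(vote_regress([1.0, 2.0, 3.0]))         # 2.0
```

With three base learners, as in the Example Process below, two agreeing learners always outvote the third.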

Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The simple vote model for classification or regression is delivered from this output port. This model can now be applied on unseen data sets for prediction of the label attribute.

Tutorial Processes

Using the Vote operator for classification

The 'Sonar' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a model. The Vote operator is applied in the training subprocess of the Split Validation operator. Three learners are applied in the subprocess of the Vote operator. These base learners are: Decision Tree, Neural Net and SVM. The Vote operator uses the vote of each learner for classification of an example; the prediction with the maximum votes is assigned to the unknown example. In other words it uses the predictions of the three base learners to make a combined prediction (using simple voting). The Apply Model operator is used in the testing subprocess of the Split Validation operator for applying the model generated by the Vote operator. The resultant labeled ExampleSet is used by the Performance operator for measuring the performance of the model. The Vote model and its performance vector are connected to the output and can be seen in the Results Workspace.

Polynomial by Binomial Classification

This operator builds a polynomial classification model through the given binomial classification learner.

Description

The Polynomial by Binomial Classification operator is a nested operator i.e. it has a subprocess. The subprocess must have a binomial classification learner i.e. an operator that generates a binomial classification model. This operator builds a polynomial classification model using the binomial classification learner provided in its subprocess. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

Many classification operators (e.g. the SVM operator) allow classification only for a binomial (binary) label. The Polynomial by Binomial Classification operator uses a binomial classifier and generates binomial classification models for the different classes, and then aggregates the responses of these binomial classification models for classification of a polynomial label.
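One common aggregation strategy, '1 against all', can be sketched as follows: one binary model per class, trained to separate that class from the rest, and at prediction time the class whose model returns the highest confidence wins. The `scorers` mapping stands in for trained binomial models and is purely illustrative.

```python
# One-vs-all aggregation: pick the class whose binary scorer is most confident.
def one_vs_all_predict(x, scorers):
    """scorers: class label -> function returning a confidence for x."""
    return max(scorers, key=lambda label: scorers[label](x))

# Illustrative stand-ins for trained binomial models: each is most
# confident near "its" region of the (here one-dimensional) input space.
scorers = {
    "setosa":     lambda x: -abs(x - 1.0),
    "versicolor": lambda x: -abs(x - 2.0),
    "virginica":  lambda x: -abs(x - 3.0),
}
print(one_vs_all_predict(2.1, scorers))   # versicolor
```

Other strategies (e.g. exhaustive or random codes, selected via the classification strategies parameter) differ in how the binary problems are constructed, but the aggregate-the-responses step is analogous.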

Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The polynomial classification model is delivered from this output port. This classification model can now be applied on unseen data sets for prediction of the label attribute.

example set (exa) The ExampleSet that was given as input is passed to the output without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

classification strategies (selection) This parameter specifies the strategy that should be used for multi-class classifications, i.e. polynomial classifications.

random code multiplicator (real) This parameter is only available when the classification strategies parameter is set to 'exhaustive code' or 'random code'. It specifies a multiplicator regulating the codeword length in random code modus.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Tutorial Processes

Using the SVM operator for polynomial classification

The 'Iris' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a polynomial classification model. The Polynomial by Binomial Classification operator is applied in the training subprocess of the Split Validation operator. The SVM operator is applied in the subprocess of the Polynomial by Binomial Classification operator. Although SVM is a binomial classification learner, it can be used by the Polynomial by Binomial Classification operator to train a polynomial classification model. The Apply Model operator is used in the testing subprocess to apply the model. The resultant labeled ExampleSet is used by the Performance (Classification) operator for measuring the performance of the model. The polynomial classification model and its performance vector are connected to the output and can be seen in the Results Workspace.


Classification by Regression

This operator builds a polynominal classification model using the given regression learner.

Description

The Classification by Regression operator is a nested operator i.e. it has a subprocess. The subprocess must have a regression learner i.e. an operator that generates a regression model. This operator builds a classification model using the regression learner provided in its subprocess. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

Here is an explanation of how a classification model is built from a regression learner. For each class i of the given ExampleSet, a regression model is trained after setting the label to +1 if the label is i and to -1 if it is not. Then the regression models are combined into a classification model. This model can be applied using the Apply Model operator. In order to determine the prediction for an unlabeled example, all regression models are applied and the class belonging to the regression model which predicts the greatest value is chosen.
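The scheme just described can be sketched as follows. This is an illustration of the +1/-1 relabeling and argmax combination, not RapidMiner code; the toy `NearestNeighborRegressor` is a made-up stand-in for whatever regression learner sits in the subprocess.

```python
# A minimal sketch of Classification by Regression: per class, a regression
# model is fit to +1/-1 targets, and the class whose model predicts the
# greatest value is chosen.

class NearestNeighborRegressor:
    """Toy regression learner: returns the target of the nearest training example."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]

class ClassificationByRegression:
    def __init__(self, make_regressor):
        self.make_regressor = make_regressor
        self.models = {}

    def fit(self, X, y):
        for cls in set(y):
            # Target is +1 if the label is this class, -1 otherwise
            targets = [1.0 if label == cls else -1.0 for label in y]
            self.models[cls] = self.make_regressor().fit(X, targets)
        return self

    def predict(self, x):
        # The class of the regression model predicting the greatest value
        return max(self.models, key=lambda cls: self.models[cls].predict(x))

X = [(0,), (1,), (10,), (11,)]
y = ['low', 'low', 'high', 'high']
clf = ClassificationByRegression(NearestNeighborRegressor).fit(X, y)
```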


Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The classification model is delivered from this output port. This classification model can now be applied on unseen data sets for prediction of the label attribute.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Tutorial Processes

Using the Linear Regression operator for classification

The 'Sonar' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a classification model. The Classification by Regression operator is applied in the training subprocess of the Split Validation operator. The Linear Regression operator is applied in the subprocess of the Classification by Regression operator. Although Linear Regression is a regression learner, it can be used by the Classification by Regression operator to train a classification model. The Apply Model operator is used in the testing subprocess to apply the model. The resultant labeled ExampleSet is used by the Performance (Classification) operator for measuring the performance of the model. The classification model and its performance vector are connected to the output and can be seen in the Results Workspace.

Bayesian Boosting

This operator is a boosting operator based on Bayes' theorem. It implements a meta-algorithm which can be used in conjunction with many other learning algorithms to improve their performance.

Description

The Bayesian Boosting operator is a nested operator i.e. it has a subprocess. The subprocess must have a learner i.e. an operator that expects an ExampleSet and generates a model. This operator tries to build a better model using the learner provided in its subprocess. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

This operator trains an ensemble of classifiers for boolean target attributes. In each iteration the training set is reweighted, so that previously discovered patterns and other kinds of prior knowledge are 'sampled out'. An inner classifier, typically a rule or decision tree induction algorithm, is sequentially applied several times, and the models are combined into a single global model. The maximum number of models to be trained is specified by the iterations parameter.

If the rescale label priors parameter is set to true, then the ExampleSet is reweighted so that all classes are equally probable (or frequent). For two-class problems this turns the problem of fitting models to maximize weighted relative accuracy into the more common task of classifier induction. Applying a rule induction algorithm as an inner learner makes it possible to perform subgroup discovery. This option is also recommended for data sets with class skew, if a very weak learner like a decision stump is used. If the rescale label priors parameter is not set, then the operator performs boosting based on probability estimates.
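Conceptually, rescaling label priors means rescaling example weights so that every class carries the same total weight. The following sketch is an illustration of that idea only, not RapidMiner's internal code; the `rescale_label_priors` helper is hypothetical.

```python
# Reweight examples so that all classes become equally probable (equal
# total weight per class) while the overall weight mass is preserved.

def rescale_label_priors(labels, weights):
    total = sum(weights)
    classes = sorted(set(labels))
    # Current total weight per class
    class_total = {c: sum(w for l, w in zip(labels, weights) if l == c)
                   for c in classes}
    target = total / len(classes)  # desired total weight per class
    return [w * target / class_total[l] for l, w in zip(labels, weights)]

labels = ['yes', 'yes', 'yes', 'no']
weights = [1.0, 1.0, 1.0, 1.0]
new = rescale_label_priors(labels, weights)
# Each class now carries half of the total weight of 4.0
```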

If the allow marginal skews parameter is not set, then the support of each subset defined in terms of common base model predictions does not change from one iteration to the next. Analogously, the class priors do not change. This is the procedure originally described in 'Scholz/2005b' in the context of subgroup discovery. Setting the allow marginal skews option to true leads to a procedure that changes the marginal weights/probabilities of subsets, if this is beneficial in a boosting context, and stratifies the two classes to be equally likely. As for AdaBoost, the total weight upper-bounds the training error in this case. This bound is reduced more quickly by the Bayesian Boosting operator.

To reproduce the sequential sampling, or knowledge-based sampling, from 'Scholz/2005b' for subgroup discovery, two of the default parameter settings of this operator have to be changed: rescale label priors must be set to true, and allow marginal skews must be set to false. In addition, a boolean (binomial) label has to be used.

This operator requires an ExampleSet as its input. To sample out prior knowledge of a different form it is possible to provide another model as an optional additional input. The predictions of this model are used to produce an initial weighting of the training set. The output of the operator is a classification model applicable for estimating conditional class probabilities or for plain crisp classification. It contains up to the specified number of inner base models. In the case of an optional initial model, this model will also be stored in the output model, in order to produce the same initial weighting during model application.


Ensemble Theory

Boosting is an ensemble method, therefore an overview of ensemble theory is given here. Ensemble methods use multiple models to obtain better predictive performance than could be obtained from any of the constituent models. In other words, an ensemble is a technique for combining many weak learners in an attempt to produce a strong learner. Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model, so ensembles may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation.

An ensemble is itself a supervised learning algorithm, because it can be trained and then used to make predictions. The trained ensemble, therefore, represents a single hypothesis. This hypothesis, however, is not necessarily contained within the hypothesis space of the models from which it is built. Thus, ensembles can be shown to have more flexibility in the functions they can represent. This flexibility can, in theory, enable them to over-fit the training data more than a single model would, but in practice, some ensemble techniques (especially bagging) tend to reduce problems related to over-fitting of the training data.

Empirically, ensembles tend to yield better results when there is a significant diversity among the models. Many ensemble methods, therefore, seek to promote diversity among the models they combine. Although perhaps non-intuitive, more random algorithms (like random decision trees) can be used to produce a stronger ensemble than very deliberate algorithms (like entropy-reducing decision trees). Using a variety of strong learning algorithms, however, has been shown to be more effective than using techniques that attempt to dumb-down the models in order to promote diversity.

Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

model (mod) This input port expects a model. This is an optional port. To sample out prior knowledge of a different form it is possible to provide a model as an optional input. The predictions of this model are used to produce an initial weighting of the training set. The output of the operator is a classification model applicable for estimating conditional class probabilities or for plain crisp classification. It contains up to the specified number of inner base models. In the case of an optional initial model, this model will also be stored in the output model, in order to produce the same initial weighting during model application.

Output Ports

model (mod) The meta model is delivered from this output port which can now be applied on unseen data sets for prediction of the label attribute.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

use subset for training (real) This parameter specifies the fraction of examples to be used for training; the remaining examples are used to estimate the confusion matrix. If set to 1, the test set is turned off.

iterations (integer) This parameter specifies the maximum number of iterations of this algorithm.

rescale label priors (boolean) This parameter specifies whether the proportion of labels should be equal by construction after the first iteration. Please study the description of this operator for more information about this parameter.

allow marginal skews (boolean) This parameter specifies if the skewing of the marginal distribution (P(x)) should be allowed during learning. Please study the description of this operator for more information about this parameter.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same sample. Changing the value of this parameter changes the way examples are randomized, so the sample will have a different set of values.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Tutorial Processes

Using the Bayesian Boosting operator for generating a better Decision Tree

The 'Sonar' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a classification model. The Bayesian Boosting operator is applied in the training subprocess of the Split Validation operator. The Decision Tree operator is applied in the subprocess of the Bayesian Boosting operator. The iterations parameter of the Bayesian Boosting operator is set to 10, thus there will be at most 10 iterations of its subprocess. The Apply Model operator is used in the testing subprocess for applying the model generated by the Bayesian Boosting operator. The resultant labeled ExampleSet is used by the Performance (Classification) operator for measuring the performance of the model. The classification model and its performance vector are connected to the output and can be seen in the Results Workspace. You can see that the Bayesian Boosting operator produced a new model in each iteration. The accuracy of this model turns out to be around 67.74%. If the same process is repeated without the Bayesian Boosting operator, i.e. only the Decision Tree operator is used in the training subprocess, the accuracy of that model turns out to be around 66%. Thus Bayesian Boosting generated a combination of models that performed better than the original model.


AdaBoost

This operator is an implementation of the AdaBoost algorithm and it can be used with all learners available in RapidMiner. AdaBoost is a meta-algorithm which can be used in conjunction with many other learning algorithms to improve their performance.

Description

The AdaBoost operator is a nested operator i.e. it has a subprocess. The subprocess must have a learner i.e. an operator that expects an ExampleSet and generates a model. This operator tries to build a better model using the learner provided in its subprocess. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

AdaBoost, short for Adaptive Boosting, is a meta-algorithm that can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that subsequent classifiers are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems, however, it can be less susceptible to the overfitting problem than most learning algorithms. The classifiers it uses can be weak (i.e. display a substantial error rate), but as long as their performance is not random (resulting in an error rate of 0.5 for binary classification), they will improve the final model.

AdaBoost generates and calls a new weak classifier in each of a series of rounds t = 1, ..., T. For each call, a distribution of weights D(t) is updated that indicates the importance of examples in the data set for the classification. In each round, the weights of incorrectly classified examples are increased, and the weights of correctly classified examples are decreased, so the new classifier focuses on the examples which have so far eluded correct classification.
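The reweighting step described above can be sketched as follows. This is the textbook AdaBoost update for binary labels in {-1, +1}, shown purely as an illustration; RapidMiner's implementation may differ in details, and the `adaboost_round` helper is hypothetical.

```python
import math

# One AdaBoost reweighting round: misclassified examples are up-weighted,
# correctly classified ones are down-weighted, and the weights are
# renormalized so they sum to 1 again.

def adaboost_round(weights, predictions, truth):
    """weights: current example weights, summing to 1
    predictions, truth: labels in {-1, +1}
    Returns (alpha, new_weights), where alpha is the model's vote weight."""
    # Weighted error of the current weak classifier
    err = sum(w for w, p, t in zip(weights, predictions, truth) if p != t)
    alpha = 0.5 * math.log((1 - err) / err)
    new = [w * math.exp(-alpha * p * t)
           for w, p, t in zip(weights, predictions, truth)]
    z = sum(new)  # normalization constant
    return alpha, [w / z for w in new]

# Four examples, one misclassified: after the round, the misclassified
# example holds half of the total weight, forcing the next weak
# classifier to pay attention to it.
alpha, w = adaboost_round([0.25] * 4, [1, 1, 1, -1], [1, 1, 1, 1])
```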

Ensemble Theory

Boosting is an ensemble method, therefore an overview of ensemble theory is given here. Ensemble methods use multiple models to obtain better predictive performance than could be obtained from any of the constituent models. In other words, an ensemble is a technique for combining many weak learners in an attempt to produce a strong learner. Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model, so ensembles may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation.

An ensemble is itself a supervised learning algorithm, because it can be trained and then used to make predictions. The trained ensemble, therefore, represents a single hypothesis. This hypothesis, however, is not necessarily contained within the hypothesis space of the models from which it is built. Thus, ensembles can be shown to have more flexibility in the functions they can represent. This flexibility can, in theory, enable them to over-fit the training data more than a single model would, but in practice, some ensemble techniques (especially bagging) tend to reduce problems related to over-fitting of the training data.

Empirically, ensembles tend to yield better results when there is a significant diversity among the models. Many ensemble methods, therefore, seek to promote diversity among the models they combine. Although perhaps non-intuitive, more random algorithms (like random decision trees) can be used to produce a stronger ensemble than very deliberate algorithms (like entropy-reducing decision trees). Using a variety of strong learning algorithms, however, has been shown to be more effective than using techniques that attempt to dumb-down the models in order to promote diversity.

Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The meta model is delivered from this output port which can now be applied on unseen data sets for prediction of the label attribute.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

iterations (integer) This parameter specifies the maximum number of iterations of the AdaBoost algorithm.

Tutorial Processes

Using the AdaBoost operator for generating a better Decision Tree


The 'Sonar' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a classification model. The AdaBoost operator is applied in the training subprocess of the Split Validation operator. The Decision Tree operator is applied in the subprocess of the AdaBoost operator. The iterations parameter of the AdaBoost operator is set to 10, thus there will be at most 10 iterations of its subprocess. The Apply Model operator is used in the testing subprocess for applying the model generated by the AdaBoost operator. The resultant labeled ExampleSet is used by the Performance (Classification) operator for measuring the performance of the model. The classification model and its performance vector are connected to the output and can be seen in the Results Workspace. You can see that the AdaBoost operator produced a new model in each iteration and there are different weights for each model. The accuracy of this model turns out to be around 69%. If the same process is repeated without the AdaBoost operator, i.e. only the Decision Tree operator is used in the training subprocess, the accuracy of that model turns out to be around 66%. Thus AdaBoost generated a combination of models that performed better than the original model.

Bagging

Bootstrap aggregating (bagging) is a machine learning ensemble meta-algorithm that improves classification and regression models in terms of stability and classification accuracy. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree models, it can be used with any type of model.

Description

The Bagging operator is a nested operator i.e. it has a subprocess. The subprocess must have a learner i.e. an operator that expects an ExampleSet and generates a model. This operator tries to build a better model using the learner provided in its subprocess. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

The concept of bagging (voting for classification, averaging for regression-type problems with continuous dependent variables of interest) applies to the area of predictive data mining to combine the predicted classifications (prediction) from multiple models, or from the same type of model for different learning data. It is also used to address the inherent instability of results when applying complex models to relatively small data sets. Suppose your data mining task is to build a model for predictive classification, and the data set from which to train the model (the learning data set, which contains observed classifications) is relatively small. You could repeatedly sub-sample (with replacement) from the data set and apply, for example, a tree classifier (e.g. CHAID) to the successive samples. In practice, very different trees will often be grown for the different samples, illustrating the instability of models often evident with small data sets. One method of deriving a single prediction (for new observations) is to use all trees found in the different samples and to apply some simple voting: the final classification is the one most often predicted by the different trees. Note that some weighted combination of predictions (weighted vote, weighted average) is also possible, and commonly used. A sophisticated algorithm for generating weights for weighted prediction or voting is the boosting procedure, which is available in RapidMiner as the AdaBoost operator.
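The bootstrap-and-vote procedure just described can be sketched in a few lines. This is an illustration of bagging in general, not RapidMiner code; the toy 1-NN learner is a made-up stand-in for the operator placed in the subprocess.

```python
import random
from collections import Counter

# Bagging sketch: each iteration trains a model on a bootstrap sample
# (drawn with replacement) and the final classification is decided by a
# simple majority vote over the models' predictions.

class NearestNeighborClassifier:
    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]

def bagging_predict(X, y, make_learner, x_new,
                    iterations=10, sample_ratio=1.0, seed=7):
    rng = random.Random(seed)  # local random seed for reproducibility
    n = max(1, int(sample_ratio * len(X)))
    votes = []
    for _ in range(iterations):
        idx = [rng.randrange(len(X)) for _ in range(n)]  # with replacement
        model = make_learner().fit([X[i] for i in idx], [y[i] for i in idx])
        votes.append(model.predict(x_new))
    return Counter(votes).most_common(1)[0][0]  # majority vote

X = [(0,), (1,), (10,), (11,)]
y = ['low', 'low', 'high', 'high']
label = bagging_predict(X, y, NearestNeighborClassifier, (0.2,))
```

The iterations, sample ratio and local random seed arguments mirror the roles of the operator's parameters of the same names.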

Ensemble Theory

Bagging is an ensemble method, therefore an overview of ensemble theory is given here. Ensemble methods use multiple models to obtain better predictive performance than could be obtained from any of the constituent models. In other words, an ensemble is a technique for combining many weak learners in an attempt to produce a strong learner. Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model, so ensembles may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation.

An ensemble is itself a supervised learning algorithm, because it can be trained and then used to make predictions. The trained ensemble, therefore, represents a single hypothesis. This hypothesis, however, is not necessarily contained within the hypothesis space of the models from which it is built. Thus, ensembles can be shown to have more flexibility in the functions they can represent. This flexibility can, in theory, enable them to over-fit the training data more than a single model would, but in practice, some ensemble techniques (especially bagging) tend to reduce problems related to over-fitting of the training data.

Empirically, ensembles tend to yield better results when there is a significant diversity among the models. Many ensemble methods, therefore, seek to promote diversity among the models they combine. Although perhaps non-intuitive, more random algorithms (like random decision trees) can be used to produce a stronger ensemble than very deliberate algorithms (like entropy-reducing decision trees). Using a variety of strong learning algorithms, however, has been shown to be more effective than using techniques that attempt to dumb-down the models in order to promote diversity.

Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.


Output Ports

model (mod) The meta model is delivered from this output port which can now be applied on unseen data sets for prediction of the label attribute.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

sample ratio (real) This parameter specifies the fraction of examples to be used for training. Its value must be greater than 0 (i.e. zero examples) and should be lower than or equal to 1 (i.e. the entire data set).

iterations (integer) This parameter specifies the maximum number of iterations of the Bagging algorithm.

average confidences (boolean) This parameter specifies whether to average available prediction confidences or not.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same sample. Changing the value of this parameter changes the way examples are randomized, so the sample will have a different set of values.

local random seed (integer) This parameter specifies the local random seed. This parameter is available only if the use local random seed parameter is set to true.

Tutorial Processes

Using the Bagging operator for generating a better Decision Tree


The 'Sonar' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a classification model. The Bagging operator is applied in the training subprocess of the Split Validation operator. The Decision Tree operator is applied in the subprocess of the Bagging operator. The iterations parameter of the Bagging operator is set to 10, thus there will be 10 iterations of its subprocess. The Apply Model operator is used in the testing subprocess for applying the model generated by the Bagging operator. The resultant labeled ExampleSet is used by the Performance (Classification) operator for measuring the performance of the model. The classification model and its performance vector are connected to the output and they can be seen in the Results Workspace. You can see that the Bagging operator produced a new model in each iteration. The accuracy of this model turns out to be around 75.81%. If the same process is repeated without the Bagging operator, i.e. only the Decision Tree operator is used in the training subprocess, the accuracy of that model turns out to be around 66%. Thus Bagging improved the performance of the base learner (i.e. the Decision Tree).

Stacking

This operator is an implementation of Stacking, which combines the models rather than choosing among them, thereby typically achieving better performance than any single one of the trained models.


Description

Stacked generalization (or stacking) is a way of combining multiple models that introduces the concept of a meta learner. Unlike bagging and boosting, stacking may be (and normally is) used to combine models of different types. The procedure is as follows:

1. Split the training set into two disjoint sets.

2. Train several base learners on the first part.

3. Test the base learners on the second part.

4. Using the predictions from step 3 as the inputs, and the correct responses as the outputs, train a higher level learner.

Note that steps 1 to 3 are the same as cross-validation, but instead of using a winner-takes-all approach, we combine the base learners, possibly nonlinearly.
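The four steps above can be sketched in plain Python. This is an illustration of stacked generalization, not RapidMiner code; the 1-NN learner is a made-up helper that stands in for both the base learners and the meta learner so the example stays self-contained.

```python
# Stacking sketch: split the training set in two, train base learners on
# the first part (step 2), generate their predictions on the second part
# (step 3), and train a higher-level learner on those predictions (step 4).

class NearestNeighbor:
    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]

def train_stacking(X, y, base_factories, meta_factory, split=0.5):
    cut = int(len(X) * split)
    X1, y1 = X[:cut], y[:cut]          # step 1: first disjoint part
    X2, y2 = X[cut:], y[cut:]          #         second disjoint part
    base_models = [f().fit(X1, y1) for f in base_factories]          # step 2
    meta_inputs = [[m.predict(x) for m in base_models] for x in X2]  # step 3
    meta_model = meta_factory().fit(meta_inputs, y2)                 # step 4
    return base_models, meta_model

def stacking_predict(base_models, meta_model, x):
    # The base learners' predictions become the meta learner's inputs
    return meta_model.predict([m.predict(x) for m in base_models])

X = [(0,), (10,), (1,), (11,), (2,), (12,)]
y = [0, 1, 0, 1, 0, 1]
bases, meta = train_stacking(X, y, [NearestNeighbor, NearestNeighbor],
                             NearestNeighbor)
```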

The crucial prior belief underlying the scientific method is that one can judge among a set of models by comparing them on data that was not used to create any of them. This prior belief is used in the cross-validation technique, to choose among a set of models based on a single data set. This is done by partitioning the data set into a training data set and a testing data set; training the models on the training data; and then choosing whichever of those trained models performs best on the testing data.

Stacking exploits this prior belief further. It does this by using performance on the testing data to combine the models rather than choose among them, thereby typically getting a better performance than any single one of the trained models. It has been successfully used on both supervised learning tasks (e.g. regression) and unsupervised learning (e.g. density estimation).

The Stacking operator is a nested operator. It has two subprocesses: the Base Learners subprocess and the Stacking Model Learner subprocess. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

Ensemble Theory

Stacking is an ensemble method, therefore an overview of ensemble theory is given here. Ensemble methods use multiple models to obtain a better predictive performance than could be obtained from any of the constituent models. In other words, an ensemble is a technique for combining many weak learners in an attempt to produce a strong learner. Evaluating the prediction of an ensemble typically requires more computation than evaluating the prediction of a single model, so ensembles may be thought of as a way to compensate for poor learning algorithms by performing a lot of extra computation.

An ensemble is itself a supervised learning algorithm, because it can be trained and then used to make predictions. The trained ensemble, therefore, represents a single hypothesis. This hypothesis, however, is not necessarily contained within the hypothesis space of the models from which it is built. Thus, ensembles can be shown to have more flexibility in the functions they can represent. This flexibility can, in theory, enable them to over-fit the training data more than a single model would, but in practice, some ensemble techniques (especially bagging) tend to reduce problems related to over-fitting of the training data.

Empirically, ensembles tend to yield better results when there is a significant diversity among the models. Many ensemble methods, therefore, seek to promote diversity among the models they combine. Although perhaps non-intuitive, more random algorithms (like random decision trees) can be used to produce a stronger ensemble than very deliberate algorithms (like entropy-reducing decision trees). Using a variety of strong learning algorithms, however, has been shown to be more effective than using techniques that attempt to dumb-down the models in order to promote diversity.


Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The stacking model is delivered from this output port which can be applied on unseen data sets.

Parameters

keep all attributes (boolean) This parameter indicates if all attributes (including the original ones) should be kept in order to learn the stacked model.

Tutorial Processes

Introduction to Stacking

The 'Sonar' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a model. The Stacking operator is applied in the training subprocess of the Split Validation operator. Three learners are applied in the Base Learner subprocess of the Stacking operator: the Decision Tree, K-NN and Linear Regression operators. The Naive Bayes operator is applied in the Stacking Model Learner subprocess of the Stacking operator. The Naive Bayes learner is used as the stacking learner, which uses the predictions of the preceding three learners to make a combined prediction. The Apply Model operator is used in the testing subprocess of the Split Validation operator for applying the model generated by the Stacking operator. The resultant labeled ExampleSet is used by the Performance operator for measuring the performance of the model. The stacking model and its performance vector are connected to the output and can be seen in the Results Workspace.

MetaCost

This metaclassifier makes its base classifier cost-sensitive by using the given cost matrix to compute label predictions according to classification costs.

Description

The MetaCost operator makes its base classifier cost-sensitive by using the cost matrix specified in the cost matrix parameter. The method used by this operator is similar to the MetaCost method described by Pedro Domingos (1999).

The MetaCost operator is a nested operator i.e. it has a subprocess. The subprocess must have a learner i.e. an operator that expects an ExampleSet and generates a model. This operator tries to build a better model using the learner provided in its subprocess. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

Most classification algorithms assume that all errors have the same cost, which is seldom the case. For example, in database marketing the cost of mailing to a non-respondent is very small, but the cost of not mailing to someone who would respond is the entire profit lost. In general, misclassification costs may be described by an arbitrary cost matrix C, with C(i,j) being the cost of predicting that an example belongs to class i when in fact it belongs to class j. Individually making each classification learner cost-sensitive is laborious, and often non-trivial. MetaCost is a principled method for making an arbitrary classifier cost-sensitive by wrapping a cost-minimizing procedure around it. This procedure treats the underlying classifier as a black box, requiring no knowledge of its functioning or change to it. MetaCost is applicable to any number of classes and to arbitrary cost matrices.
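The core of cost-sensitive classification is to predict the class with minimum expected cost rather than maximum probability. A minimal sketch (illustrative Python; the cost values are hypothetical and this is not MetaCost's full bagging-based procedure):

```python
# Given class probabilities from the underlying classifier and a cost
# matrix C with cost[i][j] = cost of predicting class i when the true
# class is j, predict the class that minimizes the expected cost.

def min_cost_class(probs, cost):
    """probs[j]: estimated P(class j | example)."""
    expected = [sum(p * c for p, c in zip(probs, row)) for row in cost]
    return min(range(len(expected)), key=expected.__getitem__)

# Hypothetical costs: misclassifying class 1 as class 0 is ten times
# as costly as the reverse; correct predictions cost 0.
cost = [[0, 10],
        [1, 0]]
print(min_cost_class([0.7, 0.3], cost))  # 1: expected costs are 3.0 vs 0.7
```

Although class 0 is more probable (0.7), the high cost of missing class 1 flips the prediction; that is exactly the effect the cost matrix parameter has on the wrapped learner.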

Input Ports

training set (tra) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

model (mod) The meta model is delivered from this output port. It can now be applied on unseen data sets for prediction of the label attribute.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

cost matrix (string) This parameter is used for specifying the cost matrix. The cost matrix is similar in structure to a confusion matrix: it has the predicted classes in one dimension and the actual classes in the other dimension. Therefore the cost matrix can denote the costs for every possible classification outcome: predicted label vs. actual label. It is in fact a matrix of misclassification costs, because you can specify different weights for certain classes misclassified as other classes. Weights can also be assigned to correct classifications, but usually they are set to 0. The classes in the matrix are labeled as Class 1, Class 2 etc., where classes are numbered according to their order in the internal mapping.

use subset for training (real) This parameter specifies the fraction of examples to be used for training. Its value must be greater than 0 (i.e. zero examples) and should be lower than or equal to 1 (i.e. the entire data set).

iterations (integer) This parameter specifies the maximum number of iterations of the MetaCost algorithm.

sampling with replacement (boolean) This parameter indicates if sampling with replacement should be used. In sampling with replacement, at every step all examples have equal probability of being selected. Once an example has been selected for the sample, it remains a candidate for selection and can be selected again in later steps. Thus a sample with replacement can contain the same example multiple times.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same sample. Changing the value of this parameter changes the way examples are randomized, thus the sample will have a different set of values.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.


Tutorial Processes

Using the MetaCost operator for generating a better Decision Tree

The 'Sonar' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a classification model. The MetaCost operator is applied in the training subprocess of the Split Validation operator. The Decision Tree operator is applied in the subprocess of the MetaCost operator. The iterations parameter of the MetaCost operator is set to 10, thus there will be 10 iterations of its subprocess. Have a look at the cost matrix specified in the cost matrix parameter of the MetaCost operator. You can see that the misclassification costs are not equal. The Apply Model operator is used in the testing subprocess for applying the model generated by the MetaCost operator. The resultant labeled ExampleSet is used by the Performance (Classification) operator for measuring the performance of the model. The classification model and its performance vector are connected to the output and they can be seen in the Results Workspace. You can see that the MetaCost operator produced a new model in each iteration. The accuracy of this model turns out to be around 82%. If the same process is repeated without the MetaCost operator, i.e. only the Decision Tree operator is used in the training subprocess, then the accuracy of that model turns out to be around 66%. Thus MetaCost improved the performance of the base learner (i.e. Decision Tree) by using the cost matrix to compute label predictions according to classification costs.

7.2. Attribute Weighting

Weight by Information Gain

This operator calculates the relevance of the attributes based on information gain and assigns weights to them accordingly.

Description

The Weight by Information Gain operator calculates the weight of attributes with respect to the class attribute by using the information gain. The higher the weight of an attribute, the more relevant it is considered. Please note that this operator can be applied only on ExampleSets with nominal label.

Although information gain is usually a good measure for deciding the relevance of an attribute, it is not perfect. A notable problem occurs when information gain is applied to attributes that can take on a large number of distinct values. For example, suppose some data that describes the customers of a business. When information gain is used to decide which of the attributes are the most relevant, the customer's credit card number may have high information gain. This attribute has a high information gain, because it uniquely identifies each customer, but we may not want to assign high weights to such attributes.
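The credit-card-number problem can be demonstrated with a small sketch (illustrative Python, not the operator's internal code): an attribute whose value is unique per example gets maximal information gain even though it generalizes to nothing.

```python
# Information gain for a nominal attribute against a nominal label:
# entropy of the label minus the expected entropy after splitting.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

outlook = ['sunny', 'sunny', 'overcast', 'rain']
ids     = ['a', 'b', 'c', 'd']           # unique per example, like a card number
play    = ['no', 'no', 'yes', 'yes']
print(information_gain(outlook, play))   # 1.0: splits the label perfectly here
print(information_gain(ids, play))       # also 1.0 -- unique ids look "relevant"
```

Both attributes receive the maximal gain of 1.0 on this toy data, even though the id attribute carries no generalizable information; this is the bias that the gain ratio (next operator) corrects.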

Information gain ratio is sometimes used instead. This method biases against considering attributes with a large number of distinct values. However, attributes with very low information values then appear to receive an unfair advantage. The Weight by Information Gain Ratio operator uses information gain ratio for generating attribute weights.


Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

weights (wei) This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

normalize weights (boolean) This parameter indicates if the calculated weights should be normalized or not. If set to true, all weights are normalized to the range from 0 to 1.

sort weights (boolean) This parameter indicates if the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified using the sort direction parameter.

sort direction (selection) This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights.


Tutorial Processes

Calculating the weights of the attributes of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. The Weight by Information Gain operator is applied on it to calculate the weights of the attributes. All parameters are used with default values. The normalize weights parameter is set to true, thus all the weights will be normalized to the range 0 to 1. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', thus the results will be in ascending order of the weights. You can verify this by viewing the results of this process in the Results Workspace.

Weight by Information Gain Ratio

This operator calculates the relevance of the attributes based on the information gain ratio and assigns weights to them accordingly.


Description

The Weight by Information Gain Ratio operator calculates the weight of attributes with respect to the label attribute by using the information gain ratio. The higher the weight of an attribute, the more relevant it is considered. Please note that this operator can only be applied on ExampleSets with a nominal label.

Information gain ratio is used because it solves the drawback of information gain. Although information gain is usually a good measure for deciding the relevance of an attribute, it is not perfect. A notable problem occurs when information gain is applied to attributes that can take on a large number of distinct values. For example, suppose some data that describes the customers of a business. When information gain is used to decide which of the attributes are the most relevant, the customer's credit card number may have high information gain. This attribute has a high information gain, because it uniquely identifies each customer, but we may not want to assign high weights to such attributes. The Weight by Information Gain operator uses information gain for generating attribute weights.

Information gain ratio is sometimes used instead of information gain. Information gain ratio biases against considering attributes with a large number of distinct values. However, attributes with very low information values then appear to receive an unfair advantage.
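The gain ratio divides the information gain by the intrinsic information (split information) of the attribute itself, which grows with the number of distinct values. A minimal sketch (illustrative Python, not the operator's internal code), reusing the unique-id attribute from the information gain discussion:

```python
# Gain ratio = information gain / split information of the attribute.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    n = len(labels)
    remainder, split_info = 0.0, 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        p = len(subset) / n
        remainder += p * entropy(subset)      # expected entropy after split
        split_info -= p * log2(p)             # intrinsic info of the split
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0

outlook = ['sunny', 'sunny', 'overcast', 'rain']
ids     = ['a', 'b', 'c', 'd']
play    = ['no', 'no', 'yes', 'yes']
print(gain_ratio(outlook, play))  # 1.0 / 1.5, about 0.667
print(gain_ratio(ids, play))      # 1.0 / 2.0 = 0.5: unique ids are penalized
```

Both attributes had information gain 1.0, but the unique-id attribute now scores lower because its split information is larger.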

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.


Output Ports

weights (wei) This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

normalize weights (boolean) This parameter indicates if the calculated weights should be normalized or not. If set to true, all weights are normalized to the range from 0 to 1.

sort weights (boolean) This parameter indicates if the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified using the sort direction parameter.

sort direction (selection) This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights.

Tutorial Processes

Calculating the weights of the attributes of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. The Weight by Information Gain Ratio operator is applied on it to calculate the weights of the attributes. All parameters are used with default values. The normalize weights parameter is set to true, thus all the weights will be normalized to the range 0 to 1. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', thus the results will be in ascending order of the weights. You can verify this by viewing the results of this process in the Results Workspace.

Weight by Correlation

This operator calculates the relevance of the attributes by computing the value of correlation for each attribute of the input ExampleSet with respect to the label attribute. This weighting scheme is based upon correlation and it returns the absolute or squared value of correlation as attribute weight.

Description

The Weight by Correlation operator calculates the weight of attributes with respect to the label attribute by using correlation. The higher the weight of an attribute, the more relevant it is considered. Please note that the Weight by Correlation operator can be applied only on ExampleSets with a numerical or binominal label. It cannot be applied on polynominal labels because the polynominal classes provide no information about their ordering; the weights would therefore be more or less random, depending on the internal numerical representation of the classes. Binominal labels work because of their representation as 0 and 1, as do numerical ones.

A correlation is a number between -1 and +1 that measures the degree of association between two attributes (call them X and Y). A positive value for the correlation implies a positive association. In this case large values of X tend to be associated with large values of Y and small values of X tend to be associated with small values of Y. A negative value for the correlation implies a negative or inverse association. In this case large values of X tend to be associated with small values of Y and vice versa.

Suppose we have two attributes X and Y, with means X' and Y' and standard deviations S(X) and S(Y) respectively. The correlation is computed as

r = [ sum from i=1 to n of (X(i)-X').(Y(i)-Y') ] / [ (n-1).S(X).S(Y) ]

where n is the total number of examples and i is the summation index. There are other formulas and definitions, but let us stick to this one for simplicity.

As discussed earlier a positive value for the correlation implies a positive asso- ciation. Suppose that an X value was above average, and that the associated Y value was also above average. Then the product (X(i)-X').(Y(i)-Y') would be the product of two positive numbers which would be positive. If the X value and the Y value were both below average, then the product above would be of two negative numbers, which would also be positive. Therefore, a positive correlation is evidence of a general tendency that large values of X are associated with large values of Y and small values of X are associated with small values of Y.

As discussed earlier a negative value for the correlation implies a negative or inverse association. Suppose that an X value was above average, and that the associated Y value was instead below average. Then the product (X(i)-X').(Y(i)-Y') would be the product of a positive and a negative number, which would make the product negative. If the X value was below average and the Y value was above average, then the product above would also be negative. Therefore, a negative correlation is evidence of a general tendency that large values of X are associated with small values of Y and small values of X are associated with large values of Y.
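The formula above can be written out directly (illustrative Python; not the operator's internal code). The operator then uses the absolute (or squared) value of r as the attribute weight:

```python
# Sample correlation between one attribute and a numerical label.
from math import sqrt

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))   # S(X)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))   # S(Y)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (sx * sy)

attribute = [1.0, 2.0, 3.0, 4.0]
label     = [2.0, 4.0, 6.0, 8.0]
r = correlation(attribute, label)
print(r)         # 1.0 (up to floating-point rounding): perfect association
print(abs(r))    # the absolute value is what is used as the weight
```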


Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

weights (wei) This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

normalize weights (boolean) This parameter indicates if the calculated weights should be normalized or not. If set to true, all weights are normalized to the range from 0 to 1.

sort weights (boolean) This parameter indicates if the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified using the sort direction parameter.

sort direction (selection) This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights.

squared correlation (boolean) This parameter indicates if the squared correlation should be calculated instead of the simple correlation. If set to true, the attribute weights are calculated as the squares of the correlations instead of the simple correlations.


Tutorial Processes

Calculating the attribute weights of the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. The Weight by Correlation operator is applied on it to calculate the weights of the attributes. All parameters are used with default values. The normalize weights parameter is set to true, thus all the weights will be normalized to the range 0 to 1. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', thus the results will be in ascending order of the weights. You can verify this by viewing the results of this process in the Results Workspace. Now set the squared correlation parameter to true and run the process again. You will see that these weights are the squares of the previous weights.

Weight by Chi Squared Statistic

This operator calculates the relevance of the attributes by computing for each attribute of the input ExampleSet the value of the chi-squared statistic with respect to the class attribute.


Description

The Weight by Chi Squared Statistic operator calculates the weight of attributes with respect to the class attribute by using the chi-squared statistic. The higher the weight of an attribute, the more relevant it is considered. Please note that the chi-squared statistic can only be calculated for nominal labels. Thus this operator can be applied only on ExampleSets with nominal label.

The chi-square statistic is a nonparametric statistical technique used to determine if a distribution of observed frequencies differs from the theoretical expected frequencies. Chi-square statistics use nominal data, thus instead of using means and variances, this test uses frequencies. The value of the chi-square statistic is given by

X^2 = sum [ (O - E)^2 / E ]

where X^2 is the chi-square statistic, O is the observed frequency and E is the expected frequency. Generally the chi-squared statistic summarizes the discrepancies between the expected number of times each outcome occurs (assuming that the model is true) and the observed number of times each outcome occurs, by summing the squares of the discrepancies, normalized by the expected numbers, over all the categories.
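For a nominal attribute weighted against a nominal label, the statistic can be computed from a contingency table, with the expected counts derived under the assumption that attribute and label are independent (illustrative Python; not the operator's internal code):

```python
# Chi-squared statistic of one nominal attribute vs. a nominal label.
from collections import Counter

def chi_squared(values, labels):
    n = len(values)
    value_counts, label_counts = Counter(values), Counter(labels)
    cell_counts = Counter(zip(values, labels))   # observed contingency table
    stat = 0.0
    for v in value_counts:
        for l in label_counts:
            # Expected count under independence of attribute and label.
            expected = value_counts[v] * label_counts[l] / n
            observed = cell_counts[(v, l)]
            stat += (observed - expected) ** 2 / expected
    return stat

wind = ['strong', 'strong', 'weak', 'weak']
play = ['no', 'no', 'yes', 'yes']
print(chi_squared(wind, play))  # 4.0: attribute and label are fully dependent
```

A large value means the observed class frequencies deviate strongly from what independence would predict, i.e. the attribute is relevant; an attribute independent of the label scores 0.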

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

weights (wei) This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

normalize weights (boolean) This parameter indicates if the calculated weights should be normalized or not. If set to true, all weights are normalized to the range from 0 to 1.

sort weights (boolean) This parameter indicates if the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified using the sort direction parameter.

sort direction (selection) This parameter is available only when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights.

number of bins (integer) This parameter specifies the number of bins used for the discretization of numerical attributes before the chi-squared test can be performed.

Tutorial Processes

Calculating the weights of the attributes of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. The Weight by Chi Squared Statistic operator is applied on it to calculate the weights of the attributes. All parameters are used with default values. The normalize weights parameter is set to true, thus all the weights will be normalized to the range 0 to 1. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', thus the results will be in ascending order of the weights. You can verify this by viewing the results of this process in the Results Workspace.

Weight by Relief

This operator calculates the relevance of the attributes by Relief. The key idea of Relief is to estimate the quality of features according to how well their values distinguish between the instances of the same and different classes that are near each other.

Description

Relief is considered one of the most successful algorithms for assessing the quality of features due to its simplicity and effectiveness. The key idea of Relief is to estimate the quality of features according to how well their values distinguish between the instances of the same and different classes that are near each other. Relief measures the relevance of features by sampling examples and comparing the value of the current feature for the nearest example of the same and of a different class. This version also works for multiple classes and regression data sets. The resulting weights are normalized into the interval between 0 and 1 if the normalize weights parameter is set to true.
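The weight update can be sketched as follows: for each example, find its nearest neighbor of the same class (nearest hit) and of a different class (nearest miss), then reward attributes that differ on the miss and penalize those that differ on the hit. This is a simplified Relief with one neighbor, numerical attributes and a binary label (illustrative only; it assumes attributes are already scaled to comparable ranges):

```python
# Basic Relief sketch for numerical attributes and a binary label.

def relief_weights(xs, ys, n_attrs):
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    w = [0.0] * n_attrs
    m = len(xs)
    for i, (x, y) in enumerate(zip(xs, ys)):
        others = [(dist(x, xs[j]), j) for j in range(m) if j != i]
        hit  = min((d, j) for d, j in others if ys[j] == y)[1]
        miss = min((d, j) for d, j in others if ys[j] != y)[1]
        for a in range(n_attrs):
            # Relevant attributes differ on the nearest miss, not the hit.
            w[a] += (abs(x[a] - xs[miss][a]) - abs(x[a] - xs[hit][a])) / m
    return w

xs = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.4], [0.9, 0.8]]
ys = [0, 0, 1, 1]
w = relief_weights(xs, ys, 2)
print(w[0] > w[1])  # True: the first attribute separates the two classes
```

The operator generalizes this idea to several neighbors (the number of neighbors parameter), a sampled subset of examples (the sample ratio parameter), multiple classes, and regression labels.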


Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

weights (wei) This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

normalize weights (boolean) This parameter indicates if the calculated weights should be normalized or not. If set to true, all weights are normalized to the range from 0 to 1.

sort weights (boolean) This parameter indicates if the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified using the sort direction parameter.

sort direction (selection) This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights.

number of neighbors (integer) This parameter specifies the number of nearest neighbors for the relevance calculation.

sample ratio (real) This parameter specifies the ratio of examples to be used for determining the weights.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomizing the examples of a subset. Using the same value of the local random seed will produce the same sample. Changing the value of this parameter changes the way examples are randomized, thus the sample will have a different set of examples.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

Tutorial Processes

Calculating the attribute weights of the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. The Weight by Relief operator is applied on it to calculate the weights of the attributes. All parameters are used with default values. The normalize weights parameter is set to true, thus all the weights will be normalized to the range 0 to 1. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', thus the results will be in ascending order of the weights. You can verify this by viewing the results of this process in the Results Workspace.


Weight by SVM

This operator calculates the relevance of the attributes by computing for each attribute of the input ExampleSet the weight with respect to the class attribute. The coefficients of a hyperplane calculated by an SVM (Support Vector Machine) are set as attribute weights.

Description

The Weight by SVM operator uses the coefficients of the normal vector of a linear SVM as attribute weights. In contrast to most of the SVM-based operators available in RapidMiner, this operator works for multiple classes too. Please note that the attribute values still have to be numerical; this operator can be applied only on ExampleSets with numerical attributes. Please use appropriate preprocessing operators (type conversion operators) in order to ensure this. For more information about SVMs please study the documentation of the SVM operator.
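The principle, taking the coefficients of a linear separating hyperplane's normal vector as attribute weights, can be illustrated with a simple perceptron standing in for the SVM (a real SVM instead solves a margin-maximization problem; this is only a sketch with made-up data):

```python
# A linear model's coefficient magnitudes as attribute weights,
# using a perceptron as a simple stand-in for a linear SVM.

def perceptron_weights(xs, ys, epochs=20, lr=0.1):
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):              # y in {-1, +1}
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w

# The first attribute separates the classes; the second is nearly constant.
xs = [[0.0, 0.5], [0.2, 0.4], [1.0, 0.5], [0.8, 0.6]]
ys = [-1, -1, 1, 1]
w = perceptron_weights(xs, ys)
# Attribute weights: absolute coefficients of the hyperplane normal.
weights = [abs(wi) for wi in w]
print(weights[0] > weights[1])  # True: the first attribute drives the split
```

An attribute whose coefficient in the learned hyperplane is near zero contributes little to the decision and accordingly receives a small weight.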

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

weights (wei) This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

normalize weights (boolean) This parameter indicates if the calculated weights should be normalized or not. If set to true, all weights are normalized to the range from 0 to 1.

sort weights (boolean) This parameter indicates if the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of sorting is specified using the sort direction parameter.

sort direction (selection) This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights.

C (real) This parameter specifies the SVM complexity weighting factor.

Tutorial Processes

Calculating the weights of the attributes of the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. The Weight by SVM operator is applied on it to calculate the weights of the attributes. All parameters are used with default values. The normalize weights parameter is set to true, thus all the weights will be normalized in the range 0 to 1. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', thus the results will be in ascending order of the weights. You can verify this by viewing the results of this process in the Results Workspace.


Weight by PCA

This operator creates attribute weights of the ExampleSet by using a component created by the PCA. This operator behaves exactly the same way as if a PCA model is given to the Weight by Component Model operator.

Description

The Weight by PCA operator generates attribute weights of the given ExampleSet using a component created by the PCA. The component is specified by the component number parameter. If the normalize weights parameter is not set to true, the exact values of the selected component are used as attribute weights. The normalize weights parameter is usually set to true to spread the weights between 0 and 1. The attribute weights reflect the relevance of the attributes with respect to the class attribute. The higher the weight of an attribute, the more relevant it is considered.

Principal Component Analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated attributes into a set of values of uncorrelated attributes called principal components. The number of principal components is less than or equal to the number of original attributes. This transformation is defined in such a way that the first principal component's variance is as high as possible (it accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (uncorrelated with) the preceding components.
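The first principal component, whose entries the operator can use as attribute weights, can be sketched via power iteration on the covariance matrix (illustrative Python; not how RapidMiner computes it, and the data is made up):

```python
# First principal component via power iteration on the covariance matrix.
from math import sqrt

def first_component(xs):
    n, d = len(xs), len(xs[0])
    means = [sum(x[a] for x in xs) / n for a in range(d)]
    centered = [[x[a] - means[a] for a in range(d)] for x in xs]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1)
            for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(100):                      # power iteration
        v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]             # converges to the top eigenvector
    return v

# The first attribute varies much more than the second.
xs = [[0.0, 0.1], [2.0, 0.0], [4.0, 0.1], [6.0, 0.0]]
pc1 = first_component(xs)
print(abs(pc1[0]) > abs(pc1[1]))  # True: high-variance attribute dominates PC1
```

Attributes that contribute strongly to the chosen component thus receive large (absolute) entries in the eigenvector and correspondingly large weights.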

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

weights (wei) This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

normalize weights (boolean) This parameter indicates if the calculated weights should be normalized or not. If set to true, all weights are normalized to the range from 0 to 1.

sort weights (boolean) This parameter indicates if the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified using the sort direction parameter.

sort direction (selection) This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights.

component number (integer) This parameter specifies the number of the component that should be used as attribute weights.

Tutorial Processes

Calculating the attribute weights of the Sonar data set by PCA

The 'Sonar' data set is loaded using the Retrieve operator. The PCA operator is applied on it. The dimensionality reduction parameter is set to 'none'. A breakpoint is inserted here so that you can have a look at the components created by the PCA operator. Have a look at the EigenVectors generated by the PCA operator, especially 'PC1', because it will be used as weights by the Weight by Component Model operator. The Weight by Component Model operator is applied next. The ExampleSet and Model ports of the PCA operator are connected to the corresponding ports of the Weight by Component Model operator. The normalize weights and sort weights parameters are set to false, thus the weights will be exactly the same as the selected component. The component number parameter is set to 1, thus 'PC1' will be used as attribute weights. The weights can be seen in the Results Workspace. You can see that these weights are exactly the same as the values of 'PC1'.

In the second operator chain the Weight by PCA operator is applied on the 'Sonar' data set to perform exactly the same task. The parameters of the Weight by PCA operator are set exactly like the parameters of the Weight by Component Model operator. As can be seen in the Results Workspace, exactly the same weights are generated here.


Weight by Component Model

This operator creates attribute weights for the ExampleSet by using a component created by operators like PCA, GHA or ICA. If the model given to this operator is a PCA model, this operator behaves exactly like the Weight by PCA operator.

Description

The Weight by Component Model operator always comes after operators like PCA, GHA or ICA. The ExampleSet and Preprocessing model generated by these operators are connected to the ExampleSet and Model ports of the Weight by Component Model operator. The Weight by Component Model operator then generates attribute weights for the original ExampleSet using a component created by the previous operator (i.e. PCA, GHA, ICA etc.). The component is specified by the component number parameter. If the normalize weights parameter is not set to true, the exact values of the selected component are used as attribute weights. The normalize weights parameter is usually set to true to spread the weights between 0 and 1.


The attribute weights reflect the relevance of the attributes with respect to the class attribute. The higher the weight of an attribute, the more relevant it is considered.
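The mechanics can be sketched outside RapidMiner as well. The following Python snippet (our own illustration using NumPy; the function name and data are hypothetical, not RapidMiner code) derives the loadings of one principal component from a small data matrix and optionally min-max normalizes them to the range 0 to 1, mirroring the component number and normalize weights parameters:

```python
import numpy as np

def component_weights(X, component_number=1, normalize=True):
    """Use the loadings of one principal component as attribute weights.
    A rough sketch of what Weight by Component Model does for a PCA model."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # sort components by variance
    w = eigvecs[:, order[component_number - 1]]  # loadings of 'PC<k>'
    if normalize:                            # spread weights over [0, 1]
        w = (w - w.min()) / (w.max() - w.min())
    return w

X = np.array([[2.0, 0.1, 1.0],
              [4.0, 0.2, 0.9],
              [6.0, 0.3, 1.1],
              [8.0, 0.4, 1.0]])
print(component_weights(X, component_number=1))
```

Without normalization the weights are exactly the (unit-length) component loadings, which corresponds to the tutorial setting where normalize weights is false.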

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the PCA operator in the attached Example Process.

model (mod) This input port expects a model. Usually the Preprocessing model generated by operators like PCA, GHA or ICA is provided here.

Output Ports

weights (wei) This port delivers the weights of the attributes with respect to the label attribute. The attributes with higher weight are considered more relevant.

example set (exa) The ExampleSet that was given as input is passed to the output through this port without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

model (mod) The model that was given as input is passed to the output through this port without any changes.

Parameters

normalize weights (boolean) This parameter indicates whether the calculated weights should be normalized. If set to true, all weights are normalized to the range 0 to 1.

sort weights (boolean) This parameter indicates whether the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified by the sort direction parameter.

sort direction (selection) This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights.

component number (integer) This parameter specifies the number of the component that should be used as attribute weights.

Tutorial Processes

Calculating the attribute weights of the Sonar data set by PCA

The 'Sonar' data set is loaded using the Retrieve operator. The PCA operator is applied on it. The dimensionality reduction parameter is set to 'none'. A breakpoint is inserted here so that you can have a look at the components created by the PCA operator. Have a look at the EigenVectors generated by the PCA operator, especially 'PC1', because it will be used as attribute weights by the Weight by Component Model operator. The Weight by Component Model operator is applied next. The ExampleSet and Model ports of the PCA operator are connected to the corresponding ports of the Weight by Component Model operator. The normalize weights and sort weights parameters are set to false, thus all the weights will be exactly the same as the selected component. The component number parameter is set to 1, thus 'PC1' will be used as attribute weights. The weights can be seen in the Results Workspace. You can see that these weights are exactly the same as the values of 'PC1'.

In the second operator chain the Weight by PCA operator is applied on the 'Sonar' data set. The parameters of the Weight by PCA operator are set exactly the same as the parameters of the Weight by Component Model operator. As can be seen in the Results Workspace, exactly the same weights are generated here.


Data to Weights

This operator simply generates an attribute weights vector with weight 1.0 for each input attribute.

Description

The Data to Weights operator creates a new attribute weights IOObject from the given ExampleSet. The result is a vector of attribute weights containing the weight 1.0 for each attribute of the input ExampleSet.
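The operator's behavior is simple enough to sketch in a few lines of Python (a hypothetical stand-in for illustration, not RapidMiner code): every attribute of the input simply receives the weight 1.0.

```python
def data_to_weights(attribute_names):
    """Sketch of Data to Weights: every attribute gets weight 1.0."""
    return {name: 1.0 for name in attribute_names}

# Attributes of the 'Golf' data set used in the Tutorial Process below
print(data_to_weights(["Outlook", "Temperature", "Humidity", "Wind"]))
```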

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.


Output Ports

weights (wei) This port delivers the weights of the attributes with respect to the label attribute.

example set (exa) The ExampleSet that was given as input is passed to the output through this port without any changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

normalize weights (boolean) This parameter indicates whether the calculated weights should be normalized. If set to true, all weights are normalized to the range 0 to 1.

sort weights (boolean) This parameter indicates whether the attributes should be sorted according to their weights in the results. If this parameter is set to true, the order of the sorting is specified by the sort direction parameter.

sort direction (selection) This parameter is only available when the sort weights parameter is set to true. It specifies the sorting order of the attributes according to their weights.

Tutorial Processes

Generating a weight vector with weight 1.0 for all the attributes

The 'Golf' data set is loaded using the Retrieve operator. The Data to Weights operator is applied on it to generate the weights of the attributes. All parameters are used with default values. The normalize weights parameter is set to true, the sort weights parameter is set to true and the sort direction parameter is set to 'ascending'. Run the process and see the results of this process in the Results Workspace. You can see that all attributes have been assigned weight 1.0.

Weights to Data

This operator takes attribute weights as input and delivers an ExampleSet with attribute names and corresponding weights as output.

Description

The Weights to Data operator takes attribute weights as input and delivers an ExampleSet with attribute names and corresponding weights as output. The resultant ExampleSet has two attributes, 'Attribute' and 'Weight'. The 'Attribute' attribute stores the names of the attributes and the 'Weight' attribute stores the weight of the corresponding attribute. This ExampleSet has n examples, where n is the number of attributes in the input weights vector. There are numerous operators that provide attribute weights at 'Modeling/Attribute Weighting' in the Operators Window.
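As a rough Python sketch (illustrative only; the names are our own, not RapidMiner's), the conversion produces one 'Attribute'/'Weight' row per entry of the weight vector:

```python
def weights_to_data(weights):
    """Sketch of Weights to Data: one row per attribute, so a weight vector
    with n entries yields an ExampleSet-like table with n examples."""
    return [{"Attribute": name, "Weight": value}
            for name, value in weights.items()]

rows = weights_to_data({"Outlook": 0.0, "Humidity": 0.4, "Wind": 1.0})
for row in rows:
    print(row["Attribute"], row["Weight"])
```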


Input Ports

attribute weights (att) This input port expects weights of attributes. It is the output of the Weight by Chi Squared Statistic operator in the attached Example Process.

Output Ports

example set (exa) The ExampleSet that contains attribute names and corresponding weights is delivered through this port.

Tutorial Processes

Calculating the weights of the attributes of the Golf data set and storing them in an ExampleSet

The 'Golf' data set is loaded using the Retrieve operator. The Weight by Chi Squared Statistic operator is applied on it to calculate the weights of the attributes. All parameters are used with default values. The normalize weights parameter is set to true, thus all the weights will be normalized to the range 0 to 1. The sort weights parameter is set to true and the sort direction parameter is set to 'ascending', thus the results will be in ascending order of the weights. A breakpoint is inserted here to show the weights produced by the Weight by Chi Squared Statistic operator. These weights are provided as input to the Weights to Data operator, which stores them in the form of an ExampleSet. The ExampleSet can be seen in the Results Workspace.
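The computation behind such a weighting can be sketched in plain NumPy (our own simplified illustration of a per-attribute chi-squared statistic, not RapidMiner's implementation): build the attribute-vs-label contingency table, compare observed with expected counts, then optionally min-max normalize the statistics to the range 0 to 1.

```python
import numpy as np

def chi2_statistic(attribute, label):
    """Chi-squared statistic of one nominal attribute against the label,
    computed from the observed-vs-expected contingency table."""
    a_vals, a_idx = np.unique(attribute, return_inverse=True)
    l_vals, l_idx = np.unique(label, return_inverse=True)
    observed = np.zeros((len(a_vals), len(l_vals)))
    for i, j in zip(a_idx, l_idx):
        observed[i, j] += 1
    expected = np.outer(observed.sum(axis=1),
                        observed.sum(axis=0)) / observed.sum()
    return float(((observed - expected) ** 2 / expected).sum())

def weight_by_chi_squared(columns, label, normalize=True):
    """One weight per attribute; min-max normalized to [0, 1] if requested
    (assumes at least two distinct raw weights when normalizing)."""
    w = {name: chi2_statistic(col, label) for name, col in columns.items()}
    if normalize:
        lo, hi = min(w.values()), max(w.values())
        w = {name: (v - lo) / (hi - lo) for name, v in w.items()}
    return w

label = ["yes", "yes", "no", "no"]
columns = {"perfect": ["a", "a", "b", "b"], "constant": ["c", "c", "c", "c"]}
print(weight_by_chi_squared(columns, label))
```

A perfectly predictive attribute ends up with the maximum weight and a constant attribute with the minimum, which matches the intuition that higher weights mark more relevant attributes.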


Optimize Weights (Evolutionary)

This operator calculates the relevance of the attributes of the given ExampleSet by using an evolutionary approach. The weights of the attributes are calculated using a Genetic Algorithm.

Description

The Optimize Weights (Evolutionary) operator is a nested operator i.e. it has a subprocess. The subprocess of the Optimize Weights (Evolutionary) operator must always return a performance vector. For more information regarding subprocesses please study the Subprocess operator. The Optimize Weights (Evolutionary) operator calculates the weights of the attributes of the given ExampleSet by using a Genetic Algorithm. The higher the weight of an attribute, the more relevant it is considered.

A genetic algorithm (GA) is a search heuristic that mimics the process of natural evolution. This heuristic is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover.

In a genetic algorithm, 'mutation' means switching features on and off and 'crossover' means interchanging used features. Selection is done according to the scheme specified by the selection scheme parameter. A genetic algorithm works as follows:

Generate an initial population consisting of p individuals. The number p can be adjusted by the population size parameter.

For all individuals in the population

1. Perform mutation, i.e. set used attributes to unused with probability p_m and vice versa. The probability p_m can be adjusted by the corresponding parameters.

2. Choose two individuals from the population and perform crossover with probability p_c. The probability p_c can be adjusted by the p crossover parameter. The type of crossover can be selected by the crossover type parameter.

3. Perform selection, map all individuals according to their fitness and draw p individuals at random according to their probability, where p is the population size which can be adjusted by the population size parameter.

4. As long as the fitness improves, go to step number 2.

If the ExampleSet contains value series attributes with block numbers, the whole block will be switched on and off. The exact, minimum or maximum number of attributes in the combinations to be tested can be specified by the appropriate parameters. Many other options are also available for this operator. Please study the parameters section for more information.
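The loop above can be condensed into a small standalone Python sketch (a deliberately simplified GA of our own, not RapidMiner's implementation; the toy fitness function just counts how many attributes match a target mask):

```python
import random

def genetic_feature_selection(fitness, n_attributes, population_size=10,
                              p_mutation=0.1, p_crossover=0.9,
                              max_generations=30, seed=42):
    """Simplified GA over 0/1 attribute masks: mutation flips attributes
    on/off, crossover interchanges used features, selection draws the next
    generation with probability proportional to fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_attributes)]
           for _ in range(population_size)]
    best = max(pop, key=fitness)
    for _ in range(max_generations):
        # mutation: switch each attribute with probability p_mutation
        pop = [[1 - g if rng.random() < p_mutation else g for g in ind]
               for ind in pop]
        # crossover: swap the tails of two random individuals at a cut point
        if rng.random() < p_crossover:
            i, j = rng.sample(range(population_size), 2)
            cut = rng.randrange(1, n_attributes)
            pop[i], pop[j] = (pop[i][:cut] + pop[j][cut:],
                              pop[j][:cut] + pop[i][cut:])
        # selection: fitness-proportional draw of the next generation
        # (small epsilon keeps the weights positive for this toy fitness)
        scores = [fitness(ind) for ind in pop]
        pop = rng.choices(pop, weights=[s + 1e-6 for s in scores],
                          k=population_size)
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate          # keep the best individual found so far
    return best

# Toy fitness: count how many attributes match a target mask of 5 attributes
target = [1, 0, 1, 0, 0]
fitness = lambda ind: sum(1 for g, t in zip(ind, target) if g == t)
best = genetic_feature_selection(fitness, n_attributes=5)
print(best, fitness(best))
```

In the operator, the fitness of an individual is the performance vector delivered by the subprocess; here it is replaced by the toy function, and features such as early stopping, tournament selection and real-valued weight mutation are omitted.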


Input Ports

example set in (exa) This input port expects an ExampleSet. This ExampleSet is available at the first port of the nested chain (inside the subprocess) for processing in the subprocess.

attribute weights in (att) This port expects attribute weights. It is not compulsory to use this port.

through (thr) This operator can have multiple through ports. When one input is connected with the through port, another through port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The object supplied at the first through port of this operator is available at the first through port of the nested chain (inside the subprocess). Do not forget to connect all inputs in the correct order. Make sure that you have connected the right number of ports at the subprocess level.

Output Ports

example set out (exa) The genetic algorithm is applied on the input ExampleSet. The resultant ExampleSet with reduced attributes is delivered through this port.

weights (wei) The attribute weights are delivered through this port.

performance (per) This port delivers the Performance Vector for the selected attributes. A Performance Vector is a list of performance criteria values.

Parameters

population size (integer) This parameter specifies the population size, i.e. the number of individuals per generation.

maximum number of generations (integer) This parameter specifies the number of generations after which the algorithm should be terminated.

use early stopping (boolean) This parameter enables early stopping. If not set to true, the maximum number of generations is always performed.

generations without improval (integer) This parameter is available only when the use early stopping parameter is set to true. It specifies the stop criterion for early stopping, i.e. the algorithm stops after n generations without improvement in the performance, where n is specified by this parameter.

normalize weights (boolean) This parameter indicates if the final weights should be normalized. If set to true, the final weights are normalized such that the maximum weight is 1 and the minimum weight is 0.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Using the same value of the local random seed will produce the same randomization.

local random seed (integer) This parameter specifies the local random seed. It is available only if the use local random seed parameter is set to true.

show stop dialog (boolean) This parameter determines if a dialog with a stop button should be displayed which stops the search for the best feature space. If the search is stopped, the best individual found so far is returned.

user result individual selection (boolean) If this parameter is set to true, it allows the user to select the final result individual from the last population.

show population plotter (boolean) This parameter determines if the current population should be displayed in performance space.

population criteria data file (filename) This parameter specifies the path to the file in which the criteria data of the final population should be saved.

maximal fitness (real) This parameter specifies the maximal fitness. The optimization will stop if the fitness reaches this value.

selection scheme (selection) This parameter specifies the selection scheme of this evolutionary algorithm.

tournament size (real) This parameter is available only when the selection scheme parameter is set to 'tournament'. It specifies the fraction of the current population which should be used as tournament members.

start temperature (real) This parameter is available only when the selection scheme parameter is set to 'Boltzmann'. It specifies the scaling temperature.

dynamic selection pressure (boolean) This parameter is available only when the selection scheme parameter is set to 'Boltzmann' or 'tournament'. If set to true, the selection pressure is increased to the maximum during the complete optimization run.

keep best individual (boolean) If set to true, the best individual of each generation is guaranteed to be selected for the next generation.

save intermediate weights (boolean) This parameter determines if the intermediate best results should be saved.

intermediate weights generations (integer) This parameter is available only when the save intermediate weights parameter is set to true. The intermediate best results are saved every k generations, where k is specified by this parameter.

intermediate weights file (filename) This parameter specifies the file into which the intermediate weights should be saved.

mutation variance (real) This parameter specifies the (initial) variance for each mutation.

1 5 rule (boolean) This parameter determines if the 1/5 rule for variance adaptation should be used.

bounded mutation (boolean) This parameter determines if the weights should be bounded between 0 and 1. If set to true, the weights are bounded between 0 and 1.

p crossover (real) The probability for an individual to be selected for crossover is specified by this parameter.

crossover type (selection) The type of crossover can be selected by this parameter.

use default mutation rate (boolean) This parameter determines if the default mutation rate should be used for nominal attributes.

nominal mutation rate (real) This parameter specifies the probability to switch nominal attributes between 0 and 1.

initialize with input weights (boolean) This parameter indicates if this operator should look for attribute weights in the given input and use the input weights of all known attributes as the starting point for the optimization.


Tutorial Processes

Calculating the weights of the attributes of the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 5 regular attributes other than the label attribute. The Optimize Weights (Evolutionary) operator, which is a nested operator i.e. it has a subprocess, is applied on the ExampleSet. It is necessary for the subprocess to deliver a performance vector. This performance vector is used by the underlying Genetic Algorithm. Have a look at the subprocess of this operator. The Split Validation operator has been used there, which itself is a nested operator. Have a look at the subprocesses of the Split Validation operator. The SVM operator is used in the 'Training' subprocess to train a model. The trained model is applied using the Apply Model operator in the 'Testing' subprocess. The performance is measured through the Performance operator and the resultant performance vector is used by the underlying algorithm. Run the process and switch to the Results Workspace. You can see that the ExampleSet that had 5 attributes has now been reduced to 2 attributes. Also take a look at the weights of the attributes in the Results Workspace. You can see that two attributes have weight 1 and the remaining attributes have weight 0.

800 7.3. Clustering and Segmentation

K-Means

This operator performs clustering using the k-means algorithm. Clustering is concerned with grouping objects together that are similar to each other and dissimilar to the objects belonging to other clusters. Clustering is a technique for extracting information from unlabelled data. k-means clustering is an exclusive clustering algorithm i.e. each object is assigned to precisely one of a set of clusters.

Description

This operator performs clustering using the k-means algorithm. k-means clustering is an exclusive clustering algorithm i.e. each object is assigned to precisely one of a set of clusters. Objects in one cluster are similar to each other. The similarity between objects is based on a measure of the distance between them.

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. Clustering is a technique for extracting information from unlabeled data. Clustering can be very useful in many different scenarios e.g. in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Here is a simple explanation of how the k-means algorithm works. First of all we need to introduce the notion of the center of a cluster, generally called its centroid. Assuming that we are using Euclidean distance or something similar as a measure we can define the centroid of a cluster to be the point for which each attribute value is the average of the values of the corresponding attribute for all the points in the cluster. The centroid of a cluster will sometimes be one of the points in the cluster, but frequently it will be an imaginary point, not part of the cluster itself, which we can take as marking its center.

For this method of clustering we start by deciding how many clusters we would like to form from our data. We call this value k. The value of k is generally a small integer, such as 2, 3, 4 or 5, but may be larger.


We next select k points. These are treated as the centroids of k clusters, or to be more precise as the centroids of k potential clusters, which at present have no members. We can select these points in any way we wish, but the method may work better if we pick k initial points that are fairly far apart. We now assign each of the points one by one to the cluster which has the nearest centroid. When all the objects have been assigned we will have k clusters based on the original k centroids, but the centroids will no longer be the true centroids of the clusters. Next we recalculate the centroids of the clusters, and then repeat the previous steps, assigning each object to the cluster with the nearest centroid, and so on. The entire algorithm can be summarized as:

1. Choose a value of k.

2. Select k objects in an arbitrary fashion. Use these as the initial set of k centroids.

3. Assign each of the objects to the cluster whose centroid is nearest to it.

4. Recalculate the centroids of the k clusters.

5. Repeat steps 3 and 4 until the centroids no longer move.
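The five steps translate almost line by line into Python (an illustrative NumPy sketch, not RapidMiner's implementation; it ignores the operator's max runs restarts and any empty-cluster handling):

```python
import numpy as np

def k_means(X, k=2, max_steps=100, seed=0):
    """Plain k-means following the steps above: pick k initial centroids,
    assign points to the nearest centroid, recompute the centroids, and
    repeat until they no longer move."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # step 2
    for _ in range(max_steps):
        # step 3: assign each point to the cluster with the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 4: recompute each centroid as the mean of its members
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):              # step 5
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of points
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels, centroids = k_means(X, k=2)
print(labels)
```

On this toy data the two returned centroids settle on the means of the two groups, illustrating that a centroid is usually an imaginary point rather than one of the examples.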

Differentiation

k-Medoids In the k-medoids algorithm the centroid of a cluster will always be one of the points in the cluster. This is the major difference between the k-means and k-medoids algorithms. In the k-means algorithm the centroid of a cluster will frequently be an imaginary point, not part of the cluster itself, which we can take as marking its center. See page 810 for details.

k-Means (Kernel) Kernel k-means uses kernels to estimate the distance between objects and clusters. Because of the nature of kernels it is necessary to sum over all elements of a cluster to calculate one distance. So this algorithm is quadratic in the number of examples and does not return a Centroid Cluster Model (in contrast, the K-Means operator does return a Centroid Cluster Model). See page 805 for details.

Input Ports

example set input (exa) The input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

cluster model (clu) This port delivers the cluster model. It has information regarding the clustering performed. It tells which examples are part of which cluster. It also has information regarding the centroids of each cluster.

clustered set (clu) The ExampleSet that was given as input is passed with minor changes to the output through this port. An attribute with the id role is added to the input ExampleSet to distinguish examples. An attribute with the cluster role may also be added depending on the state of the add cluster attribute parameter.

Parameters

add cluster attribute (boolean) If enabled, a new attribute with the cluster role is generated directly in this operator, otherwise this operator does not add the cluster attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.

add as label (boolean) If true, the cluster id is stored in an attribute with the label role instead of the cluster role.

remove unlabeled (boolean) If set to true, unlabeled examples are deleted.

k (integer) This parameter specifies the number of clusters to form. There is no hard and fast rule for the number of clusters to form, but generally it is preferred to have a small number of clusters with examples scattered (not too scattered) around them in a balanced way.

max runs (integer) This parameter specifies the maximal number of runs of k-means with random initialization that are performed.

max optimization steps (integer) This parameter specifies the maximal number of iterations performed for one run of k-means.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization. Randomization may be used for selecting k different points at the start of the algorithm as potential centroids.

local random seed (integer) This parameter specifies the local random seed. It is only available if the use local random seed parameter is set to true.

Related Documents

k-Medoids (810)
k-Means (Kernel) (805)

Tutorial Processes

Simple clustering of Ripley-Set data set

In many cases, no target attribute (i.e. label) can be defined and the data should be automatically grouped. This procedure is called clustering. RapidMiner supports a wide range of clustering schemes which can be used in just the same way as any other learning scheme. This includes the combination with all preprocessing operators.

In this Example Process, the 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is only used for visualization and comparison and not for building the clusters themselves. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before application of the K-Means operator. Other than the label attribute, the 'Ripley-Set' has two real attributes, 'att1' and 'att2'. The K-Means operator is applied on this data set with default values for all parameters. Run the process and you will see that two new attributes are created by the K-Means operator. The id attribute is created to distinguish examples clearly. The cluster attribute is created to show which cluster the examples belong to. As the parameter k was set to 2, only two clusters are possible. That is why each example is assigned to either 'cluster 0' or 'cluster 1'. Also note the Plot View of this data. You can clearly see how the algorithm has created two separate groups in the Plot View. A cluster model is also delivered through the cluster model output port. It has information regarding the clustering performed. Under the Folder View you can see the members of each cluster in folder format. You can see information regarding the centroids under the Centroid Table and Centroid Plot View tabs.

K-Means (Kernel)

This operator performs clustering using the kernel k-means algorithm. Clustering is concerned with grouping objects together that are similar to each other and dissimilar to the objects belonging to other clusters. Kernel k-means uses kernels to estimate the distance between objects and clusters. K-means is an exclusive clustering algorithm.


Description

This operator performs clustering using the kernel k-means algorithm. k-means is an exclusive clustering algorithm i.e. each object is assigned to precisely one of a set of clusters. Objects in one cluster are similar to each other. The similarity between objects is based on a measure of the distance between them. Kernel k-means uses kernels to estimate the distance between objects and clusters. Because of the nature of kernels it is necessary to sum over all elements of a cluster to calculate one distance. So this algorithm is quadratic in the number of examples and, unlike the K-Means operator, does not return a Centroid Cluster Model. This operator creates a cluster attribute in the resultant ExampleSet if the add cluster attribute parameter is set to true.
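The distance computation hinted at above can be written down directly (a Python sketch of our own, not RapidMiner code; with the plain dot kernel it reduces to the ordinary squared Euclidean distance to the cluster mean, which is a handy sanity check):

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """Radial kernel exp(-g ||x - y||^2)."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def kernel_distance_sq(x, cluster, kernel=rbf_kernel):
    """Squared feature-space distance of x to the implicit centroid of a
    cluster, using kernel evaluations only. The double sum over the cluster
    members is what makes the algorithm quadratic in the number of examples
    and why no explicit centroid (Centroid Cluster Model) is available."""
    m = len(cluster)
    cross = sum(kernel(x, y) for y in cluster) / m
    intra = sum(kernel(y, z) for y in cluster for z in cluster) / (m * m)
    return kernel(x, x) - 2.0 * cross + intra

cluster = [np.array([0.0, 0.0]), np.array([0.2, 0.0])]
print(kernel_distance_sq(np.array([0.1, 0.0]), cluster))
```

The assignment step of kernel k-means then simply evaluates this distance against every cluster and picks the smallest, exactly as ordinary k-means does with explicit centroids.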

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. Clustering is a technique for extracting information from unlabeled data. Clustering can be very useful in many different scenarios e.g. in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Differentiation

k-Means Kernel k-means uses kernels to estimate the distance between objects and clusters. Because of the nature of kernels it is necessary to sum over all elements of a cluster to calculate one distance. So this algorithm is quadratic in the number of examples and does not return a Centroid Cluster Model, which the K-Means operator does. See page 801 for details.

Input Ports

example set (exa) The input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

cluster model (clu) This port delivers the cluster model which has information regarding the clustering performed. It tells which examples are part of which cluster.

clustered set (clu) The ExampleSet that was given as input is passed with minor changes to the output through this port. An attribute with the id role is added to the input ExampleSet to distinguish examples. An attribute with the cluster role may also be added depending on the state of the add cluster attribute parameter.

Parameters

add cluster attribute (boolean) If enabled, a new attribute with the cluster role is generated directly in this operator, otherwise this operator does not add the cluster attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.

add as label (boolean) If true, the cluster id is stored in an attribute with the label role instead of the cluster role.

remove unlabeled (boolean) If set to true, unlabeled examples are deleted.

use weights (boolean) This parameter indicates if the weight attribute should be used.

k (integer) This parameter specifies the number of clusters to form. There is no hard and fast rule for the number of clusters to form, but generally it is preferred to have a small number of clusters with examples scattered (not too scattered) around them in a balanced way.

max optimization steps (integer) This parameter specifies the maximal number of iterations performed for one run of k-means.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization.

local random seed (integer) This parameter specifies the local random seed. It is only available if the use local random seed parameter is set to true.

kernel type (selection) The type of the kernel function is selected through this parameter. The following kernel types are supported: dot, radial, polynomial, neural, anova, epachnenikov, gaussian combination, multiquadric.

- dot The dot kernel is defined by k(x, y) = x*y, i.e. it is the inner product of x and y.

- radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma, specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

- polynomial The polynomial kernel is defined by k(x, y) = (x*y + 1)^d where d is the degree of the polynomial, specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.

- neural The neural kernel is defined by a two-layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

- anova The anova kernel is defined by the summation of exp(-g (x - y)), raised to the power d, where g is gamma and d is the degree; they are adjusted by the kernel gamma and kernel degree parameters respectively.

- epachnenikov The epachnenikov kernel is the function (3/4)(1 - u^2) for u between -1 and 1, and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.

- gaussian combination This is the gaussian combination kernel. It has the adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

- multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has the adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the kernel parameter gamma. It is only available when the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the kernel parameter sigma1. It is only available when the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the kernel parameter sigma2. It is only available when the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the kernel parameter sigma3. It is only available when the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the kernel parameter shift. It is only available when the kernel type parameter is set to multiquadric.

kernel degree (real) This is the kernel parameter degree. It is only available when the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the kernel parameter a. It is only available when the kernel type parameter is set to neural.

kernel b (real) This is the kernel parameter b. It is only available when the kernel type parameter is set to neural.
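A few of the kernels above, written out in Python for illustration (our own sketches; the parameter defaults are arbitrary and the names only loosely mirror the operator's parameters):

```python
import numpy as np

def dot_kernel(x, y):
    return float(np.dot(x, y))                       # k(x, y) = x * y

def radial_kernel(x, y, gamma=1.0):
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))  # exp(-g ||x-y||^2)

def polynomial_kernel(x, y, degree=2.0):
    return float((np.dot(x, y) + 1.0) ** degree)     # (x*y + 1)^d

def neural_kernel(x, y, a=1.0, b=0.0):
    return float(np.tanh(a * np.dot(x, y) + b))      # tanh(a x*y + b)

def multiquadric_kernel(x, y, c=1.0):
    return float(np.sqrt(np.sum((x - y) ** 2) + c ** 2))  # sqrt(||x-y||^2 + c^2)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(dot_kernel(x, y), radial_kernel(x, y), polynomial_kernel(x, y))
```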

Related Documents k-Means (801)

Tutorial Processes

Clustering of the Ripley-Set data set using the Kernel K-Means operator

In many cases, no target attribute (i.e. label) can be defined and the data should be automatically grouped. This procedure is called Clustering. RapidMiner supports a wide range of clustering schemes which can be used in just the same way as any other learning scheme. This includes the combination with all preprocessing operators.

In this Example Process, the 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is only used for visualization and comparison, not for building the clusters themselves. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before application of the Kernel K-Means operator. Besides the label attribute the 'Ripley-Set' has two real attributes: 'att1' and 'att2'. The Kernel K-Means operator is applied on this data set with default values for all parameters. Run the process and you will see that two new attributes are created by the Kernel K-Means operator. The id attribute is created to distinguish examples clearly. The cluster attribute is created to show which cluster the examples belong to. As the parameter k was set to 2, only two clusters are possible. That is why each example is assigned to either 'cluster 0' or 'cluster 1'. Also note the Plot View of this data. You can clearly see how the algorithm has created two separate groups in the Plot View. A cluster model is also delivered through the cluster model output port. It has information regarding the clustering performed. Under Folder View you can see the members of each cluster in folder format.

K-Medoids

This operator performs clustering using the k-medoids algorithm. Clustering is concerned with grouping objects together that are similar to each other and dissimilar to the objects belonging to other clusters. Clustering is a technique for extracting information from unlabeled data. k-medoids clustering is an exclusive clustering algorithm i.e. each object is assigned to precisely one of a set of clusters.

Description

This operator performs clustering using the k-medoids algorithm. K-medoids clustering is an exclusive clustering algorithm i.e. each object is assigned to precisely one of a set of clusters. Objects in one cluster are similar to each other. The similarity between objects is based on a measure of the distance between them.

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be very useful in many different scenarios e.g. in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Here is a simple explanation of how the k-medoids algorithm works. First of all we need to introduce the notion of the center of a cluster, generally called its centroid. Assuming that we are using Euclidean distance or something similar as a measure, we can define the centroid of a cluster to be the point for which each attribute value is the average of the values of the corresponding attribute for all the points in the cluster. In the k-medoids algorithm the center of a cluster (its medoid) is always one of the points in the cluster. This is the major difference between the k-means and k-medoids algorithms: in the k-means algorithm the centroid of a cluster will frequently be an imaginary point, not part of the cluster itself, which we can take to mark its center. For more information about the k-means algorithm please study the k-Means operator.
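The alternation between assigning points to their nearest medoid and re-picking each medoid from within its cluster can be sketched as follows. This is a minimal illustration assuming Euclidean distance; the function names and the random initialization are assumptions, not RapidMiner's implementation.

```python
import math
import random

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_medoids(points, k, max_steps=100, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(points, k)          # k distinct data points as start medoids
    for _ in range(max_steps):
        # assignment step: each point joins its nearest medoid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, medoids[i]))
            clusters[nearest].append(p)
        # update step: the new medoid is the member minimizing total distance
        new_medoids = []
        for cluster in clusters:
            if not cluster:
                new_medoids.append(medoids[len(new_medoids)])
                continue
            new_medoids.append(min(
                cluster,
                key=lambda c: sum(euclidean(c, p) for p in cluster)))
        if new_medoids == medoids:           # converged: medoids stopped moving
            break
        medoids = new_medoids
    return medoids, clusters
```

Note that, unlike k-means, every returned medoid is one of the input points.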


Differentiation k-Means In the k-medoids algorithm the center of a cluster (its medoid) is always one of the points in the cluster. This is the major difference between the k-means and k-medoids algorithms. In the k-means algorithm the centroid of a cluster will frequently be an imaginary point, not part of the cluster itself, which we can take to mark its center. See page 801 for details.

Input Ports example set input (exa) The input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports cluster model (clu) This port delivers the cluster model. It has information regarding the clustering performed. It tells which examples are part of which cluster. It also has information regarding centroids of each cluster. clustered set (clu) The ExampleSet that was given as input is passed with minor changes to the output through this port. An attribute with id role is added to the input ExampleSet to distinguish examples. An attribute with cluster role may also be added depending on the state of the add cluster attribute parameter.

Parameters add cluster attribute (boolean) If enabled, a new attribute with cluster role is generated directly in this operator, otherwise this operator does not add the cluster attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.

add as label (boolean) If true, the cluster id is stored in an attribute with the label role instead of the cluster role.

remove unlabeled (boolean) If set to true, unlabeled examples are deleted.

k (integer) This parameter specifies the number of clusters to form. There is no hard and fast rule for the number of clusters to form. But, generally it is preferred to have a small number of clusters with examples scattered (not too scattered) around them in a balanced way.

max runs (integer) This parameter specifies the maximal number of runs of k-medoids with random initialization that are performed.

max optimization steps (integer) This parameter specifies the maximal number of iterations performed for one run of k-medoids.

use local random seed (boolean) Indicates if a local random seed should be used for randomization. Randomization may be used for selecting k different points at the start of the algorithm as potential medoids.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

measure types (selection) This parameter is used for selecting the type of measure to be used for measuring the distance between points. The following options are available: mixed measures, nominal measures, numerical measures and Bregman divergences.

mixed measure (selection) This parameter is available when the measure type parameter is set to 'mixed measures'. The only available option is 'Mixed Euclidean Distance'.

nominal measure (selection) This parameter is available when the measure type parameter is set to 'nominal measures'. This option cannot be applied if the input ExampleSet has numerical attributes. In this case the 'numerical measure' option should be selected.

numerical measure (selection) This parameter is available when the measure type parameter is set to 'numerical measures'. This option cannot be applied if the input ExampleSet has nominal attributes. If the input ExampleSet has nominal attributes the 'nominal measure' option should be selected.

divergence (selection) This parameter is available when the measure type parameter is set to 'bregman divergences'.

kernel type (selection) This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance'. The type of the kernel function is selected through this parameter. The following kernel types are supported:

ˆ dot The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.

ˆ radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma that is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

ˆ polynomial The polynomial kernel is defined by k(x,y) = (x*y+1)^d where d is the degree of the polynomial and is specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.

ˆ neural The neural kernel is defined by a two-layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

ˆ sigmoid This is the sigmoid kernel. Please note that the sigmoid kernel is not valid under some parameters.

ˆ anova This is the anova kernel. It has adjustable parameters gamma and degree.

ˆ epachnenikov The Epanechnikov kernel is the function (3/4)(1-u^2) for u between -1 and 1 and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.

ˆ gaussian combination This is the gaussian combination kernel. It has adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

ˆ multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the SVM kernel parameter gamma. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the SVM kernel parameter sigma1. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the SVM kernel parameter sigma2. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the SVM kernel parameter sigma3. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the SVM kernel parameter shift. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to multiquadric.

kernel degree (real) This is the SVM kernel parameter degree. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the SVM kernel parameter a. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.

kernel b (real) This is the SVM kernel parameter b. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.


Related Documents k-Means (801)

Tutorial Processes

Clustering of the Ripley-Set data set by the K-Medoids operator

In many cases, no target attribute (i.e. label) can be defined and the data should be automatically grouped. This procedure is called Clustering. RapidMiner supports a wide range of clustering schemes which can be used in just the same way as any other learning scheme. This includes the combination with all preprocessing operators.

In this Example Process, the 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is only used for visualization and comparison, not for building the clusters themselves. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before application of the K-Medoids operator. Besides the label attribute the 'Ripley-Set' has two real attributes: 'att1' and 'att2'. The K-Medoids operator is applied on this data set with default values for all parameters. Run the process and you will see that two new attributes are created by the K-Medoids operator. The id attribute is created to distinguish examples clearly. The cluster attribute is created to show which cluster the examples belong to. As the parameter k was set to 2, only two clusters are possible. That is why each example is assigned to either 'cluster 0' or 'cluster 1'. Also note the Plot View of this data. You can clearly see how the algorithm has created two separate groups in the Plot View. A cluster model is also delivered through the cluster model output port. It has information regarding the clustering performed. Under Folder View you can see the members of each cluster in folder format. You can see information regarding the cluster centroids under the Centroid Table and Centroid Plot View tabs.

DBSCAN

This operator performs clustering with DBSCAN. DBSCAN (density-based spatial clustering of applications with noise) is a density-based clustering algorithm: it finds a number of clusters starting from the estimated density distribution of the corresponding nodes.

Description

DBSCAN's definition of a cluster is based on the notion of density reachability. Basically, a point q is directly density-reachable from a point p if it is not farther away than a given distance epsilon (i.e. it is part of its epsilon-neighborhood) and if p is surrounded by sufficiently many points such that one may consider p and q to be part of a cluster. q is called density-reachable (note the distinction from “directly density-reachable”) from p if there is a sequence p(1), ..., p(n) of points with p(1) = p and p(n) = q where each p(i+1) is directly density-reachable from p(i).

Note that the relation of density-reachable is not symmetric. q might lie on the edge of a cluster, having insufficiently many neighbors to count as dense itself. In that case q is density-reachable from p, but p is not density-reachable from q, since no point is directly density-reachable from the non-dense q. Due to this asymmetry, the notion of density-connected is introduced: two points p and q are

density-connected if there is a point o such that both p and q are density-reachable from o. Density-connectedness is symmetric.

A cluster, which is a subset of the points of the data set, satisfies two properties:

1. All points within the cluster are mutually density-connected.

2. If a point is density-connected to any point of the cluster, it is part of the cluster as well.

DBSCAN requires two parameters: epsilon and the minimum number of points required to form a cluster (minPts). epsilon and minPts can be specified through the epsilon and min points parameters respectively. DBSCAN starts with an arbitrary starting point that has not been visited. This point's epsilon-neighborhood is retrieved, and if it contains sufficiently many points, a cluster is started. Otherwise, the point is labeled as noise. Note that this point might later be found in a sufficiently sized epsilon-neighborhood of a different point and hence be made part of a cluster.

If a point is found to be a dense part of a cluster, its epsilon-neighborhood is also part of that cluster. Hence, all points that are found within the epsilon-neighborhood are added, as is their own epsilon-neighborhood when they are also dense. This process continues until the density-connected cluster is completely found. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or noise.

If no id attribute is present, this operator will create one. The 'Cluster 0' assigned by the DBSCAN operator corresponds to points that are labeled as noise. These are the points that have fewer than min points points in their epsilon-neighborhood.
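The expansion procedure described above can be sketched in a few lines. The sketch below is illustrative, assuming Euclidean distance and plain lists of points; the function names are not RapidMiner's internal API. As in the operator, label 0 marks noise.

```python
def region_query(points, i, eps):
    # indices of all points within eps of points[i] (including i itself)
    return [j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

def dbscan(points, eps, min_pts):
    NOISE, UNVISITED = 0, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may later be absorbed as a border point
            continue
        cluster += 1                       # start a new cluster from a dense point
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point: reachable but not dense
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            jn = region_query(points, j, eps)
            if len(jn) >= min_pts:         # j is dense, so expand through it
                queue.extend(jn)
    return labels                          # 0 marks noise, as in the operator
```

Border points are added to the cluster but not expanded through, which is exactly the asymmetry of density-reachability described above.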

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be very useful in many different scenarios e.g. in a marketing application we may be interested in finding clusters of customers with similar buying behavior.


Input Ports example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports cluster model (clu) This port delivers the cluster model. It has information regarding the clustering performed. It tells which examples are part of which cluster. clustered set (clu) The ExampleSet that was given as input is passed with minor changes to the output through this port. An attribute with id role is added to the input ExampleSet to distinguish examples. An attribute with cluster role may also be added depending on the state of the add cluster attribute parameter.

Parameters epsilon (real) This parameter specifies the epsilon parameter of the DBSCAN algorithm. epsilon specifies the size of the neighborhood.

min points (integer) This parameter specifies the minimal number of points forming a cluster.

add cluster attribute (boolean) If this parameter is set to true, a new attribute with cluster role is generated in the resultant ExampleSet, otherwise this operator does not add the cluster attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.

add as label (boolean) If this parameter is set to true, the cluster id is stored in an attribute with the label role instead of the cluster role.

remove unlabeled (boolean) If this parameter is set to true, unlabeled examples are deleted from the ExampleSet.

measure types (selection) This parameter is used for selecting the type of measure to be used for measuring the distance between points. The following options are available: mixed measures, nominal measures, numerical measures and Bregman divergences.

mixed measure (selection) This parameter is available when the measure type parameter is set to 'mixed measures'. The only available option is 'Mixed Euclidean Distance'.

nominal measure (selection) This parameter is available when the measure type parameter is set to 'nominal measures'. This option cannot be applied if the input ExampleSet has numerical attributes. In this case the 'numerical measure' option should be selected.

numerical measure (selection) This parameter is available when the measure type parameter is set to 'numerical measures'. This option cannot be applied if the input ExampleSet has nominal attributes. If the input ExampleSet has nominal attributes the 'nominal measure' option should be selected.

divergence (selection) This parameter is available when the measure type parameter is set to 'bregman divergences'.

kernel type (selection) This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance'. The type of the kernel function is selected through this parameter. The following kernel types are supported:

ˆ dot The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.

ˆ radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma that is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

ˆ polynomial The polynomial kernel is defined by k(x,y) = (x*y+1)^d where d is the degree of the polynomial and is specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.

ˆ neural The neural kernel is defined by a two-layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

ˆ sigmoid This is the sigmoid kernel. Please note that the sigmoid kernel is not valid under some parameters.

ˆ anova This is the anova kernel. It has adjustable parameters gamma and degree.

ˆ epachnenikov The Epanechnikov kernel is the function (3/4)(1-u^2) for u between -1 and 1 and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.

ˆ gaussian combination This is the gaussian combination kernel. It has adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

ˆ multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the SVM kernel parameter gamma. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the SVM kernel parameter sigma1. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the SVM kernel parameter sigma2. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the SVM kernel parameter sigma3. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the SVM kernel parameter shift. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to multiquadric.

kernel degree (real) This is the SVM kernel parameter degree. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the SVM kernel parameter a. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.

kernel b (real) This is the SVM kernel parameter b. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.

Tutorial Processes

Clustering of the Ripley-Set data set by the DBSCAN operator

In many cases, no target attribute (i.e. label) can be defined and the data should be automatically grouped. This procedure is called Clustering. RapidMiner supports a wide range of clustering schemes which can be used in just the same way as any other learning scheme. This includes the combination with all preprocessing operators.

In this Example Process, the 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is only used for visualization and comparison, not for building the clusters themselves. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before application of the DBSCAN operator. Besides the label attribute the 'Ripley-Set' has two real attributes: 'att1' and 'att2'. The DBSCAN operator is applied on this data set with default values for all parameters except the epsilon parameter, which is set to 0.1. Run the process and you will see that two new attributes are created by the DBSCAN operator. The id attribute is created to distinguish examples clearly. The cluster attribute is created to show which cluster the examples belong to. Each example is assigned to a particular cluster. The examples in 'cluster 0' are considered as noise. Also note the Plot View of this data set. Switch to Plot View and set the Plotter to 'Scatter', x-Axis to 'att1', y-Axis to 'att2' and Color Column to 'cluster'. You can clearly see how the algorithm has created three separate groups (noise, i.e. cluster 0, is also visible separately). A cluster model is also delivered through the cluster model output port. It has information regarding the clustering performed. Under Folder View you can see the members of each cluster in folder format.

Expectation Maximization Clustering

This operator performs clustering using the Expectation Maximization algorithm. Clustering is concerned with grouping objects together that are similar to each other and dissimilar to the objects belonging to other clusters. But the Expectation Maximization algorithm extends this basic approach to clustering in some important ways.

Description

The general purpose of clustering is to detect clusters in examples and to assign those examples to the clusters. A typical application for this type of analysis is a marketing research study in which a number of consumer behavior related variables are measured for a large sample of respondents. The purpose of the study is to detect “market segments,” i.e., groups of respondents that are somehow more similar to each other (to all other members of the same cluster) when compared to respondents that belong to other clusters. In addition to identifying such clusters, it is usually equally of interest to determine how the clusters are different, i.e., determine the specific variables or dimensions that vary and how they vary in regard to members in different clusters.

The EM (expectation maximization) technique is similar to the K-Means technique. The basic operation of K-Means clustering algorithms is relatively simple: Given a fixed number of k clusters, assign observations to those clusters so that the means across clusters (for all variables) are as different from each other as possible. The EM algorithm extends this basic approach to clustering in two important ways:

ˆ Instead of assigning examples to clusters to maximize the differences in means for continuous variables, the EM clustering algorithm computes probabilities of cluster memberships based on one or more probability distributions. The goal of the clustering algorithm then is to maximize the overall probability or likelihood of the data, given the (final) clusters.

ˆ Unlike the classic implementation of k-means clustering, the general EM algorithm can be applied to both continuous and categorical variables (note that the classic k-means algorithm can also be modified to accommodate categorical variables).

Expectation Maximization algorithm

The basic approach and logic of this clustering method is as follows. Suppose you measure a single continuous variable in a large sample of observations. Further, suppose that the sample consists of two clusters of observations with different means (and perhaps different standard deviations); within each sample, the distribution of values for the continuous variable follows the normal distribution. The goal of EM clustering is to estimate the means and standard deviations for each cluster so as to maximize the likelihood of the observed data (distribution). Put another way, the EM algorithm attempts to approximate the observed distributions of values based on mixtures of different distributions in different clusters.


The results of EM clustering are different from those computed by k-means clustering. The latter will assign observations to clusters to maximize the distances between clusters. The EM algorithm does not compute actual assignments of observations to clusters, but classification probabilities. In other words, each observation belongs to each cluster with a certain probability. Of course, as a final result you can usually review an actual assignment of observations to clusters, based on the (largest) classification probability.
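The two-cluster, single-variable scenario described above can be sketched as a small EM loop. This is a minimal illustration: the half-split initialization, the fixed two clusters, and all names are assumptions, not RapidMiner's implementation.

```python
import math

def normal_pdf(x, mu, sigma):
    # density of the normal distribution N(mu, sigma)
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, steps=50):
    # crude initialization: split the sorted data in half (an assumption,
    # not the operator's initial-distribution strategy)
    s = sorted(xs)
    half = len(s) // 2
    mu = [sum(s[:half]) / half, sum(s[half:]) / (len(s) - half)]
    sigma = [1.0, 1.0]
    weight = [0.5, 0.5]
    for _ in range(steps):
        # E-step: membership probability of each point in each cluster
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            z = sum(p)
            resp.append([pk / z for pk in p])
        # M-step: re-estimate means, deviations, and mixing weights
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)   # floor to avoid collapse
            weight[k] = nk / len(xs)
    return mu, sigma, weight, resp
```

Each example receives a probability per cluster (the `resp` rows sum to 1); the final hard assignment, if needed, is simply the cluster with the largest probability.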

Differentiation k-Means The K-Means operator performs clustering using the k-means algorithm. k-means clustering is an exclusive clustering algorithm i.e. each object is assigned to precisely one of a set of clusters. Objects in one cluster are similar to each other. The similarity between objects is based on a measure of the distance between them. The K-Means operator assigns observations to clusters to maximize the distances between clusters. The Expectation Maximization Clustering operator, on the other hand, computes classification probabilities. See page 801 for details.

Input Ports example set (exa) The input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports cluster model (clu) This port delivers the cluster model which has information regarding the clustering performed. It has information about cluster probabilities and cluster means.

clustered set (clu) The ExampleSet that was given as input is passed with minor changes to the output through this port. An attribute with id role is added to the input ExampleSet to distinguish examples. An attribute with cluster role may also be added depending on the state of the add cluster attribute parameter. If the show probabilities parameter is set to true, one probability column is added for each cluster.

Parameters k (integer) This parameter specifies the number of clusters to form. There is no hard and fast rule for the number of clusters to form. But, generally it is preferred to have a small number of clusters with examples scattered (not too scattered) around them in a balanced way.

add cluster attribute (boolean) If enabled, a new attribute with cluster role is generated directly in this operator, otherwise this operator does not add the cluster attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.

add as label (boolean) If true, the cluster id is stored in an attribute with the label role instead of the cluster role.

remove unlabeled (boolean) If set to true, unlabeled examples are deleted.

max runs (integer) This parameter specifies the maximal number of runs of this operator to be performed with random initialization.

max optimization steps (integer) This parameter specifies the maximal number of iterations performed for one run of this operator.

quality (real) This parameter specifies the quality that must be fulfilled before the algorithm stops (i.e. the rise of the log-likelihood that must be undercut).

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomization.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

show probabilities (boolean) This parameter indicates if the probabilities for every cluster should be inserted with every example in the ExampleSet.

initial distribution (selection) This parameter indicates the initial distribution of the centroids.

correlated attributes (boolean) This parameter should be set to true if the ExampleSet contains correlated attributes.

Related Documents k-Means (801)

Tutorial Processes

Clustering of the Ripley-Set data set using the Expectation Maximization Clustering operator

The 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is only used for visualization and comparison and not for building the clusters itself. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before application of the Expectation Max- imization Clustering operator. Besides the label attribute the 'Ripley-Set' has two real attributes; 'att1' and 'att2'. The Expectation Maximization Clustering operator is applied on this data set with default values for all parameters. Run the process and you will see that a few new attributes are created by the Expecta- tion Maximization Clustering operator. The id attribute is created to distinguish examples clearly. The cluster attribute is created to show which cluster the ex- amples belong to. As parameter k was set to 2, only two clusters are possible. That is why each example is assigned to either 'cluster 0' or 'cluster 1'. Note that the Expectation Maximization Clustering operator has added probability attributes for each cluster that show the probability of an example to be part of that cluster. This operator assigns an example to the cluster with maximum probability. Also note the Plot View of this data. You can clearly see how the

algorithm has created two separate groups in the Plot View. A cluster model is also delivered through the cluster model output port. It has information regarding the clustering performed. It also has information about cluster probabilities and cluster means. Under Folder View you can see the members of each cluster in folder format.

Support Vector Clustering

This operator performs clustering with support vectors. Clustering is concerned with grouping objects together that are similar to each other and dissimilar to the objects belonging to other clusters. Clustering is a technique for extracting information from unlabeled data.

Description

This operator is an implementation of Support Vector Clustering based on Ben-Hur et al. (2001). In the Support Vector Clustering (SVC) algorithm data points are mapped from data space to a high-dimensional feature space using a Gaussian kernel. In feature space the smallest sphere that encloses the image of the data is searched. This sphere is mapped back to data space, where it forms a set of contours which enclose the data points. These contours are interpreted as cluster boundaries. Points enclosed by each separate contour are associated with the same cluster. As the width parameter of the Gaussian kernel is decreased, the number of disconnected contours in data space increases, leading to an increasing number of clusters. Since the contours can be interpreted as delineating the

support of the underlying probability distribution, this algorithm can be viewed as one identifying valleys in this probability distribution.

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be very useful in many different scenarios e.g. in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Generate Data operator in the attached Example Process.

Output Ports

cluster model (clu) This port delivers the cluster model. It has information regarding the clustering performed. It tells which examples are part of which cluster.

clustered set (clu) The ExampleSet that was given as input is passed with minor changes to the output through this port. An attribute with id role is added to the input ExampleSet to distinguish examples. An attribute with cluster role may also be added depending on the state of the add cluster attribute parameter.

Parameters

add cluster attribute (boolean) If this parameter is set to true, a new attribute with cluster role is generated in the resultant ExampleSet, otherwise this operator does not add the cluster attribute. In the latter case you have to use the Apply Model operator to generate the cluster attribute.

add as label (boolean) If this parameter is set to true, the cluster id is stored in an attribute with the label role instead of the cluster role.

remove unlabeled (boolean) If this parameter is set to true, unlabeled examples are deleted from the ExampleSet.

min pts (integer) This parameter specifies the minimal number of points in each cluster.

kernel type (selection) The type of the kernel function is selected through this parameter. The following kernel types are supported: dot, radial, polynomial, neural.

- dot: The dot kernel is defined by k(x, y) = x*y, i.e. it is the inner product of x and y.

- radial: The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma, specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

- polynomial: The polynomial kernel is defined by k(x, y) = (x*y + 1)^d where d is the degree of the polynomial, specified by the kernel degree parameter. The polynomial kernels are well suited for problems where all the training data is normalized.

- neural: The neural kernel is defined by a two-layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

kernel gamma (real) This is the SVM kernel parameter gamma. It is available only when the kernel type parameter is set to radial.

kernel degree (real) This is the SVM kernel parameter degree. It is available only when the kernel type parameter is set to polynomial.

kernel a (real) This is the SVM kernel parameter a. It is available only when the kernel type parameter is set to neural.

kernel b (real) This is the SVM kernel parameter b. It is available only when the kernel type parameter is set to neural.

kernel cache (real) This is an expert parameter. It specifies the size of the cache for kernel evaluations in megabytes.

convergence epsilon (real) This is an optimizer parameter. It specifies the precision on the KKT conditions.

max iterations (integer) This is an optimizer parameter. It specifies that the optimization stops after the specified number of iterations.

p (real) This parameter specifies the fraction of allowed outliers.

(real) If this parameter is set to -1, the calculated radius is used as the radius. Otherwise the value specified in this parameter is used as the radius.

number sample points (real) This parameter specifies the number of virtual sample points to check for neighborhood.
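The kernel formulas listed above can be sketched directly as small functions. This is an illustration of the formulas only, not RapidMiner's internals; the arguments gamma, degree, a and b mirror the kernel gamma, kernel degree, kernel a and kernel b parameters.

```python
import numpy as np

def dot_kernel(x, y):
    return np.dot(x, y)                            # k(x,y) = x*y

def radial_kernel(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))   # exp(-g ||x-y||^2)

def polynomial_kernel(x, y, degree=2.0):
    return (np.dot(x, y) + 1) ** degree            # (x*y + 1)^d

def neural_kernel(x, y, a=1.0, b=0.0):
    return np.tanh(a * np.dot(x, y) + b)           # tanh(a x*y + b)

x, y = np.array([1.0, 0.0]), np.array([1.0, 1.0])
value = radial_kernel(x, y, gamma=0.5)
```

Note how a larger gamma makes the radial kernel fall off faster with distance, which in SVC shrinks the contours and tends to produce more clusters.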

Tutorial Processes

Clustering of Ripley-Set data set by the Support Vector Clustering operator

In many cases, no target attribute (i.e. label) can be defined and the data should be automatically grouped. This procedure is called Clustering. RapidMiner supports a wide range of clustering schemes which can be used in just the same way as any other learning scheme. This includes the combination with all preprocessing operators.

In this Example Process, the Generate Data operator is used for generating an ExampleSet. Note that the label is generated too, but it is only used for visualization and comparison and not for building the clusters themselves. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before application of the clustering operator. Other than the label attribute the ExampleSet has two real attributes: 'att1' and 'att2'. The Support Vector Clustering operator is applied on this data set. Run the process and you will see that two new attributes are created by the Support Vector Clustering operator. The id attribute is created to distinguish examples clearly. The cluster attribute is created to show which cluster the examples belong to. Each example is assigned to a particular cluster.


The examples that are not in any cluster are considered as noise. Also note the Plot View of this data set. Switch to Plot View and set the Plotter to 'Scatter', x-Axis to 'att1', y-Axis to 'att2' and Color Column to 'cluster'. You can clearly see how the algorithm has created three separate clusters (noise is also visible separately). A cluster model is also delivered through the cluster model output port. It has information regarding the clustering performed. Under Folder View you can see the members of each cluster in folder format.

Agglomerative Clustering

This operator performs Agglomerative clustering, which is a bottom-up strategy of Hierarchical clustering. Three different strategies are supported by this operator: single-link, complete-link and average-link. The result of this operator is a hierarchical cluster model, providing distance information to plot a dendrogram.

Description

Agglomerative clustering is a strategy of hierarchical clustering. Hierarchical clustering (also known as Connectivity based clustering) is a method of cluster analysis which seeks to build a hierarchy of clusters. Hierarchical clustering is based on the core idea of objects being more related to nearby objects than to objects farther away. As such, these algorithms connect 'objects' (or examples, in case of an ExampleSet) to form clusters based on their distance. A cluster can be described largely by the maximum distance needed to connect parts of the cluster.


At different distances, different clusters will form, which can be represented using a dendrogram. This explains where the common name 'hierarchical clustering' comes from: these algorithms do not provide a single partitioning of the data set, but instead provide an extensive hierarchy of clusters that merge with each other at certain distances. In a dendrogram, the y-axis marks the distance at which the clusters merge, while the objects are placed along the x-axis so the clusters don't mix.

Strategies for hierarchical clustering generally fall into two types:

- Agglomerative: This is a bottom-up approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.

- Divisive: This is a top-down approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

Hierarchical clustering is a whole family of methods that differ by the way distances are computed. Apart from the usual choice of distance functions, the user also needs to decide on the linkage criterion to use: since a cluster consists of multiple objects, there are multiple candidates for computing the distance to. Popular choices are known as single-linkage clustering (the minimum of object distances), complete-linkage clustering (the maximum of object distances) or average-linkage clustering (also known as UPGMA, 'Unweighted Pair Group Method with Arithmetic Mean').

The algorithm forms clusters in a bottom-up manner, as follows:

1. Initially, put each example in its own cluster.

2. Among all current clusters, pick the two clusters with the smallest distance.

3. Replace these two clusters with a new cluster, formed by merging the two original ones.

4. Repeat the above two steps until there is only one remaining cluster in the pool.
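The four steps above can be sketched as a naive O(n^3) merge loop with a pluggable linkage criterion (single, complete or average link). Real implementations are considerably more efficient, but the merge logic is the same; this is an illustration, not RapidMiner's code.

```python
import itertools
import numpy as np

def linkage_distance(a, b, mode="single"):
    # all pairwise distances between members of the two clusters
    d = [np.linalg.norm(p - q) for p in a for q in b]
    return {"single": min, "complete": max,
            "average": lambda v: sum(v) / len(v)}[mode](d)

def agglomerate(points, mode="single"):
    clusters = [[p] for p in points]          # step 1: one cluster per example
    merges = []
    while len(clusters) > 1:                  # step 4: repeat until one cluster
        # step 2: pick the pair with the smallest linkage distance
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage_distance(clusters[ij[0]],
                                                   clusters[ij[1]], mode))
        # step 3: replace the two clusters with their merger
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append(len(merged))
    return merges                             # sizes of successive merges

pts = [np.array(v, float) for v in [(0, 0), (0, 1), (5, 5), (5, 6)]]
sizes = agglomerate(pts, mode="complete")     # -> [2, 2, 4]
```

The recorded merge sizes (and, in a full implementation, the merge distances) are exactly what a dendrogram plots.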

Clustering is concerned with grouping together objects that are similar to each

other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be very useful in many different scenarios e.g. in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

cluster model (clu) This port delivers the hierarchical cluster model. It has information regarding the clustering performed. It explains how clusters were merged to make a hierarchy of clusters.

example set (exa) The ExampleSet that was given as input is passed without any modification to the output through this port.

Parameters

mode (selection) This parameter specifies the cluster mode or the linkage criterion.

- SingleLink: In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance (or: the two clusters with the smallest minimum pairwise distance).

- CompleteLink: In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter (or: the two clusters with the smallest maximum pairwise distance).


- AverageLink: Average-link clustering is a compromise between the sensitivity of complete-link clustering to outliers and the tendency of single-link clustering to form long chains that do not correspond to the intuitive notion of clusters as compact, spherical objects.

measure types (selection) This parameter is used for selecting the type of measure to be used for measuring the distance between points. The following options are available: mixed measures, nominal measures, numerical measures and Bregman divergences.

mixed measure (selection) This parameter is available when the measure types parameter is set to 'mixed measures'. The only available option is 'Mixed Euclidean Distance'.

nominal measure (selection) This parameter is available when the measure types parameter is set to 'nominal measures'. This option cannot be applied if the input ExampleSet has numerical attributes. In this case the 'numerical measure' option should be selected.

numerical measure (selection) This parameter is available when the measure types parameter is set to 'numerical measures'. This option cannot be applied if the input ExampleSet has nominal attributes. If the input ExampleSet has nominal attributes the 'nominal measure' option should be selected.

divergence (selection) This parameter is available when the measure types parameter is set to 'Bregman divergences'.

kernel type (selection) This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance'. The type of the kernel function is selected through this parameter. The following kernel types are supported:

- dot: The dot kernel is defined by k(x, y) = x*y, i.e. it is the inner product of x and y.

- radial: The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma that is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

- polynomial: The polynomial kernel is defined by k(x, y) = (x*y + 1)^d where d is the degree of the polynomial and it is specified by the kernel degree parameter. The polynomial kernels are well suited for problems where all the training data is normalized.

- neural: The neural kernel is defined by a two-layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

- sigmoid: This is the sigmoid kernel. Please note that the sigmoid kernel is not valid under some parameters.

- anova: This is the anova kernel. It has adjustable parameters gamma and degree.

- epachnenikov: The Epanechnikov kernel is the function (3/4)(1 - u^2) for u between -1 and 1, and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.

- gaussian combination: This is the gaussian combination kernel. It has adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

- multiquadric: The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the SVM kernel parameter gamma. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the SVM kernel parameter sigma1. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the SVM kernel parameter sigma2. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the SVM kernel parameter sigma3. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the SVM kernel parameter shift. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to multiquadric.

kernel degree (real) This is the SVM kernel parameter degree. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the SVM kernel parameter a. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.

kernel b (real) This is the SVM kernel parameter b. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.

Tutorial Processes

Agglomerative Clustering of Ripley-Set data set

The 'Ripley-Set' data set is loaded using the Retrieve operator. A breakpoint is inserted at this step so that you can have a look at the ExampleSet. The Agglomerative Clustering operator is applied on this ExampleSet. Run the process and switch to the Results Workspace. Note the Graph View of the results. You can see that the algorithm has not created separate groups or clusters as other clustering algorithms (like k-means) do; instead the result is a hierarchy of clusters. Under the Folder View you can see the members of each cluster in folder format. You can see that it is a hierarchy of folders. The Dendrogram View shows the dendrogram for this clustering, which shows how single-element clusters were joined step by step to make a hierarchy of clusters.


Top Down Clustering

This operator performs top down clustering by applying the inner flat clustering scheme recursively. Top down clustering is a strategy of hierarchical clustering. The result of this operator is a hierarchical cluster model.

Description

This operator is a nested operator i.e. it has a subprocess. The subprocess must have a flat clustering operator e.g. the K-Means operator. This operator builds a hierarchical clustering model using the clustering operator provided in its subprocess. You need to have a basic understanding of subprocesses in order to apply this operator. Please study the documentation of the Subprocess operator for a basic understanding of subprocesses.

The basic idea of Top down clustering is that all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy. Top down clustering is a strategy of hierarchical clustering. Hierarchical clustering (also known as Connectivity based clustering) is a method of cluster analysis which seeks to build a hierarchy of clusters. Hierarchical clustering is based on the core idea of objects being more related to nearby objects than to objects farther

away. As such, these algorithms connect 'objects' (or examples, in case of an ExampleSet) to form clusters based on their distance. A cluster can be described largely by the maximum distance needed to connect parts of the cluster. At different distances, different clusters will form. These algorithms do not provide a single partitioning of the data set, but instead provide an extensive hierarchy of clusters that merge with each other at certain distances.

Strategies for hierarchical clustering generally fall into two types:

- Agglomerative: This is a bottom-up approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. This type of clustering is implemented in RapidMiner as the Agglomerative Clustering operator.

- Divisive: This is a top-down approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be very useful in many different scenarios e.g. in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process.

Output Ports

cluster model (clu) This port delivers the hierarchical cluster model. It has information regarding the clustering performed.

clustered set (clu) The ExampleSet that was given as input is passed with minor

changes to the output through this port. An attribute with id role is added to the input ExampleSet to distinguish examples. An attribute with cluster role may also be added depending on the state of the create cluster label parameter.

Parameters

create cluster label (boolean) This parameter specifies if a cluster label should be created. If this parameter is set to true, a new attribute with cluster role is generated in the resultant ExampleSet, otherwise this operator does not add the cluster attribute.

max depth (integer) This parameter specifies the maximal depth of the cluster tree.

max leaf size (integer) This parameter specifies the maximal number of items in each cluster leaf.
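A minimal sketch of the divisive idea, assuming a simple 2-means split as the inner flat clustering step. The max_depth and max_leaf_size arguments mirror the parameters above; this is an illustration under those assumptions, not RapidMiner's implementation.

```python
import numpy as np

def two_means(X, steps=20):
    # crude deterministic initialization; stands in for the inner flat clusterer
    centers = X[[0, -1]].astype(float)
    for _ in range(steps):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for k in (0, 1):
            if (assign == k).any():
                centers[k] = X[assign == k].mean(axis=0)
    return assign

def top_down(X, max_depth=5, max_leaf_size=2, depth=0, prefix=""):
    # stop splitting at "max depth" or when a node fits in "max leaf size"
    if depth >= max_depth or len(X) <= max_leaf_size:
        return {prefix or "root": len(X)}     # leaf: cluster label -> size
    assign = two_means(X)
    out = {}
    for k in (0, 1):                          # recurse into both children
        out.update(top_down(X[assign == k], max_depth, max_leaf_size,
                            depth + 1, f"{prefix}{k}"))
    return out

X = np.array([[0, 0], [0, 1], [8, 8], [8, 9], [9, 9]], float)
leaves = top_down(X)
```

The concatenated child indices in the leaf labels play the role of the hierarchy the operator's Folder View displays.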

Tutorial Processes

Top down clustering of Ripley-Set data set

The 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is only used for visualization and comparison and not for building the clusters themselves. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before application of the Top Down Clustering operator. Other than the label attribute the 'Ripley-Set' has two real attributes: 'att1' and 'att2'. The Top Down Clustering operator is applied on this data set. Run the process and you will see that two new attributes are created by the Top Down Clustering operator. The id attribute is created to distinguish examples clearly. The cluster attribute is created to show which cluster the examples belong to. Each example is assigned to a particular cluster. Note the Graph View of the results. You can see that the algorithm has not created separate groups

or clusters as other clustering algorithms (like k-means) do; instead the result is a hierarchy of clusters. Under the Folder View you can see the members of each cluster in folder format. You can see that it is a hierarchy of folders.

Extract Cluster Prototypes

This operator generates an ExampleSet consisting of the Cluster Prototypes from the Cluster Model. This operator is usually applied after clustering operators to store the Cluster Prototypes in the form of an ExampleSet.

Description

Most clustering algorithms like K-Means or K-Medoids cluster the data around some prototypical data vectors. For example, the K-Means algorithm uses the centroid of all examples of a cluster. The Extract Cluster Prototypes operator extracts these prototypes and stores them in an ExampleSet for further use. This operator expects a cluster model as input. The information about the cluster prototypes can be seen in the cluster models generated by most clustering operators, but the Extract Cluster Prototypes operator stores this information in the form of an ExampleSet so that it can be used easily.
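Conceptually, the operator turns the per-cluster prototype vectors of a cluster model into a plain table, one row per cluster. A sketch with centroids as the prototypes (illustrative only; a real cluster model carries the prototypes directly, here they are recomputed from the cluster assignments):

```python
import numpy as np

def extract_prototypes(X, labels):
    # one prototype (here: centroid) per cluster id, like a Centroid Table
    clusters = sorted(set(labels))
    return {c: X[np.array(labels) == c].mean(axis=0) for c in clusters}

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
labels = [0, 0, 1, 1]
protos = extract_prototypes(X, labels)   # cluster 0 -> [0, 1], cluster 1 -> [10, 11]
```

The resulting table can then be processed like any other ExampleSet, which is exactly the point of the operator.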

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. Clustering is a technique for extracting information from unlabeled data. Clustering can be very useful in many different scenarios e.g. in a marketing application we may

be interested in finding clusters of customers with similar buying behavior.

Differentiation

k-Medoids The K-Medoids operator performs the clustering and generates a cluster model and a clustered ExampleSet. The cluster model generated by the K-Medoids operator can be used by the Extract Cluster Prototypes operator to store the Centroid Table in the form of an ExampleSet. See page 810 for details.

k-Means The K-Means operator performs the clustering and generates a cluster model and a clustered ExampleSet. The cluster model generated by the K-Means operator can be used by the Extract Cluster Prototypes operator to store the Centroid Table in the form of an ExampleSet. See page 801 for details.

Input Ports

model (mod) This port expects a cluster model. It has information regarding the clustering performed by a clustering operator. It tells which examples are part of which cluster. It also has information regarding the centroids of each cluster.

Output Ports

example set (exa) The ExampleSet consisting of the Cluster Prototypes is generated from the input Cluster Model and delivered through this port.

model (mod) The cluster model that was given as input is passed without any changes to the output through this port.


Related Documents

k-Medoids (810)
k-Means (801)

Tutorial Processes

Extracting Centroid Table after application of the K-Means operator on Ripley-Set

In many cases, no target attribute (i.e. label) can be defined and the data should be automatically grouped. This procedure is called Clustering. RapidMiner supports a wide range of clustering schemes which can be used in just the same way as any other learning scheme. This includes the combination with all preprocessing operators.

In this Example Process, the 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is only used for visualization and comparison and not for building the clusters themselves. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before application of the K-Means operator. Other than the label attribute the 'Ripley-Set' has two real attributes: 'att1' and 'att2'. The K-Means operator is applied on this data set with default values for all parameters. Run the process and you will see that two new attributes are created by the K-Means operator. The id attribute is created to distinguish examples clearly. The cluster attribute is created to show which cluster the examples belong to. As parameter k was set to 2, only two clusters are possible. That is why each example is assigned to either 'cluster 0' or 'cluster 1'. A cluster model is delivered through the cluster model output port. It has information regarding the clustering performed. Under Folder View you can see the members of each cluster in folder format. You can see information regarding centroids under the Centroid Table and Centroid Plot View tabs. A

breakpoint is inserted at this step so that you can have a look at the cluster model (especially the Centroid Table) before application of the Extract Cluster Prototypes operator. The Extract Cluster Prototypes operator is then applied on the cluster model generated by the K-Means operator; it stores the Centroid Table in the form of an ExampleSet which can be seen in the Results Workspace.

FP-Growth

This operator efficiently calculates all frequent itemsets from the given ExampleSet using the FP-tree data structure. All attributes of the input ExampleSet must be binominal.

Description

In simple words, frequent itemsets are groups of items that often appear together in the data. It is important to know the basics of market-basket analysis for understanding frequent itemsets.

The market-basket model of data is used to describe a common form of a many-to-many relationship between two kinds of objects. On the one hand, we have items, and on the other we have baskets, also called 'transactions'. The set of items is

usually represented as a set of attributes. Mostly these attributes are binominal. The transactions are usually represented as the examples of the ExampleSet. When an attribute value is 'true' in an example, it implies that the corresponding item is present in that transaction. Each transaction consists of a set of items (an itemset). Usually it is assumed that the number of items in a transaction is small, much smaller than the total number of items, i.e. in most of the examples most of the attribute values are 'false'. The number of transactions is usually assumed to be very large, i.e. the number of examples in the ExampleSet is assumed to be large. The frequent-itemsets problem is that of finding sets of items that appear together in at least a threshold ratio of transactions. This threshold is defined by the 'minimum support' criterion. The support of an itemset is the number of times that itemset appears in the ExampleSet divided by the total number of examples. The 'Transactions' data set at 'Samples/data/Transactions' in the repository of RapidMiner is an example of how transaction data usually looks.
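The support definition above can be illustrated with a few toy transactions (the item names are hypothetical):

```python
# transactions as sets of items; in RapidMiner these would be the 'true'
# binominal attribute values of each example
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    # support = fraction of transactions containing every item of the itemset
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)

bread_milk = support({"bread", "milk"}, transactions)  # 2 of 4 -> 0.5
```

An itemset is frequent exactly when this value reaches the min support threshold.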

The discovery of frequent itemsets is often viewed as the discovery of 'association rules', although the latter is a more complex characterization of data, whose discovery depends fundamentally on the discovery of frequent itemsets. Association rules are derived from the frequent itemsets. The FP-Growth operator finds the frequent itemsets and operators like the Create Association Rules operator use these frequent itemsets for calculating the association rules.

This operator calculates all frequent itemsets from an ExampleSet by building an FP-tree data structure on the transaction database. This is a very compressed copy of the data which in many cases fits into main memory even for large databases. All frequent itemsets are derived from this FP-tree. Many other frequent itemset mining algorithms also exist, e.g. the Apriori algorithm. A major advantage of FP-Growth compared to Apriori is that it uses only 2 data scans and is therefore often applicable even on large data sets.

Please note that the given ExampleSet should contain only binominal attributes, i.e. nominal attributes with only two different values. If your ExampleSet does not satisfy this condition, you may use appropriate preprocessing operators to transform it into the required form. The discretization operators can be used for converting numerical attributes to nominal attributes. Then the


Nominal to Binominal operator can be used for transforming nominal attributes into binominal attributes.

Please note that the frequent itemsets are mined for the positive entries in your ExampleSet, i.e. for those nominal values which are defined as positive in your ExampleSet. If your data does not specify the positive entries correctly, you may set them using the positive value parameter. This only works if all your attributes contain this value.

This operator has two basic working modes:

- finding at least the specified number of itemsets with the highest support without taking the 'min support' into account. This mode is available when the find min number of itemsets parameter is set to true. The operator then finds the number of itemsets specified in the min number of itemsets parameter. The min support parameter is ignored in this case.

- finding all itemsets with a support larger than the specified minimum support. The minimum support is specified through the min support parameter. This mode is available when the find min number of itemsets parameter is set to false.

Input Ports

example set (exa) This input port expects an ExampleSet. Please make sure that all attributes of the ExampleSet are binominal.

Output Ports

example set (exa) The ExampleSet that was given as input is passed without changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

frequent sets (fre) The frequent itemsets are delivered through this port. Operators like the Create Association Rules operator can use these frequent itemsets to generate association rules.

Parameters

find min number of itemsets (boolean) If this parameter is set to true, this operator finds at least the specified number of itemsets with the highest support, without taking the min support parameter into account. This operator finds (at least) the number of itemsets specified in the min number of itemsets parameter. The min support parameter is ignored to some extent in this case: the minimal support is decreased automatically until the specified minimum number of frequent itemsets is found. The defined minimal support is lowered by 20 percent each time.

min number of itemsets (integer) This parameter is only available when the find min number of itemsets parameter is set to true. It specifies the minimum number of itemsets which should be mined.

max number of retries (integer) This parameter is only available when the find min number of itemsets parameter is set to true. It determines how many times the operator should lower the minimal support to find the minimal number of itemsets. Each time the minimal support is lowered by 20 percent.

positive value (string) This parameter determines which value of the binominal attributes should be treated as positive. The attributes with that value are considered as part of a transaction. If left blank, the ExampleSet determines which value is used.

min support (real) The minimum support criterion is specified by this parameter. Please study the description of this operator for more information about minimum support.

max items (integer) This parameter specifies the upper bound for the length of the itemsets, i.e. the maximum number of items in an itemset. If set to -1, this parameter imposes no upper bound.

must contain This parameter specifies the items that should be part of frequent itemsets. It is specified through a regular expression. If there is no specific item that you want to have in the frequent itemset, you can leave this blank.

Tutorial Processes

Introduction to the FP-Growth operator

The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. As you can see, the ExampleSet has real attributes. Thus the FP-Growth operator cannot be applied on it directly because the FP-Growth operator requires all attributes to be binominal. We have to do some preprocessing to mold the ExampleSet into the desired form. The Discretize by Frequency operator is applied to change the real attributes to nominal attributes. Then the Nominal to Binominal operator is applied to change these nominal attributes to binominal attributes. Finally, the FP-Growth operator is applied to generate frequent itemsets. The frequent itemsets can be viewed in the Results Workspace. Run this process with different values for different parameters to get a better understanding of this operator.

848 7.4. Association and Item Set Mining

Create Association Rules

This operator generates a set of association rules from the given set of frequent itemsets.

Description

Association rules are if/then statements that help uncover relationships between seemingly unrelated data. An example of an association rule would be "If a customer buys eggs, he is 80% likely to also purchase milk." An association rule has two parts, an antecedent (if) and a consequent (then). An antecedent is an item (or itemset) found in the data. A consequent is an item (or itemset) that is found in combination with the antecedent.

Association rules are created by analyzing data for frequent if/then patterns and using the criteria support and confidence to identify the most important relationships. Support is an indication of how frequently the items appear in the database. Confidence indicates how often the if/then statements have been found to be true. The frequent if/then patterns are mined using operators like the FP-Growth operator. The Create Association Rules operator takes these frequent itemsets and generates association rules.
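Both criteria can be computed directly from a list of transactions. A minimal sketch over invented basket data (the eggs/milk numbers here are illustrative, not from any real data set):

```python
# Toy market-basket transactions (invented data for illustration)
transactions = [
    {"eggs", "milk"},
    {"eggs", "milk", "bread"},
    {"eggs", "bread"},
    {"milk"},
    {"eggs", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of Pr(consequent | antecedent): supp(X and Y) / supp(X)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

supp_eggs = support({"eggs"}, transactions)               # 4/5 = 0.8
conf_rule = confidence({"eggs"}, {"milk"}, transactions)  # 0.6 / 0.8 = 0.75
```

With this toy data the rule {eggs} implies {milk} holds in three of the four transactions containing eggs, giving a confidence of 0.75.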

Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placements. In addition to the above example from market basket analysis, association rules are employed today in many application areas including Web usage mining, intrusion detection and bioinformatics.

Input Ports

item sets (ite) This input port expects frequent itemsets. Operators like the FP-Growth operator can be used for providing these frequent itemsets.


Output Ports

item sets (ite) The itemsets that were given as input are passed without change to the output through this port. This is usually used to reuse the same itemsets in further operators or to view the itemsets in the Results Workspace.

rules (rul) The association rules are delivered through this output port.

Parameters

criterion (selection) This parameter specifies the criterion which is used for the selection of rules.

• confidence The confidence of a rule is defined as conf(X implies Y) = supp(X ∪ Y)/supp(X). Be careful when reading the expression: here supp(X ∪ Y) means "support for occurrences of transactions where X and Y both appear", not "support for occurrences of transactions where either X or Y appears". Confidence ranges from 0 to 1. Confidence is an estimate of Pr(Y | X), the probability of observing Y given X. The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset.

• lift The lift of a rule is defined as lift(X implies Y) = supp(X ∪ Y)/(supp(X) x supp(Y)), i.e. the ratio of the observed support to that expected if X and Y were independent. Lift can also be defined as lift(X implies Y) = conf(X implies Y)/supp(Y). Lift measures how far X and Y are from independence. It ranges from 0 to positive infinity. Values close to 1 imply that X and Y are independent and the rule is not interesting.

• conviction Conviction is sensitive to rule direction, i.e. conv(X implies Y) is not the same as conv(Y implies X). Conviction is somewhat inspired by the logical definition of implication and attempts to measure the degree of implication of a rule. Conviction is defined as conv(X implies Y) = (1 - supp(Y))/(1 - conf(X implies Y)).


• gain When this option is selected, the gain is calculated using the gain theta parameter.

• laplace When this option is selected, the Laplace value is calculated using the laplace k parameter.

• ps When this option is selected, the ps criterion is used for rule selection.

min confidence (real) This parameter specifies the minimum confidence of the rules.

min criterion value (real) This parameter specifies the minimum value of the rules for the selected criterion.

gain theta (real) This parameter specifies the parameter Theta which is used in the Gain calculation.

laplace k (real) This parameter specifies the parameter k which is used in the Laplace function calculation.
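Given precomputed support and confidence values, the lift and conviction criteria above reduce to simple arithmetic. A sketch with hypothetical support values:

```python
def lift(supp_x, supp_y, supp_xy):
    """lift(X implies Y) = supp(X u Y) / (supp(X) * supp(Y)); 1 means independence."""
    return supp_xy / (supp_x * supp_y)

def conviction(supp_y, conf_xy):
    """conv(X implies Y) = (1 - supp(Y)) / (1 - conf(X implies Y))."""
    return (1 - supp_y) / (1 - conf_xy)

# hypothetical support values: supp(X) = 0.8, supp(Y) = 0.8, supp(X u Y) = 0.6
conf_xy = 0.6 / 0.8                   # conf(X implies Y) = 0.75
rule_lift = lift(0.8, 0.8, 0.6)       # 0.9375, slightly below independence
rule_conv = conviction(0.8, conf_xy)  # 0.8
```

A lift just under 1 and a conviction below 1 both indicate that this hypothetical rule adds little over what independence would predict.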

Tutorial Processes

Introduction to the Create Association Rules operator

The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. As you can see, the ExampleSet has real attributes. Thus the FP-Growth operator cannot be applied on it directly because the FP-Growth operator requires all attributes to be binominal. We have to do some preprocessing to mold the ExampleSet into the desired form. The Discretize by Frequency operator is applied to change the real attributes to nominal attributes. Then the Nominal to Binominal operator is applied to change these nominal attributes to binominal attributes. Finally, the FP-Growth operator is applied to generate frequent itemsets. The frequent itemsets generated from the FP-Growth operator are provided to the Create Association Rules operator. The resultant association rules can be viewed in the Results Workspace. Run this process with different values for different parameters to get a better understanding of this operator.

Generalized Sequential Patterns

This operator searches sequential patterns in a set of transactions using the GSP (Generalized Sequential Pattern) algorithm. GSP is a popular algorithm used for sequence mining.

Description

This operator searches sequential patterns in a set of transactions. The ExampleSet must contain one attribute for the time and one attribute for the customer. Moreover, each transaction must be encoded as a single example. The time and customer attributes are specified through the time attribute and customer id parameters respectively. This pair of attributes is used for generating one sequence per customer containing every transaction ordered by the time of each transaction. The algorithm then searches sequential patterns of the form: if a customer bought item 'a' and item 'c' in one transaction, he bought item 'b' in the next. This pattern is represented in this form: <a, c> then <b>. The minimal support describes how many customers must support such a pattern for it to be regarded as frequent. Infrequent patterns will be dropped. A customer supports such a pattern if some part of his sequence includes that pattern. The above pattern would be supported, for example, by a customer with transactions: <s, g> then <a, s, c> then <b> then <f, h>. The minimum support criterion is specified through the min support parameter.

The min gap, max gap and window size parameters determine how transactions are handled. For example, if the above customer forgot to buy item 'c', and had to return 5 minutes later to buy it, then his transactions would look like: <s, g> then <a, s> then <c> then <b> then <f, h>. This would not support the pattern <a, c> then <b>. To avoid this problem, the window size determines how long a subsequent transaction is treated as the same transaction. If the window size is larger than 5 minutes, then <c> would be treated as being part of the second transaction and hence this customer would support the above pattern. The max gap parameter causes a customer's sequence not to support a pattern if the transactions containing this pattern are too widely separated in time. The min gap parameter does the same if they are too close together in time.
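The effect of the window size can be sketched as a merge step over one customer's time-stamped transactions. This is a simplified illustration of the idea, not the actual GSP implementation:

```python
def merge_by_window(transactions, window_size):
    """Merge consecutive (time, items) transactions of one customer whenever
    a transaction falls within window_size of the previously kept one."""
    merged = []
    for time, items in sorted(transactions, key=lambda t: t[0]):
        if merged and time - merged[-1][0] <= window_size:
            merged[-1] = (time, merged[-1][1] | set(items))
        else:
            merged.append((time, set(items)))
    return [items for _, items in merged]

# the forgetful customer: item 'c' was bought 5 minutes after <a, s>
txns = [(0, {"s", "g"}), (10, {"a", "s"}), (15, {"c"}), (30, {"b"})]
merged = merge_by_window(txns, 5)    # <c> is absorbed into the second transaction
separate = merge_by_window(txns, 0)  # every transaction stays on its own
```

With a window size of 5 the customer's sequence becomes <s, g> then <a, s, c> then <b>, which does support the pattern <a, c> then <b>.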

This technique overcomes some crucial drawbacks of existing mining methods, for example:

• absence of time constraints: This drawback is overcome by the min gap and max gap parameters.

• rigid definition of a transaction: This drawback is overcome by the sliding time window.

Please note that all attributes (except the customer and time attributes) of the given ExampleSet should be binominal, i.e. nominal attributes with only two possible values. If your ExampleSet does not satisfy this condition, you may use appropriate preprocessing operators to transform it into the required form. The discretization operators can be used for changing numerical attributes to nominal attributes. Then the Nominal to Binominal operator can be used for transforming nominal attributes into binominal attributes.

Please note that the sequential patterns are mined for the positive entries in your ExampleSet, i.e. for those nominal values which are defined as positive in your ExampleSet. If your data does not specify the positive entries correctly, you may set them using the positive value parameter. This only works if all your attributes contain this value.

Input Ports

example set (exa) This input port expects an ExampleSet. Please make sure that all attributes (except the customer and time attributes) of the ExampleSet are binominal.

Output Ports

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

patterns (pat) The GSP algorithm is applied on the given ExampleSet and the resultant set of sequential patterns is delivered through this port.

Parameters

customer id (string) This parameter specifies the name of the attribute that will be used for identifying the customers.

time attribute (string) This parameter specifies the name of the numerical attribute that specifies the time of a transaction.

min support (real) The minimum support criterion is specified by this parameter. Please study the description of this operator for more information about minimum support.

window size (real) This parameter specifies the window size. Please study the description of this operator for more information about window size.

max gap (real) This parameter specifies the maximal gap. The max gap parameter causes a customer's sequence not to support a pattern if the transactions containing this pattern are too widely separated in time.

min gap (real) This parameter specifies the minimal gap. The min gap parameter causes a customer's sequence not to support a pattern if the transactions containing this pattern are too close together in time.

positive value (string) This parameter determines which value of the binominal attributes should be treated as positive. The attributes with this value in an example are considered to be part of that transaction.

Tutorial Processes

Introduction to the GSP operator

The ExampleSet expected by the GSP operator should meet the following criteria:

• It should have an attribute that can be used for identifying the customers.

• It should have a numerical attribute that represents the time of the transaction.

• All other attributes are used for representing items of transactions. These attributes should be binominal.

This Example Process starts with the Subprocess operator. A sequence of operators is applied in the subprocess to generate an ExampleSet that satisfies all the above mentioned conditions. A breakpoint is inserted after the Subprocess operator so that you can have a look at the ExampleSet. The Customer attribute represents the customers; this ExampleSet has five. The Time attribute represents the time of transaction. For simplicity all transactions have been given the same time. This will not be the case in real scenarios. There are 20 binominal attributes in the ExampleSet that represent items that the customer may buy in a transaction. In this ExampleSet, the value 'true' for an item in an example means that this item was bought in this transaction (represented by the current example). The GSP operator is applied on this ExampleSet. The customer id and time attribute parameters are set to 'Customer' and 'Time' respectively. The positive value parameter is set to 'true'. The min support parameter is set to 0.9. The resultant set of sequential patterns can be seen in the Results Workspace.

Correlation Matrix

This operator determines correlation between all attributes and it can produce a weights vector based on these correlations. Correlation is a statistical technique that can show whether and how strongly pairs of attributes are related.

Description

A correlation is a number between -1 and +1 that measures the degree of association between two attributes (call them X and Y). A positive value for the correlation implies a positive association. In this case large values of X tend to be associated with large values of Y and small values of X tend to be associated with small values of Y. A negative value for the correlation implies a negative or inverse association. In this case large values of X tend to be associated with small values of Y and vice versa.

856 7.5. Correlation and Dependency Computation

Suppose we have two attributes X and Y, with means X' and Y' respectively and standard deviations S(X) and S(Y) respectively. The correlation is computed as the summation from 1 to n of the product (X(i)-X').(Y(i)-Y'), divided by the product (n-1).S(X).S(Y), where n is the total number of examples and i is the increment variable of the summation. There can be other formulas and definitions but let us stick to this one for simplicity.
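This formula translates directly into code. A small sketch of the sample correlation as defined above:

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation following the formula above:
    sum((x_i - mean_x) * (y_i - mean_y)) / ((n - 1) * s_x * s_y),
    with s_x and s_y the sample standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

r_pos = correlation([1, 2, 3, 4], [2, 4, 6, 8])  # exactly +1: perfect positive association
r_neg = correlation([1, 2, 3, 4], [8, 6, 4, 2])  # exactly -1: perfect negative association
```

The two toy calls show the extreme cases: a perfectly linear positive relationship yields +1, a perfectly linear inverse relationship yields -1.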

As discussed earlier a positive value for the correlation implies a positive association. Suppose that an X value was above average, and that the associated Y value was also above average. Then the product (X(i)-X').(Y(i)-Y') would be the product of two positive numbers, which would be positive. If the X value and the Y value were both below average, then the product above would be of two negative numbers, which would also be positive. Therefore, a positive correlation is evidence of a general tendency that large values of X are associated with large values of Y and small values of X are associated with small values of Y.

As discussed earlier a negative value for the correlation implies a negative or inverse association. Suppose that an X value was above average, and that the associated Y value was instead below average. Then the product (X(i)-X').(Y(i)-Y') would be the product of a positive and a negative number, which would make the product negative. If the X value was below average and the Y value was above average, then the product above would also be negative. Therefore, a negative correlation is evidence of a general tendency that large values of X are associated with small values of Y and small values of X are associated with large values of Y.

This operator can be used for creating a correlation matrix that shows correlations of all the attributes of the input ExampleSet. Please note that this operator performs a data scan for each attribute combination and might therefore take some time for ExampleSets that are not held in main memory. The attribute weights vector, based on the correlations, can also be returned by this operator. Using this weights vector, highly correlated attributes can be removed from the ExampleSet with the help of the Select by Weights operator. Highly correlated attributes can be more easily removed by simply using the Remove Correlated Attributes operator. Correlated attributes are usually removed because they are similar in behavior and will have similar impact in prediction calculations, so keeping attributes with similar impacts is redundant. Removing correlated attributes saves storage space and the computation time of complex algorithms.

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

matrix (mat) The correlations of all attributes of the input ExampleSet are calculated and the resultant correlation matrix is returned from this port.

weights (wei) The attribute weights vector based on the correlations of the attributes is delivered through this output port.

Parameters

normalize weights (boolean) This parameter indicates if the weights of the resultant attribute weights vector should be normalized. If set to true, all weights are normalized such that the minimum weight is 0 and the maximum weight is 1.

squared correlation (boolean) This parameter indicates if the squared correlation should be calculated. If set to true, the correlation matrix shows squares of correlations instead of simple correlations.
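The normalize weights behavior described above is ordinary min-max scaling. A sketch with hypothetical weight values (the attribute names and numbers are invented for illustration):

```python
def normalize_weights(weights):
    """Min-max normalize attribute weights: the smallest weight
    becomes 0 and the largest becomes 1."""
    lo, hi = min(weights.values()), max(weights.values())
    return {name: (w - lo) / (hi - lo) for name, w in weights.items()}

# hypothetical correlation-based weights for four attributes
normalized = normalize_weights(
    {"Outlook": 0.2, "Temperature": 0.2, "Humidity": 0.6, "Wind": 1.0}
)
```

After normalization the smallest weights map to 0 and the largest to 1, which is what makes a fixed threshold in a downstream Select by Weights step meaningful.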

858 7.6. Similarity Computation

Tutorial Processes

Correlation matrix of the Golf data set

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can view the ExampleSet. As you can see, the ExampleSet has 4 regular attributes i.e. 'Outlook', 'Temperature', 'Humidity' and 'Wind'. The Correlation Matrix operator is applied on it. The weights vector generated by this operator is provided to the Select by Weights operator along with the 'Golf' data set. The parameters of the Select by Weights operator are adjusted such that the attributes with weights greater than 0.5 are selected and all other attributes are removed. This is why the resultant ExampleSet does not have the 'Temperature' attribute (weight=0). The correlation matrix, weights vector and the resultant ExampleSet can be viewed in the Results Workspace.


Data to Similarity

This operator measures the similarity of each example of the given ExampleSet with every other example of the same ExampleSet.

Description

The Data to Similarity operator calculates the similarity among examples of an ExampleSet. The same comparison is not repeated twice, e.g. if example x is compared with example y to compute similarity, then example y will not be compared again with example x because the result will be the same. Thus if there are n examples in the ExampleSet, this operator does not return n^2 similarity comparisons. Instead it returns n(n-1)/2 similarity comparisons. This operator provides many different measures for similarity computation. The measure to use for calculating the similarity can be specified through the parameters. Four types of measures are provided: mixed measures, nominal measures, numerical measures and Bregman divergences.
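The n(n-1)/2 counting argument can be illustrated with a small sketch (the negative Euclidean distance used here is just a toy similarity measure, not one of the operator's built-in measures):

```python
from itertools import combinations

def pairwise_similarities(examples, measure):
    """Compare each example with every other example exactly once:
    n examples yield n*(n-1)/2 comparisons, not n^2."""
    return {(i, j): measure(examples[i], examples[j])
            for i, j in combinations(range(len(examples)), 2)}

# toy measure: negative Euclidean distance (larger means more similar)
examples = [(0, 0), (3, 4), (6, 8)]
sims = pairwise_similarities(
    examples,
    lambda a, b: -sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5,
)
```

Three examples yield exactly 3 · 2 / 2 = 3 comparisons; for the 14 examples of the Golf data set in the tutorial below the same formula gives 91.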

The behavior of this operator can be considered close to a certain scenario of the Cross Distances operator: if the same ExampleSet is provided at both inputs of the Cross Distances operator and the compute similarities parameter is set to true, the Cross Distances operator behaves similarly to the Data to Similarity operator. There are a few differences though, e.g. in this scenario examples are also compared with themselves and, secondly, the signs (i.e. positive or negative) of the results are also different.

Differentiation

Data to Similarity Data The Data to Similarity Data operator calculates the similarity among all examples of an ExampleSet. Examples are even compared with themselves. Thus if there are n examples in the ExampleSet, this operator returns n^2 similarity comparisons. The Data to Similarity Data operator returns an ExampleSet which is merely a view, so there should be no memory problems. See page ?? for details.

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

similarity (sim) A similarity measure object that contains the calculated similarity between each example of the given ExampleSet with every other example of the same ExampleSet is delivered through this port.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

measure types (selection) This parameter is used for selecting the type of measure to be used for calculating similarity. The following options are available: mixed measures, nominal measures, numerical measures and Bregman divergences.

mixed measure (selection) This parameter is available if the measure type parameter is set to 'mixed measures'. The only available option is 'Mixed Euclidean Distance'.

nominal measure (selection) This parameter is available if the measure type parameter is set to 'nominal measures'. This option cannot be applied if the input ExampleSet has numerical attributes. In this case the 'numerical measure' option should be selected.

numerical measure (selection) This parameter is available if the measure type parameter is set to 'numerical measures'. This option cannot be applied if the input ExampleSet has nominal attributes. In this case the 'nominal measure' option should be selected.

divergence (selection) This parameter is available if the measure type parameter is set to 'bregman divergences'.

kernel type (selection) This parameter is only available if the numerical measure parameter is set to 'Kernel Euclidean Distance'. The type of the kernel function is selected through this parameter. The following kernel types are supported:

• dot The dot kernel is defined by k(x,y) = x*y, i.e. it is the inner product of x and y.

• radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma that is specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel, and should be carefully tuned to the problem at hand.

• polynomial The polynomial kernel is defined by k(x,y) = (x*y+1)^d where d is the degree of the polynomial and it is specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.

• neural The neural kernel is defined by a two-layered neural net tanh(a x*y + b) where a is alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

• sigmoid This is the sigmoid kernel. Please note that the sigmoid kernel is not valid under some parameters.

• anova This is the anova kernel. It has the adjustable parameters gamma and degree.

• epachnenikov The Epanechnikov kernel is the function (3/4)(1-u^2) for u between -1 and 1 and zero for u outside that range. It has the two adjustable parameters kernel sigma1 and kernel degree.

• gaussian combination This is the gaussian combination kernel. It has the adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

• multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has the adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the SVM kernel parameter gamma. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the SVM kernel parameter sigma1. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the SVM kernel parameter sigma2. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the SVM kernel parameter sigma3. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the SVM kernel parameter shift. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to multiquadric.

kernel degree (real) This is the SVM kernel parameter degree. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the SVM kernel parameter a. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.

kernel b (real) This is the SVM kernel parameter b. This parameter is only available when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.
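A few of the kernels listed above are easy to write down directly. The distance in the last function uses the standard kernel-induced identity d(x,y) = sqrt(k(x,x) - 2k(x,y) + k(y,y)); this identity is general kernel theory, not something specific to RapidMiner:

```python
from math import exp, sqrt

def dot_kernel(x, y):
    """dot: k(x,y) = x*y, the inner product of x and y."""
    return sum(a * b for a, b in zip(x, y))

def radial_kernel(x, y, gamma):
    """radial: k(x,y) = exp(-g * ||x-y||^2)."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def polynomial_kernel(x, y, degree):
    """polynomial: k(x,y) = (x*y + 1)^d."""
    return (dot_kernel(x, y) + 1) ** degree

def kernel_distance(k, x, y):
    """Distance induced by a kernel: sqrt(k(x,x) - 2*k(x,y) + k(y,y))."""
    return sqrt(k(x, x) - 2 * k(x, y) + k(y, y))

# with the plain dot kernel this reduces to the ordinary Euclidean distance
d = kernel_distance(dot_kernel, (0, 0), (3, 4))  # 5.0
```

This is why the measure is called a 'Kernel Euclidean Distance': swapping the kernel changes the feature space in which the Euclidean distance is taken.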

Related Documents

Data to Similarity Data (??)

Tutorial Processes

Introduction to the Data to Similarity operator

The 'Golf' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has 14 examples. The Data to Similarity operator is applied on it to compute the similarity of examples. As there are 14 examples in the given ExampleSet, there will be 91 (i.e. (14)(14-1)/2) similarity comparisons in the resultant similarity measure object, which can be seen in the Results Workspace.


Cross Distances

This operator calculates the distance between each example of a 'request set' ExampleSet and each example of a 'reference set' ExampleSet. This operator is also capable of calculating similarity instead of distance.

Description

The Cross Distances operator takes two ExampleSets as input, i.e. the 'reference set' and 'request set' ExampleSets. It creates an ExampleSet that contains the distance between each example of the 'request set' ExampleSet and each example of the 'reference set' ExampleSet. Please note that both input ExampleSets should have the same attributes, in the same order. This operator will not work properly if the order of the attributes is different. This operator is also capable of calculating similarity instead of distance. If the compute similarities parameter is set to true, similarities are calculated instead of distances. Please note that both input ExampleSets should have id attributes. If id attributes are not present, this operator automatically creates id attributes for such ExampleSets. The measure to use for calculating the distances can be specified through the parameters. Four types of measures are provided: mixed measures, nominal measures, numerical measures and Bregman divergences.

If data is imported from two different sources that are supposed to represent the same data but which have columns in different orders, the Cross Distances operator will not behave as expected. It is possible to work around this by using the Generate Attributes operator to recreate attributes in both ExampleSets in the same order.
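The basic computation can be sketched as a nested loop over the two ExampleSets, assuming the plain Euclidean numerical measure:

```python
def cross_distances(request_set, reference_set):
    """Euclidean distance from every 'request set' example to every
    'reference set' example; attributes must be in the same order."""
    return [
        [sum((a - b) ** 2 for a, b in zip(req, ref)) ** 0.5
         for ref in reference_set]
        for req in request_set
    ]

request = [(0, 0), (1, 1)]
reference = [(3, 4), (0, 0)]
distances = cross_distances(request, reference)  # a 2 x 2 table of distances
```

Two request examples against two reference examples yield a 2 x 2 table, one row per request example; the attribute-order caveat above corresponds to the positional pairing done by zip here.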


Input Ports

request set (req) This input port expects an ExampleSet. This ExampleSet will be used as the 'request set'. Please note that both input ExampleSets ('request set' and 'reference set') should have the same attributes, in the same order. This operator will not work properly if the order of the attributes is different. Also note that both input ExampleSets should have id attributes. If id attributes are not present, this operator automatically creates id attributes for such ExampleSets.

reference set (ref) This input port expects an ExampleSet. This ExampleSet will be used as the 'reference set'. Please note that both input ExampleSets ('request set' and 'reference set') should have the same attributes, in the same order. This operator will not work properly if the order of the attributes is different. Also note that both input ExampleSets should have id attributes. If id attributes are not present, this operator automatically creates id attributes for such ExampleSets.

Output Ports

result set (res) An ExampleSet that contains the distance (or similarity, if the compute similarities parameter is set to true) between each example of the 'request set' ExampleSet and each example of the 'reference set' ExampleSet is delivered through this port.

request set (req) The 'request set' ExampleSet that was provided at the request set input port is delivered through this port. If the input ExampleSet had an id attribute then the ExampleSet is delivered without any modification. Otherwise an id attribute is automatically added to the input ExampleSet.

reference set (ref) The 'reference set' ExampleSet that was provided at the reference set input port is delivered through this port. If the input ExampleSet had an id attribute then the ExampleSet is delivered without any modification. Otherwise an id attribute is automatically added to the input ExampleSet.

866 7.6. Similarity Computation

Parameters

measure types (selection) This parameter selects the type of measure to be used for calculating distances (or similarities). The following options are available: mixed measures, nominal measures, numerical measures and Bregman divergences.

mixed measure (selection) This parameter is available when the measure types parameter is set to 'mixed measures'. The only available option is 'Mixed Euclidean Distance'.

nominal measure (selection) This parameter is available when the measure types parameter is set to 'nominal measures'. This option cannot be applied if the input ExampleSet has numerical attributes; in that case the 'numerical measure' option should be selected.

numerical measure (selection) This parameter is available when the measure types parameter is set to 'numerical measures'. This option cannot be applied if the input ExampleSet has nominal attributes; in that case the 'nominal measure' option should be selected.

divergence (selection) This parameter is available when the measure types parameter is set to 'Bregman divergences'.

kernel type (selection) This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance'. The type of the kernel function is selected through this parameter. The following kernel types are supported:

• dot The dot kernel is defined by k(x, y) = x*y, i.e. it is the inner product of x and y.

• radial The radial kernel is defined by exp(-g ||x-y||^2) where g is the gamma specified by the kernel gamma parameter. The adjustable parameter gamma plays a major role in the performance of the kernel and should be carefully tuned to the problem at hand.

• polynomial The polynomial kernel is defined by k(x, y) = (x*y + 1)^d where d is the degree of the polynomial, specified by the kernel degree parameter. Polynomial kernels are well suited for problems where all the training data is normalized.


• neural The neural kernel is defined by a two-layered neural net tanh(a x*y + b) where a is the alpha and b is the intercept constant. These parameters can be adjusted using the kernel a and kernel b parameters. A common value for alpha is 1/N, where N is the data dimension. Note that not all choices of a and b lead to a valid kernel function.

• sigmoid This is the sigmoid kernel. Please note that the sigmoid kernel is not valid under some parameters.

• anova This is the anova kernel. It has the adjustable parameters gamma and degree.

• epachnenikov The Epanechnikov kernel is the function (3/4)(1 - u^2) for u between -1 and 1, and zero for u outside that range. It has two adjustable parameters, kernel sigma1 and kernel degree.

• gaussian combination This is the gaussian combination kernel. It has the adjustable parameters kernel sigma1, kernel sigma2 and kernel sigma3.

• multiquadric The multiquadric kernel is defined by the square root of ||x-y||^2 + c^2. It has the adjustable parameters kernel sigma1 and kernel sigma shift.

kernel gamma (real) This is the SVM kernel parameter gamma. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to radial or anova.

kernel sigma1 (real) This is the SVM kernel parameter sigma1. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to epachnenikov, gaussian combination or multiquadric.

kernel sigma2 (real) This is the SVM kernel parameter sigma2. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel sigma3 (real) This is the SVM kernel parameter sigma3. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to gaussian combination.

kernel shift (real) This is the SVM kernel parameter shift. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to multiquadric.

kernel degree (real) This is the SVM kernel parameter degree. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to polynomial, anova or epachnenikov.

kernel a (real) This is the SVM kernel parameter a. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.

kernel b (real) This is the SVM kernel parameter b. This parameter is available only when the numerical measure parameter is set to 'Kernel Euclidean Distance' and the kernel type parameter is set to neural.

only top k (boolean) This parameter indicates if only the k nearest distances to each request example should be calculated.

k (integer) This parameter is only available when the only top k parameter is set to true. It determines how many of the nearest examples are shown in the result.

search for (selection) This parameter is only available when the only top k parameter is set to true. It determines whether the nearest or the farthest distances should be selected.

compute similarities (boolean) If this parameter is set to true, similarities are computed instead of distances. All measures remain usable, but measures that are not originally distance or similarity measures, respectively, are transformed to match the optimization direction.
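As a rough sketch (not RapidMiner's implementation), the kernels listed above are plain functions of two vectors, and a 'Kernel Euclidean Distance' can be derived from any kernel k as sqrt(k(x,x) - 2 k(x,y) + k(y,y)). All function names here are illustrative; the parameter names mirror the operator's kernel parameters.

```python
import numpy as np

def k_dot(x, y):
    # dot kernel: k(x, y) = x*y (inner product)
    return float(np.dot(x, y))

def k_radial(x, y, gamma=1.0):
    # radial kernel: exp(-g ||x-y||^2)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def k_polynomial(x, y, degree=2.0):
    # polynomial kernel: (x*y + 1)^d
    return float((np.dot(x, y) + 1.0) ** degree)

def k_neural(x, y, a=1.0, b=0.0):
    # neural kernel: tanh(a x*y + b)
    return float(np.tanh(a * np.dot(x, y) + b))

def kernel_distance(k, x, y, **params):
    """Distance induced by a kernel: sqrt(k(x,x) - 2 k(x,y) + k(y,y))."""
    return np.sqrt(k(x, x, **params) - 2 * k(x, y, **params) + k(y, y, **params))

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])
print(kernel_distance(k_dot, x, y))   # for the dot kernel this is the
                                      # ordinary Euclidean distance, sqrt(5)
print(kernel_distance(k_radial, x, y, gamma=0.5))
```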

Tutorial Processes

Introduction to the Cross Distances operator

This Example Process starts with a Subprocess operator. This subprocess generates the 'request set' ExampleSet and the 'reference set' ExampleSet. A breakpoint is inserted here so that you can have a look at the ExampleSets before the application of the Cross Distances operator. You can see that the 'request set' has only one example with id 'id 1'. The 'reference set' has two examples with ids 'id 1' and 'id 2'. Both ExampleSets have three attributes in the same order. It is very important that both ExampleSets have the same attributes in the same order, otherwise the Cross Distances operator will not behave as expected. The Cross Distances operator is applied on these ExampleSets. The resultant ExampleSet, which contains the distance from each example of the 'request set' ExampleSet to each example of the 'reference set' ExampleSet, can be viewed in the Results Workspace.

Apply Model

This operator applies an already trained model on an ExampleSet.

870 7.7. Model Application

Description

A model is first trained on an ExampleSet; information related to the ExampleSet is learnt by the model. That model can then be applied on another ExampleSet, usually for prediction. All needed parameters are stored within the model object. Both ExampleSets must have exactly the same number, order, type and role of attributes. If these properties of the meta data of the ExampleSets are not consistent, serious errors may result. If you want to apply several models in a row, for example a few preprocessing models before a prediction model, you may group the models. This is possible using the Group Models operator.
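A minimal sketch of this train/apply contract, assuming a toy nearest-class-mean model (the class and method names are hypothetical, not RapidMiner's API): the model remembers the attribute layout it was trained on and rejects an ExampleSet whose attributes are inconsistent.

```python
# Illustrative train/apply contract: the trained model stores everything it
# needs (here, per-class means and the attribute layout) and "apply" adds a
# prediction for each example, like the labelled data output port.
class NearestMeanModel:
    def fit(self, rows, labels, attributes):
        self.attributes = list(attributes)          # remembered layout
        grouped = {}
        for row, lab in zip(rows, labels):
            grouped.setdefault(lab, []).append(row)
        self.means = {lab: [sum(col) / len(col) for col in zip(*rs)]
                      for lab, rs in grouped.items()}
        return self

    def apply(self, rows, attributes):
        if list(attributes) != self.attributes:     # same number AND order
            raise ValueError("attributes differ from the training data")
        def dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.means, key=lambda lab: dist(row, self.means[lab]))
                for row in rows]

model = NearestMeanModel().fit([[1.0], [9.0]], ["no", "yes"], ["Temperature"])
print(model.apply([[2.0]], ["Temperature"]))   # ['no']
```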

Input Ports

model (mod) This port expects a model. Make sure that the number, order, type and role of the attributes of the ExampleSet on which this model was trained are consistent with the ExampleSet at the unlabelled data input port.

unlabelled data (unl) This port expects an ExampleSet. Make sure that the number, order, type and role of its attributes are consistent with the ExampleSet on which the model delivered to the model input port was trained.

Output Ports

labelled data (lab) The model given as input is applied on the given ExampleSet and the updated ExampleSet is delivered from this port. Some information is added to the input ExampleSet before it is delivered through this output port. For example, when a prediction model is applied on an ExampleSet through the Apply Model operator, an attribute with the prediction role is added to the ExampleSet. This attribute stores the values of the label attribute predicted by the given model.

model (mod) The model that was given as input is passed on without change through this port. This is usually used to reuse the same model in further operators or to view the model in the Results Workspace.

Parameters

application parameters (menu) This parameter specifies model parameters for application (usually not needed). This is an expert parameter.

create view (boolean) If the model applied at the input port supports Views, it is possible to create a View instead of changing the underlying data. Simply select this parameter to enable this option. The transformation that would normally be performed directly on the data will then be computed every time a value is requested, and the result is returned without changing the data. Some models do not support Views.

Tutorial Processes

Applying a model

In this Example Process, the Golf data set is loaded using the Retrieve operator. A classification model is trained on this ExampleSet using the k-NN operator. This model is then supplied at the model input port of the Apply Model operator. The Golf-Testset data set is loaded using another Retrieve operator and provided at the unlabelled data input port of the Apply Model operator. The Apply Model operator applies the model trained by the k-NN operator on the Golf-Testset to predict the value of the attribute with the label role, i.e. the 'Play' attribute. The original model is also connected to the results port. Breakpoints are added after both Retrieve operators so that the ExampleSets can be viewed before the application of the model.

When you run the process, you will first see the Golf data set. Press the Run button to continue. Now you will see the Golf-Testset data set. Press the Run button again to see the final output of the process. As you can see in the Results Workspace, an attribute with the prediction role is added to the original Golf-Testset data set. This attribute stores the values of the label (Play) predicted by the model (the k-NN classification model). This is why it is now called 'labelled data' instead of 'unlabelled data'.

Group Models

This operator groups the given models into a single combined model. When this combined model is applied, it is equivalent to applying the original models in their respective order.

Description

The Group Models operator groups all input models together into a single combined model. This combined model can be applied on ExampleSets (using the Apply Model operator) like other models. When this combined model is applied, it is equivalent to applying the original models in their respective order. The combined model can also be written into a file using the Write Model operator. This operator is useful in cases where preprocessing and prediction models should be applied together on new and unseen data. A grouped model can be ungrouped with the Ungroup Models operator. Please study the attached Example Process for more information about the Group Models operator.
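The grouping behaviour can be sketched as follows, assuming hypothetical model objects that expose an `apply` method; the two stand-in models play the role of, e.g., a preprocessing model followed by a prediction model.

```python
# Sketch only: applying the grouped model is equivalent to applying the
# original models in their input order (interfaces are hypothetical).
class GroupedModel:
    def __init__(self, *models):
        if len(models) < 2:
            raise ValueError("Group Models expects at least two models")
        self.models = models

    def apply(self, example_set):
        for model in self.models:        # first model in, first applied
            example_set = model.apply(example_set)
        return example_set

class AddOne:                            # stand-in for a preprocessing model
    def apply(self, xs):
        return [x + 1 for x in xs]

class Double:                            # stand-in for a prediction model
    def apply(self, xs):
        return [x * 2 for x in xs]

grouped = GroupedModel(AddOne(), Double())
print(grouped.apply([1, 2, 3]))          # [4, 6, 8] -- AddOne first, then Double
```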

Input Ports

model in (mod) This input port expects a model. This operator can have multiple inputs, but it is mandatory to provide at least two models as input. When one input is connected, another model in port becomes available, ready to accept another model (if any). The order of the models is preserved, i.e. the model supplied at the first model in port will be the first model to be applied when the resultant combined model is applied.

Output Ports

model out (mod) The given models are grouped into a single combined model and the resultant grouped model is returned from this port.

Tutorial Processes

Grouping models and applying the resultant grouped model

The 'Iris' data set is loaded using the Retrieve operator. A breakpoint is inserted here so that you can have a look at the ExampleSet. You can see that the ExampleSet has four regular attributes. The Split Data operator is applied on it to split the ExampleSet into training and testing data sets. The training data set (composed of 70% of the examples) is passed to the SVD operator. The dimensionality reduction and dimensions parameters of the SVD operator are set to 'fixed number' and 2 respectively. Thus the given data set will be reduced to a data set with two dimensions (artificial attributes that represent the original attributes). The SVD model (the model that reduces the dimensionality of the given ExampleSet) is provided as the first model to the Group Models operator. The Naive Bayes operator is applied on the resultant ExampleSet (i.e. the training data set with reduced dimensions). The classification model generated by the Naive Bayes operator is provided as the second model to the Group Models operator. Thus the Group Models operator combines two models:

1. SVD dimensionality reduction model

2. Naive Bayes classification model.

This combined model is applied on the testing data set (composed of 30% of the 'Iris' data set) using the Apply Model operator. When the combined model is applied, the SVD model is applied first on the testing data set. Then the Naive Bayes classification model is applied on the resultant ExampleSet (i.e. the testing data set with reduced dimensions). The combined model and the labeled ExampleSet can be seen in the Results Workspace after the execution of the process.


Find Threshold

This operator finds the best threshold for the crisp classification of soft classified data based on user-defined costs. The optimization step is based on ROC analysis.

Description

This operator finds the threshold for the given prediction confidences of soft classified predictions in order to turn them into a crisp classification. The optimization step is based on ROC analysis, which is discussed at the end of this description.

The Find Threshold operator finds the threshold of a labeled ExampleSet to map a soft prediction to crisp values. The threshold is delivered through the threshold port. Mostly the Apply Threshold operator is used for applying a threshold after it has been delivered by the Find Threshold operator. If the confidence for the second class is greater than the given threshold, the prediction is set to this class; otherwise it is set to the other class. This can be easily understood by studying the attached Example Process.

Among various classification methods, there are two main groups of methods: soft and hard classification. In particular, a soft classification rule generally estimates the class conditional probabilities explicitly and then makes the class prediction based on the largest estimated probability. In contrast, hard classification bypasses the requirement of class probability estimation and directly estimates the classification boundary.

Receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot of the true positive rate vs. the false positive rate for a binary classifier system as its discrimination threshold is varied. The ROC can also be represented equivalently by plotting the fraction of true positives out of the positives (TP/P = true positive rate) vs. the fraction of false positives out of the negatives (FP/N = false positive rate). TP/P measures how well a classifier or a diagnostic test classifies positive instances correctly among all positive samples available during the test. FP/N, on the other hand, measures how many incorrect positive results occur among all negative samples available during the test.

A ROC space is defined by FP/N and TP/P as the x and y axes respectively, which depicts the relative trade-off between true positives (benefits) and false positives (costs). Each prediction result, or one instance of a confusion matrix, represents one point in the ROC space. The best possible prediction method would yield a point in the upper left corner, i.e. the coordinate (0,1) of the ROC space, representing 100% TP/P and 0% FP/N. The (0,1) point is also called a perfect classification. A completely random guess would give a point along the diagonal line from the bottom left to the top right corner.

The diagonal divides the ROC space: points above the diagonal represent good classification results, points below the line represent poor results. Note that the Find Threshold operator finds a threshold at which predictions that would otherwise fall below the diagonal are inverted, turning poor classification into good classification.
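One way such a threshold search could work, sketched here as a brute-force scan over candidate thresholds that minimizes the user-defined misclassification costs. This is illustrative only, not RapidMiner's implementation; the function name and the `cost_first`/`cost_second` arguments (mirroring the misclassification costs parameters) are assumptions.

```python
# Sketch: pick the threshold on confidence(second) that minimizes total
# misclassification cost over a labeled ExampleSet.
def find_threshold(confidences, labels, second_class,
                   cost_first=1.0, cost_second=1.0):
    best_t, best_cost = 0.5, float("inf")
    # one candidate per observed confidence, plus one that rejects everything
    candidates = sorted(set(confidences)) + [1.1]
    for t in candidates:
        cost = 0.0
        for conf, lab in zip(confidences, labels):
            predicted_second = conf > t
            if predicted_second and lab != second_class:
                cost += cost_first    # first class misclassified as second
            elif not predicted_second and lab == second_class:
                cost += cost_second   # second class misclassified as first
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

conf = [0.2, 0.3, 0.7, 0.9]
lab = ["neg", "neg", "pos", "pos"]
print(find_threshold(conf, lab, "pos"))   # 0.3 separates the classes perfectly
```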

Input Ports

example set (exa) This input port expects a labeled ExampleSet. The ExampleSet should have label and prediction attributes as well as attributes for the confidence of the predictions.

Output Ports

example set (exa) The ExampleSet that was given as input is passed on without change through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

threshold (thr) The threshold is delivered through this output port. Frequently, the Apply Threshold operator is used for applying this threshold on the soft classified data.

Parameters

define labels (boolean) This is an expert parameter. If set to true, the first and second label can be defined explicitly using the first label and second label parameters.

first label (string) This parameter is only available when the define labels parameter is set to true. It explicitly defines the first label.

second label (string) This parameter is only available when the define labels parameter is set to true. It explicitly defines the second label.

misclassification costs first (real) This parameter specifies the cost assigned when an example of the first class is misclassified as the second.

misclassification costs second (real) This parameter specifies the cost assigned when an example of the second class is misclassified as the first.

show roc plot (boolean) This parameter indicates whether to display a plot of the ROC curve.

use example weights (boolean) This parameter indicates if example weights should be used.

roc bias (selection) This is an expert parameter. It determines how the ROC (and AUC) are evaluated.

Tutorial Processes

Introduction to the Find Threshold operator

This Example Process starts with a Subprocess operator. This subprocess provides the labeled ExampleSet. Double-click on the Subprocess operator to see what is happening inside, although it is not directly relevant to the understanding of the Find Threshold operator. In the subprocess, the Generate Data operator is used for the generation of testing and training data sets with a binominal label. An SVM classification model is learned on the training data set and applied on the testing data set. The resultant labeled ExampleSet is the output of this subprocess. A breakpoint is inserted after this subprocess so that you can have a look at the labeled ExampleSet before the application of the Find Threshold operator. You can see that the ExampleSet has 500 examples. If you sort the results according to the confidence of the positive prediction and scroll through the data set, you will see that all examples with 'confidence(positive)' greater than 0.500 are classified as positive and all examples with 'confidence(positive)' less than 0.500 are classified as negative.

Now have a look at what is happening outside the subprocess. The Find Threshold operator is used for finding a threshold. All its parameters are used with default values. The Find Threshold operator delivers a threshold through the threshold port. This threshold is applied on the labeled ExampleSet using the Apply Threshold operator. We know that when the Apply Threshold operator is applied on an ExampleSet, if the confidence for the second class is greater than the given threshold, the prediction is set to this class; otherwise it is set to the other class. Have a look at the resultant ExampleSet. Sort the ExampleSet according to 'confidence(positive)' and scroll through it. You will see that all examples where 'confidence(positive)' is greater than 0.306 are classified as positive and all examples where 'confidence(positive)' is less than or equal to 0.306 are classified as negative. In the original ExampleSet the boundary value was 0.500, but the Find Threshold operator found a better threshold for a crisp classification of the soft classified data.


Create Threshold

This operator creates a user defined threshold for crisp classification based on the prediction confidences (soft predictions). This threshold can be applied by using the Apply Threshold operator.

Description

The threshold parameter specifies the required threshold. The first class and second class parameters specify the classes of the ExampleSet that should be considered as the first and second class respectively. The threshold created by this operator can be applied on a labeled ExampleSet using the Apply Threshold operator. If the confidence for the second class is greater than the given threshold, the prediction is set to this second class; otherwise it is set to the first class. This can be easily understood by studying the attached Example Process.
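The mapping rule can be written down directly; this small sketch (with hypothetical function and argument names) also makes the strict "greater than" comparison explicit.

```python
# Threshold rule: if confidence(second) > threshold then second class,
# else first class. Note the strict comparison: a confidence exactly equal
# to the threshold yields the first class.
def apply_threshold(conf_second, threshold, first_class, second_class):
    return second_class if conf_second > threshold else first_class

print(apply_threshold(0.8, 0.7, "negative", "positive"))   # positive
print(apply_threshold(0.7, 0.7, "negative", "positive"))   # negative (not >)
```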

The Apply Threshold operator applies the given threshold to a labeled ExampleSet and maps a soft prediction to crisp values. The threshold is provided through the threshold port. Mostly the Create Threshold operator is used for creating thresholds before they are applied using the Apply Threshold operator.

Among various classification methods, there are two main groups of methods: soft and hard classification. In particular, a soft classification rule generally estimates the class conditional probabilities explicitly and then makes the class prediction based on the largest estimated probability. In contrast, hard classification bypasses the requirement of class probability estimation and directly estimates the classification boundary.

Output Ports

output (out) This port delivers the threshold. This threshold can be applied on a labeled ExampleSet by using the Apply Threshold operator.

Parameters

threshold (real) This parameter specifies the threshold of the prediction confidence. It should be in the range 0.0 to 1.0. If the prediction confidence for the second class is greater than this threshold, the prediction is set to the second class (i.e. the class specified through the second class parameter); otherwise it is set to the first class (i.e. the class specified through the first class parameter).

first class (string) This parameter specifies the class which should be considered as the first class.

second class (string) This parameter specifies the class which should be considered as the second class.

Tutorial Processes

Creating and Applying thresholds

This Example Process starts with a Subprocess operator. This subprocess provides the labeled ExampleSet. Double-click on the Subprocess operator to see what is happening inside the subprocess, although it is not directly relevant to the use of the Create Threshold operator. In the subprocess, a k-NN classification model is learned and applied on different samples of the 'Weighting' data set. The resultant labeled ExampleSet is the output of this subprocess. A breakpoint is inserted after this subprocess so that you can have a look at the labeled ExampleSet before the application of the Create Threshold and Apply Threshold operators. You can see that the ExampleSet has 20 examples. 11 of them are predicted as 'positive' and the remaining 9 examples are predicted as 'negative'. If you sort the results according to the confidence of the positive prediction, you will easily see that among the 11 examples predicted as 'positive', 3 examples have confidence 0.600, 4 examples have confidence 0.700, 3 examples have confidence 0.800 and 1 example has confidence 0.900.

Now let us have a look at what is happening outside the subprocess. The Create Threshold operator is used for creating a threshold. The threshold parameter is set to 0.700 and the first class and second class parameters are set to 'negative' and 'positive' respectively. A breakpoint is inserted here so that you can see the threshold in the Results Workspace. This statement in the Results Workspace explains everything: if confidence(positive) > 0.7 then positive; else negative

This statement means that if confidence(positive) is greater than 0.7 then the class should be predicted as positive, otherwise it should be predicted as negative. In a general form this statement would look like this: if confidence(second) > T then second; else first, where T, second and first are the values of the threshold, second class and first class parameters respectively.

This threshold is applied on the labeled ExampleSet using the Apply Threshold operator. We know that when the Apply Threshold operator is applied on an ExampleSet there are two possibilities: if the confidence for the second class is greater than the given threshold, the prediction is set to the second class; otherwise it is set to the first class. In this process, if the confidence for the second class, i.e. 'positive' (the class specified in the second class parameter of the Create Threshold operator), is greater than the given threshold, i.e. 0.700 (the threshold specified in the threshold parameter of the Create Threshold operator), the prediction is set to 'positive'; otherwise it is set to 'negative'. In the labeled ExampleSet only 4 examples had confidence (positive) greater than 0.700. When the Apply Threshold operator is applied, only these 4 examples are assigned 'positive' predictions and all other examples are assigned 'negative' predictions.

Apply Threshold

This operator applies a threshold on soft classified data.

Description

The Apply Threshold operator applies the given threshold to a labeled ExampleSet and maps a soft prediction to crisp values. The threshold is provided through the threshold port. Mostly the Create Threshold operator is used for creating thresholds before they are applied using the Apply Threshold operator. If the confidence for the second class is greater than the given threshold, the prediction is set to this class; otherwise it is set to the other class. This can be easily understood by studying the attached Example Process.

Among various classification methods, there are two main groups of methods: soft and hard classification. In particular, a soft classification rule generally estimates the class conditional probabilities explicitly and then makes the class prediction based on the largest estimated probability. In contrast, hard classification bypasses the requirement of class probability estimation and directly estimates the classification boundary.

Input Ports

example set (exa) This input port expects a labeled ExampleSet. The ExampleSet should have label and prediction attributes as well as attributes for the confidence of the predictions.

threshold (thr) The threshold is provided through this input port. Frequently, the Create Threshold operator is used for providing the threshold at this port.

Output Ports example set (exa) The predictions of the input ExampleSet are changed ac- cording to the threshold given at the threshold port and the modified ExampleSet is delivered through this port.

Tutorial Processes

Creating and Applying thresholds

This Example Process starts with a Subprocess operator. This subprocess provides the labeled ExampleSet. Double-click on the Subprocess operator to see what is happening inside the subprocess, although it is not directly relevant to the use of the Apply Threshold operator. In the subprocess, a k-NN classification model is learned and applied on different samples of the 'Weighting' data set. The resultant labeled ExampleSet is the output of this subprocess. A breakpoint is inserted after this subprocess so that you can have a look at the labeled ExampleSet before the application of the Apply Threshold operator. You can see that the ExampleSet has 20 examples. 11 of them are predicted as 'positive' and the remaining 9 examples are predicted as 'negative'. If you sort the results according to the confidence of the positive prediction, you will easily see that among the 11 examples predicted as 'positive', 3 examples have confidence 0.600, 4 examples have confidence 0.700, 3 examples have confidence 0.800 and 1 example has confidence 0.900.

Now let us have a look at what is happening outside the subprocess. The Create Threshold operator is used for creating a threshold. The threshold parameter is set to 0.700 and the first class and second class parameters are set to 'negative' and 'positive' respectively. This threshold is applied on the labeled ExampleSet using the Apply Threshold operator. We know that when the Apply Threshold operator is applied on an ExampleSet, if the confidence for the second class is greater than the given threshold, then the prediction is set to this class; otherwise it is set to the other class. In this process, if the confidence for the second class, i.e. 'positive' (the class specified in the second class parameter of the Create Threshold operator), is greater than the given threshold, i.e. 0.700 (the threshold specified in the threshold parameter of the Create Threshold operator), the prediction is set to 'positive'; otherwise it is set to 'negative'. In the labeled ExampleSet only 4 examples had confidence (positive) greater than 0.700. When the Apply Threshold operator is applied, only these 4 examples are assigned 'positive' predictions and all other examples are assigned 'negative' predictions.


Drop Uncertain Predictions

This operator sets all predictions to 'unknown' (a missing value) if the corresponding confidence is less than the specified minimum confidence. This operator is used for dropping predictions with low confidence values.

Description

The Drop Uncertain Predictions operator expects a labeled ExampleSet, i.e. an ExampleSet with label and prediction attributes along with prediction confidences. The minimum confidence threshold is specified through the min confidence parameter. All predictions of the given ExampleSet where the corresponding prediction confidence is below the specified threshold are dropped. Suppose an ExampleSet has two possible classes, 'positive' and 'negative'. If the min confidence parameter is set to 0.700, all examples that were predicted as 'positive' but whose 'confidence (positive)' value is less than 0.700 have their prediction set to a missing value. Similarly, the prediction is set to a missing value for all examples that were predicted as 'negative' but whose 'confidence (negative)' value is less than 0.700. This operator also allows you to define different minimum confidence thresholds for different classes through the min confidences parameter.
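A sketch of this behaviour, where a single number mimics the 'balanced' mode and a per-class dictionary mimics the 'unbalanced' per-class thresholds (names are illustrative, not RapidMiner's API); `None` stands in for a missing value.

```python
# Drop predictions whose confidence falls below a (possibly per-class)
# minimum; the prediction becomes None, i.e. a missing value.
def drop_uncertain(predictions, confidences, min_confidence):
    result = []
    for pred, conf in zip(predictions, confidences):
        if isinstance(min_confidence, dict):        # 'unbalanced' handling
            threshold = min_confidence[pred]
        else:                                       # 'balanced' handling
            threshold = min_confidence
        result.append(pred if conf >= threshold else None)
    return result

preds = ["positive", "positive", "negative"]
confs = [0.9, 0.6, 0.8]
print(drop_uncertain(preds, confs, 0.7))
print(drop_uncertain(preds, confs, {"positive": 0.65, "negative": 0.85}))
```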

Input Ports

example set input (exa) This input port expects a labeled ExampleSet. It is the output of the Apply Model operator in the attached Example Process. The output of other operators can also be used as input if it is a labeled ExampleSet.

Output Ports

example set output (exa) The uncertain predictions are dropped and the resultant ExampleSet is delivered through this port.

original (ori) The ExampleSet that was given as input is passed on without change through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters class handling (selection) This parameter specifies the mode of class handling which defines if all classes are handled equally or if individual class thresholds are set.

ˆ balanced In this case all classes are handled equally i.e. the same confi- dence threshold is applied on all possible values of the label. The minimum confidence threshold is specified through the min confidence parameter.


ˆ unbalanced In this case classes are not handled equally i.e. different confidence thresholds can be specified for different classes through the min confidences parameter. min confidence (real) This parameter is only available when the class handling parameter is set to 'balanced'. This parameter sets the minimum confidence threshold for all the classes. Predictions below this confidence will be dropped. min confidences (list) This parameter is only available when the class handling parameter is set to 'unbalanced'. This parameter specifies individual thresholds for classes. Predictions below these confidences will be dropped.

Tutorial Processes

Dropping uncertain predictions of the Naive Bayes operator

The 'Golf' data set is loaded using the Retrieve operator. The Naive Bayes operator is applied on it to generate a classification model. The resultant classification model is applied on the 'Golf-Testset' data set by using the Apply Model operator. A breakpoint is inserted here so that you can see the labeled ExampleSet generated by the Apply Model operator. You can see that 10 examples have been classified as 'yes' but only 6 of them have 'confidence (yes)' above 0.700. 4 examples have been classified as 'no' but only 1 of them has 'confidence (no)' above 0.700. This labeled ExampleSet is provided to the Drop Uncertain Predictions operator. The min confidence parameter is set to 0.7. Thus all the predictions where the prediction confidence is below 0.7 are set to missing values. This can be seen in the Results Workspace. 7 examples had a prediction confidence below 0.7 and all of their predictions have been dropped.



8 Evaluation

Split Validation

This operator performs a simple validation, i.e. it randomly splits up the ExampleSet into a training set and a test set and evaluates the model. This operator performs a split validation in order to estimate the performance of a learning operator (usually on unseen data sets). It is mainly used to estimate how accurately a model (learnt by a particular learning operator) will perform in practice.

Description

The Split Validation operator is a nested operator. It has two subprocesses: a training subprocess and a testing subprocess. The training subprocess is used for learning or building a model. The trained model is then applied in the testing subprocess. The performance of the model is also measured during the testing phase.

The input ExampleSet is partitioned into two subsets. One subset is used as the training set and the other one is used as the test set. The size of the two subsets can be adjusted through different parameters. The model is learned on the training set and is then applied on the test set. This is done in a single iteration, as compared to the X-Validation operator that iterates a number of times using different subsets for testing and training purposes.

Usually the learning process optimizes the model parameters to make the model fit the training data as well as possible. If we then take an independent sample of testing data, it will generally turn out that the model does not fit the testing data as well as it fits the training data. This is called 'over-fitting', and is particularly likely to happen when the size of the training data set is small, or when the number of parameters in the model is large. Split Validation is a way to predict the fit of a model to a hypothetical testing set when an explicit testing set is not available. The Split Validation operator also allows training on one data set and testing on another explicit testing data set.
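A single split validation round can be sketched as follows. This is illustrative Python, not RapidMiner code; the seed value 1992 is assumed to match the usual local random seed default.

```python
import random

# Conceptual sketch of one split validation round with a relative split:
# shuffle the examples, cut at the split ratio, train on one part and
# evaluate on the other.

def relative_split(examples, split_ratio, local_random_seed=1992):
    shuffled = examples[:]
    random.Random(local_random_seed).shuffle(shuffled)  # shuffled sampling
    cut = int(round(len(shuffled) * split_ratio))
    return shuffled[:cut], shuffled[cut:]  # (training set, test set)

data = list(range(14))        # stand-in for the 14 'Golf' examples
train, test = relative_split(data, 0.7)
print(len(train), len(test))  # 10 4
```

With a split ratio of 0.7 the test set automatically receives the remaining 30 percent of the examples.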

Input Ports training example set (tra) This input port expects an ExampleSet for training a model (training data set). The same ExampleSet will be used during the testing subprocess for testing the model if no other data set is provided.

Output Ports model (mod) The training subprocess must return a model, which is trained on the input ExampleSet. Please note that the model built on the complete input ExampleSet is delivered from this port. training example set (tra) The ExampleSet that was given as input at the training input port is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace. averagable (ave) The testing subprocess must return a Performance Vector. This is usually generated by applying the model and measuring its performance. Two such ports are provided but more can also be used if required. Please note that the performance calculated by this estimation scheme is only an estimate


(instead of an exact calculation) of the performance which would be achieved with the model built on the complete delivered data set.

Parameters split (selection) This parameter specifies how the ExampleSet should be split:

ˆ relative If a relative split is required, the relative size of the training set should be provided in the split ratio parameter. Afterwards the relative size of the test set is automatically calculated by subtracting the value of the split ratio from 1.

ˆ absolute If an absolute split is required, you have to specify the exact number of examples to use in the training or test set in the training set size parameter or in the test set size parameter. If either of these parameters is set to -1, its value is calculated automatically using the other one. split ratio (real) This parameter is only available when the split parameter is set to 'relative'. It specifies the relative size of the training set. It should be between 0 and 1, where 1 means that the entire ExampleSet will be used as training set. training set size (integer) This parameter is only available when the split parameter is set to 'absolute'. It specifies the exact number of examples to be used as training set. If it is set to -1, the number of examples specified in the test set size parameter will be used for the test set and the remaining examples will be used as training set. test set size (integer) This parameter is only available when the split parameter is set to 'absolute'. It specifies the exact number of examples to be used as test set. If it is set to -1, the number of examples specified in the training set size parameter will be used for the training set and the remaining examples will be used as test set. sampling type (selection) The Split Validation operator can use several types of sampling for building the subsets. The following options are available:

ˆ Linear sampling Linear sampling simply divides the ExampleSet into partitions without changing the order of the examples i.e. subsets with consecutive examples are created.


ˆ Shuffled sampling Shuffled sampling builds random subsets of the ExampleSet. Examples are chosen randomly for making subsets.

ˆ Stratified sampling Stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole ExampleSet. For example in the case of a binominal classification, Stratified sampling builds random subsets such that each subset contains roughly the same proportions of the two values of the class label. use local random seed (boolean) This parameter indicates if a local random seed should be used for randomizing examples of a subset. Using the same value of the local random seed will produce the same subsets. Changing the value of this parameter changes the way examples are randomized, thus subsets will have a different set of examples. This parameter is only available if Shuffled or Stratified sampling is selected. It is not available for Linear sampling because it requires no randomization; examples are selected in sequence. local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.
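The three sampling types can be sketched in Python. These are simplified illustrations, not RapidMiner's implementation; in particular the rounding of per-class cut points in the stratified case is an assumption.

```python
import random

# Linear sampling: consecutive examples, order preserved.
def linear_split(examples, train_size):
    return examples[:train_size], examples[train_size:]

# Shuffled sampling: random subsets, reproducible via the local random seed.
def shuffled_split(examples, train_size, local_random_seed=1992):
    shuffled = examples[:]
    random.Random(local_random_seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]

# Stratified sampling: split each class with the same ratio so the class
# distribution in the subsets matches the whole ExampleSet.
def stratified_split(examples, labels, ratio, local_random_seed=1992):
    rng = random.Random(local_random_seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        members = [e for e, lab in zip(examples, labels) if lab == cls]
        rng.shuffle(members)
        cut = int(round(len(members) * ratio))
        train += members[:cut]
        test += members[cut:]
    return train, test

tr, te = linear_split(list(range(14)), 10)
print(tr == list(range(10)))  # True — training set keeps consecutive IDs
```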

Tutorial Processes

Validating Models using Split Validation

The 'Golf' data set is loaded using the Retrieve operator. The Generate ID operator is applied on it to uniquely identify examples. This is done so that you can understand this process easily; otherwise IDs are not required here. A breakpoint is added after this operator so that you can preview the data before the Split Validation operator starts. Double click the Split Validation operator and you will see training and testing subprocesses. The Decision Tree operator is used in the training subprocess. The trained model (i.e. Decision Tree) is passed to the testing subprocess through the model ports. The testing subprocess receives testing data from the testing port.


Now, have a look at the parameters of the Split Validation operator. The split parameter is set to 'absolute'. The training set size parameter is set to 10 and the test set size parameter is set to -1. As there are 14 total examples in the 'Golf' data set, the test set automatically gets 4 remaining examples. The sampling type parameter is set to Linear Sampling. Remaining parameters have default values. Thus two subsets of the 'Golf' data set will be created. You will observe later that these two subsets are created:

ˆ training set: examples with IDs 1 to 10 (10 examples)

ˆ test set: examples with IDs 11 to 14 (4 examples)

You can see that all examples in a subset are consecutive (i.e. with consecutive IDs). This is because Linear Sampling is used.

Breakpoints are inserted to make you understand the process. Here is what happens when you run the process:

ˆ First the 'Golf' data set is displayed with all rows uniquely identified using the ID attribute. There are 14 rows with ids 1 to 14. Press the green-colored Run button to continue.

ˆ Now a Decision tree is shown. This was trained from the training set of the 'Golf' data set. Hit the Run button to continue.

ˆ The Decision tree was applied on the testing data. Here you can see the results after application of the Decision Tree model. Have a look at the IDs of the testing data here. They are 11 to 14. Compare the label and prediction columns and you will see that only 2 predictions out of 4 are correct. Hit the Run button again.

ˆ Now the Performance Vector of the Decision tree is shown. As only 2 out of 4 predictions were correct, the accuracy is 50%. Press the Run button again.

ˆ Now you can see a different Decision tree. It was trained on the complete 'Golf' data set; that is why it is different from the previous decision tree.


You can run the same process with different values of sampling type parameter. If linear sampling is used, as in our example process, you will see that IDs of examples in subsets will be consecutive values. If shuffled sampling is used you will see that IDs of examples in subsets will be random values. If stratified sampling is used you will see that IDs of examples in subsets will be random values but the class distribution in the subsets will be nearly the same as in the whole 'Golf' data set.

To get an understanding of how objects are passed using the through ports, please study the Example Process of the X-Validation operator.

X-Validation

This operator performs a cross-validation in order to estimate the statistical performance of a learning operator (usually on unseen data sets). It is mainly used to estimate how accurately a model (learnt by a particular learning operator) will perform in practice.

Description

The X-Validation operator is a nested operator. It has two subprocesses: a training subprocess and a testing subprocess. The training subprocess is used for training a model. The trained model is then applied in the testing subprocess. The performance of the model is also measured during the testing phase.


The input ExampleSet is partitioned into k subsets of equal size. Of the k subsets, a single subset is retained as the testing data set (i.e. input of the testing subprocess), and the remaining k – 1 subsets are used as the training data set (i.e. input of the training subprocess). The cross-validation process is then repeated k times, with each of the k subsets used exactly once as the testing data. The k results from the k iterations can then be averaged (or otherwise combined) to produce a single estimation. The value k can be adjusted using the number of validations parameter.
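The partitioning scheme can be sketched in Python. This is a conceptual sketch; RapidMiner may distribute the one or two leftover examples over the folds differently than shown here.

```python
# Conceptual sketch of k-fold cross-validation with linear sampling:
# each of the k folds serves exactly once as the test set.

def kfold_indices(n, k):
    # distribute n example indices over k folds as evenly as possible
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(n, k):
    folds = kfold_indices(n, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

for train, test in cross_validate(14, 3):
    print(len(train), len(test))  # three rounds: 9 5, 9 5, 10 4
```

With n = 14 and k = 3 the folds are nearly equal in size, as in the tutorial process below.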

Usually the learning process optimizes the model parameters to make the model fit the training data as well as possible. If we then take an independent sample of testing data, it will generally turn out that the model does not fit the testing data as well as it fits the training data. This is called 'over-fitting', and is particularly likely to happen when the size of the training data set is small, or when the number of parameters in the model is large. Cross-validation is a way to predict the fit of a model to a hypothetical testing set when an explicit testing set is not available.

Differentiation

X-Prediction The X-Validation and X-Prediction operators work in the same way. The major difference is the objects returned by these operators. The X-Validation operator returns a performance vector whereas the X-Prediction operator returns a labeled ExampleSet. See page 66 for details.

Input Ports training example set (tra) This input port expects an ExampleSet for training a model (training data set). The same ExampleSet will be used during the testing subprocess for testing the model.


Output Ports model (mod) The training subprocess must return a model, which is trained on the input ExampleSet. Please note that the model built on the complete input ExampleSet is delivered from this port. training example set (tra) The ExampleSet that was given as input at the training input port is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace. averagable (ave) The testing subprocess must return a Performance Vector. This is usually generated by applying the model and measuring its performance. Two such ports are provided but more can also be used if required. Please note that the statistical performance calculated by this estimation scheme is only an estimate (instead of an exact calculation) of the performance which would be achieved with the model built on the complete delivered data set.

Parameters average performances only (boolean) This is an expert parameter which indicates if only performance vectors should be averaged or all types of averagable result vectors. leave one out (boolean) As the name suggests, leave one out cross-validation involves using a single example from the original ExampleSet as the testing data (in the testing subprocess), and the remaining examples as the training data (in the training subprocess). This is repeated such that each example in the ExampleSet is used once as the testing data. Thus, it is repeated 'n' times, where 'n' is the total number of examples in the ExampleSet. This is the same as applying the X-Validation operator with the number of validations parameter set equal to the number of examples in the original ExampleSet. This is usually very expensive for large ExampleSets from a computational point of view, because the training process is repeated once per example. If set to true, the number of validations parameter is ignored.

number of validations (integer) This parameter specifies the number of subsets the ExampleSet should be divided into (each subset has an equal number of examples). The same number of iterations will take place. Each iteration involves training a model and testing that model. If this is set equal to the total number of examples in the ExampleSet, it is equivalent to setting the leave one out parameter to true. sampling type (selection) The X-Validation operator can use several types of sampling for building the subsets. The following options are available:

ˆ linear sampling The Linear sampling simply divides the ExampleSet into partitions without changing the order of the examples i.e. subsets with consecutive examples are created.

ˆ shuffled sampling The Shuffled sampling builds random subsets of the ExampleSet. Examples are chosen randomly for making subsets.

ˆ stratified sampling The Stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole ExampleSet. For example in the case of a binominal classification, Stratified sampling builds random subsets such that each subset contains roughly the same proportions of the two values of the class label. use local random seed (boolean) This parameter indicates if a local random seed should be used for randomizing examples of a subset. Using the same value of the local random seed will produce the same subsets. Changing the value of this parameter changes the way examples are randomized, thus subsets will have a different set of examples. This parameter is available only if Shuffled or Stratified sampling is selected. It is not available for Linear sampling because it requires no randomization; examples are selected in sequence. local random seed (integer) This parameter specifies the local random seed. This parameter is available only if the use local random seed parameter is set to true.
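The leave one out behaviour described above reduces cross-validation to the special case where the number of validations equals the number of examples: each example serves as the test set exactly once. A minimal sketch:

```python
# Conceptual sketch of leave-one-out: n rounds, each holding out one example.

def leave_one_out(n):
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, [i]

rounds = list(leave_one_out(14))
print(len(rounds))   # 14 — one iteration per example
print(rounds[0][1])  # [0] — the first example is the first test set
```

The n training runs are exactly why leave-one-out is computationally expensive for large ExampleSets.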


Related Documents

X-Prediction (66)

Tutorial Processes

Validating Models using X-Validation

The Golf data set is loaded using the Retrieve operator. The Generate ID operator is applied on it to uniquely identify examples. This is done so that you can understand this process easily; otherwise IDs are not required here. A breakpoint is added after this operator so that you can preview the data before the X-Validation operator starts. Double click the X-Validation operator and you will see the training and testing subprocesses. The Decision Tree operator is used in the training subprocess. The trained model (i.e. Decision Tree) is passed to the testing subprocess through the model ports. The testing subprocess receives testing data from the testing port.

Now, have a look at the parameters of the X-Validation operator. The no of validations parameter is set to 3 and the sampling type parameter is set to linear sampling. Remaining parameters have default values. The no of validations is set to 3, which implies that 3 subsets of the Golf data set will be created. You will observe later that these three subsets are created:

ˆ sub1: examples with IDs 1 to 5 (5 examples)

ˆ sub2: examples with IDs 6 to 9 (4 examples)

ˆ sub3: examples with IDs 10 to 14 (5 examples)

You can see that all examples in a subset are consecutive (i.e. with consecutive IDs). This is because linear sampling is used. Also note that all subsets have almost an equal number of elements. An exactly equal number of elements was not possible because 14 examples cannot be divided equally into 3 subsets.

As the no of validations parameter is set to 3, there will be three iterations.

ˆ Iteration 1: A model (decision tree) will be trained on sub2 and sub3 during the training subprocess. The trained model will be applied on sub1 during the testing subprocess.

ˆ Iteration 2: A model (decision tree) will be trained on sub1 and sub3 during the training subprocess. The trained model will be applied on sub2 during the testing subprocess.

ˆ Iteration 3: A model (decision tree) will be trained on sub1 and sub2 during the training subprocess. The trained model will be applied on sub3 during the testing subprocess.

Breakpoints are inserted to make you understand the process. Here is what happens when you run the process:

ˆ First the Golf data set is displayed with all rows uniquely identified using the ID attribute. There are 14 rows with ids 1 to 14. Press the green-colored Run button to continue.

ˆ Now a Decision tree is shown. This was trained from a subset (combination of sub2 and sub3) of the Golf data set. Press the Run button to continue.

ˆ The Decision tree was applied on the testing data. Testing data for this iteration was sub1. Here you can see the results after application of the Decision Tree model. Have a look at the IDs of the testing data here. They are 1 to 5. This means that the tree was trained on the remaining examples i.e. examples with IDs 6 to 14 thus sub2 + sub3. Compare the label and prediction columns and you will see that only 1 prediction out of 5 is correct (only ID 3 has a correct prediction). Press the Run button again.

ˆ Now the Performance Vector of the Decision tree is shown. As only 1 out of 5 predictions was correct, the accuracy is 20%. Press the Run button again.


ˆ Now you can see a different Decision tree. It was trained on another subset of the Golf data set; that is why it is different from the previous decision tree. Keep pressing the Run button and you will see the testing data and the Performance Vector for this tree. This repeats three times because the number of validations parameter was set to 3.

ˆ At the end of the 3 iterations, you will see the Average Performance Vector in the Results Workspace; it averages all the performance vectors. Accuracy was 20%, 50% and 60% respectively in the three iterations. Thus the Average Performance Vector has accuracy = 43.33% (i.e. accuracy = (20+50+60)/3).

You can run the same process with different values of the sampling type parameter. If linear sampling is used, as in our example process, you will see that the IDs of the examples in the subsets will be consecutive values. If shuffled sampling is used you will see that the IDs of the examples in the subsets will be randomized. If stratified sampling is used you will also see randomized IDs but the class distribution in the subsets will be nearly the same as in the whole Golf data set.

Passing results from training to testing process using through ports

This process is similar to the first process, but it applies a weighting operator for selecting a subset of the attributes before applying the Decision Tree operator. The focus of this Example Process is to highlight the usage of through ports.

Please note the use of through ports for transferring objects between the training and testing subprocesses. The results generated during training, which have to be applied in the same way on the test set before the model can be applied, may be passed using the through ports.

Have a look at the subprocesses. In the training subprocess, the Weight by Correlation operator is applied on the training data set. Then the Select by Weights operator is applied on it. Then the Decision Tree operator is applied on it. Note the use of the through ports. The through ports are used to transfer the weights from the training subprocess to the testing subprocess. The Select by Weights operator is applied on the testing data set using the same parameter values as used in the Select by Weights operator of the training subprocess. The Select by Weights operator uses the weights transferred through the through ports.

Bootstrapping Validation

This operator performs a validation after drawing a bootstrap sample of the training data set in order to estimate the statistical performance of a learning operator (usually on unseen data sets). It is mainly used to estimate how accurately a model (learnt by a particular learning operator) will perform in practice.


Description

The Bootstrapping Validation operator is a nested operator. It has two subprocesses: a training subprocess and a testing subprocess. The training subprocess is used for training a model. The trained model is then applied in the testing subprocess. The performance of the model is also measured during the testing phase. The training subprocess must provide a model and the testing subprocess must provide a performance vector.

The input ExampleSet is partitioned into two subsets. One subset is used as the training set and the other one is used as the test set. The size of the two subsets can be adjusted through the sample ratio parameter. The sample ratio parameter specifies the ratio of examples to be used in the training set. The ratio of examples in the testing set is automatically calculated as 1-n where n is the ratio of examples in the training set. The important thing to note here is that this operator performs bootstrapping sampling (explained in the next paragraph) on the training set before training a model. The model is learned on the training set and is then applied on the test set. This process is repeated m times, where m is the value of the number of validations parameter.

Bootstrapping sampling is sampling with replacement. In sampling with replacement, at every step all examples have an equal probability of being selected. Once an example has been selected for the sample, it remains a candidate for selection and can be selected again in later steps. Thus a sample with replacement can contain the same example multiple times. More importantly, a sample with replacement can be used to generate a sample that is greater in size than the original ExampleSet.
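Bootstrapping sampling and the resulting train/test split can be sketched in Python. This is illustrative only, not RapidMiner code; the seed default is an assumption, and the example weights supported by the operator are ignored here.

```python
import random

# Conceptual sketch of bootstrapping validation's split: draw the training
# set with replacement, then use the untouched examples as the test set.

def bootstrap_split(examples, sample_ratio, local_random_seed=1992):
    rng = random.Random(local_random_seed)
    size = int(round(len(examples) * sample_ratio))
    train = [rng.choice(examples) for _ in range(size)]  # may repeat examples
    test = [e for e in examples if e not in train]       # never-sampled examples
    return train, test

data = list(range(14))
train, test = bootstrap_split(data, 0.5)
print(len(train))                         # 7 — duplicates are possible
print(all(e not in train for e in test))  # True — test set is disjoint
```

Because sampling is with replacement, a sample ratio above 1 is also possible, producing a training set larger than the original ExampleSet.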

Usually the learning process optimizes the model parameters to make the model fit the training data as well as possible. If we then take an independent sample of testing data, it will generally turn out that the model does not fit the testing data as well as it fits the training data. This is called 'over-fitting', and is particularly likely to happen when the size of the training data set is small, or when the number of parameters in the model is large. Bootstrapping Validation is a way to predict the fit of a model to a hypothetical testing set when an explicit testing set is not available.

Differentiation

Split Validation Its validation subprocess executes just once. It provides linear, shuffled and stratified sampling. See page 891 for details. X-Validation The input ExampleSet is partitioned into k subsets of equal size. Of the k subsets, a single subset is retained as the testing data set (i.e. input of the testing subprocess), and the remaining k – 1 subsets are used as the training data set (i.e. input of the training subprocess). The cross-validation process is then repeated k times, with each of the k subsets used exactly once as the testing data. The k results from the k iterations can then be averaged (or otherwise combined) to produce a single estimation. See page 896 for details.

Input Ports training (tra) This input port expects an ExampleSet for training a model (train- ing data set). The same ExampleSet will be used during the testing subprocess for testing the model.

Output Ports model (mod) The training subprocess must return a model, which is trained on the input ExampleSet. Please note that the model built on the complete input ExampleSet is delivered from this port. training (tra) The ExampleSet that was given as input at the training input port is passed without changing to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace. averagable (ave) The testing subprocess must return a Performance Vector.


This is usually generated by applying the model and measuring its performance. Two such ports are provided but more can also be used if required. Please note that the statistical performance calculated by this estimation scheme is only an estimate (instead of an exact calculation) of the performance which would be achieved with the model built on the complete delivered data set.

Parameters number of validations (integer) This parameter specifies the number of times the validation should be repeated i.e. the number of times the inner subprocess should be executed. sample ratio (real) This parameter specifies the relative size of the training set. In other validation schemes this parameter should be between 0 and 1, where 1 means that the entire ExampleSet will be used as training set. In this operator its value can be greater than 1 because bootstrapping sampling can generate an ExampleSet with a number of examples greater than the original ExampleSet. All examples that are not selected for the training set are automatically selected for the test set. use weights (boolean) If this parameter is checked, example weights will be used for bootstrapping if such weights are available. average performances only (boolean) This parameter indicates if only performance vectors should be averaged or all types of averagable result vectors. use local random seed (boolean) This parameter indicates if a local random seed should be used for randomizing examples of a subset. Using the same value of the local random seed will produce the same samples. Changing the value of this parameter changes the way examples are randomized, thus samples will have a different set of examples. local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.


Related Documents

Split Validation (891) X-Validation (896)

Tutorial Processes

Validating Models using Bootstrapping Validation

The 'Golf' data set is loaded using the Retrieve operator. The Generate ID operator is applied on it to uniquely identify examples. This is done so that you can understand this process easily; otherwise IDs are not required here. A breakpoint is added after this operator so that you can preview the data before the application of the Bootstrapping Validation operator. You can see that the ExampleSet has 14 examples with IDs from 1 to 14. Double click the Bootstrapping Validation operator and you will see the training and testing subprocesses. The Decision Tree operator is used in the training subprocess. The trained model (i.e. Decision Tree) is passed to the testing subprocess through the model ports. The testing subprocess receives testing data from the testing port.

Now, have a look at the parameters of the Bootstrapping Validation operator. The no of validations parameter is set to 2, thus the inner subprocess will execute just twice. The sample ratio parameter is set to 0.5. The number of examples in the ExampleSet is 14 and the sample ratio is 0.5, thus the training set will be composed of 7 (i.e. 14 x 0.5) examples. These examples are not necessarily unique because bootstrapping sampling can select an example multiple times. All the examples that are not selected for the training set automatically become part of the testing set. You can verify this by running the process. You will see that the training set has 7 examples, but they are not all unique, and all the examples that were not part of the training set are part of the testing set.


Performance

This operator is used for performance evaluation. This operator delivers a list of performance criteria values. These performance criteria are automatically determined in order to fit the learning task type.

Description

In contrast to the other performance evaluation operators like the Performance (Classification) operator, the Performance (Binominal Classification) operator or the Performance (Regression) operator, this operator can be used for all types of learning tasks. It automatically determines the learning task type and calculates the most common criteria for that type. For more sophisticated performance calculations, you should use the operators mentioned above. If none of them meets your requirements, you can use the Performance (User-Based) operator which allows you to write your own performance measure.

The following criteria are added for binominal classification tasks:

- Accuracy
- Precision
- Recall
- AUC (optimistic)
- AUC (neutral)
- AUC (pessimistic)

The following criteria are added for polynominal classification tasks:

- Accuracy
- Kappa statistic

The following criteria are added for regression tasks:

- Root Mean Squared Error
- Mean Squared Error

Input Ports

labeled data (lab) This input port expects a labeled ExampleSet. The Apply Model operator is a good example of an operator that provides labeled data. Make sure that the ExampleSet has a label attribute and a prediction attribute. See the Set Role operator for more details.

performance (per) This port is optional. It expects a Performance Vector.

Output Ports

performance (per) This port delivers a Performance Vector (called the output-performance-vector here). A Performance Vector is a list of performance criteria values. The output-performance-vector contains the performance criteria calculated by this Performance operator (called the calculated-performance-vector here). If a Performance Vector was also fed into the performance input port (called the input-performance-vector here), the criteria of the input-performance-vector are added to the output-performance-vector as well. If the input-performance-vector and the calculated-performance-vector both contain the same criterion but with different values, the value of the calculated-performance-vector is delivered through the output port. This concept can be easily understood by studying the attached Example Process.

example set (exa) The ExampleSet that was given as input is passed to the output of this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

use example weights (boolean) This parameter allows example weights to be used for performance calculations where possible. It has no effect if no attribute has the weight role. In order to take example weights into account, the ExampleSet should have an attribute with the weight role. Several operators are available that assign weights, e.g. the Generate Weights operator. See the Set Role operator for more information regarding the weight role.

Tutorial Processes

Assessing the performance of a prediction

This process is composed of two Subprocess operators and one Performance operator. Double click on the first Subprocess operator and you will see the operators within this subprocess. The first subprocess 'Subprocess (labeled data provider)' loads the 'Golf' data set using the Retrieve operator and then learns a classification model using the k-NN operator. The learned model is then applied on the 'Golf-Testset' data set using the Apply Model operator. Then the Generate Weight operator is used to add an attribute with the weight role. Thus, this subprocess provides a labeled ExampleSet with a weight attribute. A breakpoint is inserted after this subprocess to show this ExampleSet. This ExampleSet is provided at the labeled data input port of the Performance operator in the main process.

The second Subprocess operator 'Subprocess (performance vector provider)' loads the 'Golf' data set using the Retrieve operator and then learns a classification model using the k-NN operator. The learned model is then applied on the 'Golf' data set using the Apply Model operator. Then the Performance (Classification) operator is applied on the labeled data to produce a Performance Vector. A breakpoint is inserted after this subprocess to show this Performance Vector. Note that this model was trained and tested on the same data set (the 'Golf' data set), so its accuracy is 100%. Thus this subprocess provides a Performance Vector with 100% accuracy and 0.00% classification error. This Performance Vector is connected to the performance input port of the Performance operator in the main process.

When you run the process, you will first see an ExampleSet, which is the output of the first Subprocess operator. Press the Run button again and you will see a Performance Vector. This is the output of the second Subprocess operator. Press the Run button again and you will see various criteria in the criterion selector window in the Results Workspace. These include classification error, accuracy, precision, recall, AUC (optimistic), AUC and AUC (pessimistic). Now select accuracy from the criterion selector window; its value is 71.43%. In contrast, the accuracy of the input Performance Vector provided by the second subprocess was 100%. The accuracy of the final Performance Vector is 71.43% instead of 100% because if the input Performance Vector and the calculated Performance Vector both have the same criterion but with different values, the value of the calculated Performance Vector is delivered through the output port. Also note that the classification error criterion is added to the criteria list because of the Performance Vector provided at the performance input port. Disable the second Subprocess operator and run the same process again; you will see that the classification error criterion does not appear now. This is because if a Performance Vector is fed into the performance input port, its criteria are also added to the output Performance Vector.

Accuracy is calculated by taking the percentage of correct predictions over the total number of examples. A correct prediction is an example where the value of the prediction attribute is equal to the value of the label attribute. If you look at the ExampleSet in the Results Workspace, you can see that there are 14 examples in this data set. 10 out of 14 examples are correct predictions, i.e. their label and prediction attributes have the same values. This is why the accuracy was 71.43% (10 x 100 / 14 = 71.43%). Now run the same process again, but this time set the use example weights parameter to true. Check the results again. They have changed now because the weight of each example was taken into account this time. The accuracy is 68.89% this time. If you divide the total weight of the correct predictions by the total weight, you get the same answer (0.6889 x 100 / 1 = 68.89%). In this Example Process, using weights reduced the accuracy, but this is not always the case.
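The two accuracy figures above follow one formula. The sketch below is an illustration, not RapidMiner's code: plain accuracy is simply the weighted variant with all weights equal to one.

```python
def accuracy(labels, predictions, weights=None):
    """Sum the weights of the correct predictions and divide by the
    total weight; with unit weights this is the plain percentage."""
    if weights is None:
        weights = [1.0] * len(labels)              # 'use example weights' = false
    correct = sum(w for y, p, w in zip(labels, predictions, weights) if y == p)
    return correct / sum(weights)

labels = ["yes"] * 10 + ["no"] * 4                 # 14 examples, 10 predicted correctly
preds  = ["yes"] * 10 + ["yes"] * 4
print(round(accuracy(labels, preds) * 100, 2))     # 71.43
```

With a weight attribute present and use example weights set to true, passing the example weights to the same function yields the weighted figure instead.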

Note: This Example Process is just for highlighting different perspectives of the Performance operator. It may not be very useful in real scenarios.

Extract Performance

This operator can be used for deriving a performance measure (in the form of a performance vector) from the given ExampleSet.


Description

This operator can be used for generating a performance vector from the properties of the given ExampleSet. This includes properties like the number of examples or the number of attributes of the input ExampleSet. A specific data value of the input ExampleSet can also be used as the value of the performance vector. Various statistical properties of the input ExampleSet, e.g. the average, minimum or maximum value of an attribute, can also be used as the value of the performance vector. All these options can be understood by studying the parameters and the attached Example Process.

Input Ports

example set (exa) This input port expects an ExampleSet. The performance vector value will be extracted from this ExampleSet.

Output Ports

performance (per) This port delivers a performance vector. A performance vector is a list of performance criteria values.

example set (exa) The ExampleSet that was given as input is passed to the output of this port without changes. This is usually used to reuse the same ExampleSet in further operators.

Parameters

performance type (selection) This parameter indicates the way the input ExampleSet should be used to define the performance vector.

- number of examples If this option is selected, the performance vector value is set to the total number of examples in the input ExampleSet.

- number of attributes If this option is selected, the performance vector value is set to the total number of attributes in the input ExampleSet.

- data value If this option is selected, the performance vector value is set to the value of the specified attribute at the specified index. The attribute is specified using the attribute name parameter and the index is specified using the example index parameter.

- statistics If this option is selected, the performance vector value is set to the value obtained by applying the selected statistical operation to the specified attribute. The attribute is specified using the attribute name parameter and the statistical operation is selected using the statistics parameter.

statistics (selection) This parameter is only available when the performance type parameter is set to 'statistics'. It allows you to select the statistical operation to be applied to the attribute specified by the attribute name parameter.

attribute name (string) This parameter is only available when the performance type parameter is set to 'statistics' or 'data value'. It allows you to select the required attribute.

attribute value (string) This parameter is only available when the performance type parameter is set to 'statistics' and the statistics parameter is set to 'count'. It is used for specifying a particular value of the specified attribute. The performance vector value will be set to the number of occurrences of this value in the specified attribute. The attribute is specified by the attribute name parameter.

example index (integer) This parameter is only available when the performance type parameter is set to 'data value'. It allows you to select the index of the required example of the attribute specified by the attribute name parameter.

optimization direction (selection) This parameter indicates if the performance value should be minimized or maximized.


Tutorial Processes

Introduction to the Extract Performance operator

This is a very basic process that demonstrates the use of the Extract Performance operator. The 'Golf' data set is loaded using the Retrieve operator. The Extract Performance operator is applied on it. The performance type parameter is set to 'statistics', the statistics parameter is set to 'average' and the attribute name parameter is set to 'Temperature'. Thus the value of the resultant performance vector will be the average of the values of the Temperature attribute. The average of the Temperature attribute over all 14 examples of the 'Golf' data set is 73.571. The resultant performance vector and the 'Golf' data set can be seen in the Results Workspace. You can see that the value of the performance vector is 73.571.
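As a sketch of what the 'statistics' performance type does, the snippet below reduces one attribute of a small example table to a single value. The 14 Temperature values listed are reproduced here for illustration (they are assumed to be the Temperature column of the 'Golf' sample set, whose average matches the 73.571 quoted above); the function name is made up.

```python
def extract_performance(rows, attribute, statistic="average"):
    """Reduce one attribute of the example table to a single
    performance value, as the 'statistics' performance type does."""
    values = [row[attribute] for row in rows]
    if statistic == "average":
        return sum(values) / len(values)
    if statistic == "min":
        return min(values)
    if statistic == "max":
        return max(values)
    raise ValueError("unknown statistic: " + statistic)

temps = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
rows = [{"Temperature": t} for t in temps]
print(round(extract_performance(rows, "Temperature"), 3))   # 73.571
```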

Combine Performances

This operator takes a performance vector as input and returns a performance vector containing the weighted fitness value of the specified criteria.


Description

This Combine Performances operator takes a performance vector as input and returns a performance vector containing the weighted fitness value of the specified criteria. The user can specify the weights of the different criteria. This operator takes the weighted average of the values of the specified criteria. It should be noted that some criteria values are considered positive by this operator, e.g. accuracy. On the other hand, some criteria values (usually error related) are considered negative by this operator, e.g. relative error. Please study the attached Example Process for a better understanding of this operator.

Input Ports

performance (per) This port expects a performance vector. A performance vector is a list of performance criteria values.

Output Ports

performance (per) The performance vector containing the weighted fitness value of the specified criteria is returned through this port.

Parameters

default weight (real) This parameter specifies the default weight for all criteria that are not assigned a weight through the criteria weights parameter.

criteria weights (list) Different performance criteria can be assigned different weights through this parameter. The criteria that are not assigned a weight through this parameter will have the default weight (i.e. the weight specified by the default weight parameter).


Tutorial Processes

Introduction to the Combine Performances operator

This Example Process starts with the Subprocess operator. The subprocess is used for generating a sample performance vector. Therefore it is not necessary to understand the operators in the subprocess. A breakpoint is inserted after the Subprocess operator so that you can have a look at the performance vector. The performance vector has the following criteria values:

- Accuracy: 0.250
- Absolute error: 0.750
- Root mean squared error: 0.866

It is important to note that the accuracy is considered positive and the remaining two criteria are considered negative in the calculations by the Combine Performances operator.

The Combine Performances operator is applied on this performance vector. Have a look at the criteria weights parameter of the Combine Performances operator. The following weights are assigned to criteria:

- Accuracy: 2.0
- Absolute error: 1.0
- Root mean squared error: 0.0

The weighted fitness value is calculated by multiplying each weight by the corresponding value and finally averaging the results. In this case the following calculation is performed:

(2(0.250) + 1(-0.750) + 0(0.866)) / 3
= (0.500 - 0.750 + 0.000) / 3
= -0.083
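The calculation above can be written out as a small sketch. This is an illustration of the weighted fitness, not the operator's actual code; the error-type criteria enter with a negative sign before the weighted average is taken.

```python
def combine_performances(criteria, weights, default_weight=0.0):
    """Average the weighted criterion values; values flagged as
    errors are counted negatively."""
    total = 0.0
    for name, (value, is_error) in criteria.items():
        w = weights.get(name, default_weight)
        total += w * (-value if is_error else value)
    return total / len(criteria)

criteria = {
    "accuracy":                (0.250, False),   # considered positive
    "absolute_error":          (0.750, True),    # considered negative
    "root_mean_squared_error": (0.866, True),    # considered negative
}
weights = {"accuracy": 2.0, "absolute_error": 1.0, "root_mean_squared_error": 0.0}
print(round(combine_performances(criteria, weights), 3))   # -0.083
```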

Performance (Classification)

This operator is used for statistical performance evaluation of classification tasks. It delivers a list of performance criteria values of the classification task.

Description

This operator should be used for performance evaluation of classification tasks only. Many other performance evaluation operators are also available in RapidMiner, e.g. the Performance operator, the Performance (Binominal Classification) operator, the Performance (Regression) operator etc. The Performance (Classification) operator is used with classification tasks only. In contrast, the Performance operator automatically determines the learning task type and calculates the most common criteria for that type. You can use the Performance (User-Based) operator if you want to write your own performance measure.

Classification is a technique used to predict group membership for data instances. For example, you may wish to use classification to predict whether the train on a particular day will be 'on time', 'late' or 'very late'. Predicting whether the number of people at a particular event will be 'below-average', 'average' or 'above-average' is another example. For evaluating the statistical performance of a classification model the data set should be labeled, i.e. it should have an attribute with the label role and an attribute with the prediction role. The label attribute stores the actual observed values whereas the prediction attribute stores the values of the label predicted by the classification model under discussion.

Input Ports

labeled data (lab) This input port expects a labeled ExampleSet. The Apply Model operator is a good example of an operator that provides labeled data. Make sure that the ExampleSet has a label attribute and a prediction attribute. See the Set Role operator for more details regarding the label and prediction roles of attributes.

performance (per) This port is optional. It expects a Performance Vector.

Output Ports

performance (per) This port delivers a Performance Vector (called the output-performance-vector here). The Performance Vector is a list of performance criteria values. It is calculated on the basis of the label attribute and the prediction attribute of the input ExampleSet. The output-performance-vector contains the performance criteria calculated by this Performance operator (called the calculated-performance-vector here). If a Performance Vector was also fed into the performance input port (called the input-performance-vector here), the criteria of the input-performance-vector are added to the output-performance-vector as well. If the input-performance-vector and the calculated-performance-vector both have the same criterion but with different values, the value of the calculated-performance-vector is delivered through the output port. This concept can be easily understood by studying the attached Example Process.

example set (exa) The ExampleSet that was given as input is passed to the output of this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

Parameters

main criterion The main criterion is used for comparisons and needs to be specified only for processes where performance vectors are compared, e.g. attribute selection or other meta optimization process setups. If no main criterion is selected, the first criterion in the resulting performance vector will be assumed to be the main criterion.

accuracy (boolean) Relative number of correctly classified examples, in other words the percentage of correct predictions.

classification error (boolean) Relative number of misclassified examples, in other words the percentage of incorrect predictions.

kappa (boolean) The kappa statistic for the classification. It is generally thought to be a more robust measure than the simple percentage of correct predictions since it takes into account correct predictions occurring by chance.

weighted mean recall (boolean) The weighted mean of all per class recall measurements. It is calculated from the class recalls of the individual classes. Class recalls are shown in the last row of the matrix displayed in the Results Workspace.

weighted mean precision (boolean) The weighted mean of all per class precision measurements. It is calculated from the class precisions of the individual classes. Class precisions are shown in the last column of the matrix displayed in the Results Workspace.

spearman rho (boolean) The rank correlation between the actual and predicted labels, using Spearman's rho. Spearman's rho is a measure of the monotonic relationship between two variables. The two variables in this case are the label attribute and the prediction attribute.

kendall tau (boolean) The rank correlation between the actual and predicted labels, using Kendall's tau. Kendall's tau is a measure of correlation, and so measures the strength of the relationship between two variables. The two variables in this case are the label attribute and the prediction attribute.

absolute error (boolean) Average absolute deviation of the prediction from the actual value. The values of the label attribute are the actual values.

relative error (boolean) Average relative error is the average of the absolute deviation of the prediction from the actual value divided by the actual value. The values of the label attribute are the actual values.

relative error lenient (boolean) Average lenient relative error is the average of the absolute deviation of the prediction from the actual value divided by the maximum of the actual value and the prediction. The values of the label attribute are the actual values.

relative error strict (boolean) Average strict relative error is the average of the absolute deviation of the prediction from the actual value divided by the minimum of the actual value and the prediction. The values of the label attribute are the actual values.

normalized absolute error (boolean) The absolute error divided by the error that would have been made if the average had been predicted.

root mean squared error (boolean) The averaged root-mean-squared error.

root relative squared error (boolean) The averaged root-relative-squared error.

squared error (boolean) The averaged squared error.

correlation (boolean) Returns the correlation coefficient between the label and prediction attributes.

squared correlation (boolean) Returns the squared correlation coefficient between the label and prediction attributes.

cross entropy (boolean) The cross-entropy of a classifier, defined as the sum over the logarithms of the true label's confidences divided by the number of examples.

margin (boolean) The margin of a classifier, defined as the minimal confidence for the correct label.

soft margin loss (boolean) The average soft margin loss of a classifier, defined as the average of all 1 - confidences for the correct label.

logistic loss (boolean) The logistic loss of a classifier, defined as the average of ln(1 + exp(-conf(CC))) where 'conf(CC)' is the confidence of the correct class.

skip undefined labels (boolean) If set to true, examples with undefined labels are skipped.

comparator class (string) This is an expert parameter. The fully qualified classname of the PerformanceComparator implementation is specified here.

use example weights (boolean) This parameter allows example weights to be used for statistical performance calculations if possible. It has no effect if no attribute has the weight role. In order to take example weights into account, the ExampleSet should have an attribute with the weight role. Several operators are available that assign weights, e.g. the Generate Weights operator. See the Set Role operator for more information regarding the weight role.

class weights This is an expert parameter. It specifies the weights 'w' for all classes. The Edit List button opens a new window with two columns. The first column specifies the class name and the second column specifies the weight for that class. If the weight of a class is not specified, that class is assigned a weight of 1.
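Several of the error criteria listed above share one pattern: average some normalization of the absolute deviation between the actual and the predicted value. A short sketch (illustrative only, not RapidMiner's implementation; the function name is made up):

```python
def error_criteria(actual, predicted):
    """Compute a few of the error criteria listed above from the
    actual (label) values and the predicted values."""
    n = len(actual)
    abs_errs = [abs(a - p) for a, p in zip(actual, predicted)]
    # absolute error: plain mean of the absolute deviations
    absolute_error = sum(abs_errs) / n
    # relative error: each deviation divided by the actual value
    relative_error = sum(e / abs(a) for e, a in zip(abs_errs, actual)) / n
    # lenient variant: divide by the larger of actual and prediction
    relative_error_lenient = sum(
        e / max(abs(a), abs(p)) for e, a, p in zip(abs_errs, actual, predicted)) / n
    root_mean_squared_error = (sum(e * e for e in abs_errs) / n) ** 0.5
    return absolute_error, relative_error, relative_error_lenient, root_mean_squared_error

# actual values 2 and 4, predictions 1 and 6 -> absolute deviations 1 and 2
print(error_criteria([2.0, 4.0], [1.0, 6.0]))
```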

Tutorial Processes

Use of performance port in Performance (Classification)

This Example Process is composed of two Subprocess operators and one Performance (Classification) operator. Double click on the first Subprocess operator and you will see the operators within this subprocess. The first subprocess 'Subprocess (labeled data provider)' loads the 'Golf' data set using the Retrieve operator and then learns a classification model using the k-NN operator. The learned model is then applied on the 'Golf-Testset' data set using the Apply Model operator. Then the Generate Weight operator is used to add an attribute with the weight role. Thus, this subprocess provides a labeled ExampleSet with a weight attribute. A breakpoint is inserted after this subprocess to show this ExampleSet. This ExampleSet is provided at the labeled data input port of the Performance (Classification) operator in the main process.

The second Subprocess operator 'Subprocess (performance vector provider)' loads the 'Golf' data set using the Retrieve operator and then learns a classification model using the k-NN operator. The learned model is then applied on the 'Golf' data set using the Apply Model operator. Then the Performance (Classification) operator is applied on the labeled data to produce a Performance Vector. A breakpoint is inserted after this subprocess to show this Performance Vector. Note that this model was trained and tested on the same data set (the 'Golf' data set), so its accuracy is 100%. Thus this subprocess provides a Performance Vector with 100% accuracy and 0.00% classification error. This Performance Vector is connected to the performance input port of the Performance (Classification) operator in the main process.

When you run the process, you will first see an ExampleSet, which is the output of the first Subprocess operator. Press the Run button again and now you will see a Performance Vector. This is the output of the second Subprocess operator. Press the Run button again and you will see various criteria in the criterion selector window in the Results Workspace. These include classification error, accuracy, weighted mean recall and weighted mean precision. Now select accuracy from the criterion selector window; its value is 71.43%. In contrast, the accuracy of the input Performance Vector provided by the second subprocess was 100%. The accuracy of the final Performance Vector is 71.43% instead of 100% because if the input-performance-vector and the calculated-performance-vector both have the same criterion but with different values, the value of the calculated-performance-vector is delivered through the output port. Also note that the classification error criterion is added to the criteria list because of the Performance Vector provided at the performance input port. Disable the second Subprocess operator and run the same process again; you will see that the classification error criterion does not appear now. This is because if a Performance Vector is fed into the performance input port, its criteria are also added to the output-performance-vector.

The accuracy is calculated by taking the percentage of correct predictions over the total number of examples. A correct prediction is an example where the value of the prediction attribute is equal to the value of the label attribute. If you look at the ExampleSet in the Results Workspace, you can see that there are 14 examples in this data set. 10 out of 14 examples are correct predictions, i.e. their label and prediction attributes have the same values. This is why the accuracy was 71.43% (10 x 100 / 14 = 71.43%). Now run the same process again, but this time set the use example weights parameter to true. Check the results again. They have changed now because the weight of each example was taken into account this time. The accuracy is 68.89% this time. If you divide the total weight of the correct predictions by the total weight, you get the same answer (0.6889 x 100 / 1 = 68.89%). In this Example Process, using weights reduced the accuracy, but this is not always the case.

The weighted mean recall is calculated by taking the average of the recalls of all classes. As you can see in the last row of the resultant matrix in the Results Workspace, the class recall for 'true no' is 60% and the class recall for 'true yes' is 77.78%. Thus the weighted mean recall is calculated by taking the average of these class recall values ((77.78% + 60%) / 2 = 68.89%).

The weighted mean precision is calculated by taking the average of the precisions of all classes. As you can see in the last column of the resultant matrix in the Results Workspace, the class precision for 'pred. no' is 60% and the class precision for 'pred. yes' is 77.78%. Thus the weighted mean precision is calculated by taking the average of these class precision values ((77.78% + 60%) / 2 = 68.89%). These values are for the case when the use example weights parameter is set to false.
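Both means can be reproduced from the label and prediction columns alone. The sketch below is an illustration only (with uniform example weights the 'weighted' means reduce to plain averages over the classes); the label/prediction pair is a hypothetical one with the same confusion counts as the process above:

```python
def mean_recall_precision(labels, predictions):
    """Per-class recall = correct / actual class size; per-class
    precision = correct / predicted class size; then average
    these values over the classes."""
    classes = sorted(set(labels))
    recalls, precisions = [], []
    for c in classes:
        correct   = sum(1 for y, p in zip(labels, predictions) if y == c and p == c)
        actual    = sum(1 for y in labels if y == c)
        predicted = sum(1 for p in predictions if p == c)
        recalls.append(correct / actual)
        precisions.append(correct / predicted)
    return sum(recalls) / len(classes), sum(precisions) / len(classes)

# 5 'no' examples (3 classified correctly), 9 'yes' examples (7 correctly)
labels = ["no"] * 5 + ["yes"] * 9
preds  = ["no"] * 3 + ["yes"] * 2 + ["no"] * 2 + ["yes"] * 7
recall, precision = mean_recall_precision(labels, preds)
print(round(recall * 100, 2), round(precision * 100, 2))   # 68.89 68.89
```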

Note: This Example Process is just for highlighting different perspectives of the Performance (Classification) operator. It may not be very useful in real scenarios.


Performance (Binominal Classification)

This operator is used for statistical performance evaluation of binominal classification tasks, i.e. classification tasks where the label attribute has a binominal type. It delivers a list of performance criteria values of the binominal classification task.

Description

This operator should be used specifically for performance evaluation of binominal classification tasks only, i.e. classification tasks where the label attribute has a binominal type. Many other performance evaluation operators are also available in RapidMiner, e.g. the Performance operator, the Performance (Classification) operator, the Performance (Regression) operator etc. The Performance (Binominal Classification) operator is used with binominal classification tasks only. In contrast, the Performance operator automatically determines the learning task type and calculates the most common criteria for that type. You can use the Performance (User-Based) operator if you want to write your own performance measure.

Classification is a technique used to predict group membership for data instances. For example, you may wish to use classification to predict whether the train on a particular day will be 'on time', 'late' or 'very late'. Predicting whether the number of people at a particular event will be 'below-average', 'average' or 'above-average' is another example. For evaluating the statistical performance of a classification model the data set should be labeled, i.e. it should have an attribute with the label role and an attribute with the prediction role. The label attribute stores the actual observed values whereas the prediction attribute stores the values of the label predicted by the classification model under discussion.


Input Ports

labeled data (lab) This input port expects a labeled ExampleSet. The Apply Model operator is a good example of an operator that provides labeled data. Make sure that the ExampleSet has a label attribute and a prediction attribute. See the Set Role operator for more details regarding the label and prediction roles of attributes. Also make sure that the label attribute of the ExampleSet is of binominal type, i.e. the label has only two possible values.

performance (per) This port is optional. It expects a Performance Vector.

Output Ports

performance (per) This port delivers a Performance Vector (called the output-performance-vector here). A Performance Vector is a list of performance criteria values. It is calculated on the basis of the label and the prediction attribute of the input ExampleSet. The output-performance-vector contains the performance criteria calculated by this Performance operator (called the calculated-performance-vector here). If a Performance Vector was also fed into the performance input port (called the input-performance-vector here), the criteria of the input-performance-vector are added to the output-performance-vector as well. If the input-performance-vector and the calculated-performance-vector both have the same criterion but with different values, the value of the calculated-performance-vector is delivered through the output port. This concept can be easily understood by studying the attached Example Process.

example set (exa) The ExampleSet that was given as input is passed to the output of this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

main criterion The main criterion is used for comparisons and needs to be specified only for processes where performance vectors are compared, e.g. attribute selection or other meta optimization process setups. If no main criterion is selected, the first criterion in the resulting performance vector will be assumed to be the main criterion.

accuracy (boolean) The relative number of correctly classified examples, in other words the percentage of correct predictions.

classification error (boolean) The relative number of misclassified examples, in other words the percentage of incorrect predictions.

kappa (boolean) The kappa statistic for the classification. It is generally considered a more robust measure than the simple percentage of correct predictions because it takes into account correct predictions occurring by chance.

AUC (optimistic) (boolean) AUC is the Area Under the Curve of the Receiver Operating Characteristic (ROC) graph, which is a technique for visualizing, organizing and selecting classifiers based on their performance. Given example weights are also considered. Please note that the second class is considered to be positive. AUC (optimistic) is the extreme case where all the positives end up at the beginning of the sequence.

AUC (boolean) AUC is the Area Under the Curve of the Receiver Operating Characteristic (ROC) graph, which is a technique for visualizing, organizing and selecting classifiers based on their performance. Given example weights are also considered. Please note that the second class is considered to be positive. Usually AUC is the average of AUC (pessimistic) and AUC (optimistic).

AUC (pessimistic) (boolean) AUC is the Area Under the Curve of the Receiver Operating Characteristic (ROC) graph, which is a technique for visualizing, organizing and selecting classifiers based on their performance. Given example weights are also considered. Please note that the second class is considered to be positive. AUC (pessimistic) is the extreme case where all the negatives end up at the beginning of the sequence.

precision (boolean) The relative number of correctly classified positive examples among all examples classified as positive, i.e. precision = (Positives Correctly Classified)/(Total Predicted Positives). Note that Total Predicted Positives is the sum of True Positives and False Positives. This is the same as the positive predictive value.

recall (boolean) The relative number of correctly classified positive examples among all positive examples, i.e. recall = (Positives Correctly Classified)/(Total Positives). It is also called hit rate or true positive rate. This is the same as sensitivity.

lift (boolean) This parameter specifies the lift of the positive class.

fallout (boolean) The relative number of negative examples incorrectly classified as positive among all negative examples, i.e. fallout = (Positives Incorrectly Classified)/(Total Negatives).

f measure (boolean) A combination of precision and recall, i.e. f = 2pr/(p+r) where f, r and p are the f-measure, recall and precision respectively.

false positive (boolean) The absolute number of negative examples that were incorrectly classified as positive. In other words, if an example is negative and it is classified as positive, it is counted as a false positive.

false negative (boolean) The absolute number of positive examples that were incorrectly classified as negative. In other words, if an example is positive and it is classified as negative, it is counted as a false negative.

true positive (boolean) The absolute number of positive examples that were correctly classified as positive.

true negative (boolean) The absolute number of negative examples that were correctly classified as negative.

sensitivity (boolean) The relative number of correctly classified positive examples among all positive examples, i.e. sensitivity = (Positives Correctly Classified)/(Total Positives). It is also called hit rate or true positive rate. This is the same as recall.

specificity (boolean) The relative number of correctly classified negative examples among all negative examples, i.e. specificity = (Negatives Correctly Classified)/(Total Negatives). Note that Total Negatives is the sum of True Negatives and False Positives.

youden (boolean) The sum of sensitivity and specificity minus 1.

positive predictive value (boolean) The relative number of correctly classified positive examples among all examples classified as positive, i.e. positive predictive value = (Positives Correctly Classified)/(Total Predicted Positives). Note that Total Predicted Positives is the sum of True Positives and False Positives. This is the same as precision.

negative predictive value (boolean) The relative number of correctly classified negative examples among all examples classified as negative, i.e. negative predictive value = (Negatives Correctly Classified)/(Total Predicted Negatives). Note that Total Predicted Negatives is the sum of True Negatives and False Negatives.

psep (boolean) The sum of the positive predictive value and the negative predictive value minus 1, i.e. psep = ppv + npv - 1 where ppv and npv are the positive predictive value and negative predictive value respectively.

skip undefined labels (boolean) If set to true, examples with undefined labels are skipped.

comparator class (string) This is an expert parameter. The fully qualified classname of the PerformanceComparator implementation is specified here.

use example weights (boolean) This parameter allows example weights to be used for statistical performance calculations if possible. It has no effect if no attribute has the weight role; in order to consider example weights the ExampleSet should have an attribute with the weight role. Several operators are available that assign weights, e.g. the Generate Weights operator. Study the Set Role operator for more information regarding the weight role.
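To see how the kappa statistic corrects plain accuracy for agreement occurring by chance, consider this small Python sketch of Cohen's kappa for a two-class confusion matrix (an illustration of the formula only, not RapidMiner's implementation):

```python
# Cohen's kappa for a 2x2 confusion matrix:
# (observed agreement - chance agreement) / (1 - chance agreement)
def kappa(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    p_observed = (tp + tn) / total  # plain accuracy
    # chance agreement: for each class, the product of its marginal
    # proportions (predicted share times actual share), summed up
    p_pos = ((tp + fp) / total) * ((tp + fn) / total)
    p_neg = ((tn + fn) / total) * ((tn + fp) / total)
    p_chance = p_pos + p_neg
    return (p_observed - p_chance) / (1 - p_chance)

# counts from the tutorial process below: TP=7, FP=2, FN=2, TN=3
print(round(kappa(7, 2, 2, 3), 3))  # -> 0.378
```

Although the plain accuracy of this matrix is about 0.714, the chance-corrected kappa is only 0.378, which is why kappa is considered the more robust measure.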


Tutorial Processes

Use of parameters of Performance (Binominal Classification)

The focus of this Example Process is to explain different statistical performance criteria of Binominal Classification. To get an understanding of the use of the performance port you can study the Example Process of the Performance operator or the Example Process of the Performance (Classification) operator.

The 'Golf' data set is loaded using the Retrieve operator. The Nominal to Binominal operator is applied on it to convert the label attribute (i.e. Play) from nominal to binominal type. The K-NN operator is applied on it to generate a classification model. This classification model is applied on the 'Golf-Testset' data set using the Apply Model operator. Note that the Nominal to Binominal operator is applied on the 'Golf-Testset' data set as well to convert the label attribute (i.e. Play) from nominal to binominal type, ensuring that the training data set ('Golf') and the testing data set ('Golf-Testset') are in the same format. The labels were changed to binominal form because the Performance (Binominal Classification) operator can only handle binominal labels. Run the process and you can see the results in the Results Workspace.

Have a good look at the confusion matrix in the Results Workspace. This matrix will be used to explain all the parameters. The rows 'pred. no' and 'pred. yes' contain the examples that were classified as 'no' and as 'yes' respectively. The columns 'true no' and 'true yes' contain the examples that were actually labeled 'no' and actually labeled 'yes' respectively. Here is some information that we can get by simply taking a look at the confusion matrix:

- True Negatives = the examples that were actually labeled 'no' and were classified as 'no' = 3
- False Negatives = the examples that were actually labeled 'yes' and were classified as 'no' = 2
- True Positives = the examples that were actually labeled 'yes' and were classified as 'yes' = 7
- False Positives = the examples that were actually labeled 'no' and were classified as 'yes' = 2
- Total number of examples actually labeled 'no' = Total Negatives = 5 (i.e. 3+2)
- Total number of examples actually labeled 'yes' = Total Positives = 9 (i.e. 7+2)
- Total number of examples classified as 'no' = Total Predicted Negatives = 5 (i.e. 3+2)
- Total number of examples classified as 'yes' = Total Predicted Positives = 9 (i.e. 7+2)
- Total number of examples = 14 (i.e. 2+3+2+7)
- Total number of correct classifications = 10 (i.e. 3+7)
- Total number of incorrect classifications = 4 (i.e. 2+2)

Here is how the different statistical performance criteria are calculated. The terms 'Positive' and 'classified as yes' mean the same thing, as do similar pairs of terms like 'Correctly Classified Positives' and 'True Positives'.

- accuracy = (Total Correct Classifications)/(Total Number of Examples) = 10/14 = 71.43%
- classification error = (Total Incorrect Classifications)/(Total Number of Examples) = 4/14 = 28.57%
- precision = (True Positives)/(Total Predicted Positives) = 7/9 = 77.78%
- recall = (True Positives)/(Total Positives) = 7/9 = 77.78%
- fallout = (False Positives)/(Total Negatives) = 2/5 = 40%
- f-measure = 2pr/(p+r), where r and p are recall and precision respectively = 77.78%
- sensitivity = (True Positives)/(Total Positives) = 7/9 = 77.78%
- specificity = (True Negatives)/(Total Negatives) = 3/5 = 60%
- youden = sensitivity (0.78) + specificity (0.60) - 1 = 0.378
- positive predictive value = (True Positives)/(Total Predicted Positives) = 7/9 = 77.78%
- negative predictive value = (True Negatives)/(Total Predicted Negatives) = 3/5 = 60%
- psep = positive predictive value (0.78) + negative predictive value (0.60) - 1 = 0.378
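The calculations above can be reproduced in a few lines of Python, using the confusion-matrix counts from this tutorial (TP = 7, FP = 2, FN = 2, TN = 3). This is only a sketch of the formulas, not RapidMiner code:

```python
tp, fp, fn, tn = 7, 2, 2, 3          # counts from the 'Golf' confusion matrix
total = tp + fp + fn + tn            # 14 examples

accuracy = (tp + tn) / total                       # 10/14 ~ 71.43%
classification_error = (fp + fn) / total           # 4/14 ~ 28.57%
precision = tp / (tp + fp)                         # 7/9 ~ 77.78%
recall = tp / (tp + fn)                            # 7/9, same as sensitivity
fallout = fp / (fp + tn)                           # 2/5 = 40%
f_measure = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)                       # 3/5 = 60%
youden = recall + specificity - 1                  # ~ 0.378
npv = tn / (tn + fn)                               # negative predictive value
psep = precision + npv - 1                         # ppv equals precision

print(f"accuracy={accuracy:.4f}, f-measure={f_measure:.4f}, youden={youden:.4f}")
```

Because precision and recall happen to be equal here (both 7/9), the f-measure equals them as well.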


Performance (Regression)

This operator is used for statistical performance evaluation of regression tasks and delivers a list of performance criteria values of the regression task.

Description

This operator should be used for performance evaluation of regression tasks only. Many other performance evaluation operators are also available in RapidMiner, e.g. the Performance operator, the Performance (Binominal Classification) operator and the Performance (Classification) operator. The Performance operator, in contrast, automatically determines the learning task type and calculates the most common criteria for that type. You can use the Performance (User-Based) operator if you want to write your own performance measure.

Regression is a technique used for numerical prediction. It is a statistical measure that attempts to determine the strength of the relationship between one dependent variable (i.e. the label attribute) and a series of other changing variables known as independent variables (the regular attributes). Just like classification is used for predicting categorical labels, regression is used for predicting a continuous value. For example, we may wish to predict the salary of university graduates with 5 years of work experience, or the potential sales of a new product given its price. Regression is often used to determine how much specific factors, such as the price of a commodity, interest rates, or particular industries or sectors, influence the price movement of an asset. For evaluating the statistical performance of a regression model the data set should be labeled, i.e. it should have an attribute with the label role and an attribute with the prediction role. The label attribute stores the actual observed values whereas the prediction attribute stores the values predicted by the regression model under discussion.

933 8. Evaluation

Input Ports

labeled data (lab) This input port expects a labeled ExampleSet. The Apply Model operator is a good example of an operator that provides labeled data. Make sure that the ExampleSet has both a label attribute and a prediction attribute. See the Set Role operator for more details regarding the label and prediction roles of attributes.

performance (per) This is an optional port. It expects a Performance Vector.

Output Ports

performance (per) This port delivers a Performance Vector (call it the output-performance-vector for now). The Performance Vector is a list of performance criteria values, calculated on the basis of the label and prediction attributes of the input ExampleSet. The output-performance-vector contains the performance criteria calculated by this Performance operator (the calculated-performance-vector). If a Performance Vector was also fed at the performance input port (the input-performance-vector), its criteria are added to the output-performance-vector as well. If the input-performance-vector and the calculated-performance-vector both have the same criterion but with different values, the value of the calculated-performance-vector is delivered through the output port. This concept can be easily understood by studying the Example Process of the Performance (Classification) operator.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.


Parameters

main criterion The main criterion is used for comparisons and needs to be specified only for processes where performance vectors are compared, e.g. attribute selection or other meta optimization process setups. If no main criterion is selected, the first criterion in the resulting performance vector will be assumed to be the main criterion.

root mean squared error (boolean) The averaged root-mean-squared error.

absolute error (boolean) The average absolute deviation of the prediction from the actual value. The values of the label attribute are the actual values.

relative error (boolean) The average relative error is the average of the absolute deviation of the prediction from the actual value divided by the actual value. The values of the label attribute are the actual values.

relative error lenient (boolean) The average lenient relative error is the average of the absolute deviation of the prediction from the actual value divided by the maximum of the actual value and the prediction. The values of the label attribute are the actual values.

relative error strict (boolean) The average strict relative error is the average of the absolute deviation of the prediction from the actual value divided by the minimum of the actual value and the prediction. The values of the label attribute are the actual values.

normalized absolute error (boolean) The absolute error divided by the error that would have been made if the average had been predicted.

root relative squared error (boolean) The averaged root-relative-squared error.

squared error (boolean) The averaged squared error.

correlation (boolean) Returns the correlation coefficient between the label and prediction attributes.

squared correlation (boolean) Returns the squared correlation coefficient between the label and prediction attributes.

prediction average (boolean) Returns the average of all the predictions. All the predicted values are added and the sum is divided by the total number of predictions.

spearman rho (boolean) The rank correlation between the actual and predicted labels, using Spearman's rho. Spearman's rho is a measure of the linear relationship between two variables, in this case the label and the prediction attribute.

kendall tau (boolean) The rank correlation between the actual and predicted labels, using Kendall's tau-b. Kendall's tau is a measure of correlation that measures the strength of the relationship between two variables, in this case the label and the prediction attribute.

skip undefined labels (boolean) If set to true, examples with undefined labels are skipped.

comparator class (string) This is an expert parameter. The fully qualified classname of the PerformanceComparator implementation is specified here.

use example weights (boolean) This parameter allows example weights to be used for statistical performance calculations if possible. It has no effect if no attribute has the weight role; in order to consider example weights the ExampleSet should have an attribute with the weight role. Several operators are available that assign weights, e.g. the Generate Weights operator. Study the Set Role operator for more information regarding the weight role.

Tutorial Processes

Applying the Performance (Regression) operator on the Polynomial data set

The 'Polynomial' data set is loaded using the Retrieve operator. The Filter Example Range operator is applied on it; its first example parameter is set to 1 and its last example parameter is set to 100. Thus the first 100 examples of the 'Polynomial' data set are selected. The Linear Regression operator is applied on them with default values for all parameters. The regression model generated by the Linear Regression operator is applied on the last 100 examples of the 'Polynomial' data set using the Apply Model operator. Labeled data from the Apply Model operator is provided to the Performance (Regression) operator. The absolute error and prediction average parameters are set to true. Thus the Performance Vector generated by the Performance (Regression) operator has information regarding the absolute error and the prediction average of the labeled data set. The absolute error is calculated by summing the absolute differences between the predicted values and the actual values of the label attribute, and dividing this sum by the total number of predictions. The prediction average is calculated by adding all the actual label values and dividing this sum by the total number of examples. You can verify this from the results in the Results Workspace.
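The two criteria used in this tutorial can be sketched in Python. The four actual/predicted values below are made up for illustration and are not taken from the 'Polynomial' data set:

```python
# absolute error and prediction average, as described above
actual = [2.0, 4.0, 6.0, 8.0]        # label attribute (hypothetical values)
predicted = [2.5, 3.0, 6.5, 9.0]     # prediction attribute (hypothetical)
n = len(actual)

# sum of absolute differences, divided by the number of predictions
absolute_error = sum(abs(p - a) for p, a in zip(predicted, actual)) / n

# sum of the actual label values, divided by the number of examples
prediction_average = sum(actual) / n

print(absolute_error, prediction_average)  # -> 0.75 5.0
```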

Performance (Costs)

This operator provides the ability to evaluate misclassification costs for performance evaluation of classification tasks.


Description

The Performance (Costs) operator provides the ability to evaluate misclassification costs. A cost matrix should be specified through the cost matrix parameter. The cost matrix is similar in structure to a confusion matrix because it has the predicted classes in one dimension and the actual classes in the other. Therefore the cost matrix can denote the costs for every possible classification outcome: predicted label vs. actual label. It is in fact a matrix of misclassification costs because you can specify different weights for certain classes misclassified as other classes. Weights can also be assigned to correct classifications but they are not taken into account when evaluating misclassification costs. The classes in the matrix are labeled Class 1, Class 2 etc., where classes are numbered according to their order in the internal mapping. The class order definition parameter allows you to specify the class order for the matrix, in which case classes are ordered according to the order specified in this parameter (instead of the internal mapping). For a better understanding of this operator please study the attached Example Process.

Input Ports

example set (exa) This input port expects a labeled ExampleSet. The Apply Model operator is a good example of an operator that provides labeled data. Make sure that the ExampleSet has label and prediction attributes. Please see the Set Role operator for more details about attribute roles.

Output Ports

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

performance (per) This port delivers a Performance Vector which has information about the misclassification costs.

Parameters

cost matrix (string) This parameter is used for specifying the cost matrix. The cost matrix is similar in structure to a confusion matrix because it has the predicted classes in one dimension and the actual classes in the other, so it can denote the costs for every possible classification outcome: predicted label vs. actual label. It is in fact a matrix of misclassification costs because you can specify different weights for certain classes misclassified as other classes. Weights can also be assigned to correct classifications but they are not taken into account when evaluating misclassification costs. The classes in the matrix are labeled Class 1, Class 2 etc., where classes are numbered according to their order in the internal mapping. The class order definition parameter can be used for specifying the class order for the matrix (instead of the internal mapping).

class order definition (enumeration) This parameter allows you to specify the class order for the cost matrix, in which case classes are ordered according to the order specified in this parameter (instead of the internal mapping).

Tutorial Processes

Measuring Misclassification costs of a classifier

The 'Golf' data set is loaded using the Retrieve operator. The Split Validation operator is applied on it for training and testing a classification model. The Naive Bayes operator is applied in the training subprocess of the Split Validation operator. The Naive Bayes operator trains a classification model. The Apply Model operator is used in the testing subprocess to apply this model. A breakpoint is inserted here so that you can have a look at the labeled ExampleSet. As you can see, out of 4 examples of the testing data set only 1 has been misclassified. The misclassified example was classified as 'class = yes' while actually it was 'class = no'.

The resultant labeled ExampleSet is used by the Performance (Costs) operator for measuring the misclassification costs of the model. Have a look at the parameters of the Performance (Costs) operator. The class order definition parameter specifies the order of classes in the cost matrix. The classes 'yes' and 'no' are placed in the first and second rows respectively. Thus class 'yes' is Class 1 and class 'no' is Class 2 in the cost matrix. Now have a look at the cost matrix in the cost matrix parameter. The case where Class 2 (i.e. class = no) is misclassified as Class 1 (i.e. class = yes) has been given weight 2.0. The case where Class 1 (i.e. class = yes) is misclassified as Class 2 (i.e. class = no) has been given weight 1.0.

Now let us see how this cost matrix is used for evaluating misclassification costs of the labeled ExampleSet. As 1 of the 4 classifications was wrong, one should expect the classification cost to be 1/4 or 0.250. But as this misclassification has weight 2.0 (because class = no is misclassified as class = yes) instead of 1.0 the cost for this misclassification is doubled. Therefore the cost in this case is 0.500. The misclassification cost can be seen in the Results Workspace.

Now set the sampling type parameter of the Split Validation operator to 'linear sampling' and run the process again. Have a look at the labeled ExampleSet generated by the Apply Model operator. 2 out of 4 examples have been misclassified. One example with class = no has been misclassified as class = yes (i.e. weight = 2.0) and one example with class = yes has been misclassified as class = no (i.e. weight = 1.0). The resultant misclassification cost is ((1 x 1.0) + (1 x 2.0))/4, which comes to 0.750. The misclassification cost can be seen in the Results Workspace.
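The cost calculation of the linear-sampling run can be sketched as follows. The (actual, predicted) pairs are reconstructed from the description above; this is an illustration, not RapidMiner's internal code:

```python
# misclassification costs: actual 'no' predicted as 'yes' costs 2.0,
# actual 'yes' predicted as 'no' costs 1.0; correct predictions cost nothing
cost = {('no', 'yes'): 2.0, ('yes', 'no'): 1.0}

# four test examples as (actual, predicted): two correct, two misclassified
examples = [('yes', 'yes'), ('no', 'yes'), ('yes', 'no'), ('no', 'no')]

avg_cost = sum(cost.get(pair, 0.0) for pair in examples) / len(examples)
print(avg_cost)  # -> 0.75
```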


Cluster Distance Performance

This operator is used for performance evaluation of centroid based clustering methods. It delivers a list of performance criteria values based on cluster centroids.

Description

The centroid based clustering operators like K-Means and K-Medoids produce a centroid cluster model and a clustered set. The centroid cluster model has information regarding the clustering performed: it tells which examples are part of which cluster and also has information regarding the centroid of each cluster. The Cluster Distance Performance operator takes this centroid cluster model and clustered set as input and evaluates the performance of the model based on the cluster centroids. Two performance measures are supported: average within cluster distance and the Davies-Bouldin index. These performance measures are explained in the parameters.

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be very useful in many different scenarios, e.g. in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the K-Medoids operator in the attached Example Process.

cluster model (clu) This input port expects a centroid cluster model. It is the output of the K-Medoids operator in the attached Example Process. The centroid cluster model has information regarding the clustering performed: it tells which examples are part of which cluster and also has information regarding the centroid of each cluster.

performance (per) This input port expects a Performance Vector.

Output Ports

performance (per) The performance of the cluster model is evaluated and the resultant Performance Vector is delivered through this port. The Performance Vector is a list of performance criteria values.

example set (exa) The ExampleSet that was given as input is passed without change to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

cluster model (clu) The centroid cluster model that was given as input is passed without change to the output through this port. This is usually used to reuse the same centroid cluster model in further operators or to view it in the Results Workspace.


Parameters

main criterion (selection) This parameter specifies the main criterion to use for performance evaluation.

- avg. within centroid distance The average within cluster distance is calculated by averaging the distance between the centroid and all examples of a cluster.

- davies bouldin Algorithms that produce clusters with low intra-cluster distances (high intra-cluster similarity) and high inter-cluster distances (low inter-cluster similarity) have a low Davies-Bouldin index. The clustering algorithm that produces a collection of clusters with the smallest Davies-Bouldin index is considered the best algorithm based on this criterion.

main criterion only (boolean) This parameter specifies whether only the main criterion should be delivered by the performance vector. The main criterion is specified by the main criterion parameter.

normalize (boolean) This parameter specifies whether the results should be normalized. If set to true, the criterion is divided by the number of features.

maximize (boolean) This parameter specifies whether the results should be maximized. If set to true, the result is not multiplied by minus one.

Tutorial Processes

Evaluating the performance of the K-Medoids clustering model

The 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is only used for visualization and comparison and not for building the clusters itself. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before the application of the K-Medoids operator. The 'Ripley-Set' has two real attributes: 'att1' and 'att2'. The K-Medoids operator is applied on this data set with default values for all parameters. A breakpoint is inserted at this step so that you can have a look at the results of the K-Medoids operator. You can see that two new attributes are created by the K-Medoids operator. The id attribute is created to distinguish examples clearly. The cluster attribute is created to show which cluster each example belongs to. As the parameter k was set to 2, only two clusters are possible. That is why each example is assigned to either 'cluster 0' or 'cluster 1'. Also note the Plot View of this data: you can clearly see how the algorithm has created two separate groups. A cluster model is also delivered through the cluster model output port. It has information regarding the clustering performed. Under the Folder View you can see the members of each cluster in folder format. You can see information regarding the centroids under the Centroid Table and Centroid Plot View tabs.

The Cluster Distance Performance operator is applied to measure the performance of this clustering model. The cluster model and clustered set produced by the K-Medoids operator are provided as input to the Cluster Distance Performance operator, which evaluates the performance of this model and delivers a performance vector holding the performance criteria values. The resultant performance vector can be seen in the Results Workspace.
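The average within cluster distance criterion can be sketched as follows, using made-up one-dimensional clusters rather than the 'Ripley-Set' (RapidMiner's exact averaging and sign conventions may differ):

```python
# average distance between each example and the centroid of its cluster
clusters = {
    'cluster_0': [1.0, 2.0, 3.0],
    'cluster_1': [10.0, 12.0],
}

distances = []
for points in clusters.values():
    centroid = sum(points) / len(points)
    distances.extend(abs(p - centroid) for p in points)

avg_within_centroid_distance = sum(distances) / len(distances)
print(avg_within_centroid_distance)  # -> 0.8
```

A smaller value indicates tighter clusters, which is why the operator multiplies the result by minus one unless the maximize parameter is set.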


Cluster Density Performance

This operator is used for performance evaluation of centroid based clustering methods. It delivers a list of performance criteria values based on cluster densities.

Description

The centroid based clustering operators like K-Means and K-Medoids produce a centroid cluster model and a clustered set. The centroid cluster model has information regarding the clustering performed: it tells which examples are part of which cluster and also has information regarding the centroid of each cluster. The Cluster Density Performance operator takes this centroid cluster model and clustered set as input and evaluates the performance of the model based on the cluster densities. It is important to note that this operator also requires a SimilarityMeasure object as input. This operator is used for the evaluation of non-hierarchical cluster models based on the average within cluster similarity/distance, which is computed by averaging all similarities/distances between each pair of examples of a cluster.

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be very useful in many different scenarios e.g. in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Data to Similarity operator in the attached Example Process.

distance measure (dis) This input port expects a SimilarityMeasure object. It is the output of the Data to Similarity operator in the attached Example Process.

performance vector (per) This optional input port expects a performance vector. A performance vector is a list of performance criteria values.

cluster model (clu) This input port expects a centroid cluster model. It is the output of the K-Means operator in the attached Example Process. The centroid cluster model has information regarding the clustering performed: it tells which examples are part of which cluster and also has information regarding the centroid of each cluster.

Output Ports

example set (exa) The ExampleSet that was given as input is passed without any modifications to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

performance vector (per) The performance of the cluster model is evaluated and the resultant performance vector is delivered through this port. A performance vector is a list of performance criteria values.

Tutorial Processes

Evaluating the performance of the K-Means clustering model

The 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is only used for visualization and comparison and not for building the clusters. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before the application of the K-Means operator. The 'Ripley-Set' has two real attributes: 'att1' and 'att2'. The K-Means operator is applied on this data set with default values for all parameters. A breakpoint is inserted at this step so that you can have a look at the results of the K-Means operator. You can see that two new attributes are created by the K-Means operator. The id attribute is created to distinguish examples clearly. The cluster attribute is created to show which cluster each example belongs to. As the parameter k was set to 2, only two clusters are possible. That is why each example is assigned to either 'cluster 0' or 'cluster 1'.

The Data to Similarity operator is applied on the resultant ExampleSet. This generates a SimilarityMeasure object. The clustered ExampleSet, the cluster model and the SimilarityMeasure object are provided as input to the Cluster Density Performance operator. The Cluster Density Performance operator evaluates the performance of this model and delivers a performance vector that holds the performance criteria values. The resultant performance vector can be seen in the Results Workspace.

Item Distribution Performance

This operator is used for performance evaluation of flat clustering methods. It evaluates a cluster model based on the distribution of examples.

Description

The clustering operators like K-Means and K-Medoids produce a flat cluster model and a clustered ExampleSet. The cluster model has information regarding the clustering performed: it tells which examples are part of which cluster. The Item Distribution Performance operator takes this cluster model as input and evaluates the performance of the model based on the distribution of examples, i.e. how well the examples are distributed over the clusters. Two distribution measures are supported: Sum of Squares and Gini Coefficient. These distribution measures are explained in the parameters. Flat clustering creates a flat set of clusters without any explicit structure that would relate clusters to each other. Hierarchical clustering, on the other hand, creates a hierarchy of clusters. This operator can only be applied on models produced by operators that produce flat cluster models, e.g. the K-Means or K-Medoids operators. It cannot be applied on models created by operators that produce a hierarchy of clusters, e.g. the Agglomerative Clustering operator.

Clustering is concerned with grouping together objects that are similar to each other and dissimilar to the objects belonging to other clusters. It is a technique for extracting information from unlabeled data and can be very useful in many different scenarios e.g. in a marketing application we may be interested in finding clusters of customers with similar buying behavior.

Input Ports

cluster model (clu) This input port expects a flat cluster model. It is the output of the K-Medoids operator in the attached Example Process. The cluster model has information regarding the clustering performed. It tells which examples are part of which cluster.

performance vector (per) This input port expects a performance vector.

Output Ports

cluster model (clu) The cluster model that was given as input is passed without any changes to the output through this port. This is usually used to reuse the same cluster model in further operators or to view it in the Results Workspace.

performance vector (per) The performance of the cluster model is evaluated and the resultant performance vector is delivered through this port. It is a list of performance criteria values.

Parameters

measure (selection) This parameter specifies the item distribution measure to apply. It has two options:

• sumofsquares If this option is selected, the sum of squares is used as the item distribution measure.

• ginicoefficient The Gini coefficient (also known as the Gini index or Gini ratio) is a measure of statistical dispersion. It measures the inequality among values of a frequency distribution. A low Gini coefficient indicates a more equal distribution, with 0 corresponding to complete equality, while higher Gini coefficients indicate a more unequal distribution, with 1 corresponding to complete inequality.
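Both measures are simple functions of the cluster sizes. The following Python sketch is only one plausible reading of these measures (the exact formulas RapidMiner uses are not spelled out here): sum of squares is taken as the sum of squared cluster fractions, and the Gini coefficient as the standard pairwise inequality measure over the sizes. The `distribution_measures` helper is hypothetical.

```python
def distribution_measures(cluster_sizes):
    """Item-distribution measures over a list of cluster sizes.

    Assumptions: 'sumofsquares' is read as the sum of squared cluster
    fractions; 'gini' is the standard pairwise Gini over the sizes.
    These are illustrative, not RapidMiner's documented formulas.
    """
    n = sum(cluster_sizes)
    k = len(cluster_sizes)

    # Sum of squares: squared fraction of items falling into each cluster
    sum_of_squares = sum((s / n) ** 2 for s in cluster_sizes)

    # Gini: mean absolute difference of all size pairs over twice the mean size
    mean_size = n / k
    mad = sum(abs(a - b) for a in cluster_sizes for b in cluster_sizes) / (k * k)
    gini = mad / (2 * mean_size)
    return sum_of_squares, gini

print(distribution_measures([50, 50]))  # perfectly equal clusters: Gini 0.0
print(distribution_measures([99, 1]))   # one dominant cluster: Gini near 0.5
```

Note that with only two clusters this pairwise formulation tops out at 0.5; the limit of 1 for complete inequality is approached only as the number of clusters grows.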

Tutorial Processes

Evaluating the performance of the K-Medoids clustering model

The 'Ripley-Set' data set is loaded using the Retrieve operator. Note that the label is loaded too, but it is only used for visualization and comparison, not for building the clusters themselves. A breakpoint is inserted at this step so that you can have a look at the ExampleSet before the application of the K-Medoids operator. The 'Ripley-Set' has two real attributes, 'att1' and 'att2'. The K-Medoids operator is applied on this data set with default values for all parameters. A breakpoint is inserted at this step so that you can have a look at the results of the K-Medoids operator. You can see that two new attributes are created by the K-Medoids operator. The id attribute is created to distinguish examples clearly. The cluster attribute shows which cluster each example belongs to. As the parameter k was set to 2, only two clusters are possible. That is why each example is assigned to either 'cluster 0' or 'cluster 1'. Also note the Plot View of this data: you can clearly see how the algorithm has created two separate groups. A cluster model is also delivered through the cluster model output port. It has information regarding the clustering performed. Under the Folder View you can see the members of each cluster in folder format; the Centroid Table and Centroid Plot View tabs show information regarding the centroids.

The Item Distribution Performance operator is applied to measure the performance of this clustering model on the basis of how well the examples are distributed over the clusters. The cluster model produced by the K-Medoids operator is provided as input to the Item Distribution Performance operator, which evaluates the performance of this model and delivers a performance vector that holds the performance measured on the basis of the example distribution. The resultant performance vector can be seen in the Results Workspace.

T-Test

This operator is used for comparison of performance vectors. This operator performs a t-test to determine the probability for the null hypothesis i.e. 'the actual means are the same'.

950 8.3. Significance

Description

The T-Test operator determines if the null hypothesis (i.e. all actual mean values are the same) holds for the given performance vectors. This operator uses a simple paired t-test to determine the probability that the null hypothesis is wrong. Since a t-test can only be applied on two performance vectors this test will be applied to all possible pairs. The result is a significance matrix.

Paired t-test is a test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the 'paired' or 'repeated measures' t-test.
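The heart of the test is a single statistic: the mean of the per-unit differences divided by its standard error. The following dependency-free Python sketch computes just that statistic; the accuracies are made-up illustration data, and the final p-value step (a lookup in the t distribution with n-1 degrees of freedom) is omitted to keep the sketch self-contained.

```python
import math

def paired_t_statistic(a, b):
    """t statistic of a paired t-test: mean difference over its standard error."""
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)

# Hypothetical accuracies of two learners on the same cross-validation folds
svm_acc = [0.91, 0.89, 0.93, 0.90, 0.92]
lr_acc = [0.85, 0.86, 0.88, 0.84, 0.87]
print(round(paired_t_statistic(svm_acc, lr_acc), 3))  # 9.129
```

A large t statistic yields a small p-value, and the null hypothesis is rejected when that p-value falls below the alpha parameter.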

In the case of this operator, each dependent-samples (or 'paired') t-test is performed on a pair of performance vectors. Doing multiple paired t-tests would result in an increased chance of committing a type I error. A 'false positive' or type I error is the probability that a decision to reject the null hypothesis is made when it is in fact true and should not have been rejected. It is therefore recommended to apply an additional ANOVA test to determine if the null hypothesis is wrong at all. Please use the ANOVA operator for performing the ANOVA test.

Differentiation

ANOVA Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. For this reason, ANOVA is useful in comparing two, three, or more means. See page 954 for details.


Input Ports

performance (per) This operator expects performance vectors as input and can have multiple inputs. When one input is connected, another performance input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The performance vector supplied at the first input port of this operator is available at the first performance output port of the operator.

Output Ports

significance (sig) The given performance vectors are compared and the result of the significance test is delivered through this port.

performance (per) This operator can have multiple performance output ports. When one output is connected, another performance output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The performance vector delivered at the first performance input port of this operator is delivered at the first performance output port of the operator.

Parameters

alpha (real) This parameter specifies the probability threshold which determines if differences are considered as significant. If a test of significance gives a p-value lower than the significance level alpha, the null hypothesis is rejected. It is important to understand that the null hypothesis can never be proven. A set of data can only reject a null hypothesis or fail to reject it. For example, if a comparison of two groups reveals no statistically significant difference between the two, it does not mean that there is no difference in reality. It only means that there is not enough evidence to reject the null hypothesis (in other words, the experiment fails to reject the null hypothesis).


Related Documents

ANOVA (954)

Tutorial Processes

Comparison of performance vectors using statistical significance tests

Many RapidMiner operators can be used to estimate the performance of a learner or a preprocessing step etc. The result of these validation operators is a performance vector which collects the values of a set of performance criteria. For each criterion, the mean value and standard deviation are given. The question is: how can these performance vectors be compared? Statistical significance tests like ANOVA or the T-Test can be used to calculate the probability that the actual mean values are different. This Example Process performs exactly this task.

This Example Process starts with a Subprocess operator which provides two performance vectors as output. Have a look at the inner operators of the Subprocess operator. The Generate Data operator is used for generating an ExampleSet. The Multiply operator is used for producing multiple copies of this ExampleSet. X-Validation operators are applied on both copies of the ExampleSet. The first X-Validation operator uses the Support Vector Machine (LibSVM) operator whereas the second X-Validation operator uses the Linear Regression operator in the training subprocess. The resultant performance vectors are the output of the Subprocess operator.

These performance vectors are compared using the T-Test and ANOVA operators respectively. The performance vectors and the results of the significance tests are connected to the result ports of the process and they can be viewed in the Results Workspace. Run the process and compare the results. The probabilities for a significant difference are equal since only two performance vectors were created.


In this case the SVM is probably better suited for the data set at hand since the actual mean values are probably different. The SVM is considered better because its p-value is smaller than alpha, which indicates a probably significant difference between the actual mean values.

ANOVA

This operator is used for comparison of performance vectors. It per- forms an analysis of variance (ANOVA) test to determine the proba- bility for the null hypothesis i.e. 'the actual means are the same'.

Description

ANalysis Of VAriance (ANOVA) is a statistical model in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. For this reason, ANOVA is useful in comparing two, three, or more means. A 'false positive' or type I error is the probability that a decision to reject the null hypothesis is made when it is in fact true and should not have been rejected. RapidMiner provides the T-Test operator for performing the t-test. The paired t-test is a test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero.
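The F statistic behind the test compares the variance between the group means to the variance within the groups. A minimal Python sketch with made-up measurement groups (RapidMiner applies this idea to performance values; the final p-value lookup against the F distribution is omitted here):

```python
def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean squares."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n

    # Between-group sum of squares, weighted by group size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares around each group's own mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three groups with clearly different means give a large F value
print(round(anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]), 3))  # 13.0
```

With exactly two groups the F statistic equals the square of the corresponding two-sample t statistic, which is why the two operators agree in the two-vector Example Process below.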

Differentiation

T-Test Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. For this reason, ANOVA is useful in comparing two, three, or more means. See page 950 for details.

Input Ports

performance (per) This operator expects performance vectors as input and can have multiple inputs. When one input is connected, another performance input port becomes available which is ready to accept another input (if any). The order of inputs remains the same. The performance vector supplied at the first input port of this operator is available at the first performance output port of the operator.

Output Ports

significance (sig) The given performance vectors are compared and the result of the significance test is delivered through this port.

performance (per) This operator can have multiple performance output ports. When one output is connected, another performance output port becomes available which is ready to deliver another output (if any). The order of outputs remains the same. The performance vector delivered at the first performance input port of this operator is delivered at the first performance output port of the operator.

Parameters

alpha (real) This parameter specifies the probability threshold which determines if differences are considered as significant. If a test of significance gives a p-value lower than the significance level alpha, the null hypothesis is rejected. It is important to understand that the null hypothesis can never be proven. A set of data can only reject a null hypothesis or fail to reject it. For example, if a comparison of two groups reveals no statistically significant difference between the two, it does not mean that there is no difference in reality. It only means that there is not enough evidence to reject the null hypothesis (in other words, the experiment fails to reject the null hypothesis).

Related Documents

T-Test (950)

Tutorial Processes

Comparison of performance vectors using statistical significance tests

Many RapidMiner operators can be used to estimate the performance of a learner or a preprocessing step etc. The result of these validation operators is a performance vector which collects the values of a set of performance criteria. For each criterion, the mean value and standard deviation are given. The question is how these performance vectors can be compared. Statistical significance tests like ANOVA or the T-Test can be used to calculate the probability that the actual mean values are different. This Example Process performs exactly this task.

This Example Process starts with a Subprocess operator which provides two performance vectors as output. Have a look at the inner operators of the Subprocess operator. The Generate Data operator is used for generating an ExampleSet. The Multiply operator is used for producing multiple copies of this ExampleSet. X-Validation operators are applied on both copies of the ExampleSet. The first X-Validation operator uses the Support Vector Machine (LibSVM) operator whereas the second X-Validation operator uses the Linear Regression operator in the training subprocess. The resultant performance vectors are the output of the Subprocess operator.

These performance vectors are compared using the T-Test and ANOVA operators respectively. The performance vectors and the results of the significance tests are connected to the result ports of the process and they can be viewed in the Results Workspace. Run the process and compare the results. The probabilities for a significant difference are equal since only two performance vectors were created. In this case the SVM is probably better suited for the data set at hand since the actual mean values are probably different. The SVM is considered better because its p-value is smaller than alpha, which indicates a probably significant difference between the actual mean values.


Create Lift Chart

This operator generates a lift chart for the given model and ExampleSet based on the discretized confidences and a Pareto chart.

Description

The Create Lift Chart operator creates a lift chart based on a Pareto plot for the discretized confidence values of the given ExampleSet and model. The model is applied on the ExampleSet and a lift chart is produced afterwards. Please note that any predicted label of the given ExampleSet will be removed during the application of this operator. In order to produce reliable results, this operator must be applied on data that has not been used to build the model, otherwise the resulting plot will be too optimistic.

958 8.4. Visual Evaluation

The lift chart measures the effectiveness of models by calculating the ratio between the result obtained with a model and the result obtained without a model. The result obtained without a model is based on randomly selected records.
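That ratio can be made concrete with a small sketch: rank the examples by predicted confidence, split them into bins, and compare each bin's target rate against the overall rate that random selection would achieve. This is a simplified reading of the chart using equal-frequency bins and made-up data, not the operator's exact implementation.

```python
def lift_per_bin(confidences, labels, n_bins=4):
    """Lift per confidence bin: the bin's target rate divided by the
    overall target rate (what randomly selected records would achieve)."""
    ranked = sorted(zip(confidences, labels), reverse=True)
    overall_rate = sum(labels) / len(labels)
    size = len(ranked) // n_bins
    lifts = []
    for i in range(n_bins):
        bin_labels = [label for _, label in ranked[i * size:(i + 1) * size]]
        lifts.append((sum(bin_labels) / size) / overall_rate)
    return lifts

# Made-up confidences and binary target; the top-confidence bin has lift 2
print(lift_per_bin([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],
                   [1, 1, 1, 0, 1, 0, 0, 0]))  # [2.0, 1.0, 1.0, 0.0]
```

A lift above 1 in the first bins is exactly what makes a model useful for tasks like direct mailing: targeting the top bins reaches more of the target class than mailing at random.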

Input Ports

example set (exa) This input port expects an ExampleSet. It is the output of the Generate Direct Mailing Data operator in the attached Example Process. The output of other operators can also be used as input.

model (mod) This input port expects a model. It is the output of the Naive Bayes operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set (exa) The ExampleSet that was given as input is passed without any changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

model (mod) The model that was given as input is passed without any changes to the output through this port. This is usually used to reuse the same model in further operators or to view the model in the Results Workspace.

lift pareto chart (lif) For the given model and ExampleSet a lift chart is generated based on the discretized confidences and a Pareto chart. This lift chart is delivered through this port.

Parameters

target class (string) This parameter indicates the target class for which the lift chart should be produced.

binning type (selection) This parameter indicates the binning type of the confidences.

number of bins (integer) This parameter specifies the number of bins the confidence should be discretized into. This parameter is only available when the binning type parameter is set to 'simple' or 'frequency'.

size of bins (integer) This parameter specifies the number of examples that each bin should contain when the confidence is discretized. This parameter is only available when the binning type parameter is set to 'absolute'.

automatic number of digits (boolean) This parameter indicates if the number of digits should be automatically determined for the range names.

number of digits (integer) This parameter specifies the minimum number of digits to be used for the interval names. If this parameter is set to -1 then the minimal number is determined automatically. This parameter is only available when the automatic number of digits parameter is set to false.

show bar labels (boolean) This parameter indicates if the bars should display the size of the bin together with the amount of the target class in the corresponding bin.

show cumulative labels (boolean) This parameter indicates if the cumulative line plot should display the cumulative sizes of the bins together with the cumulative amount of the target class in the corresponding bins.

rotate labels (boolean) This parameter indicates if the labels of the bins should be rotated.

Tutorial Processes

Creating lift chart for direct mailing data

The Generate Direct Mailing Data operator is used for generating an ExampleSet with 10000 examples. The Split Validation operator is applied on this ExampleSet. The split ratio parameter is set to 0.7 and the sampling type parameter is set to 'shuffled sampling'. Here is an explanation of what happens inside the Split Validation operator.


1. The Split Validation operator provides a training data set through the training port of the training subprocess. This training data set is used as input for the Naive Bayes operator. Thus the Naive Bayes classification model is trained on this training data set.

2. The Naive Bayes operator provides the Naive Bayes classification model as its output. This model is connected to the model port of the training subprocess.

3. The Naive Bayes model that was provided at the model port of the training subprocess is delivered by the Split Validation operator at the model port of the testing subprocess. This model is provided as input at the model port of the Create Lift Chart operator.

4. The Split Validation operator provides the testing data set through the test set port of the testing subprocess. This testing data set is provided as input to the Create Lift Chart operator.

5. The Create Lift Chart operator generates a lift chart for the given model and ExampleSet based on the discretized confidences and a Pareto chart. The lift chart is provided to the Remember operator to store it in the object store.

6. The Apply Model operator is provided with the testing data set and the model. The Apply Model operator applies the model on the testing data set and the resultant labeled data set is delivered as output. This labeled data set is provided as input to the Performance operator.

7. The Performance operator evaluates the statistical performance of the model through the given labeled data set and generates a performance vector which holds information about various performance criteria.

Outside the Split Validation operator, the Recall operator is used for fetching the lift chart from the object store. The lift chart is delivered to the output and it can be seen in the Results Workspace.


Compare ROCs

This operator generates ROC charts for the models created by the learners in its subprocess and plots all the charts in the same plotter for comparison.

Description

The Compare ROCs operator is a nested operator i.e. it has a subprocess. The operators in the subprocess must produce a model. This operator calculates ROC curves for all these models. All the ROC curves are plotted together in the same plotter.

The comparison is based on the average values of a k-fold cross validation. Please study the documentation of the X-Validation operator for more information about cross validation. Alternatively, this operator can use an internal split of the given data set into a test and a training set; in this case the operator behaves like the Split Validation operator. Please note that any former predicted label of the given ExampleSet will be removed during the application of this operator.

The ROC curve is a graphical plot of the sensitivity, or true positive rate, vs. the false positive rate (one minus the specificity or true negative rate) for a binary classifier system as its discrimination threshold is varied. The ROC can also be represented equivalently by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate).

ROC curves are calculated by first ordering the classified examples by confidence. Afterwards all the examples are taken into account with decreasing confidence to plot the false positive rate on the x-axis and the true positive rate on the y-axis. With optimistic, neutral and pessimistic there are three possibilities to calculate ROC curves. If several examples share the same confidence, the optimistic ROC calculation takes the correctly classified examples into account before looking at the false classifications. The pessimistic calculation works the other way round: wrong classifications are taken into account before correct classifications. The neutral calculation is a mix of both methods; here correct and false classifications are taken into account alternately. If there are no examples with equal confidence, or all examples with equal confidence are assigned to the same class, the optimistic, neutral and pessimistic ROC curves will be the same.
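The optimistic and pessimistic variants differ only in how ties on confidence are ordered before the curve is traced. The following Python sketch illustrates this with made-up confidences; the neutral (alternating) variant is left out for brevity, and the tie handling shown is an illustration of the idea rather than RapidMiner's exact code.

```python
def roc_points(confidences, labels, bias="optimistic"):
    """ROC points (FPR, TPR), traced in order of decreasing confidence.

    Within a tie on confidence, 'optimistic' counts the positive (correctly
    ranked) examples first, 'pessimistic' counts the negative ones first.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    tie_order = (lambda lab: -lab) if bias == "optimistic" else (lambda lab: lab)
    ranked = sorted(zip(confidences, labels),
                    key=lambda cl: (-cl[0], tie_order(cl[1])))
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

conf = [0.8, 0.5, 0.5, 0.3]  # two examples tie at confidence 0.5
lab = [1, 1, 0, 0]
print(roc_points(conf, lab, "optimistic"))
print(roc_points(conf, lab, "pessimistic"))
```

With no ties, both orderings produce the same curve, which matches the remark above that all three variants then coincide.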

Input Ports

example set (exa) This input port expects an ExampleSet with a binominal label. It is the output of the Retrieve operator in the attached Example Process. The output of other operators can also be used as input.

Output Ports

example set (exa) The ExampleSet that was given as input is passed without any changes to the output through this port. This is usually used to reuse the same ExampleSet in further operators or to view the ExampleSet in the Results Workspace.

rocComparison (roc) The ROC curves for all the models are delivered from this port. All the ROC curves are plotted together in the same plotter.


Parameters

number of folds (integer) This parameter specifies the number of folds to use for the cross validation evaluation. If this parameter is set to -1 this operator uses the split ratio and behaves like the Split Validation operator.

split ratio (real) This parameter specifies the relative size of the training set. It should be between 0 and 1, where 1 means that the entire ExampleSet will be used as the training set.

sampling type (selection) Several types of sampling can be used for building the subsets. The following options are available:

• Linear sampling Linear sampling simply divides the ExampleSet into partitions without changing the order of the examples, i.e. subsets with consecutive examples are created.

• Shuffled sampling Shuffled sampling builds random subsets of the ExampleSet. Examples are chosen randomly for making subsets.

• Stratified sampling Stratified sampling builds random subsets and ensures that the class distribution in the subsets is the same as in the whole ExampleSet. For example, in the case of a binominal classification, stratified sampling builds random subsets so that each subset contains roughly the same proportions of the two values of the class labels.

use local random seed (boolean) This parameter indicates if a local random seed should be used for randomizing the examples of a subset. Using the same value of local random seed will produce the same subsets. Changing the value of this parameter changes the way examples are randomized, thus subsets will have a different set of examples. This parameter is only available if Shuffled or Stratified sampling is selected. It is not available for Linear sampling because it requires no randomization; examples are selected in sequence.

local random seed (integer) This parameter specifies the local random seed. This parameter is only available if the use local random seed parameter is set to true.

use example weights (boolean) This parameter indicates if example weights should be considered. If this parameter is not set to true then a weight of 1 is used for each example.

roc bias (selection) This parameter determines how the ROC curves are evaluated, i.e. whether correct predictions are counted first, last, or alternately. ROC curves are calculated by first ordering the classified examples by confidence. Afterwards all the examples are taken into account with decreasing confidence to plot the false positive rate on the x-axis and the true positive rate on the y-axis. With optimistic, neutral and pessimistic there are three possibilities to calculate ROC curves. If there are no examples with equal confidence, or all examples with equal confidence are assigned to the same class, the optimistic, neutral and pessimistic ROC curves will be the same.

• optimistic If there is more than one example for a confidence, with optimistic ROC calculation the correctly classified examples are taken into account before looking at the false classifications.

• pessimistic With pessimistic calculation wrong classifications are taken into account before looking at correct classifications.

• neutral Neutral calculation is a mix of both the optimistic and pessimistic calculation methods. Here correct and false classifications are taken into account alternately.

Tutorial Processes

Comparing different classifiers graphically by ROC curves

This process shows how several different classifiers can be graphically compared by means of multiple ROC curves. The 'Ripley-Set' data set is loaded using the Retrieve operator. The Compare ROCs operator is applied on it. Have a look at the subprocess of the Compare ROCs operator. You can see that three different learners are applied, i.e. Naive Bayes, Rule Induction and Decision Tree. The resultant models are connected to the outputs of the subprocess. The Compare ROCs operator calculates ROC curves for all these models. All the ROC curves are plotted together in the same plotter which can be seen in the Results Workspace.

Index

AdaBoost, 754
Add Noise, 171
Agglomerative Clustering, 832
Aggregate, 615
ANOVA, 954
Append, 623
Apply Model, 870
Apply Threshold, 883
Backward Elimination, 488
Bagging, 757
Bayesian Boosting, 749
Bootstrapping Validation, 903
Branch, 71
CHAID, 669
Classification by Regression, 747
Cluster Density Performance, 945
Cluster Distance Performance, 941
Collect, 79
Combine Performances, 915
Compare ROCs, 962
Copy File, 151
Correlation Matrix, 856
Create Association Rules, 849
Create Directory, 161
Create Lift Chart, 958
Create Threshold, 880
Cross Distances, 865
Cut, 532
Data to Similarity, 860
Data to Weights, 791
Date to Nominal, 337
Date to Numerical, 332
DBSCAN, 817
De-Pivot, 608
Decision Stump, 673
Decision Tree, 659
Declare Missing Value, 507
Default Model, 641
Delete File, 156
Detect Outlier (COF), 569
Detect Outlier (Densities), 563
Detect Outlier (Distances), 560
Detect Outlier (LOF), 566
Discretize by Binning, 368
Discretize by Entropy, 389
Discretize by Frequency, 375
Discretize by Size, 361
Discretize by User Specification, 382
Drop Uncertain Predictions, 886


Exchange Roles, 277
Execute Process, 132
Execute Program, 145
Execute Script, 135
Execute SQL, 141
Expectation Maximization Clustering, 823
Extract Cluster Prototypes, 841
Extract Macro, 110
Extract Performance, 912
Fill Data Gaps, 558
Filter Example Range, 585
Filter Examples, 575
Find Threshold, 876
Format Numbers, 349
Forward Selection, 483
FP-Growth, 844
Free Memory, 179
Generalized Hebbian Algorithm, 445
Generalized Sequential Patterns, 852
Generate Aggregation, 419
Generate Attributes, 402
Generate Concatenation, 417
Generate Copy, 400
Generate Data, 163
Generate Direct Mailing Data, 167
Generate Empty Attribute, 398
Generate Function Set, 425
Generate ID, 396
Generate Macro, 102
Generate Nominal Data, 165
Generate Sales Data, 169
Group Models, 873
Guess Types, 356
Handle Exception, 12
ID3, 665
Independent Component Analysis, 442
Item Distribution Performance, 947
Join, 625
Join Paths, 10
K-Means, 801
K-Means (Kernel), 805
K-Medoids, 810
K-NN, 644
Linear Discriminant Analysis, 738
Linear Regression, 701
Log, 118
Log to Data, 129
Logistic Regression, 709
Loop, 32
Loop and Average, 54
Loop Attributes, 36
Loop Clusters, 50
Loop Collection, 85
Loop Data Sets, 52
Loop Examples, 46
Loop Files, 62
Loop Parameters, 57
Loop Values, 42
Map, 521
Materialize Data, 177
Merge, 544
MetaCost, 765


Move File, 158
Multiply, 7
Naive Bayes, 651
Naive Bayes (Kernel), 655
Neural Net, 697
Nominal to Binominal, 300
Nominal to Date, 319
Nominal to Numerical, 311
Nominal to Text, 306
Normalize, 512
Numerical to Binominal, 279
Numerical to Polynominal, 285
Numerical to Real, 290
Optimize by Generation (YAGGA), 429
Optimize Parameters (Evolutionary), 26
Optimize Parameters (Grid), 21
Optimize Selection, 492
Optimize Selection (Evolutionary), 499
Optimize Weights (Evolutionary), 795
Parse Numbers, 344
Perceptron, 694
Performance, 908
Performance (Binominal Classification), 925
Performance (Classification), 918
Performance (Costs), 937
Performance (Regression), 933
Pivot, 604
Polynomial by Binomial Classification, 744
Polynomial Regression, 705
Principal Component Analysis, 434
Principal Component Analysis (Kernel), 437
Provide Macro as Log Value, 126
Random Forest, 680
Random Tree, 676
Read Access, 199
Read AML, 202
Read ARFF, 206
Read csv, 189
Read Database, 210
Read Excel, 194
Read Model, 222
Read SAS, 197
Read SPSS, 219
Read Weights, 224
Real to Integer, 295
Recall, 4
Remap Binominals, 546
Remember, 1
Remove Attribute Range, 465
Remove Correlated Attributes, 472
Remove Duplicates, 580
Remove Useless Attributes, 468
Rename, 265
Rename by Replacing, 267
Rename File, 153
Replace, 527
Replace Missing Values, 552
Retrieve, 183
Rule Induction, 684
Sample, 588


Sample (Bootstrapping), 595
Sample (Stratified), 591
Scale by Weights, 519
Select, 82
Select Attributes, 455
Select by Weights, 462
Select Subprocess, 75
Self-Organizing Map, 452
Set Data, 504
Set Macro, 93
Set Macros, 98
Set Minus, 630
Set Parameters, 17
Set Role, 273
Singular Value Decomposition, 448
Sort, 601
Split, 537
Split Data, 598
Split Validation, 891
Stacking, 761
Store, 185
Stream Database, 215
Subgroup Discovery, 687
Subprocess, 89
Superset, 637
Support Vector Clustering, 828
Support Vector Machine, 714
Support Vector Machine (Evolutionary), 726
Support Vector Machine (LibSVM), 720
Support Vector Machine (PSO), 733
T-Test, 950
Text to Nominal, 327
Throw Exception, 15
Top Down Clustering, 838
Transpose, 612
Tree to Rules, 692
Union, 633
Update Database, 240
Vote, 742
Weight by Chi Squared Statistic, 777
Weight by Component Model, 788
Weight by Correlation, 774
Weight by Information Gain, 769
Weight by Information Gain Ratio, 771
Weight by PCA, 785
Weight by Relief, 780
Weight by SVM, 783
Weights to Data, 793
Work on Subset, 477
Write, 227
Write AML, 229
Write Arff, 232
Write as Text, 149
Write Clustering, 250
Write Constructions, 255
Write Database, 236
Write Model, 248
Write Parameters, 259
Write Performance, 257
Write Special Format, 244
Write Threshold, 261
Write Weights, 252
X-Prediction, 66


X-Validation, 896
