Package 'Rattle'
Total Page:16
File Type:pdf, Size:1020Kb
Package ‘rattle’ March 19, 2013 Type Package Title Graphical user interface for data mining in R Version 2.6.26 Date 2013-03-16 Depends R (>= 2.13.0) Suggests RGtk2, pmml (>= 1.2.13), bitops, colorspace, ada, amap,arules, arulesViz, biclust, cairoDe- vice, cba, Deducer, descr,doBy, e1071, ellipse, fBasics, foreign, fpc, gdata, ggden- dro,ggplot2, gplots, graph, grid, gtools, gWidgetsRGtk2, hmeasure,Hmisc, kernlab, latticist, Ma- trix, methods, mice, network,nnet, odfWeave, party, playwith, psych, randomFor- est, RBGL,RColorBrewer, reshape, rggobi, RGtk2Extras, ROCR, RODBC, rpart,rpart.plot, Snow- ball, survival, timeDate, verification,weightedKmeans, XML, pkgDepTools, Rgraphviz Description Rattle (the R Analytic Tool To Learn Easily) provides a Gnome (RGtk2) based interface to R functionality for data mining. The aim is to provide a simple and intuitive interface that allows a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (predictive modelling markup language) or as scores. All of this with knowing little about R. All R commands are logged and commented through the log tab. Thus they are available to the user as a script file or as an aide for the user to learn R or to copy-and-paste directly into R itself. Rattle also exports a number of utility functions and the graphical user interface, invoked as rattle(), does not need to be run to deploy these. License GPL (>= 2) LazyLoad yes LazyData yes URL http://rattle.togaware.com/ Author Graham Williams [aut, cph, cre], Mark Vere Culp [cph], Ed Cox [ctb], Anthony Nolan [ctb], Denis White [cph], Daniele Medri [ctb], Akbar Waljee [ctb] (OOB AUC for Random Forest) 1 2 R topics documented: Maintainer Graham Williams <[email protected]> NeedsCompilation no X-CRAN-Comment Earlier versions of this package have been removed: it contained copies of copyright code used contrary to its license and with no acknowledgment of copyright. Repository CRAN Date/Publication 2013-03-19 07:27:59 R topics documented: acquireAuditData . .3 asRules . .4 asRules.rpart . .4 audit . .5 binning . .6 calcInitialDigitDistr . .7 calculateAUC . .8 centers.hclust . .9 doRiskChart . 10 drawTreeNodes . 11 drawTreesAda . 12 evaluateRisk . 13 fancyRpartPlot . 14 genPlotTitleCmd . 15 listAdaVarsUsed . 16 listTreesAda . 16 listVersions . 17 modalvalue . 18 plotBenfordsLaw . 18 plotNetwork . 19 plotOptimalLine . 20 plotRisk . 21 printRandomForests . 23 randomForest2Rules . 24 rattle . 25 rattle.print.summary.multinom . 26 rattleInfo . 27 rescale.by.group . 28 riskchart . 29 savePlotToFile . 31 setupDataset . 32 treeset.randomForest . 32 weather . 33 whichNumerics . 35 wine............................................. 36 acquireAuditData 3 Index 38 acquireAuditData Generate the audit dataset. Description Rattle uses an artificial dataset for demonstration purposes. This function retrieves the source data ftp://ftp.ics.uci.edu/pub/machine-learning-databases/adult/adult.data and then trans- forms the data in a variety of ways. Usage acquireAuditData(write.to.file=FALSE) Arguments write.to.file Whether to generate a colleciton of files based on the data. The files generated include: audit.csv, audit.Rdata, audit.arf, and audit\_missing.csv Details See the function definition for details of the processing done on the data downloaded from the UCI repository. Value By default the function returns a data frame containing the audit dataset. If write.to.file is TRUE then the data frame is returned invisibly. Author(s) <[email protected]> References Package home page: http://rattle.togaware.com See Also audit, rattle. 4 asRules.rpart asRules List the rules corresponding to the rpart decision tree Description Display a list of rules for an rpart decision tree. Usage asRules(model, compact=FALSE, ...) Arguments model an rpart model. compact whether to list cateogricals compactly. ... further arguments passed to or from other methods. Details Traverse a decision tree to generate the equivalent set of rules, one rule for each path from the root node to a leaf node. Author(s) <[email protected]> References Package home page: http://rattle.togaware.com Examples ## Not run: asRules.rpart(my.rpart) asRules.rpart List the rules corresponding to the rpart decision tree Description Display a list of rules for an rpart decision tree. Usage ## S3 method for class ’rpart’ asRules(model, compact=FALSE, ...) audit 5 Arguments model an rpart model. compact whether to list cateogricals compactly. ... further arguments passed to or from other methods. Details Traverse a decision tree to generate the equivalent set of rules, one rule for each path from the root node to a leaf node. Author(s) <[email protected]> References Package home page: http://rattle.togaware.com Examples ## Not run: asRules.rpart(my.rpart) audit Sample dataset for illustration Rattle functionality. Description The audit dataset is an artificially constructed dataset that has some of the characteristics of a true financial audit dataset for modelling productive and non-productive audits of a person’s financial statement. A productive audit is one which identifies errors or inaccuracies in the information pro- vided by a client. A non-productive audit is usually an audit which found all supplied information to be in order. The audit dataset is used to illustrate binary classification. The target variable is identified as TARGET_Adjusted. The dataset is quite small, consisting of just 2000 entities. Its primary purpose is to illustrate modelling in Rattle, so a minimally sized dataset is suitable. The dataset itself is derived from publicly available data (which has nothing to do with audits). See acquireAuditData for details. 6 binning Format A data frame. In line with data mining terminology we refer to the rows of the data frame (or the observations) as entities. The columns are refered to as variables. The entities represent people in this case. We describe the variables here: ID This is a unique identifier for each person. Age The age. Employment The type of employment. Education The highest level of education. Marital Current marital status. Occupation The type of occupation. Income The amount of income declared. Gender The persons gender. Deductions Total amount of expenses that a person claims in their financial statement. Hours The average hours worked on a weekly basis. IGNORE_Accounts The main country in which the person has most of their money banked. Note that the variable name is prefixed with IGNORE. This is recognised by Rattle as the default role for this variable. RISK_Adjustment This variable records the monetary amount of any adjustment to the person’s financial claims as a result of a productive audit. This variable, which should not be treated as an input variable, is thus a measure of the size of the risk associated with the person. TARGET_Adjusted The target variable for modelling (generally for classification modelling). This is a numeric field of class integer, but limited to 0 and 1, indicating non-productive and pro- ductive audits, respectively. Productive audits are those that result in an adjustment being made to a client’s financial statement. binning Perform binning over numeric data Description Perform binning. Usage binning(x, bins=4, method=c("quantile", "wtd.quantile", "kmeans"), labels=NULL, ordered=TRUE, weights=NULL) calcInitialDigitDistr 7 Arguments x the numeric data to bin. bins the number of bins to use. method whether to use "quantile", weighted quantile "wtd.quantile" or "kmeans" bin- ning. labels the labels or names to use for each of the bins. ordered whether to build an ordered factor or not. weights vector of numeric weights for each observation for weighted quantile binning. Details Bin the provided nmeric data into the specified number of bins using one of the supported methods. The bins will have the names specified by labels, if supplied. The result can optionally be an ordered factor. Value A factor is returned. Author(s) Daniele Medri and <[email protected]> References Package home page: http://rattle.togaware.com calcInitialDigitDistr Generate a frequency count of the initial digits Description In the context of Benford’s Law calculate the distribution of the frequencies of the first digit of the numbers supplied as the argument. Usage calcInitialDigitDistr(l, digit=1, split=c("none", "positive", "negative")) Arguments l a vector of numbers. digit the digit to generate frequencies for. split whether and how to split the digits. 8 calculateAUC Author(s) <[email protected]> References Package home page: http://rattle.togaware.com calculateAUC Determine area under a curve (e.g. a risk or recall curve) of a risk chart Description Given the evaluation returned by evaluateRisk, for example, calculate the area under the risk or recall curves, to use as a metric to compare the performance of a model. Usage calculateAUC(x, y) Arguments x a vector of values for the x points. y a vector of values for the y points. Details The area is returned. Author(s) <[email protected]> References Package home page: http://rattle.togaware.com See Also evaluateRisk. centers.hclust 9 Examples ## this is usually used in the context of the evaluateRisk function ## Not run: ev <- evaluateRisk(predicted, actual, risk) ## imitate this output here ev <- data.frame(Caseload=c(1.0, 0.8, 0.6, 0.4, 0.2, 0), Precision=c(0.15, 0.18, 0.21, 0.25, 0.28, 0.30), Recall=c(1.0, 0.95, 0.80, 0.75, 0.5, 0.0), Risk=c(1.0, 0.98, 0.90, 0.77, 0.30, 0.0)) ## Calculate the areas unde the Risk and the Recall curves. calculateAUC(ev$Caseload, ev$Risk) calculateAUC(ev$Caseload, ev$Recall) centers.hclust List Cluster Centers for a Hierarchical Cluster Description Generate a matrix of centers from a hierarchical cluster. Usage centers.hclust(x, h, nclust=10, use.median=FALSE) Arguments x The data used to build the cluster. h A hclust object. nclust Number of clusters.