Package 'Rattle'

Total Pages: 16

File Type: PDF, Size: 1020 KB

Package ‘rattle’ — September 5, 2017

Type: Package
Title: Graphical User Interface for Data Science in R
Version: 5.1.0
Date: 2017-09-04
Depends: R (>= 2.13.0)
Imports: stats, utils, ggplot2, grDevices, graphics, magrittr, methods, RGtk2, stringi, stringr, tidyr, dplyr, cairoDevice, XML, rpart.plot
Suggests: pmml (>= 1.2.13), bitops, colorspace, ada, amap, arules, arulesViz, biclust, cba, cluster, corrplot, descr, doBy, e1071, ellipse, fBasics, foreign, fpc, gdata, ggdendro, ggraptR, gplots, grid, gridExtra, gtools, gWidgetsRGtk2, hmeasure, Hmisc, kernlab, Matrix, mice, nnet, odfWeave, party, playwith, plyr, psych, randomForest, rattle.data, RColorBrewer, readxl, reshape, rggobi, ROCR, RODBC, rpart, SnowballC, survival, timeDate, tm, verification, wskm, RGtk2Extras, xgboost

Description: The R Analytic Tool To Learn Easily (Rattle) provides a Gnome (RGtk2) based interface to R functionality for data science. The aim is to provide a simple and intuitive interface that allows a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (Predictive Model Markup Language) or as scores. All of this while knowing little about R. All R commands are logged and commented through the log tab, so they are available to the user as a script file, as an aid for learning R, or to copy-and-paste directly into R itself. Rattle also exports a number of utility functions, and the graphical user interface, invoked as rattle(), does not need to be running to deploy these.
License: GPL (>= 2)
LazyLoad: yes
LazyData: yes
URL: http://rattle.togaware.com/
NeedsCompilation: no
Author: Graham Williams [aut, cph, cre], Mark Vere Culp [cph], Ed Cox [ctb], Anthony Nolan [ctb], Denis White [cph], Daniele Medri [ctb], Akbar Waljee [ctb] (OOB AUC for Random Forest), Brian Ripley [cph] (print.summary.nnet), Jose Magana [ctb] (ggpairs plots), Surendra Tipparaju [ctb] (initial RevoScaleR/XDF), Durga Prasad Chappidi [ctb] (initial RevoScaleR/XDF), Dinesh Manyam Venkata [ctb] (initial RevoScaleR/XDF), Mrinal Chakraborty [ctb] (initial RevoScaleR/XDF), Fang Zhou [ctb] (initial xgboost)
Maintainer: Graham Williams <[email protected]>
Repository: CRAN
Date/Publication: 2017-09-05 07:31:14 UTC

R topics documented: acquireAuditData, asRules, asRules.rpart, audit, binning, calcInitialDigitDistr, calculateAUC, centers.hclust, comcat, drawTreeNodes, drawTreesAda, errorMatrix, evaluateRisk, fancyRpartPlot, genPlotTitleCmd, ggVarImp, listAdaVarsUsed, listTreesAda, listVersions, modalvalue, plotOptimalLine, plotRisk, printRandomForests, randomForest2Rules, rattle, rattle.print.summary.multinom, rattleInfo, rescale.by.group, riskchart, savePlotToFile, setupDataset, treeset.randomForest, weather, whichNumerics

acquireAuditData        Generate the audit dataset

Description

Rattle uses an artificial dataset for demonstration purposes. This function retrieves the source data ftp://ftp.ics.uci.edu/pub/machine-learning-databases/adult/adult.data and then transforms the data in a variety of ways.

Usage

acquireAuditData(write.to.file=FALSE)

Arguments

write.to.file    Whether to generate a collection of files based on the data.
The files generated include audit.csv, audit.Rdata, audit.arf, and audit_missing.csv.

Details

See the function definition for details of the processing done on the data downloaded from the UCI repository.

Value

By default the function returns a data frame containing the audit dataset. If write.to.file is TRUE then the data frame is returned invisibly.

Author(s)

<[email protected]>

References

Package home page: http://rattle.togaware.com

See Also

audit, rattle.

asRules                 List the rules corresponding to an rpart decision tree

Description

Display a list of rules for an rpart decision tree.

Usage

asRules(model, compact=FALSE, ...)

Arguments

model      an rpart model.
compact    whether to list categorical values compactly.
...        further arguments passed to or from other methods.

Details

Traverse a decision tree to generate the equivalent set of rules, one rule for each path from the root node to a leaf node.

Author(s)

<[email protected]>

References

Package home page: http://rattle.togaware.com

Examples

## Not run: asRules.rpart(my.rpart)

asRules.rpart           List the rules corresponding to an rpart decision tree

Description

Display a list of rules for an rpart decision tree.

Usage

## S3 method for class 'rpart'
asRules(model, compact=FALSE, ...)

Arguments

model      an rpart model.
compact    whether to list categorical values compactly.
...        further arguments passed to or from other methods.

Details

Traverse a decision tree to generate the equivalent set of rules, one rule for each path from the root node to a leaf node.

Author(s)

<[email protected]>

References

Package home page: http://rattle.togaware.com

Examples

## Not run: asRules.rpart(my.rpart)

audit                   Sample dataset to illustrate Rattle functionality

Description

The audit dataset is an artificially constructed dataset that has some of the characteristics of a true financial audit dataset for modelling productive and non-productive audits of a person's financial statement.
A productive audit is one which identifies errors or inaccuracies in the information provided by a client. A non-productive audit is usually an audit which found all supplied information to be in order.

The audit dataset is used to illustrate binary classification. The target variable is identified as TARGET_Adjusted.

The dataset is quite small, consisting of just 2000 entities. Its primary purpose is to illustrate modelling in Rattle, so a minimally sized dataset is suitable. The dataset itself is derived from publicly available data (which has nothing to do with audits).

Format

A data frame. In line with data mining terminology we refer to the rows of the data frame (the observations) as entities. The columns are referred to as variables. The entities represent people in this case. We describe the variables here:

ID                 A unique identifier for each person.
Age                The age.
Employment         The type of employment.
Education          The highest level of education.
Marital            Current marital status.
Occupation         The type of occupation.
Income             The amount of income declared.
Gender             The person's gender.
Deductions         Total amount of expenses that a person claims in their financial statement.
Hours              The average hours worked on a weekly basis.
IGNORE_Accounts    The main country in which the person has most of their money banked. Note that the variable name is prefixed with IGNORE. This is recognised by Rattle as the default role for this variable.
RISK_Adjustment    This variable records the monetary amount of any adjustment to the person's financial claims as a result of a productive audit. This variable, which should not be treated as an input variable, is thus a measure of the size of the risk associated with the person.
TARGET_Adjusted    The target variable for modelling (generally for classification modelling). This is a numeric field of class integer, but limited to 0 and 1, indicating non-productive and productive audits, respectively.
Productive audits are those that result in an adjustment being made to a client's financial statement.

binning                 Perform binning over numeric data

Description

Perform binning.

Usage

binning(x, bins=4, method=c("quantile", "wtd.quantile", "kmeans"),
        labels=NULL, ordered=TRUE, weights=NULL)

Arguments

x          the numeric data to bin.
bins       the number of bins to use.
method     whether to use "quantile", weighted quantile "wtd.quantile", or "kmeans" binning.
labels     the labels or names to use for each of the bins.
ordered    whether to build an ordered factor or not.
weights    vector of numeric weights for each observation, for weighted quantile binning.

Details

Bin the provided numeric data into the specified number of bins using one of the supported methods. The bins will have the names specified by labels, if supplied. The result can optionally be an ordered factor.

Value

A factor is returned.

Author(s)

Daniele Medri and Graham Williams

References

Package home page: http://rattle.togaware.com

calcInitialDigitDistr   Generate a frequency count of the initial digits

Description

In the context of Benford's Law, calculate the distribution of the frequencies of the first digit of the numbers supplied as the argument.

Usage

calcInitialDigitDistr(l, digit=1, len=1, sp=c("none", "positive", "negative"))

Arguments

l        a vector of numbers.
digit    the digit to generate frequencies for.
len      the number of digits.
sp       whether and how to split the digits.

Author(s)

<[email protected]>

References

Package home page: http://rattle.togaware.com

calculateAUC            Determine the area under a curve (e.g., a risk or recall curve) of a risk chart

Description

Given the evaluation returned by evaluateRisk, for example, calculate the area under the risk or recall curves, to use as a metric to compare the performance of a model.

Usage

calculateAUC(x, y)

Arguments

x    a vector of values for the x points.
y    a vector of values for the y points.

Details

The area is returned.
Author(s)

<[email protected]>

References

Package home page: http://rattle.togaware.com

See Also

evaluateRisk.

Examples

## This is usually used in the context of the evaluateRisk function.
## Not run: ev <- evaluateRisk(predicted, actual, risk)

## Imitate this output here.
ev <- data.frame(Caseload=c(1.0, 0.8, 0.6, 0.4, 0.2, 0),
                 Precision=c(0.15, 0.18, 0.21, 0.25, 0.28, 0.30),
                 Recall=c(1.0, 0.95, 0.80, 0.75, 0.5, 0.0),
                 Risk=c(1.0, 0.98, 0.90, 0.77, 0.30, 0.0))

## Calculate the areas under the Risk and the Recall curves.
calculateAUC(ev$Caseload, ev$Risk)
calculateAUC(ev$Caseload, ev$Recall)

centers.hclust          List cluster centers for a hierarchical cluster

Description

Generate a matrix of centers from a hierarchical cluster.

Usage

centers.hclust(x, object, nclust=10, use.median=FALSE)

Arguments

x    The data used to build the cluster.
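The binning function documented earlier carries no Examples section. As a hedged illustration only, quantile binning amounts to cutting a numeric vector at its sample quantiles; the base-R sketch below mimics the effect of binning(x, bins=4, method="quantile") without requiring the rattle package to be installed. The data and variable names are invented for illustration, and this is not rattle's actual implementation.

```r
## Sketch of quantile binning: mimic binning(x, bins=4, method="quantile")
## using only base R. Assumes x is numeric with no missing values.
x <- c(2, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41)

bins <- 4
breaks <- quantile(x, probs=seq(0, 1, length.out=bins + 1))

## include.lowest ensures the minimum lands in the first bin;
## ordered_result mirrors binning()'s default ordered=TRUE.
b <- cut(x, breaks=unique(breaks), include.lowest=TRUE, ordered_result=TRUE)

table(b)      ## roughly equal counts per bin, as expected for quantile binning
nlevels(b)    ## 4
```

Passing labels=c("low", "mid-low", "mid-high", "high") to cut() would name the bins, corresponding to the labels argument of binning.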
Recommended publications
  • Outline of Machine Learning
  • Package 'Rattle'
  • Customer and Business Analytics
  • The R System – an Introduction and Overview
  • USING R to TEACH STATISTICS Charles C. Taylor Department Of
  • Teaching Data Mining in the Era of Big Data
  • Package 'Rattle'
  • Data Mining with Rattle and R
  • The 8Th International Conference
  • The R Journal, May 2009
  • Data Mining with Rattle and R: Survival Guide
  • The Smart Way to Fight Crime - Machine Intelligence Forecasting