Predictive Analytics Using R
Total Page:16
File Type:pdf, Size:1020Kb
Predictive Analytics using R Dr. Jeffrey Strickland is a Senior Predictive Analytics Consultant with over 20 years of Predictive Analytics expereince in multiple industiries including financial, insurance, defense and NASA. He is a subject matter expert on mathematical and statistical modeling, as well as machine using R learning. He has published numerous books on modeling and simulation. Dr. Strickland resides in Colorado. This book is about predictive analytics. Yet, each chapter could easily be handled by an entire volume of its own. So one might think of this a survey of predictive modeling, both statistical (parametric and nonparametric), as well as machine learning. We define predictive model as a statistical model or machine learning model used to predict future behavior based on past behavior. In order to use this book, one should have a basic understanding of mathematical statistics (statistical inference, models, tests, etc.) — this is an advanced book. Some theoretical foundations are laid out (perhaps subtlety) but not proven, but references are provided for additional coverage. Every chapter culminates in an example using R. R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. To download R, please choose your preferred CRAN mirror at http://www.r-project.org/. An introduction to R is also available at http://cran.r-project.org /doc/manuals/r-release/R-intro.html. The book is organized so that statistical models are presented first (hopefully in a logical order), followed by machine learning models, and then applications: uplift modeling and time series. One could use this a textbook with problem solving in R—but there are no “by-hand” exercises. Jeffrey Strickland Jeffrey Strickland ISBN 978-1-312-84101-7 90000 ID: 16206475 www.lulu.com 7813129 841017 Predictive Analytics using R By Jeffrey S. Strickland Simulation Educators Colorado Springs, CO Predictive Analytics using R Copyright 2014 by Jeffrey S. Strickland. All rights Reserved ISBN 978-1-312-84101-7 Published by Lulu, Inc. All Rights Reserved Acknowledgements The author would like to thank colleagues Adam Wright, Adam Miller, Matt Santoni. Working with them over the past two years has validated the concepts presented herein. A special thanks to Dr. Bob Simmonds, who mentored me as a senior operations research analyst. v vi Contents Acknowledgements ................................................................................. v Contents ................................................................................................. vii Preface ................................................................................................. xxv More Books by the Author .................................................................xxvii 1. Predictive analytics ............................................................................ 1 Business Analytics ............................................................................. 1 Types ................................................................................................. 2 Descriptive analytics ................................................................... 2 Predictive analytics ..................................................................... 2 Prescriptive analytics .................................................................. 3 Applications ................................................................................ 3 Definition ........................................................................................... 3 Not Statistics...................................................................................... 4 Types ................................................................................................. 4 Predictive models ....................................................................... 5 Descriptive models ..................................................................... 6 Decision models .......................................................................... 6 Applications ....................................................................................... 6 Analytical customer relationship management (CRM) ............... 6 Clinical decision support systems ............................................... 7 Collection analytics ..................................................................... 7 Cross-sell ..................................................................................... 8 Customer retention .................................................................... 8 vii Direct marketing ......................................................................... 8 Fraud detection .......................................................................... 9 Portfolio, product or economy-level prediction ....................... 10 Risk management ..................................................................... 10 Underwriting ............................................................................. 10 Technology and big data influences ................................................ 11 Analytical Techniques ...................................................................... 11 Regression techniques .............................................................. 12 Machine learning techniques ................................................... 19 Tools ................................................................................................ 22 Notable open source predictive analytic tools include: ........... 22 Notable commercial predictive analytic tools include: ............ 23 PMML ........................................................................................ 23 Criticism ........................................................................................... 24 2. Predictive modeling......................................................................... 25 Propensity Models .......................................................................... 25 Cluster Models ................................................................................ 28 Conclusion ....................................................................................... 30 Presenting and Using the Results of a Predictive Model ................ 31 Applications ..................................................................................... 32 Uplift Modeling ......................................................................... 32 Archaeology .............................................................................. 32 Customer relationship management ........................................ 33 Auto insurance .......................................................................... 33 Health care ................................................................................ 34 viii Notable failures of predictive modeling .......................................... 34 Possible fundamental limitations of predictive model based on data fitting ............................................................................................... 35 Software .......................................................................................... 35 Open Source.............................................................................. 36 Commercial ............................................................................... 37 Introduction to R ............................................................................. 38 3. Modeling Techniques ...................................................................... 41 Statistical Modeling ......................................................................... 41 Formal definition ...................................................................... 42 Model comparison .................................................................... 42 An example ............................................................................... 43 Classification ............................................................................. 43 Machine Learning ............................................................................ 44 Types of problems/tasks ........................................................... 44 Approaches ............................................................................... 46 Applications .............................................................................. 50 4. Empirical Bayes method .................................................................. 53 Introduction ..................................................................................... 53 Point estimation .............................................................................. 55 Robbins method: non-parametric empirical Bayes (NPEB) ...... 55 Example - Accident rates .......................................................... 56 Parametric empirical Bayes ...................................................... 57 Poisson–gamma model ............................................................. 57 Bayesian Linear Regression ....................................................... 58 Software .......................................................................................... 60 ix Example Using R .............................................................................. 60 Model Selection in Bayesian Linear Regression ....................... 60 Model Evaluation .....................................................................