Introduction to Probability and Statistics Using R
G. Jay Kerns
First Edition

IPSUR: Introduction to Probability and Statistics Using R
Copyright © 2010 G. Jay Kerns
ISBN: 978-0-557-24979-4

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

Date: March 24, 2011

Contents

Preface
List of Figures
List of Tables

1 An Introduction to Probability and Statistics
    1.1 Probability
    1.2 Statistics
    Chapter Exercises

2 An Introduction to R
    2.1 Downloading and Installing R
    2.2 Communicating with R
    2.3 Basic R Operations and Concepts
    2.4 Getting Help
    2.5 External Resources
    2.6 Other Tips
    Chapter Exercises

3 Data Description
    3.1 Types of Data
    3.2 Features of Data Distributions
    3.3 Descriptive Statistics
    3.4 Exploratory Data Analysis
    3.5 Multivariate Data and Data Frames
    3.6 Comparing Populations
    Chapter Exercises

4 Probability
    4.1 Sample Spaces
    4.2 Events
    4.3 Model Assignment
    4.4 Properties of Probability
    4.5 Counting Methods
    4.6 Conditional Probability
    4.7 Independent Events
    4.8 Bayes’ Rule
    4.9 Random Variables
    Chapter Exercises

5 Discrete Distributions
    5.1 Discrete Random Variables
    5.2 The Discrete Uniform Distribution
    5.3 The Binomial Distribution
    5.4 Expectation and Moment Generating Functions
    5.5 The Empirical Distribution
    5.6 Other Discrete Distributions
    5.7 Functions of Discrete Random Variables
    Chapter Exercises

6 Continuous Distributions
    6.1 Continuous Random Variables
    6.2 The Continuous Uniform Distribution
    6.3 The Normal Distribution
    6.4 Functions of Continuous Random Variables
    6.5 Other Continuous Distributions
    Chapter Exercises

7 Multivariate Distributions
    7.1 Joint and Marginal Probability Distributions
    7.2 Joint and Marginal Expectation
    7.3 Conditional Distributions
    7.4 Independent Random Variables
    7.5 Exchangeable Random Variables
    7.6 The Bivariate Normal Distribution
    7.7 Bivariate Transformations of Random Variables
    7.8 Remarks for the Multivariate Case
    7.9 The Multinomial Distribution
    Chapter Exercises

8 Sampling Distributions
    8.1 Simple Random Samples
    8.2 Sampling from a Normal Distribution
    8.3 The Central Limit Theorem
    8.4 Sampling Distributions of Two-Sample Statistics
    8.5 Simulated Sampling Distributions
    Chapter Exercises

9 Estimation
    9.1 Point Estimation
    9.2 Confidence Intervals for Means
    9.3 Confidence Intervals for Differences of Means
    9.4 Confidence Intervals for Proportions
    9.5 Confidence Intervals for Variances
    9.6 Fitting Distributions
    9.7 Sample Size and Margin of Error
    9.8 Other Topics
    Chapter Exercises

10 Hypothesis Testing
    10.1 Introduction
    10.2 Tests for Proportions
    10.3 One Sample Tests for Means and Variances
    10.4 Two-Sample Tests for Means and Variances
    10.5 Other Hypothesis Tests
    10.6 Analysis of Variance
    10.7 Sample Size and Power
    Chapter Exercises

11 Simple Linear Regression
    11.1 Basic Philosophy
    11.2 Estimation
    11.3 Model Utility and Inference
    11.4 Residual Analysis
    11.5 Other Diagnostic Tools
    Chapter Exercises

12 Multiple Linear Regression
    12.1 The Multiple Linear Regression Model
    12.2 Estimation and Prediction
    12.3 Model Utility and Inference
    12.4 Polynomial Regression
    12.5 Interaction
    12.6 Qualitative Explanatory Variables
    12.7 Partial F Statistic
    12.8 Residual Analysis and Diagnostic Tools
    12.9 Additional Topics
    Chapter Exercises

13 Resampling Methods
    13.1 Introduction
    13.2 Bootstrap Standard Errors
    13.3 Bootstrap Confidence Intervals
    13.4 Resampling in Hypothesis Tests
    Chapter Exercises

14 Categorical Data Analysis

15 Nonparametric Statistics

16 Time Series

A R Session Information

B GNU Free Documentation License

C History

D Data
    D.1 Data Structures
    D.2 Importing Data
    D.3 Creating New Data Sets
    D.4 Editing Data
    D.5 Exporting Data
    D.6 Reshaping Data

E Mathematical Machinery
    E.1 Set Algebra
    E.2 Differential and Integral Calculus
    E.3 Sequences and Series
    E.4 The Gamma Function
    E.5 Linear Algebra
    E.6 Multivariable Calculus

F Writing Reports with R
    F.1 What to Write
    F.2 How to Write It with R
    F.3 Formatting Tables
    F.4 Other Formats

G Instructions for Instructors
    G.1 Generating This Document
    G.2 How to Use This Document
    G.3 Ancillary Materials
    G.4 Modifying This Document

H RcmdrTestDrive Story

Bibliography

Index

Preface

This book was expanded from lecture materials I use in a one semester upper-division undergraduate course entitled Probability and Statistics at Youngstown State University. Those lecture materials, in turn, were based on notes that I transcribed as a graduate student at Bowling Green State University. The course for which the materials were written is 50-50 Probability and Statistics, and the attendees include mathematics, engineering, and computer science majors (among others). The catalog prerequisites for the course are a full year of calculus.

The book can be subdivided into three basic parts. The first part includes the introductions and elementary descriptive statistics; I want the students to be knee-deep in data right out of the gate. The second part is the study of probability, which begins at the basics of sets and the equally likely model, journeys past discrete/continuous random variables, and continues through to multivariate distributions. The chapter on sampling distributions