
Stat 133 Class Notes - Spring, 2011

Phil Spector

May 31, 2011

Contents

1 Introduction: 1.1 What's this course about?; 1.2 Some Basic Concepts in Computing with Data; 1.3 A Short Note on Academic Integrity; 1.4 Introduction to R

2 The R Language: 2.1 Data in R; 2.2 Vectors; 2.3 Modes and Classes; 2.4 Reading Vectors; 2.5 Missing Values; 2.6 Matrices; 2.7 Data Frames; 2.8 More on Data Frames; 2.9 Reading Data Frames from Files and URLs; 2.10 Working with Multiple Data Frames; 2.11 Adding Color to Plots; 2.12 Using Dates in R; 2.13 Data Summaries; 2.14 Functions; 2.15 Functions; 2.16 Sizes of Objects; 2.17 Character Manipulation; 2.18 Working with Characters

3 Unix: 3.1 Software for Remote Access; 3.2 Basics of Unix; 3.3 Path; 3.4 Basic Commands; 3.5 Command; 3.6 Editors; 3.7 Wildcards; 3.8 Redirection; 3.9

4 Regular Expressions: 4.1 Regular Expressions; 4.2 Regular Expressions; 4.3 How matches are found; 4.4 Tagging and Backreferences; 4.5 Getting Text into R; 4.6 Examples of Reading Web Pages with R; 4.7 Reading a web page into R; 4.8 Another Example; 4.9 Dynamic Web Pages; 4.10 Writing a Function; 4.11 Another Example

5 Graphics: 5.1 More on Plotting; 5.2 Multiple Plots on a Page; 5.3 More on Barplots; 5.4 Mapping; 5.5 The Lattice Plotting Library; 5.6 Customizing the Panel Function; 5.7 Univariate Displays (5.7.1 dotplot; 5.7.2 bwplot; 5.7.3 densityplot); 5.8 barchart; 5.9 3-D Plots: cloud

6 Spreadsheets and Databases: 6.1 Spreadsheets; 6.2 Writing Spreadsheets; 6.3 Databases; 6.4 Working with Databases; 6.5 Regular Expressions in SQL; 6.6 Accessing databases in R; 6.7 Using SQL in R; 6.8 Reading Spreadsheets with the RODBC Library

7 Cluster Analysis: 7.1 Introduction to Cluster Analysis; 7.2 Standardization; 7.3 Distance Measures; 7.4 Clustering Techniques; 7.5 Hierarchical Clustering; 7.6 PAM: Partitioning Around Medoids; 7.7 AGNES: Agglomerative Nesting

8 XML: 8.1 What is XML?; 8.2 A Simple Example; 8.3 More Complex Example

9 Programming: 9.1 Operating on Groups of Data

10 Classification Analysis: 10.1 Introduction to Classification Methods; 10.2 kth Nearest Neighbor Classification; 10.3 Cross Validation; 10.4 Linear Discriminant Analysis; 10.5 Recursive Partitioning

11 Random Numbers and Simulations: 11.1 Hypothesis Testing; 11.2 Determining Power; 11.3 Probability Distributions; 11.4 A Note about Random Numbers in R; 11.5 t-tests; 11.6 Power of the t-test

12 Graphical User Interfaces (GUIs): 12.1 Graphical User Interfaces; 12.2 Making a Calculator; 12.3 Improving the Calculator; 12.4 Appearance of Widgets; 12.5 Fonts and Colors; 12.6 Plotting; 12.7 Binding; 12.8 Checkbuttons; 12.9 Opening Files and Displaying Text; 12.10 Using Images with the tcltk package

13 CGI Programming with R: 13.1 Web Servers; 13.2 CGI Scripting; 13.3 A First CGI program with R; 13.4 Data; 13.5 Combo Forms; 13.6 Graphs; 13.7 Hidden Variables; 13.8 Outgoing HTTP Headers; 13.9 Creating Pretty Output; 13.10 File Upload; 13.11 Debugging CGI Programs

14 Smoothers: 14.1 Smoothers; 14.2 Kernel Smoothers; 14.3 Locally Weighted Regression Smoothers; 14.4 Spline Smoothers; 14.5 Supersmoother; 14.6 Smoothers with Lattice Plots

15 Linear Regression: 15.1 Linear Regression; 15.2 The lm command; 15.3 Using the model object; 15.4 Regression Diagnostics; 15.5 Collinearity; 15.6 Generalized Additive Models (gam); 15.7 Recursive Partitioning; 15.8 Comparison of the 3 Methods

16 Analysis of Variance: 16.1 Analysis of Variance; 16.2 Multiple Comparisons; 16.3 Two-Way ANOVA; 16.4 Another Example; 16.5 More Complex Models; 16.6 Constructing Formulas; 16.7 Alternatives for ANOVA

Chapter 1

Introduction

1.1 What's this course about?

The goal of this course is to introduce you to a variety of concepts and methods that are useful when dealing with data. You can think of data as any information that could potentially be studied to learn more about something. Some simple examples:

1. Sales records for a company

2. Won/Lost records for a sports team

3. Web log listings

4. Email messages

5. Demographic Information found on the web

What may be surprising is that some of the data (for example web log listings or email) consists of text, not numbers. It's increasingly important to be able to deal with text in order to look at the wide variety of information that is available. In statistics courses, a lot of time is spent working on the preliminaries (formulas, algorithms, statistical concepts) in order to prepare the student for the interesting part of statistics, which is studying data for a problem that matters to you. Unfortunately, by the time these preliminaries are covered, many students are bored or frustrated, and they leave the course with a distorted view of statistics. In this class, we're going to do things differently. We will concentrate on:

1. Languages – which will let us read in and manipulate data.

2. Graphical Techniques – which will allow us to display data in a way that makes it easier to understand and easier to make decisions based on the data.

3. Technologies – so that we can present these techniques to someone who's not as knowledgeable as us without too much misery.

The main computer language that we will be using in the course is a statistical programming environment known as R (http://r-project.org). R is freely downloadable and copyable, and you are strongly encouraged to install R on your own computer for use in this course and beyond. We will use this language for data acquisition, data manipulation and producing graphical output. R is not the ideal language for all of the tasks we're going to do, but in the interest of efficiency, we'll try to use it for most things, and point you in the direction of other languages that you might want to explore sometime in the future. Another tool that we'll use is UNIX shell commands, which allow you to store, copy and otherwise manipulate files which contain data, documents or programs. The computer accounts for this course will allow you to access a computer running a version of the UNIX operating system.

More and more commonly, data is stored on database servers, not in ordinary files. Most database servers use some version of a language known as the Structured Query Language (SQL). We'll use the open-source MySQL database as an example of an SQL database. While you shouldn't have any problems understanding any of the statistical techniques we study in this class, a certain level of complexity arises when various techniques are combined together and presented as a start-to-finish solution. To learn how we can make the techniques we've developed available to others, we'll look at graphical user interfaces (GUIs) as well as using web servers to get information from users and display results.

1.2 Some Basic Concepts in Computing with Data

Most real-life projects that involve data can be broken down into several steps:

1. Data Acquisition - we need to find (or collect) the data, and get some representation of it into the computer

2. Data Cleaning - Inevitably, there will be errors in the data, either because they were entered incorrectly, we misunderstood the nature of the data, or records were duplicated or omitted. Many times data is presented for viewing, and extracting the data in some other form becomes a challenge.

3. Data Organization - Depending on what you want to do, you may need to reorganize your data. This is especially true when you need to produce graphical representations of the data. Naturally, we need the appropriate tools to do these tasks.

4. Data Modeling and Presentation - We may fit a statistical model to our data, or we may just produce a graph that shows what we think is important. Often, a variety of models or graphs needs to be considered. It's important to know what techniques are available and whether they are accepted within a particular user community.

Further, techniques such as simulation and resampling have been developed to allow us to get information even in cases where we don’t have exactly the data that we’d like to have.

1.3 A Short Note on Academic Integrity

One of the main ways you are going to learn in this course is by discussing the material in class and in lab sections, and I feel that this is one of the most important aspects of the course. However, the programs you write are like an essay or term paper, and you should not share them with others, or ask others to give their programs to you. The University has a very detailed code of student conduct, available at http://students.berkeley.edu/uga/conductiii-vii.asp. Please refer to that document or talk to me if there are any questions.

1.4 Introduction to R

Here are some basic concepts about R that we’ll return to later in more detail:

• R can be used as a calculator. Any statements that you type at the R prompt will be executed, and the answer printed:

> 12 + 9
[1] 21
> 17.1* 13
[1] 222.3
> 554 /3
[1] 184.6667

Notice that spaces can be placed wherever you’d like, as long as they’re not in the middle of a number.

• R is just as happy to work with vectors (more than one value) as it is with scalars (single values). The function to create a vector from individual values is c():

> c(10,12,19) + c(8,5,9)
[1] 18 17 28
> c(1,2,3) * c(3,2,1)
[1] 3 4 3

When you use operators like * (multiplication) or + (addition), R does the operations element by element.

• There are three ways to get help in R:

1. The help() command, which can be abbreviated by ?, will open a help page in a browser with information about a particular command. For example, to get help on the c function mentioned above, you could type:

> help(c)

or

> ?c

2. The help.search() function, which can be abbreviated by ??, will show other functions already installed in your version of R that relate to a particular topic. For example, to see about functions that combine things in R, you could type

> ??combine

help.search() only looks for functions already installed on your computer.

3. The RSiteSearch() command will open a browser to a searchable database of questions and answers posted on the R-help mailing list. (See http://www.r-project.org/mail.html for more information on the R-help mailing list.)
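For example (a quick sketch added here, not from the original notes; the call simply submits the query and opens the results in your browser, so no console output is shown):

> RSiteSearch('combine')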

• When you type the name of an object into R, it will display that object. This can be frustrating when you want to actually run a function. For example, if you type q at the R prompt, you'll see:

> q
function (save = "default", status = 0, runLast = TRUE)
.Internal(quit(save, status, runLast))

To actually execute the q command, type

> q()

Chapter 2

The R Language

2.1 Data in R

While R can handle many types of data, the three main varieties that we'll be using are numeric, character and logical. In R, you can identify what type of object you're dealing with by using the mode function. For example:

> name = 'phil'
> number = 495
> happy = TRUE
> mode(name)
[1] "character"
> mode(number)
[1] "numeric"
> mode(happy)
[1] "logical"

Note that when we enter character data, it needs to be surrounded by quotes (either double or single), but the symbols TRUE and FALSE (without quotes) are recognized as values of a logical variable. Another important characteristic of an object in R is its class, because many functions know how to treat objects of different classes in a special way. You can find the class of an object with the class function.
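For instance, applying class to the objects created above (a quick added check, not part of the original example) gives results that parallel their modes:

> class(name)
[1] "character"
> class(number)
[1] "numeric"
> class(happy)
[1] "logical"

For simple vectors like these the class and the mode agree; later we'll meet objects (like data frames and factors) where they don't.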

2.2 Vectors

Occasionally it may be useful for a variable (like name or happy in the above example) to have only a single value (like 'phil' or TRUE), but usually we'll want to store more than a single value (sometimes referred to as a scalar) in a variable. A vector is a collection of objects, all of the same mode, that can be stored in a single variable, and accessed through subscripts. For example, consider the minimum temperature in Berkeley for the first 10 days of January, 2006:

50.7 52.8 48.6 53.0 49.9 47.9 54.1 47.6 43.6 45.5

We could create a variable called mintemp as follows:

> mintemp = c(50.7,52.8,48.6,53.0,49.9,47.9,54.1,47.6,43.6,45.5)

The c function is short for catenate or combine, and it's used to put individual values together into vectors. You can find the number of elements in a vector using the length function. Once you've created a vector, you can refer to the elements of the vector using subscripts. Numerical subscripts in R start at 1, and continue up to the length of the vector. Subscripts of 0 are silently ignored. To refer to multiple elements in a vector, simply use the c function to create a vector of the indexes you're interested in. So to extract the first, third, and fifth values of the mintemp vector, we could use:

> mintemp[c(1,3,5)]
[1] 50.7 48.6 49.9

If all of the subscripts of an object are negative, R will ignore those values, and use just the remaining elements. To extract all of the elements of mintemp except the first and last (tenth), use:

> mintemp[-c(1,10)]
[1] 52.8 48.6 53.0 49.9 47.9 54.1 47.6 43.6

In most programming languages, once you're dealing with vectors, you also have to start worrying about loops and other programming problems. Not so in R! For example, suppose we want to use the conversion formula

C = 5/9 (F - 32)

to convert our Fahrenheit temperatures into Celsius. We can act as if mintemp is just a single number, and R will do the hard part:

> mintempC = 5/9 * (mintemp - 32)
> mintempC
 [1] 10.388889 11.555556  9.222222 11.666667  9.944444  8.833333 12.277778
 [8]  8.666667  6.444444  7.500000

In fact, most similar operations in R are vectorized; that is, they operate on entire vectors at once, without the need for loops or other programming. There are some shortcuts to generate vectors. The colon operator lets you generate sequences of integers from one value to another. For example,

> x = 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10

For more control, see the help page for the seq function. You can repeat values using the rep function. This function is very flexible; if called with scalars, it does the obvious:

> rep(5,3)
[1] 5 5 5

with a vector and a scalar, it creates a new vector by repeating the old one:

> y = 3:7
> rep(y,3)
 [1] 3 4 5 6 7 3 4 5 6 7 3 4 5 6 7

Finally, if you call rep with two equal length vectors, it repeats the elements of the first vector as many times as the corresponding element of the second vector:

> rep(1:4,c(2,3,3,4))
 [1] 1 1 2 2 2 3 3 3 4 4 4 4
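Returning to the seq function mentioned above, here is a brief sketch of the kinds of sequences it can produce (an added example, not from the original notes):

> seq(1,10,by=2)
[1] 1 3 5 7 9
> seq(0,1,length.out=5)
[1] 0.00 0.25 0.50 0.75 1.00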

One surprising thing about vectors in R is that many times it will carry out an operation with two vectors that aren’t the same size by simply recycling the values in the shorter vector. For example, suppose we try to add a vector with four numbers to one with just two numbers:

> c(1,2,3,4) + c(1,2)
[1] 2 4 4 6

Notice that, for the last two elements, it simply recycled the 1 and 2 from the second vector to perform the addition. R will be silent when things like this happen, but if the length of the larger vector isn't an even multiple of the length of the smaller vector, R will print a warning:

> c(1,2,3,4) + c(1,2,3)
[1] 2 4 6 5
Warning message:
longer object length is not a multiple of shorter object length in: c(1, 2, 3, 4) + c(1, 2, 3)

It’s possible to provide names for the elements of a vector. Suppose we were working with purchases in a number of states, and we needed to know the sales tax rate for a given state. We could create a named vector as follows:

> taxrate = c(AL=4,CA=7.25,IL=6.25,KS=5.3,NY=4.25,TN=7)
> taxrate
  AL   CA   IL   KS   NY   TN
4.00 7.25 6.25 5.30 4.25 7.00

To add names to a vector after the fact, you can use the names function:

> taxrate = c(4,7.25,6.25,5.3,4.25,7)
> taxrate
[1] 4.00 7.25 6.25 5.30 4.25 7.00
> names(taxrate) = c('AL','CA','IL','KS','NY','TN')
> taxrate
  AL   CA   IL   KS   NY   TN
4.00 7.25 6.25 5.30 4.25 7.00

If you have a named vector, you can access the elements with either numeric subscripts or by using the name of the element you want:

> taxrate[3]
  IL
6.25

> taxrate['KS']
 KS
5.3

One of the most powerful tools in R is the ability to use logical expressions to extract or modify elements in the way that numeric subscripts are traditionally used. While there are (of course) many cases where we're interested in accessing information based on the numeric or character subscript of an object, being able to use logical expressions gives us a much wider choice in the way we can study our data. For example, suppose we want to find all of the observations in taxrate with a tax rate greater than 6. First, let's look at the result of just asking whether taxrate is less than 6:

> taxrate < 6
   AL    CA    IL    KS    NY    TN
 TRUE FALSE FALSE  TRUE  TRUE FALSE

The result is a logical vector of the same length as the vector we were asking about. If we use such a vector to extract values from the taxrate vector, it will give us all the ones that correspond to TRUE values, discarding the ones that correspond to FALSE. So, using the condition we're actually interested in:

> taxrate[taxrate > 6]
  CA   IL   TN
7.25 6.25 7.00

Another important use of logical variables is counting the number of elements of a vector that meet a particular condition. When a logical vector is passed to the sum function, TRUEs count as 1 and FALSEs count as 0. So we can count the number of TRUEs in a logical expression by passing it to sum:

> sum(taxrate > 6)
[1] 3

This tells us three observations in the taxrate vector had values greater than 6. As another example, suppose we want to find which of the states we have information about has the highest sales tax. The max function will find the largest value in a vector. (Once again, note that we don't have to worry about the size of the vector or looping over individual elements.)

> max(taxrate)
[1] 7.25

We can find the state which has the highest tax rate as follows:

> taxrate[taxrate == max(taxrate)]
  CA
7.25

Notice that we use two equal signs when testing for equality, and one equal sign when we are assigning an object to a variable. Another useful tool for these kinds of queries is the which function. It converts between logical subscripts and numeric ones. For example, if we wanted to know the index of the element in the taxrate vector that was the biggest, we could use:

> which(taxrate == max(taxrate))
CA
 2

In fact, this is such a common operation that R provides two functions called which.min and which.max which will return the index of the minimum or maximum element of a vector:

> which.max(taxrate)
CA
 2

While it's certainly not necessary to examine every function that we use in R, it might be interesting to see what which.max is doing beyond our straightforward solution. As always, we can type the name of the function to see what it does:

> which.max
function (x) .Internal(which.max(x))

.Internal means that the function that actually finds the index of the maximum value is compiled inside of R. Generally functions like this will be faster than pure R solutions like the first one we tried. We can use the system.time function to see how much faster which.max will be. Because functions use the equal sign (=) to name their arguments, we’ll use the alternative assignment operator, <- in our call to system.time:

> system.time(one <- which(taxrate == max(taxrate)))
   user  system elapsed
      0       0       0

It’s not surprising to see a time of 0 when operating on such a small vector. It doesn’t mean that it required no time to do the operation, just that the amount of time it required was smaller than the granularity of the system clock. (The granularity of the clock is simply the smallest interval of time that can be measured by the computer.) To get a good comparison, we’ll need to create a larger vector. To do this, we’ll use the rnorm function, which generates random numbers from the normal distribution with mean 0 and standard deviation 1. To get times that we can trust, I’ll use a vector with 10 million elements:

> x = rnorm(10000000)
> system.time(one <- which(x == max(x)))
   user  system elapsed
  0.276   0.016   0.292
> system.time(two <- which.max(x))
   user  system elapsed
  0.068   0.000   0.071

While the pure R solution seems pretty fast (0.292 seconds to find the index of the largest element in a vector of 10 million numbers), the compiled (internal) version is actually around 4 times faster! Of course none of this matters if they don’t get the same answers:

> one
[1] 8232773
> two
[1] 8232773

The two methods do agree. If you try this example on your own computer, you’ll see a different value for the index of the maximum. This is due to the way random numbers are generated in R, and we’ll see how to take more control of this later in the semester.

2.3 Modes and Classes

It was mentioned earlier that all the elements of a vector must be of the same mode. To see the mode of an object, you can use the mode function. What happens if we try to combine objects of different modes using the c function? The answer is that R will find a common mode that can accommodate all the objects, resulting in the mode of some of the objects changing. For example, let's try combining some numbers and some character strings:

> both = c('dog',3,'cat','mouse',7,12,9,'chicken')
> both
[1] "dog"     "3"       "cat"     "mouse"   "7"       "12"      "9"
[8] "chicken"
> mode(both)
[1] "character"

You can see that the numbers have been changed to characters because they are now displayed surrounded by quotes. They also will no longer behave like numbers:

> both[2] + both[5]
Error in both[2] + both[5] : non-numeric argument to binary operator

The error message means that the two values can no longer be added. If you really need to treat character strings like numbers, you can use the as.numeric function:

> as.numeric(both[2]) + as.numeric(both[5])
[1] 10

Of course, the best thing is to avoid combining objects of different modes with the c function. We'll see later that R provides an object known as a list that can store different types of objects without having to change their modes.
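As a small preview of lists (an added sketch, not part of the original example), the stored objects keep their original modes, so a number inside a list is still usable as a number:

> mixed = list('dog',3,'cat')
> mixed[[2]] + 1
[1] 4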

2.4 Reading Vectors

Once you start working with larger amounts of data, it becomes very tedious to enter data into the c function, especially considering the need to put quotes around character values and commas between values. To read data from a file or from the terminal without the need for quotes and commas, you can use the scan function. To read from a file (or a URL), pass it a quoted string with the name of the file or URL you wish to read; to read from the terminal, call scan() with no arguments, and enter a completely blank line when you're done entering your data. Additionally, on Windows or Mac OS X, you can substitute a call to the file.choose() function for the quoted string with the file name, and you'll be presented with the familiar file chooser used by most programs on those platforms. Suppose there's a file called numbers in your working directory. (You can get your working directory by calling the getwd() function, or set it using the setwd function or the File -> Change dir selection in the R console.) Let's say the contents of this file looks like this:

12 7 9 8 14 10 17

The scan function can be used to read these numbers as follows:

> nums = scan('numbers')
Read 7 items
> nums
[1] 12  7  9  8 14 10 17

The optional what= argument to scan can be used to read vectors of character or logical values, but remember a vector can only hold objects all of which are of the same mode.
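For example (a hypothetical sketch, not from the original notes), if a file called animals contained the words dog, cat and mouse, we could read them into a character vector by giving what= an example of the type we want:

> animals = scan('animals',what='')
Read 3 items
> animals
[1] "dog"   "cat"   "mouse"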

2.5 Missing Values

No matter how carefully we collect our data, there will always be situations where we don't know the value of a particular variable. For example, we might conduct a survey where we ask people 10 questions, and occasionally we forget to ask one, or people don't know the proper answer. We don't want values like this to enter into calculations, but we can't just eliminate them because then observations that have missing values won't "fit in" with the rest of the data. In R, missing values are represented by the string NA. For example, suppose we have a vector of nine values, but the fourth one is missing. I can enter a missing value by passing NA to the c function just as if it was a number (no quotes needed):

x = c(1,4,7,NA,12,19,15,21,20)

R will also recognize the unquoted string NA as a missing value when data is read from a file or URL. Missing values are different from other values in R in two ways:

1. Any computation involving a missing value will return a missing value.

2. Unlike other quantities in R, we can’t directly test to see if something is equal to a missing value with the equality operator (==). We must use the builtin is.na function, which will return TRUE if a value is missing and FALSE otherwise.

Here are some simple R statements that illustrate these points:

> x = c(1,4,7,NA,12,19,15,21,20)
> mean(x)
[1] NA
> x == NA
[1] NA NA NA NA NA NA NA NA NA

Fortunately, these problems are fairly easy to solve. In the first case, many functions (like mean, min, max, sd, quantile, etc.) accept an na.rm=TRUE argument, which tells the function to remove any missing values before performing the computation:

> mean(x,na.rm=TRUE)
[1] 12.375

In the second case, we just need to remember to always use is.na whenever we are testing to see if a value is a missing value.

> is.na(x)
[1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE

By combining a call to is.na with the logical "not" operator (!) we can filter out missing values in cases where no na.rm= argument is available:

> x[!is.na(x)]
[1]  1  4  7 12 19 15 21 20

2.6 Matrices

A very common way of storing data is in a matrix, which is basically a two-way generalization of a vector. Instead of a single index, we can use two indexes, one representing a row and the second representing a column. The matrix function takes a vector and makes it into a matrix in a column-wise fashion. For example,

> mymat = matrix(1:12,4,3)
> mymat
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

The last two arguments to matrix tell it the number of rows and columns the matrix should have. If you use a named argument, you can specify just one dimension, and R will figure out the other:

> mymat = matrix(1:12,ncol=3)
> mymat
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12

To create a matrix by rows instead of by columns, the byrow=TRUE argument can be used:

> mymat = matrix(1:12,ncol=3,byrow=TRUE)
> mymat
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12

When data is being read from a file, you can simply embed a call to scan into a call to matrix. Suppose we have a file called matrix.dat with the following contents:

7 12 19 4
18 7 12 3
9 5 8 42

We could create a 3×4 matrix, read in by rows, with the following command:

matrix(scan('matrix.dat'),nrow=3,byrow=TRUE)

To access a single element of a matrix, we need to specify both the row and the column we’re interested in. Consider the following matrix, containing the numbers from 1 to 10:

> m = matrix(1:10,5,2)
> m
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10

Now suppose we want the element in row 4 and column 1:

> m[4,1]
[1] 4

If we leave out either one of the subscripts, we’ll get the entire row or column of the matrix, depending on which subscript we leave out:

> m[4,]
[1] 4 9
> m[,1]
[1] 1 2 3 4 5
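Vectors of subscripts also work inside the brackets. For example (a quick sketch added here, using the matrix m from above), we can pull out several rows of one column, or a submatrix:

> m[2:4,1]
[1] 2 3 4
> m[1:2,]
     [,1] [,2]
[1,]    1    6
[2,]    2    7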

2.7 Data Frames

One shortcoming of vectors and matrices is that they can only hold one mode of data; they don't allow us to mix, say, numbers and character strings. If we try to do so, it will change the mode of the other elements in the vector to conform. For example:

> c(12,9,"dog",7,5)
[1] "12"  "9"   "dog" "7"   "5"

Notice that the numbers got changed to character values so that the vector could accommodate all the elements we passed to the c function. In R, a special object known as a data frame resolves this problem. A data frame is like a matrix in that it represents a rectangular array of data, but each column in a data frame can be of a different mode, allowing numbers, character strings and logical values to coincide in a single object in their original forms. Since most interesting data problems involve a mixture of character variables and numeric variables, data frames are usually the best way to store information in R. (It should be mentioned that if you're dealing with data of a single mode, a matrix may be more efficient than a data frame.) Data frames correspond to the traditional "observations and variables" model that most statistical software uses, and they are also similar to database tables. Each row of a data frame represents an observation; the elements in a given row represent information about that observation. Each column, taken as a whole, has all the information about a particular variable for the data set. For small datasets, you can enter each of the columns (variables) of your data frame using the data.frame function. For example, let's extend our temperature example by creating a data frame that has the day of the month, the minimum temperature and the maximum temperature:

> temps = data.frame(day=1:10,
+     min = c(50.7,52.8,48.6,53.0,49.9,47.9,54.1,47.6,43.6,45.5),
+     max = c(59.5,55.7,57.3,71.5,69.8,68.8,67.5,66.0,66.1,61.7))
> head(temps)
  day  min  max
1   1 50.7 59.5
2   2 52.8 55.7
3   3 48.6 57.3
4   4 53.0 71.5
5   5 49.9 69.8
6   6 47.9 68.8

Note that the names we used when we created the data frame are displayed with the data. (You can add names after the fact with the names function.) Also, instead of typing the name temps to see the data frame, we used a call to the head function instead. This will show just the first six observations (by default) of the data frame, and is very handy to check to make sure a large data frame really looks the way you think. (There's a function called tail that shows the last lines in an object as well.)

If we try to look at the class or mode of a data frame, it's not that informative:

> class(temps)
[1] "data.frame"
> mode(temps)
[1] "list"

We'll see the same results for every data frame we use. To look at the modes of the individual columns of a data frame, we can use the sapply function. This function simplifies operations that would require loops in other languages, and automatically returns the appropriate results for the operation it performs. To use sapply on a data frame, pass the data frame as the first argument to sapply, and the function you wish to use as the second argument. So to find the modes of the individual columns of the temps data frame, we could use

> sapply(temps,mode)
      day       min       max
"numeric" "numeric" "numeric"

Notice that sapply even labeled the result with the name of each column. Suppose we want to concentrate on the maximum daily temperature (which we've called max in our data frame) among the days recorded. There are several ways we can refer to the columns of a data frame:

1. Probably the easiest way to refer to this column is to use a special notation that eliminates the need to put quotes around the variable names (unless they contain blanks or other special characters). Separate the data frame name from the variable name with a dollar sign ($):

> temps$max
 [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7

2. We can treat the data frame like it was a matrix. Since the maximum temperature is in the third column, we could say

> temps[,3]
 [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7

3. Since we named the columns of temps we can use a character subscript:

> temps[,"max"]
 [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7

4. When you use a single subscript with a data frame, it refers to a data frame consisting of just that column. R also provides a special subscripting method (double brackets) to extract the actual data (in this case a vector) from the data frame:

> temps['max']
    max
1  59.5
2  55.7
3  57.3
4  71.5
5  69.8
6  68.8
7  67.5
8  66.0
9  66.1
10 61.7
> temps[['max']]
 [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7

Notice that this second form is identical to temps$max. We could also use the equivalent numerical subscript (in this case 3) with single or double brackets.
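For instance (a short added check, not in the original notes), max is the third column of temps, so the numeric versions of these two forms look like this:

> temps[[3]]
 [1] 59.5 55.7 57.3 71.5 69.8 68.8 67.5 66.0 66.1 61.7
> head(temps[3],3)
   max
1 59.5
2 55.7
3 57.3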

2.8 More on Data Frames

1. Notice that if you want to extract more than one column of a data frame, you need to use single brackets, not double:

> temps[c('min','max')]
    min  max
1  50.7 59.5
2  52.8 55.7
3  48.6 57.3
4  53.0 71.5
5  49.9 69.8
6  47.9 68.8
7  54.1 67.5
8  47.6 66.0
9  43.6 66.1
10 45.5 61.7
> temps[[c('min','max')]]
Error in .subset2(x, i, exact = exact) : subscript out of bounds

2. If you want to work with a data frame without having to constantly retype the data frame’s name, you can use the with function. Suppose we want to convert our minimum and maximum temperatures to centigrade, and then calculate the difference between them. Using with, we can write:

> with(temps,5/9*(max-32) - 5/9*(min-32))
 [1]  4.888889  1.611111  4.833333 10.277778 11.055556 11.611111  7.444444
 [8] 10.222222 12.500000  9.000000

which may be more convenient than typing out the data frame name repeatedly:

> 5/9*(temps$max-32) - 5/9*(temps$min-32)
 [1]  4.888889  1.611111  4.833333 10.277778 11.055556 11.611111  7.444444
 [8] 10.222222 12.500000  9.000000

3. Finally, if the goal is to add one or more new columns to a data frame, you can combine a few operations into one using the transform function. The first argument to transform is the name of the data frame that will be used to construct the new columns. The remaining arguments to transform are name/value pairs describing the new columns. For example, suppose we wanted to create a new variable in the temps data frame called range, representing the difference between the min and max values for each day. We could use transform as follows:

> temps = transform(temps,range = max - min)
> head(temps)
  day  min  max range
1   1 50.7 59.5   8.8
2   2 52.8 55.7   2.9
3   3 48.6 57.3   8.7
4   4 53.0 71.5  18.5
5   5 49.9 69.8  19.9
6   6 47.9 68.8  20.9

As can be seen, transform returns a new data frame like the original one, but with one or more new columns added.
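The same column could also be added directly with the dollar-sign notation; a small equivalent sketch (added here, not in the original notes):

> temps$range = temps$max - temps$min
> head(temps,3)
  day  min  max range
1   1 50.7 59.5   8.8
2   2 52.8 55.7   2.9
3   3 48.6 57.3   8.7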

2.9 Reading Data Frames from Files and URLs

While creating a data frame the way we just did is very handy for quick examples, it's actually pretty rare to enter a data frame in that way; usually we'll be reading data from a file or possibly a URL. In these cases, the read.table function (or one of its closely related variations described below) can be used. read.table tries to be clever about figuring out what type of data you'll be using, and automatically determines how each column of the data frame should be stored. One problem with this scheme has to do with a special type of variable known as a factor. A factor in R is a variable that is stored as an integer, but displayed as a character string. By default, read.table will automatically turn all the character variables that it reads into factors. You can recognize factors by using either the is.factor function or by examining the object's class, using the class function. Factors are very useful for storing large data sets compactly, as well as for statistical modeling and other tasks, but when you're first working with R they'll most likely just get in the way. To keep read.table from doing any factor conversions, pass the stringsAsFactors=FALSE argument as shown in the examples below. By default, R expects there to be at least one space or tab between each of the data values in your input file; if you're using a different character to separate your values, you can specify it with the sep= argument. Two special versions of read.table are provided to handle two common cases: read.csv for files where the data is separated by commas, and read.delim when a tab character is used to separate values. On the other hand, if the variables in your input data occupy the same columns for every line in the file, the read.fwf function can be used to turn your data into a data frame. If the first line of your input file contains the names of the variables in your data separated with the same separator used for the rest of the data, you can pass the header=TRUE argument to read.table and its variants, and the variables (columns) of your data frame will be named accordingly. Otherwise, names like V1, V2, etc. will be used. As an example of how to read data into a data frame, the URL http://www.stat.berkeley.edu/classes/s133/data/world.txt contains information about

literacy, gross domestic product, income and military expenditures for about 150 countries. Here are the first few lines of the file:

country,gdp,income,literacy,military
Albania,4500,4937,98.7,56500000
Algeria,5900,6799,69.8,2.48e+09
Angola,1900,2457,66.8,183580000
Argentina,11200,12468,97.2,4.3e+09
Armenia,3900,3806,99.4,1.35e+08

(You can use your favorite browser to examine a file like this, or you can use R's download.file and file.edit functions to download a copy to your computer and examine it locally.) Since the values are separated by commas, and the variable names can be found in the first line of the file, we can read the data into a data frame as follows:

world = read.csv('http://www.stat.berkeley.edu/classes/s133/data/world.txt',header=TRUE,stringsAsFactors=FALSE)

Now that we've created the data frame, we need to look at some ways to understand what our data is like. The class and mode of objects in R is very important, but if we query them for our data frame, they're not very interesting:

> mode(world)
[1] "list"
> class(world)
[1] "data.frame"

Note that a data frame is also a list. We'll look at lists in more detail later. As we've seen, we can use the sapply function to see the modes of the individual columns. This function will apply a function to each element of a list; for a data frame these elements represent the columns (variables), so it will do exactly what we want:

> sapply(world,mode)
    country         gdp      income    literacy    military
"character"   "numeric"   "numeric"   "numeric"   "numeric"
> sapply(world,class)
    country         gdp      income    literacy    military
"character"   "integer"   "integer"   "numeric"   "numeric"

You might want to experiment with sapply using other functions to get familiar with some strategies for dealing with data frames. You can always view the names of the variables in a data frame by using the names function, and the size (number of observations and number of variables) using the dim function:

> names(world)
[1] "country"  "gdp"      "income"   "literacy" "military"
> dim(world)
[1] 154   5
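As one such experiment with sapply (a small added sketch, not from the original notes), we could check which of the columns are numeric:

> sapply(world,is.numeric)
 country      gdp   income literacy military
   FALSE     TRUE     TRUE     TRUE     TRUE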

Suppose we want to see the country for which military spending is the highest. We can use the which.max function that we used before, but extra care is needed to make sure we get the piece of the data frame we want. Since each country occupies one row in the data frame, we want all of the columns in that row, and we can leave the second index of the data frame blank:

> world[which.max(world$military),]
    country   gdp income literacy  military
142     USA 37800  39496     99.9 3.707e+11

The 142 at the beginning of the line is the row number of the observation. If you’d like to use a more informative label for the rows, look at the row.names= argument in read.table and data.frame, or use the assignment form of the row.names function if the data frame already exists. These types of queries, where we want to find observations from a data frame that have certain properties, are so common that R provides a function called subset to make them easier and more readable. The subset function requires two arguments: the first is a data frame, and the second is the condition that you want to use to create the subset. An optional third argument called select= allows you to specify which of the variables in the data frame you’re interested in. The return value from subset is a data frame, so you can use it anywhere that you’d normally use a data frame. A very attractive feature of subset is that you can refer to the columns of a data frame directly in the second or third arguments; you don’t need to keep retyping the data frame’s name, or surround all the variable names with quotes. Suppose we want to find those countries whose literacy rate is below 20%. We could use the subset function like this:

> subset(world,literacy < 20)
         country  gdp income literacy military
22  Burkina Faso 1100   1258     12.8 64200000
88          Mali  900   1024     19.0 22400000
102        Niger  800    865     14.4 33300000

One other feature of the select= argument is that it converts variable names to numbers before extracting the requested variables, so you can use “ranges” of variable names to specify contiguous columns in a data frame. For example, here are the names for the world data frame:

> names(world)
[1] "country"  "gdp"      "income"   "literacy" "military"

To create a data frame with just the last three variables, we could use

> subset(world,select=income:military)

If we were interested in a particular variable, it would be useful to reorder the rows of our data frame so that they were arranged in descending or ascending order of that variable. It's easy enough to sort a variable in R; using literacy as an example, we simply call the sort routine:

> sort(world$literacy)
  [1] 12.8 14.4 19.0 25.5 29.6 33.6 39.3 39.6 41.0 41.1 41.5 46.5 47.0 48.6 48.6
 [16] 48.7 49.0 50.7 51.2 51.9 53.0 54.1 55.6 56.2 56.7 57.3 58.9 59.0 61.0 64.0
 [31] 64.1 65.5 66.8 66.8 67.9 67.9 68.7 68.9 69.1 69.4 69.8 70.6 71.0 73.6 73.6
 [46] 74.3 74.4 75.7 76.7 76.9 77.0 77.3 78.9 79.2 79.4 79.7 80.0 81.4 81.7 82.4
 [61] 82.8 82.9 82.9 84.2 84.3 85.0 86.5 86.5 87.6 87.7 87.7 87.7 87.9 87.9 88.0
 [76] 88.3 88.4 88.7 89.2 89.9 90.0 90.3 90.3 90.4 90.9 91.0 91.0 91.6 91.9 91.9
 [91] 92.5 92.5 92.6 92.6 92.7 92.9 93.0 94.2 94.6 95.7 95.8 96.2 96.5 96.8 96.8
[106] 96.9 96.9 97.2 97.2 97.3 97.7 97.7 97.8 98.1 98.2 98.5 98.5 98.7 98.7 98.8
[121] 98.8 99.3 99.3 99.4 99.4 99.5 99.5 99.6 99.6 99.6 99.7 99.7 99.7 99.8 99.9
[136] 99.9 99.9 99.9 99.9 99.9 99.9 99.9 99.9 99.9 99.9 99.9 99.9 99.9 99.9 99.9
[151] 99.9 99.9 99.9 99.9

To reorder the rows of a data frame to correspond to the sorted order of one of the variables in the data frame, the order function can be used. This function returns a set of indices which are in the proper order to rearrange the data frame appropriately. (Perhaps the easiest way to understand what the order function does is to realize that x[order(x)] is the same as sort(x).)

> sworld = world[order(world$literacy),]
> head(sworld)
         country  gdp income literacy  military
22  Burkina Faso 1100   1258     12.8  64200000
103        Niger  800    865     14.4  33300000
89          Mali  900   1024     19.0  22400000
29          Chad 1200   1555     25.5 101300000
121 Sierra Leone  500    842     29.6  13200000
14         Benin 1100   1094     33.6  96500000

To sort by descending values of a variable, pass the decreasing=TRUE argument to sort or order. When you're first working with a data frame, it can be helpful to get some preliminary information about the variables. One easy way to do this is to pass the data frame to the summary function, which understands what a data frame is, and will give separate summaries for each of the variables:

> summary(world)
   country               gdp            income         literacy
 Length:154         Min.   :  500   Min.   :  569   Min.   :12.80
 Class :character   1st Qu.: 1825   1st Qu.: 2176   1st Qu.:69.17
 Mode  :character   Median : 4900   Median : 5930   Median :88.55
                    Mean   : 9031   Mean   :10319   Mean   :81.05
                    3rd Qu.:11700   3rd Qu.:15066   3rd Qu.:98.42
                    Max.   :55100   Max.   :63609   Max.   :99.90
                                    NA's   :1
    military
 Min.   :6.500e+06
 1st Qu.:5.655e+07
 Median :2.436e+08
 Mean   :5.645e+09
 3rd Qu.:1.754e+09
 Max.   :3.707e+11

Another useful way to view the properties of a variable is with the stem function, which produces a text-based stem-and-leaf diagram. Each observation for the variable is represented by a number in the diagram showing that observation's value:

> stem(world$gdp)

The decimal point is 4 digit(s) to the right of the |

0 | 11111111111111111111111111112222222222222222222223333333333344444444
0 | 55555555555666666666677777778889999
1 | 000111111223334
1 | 66788889
2 | 0022234
2 | 7778888999
3 | 00013
3 | 88
4 |
4 |
5 |
5 | 5

Graphical techniques are often useful when exploring a data frame. While we’ll look at graphics in more detail later, the functions boxplot, hist, and plot combined with the density function are often good choices. Here are examples:

> boxplot(world$gdp,main='Boxplot of GDP')
> hist(world$gdp,main='Histogram of GDP')
> plot(density(world$gdp),main='Density of GDP')

2.10 Working with Multiple Data Frames

Suppose we want to add some additional information to our data frame, for example the continents in which the countries can be found. Very often we have information from different sources and it's very important to combine it correctly. The URL http://www.stat.berkeley.edu/s133/data/conts.txt contains the information about the continents. Here are the first few lines of that file:

country,cont
Afghanistan,AS
Albania,EU
Algeria,AF
American Samoa,OC
Andorra,EU

In R, the merge function allows you to combine two data frames based on the value of a variable that's common to both of them. The new data frame will have all of the variables from both of the original data frames. First, we'll read in the continent values into a data frame called conts:

conts = read.csv('http://www.stat.berkeley.edu/classes/s133/data/conts.txt',na.string='.',stringsAsFactors=FALSE)

To merge two data frames, we simply need to tell the merge function which variable(s) the two data frames have in common, in this case country:

world1 = merge(world,conts,by='country')

Notice that we pass the name of the variable that we want to merge by, not the actual value of the variable itself. The first few records of the merged data set look like this:

> head(world1)
    country   gdp income literacy   military cont
1   Albania  4500   4937     98.7 5.6500e+07   EU
2   Algeria  5900   6799     69.8 2.4800e+09   AF
3    Angola  1900   2457     66.8 1.8358e+08   AF
4 Argentina 11200  12468     97.2 4.3000e+09   SA
5   Armenia  3900   3806     99.4 1.3500e+08   AS
6 Australia 28900  29893     99.9 1.6650e+10   OC

We've already seen how to count specific conditions, like how many countries in our data frame are in Europe:

> sum(world1$cont == 'EU')
[1] 34

It would be tedious to have to repeat this for each of the continents. Instead, we can use the table function:

> table(world1$cont)

AF AS EU NA OC SA
47 41 34 15  4 12

We can now examine the variables taking into account the continent that they’re in. For example, suppose we wanted to view the literacy rates of countries in the different continents. We can produce side-by-side boxplots like this:

> boxplot(split(world1$literacy,world1$cont),main='Literacy by Continent')

Now let’s concentrate on plots involving two variables. It may be surprising, but R is smart enough to know how to “plot” a dataframe. It actually calls the pairs function, which will produce what’s called a scatterplot matrix. This is a display with many little graphs showing the relationships between each pair of variables in the data frame. Before we can call plot, we need to remove the character variables (country and cont) from the data using negative subscripts:

> plot(world1[,-c(1,6)])

33 The resulting plot looks like this:

As we’d expect, gdp (Gross Domestic Product) and income seem to have a very consistent relationship. The relation between literacy and income appears to be interesting, so we’ll examine it in more detail, by making a separate plot for it:

> with(world,plot(literacy,income))

The first variable we pass to plot (literacy in this example) will be used for the x-axis, and the second (income) will be used on the y-axis. The plot looks like this:

In many cases, the most interesting points on a graph are the ones that don't follow the usual relationships. In this case, there are a few points where the income is a bit higher than we'd expect based on the other countries, considering the rate of literacy. To see which countries they represent, we can use the identify function. You call identify with the same arguments as you passed to plot; then when you click on a point on the graph with the left mouse button, its row number will be printed on the graph. It's usually helpful to have more than just the row number, so identify is usually called with a labels= argument. In this case, the obvious choice is the country name. The way to stop identifying points depends on your operating system; on Windows, right click on the plot and choose "Stop"; on Unix/Linux, click on the plot window with the middle button. Here's the previous graph after some of the outlier points are identified (a sketch of the identify call itself is shown below):
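A minimal sketch of what that interactive session might look like (added here, assuming the same world data frame and plot as above; the vector of row numbers that identify returns when you stop is omitted, since it depends on which points you click):

> with(world,plot(literacy,income))
> with(world,identify(literacy,income,labels=country))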

2.11 Adding Color to Plots

Color is often referred to as the third dimension of a 2-dimensional plot, because it allows us to add extra information to an ordinary scatterplot. Consider the graph of literacy and income. By examining boxplots, we can see that there are differences among the distributions of income (and literacy) for the different continents, and it would be nice to display some of that information on a scatterplot. This is one situation where factors come in very handy. Since factors are stored internally as numbers (starting at 1 and going up to the number of unique levels of the factor), it's very easy to assign different observations different colors based on the value of a factor variable. To illustrate, let's replot the income vs. literacy graph, but this time we'll convert the continent into a factor and use it to decide on the color of the points that will be used for each country. First, consider the world1 data frame. In that data frame, the continent is stored in the column (variable) called cont. We convert this variable to a factor with the factor function. First, let's look at the mode and class of the variable before we convert it to a factor:

> mode(world1$cont)
[1] "character"
> class(world1$cont)
[1] "character"
> world1$cont = factor(world1$cont)

In many situations, the cont variable will behave the same as it did when it was a simple character variable, but notice that its mode and class have changed:

> mode(world1$cont)
[1] "numeric"
> class(world1$cont)
[1] "factor"

Having made cont into a factor, we need to choose some colors to represent the different continents. There are a few ways to tell R what colors you want to use. The easiest is to just use a color's name. Most colors you think of will work, but you can run the colors function without an argument to see the official list. You can also use the method that's commonly used by web designers, where colors are specified as a pound sign (#) followed by 3 sets of hexadecimal digits providing the levels of red, green and blue, respectively. Using this scheme, red is represented as '#FF0000', green as '#00FF00', and blue as '#0000FF'. To see how many unique values of cont there are, we can use the levels function, since it's a factor. (For non-factors, the unique function is available, but it may give the levels in an unexpected order.)

> levels(world1$cont)
[1] "AF" "AS" "EU" "NA" "OC" "SA"

There are six levels. The first step is to create a vector of color values:

mycolors = c('red','yellow','blue','green','orange','violet')

To make the best possible graph, you should probably be more careful when choosing the colors, but this will serve as a simple example. Now, when we make the scatterplot, we add an additional argument, col=, which is a vector of the same length as the number of pairs of points that we're plotting – the color in each position corresponds to the color that will be used to draw that point on the graph. Probably the easiest way to do that is to use the value of the factor cont as a subscript to the mycolors vector that we created earlier. (If you don't see why this does what we want, please take a look at the result of mycolors[world1$cont]).

with(world1,plot(literacy,income,col=mycolors[cont]))

There's one more detail that we need to take care of. Since we're using color on the graph, we have to provide some way that someone viewing the graph can tell which color represents which continent, i.e. we need to add a legend to the graph. In R, this is done with the legend command. There are many options to this command, but in its simplest form we just tell R where to put the legend, whether we should show points or lines, and what colors they should be. A title for the legend can also be added, which is a good idea in this example, because the meaning of the continent abbreviations may not be immediately apparent. You can specify x- and y-coordinates for the legend location or you can use one of several shortcuts like "topleft" to do things automatically. (You may also want to look at the locator command, that lets you decide where to place your legends interactively). For our example, the following will place a legend in an appropriate place; the title command is also used to add a title to the plot:

with(world1,legend('topleft',legend=levels(cont),col=mycolors,pch=1,title='Continent'))
title('Income versus Literacy for Countries around the World')

Notice how the title function can be used to add a title to a plot after it’s displayed if you forget to provide a main= argument to plot. The pch= argument to the legend function is a graphics parameter representing the plotting character. While the plot function uses a value of pch=1 by default, the legend function won’t display anything if you don’t provide a pch= argument. (You might want to experiment with different values for the pch= argument in the plot function.) Here’s what the plot looks like:

2.12 Using Dates in R

Dates on computers have been the source of much anxiety, especially at the turn of the century, when people felt that many computers wouldn't understand the new millennium. These fears were based on the fact that certain programs would store the value of the year in just 2 digits, causing great confusion when the century "turned over". In R, dates are stored as they have traditionally been stored on Unix computers – as the number of days from a reference date, in this case January 1, 1970, with earlier days being represented by negative numbers. When dates are stored this way, they can be manipulated like any other numeric variable (as far as it makes sense). In particular, you can compare or sort dates, take the difference between two dates, or add an increment of days, weeks, months or years to a date. The class of such dates is Date and their mode is numeric. Dates are created with as.Date, and formatted for printing with format (which will recognize dates and do the right thing). Because dates can be written in so many different formats, R uses a standard way of providing flexibility when reading or displaying dates. A set of format codes, some of which are shown in the table below, is used to describe what the input or output form of the date looks like. The default format for as.Date is a four digit year, followed by a month, then a day, separated by either dashes or slashes. So conversions like this happen automatically:

> as.Date('1915-6-16')
[1] "1915-06-16"
> as.Date('1890/2/17')
[1] "1890-02-17"

The formatting codes are as follows (for a complete list of the format codes, see the R help page for the strptime function):

Code   Value
%d     Day of the month (decimal number)
%m     Month (decimal number)
%b     Month (abbreviated)
%B     Month (full name)
%y     Year (2 digit)
%Y     Year (4 digit)

As an example of reading dates, the URL http://www.stat.berkeley.edu/classes/s133/data/movies.txt contains the names, release dates, and box office earnings for around 700 of the most popular movies of all time. The first few lines of the input file look like this:

Rank|name|box|date
1|Avatar|$759.563|December 18, 2009
2|Titanic|$600.788|December 19, 1997
3|The Dark Knight|$533.184|July 18, 2008

As can be seen, the fields are separated by vertical bars, so we can use read.delim with the appropriate sep= argument.

> movies = read.delim('http://www.stat.berkeley.edu/classes/s133/data/movies.txt',
+                     sep='|',stringsAsFactors=FALSE)
> head(movies)
  rank                               name      box              date
1    1                             Avatar $759.563 December 18, 2009
2    2                            Titanic $600.788 December 19, 1997
3    3                    The Dark Knight $533.184     July 18, 2008
4    4 Star Wars: Episode IV - A New Hope $460.998      May 25, 1977
5    5                            Shrek 2 $437.212      May 19, 2004
6    6         E.T. the Extra-Terrestrial $434.975     June 11, 1982

The first step in using a data frame is making sure that we know what we’re dealing with. A good first step is to use the sapply function to look at the mode of each of the variables:

> sapply(movies,mode)
       rank        name         box        date 
  "numeric" "character" "character" "character" 

Unfortunately, the box office receipts (box) are character, not numeric. That’s because R doesn’t recognize a dollar sign ($) as being part of a number. (R has the same problem with commas.) We can remove the dollar sign with the sub function, and then use as.numeric to make the result into a number:

> movies$box = as.numeric(sub(’\\$’,’’,movies$box))

To convert the character date values to R Date objects, we can use as.Date with the appropriate format: in this case it’s the month name followed by the day of the month, a comma and the four digit year. Consulting the table of format codes, this translates to ’%B %d, %Y’:

> movies$date = as.Date(movies$date,'%B %d, %Y')
> head(movies$date)
[1] "2009-12-18" "1997-12-19" "2008-07-18" "1977-05-25" "2004-05-19"
[6] "1982-06-11"

The format that R now uses to print the dates is the standard Date format, letting us know that we’ve done the conversion correctly. (If we wanted to recover the original format, we could use the format function with a format similar to the one we used to read the data.) Now we can perform calculations using the date. For example, to see the difference in time between the release of Titanic and Avatar (2 very popular movies directed by James Cameron), we could use:

> movies$date[movies$name == ’Avatar’] - movies$date[movies$name == ’Titanic’] Time difference of 4382 days

Even though the result prints out as a character string, it's actually just a number which can be used any way a number could be used. Now suppose we want to see the time difference in years. To convert days to years, we can divide by 365.25. (The .25 tries to account for leap years.):

> diff = movies$date[movies$name == 'Avatar'] - movies$date[movies$name == 'Titanic']
> diff / 365.25
Time difference of 11.99726 days

We could either adjust the units attribute of this value or use as.numeric to convert it to an ordinary number. (In R, an attribute is additional information stored along with a variable.)

> diff = diff / 365.25
> attr(diff,'units') = 'years'
> diff
Time difference of 11.99726 years
> as.numeric(diff)
[1] 11.99726

Either way, it will be treated as an ordinary number when used in a calculation. The Sys.Date function can be used to return the current date, so R can calculate the time until any date you choose. For example, the midterm for this class is March 2, 2011:

> as.Date(’2011-03-02’) - Sys.Date() Time difference of 28 days

Another way to create dates is with the ISOdate function. This function accepts three numbers representing the year, month and day of the date that is desired. So to reproduce the midterm date we could use

> midterm = ISOdate(2011,3,2) > midterm [1] "2011-03-02 12:00:00 GMT"

Notice that, along with the date, a time is printed. That’s because ISOdate returns an object of class POSIXt, not Date. To make a date like this work properly with objects of class Date, you can use the as.Date function. Once we’ve created an R Date value, we can use the functions months, weekdays or quarters to extract those parts of the date. For example, to see which day of the week these very popular movies were released, we could use the table function combined with weekdays:

> table(weekdays(movies$date))
   Friday    Monday  Saturday    Sunday  Thursday   Tuesday Wednesday 
      753        10         7        11        39        22       164 

Notice that the ordering of the days is not what we'd normally expect. This problem can be solved by creating a factor that has the levels in the correct order:

> movies$weekday = factor(weekdays(movies$date), + levels = c(’Monday’,’Tuesday’,’Wednesday’,’Thursday’,’Friday’,’Saturday’,’Sunday’))

Now we can use weekday to get a nicer table:

> table(movies$weekday)
   Monday   Tuesday Wednesday  Thursday    Friday  Saturday    Sunday 
       10        22       164        39       753         7        11 
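The quarters function mentioned earlier works the same way; as a quick sketch (output not shown, and not part of the original notes), we could tabulate the release quarters directly:

# quarters() returns "Q1" through "Q4" for each date
table(quarters(movies$date))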

Similarly, if we wanted to graphically display a chart showing which month of the year the popular movies were released in, we could first create an ordered factor, then use the barplot function:

> movies$month = factor(months(movies$date),levels=c(’January’,’February’,’March’,’April’,’May’,’June’,’July’,’August’,’September’,’October’,’November’,’December’)) > barplot(table(movies$month),main=’Release Month for 1000 Movies’)

To do a similar thing with years, we'd have to create a new variable that represented the year using the format function. For a four digit year the format code is %Y, so we could make a table of the hit movies by year like this:

> table(format(movies$date,'%Y'))

1938 1939 1940 1942 1946 1950 1953 1955 1956 1959 1961 1963 1964 1965 1967 1968 
   1    1    1    2    1    1    1    1    1    1    2    1    2    3    2    2 
1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 
   1    4    2    3    2    8    3    4    4    5   12    9    7   10   12   12 
1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 
  10   12   13   13   23   19   21   26   19   20   32   26   29   37   41   44 
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 
  43   47   53   55   48   54   43   53   57   47 

2.13 Data Summaries

We’ve looked at a few graphical techniques for exploring data, and now we’re going to turn to a numerical one. Consider the question ”Which day of the week has the highest average box office for hit movies released on that day?”. As a first step in answering that question, it would be helpful to look at the mean box office receipts for each of the days. If you look for a function to do that specific task, you probably wouldn’t find one, because R takes the more general approach of providing a function that will allow you to calculate anything you want from vectors of values broken down by groups. In fact, there are a variety of ways to do this. The one we’re going to look at is called aggregate. You pass aggregate a vector or data frame containing the variables you want to summarize, a list of the groups to summarize by, and the function you’d like to use for your summaries. That way, a single function can perform many tasks, and, as we’ll see when we learn to write functions, it even allows R to do things that the developers of R never imagined. For now, we’ll stick to some built in functions, like mean. To find the means for the box office receipts for each day of the week, we could use a call to aggregate like this:

> aggregate(movies$box,movies['weekday'],mean)
    weekday         x
1    Monday 148.04620
2   Tuesday 110.42391
3 Wednesday 139.50965
4  Thursday 117.89700
5    Friday 112.24642
6  Saturday  91.18714
7    Sunday 140.45618

The same thing could be done to calculate other statistics, like median, min, max, or any statistic that returns a single scalar value for each group. Another nice feature of aggregate

is that if the first argument is a data frame, it will calculate the statistic for each column of the data frame. If we passed aggregate both rank and box, we'd get two columns of summaries:

> aggregate(movies[,c('rank','box')],movies['weekday'],mean)
    weekday     rank       box
1    Monday 354.5000 148.04620
2   Tuesday 498.9545 110.42391
3 Wednesday 423.2561 139.50965
4  Thursday 493.7692 117.89700
5    Friday 521.7384 112.24642
6  Saturday 577.5714  91.18714
7    Sunday 338.1818 140.45618

To add a column of counts to the table, we can create a data frame from the table function, and merge it with the aggregated results:

> dat = aggregate(movies[,c('rank','box')],movies['weekday'],mean)
> cts = as.data.frame(table(movies$weekday))
> head(cts)
       Var1 Freq
1    Monday   10
2   Tuesday   22
3 Wednesday  164
4  Thursday   39
5    Friday  753
6  Saturday    7

To make the merge simpler, we rename the first column of cts to weekday.

> names(cts)[1] = 'weekday'
> res = merge(cts,dat)
> head(res)
   weekday Freq     rank       box
1   Friday  753 521.7384 112.24642
2   Monday   10 354.5000 148.04620
3 Saturday    7 577.5714  91.18714
4   Sunday   11 338.1818 140.45618
5 Thursday   39 493.7692 117.89700
6  Tuesday   22 498.9545 110.42391

Notice that the default behaviour of merge is to sort the result by the merging variable, so that we've lost the order that the levels= argument prescribed. The sort=FALSE argument to merge can be used to prevent that:

> res = merge(cts,dat,sort=FALSE)
> head(res)
    weekday Freq     rank       box
1    Monday   10 354.5000 148.04620
2   Tuesday   22 498.9545 110.42391
3 Wednesday  164 423.2561 139.50965
4  Thursday   39 493.7692 117.89700
5    Friday  753 521.7384 112.24642
6  Saturday    7 577.5714  91.18714
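Any summary function that returns a single value per group can be used in place of mean; as a small illustration (a sketch, output omitted, not part of the original notes):

# median box office by weekday, and the largest rank in each group
aggregate(movies['box'], movies['weekday'], median)
aggregate(movies['rank'], movies['weekday'], max)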

2.14 Functions

As you’ve already noticed, functions play an important role in R. A very attractive feature of R is that you can write your own functions which work exactly the same as the ones that are part of the official R release. In fact, if you create a function with the same name as one that’s already part of R, it will override the built-in function, and possibly cause problems. For that reason, it’s a good idea to make sure that there’s not already another function with the name you want to use. If you type the name you’re thinking of, and R responds with a message like “object "xyz" not found” you’re probably safe. There are several reasons why creating your own functions is a good idea.

1. If you find yourself writing the same code over and over again as you work on different problems, you can write a function that incorporates whatever it is you’re doing and call the function, instead of rewriting the code over and over.

2. All of the functions you create are saved in your workspace along with your data. So if you put the bulk of your work into functions that you create, R will automatically save them for you (if you tell R to save your workspace when you quit.)

3. It’s very easy to write “wrappers” around existing functions to make a custom version that sets the arguments to another function to be just what you want. R provides a special mechanism to “pass along” any extra arguments the other function might need.

4. You can pass your own functions to built-in R functions like aggregate, by, apply, sapply, lapply, mapply, sweep and other functions to efficiently and easily perform customized tasks.

Before getting down to the details of writing your own functions, it's a good idea to understand how functions in R work. Every function in R has a set of arguments that it accepts. You can see the arguments that built-in functions take in a number of ways: viewing the help page, typing the name of the function in the interpreter, or using the args function. When you call a function, you can simply pass it arguments, in which case they must line up exactly with the way the function is designed, or you can specifically pass particular arguments in whatever order you like by providing them with names using the name=value

syntax. You also can combine the two, passing unnamed arguments (which have to match the function's definition exactly), followed by named arguments in whatever order you like. For example, consider the function read.table. We can view its argument list with the command:

> args(read.table)
function (file, header = FALSE, sep = "", quote = "\"'", dec = ".",
    row.names, col.names, as.is = !stringsAsFactors, na.strings = "NA",
    colClasses = NA, nrows = -1, skip = 0, check.names = TRUE,
    fill = !blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE,
    comment.char = "#", allowEscapes = FALSE, flush = FALSE,
    stringsAsFactors = default.stringsAsFactors(), encoding = "unknown")
NULL

This argument list tells us that, if we pass unnamed arguments to read.table, it will interpret the first as file, the next as header, then sep, and so on. Thus if we wanted to read the file my.data, with header set to TRUE and sep set to ’,’, any of the following calls would be equivalent:

read.table('my.data',TRUE,',')
read.table(sep=',',TRUE,file='my.data')
read.table(file='my.data',sep=',',header=TRUE)
read.table('my.data',sep=',',header=TRUE)

Notice that all of the arguments in the argument list for read.table have values after the name of the argument, except for the file argument. This means that file is the only required argument to read.table; any of the other arguments are optional, and if we don't specify them the default values that appear in the argument list will be used. Most R functions are written so that the first few arguments will be the ones that will usually be used, so that their values can be entered without providing names, with the other arguments being optional. Optional arguments can be passed to a function by position, but are much more commonly passed using the name=value syntax, as in the last example of calling read.table.

Now let's take a look at the function read.csv. You may recall that this function simply calls read.table with a set of parameters that makes sense for reading comma separated files. Here's read.csv's function definition, produced by simply typing the function's name at the R prompt:

function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
    fill = TRUE, comment.char = "", ...)
read.table(file = file, header = header, sep = sep, quote = quote,
    dec = dec, fill = fill, comment.char = comment.char, ...)

Pay special attention to the three periods (...) in the argument list. Notice that they also appear in the call to read.table inside the function’s body. The three dots mean all the

arguments that were passed to the function that didn't match any of the previous arguments in the argument list. So if you pass anything other than file, header, sep, quote, dec, or fill to read.csv, it will be part of the three dots; by putting the three dots at the end of the argument list in the call to read.table, all those unmatched arguments are simply passed along to read.table. So if you make a call to read.csv like this:

read.csv(filename,stringsAsFactors=FALSE)

the stringsAsFactors=FALSE will get passed to read.table, even though it wasn't explicitly named in the argument list. Without the three dots, R will not accept any arguments that aren't explicitly named in the argument list of the function definition. If you want to intercept the extra arguments yourself, you can include the three dots at the end of the argument list when you define your function, and create a list of those arguments inside the function body by referring to list(...).

Suppose you want to create a function that will call read.csv with a filename, but which will automatically set the stringsAsFactors=FALSE parameter. For maximum flexibility, we'd want to be able to pass other arguments (like na.strings=, or quote=) to read.csv, so we'll include the three dots at the end of the argument list. We could name the function read.csv and overwrite the built-in version, but that's not a good idea, if for no other reason than the confusion it would cause if someone else tried to understand your programs! Suppose we call the function myread.csv. We could write a function definition as follows:

> myread.csv = function(file,stringsAsFactors=FALSE,...){ + read.csv(file,stringsAsFactors=stringsAsFactors,...) + }

Now, we could simply use

thedata = myread.csv(filename)

to read a comma-separated file with stringsAsFactors=FALSE. You could still pass any of read.table’s arguments to the function (including stringsAsFactors=TRUE if you wanted), and, if you ask R to save your workspace when you quit, the function will be available to you next time you start R in the same directory.
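To see what actually ends up in the three dots, here is a minimal sketch (the function and its arguments are made up for illustration, not part of the original notes):

showargs = function(x,...){
   extras = list(...)     # everything that didn't match a named argument
   names(extras)
}
showargs(3,trim=0.1,na.rm=TRUE)
# returns the names of the extra arguments: "trim" "na.rm"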

2.15 Functions

When you create a function, it defines a separate environment and the variables you create inside your function only exist in that function environment; when you return to where you called the function from, those variables no longer exist. You can refer to other objects that are in the calling environment, but if you make any changes to them, the changes will only take place in the function environment. To get information back to the calling environment, you must pass a return value, which will be available through the function's name. R will automatically return the last unassigned value it encounters in your function, or you can place the object you want to return in a call to the return function. You can only return a single object from a function in R; if you need to return multiple objects, you need to return a list containing those objects, and extract them from the list when you return to the calling environment.

As a simple example of a function that returns a value, suppose we want to calculate the ratio of the maximum value of a vector to the minimum value of the vector. Here's a function definition that will do the job:

maxminratio = function(x)max(x)/min(x)

Notice for a single line function you don't need to use brackets ({}) around the function body, but you are free to do so if you like. Since the final statement wasn't assigned to a variable, it will be used as a return value when the function is called. Alternatively, the value could be placed in a call to the return function. If we wanted to find the max to min ratio for all the columns of a matrix, we could use our function with the apply function:

apply(mymat,2,maxminratio)

The 2 in the call to apply tells it to operate on the columns of the matrix; a 1 would be used to work on the rows. Before we leave this example, it should be pointed out that this function has a weakness – what if we pass it a vector that has missing values? Since we're calling min and max without the na.rm=TRUE argument, we'll always get a missing value if our input data has any missing values. One way to solve the problem is to just put the na.rm=TRUE argument into the calls to min and max. A better way would be to create a new argument with a default value. That way, we still only have to pass one argument to our function, but we can modify the na.rm= argument if we need to.

maxminratio = function(x,na.rm=TRUE)max(x,na.rm=na.rm)/min(x,na.rm=na.rm)

If you look at the function definitions for functions in R, you'll see that many of them use this method of setting defaults in the argument list.

As your functions get longer and more complex, it becomes more difficult to simply type them into an interactive R session. To make it easy to edit functions, R provides the edit command, which will open an editor appropriate to your operating system. When you close the editor, the edit function will return the edited copy of your function, so it's important to remember to assign the return value from edit to the function's name. If you've already defined a function, you can edit it by simply passing it to edit, as in

maxminratio = edit(maxminratio)

You may also want to consider the fix function, which automates the process slightly. To start from scratch, you can use a call to edit like this:

newfunction = edit(function(){})

Suppose we want to write a function that will allow us to calculate the mean of all the appropriate columns of a data frame, broken down by a grouping variable, and including the counts for the grouping variables in the output. When you're working on developing a function, it's usually easier to solve the problem with a sample data set, and then generalize it to a function. We'll use the movies data frame as an example, with both weekday and month as potential grouping variables. First, let's go over the steps to create the movies data frame with both grouping variables:

> movies = read.delim('http://www.stat.berkeley.edu/classes/s133/data/movies.txt',
+                     sep='|',stringsAsFactors=FALSE)
> movies$box = as.numeric(sub('\\$','',movies$box))
> movies$date = as.Date(movies$date,'%B %d, %Y')
> movies$weekday = weekdays(movies$date)
> movies$weekday = factor(weekdays(movies$date),
+    levels = c('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'))
> movies$month = months(movies$date)
> movies$month = factor(months(movies$date),levels=c('January','February','March',
+    'April','May','June','July','August','September','October','November','December'))

Since I've done a fair amount of processing to this data set, and since I'm going to want to use it later for testing my function, I'm going to use the save function to write a copy of the data frame to a file. This function writes out R objects in R's internal format, just like the workspace is saved at the end of an R session. You can also transfer a file produced by save to a different computer, because R uses the same format for its saved objects on all operating systems. Since save accepts a variable number of arguments, we need to specify the file= argument when we call it:

> save(movies,file='movies.rda')

You can use whatever extension you want, but .rda or .Rdata are common choices. It's often useful to break down the steps of a problem like this, and solve each one before going on to the next. Here are the steps we'll need to go through to create our function.

1. Find the appropriate columns of the data frame for the aggregate function.

2. Write the call to the aggregate function that will give us the mean for each group.

3. Write the call to the table function to get the counts and convert it to a data frame.

4. Merge together the results from aggregate and table to give us our result.

To find the appropriate variables, we can examine the class and mode of each column of our data frame:

> sapply(movies,class)
       rank        name         box        date     weekday       month 
  "integer" "character"   "numeric"      "Date"    "factor"    "factor" 
> sapply(movies,mode)
       rank        name         box        date     weekday       month 
  "numeric" "character"   "numeric"   "numeric"   "numeric"   "numeric" 

For this data frame, the appropriate variables for aggregation would be rank and box, so we have to come up with some logic that would select only those columns. One easy way is to select those columns whose class is either numeric or integer. We can use the | operator which represents a logical “or” to create a logical vector that will let us select the columns we want. (There’s also the & operator which is used to test for a logical “and”.)

> classes = sapply(movies,class)
> numcols = classes == 'integer' | classes == 'numeric'

While this will certainly work, R provides an operator that makes expressions like this easier to write. The %in% operator allows us to test for equality to more than one value at a time, without having to do multiple tests. In this example we can use it as follows:

> numcols = sapply(movies,class) %in% c(’integer’,’numeric’)

Now we need to write a call to the aggregate function that will find the means for each variable based on a grouping variable. To develop the appropriate call, we’ll use weekday as a grouping variable:

> result = aggregate(movies[,numcols],movies['weekday'],mean)
> result
    weekday     rank       box
1    Monday 354.5000 148.04620
2   Tuesday 498.9545 110.42391
3 Wednesday 427.1863 139.38540
4  Thursday 493.7692 117.89700
5    Friday 520.2413 112.44878
6  Saturday 577.5714  91.18714
7    Sunday 338.1818 140.45618

Similarly, we need to create a data frame of counts that can be merged with the result of aggregate:

> counts = as.data.frame(table(movies['weekday']))

> counts
       Var1 Freq
1    Monday   10
2   Tuesday   22
3 Wednesday  161
4  Thursday   39
5    Friday  750
6  Saturday    7
7    Sunday   11

Unfortunately, this doesn't name the first column appropriately for the merge function. The best way to solve this problem is to change the name of the first column of the counts data frame to the name of the grouping variable. Recall that using the sort=FALSE argument to merge will retain the order of the grouping variable that we specified with the levels= argument to factor.

> names(counts)[1] = 'weekday'
> merge(counts,result,sort=FALSE)
    weekday Freq     rank       box
1    Monday   10 354.5000 148.04620
2   Tuesday   22 498.9545 110.42391
3 Wednesday  161 427.1863 139.38540
4  Thursday   39 493.7692 117.89700
5    Friday  750 520.2413 112.44878
6  Saturday    7 577.5714  91.18714
7    Sunday   11 338.1818 140.45618

This gives us exactly the result we want, with the columns labeled appropriately. To convert this to a function, let's put together all the steps we just performed:

> load('movies.rda')
> numcols = sapply(movies,class) %in% c('integer','numeric')
> result = aggregate(movies[,numcols],movies['weekday'],mean)
> counts = as.data.frame(table(movies['weekday']))
> names(counts)[1] = 'weekday'
> merge(counts,result,sort=FALSE)
    weekday Freq     rank       box
1    Monday   10 354.5000 148.04620
2   Tuesday   22 498.9545 110.42391
3 Wednesday  161 427.1863 139.38540
4  Thursday   39 493.7692 117.89700
5    Friday  750 520.2413 112.44878
6  Saturday    7 577.5714  91.18714
7    Sunday   11 338.1818 140.45618

To convert these steps into a function that we could use with any data frame, we need to identify the parts of these statements that would change with different data. In this case,

there are two variables that we'd have to change: movies, which represents the data frame we're using, and 'weekday', which represents the grouping variable we're using. Here's a function that will perform these operations for any combination of data frame and grouping variable. (I'll change movies to df and 'weekday' to grp to make the names more general, and name the function aggall.)

> aggall = function(df,grp){
+    numcols = sapply(df,class) %in% c('integer','numeric')
+    result = aggregate(df[,numcols],df[grp],mean)
+    counts = as.data.frame(table(df[grp]))
+    names(counts)[1] = grp
+    merge(counts,result,sort=FALSE)
+ }

I'm taking advantage of the fact that R functions will return the result of the last statement in the function that's not assigned to a variable, which in this case is the result of the merge function. Alternatively, I could assign the result of the merge function to a variable and use the return function to pass back the result.

At this point it would be a good idea to copy our function into a text file so that we can re-enter it into our R session whenever we need it. You could copy the definition from the console, and remove the > prompts, or you could use the history command to recall the function without the prompts. (Other ways of saving the text of the function include the dput and dump functions, or simply making sure you save the workspace before quitting R.) I'll call the text file that I paste the definition into aggall.r.

Now that we have the function written, we need to test it. It's often a good idea to test your functions in a "fresh" R session. We can get the movies data frame back into our workspace by using the load command, passing it the name of the file that we created earlier with the save command, and we can use the source function to read back the definition of the aggall function:

> rm(list=objects())    # removes everything from your workspace !!
> source('aggall.r')
> load('movies.rda')

The first test would be to use the movies data frame, but with a different grouping variable:

> aggall(movies,'month')
       month Freq     rank       box
1    January   40 605.0750  92.41378
2   February   55 621.5818  90.98182
3      March   63 516.1111 107.73638
4      April   47 712.9362  77.45891
5        May   95 338.7474 168.77575
6       June  157 449.3885 125.98888
7       July  121 454.4298 129.67063
8     August   74 562.4865 100.53996
9  September   31 553.2581  99.29284
10   October   54 623.6667  86.12557
11  November  111 441.3423 124.22192
12  December  158 507.9051 117.10242

It seems to work well. But the real test is to use the function on a different data frame, like the world1 data frame. Let’s recreate it:

> world = read.csv('http://www.stat.berkeley.edu/classes/s133/data/world.txt',
+                  header=TRUE)
> conts = read.csv('http://www.stat.berkeley.edu/classes/s133/data/conts.txt',
+                  na.string='.')
> world1 = merge(world,conts)
> aggall(world1,'cont')
  cont Freq       gdp    income literacy    military
1   AF   47  2723.404  3901.191 60.52979   356440000
2   AS   41  7778.049  8868.098 84.25122  5006536341
3   EU   34 19711.765 21314.324 98.40294  6311138235
4   NA   15  8946.667        NA 85.52000 25919931267
5   OC    4 14625.000 15547.500 87.50000  4462475000
6   SA   12  6283.333  6673.083 92.29167  2137341667

We’re close, but notice that there’s an NA for income for one of the continents. Since the movies data frame didn’t have any missing values, we overlooked that detail. The most important thing about writing functions is to remember that you can always modify them using either the edit or fix function. For example, if I type

> aggall = edit(aggall)

I'll see the body of the aggall function in a text editor where I can modify it. In this case, I simply need to add the na.rm=TRUE argument to my call to aggregate, resulting in this new version of the function:

function(df,grp){
   numcols = sapply(df,class) %in% c('integer','numeric')
   result = aggregate(df[,numcols],df[grp],mean,na.rm=TRUE)
   counts = as.data.frame(table(df[grp]))
   names(counts)[1] = grp
   merge(counts,result,sort=FALSE)
}

Testing it again on the world1 data frame shows that we have solved the problem:

> aggall(world1,'cont')
  cont Freq       gdp    income literacy    military
1   AF   47  2723.404  3901.191 60.52979   356440000
2   AS   41  7778.049  8868.098 84.25122  5006536341
3   EU   34 19711.765 21314.324 98.40294  6311138235
4   NA   15  8946.667 10379.143 85.52000 25919931267
5   OC    4 14625.000 15547.500 87.50000  4462475000
6   SA   12  6283.333  6673.083 92.29167  2137341667

2.16 Sizes of Objects

Before we start looking at character manipulation, this is a good time to review the different functions that give us the size of an object.

1. length - returns the length of a vector, or the total number of elements in a matrix (number of rows times number of columns). For a data frame, returns the number of columns.

2. dim - for matrices and data frames, returns a vector of length 2 containing the number of rows and the number of columns. For a vector, returns NULL. The convenience functions nrow and ncol return the individual values that would be returned by dim.

3. nchar - for a character string, returns the number of characters in the string. Returns a vector of values when applied to a vector of character strings. For a numeric value, nchar returns the number of characters in the printed representation of the number.
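Here is a small illustration of the three functions (made-up values, not from the original notes):

x = 1:10
m = matrix(0,nrow=2,ncol=5)
d = data.frame(a=1:3,b=letters[1:3])

length(x)       # 10
length(m)       # 10, the total number of elements (2 rows times 5 columns)
length(d)       # 2, the number of columns of the data frame
dim(x)          # NULL
dim(m)          # 2 5
nchar('hello')  # 5
nchar(c('a','bb','ccc'))   # 1 2 3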

2.17 Character Manipulation

While it's quite natural to think of data as being numbers, manipulating character strings is also an important skill when working with data. We've already seen a few simple examples, such as choosing the right format for a character variable that represents a date, or using table to tabulate the occurrences of different character values for a variable. Now we're going to look at some functions in R that let us break apart, rearrange and put together character data. One of the most important uses of character manipulation is "massaging" data into shape. Many times the data that is available to us, for example on a web page or as output from another program, isn't in a form that a program like R can easily interpret. In cases like that, we'll need to remove the parts that R can't understand, and organize the remaining parts so that R can read them efficiently. Let's take a look at some of the functions that R offers for working with character variables:

• paste The paste function converts its arguments to character before operating on them, so you can pass both numbers and strings to the function. It concatenates the arguments passed to it, to create new strings that are combinations of other strings. paste accepts an unlimited number of unnamed arguments, which will be pasted together, and one or both of the arguments sep= and collapse=. Depending on whether the arguments are scalars or vectors, and which of sep= and collapse= are used, a variety of different tasks can be performed.

1. If you pass a single argument to paste, it will return a character representation:

   > paste('cat')
   [1] "cat"
   > paste(14)
   [1] "14"

2. If you pass more than one scalar argument to paste, it will put them together in a single string, using the sep= argument to separate the pieces:

   > paste('stat',133,'assignment')
   [1] "stat 133 assignment"

3. If you pass a vector of character values to paste, and the collapse= argument is not NULL, it pastes together the elements of the vector, using the collapse= argument as a separator:

   > paste(c('stat',133,'assignment'),collapse=' ')
   [1] "stat 133 assignment"

4. If you pass more than one argument to paste, and any of those arguments is a vector, paste will return a vector as long as its longest argument, produced by pasting together corresponding pieces of the arguments. (Remember the recycling rule which will be used if the vector arguments are of different lengths.) Here are a few examples:

   > paste('x',1:10,sep='')
    [1] "x1"  "x2"  "x3"  "x4"  "x5"  "x6"  "x7"  "x8"  "x9"  "x10"
   > paste(c('x','y'),1:10,sep='')
    [1] "x1"  "y2"  "x3"  "y4"  "x5"  "y6"  "x7"  "y8"  "x9"  "y10"

• grep The grep function searches for patterns in text. The first argument to grep is a text string or regular expression that you're looking for, and the second argument is usually a vector of character values. grep returns the indices of those elements of the vector of character strings that contain the text string. Right now we'll limit ourselves to simple patterns, but later we'll explore the full strength of commands like this with regular expressions. grep can be used in a number of ways. Suppose we want to see the countries of the world that have the word 'United' in their names.

> grep(’United’,world1$country) [1] 144 145

grep returns the indices of the observations that have ’United’ in their names. If we wanted to see the values of country that had ’United’ in their names, we can use the value=TRUE argument:

> grep(’United’,world1$country,value=TRUE) [1] "United Arab Emirates" "United Kingdom"

Notice that, since the first form of grep returns a vector of indices, we can use it as a subscript to get all the information about the countries that have 'United' in their names:

> world1[grep('United',world1$country),]
                 country   gdp income literacy    military cont
144 United Arab Emirates 23200  23818     77.3  1600000000   AS
145       United Kingdom 27700  28938     99.9 42836500000   EU

grep has a few optional arguments, some of which we’ll look at later. One convenient argument is ignore.case=TRUE, which, as the name implies will look for the pattern we specified without regard to case.
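As a quick sketch of that argument (not from the original notes), the search above could be written without worrying about capitalization:

# matches 'United', 'united', 'UNITED', and so on
grep('united',world1$country,ignore.case=TRUE,value=TRUE)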

• strsplit strsplit takes a character vector, and breaks each element up into pieces, based on the value of the split= argument. This argument can be an ordinary text string, or a regular expression. Since the different elements of the vector may have different numbers of “pieces”, the results from strsplit are always returned in a list. Here’s a simple example:

> mystrings = c('the cat in the hat','green eggs and ham','fox in socks')
> parts = strsplit(mystrings,' ')
> parts
[[1]]
[1] "the" "cat" "in"  "the" "hat"

[[2]] [1] "green" "eggs" "and" "ham"

[[3]] [1] "fox" "in" "socks"

While we haven’t dealt much with lists before, one function that can be very useful is sapply; you can use sapply to operate on each element of a list, and it will, if possible, return the result as a vector. So to find the number of words in each of the character strings in mystrings, we could use:

> sapply(parts,length) [1] 5 4 3

• substring The substring function allows you to extract portions of a character string. Its first argument is a character string, or vector of character strings, and its second argument is the index (starting with 1) of the beginning of the desired substring. With no third argument, substring returns the string starting at the specified index and

continuing to the end of the string; if a third argument is given, it represents the last index of the original string that will be included in the returned substring. Like many functions in R, its true value is that it is fully vectorized: you can extract substrings of a vector of character values in a single call. Here's an example of a simple use of substring

> strings = c(’elephant’,’aardvark’,’chicken’,’dog’,’duck’,’frog’) > substring(strings,1,5) [1] "eleph" "aardv" "chick" "dog" "duck" "frog"

Notice that, when a string is too short to fully meet a substring request, no error or warning is raised, and substring returns as much of the string as is there. Consider the following example, extracted from a web page. Each element of the character vector data consists of a name followed by five numbers. Extracting an individual field, say the field with the state names, is straightforward:

> data = c("Lyndhurst      Ohio          199.02  15,074  30  5   25",
+          "Southport Town New York      217.69  11,025  24  4   20",
+          "Bedford        Massachusetts 221.20  12,658  28  0   28")
> states = substring(data,16,28)
> states
[1] "Ohio         " "New York     " "Massachusetts"

It is possible to extract all the fields at once, at the cost of a considerably more complex call to substring:

> starts = c(1,16,30,38,46,50,54)
> ends = c(14,28,35,43,47,50,55)
> ldata = length(data)
> lstarts = length(starts)
> x = substring(data,rep(starts,rep(ldata,lstarts)),rep(ends,rep(ldata,lstarts)))
> matrix(x,ncol=lstarts)
     [,1]             [,2]            [,3]     [,4]     [,5] [,6] [,7]
[1,] "Lyndhurst     " "Ohio         " "199.02" "15,074" "30" "5"  "25"
[2,] "Southport Town" "New York     " "217.69" "11,025" "24" "4"  "20"
[3,] "Bedford       " "Massachusetts" "221.20" "12,658" "28" "0"  "28"

Like many functions in R, substring can appear on the left hand side of an assignment statement, making it easy to change parts of a character string based on the positions they're in. To change the third through fifth digits of a set of character strings representing numbers to 99, we could use:

> nums = c('12553','73911','842099','203','10')
> substring(nums,3,5) = '99'
> nums
[1] "12993"  "73991"  "849999" "209"    "10"

• tolower, toupper These functions convert their arguments to all lower-case characters or all upper-case characters, respectively.
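For instance (a quick sketch, not in the original notes):

> toupper('sherlock holmes')
[1] "SHERLOCK HOLMES"
> tolower(c('Avatar','UP'))
[1] "avatar" "up"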

• sub, gsub These functions change a regular expression or text pattern to a different set of characters. They differ in that sub only changes the first occurrence of the specified pattern, while gsub changes all of the occurrences. Since numeric values in R cannot contain dollar signs or commas, one important use of gsub is to create numeric variables from text variables that represent numbers but contain commas or dollars. For example, in gathering the data for the world dataset that we've been using, I extracted the information about military spending from http://en.wikipedia.org/wiki/List_of_countries_by_military_expenditures. Here's an excerpt of some of the values from that page:

> values = c('370,700,000,000','205,326,700,000','67,490,000,000')
> as.numeric(values)
[1] NA NA NA
Warning message:
NAs introduced by coercion

The presence of the commas is preventing R from being able to convert the values into actual numbers. gsub easily solves the problem:

> as.numeric(gsub(’,’,’’,values)) [1] 370700000000 205326700000 67490000000

2.18 Working with Characters

As you probably noticed when looking at the above functions, they are very simple, and, quite frankly, it's hard to see how they could really do anything complex on their own. In fact, that's just the point of these functions – they can be combined together to do just about anything you would want to do. As an example, consider the task of capitalizing the first character of each word in a string. The toupper function can change the case of all the characters in a string, but we'll need to do something to separate out the characters so we can get the first one. If we call strsplit with an empty string for the splitting character, we'll get back a vector of the individual characters:

> str = 'sherlock holmes'
> letters = strsplit(str,'')
> letters
[[1]]
 [1] "s" "h" "e" "r" "l" "o" "c" "k" " " "h" "o" "l" "m" "e" "s"
> theletters = letters[[1]]

Notice that strsplit always returns a list. This will be very useful later, but for now we'll extract the first element before we try to work with its output. The places that we'll need to capitalize things are the first position in the vector of letters, and any letter that comes after a blank. We can find those positions very easily:

> wh = c(1,which(theletters == ' ') + 1)
> wh
[1]  1 10

We can change the case of the letters whose indexes are in wh, then use paste to put the string back together.

> theletters[wh] = toupper(theletters[wh])
> paste(theletters,collapse='')
[1] "Sherlock Holmes"

Things have gotten complicated enough that we could probably stand to write a function:

maketitle = function(txt){
   theletters = strsplit(txt,'')[[1]]
   wh = c(1,which(theletters == ' ') + 1)
   theletters[wh] = toupper(theletters[wh])
   paste(theletters,collapse='')
}

Of course, we should always test our functions:

> maketitle('some crazy title')
[1] "Some Crazy Title"

Now suppose we have a vector of strings:

> titls = c('sherlock holmes','avatar','book of eli','up in the air')

We can always hope that we'll get the right answer if we just use our function:

> maketitle(titls)
[1] "Sherlock Holmes"

Unfortunately, it didn't work in this case. Whenever that happens, sapply will operate on all the elements in the vector:

> sapply(titls,maketitle)
  sherlock holmes            avatar       book of eli     up in the air 
"Sherlock Holmes"          "Avatar"     "Book Of Eli"   "Up In The Air" 

Of course, this isn't the only way to solve the problem. Rather than break up the string into individual letters, we can break it up into words, and capitalize the first letter of each, then combine them back together. Let's explore that approach:

> str = 'sherlock holmes'
> words = strsplit(str,' ')
> words
[[1]]
[1] "sherlock" "holmes"

Now we can use the assignment form of the substring function to change the first letter of each word to a capital. Note that we have to make sure to actually return the modified string from our call to sapply, so we ensure that the last statement in our function returns the string:

> sapply(words[[1]],function(w){substring(w,1,1) = toupper(substring(w,1,1));w})
  sherlock     holmes 
"Sherlock"   "Holmes" 

Now we can paste the pieces back together to get our answer:

> res = sapply(words[[1]],function(w){substring(w,1,1) = toupper(substring(w,1,1));w})
> paste(res,collapse=' ')
[1] "Sherlock Holmes"

To operate on a vector of strings, we'll need to incorporate these steps into a function, and then call sapply:

mktitl = function(str){
   words = strsplit(str,' ')
   res = sapply(words[[1]],function(w){substring(w,1,1) = toupper(substring(w,1,1));w})
   paste(res,collapse=' ')
}

We can test the function, making sure to use a string different than the one we used in our initial test:

> mktitl(’some silly string’) [1] "Some Silly String"

And now we can test it on the vector of strings:

> titls = c('sherlock holmes','avatar','book of eli','up in the air')
> sapply(titls,mktitl)
  sherlock holmes            avatar       book of eli     up in the air 
"Sherlock Holmes"          "Avatar"     "Book Of Eli"   "Up In The Air" 

How can we compare the two methods? The R function system.time will report the amount of time any operation in R uses. One important caveat – if you wish to assign the result of an expression to a variable inside the system.time call, you must use the "<-" assignment operator, because the equal sign will confuse the function into thinking you're specifying a named parameter in the function call. Let's try system.time on our two functions:

> system.time(one <- sapply(titls,maketitle))
   user  system elapsed 
  0.000   0.000   0.001 
> system.time(two <- sapply(titls,mktitl))
   user  system elapsed 
  0.000   0.000   0.002 

For such a tiny example, we can’t really trust that the difference we see is real. Let’s use the movie names from a previous example:

> movies = read.delim('http://www.stat.berkeley.edu/classes/s133/data/movies.txt',
+                     sep='|',stringsAsFactors=FALSE)
> nms = tolower(movies$name)
> system.time(one <- sapply(nms,maketitle))
   user  system elapsed 
  0.044   0.000   0.045 
> system.time(two <- sapply(nms,mktitl))
   user  system elapsed 
  0.256   0.000   0.258 

It looks like the first method is better than the second. Of course, if they don’t get the same answer, it doesn’t really matter how fast they are. In R, the all.equal function can be used to see if things are the same:

> all.equal(one,two) [1] TRUE

Chapter 3

Unix

3.1 Software for Remote Access

To learn about the software you’ll need to access the SCF UNIX machines remotely see Accessing the SCF remotely. To see a list of all the SCF computers go to the Computer Grid page.

3.2 Basics of Unix

On a UNIX system, most commands are entered into a shell. There is a large amount (sometimes too much) of online documentation for all UNIX commands, through the man command. For example, to find out more about the ls command, which lists the files in a directory, type man ls at the UNIX prompt. Another way to use the man command is to use keywords; type either man -k keyword or apropos keyword at the UNIX prompt to find a list of commands that involve "keyword".

One attractive feature of UNIX shells is tab completion; if you type only the first few letters of a command or file name, and then hit the tab key, the shell will complete the name you started to type provided that there is only one match. If there's more than one match, hitting tab twice will list all the names that match.

A properly configured UNIX file system has permission controls to prevent unauthorized access to files, and to make sure that users do not accidentally remove or modify key files. If you want to adjust the permissions on the files you own, take a look at the man page for the chmod command.

I'm not going to make a big distinction between UNIX, Linux and the UNIX core of Mac OSX, so the commands we're going to look at here should work under any of these operating systems.

3.3 Command Path

When you type a command into the shell, it will search through a number of directories looking for a command that matches the name you typed. The directories it looks through are called the search path. You can display your search path by typing echo $PATH


To see the complete list of commands that are available on a given computer, you could look at all the files (commands) in all of the directories on your search path. (There are well over 2000 commands on most UNIX systems.)

3.4 Basic Commands

The table below shows some of the most commonly used UNIX commands:

  Command   Description                           Examples
  ls        Lists files in a given directory      ls /some/directory
                                                  ls   # with no args, lists current dir
  cd        Change Working Directory              cd /some/directory
                                                  cd   # with no args, cd to home dir
  pwd       Print Working Directory               pwd
  mkdir     Create New Directory                  mkdir subdirectory
  less      Display file one screen at a time     less filename
  cp        Copy files                            cp file1 newfile1
                                                  cp file1 file2 file3 somedirectory
  mv        Move or rename a file                 mv oldfile newfile
                                                  mv file1 file2 file3 somedirectory
  rm        Remove a file                         rm file1 file2
                                                  rm -r dir   # removes all directories and subdirectories
  rmdir     Remove an (empty) directory           rmdir mydir
  history   Display previously typed commands     history
  grep      Find strings in files                 grep Error file.out
  head      Show the first few lines of a file    head myfile
                                                  head -20 myfile
  tail      Show the last few lines of a file     tail myfile
                                                  tail -20 myfile
  file      Identify the type of a file           file myfile

Each of these commands has many options which you can learn about by viewing their man page. For example, to get more information about the ls command, type man ls

3.5 Command History

Another useful feature of most UNIX shells is the ability to retrieve and review the commands you’ve previously typed into that instance of the shell. The arrow keys can be used to scroll up or down through previous commands. Once a command appears on the command line (where you would normally type a command), you can use the arrow and/or backspace keys



to edit the command. A number of control key combinations can also be used to navigate your command history and are displayed in the table below; these same control sequences will work in the Emacs editor.

  Command     Meaning                  Command     Meaning
  control-p   Previous line            control-n   Next line
  control-f   One character forward    control-b   One character backward
  control-a   Beginning of line        control-e   End of line
  control-d   Delete one character     control-k   Delete to end of line

3.6 Editors

Most programming tasks involve a stored copy of a program in a file somewhere, along with a file which could contain necessary data. On a UNIX system, it becomes very important to be able to view and modify files, and the program that does that is known as an editor. The kind of editor you use to work with files on a UNIX system is very different than a word processor, or other document handling program, as it's not concerned with formatting, fonts, or other issues of appearance – it basically just holds the bit patterns that represent the commands in your program or the information in your data. There are many editors available on UNIX systems, but the two most popular are emacs and vi. Some of the other editors you might encounter on a UNIX system are pico, nano, vim, xemacs, gedit, and kate.

3.7 Wildcards

Along with file completion, UNIX shells offer another way to save typing when entering filenames. Certain characters (known as wildcards) have special meaning when included as a filename, and the shell will expand these characters to represent multiple filenames. The most commonly used wildcard is the asterisk (*), which will match anything in a file name; other possibilities are in the table below.

  Wildcard                       Meaning
  *                              Zero or more of any character
  ?                              Any single character
  [...]                          Any of the characters between the brackets
  [^...]                         Any characters except those between the brackets
  [x-y]                          Any character in the range x to y, e.g. [0-9] or [a-z]
  {string-1,string-2,string-3}   Each of the strings in turn

To see what will be matched for any string containing wildcards, use the UNIX echo command. In addition, the shell will recognize the tilde (~) as representing your home directory, and ~user as user's home directory.

66 3.8 Redirection

Unix systems manage their input and output through three so-called streams, known as standard input, standard output and standard error. Normally, you don't have to do anything – standard input will be whatever you type, and standard output and error will be your computer screen. However, sometimes it's useful to redirect input from some other source, or to redirect output to some other destination. For example, suppose you wanted a list of files in your current directory. You could run the ls command and redirect the output to a file, say myfiles, as follows

ls > myfiles

Such a command will overwrite the file myfiles if it already existed. To redirect output to the end of a file, leaving any existing content intact, use

ls >> myfiles

To have a command read its input from a file instead of from standard input, the less-than sign (<) can be used.

Another useful form of redirection is known as a pipe. In this case, the standard output of one program is used as the standard input to another program. A very common use of pipes is to view the output of a command through a pager, like less. This allows you to view a screen at a time by pressing the space bar, to move up in the file by using control-u, and to move down using control-d. For example, suppose that you are listing the files in a directory that contains many, many files. If you simply type ls, the output will go streaming by too fast to read. But by typing

ls | less

the output will be displayed one screen at a time, and you can navigate using the commands described above. As another example, suppose we want to find which files we've modified recently. The -lt option of the ls command will provide a long display of files in reverse chronological order; to display, say, the five most recently modified files we could use

ls -lt | head -5

3.9 Job Control

Normally, you type a command into the shell and, while the command is running, you can’t type any more commands into that shell. This is known as running in the foreground. When a job is running in the foreground, you can signal it to stop with control-C, which is known as an interrupt signal. Some programs, like R, will catch this signal and do something useful. For example, R will stop working and return to its normal prompt when it receives an interrupt signal. Other programs, (like ls) will simply terminate when they receive that signal. Another signal you can send to a foreground program is control-\, which is known

as a quit signal. Programs are not capable of catching this signal, so when you use control-\ it will terminate your foreground program unless there is some larger problem. When you are sitting in front of the computer, it's not too much of a burden to open a second window, but if you're computing remotely it's nice to be able to run a command and get back the shell prompt to continue working. In addition, when you run jobs in the foreground, they will always terminate when you log out of the computer.

The alternative to running in the foreground is running in the background. You can run any command in the background by putting an ampersand (&) at the end of your command. If a job is running in the background and you log off from the computer, it will continue to run, but any output that is not redirected will be lost. If you've got a job running in the foreground, and you'd like to put it in the background, you need to carry out two separate steps. First, signal the program with control-Z which will suspend the program. Once the program is suspended, and the shell prompt returns, type bg to put it in the background. Notice that jobs that are suspended with control-Z continue to use resources even if they are not running, so when you really want to stop a job you should use control-C or control-\. If you want to put a suspended job in the foreground, use the fg command.

We've seen how to manage jobs when we still have access to the shell from which we started them, but sometimes you need to find out about jobs for which we don't have access to their originating shells. For example, you may remotely log in to a computer, start a job, log out and then want to find out if it's running, or to stop it. The ps command can be used to find out about all the programs that are running on a computer. Remember that for a networked system of computers, the ps command will only display information about commands that are running on the particular computer you're using, so if you put a job in the background and log off, it's a very good idea to remember the name of the computer you were using so that you can check on its progress later. To find all the commands you're running on a particular computer type:

ps -aux | grep username

where username is the account name you logged in with. A single line of ps output looks like this:

spector   1325 10380  0 09:48 pts/10   00:00:00 /bin/bash /usr/local/linux/bin/R

Notice that the name of the program that’s running is shown at the end of the line. You can identify a particular program by looking for its name there, or by using a second grep command when invoking ps. The second column of the output contains the process id or PID of the process; this number uniquely identifies your job on a particular computer. If you want to terminate a job whose pid you know, type

68 kill pid

This is similar to control-C. To get the same effect as control-\, type

kill -9 pid

where pid is the number from the ps output.

Chapter 4

Regular Expressions

4.1 Regular Expressions

Regular expressions are a method of describing patterns in text that's far more flexible than using ordinary character strings. While an ordinary text string is only matched by an exact copy of the string, regular expressions give us the ability to describe what we want in more general terms. For example, while we couldn't search for email addresses in a text file using normal searches (unless we knew every possible email address), we can describe the general form of an email address (some characters followed by an "@" sign, followed by some more characters, a period, and a few more characters) through regular expressions, and then find all the email addresses in the document very easily.

Another handy feature of regular expressions is that we can "tag" parts of an expression for extraction. If you look at the HTML source of a web page (for example, by using View->Source in a browser, or using download.file in R to make a local copy), you'll notice that all the clickable links are represented by HTML like:

<a href="http://www.somewebsite.com/page.html">

It would be easy to search for the string href= to find the links, but what if some webmasters used something like

<a href = 'http://www.somewebsite.com/page.html'>

Now a search for href= won't help us, but it's easy to express those sorts of choices using regular expressions. There are a lot of different versions of regular expressions in the world of computers, and while they share the same basic concepts and much of the same syntax, there are irritating differences among the different versions. If you're looking for additional information about regular expressions in books or on the web, you should know that, in addition to basic regular expressions, recent versions of R also support perl-style regular expressions. (perl is a scripting language whose creator, Larry Wall, developed some attractive extensions to the basic regular expression syntax.) Some of the rules of regular expressions are laid out in very terse language on the R help page for regex and regexpr. Since regular expressions are a somewhat challenging topic, there are many valuable resources on the internet.

Before we start, one word of caution. We'll see that the way that regular expressions work is that they take many of the common punctuation symbols and give them special meanings. Because of this, when you want to refer to one of these symbols literally (that is, as simply a character like other characters), you need to precede those symbols with a backslash (\). But backslashes already have a special meaning in character strings; they are used to indicate control characters, like tab (\t), and newline (\n). The upshot of this is that when you want to type a backslash to keep R from misinterpreting certain symbols, you need to precede it with two backslashes in the input. By the way, the characters for which this needs to be done are:

.^$+?*()[]{}|\
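For example, here is a minimal sketch (with made-up strings) of the difference between an escaped and an unescaped period:

> grep('a\\.b', c('a.b','axb'))   # the escaped period matches only a literal period
[1] 1
> grep('a.b', c('a.b','axb'))     # the unescaped period matches any character
[1] 1 2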

Symbol   Matches                   Symbol   Matches
\w       Alphanumerics and _       \W       Non-alphanumerics
\d       Digits                    \D       Non-digits
\s       Whitespace                \S       Non-whitespace

To reiterate, when any of these characters is to be interpreted literally in a regular expression, it must be preceded by two backslashes when it is passed to functions in R. If you're not sure, it's almost always safe to add a backslash (by typing two backslashes) in front of a character – if it's not needed, it will be ignored.

Regular expressions are constructed from three different components: literal characters, which will only be matched by the identical literal character; character classes, which are matched by more than one character; and modifiers, which operate on characters, character classes, or combinations of the two. A character class consists of an opening square bracket ([), one or more characters, and a closing square bracket (]), and is matched by any of the characters between the brackets. If the first character inside the brackets is the caret (^), then the character class is matched by anything except the other characters between the brackets. Ranges are allowed in character classes, and one of the most important examples is [0-9], a character class matched by any of the digits from 0 to 9. Other ranges of characters include [a-z], all the lower case letters, and [A-Z], all the uppercase letters. There are also some shortcuts for certain character classes that you may or may not find useful. They're summarized in the table above. Like other cases in R where backslashes need to be interpreted literally, we have to include two backslashes to use these shortcuts.
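A few small illustrations of ranges, negated classes, and one of the shortcuts (the strings are made up for the example):

> grep('[0-9]', c('line 7','no digits here'))   # a range: any digit
[1] 1
> grep('q[^u]', c('qatar','quit'))              # a negated class: q not followed by u
[1] 1
> grep('\\d', c('line 7','no digits here'))     # the \d shortcut, typed with two backslashes
[1] 1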

Modifier   Meaning
*          zero or more
?          zero or one
+          one or more
{n}        exactly n occurrences
{n,}       at least n occurrences
{n,m}      between n and m occurrences
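As a quick sketch of what these modifiers do (using made-up strings):

> tst = c('ac','abc','abbc')
> grep('ab?c', tst)     # zero or one b
[1] 1 2
> grep('ab+c', tst)     # one or more b's
[1] 2 3
> grep('ab{2}c', tst)   # exactly two b's
[1] 3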

4.2 Regular Expressions

The period (.) represents the wildcard character. Any character (except for the newline character) will be matched by a period in a regular expression; when you literally want a period in a regular expression you need to precede it with a backslash. Many times you'll need to express the idea of the beginning or end of a line or word in a regular expression. For example, you may be looking for a line number at the beginning of a line, or only be interested in searching for a string if it's at the end of a line. The caret (^) represents the beginning of a line in a regular expression, and the dollar sign ($) represents the end of a line. To specify that the regular expression you're looking for is a word (i.e. it's surrounded by white space, punctuation or line beginnings and endings), surround the regular expression with escaped angle brackets (\< and \>).

One of the most powerful features in regular expressions is being able to specify that you're interested in patterns which may be variable. In the previous example regarding extracting links through the href= argument, we'd like to be able to find a reference no matter how many spaces there are between the href and the =. We also might want to account for the fact that quotes around the url are sometimes omitted. There are a few characters that act as modifiers to the part of the regular expression that precedes them, summarized in the modifier table above. The first three modifiers in that table are the most important ones; the others are used much less often.

Let's return to the idea of looking for email addresses. Here's a regular expression that will match most email addresses: [-A-Za-z0-9_.%]+@[-A-Za-z0-9_.%]+\.[A-Za-z]+ . In words we could read the regular expression as "one or more occurrences of the characters between the brackets, literally followed by an @-sign, followed by one or more characters between the brackets, literally followed by a period, and completed by one or more letters from among A-Z and a-z." You can simplify things by specifying ignore.case=TRUE to most R functions that deal with regular expressions; in that case you would only have to put either A-Z or a-z in the character class, and it would still match both upper and lower case letters. If we use the regular expression in a call to grep, it will find all of the elements in a character vector that contain an email address:

> chk = c('abc [email protected]','text with no email',
+         'first [email protected] also [email protected]')
> grep('[-A-Za-z0-9_.%]+@[-A-Za-z0-9_.%]+\\.[A-Za-z]+',chk)
[1] 1 3
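As mentioned above, ignore.case=TRUE lets us write the character classes using only one case; this sketch should select the same elements:

> grep('[-a-z0-9_.%]+@[-a-z0-9_.%]+\\.[a-z]+',chk,ignore.case=TRUE)
[1] 1 3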

Since regular expressions in R are simply character strings, we can save typing by storing regular expressions in variables. For example, if we say:

> emailpat = '[-A-Za-z0-9_.%]+@[-A-Za-z0-9_.%]+\\.[A-Za-z]+'

then we can use the R variable emailpat in place of the full regular expression. (If you use this technique, be sure to modify your stored variable when you change your regular expression.)

To actually get at the text that matched, we can use the gregexpr function, which provides more information about regular expression matches. First, let's see what the output from gregexpr looks like:

> gregout = gregexpr(emailpat,chk)
> gregout
[[1]]
[1] 5
attr(,"match.length")
[1] 24

[[2]]
[1] -1
attr(,"match.length")
[1] -1

[[3]]
[1]  7 27
attr(,"match.length")
[1] 14 17

First, notice that, since there may be a different number of regular expressions found in different lines, gregexpr returns a list. Each list element is a vector of starting points where regular expressions were found in the corresponding input string. There is also additional information stored as an attribute, which is part of the value, but which doesn't interfere if we try to treat the value as if it were simply a vector. The match.length attribute is another vector, of the same length as the vector of starting points, telling us how long each match was.

Concentrating on the first element, we can use the substring function to extract the actual address as follows:

> substring(chk[1],gregout[[1]],gregout[[1]] + attr(gregout[[1]],'match.length') - 1)
[1] "[email protected]"

To make it a little easier to use, let's make a function that will do the extraction for us:

getexpr = function(s,g)substring(s,g,g + attr(g,'match.length') - 1)

Now it's a little easier to get what we're looking for:

> getexpr(chk[2],gregout[[2]])
[1] ""
> getexpr(chk[3],gregout[[3]])
[1] "[email protected]" "[email protected]"

To use the same idea on an entire vector of character strings, we could either write a loop, or use the mapply function. The mapply function will repeatedly call a function of your choice, cycling through the elements in as many vectors as you provide to the function. To use our getexpr function with mapply to extract all of the email addresses in the chk vector, we could write:

> emails = mapply(getexpr,chk,gregout)
> emails
$"abc [email protected]"
[1] "[email protected]"

$"text with no email" [1] ""

$"first [email protected] also [email protected]" [1] "[email protected]" "[email protected]" Notice that mapply uses the text of the original character strings as names for the list it returns; this may or may not be useful. To remove the names, use the assignment form of the names function to set the names to NULL > names(emails) = NULL > emails [[1]] [1] "[email protected]"

[[2]] [1] ""

[[3]] [1] "[email protected]" "[email protected]" The value that mapply returns is a list, the same length as the vector of input strings (chk in this example), with an empty string where there were no matches. If all you wanted was a vector of all the email addresses, you could use the unlist function: > unlist(emails) [1] "[email protected]" "" [3] "[email protected]" "[email protected]" The empty strings can be removed in the usual way:

emails = emails[emails != '']

or

emails = subset(emails,emails != '')

Suppose we wanted to know how many emails there were in each line of the input text (chk). One idea that might make sense is to find the length of each element of the list that the getexpr function returned:

> emails = mapply(getexpr,chk,gregout)
> names(emails) = NULL
> sapply(emails,length)

The problem is that, in order to maintain the structure of the output list, mapply put an empty (length 0) string in the second position of the list, so that length sees at least one string in each element of the list. The solution is to write a function that modifies the length function so that it only returns the length if there are some characters in the strings for a particular list element. (We can safely do this since we've already seen that there will always be at least one element in the list.) We can use the if statement to do this:

> sapply(emails,function(e)if(nchar(e[1]) > 0)length(e) else 0)
[1] 1 0 2
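An alternative sketch: since gregexpr reports a starting position of -1 when there is no match, we can count the matches directly from its output without extracting any text:

> sapply(gregout,function(g)sum(g > 0))
[1] 1 0 2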

4.3 How matches are found

Regular expressions are matched by starting at the beginning of a string and seeing if a possible match might begin there. If not, the next character in the string is examined, and so on; if the end of the string is reached, then no match is reported. Let's consider the case where there is a potential match. The regular expression program remembers where the beginning of the match was and starts checking the characters to the right of that location. As long as the expression continues to be matched, it will continue, adding more characters to the matched pattern until it reaches a point in the string where the regular expression is no longer matched. At that point, it backs up until things match again, and it checks to see if the entire regular expression has been matched. If it has, it reports a match; otherwise it reports no match.

While the specifics of this mechanism will rarely concern you when you're doing regular expression matches, there is one important point that you should be aware of: the regular expression program is always going to try to find the longest match possible. This means that if you use the wildcard character, ., with the "zero or more" modifier, *, you may get more than you expected.

Suppose we wish to remove HTML markup from a web page, in order to extract the information that's on the page. All HTML markup begins with a left angle bracket (<) and ends with a right angle bracket (>), and has markup information in between.

Before diving in and trying to remove the markup, let's make sure we can find it correctly. An obvious (but ill-advised) choice for the regular expression is <.*>. In words it means "a literal left angle bracket, followed by zero or more of any character, followed by a literal right angle bracket". Let's see how effective it is:

> htmls = c('This is an image: ',
+           'click here')
> grep('<.*>',htmls)
[1] 1 2

We've matched both pieces, but what have we really got? We can apply our mapply-based solution to see the actual strings that were matched:

> getexpr = function(s,g)substring(s,g,g + attr(g,'match.length') - 1)
> matches = mapply(getexpr,htmls,gregexpr('<.*>',htmls))
> names(matches) = NULL
> matches
[1] ""
[2] "click here"

The first match worked fine, but the second match went right past the end of the first piece of markup, and included the second piece in it. For this reason, regular expressions are said to be greedy – they will always try to find the longest pattern that satisfies the match. Fixing problems like this is pretty easy – we just have to think about what we really want to match. In this case, we don't want zero or more occurrences of anything; what we want is zero or more occurrences of anything except a right angle bracket. As soon as we see the first right angle bracket after the left angle bracket we want to stop. Fortunately, it's very easy to express ideas like this with negated character classes. In this case, we simply replace the period with [^>]:

> matches = mapply(getexpr,htmls,gregexpr('<[^>]*>',htmls))
> names(matches) = NULL
> matches
[[1]]
[1] ""

[[2]] [1] "" "" The two pieces of markup in the second element of htmls are now correctly found. Another way to solve the problem of greedy regular expressions is to use a feature of regular expressions invented by Larry Wall, the creator of perl. His idea was to use the question mark (?) after an asterisk (*) or plus sign (+) to indicate that you want a non- greedy match, that is, to search for the smallest string which will match the pattern. Recent versions of R have incorporated these features into their regular expression support. So an alternative to the solution shown above is:

> matches = mapply(getexpr,htmls,gregexpr('<.*?>',htmls))
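To see the difference on a small made-up string, compare the greedy and non-greedy patterns in a call to gsub:

> gsub('<.*>','','<b>bold</b> text')    # greedy: eats through the last >
[1] " text"
> gsub('<.*?>','','<b>bold</b> text')   # non-greedy: removes each tag separately
[1] "bold text"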

4.4 Tagging and Backreferences

Consider again the problem of looking for email addresses. The regular expression that we wrote is exactly what we want, because we don't care what's surrounding the email address. But in many cases, the only way we can find what we want is to specify the surroundings of what we're looking for. Suppose we wish to write a program that will find all of the links (URLs that can be reached by clicking some text on the page) of a web page. A line containing a link may look something like this:

<a href="http://www.stat.berkeley.edu">UC Berkeley Stat Dept Home Page</a>
Finding the links is very easy; but our goal here is to extract the links themselves. Notice that there's no regular expression that can match just the link; we need to use some information about the context in which it's found, and when we extract the matched expression there will be extra characters that we really don't want. To handle this problem, parentheses can be used to surround parts of a regular expression that we're really interested in, and tools exist to help us get those parts separated from the overall expression.

In R, the only functions that can deal with these tagged expressions are sub and gsub, so to take advantage of them, you may have to first extract the regular expressions with the methods we've already seen, and then apply sub or gsub. To illustrate, let's compose a simple regular expression to find links. I don't need to worry about the case of the regular expression, because the grep, sub, gsub and gregexpr functions all support the ignore.case= argument. Notice that I've surrounded the part we want ([^"'>]+) in parentheses. This will allow me to refer to this tagged expression as \1 in a call to gsub. (Additional tagged expressions will be referred to as \2, \3, etc.) Using this pattern, we can first find all the chunks of text that have our links embedded in them, and then use gsub to change the entire piece to just the part we want:

> link = '<a href="http://www.stat.berkeley.edu">UC Berkeley Stat Dept Home Page</a>'
> gregout = gregexpr('href *= *["\']?([^"\'>]+)["\']? *>',link,ignore.case=TRUE)
> thematch = mapply(getexpr,link,gregout)
> answer = gsub('href *= *["\']?([^"\'>]+)["\']? *>','\\1',thematch,ignore.case=TRUE)
> names(answer) = NULL
> answer
[1] "http://www.stat.berkeley.edu"
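As a smaller, self-contained sketch of tagging and backreferences (the string is invented for the example), sub can rearrange the tagged pieces in its replacement:

> sub('([0-9]+)-([0-9]+)','\\2-\\1','12-34')
[1] "34-12"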

4.5 Getting Text into R

Up until now, we've been working with text that's already been formatted to make it easy for R to read, and scan or read.table (and its associated wrapper functions) have been sufficient to take care of things. Now we want to treat our input as raw text; in other words, we don't want R to assume that the data is in any particular form. The main function for taking care of this in R is readLines. In the simplest form, you pass readLines the name of a URL or file that you want to read, and it returns a character vector with one element for each line of the file or URL, containing the contents of each line. An optional argument to readLines specifies the number of lines you want to read; by default it's set to -1, which means to read all available lines.

readLines removes the newline at the end of each line, but otherwise returns the text exactly the way it was found in the file or URL. readLines also accepts connections, which are objects in R that represent an alternative form of input, such as a pipe or a zipped file. Take a look at the help file for connections for more information on this capability. For moderate-sized problems that aren't too complex, using readLines in its default mode (reading all the lines of input into a vector of character strings) will usually be the best way to solve your problem, because most of the functions you'll use with such data are vectorized, and can operate on every line at once.

As a simple example, suppose we wanted to get the names of all the files containing notes for this class. A glance at the page http://www.stat.berkeley.edu/classes/s133/schedule.html indicates that all the online notes can be found on lines like this one:

Jan 23 Introduction to UNIX

We can easily extract the names of the note files using the sub function (since there is only one link per line, we don't need to use gsub, although we could). The first step is to create a vector of character strings that will represent the lines of the web page we are trying to read. We can simply pass the URL name to readLines:

> x = readLines(’http://www.stat.berkeley.edu/classes/s133/schedule.html’)

Next, we can write a regular expression that can find the links. Note that the pattern that we want (i.e. the name of the file referenced in the link) has been tagged with parentheses for later extraction. By using the caret (^) and dollar sign ($) we can describe our pattern as an entire line – when we substitute the tagged expression for the pattern we'll have just what we want. I'll also add the base URL of the files so that they could be, for example, entered into a browser.

> baseurl = 'http://www.stat.berkeley.edu/classes/s133'
> linkpat = '^.* *.*$'
> x = readLines('http://www.stat.berkeley.edu/classes/s133/schedule.html')
> y = grep(linkpat,x,value=TRUE)
> paste(baseurl,sub(linkpat,'\\1',y),sep='/')
[1] "http://www.stat.berkeley.edu/classes/s133/Intro.html"
[2] "http://www.stat.berkeley.edu/classes/s133/OS.html"
[3] "http://www.stat.berkeley.edu/classes/s133/Unix.html"
[4] "http://www.stat.berkeley.edu/classes/s133/R-1a.html"
[5] "http://www.stat.berkeley.edu/classes/s133/R-2a.html"
[6] "http://www.stat.berkeley.edu/classes/s133/R-3a.html"
...

4.6 Examples of Reading Web Pages with R

As an example of how to extract information from a web page, consider the task of extracting the spring baseball schedule for the Bears from http://calbears.cstv.com/sports/m-basebl/sched/cal-m-basebl-sched.html.

4.7 Reading a web page into R

Read the contents of the page into a vector of character strings with the readLines function:

> thepage = readLines('http://calbears.cstv.com/sports/m-basebl/sched/cal-m-basebl-sched.html')
Warning message:
In readLines("http://calbears.cstv.com/sports/m-basebl/sched/cal-m-basebl-sched.html") :
  incomplete final line found on 'http://calbears.cstv.com/sports/m-basebl/sched/cal-m-basebl-sched.html'

The warning message simply means that the last line of the web page didn't contain a newline character. This is actually a good thing, since it usually indicates that the page was generated by a program, which generally makes it easier to extract information from it.

Note: when you're reading a web page, make a local copy for testing; as a courtesy to the owner of the web site whose pages you're using, don't overload their server by constantly rereading the page. To make a copy from inside of R, look at the download.file function. You could also save a copy of the result of using readLines, and practice on that until you've got everything working correctly.

Now we have to focus in on what we're trying to extract. The first step is finding where it is. If you look at the web page, you'll see that the title "Opponent / Event" is right above the data we want. We can find this line using the grep function:

> grep(’Opponent / Event’,thepage) [1] 513

If we look at the lines following this marker, we’ll notice that the first date on the schedule can be found in line 536, with the other information following after:

> thepage[536:545] [1] " 02/20/11" [2] " " [3] " vs. Utah" [4] " " [5] " Berkeley, Calif." [6] " " [7] " W, 7-0" [8] " " [9] " " [10] " "

Based on the previous step, the data that we want is always enclosed in the same pair of HTML tags. Let's grab all the lines that have that pattern:

> mypattern = ’([^<]*)’ > datalines = grep(mypattern,thepage[536:length(thepage)],value=TRUE)

I used value=TRUE, so I wouldn't have to worry about the indexing when I restricted myself to the lines from 536 on. Also notice that I've already tagged the part that I want, in preparation for the final call to gsub.

> getexpr = function(s,g)substring(s,g,g+attr(g,'match.length')-1)
> gg = gregexpr(mypattern,datalines)
> matches = mapply(getexpr,datalines,gg)
> result = gsub(mypattern,'\\1',matches)
> names(result) = NULL
> result[1:10]
[1] "02/19/11"     "vs. Utah"     "Evans Diamond" "1:00 p.m. PT"
[5] "02/20/11"     "vs. Utah"     "Evans Diamond" "1:00 p.m. PT"
[9] "02/22/11"     "at Stanford"

It seems pretty clear that we've extracted just what we wanted – to make it more usable, we'll convert it to a data frame and provide some titles. Since it's awkward to convert a vector directly to a data frame, we'll use a matrix as an intermediate step. Since there are four pieces of information (columns) for each game (row), a matrix is a natural choice:

> schedule = as.data.frame(matrix(result,ncol=4,byrow=TRUE))
> names(schedule) = c('Date','Opponent','Location','Result')
> head(schedule)
      Date                 Opponent         Location        Result
1 02/19/11                 vs. Utah    Evans Diamond  1:00 p.m. PT
2 02/20/11                 vs. Utah    Evans Diamond  1:00 p.m. PT
3 02/22/11              at Stanford Stanford, Calif.  5:30 p.m. PT
4 02/25/11      at Coastal Carolina     Conway, S.C.  4:00 p.m. ET
5 02/26/11         vs. Kansas State     Conway, S.C. 11:00 a.m. ET
6          vs. North Carolina State     Conway, S.C. 11:30 a.m. ET

4.8 Another Example

At http://www.imdb.com/chart is a box-office summary of the ten top movies, along with their gross profits for the current weekend, and their total gross profits. We would like to make a data frame with that information. As always, the first part of the solution is to read

82 the page into R, and use an anchor to find the part of the data that we want. In this case, the table has column headings, including one for “Rank”. > x = readLines(’http://www.imdb.com/chart/’) > grep(’Rank’,x) [1] 1294 1546 1804 Starting with line 1294 we can look at the data to see where the information is. A little experimentation shows that the useful data starts on line 1310: > x[1310:1318] [1] " " [2] " The Green Hornet (2011)" [3] " " [4] " " [5] " $33.5M" [6] " " [7] " " [8] " $40M" [9] " " [10] " " There are two types of lines that contain useful data: the ones with the title, and the ones that begin with some blanks followed by a dollar sign. Here’s a regular expression that will pull out both those lines: > goodlines = ’Just Go with It (2011)" [2] " $30.5M" [3] " $30.5M" [4] " Justin Bieber: Never Say Never (2011)" [5] " $29.5M" [6] " $30.3M" [7] " Gnomeo & Juliet (2011)" [8] " $25.4M" [9] " $25.4M" [10] " The Eagle (2011)" Sometimes the trickiest part of getting the data off a webpage is figuring out exactly the part you need. In this case, there is a lot of extraneous information after the table we want. By examining the output, we can see that we only want the first 30 entries. We also need to remove the extra information from the title line. We can use the sub function with a modified version of our regular expression:

83 > try = try[1:30] > try = sub(’ try = sub(’^ *’,’’,try) > movies = data.frame(matrix(try,ncol=3,byrow=TRUE)) > names(movies) = c(’Name’,’Wkend Gross’,’Total Gross’) > head(movies) Name Wkend Gross Total Gross 1 Just Go with It $30.5M $30.5M 2 Justin Bieber: Never Say Never $29.5M $30.3M 3 Gnomeo & Juliet $25.4M $25.4M 4 The Eagle $8.68M $8.68M 5 The Roommate $8.13M $25.8M 6 The King's Speech $7.23M $93.7M We can replace the special characters with the following code: > movies$Name = sub(’&’,’&’,movies$Name) > movies$Name = sub(’'’,’\’’,movies$Name)

4.9 Dynamic Web Pages

While reading data from static web pages as in the previous examples can be very useful (especially if you're extracting data from many pages), the real power of techniques like this has to do with dynamic pages, which accept queries from users and return results based on those queries. For example, an enormous amount of information about genes and proteins can be found at the National Center for Biotechnology Information website (http://www.ncbi.nlm.nih.gov/), much of it available through query forms. If you're only performing a few queries, it's no problem using the web page, but for many queries, it's beneficial to automate the process.

Here is a simple example that illustrates the concept of accessing dynamic information from web pages. The page http://finance.yahoo.com provides information about stocks; if you enter a stock symbol on the page (for example aapl for Apple Computer), you will be directed to a page whose URL (as it appears in the browser address bar) is

http://finance.yahoo.com/q?s=aapl&x=0&y=0

The way that stock symbols are mapped to this URL is pretty obvious. We’ll write an R function that will extract the current price of whatever stock we’re interested in. The first step in working with a page like this is to download a local copy to play with, and to read the page into a vector of character strings:

> download.file(’http://finance.yahoo.com/q?s=aapl&x=0&y=0’,’quote.html’) trying URL ’http://finance.yahoo.com/q?s=aapl&x=0&y=0’ Content type ’text/html; charset=utf-8’ length unknown opened URL ...... downloaded 39Kb > x = readLines(’quote.html’)

To get a feel for what we’re looking for, notice that the words “Last Trade:” appear before the current quote. Let’s look at the line containing this string:

> grep(’Last Trade:’,x) 45 > nchar(x[45]) [1] 3587

Since there are over 3500 characters in the line, we don’t want to view it directly. Let’s use gregexpr to narrow down the search:

> gregexpr(’Last Trade:’,x[45]) [[1]] [1] 3125 attr(,"match.length") [1] 11

This shows that the string “Last Trade:” starts at character 3125. We can use substr to see the relevant part of the line:

> substring(x[45],3125,3220) [1] "Last Trade:363.50

There's plenty of context – we want just the price, which is enclosed in a pair of HTML tags. One easy way to grab that part is to use a tagged regular expression with gsub:

> gsub(’^.*]*>([^<]*).*$’,’\\1’,x[45]) [1] "363.50" This suggests the following function:

> getquote = function(sym){
+    baseurl = 'http://finance.yahoo.com/q?s='
+    myurl = paste(baseurl,sym,'&x=0&y=0',sep='')
+    x = readLines(myurl)
+    q = gsub('^.*]*>([^<]*).*$','\\1',grep('Last Trade:',x,value=TRUE))
+    as.numeric(q)
+ }

As always, functions like this should be tested:

> getquote('aapl')
[1] 196.19
> getquote('ibm')
[1] 123.21
> getquote('nok')
[1] 13.35

These functions provide only a single quote; a little exploration of the yahoo finance website shows that we can get CSV files with historical data by using a URL of the form:

http://ichart.finance.yahoo.com/table.csv?s=xxx

where xxx is the symbol of interest. Since it's a comma-separated file, we can use read.csv to read the chart.

gethistory = function(symbol)
   read.csv(paste('http://ichart.finance.yahoo.com/table.csv?s=',symbol,sep=''))

Here's a simple test:

> aapl = gethistory('aapl')
> head(aapl)
        Date   Open   High    Low  Close   Volume Adj.Close
1 2011-02-15 359.19 359.97 357.55 359.90 10126300    359.90
2 2011-02-14 356.79 359.48 356.71 359.18 11073100    359.18
3 2011-02-11 354.75 357.80 353.54 356.85 13114400    356.85
4 2011-02-10 357.39 360.00 348.00 354.54 33126100    354.54
5 2011-02-09 355.19 359.00 354.87 358.16 17222400    358.16
6 2011-02-08 353.68 355.52 352.15 355.20 13579500    355.20

Unfortunately, if we try to use the Date column in plots, it will not work properly, since R has stored it as a factor. The format of the date is the default for the as.Date function, so we can modify our function as follows:

gethistory = function(symbol){
   data = read.csv(paste('http://ichart.finance.yahoo.com/table.csv?s=',symbol,sep=''))
   data$Date = as.Date(data$Date)
   data
}

Now, we can produce plots with no problems:

> aapl = gethistory(’aapl’) > plot(aapl$Date,aapl$Close,main=’Closing price for AAPL’,type=’l’)

The plot is shown below:

4.10 Writing a Function

plothistory = function(symbol,what){
   match.arg(what,c('Open','High','Low','Close','Volume','Adj.Close'))
   data = gethistory(symbol)
   plot(data$Date,data[,what],main=paste(what,'price for ',symbol),type='l')
   invisible(data)
}

This function introduces several features of functions that we haven’t seen yet.

• The match.arg function lets us specify a list of acceptable values for a parameter being passed to a function, so that an error will occur if you use a non-existent variable:

> plothistory(’aapl’,’Last’) Error in match.arg("what", c("Open", "High", "Low", "Close", "Volume", : ’arg’ should be one of Open, High, Low, Close, Volume, Adj.Close

• The invisible function prevents the returned value (data in this case) from being printed when the output of the function is not assigned, but also allows us to assign the output to a variable if we want to use it later:

> plothistory(’aapl’,’Close’)

will produce a plot without displaying the data to the screen, but

> aapl.close = plothistory(’aapl’,’Close’)

will produce the plot, and store the data in aapl.close.
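A minimal sketch of this behaviour with a throwaway function (the function f here is made up and is not part of the stock example):

> f = function(x)invisible(x * 2)
> f(3)            # nothing is printed
> y = f(3)
> y               # but the value was returned and can be stored
[1] 6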

4.11 Another Example

When using google, it's sometimes inconvenient to have to click through all the pages. Let's write a function that will return the web links from a google search. If you type a search term into google, for example a search for 'introduction to r', you'll notice that the address bar of your browser looks something like this:

http://www.google.com/search?q=introduction+to+r&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a

For our purposes, we only need the “q=” part of the search. For our current example, that would be the URL http://www.google.com/search?q=introduction+to+r

88 Note that, since blanks aren’t allowed in URLs, plus signs are used in place of spaces. If we were to click on the “Next” link at the bottom of the page, the URL changes to something like http://www.google.com/search?hl=en&safe=active&client=firefox-a&rls=com.ubuntu:en-US:unofficial&hs=xHq&q=introduction+to+r&start=10&sa=N For our purposes, we only need to add the &start= argument to the web page. Since google displays 10 results per page, the second page will have start=10, the next page will have start=20, and so on. Let’s read in the first page of this search into R: z = readLines(’http://www.google.com/search?q=introduction+to+r’) Warning message: In readLines("http://www.google.com/search?q=introduction+to+r") : incomplete final line found on ’http://www.google.com/search?q=introduction+to+r’ As always, you can safely ignore the message about the incomplete final line. Since we’re interested in the web links, we only want lines with ”href=” in them. Let’s check how many lines we’ve got, how long they are, and which ones contain the href string: > length(z) [1] 17 > nchar(z) [1] 369 208 284 26 505 39 40605 1590 460 291 152 248 [13] 317 513 507 5 9 > grep(’href=’,z) [1] 5 7 It’s pretty clear that all of the links are on the seventh line. Now we can construct a tagged regular expression to grab all the links. > hrefpat = ’href *= *"([^"]*)"’ > getexpr = function(s,g)substring(s,g,g+attr(g,’match.length’)-1) > gg = gregexpr(hrefpat,z[[7]]) > res = mapply(getexpr,z[[7]],gg) > res = sub(hrefpat,’\\1’,res) > res[1:10] [1] "http://images.google.com/images?q=introduction+to+r&um=1&ie=UTF-8&sa=N&hl=en&tab=wi" [2] "http://video.google.com/videosearch?q=introduction+to+r&um=1&ie=UTF-8&sa=N&hl=en&tab=wv" [3] "http://maps.google.com/maps?q=introduction+to+r&um=1&ie=UTF-8&sa=N&hl=en&tab=wl" [4] "http://news.google.com/news?q=introduction+to+r&um=1&ie=UTF-8&sa=N&hl=en&tab=wn" [5] "http://www.google.com/products?q=introduction+to+r&um=1&ie=UTF-8&sa=N&hl=en&tab=wf" [6] "http://mail.google.com/mail/?hl=en&tab=wm" [7] "http://www.google.com/intl/en/options/" [8] "/preferences?hl=en" [9] "https://www.google.com/accounts/Login?hl=en&continue=http://www.google.com/search%3Fq%3Dintroduction%2Bto%2Br" [10] "http://www.google.com/webhp?hl=en"

We don't want the internal (google) links – we want external links which will begin with "http://". Let's extract all the external links, and then eliminate the ones that just go back to google:

> refs = res[grep(’^https?:’,res)] > refs = refs[-grep(’google.com/’,refs)] > refs[1:3] [1] "http://cran.r-project.org/doc/manuals/R-intro.pdf" [2] "http://74.125.155.132/search?q=cache:d4-KmcWVA-oJ:cran.r-project.org/doc/manuals/R-intro.pdf+introduction+to+r&cd=1&hl=en&ct=clnk&gl=us&ie=UTF-8" [3] "http://74.125.155.132/search?q=cache:d4-KmcWVA-oJ:cran.r-project.org/doc/manuals/R-intro.pdf+introduction+to+r&cd=1&hl=en&ct=clnk&gl=us&ie=UTF-8" If you’re familiar with google, you may recognize these as the links to google’s cached results. We can easily eliminate them:

> refs = refs[-grep(’cache:’,refs)] > length(refs) [1] 10 We can test these same steps with some of the other pages from this query:

> z = readLines(’http://www.google.com/search?q=introduction+to+r&start=10’) Warning message: In readLines("http://www.google.com/search?q=introduction+to+r&start=10") : incomplete final line found on ’http://www.google.com/search?q=introduction+to+r&start=10’ > hrefpat = ’href *= *"([^"]*)"’ > getexpr = function(s,g)substring(s,g,g+attr(g,’match.length’)-1) > gg = gregexpr(hrefpat,z[[7]]) > res = mapply(getexpr,z[[7]],gg) Error in substring(s, g, g + attr(g, "match.length") - 1) : invalid multibyte string at ’<93>GNU S’ Unfortunately, there seems to be a problem. Fortunately, it’s easy to fix. What the message is telling us is that there’s a character in one of the results that doesn’t make sense in the language (English) that we’re using. We can solve this by typing:

> Sys.setlocale(’LC_ALL’,’C’) > res = mapply(getexpr,z[[7]],gg) Since we no longer get the error, we can continue

> res = sub(hrefpat,’\\1’,res) > refs = res[grep(’^https?:’,res)] > refs = refs[-grep(’google.com/’,refs)] > refs = refs[-grep(’cache:’,refs)] > length(refs) [1] 10

Once again, it found all ten links. This obviously suggests a function:

googlerefs = function(term,pg=0){
   getexpr = function(s,g)substring(s,g,g+attr(g,'match.length')-1)
   qurl = paste('http://www.google.com/search?q=',term,sep='')
   if(pg > 0)qurl = paste(qurl,'&start=',pg * 10,sep='')
   qurl = gsub(' ','+',qurl)
   z = readLines(qurl)
   hrefpat = 'href *= *"([^"]*)"'
   wh = grep(hrefpat,z)[2]
   gg = gregexpr(hrefpat,z[[wh]])
   res = mapply(getexpr,z[[wh]],gg)
   res = sub(hrefpat,'\\1',res)
   refs = res[grep('^https?:',res)]
   refs = refs[-grep('google.com/|cache:',refs)]
   names(refs) = NULL
   refs[!is.na(refs)]
}

Now suppose that we want to retrieve the links for the first ten pages of query results:

> links = sapply(0:9,function(pg)googlerefs(’introduction to r’,pg)) > links = unlist(links) > head(links) [1] "http://cran.r-project.org/doc/manuals/R-intro.pdf" [2] "http://cran.r-project.org/manuals.html" [3] "http://www.biostat.wisc.edu/~kbroman/Rintro/" [4] "http://faculty.washington.edu/tlumley/Rcourse/" [5] "http://www.stat.cmu.edu/~larry/all-of-statistics/=R/Rintro.pdf" [6] "http://www.stat.berkeley.edu/~spector/R.pdf"

Chapter 5

Graphics

5.1 More on Plotting

We've seen the usage of some basic graphics functions in previous lectures, but there are still a few points that should be covered before we study more advanced graphics commands. One issue has to do with multiple lines on a plot. In addition to the high-level plot command, R provides the points and lines functions, which can add new data to a plot. Consider the nationwide Community Health Data, available at the website http://communityhealth.hhs.gov. The data is available in a zip file, which contains a number of CSV files containing information recorded for each county in the USA. One such file, containing information about various risk factors and access to health care in the counties, can be found on the class website: http://www.stat.berkeley.edu/classes/s133/data/RISKFACTORSANDACCESSTOCARE.csv. First, we'll read in the data, and look at the first few records:

> risk = read.csv(’http://www.stat.berkeley.edu/classes/s133/data/RISKFACTORSANDACCESSTOCARE.csv’) > head(risk) State_FIPS_Code County_FIPS_Code CHSI_County_Name CHSI_State_Name 1 1 1 Autauga Alabama 2 1 3 Baldwin Alabama 3 1 5 Barbour Alabama 4 1 7 Bibb Alabama 5 1 9 Blount Alabama 6 1 11 Bullock Alabama CHSI_State_Abbr Strata_ID_Number No_Exercise CI_Min_No_Exercise 1 AL 29 27.8 20.7 2 AL 16 27.2 23.2 3 AL 51 -1111.1 -1111.1 4 AL 42 -1111.1 -1111.1 5 AL 28 33.5 26.3 6 AL 75 -1111.1 -1111.1 CI_Max_No_Exercise Few_Fruit_Veg CI_Min_Fruit_Veg CI_Max_Fruit_Veg Obesity 1 34.9 78.6 69.4 87.8 24.5 2 31.2 76.2 71.2 81.3 23.6 3 -1111.1 -1111.1 -1111.1 -1111.1 25.6 4 -1111.1 86.6 77.8 95.4 -1111.1 5 40.6 74.6 66.1 83.0 24.2 6 -1111.1 -1111.1 -1111.1 -1111.1 -1111.1 CI_Min_Obesity CI_Max_Obesity High_Blood_Pres CI_Min_High_Blood_Pres 1 17.3 31.7 29.1 19.2 2 19.5 27.6 30.5 24.5 3 16.2 35.0 -1111.1 -1111.1 4 -1111.1 -1111.1 -1111.1 -1111.1 5 17.2 31.2 -1111.1 -1111.1 6 -1111.1 -1111.1 -1111.1 -1111.1

93 CI_Max_High_Blood_Pres Smoker CI_Min_Smoker CI_Max_Smoker Diabetes 1 39.0 26.6 19.1 34.0 14.2 2 36.6 24.6 20.3 28.8 7.2 3 -1111.1 17.7 10.2 25.1 6.6 4 -1111.1 -1111.1 -1111.1 -1111.1 13.1 5 -1111.1 23.6 16.7 30.4 8.4 6 -1111.1 -1111.1 -1111.1 -1111.1 -1111.1 CI_Min_Diabetes CI_Max_Diabetes Uninsured Elderly_Medicare Disabled_Medicare 1 9.1 19.3 5690 4762 1209 2 5.2 9.3 19798 22635 3839 3 2.0 11.3 5126 3288 1092 4 4.7 21.5 3315 2390 974 5 4.4 12.4 8131 5019 1300 6 -1111.1 -1111.1 2295 1433 504 Prim_Care_Phys_Rate Dentist_Rate Community_Health_Center_Ind HPSA_Ind 1 45.3 22.6 1 2 2 67.0 30.8 1 2 3 45.8 24.6 1 2 4 41.8 18.6 1 1 5 16.2 10.8 2 1 6 54.3 18.1 1 1

It’s clear that the value -1111.1 is being used for missing values, and we’ll need to fix that before we work with the data:

> risk[risk == -1111.1] = NA
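As a quick check (a sketch; the variable choice is arbitrary and the exact count isn't important), we can confirm that the placeholder values are now treated as missing:

> sum(is.na(risk$Smoker))     # number of counties with no smoking estimate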

Suppose we want to investigate the relationship between Diabetes and Smoking in each of the counties in California. We could create a data set with only California using the subset command, and then plot the two variables of interest:

> risk.ca = subset(risk, CHSI_State_Name == ’California’) > plot(risk.ca$Smoker,risk.ca$Diabetes)

Here’s what the plot looks like:

Now let's say that we wanted to examine the relationship between smoking and diabetes for some other state, say Alabama. We can extract the Alabama data using subset, and then use the points command to add that data to the existing plot. (Unlike plot, points doesn't produce a new plot, it adds to the currently active plot.)

> risk.al = subset(risk, CHSI_State_Name == ’Alabama’) > points(risk.al$Smoker,risk.al$Diabetes,col=’red’)

The plot now looks like this:

Clearly there's a problem: some of the Alabama points are off the scale. This demonstrates that when you wish to plot multiple sets of points on the same graph, you have to make sure that the axes are big enough to accommodate all of the data. One very easy way to do this is to call the plot function with the minimums and maximums of all the data, using type='n' as one of the arguments. This tells R to set up the axes, but not to actually plot anything. So a better way to plot the two sets of points would be as follows:

> plot(range(c(risk.ca$Smoker,risk.al$Smoker),na.rm=TRUE), + range(c(risk.ca$Diabetes,risk.al$Diabetes),na.rm=TRUE), + type=’n’,xlab=’Smoking Rate’,ylab=’Diabetes Rate’) > points(risk.ca$Smoker,risk.ca$Diabetes) > points(risk.al$Smoker,risk.al$Diabetes,col=’red’) > legend(’topright’,c(’California’,’Alabama’),col=c(’black’,’red’),pch=1) > title(’Diabetes Rate vs. Smoking by County’)

The completed plot looks like this:

To add a line to a plot, the lines function can be used in place of the points function. The built-in data set Puromycin provides data on concentration and reaction rate of Puromycin on two types of cells, treated and untreated. You can learn more about R's built-in data sets by using the data() function, and then using the usual help facility within R. Since this is a small data set (only 23 rows), we can examine it directly:

> Puromycin
   conc rate     state
1  0.02   76   treated
2  0.02   47   treated
3  0.06   97   treated
4  0.06  107   treated
5  0.11  123   treated
6  0.11  139   treated
7  0.22  159   treated
8  0.22  152   treated
9  0.56  191   treated
10 0.56  201   treated
11 1.10  207   treated
12 1.10  200   treated
13 0.02   67 untreated
14 0.02   51 untreated
15 0.06   84 untreated
16 0.06   86 untreated
17 0.11   98 untreated
18 0.11  115 untreated
19 0.22  131 untreated
20 0.22  124 untreated
21 0.56  144 untreated
22 0.56  158 untreated
23 1.10  160 untreated

Since there are two observations at each concentration, we can use the aggregate function to calculate the mean before plotting the data:

> Puro = aggregate(list(rate=Puromycin$rate),list(conc=Puromycin$conc,state=Puromycin$state),mean)

By putting the rate variable in a named list, I ensure that the column with the mean will be called "rate". Alternatively, I could rename the column later:

> Puro = aggregate(Puromycin$rate,list(conc=Puromycin$conc,state=Puromycin$state),mean) > names(Puro)[3] = ’rate’

Now we can create two data frames, one for the treated cells, and one for the untreated ones:

> Puro.treated = subset(Puro,state==’treated’) > Puro.untreated = subset(Puro,state==’untreated’)

Examination of the data shows that the rates for the treated cells are higher than those for the untreated cells. Thus, instead of creating an empty plot as in the previous example, I'll just plot the line for the treated cells first:

> plot(Puro.treated$conc,Puro.treated$rate,xlab=’Conc’,ylab=’Rate’,main=’Rate vs. Concentration for Puromycin’,type=’l’) > lines(Puro.untreated$conc,Puro.untreated$rate,col=’blue’) > legend(’bottomright’,c(’Treated’,’Untreated’),col=c(’black’,’blue’),lty=1)

The plot appears below:

5.2 Multiple Plots on a Page

The mfrow and mfcol arguments to the par function allow you to divide the plotting page into a grid, to accommodate multiple plots on a single page. The layout of the plot is determined by a vector of length 2, specifying the number of rows and columns in the grid. The difference between mfrow and mfcol concerns the order in which the plots are drawn. When using mfrow, plots are drawn across the rows, while mfcol draws the plots down the columns. As an example, consider the builtin airquality data set, which contains various air quality measures for a five month period in New York. We can get an idea of what the data is like by using the summary function:

> summary(airquality) Ozone Solar.R Wind Temp Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00 Median : 31.50 Median :205.0 Median : 9.700 Median :79.00

99 Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00 Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00 NA’s : 37.00 NA’s : 7.0 Month Day Min. :5.000 Min. : 1.00 1st Qu.:6.000 1st Qu.: 8.00 Median :7.000 Median :16.00 Mean :6.993 Mean :15.80 3rd Qu.:8.000 3rd Qu.:23.00 Max. :9.000 Max. :31.00

Suppose we want to plot histograms of each of the first four variables, side by side with a plot of the variable over time. The first step is to convert the Month and Date variables into R Date variables:

> airquality$date = as.Date(ISOdate(1973,airquality$Month,airquality$Day))

Rather than typing in the plotting commands repeatedly, we can write a function to make the plots for each variable:

twoplot = function(var){
   plot(density(airquality[,var],na.rm=TRUE),main=paste('Density of',var))
   plot(airquality$date,airquality[,var],type='l',main=paste(var,'vs. time'),ylab=var)
}

Since we’ll be plotting the first four variables, we’ll want a 4x2 grid, and we’ll want to plot by rows. Thus, we issue the following command:

> par(mfrow=c(4,2))

We can use sapply to call our twoplot function for each of the variables:

> sapply(names(airquality)[1:4],twoplot) Ozone Solar.R Wind Temp [1,] 1216 1216 1216 1216 [2,] 1247 1247 1247 1247 [3,] 1277 1277 1277 1277 [4,] 1308 1308 1308 1308 [5,] 1339 1339 1339 1339 [6,] 1369 1369 1369 1369

Here’s the plot:

As can be seen, the default for multiple plots leaves quite a bit of space between the plots. The graphics parameter that controls this is called mar. The value that determines the spacing is a vector of length 4, with the number of lines of space on the bottom, left, top, and right of the plots. We can find the current value of graphics parameters by passing the name of the parameter to par:

> par(’mar’) [1] 5.1 4.1 4.1 2.1

Let’s remove one line from each of the top and bottom, and replot the series of graphs:

> par(mar=c(4.1,4.1,3.1,2.1),mfrow=c(4,2)) > sapply(names(airquality)[1:4],twoplot) Ozone Solar.R Wind Temp [1,] 1216 1216 1216 1216 [2,] 1247 1247 1247 1247 [3,] 1277 1277 1277 1277 [4,] 1308 1308 1308 1308 [5,] 1339 1339 1339 1339 [6,] 1369 1369 1369 1369

The plot is shown below:

After plotting multiple plots, you can restore R's normal behaviour of plotting each plot on a separate page by resetting mfrow as follows:

> par(mfrow=c(1,1))

5.3 More on Barplots

We briefly looked at barplots when we examined the day of the week on which popular movies opened. You may recall that we passed the result of a call to the table function to barplot. This idea can be extended to create side-by-side or stacked barplots. As an alternative to using table, we can produce a matrix with the values we wish to plot. By providing dimnames for this matrix, barplot will be able to appropriately label the plot.

Once again consider the risk data set we used earlier. To simplify the plot, the different states in the data set can be grouped into regions, using the built-in state.name and state.region objects:

> risk = merge(risk,data.frame(state=state.name,region=state.region),
+              by.x='CHSI_State_Name',by.y='state')

Suppose we're interested in comparing the average smoking, diabetes, and obesity rates for the different regions. First, we use the aggregate function to find the averages of the variables, broken down by regions:

> mns = aggregate(risk[,c('Smoker','Diabetes','Obesity')],risk['region'],
+                 mean,na.rm=TRUE)
> mns
         region   Smoker Diabetes  Obesity
1     Northeast 22.62537 7.161972 22.13430
2         South 25.15847 8.754302 25.71493
3 North Central 22.28592 7.318565 24.16283
4          West 19.50740 6.349167 21.01306

If we try to pass this data frame to barplot, we’ll get an error, since barplot needs a matrix. In addition, the column that identifies the different groups (in this case region), needs to be part of the row names, not a variable in the data frame itself. These two steps create the row names, and eliminate the region column from the data frame:

> row.names(mns) = mns$region > mns$region = NULL

If we were to use the following statement to create the barplot:

> barplot(as.matrix(mns)) we’d find that the bars were representing the variables, not the regions. To fix this, we simply pass the transpose of the matrix to barplot:

> barplot(t(as.matrix(mns)))

Here’s what the plot looks like:

This is known as a stacked barplot, and while it's useful for some kinds of data, I don't find it appropriate for this type of data. By passing the beside=TRUE argument to barplot, we can produce a side-by-side barchart, which I find more useful. The legend=TRUE argument helps to identify the bars:

> barplot(t(as.matrix(mns)),beside=TRUE,legend=TRUE)

Here’s the result:

5.4 Mapping

We've looked at some data about different countries around the world, but so far we haven't taken advantage of the fact that the data has a geographic origin (other than trying to see if relationships were different among different continents by using color to represent continent in scatterplots). The R library maps gives us the ability to use different colors to represent values of variables on actual maps of the world or the United States. A current concern in the United States has to do with the number of people without health insurance. Since this continues to be of great concern, it would be interesting to see if there are geographic patterns to the rate of uninsured adults in the different states. The data for this example comes from the swivel.com website, and can be downloaded from the class website at http://www.stat.berkeley.edu/classes/s133/data/insurance.csv.

> library(maps) > ins = read.csv(’http://www.stat.berkeley.edu/classes/s133/data/insurance.csv’, + stringsAsFactors=FALSE)

> head(ins)
       State Employer Individual Medicaid Medicare Other.Public Uninsured
1    Alabama     0.54       0.03     0.15     0.13         0.01      0.14
2     Alaska     0.52       0.04     0.16     0.06         0.05      0.17
3    Arizona     0.47       0.05     0.16     0.13         0.01      0.18
4   Arkansas     0.47       0.06     0.15     0.14         0.02      0.17
5 California     0.48       0.07     0.16     0.09         0.01      0.18
6   Colorado     0.59       0.07     0.07     0.08         0.02      0.16

The first step in preparing a map (after loading the maps library) is determining a variable to use to provide color to the different regions. In this case we'll use the Uninsured variable. Our goal will be to break the data up into four groups, based on the value of the Uninsured variable, and then to color in each state in the map using a color based on the groups.

To create the four groups based on values of Uninsured, we can use the cut function. If we pass cut a variable and a number of groups, it will divide the range of the variable by that number, and assign each value to one of the groups. Depending on how the variable is distributed, there may be more observations in one group than in another. Alternatively, we can provide cut with a breaks= argument, giving one more value than the number of groups we want, and it will assign the values to groups based on the ranges that the breaks define. Thus, if different values of the variable being used to determine the ranges have special meanings, the grouping can be customized using this argument. A final alternative is to guarantee nearly equal-sized groups by using quantiles of the variable in question as the breaks. The include.lowest=TRUE argument should be included to make sure that the smallest value gets properly classified. To create four equal-sized groups for the Uninsured variable, we could use the following call to cut:

> ugroups = cut(ins$Uninsured,quantile(ins$Uninsured,(0:4)/4),include.lowest=TRUE)
> head(ugroups)
[1] (0.13,0.17] (0.13,0.17] (0.17,0.24] (0.13,0.17] (0.17,0.24] (0.13,0.17]
Levels: [0.08,0.11] (0.11,0.13] (0.13,0.17] (0.17,0.24]

Notice that cut produces a factor by default; to suppress this, use the labels=FALSE argument. The factor, since it contains information about the actual values, will turn out to be quite useful.

The maps library provides three databases: "world", "state", and "county". For each identifier in the database, there is information on what polygons need to be drawn to create an outline of the area in question. For example, the entry identified by california in the state database would contain the information about California's borders in the form of polygon coordinates which the polygon function will draw when we ask for a map of California. While the help page for the map function implies that we can pass a vector of region names and colors directly to map, things are complicated by the fact that some states can't be plotted by a single polygon, and map gets confused about the proper colors to use when it needs to draw more than one polygon for a state.

One way around the problem is to create multiple entries in a vector of colors for those states that have more than one polygon. To do this, we need to look at the region names stored inside the database. We can get a vector of region names by calling map with the names=TRUE and plot=FALSE arguments:

> map.names = map('state',names=TRUE,plot=FALSE)

The regions which represent multiple polygons for a single state will always contain colons:

> grep(':',map.names,value=TRUE)
 [1] "massachusetts:martha's vineyard" "massachusetts:main"
 [3] "massachusetts:nantucket"         "michigan:north"
 [5] "michigan:south"                  "new york:manhattan"
 [7] "new york:main"                   "new york:staten island"
 [9] "new york:long island"            "north carolina:knotts"
[11] "north carolina:main"             "north carolina:spit"
[13] "virginia:chesapeake"             "virginia:chincoteague"
[15] "virginia:main"                   "washington:san juan island"
[17] "washington:lopez island"         "washington:orcas island"
[19] "washington:whidbey island"       "washington:main"

To properly connect our data with these region names, we first create a vector of state names corresponding to the regions in the database:

> map.states = sub('^([^:]*):.*','\\1',map.names)

Now we can use the match function to see which observations in the ins dataframe correspond to the regions in the database. Since the database uses all lower case, I'll use the tolower function to convert the state names in the ins data frame to lower case:

> which.state = match(map.states,tolower(ins$State))

Now that we know how the state names line up with the region names, we can create a vector of colors to properly create our map:

> mycolors = c('gray','white','pink','red')
> thecolors = mycolors[ugroups[which.state]]

This process is complicated enough that it might be worth making a function to give us the proper vector of groupings. Here's such a function, created by putting together the steps we just followed:

mapgroups = function(db,myregions,mygroups,tolower=TRUE){
   map.names = map(db,names=TRUE,plot=FALSE)
   map.regions = gsub('^([^:]*):.*$','\\1',map.names)
   if(tolower)myregions = tolower(myregions)
   which.region = match(map.regions,myregions)
   mygroups[which.region]
}

(I've included the tolower= argument because not all the map databases use lower case.) Using this function (or the step-by-step approach that led to it), we can now make our plot:

> mycolors = c(’gray’,’white’,’pink’,’red’) > thecolors = mycolors[mapgroups(’state’,ins$State,ugroups)] > map(’state’,col=thecolors,fill=TRUE)

Plots like this should always have a legend:

> title(’Uninsured Rates by State’) > legend(’bottomleft’,legend=levels(ugroups),fill=mycolors,cex=.8)

The map is pictured below.

There are a variety of functions in R which will help us choose colors that will be useful for plotting. Some of the functions you might want to investigate include rainbow, heat.colors, and topo.colors, as well as the color.gradient function in the plotrix library, the colorpanel function in the gplots library, and the functions in the RColorBrewer library. Let's look at another example, also from swivel.com, concerning rates of childhood obesity. A copy of the dataset is available on the class website.

> obesity = read.csv('http://www.stat.berkeley.edu/classes/s133/data/obesity.csv')

Before proceeding, notice that the state names in this data set have some embedded blanks:

> head(as.character(obesity$State)) [1] " Alabama " " Alaska " " Arizona " " Arkansas " " California " [6] " Colorado "

Left uncorrected, this would make our mapgroups function fail to find any matching states in the map data base, and when we plot the map, there would be no colors. If you think you’re doing everything right, but there are no colors on the map, check to make sure that the state names are correct. Naturally fixing something like this is not a problem:

> obesity$State = gsub(’^ +’,’’,obesity$State) > obesity$State = gsub(’ +$’,’’,obesity$State)
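The same cleanup could be done in a single gsub call (a sketch) by combining the two patterns with alternation:

> obesity$State = gsub('^ +| +$','',obesity$State)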

Now we can proceed with producing the map. We’ll use the Childhood.Obesity.Rate variable for the example, and a color scheme generated by the heat.colors function:

> mycolors = rev(heat.colors(4))
> ogroups = cut(obesity$Childhood.Obesity.Rate,
+               quantile(obesity$Childhood.Obesity.Rate,(0:4)/4),include.lowest=TRUE)
> thecolors = mycolors[mapgroups('state',obesity$State,ogroups)]
> map('state',col=thecolors,fill=TRUE)
> title('Childhood Obesity Rates by State')
> legend('bottomleft',legend=levels(ogroups),fill=mycolors,cex=.8)

The map appears below:

As an example of using the county maps, we can use another one of the Community Health data sets, this one concerned with demographics. It's located at http://www.stat.berkeley.edu/classes/s133/data/DEMOGRAPHICS.csv. The first step is to read the data into R. We'll concentrate on the variable Hispanic, which gives the percentage of Hispanics in each county.

> demo = read.csv('http://www.stat.berkeley.edu/classes/s133/data/DEMOGRAPHICS.csv')
> summary(demo$Hispanic)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   1.100   2.300   7.018   6.300  97.500

Next we need to see how the county names are stored in the county map database:

> nms = map('county',names=TRUE,plot=FALSE)
> head(nms)
[1] "alabama,autauga" "alabama,baldwin" "alabama,barbour" "alabama,bibb"
[5] "alabama,blount"  "alabama,bullock"

We need to combine the state and county information in our data set so that it matches up with the format used in the database. We could either call tolower now, or leave it to our mapgroups function:

> head(subset(demo,select=c(CHSI_State_Name,CHSI_County_Name)))
  CHSI_State_Name CHSI_County_Name
1         Alabama          Autauga
2         Alabama          Baldwin
3         Alabama          Barbour
4         Alabama             Bibb
5         Alabama           Blount
6         Alabama          Bullock
> thecounties = paste(demo$CHSI_State_Name,demo$CHSI_County_Name,sep=',')

Based on the summary information above, we can cut the Hispanic variable at 1, 2, 5, and 10. Then we can create a palette of blues using the brewer.pal function from the RColorBrewer package.

> hgroups = cut(demo$Hispanic,c(0,1,2,5,10,100))
> table(hgroups)
hgroups
   (0,1]    (1,2]    (2,5]   (5,10] (10,100]
     678      770      778      356      558
> library(RColorBrewer)
> mycolors = brewer.pal(5,'Blues')
> thecolors = mycolors[mapgroups('county',thecounties,hgroups)]
> map('county',col=thecolors,fill=TRUE)
> legend('bottomleft',levels(hgroups),fill=mycolors,cex=.8)
> title('Percent Hispanic by County')

The cex=.8 argument reduces the text size to 80% of the default size, to prevent the legend from running into the map. The map looks like this:

In previous examples, we created equal sized groups using the quantile function. Let’s see if it will make a difference for this example:

> hgroups1 = cut(demo$Hispanic,quantile(demo$Hispanic,(0:5)/5))
> table(hgroups1)
hgroups1
     (0,1]    (1,1.7]  (1.7,3.2]  (3.2,8.7] (8.7,97.5]
       678        597        627        618        620

The rest of the steps are the same as before:

> mycolors = brewer.pal(5,'Blues')
> thecolors = mycolors[mapgroups('county',thecounties,hgroups1)]
> map('county',col=thecolors,fill=TRUE)
> legend('bottomleft',levels(hgroups1),fill=mycolors,cex=.8)
> title('Percent Hispanic by County')

The plot, which is very similar to the previous plot, appears below:

As a final example of working with maps, let's revisit the world data set that we've used in other examples. Suppose we want to create a map showing literacy rates around the world. First we need to decide on a grouping. The summary function is useful in helping us decide:

> world = read.csv('http://www.stat.berkeley.edu/classes/s133/data/world2.txt',
+                  na.strings='.',comment='#')
> summary(world$literacy)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  12.80   69.10   88.40   80.95   98.50   99.90

Let's create four levels: less than 50%, 50-70%, 70-95%, and more than 95%. We can use cut to create these levels and label them at the same time:

> litgroups = cut(world$literacy,breaks=c(0,50,70,95,100),include.lowest=TRUE,
+                 labels=c('<50%','50-70%','70-95%','>95%'))

We can use the mapgroups function to come up with the correct colors, noticing that the region names in the world database are not in lower case:

> mycolors = topo.colors(4)
> litcolors = mycolors[mapgroups('world',world$country,litgroups,tolower=FALSE)]
> map('world',col=litcolors,fill=TRUE)
> title('World Literacy Rates')
> legend('left',legend=levels(litgroups),fill=mycolors,cex=.9)

The map appears below. An alternative which is useful when you only want to use part of a database is to eliminate missing values from the vector of colors and the corresponding regions from the full list of regions, and pass those vectors directly to map. Applying this idea to the previous example, we could have gotten the same plot with statements like this:

> omit = is.na(litcolors)
> useregions = map('world',names=TRUE,plot=FALSE)[!omit]
> map('world',regions=useregions,col=litcolors[!omit],fill=TRUE)

5.5 The Lattice Plotting Library

The graphics we've been using up until now are sometimes known as "traditional" graphics in the R community, because they are the basic graphics components that have been part of the S language since its inception. To review, this refers to the high-level functions like plot, hist, barplot, pairs, and boxplot, the low-level functions like points, lines, and legend, and the graphics parameter system accessed through the par function.

There's another entire set of graphics tools available through the lattice library. Originally developed as the trellis library, the implementation in R is known as lattice, although the name "trellis" persists in some functions and documentation. All of the functions in the lattice library use the formula interface that we've seen in classification and modeling functions instead of the usual list of x and y values. Along with other useful features unique to each function, all of the lattice functions accept a data= argument, making it convenient to work with data frames; by specifying the data frame name via this argument, you can refer to the variables in the data frame without the data frame name. They also accept a subset= argument, similar to the second argument to the subset function, to allow selection of only certain cases when you are creating a plot. Finally, the lattice plotting functions can produce what are known as conditioning plots. In a conditioning plot, several graphs, all with common scaling, are presented in a single display. Each of the individual plots is constructed using observations that have a particular value of a variable (known as the conditioning variable), allowing complex relationships to be viewed more easily.

To illustrate the idea, let's revisit the income versus literacy graph that we looked at when we first started studying graphics. The lattice equivalent of the traditional plot command is xyplot. This function accepts a plotting formula, and has some nice convenience features not available in the regular plot command. For example, the groups= argument allows specifying a grouping variable; observations with different levels of this variable will be plotted with different colors. The argument auto.key=TRUE automatically shows which colors represent which groups. So we could create our graph with the single statement:

> library(lattice)
> world = read.csv('http://www.stat.berkeley.edu/classes/s133/data/world2.txt',
+                  comment='#',na.strings='.')
> xyplot(income ~ literacy,data=world,groups=cont,auto.key=TRUE,
+        main='Income vs. Literacy')

The plot is shown below.

One simple change we could make is to display the legend (created by auto.key=TRUE) in 3 columns. This can be achieved by changing the value of auto.key to auto.key=list(columns=3). Many of the parameters to lattice functions can be changed by passing a list of named parameters. Here's the updated call to xyplot, and the result:

> xyplot(income ~ literacy,data=world,groups=cont,auto.key=list(columns=3))

If you wish to fine-tune the appearance of lattice plots, you can modify most aspects through the command trellis.par.set, and you can display the current values of options with the command trellis.par.get. To explore possible modifications of the trellis (lattice) environment, take a look at the output from

> names(trellis.par.get())
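For instance, here is a small sketch (not from the original notes) of changing one of these parameter groups; it assumes the defaults include a plot.symbol component, which the output of the command above will confirm:

> trellis.par.get('plot.symbol')                        # current plotting symbol settings
> trellis.par.set('plot.symbol',list(col='red',pch=16))
> xyplot(income ~ literacy,data=world)                  # points are now drawn with the new symbol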

Any of the listed parameters can be changed through the trellis.par.set() command. To illustrate the idea of a conditioning plot, let’s create a scatter plot like the previous one, but, instead of using color to distinguish among the continents, we’ll use the continent as a conditioning variable, resulting in a separate scatter plot for each continent. To use a conditioning variable in any of the lattice commands, follow the formula with a vertical bar (|) and the name of the conditioning variable. To get xyplot to display the value of the conditioning variable, it helps if it’s a factor:

> world$cont = factor(world$cont)
> xyplot(income ~ literacy | cont,data=world)

The plot is shown below:

5.6 Customizing the Panel Function

One of the basic concepts of lattice plots is the idea of a panel. Each separate graph that is displayed in a multi-plot lattice graph is known as a panel, and for each of the basic types of lattice plots, there’s a function called panel.plottype, where plottype is the type of plot in question. For example, the function that actually produces the individual plots for xyplot is called panel.xyplot. To do something special inside the panels, you can pass your own panel function to the lattice plotting routines using the panel= argument. Generally, the first thing such a function would do is to call the default panel plotting routine; then additional operations can be performed with functions like panel.points, panel.lines, panel.text. (See the help page for panel.functions to see some other possibilities.) For example, in the income versus literacy plot, we might want to show the best regression line that goes through the points for each continent, using the panel.lmline function. Here’s how we could construct and call a custom panel function:

> mypanel = function(x,y,...){
+      panel.xyplot(x,y,...)
+      panel.lmline(x,y)
+ }
> xyplot(income ~ literacy | cont,data=world,panel=mypanel)

The plot is shown below.

As another example, consider again the Community Health Data risk and access to care data set. We want to see if there is a relationship between the number of physicians in a state, and the rate of Diabetes in that state. We'll read it in as before, aggregate it, and merge it with the state regions.

> risk = read.csv('http://www.stat.berkeley.edu/classes/s133/data/RISKFACTORSANDACCESSTOCARE.csv')
> risk[risk==-1111.1] = NA
> avgs = aggregate(risk[,c('Diabetes','Prim_Care_Phys_Rate')],risk['CHSI_State_Name'],mean,na.rm=TRUE)
> avgs = merge(avgs,data.frame(state.name,region=state.region),by.x='CHSI_State_Name',by.y=1)

Notice that I used the variable number instead of the name in the call to merge. Let’s first use color to represent the different regions.

> xyplot(Diabetes~Prim_Care_Phys_Rate,groups=region,data=avgs,auto.key=TRUE,
+        main='Diabetes vs. Physician Rate by Region')

Here’s the plot:

Alternatively, we could place each region in a separate panel, and display the best fit regression line:

> mypanel = function(x,y,...){
+      panel.xyplot(x,y,...)
+      panel.lmline(x,y)
+ }
> xyplot(Diabetes~Prim_Care_Phys_Rate|region,data=avgs,
+        main='Diabetes vs. Physician Rate by Region',panel=mypanel)

The plot appears below:

By default, the lattice functions display their panels from bottom to top and left to right, similar to the way points are drawn on a scatterplot. If you'd like the plots to be displayed going from top to bottom, use the as.table=TRUE argument to any of the lattice plotting functions.
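For example, a small sketch (assuming the avgs data frame created above) that redraws the previous plot with its panels running from top to bottom:

> xyplot(Diabetes~Prim_Care_Phys_Rate|region,data=avgs,
+        main='Diabetes vs. Physician Rate by Region',as.table=TRUE)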

The lattice library is actually flexible enough to produce separate plots for each state:

> xyplot(Diabetes~Prim_Care_Phys_Rate|CHSI_State_Name,data=risk,
+        main='Diabetes vs. Physician Rate by Region')

Looking at the plot, we should make some elements of the plot smaller, namely the titles on the strips and the points themselves:

> xyplot(Diabetes~Prim_Care_Phys_Rate|CHSI_State_Name,data=risk,
+        main='Diabetes vs. Physician Rate by Region',cex=.5,
+        par.strip.text=list(cex=.6))

This results in the following plot:

Finally, let's suppose that we want to break up the plots into two pages, each with 25 plots in a 5x5 arrangement. To get exactly 50 states, we'll use the subset= argument of the lattice functions to remove Alaska (for which there's no data), the layout= argument to arrange the plots the way we want, and the page= argument to call a function at the end of each page:

> xyplot(Diabetes~Prim_Care_Phys_Rate|CHSI_State_Name,data=risk,
+        main='Diabetes vs. Physician Rate by Region',
+        subset=CHSI_State_Name != 'Alaska',
+        layout=c(5,5,2),page=readline)

You can run this example to see the two pages. Now that we’ve seen some of the basics of how the lattice library routines work, we’ll take a look at some of the functions that are available. Remember that there are usually similar alternatives available among the traditional graphics functions, so you can think of these as additional choices that are available, and not necessarily the only possibility for producing a particular type of plot.

5.7 Univariate Displays

Univariate displays are plots that are concerned with the distribution of a single variable, possibly comparing the distribution among several subsamples of the data. They are especially useful when you are first getting acquainted with a data set, since you may be able to identify outliers or other problems that could get masked by more complex displays or analyses.

5.7.1 dotplot

A simple but surprisingly useful display for small to moderate amounts of univariate data is the dotplot. Each observation's value for a variable is plotted as a dot along a line that spans the range of the variable's values. In the usual case, there will be several such lines, one for each level of a grouping variable, making it very easy to spot differences in the variable's distribution for different groups. To illustrate, we'll use a data set from a wine recognition experiment where a number of chemical and other measurements were taken on wines from three cultivars. The data is available at http://www.stat.berkeley.edu/classes/s133/data/wine.data; information about the variables is at http://www.stat.berkeley.edu/classes/s133/data/wine.names. Suppose we are interested in comparing the alcohol level of wines from the three different cultivars:

> wine = read.csv('http://www.stat.berkeley.edu/classes/s133/data/wine.data',header=FALSE)
> names(wine) = c("Cultivar", "Alcohol", "Malic.acid", "Ash", "Alkalinity.ash",
+    "Magnesium", "Phenols", "Flavanoids", "NF.phenols", "Proanthocyanins",
+    "Color.intensity","Hue",".Ratio","Proline")
> wine$Cultivar = factor(wine$Cultivar)
> dotplot(Cultivar~Alcohol,data=wine,ylab='Cultivar')

The plot is shown below.

5.7.2 bwplot

The bwplot function produces box/whisker plots. Unfortunately, notched boxplots are not currently available using bwplot. To create a box/whisker plot of Alcohol for the three cultivars, we can use the same formula we passed to dotplot:

> bwplot(Alcohol~Cultivar,data=wine,xlab=’Cultivar’,ylab=’Alcohol’)

The plot is shown below.

For both dotplot and bwplot, if you switch the roles of the variables in the formula, the orientation of the plot will change. In other words, the lines in the dotplot will be displayed vertically instead of horizontally, and the boxplots will be displayed horizontally instead of vertically. As a second example, consider the literacy rates for the different continents in the world data set. We can compare these rates using the following command:

> bwplot(literacy~cont,data=world,main=’Literacy Rates by Continent’)

The plot is shown below.
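As mentioned above, switching the roles of the variables in the formula changes the orientation; a quick sketch (not in the original notes) that would draw these boxplots horizontally:

> bwplot(cont~literacy,data=world,main='Literacy Rates by Continent')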

5.7.3 densityplot

As its name implies, this function produces smoothed plots of densities, similar to passing a call to the density function to the plot function. To compare multiple groups, it's best to create a conditioning plot:

> densityplot(~Alcohol|Cultivar,data=wine)

Notice that, for plots like this, the formula doesn’t have a left hand side. The plot is shown below:

As another example of a custom panel function, suppose we wanted to eliminate the points that are plotted near the x-axis and replace them with what is known as a rug - a set of tickmarks pointing up from the x-axis that show where the observations were. In practice, many people simply define panel functions like this on the fly. After consulting the help pages for panel.densityplot and panel.rug, we could replace the points with a rug as follows:

> densityplot(~Alcohol|Cultivar,data=wine,panel=function(x,...){
+      panel.densityplot(x,plot.points=FALSE)
+      panel.rug(x=x)
+ })

Of course, if you find it easier or more convenient to define a custom panel function separate from the call to the plotting function, you can use that method. Here’s the result:

5.8 barchart

A bar chart is like a histogram for categorical data. The barchart function expects its input data frame to already have the numbers of observations for each grouping tabulated. For the simplest case of a single variable with no conditioning variable, you can use a call to table on the right hand side of the tilde to produce a vertical bar chart:

> barchart(~table(cont),data=world)

The plot is shown below.

For more complex barcharts, a data frame containing the counts to be plotted needs to be constructed. This can be done easily using the table function in conjunction with as.data.frame. To illustrate, we'll return to the movies data set which has the release dates and box office receipts for some of the all-time most popular movies. Suppose we want to see if the distribution of the day of the week the movies opened on has changed over time. First, we'll read the data and create a grouping variable for different time periods:

> movies = read.delim('http://www.stat.berkeley.edu/classes/s133/data/movies.txt',as.is=TRUE,sep='|')
> movies$box = as.numeric(sub('\\$','',movies$box))
> movies$date = as.Date(movies$date,'%B %d, %Y')
> movies$year = as.numeric(format(movies$date,'%Y'))
> movies$weekday = factor(weekdays(movies$date),
+    levels=c('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'))
> movies$yrgrp = cut(movies$year,c(1936,1980,1985,1990,1995,2000,2006,2011),
+    labels=c('1937-1979','1980-1984','1985-1990','1991-1995','1996-2000','2001-2006','2007-2010'))
> counts = as.data.frame(table(yrgrp=movies$yrgrp,weekday=movies$weekday))
> barchart(Freq~weekday|yrgrp,data=counts,scales=list(x=list(rot=90)),as.table=TRUE)

The plot is shown below.

If the roles of Freq and weekday were reversed in the previous call to barchart, the bars would be drawn horizontally instead of vertically.
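A quick sketch (using the counts data frame constructed above) of that horizontal version:

> barchart(weekday~Freq|yrgrp,data=counts,as.table=TRUE)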

5.9 3-D Plots: cloud

The three-dimensional analog of the two-dimensional xyplot in the lattice library is cloud. By using a conditioning variable, cloud can allow us to consider the relationship of four variables at once. To illustrate, here's a conditioning plot showing the relationship among four variables from the wine data frame:

> cloud(Alcohol ~ Color.intensity + Malic.acid|cut(Hue,4),data=wine)

The plot is shown below:

One important point about lattice graphics is that they operate slightly differently from R's traditional graphics. When you type a lattice command at the top level of an R session, it's R's automatic printing that actually displays the plot. So if you're calling a lattice function in a function or source file, you'll need to surround the call with a call to the print function, or store the result of the lattice plot in a variable, and print that variable.
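Here's a minimal sketch of that idea; the function name showplot is just for illustration:

> showplot = function(df){
+      p = xyplot(income ~ literacy,data=df)
+      print(p)     # without print(), the plot would not be displayed
+ }
> showplot(world)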

Chapter 6

Spreadsheets and Databases

6.1 Spreadsheets

Spreadsheets, in particular the "xls" format used by Microsoft Excel, are one of the most common (if not the most common) methods for transferring data from one organization to another. When you download or receive a spreadsheet, it's often a good idea to make sure that it really is a spreadsheet. Comma-separated files, which are easily read into spreadsheet programs, are often erroneously labeled "spreadsheets" or "Excel files", and may even have an incorrect extension of ".xls". In addition, Excel spreadsheets are often used to hold largely textual information, so simply finding a spreadsheet that appears to contain data can be misleading. Furthermore, many systems are configured to use a spreadsheet program to automatically open files with an extension of ".csv", further adding to the confusion.

For many years, the only reliable way to read an Excel spreadsheet was with Microsoft Excel, which is only available on Windows and Mac OS computers; users on UNIX were left with no option other than finding a different computer. In the last few years, a number of programs, notably gnumeric and OpenOffice.org (usually available through the ooffice command), have been developed through careful reverse engineering to allow Unix users the ability to work with these files. To ensure its advantage in the marketplace, Microsoft doesn't publish a detailed description of exactly how it creates its spreadsheet files. There are often "secret" ways of doing things that are only known to the developers within Microsoft. Reverse engineering means looking at the way a program handles different kinds of files, and then trying to write a program that imitates what the other program does. The end result of all this is that there are several ways to get the data from spreadsheet files into R.

Spreadsheets are organized as a collection of one or more sheets, stored in a single file. Each of the sheets represents a rectangular display of rows and columns, not unlike a data frame. Unfortunately, people often embed text, graphics and pictures into spreadsheets, so it's not a good idea to assume that a spreadsheet has the consistent structure of a data frame or matrix, even if portions of it do appear to have that structure. In addition, spreadsheets often have a large amount of header information (more than just a list of variable names), sometimes making it challenging to figure out the correct variable names to use for the

different columns in the spreadsheet. Our main concern will be in getting the information from the spreadsheet into R: we won't look at how to work with data within Excel.

If you are simply working with a single Excel spreadsheet, the easiest way to get it into R is to open the spreadsheet with any compatible program, go to the File menu, and select Save As. When the file selector dialog appears, notice that you will be given an option of different file format choices, usually through a drop down menu. Choose something like "Tab delimited text" or "CSV (Comma separated values)". If any of the fields in the spreadsheet contain commas, tab delimited is a better choice, but generally it doesn't matter which one you use. Now, the file is (more or less) ready to be read into R, using read.csv or read.delim. A very common occurrence is for fields stored in a spreadsheet to contain either single or double quotes, in which case the quote= argument can be used to make sure R understands what you want. Spreadsheets often arrange their data so that there are a different number of valid entries on different lines, which will confuse read.table; the fill= argument may be of use in these cases. Finally, as mentioned previously, there are often multiple descriptive lines of text at the top of a spreadsheet, so some experimentation with the skip= argument to read.table may be necessary. Alternatively, you can simply delete those lines from the file into which you save the tab- or comma-delimited data.

As you can see, this is a somewhat time-consuming process which needs to be customized for each spreadsheet. It's further complicated by the fact that it only handles one sheet of a multi-sheet spreadsheet; in those cases the process would need to be repeated for each sheet. If you need to read a number of spreadsheets, especially ones where you need to access the data from more than one sheet, you need to use a more programmable solution.

R provides several methods to read spreadsheets without having to save them as tab- or comma-delimited files. The first is the read.xls function in the gdata library. This function uses a Perl program (included as part of the library) that converts the spreadsheet to a comma-separated file, and then uses read.csv to convert it to a data frame. In order for this library to work, perl must also be installed on the computer that's running R. perl will be installed on virtually any Unix-based computer (including Mac OS X), but will most likely not be on most Windows computers (although once perl is installed on a Windows system, the read.xls function will work with no problems). However, there is a Windows-only package, xlsReadWrite, available from CRAN which can both read and write Excel files. (On other platforms, the dataframes2xls package provides a write.xls function.)

Another method utilizes a facility known as ODBC (Open DataBase Connectivity), which is a method that allows different programs that handle data to communicate with each other. The ODBC interface uses a query language known as SQL, which we'll study later, so for now I'll just describe the use of read.xls. (Remember, for just one or two data tables from a spreadsheet, saving as a tab- or comma-delimited file and using the correct read.table variant will usually be the easiest route.)

Since read.xls does nothing more than convert the spreadsheet to comma-separated form and then call read.csv, the guidelines for using it are very similar to the normal use of read.csv. The most useful feature of read.xls is that it accepts an optional sheet= argument, allowing multiple sheets from a spreadsheet file to be automatically read.
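For instance, a hedged sketch (the file name multi.xls is hypothetical, and the gdata package and perl must be installed):

> library(gdata)
> sheet2 = read.xls('multi.xls',sheet=2)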

As a simple example, consider a spreadsheet with listings for the top 200 educational institutions in the US with respect to the number of postdoctoral students, which I found at http://mup.asu.edu/Top200-III/2_2005_top200_postdoc.xls. Here's a view of how the data looks in the UNIX gnumeric spreadsheet:

Here are the first few lines of the file created when I use File->Save As, and choose the CSV format:

"The Top 200 Institutions--Postdoctoral Appointees (2005)",,,,, ,,,,, "Top 50 Institutions in Postdoctoral Appointees (2005)","Number of Postdocs","National Rank","Control Rank",,"Institutional Control" "Harvard University",4384,1,1,,Private "Johns Hopkins University",1442,2,2,,Private "Stanford University",1259,3,3,,Private "University of California - Los Angeles",1094,4,1,,Public "Yale University",1012,5,4,,Private "University of California - San Francisco",1003,6,2,,Public "University of Washington - Seattle",963,7,3,,Public "University of California - San Diego",886,8,4,,Public "Massachusetts Institute of Technology",851,9,5,,Private "University of Pennsylvania",815,10,6,,Private "Columbia University",793,11,7,,Private

137 "University of California - Berkeley",774,12,5,,Public

While the data itself looks fine, it’s unlikely that the header information will be of much use. Since there are 7 header lines, I could use the following statements to read the data into R:

> fromcsv = read.csv('2_2005_top200_postdoc.csv',header=FALSE,
+                    skip=7,stringsAsFactors=FALSE)
> dim(fromcsv)
[1] 206   6
> head(fromcsv)
                                         V1   V2 V3 V4 V5      V6
1                        Harvard University 4384  1  1 NA Private
2                  Johns Hopkins University 1442  2  2 NA Private
3                       Stanford University 1259  3  3 NA Private
4   University of California - Los Angeles  1094  4  1 NA  Public
5                           Yale University 1012  5  4 NA Private
6 University of California - San Francisco  1003  6  2 NA  Public

The fifth column can be removed (take a look at the spreadsheet to see why), and we should supply some column names. One problem with this spreadsheet is that it repeats its header information several times part way through the spreadsheet. These lines will have to be removed manually. They can be identified by the fact that the sixth variable should be either Public or Private, and could be eliminated as follows:

> fromcsv = fromcsv[fromcsv$V6 %in% c(’Public’,’Private’),]

The fifth variable can be removed by setting it to NULL, and the columns can be named.

> fromcsv$V5 = NULL
> names(fromcsv) = c('Institution','NPostdocs','Rank','ControlRank','Control')

Finally, a check of the columns shows that, because of the header problem, the numeric variables need to be explicitly converted:

> sapply(fromcsv,class)
Institution   NPostdocs        Rank ControlRank     Control
"character" "character" "character" "character" "character"
> fromcsv[,c(2,3,4)] = sapply(fromcsv[,c(2,3,4)],as.numeric)

A similar procedure could be carried out with read.xls, although it might take some experimentation with skip= to make things work properly. The only other difference between read.xls and read.csv for this example is that a space mysteriously appeared at the end of one of the variables when using read.xls:

library(gdata)
fromreadxls = read.xls('2_2005_top200_postdoc.xls',stringsAsFactors=FALSE,
                       header=FALSE,skip=2)
fromreadxls = fromreadxls[-1,-c(1,6)]
fromreadxls$V7 = sub(' $','',fromreadxls$V7)
fromreadxls = fromreadxls[fromreadxls$V7 %in% c('Public','Private'),]
fromreadxls[,c(2,3,4)] = sapply(fromreadxls[,c(2,3,4)],as.numeric)
names(fromreadxls) = c('Institution','NPostdocs','Rank','ControlRank','Control')

6.2 Writing Spreadsheets

As with reading a spreadsheet, one option for writing a spreadsheet is to write out a comma- or tab-separated file, and to use a spreadsheet program to read it and then save it as a ".xls" file. The write.csv and write.table functions are useful in this regard. For example, we could create a comma-separated version of the world data frame as follows:

> write.csv(world,file=’world.csv’)

Here’s the result of reading the csv file with a spreadsheet program:

The File->Save As menu would allow saving the spreadsheet in an appropriate format. To automate the process of writing spreadsheets, the Windows-only xlsReadWrite package provides a write.xls function; for other operating systems, this function is provided by the dataframes2xls package.

6.3 Databases

A database is a collection of data, usually with some information (sometimes called metadata) about how the data is organized. But many times when people refer to a database, they mean a database server, which is similar to a web server, but responds to requests for data instead of web pages. By far the most common type of database server is known as a relational database management system (RDBMS), and the most common way of communicating with such a database is a language known as SQL, which is an acronym for Structured Query Language. Some examples of database systems that use SQL to communicate with an RDBMS include Oracle, Sybase, Microsoft SQL Server, SQLite, MySQL and Postgres. While there is an SQL standard, each database manufacturer provides some additional features, so SQL that works with one database is not guaranteed to work on another. We'll try to stick to aspects of SQL that should be available on most SQL based systems.

A database consists of one or more tables, which are generally stored as files on the computer that's running the DBMS. A table is a rectangular array, where each row represents an observation, and each column represents a variable. Most databases consist of several tables, each containing information about one aspect of the data. For example, consider a database to hold information about the parts needed to build a variety of different products. One way to store the information would be to have a data table that had the part id, its description, the supplier's name, and the cost of the part. An immediate problem with this scheme concerns how we could store information about the relation between products and parts. Furthermore, if one supplier provided many different parts, redundant information about suppliers would be repeated many times in the data table. In the 1970s when database technology was first being developed, disk and memory space were limited and expensive, and organizing data in this way was not very efficient, especially as the size of the data base increased. In the late 1970s, the idea of a relational database was developed by IBM, with the first commercial offering coming from a company which is now known as Oracle.

Relational database design is governed by a principle known as normalization. While entire books are devoted to the subject, the basic idea of normalization is to try and remove redundancy as much as possible when creating the tables that make up a database. Continuing the parts example, a properly normalized database to hold the parts information would consist of four tables. The first would contain a part id to uniquely identify the part, its description, price and an id to identify the supplier. A second table would contain the supplier codes and any other information about the supplier. A third table would contain product ids and descriptions, while the final table would have one record for each part used in each product, stored as pairs of product id and part id. The variables that link together the different tables are referred to as keys, or sometimes foreign keys. Clearly, making sure that there are keys to link information from one table to another is critical to the idea of a normalized database. This allows large amounts of information to be stored in manageably sized tables which can be modified and updated without having to change all of the information in all of the tables.
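As a concrete (and entirely hypothetical) sketch of what those four normalized tables might look like, here they are as small R data frames; the names and values are made up for illustration:

> parts     = data.frame(partid=1:3,description=c('bolt','nut','washer'),
+                        price=c(2.50,0.75,3.25),supplierid=c(1,1,2))
> suppliers = data.frame(supplierid=1:2,name=c('Acme Machine Works','Ajax Hardware'))
> products  = data.frame(productid=1:2,description=c('widget','gadget'))
> prodparts = data.frame(productid=c(1,1,2,2,2),partid=c(1,2,1,2,3))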
Such tables will be efficient in the amount of disk and memory resources that they need, which was critical at the time such databases were developed. Also critical to

this scheme is a fast and efficient way of joining together these tables so that queries like "Which suppliers are used for product xyz?", "What parts from Acme Machine Works cost between $2 and $4?", or "What is the total cost of the parts needed to make product xyz?" can be answered quickly. In fact, for many years the only programs that were capable of combining data from multiple sources were RDBMSs. We're going to look at the way the SQL language is used to extract information from RDBMSs, specifically the open source SQLite database (http://sqlite.org), which doesn't require a database server, and the open source MySQL database (http://www.mysql.com), which does. In addition we'll see how to do typical database operations in R, and how to access a database using SQL from inside of R.

One of the first questions that arises when thinking about databases is "When should I use a database?" There are several cases where a database makes sense:

1. The only source for the data you need may be an existing database.

2. The data you're working with changes on a regular basis, especially if it can potentially be changed by many people. Database servers (like most other servers) provide concurrency, which takes care of problems that might otherwise arise if more than one person tries to access or modify a database at the same time.

3. Your data is simply too big to fit into the computer's memory, but if you could conveniently extract just the parts you need, you'd be able to work with the data. For example, you may have hundreds of variables, but only need to work with a small number at a time, or you have tens of thousands of observations, but only need to work with a few thousand at a time.

One of the most important uses of databases in the last ten years has been as back ends to dynamic web sites. If you've ever placed an order online, joined an online discussion community, or gotten a username/password combination to access web resources, you've most likely interacted with a database. In addition, both the Firefox and Safari browsers use SQLite databases to store cookies, certificates, and other information.

There are actually three parts to a RDBMS system: data definition, data access, and privilege management. We're only going to look at the data access aspect of databases; we'll assume that a database is already available, and that the administrator of the database has provided you with access to the resources you need. You can communicate with a database in at least three different ways:

1. You can use a command line program that will display its results in a way similar to the way R works: you type in a query, and the results are displayed.

2. You can use a graphical interface (GUI) that will display the results and give you the option to export them in different forms, such as a comma-separated data file. One advantage to these clients is that they usually provide a display of the available databases, along with the names of the tables in the database and the names of the columns within the tables.

3. You can use a library within R to query the database and return the results of the query into an R data frame. For use with MySQL, the RMySQL library is available for download from CRAN; for SQLite, the RSQLite package is available. When you install either, the DBI library, which provides a consistent interface to different databases, will also be installed.

It's very important to understand that SQL is not a programming language – it's said to be a declarative language. Instead of solving problems by writing a series of instructions or putting together programming statements into functions, with SQL you write a single statement (query) that will return some or all of the records from a database. SQL was designed to be easy to use for non-programmers, allowing them to present queries to the database in a language that is more similar to a spoken language than a programming language.

The main tool for data access is the select statement. Since everything needs to be done with this single statement, descriptions of its syntax can be quite daunting. For example, here's an example that shows some of the important features of the select statement:

SELECT columns or computations
    FROM table
    WHERE condition
    GROUP BY columns
    HAVING condition
    ORDER BY column [ASC | DESC]
    LIMIT offset,count;

(The keywords in the above statement can be typed in upper or lower case, but I'll generally use upper case to remind us that they are keywords.) MySQL is case sensitive for table and column names, but not for commands. SQLite is not case sensitive at all. SQL statements always end with a semi-colon, but graphical clients and interfaces in other languages sometimes don't require the semi-colon. Fortunately, you rarely need to use all of these clauses in a single query. In fact, you may be able to get by knowing just the command that returns all of the information from a table in a database:

SELECT * from table;

6.4 Working with Databases

When you're not using a graphical interface, it's easy to get lost in a database considering that each database can hold multiple tables, and each table can have multiple columns. Three commands that are useful in finding your way around a database are:

    MySQL                        SQLite
    SHOW DATABASES;              .databases
    SHOW TABLES IN database;     .tables
    SHOW COLUMNS IN table;       pragma table_info(table)
    DESCRIBE table;              .schema table

For use as an example, consider a database to keep track of a music collection, stored as an SQLite database in the file albums.db. We can use the commands just shown to see how the database is organized. In class, I sometimes used some of the graphical SQLite clients that are mentioned on the Class Resources page. Here, I'm using the UNIX command-line client called sqlite3:

springer.spector$ sqlite3 albums.db
SQLite version 3.4.2
Enter ".help" for instructions
sqlite> .mode column
sqlite> .header on
sqlite> .tables
Album   Artist  Track
sqlite> .schema Album
CREATE TABLE Album (
  aid INTEGER,
  alid INTEGER,
  title TEXT );
CREATE INDEX jj on Album(alid);
sqlite> .schema Artist
CREATE TABLE Artist (
  aid INTEGER,
  name TEXT );
CREATE INDEX ii on Artist(aid);

Studying these tables, we can see that the organization is as follows: each track in the Track table is identified by an artist id (aid) and an album id (alid). To get the actual names of the artists, we can link to the Artist table through the aid key; to get the names of the albums, we can link to the Album table through the alid key. These keys are integers, which don't take up very much space; the actual text of the album title or artist's name is stored only once in the table devoted to that purpose. (This is the basic principle of normalization that drives database design.)

The first thing to notice in schemes like this is that the individual tables won't allow us to really see anything interesting, since they just contain keys, not actual values; to get interesting information, we need to join together multiple tables. For example, let's say we want a list of album titles and artist names. We want to extract the name from the Artist table and combine it with the title from the Album table, based on a matching value of the aid variable:

sqlite> .width 25 50
sqlite> SELECT Artist.name, Album.title FROM Artist, Album

   ...> WHERE Artist.aid = Album.aid;
Gloria Wood                Wood by the Fire
Bobby Hackett              Jazz Session
Bobby Hackett              1943-1947
Bobby Hackett              Sextet
Bobby Hackett              Blues With A Kick
Bobby Hackett              Coast Concert
Bobby Hackett              Creole Cookin'
Bobby Hackett              In Concert
George Wettling's Jazz Ba  Jazz Session
Jimmy Forrest              most much!
Jimmy Forrest              Night Train
Jimmy Forrest              Sit Down And Relax With Jimmy Forrest
Jimmy Forrest              Soul Street
Jimmy Forrest              Out of the Forrest (1961)
Jimmy Forrest              Forrest Fire
Jimmy Forrest              Black Forrest
Mary Lou Williams          1927-1940
Mary Lou Williams          Mary Lou Williams Presents Black Christ of the And
...

Queries like this are surprisingly fast – they are exactly what the database has been designed and optimized to achieve. There are a few conveniences that you can use in SQL to save typing and improve the appearance of the output. You can use the AS keyword to provide an alternative name for a table for use within a query, or to rename a column so that it will display with a different heading. The previous query could have been written:

sqlite> SELECT ar.name AS Artist, al.title AS 'Album Name'
   ...> FROM Artist AS ar, Album AS al
   ...> WHERE ar.aid = al.aid LIMIT 200,10;
Artist                     Album Name
-------------------------  -----------------------------------
Mundell Lowe               TransitWest
Mundell Lowe               A Grand Night For Swinging
Mundell Lowe               The Mundell Lowe Quartet
Duke Ellington             Jazz Violin Session
Duke Ellington             Happy-Go-Lucky Local
Duke Ellington             The Great Paris Concert - Disc 2
Duke Ellington             The Great Paris Concert
Duke Ellington             Piano Reflections
Duke Ellington             Continuum
Duke Ellington             At Newport

Note the new headings that appear in the output. The LIMIT keyword allows you to display just some of the results. Given a single number, it prints that many lines starting from the beginning; given two numbers it starts at the first number and prints as many records as specified in the second number. Combining two data sets in this way is a very common operation, known as an inner join. An alternative way of expressing the query we're looking at would be as follows:

SELECT ar.name AS Artist, al.title AS ’Album Name’ FROM Artist AS ar inner join Album AS al using(aid) limit 200,10;

When the two tables being joined share exactly one common column (key), the operation is sometimes known as a natural join. Since this is the case with the Album and Artist tables, yet another way to express the query is

SELECT ar.name AS Artist, al.title AS ’Album Name’ FROM Artist AS ar natural join Album AS al limit 200,10;

Note that if the key used to join two tables does not have the same name in each of the tables, you must use the previous method that specifies the matching condition in the WHERE clause.

SQL provides some convenient facilities for aggregating values based on counts, sums, averages, minimums and maximums, through the COUNT, SUM, AVG, MIN and MAX functions. To find the number of records that match a certain condition, the expression COUNT(*) can be used. For example, to find the number of observations in the Track data set, we could use:

sqlite> SELECT COUNT(*) from Track;
COUNT(*)
----------
54396

To count how many tracks were longer than 1000 seconds, we could use:

sqlite> SELECT COUNT(*) from Track WHERE length > 1000;
COUNT(*)
----------
129

To find the track with the longest length in the Track table, we could try:

sqlite> SELECT MAX(length) from Track;
MAX(length)
-----------
2049

This gives us the length of the longest track, but not its title. The easiest way to get around this is to use a subquery. In SQL, a subquery is a SELECT statement surrounded by parentheses. You can use the values returned by the subquery in WHERE or HAVING clauses. So to find the track with the maximum length, we could use:

sqlite> SELECT title,length FROM Track WHERE length = (SELECT max(length) FROM Track);
title            length
---------------  ----------
Hues Of Melanin  2049

Of course, this query would be more useful if we knew the name of the album and artist that the track came from:

sqlite> .width 20 25 25 5
sqlite> SELECT tr.title,al.title,ar.name,tr.length
   ...> FROM Track as tr, Album as al, Artist as ar
   ...> WHERE tr.alid = al.alid and tr.aid = al.aid and ar.aid = al.aid
   ...> AND length = (SELECT max(length) FROM Track);
title                 title                      name                       lengt
--------------------  -------------------------  -------------------------  -----
Hues Of Melanin       Live                       Sam Rivers Trio            2049

When you construct queries like this, it's extremely important to list all of the WHERE conditions that need to be satisfied to avoid getting meaningless duplicate records.

As another example of a subquery, suppose we want to count how many albums we have for each artist, and then tabulate how many artists there are with one album, two albums, three albums, and so on. The first part is fairly easy:

sqlite> SELECT aid,count(aid) FROM Album GROUP BY aid limit 100;
aid         count(aid)
----------  ----------
1           1
2           1
3           1
4           1
5           1
6           1
...

We can use this result as a subquery to see the distribution of albums per artist:

mysql> SELECT ct,count(ct) FROM
    -> (SELECT aid,count(aid) AS ct FROM Album GROUP BY aid)
    -> GROUP by ct ORDER by ct;
ct    count(ct)
----  ---------
1     2303
2     302
3     137
4     72
5     46
6     36
7     25
8     25
9     15
...

When using subqueries in this way, some databases may require that the subquery is given an alias. In such a case, a query like the following could be used:

SELECT ct,count(ct) FROM
   (SELECT aid,count(aid) AS ct FROM Album GROUP BY aid) AS x
   GROUP by ct ORDER by ct;

One of the most useful features of SQL is that it can summarize large data sets, producing a smaller result set that can be more easily handled by other programs. For example, suppose we are interested in the total length (in seconds) of the tracks on each album. Instead of processing all the tracks outside of the database, we can create a summarized version of the database that has the lengths already summarized. Very few programs can operate on databases as fast as SQL, so operations like this can save lots of processing time. To perform operations like this we can use the GROUP BY clause:

sqlite> SELECT alid,SUM(length) AS TotalTime FROM Track GROUP BY alid limit 50;
alid        TotalTime
----------  ----------
1           2551
2           3710
3           2402
4           3339
5           3588
6           3207
7           4268
8           4409
9           4120
10          4249
...

Once again, the results would probably be more useful with the title of the album and the artist included. In this example, we can also sort them by the total time of the album:

sqlite> .header on
sqlite> .mode column
sqlite> .width 20 25 25 5
sqlite> SELECT ar.name,al.title,SUM(tr.length) AS TotalTime
   ...> FROM Track as tr, Artist as ar,Album as al
   ...> WHERE ar.aid = al.aid AND al.alid = tr.alid
   ...> GROUP BY tr.alid
   ...> ORDER by TotalTime DESC limit 100;
ar.name               al.title                   TotalTime
--------------------  -------------------------  ---------
Jimmy Giuffre/Marty   West Coast Scene           13856
Curtis Fuller         Complete Blue Note UA Ses  11529
Clifford Brown        At The Cotton Club (1956)  11282
Count Basie           American Decca Recordings  11261
Onzy Matthews         Mosaic Select: Onzy Matth  9624
Cannonball Adderley   Sophisticated Swing        9486
Thelonius Monk        Live at the It Club        9289
...

In this particular example, instead of processing the 53000 or so tracks outside of SQL, we’ve reduced what we have to work with to around 5000 records, which could make the overall processing much faster. Exactly which aggregation functions will be available depends on the database you’re using.

6.5 Regular Expressions in SQL

For things like partial matching of strings, all flavors of SQL provide the LIKE operator, which allows limited wildcard matching. With the LIKE operator the percent sign (%) stands for 0 or more characters, and the underscore (_) stands for exactly one. Many modern databases (MySQL included) provide the RLIKE operator, which uses regular expressions. For the remainder of these examples, we'll use the MySQL database running on springer.berkeley.edu. We'll write a regular expression to find all the artists who have exactly three words in their names.

springer.spector$ mysql -u stat133 -p albums
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.0.51a-3ubuntu5.4-log (Ubuntu)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SELECT name FROM Artist
    -> WHERE name RLIKE '^[A-Za-z]+ [A-Za-z]+ [A-Za-z]+$';
+--------------------------+
| name                     |
+--------------------------+
| Mary Lou Williams        |
| Dexter Gordon Quartet    |
| Modern Jazz Septet       |
| Eddie South Trio         |
| Gerry Mulligan Tentette  |
| Bill Harris rarities     |
| Jimmy Heath Sextet       |
| Horae Parlan Trio        |

Note that in order to access the MySQL database, a username and password needed to be used. If you look at the original definition of the SELECT statement, you'll notice the HAVING clause which we have not yet covered. The basic difference between HAVING and WHERE is that, due to the timing of operations in the database, we can't use computed variables (like SUM and AVG) in the WHERE clause, but we can in the HAVING clause. Suppose we wanted to find the artists for which there were more than five albums in the database. A first try might be as follows:

mysql> SELECT aid,COUNT(aid) as count FROM Album
    -> WHERE count > 5
    -> GROUP BY aid;
ERROR 1054 (42S22): Unknown column 'count' in 'where clause'

Since the WHERE clause doesn't have access to columns you've created through computations, MySQL can't find the computed variable (count). To fix this, add a HAVING clause after the GROUP BY clause:

mysql> SELECT aid,COUNT(aid) as count FROM Album
    -> GROUP BY aid
    -> HAVING count > 5;
+------+-------+
| aid  | count |
+------+-------+
|    0 |     6 |
|   14 |     6 |
|   20 |     9 |
|   24 |     6 |
|   40 |    15 |
|   74 |     6 |
|   92 |     6 |
|  101 |     6 |
|  107 |     9 |

...

Of course, queries like this are infinitely more useful if they bring in the relevant information from other tables:

mysql> SELECT ar.name as Artist,COUNT(al.alid) AS count
    -> FROM Artist as ar, Album as al
    -> WHERE ar.aid = al.aid
    -> GROUP BY al.aid HAVING count > 5;
| Abdullah Ibrahim     |     6 |
| Ahmad Jamal          |     9 |
| Al Cohn              |     6 |
| Al Haig              |    15 |
| Andre Previn         |     6 |
| Anita O'Day          |     6 |
| Archie Shepp         |     6 |
...

6.6 Accessing databases in R

To provide consistent access to different databases, R uses an intermediate layer of communication known as the DBI (Data Base Interface) library. As we've seen before, different commands are necessary to get information about different databases, so the DBI layer provides a set of routines that will work identically for all databases. Some of these routines are described in the following table.

    dbConnect       connect to a database
    dbDisconnect    close the connection to a database
    dbExistsTable   returns TRUE or FALSE
    dbGetQuery      send a query to the database and get the results
    dbListTables    shows all the tables in a database
    dbListFields    shows the names of columns in a database
    dbSendQuery     send a query to the database and use fetch to get results
    dbWriteTable    stores a data frame in a database

To let the DBI layer know what type of database you are using, you pass a string (like 'MySQL' or 'SQLite') to the dbDriver function. This will return an object which can be passed to dbConnect to make a connection to the database. dbConnect returns a connection object which is passed to all the other routines so that they can communicate with the database you are using.

For example, to connect to the albums SQLite database, we'd use the following statements:

library(RSQLite)
drv = dbDriver('SQLite')
con = dbConnect(drv,db='albums.db')
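As a quick sketch (not from the original notes) of a few of the routines in the table, using the con object just created, and assuming the world data frame from earlier is still loaded:

dbListTables(con)                # the names of the tables: Album, Artist, Track
dbListFields(con,'Album')        # the column names in the Album table
dbExistsTable(con,'Track')       # TRUE if the table exists
dbWriteTable(con,'world',world)  # store an R data frame as a new table
dbDisconnect(con)                # close the connection when you're finished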

Since SQLite doesn't require a database server, you can easily create SQLite databases using dbWriteTable. For very large data sets, you can process the data in chunks, using the append= option of dbWriteTable. To use MySQL, you must have a valid username and password. For the MySQL database running on springer.berkeley.edu, you'd need to use the following statements:

drv = dbDriver('MySQL')
con = dbConnect(drv,dbname='albums',user='stat133',pass='T0pSecr3t')

If you're using an SCF machine, you'd need to specify a hostname (springer) when connecting; for non-SCF machines, you would set up an SSH tunnel and use a hostname of '127.0.0.1'. Once you get the connection object from dbConnect you can use it repeatedly to make as many queries as you need, and the functions you use are the same, regardless of the nature of the underlying database. It's up to you to decide how much of the work you want to do through the database, and how much you'll do in R. For example, suppose we want to find albums in the albums

database whose total time (sum of the length variable in the Track table) is between 2400 and 2700 seconds, displayed in descending order of total time. One approach (after loading the library and making the connection to the database) would be to do all the work in the database. Notice that this may be your only option if the database you're dealing with is too large to fit into R.

> albums = dbGetQuery(con,statement='SELECT al.title,ar.name,SUM(tr.length) AS tot
+       FROM Album AS al,Artist AS ar,Track AS tr
+       WHERE tr.alid = al.alid AND tr.aid = ar.aid AND tr.aid = al.aid
+       GROUP BY tr.alid
+       HAVING tot BETWEEN 2400 AND 2700 ORDER BY tot DESC')
> head(albums)
              title                          name  tot
1      Blues Groove Tiny Grimes & Coleman Hawkins 2699
2 I Need Some Money                  Eddie Harris 2699
3        Young Chet                    Chet Baker 2699
4         Up & Down                 Horace Parlan 2698
5   Arcadia Shuffle                  Roy Eldridge 2697
6       Night Dance                 Jimmy Giuffre 2697

It's usually most convenient to spread out the query over multiple lines, but the entire query could be passed as one long string. The dbGetQuery function gets the results of the query and puts them into a data frame, in this case called albums.

Another way to find the albums with total time between 2400 and 2700 would be to use SQL to get the total time of all the albums, and then use the subset function in R to get the ones in the desired time range:

> albums = dbGetQuery(con,statement='SELECT al.title,ar.name,
+       SUM(tr.length) AS "Total Time"
+       FROM Album AS al,Artist AS ar,Track AS tr
+       WHERE tr.alid = al.alid AND tr.aid = ar.aid AND tr.aid = al.aid
+       GROUP BY tr.alid')
> myalbums = subset(albums,`Total Time` > 2400 & `Total Time` < 2700)
> myalbums = myalbums[order(myalbums$"Total Time",decreasing=TRUE),]
> head(myalbums)
                 title                          name Total Time
821       Blues Groove Tiny Grimes & Coleman Hawkins        2699
2206 I Need Some Money                  Eddie Harris        2699
5632        Young Chet                    Chet Baker        2699
5337         Up & Down                 Horace Parlan        2698
418    Arcadia Shuffle                  Roy Eldridge        2697
3390       Night Dance                 Jimmy Giuffre        2697

Here I used a more descriptive name for the total time, namely Total Time. Note that, since there's a space in the variable name, I need to surround it with backquotes (`) in the

subset function, but when I refer to the variable in the order function, I used ordinary quotes. Backquotes will work in either case. At the other end of the extreme, you could read in the entire database and work with it entirely in R. This may not be the wisest approach, because if the database is too large, you may have problems. Working with the data in this way, however, will allow us to see the R equivalent of some of the database operations we've been studying. Once again, I'm assuming that the RMySQL library has been loaded, and a connection object has been obtained through a call to dbConnect.

> album = dbGetQuery(con,statement="select * from Album")
> track = dbGetQuery(con,statement="select * from Track")
> artist = dbGetQuery(con,statement="select * from Artist")

We now have three data frames in R corresponding to the three data tables in the database. In R, the equivalent of an inner join can be carried out with the merge function. You provide merge with the data frames you wish to join, and a character string or vector of character strings with the name(s) of the variables that you wish to join them by. In database terminology, these names are the keys for the join. The merge function will only accept two data frames, but it can be called repeatedly in case there are more than two data frames that need to be merged. If there are variables with the same name (other than the keys) in the two data frames to be merged, merge will rename them by adding either .x or .y to the end of the name to tell you which data frame that variable came from. An often easier approach is to rename variables before merging so that all the names (other than the keys) will be unique. In the current example, the variable title is in both the album data frame and the track data frame. Let’s rename the title variable in the album data frame to albumname:

> names(album)
[1] "alid"  "aid"   "title" "year"
> names(album)[3] = 'albumname'

Now we can merge the album data frame with the artist data frame using aid as a key, and then merge the track data frame with the album data frame using alid as a key.

> album = merge(album,artist,by='aid')
> track = merge(track,album,by='alid')
> head(track)
  alid aid.x             title filesize bitrate length aid.y  albumname
1    1  2836 Sonnymoon For Two     6459     189    279  2836 100% Proof
2    1  2836             Nutty     6062     191    259  2836 100% Proof
3    1  2836        100% Proof    19877     190    853  2836 100% Proof
4    1  2836        Bluesology     7101     189    306  2836 100% Proof
5    1  2836  Night In Tunisia     9102     190    391  2836 100% Proof
6    1  2836        Milestones    10774     190    463  2836 100% Proof

                   name
1 Tubby Hayes Orchestra
2 Tubby Hayes Orchestra
3 Tubby Hayes Orchestra
4 Tubby Hayes Orchestra
5 Tubby Hayes Orchestra
6 Tubby Hayes Orchestra

To calculate the total length of tracks, we only need the album and artist names along with the length of each track:

> track = track[,c('albumname','name','length')]

The rest of the processing is pretty straightforward now that everything is in R:

> tot = aggregate(track$length,list(albumname=track$albumname,artist=track$name),sum)
> use = tot[tot$x > 2400 & tot$x <= 2700,]
> use = use[order(use$x,decreasing=TRUE),]
> head(use)
             albumname                        artist    x
1105        Young Chet                    Chet Baker 2699
1748 I Need Some Money                  Eddie Harris 2699
5245      Blues Groove Tiny Grimes & Coleman Hawkins 2699
2520         Up & Down                 Horace Parlan 2698
2848       Night Dance                 Jimmy Giuffre 2697
4559   Arcadia Shuffle                  Roy Eldridge 2697

Because of the way merge operates, ties are resolved differently than in previous examples, but the results are essentially the same.

A different approach illustrates the relationship between factors and the principles of normalization. Since a factor stores each of its levels only once, normalized tables can often be used to form factors in R. Once again, let's assume that we've loaded the RMySQL library and obtained a connection object. This time, I'll trim off the unneeded variables from the track data frame before processing:

> album = dbGetQuery(con,statement="select * from Album")
> track = dbGetQuery(con,statement="select * from Track")
> artist = dbGetQuery(con,statement="select * from Artist")
> track = track[,c('aid','alid','length')]
> names(track) = c('artist','album','length')
> track$artist = factor(track$artist,levels=artist$aid,labels=artist$name)
> track$album = factor(track$album,levels=album$alid,labels=album$title)

The tables formed in the process of normalizing the database essentially contain the levels and labels of a factor variable in R. Processing the data set is the same as in the previous example:

> track$albart = paste(track$album,track$artist,sep='\t')
> tot = aggregate(track$length,track['albart'],sum)
> aa = strsplit(tot$albart,'\t')
> tot$album = sapply(aa,'[',1)
> tot$artist = sapply(aa,'[',2)
> tot$albart = NULL
> use = tot[tot$x > 2400 & tot$x <= 2700,]
> use = use[order(use$x,decreasing=TRUE),]
> head(use)
        x             album                       artist
821  2699      Blues Groove Tiny Grimes & Coleman Hawkins
2206 2699 I Need Some Money                 Eddie Harris
5632 2699        Young Chet                   Chet Baker
5337 2698         Up & Down                Horace Parlan
418  2697   Arcadia Shuffle                 Roy Eldridge
3390 2697       Night Dance                Jimmy Giuffre

In practice, the most useful solutions would probably fall somewhere between the two extremes of doing everything in the database and doing everything in R.

6.7 Using SQL in R

If you like using SQL, or you see a handy way to get something done using SQL, you can use the sqldf package to operate on R data frames using SQL. When you pass a query to the sqldf function, it will identify the data frames involved in the query, create a temporary SQLite database, perform the query, and return the results. For example, consider the task of merging two data frames, based on the value of a variable id:

> data1 = data.frame(id=c(1,4,2,3,5),x=c(7,12,19,15,9))
> data2 = data.frame(id=c(2,3,4,1,5),y=c(21,32,29,35,19))
> sqldf('select * from data1 natural join data2')
  id  x  y
1  1  7 35
2  4 12 29
3  2 19 21
4  3 15 32
5  5  9 19
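Since sqldf accepts most standard SQL, the same approach works for aggregation queries. As a small sketch (using a made-up scores data frame rather than the music database), a GROUP BY query computes per-group sums much like aggregate does:

> library(sqldf)
> scores = data.frame(grp=c('a','a','b','b','b'),x=c(1,2,3,4,5))
> sqldf('select grp, sum(x) as total from scores group by grp')
  grp total
1   a     3
2   b    12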

6.8 Reading Spreadsheets with the RODBC Library

Now that we've got the basics of SQL down, we can look at a third option for reading spreadsheets into R, the RODBC library. Note that this functionality is only available on Windows, and you don't need a copy of Excel on your computer in order for it to work. The ODBC interface provides access to spreadsheets using SQL as a query language. Here's a sample session showing how we could read the spreadsheet from the previous example using RODBC:

> library(RODBC)
> con = odbcConnectExcel('2_2005_top200_postdoc.xls')
> tbls = sqlTables(con)
> tbls
                                                            TABLE_CAT
1 C:\\Documents and Settings\\spector\\Desktop\\2_2005_top200_postdoc
2 C:\\Documents and Settings\\spector\\Desktop\\2_2005_top200_postdoc
3 C:\\Documents and Settings\\spector\\Desktop\\2_2005_top200_postdoc
4 C:\\Documents and Settings\\spector\\Desktop\\2_2005_top200_postdoc
5 C:\\Documents and Settings\\spector\\Desktop\\2_2005_top200_postdoc
  TABLE_SCHEM          TABLE_NAME   TABLE_TYPE REMARKS
1                          Sheet1$ SYSTEM TABLE
2                          Sheet2$ SYSTEM TABLE
3                          Sheet3$ SYSTEM TABLE
4                Sheet1$Print_Area        TABLE
5              Sheet1$Print_Titles        TABLE
> qry = paste('select * from ','[',tbls$TABLE_NAME[1],']',sep='')
> postdocs = sqlQuery(con,qry,stringsAsFactors=FALSE)
> head(postdocs)
  F1 The Top 200 Institutions--Postdoctoral Appointees_(2005)   F3 F4 F5 F6
1 NA                                                            NA NA NA NA
2 NA  Top 50 Institutions\nin Postdoctoral Appointees \n(2005)  NA NA NA NA
3 NA                                        Harvard University 4384  1  1 NA
4 NA                                  Johns Hopkins University 1442  2  2 NA
5 NA                                       Stanford University 1259  3  3 NA
6 NA                    University of California - Los Angeles 1094  4  1 NA
                      F7
1
2 Institutional\nControl
3                Private
4                Private
5                Private
6                 Public
> sapply(postdocs,class)
                                                       F1
                                                "logical"
The Top 200 Institutions--Postdoctoral Appointees_(2005)
                                              "character"

                                                       F3
                                                "numeric"
                                                       F4
                                                "numeric"
                                                       F5
                                                "numeric"
                                                       F6
                                                "logical"
                                                       F7
                                              "character"
> dim(postdocs)
[1] 208   7

The RODBC interface was able to correctly classify the numeric columns.
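One detail not shown above: when we're done with the spreadsheet, it's good practice to close the ODBC connection. A one-line sketch, using the con object from the session above:

> odbcClose(con)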

Chapter 7

Cluster Analysis

7.1 Introduction to Cluster Analysis

While we often think of statistics as giving definitive answers to well-posed questions, there are some statistical techniques that are used simply to gain further insight into a group of observations. One such technique (which encompasses lots of different methods) is cluster analysis. The idea of cluster analysis is that we have a set of observations, on which we have available several measurements. Using these measurements, we want to find out if the observations naturally group together in some predictable way. For example, we may have recorded physical measurements on many animals, and we want to know if there's a natural grouping (based, perhaps, on species) that distinguishes the animals from one another. (This use of cluster analysis is sometimes called "numerical taxonomy".) As another example, suppose we have information on the demographics and buying habits of many consumers. We could use cluster analysis on the data to see if there are distinct groups of consumers with similar demographics and buying habits (market segmentation).

It's important to remember that cluster analysis isn't about finding the right answer – it's about finding ways to look at data that allow us to understand the data better. For example, suppose we have a deck of playing cards, and we want to see if they form some natural groupings. One person may separate the black cards from the red; another may break the cards up into hearts, clubs, diamonds and spades; a third person might separate cards with pictures from cards with no pictures, and a fourth might make one pile of aces, one of twos, and so on. Each person is right in their own way, but in cluster analysis, there's really not a single "correct" answer.

Another aspect of cluster analysis is that there are an enormous number of possible ways of dividing a set of observations into groups. Even if we specify the number of groups, the number of possibilities is still enormous. For example, consider the task of dividing 25 observations into 5 groups. (25 observations is considered very small in the world of cluster analysis.) It turns out there are about 2.4 × 10^15 different ways to arrange those observations into 5 groups. If, as is often the case, we don't know the number of groups ahead of time, and we need to consider all possible numbers of groups (from 1 to 25), the number is more than 4 × 10^18! So any technique that simply tries all the different possibilities is doomed to failure.

7.2 Standardization

There are two very important decisions that need to be made whenever you are carrying out a cluster analysis. The first regards the relative scales of the variables being measured. We'll see that the available cluster analysis algorithms all depend on the concept of measuring the distance (or some other measure of similarity) between the different observations we're trying to cluster. If one of the variables is measured on a much larger scale than the other variables, then whatever measure we use will be overly influenced by that variable. For example, recall the world data set that we used earlier in the semester. Here's a quick summary of the mean values of the variables in that data set:

> apply(world1[-c(1,6)],2,mean,na.rm=TRUE)
         gdp       income     literacy     military
9.053595e+03 1.025796e+04 8.094902e+01 5.679061e+09

Without some sort of standardization, a variable like literacy, measured on a scale of 0 to 100, has no chance of influencing our solution when the other variables are so much larger. The traditional way of standardizing variables is to subtract their mean, and divide by their standard deviation. Variables standardized this way are sometimes referred to as z-scores, and always have a mean of zero and variance of one. In the case of variables that contain outliers (observations that are much bigger or smaller than the vast majority of the data), this sort of standardization may be too severe, scaling down the outlying observations so that they appear to be closer to the others. One alternative is to use the median absolute deviation in place of the standard deviation; another possibility is to subtract the median and divide by either the interquartile range or the median absolute deviation. For the common methods of measuring distances (discussed below), centering the data by subtracting the mean or median is not really critical; it's the division by an appropriate scaling factor that's important.
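As a quick sketch of these two approaches (hypothetical code, reusing the world1 data frame from above with its first and sixth columns removed), the scale function can produce either classical z-scores or a more robust version based on the median and the median absolute deviation:

> vars = world1[-c(1,6)]
> zscores = scale(vars)        # subtract the mean, divide by the standard deviation
> robust = scale(vars,center=apply(vars,2,median,na.rm=TRUE),
+                      scale=apply(vars,2,mad,na.rm=TRUE))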

7.3 Distance Measures

The most common distance measure, and the default for most programs that perform cluster analysis, is the Euclidean distance, which is an extension of the usual notion of the distance between two points in a plane. The Euclidean distance between two observations is calculated as the square root of the sum of the squares of the differences between corresponding variables in the two observations being considered. Another widely used measure is the Manhattan distance, so named because it is similar to the distance between two points in a city, where you can only travel along a grid of streets. It's calculated by adding up the absolute values of the differences of the corresponding variables, and is less likely to be influenced by a very large difference between just one of the variables. The Canberra distance is interesting in that it performs its own standardization; absolute values of differences are divided by the absolute value of the sum of the corresponding variables in the two observations. Depending on the values and distributions of the variables in the data set being clustered, these different distance measures may point out different aspects of the structure of the data set. Special consideration needs to be given to binary variables, that is, variables that take on only one of two values like TRUE or FALSE, especially when they are used in conjunction with continuous variables. Generally there are two types of measures that are used with binary data. Symmetric measures view two observations as being close together if the binary feature is either absent in both or present in both, while asymmetric measures only view the observations as being close if the feature is present for both. For some clustering methods, the entire distance matrix must be calculated; for other methods, distances are only calculated as needed.
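To make the choices concrete, here's a small sketch (using a made-up matrix, not one of our data sets) showing how each of these metrics is requested from the dist function through its method argument:

> m = rbind(c(1,2,3),c(4,6,8),c(0,0,10))
> dist(m)                        # Euclidean distance, the default
> dist(m,method='manhattan')     # sum of absolute differences
> dist(m,method='canberra')      # differences scaled by the sums of the variables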

7.4 Clustering Techniques

Much of the history of cluster analysis is concerned with developing algorithms that were not too computer intensive, since early computers were not nearly as powerful as they are today. Accordingly, computational shortcuts have traditionally been used in many cluster analysis algorithms. These algorithms have proven to be very useful, and can be found in most computer software. More recently, many of these older methods have been revisited and updated to reflect the fact that certain computations that once would have overwhelmed the available computers can now be performed routinely. In R, a number of these updated versions of cluster analysis algorithms are available through the cluster library, providing us with a large selection of methods to perform cluster analysis, and the possibility of comparing the old methods with the new to see if they really provide an advantage.

One of the oldest methods of cluster analysis is known as k-means cluster analysis, and is available in R through the kmeans function. The first step (and certainly not a trivial one) when using k-means cluster analysis is to specify the number of clusters (k) that will be formed in the final solution. The process begins by choosing k observations to serve as centers for the clusters. Then, the distance from each of the other observations to each of the k centers is calculated, and each observation is put in the cluster whose center it is closest to. After each observation has been put in a cluster, the centers of the clusters are recalculated, and every observation is checked to see if it might be closer to a different cluster, now that the centers have been recalculated. The process continues until no observations switch clusters.

Looking on the good side, the k-means technique is fast, and doesn't require calculating all of the distances between each observation and every other observation. It can be written to efficiently deal with very large data sets, so it may be useful in cases where other methods fail. On the down side, if you rearrange your data, it's very possible that you'll get a different solution every time you change the ordering of your data. Another criticism of this technique is that you may try, for example, a 3 cluster solution that seems to work pretty well, but when you look for the 4 cluster solution, all of the structure that the 3 cluster solution revealed is gone. This makes the procedure somewhat unattractive if you don't know exactly how many clusters you should have in the first place.

The R cluster library provides a modern alternative to k-means clustering, known as pam, which is an acronym for "Partitioning Around Medoids". The term medoid refers to an observation within a cluster for which the sum of the distances between it and all the other members of the cluster is a minimum. pam requires that you know the number of clusters that you want (like k-means clustering), but it does more computation than k-means in order to ensure that the medoids it finds are truly representative of the observations within a given cluster. Recall that in the k-means method the centers of the clusters (which might or might not actually correspond to a particular observation) are only recalculated after all of the observations have had a chance to move from one cluster to another. With pam, the sums of the distances between objects within a cluster are constantly recalculated as observations move around, which will hopefully provide a more reliable solution.
Furthermore, as a by-product of the clustering operation it identifies the observations that represent the medoids,

and these observations (one per cluster) can be considered a representative example of the members of that cluster, which may be useful in some situations. pam does require that the entire distance matrix is calculated to facilitate the recalculation of the medoids, and it does involve considerably more computation than k-means, but with modern computers this may not be an important consideration. As with k-means, there's no guarantee that the structure that's revealed with a small number of clusters will be retained when you increase the number of clusters.

Another class of clustering methods, known as hierarchical agglomerative clustering methods, starts out by putting each observation into its own separate cluster. It then examines all the distances between all the observations and pairs together the two closest ones to form a new cluster. This is a simple operation, since hierarchical methods require a distance matrix, and it represents exactly what we want – the distances between individual observations. So finding the first cluster to form simply means looking for the smallest number in the distance matrix and joining the two observations that the distance corresponds to into a new cluster. Now there is one less cluster than there are observations. To determine which observations will form the next cluster, we need to come up with a method for finding the distance between an existing cluster and individual observations, since once a cluster has been formed, we'll determine which observation will join it based on the distance between the cluster and the observation. Some of the methods that have been proposed to do this are to take the minimum distance between an observation and any member of the cluster, to take the maximum distance, to take the average distance, or to use some kind of measure that minimizes the distances between observations within the cluster. Each of these methods will reveal certain types of structure within the data. Using the minimum tends to find clusters that are drawn out and "snake"-like, while using the maximum tends to find compact clusters. Using the mean is a compromise between those methods. One method that tends to produce clusters of more equal size is known as Ward's method. It attempts to form clusters keeping the distances within the clusters as small as possible, and is often useful when the other methods find clusters with only a few observations. Agglomerative hierarchical cluster analysis is provided in R through the hclust function.

Notice that, by its very nature, solutions with many clusters are nested within the solutions that have fewer clusters, so observations don't "jump ship" as they do in the k-means or pam methods. Furthermore, we don't need to tell these procedures how many clusters we want – we get a complete set of solutions, starting from the trivial case of each observation in a separate cluster all the way to the other trivial case where we say all the observations are in a single cluster.

Traditionally, hierarchical cluster analysis has taken computational shortcuts when updating the distance matrix to reflect new clusters. In particular, when a new cluster is formed and the distance matrix is updated, all the information about the individual members of the cluster is discarded in order to make the computations faster. The cluster library provides the agnes function which uses essentially the same technique as hclust, but which uses fewer shortcuts when updating the distance matrix.
For example, when the mean method of calculating the distance between observations and clusters is used, hclust only uses the

two observations and/or clusters which were recently merged when updating the distance matrix, while agnes calculates those distances as the average of all the distances between all the observations in the two clusters. While the two techniques will usually agree quite closely when minimum or maximum updating methods are used, there may be noticeable differences when updating using the average distance or Ward's method.
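As a minimal sketch of the partitioning methods described above (hypothetical code; somedata stands in for a numeric data frame with no missing values, and the hierarchical methods are demonstrated in detail in the next section), both kmeans and pam are simply given the data, or a distance matrix in pam's case, along with the desired number of clusters:

> library(cluster)
> sdata = scale(somedata)                # somedata is a placeholder for a numeric data frame
> km = kmeans(sdata,3)                   # k-means with 3 clusters
> pm = pam(dist(sdata),3)                # partitioning around medoids with 3 clusters
> table(km$cluster,pm$clustering)        # compare the two solutions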

7.5 Hierarchical Clustering

For the hierarchical clustering methods, the dendrogram is the main graphical tool for getting insight into a cluster solution. When you use hclust or agnes to perform a cluster analysis, you can see the dendrogram by passing the result of the clustering to the plot function. To illustrate interpretation of the dendrogram, we'll look at a cluster analysis performed on a set of cars from 1978-1979; the data can be found at http://www.stat.berkeley.edu/classes/s133/data/cars.tab. Since the data is a tab-delimited file, we use read.delim:

> cars = read.delim('cars.tab',stringsAsFactors=FALSE)

To get an idea of what information we have, let's look at the first few records:

> head(cars)
  Country                       Car  MPG Weight Drive_Ratio Horsepower
1    U.S.        Buick Estate Wagon 16.9  4.360        2.73        155
2    U.S. Ford Country Squire Wagon 15.5  4.054        2.26        142
3    U.S.        Chevy Malibu Wagon 19.2  3.605        2.56        125
4    U.S.    Chrysler LeBaron Wagon 18.5  3.940        2.45        150
5    U.S.                  Chevette 30.0  2.155        3.70         68
6   Japan             Toyota Corona 27.5  2.560        3.05         95
  Displacement Cylinders
1          350         8
2          351         8
3          267         8
4          360         8
5           98         4
6          134         4

It looks like the variables are measured on different scales, so we will likely want to standardize the data before proceeding. The daisy function in the cluster library will automatically perform standardization, but it doesn't give you complete control. If you have a particular method of standardization in mind, you can use the scale function. You pass scale a matrix or data frame to be standardized, and two optional vectors. The first, called center, is a vector of values, one for each column of the matrix or data frame to be standardized, which will be subtracted from every entry in that column. The second, called scale, is

similar to center, but is used to divide the values in each column. Thus, to get z-scores, you could pass scale a vector of means for center, and a vector of standard deviations for scale. These vectors can be created with the apply function, which performs the same operation on each row or column of a matrix. Suppose we want to standardize by subtracting the median and dividing by the median absolute deviation:

> cars.use = cars[,-c(1,2)]
> medians = apply(cars.use,2,median)
> mads = apply(cars.use,2,mad)
> cars.use = scale(cars.use,center=medians,scale=mads)

(The 2 used as the second argument to apply means to apply the function to the columns of the matrix or data frame; a value of 1 means to use the rows.) The country of origin and name of the car will not be useful in the cluster analysis, so they have been removed. Notice that the scale function doesn't change the order of the rows of the data frame, so it will be easy to identify observations using the omitted columns from the original data.

First, we'll take a look at a hierarchical method, since it will provide information about solutions with different numbers of clusters. The first step is calculating a distance matrix. For a data set with n observations, the distance matrix will have n rows and n columns; the (i,j)th element of the distance matrix will be the distance between observation i and observation j. There are two functions that can be used to calculate distance matrices in R: the dist function, which is included in every version of R, and the daisy function, which is part of the cluster library. We'll use the dist function in this example, but you should familiarize yourself with the daisy function (by reading its help page), since it offers some capabilities that dist does not. Each function provides a choice of distance metrics; in this example, we'll use the default of Euclidean distance, but you may find that using other metrics will give different insights into the structure of your data.

cars.dist = dist(cars.use)

If you display the distance matrix in R (for example, by typing its name), you'll notice that only the lower triangle of the matrix is displayed. This is to remind us that the distance matrix is symmetric, since it doesn't matter which observation we consider first when we calculate a distance. R takes advantage of this fact by only storing the lower triangle of the distance matrix. All of the clustering functions will recognize this and have no problems, but if you try to access the distance matrix in the usual way (for example, with subscripting), you'll see an error message. Thus, if you need to use the distance matrix with anything other than the clustering functions, you'll need to use as.matrix to convert it to a regular matrix.

To get started, we'll use the hclust function; the cluster library provides a similar function, called agnes, to perform hierarchical cluster analysis.

> cars.hclust = hclust(cars.dist)

Once again, we’re using the default method of hclust, which is to update the distance matrix using what R calls “complete” linkage. Using this method, when a cluster is formed,

its distance to other objects is computed as the maximum distance between any object in the cluster and the other object. Other linkage methods will provide different solutions, and should not be ignored. For example, using method="ward" tends to produce clusters of fairly equal size, and can be useful when other methods find clusters that contain just a few observations.

Now that we've got a cluster solution (actually a collection of cluster solutions), how can we examine the results? The main graphical tool for looking at a hierarchical cluster solution is known as a dendrogram. This is a tree-like display that lists the objects which are clustered along the x-axis, and the distance at which the cluster was formed along the y-axis. (Distances along the x-axis are meaningless in a dendrogram; the observations are equally spaced to make the dendrogram easier to read.) To create a dendrogram from a cluster solution, simply pass it to the plot function. The result is displayed below.

plot(cars.hclust,labels=cars$Car,main='Default from hclust')

If you choose any height along the y-axis of the dendrogram, and move across the dendrogram counting the number of lines that you cross, each line represents a group that was identified when objects were joined together into clusters. The observations in that group are represented by the branches of the dendrogram that spread out below the line. For example, if we look at a height of 6, and move across the x-axis at that height, we'll cross two lines. That defines a two-cluster solution; by following the line down through all its branches, we can see the names of the cars that are included in these two clusters. Since the y-axis represents how close together observations were when they were merged into clusters, clusters whose branches are very close together (in terms of the heights at which they were merged)

probably aren't very reliable. But if there's a big difference along the y-axis between the last merged cluster and the currently merged one, that indicates that the clusters formed are probably doing a good job in showing us the structure of the data. Looking at the dendrogram for the car data, there are clearly two very distinct groups; the right hand group seems to consist of two more distinct clusters, while most of the observations in the left hand group are clustering together at about the same height. For this data set, it looks like either two or three groups might be an interesting place to start investigating. This is not to imply that looking at solutions with more clusters would be meaningless, but the data seems to suggest that two or three clusters might be a good start. For a problem of this size, we can see the names of the cars, so we could start interpreting the results immediately from the dendrogram, but when there are larger numbers of observations, this won't be possible.

One of the first things we can look at is how many cars are in each of the groups. We'd like to do this for both the two cluster and three cluster solutions. You can create a vector showing the cluster membership of each observation by using the cutree function. Since the object returned by a hierarchical cluster analysis contains information about solutions with different numbers of clusters, we pass the cutree function the cluster object and the number of clusters we're interested in. So to get cluster memberships for the three cluster solution, we could use:

> groups.3 = cutree(cars.hclust,3)

Simply displaying the group memberships isn't that revealing. A good first step is to use the table function to see how many observations are in each cluster. We'd like a solution where there aren't too many clusters with just a few observations, because it may make it difficult to interpret our results. For the three cluster solution, the distribution among the clusters looks good:

> table(groups.3)
groups.3
 1  2  3
 8 20 10

Notice that you can get this information for many different groupings at once by combining the calls to cutree and table in a call to sapply. For example, to see the sizes of the clusters for solutions ranging from 2 to 6 clusters, we could use:

> counts = sapply(2:6,function(ncl)table(cutree(cars.hclust,ncl)))
> names(counts) = 2:6
> counts
$"2"

 1  2
18 20

$"3"

 1  2  3
 8 20 10

$"4"

 1  2  3  4
 8 17  3 10

$"5"

 1  2  3  4  5
 8 11  6  3 10

$"6"

 1  2  3  4  5  6
 8 11  6  3  5  5

To see which cars are in which clusters, we can use subscripting on the vector of car names to choose just the observations from a particular cluster. Since we used all of the observations in the data set to form the distance matrix, the ordering of the names in the original data will coincide with the values returned by cutree. If observations were removed from the data before the distance matrix was computed, it's important to remember to make the same deletions in the vector from the original data set that will be used to identify observations. So, to see which cars were in the first cluster for the three cluster solution, we can use:

> cars$Car[groups.3 == 1]
[1] Buick Estate Wagon        Ford Country Squire Wagon
[3] Chevy Malibu Wagon        Chrysler LeBaron Wagon
[5] Chevy Caprice Classic     Ford LTD
[7] Mercury Grand Marquis     Dodge St Regis

As usual, if we want to do the same thing for all the groups at once, we can use sapply:

> sapply(unique(groups.3),function(g)cars$Car[groups.3 == g])
[[1]]
[1] Buick Estate Wagon        Ford Country Squire Wagon
[3] Chevy Malibu Wagon        Chrysler LeBaron Wagon
[5] Chevy Caprice Classic     Ford LTD
[7] Mercury Grand Marquis     Dodge St Regis

[[2]]
 [1] Chevette         Toyota Corona    Datsun 510       Dodge Omni
 [5] Audi 5000        Saab 99 GLE      Ford Mustang 4   Mazda GLC
 [9] Dodge Colt       AMC Spirit       VW Scirocco      Honda Accord LX
[13] Buick Skylark    Pontiac Phoenix  Plymouth Horizon Datsun 210
[17] Fiat Strada      VW Dasher        BMW 320i         VW Rabbit

[[3]]
 [1] Volvo 240 GL          Peugeot 694 SL        Buick Century Special
 [4] Mercury Zephyr        Dodge Aspen           AMC Concord D/L
 [7] Ford Mustang Ghia     Chevy Citation        Olds Omega
[10] Datsun 810

We could also see what happens when we use the four cluster solution:

> groups.4 = cutree(cars.hclust,4)
> sapply(unique(groups.4),function(g)cars$Car[groups.4 == g])
[[1]]
[1] Buick Estate Wagon        Ford Country Squire Wagon
[3] Chevy Malibu Wagon        Chrysler LeBaron Wagon
[5] Chevy Caprice Classic     Ford LTD
[7] Mercury Grand Marquis     Dodge St Regis

[[2]]
 [1] Chevette         Toyota Corona    Datsun 510       Dodge Omni
 [5] Ford Mustang 4   Mazda GLC        Dodge Colt       AMC Spirit
 [9] VW Scirocco      Honda Accord LX  Buick Skylark    Pontiac Phoenix
[13] Plymouth Horizon Datsun 210       Fiat Strada      VW Dasher
[17] VW Rabbit

[[3]]
[1] Audi 5000   Saab 99 GLE BMW 320i

[[4]]
 [1] Volvo 240 GL          Peugeot 694 SL        Buick Century Special
 [4] Mercury Zephyr        Dodge Aspen           AMC Concord D/L
 [7] Ford Mustang Ghia     Chevy Citation        Olds Omega
[10] Datsun 810

The new cluster can be recognized as the third group in the above output. Often there is an auxiliary variable in the original data set that was not included in the cluster analysis, but may be of interest. In fact, cluster analysis is sometimes performed

to see if observations naturally group themselves in accord with some already measured variable. For this data set, we could ask whether the clusters reflect the country of origin of the cars, stored in the variable Country in the original data set. The table function can be used, this time passing two arguments, to produce a cross-tabulation of cluster group membership and country of origin:

> table(groups.3,cars$Country)
groups.3 France Germany Italy Japan Sweden U.S.
       1      0       0     0     0      0    8
       2      0       5     1     6      1    7
       3      1       0     0     1      1    7

Of interest is the fact that all of the cars in cluster 1 were manufactured in the US. Considering the state of the automobile industry in 1978, and the cars that were identified in cluster 1, this is not surprising. In an example like this, with a small number of observations, we can often interpret the cluster solution directly by looking at the labels of the observations that are in each cluster. Of course, for larger data sets, this will be impossible or meaningless.

A very useful method for characterizing clusters is to look at some sort of summary statistic, like the median, of the variables that were used to perform the cluster analysis, broken down by the groups that the cluster analysis identified. The aggregate function is well suited for this task, since it will perform summaries on many variables simultaneously. Let's look at the median values for the variables we've used in the cluster analysis, broken up by the cluster groups. One oddity of the aggregate function is that it demands that the variable(s) used to divide up the data are passed to it in a list, even if there's only one variable:

> aggregate(cars.use,list(groups.3),median)
  Group.1        MPG     Weight Drive_Ratio Horsepower Displacement  Cylinders
1       1 -0.7945273  1.5051136  -0.9133729  1.0476133    2.4775849  4.7214353
2       2  0.6859228 -0.5870568   0.5269459 -0.6027364   -0.5809970 -0.6744908
3       3 -0.4058377  0.5246039  -0.1686227  0.3587717    0.3272282  2.0234723

If the ranges of these numbers seem strange, it's because we standardized the data before performing the cluster analysis. While it is usually more meaningful to look at the variables in their original scales, when data is centered, negative values mean "lower than most" and positive values mean "higher than most". Thus, group 1 consists of cars with relatively low MPG, high weight, low drive ratios, high horsepower and displacement, and a higher than average number of cylinders. Group 2 consists of cars with high gas mileage, and low weight and horsepower; and group 3 is similar to group 1. It may be easier to understand the groupings if we look at the variables in their original scales:

> aggregate(cars[,-c(1,2)],list(groups.3),median)
  Group.1   MPG Weight Drive_Ratio Horsepower Displacement Cylinders
1       1 17.30  3.890       2.430      136.5          334         8
2       2 30.25  2.215       3.455       79.0          105         4
3       3 20.70  3.105       2.960      112.5          173         6

It may also be useful to add the numbers of observations in each group to the above display. Since aggregate returns a data frame, we can manipulate it in any way we want:

> a3 = aggregate(cars[,-c(1,2)],list(groups.3),median)
> data.frame(Cluster=a3[,1],Freq=as.vector(table(groups.3)),a3[,-1])
  Cluster Freq   MPG Weight Drive_Ratio Horsepower Displacement Cylinders
1       1    8 17.30  3.890       2.430      136.5          334         8
2       2   20 30.25  2.215       3.455       79.0          105         4
3       3   10 20.70  3.105       2.960      112.5          173         6

To see how the four cluster solution differed from the three cluster solution, we can perform the same analysis for that solution:

> a4 = aggregate(cars[,-c(1,2)],list(groups.4),median)
> data.frame(Cluster=a4[,1],Freq=as.vector(table(groups.4)),a4[,-1])
  Cluster Freq  MPG Weight Drive_Ratio Horsepower Displacement Cylinders
1       1    8 17.3  3.890        2.43      136.5          334         8
2       2   17 30.9  2.190        3.37       75.0           98         4
3       3    3 21.5  2.795        3.77      110.0          121         4
4       4   10 20.7  3.105        2.96      112.5          173         6

The main difference seems to be that the four cluster solution recognized a group of cars that have higher horsepower and drive ratios than the other cars in the cluster they came from.

7.6 PAM: Partitioning Around Medoids

Unlike the hierarchical clustering methods, techniques like k-means cluster analysis (available through the kmeans function) or partitioning around medoids (available through the pam function in the cluster library) require that we specify the number of clusters that will be formed in advance. pam offers some additional diagnostic information about a clustering solution, and provides a nice example of an alternative technique to hierarchical clustering. To use pam, you must first load the cluster library. You can pass pam a data frame or a distance matrix; since we've already formed the distance matrix, we'll use that. pam also needs the number of clusters you wish to form. Let's look at the three cluster solution produced by pam:

> library(cluster)
> cars.pam = pam(cars.dist,3)

First of all, let’s see if the pam solution agrees with the hclust solution. Since pam only looks at one cluster solution at a time, we don’t need to use the cutree function as we did with hclust; the cluster memberships are stored in the clustering component of the pam object; like most R objects, you can use the names function to see what else is available. Further information can be found in the help page for pam.object.

> names(cars.pam)
[1] "medoids"    "id.med"     "clustering" "objective"  "isolation"
[6] "clusinfo"   "silinfo"    "diss"       "call"

We can use table to compare the results of the hclust and pam solutions:

> table(groups.3,cars.pam$clustering)
groups.3  1  2  3
       1  8  0  0
       2  0 19  1
       3  0  0 10

The solutions seem to agree, except for one observation that hclust put in group 2 and pam put in group 3. Which observation was it?

> cars$Car[groups.3 != cars.pam$clustering]
[1] Audi 5000

Notice how easy it is to get information like this due to the power of R's subscripting operations. One novel feature of pam is that it finds observations from the original data that are typical of each cluster in the sense that they are closest to the center of the cluster. The indexes of the medoids are stored in the id.med component of the pam object, so we can use that component as a subscript into the vector of car names to see which ones were selected:

> cars$Car[cars.pam$id.med]
[1] Dodge St Regis    Dodge Omni        Ford Mustang Ghia

Another feature available with pam is a plot known as a silhouette plot. First, a measure is calculated for each observation to see how well it fits into the cluster that it's been assigned to. This is done by comparing how close the object is to other objects in its own cluster with how close it is to objects in other clusters. (A complete description can be found in the help page for silhouette.) Values near one mean that the observation is well placed in its cluster; values near 0 mean that it's likely that an observation might really belong in some other cluster. Within each cluster, the value for this measure is displayed from smallest to largest. If the silhouette plot shows values close to one for each observation, the fit was good; if there are many observations closer to zero, it's an indication that the fit was not good. The silhouette plot is very useful in locating groups in a cluster analysis that may not be doing a good job; in turn this information can be used to help select the proper number of clusters. For the current example, here's the silhouette plot for the three cluster pam solution, produced by the command

> plot(cars.pam)

The plot indicates that there is a good structure to the clusters, with most observations seeming to belong to the cluster that they're in. There is a summary measure at the bottom of the plot labeled "Average Silhouette Width", sometimes abbreviated SC. This table shows how to use the value:

   Range of SC   Interpretation
   0.71-1.0      A strong structure has been found
   0.51-0.70     A reasonable structure has been found
   0.26-0.50     The structure is weak and could be artificial
   < 0.25        No substantial structure has been found

To create a silhouette plot for a particular solution derived from a hierarchical cluster analysis, the silhouette function can be used. This function takes the appropriate output from cutree along with the distance matrix used for the clustering. So to produce a silhouette plot for our 4 group hierarchical cluster solution (not shown), we could use the following statements:

plot(silhouette(cutree(cars.hclust,4),cars.dist))
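Since the average silhouette width is also available numerically (in the silinfo component of a pam object), a simple way to use it when choosing the number of clusters is to compute it for several values of k and compare them. A brief sketch, assuming the cars.dist distance matrix from above:

> avg.widths = sapply(2:6,function(k)pam(cars.dist,k)$silinfo$avg.width)
> names(avg.widths) = 2:6
> avg.widths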

7.7 AGNES: Agglomerative Nesting

As an example of using the agnes function from the cluster package, consider the famous Fisher iris data, available as the data frame iris in R. First let's look at some of the data:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

We will only consider the numeric variables in the cluster analysis. As mentioned previously, there are two functions to compute the distance matrix: dist and daisy. It should be mentioned that for data that’s all numeric, using the function’s defaults, the two methods will give the same answers. We can demonstrate this as follows:

> iris.use = subset(iris,select=-Species)
> d = dist(iris.use)
> library(cluster)
> d1 = daisy(iris.use)
> sum(abs(d - d1))
[1] 1.072170e-12

Of course, if we choose a non-default metric for dist, the answers will be different:

> dd = dist(iris.use,method='manhattan')
> sum(abs(as.matrix(dd) - as.matrix(d1)))
[1] 38773.86

The values are very different! Continuing with the cluster example, we can calculate the cluster solution as follows:

> z = agnes(d)

The plotting method for agnes objects presents two different views of the cluster solution. When we plot such an object, the plotting function sets the graphics parameter ask=TRUE, and the following appears in your R session each time a plot is to be drawn:

Hit <Return> to see next plot:

If you know you want a particular plot, you can pass the which.plots= argument an integer telling which plot you want. The first plot that is displayed is known as a banner plot. The banner plot for the iris data is shown below:
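For example, a one-line sketch using the z object created above: passing which.plots=2 skips the banner plot and displays only the dendrogram.

> plot(z,which.plots=2)    # 1 would give just the banner plot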

The white area on the left of the banner plot represents the unclustered data, while the white lines that stick into the red area show the heights at which the clusters were formed. Since we don't want to include too many clusters that joined together at similar heights, it looks like three clusters, at a height of about 2, is a good solution. It's clear from the banner plot that if we lowered the height to, say, 1.5, we'd create a fourth cluster with only a few observations. The banner plot is just an alternative to the dendrogram, which is the second plot that's produced from an agnes object:

The dendrogram shows the same relationships, and it's a matter of individual preference as to which one is easier to use. Let's see how well the clusters do in grouping the irises by species:

> table(cutree(z,3),iris$Species)

    setosa versicolor virginica
  1     50          0         0
  2      0         50        14
  3      0          0        36

We were able to classify all the setosa and versicolor varieties correctly. The following plot gives some insight into why we were so successful:

> library(lattice)
> splom(~iris,groups=iris$Species,auto.key=TRUE)

Chapter 8

XML

8.1 What is XML?

XML, which is an acronym for extensible markup language, is a way of creating documents that contain structured information. Up until now, most of the data sets that we've considered were easily accommodated by an observations-and-variables rectangular data structure (like a comma separated file, which translates neatly to a data frame in R), where the number of variables for each observation is the same. Outside of variable names and the actual data values, there was no additional information stored in the data, either in its text form or once it was read into R. One type of data that we've seen that accommodates extra information is the spreadsheet; people often annotate their data with all sorts of useful and maybe-not-so-useful information. But as we've seen, there's no standard structure to these non-data rows of the spreadsheet, so they really just get in our way and require customized processing. Another example of non-regular data would be the baseball database; players played for varying numbers of seasons, and for varying numbers of teams; they may or may not have attended college, or played in a world series, etc. Databases have traditionally been used for data like this, but they certainly have their disadvantages: there are many tables, making it necessary to perform lots of joins, and it's easy to lose track of all that information.

One frustrating aspect regarding designing formats for data is that if you go into the design with a specific type of data in mind, you'll build in features that are wonderful for that data, but may be very lacking for other sorts of data. For example, statisticians would want a rich syntax to describe things like repeated measures, missing values, allowable ranges of variables, etc., but these features would not be of much use to a general user. So in some ways, we can think of XML as a compromise that will allow a wide variety of different information to be stored and distributed in a similar way without favoring any particular structure or type of data. The features that I've described can easily be accommodated in an XML document by simply defining new tags to contain this information. Furthermore, XML is designed so that it's alright if there is no extra information in the document other than the data and its associated tags. This sort of flexibility is one of the most powerful forces behind the development of XML. To see a list of some of the applications for which XML document styles have been proposed, go to http://xml.coverpages.org/xml.html#applications.

At first glance, XML looks very similar to HTML, but there are important differences. First, HTML is used to describe the desired appearance of information in the browser, not the structure of that information. Second, the set of allowable tags in HTML is fixed; we can't just redefine them at our convenience. HTML is also inconsistent. Tags like <h1> and <td> must have accompanying closing tags (</h1> and </td>), but tags like <br> and <hr> don't. Finally, to accommodate all the people creating web pages and all the browsers trying to display them, HTML doesn't lay down strict rules about the way information is put into a web page. These facts about HTML should not be interpreted as saying that HTML is not useful. It's very useful for what it does, but it doesn't give us a consistent, extensible format for accommodating data that preserves its structure. Here are some of the things that we'll see in XML documents:

• XML documents always begin with an XML declaration of the form <?xml version="1.0"?>.

• Information is identified by tags (i.e. an identifier surrounded by angle brackets), like

<class>Statistics 133</class>

Every opening tag has a corresponding closing tag (</class> in this example), except for the case where there is no information; then a single tag like <class/> can be used. The specific names of these tags will vary with each XML document.

• Tags are case sensitive, and opening and closing tags must match exactly.

• No spaces are allowed between the opening angle bracket and the tag name.

• Tag names must begin with a letter, and can contain only letters and numbers.

• As in HTML, angle brackets will always be interpreted as parts of tags; to get literal angle brackets in data we need to use the entities &lt; and &gt;.

• Additional information can be stored in the tags in the form of attributes. In the previous example, we could have specified the department as:

<class department="Statistics">Statistics 133</class>

or

<class>
Statistics 133
<department>Statistics</department>
</class>

The decision is up to the designer of the particular document.

• Comments can be placed in an XML document by starting with <!-- and ending with -->, and they may appear anywhere in the document.

Documents that conform to these rules are said to be well-formed, and this is the minimum requirement for any XML document. But to be useful, they should also conform to additional rules described in what is known as the Document Type Definition for a particular document, or the DTD. The DTD describes the allowable tags, what kind of data can be stored within those tags, and what attributes are allowed in the tags. The DTD for a particular document can be provided in the document itself; when a document uses multiple DTDs, the tag name may be prefaced with an identifier followed by a colon to make clear which DTD the tag comes from. The programs that read XML are known as parsers; without a DTD they simply make sure that the document is well-formed; with a DTD they can additionally check to make sure that the data is valid.

8.2 A Simple Example

Let's start with a very simple example to introduce the XML library and some of the strategies we'll need to use when working with XML. The example we'll use is a catalog of plants. (You can find the XML document at http://www.stat.berkeley.edu/classes/s133/data/plant_catalog.xml.) Here's what the beginning of the file looks like:

<?xml version="1.0"?>^M
<CATALOG>^M
<PLANT>^M
<COMMON>Bloodroot</COMMON>^M
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>^M
<ZONE>4</ZONE>^M
<LIGHT>Mostly Shady</LIGHT>^M
<PRICE>$2.44</PRICE>^M
<AVAILABILITY>031599</AVAILABILITY>^M
</PLANT>^M
<PLANT>^M
<COMMON>Columbine</COMMON>^M
<BOTANICAL>Aquilegia canadensis</BOTANICAL>^M
<ZONE>3</ZONE>^M
<LIGHT>Mostly Shady</LIGHT>^M
<PRICE>$9.37</PRICE>^M
<AVAILABILITY>030699</AVAILABILITY>^M
</PLANT>^M
...

The main body of the data is enclosed by the CATALOG tag; we’ll sometimes refer to this as a node. Within the CATALOG node are several PLANT nodes - each of them contains several pieces of data which are the information we wish to extract. We’re going to use the R XML library with what is known as the Document Object Model (DOM), in which R will read the

entire XML file into memory, and internally convert it to a tree structure. (There is another model, called SAX (Simple API for XML), which only reads part of the data at a time, but it's more complicated than the DOM model.) All the information about the data and its structure will be stored in a list inside of R, but when we print it, it will look very much like the document that we started with. The basic strategy of working with XML files under this model is to keep "tunneling down" into the document structure without disturbing its structure until we come to the end of the node. For example, with the plant catalog, we want to extract the PLANT nodes as XML structures, and then extract the values of the terminal branches of the tree (like COMMON, BOTANICAL, and ZONE) using the xmlValue function. As we explore the structure of the data, we can use the names function to see what nodes exist; at any time, we can print a node, and it will show us a representation like the original XML document so we'll know where to go next.

The first step in reading an XML document into R is loading the XML library. Next, the xmlTreeParse function is called to read the document into memory, and the root of the XML tree is extracted using xmlRoot:

> library(XML)
> doc = xmlTreeParse('plant_catalog.xml')
> root = xmlRoot(doc)

Let's look at some properties of this root node:

> class(root)
[1] "XMLNode"          "RXMLAbstractNode" "XMLAbstractNode"  "oldClass"
> names(root)
 [1] "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT"
[10] "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT"
[19] "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT"
[28] "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT" "PLANT"

In this example, xmlTreeParse read its input from a local file. Other choices are local gzipped files and URLs representing XML documents or gzipped XML documents. The class of XMLNode indicates that internally, R is storing the object as part of an XML tree. This will be important later when we decide how to process the entire tree. The result of the names function call tells us that the structure of this document is quite simple. Inside the root node, there are 36 PLANT nodes. (For larger documents, it might be prudent to make a table of the names instead of displaying the names directly.) As we've already seen from looking at the document, this is where our data lies, so we'll want to examine one of the plant nodes a little more carefully to figure out how to extract the actual data.

> oneplant = root[[1]]
> class(oneplant)
[1] "XMLNode"          "RXMLAbstractNode" "XMLAbstractNode"  "oldClass"
> oneplant

<PLANT>
 <COMMON>Bloodroot</COMMON>
 <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
 <ZONE>4</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$2.44</PRICE>
 <AVAILABILITY>031599</AVAILABILITY>
</PLANT>

We can see that this single PLANT object is still an XMLNode; its printed representation shows us exactly what we've got. Notice that the individual elements don't have any further tree structure; this means we can use xmlValue to extract the values:

> xmlValue(oneplant[['COMMON']])
[1] "Bloodroot"
> xmlValue(oneplant[['BOTANICAL']])
[1] "Sanguinaria canadensis"

Of course, we don't want to perform these tasks one by one for each node of the tree. You may recall that when we were confronted with problems like this before, we used the sapply function, which operates on every element of a list. If the object we want to process is an xmlNode, the corresponding function to use is xmlSApply. For example, to get all the common names of all the plants, we can use xmlValue on all the PLANT nodes like this:

> commons = xmlSApply(root,function(x)xmlValue(x[['COMMON']]))
> head(commons)
                PLANT                 PLANT                 PLANT
          "Bloodroot"           "Columbine"      "Marsh Marigold"
                PLANT                 PLANT                 PLANT
            "Cowslip" "Dutchman's-Breeches"        "Ginger, Wild"

We could repeat the process for each column manually, and then combine things into a data frame, or we can automate the process using lapply and the names of the objects within the plants:

> getvar = function(x,var)xmlValue(x[[var]])
> res = lapply(names(root[[1]]),function(var)xmlSApply(root,getvar,var))
> plants = data.frame(res)
> names(plants) = names(root[[1]])
> head(plants)
               COMMON               BOTANICAL ZONE        LIGHT PRICE
1           Bloodroot  Sanguinaria canadensis    4 Mostly Shady $2.44
2           Columbine    Aquilegia canadensis    3 Mostly Shady $9.37
3      Marsh Marigold        Caltha palustris    4 Mostly Sunny $6.81
4             Cowslip        Caltha palustris    4 Mostly Shady $9.90
5 Dutchman's-Breeches     Dicentra cucullaria    3 Mostly Shady $6.44
6        Ginger, Wild        Asarum canadense    3 Mostly Shady $9.03
  AVAILABILITY
1       031599
2       030699
3       051799
4       030699
5       012099
6       041899
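Since xmlValue always returns text, all of the columns in plants are character data (or factors, under R's usual data.frame defaults), so numeric variables need to be converted before any calculations. A short sketch of one way to do that for the ZONE column:

> plants$ZONE = as.numeric(as.character(plants$ZONE))   # as.character guards against factor codes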

8.3 More Complex Example

For more complex documents, a few other tools are useful. To illustrate, we'll look at a file that uses Geography Markup Language, or GML. This file (which you can find at http://www.stat.berkeley.edu/classes/s133/data/counties.gml) contains the x- and y-coordinates of the county centers for each state in the United States. Information like this would be difficult to store in a less structured environment, because each state has a different number of counties. If we were going to read it into a database, we might want to have a separate table for each state; in some programs, we'd have to force it into a rectangular form, even though it wouldn't be that efficient. If we were using a spreadsheet, we might want to put all the information in a single spreadsheet, with a separate heading for each state. In R, a reasonable way to store the data would be in a list, with a separate data frame for each state. Thus, providing the data in a form that could be easily converted to any of those formats is essential, and that's just what XML does for us. The first steps are the same as in the previous example: loading the library, parsing the tree, and extracting the root.

> doc = xmlTreeParse('counties.gml')
> root = xmlRoot(doc)

To see what’s in the root, we can get a table of the names found there:

> table(names(root))

state
   51

Let's extract the first state node for further study:

> onestate = root[[1]]
> class(onestate)
[1] "XMLNode"          "RXMLAbstractNode" "XMLAbstractNode"  "oldClass"
> table(names(onestate))

county   name
    67      1

Here's what the onestate object looks like – I've truncated it so that it only displays a single county, but we can still see the general structure:

<state>
 <gml:name abbreviation="AL">ALABAMA</gml:name>
 <county>
  <gml:name>Autauga County</gml:name>
  <gml:location>
   <gml:coord>
    <gml:X>-86641472</gml:X>
    <gml:Y>32542207</gml:Y>
   </gml:coord>
  </gml:location>
 </county>
 ...
</state>

The name element (labeled as gml:name) is just the name of the state. We can extract them from all of the states using xmlSApply:

> statenames = xmlSApply(root,function(x)xmlValue(x[['name']]))
> head(statenames)
       state        state        state        state        state        state
   "ALABAMA"     "ALASKA"    "ARIZONA"   "ARKANSAS" "CALIFORNIA"   "COLORADO"

Note that in this example there is an attribute in the name tag, namely the state abbreviation. To access these attributes we can use the xmlAttrs function in a fashion similar to xmlValue:

> stateabbs = xmlSApply(root,function(x)xmlAttrs(x[['name']]))
> head(stateabbs)
state.abbreviation state.abbreviation state.abbreviation state.abbreviation
              "AL"               "AK"               "AZ"               "AR"
state.abbreviation state.abbreviation
              "CA"               "CO"

Since there was only one attribute, xmlSApply was used to extract directly into a vector. If there were multiple attributes, then xmlApply would need to be used, since it will return a list of attributes, preserving the structure of the data. To process the county data further, we need to extract just the county nodes. The xmlElementsByTagName function will extract just the elements that we want:

> counties = xmlElementsByTagName(onestate,'county')
> class(counties)
[1] "list"
> length(counties)
[1] 67

Notice that extracting the elements in this way results in a list, not an xmlNode. But what's in the list?

> onecounty = counties[[1]]
> class(onecounty)
[1] "XMLNode"          "RXMLAbstractNode" "XMLAbstractNode"  "oldClass"
> names(onecounty)
[1] "name"     "location"
> onecounty
<county>
 <gml:name>Autauga County</gml:name>
 <gml:location>
  <gml:coord>
   <gml:X>-86641472</gml:X>
   <gml:Y>32542207</gml:Y>
  </gml:coord>
 </gml:location>
</county>

The elements inside the counties list are still xmlNodes. The fact that they are contained in a list simply means that we'll use sapply or lapply to process them instead of xmlSApply or xmlApply. What we really want are the X and Y values within the coord nodes, so let's extract out those nodes from the full list of county nodes:

> coords = lapply(counties,function(x)x[['location']][['coord']])
> class(coords)
[1] "list"

> class(coords[[1]])
[1] "XMLNode"          "RXMLAbstractNode" "XMLAbstractNode"  "oldClass"
> coords[[1]]
<gml:coord>
 <gml:X>-86641472</gml:X>
 <gml:Y>32542207</gml:Y>
</gml:coord>

Since there is only one coord node within the location nodes, I extracted it directly. I could also have used xmlElementsByTagName, and used the first element of the resulting list:

coords = lapply(counties,function(x)xmlElementsByTagName(x[['location']],'coord')[[1]])

Notice that I used lapply to extract the coord nodes. Since xmlNodes are represented internally as lists, I would have lost the structure of the data if I used sapply in this case. Now we can extract the x and y values using xmlValue:

> x = as.numeric(sapply(coords,function(x)xmlValue(x[['X']])))
> y = as.numeric(sapply(coords,function(x)xmlValue(x[['Y']])))

That shows the process for extracting the county names and x- and y-coordinates for a single state. Let's summarize the steps in a function, which we can then apply to the root of the tree to get a separate list for each state:

onestate = function(state){
   counties = xmlElementsByTagName(state,'county')
   countynames = sapply(counties,function(x)xmlValue(x[['name']]))
   coords = lapply(counties,function(x)x[['location']][['coord']])
   x = as.numeric(sapply(coords,function(x)xmlValue(x[['X']])))
   y = as.numeric(sapply(coords,function(x)xmlValue(x[['Y']])))
   data.frame(county=countynames,x=x,y=y)
}

To combine everything together, and create a list with one data frame per state, we can do the following:

> res = xmlApply(root,onestate)
> names(res) = xmlSApply(root,function(x)xmlValue(x[['name']]))
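The result can then be treated like any other list of data frames. For example (a hypothetical usage sketch, output not shown), we could look at the first few counties of one state, or count the counties in each state:

> head(res$ALABAMA)      # county names and center coordinates for one state
> sapply(res,nrow)       # number of counties in each state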

Although these examples may seem simplistic, there are usable XML formats that are this simple, or even simpler. For example, at the web site http://www.weather.gov/data/current_obs/ there is a description of an XML format used to distribute weather information. Basically, the XML document is available at http://www.weather.gov/data/current_obs/XXXX.xml, where XXXX represents the four letter weather station code, usually related to a nearby airport. A quick look at the data reveals that the format is very simple:

> library(XML) > doc = xmlTreeParse(’http://www.weather.gov/data/current_obs/KOAK.xml’) > root = xmlRoot(doc) > names(root) [1] "credit" "credit_URL" [3] "image" "suggested_pickup" [5] "suggested_pickup_period" "location" [7] "station_id" "latitude" [9] "longitude" "observation_time" [11] "observation_time_rfc822" "weather" [13] "temperature_string" "temp_f" [15] "temp_c" "relative_humidity" [17] "wind_string" "wind_dir" [19] "wind_degrees" "wind_mph" [21] "wind_gust_mph" "pressure_string" [23] "pressure_mb" "pressure_in" [25] "dewpoint_string" "dewpoint_f" [27] "dewpoint_c" "heat_index_string" [29] "heat_index_f" "heat_index_c" [31] "windchill_string" "windchill_f" [33] "windchill_c" "visibility_mi" [35] "icon_url_base" "icon_url_name" [37] "two_day_history_url" "ob_url" [39] "disclaimer_url" "copyright_url" [41] "privacy_policy_url"

Essentially, all the information is in the root node, so it can be extracted with xmlValue:

> xmlValue(root[[’temp_f’]]) [1] "49.0" > xmlValue(root[[’wind_mph’]]) [1] "12.7"

To make this easy to use, we can write a function that will allow us to specify a location and some variables that we want information about:

getweather = function(loc=’KOAK’,vars=’temp_f’){ require(XML) url = paste(’http://www.weather.gov/data/current_obs/’,loc,’.xml’,sep=’’) doc = xmlTreeParse(url) root = xmlRoot(doc) sapply(vars,function(x)xmlValue(root[[x]])) }

Let’s check to make sure it works with the defaults:

> getweather() temp_f "49.0"

That seems to be ok. To make it even more useful, we can create a vector of station names, and use sapply to find weather information for each station:

> result = sapply(c(’KOAK’,’KACV’,’KSDM’),getweather,var=c(’temp_f’,’wind_mph’,’wind_dir’,’relative_humidity’)) > data.frame(t(result)) temp_f wind_mph wind_dir relative_humidity KOAK 49.0 12.7 South 93 KACV 45.0 6.9 South 86 KSDM 66.0 6.9 Northwest 42 sapply has properly labeled all the stations and variables. If information was recorded at different times, the observation_time_rfc822 variable could be converted to an R POSIXct object as follows:

> as.POSIXct(getweather(’KOAK’,’observation_time_rfc822’),format=’%a, %d %b %Y %H:%M:%S %z’) [1] "2011-03-18 14:53:00 PDT"
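When querying many stations, some of them are occasionally unavailable and the parse step will stop with an error. One possible safeguard (a sketch only; the safeweather name and the NA fallback are not part of the original function) is to wrap the call in base R’s tryCatch:

safeweather = function(loc,vars=’temp_f’){
    # return NAs instead of stopping if the station can’t be read
    tryCatch(getweather(loc,vars),
             error=function(e)rep(NA,length(vars)))
}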

Chapter 9

Programming

9.1 Operating on Groups of Data

We’ve already seen how the aggregate function let’s us calculate data summaries for groups in our data. For example, using the world data set, we could find the average value of each variable, broken down by continent: > world = read.csv(’http://www.stat.berkeley.edu/classes/s133/data/world2.txt’,na.strings=’.’,comment=’#’) > > names(world) [1] "country" "cont" "gdp" "income" "literacy" "military" "phys" [8] "un.date" > aggregate(world[,-c(1,2,8)],world[’cont’],mean,na.rm=TRUE) cont gdp income literacy military phys 1 AF 2723.404 3901.191 60.52979 356440000 26.31129 2 AS 7778.049 8868.098 84.25122 5006536341 161.83381 3 EU 19711.765 21314.324 98.40294 6311138235 311.18482 4 NA 8946.667 10379.143 85.52000 25919931267 187.08178 5 OC 14625.000 15547.500 87.50000 4462475000 127.63105 6 SA 6283.333 6673.083 92.29167 2137341667 154.81088 Since the aggregate function requires the grouping variables to be in a list, I took advantage of the fact that data frames are also lists, and passed a single element (using single brackets) to aggregate. Alternatively, I could have used a list directly: > aggregate(world[,-c(1,2,8)],list(world$cont),mean,na.rm=TRUE) Group.1 gdp income literacy military phys 1 AF 2723.404 3901.191 60.52979 356440000 26.31129 2 AS 7778.049 8868.098 84.25122 5006536341 161.83381 3 EU 19711.765 21314.324 98.40294 6311138235 311.18482 4 NA 8946.667 10379.143 85.52000 25919931267 187.08178 5 OC 14625.000 15547.500 87.50000 4462475000 127.63105 6 SA 6283.333 6673.083 92.29167 2137341667 154.81088 I would need to name the list element in order for its column in the output to be properly labeled. Now suppose I’d like to calculate both the mean and the median for each continent: > answer = aggregate(world[,-c(1,2,8)],world[’cont’],function(x)c(mean(x,na.rm=TRUE), + median(x,na.rm=TRUE))) > answer cont gdp.1 gdp.2 income.1 income.2 literacy.1 literacy.2 1 AF 2723.404 1400.000 3901.191 1813.000 60.52979 65.50000 2 AS 7778.049 4300.000 8868.098 4561.000 84.25122 89.90000 3 EU 19711.765 18150.000 21314.324 19804.000 98.40294 99.70000 4 NA 8946.667 4900.000 NA NA 85.52000 87.70000 5 OC 14625.000 13700.000 15547.500 14970.000 87.50000 96.40000

190 6 SA 6283.333 5000.000 6673.083 5563.500 92.29167 92.30000 military.1 military.2 phys.1 phys.2 1 356440000 101300000 26.31129 8.68680 2 5006536341 650000000 161.83381 139.92540 3 6311138235 1288500000 311.18482 317.88580 4 25919931267 147000000 187.08178 164.40130 5 4462475000 591500000 127.63105 128.11915 6 2137341667 742150000 154.81088 126.00035

The printed output appears to contain eleven columns (the continent plus a mean and a median for each of the five variables), but when we look at the dimensions of the result, things seem strange:

> dim(answer) [1] 6 6

That’s because aggregate stores each set of the two values returned from the function we passed to it in matrices inside the answer data frame:

> sapply(answer,class) cont gdp income literacy military phys "character" "matrix" "matrix" "matrix" "matrix" "matrix" > names(answer) [1] "cont" "gdp" "income" "literacy" "military" "phys" > answer[,2] [,1] [,2] [1,] 2723.404 1400 [2,] 7778.049 4300 [3,] 19711.765 18150 [4,] 8946.667 4900 [5,] 14625.000 13700 [6,] 6283.333 5000

This actually makes it pretty easy to access the values that we want. For example, suppose we want the mean for the variable military. Since the mean was the first element returned by the function we passed to aggregate, it’s the first column in the military matrix:

> data.frame(answer$cont,military.mean=answer$military[,1]) answer.cont military.mean 1 AF 356440000 2 AS 5006536341 3 EU 6311138235 4 NA 25919931267 5 OC 4462475000 6 SA 2137341667

191 The output from aggregate is often easier to use if we give names to the values calculated by the function we pass to aggregate. For example, if we call aggregate like this: > answer = aggregate(world[,-c(1,2,8)],world[’cont’],function(x)c(mean=mean(x,na.rm=TRUE), + median=median(x,na.rm=TRUE))) > answer cont gdp.mean gdp.median income.mean income.median literacy.mean 1 AF 2723.404 1400.000 3901.191 1813.000 60.52979 2 AS 7778.049 4300.000 8868.098 4561.000 84.25122 3 EU 19711.765 18150.000 21314.324 19804.000 98.40294 4 NA 8946.667 4900.000 10379.143 6753.000 85.52000 5 OC 14625.000 13700.000 15547.500 14970.000 87.50000 6 SA 6283.333 5000.000 6673.083 5563.500 92.29167 literacy.median military.mean military.median phys.mean phys.median 1 65.50000 356440000 101300000 26.31129 8.68680 2 89.90000 5006536341 650000000 161.83381 139.92540 3 99.70000 6311138235 1288500000 311.18482 317.88580 4 87.70000 25919931267 147000000 187.08178 164.40130 5 96.40000 4462475000 591500000 127.63105 128.11915 6 92.30000 2137341667 742150000 154.81088 126.00035 Now we can say, for example: > data.frame(cont=answer$cont,literacy.median=answer$literacy[,’median’]) cont literacy.median 1 AF 65.5 2 AS 89.9 3 EU 99.7 4 NA 87.7 5 OC 96.4 6 SA 92.3 Another way to manage the output from aggregate takes advantage of the fact that when the data.frame function is called with multiple data frames, it doesn’t store the data frames as separate columns, but rather makes one big data frame with lots of columns. So if we convert each of the matrices to data frames we can combine all of them into one big data frame: > answer1 = data.frame(answer[,1],data.frame(answer[,2]),data.frame(answer[,3]), + data.frame(answer[,4]),data.frame(answer[,5]),data.frame(answer[,6])) > answer1 answer...1. mean median mean.1 median.1 mean.2 median.2 mean.3 1 AF 2723.404 1400 3901.191 1813.0 60.52979 65.5 356440000 2 AS 7778.049 4300 8868.098 4561.0 84.25122 89.9 5006536341 3 EU 19711.765 18150 21314.324 19804.0 98.40294 99.7 6311138235

192 4 NA 8946.667 4900 10379.143 6753.0 85.52000 87.7 25919931267 5 OC 14625.000 13700 15547.500 14970.0 87.50000 96.4 4462475000 6 SA 6283.333 5000 6673.083 5563.5 92.29167 92.3 2137341667 median.3 mean.4 median.4 1 101300000 26.31129 8.6868 2 650000000 161.83381 139.9254 3 1288500000 311.18482 317.8858 4 147000000 187.08178 164.4013 5 591500000 127.63105 128.1191 6 742150000 154.81088 126.0003 > dim(answer1) [1] 6 11

We’d need to rename the columns, but there are now actually eleven columns as we would expect. Of course, we wouldn’t want to do all that typing in practice; we would use some of R’s powerful tools to automate the process. For example, we could try to use sapply to convert all the matrices to data frames:

> sapply(answer[,-1],data.frame) gdp income literacy military phys mean Numeric,6 Numeric,6 Numeric,6 Numeric,6 Numeric,6 median Numeric,6 Numeric,6 Numeric,6 Numeric,6 Numeric,6

As we’ve seen before, this is usually an indication that sapply has simplified things too much. If we try lapply, this is what we get:

> lapply(answer[,-1],data.frame) $gdp mean median 1 2723.404 1400 2 7778.049 4300 3 19711.765 18150 4 8946.667 4900 5 14625.000 13700 6 6283.333 5000

$income mean median 1 3901.191 1813.0 2 8868.098 4561.0 3 21314.324 19804.0 4 10379.143 6753.0 5 15547.500 14970.0 6 6673.083 5563.5

193 We can convert the list to a data frame as follows: > answer1= data.frame(answer[1],lapply(answer[,-1],data.frame)) > answer1 cont gdp.mean gdp.median income.mean income.median literacy.mean 1 AF 2723.404 1400 3901.191 1813.0 60.52979 2 AS 7778.049 4300 8868.098 4561.0 84.25122 3 EU 19711.765 18150 21314.324 19804.0 98.40294 4 NA 8946.667 4900 10379.143 6753.0 85.52000 5 OC 14625.000 13700 15547.500 14970.0 87.50000 6 SA 6283.333 5000 6673.083 5563.5 92.29167 literacy.median military.mean military.median phys.mean phys.median 1 65.5 356440000 101300000 26.31129 8.6868 2 89.9 5006536341 650000000 161.83381 139.9254 3 99.7 6311138235 1288500000 311.18482 317.8858 4 87.7 25919931267 147000000 187.08178 164.4013 5 96.4 4462475000 591500000 127.63105 128.1191 6 92.3 2137341667 742150000 154.81088 126.0003 As is often the case, R takes care of the names more effectively than if we labouriously combined everything by hand. All that would be left to make this a truly useful display is to add the number of countries in each continent. Since this will be the same for each variable, it’s usually not a good idea to return this value in the function you pass to aggregate. Instead, we can use the table, as.data.frame, and merge functions as follows: > counts = as.data.frame(table(world$cont)) > names(counts) = c(’cont’,’N’) > merge(counts,answer1) cont N gdp.mean gdp.median income.mean income.median literacy.mean 1 AF 47 2723.404 1400 3901.191 1813.0 60.52979 2 AS 41 7778.049 4300 8868.098 4561.0 84.25122 3 EU 34 19711.765 18150 21314.324 19804.0 98.40294 4 NA 15 8946.667 4900 10379.143 6753.0 85.52000 5 OC 4 14625.000 13700 15547.500 14970.0 87.50000 6 SA 12 6283.333 5000 6673.083 5563.5 92.29167 literacy.median military.mean military.median phys.mean phys.median 1 65.5 356440000 101300000 26.31129 8.6868 2 89.9 5006536341 650000000 161.83381 139.9254 3 99.7 6311138235 1288500000 311.18482 317.8858 4 87.7 25919931267 147000000 187.08178 164.4013 5 96.4 4462475000 591500000 127.63105 128.1191 6 92.3 2137341667 742150000 154.81088 126.0003 Even though aggregate can handle cases with more than one value being calculated, it still can only work with one variable at a time. There are two approaches to solve the cases

194 where you need to work with more than one variable. The first, uses the split function that we’ve seen for making boxplots. Suppose we want to find the two variables with the highest correlation within each continent. First, we can write a function that will find the variables for with the highest correlation in a data frame: hicorr = function(x){ x = x[,sapply(x,function(col)mode(col) == ’numeric’ & class(col) != ’factor’)] cc = cor(x,use=’complete.obs’) diag(cc) = 0 wh = which(cc == max(cc),arr.ind=TRUE) c(rownames(cc)[wh[1]],colnames(cc)[wh[2]]) } Now we can use sapply and split to find the variables with the highest correlation in each continent: > as.data.frame(t(sapply(split(world,world$cont),hicorr))) V1 V2 AF phys military AS income gdp EU income gdp NA income gdp OC income gdp SA income gdp The other way to solve problems like this is with the by function. The by function produces nicely formatted output, but its output can be manipulated if necessary. Here’s the basic approach for a problem like this. > byvals = by(world,world$cont,hicorr) > byvals world$cont: AF [1] "phys" "military" ------world$cont: AS [1] "income" "gdp" ------world$cont: EU [1] "income" "gdp" ------world$cont: NA [1] "income" "gdp" ------world$cont: OC [1] "income" "gdp"

------ world$cont: SA [1] "income" "gdp"

Since each piece of the byvals object looks like part of a data frame we want, we can use do.call with rbind:

> ans = do.call(rbind,byvals) > ans [,1] [,2] AF "phys" "military" AS "income" "gdp" EU "income" "gdp" NA "income" "gdp" OC "income" "gdp" SA "income" "gdp"
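Since the result is a character matrix whose row names are the continents, it can be converted to a data frame if that form is more convenient; the column names below are just placeholders:

> data.frame(cont=rownames(ans),var1=ans[,1],var2=ans[,2],row.names=NULL)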

As another example of do.call, consider the task of sorting a data frame by its first column, then using the second column to break any ties, and so on for each of the columns of the data frame. For example:

> x = data.frame(matrix(sample(1:5,100,replace=TRUE),ncol=5)) > head(x) X1 X2 X3 X4 X5 1 4 4 1 3 4 2 2 4 2 4 1 3 1 1 1 3 2 4 4 3 3 2 2 5 5 2 2 4 3 6 4 2 5 4 5 Since a data frame is a list, with each element representing a column, we can order the data frame as follows:

> x[do.call(order,x),] X1 X2 X3 X4 X5 3 1 1 1 3 2 17 1 2 3 5 5 10 1 4 5 1 5 20 2 4 2 1 1 2 2 4 2 4 1 11 3 1 4 4 2 16 3 4 1 1 3 8 4 2 1 1 1 14 4 2 2 3 5

18 4 2 3 4 2 13 4 2 5 1 1 6 4 2 5 4 5 7 4 3 1 4 2 4 4 3 3 2 2 1 4 4 1 3 4 9 4 5 3 1 2 5 5 2 2 4 3 12 5 3 4 2 4 15 5 3 5 1 5 19 5 4 5 1 3
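The same trick works when we only want to sort by some of the columns, because a subset of a data frame is still a list; for example, to sort by X2 and then break ties with X3 (an arbitrary choice of columns, just to illustrate):

> x[do.call(order,x[c(’X2’,’X3’)]),]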

Another interesting R function is Reduce. This function takes a binary function, that is, one which accepts two arguments, along with a list or vector, and repeatedly applies the binary function to the elements until they are reduced to a single answer. One of the most important uses of Reduce is to merge more than two data frames at a time. Suppose we have three data frames that we wish to merge:

> one = data.frame(id=sample(LETTERS[1:10]),a=rnorm(10)) > two = data.frame(id=sample(LETTERS[1:10]),b=rnorm(10)) > three = data.frame(id=sample(LETTERS[1:10]),c=rnorm(10))

Of course, we could call merge twice:

> both = merge(one,two) > both = merge(both,three) But what if we had 20 such data frames:

> twenty = lapply(letters[1:20],function(i){ + z = data.frame(id=sample(LETTERS[1:10]),rnorm(10)) + names(z)[2] = i + z}) > twenty[[1]] id a 1 D 1.9112203 2 F -0.7198699 3 J -0.4290842 4 E 0.7849467 5 B 1.5758446 6 I 0.4998762 7 H 0.3568294 8 A -0.8567896 9 G -0.7182695 10 C -0.6894396

197 > twenty[[2]] id b 1 C 2.03416915 2 J 0.69535618 3 E -0.66387813 4 H -0.25343952 5 D -1.04782198 6 G 1.56147090 7 B -0.28126409 8 F -0.30900749 9 I 0.52277648 10 A 0.03841634

We simply pass Reduce the merge function and the list of data frames, and it does the rest:

> all = Reduce(merge,twenty) id a b c d e f 1 A -0.8567896 0.03841634 1.1137667 -0.6139491 -1.2893453 0.5760507 2 B 1.5758446 -0.28126409 0.2777182 -0.1615362 0.8221153 -0.3328126 3 C -0.6894396 2.03416915 0.6990830 0.9430438 -0.7059770 -0.1309771 4 D 1.9112203 -1.04782198 -0.1040303 0.2433795 0.7645007 -0.3161749 5 E 0.7849467 -0.66387813 0.0683588 -0.5752617 -0.4059950 -1.2280617 6 F -0.7198699 -0.30900749 -1.0955231 -0.8593922 0.1585112 0.5434146 g h i j k l m 1 1.1469512 1.8675396 -1.0730001 -0.6306937 2.5853058 0.2524264 0.67496861 2 0.3483675 0.6846907 -1.7896035 0.7710611 1.9890408 0.6174135 0.16124957 3 -0.9972951 -0.7140730 0.7919375 0.3678690 1.2336345 -0.2323708 -0.52465997 4 0.1179205 -0.1865189 -1.0364221 0.6013162 1.3426701 -0.3513651 0.48844598 5 -0.1171988 -0.1514659 0.2650891 0.1813734 0.9233442 -1.7945477 -0.07855065 6 -0.2897749 -0.5888773 1.8274135 -0.3195992 0.6230858 -0.4897521 -0.49559174 n o p q r s t 1 0.5551094 1.6335976 -1.6149968 -0.0601484 0.1622304 0.1624207 -1.1408366 2 1.3832478 -0.8993505 -0.3657969 -0.2817061 0.2501859 -0.2096964 -0.5082768 3 -0.7369804 0.2675346 -0.3801290 2.2369038 1.7701125 0.6809073 -1.1597869 4 1.4028788 -0.9317347 0.1623700 -2.8401085 -2.0066582 -0.5500940 1.9968541 5 -0.2516113 -1.1269027 2.2224559 -0.2109566 1.6218192 -0.7154724 1.8204627 6 2.0342595 -0.1864191 -0.7241448 1.4295025 -2.1206024 0.4883462 -0.1468977
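Because Reduce will accept any binary function, we’re not limited to merge’s defaults. For instance, if the data frames didn’t all contain exactly the same set of ids, an anonymous function could be passed so that unmatched rows are kept rather than dropped; this is a sketch of the idea rather than something the example above needs:

> all2 = Reduce(function(x,y)merge(x,y,all=TRUE),twenty)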

Another useful R function is reshape. This function reorganizes data in either the so-called wide or long format. In the wide format, all observations for a single individual are recorded in one row of a data frame, while in the long format, each observation is a separate row, with an additional variable to identify the observation. For example, here is a data set, in long format, representing test scores on several different observations:

> data = data.frame(id=rep(1:3,rep(3,3)),time=rep(1:3,3),x=rnorm(9)) > data id time x 1 1 1 -0.85519374 2 1 2 -0.60227681 3 1 3 0.47671168 4 2 1 0.28097796 5 2 2 -2.20903570 6 2 3 0.08311228 7 3 1 -0.56753562 8 3 2 0.65588709 9 3 3 0.42419294 To convert to wide format, the following statements could be used:

> reshape(data,idvar=’id’,timevar=’time’,v.names=’x’,direction=’wide’) id x.1 x.2 x.3 1 1 -0.8551937 -0.6022768 0.47671168 4 2 0.2809780 -2.2090357 0.08311228 7 3 -0.5675356 0.6558871 0.42419294 Notice that not all possible combinations of observations need to be present:

> data1 = data[-7,] > reshape(data1,idvar=’id’,timevar=’time’,v.names=’x’,direction=’wide’) id x.1 x.2 x.3 1 1 -0.8551937 -0.6022768 0.47671168 4 2 0.2809780 -2.2090357 0.08311228 8 3 NA 0.6558871 0.42419294

The missing observations are represented as NA in the output data frame. To illustrate the wide-to-long transformation, consider the following data:

> data = data.frame(id = rep(1:3),x1=c(1,2,3),x2=c(10,20,30),x3=c(100,200,300)) > data id x1 x2 x3 1 1 1 10 100 2 2 2 20 200 3 3 3 30 300 > reshape(data,idvar=’id’,varying=list(c(’x1’,’x2’,’x3’)),direction=’long’) id time x1 1.1 1 1 1 2.1 2 1 2 3.1 3 1 3 1.2 1 2 10

2.2 2 2 20 3.2 3 2 30 1.3 1 3 100 2.3 2 3 200 3.3 3 3 300

The trickiest part is noticing that the names of the three varying variables have to be placed in a list. As another example of the use of reshape, consider a study looking at the BMI (Body Mass Index) of girls over time, measured for two different races. This data can be found in a tab-delimited file at http://www.stat.berkeley.edu/classes/s133/data/bmi.tab. The variables in the file are ID, RACE, BMI, and VISIT. Suppose we wish to produce a plot of the average BMI versus visit, with separate lines for the two races. We could use xyplot from the lattice package as follows:

> bmi = read.delim(’http://www.stat.berkeley.edu/classes/s133/data/bmi.tab’) > avgbmi = aggregate(list(BMI=bmi$BMI),list(Race=bmi$RACE,Visit=bmi$VISIT), + mean,na.rm=TRUE) > library(lattice) > xyplot(BMI~Visit,group=Race,type=’l’,auto.key=TRUE, + main=’BMI vs. Visit’,data=avgbmi)

200 The plot appears below: Now suppose we wish to calculate the correlations between the BMIs at the different visits. The cor function expects all of the variables for which we want the correlations to be separate columns in the data frame we pass to the function. We can rearrange the data to make it suitable for calculating the correlations by calling reshape as follows: > bmi1 = reshape(bmi,idvar=c(’ID’,’RACE’),timevar=’VISIT’,v.names=’BMI’, + direction=’wide’) We can now calculate the correlations by race: > by(bmi1,bmi1$RACE,function(x)cor(x[,-c(1,2)],use=’complete.obs’)) bmi1$RACE: 1 BMI.0 BMI.2 BMI.3 BMI.4 BMI.5 BMI.6 BMI.7 BMI.0 1.0000000 0.9513181 0.9238520 0.8871666 0.8489529 0.8227992 0.8018502 BMI.2 0.9513181 1.0000000 0.9490917 0.9144459 0.8625492 0.8339581 0.8110637 BMI.3 0.9238520 0.9490917 1.0000000 0.9517413 0.9025264 0.8612542 0.8327787 BMI.4 0.8871666 0.9144459 0.9517413 1.0000000 0.9458218 0.9054674 0.8781602 BMI.5 0.8489529 0.8625492 0.9025264 0.9458218 1.0000000 0.9530917 0.9105118 BMI.6 0.8227992 0.8339581 0.8612542 0.9054674 0.9530917 1.0000000 0.9568980

201 BMI.7 0.8018502 0.8110637 0.8327787 0.8781602 0.9105118 0.9568980 1.0000000 BMI.8 0.7968433 0.8026011 0.8166798 0.8540803 0.8859762 0.9319712 0.9630340 BMI.9 0.7888784 0.7931164 0.8042629 0.8351485 0.8690846 0.9107269 0.9316619 BMI.10 0.7630397 0.7690169 0.7777677 0.8131666 0.8430350 0.8803034 0.8952306 BMI.8 BMI.9 BMI.10 BMI.0 0.7968433 0.7888784 0.7630397 BMI.2 0.8026011 0.7931164 0.7690169 BMI.3 0.8166798 0.8042629 0.7777677 BMI.4 0.8540803 0.8351485 0.8131666 BMI.5 0.8859762 0.8690846 0.8430350 BMI.6 0.9319712 0.9107269 0.8803034 BMI.7 0.9630340 0.9316619 0.8952306 BMI.8 1.0000000 0.9596225 0.9209648 BMI.9 0.9596225 1.0000000 0.9488572 BMI.10 0.9209648 0.9488572 1.0000000 ------bmi1$RACE: 2 BMI.0 BMI.2 BMI.3 BMI.4 BMI.5 BMI.6 BMI.7 BMI.0 1.0000000 0.9628273 0.9281780 0.9010607 0.8646773 0.8410384 0.8173900 BMI.2 0.9628273 1.0000000 0.9607964 0.9334043 0.8940498 0.8730059 0.8543804 BMI.3 0.9281780 0.9607964 1.0000000 0.9656133 0.9300707 0.9081140 0.8886918 BMI.4 0.9010607 0.9334043 0.9656133 1.0000000 0.9605275 0.9378850 0.9130036 BMI.5 0.8646773 0.8940498 0.9300707 0.9605275 1.0000000 0.9722481 0.9465896 BMI.6 0.8410384 0.8730059 0.9081140 0.9378850 0.9722481 1.0000000 0.9751832 BMI.7 0.8173900 0.8543804 0.8886918 0.9130036 0.9465896 0.9751832 1.0000000 BMI.8 0.8051828 0.8423513 0.8746659 0.8979719 0.9293276 0.9561339 0.9769345 BMI.9 0.8003754 0.8389726 0.8689569 0.8868970 0.9147281 0.9410288 0.9582904 BMI.10 0.7862279 0.8224567 0.8464351 0.8640517 0.8894121 0.9157546 0.9316408 BMI.8 BMI.9 BMI.10 BMI.0 0.8051828 0.8003754 0.7862279 BMI.2 0.8423513 0.8389726 0.8224567 BMI.3 0.8746659 0.8689569 0.8464351 BMI.4 0.8979719 0.8868970 0.8640517 BMI.5 0.9293276 0.9147281 0.8894121 BMI.6 0.9561339 0.9410288 0.9157546 BMI.7 0.9769345 0.9582904 0.9316408 BMI.8 1.0000000 0.9767038 0.9460700 BMI.9 0.9767038 1.0000000 0.9675484 BMI.10 0.9460700 0.9675484 1.0000000

Not surprisingly, there’s a very high correlation between the BMI at the different visits for both races.

Chapter 10

Classification Analysis

10.1 Introduction to Classification Methods

When we apply cluster analysis to a dataset, we let the values of the variables that were measured tell us if there is any structure to the observations in the data set, by choosing a suitable metric and seeing if groups of observations that are all close together can be found. If we have an auxiliary variable (like the country of origin from the cars example), it may be interesting to see if the natural clustering of the data corresponds to this variable, but it’s important to remember that the idea of clustering is just to see if any groups form naturally, not to see if we can actually figure out which group an observation belongs to based on the values of the variables that we have. When the true goal of our data analysis is to be able to predict which of several non-overlapping groups an observation belongs to, the techniques we use are known as classification techniques. We’ll take a look at three classification techniques: kth nearest neighbor classification, linear discriminant analysis, and recursive partitioning.

10.2 kth Nearest Neighbor Classification

The idea behind nearest neighbor classification is simple and somewhat intuitive – find other observations in the data that are close to an observation we’re interested, and classify that observation based on the class of its neighbors. The number of neighbors that we consider is where the ”k” comes in – usually we’ll have to look at several different values of k to determine which ones work well with a particular data set. Values in the range of one to ten are usually reasonable choices. Since we need to look at all of the distances between one observation and all the others in order to find the neighbors, it makes sense to form a distance matrix before starting a nearest neighbor classification. Each row of the distance matrix tells us the distances to all the other observations, so we need to find the k smallest values in each row of the distance matrix. Once we find those smallest values, we determine which observations they belong to, and look at how those observations were classified. We assign whichever value of the classification that was most common among the k nearest neighbors as our guess (predicted value) for the current observation, and then move on to the next row of the distance matrix. Once we’ve looked at every row of the distance matrix, we’ll have classified every observation, and can compare the predicted classification with the actual classification. To see how well we’ve done, various error rates can be examined. If the observations are classified as TRUE / FALSE, for example disease or no disease, then we can look at two types of error rates. The first type of error rate, known as Type I error, occurs when we say that an observation should be classified as TRUE when it really should have been FALSE. The other type of error (Type II) occurs when we say that an observation should be classified as FALSE when it should have been TRUE. When the classification is something other than TRUE/FALSE, we can report an overall error rate, that is, the fraction of observations for which our prediction was not correct. In either case, the error rates can be calculated in R by using the table function. As a simple example, suppose we have two vectors: actualvalues, which contains

the actual values of a classification variable, and predvalues, the values that our classification predicted:

> actualvalues = c(TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE) > predvalues = c(TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE) > tt = table(actualvalues,predvalues) > tt predvalues actualvalues FALSE TRUE FALSE 3 2 TRUE 1 4

The observations that contribute to Type I error (the actual value is false but we predicted true) can be found in the first row and second column; those that contribute to Type II error can be found in the second row and first column. Since the table function returns a matrix, we can calculate the error rates as follows:

> tot = sum(tt) > type1 = tt[’FALSE’,’TRUE’] / tot > type2 = tt[’TRUE’,’FALSE’] / tot > type1 [1] 0.2 > type2 [1] 0.1
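To make the distance-matrix description above concrete, here is a rough sketch of a kth nearest neighbor classifier written in base R. The function name and structure are just for illustration (the knn functions described in the next sections are what you’d actually use); setting the diagonal to Inf keeps an observation from being counted as its own neighbor:

myknn = function(data,cl,k=3){
    dd = as.matrix(dist(data))        # distances between all pairs of observations
    diag(dd) = Inf                    # an observation shouldn’t be its own neighbor
    apply(dd,1,function(drow){
        nbrs = cl[order(drow)[1:k]]   # classes of the k closest observations
        names(which.max(table(nbrs))) # most common class among those neighbors
    })
}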

10.3 Cross Validation

There’s one problem with the above scheme. We used the data that we’re making predictions about in the process of making those predictions. In other words, the data that we’re making predictions for is not independent of the data that we’re using to make the predictions. As might be expected, it’s been shown in practice that calculating error rates this way will almost always make our classification method look better than it should. If the data can be naturally (or even artificially) divided into two groups, then one can be used as a training set, and the other can be used as a test set – we’d calculate our error rates only from the classification of the test set using the training set to make our predictions. Many statisticians don’t like the idea of having to “hold back” some of their data when building models, so an alternative way to bring some independence to our predictions, known as v-fold cross validation, has been devised. The idea is to first divide the entire data set into v groups. To classify objects in the first group, we don’t use any of the first group to make our predictions; in the case of k-th nearest neighbor classification, that would mean that when we’re looking for the smallest distances in order to classify an observation, we don’t consider any of the distances corresponding to other members of the same group that the current one belongs to. The basic idea is that we want to make the prediction for an observation as

independent from that observation as we can. We continue through each of the v groups, classifying observations in each group using only observations from the other groups. When we’re done we’ll have a prediction for each observation, and can compare them to the actual values as in the previous example. Another example of cross-validation is leave-out-one cross-validation. With this method, we predict the classification of an observation without using the observation itself. In other words, for each observation, we perform the analysis without using that observation, and then predict where that observation would be classified using that analysis.

10.4 Linear Discriminant Analysis

One of the oldest forms of classification is known as linear discriminant analysis. The idea is to form linear combinations of predictor variables (similar to a linear regression model) in such a way that the average value of these linear combinations will be as different as possible for the different levels of the classification variable. Based on the values of the linear combinations, linear discriminant analysis reports a set of posterior probabilities for every level of the classification, for each observation, along with the level of the classification variable that the analysis predicted. Suppose we have a classification variable that can take one of three values: after a linear discriminant analysis, we will have three probabilities (adding up to one) for each observation that tell how likely it is that the observation would be categorized into each of the three categories; the predicted classification is the one that had the highest probability, and we can get insight into the quality of the classification by looking at the values of the probabilities. To study the different classification methods, we’ll use a data set about different wines. This data set contains various measures regarding chemical and other properties of the wines, along with a variable identifying the Cultivar (the particular variety of the grape from which the wine was produced). We’ll try to classify the observations based on the Cultivar, using the other variables. The data is available at http://www.stat.berkeley.edu/classes/s133/data/wine.data; information about the variables is at http://www.stat.berkeley.edu/classes/s133/data/wine.names. First, we’ll read in the wine dataset: wine = read.csv(’http://www.stat.berkeley.edu/classes/s133/data/wine.data’,header=FALSE) names(wine) = c("Cultivar", "Alcohol", "Malic.acid", "Ash", "Alkalinity.ash", "Magnesium", "Phenols", "Flavanoids", "NF.phenols", "Proanthocyanins", "Color.intensity","Hue","OD.Ratio","Proline") wine$Cultivar = factor(wine$Cultivar)

Notice that I set wine$Cultivar to be a factor. Factors are very important and useful in modeling functions because categorical variables almost always have to be treated differently than numeric variables, and turning a categorical variable into a factor will ensure that it is always used properly in modeling functions. Not surprisingly, the dependent variable for lda must be a factor.

The class library of R provides two functions for nearest neighbor classification. The first, knn, takes the approach of using a training set and a test set, so it would require holding back some of the data. The other function, knn.cv, uses leave-out-one cross-validation, so it’s more suitable to use on an entire data set. Let’s use knn.cv on the wine data set. Since, like cluster analysis, this technique is based on distances, the same considerations regarding standardization as we saw with cluster analysis apply. Let’s examine a summary for the data frame:

> summary(wine) Cultivar Alcohol Malic.acid Ash Alkalinity.ash 1:59 Min. :11.03 Min. :0.740 Min. :1.360 Min. :10.60 2:71 1st Qu.:12.36 1st Qu.:1.603 1st Qu.:2.210 1st Qu.:17.20 3:48 Median :13.05 Median :1.865 Median :2.360 Median :19.50 Mean :13.00 Mean :2.336 Mean :2.367 Mean :19.49 3rd Qu.:13.68 3rd Qu.:3.083 3rd Qu.:2.558 3rd Qu.:21.50 Max. :14.83 Max. :5.800 Max. :3.230 Max. :30.00 Magnesium Phenols Flavanoids NF.phenols Min. : 70.00 Min. :0.980 Min. :0.340 Min. :0.1300 1st Qu.: 88.00 1st Qu.:1.742 1st Qu.:1.205 1st Qu.:0.2700 Median : 98.00 Median :2.355 Median :2.135 Median :0.3400 Mean : 99.74 Mean :2.295 Mean :2.029 Mean :0.3619 3rd Qu.:107.00 3rd Qu.:2.800 3rd Qu.:2.875 3rd Qu.:0.4375 Max. :162.00 Max. :3.880 Max. :5.080 Max. :0.6600 Proanthocyanins Color.intensity Hue OD.Ratio Min. :0.410 Min. : 1.280 Min. :0.4800 Min. :1.270 1st Qu.:1.250 1st Qu.: 3.220 1st Qu.:0.7825 1st Qu.:1.938 Median :1.555 Median : 4.690 Median :0.9650 Median :2.780 Mean :1.591 Mean : 5.058 Mean :0.9574 Mean :2.612 3rd Qu.:1.950 3rd Qu.: 6.200 3rd Qu.:1.1200 3rd Qu.:3.170 Max. :3.580 Max. :13.000 Max. :1.7100 Max. :4.000 Proline Min. : 278.0 1st Qu.: 500.5 Median : 673.5 Mean : 746.9 3rd Qu.: 985.0 Max. :1680.0 Since the scale of the variables differ widely, standardization is probably a good idea. We’ll divide each variable by its standard deviation to try to give each variable more equal weight in determining the distances:

> wine.use = scale(wine[,-1],scale=apply(wine[,-1],2,sd)) > library(class)

> res = knn.cv(wine.use,wine$Cultivar,k=3) > names(res) NULL > length(res) [1] 178 Since there are no names, and the length of res is the same as the number of observations, knn.cv is simply returning the classifications that the method predicted for each observation using leave-out-one cross validation. This means we can compare the predicted values to the true values using table: > table(res,wine$Cultivar) res 1 2 3 1 59 4 0 2 0 63 0 3 0 4 48 To calculate the proportion of incorrect classifications, we can use the row and col functions. These unusual functions don’t seem to do anything very useful when we simply call them: > tt = table(res,wine$Cultivar) > row(tt) [,1] [,2] [,3] [1,] 1 1 1 [2,] 2 2 2 [3,] 3 3 3 > col(tt) [,1] [,2] [,3] [1,] 1 2 3 [2,] 1 2 3 [3,] 1 2 3 However, if you recall that the misclassified observations are those that are off the diagonal, we can find those observations as follows: > tt[row(tt) != col(tt)] [1] 0 0 4 4 0 0 and the proportion of misclassified observations can be calculated as: > sum(tt[row(tt) != col(tt)]) / sum(tt) [1] 0.04494382 or a misclassification rate of about 4.5%. Could we have done better if we used 5 nearest neighbors instead of 3?

208 > res = knn.cv(wine.use,wine$Cultivar,k=5) > tt = table(res,wine$Cultivar) > sum(tt[row(tt) != col(tt)]) / sum(tt) [1] 0.02808989

How about using just the single nearest neighbor?

> res = knn.cv(wine.use,wine$Cultivar,k=1) > tt = table(res,wine$Cultivar) > sum(tt[row(tt) != col(tt)]) / sum(tt) [1] 0.04494382

For this data set, using k=5 did slightly better than 1 or 3.
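Since the same misclassification calculation keeps coming up, it may be convenient to wrap it in a small helper function and try a whole range of k values at once; the misclass name is just a suggestion, and because knn.cv breaks ties at random the numbers will vary slightly from run to run:

> misclass = function(tt)sum(tt[row(tt) != col(tt)]) / sum(tt)
> sapply(1:10,function(k)misclass(table(knn.cv(wine.use,wine$Cultivar,k=k),
+        wine$Cultivar)))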

In R, linear discriminant analysis is provided by the lda function from the MASS library, which is part of the base R distribution. Like many modeling and analysis functions in R, lda takes a formula as its first argument. A formula in R is a way of describing a set of relationships that are being studied. The dependent variable, or the variable to be predicted, is put on the left hand side of a tilde (~) and the variables that will be used to model or predict it are placed on the right hand side of the tilde, joined together by plus signs (+). To save typing, you can provide the name of a data frame through the data= argument, and use the name of the variables in the data frame in your formula without retyping the data frame name or using the with function. A convenience offered by the modeling functions is that a period (.) on the right-hand side of the tilde in a formula is interpreted as meaning “all the other variables in the data frame, except the dependent variable”. So a very popular method of specifying a formula is to use the period, and then use subscripting to limit the data= argument to just the variables you want to fit. In this example, we don’t need to do that, because we really do want to use all the variables in the data set. > wine.lda = lda(Cultivar ~ .,data=wine) We’ll see that most of the modeling functions in R share many things in common. For example, to predict values based on a model, we pass the model object to the predict function along with a data frame containing the observations for which we want predictions: > pred = predict(wine.lda,wine) To see what’s available from the call to predict, we can look at the names of the pred object: > names(pred) [1] "class" "posterior" "x" The predicted classification is stored in the class component of the object returned by predict. Now that we’ve got the predicted classes, we can see how well the classification went by making a cross-tabulation of the real Cultivar with our prediction, using the table function: > table(wine$Cultivar,pred$class) pred$class 1 2 3 1 59 0 0 2 0 71 0 3 0 0 48 Before we get too excited about these results, remember the caution about predicting values based on models that were built using those values. The error rate we see in the table (0) is probably an overestimate of how good the classification rule is. We can use v-fold cross validation on the data, by using the lda command repeatedly to classify groups of observations (folds) using the rest of the data to build the model. We could write this out “by hand”, but it would be useful to have a function that could do this for us. Here’s such a function:

210 vlda = function(v,formula,data,cl){ require(MASS) grps = cut(1:nrow(data),v,labels=FALSE)[sample(1:nrow(data))] pred = lapply(1:v,function(i,formula,data){ omit = which(grps == i) z = lda(formula,data=data[-omit,]) predict(z,data[omit,]) },formula,data)

wh = unlist(lapply(pred,function(pp)pp$class)) table(wh,cl[order(grps)]) }

This function accepts four arguments: v, the number of folds in the cross classification, formula, which is the formula used in the linear discriminant analysis, data, which is the data frame to use, and cl, the classification variable (wine$Cultivar in this case). By using the sample function, we make sure that the groups that are used for cross-validation aren’t influenced by the ordering of the data – notice how the classification variable (cl) is indexed by order(grps) to make sure that the predicted and actual values line up properly. Applying this function to the wine data will give us a better idea of the actual error rate of the classifier:

> tt = vlda(5,Cultivar~.,wine,wine$Cultivar) > tt wh 1 2 3 1 59 1 0 2 0 69 1 3 0 1 47 While the error rate is still very good, it’s not quite perfect:

> error = sum(tt[row(tt) != col(tt)]) / sum(tt) > error [1] 0.01685393 Note that because of the way we randomly divide the observations, you’ll see a slightly different table every time you run the vlda function. We could use a similar method to apply v-fold cross-validation to the kth nearest neighbor classification. Since the knn function accepts a training set and a test set, we can make each fold a test set, using the remainder of the data as a training set. Here’s a function to apply this idea:

vknn = function(v,data,cl,k){ grps = cut(1:nrow(data),v,labels=FALSE)[sample(1:nrow(data))] pred = lapply(1:v,function(i,data,cl,k){

omit = which(grps == i) pcl = knn(data[-omit,],data[omit,],cl[-omit],k=k) },data,cl,k)

wh = unlist(pred) table(wh,cl[order(grps)]) } Let’s apply the function to the standardized wine data: > tt = vknn(5,wine.use,wine$Cultivar,5) > tt

wh 1 2 3 1 59 2 0 2 0 66 0 3 0 3 48 > sum(tt[row(tt) != col(tt)]) / sum(tt) [1] 0.02808989 Note that this is the same misclassification rate as acheived by the “leave-out-one” cross validation provided by knn.cv. Both the nearest neighbor and linear discriminant methods make it possible to classify new observations, but they don’t give much insight into what variables are important in the classification. The scaling element of the object returned by lda shows the linear combinations of the original variables that were created to distinguish between the groups: > wine.lda$scaling LD1 LD2 Alcohol -0.403399781 0.8717930699 Malic.acid 0.165254596 0.3053797325 Ash -0.369075256 2.3458497486 Alkalinity.ash 0.154797889 -0.1463807654 Magnesium -0.002163496 -0.0004627565 Phenols 0.618052068 -0.0322128171 Flavanoids -1.661191235 -0.4919980543 NF.phenols -1.495818440 -1.6309537953 Proanthocyanins 0.134092628 -0.3070875776 Color.intensity 0.355055710 0.2532306865 Hue -0.818036073 -1.5156344987 OD.Ratio -1.157559376 0.0511839665 Proline -0.002691206 0.0028529846 It’s really not that easy to interpret them, but variables with large absolute values in the scalings are more likely to influence the process. For this data, Flavanoids, NF.phenols, Ash, and Hue seem to be among the important variables.

A different way to get some insight into this would be to examine the means of each of the variables broken down by the classification variable. Variables which show a large difference among the groups would most likely be the ones that are useful in predicting which group an observation belongs in. One graphical way of displaying this information is with a barplot. To make sure we can see differences for all the variables, we’ll use the standardized version of the data:

> mns = aggregate(wine.use,wine[’Cultivar’],mean) > rownames(mns) = mns$Cultivar > mns$Cultivar = NULL > barplot(as.matrix(mns),beside=TRUE,cex.names=.8,las=2)

The las parameter rotates the labels on the x-axis so that we can see them all. Here’s the plot:

It seems like the big differences are found in Flavanoids, Hue and OD.Ratio; a rough numeric check of that impression is sketched below. Now let’s take a look at a method that makes it very clear which variables are useful in distinguishing among the groups.
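The check is just a few lines of base R using the mns object computed above: the spread (maximum minus minimum) of each column of standardized group means gives a crude ranking of how much the cultivars differ on each variable:

> sort(apply(mns,2,function(x)diff(range(x))),decreasing=TRUE)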

213 10.5 Recursive Partitioning

An alternative classification method, developed in the 1980s, has attracted a lot of attention in a variety of different fields. The technique, known as recursive partitioning or CART (Classification and Regression Trees), can be used for either classification or regression – here we’ll concentrate on its use for classification. The basic idea is to examine, for each of the variables that we’re using in our classification model, the results of splitting the data set based on one of the values of that variable, and then examining how that split helps us distinguish between the different classes of interest. For example, suppose we’re using just one variable to help distinguish between two classes:

> mydata = data.frame(x=c(7,12,8,3,4,9,2,19),grp=c(1,2,1,1,1,2,2,2)) > mydata x grp 1 7 1 2 12 2 3 8 1 4 3 1 5 4 1 6 9 2 7 2 2 8 19 2

We’d consider each value of x in the data, and split the data into two groups: one where x was less than or equal to the value and the other where x is greater than the value. We then look at how our classification variable (grp in this example) breaks down when the data is split this way. For this example, we can look at all the possible cross-tabulations:

> tbls = sapply(mydata$x,function(val)table(mydata$x <= val,mydata$grp)) > names(tbls) = mydata$x > tbls $"7"

1 2 FALSE 1 3 TRUE 3 1

$"12"

1 2 FALSE 0 1

214 TRUE 4 3

$"8"

1 2 FALSE 0 3 TRUE 4 1

$"3"

1 2 FALSE 3 3 TRUE 1 1

$"4"

1 2 FALSE 2 3 TRUE 2 1

$"9"

1 2 FALSE 0 2 TRUE 4 2

$"2"

1 2 FALSE 4 3 TRUE 0 1

$"19"

1 2 TRUE 4 4

If you look at each of these tables, you can see that when we split with the rule x <= 8, we get the best separation; all three of the cases where x is greater than 8 are classified as 2, and four out of five of the cases where x is less than 8 are classified as 1. For this phase of a recursive partitioning, the rule x <= 8 would be chosen to split the data.
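One rough way to quantify “best separation” is to count, for each candidate split, how many observations would be misclassified if each side of the split were simply assigned its majority class. This is only a sketch of the idea (rpart actually uses more refined purity measures), reusing the tables we just built:

> impurity = sapply(mydata$x,function(val){
+     tt = table(mydata$x <= val,mydata$grp)
+     sum(tt) - sum(apply(tt,1,max))   # observations not in their side’s majority class
+ })
> names(impurity) = mydata$x

The smallest count occurs for the split at x <= 8, agreeing with the table-by-table inspection above.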

In real life situations, where there will always be more than one variable, the process would be repeated for each of the variables, and the chosen split would be the one from among all the variables that resulted in the best separation, or node purity as it is sometimes known. Now that we’ve found the single best split, our data is divided into two groups based on that split. Here’s where the recursive part comes in: we keep doing the same thing to each side of the split data, searching through all values of all variables until we find the one that gives us the best separation, splitting the data by that value and then continuing. As implemented in R through the rpart function in the rpart library, cross validation is used internally to determine when we should stop splitting the data, and present a final tree as the output. There are also options to the rpart function specifying the minimum number of observations allowed in a split, the minimum number of observations allowed in one of the final nodes and the maximum number of splits allowed, which can be set through the control= argument to rpart. See the help page for rpart.control for more information. To show how information from recursive partitioning is displayed, we’ll use the same data that we used with lda. Here are the commands to run the analysis:

> library(rpart) > wine.rpart = rpart(Cultivar~.,data=wine) > wine.rpart n= 178 node), split, n, loss, yval, (yprob) * denotes terminal node

1) root 178 107 2 (0.33146067 0.39887640 0.26966292) 2) Proline>=755 67 10 1 (0.85074627 0.05970149 0.08955224) 4) Flavanoids>=2.165 59 2 1 (0.96610169 0.03389831 0.00000000) * 5) Flavanoids< 2.165 8 2 3 (0.00000000 0.25000000 0.75000000) * 3) Proline< 755 111 44 2 (0.01801802 0.60360360 0.37837838) 6) OD.Ratio>=2.115 65 4 2 (0.03076923 0.93846154 0.03076923) * 7) OD.Ratio< 2.115 46 6 3 (0.00000000 0.13043478 0.86956522) 14) Hue>=0.9 7 2 2 (0.00000000 0.71428571 0.28571429) * 15) Hue< 0.9 39 1 3 (0.00000000 0.02564103 0.97435897) *

The display always starts at the root (all of the data), and reports the splits in the order they occurred. So after examining all possible values of all the variables, rpart found that the Proline variable did the best job of dividing up the observations into the different cultivars; in particular of the 67 observations for which Proline was greater than 755, the fraction that had Cultivar == 1 was .8507. Since there’s no asterisk after that line, it indicates that rpart could do a better job by considering other variables. In particular if Proline was >= 755, and Flavanoids was >= 2.165, the fraction of Cultivar 1 increases to .9661; the asterisk at the end of a line indicates that this is a terminal node, and, when classifying an observation if Proline was >= 755 and Flavanoids was >= 2.165, rpart

would immediately assign it to Cultivar 1 without having to consider any other variables. The rest of the output can be determined in a similar fashion. An alternative way of viewing the output from rpart is a tree diagram, available through the plot function. In order to identify the parts of the diagram, the text function also needs to be called.

> plot(wine.rpart) > text(wine.rpart,use.n=TRUE,xpd=TRUE)

The xpd=TRUE is a graphics parameter that is useful when a plot gets truncated, as sometimes happens with rpart plots. There are other options to plot and text which will change the appearance of the output; you can find out more by looking at the help pages for plot.rpart and text.rpart. The graph appears below.

In order to calculate the error rate for an rpart analysis, we once again use the predict function:

> pred.rpart = predict(wine.rpart,wine)

As usual we can check for names to see how to access the predicted values:

> names(pred.rpart) NULL

Since there are no names, we’ll examine the object directly:

> head(pred.rpart) 1 2 3

1 0.96610169 0.03389831 0.00000000 2 0.96610169 0.03389831 0.00000000 3 0.96610169 0.03389831 0.00000000 4 0.96610169 0.03389831 0.00000000 5 0.03076923 0.93846154 0.03076923 6 0.96610169 0.03389831 0.00000000

All that predict returns in this case is a matrix with estimated probabilities for each cultivar for each observation; we’ll need to find which one is highest for each observation. Doing it for one row is easy:

> which.max(pred.rpart[1,]) 1 1

To repeat this for each row, we can pass which.max to apply, with a second argument of 1 to indicate we want to apply the function to each row:

> tt = table(apply(pred.rpart,1,which.max),wine$Cultivar) > tt

1 2 3 1 57 2 0 2 2 66 4 3 0 3 44

To compare this to other methods, we can again use the row and col functions:

> sum(tt[row(tt) != col(tt)]) / sum(tt) [1] 0.06179775

Since rpart uses cross validation internally to build its decision rules, we can probably trust the error rates implied by the table.
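The cross-validated error estimates that rpart computed while building the tree can also be inspected directly with the rpart package’s printcp function, which reports them for each size of tree that was considered:

> printcp(wine.rpart)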

Chapter 11

Random Numbers and Simulations

11.1 Hypothesis Testing

Much of classical statistics is concerned with the idea of hypothesis testing. This is a formal framework that we can use to pose questions about a variety of topics in a consistent form that lets us apply statistical techniques to make statements about how results that we’ve gathered relate to questions that we’re interested in. If we carefully follow the rules of hypothesis testing, then we can confidently talk about the probability of events that we’ve observed under a variety of hypotheses, and put our faith in those hypotheses that seem the most reasonable. Central to the idea of hypothesis testing is the notion of null and alternative hypotheses. We want to present our question in the form of a statement that assumes that nothing has happened (the null hypothesis) versus a statement that describes a specific relation or condition that might better explain our data. For example, suppose we’ve got a coin, and we want to find out if it’s true, that is, whether, when we flip the coin, we are as likely to see heads as we are to see tails. A null hypothesis for this situation could be

H0: We’re just as likely to get heads as tails when we flip the coin. A suitable alternative hypothesis might be:

Ha: We’re more likely to see either heads or tails when we flip the coin. An alternative hypothesis such as this is called two-sided, since we’ll reject the null hypothesis if heads are more likely or if tails are more likely. We could use a one-sided alternative hypothesis like ”We’re more likely to see tails than heads,” but unless you’ve got a good reason to believe that it’s absolutely impossible to see more heads than tails, we usually stick to two-sided alternative hypotheses. Now we need to perform an experiment in order to help test our hypothesis. Obviously, tossing the coin once or twice and counting up the results won’t help very much. Suppose I say that in order to test the null hypothesis that heads are just as likely as tails, I’m going to toss the coin 100 times and record the results. Carrying out this experiment, I find that I saw 55 heads and 45 tails. Can I safely say that the coin is true? Without some further guidelines, it would be very hard to say if this deviation from 50/50 really provides much evidence one way or the other. To proceed any further, we have to have some notion of what we’d expect to see in the long run if the null hypothesis was true. Then, we can come up with some rule to use in deciding if the coin is true or not, depending on how willing we are to make a mistake. Sadly, we can’t make statements with complete certainty, because there’s always a chance that, even if the coin was true, we’d happen to see more heads than tails, or, conversely, if the coin was weighted, we might just happen to see nearly equal numbers of heads and tails when we tossed the coin many times. The way we come up with our rule is by stating some assumptions about what we’d expect to see if the null hypothesis was true, and then making a decision rule based on those assumptions. For tossing a fair coin (which is what the null hypothesis states), most statisticians agree that the number of heads (or tails) that we would expect follows what is called a binomial distribution. This distribution takes

two parameters: the theoretical probability of the event in question (let’s say the event of getting a head when we toss the coin), and the number of times we toss the coin. Under the null hypothesis, the probability is .5. In R, we can see the expected probability of getting any particular number of heads if we tossed a coin 100 times by plotting the density function for the binomial distribution with parameters 100 and .5: > barplot(dbinom(1:100,100,.5),names.arg=1:100)

As you’d probably expect, the most common value we’d see would be 50, but there are lots of reasonable values that aren’t exactly 50 that would also occur pretty often. So to make a decision rule for when we will reject the null hypothesis, we would first decide how often we’re willing to reject the null hypothesis when it was actually true. For example, looking at the graph, there’s a tiny probability that we’d see as few as 30 heads or as many as 70 heads when we tossed a true coin 100 times. But if we tossed a coin 100 times and only saw 30 heads, we’d be wise if we said the coin wasn’t true because 30 heads out of 100 is such a rare event. On the other hand, we can see on the graph that the probability of getting, say 45 heads is still pretty high, and we might very well want to accept the null hypothesis in this case. Without any compelling reason to choose otherwise, people usually will accept an error of 5% when performing a hypothesis test, although you might want to alter that value in situations when making such an error is more or less of a problem than usual. With a two-sided hypothesis like this one, to come up with a decision rule that would be wrong only 5% of the time, we’d need to find the number of heads for which we’d see fewer heads 2.5% of the time, and the number of heads for which we’d see more heads 2.5% of the time. To find quantities like this for the binomial distribution, we can use the qbinom function. The lower limit we want is the result of calling qbinom with an argument of 0.025, and the upper limit is the result of calling qbinom with an argument of 0.975:

> qbinom(.025,100,.5) [1] 40 > qbinom(.975,100,.5) [1] 60 So in this experiment, if the number of heads we saw was between 40 and 60 (out of 100 tosses), we would tentatively accept the null hypothesis; formally we would say that there’s not enough evidence to reject the null hypothesis. If we saw fewer than 40 or more than 60 heads, we would say we had enough evidence to reject the null hypothesis, knowing that we would be wrong only 5% of the time. When we reject the null hypothesis when it was actually true, it’s said to be a Type I error. So in this example, we’re setting the Type I error rate to 5%. To summarize, the steps for performing a hypothesis test are:
1. Describe a null hypothesis and an alternative hypothesis.

2. Specify a significance or alpha level for the hypothesis test. This is the percent of the time that you’re willing to be wrong when you reject the null hypothesis.

3. Formulate some assumptions about the distribution of the statistic that’s involved in the hypothesis test. In this example we made the assumption that a fair coin follows a binomial distribution with p=0.50.

4. Using the assumptions you made and the alpha level you decided on, construct the rejection region for the test, that is, the values of the statistic for which you’ll be willing to reject the null hypothesis. In this example, the rejection region is broken up into two sections: less than 40 heads and more than 60 heads. Although it’s not always presented in this way, one useful way to think about hypothesis testing is that if you follow the above steps, and your test statistic is in the rejection region then either the null hypothesis is not reasonable or your assumptions are wrong. By following this framework, we can test a wide variety of hypotheses, and it’s always clear exactly what we’re assuming about the data. Most of hypothesis testing is focused on the null hypothesis, and on possibly rejecting it. There are two main reasons for this. First, there is only one null hypothesis, so we don’t have to consider a variety of possibilities. Second, as we’ve seen, it’s very easy to control the error rate when rejecting the null hypothesis. The other side of the coin has to do with the case where the null hypothesis is not true. This case is much more complicated because there are many different ways to reject the null hypothesis, and it would be naive to believe that they’d all behave the same way. When the null hypothesis is not true, we might mistakenly accept the null hypothesis anyway. Just as we might see an unusually high number of heads for a true coin, we might coincidentally see nearly equal numbers of heads and tails with a coin that was actually weighted. The case where the null hypothesis is not true, but we fail to reject it results in a Type II error. Type II error is usually expressed as power, which is 1 minus the probability of making an error. Power is the proportion of the time that you

correctly reject the null hypothesis when it isn’t true. Notice that the power of a hypothesis test changes depending on what the alternative is. In other words, it makes sense that if we had a coin that was so heavily weighted that it usually had a probability of 0.8 of getting heads, we’d have more power in detecting the discrepancy from the null distribution than if our coin was only weighted to give a 0.55 probability of heads. Also, unlike Type I error, we can’t set a Type II error rate before we start the analysis – the only way to change the Type II error rate is to get more samples or redesign the experiment.
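As an aside, for the specific coin example above, base R can compute the exact two-sided p-value for the observed 55 heads in 100 tosses with the binom.test function; this is a handy check on the rejection-region reasoning, although it isn’t needed for the simulations that follow:

> binom.test(55,100,p=.5)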

11.2 Determining Power

For many years, answering questions about power required complex mathematical computations, and power was often ignored because sometimes it was too difficult or even impossible to calculate. Even when it was easy to calculate, the fact that power varies depending on the alternative hypothesis made considering power a much more difficult task than thinking about Type I error. But more recently, statisticians have come to realize that it's often a good idea to estimate power for alternative hypotheses of interest. Using computers, we can calculate the power of any alternative hypothesis we please by doing the following:

1. Follow the general hypothesis testing procedure outlined above.

2. Repeatedly generate simulated data which corresponds to an alternative hypothesis that you're interested in. For each set of data that you simulate (sometimes referred to as a trial), calculate the statistic, and decide whether to reject the null hypothesis or not.

3. When you're done with all the trials, count how often you (correctly) rejected the null hypothesis, and divide by the total number of trials to obtain the power of the experiment for that specific alternative distribution.

To generate random values from the binomial distribution, to simulate coin tosses, we can use the rbinom function. You pass this function the number of samples you'd like, the number of tosses in each sample, and the probability of heads. Before diving into the power computation, let's verify that, when the null hypothesis is true, using the rejection region we developed previously will actually result in a Type I error rate of 5%. First, let's simulate a single set of 100 tosses a few times to see if it makes sense:

> rbinom(1,100,.5)
[1] 45
> rbinom(1,100,.5)
[1] 52
> rbinom(1,100,.5)
[1] 52
> rbinom(1,100,.5)
[1] 46

That seems reasonable. Now let's generate 1000 such samples, and count how many of them had fewer than 40 heads or more than 60 heads:

> rr = rbinom(1000,100,.5); sum(rr<40|rr>60)/1000
[1] 0.042
> rr = rbinom(1000,100,.5); sum(rr<40|rr>60)/1000
[1] 0.03
> rr = rbinom(1000,100,.5); sum(rr<40|rr>60)/1000
[1] 0.043
> rr = rbinom(1000,100,.5); sum(rr<40|rr>60)/1000
[1] 0.029

The values are a little low because the binomial distribution is discrete, so we had to choose the nearest whole numbers of heads as cutoffs. To see the exact probabilities that the values 40 and 60 represent, we can use the pbinom function:

> pbinom(40,100,.5)
[1] 0.02844397
> pbinom(60,100,.5)
[1] 0.9823999
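Since the rejection region is made up of these two tails, we can also compute the exact probability of rejecting a true null hypothesis; this check isn't in the notes, but it agrees with the simulated values above:

> pbinom(39,100,.5) + (1 - pbinom(60,100,.5))    # about 0.035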

If we make the rejection region larger, the Type I error rate becomes too high:

> rr = rbinom(1000,100,.5); sum(rr<41|rr>59)/1000
[1] 0.054
> rr = rbinom(1000,100,.5); sum(rr<41|rr>59)/1000
[1] 0.064
> rr = rbinom(1000,100,.5); sum(rr<41|rr>59)/1000
[1] 0.056
> rr = rbinom(1000,100,.5); sum(rr<41|rr>59)/1000
[1] 0.068
> rr = rbinom(1000,100,.5); sum(rr<41|rr>59)/1000
[1] 0.064

So it seems that using fewer than 40 or more than 60 heads as the rejection region gives us a Type I error rate as close to 5% as is possible. Now let's ask a much more interesting question. If I had a coin that really had a 55% chance of heads instead of a 50% chance, and I tossed the coin 100 times, how often would I correctly reject the null hypothesis? In other words, what is the power of the test against the alternative that the actual chance of getting heads is 55%? I can do exactly the same thing:

> rr = rbinom(1000,100,.55); sum(rr<40|rr>60)/1000
[1] 0.139
> rr = rbinom(1000,100,.55); sum(rr<40|rr>60)/1000
[1] 0.112
> rr = rbinom(1000,100,.55); sum(rr<40|rr>60)/1000
[1] 0.134
> rr = rbinom(1000,100,.55); sum(rr<40|rr>60)/1000
[1] 0.133

We'd only correctly identify the coin as being untrue in about 13% of the cases. It's better than the 5% we'd expect when the coin was true, but not very impressive. Maybe tossing more coins would help. Let's see what happens to the power when we do 500 tosses instead of just 100. First we need to find the correct rejection region:

> qbinom(.975,500,.5)
[1] 272
> qbinom(.025,500,.5)
[1] 228

Now we can try the simulation:

> rr = rbinom(1000,500,.55); sum(rr<228|rr>272)/1000
[1] 0.595
> rr = rbinom(1000,500,.55); sum(rr<228|rr>272)/1000
[1] 0.587
> rr = rbinom(1000,500,.55); sum(rr<228|rr>272)/1000
[1] 0.577
> rr = rbinom(1000,500,.55); sum(rr<228|rr>272)/1000
[1] 0.61
> rr = rbinom(1000,500,.55); sum(rr<228|rr>272)/1000
[1] 0.611

That's quite a bit better. But suppose we wanted to find out how big a sample size we'd need to be able to detect the 55% coin 95% of the time. Probably the easiest way is to try a bunch of different numbers in the simulation. It's getting complicated enough that we should probably write a function:

> coin.power = function(ntoss=100,nsim=1000,prob=.5){
+    lower = qbinom(.025,ntoss,.5)
+    upper = qbinom(.975,ntoss,.5)
+    rr = rbinom(nsim,ntoss,prob)
+    sum(rr < lower | rr > upper) / nsim
+ }

We can use sapply to call the coin.power function for a variety of numbers of tosses:

> ntosses = c(10,100,200,500,600,800,1000,1500,2000,2500)
> res = sapply(ntosses,coin.power,prob=.55)
> names(res) = ntosses

> res
   10   100   200   500   600   800  1000  1500  2000  2500 
0.040 0.135 0.289 0.583 0.697 0.773 0.884 0.967 0.989 0.999

Intuitively, if the coin was more dishonest, we’d expect our test to have better power. Let’s try the same thing with a coin that’s weighted to give 70% heads:

> res = sapply(ntosses,coin.power,prob=.70)
> names(res) = ntosses
> res
   10   100   200   500   600   800  1000  1500  2000  2500 
0.153 0.986 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000

With just 100 tosses, we’re pretty much guaranteed to be able to recognize the coin as weighted. What if the coin was just barely weighted, say 51% instead of 50%:

> res = sapply(ntosses,coin.power,prob=.51)
> names(res) = ntosses
> res
   10   100   200   500   600   800  1000  1500  2000  2500 
0.019 0.035 0.051 0.067 0.071 0.087 0.084 0.129 0.125 0.158

Even after 2500 tosses, we’d only have about a 16% chance of detecting the weighted coin.
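Because the binomial distribution is simple enough to work with directly, these simulated powers can also be cross-checked exactly. The function below is a sketch that is not part of the notes (coin.power.exact is a made-up name); it uses the same qbinom-based rejection region as coin.power:

coin.power.exact = function(ntoss=100,prob=.5){
   lower = qbinom(.025,ntoss,.5)
   upper = qbinom(.975,ntoss,.5)
   # exact P(reject) under the alternative: P(X < lower) + P(X > upper)
   pbinom(lower - 1,ntoss,prob) + (1 - pbinom(upper,ntoss,prob))
}
round(sapply(ntosses,coin.power.exact,prob=.51),3)

The exact values should track the simulated row above quite closely.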

11.3 Probability Distributions

The naming conventions used for the binomial probability functions in the previous section apply consistently to all of the distributions supported by R. To summarize:

1. To get an overall idea of how values are distributed, we can examine (usually graphically) the probability density function for a distribution. For example, the familiar bell curve is the density function for a normal distribution. In R, density functions begin with the letter "d".

2. To find out what value corresponds to a particular probability (like 0.025 and 0.975 in the previous example), we can call the quantile function for the distribution. These functions accept an argument between 0 and 1 (i.e. a probability) and return the value for that distribution where the probability that an observation would be smaller than that value is equal to the probability passed to the function. When you perform a statistical test, choose a significance level, and look up the statistic’s value in a table, you’re actually using the quantile function. Quantile functions in R begin with the letter “q”.

3. If you've got a value for an observation that comes from a particular distribution, and you want to know the probability that observations from the distribution would be less than that value, you can pass the value to a probability function (like pbinom) and it will return the probability. In R, these functions begin with the letter "p".

4. When you want to generate random numbers from a distribution, you can call the appropriate random number generator. Random number generators in R begin with the letter "r".

The following table shows the distributions for which these four types of functions are available in R. In each case, precede the name in the table with a d, q, p, or r to get the function you want.

     beta       Beta
     binom      Binomial
     cauchy     Cauchy
     chisq      Chi-squared
     exp        Exponential
     f          F
     gamma      Gamma
     geom       Geometric
     hyper      Hypergeometric
     lnorm      Lognormal
     logis      Logistic
     nbinom     Negative Binomial
     norm       Normal
     pois       Poisson
     signrank   Signed Rank
     t          Student's t
     unif       Uniform
     weibull    Weibull
     wilcox     Wilcoxon's Rank Sum
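As a quick illustration (not from the notes) of the naming scheme, here are the four flavors for the normal distribution; the values mentioned in the comments are the familiar standard normal quantities:

> dnorm(0)        # density of the standard normal at 0, about 0.399
> pnorm(1.96)     # P(X <= 1.96), about 0.975
> qnorm(0.975)    # the 0.975 quantile, about 1.96
> rnorm(3)        # three random draws from the standard normal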

11.4 A Note about Random Numbers in R

Some people like to make a distinction between truly random numbers (like those formed from physically tossing dice or counting the number of clicks of a Geiger counter in a given time interval) and the kinds of random numbers that are generated on a computer. (If you think the second example is impractical, take a look at http://www.fourmilab.ch/hotbits/. Another source of truly random numbers can be found at http://www.lavarnd.org.) Some chip manufacturers are developing chips which allow noise in the circuitry (which is truly random) to be measured and used as a source of random numbers. But the programs in R that generate random numbers produce them by applying a reproducible set of rules (a computer program), so they are not really random, and some people refer to them as pseudo-random numbers for that reason. In general, we won't make any such distinction between truly random numbers and the ones we get from the computer.
In R and most other computer programs, all of the random number generators are based on the uniform distribution. By developing algorithms that are guaranteed to give random numbers with the right distribution when they are supplied with uniform random numbers, only one distribution (the uniform) needs to be studied to ensure that it behaves like a truly random sequence. The uniform random number generator in R has been extensively tested, so you can have confidence that all the random number generators in R are reliable.
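A small sketch (not in the notes) of why the uniform distribution plays this central role: feeding uniform draws through a quantile function produces draws from that distribution, a technique known as the inverse transform method.

> u = runif(10000)
> x = qexp(u)      # has the same distribution as rexp(10000)
> mean(x)          # should be close to 1, the mean of the standard exponential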

While it's obviously a useful feature that R's default behaviour is to give you a different set of random numbers each time that you call any of the random number generators, sometimes it creates problems or misunderstandings. Because of this, R provides a simple way to set the random number generators to a reproducible state, so that the exact same sequence of random numbers will be generated every time a program is run. By passing an integer to the set.seed function, you can ensure that the sequence of random numbers that are generated will be the same every time you run an identical program.
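For example (not shown in the notes; the seed value 17 is an arbitrary choice), setting the same seed before each run reproduces exactly the same draws:

> set.seed(17); x1 = rnorm(5)
> set.seed(17); x2 = rnorm(5)
> identical(x1,x2)
[1] TRUE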

11.5 t-tests

One of the most common tests in statistics is the t-test, used to determine whether the means of two groups are equal to each other. The assumption for the test is that both groups are sampled from normal distributions with equal variances. The null hypothesis is that the two means are equal, and the alternative is that they are not. It is known that under the null hypothesis, we can calculate a t-statistic that will follow a t-distribution with n1 + n2 - 2 degrees of freedom. There is also a widely used modification of the t-test, known as Welch's t-test, that adjusts the number of degrees of freedom when the variances are thought not to be equal to each other. Before we can explore the test much further, we need to find an easy way to calculate the t-statistic. The function t.test is available in R for performing t-tests. Let's test it out on a simple example, using data simulated from a normal distribution.

> x = rnorm(10)
> y = rnorm(10)
> t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = 1.4896, df = 15.481, p-value = 0.1564
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.3221869  1.8310421
sample estimates:
 mean of x  mean of y 
 0.1944866 -0.5599410
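As a sanity check (not in the notes), the t value reported above can be reproduced by hand from the definition of the Welch statistic: the difference in sample means divided by the square root of the sum of the squared standard errors.

> (mean(x) - mean(y)) / sqrt(var(x)/length(x) + var(y)/length(y))   # should match the t reported by t.test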

Before we can use this function in a simulation, we need to find out how to extract the t-statistic (or some other quantity of interest) from the output of the t.test function. For this function, the R help page has a detailed list of what the object returned by the function contains. A general method for a situation like this is to use the class and names functions to find where the quantity of interest is. In addition, for some hypothesis tests, you may need to pass the object from the hypothesis test to the summary function and examine its contents. For t.test it’s easy to figure out what we want:

> ttest = t.test(x,y)
> names(ttest)
[1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
[6] "null.value"  "alternative" "method"      "data.name"

The value we want is named “statistic”. To extract it, we can use the dollar sign notation, or double square brackets:

> ttest$statistic
       t 
1.489560 
> ttest[['statistic']]
       t 
1.489560

Of course, just one value doesn’t let us do very much – we need to generate many such statistics before we can look at their properties. In R, the replicate function makes this very simple. The first argument to replicate is the number of samples you want, and the second argument is an expression (not a function name or definition!) that will generate one of the samples you want. To generate 1000 t-statistics from testing two groups of 10 standard random normal numbers, we can use:

> ts = replicate(1000,t.test(rnorm(10),rnorm(10))$statistic)

Under the assumptions of normality and equal variance, we're assuming that the statistic will have a t-distribution with 10 + 10 - 2 = 18 degrees of freedom. (Each observation contributes a degree of freedom, but we lose two because we have to estimate the mean of each group.) How can we test if that is true? One way is to plot the theoretical density of the t-statistic we should be seeing, and superimpose the density of our sample on top of it. To get an idea of what range of x values we should use for the theoretical density, we can view the range of our simulated data:

> range(ts)
[1] -4.564359  4.111245

Since the distribution is supposed to be symmetric, we’ll use a range from -4.5 to 4.5. We can generate equally spaced x-values in this range with seq:

> pts = seq(-4.5,4.5,length=100)
> plot(pts,dt(pts,df=18),col='red',type='l')

Now we can add a line to the plot showing the density for our simulated sample:

> lines(density(ts))

The plot appears below.

Another way to compare two densities is with a quantile-quantile plot. In this type of plot, the quantiles of two samples are calculated at a variety of points in the range of 0 to 1, and then are plotted against each other. If the two samples came from the same distribution with the same parameters, we'd see a straight line through the origin with a slope of 1; in other words, we're testing to see if various quantiles of the data are identical in the two samples. If the two samples came from similar distributions, but their parameters were different, we'd still see a straight line, but not through the origin. For this reason, it's very common to draw a straight line through the origin with a slope of 1 on plots like this. We can produce a quantile-quantile plot (or QQ plot as they are commonly known), using the qqplot function. To use qqplot, pass it two vectors that contain the samples that you want to compare. When comparing to a theoretical distribution, you can pass a random sample from that distribution. Here's a QQ plot for the simulated t-test data:

> qqplot(ts,rt(1000,df=18))
> abline(0,1)

We can see that the central points of the graph seem to agree fairly well, but there are some discrepancies near the tails (the extreme values on either end of the distribution). The tails of a distribution are the most difficult part to measure accurately, which is unfortunate, since those are often the values that interest us most, that is, the ones which will provide us with enough evidence to reject a null hypothesis. Because the tails of a distribution are so important, another way to test whether the distribution of a sample follows some hypothesized distribution is to calculate the quantiles of some tail probabilities (using the quantile function) and compare them to the theoretical probabilities from the distribution (obtained from the function for that distribution whose first letter is "q"). Here's such a comparison for our simulated data:

> probs = c(.9,.95,.99)
> quantile(ts,probs)
     90%      95%      99% 
1.427233 1.704769 2.513755 
> qt(probs,df=18)
[1] 1.330391 1.734064 2.552380

The quantiles agree fairly well, especially at the .95 and .99 quantiles. Performing more simulations, or using a larger sample size for the two groups, would probably result in values even closer to the theoretical ones. One final method for comparing distributions is worth mentioning. We noted previously that one of the assumptions for the t-test is that the variances of the two samples are equal. However, a modification of the t-test known as Welch's test is said to correct for this problem by estimating the variances, and adjusting the degrees of freedom to use in the test.

This correction is performed by default, but can be shut off by using the var.equal=TRUE argument. Let's see how it works:

> t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = -0.8103, df = 17.277, p-value = 0.4288
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.0012220  0.4450895
sample estimates:
mean of x mean of y 
0.2216045 0.4996707

> t.test(x,y,var.equal=TRUE)

        Two Sample t-test

data:  x and y
t = -0.8103, df = 18, p-value = 0.4284
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.9990520  0.4429196
sample estimates:
mean of x mean of y 
0.2216045 0.4996707 

Since the t statistic itself is the same in both cases, it doesn't matter whether we use the correction or not when we compare the two methods using the techniques we've already described; either way we'd see identical results. Since the degrees of freedom correction changes depending on the data, we also can't simply perform the simulation and compare it to a single fixed number of degrees of freedom. The thing that does change when we apply the correction is the p-value that we would use to decide if there's enough evidence to reject the null hypothesis. What is the behaviour of the p-values? While not necessarily immediately obvious, under the null hypothesis the p-values for any statistical test should follow a uniform distribution between 0 and 1; that is, any value in the interval 0 to 1 is just as likely to occur as any other value. For a uniform distribution, the quantile function is just the identity function: a value of .5 is greater than 50% of the data, and a value of .95 is greater than 95% of the data. As a quick check of this notion, let's look at the density of the p-values when the null hypothesis is true:

> tps = replicate(1000,t.test(rnorm(10),rnorm(10))$p.value)

> plot(density(tps))

The graph appears below.

Another way to check to see if the probabilities follow a uniform distribution is with a QQ plot:

> qqplot(tps,runif(1000))
> abline(0,1)

The graph appears below.

The idea that the probabilities follow a uniform distribution seems reasonable. Now, let's look at some of the quantiles of the p-values when we force the t.test function to use var.equal=TRUE:

> tps = replicate(1000,t.test(rnorm(10),rnorm(10),var.equal=TRUE)$p.value)
> probs = c(.5,.7,.9,.95,.99)
> quantile(tps,probs)
      50%       70%       90%       95%       99% 
0.4873799 0.7094591 0.9043601 0.9501658 0.9927435

The agreement actually looks very good. What about when we let t.test decide whether to make the correction or not?

> tps = replicate(1000,t.test(rnorm(10),rnorm(10))$p.value)
> quantile(tps,probs)
      50%       70%       90%       95%       99% 
0.4932319 0.7084562 0.9036533 0.9518775 0.9889234

There's not that much of a difference, but, of course, the variances in this example were equal. How does the correction work when the variances are not equal?

> tps = replicate(1000,t.test(rnorm(10),rnorm(10,sd=5),var.equal=TRUE)$p.value)
> quantile(tps,probs)
      50%       70%       90%       95%       99% 
0.5221698 0.6926466 0.8859266 0.9490947 0.9935562 
> tps = replicate(1000,t.test(rnorm(10),rnorm(10,sd=5))$p.value)
> quantile(tps,probs)
      50%       70%       90%       95%       99% 
0.4880855 0.7049834 0.8973062 0.9494358 0.9907219

There is an improvement, but not so dramatic.

11.6 Power of the t-test

Of course, all of this is concerned with the null hypothesis. Now let's start to investigate the power of the t-test. With a sample size of 10, we obviously aren't going to expect truly great performance, so let's consider a case that's not too subtle. When we don't specify a standard deviation for rnorm it uses a standard deviation of 1. That means about 68% of the data will fall in the range of -1 to 1. Suppose we have a difference in means equal to just one standard deviation, and we want to calculate the power for detecting that difference. We can follow the same procedure as the coin tossing experiment: specify an alpha level, calculate the rejection region, simulate data under the alternative hypothesis, and see how many times we'd reject the null hypothesis. As in the coin toss example, a function will make things much easier:

t.power = function(nsamp=c(10,10),nsim=1000,means=c(0,0),sds=c(1,1)){
   lower = qt(.025,df=sum(nsamp) - 2)
   upper = qt(.975,df=sum(nsamp) - 2)
   ts = replicate(nsim,
        t.test(rnorm(nsamp[1],mean=means[1],sd=sds[1]),
               rnorm(nsamp[2],mean=means[2],sd=sds[2]))$statistic)

   sum(ts < lower | ts > upper) / nsim
}

Let’s try it with our simple example:

> t.power(means=c(0,1))
[1] 0.555

Not bad for a sample size of 10! Of course, if the differences in means are smaller, it’s going to be harder to reject the null hypothesis:

> t.power(means=c(0,.3))
[1] 0.104
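For the equal-variance case these simulated powers can be cross-checked against the closed-form calculation in the stats function power.t.test; this check is not part of the notes and assumes the defaults sd=1 and sig.level=0.05:

> power.t.test(n=10,delta=1)     # reported power should be close to the simulated 0.555
> power.t.test(n=10,delta=.3)    # and this one close to the simulated 0.104

power.t.test can also solve for the sample size instead, by supplying power= and leaving n unspecified.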

How large a sample size would we need to detect that difference of .3 with 95% power?

> samps = c(100,200,300,400,500)
> res = sapply(samps,function(n)t.power(means=c(0,.3),nsamp=c(n,n)))
> names(res) = samps
> res
  100   200   300   400   500 
0.567 0.841 0.947 0.992 0.999

It would take over 300 samples in each group to be able to detect such a difference. Now we can return to the issue of unequal variances. We saw that Welch's adjustment to the degrees of freedom helped a little bit under the null hypothesis. Now let's see if the power of the test is improved using Welch's test when the variances are unequal. To do this, we'll need to modify our t.power function a little, passing the var.equal= setting along to t.test:

t.power1 = function(nsamp=c(10,10),nsim=1000,means=c(0,0),sds=c(1,1),var.equal=TRUE){
   tps = replicate(nsim,
         t.test(rnorm(nsamp[1],mean=means[1],sd=sds[1]),
                rnorm(nsamp[2],mean=means[2],sd=sds[2]),
                var.equal=var.equal)$p.value)

   sum(tps < .025 | tps > .975) / nsim
}

Since var.equal=TRUE is the default, Welch's adjustment will not be used unless we specify var.equal=FALSE. Let's see what the power is for samples of size 10 when the means of the two groups are 1 and 2, and their standard deviations are 1 and 2, respectively:

> t.power1(nsim=10000,sds=c(1,2),mean=c(1,2))
[1] 0.1767
> t.power1(nsim=10000,sds=c(1,2),mean=c(1,2),var.equal=FALSE)
[1] 0.1833

There does seem to be an improvement, but not a dramatic one. We can look at the same thing for a variety of sample sizes:

> sizes = c(10,20,50,100)
> res1 = sapply(sizes,function(n)t.power1(nsim=10000,sds=c(1,2),
+               mean=c(1,2),nsamp=c(n,n)))
> names(res1) = sizes
> res1
    10     20     50    100 
0.1792 0.3723 0.8044 0.9830 
> res2 = sapply(sizes,function(n)t.power1(nsim=10000,sds=c(1,2),
+               mean=c(1,2),nsamp=c(n,n),var.equal=FALSE))
> names(res2) = sizes
> res2
    10     20     50    100 
0.1853 0.3741 0.8188 0.9868

Chapter 12

Graphical User Interfaces (GUIs)

12.1 Graphical User Interfaces

While using R through the command line provides a great deal of flexibility, there are times when a simpler interface would be useful. For example, you may develop an R program that performs a complex analysis, and you'd like others to be able to run the program by, say, just specifying a file to be read, and choosing a couple of options through checkboxes or a drop down list. There may be a set of operations that you routinely do when first studying a data set, and you'd like to be presented with a list of available data and just click on the one that you want.
There are several different options available in R to produce graphical user interfaces (GUIs); we are going to focus on a simple, portable one based on the Tcl/Tk library originally developed by John Ousterhout when he was in the Computer Science department here at Berkeley. Tcl is a scripting language (like perl or python), and Tk is a graphical toolkit designed to work with that language. Since the Tk toolkit is portable, easy-to-use, and provides most of the basic elements needed to create GUIs, interfaces to it are available in many languages, so if you understand the basics of Tk GUIs, you may see the same techniques used later on in your career by some other program.
To use the Tcl/Tk library in R, you must load the library with the library(tcltk) command. This library is part of the base R distribution, so you don't have to download anything in order to use it (although you may choose to download some optional components later on). If you look at the help pages for the functions in the tcltk library, you'll probably be surprised to see how little information or guidance there is. However, once you work with the library for a while, you'll get accustomed to the way Tk functions are mapped to R functions, and you'll be able to use any documentation about Tk that you can find on the web or elsewhere to get more information. There are a wide variety of sample programs at http://www.sciviews.org/rgui/tcltk/, but they use a different geometry method and some require external packages. I'll try to provide examples here to help you get started, but don't hesitate to look around the web for more examples and ideas.
The process of creating a GUI can be broken down into two parts: choosing the elements (often referred to as widgets) that will make up the GUI, and arranging for the elements to be placed in the appropriate positions and displayed. This second step is sometimes known as geometry management. Tk actually offers several different geometry management methods; we are going to focus on one known as packing. In the packing geometry model, the GUI is considered a collection of separate frames, each containing some of the widgets that make up the final GUI. The widgets are loaded into the frames in a carefully chosen order, and they "spread out" to fill the frame. When you pack a widget into a frame, you can specify the side to which to pack it, choosing among left, right, top or bottom. With a little practice and experimentation, you can usually get the widgets to line up the way you want them to.
Most of the widgets that are available through the Tk toolbox should look familiar, as common applications like email clients and web browsers use these same elements in their GUIs. Some examples of available widgets are:

• text entry fields

• drop down menus

• check buttons

• radio buttons

• file selection browsers

• sliding scales

Since the Tk toolbox is not integrated entirely into R, it's necessary to use special techniques to get information out of the Tcl/Tk environment and into R. Many of the widgets, for example a text entry widget, will need a Tcl variable associated with them to hold necessary information. All of the variables that will be captured by the GUI will be character variables. The tclVar function can be used to create an empty tcl variable which can then be used in calls to functions in the tcltk library as follows:

myvar = tclVar('')

Later, when you need to get the information that was put into the Tk variable back into the R environment, you can use the tclvalue function:

rmyvar = tclvalue(myvar)

Another aspect of GUI programming that is different from regular programs has to do with associating actions with certain GUI elements. An obvious example is a button, which when pressed needs to do something. Widgets like this will have an argument, usually called command=, which will accept the name of a function that will be called when the button (or other action) is recognized. Such functions, often called callbacks, should be written using the ... notation for their argument lists, and should not have any returned values.
It sometimes becomes necessary to "remember" things over the course of a GUI's execution. For example, you may want to count the number of times a button is pressed, or display a different message based on the value of a previous response. Since all of the actions which the GUI will initiate are carried out in functions, any modifications to variables that are made in those functions will be lost when the function returns. While not usually a good practice, you can use the special assignment operator <<- inside a function to modify global variables, which will retain their values after the function exits.
To make some of these concepts clearer, let's consider the construction of a GUI to accept input to the coin.power program that we developed earlier. Recall that the program accepts the number of simulations, the number of coin tosses, and the probability of getting heads for the coin, and returns the power. We want to make a GUI with three entries, one for each of the parameters, a place to display the answer, and buttons to run the simulation or close the window. The first step in creating a GUI with the tcltk library in R is loading the library and calling the tktoplevel function, which creates an initial window into which we will pack the frames containing the widgets. Since we can control the order of packing into individual frames, each set of related elements can be put into a separate frame. For this

example, we'll create a separate frame for each entry and its associated label, as well as a frame to hold those entries and another frame for the two buttons; the area for the answer can be packed into the toplevel frame. In the code that follows, you'll notice that widgets are usually packed directly into their enclosing frames, but the label widget that will hold the answer is assigned to a variable before being packed. If you need to communicate with a widget at some point in the running of the program, it's vital to store the widget in a variable. In this example, we'll create a blank label, and use the tkconfigure function when we have an answer to change what the label displays. Here's the code to create the GUI:

require(tcltk)

coin.power = function(ntoss=100,nsim=1000,prob=.5){
   lower = qbinom(.025,ntoss,.5)
   upper = qbinom(.975,ntoss,.5)
   rr = rbinom(nsim,ntoss,prob)
   sum(rr < lower | rr > upper) / nsim
}

destroy = function(...)tkdestroy(base)

tst = function(...){
   nsim = as.numeric(tclvalue(nsim_))
   ntoss = as.numeric(tclvalue(ntoss_))
   prob = as.numeric(tclvalue(prob_))
   res = coin.power(ntoss=ntoss,nsim=nsim,prob=prob)
   tkconfigure(ans,text=paste(res))
}

base = tktoplevel()
tkwm.title(base,'Coin Toss')

# create a frame to hold the three entries
nfrm = tkframe(base)

# create tcl variables to associate with the
# entry fields -- to include starting values
# replace the '' with the desired value
nsim_ = tclVar('')
ntoss_ = tclVar('')
prob_ = tclVar('')

f1 = tkframe(nfrm)
tkpack(tklabel(f1,text='Nsim',width=8),side='left')
tkpack(tkentry(f1,width=10,textvariable=nsim_),side='left')
f2 = tkframe(nfrm)
tkpack(tklabel(f2,text='Ntoss',width=8),side='left')
tkpack(tkentry(f2,width=10,textvariable=ntoss_),side='left')
f3 = tkframe(nfrm)
tkpack(tklabel(f3,text='Prob',width=8),side='left')
tkpack(tkentry(f3,width=10,textvariable=prob_),side='left')

# the widgets were packed in the frames, but the frames have
# not yet been packed.  Remember that nothing can actually be
# displayed unless it's been packed.
tkpack(f1,side='top')
tkpack(f2,side='top')
tkpack(f3,side='top')
tkpack(nfrm)

# now we can repeat the process for the label to hold the
# answer, and the frame containing the buttons
ans = tklabel(base,text=' ')
tkpack(ans,side='top')
bfrm = tkframe(base)
tkpack(tkbutton(bfrm,text='Run',command=tst),side='left')
tkpack(tkbutton(bfrm,text='Quit',command=destroy),side='right')
tkpack(bfrm,side='bottom')

When we run the program (for example by using the source command inside of R), we see the left hand image; after making some entries and hitting the Run button, we see the image on the right.
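Before moving on, here is a stripped-down sketch (not from the notes) that isolates the basic ingredients used above: a tcl variable tied to an entry field, a callback that retrieves its value with tclvalue, and a counter that persists across button presses thanks to the <<- operator. All of the names (name_, npressed, greet) are just illustrative.

require(tcltk)

npressed = 0                     # global state, updated inside the callback
name_ = tclVar('')               # tcl variable tied to the entry field

greet = function(...){
   npressed <<- npressed + 1
   tkconfigure(msg,text=paste('Hello,',tclvalue(name_),'- press number',npressed))
}

win = tktoplevel()
tkwm.title(win,'Mini Example')
tkpack(tkentry(win,width=15,textvariable=name_),side='top')
msg = tklabel(win,text=' ')
tkpack(msg,side='top')
tkpack(tkbutton(win,text='Greet',command=greet),side='bottom')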

12.2 Making a Calculator

The basic idea behind writing a calculator in R is very simple: create a set of buttons which will build up a text string that represents the calculation we wish to perform, and then use the eval and parse functions in R to perform the calculation. These two functions allow us to evaluate a character string as if we typed it into the R console. For example, consider the string "29*43". We could calculate the value of that expression as follows:

> txt = "29*43"
> eval(parse(text=txt))
[1] 1247

Using this as the core of the calculator, we can set up rows of buttons arranged in the usual calculator fashion, each of which will add a character to a character string through the command= option, and then call parse and eval to do the calculation when the "=" key is pressed.
We need to use a few of the concepts we've already seen to make the calculator work. First, we will need to create a global variable that represents the input to the calculator, and have each of the keys add its input using the "<<-" operator. To change the value of the calculator's display, we'll use the tkconfigure function as in the previous example. We could define the callback functions separately for each key, but most of the keys will do exactly the same thing: add a character to the string that we'll eventually calculate. The only difference is that different keys will naturally add different characters. So it would make sense to write a function to generate the different callbacks. Let's call the calculator's input calcinp. We initialize calcinp to an empty string, and then define a function which will take a symbol and add it to the calculator's input:

calcinp = ''

mkput = function(sym){
   function(...){
      calcinp <<- paste(calcinp,sym,sep='')
      tkconfigure(display,text=calcinp)
   }
}

Notice that we're referring to an object called display, even though we haven't defined it yet. R uses a technique called lazy evaluation, which means that an object doesn't have to exist when we use it in a function definition; it only needs to be available when the function is actually called. Keeping this in mind, we can define two more functions, one to clear the calculator's display and one to do the actual calculation:

clearit = function(...){
   tkconfigure(display,text='')
   calcinp <<- ''
}

docalc = function(...){
   result = try(eval(parse(text=calcinp)))
   if(class(result) == 'try-error') calcinp <<- 'Error'
   else calcinp <<- result
   tkconfigure(display,text=calcinp)
   calcinp <<- ''
}

A new concept that's introduced in the docalc function is the try function. This function allows us to try any operation in R without worrying that an error will stop our program. If there's an error when we pass an R statement to the try function, it will return an object of class try-error. Thus we can decide what to do when an error occurs by examining the class of the returned value.
Now we can build the calculator itself. The approach is to first pack the label which will serve as the calculator's display, then to create a separate frame for each row of keys, packing them as they are created:

base = tktoplevel()
tkwm.title(base,'Calculator')
display = tklabel(base,justify='right')
tkpack(display,side='top')
row1 = tkframe(base)
tkpack(tkbutton(row1,text='7',command=mkput('7'),width=3),side='left')
tkpack(tkbutton(row1,text='8',command=mkput('8'),width=3),side='left')
tkpack(tkbutton(row1,text='9',command=mkput('9'),width=3),side='left')
tkpack(tkbutton(row1,text='+',command=mkput('+'),width=3),side='left')
tkpack(row1,side='top')
row2 = tkframe(base)
tkpack(tkbutton(row2,text='4',command=mkput('4'),width=3),side='left')
tkpack(tkbutton(row2,text='5',command=mkput('5'),width=3),side='left')
tkpack(tkbutton(row2,text='6',command=mkput('6'),width=3),side='left')
tkpack(tkbutton(row2,text='-',command=mkput('-'),width=3),side='left')
tkpack(row2,side='top')
row3 = tkframe(base)
tkpack(tkbutton(row3,text='1',command=mkput('1'),width=3),side='left')
tkpack(tkbutton(row3,text='2',command=mkput('2'),width=3),side='left')
tkpack(tkbutton(row3,text='3',command=mkput('3'),width=3),side='left')
tkpack(tkbutton(row3,text='*',command=mkput('*'),width=3),side='left')
tkpack(row3,side='top')
row4 = tkframe(base)
tkpack(tkbutton(row4,text='0',command=mkput('0'),width=3),side='left')
tkpack(tkbutton(row4,text='(',command=mkput('('),width=3),side='left')
tkpack(tkbutton(row4,text=')',command=mkput(')'),width=3),side='left')
tkpack(tkbutton(row4,text='/',command=mkput('/'),width=3),side='left')
tkpack(row4,side='top')
row5 = tkframe(base)
tkpack(tkbutton(row5,text='.',command=mkput('.'),width=3),side='left')
tkpack(tkbutton(row5,text='^',command=mkput('^'),width=3),side='left')
tkpack(tkbutton(row5,text='C',command=clearit,width=3),side='left')
tkpack(tkbutton(row5,text='=',command=docalc,width=3),side='left')
tkpack(row5,side='top')

Here’s what the calculator looks like in several stages of a computation:

12.3 Improving the Calculator

While the calculator is capable of doing basic operations, it’s easy to incorporate any function in R which takes a single argument, by creating a callback using this function:

mkfun = function(fun){
   function(...){
      calcinp <<- paste(fun,'(',calcinp,')',sep='')
      tkconfigure(display,text=calcinp)
   }
}

So, for example, we could add the sqrt, exp, and qnorm functions to our calculator like this:

row6 = tkframe(base)
tkpack(tkbutton(row6,text='sqrt',command=mkfun('sqrt'),width=3),side='left')
tkpack(tkbutton(row6,text='exp',command=mkfun('exp'),width=3),side='left')
tkpack(tkbutton(row6,text='qnorm',command=mkfun('qnorm'),width=3),side='left')
tkpack(row6,side='top')

Here are screenshots of a qnorm calculation:

12.4 Appearance of Widgets

The previous example was a very minimalistic approach, with no concern for the actual appearance of the final product. One simple way to improve the appearance is to add a little space around the widgets when we pack them. The tkpack function accepts two arguments, padx= and pady=, to control the amount of space around the widgets when they are packed. Each of these arguments takes a scalar or a vector of length 2. When a scalar is given, that much space (in pixels) is added on either side of the widget in the specified dimension. When a vector is given, the first number is the amount of space to the left (padx) or the top (pady),

and the second number is the amount of space to the right (padx) or the bottom (pady). Like most aspects of GUI design, experimentation is often necessary to find the best solution. The following code produces a version of the coin power GUI with additional padding:

base = tktoplevel()
tkwm.title(base,'Coin Toss')
nfrm = tkframe(base)
nsim_ = tclVar('')
ntoss_ = tclVar('')
prob_ = tclVar('')
f1 = tkframe(nfrm)
tkpack(tklabel(f1,text='Nsim',width=8),side='left',pady=c(5,10))
tkpack(tkentry(f1,width=10,textvariable=nsim_),side='left',padx=c(0,20),pady=c(5,10))
f2 = tkframe(nfrm)
tkpack(tklabel(f2,text='Ntoss',width=8),side='left',pady=c(5,10))
tkpack(tkentry(f2,width=10,textvariable=ntoss_),side='left',padx=c(0,20),pady=c(5,10))
f3 = tkframe(nfrm)
tkpack(tklabel(f3,text='Prob',width=8),side='left',pady=c(5,10))
tkpack(tkentry(f3,width=10,textvariable=prob_),side='left',padx=c(0,20),pady=c(5,10))
tkpack(f1,side='top')
tkpack(f2,side='top')
tkpack(f3,side='top')
tkpack(nfrm)
ans = tklabel(base,text=' ')
tkpack(ans,side='top')
bfrm = tkframe(base)
tkpack(tkbutton(bfrm,text='Run',command=tst),side='left')
tkpack(tkbutton(bfrm,text='Quit',command=destroy),side='right')
tkpack(bfrm,side='bottom',pady=c(0,10))

Here are side-by-side pictures showing the difference between the two versions:

Another way to modify a GUI's appearance is through three-dimensional effects. These are controlled by an argument called relief=, which can take values of "flat", "groove", "raised", "ridge", "solid", or "sunken". The following code produces a (non-functioning) GUI showing the different relief styles applied to labels and buttons:

require(tcltk)
types = c('flat','groove','raised','ridge','solid','sunken')
base = tktoplevel()
tkwm.title(base,'Relief Styles')
frms = list()
mkframe = function(type){
   fr = tkframe(base)
   tkpack(tklabel(fr,text=type,relief=type),side='left',padx=5)
   tkpack(tkbutton(fr,text=type,relief=type),side='left',padx=5)
   tkpack(fr,side='top',pady=10)
   fr
}
sapply(types,mkframe)

Here’s a picture of how it looks:

12.5 Fonts and Colors

Several arguments can be used to modify color. The background= argument controls the overall color, while the foreground= argument controls the color of text appearing in the widget. (Remember that things like this can be changed while the GUI is running using the tkconfigure command.) Each of these arguments accepts colors spelled out (like "lightblue", "red", or "green") as well as web-style hexadecimal values (like "#add8e6", "#ff0000" or "#00ff00"). To change from the default font (bold Helvetica), you must first create a Tk font, using

the tkfont.create function. This function takes a number of arguments, but only the font family is required. The available font families will vary from platform to platform; the as.character(tkfont.families()) command will display a list of the available fonts. Some of the other possible arguments to tkfont.create include:

1. size=, the size of the font in points.

2. weight=, either "normal" or "bold"

3. slant=, either "roman" or "italic"

4. underline=, either TRUE or FALSE

5. overstrike=, either TRUE or FALSE
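For example (a sketch, not from the notes; it assumes a toplevel window called base already exists, and the 'helvetica' family may need to be replaced by one available on your system), a bold italic 16-point label could be created like this:

myfont = tkfont.create(family='helvetica',size=16,weight='bold',slant='italic')
tkpack(tklabel(base,text='A label in a custom font',font=myfont),side='top')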

As mentioned earlier, colors can be specified as either names or web-style hexadecimal values. Generally, the argument to change the color of a widget is background=; the argument to change the color of any text appearing in the widget is foreground=. If these choices don't do what you expect, you may need to check the widget-specific documentation. Here's a simple program that chooses some randomly selected colors and fonts, and displays them as labels:

require(tcltk)
allfonts = as.character(tkfont.families())
somefonts = sample(allfonts,9)
somecolors = c('red','blue','green','yellow','lightblue','tan',
               'darkred','lightgreen','white')
txt = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
base = tktoplevel()
tkwm.title(base,'Fonts')
mkfonts = function(font,color){
   thefont = tkfont.create(family=font,size=14)
   tkpack(tklabel(base,text=paste(font,':',txt),
                  font=thefont,background=color),
          side='top')
}
mapply(mkfonts,somefonts,somecolors)

Here’s how it looks:

Note that only the specific widget for which background= was set changes – if you want to change the background for the entire GUI, you'll probably have to pass the background= argument to every widget you use.

12.6 Plotting

When developing a Tk-based GUI for plotting, there are two possibilities available. The first, which is the only choice if you need to interact with the graph using locator or identify, is to create a GUI with the controls for the plot, and to let R open a normal plotting window. The second option utilizes the tkrplot library, available from CRAN, to create a label widget with an image of the plot; this label can be placed anywhere that any other widget can be placed.
To illustrate the use of a GUI with plotting, consider the problem of mixtures of normal distributions. Many times a sample will actually contain two separate distributions, even though it will be treated as a single distribution. Obviously data like this will create problems when we try to analyze it, so it's important to be able to recognize such data, using, say, a density plot. To get an idea of how the density of mixtures of distributions would look, we can create a GUI using a scale or slider widget that will allow us to control the fraction of each of two distributions that will be combined.
Many times the first step in making a GUI is writing the function that the GUI will eventually call to actually do the work. This helps to identify the parameters that need to be supplied to the function so that the GUI can be designed in such a way to get all the necessary information. For this case, we'll want to specify the mean and standard deviation of the first distribution, the mean and standard deviation of the second distribution, and the fraction of the second distribution to use. (The fraction for the first distribution will just be 1 minus the fraction for the second distribution.) Here's such a function:

genplot = function(m1,s1,m2,s2,frac,n=100){
   dat = c(rnorm((1-frac)*n,m1,s1),rnorm(frac*n,m2,s2))
   plot(density(dat),type='l',main='Density of Mixtures')
}

Now we can create the interface. We'll make three frames: the first will accept the mean and standard deviation for the first distribution, the second will have the mean and standard deviation for the second distribution, and the third will have the slider to determine the fraction of each distribution to use. Recall that we need to create tcl variables, and then convert them into R variables before we can call our function, so I'll use an intermediate function which will do the translations and then call genplot as a callback. Although there is a fair amount of code, most of it is very similar.

require(tcltk)

doplot = function(...){
   m1 = as.numeric(tclvalue(mean1))
   m2 = as.numeric(tclvalue(mean2))
   s1 = as.numeric(tclvalue(sd1))
   s2 = as.numeric(tclvalue(sd2))
   fr = as.numeric(tclvalue(frac))
   genplot(m1,s1,m2,s2,fr,n=100)
}

base = tktoplevel()
tkwm.title(base,'Mixtures')
mainfrm = tkframe(base)
mean1 = tclVar(10)
mean2 = tclVar(10)
sd1 = tclVar(1)
sd2 = tclVar(1)
frac = tclVar(.5)
m1 = tkframe(mainfrm)
tkpack(tklabel(m1,text='Mean1, SD1',width=10),side='left')
tkpack(tkentry(m1,width=5,textvariable=mean1),side='left')
tkpack(tkentry(m1,width=5,textvariable=sd1),side='left')
tkpack(m1,side='top')
m2 = tkframe(mainfrm)
tkpack(tklabel(m2,text='Mean2, SD2',width=10),side='left')
tkpack(tkentry(m2,width=5,textvariable=mean2),side='left')
tkpack(tkentry(m2,width=5,textvariable=sd2),side='left')
tkpack(m2,side='top')
m3 = tkframe(mainfrm)
tkpack(tkscale(m3,command=doplot,from=0,to=1,showvalue=TRUE,
               variable=frac,resolution=.01,orient='horiz'))
tkpack(m3,side='top')
tkpack(mainfrm)

Here’s how the interface looks:

To produce the same sort of GUI, but with the plot in the same frame as the slider, we can use the tkrplot library. To place the plot in the same frame as the slider, we must first create a tkrplot widget, using the tkrplot function. After loading the tkrplot library, we call this function with two arguments: the frame in which the plot is to be displayed, and a callback function (using ... as the only argument) that will draw the desired plot. In this example, we can use the same function (doplot) as in the standalone version:

img = tkrplot(mainfrm,doplot)

Since the tkrplot widget works by displaying an image of the plot, the only way to change the plot is to change this image, which is exactly what the tkrreplot function does. The only argument to tkrreplot is the tkrplot widget that will need to be redrawn. Thus, the slider can be constructed with the following statements:

scalefunc = function(...)tkrreplot(img)
s = tkscale(mainfrm,command=scalefunc,from=0,to=1,showvalue=TRUE,
            variable=frac,resolution=.01,orient='horiz')

By packing the tkrplot object first, followed by the frames for the mean and standard deviations, and packing the slider widget last, we can produce the GUI shown below:

The complete code for this GUI is as follows:

require(tcltk)
require(tkrplot)

genplot = function(m1,s1,m2,s2,frac,n=100){
   dat = c(rnorm((1-frac)*n,m1,s1),rnorm(frac*n,m2,s2))
   plot(density(dat),type='l',main='Density of Mixtures')
}

doplot = function(...){
   m1 = as.numeric(tclvalue(mean1))
   m2 = as.numeric(tclvalue(mean2))
   s1 = as.numeric(tclvalue(sd1))
   s2 = as.numeric(tclvalue(sd2))
   fr = as.numeric(tclvalue(frac))
   genplot(m1,s1,m2,s2,fr,n=100)
}

base = tktoplevel()
tkwm.title(base,'Mixtures')
mainfrm = tkframe(base)
mean1 = tclVar(10)
mean2 = tclVar(10)
sd1 = tclVar(1)
sd2 = tclVar(1)
frac = tclVar(.5)

img = tkrplot(mainfrm,doplot)
scalefunc = function(...)tkrreplot(img)
s = tkscale(mainfrm,command=scalefunc,from=0,to=1,showvalue=TRUE,
            variable=frac,resolution=.01,orient='horiz')

tkpack(img)
m1 = tkframe(mainfrm)
tkpack(tklabel(m1,text='Mean1, SD1',width=10),side='left')
tkpack(tkentry(m1,width=5,textvariable=mean1),side='left')
tkpack(tkentry(m1,width=5,textvariable=sd1),side='left')
tkpack(m1,side='top')
m2 = tkframe(mainfrm)
tkpack(tklabel(m2,text='Mean2, SD2',width=10),side='left')
tkpack(tkentry(m2,width=5,textvariable=mean2),side='left')
tkpack(tkentry(m2,width=5,textvariable=sd2),side='left')
tkpack(m2,side='top')
tkpack(s)
tkpack(mainfrm)

12.7 Binding

For those widgets that naturally have an action or variable associated with them, the Tk interface provides arguments like textvariable= or command= to make it easy to get these widgets working as they should. But we're not limited to associating actions with only those widgets that provide such arguments. By using the tkbind command, we can associate a function (similar to those accepted by functions that use a command= argument) with any of a large number of possible events. To indicate an event, a character string with the event's name surrounded by angle brackets is used; for example, <Return> for the Return key, <Button-1> for the left mouse button, or <Key-x> for a keyboard key, where a lower case x represents any key on the keyboard.
As an example, suppose we wish to create a GUI that will allow the user to type in the name of a data frame, and when the user hits return in the entry field, a set of radio buttons, one for each variable in the data frame, will open up below the entry field. Finally, a button will allow for the creation of a histogram from the selected variable.

require(tcltk)

makehist = function(...){
   df = get(tclvalue(dfname))
   thevar = tclvalue(varname)
   var = as.numeric(df[[thevar]])
   hist(var,main=paste("Histogram of",thevar),xlab=thevar)
}

showb = function(...){
   df = get(tclvalue(dfname))
   vars = names(df)

   frms = list()
   k = 1

   mkframe = function(var){
      fr = tkframe(base)
      tkpack(tklabel(fr,width=15,text=var),side='left')
      tkpack(tkradiobutton(fr,variable=varname,value=var),side='left')
      tkpack(fr,side='top')
      fr
   }

   frms = sapply(vars,mkframe)

   tkpack(tkbutton(base,text='Run',command=makehist),side='top')
}

base = tktoplevel()
tkwm.title(base,'Chooser')
dfname = tclVar()
varname = tclVar()
efrm = tkframe(base)
tkpack(tklabel(efrm,text='Dataframe: '),side='left')
dfentry = tkentry(efrm,textvariable=dfname,width=20)
tkbind(dfentry,'<Return>',showb)
tkpack(dfentry,side='left')
tkpack(efrm)

Here’s a picture of the widget in both the “closed” and “open” views:

As another example, recall the calculator we developed earlier. To allow us to enter values from the keyboard in addition to clicking on the calculator "keys", we could modify the mkput function from that example as follows:

mkput = function(sym){
   fn = function(...){
      calcinp <<- paste(calcinp,sym,sep='')
      tkconfigure(display,text=calcinp)
   }
   tkbind(base,sym,fn)
   fn
}

When we define each key, we also bind that key to the same action in the base window. Similar actions could be done for the "C" and "=" keys:

tkbind(base,'c',clearit)
tkbind(base,'=',docalc)

Now input from the calculator can come from mouse clicks or the keyboard.

12.8 Checkbuttons

When we created radiobuttons, it was fairly simple, because we only needed to create one tcl variable, which would contain the value of the single chosen radiobutton. With checkbuttons, the user can choose as many selections as they want, so each button must have a separate tcl variable associated with it. While it’s possible to do this by hand for each choice provided, the following example shows how to use the sapply function to do this in a more general

way. Notice that special care must be taken when creating a list of tcl variables to ensure that they are properly stored. In this simple example, we display the choices by using sapply to convert the list of tcl variables to R variables. Remember that all the toplevel variables that you create in an R GUI are available in the console while the GUI is running, so you can examine them to see how their information is stored.

require(tcltk)
tt <- tktoplevel()
choices = c('Red','Blue','Green')
bvars = lapply(choices,function(i)tclVar('0'))
# or: bvars = replicate(length(choices),tclVar('0'),simplify=FALSE)
names(bvars) = choices
mkframe = function(x){
   nn = tkframe(tt)
   tkpack(tklabel(nn,text=x,width=10),side='left')
   tkpack(tkcheckbutton(nn,variable=bvars[[x]]),side='right')
   tkpack(nn,side='top')
}
sapply(choices,mkframe)
showans = function(...){
   res = sapply(bvars,tclvalue)
   res = names(res)[which(res == '1')]
   tkconfigure(result,text = paste(res,collapse=' '))
}
result = tklabel(tt,text='')
tkpack(result,side='top')
bfrm = tkframe(tt)
tkpack(tkbutton(bfrm,command=showans,text='Select'),side='top')
tkpack(bfrm,side='top')

The figure below shows what the GUI looks like after two choices have been made, and the button is pressed.

12.9 Opening Files and Displaying Text

While it would be very simple (and not necessarily a bad idea) to use an entry field to specify a file name, the Tk toolkit provides a file browser which will return the name of a file which a user has chosen, and will usually be a better choice if you need to open a file.
Another issue that arises has to do with displaying output. If the output is a plot, it's simple to just open a graph window; if the output is text, you can just print it in the normal way, and it will display in the same place where you typed the R command to display the GUI. While this is useful while debugging your program, if there are printed results they should usually be displayed in a separate window. For this purpose, Tk provides the tktext object. To display output in this way, we can use the capture.output function, which takes the output of any R command and, instead of displaying it to the screen, stores the output in a vector of character values. Alternatively, the output could be written to a file, then read back into a character vector.
The following example program invokes tkgetOpenFile to get the name of a file to be opened; it then opens the file, reads it as a CSV file, and displays the results in a text window with scrollbars. The wrap='none' argument tells the text window not to wrap the lines; this is appropriate when you have a scrollbar in the x-dimension. Finally, to give an idea of how the text window can be modified, a button is added to change the text window's display from the listing of the data frame to a data frame summary.

require(tcltk)

summ = function(...){
   thestring = capture.output(summary(myobj))
   thestring = paste(thestring,collapse='\n')
   tkdelete(txt,'@0,0','end')
   tkinsert(txt,'@0,0',thestring)
}

fileName <- tclvalue(tkgetOpenFile())
myobj = read.csv(fileName)
base = tktoplevel()
tkwm.title(base,'File Open')
yscr <- tkscrollbar(base, command=function(...)tkyview(txt,...))
xscr <- tkscrollbar(base, command=function(...)tkxview(txt,...),orient='horiz')
txt <- tktext(base,bg="white",font="courier",
              yscrollcommand=function(...)tkset(yscr,...),
              xscrollcommand=function(...)tkset(xscr,...),
              wrap='none',
              font=tkfont.create(family='lucidatypewriter',size=14))
tkpack(yscr,side='right',fill='y')
tkpack(xscr,side='bottom',fill='x')
tkpack(txt,fill='both',expand=1)
tkpack(tkbutton(base,command=summ,text='Summary'),side='bottom')
thestring = capture.output(print(myobj))
thestring = paste(thestring,collapse='\n')
tkinsert(txt,"end",thestring)
tkfocus(txt)

The first picture shows the file chooser (as displayed on a Linux system); the second picture shows the diabetes data set displayed in a text window, and the third picture shows the result of pressing the summary button.

12.10 Using Images with the tcltk package

The current version of tcl that ships with R allows you to display images in your GUIs, but only if they are in GIF format. Fortunately, it's very easy to change the format of images if they are, for example, JPEG or PNG files. On any SCF machine, the command

mogrify -format gif *.jpg

will convert all the files ending in .jpg to files with the same name, but in the GIF format. You can then use the .gif files that are created to display in your GUI.
To use images with the tcltk library, first create a Tcl variable representing the image with code like:

myimage = tclVar()
tcl("image","create","photo",myimage,file="picture.gif")

Next, create a tklabel using the image= argument pointing to the tcl variable that holds the image:

img = tklabel(frame,image=myimage)

Like any other widget, you can change the image during the execution of your program using the tkconfigure function. Suppose we have another tcl variable called otherimage that has an image associated with it. To change the img widget to display that image, use

tkconfigure(img,image=otherimage)

If you have many pictures, it will be more convenient to store the tcl image variables in a list. Suppose the R list pics contains the full pathnames of several GIF files. (It's usually simplest to use setwd to change to the directory where you're storing the GIF files, so that you only need to type their names, not their full paths. The command

setwd(dirname(file.choose()))

allows you to choose one of the GIF files and then change to the directory that contains it.) The following example displays pictures of three fruits, chosen from a list of five. When the button is pressed, it uses tkconfigure to change each picture many times, to give the illusion of a slot machine. Note the use of sapply to create a list of images, regardless of the number of images used.

require(tcltk)
pics = list.files('fruits',pattern='\\.gif$')
pics = paste('fruits/',pics,sep='')
n = length(pics)
theimages = sapply(pics,function(pic)
                   tcl("image","create","photo",tclVar(),file=pic))

spinner = function(...){
   for(i in 1:50){
      r = sample(1:n,size=3,replace=TRUE)
      tkconfigure(img1,image=theimages[[r[1]]])
      tkconfigure(img2,image=theimages[[r[2]]])
      tkconfigure(img3,image=theimages[[r[3]]])
      tcl('update')
      Sys.sleep(.07)
   }
}
top = tktoplevel()
f1 = tkframe(top)
f2 = tkframe(top)
r = sample(1:n,size=3,replace=TRUE)
img1 = tklabel(f1,image=theimages[[r[1]]])
img2 = tklabel(f1,image=theimages[[r[2]]])
img3 = tklabel(f1,image=theimages[[r[3]]])
tkpack(img1,side='left')
tkpack(img2,side='left')
tkpack(img3,side='left')
tkpack(tkbutton(f2,text='Spin',command=spinner),side='bottom')
tkpack(f1,side='top')
tkpack(f2,side='bottom')

A picture of the GUI before spinning is shown below.

Chapter 13

CGI Programming with R

13.1 Web Servers

We're all familiar with the use of a webserver - we type an address or click on a link, and then the page we want appears in our browser. But what actually happens when we type in a web address? Web servers are simply programs that listen for requests on a particular port (port 80 by default), and then send out information to the connection that requested it. Such a two-way connection is known as a socket. To see what really happens when a request is sent to a webserver, we can write an R program that creates a socket on an unused port, and then try to access that port through a web browser. Then we can have our program simply send back whatever it received in order to see what's really going on. The program is surprisingly simple:

webecho = function(){
   xx = make.socket('localhost',port=1888,server=TRUE)
   on.exit(close.socket(xx))
   stuff = read.socket(xx,maxlen=1024)
   write.socket(xx,stuff)
}

The function opens a socket to port 1888, reads whatever gets sent to it, and then sends it right back to the same place as the request came from, so it can be displayed in a browser. The call to on.exit makes sure that the socket is closed after the communication takes place. To run the program so that it will be able to respond to requests, we place it in a loop:

> while(1)webecho()

(If you run the program on a UNIX system, you'll have to quit your R session (with control-\) to stop it; on a Windows system you may be prompted to quit your R session when you try to exit.) Now let's type the following address into a browser:

http://localhost:1888/something

Here’s what appears in the browser:

GET /something HTTP/1.1
Host: localhost:1888
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.12) Gecko/20051010 Galeon/1.3.21 (Debian package 1.3.21-6ubuntu3) (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

The output consists of the actual request (the line with the GET), and a series of headers. This is all the information that is transmitted from the browser to the webserver when a request is made for a particular page. Once a webserver receives a request like this, it returns the page that the user requested (as specified in the GET directive). If that page contains images, documents or additional programs, they would be sent to the browser through the same mechanism. This is fine for ordinary, static pages, but what about the case where we accept input from the user? This would be the case when data is entered into a field, or checkboxes are checked on a page, and then a button is clicked to send the information from the browser to the web server. Web pages that accept information are known as forms. There are two mechanisms that can be used to send information from a web browser to a webserver; GET (which is the default, if you don’t explicitly state a method) and POST. (We’ll look at their differences shortly.) For example, the HTML that would be used on a web page to accept a text field and use the GET method to send the information would look like this:
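A minimal version of such a form would do the job; the action address, field name, and button label below are just placeholders:

<form action="http://localhost:1888/myscript" method="get">
<input type="text" name="myvar">
<input type="submit" value="Submit">
</form>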


Notice the action= specification in the form tag; this specifies the program which will be executed once the information is sent from the browser to the web server. The name= specification, which must always be present in an HTML input element that will accept information from a user, defines the name of a CGI variable. CGI stands for "Common Gateway Interface", and is a catch-all term for any program which is executed by a web server in response to a request from a browser. Here's how this HTML would display in a browser:

What happens if we enter something in the field, and click on the button? Let's put the words "hello, world" in the field and see:

First, notice that the address has changed, reflecting the address given in the action= specification. Additionally, some text has been added to the end of the displayed URL, and we can see that this same text was sent to the web server as part of the GET specification. The extra text that's in the URL and the GET specification is said to be urlencoded; non-alphanumeric characters (like the comma) are replaced with a percent sign followed by the two-digit hexadecimal representation of the character, and spaces are replaced with plus signs. (A complete list of the encodings can be found at http://www.w3schools.com/tags/ref_urlencode.asp). This method of transferring information has its limitations. First of all, it's obvious that it won't work for things like files, because there's just not enough room in a URL to accommodate files of arbitrary length. Secondly, it may not be prudent to display the information that you're transferring in the URL where it can easily be seen. An alternative to GET is known as POST. Let's see how it differs from GET by creating a form identical to the first one, but with the method specified as POST. The HTML looks like this:
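Aside from the method, nothing needs to change; with the same placeholder names as before, the form might look like this:

<form action="http://localhost:1888/myscript" method="post">
<input type="text" name="myvar">
<input type="submit" value="Submit">
</form>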


Here’s how it displays in the browser:

Once again, we'll enter "hello, world", and click on the button:

Notice that the URL no longer has any information about the CGI variables; the urlencoded string with the values now appears after the header information that the web browser sent. (Remember that for protocols like HTTP, a single completely blank line indicates the end of the headers.) Since the information in the form is not placed in the URL, this should allow us to upload local files to a CGI program. Here's the HTML code for a form that will upload a file:
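A sketch of such a form appears below; the action address and variable name are placeholders, and the important parts are the type=file input and the enctype= specification discussed shortly:

<form action="http://localhost:1888/myscript" method="post" enctype="multipart/form-data">
<input type="file" name="myfile">
<input type="submit" value="Upload">
</form>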

Here it is displayed in a browser:

The Browse... button is generated automatically when type=file is specified for an input element. For file uploading, the method must be set to post; using get just won't work. Now I'll fill out the form with the name of the file that holds the source for the webecho function that we've been using, to see how it gets transferred.

The file is sent (after the headers) in a form very similar to an email attachment. One big difference is that, if the file to be transmitted is a non-text file (like an image or document), no encoding needs to be done – the HTTP protocol utilizes all 8 bits of each byte. You may have noticed the enctype="multipart/form-data" specification in the previous form. This is what causes the web browser to send its data in the "attachment" style, and is required when you're uploading a file. If you specify it in a normal form, it will send the CGI variable information in a similar fashion. With a form like this:
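For example, a form like the earlier text-field form, but with the enctype added (the names are again placeholders):

<form action="http://localhost:1888/myscript" method="post" enctype="multipart/form-data">
<input type="text" name="myvar">
<input type="submit" value="Submit">
</form>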


entering “hello, world” results in the following:

POST / HTTP/1.1
Host: localhost:1888
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.12) Gecko/20051010 Firefox/1.0.7 StumbleUpon/1.9993 (Ubuntu package 1.0.7)
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: multipart/form-data; boundary=------14394547745218385925125258
Content-Length: 178

------14394547745218385925125258
Content-Disposition: form-data; name="myvar"

hello, world
------14394547745218385925125258--

Note that in this case, there was no urlencoding of the variable’s value.

13.2 CGI Scripting

Before continuing, it's worth making note of some very practical details that you'll have to consider before beginning a project that uses CGI scripting. The programs which a web server will execute in response to a request need to be in specially designated directories, or the web server will treat them as ordinary documents, not programs. In general, access to such directories will be limited, both for financial and security reasons. In addition, a program which requires large amounts of computer resources may make it difficult for others who use the server to get anything done.

When you are writing CGI programs, it's very important to avoid allowing users to manipulate any input which might find its way outside of the web server's slightly protected environment. What this means is that programs that call operating system programs (for example calls to system in R) should generally be avoided.

Now that we've seen what's going on behind the scenes, we can start to look at the facilities in R that will allow us to write CGI scripts. The CGIwithR library provides a "wrapper" script that sets up an environment in which R can run. The most important function of the CGIwithR library is to gather all the information that the web browser has sent to the web server (regardless of how it was sent) and to make it easily available inside of R. The CGIwithR library takes all of the CGI variables (defined in the forms that a user has responded to), and places them in a list called formData, which can be manipulated like any other list in R. Note that the library resolves all issues regarding which method (get or post) was used in the form that generated the variables, and that the formData list is the only object you need to look at to get the values of the CGI variables defined in the form.

Once your program has received the information from a user's web browser, any printing statements that you make in your program will be directed back to the browser. This means that you need to follow the rules of HTML when you are displaying results, not the ordinary rules that would apply for a regular R session. The cat function is very useful in this regard, because it displays exactly what you tell it to, without any element numbering or formatting. If you know the basics of HTML, you may prefer to simply use cat to generate most or all of the HTML your CGI script creates. (There is a library called R2HTML that automates the "HTML-ization" of R objects, and you may want to study it on your own, but we won't be using it here.) Here are some ideas for generating appropriate output in CGI programs written with R.

• The CGIwithR library provides a few helper functions that make generating useful HTML much easier:

– HTMLheader - This function, called with no arguments, prints the appropriate information at the beginning of your output so that all browsers will realize that they're receiving HTML. It's recommended that you call this function before you produce any output in a CGI script.

– tag and untag - These functions accept unquoted arguments and produce either the opening or closing tag required for most HTML constructs. For example, a large title line can be specified in HTML by beginning with <h1> and ending with </h1>. You could generate a title using tag and untag as follows:

  tag(h1)
  cat("Here's my title")
  untag(h1)

– br - To indicate a line break in HTML, the <br> tag must be used; HTML pays no attention to newline characters. The br function will print such a tag, so it can be inserted into your CGI program wherever you want to insert line breaks. An optional argument (defaulting to 1) tells br how many lines you want inserted.

• The HTML <pre> tag specifies that you don't want the browser that displays the output to perform any formatting. Suppose you want your CGI program to display the results of the summary function for a variable x in a browser window just as it would appear in a command-line R session. You could use the following:

  tag(pre)
  print(summary(x))
  untag(pre)

• The <p> tag is an alternative way of placing line breaks into your output.

13.3 A First CGI program with R

As a simple way of getting started with CGI programming, let's take the output from the first simple form that we created and see how the information gets translated from the CGI environment into the R environment. I'm assuming that your SCF account is s133xx and that you've followed the instructions at http://www.stat.berkeley.edu/classes/s133/projects/tech/Cgiprogs.html to prepare your account to use CGI scripting. Typically, you would put the page containing the HTML in your public_html directory. Suppose the following is placed in a file called form1.html in your public_html directory:
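The exact markup isn't critical; a form along these lines, with the text field named myvar and the action given relative to the directory containing form1.html, will work:

<html><body>
<form action="cgi-bin/R.cgi/test1.cgi" method="get">
Enter a value: <input type="text" name="myvar">
<input type="submit" value="Submit">
</form>
</body></html>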



Since file names in HTML are interpreted relative to the directory that the HTML resides in, we can refer to the CGI program relative to the cgi-bin directory. Notice that this will allow you to use the same form whether you're running on the SCF network or through a tunnel. In addition to the formData list that holds the values of the CGI variables, information is also passed to your CGI program through environmental variables. These variables can be accessed with the Sys.getenv function, but for our purposes here, we can use the showEnvironmentVariables function that's part of the CGIwithR package. Here's a simple program, test1.cgi, that will just print out some information about the formData list and the environmental variables that are transferred into the environment:

HTMLheader()
cat('Class of formData=',class(formData),' Mode of formData=',mode(formData),'<br>')
cat('Names of formData=',names(formData),'<br>')
tag(pre)
print(formData)
untag(pre)
showEnvironmentVariables()
cat('')

If we run the program (by pointing a browser at http://springer/~s133xx/form1.html, entering "hello, world" and pressing the button), here's what the output looks like:

Class of formData= list Mode of formData= list Names of formData= myvar

$myvar
[1] "hello, world"

SERVER_SIGNATURE      Apache/2.0.54 (Ubuntu) Server at springer Port 80
R_INCLUDE_DIR         /usr/local/linux/R-2.2.1/include
HTTP_KEEP_ALIVE       300
HTTP_USER_AGENT       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.12) Gecko/20051010 Galeon/1.3.21 (Debian package 1.3.21-6ubuntu3) (Ubuntu package 1.0.7)

SERVER_PORT           80
HTTP_HOST             springer
LD_LIBRARY_PATH       /usr/local/linux/R-2.2.1/lib:/usr/local/lib64:/usr/lib/gcc/x86_64-linux-gnu/3.4.5:/usr/X11R6/lib:/server/linux/j2sdk1.4.2_04/jre/lib/i386/client:/server/linux/j2sdk1.4.2_04/jre/lib/i386
DOCUMENT_ROOT         /mirror/data/pub/html/classes/s133
HTTP_ACCEPT_CHARSET   ISO-8859-1,utf-8;q=0.7,*;q=0.7
SCRIPT_FILENAME       /class/u/s133/s133xx/public_html/cgi-bin/R.cgi
REQUEST_URI           /~s133xx/cgi-bin/R.cgi/test1.cgi?myvar=hello%2C+world
SCRIPT_NAME           /~s133xx/cgi-bin/R.cgi
R_GSCMD               /usr/bin/gs
HTTP_CONNECTION       keep-alive
PATH_INFO             /test1.cgi
REMOTE_PORT           48242
PATH                  /usr/local/bin:/usr/bin:/bin
R_LIBS
PWD                   /class/u/s133/s133xx/public_html/cgi-bin
SERVER_ADMIN          webmaster@localhost
R_SHARE_DIR           /usr/local/linux/R-2.2.1/share
LANG
HTTP_ACCEPT_LANGUAGE  en
PATH_TRANSLATED       /class/u/s133/s133xx/public_html/cgi-bin/test1.cgi
HTTP_REFERER          http://springer/~s133xx/form1.html
HTTP_ACCEPT           text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
REMOTE_ADDR           128.32.135.22
SHLVL                 1
SERVER_NAME           springer
FORM_DATA             myvar=hello%2C+world
SERVER_SOFTWARE       Apache/2.0.54 (Ubuntu)
QUERY_STRING          myvar=hello%2C+world
SERVER_ADDR           128.32.135.22
GATEWAY_INTERFACE     CGI/1.1
R_HOME                /usr/local/linux/R-2.2.1
SERVER_PROTOCOL       HTTP/1.1
HTTP_ACCEPT_ENCODING  gzip,deflate
R_DOC_DIR             /usr/local/linux/R-2.2.1/doc
REQUEST_METHOD        GET
R_SESSION_TMPDIR      /tmp/RtmpPPrmxy
R_PLATFORM            x86_64-unknown-linux-gnu
R_PAPERSIZE           letter
R_PRINTCMD            lpr
R_LATEXCMD            /usr/bin/latex
R_DVIPSCMD            /usr/bin/dvips
R_MAKEINDEXCMD        /usr/bin/makeindex

R_RD4DVI              ae
R_RD4PDF              times,hyper
R_UNZIPCMD            /usr/bin/unzip
R_ZIPCMD              /usr/bin/zip
R_BROWSER             /usr/bin/firefox
EDITOR                vi
PAGER                 /usr/bin/less
R_PDFVIEWER           /usr/local/linux/bin/acroread
AWK                   gawk
EGREP                 grep -E
MAKE                  make
PERL                  /usr/bin/perl
TAR                   tar
LN_S                  -s
R_USE_AQUA_SUBDIRS    no

You may have noticed that the URL we used for the action= specification makes it look like R.cgi is a directory, not a file. What happens is that the R.cgi program gets invoked by the web server, and it examines the PATH_INFO variable to find the name of your script (test1.cgi in this case). It then loads up the formData list and calls your program. If you examine the information that our simple web echoing program displayed, you'll see that each of the headers that were passed to the program have been transferred to environmental variables with a prefix of HTTP_. Although you most likely won't need them, you can examine the list of environmental variables to find information that you might want to use in your CGI programs and then access the information by using Sys.getenv. The most important thing to notice is that the myvar CGI variable, defined in the input field of the HTML form, is available inside your R program, in unencoded form, in the element of formData with the name myvar. This is the mechanism by which all the CGI variables are transferred into the R environment.
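As an illustration of using the environmental variables, a CGI program could check which browser made a request with a line like the following (HTTP_USER_AGENT is one of the variables shown above):

useragent = Sys.getenv('HTTP_USER_AGENT')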

13.4 Data

The R.cgi script calls your R program in such a way that it doesn't automatically load any data into the R environment. So if you want to have data available to your CGI program, you'll need to explicitly get the data into R's environment. For reasons of efficiency, your program should always use the load function to load a previously saved binary version of the data you need. The most convenient place to store these objects is right in the cgi-bin directory from which your program will execute. Suppose we wish to create a CGI program that will accept the name of one of the variables from the wine data frame, and then display a summary of the data. Before you plan to run the script, the wine data should be saved in a simple R session that's started after you've changed your working directory to be your cgi-bin directory. The command to do this is

save(wine,file='wine.rda')

Next, we can create a form, which would be saved in the public_html directory. Here's a simple example, which we'll save in the file wine.html:

Summary for Wine Variables

Enter the name of the variable:
The dowine.cgi program would look like this:

load('wine.rda')
HTMLheader()
winevar = formData$winevar
tag(h1)
cat('Summary for wine$',winevar,sep='')
untag(h1)
tag(h2)
tag(pre)
print(summary(wine[[winevar]]))
untag(pre)
untag(h2)
cat('')

Here's the form:

Here’s the result of submitting the form:

13.5 Combo Forms

Of course, having the user remember the name of the variable they're interested in isn't a very user-friendly strategy, but the thought of manually preparing a form that lists all the variables isn't very appealing either. The problem can be solved by having the CGI program generate the form the first time it's called, and then processing the form when it's submitted back to the web server. If we call the CGI program directly (not through a form submission), the formData list will be empty, and we can use that condition to tell whether we need to generate the form or respond to it. Since R will be generating the form, it's very easy to have it provide a choice for each variable. For this example, let's use a drop down menu that will display the names of each variable. Here's a program that will both generate the form and respond to it:

if(length(formData) == 0){
   HTMLheader()
   tag(h1)
   cat('Summary for Wine Variables')
   untag(h1)
   # The HTML strings below are a reconstruction; the essential structure is a
   # drop down menu (a select element named winevar) listing the wine
   # variables, followed by a submit button.
   load('wine.rda')
   cat('<form action="dowine1.cgi">')
   cat('Choose the variable:')
   cat('<select name="winevar">')
   sapply(names(wine),function(x)cat('<option>',x,'</option>\n'))
   cat('</select>')
   cat('<input type="submit" value="Run">')
   cat('</form>')
} else {
   load('wine.rda')
   HTMLheader()
   winevar = formData$winevar
   tag(h1)
   cat('Summary for wine$',winevar,sep='')
   untag(h1)
   tag(h2)
   tag(pre)
   print(summary(wine[[winevar]]))
   untag(pre)
   untag(h2)
   cat('')
}

One very important thing to notice if you use this approach – the action= argument should specify only the name of the program, without the usual R.cgi/; since R is already calling your program, it thinks it’s in the R.cgi “directory”. Here’s the result of calling the program directly:

and here’s the result after making a choice:

13.6 Graphs

Certainly one of the most useful functions of CGI scripting with R is to display graphics generated by R, based on a user’s choices as specified on a web form. This would provide a simple way to allow people unfamiliar with R to produce attractive graphs; if a means is provided for data input, and enough options are provided through checkboxes, drop-down menus, radiobuttons, etc, a complete web-based graphing solution could be developed. To properly create and display graphics with a CGI program, it’s necessary to understand the difference between the internal paths (which your R program will see) and the external paths (which are the addresses typed into the browser’s address field.) For example, the way the class webserver is configured, the directory into which you would put HTML pages is (once again assuming your SCF id is s133xx):

/class/u/s133/s133xx/public_html/

This directory provides a convenient place to place graphics generated by your scripts. To the outside world, this directory would be indicated as:

http://springer/~s133xx/

or

http://localhost:8080/~s133xx/

So as far as the webserver is concerned (i.e. the way the outside world would find your files through a URL), the directory is known as

/~s133xx/

To create graphics from your CGI script, you first create a variable called graphDir and set it equal to the full internal name of the directory into which you'll write your graphs. In our example it would be /class/u/s133/s133xx/public_html/. Then use the webPNG function, specifying the name (without any leading directories) that you want to use for your graph. In order to generate the appropriate HTML so that your image will be displayed, you can use the img function of the CGIwithR library. This function takes two arguments. The first is the name of the graphic you produced via webPNG, and the second is called graphURLroot, and should be set to the "outside" view of your public_html directory, namely /~s133xx/. (Note the trailing slashes in both the graphDir and graphURLroot; they are required.) To illustrate, let's create a simple CGI program that will generate some random data and create a conditioning plot containing histograms.

library(lattice)
HTMLheader()
x = data.frame(z = rnorm(1000),
               g = factor(sample(1:5,size=1000,replace=TRUE)))
graphDir='/class/u/s133/s133xx/public_html/'

279 cat("Now I’m going to plot some histograms:
") webPNG(file=’hist.png’) histogram(~z|g,data=x) invisible(dev.off()) img(src=’hist.png’,graphURLroot=’/~s133xx/’) cat("")

The size of the plot can be controlled by passing width= and height= arguments to webPNG; the units for these arguments are pixels. If you are using lattice graphics and your plot does not appear, try passing the call to the lattice function to the print function. Notice the call to dev.off; without it, your graph may not be properly terminated, and only some (or possibly none) of the graph will be displayed.
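For example, a version of the histogram call above with an explicit size and an explicit print would look like this (the dimensions are arbitrary):

webPNG(file='hist.png',width=600,height=400)
print(histogram(~z|g,data=x))
invisible(dev.off())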

13.7 Hidden Variables

Suppose we have a web page that displays a choice of data frames for possible further analysis. Once a data frame is chosen, another page could display the variables that are available from that data frame, and a choice of plots could be provided. Remember that each time our CGI program is called, a new R process begins. So how can we “remember” the data set name in between invocations of our program? The answer is to use hidden variables. Any time you create an HTML form, you can create as many hidden variables as you need to store information that needs to be available in the next step of processing. These hidden variables are exactly like any other CGI variable, but there is no visible sign of the variable on the form that is displayed in the user’s browser. To create a hidden variable, use code like the following:
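A hidden variable is just another input element; the name and value here are placeholders:

<input type="hidden" name="myvar" value="somevalue">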

Here’s an implementation of a program that looks in the current directory for any files with an extension of .rda, provides a drop down menu of data set names, then allows a choice of variables, and finally produces the requested plot:

HTMLheader()
if(length(formData) == 0){
   # First invocation: offer a choice of the available data frames.
   # (The HTML strings throughout this listing are reconstructions; what
   # matters is the structure: a menu named dataset, then choices named xvar
   # and yvar along with a hidden variable named set, and finally the plot.)
   datasources = list.files('.',pattern='\\.rda$')
   datasources = sub('\\.rda$','',datasources)
   cat('<form action="doplot.cgi">Choose a data set: <select name="dataset">\n')
   sapply(datasources,function(i)cat('<option>',i,'</option>\n'))
   cat('</select><input type="submit" value="Go"></form>\n')
} else if('dataset' %in% names(formData)){
   # Second invocation: a data set was chosen; offer its variables, and pass
   # the data set name along in a hidden variable called set.
   dataset = formData$dataset
   dataset = gsub(' ','',dataset)
   load(paste(dataset,'.rda',sep=''))
   cat('<form action="doplot.cgi">\n')
   cat('<input type="hidden" name="set" value="',dataset,'">\n')
   cat('X-variable:<br>\n')
   sapply(names(get(dataset)),function(i)
          cat('<input type="radio" name="xvar" value="',i,'">',i,'<br>\n'))
   cat('Y-variable:<br>\n')
   sapply(names(get(dataset)),function(i)
          cat('<input type="radio" name="yvar" value="',i,'">',i,'<br>\n'))
   cat('<input type="submit" value="Plot">\n')
   cat('</form>')
} else{
   # Final invocation: both variables are available, so draw the plot.
   dataset = gsub(' ','',formData$set)
   load(paste(dataset,'.rda',sep=''))
   xvar = gsub(' ','',formData$xvar)
   yvar = gsub(' ','',formData$yvar)
   graphDir = '/home/spector/public_html/'
   webPNG(file='theplot.png',graphDir=graphDir)
   thedata = get(dataset)
   plot(thedata[[xvar]],thedata[[yvar]],xlab=xvar,ylab=yvar)
   img(src='theplot.png',graphURLroot='/~spector/')
   invisible(dev.off())
}
cat('')

This program has three sections: the first displays the initial form showing the data frame names (invoked when no formData is available); the second displays the variable choices (invoked when the dataset variable is specified); and the third, invoked when formData is available but the dataset variable is not defined, draws the requested plot.

13.8 Outgoing HTTP Headers

We’ve already seen that when a web browser makes a request to a web server, it sends a series of headers before the actual content (if any). The web server also sends headers to the browser, but up until now we’ve let the R.cgi wrapper script take care of that detail for us. In most cases, the only header that needs to be sent to a browser is one that informs the browser that we’re sending it HTML (as opposed to, say an image or other binary file). That header looks like this:

Content-type: text/html

and it would be followed by a completely blank line to signal the end of the headers. The R.cgi script examines what you're about to send to the web browser, and, if it doesn't find the "Content-type" line, it inserts it before it sends your output to the browser. Thus, if you do insert that line, you are taking responsibility for the outgoing headers, and, if desired, you can add additional ones. Just about the only header line that you might consider adding is one that specifies the value of a cookie to the browser. Cookies are small pieces of text, associated with a particular website, that are stored by a browser, and sent to web servers if they match the domain and possibly the path of the URL that initially set them. There are two types of cookies: session cookies, which expire when a particular browser session is ended and the browser is shut down, and persistent cookies, that are stored in a text file on the computer on which the browser is running, and will expire at a date specified in the header that defined the cookie. For this example, we'll create a session cookie, and then access it through a different script. If every web transaction had a form associated with it, we could use hidden CGI variables to do much of the work that cookies do, but, since they're stored on the user's computer, they are more reliable, and don't require any special programming. Here's an example of a script that sets a cookie:

if(length(formData) == 0){
   HTMLheader()
   cat("What is your name?")
   # The form HTML here is a reconstruction: a text field named name,
   # submitted back to this same program.
   cat('<form action="setcookie.cgi">\n')
   cat('<input type="text" name="name"> <input type="submit" value="Submit">\n')
   cat('</form>\n')
} else if('name' %in% names(formData)){
   name = formData$name
   cat("Content-type: text/html\nSet-Cookie: thename=",name,
       "; path=/~s133xx/\n\n",sep='')
   cat("Hello there, ",name)
}

cat(’’)

Since everyone in class is sharing the same webserver, I've added a path= specification to the cookie header. For this class, it probably is a good idea to prevent people from getting cookies set by other programs. Note the two newlines at the end of the header line – these are essential to make sure that the browser understands that the headers are done and the content is following. If you want to create persistent cookies, you need to add an expires= specification to the Set-Cookie header. The format of the expiration time must be followed precisely; in particular, the parts of the date must be separated by dashes, and the only allowable time zone is GMT. Here's an example of a header containing a variable, path and expiration date:

Set-Cookie: thename=Somebody; path=/~s133xx/; expires=Monday, 09-May-10 00:00:00 GMT

Now let's look at a program that will retrieve an already set cookie. When a browser recognizes a domain/path/time combination for which it has an active cookie, it sends it back to the webserver in a Cookie: header, not as a CGI variable. The format of the cookie is name=value, similar to the format in which CGI variables are transmitted. This means that we'll need to use Sys.getenv to access the environmental variable called HTTP_COOKIE.

HTMLheader()
cookie = Sys.getenv('HTTP_COOKIE')
name = gsub('^ *thename=(.*)$','\\1',cookie)
cat('Welcome back, ',name)
cat('')

Notice that you can only access the cookies in a CGI program, not in ordinary HTML, but you don't need any form elements to get the cookie.

13.9 Creating Pretty Output

Since the output created by CGI programs is interpreted by the browser as HTML, we can use any HTML commands by simply having our program generate the necessary HTML statements. One simple way of organizing output in HTML is the HTML table. A table begins with the string <table>, and ends with </table>. Each row of the table begins with <tr>, and ends with </tr>; each element within a row begins with <td> and ends with </td>. To specify headings, the th tag can be used in place of td. This suggests the following function to produce one row of an HTML table:

makerow = function(x,tag='td'){
   st = paste('<',tag,'>',sep='')
   end = paste('</',tag,'>',sep='')
   cat(paste(paste('<tr>',st,sep=''),
             paste(x,collapse=paste(end,st,sep='')),
             paste(end,'</tr>',sep='')),"\n")
}

To print an entire data frame, we can first print the names as a header line, then use apply to print the body of the data frame:

dftable = function(df){
   cat('<table>')
   makerow(names(df),tag='th')
   apply(df,1,makerow)
   cat('</table>')
}

An example of using these functions will be presented in the next section.

13.10 File Upload

We've already seen that an input element with type=file will create an entry field and browse button to allow a user to specify a file for upload. In the following program, we'll create such a field to allow a user to specify a local comma-separated file which will then be read into R, and displayed as a table. File upload using the CGIwithR library simply places the content of the uploaded file into a character string in the formData list corresponding to the name of the CGI variable specified in the HTML form. To treat this character string as a file, we can use the textConnection function. The following program will upload a comma-separated file, which will then be read by read.csv, and displayed using the dftable function from the previous section:

HTMLheader()
if(length(formData) == 0){
   # The HTML strings here are reconstructions: the form just needs a file
   # field named thefile, submitted with method post and
   # enctype="multipart/form-data".
   cat('<form action="readcsv.cgi" method="post" enctype="multipart/form-data">\n')
   cat('<input type="file" name="thefile"><br>')
   cat('<input type="submit" value="Upload">')
   cat('</form>')
} else{
   makerow = function(x,tag='td'){
      st = paste('<',tag,'>',sep='')
      end = paste('</',tag,'>',sep='')
      cat(paste(paste('<tr>',st,sep=''),
                paste(x,collapse=paste(end,st,sep='')),
                paste(end,'</tr>',sep='')),"\n")
   }
   dftable = function(df){
      cat('<table>')
      makerow(names(df),tag='th')
      apply(df,1,makerow)
      cat('</table>')
   }
   txtcon = textConnection(formData$thefile)
   df = read.csv(txtcon)
   dftable(df)
}
cat('')

13.11 Debugging CGI Programs

The first step in debugging an R program is to make sure that there are no obvious syntax errors. This can be done easily by changing directories to the location of your CGI programs, running R and using the source command to execute your program. If you see any syntax

errors, they should be corrected before attempting to run your program through the web server. You can simulate input from a form by creating a list called formData and loading appropriate elements into named values in that list. When a program is run as a CGI program through a webserver, the program's standard error, which contains error messages, is normally routed to the webserver's error logs. Unless you are the owner of the webserver, these logs are usually not readable. To redirect error messages to your browser (which represents the standard output stream of a CGI program), you can use the following command inside your CGI program:

sink(file=stdout(),type='message')

Remember that the error messages will not be displayed using HTML, so they may be difficult to read. If simulating your web data through an artificially constructed formData list is not sufficient to resolve problems in getting your program to run properly through the webserver, you can have a form generate the formData list, and save it to a file; then when you are testing the program offline, you can load that copy of formData, giving you an interactive session with the same data as would be generated by the web page.
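For example, a quick offline test of the dowine.cgi script from earlier in the chapter might look something like this (the variable name is whatever your form would send):

> library(CGIwithR)                    # provides HTMLheader, tag, untag, etc.
> formData = list(winevar='Alcohol')   # simulate what the form would send
> source('dowine.cgi')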

Chapter 14

Smoothers

14.1 Smoothers

In many of the graphs we've looked at, we added a straight line representing the best linear regression line that went through the data we were plotting. Such lines can be very helpful when there are lots of points, or when outliers might detract from our seeing a relationship among the points. But plotting the best linear regression line has some limitations. For one thing, the regression has to fit all the data, so finding a good regression fit is often a compromise between being true to the data in any given range, and trying to come up with a single line that does reasonably well throughout the entire range. For some data, this may be appropriate. For example, if we know that two variables really follow a linear relationship, then we'd have to assume that deviations from that relationship are just noise, and the best straight line would be a meaningful way to display their relationship on a graph. However, situations like that are not that common. To come up with a way of visualizing relationships between two variables without resorting to regression lines, statisticians and mathematicians have developed techniques for smoothing curves. Essentially this means drawing lines through the points based only on other points from the surrounding neighborhood, not from the entire set of points. There are many different types of smoothers available, and most of them offer an option that controls how much smoothing they will do as well as options to control the basic methods that they use, so it's usually possible to find a smoother that will work well for a particular set of data.

14.2 Kernel Smoothers

Kernel smoothers work by forming a weighted average of all the y-values corresponding to points whose x-values are close to the x-value of a point being plotted. The function that defines the weights is known as a kernel, and the number of points involved in the weighted average is based on a parameter known as the bandwidth. The default kernel is a box function; in other words, it simply averages together y-values which are within the specified bandwidth of a given x-value, and uses that average as the y-value for the x-value in question. With a very tiny bandwidth, this corresponds to a “connect-the-dots” type of drawing. With a very large bandwidth, it will basically estimate every y-value as the mean of all the y-values. However, even when the bandwidth is carefully chosen, using the box kernel rarely will result in a truly smooth graph. For example, consider a plot of OD.Ratio versus Proanthocyanins from the wine data set that we’ve used in previous examples. The following code produces a plot of the variables, and superimposes a line representing a box kernel smooth with the default bandwidth:

> plot(wine$Proanthocyanins,wine$OD.Ratio)
> lines(ksmooth(wine$Proanthocyanins,wine$OD.Ratio))

Here’s the graph:

Notice how choppy the line is, especially where there isn't much data. That's because the box kernel is too extreme – it either adds in a point or not. So using the box kernel is like stacking up a bunch of square boxes around each point, and we don't really get a smooth result. More commonly, kernels will have a maximum at distances that are very small, and will decrease gradually as the (absolute value of the) distance from the center of the kernel increases. This means that nearby points will have lots of influence on the weighted estimate that will be plotted, but as we move away from a particular point, the neighboring points will have less and less influence. We can modify how many points are considered through the bandwidth – including more points tends to give smoother curves that don't respond as well to local variation, while decreasing the bandwidth tends to make the curve look "choppier". One of the most common kernels used in smoothing is the Gaussian or normal kernel. This kernel is the familiar "bell curve" – largest in the middle (corresponding in this case to distances of zero from a particular point), and gradually decreasing over its supported range. The width of that range is determined by the bandwidth when using a kernel smoother. Here's the Proanthocyanins vs. OD.Ratio plot, smoothed with a normal kernel using the default bandwidth:
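This smooth can be produced by specifying the kernel in the call to ksmooth; for example:

> plot(wine$Proanthocyanins,wine$OD.Ratio)
> lines(ksmooth(wine$Proanthocyanins,wine$OD.Ratio,kernel='normal'))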

Notice the change in the line when switching to the normal kernel; the line is now smooth, and we can see a linear relationship that holds up until a Proanthocyanin concentration of about 2. The argument that controls the size of the neighborhood that's used by ksmooth to estimate the smoothed value at each point is called bandwidth. We can examine the effect of changing the bandwidth as follows:

kplotfn = function(bw){
   plot(wine$Proanthocyanins,wine$OD.Ratio,main=bw)
   lines(ksmooth(wine$Proanthocyanins,wine$OD.Ratio,bandwidth=bw),col='green')
}
bws = seq(.1,1,by=.1)
par(mfrow=c(5,2),mar=c(2,4,2,1)+.1)
sapply(bws,kplotfn)

After adjusting the margins, the plot we got looks something like a lattice plot. But how could we produce an actual lattice plot with this data? We would need a data frame with x and y values for each bandwidth. What does ksmooth return?

> result = ksmooth(wine$Proanthocyanins,wine$OD.Ratio)
> class(result)
[1] "list"
> names(result)
[1] "x" "y"

Since the lengths of x and y must be equal, we can convert the output to a data frame directly. Let's write a function which will generate a data frame with x, y and bandwidth values for a single bandwidth:

kfun = function(bw)
   data.frame(bw=bw,as.data.frame(ksmooth(wine$Proanthocyanins,wine$OD.Ratio,bandwidth=bw)))

Notice that R realized it had to repeat the single bw value to make it compatible with x and y. As always, we should test the function:

> result = kfun(.5)
> head(result)
   bw         x        y
1 0.5 0.4100000 1.881429
2 0.5 0.4279096 1.881429
3 0.5 0.4458192 1.812500
4 0.5 0.4637288 1.812500
5 0.5 0.4816384 1.879000
6 0.5 0.4995480 1.879000

Now we can create a list of data frames, one for each bandwidth:

> frames = lapply(seq(.1,1,by=.1),kfun)
> allbws = do.call(rbind,frames)
> dim(allbws)
[1] 1780    3

Notice the use of the do.call function, which is used here to call rbind with the data frames for all of the different bandwidths. To get both points and lines on each panel, we can create a custom panel function:

mypanel = function(x,y,...){
   panel.xyplot(wine$Proanthocyanins,wine$OD.Ratio)
   panel.xyplot(x,y,type='l',col='red')
}
xyplot(y~x|factor(bw),data=allbws,type='l',as.table=TRUE,layout=c(2,5),panel=mypanel)

The resulting plot appears below:

14.3 Locally Weighted Regression Smoothers

Another approach that is often used to smooth curves is locally weighted regression. Instead of taking a weighted average of y-values near the x-values we want to plot, the nearby points are used in a (usually quadratic) weighted regression, and predicted values from these local regressions are used as the y-values that are plotted. The lowess function in R implements this technique by using the reciprocal of the residuals of successive fits as the weights, downgrading those points that don’t contribute to a smooth fit. In the lowess function, the argument f= specifies the fraction of the data to be used in the local regressions. Specifying a larger value results in a smoother curve.

To illustrate, consider a plot of literacy versus phys, the number of physicians per 100000 people, from the world data set that we've used in previous examples. The following code produces a plot of the data with a lowess smoothed curve superimposed:

> plot(world$literacy,world$phys)
> lines(lowess(world$literacy,world$phys))

The graph appears below:

The argument to lowess that controls the level of smoothing is f, the fraction of the data which will be used in the local regressions. Let's compare the results of smoothing the literacy/physician curve using different values of f:

lplotfn = function(f){
   plot(world$literacy,world$phys,main=f)
   lines(lowess(world$literacy,world$phys,f=f),col='orange')
}
fs = seq(.1,1,by=.1)
par(mfrow=c(5,2),mar=c(2,4,2,1)+.1)
sapply(fs,lplotfn)

For this particular data set, there's not much difference in the appearance of the smoothed curve, although there is a bit of noise when using the lowest value of f.

14.4 Spline Smoothers

Another type of smoothing is known as spline smoothing, named after a tool formerly used by draftsmen. A spline is a flexible piece of metal (usually lead) which could be used as a guide for drawing smooth curves. A set of points (known as knots) would be selected, and the spline would be held down at a particular x,y point, then bent to go through the next point, and so on. Due to the flexibility of the metal, this process would result in a smooth curve through the points. Mathematically, the process can be reproduced by choosing the knot points and using (usually cubic) regression to estimate points in between the knots, and using calculus to make sure that the curve is smooth whenever the individual regression lines are joined together. The smooth.spline function in R performs these operations. The degree of smoothness is controlled by an argument called spar=, which usually ranges between 0 and 1. To illustrate, consider a data set consisting of the wheat production of the United States from 1910 to 2004. The data set can be found at http://www.stat.berkeley.edu/classes/s133/data/wheat.txt. The following lines will produce a plot of the data, and superimpose a spline smooth.

> wheat = read.table('http://www.stat.berkeley.edu/classes/s133/data/wheat.txt',
+                    header=TRUE)
> plot(wheat$year,wheat$production)
> lines(smooth.spline(wheat$year,wheat$production))

Here's the result:

The amount of smoothing that smooth.spline applies is controlled by the argument spar. We can examine the effect of spar by producing several plots, each with a different value of spar.

plotfn = function(spar){
   plot(wheat$year,wheat$production,main=spar)
   lines(smooth.spline(wheat$year,wheat$production,spar=spar),col='red')
}
spars = seq(.1,1,by=.1)
par(mfrow=c(5,2),mar=c(2,4,2,1)+.1)
sapply(spars,plotfn)

The plot appears below:

14.5 Supersmoother

While most smoothers require specification of a bandwidth, fraction of data, or level of smoothing, supersmoother is different in that it figures these things out for itself. Thus, it's an excellent choice for situations where smoothing needs to be done without any user intervention. Supersmoother works by performing lots of simple local regression smooths, and, at each x-value it uses those smooths to decide the best y-value to use. In R, supersmoother is made available through the supsmu function. To illustrate, consider the car data which we used earlier when we were studying cluster analysis. The following lines produce a plot of weight versus MPG, and superimpose a supersmoother line.

> plot(cars$Weight,cars$MPG)
> lines(supsmu(cars$Weight,cars$MPG))

The plot appears below:

14.6 Smoothers with Lattice Plots

When working with lattice graphics, we've already seen the use of panel.lmline, which displays the best regression line in each panel of a lattice plot. A similar function, panel.loess, is available to superimpose a locally weighted regression smoother in each panel of a plot. As a simple illustration, consider the built-in Orange data set, which has information about the age and circumference of several orange trees. First, let's look at a plot with the best regression line superimposed on the plot of age versus circumference for each Tree:

> library(lattice)
> xyplot(circumference~age|Tree,data=Orange,
+        panel=function(x,y,...){panel.xyplot(x,y,...);panel.lmline(x,y,...)})

Here’s the plot:

To create the same plot, but using the panel.loess function, we can use the following:

> xyplot(circumference~age|Tree,data=Orange,
+        panel=function(x,y,...){panel.xyplot(x,y,...);panel.loess(x,y,...)})

Here's how the plot looks:

If a panel. function doesn’t exist for a smoother you’d like to use, you can use the panel.lines function to plot it directly:

> xyplot(circumference~age|Tree,data=Orange,
+        panel=function(x,y,...){panel.xyplot(x,y,...);
+        z=supsmu(x,y);panel.lines(z$x,z$y,...)})

In this case supersmoother came closer to a straight line than lowess.

Chapter 15

Linear Regression

15.1 Linear Regression

Linear regression is a very popular procedure for modeling the value of one variable on the value(s) of one or more other variables. The variable that we’re trying to model or predict is known as the dependent variable, and the variables that we use to make predictions are known as independent variables, or covariates. Linear regression makes the assumption that the changes in the dependent variable can be modeled as a monotonic linear function of the independent variables; that is, we assume that a change of a certain amount in the independent variables will result in a change in the dependent variable, and the amount of change in the dependent variable is constant across the range of the independent variables. As a simple example, suppose we’re interested in the relationship between the horsepower of a car (the independent variable) and the miles per gallon of gas (MPG) of the car. When we fit a linear regression model, we’re saying that a change of one horsepower will have the same effect on the MPG regardless of the value of horsepower that we’re changing. In other words, a linear regression model would assume that if we had a car with 100 horsepower, and compared it to a car with 101 horsepower, we’d see the same difference in MPG as if we had a car with 300 horsepower and compared it to a car with 301 horsepower. Relationships like this often hold for a limited range of independent variable values, but the linear regression model assumes that it applies for the entire range of the independent variables. Even with these limitations, linear regression has proven itself to be a very valuable tool for modeling, and it’s widely used in many branches of research. There are also a variety of diagnostic measures, both numerical and graphical, that can help us to determine whether the regression is doing a good job, so it’s not unreasonable that many people use linear regression as their first tool when trying to model a variable of interest.

15.2 The lm command

The lm command uses the model-based formula interface that we’ve already seen in the context of lattice graphics. The dependent variable is placed on the left-hand side of the tilde (~), and the independent variables are placed on the right-hand side, joined together with plus signs (+). When you want to use all the variables in a data frame (except for the dependent variable) as independent variables, you can use a period (.) for the right-hand side of the equation. These models correspond to a mathematical model that looks like this:

y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip + e_i,    for i = 1, ..., n        (15.1)

The βs represent coefficients that measure how much the dependent variable changes for each unit of change in the independent variable, and are often refered to as the slopes. The term β0 is often known as the intercept. (To omit the intercept in a formula in R, insert the term -1.) The e’s represent the part of the observed dependent variable that can’t be explained by the regression model, and in order to do hypothesis testing, we assume that these errors follow a normal distribution. For each term in the model, we test the hypothesis

that the β corresponding to that term is equal to 0, against the alternative that the β is different from 0. To illustrate the use of the lm command, we'll construct a regression model to predict the level of Alcohol in the wine data set, using several of the other variables as independent variables. First, we'll run lm to create an lm object containing all the information about the regression:

> wine.lm = lm(Alcohol~Malic.acid+Alkalinity.ash+Proanthocyanins+Color.intensity+
+              OD.Ratio+Proline,data=wine[-1])

To see a very brief overview of the results, we can simply view the lm object:

> wine.lm

Call:
lm(formula = Alcohol ~ Malic.acid + Alkalinity.ash + Proanthocyanins +
    Color.intensity + OD.Ratio + Proline, data = wine[-1])

Coefficients:
    (Intercept)       Malic.acid   Alkalinity.ash  Proanthocyanins
      11.333283         0.114313        -0.032440        -0.129636
Color.intensity         OD.Ratio          Proline
       0.158520         0.225453         0.001136

To get more information about the model, the summary function can be called:

> summary(wine.lm)

Call:
lm(formula = Alcohol ~ Malic.acid + Alkalinity.ash + Proanthocyanins +
    Color.intensity + OD.Ratio + Proline, data = wine[-1])

Residuals:
      Min        1Q    Median        3Q       Max
-1.502326 -0.342254  0.001165  0.330049  1.693639

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     11.3332831  0.3943623  28.738  < 2e-16 ***
Malic.acid       0.1143127  0.0397878   2.873  0.00458 **
Alkalinity.ash  -0.0324405  0.0137533  -2.359  0.01947 *
Proanthocyanins -0.1296362  0.0846088  -1.532  0.12732
Color.intensity  0.1585201  0.0231627   6.844 1.32e-10 ***
OD.Ratio         0.2254528  0.0834109   2.703  0.00757 **
Proline          0.0011358  0.0001708   6.651 3.76e-10 ***
---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5295 on 171 degrees of freedom
Multiple R-Squared: 0.5889,     Adjusted R-squared: 0.5745
F-statistic: 40.83 on 6 and 171 DF,  p-value: < 2.2e-16

The probability levels in the last column of the bottom table are for testing the null hypotheses that the slopes for those variables are equal to 0; that is, we're testing the null hypothesis that changes in the independent variable will not result in a linear change in the dependent variable. We can see that most of the variables in the model do seem to have an effect on the dependent variable. One useful measure of the efficacy of a regression model is the multiple R-Squared statistic. This essentially measures the squared correlation of the dependent variable values with the values that the model would predict. The usual interpretation of this statistic is that it measures the fraction of variability in the data that is explained by the model, so values approaching 1 mean that the model is very effective. Since adding more variables to a model will always inflate the R-Squared value, many people prefer using the adjusted R-Squared value, which has been adjusted to account for the number of variables included in the model. When you pass a model object to the plot function, it will display one or more plots that the author of the model fitting function felt were appropriate for studying the effectiveness of the model. For the lm object, four plots are created by default:

1. A plot of residuals versus fitted (predicted) values - The residuals are the part of the dependent variable that the model couldn’t explain, and they are our best available estimate of the error term from the regression model. They’re calculated by subtracting the predicted value from the actual value of the dependent variable. Under the usual assumptions for the linear regression model, we don’t expect the variability of the residuals to change over the range of the dependent variable, so there shouldn’t be any discernable pattern to this plot. Note that outliers in the plot will be labeled by their observation number making it easy to track them down.

2. A normal quantile-quantile plot of the standardized residuals - For the probabilities we saw in the summary table to be accurate, we have assumed that the errors of the model follow a normal distribution. Thus, we’d expect a normal quantile-quantile plot of the residuals to follow a straight line. Deviations from a straight line could mean that the errors don’t follow a normal distribution.

3. A scale-location plot - This plot is similar to the residuals versus fitted values plot, but it uses the square root of the standardized residuals. Like the first plot, there should be no discernable pattern to the plot.

4. A Cook’s distance plot - Cook’s distance is a statistic that tries to identify points which have more influence than other points. Generally these are points that are distant from other points in the data, either for the dependent variable or one or

more independent variables. Each observation is represented as a line whose height is indicative of the value of Cook's distance for that observation. There are no hard and fast rules for interpreting Cook's distance, but large values (which will be labeled with their observation numbers) represent points which might require further investigation.
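All four plots can be produced on a single page with a call along these lines (the exact set of plots chosen by default depends on the version of R):

> par(mfrow=c(2,2))
> plot(wine.lm)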

Here are the four plots for the wine.lm object:

15.3 Using the model object

The design of the R modeling functions makes it very easy to do common tasks, regardless of the method that was used to model the data. We’ll use lm as an example, but most of these techniques will work for other modeling functions in R. We’ve already seen that the plot function will produce useful plots after a model is fit. Here are some of the other functions that are available to work with modeling objects. In each case, the modeling object is passed to the function as its first argument.

1. Coefficients - The coef function will return a vector containing the coefficients that the model estimated, in this case, the intercept and the slopes for each of the variables:

> coef(wine.lm)
    (Intercept)      Malic.acid  Alkalinity.ash Proanthocyanins Color.intensity
   11.333283116     0.114312670    -0.032440473    -0.129636226     0.158520051
       OD.Ratio         Proline
    0.225452840     0.001135776

2. Predicted Values - The predict function, called with no additional arguments, will return a vector of predicted values for the observations that were used in the modeling process. To get predicted values for observations not used in building the model, a data

frame containing the values of the observations can be passed to predict through the newdata= argument. The variables in the data frame passed to predict must have the same names as the variables used to build the model. For example, to get a predicted value of Alcohol for a mythical wine, we could use a statement like this:

> predict(wine.lm,newdata=data.frame(Malic.acid=2.3,Alkalinity.ash=19,
+         Proanthocyanins=1.6,Color.intensity=5.1,OD.Ratio=2.6,Proline=746.9))
[1] 12.88008

3. Residuals - The residuals function will return a vector of the residuals from a model. In addition, the summary function, which is usually used to display a printed summary of a model, often contains useful information. We can see what it contains by using the names function:

> names(summary(wine.lm))
 [1] "call"          "terms"         "residuals"     "coefficients"
 [5] "aliased"       "sigma"         "df"            "r.squared"
 [9] "adj.r.squared" "fstatistic"    "cov.unscaled"

15.4 Regression Diagnostics

Two statistics which have proven to be useful in identifying influential observations are Cook's distance and a statistic known as the hat statistic. Cook's distance is calculated for each observation by comparing the results of the regression with that observation included to the results when that particular observation is removed. Thus, it can find observations which are outliers with regard to both the dependent variables and the independent variables. In R, the cooks.distance function calculates the statistic given an lm model object. The hat statistic is based entirely on the independent variables of the model, so it focuses on observations which are distant from the others with regard to the independent variables in the model. In R, the lm.influence function will return a list, including a component named hat, which contains the hat statistic. For simple regressions, with just one independent variable, influential observations are usually at the far reaches of the range of either the dependent or independent variables. For example, for the wine.lm model, Proline was one of the independent variables which seemed effective in predicting alcohol. Let's perform a simple regression using this variable, and then plot the results, highlighting those points that had unusually high Cook's distances or hat statistic values:

> simple.lm = lm(Alcohol~Proline,data=wine)
> cooks = cooks.distance(simple.lm)
> hat = lm.influence(simple.lm)$hat
> par(mfrow=c(2,1))
> plot(wine$Proline,wine$Alcohol,col=ifelse(cooks > quantile(cooks,.90),'red','black'))
> plot(wine$Proline,wine$Alcohol,col=ifelse(hat > quantile(hat,.90),'blue','black'))

The plots are displayed below:

In the top graph, the points displayed in red represent the observations with large Cook's distances; in the bottom graph, the blue points are those with high hat statistic values. Of course, with more variables in the model, things are not so simple. Let's look at a plot of predicted values versus actual values for the full regression model for this data, using the same coloring conventions:

> cooks = cooks.distance(wine.lm)
> hat = lm.influence(wine.lm)$hat
> par(mfrow=c(2,1))

> plot(wine$Alcohol,predict(wine.lm),col=ifelse(cooks > quantile(cooks,.9),'red','black'))
> plot(wine$Alcohol,predict(wine.lm),col=ifelse(hat > quantile(hat,.9),'blue','black'))

Here are the plots:

The extreme Cook's distance points seem to congregate at the outer edges of the predicted values, but the extreme hat statistic points don't follow a simple pattern. In practice, many statisticians use the rule of thumb that Cook's distances bigger than the 10th percentile of an F distribution with p and n-p degrees of freedom represent potential problems, where n is the number of observations, and p is the number of parameters estimated. For the wine data, that cutoff point can be calculated as follows:

> qf(.1,7,178-7)
[1] 0.4022056

In fact, for this data set none of the Cook’s distances are greater than this value. For the hat statistic, a cutoff of 2 ∗ p/n has been proposed; for the wine data this corresponds to a value of 0.079. With the wine example, there are ten such points. Plotting each independent variable against the dependent variable, and highlighting the extreme points in orange helps to show where these points are:

> par(mfrow=c(3,2))
> sapply(names(coef(wine.lm)[-1]),
+        function(x)plot(wine[[x]],wine$Alcohol,
+                        col=ifelse(hat > .0786,'orange','black'),xlab=x))

Here are the plots:
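To check the cutoffs described above numerically, the flagged observations can simply be counted. This is just a sketch, assuming the wine.lm model used earlier in this section:

> p = length(coef(wine.lm))    # number of estimated parameters, including the intercept
> n = nrow(wine)
> sum(cooks.distance(wine.lm) > qf(.1,p,n-p))   # points past the Cook's distance cutoff
> sum(lm.influence(wine.lm)$hat > 2*p/n)        # points past the hat statistic cutoff

The first count should be zero, and the second should match the ten points mentioned above.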

15.5 Collinearity

Another problem which might occur when using linear regression is known as collinearity. This problem occurs when the independent variables are so highly correlated that they contain redundant information, which confuses the regression process. When data is collinear, the standard errors of the parameter estimates get very large, and removing one or two variables may make the coefficients change dramatically. In addition, collinearity can mask important relationships in the data. The classic data set to illustrate collinearity is known as the Longley data set, available in R under the name longley. This data set contains a variety of measurements about the population of the US in an attempt to predict employment. Let's take a look at the result of regressing Employed against the other variables in the Longley data set:

> lreg = lm(Employed ~ .,data=longley)
> summary(lreg)

Call:
lm(formula = Employed ~ ., data = longley)

Residuals:
     Min       1Q   Median       3Q      Max
-0.41011 -0.15767 -0.02816  0.10155  0.45539

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.482e+03  8.904e+02  -3.911 0.003560 **
GNP.deflator  1.506e-02  8.492e-02   0.177 0.863141
GNP          -3.582e-02  3.349e-02  -1.070 0.312681
Unemployed   -2.020e-02  4.884e-03  -4.136 0.002535 **
Armed.Forces -1.033e-02  2.143e-03  -4.822 0.000944 ***
Population   -5.110e-02  2.261e-01  -0.226 0.826212
Year          1.829e+00  4.555e-01   4.016 0.003037 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3049 on 9 degrees of freedom
Multiple R-squared: 0.9955,     Adjusted R-squared: 0.9925
F-statistic: 330.3 on 6 and 9 DF,  p-value: 4.984e-10

On the surface, nothing seems wrong – in fact, with an adjusted R-squared of .9925, it seems great. We can see the problem with the data by looking at the pairwise scatterplots, using the pairs function:

> pairs(longley)

Here’s the plot:

Many of the variables seem to be correlated with each other, so it's difficult to see which is causing the problem. A statistic known as VIF (Variance Inflation Factor) can be very useful in situations like this. In R, the vif function in the car package will provide this statistic. Before using vif on the Longley data, let's look at the wine data we used previously:

> wine.lm = lm(Alcohol~.,data=subset(wine,select=-Cultivar))
> library(car)
> vif(wine.lm)
     Malic.acid             Ash  Alkalinity.ash       Magnesium         Phenols
       1.575916        2.180108        2.179282        1.417855        4.330552
     Flavanoids      NF.phenols Proanthocyanins Color.intensity             Hue
       7.029040        1.793883        1.947243        2.493007        2.542273
       OD.Ratio         Proline
       3.736818        2.441810
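The values that vif reports can be reproduced by hand: the VIF for a given predictor is 1/(1 - R-squared), where the R-squared comes from regressing that predictor on all of the remaining predictors. A quick sketch for Flavanoids, which should come close to the value of about 7 shown above:

> r2 = summary(lm(Flavanoids ~ .,data=subset(wine,select=-c(Cultivar,Alcohol))))$r.squared
> 1/(1 - r2)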

None of the inflation factors is bigger than 10, which indicates that collinearity is not a problem for this data set; this is confirmed by looking at the pairs plot for the wine data:

There does seem to be a linear relationship between Flavanoids and Phenols – not surprisingly those two variables have the highest VIFs. Now let's return to the Longley data.

> vif(lreg)
GNP.deflator          GNP   Unemployed Armed.Forces   Population         Year
   135.53244   1788.51348     33.61889      3.58893    399.15102    758.98060

The two largest VIFs are for GNP and Year. Let's eliminate them from the model, and see how the VIFs change:

> lreg1 = lm(Employed ~ .,data=subset(longley,select=-c(GNP,Year)))
> summary(lreg1)

Call:
lm(formula = Employed ~ ., data = subset(longley, select = -c(GNP, Year)))

Residuals:
       Min         1Q     Median         3Q        Max
-0.6561730 -0.2576601 -0.0008123  0.1213544  1.2225443

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  13.781314   6.886470   2.001 0.070657 .
GNP.deflator  0.207046   0.081376   2.544 0.027270 *
Unemployed   -0.012412   0.002780  -4.465 0.000955 ***
Armed.Forces -0.005968   0.003325  -1.795 0.100170
Population    0.306601   0.123795   2.477 0.030755 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5671 on 11 degrees of freedom
Multiple R-squared: 0.9809,     Adjusted R-squared: 0.9739
F-statistic: 141.1 on 4 and 11 DF,  p-value: 2.26e-09

> vif(lreg1)
GNP.deflator   Unemployed Armed.Forces   Population
   35.970754     3.147600     2.497795    34.588299

The reduced model is probably more realistic than the full model, but GNP.deflator and Population are still highly correlated:

> with(longley,cor(GNP.deflator,Population))
[1] 0.9791634

Removing GNP.deflator results in a model that seems to make sense:

> lreg2 = lm(Employed ~ .,data=subset(longley,select=-c(GNP,Year,GNP.deflator)))
> summary(lreg2)

Call:
lm(formula = Employed ~ ., data = subset(longley, select = -c(GNP, Year, GNP.deflator)))

Residuals:
    Min      1Q  Median      3Q     Max
-1.3835 -0.2868 -0.1353  0.3596  1.3382

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.323091   4.211566  -0.314  0.75880
Unemployed   -0.012292   0.003354  -3.665  0.00324 **
Armed.Forces -0.001893   0.003516  -0.538  0.60019
Population    0.605146   0.047617  12.709 2.55e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6843 on 12 degrees of freedom
Multiple R-squared: 0.9696,     Adjusted R-squared: 0.962
F-statistic: 127.7 on 3 and 12 DF,  p-value: 2.272e-09

> vif(lreg2)
  Unemployed Armed.Forces   Population
    3.146686     1.918225     3.514335

Now let’s look at some alternatives to ordinary linear regression.

15.6 Generalized Additive Models (gam)

One of the most useful alternative methods to regression is known as a generalized additive model. Instead of fitting a single linear parameter to try to explain the relationship between independent variables and dependent variables, GAM models perform spline smooths on selected variables, and use these smoothed versions of the independent variables to try to explain the values of the dependent variables. To try to make the information from the analysis similar to the familiar lm output, an estimated number of degrees of freedom is calculated for each variable, based on how different the fitted spline smooth for that variable is from the strictly linear relationship that lm uses for prediction, and an F-statistic is produced for each independent variable to test whether the smoothed version of the variable made a significant contribution to the predicted value of the dependent variable. In R, the gam function is provided by the mgcv library. This library also provides the s function, which is used by gam to identify the variables that should be smoothed before they are used to predict the dependent variable. Without prior knowledge, it’s not unreasonable to try smoothing on all the variables. We can fit a gam model by using the same formula as we used with lm, passing each variable in the model to the s function:

> library(mgcv)
> wine.gam = gam(Alcohol~s(Malic.acid)+s(Alkalinity.ash)+
+                s(Proanthocyanins)+s(Color.intensity)+
+                s(OD.Ratio)+s(Proline),data=wine[-1])

> wine.gam

Family: gaussian
Link function: identity

Formula:
Alcohol ~ s(Malic.acid) + s(Alkalinity.ash) + s(Proanthocyanins) +
    s(Color.intensity) + s(OD.Ratio) + s(Proline)

Estimated degrees of freedom:
1 7.920717 3.492826 4.022189 1 3.567478  total = 22.00321

GCV score: 0.2314599

Like the lm function, gam provides a more familiar table when the summary method is invoked:

> summary(wine.gam)

Family: gaussian
Link function: identity

Formula:
Alcohol ~ s(Malic.acid) + s(Alkalinity.ash) + s(Proanthocyanins) +
    s(Color.intensity) + s(OD.Ratio) + s(Proline)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.00062    0.03376   385.1   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                     edf Est.rank      F  p-value
s(Malic.acid)      1.000    1.000 14.133 0.000240 ***
s(Alkalinity.ash)  7.921    9.000  4.403 3.84e-05 ***
s(Proanthocyanins) 3.493    9.000  1.844 0.064387 .
s(Color.intensity) 4.022    9.000  8.391 3.66e-10 ***
s(OD.Ratio)        1.000    1.000  2.246 0.135990
s(Proline)         3.567    9.000  5.508 1.42e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) = 0.692 Deviance explained = 72.9%

GCV score = 0.23146  Scale est. = 0.20285  n = 178

The relative importance of the variables has changed somewhat from the linear regression results. Notice that the adjusted R-squared value is 0.692, as compared to 0.575 for linear regression, showing an improvement in prediction by using the smoothed versions of the independent variables.

Applying the plot function to a gam model is often the most useful part of the analysis. The plots produced show how the independent variable was smoothed before fitting, so a straight (or nearly straight) line for a particular variable means a truly linear relationship was found, while deviations from linearity describe the nature of the non-linear relationships that exist. Here are the results of using plot on our gam model. For convenience, I've put all the plots in a single graphic; in practice, you might want to examine each plot separately. I've used the par function to adjust the margins so that the individual plots will be a little larger:

> par(mfrow=c(3,2),mar=c(2,4,1,2)+.1,oma=c(0,0,4,0),xpd=FALSE)
> plot(wine.gam)

Here’s the plot:

For the variables Malic.acid and OD.Ratio, the relationships do seem to be linear; this is supported by the fact that gam only used a single degree of freedom to fit these terms. For some of the other variables, it's clear that linear relationships hold only over a limited range of the data. The Alkalinity.ash plot is particularly interesting, but it may indicate oversmoothing.
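If there is any concern that the Alkalinity.ash smooth is being given too much (or too little) flexibility, the s function in mgcv accepts a k= argument that controls the dimension of the basis used for the smooth. The following is just a sketch of how the model above could be refit with a more restricted smooth for that one term:

> wine.gam2 = gam(Alcohol~s(Malic.acid)+s(Alkalinity.ash,k=4)+
+                 s(Proanthocyanins)+s(Color.intensity)+
+                 s(OD.Ratio)+s(Proline),data=wine[-1])
> summary(wine.gam2)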

15.7 Recursive Partitioning

We've already seen how recursive partitioning can be used for classification, but it can also be used for regression if the dependent variable passed to rpart is not a factor. When used for regression, rpart follows a similar strategy as for classification; each variable is tested for all possible splits, looking for large separation between the dependent variable values for one side of the split as opposed to the other. As is the case for classification, rpart presents its results as a tree, with terminal nodes representing the best prediction the model can provide. Here are the results of using recursive partitioning on the wine data frame to predict Alcohol. I'm using a period on the right hand side of the model to indicate that rpart should consider all of the variables in the data frame (except Cultivar):

> library(rpart)
> wine.rpart = rpart(Alcohol ~ . ,data=wine[-1])
> wine.rpart
n= 178

node), split, n, deviance, yval
      * denotes terminal node

 1) root 178 116.654000 13.00062
   2) Color.intensity< 3.325 50 9.161498 12.13980
     4) Ash>=2.41 13 1.269369 11.86846 *
     5) Ash< 2.41 37 6.598724 12.23514 *
   3) Color.intensity>=3.325 128 55.969350 13.33687
     6) Proline< 900 79 28.974900 13.05013
      12) Color.intensity< 8.315 61 21.586760 12.93197
        24) Proline< 722.5 43 14.291710 12.80209
          48) Malic.acid< 3.1 27 8.142067 12.62889
            96) NF.phenols>=0.33 14 2.388493 12.30929 *
            97) NF.phenols< 0.33 13 2.783477 12.97308 *
          49) Malic.acid>=3.1 16 3.972794 13.09437 *
        25) Proline>=722.5 18 4.837111 13.24222 *
      13) Color.intensity>=8.315 18 3.650294 13.45056 *
     7) Proline>=900 49 10.025970 13.79918
      14) Color.intensity< 4.44 10 0.787410 13.27700 *
      15) Color.intensity>=4.44 39 5.812631 13.93308 *

Since rpart doesn't actually estimate any coefficients, we can't produce a table of hypothesis tests as we did for lm or gam, but we can get a sort of multiple R-squared value by squaring the correlation between the true value of the dependent variable and the value that rpart predicts:

> cor(wine$Alcohol,predict(wine.rpart))^2

[1] 0.7248247

This unadjusted R-squared value is a little higher than the adjusted R-squared value from the gam model.
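The fitted tree is often easier to absorb graphically than as printed output. A quick sketch using the basic plotting methods provided by the rpart package:

> plot(wine.rpart)        # draw the tree structure
> text(wine.rpart,cex=.8) # add the split rules and node predictions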

15.8 Comparison of the 3 Methods

A very simple way to get an idea of how the three methods compare with each other is to make a plot of predicted versus actual values for the three methods, using a different color for each:

> plot(predict(wine.lm),wine$Alcohol,col='red',xlab='Predicted',ylab='Actual')
> points(predict(wine.gam),wine$Alcohol,col='blue')
> points(predict(wine.rpart),wine$Alcohol,col='green')
> legend('topleft',legend=c('lm','gam','rpart'),col=c('red','blue','green'),
+ pch=1,cex=.8)

Here’s the plot:
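To supplement the plot with a single summary number for each method, we can compute the squared correlation between the actual and predicted values for all three models, just as we did for rpart above (a sketch, assuming the three fitted objects are still available):

> sapply(list(lm=wine.lm,gam=wine.gam,rpart=wine.rpart),
+        function(m)cor(wine$Alcohol,predict(m))^2)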

Chapter 16

Analysis of Variance

16.1 Analysis of Variance

In its simplest form, analysis of variance (often abbreviated as ANOVA) can be thought of as a generalization of the t-test, because it allows us to test the hypothesis that the means of a dependent variable are the same for several groups, not just two as would be the case when using a t-test. This type of ANOVA is known as a one-way ANOVA.

In cases where there are multiple classification variables, more complex ANOVAs are possible. For example, suppose we have data on test scores for students from four schools, where three different teaching methods were used. This would describe a two-way ANOVA. In addition to asking whether the means for the different schools were different from each other, and whether the means for the different teaching methods were different from each other, we could also investigate whether the differences in teaching methods were different depending on which school we looked at. This last comparison is known as an interaction, and testing for interactions is one of the most important uses of analysis of variance.

Before getting to the specifics of ANOVA, it may be useful to ask why we perform an analysis of variance if our interest lies in the differences between means. If we were to concentrate on the differences between the means, we would have many different comparisons to make, and the number of comparisons would increase as we increased the number of groups we considered. Thus, we'd need different tests depending on how many groups we were looking at. The reasoning behind using variance to test for differences in means is based on the following idea: Suppose we have several groups of data, and we calculate their variance in two different ways. First, we put together all the data, and simply calculate its variance disregarding the groups from which the data arose. In other words, we evaluate the deviations of the data relative to the overall mean of the entire data set. Next, we calculate the variance by adding up the deviations around the mean of each of the groups. The idea of analysis of variance is that if the two variance calculations give us very similar results, then each of the group means must have been about the same, because using the group means to measure variation didn't result in much of a change from using the overall mean. But if the overall variance is bigger than the variance calculated using the group means, then at least one of the group means must have been different from the overall mean, so it's unlikely that the means of all the groups were the same. Using this approach, we only need to compare two values (the overall variance, and the variance calculated using each of the group means) to test if any of the means are different, regardless of how many groups we have.

To illustrate how looking at variances can tell us about differences in means, consider a data set with three groups, where the mean of the first group is 3, and the mean for the other groups is 1. We can generate a sample as follows:

> mydf = data.frame(group=rep(1:3,rep(10,3)),
+                   x=rnorm(30,mean=c(rep(3,10),rep(1,20))))

Under the null hypothesis of no differences among the means, we can center each set of data by the appropriate group mean, and then compare the data to the same data centered by the overall mean. In R, the ave function will return a vector the same length as its input, containing summary statistics calculated by grouping variables. Since ave accepts an unlimited number of grouping variables, we must identify the function that calculates the

statistic as the FUN= argument. Let's look at two histograms of the data, first centered by the overall mean, and then by the group means. Recall that under the null hypothesis, there should be no difference.

> ovall = mydf$x - mean(mydf$x)
> group = mydf$x - ave(mydf$x,mydf$group,FUN=mean)
> par(mfrow=c(2,1))
> hist(ovall,xlim=c(-2,2.5))
> hist(group,xlim=c(-2,2.5))
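The visual impression from the histograms can be quantified by comparing the sums of squared deviations for the two ways of centering (a small sketch using the vectors just created):

> sum(ovall^2)    # variability around the overall mean
> sum(group^2)    # variability around the group means

For this data, the first sum should be noticeably larger than the second, reflecting the differences among the group means.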

Notice how much more spread out the data is when we centered by the overall mean. To show that this isn’t a trick, let’s generate some data for which the means are all equal:

> mydf1 = data.frame(group=rep(1:3,rep(10,3)),x=rnorm(30))
> ovall = mydf1$x - mean(mydf1$x)
> group = mydf1$x - ave(mydf1$x,mydf1$group,FUN=mean)
> par(mfrow=c(2,1))
> hist(ovall,xlim=c(-2.5,3.2))
> hist(group,xlim=c(-2.5,3.2))

Notice how the two histograms are very similar. To formalize the idea of a one-way ANOVA, we have a data set with a dependent variable and a grouping variable. We assume that the observations are independent of each other, and the errors (that part of the data not explained by an observation's group mean) follow a normal distribution with the same variance for all the observations. The null hypothesis states that the means of all the groups are equal, against an alternative that at least one of the means differs from the others. We can test the null hypothesis by taking the ratio of the variance calculated in the two ways described above, and comparing it to an F distribution with appropriate degrees of freedom (more on that later).

In R, ANOVAs can be performed with the aov command. When you are performing an ANOVA in R, it's very important that all of the grouping variables involved in the ANOVA are converted to factors, or R will treat them as if they were just independent variables in a linear regression.

As a first example, consider once again the wine data frame. The Cultivar variable represents one of three different varieties of wine that have been studied. As a quick preliminary test, we can examine a dotplot of Alcohol versus Cultivar:

It does appear that there are some differences, even though there is overlap. We can test for these differences with an ANOVA:

> wine.aov = aov(Alcohol~Cultivar,data=wine)
> wine.aov
Call:
   aov(formula = Alcohol ~ Cultivar, data = wine)

Terms:
                Cultivar Residuals
Sum of Squares  70.79485  45.85918
Deg. of Freedom        2       175

Residual standard error: 0.5119106
Estimated effects may be unbalanced

> summary(wine.aov)
             Df Sum Sq Mean Sq F value    Pr(>F)
Cultivar      2 70.795  35.397  135.08 < 2.2e-16 ***
Residuals   175 45.859   0.262
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The summary function displays the ANOVA table, which is similar to that produced by most statistical software. It indicates that the differences among the means are statistically significant. To see the values for the means, we can use the aggregate function:

> aggregate(wine$Alcohol,wine['Cultivar'],mean)
  Cultivar        x
1        1 13.74475
2        2 12.27873
3        3 13.15375
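To connect the ANOVA table back to the two ways of measuring variability described at the start of this chapter, its entries can be reproduced by hand with the ave function (a sketch; the numbers should agree with the Sum of Squares and F value shown above):

> within = sum((wine$Alcohol - ave(wine$Alcohol,wine$Cultivar))^2)
> total = sum((wine$Alcohol - mean(wine$Alcohol))^2)
> between = total - within
> c(between=between,within=within)   # the two Sum of Squares entries
> (between/2)/(within/175)           # the F statistic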

The default plots from an aov object are the same as those for an lm object. They’re displayed below for the Alcohol/Cultivar ANOVA we just calculated:

16.2 Multiple Comparisons

In the previous example, notice that the test for Cultivar is simply answering the question "Are there any significant differences among the cultivars?". This is because the F-test which is used to determine significance is based on the two different ways of calculating the variance, not on any particular differences among the means. Having seen that there is a significant effect for Cultivar in the previous example, a natural question is "Which cultivars are different from each other?". One possibility would be to look at all possible t-tests between

the levels of the cultivar, i.e. do t-tests for 1 vs. 2, 1 vs. 3, and 2 vs. 3. This is a very bad idea for at least two reasons:

1. One of the main goals of ANOVA is to combine together all our data, so that we can more accurately estimate the residual variance of our model. In the previous example, notice that there were 175 degrees of freedom used to estimate the residual variance. Under the assumptions of ANOVA, the variance of the dependent variable doesn't change across the different levels of the independent variables, so we can (and should) use the estimate from the ANOVA for all our tests. When we use a t-test, we'll be estimating the residual variance only using the observations for the two groups we're comparing, so we'll have fewer degrees of freedom, and less power in determining differences.

2. When we're comparing several groups using t-tests, we have to look at all possible combinations among the groups. This will sometimes result in many tests, and we can no longer be confident that the probability level we use for the individual tests will hold up across all of those comparisons we're making. This is a well-known problem in statistics, and many techniques have been developed to adjust probability values to handle this case. However, these techniques tend to be quite conservative, and they may prevent us from seeing differences that really exist.

To see how probabilities get adjusted when many comparisons are made, consider a data set on the nitrogen levels in 6 varieties of clover. We wish to test the hypothesis that the nitrogen level of the different varieties of clover is the same.

> clover = read.table('http://www.stat.berkeley.edu/classes/s133/data/clover.txt',
+                     header=TRUE)
> clover.aov = aov(Nitrogen~Strain,data=clover)
> summary(clover.aov)
            Df Sum Sq Mean Sq F value    Pr(>F)
Strain       5 847.05  169.41  14.370 1.485e-06 ***
Residuals   24 282.93   11.79
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let's say that we want to look at all the possible t-tests among pairs of the 6 strains. First, we can use the combn function to show us all the possible 2-way combinations of the strains:

> combs = combn(as.character(unique(clover$Strain)),2)
> combs
     [,1]    [,2]    [,3]    [,4]     [,5]     [,6]    [,7]    [,8]
[1,] "3DOK1" "3DOK1" "3DOK1" "3DOK1"  "3DOK1"  "3DOK5" "3DOK5" "3DOK5"
[2,] "3DOK5" "3DOK4" "3DOK7" "3DOK13" "COMPOS" "3DOK4" "3DOK7" "3DOK13"
     [,9]     [,10]   [,11]    [,12]    [,13]    [,14]    [,15]
[1,] "3DOK5"  "3DOK4" "3DOK4"  "3DOK4"  "3DOK7"  "3DOK7"  "3DOK13"
[2,] "COMPOS" "3DOK7" "3DOK13" "COMPOS" "3DOK13" "COMPOS" "COMPOS"

Let's focus on the first column:

> x = combs[,1]
> tt = t.test(Nitrogen~Strain,data=clover[clover$Strain %in% x,])
> names(tt)
[1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
[6] "null.value"  "alternative" "method"      "data.name"

This suggests a function which would return the probability for each combination of strains:

> gettprob = function(x)t.test(Nitrogen~Strain, data=clover[clover$Strain %in% x,])$p.value

We can get the probabilities for all the tests, and combine them with the strain names for display:

> probs = data.frame(t(combs),probs=apply(combs,2,gettprob))
> probs
       X1     X2        probs
1   3DOK1  3DOK5 1.626608e-01
2   3DOK1  3DOK4 2.732478e-03
3   3DOK1  3DOK7 2.511696e-02
4   3DOK1 3DOK13 3.016445e-03
5   3DOK1 COMPOS 1.528480e-02
6   3DOK5  3DOK4 5.794178e-03
7   3DOK5  3DOK7 7.276336e-02
8   3DOK5 3DOK13 1.785048e-03
9   3DOK5 COMPOS 3.177169e-02
10  3DOK4  3DOK7 4.331464e-02
11  3DOK4 3DOK13 5.107291e-01
12  3DOK4 COMPOS 9.298460e-02
13  3DOK7 3DOK13 4.996374e-05
14  3DOK7 COMPOS 2.055216e-01
15 3DOK13 COMPOS 4.932466e-04

These probabilities are for the individual t-tests, each with an alpha level of 0.05, but that doesn’t guarantee that the experiment-wise alpha will be .05. We can use the p.adjust function to adjust these probabilities:

> probs = data.frame(probs,adj.prob=p.adjust(probs$probs,method='bonferroni'))
> probs
       X1     X2        probs    adj.prob
1   3DOK1  3DOK5 1.626608e-01 1.000000000
2   3DOK1  3DOK4 2.732478e-03 0.040987172
3   3DOK1  3DOK7 2.511696e-02 0.376754330
4   3DOK1 3DOK13 3.016445e-03 0.045246679
5   3DOK1 COMPOS 1.528480e-02 0.229272031
6   3DOK5  3DOK4 5.794178e-03 0.086912663
7   3DOK5  3DOK7 7.276336e-02 1.000000000
8   3DOK5 3DOK13 1.785048e-03 0.026775721
9   3DOK5 COMPOS 3.177169e-02 0.476575396
10  3DOK4  3DOK7 4.331464e-02 0.649719553
11  3DOK4 3DOK13 5.107291e-01 1.000000000
12  3DOK4 COMPOS 9.298460e-02 1.000000000
13  3DOK7 3DOK13 4.996374e-05 0.000749456
14  3DOK7 COMPOS 2.055216e-01 1.000000000
15 3DOK13 COMPOS 4.932466e-04 0.007398699

Notice that many of the comparisons that seemed significant when using the t-test are no longer significant. Plus, we didn't take advantage of the increased degrees of freedom. One technique that uses all the degrees of freedom of the combined test, while still correcting for the problem of multiple comparisons is known as Tukey's Honestly Significant Difference (HSD) test. The TukeyHSD function takes a model object and the name of a factor, and provides protected probability values for all the two-way comparisons of factor levels. Here's the output of TukeyHSD for the clover data:

> tclover = TukeyHSD(clover.aov,'Strain')
> tclover
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Nitrogen ~ Strain, data = clover)

$Strain
                 diff          lwr       upr     p adj
3DOK13-3DOK1  -15.56 -22.27416704 -8.845833 0.0000029
3DOK4-3DOK1   -14.18 -20.89416704 -7.465833 0.0000128
3DOK5-3DOK1    -4.84 -11.55416704  1.874167 0.2617111
3DOK7-3DOK1    -8.90 -15.61416704 -2.185833 0.0048849
COMPOS-3DOK1  -10.12 -16.83416704 -3.405833 0.0012341
3DOK4-3DOK13    1.38  -5.33416704  8.094167 0.9870716
3DOK5-3DOK13   10.72   4.00583296 17.434167 0.0006233
3DOK7-3DOK13    6.66  -0.05416704 13.374167 0.0527514
COMPOS-3DOK13   5.44  -1.27416704 12.154167 0.1621550
3DOK5-3DOK4     9.34   2.62583296 16.054167 0.0029837
3DOK7-3DOK4     5.28  -1.43416704 11.994167 0.1852490
COMPOS-3DOK4    4.06  -2.65416704 10.774167 0.4434643
3DOK7-3DOK5    -4.06 -10.77416704  2.654167 0.4434643
COMPOS-3DOK5   -5.28 -11.99416704  1.434167 0.1852490
COMPOS-3DOK7   -1.22  -7.93416704  5.494167 0.9926132

> class(tclover)
[1] "multicomp" "TukeyHSD"
> names(tclover)
[1] "Strain"
> class(tclover$Strain)
[1] "matrix"

These probabilities seem more reasonable. To combine these results with the previous ones, notice that tclover$Strain is a matrix, with row names indicating the comparisons being made. We can put similar row names on our earlier results and then merge them:

> row.names(probs) = paste(probs$X2,probs$X1,sep='-')
> probs = merge(probs,tclover$Strain[,'p adj',drop=FALSE],by=0)
> probs
       Row.names     X1     X2        probs    adj.prob        p adj
1   3DOK13-3DOK1  3DOK1 3DOK13 0.0030164452 0.045246679 2.888133e-06
2    3DOK4-3DOK1  3DOK1  3DOK4 0.0027324782 0.040987172 1.278706e-05
3    3DOK5-3DOK1  3DOK1  3DOK5 0.1626608271 1.000000000 2.617111e-01
4    3DOK7-3DOK1  3DOK1  3DOK7 0.0251169553 0.376754330 4.884864e-03
5    3DOK7-3DOK4  3DOK4  3DOK7 0.0433146369 0.649719553 1.852490e-01
6    3DOK7-3DOK5  3DOK5  3DOK7 0.0727633570 1.000000000 4.434643e-01
7   COMPOS-3DOK1  3DOK1 COMPOS 0.0152848021 0.229272031 1.234071e-03
8  COMPOS-3DOK13 3DOK13 COMPOS 0.0004932466 0.007398699 1.621550e-01
9   COMPOS-3DOK4  3DOK4 COMPOS 0.0929845957 1.000000000 4.434643e-01
10  COMPOS-3DOK5  3DOK5 COMPOS 0.0317716931 0.476575396 1.852490e-01
11  COMPOS-3DOK7  3DOK7 COMPOS 0.2055215679 1.000000000 9.926132e-01

Finally, we can display the probabilities without scientific notation as follows:

> format(probs,scientific=FALSE)
       Row.names     X1     X2        probs    adj.prob          p adj
1   3DOK13-3DOK1  3DOK1 3DOK13 0.0030164452 0.045246679 0.000002888133
2    3DOK4-3DOK1  3DOK1  3DOK4 0.0027324782 0.040987172 0.000012787061
3    3DOK5-3DOK1  3DOK1  3DOK5 0.1626608271 1.000000000 0.261711120046
4    3DOK7-3DOK1  3DOK1  3DOK7 0.0251169553 0.376754330 0.004884863746
5    3DOK7-3DOK4  3DOK4  3DOK7 0.0433146369 0.649719553 0.185248969392
6    3DOK7-3DOK5  3DOK5  3DOK7 0.0727633570 1.000000000 0.443464260597
7   COMPOS-3DOK1  3DOK1 COMPOS 0.0152848021 0.229272031 0.001234070633
8  COMPOS-3DOK13 3DOK13 COMPOS 0.0004932466 0.007398699 0.162154993324
9   COMPOS-3DOK4  3DOK4 COMPOS 0.0929845957 1.000000000 0.443464260597
10  COMPOS-3DOK5  3DOK5 COMPOS 0.0317716931 0.476575396 0.185248969392
11  COMPOS-3DOK7  3DOK7 COMPOS 0.2055215679 1.000000000 0.992613208547

By using all of the data to estimate the residual error, Tukey's HSD method actually reports some of the probabilities as even lower than the t-tests.

16.3 Two-Way ANOVA

To express the idea of an interaction in the R modeling language, we need to introduce two new operators. The colon (:) is used to indicate an interaction between two or more variables in a model formula. The asterisk (*) is used to indicate all main effects and interactions among the variables that it joins. So, for example, the term A*B would expand to the three terms A, B, and A:B.

As an example of a two-way ANOVA, consider a study to determine the effects of physical activity on obesity. Subjects were rated for their physical activity on a three point scale with 1=not very active, 2=somewhat active, and 3=very active. In addition, the race (either 1 or 2) of the participant was recorded, along with their Body Mass Index (BMI). We want to answer the following three questions:

1. Were the means for BMI the same for the two races?

2. Were the means for BMI the same for the three activity levels?

3. Is the effect of activity level different depending on race?, or equivalently Is the effect of race different depending on activity level?

The first two questions can be answered by looking at the race and activity main effects, while the third question describes the race by activity interaction. The data can be found at http://www.stat.berkeley.edu/classes/s133/data/activity.csv. Here are the R statements to run the ANOVA:

> activity = read.csv('activity.csv')
> activity$race = factor(activity$race)
> activity$activity = factor(activity$activity)
> activity.aov = aov(bmi~race*activity,data=activity)
> summary(activity.aov)
                Df Sum Sq Mean Sq  F value    Pr(>F)
race             1   3552    3552 102.5894   < 2e-16 ***
activity         2   2672    1336  38.5803   < 2e-16 ***
race:activity    2    301     151   4.3508   0.01303 *
Residuals     1865  64574      35
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice that there are two degrees of freedom for activity – this means two parameters will be estimated in order to explain activity’s effect on bmi. Unlike linear regression, where only a single parameter is estimated, and the only relationship that can be fit is a linear one, using two parameters (to account for the three levels of activity) provides more flexibility than would be possible with linear regression. To see if the analysis was reasonable, we can look at the default plots:

> plot(activity.aov)

The graphs appear below:

There seems to be some deviation from normality when looking at the Normal Q-Q plot (recall that, if the residuals did follow a normal distribution, we would see a straight line). When this situation arises, analyzing the logarithm of the dependent variable often helps. Here are the same results for the analysis of log(bmi):

> activity1.aov = aov(log(bmi)~race*activity,data=activity)
> summary(activity1.aov)
                Df Sum Sq Mean Sq  F value    Pr(>F)
race             1  4.588   4.588 100.3741 < 2.2e-16 ***
activity         2  3.251   1.625  35.5596  6.98e-16 ***
race:activity    2  0.317   0.158   3.4625   0.03155 *
Residuals     1865 85.240   0.046
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> plot(activity1.aov)

The Q-Q plot looks better, so this model is probably more appropriate. We can see both main effects as well as the interaction are significant. To see what’s happening with the main effects, we can use aggregate:

> aggregate(log(activity$bmi),activity['race'],mean)
  race        x
1    1 3.122940
2    2 3.222024
> aggregate(log(activity$bmi),activity['activity'],mean)
  activity        x
1        1 3.242682
2        2 3.189810
3        3 3.109518

Race 2 has higher values of BMI than race 1, and BMI decreases as the level of activity increases. To study the interaction, we could use aggregate, passing both race and activity as the second argument:

> aggregate(log(activity$bmi),activity[c('race','activity')],mean)
  race activity        x
1    1        1 3.161119
2    2        1 3.298576
3    1        2 3.140970
4    2        2 3.230651
5    1        3 3.084426
6    2        3 3.143478

The arrangement of the output from tapply may be more helpful:

> tapply(log(activity$bmi),activity[c('race','activity')],mean)
    activity
race        1        2        3
   1 3.161119 3.140970 3.084426
   2 3.298576 3.230651 3.143478

It's usually difficult to judge relationships like this from a table. One useful tool in this case is an interaction plot. An interaction plot has one point for each combination of the factors defined by an interaction. The x-axis represents the levels of one of the factors, the y-axis represents the mean of the dependent variable, and a separate line is drawn for each level of the factor not represented on the x-axis. While it wouldn't be too hard to produce such a plot with basic commands in R, the process is automated by the interaction.plot function. The first argument to this function is the factor to appear on the x-axis; the second is the factor which will define the multiple lines being drawn, and the third argument is the dependent variable. By default, interaction.plot uses the mean for its display, but you can provide a function of your own choosing through the fun= argument. For the activity data, we can produce an interaction plot with the following code:

> with(activity,interaction.plot(activity,race,log(bmi)))

Here's the plot:

It can be seen that the interaction is due to the fact that the slope of the line for race 2 is steeper than the line for race 1.

16.4 Another Example

This example has to do with iron retention in mice. Two different treatments, each at three different levels, were fed to mice. The treatments were tagged with radioactive iron, so that

the percentage of iron retained could be measured after a fixed period of time. The data is presented in a table as follows:

-----------------------------------------------------
            Fe2+                       Fe3+
-----------------------------------------------------
 high   medium    low        high   medium    low
-----------------------------------------------------
 0.71    2.20    2.25        2.20    4.04    2.71
 1.66    2.93    3.93        2.69    4.16    5.43
 2.01    3.08    5.08        3.54    4.42    6.38
 2.16    3.49    5.82        3.75    4.93    6.38
 2.42    4.11    5.84        3.83    5.49    8.32
 2.42    4.95    6.89        4.08    5.77    9.04
 2.56    5.16    8.50        4.27    5.86    9.56
 2.60    5.54    8.56        4.53    6.28   10.01
 3.31    5.68    9.44        5.32    6.97   10.08
 3.64    6.25   10.52        6.18    7.06   10.62
 3.74    7.25   13.46        6.22    7.78   13.80
 3.74    7.90   13.57        6.33    9.23   15.99
 4.39    8.85   14.76        6.97    9.34   17.90
 4.50   11.96   16.41        6.97    9.91   18.25
 5.07   15.54   16.96        7.52   13.46   19.32
 5.26   15.89   17.56        8.36   18.40   19.87
 8.15   18.30   22.82       11.65   23.89   21.60
 8.24   18.59   29.13       12.45   26.39   22.25
-----------------------------------------------------

Thus, before we can perform analysis on the data, it needs to be rearranged. To do this, we can use the reshape function. Since there are two different sets of variables that represent the change in the factors of the experiment, we first read in the data (skipping the header), and create two groups of variables in our call to reshape:

> iron0 = read.table('iron.txt',skip=5,nrows=18)
> names(iron0) = c('Fe2high','Fe2medium','Fe2low','Fe3high','Fe3medium','Fe3low')
> iron1 = reshape(iron0,varying=list(1:3,4:6),direction='long')
> head(iron1)
    time Fe2high Fe3high id
1.1    1    0.71    2.20  1
2.1    1    1.66    2.69  2
3.1    1    2.01    3.54  3
4.1    1    2.16    3.75  4
5.1    1    2.42    3.83  5
6.1    1    2.42    4.08  6

After examining the data, it can be seen that the high, medium, and low values have been translated into the values 1, 2, and 3 in the variable time. The id variable is created to help us see which line each observation came from, which is not relevant in this case, since the table was just used to present the data, and the values in the table don't represent repeated measures on the same experimental unit. Next, we eliminate the id column, rename the column named "time" and further reshape the data to represent the two treatments:

> iron1$id = NULL
> names(iron1)[1] = 'level'
> iron = reshape(iron1,varying=list(2:3),direction='long')
> head(iron)
    level time Fe2high id
1.1     1    1    0.71  1
2.1     1    1    1.66  2
3.1     1    1    2.01  3
4.1     1    1    2.16  4
5.1     1    1    2.42  5
6.1     1    1    2.42  6

All that’s left is to remove the id column and to rename time and Fe2high:

> iron$id = NULL
> names(iron)[2:3] = c('treatment','retention')
> head(iron)
    level treatment retention
1.1     1         1      0.71
2.1     1         1      1.66
3.1     1         1      2.01
4.1     1         1      2.16
5.1     1         1      2.42
6.1     1         1      2.42

Once the data has been reshaped, it’s essential to make sure that the independent variables are correctly stored as factors:

> sapply(iron,class)
    level treatment retention
"integer" "integer" "numeric"

Since treatment and level are not factors, we must convert them:

> iron$treatment = factor(iron$treatment,labels=c('Fe2+','Fe3+'))
> iron$level = factor(iron$level,labels=c('high','medium','low'))
> head(iron)

    level treatment retention
1.1  high      Fe2+      0.71
2.1  high      Fe2+      1.66
3.1  high      Fe2+      2.01
4.1  high      Fe2+      2.16
5.1  high      Fe2+      2.42
6.1  high      Fe2+      2.42
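Since the two calls to reshape can be a bit hard to follow, it may be reassuring to construct the same long-format data frame directly. The following sketch stacks the columns of iron0 with unlist and builds the factors to match, assuming the column order given above; the contents should be the same as iron, although the rows may be ordered differently:

> iron.alt = data.frame(level=factor(rep(rep(c('high','medium','low'),each=18),2),
+                                    levels=c('high','medium','low')),
+                       treatment=factor(rep(c('Fe2+','Fe3+'),each=54)),
+                       retention=unlist(iron0,use.names=FALSE))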

Now we can perform the ANOVA:

> iron.aov = aov(retention ~ level*treatment,data=iron)
> summary(iron.aov)
                 Df  Sum Sq Mean Sq F value    Pr(>F)
level             2  983.62  491.81 17.0732 4.021e-07 ***
treatment         1   62.26   62.26  2.1613    0.1446
level:treatment   2    8.29    4.15  0.1439    0.8661
Residuals       102 2938.20   28.81

Before proceeding further, we should examine the ANOVA plots to see if the data meets the assumptions of ANOVA:

Both the normal Q-Q plot and the scale-location plot indicate problems similar to the previous example, and a log transformation is once again suggested. This is not unusual when data is measured as percentages or ratios.

> ironl.aov = aov(log(retention) ~ level*treatment,data=iron)
> par(mfrow=c(2,2))
> plot(ironl.aov)

The plots look much better, so we'll continue with the analysis of the log of retention.

> summary(ironl.aov)
                 Df Sum Sq Mean Sq F value   Pr(>F)
level             2 15.588   7.794 22.5241 7.91e-09 ***
treatment         1  2.074   2.074  5.9931  0.01607 *
level:treatment   2  0.810   0.405  1.1708  0.31426
Residuals       102 35.296   0.346
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since there were only two levels of treatment, the significant treatment effect means the two treatments were different. We can use the TukeyHSD function to see if the different levels of the treatment were different:

> TukeyHSD(ironl.aov,'level')
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = log(retention) ~ level * treatment, data = iron)

$level
                 diff        lwr       upr     p adj
medium-high 0.5751084 0.24533774 0.9048791 0.0002042
low-high    0.9211588 0.59138806 1.2509295 0.0000000
low-medium  0.3460503 0.01627962 0.6758210 0.0373939

It appears that the high level had much lower retention values than the other two levels:

> aggregate(log(iron$retention),iron[’level’],mean)

   level        x
1   high 1.420526
2 medium 1.995635
3    low 2.341685

Although there was no significant interaction, an interaction plot can still be useful in visualizing what happens in an experiment:

> interaction.plot(iron$level,iron$treatment,log(iron$retention))

16.5 More Complex Models

When working with the wine data frame, we've separated the categorical variable (Cultivar) from the continuous variables for pedagogical reasons, but the aov function can accommodate both in the same model. Let's add the Cultivar variable to the regression model we've previously worked with:

> wine.new = lm(Alcohol~Cultivar+Malic.acid+Alkalinity.ash+Proanthocyanins+
+               Color.intensity+OD.Ratio+Proline,data=wine)
> summary(wine.new)

Call:
lm(formula = Alcohol ~ Cultivar + Malic.acid + Alkalinity.ash +
    Proanthocyanins + Color.intensity + OD.Ratio + Proline, data = wine)

Residuals:
     Min       1Q   Median       3Q      Max
-1.13591 -0.31737 -0.02623  0.33229  1.65633

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     12.9158487  0.4711149  27.415  < 2e-16 ***
Cultivar2       -0.9957910  0.1776136  -5.607 8.26e-08 ***
Cultivar3       -0.6714047  0.2396380  -2.802  0.00568 **
Malic.acid       0.0559472  0.0410860   1.362  0.17510
Alkalinity.ash  -0.0133598  0.0134499  -0.993  0.32198
Proanthocyanins -0.0561493  0.0817366  -0.687  0.49305
Color.intensity  0.1135452  0.0270097   4.204 4.24e-05 ***
OD.Ratio         0.0494695  0.0987946   0.501  0.61721
Proline          0.0002391  0.0002282   1.048  0.29629
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4886 on 169 degrees of freedom
Multiple R-Squared: 0.6541,     Adjusted R-squared: 0.6377
F-statistic: 39.95 on 8 and 169 DF,  p-value: < 2.2e-16

One problem with the summary display for models like this is that it's treating our factor variable (Cultivar) as two separate variables. While that is the way it is fit in the model, it's usually more informative to combine the effects of the two variables as a single effect. The anova command will produce a more traditional ANOVA table:

> anova(wine.new)
Analysis of Variance Table

Response: Alcohol
                 Df Sum Sq Mean Sq  F value    Pr(>F)
Cultivar          2 70.795  35.397 148.2546 < 2.2e-16 ***
Malic.acid        1  0.013   0.013   0.0552    0.8146
Alkalinity.ash    1  0.229   0.229   0.9577    0.3292
Proanthocyanins   1  0.224   0.224   0.9384    0.3341
Color.intensity   1  4.750   4.750  19.8942 1.488e-05 ***
OD.Ratio          1  0.031   0.031   0.1284    0.7206
Proline           1  0.262   0.262   1.0976    0.2963
Residuals       169 40.351   0.239
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The summary display contained other useful information, so you shouldn't hesitate to look at both. Comparing these results to our previous regression, we can see that only one variable (Color.intensity) is still significant, and the effect of Cultivar is very significant. For

this data set, this means that while we can use the chemical composition to help predict the Alcohol content of the wines, knowing the Cultivar will be more effective. Let's look at a reduced model that uses only Cultivar and Color.intensity to see how it compares with the model containing the extra variables:

> wine.new1 = lm(Alcohol~Cultivar+Color.intensity,data=wine)
> summary(wine.new1)

Call:
lm(formula = Alcohol ~ Cultivar + Color.intensity, data = wine)

Residuals:
     Min       1Q   Median       3Q      Max
-1.12074 -0.32721 -0.04133  0.34799  1.54962

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     13.14845    0.14871  88.417  < 2e-16 ***
Cultivar2       -1.20265    0.10431 -11.530  < 2e-16 ***
Cultivar3       -0.79248    0.10495  -7.551 2.33e-12 ***
Color.intensity  0.10786    0.02434   4.432 1.65e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4866 on 174 degrees of freedom
Multiple R-Squared: 0.6468,     Adjusted R-squared: 0.6407
F-statistic: 106.2 on 3 and 174 DF,  p-value: < 2.2e-16

The adjusted R-squared for this model is better than that of the previous one, indicating that removing those extra variables didn’t seem to cause any problems. To formally test to see if there is a difference between the two models, we can use the anova function. When passed a single model object, anova prints an ANOVA table, but when it’s passed two model objects, it performs a test to compare the two models:

> anova(wine.new,wine.new1)
Analysis of Variance Table

Model 1: Alcohol ~ Cultivar + Malic.acid + Alkalinity.ash + Proanthocyanins +
    Color.intensity + OD.Ratio + Proline
Model 2: Alcohol ~ Cultivar + Color.intensity
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    169 40.351
2    174 41.207 -5    -0.856 0.7174 0.6112

The test indicates that there's no significant difference between the two models. When all the independent variables in our model were categorical (the race/activity example), the interactions between the categorical variables were one of the most interesting parts of the analysis. What does an interaction between a categorical variable and a continuous variable represent? Such an interaction can tell us if the slope of the continuous variable is different for the different levels of the categorical variable. In the current model, we can test to see if the slopes are different by adding the term Cultivar:Color.intensity to the model:

> wine.new2 = lm(Alcohol~Cultivar+Color.intensity+Cultivar:Color.intensity,data=wine)
> anova(wine.new2)
Analysis of Variance Table

Response: Alcohol
                          Df Sum Sq Mean Sq  F value    Pr(>F)
Cultivar                   2 70.795  35.397 149.6001 < 2.2e-16 ***
Color.intensity            1  4.652   4.652  19.6613 1.644e-05 ***
Cultivar:Color.intensity   2  0.509   0.255   1.0766     0.343
Residuals                172 40.698   0.237
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There doesn’t seem to be a significant interaction.

As another example, consider a data set with information about experience, gender, and wages. Experience is recorded as number of years on the job, and gender is recorded as 0 or 1. To see if the slope of the line relating experience to wages is different for the two genders, we can proceed as follows:

> wages = read.delim('http://www.stat.berkeley.edu/classes/s133/data/wages.tab')
> wages$gender = factor(wages$gender)
> wages.aov = aov(wage ~ experience*gender,data=wages)
> anova(wages.aov)
Analysis of Variance Table

Response: wage
                   Df  Sum Sq Mean Sq F value    Pr(>F)
experience          1   106.7  106.69  4.2821   0.03900 *
gender              1   635.8  635.78 25.5175 6.042e-07 ***
experience:gender   1   128.9  128.94  5.1752   0.02331 *
Residuals         530 13205.3   24.92
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The significant probability value for experience:gender indicates that the effect of experience on wages is different depending on the gender of the employee. By performing two separate regressions, we can see the values for the slopes for the different genders:

> coef(lm(wage~experience,data=subset(wages,gender == 0)))
(Intercept)  experience
 8.62215280  0.08091533
> coef(lm(wage~experience,data=subset(wages,gender == 1)))
(Intercept)  experience
7.857197368 0.001150118

This indicates that there is a small increase in wages as experience increases for gender == 0, but virtually no increase for gender == 1.

16.6 Constructing Formulas

For the examples we’ve looked at, there weren’t so many terms in the model that it became tedious entering them by hand, but in models with many interactions it can quickly become a nuisance to have to enter every term into the model. When the terms are all main effects, you can often save typing by using a data= argument specifying a data set with just the variables you are interested in and using the period (.) as the right-hand side of the model, but that will not automatically generate interactions. The formula function will accept a text string containing a formula, and convert it to a formula that can be passed to any of the modeling functions. While it doesn’t really make

sense for the wine data, suppose we wanted to add Cultivar and all the interactions between Cultivar and the independent variables to our original regression model. The first step is to create a vector of the variables we want to work with. This can usually be done pretty easily using the names of the data frame we're working with.

> vnames = names(wine)[c(3,5,10,11,13,14)]

For the main part of the model we need to join together these names with plus signs (+):

> main = paste(vnames,collapse=' + ')

The interactions can be created by pasting together Cultivar with each of the continuous variables, using a colon (:) as a separator, and then joining them together with plus signs:

> ints = paste(paste('Cultivar',vnames,sep=':'),collapse=" + ")

Finally, we put the dependent variable and Cultivar into the model, and paste everything together:

> mymodel = paste('Alcohol ~ Cultivar',main,ints,sep='+')
> mymodel
[1] "Alcohol ~ Cultivar+Malic.acid + Alkalinity.ash + Proanthocyanins + Color.intensity + OD.Ratio + Proline+Cultivar:Malic.acid + Cultivar:Alkalinity.ash + Cultivar:Proanthocyanins + Cultivar:Color.intensity + Cultivar:OD.Ratio + Cultivar:Proline"

To run this, we need to pass it to a modeling function through the formula function:

> wine.big = aov(formula(mymodel),data=wine)
> anova(wine.big)
Analysis of Variance Table

Response: Alcohol
                          Df Sum Sq Mean Sq  F value    Pr(>F)
Cultivar                   2 70.795  35.397 154.1166 < 2.2e-16 ***
Malic.acid                 1  0.013   0.013   0.0573   0.81106
Alkalinity.ash             1  0.229   0.229   0.9955   0.31993
Proanthocyanins            1  0.224   0.224   0.9755   0.32483
Color.intensity            1  4.750   4.750  20.6808 1.079e-05 ***
OD.Ratio                   1  0.031   0.031   0.1335   0.71536
Proline                    1  0.262   0.262   1.1410   0.28708
Cultivar:Malic.acid        2  0.116   0.058   0.2524   0.77727
Cultivar:Alkalinity.ash    2  0.876   0.438   1.9071   0.15194
Cultivar:Proanthocyanins   2  1.176   0.588   2.5610   0.08045 .
Cultivar:Color.intensity   2  0.548   0.274   1.1931   0.30602
Cultivar:OD.Ratio          2  0.415   0.207   0.9024   0.40769
Cultivar:Proline           2  1.160   0.580   2.5253   0.08328 .
Residuals                157 36.060   0.230
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
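As an aside, base R also provides the reformulate function, which builds a formula object directly from a character vector of term labels and the name of a response; a sketch of the same model constructed that way, using the vnames vector defined above:

> mymodel2 = reformulate(c('Cultivar',vnames,paste('Cultivar',vnames,sep=':')),
+                        response='Alcohol')
> wine.big2 = aov(mymodel2,data=wine)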

As expected there isn't anything too startling. If we wanted to investigate, say, the Cultivar:Proanthocyanins interaction, we could look at a scatter plot using separate colors for the points and corresponding best regression lines for each Cultivar:

> plot(wine$Proanthocyanins,wine$Alcohol,col=c('red','blue','green')[wine$Cultivar])
> abline(lm(Alcohol~Proanthocyanins,data=wine,subset=Cultivar==1),col='red')
> abline(lm(Alcohol~Proanthocyanins,data=wine,subset=Cultivar==2),col='blue')
> abline(lm(Alcohol~Proanthocyanins,data=wine,subset=Cultivar==3),col='green')
> legend('topleft',legend=levels(wine$Cultivar),pch=1,col=c('red','blue','green'))

The plot appears below:

16.7 Alternatives for ANOVA

Not all data is suitable for ANOVA – in particular, if the variance varies dramatically between different groups, the assumption of equal variances is violated, and ANOVA results may not be valid. We’ve seen before that log transformations often help with ratios or percentages, but they may not always be effective. As an example of a data set not suitable for ANOVA, consider the builtin data set airquality which has daily measurements of ozone and other quantities for a 153 day

period. The question to be answered is whether or not the average level of ozone is the same over the five months sampled. On the surface, this data seems suitable for ANOVA, so let's examine the diagnostic plots that would result from performing the ANOVA:

> airquality$Month = factor(airquality$Month)
> ozone.aov = aov(Ozone~Month,data=airquality)
> plot(ozone.aov)

There are deviations at both the low and high ends of the Q-Q plot, and some deviation from a constant spread in the Scale-Location plot. Will a log transformation help?

> ozonel.aov = aov(log(Ozone)~Month,data=airquality)
> plot(ozonel.aov)

In this case, the transformation didn't really help. It might be possible to find a more suitable transformation, but we can also use a statistical test that makes fewer assumptions about our data. One such test is the Kruskal-Wallis test. Basically, the test replaces the data with the ranks of the data, and performs an ANOVA on those ranks. It assumes that observations are independent from each other, but doesn't demand equality of variance across the groups, or that the observations follow a normal distribution. The kruskal.test function in R performs the test, using the same formula interface as aov.

> ozone.kruskal = kruskal.test(Ozone~Month,data=airquality)
> ozone.kruskal

        Kruskal-Wallis rank sum test

data:  Ozone by Month
Kruskal-Wallis chi-squared = 29.2666, df = 4, p-value = 6.901e-06

All that's reported is the significance level for the test, which tells us that the differences between the Ozone levels for the different months are very significant. To see where the differences come from, we can use the kruskalmc function in the strangely-named pgirmess package. Unfortunately this function doesn't use the model interface – you simply provide the response variable and the (factor) grouping variable.

> library(pgirmess)
> kruskalmc(airquality$Ozone,airquality$Month)
Multiple comparison test after Kruskal-Wallis
p.value: 0.05
Comparisons
      obs.dif critical.dif difference
5-6 57.048925     31.85565       TRUE
5-7 38.758065     31.59346       TRUE
5-8 37.322581     31.59346       TRUE
5-9  2.198925     31.85565      FALSE
6-7 18.290860     31.85565      FALSE
6-8 19.726344     31.85565      FALSE
6-9 54.850000     32.11571       TRUE
7-8  1.435484     31.59346      FALSE
7-9 36.559140     31.85565       TRUE
8-9 35.123656     31.85565       TRUE

Studying these results shows that months 6, 7, and 8 are very similar, but different from months 5 and 9. To understand why this data is not suitable for ANOVA, we can look at boxplots of the Ozone levels for each month:

> boxplot(with(airquality,split(Ozone,Month)))

The variances are clearly not equal across months, and the lack of symmetry for months 5 and 6 brings the normal assumption into doubt.
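The unequal spread can also be seen numerically by computing the variance of the Ozone readings within each month, for example with tapply (a quick sketch; na.rm=TRUE is needed because Ozone contains missing values):

> with(airquality,tapply(Ozone,Month,var,na.rm=TRUE))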
