The Art of R Programming

TAMETAME YOURYOUR DATADATA THE THE ART R PROGRAMMING OF THE THE ART R PROGRAMMING OF ARTART OFOF RR R is the world’s most popular language for developing • Interface R with C/C++ and Python for increased statistical software: Archaeologists use it to track the speed or functionality spread of ancient civilizations, drug companies use it • Find new packages for text analysis, image manipula- PROGRAMMINGPROGRAMMING to discover which medications are safe and effective, tion, and thousands more and actuaries use it to assess financial risks and keep A TOUR OF STATISTICAL SOFTWARE DESIGN markets running smoothly. • Squash annoying bugs with advanced debugging techniques The Art of R Programming takes you on a guided tour of software development with R, from basic types Whether you’re designing aircraft, forecasting the and data structures to advanced topics like closures, weather, or you just need to tame your data, The Art of NORMAN MATLOFF recursion, and anonymous functions. No statistical R Programming is your guide to harnessing the power knowledge is required, and your programming skills of statistical computing. can range from hobbyist to pro. ABOUT THE AUTHOR Along the way, you’ll learn about functional and object- Norman Matloff is a professor of computer science oriented programming, running mathematical simulations, (and a former professor of statistics) at the University and rearranging complex data into simpler, more useful of California, Davis. His research interests include formats. You’ll also learn to: parallel processing and statistical regression, and • Create artful graphs to visualize complex data sets he is the author of several widely used web tutorials and functions on software development. He has written articles for the New York Times, the Washington Post, Forbes • Write more efficient code using parallel R and Magazine, and the Los Angeles Times, and he is the vectorization co-author of The Art of Debugging (No Starch Press). THE FINEST IN GEEK ENTERTAINMENT™ www.nostarch.com MATLOFF “I LIE FLAT.” $39.95 ($41.95 CDN) This book uses RepKover —a durable binding that won’t snap shut. SOFTWARE STATISTICAL & COMPUTERS/MATHEMATICAL SHELVE IN: FSC LOGO THE ART OF R PROGRAMMING THE ART OF R PROGRAMMING A Tour of Statistical Software Design by Norman Matloff San Francisco THE ART OF R PROGRAMMING. Copyright © 2011 by Norman Matloff. All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher. 1514131211 123456789 ISBN-10: 1-59327-384-3 ISBN-13: 978-1-59327-384-2 Publisher: William Pollock Production Editor: Alison Law Cover and Interior Design: Octopod Studios Developmental Editor: Keith Fancher Technical Reviewer: Hadley Wickham Copyeditor: Marilyn Smith Compositors: Alison Law and Serena Yang Proofreader: Paula L. Fleming Indexer: BIM Indexing & Proofreading Services For information on book distributors or translations, please contact No Starch Press, Inc. directly: No Starch Press, Inc. 38 Ringold Street, San Francisco, CA 94103 phone: 415.863.9900; fax: 415.863.9950; [email protected]; www.nostarch.com Library of Congress Cataloging-in-Publication Data Matloff, Norman S. The art of R programming : tour of statistical software design / by Norman Matloff. p. cm. ISBN-13: 978-1-59327-384-2 ISBN-10: 1-59327-384-3 1. Statistics-Data processing. 2. R (Computer program language) I. Title. QA276.4.M2925 2011 519.50285'5133-dc23 2011025598 No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the beneﬁt of the trademark owner, with no intention of infringement of the trademark. The information in this book is distributed on an “As Is” basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it. BRIEF CONTENTS Acknowledgments ...................................................................xvii Introduction . ................................................................... xix Chapter 1: Getting Started . ......................................................... 1 Chapter 2: Vectors . ......................................................... 25 Chapter 3: Matrices and Arrays. .................................................. 59 Chapter 4: Lists. ................................................................... 85 Chapter 5: Data Frames . .........................................................101 Chapter 6: Factors and Tables . ..................................................121 Chapter 7: R Programming Structures ..................................................139 Chapter 8: Doing Math and Simulations in R . ........................................189 Chapter 9: Object-Oriented Programming . ........................................207 Chapter 10: Input/Output . .........................................................231 Chapter 11: String Manipulation . ..................................................251 Chapter 12: Graphics . .........................................................261 Chapter 13: Debugging . .........................................................285 Chapter 14: Performance Enhancement: Speed and Memory . .......................305 Chapter 15: Interfacing R to Other Languages . ........................................323 Chapter 16: Parallel R . .........................................................333 Appendix A: Installing R. .........................................................353 Appendix B: Installing and Using Packages . ........................................355 CONTENTS IN DETAIL ACKNOWLEDGMENTS xvii INTRODUCTION xix Why Use R for Your Statistical Work? .............................................. xix Object-Oriented Programming ............................................ xvii Functional Programming .................................................. xvii Whom Is This Book For? .......................................................... xviii My Own Background ............................................................. xix 1 GETTING STARTED 1 1.1 How to Run R .............................................................. 1 1.1.1 Interactive Mode ................................................ 2 1.1.2 Batch Mode .................................................... 3 1.2 A First R Session ............................................................ 4 1.3 Introduction to Functions ..................................................... 7 1.3.1 Variable Scope ................................................. 9 1.3.2 Default Arguments .............................................. 9 1.4 Preview of Some Important R Data Structures .................................. 10 1.4.1 Vectors, the R Workhorse ........................................ 10 1.4.2 Character Strings ............................................... 11 1.4.3 Matrices ....................................................... 11 1.4.4 Lists ........................................................... 12 1.4.5 Data Frames ................................................... 14 1.4.6 Classes ........................................................ 15 1.5 Extended Example: Regression Analysis of Exam Grades ....................... 16 1.6 Startup and Shutdown ...................................................... 19 1.7 Getting Help ............................................................... 20 1.7.1 The help() Function .............................................. 20 1.7.2 The example() Function .......................................... 21 1.7.3 If You Don’t Know Quite What You’re Looking For ................. 22 1.7.4 Help for Other Topics ........................................... 23 1.7.5 Help for Batch Mode ............................................ 24 1.7.6 Help on the Internet ............................................. 24 2 VECTORS 25 2.1 Scalars, Vectors, Arrays, and Matrices ....................................... 26 2.1.1 Adding and Deleting Vector Elements ............................. 26 2.1.2 Obtaining the Length of a Vector ................................. 27 2.1.3 Matrices and Arrays as Vectors .................................. 28 2.2 Declarations ............................................................... 28 2.3 Recycling .................................................................. 29 2.4 Common Vector Operations ................................................. 30 2.4.1 Vector Arithmetic and Logical Operations ......................... 30 2.4.2 Vector Indexing ................................................. 31 2.4.3 Generating Useful Vectors with the : Operator ..................... 32 2.4.4 Generating Vector Sequences with seq() .......................... 33 2.4.5 Repeating Vector Constants with rep()............................. 34 2.5 Using all() and any() ........................................................ 35 2.5.1 Extended Example: Finding Runs of Consecutive Ones ............. 35 2.5.2 Extended Example:

The Art of R Programming

Tinn-R Team Has a New Member Working on the Source Code: Wel- Come Huashan Chen

DECIPHER: Harnessing Local Sequence Context to Improve Protein Multiple Sequence Alignment Erik S

Rkward: a Comprehensive Graphical User Interface and Integrated Development Environment for Statistical Analysis with R

'Unix' in a File

5 Command Line Functions by Barbara C

The AWK Programming Language

Analysis the Structure of SAM and Cracking Password Base on Windows Operating System

Decipher Prostate Cancer Classifier

Comparing Tools for Non-Coding RNA Multiple Sequence Alignment Based On

Statistical Graphical User Interface Plug-In for Survival Analysis in R Statistical and Graphics Language and Environment

The Evolution of the Unix Time-Sharing System*

PR 6201 (500 Kg… 50 T) Precision Compression Load Cell