Bluesky Statistics Intro Guide

Robert A. Muenchen BlueSky Statistics 7.1 Intro Guide December 15, 2020 Robert A. Muenchen The University of Tennessee Greve Hall, Room 517 821 Volunteer Blvd. Knoxville, TN 37996-3395 USA [email protected] Typeset using the LATEXdocument preparation system. Printed on FSC certified, lead-free, acid-free, buffered paper made from wood-based pulp. ISBN 978-1-716-44352-7 ©Copyright 2020 by Robert A. Muenchen. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the author, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of the names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Preface This introductory guide for BlueSky Statistics assumes little knowledge of either computing or data analysis. It is a subset of the BluesSky Statistics 7.1 User Guide [17], available at Lulu.com. While this guide excludes most of the User Guide's sections on graphics examples and advanced modeling, it does cover many other topics, such as how to: Install BlueSky Statistics and determine the settings that will optimize your workflow. Read data from a wide variety of sources including delimited text files, Excel files, SAS, SPSS or Stata data sets, and relational databases. Manage your data by creating new variables, transforming or recoding existing ones, combining data sets from both the add-cases and add- variables approaches, and pivoting data sets to become wider, or longer, to better enable various graphical and analytic methods. Create publication-quality graphs including, bar, pie, scatter, line, box, error bar, and model diagnostic plots. Perform all the types of analysis located on BlueSky's Analysis menu, including measures of agreement, clustering, contingency tables, cor- relation, factor analysis, PCA, market basket analysis, missing value imputation, non-parametrics, reliability, survival, and time series. Who This Book Is For This book is written for three main audiences: 1. People who need to analyze data, but who lack the time or inclination to become programmers. BlueSky Statistics is an excellent tool for such analysts. 2. R coders who wish to speed their program development. BlueSky Statis- tics can automatically generate significant chunks of error-free code to add to their programs. 3. Teams that combine the above two types of analysts, using their programmers to extend the capabilities of BlueSky Statistics for use by their non-programming team members. vi Preface This book explains the statistical output and how to interpret it, but it does not replace a thorough book devoted to statistical analysis. While the book covers how to use BlueSky Statistics to speed the generation of R programs, it does not teach the R programming language itself. If you are migrating from SAS or SPSS to the R language, I recommend my book, R for SAS and SPSS Users [16]. If you are migrating from Stata, I recommend another of my books, R for Stata Users [18]. Acknowledgments The lead technical reviewer was the consummate Matthew Marler, whose edits greatly improved nearly every section. Thank you, Matthew! I am also grateful to the people who provided advice, caught typos, and suggested improvements, including Ross Dierkhising, Steven Miller, and Frank Thomas. I am also grateful to the staff of BlueSky Statistics, LLC for the many hours of advice, demonstrations, and feedback. Many of the examples I present here are modestly enhanced versions from blog posts, the R-help discussion list, and help files. The main benefit I add is the selection, organization, and explanation. Of course, this book would not have been possible without the efforts of many software developers, including the team at BlueSky Statistics, LLC, the developers of the S language on which R is based, John Chambers, Douglas Bates, Rick Becker, Bill Cleveland, Trevor Hastie, Daryl Pregibon and Allan Wilks[20]; the people who started R itself, Ross Ihaka and Robert Gentleman; the R Development Core Team[21]; Hadley Wickham and tidyverse associates, many other R developers for providing such useful tools for free and all of the R-help participants who have kindly answered so many questions. Finally, I am grateful to my wife, Carla Foust, and sons Alexander and Conor, who put up with many lost weekends while I wrote this book. Robert A. Muenchen [email protected] Knoxville, Tennessee December 2020 Preface vii About the Author Robert A. Muenchen is the author of the BlueSky Statistics 7.1 User Guide[17], R for SAS and SPSS Users [16] and, with Joseph Hilbe, R for Stata Users [18], and with Robert Hoyt, An Introduction to Biomedical Data Science [9]. He is also the creator of http://r4stats.com, a popular web site devoted to analyzing trends in data science software, reviewing such software, and helping people learn the R language. Bob is an ASA Accredited Professional Statistician who helps organizations migrate from SAS, SPSS, and Stata to the R Language. He has taught workshops on data science topics for more than 500 organizations and has presented workshops in partnership with the American Statistical Association, RStudio, DataCamp.com, Predictive Analytics World, and Rev- olution Analytics. Bob has written or co-authored over 70 articles published in scientific journals and conference proceedings and has guided more than 1,000 graduate theses and dissertations at the University of Tennessee. Bob has served on the advisory boards of BlueSky Statistics, Question- Pro, SAS Institute, SPSS, and the Statistical Graphics Corporation. His contributions have been incorporated into software from those companies as well as JMP, jamovi, and numerous R packages. His research interests include data science software, graphics and visualization, machine learning, and text analytics. Linux® is the registered trademark of Linus Torvalds. Macintosh® and Mac OS® are registered trademarks of Apple, Inc. Oracle® and Oracle Data Mining are registered trademarks of Oracle, Inc. RStudio® is a registered trademark of RStudio, Inc. SAS®, is a registered trademark of the SAS Institute. SPSS®, IBM SPSS Statistics®, is a registered trademark of SPSS, Inc., an IBM company. Stata®, is a registered trademark of StataCorp, LLC. UNIX® is a registered trademark of The Open Group. Windows®, Excel®, and Microsoft Word® are registered trademarks of Microsoft, Inc. Contents 1 Introduction ............................................... 1 1.1 Overview.............................................1 1.2 Choosing a Version...................................2 1.3 Installing BlueSky Statistics...........................3 1.4 Getting Started with BlueSky..........................3 1.5 History Menu........................................7 1.6 Getting Help.........................................8 1.6.1 Visiting the BlueSky Website.................. 10 1.6.2 About...................................... 10 1.6.3 The Index.................................... 11 1.6.4 Technical Support............................ 12 1.7 Your Next Steps..................................... 12 2 File Menu................................................. 15 2.1 New Dataset......................................... 15 2.2 New Output Window................................. 19 2.3 Open............................................... 20 2.3.1 Opening a Comma Separated Values File........ 20 2.3.2 Opening an R Dataset......................... 21 2.4 Open Output......................................... 21 2.5 Paste Dataset from the Clipboard...................... 23 2.6 Load Dataset from a Package.......................... 24 2.7 Import Data......................................... 24 2.7.1 SQL Database............................... 25 2.8 Save & Save As...................................... 25 2.9 Save as PDF......................................... 25 2.10 Recent.............................................. 26 3 Output ................................................... 27 3.1 Using the Output Window............................. 27 3.2 Controlling Output Options........................... 28 3.3 Using the Output Navigator........................... 29 3.4 Managing Multiple Output Windows................... 30 3.5 The Layout Menu..................................... 31 x Contents 3.5.1 Horizontal Layout............................. 31 3.5.2 Vertical Layout............................... 31 3.5.3 Show / Hide Output Navigator Menu Item....... 31 3.6 The Edit Menu...................................... 32 4 Data Menu ................................................ 33 4.1 Add ID............................................. 33 4.2 Bin Numeric Variables................................ 34 4.3 Compute Dummy Variables............................ 37 4.4 Compute New Variables............................... 38 4.4.1 Compute.................................... 39 4.4.2 Compute, Apply a Function Across All Rows.... 39 4.4.3 Conditional Compute, if/then................... 41 4.4.4 Conditional Compute, if/then/else............... 41 4.5 Concatenate Multiple Variables........................ 43 4.6 Convert Variable(s) to factors.......................... 44 4.7 Dates..............................................

Bluesky Statistics Intro Guide

Installing R

The Rockerverse: Packages and Applications for Containerisation

Installing R and Cran Binaries on Ubuntu

R Generation [1] 25

Working with R Steph Locke (Locke Data)

Introduction to R, Day 1: “Getting Oriented” Michael Chimenti and Diana Kolbe Dec 5 & 6 2019 Iowa Institute of Human Genetics

Introduction to Ggplot2

Mastering Software Development in R

The Voice of the R Community

Tools for Analyzing R Code the Tidy Way by Lucy D’Agostino Mcgowan, Sean Kross, Jeffrey Leek

Chapter 1. Preparing Data for Analysis and Visualization in R

Tidyverse, Tidymodels, R-Lib, and Gt R Packages: Regulatory Compliance and Validation Issues