Introduction to R


Tuan V. Nguyen
Genetics Epidemiology of Osteoporosis Lab, Garvan Institute of Medical Research
Garvan Institute Biostatistical Workshop, 17 April 2014
© Tuan V. Nguyen

Introduction to R
• A brief history
• Installation
• Packages
• Essential grammar
• A session with R

Previously …
• Many statistical packages were/are available
• Popular packages include Systat, Minitab, Statistica, BMDP, S+, Gauss, Spida, JMP, SPSS, Stata, SAS and now R

R is gaining popularity
[Figure: Number of scholarly articles that reference each software by year. Source: Muenchen R. The popularity of data analysis software, r4stat.com/articles/popularity]

R is gaining popularity
[Figure: Number of scholarly articles that reference each software by year, after removing the top two, SPSS and SAS. Source: Muenchen R. The popularity of data analysis software, r4stat.com/articles/popularity]

A brief history
• R is a “statistical and graphical programming language”
• Originated from S
  – 1988 - S2: RA Becker, JM Chambers, A Wilks
  – 1992 - S3: JM Chambers, TJ Hastie
  – 1998 - S4: JM Chambers
• R was initially written by Ross Ihaka and Robert Gentleman (Univ of Auckland, New Zealand) in the 1990s
• From 1997: international “R-core”, 15 people

What can R do?
• It is a statistical language
• All models of statistical analysis
• Great for simulation work
• Programming (do you want to take a challenge?)

Why R?
• Open source, totally free!
• Developed by professional and academic statisticians
• Runs on Windows, Unix, MacOS
• Keeps up to date with methodological developments
• Speaks the language of experts (bioinformatics and statistics)
• Large user community

Installation
cran.r-project.org

Installation of R on Windows
• Select Windows
• Select “base”
• Run → OK → Next
• Then Finish: the R icon appears on your desktop

[Figure: A screenshot of R]

RStudio
An “add-on” of R
http://rstudio.org

Introduction to RStudio
• An IDE (Integrated Development Environment) for R
• Provides some convenient functions for running R
• R also has a number of other IDEs:
  – TinnR
  – R commander

R and RStudio
You can run R within RStudio (you don't need to start R separately).
[Figure: RStudio panes: Workspace (variables), R console, Files, Packages]

“R is a real demonstration of the power of collaboration” (Ihaka)

Packages
• R = Base + Packages
• Base R includes basic functions for simple tasks and analyses
• Packages are modules for specific analyses
• More than 6000 packages in R!

Common packages
Hmisc: Miscellaneous functions for data manipulation
tables: For tabulation of data
foreign: For reading data from other software
gmodels: Programming tools
ggplot2: Advanced graphics
sciplot: Scientific graphs
Zelig: “Everyone's statistical software”
rms: Regression modeling strategies
car: Companion to regression analysis
survival: Survival analyses
epiR: Epidemiological analyses
epicalc: Epidemiological analyses
boot: Bootstrap analyses
cluster: Cluster analysis
psych: Psychometrics and descriptive statistics

Basic management of packages
• Installing new packages (try now!)
    install.packages(c("Hmisc", "rms", "tables", "foreign", "gmodels", "ggplot2", "sciplot", "Zelig", "car", "survival", "epiR", "epicalc", "boot", "cluster", "psych", "binom", "BMA", "ExactCIdiff", "lattice", "mgcv", "gam", "nlme", "quantreg"))
• To find out which packages you have installed
    library()

R Grammar: a quick introduction

Interacting with R
• Start up R
• Use the up/down arrow keys to retrieve command history
• Use the left/right arrow keys to edit a command line
• Use TAB to complete a command (very useful!)
• Multiple commands can be written on one line by using the “;” separator

Variable names
• Use letters, numbers, and the signs . and _ (a hyphen is not allowed, because - is the minus operator)
• Assignment symbol: <- or =
• Upper and lower case letters are distinct
    Genotype = 5; genotype <- 7; Geno.type = Genotype + genotype

Object-oriented language
R is an object-oriented language
• Function
• Vector
• Matrix
• Dataframe

Function
• R “commands” = functions
• A function has arguments
• Arguments include variables (names), parameters, options, etc.
• Example: fitting a linear regression model y = a + bx
    m1 = lm(y ~ x, data=test)
  – m1: object name
  – lm: function (lm = linear model)
  – y ~ x: arguments (variables y, x)
  – data=test: argument (dataset name)

Vector
• Vectors are the basic building block in R
• Vector = a series of values
• Values can be numeric or character
    score = c(4,2,1,5)
    gender = c('F','M','F','M')
• c (concatenation) is used for direct data entry

Matrix
• Rectangular data → rows, columns
• A matrix can be a collection of vectors
    1 3 6 7
    3 4 7 9
    5 7 8 0

    v1 = c(1,3,5)
    v2 = c(3,4,7)
    v3 = c(6,7,8)
    v4 = c(7,9,0)
    m = cbind(v1,v2,v3,v4)
    m

Reference to matrix
• Row first, column later
• Flexible in R
    > m
         v1 v2 v3 v4
    [1,]  1  3  6  7
    [2,]  3  4  7  9
    [3,]  5  7  8  0
    > m[2,3]
    v3
     7
    > m[1,]
    v1 v2 v3 v4
     1  3  6  7
    > m[1:2,]
         v1 v2 v3 v4
    [1,]  1  3  6  7
    [2,]  3  4  7  9
    > m[,2:3]
         v2 v3
    [1,]  3  6
    [2,]  4  7
    [3,]  7  8
    > m[,3:4]*m[1,2]
         v3 v4
    [1,] 18 21
    [2,] 21 27
    [3,] 24  0

Dataframe
Dataset in R = “Dataframe”: laid out like a matrix, but each column can have its own type
• Columns = fields = variables
• Rows = records = observations
    ID       Gender     Math     Reading
    1        F          5        8
    2        M          5        2
    3        F          7        3
    4        F          8        6
    numeric  character  numeric  numeric

Reference to field/column in a dataframe
• A dataframe can be attached before analysis so that its fields can be referenced by name alone
• Reference to a field: (dataframe name)$(field name)
• Example:
    v1 = c(1,3,5)
    v2 = c(3,4,7)
    v3 = c(6,7,8)
    v4 = c(7,9,0)
    dat = data.frame(v1, v2, v3, v4)
    attach(dat)
    dat$sum = dat$v1 + dat$v3
    sum1 = v1 + v3
    dat

The effect of $
Running the example above and printing dat gives:
    > dat
      v1 v2 v3 v4 sum
    1  1  3  6  7   7
    2  3  4  7  9  10
    3  5  7  8  0  13
There is NO sum1! Only dat$sum was stored in the dataframe; sum1 is a separate vector in the workspace.

Data coding in R
    id = c(1, 2, 3, 4, 5)
    gender = c("male", "female", "male", "female", "female")
    dat = data.frame(id, gender)
We want to create a new variable called sex with numeric values (1, 2)
    dat$sex[gender=="male"] <- 1
    dat$sex[gender=="female"] <- 2

Character and numeric coding
Character to numeric
    X = c("1", "2", "3", "4", "5")
We want to create a new variable called Y with numeric values (for calculation)
    Y = as.numeric(X)
    mean(Y)
Numeric to character
    Y = 1:10
We want to create a new variable called X with character values
    X = as.character(Y)

Sorting data: sort()
    X = rnorm(10); X
     [1]  1.5651300 -0.5382971 -0.1995302  1.0111098  0.3590144 -1.5245237
     [7] -0.3192534  0.1323256 -0.7916954 -0.0664167
    sort(X)
     [1] -1.5245237 -0.7916954 -0.5382971 -0.3192534 -0.1995302 -0.0664167
     [7]  0.1323256  0.3590144  1.0111098  1.5651300

Merging datasets
    id = c(1,2,3,4)
    sex = c("M","F","M","F")
    dat1 = data.frame(id, sex)
    id = c(1,2,3,4,5)
    age = c(21,34,45,32,18)
    dat2 = data.frame(id, age)
    dat = merge(dat1, dat2, by="id")                            # keeps only ids found in both datasets
    dat = merge(dat1, dat2, by="id", all.x=TRUE, all.y=TRUE)    # keeps all ids; missing values become NA

An R Session (demo)

To work with R …
• R, like most statistical programs, works on observations (rows) and variables (columns)
• You should keep in mind
  – Name of dataframe
  – Name of variables

Allison and Cicchetti's study
Truett Allison; Domenic V. Cicchetti. Sleep in Mammals: Ecological and Constitutional Correlates. Science 1976; 194:732-734.

R Session
• Reading a file into R for analysis (filename: allison.csv)
• Some graphical analyses
• Some descriptive (and not so descriptive) analyses
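Before the demo, two points from the slides above can be checked with a short sketch: the effect of $ and the two merge() calls. This is only an illustration under stated assumptions. The object names merged_match and merged_all are not in the slides, and all = TRUE is used here as shorthand for all.x=TRUE, all.y=TRUE.

```r
# Recap sketch; vectors and dataframes follow the slides,
# merged_match / merged_all are illustrative names.

# 1. The effect of $: only dat$sum becomes a column of the dataframe.
v1 <- c(1, 3, 5)
v3 <- c(6, 7, 8)
dat <- data.frame(v1, v3)
dat$sum <- dat$v1 + dat$v3   # stored inside dat
sum1 <- v1 + v3              # a free-standing vector in the workspace
names(dat)                   # column names of dat
"sum1" %in% names(dat)       # FALSE: sum1 is not part of dat

# 2. merge(): the default keeps matching ids only; all = TRUE keeps every id.
dat1 <- data.frame(id = 1:4, sex = c("M", "F", "M", "F"))
dat2 <- data.frame(id = 1:5, age = c(21, 34, 45, 32, 18))
merged_match <- merge(dat1, dat2, by = "id")
merged_all   <- merge(dat1, dat2, by = "id", all = TRUE)
nrow(merged_match)                     # 4 rows (ids 1 to 4)
nrow(merged_all)                       # 5 rows (id 5 has no sex value)
merged_all$sex[merged_all$id == 5]     # NA
```

The sketch avoids attach() on purpose, so that every field is referenced explicitly through $ and nothing depends on what is currently attached.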
Species                 BodyWt  BrainWt  NonDreaming  Dreaming  TotalSleep  LifeSpan  Gestation  Predation  Exposure  Danger
Africanelephant         6654    5712     NA           NA        3.3         38.6      645        3          5         3
Africangiantpouchedrat  1       6.6      6.3          2         8.3         4.5       42         3          1         3
ArcticFox               3.385   44.5     NA           NA        12.5        14        60         1          1         1
Arcticgroundsquirrel    0.92    5.7      NA           NA        16.5        NA        25         5          2         3
Asianelephant           2547    4603     2.1          1.8       3.9         69        624        3          5         4
Baboon                  10.55   179.5    9.1          0.7       9.8         27        180        4          4         4
Bigbrownbat             0.023   0.3      15.8         3.9       19.7        19        35         1          1         1
Braziliantapir          160     169      5.2          1         6.2         30.4      392        4          5         4
Cat                     3.3     25.6     10.9         3.6       14.5        28        63         1          2         1
Chimpanzee              52.16   440      8.3          1.4       9.7         50        230        1          1         1
Chinchilla              0.425   6.4      11           1.5       12.5        7         112        5          4         4
Cow                     465     423      3.2          0.7       3.9         30        281        5          5         5
Deserthedgehog          0.55    2.4      7.6          2.7       10.3        NA        NA         2          1         2
Donkey                  187.1   419      NA           NA        3.1         40        365        5          5         5
EasternAmericanmole     0.075   1.2      6.3          2.1       8.4         3.5       42         1          1         1

Reading file csv
• Locate your folder and filename
• Use the function read.csv
• On a Mac, you can simply drag the file to the R command line
    dat = read.csv("~/Dropbox/Garvan Lectures 2014/Datasets and Teaching Materials/allison.csv", header=TRUE, na.strings="NA")

Reading file through file.choose()
    f = file.choose()   # find the file
    dat = read.csv(f, header=TRUE, na.strings="NA")
    attach(dat)         # attach the data before analysis
    names(dat)          # want to know variable names
    dim(dat)            # how many rows and columns?
    summary(dat)        # summarize data

Summary: an overall “picture”
    > summary(dat)
    Species BodyWt BrainWt
    Africanelephant : 1 Min.
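The reading steps above can be tried without the original file. The following is a minimal sketch, assuming that allison.csv is a comma-separated file with the column names shown in the table; that layout is an assumption, since the file itself is not shown. It types five rows of the table inline and passes them to read.csv through its text argument. The name csv_text is illustrative.

```r
# Hypothetical inline copy of five rows of the table above;
# the real allison.csv layout is assumed, not confirmed.
csv_text <- "Species,BodyWt,BrainWt,NonDreaming,Dreaming,TotalSleep,LifeSpan,Gestation,Predation,Exposure,Danger
Africanelephant,6654,5712,NA,NA,3.3,38.6,645,3,5,3
Africangiantpouchedrat,1,6.6,6.3,2,8.3,4.5,42,3,1,3
ArcticFox,3.385,44.5,NA,NA,12.5,14,60,1,1,1
Asianelephant,2547,4603,2.1,1.8,3.9,69,624,3,5,4
Baboon,10.55,179.5,9.1,0.7,9.8,27,180,4,4,4"

# na.strings = "NA" tells read.csv which text marks a missing value
dat <- read.csv(text = csv_text, header = TRUE, na.strings = "NA")

names(dat)                            # variable names
dim(dat)                              # rows and columns
summary(dat$TotalSleep)               # summary of one numeric field
colSums(is.na(dat))                   # number of missing values per field
mean(dat$NonDreaming)                 # NA, because some values are missing
mean(dat$NonDreaming, na.rm = TRUE)   # mean of the observed values only
```

Missing values matter in this dataset: most summary functions return NA unless na.rm = TRUE is given, which is worth keeping in mind before running the descriptive analyses.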