Descriptive Statistics and ANOVA

Basic statistics
Thomas Alexander Gerds, Department of Biostatistics, University of Copenhagen

Contents
- Data are variable
- Statistical uncertainty
- Summary and display of data
- Confidence intervals
- ANOVA

Data are variable

A statistician is used to receiving a value, such as 3.17%, together with an explanation, such as "this is the expression of 1-B6.DBA-GTM in mouse 12". The value from the next mouse in the list is 4.88%. Possible reasons why the two values differ:
- The measurement is difficult
- Data processing is done by humans
- Two mice have different genes
- They are exposed and treated differently

Decomposing variance

Variability of data is usually a composite of
- Measurement error, sampling scheme
- Random variation
- Genotype
- Exposure, life style, environment
- Treatment

Statistical conclusions can often be obtained by explaining the sources of variation in the data.

Example 1

In the yeast experiment of Smith and Kruglyak (2008), transcript levels were profiled in 6 replicates of the same strain, called 'RM', grown in glucose under controlled conditions. (The article is available at http://biology.plosjournals.org)

[Figure omitted] Sources of the variation of these 6 values:
- Measurement error
- Random variation

In the same yeast experiment, Smith and Kruglyak (2008) also profiled 6 replicates of a different strain, called 'By', in glucose. The order in which the 12 samples were processed was randomized to minimize systematic experimental effects.

[Figure omitted] Sources of the variation of these 12 values:
- Measurement error
- Study design/experimental environment
- Genotype

Furthermore, Smith and Kruglyak (2008) cultured 6 'RM' and 6 'By' replicates in ethanol. The order in which the 24 samples were processed was randomized to minimize systematic experimental effects.

Sources of variation

[Figure omitted] Sources of variation:
- Measurement error
- Experimental environment
- Genes
- Exposure, environmental factors

Example 2

Festing and Weigler, in the Handbook of Laboratory Animal Science, "... consider the results of an experiment using a completely randomized design ... in which adult C57BL/6 mice were randomly allocated to one of four dose levels of a hormone compound. The uterus weight was measured after an appropriate time interval."

[Figures omitted]

Conclusions from the figures:
- The uterus weight depends on the dose
- The variation of the data increases with increasing dose

Question: Why could these first conclusions be wrong?

Descriptive statistics (summarizing data)

Categorical variables: count (%).
Continuous variables:
- raw values (if n is small)
- range (min, max)
- location: median (IQR = inter-quartile range)
- location: mean (SD)

Sample: Table 1 of "Quality of life (QOL), supportive care, and spirituality in hematopoietic stem cell transplant (HSCT) patients", Sirilla & Overcash, Supportive Care in Cancer, October 2012.

R excursion: calculating descriptive statistics in groups

```r
library(Publish)
library(data.table)
data(Diabetes)
setDT(Diabetes)  ## make data.table
Diabetes[, .(mean.age = mean(age),
             sd.age = sd(age),
             median.chol = median(chol, na.rm = TRUE)), by = location]
```

```
     location mean.age   sd.age median.chol
1: Buckingham 47.07500 16.74849         202
2:     Louisa 46.63054 15.90929         206
```
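The same group-wise summaries can also be obtained without data.table. Below is a minimal sketch using only base R (it assumes the Publish package and its Diabetes data are available, as above, with the column names age, chol, gender and location used throughout these slides):

```r
library(Publish)
data(Diabetes)

## mean and SD of age per location (base-R counterpart of the data.table call above)
aggregate(age ~ location, data = Diabetes, FUN = mean)
aggregate(age ~ location, data = Diabetes, FUN = sd)

## median cholesterol per location, ignoring missing values
aggregate(chol ~ location, data = Diabetes,
          FUN = function(x) median(x, na.rm = TRUE))

## categorical variable: count and column percentage of gender per location
tab <- table(Diabetes$gender, Diabetes$location)
tab
round(100 * prop.table(tab, margin = 2), 1)
```

The results should match the data.table output shown above.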
R excursion: making table one

```r
library(Publish)
data(Diabetes)
tab1 <- summary(utable(location ~ gender + age + Q(chol) + BMI, data = Diabetes))
tab1
```

```
Variable Level        Buckingham (n=200)   Louisa (n=203)       Total (n=403)        p-value
gender   female       114 (57.0)           120 (59.1)           234 (58.1)
         male         86 (43.0)            83 (40.9)            169 (41.9)           0.7422
age      mean (sd)    47.1 (16.7)          46.6 (15.9)          46.9 (16.3)          0.7847
chol     median [iqr] 202.0 [174.0, 231.0] 206.0 [183.5, 229.0] 204.0 [179.0, 230.0] 0.2017
         missing      1                    0                    1
BMI      mean (sd)    28.6 (7.0)           29.0 (6.2)           28.8 (6.6)           0.5424
         missing      3                    3                    6
```

R excursion: exporting a table

Method 1: Write the table to a file, then open tab1.csv with Excel:

```r
write.csv(tab1, file = "tables/tab1.csv")
```

Method 2: Use kable (https://www.rdocumentation.org/packages/knitr/versions/1.17/topics/kable; see also https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html) and include the table in a dynamic report:

```{r, results='asis'}
knitr::kable(tab1)
```

Dynamite plots are deprecated (DO NOT USE)

Exercise: Read and discuss the documentation of why dynamite plots are not good: http://biostat.mc.vanderbilt.edu/wiki/Main/DynamitePlots

Dot plots are appreciated when n is small

[Figure: dot plot of the measurement scale by group. Group A (n=3), group B (n=3, one replicate), group C (n=4)]

Box plots are appreciated when n is large

[Figure: box plots of the measurement scale by group. Group A (n=300), group B (n=400), group C (n=400)]

Making boxplots with ggplot2

```r
library(ggplot2)
bp <- ggplot(Diabetes, aes(location, chol))
bp <- bp + geom_boxplot(aes(fill = location))
print(bp)
```

Find the ggplot2 cheat sheet via the help menu in RStudio.

[Figure: box plots of chol by location (Buckingham, Louisa)]

Stratified version:

```r
bp + facet_grid(. ~ gender)
```

[Figure: box plots of chol by location, faceted by gender (female, male)]

Making dotplots with ggplot2

```r
dp <- ggplot(mice, aes(x = Dose, fill = Dose, y = BodyWeight))
dp <- dp + geom_dotplot(binaxis = "y")
print(dp)
```

[Figure: dot plot of BodyWeight by Dose (0, 1, 2.5, 7.5, 50)]

R excursion: exporting a figure

Write the figure to pdf (vector graphics, infinite zoom; eps works analogously via postscript("figures/dotplot-mice-bodyweight.eps")):

```r
ggsave(dp, file = "dotplot-mice-bodyweight.pdf")
# or
pdf("figures/dotplot-mice-bodyweight.pdf")
dp
dev.off()
```

Write the figure to jpg (image file; tiff, gif, etc. work analogously):

```r
jpeg("figures/dotplot-mice-bodyweight.jpg")
dp
dev.off()
```

Quantifying variability

A sample of data X_1, ..., X_N has a standard deviation (SD); it is defined by

SD = \sqrt{ \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})^2 },  where  \bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i

SD measures the variability of the measurements in the sample. The variance of the sample is defined as SD^2. The term 'standard deviation' relates to the normal distribution.
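To make the definition concrete, the following minimal sketch computes the SD "by hand" and compares it with R's built-in sd(); the four example values are the uterus weights of the untreated mice used later in these slides:

```r
## hand-coded sample SD versus the built-in sd()
x <- c(0.012, 0.0088, 0.0069, 0.009)  # example values: uterus weights (g) used later in the slides
N <- length(x)
xbar <- mean(x)                       # sample mean
SD.hand <- sqrt(sum((x - xbar)^2) / (N - 1))
SD.hand                               # approximately 0.002108
sd(x)                                 # agrees with the hand-coded version
SD.hand^2                             # the sample variance, SD^2 (approximately 0.0000044)
```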
Normal distribution

What is so special about the normal distribution?
- It is symmetric around the mean; thus the mean is equal to the median.
- The mean is the most likely value. Mean and standard deviation describe the full distribution.
- The distribution of measurements, like height, distance, or volume, is often normal.
- The distribution of statistics, like mean, proportion, mean difference, etc., is very often approximately normal.

Quantifying statistical uncertainty

For statistical inference and conclusion making, via p-values and confidence intervals, it is crucial to quantify the variability of the statistic (mean, proportion, mean difference, risk ratio, etc.):

The standard error is the standard deviation of the statistic. The standard error is a measure of the statistical uncertainty.

Illustration

[Figures omitted: repeated samples drawn from the same population, with sample means 3.81, 2.13, and 4.01]

Quantifying statistical uncertainty

Example: We want to estimate the unknown mean uterus weight for untreated mice. The standard error of the mean is defined as

SE = SD / \sqrt{N}

where N is the sample size. Based on N = 4 values, 0.012, 0.0088, 0.0069, 0.009:
- mean: \hat{\beta} = 0.0091
- standard deviation: SD = 0.002108
- empirical variance: var = 0.0000044
- standard error: SE = 0.002108 / 2 = 0.001054

The standard error is the standard deviation of the mean

[Figure: uterus weight (g) estimated in our study and in hypothetical studies 1, 47 and 100, compared with the unknown true average uterus weight]

The (hypothetical) mean values are approximately normally distributed, even if the data are not normally distributed!

Variance vs statistical uncertainty

"The terms standard error and standard deviation are often confused. The contrast between these two terms reflects the important distinction between data description and inference, one that all researchers should appreciate." (Altman & Bland, Statistics Notes, BMJ, 2005; Nagele P, Br J Anaesth 2003;90:514-6)

Rules:
- The higher the unexplained variability of the data, the higher the statistical uncertainty.
- The larger the sample size, the lower the statistical uncertainty.

Confidence intervals

Constructing confidence limits

A 95% confidence interval for the parameter \beta is

[\hat{\beta} - 1.96 \cdot SE, \; \hat{\beta} + 1.96 \cdot SE]

Example: a confidence interval for the mean uterus weight of untreated mice is given by

95% CI = [0.0091 - 1.96 \cdot 0.001054, \; 0.0091 + 1.96 \cdot 0.001054] = [0.007, 0.011]

The standard error SE measures the variability of the mean \hat{\beta} around the (unknown) population value \beta, under the assumption that the model is correctly specified.
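A minimal sketch that reproduces the numbers in this example, computing the standard error of the mean and the 95% confidence interval in R (values taken from the slide above):

```r
## standard error and 95% confidence interval for the mean uterus weight
weights <- c(0.012, 0.0088, 0.0069, 0.009)    # uterus weights (g) of the 4 untreated mice
N    <- length(weights)
beta <- mean(weights)                         # approximately 0.0091
SE   <- sd(weights) / sqrt(N)                 # approximately 0.001054
ci   <- c(beta - 1.96 * SE, beta + 1.96 * SE) # 95% confidence interval
round(ci, 3)                                  # approximately [0.007, 0.011]
```

With only N = 4 observations one would in practice replace 1.96 by the t-quantile qt(0.975, df = N - 1); the slide uses the normal approximation for simplicity.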