Basics of Biostatistics


September, 2013

Marija J. Norušis [email protected]

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

Preface

This book is a first attempt at nonintimidating teaching material for an introductory course in biostatistics. The emphasis is on basic concepts that are needed to understand the results sections of medical publications. Examples are taken from open-access journals which students can access. The focus is on problems that confront many developing nations: malaria, HIV, low birth weight infants. The book includes output from statistical software packages but, at present, is not tied to any software product that needs to be downloaded.

I am grateful for any corrections, suggestions, or comments that readers may have. Future versions will be downloadable under a Creative Commons license from www.norusis.com.

I am grateful to Dr. Richard Heller and Dr. Jim Todd for useful suggestions and encouragement.

Marija J. Norušis [email protected]

Contents

1 Summarizing and Displaying Data
    Scales of Measurement
    Frequency Table
    Pie charts
    Bar charts
    Line Charts
    Scatterplots
    Descriptive Statistics
    Summarizing the Insecticide Treated Net Data
    Percentiles
    Standard Scores

2 Evaluating Results from Samples
    What is a Population?
    What is a Sample?
    Selecting a Sample
    Designing Experiments
    Properties of Samples
    Sampling Distribution of a Statistic
    Evaluating a Claim
    The Binomial Test

3 The Normal Distribution
    Distributions of Sample Means
    Confidence Intervals
    Testing Hypotheses about Population Means

4 Basics of Testing Hypotheses
    Steps in Testing a Hypothesis
    To Err is Statistical
    Statistical Power
    Estimating Sample Size

5 Statistical Procedures for Testing Hypotheses
    Some Warnings on the Use of Significance Tests
    Assumptions, Assumptions
    Distributions of Test Statistics
    Statistical Software
    Testing That Two Population Means Are Equal
    Testing Hypotheses About Count Data
    Measuring Association in 2 by 2 Tables

6 Correlation and Linear Regression
    Plotting Data
    Predictors of Birth Weight
    The Pearson Correlation Coefficient
    Correlation Coefficients Based on Ranks
    Linear Regression Model
    Least Squares Regression Line

7 Statistical Models
    Why Fit a Model?
    The Multiple Linear Regression Model
    Nominal Variables in Models
    The Logistic Regression Model
    Analyzing Survival Data

1 Summarizing and Displaying Data

• What types of graphical displays are useful for summarizing data?
• What are levels of measurement?
• What summary statistics describe central tendency? Dispersion?
• What is a standard score?

Whether you're presenting the results of a complicated international drug trial or gathering data for your own clinic or hospital, you have to summarize your observations. You know you can't just list ages, blood pressures, or number of malaria nets for everyone in the study and leave it to the readers and reviewers to draw their own conclusions. You have to select appropriate charts, tables and statistical measures to convey the information that is important.

There's no single best way to summarize and display data. The best way depends on the properties of the data and of the statistical measures, on your audience, and on your personal tastes in charts and graphs. Charts and graphs that display only one or two variables are usually not too difficult to understand, while those that attempt to portray complex relationships, such as a map that shows the distribution and severity of dengue in different age groups over time, may require quite a bit of effort to decipher.

Always remember that the purpose of visual displays is to make it easier for the audience to understand the results. If you're making the charts, even if you have software which can produce incredibly complicated displays, don't put more information on a chart than the human mind can absorb without overheating. Some published charts are impossible to understand without the author sitting next to you. (And it's possible that the author has delegated chart making to someone else and is embarrassed to admit that he/she doesn't quite understand it either.)

When calculating summary measures, make sure they are appropriate for your data. The widespread availability of statistical software makes it easy to calculate all kinds of statistics with a couple of clicks of the mouse, whether they make sense or not. (In this chapter "statistic" refers to any number calculated from your data values.) Don't calculate an average religion or ethnicity!

Scales of Measurement

When you conduct a study you typically collect multiple pieces of information, called variables, for each person or case. For example, you may record height, weight, gender, and admitting hospital for a group of patients. Each patient has a value for each variable. If Faramola is 60 cm tall, 60 cm is the value of the height variable. If you are using software to analyze your data you'll have to enter the values of the variables for all of the cases into what's called a data file.

How you assign values or symbols to what you are measuring is called the scale of measurement. Variables can be measured in different ways. Height can have many values, while gender is restricted to two values. Heights can be ordered from smallest to largest, while admitting hospital is just a name and can't be ordered based on the name alone. Variables are often classified into four categories based on how they are measured:

• Nominal variables have values that cannot be ordered in any meaningful way. Country of birth, name of drug, and color of hair are all examples of nominal variables. In a data file cases are assigned the same symbol if they have the same value of the attribute. The symbol itself doesn't mean anything.

• Ordinal variables have values that can be arranged in some sensible way, such as worst to best, but the distance between the coded values doesn't mean anything. If you rank your health on a scale from 1 (poor) to 10 (excellent), the distance between adjacent ratings doesn't have a clear meaning. A person with a rating of 10 is not twice as healthy as a person with a rating of 5, nor is the change in health from 5 to 6 the same as the change in health from 1 to 2. All you can say is that the person with a rating of 10 considers himself healthier than the person with a rating of 5.

• Ratio variables have values that can be ordered and the actual distance between values is interpretable. The distance between 60 cm and 62 cm is the same as the distance between 40 cm and 42 cm. Ratio variables also have an absolute 0 so that you can compute meaningful ratios between values. For example, a 20 year old person is twice as old as a 10 year old person. A variable that has meaningful distances between values but does not have an absolute zero point is said to be measured on an interval scale. Temperature is the usual example of an interval variable since a 40 degree day cannot be said to be twice as hot as a 20 degree day (unless temperatures are measured on the Kelvin scale, which does have an absolute zero).

Before you start analyzing data you must consider the scale of measurement for the variables of interest. A statistical analysis which is appropriate for a variable like weight may be totally inappropriate for a variable like region of the country. Often numerical codes are chosen to represent nominal variables

in a data file. For example, the value of 1 may correspond to Africa, 2 to South East Asia, 3 to Europe and so on. That does not change the level of measurement of the variable. It is still a nominal variable. This is important if you are analyzing data files with a computer since statistical software will often spit out answers without any regard to whether the result makes sense.

Variables can also be classified based on the number of values they can have. Discrete or categorical variables, such as stage of a disease, drug administered, or number of Insecticide Treated Nets (ITNs) in a household, have a limited number of distinct values. Continuous variables such as blood pressure and weight can have many possible values. Ratio variables can be either discrete (family size) or continuous (height, weight). Ordinal variables (job satisfaction, health status) are usually discrete.

Frequency Table

If a variable has a small number of possible values a straightforward summary method is simply to count the number of times each value occurs. This is called a frequency table. For example, consider Figure 1.1 which shows the number of insecticide-treated bed nets (ITNs) per household in a village in south Ethiopia as reported by Loha et al. (2013). The symbol N or n is usually used for sample sizes in tables or charts.

Figure 1.1: Number of Insecticide Treated Nets in Household

From the table you see that 241 households did not have any ITNs. That's 19.9% of the 1212 households in the study (241/1212 times 100). A percentage tells you what part of the whole a particular piece is. The last row of the table tells you what percent of the households had any insecticide treated bed nets. That is, the percent that had 1, 2, 3 or 4 bed nets. A little over 80% of households had at least one ITN.

For variables where it makes sense to combine adjacent values, like age or income, you can make a frequency table where each row corresponds to a range of values. This is sometimes called a grouped frequency table. For example, for age you can set up categories like less than 5 years, 5-11 years, 12-18 years, and older than 18 years, and count the number of cases in each category. Make sure that the categories don't overlap and that all possible values are included.
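If you are tallying the data yourself, any statistical package or spreadsheet will produce a frequency table for you. The short Python sketch below illustrates the idea; the data and variable names (itns, ages) are made up for illustration, and the age categories are the ones suggested above.

    from collections import Counter

    # Hypothetical data: number of ITNs reported by each household.
    itns = [0, 2, 1, 2, 3, 0, 1, 2, 4, 1]

    # Frequency table: count how often each value occurs and
    # convert the counts to percentages of all households.
    counts = Counter(itns)
    n = len(itns)
    for value in sorted(counts):
        print(value, counts[value], round(100 * counts[value] / n, 1))

    # Grouped frequency table for age: non-overlapping categories
    # that together cover every possible value.
    ages = [3, 7, 15, 22, 41, 9, 17, 30]

    def age_group(age):
        if age < 5:
            return "under 5"
        elif age <= 11:
            return "5-11"
        elif age <= 18:
            return "12-18"
        return "over 18"

    print(Counter(age_group(a) for a in ages))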

If you're the one actually analyzing the data instead of just reading someone else's results, a frequency table can reveal problems with your data that need to be corrected before you perform any additional analyses. For example if you find unlikely or impossible values in your table, such as 30 ITNs, you know that value should be checked and corrected. The person recording or entering the information probably made a mistake. If you don't correct errors in your data at the very beginning everything else you do may be incorrect. Check your ingredients before you start cooking!

Pie charts

The information in a table is often easier to understand if you turn it into a visual display. Figure 1.2 is a pie chart for the ITN table. (A pie is a round cake.) There is a slice for each of the first five rows in the table. The size of the slice represents the counts or percentages for each value in the table. Many statisticians advise against the use of pie charts since humans aren't very good at visually comparing areas. That doesn't stop their use and you'll encounter pie charts in many publications and reports.

Figure 1.2: Pie Chart of Number of ITNs Owned by Household at First Census


Pie charts are not restricted to count data. They can be used whenever you subdivide a total into constituent parts. Figure 1.3 shows the total money spent on malaria control in India and Laos subdivided by type of activity (World Malaria Report, 2012).

Figure 1.3: Expenditures by Intervention India (left) and Laos

No doubt the designers of the report were concerned about the report being visually attractive so they selected shades of the same color to represent different interventions. Unfortunately that makes it difficult for the reader to distinguish between interventions represented by similar shades. You can tell, however, that India and Laos use their funds very differently. ITNs and diagnostics account for almost half of India's expenditures and roughly an eighth of Laos' expenditures. Laos reports that it spends almost half of its funds on management and other costs. Whenever data is self reported by individuals or agencies you have to be concerned about whether instructions and definitions are uniformly applied.

One important determinant of spending allocation is the incidence of malaria in a country. Look at the Epidemiological Profiles for India and Laos, shown in Figure 1.4 and Figure 1.5. A little more than a third of Laos' population (36%) lives in high transmission areas as compared to 22% for India. In

India, 11% of the population is estimated to live in malaria-free regions, compared to 41% for Laos. (The epidemiological profiles are grouped frequency tables where rows correspond to ranges of malaria incidence.)

Figure 1.4: Epidemiological Profile for India

Figure 1.5: Epidemiological Profile for Laos

Exercise: Make pie charts of the epidemiological profiles for Laos and India. Combine the two epidemiological profiles into a single frequency table. Make sure to recalculate the percentages.

Bar charts

In a pie chart, the size of a slice depends on the number of cases or some other statistic, like the sum, for each category. In a bar chart the height of a bar depends on that statistic. Psychologists believe that humans process height better than area, so you'll often encounter bar charts when reading journals. Another advantage of bar charts is that it's easy to show multiple bars corresponding to various groupings of the data on the same chart. Bar charts come in many flavors as you will see.

In a bar chart you have as many bars as you have slices in a pie chart. Figure 1.6 is a bar chart of the data shown in Figure 1.1. The number of ITNs in a household is on the horizontal axis and the percentage of households is on the vertical axis. The pie chart and the bar chart are different visual displays of the same information.

Figure 1.6: Bar Chart of Number of Insecticide Treated Nets in a Household


Whenever you look at a chart, pay attention to the values on the axis which has the scale. That is, the axis that shows the percent or the count. If the scale does not start at 0, differences between bars are artificially magnified. You can make the difference between 20 and 22 appear very large if you start the scale axis at 19.

Histograms

If you have an ordered variable with many values you can display its distribution in a histogram. A histogram is like a bar chart except that adjacent values are combined into a single bar and distances on the horizontal axis are measured on a scale. The height of a bar depends on the number of cases that fall in that interval. The horizontal axis in a bar chart isn't ordered or scaled. For example, if you didn't have any cases with three nets, the bar for two nets would be next to the bar for four nets. You wouldn't see a space indicating that the three nets were never observed. In contrast, a histogram leaves blank space when there are no cases for a range of values.

Figure 1.7 shows a histogram of ages for a national survey of adults. Each bar represents a five year interval, instead of single years. If you didn't have any cases in a particular age group, say 40-45, unlike in a bar chart, you would see a gap corresponding to the 0 frequency. You see that the highest peak is for people in their fifties. Since this was a study of adults, the lowest acceptable age was 18. There was no upper limit on age.
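If you are drawing the chart yourself, most packages only need the raw values and a bin width. A minimal sketch using Python's matplotlib library, with made-up ages and five-year bins like those in Figure 1.7:

    import matplotlib.pyplot as plt

    # Hypothetical ages from a survey of adults (18 and older).
    ages = [23, 35, 52, 47, 61, 58, 19, 44, 70, 55, 38, 29, 66, 51, 49]

    # Five-year bins; an age range with no cases is left as a gap.
    plt.hist(ages, bins=range(15, 95, 5), edgecolor="black")
    plt.xlabel("Age in years")
    plt.ylabel("Number of people")
    plt.title("Histogram of Age")
    plt.show()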

A histogram shows detailed information about the distribution of a variable. You can tell if a distribution has a single peak or multiple peaks, whether

Figure 1.7: Histogram of Age

there are gaps in the data, and whether there are outliers--values that are far removed from the rest. You can also tell whether the distribution is approximately symmetric, meaning that the two sides are mirror images of each other. If a distribution has one tail that extends farther from the center than the other the distribution is skewed. A distribution with a tail toward larger values is said to be skewed to the right, or have a positive skew; a distribution with a tail toward smaller values is skewed to the left, or negatively skewed. Income and age are variables that are often positively skewed. That's because there is a strict limit on how small the values can be but the upper limit is not restricted. Figure 1.8 shows what histograms look like for data with negative, positive and no skew.

Figure 1.8: Distributions with Positive, Negative, and No Skewness

Distributions of variables are important for statistical testing of hypotheses and they are discussed in more detail in the chapter on Hypothesis Testing.

Grouped Bar Charts

Like pie charts, bar charts are used to display all kinds of summary measures besides counts. Figure 1.9 is a bar chart of ITN use in five countries over a three-year period. For each year and country there are two bars, one for the proportion of the population with access to an ITN, the other for the proportion reporting sleeping under a net.

Figure 1.9: Access and Use of Insecticide Treated Nets

You see that for each country the percent of the population with access to ITNs has increased with time as has the proportion sleeping under a net. Surveys show that for each year Rwanda has the greatest access and use.

Stacked Bar Charts of Sums

Instead of having bars next to each other, you can stack them one on top of another. As an example, consider Figure 1.10 from the World Malaria Report. Each bar represents total domestic funding for malaria control for one of the WHO regions in a year. You see that about 400 million US Dollars were spent for domestic funding of malaria control in 2005, increasing to over 600 million USD by 2011. The total spent is subdivided by region which is identified by the colors shown above the figure. Funding has increased or stayed the same in all regions except Europe, where it has decreased. In particular funding in South East Asia (SEAR) has increased since 2009.

Figure 1.10: Funding for Malaria Control

Cumulative Bar Charts

You know that the numbers of malaria cases and deaths vary by country. You can display the counts in a simple bar chart with each country corresponding to a bar. For a data set with 101 countries that's a lot of bars! A more informative display is to sort the countries by the number of deaths/cases and then to calculate the percentage of all malaria cases/deaths that occur in that country and in all countries with a larger number of deaths. You cumulate the deaths across countries. That lets you identify the subset of countries that account for the largest proportion of malaria deaths.

Consider Figure 1.11. Nigeria has both the most malaria cases and the most deaths (25% of the cases and 30% of the deaths) so it is the first bar in both plots.

Figure 1.11: Cumulative Proportion of Malaria Cases and Deaths

In the case plot (on the left) the bar for India is the percent of total cases due to both Nigeria and India. The difference between the first two bars is the percent of cases for India. Each bar is calculated from the sum of that country's cases (or deaths) and those of all countries with larger values. You see that a small number of countries (with large populations) account for most of the cases and deaths. Eight countries are responsible for 60% of the cases of malaria; six countries are responsible for 60% of all malaria deaths.

Exercise: Which countries account for 50% of the malaria deaths? Which for 50% of the cases?

Line Charts

If a variable is ordered in some meaningful way, for example time, health status, or malaria incidence, an alternative to a series of bar charts is a line chart. For example, instead of showing ten bars that show the percentage of households using ITNs at 10 time points, you mark what would have been the height of the bar with a single point and then connect the points with a line. The horizontal scale doesn't have to have equal intervals but the categories should be ordered.

Figure 1.12 is a plot of ITN use by time for the 101 households in the malaria study. The horizontal axis is the week of study. For each week, instead of a bar, you place a point that shows the percentage of the households that use ITNs. Then you connect the points with a straight line. (It would take a lot of bars to show the same information!) If you have groups you can use different symbols for each group. Figure 1.12 uses different symbols for the group as a whole and for males and females. The intervals on the horizontal axis are equal, since they are weeks, but they don't have to be.

Figure 1.12: Insecticide treated Bed Net Use by Gender

From the line chart in Figure 1.12 you see that something happened to cause a marked increase of ITN use. In fact, at week 48 ITNs were freely distributed to the participants. This caused the ITN use fraction to increase dramatically. The fraction was about 0.2 at week 47 and then skyrocketed to over 0.6 at week 48. Both before and after the intervention women were more likely to use nets than men. (Note that the vertical axis starts not at 0 but at 0.1. That makes the differences between men and women appear larger on the plot than if the axis started at 0.)

Scatterplots

In a line chart each location on the horizontal axis has one point, or several points if several groups are plotted on the same chart. For example, at week 10 you have a single value for the proportion of ITN use for males, another value for females and then one for both combined. You're summing or averaging the values of a variable for each location on the horizontal axis.

In a scatterplot you have two variables for each observation and you plot the values for each pair on the horizontal and vertical axes. Figure 1.13 is a scatterplot of the proportion of the population sleeping under ITNs for rural and urban areas as estimated from older and more recent surveys. For each survey you have two values: the proportion sleeping under ITNs for rural areas and the proportion sleeping under ITNs for urban areas. Those are the values that are shown in Figure 1.13.

Figure 1.13: Insecticide Treated Net Use In Urban and Rural Areas

Look at the rightmost point on the scatterplot. It corresponds to a survey with a value of 70% for rural areas and a little more than 70% for urban areas. The line that's drawn through the scatterplot is where points would fall if rural and urban areas had exactly the same proportions sleeping under ITNs. Surveys that found urban areas to have larger percentages of people sleeping under the net than rural areas are above the line. Surveys in which rural areas had larger percentages than urban areas are below the line. Regression lines, which are used to summarize the data points on a scatterplot, are discussed in great detail in Chapter 6.

Descriptive Statistics

Frequency tables, bar charts, scatterplots and histograms are essential first steps in examining your data. From them you can see how often different values of a variable occur. Often, however, you need to summarize the information further, especially if a variable has many distinct values. You want to compute measures that describe the "typical" value as well as how much the data spread out around this value. You want to answer questions such as "what's the average number of ITNs in a household?" and "How much spread is there in the number of ITNs owned?" Sometimes there is no good answer to these questions. For example, if your data showed only households with no ITNs or 3 ITNs, trying to summarize the data by pooling the two groups would lead to misleading results.

You should never calculate summary descriptive statistics without first examining the distribution of the data values using tables and charts. Summary statistics can hide potentially serious problems with your data. Unless you examine all of the values you won't detect errors in data recording or entry, errors in setting up a data file or data values that may have been corrupted by your software.

Measures of Central Tendency

The arithmetic average, median, and mode are the most frequently reported measures of "typical." The mode is the value that occurs most frequently. From Figure 1.1 you see that the largest number of households (463) had two ITNs, so that is the mode. It's not difficult to see why the mode alone is not a very good summary measure. If you know that the mode is two ITNs it's possible that almost all households have two nets. It's also possible that 605 households don't have any nets and 607 have two nets. The mode doesn't tell you anything about the distribution of the values. If you are reporting a mode make sure you also indicate the percent of cases in that

category. You should say "The mode is 2, with 38.2% of households having two ITNs."

For a variable measured on a nominal scale, like type of insecticide used, or how the net was obtained, the mode is the only summary measure that you can compute. For variables that have values that can be ordered you can use this additional information for calculating a measure of central tendency.

Median

The median makes better use of the data than the mode. The median is the value that is in the middle when the data points are sorted from smallest to largest. For example, if the ages of five people are 28, 29, 30, 31 and 32, the median is 30 years, since it is the middle value. Half of the cases have values greater than the median and half have values less than the median. If you have an even number of cases there isn't a single number in the middle, instead there are two numbers. For example, if the values are 21, 28, 29, 30, 31 and 40, the middle values are 29 and 30. To calculate the median you add up the two middle values and divide by 2. In this case the median is (29+30)/2 = 29.5.

A shortcoming of the median as a summary measure is that it ignores much of the available information. The median for 10, 15, 17, 19, and 20 is the same as the median for 0, 0, 17, 100, 20000. The actual amount that values fall above or below the median has no effect on the median. This can be an advantage if you have one or more outlying values which are far removed from the rest since they will have no effect on the median.

Arithmetic Mean

The most commonly used measure of "typical" is the arithmetic mean, also known as the average. The mean uses the actual values of all of the cases. To compute the mean, add up the values of all of the cases and then divide by the number of cases.

The arithmetic mean of the five values 28, 29, 30, 98 and 190 is

28293098190 Mean= =75 5

One of the disadvantages of the arithmetic mean is that cases with values that are very different from the rest can have a large effect on the mean, especially if the sample sizes are small. If you observe 5 households with 0, 1, 1, 2 and 20 ITNs, the arithmetic mean is 4.8. That's not a very good indicator of the typical number of nets in the households.

For a symmetric distribution the mean, median and mode are the same--the value that corresponds to the peak. If you have a distribution with several peaks, the sample mean may fall in between the peaks and not be at all a good measure of typical.

To prevent cases with outlying values from distorting the mean, you can calculate what's called a trimmed mean. A trimmed mean is calculated just like the usual arithmetic mean, except that a designated percentage of cases with the largest and smallest values is excluded from the calculation. This makes the trimmed mean less sensitive to outlying values. The 5% trimmed mean excludes the 5% of cases with the largest values and the 5% with the smallest values. It's based on the 90% of cases in the middle. The trimmed mean provides an alternative to the median when you have some data values that are far removed from the rest. (In your readings you may also encounter statistics called M-estimators which, instead of discarding the largest and smallest sets of values, give them less weight when calculating the mean.)
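These calculations are easy to reproduce. Here is a small Python sketch using the standard library's statistics module and a hand-rolled trimmed mean; the exact trimming rule differs from package to package, so treat it as an illustration rather than a definition.

    import statistics

    nets = [0, 1, 1, 2, 20]          # five households, one extreme value

    print(statistics.mean(nets))     # 4.8 -- pulled up by the outlier
    print(statistics.median(nets))   # 1   -- unaffected by the outlier
    print(statistics.mode(nets))     # 1   -- the most frequent value

    def trimmed_mean(values, proportion=0.05):
        """Drop the given proportion of cases from each end, then average."""
        ordered = sorted(values)
        k = int(len(ordered) * proportion)   # cases to drop at each end
        return statistics.mean(ordered[k:len(ordered) - k])

    # With only five cases a 5% trim drops nothing; a 20% trim drops one
    # case from each end and pulls the mean much closer to the median.
    print(trimmed_mean(nets, 0.20))  # mean of [1, 1, 2], about 1.33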

Exercise: In the table that follows why do you think the authors report the median instead of the mean for weeks of ITN use? How would you summarize the table?

Measures of Variability

Measures of central tendency don't tell you anything about how much the data values differ from each other. For example, the mean and median are both 50 for these two sets of ages: 50, 50, 50, 50, 50 and 10, 20, 50, 80, 90. However, the distribution of ages is very different between the two sets. Measures of variability quantify the spread of observations. The range is the most straightforward measure of spread. It's the difference between the largest observed value (the maximum) and the smallest observed value (the minimum). A large value for the range tells you that the largest and smallest values differ substantially. It doesn't tell you anything about how much the values in between vary. For the first set of ages, the range is 0; for the second set the range is 90 – 10, or 80.

A better measure of variability is the interquartile range which is the range when cases with the 25% largest and 25% smallest values are excluded. It's the distance between the 75th and 25th percentile. Unlike the ordinary range, the interquartile range is not easily affected by extreme values.

Variance and Standard Deviation

The most commonly used measure of variability is the variance. It is based on the squared distances between the values of the individual cases and the mean. To calculate the squared distance between a value and the mean, just subtract the mean from the value and then square the difference. (One reason you must use the squared distance instead of the distance is that the sum of distances around the mean is always 0.) To get the variance, sum up the squared distances from the mean for all cases and divide the sum by the number of cases minus 1.

For example, to calculate the variance of the five numbers 28, 29, 30, 98, and 190, first find the mean. It is 75. The sample variance is then

s² = [(28 − 75)² + (29 − 75)² + (30 − 75)² + (98 − 75)² + (190 − 75)²] / (5 − 1) = 5026

If the variance is 0, all of the cases have the same value. The larger the variance, the more the values are spread out. To obtain a measure in the same units as the original data, you can take the square root of the variance and obtain what’s known as the standard deviation. For this example the standard deviation is 70.9.
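The same arithmetic written out in Python, both by hand and with the standard library's statistics module (which also divides by the number of cases minus 1):

    import statistics

    values = [28, 29, 30, 98, 190]
    mean = sum(values) / len(values)                         # 75.0

    # Sum of squared distances from the mean, divided by n - 1.
    squared_distances = [(x - mean) ** 2 for x in values]
    variance = sum(squared_distances) / (len(values) - 1)    # 5026.0
    standard_deviation = variance ** 0.5                     # about 70.9

    # The standard library gives the same answers.
    print(statistics.variance(values), statistics.stdev(values))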

The Coefficient of Variation

The magnitude of the standard deviation depends on the units used to measure a variable. For example, the standard deviation for age measured in days is larger than the standard deviation of the same ages measured in years. (In fact, the standard deviation for age in days is 365.25 times the standard deviation for age in years.) Similarly, a variable such as yearly salary will usually have a larger standard deviation than a variable such as height in meters.

The coefficient of variation compares the standard deviation to the mean. It tells you what percent of the mean the standard deviation is. To compute the coefficient of variation, just divide the standard deviation by the mean and multiply by 100. (Take the absolute value of the mean if it is negative.)

For example, if the age in days has a mean of 1500 days, and a standard deviation of 200 days, the coefficient of variation is

Coefficient of Variation = (200 days / 1500 days) x 100 = 13.3%

The coefficient of variation will be the same if age is recorded in days, months or years, provided it is recorded in the same detail. Since both the mean and the standard deviation are in the same units, the coefficient of variation is unit free.
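The calculation is simple enough to wrap in a small Python helper; a sketch, using the ages-in-days example above:

    def coefficient_of_variation(standard_deviation, mean):
        """Standard deviation expressed as a percentage of the (absolute) mean."""
        return 100 * standard_deviation / abs(mean)

    # Age in days: mean 1500, standard deviation 200.
    print(coefficient_of_variation(200, 1500))   # about 13.3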

The coefficient of variation allows you to compare the variability of different variables. For example, you can compare whether systolic blood pressures are more variable than diastolic blood pressures or whether the number of children is more variable than the number of ITNs.

Summarizing the Insecticide Treated Net Data

Figure 1.14 shows descriptive statistics for the number of insecticide treated nets in a household, as shown in Figure 1.1. The number of households in the study is 1212. When examining the results of any study always note how many observational units (people, animals, schools) were initially enrolled in the study and how many appear in the results. There are many possible reasons why observations are incomplete. Some are innocuous, like being unable to read a number on a form, but others can seriously discredit the results. If patients find a treatment unbearable and refuse to continue, eliminating them from the study will make the results impossible to interpret. In the malaria study, if people who refused to use ITNs didn't continue in the study, it would make ITN use appear to increase even if there were little change.

Figure 1.14: Summary Statistics of ITNs Owned for 1212 Households

The average number of nets is 1.35. Half of the households have one net or fewer. Thirty-eight percent of households have two nets, the most frequently occurring number. The range is 4, the difference between the smallest value (0) and the largest value (4). The variance is 0.81, so the standard deviation is 0.90, its square root. The standard deviation is 67% of the mean, so that is the coefficient of variation.

Calculating the Summary Statistics

Since you have only 5 possible values for the number of insecticide treated nets, calculating the descriptive statistics is particularly easy. The sum of the number of nets is

Sum = (241 x 0) + (414 x 1) + (463 x 2) + (85 x 3) + (9 x 4) = 1631

Mean = 1631 / 1212 = 1.35

Similarly, the variance is

s² = [241(0 − 1.35)² + 414(1 − 1.35)² + 463(2 − 1.35)² + 85(3 − 1.35)² + 9(4 − 1.35)²] / 1211 = 0.81
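The same calculation in Python, working directly from the counts in Figure 1.1 rather than from the 1212 individual values:

    # Counts from Figure 1.1: number of households owning 0 to 4 ITNs.
    counts = {0: 241, 1: 414, 2: 463, 3: 85, 4: 9}

    n = sum(counts.values())                                    # 1212
    total_nets = sum(value * k for value, k in counts.items())  # 1631
    mean = total_nets / n                                       # about 1.35

    variance = sum(k * (value - mean) ** 2
                   for value, k in counts.items()) / (n - 1)    # about 0.81
    standard_deviation = variance ** 0.5                        # about 0.90

    print(n, mean, variance, standard_deviation)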

Exercise: You saw that 20% of all households do not have any ITNs. Above they are included in the computation of the statistics with a value of 0. It might be more informative to restrict the computation of the summary statistics to households that actually have nets and to report the number of households without nets separately. Recalculate the descriptive statistics for households that actually had ITNs. Compare the two sets of statistics. Which do you think is a better summary? In their results section the authors of the paper say that the average number of ITNs is 1.68. Do you recognize this number? Do you think the authors should have mentioned that the average does not include households that don't have any nets?

Percentiles

The median splits a sample into two equal parts based on the values of a variable. You can compute values that split the sample in other ways as well. For example, you can find the value below which 25% of the observations fall, or the value below which 90 percent of the observations fall. Such values are called percentiles, since they tell you the percentage of cases with values below them. Twenty five percent of cases have values smaller than the 25th percentile, so 75% of the cases have values larger than the 25th percentile.

Many standardized tests give you percentile rankings as well as scores. That lets you determine where you stand in the pool of test-takers. A score at the 90th percentile is great, since 90% of people scored worse than you did; only 10% scored better. A percentile of 15 is nothing to brag about, since 85% of people scored better than you did. In contrast, when studying malaria cases and deaths, having low percentile values when compared to other countries is good since that means you have fewer cases and deaths.

Together, the 25th, 50th, and 75th percentiles are known as quartiles, since they split the sample into four groups with approximately equal numbers of cases. Values that divide the sample into three groups are called tertiles; into five groups, quintiles and into ten groups, deciles.
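Any statistical package will report percentiles for you. In Python, the standard library's statistics.quantiles function returns the cut points that divide the data into any number of equal-sized groups; a sketch with made-up ages:

    import statistics

    ages = [10, 20, 50, 80, 90, 35, 42, 61, 28, 55, 73, 47]

    # Quartiles: the three cut points that split the sample into four groups.
    q1, median, q3 = statistics.quantiles(ages, n=4)
    print(q1, median, q3)

    # The interquartile range is the distance between the 75th and
    # 25th percentiles.
    print(q3 - q1)

    # Quintiles (five groups) and deciles (ten groups) work the same way.
    print(statistics.quantiles(ages, n=5))
    print(statistics.quantiles(ages, n=10))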

Figure 1.15 shows domestic and international per capita disbursements when countries are subdivided into five equal sized groups, quintiles, based on their observed malaria mortality. Countries with highest malaria mortality rates have the largest external disbursements. As mortality rates decrease

domestic spending increases.

Figure 1.15: Spending by Source for Quintiles of Malaria Deaths

Exercise: Based on Figure 1.11, which countries fall into the top quarter of malaria cases? malaria deaths? Can you determine all of the countries that fall into the bottom quarter? (There are 101 countries total.)

Figure 1.16 shows World Health Organization estimated percentiles of body mass index for girls 5 to 19 years of age. (The body mass index is calculated as the weight in kilograms divided by the square of the height in meters.) Since values are shown by month of age, as you would expect the percentiles are constant over the six-month interval shown. Half of all girls between 5 and 19 are estimated to have a BMI less than 15.2 and half greater than 15.2. Only 3 percent of girls have BMIs greater than 18.6 or less than 12.9. Having normative values is important for evaluating risk factors for individuals.

Figure 1.16: Percentiles of BMI for Girls 5-19 years of age

Box Plots: Displaying Percentiles

When you use bar charts to compare means of groups, you ignore important information about the data. You can't tell anything about the distribution of data values by looking at just the means. If you present histograms for each group you suffer from the problem of too much information. It's difficult to compare a bunch of histograms.

A box plot is a display that conveys much more information than a bar chart but less than the individual histograms. It is particularly useful when you want to compare the distributions of several groups. A box plot simultaneously displays the median, the interquartile range, and the smallest and largest values for each group. Figure 1.17 is a basic annotated box plot. The length of the box indicates the variability of the observations. Long boxes have data values with more spread than short boxes. The lower boundary of each box is the 25th percentile, the upper boundary the 75th percentile. Half of the data values are within the lower and upper boundaries of the box. The lines extending from the ends of the box are called “whiskers” and the plot is often called a box and whisker plot. Usually in a box plot the whiskers extend to the smallest and largest observed data values that aren't

outliers, where an outlier is a point more than 1.5 box lengths from the edge of the box. (Sometimes box plots are displayed horizontally rather than vertically.)

Figure 1.17: Schematic Box Plot

From a box plot you can tell about the shape of a distribution. If the median line is close to the center of the box, and the distribution of data values has a single peak and is continuous, the distribution of the variable is more or less symmetric. If the median is closer to the bottom of the box than to the top, the data are positively skewed. This means that it has a tail toward larger values. If the median is closer to the top of the box than to the bottom, the opposite is true: the distribution is negatively skewed. The length of the tail is shown by the whiskers.
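Box plots are easy to produce yourself. A minimal sketch using Python's matplotlib library, which by default uses the 1.5-box-length rule described above to decide which points to draw as outliers; the two groups of weekly ITN-use fractions are made up for illustration.

    import matplotlib.pyplot as plt

    # Hypothetical weekly ITN-use fractions before and after a distribution.
    before = [0.17, 0.18, 0.20, 0.22, 0.19, 0.16, 0.21, 0.35]
    after = [0.58, 0.60, 0.62, 0.64, 0.61, 0.59, 0.63, 0.40]

    # One box per group; whiskers extend to the most extreme values within
    # 1.5 box lengths of the box, and anything beyond is plotted separately.
    plt.boxplot([before, after])
    plt.xticks([1, 2], ["Before", "After"])
    plt.ylabel("Fraction sleeping under an ITN")
    plt.show()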

Figure 1.18 is from the same study as the frequency table. The author is describing the fraction of people who slept under insecticide treated nets before and after free ITNs were distributed. There are separate boxes for age and sex groups.

Figure 1.18: Insecticide Treated Net Use Fraction by Gender and Age

The white boxes represent the fraction of people reporting ITN use the prior night before ITN distribution, the green, the fraction after the distribution. Surveys were conducted weekly so the boxes are based on the fraction reporting use each week. For every group net use increased after the distribution, so all of the green boxes are above their corresponding white boxes. Look at the last two boxes which represent all cases. The median ITN fraction before distribution was around 0.18. Half of the time fewer than 18% of all individuals slept under a net. After the net distribution, the median fraction was about 0.60; half of the time more than 60% of people slept under a net. That's a big improvement.

The bottom of a box is at the 25th percentile, the top is at the 75th percentile. At the beginning of the study, half of the time between 17% and 22% of people reported sleeping under a net. After the net distribution, the interquartile range is from about 58% to 64%, indicating that half of the time between 58% and 64% of people slept under a net. The lengths of the before and after boxes are similar, because the interquartile range didn't change much.

The open circles and stars represent outlying cases. They are weeks in which nets were used much more or much less than in the others. The open circle weeks are not as far from the ends of the whiskers as are the star cases. Most boxes have median lines which are not in the middle of the box but closer to the bottom. That means that the distribution has a tail to the right. Frequent net use is reported more often than infrequent use.

The collected box plots are a good summary of the results. You see that for all age groups ITN use increased after the distribution. You can easily look for differences that may be attributable to age and sex.

Standard Scores

The mean often serves as a convenient reference point to which individual observations are compared. Whenever you receive an examination back, the first question you ask is, How does my performance compare with the rest of the class? An initially dismal-looking score of 65% may turn stellar if that’s the highest grade. Similarly, a usually respectable score of 80 loses its appeal if it places you in the bottom quarter of the class. If the instructor just tells you the mean score for the class, you can only tell if your score is less than, equal to, or greater than the mean. You can’t say how far it is from the average unless you also know the standard deviation.

For example, if the average score is 70 and the standard deviation is 5, a score of 80 is quite a bit better than average. It is two standard deviations above the mean. If the standard deviation is 15, the same score is not very remarkable. It is less than one standard deviation above the mean. You can determine the position of a case in the distribution of observed values by calculating what's known as a standard score, or z-score. (When variables have normal distributions the z-score is particularly useful since it allows you to position the case exactly in the distribution. That's a topic for chapter 3: The Normal Distribution.)

31 To calculate the standard score, first find the difference between the case’s value and the mean and then divide this difference by the standard deviation:

value−mean standard score= standard deviation A standard score tells you how many standard deviation units a case is above or below the mean. If a case’s standard score is 0, the value for that case is equal to the mean. If the standard score is 1, the value for the case is one standard deviation above the mean. If the standard score is –1, the value for the case is one standard deviation below the mean. (For many types of distributions, including the normal distribution discussed in chapter 3, most of the observed values fall within plus or minus two standard deviations of the mean.) The mean of the standard scores for a variable is always 0, and their standard deviation is 1.

Standard scores allow you to compare relative values of several different variables for a case. For example, if a person has a standard score of 2 for income and a standard score of –1 for education, you know that the person has a larger income than most and somewhat fewer years of education. You can't meaningfully compare the original values, since the variables all have different units of measurement, different means, and different standard deviations.

Figure 1.19, distributed by the World Health Organization, plots z-scores for the weight of girls from birth to 6 months. The chart allows you to determine how a particular girl's weight compares to that of other girls of the same age. For each age the average weight and standard deviation of a large number of girls was obtained. Then for each age, the z-score for a particular weight is calculated by subtracting the mean weight for that age from the weight of interest and dividing by the standard deviation. (WHO actually uses the median instead of the mean for calculating the z-score. For a symmetric distribution, the mean and median are the same.)

Figure 1.19: Z-scores for Weight of Girls Birth to 6 Months

The green curve shows the average weight at each age. It corresponds to a z-score of 0. Girls whose weights are close to the green curve value for their age have average weights for their age. A girl whose weight is above the mean for her age will have a positive z-score. The heavier the girl, the larger the z-score. If a girl weighs less than average she will have a negative z-score. The red curves show weights for z-scores of +2 and -2. You expect roughly 68% of girls to have z-scores between +1 and -1, and approximately 95% of girls to have z-scores between -2 and +2. Girls with z-scores larger than 2 in absolute value are quite atypical and may need to be evaluated.

Summary

In this chapter you worked with only the most basic of charts and summary statistics. But don't underestimate their importance. It doesn't matter how complicated or simple the paper you are reading or the data you are analyzing is, careful examination of the data values and of relationships between variables is essential. Complex statistical procedures and tests will never replace the insights you gain from basic analyses. In the next chapter you'll learn how to draw conclusions about a population based on a sample. That's where the mysterious p-value you may have heard about makes an appearance.

Bibliography

World Health Organization: World Malaria Report 2012.

WHO Child Growth Standards

Loha et al.: Freely Distributed Bed-net Use Among Chano Mille Residents, South Ethiopia: A Longitudinal Study. Malaria Journal 2013, 12:23.

2 Evaluating Results from Samples

• What is a population?
• What is a random sample?
• Is a sample a miniature of the population?
• What is the sampling distribution of a statistic?
• What factors determine how much sample means vary?
• What's the standard error of the mean?
• What's an observed significance level?

In the previous chapter you used charts and descriptive statistics to summarize the observed data. Your only concern was how to describe the observations at hand. It's straightforward to say that at the beginning of a study people on average reported that they slept under insecticide treated nets (ITNs) 30% of the time and when free nets were distributed they said on average that they slept under ITNs 60% of the time. Unless there were errors recording or entering the data you can speak confidently about what you actually observed. But is that really enough? Do you want to draw conclusions only about the participants of your study, about the people you actually observed? That's unlikely. What you probably want to know is whether distributing free nets in malarial areas is an effective strategy for increasing ITN use for all people in malarial areas in your district, country,

or region. You want to draw conclusions about more people than those that you actually observed. That's the topic of this chapter and the focus of much of the field of inferential statistics.

What is a Population?

In common usage the word population refers to all people, animals, plants, or anything else of interest, that inhabit a particular area. For example, there's the human population of Nigeria, or the population of wildebeest in the Maasai Mara. (Good luck counting them!) In statistics the term population refers to the totality of people or objects about which you want to draw conclusions. Before starting a study, you must carefully define your population. For example, if you want to study death rates from a particular disease in hospitals, your population might be the patients in a particular hospital, or all hospitals in the region, or perhaps all hospitals in a country or even the world.

Even when the population of interest is well defined you may not be able to study it. If you want to test the effectiveness of a new supplement for malnourished children, you would ideally like to draw conclusions about how well it works for all malnourished children. Unfortunately, the children whose parents agree to administer the supplement may differ in important ways from the population of all malnourished children. They may be sicker, poorer, or have parents who take better care of them. You can only study the population of children with willing parents and your conclusions are necessarily restricted to them.

What is a Sample?

The persons actually included in your study are called the sample. There are many different samples that can be selected from the same population. Think of all the ways you can select 100 school children from a town. You can include the first 100 from an alphabetical listing of student names. You can ask principals to select a certain number of students from their school. Or you can ask children to volunteer to participate. Common sense tells you that there are serious problems with all of these samples. An alphabetical list

may result in too many students from a particular ethnic group with similar surnames. Principals may suggest students who they deem exemplary. Students who volunteer may be healthier than those who don't.

The term statistic is used to characterize results from a sample. The sample mean and variance are both examples of statistics. The term parameter is used to describe the characteristics of a population. For example, the percent of people using nets in your sample is a statistic; the percent of people using nets in the population is called a parameter. Parameters are usually designated with Greek symbols. For example, the mean of a population is called μ (mu), while the mean of a sample is called X̄ (X bar). Similarly, the standard deviation of the population is called σ (sigma), while the value for a sample is called s. Population values are usually not known but must be estimated from samples. If you knew the population values there would be little reason for you to perform experiments or conduct surveys.

Selecting a Sample

So what is a good sample? Obviously you want it to be selected from the entire population of interest. If you’re studying the population of Tanzanian adults, you want your sample to include people of all ages from all areas of Tanzania. You want to make sure that you are not excluding any particular types of people and that all people have the same chance of being selected. (More sophisticated sampling plans allow for unequal probabilities of selection but these probabilities must be known so they can be included in the analysis of the data.)

What you shouldn't do is select people that you think are “typical” of the population. Such samples, called judgment samples, are fraught with problems, since the criteria for inclusion depend on someone’s notion of who should be selected. There’s no way to determine how well the sample really represents the population. Different people will select different samples from the same population based on their preconceived notion of what the population is really like.

Another bad sampling strategy is to select a convenience sample; a sample that relies on people who are convenient to include. Interviewers who stand in marketplaces waiting to question people are pursuing convenience samples. Any warm body that is willing to answer questions is fair game for the sample. Drawing statistically valid conclusions from a convenience sample is impossible, since the people included are not representative of any population except those willing to be interviewed at a particular location at a particular time. A quota sample attempts to improve a convenience sample by including a certain number of people in predesignated categories. An interviewer is told to get 40 “young” people, 50 “middle aged” people and 10 “old” people. That's still a bad sample since it depends on the whims of the interviewer.

Studies that rely on volunteers, people who offer to be part of your study, are easy to do. That’s why they’re so common. However, it is well known that volunteers can differ in important ways from people who don’t volunteer. People who volunteer their opinions or agree to participate in studies are often very different from those who don’t. They may be poorer or wealthier, healthier or sicker or better educated than people who don't. Don't rely on volunteer samples.

Surveys often pop up when you are using the internet. Your bank may ask whether you are satisfied with its services or a book seller may want to know what kind of books you read. Internet surveys combine the worst features of volunteer and convenience samples. You have no idea who is really answering the questions, why, or even how often. There's also the problem that internet availability depends both on demographic characteristics of the possible respondents and regional availability. It's not easy getting online in some places. Beware of drawing conclusions from poorly designed internet-based surveys.

Since all the above methods don't result in a “good” sample, it's natural to ask “How should I select a sample?” The answer may appear strange at first. The mechanism for selecting a sample should be chance. Not haphazard chance, but carefully planned chance. Sounds like a contradiction, but it's not. Planned chance means that you have a method to ensure that each

person in the population of interest has an equal probability of being selected for the sample, and that all possible samples are equally likely. Observations are selected independently so that one person's selection has no effect on the selection of any other person. This is called a simple random sample. For example, if you have a list of all student names in a district and want to select a simple random sample of 100 students, you can assign each student a random number which you obtain from a computer program that spits out random numbers. (In the past, statistics textbooks used to include lists of random digits!) Then you can sort the random numbers from smallest to largest, and select the 100 students with the largest or smallest random numbers.

When samples are selected by chance, you actually know a lot about them because you can calculate mathematically the likelihood of various types of samples. Statistical methods don't rely on magic. They rely on your following well-established principles of study design. If you don't, you can still generate all kinds of statistical tests, but the results will be meaningless. While it's possible to reanalyze good data if you've made a mistake in the analysis, a poorly designed study is doomed.
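The random numbers no longer come from printed tables; any computing environment will generate them. A sketch of both approaches in Python, with a hypothetical list of student names standing in for the real sampling frame:

    import random

    # Hypothetical sampling frame: one entry per student in the district.
    student_names = [f"student_{i}" for i in range(1, 5001)]

    # The approach described above: attach a random number to every
    # student, sort by it, and keep the first 100.
    tagged = [(random.random(), name) for name in student_names]
    tagged.sort()
    sample = [name for _, name in tagged[:100]]

    # The same thing in a single step.
    sample = random.sample(student_names, k=100)

    print(len(sample))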

Even if you've carefully selected a random sample, you won't be able to draw correct conclusions about the population if you introduce bias into your survey or experiment by the way you ask questions, record answers, assign individuals to treatment groups, or evaluate responses to treatments. The design of surveys and medical experiments is an extensive topic about which much is written. Let's just take a quick look at some of the issues you should be aware of.

Surveys and experiments, including clinical trials, are the two most common sources of data. In a survey, you ask questions of people. In an experiment, you actually do something and then observe the response. Good design as well as selection of appropriate samples is essential for both of them. Unless you are doing research on mice, even experiments usually require you to ask questions of the study participants. You ask questions such as “Did you sleep under an insecticide treated net last night?” and record the answer, together with additional information about the respondent, such as their age, gender, place of residence, family size, or whatever else you think is important.

Asking Questions

Asking and answering questions is something you've done since your toddler days, so you might think it's not a big deal, but it is critically important in all types of research. The way in which you ask questions affects the answers that you get. You don't want to influence the response a person gives. Asking a respondent whether they agree with the statement that insecticide treated nets are the most important factor for the control of malaria may elicit different responses than asking “What do you think is the most important factor for malaria control?”. You must structure the question in such a way that it does not suggest an answer. The interviewer should always appear nonjudgmental, making sure the respondent doesn't try to give answers designed to please the interviewer.

Eliciting even simple information requires thought. For example, “How old are you?” is an unimaginative question that you've been asked many times. You may recall that you haven't always answered it truthfully. When you were 12 years old, you may have promoted yourself to the more glamorous teenage years. When you reach middle age, you may want to turn back the clock. Or someday, you simply may not remember the magic number anymore. Does that mean that you shouldn't ask about age in a survey? Of course not. It means that you have to think about the best, most practical way to obtain reliable information about age. Asking to see a birth certificate is one way of obtaining accurate ages, but it's obviously not a realistic tactic for most surveys. Asking people to give their birth date as well as their age is a better strategy. People are much less likely to manipulate their birth dates than their ages, since a birth date doesn't change annually, and it's work to calculate a birth year to match a desired age.

Although determining age has its problems, they are fairly minor compared to the difficulties encountered in getting information about sensitive topics such as drug use or criminal records. Even obtaining information about routine activities, such as time spent sleeping, poses challenges. Skilled researchers devote a lot of effort to designing the survey questionnaire. To minimize possible problems an actual questionnaire is often pretested on a random sample of people from the population to which a survey will be administered.

Exercise: In the Insecticide-Treated Net study, the experimenters asked people every week whether they slept under a net the previous night. For each household they chose the same day of the week to ask the question. For example, for a particular household they would always conduct the interview on a Tuesday. Do you see any problems with this strategy?

Missing People and Missing Answers

You may have devised an excellent method for selecting a random sample of people to include in your study. It's unfortunate that not all people whom you want to include will agree to answer all of your questions or participate in your clinical trial. Your first response might be to ignore the uncooperative people and just replace them with more agreeable ones. If people who refuse to be part of your survey or experiment are different from people who agree to participate (and you can't possibly be certain that they aren't), the results you draw based on cooperative people may not be true for the uncooperative people. That's potentially deadly for reaching conclusions about the entire population of interest.

If you encounter people who just refuse to answer certain questions, you can determine how people who answered a question differ from people who don't. For example, if you find that older people are less likely to answer questions about ITN use than younger people and that older people are less likely to use ITNs, you'll have to report results separately for the groups. What you should not do is ignore missing answers, since if they are not missing at random they can seriously affect your conclusions. (Answers are said to be missing at random if cases with missing values are otherwise no different from cases without missing values. It's as if someone went through a data file and randomly deleted some answers.)

Designing Experiments

Unlike a survey, an experiment involves actually doing something to people, animals, or objects. Instead of asking people whether they think that drugs that reduce cholesterol are effective for decreasing the risk of heart disease, you give them the drugs and observe the incidence of heart disease. Sometimes you study the same subject before and after an

experimental treatment and determine if there has been a change. Or you might take several groups of people, do something different to each of the groups, and then compare the results.

Experimentation on people and animals poses ethical questions that deserve careful thought. Human experimentation requires informed consent from participants. Risks must be carefully and exhaustively detailed. Institutions have committees that regulate experiments involving humans or animals, and governments, such as those in the US and Europe, take strong measures when institutions do not comply with guidelines for human experimentation.

In experiments as well as surveys, the subjects must come from the population that you’re interested in. (This is a lot easier to do for animals than people!) To properly assess the effect of different treatments, you must ensure that the groups receiving the treatments are as similar as possible. The best way to do this is to use chance to assign subjects to the different groups. This doesn’t guarantee that the groups will have the same characteristics, but it does minimize the bias that is introduced when treatments are assigned to subjects by the investigator. (A study is termed biased if it favors one result over another. For example, if healthier patients receive the new treatment while sicker patients receive the old treatment, the study is biased in favor of the new treatment.)

Random Assignment to Treatments

Using chance to assign treatments to individuals does not mean that treatments are assigned haphazardly. As in surveys, chance requires a carefully selected systematic approach. If you want to study the effect of personal computer use on grades, you can’t let classroom teachers decide which of their students receive personal computers and which do not. They might assign the computers only to the brighter students to reward them for past performance, or they might assign them only to the low-achieving students with the hope that the computers might help them. Any evaluation of the effect of the personal computers would be tainted by the differences between the students.

Random numbers are the preferred method for assigning experimental treatments to subjects. You can’t make up your own table of numbers that you think are random; you have to use random numbers generated by computers. Computers don’t have birthdays, license plates, children, or any other reason to prefer one number over another. Every number has the same chance of being selected. A simple method for assigning people to experimental conditions is to assign everyone a random number and then put people with even numbers in one group and those with odd numbers in the other. Or you can order the numbers from smallest to largest and assign people with numbers in the lower half to one group and numbers in the upper half to the other. You can use all sorts of systems based on random numbers to assign subjects to groups, even in very complicated experimental designs. Unless you use a procedure that assigns your subjects to experimental conditions randomly, the results of your study may be difficult or impossible to interpret. Many assignment schemes that appear random to the inexperienced researcher turn out to have hidden flaws. For example, researchers at a hospital compared two treatments for a particular disease. Patients who were admitted on even-numbered days received one treatment, and those admitted on odd-numbered days received the other. Sounds random, but it failed. The number of patients admitted with the disease on even days gradually became larger than the number admitted on odd days. Why? Physicians figured out the scheme and admitted their patients on days when the procedure that they preferred was being used. This introduced bias, since physicians were making the decisions as to which patients received which treatment. That may have resulted in less sick patients being admitted on even days and sicker patients on odd-numbered days, or vice versa.
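Here is a minimal sketch of this kind of chance assignment, again assuming Python is available; the patient labels and the group size are invented purely for illustration.

import random

# Illustrative patient identifiers; in a real trial these would come
# from the enrollment log.
patients = ["patient_%02d" % i for i in range(1, 21)]

# Give every patient a random number, sort by it, and assign the lower
# half to one treatment and the upper half to the other.
tagged = sorted((random.random(), p) for p in patients)
half = len(tagged) // 2
group_a = [p for _, p in tagged[:half]]
group_b = [p for _, p in tagged[half:]]

print("Treatment A:", group_a)
print("Treatment B:", group_b)

Because the assignment depends only on computer-generated random numbers, neither the physicians nor the patients can steer anyone toward a preferred treatment.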

Exercise: In the 1954 clinical trial of the Salk polio vaccine in the United States, many different study designs were considered. Discuss the advantages and disadvantages of the following designs:

(a) Select a random sample of children and vaccinate them. Compare their polio rate to the rate for children in the USA.

(b) Vaccinate children whose parents have volunteered them for the study and then compare the polio rate for vaccinated children with the rate for unvaccinated children in the same area.

(c) Vaccinate children in one city and compare their polio rate to children in another city.

Unbiased Evaluation

In experiments, just as in surveys, you must be careful not to let your prejudices influence the results. Some events, such as death, are not disputable. Others, such as “improvement,” are not as clear. You must make sure that the endpoints that are being measured are well defined and unambiguous. Don’t ask, “Are you better today?” Determine what constitutes better, and ask about the components, for example, freedom from pain, ability to sleep without interruption, performing activities of daily living, and so on.

The best way to make sure that measurements are obtained without bias is to make sure that neither the subject nor the evaluator is aware of the experimental procedure that a person has undergone. In medical studies, when a patient doesn’t know which treatment he or she is receiving, the study is called single blind. If neither the researcher nor the patient knows, the study is termed double blind. People respond favorably to any treatment that they think will help them, even if it’s a sugar pill. That is known as the placebo effect. If you think you’ve been given a “magic” pill to help you stay awake in class, you may be more alert than someone who hasn’t been given the pill, even if the pill is totally ineffective. That’s why it’s important to make sure that all people are treated as similarly as possible. If one group gets a pill, so should the other, even if it contains only sugar.

If you’re evaluating a new treatment, you should make sure to include a group that doesn’t receive the new treatment, known as a control group. For medical studies, the control group might receive the usual treatment for a condition. In studies of new instructional methods, the control group would receive standard instruction. Unless you have a control group that is being observed at the same time and under the same conditions as your experimental group, you will not be able to draw unbiased conclusions about the new method. A “new” treatment may have better survival rates not because it is better, but because patients are being diagnosed earlier today than they were in the past. Similarly, students may perform better in statistics classes today, not because the teachers are better, but because students today are more industrious.

Properties of Samples

Now that you've designed your study, selected a random sample, observed it without bias and objectively recorded all the necessary information, you're ready to reach conclusions about the population from which you selected your sample. On first thought, this might not seem very complicated. Why not assume that what’s true for the sample is also true for the population? That would certainly be simple. But would it always be correct? Do you really believe that if 60% of your sample used nets after free distribution, that’s exactly what you would see if you distributed nets in all malarial areas, or even in another sample in the same area? Common sense tells you that it’s very unlikely that the results you see in a sample are identical to those you would obtain if you made measurements or inquiries of the entire population of interest. A sample is not a miniature of the population. If it was, one quick poll before an election would eliminate the need to even hold elections.

What is true instead is that different samples from the same population give different results, and it’s highly unlikely especially for a continuous variable, such as age or weight, that any one sample will hit the population value exactly on the nose. To determine what you can realistically conclude about the population based on results from a sample, let's consider what results are possible when you select a sample from a population. Although we could use mathematical arguments to derive properties of samples, it's more appealing if you just start drawing samples from a population and see what you find.

Taking Samples from a Population

Let's start off with a population that has only two values: 1 if a person reported using a bed net the previous night; 0 if they did not. To construct a population in which half of the people report using a bed net the night before, label a million cards, half of them with “used net” and half of them with “didn't use net.” Each card represents a person in your population. Then mix up the cards in a (very large) basket, and randomly select 10 cards, since we want to study the behavior of sample values based on 10 cases. Count the

number of cards that say “used net.” If you find 6 cards out of the 10 you selected with “used net” written on them, your sample value for the percent using a net the night before is 60%. Now you have the result of a single sample from your population. To see how sample results from the same population vary, you have to repeat the procedure over and over, selecting 10 “people” each time and then counting and recording the number of “used net” cards in each sample.

You can think of this activity as sending numerous survey takers into the same population. Each survey taker obtains a random sample of ten individuals, asks whether they slept under a net, and brings back the summarized results to you. Of course when you conduct a real survey that's not what happens. You get back results from the one survey that was funded. However, you know that the outcome of that survey is one of many different outcomes that could have been observed. If you asked everyone in the population if they slept under a bed net the previous night you'd almost certainly get a different result than that you obtained from your single survey. What concerns you is the question “How far off might my survey results be from the true population value?” That's why you're looking at the distribution of possible results.

In case you lack the patience to label a million cards and select 100 samples of 10 people each, I've done it for you. The results are in Figure 2.1.

Figure 2.1: Distribution of Results from 100 Samples of Size 10 with p=0.5

Number who used ITN in sample:   0    1    2    3    4    5    6    7    8    9    10   Total
Number of samples:               0    0    1   11   23   26   23   11    4    1     0     100
Cumulative percent:              0    0    1   12   35   61   84   95   99  100   100     100

The first row of the figure corresponds to all possible results when you count the number of “use net” cards in a sample of 10 “people.” You can find anywhere between 0 and 10 net users in a sample of 10 people. I took 100 samples of size 10 and counted how many net users were in each sample. That's shown in the row labeled Number of Samples. There were 26 samples with exactly 5 net users; 1 sample with 9 net users and 11 samples with 3 users. If you add up all of the values in the row labeled Number of Samples you'll get 100, the number of samples selected. The last row, Cumulative Percent, is the percent of all samples with that many nets or fewer. For example, the value for 4 nets is 35%. That tells you that out of my 100 samples, 35% of them had 4 or fewer net users.

From Figure 2.1 you clearly see that not all samples gave you the same answer. Only 26% of the samples had exactly 50% net users, which we know is the population value. Some of the samples had values pretty far removed from 50%.

One hundred samples from a population aren't really enough to give us detailed information about the spread of sample values. Let's abandon our cards, and use the computer to draw 10,000 (!) samples from the same population. (This is called a computer simulation.) That should give us a

very good idea of what the distribution of possible sample values looks like.

A Computer Model

Figure 2.2 shows the results when you take ten thousand samples of ten people from a population in which the probability of using a net the previous night is fifty percent. The horizontal axis shows the percent of people in each sample who reported using nets the previous night; the vertical axis tells you the percent of samples with that result. Look at 50% on the horizontal axis. That corresponds to samples of 10 people in which five use nets. Only 24% of all of the samples had net use fractions of exactly 50%. (If this seems a little surprising and you don't want to label a million cards, take a coin, designate one side “net” and flip it ten times, recording how many “nets” you observe in those ten flips. Keep taking groups of 10 “people” until you can't stand it anymore. Count how many times you found exactly five net users in a group of 10. Even though 50% is the probability of someone using a net, in groups of 10 people you don't often find exactly 5 using nets.)
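If you would rather not flip coins, a short simulation like the one below draws the 10,000 samples for you. It is written in Python purely as an illustration; the counts it prints should look very much like Figure 2.2.

import random
from collections import Counter

samples = 10000      # number of simulated surveys
n = 10               # people per survey
p = 0.5              # true proportion using a net

counts = Counter()
for _ in range(samples):
    users = sum(1 for _ in range(n) if random.random() < p)
    counts[users] += 1

for users in range(n + 1):
    pct = 100 * counts[users] / samples
    print("%2d net users: %5.1f%% of samples" % (users, pct))

Run it and you should see that only about a quarter of the samples land on exactly 5 net users, even though 50% is the true population value.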

Fifty percent is, however, the most frequently occurring value. The distribution is symmetric so 50% is also the median and mean. The further away you move from 50% in either direction, the fewer samples you see. Most sample values cluster around the population value. You see that various outcomes are possible, but they are not all equally likely. For example, fractions of 0% and 100% are very unlikely.

From Figure 2.2 you can estimate how far off your sample results may be from the real population value when your sample size is 10 and the true population value is 50%. You see that it's not unlikely that your sample results are between 30% and 70% when the real value is 50%.

Figure 2.2: Samples of Ten Patients, Probability of Net Use is 50%

Consider what happens when you increase your sample size from 10 people to 50 people in a survey. The population value stays the same—50%. Figure 2.3 shows the distribution of sample results when the computer draws 10,000 samples from the same population as before, but with 50 people in each. The shape of the distribution is similar to that for samples of size 10.

Both distributions have the same mean and median (50%). What's different is that in Figure 2.3 the observed percent with nets has much less spread than it does in Figure 2.2. Sample means of thirty percent or seventy percent aren't unusual when you take samples of 10 people. They are quite unusual when you take samples of size 50. In Figure 2.3 you see very few samples that are far from 50% in either direction. About 95% of the sample means are between 36% and 64%. That's not surprising because it makes sense that large samples are less likely to produce unusual results than small samples.

Figure 2.3: Samples of 50 Patients, Probability of Net Use is 50% (horizontal axis: percent using ITN the previous night; vertical axis: percent of samples)

Think of a coin analogy. If you flip a coin three times it's not unusual that it will come up on the same side each time. If you flip a coin 30 times, it's very unusual that it would come up the same side each time.

By this point you may be asking the age old question “Why should I care? I'm going to take a single sample from the population and draw conclusions about the population from that one sample.” You're correct. However, unless you consider the distribution of all possible sample values, you don't have a clue how close or how far from the true population value your sample value might be. If you understand the behavior of lots of samples from the same population, you can use that information to determine how close to the unknown population value your sample value may be. For example, you can calculate a range of values which you're reasonably confident will include the population value.

Sampling Distribution of a Statistic

The distribution of all possible sample values of a statistic (such as the percentage using a bed net the previous night), calculated from samples of a particular size from a population, is called the sampling distribution of the statistic. Figure 2.3 is an estimated sampling distribution of means of samples of size 50 from a distribution that has only two possible values: use a net and not use a net.

A sampling distribution has a mean and standard deviation, just as does the distribution from which you are taking the sample. Figure 2.4 contains summary statistics for the sampling distributions of means for the samples of size 1, 10, 50, 100, 1000. For each sample size, these are the values you get if you take the 10,000 sample means that the computer generated and find the average and standard deviation of the means.

Figure 2.4: Descriptive Statistics from 10,000 Samples of Different Sizes

Sample Size (N)    Mean    Median    Standard Deviation    Minimum    Maximum
1                  50%     50%       50.00%                0.00%      100.0%
10                 50%     50%       15.70%                0.00%      100.0%
50                 50%     50%       7.10%                 24.00%     76.0%
100                50%     50%       5.09%                 27.00%     69.0%
1000               50%     50%       1.57%                 43.90%     56.6%

You see that the mean and median, 50%, are the same for the distribution of the individual values (samples of size 1) and for the means based on 10, 50, 100 and 1000 cases. It's always true that the mean of the sampling distribution of means is the same as the mean of the underlying population. You also see that the standard deviation of means based on 10 people is smaller than the standard deviation for the individual values but larger than the standard deviation based on samples from 50 people. Notice that the standard deviation for 10 cases is 10 times as large as the standard deviation for samples with 1000 cases.

Standard Error of the Mean

The standard deviation of the sample means has a special name. It is called the standard error of the mean. The standard error of the mean tells you how much the means from random samples of a particular size vary. The standard error of the mean depends on two factors: your sample size (N) and

the standard deviation of the population (σ) from which you are taking the sample:

Standard Error of the Mean = σ / √N

The formula tells you that

• Sample means based on large samples vary less than sample means based on small samples from the same population.

• Sample means from populations with lots of variability vary more than sample means of the same size from populations with less variability.

Let's use the formula for the standard error of the mean to calculate it for our example. The population standard deviation is 50%. That's shown in Figure 2.4 in the row for samples of size 1.

The standard error of the mean for samples of 10 is

Standard Error of the Mean = 50% / √10 = 15.8%

For samples of size 50 it is

Standard Error of the Mean = 50% / √50 = 7.07%

Compare these values to the estimated standard errors shown in Figure 2.4. The standard error of the mean for samples based on 10 observations is estimated as 15.70%. The standard error of the mean for samples based on 50 observations is estimated as 7.10%. The two sets of values are very close. The reason the standard errors differ a little is that in Figure 2.4 they are estimated from a large number of samples, while the values based on the formula are mathematically derived exact values. The above formula explains why the standard error based on 10 cases is 10 times as large as the standard error for 1000 cases.

Put away the million labeled cards and unplug the computer. The formula for the standard error tells you what you need to know: how much sample means from the same population vary. You just have to supply the standard deviation and sample size. If you don't know the standard deviation in your population you can estimate it from your sample and use it in the formula. The standard error of the mean won't be exact then, but will be an estimate.

Exercise: In the previous example what would be the standard error of the mean if the sample size was increased to 200? decreased to 5?

The standard error of the mean depends not only on your sample size but also on the variability of the population from which you are sampling, the population standard deviation. If everyone in the population, or no one, uses nets, all samples from the population will give the same result. As the variability in the population increases so does the variability of the means of samples of a particular size. If you're studying the heights of two year old children you know they will have less variability, a smaller standard deviation, than the heights of adult men. If you sample 50 children you expect that the standard error of the mean will be smaller than if you sample 50 adults.

Exercise: Calculate the standard error of the mean for samples of size 10 and 50 when the population value for the percent using a net is 70%. Why are the values smaller than those you calculated for the same sample sizes when the population value was 50%?

Plotting Error Bars

In reading the medical literature you will often encounter figures with bracketed lines emanating from the end of bars or points. Usually the line is drawn so that it extends to one standard error above the end and one standard error below the end. Look at Figure 2.5 from (Vanden Eng et al., 2010). The white bar for Niger shows that about 65% of households own ITNs. The line that extends from the top of the bar goes to about 72%. The distance from 72% to 65%, 7%, is the standard error for the proportion of households owning a net. (Look carefully at the footnotes when you see error bars since sometimes confidence intervals are plotted instead of standard errors.)

Figure 2.5: Bar chart with Standard Errors: ITN Ownership and Use

Knowing the standard error is important for drawing conclusions about the true rates in the population. The researchers did not include all households in their study; instead, they selected a sample from the population of households. The sample is one of many that could have been selected. If the standard error is small, the sample mean stands a better chance of being close to the population value than if the standard error is large.

The standard errors for the bars in Figure 2.5 are not all the same. There are two reasons for this: the numbers of households sampled differed for the countries, and the variability of the responses differed within the countries. The standard errors are smallest for the percentage of households owning nets in Madagascar and for the percentage of ITNs hanging in Sierra Leone. For both of these, the observed rates are close to 80%. If percentages are

large (or small), you know that households are quite similar. There's not much variability. Most households in Madagascar own nets, while most households hang nets in Sierra Leone.

Large scale surveys must necessarily use more complicated sampling methods than selecting simple random samples from the population. For example, they may randomly select regions, then communities within a region, and then households within a community. Analysis of data from complex surveys requires additional steps for calculating the actual standard errors.

Exercise: Write a short paragraph summarizing the results shown in Figure 2.5. Does there seem to be a relationship between the percent of households that own nets and the percent of these nets that are actually used? Do you think that when nets are difficult to obtain, those who own them are more likely to use them?

Evaluating a Claim

You've demonstrated that sample means from random samples from the same population vary and that you can determine how much you expect them to vary for different sample sizes and different variability in the population. Let's see how you can actually use this information to evaluate a claim. In statistical terms what you're doing is testing a hypothesis.

A Distinguished Scientist claims that she has a better treatment for a disease of interest. Of 10 patients who received her new treatment, 70% were cured. Extensive literature on the topic indicates that, worldwide, only 50% of patients with this disease are cured. Based on the results of her experiment, can you tell if the physician has really made inroads into the treatment of this disease?

Are the Observed Results Unlikely?

To evaluate the scientist's claim, you have to ask yourself the question, are the results she observed (7 out of 10 cures) unlikely if the true population cure rate is 50%? You know that if half of all people with a disease are cured

by the standard treatment, that doesn’t mean that any time you treat 10 patients, exactly 5 will be cured.

Look at Figure 2.2 again. How often would you expect to see a “cure” rate of 70% or more in a sample of 10 patients, if the real cure rate in the population is 50%? Adding up the percentages shown on the vertical axis, you see that about 20% of the time you expect to see cure rates of 70% or more when the true population value is 50%. If the new treatment is no better than the standard, you would expect to see cure rates at least as large as those observed by the physician almost 1 out of 5 times you repeat the experiment. (In fact, it is possible to calculate mathematically that when the true cure rate is 50%, the probability of obtaining 7 or more cures in a sample of 10 is close to 17%.)

Of course, it’s always possible that the new treatment is really less effective than the usual treatment. So if you want to test the hypothesis that the new treatment is not different from the standard treatment, you must evaluate the probability of results as extreme as the one observed in either direction— increasing or decreasing the cure rate. You can estimate from Figure 2.2 that the probability of 30% or fewer cures and the probability of 70% or more cures is about 40%. Based on these results, you have little reason to believe that the Distinguished Scientist is really onto something. Her results are certainly not incompatible with samples selected from a population in which the true cure rate is 50%.

The Effect of Sample Size

As you saw above, when the true cure rate is 50%, there’s a good chance that you will observe from 3 to 7 “cured” patients in a sample of 10 by chance alone. In fact, most of the outcomes that can occur in a sample of 10 would not be considered unusual even if the new cure rate is quite a bit larger or smaller. Based on a sample of only 10 patients, it’s very difficult to evaluate a new treatment. Another way of saying this is that a sample of size 10 has very little ability to identify true differences. It has little power. (In statistics, power means the probability of detecting a true difference when it exists.) Larger samples improve your chances of detecting a difference in the cure

rates (if, in fact, there is one) because there is less variability in the possible outcomes.

You may wonder if you can ever tell from a sample of just 10 patients that a new treatment is better. The answer is yes. If you've observed 10 cures of a previously incurable disease, you have enough evidence to believe that the treatment is effective. Even a small number of successes is enough to give you pause.

What if the Distinguished Scientist tells you that she achieved a 70% cure rate based on 50 patients? Would you be more likely to believe that she's onto something? From Figure 2.3 you see that values greater than 70% or less than 30%, when the population value is 50%, are now noticeably less likely than they are for samples of 10 patients. Rates that were not particularly unusual when you had samples of 10 patients are quite unusual with 50 patients. Based on Figure 2.3, you see that your chance of finding a sample rate of 70% or more (or 30% or less) when the true rate is 50%, is very small. If you calculate the exact probability mathematically it is 0.66%. That means that fewer than 7 times in 1000 would a cure rate as extreme as the one observed (70%) occur, in a sample of size 50, if the new treatment doesn’t differ from the standard treatment.

The usual rule of thumb used for “unusual” is a probability of 5% or less. That is, if results as extreme or more extreme than the ones you observe are expected to occur by chance alone in 5 (or fewer) samples out of 100, the results are considered unusual, or “statistically significant.” The probability of observing results as extreme as you observed when there is no effect is termed the observed significance level or the p-value.

Exercise: You saw that the observed result of 35 cures in a group of 50 patients was unusual if the true cure rate is 0.50. Do you believe that the physician has a better treatment? What questions would you ask her?

The Binomial Test

You can use a computer program to calculate a binomial test that compares an observed proportion or rate to a known standard or usual rate. To use the binomial test, your experiment or study must have only two possible outcomes, such as cured/not cured, pass/fail, buy/not buy, defective/not defective, and so on. All of the observations must be independent, and the probability of success must be the same for each member of the sample. Observations are independent if one subject's responses can't influence those of another. If students collaborate on an exam, their scores are not independent. If you cure the same patient with your treatment 10 times, the observations are not independent since they are coming from the same patient.

You specify the number of “successes” you observed in your sample and the value that you assume is true for the population. The program calculates the probability that you observe at least as many successes as you did, if the population value is correct. Try it on the Distinguished Scientist data.

Figure 2.6 contains output from a software package called Graph Pad. It's free and you do not have to download it. If you're at a computer click on the following link http://graphpad.com/quickcalcs/catMenu/

• Click on binomial and sign test.
• Enter the number of “successes” you observed. In this example, 35.
• Enter the number of trials (sample size). In this example, 50.
• Enter the population value for the probability of success, which is 0.50.
• Click the Calculate Probabilities button.

The program will return the probability of observing at least the number of successes you observed. It will also calculate the probability of results as extreme as the ones you observed in either direction.

From Figure 2.6 you see that the probability of observing a result as extreme as 35 successes in 50 patients (trials) when the probability of success is 0.50, is 0.0033. The probability of observing a result as extreme in the other direction (15 or fewer successes) is also 0.0033. The probability of observing either 35 or more successes or 15 or fewer successes in 50 patients is 0.0066. It's pretty clear that it's unlikely you'll observe 35 cures in 50 patients if the probability of a cure is really 50%.

Figure 2.6: Graph Pad Test for the Binomial Distribution
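If you prefer to see where numbers like these come from, the exact binomial tail probabilities can be added up directly. The short Python sketch below is only an illustration of the arithmetic, not part of Graph Pad:

from math import comb

n, p = 50, 0.5          # trials and assumed population cure rate
observed = 35           # successes observed in the sample

# P(X >= 35): add up the exact binomial probabilities in the upper tail.
upper = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(observed, n + 1))

# Because p = 0.5 the distribution is symmetric, so the lower tail
# P(X <= 15) has the same size and the two-sided p-value is twice the upper tail.
print("P(X >= 35) = %.4f" % upper)               # about 0.0033
print("two-sided p-value = %.4f" % (2 * upper))  # about 0.0066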

Exercise: Using Graph Pad calculate the probability that in a sample of 50 people you observe 20 or fewer people using bed nets, when the population value is 60%. What's the probability that you observe 40 or more people using bed nets?

Summary

In this chapter, you saw that different samples from the same population give different results. The sampling distribution of a statistic tells you, for a particular sample size, the distribution of all possible sample values of the statistic. You use the sampling distribution of a statistic to determine how likely or unlikely various sample results are under particular circumstances.

The standard error tells you how much results vary from sample to sample when the samples are taken from the same population. As the sample size increases, the variability of statistics calculated from the same population decreases. From the sampling distribution of a statistic you can calculate the probability of observing sample results that are at least as large as the one observed when there is no difference in the population.

In subsequent chapters, you’ll learn more about testing hypotheses about a population, based on results observed in a sample. You’ll also learn about the importance of the normal distribution in hypothesis testing.

Bibliography:

Vanden Eng et al. (2010). Assessing bed net use and non-use after long-lasting insecticidal net distribution: a simple framework to guide programmatic strategies. Malaria Journal 2010, 9:133.


3 The Normal Distribution

• What does a normal distribution look like?
• Why is it important in statistics?
• What is a standard normal distribution?
• What is a ?

Every variable and statistic has a probability distribution which tells you the likelihood of various values. The variable “sleeping under a net last night” has only two values: yes and no (if you exclude can't remember). The probability associated with answering yes completely determines the distribution. (The probability of no is just 1-probability of yes.) A variable like diastolic blood pressure has a lot of possible values and they are not all equally likely. You can't completely describe the distribution of diastolic blood pressure with just a mean and standard deviation. Distributions can have the same mean and standard deviation and yet have very different shapes. Both of the histograms in Figure 3.1 come from distributions with a mean of 0.5 and a standard deviation of 0.3. The sample on the left comes from a uniform distribution, a distribution in which all values are equally likely. The second sample is from what is called a normal distribution. The normal distribution is mathematically defined. There is a formula that exactly dictates what percent of the values fall into each interval based on

the distance from the mean. The red curve in Figure 3.1 is the exact normal distribution superimposed on the histogram of the sample. Since samples aren't miniatures of the population, samples from a normal distribution are not exactly normal, especially if the sample size is small.

Figure 3.1: Two Distributions with Same Mean and Variance

The normal distribution is bell shaped. Most of the observations are bunched in the center. The mean, median and mode are exactly in the middle of the distribution, and are equal to one another. The farther you move from the center, in either direction, the fewer the number of observations. The distribution is symmetric. If you draw a line through the center and fold the distribution, the two sides are the same. The normal distribution is very important in statistical analysis since many

variables, such as height, weight, and blood pressure have distributions which are approximately normal. The normal distribution also serves as a reference point for characterizing data. Figure 3.2 is a histogram of gestational age in weeks at which women enrolled in a malaria study delivered, as determined by ultrasound (Wylie et al. (2013)). Women with malaria deliver low birth weight infants so it is important to establish whether this is the result of prematurity or failure to grow, since the treatment differs for the two situations. The distribution of weeks of gestation looks approximately normal but is skewed, with more low gestational ages than you would expect if the data were a sample from a normal distribution.

Figure 3.2: Weeks Gestation at Delivery by Ultrasound

Finding Your Way Inside a Normal Distribution

If a variable has exactly a normal distribution and you know the population values for its mean and standard deviation, you know everything there is to know about the likelihood of various values. Consider Figure 3.3, which is a theoretical normal distribution of birth weights of full term newborns. The mean is 3000 grams, and the standard deviation is 500 grams.

Figure 3.3: Area Under Normal Curve

The percentages in the intervals tell you what percent of all newborns have weights within that interval. Thirty-four percent of newborns weigh between 2500 and 3000 grams and thirty-four percent weigh between 3000 and 3500. These two percentages are equal because the normal distribution is symmetric and the two intervals are the same distance from the mean but on opposite sides. Since the standard deviation of the birth weights is 500, the value 2500 is one standard deviation below the mean and the value 3500 is one standard deviation above. In a normal distribution, 68% of all observations are within one standard deviation of the mean, and 95% are within two standard deviations. Only five percent of all observations are more than two standard deviations (1.96 standard deviations, to be more exact) from the mean.

The Standard Normal Distribution

Since a normal distribution can have any mean and standard deviation, the location of a case within the distribution is usually given by the number of standard deviations it is above or below the mean. This is known as a standard score or z-score and was discussed in more detail in Chapter 1. The

standard score axis is labeled z-score in Figure 3.3. A normal distribution in which all values are expressed as standard scores is called a standard normal distribution. It has a mean of 0 and a standard deviation of 1, as shown in Figure 3.4. Note that 2.5% of observations have standard scores greater than 1.96 and 2.5% less than -1.96.

Figure 3.4: Standard Normal Distribution
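If you want to verify these percentages rather than take them on faith, the areas under the standard normal curve can be computed from the error function. The Python fragment below is just an illustration:

from math import erf, sqrt

def normal_cdf(z):
    # Cumulative probability of the standard normal distribution up to z.
    return 0.5 * (1 + erf(z / sqrt(2)))

# Proportion of observations within 1, 1.96 and 2 standard deviations of the mean.
for z in (1.0, 1.96, 2.0):
    inside = normal_cdf(z) - normal_cdf(-z)
    print("within %.2f SD: %.1f%%" % (z, 100 * inside))
# Prints roughly 68.3%, 95.0% and 95.4%.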

Exercise: Estimate the percent of babies with birth weights greater than 2500 g, less than 3000 g, greater than 4000 g, greater than 3500 g, and between 3000 and 4000 g.

Distributions of Sample Means

In the previous chapter you saw that statistics calculated from random samples, such as sample means, also have a distribution. Some sample means are really close to the population value, some are farther away. Figure 3.5 shows a sampling distribution of means when the population value for sleeping under an ITN is 50% and the sample size is 50. From the distribution you can tell how likely the various sample values for the mean are. There's a red curve drawn on the estimated sampling distribution. It is a normal distribution with the same mean (50%) and standard deviation (7.1%) as the distribution of all possible sample means. The normal distribution fits the observed distribution of sample means very well.

Figure 3.5: Distributions of Sample Means of 50 Cases from Binomial with p=0.5

That's amazing! You start off with a variable that has only two values: use a net, or don't use a net. For a sample you calculate the percent using a net. When you look at the distribution of all possible sample values for the percent using a net, you see a normal distribution. Not only are many variables approximately normally distributed but, in many situations, so are their sample means. This remarkable finding is explained by the Central Limit Theorem, which says that for samples of a sufficiently large size, the distribution of sample means is approximately normal. The original variable can have any kind of distribution. It doesn't have to be bell shaped at all. How large a sample you need before the distribution of sample means is approximately normal depends on the distribution of the original values of a variable. For a variable that has a distribution not too far different from normal, sample means have a normal distribution, even if they're based on small sample sizes. If the distribution of the variable is very far from normal, larger sample sizes will be needed for the distribution of sample means to

be normal. The important point is that the distribution of means always gets closer and closer to normal as the sample size gets larger and larger, regardless of what the distribution of the original variable looks like. That's why the normal distribution is very important in statistics. Not all variables are normally distributed, but for sufficiently large sample sizes, their sample means are normally distributed. As an example, look back at Figure 3.1, which shows the uniform distribution in which all values are equally likely. Figure 3.6 shows what happens when you calculate sample means based on three observations from that distribution. Notice how the distribution is no longer flat but now has the shape of the normal distribution. The distribution of individual values is flat, but thanks to the Central Limit Theorem the distribution of means is normal. The reason that statisticians find it so exciting that sample means have a normal distribution is that the properties of the normal distribution can be used for calculating confidence intervals and testing hypotheses about population means.

Figure 3.6: Means of samples of three observations from uniform distribution
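You can reproduce a picture like Figure 3.6 with a small simulation. The Python sketch below is only an illustration; the 10,000 samples drawn here is an arbitrary choice.

import random
from statistics import mean, stdev

# Draw 10,000 samples of three observations each from a uniform
# distribution on (0, 1) and keep the mean of every sample.
sample_means = [mean(random.random() for _ in range(3)) for _ in range(10000)]

print("mean of the sample means:   %.3f" % mean(sample_means))   # close to 0.5
print("spread of the sample means: %.3f" % stdev(sample_means))  # close to 0.289/sqrt(3), about 0.167

A histogram of sample_means is mound shaped, not flat like the original values, which is exactly what the Central Limit Theorem predicts.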

Confidence Intervals

When you conduct a survey or experiment you are interested in drawing conclusions about a population value based on results from your sample. You don't know the value of the mean for the entire population. If you did, you wouldn't be doing the survey or experiment. You do know that your sample mean is one of many possible means from the same population. It may be a lot bigger or smaller than the population value or it may be pretty close. Look at Figure 3.7, a normal distribution of all possible sample means. The unknown population value is in the center, labeled “?”. Since Figure 3.7 is a distribution of means, the standard deviation of the distribution is the standard error of the mean. From the properties of the normal distribution you know that 95% of sample means are within two standard errors of the population mean. The red region in the tails of Figure 3.7 is the area outside of two standard error units.

Figure 3.7: Sampling Distribution of Means

All you have to do is figure out where your sample mean is in this distribution. That's the problem. Since you don't know the population mean (the ?) you can't place your sample mean. It might be a standard error above the population mean or it may be half a standard error below the population mean. It can be anywhere in this distribution. In Figure 3.8 the sample mean (Xbar) is one half standard error above the unknown population mean. In Figure 3.9 the sample mean is one standard error below. In Figure 3.10, the sample mean is two and a half standard deviations above the mean, in the red region of Figure 3.7. Each of the sample means has a line drawn which extends two standard error units above it and two standard errors below it. Both of the first two intervals “trap” the unknown population value. It's only if your sample mean is in the unlucky red region of Figure 3.7 that an interval that extends two standard error units above and below the sample mean will not include the population value. That's shown in Figure 3.10. The interval around the sample mean does not trap the population value.

Figure 3.8: Sample Mean One Half Standard Deviation Above Population Mean
Figure 3.9: Sample Mean One Standard Deviation Below Population Mean

Figure 3.10: Sample Mean Two and a Half Standard Deviations Above Population Mean

Can you tell if your sample mean is in the dangerous red region? Of course not. But you do know that only five percent of the sample means fall into the red region, the area more than 2 standard error units from the population mean. All you can do is calculate an interval around your sample mean and hope that your interval is one of the 95 out of 100 that include the population value. The interval around the sample mean that includes the population mean 95% of the time is called a 95% Confidence Interval for the Mean. (You can compute confidence intervals for other statistics besides an individual mean; for example, you can compute a confidence interval for the difference of two means.)

Calculating a Confidence Interval

To compute a 95% confidence interval for the population mean:

• Calculate the standard error by dividing the standard deviation by the square root of the sample size.
• Multiply the standard error by 2 (or 1.96 if you're fussy).
• Subtract twice the standard error from the observed sample mean to get the lower limit.
• Add twice the standard error to the observed sample mean to get the upper limit.

For example, suppose you take a sample of 25 women with HIV and find the mean birth weight of their full term infants to be 2750 grams. If the standard deviation is 500 grams,

• The standard error of the mean is 500 / √25 = 100 grams
• Twice the standard error is 200 grams
• The lower limit of the 95% confidence interval is 2550 grams (2750 − 2(100))
• The upper limit of the 95% confidence interval is 2950 grams (2750 + 2(100))
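The same arithmetic in a few lines of code (Python here, purely as an illustration of the steps above):

from math import sqrt

mean_weight = 2750.0   # sample mean birth weight in grams
sd = 500.0             # standard deviation in grams
n = 25                 # sample size

se = sd / sqrt(n)                # standard error: 100 grams
lower = mean_weight - 2 * se     # 2550 grams (use 1.96 if you're fussy)
upper = mean_weight + 2 * se     # 2950 grams
print("95%% CI: %.0f to %.0f grams" % (lower, upper))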

Does this 95% confidence interval trap the population mean birth weight for infants of HIV-positive mothers? You don't know. The population value is a fixed number. It's what you get if you include the entire population of infants born to HIV-positive mothers in your sample. The population value either falls into your confidence interval or it doesn't. You don't know which is the case. All you know is that 95% of 95% confidence intervals include the population value. It's not strictly correct to say that the probability that a particular interval includes

the population value is 95%. Instead you can say that you are 95% confident that the interval contains the unknown population mean. The following analogy may help. Before a baby is conceived the probability of a girl is roughly a half. Once the baby is conceived, it's either a boy or a girl. The same is true with a confidence interval once it's calculated. It either traps or does not trap the population value. You can calculate confidence intervals that have higher than a 95% likelihood of trapping the unknown population value. For example a 99% confidence interval includes the population value 99% of the time. The price you pay for a higher confidence level is that the interval becomes wider. You can make a confidence interval narrower, for a fixed level of confidence, by increasing the sample size since that will make the standard error smaller. For small sample sizes, when the population standard deviation is not known, instead of multiplying by 2, a cutoff which is based on the normal distribution, you use values from what's called the t-distribution which is described in Chapter 5. The t-distribution depends on the sample size and takes into account the fact that, by using the sample standard deviation instead of a known population standard deviation, you're introducing additional uncertainty. The t-distribution looks like a normal distribution but it has more area in the tails. There's no need to get absorbed in minor differences in computing confidence intervals, since you'll no doubt use statistical software for the actual calculations. What matters is that you understand what you can conclude based on a confidence interval.
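A small simulation makes the "95% of intervals" interpretation concrete. The Python sketch below is illustrative only: it reuses the birth weight example (population mean 3000 grams, standard deviation 500 grams, samples of 25), and the 10,000 repetitions is an arbitrary choice.

import random
from math import sqrt
from statistics import mean, stdev

population_mean, population_sd = 3000.0, 500.0
n, trials = 25, 10000
trapped = 0

for _ in range(trials):
    sample = [random.gauss(population_mean, population_sd) for _ in range(n)]
    xbar = mean(sample)
    se = stdev(sample) / sqrt(n)     # standard error estimated from the sample
    if xbar - 2 * se <= population_mean <= xbar + 2 * se:
        trapped += 1

print("intervals trapping the population mean: %.1f%%" % (100 * trapped / trials))
# Close to 95% (a little less, since the multiplier 2 and the sample-based
# standard error are approximations for samples this small).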

Exercise: This is an excerpt from the Results section of Vanden Eng (2010) The percent of households owning an ITN ranged from 58.6% (95% CI: 55.9-61.3) in Sierra Leone to 83.9% (95% CI: 81.3-86.5) in Madagascar. The percent of ITNs suspended over a sleeping space the previous night varied greatly among the countries. Although Sierra Leone had one of the lowest levels of ownership and use, it had the highest hanging percentage at 85.0% (95% CI: 83.5-86.5). Niger and Kenya had hanging percentages that lie between ownership and usage (58.0%, 95% CI: 52.6-63.5 and 65.1%, 95% CI: 62.8-67.4 respectively), whereas Togo (45.7%, 95% CI: 42.9-48.5) and Madagascar (71.5%, 95% CI: 67.9-75.1) had a smaller percentage of ITNs hanging than either ownership or use.

What is the 95% confidence interval for the percent of households owning an ITN in Sierra Leone? Does this confidence interval include the population value for percent of households owning an ITN in Sierra Leone? What would happen to the confidence interval if everything stayed the same but the sample size was increased? Decreased? Based on their confidence intervals, do you think that Sierra Leone and Madagascar are really different in the percent owning an ITN? Do their confidence intervals overlap?

Testing Hypotheses about Population Means

Since for sufficiently large sample sizes the distribution of sample means is normal, you can use this information to test hypotheses about population means. Let's say you are interested in birth weights of infants born to HIV positive mothers. You want to know whether their average birth weight differs from the known population value. You know that in the population the average is 3000 grams and the standard deviation is 500 grams. Based on a sample of 25 newborns, you find that the average weight is 2750 grams. What can you conclude? Is the sample mean unusual if the true value is 3000 grams? Look at Figure 3.11 which shows the distribution of sample means when the population average is 3000 grams and the standard error is 100 grams. You know that 95% of the sample means are between 2800 g and 3200 g (mean +/- 2 SE), and 68% are between 2900 and 3100 grams (mean +/- 1 SE).

Figure 3.11: How Unlikely is a Sample Mean of 2750 grams?

Is your observed mean of 2750 grams unlikely if the population mean is 3000? To answer that question you need two steps:

• Calculate the z-score for a birth weight of 2750, so you can see where it falls within the distribution of sample means when the population mean is 3000 grams:

z-score = (sample mean − population mean) / (standard error of the mean) = (2750 − 3000) / 100 = −2.5

That's marked for you in Figure 3.11.

• Calculate the probability of observing a z-score of at least 2.5 in absolute value. That probability is about 0.012, the solid red area in the normal curve in Figure 3.11. That's the observed significance level, often referred to as the p-value. (The next chapter talks more about why you are looking at differences in both directions.)

The observed significance level is smaller than the usual cutoff of 0.05 for "unusual", so you can conclude that it appears unlikely that babies born to women with HIV have the same average birth weight as the general population. Are you certain? Of course not. It's just an unusual result for a sample from a population with a mean of 3000 grams.
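The two steps can be written out in a few lines of code. The Python sketch below is only an illustration; it uses the error function to obtain the normal tail area.

from math import erf, sqrt

def normal_cdf(z):
    # Cumulative probability of the standard normal distribution up to z.
    return 0.5 * (1 + erf(z / sqrt(2)))

sample_mean = 2750.0
population_mean = 3000.0
se = 500.0 / sqrt(25)                      # standard error = 100 grams

z = (sample_mean - population_mean) / se   # -2.5
p_two_sided = 2 * normal_cdf(-abs(z))      # about 0.012

print("z = %.2f, two-sided p-value = %.4f" % (z, p_two_sided))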

Previously, you found the 95% confidence interval for the population mean birth weight of infants of HIV-positive mothers to be from 2550 to 2950 grams. Notice that the value of 3000 grams is not included in the interval. That's because it's unlikely that the population value for the mean birth weight of these infants is 3000 grams. Any value not included in the 95% confidence interval is unlikely, using the 0.05 criterion for unlikely. In the next chapter you will learn in more detail about the mechanics and perils of hypothesis testing.

Summary

The normal distribution is important in statistics because many variables such as cholesterol, weight and blood pressure have distributions which are approximately normal and because for sufficiently large sample sizes the distribution of sample means is approximately normal. Although normal distributions can have different means and standard deviations, the proportional distribution of cases about the mean is the same. A standard normal distribution has a mean of 0 and a standard deviation of 1. A confidence interval for a statistic is a range of values, based on a sample, that includes the unknown population value with a designated likelihood.

Bibliography:

Wylie et al. (2013). Gestational age assessment in malaria pregnancy cohorts: a prospective ultrasound demonstration project in Malawi. Malaria Journal 2013, 12:183.

Vanden Eng et al. (2010). Assessing bed net use and non-use after long-lasting insecticidal net distribution: a simple framework to guide programmatic strategies. Malaria Journal 2010, 9:133.

4 Basics of Testing Hypotheses

• What is a null hypothesis? An alternative hypothesis?
• What are statistical tests?
• On what basis do you decide to reject the null hypothesis?
• What mistakes can you make when testing a hypothesis?
• Why is sample size important when testing hypotheses?

Read the abstract below (Ezeaka et al. (2009)). It's very similar to many published abstracts. The authors describe their observed results, draw conclusions and support them with mysterious words, symbols, and numbers, usually protected by parentheses. In this chapter you'll learn how to interpret what's hiding in the parentheses. In particular, you'll focus on the p-values. The next chapter explains the cryptic letters and numbers that directly precede the P=.

Numerous studies have reported that HIV-infected pregnant women are at increased risk of delivery of low birth weight (LBW) infants, of preterm deliveries and of intrauterine growth restriction. The objective of the study was to determine the effect of maternal HIV infection on the anthropometric characteristics of the babies at birth. A prospective study was carried out at the Lagos University Teaching Hospital, Nigeria. There were three times more LBW babies in the HIV-positive group than in the uninfected mothers ( = 3.47, 95% confidence interval = 1.69, 7.27; chi(2) = 12.99, P = 0.0003). The maternal weight (t = 15.85; P = 0.0001), maternal body mass index (BMI) (t = 15.07; P = 0.0003), birth weight of infants (t = 27.17; P = 0.0001) and birth length (t = 31.20; P = 0.001) were significantly less in HIV-positive mothers than in controls. In conclusion, poor maternal bodyweight and low BMI are significant contributors to LBW in HIV-infected women. Nutritional counselling, dietary intake and weight monitoring during pregnancy should be emphasized to improve pregnancy outcome in HIV-infected women.

Steps in Testing a Hypothesis

When you conduct a survey or perform an experiment, you're usually interested in answering questions about populations. The questions may be relatively straightforward, such as whether a new treatment is better than the standard or whether the ready availability of Insecticide Treated Nets increases their use. Or the questions may be considerably more complicated: what family and societal characteristics are predictors of whether a girl undergoes Female Genital Mutilation/Cutting (FGMC) or what factors predict neonatal mortality or morbidity for infants of HIV positive women? The Ezeaka abstract answers questions about anthropometric measurements of infants born to HIV-infected women and infants born to women who are HIV free. Many different statistical tests are used to answer such questions, although the answers are usually not as definitive as you would like, since you want to draw conclusions about the population, not the sample. That introduces uncertainty. You've tested hypotheses in previous chapters. You considered whether the Distinguished Scientist had a better cure than the standard and whether newborns of HIV positive women come from a population with a mean weight of 3000 grams. In this chapter we'll discuss in more detail the steps involved in testing a hypothesis about the population.

Step 1: Specify the Null Hypothesis and the Alternative Hypothesis

All statistical tests require you to frame your question as a hypothesis. A hypothesis is just a factual statement that may or may not be true. The new treatment is no better than the standard is a hypothesis. ITN distribution increases the probability of using a net is also a hypothesis. So is maternal education decreases the likelihood of FGM/C. To use statistical tests you have to reformulate your question in terms of two related hypotheses: a null hypothesis and an alternative hypothesis. Together the null hypothesis and the alternative hypothesis cover all possible situations. The null hypothesis is sometimes called the “no difference” hypothesis. Usually it's what you are setting out to disprove. For example, if you've developed a new vaccine, you're interested in demonstrating that it's better than the standard. Your null hypothesis is that the new vaccine and the old vaccine are equally effective. Similarly, the null hypothesis is that ITN distribution has no effect on ITN use and that maternal education is not related to FGM/C. The null hypothesis is stated in such a way that it can serve as a frame of reference for evaluating the observed results. The null hypothesis is assumed to be true until there's sufficient evidence to question it. In the previous chapter you tested the null hypothesis that infants born to mothers with HIV weigh the same as infants whose mothers don't have HIV. The alternative hypothesis describes the state of affairs if the null hypothesis is false. The alternative hypothesis is that the new vaccine is different from the standard, that ITN distribution affects net use, that maternal education is related to FGM/C, and that infants born to mothers with HIV don't weigh the same as infants whose mothers don't have HIV.

Exercise: For the Ezeaka abstract, identify all of the pairs of null and alternative hypotheses. For example, null hypothesis: low birth weight is equally likely for the HIV and control groups; alternative hypothesis: low birth weight is not equally likely for the two groups.

Step 2: Select the Appropriate Statistical Procedure

To test a hypothesis you have to select an appropriate statistical procedure which estimates whether your observed results are unusual if the null hypothesis is true. The statistical procedure depends both on the question you are posing and on the characteristics of your data. There are numerous procedures for testing hypotheses about means, for quantifying the associations between variables and for building complex models that predict the values of a dependent variable from a set of independent variables. A variable is termed dependent if its values are thought to depend on the values of other variables, called predictor or independent variables. For example, FGM/C may depend on family socioeconomic status, education and region. It's the dependent variable. Factors that are associated with it are the predictor variables. (Chapter 5 is an overview of some of the most commonly used statistical procedures for testing hypotheses.) Don't be discouraged if you read papers that use statistical techniques that you're not familiar with. Remember that the goal of most statistical procedures is to test a hypothesis. The different procedures are all after the same answer: if the null hypothesis is true, are the observed sample results unlikely? Each statistical procedure transforms the observed sample results in some way and produces what's called a test statistic, which is used to calculate how unusual the observed results are if the null hypothesis is true. In the previous chapter you calculated a z-score by subtracting the hypothesized population mean from the observed sample mean and dividing the result by the standard error. The test statistic is the z-score. The normal distribution is used to determine how unlikely the observed results are if the null hypothesis is true. The actual mechanics of how a test statistic is computed don't matter, as long as the data meet the required assumptions for that procedure. Statistical software takes care of the transformations that are required and returns the value of the test statistic and the observed significance level. Remember that the observed significance level is the probability of observing results at least as extreme as those observed when the null hypothesis is true.

Statistical tests have unusual names. Some are named after the sampling distribution which is used to evaluate whether the value of the test statistic is unlikely if the null hypothesis is true. Others are named after the statisticians who introduced them. For example, if you don't know the value of the population standard deviation but estimate it from the sample, instead of calculating a z-score you calculate what's called a t-statistic and then use the t-distribution to calculate the observed significance level. (The t-distribution looks very much like the normal distribution but shifts the areas a little to make up for estimating the standard error.) You'll frequently see t-values in papers. The chi-squared statistic is named after the distribution which is often used for testing hypotheses about count data.

Exercise: In the previous abstract, identify the statistic used to test each of the hypotheses.
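As a concrete illustration of the z-score calculation described above, here is a minimal R sketch. All of the numbers (sample mean, hypothesized population mean, population standard deviation, and sample size) are hypothetical values chosen for illustration, not data from any study in this book.

# Sketch: computing a z test statistic and a two-tailed p-value
sample_mean <- 2950   # observed sample mean (grams), hypothetical
pop_mean    <- 3000   # mean under the null hypothesis (grams)
pop_sd      <- 500    # known population standard deviation (grams)
n           <- 100    # sample size

std_error <- pop_sd / sqrt(n)                    # standard error of the mean
z         <- (sample_mean - pop_mean) / std_error   # the test statistic
2 * pnorm(-abs(z))                               # observed significance level (two-tailed)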

Step 3: Check Whether Your Data Meet the Required Assumptions for the Procedure

Every statistical test involves certain assumptions about the populations from which the data are sampled. Some statistical procedures go haywire when their assumptions are violated, while others remain largely unaffected. (A statistical procedure is termed robust if it can withstand violations of some of the underlying assumptions. That's a compliment for a statistical procedure.) Some assumptions, such as independence of observations and normality, are common to many procedures. Observations are independent of each other if one observation isn't related in any way to another. If you have an experimental design which takes multiple measurements of the same variable for an individual at different times or under different conditions (called a repeated measures design), or if your design involves clusters of individuals, for example all members of a household, students from the same school or patients from the same hospital, special statistical procedures are needed. If your data fail to satisfy the assumptions required for a particular procedure, you can attempt to transform the data values to better comply with the necessary assumptions. For example, you might find that taking logs or square roots of the data makes the observations more normal. That's why you'll sometimes see results in papers given in logs or square roots, though in many cases it's better to report the results in the original units even when the data are transformed. You can also use nonparametric tests, which make limited assumptions about the underlying distributions of the data.
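One quick way to judge whether a transformation such as the log helps is to compare histograms and normal Q-Q plots before and after transforming. The following R sketch uses simulated, made-up data purely for illustration; in practice you would substitute your own variable.

# Sketch: checking whether a log transform makes skewed data look more normal
set.seed(1)
x <- rlnorm(200, meanlog = 8, sdlog = 0.4)   # simulated, positively skewed values

par(mfrow = c(2, 2))                         # 2 by 2 grid of plots
hist(x, main = "Original scale")
qqnorm(x); qqline(x)
hist(log(x), main = "Log scale")
qqnorm(log(x)); qqline(log(x))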

Step 4: Assume that the Null Hypothesis is True

You must assume the null hypothesis is true and then determine if the observed sample results are unusual when the null hypothesis is true.

Step 5: Calculate the Observed Significance Level (the p-value)

If you're using statistical software to analyze your data, once you've selected a procedure, the software will give you the value of the test statistic and its observed significance level, the probability of observing results as extreme as the ones you've observed when the null hypothesis is true. This is the p-value that is reported in papers and presentations.

Exercise: For each of the hypotheses that you've identified, what is the observed significance level that is reported?

If you're using a statistical software package, the observed significance level may be displayed as only zeroes. That doesn't mean it's zero, just that the observed significance level is smaller than the number of decimals displayed. You can often click on a cell and see the observed significance level to more decimal places.

Step 6: Decide Whether to Reject the Null Hypothesis

If the observed significance level is small enough (usually less than 0.05), you can reject the null hypothesis. Traditionally, 0.05 is used as the cutoff for “unlikely” although there's nothing sacred about it. There's remarkably little difference between an observed significance level of 0.049 and 0.051. Both tell you that the observed results are unlikely if the null hypothesis is true. When publishing results, give the actual observed significance level (p=0.039), not the cutoff value (p<0.05). You should always evaluate the actual strength of the evidence against the null hypothesis instead of relying on a magical number. Sometimes, an alternative hypothesis specifies in advance the direction of the difference. For example, if you're administering nutritional supplements to infants you may be confident that they can only increase weight, or if you're studying ITN distribution you may know that their use can only decrease malaria incidence. If you can specify the direction of the difference before you analyze the data, you can adjust the observed significance level to take that into account. You can use what's called a one-tailed test. You reject the null hypothesis based on the probability for only the anticipated direction. You should use one-tailed tests with extreme caution because it's difficult to know the direction of the difference with certainty. In medicine, many treatments thought only to benefit patients have turned out to be harmful. Don't use one-tailed tests simply to make your observed results more unlikely.

Exercise: Which of the null hypotheses in the abstract do you reject? On what do you base your decision?

Step 7: Report Your Results Correctly

Because hypothesis testing depends on probabilities, and not certain knowledge, the conclusions you can draw are severely limited. Based on the observed significance level you can conclude only that the observed results are unlikely if the null hypothesis is true or that the observed results are not unlikely if the null hypothesis is true. You can either reject or not reject the null hypothesis. It's easy to get carried away and make statements that are unwarranted:

• Don't overestimate the importance of tests of statistical significance. Statistical tests and tests of significance provide useful information but their importance is often exaggerated. A p-value alone is not a good summary of the data. It tells you nothing about the magnitude of an effect or the associated variability. Graphical methods and confidence intervals are much more useful tools. Simple techniques often provide much more insight into the data. With large enough samples, small unimportant effects may be deemed “statistically significant”, while large and important effects may be missed if the study is poorly designed.
• Don't equate statistical significance with practical significance. When you reject the null hypothesis, it is not necessarily true that the differences or associations you found are important. For large samples, even very small observed differences in means may be statistically significant. For example, if you find that a new treatment prolongs life by one week compared to the standard, it is of little practical importance. Always examine the actual observed differences and focus on those that are both statistically significant and practically meaningful.
• Don't claim that you proved the null hypothesis is true. You can never prove the null hypothesis is true. Think about it. Consider a null hypothesis that states that a coin is fair, that it has the same probability of landing on either side. If you flip the coin 1000 times and it comes up 500 times on each side, have you shown that the coin is absolutely fair? Of course not. Your observed results are consistent with many different population values: 0.501 and 0.499, or 0.5003 and 0.4997, to mention a few. Any value within the 95 percent confidence interval for the population probability cannot be excluded as a true value (see the sketch after this list). Your failure to reject the null hypothesis may also mean that you haven't gathered enough evidence to reject it. Sometimes a legal analogy is made. The null hypothesis is compared to the presumption of innocence. Failure to find a defendant guilty doesn't prove innocence. All it says is that there was not enough evidence to establish guilt. If your sample size is small, you may fail to reject the null hypothesis even when population differences are large. That's why it's important, before you plan a study, to determine how big a sample you need in order to detect what you consider to be an important difference.

• Don't claim that the observed significance level is the probability that the null hypothesis is true. The null hypothesis is either true or it is not. You don't know which.
• Don't claim that you proved that the alternative hypothesis is true. It's possible to get very unusual sample results when the null hypothesis is true. Just because your results are unusual doesn't mean the null hypothesis is indisputably false and the alternative hypothesis is true. The observed significance level tells you how often you would get sample results as extreme as the ones you observed if the null hypothesis is true. The observed significance level is never exactly 0.
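The coin example above can be worked out directly in R. binom.test reports both the p-value and a 95 percent confidence interval for the true probability of heads; the interval contains 0.5 along with many other plausible values, which is exactly why you cannot claim to have proved the coin is fair.

# Sketch of the coin example: 500 heads in 1000 flips, null hypothesis p = 0.5
binom.test(x = 500, n = 1000, p = 0.5)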

To Err is Statistical

Whenever you test a hypothesis, you have two paths for making a mistake: you can reject the null hypothesis when it is true or not reject the null hypothesis when it is false. Figure 4.1 shows the possible outcomes and the creative names statisticians attach to them.

Figure 4.1: Outcomes of Testing a Null Hypothesis

                        The null hypothesis is:
Your action:            True                    False
Reject                  Type 1 error            You are correct
Not reject              You are correct         Type 2 error

You commit a Type 1 error whenever you reject a null hypothesis that is true. You can think of it as finding an innocent person guilty. You commit a Type 2 error when you fail to reject a false null hypothesis. You set a guilty person free. You can decrease your Type 1 error rate by refusing to reject the null hypothesis unless the sample results are very, very unusual. Instead of using a cutoff of 0.05, you could use a smaller value, say 0.00001. The problem with this strategy is that you're increasing your chance of committing a Type 2 error. You're setting a very high standard for rejecting the null hypothesis, so you are more likely to miss rejecting it when it is false. Whenever you test a hypothesis you must balance the likelihood of making one of the two possible mistakes. If you make one mistake less likely, you make the other more likely. You have to decide if it's better to set a guilty man free or imprison an innocent man.
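You can see the Type 1 error rate in action with a small simulation. The sketch below repeatedly draws two samples from the same population (so the null hypothesis of equal means is true) and records how often a t-test rejects at the 0.05 level; the sample sizes and number of repetitions are arbitrary choices for illustration.

# Sketch: simulating the Type 1 error rate when the null hypothesis is true
set.seed(42)
p_values <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)
mean(p_values < 0.05)   # proportion of false rejections; should be close to 0.05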

Statistical Power

Failure to reject the null hypothesis when it is false is a serious issue for anyone designing a study. It's a terrible outcome if you have a treatment which is much better than the standard but you don't reject the null hypothesis that it is as good as the standard. You fail to find the real difference. In statistics, power refers to the probability of rejecting a false null hypothesis. The greater the power, the more likely you are to reject the null hypothesis when it is false. Power depends on several factors:

• The size of the true difference in the population
• The variability within the groups
• The sample size
• The significance level at which you reject the null hypothesis

Let's look at an example. Your standard drug cures 70% of the population. Your new drug cures 71% of the population. That's a very small difference and it's highly unlikely that you'll be able to detect it, since the two sampling distributions overlap so much. Do you care? Probably not. It's not an important difference. What if the new drug cures 90% of all cases? For the same sample size as before, your chances of detecting the difference improve, since the sampling distributions don't overlap as much. Figure 4.2 shows sampling distributions of the means when two population means are similar (30 and 31), and Figure 4.3 when they are quite different (30 and 50) and the standard error is the same. When the means are similar the two distributions overlap a lot. When the means differ more, the two distributions are better separated. The larger the difference between two treatments, the better your chance of finding the difference, since there is less overlap between the distributions.

Figure 4.2: Similar population means          Figure 4.3: Different population means

The spread of sample means, the standard error of the mean, depends on two factors: the sample standard deviation and the sample size. You usually can't control how much the observations in a sample vary but you can control the sample size. The larger your sample size, the better the separation between the two sampling distributions of the mean, because the sample means cluster more tightly as the sample size increases. Figure 4.4 shows two sampling distributions with means of 30 and 40 and a standard error of 5. The two distributions overlap a lot. If you decrease the standard error to 2 by increasing the sample size, you get the results shown in Figure 4.5. The difference between the population means is the same but the sampling distributions have much less overlap because the larger sample size results in a smaller standard error. You now stand a better chance of being able to reject the null hypothesis.

Figure 4.4: Small sample size Figure 4.5: Larger sample size
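The effect illustrated in Figures 4.4 and 4.5 can also be computed directly. The R sketch below uses power.t.test for a two-sample t-test; the difference of 10 and standard deviation of 25 are hypothetical numbers, chosen only to show that the same true difference is easier to detect with a larger sample.

# Sketch: power of a two-sample t-test for the same true difference at two sample sizes
power.t.test(n = 20,  delta = 10, sd = 25, sig.level = 0.05)$power
power.t.test(n = 100, delta = 10, sd = 25, sig.level = 0.05)$power   # larger n, more power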

Estimating Sample Size

Before you conduct any study you must determine how large a sample size you need to have a good chance of rejecting the null hypothesis if it is false. There's little point in evaluating differences between two treatments if the chances are slim that you'll be able to detect even big differences when they exist. Your ability to reject the null hypothesis depends on the actual difference between groups in the population, so you have to specify the smallest difference that you want to detect and the probability that you want of detecting that difference (the power).

If you're studying differences in infant birth weights between two groups, you probably don't care if you fail to detect a 10 gram difference between the two populations. The difference may be real but it's not important. You might decide that it's important to detect a difference in means of at least 300 grams. Ideally, you'd like to find a sample size that guarantees that you will always detect a difference of at least 300 grams, if it exists in the population. You've learned by now, no doubt, that certainty doesn't exist in the world of statistics. You must specify the probability that you want of detecting this difference. For example, you may want a 90% chance of detecting a difference of at least 300 grams. You must also indicate the significance level that you will use to reject the null hypothesis and an estimate of the standard deviation of the observations in each of the groups.

Figure 4.6 gives the sample size in each group required to find differences of varying sizes, when the standard deviations are 500 grams in each of the two groups and 5% is the significance level used to reject the null hypothesis. For each difference the sample size is shown for 80% power and 90% power.

Figure 4.6: Sample Size in Each Group to Detect Minimum Difference (alpha=0.05)

          Population Difference in Means Between the Two Groups (grams)
Power         100       200       300       500
80%           393        99        44        16
90%           526       132        59        22

Look at the column labeled 100 grams. The first entry, 393, is the sample size required in each of two groups to have an 80% chance of detecting a difference of at least 100 grams if it exists in the population. If you want a 90% chance of detecting this difference, you'll need 526 infants in each group. You see that for a fixed power, the larger the difference, the smaller the sample size required to detect it. For a fixed difference, the greater the power the larger the sample size. This is not surprising. It makes sense that larger differences are easier to detect than smaller differences. Similarly, if you want more power you have to pay for it by increasing the sample size.
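If you have R available, power.t.test does this kind of calculation for a two-sample t-test. The sketch below plugs in the same assumptions as Figure 4.6 (standard deviation of 500 grams, alpha of 0.05); the sample sizes it returns should be close to the entries in the table, though they may differ slightly because power.t.test uses the t rather than the normal distribution.

# Sketch: sample size per group for two of the cells in Figure 4.6
power.t.test(delta = 300, sd = 500, sig.level = 0.05, power = 0.80)   # compare with 44
power.t.test(delta = 100, sd = 500, sig.level = 0.05, power = 0.90)   # compare with 526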

The sample size also depends on alpha, the significance level you use to reject the null hypothesis. If you use a large value for alpha it's easier to reject the null hypothesis and the required sample sizes are smaller. However, you are increasing the Type 1 error, the probability of rejecting the null hypothesis when it is true.

Whenever you fail to reject the null hypothesis, you have to worry about whether your sample size was large enough to detect important differences. That's why before you design a study it is important to determine an appropriate sample size for the magnitude of difference that you wish to detect.

Examples of Power Calculations in Studies

When you read papers that report negative results, those for which the null hypothesis is not rejected (p>0.05), remember that one possible explanation for the results is that the sample sizes were not big enough to find even large differences. Look for discussion of power when you are reading research results. For example, in a study of the effects of rapid malaria diagnostic tests on treatment and outcome, Msellem et al. (2009) explain:

According to initial sample size calculations, a total of 850 participants were needed to identify an assumed reduction in malaria diagnosis of 50% with the addition of RDT to clinical diagnosis, at a 5% significance level and a power of 80%, without controlling for design effect of clustering within PHCUs.

In a study of the effects of impregnated bed sheets (shukas) on malaria prevalence, Macintyre et al. (2003) state:

A minimum sample size of 440 (220 experimental and 220 control participants) was calculated using a confidence level of 95%, power of 80%, an estimate of 20% malaria parasitaemia in the control group, and a least extreme detectable difference of 10% among experimental group members. The estimated prevalence of malaria parasitaemia was based upon clinic reports that suggested a prevalence of 50% during the rainy season. Since this estimate was not based on blood samples we used a more conservative estimate of 20% to calculate the sample size.

Although the authors of the above paper were careful in estimating a sample size, they encountered one of the perils of sample size estimation. You have to have reasonably good information about the population before you can estimate a sample size. If your assumptions about the population are incorrect you may not have the power that you were aiming for. Macintyre et al. based their sample size estimates on a malaria prevalence rate of 20% in the untreated group. It turns out that the malaria prevalence rate in the control group was only 2%, a very big difference. You need much larger sample sizes to detect a reduction in prevalence between the control and experimental groups when the control rate is 2% than when it is 20%. The authors correctly conclude:

Conclusions: These results suggest that permethrin-impregnated bedsheets may be protective against malaria prevention but further studies with greater power are required to confirm this.

Design Effect

Taking samples of people that are spread over different areas is expensive. Interviewers may have to travel long distances and be housed and supported in far-flung locations. That's why it's attractive to sample clusters of people, such as those living in the same manyatta or village. The disadvantage of such a strategy is that people within the same cluster are more similar than people from different clusters. The observations don't vary as much as if they came from a simple random sample. You and your brother and sister are probably more similar in many respects than two unrelated people. Adding another person from the same cluster to a study doesn't give you as much information as adding another person from a different cluster. The design effect (deff) is a measure of how much the sampling variability in a cluster sample differs from the sampling variability in a simple random sample. It's based on measuring the similarity between two randomly selected people within a cluster as compared to two randomly selected people from different clusters. A deff of 2 tells you that you need twice as many cases from the clusters to get the same standard errors as you would if you took a simple random sample. The design effect can vary for each question or measurement you take. For example, if you are sampling families, you don't get very much additional information if you ask each member how long the family has lived in their present housing. The responses should be highly correlated. However, measurements of their blood pressure will probably not be as highly correlated. The design effect must be included in the computation of sample sizes, confidence intervals and tests of statistical significance. Many statistical software packages will make the appropriate adjustments when the design effect is given.
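The basic adjustment is simple arithmetic: multiply the sample size you would need under simple random sampling by the design effect. The numbers below (400 cases and a deff of 2) are hypothetical, chosen only to illustrate the calculation.

# Sketch: inflating a simple-random-sample size for a cluster design
n_srs <- 400     # cases needed under simple random sampling (hypothetical)
deff  <- 2       # assumed design effect
n_srs * deff     # cases needed from the cluster sample for the same precision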

Sample Size Calculator

For studies based on simple random sampling, you can use a sample size calculator such as the one found at www.openepi.com. For complicated designs that involve several stages, such as selecting cities within a region, then hospitals within each city, and then patients within each hospital, be sure to consult with a statistician for appropriate sample size estimates.

Exercise: Use the openepi software to calculate the sample size required to detect a difference of 400 grams, with a power of 90%. (Click Sample Size, then Mean Difference, then Enter New Data. Use 500 as the standard deviation in each of the groups. Click Calculate to see the results.)

Summary

Statistical hypothesis testing requires identification of a null hypothesis and an alternative hypothesis. The null hypothesis claims there is no difference in the population. The alternative hypothesis claims the opposite: there is a difference in the population. To evaluate a hypothesis you must determine the probability of observing results at least as extreme as the ones you observed, when the null hypothesis is true. This probability is called the observed significance level or the p-value. Statistical tests estimate this probability. If the observed significance level is small, usually less than 0.05, you reject the null hypothesis. If you fail to meet the assumptions required for a statistical procedure the observed significance level may be incorrect.

When you test a hypothesis you can make two types of mistakes: reject the null hypothesis when it is true (type 1 error) or fail to reject the null hypothesis when it is false (type 2 error). Your ability to reject the null hypothesis when it is false depends on the size of the difference between the groups and the sample size in each group.

Bibliography:

Msellem MI, Mårtensson A, Rotllant G, Bhattarai A, Strömberg J, et al. Influence of Rapid Malaria Diagnostic Tests on Treatment and Health Outcome in Fever Patients, Zanzibar—A Crossover Validation Study. PLoS Med 2009: 6(4): e1000070. Msellem (2009)

Ezeaka VC, Iroha EO, Akinsulie AO, Temiye EO, Adetifa IM (2009). Anthropometric indices of infants born to HIV-1-infected mothers: a prospective study in Lagos, Nigeria. Int J STD AIDS 2009 Aug; 20(8): 545-8. Ezeaka (2009)

Macintyre, K., Sosler, S., Letipila, F., Lochigan, M., Hassig, S., Omar, S. and Githure, A. A new tool for malaria prevention? Results of a trial of permethrin-impregnated bedsheets (shukas) in an area of unstable transmission. Int. J. Epidemiol. 2003; 32(1): 157-160. Macintyre (2003)

5 Statistical Procedures for Testing Hypotheses

• Why do you need statistical procedures?
• When do you use the t-test?
• How do you test whether two variables are independent?
• What is a ratio?
• What is an odds ratio?

Statistical procedures are used to estimate the observed significance level-- the probability of obtaining sample results at least as extreme as those you've observed when the null hypothesis is true (the p-value). It would be great if there was one statistical procedure that could be used for all the different types of hypotheses but, unfortunately, that's not the case. Just as different diseases require different treatments, so different hypotheses require different statistical tests. For example, there are statistical procedures for testing the null hypothesis that two or more population means are equal; that two or more categorical variables are independent; that there is no linear relationship between a dependent variable and a set of independent variables. You already used the binomial procedure to test the null hypothesis that a cure rate is 70% and the one sample z-score procedure to test the null hypothesis that a sample of newborns of HIV positive mothers comes from a population with an average weight of 3000 grams.

The names of the statistical tests used, their values, and the observed significance levels are usually reported in scientific papers. For example, in the following abstract that summarizes differences between babies with HIV positive mothers and babies with HIV negative mothers (Ezeaka, 2009), the statistical tests used for determining the observed significance levels are identified as the t-test and the chi-square test:

There were three times more LBW babies in the HIV-positive group than in the uninfected mothers (odds ratio = 3.47, 95% confidence interval = 1.69, 7.27; chi(2) = 12.99, P = 0.0003). The maternal weight (t = 15.85; P = 0.0001), maternal body mass index (BMI) (t = 15.07; P = 0.0003), birth weight of infants (t = 27.17; P = 0.0001) and birth length (t = 31.20; P = 0.001) were significantly less in HIV-positive mothers than in controls.

This chapter and the remaining chapters assume that you'll be using statistical software for calculating statistical tests so there's little emphasis on what numbers to square, divide or multiply to calculate the statistical tests or build the models. Instead the focus is on what tests are appropriate for a given problem and what the results mean. This makes the course more difficult since it's easier to multiply, divide and square than to think. We hope that in the long run you'll find this approach more valuable.

Some Warning on the Use of Significance Tests

Statistical tests and the p-values they produce are an important part of data analysis but they should be used and interpreted carefully. Tests are not a substitute for good judgement on the part of the data analyst. Think before you start generating test results. Never perform a statistical test without graphically examining the data first. You can usually learn much more about your data and about the relationships between variables by examining plots and simple tables than by looking at p-values.

When reporting results, never report p-values alone; always include confidence intervals for the population values, since they provide information about the possible size of the true differences. You know that small, but practically unimportant, differences may have very small p-values if the samples are large. Large and important differences may not be statistically significant if the sample sizes are small and the tests have limited power. Journals are biased in favor of studies that report statistically significant differences, so the 5% of studies that reject the null hypothesis when it's true stand a good chance of being published. Keep this in mind when you come upon studies that make claims counter to those published by other investigators or present astonishing results.

Assumption, Assumptions

Most statistical tests are based on the assumption that the data are random samples from one or more populations and that the observations are independent of each other. Since it can be difficult to randomly sample cases from the entire population of interest, in practice, selecting cases independently and observing them without bias is often the best that can be realistically achieved. (Imagine the difficulties involved in taking a random sample of all babies of HIV positive mothers in the world!) If your sampling strategy involves taking groups of observations from regions, hospitals or schools, you need special statistical procedures that incorporate the dependencies of the observations.

Statistical tests also require assumptions about the distributions of the variables in the population. Some tests require that data come from a particular distribution, such as the normal distribution, while other tests, called nonparametric tests, make less stringent assumptions. Before you use any statistical test, you should check whether your data meet the assumptions required for that procedure. You should also note whether the procedure is sensitive to the assumptions. That is, how much does the observed significance level that is calculated change if the assumptions are violated? There are special plots and tests that you can use to check whether the necessary assumptions appear to be violated.

Distributions of Test Statistics

All statistical tests calculate a number based on your observed data (called the value of the test statistic). Different tests compute these numbers in different ways, but for all tests the value of the test statistic is used to calculate the probability of observing results as extreme as those you observed when the null hypothesis is true. Test statistics have mathematical distributions and sometimes tests take their names from these distributions.

For example, to test the null hypothesis that your sample comes from a population with a known mean and standard deviation, you use a z-score as the test statistic and the normal distribution as the reference distribution for calculating the p-value. You answer the question: how often would I get a z-score at least as large as the one I observed, in absolute value, if the null hypothesis is true? Other common distributions that are used as reference distributions for test statistics to calculate p-values are the t-distribution, the chi-square distribution (χ²) and the F-distribution. The statistical test determines which distribution is used.
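In R each of these reference distributions has a built-in function for converting a test statistic into a p-value. The statistic values and degrees of freedom below are illustrative: the chi-square value 12.99 echoes the Ezeaka abstract (assuming 1 degree of freedom, as for a 2 by 2 table), the F-value 11.24 with 3 and 722 degrees of freedom echoes the ANOVA example later in this chapter, and the t and z values are simply made up.

# Sketch: turning test statistics into p-values with different reference distributions
2 * pnorm(-abs(1.96))                               # z-score, normal distribution
2 * pt(-abs(2.05), df = 27)                         # t-statistic, t-distribution
pchisq(12.99, df = 1, lower.tail = FALSE)           # chi-square statistic, 1 df
pf(11.24, df1 = 3, df2 = 722, lower.tail = FALSE)   # F-statistic, 3 and 722 df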

Although you do not need to worry about the details of these distributions, you'll encounter the symbols t, F, and chi-square often when reading papers or analyzing data. Figure 5.1 gives you an idea of what these distributions look like. (It's good to have a mental image of what is being talked about, rather like photos of deceased frequently mentioned relatives.)

Some distributions change their shape based on what's known as their degrees of freedom. You can think of degrees of freedom as the number of independent pieces of information that go into calculating a test statistic. Degrees of freedom can be based on the size of the sample, or the number of groups involved in a comparison, or on the number of rows and columns in a table. Except for the normal distribution, you see several distributions of the same type on a single plot in Figure 5.1. The distributions have different degrees of freedom which are shown in the legend. The F-distribution has a pair of degrees of freedom, one for the numerator and one for the denominator of the test statistic.

Figure 5.1: Statistical Distributions Used in Testing Hypotheses

You see that the t-distribution looks very much like the normal distribution, and for sample sizes greater than 30 it's practically indistinguishable from the normal distribution. For small sample sizes it has a little more area in the tails than the normal distribution. The t-distribution is used for testing hypotheses about means when the population standard deviations are not known. Estimating the standard deviation from the data introduces additional uncertainty.

Statistical Software

If you're enrolled in this course, you have access to a computer that you can use to run programs that calculate statistical tests. Some of the programs, like SAS and IBM SPSS, are very expensive, so you may not have them installed on the computer you are using. Don't worry. There is also open source (free) statistics software called R which has all of the capabilities of the expensive packages. With enough patience and skill you can program R to do almost anything, except cook your dinner. If analyzing data files is an important part of your job you may want to invest the time and effort required to learn R. A brief introduction to R by example is at www.statmethods.net. If you just want to learn about statistical procedures or analyze small data sets, there is free web-based software such as www.openepi.com and www.graphpad.com that calculates many of the commonly used statistical tests, though it may not have as many features as the more expensive software or R and cannot handle large data files. A good place to look for detailed and reliable information about statistical software packages and their output is www.ats.ucla.edu/stat.

Game Plan for Testing a Hypothesis Using Statistical Software

The steps you take to test a hypothesis using statistical software are straightforward:

• Formulate a null and an alternative hypothesis about the population(s). A computer can't do this for you. Make sure your hypothesis is appropriate for the data you have. Don't try to test hypotheses about average religions or average drugs administered. Computers don't know whether the numbers in your data file correspond to codes for religions or to the number of children. Software will happily calculate averages for nominal variables.

• Select a statistical test for evaluating your hypothesis. Keep in mind any assumptions a particular test requires. Figure 5.2 lists frequently encountered statistical hypotheses about means and proportions and the statistical tests used to test the hypotheses. The name of the test often used in statistical software packages is in the last column.
• Identify the statistical procedure or commands in your software that compute the test that you want. If there are several versions of a test, make sure you've selected the correct one. If there's a Help system or tutorial included in your software, use it. Most statistical software programs are organized by the type of test that is computed or model that is built.
• Check to see if your data violate the assumptions required for the selected test. Sometimes displays that help you evaluate the assumptions are part of the output for a particular procedure. If not, you have to look for them elsewhere, perhaps in other special purpose procedures.
• Look for any warnings or errors the software displays when you run the procedure. For example, the software may warn you if there are too many empty cells in a table or if counts are too small. Make appropriate modifications to your data, such as combining categories, or choose another statistical test which doesn't require the troubling assumptions.
• Check that the sample sizes and other numbers that are shown in the output are correct. For example, if you didn't enter all of the numbers correctly the number of cases in your sample will not match the sample size reported by the software.
• In the output from the statistical procedure identify the test statistic, and the p-value, that correspond to your null hypothesis. Check the labeling to see if it is for a one-tailed or two-tailed test. Don't choose the one-tailed value because it is smaller. In most circumstances you should use the two-tailed value. Note that many software packages report multiple test statistics within the same procedure. You may find a test that the variances are equal when you're interested in testing equality of means. Don't confuse the results of additional tests with those of the null hypothesis you're testing. There's a reason that these additional tests are performed as part of your analysis. The Help system may tell you what it is, or you can enter the name of the unrecognized test into a search engine. If the output lists numerous tests for the same hypothesis, don't automatically select the one with the smallest p-value. If the observed significance level is identified as “asymptotic”, that means your sample sizes have to be reasonably large for the observed significance level to be accurate. If the p-value is labeled as “exact”, that indicates that the observed significance level is calculated without the need for making assumptions about the distribution of the test statistic. Exact tests are useful for small data sets when the validity of the asymptotic results is in doubt. Do some research to see how the tests differ and the circumstances when one is preferred over another.

• Reject the null hypothesis if the observed significance level is small (e.g., p<0.05).

• Report the value of the test statistic, the observed significance level and the confidence intervals for population means, mean differences or ratios. Include plots and descriptive statistics that summarize your findings and help the reader understand them.

Figure 5.2: Frequently Used Statistical Tests for Testing Hypotheses

Null Hypothesis                                            Test Statistic   Procedure Name
A sample comes from a population with a known              z-score          One sample normal test of means
  mean and standard deviation
A sample comes from a population with a known              t-statistic      One sample t-test
  mean but the standard deviation is estimated
Two independent population means are equal                 t-statistic      Independent samples t-test
Two related (paired) population means are equal            t-statistic      Paired samples t-test
Two or more independent population means are equal         F-statistic      Oneway Analysis of Variance
Two or more independent population proportions are equal   χ² statistic     Chi squared test for binomial proportions
Rows and columns of a table are independent                χ² statistic     Crosstabulation

Testing That Two Population Means Are Equal

The t-test is probably the most frequently used statistical procedure. It's used to test the null hypothesis that two population means are equal. Another way of stating this is that the difference between the two true means is 0.

There are several different versions of the t-test for means:

• The one sample t-test is used to test the null hypothesis that a single sample comes from a population with a known mean. For example, you want to test whether the newborns in your sample come from a population in which the average birth weight is 3000 grams. The value 3000 grams is a known constant. You're not estimating it from a sample. You are, however, estimating the standard deviation from the sample.

• The two independent sample t-test is used to test the null hypothesis that two independent samples come from populations with the same mean. There are two slightly different versions depending on whether you assume that the variances are the same in the two populations. For example, in the HIV study, the two independent sample t-test is used to determine whether, in the population, mothers with and without HIV have the same average weights and body mass indices, and whether their infants have the same average weights and lengths.

• The paired samples (matched-cases) t-test is used to test the null hypothesis that, in the population, two means are equal when you have measurements from pairs of people or objects that are similar in some important way. For example, you observed the same person before and after a treatment, or you obtain measurements from spouses or twins. The reason for pairing observations is to make the two groups being compared more similar by eliminating some of the differences between subjects. For example, you ask each person about their ITN use before and after information about their importance is distributed. Since the same person is asked the questions before and after you know the before and after responses come from people with identical values for variables like education or age that may affect the responses.

If the variables on which the cases are matched aren't related to the variable you are studying, then analyzing the data with a paired test will actually make it harder for you to detect true differences than if you analyzed the data with a two independent samples t-test since the degrees of freedom are smaller for a paired design.

If you observe the same subject under two conditions make sure that the effect of one treatment has worn off before the other is given. If you're giving the same task or test to people under

different conditions, make sure that any improvement shown is not due to a learning effect—doing something better the second time because you've had practice.

Assumptions: If the sample sizes are small (say fewer than 30 cases) the distributions of the variables in the population should be approximately normal. If sample sizes are reasonably large, don't worry about normality.

How it's calculated: Find the difference between the two means. Calculate the standard error of the difference, which is a measure of how much you expect the difference between two sample means, based on the same numbers of cases from the same populations, to vary. Divide the difference between the two means by the standard error of the difference. The standard error of the difference is calculated differently depending on the type of t-test.

observed difference betweenmeans t (degrees of freedom)= estimated standard error of the difference

Find the observed significance level--the probability of a t-value at least as large, in absolute value, as the one you observed when the null hypothesis is true.

One Sample t-test Example

It's not unusual to wonder if a sample comes from a population with a known mean. For example, you may want to know if the average weight of newborns of HIV negative mothers in your district is 3000 grams, a national standard. The national standard is a known constant, you're not estimating it from your data.

Figure 5.3 is R output for a one sample t-test of the null hypothesis that the sample of infants of HIV negative mothers comes from a population with an average weight of 3000 grams. The command used to generate the output in R is highlighted. (You have to enter all of the observed values into R and give them a name. The birth weights are stored in a variable called bwNoHIV.) In the output the alternative hypothesis is identified as “true mean is not equal to 3000”, since mu=3000 is specified in the R command as the true value you want to test against. You know that the null hypothesis is the opposite of the alternative hypothesis—so the null hypothesis is that the true mean is equal to 3000 grams.

Figure 5.3: One Sample t-test output from R

> t.test(bwNoHIV, mu=3000)

        One Sample t-test

data:  bwNoHIV
t = 1.3958, df = 49, p-value = 0.1691
alternative hypothesis: true mean is not equal to 3000
95 percent confidence interval:
 2991.870 3045.105
sample estimates:
mean of x
 3018.488

From Figure 5.3, the value of the t-statistic is 1.40. The degrees of freedom (df=49) used to calculate the p-value from the t-distribution are the number of cases (50) minus 1. The probability of observing a t-value as large as 1.40 in absolute value, when the null hypothesis is true, is 0.1691. Since it's not less than the usual cutoff of 0.05, you cannot reject the null hypothesis that your infants come from a population with a mean of 3000 grams. Have you proved that the true average birth weight for the population from which your sample is selected is 3000 grams? Of course not! All you can say is that you don't have enough evidence to dismiss the null hypothesis.

The average observed weight is 3018.49 grams. The 95 percent confidence interval for the true average weight in the population is from 2991.87 grams to 3045.11 grams. Any value in this interval is a plausible value for the true population mean weight. The interval includes the value 3000 grams, so it cannot be excluded as a plausible value.

Exercise: What displays would be useful for describing the results of this study? How would you change the R command if you wanted to test that the true average weight is 2900 grams?

Two Independent Samples t-test Example

In the one sample t-test example, you analyzed a single set of numbers—a sample of the birth weights of newborns in your district whose mothers didn't have HIV.

If you have two independent samples of infants, some born to HIV positive mothers and others to HIV negative mothers, and want to test whether the true average birth weights are equal, you use the two independent samples t-test. That's the test that was used to analyze the differences in anthropometric measures between the two groups in the study described at the beginning of this chapter.

Figure 5.4 is R output from the two independent samples t-test. (R output can be very sparse in comparison to some other software packages. You can, however, get all kinds of additional statistics and plots by asking for them.) The t-test is identified as the Welch Two Sample t-test. The Welch t-test doesn't require any assumptions about the equality of the variances in the two populations. There is also a version of the two sample t-test that assumes that the variances in the two populations are equal. If you're not sure which one to use, just go ahead and use the Welch test.

Figure 5.4: Two Independent Samples t-test output from R

> t.test(bwHIV, bwNoHIV)

        Welch Two Sample t-test

data:  bwHIV and bwNoHIV
t = -10.3448, df = 958.354, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -84.52972 -57.57255
sample estimates:
mean of x mean of y
 2810.345  2881.396

The t-value is -10.34. The degrees of freedom are very large because there are 520 infants in the No HIV group and 450 cases in the HIV group. The p-value is very small so it is given in scientific notation. The value shown is 2.2 but the decimal must be moved 16 places to the left, making it a very small number (0.00000000000000022). The observed significance level is never 0, but software output sometimes shows values of 0, perhaps written as 0.0000 or the like. Sometimes you can click on a cell to see the value to more decimal places. If you're fortunate enough to obtain such a small observed significance level, don't submit the p-value to a journal with all of those zeroes. Just say it is less than the smallest number the software displays, say 0.00005 in this example. Based on the observed significance level you handily reject the null hypothesis.

The 95% confidence interval for the true difference in means is from -85 grams to -58 grams. Any value within this interval is a plausible value for the true difference between the two groups. On average, infants of HIV positive mothers may weigh anywhere from 58 to 85 grams less than infants of mothers without HIV. You have to pay attention to which sample mean is subtracted from which. In the t-test command the birth weights of the HIV infants are listed first, so the second mean (for the HIV negative infants) is subtracted from the first. The negative sign tells you that infants of HIV positive women weigh less, on average, than infants of HIV negative women.
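As noted above, there is also a pooled version of the two sample t-test that assumes equal variances in the two populations. If you do want that version, R's t.test has a var.equal argument; a one-line sketch using the same birth weight vectors as in Figure 5.4:

# Equal-variance (pooled) two-sample t-test
t.test(bwHIV, bwNoHIV, var.equal = TRUE)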

Paired Samples t-test Example

To study the effect of HIV on neonatal birth weight, Floridia et al. (2008) matched each of 600 infants born to HIV positive mothers to a control baby of the same gestational age and gender but with an HIV negative mother. This was done to minimize the effect of differences in gestational age and gender on birth weights. Their goal was to determine whether the true average birth weight differs between the HIV cases and controls.

You should use a paired t-test to analyze the results of this study. Although you have two samples, they are not independent; the infants are matched on the basis of gestational age and gender. You know that birth weight is related to both gestational age and gender, so it is reasonable to create pairs on the basis of these variables.

Figure 5.5 is the output from the R procedure when observations are paired. The t-value is -3.08. The observed significance level is quite small, p-value = 0.002, so you can reject the null hypothesis that the true average weights are the same for the two groups. Infants of the HIV positive mothers weigh less. The observed average difference in weight between the HIV infants and controls is -93.62 grams. The 95% confidence interval is from -153 grams to -34 grams. The interval is wide but does not include 0. The true population difference may be anywhere between 34 and 153 grams.

Figure 5.5: R output for Paired Samples t-test

> t.test(HIVweight, Controlweight, paired=TRUE)

        Paired t-test

data:  HIVweight and Controlweight
t = -3.0825, df = 599, p-value = 0.002147
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -153.27497 -33.97374
sample estimates:
mean of the differences
              -93.62435

You can also analyze paired data by computing the difference between the values of the two variables for each case and then using a one sample t-test to test that the true mean difference is 0. The results are identical.
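In R that equivalence is a one-line check, using the same HIVweight and Controlweight vectors as in Figure 5.5:

# One sample t-test on the within-pair differences; gives the same t, df, and p-value
t.test(HIVweight - Controlweight, mu = 0)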

Analysis of Variance: Testing that Several Population Means Are Equal

The independent samples t-test is limited to testing hypotheses about two population means. Another statistical test, called analysis of variance (ANOVA), lets you test the null hypothesis that more than two population means are equal. For example, you have five different treatments and want to test whether the true final average diastolic blood pressure is the same for all treatments. Instead of using a t-statistic, you calculate what's called an F-statistic and use the F-distribution to determine the observed significance level. (If you have two independent groups the results will be the same whether you use a t-test that assumes equal variances in the two groups or the analysis of variance.)

Analysis of variance can also be used when cases are classified into groups based on several grouping variables. For example, subjects are classified by both gender and diagnosis. You can test the null hypotheses that the true average diastolic blood pressure is the same for all diagnoses, that the true average values are the same for males and females, and that for each diagnosis, males and females respond in the same way. That is, there is no interaction between the two variables.

In many situations multiple measurements are obtained from the same individual, say daily readings of blood pressure. Since multiple observations from the same individual are not independent, you need a modification of the usual analysis of variance called Repeated Measures ANOVA. Detailed discussion of advanced statistical techniques is beyond the scope of this course but it's important for you to recognize them in the literature and to understand the basic idea of what's being done.

Assumptions for ANOVA

Analysis of variance requires independent random samples from normal populations with equal variances. If the sample sizes in the groups are approximately equal, unequal variances don't matter much. There are other statistical tests for equality of means, such as the Welch and Brown-Forsythe tests, that don't require the assumption of equal variances. There are also nonparametric tests that don't require the assumption of normality.

Calculating an ANOVA test

The F-test is calculated by finding the ratio of two estimates of variability: how much observations within a group vary, and how much the group means vary. If the null hypothesis that all population means are equal is true, these two estimates should be similar. If they're not, that's evidence against the null hypothesis that all population means are equal. If the value of the F-statistic is large, the population means vary more than you expect if the null hypothesis is true.
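In R, one-way ANOVA is usually run with the aov function. The sketch below uses a hypothetical data frame called births with a numeric column birthweight and a grouping factor infection_group; these names are invented for illustration and are not the data from the study discussed next.

# Minimal sketch of a one-way ANOVA in R (hypothetical data frame and column names)
fit <- aov(birthweight ~ infection_group, data = births)
summary(fit)   # F statistic, degrees of freedom, and p-value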

One Way Analysis of Variance Example

Laar et al. (2010) studied 1154 pregnant women, 443 who were HIV positive and 711 who were HIV negative at recruitment. Malaria at recruitment (r) and at delivery (d) was also recorded. Four groups of women, based on HIV status and presence or absence of malaria, were of interest:

1. HIV-Mal[r-d-]: HIV negative, no malaria. This is the reference group.
2. HIV-Mal[r+d-]: HIV negative, malaria at recruitment but not at delivery.
3. HIV+Mal[r-d+]: HIV positive, malaria only at delivery.
4. HIV+Mal[r+d+]: HIV positive, malaria both at recruitment and at delivery.

Analysis of variance (ANOVA) was used to test for differences in mean birth weight and gestation length across the four maternal infection categories. Figure 5.6 shows the average birth weights for the four groups. Note that the vertical axis doesn't start at 0, so the differences between the groups appear larger than they would if the scale started at 0. The lines at the ends of the bars represent confidence intervals for the true mean for each group. The number in the box, labeled md, is the average difference in birth weight compared to the disease-free reference group. The footnote shows an F-value of 11.24 with degrees of freedom of 3 and 722. The p-value is less than 0.001, so the null hypothesis that the true mean birth weight is the same in all groups is rejected.

Figure 5.6: Maternal Infection Status and Mean Birthweight

Multiple Comparison Procedures

When the ANOVA null hypothesis that all population means are equal is rejected, the question of which groups are significantly different from each other remains. If you perform multiple t-tests between all possible pairs of means, you encounter the problem that the more comparisons you make, the more likely you are to call differences significant even when they are not. Multiple comparison procedures, sometimes called post hoc procedures, protect you from erroneously calling differences significant when they are not. There are numerous multiple comparison procedures, all with slightly different statistical properties and names. However, they all require larger differences between pairs of means before a difference is deemed significant than does the t-test. The result is that you are less likely to reject the null hypothesis when it is true, but you are also less likely to find true differences.
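Multiple comparison procedures are also built into R. A sketch using the invented data from the earlier ANOVA example: TukeyHSD() gives Tukey's honestly significant difference for every pair of means, and pairwise.t.test() applies a p-value adjustment (here Hochberg's step-up method, which is related to but not the same as the GT2 procedure some papers report) to ordinary pairwise t-tests.

# Pairwise comparisons after a one-way ANOVA (invented data from the earlier sketch).
dbp   <- c(88, 92, 85, 90, 79, 81, 84, 78, 95, 99, 91, 94)
group <- factor(rep(c("A", "B", "C"), each = 4))
fit   <- aov(dbp ~ group)

TukeyHSD(fit)   # Tukey's honestly significant difference for every pair of means

# Pairwise t-tests with a Hochberg step-up p-value adjustment
pairwise.t.test(dbp, group, p.adjust.method = "hochberg")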

In the footnote to Figure 5.6 the authors indicate that the Hochberg GT2 test is used to identify pairs of means that are significantly different from the reference group. The observed significance level is also given in the footnote. The average difference between the disease-free reference group and the HIV negative, malaria at recruitment only group is 0.123 kg. Based on the GT2 test the observed significance level for the difference is p<0.05, so you can reject the hypothesis that the true means are equal. Similarly, there is a 0.195 kg difference between the reference group and the HIV positive, malaria only at delivery group. The observed significance level is given as p<0.016.

Exercise: What's the difference in means between the reference group and the group that is positive for HIV and malaria at both recruitment and delivery? What is the observed significance level for the difference? What is the null hypothesis? Can you reject it?

Testing Hypotheses About Count Data

All of the statistical procedures described so far in this chapter are for testing hypotheses about population means. Now we'll take a look at statistical methods for testing hypotheses about counts.

Calculating Percentages

Well-planned tables and plots of counts and percentages are a critical component of data presentation. Figure 5.7 is a table, sometimes known as a crosstabulation, of the data presented by Laar et al. Instead of analyzing actual birth weights in kilograms as you did previously, you're now looking at the counts of infants that meet the criteria for being low birth weight. HIV status forms the rows of the table, birth weight status forms the columns. The numbers in parentheses are called row percentages. They tell you what percentage of the observations in each row fall into each cell in the row. For example, 22.4 percent (66/295) of HIV positive women delivered low birth weight infants and 14.2 percent (66/466) of HIV negative women delivered low birth weight infants. Overall, 17.35% of the infants were low birth weight.

Figure 5.7: Two way table of Birth Weight and Maternal HIV Status

                        Low Birth Weight Infant
HIV Status      Yes (%)          No (%)           Total (%)
Positive        66 (22.4%)       229 (77.6%)      295 (100%)
Negative        66 (14.2%)       400 (85.8%)      466 (100%)
Total           132 (17.35%)     629 (82.65%)     761 (100%)

You can also calculate column percentages that express each cell count as the percent of its column total. For example, of the 132 low birth weight infants, 50% (66) were born to HIV positive women and 50% to HIV negative women. What does this percentage mean? Very little. You can't interpret the column percent without knowing how many HIV positive and HIV negative women are included in the study. If there are equal numbers of HIV positive and negative women in the study, you can conclude that the rate of low birth weight infants is the same in the two groups. However, if there are not equal numbers of women in the two groups, you can't draw any conclusions based only on these percentages. They cannot be interpreted correctly without knowing the proportion of HIV positive women in the study.

Whenever you display percentages in a table, make sure they are easy to interpret. If one of your variables can be thought of as a predictor variable that has an effect on the values of the dependent variable, compute percentages so that they add up to 100% for each category of the predictor variable. In this example, low birth weight depends on maternal HIV status so it's the dependent variable. HIV status is the independent variable. You want to report the percent of low birth weight babies separately for each HIV status.
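Row percentages like those in Figure 5.7 are easy to reproduce. A sketch in R using the published counts; the matrix layout below is my own, not the authors'.

# Row percentages for the counts in Figure 5.7.
tab <- matrix(c(66, 66, 229, 400), nrow = 2,
              dimnames = list(HIV = c("Positive", "Negative"),
                              LBW = c("Yes", "No")))
round(100 * prop.table(tab, margin = 1), 1)   # percentages that sum to 100% within each row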

Testing Hypotheses about a Single Population Proportion

You see in Figure 5.7 that 22.4% of HIV positive women gave birth to LBW infants. If on the basis of large-scale studies 12% of infants are known to be of LBW, you may want to test the null hypothesis that your sample of HIV positive women comes from that population. Another way of phrasing the hypothesis is that the HIV positive women in your sample come from a population with a 12% rate of LBW.

Figure 5.8 shows results from the OpenEpi calculator for the test of a binomial proportion (www.openepi.com/v37/Proportion/Proportion.htm). We are using OpenEpi since you can access it without having to install the program on your computer. You may prefer to download the free and more complete software called EpiInfo (wwwn.cdc.gov/epiinfo), especially if you are analyzing your own data.

The first line of Figure 5.8 reports the observed proportion as 66/295. It's very important to check that you correctly entered numbers into a computer program and that the test results are based on them.

The results of the test of the null hypothesis that your sample comes from a population with a low birth weight rate of 12% are shown in red at the bottom. Since the sample size is large, the Normal-Theory Method is used to calculate the observed significance level. If you have a small sample, the observed significance level is calculated exactly from the binomial distribution. You see that the observed significance level is very small and you can reject the null hypothesis that the sample comes from a population with a low birth weight rate of 12%.
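The same one-sample test can be run in base R. A sketch using the counts from Figure 5.7: binom.test() gives the exact binomial result appropriate for small samples, and prop.test() gives the large-sample, normal-theory version.

# Test that the true LBW proportion is 12%, based on 66 LBW infants among 295 HIV positive mothers.
binom.test(66, 295, p = 0.12)                   # exact test from the binomial distribution
prop.test(66, 295, p = 0.12, correct = FALSE)   # large-sample (normal-theory) test

Without the continuity correction, the confidence interval printed by prop.test() is based on the score method, which should correspond to the Score (Wilson) interval marked as preferred in the OpenEpi output.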

Figure 5.8: One Sample Test of Binomial Proportion

Many different methods for calculating the confidence interval for the true population proportion are also shown. Whenever you use OpenEpi and there are many ways to calculate test results or confidence intervals, one of the results has an asterisk indicating that the editors think it's the best method. Go with it. In this case the 95% confidence interval for the true proportion of LBW infants for mothers who are HIV positive is from 17.99% to 27.47%, based on the preferred Score (Wilson) method. The value of 12% is not included in the interval. Any value not in the 95% interval can be excluded as a plausible value, using 5% as the cutoff for unlikely.

Testing that Two or More Population Proportions Are Equal

When you test the null hypothesis that two population means are equal, you compute the t-statistic and then, from the t distribution, calculate how unusual the observed t-value is if the null hypothesis is true. To test hypotheses about data that are counts, you compute what's called a chi-square statistic and compare its value to the chi-square distribution to see how unlikely the observed value is if the null hypothesis is true. The most common null hypothesis about proportions, which are just means of binary variables where 0 and 1 are used for the possible outcomes, is that two or more population proportions are equal. You can use a chi-square test to test the null hypothesis that the true proportion of LBW infants is the same for HIV positive and HIV negative mothers. Another way of stating this is that you test whether maternal HIV status and infant birthweight are independent. Two variables are independent if knowing the value of one variable doesn't tell you anything about the value of the other variable.

Calculating the Chi-Square Test

The chi-square test is based on comparing observed and expected cell counts. The expected cell counts are the counts that you expect if the null hypothesis is true. Of course, even if the null hypothesis is true, the observed and expected values won't be identical, since the results you observe in a sample vary around the true population value. You want to determine whether the differences between the observed and expected counts are unusually large if the null hypothesis is true.

In Figure 5.7 you see that a total of 132 infants are of low birth weight. That's 17.35% of all infants. If the null hypothesis is true, 17.35% of infants born to HIV positive mothers and 17.35% of infants born to HIV negative mothers should be LBW. That means there should be 51.17 LBW infants for the 295 HIV positive mothers (295 times 0.17346), and 80.83 LBW infants for the 466 HIV negative mothers (466 times 0.17346).
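The expected counts follow the same rule for every cell: row total times column total divided by the grand total. A sketch in R that reproduces the values shown in Figure 5.9 from the observed counts:

# Expected counts under independence for the table in Figure 5.7.
tab <- matrix(c(66, 66, 229, 400), nrow = 2,
              dimnames = list(HIV = c("Positive", "Negative"),
                              LBW = c("Yes", "No")))
outer(rowSums(tab), colSums(tab)) / sum(tab)   # row total x column total / grand total
chisq.test(tab, correct = FALSE)$expected      # the same values, extracted from chisq.test()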

Figure 5.9: Observed and Expected Values

                          Low Birth Weight Infant
HIV Status    Yes: Obs (Expected)    No: Obs (Expected)    Total: Obs (Expected)
Positive      66 (E=51.17)           229 (E=243.83)        295 (E=295)
Negative      66 (E=80.83)           400 (E=385.17)        466 (E=466)
Total         132 (E=132)            629 (E=629)           761 (E=761)

The chi-square statistic is computed as

    χ² = Σ (observed − expected)² / expected

where the sum is over all cells in the table, not including the total row and column.

In this example,

    χ² = (66 − 51.17)²/51.17 + (229 − 243.83)²/243.83 + (66 − 80.83)²/80.83 + (400 − 385.17)²/385.17 = 8.492

For each cell in the table you can calculate its residual, the difference between the observed and expected counts. These residuals can be standardized in various ways to remove the effects of different sample sizes in each cell. By examining these residuals you can determine which cells have the largest discrepancies between the observed and the expected counts.

Figure 5.10 contains an assortment of tests from OpenEpi used to test the hypothesis that HIV status and LBW are independent, based on the data shown in Figure 5.7. Statisticians debate the merits of different tests, especially for small samples. The most straightforward approach is to use the uncorrected chi-square value of 8.492 and the corresponding two-tailed p-value of 0.003566.
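The uncorrected chi-square value can be reproduced in base R; a sketch with the counts from Figure 5.7. The standardized residuals mentioned above are available from the same object.

# Pearson chi-square test of independence for the 2 by 2 table in Figure 5.7.
tab <- matrix(c(66, 66, 229, 400), nrow = 2,
              dimnames = list(HIV = c("Positive", "Negative"),
                              LBW = c("Yes", "No")))
test <- chisq.test(tab, correct = FALSE)
test            # X-squared should be about 8.49 on 1 degree of freedom, p about 0.0036
test$stdres     # standardized residuals: which cells differ most from expectation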

Figure 5.10: Chi-square Test and Measures of Association

You can reject the null hypothesis that the true proportion of LBW infants is the same for mothers with and without HIV. Low birth weight and maternal HIV status do not appear to be independent. Below the table you see a note that tells you it's OK to use the chi-square test because the expected values in all cells of the table are greater than or equal to 5.

The value of the chi-square statistic doesn't tell you anything about the strength of the relationship between two variables since its value depends on the sample size. In the section Measuring Association you'll return to this table and calculate statistics which quantify the strength of the relationship between the two variables.

Assumptions for the Chi-Square Test

The chi-square test requires that all of the observations be independent. Each person contributes only one count to a table. For example, if you're comparing treatments you can't give two different drugs to the same person and then include them in the counts for both treatments. Another requirement for the chi-square test is that most of the expected counts be greater than 5 and none less than 1. That's why in Figure 5.10 you see the note that all of the cells have expected values greater than 5 and it's OK to use the test. The software doesn't have any way of detecting whether your observations are independent. You're responsible for that.

Exercise: Chi-square tests are often used to check whether two or more groups are comparable on a set of characteristics. For example, you may want to see whether the people who refused to participate in a study are different in important ways from those who agreed to participate, or to determine whether the people who drop out of a study are different from those who remain. Figure 5.11 is a comparison of women who remained in the Laar study all the way to delivery with women who were lost to follow-up. Why do you think such a comparison is important? The P value is the observed significance level from a chi-square test. What is the null hypothesis that is tested? For which variables do the two groups differ? Do you think any of these differences are potentially important? Can you think of any problems with the use of the chi-square statistic for this purpose? (Hint: consider the effect of sample size. For large studies, is it possible that even small differences between groups will have small p-values?)

Figure 5.11: Comparison of women who remained in the study and those lost to follow-up

Testing for Independence in Larger Tables

You can use the chi-square statistic to test the null hypothesis that any number of rows and columns of a table are independent. The idea is the same as for a table with two rows: you compute expected counts based on the assumption of independence and then you compare the observed and expected counts.

In a paper studying the relationship between knowledge and health-seeking behavior, Karunamoorthi and Kumera (2010) interviewed 228 Ethiopians about the measures they take to prevent malaria.

Figure 5.12 is a table from the paper showing the relationship between the educational level of the respondents and the primary type of preventive measures observed. To test whether the two variables are independent the authors used the chi-square test. Their calculated chi-square value is 58.7, with 16 degrees of freedom and an observed significance level of p<0.001. (For a table, the degrees of freedom for the chi-square statistic is the number of rows minus 1 times the number of columns minus 1. Since there are five rows and five columns, the degrees of freedom are (5 - 1)(5 - 1) = 16.) The authors reject the null hypothesis that education and type of preventive measure used are independent.

Figure 5.12: Preventive Measures and Educational Level

Exercise: Compute the appropriate row or column percents for the table. Explain your choice. Calculate the expected number of cases in the cell that corresponds to illiterates who use mosquito nets. (Multiply the total number of cases who are illiterate by the total number of cases who use mosquito nets and then divide by the total sample size.) What is the difference between the observed count for the cell and the expected count?

Figure 5.13 shows the results from the OpenEpi software when the data from Figure 5.12 are entered into the calculator. The first item to notice is the warning that the table violates the required assumptions. Most of the expected cell counts are not greater than 5, and 2 cells have expected values less than 1. The rule of thumb is that most cells should have expected values greater than 5 and no cells should have expected values less than 1.

Figure 5.13: Chi-square results from OpenEpi software

If the assumptions are violated, the observed significance level may be wrong. In particular, small expected counts may cause the chi-square statistic to be too big. The next item to notice is the value of the chi-square statistic reported by OpenEpi. It's quite different from that reported by the authors. Both OpenEpi and the authors report the same degrees of freedom, so the size of the table that's being analyzed is the same. Remember, just because it's published doesn't mean it's correct!

Exercise: If a respondent could choose more than one preventive measure for protecting themselves from malaria, could the data still be analyzed using a chi-square test for independence? What would happen to the sample size in the table if the same person could appear several times? What do you suggest the authors do so the table can be analyzed using the chi-square test?

Measuring Association in 2 by 2 Tables

You used the chi-square statistic to test the null hypothesis that the proportion of low birth weight infants is the same for women with and without HIV. Based on the chi-square value you rejected the null hypothesis. Can you conclude anything about the strength of the relationship between HIV status and low birth weight infants based on the chi-square value? Does a big chi-square value mean that the relationship is strong? The answer is no. The value of the chi-square statistic depends not only on the magnitude of the observed differences but also on the sample size in your table. If you multiply all cell entries in a table by 2, you double the value of the chi-square statistic. The relationship between the two variables remains the same but the value of the chi-square statistic doesn't.

In medicine, quantifying the strength of the relationship between a dichotomous predictor variable (risk factor) and an outcome, such as death, recurrence, or low birth weight, is very important. It's not enough to say that HIV significantly increases the chances of a low birth weight baby. You want to answer the question: by how much? Before you can answer that question you have to consider how the data were obtained.

Data often originate from one of these three different experimental designs:

• A cohort study in which you follow two groups (one with the risk factor and the other without) and record how often the event of interest occurs in each group. For example, you take samples of pregnant HIV positive and HIV negative women and follow the two groups until delivery. You then calculate the proportion of low birth weight infants in each group.

• A cross-sectional study in which you take a random sample of individuals and count how many fall into each of the four cells of a two-way table. For example, you follow all pregnant women coming to your clinic and record whether they have HIV and whether they delivered a low birth weight infant.

• A case-control study in which you examine a group of individuals who have experienced the event of interest and a group who have not. For each of the individuals, you record whether the risk factor was present or not. For example, you take a sample of low birth weight infants and a sample of normal birth weight infants and record how many in each group had mothers with HIV.

Calculating the Relative Risk Ratio

In a cohort study or a cross-sectional study you can compute the incidence rate for the event of interest for cases with and without the risk factor. That's not possible in a case-control study. Consider Figure 5.14, which is the same table you analyzed before. The Laar study was a cohort study which observed two groups of pregnant women, those with HIV and those without.

Figure 5.14: HIV Status and Low Birth Weight

                              Low Birth Weight Infant (event)
HIV Status (risk factor)   Yes: N (%)      No: N (%)       Total: N (%)
Positive                   66 (22.4%)      229 (77.6%)     295 (100%)
Negative                   66 (14.2%)      400 (85.8%)     466 (100%)
Total                      132             629             761

The incidence of LBW infants among HIV positive women is 22.4%. The incidence among HIV negative women is 14.2%. The relative risk ratio is the ratio of the two incidence rates:

    Relative Risk Ratio = (incidence of the event for the group with the risk factor) / (incidence of the event for the group without the risk factor)

    RR = 22.4 / 14.2 = 1.58

Women who are HIV positive are almost 1.6 times as likely to have a low birth weight infant as women without HIV. To put it another way, the risk of a low birth weight infant is 58% higher in women with HIV as compared to women without HIV.

Unlike the chi square statistic whose value depends on the sample size, the relative risk ratio is easy to interpret and can be compared across factors and different studies:

• If the relative risk ratio is 1, the risk factor and the outcome are independent.

• If the relative risk ratio is greater than 1, people with the risk factor are more likely to experience the event than people without the risk factor.

• If the relative risk ratio is less than 1, people with the risk factor are less likely to experience the event than people without the risk factor.

Figure 5.15 is output from OpenEpi for risk based estimates, together with their confidence intervals. You see that the risk ratio is 1.58 with a 95% confidence interval of 1.16 to 2.15. Since the value of 1 is not included in the interval, you can reject the null hypothesis that the true value is 1 (no association). It appears that HIV positive women are more likely to have low birth weight infants than women without HIV.
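The risk ratio and the usual confidence interval based on the standard error of log(RR) can be sketched directly from the counts in Figure 5.14; the 1.96 multiplier gives a 95% interval, and the result should agree with the OpenEpi output to rounding.

# Relative risk ratio and approximate 95% CI from the counts in Figure 5.14.
lbw_pos <- 66; other_pos <- 229   # HIV positive mothers: LBW yes / no
lbw_neg <- 66; other_neg <- 400   # HIV negative mothers: LBW yes / no

risk_pos <- lbw_pos / (lbw_pos + other_pos)   # incidence with the risk factor (about 0.224)
risk_neg <- lbw_neg / (lbw_neg + other_neg)   # incidence without the risk factor (about 0.142)
rr <- risk_pos / risk_neg                     # about 1.58

se_log_rr <- sqrt(1/lbw_pos - 1/(lbw_pos + other_pos) +
                  1/lbw_neg - 1/(lbw_neg + other_neg))
exp(log(rr) + c(-1.96, 1.96) * se_log_rr)     # roughly 1.16 to 2.15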

Figure 5.15: Risk-Based Estimates from OpenEpi

The relative risk ratio doesn't tell you anything about the actual rates. The relative risk ratio is 2 regardless of whether the underlying rates are 40% and 20% or 2% and 1%. That's why it's useful to look at the actual difference between the rates. The difference in low birth weight rates between the two groups in this example, 8.21%, is labeled Risk Difference in Figure 5.15. The 95% confidence interval for the true difference is (2.5 to 13.9). The interval does not include 0, so you can reject the null hypothesis that the true difference in risk is 0. Testing that the risk difference is 0 is the same as testing that the relative risk ratio is 1.

Calculating the Odds Ratio

If your data are from a case-control study, you can't calculate the relative risk ratio because you can't calculate the probability that someone with and without the risk factor experiences the event. If Laar had taken a sample of low birth weight babies and a sample of normal birth weight babies and counted the number of HIV positive mothers in each group, he couldn't calculate the incidence rate of LBW since he decided how many cases of low birth weight to include in the study.

For a case-control study, the odds ratio is used to measure the relationship between the event and the risk factor. The odds ratio is the ratio of two odds: the odds that a case has the risk factor and the odds that the control has the risk factor. (The odds is the number of cases with the risk factor divided by the number of cases without the risk factor.) For this example:

• The odds that a LBW infant has a mother with HIV are 66/66 = 1. If you choose a LBW infant, based on Figure 5.14, it's just as likely to have a mother with HIV as a mother without HIV.

• The odds that a normal weight infant has a mother with HIV are 229/400 = 0.573.

• The odds ratio is 1/0.573 = 1.747.

For a cohort or longitudinal study you can compute both an odds ratio and a relative risk ratio.

Several versions of the odds ratio and its confidence limits, based on minor differences in statistical approaches, are shown in Figure 5.16. The 95% confidence interval for the odds ratio ranges from 1.197 to 2.549 and does not include the value of 1, so you can reject the null hypothesis that the true value is 1. There appears to be an association between HIV status and LBW. The odds of a LBW infant are 1.75 times greater for an HIV positive woman than the odds for an HIV negative woman. Sometimes Cochran's test is used to determine the observed significance level for the test that an odds ratio is 1. For a two by two table, Cochran's test of conditional independence is just the uncorrected chi-square value in Figure 5.10.
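The odds ratio and the usual log-based 95% confidence interval can also be computed by hand from the four cell counts; a sketch that should agree with Figure 5.16 to rounding:

# Odds ratio and approximate 95% CI from the counts in Figure 5.14.
or <- (66 * 400) / (229 * 66)                    # cross-product ratio, about 1.75
se_log_or <- sqrt(1/66 + 1/229 + 1/66 + 1/400)   # standard error of log(OR)
exp(log(or) + c(-1.96, 1.96) * se_log_or)        # roughly 1.20 to 2.55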

Figure 5.16: Odds-Based Estimates from OpenEpi

If the event of interest occurs infrequently (less than 10% of the time) and if the total sample size is large, the odds ratio from a case control study can be used as an estimate of relative risk. However, odds ratios and relative risk ratios can differ a lot when the event is common. The odds ratio will be larger than the relative risk ratio if the ratio is larger than 1 and smaller if the ratio is less than 1. In this example, low birth weight is not a rare event so the odds ratio is larger than the relative risk ratio. As expected, both indicate that low birth weight is related to maternal HIV status. The odds ratio is more difficult to interpret than the relative risk ratio since it can't be expressed in terms of probabilities.

Exercise: In her abstract Ezeaka states:

There were three times more LBW babies in the HIV-positive group than in the uninfected mothers (odds ratio = 3.47, 95% confidence interval = 1.69, 7.27; chi(2) = 12.99, P = 0.0003).

Can you think of reasons why her odds ratio for HIV is larger than that found by analyzing the Laar data? Remember that Laar did not exclude women who had malaria at recruitment. What groups would you analyze in the Laar data to estimate the effect of HIV status alone?

Exercise: Read the abstract of the Laar paper given below.

Preterm delivery and low birth weight among neonates born to HIV-positive and HIV-negative Ghanaian women
Laar A. K., Ampofo W., Tuakli J. M., Norgbe G. K. and Quakyi I. A.

In sub-Saharan Africa, several hundreds of pregnancies are exposed to both malaria and HIV infections annually. Adverse perinatal outcomes as a result of these infections include preterm delivery (PTD), and low birth weight (LBW). These are not well characterized in Ghana. We determined whether malaria and HIV infections during pregnancy increase the risk of delivering a preterm or a LBW neonate. We enrolled 1,154 women at their first antenatal visit (443 HIV-positive and 711 HIV-negative), and prospectively collected data at delivery on 761 mother-infant pairs. Malaria parasitemia status, HIV status, hemoglobin concentration, and CD4+ cell count were determined using standard methods. We observed a significantly increased risk of LBW among HIV positive women with malaria at recruitment, odds ratio (OR) = 4.4, 95% Confidence Interval [CI] (2.3 to 8.4), at delivery, OR = 2.5, 95% CI (1.1 to 3.7). The risk among those who were dually-infected at recruitment and at delivery was more pronounced; OR = 11.3; 95% CI (4.6 to 27.4). Dual infection was also associated with a 4-fold risk of delivering preterm; OR = 3.96; 95% CI (1.8 to 8.5). These findings demonstrate that neonates of HIV-positive women with multiple malaria infections are at particular risk of PTD and LBW in Ghana.

For each odds ratio given in the abstract, describe the two by two table it is based on. That is, what variable forms the rows of the table, what variable forms the columns, and what types of cases are included in the table. For which odds ratios can you reject the null hypothesis that the true value is 1? What do you base your decision on? Why are the confidence intervals widest for the dual infection cases? What type of study is this? Could the authors have calculated the relative risk ratio?

Summary

Statistical procedures are used to estimate observed significance levels for tests of the null hypothesis. The choice of procedure depends on the null hypothesis and on the characteristics of the data. It's important to check whether your data meet the assumptions required for a particular procedure. Otherwise the test results may not be correct. The t-test is used to test the null hypothesis that two population means are equal. The chi-square test is used to test that the rows and columns of a table are independent. The relative risk ratio and the odds ratio quantify the association between a risk factor and an event. A ratio of 1 indicates that there isn't any association between the risk factor and the event. A ratio greater than 1 indicates that people with the risk factor are more likely to experience the event than people without the risk factor. A ratio less than 1 indicates that people with the risk factor are less likely to experience the event than people without the risk factor.

Bibliography

Ezeaka VC, Iroha EO, Akinsulie AO, Temiye EO, Adetifa IM. Anthropometric indices of infants born to HIV-1 infected mothers: a prospective cohort study in Lagos, Nigeria. Int J STD AIDS, 2009; 20(8):545-8. Ezeaka (2009)

Floridia M, Ravizza M, Bucceri A, Lazier L, Vigano A, Alberico S, Guaraldi G, et al. Factors Influencing Gestational Age-Adjusted Birthweight in a National Series of 600 Newborns from Mothers with HIV. HIV Clin Trials, 2008; 9(5):287-97. Floridia (2008)

Laar AK, Ampofo W, Tuakli JM, Norgbe GK, Quakyi IA. Preterm delivery and low birth weight among neonates born to HIV-positive and HIV-negative Ghanaian women. Journal of Public Health and , 2010; 2(9):224-237. Laar (2010)

Karunamoorthi K, Kumera A. Knowledge and health seeking behavior for malaria among the local inhabitants in an endemic area of Ethiopia: implications for control. Health, 2010; 2:575-581. Karunamoorthi (2010)

6 Correlation and Linear Regression

• What is a correlation coefficient?
• What is linear regression?
• What does the slope tell you? The intercept?
• What is a residual?
• How can you tell how well a regression model fits the data?
• What are prediction intervals?

The word “correlation” is a vague term used in everyday conversation to describe some type of relationship between variables. You know that the amount you eat is correlated with the amount that you weigh; that how hard you work in school is correlated with how successful you'll be in life (well, maybe!); and that infant gestational age is related to infant weight.

In statistics, correlation has a precise definition. It's a measure of the strength of the linear relationship between two variables. In this chapter you'll examine relationships between pairs of variables by plotting them and then, if the relationship is linear, you'll calculate the Pearson correlation coefficient as a summary measure. You'll also use linear regression to build a simple model to predict the values of a dependent variable based on the values of a single predictor variable. The next chapter introduces more complicated models.

Plotting Data

The first step in examining the relationship between two variables is to make a scatterplot, such as the one shown in Figure 6.1 for female life expectancy and birth rate in 122 countries. Each point in the plot represents a country. The point for Jordan is labeled and corresponds to a birth rate of 46.7 births per 1000 population (plotted on the horizontal or x axis) and a life expectancy of 73 years (plotted on the vertical or y axis).

You see that the points don't appear to be randomly scattered over the grid. Instead there is a pattern, a negative relationship between the two variables, since birth rate increases as female life expectancy decreases. If you had to describe the relationship between the two variables you might say it's linear, since points loosely scatter around a straight line.

Figure 6.1: Scatterplot of Female Life Expectancy and Birth Rate
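A scatterplot like Figure 6.1 takes one call in R. The sketch below assumes a data frame named mydata with variables birthrat and lifeexpf, the same names that appear in the R output later in this chapter (Figure 6.13).

# Scatterplot of female life expectancy against birth rate (assumes 'mydata' exists).
plot(mydata$birthrat, mydata$lifeexpf,
     xlab = "Births per 1000 population",
     ylab = "Female life expectancy (years)")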

Predictors of Birth Weight

When variables are related to one another, it may be possible to use the values of one variable (the predictor or independent variable) to predict the values of another (the dependent variable). For example, in the absence of scales for weighing infants, easily available anthropometric measurements may serve as substitutes for identifying low birth weight infants who need further care; or maternal measurements during labor may predict birth weight and help to identify women who may need medical intervention during delivery. Usually the dependent variable is plotted on the vertical (y) axis and the independent variable is plotted on the horizontal (x) axis.

Ezeaka et al. (2003) examined the relationship between several infant measurements and birth weight. Figure 6.2 is a plot of infant maximum thigh circumference (MTC) and birth weight. The relationship between birth weight and MTC is said to be positive since as birth weight increases so does MTC. The points cluster closely around a straight line, suggesting that MTC may be a good predictor of birth weight.

Figure 6.2: Plot of Maximum Thigh Circumference and Birth Weight

Pay attention to the handful of points that are far removed from the rest. The individual birth weights and MTC values are not unusual and would not raise suspicion on a histogram. However, when the two values are considered together they're unusual. When analyzing data you should always check that outlying points are not the result of data entry errors. If the points are correct you should investigate whether the infants were in some way different from the rest, such as premature infants. Do not, however, remove the points from your analysis unless they are clearly wrong.

Buchmann et al. (2009) studied the relationship between symphysis-fundal height (SFH) and birthweight. The goal was to develop a simple formula to predict fetal weight at delivery. In Figure 6.3 you again see a positive relationship between SFH and birthweight, with the points clustering along a straight line. However, the points in Figure 6.3 don't cling as tightly to the line as the points do in Figure 6.2.

Figure 6.3: Symphysis-fundal Height and Birth Weight

If you want to compare variables that may serve as predictors of infant birth weight you need to compute a measure which quantifies how strongly each of the predictor variables is related to birth weight.

The Pearson Correlation Coefficient

The Pearson correlation coefficient (usually denoted as r) is a statistic that measures the strength and direction of the linear relationship between two variables. The correlation coefficient does not tell you anything about the cause and effect relationship between the variables. Just because two variables are correlated doesn't mean that one causes the other.

The Pearson correlation coefficient:

• Ranges in value from -1 to +1. The absolute value of the correlation coefficient tells you how strongly the variables are linearly related. A value of either +1 or -1 means that you can perfectly predict the values of one variable from the values of the other. If the correlation coefficient is +1, all points fall on a line with values of both variables increasing together. If the correlation coefficient is -1, all points fall on a line but as values of one variable increase the values of the other variable decrease.

• Is 0 when there is no linear relationship between two variables.

• Is symmetric, since the correlation between birth weight and MTC is the same as the correlation between MTC and birth weight. When computing a correlation coefficient, neither variable is singled out as a dependent variable.

• Stays the same if you add a constant to all values or multiply all the values by a positive constant.

• When squared is the proportion of the variance of one variable that can be “explained” by the other.

Figure 6.4 shows plots, correlation coefficients and summary lines for correlations of different sizes. Remember that it is the absolute value of the correlation coefficient that tells you the strength of the linear relationship. Variables with a positive 0.5 correlation coefficient have as much of a linear relationship as variables with a negative 0.5 correlation coefficient.

Figure 6.4: Scatterplots with Correlation Coefficients

The correlation coefficient is also shown on the first three figures in the chapter. For Figure 6.1 the correlation coefficient (-0.87) is negative since as the birth rate increases female life expectancy decreases. For the other two figures the correlation coefficient is positive since as birth weight increases so do MTC and SFH. The largest correlation coefficient in absolute value (r=0.95) is for birthweight and MTC since the points cluster very tightly along the straight line. The correlation coefficient between birth weight and SFH is smaller (r=0.64).

Warnings About the Correlation Coefficient

The correlation coefficient is one of the most popular statistics because of the mistaken belief that it completely describes the relationship between two variables with a single number between -1 and +1. Unfortunately that's not the case. Consider Figure 6.5 which shows a perfect nonlinear relationship between two variables. The correlation coefficient is small, only 0.30, because the correlation coefficient only measures the strength of the linear relationship. (Compare Figure 6.5 to the plot with the same correlation of 0.3 in Figure 6.4. The relationships between the two variables are very different.) A small value of the correlation coefficient may mean that two variables are not related in any way or that there is a relationship but it is not linear. That's why the first step should always be to plot the variables and see how they are related. It doesn't make sense to talk about how closely points cluster around a straight line if the line isn't a good summary.

Figure 6.5: Strong Nonlinear Relationship

Many different relationships can result in the same correlation coefficient. Figure 6.6 shows four different plots all with a correlation coefficient of 0.82. The correlation coefficient is an appropriate summary only for the first plot, since a straight line is a reasonable summary of the relationship between the two variables. In the second plot, the relationship between the two variables is not linear, so it doesn't make sense to describe how tightly the points cluster around a straight line. In the third plot, you see that the perfect relationship between two variables is distorted by a single point. It is very important to identify individual points that have a large effect on the correlation coefficient (or any other statistic). In the fourth plot there appear to be two groups of cases in which there is no linear relationship between the two variables. If you combine the two groups on the same plot, there appears to be a relationship. The punch line is clear: If you don't plot your data, you can't tell whether a correlation coefficient is an appropriate summary.

Figure 6.6: Scatterplots with r=0.82

The value of a correlation coefficient also depends on the range of values for which observations are taken. It's possible that even if there is a linear relationship between two variables you won't detect it if you consider only a small range of values of the variables. For example, height may be a poor predictor of weight if you restrict your range of heights to those over 1.8 meters.

Transforming to a Linear Relationship

Sometimes a relationship that appears not to be linear, such as the relationship between female life expectancy and GDP in US dollars as shown in Figure 6.7, can be made linear by transforming one or both of the variables. (When you transform a variable you change the scale on which it's measured. For example you take logs or square roots of the data values.)

Figure 6.8 is a plot of female life expectancy against the log of GDP. The relationship between the variables now appears to be linear. You transform variables because it's easier to work with a linear model rather than a more complicated function.

Figure 6.7: Plot of GDP and Female Life Expectancy
Figure 6.8: Plot of log GDP and Female Life Expectancy

Comparing Two Indexes

Often there are several different tests or measuring devices that purport to measure the same quantity. They may differ in accuracy, cost and perhaps even pain inflicted. Or there may be a “gold standard” that is known to be accurate. You may be faced with the task of comparing the results from the two alternative methods to determine if they can be used interchangeably. Even if the correlation coefficient is a perfect +1 that doesn't mean that the two methods are equivalent. The variables may have different means and standard deviations and have a perfect correlation. Two methods are equivalent if the correlation coefficient is 1 and the means and standard deviations are equal. (You can plot the difference between the two measurements against their sum to see how the difference varies over the range of values of the measurement.)

Testing Hypotheses About the Pearson Correlation Coefficient

You can use the correlation coefficient to just describe the observed relationship between two variables in your sample. However, it's more likely that you want to draw conclusions about the population based on the results you've observed in your sample. The symbol for the population correlation coefficient is ρ (rho). The correlation coefficient, like other statistics, has a sampling distribution since different samples from the same population result in different estimated correlation coefficients. To test hypotheses about the population, your data must be an independent random sample from a population and, if the sample size is small, the two variables must jointly have a normal distribution.

Figure 6.9 from Ezeaka (2003) shows sample correlation coefficients and confidence intervals for the population correlation coefficient ρ between MTC and various anthropometric variables. You interpret the confidence interval for a correlation coefficient just as you do for a mean or any other statistic. The computation is different but the meaning of the interval is exactly the same for all statistics. For example, you are 95% confident that the true correlation between birth weight and maximum thigh circumference is between 0.94 and 0.96. If the 95% confidence interval does not include the value 0, you can reject the null hypothesis that the population value for the correlation coefficient is 0. The last column of Figure 6.9, labeled P Value, is the observed significance level for the test of the null hypothesis that the population value is 0. You see that there is a linear relationship between MTC and the other anthropometric measurements. You can reject the null hypothesis that the population coefficient is 0.

Figure 6.9: Correlations with Maximum Thigh Circumference
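In R, cor.test() reports the sample Pearson coefficient together with a confidence interval for ρ and the observed significance level. The vectors below are invented values used only to show the call; the paper's own data are not reproduced here.

# Pearson correlation, 95% CI for rho, and p-value (invented illustrative data).
birthweight <- c(2.9, 3.1, 2.4, 3.6, 3.0, 2.2, 3.4, 2.8, 3.3, 2.6)              # kg
mtc         <- c(14.8, 15.2, 13.9, 16.3, 15.0, 13.2, 15.9, 14.6, 15.7, 14.1)    # cm

cor.test(birthweight, mtc)   # the default method is Pearson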

Exercise: What explanation can you give as to why the confidence intervals for the correlation coefficients are so narrow? Read the abstract of the paper to see if your theory is correct.

Be sure to look at the magnitude of the correlation coefficients as well as the observed significance level. For large sample sizes, even very small correlation coefficients will have small observed significance levels. Statistically significant doesn't mean important or useful.

When using statistical software it's tempting to compute correlation coefficients between many pairs of variables to see if something is statistically significant. Remember that even if the variables are not correlated in the population, you expect to find five significant sample coefficients in every 100 if you use a 5% cutoff for determining statistical significance. If you are examining many coefficients, you need a stricter cutoff to protect you from rejecting the null hypothesis too often when it is true.

Exercise: Elizabeth et al. (2013) conducted a study in Uganda to identify predictors of infant birth weight. Figure 6.10 is a summary of the results:

Figure 6.10: Correlation of anthropometric measures with birth weight

Which variable, based on the correlation coefficient, appears to be the best predictor of birth weight? Compare the correlation coefficients with those from the Ezeaka study. What possible explanations can you think of for the differences between the two sets of correlation coefficients?

Correlation Coefficients Based on Ranks

For small sample sizes, testing hypotheses about the Pearson correlation coefficient requires an assumption of normality. For ordinal data, or for interval data that do not satisfy the normality assumption, you can calculate a correlation coefficient that is based on ranks. For each variable you sort the data values from smallest to largest and then assign ranks, where 1 is given to the smallest number, 2 to the second smallest number, and so on. You replace the actual data values with their ranks and then compute the Pearson correlation coefficient, which then changes its name to the Spearman correlation coefficient.

Like the Pearson correlation coefficient, the Spearman rank correlation ranges between -1 and +1, where -1 and +1 now indicate a perfect linear relationship between the ranks of the two variables. The interpretation is the same, except that the relationship is between ranks, not the actual values. To use the Spearman correlation coefficient, the actual values of the variables don't have to be linearly related but their ranks do. Correlation coefficients calculated from ranks aren't affected as much by outliers as are Pearson correlation coefficients.
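In R you can either rank the values yourself or ask for the Spearman coefficient directly; both give the same answer when there are no ties. The vectors are the same invented values as in the Pearson sketch above.

# Spearman rank correlation (invented illustrative data).
birthweight <- c(2.9, 3.1, 2.4, 3.6, 3.0, 2.2, 3.4, 2.8, 3.3, 2.6)
mtc         <- c(14.8, 15.2, 13.9, 16.3, 15.0, 13.2, 15.9, 14.6, 15.7, 14.1)

cor(rank(birthweight), rank(mtc))                # Pearson correlation of the ranks
cor(birthweight, mtc, method = "spearman")       # same value, requested directly
cor.test(birthweight, mtc, method = "spearman")  # test that the population value is 0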

Exercise: Elshibly et al. (2008) looked at maternal characteristics as predictors of birth weight in Sudanese infants. Their results are below:

Summarize their findings. Explain why the correlation coefficients are so much smaller than those reported by Ezeaka and by Elizabeth. Do you think any of these variables are good predictors of infant birth weight?

Linear Regression Model

Based on the Pearson correlation coefficient, you see that birth weight is linearly related to both maximum thigh circumference and symphysis-fundal height (SFH). Can you use the correlation coefficient to actually predict birth weight from these measurements? Of course not. The correlation coefficient is a single number that just tells you about the strength of the linear association between two variables. There's no way to predict birth weight from MTC if all you know is that the correlation coefficient between the two variables is 0.95. To predict the values of a dependent variable from a predictor variable you must build a model that relates the two variables. You need a model that returns a predicted birth weight when you supply a value of SFH.

A Straight Line Model

Take another look at Figure 6.3. Underneath the plot there's a cryptic statement that says “The solid line represents the linear regression formula produced by the data (y = 301 + 78x).” You may (or may not) recognize the equation as the formula for a straight line. The authors selected this line, based on the observed data, to predict the values of birth weight from SFH. For example, for an SFH of 30, the predicted birth weight is:

    predicted birth weight in grams = 301 + 78 × 30 = 2641 grams

Figure 6.11 is the line that relates birth weight to SFH, the regression line. The intercept is 301. That's the value for birth weight when SFH is 0. It's of no interest in this example since a height of 0 cannot realistically occur. The slope of 78 tells you that for a centimeter change in SFH there is a 78 gram change in birthweight. The diagram shows that for an SFH of 30 cm. the predicted birth weight is 2641 grams (301+78(30)). (The predicted value is the value on the line.) The predicted birth weight for an SFH of 35 cm. is 3031 grams. That's a difference of 390 grams in birthweight for a 5 centimeter change in SFH. For a one centimeter change in SFH the change is 78 grams (390/5), the slope.

Figure 6.11: The Regression Line Y=301+78X

If the slope is positive, as the values of one variable increase, so do the values of the other variable. If the slope is negative, as the values of one variable increase, the values of the other variable decrease. If the slope is large, the line is steep, indicating that a small change in the predictor variable results in a large change in the dependent variable. If the slope is small, there is a more gradual increase or decrease. If the slope is 0, the value of the dependent variable doesn't change with the values of the independent variable. It's the same as having a correlation coefficient of 0.

Unlike the Pearson correlation coefficient, which is the same whether you correlate birth weight and SFH or SFH and birth weight, the values of the slope and intercept of a line depend on which variable is the dependent variable and which is the predictor variable. The values for the slope and intercept will be different if you're predicting birth weight from SFH or SFH from birth weight.

Exercise: What is the predicted birth weight based on an SFH of 33 cm.? Of 37 cm.? In Figure 6.9 the values for the slope are called β. What is the dependent variable for the equations the authors estimated? Can you write the equation to predict MTC from birth weight? What's the slope if you want to predict birth weight from MTC?

Least Squares Regression Line

The line in Figure 6.11 is the same line that you see in Figure 6.3, except that it's not surrounded by all of the data points. A lot of different lines can be drawn through the data points since the data don't fall perfectly on a line. You might wonder why the authors selected the line that they did.

The authors state that they are reporting the “linear regression formula produced by the data.” The line is properly called the least squares regression line. It is the line that has the smallest sum of squared distances from the observed data points to the values predicted by the line.

In Figure 6.12 you see the observed data points marked in red. For each point the dotted line is the distance between the observed weight and that predicted by the regression line. This is called the residual. The line that has the smallest sum of squared residuals is the least squares regression line. For these data the least squares regression line has an intercept of 301 and a slope of 78. Those are the values reported by the authors. Any other line would have a larger sum of squared distances from the points to the line.

Figure 6.12: Residuals from Least Squares Line

Exercise: To make the regression line computations easier for medical workers, Buchmann et al. simplified the line to Weight = 100(SFH - 5). What is the slope of the simplified line? The intercept? Predict the birth weight for an SFH of 30 cm. based on the original line and the simplified line.

Exercise: You can very easily compute the least squares regression line using R. The output in Figure 6.13 is the regression line for Figure 6.1. Write the equation to predict female life expectancy from birth rate. What is the predicted female life expectancy for Jordan? (The birth rate for Jordan is 46.7 per 1000 population and the female life expectancy is 73 years.) What is the residual, the difference between the observed and predicted life expectancy values?

Figure 6.13: R Output for the Regression Line that Predicts Female Life Expectancy from Birth Rate

> reg1=lm(mydata$lifeexpf~mydata$birthrat)
> reg1

Call:
lm(formula = mydata$lifeexpf ~ mydata$birthrat)

Coefficients:
    (Intercept)  mydata$birthrat
        89.5889          -0.7447

How Well Does the Regression Line Fit?

You know that the least squares regression line is the line that has the smallest sum of squared distances from the points to the line. But that doesn't mean it fits the data well. Don't be tempted to use the size of the slope as a measure of how well the line fits the data. The value of the slope depends on the units in which the variables are measured. The slope is much larger when you predict birth weight in grams from maximum thigh circumference than when you predict birth weight in kilograms from maximum thigh circumference.

The absolute value of the Pearson correlation coefficient between the two variables tells you how well the line fits the data points. You see this in Figure 6.4: the larger the correlation coefficient in absolute magnitude, the closer the points are to the regression line. Although the value of the slope changes if you switch the dependent variable and the independent variable, the correlation coefficient stays the same. That means that you can predict birth weight from SFH as well as you can predict SFH from birth weight.

If you square the correlation coefficient, you obtain another useful measure. The square of the correlation coefficient, designated as R², tells you what proportion of the variability in the dependent variable is “explained” by the independent variable. You see R² reported in Figure 6.10. The R² for predicting birth weight from foot length is 0.58. More than half of the observed variability in birth weights can be attributed to differences in foot length.

All infants don't have the same birth weight. One of the explanations for this is that infants differ in size: some have long feet, some large maximum thigh circumferences, and so on. If foot length were a perfect predictor of birth weight, everyone with the same foot length would have exactly the same birth weight. All points would fall on the regression line and all of the residuals would be 0. Foot length would “explain” all of the observed variability in birth weights and R² would be 1. When all of the points don't fall exactly on the regression line, you can still calculate how much of the observed variability in birth weights can be attributed to differences in foot lengths. That's what R² tells you. R² ranges from 0 to 1. Values close to 1 mean that most of the variability in the dependent variable is explained by the independent variable; values close to 0 mean that the independent variable is of little use in predicting the dependent variable when a linear model is used.
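For a regression with a single predictor, R² is simply the square of the Pearson correlation. A sketch using the regression from Figure 6.13, assuming the data frame mydata is available:

# R-squared for a simple linear regression (assumes 'mydata' from Figure 6.13).
reg1 <- lm(lifeexpf ~ birthrat, data = mydata)
summary(reg1)$r.squared                     # proportion of variability "explained"
cor(mydata$lifeexpf, mydata$birthrat)^2     # the same number: the squared correlation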

Some Warnings About Linear Regression

• Don't fit a straight line model to the data without first plotting the variables and making sure the relationship is linear over the entire range of values of the independent variable.

• Don't make predictions outside the range of the observed values of the independent variable that you use to build the model. For example, if the observed range for foot lengths is 6.0 to 9.3 cm., don't predict birth weights for infants whose foot lengths aren't within this range. You don't know if the relationship between the two variables is linear outside of your observed range.

• Don't expect that the model will fit another sample from the same population as well as it fits your data, because the slope and intercept are the best values based on your data. Software packages produce adjusted R² values which modify the observed R² so it better reflects the relationship between the two variables in the population.

• Do beware of data points that have a large impact on the values of the slope and the intercept. For small datasets even a single point can make a big difference. Most software packages have special programs that help you identify these influential points.

Testing Hypotheses about the Linear Regression Model

A regression line, just like the correlation coefficient, can be used to just summarize observed data. Often you want to do more than that. You want to draw conclusions about the population from which the sample was selected. You want to calculate confidence intervals for the population slope and test the null hypothesis that the population value for the slope is 0.

To do this you have to make some assumptions about the relationship between the two variables in the population from which your sample was obtained:

• All the observations must be independent. For example, you can't include the same person more than once.

• For each value of the independent variable there is a normal distribution of values of the dependent variable. For example, for a foot length of 7 cm. there is a normal distribution of birth weights.

• All these distributions must have the same variance.

• The population means of the distributions must fall on a straight line.

Figure 6.14 is a diagram of the assumptions. For each value of the independent variable (x1 to x4) there is a distribution of possible sample values for the dependent variable. In the population the distributions are normal, their means fall on a straight line, and the variances of the distributions are all the same.

Figure 6.14: Linear Regression Assumptions

The statistical package that you are using will report the slope, the intercept, and confidence intervals for the population values. You will also see a test of the null hypothesis that the population slope (β) is 0. That's the same test as the test for the hypothesis that the population correlation coefficient is 0.

The software package will also produce plots, such as those shown in Figure 6.15, that can be used to check the regression assumptions. A plot of residuals against predicted (fitted) values should not show any pattern. If it does, it's possible that the linear model is incorrect. The second plot (standardized residuals against predicted values) should not show any pattern if the equal variance assumption is met. If you see increasing spread, the variance may be increasing with the values of the independent variables. The Normal Q-Q plot looks for violations of the normality assumption. If the residuals are normally distributed, the points should fall on a straight line. The last plot identifies points that may have a large impact on the estimates of the regression coefficients.

Figure 6.15: Checking Linear Regression Assumptions with R
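Diagnostic panels like those in Figure 6.15 come from calling plot() on a fitted lm object. A sketch, again assuming the data frame mydata from Figure 6.13 is available:

# Regression diagnostic plots (assumes 'mydata' is available).
reg1 <- lm(lifeexpf ~ birthrat, data = mydata)

par(mfrow = c(2, 2))   # arrange the four default panels in a 2 by 2 grid
plot(reg1)             # residuals vs fitted, Normal Q-Q, scale-location, residuals vs leverage
par(mfrow = c(1, 1))   # restore the single-panel layout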

Confidence Intervals and Prediction Intervals

When you make a prediction using your regression line, you don't expect to be exactly correct. You know that if you took another sample of pregnant women from the same population you would get different values for the slope and the intercept of the regression line and therefore different predicted values. To make your prediction more useful you want to estimate the variability associated with it. If you know that for an SFH of 30 cm. the range of possible birth weights is from 2400 to 2600 grams, that's much more useful than if the plausible range of values is from 2000 to 4000 grams.

A model, such as linear regression, lets you make two different types of predictions: you can predict the average infant birth weight for all women who have an SFH of 30 cm., or you can predict the infant birth weight for a particular woman with an SFH of 30 cm. The predicted value is the same but the variability is different. You can predict the average value with less variability than you can predict the value for an individual case. When estimating the variability of a prediction for all women with a particular SFH, you just have to worry that the slope and the intercept vary from sample to sample from the same population. For an individual value you have yet an additional worry: not all women with the same SFH have babies of the same weight. For each value of SFH there is a normal distribution of birth weights. For any value of the independent variable, the variability associated with the predicted value for an individual is always larger than the variability associated with predicting the mean.

The variability for a prediction also depends on the actual value of the independent variable. Predictions are most stable for values of the independent variable close to the sample mean. That's because the regression line always passes through the point that corresponds to the mean of the dependent variable and the mean of the independent variable. Different samples from the same population don't change the predicted value as much for points close to the mean as they do for points farther away. As the distance from the mean increases, so does the variability associated with the prediction.

Prediction Intervals

Figure 6.16 illustrates what we've been talking about. You see fifteen data points and the least squares regression line based on the points. The two bands closest to the regression line are the 95% confidence intervals for the predicted mean value. The distance between the two bands is narrowest near the mean of the independent variable. Quite a few of the data points don't fall within these bands. That's because the interval is for predicting mean values, not values for individual cases. The prediction intervals, shown by the two outermost bands, are much wider than the confidence intervals. The prediction interval is a range of values that you expect to include the actual value for a particular case with a designated likelihood. The prediction interval takes into account the two sources of variability for an individual prediction: sample regression lines vary about the population regression line, and individual values vary about the mean. (Confidence intervals are for population values like the mean, slope, or intercept, not for values of individual cases. That's why the term prediction interval is used for individual cases.)

Figure 6.16: Prediction Intervals and Confidence Intervals

Summary

The Pearson correlation coefficient is a symmetric measure that ranges from -1 to +1 and its absolute value indicates the strength of the linear relationship between two variables. A linear regression model is used to predict the values of a dependent variable from a single predictor variable. A residual is the difference between the observed and predicted values for a case. The square of the correlation coefficient tells you what percent of the variability in the dependent variable is explained by the independent variable. Predicted values for the average of all cases with a particular value for an independent variable are less variable than predicted values for a particular case with the same value for the independent variable.

Bibliography

Elizabeth et al. Determining an anthropometric surrogate measure for identifying low birth weight babies in Uganda: a hospital-based cross-sectional study. BMC Pediatrics 2013; 13:54. Elizabeth (2013)

Buchmann et al. A simple clinical formula for predicting fetal weight in labour at term: derivation and validation. S Afr Med J 2009; 99(6):457-460. Buchmann (2009)

Ezeaka et al. Neonatal Maximum Thigh Circumference Tape: An Alternative Indicator of Low Birth Weight. Nigerian Journal of Pediatrics 2003; 30(2):60-66. Ezeaka (2003)

7 Statistical Models

• What is a statistical model? • What is a multiple linear regression model? • What is a logistic regression model? • Why do survival data need special models?

In the previous chapter you used a simple statistical model to predict infant birth weight from single independent variables—maximum thigh circumference and symphysis fundal height. The model that describes the relationship between the dependent variable and the predictor was a straight line. The straight line has two parameters—the slope and the intercept— and you estimated the values of these from your data. (A parameter is the population value for a constant in a model.) Whenever you fit a model to data you have an equation that shows the relationship between the dependent variable and the predictor variables. The equation involves some unknown parameters that you estimate from your data values.

The straight line is one of many mathematical models that relate the value of a dependent variable to the values of one or more predictor variables. Some models, such as those in the life sciences or physical sciences, are based on known relationships, such as that between the amount of a radioactive isotope remaining at a particular time and the decay parameter. If you don't have theory to guide you, you want to select the simplest mathematical model that is consistent with the observed data.

Why Fit a Model?

There are different reasons for fitting statistical models. In the previous chapter your goal was prediction. You just wanted to estimate the birth weight from readily available infant measurements. As long as you have a good prediction you don't care what variable(s) it's based on.

Statistical models are also used as a tool for studying the relationship between a dependent variable and a set of independent variables. They help you organize and simplify the observed relationships so that you better understand the processes at work in the underlying population. The goal is not so much to make a prediction for an individual case as to explore the relationships between the variables. For example, you can build a model that relates maternal characteristics such as age, parity, prenatal care, nutrition, and gestational age to birth weight. Or you can estimate the likelihood that a person uses an ITN based on their age, education, and gender. Your interest is not in predicting the actual birth weight for an infant or predicting whether a particular person uses an ITN. Instead you want to identify the maternal, medical, and societal factors that are related to birth weight or to ITN use.

Another use of statistical models is to make the groups you are interested in comparable with respect to other predictor variables. For example, if you want to compare two treatments you want the groups receiving each treatment to be as similar as possible. It may not be practical to match the groups on a set of characteristics, so you can use a statistical model that adjusts for differences between groups based on a set of independent variables.

In the scientific literature you'll find many examples of interesting questions that are explored using statistical models. This chapter provides a brief overview of some of the commonly encountered models. You'll examine some published models and see the steps involved in building statistical models. (You'll have to study the models in more detail before you're ready to build your own.)

The Multiple Linear Regression Model

Amagloh et al. (2009) studied the relationship between infant birth weight and maternal characteristics in a sample of one hundred Ghanaian women. Figure 7.1 is an estimated multiple linear regression equation for their data.

Figure 7.1: Multiple Regression Equation for Predicting Birthweight

The multiple linear regression model is an extension of the straight line regression model to allow more than one predictor variable. For the present study, the multiple regression equation is:

predicted weight = B0 + B1(maternal education) + B2(duration of rest) + B3(pre-pregnancy weight) + B4(income) + B5(fuel)

Instead of a single slope, you now have a partial regression coefficient (B) for each predictor variable in the model, as well as the constant B0. The method of least squares is used to estimate the coefficients. The coefficients are chosen so that the sum of the squared differences between the observed and predicted values (the residuals) of the dependent variable is as small as possible.
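In R, fitting a multiple linear regression just means listing the additional predictors in the formula. A sketch using hypothetical variable and data frame names for the Amagloh data (the actual names in their data set are not known):

model_bw <- lm(birthweight ~ education + rest + prepreg_weight + income + fuel,
               data = ghana)
summary(model_bw)   # partial regression coefficients, standard errors, and tests
confint(model_bw)   # 95% confidence intervals for the population coefficients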

Exercise: Write the equation to predict weight using the coefficients in Figure 7.1. What do you need to know to calculate the predicted birth weight for a particular infant? Read the paper by Amagloh. Can you determine what the possible values for the education variable are? How about for fuel and income?

Testing Hypotheses About Population Values

In a multiple regression model, you can test hypotheses about the coefficients in the population from which the sample is selected. The sample partial regression coefficients (B1 through B5) are estimates of the unknown population coefficients β1 through β5.

To test hypotheses about the population regression line when you have a single independent variable, your data must be a random sample from a population in which the following assumptions are met:
• The observations are independent.
• The relationship between the two variables is linear.
• For each value of the independent variable, there is a normal distribution of values of the dependent variable.
• Those distributions have the same variance.
You need only a slight modification of the assumptions for the multiple linear regression model. You must assume that the relationship between the dependent variable and the independent variables is linear and that for each combination of values of the independent variables, the distribution of the dependent variable is normal with a constant variance.

In Figure 7.1 you see the estimated partial regression coefficient for each variable in the model, the 95% confidence interval for the population value of the coefficient, and the observed significance level for the test that the true coefficient is 0. Only for education can you reject the null hypothesis that the true partial regression coefficient is 0. That's the only 95% confidence interval that doesn't include 0. The coefficient shows that for a one point increase in maternal educational level, however the authors define it, birth weight increases by 0.339 grams.

At the bottom of Figure 7.1 you see an adjusted R² of 0.11. That's the proportion of the variability in birth weight that is accounted for by the predictor variables. It's called the adjusted R² because it adjusts the observed R² to better reflect how well the model would fit another sample from the same population. Adjusted R² also lets you compare models with different numbers of predictor variables, because it adjusts for the increase expected in the sample R² when additional variables are included in a model, even if those variables aren't related to the dependent variable in the population.

Since the partial regression coefficients are calculated from your sample, the model fits your data better than it would another sample of cases from the same population. That's the case for all statistical models. A model always fits the sample on which it is based better than it will fit another sample from the same population.
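The adjustment is a simple function of the sample R², the number of cases n, and the number of predictors k: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). A quick check in R, using the output shown later in Figure 7.2 (R² = 0.8412, 4 predictors, 100 residual degrees of freedom, so n = 105):

r2 <- 0.8412
n  <- 105   # 100 residual degrees of freedom + 4 predictors + 1 intercept
k  <- 4
1 - (1 - r2) * (n - 1) / (n - k - 1)   # 0.8348, the adjusted R-squared in Figure 7.2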

R is the Pearson correlation coefficient between the observed and predicted values of the dependent variable. If the model predicts birth weight perfectly, R is 1. If there is no linear relationship between the observed and predicted values, R is 0.

The test that R² is 0 is the same as the test that all population partial regression coefficients are 0. That is,

β1 = β2 = β3 = ... = βk = 0

The observed significance level for this test is shown with the adjusted R² value in Figure 7.1. Based on the small observed significance level (0.008), you can reject the null hypothesis that all of the partial regression coefficients are 0.

Interpreting the Partial Regression Coefficients

The partial regression coefficient for a variable tells you how much the value of the dependent variable changes when the value of that independent variable increases by 1 and the values of the other independent variables stay the same. A positive coefficient means that the predicted value of the dependent variable increases when the value of that independent variable increases. The positive coefficient for education indicates that as education increases so does infant birth weight, when the other independent variables stay the same. A negative coefficient tells you that the predicted value of the dependent variable decreases when the value of the independent variable increases. A negative coefficient for hours of exposure to smoke means that birth weight decreases as exposure to smoke increases.

A common mistake is to assume that a variable with a large coefficient is more important than a variable with a small coefficient. The size of a coefficient depends, among other things, on the units in which the variable is measured. If you multiply the values of a variable by 100, its coefficient will decrease by a factor of 100 while the coefficients for the other variables stay the same. You can't compare the coefficients for blood pressure in mm Hg, income in dollars, and age in years. You can compute standardized coefficients, sometimes labeled Beta coefficients, which are the regression coefficients when the dependent variable and all independent variables are standardized to have a mean of 0 and a standard deviation of 1. They are shown in Figure 7.4.

When you have one predictor variable in a regression equation, if you reject the null hypothesis that the population value for the slope is 0, you can conclude that there's a linear relationship between the dependent variable and the predictor. In statistical models with more than one predictor variable, the interpretation of the coefficients is not as straightforward. The coefficient for a predictor variable depends not only on the relationship between the predictor and the dependent variable but also on its relationship to the other predictor variables that are included in the model. Consider what happens if you try to predict female life expectancy in a country from four predictor variables: births per 1000 population, the logarithm of the number of doctors per 10,000 people, the log of the number of hospital beds per 10,000 people, and the percent of the population living in an urban area. (Logarithms are taken of the number of doctors and the number of hospital beds to make the relationship between those variables and female life expectancy linear.)

Figure 7.2: Four Predictor Model for Female Life Expectancy

> model1 = lm(formula = lifeexpf ~ birthrat + lnbeds + lndocs + urban)
> summary(model1)

Call:
lm(formula = lifeexpf ~ birthrat + lnbeds + lndocs + urban)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  65.4584     3.66963  17.838  < 2e-16
birthrat     -0.31213    0.06214  -5.023 2.22e-06
lnbeds        1.48590    0.73291   2.027   0.0453
lndocs        3.09259    0.60726   5.093 1.67e-06
urban         0.03408    0.02982   1.143   0.2558

Residual standard error: 4.535 on 100 degrees of freedom
Multiple R-squared: 0.8412, Adjusted R-squared: 0.8348
F-statistic: 132.4 on 4 and 100 DF, p-value: < 2.2e-16

From Figure 7.2, based on the small observed significance levels, you can conclude that all of these variables except the percent of the population living in an urban area are linearly related to female life expectancy. Now look at Figure 7.3, which is the same model without the log of the number of doctors. Note how all of the regression coefficients have changed. Percent urban, which was not statistically significant in the four-variable model, is now highly significant. That's because, if the predictor variables are correlated, their coefficients in a model depend on the other variables in the model. The percent of the population that is urban is highly correlated with the log of the number of doctors. That makes sense. The number of hospital beds, the number of doctors, and the percent of the population living in urban areas are all correlated. If you have the number of hospital beds and the number of doctors in an equation, the percent of urban population conveys little new information. Much of the information about it is already supplied by the other independent variables.

Figure 7.3: Three Predictor Model for Female Life Expectancy

> model1 = lm(formula = lifeexpf ~ birthrat + lnbeds + urban)
> summary(model1)

Call:
lm(formula = lifeexpf ~ birthrat + lnbeds + urban)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 69.65479    3.99301  17.444  < 2e-16
birthrat    -0.48269    0.05844  -8.260 5.91e-13
lnbeds       2.11207    0.80680   2.618 0.010209
urban        0.11188    0.02860   3.913 0.000166

Residual standard error: 5.064 on 101 degrees of freedom
Multiple R-squared: 0.8, Adjusted R-squared: 0.794
F-statistic: 134.6 on 3 and 101 DF, p-value: < 2.2e-16
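You can check this kind of overlap directly by looking at the correlations among the predictors. A sketch, assuming the life expectancy data are in a data frame called countries (a hypothetical name) with the variable names used in Figure 7.2:

# correlation matrix of the four predictor variables
round(cor(countries[, c("birthrat", "lnbeds", "lndocs", "urban")]), 2)

# a high correlation between urban and lndocs is why the coefficient
# for urban changes so much when lndocs is dropped from the model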

Sometimes when two predictor variables are highly correlated with each other as well as with the dependent variable, you may find that neither of them has a statistically significant coefficient in the model, or that their coefficients have signs which don't make sense. Remember that the coefficient of a variable always depends on the other variables in the model.

Exercise: Abu-Baker (2010) studied the relationship between secondhand smoke exposure and birth outcomes in Jordan. Figure 7.4 is the multiple regression model from her paper.

Figure 7.4: Model for Birth Weight and Secondhand Smoke

Write the multiple regression equation. What percent of the observed variability in birth weights is explained by the model? Which variables are positively associated with birth weight? Which are negatively associated? For which variables can you reject the null hypothesis that the population value of the coefficient is 0? What do you base your conclusion on? Can you reject the null hypothesis that all of the population coefficients are 0? What do you base your conclusion on? Which predictor variables do you think are correlated?

Nominal Variables in Models

In a linear regression model, the coefficient for a predictor variable tells you the change in the dependent variable associated with a one-unit change in the predictor variable when the values of the other variables stay the same. That means that the independent variables must be measured on a scale which has equal distances between values. It makes sense to talk about the increase in birth weight for an additional year of maternal age. It doesn't make sense to talk about the increase in birth weight for a one-unit change in region or religion or ethnic group.

In Figure 7.1 there's a coefficient for the education variable and another coefficient for the fuel variable. That means that both education and fuel are assumed to be measured on a scale in which the intervals are equal. If education is measured in actual years, a single coefficient may be appropriate. However, Amagloh assigned the values 1 to 5 for education, where 1=No education or up to primary level, 2=Junior High School, 3=Vocational training, 4=O'level/A'level/Senior High School, and 5=Tertiary. Since the partial regression coefficient predicts the same increase or decrease in predicted birth weight from one category to the next, the difference between no education (code 1) and junior high school (code 2) is taken to be the same as the change from Senior High School (code 4) to Tertiary (code 5). That's probably not a realistic assumption.

The values for the fuel variable are equally troublesome. Since there is only one coefficient for fuel, it is treated as a continuous variable: each value of fuel is multiplied by the same coefficient. In fact, fuel is a categorical variable, and treating the five categories Firewood, Charcoal, LPG, Firewood and Charcoal, and Charcoal and LPG as equally spaced is a poor strategy. If you want to include nominal or ordinal variables in any kind of statistical model you must estimate a separate coefficient for each category of the variable. (One of the categories is selected as a reference category, and the coefficients for the other categories indicate the difference from the reference category.) For example, you would have a coefficient for each of the education categories. For dichotomous variables that have only the values 0 and 1 (0=male, 1=female) no modifications are necessary, since there is only one interval and the category coded 0 is the reference category.
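In R this coding is handled automatically when a variable is declared as a factor: the model then gets a separate indicator (dummy) variable for every category except the reference category. A sketch for the fuel variable, assuming (hypothetically) that it is stored as the codes 1 to 5 in the order listed above, in the same hypothetical ghana data frame used earlier:

# treat fuel as a nominal variable rather than as a 1-to-5 score
ghana$fuel <- factor(ghana$fuel,
                     levels = 1:5,
                     labels = c("Firewood", "Charcoal", "LPG",
                                "Firewood+Charcoal", "Charcoal+LPG"))

# lm() creates an indicator variable for each category except the first,
# which serves as the reference category (here, Firewood)
model_fuel <- lm(birthweight ~ education + fuel, data = ghana)
summary(model_fuel)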

The Logistic Regression Model

If your dependent variable has only two values, such as lived/died, used an ITN/didn't use an ITN, or low birth weight/not low birth weight, then you can't use a linear regression model, since a variable with only two values can't have a normal distribution with a constant variance. Instead you want to estimate the probability that an event occurs using a binary logistic model.

(Either of the possible outcomes can be considered "the event", since the probability that one event occurs is just 1 minus the probability of the other. If the probability that you use an ITN is 0.60, then the probability that you don't use an ITN is 1 - 0.60, or 0.40.) In logistic regression you estimate the probability that an event occurs with the model

estimated probability(event) = 1 / (1 + e^(-Z))

where Z is the linear combination

Z =B0 B1 X 1B2 X 2...B p X p

and X1 to Xp are the values of the p independent variables and the Bi are the logistic regression coefficients.
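The formula above converts any value of Z into a probability between 0 and 1. In R you can do the calculation directly, or with the built-in logistic function plogis(). A small sketch with made-up coefficient values (the numbers are hypothetical, chosen only for illustration):

# hypothetical coefficients B0 = -2 and B1 = 0.5, and a single predictor X1 = 3
z <- -2 + 0.5 * 3
1 / (1 + exp(-z))   # estimated probability of the event, about 0.38
plogis(z)           # the same calculation using the built-in logistic function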

The first plot in Figure 7.5 shows what the logistic regression curve looks like when the sum of the predictors times their coefficients (Z) is on the horizontal axis and the probability of the event is on the vertical axis. The curve is S shaped. The relationship between the predictor variables and the probability is nonlinear.

The second plot in Figure 7.5 is the plot of

logit = log[ probability of event / (1 - probability of event) ] = log[ probability of event / probability of no event ]

on the vertical axis and the linear combination of the predictor variables, Z, on the horizontal axis. The relationship between the two variables is now linear.

Figure 7.5: Logistic Regression Curve

The logistic regression equation can be written in terms of the log of the odds as:

log[ probability(event) / probability(no event) ] = B0 + B1X1 + B2X2 + ... + BpXp

For example, if instead of predicting the actual birthweight you want to predict the probability of an infant being low birth weight, taken to be an observed birth weight less than 2500 grams, the logistic regression model is:

 probability low birthweight  log { }= B B maternal education  probability not low birth weight 0 1

B2durationof rest  B3 prepregnancy weight  B4incomeB5 fuel

As before, the maternal education and fuel variables need to be equally spaced or dichotomous so that a single coefficient is appropriate. The right-hand side of the formula looks just like the multiple regression equation. The left-hand side is the log of the odds. The coefficients are estimated using an iterative method called maximum likelihood, which results in the coefficients that make the observed results most likely.

If your dependent variable is categorical and has more than two values, you can use an extension of the binary logistic regression model called multinomial regression. If the categories of the variable can be ordered in some way, you can fit what's called an ordinal regression model.
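In R, a binary logistic regression model like the low birth weight model above is fit with glm() and family = binomial. A sketch with the same hypothetical variable names used earlier (low_bw coded 1 for a birth weight below 2500 grams and 0 otherwise):

# create the binary outcome from the recorded birth weights
ghana$low_bw <- as.numeric(ghana$birthweight < 2500)

lbw_model <- glm(low_bw ~ education + rest + prepreg_weight + income + fuel,
                 family = binomial, data = ghana)
summary(lbw_model)    # logistic regression (logit) coefficients
exp(coef(lbw_model))  # the corresponding odds ratios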

Interpreting Binary Logistic Regression Coefficients

As in multiple linear regression, the value of a coefficient for a variable in a logistic regression model depends on the other variables in the model. In multiple linear regression the regression coefficient is the estimated change in the dependent variable for a one-unit change in the independent variable when the other independent variables are held constant. The interpretation of the logistic regression coefficient is more complicated. The logistic regression coefficient for a variable is the change in the log odds for a one-unit change in the independent variable, when all other independent variables are held constant (and the variable is not included in any other terms in the model). That's why the coefficient is sometimes called a logit coefficient.

Let's see what this means. An author whose identity could not be determined (Anonymous, 2012) analyzed data from The Gambia to determine whether there is an association between head-of-household exposure to malaria prevention messages and use of ITNs in children under age 5. Figure 7.6 shows the proportions of families with children under 5 who use ITNs, subdivided by whether they were exposed to malaria prevention messages. You see that of those who heard or read the messages, 61% use nets for their children, while 39% do not. Similarly, of those who didn't hear or read the messages, 49% use nets and 51% do not.

Figure 7.6: Exposure to Malaria Prevention Messages and ITN Use

Message     Probability   Probability     Odds of    Log of odds
received    (use ITN)     (not use ITN)   ITN use    (base e)
Yes         0.61          0.39            1.54        0.43
No          0.49          0.51            0.95       -0.04

If you fit the logistic regression model:

 probability use ITN  log { }=B B message  probability not use ITN  0 1 where message has two values: 1 if message is received, 0 if it is not, the coefficient for message, B1 is 0.47. That's the difference in the log of the odds of ITN use for the two groups (0.43- -0.04). If you calculate eB1, you get the odds ratio, a much more useful number. In this example, e0.47 is 1.62. That's the ratio of the odds of ITN use in the group that has received the message (1.54) to the odds of ITN use in the group that hasn't received the message (0.95). An odds ratio of 1.62 indicates that the message-received group has a 62% increase in the odds of a child sleeping under an ITN compared to the odds for the group that did not receive the message. In summary: • If a coefficient is positive, the estimated odds ratio is greater than 1, which means that the odds of the event are increased. • If a coefficient is negative, the estimated odds ratio is less than 1, which means that the odds of the even are decreased. • If a coefficient is 0, the odds ratio is 1, which means the odds are unchanged.

Logistic Regression Example

Figure 7.7 shows an excerpt from the logistic regression model from Anonymous (2012). Since people who were exposed to malaria prevention messages may differ from those who weren't, the author wanted to look at the effect of exposure when other characteristics, such as age, region, and sex of the head of household, are held constant. For each of the variables you see the logit coefficient (B), its standard error, and the odds ratio. Categorical variables are correctly treated in this model: each category is a separate variable with its own coefficient.

Figure 7.7: Logistic Regression Parameter Estimates for ITN use

*** p-value <0.01, ** p-value < 0.05, * p-value <0.10

For each independent variable, one of the categories of the variable is selected as the reference category (RC in the figure). For each variable, the odds for each category are compared to the odds for the reference category for that variable. That's why the odds ratio for the reference category is always 1. Look at the independent variable labeled Exposure. There are two groups, one of which was exposed to the malaria prevention message and the other of which was not. The odds ratio for those exposed compared to those not exposed is 1.61. This is almost identical to the odds ratio you calculated based on the single predictor. That tells you that controlling for the other predictor variables had little effect on the coefficient for exposure.

You can test hypotheses and calculate confidence intervals for the population odds ratio. If the confidence interval for the odds ratio includes the value 1, you cannot reject the null hypothesis that the true odds ratio is 1. Testing whether the coefficient is 0 is equivalent to testing whether the odds ratio is 1.
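With a fitted glm object, odds ratios and their confidence intervals come from exponentiating the coefficients and their confidence limits. A sketch, continuing with the hypothetical lbw_model fit earlier (confint.default() gives the usual Wald intervals):

# odds ratios with 95% confidence intervals for each term in the model
or_table <- exp(cbind(OR = coef(lbw_model), confint.default(lbw_model)))
round(or_table, 2)

# a term whose interval includes 1 is not statistically significant:
# you cannot reject the null hypothesis that its population odds ratio is 1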

Exercise: Zeleke et al. (2012) studied predictors of low birth weight babies in Ethiopia. Their logistic regression model is shown in Figure 7.8. What is the dependent variable? What are the predictor variables? For each, give the reference category. Note that they do not report the logistic regression coefficients, only the odds ratios. Consider the variable Residence, which has the values Urban and Rural. Based on the counts shown in the table, calculate the Crude OR for Rural compared to Urban. What is the 95% confidence interval for the OR? Does it include the value 1? Can you reject the null hypothesis that place of residence is not related to LBW? Now look at the Adjusted OR. That's the OR estimated from the logistic regression model. What is the Adjusted OR value for Residence? Explain why the Crude OR and Adjusted OR values are different. Summarize the results from Figure 7.8.

Figure 7.8: Factors Associated with LBW

Interactions Between Variables

Both the multiple linear regression models and the logistic regression models we've considered so far looked at the contributions of individual independent variables to predicting the dependent variable. You assumed that the effect of a variable is the same for all values of the other variables in the model: for example, that the effect of a laboratory value is the same over the entire range of ages, or the same for men and women. In the ITN use model in Figure 7.7 there is one logistic regression coefficient for location (urban versus rural) and one logistic regression coefficient for the gender of the head of household. This implies that males in both rural and urban areas respond in the same way. The model does not allow for an interaction between the two variables.

In statistics, interaction means that two factors considered together have more (or less) of an effect than when they are considered individually. For example, the probability of rural males using an ITN may be more (or less) than is predicted from the coefficient of rural and the coefficient of male. If you're studying the risk of lung cancer you may find that smoking and inhaling asbestos fibers both increase the risk of lung cancer when considered individually. However, people who smoke and inhale asbestos fibers are much more likely to develop lung cancer than you would predict based on the individual smoking and asbestos coefficients. To capture the joint effect of two or more variables, interaction terms must be included in the model. (For categorical variables, interaction coefficients can be constructed in several ways.)

A linear regression model with an interaction term between the variables X1 and X2 is written as

predicted dependent variable = B0 + B1X1 + B2X2 + B3X1X2

The coefficient B3 is for the interaction effect. If both of the variables are coded as 0 or 1 (absent or present), it tells you how much more the dependent variable changes when both variables are present than you would predict from their individual effects. There are several different ways to code categorical variables and they result in coefficients which have different interpretations. The coefficient for each category may tell you how much the category effect differs from the average effect of all categories, or the category may be compared to a reference category. Different coding schemes result in different coefficients but not in different conclusions.
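In an R formula an interaction is written with a colon, or with * to include the main effects automatically. A sketch for the ITN example, with hypothetical data frame and variable names (use_itn, rural, and male all coded 0/1):

# rural*male expands to rural + male + rural:male,
# so B3 is the coefficient of the rural:male interaction term
int_model <- glm(use_itn ~ rural * male, family = binomial, data = gambia)
summary(int_model)

# equivalent explicit form
glm(use_itn ~ rural + male + rural:male, family = binomial, data = gambia)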

Bejon (2009) studied the effects of ITN use and children's age on malaria infection. The results of the logistic regression models are in Figure 7.9. Two different models were examined: febrile vs asymptomatic malaria, and any malaria vs uninfected. The interaction term between ITN use and age greater than 42 months (old) is shown as ITN use*Old. The odds ratio for the interaction term in the first model is 6.5, indicating that the odds of febrile malaria, as compared to the odds of asymptomatic malaria, are higher for older children who slept under an ITN than for all other children.

Figure 7.9: ITN Use Interaction with Age

Model Discrimination

Model discrimination is the ability of a model to distinguish between cases who experience the event and those who don't, based on the estimated probability that the event occurs. A perfect model always assigns higher probabilities to cases who experience the event than to cases who don't. The c-statistic is a measure of a model's ability to discriminate between the two sets of cases. It is the proportion of pairs of cases with different observed outcomes in which the model results in a higher probability for the case with the event than for the case without the event. The c-statistic is equal to the area under the ROC (Receiver Operating Characteristic) curve, sometimes just called the Area Under the Curve (AUC). The c-statistic ranges in value from 0.5 to 1. A value of 0.5 indicates that the model is no better than flipping a coin for assigning cases to groups. A value of 1 means that the model always assigns higher probabilities to cases with the event than to cases without the event.

In the previous chapter you considered various easy-to-obtain infant measurements as predictors of birth weight. Elizabeth et al. (2013) evaluated five infant measures as predictors of birth weight. Besides reporting the correlation coefficients shown in Figure 7.10, they also report the AUC and its 95% confidence interval. To calculate the AUC, each infant is classified into one of two groups based on birth weight (less than 2500 grams or greater than 2500 grams) and a logistic regression model is estimated using each of the variables alone. You see that foot length has the largest AUC, as well as the largest correlation coefficient.

Figure 7.10: Correlation of anthropometric measures with birth weight
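The c-statistic can be computed directly from its definition by comparing the predicted probabilities for all pairs of cases with different outcomes. A base R sketch, assuming a vector of observed 0/1 outcomes y and a vector of predicted probabilities p from a logistic regression model (the example call reuses the hypothetical lbw_model from earlier):

c_statistic <- function(y, p) {
  # predicted probabilities for cases with and without the event
  p_event    <- p[y == 1]
  p_no_event <- p[y == 0]
  # for every pair with different outcomes, count the pairs where the case
  # with the event gets the higher probability; ties count one half
  comparisons <- outer(p_event, p_no_event, "-")
  mean((comparisons > 0) + 0.5 * (comparisons == 0))
}

# example: c_statistic(ghana$low_bw, fitted(lbw_model))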

Exercise: Read the Elizabeth (2013) paper and calculate the sensitivity and specificity values shown in their Table 3 from the counts. Calculate the statistics shown in Table 4 from the counts in Table 3.

Analyzing Survival Data

Often you are interested in the time interval between two events: diagnosis and death; surgery and recurrence of a disease; starting university and completion; release from prison and return. When you study time between two events you may face two problems: not everyone experiences the second event, and people are observed for different lengths of time. Not all patients experience recurrence of a disease and not all students finish their degrees. Similarly not everyone is diagnosed on the same day, or released from prison on the same day.

If you eliminate people who have not actually experienced the second event and calculate an average survival time or time to recurrence, your results will be flawed, since data from people who are disease free or not back in prison are ignored. Special statistical models, called survival time (or failure-time) methods, are used to analyze the time between two well-defined events. Two frequently used methods for analyzing survival time data are Kaplan-Meier estimates and the Cox proportional hazards model.

Kaplan-Meier estimates are based on subdividing the time period after the initial event into smaller time intervals based on when events occur. Cases contribute survival information to intervals during which they have been observed. This makes best use of the available information. At each time point you obtain an estimate of the cumulative percent still surviving. You can estimate the percent dead by subtracting the percent surviving from 100%.
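In R, Kaplan-Meier estimates come from the survival package. A sketch with hypothetical variable names: time is months from the start of treatment, died is 1 if the child died and 0 if the child was still alive when last observed, and age_group identifies age at the start of treatment:

library(survival)

# Kaplan-Meier estimate of the survival curve in each age group
km <- survfit(Surv(time, died) ~ age_group, data = art)
summary(km)   # cumulative proportion still surviving at each event time
plot(km)      # survival curves like those in Figure 7.11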

Consider Figure 7.11, which shows the percent of children who died at various time points after starting antiretroviral treatment in South Africa (Davies et al., 2009). You see that 24 months after the start of treatment, about 4% of the children who were older than 36 months when they started treatment had died. For children who started treatment when they were less than 12 months old, almost 17% had died.

Figure 7.11: Kaplan-Meier Survival Curve

Special methods are also available for modeling predictors of survival time. The Cox proportional hazards model is most often used. The hazard function, which tells you how likely a case is to experience an event given that the case has survived to that time, is expressed as a function of a linear combination of predictor variables.

The hazard ratio (HR) is the ratio of the rate at which patients in two groups experience an event. A hazard ratio of 1 means that the two groups experience the event at equal rates. A hazard ratio of 2 implies that, at any time, cases in one group are experiencing twice the rate of the event as the other group.
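The Cox model is fit with coxph() from the same survival package. A sketch, again with hypothetical variable names standing in for the predictors used in the Davies example:

library(survival)

# hazard of death as a function of age group and other (hypothetical) predictors
cox_model <- coxph(Surv(time, died) ~ age_group + who_stage + weight_for_age,
                   data = art)
summary(cox_model)   # exp(coef) gives the hazard ratios and their 95% CIs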

Figure 7.12 shows hazard ratios for the Davies study, both with and without adjustment (adjusted and crude HR) for the other predictor variables. As in logistic regression, one of the categories is selected as the reference category. You see that children who started ART when they were less than 1 year old have a crude HR of 3.38 compared to children who started ART when they were older than 3 years. When adjusted for the other variables in the model, the HR decreases to 2.00.

Figure 7.12: Predictors of Mortality

Exercise: Which variables in Figure 7.12 are associated with an increased risk of death? How would you tell which variables are significantly associated with risk of death based only on the 95% confidence intervals?

Exercise: Read the Bejon (2009) paper and summarize the results for the Cox survival models. How do the results of these models compare to the logistic models? Are you convinced that there is an age by ITN interaction effect? Explain what that means.

Summary

This chapter focused on presenting the basics of the models you will most often encounter in the medical literature. If you are building the model yourself there are many other issues you must consider. For example: Do the data violate the assumptions needed for the particular model? How will you determine which predictor variables to include in a model? Is the relationship between each of these variables and the dependent variable linear? Do you need to transform any of the variables? Are interaction terms necessary? Are there points that have too much influence on the estimated coefficients? How well does the model fit?

Bibliography:

Abu-Baker et al. The Influence of Secondhand Smoke Exposure on Birth Outcomes in Jordan. Int. J. Environ. Res. Public Health 2010; 7:616-634. Abu-Baker (2010)

Amagloh et al. Evaluation of Some Maternal and Socio-Economic Factors Associated with Low Birthweight Among Women in the Upper East Region, Ghana. African Journal of Food Agriculture Nutrition and Development 2009; (7). Amagloh (2009)

Elizabeth et al. Determining an anthropometric surrogate measure for identifying low birth weight babies in Uganda: a hospital-based cross-sectional study. BMC Pediatrics 2013; 13:54. Elizabeth (2013)

Zeleke et al. Incidence and correlates of low birth weight at a referral hospital in Northwest Ethiopia. Pan African Medical Journal 2012; 12:4. Zeleke (2012)

Davies et al. Outcomes of the South African National Antiretroviral Treatment Programme for children. South African Medical Journal, October 2009, Vol. 99, No. 10. Davies (2009)

Bejon P, Ogada E, Peshu N, Marsh K. Interactions Between Age and ITN Use Determine the Risk of Febrile Malaria in Children. PLoS One 2009; 23(4). Bejon (2009)
