Methods 01
Introduction

A posse ad esse ("from possibility to actuality")

Deduction: the gold standard of scientific inference
[Figure: deductive cycle linking Established Theory, Hypothesis, Prediction (Model) and Experiment/Observation. Observations confirm or refute the hypothesis, which is then added to the established theory or discarded.]

Alcohol Consumption and Pregnancy
- Theory: Alcohol is a teratogen.
- Hypothesis: Alcohol consumption during pregnancy influences the development of the fetal brain.
- Prediction: Children exposed to alcohol during pregnancy face a higher risk of behavioural problems.
- Experiment: Assessment and comparison of the prevalence of behavioural problems among 100 exposed and non-exposed children.

What is Statistics?
Etymology: from Latin statisticus ("of state affairs").
Definition: "The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling." (American Heritage® Dictionary)

What is Statistics? (Almost) a Course Outline
- Summary of experimental and observational data: Descriptive Statistics (graphs, tables, plots etc.)
- Estimation of treatment effects: Estimation Theory (mean, standard deviation, confidence interval etc.)
- Decision-making on the basis of empirical samples: Hypothesis Testing (significance, power, p-value etc.)
- Modelling of the relationship between factors: Association and Regression Analysis (correlation coefficient, linear regression etc.)

Why Statistics for Medical Students?
- Critical evaluation of observations: consideration of randomness; supplementing common sense and gut feeling
- Comprehension of the scientific literature: understanding of purpose and design; assessment of scientific credibility
- Design and conduct of experiments and surveys: choice of appropriate statistical methods; choice of appropriate study design; correct use of formulae and software; correct interpretation of results

Evidence-Based Medicine
"... the process of systematically finding, appraising, and using contemporaneous research findings as the basis for clinical decisions."
Rosenberg W, Donald A (1995).
Evidence based medicine: an approach to clinical problem solving. BMJ 310: 1122–1126.

"... requires you to read the right papers at the right time and then to alter your behaviour (and, what is often more difficult, the behaviour of other people) in the light of what you have found."
Greenhalgh T (1997). How to read a paper: the basics of evidence based medicine. BMJ, London.

Evidence-Based Medicine
[Figure: Number of New Entries per year, 1975–2015 (y-axis from 0 to 1.4 × 10^6).]

Variation in Populations
Statistics deals with the nature and variability of a given parameter in a population.
- What are the common or average values of the parameter?
- How much does the parameter vary?
- Are there distinct subpopulations with respect to the parameter?
- What is the normal range of the parameter?
- Are abnormal values associated with pathology?
- Is the parameter associated with other parameters?

Population and Sample
A population is the (usually large) group of objects about which inference is being made. A sample comprises a (usually small) number of members of the population.
[Figure: a sample is drawn from the population (sampling) and its data are collected; conclusions about the population are drawn by inference, with a sample statistic standing in for the corresponding population parameter.]

Female Body-Mass Index (BMI)
Undernourished (WHO): BMI < 18.5
Underweight (NHANES): BMI < 19.1
US average, 20–29 years: BMI = 22.7

Year  Name                    BMI
1984  Suzette Charles         17.7
1985  Sharlene Wells          18.2
1986  Susan Akin              16.8
1987  Kellye Cash             17.6
1988  Kaye Lani Rae Rafko     18.8
1989  Gretchen Carlson        19.1
1990  Debbye Turner           17.9
1991  Marjorie Vincent        17.8
1998  Kate Shindle            20.2
1999  Nicole Johnson          19.6
2001  Angela Perez Baraquio   20.3
2002  Katie Harman            19.5
Source: http://www.pbs.org/wgbh/amex/missamerica/sfeature/sf_list.html

Descriptive Statistics
Descriptive statistics is a useful means of summarising data in an understandable way.
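As a first descriptive summary of the BMI table above, the minimal Python sketch below counts how many of the twelve listed values fall below each quoted cut-off. The names `bmi_by_year`, `CUTOFFS` and `count_below` are illustrative choices, not part of the source; the values and thresholds are copied from the slide, and "below" is read as strictly less than, following the slide's "<".

```python
# BMI values copied from the table above (year -> BMI)
bmi_by_year = {
    1984: 17.7, 1985: 18.2, 1986: 16.8, 1987: 17.6,
    1988: 18.8, 1989: 19.1, 1990: 17.9, 1991: 17.8,
    1998: 20.2, 1999: 19.6, 2001: 20.3, 2002: 19.5,
}

# Cut-offs quoted on the slide (strict "less than", as written there)
CUTOFFS = {
    "undernourished (WHO, BMI < 18.5)": 18.5,
    "underweight (NHANES, BMI < 19.1)": 19.1,
}


def count_below(values, threshold):
    """Return how many values lie strictly below the threshold."""
    return sum(1 for v in values if v < threshold)


n = len(bmi_by_year)
for label, threshold in CUTOFFS.items():
    k = count_below(bmi_by_year.values(), threshold)
    # Absolute and relative frequency of values below the cut-off
    print(f"{label}: {k} of {n} ({100 * k / n:.0f}%)")
```

Run as written, it reports 6 of 12 values (50%) below the WHO cut-off and 7 of 12 (58%) below the NHANES cut-off; every listed value also lies below the quoted US average of 22.7.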
Graphical methods are better suited than numerical methods for identifying patterns in the data; numerical approaches are more precise and objective.

Types of Variables
Qualitative variables
- nominal: classes without ordering (e.g. gender, ABO blood group)
- ordinal: classes with ordering (e.g. verbal exam marks, tumour grading)
Quantitative variables
- discrete: integer values (e.g. cell count, number of offspring)
- continuous: real values (e.g. body weight, blood pressure)

Descriptive Statistics: Sample Characteristics
Sample size $n$; observations $x_i$, $i = 1, \dots, n$.
Qualitative data: number of classes $k$; absolute frequencies $h_i$, $i = 1, \dots, k$; relative frequencies $p_i = \frac{1}{n} \cdot h_i$.

ABO Blood Groups (k = 4, n = 75)
Group          0      A      B      AB
i              1      2      3      4
h_i            28     32     10     5
p_i            0.373  0.427  0.133  0.067
100 · p_i (%)  37.3   42.7   13.3   6.7

Data Description
Type                   Measures of Location       Measures of Dispersion
qualitative, nominal   mode                       index of qualitative variation (IQV)
qualitative, ordinal   mode, quantiles (median)   IQV, interquartile range
quantitative           quantiles (median), mean   interquartile range, median deviation, standard deviation

Measures of Location: Qualitative Data
The mode of a sample is its most common value; it is not always unique. In the ABO example, the mode is blood group A ($h_2 = 32$).

Measures of Location: Ordinal and Quantitative Data
The α-quantile of a sample is a value $\tilde{x}_\alpha$ such that the relative frequency of smaller values is at most $\alpha$, and the relative frequency of larger values is at most $1 - \alpha$.
Example with 10 ordered values and α = 0.45: the 5th value qualifies as $\tilde{x}_{0.45}$, since 4 of the 10 values are smaller (0.40 ≤ 0.45) and 5 are larger (0.50 ≤ 1 − 0.45 = 0.55).

Female Body-Mass Index (BMI), α = 0.40
Sorted values: 16.8, 17.6, 17.7, 17.8, 17.9, 18.2, 18.8, 19.1, 19.5, 19.6, 20.2, 20.3
$\tilde{x}_{0.40} = 17.9$: 4 of the 12 values are smaller ($4/12 = 0.33 \le 0.40$) and 7 are larger ($7/12 = 0.58 \le 0.60$).

$\tilde{x}_{0.50}$ is called the median.
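The definitions above can be checked mechanically. The Python sketch below computes relative frequencies and the mode(s) for the ABO example, and tests each observed value against the α-quantile definition given here. Restricting the search to observed sample values is an assumption of this sketch (the definition also admits values lying between observations), and the function names `relative_frequencies`, `modes` and `quantile_candidates` are hypothetical.

```python
from fractions import Fraction


def relative_frequencies(counts):
    """Relative frequencies p_i = h_i / n from absolute frequencies h_i."""
    n = sum(counts.values())
    return {cls: h / n for cls, h in counts.items()}


def modes(counts):
    """All most common classes; the mode need not be unique."""
    top = max(counts.values())
    return [cls for cls, h in counts.items() if h == top]


def quantile_candidates(data, alpha):
    """Observed values v satisfying the alpha-quantile definition:
    share of values < v is at most alpha, share of values > v is at most 1 - alpha.
    Assumption: only observed sample values are tested."""
    n = len(data)
    alpha = Fraction(str(alpha))  # exact arithmetic avoids rounding at the boundaries
    result = []
    for v in sorted(set(data)):
        smaller = sum(1 for x in data if x < v)
        larger = sum(1 for x in data if x > v)
        if Fraction(smaller, n) <= alpha and Fraction(larger, n) <= 1 - alpha:
            result.append(v)
    return result


abo = {"0": 28, "A": 32, "B": 10, "AB": 5}
print({cls: round(p, 3) for cls, p in relative_frequencies(abo).items()})
print(modes(abo))

bmi = [17.7, 18.2, 16.8, 17.6, 18.8, 19.1, 17.9, 17.8, 20.2, 19.6, 20.3, 19.5]
print(quantile_candidates(bmi, 0.40))
print(quantile_candidates(bmi, 0.50))
```

For α = 0.40 it returns [17.9], matching the slide. For α = 0.50 it returns both 18.2 and 18.8, showing that a quantile need not be unique; for a sample of even size the median is then commonly taken as the midpoint of the two middle values, here (18.2 + 18.8)/2 = 18.5.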
Likewise, $\tilde{x}_{0.25}$ ($\tilde{x}_{0.50}$, $\tilde{x}_{0.75}$) is called the 1st (2nd, 3rd) quartile.

Apgar Score
Named in honour of Dr. Virginia Apgar, a pioneer of newborn (neonatal) care. The score is observed at 1, 5 and 10 minutes after birth. The total of the individual category scores is the Apgar score for that scoring session, on a scale from 0 ("dead") to 10 ("fully alive").

Apgar Score (n = 15)
i     1  2  3  4  5  6  7   8  9   10  11  12  13  14  15
x_i   7  9  4  6  9  8  10  5  10  9   6   8   9   10  7

Score frequencies (k = 11 possible scores):
score  0  1  2  3  4  5  6  7  8  9  10
h_i    0  0  0  0  1  1  2  2  2  4  3

Quartiles (number and share of smaller / larger values):
$\tilde{x}_{0.25} = 6$: 2 smaller (0.13), 11 larger (0.73)
$\tilde{x}_{0.50} = 8$: 6 smaller (0.40), 7 larger (0.47)
$\tilde{x}_{0.75} = 9$: 8 smaller (0.53), 3 larger (0.20)

Measures of Location: Ordinal and Quantitative Data
Advantages of quantiles:
- insensitive to outliers
- applicable already on an ordinal scale
- generally return meaningful values
- intuitively appealing
Disadvantages of quantiles:
- do not use all information, only the ranking
- not necessarily unique

Measures of Location: Quantitative Data
The mean $\bar{x}$ of a sample is the average taken over all sample values, i.e. $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
Advantages of the mean:
- uses all available information
Disadvantages of the mean:
- meaningful only for quantitative measurements
- sensitive to outliers

Female Body-Mass Index (BMI)
For the twelve BMI values listed above: $\bar{x} = \frac{223.5}{12} \approx 18.6$ and, since $n$ is even, $\tilde{x}_{0.50} = \frac{18.2 + 18.8}{2} = 18.5$.
Adding a thirteenth value, Rubens' Venus (1630) with a BMI of 42.6: $\bar{x} = \frac{266.1}{13} \approx 20.5$, whereas $\tilde{x}_{0.50} = 18.8$.
The mean marks
the centre of gravity, or point of balance, of the data.

Measures of Dispersion: Quantitative Data
Range: $R = x_{\max} - x_{\min}$; the range is sensitive to outliers.
Interquartile range: $\mathrm{IQR} = \tilde{x}_{0.75} - \tilde{x}_{0.25}$

Apgar Score (n = 15)
v_i   0  1  2  3  4  5  6  7  8  9  10
h_i   0  0  0  0  1  1  2  2  2  4  3
$v_{\min} = 4$, $v_{\max} = 10$, $R = 6$; $\tilde{x}_{0.25} = 6$, $\tilde{x}_{0.75} = 9$, $\mathrm{IQR} = 3$

With one additional score of 0 (n = 16):
h_i   1  0  0  0  1  1  2  2  2  4  3
$v_{\min} = 0$, $v_{\max} = 10$, $R = 10$; $\tilde{x}_{0.25} = 6$, $\tilde{x}_{0.75} = 9$, $\mathrm{IQR} = 3$

Measures of Dispersion: Quantitative Data
Median deviation: $\mathrm{MAD} = \tilde{y}_{0.50}$ with $y_i = |x_i - \tilde{x}_{0.50}|$
Standard deviation: $s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Female Body-Mass Index (BMI), n = 12: $\bar{x} = 18.6$, $\tilde{x}_{0.50} = 18.5$
(x_[i] and y_[i] denote the values x_i and y_i sorted in ascending order.)
 i   x_i   (x_i − x̄)²   x_[i]   y_i   y_[i]
 1   17.7     0.81      16.8    0.8   0.3
 2   18.2     0.16      17.6    0.3   0.3
 3   16.8     3.24      17.7    1.7   0.6
 4   17.6     1.00      17.8    0.9   0.6
 5   18.8     0.04      17.9    0.3   0.7
 6   19.1     0.25      18.2    0.6   0.8
 7   17.9     0.49      18.8    0.6   0.9
 8   17.8     0.64      19.1    0.7   1.0
 9   20.2     2.56      19.5    1.7   1.1
10   19.6     1.00      19.6    1.1   1.7
11   20.3     2.89      20.2    1.8   1.7
12   19.5     0.81      20.3    1.0   1.8
Sum          13.89
$\mathrm{MAD} = \frac{0.8 + 0.9}{2} = 0.85$, $s = \sqrt{\frac{13.89}{11}} \approx 1.12$

Female Body-Mass Index (BMI) including Rubens' Venus, n = 13: $\bar{x} = 20.5$, $\tilde{x}_{0.50} = 18.8$
 i   x_i   (x_i − x̄)²   y_[i]
 1   17.7     7.84       0.0
 2   18.2     5.29       0.3
 3   16.8    13.69       0.6
 4   17.6     8.41       0.7
 5   18.8     2.89       0.8
 6   19.1     1.96       0.9
 7   17.9     6.76       1.0
 8   17.8     7.29       1.1
 9   20.2     0.09       1.2
10   19.6     0.81       1.4
11   20.3     0.04       1.5
12   19.5     1.00       2.0
13   42.6   488.41      23.8
Sum         544.48
$\mathrm{MAD} = \tilde{y}_{0.50} = 1.00$, $s = \sqrt{\frac{544.48}{12}} \approx 6.74$

Data Description: Histogram
Divide the range of the sample values $x_i$ into $k$ equally sized intervals (bins) of width $(x_{\max} - x_{\min})/k$: $(a_1, a_2]$, $(a_2, a_3]$, ..