Sample Size Calculations

MGH/MPH301 H19 Epidemiology and Biostatistics with special reference to Social Epidemiology
ADNAN NOOR BALOCH
SCHOOL OF PUBLIC HEALTH AND COMMUNITY MEDICINE
THE SAHLGRENSKA ACADEMY
2019-09-10

Course information
• Schedule – canvas
• Teaching material – canvas
• Computer lab – canvas
• Computer session
  – Be familiar with SPSS before
  – Read through instructions before
  – Due dates
• Examination: answering research questions by analyzing data

The computer sessions report
• CS_1 due 11 September
• CS_2 due 20 September
• CS_3 due 25 September
• CS_4 due 2 October
• CS_5 due 9 October
• CS_6 due 18 October

Tips on books & online resources
1. Medical statistics: A textbook for the health sci. (Campbell; Machin; Walters) E-book @ GU-Lib
2. SPSS for Applied Sciences (E-book @ GU-Library)
3. Statistics at Square One (BMJ Publishing Group)
4. Online resource to learn SPSS
5. Online resource to learn STATA
6. Regression Methods in Biostatistics (Vittinghoff; Glidden; Shiboski; McCulloch) E-book @ GU-Library

LECTURE 1
A) CLASSIFICATION OF VARIABLES AND THE CHOICE OF ANALYSIS
B) SAMPLE SIZE CALCULATIONS

What is statistics?
• The science that studies the methods necessary to compile, analyze and interpret data.
• In research, statistical theory can be used as a help when constructing studies, collecting data and drawing conclusions.
• Medical statistics deals with applications of statistics in medicine, health sciences and epidemiology.

Why use statistics? Population and sample
• There is variation in the results
• Practically (and/or ethically) impossible to collect data from the whole population
• How to deal with uncertainty in a sample?
• How to use what we observe in a sample to make inferences about the population?

Population and sample
(Based on figure 1.2 in the JB)

Population parameters
• Mean value: μ (e.g. average survival time)
• Variance: σ²

Inference (“an educated guess”)
• Sample data are used to draw conclusions about the population, i.e. the sample statistics are estimates of the population parameters

Sample statistics
• Mean value: x̄
• Sample variance: s²

Steps when planning a study

Planning (sample size determination)
How many individuals should be included is a very important issue in the design (planning) of a study. Factors to consider are:
• The minimum difference of interest to be detected, or the precision of the estimate.
• The degree of variability among the individuals.
• The statistical power, i.e. the ability to detect the smallest difference of interest.

Variables used for different purposes
Direct object of interest
• Outcome variable (response/dependent): blood pressure, survival, incidence, quality of life
Indirect object of interest (can affect the outcome variable)
• Treatment variable (predictor/independent/covariate): treatment group, dose, frequency, date of treatment
• Background variables (independent variable or covariate): age, sex, clinical variables, education, etc.

Data collection: types of variables & Levels of measurement

Types of scales
• Nominal scale: name, gender, color, brand
  – Is the ”lowest” level. No particular order
• Ordinal scale: Good-Better-Best, age category
  – Has a natural order
• Interval scale: temperature
  – Has a natural order and a defined “distance” (negative, 0, positive)
• Ratio scale: weights
  – Has a natural order, a “distance” and a “zero value”

Statistical inference
Statistical inference is drawing conclusions about the population based on characteristics of the sample. There are two types of statistical inference:
1. Hypothesis testing (using a test/CI)
2. Estimation (confidence interval)

Hypothesis testing
The idea is that we have a question about the population that we want to try to answer with a statistical test. A requirement of any statistical test is to formulate a null hypothesis (H0) and an alternative hypothesis (H1).

Hypothesis testing
1. Formulate
   i. Null hypothesis (H0): often cautiously formulated; can never be proven; considered true until disproved
   ii. Alternative hypothesis (H1): includes everything except the null hypothesis
2. Test the hypothesis using a statistical test, which results in a p-value/confidence interval
3. If the p-value is below the level of significance (decided a priori), reject H0; otherwise we do not have enough evidence to reject H0

Parametric and non-parametric tests
The choice depends on the type (quantitative, qualitative) and the distribution of the outcome variable
• Parametric tests: the outcome variable is, e.g., normally distributed; mean and standard deviation are parameters. You test assumptions about the parameters at population level (ex: mean)
• Non-parametric tests: you don’t know (can’t assume) any distribution for the outcome variable. You test assumptions about the population independent of any distribution (ex: median or the whole distribution)

Discovering Statistics Using IBM SPSS Statistics by Andy Field, 5th Edition, ISBN-13: 978-1-5264-1952-1

Type of errors

                      True state of nature
Our decision          H0 is true            H1 is true
Do not reject H0      Correct decision      Type-2 error (β)
Reject H0             Type-1 error (α)      Correct decision (power) (1 - β)

• Type 1 error: rejecting H0 when H0 is true (“False Positive”). Its probability is α.
• Type 2 error: not rejecting H0 when H0 is false (“False Negative”). Its probability is β.
• The power of a test (1 - β) is the probability of rejecting H0 when H0 is false.

Significance Level
The significance level (α) should be chosen before any analyses are done. It specifies the highest acceptable risk of erroneously rejecting H0 (i.e. making a type 1 error).
E.g. the significance level of a 95% CI is 0.05 (1 - 0.95) or 5%. We accept that 5% of our CIs won't cover the true value, and hence will lead us to a wrong conclusion.

Statistical Power
In a study you want as high a power as possible. There are two ways to increase the power:
1. Select a higher significance level (α)
2. Collect a larger sample
The problem is that:
1. You don’t usually want a higher significance level (why not?)
2. It can be expensive to collect a larger sample

Experimental Design
Design of Experiments is the area of statistics that helps us plan experiments and decide how to gather data to achieve good (or optimal) inference
1. How large does a sample need to be so that a confidence interval will be no wider than a given size?
2. How large does a sample need to be so that a hypothesis test will have a low p-value if a certain alternative hypothesis is true?

Experimental Design
Sample size is dependent on
1. How large are the effects (differences) we are looking for? (d or Δ)
2. How big is the variation (std. deviation) in the data?
3. Significance level (Type-1 error)
4. Power (usually 80% or 90%)
5. Study design: balanced/unbalanced; cohort with 2 or more than 2 groups to compare; longitudinal data; number of covariates (explanatory / independent variables), etc.

Sample size calculation
The needed sample size is calculated under the assumption that a specific alternative hypothesis is true, i.e. that the true effect equals the smallest effect you want to detect.
• You have to make assumptions about:
  – The size of the standard deviation.
    • Often based on earlier studies with the same variable.
• Make the following choices in advance
  – The smallest effect you want to detect, d or δ.
  – Which method to use
    • e.g. t-test
  – Significance level; the lower it is, the larger the sample
    • Often 5%, i.e. α = 0.05
  – The intended power
    • Often 80%, i.e. 1-β = 0.80.
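
How power depends on the significance level and the sample size can be illustrated with a small calculation. The sketch below is not from the lecture: it assumes a two-sided test comparing two independent groups of equal size, a known common standard deviation, and a normal approximation (a t-test would give slightly lower power). The function name `approx_power` and the example values (difference 2, standard deviation 4) are chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test (assumed setup)."""
    std_normal = NormalDist()
    # Critical value for a two-sided test at level alpha
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means
    se = sigma * sqrt(2 / n_per_group)
    # True difference expressed in standard-error units
    shift = delta / se
    # Probability that the test statistic lands in either rejection region
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# A higher significance level gives higher power (same sample size)
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha:.2f}, n=64:  power={approx_power(2, 4, 64, alpha):.3f}")

# A larger sample gives higher power (same significance level)
for n in (32, 64, 128):
    print(f"alpha=0.05, n={n}: power={approx_power(2, 4, n):.3f}")
```

Under these assumptions, both routes to higher power named in the lecture show up as increasing numbers in the printed output.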

Sample size calculation – 2 sample mean
Let's say that I'm going to do a study where I want to be at least 80% sure to find an effect of size d (or larger), if I use a test with a 5% significance level.
This means that for an effect (group difference) of size d, I want 1-β = 0.8 and α = 0.05.
With these criteria I can, before the study starts, calculate how many participants I need (i.e. the size of n).

Standardized effect size
One important aspect is the variability in the outcome variable
• E.g. if the outcome is similar within both groups (low std. deviation), a smaller sample is enough to detect smaller effect sizes
• It is common practice to use the standardized effect size (Δ), which is

$$\Delta = \frac{\delta}{\sigma} = \frac{\text{Anticipated difference}}{\text{Anticipated std deviation}}$$

Campbell, M. J., Machin, D., & Walters, S. J. Medical statistics : a textbook for the health sciences.

Standardized effect size
• 0.20 (Small-medium): An average member of the intervention group had a better outcome than 60% of the control group members
• 0.50 (Medium-large): An average member of the intervention group had a better outcome than 70% of the control group members
• 0.80 (Large): An average member of the intervention group had a better outcome than 80% of the control group members
Cohen J, Statistical Power Analysis for the Behavioral Sciences, 2nd Edition (1988)

Sample size calculation for mean difference
We want to calculate the needed sample size for detecting a difference between two independent groups:

$$m = \frac{16}{\Delta^2}$$

for 2-sided α = 0.05 and power of 80% (formula 14.1 on p 267*)

Example: a trial of cognitive behavioural therapy.
• Outcome = Hospital Anxiety and Depression scale (HADS) with values 0 (not anxious or distressed) to 21 (very anxious or distressed).
• A change of 2 points is clinically important.
• From previously published studies, standard deviation (σ) = 4 points.
• Δ = δ/σ = 2/4 (a ‘moderate’ effect) gives m = __ patients per group or 2__ patients in all.
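
The rule m = 16/Δ² and the percentages in the standardized effect size list can be checked with a few lines of Python. This is a minimal sketch, not part of the lecture: the function names are hypothetical, rounding m up to the next whole patient is an added convention, and the percentile calculation assumes normally distributed outcomes with the same standard deviation in both groups.

```python
from math import ceil
from statistics import NormalDist


def standardized_effect(delta, sigma):
    """Standardized effect size: anticipated difference / anticipated std deviation."""
    return delta / sigma


def sample_size_rule_of_16(delta, sigma):
    """Patients per group from m = 16 / Delta^2 (2-sided alpha = 0.05, power 80%)."""
    effect = standardized_effect(delta, sigma)
    # Round up, since a fraction of a patient cannot be recruited
    return ceil(16 / effect ** 2)


def share_of_controls_below_average_treated(effect):
    """Share of the control group below the average treated member,
    assuming normal outcomes with equal std deviation in both groups."""
    return NormalDist().cdf(effect)


# HADS example from the lecture: difference of 2 points, std deviation of 4 points
m = sample_size_rule_of_16(delta=2, sigma=4)
print("per group:", m, "in all:", 2 * m)

# Compare with the rounded 60% / 70% / 80% figures quoted for Cohen's effect sizes
for effect in (0.2, 0.5, 0.8):
    print(effect, round(100 * share_of_controls_below_average_treated(effect), 1), "%")
```

Because m is inversely proportional to Δ², halving the standardized effect size quadruples the required sample size under this rule.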

*Another formula
We want to calculate the needed sample size for detecting a difference between two independent groups with the same standard deviation.

$$m = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2} = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{\delta^2}$$

• z_k: the kth percentile of the standard normal distribution (i.e. the critical value in a normal distribution table for probability k)
• σ: the assumed standard deviation
• m: the required sample size per group
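
The general formula, m = 2 (z_{1-α/2} + z_{1-β})² σ² / δ², can be evaluated directly with Python's standard library. The following is a sketch under the stated assumptions (two independent groups of equal size, same standard deviation, normal approximation); the function names are hypothetical and rounding up to a whole patient is an added convention.

```python
from math import ceil
from statistics import NormalDist


def z_multiplier(alpha=0.05, power=0.80):
    """The factor 2 * (z_{1-alpha/2} + z_{1-beta})^2 from the formula."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # power = 1 - beta
    return 2 * (z_alpha + z_beta) ** 2


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Required patients per group, rounded up to a whole number."""
    return ceil(z_multiplier(alpha, power) * sigma ** 2 / delta ** 2)


# The multiplier for alpha = 0.05 and 80% power is close to 16,
# which is where the simpler rule m = 16 / Delta^2 comes from
print(round(z_multiplier(0.05, 0.80), 2))

# HADS example: difference of 2 points, std deviation of 4 points
print(sample_size_per_group(delta=2, sigma=4, power=0.80))
print(sample_size_per_group(delta=2, sigma=4, power=0.90))
```

The exact multiplier is slightly below 16, so this formula gives a marginally smaller sample than the rounded rule; raising the power to 90% increases the multiplier and therefore the sample size.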
