Statistical Inference with Paired Observations and Independent Observations in Two Samples
Benjamin Fletcher Derrick

A thesis submitted in partial fulfilment of the requirements of the University of the West of England, Bristol, for the degree of Doctor of Philosophy

Applied Statistics Group, Engineering Design and Mathematics, Faculty of Environment and Technology, University of the West of England, Bristol

February, 2020

Abstract

A frequently asked question in quantitative research is how to compare two samples that include some combination of paired observations and unpaired observations. This scenario is referred to as ‘partially overlapping samples’. Most frequently the desired comparison is that of central location. Depending on the context, the research question could be a comparison of means, distributions, proportions or variances. Approaches that discard either the paired observations or the independent observations are customary. Existing approaches evoke much criticism. Approaches that make use of all of the available data are becoming more prominent. Traditional and modern approaches for the analyses for each of these research questions are reviewed. Novel solutions for each of the research questions are developed and explored using simulation. Results show that proposed tests which report a direct measurable difference between two groups provide the best solutions. These solutions advance traditional methods in this area that have remained largely unchanged for over 80 years. An R package is detailed to assist users to perform these new tests in the presence of partially overlapping samples.

Acknowledgments

I pay tribute to my colleagues in the Mathematics Cluster and the Applied Statistics Group at the University of the West of England, Bristol. Some intriguing and friendly characters have assisted and entertained during the process.
I appreciate all of the hard working and enthusiastic students at the University of the West of England, Bristol. Particularly my outstanding final year project students, Jack Pearce and Annalise Ruck, along with additional publication co-authors; Anselma Dobson-McKittrick, Bethan Russ and Antonia Broad.

Acknowledgement is given to the journal editors and peer reviewers for their valuable contributions. This includes Denis Cousineau, Claudiu Herteliu, Shlomo Sawilowsky, and numerous reviewers who remain anonymous.

I give considerable thanks to the guidance of my supervisory team, Paul White and Deirdre Toher, for their encouragement and help on all matters PhD, and beyond. They have been a pleasure to work with and it is to their credit my achievement is maximised.

I am grateful to my friends and family for showing an interest, whether they wanted to or not, and assisting in the proof reading.

I am eternally grateful to Mum and Dad for being so supportive and proud. To Dad, for devoting your life to your family, and always believing in me.

Contents

List of Tables
List of Figures
Commonly used parameters
Commonly used test statistics
1 Introduction
  1.1 Motivation
  1.2 Previous research into the comparison of central location for partially overlapping samples
    1.2.1 Maximum likelihood estimators
    1.2.2 Weighted combination tests
    1.2.3 Tests based on a simple mean difference
  1.3 Aims and Objectives
  1.4 Declaration
2 Literature review and preliminary investigation
  2.1 History of the t-test
  2.2 Assumptions of the t-test
  2.3 Violations of the assumptions
    2.3.1 Normality assumption
    2.3.2 Within sample independence assumption
    2.3.3 Equal variances assumption
  2.4 Modifying the test to overcome the equal variances assumption
  2.5 Designing a new model
    2.5.1 Trimmed means
    2.5.2 Transformations
  2.6 Non-parametric tests
  2.7 Confidence intervals
  2.8 Preliminary testing
    2.8.1 Preliminary tests for normality
    2.8.2 Preliminary tests for equal variances
    2.8.3 Significance level for preliminary tests
  2.9 Summary
3 Proposed solutions for the comparison of means with partially overlapping samples
  3.1 Derivation of solutions
    3.1.1 Partially overlapping samples t-test, equal variances
    3.1.2 Partially overlapping samples t-test, not constrained to equal variances
    3.1.3 Partially overlapping samples t-test applied to rank data
    3.1.4 Partially overlapping samples t-test applied to data following Inverse Normal Transformation
  3.2 Examples of application
4 Methodology
  4.1 Monte-Carlo methods
    4.1.1 Generating normally distributed random variates
    4.1.2 Generating non-normal data
    4.1.3 Assessing randomness
    4.1.4 Correlated variates
    4.1.5 Sample size
    4.1.6 Number of iterations
  4.2 Analysis methodology
5 The comparison of means, for normally distributed data
  5.1 Simulation parameters and test statistics
  5.2 Type I error rates
  5.3 Power
  5.4 R package
    5.4.1 Help manual
    5.4.2 Further application
  5.5 The performance of the partially overlapping samples t-tests at the limits
  5.6 Summary
6 The comparison of two groups, for non-normally distributed data
  6.1 Simulation parameters and test statistics
  6.2 Samples taken from distributions of the same shape
  6.3 Samples taken from Normal distributions with unequal variance
  6.4 Samples taken from distributions of differing functional form
  6.5 Summary
7 The impact of outliers in the comparison of two samples
  7.1 Introduction to the outlier debate
  7.2 Robustness of tests in the presence of an aberrant observation
    7.2.1 Independent samples simulation design
    7.2.2 Paired samples simulation design
    7.2.3 Partially overlapping samples simulation design
  7.3 Robustness in the presence of multiple outliers
  7.4 Summary
8 The comparison of two samples, for ordinal data
  8.1 Background
  8.2 Example
  8.3 Methodology
  8.4 Results: Type I error rates
  8.5 Results: Power
  8.6 Summary
9 The comparison of two samples, with dichotomous data
  9.1 Current strategies for comparing proportions
  9.2 Definition of standard test statistics
    9.2.1 Discarding paired observations
    9.2.2 Discarding unpaired observations
    9.2.3 Combination of the independent and paired tests using all of the available data
  9.3 Definition of alternative test statistics using all of the available data
    9.3.1 Proposals using the phi correlation coefficient or the tetrachoric correlation coefficient
    9.3.2 Proposal by Choi and Stablein (1982)
    9.3.3 Proposals based on the formation of a new correlation coefficient
  9.4 Worked example
  9.5 Simulation design and results
    9.5.1 Simulation design
    9.5.2 Comparison of correlation coefficients
    9.5.3 Type I error rates
    9.5.4 Power
    9.5.5 Formation of confidence intervals
  9.6 Comparison to t-test solution
  9.7 R Package
    9.7.1 Help manual
    9.7.2 Further application
  9.8 Summary
10 The comparison of variances
  10.1 Background
  10.2 Example
  10.3 Methodology
  10.4 Results
  10.5 Summary
11 Conclusions and Further Work
  11.1 Conclusion
  11.2 Further work
    11.2.1 Further R releases
    11.2.2 Extension to Parallel Randomised Controlled Trials
    11.2.3 Multivariate scenario
    11.2.4 Recent advances
  11.3 Reflection
  11.4 Summary
References
Appendix 1: Testimonials
Appendix 2: Outputs
  P1 Derrick, B. et al. (2015). “Test statistics for comparing two proportions with partially overlapping samples”. Journal of Applied Quantitative Methods 10 (3), pp. 1–14.
  P2 Derrick, B., Toher, D., and White, P. (2016). “Why Welch’s test is Type I error robust”. The Quantitative Methods in Psychology 12 (1), pp. 30–38.
  P3 Derrick, B. et al. (2017a). “Test statistics for the comparison of means for two samples which include both paired observations and independent observations”. Journal of Modern Applied Statistical Methods 16 (1), pp. 137–157.
  P4 Derrick, B., Toher, D., and White, P. (2017). “How to compare the means of two samples that include paired observations and independent observations: A companion to Derrick, Russ, Toher and White (2017)”. The Quantitative Methods in Psychology 13 (2), pp. 120–126.
  P5 Derrick, B. et al. (2017b). “The impact of an extreme observation in a paired samples design”. Metodološki zvezki - Advances in Methodology and Statistics 14.
  P6 Derrick, B. et al. (2018). “Tests for equality of variances between two samples which contain both paired observations and independent observations”. Journal of Applied Quantitative Methods 13 (2), pp. 36–47.
  P7 Derrick, B., White, P., and Toher, D. (in press). “Parametric and non-parametric tests for the comparison of two samples which both include paired and unpaired observations”. Journal of Modern Applied Statistical Methods.
  P8 Pearce, J. and Derrick, B. (2019). “Preliminary testing: the devil of statistics?” Reinvention: an International Journal of Undergraduate Research 12.
  P9 Derrick, B. and White, P. (2017). “Comparing two samples from an individual Likert question”. International Journal of Mathematics and Statistics 18 (3), pp. 1–13.
  P10 Derrick, B. and White, P. (2018). “Methods for comparing the responses from a Likert question, with paired observations and independent observations in each of two samples”.