Statistical Inference with Paired Observations and Independent Observations in Two Samples

Total Page:16

File Type:pdf, Size:1020Kb

Statistical Inference with Paired Observations and Independent Observations in Two Samples Statistical inference with paired observations and independent observations in two samples Benjamin Fletcher Derrick A thesis submitted in partial fulfilment of the requirements of the University of the West of England, Bristol, for the degree of Doctor of Philosophy Applied Statistics Group, Engineering Design and Mathematics, Faculty of Environment and Technology, University of the West of England, Bristol February, 2020 Abstract A frequently asked question in quantitative research is how to compare two samples that include some combination of paired observations and unpaired observations. This scenario is referred to as ‘partially overlapping samples’. Most frequently the desired comparison is that of central location. Depend- ing on the context, the research question could be a comparison of means, distributions, proportions or variances. Approaches that discard either the paired observations or the independent observations are customary. Existing approaches evoke much criticism. Approaches that make use of all of the available data are becoming more prominent. Traditional and modern ap- proaches for the analyses for each of these research questions are reviewed. Novel solutions for each of the research questions are developed and explored using simulation. Results show that proposed tests which report a direct measurable difference between two groups provide the best solutions. These solutions advance traditional methods in this area that have remained largely unchanged for over 80 years. An R package is detailed to assist users to per- form these new tests in the presence of partially overlapping samples. Acknowledgments I pay tribute to my colleagues in the Mathematics Cluster and the Applied Statistics Group at the University of the West of England, Bristol. Some intriguing and friendly characters have assisted and entertained during the process. I appreciate all of the hard working and enthusiastic students at the Univer- sity of the West of England, Bristol. Particularly my outstanding final year project students, Jack Pearce and Annalise Ruck, along with additional pub- lication co-authors; Anselma Dobson-McKittrick, Bethan Russ and Antonia Broad. Acknowledgement is given to the journal editors and peer reviewers for their valuable contributions. This includes Denis Cousineau, Claudiu Herteliu, Shlomo Sawilowsky, and numerous reviewers who remain anonymous. I give considerable thanks to the guidance of my supervisory team, Paul White and Deirdre Toher, for their encouragement and help on all matters PhD, and beyond. They have been a pleasure to work with and it is to their credit my achievement is maximised. I am grateful to my friends and family for showing an interest, whether they wanted to or not, and assisting in the proof reading. I am eternally grateful to Mum and Dad for being so supportive and proud. To Dad, for devoting your life to your family, and always believing in me. Contents List of Tables vii List of Figures ix Commonly used parameters xi Commonly used test statistics xii 1 Introduction 1 1.1 Motivation . .1 1.2 Previous research into the comparison of central location for partially overlapping samples . .7 1.2.1 Maximum likelihood estimators . .8 1.2.2 Weighted combination tests . .9 1.2.3 Tests based on a simple mean difference . 11 1.3 Aims and Objectives . 12 1.4 Declaration . 13 2 Literature review and preliminary investigation 14 2.1 History of the t-test . 14 2.2 Assumptions of the t-test . 15 2.3 Violations of the assumptions . 20 2.3.1 Normality assumption . 21 2.3.2 Within sample independence assumption . 24 2.3.3 Equal variances assumption . 25 2.4 Modifying the test to overcome the equal variances assumption 26 i 2.5 Designing a new model . 29 2.5.1 Trimmed means . 29 2.5.2 Transformations . 31 2.6 Non-parametric tests . 32 2.7 Confidence intervals . 35 2.8 Preliminary testing . 36 2.8.1 Preliminary tests for normality . 39 2.8.2 Preliminary tests for equal variances . 41 2.8.3 Significance level for preliminary tests . 43 2.9 Summary . 46 3 Proposed solutions for the comparison of means with par- tially overlapping samples 48 3.1 Derivation of solutions . 48 3.1.1 Partially overlapping samples t-test, equal variances . 49 3.1.2 Partially overlapping samples t-test, not constrained to equal variances . 50 3.1.3 Partially overlapping samples t-test applied to rank data 52 3.1.4 Partially overlapping samples t-test applied to data fol- lowing Inverse Normal Transformation . 52 3.2 Examples of application . 53 4 Methodology 63 4.1 Monte-Carlo methods . 63 4.1.1 Generating normally distributed random variates . 65 4.1.2 Generating non-normal data . 66 4.1.3 Assessing randomness . 66 4.1.4 Correlated variates . 69 4.1.5 Sample size . 70 4.1.6 Number of iterations . 71 4.2 Analysis methodology . 73 5 The comparison of means, for normally distributed data 76 5.1 Simulation parameters and test statistics . 76 ii 5.2 Type I error rates . 77 5.3 Power . 81 5.4 R package . 86 5.4.1 Help manual . 87 5.4.2 Further application . 90 5.5 The performance of the partially overlapping samples t-tests at the limits. 91 5.6 Summary . 98 6 The comparison of two groups, for non-normally distributed data 100 6.1 Simulation parameters and test statistics . 100 6.2 Samples taken from distributions of the same shape . 102 6.3 Samples taken from Normal distributions with unequal variance107 6.4 Samples taken from distributions of differing functional form . 109 6.5 Summary . 110 7 The impact of outliers in the comparison of two samples 111 7.1 Introduction to the outlier debate . 111 7.2 Robustness of tests in the presence of an aberrant observation 113 7.2.1 Independent samples simulation design . 114 7.2.2 Paired samples simulation design . 118 7.2.3 Partially overlapping samples simulation design . 123 7.3 Robustness in the presence of multiple outliers . 127 7.4 Summary . 129 8 The comparison of two samples, for ordinal data 131 8.1 Background . 131 8.2 Example . 136 8.3 Methodology . 137 8.4 Results: Type I error rates . 139 8.5 Results: Power . 144 8.6 Summary . 147 iii 9 The comparison of two samples, with dichotomous data 148 9.1 Current strategies for comparing proportions . 148 9.2 Definition of standard test statistics . 150 9.2.1 Discarding paired observations . 151 9.2.2 Discarding unpaired observations . 152 9.2.3 Combination of the independent and paired tests using all of the available data . 152 9.3 Definition of alternative test statistics using of all of the avail- able data . 153 9.3.1 Proposals using the phi correlation coefficient or the tetrachoric correlation coefficient. 153 9.3.2 Proposal by Choi and Stablein (1982) . 157 9.3.3 Proposals based on the formation of a new correlation coefficient. 157 9.4 Worked example . 158 9.5 Simulation design and results . 160 9.5.1 Simulation design . 160 9.5.2 Comparison of correlation coefficients . 161 9.5.3 Type I error rates . 161 9.5.4 Power . 165 9.5.5 Formation of confidence intervals . 166 9.6 Comparison to t-test solution . 167 9.7 R Package . 168 9.7.1 Help manual . 168 9.7.2 Further application . 171 9.8 Summary . 171 10 The comparison of variances 173 10.1 Background . 173 10.2 Example . 178 10.3 Methodology . 178 10.4 Results . 179 10.5 Summary . 183 iv 11 Conclusions and Further Work 184 11.1 Conclusion . 184 11.2 Further work . 187 11.2.1 Further R releases . 187 11.2.2 Extension to Parallel Randomised Controlled Trials . 187 11.2.3 Multivariate scenario . 190 11.2.4 Recent advances . 190 11.3 Reflection . 192 11.4 Summary . 193 References 195 Appendix 1: Testimonials 217 Appendix 2: Outputs 220 P1 Derrick, B. et al. (2015). “Test statistics for comparing two proportions with partially overlapping samples”. Journal of Applied Quantitative Methods 10 (3), pp. 1–14 . 223 P2 Derrick, B., Toher, D., and White, P. (2016). “Why Welch’s test is Type I error robust”. The Quantitative Methods in Psychology 12 (1), pp. 30–38 . 238 P3 Derrick, B. et al. (2017a). “Test statistics for the comparison of means for two samples which include both paired observa- tions and independent observations.” Journal of Modern Ap- plied Statistical Methods 16 (1), pp. 137–157 . 248 P4 Derrick, B., Toher, D., and White, P. (2017). “How to com- pare the means of two samples that include paired observa- tions and independent observations: A companion to Derrick, Russ, Toher and White (2017)”. The Quantitative Methods in Psychology 13 (2), pp. 120–126 . 272 P5 Derrick, B. et al. (2017b). “The impact of an extreme observa- tion in a paired samples design”. metodološki zvezki-Advances in Methodology and Statistics 14 . 280 v P6 Derrick, B. et al. (2018). “Tests for equality of variances between two samples which contain both paired observations and inde- pendent observations”. Journal of Applied Quantitative Meth- ods 13 (2), pp. 36–47 . 298 P7 Derrick, B., White, P., and Toher, D. (in press). “Parametric and non-parametric tests for the comparison of two samples which both include paired and unpaired observations”. Jour- nal of Modern Applied Statistical Methods ......... 311 P8 Pearce, J. and Derrick, B. (2019). “Preliminary testing: the devil of statistics?” Reinvention: an International Journal of Undergraduate Research 12 . 335 P9 Derrick, B. and White, P. (2017). “Comparing two samples from an individual Likert question”. International Journal of Mathematics and Statistics 18 (3), pp. 1–13 . 352 P10 Derrick, B. and White, P. (2018). “Methods for comparing the responses from a Likert question, with paired observations and independent observations in each of two samples”.
Recommended publications
  • Appendix a Basic Statistical Concepts for Sensory Evaluation
    Appendix A Basic Statistical Concepts for Sensory Evaluation Contents This chapter provides a quick introduction to statistics used for sensory evaluation data including measures of A.1 Introduction ................ 473 central tendency and dispersion. The logic of statisti- A.2 Basic Statistical Concepts .......... 474 cal hypothesis testing is introduced. Simple tests on A.2.1 Data Description ........... 475 pairs of means (the t-tests) are described with worked A.2.2 Population Statistics ......... 476 examples. The meaning of a p-value is reviewed. A.3 Hypothesis Testing and Statistical Inference .................. 478 A.3.1 The Confidence Interval ........ 478 A.3.2 Hypothesis Testing .......... 478 A.3.3 A Worked Example .......... 479 A.1 Introduction A.3.4 A Few More Important Concepts .... 480 A.3.5 Decision Errors ............ 482 A.4 Variations of the t-Test ............ 482 The main body of this book has been concerned with A.4.1 The Sensitivity of the Dependent t-Test for using good sensory test methods that can generate Sensory Data ............. 484 quality data in well-designed and well-executed stud- A.5 Summary: Statistical Hypothesis Testing ... 485 A.6 Postscript: What p-Values Signify and What ies. Now we turn to summarize the applications of They Do Not ................ 485 statistics to sensory data analysis. Although statistics A.7 Statistical Glossary ............. 486 are a necessary part of sensory research, the sensory References .................... 487 scientist would do well to keep in mind O’Mahony’s admonishment: statistical analysis, no matter how clever, cannot be used to save a poor experiment. The techniques of statistical analysis, do however, It is important when taking a sample or designing an serve several useful purposes, mainly in the efficient experiment to remember that no matter how powerful summarization of data and in allowing the sensory the statistics used, the inferences made from a sample scientist to make reasonable conclusions from the are only as good as the data in that sample.
    [Show full text]
  • Towards a Fully Automated Extraction and Interpretation of Tabular Data Using Machine Learning
    UPTEC F 19050 Examensarbete 30 hp August 2019 Towards a fully automated extraction and interpretation of tabular data using machine learning Per Hedbrant Per Hedbrant Master Thesis in Engineering Physics Department of Engineering Sciences Uppsala University Sweden Abstract Towards a fully automated extraction and interpretation of tabular data using machine learning Per Hedbrant Teknisk- naturvetenskaplig fakultet UTH-enheten Motivation A challenge for researchers at CBCS is the ability to efficiently manage the Besöksadress: different data formats that frequently are changed. Significant amount of time is Ångströmlaboratoriet Lägerhyddsvägen 1 spent on manual pre-processing, converting from one format to another. There are Hus 4, Plan 0 currently no solutions that uses pattern recognition to locate and automatically recognise data structures in a spreadsheet. Postadress: Box 536 751 21 Uppsala Problem Definition The desired solution is to build a self-learning Software as-a-Service (SaaS) for Telefon: automated recognition and loading of data stored in arbitrary formats. The aim of 018 – 471 30 03 this study is three-folded: A) Investigate if unsupervised machine learning Telefax: methods can be used to label different types of cells in spreadsheets. B) 018 – 471 30 00 Investigate if a hypothesis-generating algorithm can be used to label different types of cells in spreadsheets. C) Advise on choices of architecture and Hemsida: technologies for the SaaS solution. http://www.teknat.uu.se/student Method A pre-processing framework is built that can read and pre-process any type of spreadsheet into a feature matrix. Different datasets are read and clustered. An investigation on the usefulness of reducing the dimensionality is also done.
    [Show full text]
  • Master Thesis
    Structural Complexity of Manufacturing Systems Layout by Valeria Betzabe Espinoza Vega A Thesis Submitted to the Faculty of Graduate Studies through Industrial and Manufacturing Systems Engineering in Partial Fulfillment of the Requirements for the Degree of Master of Applied Science at the University of Windsor Windsor, Ontario, Canada © 2012 Valeria Espinoza Library and Archives Bibliothèque et Canada Archives Canada Published Heritage Direction du Branch Patrimoine de l'édition 395 Wellington Street 395, rue Wellington Ottawa ON K1A 0N4 Ottawa ON K1A 0N4 Canada Canada Your file Votre référence ISBN: 978-0-494-76377-3 Our file Notre référence ISBN: 978-0-494-76377-3 NOTICE: AVIS: The author has granted a non- L'auteur a accordé une licence non exclusive exclusive license allowing Library and permettant à la Bibliothèque et Archives Archives Canada to reproduce, Canada de reproduire, publier, archiver, publish, archive, preserve, conserve, sauvegarder, conserver, transmettre au public communicate to the public by par télécommunication ou par l'Internet, prêter, telecommunication or on the Internet, distribuer et vendre des thèses partout dans le loan, distrbute and sell theses monde, à des fins commerciales ou autres, sur worldwide, for commercial or non- support microforme, papier, électronique et/ou commercial purposes, in microform, autres formats. paper, electronic and/or any other formats. The author retains copyright L'auteur conserve la propriété du droit d'auteur ownership and moral rights in this et des droits moraux qui protege cette thèse. Ni thesis. Neither the thesis nor la thèse ni des extraits substantiels de celle-ci substantial extracts from it may be ne doivent être imprimés ou autrement printed or otherwise reproduced reproduits sans son autorisation.
    [Show full text]
  • Kwame Nkrumah University of Science and Technology, Kumasi
    KWAME NKRUMAH UNIVERSITY OF SCIENCE AND TECHNOLOGY, KUMASI, GHANA Assessing the Social Impacts of Illegal Gold Mining Activities at Dunkwa-On-Offin by Judith Selassie Garr (B.A, Social Science) A Thesis submitted to the Department of Building Technology, College of Art and Built Environment in partial fulfilment of the requirement for a degree of MASTER OF SCIENCE NOVEMBER, 2018 DECLARATION I hereby declare that this work is the result of my own original research and this thesis has neither in whole nor in part been prescribed by another degree elsewhere. References to other people’s work have been duly cited. STUDENT: JUDITH S. GARR (PG1150417) Signature: ........................................................... Date: .................................................................. Certified by SUPERVISOR: PROF. EDWARD BADU Signature: ........................................................... Date: ................................................................... Certified by THE HEAD OF DEPARTMENT: PROF. B. K. BAIDEN Signature: ........................................................... Date: ................................................................... i ABSTRACT Mining activities are undertaken in many parts of the world where mineral deposits are found. In developing nations such as Ghana, the activity is done both legally and illegally, often with very little or no supervision, hence much damage is done to the water bodies where the activities are carried out. This study sought to assess the social impacts of illegal gold mining activities at Dunkwa-On-Offin, the capital town of Upper Denkyira East Municipality in the Central Region of Ghana. The main objectives of the research are to identify factors that trigger illegal mining; to identify social effects of illegal gold mining activities on inhabitants of Dunkwa-on-Offin; and to suggest effective ways in curbing illegal mining activities. Based on the approach to data collection, this study adopts both the quantitative and qualitative approach.
    [Show full text]
  • Outlier Impact and Accommodation on Power Hongjing Liao Beijing Foreign Studies University, [email protected]
    Journal of Modern Applied Statistical Methods Volume 16 | Issue 1 Article 15 5-1-2017 Outlier Impact and Accommodation on Power Hongjing Liao Beijing Foreign Studies University, [email protected] Yanju Li Western Carolina University, Cullowhee, NC, [email protected] Gordon P. Brooks Ohio University, [email protected] Follow this and additional works at: http://digitalcommons.wayne.edu/jmasm Part of the Applied Statistics Commons, Social and Behavioral Sciences Commons, and the Statistical Theory Commons Recommended Citation Liao, H., Yanju, Li., & Brooks, G. P. (2017). Outlier impact and accommodation on power. Journal of Modern Applied Statistical Methods, 16(1), 261-278. doi: 10.22237/jmasm/1493597640 This Regular Article is brought to you for free and open access by the Open Access Journals at DigitalCommons@WayneState. It has been accepted for inclusion in Journal of Modern Applied Statistical Methods by an authorized editor of DigitalCommons@WayneState. Outlier Impact and Accommodation on Power Cover Page Footnote This research was supported by School of English for Specific urP poses, Beijing Foreign Studies University, grant ZJ1513 This regular article is available in Journal of Modern Applied Statistical Methods: http://digitalcommons.wayne.edu/jmasm/vol16/ iss1/15 Journal of Modern Applied Statistical Methods Copyright © 2017 JMASM, Inc. May 2017, Vol. 16, No. 1, 261-278. ISSN 1538 − 9472 doi: 10.22237/jmasm/1493597640 Outlier Impact and Accommodation on Power Hongjing Liao Yanju Li Gordon P. Brooks Beijing Foreign Studies University Western Carolina University Ohio University Beijing, China Cullowhee, NC Athens, OH The outliers’ influence on power rates in ANOVA and Welch tests at various conditions was examined and compared with the effectiveness of nonparametric methods and Winsorizing in minimizing the impact of outliers.
    [Show full text]
  • One Sample T Test
    Copyright c October 17, 2020 by NEH One Sample t Test Nathaniel E. Helwig University of Minnesota 1 How Beer Changed the World William Sealy Gosset was a chemist and statistician who was employed as the Head Exper- imental Brewer at Guinness in the early 1900s. As a part of his job, Gosset was responsible for taking samples of beer and performing various analyses to ensure that the beer was of the proper composition and quality. While at Guinness, Gosset taught himself experimental design and statistical analysis (which were new and exciting fields at the time), and he even spent some time in 1906{1907 studying in Karl Pearson's laboratory. Gosset's work often involved collecting a small number of samples, and testing hypotheses about the mean of the population from which the samples were drawn. For example, in the brewery, Gosset may want to test whether the population mean amount of some ingredient (e.g., barley) was equal to the intended value. And, on the farm, Gosset may want to test how different farming techniques and growing conditions affect the barley yields. Through this work, Gos- set noticed that the normal distribution was not adequate for testing hypotheses about the mean in small samples of data with an unknown variance. 2 2 1 Pn As a reminder, if X ∼ N(µ, σ ), thenx ¯ ∼ N(µ, σ =n) wherex ¯ = n i=1 xi, which implies x¯ − µ Z = p ∼ N(0; 1) σ= n i.e., the Z test statistic follows a standard normal distribution. However, in practice, the 2 2 1 Pn 2 population variance σ is often unknown, so the sample variance s = n−1 i=1(xi − x¯) must be used in its place.
    [Show full text]
  • Multivariate Process Control with Radar Charts
    SEMATECH 1996 Statistical Methods Symposium, San Antonio Multivariate Process Control with Radar Charts Dave Trindade, Ph.D. Senior Fellow, Applied Statistics AMD, Sunnyvale, CA Introduction ! Multivariate statistical process control ! Monitoring with separate X control charts ! Hotelling T2 statistic ! Radar or web plots ! Microsoft EXCEL capabilities – Matrix manipulation – Graphical Outline of Presentation ! Example of process with two quality characteristics per sample ! Univariate and multivariate considerations ! Calculations and graphical analysis in EXCEL ! Implications Vocabulary ! Multivariate ! Matrix representation ! Hotelling T2 statistic ! Radar or web plots ! Subgroups Vs. individual observations Multivariate SPC ! Consider a process in which two quality characteristics (p = 2) are measured on each of four samples (n = 4) within a subgroup. ! Data on twenty subgroups are available (k = 20) ! Data is from Thomas P. Ryan, Statistical Methods for Quality Improvement Sample Data for Each Subgroup Subgroup Number First Variable X Second Variable X 1 7284794923302810 2 5687334214318 9 3 5573226013226 16 4 44 80 54 74 9 28 15 25 5 9726485836101415 6 8389916230353618 7 4766535812181416 8 8850846931113019 9 5747414614108 10 10 26 39 52 48 7 11 35 30 11 46 27 63 34 10 8 19 9 12 49 62 78 87 11 20 27 31 13 71 63 82 55 22 16 31 15 14 71 58 69 70 21 19 17 20 15 67 69 70 94 18 19 18 35 16 55 63 72 49 15 16 20 12 17 49 51 55 76 13 14 16 26 18 72 80 61 59 22 28 18 17 19 61 74 62 57 19 20 16 14 20 35 38 41 46 10 11 13 16 Multivariate observations are (72,23), (84,30), …, (46, 16).
    [Show full text]
  • Assessing the Environmental Adaptation of Wildlife And
    Assessing the Environmental Adaptation of Wildlife and Production Animals Production and Wildlife of Adaptation Assessing Environmental the Assessing the Environmental Adaptation of Wildlife and • Edward Narayan Edward • Production Animals Applications of Physiological Indices and Welfare Assessment Tools Edited by Edward Narayan Printed Edition of the Special Issue Published in Animals www.mdpi.com/journal/animals Assessing the Environmental Adaptation of Wildlife and Production Animals: Applications of Physiological Indices and Welfare Assessment Tools Assessing the Environmental Adaptation of Wildlife and Production Animals: Applications of Physiological Indices and Welfare Assessment Tools Editor Edward Narayan MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin Editor Edward Narayan The University of Queensland Australia Editorial Office MDPI St. Alban-Anlage 66 4052 Basel, Switzerland This is a reprint of articles from the Special Issue published online in the open access journal Animals (ISSN 2076-2615) (available at: https://www.mdpi.com/journal/animals/special issues/ environmental adaptation). For citation purposes, cite each article independently as indicated on the article page online and as indicated below: LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. Journal Name Year, Volume Number, Page Range. ISBN 978-3-0365-0142-0 (Hbk) ISBN 978-3-0365-0143-7 (PDF) © 2021 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher areproperly credited, which ensures maximum dissemination and a wider impact of our publications. The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.
    [Show full text]
  • General Linear Models - Part II
    Introduction to General and Generalized Linear Models General Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby October 2010 Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall October 2010 1 / 32 Today Test for model reduction Type I/III SSQ Collinearity Inference on individual parameters Confidence intervals Prediction intervals Residual analysis Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall October 2010 2 / 32 Tests for model reduction Assume that a rather comprehensive model (a sufficient model) H1 has been formulated. Initial investigation has demonstrated that at least some of the terms in the model are needed to explain the variation in the response. The next step is to investigate whether the model may be reduced to a simpler model (corresponding to a smaller subspace),. That is we need to test whether all the terms are necessary. Henrik Madsen Poul Thyregod (IMM-DTU) Chapman & Hall October 2010 3 / 32 Successive testing, type I partition Sometimes the practical problem to be solved by itself suggests a chain of hypothesis, one being a sub-hypothesis of the other. In other cases, the statistician will establish the chain using the general rule that more complicated terms (e.g. interactions) should be removed before simpler terms. In the case of a classical GLM, such a chain of hypotheses n corresponds to a sequence of linear parameter-spaces, Ωi ⊂ R , one being a subspace of the other. n R ⊆ ΩM ⊂ ::: ⊂ Ω2 ⊂ Ω1 ⊂ R ; where Hi : µ 2
    [Show full text]
  • Regularization of Case-Specific Parameters for Robustness And
    Regularization of Case-Specific Parameters for Robustness and Efficiency Yoonkyung Lee, Steven N. MacEachern, and Yoonsuh Jung The Ohio State University, USA Summary. Regularization methods allow one to handle a variety of inferential problems where there are more covariates than cases. This allows one to consider a potentially enormous number of covari- ates for a problem. We exploit the power of these techniques, supersaturating models by augmenting the “natural” covariates in the problem with an additional indicator for each case in the data set. We attach a penalty term for these case-specific indicators which is designed to produce a desired effect. For regression methods with squared error loss, an ℓ1 penalty produces a regression which is robust to outliers and high leverage cases; for quantile regression methods, an ℓ2 penalty decreases the variance of the fit enough to overcome an increase in bias. The paradigm thus allows us to robustify procedures which lack robustness and to increase the efficiency of procedures which are robust. We provide a general framework for the inclusion of case-specific parameters in regularization problems, describing the impact on the effective loss for a variety of regression and classification problems. We outline a computational strategy by which existing software can be modified to solve the augmented regularization problem, providing conditions under which such modification will converge to the opti- mum solution. We illustrate the benefits of including case-specific parameters in the context of mean regression and quantile regression through analysis of NHANES and linguistic data sets. Keywords: Case indicator; Large margin classifier; LASSO; Leverage point; Outlier; Penal- ized method; Quantile regression 1.
    [Show full text]
  • Statistical Analysis Plan Revisions Version 1.0 – 16 October 2017
    Risankizumab (ABBV-066) M16-002 (BI 1311.5) – Statistical Analysis Plan Revisions Version 1.0 – 16 October 2017 3.0 Introduction This document is an explanation of differences between the analysis performed after final database lock and those detailed in the SAP. The changes include: ● Section 6.4: Updates to the visit window for mTSS. Rationale: Updated window more accurately encompasses observations appropriate to label as baseline. ● Section 6.5: mTSS and PsAMRIS removed from 6.5 Missing Data Handling. Rationale: No subject level imputation will be performed for continuous imaging data. ● Section 7.1: Removed statistical testing between placebo arm and the mixed two arms at baseline. Rationale: No statistical testing for baseline characteristics. ● Section 9.0: Planned number of visits when injection planned has been capped at 5. Rationale: No subject should have more than 5 planned visits. ● Section 10.1: Table 12 has been updated. Rationale: Some efficacy endpoints have been added and some analyses have changed. ● Section 10.6 List of further efficacy endpoints has been updated. Rationale: Some efficacy endpoints have been added and some analyses have changed ● Section 10.9.10: The addition of a PASI LOCF sensitivity analysis. Rationale: Sensitivity analysis added due to the presence of PASI missing data. ● Section 10.9.12: Updates to what defines a dactylic digit. Rationale: Added a missing component of the definition of dactylitis. ● Section 10.9.20: Updates to the scoring and missing data rules for mTSS. Rationale: Two separate methods of analysis will be performed for mTSS. 2 Risankizumab (ABBV-066) M16-002 (BI 1311.5) – Statistical Analysis Plan Revisions Version 1.0 – 16 October 2017 ● Section 10.9.21: Updated PsAMRIS analysis to be descriptive statistics only.
    [Show full text]
  • Security Systems Services World Report
    Security Systems Services World Report established in 1974, and a brand since 1981. www.datagroup.org Security Systems Services World Report Database Ref: 56162 This database is updated monthly. Security Systems Services World Report SECURITY SYSTEMS SERVICES WORLD REPORT The Security systems services Report has the following information. The base report has 59 chapters, plus the Excel spreadsheets & Access databases specified. This research provides World Data on Security systems services. The report is available in several Editions and Parts and the contents and cost of each part is shown below. The Client can choose the Edition required; and subsequently any Parts that are required from the After-Sales Service. Contents Description ....................................................................................................................................... 5 REPORT EDITIONS ........................................................................................................................... 6 World Report ....................................................................................................................................... 6 Regional Report ................................................................................................................................... 6 Country Report .................................................................................................................................... 6 Town & Country Report ......................................................................................................................
    [Show full text]