Robust Methods in Biostatistics
Stephane Heritier
The George Institute for International Health, University of Sydney, Australia

Eva Cantoni
Department of Econometrics, University of Geneva, Switzerland

Samuel Copt
Merck Serono International, Geneva, Switzerland

Maria-Pia Victoria-Feser
HEC Section, University of Geneva, Switzerland

A John Wiley and Sons, Ltd, Publication

WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Iain M. Johnstone, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg, Harvey Goldstein.
Editors Emeriti: Vic Barnett, J. Stuart Hunter, Jozef L. Teugels

A complete list of the titles in this series appears at the end of this volume.

This edition first published 2009
© 2009 John Wiley & Sons Ltd

Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data
Robust methods in biostatistics / Stephane Heritier ... [et al.].
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-02726-4 (cloth)
1. Biometry–Statistical methods. I. Heritier, Stephane.
[DNLM: 1. Biometry–methods. WA 950 R667 2009]
QH323.5.R615 2009
570.1’5195–dc22
2009008863

A catalogue record for this book is available from the British Library.

ISBN 9780470027264

Set in 10/12pt Times by Sunrise Setting Ltd, Torquay, UK.
Printed in Great Britain by CPI Antony Rowe, Chippenham, Wiltshire.

To Anna, Olivier, Cassandre, Oriane, Sonia, Johannes, Véronique, Sébastien and Raphaël, who contributed in their ways.

Contents

Preface
Acknowledgments

1 Introduction
  1.1 What is Robust Statistics?
  1.2 Against What is Robust Statistics Robust?
  1.3 Are Diagnostic Methods an Alternative to Robust Statistics?
  1.4 How do Robust Statistics Compare with Other Statistical Procedures in Practice?

2 Key Measures and Results
  2.1 Introduction
  2.2 Statistical Tools for Measuring Robustness Properties
    2.2.1 The Influence Function
    2.2.2 The Breakdown Point
    2.2.3 Geometrical Interpretation
    2.2.4 The Rejection Point
  2.3 General Approaches for Robust Estimation
    2.3.1 The General Class of M-estimators
    2.3.2 Properties of M-estimators
    2.3.3 The Class of S-estimators
  2.4 Statistical Tools for Measuring Tests Robustness
    2.4.1 Sensitivity of the Two-sample t-test
    2.4.2 Local Stability of a Test: the Univariate Case
    2.4.3 Global Reliability of a Test: the Breakdown Functions
  2.5 General Approaches for Robust Testing
    2.5.1 Wald Test, Score Test and LRT
    2.5.2 Geometrical Interpretation
    2.5.3 General -type Classes of Tests
    2.5.4 Asymptotic Distributions
    2.5.5 Robustness Properties

3 Linear Regression
  3.1 Introduction
  3.2 Estimating the Regression Parameters
    3.2.1 The Regression Model
    3.2.2 Robustness Properties of the LS and MLE Estimators
    3.2.3 Glomerular Filtration Rate (GFR) Data Example
    3.2.4 Robust Estimators
    3.2.5 GFR Data Example (continued)
  3.3 Testing the Regression Parameters
    3.3.1 Significance Testing
    3.3.2 Diabetes Data Example
    3.3.3 Multiple Hypothesis Testing
    3.3.4 Diabetes Data Example (continued)
  3.4 Checking and Selecting the Model
    3.4.1 Residual Analysis
    3.4.2 GFR Data Example (continued)
    3.4.3 Diabetes Data Example (continued)
    3.4.4 Coefficient of Determination
    3.4.5 Global Criteria for Model Comparison
    3.4.6 Diabetes Data Example (continued)
  3.5 Cardiovascular Risk Factors Data Example

4 Mixed Linear Models
  4.1 Introduction
  4.2 The MLM
    4.2.1 The MLM Formulation
    4.2.2 Skin Resistance Data
    4.2.3 Semantic Priming Data
    4.2.4 Orthodontic Growth Data
  4.3 Classical Estimation and Inference
    4.3.1 Marginal and REML Estimation
    4.3.2 Classical Inference
    4.3.3 Lack of Robustness of Classical Procedures
  4.4 Robust Estimation
    4.4.1 Bounded Influence Estimators
    4.4.2 S-estimators
    4.4.3 MM-estimators
    4.4.4 Choosing the Tuning Constants
    4.4.5 Skin Resistance Data (continued)
  4.5 Robust Inference
    4.5.1 Testing Contrasts
    4.5.2 Multiple Hypothesis Testing of the Main Effects
    4.5.3 Skin Resistance Data Example (continued)
    4.5.4 Semantic Priming Data Example (continued)
    4.5.5 Testing the Variance Components
  4.6 Checking the Model
    4.6.1 Detecting Outlying and Influential Observations
    4.6.2 Prediction and Residual Analysis
  4.7 Further Examples
    4.7.1 Metallic Oxide Data
    4.7.2 Orthodontic Growth Data (continued)
  4.8 Discussion and Extensions

5 Generalized Linear Models
  5.1 Introduction
  5.2 The GLM
    5.2.1 Model Building
    5.2.2 Classical Estimation and Inference for GLM
    5.2.3 Hospital Costs Data Example
    5.2.4 Residual Analysis
  5.3 A Class of M-estimators for GLMs
    5.3.1 Choice of ψ and w(x)
    5.3.2 Fisher Consistency Correction
    5.3.3 Nuisance Parameters Estimation
    5.3.4 IF and Asymptotic Properties
    5.3.5 Hospital Costs Example (continued)
  5.4 Robust Inference
    5.4.1 Significance Testing and CIs
    5.4.2 General Parametric Hypothesis Testing and Variable Selection
    5.4.3 Hospital Costs Data Example (continued)
  5.5 Breastfeeding Data Example
    5.5.1 Robust Estimation of the Full Model
    5.5.2 Variable Selection
  5.6 Doctor Visits Data Example
    5.6.1 Robust Estimation of the Full Model
    5.6.2 Variable Selection
  5.7 Discussion and Extensions
    5.7.1 Robust Hurdle Models for Counts
    5.7.2 Robust Akaike Criterion
    5.7.3 General Cp Criterion for GLMs
    5.7.4 Prediction with Robust Models

6 Marginal Longitudinal Data Analysis
  6.1 Introduction
  6.2 The Marginal Longitudinal Data Model (MLDA) and Alternatives
    6.2.1 Classical Estimation and Inference in MLDA
    6.2.2 Estimators for τ and α
    6.2.3 GUIDE Data Example
    6.2.4 Residual Analysis
  6.3 A Robust GEE-type Estimator