ASCERTAINMENT IN TWO-PHASE SAMPLING DESIGNS FOR SEGREGATION AND LINKAGE ANALYSIS

by
GUOHUA ZHU

Submitted in partial fulfillment of the requirements
For the degree of Doctor of Philosophy

Dissertation Advisor: Dr. Robert C. Elston

Department of Epidemiology and Biostatistics
CASE WESTERN RESERVE UNIVERSITY
May, 2005

CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the dissertation of ______________________________ candidate for the Ph.D. degree *.

(signed) ______________________________ (chair of the committee)
______________________________
______________________________
______________________________
______________________________
______________________________
(date) ______________________________

*We also certify that written approval has been obtained for any proprietary material contained therein.

TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
ABSTRACT

CHAPTER I. LITERATURE REVIEW AND STATEMENT OF THE PROBLEM
1.1 Introduction
1.2 The Ascertainment Problem in Segregation Analysis
    1.2.1 Segregation Analysis
    1.2.2 The Ascertainment Problem in Segregation Analysis
        1.2.2.1 The Ascertainment Problem in the Segregation Analysis of Sibships
        1.2.2.2 Correcting for Ascertainment Bias in the Analysis of Sibships
        1.2.2.3 Correcting for Ascertainment in Pedigree Analysis
1.3 The Ascertainment Problem in Linkage Analysis
    1.3.1 Linkage Analysis
        1.3.1.1 Model-Based Linkage Analysis
        1.3.1.2 Model-Free Linkage Analysis
    1.3.2 The Ascertainment Problem in Linkage Analysis
1.4 The Ascertainment Problem in Joint Segregation and Linkage Analysis
    1.4.1 Joint Segregation and Linkage Analysis
    1.4.2 The Ascertainment Problem in Joint Segregation and Linkage Analysis
1.5 Optimum Study Design and Cost Effectiveness
1.6 Weighted Distributions and Two-Phase (Sampling) Designs
    1.6.1 Weighted Distributions
    1.6.2 Two-Phase (Sampling) Designs
1.7 Statement of the Problem

CHAPTER II. TWO-PHASE DESIGN LIKELIHOODS FOR SEGREGATION ANALYSIS IN NUCLEAR FAMILIES
2.1 The Model and Its Assumptions
2.2 Simple Segregation Analysis
2.3 Complex Segregation Analysis
    2.3.1 Estimation of Allele Frequencies from Segregation Analysis for a Recessive Model with Incomplete Heterozygote Penetrance
    2.3.2 Estimation of Allele Frequencies from Segregation Analysis for a Dominant Model with Incomplete Heterozygote Penetrance
2.4 Summary

CHAPTER III. TWO-PHASE DESIGNS FOR SEGREGATION ANALYSIS IN PEDIGREES
3.1 Introduction
3.2 Simulation Procedure
3.3 Two-Phase (Sampling) Designs
3.4 Results
    3.4.1 The Effects under Dominant Models
        3.4.1.1 The Estimates of the Allele Frequency
        3.4.1.2 The Estimates of the Penetrances
    3.4.2 The Effects under Recessive Models
        3.4.2.1 The Estimates of the Allele Frequency
        3.4.2.2 The Estimates of the Penetrances
3.5 Summary and Discussion

CHAPTER IV. TWO-PHASE DESIGNS FOR LINKAGE ANALYSIS IN PEDIGREES
4.1 Introduction
4.2 Simulations
4.3 Method of Analysis
4.4 Results
    4.4.1 The Effects under Dominant Models
        4.4.1.1 Estimates of the Recombination Fraction
        4.4.1.2 Maximum Lod Scores
    4.4.2 The Effects under Recessive Models
        4.4.2.1 Estimates of the Recombination Fraction
        4.4.2.2 Maximum Lod Score
4.5 Summary

CHAPTER V. THE COST EFFECTIVENESS OF LINKAGE ANALYSIS IN TWO-PHASE DESIGNS
5.1 Introduction
5.2 A Cost Function for Two-Phase Sampling Designs
5.3 Results
5.4 Summary

CHAPTER VI. CONCLUSIONS AND TOPICS FOR FURTHER STUDY
6.1 Conclusions
6.2 Topics for Further Study

BIBLIOGRAPHY

LIST OF TABLES

Table 1.1 Genetic Transition Matrix for Two Alleles at One Autosomal Locus. Each Entry is a Genotypic Distribution [p_st1 p_st2 p_st3] Conditional on Mating Type s × t (Elston and Stewart 1971)
Table 2.1 Segregation Models for Recessive Inheritance with Incomplete Heterozygote Penetrance
Table 2.2 Segregation Models for Dominant Inheritance with Incomplete Heterozygote Penetrance
Table 3.1 Genetic Parameters for Simulation of Pedigrees
Table 3.2 Sample Pedigrees Simulated in Order to Produce Nm = 50 and 100, Respectively
Table 3.3 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Dominant Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.4 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Dominant Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.5 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Incompletely Dominant Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.6 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Incompletely Dominant Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.7 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Incompletely Dominant Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.8 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Incompletely Dominant Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.9 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Dominant Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.10 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Dominant Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.11 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Incomplete Dominant Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.12 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Incomplete Dominant Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.13 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Incompletely Dominant Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.14 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Incompletely Dominant Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.15 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Recessive Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.16 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Recessive Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.17 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Incompletely Recessive Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.18 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Incompletely Recessive Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.19 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Incompletely Recessive Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.20 Mean (q̂), Standard Error (Se), Bias and RMSE of the Estimates of the Allele Frequency (q) in the Incompletely Recessive Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.21 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Recessive Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.22 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Recessive Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.23 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Incompletely Recessive Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.24 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Incompletely Recessive Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 3.25 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Incompletely Recessive Model for Segregation Analysis: Sample with 50 Multiplex Families
Table 3.26 Mean (f̂), Standard Error (Se), Bias and RMSE of the Estimates of the Penetrance (f) in the Incompletely Recessive Model for Segregation Analysis: Sample with 100 Multiplex Families
Table 4.1 Mean (θ̂), Standard Error (Se), Bias, RMSE of the Estimates of the Recombination Fraction (θ) and Mean and Se of the Lod Score in the Dominant Model for Linkage Analysis: Sample with 50 Multiplex Families
Table 4.2 Mean (θ̂), Standard Error (Se), Bias, RMSE of the Estimates of the Recombination Fraction (θ) and Mean and Se of the Lod Score in the Dominant Model for Linkage Analysis: Sample with 100 Multiplex Families
Table 4.3 Mean (θ̂), Standard Error (Se), Bias, RMSE of the Estimates of the Recombination Fraction (θ) and Mean and Se of the Lod Score in the