A Posterior Probability of Linkage & Association Study
A POSTERIOR PROBABILITY OF LINKAGE & ASSOCIATION STUDY OF 111 AUTISM CANDIDATE GENES

By FANG CHEN

A dissertation submitted to the Graduate School – New Brunswick, Rutgers, The State University of New Jersey, and The Graduate School of Biomedical Sciences, University of Medicine and Dentistry of New Jersey, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Graduate Program in Microbiology and Molecular Genetics

Written under the direction of Dr. Tara C. Matise & Dr. Jay Tischfield

New Brunswick, New Jersey
May, 2009

ABSTRACT OF THE DISSERTATION

A Posterior Probability of Linkage & Association Study of 111 Autism Candidate Genes
By FANG CHEN
Dissertation directors: Dr. Tara C. Matise & Dr. Jay Tischfield

Autism is a neurodevelopmental disorder with a complex genetic basis. In this study we investigated the possible involvement of 111 candidate genes in autism by studying 386 patient families from the Autism Genetic Resource Exchange (AGRE). These genes were selected because their functions relate to neurotransmission or the central developmental system. In phase 1 of the study, 1497 tagSNPs were selected to efficiently capture the haplotype information of each gene and were genotyped in 265 AGRE nuclear families. The cleaned genotype data were analyzed with the Kelvin program to compute the Posterior Probability of Linkage (PPL) and the Posterior Probability of LD given linkage (PPLD), which directly measure the probability of linkage and/or association. Consistent supportive evidence for linkage was observed for the EPHB6-EPHA1 locus in the 7q34 region by two-point and multi-point PPL analysis. Some evidence for association was obtained from the intronic SNP rs2242601 of the EPHA1 gene (PPLD = 10.4%) and from multiple SNPs in the MECP2 gene at Xq28 (PPLD ranging from 5% to 9%).
Using a subset of the newly released AGRE genotype data from the Affymetrix 5.0 high-density SNP array, further evidence for association was obtained for 6 markers located 90 kb distal to the EPHA1 gene (PPLD ranging from 21% to 40%). In phase 2 of this study, to conduct fine mapping and to replicate our phase 1 results in a set of 123 additional AGRE family samples, additional SNPs were selected from the EPHA1 and MECP2 gene regions for fine-scale analysis. Strong support for association with autism was observed for the markers downstream of the EPHA1 gene in the original families, with the SNP rs7801889 showing a high PPLD value of 62%. Markers from the MECP2 gene region remained moderately associated, with PPLD values around 8%. Nonetheless, none of the SNPs showed any support for association in the additional family samples. These mixed preliminary results suggest that polymorphisms within and downstream of the ephrin receptor A1 gene (EPHA1) are potential novel susceptibility loci for autism. Limited support for a role of MECP2 in autism etiology was also observed.

ACKNOWLEDGEMENT

I would like to acknowledge my thesis advisors, Dr. Tara Matise and Dr. Jay Tischfield, for their advice, guidance, inspiration, and encouragement during my graduate study. I am especially grateful to them for giving me this precious opportunity to participate in this project. I also want to thank Dr. Neda Gharani for her immense help in planning and executing the work throughout. I wish to express my sincere thanks to all my committee members, who also contributed greatly to this project: Dr. Lei Yu, Dr. Jim Millonig, Dr. Linda Brzustowicz, and Dr. Derek Gordon. The high-throughput genotyping was completed at the Chinese Human Genome Center at Shanghai. I would like to thank Dr. Wei Huang, Dr. Changzheng Dong, Ying Wang, and Haifeng Wang for their great support. I want to thank Dr. Veronica Vieland and Dr.
Yinggui Huang from The Research Institute at Nationwide Children's Hospital. Without their knowledge and advice on the statistical analysis, this study would not have been successful. I would like to thank all current and previous members of Dr. Matise's lab for their delightful cooperation over the past years. I also want to thank the members of Dr. Brzustowicz's lab for their generous assistance with my bench work. Last but not least, I am particularly grateful to my family members and Zheng for supporting and encouraging me to pursue this degree. I would not have finished the degree without them.

DEDICATION

To my parents, Kun Chen and Jialun Tang, for their unconditional love through all my life.

TABLE OF CONTENTS

ABSTRACT OF THE DISSERTATION
ACKNOWLEDGEMENT
DEDICATION
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
Chapter 1. Introduction
Chapter 2. Background
  2.1 Autism
  2.2 Whole genome linkage screens
  2.3 Cytogenetic analysis
  2.4 Association study
  2.5 Candidate gene study
  2.6 Gene expression
  2.7 Copy number variation
  2.8 Sampling strategy
  2.9 Selection of markers for candidate gene testing
  2.10 TagSNP transferability
  2.11 Parametric and non-parametric linkage analysis
  2.12 Bayesian inference and PPL
Chapter 3. Methods
  3.1 Subjects
  3.2 Candidate gene selection
  3.3 Selection of TagSNPs
  3.4 Genotyping
  3.5 Genotype data cleaning
  3.6 PPL and PPLD analysis on the Phase 1a dataset
    3.6.1 Two-point PPL
    3.6.2 Multi-point PPL
    3.6.3 PPLD
  3.7 PPL and PPLD analysis on the Phase 2b dataset
  3.8 PDT analysis
  3.9 TagSNP transferability test
Chapter 4. Results
  4.1 Two-point PPL and PPLD analysis of the Phase 1a dataset
  4.2 Multi-point PPL analysis of the Phase 1a dataset
  4.3 PPL and PPLD analysis of the Phase 1b dataset from suggestive gene regions
  4.4 PPLD analysis of the Phase 2 datasets
  4.5 Effect of sample selection on PPLD result
  4.6 Result of PDT analysis
  4.7 Evaluation of tagSNP transferability
  4.8 PPL and PPLD analysis of the Phase 1 tagSNPs in ethnic subgroups
Chapter 5. Discussion
REFERENCES
APPENDICES
Curriculum Vita

LIST OF TABLES

Table 3.1 List of autism candidate genes of our study
Table 3.2 Illumina GoldenGate Genotyping assay QC data
Table 4.1 PPLD difference of some markers from the EPHA1 gene region using the Phase 1b and Phase 2 sample sets
Table 4.2 PDT results of markers from the EPHA1 gene region in two sample sets
Table 4.3 Summary of the SNP minor allele frequency from two data sets
Table 4.4 Number of SNPs captured by two tagging approaches

LIST OF FIGURES

Figure 2.1 Diagram of genes or loci on each chromosome that may be related to autism
Figure 3.1 Minor allele frequency of selected tagSNPs from the Phase 1 study
Figure 3.2 HapMap LD plot of selected SNPs from the EPHA1 gene region for the Phase 2 study
Figure 3.3 HapMap LD plot of selected SNPs from the MECP2 gene region for the Phase 2 study
Figure 4.1 Two-point PPL and PPLD of the Phase 1 tagSNPs
Figure 4.2 PPL and PPLD results of the tagSNPs at gene regions on chromosome 4
Figure 4.3 PPL and PPLD results of the tagSNPs at gene regions on chromosome 6
Figure 4.4 PPL and PPLD results of the tagSNPs at gene regions on chromosome 7
Figure 4.5 PPL and PPLD of markers from combined dataset within the EPHB6-EPHA1 gene region on 7q34-35
Figure 4.6 LD plot of markers from combined dataset from the EPHA1 gene region
Figure 4.7 The EPHA1 gene showing its genomic location and gene structure
Figure 4.8 PPL and PPLD result of markers from combined dataset from the MECP2 gene region on chromosome X
Figure 4.9 LD plot of markers from combined dataset within the MECP2 gene region on chromosome X
Figure 4.10 PPLD of markers from the Phase 2 study within the EPHA1 gene region
Figure 4.11 LD plot of SNPs from the Phase 2 study within the EPHA1 gene region
Figure 4.12 PPLD of markers from the Phase 2 study within the MECP2 gene region
Figure