INTRODUCING a NOVEL METHOD for GENETIC ANALYSIS of AUTISM SPECTRUM DISORDER Sepideh Nouri
Total Page:16
File Type:pdf, Size:1020Kb
The Texas Medical Center Library DigitalCommons@TMC The University of Texas MD Anderson Cancer Center UTHealth Graduate School of The University of Texas MD Anderson Cancer Biomedical Sciences Dissertations and Theses Center UTHealth Graduate School of (Open Access) Biomedical Sciences 12-2013 INTRODUCING A NOVEL METHOD FOR GENETIC ANALYSIS OF AUTISM SPECTRUM DISORDER sepideh nouri Follow this and additional works at: https://digitalcommons.library.tmc.edu/utgsbs_dissertations Part of the Bioinformatics Commons, Computational Biology Commons, and the Genomics Commons Recommended Citation nouri, sepideh, "INTRODUCING A NOVEL METHOD FOR GENETIC ANALYSIS OF AUTISM SPECTRUM DISORDER" (2013). The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences Dissertations and Theses (Open Access). 417. https://digitalcommons.library.tmc.edu/utgsbs_dissertations/417 This Thesis (MS) is brought to you for free and open access by the The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences at DigitalCommons@TMC. It has been accepted for inclusion in The University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences Dissertations and Theses (Open Access) by an authorized administrator of DigitalCommons@TMC. For more information, please contact [email protected]. INTRODUCING A NOVEL METHOD FOR GENETIC ANALYSIS OF AUTISM SPECTRUM DISORDER by Sepideh Nouri, M.S. APPROVED: Eric Boerwinkle, Ph.D., Supervisor Kim-Anh Do, Ph.D. Alanna Morrison, Ph.D. James Hixson, Ph.D. Paul Scheet, Ph.D. APPROVED: Dean, The University of Texas Graduate School of Biomedical Sciences at Houston INTRODUCING A NOVEL METHOD FOR GENETIC ANALYSIS OF AUTISM SPECTRUM DISORDER A THESIS Presented to the Faculty of The University of Texas Health Science Center at Houston and The University of Texas M. D. Anderson Cancer Center Graduate School of Biomedical Sciences in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE by Sepideh Nouri, M.S. Houston, Texas December 2013 ACKNOWLEDGEMENTS I would like to thank my advisor, Dr. Eric Boerwinkle for giving me the opportunity to study under his supervision at the Graduate School of Biomedical Sciences at Houston. I would also like to thank my committee members, Dr. Kim- Anh Do, Dr. Alanna Morrison, Dr. James Hixson, Dr. Paul Scheet, and my previous member Dr. John Cook for their guidance throughout my master’s degree. iii INTRODUCING A NOVEL METHOD FOR GENETIC ANALYSIS OF AUTISM SPECTRUM DISORDER Sepideh Nouri, M.S. Supervisory Professor: Eric Boerwinkle, Ph.D. Autism is a spectrum of neurological disorders that is characterized by repetitive and stereotyped behaviors, lack of social skills in verbal and non-verbal communications, and intellectual disability. Recent statistics shows that 1 out of every 88 children in the US is affected by autism. In this thesis, I first review previous studies on genetic association analyses of autism spectrum disorder. A large number of these studies fall into two categories: Genome Wide Association Studies (GWAS) and sequencing studies. Although GWAS are able to identify multiple common risk variants associated with different diseases, these common variants explain only a small portion of disease occurrence. In the case of autism, GWAS has had only limited success in identification of common risk variants. Recent studies suggest that rare variants have an important role in causing this disorder. There are a number of sequence-based association studies that have analyzed the relationship of rare variants with autism. Chapter 2 of my thesis presents some of the most popular sequencing- based rare variant association analysis methods along with their advantages and disadvantages. From previously published methods, the Weighted Sum Method (WSM) and Sequence Kernel Association Test (SKAT) are used to analyze the iv autism disease sequencing data since they have some advantages over similar methods. A new method called the Single Variant Method (SVM) is introduced in section 2.4. This new method provides a better understanding of the data and identifies associated variants not found by the previous methods. In Chapter 3, the results of the analysis of the autism data using WSM, SKAT and SVM are presented and discussed. Although there is no gold standard, there is insightful information to be learned by applying multiple methods, including the novel SVM method proposed here. v TABLE OF CONTENTS ACKNOWLEDGEMENTS ............................................................................................... iii ABSTRACT ................................................................................................................... iv LIST OF FIGURES ........................................................................................................ viii LIST OF TABLES ............................................................................................................ ix ABBREVIATIONS ........................................................................................................... x CHAPTER I: INTRODUCTION AND LITERATURE REVIEW ................................................. 1 1.1 Introduction ...................................................................................................... 1 1.2 GEnetiCs of Autism- a rEviEw of thE litEraturE .................................................... 2 1.2.1 GWAS and Sequencing Studies ......................................................................... 4 1.3 Conclusion ......................................................................................................... 7 CHAPTER II: REVIEW OF RARE VARIANT ANALYSIS METHODS ....................................... 8 2.1 Introduction ...................................................................................................... 8 2.2 WEightEd-Sum MEthod (WSM) ........................................................................ 11 2.3 SEquencE KErnel Association Test (SKAT) ......................................................... 12 2.4 SinglE Variant MEthod (SVM) ........................................................................... 16 2.4.1 Choosing a Proper Constant and Initial Results .............................................. 22 2.4.2 Estimating the P-values of the Measured Scores ........................................... 26 2.4.3 The Effect of Constant on the Score of Rare Variants .................................... 29 2.4.4 Calculating the Score of Genes based on the Score of Variants ..................... 34 CHAPTER III: AUTISM SPECTRUM DISORDER DATA AND RESULTS ............................... 36 vi 3.1 Autism SpeCtrum Disorder Data ....................................................................... 36 3.3 DisCussion ........................................................................................................ 40 3.4 Conclusions ...................................................................................................... 42 APPENDIX A: MATLAB CODES ..................................................................................... 43 REFERENCES ............................................................................................................... 48 VITA ........................................................................................................................... 55 vii LIST OF FIGURES Figure 1: Beta(MAF; 0.3, 0.7) ............................................................................. 14 Figure 2: Beta(MAF; 0.3, 0.9) ............................................................................. 15 Figure 3: Beta(MAF; 0.5, 0.5) ............................................................................. 15 Figure 4: Beta(MAF; 1, 25) ................................................................................. 16 Figure 5: Weight as a function of SDcontrolsi for constant values 0.001, 0.01, 0.03, 0.1 ....................................................................................................... 22 Figure 6: Histogram of SDcontrolsi for all variants .............................................. 24 Figure 7: Categorizing variants based on their total number of minor alleles in controls ........................................................................................................ 25 Figure 8: Calculated scores for variant ID=17318, score=0.99 ........................... 27 Figure 9: Calculated scores for variant ID=224, score=0.66 ............................... 28 Figure 10: Calculated scores for a low-rank variant ............................................ 28 Figure 11: Calculated scores for multiple constants ........................................... 29 Figure 12: Calculated scores for multiple constants. Zoomed version ............... 30 viii LIST OF TABLES Table 1: Example 1, calculated scores for three variants based on TestA……....18 Table 2-a: Comparison between TestA and TestB with constant C=0.01…….….20 Table 2-b: Comparison between TestA and TestB with constant C=1……………20 Table 2-c: Comparison between TestA and TestB with constant C=100…………20 Table 3: Variants with top 10 scores for constant C=0.03………………………...26 Table 4: Calculated scores for multiple constants……………….…………………31 Table 5: Scores for variants calculated for C=0.001, 0.01, 0.02, 0.03, 0.06, 0.08…...……………………………………………………………………………33 Table 6: P-values for variants calculated for C=0.001, 0.01, 0.02, 0.03, 0.06, 0.08………………………………………………………………………...………34 Table