Allele Specific Expression in Various Tissues of Gallus
ALLELE SPECIFIC EXPRESSION IN VARIOUS TISSUES OF GALLUS GALLUS DOMESTICUS

by

M. Joseph Tomlinson IV

A thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Master of Science in Bioinformatics and Computational Biology

Fall 2018

© 2018 M. Joseph Tomlinson IV
All Rights Reserved

Approved: Behnam Abasht, Ph.D., Professor in charge of thesis on behalf of the Advisory Committee
Approved: Limin Kung, Jr., Ph.D., Chair of the Department of Animal and Food Sciences
Approved: Mark W. Rieger, Ph.D., Dean of the College of Agriculture and Natural Resources
Approved: Douglas J. Doren, Ph.D., Interim Vice Provost for the Office of Graduate and Professional Education

ACKNOWLEDGMENTS

I would like to thank my graduate advisor, Dr. Behnam Abasht. Without his guidance and support this project would not have been possible. I would also like to thank him for giving me the opportunity to gain extensive experience in the bioinformatics field, which greatly improved my skill set.

I would also like to thank my thesis committee members, Dr. Shawn Polson, Dr. Jing Qiu and Dr. Randall Wisser, for their advice and guidance, which were essential to the overall success of this project.

I also express my thanks to my classmates, labmates and friends at UD, who helped encourage me throughout the journey (Michael Papah, Daniel Chazi Capelo, Felix Francis, Juniper Lake, Emma Fare, Steve Chiou, Matt Saponaro, Terence Mhora and many more).

I would like to thank my family. My mom and step-dad have helped support and encourage me over the years in my academic studies.
My lovely wife wholeheartedly encouraged me when I went back to school at UD and also decided to join me in my academic endeavors and earn a degree at the University of Delaware too.

Finally, I would love to dedicate this thesis to my daughter, Anna Bai Tomlinson!

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter

1 IMPORTANCE OF ALLELE SPECIFIC EXPRESSION
    1.1 Prefix for Project
    1.2 Importance of ASE
    1.3 Types of ASE
    1.4 Imprinting and ASE
    1.5 Brief History of ASE
    1.6 ASE in Chickens
    REFERENCES

2 ALLELE SPECIFIC EXPRESSION ANALYSIS IN CHICKENS
    2.1 Introduction
    2.2 Materials and Methods
        2.2.1 Sample Collection and Quality Control
        2.2.2 Sequence Alignment and Variant Calling
        2.2.3 Analyzing Unmappable Reads
        2.2.4 600K Genotyping Data
        2.2.5 Validating RNA-Seq Analysis
        2.2.6 VCF ASE Detection Tool (VADT)
            2.2.6.1 VADT – Filtering of Data
            2.2.6.2 VADT – Detection of Reference Allele Bias
            2.2.6.3 VADT – Binomial Testing
            2.2.6.4 VADT – Statistical Analysis of Binomial Results
                2.2.6.4.1 Sample Level Analysis
                2.2.6.4.2 Meta-Analysis
            2.2.6.5 VADT – Settings Utilized in Our Study of ASE
        2.2.7 Investigation Functional Significance
        2.2.8 Validation of VADT’s Robustness
    2.3 Results
        2.3.1 Mapping and Initial Variant Results
        2.3.2 Mapping Issue Between Tissue
        2.3.3 Validation of Variant Calling Pipeline
        2.3.4 VADT Analysis and Results
        2.3.5 Functional Significance and Pathway Enrichment
        2.3.6 Identification of Tissue Specific Pathways with Strong ASE Signals
        2.3.7 Identification of Robust ASE Genes Found in All Three Tissues
        2.3.8 VADT Robustness
    2.4 Discussion
    REFERENCES

3 FUTURE STUDIES OF ASE
    3.1 Introduction
    3.2 Future Developments of ASE Detection Analysis Pipeline
        3.2.1 Improving VADT’s ASE Detection Model
        3.2.2 Strandness of Read Counts
    3.3 Future Biological Directions
    3.4 Conclusion
    REFERENCES

Appendix

A DERIVATION OF BINOMIAL TEST
B FALSE DISCOVERY RATE
C FISHER’S METHOD FOR COMBINED PROBABILITY
D BREAKDOWN OF INFORMATIVE VARIANT COUNTS
E EXPANDED VIEW OF DAVID RESULTS
F TOP ASE CANDIDATE GENES

LIST OF TABLES

Table 2.1: Summary statistics of STAR alignment (1st Pass) for all the samples, separated by project, that were used to create the initial VCF used for masking. The wooden breast project samples had a range of input lengths (*), and the values shown represent the average across the samples.

Table 2.2: Overall summary statistics from the unmappable reads for R1 and R2 (sample 47337) for all three tissues. The “Average Trimmed Length” is the length to which FastqBLAST trimmed the sequence, and the “Average Hit Seq Length” is the length BLAST returned for the match.

Table 2.3: Top 5 FastqBLAST count results for R1 and R2 of unmappable reads for the three tissues (breast muscle, abdominal and liver) for sample 47337.

Table 2.4: Summary table of the 1st Pass of STAR alignment for the three tissues with the chimeric setting turned on.

Table 2.5: Comparison of variant calls between the 600K genotyping panel and RNA-seq variants. The initial total variant counts for both the panel and RNA-seq are based on variant calls after filtering and represent all high-quality variants that can be compared between the datasets.

Table 2.6: Top genes identified using VADT’s significant results and Ensembl’s VEP tool, where genes show 100% ASE in all three tissues after normalization. Each gene’s overall biological function summary is based on GeneCards [55].

Table 2.7: Variants previously verified using Sanger sequencing and corresponding VADT results. VADT was able to identify all variants as statistically significant in their original tissue designation. Coordinates were lifted over from Gallus gallus 4.0 to Gallus gallus 5.0.

Table 3.1: Variants previously verified using Sanger sequencing and corresponding VADT results from both chicken datasets tested using VADT. Variants were previously reported by Zhuo et al. [8] as showing ASE and were verified using Sanger sequencing in the tissue designated