Bioinformatics Tools and Applications for Rainbow Trout by Rafet Al
Total Page:16
File Type:pdf, Size:1020Kb
Bioinformatics Tools and Applications for Rainbow Trout By Rafet Al-Tobasei A Dissertation Submitted in Partial Fulfillment Of the Requirements for Degree of Doctor of Philosophy in Computational Science Middle Tennessee State University May 2017 Dissertation Committee: Dr. Mohamed Salem (Chair) Dr. Hyrum D. Carroll Dr. Cen Li Dr. Rebecca Seipelt -Thiemann Dr. Joshua L. Phillips Dedicated to My children (Ahmad, Zayna, Ruba, Leena), my lovely wife (Nour Kattih), and my Mother (Sadeqah Alkatib) ii ACKNOWLEDGMENTS During my study, I was blessed with many people that supported me and encouraged me to make this PhD a reality. My Advisor, Dr. Mohamed Salem, I do not know how to thank him enough for his support and guidance through this journey. Dr. Salem went far and beyond what a dissertation chair does. He taught me about a subject that was completely out of my radar. He helped me drafting my research with patience and excellent enlightenment. Working with Dr. Salem has been a pleasure that will influence me forever. Computational Science Director, Dr. John Wallin, thanks for all the supports and funds you provided me. Your support and encouragements will always be appreciated. I would like to thank Dr. Chrisila Pettey, Computer Science department chair, for her support. I always enjoy working with Dr. Hyrum Carroll sharing ideas and thoughts. Dr. Cen Li; I always enjoyed your classes as a student and as a lab assistant. I learned from you how to be a better instructor. I would like to thank my doctoral committee members: Dr. Rebecca Seipelt - Thiemann and Dr. Joshua Philips for their support, time, feedback, and assistance reviewing this dissertation. Thank you all for your support and guidance. I would like to thank my colleagues and collaborator, from MOBI program in Middle Tennessee State University Bam Paneru and Ali Ali, from USDA Dr. Yniv Palti and Dr. Timothy Leeds, and from West Virginia University Dr. Brett Kenney. I would like to thank my family for their support and encouragement during this journey. My wife, Nour Kattih, my children, Ahmad, Zayna, Ruba, and Leena who stood by my iii side all the times especially the days when I had to work late. I would like to thank my Mom for her prayer for me. Thank you Middle Tennessee State University for providing me the education and knowledge that helped me to achieve my PhD. iv ABSTRACT Rainbow trout is one of the widely used aquaculture species for food worldwide. Due to its commercial importance, various genomic resources are available for the trout including a draft reference genome, microRNA repertoire, quantitative trait loci and single nucleotide polymorphisms (SNPs) associated with different production traits. However, many of these genomic resources still need improvement in terms of quality and quantity. The only available genome draft is not completely annotated, and lacks non-coding RNA and some protein coding genes. Similarly, majority of the previous work aimed at identification of trait-associated genetic markers were not robust due to limitation of genomic resources that were previously available. In this study, we used genomics and transcriptomic approaches to identify missing genetic elements including long non-coding RNA and protein coding genes in the reference genome. In addition, we utilized these genomics resources to identify genes and genetic variations, especially SNPs, associated with growth and muscle quality traits in rainbow trout. In order to facilitate gene discovery and to improve the draft genome reference, we used deep transcriptome sequencing from 13 vital tissues. De novo assembly of ~1.167 billion paired-end reads from those 13 tissues identified a total of 474,524 protein coding transcripts, of them 11,843 transcripts were not previously reported in the genome reference. In order to discover long non-coding RNA repertoire, we used the same ~1.167 billion RNA sequencing reads in addition to RNA sequence data from 3 other published sources. Transcriptome assembly followed by various filtration steps identified 54,503 long non-coding RNA transcripts, which provided the first long non-coding RNA draft reference in rainbow trout. These long non-coding RNAs exhibited less sequence conservation, one exon biased structure and overall lower expression level compared to protein coding genes. The newly identified long non-coding RNAs showed differential expression in response to Flavobacterium psychrophilum infection, and their expression level strongly correlated with body bacterial load in selectively bred, resistant-, control-, and susceptible- genetic lines of rainbow trout. These findings suggest that the lncRNAs have importance roles in antibacterial immune response and disease resistance in rainbow trout. In addition, multiple bioinformatics algorithms were tested and successfully utilized to identify SNPs in protein coding genes and long non-coding RNAs that are associated with 5 important production traits: whole body weight (WBW), muscle yield, muscle crude fat content, muscle shear force (tenderness) and fillet whiteness. A total of 7,930 SNPs identified in protein coding genes and non-coding RNAs showed allelic imbalances (>2.0 as an amplification and <0.5 as loss of heterozygosity) between fish families showing contrasting phenotypes for above-mentioned traits suggesting their importance in the phenotypes. Validation of a small subset of the SNPs with allelic imbalances showed ~93% success rate of the pipelines in calling SNPs suggesting reliability of the algorithms. v This study provides new genomic resources to complement the genome annotation and facilitate functional genomics research in addition to genome-wide studies and selection in rainbow trout. vi DECLARATION I declare that all the contents of this dissertation are my original work and has been done with collaboration with other colleague. Some of this work has been done with students from MOBI program who may use it in their dissertation. My major contributions were on the second and third publications and I significantly contributed to the first and the fourth publications. Work presented in this dissertation are based on the following works: Published/Submitted Papers: 1. Salem M, Paneru B, Al-Tobasei R, Abdouni F, Thorgaard GH, et al. (2015) Transcriptome Assembly, Gene Annotation and Tissue Gene Expression Atlas of the Rainbow Trout. PLOS ONE 10(3): e0121778. doi: 10.1371/journal.pone.0121778 2. Al-Tobasei R, Paneru B, Salem M (2016) Genome-Wide Discovery of Long Non- Coding RNAs in Rainbow Trout. PLOS ONE 11(2): e0148940. doi: 10.1371/journal.pone.0148940 3. Al-Tobasei R, Ali Ali, Timothy D. Leeds, Sixin Liu, Yniv Palti, Brett Kenney, and Mohamed Salem (2017) Identification of SNPs Associated with Muscle Yield and Quality Traits Using Allelic-Imbalance Analysis in Pooled RNA-Seq Samples in Rainbow Trout (submitted) 4. Paneru B, Al-Tobasei R, Palti Y, Wiens GD, Salem M (2106) Differential expression of long non-coding RNAs in three genetic lines of rainbow trout in vii response to infection with Flavobacterium psychrophilum. Scientific Reports 2016, 6:36032. viii TABLE OF CONTENTS LIST OF FIGURES ............................................................................................................................. xii LIST OF TABLES ............................................................................................................................... xv CHAPTER I: INTRODUCTION ............................................................................................................. 1 Aquaculture .................................................................................................................................. 3 Genome Annotation ..................................................................................................................... 6 Transcriptome De novo Assembly and Gene Annotation ............................................................ 7 Long non Coding RNA (lncRNA) ............................................................................................... 8 Single Nucleotide Polymorphism (SNP) ..................................................................................... 9 Specific Objectives .................................................................................................................... 11 CHAPTER II: TRANSCRIPTOME ASSEMBLY, GENE ANNOTATION AND TISSUE GENE EXPRESSION ATLAS OF THE RAINBOW TROUT [51] ............................................................................................ 13 Introduction ................................................................................................................................ 13 Materials and Methods ............................................................................................................... 17 Ethics statement ..................................................................................................................... 17 Production of doubled haploid rainbow trout ........................................................................ 17 Tissue collection and RNA isolation ..................................................................................... 17 Illumina paired-end sequencing ............................................................................................. 18 Trinity assembly and annotation ............................................................................................ 19 ORF/full-length cDNA prediction and gene ontology analysis ............................................