Bioinformatic Analysis of Cis-Encoded Antisense Transcription

BIOINFORMATIC ANALYSIS OF CIS-ENCODED ANTISENSE TRANSCRIPTION

by

Anca Sorana Morrissy

B.Sc., Simon Fraser University, 2002

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF
THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

The Faculty of Graduate Studies

(Medical Genetics)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

December 2010

© Anca Sorana Morrissy, 2010

Abstract

A key first step in understanding cellular processes is the quantitative and comprehensive measurement of gene expression profiles. The scale and complexity of the mammalian transcriptome pose a significant challenge to efforts to identify the complete set of expressed transcripts. In particular, low-abundance sequences such as antisense transcripts have historically been difficult to detect using EST libraries, microarrays, or tag sequencing methods. Antisense transcripts are expressed from the opposite strand of a partner gene and in some cases can regulate the processing of the sense transcript, highlighting their biological relevance.

Recently, efficient profiling of low-frequency transcripts has become possible with the advent of next-generation sequencing platforms. A major goal of my thesis was therefore to assess the prevalence of antisense transcripts using Tag-seq, a tag sequencing method modified to take advantage of the Illumina sequencing platform. The increased sampling depth provided by Tag-seq significantly improved the detection of low-abundance antisense transcripts and allowed accurate measurement of their differential expression across normal and cancerous states.

While antisense transcription is known to regulate sense transcript processing at a small number of loci, no genome-wide assessments of this regulatory interaction exist. I addressed this knowledge gap using Affymetrix exon arrays and found a significant correlation between antisense transcription and alternative splicing in normal human cells.
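The claim that deeper sampling improves detection of low-abundance transcripts can be made concrete with a simple sampling model. The sketch below is an illustration under stated assumptions, not the analysis used in this thesis: it treats each sequenced tag as an independent draw, so a transcript contributing a fraction p of all tags is observed Binomial(depth, p) times, and it ignores library-construction biases such as GC content. The function name `detection_probability` and the library depths in the example are hypothetical.

```python
import math


def detection_probability(p: float, depth: int, min_count: int = 1) -> float:
    """Chance that a transcript with relative tag abundance p yields at least
    `min_count` tags in a library of `depth` sequenced tags.

    Simplified model (assumption): tags are independent draws, so the count
    is Binomial(depth, p). Real libraries carry biases this ignores.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if min_count <= 0:
        return 1.0
    # Sum P(X = i) for i < min_count in log space, which avoids
    # overflow from very large binomial coefficients at high depth.
    below = 0.0
    for i in range(min(min_count, depth + 1)):
        log_pmf = (
            math.lgamma(depth + 1)
            - math.lgamma(i + 1)
            - math.lgamma(depth - i + 1)
            + i * math.log(p)
            + (depth - i) * math.log1p(-p)
        )
        below += math.exp(log_pmf)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    p = 1e-6  # hypothetical transcript: one tag per million
    for depth in (100_000, 10_000_000):  # illustrative depths, not thesis values
        print(depth, round(detection_probability(p, depth), 4))
```

Under this model, a transcript at one tag per million has roughly a 10% chance of being seen at least once in 100,000 tags, but better than a 99.99% chance in 10,000,000 tags. Raising `min_count` shows how much more depth is needed to observe such a transcript repeatedly rather than as a single tag.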
Exploring the relevance of antisense-correlated splicing events to human disease, I found that these events could be used to identify clinically distinct subtypes of cancer. Together, the findings in this thesis provide a new foundation for the investigation of antisense transcripts in the regulation of alternative transcript processing, and open new avenues of research into understanding the molecular heterogeneity of human cancers.

Preface

In conjunction with my supervisor, Dr. Marco Marra, I was involved in the conceptualization and design of the research activities described in this thesis. In particular, I was responsible for the design and implementation of the computational experiments, the data analysis, and the generation of the tables, figures, and text in this thesis.

Chapter 2 represents a collaborative research effort that involved the input of additional authors, as described here and at the beginning of that chapter. Thomas Zhang, Helen McDonald, Yonjun Zhao, and Martin Hirst were involved in the design of the Tag-seq method and in the creation of the Tag-seq and LongSAGE libraries analyzed (sections 2.2.1, 2.4.1, 2.4.2). Steven Jones and Marco Marra supervised the project and contributed design concepts and comments throughout. Allen Delaney designed the data filtering algorithm and conducted inter-platform correlation measurements (sections 2.2.2, 2.4.2, 2.4.3, 2.2.4.1, 2.2.4.2, 2.4.6). Scott Diguistini processed the RNA-seq data. Ryan Morin wrote the scripts used in section 2.4.7, was involved in the original manuscript planning, and contributed text to the manuscript. I performed the remaining computational experiments in the text, designed and performed the cancer-related analyses (sections 2.2.2-2.2.8 and 2.4.4, 2.4.5, 2.4.7-2.4.9), created the figures and tables describing the results, and wrote the majority of the manuscript.
The work described in Chapter 3 was performed and written entirely by me, with the exception of microarray data pre-processing, which was done by Malachi Griffith (section 3.4.3.1), who also contributed microarray analysis concepts. The work described in Chapter 4 was conducted entirely by me.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
1. Bioinformatic approaches for the analysis of antisense transcript expression, evolution, and regulatory effects
  1.1 Introduction
  1.2 Thesis overview
  1.3 Biological roles of antisense transcription
    1.3.1 Transcriptional interference
    1.3.2 Epigenetic silencing
    1.3.3 RNA editing
    1.3.4 RNAi
    1.3.5 RNA masking
  1.4 High-throughput discovery of SAS genes
    1.4.1 Studies utilizing mRNAs, ESTs, and cDNA libraries
    1.4.2 Microarray studies
    1.4.3 Tag-based studies
    1.4.4 Profiling antisense transcripts using next-generation approaches
  1.5 Functional analyses
    1.5.1 Evolutionary conservation
    1.5.2 Regulated expression
  1.6 Cancer
    1.6.1 SAS transcripts in human disease and cancer
    1.6.2 Defining cancer subtypes using microarrays
  1.7 Thesis objectives and chapter summaries
2. Next generation tag sequencing for cancer gene expression profiling
  2.1 Introduction
  2.2 Results
    2.2.1 Data generation
    2.2.2 Data filtering
    2.2.3 Effect of library depth on tag sequence diversity and abundance
    2.2.4 Differences in gene abundance between Tag-seq and other gene expression platforms
    2.2.5 GC-content bias
    2.2.6 Improved representation of low abundance LongSAGE transcripts in Tag-seq libraries
    2.2.7 Sense-antisense transcripts in cancer libraries
    2.2.8 Transcript isoforms in cancer libraries
  2.3 Discussion
  2.4 Methods
    2.4.1 Tag-seq library construction
    2.4.2 Tag extraction
    2.4.3 Tag-seq filtering
    2.4.4 Ensembl data
