Pan-Cancer Identification and Prioritization of Cancer- Associated Alternatively Spliced and Differentially Expressed Genes: a Biomarker Discovery Application
Total Page:16
File Type:pdf, Size:1020Kb
Pan-Cancer Identification and Prioritization of Cancer- Associated Alternatively Spliced and Differentially Expressed Genes: A Biomarker Discovery Application by Daryanaz Dargahi B.Sc., University of Tehran, 2009 M.Sc., Simon Fraser University, 2011 Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the Department of Molecular Biology and Biochemistry Faculty of Science Ó Daryanaz Dargahi 2016 SIMON FRASER UNIVERSITY Fall 2016 Approval Name: Daryanaz Dargahi Degree: Doctor of Philosophy Title: Pan-Cancer Identification and Prioritization of Cancer- Associated Alternatively Spliced and Differentially Expressed Genes: A Biomarker Discovery Application Examining Committee: Chair: Sharon Gorski Professor Steven J.M. Jones Senior Supervisor Professor David L. Baillie Supervisor Professor Robert A. Holt Supervisor Professor Angela Brooks-Wilson Supervisor Professor Martin Hirst Supervisor Associate Professor Fiona Brinkman Internal Examiner Professor Denise Clark External Examiner Professor Biology University of New Brunswick Date Defended/Approved: November 21, 2016 ii Abstract Tumour cells arise through aberrant expression of genes and the proteins they encode. This may result from a direct change to DNA sequence or perturbations in the machinery responsible for production or activity of proteins, such as gene splicing. With the advent of massively parallel RNA-sequencing (RNA-seq), large-scale exploration of changes at the stage of transcription and posttranscriptional splicing has the potential to unravel the landscape of gene expression changes across human cancers. Aberrantly expressed genes in cancer can serve as molecular biomarkers for discrimination of tumour and normal cells if localized to the cell surface and therefore can be used as targets for targeted antibody-based cancer therapy. In the current study, I devised an analysis pipeline to identify and rank such events from human cancer RNA-seq datasets. Using my pipeline, I conducted a pan-cancer analysis in the RNA-sequencing data of more than 7,000 patients from 24 different cancer types generated by the cancer genome atlas (TCGA). I identified abnormally expressed and alternatively spliced genes, which seemed to be cancer-associated in comparison to a large compendium of transcriptomes from non-diseased tissues gathered from Genotype-Tissue Expression (GTEx) and TCGA. My analysis revealed 1,503 putative tumor-associated abnormally expressed genes and 1,142 novel cancer-associated splice variants occurring in 694 genes. In order to rank identified candidate genes, I performed an extensive literature search and studied known therapeutic antibody targets to collect the characteristics of an ideal antibody target in cancer. I developed an R package, Prize, based on the Analytic Hierarchy Process (AHP) algorithm. AHP is a multiple-criteria decision making solution that allows a user to prioritize a list of elements based of a set of user-define criteria and numerical score that express the importance of each criterion to achieving the goal. I built an AHP model to depict cancer biomarker target properties for ranking and prioritizing the genes. Using this model, Prize was able to successfully recognize and rank known tumour biomarker targets among the top 25 ranked list along with other novel candidates. Keywords: RNA-sequencing; Alternative splicing; Gene expression; Biomarker target; Prioritization; Analytic Hierarchy Process iii Preface Portions of section 3.1 and 3.3 is in preparation for submission as Daryanaz Dargahi, Christopher Bond, Ryan Dercho, Richard Swayze, Leanna Yee, Peter Bergqvist, Alireza Heravi-Moussavi, Bradley Hedberg, Jianghong An, Edie Dullaghan, Ismael Samudio, John Babcook, and Steven Jones. (2016). Pan-cancer Identification and Prioritization of Cancer-Associated Abnormally Expressed Genes: A Biomarker Discovery Application. I am the lead researcher and author of this publication. I performed data analysis, generated figures, performed literature search, designed and implemented the R package, and am writing the manuscript. Myself, SJ and JB conceived and designed the study. Myself, SJ, JB, CB, IS, RS, LY, PB, BH, ED, RD, JA, AHM designed the problem hierarchy, chose decision criteria and rating categories for prioritization, and generated consensus pairwise comparison matrices via multiple discussions and literature search. JB, CB, and IS are leading experts in antibody-drug conjugate development. The Prize R package described in section 3.3.1 is currently available to public on Bioconductor at https://www.bioconductor.org/packages/release/bioc/html/Prize.html. The package has been downloaded more than 1,300 time since the date of publication (October 2015). Portions of section 3.2 has been published as Daryanaz Dargahi, Richard Swayze, Leanna Yee, Peter Bergqvist, Bradley Hedberg, Alireza Heravi-Moussavi, Edie Dullaghan, Ryan Dercho, Jianghong An, John Babcook, and Steven Jones. (2014). A Pan-Cancer Analysis of Alternative Splicing Events Reveals Novel Tumor-Associated Splice Variants of Matriptase. Cancer Informatics. 2014 Dec; 13: 167–177. doi: 10.4137/CIN.S19435. I was the lead researcher and author of this publication. I performed data analysis, generated figures, performed literature search, wrote the manuscript, and was involved in designing validation experiments. Myself, SJ and JB conceived and designed the study. RS, LY, PB, BH, ED, and RD designed and ran validation experiments. JA and AHM provided technical assistance. All the authors made critical revisions and approved the final version of the manuscript. iv In addition, novel matriptase splice variants described in section 3.2.3 have been filled as a PCT international patent application No. PCT/CA2014/000875 entitled: MATRIPTASE VARIANTS ASSOCIATED WITH TUMORS, filed December 9, 2014. Inventors: Dargahi, D., Babcook, JS. and Jones SJM. Applicant: British Columbia Cancer Agency Branch and The Centre for Drug Research and Development. v Dedication To my parents whose unconditional love and support has made this possible for me to be here today... vi Acknowledgements I would like to greatly thank my senior supervisor, Dr. Steven J.M. Jones for giving me the opportunity to pursue my PhD and for providing me with exceptional mentorship, consistent support, and endless scientific expertise over the past 5 years. I would also like to thank my committee members, Dr. David L. Baillie, Dr. Robert Holt, Dr. Angela Brooks-Wilson, and Dr. Martin Hirst for their support over the past 5 years as well as their advice and guidance not only scientifically, but also in relation to my professional and personal development. In addition, I would like to acknowledge Dr. Fiona Brinkman and Dr. Denise Clark for being my internal and external examiners, respectively. This work would not have been made possible without the financial support of several funding agencies. I am deeply grateful for a PhD fellowship from the Mitacs Accelerate program, and grants from Genome British Columbia strategic opportunities fund, and the Terry Fox Research Institue (TFRI) new frontiers program. I would like to also thank Dr. John Babcook, The Centre for Drug Research and Development (CDRD), and CDRD ventures Inc. for the three years internship opportunity through Mitacs Accelerate program. In addition, I would like to thank Simon Fraser University and the Molecular Biology and Biochemistry Department. The results published in this thesis are in whole or part based upon data generated by Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) pilot projects established by national cancer institute (NCI) and national human genome research institute (NHGRI). I would like to thank GTEx and TCGA groups for making these data publically available. Information about TCGA can be found at http://cancergenome.nih.gov. Additional information about GTEx project is also available at http://www.gtexportal.org/. Finally, I am extremely grateful for the support of my friends and family. Specifically my mother Nahid Mojaverian, and my father Mohammad Ali Dargahi, who have always encouraged me to embrace and develop my individuality and have vii unconditionally supported the pursuit of my passions. Their endless love, support, and respect have made me the person I am today. viii Table of Contents Approval ............................................................................................................................ ii Abstract ............................................................................................................................ iii Preface ............................................................................................................................. iv Dedication ........................................................................................................................ vi Acknowledgements ......................................................................................................... vii Table of Contents ............................................................................................................. ix List of Tables ................................................................................................................... xii List of Figures ................................................................................................................. xiv List of Acronyms .............................................................................................................