Identifying Driver Mutations in Cancers

A University of Sussex PhD thesis Available online via Sussex Research Online: http://sro.sussex.ac.uk/ This thesis is protected by copyright which belongs to the author. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the Author The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the Author When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given Please visit Sussex Research Online for more information and further details Identifying driver mutations in cancers A thesis submitted to the University of Sussex for the degree of Doctor of Philosophy By Hanadi Baeissa October 2018 Declaration I hereby declare that this thesis has not been and will not be, submitted in whole or in part to another University for the award of any other degree. Hanadi Baeissa / / 2018 2 Preface The research presented in this thesis has been submitted for publication as follows: Chapter 2 Baeissa HM, Benstead-Hume G, Richardson CJ, Pearl FM. Mutational patterns in oncogenes and tumour suppressors. Biochemical Society Transactions. 2016; 44:925– 31. Author contributions: F.M.G.P. conceived the project and designed the analysis; H.B., C.R., G.B.-H. and F.M.G.P. implemented the informatics; and H.B. and F.M.G.P. undertook the data analysis and wrote the paper. Chapter 3 Baeissa HM, Benstead-Hume, G., Richardson, CJ. & Pearl, FM. Identification and analysis of mutational hotspots in oncogenes and tumour suppressors. Oncotarget. 2017; 8; 21290–304. Author contributions: F.M.G.P. conceived the project and designed the analysis; H.B., C.R., G.B.-H. and F.M.G.P. implemented the informatics; and H.B. and F.M.G.P. undertook the data analysis and wrote the paper. Chapter 4 Hanadi M Baeissa, Sarah K. Wooller, Chris J Richardson and Frances M G Pearl. Predicting loss of function and gain of function driver missense mutations in cancer. Submitted Author contributions: F.M.G.P. and H.M.B conceived the project and designed the analysis; H.B., C.R., and S.W. implemented the informatics; and H.B. undertook the data analysis. H.B and FMGP wrote the paper. 3 Chapter 5 Hanadi M Baeissa and Frances M G Pearl. Identifying the impact of inframe insertions and deletions on protein function in cancer. Submitted Author contributions: F.M.G.P. and H.B conceived the project and designed the analysis; H.B. implemented the informatics, undertook the data analysis and wrote the paper under the supervision of FMGP. Chapter 6 Hanadi M Baeissa, Sarah K Wooller and Frances M Pearl. Identifying actionable mutated proteins as targets for personalised medicine in lung cancer. In preparation Author contributions: F.M.G.P. and H.M.B conceived the project and designed the analysis; H.B., GB-H and S.W implemented the informatics. H.B undertook the data analysis and wrote the paper under the supervision of FMGP. 4 Acknowledgements This thesis would not have been possible without the constant help and guidance of my wonderful supervisor Dr. Frances Pearl. Her understanding and support are really appreciated. I have benefited from her knowledge, insight and enthusiasm. Also, I wish to thank my second supervisor Prof Laurence Pearl for the help. I am grateful to my friend Sarah Wooller for the warm support and caring that helped me through my most difficult period during the PhD. I wish to thank members of bioinformatics group: Graeme Benstead-Hume and Tina Chen for all the assistance, encouragement and friendship and to everyone who has been involved in the underlying work of this thesis. Most of all, I would like to thank my parents Mohammed and Noor. Their constant support and unwavering confidence that I can do anything and everything that I desire made me who I am today. My warmest thank also go out to my sister, my brothers, my kids for their support and opening my eyes to the future. Special thanks are owed to my husband Adel. His easy going attitude, understanding and love took the weight of life outside the research world off my shoulders so I could breathe a little easier. For the generous financial support through the years, thanks goes to King Abdulaziz University, Ministry of education in Saudi Arabia and Royal Embassy of Saudi Arabia Cultural Bureau in London. 5 Abstract All cancers depend upon mutations in critical genes, which confer a selective advantage to the tumour cell. The key to understanding the contribution of a disease- associated mutation to the development and progression of cancer comes from an understanding of the consequences of that mutation on the function of the affected protein, and the impact on the pathways in which that protein is involved. Using data from over 30 different cancers from whole-exome sequencing cancer genomic projects, I analysed over one million somatic mutations. I identified mutational hotspots within domain families by mapping small mutations to equivalent positions in multiple sequence alignments of protein domains. I found that gain of function mutations from oncogenes and loss of function mutations from tumour suppressors are normally found in different domain families and when observed in the same domain families, hotspot mutations are located at different positions within the multiple sequence alignment of the domain. Next, I investigated the ability of seven prediction algorithms to discriminate between driver missense mutations in oncogenes and tumour suppressors. Using 19 features to describe these mutations, I then developed a random forest classifier, MOKCaRF, to distinguish between gain of function and loss of function missense mutations in cancer. MOKCaRF performs significantly better than existing algorithms. I then evaluated the ability of six existing prediction tools to distinguish between pathogenic and neutral mutations for both inframe insertion and inframe deletion mutations. I developed my own classifiers using 11 features that perform better than the current algorithms. 6 Finally, using the algorithms that I developed, as well as changes in copy number and expression data for each gene, I analysed samples from 50 lung cancer patients to identify the actionable targets and potential new drug targets for each tumour. 7 List of Contents Chapter 1. Introduction ............................................................................................ 19 1.1 Cancer ................................................................................................................ 20 1.1.1 The Hallmarks of Cancer ............................................................................. 21 1.2 Genes involved in the development of cancer................................................. 23 1.2.1 Oncogenes .................................................................................................... 24 1.2.2 Tumour suppressor genes ............................................................................ 24 1.2.3 Therapeutically targeting driver genes ......................................................... 25 1.3 Mutations that arise in cancer ......................................................................... 26 1.3.1 Large-scale mutations .................................................................................. 26 1.3.2 Small-scale mutations .................................................................................. 27 1.3.2.1 Point Mutations ..................................................................................... 27 1.3.2.2. Indels .................................................................................................... 28 1.4 Functional consequence of small-scale mutations .......................................... 28 1.4.1 Loss of function mutations........................................................................... 28 1.4.2 Gain of function mutations .......................................................................... 30 1.5 Biological databases .......................................................................................... 31 1.5.1.Ensembl........................................................................................................ 31 1.5.2 The Universal Protein Resource .................................................................. 32 1.5.3 Protein Data Bank ........................................................................................ 33 1.5.4. CATH .......................................................................................................... 33 1.5.5 Pfam ............................................................................................................. 34 1.6 Cancer databases .............................................................................................. 36 1.6.1 Catalogue of Somatic Mutations in Cancer ................................................. 36 1.6.2 Cancer Gene Census .................................................................................... 37 1.6.3 ClinVar ......................................................................................................... 37 1.6.4 The Cancer Genome Atlas ........................................................................... 38 1.6.5 International Cancer Genome Consortium .................................................. 38 1.6.6 The Pan Cancer Analysis of Whole Genomes ............................................. 39 1.6.7 MOKCa .......................................................................................................

Load more