The Structure and Evolution of Breast Cancer Genomes

Scott Newman
Clare College, University of Cambridge

A dissertation submitted to the University of Cambridge in candidature for the degree of Doctor of Philosophy

February 2011

Declaration

This dissertation contains the results of experimental work carried out between October 2007 and December 2010 in the Department of Pathology, University of Cambridge. This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. It has not been submitted whole or in part for any other qualification at any other University.

Summary

Chromosome changes in the haematological malignancies, lymphomas and sarcomas are known to be important events in the evolution of these tumours as they can, for example, form fusion oncogenes or disrupt tumour suppressor genes. The recently described recurrent fusion genes in prostate and lung cancer were landmark examples, as they showed that important gene fusions also occur in the common epithelial cancers. Breast cancers often display extensive structural and numerical chromosome aberration and have among the most complex karyotypes of all cancers. Genome rearrangements are potentially an important source of mutation in breast cancer, but little is known about how they might contribute to this disease.

My first aim was to carry out a structural survey of breast cancer cell line genomes in order to find genes that were disrupted by chromosome aberrations in "typical" breast cancers. I investigated three breast cancer cell lines, HCC1187, VP229 and VP267, using data from array painting, SNP6 array CGH, molecular cytogenetics and massively parallel paired end sequencing.
I then used these structural genomic maps to predict fusion transcripts and demonstrated expression of five fusion transcripts in HCC1187, three in VP229 and four in VP267.

Even though chromosome aberrations disrupt and fuse many genes in individual breast cancers, a major unknown is the relative importance and timing of genome rearrangements compared to sequence-level mutation. For example, chromosome instability might arise early and be essential to tumour suppressor loss and fusion gene formation, or it might be a late event contributing little to cancer development. To address this question, I considered the evolution of these highly rearranged breast cancer karyotypes.

The VP229 and VP267 cell lines were derived from the same patient before and after therapy-resistant relapse, so any chromosome aberration found in both cell lines was probably present in their common in vivo ancestor. A large majority of structural variants detected by massively parallel paired end sequencing, including three fusion transcripts, were found in both cell lines and therefore in the common ancestor. This probably means that the bulk of genome rearrangement pre-dated the relapse.

For HCC1187, I classified most of its mutations as earlier or later according to whether they occurred before or after a landmark event in the evolution of the genome: endoreduplication (duplication of its entire genome). Genome rearrangements and sequence-level mutations were fairly evenly divided between earlier and later, implying that genetic instability was relatively constant throughout the evolution of the tumour. Surprisingly, the great majority of inactivating mutations and expressed gene fusions happened earlier. The non-random timing of these events suggests many were selected.

Acknowledgements

Thanks to members of the Edwards group past and present, and especially to Paul Edwards himself, for his sound advice and good humour over the past few years.
I am also grateful to Clare College and the Medical Research Council for their financial support. Most of all, thanks to my patient and understanding wife, Star, who has supported me throughout my Master's and Ph.D. studies. And last but not least, I need to thank my daughter, Ellinore. Her impending birth made me write faster.

Contents

Declaration
Summary
Acknowledgements
List of Figures
List of Tables
Abbreviations

Chapter 1 Introduction
1.1. Cancer
1.2. Breast Cancer
1.2.1. Susceptibility Alleles
1.2.2. Breast Cancer Histology
1.2.3. Gene expression patterns
1.2.4. Developmental Hierarchy of Breast Cells
1.3. Mutations that cause cancer
1.3.1. Sequence-level changes
1.3.2. Sequence-level changes in breast cancer
1.3.3. Changes to chromosome structure
1.3.4. The cytogenetics of breast cancer
1.3.5. Tumour Suppressor Gene Deletion
1.3.6. Oncogene Amplification
1.3.7. Gene Fusion
1.3.7.1. Receptor Tyrosine Kinases
1.3.7.2. Intracellular Kinases
1.3.7.3. Transcription factors and Chromatin Modifiers
1.3.8. Gene fusions in breast cancer
1.3.9. The complex structure of breast cancer genomes
1.4. Questions for post-genome cancer research
1.4.1. What types of mutations are needed to cause cancer?
1.4.2. How many mutations are required for cancer to develop?
1.4.2.1. Drivers versus Passengers
1.4.2.2. Driving mutations caused by chromosome aberrations
1.4.3. How should we deal with intra-tumour heterogeneity?
1.4.4. What is the role of chromosome instability?
1.4.4.1. The State of CIN
1.4.4.2. The timing of CIN
1.4.4.3. The Acquisition of CIN
1.5. Techniques used and discussed in this thesis
1.5.1. Fluorescence in situ Hybridization (FISH)
1.5.2. Spectral Karyotyping
1.5.3. Flow sorting of Chromosomes
1.5.4. Array CGH
1.5.5. Array Segmentation Algorithms
1.5.6. Array Painting
1.5.7. Massively Parallel Paired End Sequencing
1.7. The purpose of this thesis
1.7.1. Aim 1: Map chromosome rearrangements in breast cancer
1.7.2. Aim 2: Investigate the relative timing of point mutations and chromosome aberrations

Chapter 2 Materials and Methods
2.1. Reagents, Manufacturers and Suppliers
2.2. Common Solutions
2.3. Cell Lines and Culture
2.3.1. Thawing Splitting and Feeding Cells
2.4. Chromosome Preparations
2.4.1. Metaphase Chromosome Preparation for Flow Sorting
2.4.2. Metaphase Preparation for FISH
2.4.3. Preparation of DNA Fibres for FISH
2.5. Fluorescence in situ Hybridization (FISH)
2.5.1. Preparation and Labelling of Chromosome Paints
2.5.2. BAC clones and their culture
2.5.3. Probe DNA Extraction and Labelling
2.5.4. Probe Precipitation
2.5.5. FISH Hybridization
2.5.6. Post Hybridization Washing and Detection
2.5.7. Fibre FISH hybridizations and Washes
2.5.8. Fibre FISH Detection of indirectly labelled probes
2.5.9. Image Acquisition and Processing
2.6. PCR and Sequencing
2.6.1. Amplification of Sorted Chromosomes for PCR
2.6.2. Genomic DNA preparation
2.6.3. cDNA Preparation
2.6.4. PCR of Fusion Transcripts
2.6.5. Sanger Sequencing of Fusion Transcripts
2.6.6. Sanger Sequencing of Somatically Mutated Region
2.6.7. Sanger Sequencing Across Genomic Breakpoints
2.6.8. Pyrosequencing
2.6.9. Illumina Sequencing
2.6.10. Quantitative PCR
2.7. Bioinformatics
2.7.1. SNP6 data and Segmentation
2.7.2. Break point regions from segmented SNP6 array CGH data
2.7.3. Genes at SNP6 break points
2.7.4. Ensembl API scripting to predict gene fusions
2.7.5. Ensembl API scripting to retrieve structural variant break point regions
2.7.6. Circular visualisation of data
2.8. Statistical Model
2.8.1. Maximum likelihood estimators and confidence intervals
2.8.2. Classical approach
2.8.3. Finding the MLEs
2.8.4. Confidence intervals

Chapter 3 The Structure of a Breast Cancer Genome
3.1. Introduction
3.2. Previous Data
3.2.1. Spectral Karyotyping (SKY)
3.2.2. Array Painting
3.2.3. Massively Parallel Paired End Sequencing
3.2.4. Exome-wide Mutation Screen and Targeted Resequencing
3.3. Analysis Part I. The Genome Structure of HCC1187
3.3.1. Combining Array Painting Data with SNP6 array CGH Data
3.3.2. Incorporating Massively Parallel Paired End Sequence data
3.3.3. Genes at Chromosome Break Points
3.3.4. Sub-Microscopic Aberrations From SNP6 and Massively Parallel Paired End Sequencing Data
3.3.5. Broken and Predicted Fusion Genes in HCC1187
3.4. Expressed Fusion Genes in HCC1187
3.4.1. PUM1-TRERF1
3.4.2. CTCF-SCUBE2
3.4.3. RHOJ-SYNE2
3.4.4. CTAGE5-SIP1
3.4.5. SUSD1-ROD1
3.4.6. PLXND1-TMCC1
3.4.7. Other reported gene-fusions in HCC1187
3.5. Analysis Part II. Sequence-Level Mutations in HCC1187
3.5.1. Placing Sequence-Level mutations on the Genomic Map
3.5.2. Confirmation by pyrosequencing
3.6. Discussion
3.6.1. How complete was this analysis?
3.6.2. The rearrangements that fused genes
3.6.3. Conclusions

Chapter 4 The Evolution of a Breast Cancer Genome
4.1.
