Recurrent Integration of Human Papillomavirus Genomes at Transcriptional
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2021.08.30.457540; this version posted August 31, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. 1 Recurrent Integration of Human Papillomavirus Genomes at Transcriptional 2 Regulatory Hubs 3 Alix Warburton1, Tovah E. Markowitz2-3, Joshua P. Katz4, James M. Pipas4 and Alison 4 A. McBride1 * 5 6 1Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, 33 North Drive, 7 MSC3209, National Institutes of Health, Bethesda, Maryland 20892, USA 8 2NIAID Collaborative Bioinformatics Resource (NCBR), National Institute of Allergy and Infectious 9 Diseases, National Institutes of Health, Bethesda, MD, USA 10 3Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, 11 Frederick, MD, USA 12 4Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, USA 13 14 *Corresponding author 15 E-mail: [email protected] 16 Short Title: Integration of HPV genomes in human cancers 17 18 19 20 21 22 1 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.30.457540; this version posted August 31, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. 23 ABSTRACT 24 Oncogenic human papillomavirus (HPV) genomes are often integrated into host chromosomes in HPV- 25 associated cancers. HPV genomes are integrated either as a single copy, or as tandem repeats of viral 26 DNA interspersed with, or without, host DNA. Integration occurs frequently in common fragile sites 27 susceptible to tandem repeat formation, and the flanking or interspersed host DNA often contains 28 transcriptional enhancer elements. When co-amplified with the viral genome, these enhancers can form 29 super-enhancer-like elements that drive high viral oncogene expression. Here, we compiled highly 30 curated datasets of HPV integration sites in cervical (CESC) and head and neck squamous cell carcinoma 31 (HNSCC) cancers and assessed the number of breakpoints, viral transcriptional activity, and host genome 32 copy number at each insertion site. Tumors frequently contained multiple distinct HPV integration sites, 33 but often only one “driver” site that expressed viral RNA. Since common fragile sites and active 34 enhancer elements are cell-type specific, we mapped these regions in cervical cell lines using FANCD2 35 and Brd4/H3K27ac ChIP-seq, respectively. Large enhancer clusters, or super-enhancers, were also 36 defined using the Brd4/H3K27ac ChIP-seq dataset. HPV integration breakpoints were enriched at both 37 FANCD2-associated fragile sites, and enhancer-rich regions, and frequently showed adjacent focal DNA 38 amplification in CESC samples. We identified recurrent integration “hotspots” that were enriched for 39 super-enhancers, some of which function as regulatory hubs for cell-identity genes. We propose that 40 during persistent infection, extrachromosomal HPV minichromosomes associate with these 41 transcriptional epicenters, and accidental integration could promote viral oncogene expression and 42 carcinogenesis. 2 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.30.457540; this version posted August 31, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. 43 INTRODUCTION 44 Persistent infection with high-oncogenic risk HPV types is responsible for almost all cervical and 45 ~70% oropharyngeal carcinomas 1. One factor that can contribute to oncogenic progression of HPV- 46 positive lesions is integration of the viral genome into host chromatin. Integration is associated with 47 increased genetic instability in high-grade cervical intraepithelial neoplasia (CIN) and cervical and 48 oropharyngeal carcinomas 2-9 due to dysregulated expression of the viral oncoproteins, E6 and E7. Many 49 studies have compared the human genomic regions associated with HPV integration sites to elucidate 50 the mechanisms that might promote integration and carcinogenesis. Here, we have curated datasets 51 from The Cancer Genome Atlas and other published sources to define a rigorous database of integration 52 breakpoints and correlated these with “in-house” datasets of common fragile sites and enhancer 53 elements defined in cervical carcinoma cells. 54 Integration of HPV DNA occurs in all human chromosomes; however, integration sites are often 55 found within or in close proximity to common fragile sites 10-13. Common fragile sites are regions of the 56 genome that have difficulty completing replication and, as such, are susceptible to chromosome 57 breakage in mitosis. They are prone to replication stress that can be due to a shortage of replication 58 origins or clashes between replication and transcriptional processes. Therefore, they vary in fragility 59 depending on cell type and disease state 14. Most previous studies that documented an association of 60 HPV integration sites with common fragile sites used the classical FRA-regions that were defined 61 cytogenetically in lymphocytes 9-13,15. As such, these fragile sites are not cell-type specific and are often 62 large, poorly defined regions that cover a large proportion of the human genome. FANCD2 is required 63 for resolution of these genetically unstable sites and, as such, is a marker of common fragile sites 16-18. 64 HPV integration sites occur frequently in amplified regions of the host genome, and focal 65 amplification of cellular flanking sequences at sites of viral integration are frequently observed in HPV- 66 positive tumors 6,7,9,15. Co-amplification of the viral genome and flanking cellular sequences can result 3 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.30.457540; this version posted August 31, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. 67 from unlicensed initiation of replication at the viral origin resulting in endoreduplication 19-21. 68 Subsequent recombination can result in amplified tandem repeats. Genome amplification can also occur 69 at common fragile sites by breakage-fusion-bridge cycles 22. 70 HPV integration is also enriched at transcriptionally active regions of the host genome 15,23. We 71 previously identified an HPV16 integration site in the W12 20861 cervical cell line that was adjacent to a 72 cell-type specific enhancer. Co-amplification of this regulatory element and the viral genome to 73 approximately 25 copies resulted in the formation of a super-enhancer-like element to drive high viral 74 oncogene expression 24,25. This ‘enhancer-hijacking’ is a novel mechanism by which HPV integration can 75 promote oncogenesis; however, it is unclear how common this mechanism is in HPV-associated cancers. 76 The aim of this study was to examine the association among HPV integration loci, common 77 fragile sites, and genome amplification to determine whether insertion of HPV genomes adjacent to 78 active cellular enhancers often resulted in viral ‘enhancer-hijacking’, and whether genetic instability 79 could result in co-amplification of viral-cellular regulatory repeats to drive oncogenic progression of 80 HPV-associated cancers. We have extended our previously published work 26 to generate a common 81 fragile site dataset in cervical carcinoma cell lines using higher resolution mapping of FANCD2 binding 82 and have mapped cellular enhancers and super-enhancers in an HPV16-positive cell line derived from a 83 cervical lesion27 using H3K27ac and Brd4 ChIP-seq (Figure 1). Here we compare HPV integration sites 84 with these cervical cell-type specific enhancer and fragile site datasets. 4 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.30.457540; this version posted August 31, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. 85 RESULTS 86 CESC and HNSCC tumors frequently contain multiple, clustered HPV integration breakpoints 87 A dataset of HPV integration breakpoints was assembled from various sources 5,6,8,9,28-35, as 88 outlined in Figure 1 and Supplementary Table S1. Integration breakpoints were defined as the junctions 89 between the viral and host chimeric reads within the human reference genome. A total of 1,299 90 integration breakpoints from 333 cervical carcinomas (CESC) and 119 integration breakpoints from 41 91 head and neck squamous cell carcinomas (HNSCC) were included in this study (Supplementary Figure S1 92 and Supplementary Table S2-S3). We found that many tumor samples contained multiple integration 93 breakpoints that could have resulted from either independent integration events at different 94 chromosomal loci or from amplification of a single integration site resulting in a cluster of multiple, 95 closely spaced breakpoints. To classify this, we defined an integration locus as either a single HPV 96 insertion breakpoint, or as multiple, closely spaced breakpoints (a cluster), Figure 2a. Clustered 97 breakpoints within the same chromosome that had a maximum distance of 3 Mb between the most 5’ 98 and 3’ breakpoints were classified as a single integration locus. Based on this classification, the total 99 number of integration loci analyzed in our study was 584 for CESC samples and 58 for HNSCC samples.