bioRxiv preprint doi: https://doi.org/10.1101/2020.08.03.234799; this version posted August 4, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Merkel cell polyomavirus in Merkel cell carcinoma: Integration sites and involvement of the KMT2D tumor suppressor

Reety Arora*1, Jae Eun Choi2-5, Paul W. Harms2-6, Pratik Chandrani*,7,8

*These authors contributed equally to this work.

1National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India. 2Department of Pathology, University of Michigan, Ann Arbor, Michigan, USA. 3Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, Michigan 48109, USA. 4Rogel Cancer Center, University of Michigan, Ann Arbor, Michigan 48109, USA. 5School of Medicine, University of California, San Diego. 6Department of Dermatology, University of Michigan, Ann Arbor, Michigan, USA. 7Medical Oncology – Molecular Laboratory, Medical Oncology Department, Tata Memorial Hospital, Mumbai, India. 8Centre for Computational Biology, Bioinformatics and Crosstalk Laboratory, ACTREC – Tata Memorial Centre, Navi Mumbai, India.

Corresponding author (Reety Arora) Email: [email protected]

ABSTRACT

Merkel cell carcinoma (MCC) is an uncommon, lethal cancer of the skin caused by either Merkel cell polyomavirus (MCV) or UV-linked mutations. MCV is found integrated into MCC tumor genomes, accompanied by truncation mutations that render the MCV large T antigen replication incompetent. We used the open-access HPV Detector/Cancervirus Detector tool to determine the MCV integration sites in whole exome sequencing data from 5 MCC cases, thereby adding to the limited published MCV integration site junction data. We also systematically reviewed published data on MCV integration in the human genome, presenting a collation of 123 MCC cases and their linked chromosomal sites. We confirm that there are no highly recurrent specific sites of integration. We found that chromosome 5 is the chromosome most frequently involved by MCV integration and that integration sites are significantly enriched for genes with binding sites for oncogenic transcription factors such as LEF1 and ZEB1, suggesting the possibility of increased open chromatin in these gene sets. Additionally, in one case we found integration involving the tumor suppressor gene KMT2D for the first time, adding to previous reports of rare MCV integration into host tumor suppressor genes in MCC.

Keywords: Merkel cell polyomavirus, Merkel cell carcinoma, viral integration, KMT2D

INTRODUCTION

Merkel cell carcinoma (MCC) is an aggressive skin cancer (5-year overall survival rate of about 44% and disease-specific mortality of 33-46% 1,2), predominantly arising in the elderly and the immunocompromised, that is associated with Merkel cell polyomavirus (MCV) in the majority of cases2,3. MCV was discovered in 2008 by Feng et al. and was found to be integrated into the MCC tumor genome in a mutated, replication-deficient form3. MCV is a small (5 kb), non-enveloped, circular, dsDNA polyomavirus that is commonly found on human skin2-4. The MCV early region expresses large T antigen (LT) and small T antigen (sT), which have been shown to drive tumorigenesis and are likely causative for MCC5,6. MCC tumor cells express a truncated form of LT that cannot mediate viral replication but retains the domain responsible for inhibition of the tumor suppressor Rb 2-5. MCCs that lack MCV display cellular genomic mutations in tumor suppressor genes, especially TP53 and RB1 7-9. Additional tumor suppressors such as NOTCH family genes and KMT2D are inactivated at lower rates in MCV-negative MCC2,7-9.

Integration into the human genome is implicated as an early oncogenic event for many DNA tumor viruses10,11. Integration increases the risk of cancer beyond simple viral infection, as it can disrupt and deregulate nearby human cancer-associated genes, contribute to genome instability, and create fusion transcripts11,12. Various studies show an overall bias for integration near open chromatin regions and SINE elements for DNA tumor viruses12. However, integration site patterns differ between virus types, cancer types and disease stages.

Because MCV infection is near-ubiquitous10, demonstration of viral integration can be helpful in distinguishing tumor-associated viral sequences from spurious detection of background virus. Furthermore, viral integration can interrupt tumor suppressor genes, hence representing a potential mechanism for viral tumorigenesis in addition to viral oncoproteins. Despite the diagnostic and biologic importance of this question, studies investigating MCV integration sites have been relatively few, due to the rarity of MCC and limited high-quality tumor samples. Furthermore, the identification of MCV integration is technically challenging, as the circular genome of the virus does not linearize in a consistent manner. Several approaches have been developed to address this question, often relying upon custom assays or analysis tools (Table 1 and Supplementary Table 1). A sensitive approach for detecting MCV integration sites using commercially available assays and open-access analysis tools has not yet been uniformly validated and adopted.

MCV integration sites have been reported across most chromosomes, and may occur within genes or in intergenic regions (Table 2). With rare exceptions, the integration site for a particular tumor (and its related metastases) is unique to that case. Despite this variability, several studies have suggested that the distribution of integration sites is non-random, with patterns related to mechanisms of integration8,9. Unlike HIV, MCV does not carry an integrase, and integration into the host genome is not a part of the viral life cycle3. Starrett et al. reported that integration is likely mediated through erroneous DNA repair at sites of microhomology between the host and viral genomes, similar to mechanisms identified for HPV genome integration in squamous cell carcinoma tumors8,9. Similarly, Czech-Sioli et al. recently observed that MCV follows two main types of integration site structure: (1) linear patterns with chromosomal breakpoints related to non-homologous end joining, and (2) complex integration loci dependent on microhomology-mediated break-induced replication13,14. Doolittle-Hall et al., in the first meta-analysis of MCV integration (covering 37 MCCs), found that MCV integration occurred preferentially near SINEs and BDP1 binding sites12. They also confirmed viral preference for transcriptionally active, gene-dense regions and accessible chromatin. In addition, they observed a tropism toward integration near specific categories of genes: sensory perception and G-protein coupled receptor genes12. MCV integration within cellular genes could also act as a mechanism of tumor suppressor gene inactivation in a subset of tumors, as potentially disruptive integration into genes with tumor suppressor roles (PTPRG, XRCC4) has been described in individual cases. However, recurrent disruption of specific tumor suppressors has not been described3,8,9,12,13,15-21 (Supplementary Table 1 and Table 2).

Although most tumours demonstrate a single unique genomic integration site, rare exceptions have been described. Starrett et al. (2020) found that two independent tumours had overlapping (although non-identical) integration sites on chromosome 1, representing a rare example of a recurrent integration site. An additional case had three distinct integration sites on different chromosomes within a single tumour: one full copy of the viral genome, one smaller segment predicted to retain oncogenic activity (containing NCCR and T antigen coding regions), and one smaller segment predicted to lack oncogenic activity9.

Patterns by which the circular MCV viral genome linearizes and integrates have been defined by several studies (Table 1)8,9. For both MCV- and HPV-mediated tumours, integration may be followed by the formation of a transiently circular DNA intermediate containing viral genome and flanking regions of the host genome, that can be further amplified through aberrant firing of the viral origin of replication. This piece then reintegrates into the host genome, appearing as amplified regions of the host genome in a tandem head-to-tail conformation interspersed with the viral genome8,9.

Although the breakpoint in the MCV genome is highly variable, most studies describe that junctions are predominantly located in exon 2 of the LT gene after the pRb binding site (Table 2). This could result in a truncated, replication-deficient LT protein, equivalent to tumor-specific LT truncating mutations.

More comprehensive characterization of MCV integration locations, and thereby of integration preferences, will advance understanding of virus-mediated tumorigenesis and may reveal therapeutic vulnerabilities. Our study aimed to review published MCV integration site data and to expand the repertoire of known MCV integration sites. To this end, we adapted a bioinformatics tool designed for HPV integration site detection (HPV Detector/Cancervirus Detector22) to detect MCV in previously described MCC whole exome sequencing datasets from patient tumour samples classified as MCV-positive by quantitative PCR7. Our results identify additional specific viral and host junction sites, including integration into a tumor suppressor gene frequently mutated in MCV-negative MCC (KMT2D). Furthermore, we demonstrate the successful use of HPV Detector/Cancervirus Detector, a simple open-access analysis tool with potential utility in research and clinical testing, for detection of MCV integration in standard whole-exome sequencing data sets.

RESULTS AND DISCUSSION

Integration site detection from exome sequencing data

Of the 7 MCC whole exome sequencing datasets, we predicted viral-host junctions in 5 (including 1 primary tumor and 4 metastases). Similar to previous studies, we found that the integration site was highly variable, and located either in intergenic regions or in introns8,12,20. Three of these 5 sites were found in intergenic regions on chromosomes 4, 7 and 16, whereas for two tumors the sites were found in introns of two genes: KMT2D (intron 34) on chromosome 12, and DCAF7 (intron 3) on chromosome 17 (Table 1 and Figure 1A). KMT2D is a lysine methyltransferase and tumor suppressor gene. DCAF7 (DDB1 And CUL4 Associated Factor 7) encodes a scaffold protein for kinase signaling. Unlike a previous report12, we did not observe a trend toward involvement of sensory genes.

To determine the integration junction for the viral genome, we mapped the sequences to the MCV DysKA isolate23 (GenBank ID # KX781279.1). The sites we found differed from the previously reported pattern in which integration junctions predominantly involved the LT 2nd exon. In our cases, we found two integration sites in the NCCR region, two in the VP1 region and only one in the LT 2nd exon (Table 1 and Figure 1B). Intriguingly, the LT sequence we found at the junction is preceded by a TAA sequence in the MCV genome, indicating that this integration could also be associated with the generation of a premature LT stop codon, similar to findings by Schrama et al.15

By PCR using primers flanking the virus-host junctions, followed by Sanger sequencing of the resulting PCR amplicon, we validated the integration sites predicted for three of these cases (Figure 1C). The remaining cases lacked adequate remaining DNA for PCR.

Correlation with previously reported MCV integration sites

We evaluated our 5 integration site results in the context of 91 other previously reported MCV integration sites in the human genome by using a Circo plot24 overlaid with the COSMIC gene dataset and fragile sites25. Chromosome 5 stands out, with the highest number (21) of MCC cases having MCV integrated. There are no integration sites on chromosomes 21, 22 and X. Only five of the many genes associated with MCV integration, namely SF3B1, TLX3, CD74, MYC and KMT2D, are cancer-linked genes listed in the COSMIC database26. Additional genes with potential tumor suppressor function that have been shown to be involved by MCV integration include PTPRG, GRWD1, and XRCC4 3,19,21. Hence, in a minority of MCV-positive MCC tumors, MCV integration itself may be contributing to oncogenesis in a variable and case-specific manner via disruption of cancer genes. Additionally, although there is some overlap with fragile sites, as previously reported, this effect did not appear significant. Furthermore, the integration junctions in the human genome were observed to be enriched in genes having motifs/binding sites of known oncogenic transcription factors such as LEF1 27, MYB 28 and AREB6 (ZEB1) 29, along with other pathways (Supplementary Table 2); we speculate that this integration pattern might be related to an increased probability of open chromatin for these gene sets.

For the integration junctions in the viral genome (89 total including this study), we observed the previously reported bias to the T antigen region, with 42 cases displaying junctions in LT, and 4 in sT (Table 2).
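The counts summarized in Table 2 can be checked with a few lines of arithmetic. The following is a minimal sketch that simply re-tallies the per-chromosome and per-region numbers as listed in Table 2; the variable names are illustrative and are not part of the original analysis.

```python
# Number of reported integration sites per human chromosome (from Table 2).
HOST_SITES = {
    "1": 14, "2": 6, "3": 7, "4": 6, "5": 21, "6": 15, "7": 5, "8": 6,
    "9": 8, "10": 3, "11": 8, "12": 3, "13": 1, "14": 1, "15": 3,
    "16": 5, "17": 1, "18": 2, "19": 4, "20": 3, "Y": 1,
}

# Number of reported junctions per MCV genomic region (from Table 2).
VIRAL_SITES = {
    "LT": 42, "ST": 4, "NCCR": 8, "VP1": 23, "VP2": 10,
    "between LT and VP1": 2,
}

ALL_CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]

# Host genome summary.
total_host_sites = sum(HOST_SITES.values())          # 123 sites in total
top_chromosome = max(HOST_SITES, key=HOST_SITES.get)  # chromosome with most sites
chromosomes_without_sites = [c for c in ALL_CHROMOSOMES if c not in HOST_SITES]

# Viral genome summary: junctions falling in the T antigen region (LT + sT).
total_viral_sites = sum(VIRAL_SITES.values())         # 89 junctions in total
t_antigen_sites = VIRAL_SITES["LT"] + VIRAL_SITES["ST"]
t_antigen_fraction = t_antigen_sites / total_viral_sites

print(f"chromosome {top_chromosome}: {HOST_SITES[top_chromosome]}/{total_host_sites} host sites")
print(f"no reported sites on: {', '.join(chromosomes_without_sites)}")
print(f"T antigen junctions: {t_antigen_sites}/{total_viral_sites} ({t_antigen_fraction:.1%})")
```

On these counts, T antigen junctions (LT plus sT) account for roughly half of the viral breakpoints, consistent with the bias described above.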

Limitations

We were limited by the amount and quality of available MCC tumor DNA for PCR-Sanger validation, as the majority was used for next generation sequencing. Similarly, previously described inverse PCR approaches for demonstrating viral genome concatemerization15 were not possible due to the limited material. In addition, our approach relies upon identification of off-target capture or shoulder reads of viral sequences within human whole exome sequencing data, and thus generates less sequencing depth than a sequencing approach specifically targeting MCV.

Concluding remarks

Our analysis adds to the repertoire of MCV integration junction data and systematically collates current information regarding the same. We showed unique genetic sites of integration for both the virus and host, confirming the highly variable nature of MCV integration. For the first time we demonstrate integration involving the tumor suppressor gene KMT2D, representing the potential inactivation of a tumor suppressor gene also recurrently inactivated in MCV-negative MCC. We also demonstrate that tumor whole-exome sequencing data can be effectively interrogated by the simple, open-access Cancervirus Detector tool to demonstrate viral integration sites, thus representing a powerful and accessible new approach for analyzing existing sequencing datasets.

MATERIALS AND METHODS

MCV Integration Analysis

HPV Detector uses a computational subtraction approach to identify traces of HPV DNA in NGS datasets, as published earlier22. We modified HPV Detector to align NGS data to the human and MCV genomes. In downstream analysis, we subtracted all NGS reads mapping only to the human genome. The remaining reads either map completely to the MCV genome or are split reads, with one part mapping to the human genome and another part mapping to the MCV genome. We identified all split reads and annotated them using nearby human and MCV genome regions through BEDTools 30. The source code can be accessed via GitHub (https://github.com/pratikchandrani). The sequencing datasets generated previously in Harms et al. (2015) 7 were used for this study. The integration site junctions/genes were further subjected to pathway analysis using GSEA-MSigDB 31 with the default parameters.
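The subtraction logic described above can be illustrated with a minimal sketch. This is not the HPV Detector implementation: the `Alignment` record, the `"MCV"` reference label, and the function names are hypothetical, and the input is a toy list of already-aligned read segments rather than real aligner output. The coordinates of the example split read are borrowed from the KMT2D case in Table 3 purely for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """One aligned segment of a read (hypothetical, simplified record)."""
    read_id: str
    reference: str  # sequence the segment aligned to, e.g. "chr12" or "MCV"
    position: int   # coordinate of the aligned segment on that reference


VIRAL_REFERENCES = {"MCV"}  # assumed label of the viral contig


def classify_reads(alignments, viral_refs=VIRAL_REFERENCES):
    """Label each read as human_only, virus_only, or chimeric (split)."""
    origins = defaultdict(set)
    for aln in alignments:
        kind = "virus" if aln.reference in viral_refs else "human"
        origins[aln.read_id].add(kind)
    labels = {}
    for read_id, kinds in origins.items():
        if kinds == {"human"}:
            labels[read_id] = "human_only"   # subtracted in downstream analysis
        elif kinds == {"virus"}:
            labels[read_id] = "virus_only"
        else:
            labels[read_id] = "chimeric"     # candidate virus-host junction
    return labels


def candidate_junctions(alignments, viral_refs=VIRAL_REFERENCES):
    """Pair the human and viral segments of every chimeric read."""
    labels = classify_reads(alignments, viral_refs)
    segments = defaultdict(list)
    for aln in alignments:
        segments[aln.read_id].append(aln)
    junctions = []
    for read_id, segs in segments.items():
        if labels[read_id] != "chimeric":
            continue
        human = [s for s in segs if s.reference not in viral_refs]
        virus = [s for s in segs if s.reference in viral_refs]
        for h in human:
            for v in virus:
                junctions.append(
                    (read_id, h.reference, h.position, v.reference, v.position)
                )
    return junctions


# Toy input: one human-only read, one virus-only read, one split read.
toy_alignments = [
    Alignment("r1", "chr1", 1000),
    Alignment("r2", "MCV", 500),
    Alignment("r3", "chr12", 49429977),
    Alignment("r3", "MCV", 3082),
]
toy_labels = classify_reads(toy_alignments)
toy_junctions = candidate_junctions(toy_alignments)
```

In this sketch only `r3` survives subtraction as a candidate junction; annotation of the flanking human and viral regions (done with BEDTools in the study) would follow as a separate step.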

PCR

Primers were designed at the flanking regions (one primer on the human side and the other on the viral side) to amplify the integration breakpoints. PCR was performed on genomic DNA from the MCC tumor samples using Platinum Taq DNA Polymerase (catalog # 10966018, Invitrogen), a DMSO spike and 20 ng of DNA. Water was used as a negative control.

MCV integration site meta-analysis

Based on the closest chromosome band for each integration junction on the human genome, coordinates were taken from the UCSC Genome Browser (GRCh38/hg38). The chromosome and start position were then plotted as a circo plot to represent the integration sites. COSMIC v91 data were downloaded from the Sanger Institute website (https://cancer.sanger.ac.uk/census)26 and the Human Fragile Sites dataset was downloaded from the HumCFS database (https://webs.iiitd.edu.in/raghava/humcfs/)25. The circo plot was drawn using the Circa software (OMGenomics, http://omgenomics.com/circa/). Of the 123 sites for which the chromosome number was available, 96 (including 5 from our study) reported information regarding location on the chromosome. Using chromosome band information, the circo plot was drawn for these 96 sites. Dots on the bars represent the location of genes in which integration sites were found.
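The plotting input described above reduces to a table of chromosome and start position per site. The following is a minimal sketch using the five breakpoints reported in Table 3 of this study; the function names and the tab-separated column layout are illustrative assumptions and do not represent the input format required by the Circa software.

```python
import csv
import io

# Host breakpoints as written in Table 3 (sample -> "chrN_position").
TABLE3_BREAKPOINTS = {
    "MCC614": "chr12_49429977",
    "MCC578": "chr17_61656843",
    "MCC661": "chr4_58486053",
    "MCC345": "chr7_44780960",
    "MCC682": "chr16_85632533",
}


def parse_breakpoint(label):
    """Split a 'chrN_position' label into (chromosome, integer position)."""
    chromosome, position = label.split("_", 1)
    return chromosome, int(position)


def to_plot_table(breakpoints):
    """Return a tab-separated table with one row per integration site."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "chromosome", "start"])
    for sample, label in breakpoints.items():
        chromosome, start = parse_breakpoint(label)
        writer.writerow([sample, chromosome, start])
    return buffer.getvalue()


plot_table = to_plot_table(TABLE3_BREAKPOINTS)
print(plot_table)
```

For previously published sites where only a chromosome band is available, the start coordinate of the band would take the place of the exact breakpoint position in such a table.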

ACKNOWLEDGEMENTS

We would like to thank Prof. Sudhir Krishna and Prof. Amit Dutt for their valuable support and discussions. We thank Pankaj Vats for assistance with data retrieval and transfer. RA would also like to thank Zaina, Ziyah, and Karan for their love, strength, and support.

AUTHOR CONTRIBUTIONS

Conceptualization, R.A.; methodology, R.A. and P.C.; validation, J.C.; formal analysis, R.A. and P.C.; investigation, R.A. and P.C.; resources, R.A. and P.C.; writing - original draft preparation, R.A.; writing - review and editing, R.A., P.C., J.C., P.H.; supervision, R.A.; funding acquisition, R.A. All authors have read and agreed to the published version of the manuscript.

FUNDING

This work was supported by the Wellcome Trust/DBT India Alliance (Early Career Award IA/E/14/1/501773 to Dr. Reety Arora).

FIGURES

Figure 1. A.) Integration breakpoints for the different MCC cases. Blue is the human sequence detected and red is the MCV sequence. B.) Integration junctions identified for samples from the current study, by sample number and original tumor ID. C.) PCR validation of integration breakpoints. LT: large T antigen, 2nd exon. NCCR: non-coding control region. ST: small T antigen. T: common T antigen exon. VP: viral protein (capsid).

Figure 2. Circo plot representing 96 integration sites of MCV mapped to different chromosomes. The outermost layers represent the chromosomes (colour coded) and their respective chromosome bands. The grey circle marks the human fragile sites from the HumCFS database and the next concentric circle marks the COSMIC database gene sites. The black bars mark 91 integration locations of MCV in MCC from previous reports. Dots on the bars represent the human genes associated with this integration event, their position on the bar is arbitrary. The red bars and dots are data from 5 MCC cases analyzed using HPV Detector in this study. The Circoplot was drawn using the Circa Software (OMGenomics http://omgenomics.com/circa/).

Table 1: Summary of Studies on MCV Integration in the Human Genome

Table 2: Summary of Host and Viral Integration Sites

Table 3: Chimeric Sequences and Novel Integration Breakpoints From Current Study

Supplementary Table 1: MCV Integration Site Genes and Coordinates From Current Study and Published Reports

Supplementary Table 2: MCV Integration Site Genes and Their Associated Biological Pathways

REFERENCES

1. Harms, K.L., et al. Analysis of Prognostic Factors from 9387 Merkel Cell Carcinoma Cases Forms the Basis for the New 8th Edition AJCC Staging System. Ann Surg Oncol 23, 3564-3571 (2016).
2. Harms, P.W., et al. The biology and treatment of Merkel cell carcinoma: current understanding and research priorities. Nat Rev Clin Oncol 15, 763-776 (2018).
3. Feng, H., Shuda, M., Chang, Y. & Moore, P.S. Clonal integration of a polyomavirus in human Merkel cell carcinoma. Science 319, 1096-1100 (2008).
4. Arora, R., Chang, Y. & Moore, P.S. MCV and Merkel cell carcinoma: a molecular success story. Curr Opin Virol 2, 489-498 (2012).
5. Shuda, M., et al. T antigen mutations are a human tumor-specific signature for Merkel cell polyomavirus. Proc Natl Acad Sci U S A 105, 16272-16277 (2008).
6. Houben, R., et al. Merkel cell polyomavirus-infected Merkel cell carcinoma cells require expression of viral T antigens. J Virol 84, 7064-7072 (2010).
7. Harms, P.W., et al. The Distinctive Mutational Spectra of Polyomavirus-Negative Merkel Cell Carcinoma. Cancer Res 75, 3720-3727 (2015).
8. Starrett, G.J., et al. Merkel Cell Polyomavirus Exhibits Dominant Control of the Tumor Genome and Transcriptome in Virus-Associated Merkel Cell Carcinoma. mBio 8 (2017).
9. Starrett, G.J., et al. Clinical and molecular characterization of virus-positive and virus-negative Merkel cell carcinoma. Genome Med 12, 30 (2020).
10. Moore, P.S. & Chang, Y. Common Commensal Cancer Viruses. PLoS Pathog 13, e1006078 (2017).
11. Moore, P.S. & Chang, Y. Why do viruses cause cancer? Highlights of the first century of human tumour virology. Nat Rev Cancer 10, 878-889 (2010).
12. Doolittle-Hall, J.M., Cunningham Glasspoole, D.L., Seaman, W.T. & Webster-Cyriaque, J. Meta-Analysis of DNA Tumor-Viral Integration Site Selection Indicates a Role for Repeats, Gene Expression and Epigenetics. Cancers (Basel) 7, 2217-2235 (2015).
13. Czech-Sioli, M., et al. High-resolution analysis of Merkel Cell Polyomavirus in Merkel Cell Carcinoma reveals distinct integration patterns and suggests NHEJ and MMBIR as underlying mechanisms. bioRxiv, 2020.2004.2023.057703 (2020).
14. Leeman, J.E., et al. Human papillomavirus 16 promotes microhomology-mediated end-joining. Proc Natl Acad Sci U S A 116, 21573-21579 (2019).
15. Schrama, D., et al. Characterization of six Merkel cell polyomavirus-positive Merkel cell carcinoma cell lines: Integration pattern suggest that large T antigen truncating events occur before or during integration. Int J Cancer 145, 1020-1032 (2019).
16. Sastre-Garau, X., et al. Merkel cell carcinoma of the skin: pathological and molecular evidence for a causative role of MCV in oncogenesis. J Pathol 218, 48-56 (2009).
17. Laude, H.C., et al. Distinct merkel cell polyomavirus molecular features in tumour and non tumour specimens from patients with merkel cell carcinoma. PLoS Pathog 6, e1001076 (2010).
18. Duncavage, E.J., et al. Hybrid capture and next-generation sequencing identify viral integration sites from formalin-fixed, paraffin-embedded tissue. J Mol Diagn 13, 325-333 (2011).
19. Martel-Jantin, C., et al. Genetic variability and integration of Merkel cell polyomavirus in Merkel cell carcinoma. Virology 426, 134-142 (2012).
20. Slevin, M.K., et al. ViroPanel: Hybrid Capture and Massively Parallel Sequencing for Simultaneous Detection and Profiling of Oncogenic Virus Infection and Tumor Genome. J Mol Diagn 22, 476-487 (2020).
21. Garcia-Mulero, S., et al. Detection of Merkel cell polyomavirus using whole exome sequencing data. bioRxiv, 2020.2004.2027.063214 (2020).
22. Chandrani, P., et al. NGS-based approach to determine the presence of HPV and their sites of integration in human cancer genome. Br J Cancer 112, 1958-1965 (2015).
23. Nguyen, K.D., et al. Human polyomavirus 6 and 7 are associated with pruritic and dyskeratotic dermatoses. J Am Acad Dermatol 76, 932-940 e933 (2017).
24. Krzywinski, M., et al. Circos: an information aesthetic for comparative genomics. Genome Res 19, 1639-1645 (2009).
25. Kumar, R., et al. HumCFS: a database of fragile sites in human chromosomes. BMC Genomics 19, 985 (2019).
26. Sondka, Z., et al. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat Rev Cancer 18, 696-705 (2018).
27. Santiago, L., Daniels, G., Wang, D., Deng, F.M. & Lee, P. Wnt signaling pathway protein LEF1 in cancer, as a biomarker for prognosis and a target for treatment. Am J Cancer Res 7, 1389-1406 (2017).
28. Liu, X., Xu, Y., Han, L. & Yi, Y. Reassessing the Potential of Myb-targeted Anti-cancer Therapy. J Cancer 9, 1259-1266 (2018).
29. Liu, W., et al. Inhibition of TBK1 attenuates radiation-induced epithelial-mesenchymal transition of A549 human lung cancer cells via activation of GSK-3beta and repression of ZEB1. Lab Invest 94, 362-370 (2014).
30. Quinlan, A.R. BEDTools: The Swiss-Army Tool for Genome Feature Analysis. Curr Protoc Bioinformatics 47, 11.12.1-34 (2014).
31. Liberzon, A., et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 1, 417-425 (2015).

Table 1. Summary of Studies on MCV Integration in the Human Genome

| Year | First Author (Journal) | Detection Assay | Integration Analysis Tool | Number of MCC Integration Sites Detected | Human Genome Integration Junction, Chromosome (# of cases) | MCV Genome Integration Junction (# of cases) |
|---|---|---|---|---|---|---|
| 2008 | Feng H. (Science) | cDNA pyrosequencing, 3'-RACE, 5'-RACE, Southern hybridization | Digital Transcriptome Subtraction | 2 | chr3 | LT (1) |
| 2009 | Sastre-Garau X. (J Pathol) | DIPS-PCR | N/A | 10 | chr2, 3, 4, 5, 6, 8, 12, 20, and Y | LT (7), NCCR (2), VP1 (2) |
| 2010 | Laude H.C. (PLoS Pathogens) | DIPS-PCR | N/A | 4 | chr6, 9, 11, 15 | LT (3), VP1 (1) |
| 2011 | Duncavage E.J. (J Mol Diagn) | MCPyV-specific hybrid capture NGS | SLOPE and Breakdancer | 4 | chr6, 8, 9 | Not reported |
| 2012 | Martel-Jantin C. (Virology) | DIPS-PCR | N/A | 19 | chr1 (3), 3 (1), 4 (3), 5 (4), 6 (2), 14 (1), 18 (1) and 19 (2) | LT (13), ST (1), VP1 (3), VP2 (2) |
| 2013 | Guastafierro A. (J Virol Methods) | Lambda phage library, 3'RACE, 5'RACE and Southern hybridization | N/A | 4 | chr3, 5, 6, 10 | LT (3), ST (1) |
| 2015 | Doolittle-Hall et al. (Cancers) | N/A (meta-analysis of published reports) | Meta-analysis of published reports | 37 | Multiple, non-recurrent | Not reported |
| 2017 | Starrett G.J. (mBio) | Whole genome NGS | Custom pipeline | 4 | chr1, 6, 19, 20 | LT (2), NCCR (2) |
| 2019 | Schrama D. et al. (Int J Cancer) | DIPS-PCR | N/A | 5 | chr1, 2, 6, 11, 12 | LT (2), VP1 (4), VP2 (3) |
| 2020 | Slevin M.K. (J Mol Diagn) | Hybrid capture and massively parallel sequencing (Viropanel) | SvABA (structural variation and insertion/deletion analysis by assembly) | 33 | chr1 (5), 2 (2), 3 (1), 5 (7), 6 (4), 7 (2), 8 (2), 9 (2), 10 (1), 11 (4), 15 (1) and 16 (2) | LT (5), ST (2), VP1 (3), VP2 (1) and ** (1) |
| 2020 (bioRxiv) | Czech-Sioli M. (bioRxiv) | Long-read third generation NGS, Nanochannel and Nanopore sequencing | Multiple | 9 | chr3, 4, 5 (4), 9, 11, 13, 20 | LT (5), NCCR (2), VP1 (8), VP2 (4) and ** (1) |
| 2020 (bioRxiv) | Garcia-Mulero S. (bioRxiv) | Human coding region sequencing NGS (hybrid capture) | No specialized analysis tool | 1 | chr19 | Not reported |
| 2020 | Starrett G.J. (Genome Med) | Hybrid capture and massively parallel sequencing (Viropanel) | Custom pipeline (tools suite: https://github.com/gstarrett/oncovirus_tools) | 25 | chr1 (4), 2 (2), 3 (1), 5 (4), 6 (3), 7 (2), 8 (2), 9 (3), 10 (1), 11 (1), 15 (1), 16 (2) and 18 (1) | Not reported |
| 2020 | Arora R. (current study) | Whole exome (hybrid capture) | HPVDetector | 5 | chr4, 7, 12, 16 and 17 | LT, NCCR (2), VP1 (2) |

** This was identified in the region between the MCV LT and VP1 coding regions in the MCV genome.

Table 2. Summary of Host and Viral Integration Sites

Breakpoints in the Human Genome

| Chromosome | Number of sites | Genes Associated |
|---|---|---|
| 1 | 14 | PLA2G4A, RABGAP1L, MCOLN2, CASZ1 (I#1) |
| 2 | 6 | SF3B1 (I#1) |
| 3 | 7 | PTPRG (I#1), ATP11B, EEFSEC, ADAMTS9 (I) |
| 4 | 6 | SRD5A2L2, KCNIP4, RELL1, GRID2, FRG1 (I) |
| 5 | 21 | TLX3, XRCC4, ISL1, PRLR, CAMK2A (I), CD74 (I) |
| 6 | 15 | IL20RA, GMDS, MAP3K7IP2, CDKAL1 (I#4), MRDS1 (I#6), CDKAL1, RP3-348I23.2 |
| 7 | 5 | |
| 8 | 6 | MYC |
| 9 | 8 | DENND1A;mir601, INPP5E (I) |
| 10 | 3 | |
| 11 | 8 | PARVA;TEAD1, LUZP2 (I#10;Tr#1), TTC9C (E#1), AHNAK (I) |
| 12 | 3 | AX747640, BEST3 (I#4;Tr#1), KMT2D (I#34) |
| 13 | 1 | DACH1 (I) |
| 14 | 1 | NOVA1 |
| 15 | 3 | ATPBD4;XP_002343377 |
| 16 | 5 | |
| 17 | 1 | DCAF7 (*I#3) |
| 18 | 2 | CDH2 |
| 19 | 4 | EVI5L, ECH1, GRWD1 |
| 20 | 3 | SNTA1, CBFA2T2, BCL2L1 (E), TPX2 (I) |
| Y | 1 | |
| Total | 123 | |

Breakpoints in the Viral Genome

| Genomic Region | Number of sites | Gene |
|---|---|---|
| Large T Antigen | 42 | LT |
| Small T Antigen | 4 | ST |
| NCCR | 8 | |
| Viral Capsid Protein 1 | 23 | VP1 |
| Viral Capsid Protein 2 | 10 | VP2 |
| Between LT and VP1 | 2 | |
| Total | 89 | |

In brackets: I stands for Intron, Tr for Transcript and E for Exon. The number following # is the corresponding number of the Intron/Transcript/Exon of the gene.

Table 3. Chimeric Sequences and Novel Integration Breakpoints From Current Study

| Sample | Library # | Human Genome: Genome Location at Breakpoint | Human Genome: Gene | MCV Genome: Genome Location at Breakpoint | MCV Genome: Gene | Forward and Reverse Read Sequences |
|---|---|---|---|---|---|---|
| MCC614' | 7057 | chr12_49429977 | KMT2D (*I#34) | MCV_DysKA_ 3082 | Large T antigen | CACTTACTTCACTCGAAACAGTTTACTCCAGCCTAACCAATTCACTCACTGTTCTT AGGGCCTATTAACAGTGGAAAAACAAGCTTTGCTGCAGCCTTAATAGATTTGCTAGAAGGGAAGGCCTT GAAACATCCCATAGGTTTTCCCACCATTGCAG GAATATAAACTGTCCATCTGATAAACTACCT |
| MCC578' | 7055 | chr17_61656843 | DCAF7 (*I#3) | MCV_DysKA_ 172 | NCCR | AAGAAGCCAAAATAAATTTTCTCAGATGTCACACTGGGCCATACAT CCTAGAACAAGGCTCAGTGGGAAGTAGGCTAGGCGTGGAATGTGGTTATTTAT GAGGCCTCGGAGGCT AGGAGCCCCAAGCCTCTGCCAACTTGAAAAAAAAAAGTCACCTAGG TCCTTTGTGTGGGTGAGCTTTCCGTGTGTATAGATCTTGTCTC |
| MCC661' | 9140 | chr4_58486053 | Intergenic | MCV_DysKA_ 2201 | VP1 | TAGATCCCTGAGGAATCGCCACACTGTCTTCCACAGTGGTTGAACTAGTATACA TTATCTTTTCCTTCCATAGGTTGGCCTGACACTTTTGGCATTAAGTTGCTGAAGAGTGAGTTTGTTA GTCT |
| MCC345 | 9142 | chr7_44780960 | Intergenic | MCV_DysKA_ 175 | NCCR | GTTGAAAAGGGCTGAGGAAAGAAAAATAGAGAACAAAAAGGGGGTGGGTCA AGCCTCGGAGGCTAGGAGCCCCAAGCCTCTGCCAACTTGAAAAAAAAAAGTCACCTAGGC TTTTAAATTACTT |
| MCC682' | 9144 | chr16_85632533 | Intergenic | MCV_DysKA_ 1951 | VP1 | TGGTTGGCACAAGAAACTGCCAAGCCACAGGCCCTGCTCCATACCCTCCTCTCT CCTACTCCCCCTCAATTATTTTTCTTGGATGTTGTGAATTAAATCTTCCAGTAACC GTCCTTTTAAATGAGAATGGAGTGGGCCCTCTATGCAAAGGAGATGGCCTATTTATTAGCTGTGCAGA GCAGGAGACGTCTTC |

' represents metastasis cases; * I stands for Intron; the number following # is the intron number.
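The breakpoint labels in Table 3 appear to follow a `contig_position` pattern (for example `chr12_49429977` on the human side and `MCV_DysKA_ 3082` on the viral side). The sketch below shows one way such labels could be split into a contig name and an integer position for downstream tallying. The function name is hypothetical and the label format is an assumption read off the table, not a documented output format of HPVDetector.

```python
def parse_breakpoint(label):
    """Split a 'contig_position' label into (contig, position).

    Assumes the position is the text after the LAST underscore, so contig
    names that themselves contain underscores (e.g. 'MCV_DysKA') survive.
    """
    contig, _, position = label.rpartition("_")
    if not contig:
        raise ValueError(f"no underscore-separated position in {label!r}")
    # int() tolerates the stray space seen in labels like 'MCV_DysKA_ 3082'.
    return contig, int(position)


# Human-side and viral-side labels for sample MCC614' from Table 3.
print(parse_breakpoint("chr12_49429977"))   # ('chr12', 49429977)
print(parse_breakpoint("MCV_DysKA_ 3082"))  # ('MCV_DysKA', 3082)
```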

Supplementary Table 1. MCV Integration Site Genes and Coordinates From Current Study and Published Reports

| Year | Publication | Method | # of MCC for which integration site detected | ChrBand (human genome) | Coordinates (human genome) | Linked Gene | MCV integration site | Sample name |
|---|---|---|---|---|---|---|---|---|
| 2008 | Feng H., et al. Science, 2008 | Digital Transcriptome Subtraction, 5'RACE, 3'RACE and Southern Hybridization | 2 (matched primary and metastasis) | 3p14.2 | chr3:61561985 | PTPRG (I#1) | 2140 (LT) | MCC347, MCC348 |
| 2009 | Sastre-Garau X. et al. J Pathol, 2009 | DIPS-PCR | 10 (5 primary and metastasis and 5 only primary) | 8q24.21 | chr8:130177462 | MYC | 1532 (3', LT) | 1 |
| | | | | 12q23.1 | chr12:97372542 | AX747640 | 5202 (3', NCCR) | 2 |
| | | | | 2q32.3 | chr2:196674834 | | 3925 (5', VP1) | 3 |
| | | | | 20q11.21 | chr20:31474040 | SNTA1 | 3712 (3', VP1) | 4 |
| | | | | 4q13.1 | chr4:64683073 | SRD5A2L2 | 1515 (3', LT) | 5 |
| | | | | 3q26.33 | chr3:183625703 | ATP11B | 2663 (3', LT) | 6a;6b;6c |
| | | | | 5q35.1 | chr5:170684993 | TLX3 | 1978 (5', LT) | 7 |
| | | | | | | | 3119 (5', NCCR) | 8 |
| | | | | 6q23.3 | chr6:137408299-137409329 | IL20RA | 2240 (3', LT), 3305 (5', LT) | 9 |
| | | | | Yq12 | chrY:57288464 | | 2980 (3', LT) | 10 |
| 2010 | Laude H.C. et al. PLoS Pathogens, 2010 | DIPS-PCR | 4 (primary) | 15q14 | chr15:33600001-40100000* | ATPBD4;XP_002343377 | 2166 (LT) | 4 |
| | | | | 9q33 | chr9:117700001-130300000* | DENND1A;mir601 | 3337 (VP1) | 16 |
| | | | | 11p15 | chr11:2800001-21700000* | PARVA;TEAD1 | 2981 (LT) | 20 |
| | | | | 6p24 | chr6:7100001-13400000* | GMDS | 2594 (LT) | 29 |
| 2011 | Duncavage E.J., et al. J Mol Diagn, 2011 | Hybrid-capture enrichment optimized for FFPE tissue DNA, NGS followed by identification of off-target coverage and chimeric viral-human sequences. Followed by use of SLOPE and BreakDancer software and PCR validation | 4 (3 metastasis and 1 matched primary) | 8q12.3 | chr8:65566806-65568962 | | | 12, 23 |
| | | | | 9q33.1 | chr9:121417276 | | | 27 |
| | | | | 6p22.3 | chr6:19684666-19684859 | | | 15 |
| 2012 | Martel-Jantin et al. Virology, 2012 | DIPS-PCR | 19 | 5q14.2 | chr5: 131805 | XRCC4 | 4698 (5', VP2) | FraMerk-1 |
| | | | | 4p15.2 | | KCNIP4 | 4162 (3', VP1) | FraMerk-2 |
| | | | | 19p13.2 | | EVI5L | 2049 (5', LT) | FraMerk-3 |
| | | | | 4p14 | chr4: 60724 | RELL1 | 2210 (3', LT) | FraMerk-4 |
| | | | | 1q31.1 | chr1: 247984 | PLA2G4A | 2414 (3', LT) | FraMerk-6 |
| | | | | | | | 2173 (3', LT) | FraMerk-7 |
| | | | | 6q24.1 | chr6: 8443561 | MAP3K7IP2 | 4626 (3', VP2) | FraMerk-8b |
| | | | | 19q13.2 | | ECH1 | 2585 (3', LT) | FraMerk-9 |
| | | | | 5q11.2 | chr5: 251753 | ISL1 | 1399 (3', LT) | FraMerk-12 |
| | | | | 5q11.2 | chr5:538333 | ISL1 | 2597 (3', LT) | FraMerk-15 |
| | | | | 1q25.1 | | RABGAP1L | 2258 (3', LT) | FraMerk-16 |
| | | | | | | | 510 (5', ST) | FraMerk-18 |
| | | | | 3q21.3 | | EEFSEC | 3269 (3', VP1) | FraMerk-19 |
| | | | | 18q12.1 | chr18: 237768 | CDH2 | 1187 (3', LT) | FraMerk-20 |
| | | | | 6p22.3 | | CDKAL1 | 3527 (5', VP1) | FraMerk-22 |
| | | | | 14q12 | chr14: 483848 | NOVA1 | 1461 (3', LT) | FraMerk-23 |
| | | | | 1p22.3 | | MCOLN2 | 2091 (3', LT) | FraMerk-24c |
| | | | | 4q22.2 | | GRID2 | 1321 (3', LT) | FraMerk-26 |
| | | | | 5p13.2 | | PRLR | 999 (3', LT) | FraMerk-27 |
| 2013 | Guastafierro A., et al., J Virol Methods, 2013 | Lambda phage library, 3'RACE, 5'RACE and Southern hybridization | 4 (1 cell line, 1 metastasis and 2 primary) | 5q11.2 | chr5:2282589 | | 2843 (LT) | MS-1 |
| | | | | 6p24.3 | chr6:9842760 | MRDS1 (I#6) | 1836 (LT) | MCC345 |
| | | | | 3q26.33 | chr3:88824524 | | 596 (ST) | MCC350 |
| | | | | 10q23.1 | chr10:36041150 | | 491 (LT) | MCC352 |
| 2017 | Starrett G.J. et al., mBio, 2017 | Whole Genome NGS and Custom pipeline | 4 (primary) | 6p22.3 | chr6:20757000, chr6:20758430 | CDKAL1, RP3-348I23.2 | 20,60 (NCCR) | 09156-088 |
| | | | | 1p35.1 | chr1:31941056, chr1:31940936, chr1:31941003, chr1:31904698 | | 40 (NCCR) | 09156-076 |
| | | | | 20q11.21 | chr20:32132694 | CBFA2T2 | LT (Exon2/C terminal) | 09156-090 |
| | | | | 19q13.2 | chr3:182180601, chr19:39307703, chr19:39307378 | ECH1 | LT (Exon2/C terminal) | 09156-146 |
| 2019 | Schrama D. et al., Int J Cancer, 2019 | DIPS-PCR | 5 (cell lines) | 6p22.2 | chr6:20635170 | CDKAL1 (I#4) | 4078 (VP1), 2067 (LT) | WaGa |
| | | | | 2q33.1 | chr2:197433173 | SF3B1 (I#1) | 3775 (VP1) | LoKe |
| | | | | 1p36.22 | chr1:10790267 | CASZ1 (I#1) | 5043 (VP2), 1529 (LT) | BroLi |
| | | | | 12q14.3 | chr12:69690384 | BEST3 (I#4;Tr#1) | 3332 (VP1), 3300 (VP1) | AlDo |
| | | | | 11p14.3 | chr11:25063435 | LUZP2 (I#10;Tr#1) | 4991 (VP2), 4880 (VP2) | PeTa |
| 2020 | Slevin M.K., et al., J Mol Diagn, 2020 | Hybrid-capture and Massively parallel sequencing, Viropanel, SvABA (structural variation and insertion/deletion analysis by assembly) | 33 unique cases (Total 42 samples; sets include technical repeats and PDX of matched primary) | 3q26.33 | chr3:181965781, chr3:181965770 | | 5166 (LT), 2480 (**) | Set 1 (43, 44) |
| | | | | 15q21.3 | chr15:57507659, chr15:57507677 | | 1737 (VP1), 4693 (LT) | Set 2 (45, 46, 62, 94) |
| | | | | 1p34.2 | chr1:41108106 | | 5257 (LT), 1609 (VP1) | Set 3 (59, 98, 101) |
| | | | | 6p12.2 | chr6:51146411, chr6:51146420 | | 2176 (VP1), 4978 (ST) | Set 4 (70, 96) |
| | | | | 7p22.3 | chr7:1330176, chr7:1330199 | | 5249 (LT), 4842 (ST) | Set 5 (80, 95) |
| | | | | 5p13.2 | chr5:34193298, chr5:34349919 | | 4758 (LT), 533 (VP2) | Set 6 (72, 97) |
| | | | | 11 | | | | 34 |
| | | | | 5 | | | | 37 |
| | | | | 8 | | | | 39 |
| | | | | 5 | | | | 47 |
| | | | | 2 | | | | 48 |
| | | | | 1 | | | | 50 |
| | | | | 5 | | | | 51 |
| | | | | 10 | | | | 53 |
| | | | | 16 | | | | 55 |
| | | | | 5 | | | | 57 |
| | | | | 16 | | | | 58 |
| | | | | 9 | | | | 63 |
| | | | | 1 | | | | 64 |
| | | | | 9 | | | | 65 |
| | | | | 11 | | | | 67 |
| | | | | 8 | | | | 71 |
| | | | | 6 | | | | 73 |
| | | | | 7 | | | | 76 |
| | | | | 1 | | | | 78 |
| | | | | 1 | | | | 86 |
| | | | | 5 | | | | 87 |
| | | | | 11 | | | | 88 |
| | | | | 5 | | | | 89 |
| | | | | 11 | | | | 90 |
| | | | | 6 | | | | 91 |
| | | | | 6 | | | | 99 |
| | | | | 2 | | | | 100 |
| 2020 | Czech-Sioli M. et al., bioRxiv, 2020 | Long-read third generation NGS, Nanochannel and Nanopore sequencing and Custom pipeline | 9 (7 cell lines, 1 Primary and 1 Metastasis) | 5q11.2 | chr5:52562625, chr5:52562630 | | 159 (5', NCCR), 1498 (3', VP1) | MKL-1 |
| | | | | 11q12.3 | chr11:62728189, chr11:62505277 | TTC9C (E#1), AHNAK (I) | 647 (5', VP2), 640 (3', VP2) | MKL-2 |
| | | | | 13q21.33 | chr13:71546428, chr13:71480943 | DACH1 (I) | 1818 (5', VP1), 1939 (3', VP1) | WoWe-2 |
| | | | | 9q34.3 | chr9:136432020, chr9:136133905 | INPP5E (I) | 1663 (5', VP1), 1654 (3', VP1) | UKE-MCC1a |
| | | | | 5q11.2 | chr5:51618447, chr5:51618453 | | 738 (5', VP2), 1519 (3', VP1) | UM-MCC-29 |
| | | | | 20q11.21 | chr20:31785438, chr20:31665275, chr20:31715051, chr20:31749139 | BCL2L1 (E ), TPX2 (I) | 397 (5', NCCR), 4225 (3', LT), 2261 (5', VP1), 3755 (3', LT) | UKE-MCC-4a |
| | | | | 4q35.2, 5q32, 5q33.1 | chr4:189965715, chr4:189948666, chr5:150404900, chr5:150238240 | FRG1 (I), CAMK2A (I), CD74 (I) | 1136 (5', VP2), 3916 (3', LT), 1855 (5', VP1), 2470 (3', **) | UM-MCC-52 |
| | | | | 3p14.1 | chr3:64619639, chr3:64619644 | ADAMTS9 (I) | 5290 (5', LT), 5193 (3', LT) | MCC-47T, MCC-47M |
| 2020 | Garcia-Mulero S. et al. bioRxiv, 2020 | Human coding region sequencing NGS (Hybrid Capture) and Custom pipeline | 1 | 19q13.33 | chr19:48445990 | GRWD1 | | 48121209 |
| 2020 | Starrett G.J. et al., Genome Medicine, 2020 | Hybrid-capture and Massively parallel sequencing, Viropanel and Custom pipeline (oncovirus tools suite, https://github.com/gstarrett/oncovirus_tools) | 25 | 3q27.1 | chr3:181965781, chr3:181965770 | | | MCC001 |
| | | | | 5p14.3 | chr5:20753360, chr5:33939328 | | | MCC005 |
| | | | | 2q33.1 | chr2:196945370, chr2:196945371 | | | MCC006 |
| | | | | 5q33.1 | chr5:149618981, chr5:149709442 | | | MCC008 |
| | | | | 1p13.1 | chr1:116791739, chr1:117025123 | | | MCC10 |
| | | | | 6p21.2 | chr6:36192882, chr6:36282634 | | | MCC013 |
| | | | | 5q31.2 | chr5:138420218, chr5:138511276 | | | MCC014 |
| | | | | 10q21.3 | chr10:63999700, chr10:64000021 | | | MCC019 |
| | | | | 16q24.1 | chr16:83581326, chr16:83890305 | | | MCC022 |
| | | | | 9q21.2, 16q12.1, 18p11.32 | chr9:7689383, chr9:77023700; chr16:47914233, chr16:48036152; chr18:1561377, chr18:1668866 | | | MCC026 |
| | | | | 1p36.32 | chr1:3582621, chr1:4107851 | | | MCC027 |
| | | | | 2q34 | chr2:206984157, chr2:206984156 | | | MCC029 |
| | | | | 15q22.1 | chr15:57507670, chr15:57507677 | | | MCC036 |
| | | | | 8p12 | chr8:28408988, chr8:28457320 | | | MCC040 |
| | | | | 9q32 | chr9:111568335, chr9:111579165 | | | MCC041 |
| | | | | 1p12 | chr1:116797448, chr1:116797523 | | | MCC042 |
| | | | | 9p22.3 | chr9:13451094, chr9:13451103 | | | MCC043 |
| | | | | 11q14.1 | chr11:79113528, chr11:79113529 | | | MCC044 |
| | | | | 6p12.2 | chr6:51146411, chr6:51146421 | | | MCC050 |
| | | | | 8q23.3 | chr8:113896842, chr8:114256794 | | | MCC052 |
| | | | | 5p15.31 | chr5:8556313, chr5:34193826 (34349919–34349456) | | | MCC054 |
| | | | | 6p24.2 | chr6:9659029, chr6:9659034 | | | MCC056 |
| | | | | 7q31.32 | chr7:121478017, chr7:121478033 | | | MCC062 |
| | | | | 1p31.1 | chr1:76825442, chr1:76826185 | | | MCC069 |
| | | | | 7p22.3 | chr7:1330002, chr7:1593035 | | | MCC071 |
| 2020 | Arora et al. (current study) | Whole Exome (Hybrid Capture) and HPV Detector | 5 (1 Primary, 4 Metastasis) | 12q13.12 | chr12:49429977 | KMT2D (*I#34) | 3082 (LT) | METMCC_614 (7057) |
| | | | | 17q23.2 | chr17:61656843 | DCAF7 (*I#3) | 172 (NCCR) | METMCC578 (7055) |
| | | | | 4q13.1 | chr4:58486053 | | 2201 (VP1) | METMCC661 (9140) |
| | | | | 7p12.3 | chr7:44780960 | | 175 (NCCR) | MCC345 (9142) |
| | | | | 16q24.1 | chr16:85632533 | | 1951 (VP1) | METMCC682 (9144) |

Cases/cell lines tested twice (in different publications) have only been listed once (as per first report).

* These coordinates were added by Doolittle-Hall J.M., et al. Cancers (Basel), 2015. The original publication gave the following information (Distance): Case 4 - 353 kbp (5'), 672 kbp (3'); Case 16 - 2nd intron, 600 bp (3'); Case 20 - 8 kbp (5'), 228 kbp (3'); and Case 29 - 7th/10th introns.

Doolittle-Hall J.M., et al. Cancers (Basel), 2015 compiled all cases from 2008-2015.

Total data of 95 cases has been compiled above.

I stands for Intron; the number following # is the intron number.

** This was identified in the region between MCV LT and VP1 coding regions in the MCV genome.

Supplementary Table 2. MCV Integration Site Genes and Their Associated Biological Pathways

| Gene Set Name | # Genes in Gene Set (K) | # Genes in Overlap (k) | k/K | p-value | FDR q-value | Description |
|---|---|---|---|---|---|---|
| AACTTT_UNKNOWN | 1919 | 15 | 0.0078 | 4.72E-09 | 1.21E-04 | Genes having at least one occurrence of the highly conserved motif M17 AACTTT in the region spanning up to 4 kb around their transcription start sites. The motif does not match any known binding site (v7.4 TRANSFAC). |
| CTTTGA_LEF1_Q2 | 1237 | 11 | 0.0089 | 2.22E-07 | 2.86E-03 | Genes having at least one occurrence of the highly conserved motif M73 CTTTGA in the region spanning up to 4 kb around their transcription start sites. The motif matches transcription factor binding site V$LEF1_Q2 (v7.4 TRANSFAC). |
| TGCTGAY_UNKNOWN | 540 | 7 | 0.013 | 4.04E-06 | 3.46E-02 | Genes having at least one occurrence of the highly conserved motif M92 TGCTGAY in the region spanning up to 4 kb around their transcription start sites. The motif does not match any known transcription factor binding site (v7.4 TRANSFAC). |
| ACTGCCT_MIR34B | 220 | 5 | 0.0227 | 7.43E-06 | 3.96E-02 | Genes having at least one occurrence of the motif ACTGCCT in their 3' untranslated region. The motif represents putative target (that is, seed match) of human mature miRNA hsa-miR-34b (v7.1 miRBase). |
| GO_POSITIVE_REGULATION_OF_MOLECULAR_FUNCTION | 1781 | 11 | 0.0062 | 7.69E-06 | 3.96E-02 | Any process that activates or increases the rate or extent of a molecular function, an elemental biological activity occurring at the molecular level, such as catalysis or binding. [GO:jl] |
| WTGAAAT_UNKNOWN | 623 | 7 | 0.0112 | 1.02E-05 | 3.97E-02 | Genes having at least one occurrence of the highly conserved motif M174 WTGAAAT in the region spanning up to 4 kb around their transcription start sites. The motif does not match any known transcription factor binding site (v7.4 TRANSFAC). |
| GO_REGULATION_OF_CELL_DIFFERENTIATION | 1881 | 11 | 0.0058 | 1.29E-05 | 3.97E-02 | Any process that modulates the frequency, rate or extent of cell differentiation, the process in which relatively unspecialized cells acquire specialized structural and functional features. [GOC:go_curators] |
| MYB_Q3 | 252 | 5 | 0.0198 | 1.43E-05 | 3.97E-02 | Genes having at least one occurrence of the transcription factor binding site V$MYB_Q3 (v7.4 TRANSFAC) in the regions spanning up to 4 kb around their transcription starting sites. |
| MIR3978 | 254 | 5 | 0.0197 | 1.49E-05 | 3.97E-02 | Genes predicted to be targets of miRBase v22 microRNA hsa-miR-3978 in miRDB v6.0 with MirTarget v4 prediction scores > 80 (high confidence targets). |
| AREB6_02 | 256 | 5 | 0.0195 | 1.54E-05 | 3.97E-02 | Genes having at least one occurrence of the transcription factor binding site V$AREB6_02 (v7.4 TRANSFAC) in the regions spanning up to 4 kb around their transcription starting sites. |
| MIR6124 | 449 | 6 | 0.0134 | 1.77E-05 | 4.13E-02 | Genes predicted to be targets of miRBase v22 microRNA hsa-miR-6124 in miRDB v6.0 with MirTarget v4 prediction scores > 80 (high confidence targets). |
| CTTTGT_LEF1_Q2 | 1983 | 11 | 0.0055 | 2.10E-05 | 4.51E-02 | Genes having at least one occurrence of the highly conserved motif M13 CTTTGT in the region spanning up to 4 kb around their transcription start sites. The motif matches transcription factor binding site V$LEF1_Q2 (v7.4 TRANSFAC). |
| MIR1283 | 477 | 6 | 0.0126 | 2.48E-05 | 4.72E-02 | Genes predicted to be targets of miRBase v22 microRNA hsa-miR-1283 in miRDB v6.0 with MirTarget v4 prediction scores > 80 (high confidence targets). |
| RODRIGUES_THYROID_CARCINOMA_ANAPLASTIC_UP | 719 | 7 | 0.0097 | 2.57E-05 | 4.72E-02 | Genes up-regulated in anaplastic thyroid carcinoma (ATC) compared to normal thyroid tissue. |

GSEA-MSigDB analysis parameters: genes in universe (N): 38404; genes in comparison (n): 47; gene sets in collections: 25724; collections: C1, C2, C3, C4, C5, C6, C7, H
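The k/K column and the order of magnitude of the p-values can be reproduced from the parameters listed with the table (N, n, K, k). The sketch below is an illustration only: it assumes the overlap p-value is the upper tail of a hypergeometric distribution, which is a common choice for gene set overlap tests but is not stated in the table itself, and the function names are hypothetical.

```python
from math import comb


def overlap_ratio(k, K):
    """Fraction of the gene set that appears in the query list (k/K)."""
    return k / K


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in universe, K: genes in the gene set,
    n: genes in the query list, k: observed overlap.
    Exact integer arithmetic is used until the final division.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Parameters from Supplementary Table 2, first row (AACTTT_UNKNOWN).
N, n = 38404, 47
K, k = 1919, 15

print(round(overlap_ratio(k, K), 4))       # 0.0078
print(hypergeom_upper_tail(N, K, n, k))    # on the order of 1e-9
```

Under this assumption the computed tail probability for the first row falls in the same range as the tabulated p-value; exact agreement is not guaranteed, because the tool's universe definition and rounding may differ.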