Characterization of Hepatitis B Virus Integrations Identified In
Total Page:16
File Type:pdf, Size:1020Kb
viruses Article Characterization of Hepatitis B Virus Integrations Identified in Hepatocellular Carcinoma Genomes Pranav P. Mathkar 1, Xun Chen 1,2,* , Arvis Sulovari 1,3 and Dawei Li 1,4,* 1 Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA 2 Institute for the Advanced Study of Human Biology, Kyoto University, Kyoto 606-8501, Japan 3 Cajal Neuroscience Inc., Seattle, WA 98102, USA 4 Department of Biomedical Science, Charles E. Schmidt College of Medicine, Florida Atlantic University, Boca Raton, FL 33431, USA * Correspondence: [email protected] (D.L.); [email protected] (X.C.) Abstract: Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality. Almost half of HCC cases are associated with hepatitis B virus (HBV) infections, which often lead to HBV sequence integrations in the human genome. Accurate identification of HBV integration sites at a single nucleotide resolution is critical for developing a better understanding of the cancer genome landscape and of the disease itself. Here, we performed further analyses and characterization of HBV integrations identified by our recently reported VIcaller platform in recurrent or known HCC genes (such as TERT, MLL4, and CCNE1) as well as non-recurrent cancer-related genes (such as CSMD2, NKD2, and RHOU). Our pathway enrichment analysis revealed multiple pathways involving the alcohol dehydrogenase 4 gene, such as the metabolism pathways of retinol, tyrosine, and fatty acid. Further analysis of the HBV integration sites revealed distinct patterns involving the integration upper breakpoints, integrated genome lengths, and integration allele fractions between tumor and normal tissues. Our analysis also implies that the VIcaller method has diagnostic potential through Citation: Mathkar, P.P.; Chen, X.; discovering novel clonal integrations in cancer-related genes. In conclusion, although VIcaller is a Sulovari, A.; Li, D. Characterization hypothesis free virome-wide approach, it can still be applied to accurately identify genome-wide of Hepatitis B Virus Integrations integration events of a specific candidate virus and their integration allele fractions. Identified in Hepatocellular Carcinoma Genomes. Viruses 2021, 13, Keywords: viral integration; virome-wide detection; VIcaller; integration allele fraction; hepatocellular 245. https://doi.org/10.3390/ carcinoma (HCC); hepatitis B virus (HBV) v13020245 Academic Editor: Kei Fujiwara Received: 31 December 2020 1. Introduction Accepted: 2 February 2021 Published: 4 February 2021 Hepatocellular carcinoma (HCC), a primary liver malignancy and leading cause of cancer-related deaths, is a major global health concern. The onset of HCC is often preceded Publisher’s Note: MDPI stays neutral by chronic liver conditions (such as hepatitis, cirrhosis or fibrosis [1,2]), which can further with regard to jurisdictional claims in complicate anti-cancer treatment regimens [3]. Extensive research has established that published maps and institutional affil- chronic Hepatitis B virus (HBV) infection is a leading risk factor for the initiation and iations. progression of HCC [4] and may account for approximately half of all HCC cases [5]. With about 400 million global cases of HBV infection [5], HBV-associated HCC remains highly prevalent, particularly in parts of Asia and Sub-Saharan Africa [6]. Despite the strong correlation between HBV infections and HCC onset, the precise genetic and genomic mechanisms underlying this relationship are still poorly understood [5]. Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. Following infection, HBVs commonly integrate their DNA into the human genome. This article is an open access article Such integration events have been identified in 75–90% of HCC tissues [7]. These inte- distributed under the terms and grations may further lead to development of host-virus fusion transcripts, particularly conditions of the Creative Commons if the integrations occur within genic regions. The HBV genome is circular and partially Attribution (CC BY) license (https:// double stranded and is approximately 3200 base pairs (bp) in length. It contains four creativecommons.org/licenses/by/ overlapping open reading frames, namely surface, core, polymerase, and X [8]. The HBV 4.0/). X region, located near the nucleotide 1800 (nt 1374–1838) region on the HBV reference Viruses 2021, 13, 245. https://doi.org/10.3390/v13020245 https://www.mdpi.com/journal/viruses Viruses 2021, 13, 245 2 of 16 genome [9], is often retained following integration events and is selectively over-expressed in HCC [10,11]; thus, it may contribute to the high rate of metastasis in HBV-associated HCC patients [12]. Systematic research on the identification and characterization of the patterns of integration breakpoints on both the human and HBV genomes is critical for understanding the mechanistic processes of HBV integrations. We recently found that the cellular proportion of each integration event, which can be estimated based on its integration allele fraction, serves as important evidence to identify integration events involved in early-stage tumorigenesis [13]. Such analyses may provide new opportunities to understand the genomic landscape of HBV integrations and their involvement in the early stages of HCC development, which may help identify new diagnostic biomarkers and potential targets for therapeutic intervention. An increasing number of studies have focused on the detection of infectious pathogens using cancer high-throughput sequencing (HTS) data [14]. HTS reads uniquely mapped to a viral reference genome are strong evidence of the presence of the virus in a sample; however, this analysis is compounded by two major practical challenges. First, the detected virus-mappable reads may have been derived from environmental microbes, synthetic DNA [15], or commonly-used cell lines such as HeLa cells [16]. Second, the presence of virus-mappable reads alone may not be conclusive for its role in tumorigenesis since viral infection may have occurred after oncogenesis [16]. To overcome these challenges, we recently developed a novel strategy and bioinformatics platform, viral integration caller (VIcaller), for identifying clonal viral integrations in the human genome and providing estimated integration allele frequencies [13]. We recently reported our preliminary findings on the analysis of whole genome sequencing (WGS) data from 88 HBV-associated HCC patients [9] using the VIcaller approach, and demonstrated high sensitivity and precision for the identification of integration events [13]. In this study, we report our in-depth analyses of these HBV integration events detected by VIcaller, as well as the characterization and pathway enrichment analyses of these integration events. 2. Methods 2.1. Detection of HBV Integrations Using VIcaller A total of 88 HCC samples with both tumor and paired normal tissues were previously analyzed using the VIcaller platform [13]. In brief, we first submitted the paired-end reads in FASTQ format to VIcaller’s “detect” function to screen for integrations of the candidate virus, i.e., HBV (NC_003977.2), with the parameters: “-d WGS -m standard -r -a -q 20”. We then validated all identified candidate HBV integrations using the VIcaller “validate” function with the default parameters. Only the successfully validated HBV integration candidates were included for further analyses. Lastly, we calculated the integration allele fraction of each detected HBV integration event using the VIcaller “calculate” function with default parameters. To identify novel HBV integration events in the 88 HCC sam- ples, we subsequently compared our detected HBV integrations with those presented by Sung et al. [9]. The resulting HBV integrations were then classified as “consistent”, if detected both by VIcaller and Sung et al. “novel”, if detected by VIcaller but not by Sung et al., and “missing” if detected by Sung et al. but not VIcaller [13]. 2.2. Repeat Sequence Comparative Analysis The repeat sequence annotation file generated by RepeatMasker [17] was downloaded from the UCSC Genome Browser (build hg19). The repeat density was examined using the BEDtools2 window function [18], and we determined the proportion of repeat sequences covering a 2000 bp window centered at each HBV integration site. We compared the repeat densities of novel, missing, and consistent HBV integrations with random positions across the human genome using a student’s t-test. Viruses 2021, 13, 245 3 of 16 2.3. Upper and Lower Breakpoint Analyses The genomic coordinates of the upper and lower breakpoints of each integration on the HBV reference genome (NC_003977.2) were analyzed separately. The HBV upper breakpoint is the genomic coordinate of the sequence junction connected upstream (50) orientation on the human sequence, while the lower breakpoint is the other HBV sequence junction connected downstream (30) on the human sequence. When the breakpoint locations were detected from chimeric reads only (without split reads; chimeric and split reads were depicted in Supplementary Figure S1 of our previous paper [19]), the nearest location to the breakpoint was adopted. HBV integrations with one breakpoint and those with two breakpoints were analyzed separately. The lengths of inserted HBV genomes or sequences were estimated based on the HBV integration events with both upper and lower breakpoints detected. 2.4. Gene Ontology and Pathway Enrichment Analyses of Novel HBV Integrations We first converted the