Discovery of Unfixed Endogenous Retrovirus Insertions in Diverse

Discovery of unfixed endogenous retrovirus insertions in diverse human populations

Julia Halo Wildschutte(a,1), Zachary H. Williams(b,1), Meagan Montesion(b), Ravi P. Subramanian(b), Jeffrey M. Kidd(a,c), and John M. Coffin(b,2)

(a) Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109; (b) Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, MA 02111; and (c) Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109

Contributed by John M. Coffin, February 11, 2016 (sent for review November 25, 2015; reviewed by Norbert Bannert, Robert Belshaw, and Jack Lenz)

PNAS PLUS | Microbiology | PNAS Early Edition | www.pnas.org/cgi/doi/10.1073/pnas.1602336113

Endogenous retroviruses (ERVs) have contributed to more than 8% of the human genome. The majority of these elements lack function due to accumulated mutations or internal recombination resulting in a solitary (solo) LTR, although members of one group of human ERVs (HERVs), HERV-K, were recently active with members that remain nearly intact, a subset of which is present as insertionally polymorphic loci that include approximately full-length (2-LTR) and solo-LTR alleles in addition to the unoccupied site. Several 2-LTR insertions have intact reading frames in some or all genes that are expressed as functional proteins. These properties reflect the activity of HERV-K and suggest the existence of additional unique loci within humans. We sought to determine the extent to which other polymorphic insertions are present in humans, using sequenced genomes from the 1000 Genomes Project and a subset of the Human Genome Diversity Project panel. We report analysis of a total of 36 nonreference polymorphic HERV-K proviruses, including 19 newly reported loci, with insertion frequencies ranging from <0.0005 to >0.75 that varied by population. Targeted screening of individual loci identified three new unfixed 2-LTR proviruses within our set, including an intact provirus present at Xq21.33 in some individuals, with the potential for retained infectivity.

HERV-K | HML-2 | human endogenous retrovirus | 1000 Genomes Project | Human Genome Diversity Project

Significance

The human endogenous retrovirus (HERV) group HERV-K contains nearly intact and insertionally polymorphic integrations among humans, many of which code for viral proteins. Expression of such HERV-K proviruses occurs in tissues associated with cancers and autoimmune diseases, and in HIV-infected individuals, suggesting possible pathogenic effects. Proper characterization of these elements necessitates the discrimination of individual HERV-K loci; such studies are hampered by our incomplete catalog of HERV-K insertions, motivating the identification of additional HERV-K copies in humans. By examining >2,500 sequenced genomes, we have discovered 19 previously unidentified HERV-K insertions, including an intact provirus without apparent substitutions that would alter viral function, only the second such provirus described. Our results provide a basis for future studies of HERV evolution and implication for disease.

Author contributions: J.H.W., Z.H.W., J.M.K., and J.M.C. designed research; J.H.W., Z.H.W., M.M., and R.P.S. performed research; J.H.W., Z.H.W., M.M., and R.P.S. contributed new reagents/analytic tools; J.H.W., Z.H.W., R.P.S., J.M.K., and J.M.C. analyzed data; and J.H.W., Z.H.W., J.M.K., and J.M.C. wrote the paper.

Reviewers: N.B., Robert Koch Institute; R.B., University of Plymouth; and J.L., Albert Einstein Medical School.

The authors declare no conflict of interest.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. KU054242–KU054309).

1 J.H.W. and Z.H.W. contributed equally to this work.

2 To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1602336113/-/DCSupplemental.

During a retrovirus infection, a DNA copy of the viral RNA genome is permanently integrated into the nuclear DNA of the host cell as a provirus. The provirus is flanked by short target site duplications (TSDs), and consists of an internal region encoding the genes for replication that is flanked by identical LTRs. Infection of cells contributing to the germ line may result in a provirus that is transmitted to progeny as an endogenous retrovirus (ERV), and may reach population fixation (1). Indeed, more than 8% of the human genome is recognizably of retroviral origin (2). The majority of human ERVs (HERVs) represent ancient events and lack function due to accumulated mutations or deletions, or from recombination leading to the formation of a solitary (solo) LTR; however, several HERVs have been coopted for physiological functions to the host (3).

The HERV-K (HML-2) proviruses (4–9), so-named for their use of a Lys tRNA primer and similarity to the mouse mammary tumor virus (human MMTV like) (10), represent an exception to the antiquity of most HERVs. HML-2 has contributed to at least 120 human-specific insertions, and population-based surveys indicate as many as 15 unfixed sites, including 11 loci with more or less full-length proviruses (5, 6, 8, 9). To distinguish the latter from recombinant solo-LTRs, we refer to these elements as “2-LTR” insertions throughout this study. The majority of these insertions are estimated to have occurred within the past ∼2 My, the youngest after the appearance of anatomically modern humans (4, 8, 11). Population modeling has implied a relatively constant rate of HML-2 accumulation since the Homo-Pan divergence (5, 12, 13). All known insertionally polymorphic HML-2 proviruses have signatures of purifying selection, implying ongoing exogenous replication, and retain one or more ORFs (8, 13–15). HML-2 expression has been observed in tumor-derived tissues as well as normal placenta in the form of RNAs, proteins, and noninfectious retrovirus-like particles (3, 16–19). These unique properties raise the possibility that some HML-2 group members are still capable of replication by exogenous transmission from rare intact proviruses, from the generation of infectious recombinants via copackaged viral RNAs, or from rare viruses still in circulation in some populations. A naturally occurring infectious provirus has yet to be observed, although the well-studied “K113” provirus, which is not in the GRCh37 (hg19) reference genome but maps to chr19:21,841,544, has intact ORFs (9) and engineered recombinant HML-2 proviruses are infectious in cell types, including human cells (20, 21). The goal of this study was to enhance our understanding of such elements by identifying and characterizing additional polymorphic HML-2 insertions in the population.

The wealth of available human whole-genome sequence (WGS) data should, in principle, provide the information needed to identify transposable elements (TEs), including proviruses, in the sequenced population. However, algorithms for routine analysis of short-read (e.g., Illumina) paired-end sequence data exclude reads that do not match the reference genome. Based on read signatures stemming from such read pairs, specialized algorithms have been developed to detect TEs present within sequenced whole genomes. These methods seek to identify read pairs for which one read is mapped to a reference genome and the mate is aligned to the TE of interest (22). Additional criteria (e.g., read support, depth, presence of reads that cross the insertion junction) are then assessed to identify a confident call set. Recent applications of this general method to Illumina WGS data have indicated the presence of additional nonreference HML-2 insertions (12, 23), although validation and further characterization of these sites have been limited. Also, given the comparably short fragment lengths of typical Illumina libraries, it is not possible to distinguish between solo-LTR insertions and the presence of a 2-LTR provirus using these data alone, and experimentation is required to exclude sequencing artifacts.

To date, the number of human genomes analyzed for unfixed HML-2 proviruses is fairly small, limiting discovery of elements not present in the human reference genome, or “nonreference” elements, to those elements that are present in a relatively high proportion of individuals. Here, we build on existing detection methods to improve the efficiency of nonreference HML-2

Analysis of Unmapped Reads for LTR Junction Discovery. Unmapped reads were retrieved from BAM files with Samtools (samtools.sourceforge.net/) from all 53 HGDP samples and 825 1KGP samples (≥10 samples per 1KGP population) and searched for a sequence that matched the 5′ HML-2 LTR edge (TGTGGGGAAAAGCAAGAGA), 3′ LTR edge (GGGGCAACCCACCCATACA), or 3′ LTR variant (GGGGCAACCCACCCATTCA) that is observed in a subset of human-specific elements, requiring ≥10 bp of non-LTR sequence per read. Reads matching reference HML-2 junctions were removed. Candidate reads were then aligned to the hg19 reference to identify genomic position. Sequences with no match to hg19, with <90% identity, or that aligned to gaps or multiple genomic positions were searched against the chimpanzee (panTro4) and gorilla (gorGor3) references, and available human WGS data from the NCBI Trace Archive to identify insertions in structurally variable regions.

Validation and Sequencing. DNA from samples yielding positive reads was obtained from Coriell or the Foundation Jean Dausset-Centre d’Étude du Polymorphisme Humain. Coordinates for each insertion were based on mapping of assembled contigs or read-captured flanking sequence to the hg19 reference. PCR was performed with 100 ng of genomic DNA using primers flanking each site to detect either the empty site or solo-LTR alleles. A separate PCR was run to infer a 2-LTR allele with a primer situated in the HML-2 5′ UTR paired with a flanking primer (6, 8, 33).
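
The junction-read search described under Analysis of Unmapped Reads for LTR Junction Discovery can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes exact matching of the three LTR edge sequences quoted in the text, assumes that the non-LTR flank lies upstream of the 5′ edge and downstream of the 3′ edge, and checks both strands of each read. The names LTR_EDGES, revcomp, and find_junction are hypothetical, and retrieving unmapped reads from a BAM file is not shown.

```python
# Illustrative sketch only: exact-match scan of a read for an HML-2 LTR edge.
# The edge sequences and the 10-bp flank requirement come from the text;
# exact matching, strand handling, and all names here are assumptions.

LTR_EDGES = {
    "5p_edge": "TGTGGGGAAAAGCAAGAGA",
    "3p_edge": "GGGGCAACCCACCCATACA",
    "3p_variant": "GGGGCAACCCACCCATTCA",
}
MIN_FLANK = 10  # minimum non-LTR bases required per read

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_junction(read, min_flank=MIN_FLANK):
    """Return (edge_name, strand, flank) for the first qualifying LTR edge
    found in the read or its reverse complement, or None if there is none."""
    read = read.upper()
    for strand, seq in (("+", read), ("-", revcomp(read))):
        for name, edge in LTR_EDGES.items():
            pos = seq.find(edge)
            if pos < 0:
                continue
            if name == "5p_edge":
                # Assumed: genomic flank precedes the start of the LTR.
                flank = seq[:pos]
            else:
                # Assumed: genomic flank follows the end of the LTR.
                flank = seq[pos + len(edge):]
            if len(flank) >= min_flank:
                return name, strand, flank
    return None


if __name__ == "__main__":
    example = "ACGTACGTACGT" + LTR_EDGES["5p_edge"] + "GG"
    print(find_junction(example))
```

In a fuller workflow the returned flank would be the sequence aligned to hg19 to place the candidate insertion; tolerating mismatches in the edge sequence would be a natural extension but is left out here.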
