HEMO, an Ancestral Endogenous Retroviral Envelope Protein Shed in the Blood of Pregnant Women and Expressed in Pluripotent Stem Cells and Tumors
Odile Heidmann (a,b,1), Anthony Béguin (a,b), Janio Paternina (a,b), Raphaël Berthier (a,b), Marc Deloger (c), Olivia Bawa (d), and Thierry Heidmann (a,b)

(a) Unité Physiologie et Pathologie Moléculaires des Rétrovirus Endogènes et Infectieux, CNRS UMR 9196, Gustave Roussy, Villejuif, F-94805, France; (b) UMR9196, Université Paris-Sud, Orsay, F-91405, France; (c) Plateforme de Bioinformatique, INSERM US23/CNRS UMS3655, Gustave Roussy, Villejuif, F-94805, France; and (d) Plateforme d’Évaluation Préclinique, Laboratoire de Pathologie Expérimentale, Gustave Roussy, Villejuif, F-94805, France

Edited by John M. Coffin, Tufts University School of Medicine, Boston, MA, and approved June 6, 2017 (received for review February 17, 2017)

Capture of retroviral envelope genes is likely to have played a role in the emergence of placental mammals, with evidence for multiple, reiterated, and independent capture events occurring in mammals, and be responsible for the diversity of present day placental structures. Here, we uncover a full-length endogenous retrovirus envelope protein, dubbed HEMO [human endogenous MER34 (medium-reiteration-frequency-family-34) ORF], with unprecedented characteristics, because it is actively shed in the blood circulation in humans via specific cleavage of the precursor envelope protein upstream of the transmembrane domain. At variance with previously identified retroviral envelope genes, its encoding gene is found to be transcribed from a unique CpG-rich promoter not related to a retroviral LTR, with sites of expression including the placenta as well as other tissues and rather unexpectedly, stem cells as well as reprogrammed induced pluripotent stem cells (iPSCs), where the protein can also be detected. We provide evidence that the associated retroviral capture event most probably occurred >100 Mya before the split of Laurasiatheria and Euarchontoglires, with the identified retroviral envelope gene encoding a full-length protein in all simians under purifying selection and with similar shedding capacity. Finally, a comprehensive screen of the expression of the gene discloses high transcript levels in several tumor tissues, such as germ cell, breast, and ovarian tumors, with in the latter case, evidence for a histotype dependence and specific protein expression in clear-cell carcinoma. Altogether, the identified protein could constitute a “stemness marker” of the normal cell and a possible target for immunotherapeutic approaches in tumors.

HERV | endogenous retrovirus | envelope protein | placenta | development | stem cells | tumors

Significance

Endogenization of retroviruses has occurred multiple times in the course of vertebrate evolution, with the captured retroviral envelope syncytins playing a role in placentation in mammals, including marsupials. Here, we identify an endogenous retroviral envelope protein with unprecedented properties, including a specific cleavage process resulting in the shedding of its extracellular moiety in the human blood circulation. This protein is conserved in all simians (with a homologous protein found in marsupials), with a “stemness” expression in embryonic and reprogrammed stem cells, as well as in the placenta and some human tumors, especially ovarian tumors. This protein could constitute a versatile marker, and possibly an effector, of specific cellular states and, being shed, be immunodetected in the blood.

Author contributions: O.H. and T.H. designed research; O.H., A.B., J.P., R.B., M.D., and O.B. performed research; O.H., A.B., J.P., M.D., and T.H. analyzed data; and O.H., A.B., and T.H. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. MF320351–MF320355).

1 To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1702204114/-/DCSupplemental.

Endogenous retroviral sequences represent ∼8% of the human genome. These sequences [called human endogenous retroviruses (HERVs)] share strong similarities with present day retroviruses and are the proviral remnants of ancestral germ-line infections by active retroviruses, which have thereafter been transmitted in a Mendelian manner (1–3). The >30,000 proviral copies found in the human genome can be grouped into about 80 distinct families, with most of these elements being nonprotein-coding because of the accumulation of mutations, insertions, deletions, and/or truncations (4, 5). However, some retroviral genes have retained a coding capacity, and some of them have even been diverted by remote primate ancestors for a physiological role. The so-called “syncytins,” namely syncytin-1 and -2 in humans, are retroviral envelope (env) genes captured 25 and 40 Mya, respectively, with a full-length protein-coding sequence, a fusogenic activity, and strong placental expression (6–9). These genes have been shown to be involved in placenta formation, with their fusogenic activity contributing to the formation of the syncytiotrophoblast (ST) at the maternofetal interface as a result of the syncytin-mediated cell–cell fusion of the underlying mononucleated cytotrophoblasts (CTs). Syncytins were, thereafter, identified in all placental mammals where they have been searched for, and their unambiguous role in placentation was shown via the generation and characterization of KO mice (10, 11). Syncytins are also present in marsupials, where they are expressed in a short-lived placenta that is very transiently formed (a few days) before the embryo pursues its development in an external pouch (12).

Previous systematic searches for genes encoding endogenous retroviral Env proteins within the human genome have led to the identification of 18 genes with a full-length coding sequence (among which are syncytin-1 and -2) (13, 14). These analyses have been performed using methods based on the search for characteristic motifs carried by retroviral Envs (Fig. 1), which include, from the N terminus to the C terminus, a signal peptide; a furin cleavage site (R-X-R/K-R) between the surface (SU) and transmembrane (TM) subunits, with the latter carrying additional signatures including an immunosuppressive domain (ISD; 17-aa motif), which is also found in most oncoretroviruses; a characteristic C-(X)5–7-C motif; and a transmembrane hydrophobic domain anchoring the Env protein in the cell or virion membrane (4, 15).

Less stringent methods based on BLAST searches using large panels of retroviral Env proteins, including the increasing number of newly identified ERV genes from other animals, led us to identify a gene encoding a full-length retroviral Env protein with unprecedented characteristics. This Env protein gene, dubbed HEMO [human endogenous MER34 (medium-reiteration-frequency-family-34) ORF], is
E6642–E6651 | PNAS | Published online July 24, 2017 | www.pnas.org/cgi/doi/10.1073/pnas.1702204114

[Fig. 1 image. Labels recoverable from the panels:
A: canonical Env schematic with SU and TM subunits, SU-TM cleavage site (RXKR), CXXC, C-CC, signal peptide, fusion peptide, ISD, and transmembrane domain; extracellular, membrane, and intracellular regions; N-ter and C-ter.
B: hydrophobicity profile, y axis from -3 to 4, x axis residues 1 to 563; marked features: signal peptide, CWLC, CTQG, ISD, C-CC, transmembrane.
D: tree taxa: mIAPE, Cav-env1, HERV-R-erv3, Syn-Mar1, ALV, Syn-Rum1, HERV-Rb, Syn-Car1, Syn-Ten1, HERV-V2, HEMO, Env-panMars, HERV-Pb, HERV-H1, HERV-H2, HERV-H3, HERV-FRD, SynB-mus, SynA-mus, HERV-Fc1, HERV-Fc2, HERV-W, RD114, BaEV, MPMV, Syn-Ory1, ReVA, HERV-T, Syn-Opo1, PERV-A, FeLV, MoMLV, mGLN, KoRV, GaLV, HTLV-2, BLV, HIV1, FIV, HERV-K, JSRV, MMTV, BIV; scale bar 0.6.
E: HEMO locus on Chr 4 (coordinates 52,752,000 to 52,743,000) with MER34-int (gag, pro, pol, env), CpG island, MER34-pol, MER34-env, MER34-A, MER74-B, SINE and LINE elements, exons E1 to E4, ATG and Stop of the ORF, qPCR primer positions, and poly(A) tail.
F: transcript level (% max) in tissues (PBL, skin, brain, heart, liver, lung, testis, colon, ovary, kidney, breast, trachea, spleen, thymus, thyroid, pancreas, placenta, prostate, bone marrow, adrenal gland) and cell lines (Bewo, JAR, JEG-3, 293T, HeLa, TE671, HuH7, NT2D1, NCCIT, ESC-H1, ESC-H7, ESC-H9, 2102Ep, CaCo-2, SH-SY5H, iPSC NP24).]

C: amino acid sequence of the HEMO Env protein:

  1 MGSLSNYALL QLTLTAFLTI LVQPQHLLAP VFRTLSILTN QSNCWLCEHL
 51 DNAEQPELVF VPASASTWWT YSGQWMYERV WYPQAEVQNH STSSYRKVTW
101 HWEASMEAQG LSFAQVRLLE GNFSLCVENK NGSGPFLGNI PKQYCNQILW
151 FDSTDGTFMP SIDVTNESRN DDDDTSVCLG TRQCSWFAGC TNRTWNSSAV
201 PLIGLPNTQD YKWVDRNSGL TWSGNDTCLY SCQNQTKGLL YQLFRNLFCS
251 YGLTEAHGKW RCADASITND KGHDGHRTPT WWLTGSNLTL SVNNSGLFFL
301 CGNGVYKGFP PKWSGRCGLG YLVPSLTRYL TLNASQITNL RSFIHKVTPH
351 RCTQGDTDNP PLYCNPKDNS TIRALFPSLG TYDLEKAILN ISKAMEQEFS
401 ATKQTLEAHQ SKVSSLASAS RKDHVLDIPT TQRQTACGTV GKQCCLYINY
451 SEEIKSNIQR LHEASENLKN VPLLDWQGIF AKVGDWFRSW GYVLLIVLFC
501 LFIFVLIYVR VFRKSRRSLN SQPLNLALSP QQSAQLLVSE TSCQVSNRAM
551 KGLTTHQYDT SLL

Fig. 1. Characterization of the human HEMO Env retroviral protein and the HEMO env gene. (A) Schematic representation of a canonical retroviral Env protein delineating the SU and TM subunits. The furin cleavage site (consensus: R-X-R/K-R) between the two subunits, the C-X-X-C motif involved in SU-TM interaction, the hydrophobic signal peptide (purple), the fusion peptide (green), the transmembrane domain (red), and the putative ISD (blue) along with the conserved C-X5–7-CC motif (C-CC) are indicated. Adapted from ref. 38, copyright (2007) National Academy of Sciences. (B) Hydrophobicity profile of HEMO Env. The canonical structural features highlighted in A are positioned and shown in the color code used in A. The mutated furin site (CTQG) is shown as a dotted line. (C) Amino acid sequence of the HEMO Env protein with the same color code. (D) Retroviral Env protein-based phylogenetic tree with the identified HEMO-Env protein. The maximum likelihood tree was constructed using the full-length SU-TM amino acid sequences from HERV Envs (including an HERV-K consensus), all previously identified syncytins, and a series of endogenous and infectious retroviruses.
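The sequence motifs named in the Fig. 1 legend can be located with simple pattern matching. The following is a minimal sketch for illustration only, not the search procedure used in the study: the regular expressions are our own transcription of the consensus motifs given in the legend (R-X-R/K-R, C-X-X-C, C-X5–7-CC), the names `HEMO_ENV`, `MOTIFS`, and `find_motifs` are hypothetical, and the sequence is the 563-aa HEMO Env sequence of Fig. 1C as transcribed from the figure.

```python
import re

# HEMO Env amino acid sequence as transcribed from Fig. 1C (563 residues).
HEMO_ENV = (
    "MGSLSNYALLQLTLTAFLTILVQPQHLLAPVFRTLSILTNQSNCWLCEHL"
    "DNAEQPELVFVPASASTWWTYSGQWMYERVWYPQAEVQNHSTSSYRKVTW"
    "HWEASMEAQGLSFAQVRLLEGNFSLCVENKNGSGPFLGNIPKQYCNQILW"
    "FDSTDGTFMPSIDVTNESRNDDDDTSVCLGTRQCSWFAGCTNRTWNSSAV"
    "PLIGLPNTQDYKWVDRNSGLTWSGNDTCLYSCQNQTKGLLYQLFRNLFCS"
    "YGLTEAHGKWRCADASITNDKGHDGHRTPTWWLTGSNLTLSVNNSGLFFL"
    "CGNGVYKGFPPKWSGRCGLGYLVPSLTRYLTLNASQITNLRSFIHKVTPH"
    "RCTQGDTDNPPLYCNPKDNSTIRALFPSLGTYDLEKAILNISKAMEQEFS"
    "ATKQTLEAHQSKVSSLASASRKDHVLDIPTTQRQTACGTVGKQCCLYINY"
    "SEEIKSNIQRLHEASENLKNVPLLDWQGIFAKVGDWFRSWGYVLLIVLFC"
    "LFIFVLIYVRVFRKSRRSLNSQPLNLALSPQQSAQLLVSETSCQVSNRAM"
    "KGLTTHQYDTSLL"
)

# Assumed regex renderings of the consensus motifs from the figure legend.
MOTIFS = {
    "furin_site": r"R.[RK]R",   # R-X-R/K-R
    "CXXC": r"C..C",            # C-X-X-C
    "C-X5-7-CC": r"C.{5,7}CC",  # C-X5-7-CC
}


def find_motifs(seq, patterns=MOTIFS):
    """Return {motif name: [(1-based start position, matched text), ...]}."""
    hits = {}
    for name, pattern in patterns.items():
        # A lookahead wrapper reports overlapping occurrences as well.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
    return hits


if __name__ == "__main__":
    for name, found in find_motifs(HEMO_ENV).items():
        print(name, found)
```

On this sequence the scan reports no match to the furin consensus, which agrees with the legend's statement that the furin site is mutated (CTQG, residues 352 to 355), and it places the CWLC motif at residue 44.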