Supporting Information

Legendre et al. 10.1073/pnas.1510795112

SI Methods

Sample Recovery and Radiocarbon Dating. Sterility controls of the samples were performed during their collection as previously described (16, 32, 33). The samples of buried soils were taken from the frozen outcrop walls in Chukotka, on the Stanchikovsky yar (GPS coordinates: 68.370155, 161.415553, 68°22′13″N and 161°24′56″E), at a height of 23 to 24 m above the Anui River. The melting material was first cleaned out from the wall surface, and the unthawed layer was exposed; the frozen rock was excavated to make a hollow 30–40 cm deep, and a sample was taken from this hollow. After treating with 95% ethanol, the sample was placed in a sterile plastic bag and stored frozen. In the laboratory, the samples were stored in freezers at −20 °C.

Isolation and Production. Mollivirus was isolated from a piece of the buried soil sample P1084-T as previously reported (16). Briefly, 400 mg of P1084-T were resuspended in 6 mL of Prescott and James medium (27). The infection trials were performed twice and produced identical results. Each 3 mL were supplemented with 300 μL of Amphotericin B (Fungizone), 250 μg/mL, and 1.65 mL of this 10% Fungizone solution was left overnight at 4 °C under stirring. After decantation, the supernatant was recovered and centrifuged at 800 × g for 5 min. Acanthamoeba castellanii (Douglas) Neff (ATCC 30010™) cells adapted to resist Fungizone (2.5 μg/mL) were inoculated with 100 μL of the supernatant and with the pellet resuspended in 50 μL of Tris (20 mM), CaCl2 (1 mM), pH 7.4. The cells were cultured at 32 °C in microplates with 1 mL of protease peptone–yeast extract–glucose (PPYG) medium supplemented with antibiotics [ampicillin, 100 μg/mL, and penicillin–streptomycin, 100 μg/mL (Gibco); Fungizone, 2.5 μg/mL (Life Technologies)] and monitored for cell death.

Virus Purification. The wells presenting an infection phenotype were recovered, centrifuged for 5 min at 500 × g to remove the cellular debris, and used to infect four T-75 tissue-culture flasks plated with fresh Acanthamoeba cells. After completion of the infectious cycle, the cultures were recovered, centrifuged for 5 min at 500 × g to remove the cellular debris, and the virus was pelleted by a 30-min centrifugation at 3,000 × g prior to purification. The viral pellet was then resuspended and washed twice in PBS, layered on a discontinuous sucrose gradient [30%/40%/50%/60% (wt/vol)], and centrifuged at 5,000 × g for 15 min. The virus produced a white disk, which was recovered, washed twice in PBS, and stored at 4 °C or −80 °C with 7.5% DMSO.

Virus Cloning. A. castellanii cells (70,000/cm²) were seeded on a 12-well culture plate with 1 mL of PPYG. After adhesion, viruses were added to the well at a multiplicity of infection (MOI) of 50. After 1 h, the well was washed several times with 1 mL of PPYG to remove the excess of viruses. The cells were then recovered by gently scraping the well, and a serial dilution was performed in the next three wells by mixing 200 μL of the previous well with 500 μL of PPYG. Drops of 0.5 μL of the last dilution were recovered and observed by light microscopy to verify that there were fewer than two cells. The 0.5-μL droplets were then distributed in each well of a 24-well culture plate. One thousand uninfected A. castellanii cells in 500 μL of PPYG were added to the wells seeded with a single cell and monitored for cell death. The corresponding viral clones were recovered and amplified prior to purification, DNA extraction, proteome analysis, and cell cycle characterization by electron microscopy.

Infected Cells and Virion Imaging.

Infectious cycle observations by TEM. A. castellanii-infected cell cultures were fixed by adding an equal volume of PBS with 2% glutaraldehyde and incubated for 20 min at room temperature. Cells were recovered and pelleted for 20 min at 5,000 × g. The pellet was resuspended in 1 mL of PBS with 1% glutaraldehyde, incubated at least 1 h at 4 °C, and washed twice in PBS prior to coating in agarose and embedding in Epon resin. Each pellet was mixed with 2% low-melting agarose and centrifuged to obtain small flanges of approximately 1 mm³ containing the sample coated with agarose. These samples were then embedded in Epon resin using a standard method: 1-h fixation in 1% osmium tetroxide, dehydration in increasing ethanol concentrations (50%, 70% including uranyl acetate 2%, 90%, and 100% ethanol), and embedding in Epon-812. Ultrathin sections of 70 nm were poststained with 4% uranyl acetate and lead citrate and observed using a Zeiss EM 912 operating at 100 kV.

Scanning electron microscopy observations of Mollivirus purified particles. A suspension of purified Mollivirus particles in PHEM buffer (240 mM Pipes, 100 mM Hepes, 8 mM MgCl2, 40 mM EGTA, pH 6.9) was adsorbed on a poly-L-lysine-coated silica slide for 10 min at room temperature, and then fixed with 2.5% glutaraldehyde in PHEM buffer for 20 min. After three washes of 5 min in PHEM buffer and two of 2 min in ddH2O, the silica slide was put in 50% acetone for 5 min. Serial increasing acetone baths (75%, 85%, 95%, 100%) of 5 min were done, followed by two more baths in 100% acetone for 5 min. Samples were then placed in the chamber of a critical point dryer filled with 100% acetone. After cooling to 10 °C, the acetone was replaced by carbon dioxide before heating at the critical point under pressure. Samples were sputter coated with 80 Å of gold and observed on a Jeol JSM-6320F at 15 kV.

Visualization of 5-ethynyl-2′-deoxyuridine-labeled Mollivirus particles. A. castellanii cells were infected by Mollivirus at a MOI of 0.25 and grown in the presence of 100 μM 5-ethynyl-2′-deoxyuridine (EdU) until the infectious cycle was complete. The virions were recovered and used to infect A. castellanii cells at a MOI of 20. Cells were recovered after 30, 60, and 90 min of infection and fixed with formaldehyde (3.7%), permeabilized with Triton X-100 (0.5%), and labeled with Alexa Fluor 488 picolyl azide in a copper buffer according to the manufacturer’s protocol (Click-it Plus EdU 488 Imaging Kit; Molecular Probes). Images were recorded on a Zeiss Axio Observer Z1 inverted microscope using a 63× objective lens associated with a 1.6× Optovar.

Mollivirus Virion DNA Extraction. The genomic DNA was recovered from 1.8 × 10¹⁰ purified particles using the PureLink Genomic DNA Extraction Mini Kit (Life Technologies) according to the manufacturer’s protocol.

Mollivirus Sequencing. Five hundred nanograms of genomic DNA were sheared to a 150- to 700-bp range using the Covaris E210 instrument (Covaris). Sheared DNA was used for Illumina library preparation by a semiautomated protocol. Briefly, end repair, A-tailing, and ligation of Illumina-compatible adaptors (Bioo Scientific) were performed using the SPRIWorks Library Preparation System and SPRI TE instrument (Beckman Coulter), according to the manufacturer’s protocol. A 300- to 600-bp size selection was applied to recover most of the fragments. DNA fragments were amplified by 12 cycles of PCR using Platinum Pfx Taq Polymerase Kit (Life Technologies) and Illumina adapter-specific primers. Libraries were purified with 0.8× AMPure XP beads (Beckman Coulter). After library profile

Legendre et al. www.pnas.org/cgi/content/short/1510795112

analysis by Agilent 2100 Bioanalyzer (Agilent Technologies) and qPCR quantification, the libraries were sequenced using 151 base-length read chemistry in paired-end flow cell on the Illumina MiSeq (Illumina). About 2 × 1.5 million useful reads were obtained.

Transcriptome Preparation.

Mollivirus-infected A. castellanii cells. Adherent cells were infected by Mollivirus with a MOI of 50 and distributed in 30 flasks (1.4 × 10⁷ cells/flask of 175 cm²) containing 20 mL of PPYG and left at 32 °C for 30 min, after which viruses in excess were removed. For each time point, 12 mL were recovered to make three pools (1: 30 min, 1, 2 h; 2: 3, 4, 5 h; 3: 6, 7, 9 h) for transcriptomic analysis, 3 mL for the quantitative temporal proteomic study, and 3 mL for TEM observations.

RNA extraction. RNA was extracted using the RNeasy Midi kit (catalog no. 75144; Qiagen) using the manufacturer’s protocol. Briefly, the cells were resuspended in 4 mL of RLT buffer supplemented with 0.1% β-mercaptoethanol and disrupted by subsequent −80 °C freezing and thawing at 37 °C for 10 min. Total RNA was eluted with two successive additions of ∼200 μL of RNase-free water.

RNA quantification and quality control. RNA was quantified by measuring the absorbance at 260 nm using a NanoDrop Spectrophotometer. The integrity of the RNA sample was assessed using the Experion Automated Electrophoresis System with RNA StdSens chips and reagents (Bio-Rad).

Mollivirus Transcriptome Sequencing. Paired-end libraries were prepared with early, intermediate, and late mRNA following Illumina’s protocol (TruSeq Stranded mRNA Sample Prep Kit). Briefly, 1 μg of mRNA was poly-A selected [Life Technologies; Dynabeads Oligo (dT)25], chemically fragmented, and converted into single-stranded cDNA using random hexamer priming. The second strand was then generated to create double-stranded cDNA. cDNA were 3′-adenylated, and Illumina adapters were added. DNA fragments (with adapters) were PCR-amplified using Illumina adapter-specific primers. Libraries were purified and then quantified using a Qubit Fluorometer (Life Technologies). Library profiles were evaluated using an Agilent 2100 Bioanalyzer (Agilent Technologies). Each library was sequenced using 101 base-length read chemistry in a paired-end flow cell on the Illumina HiSeq2000 (Illumina). More than 61 million useful reads were obtained for each library.

Metagenome Sequencing and Data Analysis.

DNA extraction. DNA was extracted from 0.52 and 0.242 g of the 1084-T permafrost sample using the PowerSoil DNA isolation kit (Mo Bio) following the manufacturer’s protocol except that we added 83 mM DTT to the second sample to permit a more effective lysis of the viral particles. We recovered respectively 744 ng and 1.12 μg of pure DNA (Qubit).

Metagenomic library preparation. One hundred nanograms of genomic DNA were sonicated to a 100- to 1,000-bp size range using the E210 Covaris instrument (Covaris). Fragments were end-repaired, and then 3′-adenylated, and NextFlex PCR free DNA barcodes (Bioo Scientific Corporation) were added by using NEBNext Sample Reagent Set (New England Biolabs). Ligation products were purified with AMPure XP beads (Beckman Coulter), and DNA fragments (>200 bp) were amplified by 12 cycles of PCR using Platinum Pfx Taq Polymerase Kit (Life Technologies) and Illumina adapter-specific primers. Libraries were purified with 0.6× AMPure XP beads (Beckman Coulter). After library profile analysis by Agilent 2100 Bioanalyzer (Agilent Technologies) and qPCR quantification (MxPro; Agilent Technologies), the libraries were sequenced using 101 base-length read chemistry in paired-end flow cell on the Illumina HiSeq2000 (Illumina).

Metagenomic data analysis. Reads containing sequences of low complexity were filtered out to avoid spurious matches using the built-in “dust” software tool from the SGA assembler (34) with the following parameters: minimal length = 50 and dust threshold = 2. This resulted in a total of 368,474,026 usable reads. These reads were mapped to the Mollivirus genome as well as to the genomes of the following other giant viruses: Pandoravirus salinus (Refseq ID: NC_022098.1), Pandoravirus dulcis (Refseq ID: NC_021858.1), Pithovirus sibericum (Refseq ID: NC_023423.1), Mimivirus (Refseq ID: NC_014649.1), Megavirus chilensis (Refseq ID: NC_016072.1), and the cellular host A. castellanii (GenBank Assembly ID: GCA_000313135.1) using bowtie2 with the “very sensitive” parameter (35). The genomes from Mollivirus and Pithovirus exhibited small but significant numbers of mapped permafrost metagenomic reads (Table 1). Cumulative distributions of these mapped reads against their positions in their respective target genomes were plotted (Fig. 7).

Mollivirus Genome Assembly and Annotation.

Mollivirus genome assembly. The Mollivirus genome was assembled using SOAPdenovo (36) with a stringent k-mer parameter value (k = 121) using 2 × 1,544,523 paired-end reads and 2,221 single-end reads of 150 nt. This resulted in a single contig that was corrected by mapping the reads using bowtie2 (35) and taking the consensus sequence using Gap5 (37). Read coverage (about 600×) was uniform along the contig except for one region of 10 kb located at one extremity and exhibiting twice as much coverage (about 1,200×), characteristic of repeated sequences. This prompted us to verify the connection between the 10-kb region and the contig by PCR. The results confirmed a 651,523-bp linear genome flanked by a 10-kb-long inverted repeat at each extremity.

Gene annotation. Homology searches were performed using BlastP against the nonredundant (NR) GenBank database (19) with an E-value threshold < 10⁻⁵. The functional annotation of Mollivirus predicted proteins was complemented by CD search (38), the FUGUE program (39), and RPS-BLAST (40) against COGs (41) (E-value threshold < 10⁻³).

Protein Extraction.

Virion proteome. A total of 10⁸ purified particles was resuspended in 100 μL of lysis buffer (Tris·HCl, 40 mM; SDS, 2%; and DTT, 60 mM, pH 7.5) before extraction in gel loading buffer (100 mM Tris·HCl, pH 6.8; SDS, 2%; glycerol, 4%; β-mercaptoethanol, 5%; and traces of bromophenol blue) and 10 min of heating at 95 °C.

Infected cells protein extraction. For each time point of the infection experiment performed for the transcriptomic study, 3 mL of the culture were centrifuged to recover the cells, and the pellet was frozen and stored at −80 °C prior to analysis. Proteins extracted from infected cells at each time point were solubilized in Laemmli gel loading buffer prior to digestion.

Protein Electrophoresis. Ten and 15 μg of extracted proteins from Mollivirus solubilized in Laemmli buffer were separated on a 4–12% gradient polyacrylamide gel (NuPAGE; Invitrogen) before staining using colloidal Coomassie blue (GelCode Blue Stain Reagent; Pierce, Thermo Scientific) and periodic acid-Schiff method (Glycoprotein detection kit; Sigma-Aldrich), respectively. Five micrograms of horseradish peroxidase were used as positive control for glycoprotein detection.

Proteomic Analysis.

Protein digestion.

Virion and infected cells proteomes. Proteins were stacked in the top of a 4–12% NuPAGE gel (Invitrogen) before R-250 Coomassie blue staining. The gel band was manually excised and cut in pieces before being washed by six successive incubations of 15 min in 25 mM NH4HCO3 and in 25 mM NH4HCO3 containing 50% (vol/vol) acetonitrile. Gel pieces were then dehydrated with 100%

acetonitrile and incubated for 45 min at 53 °C with 10 mM DTT in 25 mM NH4HCO3 and for 35 min in the dark with 55 mM iodoacetamide in 25 mM NH4HCO3. Alkylation was stopped by adding 10 mM DTT in 25 mM NH4HCO3 and mixing for 10 min. Gel pieces were then washed again by incubation in 25 mM NH4HCO3 before dehydration with 100% acetonitrile. Modified trypsin (Promega; sequencing grade) in 25 mM NH4HCO3 was added to the dehydrated gel pieces for an overnight incubation at 37 °C. Peptides were then extracted from gel pieces in three 15-min sequential extraction steps in 30 μL of 50% acetonitrile, 30 μL of 5% formic acid, and finally 30 μL of 100% acetonitrile. The pooled supernatants were then dried under vacuum.

Surfome (surface proteome) analysis. A total of 8 × 10⁸ purified particles were incubated for 30 min at 37 °C in digestion buffer (50 mM Tris·HCl, pH 7.5, 150 mM NaCl, and 5 mM CaCl2) with or without 1.8 μg of modified trypsin (Promega; sequencing grade). Samples were then centrifuged for 3 min at 14,500 × g, and the supernatant was centrifuged again the same way. The supernatant was then submitted to an overnight trypsin digestion (1 μg of modified trypsin) at 37 °C. Peptides were then desalted using C18 spin columns (Harvard Apparatus).

Nano-LC-MS/MS analyses. The dried extracted peptides were resuspended in 5% acetonitrile and 0.1% trifluoroacetic acid and analyzed by online nano–LC-MS/MS (Ultimate 3000, Dionex and LTQ-Orbitrap Velos pro, or Q-Exactive Plus; Thermo Fisher Scientific). Peptides were sampled on a 300 μm × 5-mm PepMap C18 precolumn and separated on a 75 μm × 250-mm C18 column (PepMap, Dionex). The nano-LC method consisted of a 120-min (for particle and surfome analyses) or a 240-min (for infected cells analyses) gradient at a flow rate of 300 nL/min, ranging from 5% to 37% acetonitrile in 0.1% formic acid over 114 min before reaching 72% acetonitrile in 0.1% formic acid for the last 6 min. MS and MS/MS data were acquired using Xcalibur (Thermo Fisher Scientific). Spray voltage and heated capillary were set at 1.4 kV and 200 °C, respectively. Survey full-scan MS spectra (m/z = 400–1,600) were acquired in the Orbitrap with a resolution of 60,000 after accumulation of 10⁶ ions (maximum filling time, 500 ms). The 20 most intense ions from the preview survey scan delivered by the Orbitrap were fragmented by collision-induced dissociation (collision energy, 35%) in the LTQ after accumulation of 10⁴ ions (maximum filling time, 100 ms).

Mass spectrometry bioinformatics data analyses. For particle and surfome analyses, data were processed automatically using Mascot Daemon software (version 2.5.1; Matrix Science). Concomitant searches against the Mollivirus protein sequence databank (523 entries), the in-house-built A. castellanii protein sequence databank (16,206 entries), the classical contaminants database (67,126 sequences, homemade), and the corresponding reversed databases were performed using Mascot (version 2.5). ESI-TRAP was chosen as the instrument, trypsin/P as the enzyme, and two missed cleavages were allowed. Precursor and fragment mass error tolerances were set at 10 ppm and 0.6 Da, respectively, for data acquired on the LTQ-Orbitrap Velos, and 10 ppm and 25 mmu (milli-mass units, or mDa) for data acquired on the Q-Exactive Plus. Peptide modifications allowed during the search were as follows: carbamidomethyl (C, fixed), acetyl (N-ter, variable), oxidation (M, variable), and deamidation (NQ, variable). The IRMa software (28) (version 1.31.1) was used to filter the results: conservation of rank 1 peptides, peptide identification false discovery rate < 1% (as calculated on peptide scores by using the reverse database strategy), and minimum of one specific peptide per identified protein group.

For infected cells analyses, RAW files were processed using MaxQuant software (29) (version 1.5.1.2). Spectra were searched against the Mollivirus protein sequence databank (523 entries), the A. castellanii protein sequence databank (16,206 entries), and the frequently observed contaminants database embedded in MaxQuant. Trypsin was chosen as the enzyme and two missed cleavages were allowed. Precursor mass error tolerances were set at 20 ppm and 4.5 ppm for first and main searches, respectively. Fragment mass error tolerance was set to 0.5 Da. Peptide modifications allowed during the search were as follows: carbamidomethylation (C, fixed), acetyl (protein N-ter, variable), and oxidation (M, variable). Minimum peptide length was set to 6 aa. Minimum number of peptides, razor plus unique peptides, and unique peptides were all set to 1. Maximum false discovery rates were set to 0.01 at peptide and protein levels. Label-free quantification (LFQ) and intensity-based absolute quantification (iBAQ) values were calculated from MS intensities of unique peptides.

Each A. castellanii protein was quantified as the ratio of its abundance to the total quantity of amoeba proteins at each time point. For the Mollivirus proteome analysis, we used all identified proteins (host and virus) at each time point to normalize the data. Mollivirus protein clustering analysis was performed using hierarchical clustering with Euclidean distances (Fig. S4).

Gene Content-Based Clustering. A cladistic tree (Fig. 8) was constructed based on the clustering of gene contents of the following completely sequenced viral genomes: Mollivirus, Acanthamoeba polyphaga mimivirus (NC_014649), Acanthamoeba polyphaga moumouvirus (NC_020104), Aedes taeniorhynchus iridescent virus (NC_008187), African swine fever virus (NC_001659), Amsacta moorei entomopoxvirus (NC_002520), Autographa californica nucleopolyhedrovirus (NC_001623), Bathycoccus RCC1105 virus (NC_014765), Cafeteria roenbergensis virus (NC_014637), Culex nigripalpus NPV (NC_003084), Ectocarpus siliculosus virus1 (NC_002687), Emiliania huxleyi virus86 (NC_007346), Feldmannia species virus (NC_011183), Human herpesvirus 1 (NC_001806), Human herpesvirus 6A (NC_001664), Infectious spleen and kidney necrosis virus (NC_003494), Lausannevirus (NC_015326), Lymphocystis disease virus china (NC_005902), Mamestra configurata NPV-A (NC_003529), Marseillevirus (NC_013756), Megavirus chilensis (NC_016072), Megavirus lba (NC_020232), Melanoplus sanguinipes entomopoxvirus (NC_001993), Micromonas RCC1109 MpV1 (NC_014767), Myxoma virus (NC_001132), Neodiprion abietis NPV (NC_008252), Orf virus (NC_005336), Ostreococcus lucimarinus virus OlV1 (NC_014766), Ostreococcus tauri virus1 (NC_013288), Ostreococcus virus OsV5 (NC_010191), Pandoravirus dulcis (NC_021858), Pandoravirus inopinatum (NC_026440), Pandoravirus salinus (NC_022098), Paramecium bursaria Chlorella virus1 (NC_000852), Phaeocystis globosa virus (NC_021312), Pithovirus sibericum (NC_023423), Rodent herpesvirus Peru (NC_015049), Spodoptera litura granulovirus (NC_009503), Wiseana iridescent virus (NC_015780), and Aureococcus anophagefferens virus (NC_024697). We first performed gene clustering using OrthoMCL (42) with standard parameters (Blast E-value cutoff = 10⁻⁵ and mcl inflation factor = 1.5) on the protein-coding genes of length ≥ 100 aa. This resulted in the definition of 3,001 distinct clusters. We computed a presence/absence matrix based on the gene clusters and calculated a distance matrix using the distance defined in ref. 43. Finally, the phylogenetic tree was constructed using neighbor joining. Support values were estimated using bootstrap resampling (n = 10,000).

A. castellanii Genome Annotation.

Gene structure prediction. Although not annotated, the genome sequence assembly of A. castellanii from the Baylor College of Medicine (GenBank Assembly Name: “Acas_2”; ID: GCA_000193105.1), with a total length of 46,714,639 bp, is substantially larger than the one from Clarke et al. (18) (GenBank Assembly Name: “Acastellanii.strNEFF v1”; ID: GCA_000313135.1) with a length of 42,019,824 bp. This prompted us to perform gene prediction on the Acas_2 assembly. We used the Maker annotation software (44) with Augustus (45), Genemark-ES (46), and Snap (47) ab initio

gene prediction algorithms, supported by UniprotKB (Swiss-prot) protein sequences as well as A. castellanii EST expression data (ftp://ftp.hgsc.bcm.edu/AcastellaniNeff/ESTs/BRENDANSSFFS/). After manual curation of the predicted gene structures, we were able to predict 16,206 protein-coding genes, which contain 8.25 exons of 214 nt separated by 105-nt-long introns, on average.

We next questioned whether this new dataset actually represented a significant improvement over the Clarke et al. annotation (18) predicting 14,974 protein-coding genes. We thus mapped all available RNA-seq transcriptomic data from Clarke et al. (SRA IDs: SRX203182, SRX203266–SRX203275, SRX208998) (18) to the predicted transcripts from both annotations using bowtie2 (35). The proportion of RNA-seq reads mapped to our annotation of A. castellanii transcripts (76.75%) is much higher than the 54.33% of reads mapped to Acastellanii.strNEFF v1 annotated transcripts, an increase of 41%. Symmetrically to this true-positives measurement, we quantified the proportion of RNA-seq reads aligned in predicted untranscribed regions, i.e., intergenic and intronic regions. These percentages were 11.51% vs. 8% for the Acastellanii.strNEFF v1 and the Acas_2 assemblies, respectively, a decrease of 30% in the number of false negatives in our annotation. The gain regarding predicted transcribed regions for protein-coding genes could correspond to a more accurate definition of either coding regions or UTRs. To test whether our annotation actually improved the prediction of A. castellanii protein sequences, we mapped proteomic data from an MS/MS experiment on A. castellanii cells using Mascot (48). Again, we found an improvement of 21% in protein sequence prediction using our annotation (19,620 spectral counts vs. 16,226).

Prediction of protein functions. Functional annotation of predicted genes was performed using Blast2GO (49) with Blastp against the NR database and Interproscan motif detection. As expected, the vast majority of predicted proteins are identical or highly similar to the A. castellanii proteins predicted by Clarke et al. (18), with 87% of the predicted genes having a best BLAST hit corresponding to this annotation. Genes with a significant BLAST hit (E value < 10⁻⁵) were assigned the best-match functional annotation. For genes with no significant BLAST hit or no annotation (“hypothetical protein”), we assigned Gene Ontology annotations predicted from Blast2GO whenever available. Ultimately, we were able to annotate 77% of the predicted Acanthamoeba genes.
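
The three percentage changes quoted in the gene structure prediction section (41%, 30%, and 21%) can be rechecked from the reported figures. The following is a minimal sketch, assuming each change is expressed relative to the value obtained with the Acastellanii.strNEFF v1 annotation; the function and variable names are illustrative and do not come from the original analysis.

```python
def relative_change(new: float, old: float) -> float:
    """Change from old to new, expressed as a fraction of old."""
    return (new - old) / old


# Percentage of RNA-seq reads mapped to predicted transcripts (Acas_2 vs. strNEFF v1).
mapped_gain = relative_change(76.75, 54.33)

# Percentage of reads falling in predicted untranscribed regions (Acas_2 vs. strNEFF v1).
untranscribed_drop = relative_change(8.0, 11.51)

# MS/MS spectral counts matched to predicted proteins (this annotation vs. Clarke et al.).
spectral_gain = relative_change(19620, 16226)

print(f"mapped reads:          {mapped_gain:+.1%}")         # +41.3%
print(f"untranscribed regions: {untranscribed_drop:+.1%}")  # -30.5%
print(f"spectral counts:       {spectral_gain:+.1%}")       # +20.9%
```

Under that assumption, the rounded values agree with the percentages given in the text.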

[Fig. S1 panels. Axes: Mollivirus genomic sequence (nt), left; Mollivirus concatenated ORFs (aa), right.]

Fig. S1. Absence of large-scale repeated regions in the Mollivirus genome. (Left) A dot plot of the Mollivirus genomic sequence against itself was computed using the Gepard software (50) with default parameters. (Right) Dot plot of the concatenated peptide sequences of Mollivirus proteins with the word length parameter = 6. Most of the off-diagonal dots correspond to ankyrin motifs found in many different proteins.
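
As an illustration of what a word-length parameter means in a self dot plot, the sketch below places a dot wherever two positions share an identical word of a given length. It is a naive exact-match version written for this note, not the algorithm implemented in Gepard, and the toy sequence and function names are hypothetical.

```python
from collections import defaultdict


def word_matches(seq_x: str, seq_y: str, word_length: int = 6):
    """Return (i, j) pairs where seq_x[i:i+w] equals seq_y[j:j+w] exactly."""
    # Index every word of seq_y by its start position.
    index = defaultdict(list)
    for j in range(len(seq_y) - word_length + 1):
        index[seq_y[j:j + word_length]].append(j)
    # Look up every word of seq_x in that index.
    dots = []
    for i in range(len(seq_x) - word_length + 1):
        for j in index.get(seq_x[i:i + word_length], ()):
            dots.append((i, j))
    return dots


def off_diagonal(dots):
    """Keep only dots away from the main diagonal, i.e. repeated words."""
    return [(i, j) for i, j in dots if i != j]


# Toy peptide containing the same 10-residue motif twice, 16 residues apart.
toy = "MKTAYIAKQR" + "GGSGSG" + "MKTAYIAKQR"
dots = word_matches(toy, toy, word_length=6)
repeats = off_diagonal(dots)
```

In a self comparison the main diagonal is always filled, so only the off-diagonal dots carry information about repeats, which is how the ankyrin motifs show up in the right panel.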

[Fig. S2 plot. Series: Intergenic regions, Predicted genes. y axis: Counts (0 to 70). x axis: Average read coverage (1 to 1,000,000).]

Fig. S2. Validation of the predicted protein-coding genes using RNA-seq. RNA-seq reads obtained at nine time points (30 min, 1–7, and 9 h PI) were mapped onto the Mollivirus genome using TopHat2 (51) with the following parameters: i = 20, I = 2,000, and b2-very-sensitive. The average coverage was computed as the average number of reads overlapping a given nucleotide position within each predicted gene or intergenic region.
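
The average coverage defined in this legend can be written out explicitly. The sketch below is an assumption-laden illustration: reads and regions are represented as half-open (start, end) intervals on a single strand, spliced alignments are ignored, and the function name and toy coordinates are hypothetical rather than taken from the original pipeline.

```python
def average_coverage(region, reads):
    """Mean number of reads overlapping each nucleotide of a half-open region."""
    start, end = region
    covered_bases = 0
    for read_start, read_end in reads:
        # Number of region positions covered by this read (0 if disjoint).
        overlap = min(end, read_end) - max(start, read_start)
        if overlap > 0:
            covered_bases += overlap
    # Summing per-read overlaps equals summing per-position depth.
    return covered_bases / (end - start)


# Toy example: a 100-nt region and four aligned reads.
gene = (100, 200)
aligned_reads = [(90, 140), (150, 250), (120, 180), (300, 350)]
gene_coverage = average_coverage(gene, aligned_reads)
```

Here the three overlapping reads contribute 40, 50, and 60 covered positions, giving an average coverage of 1.5 over the 100-nt region.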

[Fig. S3 panels. Series: Mollivirus, A. castellanii, Mitochondrion. y axes: Relative Protein Expression (%) and Normalized Relative Protein Expression. x axes: Infection Time Course (in hours).]

Fig. S3. Relative proportion of Mollivirus-, mitochondrion-, and Acanthamoeba-encoded proteins in the proteome of infected cells for all time points. Protein abundances [intensity-based absolute quantification (iBAQ) (52)] were normalized by the total abundance of all identified proteins at each time point. The peak (red distribution) seen at 30 min PI corresponds to the Mollivirus proteins brought about by the infecting particles and remaining in the phagocytic vacuoles.
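
The normalization described in this legend amounts to dividing each protein abundance by the summed abundance of all proteins identified at the same time point, then adding up the fractions by origin. A minimal sketch follows, assuming iBAQ values are held in nested dictionaries; the protein identifiers, values, and function names are invented for illustration.

```python
def normalize_by_total(abundances):
    """Convert {time: {protein: iBAQ}} into fractions of the per-time-point total."""
    normalized = {}
    for time_point, proteins in abundances.items():
        total = sum(proteins.values())
        normalized[time_point] = {name: value / total for name, value in proteins.items()}
    return normalized


def share_by_origin(normalized, origin):
    """Sum normalized abundances per origin (virus, host, mitochondrion)."""
    shares = {}
    for time_point, proteins in normalized.items():
        per_origin = {}
        for name, fraction in proteins.items():
            per_origin[origin[name]] = per_origin.get(origin[name], 0.0) + fraction
        shares[time_point] = per_origin
    return shares


# Hypothetical iBAQ values at two time points.
ibaq = {
    "0.5 h": {"ml_1": 30.0, "ac_1": 60.0, "mt_1": 10.0},
    "6 h": {"ml_1": 50.0, "ac_1": 40.0, "mt_1": 10.0},
}
origin = {"ml_1": "Mollivirus", "ac_1": "A. castellanii", "mt_1": "Mitochondrion"}
shares = share_by_origin(normalize_by_total(ibaq), origin)
```

Because every time point is scaled by its own total, the shares at each time point sum to 1 and can be compared across the time course.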

Fig. S4. Relative abundances of Mollivirus-encoded proteins throughout the infectious cycle. The seven leftmost columns display the LFQ abundance of each reliably detected protein, relative to its value at 30 min PI (initial phagocytosis step) taken as a reference. Intensities are color-coded in red when abundance values are higher, and in blue when lower. The six middle columns display the purple-coded increment in abundance between two adjacent time points. The orange and green rightmost columns indicate the abundance of each protein 6 h PI and in the virion proteome, respectively.
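
The two derived quantities shown in this figure, abundance relative to the 30-min PI reference and the increment between adjacent time points, can be sketched as follows. The legend does not state whether ratios or log ratios were used, so a plain ratio and a plain difference are assumed here; the LFQ values and function names are hypothetical.

```python
def relative_to_reference(lfq, reference_index=0):
    """Express each LFQ value as a ratio to the reference time point."""
    reference = lfq[reference_index]
    return [value / reference for value in lfq]


def adjacent_increments(values):
    """Difference between each time point and the one before it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical LFQ series for one protein over seven time points,
# the first being the 30-min PI reference.
lfq_series = [2.0, 2.0, 3.0, 4.0, 8.0, 8.0, 16.0]
relative = relative_to_reference(lfq_series)
increments = adjacent_increments(relative)
```

Seven time points yield six increments, which matches the seven leftmost and six middle columns described in the legend.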

Fig. S5. Concatenated proteins dot plots of Mollivirus sibericum against itself and the fully sequenced Pandoraviruses. Despite their large difference in size and detailed gene contents, the Pandoraviruses retain several large regions of colinearity. None is found in Mollivirus (right column).

Other Supporting Information Files

Table S1 (DOCX)
Table S2 (DOCX)
Table S3 (DOCX)
Table S4 (DOCX)
