Comparative Viral Metagenomics of Hot Springs in Yellowstone and Nevada Reveals High and Identifies Novel Replication Enzymes
Thomas Schoenfeld, Nick Hermersmann, Darby Sugar, David Mead
Lucigen Corporation, Middleton, WI 53562 USA, 888-575-9695, www.lucigen.com

ABSTRACT Comparison of the viral metagenomic profiles of three circumneutral hot springs of similar temperature and chemistry shows significant differences in gene content and structure associated with geographical and temporal separation. Viral metagenomes were isolated from Octopus and Black Pool (Yellowstone) and Great Boiling Spring (Nevada). Roche 454 and Sanger sequence data were used to assemble contigs of up to 33.5 kb, some of which may contain nearly complete viral genomes. Putative replication operons in these assemblies suggest novel replication strategies and provide new biological reagents. Viral replicases were identified by sequence similarity and by functional activity. More than twenty novel thermostable viral DNA polymerases (Pols) were expressed and characterized; all are molecularly and biochemically divergent from known Pols. Based on amino acid sequences, these Pols fall into four clades. Clade 1 includes family A-like Pols and is the most diverse, geographically dispersed, and persistent. Clade 2 includes family B-like Pols; it was relatively conserved and abundant in one Octopus metagenome but was not detected in later screens of this or other springs. Clade 3 was exclusive to Black Pool and includes primase/polymerase enzymes similar to those seen in archaeal plasmids but never before reported in viral genomes. Clade 4 consists of a single Pol that was discovered by functional screening of the Great Basin metagenome and has weak but significant similarity (E = 0.73) to a single open reading frame of unknown function in the crenarchaeal virus SSV1.
The family of highly related clones identified by functional screening encodes an apparent polyprotein with polymerase activity in the C-terminal half and potential accessory functions in the N-terminal half. Assembly of these pol genes with other sequences in the metagenomes reveals the genomic context of the replication genes. Both Clade 1 and Clade 2 Pols appear to be encoded in operons that include genes for likely helicase subunits and recB-like endonucleases. The gene for one Clade 1 Pol is contained in the 33.5 kb contig, allowing identification of adjacent genes, several of which appear to be replicase subunit genes. This Pol appears to be expressed as a polyprotein of 1608 amino acids that is processed in vitro, and presumably in vivo, to generate 70 kD Pols. A truncation product of this protein has proven highly useful for single-enzyme RT-PCR and for isothermal RNA and DNA detection. Sequence similarity and the polyprotein structure are shared by the other representatives of this clade. These Pols also share sequence similarity, but not the polyprotein structure, with Pols of the order Aquificales. Remarkably, they also have strong similarities in sequence, gene structure, and thermostability with Pols encoded by the plastids of several parasitic protists, e.g., Plasmodium, suggesting unusual patterns of gene transfer. Isolation of sequence variants of both the Clade 1 and Clade 2 Pols allows mapping of coding variation associated with measurable biochemical differences. The high conservation, relative abundance, and wide geographic distribution of certain replication-associated genes, especially pol and hel genes, suggests they may be valuable as signature genes for several types of thermophilic viruses. Biochemical characterization of selected viral replicases will be discussed.

3173 Pol Assembled from Sanger Metagenome
A contig assembled from Sanger data includes the 3173 Pol polyprotein and putative accessory proteins. PCR confirms the assemblies.
[Figure: the assembled contig, PCR confirmation, and expression of the 1608 aa ORF.]

Properties of PyroPhage 3173 Pol
Property                          Value
5'-3' exonuclease                 -
3'-5' exonuclease                 Strong
Strand displacement               Strong
Extension from nicks              Strong
Thermostability (T½ at 95 °C)     10 min
Km (dNTPs)                        40 μM
Km (DNA)                          5.3 nM
Processivity                      47 nt
Fidelity                          7 × 10^4
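Two of the tabulated constants support quick back-of-the-envelope estimates. The sketch below assumes first-order (exponential) heat inactivation governed by the 10 min half-life at 95 °C, and simple Michaelis–Menten saturation governed by the 40 μM dNTP Km. The poster states neither model, and the function names and example reaction conditions are hypothetical.

```python
# Values taken from the 3173 Pol properties table above.
HALF_LIFE_95C_MIN = 10.0  # T½ at 95 °C, in minutes
KM_DNTP_UM = 40.0         # Km for dNTPs, in micromolar


def residual_activity(minutes_at_95c: float, half_life_min: float = HALF_LIFE_95C_MIN) -> float:
    """Fraction of activity remaining after heating.

    Assumes first-order inactivation: activity halves every half_life_min minutes.
    """
    return 0.5 ** (minutes_at_95c / half_life_min)


def fraction_of_vmax(substrate_um: float, km_um: float = KM_DNTP_UM) -> float:
    """Michaelis-Menten saturation, v / Vmax = [S] / (Km + [S])."""
    return substrate_um / (km_um + substrate_um)


if __name__ == "__main__":
    # Hypothetical program: 40 cycles with a 15 s denaturation step = 10 min at 95 °C.
    print(f"{residual_activity(40 * 15 / 60):.2f}")  # 0.50
    # Assumed 200 uM of each dNTP, a common PCR concentration.
    print(f"{fraction_of_vmax(200.0):.2f}")  # 0.83
```

Under these assumptions, 10 cumulative minutes at 95 °C leave about half of the initial activity, and 200 μM of each dNTP would drive the enzyme to roughly 83% of Vmax.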

Applications Using PyroPhage 3173 Pol
[Figure: DNA polymerase fidelity of Phusion, Pfu Ultra, EconoTaq, Taq, PyroPhage 3173 wt and PyroPhage 3173 exo- (y-axis 0 to 9.0E+04). PCR fidelity shown by the LacIq reversion assay.]
[Figure: Reverse transcriptase activity of 1, 0.2 and 0.04 units of PyroPhage 3173 Pol with Mn or Mg. Reverse transcriptase activity demonstrated by incorporation of 33P-labeled dTTP on a poly riboA template.]
[Figure: Thermal profile of PyroPhage Pol, relative activity (0 to 120%) vs. reaction temperature (55 to 79 °C).]

Thermophilic Viral Metagenomics
Polymerases were isolated from near-boiling hot springs.
[Figure: map of sampling sites (YS, NV, CA) and TEM images of the viruses directly isolated from the springs.]

Assembly of 3173 Phage from NextGen Metagenome Sequence
Octopus Metagenome
• 164,800 reads
• 75% NAID assembly
Largest contig
• 68,774 reads
• 33.5 kb
• Average 775 reads per nt
• 37 ORFs
• Contains the 3173 replication operon

DNA Pols Discovered by Sequencing and Functional Screening
[Figure: A phylogram of selected PyroPhage viral polymerases (circled) based on ClustalW alignment of the amino acid sequences. Pols were discovered by sequence similarity and functional screening.]

Isothermal Amplification
[Figure: PyroPhage 3173 Pol in isothermal detection of a serial dilution of a plasmid containing an influenza A target sequence.]

RT-PCR using PyroScript Mix
[Figure: RT-PCR gels and real-time RT-qPCR data.]
Top left, viral RNA from phage MS2 was amplified by 40 cycles of RT-PCR; targets of 89 to 362 bp were amplified. Bottom left, total human liver RNA (1 µg) was reverse transcribed by Moloney Murine Leukemia Virus (MMLV) reverse transcriptase or by PyroPhage 3173 Pol, then PCR amplified using Lucigen EconoTaq® PLUS Master Mix; human transcript targets were β-actin, β2-microglobulin and cyclophilin (144, 246 and 298 bp), with no-RT controls. Top right, the 160 bp primer set was used in a 10X dilution series with amplification detected by real-time RT-qPCR. Center right, the melt curve shows high specificity across the dilution series. Bottom right, the Ct values from the real-time data are graphed (log/log) vs. the dilutions.
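The dilution series in the real-time panel can be summarized by a standard curve. This is a minimal sketch, assuming the conventional analysis in which Ct is regressed on log10 of the relative template amount and amplification efficiency is computed as E = 10^(-1/slope) - 1. The poster reports no slope or efficiency, so the function name and the ideal synthetic series in the example are hypothetical, not measured data.

```python
import math
from typing import Sequence, Tuple


def standard_curve(log10_amounts: Sequence[float], cts: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of Ct = slope * log10(amount) + intercept.

    Returns (slope, intercept, efficiency), where efficiency = 10**(-1/slope) - 1
    (1.0 means perfect doubling of product every cycle).
    """
    n = len(log10_amounts)
    if n != len(cts) or n < 2:
        raise ValueError("need at least two paired (log10 amount, Ct) points")
    mean_x = sum(log10_amounts) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    if sxx == 0:
        raise ValueError("template amounts must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_amounts, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, efficiency


if __name__ == "__main__":
    # Hypothetical ideal 10X dilution series: each 10-fold dilution adds log2(10) cycles.
    log_amounts = [0, -1, -2, -3, -4]
    ideal_cts = [15 + math.log2(10) * i for i in range(5)]
    slope, intercept, eff = standard_curve(log_amounts, ideal_cts)
    print(f"slope={slope:.3f} efficiency={eff:.3f}")  # slope=-3.322 efficiency=1.000
```

A slope near -3.32 corresponds to 100% efficiency; a shallower (less negative) slope implies an apparent efficiency above 100%, and a steeper slope an efficiency below 100%.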

454 Sequencing of Viral Metagenomes
Spring                 Genome Center   Reads     Assembled @75% NAID   Unassembled   Largest contig (bp)   Contigs   Contigs >2 kb
Black Pool             Roche 454       117,653   107,119               10,534        18,588                7,058     986
Octopus                Broad           229,553   164,800               64,753        33,529                31,888    1,580
Great Boiling Spring   JGI             258,950   254,533               4,417         9,647                 38,333    429

Viral metagenomes were sequenced by 454 technology at the indicated Genome Center.

Sanger Sequencing
An engineered variant of the PyroPhage 3173 Pol was used in dye-terminator Sanger sequencing.

347 Pol – a Novel Replicase Discovered by Functional Metagenomics
• DNA polymerase activity by radioactive incorporation and primer extension assay
• Strongest E value = 0.55 to SSV1 protein
• NTP binding domain
• 35 kD
• Highly thermostable

Viral Replicases Assembled from 454 Reads
Sequences from the Black Pool library were assembled and genes identified by GeneMark. Predicted ORFs were annotated by BLASTp analysis. The six largest contigs are shown.

Family B Replicase Genes in a 16.5 kb Contig
• Sanger reads assembled at 50% NAID
• 16.5 kb contig
• Encodes an apparent replication operon including likely replicase subunits

Conclusions
• Viral metagenomes represent a vast, untapped resource for enzyme discovery.
• Thermostable PyroPhage 3173 Pol directly RT-PCR amplifies viral RNA and human transcripts.
• It is effective for real-time and endpoint RT-PCR analyses.
• PyroPhage RT is also effective in RT-LAMP and sequencing.

Acknowledgements

This work was supported by NSF and NIH (NIAID and NHGRI) grants to TS and by the Moore Foundation. Sequencing of viral metagenomes was performed by JGI, Broad and Roche 454.