The Pennsylvania State University
The Graduate School
Eberly College of Science

COMPARATIVE GENOMICS PROVIDES INSIGHTS INTO HERPESVIRUS TRANSMISSION, SPREAD, AND VIRULENCE

A Dissertation in
Biochemistry, Microbiology, and Molecular Biology
by
Utsav Pandey

© 2018 Utsav Pandey

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

May 2018

The dissertation of Utsav Pandey was reviewed and approved* by the following:

Moriah L. Szpara
Assistant Professor of Biochemistry and Molecular Biology
Dissertation Advisor
Chair of Committee

Andrew F. Read
Evan Pugh Professor of Biology and Entomology
Eberly Professor of Biotechnology

Istvan Albert
Research Professor

Anthony
Associate Professor of Molecular Virology
Director, Pathobiology Graduate Program

Timothy C. Meredith
Assistant Professor of Biochemistry and Molecular Biology

Wendy Hanna-Rose
Interim Department Head, Biochemistry and Molecular Biology
Associate Professor of Biochemistry and Molecular Biology

*Signatures are on file in the Graduate School

Abstract

Herpesviruses are widespread in nature, infecting almost every animal species. They are significant pathogens in human health and agriculture. Herpes simplex virus 1 (HSV-1) causes millions of chronic infections in humans, whereas Marek’s disease virus 1 (MDV-1) is a herpesvirus of poultry of great economic importance. Clinical outcomes of HSV-1 infection are diverse, ranging from surface lesions and keratitis to severe and potentially lethal encephalitis. Similarly, MDV-1 causes nervous system disease and lymphomas in chickens, with mortality approaching 100% in the absence of vaccination. Both HSV-1 and MDV-1 are alphaherpesviruses with double-stranded DNA (dsDNA) genomes. Using high-throughput sequencing (HTS) and comparative genomics, this dissertation explores the genetic basis of transmission, spread, and virulence of these viruses. Transmission of HSV-1 between individuals was examined using transmission events in a father-son pair and a mother-neonate pair. Comparative genomics of these viral genomes showed that HSV-1 genomes can remain identical at the consensus level even after multiple cycles of latency and reactivation. However, we observed that at the population level the parental and transmitted viruses can be quite different. HSV-1 isolates obtained from the father-son pair were further characterized phenotypically in a mouse model. By examining a series of phenotypic properties, we concluded that parental and transmitted viruses can also preserve their phenotypes over decades. Transmission and spread of MDV-1 in the field were examined by sequencing viral isolates from poultry farms in central Pennsylvania. This study presented the first MDV-1 genomes obtained directly from chicken feathers and poultry dander without the use of cell culture.
We showed that these genomes were nearly identical at the consensus level but differed at the population level, corroborating the findings from the HSV-1 transmission events. This study also laid the foundation for studying MDV-1 genomics using field samples. We further extended comparative genomics of MDV-1 field samples to understand the evolution of virulence in MDV-1. To this end, we sequenced 64 spatially and temporally separated field isolates of MDV-1 and explored the role of recombination in the emergence of highly virulent isolates over time. We found that the paths to virulence in field isolates of MDV-1 are complex and could be explained by multiple recombination events between field isolates. We also used comparative genomics to identify genetic markers of virulence in HSV-1 and MDV-1. In HSV-1, we identified amino acid variations in the viral protein VP22 that reliably distinguish low-virulence isolates from high-virulence isolates. To understand the biological importance of these amino acid residues in determining HSV-1 virulence, I have initiated the engineering of viral mutants. These recombinant mutants can then be compared with the parental virus using in vitro and in vivo assays. Similarly, by comparing field isolates of MDV-1, we identified nucleotide substitutions that distinguish MDV-1 isolates of different pathotypes. Despite the widely recognized clinical and veterinary importance of herpesviruses, little is known about the evolution and genetic basis of their virulence. Understanding the genetic diversity of parental and transmitted viruses is invaluable for obtaining a comprehensive picture of a transmission event. Likewise, assessing the genetic diversity of clinical and field samples gives insight into the genetic variation among circulating viral strains. Information obtained from these studies can be used to develop vaccines and therapeutics that account for the diversity present in circulating strains.
Similarly, genetic markers of virulence can be used to gauge virulence from viral genotype, providing a powerful tool for future diagnostics and prediction of clinical outcomes. Understanding how and why field isolates of MDV-1 became more pathogenic is significant not only for the poultry industry but also in the wider context of combating the increasing incidence of drug and vaccine resistance among other pathogens.
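The distinction drawn above, that genomes can be identical at the consensus level yet differ at the population level, can be illustrated with a small sketch. The Python below assumes per-position nucleotide counts from aligned sequencing reads (a pileup). The toy counts, the 2% minority-variant threshold, and the function names `major_allele`, `consensus`, and `minority_variants` are hypothetical illustrations, not the variant-detection pipeline or thresholds used in this dissertation.

```python
from typing import Dict, List, Tuple

# Hypothetical pileup format: one dict per genome position mapping each
# observed nucleotide to the number of aligned reads supporting it.
Pileup = List[Dict[str, int]]


def major_allele(counts: Dict[str, int]) -> str:
    """Return the most-supported nucleotide; ties go to the alphabetically first base."""
    return max(sorted(counts), key=lambda base: counts[base])


def consensus(pileup: Pileup) -> str:
    """Consensus sequence: the major allele at every position."""
    return "".join(major_allele(counts) for counts in pileup)


def minority_variants(pileup: Pileup, min_freq: float = 0.02) -> List[Tuple[int, str, float]]:
    """List (0-based position, allele, frequency) for non-consensus alleles at or above min_freq.

    The 2% default is an illustrative assumption, not a threshold from this work.
    """
    hits: List[Tuple[int, str, float]] = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth == 0:
            continue  # no read coverage at this position
        major = major_allele(counts)
        for base in sorted(counts):
            freq = counts[base] / depth
            if base != major and freq >= min_freq:
                hits.append((pos, base, round(freq, 3)))
    return hits


# Toy data (not from this study): two populations that share one consensus.
parent = [{"A": 100}, {"C": 97, "T": 3}, {"G": 100}]
transmitted = [{"A": 100}, {"C": 100}, {"G": 90, "A": 10}]

print(consensus(parent), consensus(transmitted))  # ACG ACG
print(minority_variants(parent))                  # [(1, 'T', 0.03)]
print(minority_variants(transmitted))             # [(2, 'A', 0.1)]
```

Both toy populations collapse to the same consensus string, yet each carries a different low-frequency allele; comparing only consensus genomes would miss that difference entirely.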

Table of contents

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENT
CHAPTER 1
1.1 Genetic variation in pathogens and the advent of sequencing technologies
1.2 Studying pathogen biology in the era of high throughput sequencing (HTS)
1.3 Use of HTS in clinical diagnostics
1.4 Use of HTS in epidemiological studies
1.5 Bioinformatics in clinical and public health laboratories
1.6 Forward and reverse genetics approaches (Comparative genomics)
1.7 Genetic variation in RNA vs. DNA viruses
1.8 Herpesvirus genetic variation and diversity
1.9 Alphaherpesviruses
1.10 Marek’s disease virus serotype 1 (MDV-1)
1.11 Herpes simplex virus-1 (HSV-1)
CHAPTER 2
2.1 ABSTRACT
2.2 IMPORTANCE
2.3 INTRODUCTION
2.4 MATERIALS AND METHODS
2.4.1 Collection of dust and feathers
2.4.2 Viral DNA isolation from dust
2.4.3 Isolation of viral DNA from feather follicles
2.4.4 Measurement of total DNA and quantification of viral DNA
2.4.5 Illumina next-generation sequencing
2.4.6 Consensus genome assembly
2.4.7 Between-sample: consensus genome comparisons
2.4.8 Within-sample: polymorphism detection within each consensus genome
2.4.9 Testing for signs of selection acting on polymorphic viral populations
2.4.10 Sanger sequencing of polymorphic locus in ICP4
2.4.11 Genetic distance and dendrogram
2.4.12 Taxonomic estimation of non-MDV sequences in dust and feathers
2.4.13 GenBank accession numbers and availability of materials
2.5 RESULTS
2.5.1 Sequencing, assembly and annotation of new MDV-1 consensus genomes from the field
2.5.2 DNA and amino acid variations between five new field genomes of MDV-1
2.5.3 Detection of polymorphic bases within each genome
2.5.4 Tracking shifts in polymorphic loci over time
2.5.5 Comparison of field isolates of MDV-1 to previously sequenced isolates
2.5.6 Assessment of taxonomic diversity in dust and chicken feathers
2.6 DISCUSSION
2.7 ACKNOWLEDGEMENTS
2.8 SUPPLEMENTARY TABLES
CHAPTER 3
3.1 ABSTRACT
3.2 INTRODUCTION
3.3 METHODS
3.3.1 Isolate acquisition and stock generation
3.3.2 Animal studies
3.3.3 Virus replication in vivo
3.3.4 Reactivation studies
3.3.5 Antibodies and Immunohistochemistry
3.3.6 Virus culture and DNA isolation for HTS
3.3.7 Southern blot
3.3.8 Illumina high-throughput sequencing
3.3.9 De novo assembly of consensus genomes
3.3.10 Consensus genome comparison and phylogenetic analysis
3.3.11 Intra-strain minority-variant detection
3.3.12 Data Availability
3.4 RESULTS
3.4.1 Familial transmission and viral culture characteristics
3.4.2 Acute replication kinetics & mortality of isolates R-13 and N-7
3.4.3 Nearly identical genomes of father and son HSV-1 isolates
3.4.4 Comparison of father and son isolates to other HSV-1 strains
3.4.5 Intra-strain variation: Detection of minority variants within each genome
3.5 DISCUSSION
3.6 ACKNOWLEDGEMENTS
3.7 SUPPLEMENTARY TABLES
CHAPTER 4
4.1 ABSTRACT
4.2 INTRODUCTION
4.3 METHODS
4.3.1 Specimen collection and pathotyping
4.3.2 DNA isolation, genome sequencing and de novo assembly of viral genomes
4.3.3 Genome-wide examination for recombination
4.4 RESULTS
4.4.1 Genome-wide examination of recombination in field isolates of MDV-1
4.4.2 Virulence evolution through donation of genome segments
4.4.3 Overview of impact of recombination on evolution of MDV-1
4.5 DISCUSSION
4.6 FUTURE DIRECTIONS
CHAPTER 5
APPENDIX A:
A.1 ABSTRACT
A.2 RESEARCH SUMMARY
A.2.1 Introduction
A.2.2 Animal model for studying HSV-1 pathogenesis
A.2.3 Viral DNA extraction and genome assembly
A.2.4 Identification of potential genetic markers of virulence
A.2.5 Making viral mutants to explore gain and loss of pathogenicity
A.2.6 Conclusion and future directions:
APPENDIX B:
B.1 ABSTRACT
B.2 RESEARCH SUMMARY
B.2.1 Introduction
B.2.2 Clinical presentation
B.2.3 Library prep, Illumina sequencing, and genome assembly
B.2.4 Consensus genome comparison
B.2.5 Genome comparison at the population level
B.2.6 Conclusion
APPENDIX C:
C.1 ABSTRACT
C.2 RESEARCH SUMMARY
APPENDIX D:
D.1 ABSTRACT
D.2 RESEARCH SUMMARY
REFERENCES

List of Figures

FIGURE 2-1: DIAGRAM OF SAMPLES COLLECTED FOR GENOME SEQUENCING OF FIELD ISOLATES OF MDV. SAMPLES COLLECTED FOR GENOME SEQUENCING WERE SOURCED FROM TWO PENNSYLVANIA FARMS WITH LARGE-SCALE OPERATIONS THAT HOUSE APPROXIMATELY 25,000-30,000 INDIVIDUALS PER BUILDING. THESE FARMS WERE SEPARATED BY 11 MILES. ON FARM A, TWO SEPARATE COLLECTIONS OF DUST WERE MADE 11 MONTHS APART. ON FARM B, WE COLLECTED ONE DUST SAMPLE AND INDIVIDUAL FEATHERS FROM SEVERAL HOSTS, ALL AT A SINGLE POINT IN TIME. IN TOTAL, THREE DUST COLLECTIONS AND TWO FEATHERS WERE USED TO GENERATE FIVE CONSENSUS GENOMES OF MDV FIELD ISOLATES (TABLE 2-1). (ARTWORK BY NICK SLOFF, PENN STATE UNIVERSITY, DEPARTMENT OF ENTOMOLOGY)

FIGURE 2-2. PROCEDURES FOR ENRICHMENT AND ISOLATION OF MDV DNA FROM DUST. VORTEXING, CENTRIFUGATION AND SONICATION WERE ESSENTIAL TO RELEASE CELL-ASSOCIATED VIRUS INTO THE SOLUTION. THE VIRUS-CONTAINING SUPERNATANT WAS THEN PASSED THROUGH 0.8 µm AND 0.22 µm FILTERS TO REMOVE LARGER CONTAMINANTS. THE FLOW-THROUGH WAS TREATED WITH DNASE, AND THE VIRAL PARTICLES WERE CAPTURED USING A 0.1 µm FILTER. THE MEMBRANE OF THE 0.1 µm FILTER WAS THEN EXCISED AND USED FOR EXTRACTION OF THE VIRAL DNA.

FIGURE 2-3. PROCEDURES FOR ENRICHMENT AND ISOLATION OF MDV DNA FROM INDIVIDUAL FEATHER FOLLICLES. A FEATHER WAS MECHANICALLY DISRUPTED (BEAD-BEATING) AND TREATED WITH TRYPSIN TO BREAK OPEN HOST CELLS AND RELEASE CELL-FREE VIRUS INTO THE SOLUTION. THE SAMPLE WAS THEN TREATED WITH DNASE TO REMOVE CONTAMINANT DNA. FINALLY, THE VIRAL CAPSIDS WERE LYSED TO OBTAIN VIRAL GENOMIC DNA.

FIGURE 2-4. WORKFLOW FOR COMPUTATIONAL ENRICHMENT OF MDV SEQUENCES AND SUBSEQUENT VIRAL GENOME ASSEMBLY AND TAXONOMIC PROFILING. THE VIRGA WORKFLOW (202) REQUIRES AN INPUT OF HIGH-QUALITY HTS DATA FROM THE VIRAL GENOME OF INTEREST. FOR THIS STUDY WE ADDED A STEP THAT SELECTED MDV-LIKE SEQUENCE READS FROM THE MILIEU OF DUST AND FEATHER SAMPLES. THE SEQUENCE READS OF INTEREST WERE OBTAINED BY USING BLAST TO COMPARE ALL READS AGAINST A CUSTOM MDV DATABASE WITH AN E-VALUE CUTOFF OF 10⁻²; THESE WERE THEN SUBMITTED TO VIRGA FOR ASSEMBLY. TAXONOMIC PROFILING FOLLOWED A SIMILAR PATH, USING NCBI’S ALL-NUCLEOTIDE DATABASE TO IDENTIFY THE TAXONOMIC ORIGIN OF EACH SEQUENCE READ. IN THIS WORKFLOW DIAGRAM, PARALLELOGRAMS REPRESENT DATA OUTPUTS WHILE RECTANGLES REPRESENT COMPUTATIONAL ACTIONS.

FIGURE 2-5. THE COMPLETE MDV-1 GENOME INCLUDES TWO UNIQUE REGIONS AND TWO SETS OF LARGE INVERTED REPEATS. (A) THE FULL STRUCTURE OF THE MDV-1 GENOME INCLUDES A UNIQUE LONG REGION (UL) AND A UNIQUE SHORT REGION (US), EACH OF WHICH IS FLANKED BY LARGE REPEATS KNOWN AS THE TERMINAL AND INTERNAL REPEATS OF THE LONG REGION (TRL AND IRL) AND THE SHORT REGION (TRS AND IRS). MOST ORFS (PALE GREEN ARROWS) ARE LOCATED IN THE UNIQUE REGIONS OF THE GENOME. ORFS IMPLICATED IN MDV PATHOGENESIS ARE OUTLINED AND LABELED; THESE INCLUDE ICP4 (MDV084 / MDV100), UL36 (MDV049), AND MEQ (MDV005 / MDV076) (SEE RESULTS FOR THE COMPLETE LIST). (B) A TRIMMED GENOME FORMAT WITHOUT THE TERMINAL REPEAT REGIONS WAS USED FOR ANALYSES IN ORDER NOT TO OVER-REPRESENT THE REPEAT REGIONS. (C) PERCENT IDENTITY FROM MEAN PAIRWISE COMPARISON OF FIVE CONSENSUS GENOMES, PLOTTED SPATIALLY ALONG THE LENGTH OF THE GENOME. DARKER COLORS INDICATE LOWER PERCENT IDENTITY (SEE LEGEND).

FIGURE 2-6: GENOME-WIDE DISTRIBUTION OF POLYMORPHIC BASES WITHIN EACH CONSENSUS GENOME. POLYMORPHIC BASE CALLS FROM EACH MDV GENOME WERE GROUPED IN BINS OF 5 KB AND THE SUM OF POLYMORPHISMS IN EACH BIN WAS PLOTTED. FARM B-DUST (AQUA) CONTAINED THE LARGEST NUMBER OF POLYMORPHIC BASES, WITH THE MAJORITY OCCURRING IN THE REPEAT REGION (IRL/IRS). FARM A-DUST 1 (BROWN) AND FARM A-DUST 2 (GRAY) HARBORED FEWER POLYMORPHIC BASES, WITH A DISTRIBUTION SIMILAR TO FARM B-DUST. POLYMORPHIC BASES DETECTED IN FEATHER GENOMES WERE RARER, ALTHOUGH THIS LIKELY REFLECTS THEIR LOWER COVERAGE DEPTH (SEE TABLE 2-1). NOTE THAT THE UPPER AND LOWER SEGMENTS OF THE Y-AXIS HAVE DIFFERENT SCALES; THE NUMBERS OF POLYMORPHIC BASES PER GENOME FOR THE SPLIT COLUMN ON THE RIGHT ARE LABELED FOR CLARITY.

FIGURE 2-7: GENOME-WIDE DISTRIBUTION OF POLYMORPHISMS WITHIN EACH CONSENSUS GENOME, USING HIGH-STRINGENCY CRITERIA. POLYMORPHIC BASE CALLS FROM EACH MDV GENOME WERE GROUPED BY POSITION IN BINS OF 5 KB AND THE SUM OF POLYMORPHISMS IN EACH BIN WAS PLOTTED. STRICTER PARAMETERS OF POLYMORPHISM DETECTION (SEE METHODS FOR DETAILS) REVEALED A DISTRIBUTION SIMILAR TO THAT IN FIGURE 2-6. NO POLYMORPHISMS WERE DETECTED IN FEATHER-DERIVED GENOMES USING HIGH-STRINGENCY CRITERIA, DUE TO THEIR LOWER COVERAGE DEPTH (SEE TABLE 2-1). NOTE THAT THE UPPER AND LOWER SEGMENTS OF THE Y-AXIS HAVE DIFFERENT SCALES; THE NUMBERS OF POLYMORPHIC BASES PER SEGMENT FOR THE SPLIT COLUMN ON THE RIGHT ARE THUS LABELED ON THE GRAPH.

FIGURE 2-9: A NEW POLYMORPHIC LOCUS IN ICP4, AND ITS SHIFTING ALLELE FREQUENCY OVER TIME. (A) HTS DATA REVEALED A NEW POLYMORPHIC LOCUS IN ICP4 (MDV084) AT NUCLEOTIDE POSITION 5,495. IN THE SPATIALLY- AND TEMPORALLY-SEPARATED DUST SAMPLES FROM FARM A (SEE FIGURE 2-1 AND METHODS FOR DETAILS), WE OBSERVED A DIFFERENT PREVALENCE OF C (ENCODING SERINE) AND A (ENCODING TYROSINE) ALLELES. (B) TIME-SEPARATED DUST SAMPLES SPANNING NINE MONTHS WERE SUBJECTED TO TARGETED SANGER SEQUENCING OF THIS LOCUS TO TRACK POLYMORPHISM FREQUENCY OVER TIME. THE MAJOR AND MINOR ALLELE FREQUENCIES AT THIS LOCUS VARIED WIDELY ACROSS TIME, AND THE MAJOR ALLELE SWITCHED BETWEEN C AND A MORE THAN TWICE DURING THIS PERIOD.

FIGURE 2-10: DENDROGRAM OF GENETIC DISTANCES AMONG ALL SEQUENCED MDV-1 GENOMES. USING A MULTIPLE-GENOME ALIGNMENT OF ALL AVAILABLE COMPLETE MDV-1 GENOMES, WE CALCULATED THE EVOLUTIONARY DISTANCES BETWEEN GENOMES USING THE JUKES-CANTOR MODEL. A DENDROGRAM WAS THEN CREATED USING THE NEIGHBOR-JOINING METHOD IN MEGA WITH 1,000 BOOTSTRAP REPLICATES. THE FIVE NEW FIELD-SAMPLED MDV-1 GENOMES (GREEN) FORMED A SEPARATE GROUP BETWEEN THE TWO CLUSTERS OF USA ISOLATES (BLUE). THE EUROPEAN VACCINE STRAIN (RISPENS) FORMED A SEPARATE CLADE, AS DID THE THREE CHINESE MDV-1 GENOMES (DARK BLUE). GENBANK ACCESSIONS FOR ALL STRAINS: NEW GENOMES, TABLE 2-1; PASSAGE 11-648A, JQ806361; PASSAGE 31-648A, JQ806362; PASSAGE 61-648A, JQ809692; PASSAGE 41-648A, JQ809691; PASSAGE 81-648A, JQ820250; CU-2, EU499381; RB-1B, EF523390; MD11, 170950; MD5, AF243438; RISPENS (CVI988), DQ530348; 814, JF742597; GX0101, JX844666; LMS, JQ314003.

FIGURE 2-11. TAXONOMIC DIVERSITY IN DUST AND CHICKEN FEATHERS FROM FARM B. WE USED AN ITERATIVE BLASTN WORKFLOW TO GENERATE TAXONOMIC PROFILES FOR ALL SAMPLES FROM FARM B (SEE METHODS FOR DETAILS). MAJOR CATEGORIES ARE SHOWN HERE, WITH A FULL LIST OF TAXA (TO FAMILY LEVEL) IN SUPPLEMENTAL TABLE S2-3. FARM B-FEATHER 1 AND FARM B-FEATHER 2 SHOW LESS OVERALL DIVERSITY THAN THE ENVIRONMENTAL MIXTURE OF THE DUST SAMPLES, AS WOULD BE EXPECTED FROM DIRECT HOST SAMPLING. SINCE THE VIRAL DNA ENRICHMENT PROCEDURES REMOVE VARIABLE AMOUNTS OF HOST AND ENVIRONMENTAL CONTAMINANTS, THE PROPORTIONS OF TAXA SHOWN ARE REPRESENTATIVE BUT NOT FULLY DESCRIPTIVE OF THOSE PRESENT INITIALLY. THE ASTERISK INDICATES SEQUENCES THAT WERE UNCLASSIFIED OR AT LOW PREVALENCE.

FIGURE 2-12: THE METHODS DESCRIBED IN THIS CHAPTER CAN BE USED TO EXPLORE ADDITIONAL ASPECTS OF VARIATION IN FUTURE STUDIES. (ARTWORK BY NICK SLOFF, PENN STATE UNIVERSITY, DEPARTMENT OF ENTOMOLOGY)

FIGURE 3-2. SOUTHERN BLOT COMPARISON OF GENETIC VARIATION IN R-13 (FATHER’S) AND N-7 (SON’S) ISOLATES, RELATIVE TO OTHER STRAINS OF HSV-1. THE OVERALL GENOMIC STRUCTURE OF THE FATHER’S AND SON’S CLINICAL ISOLATES WAS ANALYZED BY DNA (SOUTHERN) BLOT ANALYSIS AND COMPARED TO FOUR COMMON LABORATORY STRAINS (17SYN+, MCKRAE, KOS(M), AND F) AND TEN DIFFERENT CLINICAL ISOLATES. VIRAL GENOMIC DNA WAS CLEAVED WITH BAMHI, GEL-SEPARATED, AND PROBED WITH A COSMID CLONE INSERT SPANNING 40 KBP OF THE STRAIN 17SYN+ GENOME (SEE METHODS FOR DETAILS).
NO MAJOR REARRANGEMENTS OR CHANGES IN FRAGMENT SIZE WERE OBSERVED BETWEEN THE R-13 (FATHER) AND N-7 (SON) ISOLATES.

FIGURE 3-3: QUANTIFICATION OF REPLICATION AND LATENCY PHENOTYPES OF R-13 AND N-7 DURING INFECTION IN VIVO. SWISS WEBSTER MICE (MALE) WERE INFECTED ON SCARIFIED CORNEAS WITH 2 × 10⁵ PFU OF THE CLINICAL ISOLATE R-13 OR N-7. AT THE INDICATED TIMES POST-INFECTION, TISSUES COLLECTED FROM EACH OF THREE MICE PER GROUP WERE ASSAYED FOR INFECTIOUS VIRUS USING A STANDARD PLAQUE ASSAY (SEE METHODS FOR DETAILS). REPLICATION IN EYES (A) AND TG (B) REVEALED THAT INFECTIOUS VIRUS GENERATED IN THESE TISSUES DURING THE ACUTE STAGE OF INFECTION WAS NOT SIGNIFICANTLY DIFFERENT BETWEEN R-13, N-7, OR 17SYN+, WHEREAS THE LEVELS FOR CI-37 WERE SIGNIFICANTLY HIGHER (STUDENT’S T TEST, *P≤0.05, **P≤0.01, FOR PEAK TITER ON DAY 4; AUC = AREA UNDER THE CURVE). (C) THE REPLICATION KINETICS AND REGIONAL DISTRIBUTION OF ISOLATES IN THE BRAIN WERE DETERMINED BY CUTTING EACH BRAIN INTO 4 CORONAL SECTIONS. THE LEVELS OF INFECTIOUS VIRUS WERE NOT SIGNIFICANTLY DIFFERENT BETWEEN R-13, N-7, OR 17SYN+, WHEREAS THE LEVELS FOR CI-37 WERE SIGNIFICANTLY HIGHER (STUDENT’S T TEST, *P≤0.05, **P≤0.01, FOR PEAK TITER ON DAY 6). THIS LOW LEVEL OF INFECTIOUS VIRUS FOR R-13 AND N-7 IN THE BRAIN (≤ 100 PFU) IS CONSISTENT WITH THE 100% SURVIVAL OBSERVED FOR BOTH ISOLATES UNDER THESE INFECTION CONDITIONS, WHEREAS CI-37 INDUCED COMPLETE MORTALITY (FIGURE 3-4). (D-E) QUANTIFICATION OF LATENT VIRAL GENOMES. AT 40 DAYS POST-INFECTION, THE TG AND BRAINS FROM R-13 (N=4) AND N-7 (N=4) LATENTLY INFECTED MICE WERE ASSAYED FOR VIRAL GENOME COPIES USING REAL-TIME QPCR. VIRAL GENOME COPY NUMBERS PER 50 NG MOUSE DNA DETECTED IN R-13 AND N-7 WERE NOT SIGNIFICANTLY DIFFERENT (STUDENT’S T TEST) (D). BRAINS WERE CUT INTO FOUR CORONAL SECTIONS PRIOR TO ISOLATING DNA, AND THE NUMBER OF VIRAL GENOME COPIES WAS DETERMINED IN EACH SECTION. VIRAL GENOME COPIES IN THE BRAINS WERE NOT SIGNIFICANTLY DIFFERENT (ANOVA; ON BOX AND WHISKER PLOT, + INDICATES MEAN, BAR INDICATES MEDIAN) (E).

FIGURE 3-4. NO MORTALITY WAS OBSERVED IN SWISS WEBSTER MICE INFECTED WITH EITHER THE FATHER’S (R-13) OR THE SON’S (N-7) ISOLATE OF HSV-1. SWISS WEBSTER MICE WERE INFECTED VIA THE OCULAR ROUTE WITH 2 × 10⁵ PFU OF R-13 (FATHER’S), N-7 (SON’S), 17SYN+, OR CI-37 HSV-1 (SEE METHODS FOR DETAILS). NEITHER N-7 NOR R-13 CAUSED ANY MORTALITY THROUGH 40 DAYS POST-INFECTION, WHEREAS 17SYN+ AND CI-37 CAUSED 19% AND 100% MORTALITY, RESPECTIVELY (R-13, 35/35 MICE SURVIVED; N-7, 34/34 MICE SURVIVED; CI-37, 5/5 MICE DIED; 17SYN+, 13/16 MICE SURVIVED). THE MORTALITY RATE FOR ISOLATE CI-37 WAS SIGNIFICANTLY DIFFERENT FROM THAT OF 17SYN+ (ANOVA, ** P<0.003) AND OF N-7 OR R-13 (ANOVA, *** P<0.0001). THE MORTALITY RATE FOR 17SYN+ WAS ALSO SIGNIFICANTLY DIFFERENT FROM THAT OF N-7 OR R-13 (ANOVA, * P<0.03).

FIGURE 3-5: EXPLANT AND IN VIVO REACTIVATION IN SWISS WEBSTER MICE LATENTLY INFECTED WITH R-13 AND N-7.
R-13 AND N-7 WERE COMPARED FOR REACTIVATION FROM LATENCY (>40 DAYS PI) USING IN VITRO AND IN VIVO REACTIVATION ASSAYS. (A) IN A STANDARD TG EXPLANT ASSAY, NO DIFFERENCE BETWEEN R-13 AND N-7 REACTIVATION FREQUENCY WAS OBSERVED (P = 0.99, STUDENT’S T TEST), ALTHOUGH THE DIFFERENCE BETWEEN VIRUS RECOVERED AT TIME 0 AND 3 DAYS POST-EXPLANT WAS SIGNIFICANT IN BOTH GROUPS (P = 0.0003, ANOVA). (B) THE IN VIVO REACTIVATION FREQUENCY (PERCENTAGE OF MICE WITH INFECTIOUS VIRUS DETECTED IN TG) WAS ALSO NOT DIFFERENT BETWEEN R-13 AND N-7 AT 22 HRS. POST HYPERTHERMIC STRESS (STUDENT’S T TEST, EXPT. 1 P = 0.89, EXPT. 2 P = 0.83), AND (C) THE NUMBER OF NEURONS EXITING LATENCY WAS ALSO NOT DIFFERENT (EXPT. 1 P = 0.39, EXPT. 2 P = 0.86). LATENTLY INFECTED TG WERE SUBJECTED TO WHOLE-GANGLION IMMUNOHISTOCHEMISTRY TO DETECT VIRAL PROTEIN AT 0 HRS. (D) AND 3 DAYS (E) POST-EXPLANT. VIRAL PROTEIN-EXPRESSING NEURONS (BLACK ARROWS) AND TRACTS (WHITE ARROWHEADS) MARK THE RANGE OF VIRAL SPREAD. VIRAL PROTEIN-EXPRESSING NEURONS (BLACK ARROWS), MARKING VIRUS THAT REACTIVATED FROM LATENCY, ARE DETECTABLE IN TG FROM N-7 (F) AND R-13 (G) AT 22 HRS. AFTER HYPERTHERMIC STRESS IN VIVO.

FIGURE 3-6: A PHYLOGENETIC NETWORK SHOWING GENETIC RELATEDNESS BETWEEN ISOLATES R-13 (FATHER), N-7 (SON) AND PREVIOUSLY SEQUENCED HSV-1 ISOLATES. A PHYLOGENETIC NETWORK OF ISOLATES R-13 (FATHER), N-7 (SON) AND ALL AVAILABLE COMPLETE HSV-1 GENOMES WAS CONSTRUCTED USING SPLITSTREE4. THE FATHER AND SON ISOLATES FORM A BRANCH SEPARATE FROM ALL PREVIOUSLY SEQUENCED HSV-1 GENOMES, LOCALIZED BETWEEN THE ASIAN/EUROPEAN AND AFRICAN CLUSTERS. SEE METHODS FOR A COMPLETE LIST OF STRAIN NAMES AND GENBANK ACCESSIONS FOR THE HSV-1 GENOMES INCLUDED IN THIS ANALYSIS.

FIGURE 3-7: INTRA-STRAIN VARIATION OBSERVED AT A POLYMORPHIC LOCUS IN THE N-7 (SON’S) VIRAL GENOME, IN THE GENE UL14.
A LOW-FREQUENCY NON-SYNONYMOUS VARIATION WAS OBSERVED IN THE N-7 (SON’S) VIRAL GENOME AT POSITION 19,370, IN THE TEGUMENT PROTEIN UL14. AT THIS SITE, AN A IS PRESENT IN 3% OF THE VIRAL SEQUENCE READS INSTEAD OF THE MAJORITY G ALLELE (97%). THE MINORITY ALLELE FOR UL14 ENCODES A VALINE-TO-METHIONINE CHANGE AT RESIDUE 109; MET109 EXISTS AS THE DOMINANT ALLELE IN SEVERAL INDEPENDENT ISOLATES OF HSV-1 (SEE TEXT FOR DETAILS). WHILE THE UL14 CODING SEQUENCE IS ENCODED ON THE REVERSE STRAND OF THE REFERENCE GENOME FOR HSV-1, IT IS DEPICTED HERE IN FORWARD ORIENTATION TO ENABLE CODON READING FROM LEFT TO RIGHT. ACTUAL READ DEPTH OF EACH VARIANT IS INDICATED ABOVE. A SUBSET OF THE ALIGNMENT OF ILLUMINA SEQUENCING READS TO THE N-7 CONSENSUS GENOME IS SHOWN HERE, WITH THE POSITION AND CONSENSUS SEQUENCE SHOWN IN THE TOP ROW. FORWARD- AND REVERSE-ORIENTED READS ARE COLORED AQUA AND GREEN, RESPECTIVELY, WITH DIRECTIONAL ARROWS. AREAS WITH NO LETTER SHOWN HAVE 100% AGREEMENT WITH THE CONSENSUS NUCLEOTIDE; THE LETTERS ARE LEFT OUT FOR CLARITY.

FIGURE 3-8: INTRA-STRAIN VARIATION OBSERVED AT A POLYMORPHIC LOCUS ADJOINING A HOMOPOLYMERIC TRACT, IN AN INTERGENIC REGION OF THE HSV-1 ISOLATE N-7. A POTENTIAL POLYMORPHIC LOCUS WAS DETECTED AT POSITION 25,647 OF BOTH THE SON’S N-7 (7.8% MINORITY ALLELE) AND THE FATHER’S R-13 (4% MINORITY ALLELE) VIRAL GENOMES. HOWEVER, INSPECTION OF THE ALIGNMENT OF ILLUMINA SEQUENCING READS TO THE CONSENSUS GENOME (N-7 SHOWN HERE) REVEALED THAT THE POLYMORPHIC SITE DETECTION RESULTED FROM A COMBINATION OF SMALL INSERTIONS OR DELETIONS IN A HOMOPOLYMERIC TRACT OF GS IN THE CONSENSUS GENOME. A SUBSET OF THE ALIGNMENT OF ILLUMINA SEQUENCING READS TO THE N-7 CONSENSUS GENOME IS SHOWN HERE, WITH THE POSITION AND CONSENSUS SEQUENCE SHOWN IN THE TOP ROW. ACTUAL READ DEPTH AT THE POSITION IS INDICATED ABOVE. INSERTIONS RELATIVE TO THE CONSENSUS ARE SHOWN AS A BLUE “I” (LABELED AS GG OR GGG), AND DELETIONS RELATIVE TO THE CONSENSUS ARE SHOWN AS A BLACK HORIZONTAL LINE (RANGE OF 1-6 BP SHORTER) IN THE ALIGNED SEQUENCE READ. THE LENGTH OF THE G-HOMOPOLYMER TRACT IS SHOWN ON THE LEFT FOR THOSE SEQUENCE READS THAT COMPLETELY SPAN THE HOMOPOLYMER TRACT. HOMOPOLYMER TRACT LENGTH CANNOT BE INFERRED FOR READS THAT TERMINATE WITHIN THE G-TRACT; THUS NO LENGTH IS LISTED FOR THOSE READS. THIS POSITION AND OTHER POLYMORPHIC LOCI THAT WERE DETECTED ADJACENT TO TANDEM REPEATS AND HOMOPOLYMERIC TRACTS WERE FLAGGED AS SUCH IN SUPPLEMENTARY TABLE 2. FORWARD-ORIENTED SEQUENCE READS ARE COLORED AQUA, WHILE REVERSE-ORIENTED READS ARE COLORED GREEN. AREAS WITH NO LETTER SHOWN HAVE 100% AGREEMENT WITH THE CONSENSUS NUCLEOTIDE; THE LETTERS ARE LEFT OUT FOR CLARITY.

FIGURE 4.1: OVERVIEW OF THE METHOD USED FOR RECOMBINATION ANALYSIS OF MDV-1 ISOLATES. THE RECOMBINATION ANALYSIS USED BREAKPOINTS GENERATED BY THE PROGRAM 3SEQ. THE BREAKPOINTS WERE USED TO DIVIDE THE GENOMES INTO GENOME SEGMENTS. THE SEGMENTS WERE THEN USED TO GENERATE MAXIMUM-LIKELIHOOD PHYLOGENETIC TREES AND ANALYZED FOR THE PRESENCE OF PHYLOGENETIC INCONGRUENCE BETWEEN SEGMENTS.
ADJACENT SEGMENTS LACKING PHYLOGENETIC INCONGRUENCE WERE COMBINED INTO A SINGLE GENOME SEGMENT.

FIGURE 4.2: OVERVIEW OF THE MDV-1 GENOME SHOWING OPEN READING FRAMES (ORFS), GENOMIC REGIONS, AND RECOMBINATION BREAKPOINTS ACROSS THE GENOME. (A) THE FULL STRUCTURE OF THE MDV-1 GENOME INCLUDES A UNIQUE LONG REGION (UL) AND A UNIQUE SHORT REGION (US), EACH OF WHICH IS FLANKED BY LARGE REPEATS KNOWN AS THE TERMINAL AND INTERNAL REPEATS OF THE LONG REGION (TRL AND IRL) AND THE SHORT REGION (TRS AND IRS). MOST ORFS (PALE GREEN ARROWS) ARE LOCATED IN THE UNIQUE REGIONS OF THE GENOME. ORFS CONTAINING OR CLOSE TO THE IDENTIFIED RECOMBINATION BREAKPOINTS ARE LABELED. (B) A TRIMMED GENOME FORMAT WITHOUT THE TERMINAL REPEAT REGIONS WAS USED FOR ANALYSES IN ORDER NOT TO OVER-REPRESENT THE REPEAT REGIONS. THE RECOMBINATION BREAKPOINTS IDENTIFIED ARE SHOWN USING RED DOTTED LINES.

FIGURE 4.3: PHYLOGENETIC TREE OF SEGMENT 3 SHOWING DONATION OF SEGMENT 1 BETWEEN CLADES/ISOLATES.

FIGURE 4.4: PHYLOGENETIC TREE OF SEGMENT 3 SHOWING DONATION OF SEGMENT 2 BETWEEN CLADES/ISOLATES.

FIGURE 4.5: PHYLOGENETIC TREE OF SEGMENT 3 SHOWING DONATION OF SEGMENT 4 BETWEEN CLADES/ISOLATES.

FIGURE 4.6: PHYLOGENETIC TREE OF SEGMENT 3 SHOWING DONATION OF SEGMENT 5 BETWEEN CLADES/ISOLATES.

FIGURE 4.7: PHYLOGENETIC TREE OF SEGMENT 3 SHOWING DONATION OF SEGMENT 6 BETWEEN CLADES/ISOLATES.

FIGURE 4.8: PHYLOGENETIC TREE OF SEGMENT 3 SHOWING DONATION OF SEGMENT 7 BETWEEN CLADES/ISOLATES.

FIGURE 4.9: PHYLOGENETIC TREE OF SEGMENT 3 SHOWING DONATION OF SEGMENTS 1, 2, 4, 5, 6 AND 7 BETWEEN CLADES/ISOLATES.

FIGURE 4.10: UNMODIFIED PHYLOGENETIC TREE OF SEGMENT 3 SHOWING DONATION OF SEGMENTS 1, 2, 4, 5, 6 AND 7 BETWEEN CLADES/ISOLATES.

FIGURE A-1: OCULAR MODEL OF INFECTION IN MICE. THE OCULAR MODEL OF INFECTION INVOLVES INOCULATION OF THE SCARIFIED CORNEAL SURFACE WITH A HIGH TITER OF VIRUS.
THIS IS FOLLOWED BY SCORING OF THE RESULTING DISEASE PHENOTYPE IN THE EYE ON A SCALE OF 0 TO 5, WITH 0 BEING NO DISEASE PHENOTYPE AND 5 BEING CORNEAL KERATITIS. MORBIDITY IN INFECTED ANIMALS IS ASSESSED BY MEASURING WEIGHT LOSS. ANIMALS SHOWING GREATER THAN 20% WEIGHT LOSS ARE EUTHANIZED AND SCORED AS NON-SURVIVORS WHEN QUANTIFYING THE SURVIVAL RATE POST-INFECTION. THE ABILITY OF THE HSV-1 ISOLATES TO PENETRATE INTO THE NERVOUS SYSTEM IS ASSESSED BY MEASURING VIRAL TITERS AT 2, 4, 6, AND 8 DAYS POST-INFECTION IN EYES, TRIGEMINAL GANGLIA, AND BRAIN.

FIGURE A-2: DISEASE PHENOTYPE OF CLINICAL AND LAB ISOLATES IN SWISS WEBSTER MICE. ISOLATES THAT DID NOT CAUSE MORTALITY IN MICE WERE CATEGORIZED AS LOW-VIRULENCE ISOLATES, WHEREAS ISOLATES THAT CAUSED MORTALITY WERE CATEGORIZED AS HIGH-VIRULENCE. ISOLATES MARKED WITH * ARE LOW-PASSAGE CLINICAL ISOLATES AND THE REST ARE LAB ISOLATES.

FIGURE A-3: VARIANTS IN VP22 ASSOCIATED WITH VIRULENCE IN THE MOUSE OCULAR MODEL. (A) POSITION OF THE UL49 GENE UNDER INVESTIGATION IN THE HSV-1 GENOME. (B) SCHEMATIC SHOWING FUNCTIONAL DOMAINS OF PROTEIN VP22 AND POSITIONS OF AMINO ACID RESIDUES UNDER INVESTIGATION. (C) AMINO ACID ALIGNMENT SHOWING DIFFERENCES IN RESIDUES BETWEEN HIGH- AND LOW-VIRULENCE ISOLATES.

FIGURE A-4: CRISPR-FACILITATED HOMOLOGOUS RECOMBINATION. THE HIGH-VIRULENCE-ASSOCIATED GENE VARIANT WILL BE INTRODUCED INTO THE BACKGROUND OF LOW-VIRULENCE ISOLATES TO GENERATE A HIGH-VIRULENCE RECOMBINANT VIRUS, AND VICE VERSA. THE CAS9 PROTEIN CAUSES A DOUBLE-STRANDED BREAK AT THE CRISPR TARGET SITE IN THE HSV-1 GENOME, WHICH PROMOTES HOMOLOGOUS RECOMBINATION.

FIGURE A-5: PREPARATION OF SHUTTLE PLASMID CONTAINING THE GENE VARIANT OF INTEREST FOR UL49 (VP22). (A) SNAPSHOT OF THE HSV-1 GENOME SHOWING THE GENOMIC REGION CONTAINING THE UL49 GENE. THE 2.5 KB REGION WAS AMPLIFIED BY PCR AND USED AS AN INSERT TO MAKE A SHUTTLE PLASMID. KPNI AND BAMHI RESTRICTION SITES PRESENT IN THE REGION WERE USED FOR CLONING THE INSERT INTO THE PUC19+2 PLASMID BACKBONE. (B) SCHEMATIC OF THE PUC19+2 PLASMID SHOWING THE POSITIONS OF KPNI AND BAMHI RESTRICTION SITES IN THE PLASMID BACKBONE. THE PLASMID WAS PROVIDED BY DR. RICHARD JOHNSON (UNIVERSITY OF CINCINNATI). (C) GEL SHOWING FORMATION OF THE SHUTTLE PLASMID. THE PLASMID WITHOUT THE INSERT IS ~2.6 KB IN LENGTH. AFTER INTEGRATION OF THE INSERT (~2.5 KB) INTO THE BACKBONE, THE PLASMID RUNS BETWEEN 5 AND 6 KB ON THE AGAROSE GEL. THE SHUTTLE PLASMID IS STORED IN THE SZPARA LAB PLASMID LIBRARY.

FIGURE A-6: PREPARATION OF CRISPR-CAS9 TARGETING PLASMID. (A) SNAPSHOT OF THE HSV-1 GENOME SHOWING THE CAS9 TARGETING SITE FOR CRISPR-CAS9-ASSISTED HOMOLOGOUS RECOMBINATION. HIGHLIGHTED IN GREEN IS THE EXACT SEQUENCE IN THE HSV-1 GENOME HOMOLOGOUS TO THE DESIGNED GUIDE RNA. HIGHLIGHTED IN YELLOW IS THE PROTOSPACER ADJACENT MOTIF (PAM) SEQUENCE, WHICH FLANKS THE 3’ END OF THE HOMOLOGOUS SEQUENCE AS RECOMMENDED BY CONG ET AL. (369). (B) NUCLEOTIDE SEQUENCES OF OLIGONUCLEOTIDES USED FOR CLONING INTO THE CRISPR PLASMID. THE PAIR OF ANNEALED OLIGOS WAS CLONED INTO THE CRISPR ARRAY OF THE PX330-U6-CHIMERIC_BB-CBH-HSPCAS9 PLASMID (ADDGENE #42230) USING BBSI RESTRICTION SITES (369). THE PROTOCOL USED FOR THIS PROCESS WAS PROVIDED BY THE TSCHARKE LAB (AUSTRALIAN NATIONAL UNIVERSITY) AND THE LINDNER LAB (PENN STATE UNIVERSITY). (C) RESTRICTION DIGEST OF PX330 PLASMIDS WITH ENZYMES BBSI AND AGEI. PLASMIDS WITH ANNEALED OLIGONUCLEOTIDES ARE LINEARIZED TO FORM A SINGLE BAND DUE TO THE LOSS OF THE BBSI RESTRICTION SITE, WHEREAS PLASMIDS WITHOUT ANNEALED OLIGONUCLEOTIDES ARE DIGESTED INTO ~1 KB AND ~7.5 KB FRAGMENTS.

FIGURE B-2. GENETIC COMPARISON OF HSV-1 ISOLATES TRANSFERRED FROM MOTHER TO NEONATE DURING BIRTH. (A) DIAGRAM OF THE HSV-1 GENOME AND ITS GENES (GRAY ARROWS DEPICT FORWARD- VS. REVERSE-STRAND ENCODED GENES). OVERLAPPING GENES ARE SHOWN BELOW THE MAIN DIAGRAM. BLACK DASHED VERTICAL LINES EXCLUDING THE TERMINAL REPEAT REGIONS DENOTE THE TRIMMED GENOME USED FOR DOWNSTREAM GENOMIC ANALYSES (SEE METHODS). (B) HISTOGRAM SHOWS PERCENT IDENTITY OF A DNA ALIGNMENT OF VIRAL GENOMES DERIVED FROM THE THREE CLINICAL SAMPLES FROM MOTHER AND NEONATE. IN THE HISTOGRAM, NUCLEOTIDE POSITION IDENTITY IS COLOR-CODED: 100% IDENTITY IS GRAY, ≤99% IDENTITY IS YELLOW, AND ≤25% IDENTITY IS RED.
TO ILLUSTRATE THE GENOME-SPECIFIC LOCATIONS OF THESE NON- IDENTICAL SITES, EACH GENOME IS DEPICTED AS A HORIZONTAL GRAY BAR (BOTTOM), WITH GAPS IN THE ALIGNMENT (IN/DELS) SHOWN AS VERTICAL OR HORIZONTAL BLACK BARS. UL, UNIQUE LONG REGION; US, UNIQUE SHORT REGION; TRL / IRL, TERMINAL OR INTERNAL REPEAT OF THE LONG REGION; TRS / IRS, TERMINAL OR INTERNAL REPEAT OF THE SHORT REGION. IDENTITY GRAPH WAS GENERATED USING GENEIOUS...... 148 FIGURE B-3. SPLITSTREE DEMONSTRATING THE NORTH AMERICAN PHYLOGENY OF THE MOTHER AND BABY HSV-1 ISOLATES. THE THREE CLINICAL ISOLATES PRESENTED IN THIS WORK ARE HIGHLIGHTED IN RED AND FALL WITHIN THE NORTH AMERICAN/EUROPEAN PHYLOGENETIC CLADE. THIS TREE WAS GENERATED BY ALIGNING THE CONSENSUS SEQUENCES OF THE THREE VIRAL ISOLATES FROM MOTHER AND NEONATE WITH 50 PUBLISHED HSV-1 STRAINS (SEE METHODS SECTION). THE SCALE BAR REPRESENTS 0.1% NUCLEOTIDE DIVERGENCE. .. 149 FIGURE B-4. DE NOVO MINORITY VARIANT IN THE UL6 PORTAL PROTEIN OF THE NEONATE’S HSV-1 SKIN ISOLATE. (A)

xii PROTEIN DIAGRAM OF THE 676 AMINO ACID UL6 PORTAL PROTEIN OF HSV-1. THIS DIAGRAM DETAILS KNOWN DOMAINS AS WELL AS PREDICTED DOMAINS (378). RGD MOTIFS (12-18 AND 238-240) INCLUDE PEPTIDES ARG-GLY-ASP THAT TOGETHER, ARE ASSOCIATED WITH CELL ADHESION (379, 380). PREDICTED NUCLEAR PORE BINDING DOMAIN (19-48). DISULFIDE BONDS ARE PRESENT AT AMINO ACID POSITIONS 166 AND 254. LEUCINE ZIPPER MOTIF OCCURS FROM AMINO ACID POSITION 422-443 (378). THE LOCATION OF THE UL6 MINORITY VARIANT DETECTED IN THE NEONATE SKIN HSV-1 ISOLATE AND HIGHLIGHTED IN PART C IS SHOWN RELATIVE TO THE ENTIRE 676 AMINO ACID PROTEIN. (B) THE UL6 PORTAL PROTEIN DOES NOT APPEAR TO BE WELL CONSERVED AMONG THE 8 HUMAN HERPESVIRUSES. IDENTITIES WERE CALCULATED USING BLAST ALIGNMENTS AGAINST HSV-1 UL6 PROTEIN ON THE UNIPROT WEBSITE (381). (C) UL6 MINORITY VARIANT PRESENT IN THE NEONATE SKIN HSV-1 GENOME IS HIGHLIGHTED AT THE NUCLEOTIDE LEVEL. A SUBSET OF READS IS SHOWN AT NUCLEOTIDE POSITION 7,373 RELATIVE TO THE CONSENSUS LEVEL SEQUENCE. THE MAJOR ALLELE GUANINE (G) AT THIS POSITION IS PRESENT AT A FREQUENCY OF 90% WHILE THE MINOR ALLELE THYMINE (T) IS PRESENT AT A FREQUENCY OF 9%. BOTH FORWARD (GREEN) AND REVERSE (PURPLE) READS ARE SHOWN TO DEMONSTRATE THAT BIDIRECTIONAL READ SUPPORT IS PRESENT AT THIS NUCLEOTIDE POSITION...... 152 FIGURE C-1: MINORITY VARIANTS PRESENT IN NEONATAL HSV-2 GENOME POPULATIONS. (A) PLOT INDICATES THE TOTAL NUMBER OF MINORITY VARIANTS (MV) OBSERVED IN EACH NEONATAL ISOLATE. DISS29 AND CNS15 HAVE 10-FOLD MORE MINORITY VARIANTS THAN OTHER NEONATAL STRAINS, WHICH IS PARTICULARLY NOTICEABLE FOR THOSE MV THAT ARE LOCATED IN CODING, OR GENIC, SEQUENCES. (B) MINORITY VARIANTS CAN BE EITHER SINGLE-NUCLEOTIDE VARIANTS OR POLYMORPHISMS (SNPS) OR SMALL INSERTIONS OR DELETIONS (IN/DELS). MINORITY VARIANT SNPS ARE MORE COMMON THAN IN/DELS. (C) PIE CHARTS SHOW THE OVERALL FREQUENCY OR PEENTRANCE OF EACH MINORITY VARIANT. 
DISS29 AND CNS15 HAVE MANY MV THAT EXIST AT A HIGH FREQUENCY OR PENETRANCE, WHILE THE PENETRANCE OF MV ALLELES IN MOST OTHER NEONATAL ISOLATES IS LOW. (D) STACKED HISTOGRAMS SHOW THE NUMBER OF NONSYNONYMOUS MV LOCATED IN EACH HSV2 CODING SEQUENCE (GENE). COLOR CODING OF STACKED HISTOGRAM BARS IS THE SAME AS SHOWN IN ...... 157

List of Tables

TABLE 2-1: FIELD SAMPLE STATISTICS AND ASSEMBLY OF MDV-1 CONSENSUS GENOMES ...... 36
TABLE 2-2: PAIR-WISE DNA IDENTITY AND VARIANT PROTEINS BETWEEN PAIRS OF CONSENSUS GENOMES ...... 39
TABLE 2-3: CHI-SQUARED VALUES FROM PAIRWISE COMPARISONS OF DIFFERENT CATEGORIES OF POLYMORPHISMS ...... 44
TABLE 3-1. SEQUENCING STATISTICS FOR N-7 AND R-13 STRAINS ...... 79
TABLE 3-2. PAIR-WISE DNA IDENTITY AND VARIANT PROTEINS BETWEEN CONSENSUS GENOMES ...... 89
TABLE 4.1: NAMES, PATHOTYPES, YEAR AND LOCATION OF ISOLATION OF MDV-1 ISOLATES ...... 111
TABLE 4.2: SUMMARY OF BREAKPOINTS IDENTIFIED ACROSS MDV GENOMES ...... 114
TABLE B-1: UNIQUE AMINO ACID VARIANTS FOUND IN ALL THREE CLINICAL GENOMES FROM THE MOTHER AND NEONATE THAT ARE NOT PRESENT IN 48 OTHER PUBLISHED STRAINS OF HSV-1 ...... 149
TABLE C-1: CLINICAL CHARACTERISTICS ASSOCIATED WITH HSV-2 ISOLATES FROM TEN PATIENTS ...... 155
TABLE D-1: SNPS IDENTIFIED ACROSS MDV-1 GENOMES THAT CORRELATE WITH DISEASE PHENOTYPES OF THE ISOLATES. ISOLATE NAME, LOCATION OF ISOLATION, ISOLATION DATE, AND PATHOTYPE ARE ALSO SHOWN. SNPS HIGHLIGHTED IN ORANGE ARE PREDOMINANTLY PRESENT IN ‘VV’ AND ‘VV+’ ISOLATES, WHEREAS SNPS HIGHLIGHTED IN GREEN ARE PREDOMINANTLY PRESENT IN ‘V’ ISOLATES ...... 161

Acknowledgement

Firstly, I would like to thank my advisor Dr. Moriah Szpara for accepting me into her lab. I wholeheartedly believe that the last 4 years I spent in the Szpara lab have helped me become a better scientist and a better thinker. I am extremely grateful for the scientific freedom she has given me to explore different projects and for her guidance in every phase of those projects. I would also like to thank her for prioritizing my career development and advising me on topics outside of my research. I would also like to thank the members of the Szpara lab with whom I have had the opportunity to collaborate or interact during my time in the lab. Their scientific acumen has helped me become a better scientist and contributed greatly to the projects I have worked on. Members of the Szpara lab have also contributed to my personal growth, and I hope my presence in the lab has had a similar positive effect on them. I would also like to thank Dr. Andrew Read for fostering collaboration between the Szpara and Read labs and for providing me with guidance on all my projects. I would also like to thank the members of the Read lab, especially Dr. Andrew Bell, with whom I had the opportunity to collaborate on several projects. Without Dr. Bell’s help and advice, I would not have been able to complete the projects that form the basis of this dissertation. I would like to thank my entire thesis committee, Dr. Moriah Szpara, Dr. Andrew Read, Dr. Istvan Albert, Dr. Tim Meredith, and Dr. Anthony Schmitt, for their guidance and support. I would also like to acknowledge Dr. Nancy Sawtell at Cincinnati Children’s Hospital Medical Center and Dr. Richard Johnson at the University of Cincinnati, who have been amazing collaborators on several projects. My graduate work would not have been possible without funding from several sources. The work presented in this dissertation was supported by startup funds (M.L.S.) from the Pennsylvania State University; the National Institute of General Medical Sciences, National Institutes of Health (R01GM105244 [A.F.R.]), as part of the joint NSF-NIH-USDA Ecology & Evolution of Infectious Diseases Program; the Virus Pathogens Resource (ViPR), a Bioinformatics Resource Center (BRC) funded by NIAID; the Huck Institutes of the Life Sciences at the Pennsylvania State University (M.L.S.); Pennsylvania Department of Health Tobacco CURE funds (M.L.S.); and grant 1R01AI093614 to N.S. and R.T. I would like to thank all my past and present roommates, who helped turn a house into a livable home. Their company and friendship helped me maintain work-life balance and stay focused on my goals. Lastly, I would like to thank the members of my immediate and extended family. Without the support of my mom, dad, and sister, I would never have been able to accomplish any of this. I would also like to thank my uncles, aunts, and grandparents for their support and for making me a better person.

Chapter 1

Introduction

1.1 Genetic variation in pathogens and the advent of sequencing technologies

The idea of linking genetic variation in pathogens to virulence and disease outcomes has been actively pursued for many decades. Significant headway has been made in identifying the genetic mechanisms underpinning infection, resistance, and evolution in bacteria, viruses, and parasites. Along with host genetics and environmental factors, genetic variation in pathogens is one of the key determinants of disease severity and pathogenesis. Genetically diverse populations allow pathogens to adapt to different environments and withstand changing selective pressures (1–3). The most significant of these selective pressures is the host immune system: pathogens that carry a particular mutation may be better at evading host immunity than others. Genetic variation also allows pathogens to expand their host range and/or tissue tropism. Drug resistance and vaccine failure have also been attributed to the variation present in pathogen populations (4). The methods used to study genetic variation in pathogens have changed dramatically over the years. The advent of recombinant DNA technologies, the development of DNA sequencing, the invention of the polymerase chain reaction (PCR), and the sequencing of the first viral, bacterial, and eukaryotic genomes were important landmarks in the quest to understand how DNA sequences are translated into observable phenotypes (5). Genome sequencing has spearheaded a revolution in pathogen research. Sequencing technologies have enabled a complete genomic definition of each pathogen, as opposed to characterization through conventional methods such as morphology, staining properties, and metabolic characteristics (6). Genetic information obtained from pathogens has been used to identify drug resistance genes, compare phylogenetic relationships between pathogen isolates, and track pathogen outbreaks (7–11).
Sanger sequencing (12) has been the gold standard for DNA sequencing for the last 40 years. Sanger sequencing relies on DNA polymerase incorporating both normal deoxynucleotides and fluorescently labeled dideoxynucleotides into a new strand complementary to the template DNA (12–14). Because dideoxynucleotides lack the 3' hydroxyl group needed to attach the next nucleotide, their incorporation terminates strand extension, producing DNA strands of various lengths. These strands can then be separated at single-base resolution using polyacrylamide gels or capillary electrophoresis, and the fluorescent label on the terminal dideoxynucleotide of each strand reveals the exact sequence of the template. After Fred Sanger successfully sequenced bacteriophage phiX174 (12), there was an immediate surge toward sequencing organisms from other domains of life. As a result, the first prokaryotic genome, that of Haemophilus influenzae, was published in 1995 (15); the first eukaryotic and archaeal genomes, of Saccharomyces cerevisiae and Methanococcus jannaschii respectively, in 1996 (16, 17); and a draft of the human genome in 2001 (18). Genome sequencing has since become an integral part of studying pathogens that cause disease in humans, livestock, and crops. While Sanger sequencing significantly advanced the field of genome sequencing, it had low throughput and high cost and was immensely time-consuming. The demand for sequencing a myriad of pathogens in a cost- and time-efficient manner led to the development of second-generation sequencing devices. These devices combine template preparation, clonal amplification, and repeated cycles of sequencing to generate millions of sequencing reads in a short period of time (19). Progress in genome sequencing also led to the development of other ‘omics’ technologies such as proteomics and transcriptomics, and it motivated a systems biology approach to studying host-pathogen interactions. The technologies used by these second-generation sequencing devices are also known as next-generation sequencing (NGS), high throughput sequencing (HTS), or deep sequencing.
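
To make the chain-termination principle concrete, the toy Python sketch below simulates it: each product stops at a different position where a labeled dideoxynucleotide was incorporated, sorting the products by length mimics electrophoretic separation, and reading the terminal labels in size order recovers the sequence of the newly synthesized strand. It is an idealized illustration only. The function names are hypothetical, and it assumes that the template string is written 3'->5' (the order in which the polymerase reads it) and that termination occurs at every position in some molecule.

```python
import random

# Watson-Crick pairing: the polymerase adds the base complementary to each template base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_terminated_fragments(template_3to5: str) -> list[tuple[int, str]]:
    """Return one (length, terminal_label) pair per chain-terminated product.

    Idealized assumption: across the reaction, a labeled dideoxynucleotide is
    incorporated at every position in some molecule, so one product ends at
    each position. Product position i pairs with template position i.
    """
    fragments = []
    for i, base in enumerate(template_3to5.upper()):
        # Termination at position i gives a strand of length i + 1 whose
        # fluorescent label identifies the complementary base added last.
        fragments.append((i + 1, COMPLEMENT[base]))
    return fragments


def read_sequence(fragments: list[tuple[int, str]]) -> str:
    """Sort products by length (as electrophoresis does) and read their labels in order."""
    return "".join(label for _, label in sorted(fragments))


fragments = synthesize_terminated_fragments("TACGGT")
random.shuffle(fragments)  # products are not generated in size order
print(read_sequence(fragments))  # ATGCCA
```

The shuffle is there to show that the order of synthesis does not matter: only the size ordering imposed by electrophoresis carries the sequence information.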

1.2 Studying pathogen biology in the era of high throughput sequencing (HTS)

HTS has dramatically transformed the study of host-pathogen biology. Advances in HTS have increased the throughput and reduced the cost of pathogen genome sequencing. The resulting widespread use of HTS has led to the discovery of new microorganisms and the early identification of microorganisms during disease outbreaks, and it has provided valuable insights into pathogen biology and evolution (20). Widespread use of HTS in health care, agriculture, food safety, and wildlife conservation has given rise to the field of metagenomics (20). Metagenomics is an unbiased assessment of all of the microorganisms in a given environment (21); it is also known as community genetics. Metagenomics in turn has given rise to virome and microbiome studies, which have given us insight into the complex relationships between microbes and the human body. Microbes in the human body have been estimated to outnumber human cells by an order of magnitude, and they include members from all domains of life (22). The multiple microbiome niches in the human body have become a tool to study variation in humans across geography and time (23), and the overall relationship between the microbiome and human health is the subject of many studies (24–26). The foremost impact of HTS on the study of pathogens has been its ability to characterize the spectrum of genetic variation found in pathogen populations. HTS has allowed investigators to study pathogens as genetically diverse populations rather than as clonal populations, which in turn has shed light on how pathogen populations evolve under different selective pressures. Sanger sequencing has been important in identifying single nucleotide polymorphisms (SNPs) and small insertions/deletions (INDELs); however, large INDELs, gene deletions, and rearrangements such as inversions and translocations can be better characterized using HTS (6). With HTS, variation across the pathogen genome can be surveyed in a single experiment, without prior knowledge of the genome. This enables the discovery of previously unidentified variants that may serve as genetic markers of virulence, and this information can be used to develop targeted-sequencing assays that detect virulence markers in large numbers of samples. Whole genome sequencing (WGS) using HTS has also enabled broader application of methods that relate genetic variation to disease, such as genome-wide association studies (GWAS). GWAS are now being extensively applied to understand the genetic basis of various human diseases (27).
Likewise, GWAS on pathogen genomes are beginning to be used to identify variations in the pathogen genome that confer characteristics such as virulence or drug resistance (28, 29). For diseases caused by chromosomal aberrations, HTS can also replace traditional karyotyping methods such as fluorescence in situ hybridization (FISH) (6).
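
Treating a sample as a population means reporting, at each genome position, how often each allele appears among the sequencing reads. The Python sketch below is a simplified, hypothetical illustration of that idea, not the variant-calling pipeline used in this dissertation: it takes per-position read counts split by read orientation, identifies the consensus (major) allele, and reports minority alleles whose frequency passes a threshold and that are supported by reads on both strands. The input format, the class and function names, the 2% frequency cutoff, and the two-reads-per-strand requirement are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Reads supporting one allele at one position, split by read orientation."""
    forward: int
    reverse: int

    @property
    def total(self) -> int:
        return self.forward + self.reverse


def call_minority_variants(pileup: dict[int, dict[str, AlleleCounts]],
                           min_freq: float = 0.02,
                           min_strand_reads: int = 2) -> list[tuple[int, str, float]]:
    """Report (position, allele, frequency) for non-consensus alleles.

    pileup maps genome position -> {allele: AlleleCounts}. The thresholds are
    illustrative assumptions, not values from a published pipeline.
    """
    calls = []
    for pos in sorted(pileup):
        alleles = pileup[pos]
        depth = sum(c.total for c in alleles.values())
        if depth == 0:
            continue
        # The consensus allele is the one supported by the most reads.
        consensus = max(alleles, key=lambda a: alleles[a].total)
        for allele, counts in alleles.items():
            if allele == consensus:
                continue
            freq = counts.total / depth
            # Require reads in both orientations to reduce strand-specific artifacts.
            bidirectional = min(counts.forward, counts.reverse) >= min_strand_reads
            if freq >= min_freq and bidirectional:
                calls.append((pos, allele, round(freq, 3)))
    return calls


# Hypothetical read counts at two positions.
pileup = {
    1000: {"G": AlleleCounts(450, 450), "T": AlleleCounts(45, 45), "A": AlleleCounts(5, 5)},
    2000: {"A": AlleleCounts(250, 250), "C": AlleleCounts(30, 0)},
}
print(call_minority_variants(pileup))  # [(1000, 'T', 0.09)]
```

With these made-up counts, the T allele at position 1000 (9% of 1,000 reads, seen on both strands) is reported, the A allele (1%) falls below the frequency cutoff, and the C allele at position 2000 is rejected because all of its reads come from one strand.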

1.3 Use of HTS in clinical diagnostics

In health care, the accurate identification and characterization of a pathogen is paramount to disease diagnosis and patient recovery. The advent of molecular testing in clinical laboratories has reduced the turnaround time from receipt of a sample to the final result, and has also facilitated the detection of pathogens that cannot be cultivated in the lab (30). However, most current PCR-based molecular methods require prior knowledge of the pathogen involved, so pathogens with emerging genetic features, such as drug resistance variants, can go undetected. Targeted sequencing approaches such as 16S and 18S ribosomal RNA (rRNA) sequencing also require some prior knowledge of the pathogens of interest, making them non-ideal for metagenomic pathogen detection (30). Metagenomic analyses using HTS eliminate the need to devise context- and application-specific assays (31). Clinical samples obtained from patients can be subjected directly to HTS without cultivating microorganisms in the laboratory, using DNA or RNA from the pathogen, or from specifically targeted regions of the pathogen genome, as the starting material. Diagnosis of infections such as encephalitis, meningitis, pneumonia, and sepsis remains a major challenge in most clinical microbiology labs. Identifying the etiology of these diseases requires multiple diagnostic assays, resulting in increased time to treatment and a significant economic burden on the health-care system (32). Even with a multitude of diagnostic assays, an estimated 63% of encephalitis cases go undiagnosed (33). Several recent clinical studies and case reports have demonstrated the utility of metagenomics in infectious disease diagnosis. For example, a novel human pegivirus (HPgV) was discovered through metagenomic analysis of plasma from a patient who died of sepsis of unknown etiology (34).
Uncommon pathogens such as herpesviruses and parvoviruses have been detected, along with hepatitis A and B viruses, in patients with acute liver failure (35). RNA sequencing-based metagenomics was used to detect an array of respiratory viruses that were missed by a common respiratory viral panel (36). A fatal case of encephalitis caused by St. Louis encephalitis virus was diagnosed using metagenomic sequencing of the patient’s cerebrospinal fluid (CSF) (37). Atypical encephalitis cases caused by Leptospira (38) and astroviruses (39) have also been successfully identified using HTS. HTS has also been used effectively to identify fastidious bacteria such as Nocardia and non-tuberculous mycobacteria (40). These metagenomic studies not only reveal the etiology of a disease, but also show the phylogenetic relationship of new pathogens to prior strains, giving a more comprehensive view of the pathogen. The evolution of drug resistance in pathogens is a major public health concern. Along with determining the causative agent of a disease, a primary responsibility of a clinical microbiology lab is to assess the antimicrobial drug susceptibility of the pathogen in order to direct the course of therapy. HTS can be used effectively to identify previously known resistance variants in a pathogen genome (41, 42). In combination with current methods of bacterial identification such as matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry (43), HTS can enhance a laboratory’s ability to identify pathogens and report their antimicrobial susceptibility. However, a major utility of HTS in antimicrobial testing may be its ability to predict drug resistance (33, 40). Because HTS surveys all of the circulating genotypes in a pathogen isolate, rather than just the dominant genotype, it can detect minority genotypes that may confer drug resistance. Under certain selective pressures, these minority genotypes can sweep through a pathogen population, giving rise to isolates with a dominant drug-resistant genotype. Recent studies have demonstrated the power of HTS for predicting antimicrobial susceptibility. Stoesser et al. showed that the genome sequences of clinical isolates of E. coli and K. pneumoniae can be used to predict antibiotic resistance with high confidence (44).
This approach has also been used to predict antibiotic resistance in Burkholderia dolosa, Staphylococcus aureus, and Vibrio cholerae (45–47). HTS has been successfully used to detect low-frequency drug resistance variants in viruses such as HCMV (48), HCV (49), HBV (50), and influenza virus (51), as well as antiviral escape variants in HIV (52). Detection of these genetic variants can enable the development of targeted drugs or more effective use of currently available drugs (53). Continuing decreases in the raw cost of sequencing, coupled with increases in the amount and breadth of data from various HTS platforms, have encouraged investigators to explore the routine use of HTS in clinical diagnostics (31, 33). Although early uses of HTS focused primarily on human genetic disease and personalized medicine, HTS is now being explored in many clinical laboratories for infectious disease diagnostics (30, 31, 54). However, the integration of HTS for infectious disease diagnostics in clinical microbiology laboratories has been slow compared with the pace of sequencing and bioinformatics advances (33). Widespread integration of HTS in clinical laboratories also requires clinical guidelines from bodies such as the Food and Drug Administration (FDA) and the College of American Pathologists (CAP). The Illumina MiSeqDx is currently the only HTS platform that has been approved for use in the clinical lab. The platform is being used to screen patients for cystic fibrosis (CF)-associated mutations in the CF transmembrane conductance regulator (CFTR) gene. At this point, the Illumina MiSeqDx has not been cleared by the FDA for sequencing of microbial genomes (33). The integration of HTS into clinical microbiology laboratories will also have to be accompanied by the development of user-friendly bioinformatics pipelines, HTS-focused training of laboratory technicians and directors, a universal database of microbes, standardized reference materials for test validation, and information technology infrastructure for data analysis, storage, and reporting (33, 55).

1.4 Use of HTS in epidemiological studies

HTS can be used to survey samples from an ongoing outbreak or for routine surveillance of pathogens. An outbreak poses a massive challenge for disease management, and understanding the biology of the pathogen during an outbreak is key to selecting appropriately matched drugs and vaccines. The use of HTS during outbreaks can provide real-time insights into a pathogen’s origin, spread, and evolution (7, 9, 46, 56). As studies of the recent Ebola virus outbreak in West Africa and the Zika virus outbreak in South America show, HTS is paramount to investigating an ongoing outbreak (57, 58). Many research labs, in collaboration with local public health and diagnostic labs, worked to understand the evolutionary history and transmission of these viruses during each outbreak. In a study of 99 Ebola virus genomes collected in Sierra Leone during the 2014 outbreak, Gire et al. characterized the genetic variation of the outbreak virus and delineated patterns of viral transmission. The phylogenetic information obtained from sequencing was used to trace the evolutionary history of the outbreak viruses in relation to Ebola viruses isolated during previous outbreaks (7). Similar approaches have been applied to understanding Zika virus transmission and spread (9, 59, 60). HTS of clinical samples collected during the outbreak was used to identify mutations in the viral genome with implications for viral pathogenesis and for the development of rapid diagnostic assays. Phylogenetic analyses of these genomes were used to understand the introduction of Zika virus into South America and its spread into Central and North America. HTS also allowed researchers to survey samples for signs of co-infection with other pathogens endemic to the region, such as chikungunya virus (61). The 2010 cholera outbreak in Haiti is another example in which HTS led to the discovery of the source of an outbreak. Using HTS, investigators traced the origin of the outbreak strain of Vibrio cholerae to South Asia; the outbreak strain was genetically distinct from the strain that caused the previous cholera outbreak (46, 62). This finding had a direct effect on cholera management in Haiti, as aid workers arriving from South Asia for earthquake relief were subsequently subjected to additional screening before travelling to Haiti. HTS is also increasingly being used by the Centers for Disease Control and Prevention (CDC), as well as by public health and research labs, to study pathogen biology during outbreaks. During a 2015 HIV outbreak in southeastern Indiana, the CDC used information from HIV genomes to track the source and scale of the outbreak (55, 63). After the first reported cases of MERS-CoV in 2012, the CDC developed a PCR assay for reliable detection of MERS-CoV for diagnostics and surveillance, using genomic data originally obtained through HTS (55, 64).
HTS of Streptococcus pyogenes isolates during an outbreak in western Switzerland led to the identification of mutations in antibacterial resistance genes that resulted in higher virulence among bacterial isolates during the outbreak. The information obtained from these analyses directly affected patient care, allowing the outbreak to be contained in just 10 days (65). HTS also has many potential uses for emerging disease surveillance in public health labs. Currently, national- and state-level public health laboratories rely on pathogen information obtained from clinical laboratories for disease surveillance (55, 66). The pathogens reported to public health laboratories by clinical laboratories include the agents of syphilis, gonorrhea, and measles, hemorrhagic fever viruses, Mycobacterium tuberculosis, Salmonella, and Yersinia pestis, among others (20). Because most clinical laboratories rely on growth-based media and biochemical methods for pathogen identification, it can take weeks to months before this information is passed on to public health labs (66). Biochemical and molecular assays are often unreliable at providing strain and serotype information, whereas HTS of these pathogens would provide strain- and serotype-specific information in a fraction of the time taken by conventional methods. Faster relay of information from clinical laboratories to public health laboratories can decrease response times during outbreaks. The CDC has now taken the initiative to establish a nationwide whole-genome database for several pathogens that pose a significant public health risk (67). Currently, there are databases for Listeria, influenza virus, HIV, Campylobacter, Vibrio, Shigella, and E. coli O157 sequences (55). HTS of pathogens has also had a significant impact on hospital infection-control and surveillance programs. For example, WGS of methicillin-resistant Staphylococcus aureus (MRSA) has been used to investigate MRSA outbreaks in neonatal wards (42, 68). In another example, an outbreak of hemolytic-uremic syndrome in Germany was tracked using HTS (69). HTS was also used to determine nosocomial transmission of varicella zoster virus (VZV) in a fatal case (70).
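
Tracking transmission with WGS, as in the outbreak and hospital investigations above, often begins by asking how similar the genomes of different isolates are. As a hedged illustration, the Python sketch below computes pairwise percent identity between sequences that have already been aligned to equal length, skipping alignment gaps ('-') and ambiguous bases ('N'). The isolate names and sequences are hypothetical, and real outbreak studies go further, using phylogenetic methods that model substitution processes rather than raw identity.

```python
from itertools import combinations

SKIP = set("-N")  # alignment gaps and ambiguous bases are not compared


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over alignment columns where both sequences have a called base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in SKIP or b in SKIP:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def identity_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of aligned sequences."""
    return {(x, y): percent_identity(alignment[x], alignment[y])
            for x, y in combinations(sorted(alignment), 2)}


# Hypothetical aligned consensus fragments from three isolates.
alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAC",
    "isolate_3": "ACGAACG-AC",
}
for pair, pct in identity_matrix(alignment).items():
    print(pair, round(pct, 2))
# ('isolate_1', 'isolate_2') 100.0
# ('isolate_1', 'isolate_3') 88.89
# ('isolate_2', 'isolate_3') 88.89
```

Skipping gap columns, rather than counting them as mismatches, is one of several reasonable conventions; it keeps a single insertion or deletion from dominating the comparison of otherwise similar genomes.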

1.5 Bioinformatics in clinical and public health laboratories

With decreasing costs and increasing throughput, the benefits of integrating HTS into clinical and public health laboratories are likely to grow. Expanding this integration will challenge laboratories to develop effective bioinformatics tools and sufficient computational power for timely analysis (71). Progress in HTS workflows has decreased the turnaround time from sample to sequence to about 8 hours (72). To use sequence data to affect direct patient care or improve public health responses, the data must also be analyzed in a suitable timeframe. Rapid analysis of sequence data in turn requires computational tools, algorithms, and pipelines that are fast, sensitive, accurate, and capable of handling large amounts of data (72, 73). Most pipelines that process sequence data rely on computational removal of sequence reads obtained from the host (contaminant sequences) to enrich for sequences from the pathogens of interest (74). Aligning massive amounts of sequence data to a reference genome for this contaminant removal can take days, depending on the amount of sequence data and the computational power available. Once the contaminating reads are removed, the remaining sequences are either assembled de novo into longer contiguous sequences (contigs) or compared directly against the comprehensive sequence databases of the National Center for Biotechnology Information (NCBI) using tools such as BLAST (72–75). Analysis of samples containing multiple contaminants can be complicated, and co-infection with multiple pathogens can lead to ambiguous findings. Several commercial and non-commercial computational pipelines have attempted to overcome these limitations (71–73, 76). These pipelines have their own strengths and weaknesses, and are often tailored to a particular sequencing platform, read length, or sequencing protocol (73). Standardizing HTS across laboratories will also require guidelines from bodies such as CAP and regulations such as the Clinical Laboratory Improvement Amendments (CLIA). Such guidelines are necessary to provide a framework for computational pipelines and to ensure accurate and safe patient testing (73). Widespread use of HTS in clinical and public health laboratories will require computational biologists (bioinformaticians) with working knowledge of different software and pipelines. There has been a push toward software packages that require minimal computational training and generate results that laboratory technicians and clinicians can easily interpret (77). Nonetheless, laboratories should adopt these tools only after making significant investments in the education and training of their personnel. To keep pace with ever-changing technology, laboratories should also prepare to process unique and non-standard data sets.
Currently, most HTS relies on short-read sequencing platforms such as Illumina. Long-read sequencing technologies being developed by companies such as PacBio and Oxford Nanopore are changing the sequencing landscape (71, 78, 79). Understanding the limitations of currently available tools, adapting them for new purposes, and developing new computational tools will be pivotal in meeting these challenges. Skilled computational biologists in clinical and public health laboratories will become indispensable as HTS technologies become standard in laboratories.
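
The host-read removal step described earlier in this section is usually done by aligning reads to a host reference genome. The Python sketch below swaps in a simpler technique, exact k-mer matching, purely to illustrate the idea: it indexes every k-mer of the host sequence in canonical form (so reads from either strand match) and discards reads in which more than half of the k-mers are found in that index. The function names, the toy k-mer size, and the 50% cutoff are illustrative assumptions; production tools use far more sophisticated indexes and scoring.

```python
# Complement table for reverse-complementing DNA (N maps to N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """All k-mers of seq, each stored as the lexicographic minimum of itself and its reverse complement."""
    seq = seq.upper()
    result = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rev_comp = kmer.translate(_COMPLEMENT)[::-1]
        result.add(min(kmer, rev_comp))
    return result


def build_host_index(host_sequences: list[str], k: int) -> set[str]:
    """Collect the canonical k-mers of every host sequence into one set."""
    index: set[str] = set()
    for seq in host_sequences:
        index |= canonical_kmers(seq, k)
    return index


def remove_host_reads(reads: list[str], host_index: set[str], k: int,
                      max_host_fraction: float = 0.5) -> list[str]:
    """Keep reads whose fraction of host-matching k-mers is at most max_host_fraction."""
    kept = []
    for read in reads:
        read_kmers = canonical_kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: cannot be classified, so it is dropped here
        host_fraction = len(read_kmers & host_index) / len(read_kmers)
        if host_fraction <= max_host_fraction:
            kept.append(read)
    return kept


# Toy example with k = 5 (real tools use much longer k-mers and full host genomes).
host = "ATGCGTACGTTAGCCGATAGGCTA"
index = build_host_index([host], k=5)
reads = ["GCGTACGTTAGC", "GCCTATCGGCTA", "AAACCCGGGTTT"]
print(remove_host_reads(reads, index, k=5))  # ['AAACCCGGGTTT']
```

In the toy example, the first read is a substring of the host sequence and the second is the reverse complement of another host substring, so both are removed; only the third read, which shares no k-mers with the host, is kept for downstream assembly or database comparison.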

1.6 Forward and reverse genetics approaches (Comparative genomics)

Major public health challenges such as drug resistance and the emergence of zoonotic diseases can often be explained by genetic variation in a pathogen. Comparing the genomes of strains with acquired characteristics, such as drug resistance and/or an expanded host range, to those of strains that lack these characteristics can reveal the genetic basis of these traits. Reverse genetics approaches, in which deletions or mutations are engineered into the pathogen genome and the phenotypic effects are studied (80), have laid the foundation for linking specific genes to observable phenotypes in pathogens. Reverse genetics approaches have traditionally studied the phenotypic effects of a single gene: mutants are engineered either to remove a gene of interest or to insert the gene of interest into another pathogen genome through homologous recombination (80–82). These studies are extremely valuable in elucidating the impact of individual genes on a phenotype of interest. Nevertheless, reverse genetics approaches are limited in their ability to determine the basis of polygenic phenotypes. Engineering mutations in multiple genes in a single genome is an arduous process, and successfully engineered mutants may carry unintended mutations in regions upstream and downstream of the genes of interest. Moreover, not all phenotypes in pathogens arise from an entire gene being acquired or deleted. Traits such as drug resistance, increased virulence, or an expanded host range may result from a few nucleotide changes in specific genes.
For example, a single point mutation in the herpesvirus polymerase gene UL30 was shown to be responsible for higher neuropathogenicity (83); a single point mutation in the NS1-related protein of Japanese encephalitis virus was shown to determine its neurovirulence and neuroinvasiveness (84); a single mutation in the Ebola virus glycoprotein was shown to confer adaptation to human cells, leading to greater spread of the virus during the 2014 outbreak (85, 86); and point mutations have also been shown to cause antibiotic resistance in several bacterial species (87, 88). With the advent of HTS, the techniques used to link genotypes to phenotypes have undergone a major transformation. HTS has allowed researchers to obtain full-length genomes of pathogens and to detect genetic differences between strains across the entire genome. The arrival of HTS has also greatly expanded the field of forward genetics, in which the genes underlying distinct phenotypic differences are sought (89). Forward genetics allows researchers to use unbiased approaches to identify the genes responsible for a biological phenomenon. Quantitative trait locus (QTL) analysis and genome-wide association studies (GWAS) are among the most advanced approaches for studying polygenic traits. Although these approaches were initially developed to study human genetics and disease, they have found widespread use in studies of host-pathogen interactions (27, 28, 90, 91). QTL analysis relies on generating offspring from parental strains that differ in the phenotype and genotype of interest. The offspring of the cross are then scored for phenotype and genotype, and the genotypes most closely associated with the phenotype in the offspring are considered to be linked to it (92). QTL analyses are useful for identifying the genetic regions that influence a trait, but they often lack the resolution to identify the specific SNPs associated with it (93). Unlike QTL analyses, GWAS rely on associations between SNPs and phenotypes observed across a large number of genetically unrelated individuals, which makes them more powerful than QTL analyses for linking specific SNPs to phenotypes. However, genetic information drawn from a genetically homogeneous population can obscure true variants and produce false associations. Both approaches can generate accurate genotype-phenotype associations; therefore, a comprehensive understanding of their strengths and limitations is necessary to apply them appropriately (93).
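To make the association step concrete, below is a minimal sketch of a single-SNP allelic test of the kind a basic GWAS repeats at every variant site. It assumes haploid pathogen isolates, so each isolate contributes one allele. The counts, SNP names, and the function name `allelic_chi_square` are hypothetical. The p-value relies on the identity that a chi-square statistic with 1 degree of freedom is the square of a standard normal variable, so no statistics library is needed. Multiple testing is handled with a simple Bonferroni threshold. The sketch does not correct for population structure, which is exactly the source of the false associations described above.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table.

    Rows: isolates with / without the phenotype of interest.
    Columns: isolates carrying the alternate / reference allele at one SNP.
    Returns (chi2 statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        return 0.0, 1.0  # monomorphic site or empty group: nothing to test
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts per SNP: (case_alt, case_ref, ctrl_alt, ctrl_ref).
snps = {
    "snp_1": (30, 10, 10, 30),
    "snp_2": (20, 20, 19, 21),
    "snp_3": (12, 28, 14, 26),
}
# Bonferroni correction: divide the significance level by the number of tests.
threshold = 0.05 / len(snps)
hits = [name for name, counts in snps.items()
        if allelic_chi_square(*counts)[1] < threshold]
print(hits)  # prints ['snp_1']
```

Real GWAS pipelines add covariates or mixed models to account for relatedness between isolates. The plain chi-square test is shown here only because it isolates the core genotype-phenotype comparison.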

1.7 Genetic variation in RNA vs. DNA viruses

In spite of the progress made in medical sciences, viral diseases remain one of the major public health concerns worldwide. Viruses can cause local or systemic, acute or chronic, and symptomatic or asymptomatic infections (1). The ubiquity of viruses has been attributed to the de novo genetic variation present in viral populations, which allows them to adapt to diverse environments. Genetic variation in viral populations can arise through different mechanisms. Mutation, recombination, and random genetic drift, among others, are major contributors to the observed genetic diversity of viruses (94). Viruses can use one or more of these mechanisms to evolve under different selective pressures. RNA viruses have been observed to have more genetically diverse populations than DNA viruses (94–96). The difference in mutation rate between RNA and DNA viruses is regarded as the chief cause of this disparity. Studies estimating viral mutation rates have shown that RNA viruses have mutation rates in the range of 10⁻⁶ to 10⁻⁴ substitutions per nucleotide per cell infection (s/n/c), whereas DNA viruses have mutation rates in the range of 10⁻⁸ to 10⁻⁶ s/n/c (94, 97). Biologically, this difference is attributed to the low fidelity of the RNA-dependent RNA polymerase (RdRP) of RNA viruses compared with the DNA polymerases of DNA viruses (94). The RdRP lacks 3’ exonuclease proofreading activity, which makes replication of RNA genomes more error-prone than replication of DNA genomes (98). The reverse transcriptase of retroviruses has been shown to be as error-prone as RdRPs, making the mutation rate of retroviruses comparable to that of RNA viruses (99). RNA virus genomes range from ~3 to ~33 kb, a length close to the inverse of their mutation rates, which suggests that RNA virus genomes are rarely copied without error. However, genome replication is not the only source of variation in RNA viruses. In viruses with segmented RNA genomes, such as influenza virus, genetic variation can also arise through reassortment of genome segments (100). This ability to exchange segments between co-replicating genomes has allowed influenza viruses to cross species barriers and cause pandemics. In DNA viruses, genome replication is less error-prone because DNA polymerases can proofread and correct errors. However, even at a low error frequency, the ability of viruses to replicate to high numbers can lead to the accumulation of mutations in viral populations over time (101). 
Variation in DNA viruses has also been attributed to insertions, deletions, and frameshifts in the genome, as well as to recombination between different strains (4). Re-infection with the same virus multiple times during the host’s lifetime has also been associated with genetic variation in DNA viruses through homologous recombination (102). Less is known about genetic variation in DNA viruses, and its contribution to disease phenotype and pathogen evolution, than about RNA viruses. Recent studies have suggested that, contrary to accepted knowledge, populations of DNA viruses can be highly polymorphic (102–105). These studies have shown that DNA viruses can evolve at rates closer to those of RNA viruses. Canine parvovirus (106), human parvovirus (107), African swine fever virus (108), and human cytomegalovirus (102) have all been shown to evolve on a much shorter timescale than previously predicted for DNA viruses. These findings have been enabled mainly by the advent of HTS approaches. HTS provides an exciting alternative to traditional labor-intensive techniques such as limiting dilution, RFLP analysis, and Sanger sequencing, and it has allowed researchers to design experiments that study the genetic diversity of viruses at the population scale.
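The mutation-rate comparison above can be turned into simple arithmetic. The expected number of new mutations per genome per cell infection is the per-nucleotide rate μ multiplied by the genome length L. Under a Poisson approximation, the probability that a copy carries no new mutation is e^(−μL). The sketch below uses illustrative values chosen from the ranges quoted in the text: a 10 kb RNA genome at 10⁻⁴ s/n/c, and a hypothetical 150 kb DNA genome at 10⁻⁸ s/n/c. These are not measurements for any specific virus.

```python
import math


def expected_mutations(rate_per_nt, genome_length_nt):
    """Expected new mutations per genome per cell infection (mu * L)."""
    return rate_per_nt * genome_length_nt


def prob_mutation_free(rate_per_nt, genome_length_nt):
    """Poisson probability that a genome copy carries zero new mutations."""
    return math.exp(-expected_mutations(rate_per_nt, genome_length_nt))


# Illustrative values within the ranges quoted in the text (assumptions).
examples = {
    "RNA virus, 10 kb at 1e-4 s/n/c": (1e-4, 10_000),
    "DNA virus, 150 kb at 1e-8 s/n/c": (1e-8, 150_000),
}
for label, (mu, length) in examples.items():
    print(f"{label}: {expected_mutations(mu, length):.4f} expected mutations, "
          f"P(no new mutation) = {prob_mutation_free(mu, length):.3f}")
```

With these numbers, roughly one new mutation is expected per RNA genome copy, compared with about 0.0015 per DNA genome copy. This is why large population sizes are needed for low per-copy error rates to add up to substantial diversity in DNA viruses.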

1.8 Herpesvirus genetic variation and diversity

The Herpesviridae family comprises large double-stranded DNA viruses, and viral populations of certain members of the family have been shown to have genetic variation approaching that of RNA viruses (102). The Herpesviridae family consists of over 200 viruses found in most animal species (109). Herpesviruses have been divided into alpha, beta, and gamma subfamilies based on their host range, the duration of their replication cycle, and their genome organization (109). Herpesvirus virions are 200-250 nm in diameter and contain a linear double-stranded DNA genome of 125-295 kilobases (kb). The DNA genome is encapsidated in an icosahedral protein capsid 125 nm in diameter. The capsid (also referred to as the nucleocapsid) is surrounded by a layer of multiple viral proteins called the tegument. The tegument is in turn wrapped by a lipid bilayer containing several viral glycoproteins (109, 110). A unifying feature of herpesviruses is their ability to establish a lifelong quiescent (latent) infection in their host (109, 111). During the latent phase, no infectious virus is produced and transcription is limited to a few latency-associated transcripts (111). This feature is a critical survival strategy, since latently infected cells serve as reservoirs for periodic reactivation and spread to new hosts. Nine herpesviruses are known to infect humans. These include herpes simplex viruses 1 and 2 (HSV-1/2 or HHV-1/2) and varicella zoster virus (VZV or HHV-3) of the alpha subfamily; human cytomegalovirus (HCMV or HHV-5) and human herpesviruses 6A, 6B, and 7 (HHV-6A/6B/7) of the beta subfamily; and Epstein-Barr virus (EBV or HHV-4) and Kaposi’s sarcoma-associated herpesvirus (KSHV or HHV-8) of the gamma subfamily (109). In addition to the human pathogens, veterinary herpesviruses are a major group of pathogens of domestic and food-production animals (112). The linear double-stranded DNA genome of herpesviruses is made up of unique and repeated sequences (Figure 2-5). Based on the position of the unique and repeated sequences, six genome types (classes A through F) have been described. Most open reading frames (ORFs) in herpesvirus genomes are found in the unique regions (109, 113, 114). The genome organization of herpesviruses often correlates with subfamily; however, there are exceptions. For instance, both HSV-1 and HCMV have a class E genome organization, even though HSV-1 belongs to the alpha subfamily and HCMV to the beta subfamily (109, 115). The G+C content of herpesvirus genomes is another distinguishing feature and can vary widely between herpesviruses. For example, the genome of canine herpesvirus has a G+C content of 32%, while that of pseudorabies virus has a G+C content of 74%. Repeated regions in herpesvirus genomes have been shown to have a higher G+C content than the unique regions (115). Full-length genomes of herpesviruses that infect humans have now been available for over three decades (116–118). For nearly two decades after these first genomes were published, our understanding of the genetic basis of herpesvirus pathogenesis was based on the full-length sequences of just one or a few isolates. In the past decade, HTS technologies have allowed researchers to sequence many more isolates of each herpesvirus species, giving us a first glimpse of the genetic diversity present in different herpesvirus species (4, 119–122). A common perception among researchers is that the double-stranded DNA genomes of herpesviruses are inherently stable and that herpesvirus isolates do not harbor extensive genetic diversity (94, 97). 
This perception stems from the high-fidelity polymerase of herpesviruses, compared with the error-prone polymerases of RNA viruses discussed earlier. The mutation rate of HSV-1 has been estimated to be around 10⁻⁸ s/n/c (123, 124). However, these studies do not take into account the standing genetic variation in the viral population, which accumulates through processes other than polymerase error, such as genetic drift and recombination. Studies of the betaherpesvirus HCMV have estimated the standing variation in viral populations to be much higher than previously anticipated (120). Similar studies of genome-wide variation in HSV-1 and in the related alphaherpesviruses Marek’s disease virus (MDV) and pseudorabies virus (PRV) have also revealed the presence of multiple genotypes in viral populations (4, 111, 125, 126). Recombination can be a strong evolutionary force in DNA virus genomes such as those of herpesviruses. Several studies of human and non-human herpesviruses have demonstrated recombination in both laboratory and natural settings (127–134). HTS has provided new avenues for studying recombination in herpesviruses. For example, full-length genomes from disparate naturally occurring isolates have allowed researchers to use phylogeny-based approaches to infer ancient sites of recombination (135–138). Recombination between attenuated herpesvirus isolates in natural and laboratory settings has been shown to result in the emergence of highly pathogenic strains (131, 139). Recombination studies of HSV-1 have shown that crossover sites are concentrated in repeated and intergenic regions of the genome (140). Similarly, studies of HCMV have shown that the viral genome contains segments that are more likely to co-segregate during recombination, indicating linkage or shared functionality between the genes in these co-segregating segments (134). The study of recombination in herpesviruses may be further bolstered by long-read sequencing, which could resolve recombination breakpoints and co-segregating sites across the genome.
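Standing variation of the kind described above is usually quantified from deep-sequencing read counts at each genome position. The sketch below assumes that per-position base counts have already been tallied from aligned reads. The depth and frequency cutoffs (100 reads, 2%) and the function name `minor_variants` are hypothetical placeholders. Production variant callers also filter on base quality and strand bias, and this sketch omits both.

```python
def minor_variants(counts_by_pos, min_depth=100, min_freq=0.02):
    """Flag positions where the second most common base passes min_freq.

    counts_by_pos maps a genome position to a dict of base -> read count.
    Returns position -> (major base, minor base, minor allele frequency).
    """
    calls = {}
    for pos, counts in sorted(counts_by_pos.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate frequencies reliably
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2 or ranked[1][1] == 0:
            continue  # position is monomorphic in this sample
        minor_base, minor_count = ranked[1]
        freq = minor_count / depth
        if freq >= min_freq:
            calls[pos] = (ranked[0][0], minor_base, round(freq, 4))
    return calls


# Hypothetical read counts at three positions of a viral genome.
pileup = {
    1001: {"A": 950, "G": 50},  # 5% minor allele: reported
    2002: {"C": 995, "T": 5},   # 0.5% minor allele: below the frequency cutoff
    3003: {"G": 40, "A": 10},   # only 50 reads: below the depth cutoff
}
print(minor_variants(pileup))  # prints {1001: ('A', 'G', 0.05)}
```

The frequency cutoff trades sensitivity for protection against sequencing error. In practice it is usually set above the empirically estimated error rate of the sequencing run.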

1.9 Alphaherpesviruses

My doctoral thesis focuses on the genetic diversity present in alphaherpesviruses and its contribution to pathogen evolution and disease phenotype. Alphaherpesviruses are characterized by a variable host range, a rapid and efficient infection cycle in cell culture, and the ability to establish latent infection in neurons, primarily in the sensory ganglia (109). The alpha subfamily consists of the genera Mardivirus, Iltovirus, Simplexvirus, and Varicellovirus. Mardiviruses and Iltoviruses cause disease in avian hosts, while the Simplexvirus and Varicellovirus genera include the alphaherpesviruses that cause disease in humans (109). Three alphaherpesviruses are known to infect humans: herpes simplex viruses 1 and 2 (HSV-1/2), which cause oral and genital lesions, and varicella zoster virus (VZV), which causes chickenpox and shingles (109, 111). In the poultry industry, a related alphaherpesvirus, Marek’s disease virus serotype 1 (MDV-1), causes lymphomas, immunosuppression, and polyneuritis, with annual economic losses of $1-2 billion (141, 142). The alphaherpesviruses studied in this thesis, HSV-1 and MDV-1, have a class E genome organization, which consists of a unique long (UL) and a unique short (US) region. Each unique region is bordered by its own repeat regions, arranged in opposite orientations. The terminal repeats (terminal repeat long [TRL] and terminal repeat short [TRS]) lie at the extremities of the genome, while the two internal repeats (internal repeat long [IRL] and internal repeat short [IRS]) abut each other (4, 109, 143, 144).
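The class E layout described above can be illustrated with a toy sequence model. The sketch assumes short made-up segment sequences and treats each internal repeat as the exact reverse complement of the terminal repeat at the other end of the same unique region. Real genomes contain additional features and sequence heterogeneity that are ignored here. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def class_e_segments(trl, ul, trs, us):
    """Return the six class E segments in genomic order.

    TRL-UL-IRL-IRS-US-TRS, where each internal repeat is the inverted copy
    (reverse complement) of the terminal repeat flanking the same unique region.
    """
    return [
        ("TRL", trl),
        ("UL", ul),
        ("IRL", revcomp(trl)),
        ("IRS", revcomp(trs)),
        ("US", us),
        ("TRS", trs),
    ]


def segment_coordinates(segments):
    """1-based inclusive start/end coordinates for each named segment."""
    coords, start = {}, 1
    for name, seq in segments:
        coords[name] = (start, start + len(seq) - 1)
        start += len(seq)
    return coords


# Toy sequences, far shorter than real repeat and unique regions.
segs = class_e_segments(trl="AACG", ul="GGGGTTTT", trs="TTGC", us="CCCAAA")
genome = "".join(seq for _, seq in segs)
print(genome)  # prints AACGGGGGTTTTCGTTGCAACCCAAATTGC
print(segment_coordinates(segs))
```

Representing the genome as named segments makes it straightforward to see that IRL and IRS meet in the middle of the genome, as described in the text, and that each repeat pair reads identically on opposite strands.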

1.10 Marek’s disease virus serotype 1 (MDV-1)

MDV-1, or Gallid herpesvirus type 2 (GaHV-2), is the causative agent of Marek’s disease (MD) in chickens. It is an alphaherpesvirus of the genus Mardivirus, which also includes the closely related non-oncogenic Marek’s disease virus serotype 2 (MDV-2) and turkey herpesvirus serotype 1 (HVT-1) (145, 146). MDV-1 is airborne and extremely contagious (147), and transmission between chickens is horizontal. Highly virulent strains of MDV-1 can have mortality rates approaching 100% (148). Mature virions are formed in the feather follicle epithelium of infected chickens, from which the virus is shed in association with fine particles of skin and feather debris (142). This debris, also called poultry dust or dander, is the primary source of virus transmission between birds (149). The poultry dust is inhaled and comes into contact with alveolar macrophages in the lungs (147). The macrophages then transport the virions to T and B cells (142, 147). MDV-1 establishes latency in T cells, which is relatively unusual among alphaherpesviruses (142, 147). This site of latency allows the virus to spread systemically, causing disease in the nervous system as well as lymphomas (142, 147). Since the late 1960s, MD has been controlled via mass vaccination of one-day-old chicks or unhatched chick embryos (113, 150). All three serotypes, MDV-1, MDV-2, and HVT-1, have been used to make modified live vaccines that are employed either singly or in combination (148, 151). The severity of MD has risen over the last 40 years, and, along with changes in farming practices, widespread vaccination has been proposed as one of the main causes of rising MDV virulence (148, 152, 153). Despite clinical and laboratory data demonstrating increased virulence in field isolates of MDV-1, the mechanism by which MDV-1 has evolved into more virulent forms is not well understood. Understanding MDV-1 evolution in the field may allow us to predict the future evolution of this pathogen or to apply precautionary measures to limit or prevent outbreaks. A few genes, such as Meq, UL36, and ICP4, have been associated with MDV-1 pathogenicity; however, very little is known about their interaction partners or whether these are the only genes under selection pressure (151). Remarkably, our understanding of MDV-1 genomics and genetic variation comes exclusively from the study of 10 different laboratory-grown strains (154–161). MDV-1 isolates tend to lose virulence with increasing passage number in vitro (150, 162), raising concerns about the ability of these cultured strains to accurately reflect the virus as it is found in the field. The genetic basis for the increased virulence of MDV field isolates may be better explained by viral genomes obtained without growing the virus in cell culture. Hence, Chapter 2 of this thesis describes a method to rapidly and directly sequence field isolates of MDV without culture. This procedure could be valuable for understanding the genetic basis of rising virulence in circulating wild strains. HTS approaches can reveal the genetic variation in MDV-1 populations in the field by identifying genetic markers of virulence, detecting loci under selection pressure, and evaluating the extent of standing variation in a population. MDV-1 has proven to be a valuable model for studying virus-host interactions in a natural system. Understanding genetic variation in populations of MDV-1 may help us predict the extent of genetic variation in human herpesviruses and lead to the creation of new or improved therapeutics. 
In combination with statistical models, identification of genetic markers of virulence can help us track the spread of MDV-1 in and between different chicken farms. Epidemiological insights gained from these findings can be of great value in modeling the dynamics of disease spread in human and animal populations.
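Tracking spread with genetic markers ultimately rests on comparing genomes between isolates. The sketch below assumes that consensus genomes have already been aligned to equal length, and computes pairwise SNP distances between isolates. Positions with an ambiguous base (N) or an alignment gap (-) in either sequence are skipped. The isolate names and sequence fragments are hypothetical. A real analysis would use full-length alignments and would typically feed these distances into phylogenetic or transmission models.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count aligned positions where two sequences differ.

    Positions holding an ignored character in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def distance_matrix(genomes):
    """Pairwise SNP distances for a dict of isolate name -> aligned sequence."""
    return {
        (p, q): snp_distance(genomes[p], genomes[q])
        for p, q in combinations(sorted(genomes), 2)
    }


# Hypothetical aligned fragments from three isolates on two farms.
isolates = {
    "farmA_1": "ACGTACGTAC",
    "farmA_2": "ACGTACGAAC",
    "farmB_1": "TCGTNCGAAC",
}
for pair, dist in distance_matrix(isolates).items():
    print(pair, dist)
```

Skipping ambiguous positions avoids counting low-coverage regions as differences. The cost is that isolates with many Ns can appear more similar than they really are.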

1.11 Herpes simplex virus 1 (HSV-1)

With about 5 billion people infected worldwide, herpes simplex virus 1 (HSV-1) is the most widespread pathogen of the human herpesvirus family (119, 163). HSV-1 is an alphaherpesvirus of the genus Simplexvirus. HSV-1 infection is a major public health concern worldwide and is a leading cause of sporadic necrotizing encephalitis and infectious blindness in the US (163). HSV-2, which has 70% genomic similarity to HSV-1, is associated with higher HIV/AIDS acquisition in developing countries (119). HSV-1-induced genital infections are also on the rise (164). Primary HSV infection begins at an epithelial surface and may be symptomatic or asymptomatic, depending on the immunological status of the host and other factors (165). The primary infection leads to infection of the sensory or sympathetic neurons innervating the epithelial surface. The virus is then transported to the peripheral nervous system, where latency is established. The exact mechanism of transmission from epithelial to neuronal cells is poorly understood (166). Upon reactivation, the virus can travel back to the epithelial surface via nerve tracts to cause lesions (109). The disease manifestations of HSV differ between hosts. Studies in animal models have shown that genetic variation between viral strains of HSV-1 contributes greatly to their pathology, including lesion severity and reactivation from latency (125, 167–169). Recent comparative genomics studies of HSV-1 have demonstrated that independent isolates can vary in 2-4% of the genome (4, 119, 170). In fact, HSV-1 has been shown to harbor more genetic variation between independent isolates than the other human alphaherpesviruses, HSV-2 and VZV (4, 135, 171). Studying clinical isolates of pathogens is indispensable for understanding disease pathology. Lab strains of HSV are useful for studying aspects such as the biology and mechanisms of viral replication, while clinical samples can tell us about the infectivity of the virus. Combined with clinical data, low-passage clinical isolates can reveal valuable information about transmission and disease progression. 
HTS of clinical isolates manifesting varying degrees of pathogenicity can provide insights into the genetic markers that determine the virulence of a particular strain. Strategies to develop new therapeutics and vaccines must also account for the genetic diversity present in circulating strains of HSV. It is not feasible to perform controlled experiments in human subjects to determine the genetic basis of HSV-1 virulence. Likewise, virulence-associated genes are often dispensable for the growth of the virus in cell culture. Introduction of the virus into cell culture can change its genetic background through processes such as gene deletion and duplication (162, 172, 173). The limitations of cell culture make animal models an invaluable tool for studying HSV-1 pathogenesis. In combination with genome-wide HTS, animal models can help identify virulence-associated variations that are present in highly virulent strains but absent from less virulent isolates. These findings can then be verified using traditional laboratory techniques such as PCR, Sanger sequencing, Western blots, and reverse genetic engineering. These approaches can also be used to study other phenomena, such as viral transmission and host immune responses to isolates of interest. The idea of using genetic variation to determine disease severity in herpesviruses has been pursued for many years using lab strains and animal models. However, relatively little is known about how viral variation correlates with disease outcome during natural infections of humans and animals. Chapter 3 of this thesis describes a study that uses HTS and animal models to characterize two HSV-1 isolates from a father and son pair. The study aims to understand whether, and to what extent, HSV-1 accumulates genetic and phenotypic changes during transmission and subsequent cycles of latency and reactivation in each host. By combining traditional molecular biology techniques with state-of-the-art HTS approaches, my thesis aims to contribute to our understanding of a pathogen that has been recognized since the era of Hippocrates (109).
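The comparison described above, which looks for variations present in highly virulent strains but absent from less virulent isolates, reduces to simple set operations once variants have been called against a common reference. The sketch below assumes that each strain's variants are represented as (position, reference base, alternate base) tuples. The strain data and the function name are hypothetical. Any candidates found this way would still need confirmation by the traditional techniques listed in the text.

```python
def virulence_candidates(high_virulence, low_virulence):
    """Variants shared by every highly virulent strain and absent from all others.

    Each argument is a list of variant sets, one set per strain; a variant is
    a (position, ref_base, alt_base) tuple relative to a shared reference.
    """
    if not high_virulence:
        return set()
    shared = set.intersection(*(set(v) for v in high_virulence))
    seen_in_low = set().union(*(set(v) for v in low_virulence))
    return shared - seen_in_low


# Hypothetical variant calls for two virulent and two less virulent isolates.
virulent = [
    {(1200, "A", "G"), (5400, "C", "T"), (9100, "G", "A")},
    {(1200, "A", "G"), (9100, "G", "A"), (15000, "T", "C")},
]
attenuated = [
    {(9100, "G", "A")},
    {(5400, "C", "T")},
]
print(sorted(virulence_candidates(virulent, attenuated)))  # prints [(1200, 'A', 'G')]
```

Requiring a variant to be present in every virulent strain is deliberately strict. With polygenic or convergent virulence, a frequency-based or association-based criterion would be more appropriate.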


Chapter 2

DNA from dust: comparative genomics of large DNA viruses in field surveillance samples

Utsav Pandey1, Andrew S. Bell2, Daniel Renner1, David A. Kennedy2, Jacob Shreve1, Chris L. Cairns2, Matthew J. Jones2, Patricia A. Dunn3, Andrew F. Read2, Moriah L. Szpara1

1Department of Biochemistry and Molecular Biology, Center for Infectious Disease Dynamics, and the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, 16802, USA

2 Center for Infectious Disease Dynamics, Departments of Biology and Entomology, Pennsylvania State University, University Park, Pennsylvania 16802, USA

3 Department of Veterinary and Biomedical Sciences, Pennsylvania State University, University Park, Pennsylvania, 16802, USA

Adapted from: DNA from dust: Comparative genomics of large DNA viruses in field surveillance samples. 2016. mSphere. DOI: 10.1128/mSphere.00132-16

Acknowledgements: U.P. and A.B. isolated viral DNA and prepared viruses for sequencing. D.R. and J.S. assembled the full-length genomes. U.P. and D.R. performed computational comparisons. M.J., C.C., and P.D. collected dust and feather samples. M.S. and A.R. conceptualized the work. M.S., U.P., and A.B. wrote the manuscript; all authors contributed to its completion.


2.1 Abstract

The intensification of the poultry industry over the last sixty years facilitated the evolution of increased virulence and vaccine breaks in Marek’s disease virus (MDV). Full genome sequences are essential for understanding why and how this evolution occurred, but what is known about genome-wide variation in MDV comes from laboratory culture. To rectify this, we developed methods for obtaining high-quality genome sequences directly from field samples without the need for sequence-based enrichment strategies prior to sequencing. We found that viral genomes from adjacent field sites had high levels of overall DNA identity and, despite strong evidence of purifying selection, had coding variations in proteins associated with virulence and manipulation of host immunity. Our methods empower ecological field surveillance, make it possible to determine the basis of viral virulence and vaccine breaks, and can be used to obtain full genomes from clinical samples of other large DNA viruses, known and unknown.


2.2 Importance

Despite both clinical and laboratory data that show increased virulence in field isolates of MDV-1 over the last half century, we do not yet understand the genetic basis of its pathogenicity. Our knowledge of genome-wide variation between strains of this virus comes exclusively from isolates that have been cultured in the laboratory. MDV-1 isolates tend to lose virulence during repeated cycles of replication in the laboratory, raising concerns about the ability of cultured isolates to accurately reflect virus in the field. The ability to directly sequence and compare field isolates of this virus is critical to understanding the genetic basis of rising virulence in the wild. Our approaches remove the prior requirement for cell culture, and allow direct measurement of viral genomic variation within and between hosts, over time, and during adaptation to changing conditions.


2.3 Introduction

Marek’s disease virus (MDV), a large DNA alphaherpesvirus of poultry, became increasingly virulent over the second half of the 20th century, evolving from a virus that caused relatively mild disease to one that can kill unvaccinated hosts lacking maternal antibodies in as little as ten days (174–178). Today, mass immunization with live-attenuated vaccines helps to control production losses, which are mainly associated with immunosuppression and condemnation of carcasses (177, 179). Almost 9 billion broiler chickens are vaccinated against MD each year in the US alone (180). MD vaccines prevent host animals from developing disease symptoms, but they do not prevent them from becoming infected, nor do they prevent transmission of the virus (179, 181). Perhaps because of this, these vaccines may have created conditions favoring the evolutionary emergence of the hyperpathogenic strains that dominate the poultry industry today (178). Certainly, virus evolution undermined two generations of MD vaccines (174–177). However, the genetics underlying MDV-1 evolution into more virulent forms and vaccine breaks are not well understood (177, 182). Likewise, the nature of the vaccine-break lesions that can result from human immunization with live-attenuated varicella zoster virus (VZV) vaccine is an area of active study (183–186). Remarkably, our understanding of MDV-1 (genus: Mardivirus, species: Gallid alphaherpesvirus type 2) genomics and genetic variation comes exclusively from the study of 10 different laboratory-grown strains (161, 187, 160, 155, 188, 159, 158, 156). Most herpesviruses share this limitation: the large genome size and the need for high-titer samples have led to a preponderance of genome studies on cultured virus rather than on clinical or field samples (189–192, 111, 4, 171). 
Repeated observations of the loss of virulence during serial passage of MDV-1 and other herpesviruses raise concerns about the ability of cultured strains to accurately reflect the genetic basis of virulence in wild populations of virus (192, 167, 193, 194). The ability to capture and sequence viral genomes directly from host infections and sites of transmission is the necessary first step toward revealing when and where variations associated with vaccine breaks arise, which of them spread into future host generations, and toward understanding the evolutionary genetics of virulence and vaccine failure.


Recent high-throughput sequencing (HTS) applications have demonstrated that herpesvirus genomes can be captured from human clinical samples using genome enrichment techniques such as oligonucleotide capture and PCR amplicon-based approaches (195–199). Here we present a method for the enrichment and isolation of viral genomes from dust and feather follicles without the use of either of these solution-based enrichment methods. Chickens become infected with MDV by inhaling dust contaminated with virus shed from the feather follicles of infected birds. Deep sequencing of viral DNA from dust and feather follicles enabled us to observe, for the first time, the complete genome of MDV-1 directly from field samples of naturally infected hosts. This revealed variations in both new and known candidates for virulence and modulation of host immunity. These variations were detected both within and between the virus populations at different field sites, and during sequential sampling. One of the new loci potentially associated with virulence, in the viral transactivator ICP4 (MDV084 / MDV100), was tracked using targeted gene surveillance of longitudinal field samples. These findings confirm the genetic flexibility of this large DNA virus in a field setting and demonstrate how HTS and targeted Sanger-based surveillance approaches can be combined to understand viral evolution in the field.

2.4 Materials and Methods

2.4.1 Collection of dust and feathers

Samples were collected from two commercial-scale farms in central Pennsylvania, where each poultry building housed 25,000-30,000 individuals (Figure 2-1).


Figure 2-1: Diagram of samples collected for genome sequencing of field isolates of MDV. Samples collected for genome sequencing were sourced from two Pennsylvania farms with large-scale operations that house approximately 25,000-30,000 individuals per building. These farms were separated by 11 miles. On Farm A, two separate collections of dust were made 11 months apart. On Farm B, we collected one dust sample and individual feathers from several hosts, all at a single point in time. In total, three dust collections and two feathers were used to generate five consensus genomes of MDV field isolates (Table 2-1). (Artwork by Nick Sloff, Penn State University, Department of Entomology).

The poultry on both farms were the same breed and strain of colored (“red”) commercial broiler chicken from the same hatchery and company. Dust samples were collected into 1.5 ml tubes from fan louvers. This location contains less moisture and fewer contaminants than the floor, and samples from it represent a mixture of airborne virus particles and feather dander. Sequential samples from Farm A (Supplemental Table S2-1) were collected 11 months apart from adjacent houses on the same farm (Figure 2-1). Samples from Farm B (Supplemental Table S2-1) were collected from a single house at a single point in time. Feathers were collected just before the hosts were transported from the farms for sale, to maximize the potential for infection and high viral titer. At the time of collection, the animals were 10-12 weeks old. Ten individuals were chosen randomly from throughout one house for feather collection. Two feathers from each animal were collected from the axillary tract (breast feathers). The distal 0.5-1.0 cm of the proximal shaft, or feather tip, which contains the feather pulp, was snipped into a sterile 1.5 ml micro-tube containing a single sterile 5 mm steel bead (Qiagen). On return to the laboratory, tubes were stored at -80°C until processing. One feather from each animal was tested for the presence and quantity of MDV-1 (see below for quantitative PCR details). The remaining feathers from the two animals with the highest apparent MDV-1 titer were used for a more thorough DNA extraction (see below for details) and next-generation sequencing. Animal procedures were approved by the Institutional Animal Care and Use Committee of the Pennsylvania State University (IACUC Protocol#: 46599).
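Selecting the two animals with the highest viral load from qPCR data is commonly done by converting cycle-threshold (Ct) values to copy numbers with a standard curve. The sketch below assumes a standard curve of the form Ct = slope × log10(copies) + intercept. The slope, intercept, bird identifiers, and Ct readings are all hypothetical. The actual assay parameters are those given in the quantitative PCR methods, not the placeholders used here.

```python
def copies_from_ct(ct, slope=-3.32, intercept=38.0):
    """Estimate template copies by inverting Ct = slope * log10(copies) + intercept.

    slope and intercept are hypothetical placeholders; in practice they are
    fitted to a dilution series of a quantified standard.
    """
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by a standard-curve slope: 10**(-1/slope) - 1."""
    return 10 ** (-1 / slope) - 1


def top_by_titer(ct_by_bird, n=2):
    """Return the n birds with the highest estimated viral copy number."""
    return sorted(
        ct_by_bird,
        key=lambda bird: copies_from_ct(ct_by_bird[bird]),
        reverse=True,
    )[:n]


# Hypothetical Ct values, one tested feather per bird; lower Ct = more virus.
cts = {
    "bird01": 31.2, "bird02": 24.8, "bird03": 29.5, "bird04": 22.1,
    "bird05": 33.0, "bird06": 27.4, "bird07": 35.6, "bird08": 30.1,
    "bird09": 26.9, "bird10": 28.3,
}
print(top_by_titer(cts))  # prints ['bird04', 'bird02']
print(round(amplification_efficiency(-3.32), 3))
```

Because the slope is negative, a lower Ct maps to a higher copy number. Ranking by Ct alone would therefore give the same selection. Converting to copies becomes useful when the values need to be reported or compared across plates.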

2.4.2 Viral DNA isolation from dust

MDV nucleocapsids were isolated from dust as indicated in Figure 2-2. Dust collected from poultry houses was stored in 50 ml polypropylene Falcon® tubes (Corning) at 4°C until required. Dust (500 mg) was suspended in 6.5 ml of 1X phosphate-buffered saline (PBS). To distribute the dust particles into solution and help release cell-associated virus, the mixture was vortexed vigorously until homogeneous and centrifuged at 2000 × g for 10 minutes. The supernatant was further agitated on ice for 30 sec using a Sonica Ultrasonic Processor Q125 (probe sonicator with 1/8-inch microtip) set to 20% amplitude. It was then vortexed before being centrifuged for a further 10 minutes at 2000 × g.
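Centrifugation conditions such as 2000 × g are specified as relative centrifugal force (RCF), while many centrifuges are set in revolutions per minute (RPM). The standard conversion is RCF = 1.118 × 10⁻⁵ × r × RPM², where r is the rotor radius in centimeters. The sketch below assumes a hypothetical 15 cm rotor radius. The radius of the centrifuge used in this study is not stated, so the resulting speed is illustrative only.

```python
import math

# (2*pi/60)**2 / g with g = 980.665 cm/s^2, rounded as it is usually quoted.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Speed (RPM) needed to reach a target RCF at a given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


radius = 15.0  # hypothetical rotor radius in cm
speed = rpm_from_rcf(2000, radius)
print(f"{speed:.0f} rpm gives {rcf_from_rpm(speed, radius):.0f} x g at r = {radius} cm")
```

Reporting conditions in × g, as the protocol does, keeps them reproducible across centrifuges with different rotor geometries. The RPM setting must be recalculated for each rotor.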

Figure 2-2. Procedures for enrichment and isolation of MDV DNA from dust. Vortexing, centrifugation and sonication were essential to release cell-associated virus into the solution. The virus-containing supernatant was then passed through 0.8 µm and 0.22 µm filters for removal of larger contaminants. The flow-through was treated with DNase and the viral particles were captured using a 0.1 µm filter. The membrane of the 0.1 µm filter was then excised and used for extraction of the viral DNA.

To enrich viral capsids away from the remaining contaminants, the supernatant (approximately 5 ml in volume) was subjected to a series of filtration steps. First, we used a Corning® surfactant-free cellulose acetate (SFCA) filter (0.8 μm) that had been soaked overnight in fetal bovine serum (FBS) to remove particles at the size of eukaryotic cells and bacteria. To remove smaller contaminants, the flow-through was then passed through a Millipore Express® PLUS Membrane vacuum filter (0.22 μm) and the membrane subsequently washed twice with 2.5 ml of PBS. To remove contaminant DNA, the final filtrate (approximately 10 ml in volume) was treated with DNase (Sigma) at a concentration of 0.1 mg/ml for 30 minutes at room temperature. In the absence of DNase treatment we observed a higher yield of viral DNA, but with much lower purity (data not shown). The MDV nucleocapsids present in the DNase-treated solution were captured on a polyethersulfone (PES) membrane (VWR) filter (0.1 μm). This filter membrane trapped the viral nucleocapsids, which are between 0.1-0.2 µm (200). An increased MDV purity, but ultimately reduced total nanograms of DNA yield, may be achieved by washing this membrane once with 2.5 ml PBS (see Supplemental Table S2-1). In the future, samples with a higher percentage of MDV DNA could be obtained by applying these wash steps to all components of the sample pool. The membrane was then carefully excised using a sterile needle and forceps, and laid, exit side downwards, in a sterile 5 cm diameter plastic petri dish where it was folded twice lengthwise. The “rolled” membrane was then placed into a 2 ml micro-tube containing 1.8 ml of lysis solution (ATL buffer and Proteinase K from the DNeasy® Blood and Tissue kit, Qiagen). Digestion was allowed to proceed at 56°C for 1 hour on an incubating microplate shaker (VWR) set to 1100 rpm. The membrane was then removed, held vertically over a tilted sterile 5 cm diameter plastic petri dish and washed with a small volume of the lysis solution (from the 2 ml micro-tube). This wash was subsequently returned to the 2 ml micro-tube and the tube replaced on the heated shaker, where it was allowed to incubate overnight. The following day, the DNA was isolated as per the manufacturer’s instructions using the DNeasy® Blood and Tissue kit (Qiagen). DNA was eluted in 200 μl DNase-free water. Ten to fourteen aliquots of 500 mg each were used to obtain sufficient DNA for each dust sample (see Supplemental Table S2-1). Quantitative PCR was used to assess the copy number of viral genomes in the resulting DNA. Total yield and percent MDV-1 vs. MDV-2 DNA are listed in Supplemental Table S2-1.

2.4.3 Isolation of viral DNA from feather follicles

The protocol for extraction of MDV DNA from feather follicles was optimized for the smaller input material and an expectation of higher purity (Figure 2-3).

Figure 2-3. Procedures for enrichment and isolation of MDV DNA from individual feather follicles. Procedure for enrichment of MDV DNA using chicken feather follicle as the source of viral DNA. A feather was mechanically disrupted (bead-beating) and treated with trypsin to break open host cells and release cell-free virus into the solution. The sample was then treated with DNase to remove contaminant DNA. Finally, the viral capsids were lysed to obtain viral genomic DNA.

Sequential size filters were not used to filter out contaminants from feather follicles, since these direct host samples have fewer impurities than the environmental samples of dust. However, the feather follicle cells were encased inside the keratinaceous shell of the feather tip, which required disruption to release the cells. Each tube containing a single feather tip and one sterile 5 mm diameter steel bead was allowed to thaw, and then 200 μl of PBS was added and the sample bead-beaten for 30 seconds at 30 Hz using a TissueLyser (Qiagen) (Figure 2-3). Vigorous bead-beating achieved the desired destruction of the follicle tip. To dissociate the cells, 80 μl of 2.5 mg/ml trypsin (Sigma) and 720 μl of PBS were then added (final trypsin concentration: 0.8 mg/ml), and the solution was transferred to a new sterile 2 ml micro-tube and incubated for 2 hours at 37°C on a heated microplate shaker (VWR) set to 700 rpm. To release cell-associated virus, the suspension was then sonicated on ice for 30 seconds using a Sonica Ultrasonic Processor Q125 (probe sonicator with 1/8th inch microtip) set to 50% amplitude. DNase I was added to a final concentration of 0.1 mg/ml and allowed to digest for 1 hour at room temperature to remove non-encapsidated DNA. An equal volume of lysis solution (ATL buffer and Proteinase K from the DNeasy® Blood and Tissue kit, Qiagen) was added and the sample was incubated overnight at 56°C on an incubating microplate shaker (VWR) set to 1100 rpm. The following day, the DNA was isolated as per the manufacturer’s instructions using the DNeasy® Blood and Tissue kit (Qiagen). While the overall amount of DNA obtained from feather follicles was lower than that obtained from pooled dust samples (Supplemental Table S2-1), it was of higher purity and was sufficient to generate libraries for sequencing.

2.4.4 Measurement of total DNA and quantification of viral DNA

The total amount of DNA present in the samples was quantified by fluorescence analysis using a Qubit® fluorescence assay (Invitrogen) following the manufacturer’s recommended protocol. MDV genome copy numbers were determined using serotype-specific quantitative PCR (qPCR) primers and probes, targeting either the MDV-1 pp38 (MDV073; previously known as LORF14a) gene or the MDV-2 (SB-1 strain) DNA polymerase (UL42, MDV055) gene. The MDV-1 assay was designed by Sue Baigent: forward primer (Spp38for) 5’-GAGCTAACCGGAGAGGGAGA-3’; reverse primer (Spp38rev) 5’-CGCATACCGACTTTCGTCAA-3’; probe (MDV-1) 6FAM-CCCACTGTGACAGCC-BHQ1 (S. Baigent, pers. comm.). The MDV-2 assay is that of Islam et al. (201), but with a shorter MGB probe (6FAM-GTAATGCACCCGTGAC-MGB) in place of their BHQ-2 probe. Real-time quantitative PCRs were performed on an ABI Prism 7500 Fast System with an initial denaturation of 95°C for 20 seconds followed by 40 cycles of denaturation at 95°C for 3 seconds and annealing and extension at 60°C for 30 seconds. Both assays included 4 μl of DNA in a total PCR reaction volume of 20 μl with 1X PerfeCTa™ qPCR FastMix™ (Quanta Biosciences), forward and reverse primers at 300 nM, and TaqMan® BHQ (MDV-1) or MGB (MDV-2) probes (Sigma and Life Sciences, respectively) at 100 nM and 200 nM, respectively. In addition, each qPCR reaction incorporated 2 μl BSA (Sigma). Absolute quantification of genomes was based on a standard curve of serially diluted plasmids cloned from the respective target genes. The absolute genome copy numbers obtained were then converted to concentrations. Once the concentrations of total DNA, MDV-1 DNA, and MDV-2 DNA present in each sample were known, we calculated the percentage of MDV-1 and MDV-2 genomic DNA in the total DNA pool (see Supplemental Table S2-1).
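The chain of conversions described above (standard curve, copy number, concentration, percentage of total DNA) can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used for Supplemental Table S2-1: it assumes a linear standard curve of Ct against log10 copies, an average mass of 650 g/mol per double-stranded base pair, and a genome length of roughly 178 kb (Table 2-1). The function names, the dilution-series Ct values, and the example sample values are all hypothetical.

```python
"""Sketch: qPCR standard curve -> genome copies -> ng of viral DNA -> % of total DNA.

All numeric inputs are hypothetical; 650 g/mol per bp is an assumed average.
"""

AVOGADRO = 6.02214076e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0       # assumed average mass of one double-stranded base pair


def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to get genome copies in one reaction."""
    return 10 ** ((ct - intercept) / slope)


def copies_to_ng(copies, genome_length_bp):
    """Mass in ng of `copies` double-stranded genomes of the given length."""
    grams_per_copy = genome_length_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return copies * grams_per_copy * 1e9


def percent_viral_dna(ct, slope, intercept, template_ul, total_ng_per_ul, genome_length_bp):
    """Share (%) of the total DNA pool (e.g. Qubit ng/ul) made up of viral genomes."""
    copies_per_ul = copies_from_ct(ct, slope, intercept) / template_ul
    viral_ng_per_ul = copies_to_ng(copies_per_ul, genome_length_bp)
    return 100.0 * viral_ng_per_ul / total_ng_per_ul


if __name__ == "__main__":
    # Hypothetical plasmid dilution series: 10^2 to 10^6 copies per reaction.
    std_log10 = [2, 3, 4, 5, 6]
    std_ct = [31.36, 28.04, 24.72, 21.40, 18.08]
    slope, intercept = fit_standard_curve(std_log10, std_ct)
    pct = percent_viral_dna(ct=21.40, slope=slope, intercept=intercept,
                            template_ul=4, total_ng_per_ul=0.5,
                            genome_length_bp=178_000)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}, viral DNA = {pct:.2f}% of total")
```

Dividing by the 4 μl template volume converts copies per reaction into copies per μl of extract, so the result can be compared directly with a fluorometric total-DNA concentration.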

2.4.5 Illumina next-generation sequencing

Sequencing libraries for each of the isolates were prepared using the Illumina TruSeq Nano DNA Sample Prep Kit, according to the manufacturer’s recommended protocol for sequencing of genomic DNA. Genomic DNA inputs used for each sample are listed in Table 2-1. The DNA fragment size selected for library construction was 550 base pairs (bp). All the samples were sequenced on an in-house Illumina MiSeq using version 3 sequencing chemistry to obtain paired-end sequences of 300 × 300 bp. Base calling and image analysis were performed with the MiSeq Control Software (MCS) version 2.3.0.


2.4.6 Consensus genome assembly

As our samples contained DNA from many more organisms than just MDV, we developed a computational workflow (Figure 2-4) to preprocess our data prior to assembly.

Figure 2-4. Workflow for computational enrichment for MDV sequences and subsequent viral genome assembly and taxonomic profiling. The VirGA workflow (202) requires an input of high-quality HTS data from the viral genome of interest. For this study we added a step that selected MDV-like sequence reads from the milieu of dust and feather samples. The sequence reads of interest were obtained by using BLAST to compare all reads against a custom MDV database with an E-value of 10⁻²; these were then submitted to VirGA for assembly. Taxonomic profiling followed a similar path, using NCBI’s all-nucleotide database to identify the taxonomic kingdom for each sequence read. In this workflow diagram, parallelograms represent data outputs while rectangles represent computational actions.

A local BLAST database was created from every Gallid herpesvirus genome available in GenBank. All sequence reads for each sample were then compared to this database using BLASTN (75) with a permissive E-value threshold of ≤ 10⁻² in order to computationally enrich for sequences related to MDV. These “MDV-like” reads were then processed for downstream genome assembly. The use of a bivalent vaccine made it possible for us to readily distinguish sequence reads that resulted from the shedding of virulent MDV-1 vs. vaccine virus (MDV-2 or HVT) strains. The overall DNA identity of MDV-1 and MDV-2 is just 61% (203). In a comparison of strains MDV-1 Md5 (NC_002229) and MDV-2 SB-1 (HQ840738), we found no spans of identical DNA greater than 50 bp (data not shown). This allowed us to accurately distinguish these 300 × 300 bp MiSeq sequence reads as being derived from either MDV-1 or MDV-2. MDV genomes were assembled using the VirGA viral genome assembly workflow (202), which combines quality control preprocessing of reads, de novo assembly, genome linearization and annotation, and post-assembly quality assessments. For the reference-guided portion of viral genome assembly in VirGA, the Gallid herpesvirus 2 (MDV-1) strain Md5 was used (GenBank Accession: NC_002229.3). These new genomes were named according to recent recommendations, as outlined by Kuhn et al. (204). We use shortened forms of these names throughout the manuscript (see Table 2-1 for short names). The full names for all five genomes are as follows: MDV-1 Gallus domesticus-wt/Pennsylvania, USA/2015/Farm A-dust 1; MDV-1 Gallus domesticus-wt/Pennsylvania, USA/2015/Farm A-dust 2; MDV-1 Gallus domesticus-wt/Pennsylvania, USA/2015/Farm B-dust; MDV-1 Gallus domesticus-wt/Pennsylvania, USA/2015/Farm B-feather 1; MDV-1 Gallus domesticus-wt/Pennsylvania, USA/2015/Farm B-feather 2. GenBank Accessions are listed below and in Table 2-1.
Annotated copies of each genome, in a format compatible with genome- and sequence browsers, are available at the Pennsylvania State University ScholarSphere data repository: https://scholarsphere.psu.edu/collections/1544bp14j.
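The read-selection step above can be illustrated with a short sketch that keeps reads with a BLASTN hit at E-value ≤ 10⁻² and labels each one by the serotype of its best-scoring hit. This is an assumption-laden illustration rather than the workflow’s own code: it assumes BLAST+ tabular output (`-outfmt 6`, whose 12 default columns end with `evalue` and `bitscore`), and the best-hit-by-bitscore serotype rule and the `SEROTYPE_BY_SUBJECT` map are hypothetical stand-ins for however the MDV-1 and MDV-2 reads were actually separated.

```python
"""Sketch: select "MDV-like" reads from BLASTN tabular output (-outfmt 6) and
label each read with the serotype of its best-scoring hit.

Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
qstart qend sstart send evalue bitscore. The serotype map is hypothetical.
"""

EVALUE_CUTOFF = 1e-2

SEROTYPE_BY_SUBJECT = {          # hypothetical, partial accession -> serotype map
    "NC_002229.3": "MDV-1",      # MDV-1 strain Md5
    "HQ840738.1": "MDV-2",       # MDV-2 strain SB-1
}


def best_hits(blast_lines, evalue_cutoff=EVALUE_CUTOFF):
    """Return {read_id: (subject_id, bitscore)} for reads with a hit at or below the cutoff."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        read_id, subject_id = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > evalue_cutoff:
            continue
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (subject_id, bitscore)
    return best


def assign_serotypes(best):
    """Label each MDV-like read with its best hit's serotype ('unknown' if unmapped)."""
    return {read: SEROTYPE_BY_SUBJECT.get(subj, "unknown") for read, (subj, _) in best.items()}


if __name__ == "__main__":
    example = [
        "read1\tNC_002229.3\t99.3\t300\t2\t0\t1\t300\t1000\t1299\t1e-150\t540",
        "read1\tHQ840738.1\t72.0\t120\t30\t2\t10\t130\t500\t620\t1e-5\t60",
        "read2\tHQ840738.1\t98.7\t300\t4\t0\t1\t300\t2000\t2299\t1e-145\t520",
        "read3\tNC_002229.3\t65.0\t40\t14\t0\t1\t40\t10\t49\t0.5\t30",
    ]
    print(assign_serotypes(best_hits(example)))
```

Because MDV-1 and MDV-2 share no identical stretch longer than about 50 bp, a 300 bp read should score far higher against its own serotype, which is what makes a best-hit rule plausible here.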

2.4.7 Between-sample: consensus genome comparisons

ClustalW2 (43) was used to construct pairwise global nucleotide alignments between whole genome sequences, and pairwise global amino acid alignments between open reading frames. These alignments were utilized by downstream custom Python scripts to calculate percent identity, protein differences, and variation between samples. The proline-rich region of UL36 (also known as VP1/2 or MDV049), which contains an extended array of tandem repeats, was removed from all five consensus genomes prior to comparison. The amount of polymorphism seen in this region of UL36 is driven by fluctuations in the length of these tandem repeats, as has been seen in prior studies with other alphaherpesviruses such as HSV, VZV, and pseudorabies virus (PRV) (32,48–50). Since the length of extended arrays of perfect repeats cannot be precisely determined by de novo assembly (189, 190, 111, 4), we excluded this region from pairwise comparisons of genome-wide variation. Genome alignments with and without the UL36 region removed are archived at the ScholarSphere site: https://scholarsphere.psu.edu/collections/1544bp14j.
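The kind of summary these scripts produce from a pairwise alignment (percent identity, SNPs, INDEL base pairs, and the minimum number of INDEL events later reported in Table 2-2) can be sketched as below. This is a hypothetical reconstruction, not the custom scripts themselves: it assumes two equal-length gapped strings and defines percent identity as identical columns over all informative alignment columns, and it counts each run of consecutive gaps in one sequence as a single INDEL event. Other identity definitions give slightly different numbers.

```python
"""Sketch: summarize a pairwise global alignment (equal-length gapped strings)."""


def compare_aligned(seq_a, seq_b, gap="-"):
    """Return percent identity, SNP count, INDEL bp and minimum INDEL events."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = snps = indel_bp = indel_events = 0
    gap_side = None  # which sequence carries the current gap run: "a", "b" or None
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap and b == gap:
            continue  # column gapped in both sequences carries no information
        if a == gap or b == gap:
            side = "a" if a == gap else "b"
            if side != gap_side:
                indel_events += 1  # a new run of consecutive gaps counts as one event
            gap_side = side
            indel_bp += 1
            continue
        gap_side = None  # an aligned base pair ends any gap run
        if a == b:
            identical += 1
        else:
            snps += 1
    columns = identical + snps + indel_bp
    return {
        "percent_identity": 100.0 * identical / columns if columns else 0.0,
        "snps": snps,
        "indel_bp": indel_bp,
        "indel_events": indel_events,
    }


if __name__ == "__main__":
    print(compare_aligned("ACGT--ACGT", "ACCTAAACGT"))
```

In this toy alignment, the two-base gap is one event of 2 bp, the G/C mismatch is one SNP, and 7 of 10 columns are identical.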

2.4.8 Within-sample: polymorphism detection within each consensus genome

VarScan v2.2.11 (206) was used to detect variants present within each consensus genome. To aid in differentiating true variants from potential sequencing errors (207), two separate sets of variant-calling parameters were explored (183). Our main polymorphism-detection parameters (used in Figures 2-6 and 2-8 and Supplemental Table S2-2) were as follows: minimum variant allele frequency ≥ 0.02; base call quality ≥ 20; read depth at the position ≥ 10; independent reads supporting minor allele ≥ 2. Variants with directional strand bias ≥ 90% were excluded; a minimum of two reads in opposing directions was required. For comparison and added stringency, we also explored a second set of parameters (used in Figure 2-7): minimum variant allele frequency ≥ 0.05; base call quality ≥ 20; read depth at the position ≥ 100; independent reads supporting minor allele ≥ 5. Variants with directional strand bias ≥ 80% were excluded. The variants obtained from VarScan were then mapped back to the genome to understand their distribution and mutational impact using SnpEff and SnpSift (208, 209). Polymorphisms in the proline-rich region of UL36 were excluded, as noted above.
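The two parameter sets above can be expressed as a post-hoc filter over candidate minor alleles. This is a minimal sketch, not VarScan’s implementation: the `VariantCall` record layout is hypothetical, strand bias is taken to mean the fraction of minor-allele reads on the more common strand, and “a minimum of two reads in opposing directions” is read as requiring minor-allele support on both strands (which a bias threshold below 100% already guarantees).

```python
"""Sketch: apply the main and high-stringency polymorphism filters to candidate calls."""

from dataclasses import dataclass


@dataclass
class VariantCall:
    """One candidate minor allele at one genome position (hypothetical layout)."""
    position: int
    depth: int            # reads covering the position
    alt_forward: int      # minor-allele reads on the forward strand
    alt_reverse: int      # minor-allele reads on the reverse strand
    base_quality: float   # mean base-call quality of minor-allele reads

    @property
    def alt_reads(self):
        return self.alt_forward + self.alt_reverse

    @property
    def alt_frequency(self):
        return self.alt_reads / self.depth if self.depth else 0.0

    @property
    def strand_bias(self):
        """Fraction of minor-allele reads on the more common strand (1.0 if none)."""
        if not self.alt_reads:
            return 1.0
        return max(self.alt_forward, self.alt_reverse) / self.alt_reads


MAIN = dict(min_freq=0.02, min_qual=20, min_depth=10, min_alt_reads=2, max_strand_bias=0.90)
STRINGENT = dict(min_freq=0.05, min_qual=20, min_depth=100, min_alt_reads=5, max_strand_bias=0.80)


def passes(call, min_freq, min_qual, min_depth, min_alt_reads, max_strand_bias):
    """True if the call meets every threshold; bias >= max_strand_bias is excluded,
    which also rejects calls whose minor-allele reads all come from one strand."""
    return (
        call.alt_frequency >= min_freq
        and call.base_quality >= min_qual
        and call.depth >= min_depth
        and call.alt_reads >= min_alt_reads
        and call.strand_bias < max_strand_bias
    )


def filter_calls(calls, params):
    return [c for c in calls if passes(c, **params)]
```

For example, a call with depth 50 and five minor-allele reads (three forward, two reverse) passes the main set but fails the stringent set on depth alone, which mirrors why no feather-derived polymorphisms survive the stricter criteria.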

2.4.9 Testing for signs of selection acting on polymorphic viral populations

For each of our five consensus genomes, which each represent a viral population, we classified the polymorphisms detected into categories of synonymous, non-synonymous, genic-untranslated, or intergenic, based on where each polymorphism was positioned in the genome. For these analyses (Figure 2-8), we were only able to include polymorphisms detected in the three dust genomes, since the total number of polymorphisms obtained from feather genomes was too low for chi-square analysis. First, we calculated the total possible number of single nucleotide mutations that could be categorized as synonymous, non-synonymous, genic-untranslated or intergenic. To remove ambiguity when mutations in overlapping genes could be classified as either synonymous or non-synonymous, genes with alternative splice variants or overlapping reading frames were excluded from these analyses. This removed 25 open reading frames (approximately 21% of the genome). These tallies of potential mutational events were used to calculate the expected fraction of mutations in each category. We performed chi-square tests on each dataset to assess whether the observed distribution of polymorphisms matched the expected distribution. We also performed a similar analysis in pairwise fashion (Table 2-3), to assess whether the fraction of variants differed from what would be expected by random chance. Pairwise combinations included the following: synonymous vs. non-synonymous, synonymous vs. intergenic, synonymous vs. genic-untranslated, non-synonymous vs. intergenic, non-synonymous vs. genic-untranslated, and intergenic vs. genic-untranslated. Statistically significant outcomes would suggest that recent or historical selection differed between those categories of variants.
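The tally of potential mutational events can be sketched by enumerating every possible single-nucleotide substitution. This is a minimal illustration under stated assumptions: it uses the standard genetic code (NCBI table 1), counts changes to or from a stop codon as non-synonymous, assumes ORFs contain only A/C/G/T, and scores each untranslated or intergenic base as three possible mutations. The function names and example inputs are hypothetical, and excluding overlapping or spliced ORFs is left to the caller.

```python
"""Sketch: expected fractions of random point mutations per category."""

from itertools import product

BASES = "TCAG"
# NCBI translation table 1, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def coding_opportunities(cds):
    """Count possible single-base changes in an ORF that are synonymous or non-synonymous."""
    cds = cds.upper()
    syn = nonsyn = 0
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        ref_aa = CODON_TABLE[codon]
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == ref_aa:
                    syn += 1
                else:
                    nonsyn += 1
    return syn, nonsyn


def expected_fractions(cds_list, genic_untranslated_bp, intergenic_bp):
    """Expected share of random point mutations falling in each category."""
    syn = nonsyn = 0
    for cds in cds_list:
        s, n = coding_opportunities(cds)
        syn += s
        nonsyn += n
    counts = {
        "synonymous": syn,
        "non-synonymous": nonsyn,
        "genic-untranslated": 3 * genic_untranslated_bp,  # 3 alternative bases per site
        "intergenic": 3 * intergenic_bp,
    }
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}
```

For instance, the Gly codon GGG allows three synonymous third-position changes and six non-synonymous ones, whereas every change to the Met codon ATG alters the amino acid.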

2.4.10 Sanger sequencing of polymorphic locus in ICP4

A potential locus of active selection within the ICP4 (MDV084 / MDV100) gene was detected during deep sequencing of Farm B-dust. This locus was examined using Sanger sequencing. An approximately 400 bp region of the ICP4 gene was amplified using a Taq PCR Core Kit (Qiagen) and the following primers at 200 nM: forward primer (ICP4selF) 5’-AACACCTCTTGCCATGGTTC-3’; reverse primer (ICP4selR) 5’-GGACCAATCATCCTCTCTGG-3’. Cycling conditions included an initial denaturation of 95°C for 2 minutes, followed by 40 cycles of denaturation at 95°C for 30 seconds, annealing at 55°C for 30 seconds and extension at 72°C for 1 minute, with a terminal extension at 72°C for 10 minutes. The total reaction volume of 50 μl included 10 μl of DNA and 4 μl BSA (final concentration 0.8 mg/ml). Amplification products were visualized on a 1.5% agarose gel, and the target amplicon was excised and then purified using the E.Z.N.A. Gel Extraction Kit (Omega Bio-tek). Sanger sequencing was performed by the Penn State Genomics Core Facility utilizing the same primers as used for DNA amplification. The relative peak height of each base call at the polymorphic position was analyzed using the ab1PeakReporter tool (210).

2.4.11 Genetic distance and dendrogram

Multiple sequence alignments of complete MDV-1 (Gallid herpesvirus 2) genomes from GenBank and those assembled by our lab were generated using MAFFT (211). The evolutionary distances were computed using the Jukes-Cantor method (212) and the evolutionary history was inferred using the neighbor-joining method (213) in MEGA6 (214), with 1,000 bootstrap replicates (215). Positions containing gaps and missing data were excluded. The 18-strain genome alignment is archived at ScholarSphere: https://scholarsphere.psu.edu/collections/1544bp14j.
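The distance step can be sketched as follows: alignment columns with gaps or ambiguous bases are dropped from every sequence (the “complete deletion” treatment described above), and the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) is applied to the uncorrected p-distance. This only illustrates the distance calculation that MEGA6 performs; the neighbor-joining tree and bootstrap resampling are not reproduced, and the function names and toy sequences are hypothetical.

```python
"""Sketch: Jukes-Cantor distances from a multiple alignment with complete deletion."""

import math

VALID = set("ACGT")


def complete_deletion(alignment):
    """Drop every column that has a gap or ambiguous base in any sequence."""
    keep = [i for i in range(len(alignment[0]))
            if all(seq[i].upper() in VALID for seq in alignment)]
    return ["".join(seq[i] for i in keep) for seq in alignment]


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3), where p is the p-distance."""
    p = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper())) / len(seq_a)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4.0 * p / 3.0)


def distance_matrix(names, alignment):
    """Pairwise Jukes-Cantor distances keyed by (name_i, name_j), i < j."""
    cleaned = complete_deletion(alignment)
    return {(names[i], names[j]): jukes_cantor(cleaned[i], cleaned[j])
            for i in range(len(names)) for j in range(i + 1, len(names))}


if __name__ == "__main__":
    print(distance_matrix(["x", "y"], ["ACGT-ACGTAC", "ACGTTACGTAA"]))
```

With one mismatch in ten retained sites, p = 0.1 and the corrected distance is about 0.107, slightly larger than p because the correction accounts for multiple substitutions at the same site.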

2.4.12 Taxonomic estimation of non-MDV sequences in dust and feathers

All sequence reads from each sample were submitted to a quality control preprocessing method to remove sequencing primers, artifacts, and areas of low confidence (202). Sequence annotation was performed using a massively iterative all-vs.-all BLASTN (E-value ≤ 10⁻²) approach using the all-nucleotide database from NCBI. Only a portion of the total sequence read pool could be identified with confidence using this method. We then used de novo assembly to extend these unidentified sequences into longer contigs. These contigs were iterated through BLASTN again, which revealed alignment to repetitive regions of the Gallus domesticus (chicken) genome. Since the viral DNA enrichment procedures include a level of stochasticity in the removal of host and environmental contaminants, the proportions of taxa detected do not definitively reflect those present initially. The results of these classifications are shown in Figure 2-11 and listed in Supplemental Table S2-3.

2.4.13 GenBank accession numbers and availability of materials

GenBank Accessions are listed here and in Table 2-1: Farm A-dust 1, KU173116; Farm A-dust 2, KU173115; Farm B-dust, KU173119; Farm B-feather 1, KU173117; Farm B-feather 2, KU173118. Additional files used in this manuscript, such as multiple-sequence alignments of these genomes, are archived and available at ScholarSphere: https://scholarsphere.psu.edu/collections/1544bp14j

2.5 Results

2.5.1 Sequencing, assembly and annotation of new MDV-1 consensus genomes from the field

To assess the level of genomic diversity within and between field sites that are under real-world selection, two commercial farms in central Pennsylvania (11 miles apart) with a high prevalence of MDV-1 were chosen (Figure 2-1). These operations raise poultry for meat (also known as broilers) and house 25,000-30,000 birds per house. The poultry were vaccinated with a bivalent vaccine composed of MDV-2 (strain SB-1) and HVT (strain FC126). In contrast to the Rispens vaccine, which is an attenuated MDV-1 strain, MDV-2 and HVT can be readily distinguished from MDV-1 across the length of the genome, which allowed us to differentiate wild MDV-1 from concomitant shedding of vaccine strains. These farms are part of a longitudinal study of MDV-1 epidemiology and evolution in modern agricultural settings (216, 217). To obtain material for genomic surveillance, we isolated MDV nucleocapsids from dust or from epithelial tissues of the individual feather follicles of selected hosts (see Methods for details). A total of five uncultured wild-type samples of MDV were sequenced using an in-house Illumina MiSeq sequencer (Table 2-1, lines 4-6; see Methods for details).

Table 2-1: Field sample statistics and assembly of MDV-1 consensus genomes

Line | Categoryᵃ | Farm A-dust 1 | Farm A-dust 2 | Farm B-dust | Farm B-feather 1 | Farm B-feather 2
1 | Nanograms of DNA | 120 | 127 | 144 | 12 | 27
2 | % MDV-1 | 2.4% | 1.3% | 0.6% | 40.6% | 5.7%
3 | % MDV-2 | 4.6% | 2.7% | 5.9% | 0.1% | 0%
4 | Total readsᵇ | 1.4×10⁷ | 2.5×10⁷ | 2.7×10⁷ | 3.9×10⁵ | 3.4×10⁵
5 | MDV-specific readsᵇ | 3.7×10⁵ | 5.1×10⁵ | 1.4×10⁶ | 1.0×10⁵ | 1.7×10⁵
6 | % MDV-specific reads | 2.6% | 2.0% | 5.2% | 26.9% | 48.3%
7 | Average depth (X-fold) | 271 | 333 | 597 | 44 | 68
8 | Genome length (bp) | 177,967 | 178,049 | 178,169 | 178,327 | 178,540
9 | NCBI accession number | KU173116 | KU173115 | KU173119 | KU173117 | KU173118

ᵃ Lines 1-3 refer to sample preparation, lines 4-6 to Illumina MiSeq output, and lines 7-9 to new viral genomes.
ᵇ Sequence read counts in lines 4 and 5 are the sum of forward and reverse reads for each sample.

The sequence read data derived from dust contained approximately 2-5% MDV-1 DNA, while the feather samples ranged from ~27%-48% MDV-1 (Table 2-1, line 6). Since dust represents the infectious material that transmits MDV from host to host, and across generations of animals that pass through a farm or field site, we pursued analysis of wild MDV-1 genomes from both types of source material. Consensus genomes were created for each of the five samples in Table 2-1, using a recently described combination of de novo assembly and reference-guided alignment of large sequence segments, or contigs (Figure 2-5A) (202).

[Figure 2-5 schematic. (A) MDV-1 genome map with TRL, UL, IRL, IRS, US and TRS regions, the a/a’ repeats, and labeled ORFs including LORF2, DNA pol, Meq, ICP4, vLIP, UL43, helicase-primase and UL36 on the + and - strands; (B) trimmed MDV-1 genome (UL, IRL, IRS, US); (C) percent identity track (0% to 100%). Legend: UL = unique long; TRL/IRL = terminal/internal repeat of the long region; US = unique short; TRS/IRS = terminal/internal repeat of the short region; a/a’ = terminal/inverted “a” repeat; the proline-rich region of ORF MDV049 is marked; identity shading indicates 100%, 30% to 99%, and below 30% identity.]

Figure 2-5. The complete MDV-1 genome includes two unique regions and two sets of large inverted repeats. (A) The full structure of the MDV-1 genome includes a unique long region (UL) and a unique short region (US), each of which is flanked by large repeats known as the terminal and internal repeats of the long region (TRL and IRL) and of the short region (TRS and IRS). Most ORFs (pale green arrows) are located in the unique regions of the genome. ORFs implicated in MDV pathogenesis are outlined and labeled; these include ICP4 (MDV084 / MDV100), UL36 (MDV049), and Meq (MDV005 / MDV076) (see Results for complete list). (B) A trimmed genome format without the terminal repeat regions was used for analyses in order to not over-represent the repeat regions. (C) Percent identity from mean pairwise comparison of five consensus genomes, plotted spatially along the length of the genome. Darker colors indicate lower percent identity (see Legend).

Nearly complete genomes were obtained for all five samples (Table 2-1). The coverage depth for each genome scaled with the number of MDV-specific reads obtained from each sequencing library (Table 2-1, lines 5 and 7). The dust sample from Farm B had the highest coverage depth, at an average of almost 600X across the viral genome. Feather 1 from Farm B had the lowest coverage depth, averaging 44X genome-wide, which still exceeds that of most bacterial or eukaryotic genome assemblies. The genome length for all five samples was approximately 178 kilobases (Table 2-1), which is comparable to all previously sequenced MDV-1 isolates (155, 156, 158–161, 187, 188). For each field sample collected and analyzed here, we assembled a consensus viral genome. We anticipated that the viral DNA present in a single feather follicle might be homotypic, based on similar results found for individual vesicular lesions of the alphaherpesvirus VZV (183, 196). We further expected that the genomes assembled from a dust sample would represent a mix of viral genomes, summed over time and space. Viral genomes assembled from dust represent the most common genome sequence, or alleles therein, from all of the circulating MDV-1 on a particular farm. The comparison of consensus genomes provided a view into the amount of sequence variation between Farm A and Farm B, or between two individuals on the same farm (Table 2-2). In contrast, examining the polymorphic loci within each consensus genome assembly allowed us to observe the level of variation within the viral population at each point source (Figures 2-6 and 2-7; Supplemental Table S2-2).


2.5.2 DNA and amino acid variations between five new field genomes of MDV-1

We began our assessment of genetic diversity by determining the extent of DNA and amino acid variation between the five different consensus genomes. We found that the five genomes are highly similar to one another at the DNA level, with percent identity ranging from 99.4% to 99.9% in pairwise comparisons (Figure 2-5C, Table 2-2).

Table 2-2: Pair-wise DNA identity and variant proteins between pairs of consensus genomes

Comparison | % DNA identity | Total # bp different | Intergenic INDELs, bp (# events) | Intergenic SNPs | Genic INDELs, bp (# events) | Genic synonymous SNPs | Genic non-synonymous SNPs

Different farms: dust vs. dust
Farm B-dust vs. Farm A-dust 1 | 99.73 | 353 | 143 (22) | 140 | 66 (1) in DNA polᵃ | 1 in helicase-primaseᵃ | 3 (one each in vLIP, LORF2, UL43)ᵃ
Farm B-dust vs. Farm A-dust 2 | 99.87 | 195 | 49 (14) | 76 | 66 (1) in DNA polᵃ | 1 in helicase-primaseᵃ | 3 (one each in vLIP, LORF2, UL43)ᵃ

Same farm, same time: dust vs. host
Farm B-dust vs. Farm B-feather 1 | 99.64 | 552 | 476 (11) | 6 | 66 (1) in DNA polᵃ | 1 in helicase-primaseᵃ | 3 (one each in vLIP, LORF2, UL43)ᵃ
Farm B-dust vs. Farm B-feather 2 | 99.52 | 687 | 572 (19) | 45 | 66 (1) in DNA polᵃ | 1 in helicase-primaseᵃ | 3 (one each in vLIP, LORF2, UL43)ᵃ

Same farm: separated in time and space
Farm A-dust 1 vs. Farm A-dust 2 | 99.76 | 338 | 170 (20) | 168 | 0 | 0 | 0

Same farm, same time: one host vs. another
Farm B-feather 1 vs. Farm B-feather 2 | 99.38 | 973 | 972 (9) | 1 | 0 | 0 | 0

ᵃ Abbreviations refer to DNA polymerase processivity subunit protein UL42 (MDV055); helicase-primase subunit UL8 (MDV020); vLIP, lipase homolog (MDV010); LORF2, immune evasion protein (MDV012); UL43 membrane protein (MDV056).

These comparisons used a trimmed genome format (Figure 2-5B) in which the terminal repeat regions had been removed, so that these sequences were not over-represented in the analyses. The level of identity between samples is akin to that observed in closely related isolates of herpes simplex virus 1 (HSV-1) (202). Observed nucleotide differences were categorized as genic or intergenic, and further sub-divided based on whether the differences were insertions or deletions (INDELs) or single-nucleotide polymorphisms (SNPs) (Table 2-2). The number of nucleotide differences was higher in intergenic regions than in genic regions for all genome pairs. For the INDEL differences, we also calculated the minimum number of events that could have led to the observed differences, to provide context on the relative frequency of these loci in each genome. We anticipated that these variations would include silent mutations, as well as potentially advantageous or deleterious coding differences. To understand the effect(s) of these nucleotide variations on protein coding and function, we next compared the amino acid (AA) sequences of all open reading frames (ORFs) for the five isolates. The consensus protein-coding sequences of all five isolates were nearly identical, with just a few differences (Table 2-2). In comparison to the other four samples, Farm B-dust harbored AA differences in four proteins. A single non-synonymous mutation was seen in each of the following: the virulence-associated lipase homolog vLIP (MDV010; Farm B-dust, S501A) (218), the MHC class I immune evasion protein LORF2 (MDV012; Farm B-dust, L311W) (219), and the probable membrane protein UL43 (MDV056; Farm B-dust, S74L). A single synonymous mutation, which does not alter the encoded protein, was observed in the DNA helicase-primase protein UL8 (MDV020; Farm B-dust, L253L). Finally, a 22 AA insertion unique to Farm B-dust was observed in the DNA polymerase processivity subunit protein UL42 (MDV055; Farm B-dust, insertion at AA 277).
We did not observe any coding differences between temporally separated dust isolates from Farm A or between feather isolates from different hosts in Farm B, although both of these comparisons (Table 2-2, bottom) revealed hundreds of noncoding differences.

2.5.3 Detection of polymorphic bases within each genome

Comparing viral genomes found in different sites provides a macro-level assessment of viral diversity. We next investigated the presence of polymorphic viral populations within each consensus genome, to reveal how much diversity might exist within a field site (as reflected in dust-derived genomes) or within a single host (as reflected in feather genomes).


For each consensus genome, we used polymorphism detection analysis to examine the depth and content of the sequence reads at every nucleotide position (see Methods for details). Rather than detecting differences between isolates, as in Table 2-2, this approach revealed polymorphic sites within the viral population that contributed to each consensus genome. We detected 2-58 polymorphic sites within each consensus genome (Figure 2-6).

[Figure 2-6 plot: Distribution of polymorphic loci in MDV genomes. Number of polymorphic bases (colored by strain) in 5-kbp bins along the genome position (kbp), with the UL, IRL, IRS and US regions and the a’ repeat marked; series: Farm A/dust 1, Farm A/dust 2, Farm B/dust, Farm B/feather 1, Farm B/feather 2.]

Figure 2-6: Genome-wide distribution of polymorphic bases within each consensus genome. Polymorphic base calls from each MDV genome were grouped in bins of 5 kb and the sum of polymorphisms in each bin was plotted. Farm B-dust (aqua) contained the largest number of polymorphic bases, with the majority occurring in the repeat region (IRL/IRS). Farm A-dust 1 (brown) and Farm A-dust 2 (gray) harbored fewer polymorphic bases, with a similar distribution to Farm B-dust. Polymorphic bases detected in feather genomes were rarer, although this likely reflects their lower coverage depth (see Table 2-1). Note that the upper and lower segments of the y-axis have different scales; the number of polymorphic bases per genome for the split column on the right is labeled for clarity.

The feather genomes had fewer polymorphisms than the dust genomes, which may reflect low within-follicle diversity or the relatively low sequence coverage of the feather samples. INDELs were not included in this polymorphism analysis, although they clearly contributed to between-sample variation (Table 2-2); the values reported here are therefore likely an underestimate of the overall amount of within-sample variation. Viral polymorphisms were distributed across the entire length of the genome (Figure 2-6), with the majority concentrated in the repeat regions. Application of a more stringent set of parameters (see Methods for details) yielded a similar distribution of polymorphisms, although no polymorphisms were detected in feather samples because of their lower depth of coverage (Figure 2-7).

[Figure 2-7 plot: Distribution of polymorphic loci in MDV genomes (high stringency). x-axis: genome position in bins of 5 kbp (regions UL, IRL, IRS, US); y-axis: number of polymorphic bases, colored by strain (Farm A-dust 1, Farm A-dust 2, Farm B-dust).]

Figure 2-7: Genome-wide distribution of polymorphisms within each consensus genome, using high-stringency criteria. Polymorphic base calls from each MDV genome were grouped by position in bins of 5 kb, and the sum of polymorphisms in each bin was plotted. Stricter parameters of polymorphism detection (see Methods for details) revealed a distribution similar to that in Figure 2-6. No polymorphisms were detected in feather-derived genomes using high-stringency criteria, due to their lower coverage depth (see Table 2-1). Note that the upper and lower segments of the y-axis have different scales; the number of polymorphic bases per segment for the split column on the right is thus labeled on the graph.
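The per-locus statistics reported in Supplemental Table S2-2 (minor allele frequency and the share of minor-allele reads on each strand) follow directly from the four strand-specific read counts. The Python sketch below reproduces those columns. The filter thresholds (minimum frequency, depth, and strand balance) are hypothetical placeholders for illustration only; the actual low- and high-stringency criteria are given in Methods.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    major_fwd: int  # reads supporting the major allele, forward strand
    major_rev: int  # reads supporting the major allele, reverse strand
    minor_fwd: int  # reads supporting the minor allele, forward strand
    minor_rev: int  # reads supporting the minor allele, reverse strand


def minor_allele_stats(c: AlleleCounts):
    """Return (minor allele frequency, minor fraction on forward, minor fraction on reverse)."""
    minor = c.minor_fwd + c.minor_rev
    total = c.major_fwd + c.major_rev + minor
    if minor == 0 or total == 0:
        return 0.0, 0.0, 0.0
    return minor / total, c.minor_fwd / minor, c.minor_rev / minor


def passes_filter(c: AlleleCounts, min_maf=0.05, min_depth=100, min_strand_fraction=0.2):
    """Hypothetical call filter: thresholds are illustrative, not the chapter's parameters."""
    maf, fwd, rev = minor_allele_stats(c)
    depth = c.major_fwd + c.major_rev + c.minor_fwd + c.minor_rev
    return maf >= min_maf and depth >= min_depth and min(fwd, rev) >= min_strand_fraction
```

Applied to the ICP4 locus at position 130968 in Farm A/dust 2 (77, 105, 52, and 88 reads), the function returns a minor allele frequency of 43.48%, with 37% and 63% of minor-allele reads on the forward and reverse strands, matching the table. A strand-balance cutoff is included because calls supported mostly by one strand (for example, 10% vs. 90%) are a common sign of sequencing artifacts; whether and how such a cutoff entered this chapter's parameters is not stated here.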

These data reveal that polymorphic alleles are present in field isolates, including in viral genomes collected from single sites of shedding in infected animals. To address the potential effect(s) of these polymorphisms on MDV biology, we divided the observed polymorphisms into categories of synonymous, non-synonymous, genic-untranslated, or intergenic (Supplemental Table S2-2). The majority of all polymorphisms were located in intergenic regions. We next investigated whether evidence of selection could be detected from the distribution of polymorphisms in our samples. One way to assess this is to determine whether the relative frequencies of synonymous, non-synonymous, genic-untranslated, and intergenic polymorphisms can be explained by random chance; if the observed frequencies differ from those expected under a random distribution, this would suggest that selection is acting on the genome. After calculating the expected distribution in each sample (as described in Methods), we determined that the distribution of variants differed from that expected by chance in each of our dust samples (Figure 2-8; Farm A-dust 1: χ² = 68.16, d.f. = 3, p < 0.001; Farm A-dust 2: χ² = 128.57, d.f. = 3, p < 0.001; Farm B-dust: χ² = 63.42, d.f. = 3, p < 0.001).
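A chi-squared goodness-of-fit test of this kind can be sketched in Python using only the standard library. The sketch assumes that the expected count for each category is proportional to a user-supplied weight, for example the fraction of the genome (or of possible mutational sites) falling in that category; the actual expected distribution used here is described in Methods. The upper-tail p-value for integer degrees of freedom is computed with the closed-form recurrence for the chi-squared survival function.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q = exp(-h) * sum_{j=0}^{m-1} h^j / j!
        term = math.exp(-h)
        total = term
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return total
    # Odd df = 2m + 1: Q = erfc(sqrt(h)) + sum_{j=0}^{m-1} h^(j + 1/2) * exp(-h) / Gamma(j + 3/2)
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) * math.exp(-h) / math.gamma(1.5)
    for j in range((df - 1) // 2):
        total += term
        term *= h / (j + 1.5)
    return total


def chi2_goodness_of_fit(observed, weights):
    """Compare observed category counts with counts expected from category weights.

    observed: counts per category (e.g. synonymous, non-synonymous,
              genic-untranslated, intergenic).
    weights:  assumed relative share of each category under a random model.
    Returns (chi-squared statistic, degrees of freedom, p-value).
    """
    n = sum(observed)
    total_weight = sum(weights)
    expected = [n * w / total_weight for w in weights]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)
```

With four categories the test has d.f. = 3, as reported above. As with any chi-squared test, the approximation is usually considered reliable only when expected counts are not too small (a common rule of thumb is at least 5 per category).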

[Figure 2-8 plot: Number of observed vs. expected polymorphisms in each genome. y-axis: number of polymorphic bases; x-axis: Farm A-dust 1, Farm A-dust 2, Farm B-dust; bars: observed and expected synonymous, non-synonymous, genic-untranslated, and intergenic polymorphisms.]

Figure 2-8. Observed vs. expected polymorphism categories for each consensus genome. Each consensus genome was analyzed for the presence of polymorphic loci (see Methods for details). Observed polymorphic loci (solid bars) were categorized as causing synonymous (green) or non-synonymous (aqua) mutations, or as genic-untranslated (gray) or intergenic (brown). The expected outcomes (striped bars) for a random distribution of polymorphisms are plotted behind the observed outcomes (solid bars) for each category. For all genomes, the difference between observed and expected counts was significant for intergenic polymorphisms, relative to those of other categories.
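Assigning a coding polymorphism to the synonymous or non-synonymous category amounts to translating the reference and alternate codons and comparing the encoded amino acids. The Python sketch below uses the standard genetic code. It assumes that the variant has already been mapped to a 0-based position within a coding sequence written 5' to 3' (for genes on the reverse strand, the alleles must first be reverse-complemented). The short sequence in the test is hypothetical and simply mimics a serine (TCT) to tyrosine (TAT) change.

```python
BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG x TCAG x TCAG codon order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snv(cds: str, pos: int, alt: str):
    """Classify a single-nucleotide change in a coding sequence.

    cds: coding sequence 5'->3', starting at the first codon.
    pos: 0-based position of the variant within cds.
    alt: alternate base.
    Returns (reference amino acid, alternate amino acid, category).
    """
    cds = cds.upper()
    alt = alt.upper()
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete final codon")
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    category = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, category
```

Variants outside coding sequences are not handled here; the genic-untranslated and intergenic categories come from the genome annotation rather than from codon translation.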

In addition, pairwise tests showed that the number of observed intergenic polymorphisms was significantly higher than that of the coding categories (synonymous and non-synonymous) in all three dust samples, and higher than that of the genic-untranslated category in two of the three samples (Table 2-3). This suggests that mutations in the intergenic regions were better tolerated and more likely to be maintained in the genome; i.e., that purifying selection was acting on coding regions.
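The exact formulation of the pairwise tests in Table 2-3 is described in Methods. One plausible formulation, assumed here for illustration, is a two-category goodness-of-fit test: the pooled count of two categories is split according to their expected weights, and the statistic has one degree of freedom. The Python sketch below implements that assumption; for d.f. = 1 the chi-squared upper tail reduces to erfc(sqrt(χ²/2)).

```python
import math
from itertools import combinations


def pairwise_chi2(obs_i, obs_j, weight_i, weight_j):
    """Two-category goodness-of-fit test (d.f. = 1), an assumed formulation.

    The pooled count obs_i + obs_j (must be > 0) is split between the two
    categories in proportion to their expected weights.
    Returns (chi-squared statistic, p-value).
    """
    n = obs_i + obs_j
    exp_i = n * weight_i / (weight_i + weight_j)
    exp_j = n - exp_i
    stat = (obs_i - exp_i) ** 2 / exp_i + (obs_j - exp_j) ** 2 / exp_j
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def all_pairwise(observed: dict, weights: dict):
    """Run pairwise_chi2 for every pair of categories, keyed by (category_a, category_b)."""
    return {
        (a, b): pairwise_chi2(observed[a], observed[b], weights[a], weights[b])
        for a, b in combinations(observed, 2)
    }
```

With four categories, `all_pairwise` yields the six comparisons that make up one row of Table 2-3.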

Table 2-3: Chi-squared values from pairwise comparisons of different categories of polymorphisms.

| Sampleᵃ | Intergenic vs. non-synonymous | Intergenic vs. synonymous | Intergenic vs. genic untranslated | Synonymous vs. non-synonymous | Synonymous vs. genic untranslated | Non-synonymous vs. genic untranslated |
|---|---|---|---|---|---|---|
| Farm A-dust 1 | χ² = 16.6 (p < 0.001) | χ² = 55.47 (p < 0.001) | χ² = 3.74 (p = 0.053) | χ² = 0.03 (p = 0.873) | χ² = 0.83 (p = 0.361) | χ² = 1.73 (p = 0.189) |
| Farm A-dust 2 | χ² = 31.76 (p < 0.001) | χ² = 94.93 (p < 0.001) | χ² = 9.48 (p = 0.002) | χ² = 1.11 (p = 0.292) | χ² = 2.72 (p = 0.099) | χ² = 0.69 (p = 0.407) |
| Farm B-dust | χ² = 25.27 (p < 0.001) | χ² = 47.32 (p < 0.001) | χ² = 5.39 (p = 0.020) | χ² = 1.83 (p = 0.176) | χ² = 1.61 (p = 0.205) | χ² = 0.09 (p = 0.759) |

ᵃ Degrees of freedom (d.f.) = 1 for all comparisons; p indicates p-value.

2.5.4 Tracking shifts in polymorphic loci over time

In addition to observing polymorphic SNPs in each sample at a single moment in time, we explored whether shifts in polymorphic allele frequency could be detected in the two sequential dust samples from Farm A. We found one locus in the ICP4 (MDV084 / MDV100) gene (nucleotide position 5,495) that was polymorphic in the Farm A-dust 2 sample, with nearly equal proportions of sequence reads supporting the major allele (C) and the minor allele (A) (Figure 2-9A).


[Figure 2-9 plot. (A) Deep-sequencing: allele frequency (%) of C (serine) and A (tyrosine) at this ICP4 locus in Farm A-dust 1 and Farm A-dust 2. (B) Sanger sequencing: frequency (%) of the "A" allele vs. day of the year, with the Farm A-dust 2 sample collected for deep-sequencing indicated.]

Figure 2-9: A new polymorphic locus in ICP4, and its shifting allele frequency over time. (A) HTS data revealed a new polymorphic locus in ICP4 (MDV084) at nucleotide position 5,495. In the spatially- and temporally-separated dust samples from Farm A (see Figure 2-1 and Methods for details), we observed a different prevalence of C (encoding serine) and A (encoding tyrosine) alleles. (B) Time-separated dust samples spanning nine months were analyzed by targeted Sanger sequencing of this locus to track polymorphism frequency over time. The major and minor allele frequencies at this locus varied widely over time, and the major allele switched from C to A more than twice during this period.

In contrast, this locus had been 99% A and only 1% C in Farm A-dust 1 (collected 11 months earlier in another house on the same farm), such that it was not counted as polymorphic in that sample by our parameters (see Methods for details). At this polymorphic locus, the nucleotide C encodes a serine, while the nucleotide A encodes a tyrosine. The encoded amino acid (AA) lies in the C-terminal domain of ICP4 (AA position 1,832). ICP4 is an important immediate-early protein in alphaherpesviruses, where it serves as a major regulator of viral transcription (220–222). ICP4 is also considered crucial for MDV pathogenesis because of its proximity to the latency-associated transcripts (LAT) and recently described miRNAs (222–224). In a previous study of MDV-1 attenuation through serial passage in vitro, mutations in ICP4 appeared to coincide with attenuation (194). Given the very different allele frequencies at this ICP4 locus between two houses on the same farm 11 months apart, we used targeted Sanger sequencing of this SNP to examine dust samples collected from one of the houses over 9 months (Figure 2-9B). This locus was highly polymorphic in these time-separated dust samples. The A (tyrosine) allele rose to almost 50% frequency in the 9-month period. In four of the dust samples, the A (tyrosine) allele was dominant over the C (serine) allele. To our knowledge, such reversible fluctuation in allele frequencies over a short period of time has not previously been reported for alphaherpesviruses. However, recent studies on human cytomegalovirus (HCMV) have shown that selection can cause viral populations to evolve over short periods of time (197, 198). While this is only one example of a polymorphic locus that shifts in frequency over time, similar approaches could be applied to any of the hundreds of polymorphic loci detected here (Supplemental Table S2-2).
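The tracking shown in Figure 2-9B can be summarized in a few lines of Python. The sketch below assumes, as a simplification, that the A-allele frequency in each Sanger trace is estimated from the relative heights of the A and C peaks at the locus (the quantification actually used is described in Methods). It then counts how often the identity of the major allele changes between consecutive, time-ordered samples. The function names are hypothetical.

```python
def allele_fraction_from_peaks(peak_a: float, peak_c: float) -> float:
    """Fraction of signal from the A allele at a biallelic A/C site (assumed peak-height method)."""
    total = peak_a + peak_c
    if total <= 0:
        raise ValueError("no signal at this position")
    return peak_a / total


def major_allele_switches(freq_a):
    """Count changes of the major allele across time-ordered A-allele frequencies.

    Samples with exactly 50% A are treated as ties: they get a call of None
    and are skipped when counting switches.
    Returns (list of major-allele calls, number of switches).
    """
    calls = ["A" if f > 0.5 else "C" if f < 0.5 else None for f in freq_a]
    resolved = [c for c in calls if c is not None]
    switches = sum(prev != cur for prev, cur in zip(resolved, resolved[1:]))
    return calls, switches
```

Peak-height ratios from Sanger traces are only semi-quantitative, so in practice such estimates are best treated as approximate frequencies rather than exact values.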

2.5.5 Comparison of field isolates of MDV-1 to previously sequenced isolates

To compare these new field-based MDV genomes to previously sequenced isolates of MDV, we created a multiple sequence alignment of all available MDV-1 genomes (155, 156, 158, 159, 161, 187, 188, 225, 226). The multiple sequence alignment was used to generate a dendrogram depicting genetic relatedness (see Methods). We observed that the five new isolates form a separate group when compared to all previously sequenced isolates (Figure 2-10).

[Figure 2-10 dendrogram. Groups: serial passages of a USA strain (648a, passages 11, 31, 41, 61, 81); present study, commercial PA farms, uncultured (Farm B/dust, Farm A/dust 1, Farm A/dust 2, Farm B/feather 1, Farm B/feather 2); USA isolates (Md5, Md11, RB-1B, CU-2); European isolate (Rispens, vaccine strain); China isolates (814, vaccine strain; GX0101; LMS). Scale bar: 0.0001.]


Figure 2-10: Dendrogram of genetic distances among all sequenced MDV-1 genomes. Using a multiple-genome alignment of all available complete MDV-1 genomes, we calculated the evolutionary distances between genomes using the Jukes-Cantor model. A dendrogram was then created using the neighbor-joining method in MEGA with 1000 bootstraps. The five new field-sampled MDV-1 genomes (green) formed a separate group between the two clusters of USA isolates (blue). The European vaccine strain (Rispens) formed a separate clade, as did the three Chinese MDV-1 genomes (dark blue). GenBank Accessions for all strains: new genomes, Table 2-1; Passage 11-648a, JQ806361; Passage 31-648a, JQ806362; Passage 61-648a, JQ809692; Passage 41-648a, JQ809691; Passage 81-648a, JQ820250; CU-2, EU499381; RB-1B, EF523390; Md11, 170950; Md5, AF243438; Rispens (CVI988), DQ530348; 814, JF742597; GX0101, JX844666; LMS, JQ314003.

This may result from geographic differences, as previously seen for HSV-1 and VZV (4, 227–230), from temporal differences in the time of sample isolation, or from the lack of cell-culture adaptation in these new genomes. We also noted a distinctive mutation in the gene encoding glycoprotein L (gL; also known as UL1 or MDV013). All of the field isolates had a 12-nucleotide deletion in gL that has been described previously in strains from the Eastern USA. This deletion is found predominantly in very virulent or hypervirulent strains (vv and vv+, in MDV-1 pathotyping nomenclature (174)) (231–234). This deletion falls in the putative cleavage site of gL, which is necessary for its post-translational modification in the endoplasmic reticulum (232). Glycoprotein L forms a complex with another glycoprotein, glycoprotein H (gH). The gH/gL dimer is conserved across the Herpesviridae family and has been associated with virus entry (235, 236). These field-isolated genomes also contain a number of previously characterized variations in the oncogenesis-associated Marek's EcoRI-Q-encoded protein (Meq; also known as MDV005, MDV076, and RLORF7). We observed three substitutions in the C-terminal (transactivation) domain of Meq (P153Q, P176A, P217A) (237). The first two of these variations have been previously associated with MDV-1 strains of very virulent and hypervirulent pathotypes (vv and vv+) (231, 238, 239), while the third has been shown to enhance transactivation (240). In contrast, the field isolates lacked the 59-AA insertion in the Meq proline repeats that is often associated with attenuation, as seen in the vaccine strain CVI988 and the mildly virulent strain CU-2 (225, 226, 241). We also observed a C119R substitution in all five field-derived genomes, which is absent from attenuated and mildly virulent isolates. This C119R mutation falls in the LXCXE motif of Meq, which normally binds to the tumor suppressor protein Rb to regulate cell cycle progression (231, 242). Although comprehensive in vitro and in vivo studies will be required to fully understand the biological implications of these variations, sequence comparisons of both gL and Meq from dust and feather genomes suggest that these closely resemble highly virulent (vv and vv+) variants of MDV-1 (225, 226). This is corroborated by the dendrogram (Figure 2-10), in which the dust- and feather-derived genomes cluster closely with 648a, a highly virulent (vv+) MDV isolate.
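The Jukes-Cantor correction used for the distances behind Figure 2-10 converts the observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). A minimal Python sketch for one pair of aligned sequences follows. It assumes that gapped or ambiguous positions are simply skipped (pairwise deletion), which may differ from the MEGA settings used here, and it does not reproduce the neighbor-joining or bootstrap steps.

```python
import math

VALID = set("ACGT")


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap or a non-ACGT character are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in VALID and b in VALID:
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under Jukes-Cantor")
    return -0.75 * math.log(1 - 4 * p / 3)
```

For closely related genomes such as these, p is small and d is only slightly larger than p; the correction for multiple substitutions at the same site matters more as divergence grows.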

2.5.6 Assessment of taxonomic diversity in dust and chicken feathers

As noted in Table 2-1, only a fraction of the reads obtained from each sequencing library were specific to MDV-1. We analyzed the remaining sequences to gain insight into the taxonomic diversity found in poultry dust and chicken feathers. Since our enrichment for viral capsids removed most host and environmental contaminants, the taxa observed here represent only a fraction of the material present initially. However, they provide useful insight into the overall complexity of each sample type. The results of the classification for Farm B-dust, Farm B-feather 1, and Farm B-feather 2 are shown in Figure 2-11.

[Figure 2-11 pie charts: taxonomic composition of Farm B-dust, Farm B-feather 1, and Farm B-feather 2. Categories include chicken, MDV, bacteria, plant, and Animalia; * indicates unclassified or low-prevalence sequences.]

Figure 2-11. Taxonomic diversity in dust and chicken feathers from Farm B. We used an iterative BLASTN workflow to generate taxonomic profiles for all samples from Farm B (see Methods for details). Major categories are shown here, with a full list of taxa (to family level) in Supplemental Table S2-3. Farm B-feather 1 and Farm B-feather 2 show less overall diversity, as would be expected from direct sampling of the host, compared with the environmental mixture of the dust samples. Since the viral DNA enrichment procedures remove variable amounts of host and environmental contaminants, the proportions of taxa shown are representative, but not fully descriptive, of those present initially. The asterisk indicates sequences that were unclassified or at low prevalence.

We divided the sequence reads by the different kingdoms they represent. Complete lists of taxonomic diversity for all samples, to the family level, are provided in Supplemental Table S2-3. As expected, the taxonomic diversity of dust is greater than that of feather samples. The majority of sequences in the dust samples mapped to the chicken genome, and only about 2-5% were MDV-specific (see also Table 2-1, line 6). We found that single feathers were a better source of MDV DNA, due to their reduced level of taxonomic diversity and higher percentage of MDV-specific reads (Table 2-1, line 6 and Figure 2-11).
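A summary like the one in Figure 2-11 can be produced by tallying a per-read taxonomic label and pooling rare or unclassified labels into a single "*" category. The Python sketch below assumes each read has already been assigned a label (for example, from its best BLASTN hit); the labels, the "unclassified" keyword, and the 1% low-prevalence cutoff are hypothetical choices, not the parameters of the iterative BLASTN workflow described in Methods.

```python
from collections import Counter


def summarize_taxa(labels, min_fraction=0.01, unclassified="unclassified"):
    """Return the fraction of reads per taxon, pooling rare or unclassified reads under '*'.

    labels: one taxonomic label per read (e.g. "Chicken", "MDV", "Bacteria").
    min_fraction: taxa below this fraction of all reads are pooled (hypothetical cutoff).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    summary = {}
    for taxon, n in counts.most_common():
        fraction = n / total
        key = "*" if taxon == unclassified or fraction < min_fraction else taxon
        summary[key] = summary.get(key, 0.0) + fraction
    return summary
```

Pooling minor taxa keeps the charts readable; the full family-level breakdown remains available in Supplemental Table S2-3.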

2.6 Discussion

This study presents the first description of MDV-1 genomes sequenced directly from a field setting. This work builds on recent efforts to sequence VZV and HCMV genomes directly from human clinical samples, but importantly the approaches presented here employ neither the oligonucleotide enrichment used for VZV nor the PCR-amplicon strategy used for HCMV (183, 196–198, 243). This makes our technique widely accessible and reduces potential methodological bias. It is also more rapid to implement and is applicable to the isolation of unknown large DNA viruses, since it does not rely on sequence-specific enrichment strategies. These five genomes were interrogated both by comparing consensus genomes (between-host variation) and by examining variation within each consensus genome (within-host variation). By following up with targeted PCR and Sanger sequencing, we demonstrate that HTS can rapidly enable molecular epidemiological surveillance of loci undergoing genetic shifts in the field. Although a limited number of non-synonymous differences were detected between the field samples compared here, it is striking that several of these (vLIP, LORF2, UL42) have previously been shown to have roles in virulence and immune evasion. The viral lipase gene (vLIP; MDV010) encodes a 120 kDa N-glycosylated protein that is required for lytic virus replication in chickens (218, 176). The vLIP gene of MDV-1 is homologous to genes of other viruses in the Mardivirus genus as well as to those of avian adenoviruses (244–246). The S501A mutation in the second exon of vLIP does not lie in the conserved region that bears homology to other pancreatic lipases (218). The viral protein LORF2 (MDV012) is an immune evasion protein that suppresses MHC class I expression by inhibiting TAP transporter delivery of peptides to the endoplasmic reticulum (219). LORF2 is unique to the non-mammalian Mardivirus clade, but its function is analogous to that of the mammalian alphaherpesvirus product UL49.5 (219, 247, 248). Another study has shown that LORF2 is an essential phosphoprotein with a potential role as a nuclear/cytoplasmic shuttling protein (249). Interestingly, we also observed a 22-amino-acid insertion in the DNA polymerase processivity subunit protein UL42 (MDV055). In herpesviruses, UL42 has been recognized as an integral part of the DNA polymerase complex, interacting directly with DNA and forming a heterodimer with the catalytic subunit of the polymerase (250–252). In HSV-1, the N-terminal two-thirds of UL42 have been shown to be sufficient for all known functions of UL42; the insertion in Farm B-dust falls at the edge of this N-terminal region (253). The non-synonymous mutations and insertions detected here warrant further study to evaluate their impacts on protein function and viral fitness in vivo. The fact that any coding differences were observed in this small sampling of field-derived genomes suggests that the natural ecology of MDV-1 may include mutations and adaptations in protein function, in addition to genetic drift. Drug resistance and vaccine failure have been attributed to the variation present in viral populations (183, 196, 254).
Polymorphic populations allow viruses to adapt to diverse environments and withstand changing selective pressures, such as evading the host immune system, adapting to different tissue compartments, and facilitating transmission between hosts (183, 111, 196–198, 243, 254, 95). Polymorphisms that were not fully penetrant in the consensus genomes, but that may be fodder for future selection, include residues in genes associated with virulence and immune evasion, such as ICP4 (Figure 2-9), Meq, pp38, vLIP, LORF2, and others (Supplemental Table S2-2). The non-synonymous polymorphism that we observed in Meq is a low-frequency variant present in the C-terminal domain (I201L) (Farm B-dust, Supplemental Table S2-2). However, a comparison of 88 different Meq sequences from GenBank and unpublished field isolates (216) did not reveal any examples where leucine was the dominant allele; all sequenced isolates to date have isoleucine at position 201. Previous studies have examined the accumulation of polymorphic loci in MDV-1 genomes after serial passage in vitro (193, 194). Overall, we found a similar quantity of polymorphisms in field-derived genomes as found in these prior studies, but we did not find any specific polymorphic loci that were identical between field-derived and in vitro-passaged genomes (193, 194). The genes ICP4 (MDV084 / MDV100), LORF2 (MDV012), UL42, and MDV020 contain polymorphic loci in both field and serially passaged isolates, albeit at different positions (193, 194). It is noteworthy that these coding variations are detected despite signs of clearance of polymorphisms from coding regions (Figure 2-8), as indicated by the higher-than-expected ratios of intergenic to coding polymorphisms in these genomes. Together these findings suggest that MDV-1 exhibits genetic variation and undergoes rapid selection in the field, which may underlie its ability to overcome vaccine-induced host resistance to infection (176, 255, 256, 178). For the viral transactivation protein ICP4, we explored the penetrance of a polymorphic locus (nucleotide position 5,495) both in full-length genomes and via targeted sequencing over time. Most of the work on this region of the MDV-1 genome has focused on the LAT transcripts that lie antisense to the ICP4 gene (222, 223). This polymorphic locus could thus affect either ICP4's coding sequence (AA 1,832, serine vs. tyrosine) or the sequence of the LAT transcripts.
This variation in ICP4 lies in the C-terminal domain, which in HSV-1 has been implicated in the DNA synthesis, late gene expression, and intranuclear localization functions of ICP4 (257, 258). This combination of deep-sequencing genomic approaches to detect new polymorphic loci and fast gene-specific surveillance to track changes in SNP frequency over a larger number of samples illustrates how high-quality full-genome sequences from field samples can provide powerful new markers for field ecology. Our comparison of new field-isolated MDV-1 genomes revealed a distinct genetic clustering of these genomes, separate from other previously sequenced MDV-1 genomes (Figure 2-10). This pattern may result from geographic and temporal drift in these strains, or from the wild, virulent nature of these strains vs. the adaptation(s) to tissue culture in all prior MDV-1 genome sequences. The impact of geography on the genetic relatedness of herpesvirus genomes has been previously shown for related alphaherpesviruses such as VZV and HSV-1 (4, 227–230). Phenomena such as recombination can also affect the clustering pattern of MDV isolates. It is worth noting that the genetic distance dendrogram constructed here included genomes from isolates that were collected over a 40-year span, which introduces the potential for temporal drift (155, 156, 158, 159, 161, 187, 188, 225, 226). Agricultural and farming practices have evolved significantly during this time, and we presume that pathogens have kept pace. To truly understand the global diversity of MDV, future studies will need to include the impacts of recombination and polymorphisms within samples, in addition to the overall consensus-genome differences reflected by static genetic distance analyses. Prior studies have shown that when MDV is passaged for multiple generations in cell culture, the virus accumulates a series of mutations, including several that affect virulence (193). The same is true for the betaherpesvirus HCMV (192). Extended passage in vitro forms the basis of vaccine attenuation strategies, as for the successful vaccine strain (vOka) of the alphaherpesvirus VZV (259). Cultured viruses can undergo bottlenecks during initial adaptation to cell culture, and they may accumulate variations and loss-of-function mutations by genetic drift or positive selection. The variations and mutations thus accumulated may have little relationship to virulence and to the balance of variation and selection in the field. We thus anticipate that these field-isolated viral genomes more accurately reflect the genomes of wild MDV-1 circulating in the field.
The ability to access and compare viruses from virulent infections in the field will enable future analyses of vaccine-break viruses. Our data and approaches provide powerful new tools to measure viral diversity in field settings, and to track changes in large DNA virus populations over time in hosts and ecosystems. In the case of MDV-1, targeted surveillance based on an initial genomic survey could be used to track viral spread across a geographic area, or between multiple end-users associated with a single parent corporation (Figure 2-12).


Figure 2-12: Methods described above can be used to explore additional aspects of variation in future studies. (Artwork by Nick Sloff, Penn State University, Department of Entomology)

Similar approaches could be implemented for public- or animal-health programs, for instance to guide management decisions on how to limit pathogen spread and contain airborne pathogens. The ability to sequence and compare large viral genomes directly from individual hosts and field sites will allow a new level of interrogation of host-virus fitness interactions, which form the basis of host resistance to infection (Figure 2-12). Finally, the analysis of viral genomes from single feather follicles, as from single VZV vesicles, provides first insights into naturally occurring within-host variation during infection and transmission (Figure 2-12). Evidence from tissue compartmentalization studies in HCMV and VZV suggests that viral genomes differ in distinct body niches (183, 198, 243). These new techniques will enable us to ask similar questions about MDV-1, and to begin exploring the relative fitness levels of viruses found in different tissue compartments.

2.7 Acknowledgements

We thank Sue Baigent, Michael DeGiorgio, Peter Kerr, and members of the Szpara and Read labs for helpful feedback and discussion. This work was supported and inspired by the Center for Infectious Disease Dynamics and the Huck Institutes of the Life Sciences, as well as by startup funds (MLS) from the Pennsylvania State University. This work was funded in part by the National Institute of General Medical Sciences, National Institutes of Health (R01GM105244; AFR) as part of the joint NSF-NIH-USDA Ecology and Evolution of Infectious Diseases program. The findings and conclusions of this study do not necessarily reflect the view of the funding agencies.


2.8 Supplementary tables

Supplementary Table S2-1: Yield (percent MDV-1, MDV-2, and MDV-1 + MDV-2) and total nanograms of DNA in each sample for Farm A-dust 1, Farm A-dust 2, Farm B-dust, and the two feather samples

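| Sample | Replicate | Washes on 0.1 μm filterᵃ | %MDV-1 | %MDV-2 | %MDV-1 + MDV-2 | DNA (ng) |
|---|---|---|---|---|---|---|
| Farm A-dust 1 | 1 | 0 | 2.88 | 5.44 | 8.3 | 6.94 |
| Farm A-dust 1 | 2 | 0 | 2.03 | 5.12 | 7.2 | 6.59 |
| Farm A-dust 1 | 3 | 0 | 4.16 | 8.39 | 12.5 | 6.73 |
| Farm A-dust 1 | 4 | 0 | 2.51 | 4.73 | 7.2 | 4.71 |
| Farm A-dust 1 | 5 | 0 | 1.66 | 3.3 | 4.96 | 6.97 |
| Farm A-dust 1 | 6 | 1 | 9.13 | 13.99 | 23.12 | 2.69 |
| Farm A-dust 1 | 7 | 1 | 9.29 | 15.7 | 24.99 | 2.16 |
| Farm A-dust 1 | 8 | 1 | 5.86 | 10.91 | 16.77 | 3.36 |
| Farm A-dust 1 | 9 | 0 | 1.89 | 2.98 | 4.9 | 9.81 |
| Farm A-dust 1 | 10 | 0 | 1.76 | 2.9 | 4.7 | 17.35 |
| Farm A-dust 1 | 11 | 0 | 2.69 | 5.33 | 8.02 | 8.96 |
| Farm A-dust 1 | 12 | 0 | 4.49 | 7.8 | 12.29 | 4.14 |
| Farm A-dust 1 | 13 | 0 | 1.16 | 2.49 | 3.65 | 20 |
| Farm A-dust 1 | 14 | 0 | 1.36 | 2.83 | 4.19 | 19.47 |
| Farm A-dust 2 | 1 | 0 | 1.5 | 3.16 | 4.66 | 10.69 |
| Farm A-dust 2 | 2 | 0 | 2.55 | 5.62 | 8.17 | 7.18 |
| Farm A-dust 2 | 3 | 0 | 1.36 | 3.68 | 5.04 | 7.62 |
| Farm A-dust 2 | 4 | 0 | 1.38 | 2.94 | 4.32 | 9.84 |
| Farm A-dust 2 | 5 | 1 | 2.71 | 6.19 | 8.9 | 4.11 |
| Farm A-dust 2 | 6 | 1 | 3.08 | 5.87 | 8.95 | 4.37 |
| Farm A-dust 2 | 7 | 1 | 2.68 | 4.91 | 7.59 | 5.88 |
| Farm A-dust 2 | 8 | 1 | 3.49 | 6.24 | 9.73 | 4.88 |
| Farm A-dust 2 | 9 | 1 | 4.09 | 7.94 | 12.03 | 2.66 |
| Farm A-dust 2 | 10 | 1 | 6.42 | 10.52 | 16.94 | 3.15 |
| Farm A-dust 2 | 11 | 0 | 0.26 | 0.91 | 1.17 | 20.35 |
| Farm A-dust 2 | 12 | 0 | 0.19 | 0.56 | 0.75 | 26.09 |
| Farm A-dust 2 | 13 | 0 | 0.24 | 0.93 | 1.17 | 15.13 |
| Farm A-dust 2 | 14 | 0 | 0.36 | 1.21 | 1.57 | 5.62 |
| Farm B-dust | 1 | 0 | 0.84 | 6.68 | 7.52 | 14.1 |
| Farm B-dust | 2 | 0 | 0.46 | 5.2 | 5.66 | 26.64 |
| Farm B-dust | 3 | 0 | 0.65 | 4.85 | 5.5 | 19.43 |
| Farm B-dust | 4 | 0 | 0.75 | 5.91 | 6.66 | 16.84 |
| Farm B-dust | 5 | 0 | 0.23 | 3.67 | 3.9 | 25.9 |
| Farm B-dust | 6 | 0 | 0.53 | 4.65 | 5.18 | 23.5 |
| Farm B-dust | 7 | 1 | 1.1 | 14.5 | 15.6 | 4.59 |
| Farm B-dust | 8 | 1 | 0.95 | 15.77 | 16.72 | 4.29 |
| Farm B-dust | 9 | 1 | 0.95 | 14.4 | 15.35 | 4.81 |
| Farm B-dust | 10 | 1 | 1.02 | 10.69 | 11.71 | 3.59 |
| Feathers | Feather 1 | | 40.59 | 0.12 | 40.72 | 11.97 |
| Feathers | Feather 2 | | 5.68 | 0.02 | 5.70 | 27.36 |

ᵃ Samples that were washed before lysis (Washes = 1) yielded a higher percent MDV DNA, but less overall DNA.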
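The trade-off stated in the footnote to Supplementary Table S2-1 can be checked with a few lines of Python. The sketch below uses the Farm A-dust 1 rows exactly as listed in the table and reports group means only; it is a descriptive summary, not a statistical test, and the function name is illustrative.

```python
from statistics import mean

# Farm A-dust 1 rows from Supplementary Table S2-1:
# (washed on 0.1 um filter, %MDV-1, DNA in ng)
FARM_A_DUST_1 = [
    (0, 2.88, 6.94), (0, 2.03, 6.59), (0, 4.16, 6.73), (0, 2.51, 4.71),
    (0, 1.66, 6.97), (1, 9.13, 2.69), (1, 9.29, 2.16), (1, 5.86, 3.36),
    (0, 1.89, 9.81), (0, 1.76, 17.35), (0, 2.69, 8.96), (0, 4.49, 4.14),
    (0, 1.16, 20.0), (0, 1.36, 19.47),
]


def means_by_wash(rows):
    """Return {wash flag: (mean %MDV-1, mean DNA in ng)}."""
    groups = {}
    for washed, pct_mdv1, dna_ng in rows:
        groups.setdefault(washed, []).append((pct_mdv1, dna_ng))
    return {
        washed: (mean(p for p, _ in vals), mean(d for _, d in vals))
        for washed, vals in groups.items()
    }
```

For Farm A-dust 1 this gives a mean of about 8.1% MDV-1 and 2.7 ng DNA for the three washed samples, versus about 2.4% MDV-1 and 10.2 ng DNA for the eleven unwashed samples.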

56 Supplementary Table S2-2: Summary and annotation of all polymorphic loci detected in MDV-1 consensus genomes

Farm A/dust 1 (high stringency) Percent Percent Reads Reads Reads Reads reads reads supporting supporting supporting supporting Position Minor supporting supporting Major Minor major major minor minor Type of in the allele minor minor Gene Function Isolate allele allele allele on allele on allele on allele on variation genome frequency allele on allele on forward reverse forward reverse forward reverse strand strand strand strand strand strand Farm A/dust 1 115376 C A 8.16% 93 42 6 6 50% 50.00% Intergenic N/A N/A Farm A/dust 1 115377 C A 29.20% 51 29 21 12 64% 36.36% Intergenic N/A N/A Farm A/dust 1 137099 A C 34.74% 42 20 18 15 55% 45.45% Intergenic N/A N/A Farm A/dust 1 137101 A C 5.93% 75 36 4 3 57% 42.86% Intergenic N/A N/A Farm A/dust 1 137264 A G 5.22% 87 40 5 2 71% 28.57% Intergenic N/A N/A Farm A/dust 1 138209 C A 8.27% 88 34 8 3 73% 27.27% Intergenic N/A N/A Farm A/dust 1 138281 A C 12.82% 47 21 6 4 60% 40.00% Intergenic N/A N/A Farm A/dust 2 (high stringency) Percent Percent Reads Reads Reads Reads reads reads supporting supporting supporting supporting Position Minor supporting supporting Major Minor major major minor minor Type of Isolate in the allele minor minor Gene Function allele allele allele on allele on allele on allele on variation genome frequency allele on allele on forward reverse forward reverse forward reverse strand strand strand strand strand strand Non- gH, glycoprotein H; UL22 homolog; Farm A/dust 2 40519 G A 19.88% 51 78 14 18 44% 56% synonymous MDV034 heterodimer with gL; part of variant fusion/entry complex Non- gB, glycoprotein B; UL27 homolog; Farm A/dust 2 48554 C T 9.68% 114 54 13 5 72% 28% synonymous MDV040 part of fusion/entry complex variant Farm A/dust 2 116937 G A 12.67% 86 176 11 27 29% 71% Intergenic N/A N/A

Meq; oncogene; role in tumor Farm A/dust 2 121872 T C 36.76% 52 108 35 58 38% 62% Genic_UTR MDV076 formation; no HSV homolog Non- ICP4 (RS1) homolog; transactivator Farm A/dust 2 130968 G T 43.48% 77 105 52 88 37% 63% synonymous MDV084 of gene expression; immediate-early variant protein Farm A/dust 2 137156 C A 5.17% 81 139 4 8 33% 67% Intergenic N/A N/A Farm A/dust 2 138433 C A 6.54% 215 85 13 8 62% 38% Intergenic N/A N/A

57 Supplementary Table S2-2: Summary and annotation of all polymorphic loci detected in MDV-1 consensus genomes

Farm A/dust 2 138436 C G 7.12% 221 92 18 6 75% 25% Intergenic N/A N/A Farm A/dust 2 138437 T A 8.19% 218 96 20 8 71% 29% Intergenic N/A N/A Farm A/dust 2 138505 C A 28.91% 73 136 52 33 61% 39% Intergenic N/A N/A Farm A/dust 2 138506 A C 9.15% 117 161 15 13 54% 46% Intergenic N/A N/A Farm A/dust 2 138593 C G 6.19% 81 222 10 10 50% 50% Intergenic N/A N/A Farm A/dust 2 138594 G A 12.81% 76 203 13 28 32% 68% Intergenic N/A N/A Farm A/dust 2 138596 T C 5.35% 81 220 9 8 53% 47% Intergenic N/A N/A Farm A/dust 2 138599 A C 5.40% 87 211 5 12 29% 71% Intergenic N/A N/A Farm A/dust 2 138748 A G 19.15% 12 64 5 13 28% 72% Intergenic N/A N/A Farm B/dust (high stringency) Non- vLIP; lipase homolog; role in Farm B/dust 2072 T G 43.64% 90 43 66 37 64% 36% synonymous MDV010 virulence in vivo; no HSV homolog variant Synonymous DNA helicase-primase subunit; UL8 Farm B/dust 15775 C T 45.76% 94 53 78 46 63% 37% MDV020 variant homolog; role in DNA replication large tegument protein; VP1/2 Non- (UL36) homolog; ubiquitin specific Farm B/dust 65843 A G 11.30% 39 118 5 15 25% 75% synonymous MDV049 protease; complexed w/ UL37 variant tegument protein Non- UL43 homolog; probably membrane Farm B/dust 86626 T C 40.19% 114 78 78 51 60% 40% synonymous MDV056 protein; non-essential in vitro variant LORF5; function unknown; no HSV Farm B/dust 108743 T C 41.74% 173 95 119 73 62% 38% Genic_UTR MDV072 homolog

Farm B/dust 115231 A C 21.80% 127 221 36 61 37% 63% Intergenic N/A N/A Farm B/dust 115232 A C 14.99% 161 270 19 57 25% 75% Intergenic N/A N/A Meq; oncogene; role in tumor Farm B/dust 121656 C T 37.22% 76 118 43 72 37% 63% Genic_UTR MDV076 formation; no HSV homolog Farm B/dust 124841 T C 41.75% 188 151 130 113 53% 47% Intergenic N/A N/A Farm B/dust 137449 A C 45.05% 149 62 103 70 60% 40% Intergenic N/A N/A Farm B/dust 138199 T A 5.64% 393 142 24 8 75% 25% Intergenic N/A N/A Farm B/dust 138267 C A 37.23% 145 231 81 142 36% 64% Intergenic N/A N/A Farm B/dust 138268 A C 7.74% 188 396 25 24 51% 49% Intergenic N/A N/A

Supplementary Table S2-2: Summary and annotation of all polymorphic loci detected in MDV-1 consensus genomes

Farm B/dust | 138355 | C | G | 5.22% | 83 | 407 | 9 | 18 | 33% | 67% | Intergenic | N/A | N/A
Farm B/dust | 138356 | G | A | 12.99% | 74 | 368 | 15 | 51 | 23% | 77% | Intergenic | N/A | N/A
Farm B/dust | 138361 | A | C | 5.24% | 89 | 381 | 6 | 20 | 23% | 77% | Intergenic | N/A | N/A
Farm B/dust | 138510 | A | G | 27.94% | 25 | 73 | 10 | 28 | 26% | 74% | Intergenic | N/A | N/A

Farm A/dust 1 (low stringency)
Isolate | Position in the genome | Major allele | Minor allele | Minor allele frequency | Reads supporting major allele on forward strand | Reads supporting major allele on reverse strand | Reads supporting minor allele on forward strand | Reads supporting minor allele on reverse strand | Percent reads supporting minor allele on forward strand | Percent reads supporting minor allele on reverse strand | Type of variation | Gene | Function
Farm A/dust 1 | 30905 | G | A | 7.61% | 53 | 32 | 2 | 5 | 29% | 71% | Synonymous variant | MDV030 | capsid protein VP23; UL18 homolog; DNA packaging terminase subunit 1; DNA encapsidation
Farm A/dust 1 | 40519 | G | A | 17.78% | 56 | 18 | 12 | 4 | 75% | 25% | Non-synonymous variant | MDV034 | gH; glycoprotein H; UL22 homolog; heterodimer with gL; part of fusion/entry complex
Farm A/dust 1 | 43053 | G | A | 17.24% | 48 | 24 | 12 | 3 | 80% | 20% | Genic_UTR | MDV035 | UL24 homolog; nuclear protein
Farm A/dust 1 | 113439 | G | T | 7.61% | 66 | 19 | 6 | 1 | 86% | 14% | Non-synonymous variant | MDV073 | pp38; 38 kDa phosphoprotein; role in pathogenesis; necessary for infection of B cells and latency in T cells; no HSV homolog
Farm A/dust 1 | 115376 | C | A | 8.16% | 93 | 42 | 6 | 6 | 50% | 50% | Intergenic | N/A | N/A
Farm A/dust 1 | 115377 | C | A | 29.20% | 51 | 29 | 21 | 12 | 64% | 36% | Intergenic | N/A | N/A
Farm A/dust 1 | 117527 | G | T | 4.55% | 132 | 57 | 6 | 3 | 67% | 33% | Intergenic | N/A | N/A
Farm A/dust 1 | 121803 | C | T | 26.11% | 92 | 41 | 39 | 8 | 83% | 17% | Genic_UTR | MDV076 | Meq; oncogene; role in tumor formation; no HSV homolog
Farm A/dust 1 | 126256 | T | G | 12.96% | 43 | 4 | 4 | 3 | 57% | 43% | Intergenic | N/A | N/A
Farm A/dust 1 | 132086 | T | C | 11.97% | 108 | 17 | 14 | 3 | 82% | 18% | Non-synonymous variant | MDV084 | ICP4 (RS1) homolog; transactivator of gene expression; immediate-early protein
Farm A/dust 1 | 137099 | A | C | 34.74% | 42 | 20 | 18 | 15 | 55% | 45% | Intergenic | N/A | N/A
Farm A/dust 1 | 137100 | A | C | 8.41% | 58 | 40 | 8 | 1 | 89% | 11% | Intergenic | N/A | N/A
Farm A/dust 1 | 137101 | A | C | 5.93% | 75 | 36 | 4 | 3 | 57% | 43% | Intergenic | N/A | N/A
Farm A/dust 1 | 137264 | A | G | 5.22% | 87 | 40 | 5 | 2 | 71% | 29% | Intergenic | N/A | N/A


Farm A/dust 1 | 138209 | C | A | 8.27% | 88 | 34 | 8 | 3 | 73% | 27% | Intergenic | N/A | N/A
Farm A/dust 1 | 138212 | C | G | 6.29% | 93 | 41 | 8 | 1 | 89% | 11% | Intergenic | N/A | N/A
Farm A/dust 1 | 138213 | T | A | 7.87% | 77 | 40 | 9 | 1 | 90% | 10% | Intergenic | N/A | N/A
Farm A/dust 1 | 138281 | A | C | 12.82% | 47 | 21 | 6 | 4 | 60% | 40% | Intergenic | N/A | N/A
Farm A/dust 1 | 138377 | G | C | 8.33% | 60 | 50 | 1 | 9 | 10% | 90% | Intergenic | N/A | N/A
Farm A/dust 1 | 138379 | A | C | 9.71% | 39 | 54 | 1 | 9 | 10% | 90% | Intergenic | N/A | N/A
Farm A/dust 1 | 138381 | A | T | 10.78% | 44 | 47 | 2 | 9 | 18% | 82% | Intergenic | N/A | N/A
Farm A/dust 1 | 138490 | A | T | 12.64% | 56 | 20 | 3 | 8 | 27% | 73% | Intergenic | N/A | N/A
Farm A/dust 1 | 138492 | C | T | 10.84% | 52 | 22 | 2 | 7 | 22% | 78% | Intergenic | N/A | N/A
Farm A/dust 1 | 138523 | A | G | 35.48% | 31 | 9 | 12 | 10 | 55% | 45% | Intergenic | N/A | N/A

Farm A/dust 2 (low stringency)
Isolate | Position in the genome | Major allele | Minor allele | Minor allele frequency | Reads supporting major allele on forward strand | Reads supporting major allele on reverse strand | Reads supporting minor allele on forward strand | Reads supporting minor allele on reverse strand | Percent reads supporting minor allele on forward strand | Percent reads supporting minor allele on reverse strand | Type of variation | Gene | Function
Farm A/dust 2 | 7251 | G | A | 5.14% | 63 | 103 | 1 | 8 | 11% | 88.89% | Intergenic | N/A | N/A
Farm A/dust 2 | 22829 | A | G | 4.35% | 105 | 71 | 6 | 2 | 75% | 25.00% | Non-synonymous variant | MDV025 | serine/threonine kinase; UL13 homolog
Farm A/dust 2 | 40519 | G | A | 19.75% | 51 | 79 | 14 | 18 | 44% | 56.25% | Non-synonymous variant | MDV034 | gH, glycoprotein H; UL22 homolog; heterodimer with gL; part of fusion/entry complex
Farm A/dust 2 | 48554 | C | T | 9.68% | 114 | 54 | 13 | 5 | 72% | 27.78% | Non-synonymous variant | MDV040 | gB, glycoprotein B; UL27 homolog; part of fusion/entry complex
Farm A/dust 2 | 109048 | G | A | 5.57% | 186 | 102 | 2 | 15 | 12% | 88.24% | Genic_UTR | MDV072 | LORF5; function unknown; no HSV homolog
Farm A/dust 2 | 115422 | C | T | 2.47% | 178 | 134 | 1 | 7 | 13% | 87.50% | Intergenic | N/A | N/A
Farm A/dust 2 | 115441 | A | C | 2.20% | 129 | 225 | 6 | 2 | 75% | 25.00% | Intergenic | N/A | N/A
Farm A/dust 2 | 116937 | G | A | 12.67% | 86 | 176 | 11 | 27 | 29% | 71.05% | Intergenic | N/A | N/A
Farm A/dust 2 | 121872 | T | C | 36.76% | 52 | 108 | 35 | 58 | 38% | 62.37% | Genic_UTR | MDV076 | Meq; oncogene; role in tumor formation; no HSV homolog


Farm A/dust 2 | 126692 | C | G | 26.35% | 108 | 1 | 33 | 6 | 85% | 15.38% | Intergenic | N/A | N/A
Farm A/dust 2 | 126693 | C | T | 27.21% | 105 | 1 | 33 | 7 | 83% | 17.50% | Intergenic | N/A | N/A
Farm A/dust 2 | 130968 | G | T | 43.48% | 77 | 105 | 52 | 88 | 37% | 62.86% | Non-synonymous variant | MDV084 | ICP4 (RS1) homolog; transactivator of gene expression; immediate-early protein
Farm A/dust 2 | 137156 | C | A | 5.17% | 81 | 139 | 4 | 8 | 33% | 66.67% | Intergenic | N/A | N/A
Farm A/dust 2 | 137320 | T | A | 9.94% | 120 | 41 | 3 | 15 | 17% | 83.33% | Intergenic | N/A | N/A
Farm A/dust 2 | 138433 | C | A | 6.50% | 215 | 85 | 13 | 8 | 62% | 38.10% | Intergenic | N/A | N/A
Farm A/dust 2 | 138434 | T | A | 4.43% | 202 | 89 | 7 | 7 | 50% | 50.00% | Intergenic | N/A | N/A
Farm A/dust 2 | 138436 | C | G | 7.06% | 221 | 93 | 18 | 6 | 75% | 25.00% | Intergenic | N/A | N/A
Farm A/dust 2 | 138437 | T | A | 8.12% | 219 | 96 | 20 | 8 | 71% | 28.57% | Intergenic | N/A | N/A
Farm A/dust 2 | 138451 | G | A | 2.42% | 244 | 110 | 4 | 5 | 44% | 55.56% | Intergenic | N/A | N/A
Farm A/dust 2 | 138505 | C | A | 28.26% | 75 | 139 | 58 | 33 | 64% | 36.26% | Intergenic | N/A | N/A
Farm A/dust 2 | 138506 | A | C | 8.97% | 122 | 162 | 15 | 13 | 54% | 46.43% | Intergenic | N/A | N/A
Farm A/dust 2 | 138593 | C | G | 6.17% | 81 | 222 | 10 | 10 | 50% | 50.00% | Intergenic | N/A | N/A
Farm A/dust 2 | 138594 | G | A | 12.81% | 76 | 203 | 13 | 28 | 32% | 68.29% | Intergenic | N/A | N/A
Farm A/dust 2 | 138595 | G | C | 5.02% | 84 | 219 | 7 | 9 | 44% | 56.25% | Intergenic | N/A | N/A
Farm A/dust 2 | 138596 | T | C | 5.61% | 81 | 220 | 10 | 8 | 56% | 44.44% | Intergenic | N/A | N/A
Farm A/dust 2 | 138599 | A | C | 5.38% | 87 | 211 | 5 | 12 | 29% | 70.59% | Intergenic | N/A | N/A
Farm A/dust 2 | 138600 | G | C | 2.59% | 90 | 210 | 1 | 7 | 13% | 87.50% | Intergenic | N/A | N/A
Farm A/dust 2 | 138601 | G | T | 2.97% | 81 | 213 | 6 | 3 | 67% | 33.33% | Intergenic | N/A | N/A
Farm A/dust 2 | 138602 | G | C | 4.15% | 77 | 197 | 3 | 9 | 25% | 75.00% | Intergenic | N/A | N/A
Farm A/dust 2 | 138604 | A | C | 4.96% | 77 | 191 | 5 | 9 | 36% | 64.29% | Intergenic | N/A | N/A
Farm A/dust 2 | 138606 | A | T | 4.63% | 75 | 192 | 3 | 10 | 23% | 76.92% | Intergenic | N/A | N/A


Farm A/dust 2 | 138748 | A | G | 17.82% | 12 | 65 | 5 | 13 | 28% | 72.22% | Intergenic | N/A | N/A

Farm B/dust (low stringency)
Isolate | Position in the genome | Major allele | Minor allele | Minor allele frequency | Reads supporting major allele on forward strand | Reads supporting major allele on reverse strand | Reads supporting minor allele on forward strand | Reads supporting minor allele on reverse strand | Percent reads supporting minor allele on forward strand | Percent reads supporting minor allele on reverse strand | Type of variation | Gene | Function
Farm B/dust | 2072 | T | G | 44% | 90 | 43 | 66 | 37 | 64% | 36% | Non-synonymous variant | MDV010 | vLIP; lipase homolog; role in virulence in vivo; no HSV homolog
Farm B/dust | 4411 | G | T | 44% | 18 | 80 | 14 | 64 | 18% | 82% | Non-synonymous variant | MDV012 | LORF2; TAP transporter segmenter; reduces MHCI presentation; no direct HSV homolog
Farm B/dust | 13809 | G | C | 3% | 173 | 133 | 4 | 4 | 50% | 50% | Non-synonymous variant | MDV019 | virion morphogenesis & egress; UL7 homolog; tegument protein
Farm B/dust | 15775 | C | T | 46% | 94 | 53 | 78 | 46 | 63% | 37% | Synonymous variant | MDV020 | DNA helicase-primase subunit; UL8 homolog; role in DNA replication
Farm B/dust | 65764 | G | A | 10% | 66 | 93 | 3 | 14 | 18% | 82% | Non-synonymous variant | MDV049 | large tegument protein; VP1/2 (UL36) homolog; ubiquitin specific protease; complexed w/ UL37 tegument protein
Farm B/dust | 65773 | G | T | 16% | 62 | 95 | 4 | 26 | 13% | 87% | Non-synonymous variant | MDV049 | large tegument protein; VP1/2 (UL36) homolog; ubiquitin specific protease; complexed w/ UL37 tegument protein
Farm B/dust | 65796 | A | G | 35% | 53 | 85 | 9 | 66 | 12% | 88% | Synonymous variant | MDV049 | large tegument protein; VP1/2 (UL36) homolog; ubiquitin specific protease; complexed w/ UL37 tegument protein
Farm B/dust | 65804 | A | T | 34% | 61 | 89 | 11 | 67 | 14% | 86% | Non-synonymous variant | MDV049 | large tegument protein; VP1/2 (UL36) homolog; ubiquitin specific protease; complexed w/ UL37 tegument protein
Farm B/dust | 65821 | G | T | 34% | 53 | 83 | 8 | 63 | 11% | 89% | Non-synonymous variant | MDV049 | large tegument protein; VP1/2 (UL36) homolog; ubiquitin specific protease; complexed w/ UL37 tegument protein


Farm B/dust | 65843 | A | G | 11% | 39 | 118 | 5 | 15 | 25% | 75% | Non-synonymous variant | MDV049 | large tegument protein; VP1/2 (UL36) homolog; ubiquitin specific protease; complexed w/ UL37 tegument protein
Farm B/dust | 85939 | A | C | 95% | 0 | 2 | 15 | 21 | 42% | 58% | Non-synonymous variant | MDV055 | DNA polymerase processivity subunit; UL42 homolog; dsDNA binding protein
Farm B/dust | 85954 | C | T | 29% | 14 | 18 | 11 | 2 | 85% | 15% | Non-synonymous variant | MDV055 | DNA polymerase processivity subunit; UL42 homolog; dsDNA binding protein
Farm B/dust | 85959 | C | A | 36% | 22 | 17 | 16 | 6 | 73% | 27% | Non-synonymous variant | MDV055 | DNA polymerase processivity subunit; UL42 homolog; dsDNA binding protein
Farm B/dust | 85961 | C | A | 39% | 28 | 16 | 20 | 8 | 71% | 29% | Non-synonymous variant | MDV055 | DNA polymerase processivity subunit; UL42 homolog; dsDNA binding protein
Farm B/dust | 85962 | G | A | 40% | 25 | 10 | 19 | 4 | 83% | 17% | Non-synonymous variant | MDV055 | DNA polymerase processivity subunit; UL42 homolog; dsDNA binding protein
Farm B/dust | 85963 | T | C | 56% | 7 | 12 | 18 | 6 | 75% | 25% | Non-synonymous variant | MDV055 | DNA polymerase processivity subunit; UL42 homolog; dsDNA binding protein
Farm B/dust | 85966 | G | A | 9% | 46 | 22 | 3 | 4 | 43% | 57% | Non-synonymous variant | MDV055 | DNA polymerase processivity subunit; UL42 homolog; dsDNA binding protein
Farm B/dust | 85971 | G | C | 11% | 47 | 18 | 4 | 4 | 50% | 50% | Non-synonymous variant | MDV055 | DNA polymerase processivity subunit; UL42 homolog; dsDNA binding protein
Farm B/dust | 85974 | G | C | 9% | 49 | 21 | 2 | 5 | 29% | 71% | Non-synonymous variant | MDV055 | DNA polymerase processivity subunit; UL42 homolog; dsDNA binding protein
Farm B/dust | 86626 | T | C | 40% | 114 | 78 | 78 | 51 | 60% | 40% | Non-synonymous variant | MDV056 | UL43 homolog; probably membrane protein; non-essential in vitro
Farm B/dust | 108743 | T | C | 42% | 173 | 95 | 119 | 73 | 62% | 38% | Genic_UTR | MDV072 | LORF5; function unknown; no HSV homolog
Farm B/dust | 108856 | G | A | 10% | 254 | 164 | 6 | 41 | 13% | 87% | Genic_UTR | MDV072 | LORF5; function unknown; no HSV homolog


Farm B/dust | 108899 | A | G | 11% | 227 | 192 | 7 | 45 | 13% | 87% | Genic_UTR | MDV072 | LORF5; function unknown; no HSV homolog
Farm B/dust | 109012 | T | C | 5% | 184 | 176 | 2 | 16 | 11% | 89% | Genic_UTR | MDV072 | LORF5; function unknown; no HSV homolog
Farm B/dust | 114927 | T | G | 2% | 331 | 276 | 13 | 2 | 87% | 13% | Intergenic | N/A | N/A
Farm B/dust | 115231 | A | C | 22% | 127 | 221 | 36 | 61 | 37% | 63% | Intergenic | N/A | N/A
Farm B/dust | 115232 | A | C | 15% | 161 | 270 | 19 | 57 | 25% | 75% | Intergenic | N/A | N/A
Farm B/dust | 115241 | C | A | 4% | 155 | 360 | 17 | 6 | 74% | 26% | Intergenic | N/A | N/A
Farm B/dust | 116288 | T | A | 2% | 309 | 248 | 3 | 9 | 25% | 75% | Intergenic | N/A | N/A
Farm B/dust | 120327 | A | G | 29% | 4 | 64 | 3 | 25 | 11% | 89% | Intergenic | N/A | N/A
Farm B/dust | 121181 | A | C | 2% | 203 | 194 | 4 | 6 | 40% | 60% | Non-synonymous variant | MDV076 | Meq; oncogene; role in tumor formation; no HSV homolog
Farm B/dust | 121656 | C | T | 37% | 76 | 118 | 43 | 72 | 37% | 63% | Genic_UTR | MDV076 | Meq; oncogene; role in tumor formation; no HSV homolog
Farm B/dust | 122052 | A | T | 3% | 381 | 231 | 14 | 3 | 82% | 18% | Genic_UTR | MDV076 | Meq; oncogene; role in tumor formation; no HSV homolog
Farm B/dust | 124841 | T | C | 42% | 188 | 151 | 130 | 113 | 53% | 47% | Intergenic | N/A | N/A
Farm B/dust | 127347 | G | T | 42% | 35 | 178 | 22 | 130 | 14% | 86% | Intergenic | N/A | N/A
Farm B/dust | 137081 | A | C | 4% | 97 | 66 | 1 | 6 | 14% | 86% | Intergenic | N/A | N/A
Farm B/dust | 137101 | A | G | 2% | 137 | 198 | 7 | 1 | 88% | 13% | Intergenic | N/A | N/A
Farm B/dust | 137102 | A | G | 4% | 119 | 199 | 10 | 2 | 83% | 17% | Intergenic | N/A | N/A
Farm B/dust | 137249 | A | G | 3% | 112 | 193 | 7 | 1 | 88% | 13% | Intergenic | N/A | N/A
Farm B/dust | 137449 | A | C | 45% | 149 | 62 | 103 | 70 | 60% | 40% | Intergenic | N/A | N/A
Farm B/dust | 138195 | C | A | 4% | 402 | 137 | 15 | 5 | 75% | 25% | Intergenic | N/A | N/A
Farm B/dust | 138196 | T | A | 3% | 383 | 138 | 10 | 7 | 59% | 41% | Intergenic | N/A | N/A
Farm B/dust | 138198 | C | G | 5% | 405 | 142 | 18 | 8 | 69% | 31% | Intergenic | N/A | N/A
Farm B/dust | 138199 | T | A | 6% | 393 | 142 | 24 | 8 | 75% | 25% | Intergenic | N/A | N/A
Farm B/dust | 138266 | C | A | 4% | 266 | 386 | 19 | 9 | 68% | 32% | Intergenic | N/A | N/A
Farm B/dust | 138267 | C | A | 37% | 145 | 231 | 81 | 142 | 36% | 64% | Intergenic | N/A | N/A


Farm B/dust | 138268 | A | C | 8% | 188 | 396 | 25 | 24 | 51% | 49% | Intergenic | N/A | N/A
Farm B/dust | 138355 | C | G | 5% | 83 | 407 | 9 | 18 | 33% | 67% | Intergenic | N/A | N/A
Farm B/dust | 138356 | G | A | 13% | 74 | 368 | 15 | 51 | 23% | 77% | Intergenic | N/A | N/A
Farm B/dust | 138357 | G | C | 4% | 83 | 397 | 5 | 14 | 26% | 74% | Intergenic | N/A | N/A
Farm B/dust | 138358 | T | C | 4% | 82 | 399 | 8 | 12 | 40% | 60% | Intergenic | N/A | N/A
Farm B/dust | 138361 | A | C | 5% | 89 | 381 | 6 | 20 | 23% | 77% | Intergenic | N/A | N/A
Farm B/dust | 138362 | G | C | 3% | 95 | 379 | 2 | 11 | 15% | 85% | Intergenic | N/A | N/A
Farm B/dust | 138364 | G | C | 4% | 80 | 355 | 5 | 15 | 25% | 75% | Intergenic | N/A | N/A
Farm B/dust | 138366 | A | C | 5% | 79 | 343 | 8 | 14 | 36% | 64% | Intergenic | N/A | N/A
Farm B/dust | 138368 | A | T | 3% | 79 | 345 | 2 | 13 | 13% | 87% | Intergenic | N/A | N/A
Farm B/dust | 138476 | A | C | 4% | 111 | 79 | 2 | 5 | 29% | 71% | Intergenic | N/A | N/A
Farm B/dust | 138510 | A | G | 28% | 25 | 73 | 10 | 28 | 26% | 74% | Intergenic | N/A | N/A

Farm B/feather 1 (low stringency)
Isolate | Position in the genome | Major allele | Minor allele | Minor allele frequency | Reads supporting major allele on forward strand | Reads supporting major allele on reverse strand | Reads supporting minor allele on forward strand | Reads supporting minor allele on reverse strand | Percent reads supporting minor allele on forward strand | Percent reads supporting minor allele on reverse strand | Type of variation | Gene | Function
Farm B/feather 1 | 12176 | C | A | 15.56% | 17 | 20 | 6 | 1 | 86% | 14% | Non-synonymous variant | MDV018 | capsid portal protein; UL6 homolog; DNA encapsidation
Farm B/feather 1 | 23126 | G | T | 12.28% | 28 | 21 | 2 | 5 | 29% | 71% | Non-synonymous variant | MDV025 | serine/threonine kinase; UL13 homolog

Farm B/feather 2 (low stringency)
Isolate | Position in the genome | Major allele | Minor allele | Minor allele frequency | Reads supporting major allele on forward strand | Reads supporting major allele on reverse strand | Reads supporting minor allele on forward strand | Reads supporting minor allele on reverse strand | Percent reads supporting minor allele on forward strand | Percent reads supporting minor allele on reverse strand | Type of variation | Gene | Function
Farm B/feather 2 | 122174 | C | T | 10.94% | 32 | 25 | 1 | 6 | 14% | 85.71% | Genic_UTR | MDV076 | Meq; oncogene; role in tumor formation; no HSV homolog


Farm B/feather 2 | 122204 | C | T | 15.79% | 21 | 27 | 2 | 7 | 22% | 77.78% | Genic_UTR | MDV076 | Meq; oncogene; role in tumor formation; no HSV homolog
Farm B/feather 2 | 128489 | G | T | 10.13% | 38 | 33 | 2 | 6 | 25% | 75.00% | Intergenic | N/A | N/A
Farm B/feather 2 | 144479 | C | A | 14.00% | 33 | 10 | 5 | 2 | 71% | 28.57% | Non-synonymous variant | MDV092 | serine/threonine kinase; US3 homolog
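The derived columns of Table S2-2 follow from the four strand-specific read counts. The Python sketch below recomputes them, assuming that read depth at a locus is simply the sum of the major- and minor-allele reads on both strands. At a few loci (for example Farm A/dust 2, position 138748) the published frequency implies a slightly larger depth, possibly because reads carrying a third base were counted, so small discrepancies are expected. The function name `variant_stats` is illustrative, not part of the original pipeline.

```python
def variant_stats(major_fwd, major_rev, minor_fwd, minor_rev):
    """Recompute the derived Table S2-2 columns from strand-specific read counts.

    Assumes depth = major-allele reads + minor-allele reads (both strands).
    Returns unrounded percentages.
    """
    minor_total = minor_fwd + minor_rev
    if minor_total == 0:
        raise ValueError("need at least one minor-allele read")
    depth = major_fwd + major_rev + minor_total
    return {
        "minor_allele_frequency": 100.0 * minor_total / depth,
        "pct_minor_forward": 100.0 * minor_fwd / minor_total,
        "pct_minor_reverse": 100.0 * minor_rev / minor_total,
    }


# Farm A/dust 1, position 30905 (G>A): 53 and 32 major reads, 2 and 5 minor reads
stats = variant_stats(53, 32, 2, 5)
print({name: round(value, 2) for name, value in stats.items()})
# {'minor_allele_frequency': 7.61, 'pct_minor_forward': 28.57, 'pct_minor_reverse': 71.43}
```

These values match the published 7.61%, 29% and 71% for that locus once the strand percentages are rounded to whole numbers.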

Supplementary Table S2-3: Summary of classification to family level for all samples

Family Name | Farm B/dust | Farm B/feather 1 | Farm B/feather 2 | Farm A/dust 1 | Farm A/dust 2
Herpesviridae | 1,372,838 | 103,939 | 165,216 | 370,393 | 512,531
Chicken | 19,119,306 | 212,314 | 121,060 | 11,783,342 | 21,215,016
<100 | 570,555 | 50,009 | 27,895 | 794,467 | 533,986
Gramineae | 48,512 | 6,145 | 13,189 | 17,909 | 59,709
Meleagrididae | 47,292 | 9,570 | 6,352 | 16,493 | 76,814
Methylobacteriaceae | 35,567 | 2,660 | 5,205 | 356 | 58,844
other sequences | 27,430 | 1,678 | 3,012 | 10,400 | 14,568
Propionibacteriaceae | 15,405 | 0 | 148 | 6,197 | 12,459
Bovidae | 1,131,177 | 0 | 0 | 8,055 | 39,873
Bradyrhizobiaceae | 986,082 | 0 | 0 | 682 | 99,710
Dermabacteraceae | 605,631 | 0 | 0 | 75,916 | 375,290
Staphylococcaceae | 337,057 | 0 | 0 | 178,747 | 240,507
Corynebacteriaceae | 229,208 | 0 | 0 | 52,055 | 129,345
unclassified | 196,911 | 0 | 0 | 147,502 | 109,899
Lactobacillaceae | 196,157 | 0 | 0 | 147,351 | 195,604
Babesiidae | 141,039 | 0 | 0 | 3,930 | 10,966
Bacillaceae | 104,791 | 0 | 0 | 77,052 | 97,930
Vira | 102,266 | 0 | 0 | 7,700 | 31,685
Sphingobacteriaceae | 77,220 | 0 | 0 | 36,775 | 68,077
Bacteroidaceae | 72,040 | 0 | 0 | 32,391 | 64,454
Streptococcaceae | 63,088 | 0 | 0 | 23,633 | 42,208
Actinoplanaceae | 55,273 | 0 | 0 | 19,379 | 52,946
Lachnospiraceae | 54,841 | 0 | 0 | 35,767 | 100,136
Siphoviridae | 54,286 | 0 | 0 | 28,164 | 74,766
Clostridiaceae | 53,824 | 0 | 0 | 18,827 | 43,915
biota | 53,379 | 0 | 0 | 20,321 | 64,013
Peptostreptococcaceae | 49,713 | 0 | 0 | 4,585 | 29,781
Ruminococcaceae | 36,850 | 0 | 0 | 22,980 | 63,335
Enterobacteraceae | 35,496 | 0 | 0 | 11,304 | 72,340
Micrococcaceae | 30,584 | 0 | 0 | 13,502 | 28,318
Nocardiaceae | 26,664 | 0 | 0 | 6,869 | 20,515
Myoviridae | 25,744 | 0 | 0 | 26,447 | 56,563
Actinosynnemataceae | 24,644 | 0 | 0 | 7,354 | 21,047
Enterococcaceae | 24,224 | 0 | 0 | 13,055 | 23,442
Mycobacteriaceae | 23,366 | 0 | 0 | 6,151 | 18,818
Pseudomonadaceae | 21,552 | 0 | 0 | 2,478 | 11,341
Burkholderiaceae | 21,347 | 0 | 0 | 0 | 5,960
Rhizobiaceae | 16,069 | 0 | 0 | 103 | 4,745
Nocardiopsaceae | 14,995 | 0 | 0 | 11,804 | 22,330
Sphingomonadaceae | 14,945 | 0 | 0 | 407 | 15,444
Campylobacter group | 13,811 | 0 | 0 | 6,182 | 5,466
Porphyromonadaceae | 13,008 | 0 | 0 | 6,759 | 25,153
Comamonadaceae | 11,483 | 0 | 0 | 623 | 6,288

Rikenellaceae | 11,272 | 0 | 0 | 13,944 | 61,763
Cellulomonadaceae | 10,982 | 0 | 0 | 3,894 | 9,435
Xanthobacteraceae | 10,979 | 0 | 0 | 389 | 2,367
Rhodospirillaceae | 10,498 | 0 | 0 | 499 | 3,578
Microbacteriaceae | 9,978 | 0 | 0 | 3,217 | 8,655
Cervidae | 9,471 | 0 | 0 | 0 | 0
Lysobacteraceae | 9,311 | 0 | 0 | 1,026 | 5,336
Phyllobacteriaceae | 9,070 | 0 | 0 | 217 | 2,283
Promicromonosporaceae | 8,756 | 0 | 0 | 2,975 | 7,802
Dermacoccaceae | 8,713 | 0 | 0 | 2,667 | 7,232
Bifidobacteriaceae | 8,667 | 0 | 0 | 5,241 | 52,059
Coriobacteriaceae | 7,818 | 0 | 0 | 6,884 | 12,377
Geodermatophilaceae | 7,635 | 0 | 0 | 2,153 | 6,594
Paenibacillaceae | 6,921 | 0 | 0 | 4,357 | 7,627
Rhodobacteraceae | 6,764 | 0 | 0 | 331 | 3,802
Caulobacteraceae | 6,531 | 0 | 0 | 685 | 2,499
Listeriaceae | 6,379 | 0 | 0 | 6,555 | 8,296
Nocardioidaceae | 6,376 | 0 | 0 | 2,592 | 5,286
Gordoniaceae | 6,318 | 0 | 0 | 1,779 | 4,944
Pasteurellaceae | 6,071 | 0 | 0 | 2,431 | 1,853
Eubacteriaceae | 5,647 | 0 | 0 | 3,392 | 8,423
Flavobacteriaceae | 5,492 | 0 | 0 | 3,837 | 4,432
Erysipelothrix group | 5,484 | 0 | 0 | 3,451 | 8,925
Alcaligenaceae | 5,223 | 0 | 0 | 561 | 2,550
Frankiaceae | 5,149 | 0 | 0 | 2,071 | 4,817
Peptococcaceae | 4,819 | 0 | 0 | 3,381 | 6,319
Beutenbergiaceae | 4,723 | 0 | 0 | 1,786 | 4,091
Sanguibacteraceae | 4,681 | 0 | 0 | 1,619 | 3,950
Carnobacteriaceae | 4,362 | 0 | 0 | 3,804 | 4,445
Kineosporiaceae | 4,278 | 0 | 0 | 1,351 | 4,031
Catenulisporaceae | 4,007 | 0 | 0 | 973 | 2,611
Megasphaera group | 3,758 | 0 | 0 | 2,066 | 4,526
Rhodocyclaceae | 3,754 | 0 | 0 | 484 | 2,155
Caryophanaceae | 3,378 | 0 | 0 | 2,287 | 3,492
Oscillospiraceae | 3,268 | 0 | 0 | 2,227 | 5,981
Delphinidae | 3,118 | 0 | 0 | 0 | 0
Intrasporangiaceae | 3,105 | 0 | 0 | 822 | 2,230
subdivision 1 | 2,960 | 0 | 0 | 134 | 2,274
Aspergillaceae | 2,918 | 0 | 0 | 0 | 0
Thermoanaerobacterales Family III. Incertae Sedis | 2,695 | 0 | 0 | 1,907 | 2,999
Acetobacteraceae | 2,675 | 0 | 0 | 0 | 1,164
Leuconostoc group | 2,675 | 0 | 0 | 2,382 | 3,363
Oxalobacteraceae | 2,523 | 0 | 0 | 154 | 1,090
Ancylobacter group | 2,446 | 0 | 0 | 0 | 977


Microsphaeraceae | 2,427 | 0 | 0 | 773 | 2,196
Desulfovibrionaceae | 2,370 | 0 | 0 | 605 | 3,042
Actinomycetaceae | 2,328 | 0 | 0 | 1,052 | 1,947
Tsukamurellaceae | 2,252 | 0 | 0 | 585 | 1,746
Fusobacteriaceae | 2,232 | 0 | 0 | 872 | 1,807
Acinetobacteraceae | 2,138 | 0 | 0 | 10,446 | 4,512
Borrelomycetaceae | 2,006 | 0 | 0 | 546 | 1,011
Cytophaga-Flexibacter group | 1,999 | 0 | 0 | 490 | 1,636
Alcanivorax/Fundibacter group | 1,945 | 0 | 0 | 484 | 1,729
Sapromycetaceae | 1,943 | 0 | 0 | 1,103 | 1,606
Spirochaetaceae | 1,866 | 0 | 0 | 1,069 | 3,796
Muridae | 1,847 | 0 | 0 | 1,525 | 3,953
Ectothiorhodospira group | 1,784 | 0 | 0 | 121 | 1,441
Deinococcaceae | 1,768 | 0 | 0 | 733 | 1,637
Nymphalidae | 1,764 | 0 | 0 | 0 | 112
Glycomycetaceae | 1,732 | 0 | 0 | 575 | 1,515
Thermoanaerobacteraceae | 1,659 | 0 | 0 | 850 | 2,119
Prevotellaceae | 1,573 | 0 | 0 | 244 | 2,229
Aeromonadaceae | 1,534 | 0 | 0 | 273 | 1,471
Anaeromyxobacteraceae | 1,496 | 0 | 0 | 326 | 1,337
Myxococcaceae | 1,458 | 0 | 0 | 443 | 1,417
Microviridae | 1,430 | 0 | 0 | 0 | 758
Geobacteraceae | 1,371 | 0 | 0 | 383 | 1,653
Peptoniphilaceae | 1,357 | 0 | 0 | 820 | 1,317
Chromobacteriaceae | 1,305 | 0 | 0 | 398 | 943
Leptotrichiaceae | 1,302 | 0 | 0 | 674 | 1,496
Sorangiaceae | 1,280 | 0 | 0 | 145 | 943
Chromatiaceae | 1,264 | 0 | 0 | 127 | 963
Rhodobiaceae | 1,239 | 0 | 0 | 0 | 245
Spiroplasmataceae | 1,234 | 0 | 0 | 824 | 1,207
Beijerinckiaceae | 1,198 | 0 | 0 | 0 | 338
Halanaerobiaceae | 1,153 | 0 | 0 | 605 | 936
Suidae | 1,106 | 0 | 0 | 294 | 1,088
Thermaceae | 1,089 | 0 | 0 | 220 | 1,568
Haloarchaeaceae | 1,084 | 0 | 0 | 0 | 215
Sporolactobacillaceae | 1,084 | 0 | 0 | 885 | 1,221
Conexibacteraceae | 1,070 | 0 | 0 | 235 | 832
Brachyspiraceae | 1,065 | 0 | 0 | 572 | 1,003
Acidobacteriaceae | 1,001 | 0 | 0 | 0 | 887
Acidimicrobiaceae | 980 | 0 | 0 | 236 | 478
Cercopithecidae | 882 | 0 | 0 | 600 | 1,146
Hominidae | 849 | 0 | 0 | 39,199 | 3,817
Nolanaceae | 847 | 0 | 0 | 4,110 | 926
Acidaminococcaceae | 818 | 0 | 0 | 515 | 1,206


Thermotogaceae | 794 | 0 | 0 | 456 | 624
Chlorobiacea | 774 | 0 | 0 | 0 | 954
Clostridiales Family XVIII. Incertae Sedis | 771 | 0 | 0 | 375 | 1,055
Dietziaceae | 755 | 0 | 0 | 129 | 338
Methanobacteriaceae | 750 | 0 | 0 | 180 | 114
Methylocystaceae | 739 | 0 | 0 | 0 | 202
Aerococcaceae | 716 | 0 | 0 | 531 | 631
Jonesiaceae | 675 | 0 | 0 | 377 | 574
Helicobacteraceae | 671 | 0 | 0 | 0 | 393
Clostridiales Family XVII. Incertae Sedis | 664 | 0 | 0 | 313 | 949
Cyprinidae | 650 | 0 | 0 | 108 | 174
Methanomassiliicoccaceae | 632 | 0 | 0 | 435 | 690
Haliangiaceae | 628 | 0 | 0 | 146 | 435
Clostridiales Family XV. Incertae Sedis | 626 | 0 | 0 | 244 | 812
Eremotheciaceae | 622 | 0 | 0 | 146 | 312
Brucellaceae | 614 | 0 | 0 | 105 | 185
Podoviridae | 570 | 0 | 0 | 262 | 2,117
Giraffidae | 569 | 0 | 0 | 0 | 0
Alteromonadaceae | 568 | 0 | 0 | 113 | 469
Mustelidae | 568 | 0 | 0 | 0 | 0
Rubrobacteraceae | 548 | 0 | 0 | 118 | 556
Leporidae | 506 | 0 | 0 | 103 | 129
Mimiviridae | 500 | 0 | 0 | 0 | 309
Alicyclobacillaceae | 498 | 0 | 0 | 122 | 564
Equidae | 494 | 0 | 0 | 0 | 0
Opitutaceae | 492 | 0 | 0 | 109 | 524
Euphorbiaceae | 483 | 0 | 0 | 0 | 0
Hyphomonadaceae | 472 | 0 | 0 | 0 | 153
Alcanivoracaceae | 471 | 0 | 0 | 132 | 289
Halobacteroidaceae | 465 | 0 | 0 | 408 | 648
Archangiaceae | 458 | 0 | 0 | 149 | 372
Sphaerobacteraceae | 458 | 0 | 0 | 0 | 330
Candidatus Brocadiaceae | 457 | 0 | 0 | 249 | 236
Segniliparaceae | 423 | 0 | 0 | 132 | 332
Schistosomatidae | 416 | 0 | 0 | 296 | 524
Desulfobulbaceae | 415 | 0 | 0 | 126 | 406
Erythrobacteraceae | 400 | 0 | 0 | 0 | 382
Piscirickettsia group | 399 | 0 | 0 | 102 | 248
Pelobacteraceae | 396 | 0 | 0 | 249 | 588
Solibacteraceae | 390 | 0 | 0 | 0 | 286
Dasyuridae | 372 | 0 | 0 | 400 | 358
Heliobacteriaceae | 371 | 0 | 0 | 198 | 502
Acaridae | 341 | 0 | 0 | 111 | 217
Rhodothermaceae | 334 | 0 | 0 | 0 | 722


Iridoviridae | 333 | 0 | 0 | 0 | 0
Cyclobacteriaceae | 328 | 0 | 0 | 184 | 264
Desulfarculaceae | 317 | 0 | 0 | 149 | 440
Trueperaceae | 309 | 0 | 0 | 104 | 200
Dictyoglomaceae | 298 | 0 | 0 | 147 | 166
Phycisphaeraceae | 296 | 0 | 0 | 135 | 370
Desulfobacteraceae | 282 | 0 | 0 | 0 | 343
Vibrionaceae | 277 | 0 | 0 | 0 | 400
Gallionella group | 275 | 0 | 0 | 0 | 311
Hydrogenothermaceae | 275 | 0 | 0 | 118 | 156
Ignavibacteriaceae | 268 | 0 | 0 | 0 | 155
Methylococcaceae | 256 | 0 | 0 | 0 | 296
Shewanellaceae | 252 | 0 | 0 | 102 | 161
Nostocaceae | 240 | 0 | 0 | 0 | 113
Nitrosomonadaceae | 229 | 0 | 0 | 104 | 271
Hydrogenophilaceae | 227 | 0 | 0 | 0 | 209
Orbaceae | 223 | 0 | 0 | 158 | 189
Acidithermaceae | 218 | 0 | 0 | 0 | 188
Caviidae | 218 | 0 | 0 | 0 | 0
Parvularculaceae | 218 | 0 | 0 | 0 | 137
Planctomycetaceae | 217 | 0 | 0 | 111 | 206
Tetrahymenidae | 214 | 0 | 0 | 0 | 0
Natranaerobiaceae | 212 | 0 | 0 | 132 | 209
Nitrospiraceae | 211 | 0 | 0 | 0 | 136
African mole-rats | 207 | 0 | 0 | 104 | 158
Caldilineaceae | 207 | 0 | 0 | 100 | 224
Deferribacteraceae | 207 | 0 | 0 | 279 | 268
Syntrophaceae | 205 | 0 | 0 | 115 | 216
Brevibacteriaceae | 200 | 0 | 0 | 0 | 0
Gemmantimonadaceae | 199 | 0 | 0 | 0 | 139
Desulfomicrobiaceae | 194 | 0 | 0 | 103 | 219
Strongylocentrotidae | 186 | 0 | 0 | 0 | 0
Thermoanaerobacterales Family IV. Incertae Sedis | 186 | 0 | 0 | 109 | 222
Fabaceae | 184 | 0 | 0 | 0 | 389
Chitinophagaceae | 178 | 0 | 0 | 0 | 239
Thermodesulfobacteriaceae | 170 | 0 | 0 | 154 | 203
Desulfurobacteriaceae | 168 | 0 | 0 | 0 | 104
Sulfuricellaceae | 163 | 0 | 0 | 0 | 0
Legionellaceae | 160 | 0 | 0 | 0 | 0
Colwelliaceae | 159 | 0 | 0 | 0 | 144
Entomoplasma group | 159 | 0 | 0 | 0 | 152
Syntrophomonadaceae | 158 | 0 | 0 | 0 | 123
Camelidae | 154 | 0 | 0 | 0 | 209
Debaryomycetaceae | 152 | 0 | 0 | 174 | 0


Cryomorphaceae | 151 | 0 | 0 | 113 | 273
Bdellovibrionaceae | 145 | 0 | 0 | 0 | 142
Ferrimonadaceae | 145 | 0 | 0 | 0 | 105
Saprospiraceae | 144 | 0 | 0 | 0 | 166
Roseiflexaceae | 140 | 0 | 0 | 0 | 324
 | 139 | 0 | 0 | 0 | 196
Tetraodontidae | 139 | 0 | 0 | 0 | 0
Chloroflexaceae | 133 | 0 | 0 | 0 | 0
Draconibacteriaceae | 130 | 0 | 0 | 0 | 117
Cneoraceae | 127 | 0 | 0 | 200 | 300
Culicidae | 124 | 0 | 0 | 0 | 131
Flammeovirgaceae | 117 | 0 | 0 | 0 | 0
Hypocreaceae | 113 | 0 | 0 | 0 | 0
Leeaceae | 112 | 0 | 0 | 216 | 218
Rivulariaceae | 111 | 0 | 0 | 0 | 0
Arthrodermataceae | 110 | 0 | 0 | 0 | 0
Desulfurellaceae | 107 | 0 | 0 | 0 | 117
Hahellaceae | 107 | 0 | 0 | 0 | 107
Acidithiobacillaceae | 103 | 0 | 0 | 0 | 247
Noelaerhabdaceae | 103 | 0 | 0 | 512 | 379
Pinaceae | 103 | 0 | 0 | 0 | 0
Sphaeriaceae | 103 | 0 | 0 | 0 | 0
Alligatoridae | 0 | 0 | 0 | 0 | 253
Balaenopteridae | 0 | 0 | 0 | 0 | 754
Chaetomiaceae | 0 | 0 | 0 | 241 | 274
Costariaceae | 0 | 0 | 0 | 130 | 0
Dipodascaceae | 0 | 0 | 0 | 220 | 0
Drosophilidae | 0 | 0 | 0 | 0 | 178
Fibrobacteraceae | 0 | 0 | 0 | 0 | 140
Francisella group | 0 | 0 | 0 | 0 | 146
Hydridae | 0 | 0 | 0 | 0 | 181
Lagriidae | 0 | 0 | 0 | 0 | 109
Loliginidae | 0 | 0 | 0 | 0 | 171
Magnetococcaceae | 0 | 0 | 0 | 0 | 102
Mamiellaceae | 0 | 0 | 0 | 0 | 161
Mycosphaerellaceae | 0 | 0 | 0 | 349 | 1,122
Mycosyringaceae | 0 | 0 | 0 | 0 | 228
Onchocercidae | 0 | 0 | 0 | 0 | 119
Pseudoalteromonadaceae | 0 | 0 | 0 | 127 | 231
Retroviridae | 0 | 0 | 0 | 296 | 174
Rhabditidae | 0 | 0 | 0 | 126 | 0
Sarcocystidae | 0 | 0 | 0 | 261 | 140
Syntrophobacteraceae | 0 | 0 | 0 | 0 | 119
Thermodesulfobiaceae | 0 | 0 | 0 | 0 | 136
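Because the five samples differ greatly in sequencing depth, raw counts in Table S2-3 are easier to compare as proportions. The sketch below is a hypothetical helper, not part of the original analysis, that converts the counts for whichever families are passed to it into percentages. Its denominator is only the supplied families, not necessarily the total number of reads sequenced per sample.

```python
def relative_abundance(counts):
    """Percentage of reads assigned to each family, among the families given.

    `counts` maps a family name to its read count in one sample. The
    denominator is the sum of the supplied counts only.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts sum to zero")
    return {family: 100.0 * n / total for family, n in counts.items()}


# Two rows of Table S2-3 for the Farm B/feather 1 sample
feather_1 = {"Herpesviridae": 103_939, "Chicken": 212_314}
shares = relative_abundance(feather_1)
print({family: round(pct, 1) for family, pct in shares.items()})
# {'Herpesviridae': 32.9, 'Chicken': 67.1}
```

The same two rows for Farm B/dust (1,372,838 Herpesviridae and 19,119,306 chicken reads) give a Herpesviridae share of about 6.7%.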


Chapter 3

Inferred father-to-son transmission of herpes simplex virus results in near-perfect preservation of viral genome identity and in vivo phenotypes

Utsav Pandey1, Daniel W. Renner1, Richard Thompson2, Moriah L. Szpara1*, Nancy Sawtell3

1Department of Biochemistry and Molecular Biology, Center for Infectious Disease Dynamics, and the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA

2Department of Molecular Genetics, Biochemistry and Microbiology, University of Cincinnati, Cincinnati, Ohio, 45229, USA

3Division of Infectious Diseases, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, 45229, USA

Adapted from: Inferred father-to-son transmission of herpes simplex virus results in near-perfect preservation of viral genome identity and in vivo phenotypes. 2017. Scientific Reports DOI: 10.1038/s41598-017-13936-6

Acknowledgements: U.P. prepared viruses for sequencing and performed computational comparisons. D.R. assembled the full-length genomes. R.T. performed Southern blot analyses. N.S. performed all animal pathogenesis work. M.S. and N.S. conceptualized the work. M.S. and U.P. wrote the manuscript; all authors contributed to its completion.


3.1 Abstract

High-throughput sequencing has provided an unprecedented view of the circulating diversity of all classes of human herpesviruses. For herpes simplex virus 1 (HSV-1), we and others have previously published data demonstrating sequence diversity between hosts. However, the extent of variation during transmission events, or within one host over years of chronic infection, remains unknown. Here we present an initial example of the full characterization of viruses isolated from a father-to-son transmission event. Transmission likely occurred 17 years before the strains were isolated, providing a first view of the degree of virus conservation after decades of recurrences, including transmission to and adaptation in a new host. We characterized the pathogenicity of these strains in a mouse ocular model of infection and sequenced the full viral genomes. Surprisingly, we found that these two viruses preserved their phenotype and genotype nearly perfectly during inferred transmission from father to son, and during nearly two decades of recurrent disease in each human host. Given the close genetic relationship of these two hosts, it remains to be seen whether this sequence conservation also holds for non-familial transmission events.


3.2 Introduction

Herpes simplex viruses (HSV) are widespread human pathogens. HSV-1 and HSV-2 together have a worldwide prevalence of ~90% (260). HSV infections are a major public health concern, causing mucocutaneous and systemic disease (261). In the USA, HSV-1 is the leading cause of sporadic necrotizing encephalitis and of infectious blindness (262, 263). HSV infection remains a significant cause of morbidity and mortality in neonates, with an incidence of approximately 1 per 3,200 deliveries in the United States (264, 265). HSV-2, which shares 70% genomic similarity with HSV-1, is associated with increased HIV/AIDS acquisition in developing countries (266–268). Traditionally, epidemiologically related strains of HSV-1 or HSV-2 have been identified by comparing restriction fragment length polymorphism (RFLP) patterns or by targeted Sanger sequencing of selected genes or loci (269–272). Although RFLPs were crucial in establishing our overall understanding of HSV transmission, RFLP patterns reflect only a limited number of nucleotides, so they obscure variation at the level of single nucleotide variations (SNVs) and small insertions or deletions (indels). Advances in high-throughput sequencing (HTS) now make it possible to study the full extent of genetic variation in the viral population transmitted between hosts. A transmission event can create a bottleneck that reduces genetic variation, if only a limited number of founder viruses are transmitted to the next host (198, 273–276). Viral genetic diversity may then arise de novo through replication, recombination, and/or selection in the new host, leading to a viral population that is genetically distinct from the founder population. Most transmission studies have focused on RNA viruses, which have more standing variation in the population (277, 278). Less is known about bottlenecks or expansion of variation during transmission of large DNA viruses.

The transmission bottleneck has been shown to be more restrictive for RNA viruses than for DNA viruses (198, 276). For human immunodeficiency virus (HIV) and hepatitis C virus (HCV), as few as 1-5 virions are estimated to be transmitted between hosts to establish infection (273–275). In contrast, the founder population of human cytomegalovirus (HCMV), a β-herpesvirus, was estimated at tens to hundreds of virions for maternal-to-fetal transmission (198). Although conservation of genetic diversity is thought to be more likely during transmission of DNA viruses, data on viral genetic diversity during human-to-human transmission of HSV-1 have thus far been lacking.

Here we present the genetic and phenotypic characterization of HSV-1 isolates obtained from a father and his son. These isolates were obtained from oral lesions of each host nearly two decades after the inferred transmission and primary infection of the son. We investigated the pathogenicity of these isolates in a mouse ocular model of infection and found that they replicated to similar levels at the site of inoculation and were similarly neuroinvasive. The two isolates were also alike in their ability to establish latency and in their rate of reactivation, in both in vivo and in vitro (explant) models. We sequenced the viral population of each cultured isolate by HTS and found that the genomes were nearly identical at the consensus level. A small number of minority variants were found in each isolate, most of which resulted from length variation in homopolymeric tracts or short sequence repeats (SSRs). In light of recent studies showing extensive intra-host variation and rapid evolution during transmission of HCMV (197, 198), the degree of conservation we observed between the isolates was exceptional. This study presents the first comprehensive characterization of founder and transmitted HSV-1 isolates using HTS and in vivo models of pathogenesis.
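Why a larger founder population should conserve more diversity can be made concrete with a deliberately simple model, which is our illustrative assumption rather than an analysis from this study: if N founder virions are drawn independently from a donor population in which a variant has frequency p, the chance that the variant is absent from all founders is (1 - p)^N. The sketch below evaluates this for founder numbers spanning the 1-5 virions quoted for HIV and HCV and the tens to hundreds quoted for HCMV.

```python
def prob_variant_lost(freq, founders):
    """Chance that a donor variant at frequency `freq` is carried by none of
    `founders` virions, assuming independent random sampling (binomial model)."""
    if not 0.0 <= freq <= 1.0 or founders < 0:
        raise ValueError("freq must be in [0, 1] and founders non-negative")
    return (1.0 - freq) ** founders


# A variant carried by 10% of the donor's viral population
for n in (1, 5, 50, 500):  # 1-5: HIV/HCV-like; 50-500: HCMV-like
    print(n, f"{prob_variant_lost(0.10, n):.3g}")
```

Under this model a 10% variant is usually lost with one founder and essentially never lost with hundreds. Real bottlenecks also involve selection and unequal contributions from founders, which the model ignores.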

3.3 Methods

3.3.1 Isolate acquisition and stock generation

Human subjects were not recruited as part of this study. The samples described here were obtained through a virology laboratory at the Cincinnati Children’s Hospital Medical Center. Informed consent was obtained from the individuals contributing the viral samples. Once cultured, the viruses were coded and non-identifiable, and were not human tissues. HSV-1 isolates were collected from the father and the son on separate occasions, two years apart. For each isolate, a sterile swab was used to collect virus from a lip vesicle during a recurrence. The swab was placed in sterile medium and transported on ice to the laboratory, where a portion of the virus-containing medium was adsorbed directly onto a rabbit skin cell (RSC) monolayer. This first passage was aliquoted and used to generate a “Pass 2” stock that was used for all phenotypic studies. These new isolates were named according to recent recommendations outlined by Kuhn et al. (23). Throughout this chapter we use shortened forms of these names: R-13 (father) and N-7 (son). The full names of these isolates are HSV-1/Cincinnati, USA/1995/R-13 and HSV-1/Cincinnati, USA/1993/N-7.
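The full isolate names follow a slash-delimited pattern of virus, location, year, and isolate identifier. As a small illustration, the hypothetical parser below splits such names into fields and, assuming the year field records when each isolate was collected, recovers the two-year gap between the father's and son's samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsolateName:
    virus: str
    location: str
    year: int
    isolate: str


def parse_isolate_name(full_name):
    """Split 'virus/location/year/isolate' into fields.

    Assumes exactly four '/'-separated fields, so a location that itself
    contained '/' would need a different rule.
    """
    parts = full_name.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {full_name!r}")
    virus, location, year, isolate = parts
    return IsolateName(virus, location, int(year), isolate)


father = parse_isolate_name("HSV-1/Cincinnati, USA/1995/R-13")
son = parse_isolate_name("HSV-1/Cincinnati, USA/1993/N-7")
print(father.isolate, son.isolate, father.year - son.year)  # R-13 N-7 2
```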

3.3.2 Animal studies

All procedures in mice were performed as approved by the Children’s Hospital Institutional Animal Care and Use Committee (protocol# IACUC2013-0162) and complied with the Guide for the Care and Use of Laboratory Animals. Animals were housed in American Association for Laboratory Animal Care-approved quarters. Male outbred Swiss Webster mice (22-25 g body weight) obtained from Harlan Laboratories (now Envigo) were used for these studies. Prior to infection, mice were anesthetized by intraperitoneal injection of sodium pentobarbital (50 mg/kg body weight). A 10 μL drop containing 2 × 10⁵ PFU was placed onto each scarified corneal surface, as detailed previously (279).
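The inoculation and anesthesia parameters imply simple working quantities. The sketch below computes the stock concentration needed for a 10 μL drop to carry 2 × 10⁵ PFU, and the pentobarbital mass delivered at 50 mg/kg across the stated 22-25 g weight range. The function names are illustrative only.

```python
def inoculum_pfu_per_ml(pfu_per_drop, drop_volume_ul):
    """Stock concentration (PFU/mL) so that one drop carries `pfu_per_drop`."""
    return pfu_per_drop * 1000.0 / drop_volume_ul


def pentobarbital_dose_mg(body_weight_g, dose_mg_per_kg=50.0):
    """Pentobarbital mass (mg) for a mouse of the given body weight."""
    return dose_mg_per_kg * body_weight_g / 1000.0


print(inoculum_pfu_per_ml(2e5, 10))  # 20000000.0, i.e. 2 x 10^7 PFU/mL
for weight_g in (22, 25):
    print(weight_g, "g ->", pentobarbital_dose_mg(weight_g), "mg")
```

If a drop is placed on both corneas of a mouse, the animal receives twice the per-drop dose in total.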

3.3.3 Virus replication in vivo

At the indicated times post infection (pi), infected mice were euthanized, and the eyes, trigeminal ganglia (TG), and brains from three mice per inoculation group were individually assayed for virus as previously detailed (280). Total DNA was isolated from TG and quantified, and total viral genomes were quantified by real-time PCR using primers to the thymidine kinase (TK) region, as detailed previously (281).
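The qPCR details are deferred to (281). A common way to turn Ct values into genome copies is absolute quantification against a standard curve, and the sketch below shows that approach under stated assumptions: a tenfold dilution series of a TK-containing standard is fitted by least squares, and sample Ct values are interpolated. The Ct values are invented for illustration and nothing here should be read as the original protocol.

```python
def fit_standard_curve(log10_copies, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(copies) + intercept."""
    n = len(log10_copies)
    if n != len(ct_values) or n < 2:
        raise ValueError("need at least two paired standards")
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Interpolate genome copies for a sample Ct from the fitted curve."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """PCR efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical tenfold dilution series of a TK-containing standard
standards_log10 = [7, 6, 5, 4, 3]
standards_ct = [14.1, 17.5, 20.8, 24.2, 27.6]
slope, intercept = fit_standard_curve(standards_log10, standards_ct)
print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.1%}")
print(f"sample at Ct 22.0 ~ {copies_from_ct(22.0, slope, intercept):.3g} copies")
```

A slope near -3.32 corresponds to ~100% efficiency. Expressing results per TG would additionally require scaling by the fraction of total TG DNA loaded per reaction, which is not specified here.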

3.3.4 Reactivation studies

Latent HSV was induced to reactivate in the ganglia of mice in vivo using hyperthermic stress (HS) (22). At 22 hours post induction, TG were assayed for infectious virus as detailed previously (280, 282). For in vitro explant reactivation studies, latently infected ganglia were aseptically removed, placed into MEM supplemented with 5% newborn calf serum, and incubated at 37 °C in a 5% CO₂ incubator. At the indicated times post explant, ganglia were homogenized and assayed for infectious virus as for reactivation in vivo (283).

3.3.5 Antibodies and Immunohistochemistry

Whole TG were fixed in 0.5% formaldehyde for 2 hours, rinsed in PBS, and post-fixed in Dent’s fixative (80% methanol, 20% dimethyl sulfoxide (DMSO)). Whole-ganglion immunohistochemistry used a primary rabbit anti-HSV antibody (AXL237, Accurate) at a 1:1000 dilution and a secondary HRP-labeled goat anti-rabbit antibody (Vector) at a 1:750 dilution. These methods have been detailed extensively in previous reports (281, 284–287).

3.3.6 Virus culture and DNA isolation for HTS

Master stocks of the virus were prepared by infecting MRC-5 human fetal lung fibroblast cells (ATCC®, CCL-171), grown in Eagle’s Minimum Essential Medium (EMEM; Sigma-Aldrich), at a multiplicity of infection (MOI) of 0.1. Viral stocks were harvested when significant cytopathic effect (CPE) was observed. Master stocks were titered by limiting dilution assay on Vero cells (ATCC®, CCL-81) to calculate plaque-forming units (PFU) for each stock. For isolation of viral DNA, MRC-5 cell cultures were infected at an MOI of 10, and viral genomic DNA (gDNA) was isolated using previously described methods for the isolation of viral nucleocapsid DNA (288).

3.3.7 Southern blot

The genomic structures of the clinical isolates were analyzed by DNA (Southern) blot analysis (289, 290) and compared to four commonly used HSV-1 strains (17syn+, McKrae, KOS(M), and F) and 10 unrelated clinical isolates. Genomic DNA was cleaved with BamHI, separated on 0.8% agarose gels, blotted, and probed with a cosmid clone insert that was ³²P-labeled using a Rediprime kit from Amersham. The cosmid clone insert matches the HSV-1 strain 17 genome from positions 24,698 to 64,405 (~40 kb). Blots were developed and analyzed on a Storm phosphorimager and quantified with GelQuantNet software.

3.3.8 Illumina high-throughput sequencing

Sequencing libraries for each of the isolates were prepared using the Illumina TruSeq Nano DNA Sample Prep Kit, according to the manufacturer’s recommended protocol for sequencing of genomic DNA. The target DNA fragment size selected for library construction was 550 base pairs (bp). All samples were sequenced on an in-house Illumina MiSeq using version 3 chemistry to obtain paired-end sequence fragments of 300 × 300 bp. Base calling and image analysis were performed with the MiSeq Control Software (MCS) version 2.3.0.

3.3.9 De novo assembly of consensus genomes

HSV-1 genomes were assembled using a recently described viral genome assembly (VirGA) workflow (202). Briefly, VirGA combines quality-control preprocessing of reads, de novo assembly, genome linearization and annotation, and post-assembly quality assessments. HSV-1 strain 17 (GenBank accession: JN555585) was used as the comparator for the reference-guided portion of viral genome assembly in VirGA. GenBank accessions are listed in Table 3-1.

Table 3-1. Sequencing statistics for N-7 and R-13 strains.

HSV-1 strain | Paired-end read length | Raw reads | Reads used for assembly | Genome length (bp) | Average depth (x-fold) | GenBank accession
R-13 | 300 bp | 3.1×10⁶ | 2.4×10⁶ | 151,636 | 7,626 | KY922718*
N-7 | 300 bp | 4.3×10⁶ | 3.6×10⁶ | 151,765 | 11,640 | KY922719*

*GenBank release is in queue. Please see attached GenBank flatfiles for preview (Supplementary file 7).

3.3.10 Consensus genome comparison and phylogenetic analysis

Trimmed versions of the genomes lacking the terminal repeats were used for consensus genome comparison and for intra-strain polymorphism detection (below). We used the trimmed format of the genome to avoid overrepresentation of the repeat regions during comparison. ClustalW2 (43) was used to construct pairwise global nucleotide alignments between whole genome sequences, and pairwise global amino acid alignments between open reading frames. These alignments were used by downstream custom Python scripts to calculate percent identity, protein differences, and genomic DNA variation between samples. Phylogenetic networks were constructed in SplitsTree4 (291) with the uncorrected_P distance, ignoring all gaps. GenBank accession numbers and publications describing previously sequenced isolates are as follows: 17 (JN555585, NC_001806) (271, 272); F (GU734771) (119, 292); H129 (GU734772) (119, 167); KOS (JQ673480, JQ780693) (293, 294); McKrae (JQ730035, JX142173) (205, 295, 296); HF10 (DQ889502) (297); KOS63 (KT425110), KOS79 (KT425109) (298); India (KJ847330); L2 (KT780616), SC16 (299); MacIntyre (KM222720) (300); CJ970 (JN420341.1), CJ311 (JN420338.1), 134 (JN4000093.1) (169); RE (KF498959); 160/1982 (LT594192), 132/1998 (LT594457), 1319/2005 (LT594108), 1394/2005 (LT594111), 66/2007 (LT594110), 20/2007 (LT594109), 369/2007 (LT594112), 2158/2007 (LT594106), 3083/2008 (LT594107), 172/2010 (LT594105) (301); CR38 (HM585508), EO3 (HM585509), E06 (HM585496), E07 (HM585497), E08 (HM585498), E10 (HM585499), E11 (HM585500), E12 (HM585501), E13 (HM585502), E14 (HM585510), E15 (HM585503), E19 (HM585511), E22 (HM585504), E23 (HM585505), E25 (HM585506), E35 (HM585507), R11 (HM585514), R62 (HM585515), S23 (HM585512), S25 (HM585513) (4).
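The custom Python scripts used for percent identity are not reproduced in this chapter. As a minimal sketch, assuming identity is counted as identical aligned columns over the total alignment length (gap columns counted as differences), and assuming the uncorrected p-distance is computed pairwise over gap-free columns only (one reading of ignoring gaps), the two quantities could be computed from a pair of aligned sequences as follows. The function names and toy sequences are hypothetical.

```python
"""Sketch: percent identity and uncorrected p-distance for one pairwise
global alignment. Conventions here are assumptions, not the original scripts."""


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical columns / total alignment length, as a percentage.

    Columns containing a gap count as non-identical (assumed convention).
    """
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != "-")
    return 100.0 * matches / len(aln_a)


def uncorrected_p_distance(aln_a: str, aln_b: str) -> float:
    """Proportion of differing sites among columns with no gap in either sequence."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must be equal in length")
    compared = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    differences = sum(1 for a, b in compared if a != b)
    return differences / len(compared)


if __name__ == "__main__":
    # Hypothetical toy alignment, not HSV-1 sequence.
    seq_a = "ACGTACGT--GGCA"
    seq_b = "ACGAACGTTTGGCA"
    print(f"{percent_identity(seq_a, seq_b):.1f}")        # 78.6
    print(f"{uncorrected_p_distance(seq_a, seq_b):.3f}")  # 0.083
```

The two measures diverge whenever gaps are present: the two-base gap lowers percent identity here but does not enter the p-distance, which is why the choice of gap handling matters for comparing genomes that differ mainly by in/dels.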

3.3.11 Intra-strain minority-variant detection

Minority variant loci, or polymorphic positions within each consensus genome, were detected using parameters that aimed to distinguish truly polymorphic sites from errors produced during sequencing or during de novo assembly of these genomes. VarScan v2.2.11 (206) was used to detect variants present within each consensus genome. To aid in differentiating true variants from potential sequencing errors (207), the following variant calling parameters (183) were used: minimum variant allele frequency ≥ 0.02; base call quality ≥ 20; read depth at the position ≥ 100; independent reads supporting the minor allele ≥ 5. Polymorphisms with directional strand bias ≥ 90% were excluded. The variants obtained from VarScan were then mapped back to the genome to understand their distribution and mutational impact using SnpEff and SnpSift (208, 209). Polymorphic positions were visually assessed and hand-curated (Supplementary Table S3-2) to label those that bordered homopolymeric tracts or short sequence repeats (tandem repeats), since both can induce local misalignment of reads and lead to imprecise local polymorphism detection.
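To make the thresholds above concrete, the sketch below re-expresses them as a plain Python filter over hypothetical per-site read counts; it is not VarScan's implementation. The `VariantCall` record, the strand-bias definition (share of minor-allele reads on the more frequent strand), and the homopolymer cut-offs in `near_homopolymer` (runs of at least 5 identical bases, within 3 bp) are all assumptions for illustration; in this study, homopolymer and tandem-repeat flagging was a manual curation step.

```python
"""Sketch of the minority-variant thresholds from the Methods, applied to
hypothetical per-site read counts. This is not VarScan's implementation."""
from dataclasses import dataclass


@dataclass
class VariantCall:
    position: int             # 1-based position in the consensus genome
    major_reads: int          # reads supporting the consensus allele
    minor_reads_fwd: int      # minor-allele reads on the forward strand
    minor_reads_rev: int      # minor-allele reads on the reverse strand
    mean_base_quality: float  # mean Phred quality of bases at the site

    @property
    def minor_reads(self) -> int:
        return self.minor_reads_fwd + self.minor_reads_rev

    @property
    def depth(self) -> int:
        return self.major_reads + self.minor_reads

    @property
    def minor_frequency(self) -> float:
        return self.minor_reads / self.depth if self.depth else 0.0

    @property
    def strand_bias(self) -> float:
        # Assumed definition: fraction of minor-allele reads on the busier strand.
        if self.minor_reads == 0:
            return 0.0
        return max(self.minor_reads_fwd, self.minor_reads_rev) / self.minor_reads


def passes_filters(v: VariantCall) -> bool:
    """Frequency >= 0.02, quality >= 20, depth >= 100, >= 5 minor reads, bias < 90%."""
    return (
        v.minor_frequency >= 0.02
        and v.mean_base_quality >= 20
        and v.depth >= 100
        and v.minor_reads >= 5
        and v.strand_bias < 0.90
    )


def near_homopolymer(seq: str, position: int, min_run: int = 5, window: int = 3) -> bool:
    """True if a 1-based position lies within `window` bp of a run of at least
    `min_run` identical bases (hypothetical cut-offs for flagging)."""
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        run_start, run_end = i + 1, j  # 1-based, inclusive
        if j - i >= min_run and run_start - window <= position <= run_end + window:
            return True
        i = j
    return False


if __name__ == "__main__":
    consensus = "ATCGGGGGGGTACA"  # hypothetical sequence with a 7-bp G tract
    calls = [
        VariantCall(3, 4038, 70, 67, 35.0),   # balanced strands: kept
        VariantCall(12, 4000, 130, 3, 35.0),  # ~98% of minor reads on one strand: dropped
    ]
    for call in calls:
        print(call.position, passes_filters(call), near_homopolymer(consensus, call.position))
```

Keeping the automated thresholds and the repeat flag as separate functions mirrors the two-stage process described above: sites that pass the filters are retained, and the repeat flag only annotates them for closer inspection.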

3.3.12 Data Availability

Viral genomes sequenced during this study are included in GenBank under the following accessions: KY922718, R-13 (full name: HSV-1/Cincinnati, USA/1995/R-13); KY922719, N-7 (full name: HSV-1/Cincinnati, USA/1993/N-7). All other data generated or analyzed during this study are included in this published article (and its Supplementary Information files).

3.4 Results

3.4.1 Familial transmission and viral culture characteristics

The likely timing of horizontal transmission from father to son, and the collection of the virus isolates from each, is shown schematically in Figure 3-1.

Figure 3-1. Timing of father-to-son transmission of HSV and viral isolate acquisition. HSV-1 isolate R-13 was acquired during an orolabial recurrence from a father with lifelong chronic infection, characterized by oral lesion recurrences 6-8 times per year. The father acquired his primary infection at age 3, more than four decades prior. At age 27, the father is inferred to have transmitted HSV-1 to his 2-year-old son, who experienced gingivostomatitis with high fever. After resolution of the primary infection, the son likewise experienced recurrent orolabial lesions at a frequency of 6-8 episodes/year. Isolate N-7 was acquired during an orolabial recurrence from the son 17 years after his primary infection. The isolates were obtained on separate occasions, two years apart.

Both the father and the son experience recurrences 6-8 times per year. The father harbored the virus for 24 years before the likely transmission to the son. The timing of transmission is inferred from the son’s primary episode of gingivostomatitis. The mother was seronegative, and the child had no contact with other caregivers or siblings before this time. Virus was isolated from a recurrence in the father 43 years after his primary infection (isolate R-13), and from a recurrence in the son 17 years after his primary infection (isolate N-7). The isolates from father and son were collected on independent occasions, two years apart. No differences in plaque size or morphology were noted between the isolates upon culturing, and both isolates replicated with equivalent efficiency in culture. Restriction fragment length polymorphisms (RFLPs) were examined in a series of Southern blots on DNA cleaved with several different restriction enzymes, using diverse simple probes (~1 kb) or complex cosmid probes of 27 to 45 kb in length that spanned the entire viral genome (see Figure 3-2 for a representative example). At this level of resolution, no genetic differences were evident between the two isolates.

Figure 3-2. Southern blot comparison of genetic variation in R-13 (father’s) and N-7 (son’s) isolates, relative to other strains of HSV-1. The overall genomic structure of the father’s and son’s clinical isolates was analyzed by DNA (Southern) blot analysis and compared to four common laboratory strains (17syn+, McKrae, KOS(M), and F) and ten unrelated clinical isolates. Viral genomic DNA was cleaved with BamHI, gel-separated, and probed with a cosmid clone insert spanning ~40 kb of the strain 17syn+ genome (see Methods for details). No major rearrangements or changes in fragment size were observed between the R-13 (father) and N-7 (son) isolates.

3.4.2 Acute replication kinetics and mortality of isolates R-13 and N-7

The in vivo pathogenesis properties of the isolates were examined in the mouse ocular model of infection. Groups of outbred male Swiss Webster mice were infected via the cornea with 2×10⁵ PFU of father and son isolates R-13 and N-7, the unrelated clinical isolate CI-37, or the HSV-1 reference strain 17syn+ (see Methods for details). At two-day intervals over a 10-day period, tissues were harvested from three mice in each group, and the virus content was quantified by standard plaque titration assays. Replication kinetic curves are shown in Figure 3-3A-C. There was no significant difference in peripheral replication on the eye between the father-son viral isolates and HSV-1 17syn+; however, the viral titer for CI-37 was significantly higher (Figure 3-3A) (p≤0.05; see Figure 3-3 for the summed value of each area under the curve (AUC)). Likewise, virus replication in trigeminal ganglia (TG) was not significantly different between R-13, N-7, and 17syn+, but was significantly higher for isolate CI-37 (Figure 3-3B) (p≤0.01; see Figure 3-3 for AUC values). In the ocular model of infection, the central nervous system (CNS) is infected by retrograde transport of the virus from infected neurons in the TG to the trigeminal nucleus in the hindbrain. From there, replicating virus is transported toward the front of the brain over time. To capture some of this process, brain tissue was divided into four roughly equal parts for assay (rear, mid rear, mid front, front) as detailed previously (302).
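The AUC values summarize each replication curve as a single number. A minimal sketch, assuming the trapezoidal rule applied to log10 titers at the sampled days (the chapter does not state how AUC was computed or whether titers were log-transformed), is shown below. The helper names, the detection floor of 1 PFU, and the titers in the example are hypothetical.

```python
"""Sketch: area under a viral replication curve by the trapezoidal rule.
Applying it to log10 titers is an assumption; the chapter does not say how
AUC was calculated."""
import math
from typing import List, Sequence


def auc_trapezoid(days: Sequence[float], values: Sequence[float]) -> float:
    """Sum of trapezoids between consecutive sampling days."""
    if len(days) != len(values) or len(days) < 2:
        raise ValueError("need at least two paired time points")
    total = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]
        if width <= 0:
            raise ValueError("days must be strictly increasing")
        total += width * (values[i] + values[i - 1]) / 2.0
    return total


def log10_titers(pfu: Sequence[float], floor: float = 1.0) -> List[float]:
    """Log-transform titers; values below `floor` (e.g. zero) are set to the floor."""
    return [math.log10(max(x, floor)) for x in pfu]


if __name__ == "__main__":
    sampling_days = [2, 4, 6, 8, 10]              # two-day intervals, as in the text
    hypothetical_pfu = [1e3, 1e4, 1e3, 1e2, 1e1]  # made-up titers for one tissue
    print(auc_trapezoid(sampling_days, log10_titers(hypothetical_pfu)))
```

Working on log10 titers keeps a single high-titer day from dominating the summary; summing raw PFU instead would weight the peak far more heavily, so the two choices can rank isolates differently.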



Figure 3-3: Quantification of replication and latency phenotypes of R-13 and N-7 during infection in vivo. Swiss Webster mice (male) were infected on scarified corneas with 2×10⁵ PFU of the clinical isolates R-13, N-7, or CI-37, or of strain 17syn+. At the indicated times post-infection, tissues collected from each of three mice per group were assayed for infectious virus using a standard plaque assay (see Methods for details). Replication in eyes (A) and TG (B) revealed that infectious virus generated in these tissues during the acute stage of infection was not significantly different between R-13, N-7, or 17syn+, whereas the levels for CI-37 were significantly higher (Student’s t test, *p≤0.05, **p≤0.01, for peak titer on day 4; AUC = area under the curve). (C) The replication kinetics and regional distribution of isolates in the brain were determined by cutting each brain into four coronal sections. The levels of infectious virus were not significantly different between R-13, N-7, or 17syn+, whereas the levels for CI-37 were significantly higher (Student’s t test, *p≤0.05, **p≤0.01, for peak titer on day 6). This low level of infectious virus for R-13 and N-7 in the brain (≤ 100 PFU) is consistent with the 100% survival observed for both isolates under these infection conditions, whereas CI-37 induced complete mortality (Figure 3-4). (D-E) Quantification of latent viral genomes. At 40 days post infection, the TG and brains from R-13 (n=4) and N-7 (n=4) latently infected mice were assayed for viral genome copies using real-time qPCR. Viral genome copy numbers per 50 ng of mouse DNA detected in R-13- and N-7-infected TG were not significantly different (Student’s t test) (D). Brains were cut into four coronal sections prior to isolating DNA, and the number of viral genome copies was determined in each section. Viral genome copies in the brains were not significantly different (ANOVA; on box and whisker plot, + indicates mean, bar indicates median) (E).

No significant differences were observed between the father and son viral isolates and HSV-1 17syn+ in any of the brain regions, whereas CI-37 had a significantly higher titer in all four brain regions (Figure 3-3C) (p≤0.01 for rear and mid rear regions, p≤0.05 for mid front and front regions; see Figure 3-3 for AUC values). The amount of virus detected in the brain for R-13 and N-7 was low, reaching just 100 PFU in the rear portion of the brain and less in the front portions. In contrast, the viral isolate CI-37 reached much higher titers of 10⁴ to 10⁵ PFU in each brain region, while 17syn+ had an intermediate phenotype. These data were consistent with the 100% survival observed with both R-13 (n=35) and N-7 (n=34), the complete mortality of CI-37 (n=5), and the 81% survival of 17syn+ (n=16) (Figure 3-4).


Figure 3-4. No mortality was observed in Swiss Webster mice infected with either the father’s (R-13) or the son’s (N-7) isolate of HSV-1. Swiss Webster mice were infected via the ocular route with 2×10⁵ PFU of R-13 (father’s), N-7 (son’s), 17syn+, or CI-37 HSV-1 (see Methods for details). Neither N-7 nor R-13 caused any deaths through 40 days post infection, whereas 17syn+ and CI-37 caused 19% and 100% mortality, respectively (R-13, 35/35 mice survived; N-7, 34/34 mice survived; CI-37, 5/5 mice died; 17syn+, 13/16 mice survived). The mortality rate for isolate CI-37 was significantly different from that of 17syn+ (ANOVA, ** p<0.003) and of N-7 or R-13 (ANOVA, *** p<0.0001). The mortality rate for 17syn+ was also significantly different from that of N-7 or R-13 (ANOVA, * p<0.03).

No signs of encephalitis were observed in either the R-13- or N-7-infected groups, although moderate to severe blepharitis did occur. Since survival rates were high for the father-son isolates R-13 and N-7, additional mice were maintained for 40 days post-infection (pi), and TG and brain tissues were analyzed for the presence of latent viral genomes by quantitative PCR (see Methods for details). There was no significant difference in the number of viral genomes detected in mouse TG latently infected with either R-13 or N-7 (Figure 3-3D). Likewise, no significant difference in the amount of latent viral DNA present in the four regions of the brain was detected (Figure 3-3E). These results were consistent with the prior observation that the isolates replicated equivalently in these tissues. The remaining latently infected mice were used to test the ability of the isolates to reactivate from latency. Reactivation was tested after explantation of TG into culture in vitro (303) or after hyperthermic stress (HS) in vivo (280, 282, 286). No infectious virus was detected in any of six TG tested at 0 hours post-explant in either group, indicating that persistent infection or frequent spontaneous reactivation events were absent for both R-13 and N-7 (284) (Figure 3-5A and 3-5D). After three days in culture, all TG samples were positive for virus (Figure 3-5A). We conclude from this that latent virus capable of reactivating was present in all TG tested.

Figure 3-5: Explant and in vivo reactivation in Swiss Webster mice latently infected with R-13 and N-7. R-13 and N-7 were compared for reactivation from latency (>40 days pi) using in vitro and in vivo reactivation assays. (A) In a standard TG explant assay, no difference between R-13 and N-7 reactivation frequency was observed (p = 0.99, Student’s t test), although the difference between virus recovered at time 0 and 3 days post explant was significant in both groups (p=0.0003, ANOVA). (B) The in vivo reactivation frequency (percentage of mice with infectious virus detected in TG) was also not different between R-13 and N-7 at 22 hrs post hyperthermic stress (Student’s t test, expt. 1 p = 0.89, expt. 2 p = 0.83), and (C) the number of neurons exiting latency was also not different (expt. 1 p = 0.39, expt. 2 p = 0.86). Latently infected TG were subjected to whole-ganglion immunohistochemistry to detect viral protein at 0 hrs (D) and 3 days (E) post explant. Viral protein-expressing neurons (black arrows) and tracts (white arrowheads) mark the range of viral spread. Viral protein-expressing neurons (black arrows), indicating reactivation from latency, are detectable in TG latently infected with N-7 (F) and R-13 (G) at 22 hrs after hyperthermic stress in vivo.

Whole-ganglion immunohistochemical (IHC) detection of viral proteins (279, 304) revealed regions of virus spread within the TG at this time (Figure 3-5E). Two separate experiments showed similar efficiency of viral reactivation in vivo in TG at 22 hrs post-HS, with 50% of the TG examined in both groups containing detectable infectious virus (Figure 3-5B). A similar number of TG neurons were found to express viral proteins at 22 hrs post-HS, as detected by whole-ganglion IHC (Figure 3-5C and 3-5F,G). Combined with the finding above that similar numbers of latent viral genomes were present in these TG, we concluded that the relative efficiency of reactivation from latency in vivo was also similar between the father-son isolates R-13 and N-7.

3.4.3 Nearly identical genomes of father and son HSV-1 isolates

To further test the observation of overall genomic and phenotypic similarity of the HSV-1 isolates from father (R-13) and son (N-7), we used HTS and de novo assembly approaches to construct a full-length genome sequence for each viral isolate. Viral nucleocapsid DNA was used as the starting point for Illumina paired-end sequencing, with de novo assembly and genome annotation carried out using a recently described open-source viral genome assembly (VirGA) workflow (202) (Table 3-1) (see Methods for details). At the level of consensus genomes, HSV-1 isolates R-13 and N-7 are 98.9% identical in their DNA sequence (Table 3-2). This level of identity between samples has previously only been seen in the case of sub-clones picked or isolated from a common parental virus stock (202, 4, 298).

Table 3-2. Pair-wise DNA identity and variant proteins between consensus genomes.

Comparison | DNA identity | Intergenic in/dels (# events) | Intergenic SNVs | Genic in/dels (# events) | Synonymous SNVs | Non-synonymous SNVs
N-7 vs. R-13 | 98.9% | 16 | 2 | 2* | 0 | 0

*54 bp insertion in VP1/2 (UL36) and a 6 bp insertion in ICP4 (RS1) in N-7, relative to R-13.

For the 1.1% of the genome that did differ between HSV-1 isolates R-13 and N-7, we categorized these differences as genic (coding) vs. intergenic, and as insertions or deletions (in/dels) vs. single-nucleotide variations (SNVs) (Table 3-2). We found no SNVs in any coding sequence, and only two in intergenic regions. For the in/dels, we calculated the minimum number of insertion or deletion events that could have led to the observed differences. For instance, a three base pair insertion was counted as one in/del event. There were a total of 18 in/del events, of which only two occurred in coding regions. These included a 54 base pair (18 amino acid (AA)) in/del in the gene encoding the ubiquitin-specific protease VP1/2 (UL36), and a 6 base pair (2 AA) in/del in the transcriptional regulator ICP4 (RS1) (Table 3-2). The insertion in VP1/2 lies in the C-terminal region of the protein, which contains an extended array of ‘PQ’ tandem repeats. Sequence length fluctuation in this region has been documented in HSV and in other alphaherpesviruses such as varicella zoster virus (VZV), Marek’s disease virus (MDV), and pseudorabies virus (PRV) (144, 227, 228, 119, 205). Similarly, the insertion observed in ICP4 lies in a short sequence repeat in a functionally uncharacterized domain of the protein (305). In both cases, the N-7 viral genome from the son contains the longer sequence (i.e., has an insertion) relative to the R-13 (father’s) genome.
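The counting rule described above, in which a contiguous run of gap characters counts as one in/del event regardless of its length, can be sketched over a pairwise alignment as follows. The function names, the coding-interval input, and the toy alignment are hypothetical, and the sketch assigns each gap run to the region containing the first-sequence coordinate at the start of the run, which is an assumed convention rather than the one used in this study.

```python
"""Sketch: count SNVs and in/del events between two aligned genomes, split
into genic and intergenic. Coordinates and conventions are hypothetical."""
from typing import Dict, List, Tuple


def in_cds(pos: int, cds: List[Tuple[int, int]]) -> bool:
    """True if a 1-based position falls inside any (start, end) coding interval."""
    return any(start <= pos <= end for start, end in cds)


def classify_differences(aln_a: str, aln_b: str, cds: List[Tuple[int, int]]) -> Dict[str, int]:
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must be equal in length")
    counts = {"genic_snv": 0, "intergenic_snv": 0,
              "genic_indel_events": 0, "intergenic_indel_events": 0}
    pos_a = 0          # 1-based coordinate in sequence A of the last non-gap base
    gap_owner = None   # which sequence carries the current gap run, if any
    for a, b in zip(aln_a, aln_b):
        if a != "-":
            pos_a += 1
        if a == "-" or b == "-":
            owner = "a" if a == "-" else "b"
            if owner != gap_owner:  # a new contiguous gap run = one in/del event
                region = "genic" if in_cds(max(pos_a, 1), cds) else "intergenic"
                counts[region + "_indel_events"] += 1
            gap_owner = owner
            continue
        gap_owner = None
        if a != b:
            region = "genic" if in_cds(pos_a, cds) else "intergenic"
            counts[region + "_snv"] += 1
    return counts


if __name__ == "__main__":
    # Toy alignment: a 3-bp insertion in B inside a coding interval, plus one
    # intergenic SNV. Both sequences are made up for illustration.
    seq_a = "ATGCAGCAG---TAACCGTA"
    seq_b = "ATGCAGCAGCAGTAACCCTA"
    print(classify_differences(seq_a, seq_b, cds=[(1, 12)]))
```

In this toy case the three-base insertion is reported as a single genic in/del event, matching the convention that a three base pair insertion counts as one event.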

3.4.4 Comparison of father and son isolates to other HSV-1 strains

To place the HSV-1 isolates from father (R-13) and son (N-7) in the context of previously sequenced HSV-1 isolates, we compared full-length genomes of R-13 and N-7 to all available HSV-1 genomes in GenBank (46 total; see Methods for full list). We investigated the relatedness of the genomes by constructing a network graph using SplitsTree (Figure 3-6).

Figure 3-6: A phylogenetic network showing genetic relatedness between isolates R-13 (father), N-7 (son) and previously sequenced HSV-1 isolates. A phylogenetic network between isolates R-13 (father), N-7 (son) and all available complete HSV-1 genomes was constructed using SplitsTree4. The father and son isolates form a separate branch compared to all previously sequenced HSV-1 genomes, with their branch localized between the Asian/European and African clusters. See Methods for a complete list of strain names and GenBank accessions for the HSV-1 genomes included in this analysis.

We observed that the father-son isolates R-13 and N-7 form a separate branch compared to all previously sequenced genomes, but are positioned in the network between other Asian, European, and African isolates. The phylogenetic separation of isolates based on geography has been previously described (4). To put the protein-coding features of HSV-1 isolates R-13 and N-7 in the context of prior work, we compared all protein (AA) sequences of these isolates to those encoded by 46 fully sequenced HSV-1 strains. In this comparison, we found a total of 34 viral proteins containing AA variations in R-13 and N-7 that had never been observed before (see Supplementary Table S3-1 for a complete list). This large number of unique residues may reflect the position of these isolates in the phylogenetic network, where they lie on a branch separate from other known strains. We also found 54 proteins with AA variations that differ from the HSV-1 reference genome (strain 17syn+, GenBank accession JN555585), but that have been observed previously in one or more of the other sequenced strains.
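The distinction drawn above, between never-before-observed residues and residues that differ from strain 17syn+ but occur in other strains, can be expressed as a per-column check on a protein alignment. The sketch below assumes pre-aligned sequences of equal length with '-' for gaps, skips columns that are gaps in the reference, and uses hypothetical toy sequences; it is not the comparison pipeline used in this study.

```python
"""Sketch: classify amino acid differences from the reference as novel
(absent from all other strains) or previously observed. Toy data only."""
from typing import List, Sequence, Tuple


def classify_residues(query: str, reference: str,
                      others: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Return (novel, previously_observed) substitutions as labels like 'A2T'.

    All sequences must be pre-aligned to the same length; '-' marks a gap.
    Positions are numbered on the reference; reference-gap columns are skipped.
    """
    if any(len(s) != len(reference) for s in [query, *others]):
        raise ValueError("all sequences must be aligned to the same length")
    novel, observed = [], []
    ref_pos = 0
    for col, (q, r) in enumerate(zip(query, reference)):
        if r == "-":
            continue  # insertions relative to the reference are not handled here
        ref_pos += 1
        if q == "-" or q == r:
            continue
        label = f"{r}{ref_pos}{q}"
        if any(seq[col] == q for seq in others):
            observed.append(label)
        else:
            novel.append(label)
    return novel, observed


if __name__ == "__main__":
    reference = "MAVLSE"            # stands in for the 17syn+ protein (hypothetical)
    query = "MTVLSD"                # stands in for an R-13/N-7 protein (hypothetical)
    others = ["MAVLSD", "MAVISE"]   # stand in for other sequenced strains
    print(classify_residues(query, reference, others))
```

Here A2T is reported as novel because no other sequence carries T at that column, while E6D is reported as previously observed because one comparator already carries D.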

3.4.5 Intra-strain variation: Detection of minority variants within each genome

Comparison of the consensus genomes of the HSV-1 isolates from father (R-13) and son (N-7) provided a global view of the overall identity of these viral isolates. However, as observed for HCMV and VZV (183, 197, 198), intra-strain variation can exist within a viral population, or within a local niche or lesion of a given host. To assess the extent of genetic diversity within each viral isolate, we investigated whether the consensus genome of either R-13 or N-7 contained any minority variants, i.e., nucleotide positions where HTS data indicated more than one possible allele. This allowed us to detect specific nucleotide positions (loci) at which another allele was present; these minor differences would otherwise be missed in a comparison of majority genotypes (consensus-level genomes), as in Table 3-2. We observed 59 minority variants in the R-13 genome and 48 in the N-7 genome (Supplementary Table S3-2). Most of the minor alleles observed in the R-13 and N-7 genomes occurred at low frequency and/or were intergenic. These minority variants are likely neutral and unlikely to have a major effect on viral fitness. These results are consistent with recent findings in HCMV, where the vast majority of observed minority variants were shown to be neutral (306). The rare minority variants that occurred in coding regions were non-synonymous, but present at very low allele frequencies. The N-7 isolate harbored a non-synonymous variant in UL14 at 3.39% frequency (Figure 3-7), and another in US6 at 2.4% frequency.

[Figure 3-7 panel: polymorphic site detected in the son’s (N-7) virus. Genome positions 19,360–19,380 bp and consensus sequence shown on top; minor allele (A): read depth 137, frequency 3%; major allele (G): read depth 4,038, frequency 97%; read directionality indicated.]

Figure 3-7: Intra-strain variation observed at a polymorphic locus in N-7 (son’s) viral genome, in the gene UL14. A low frequency non-synonymous variation was observed in the N-7 (son’s) viral genome at position 19,370, in the tegument protein UL14. This site has an A present in 3% of the viral sequence reads instead of the majority G allele (97%). The minority allele for UL14 encodes a valine to methionine change at residue 109; Met109 exists as the dominant allele in several independent isolates of HSV-1 (see text for details). While the UL14 coding sequence is encoded on the reverse strand of the reference genome for HSV-1, it is depicted here in forward orientation to enable codon reading from left to right. Actual read depth of each variant is indicated above. A subset of the alignment of Illumina sequencing reads to the N-7 consensus genome is shown here, with the position and consensus sequence shown in the top row. Sequence read orientation is depicted as aqua and green, with directional arrows. Areas with no letter shown have 100% agreement with the consensus nucleotide; the letters are left out for clarity.
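The frequencies shown in Figure 3-7 follow from the read depths at the site. Assuming only two alleles are present, the minor allele frequency is the minor-allele depth divided by the total depth, as in this short sketch (the function name is hypothetical):

```python
"""Sketch: minor allele frequency at a biallelic site from read depths."""


def minor_allele_frequency(minor_depth: int, major_depth: int) -> float:
    total = minor_depth + major_depth
    if total == 0:
        raise ValueError("no reads at this site")
    return minor_depth / total


if __name__ == "__main__":
    # Read depths shown in Figure 3-7 for the UL14 site in N-7.
    maf = minor_allele_frequency(137, 4038)
    print(f"minor allele: {100 * maf:.1f}%")        # minor allele: 3.3%
    print(f"major allele: {100 * (1 - maf):.1f}%")  # major allele: 96.7%
    # The site also clears the >= 0.02 frequency and >= 5 supporting-read thresholds.
    print(maf >= 0.02 and 137 >= 5)
```

The result agrees with the rounded 3% and 97% values displayed in the figure, and the site comfortably exceeds the depth, frequency, and supporting-read thresholds listed in the Methods.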

The minority allele for UL14 encodes a valine to methionine change at residue 109 (V109M), which exists as the dominant allele in several independent isolates of HSV-1 (strains F (USA), CR38 (China), and R62 (South Korea)) (4). Most of the observed minority variants in each genome were present in intergenic regions, and adjoined tandem repeats or homopolymers (Figure 3-8). Length variations at tandem repeats and homopolymers are common between strains of HSV-1 (4, 307, 308), and have been documented in select coding regions as well (e.g. UL30 polymerase and TK) (309).
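Because methionine is encoded only by ATG in the standard genetic code, a single-nucleotide change from valine to methionine can only be GTG to ATG at the first codon position, which matches the G-to-A change shown in gene orientation in Figure 3-7. The sketch below checks this exhaustively and shows how the same codons read on the opposite strand, since UL14 lies on the reverse strand of the reference genome. It uses only the standard codon table; no HSV-1 sequence is involved, and the function names are hypothetical.

```python
"""Sketch: which single-base changes turn a valine codon into methionine,
and how those codons read on the opposite strand. Standard genetic code only."""
from itertools import product
from typing import List, Tuple

BASES = "TCAG"
# Standard codon table in TCAG order (first base slowest, third base fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def single_base_changes(from_aa: str, to_aa: str) -> List[Tuple[str, str, int]]:
    """All (original codon, mutant codon, 1-based codon position) one-base paths."""
    paths = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for i in range(3):
            for base in "ACGT":
                if base == codon[i]:
                    continue
                mutant = codon[:i] + base + codon[i + 1:]
                if CODON_TABLE[mutant] == to_aa:
                    paths.append((codon, mutant, i + 1))
    return paths


if __name__ == "__main__":
    for old, new, pos in single_base_changes("V", "M"):
        print(f"{old} -> {new} at codon position {pos}")
        # Reading the same codons from the opposite strand of the genome:
        print(f"opposite strand: {reverse_complement(old)} -> {reverse_complement(new)}")
```

The only path found is GTG to ATG; read from the opposite strand, the same change appears as CAC to CAT (a C-to-T change), which is why the orientation in which reads are displayed matters when interpreting codons for reverse-strand genes such as UL14.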


Figure 3-8: Intra-strain variation observed at a polymorphic locus adjoining a homopolymeric tract, in an intergenic region of the HSV-1 isolate N-7. A potential polymorphic locus was detected at position 25,647 of both the son’s N-7 (7.8% minority allele) and the father’s R-13 (4% minority allele) viral genomes. However, inspection of the alignment of Illumina sequencing reads to the consensus genome (N-7 shown here) revealed that the polymorphic site detection resulted from a combination of small insertions or deletions in a homopolymeric tract of Gs in the consensus genome. A subset of the alignment of Illumina sequencing reads to the N-7 consensus genome is shown here, with the position and consensus sequence shown in the top row. Actual read depth at the position is indicated above. Insertions relative to the consensus are shown as a blue “I” (labeled as GG or GGG), and deletions relative to the consensus are shown as a black horizontal line (range of 1-6 bp shorter) in the aligned sequence read. The length of the G-homopolymer tract is shown on the left for those sequence reads that completely span the homopolymer tract. Homopolymer tract length cannot be inferred for reads that terminate within the G-tract; thus no length is listed for those reads. This position and other polymorphic loci that were detected adjacent to tandem repeats and homopolymeric tracts were flagged as such in Supplementary Table S3-2. Forward-oriented sequence reads are colored aqua, while reverse-oriented reads are colored green. Areas with no letter shown have 100% agreement with the consensus nucleotide; the letters are left out for clarity.

3.5 Discussion

Here we provide the first insights into viral inter- and intra-strain variation during familial transmission of HSV-1. The father’s and son’s viruses appear indistinguishable in their phenotypes in culture and in an animal model of pathogenesis.
The clinical course in the infected father and son was similar, with both experiencing relatively severe gingivostomatitis at the time of initial infection (as noted by each mother), followed by four to six recurrences per year thereafter. The randomly selected HSV-1 isolates from each individual were nearly identical to each other at the genomic level, despite being separated by over four decades, a horizontal transmission event, and multiple rounds of latency and reactivation. This exceptional level of genomic identity contrasts with that observed for in utero transmission of HCMV, which was found to include both a bottleneck and a subsequent expansion of viral diversity (197, 198, 276, 310). However, HCMV and HSV-1 differ in their pathogenesis. HCMV is known to infect a wider range of cell types and host organs, and to undergo more widespread viral dissemination in vivo, than HSV-1 (311). Furthermore, HCMV latency occurs in hematopoietic cells that undergo cell division, whereas the latent reservoir of HSV-1 in neurons does not divide. These distinctions may contribute to the observed differences in their level of intra-host variation.

The observation of nearly perfect preservation of these viral genomes during familial transmission leaves open the question of where the observed diversity between clinical isolates of HSV-1 arises. One possibility (i) is that inter-strain diversity results from a slow mutation rate and variations that have accumulated over millions of years, with millions of clades of virus circulating around the planet. In this case, the minority variants detected in the N-7 (son’s) viral genome may represent the fodder for evolutionary selection. Similarities among HSV-1 isolates from the same geographic regions suggest that selection for fitness adaptations to local human populations may have occurred. However, these geographic patterns could also be the result of founder effects and restricted human movement in earlier millennia. Another possible explanation (ii) for the observed inter-strain diversity of HSV-1 isolates is that horizontal transmission between unrelated hosts may induce greater selection for mutations than familial transmission. Sexual partners are unlikely to have matched immune selection for a newly transmitted HSV-1 isolate, and MHC alleles have been found to influence symptoms and severity of HSV disease (312, 313). This hypothesis suggests that if a population of viruses is transmitted to a new, unrelated host, then immune pressure over time in the new host could select for a viral population that differs from the one found in the original source. Yet another possibility (iii) is that hosts with multiple HSV-1 infections or exposures may provide an opportunity for recombination to occur, with a concomitant rise in the variation of viruses shed by these individuals. Although rare, variations in the HSV-1 population(s) within an infected individual have been documented previously (298, 314, 315).
Analyses of HSV-1 genomes by RFLP or full-genome comparison suggest that recombination between viral genomes has been rampant over historical time (4, 127, 316–318, 229). Finally, it is possible (iv) that different modes of transmission affect the amount of variation transferred from a source individual, thus influencing the subsequent selection and fixation of these variants. For instance, shedding from lesion sites is associated with a high copy number of detectable viral genomes, whereas asymptomatic shedding involves far fewer detectable genomes (319, 320). This suggests that asymptomatic shedding, which is a recognized source of transmission between individuals (321, 322), may impose a narrower genetic bottleneck than transmission via contact with lesions. In summary, using an animal model and HTS, we have shown that HSV-1 preserves its genetic and phenotypic characteristics during familial transmission. These approaches serve as a framework for future transmission studies using clinical isolates of HSV-1. Analysis of additional transmission events, particularly between unrelated individuals, will be necessary to assess how and where HSV-1 diversity arises.
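The bottleneck argument above can be made concrete with a deliberately simple model. This is our illustration, not an analysis performed in this study. Suppose N viral genomes are drawn at random from a source population in which a minority variant has frequency f. The probability that at least one transmitted genome carries the variant is then 1 − (1 − f)^N. The sketch below evaluates this expression for a variant at 3%, a frequency similar to many of the minor-allele frequencies in Table S3-2. The bottleneck sizes N are hypothetical.

```python
def transmission_probability(freq: float, n_genomes: int) -> float:
    """Chance that at least one of n_genomes sampled genomes carries a variant.

    Assumes genomes are drawn independently from a large source population
    (a binomial model) and ignores selection and drift after transmission.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie between 0 and 1")
    if n_genomes < 0:
        raise ValueError("n_genomes must be non-negative")
    return 1.0 - (1.0 - freq) ** n_genomes


# Hypothetical bottleneck sizes for a minority variant at 3% frequency.
for n in (1, 10, 100):
    print(f"N = {n:>3}: P(variant transmitted) = {transmission_probability(0.03, n):.3f}")
```

Under these assumptions, a 3% variant is more likely to be lost than transmitted when only ten genomes pass to the new host. When a hundred genomes pass, it is transmitted in roughly 95% of cases. This is the intuition behind expecting asymptomatic shedding to impose a narrower bottleneck than lesion contact.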


3.6 Acknowledgements

We thank Yoly Tafuri, Jacob Shreve, Lynn Enquist, and Lance Parsons for contributions to the early phases of this work, as well as members of the Thompson, Sawtell and Szpara labs for helpful feedback and discussion. Genomics work was supported by the Virus Pathogens Resource (ViPR), a Bioinformatics Resource Center (BRC) funded by NIAID, and by the Huck Institutes for the Life Sciences at the Pennsylvania State University (MLS). This project was funded, in part, under a grant with the Pennsylvania Department of Health using Tobacco CURE funds (MLS). The Department specifically disclaims responsibility for any analyses, interpretations, or conclusions. Additional support was provided by 1R01AI093614 grants to NS and RT. The findings and conclusions of this study do not necessarily reflect the views of the funding agency.


3.7 Supplementary tables

Table S3-1: Viral proteins with unique variations present only in isolates R-13 (father) and N-7 (son).

Protein | Mutation | Function
RL1 (ICP34.5) | Deletion of residues 92E, 93P, P95Q | Segments innate and adaptive immunity
RL2 (ICP0) | G76E | Viral gene expression, E3 ubiquitin ligase
RS1 (ICP4) | A1208S | Transactivator/repressor of viral transcription
UL2 | E216K | Uracil DNA glycosylase
UL6 | A395V | DNA entry into capsid
UL8 | A183V | Encodes putative primase subunit of helicase primase
UL11 | D37N | Membrane fusion during virus entry
UL12 | D303Y, P30H, P10S | DNA exonuclease
UL16 | P22S | Interacts with UL11 and UL21
UL22 (gH) | Q528K | Virion infectivity, cell-cell spread
UL24 | A210V | Unknown
UL25 | T339P | DNA encapsidation
UL26 | G64V, A203V | Protease involved in capsid assembly
UL28 | L184F, R720S | DNA encapsidation
UL29 (ICP8) | A319T | ssDNA binding protein necessary for DNA replication
UL30 | R1229I | Catalytic subunit of DNA polymerase
UL34 | W267C | Capsid exit from nucleus
UL36 | H393Y, D987E | Ubiquitin specific protease
UL37 | L1119F | Interacts with UL29 and UL36 to promote viral assembly
UL38 | L349I | Minor capsid protein
UL39 | L385M, E886A | Ribonucleotide reductase
UL42 | R404G, P444S | DNA polymerase processivity factor
UL43 | G22V, E/A253G | Unknown
UL45 | A27T | Unknown
UL46 | A8T, A266V | Regulates UL48 and UL47
UL47 | A275G | Enhances activity of UL48 to induce α genes
UL52 | D925G | Encodes primase subunit of helicase primase
UL54 (ICP27) | D263N, M339I | Inhibits host protein synthesis
UL55 | A87V | Unknown
US1 (ICP22) | P108H | Regulatory protein
US3 | A153T, V409A | Serine-Threonine kinase
US5 (gJ) | W7G, S41A | Segments apoptosis of host cells
US8A | S27N | Virulence factor
US10 | A211T | Unknown

Table S3-2: Intra-strain variation: polymorphic loci detected within the N-7 and R-13 consensus genomes.

Isolate N-7 (son)

Isolate | Position in the genome | Major allele | Minor allele | Minor allele frequency | Reads supporting major allele (forward strand) | Reads supporting major allele (reverse strand) | Reads supporting minor allele (forward strand) | Reads supporting minor allele (reverse strand) | Percent reads supporting minor allele (forward strand) | Percent reads supporting minor allele (reverse strand) | Type of variation | Gene | Position in relation to homopolymers and tandem repeats
N-7 | 19370 | C | T | 3.39% | 2545 | 1493 | 91 | 46 | 66% | 33.58% | Non-synonymous mutation | UL14 | No homopolymer or tandem repeats
N-7 | 25647 | G | A | 7.80% | 161 | 288 | 9 | 26 | 26% | 74.29% | Intergenic | N/A | Homopolymer
N-7 | 34541 | C | A | 5.17% | 661 | 1081 | 79 | 11 | 88% | 12.22% | Intergenic | N/A | Homopolymer
N-7 | 62253 | C | T | 2.64% | 1478 | 262 | 41 | 5 | 89% | 10.87% | Synonymous mutation | UL36 | Tandem repeat
N-7 | 62254 | T | G | 2.44% | 1350 | 250 | 32 | 7 | 82% | 17.95% | Non-synonymous mutation | UL36 | Tandem repeat
N-7 | 62260 | T | G | 3.83% | 1156 | 203 | 43 | 9 | 83% | 17.31% | Non-synonymous mutation | UL36 | Tandem repeat
N-7 | 62261 | G | C | 2.05% | 1256 | 205 | 22 | 8 | 73% | 26.67% | Non-synonymous mutation | UL36 | Tandem repeat
N-7 | 62266 | T | G | 2.38% | 969 | 164 | 22 | 5 | 81% | 18.52% | Non-synonymous mutation | UL36 | Tandem repeat
N-7 | 62283 | C | T | 2.12% | 696 | 57 | 13 | 3 | 81% | 18.75% | Synonymous mutation | UL36 | Tandem repeat
N-7 | 62289 | C | T | 4.85% | 620 | 40 | 21 | 11 | 66% | 34.38% | Synonymous mutation | UL36 | Tandem repeat
N-7 | 62290 | T | G | 3.21% | 594 | 60 | 14 | 7 | 67% | 33.33% | Non-synonymous mutation | UL36 | Tandem repeat
N-7 | 62293 | G | T | 2.10% | 691 | 117 | 2 | 15 | 12% | 88.24% | Non-synonymous mutation | UL36 | Tandem repeat
N-7 | 62295 | C | T | 7.58% | 518 | 89 | 21 | 25 | 46% | 54.35% | Synonymous mutation | UL36 | Tandem repeat
N-7 | 62296 | T | G | 2.74% | 546 | 147 | 9 | 10 | 47% | 52.63% | Non-synonymous mutation | UL36 | Tandem repeat


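N-7 | 62299 | G | T | 2.71% | 645 | 276 | 3 | 22 | 12% | 88.00% | Non-synonymous mutation | UL36 | Tandem repeat
N-7 | 62302 | T | G | 2.77% | 553 | 387 | 15 | 11 | 58% | 42.31% | Non-synonymous mutation | UL36 | Tandem repeat
N-7 | 62307 | C | T | 4.60% | 594 | 493 | 15 | 35 | 30% | 70.00% | Synonymous mutation | UL36 | Tandem repeat
N-7 | 62313 | C | T | 3.06% | 613 | 729 | 11 | 30 | 27% | 73.17% | Synonymous mutation | UL36 | Tandem repeat
N-7 | 103934 | G | A | 6.36% | 2722 | 1587 | 178 | 96 | 65% | 35.04% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 107692 | T | C | 13.80% | 354 | 30 | 13 | 40 | 25% | 75.47% | Intergenic | N/A | Homopolymer
N-7 | 107698 | G | C | 2.74% | 440 | 34 | 4 | 9 | 31% | 69.23% | Intergenic | N/A | Homopolymer
N-7 | 109235 | C | A | 15.03% | 280 | 1636 | 66 | 222 | 23% | 77.08% | Intergenic | N/A | Homopolymer
N-7 | 109240 | C | A | 4.06% | 284 | 1884 | 14 | 74 | 16% | 84.09% | Intergenic | N/A | Homopolymer
N-7 | 111002 | A | G | 18.18% | 1900 | 14 | 155 | 193 | 45% | 55.46% | Intergenic | N/A | Homopolymer
N-7 | 111003 | A | G | 16.63% | 1759 | 9 | 30 | 264 | 10% | 89.80% | Intergenic | N/A | Homopolymer
N-7 | 111537 | C | A | 3.33% | 3581 | 1345 | 16 | 148 | 10% | 90.24% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111542 | A | C | 16.93% | 3466 | 1028 | 85 | 676 | 11% | 88.83% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111543 | T | A | 17.13% | 3474 | 1063 | 100 | 677 | 13% | 87.13% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111544 | G | C | 29.85% | 3381 | 977 | 142 | 1159 | 11% | 89.09% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111569 | G | A | 8.42% | 1106 | 2470 | 229 | 72 | 76% | 23.92% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111570 | A | C | 5.82% | 1255 | 2764 | 179 | 55 | 76% | 23.50% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111571 | G | C | 14.65% | 1078 | 2492 | 160 | 363 | 31% | 69.41% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111572 | A | C | 9.72% | 1000 | 2468 | 100 | 237 | 30% | 70.33% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111573 | T | C | 16.97% | 909 | 2515 | 215 | 366 | 37% | 62.99% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111577 | T | C | 5.20% | 916 | 2702 | 73 | 115 | 39% | 61.17% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111578 | C | G | 2.84% | 936 | 2762 | 31 | 74 | 30% | 70.48% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 111581 | A | C | 2.72% | 847 | 2643 | 21 | 74 | 22% | 77.89% | Intergenic | N/A | No homopolymer or tandem repeats
N-7 | 117345 | G | A | 2.77% | 2809 | 117 | 14 | 67 | 17% | 82.72% | Intergenic | N/A | Tandem repeat
N-7 | 117346 | C | G | 3.07% | 2875 | 158 | 11 | 82 | 12% | 88.17% | Intergenic | N/A | Tandem repeat
N-7 | 117362 | G | C | 6.35% | 1004 | 572 | 19 | 81 | 19% | 81.00% | Intergenic | N/A | Tandem repeat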


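N-7 | 117377 | A | G | 5.05% | 491 | 995 | 37 | 38 | 49% | 50.67% | Intergenic | N/A | Tandem repeat
N-7 | 117382 | T | G | 2.30% | 484 | 1386 | 38 | 5 | 88% | 11.63% | Intergenic | N/A | Tandem repeat
N-7 | 117387 | A | C | 3.57% | 476 | 1428 | 51 | 17 | 75% | 25.00% | Intergenic | N/A | Tandem repeat
N-7 | 117393 | A | G | 3.89% | 456 | 1549 | 13 | 65 | 17% | 83.33% | Intergenic | N/A | Tandem repeat
N-7 | 117423 | G | C | 4.10% | 425 | 1843 | 16 | 77 | 17% | 82.80% | Intergenic | N/A | Tandem repeat
N-7 | 117547 | T | C | 3.37% | 4 | 1184 | 36 | 4 | 90% | 10.00% | Intergenic | N/A | Tandem repeat
N-7 | 117555 | T | C | 2.97% | 2 | 1245 | 11 | 26 | 30% | 70.27% | Intergenic | N/A | Tandem repeat
N-7 | 128949 | T | A | 2.40% | 2603 | 1813 | 64 | 42 | 60% | 39.62% | Non-synonymous mutation | US6 | No homopolymer or tandem repeats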

Isolate R-13 (father)

Isolate | Position in the genome | Major allele | Minor allele | Minor allele frequency | Reads supporting major allele (forward strand) | Reads supporting major allele (reverse strand) | Reads supporting minor allele (forward strand) | Reads supporting minor allele (reverse strand) | Percent reads supporting minor allele (forward strand) | Percent reads supporting minor allele (reverse strand) | Type of variation | Gene | Position in relation to homopolymers and tandem repeats
R-13 | 25647 | G | A | 4.09% | 91 | 399 | 2 | 19 | 10% | 90.48% | Intergenic | N/A | Homopolymer
R-13 | 53167 | C | A | 21.47% | 47 | 454 | 24 | 113 | 18% | 82.48% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53168 | G | C | 23.38% | 44 | 437 | 25 | 123 | 17% | 83.11% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53177 | C | G | 9.12% | 48 | 469 | 5 | 47 | 10% | 90.38% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53178 | A | C | 13.43% | 33 | 413 | 11 | 65 | 14% | 85.53% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53179 | G | T | 9.53% | 45 | 456 | 6 | 47 | 11% | 88.68% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53182 | G | T | 7.41% | 47 | 440 | 5 | 34 | 13% | 87.18% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53183 | G | C | 7.09% | 42 | 442 | 8 | 29 | 22% | 78.38% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53184 | T | C | 11.67% | 34 | 418 | 14 | 46 | 23% | 76.67% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53185 | G | C | 6.31% | 44 | 430 | 5 | 27 | 16% | 84.38% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53187 | G | C | 5.31% | 36 | 405 | 6 | 19 | 24% | 76.00% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53188 | C | G | 4.08% | 35 | 396 | 5 | 14 | 26% | 73.68% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 53189 | G | C | 6.32% | 33 | 397 | 8 | 21 | 28% | 72.41% | Intergenic | N/A | No homopolymer or tandem repeats


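R-13 | 53190 | T | C | 5.76% | 28 | 379 | 7 | 19 | 27% | 73.08% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 62375 | C | G | 2.68% | 958 | 192 | 25 | 7 | 78% | 21.88% | Non-synonymous mutation | UL36 | Tandem repeat
R-13 | 62377 | T | G | 3.72% | 874 | 185 | 32 | 9 | 78% | 21.95% | Non-synonymous mutation | UL36 | Tandem repeat
R-13 | 62378 | T | G | 4.03% | 879 | 181 | 34 | 11 | 76% | 24.44% | Non-synonymous mutation | UL36 | Tandem repeat
R-13 | 62379 | T | C | 5.31% | 818 | 154 | 51 | 9 | 85% | 15.00% | Synonymous mutation | UL36 | Tandem repeat
R-13 | 62391 | C | T | 2.42% | 897 | 230 | 3 | 25 | 11% | 89.29% | Synonymous mutation | UL36 | Tandem repeat
R-13 | 62397 | C | T | 2.67% | 882 | 314 | 4 | 29 | 12% | 87.88% | Synonymous mutation | UL36 | Tandem repeat
R-13 | 62398 | T | G | 2.48% | 789 | 458 | 24 | 8 | 75% | 25.00% | Non-synonymous mutation | UL36 | Tandem repeat
R-13 | 62404 | T | G | 2.28% | 678 | 547 | 21 | 8 | 72% | 27.59% | Non-synonymous mutation | UL36 | Tandem repeat
R-13 | 62416 | G | T | 20.98% | 293 | 610 | 206 | 34 | 86% | 14.17% | Non-synonymous mutation | UL36 | Tandem repeat
R-13 | 62417 | C | G | 25.86% | 240 | 605 | 265 | 36 | 88% | 11.96% | Non-synonymous mutation | UL36 | Tandem repeat
R-13 | 62422 | A | T | 16.04% | 262 | 635 | 152 | 22 | 87% | 12.64% | Non-synonymous mutation | UL36 | Tandem repeat
R-13 | 107731 | G | C | 2.46% | 24 | 253 | 4 | 3 | 57% | 42.86% | Intergenic | N/A | Junction of UL and IRL
R-13 | 107732 | G | C | 3.56% | 22 | 243 | 3 | 7 | 30% | 70.00% | Intergenic | N/A | Junction of UL and IRL
R-13 | 107766 | T | C | 33.73% | 127 | 37 | 8 | 76 | 10% | 90.48% | Intergenic | N/A | Tandem repeat
R-13 | 107771 | T | C | 8.14% | 155 | 72 | 3 | 18 | 14% | 85.71% | Intergenic | N/A | Tandem repeat
R-13 | 110334 | A | G | 12.89% | 619 | 2 | 25 | 67 | 27% | 72.83% | Intergenic | N/A | Homopolymer
R-13 | 110902 | C | A | 9.80% | 183 | 330 | 40 | 19 | 68% | 32.20% | Intergenic | N/A | Tandem repeat
R-13 | 110903 | T | A | 4.63% | 125 | 303 | 15 | 7 | 68% | 31.82% | Intergenic | N/A | Tandem repeat
R-13 | 110906 | C | A | 5.11% | 211 | 328 | 12 | 17 | 41% | 58.62% | Intergenic | N/A | Tandem repeat
R-13 | 110908 | C | T | 2.30% | 109 | 349 | 5 | 6 | 45% | 54.55% | Intergenic | N/A | Tandem repeat
R-13 | 110909 | G | A | 5.15% | 83 | 331 | 6 | 17 | 26% | 73.91% | Intergenic | N/A | Tandem repeat
R-13 | 110912 | C | G | 4% | 63 | 339 | 3 | 14 | 18% | 82.35% | Intergenic | N/A | Tandem repeat
R-13 | 110913 | C | A | 4.25% | 68 | 335 | 5 | 13 | 28% | 72.22% | Intergenic | N/A | Tandem repeat
R-13 | 110915 | C | A | 2.77% | 64 | 317 | 2 | 9 | 18% | 81.82% | Intergenic | N/A | Tandem repeat
R-13 | 110916 | C | G | 2.20% | 61 | 324 | 4 | 5 | 44% | 55.56% | Intergenic | N/A | Tandem repeat
R-13 | 110918 | C | A | 3.05% | 48 | 323 | 2 | 10 | 17% | 83.33% | Intergenic | N/A | Tandem repeat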


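R-13 | 110919 | T | A | 2.39% | 45 | 280 | 1 | 7 | 13% | 87.50% | Intergenic | N/A | Tandem repeat
R-13 | 111000 | C | A | 3.20% | 252 | 558 | 5 | 22 | 19% | 81.48% | Intergenic | N/A | Tandem repeat
R-13 | 111002 | C | A | 2.64% | 277 | 567 | 9 | 14 | 39% | 60.87% | Intergenic | N/A | Tandem repeat
R-13 | 111003 | A | T | 3.14% | 268 | 542 | 5 | 22 | 19% | 81.48% | Intergenic | N/A | Tandem repeat
R-13 | 111004 | C | G | 3.08% | 266 | 702 | 11 | 21 | 34% | 65.63% | Intergenic | N/A | Tandem repeat
R-13 | 116797 | A | C | 7.34% | 71 | 29 | 6 | 2 | 75% | 25.00% | Intergenic | N/A | No homopolymer or tandem repeats
R-13 | 117109 | G | C | 2.25% | 699 | 209 | 17 | 4 | 81% | 19.05% | Intergenic | N/A | Tandem repeat
R-13 | 117115 | G | C | 2.77% | 525 | 231 | 10 | 12 | 45% | 54.55% | Intergenic | N/A | Tandem repeat
R-13 | 117119 | T | C | 2.47% | 416 | 328 | 16 | 3 | 84% | 15.79% | Intergenic | N/A | Tandem repeat
R-13 | 117124 | A | C | 5.29% | 360 | 337 | 35 | 4 | 90% | 10.26% | Intergenic | N/A | Tandem repeat
R-13 | 117131 | G | T | 4.37% | 238 | 348 | 25 | 3 | 89% | 10.71% | Intergenic | N/A | Tandem repeat
R-13 | 117146 | A | C | 2.63% | 171 | 416 | 14 | 2 | 88% | 12.50% | Intergenic | N/A | Tandem repeat
R-13 | 117178 | A | C | 2.85% | 146 | 491 | 14 | 5 | 74% | 26.32% | Intergenic | N/A | Tandem repeat
R-13 | 117188 | A | C | 2.74% | 146 | 630 | 17 | 5 | 77% | 22.73% | Intergenic | N/A | Tandem repeat
R-13 | 117194 | A | G | 2.59% | 166 | 650 | 4 | 18 | 18% | 81.82% | Intergenic | N/A | Tandem repeat
R-13 | 117355 | T | C | 5.56% | 19 | 473 | 20 | 9 | 69% | 31.03% | Intergenic | N/A | Tandem repeat
R-13 | 117364 | T | C | 3.51% | 1 | 466 | 13 | 4 | 76% | 23.53% | Intergenic | N/A | Tandem repeat
R-13 | 117372 | T | C | 3.02% | 0 | 481 | 2 | 13 | 13% | 86.67% | Intergenic | N/A | Tandem repeat
R-13 | 122736 | C | G | 2.33% | 363 | 14 | 4 | 5 | 44% | 55.56% | Intergenic | N/A | Tandem repeat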
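The last two numeric columns of Table S3-2 show how the reads supporting each minor allele are split between the forward and reverse strands. This split is a common screen for strand-specific sequencing artifacts. The minimal sketch below assumes that each percentage is the minor-allele read count on one strand divided by all minor-allele reads. Under that assumption it reproduces, for example, the 33.58% reverse-strand value for N-7 position 19370 (46 of 137 reads). The 90% skew cutoff is a hypothetical value, not a threshold used in this study.

```python
from typing import Tuple


def minor_allele_strand_split(minor_fwd: int, minor_rev: int) -> Tuple[float, float]:
    """Percent of minor-allele reads on the forward and reverse strands."""
    total = minor_fwd + minor_rev
    if total == 0:
        raise ValueError("no reads support the minor allele")
    return 100.0 * minor_fwd / total, 100.0 * minor_rev / total


def is_strand_skewed(minor_fwd: int, minor_rev: int, cutoff: float = 90.0) -> bool:
    """Flag a site when at least `cutoff` percent of minor-allele reads share one strand.

    The default cutoff is a hypothetical choice for illustration only.
    """
    fwd_pct, rev_pct = minor_allele_strand_split(minor_fwd, minor_rev)
    return max(fwd_pct, rev_pct) >= cutoff


# N-7 position 19370: 91 forward and 46 reverse reads support the minor allele T.
fwd, rev = minor_allele_strand_split(91, 46)
print(f"forward {fwd:.2f}%, reverse {rev:.2f}%")
```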


Chapter 4

Effect of recombination on evolution of virulence in field isolates of MDV-1

Utsav Pandey1, Andrew S. Bell2, Daniel Renner1, Matthew J. Jones2, Andrew F. Read2, Moriah L. Szpara1, Maciej F. Boni2

1Department of Biochemistry and Molecular Biology, Center for Infectious Disease Dynamics, and the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, 16802, USA

2 Center for Infectious Disease Dynamics, Departments of Biology and Entomology, Pennsylvania State University, University Park, Pennsylvania 16802, USA

Manuscript in preparation

Acknowledgements: Viral DNA was obtained from collaborators at the USDA. U.P. and A.B. prepared viral DNA libraries for sequencing. D.R. and U.P. assembled the full-length genomes. U.P. and M.B. performed the analysis. U.P., M.S., and A.R. conceptualized the work. U.P. wrote the manuscript.


4.1 Abstract

Understanding the mechanisms of virulence evolution in pathogens is key to predicting future evolution and preventing outbreaks. The evolution of Marek’s disease virus (MDV-1) into more virulent forms in the latter half of the twentieth century is an important case study in the evolution of virulence and in vaccine failure. The severity of Marek’s disease (MD) has risen over the last 50 years. Changes in farming practices, as well as widespread vaccination with a non-sterilizing vaccine, have both been implicated in this rise in virulence. Although field isolates of MDV-1 show increased virulence, the genetic basis for this rise remains unclear. Using 64 field isolates of MDV-1, we explored the role of homologous recombination between circulating strains of MDV-1 in the emergence of higher-virulence pathotypes. These isolates were collected between 1962 and 2014 across the United States. Each isolate was categorized using a standardized in vivo pathotype grading scheme, ranging from virulent (v) through very virulent (vv) to very virulent plus (vv+). We obtained viral DNA for these isolates from our collaborators at the United States Department of Agriculture (USDA) and sequenced them on the Illumina MiSeq and HiSeq platforms. The resulting reads were assembled de novo to obtain full-length genomes. The full-length genomes of all 64 isolates were then analyzed for the presence of recombination breakpoints using the program 3Seq. We identified a total of 6 breakpoints (7 genome segments) across the MDV-1 genome that had statistical and phylogenetic support. These genome segments were then analyzed to identify potential recombinant viruses. We identified numerous recombination events that affected the evolutionary history of most of the isolates analyzed. Several of these events may have led to the emergence of higher virulence in MDV-1.
Our observations indicate that the evolution of virulence in field isolates is complex and may have occurred through multiple pathways.


4.2 Introduction

The disease manifestation and pathogenesis of Marek’s disease (MD) have undergone dramatic changes since the Hungarian veterinarian Jozsef Marek first described it in 1907 (153, 182). Over the last century, MD has evolved from a chronic, sporadic disease into an acute, aggressive disease of global concern (153, 182, 323). MD is caused by Marek’s disease virus serotype 1 (MDV-1, also referred to as Gallid herpesvirus 2 [GaHV-2]). MDV-1 belongs to the family Herpesviridae, subfamily Alphaherpesvirinae, and genus Mardivirus (109). MDV-1 has a double-stranded DNA genome of ~180 kilobases that encodes ~103 genes (161). Similar to other alphaherpesviruses, the MDV-1 genome has a class E organization consisting of unique long (UL) and unique short (US) regions. Each unique region is bordered by its own repeat regions, arranged in opposite orientations. The terminal repeats (terminal repeat long [TRL] and terminal repeat short [TRS]) lie at the extremities of the genome, while the two internal repeats (internal repeat long [IRL] and internal repeat short [IRS]) abut each other (Figure 4.2) (109). Until the late 1950s, MD was categorized as a mild, chronic infection characterized by inflammation of nerves and by paralysis caused by infiltration of lymphoid cells by the virus (153, 182, 323). Beginning in the late 1950s and continuing through the 1960s, a more severe form of MD was described. This disease, now termed acute MD, was characterized by systemic lymphomas, inflammation, paralysis, and morbidity in infected hosts (324, 325). Increased losses in the poultry industry due to acute MD led to the introduction of the first generation of MD vaccines in the late 1960s (146). To date, three generations of MDV vaccines have been used commercially for MD management. Although vaccines have provided temporary protection against MD during different time periods, vaccine-break strains of MDV-1 have now emerged against two generations of MDV vaccines (146, 182).
Thus, the emergence of vaccine-break strains against the third-generation vaccine currently in use remains a distinct possibility. All three generations of MDV vaccines are live-attenuated vaccines that provide non-sterilizing immunity against the disease. The vaccines protect the host from developing disease symptoms, but do not protect it from being infected or from transmitting the virus to other hosts (142, 146). Recent theoretical and experimental studies have shown that the use of non-sterilizing vaccines can select for viruses with higher virulence (178, 326). The rise in disease severity also coincides with the introduction of industrial-scale poultry farming practices in the late 1950s (182, 323, 327), chief among them the shortened cohort duration in modern poultry farms (148, 328). The rise in severity of MD has been documented through the isolation of MDV-1 strains with various degrees of pathogenicity from vaccinated and unvaccinated chickens (153, 325). These strains have been divided into mildly virulent (m), virulent (v), very virulent (vv), and very virulent plus (vv+) pathotypes based on their pathogenicity (153, 182). Briefly, mildly virulent (m) isolates of MDV-1 do not cause aggressive disease, and hosts can be protected against them with the first-generation vaccine. Virulent (v) isolates cause acute disease in hosts vaccinated with the first-generation vaccine, but hosts can be protected against them with the second-generation vaccine. Very virulent (vv) and very virulent plus (vv+) isolates are both able to cause disease in hosts vaccinated with the second-generation vaccine, and vv+ isolates cause more severe disease (153, 324). Despite clinical and laboratory data demonstrating increased virulence in field isolates of MDV-1, the genetic basis of MDV-1 evolution into more virulent forms is not well understood. A few genes, such as Meq, UL36, ICP4, and vLIP, have been associated with MDV-1 pathogenicity; however, very little is known about their interaction partners (323, 329–331). Most studies aimed at deciphering the genetic basis of virulence in MDV have focused on a single gene or on mutations within a single gene (332–336). The virulence of a pathogen is a complex trait determined by interactions between multiple genes.
Hence, examining point mutations alone is inadequate to explain the molecular mechanisms underlying the emergence of complex traits. Homologous recombination between different viral strains can lead to the inheritance of co-segregating genes and can also bring together mutations that have emerged under different selective pressures. Several studies have now demonstrated that homologous recombination is crucial for the evolution of herpesviruses (134, 136, 137, 318, 337).


Recombination between attenuated herpesvirus isolates in natural and laboratory settings has been shown to produce highly pathogenic strains. In a related poultry alphaherpesvirus, infectious laryngotracheitis virus (ILTV), two live-attenuated vaccine strains were shown to recombine to form a virulent strain, raising concerns about the use of live-attenuated vaccines (139). Similarly, two weakly neuroinvasive isolates of herpes simplex virus 1 (HSV-1) were shown to produce highly neuroinvasive progeny strains in a murine model (131). In HSV-1, recombination sites have been detected in GC-rich regions of the genome, mainly in intergenic and repetitive regions (140). Recombination in herpesviruses has also been shown to be closely associated with genome replication, which makes it an essential part of viral replication (132, 338). Few prior studies have explored the effect of recombination on MDV evolution (337, 339). Hughes et al. examined four full-length MDV-1 genomes for recombination using a method based on synonymous mutations in orthologous genes (337). They detected recombination between the vaccine strain CVI988 and the virulent strains MD5 and MD11, and identified the transfer of virulence factors between isolates GA, MD5, and MD11. In contrast, Loncoman et al. used the suite of methods in the RDP4 package to detect recombination breakpoints in 15 MDV-1 isolates (339, 340). They identified recombination breakpoints in the unique long (UL), unique short (US), and terminal/internal repeat long (TRL/IRL) regions of the MDV genome (337, 339, 341). These studies used a limited number of isolates with ambiguous pathotypes. Here, we present findings from a recombination study of 64 field isolates of MDV-1.
These isolates were collected from poultry farms across the United States from 1962 to 2014, and were pathotyped using a standardized virulence grading scheme developed by Witter et al. (153). The isolates cover the range of phenotypes observed in field isolates, from mildly virulent (m) to very virulent plus (vv+). We obtained viral DNA for these isolates from our collaborators at the USDA and sequenced them on the Illumina MiSeq and HiSeq platforms. The resulting reads were assembled de novo to obtain full-length genomes. We used the software 3Seq (342, 343) to identify recombination breakpoints across the full-length genomes, and then analyzed the impact of these breakpoints on phylogeny using maximum likelihood (ML) trees. Using these analyses, we identified 6 recombination breakpoints across the MDV-1 genomes that had statistical and phylogenetic support.

4.3 Methods

4.3.1 Specimen collection and pathotyping

MDV-1 strains were collected from poultry farms across the United States from 1962 to 2014. Isolation and pathotyping of the field isolates were performed as outlined by Witter et al. (153). The year of collection, geographic location, and virulence ranking of each isolate are listed in Table 4.1.


Table 4.1: Names, pathotypes, and year and location of isolation of MDV-1 isolates

Isolate # | Isolate | Pathotype | Year | Location
1 | JM102W | v | 1962 | MA
2 | GA22 | v | 1965 | GA
3 | RPL39 | v | 1969 | GA
4 | MD5 | vv | 1977 | MD
5 | Md11 | vv | 1977 | MD
6 | Md3 | v | 1977 | MD
7 | Md8 | v | 1977 | MD
8 | 232 | v | 1978 | MI
9 | 287L | vv | 1979 | AL
10 | 295 | v | 1980 | CO
11 | MSU2 | v | 1980 | MI
12 | MIS-X | v | 1980 | GA
13 | RB1B | v | 1982 | NY
14 | 549AA | vv | 1987 | DE
15 | 549AB | vv | 1987 | DE
16 | 568A | vv | 1988 | NC
17 | 568B | vv | 1988 | NC
18 | 571 | v | 1989 | CA
19 | 584B | vv+ | 1990 | NC
20 | 584A | vv+ | 1990 | NC
21 | 583A | vv | 1990 | IA
22 | 596A | v | 1991 | WI
23 | 608 | vv | 1992 | AR
24 | 611 | vv | 1992 | PA
25 | 612 | vv | 1992 | ME
26 | 610B | vv | 1992 | MD
27 | 610A | vv+ | 1992 | MD
28 | 615K | vv | 1993 | DE
29 | 617A | v | 1993 | OH
30 | 643G | vv | 1994 | NE
31 | 643P | vv | 1994 | NE
32 | 648B | vv+ | 1994 | OH
33 | 645 | vv+ | 1994 | PA
34 | 648B | vv+ | 1994 | OH
35 | 652 | vv+ | 1995 | NY
36 | 660A | vv+ | 1995 | OH
37 | 653A | vv | 1995 | DE
38 | 656C | vv | 1995 | VA
39 | 674 | vv+ | 1997 | DE
40 | 676 | vv+ | 1997 | PA
41 | 685 | vv | 1997 | GA
42 | 670 | vv | 1997 | ME
43 | 686 | vv+ | 1999 | IA
44 | 690 | vv+ | 1999 | GA
45 | 691 | vv | 1999 | GA
46 | 701 | vv/vv+ | 2007 | PA
47 | 709A | vv+ | 2010 | PA
48 | 709B | v | 2010 | PA
49 | 723 | vv | 2011 | PA
50 | 718B | vv | 2011 | PA
51 | 722A | vv+ | 2011 | IA
52 | 722B | vv+ | 2011 | IA
53 | 722D | vv+ | 2011 | IA
54 | 718A | vv | 2011 | PA
55 | 722C | vv+ | 2011 | IA
56 | 730A | vv+ | 2013 | IA
57 | 730B | vv+ | 2013 | IA
58 | 730C | vv+ | 2013 | IA
59 | 739C1 | vv+ | 2013 | DE
60 | 739A | vv+ | 2013 | DE
61 | 747-15 | v | 2014 | GA
62 | 747-22-1 | v | 2014 | GA
63 | 747C | v | 2014 | GA
64 | CU2 | v/m | not known | not known


4.3.2 DNA isolation, genome sequencing, and de novo assembly of viral genomes

Sixty-four full-length MDV-1 genomes were obtained for this study. Nucleocapsid DNA was isolated using the protocol described by Spatz et al. (344). Sequencing libraries for each isolate were prepared using the Illumina TruSeq Nano DNA Sample Prep Kit, according to the manufacturer’s recommended protocol for sequencing of genomic DNA. The target DNA fragment size for library construction was 550 base pairs (bp). Twenty-six samples were sequenced on an in-house Illumina MiSeq using version 3 chemistry to obtain 300 × 300 bp paired-end reads. The remaining 38 samples were sequenced on an Illumina HiSeq at The Pennsylvania State University Genomics Core Facility in two-lane rapid-run mode to obtain 250 × 250 bp paired-end reads. De novo assembly of MDV-1 consensus genomes was performed using the viral genome assembly (VirGA) workflow (345). Briefly, VirGA combines quality-control preprocessing of reads, de novo assembly, genome linearization and annotation, and post-assembly quality assessment. MDV-1 strain Md5 (GenBank accession NC_002229) was used as the comparator for the reference-guided portion of viral genome assembly in VirGA.

4.3.3 Genome-wide examination for recombination

A genome alignment of all 64 strains was made using MAFFT (346). A trimmed-genome format (144) without the terminal repeat regions was used for analyses to avoid overrepresenting the repeat regions. The alignment was then analyzed with the 3Seq program to determine recombination breakpoints (342, 343). 3Seq takes a nucleotide sequence alignment and tests triplets drawn from it for mosaic recombination signals, to determine whether any of the three sequences (the child) is a recombinant of the other two (the parents). 3Seq performs a non-parametric test for mosaicism and uses pre-computed p-values to determine statistical significance. The first pass of the 64-genome alignment through 3Seq identified 55 potential breakpoints (Figure 4.1), of which 37 were non-overlapping. The genome segments defined by these 37 breakpoints were then analyzed, and any segment containing fewer than 20 polymorphic sites was combined with the adjacent downstream segment.


This yielded a total of 22 segments across all strains analyzed. Maximum likelihood (ML) trees were then produced for all 22 genome segments using the RAxML program (347) with the GTRCAT model of molecular evolution (data not shown). Support for each ML tree was assessed with 100 bootstrap replicates. The ML phylogenetic trees for all 22 segments were manually examined for phylogenetic incongruence among viral isolates, relative to the ML tree of the adjacent downstream segment.
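The two mechanical steps described above can be sketched as follows. First, overlapping 3Seq breakpoints are collapsed. Second, any segment containing fewer than 20 polymorphic sites is merged into the adjacent downstream segment. This is an illustrative sketch, not the code used for this dissertation. It makes three assumptions: breakpoints are represented as intervals; each merged interval is placed at its midpoint; and a sparse final segment is merged upstream, because it has no downstream neighbour.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def polymorphic_positions(aligned_seqs: Sequence[str]) -> List[int]:
    """1-based alignment columns with more than one nucleotide (gaps and N ignored)."""
    # zip() stops at the shortest string, so all sequences should share one length.
    return [i + 1 for i, column in enumerate(zip(*aligned_seqs))
            if len({base.upper() for base in column} - {"-", "N"}) > 1]


def collapse_overlapping(intervals: Sequence[Tuple[int, int]]) -> List[int]:
    """Merge overlapping breakpoint intervals; report each merged interval by its midpoint."""
    merged: List[List[int]] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)  # overlaps the previous interval
        else:
            merged.append([lo, hi])
    return [(lo + hi) // 2 for lo, hi in merged]


def merge_sparse_segments(breakpoints: Sequence[int], poly_sites: Sequence[int],
                          genome_length: int, min_sites: int = 20) -> List[int]:
    """Drop breakpoints until every segment holds at least `min_sites` polymorphic sites.

    A breakpoint b ends one segment at b; the next segment starts at b + 1.
    """
    sites = sorted(poly_sites)

    def n_sites(start: int, end: int) -> int:
        # Number of polymorphic positions p with start <= p <= end.
        return bisect_right(sites, end) - bisect_left(sites, start)

    ends: List[int] = []
    start = 1
    for end in sorted(b for b in set(breakpoints) if 0 < b < genome_length) + [genome_length]:
        if end != genome_length and n_sites(start, end) < min_sites:
            continue  # remove this breakpoint so the segment extends downstream
        ends.append(end)
        start = end + 1
    if len(ends) > 1 and n_sites(ends[-2] + 1, ends[-1]) < min_sites:
        ends.pop(-2)  # a sparse final segment has no downstream neighbour: merge upstream
    return ends[:-1]  # the genome end itself is not a breakpoint


bps = collapse_overlapping([(100, 150), (140, 160), (400, 420), (700, 710)])
print(bps, merge_sparse_segments(bps, [10, 50, 200, 500, 600, 800, 900], 1000, min_sites=2))
```

Only these bookkeeping steps are shown. The later step, which compared ML trees of adjacent segments and reduced 21 breakpoints to 6, relied on manual examination of the trees.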

[Figure 4.1 flowchart: 64-genome alignment → 3Seq → 55 breakpoints identified by 3Seq → 37 non-overlapping breakpoints → adjacent blocks with <20 SNPs combined → 21 breakpoints (22 blocks) → ML trees constructed for all 22 blocks → adjacent blocks lacking phylogenetic incongruence combined → 6 breakpoints (7 blocks) → ML trees constructed for all 7 segments → identification of recombinants between genome blocks]

Figure 4.1: Overview of the method used for recombination analysis of MDV-1 isolates. The recombination analysis used breakpoints identified by the program 3Seq. The breakpoints were used to divide the genomes into genome segments. The segments were then used to generate maximum-likelihood phylogenetic trees, which were analyzed for phylogenetic incongruence between segments. Adjacent segments lacking phylogenetic incongruence were combined into a single genome segment.

Segments separated by recombination breakpoints but lacking phylogenetic support were combined. Using this approach, we identified a total of 6 recombination breakpoints across the genome that had statistical and phylogenetic support (Figures 4.1 and 4.2). ML trees were then constructed for all 7 segments. To illustrate the role of recombination in MDV evolution, we examined the phylogenetic incongruence between the longest recombinant segment (segment 3) and each of the remaining segments (Figures 4.3–4.9). MDV isolates that showed phylogenetic incongruence in a particular segment relative to segment 3, with a bootstrap support of at least 70, were identified as recombinants. Recombination events between isolates are shown as dotted lines, drawn on the phylogenetic tree of segment 3 as the framework. The segment 3 tree was modified to depict the year of isolation of each isolate (Figures 4.3–4.9).
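In this study, phylogenetic incongruence was judged by manual examination of the ML trees. The sketch below illustrates the underlying idea only: it flags incongruence automatically using clade compatibility. Two clades of a rooted tree can coexist only if one contains the other or they share no isolates. Clades are given as sets of isolate names paired with bootstrap support. The support cutoff of 70 follows the text, and the isolate labels are hypothetical. Extracting clades from RAxML tree files is not shown.

```python
from typing import FrozenSet, List, Sequence, Tuple

Clade = FrozenSet[str]


def compatible(a: Clade, b: Clade) -> bool:
    """Two clades can occur on the same rooted tree only if nested or disjoint."""
    return a <= b or b <= a or a.isdisjoint(b)


def conflicting_clades(segment_clades: Sequence[Tuple[Clade, float]],
                       reference_clades: Sequence[Tuple[Clade, float]],
                       min_support: float = 70.0) -> List[Tuple[Clade, float]]:
    """Well-supported clades of one segment's tree that contradict a well-supported reference clade."""
    strong_reference = [ref for ref, support in reference_clades if support >= min_support]
    return [(clade, support) for clade, support in segment_clades
            if support >= min_support
            and any(not compatible(clade, ref) for ref in strong_reference)]


# Hypothetical clades (isolates A-D) with bootstrap support.
reference = [(frozenset({"A", "B"}), 95.0), (frozenset({"C", "D"}), 88.0)]  # e.g. segment 3
segment = [(frozenset({"B", "C"}), 80.0),   # conflicts with {A, B} and {C, D}
           (frozenset({"A", "B"}), 90.0),   # agrees with the reference
           (frozenset({"B", "D"}), 40.0)]   # conflicts, but support is below the cutoff
print(conflicting_clades(segment, reference))
```

A conflicting clade only shows that two segments support different histories for those isolates. Deciding which isolate is the recombinant, and which lineage donated the segment, still requires examining the trees, as was done for Figures 4.3–4.9.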

4.4 Results

4.4.1 Genome-wide examination of recombination in field isolates of MDV-1

Full-length genomes of 64 MDV-1 isolates were examined for recombination using 3Seq (348). 3Seq detects mosaic structures by comparing sequence triplets using nonparametric statistical tests. This method minimizes the number of false positives due to rate variation and is capable of analyzing large data sets. After combining overlapping breakpoints identified by 3Seq, and merging genome segments with fewer than 20 polymorphic sites, we identified 6 breakpoints (7 segments) across the MDV genomes that had statistical and phylogenetic support (see Methods for details) (Figure 4.2, Table 4.2).

Table 4.2: Summary of breakpoints identified across MDV genomes

Segment | Start position | End position | Length (bp) | Number of polymorphisms contained in the segment | Genomic region affected by the breakpoint
1 | 1 | 19,100 | 19,100 | 49 | MDV021 (UL9)
2 | 19,101 | 24,900 | 5,800 | 21 | MDV025 (UL13)
3 | 24,901 | 133,800 | 108,900 | 755 | IRL (upstream of Meq)
4 | 133,801 | 140,500 | 6,700 | 94 | IRL (downstream of Meq)
5 | 140,501 | 145,000 | 4,500 | 137 | a’
6 | 145,001 | 151,800 | 6,800 | 247 | ICP4
7 | 151,801 | 179,345 | 27,544 | 605 | N/A
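As we understand the method, 3Seq treats each sequence triplet as a child C and two candidate parents P and Q. It keeps only the sites where C matches exactly one parent, and tests whether the resulting run of P-like and Q-like sites is more clustered than expected by chance. The sketch below computes that walk for one ordered triplet (+1 for a P-like site, −1 for a Q-like site) together with the walk's maximum descent. It is a simplified illustration under our reading of the method. It omits 3Seq's exact p-value tables and multiple-comparison correction, and the sequences are toy examples.

```python
from typing import List


def informative_walk(parent_p: str, parent_q: str, child: str) -> List[int]:
    """Random walk over sites where the child matches exactly one parent.

    Steps +1 where the child matches P (but not Q) and -1 where it matches Q (but not P).
    """
    walk = [0]
    for p, q, c in zip(parent_p, parent_q, child):
        if p == q:
            continue  # parents agree: site is uninformative for this triplet
        if c == p:
            walk.append(walk[-1] + 1)
        elif c == q:
            walk.append(walk[-1] - 1)
        # a child matching neither parent is also uninformative
    return walk


def max_descent(walk: List[int]) -> int:
    """Largest drop from an earlier peak to a later point in the walk."""
    best, peak = 0, walk[0]
    for value in walk:
        peak = max(peak, value)
        best = max(best, peak - value)
    return best


p, q = "AAAAAAAAAA", "CCCCCCCCCC"
print(max_descent(informative_walk(p, q, "AAACCCCAAA")))  # mosaic child: prints 4
print(max_descent(informative_walk(p, q, "AAAAAAAAAA")))  # child identical to P: prints 0
```

Swapping P and Q tests the opposite pattern, in which Q-like sites flank a P-like block. For a real triplet, a large descent would still need 3Seq's significance test before it could be reported as recombination.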

Among the 6 breakpoints identified, 2 were located in the unique long (UL) region of the genome, 2 in the repeat long (RL) region, 1 in the repeat short (RS) region, and 1 in the a’ sequence (Figure 4.2). The two breakpoints in the UL region fell within UL9, an origin of replication-binding protein (349), and UL13, a serine-threonine kinase (350). The two breakpoints in the RL region lay upstream and downstream of the open reading frame (ORF) of Meq, a known oncoprotein (351). The breakpoint in the RS region was located in the ORF of ICP4, a viral transactivator (222). Finally, a single breakpoint was detected in the a’ sequence, which serves as a cleavage/packaging signal in herpesviruses (352). The genome segments defined by these breakpoints differed in length and in the number of polymorphic positions they contained (Table 4.2). The longest segment, segment 3 (positions 24,901–133,800; 108,900 bp), also contained the largest number of polymorphisms (755).

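[Figure 4.2 image. Panel A: MDV-1 genome (TRL, UL, IRL, IRS, US, TRS; a and a’ sequences; ORFs on the + and − strands; Meq and ICP4 labeled). Panel B: trimmed MDV-1 genome (UL, IRL, IRS, US) with breakpoints near UL9, UL13, Meq, a’, and ICP4. Legend: UL = unique long; TRL/IRL = terminal/internal repeat of the long region; US = unique short; TRS/IRS = terminal/internal repeat of the short region; a/a’ = terminal/inverted “a” repeat; recombination breakpoints.]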

Figure 4.2: Overview of the MDV-1 genome showing open reading frames (ORFs), genomic regions, and recombination breakpoints across the genome. (A) The full MDV-1 genome includes a unique long region (UL) and a unique short region (US), each of which is flanked by large repeats known as the terminal and internal repeats of the long region (TRL and IRL) and of the short region (TRS and IRS). Most ORFs (pale green arrows) are located in the unique regions of the genome. ORFs containing or close to the identified recombination breakpoints are labeled. (B) A trimmed genome format without the terminal repeat regions was used for analyses to avoid over-representing the repeat regions. The identified recombination breakpoints are shown as red dotted lines.
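The segment boundaries in Table 4.2 are contiguous. Finding which segment (and therefore which segment tree) a given genome position falls in therefore reduces to a binary search over the segment end positions. The minimal sketch below assumes that positions use the same coordinate system as Table 4.2, which we take to be the trimmed-genome alignment.

```python
from bisect import bisect_left

# End position of each segment from Table 4.2 (segment k ends at SEGMENT_ENDS[k - 1]).
SEGMENT_ENDS = [19_100, 24_900, 133_800, 140_500, 145_000, 151_800, 179_345]
# Number of polymorphisms in each segment, also from Table 4.2.
POLYMORPHISMS = [49, 21, 755, 94, 137, 247, 605]


def segment_of(position: int) -> int:
    """Return the 1-based segment number that contains `position`."""
    if not 1 <= position <= SEGMENT_ENDS[-1]:
        raise ValueError("position lies outside the trimmed genome")
    return bisect_left(SEGMENT_ENDS, position) + 1


for pos in (19_100, 19_101, 133_800, 133_801):
    seg = segment_of(pos)
    print(pos, "-> segment", seg, "with", POLYMORPHISMS[seg - 1], "polymorphisms")
```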


4.4.2 Virulence evolution through donation of genome segments

Virulence evolution through donation of segment 1

Seven recombination events were detected when segment 1 was compared to segment 3 for phylogenetic incongruence. These events are shown in Figure 4.3 as black dotted lines, and include donation of segment 1 from vv to vv, vv to vv+, vv to v, and v to v isolates. Donation of segment 1 occurred most frequently from vv to vv+ isolates.
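Phylogenetic incongruence between two genome segments means that their trees group the isolates differently. One simple way to expose such conflicts, sketched below in Python, is to compare the clades (the sets of tips below each internal node) of two rooted trees: clades present in one tree but not the other mark where the topologies disagree. The nested-tuple tree format, the function names, and the five-tip trees are hypothetical, and this clade comparison is a simplification rather than the procedure used in this chapter.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a tip name, or a tuple of child subtrees

def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the set of tips below every internal node, excluding the root."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):  # a tip
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    root_tips = walk(tree)
    found.discard(root_tips)  # every tree on the same tips shares the root clade
    return found

def unshared_clades(tree_a: Tree, tree_b: Tree):
    """Clades unique to each tree; two empty sets mean identical topologies."""
    a, b = clades(tree_a), clades(tree_b)
    return a - b, b - a


# Hypothetical five-isolate trees: isolate A pairs with B in one tree, C in the other.
segment_3 = ((("A", "B"), "C"), ("D", "E"))
segment_x = ((("A", "C"), "B"), ("D", "E"))
only_in_3, only_in_x = unshared_clades(segment_3, segment_x)
```

Here `only_in_3` holds the clade {A, B} and `only_in_x` holds {A, C}; the shared clades {A, B, C} and {D, E} do not appear in either set.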

[Figure 4.3 tree: phylogeny of segment 3 for the 64 MDV-1 isolates (tips MISX through 584A), with a pathotype legend (virulent, v; very virulent, vv; very virulent plus, vv+) and markings for donation of segment 1; x-axis in years, 1965 to 2015.]

Figure 4.3: Phylogenetic tree of segment 3 showing donation of segment 1 between clades/isolates.


Virulence evolution through donation of segment 2

Four recombination events were detected when segment 2 was compared to segment 3 for phylogenetic incongruence. These events are depicted in Figure 4.4. We identified donation of segment 2 from vv to vv+, v to v, and v to vv isolates.

[Figure 4.4 tree: phylogeny of segment 3 for the 64 MDV-1 isolates (tips MISX through 584A), with a pathotype legend (v, vv, vv+) and markings for donation of segment 2; x-axis in years, 1965 to 2015.]

Figure 4.4: Phylogenetic tree of segment 3 showing donation of segment 2 between clades/isolates.


Virulence evolution through donation of segment 4

Three recombination events were detected when segment 4 was compared to segment 3 for phylogenetic incongruence. These events are depicted in Figure 4.5. Donation of segment 4 was detected from v to v isolates and between clades of vv and vv+ isolates. Segment 4 contains the gene for the widely recognized oncoprotein Meq (222).

[Figure 4.5 tree: phylogeny of segment 3 for the 64 MDV-1 isolates (tips MISX through 584A), with a pathotype legend (v, vv, vv+) and markings for donation of segment 4; x-axis in years, 1965 to 2015.]

Figure 4.5: Phylogenetic tree of segment 3 showing donation of segment 4 between clades/isolates.


Virulence evolution through donation of segment 5

As depicted in Figure 4.6, only a single recombination event was detected when segment 5 was compared to segment 3. Donation of segment 5 occurred from a vv+ isolate to a vv isolate.

[Figure 4.6 tree: phylogeny of segment 3 for the 64 MDV-1 isolates (tips MISX through 584A), with a pathotype legend (v, vv, vv+) and markings for donation of segment 5; x-axis in years, 1965 to 2015.]

Figure 4.6: Phylogenetic tree of segment 3 showing donation of segment 5 between clades/isolates.


Virulence evolution through donation of segment 6

Three recombination events were detected when segment 6 was compared to segment 3. Remarkably, in all three cases, segment 6 was donated from v isolates to one particular clade containing vv and vv+ isolates. These recombination events are depicted in Figure 4.7.

[Figure 4.7 tree: phylogeny of segment 3 for the 64 MDV-1 isolates (tips MISX through 584A), with a pathotype legend (v, vv, vv+) and markings for donation of segment 6; x-axis in years, 1965 to 2015.]

Figure 4.7: Phylogenetic tree of segment 3 showing donation of segment 6 between clades/isolates.


Virulence evolution through donation of segment 7

Three recombination events were detected when segment 7 was compared to segment 3. Donation of segment 7 was observed from v to v, vv/vv+ to v, and vv to vv isolates. These events are depicted in Figure 4.8.

[Figure 4.8 tree: phylogeny of segment 3 for the 64 MDV-1 isolates (tips MISX through 584A), with a pathotype legend (v, vv, vv+) and markings for donation of segment 7; x-axis in years, 1965 to 2015.]

Figure 4.8: Phylogenetic tree of segment 3 showing donation of segment 7 between clades/isolates.


4.4.3 Overview of the impact of recombination on MDV-1 evolution

Using the breakpoints from 3Seq and the phylogenies of the 7 genome segments, we reconstructed the evolution of MDV-1 field isolates through time, showing the influence of recombination (Figures 4.3 to 4.9). Segment 3, the longest non-recombinant genome segment, was used as the framework onto which recombination events involving the other segments were mapped (see Methods for details). The fewest recombination events were detected when segment 5 was compared to segment 3 (Figure 4.6). We observed evidence for the evolution of MDV isolates into virulent forms in each of the genome segments investigated, suggesting that virulence in MDV could have evolved multiple times through numerous routes. However, we also observed instances where donation of genome segments from vv/vv+ isolates could have led to the emergence of v isolates. Recombination was widespread: only isolates CU2, JM102W, 287L, MD8, and MD11 were not linked to any recombination event. Figure 4.10 shows the recombination events on an unmodified phylogenetic tree of segment 3.


[Figure 4.9 tree: phylogeny of segment 3 for the 64 MDV-1 isolates (tips MISX through JM102W), with a pathotype legend (v, vv, vv+) and separate markings for segments 1, 2, 4, 5, 6, and 7; x-axis in years, 1965 to 2015.]

Figure 4.9: Phylogenetic tree of segment 3 showing donation of segments 1, 2, 4, 5, 6, and 7 between clades/isolates.


Figure 4.10: Unmodified phylogenetic tree of segment 3 showing donation of segments 1, 2, 4, 5, 6, and 7 between clades/isolates.

4.5 Discussion

To investigate the role of recombination in the evolution of MDV, we used a two-part approach: breakpoint detection with the program 3Seq, followed by phylogenetic analysis of the genome segments separated by the detected breakpoints. This approach was taken to reduce the number of recombination breakpoints detected by 3Seq, removing those that lacked phylogenetic support. Prior studies of homologous recombination in other herpesviruses and in non-herpesviruses have used a similar approach, combining breakpoints generated by various recombination detection programs with phylogenetic analysis, to accurately identify and vet viral recombinants (136, 137, 318, 353, 354). Breakpoints identified by software alone can include an inflated number of false positives. Thus, accurate identification of homologous recombination necessitates a two-part approach like the one used in this study. We detected a total of 6 breakpoints across the MDV genome. Of these, 3 likely occurred in coding regions of the genome, whereas 3 were in non-coding regions. The three coding-region breakpoints were located in the ORFs of UL9, UL13, and ICP4. Strikingly, all three genes encode essential MDV proteins. UL9 is an origin of replication-binding protein and has been shown to be responsible for loading the DNA replisome, consisting of the single-stranded DNA-binding protein UL29, the UL30-UL42 DNA polymerase, and the UL5-UL8-UL52 helicase-primase complex, during viral DNA replication (355). UL13 is one of the two serine-threonine kinases encoded in the MDV genome, the other being US3, and belongs to a conserved family of herpesvirus serine-threonine kinases (356). UL13-deficient viruses produce lower virion titers than wild-type virus in cell culture, which is associated with inefficient viral assembly and release (350, 356). Likewise, ICP4 is an important immediate-early protein in alphaherpesviruses, where it serves as a major regulator of viral transcription (220–222). The ICP4 breakpoint was detected toward the C-terminal end of the protein. In HSV-1, the C-terminal domain of ICP4 has been associated with DNA synthesis, gene expression, and intranuclear localization of the protein. ICP4 is also considered crucial to MDV pathogenesis because of its proximity to the latency-associated transcripts (LAT) and recently described miRNAs (222–224). In a previous study of MDV-1 attenuation through serial passage in vitro, mutations in ICP4 appeared to coincide with attenuation (194).
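The filtering logic of this two-part approach (keep a software-detected breakpoint only if the segments on either side have different phylogenies) can be sketched in a few lines of Python. The sketch assumes each segment's tree has already been summarized as a set of clades (groups of isolates below an internal node). Treating any unshared clade as support is a deliberate simplification, since real analyses typically rely on statistical topology tests, and the function name and toy clade sets are hypothetical.

```python
from typing import FrozenSet, List, Set

CladeSet = Set[FrozenSet[str]]

def supported_breakpoints(breakpoints: List[int],
                          segment_clades: List[CladeSet]) -> List[int]:
    """Keep breakpoints whose flanking segments have different clade sets.

    breakpoints[i] separates segment i from segment i + 1 (0-based), so one
    clade set is needed per segment.
    """
    if len(segment_clades) != len(breakpoints) + 1:
        raise ValueError("need exactly one clade set per segment")
    kept = []
    for i, bp in enumerate(breakpoints):
        # Any clade present on one side but not the other counts as support.
        if segment_clades[i] != segment_clades[i + 1]:
            kept.append(bp)
    return kept


# Hypothetical example: the breakpoint at 100 separates identical trees.
ab, ac, de = frozenset("AB"), frozenset("AC"), frozenset("DE")
print(supported_breakpoints([100, 200], [{ab, de}, {ab, de}, {ac, de}]))  # [200]
```

A fuller version would merge the two segments flanking each rejected breakpoint and rebuild their tree before testing the remaining breakpoints again.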
Among the three recombination breakpoints detected in intergenic regions, 2 were in the RL region, while 1 was in the a’ sequence between the RL and RS regions of the genome. The two RL breakpoints flanked the Meq gene (Figure 4.2). The genomic regions upstream and downstream of Meq are enriched for microRNAs: of the 9 microRNAs encoded by MDV-1, 6 are located upstream of Meq and 3 downstream (357). It is therefore possible that the breakpoints identified upstream and downstream of Meq affect the coding sequences of these microRNAs. The third intergenic breakpoint was in the a’ sequence of the genome. The a’ sequence has been shown to be a hotspot for homologous recombination and is implicated in the production of isomeric viral DNA through inversion of the RL and RS sequences (Figure 4.2) (358, 359). A previous recombination study by Loncoman et al. detected 6 breakpoints across MDV-1 genomes, but their distribution differed from that in this study (Table 4.2, Figure 4.2) (339). Those researchers reported a single recombination breakpoint in the internal/terminal repeat, 4 in the unique long region, and 1 in the unique short region, and found phylogenetic support for only a single breakpoint, located in the US region. They did not describe the precise locations of these breakpoints or whether any genes were affected. Our analysis detected widespread recombination between field isolates of MDV-1: among the 64 isolates analyzed, only CU2, JM102W, 287L, MD8, and MD11 were not linked to a recombination event. We detected recombination events between isolates of all pathotypes. Nevertheless, donation of genome segments occurred predominantly to isolates with vv and vv+ pathotypes. Recombination events were also detected between isolates of the same pathotype. For genome segments 1, 6, and 7, we also detected donation of genome segments from a more recent isolate to an older isolate. Such scenarios could arise if the donating isolate had been circulating in poultry farms long before its isolation, or if the segment was donated by an ancestor of the more recently isolated virus. Notably, we also detected recombination events associated with a decrease in the pathogenicity of the recipient isolate.
Donation of genome segment 7 was observed from a vv/vv+ to a v isolate, donation of segment 5 from a vv+ to a vv isolate, and donation of segment 1 from a vv to a v isolate. Therefore, although genome segments moved mostly from isolates of lower pathogenicity to isolates of higher pathogenicity, this pattern did not always hold. This suggests that the evolution of higher virulence in MDV is not always favored.


Evolution of MDV-1 field isolates toward higher pathogenicity could therefore occur through multiple distinct pathways. Trimpert et al. reported similar findings after analyzing MDV-1 genomes from Eurasian and North American isolates, identifying distinct genotypic pathways that led to the evolution of virulence in each region (341). A topic of great interest in MDV biology is the role of the Meq protein in the evolution of virulence (336, 351, 360). In our analysis, the Meq gene was located in segment 4 of the genome, separated from segments 3 and 5 by breakpoints upstream and downstream of its ORF (Table 4.2, Figure 4.2). Three recombination events were identified involving donation of segment 4: two from v to v isolates, and one from a clade containing vv and vv+ isolates to another clade containing vv and vv+ isolates. These recombination events suggest that transfer of the Meq gene alone is not sufficient to confer a higher virulence pathotype. This is consistent with virulence pathotype in MDV being a polygenic trait, and with the need for a genome-wide approach, like the one undertaken in this study, to understand the genetic basis of virulence. Our use of MDV-1 isolates with known pathotypes spanning nearly five decades enabled us to conduct the first comprehensive study of the role of homologous recombination in the history of MDV, and thus of its possible contribution to the evolution of pathogenicity in MDV-1. For an even more comprehensive picture of MDV-1 evolution, the rate at which point mutations have arisen across these genomes needs to be analyzed alongside the effects of homologous recombination. Whole-genome sequencing of isolates collected before the industrialization of the poultry industry would also provide further insights into the phylogeography and genetic basis of virulence in MDV-1 isolates.
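Classifying each donation by the pathotype order v < vv < vv+ makes the direction of the events discussed above explicit. The Python below is illustrative only: the `RANK` dictionary and the `donation_direction` function are hypothetical, and a mixed clade such as "vv/vv+" is treated as a set of ranks, with overlapping donor and recipient ranges reported as "mixed". The example pairs are donation directions named in the Results; no event counts are implied.

```python
RANK = {"v": 1, "vv": 2, "vv+": 3}  # pathotype order assumed in this sketch

def donation_direction(donor: str, recipient: str) -> str:
    """Classify a donation as 'higher', 'lower', 'same', or 'mixed'.

    A label such as "vv/vv+" stands for a clade containing both pathotypes.
    """
    donor_ranks = [RANK[p] for p in donor.split("/")]
    recipient_ranks = [RANK[p] for p in recipient.split("/")]
    if min(recipient_ranks) > max(donor_ranks):
        return "higher"
    if max(recipient_ranks) < min(donor_ranks):
        return "lower"
    if len(set(donor_ranks + recipient_ranks)) == 1:
        return "same"
    return "mixed"


for donor, recipient in [("vv", "vv+"), ("v", "v"), ("vv+", "vv"),
                         ("vv/vv+", "v"), ("vv/vv+", "vv/vv+")]:
    print(f"{donor} -> {recipient}: {donation_direction(donor, recipient)}")
# vv -> vv+: higher
# v -> v: same
# vv+ -> vv: lower
# vv/vv+ -> v: lower
# vv/vv+ -> vv/vv+: mixed
```

The last case corresponds to the clade-to-clade donation of segment 4, where the pathotype change cannot be read off the labels alone.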

4.6 Future directions

In our study, maximum-likelihood (ML) trees were used to investigate the phylogeny of the different genome segments. Complementing these with a molecular clock-based phylogenetic analysis would place time estimates on the branch points of the tree, enabling a more precise estimate of when virulent strains emerged. It would also allow us to compare these dates to the introduction of each MDV vaccine and to the industrialization of poultry farming practices. This approach to understanding the emergence of virulence has been used for MDV-1 by Trimpert et al. (341), albeit with fewer isolates and less detailed pathotyping. Going forward, we would therefore like to analyze all of the identified genome segments using the Bayesian coalescent methods implemented in BEAST (361).

4.7 Acknowledgements

We thank members of the Read and Szpara labs for helpful feedback and discussion. This work was supported and inspired by the Center for Infectious Disease Dynamics and the Huck Institutes of the Life Sciences, as well as by startup funds (MLS) from the Pennsylvania State University. This work was funded in part by the National Institute of General Medical Sciences, National Institutes of Health (R01GM105244; AFR), as part of the joint NSF-NIH-USDA Ecology and Evolution of Infectious Diseases program. The findings and conclusions of this study do not necessarily reflect the views of the funding agencies.


Chapter 5

Perspectives and future directions


Understanding genetic variation in clinical or field isolates of herpesviruses has been an underlying theme in all of my projects in the Szpara lab. I had the opportunity to work with a human and a poultry herpesvirus to study different aspects of herpesvirus biology. In Chapter 2 of the dissertation, I presented findings from a study that aimed to assess the genetic diversity and evolution of MDV in the field. We assembled the first full-length MDV genomes obtained using infectious poultry dust and feathers as the source of viral DNA (144). This study gave us a first glimpse of wild-type MDV circulating in poultry farms and of what a single host was shedding into the environment. This is a major breakthrough for the field of MDV biology. Prior to this study, the handful of available MDV genomes had been sequenced using DNA from virus amplified in cell culture, raising concerns about cell-culture adaptation of these isolates. In addition, none of these viral isolates had been obtained from poultry dust or chicken feathers, and most had been plaque-purified during culture. When comparing the field-isolated MDV genomes, we observed that they shared high DNA identity but differed in up to 5 proteins (144). Since HTS makes it possible to sample many members of the viral genomic population, rather than just the dominant member, we also examined polymorphisms within each viral population. This revealed that field samples of MDV are polymorphic populations. By studying one of these polymorphic loci in greater depth, we showed that allele frequencies can fluctuate over a short period of time, which is unprecedented for herpesviruses. This project was also crucial in laying the foundation for MDV genomics in the Szpara lab. The two poultry farms from which the full-length genomes were obtained are located in central Pennsylvania.
This project was done in collaboration with the Read lab at Pennsylvania State University. These farms are part of a three-year MDV surveillance project by the Read lab aimed at surveying poultry farms across central Pennsylvania (362, 363). During this period, the Read lab has collected more than 9,000 samples from more than 300 farms. Now that we have established that poultry dust and chicken feathers can be a source of viral DNA for HTS, the genomic data can be linked with surveillance data to assess strain variation between farms over space and time. We have extended the knowledge and expertise gained from this study to understand the evolution of virulence in MDV using other field isolates. We recently obtained 64 MDV isolates with known virulence phenotypes in animal models from our collaborators at the United States Department of Agriculture (USDA). We deep-sequenced these samples and assembled full-length viral genomes. The isolates were obtained from poultry farms across the United States between 1962 and 2014. Most importantly, they have been pathotyped into virulence categories using a standardized virulence-grading scheme (174). Their distribution across space, time, and virulence pathotypes makes them invaluable for studying the evolution of MDV into virulent forms. Chapter 4 of the dissertation presents a preliminary analysis of the role of homologous recombination in the evolutionary history of MDV. As discussed in that chapter, we identified different routes that could have led to the evolution of modern MDV strains, many of which are more virulent than historical MDV strains. Likewise, as highlighted in Appendix D, Table D-1, we have performed an initial analysis of SNPs that correlate with different virulence ranks. We have shown that SNPs at different positions across MDV genomes can be confidently associated with virulence pathotypes. These findings are important for the MDV field, as most studies seeking the genetic basis of virulence in MDV have focused on a small number of SNPs in just a few genes. We present the first study that comprehensively assesses genetic variation across the MDV genome using a large number of pathotyped isolates. Since the disease pathotypes of these isolates are known, a genome-wide association study (GWAS) can now be used to identify the SNPs most strongly associated with disease pathotype. Comparative genomics can be a powerful tool for hypothesis generation.
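As a minimal sketch of the per-SNP test at the core of such an association scan, the Python below applies a two-sided Fisher's exact test to each SNP. It codes each isolate's allele as 0/1 and dichotomizes pathotype into higher versus lower virulence. Both codings, the function names, and the toy genotypes are hypothetical assumptions. A real MDV GWAS would also need to correct for multiple testing and for shared ancestry among isolates, which this sketch ignores.

```python
from fractions import Fraction
from math import comb
from typing import Dict, List

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> Fraction:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return Fraction(comb(row1, x) * comb(n - row1, col1 - x), comb(n, col1))

    p_observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum over all tables no more probable than the observed one; exact
    # rational arithmetic avoids floating-point ties between tables.
    return float(sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_observed))

def snp_association(genotypes: Dict[str, List[int]],
                    high_virulence: List[bool]) -> Dict[str, float]:
    """Fisher p-value per SNP, sorted from most to least significant."""
    results = {}
    for snp, calls in genotypes.items():
        pairs = list(zip(calls, high_virulence))
        a = sum(1 for g, h in pairs if g and h)          # alt allele, high
        b = sum(1 for g, h in pairs if g and not h)      # alt allele, low
        c = sum(1 for g, h in pairs if not g and h)      # ref allele, high
        d = sum(1 for g, h in pairs if not g and not h)  # ref allele, low
        results[snp] = fisher_exact_two_sided(a, b, c, d)
    return dict(sorted(results.items(), key=lambda item: item[1]))


# Toy data for six hypothetical isolates (three high-, three low-virulence).
genotypes = {"snp_1": [1, 1, 1, 0, 0, 0], "snp_2": [1, 0, 1, 0, 1, 0]}
high = [True, True, True, False, False, False]
print(snp_association(genotypes, high))  # {'snp_1': 0.1, 'snp_2': 1.0}
```

Even a perfectly segregating SNP reaches only p = 0.1 with six isolates, which illustrates why a large panel of pathotyped genomes is needed for this kind of analysis.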
The SNPs identified through such genome comparisons can form the basis of reverse genetic approaches to understand the biological importance of these SNP variations using in vitro and in vivo models. In Chapter 3 and Appendix B of the dissertation, I presented two cases of familial transmission of HSV-1. Although the scenarios and disease outcomes of the two transmission events differed, we were able to draw several conclusions common to both studies. First, we showed that familial transmission of HSV-1 can be nearly perfect. This was demonstrated by the high DNA identity shared by the viral isolates obtained from the father-son and mother-neonate pairs. Using animal models and HTS of viral isolates from the father-son pair, we showed that HSV-1 isolates can preserve their genotype and phenotype over decades of latency and reactivation. Likewise, examination of the viral populations from these familial pairs revealed that, although viral genomes can be identical at the consensus level, they can differ at the population level. We detected transmission of a few polymorphisms from father to son and from mother to son, but most polymorphic positions detected across the genome appeared to have arisen de novo within each host. This may indicate that, despite infecting genetically related hosts, each virus population could be under selective pressures unique to the individual host. The father-son transmission case was also the first instance of phenotypic and genotypic characterization of HSV-1 isolates after a transmission event. Together, these studies provide a model for exploring future HSV-1 transmission events. The high DNA identity between the source and transmitted virus isolates indicates that familial transmission of HSV-1 is unlikely to be a major source of the genetic diversity observed between wild HSV-1 isolates. Non-familial transmission events, genetic drift, and recombination between HSV-1 isolates are additional sources of genetic diversity yet to be fully explored in wild HSV-1 isolates. Appendix A of the dissertation presents a preliminary analysis aimed at identifying genetic determinants of HSV-1 virulence. As with the MDV isolates discussed in Chapter 4 and Appendix D, these clinical isolates of HSV-1 have been categorized into pathotypes using a standardized virulence-grading scheme.
The availability of full-length genomes and phenotypic data makes these isolates highly valuable for linking genetic variation with virulence. Computational comparisons of genomes with different pathotypes have identified amino acid variations in protein VP22 that correlate with high- and low-virulence isolates. As with MDV, the biological importance of these variations can only be understood by engineering recombinant viruses and testing them in in vitro and in vivo models. This work is ongoing in the lab: I have initiated the process of making recombinant viruses to test this VP22 hypothesis by swapping the high-virulence genetic variants observed in the UL49 gene into a low-virulence viral isolate. As highlighted in Appendix A, I have so far made a shuttle plasmid containing each gene variant from high- and low-virulence isolates (Figure A-5) and a CRISPR-Cas9 plasmid (Figure A-6) to facilitate homologous recombination. Transfection with a gene-variant plasmid and the CRISPR-Cas9 plasmid, followed by infection, has been shown to allow homologous recombination in vitro, generating recombinant virus (364). Since the parental strains of the recombinant viruses have already been tested in animal models, the recombinant viruses can be compared directly to the parental strains. Further, the protein of interest from each recombinant virus can be compared with that of the parental viruses to examine its level of expression, its localization within the host cell, and its ability to interact with other proteins. If the biological importance of these variations can be established, the ability to predict virulence from naturally occurring variations would be a major step forward in HSV disease control, diagnosis, vaccine development, and treatment.
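The residue swap at the heart of this experiment can be checked in silico before cloning. The idea is to replace the codons for VP22 residues 49 and 51 in a coding sequence and confirm that translation changes exactly those residues. The sketch below uses the standard genetic code. The toy coding sequence and the chosen serine (TCT) and glutamine (CAG) codons are hypothetical placeholders, not the actual UL49 sequence or the codons used in the shuttle plasmid.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon (stop codons shown as '*')."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))

def swap_codons(cds: str, replacements: dict) -> str:
    """Return a copy of cds with codons replaced at 1-based residue positions."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for residue, new_codon in replacements.items():
        codons[residue - 1] = new_codon
    return "".join(codons)


# Toy 51-codon CDS (hypothetical): Met, 47 Gly, then A49, G50, P51.
low_cds = "ATG" + "GGT" * 47 + "GCT" + "GGT" + "CCT"
high_cds = swap_codons(low_cds, {49: "TCT", 51: "CAG"})  # A49S, P51Q
before, after = translate(low_cds), translate(high_cds)
print(before[48], before[50], "->", after[48], after[50])  # A P -> S Q
```

In practice, the replacement codons would be copied from the high-virulence isolate's own UL49 sequence, so that no other nucleotide differences are introduced.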


Appendix A:

Forward genetics approaches for prediction and testing of virulence loci in herpes simplex virus 1 (HSV-1)

Utsav Pandey1, Daniel W. Renner1, Richard Thompson2, Nancy Sawtell3, Moriah L. Szpara1

1Department of Biochemistry and Molecular Biology, Center for Infectious Disease Dynamics, and the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania 16802, USA

2Department of Molecular Genetics, Biochemistry and Microbiology, University of Cincinnati, Cincinnati, Ohio, 45229, USA

3Division of Infectious Diseases, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, 45229, USA

Ongoing study

Acknowledgements: U.P. prepared viruses for sequencing and performed computational comparisons. D.R. assembled the full-length genomes and performed computational comparisons. N.S. performed all animal pathogenesis work. M.S., R.T. and N.S. conceptualized the work. U.P. wrote the chapter.


A.1 Abstract

Full-length viral genomes provide a comprehensive picture of viral evolution, pathogenesis, and transmission, in contrast to the limited view enabled by targeted sequencing of specific genes or genomic regions. We have assembled full-length genomes of 5 clinical isolates of HSV-1. The virulence phenotypes of these viruses were previously compared using a mouse model of ocular infection. We compared the genomes of these clinical isolates with those of 5 well-characterized lab isolates of HSV-1, which also have known virulence phenotypes in mice. We grouped these isolates into two major virulence categories: high-virulence isolates that cause significant mortality in mice, and low-virulence isolates that do not. To detect potential neurovirulence loci, we computationally compared all protein-coding sequences of these isolates and identified non-synonymous mutations that distinguished virulent from non-virulent isolates. These loci included amino acid residues at positions 49 and 51 of protein VP22, the product of the UL49 gene. The virulent isolates of HSV-1 had residues S and Q at positions 49 and 51 of VP22, whereas the low-virulence isolates had residues A and P. We are in the process of engineering viral mutants by CRISPR-Cas9-assisted homologous recombination to introduce the high-virulence genetic variants into a low-virulence viral isolate. The CRISPR-Cas9 targeting vector and cloned variants for one of the genes of interest, VP22 (UL49), have already been made. Since the parental strains were characterized in the mouse ocular model of infection before mutagenesis, the mutants can be compared to the parental isolates using the same model.


A.2 Research summary

A.2.1 Introduction

Studying clinical isolates of pathogens is indispensable for understanding disease pathology. Lab strains of HSV are useful for studying the basic biology and mechanisms of viral replication, while clinical samples can tell us about the infectivity of the virus in vivo. Deep sequencing of clinical isolates that manifest varying degrees of pathogenicity may provide insight into genetic markers that influence the virulence of a particular strain. Computational tools allow genome-wide comparisons aimed at identifying genetic markers that are present in highly virulent strains but absent or altered in less virulent ones. These findings can then be verified using traditional lab techniques such as PCR, Sanger sequencing, and Western blotting. They must also be complemented by meticulous phenotyping of the virus, either when collecting samples from the human host or by using animal models in the lab. We have begun determining the genetic basis of virulence in HSV-1 using 5 clinical isolates of HSV-1 obtained from patients at Cincinnati Children’s Hospital Medical Center (CCHMC).

A.2.2 Animal model for studying HSV-1 pathogenesis

Our collaborators at CCHMC previously compared these isolates in a mouse eye infection model (Figure A-1) and assigned the viruses to two phenotype categories: low virulence with low activation frequency, and extreme virulence with rare survivors in the absence of antivirals (Figure A-2).


[Figure A-1 schematic: a Swiss Webster (SW) mouse is infected with 10^5 PFU after ocular scarification; samples are taken from the eye (tear films) and trigeminal ganglia (TG).]

Figure A-1: Ocular model of infection in mice. The ocular model of infection involves inoculating the scarified corneal surface with a high titer of virus. The resulting disease phenotype in the eye is then scored on a scale of 0 to 5, with 0 being no disease and 5 being corneal keratitis. Morbidity in infected animals is assessed by measuring weight loss. Animals showing more than 20% weight loss are euthanized, and these endpoints are used to quantify survival post-infection. The ability of the HSV-1 isolates to penetrate into the nervous system is assessed by measuring viral titers in the eyes, trigeminal ganglia, and brain at 2, 4, 6, and 8 days post-infection.
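The survival readout in this model follows from a simple rule: an animal reaches its endpoint on the first day its weight has fallen more than 20% below its starting weight. The Python sketch below applies that rule and tabulates percent survival per day. The daily weighing schedule, the function names, and the example weights are hypothetical.

```python
from typing import List, Optional

def endpoint_day(weights: List[float], max_loss: float = 0.20) -> Optional[int]:
    """First day on which weight loss from day 0 exceeds max_loss, else None."""
    baseline = weights[0]
    for day, weight in enumerate(weights):
        if (baseline - weight) / baseline > max_loss:
            return day
    return None

def percent_survival(cohort: List[List[float]], n_days: int) -> List[float]:
    """Percent of animals that have not reached the endpoint on each day."""
    endpoints = [endpoint_day(weights) for weights in cohort]
    return [100.0 * sum(1 for e in endpoints if e is None or e > day) / len(cohort)
            for day in range(n_days + 1)]


# Hypothetical daily weights (g) for four mice, days 0 to 4.
cohort = [
    [20.0, 19.0, 17.0, 15.5, 15.0],  # exceeds 20% loss on day 3
    [20.0, 20.0, 19.8, 19.9, 20.0],  # never reaches the endpoint
    [20.0, 18.0, 15.9, 15.0, 15.0],  # exceeds 20% loss on day 2
    [20.0, 19.0, 19.0, 19.0, 19.0],  # never reaches the endpoint
]
print(percent_survival(cohort, 4))  # [100.0, 100.0, 75.0, 50.0, 50.0]
```

The strict "more than 20%" comparison means an animal at exactly 20% loss is still counted as surviving.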

[Figure A-2 plot: % survival (y-axis, 0 to 100) versus time in days (x-axis, 0 to 15) for clinical isolates 17-5*, RWS*, 17-1*, NBS*, and 17-2* and lab isolates KOS, F, McKrae, H129, and KOS79, grouped as low- or high-virulence.]

Figure A-2: Disease phenotype of clinical and lab isolates in Swiss Webster mice. Isolates that did not cause mortality in mice were categorized as low-virulence isolates, whereas isolates that caused mortality were categorized as high-virulence. Isolates marked with * are low-passage clinical isolates; the rest are lab isolates.

A.2.3 Viral DNA extraction and genome assembly

These clinical isolates were cultured in MRC-5 cells, a diploid fibroblast cell line of human origin, and viral DNA was isolated using the nucleocapsid method (288). The isolated DNA was sequenced on an Illumina MiSeq, and the resulting reads were assembled using the VirGA pipeline (202).


A.2.4 Identification of potential genetic markers of virulence

Using the phenotypic data provided by our collaborators, we computationally compared all protein-coding sequences between the isolates that were highly virulent in mice and those with low virulence. Because we began the comparisons with only five isolates, lab strains with known phenotypes were added to the virulent and avirulent groups to make the comparisons more robust and to reduce the number of candidate genes (Figure A-2). These comparisons highlighted the candidate virulence protein VP22, the product of the UL49 gene, and specifically the amino acids at positions 49 and 51. The virulent isolates of HSV-1 had S and Q at positions 49 and 51, whereas the avirulent strains had A and P (Figure A-3).
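The core of this comparison is finding alignment positions where every high-virulence sequence carries one residue and every low-virulence sequence carries a different one. The Python sketch below illustrates that step. It assumes pre-aligned protein sequences of equal length. The function name and the toy 51-residue sequences are hypothetical; the sequences are built so that position 5 varies within a group but only positions 49 and 51 segregate between groups.

```python
from typing import List, Tuple

def segregating_positions(high: List[str], low: List[str]) -> List[Tuple[int, str, str]]:
    """1-based positions where each group is invariant but the groups differ.

    Returns (position, high_residue, low_residue) tuples.
    """
    length = len(high[0])
    if any(len(seq) != length for seq in high + low):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for i in range(length):
        high_residues = {seq[i] for seq in high}
        low_residues = {seq[i] for seq in low}
        if (len(high_residues) == 1 and len(low_residues) == 1
                and high_residues != low_residues):
            hits.append((i + 1, high_residues.pop(), low_residues.pop()))
    return hits


def toy_vp22(r49: str, r51: str, r5: str = "G") -> str:
    """Hypothetical 51-residue sequence; only positions 5, 49, and 51 vary."""
    return "M" + "G" * 3 + r5 + "G" * 43 + r49 + "G" + r51


high = [toy_vp22("S", "Q"), toy_vp22("S", "Q", r5="K"), toy_vp22("S", "Q")]
low = [toy_vp22("A", "P"), toy_vp22("A", "P")]
print(segregating_positions(high, low))  # [(49, 'S', 'A'), (51, 'Q', 'P')]
```

Position 5 is not reported because it varies within the high-virulence group. Adding more isolates to each group in this way is what narrows the list of candidate residues.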

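[Figure A-3 panels: (A) location of the UL49 gene encoding VP22 in the HSV-1 genome; (B) VP22 domain map (residues 1 to 301) with VP16- and gE-binding regions, sites labeled P, the NES, the NLS, and positions 49 and 51; (C) alignment of residues 49 and 51 in low- and high-virulence isolates.]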

Figure A-3: Variants in VP22 associated with virulence in the mouse ocular model. (A) Position of the UL49 gene under investigation in the HSV-1 genome. (B) Schematic showing the functional domains of protein VP22 and the positions of the amino acid residues under investigation. (C) Amino acid alignment showing the difference in residues between high- and low-virulence isolates.

A.2.5 Making viral mutants to explore gain and loss of pathogenicity

To understand whether the S and Q residues in VP22 contribute to virulence, we plan to swap the amino acids at positions 49 and 51 of VP22 between the avirulent and virulent isolates for gain- and loss-of-function studies. For making viral mutants, clinical isolate 17-1 was chosen from the virulent isolates for removal of the S and Q residues, whereas clinical isolate 17-5 was chosen from the avirulent isolates for addition of the S and Q residues. These isolates were chosen after a genome-wide amino acid comparison of virulent and avirulent isolates revealed that 17-1 and 17-5 share high sequence identity outside of the residues of interest. We are in the process of engineering viral mutants that will introduce the high-virulence genetic variants into a low-virulence viral isolate. Traditionally, mutants of large DNA viruses have been made using either homologous recombination (365) or bacterial artificial chromosomes (BACs) (366). Homologous recombination has a very low rate of success, while viruses recovered from BACs can be attenuated relative to wild strains because of unintended genetic changes in the viral genome (364, 365). Recent studies have shown that CRISPR-Cas9 editing can facilitate homologous recombination by cleaving specific target sites to promote integration of exogenous DNA (364, 367). We plan to accelerate the process of making recombinant swaps in these viral genomes by using the CRISPR-Cas9 system in combination with homologous recombination (368) (Figure A-4).


Figure A-4: CRISPR-facilitated homologous recombination. The high-virulence-associated gene variant will be introduced into the background of a low-virulence isolate to generate a high-virulence recombinant virus, and vice versa. The Cas9 protein creates a double-strand break at the CRISPR target site in the HSV-1 genome, which promotes homologous recombination.
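Choosing a Cas9 target site starts with locating protospacer adjacent motifs. The hSpCas9 encoded by pX330 recognizes an NGG PAM and makes a blunt double-strand break 3 bp upstream of it. The sketch below scans both strands of a sequence for NGG PAMs and reports each 20-nt protospacer with its predicted cut position. It is a minimal illustration, not the guide-design procedure used here; the 27-nt sequence is a made-up placeholder, because the actual HSV-1 target sequence appears only in Figure A-6.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_spcas9_targets(seq: str, spacer_len: int = 20) -> List[Tuple[str, str, int]]:
    """Return (strand, protospacer, cut_index) for every NGG PAM on either strand.

    The protospacer is reported 5'->3' on its own strand. cut_index is the
    0-based forward-strand index of the first base to the right of the blunt
    break, which SpCas9 makes 3 bp upstream (5') of the PAM.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    # Zero-width lookahead so overlapping PAMs (e.g. in GGG runs) are all found.
    for m in re.finditer(r"(?=[ACGT]GG)", seq):
        pam = m.start()
        if pam >= spacer_len:
            hits.append(("+", seq[pam - spacer_len:pam], pam - 3))
    rc = revcomp(seq)
    for m in re.finditer(r"(?=[ACGT]GG)", rc):
        pam = m.start()
        if pam >= spacer_len:
            c = pam - 3
            # rc index j is forward index n-1-j, so the break sits between
            # forward bases n-c-1 and n-c.
            hits.append(("-", rc[pam - spacer_len:pam], n - c))
    return hits

# Hypothetical sequence: 20-nt protospacer, AGG PAM, then flanking bases.
target = "ACGTACGTACGTACGTACGT" + "AGG" + "TTTT"
print(find_spcas9_targets(target))
print(find_spcas9_targets(revcomp(target)))
```

Reporting the cut position matters for homologous recombination because the donor's homology arms, and any edited residues, are best placed close to the break.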

The CRISPR-Cas9 targeting vector and the cloned UL49 (VP22) variants have already been made (Figures A-5 and A-6). Once the mutants are made, they will be verified by restriction endonuclease digestion and deep sequencing, the latter also allowing us to check for off-target effects.


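[Figure A-5 panels: (A) HSV-1 genomic region spanning UL48, UL49, UL49A, and UL50, with KpnI (103,957) and BamHI (106,444) sites flanking the ~2.5 kb UL49 insert from virulent and avirulent isolates, and coordinates 104,907, 105,660, 105,668, and 105,812 marked in the region of interest; (B) pUC19+2 plasmid (2,653 bp) with KpnI and BamHI sites; (C) agarose gel of plasmid + insert versus plasmid only, with 6 kb, 3 kb, and 500 bp markers.]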

Figure A-5: Preparation of a shuttle plasmid containing the UL49 (VP22) gene variant of interest. (A) Snapshot of the HSV-1 genome showing the region containing the UL49 gene. The 2.5 kb region was amplified by PCR and used as the insert for the shuttle plasmid. KpnI and BamHI restriction sites within the region were used to ligate the insert into the pUC19+2 plasmid backbone. (B) Schematic of the pUC19+2 plasmid showing the positions of the KpnI and BamHI restriction sites in the backbone. The plasmid was provided by Dr. Richard Johnson (University of Cincinnati). (C) Gel showing formation of the shuttle plasmid. The plasmid without the insert is ~2.6 kb in length; after integration of the ~2.5 kb insert, the plasmid runs between 5 and 6 kb on the agarose gel. The shuttle plasmid is stored in the Szpara Lab plasmid library.
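The band sizes in panel C follow from simple arithmetic on the restriction-site coordinates. A minimal sketch, assuming the KpnI (103,957) and BamHI (106,444) genome coordinates and the 2,653 bp pUC19+2 backbone labeled in the figure, and ignoring the few polylinker bases released from the vector by the double digest:

```python
KPNI_SITE = 103_957   # KpnI coordinate in the HSV-1 genome, from Figure A-5A
BAMHI_SITE = 106_444  # BamHI coordinate in the HSV-1 genome, from Figure A-5A
BACKBONE_BP = 2_653   # pUC19+2 length, from Figure A-5B

def insert_length(left_site: int, right_site: int) -> int:
    """Approximate insert length (bp) between two restriction-site coordinates."""
    return right_site - left_site

insert_bp = insert_length(KPNI_SITE, BAMHI_SITE)
# Approximate: ignores the small polylinker fragment lost from the vector.
recombinant_bp = BACKBONE_BP + insert_bp

print(f"insert ~{insert_bp} bp; recombinant plasmid ~{recombinant_bp / 1000:.1f} kb")
```

Both numbers agree with the ~2.5 kb insert and the 5 to 6 kb band described in the caption.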


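[Figure A-6 panels: (A) HSV-1 genomic region spanning UL48, UL49, UL49A, and UL50, with KpnI (103,957), the CRISPR target site (104,462), and BamHI (106,444) marked, and the guide RNA target and PAM sequence highlighted; (B) oligonucleotide sequences; (C) agarose gels of plasmid + oligo versus plasmid only, with 8 kb and 500 bp markers.]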

Figure A-6: Preparation of the CRISPR-Cas9 targeting plasmid. (A) Snapshot of the HSV-1 genome showing the Cas9 target site for CRISPR-Cas9-assisted homologous recombination. Highlighted in green is the sequence in the HSV-1 genome matching the designed guide RNA. Highlighted in yellow is the protospacer adjacent motif (PAM), which flanks the 3’ end of the target sequence as recommended by Cong et al. (369). (B) Nucleotide sequences of the oligonucleotides cloned into the CRISPR plasmid. The annealed oligonucleotide pair was cloned into the guide RNA cloning site of the pX330-U6-Chimeric_BB-CBh-hSpCas9 plasmid (Addgene #42230) using BbsI restriction sites (369). The protocol was provided by the Tscharke Lab (Australian National University) and the Lindner Lab (Penn State University). (C) Restriction digest of pX330 plasmids with BbsI and AgeI. Plasmids with annealed oligonucleotides are linearized to a single band because the BbsI site is lost, whereas plasmids without annealed oligonucleotides are digested into ~1 kb and ~7.5 kb fragments.
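The annealed-oligonucleotide step in panel B follows the cloning convention commonly used with BbsI-digested pX330. The top oligo carries a 5’ CACC overhang and the bottom oligo a 5’ AAAC overhang. A G is prepended to the 20-nt spacer when it does not already start with G, because the U6 promoter initiates transcription most efficiently at a G. The sketch below shows this design rule only. The spacer is a hypothetical placeholder; the guide actually used is given only in the figure.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]

def px330_oligos(spacer: str) -> Tuple[str, str]:
    """Return (top, bottom) oligos, 5'->3', for ligation into BbsI-cut pX330.

    `spacer` is the 20-nt protospacer without its PAM.
    """
    spacer = spacer.upper()
    if len(spacer) != 20 or set(spacer) - set("ACGT"):
        raise ValueError("expected a 20-nt A/C/G/T spacer without the PAM")
    # The U6 promoter starts transcription most efficiently at a G.
    insert = spacer if spacer.startswith("G") else "G" + spacer
    top = "CACC" + insert              # overhang matching one BbsI end
    bottom = "AAAC" + revcomp(insert)  # overhang matching the other BbsI end
    return top, bottom

top, bottom = px330_oligos("ACGTACGTACGTACGTACGT")  # hypothetical spacer
print(top)
print(bottom)
```

After annealing, the two oligos pair over the spacer and leave 4-nt overhangs that ligate into the BbsI-cut vector. This destroys the BbsI sites and explains the single linearized band in panel C.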


A.2.6 Conclusion and future directions

The protein of interest, VP22, is one of the major tegument proteins of HSV. Its amino acid sequence is conserved across the subfamily Alphaherpesvirinae; however, its function appears to differ between alphaherpesviruses (163). VP22 is essential for replication in cell culture for MDV-1 and VZV, but not for BHV-1, PRV, or HSV-1 (163). Deletion of UL49 affects BHV-1 and HSV-1 pathogenicity in vivo (163). VP22 has also been reported to localize to the nucleus or the cytoplasm of the host cell at different times post infection, which indicates dynamic trafficking of VP22 during infection (370, 371). Although a variety of functions have been reported for VP22, the precise mechanism by which it regulates pathogenesis remains unknown. One of the greatest challenges in associating genetic loci with phenotypes is determining whether a genetic variant is sufficient to cause the phenotype under study, or whether it requires epistatic interactions with other loci. To address this, the viral mutants, once made, will be tested in vivo in mice using the eye-infection model previously detailed by our collaborators (Figure A-1) (372). Because the parental isolates were characterized in this mouse model before mutagenesis, the results for the mutants can be compared directly with those for the parental isolates. If removal of the S and Q residues from 17-1 diminishes its virulence, we can conclude that these residues are necessary for virulence in mice. Conversely, if addition of the residues to 17-5 makes it more virulent in mice, we can conclude that the S and Q residues are sufficient for virulence in mice. The biological importance of these residues can also be assessed in vitro by comparing VP22 expression, localization, and protein-protein interactions between the recombinant and parental viruses.
To further understand the importance of these motifs in HSV-1 virulence, we aim to look for them in other clinical isolates of HSV-1 and to study their correlation with recurrence of infection, lesion severity, and viral shedding.


Appendix B:

Genomic snapshot of perinatal viremic herpes simplex virus-1 transmission from mother to newborn with fatal outcomes

Mackenzie M. Shipley1,2, Utsav Pandey1,2, Daniel W. Renner1,2, Charles Grose3, Moriah L. Szpara1,2

1Department of Biochemistry and Molecular Biology, and the 2Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA

3Virology Laboratory, Children’s Hospital, University of Iowa, Iowa City, Iowa, 52242, USA

Manuscript in preparation

Acknowledgements: M.M.S. prepared viruses for sequencing. D.R. assembled the full-length genomes. U.P. performed genomic analyses. M.M.S. and M.L.S. conceptualized the work.


B.1 Abstract

Transmission of herpes simplex virus type 1 (HSV-1) happens daily all over the world, yet this event is transient and poorly characterized at the genomic level. Herein we detail a rare instance of perinatal HSV-1 transmission that resulted in the death of both mother and neonate. A pregnant woman with a known negative HerpeSelect antibody test underwent cesarean section at 30 weeks’ gestation; she died later the same day. The newborn died 5 days later. Both patients were found to have positive blood HSV-1 PCR tests postmortem. Using in-solution oligonucleotide enrichment and next-generation deep sequencing, we determined that transmission of HSV-1 from mother to neonate was nearly perfect at the consensus level. A few minority variants were transmitted from mother to neonate, but we also detected non-maternal HSV-1 variants that appeared to have arisen de novo in the neonate’s blood. The most interesting minority variant was a coding mutation in the UL6 gene of the neonate’s virus that was absent from the mother’s virus. The UL6 protein forms the portal through which HSV DNA enters the capsid. An outstanding question in the herpesvirus field is whether HSV-1 transmission is nearly perfect, or whether variations and/or bottlenecks occur in the viral population. These two cases suggest that (i) unlike congenital cytomegalovirus transmission, neonatal HSV-1 transmission does not include the entire maternal HSV-1 population, and (ii) nonsynonymous mutations can occur after fewer than 10 rounds of replication. This report thus contributes to our understanding of viral transmission and pathogenesis in humans, and may ultimately lead to improved strategies for preventing neonatal HSV-1 acquisition.


B.2 Research summary

B.2.1 Introduction

HSV infection remains a significant cause of morbidity and mortality in neonates. The incidence of neonatal HSV disease in the United States is currently approximately 1 per 3,200 deliveries (264, 265). Without treatment, neonates with HSV disease have a forty percent survival rate, and survivors can have lifelong sequelae (265). Morbidity from CNS infection is higher for HSV-2 than for HSV-1; however, the rise in genital infections due to HSV-1 has made it important to understand the changing epidemiology of HSV-1 for patient counseling and case management during birth (265, 373, 374). Neonatal HSV infections can be broadly classified into three categories: disseminated infection affecting multiple organs (disseminated disease), central nervous system infection (CNS disease), and skin, eye, and mouth infection (SEM disease) (264, 375). Eighty-five percent of neonatal HSV disease is acquired during birth (peripartum period), five percent before birth (in utero), and ten percent after birth (postnatal period) (264, 376, 377). This report focuses on a case of perinatal HSV-1 transmission that resulted in the deaths of both the mother and the neonate. The transmission event was unanticipated because it occurred on the same day that the pregnant mother’s HerpeSelect HSV-1/HSV-2 IgG antibody test returned a negative result.

B.2.2 Clinical presentation

The mother was a 41-year-old woman in week 30 of gestation (Figure B-1). She had tested negative for HSV-1/2 IgG antibodies. She died within 24 hours of being admitted to the hospital. A postmortem PCR test of the mother’s blood was positive for the HSV-1 genome. The newborn died at 5 days of age, despite having had a normal physical and skin examination. Oral swab and blood samples from the newborn were positive for the HSV-1 genome by PCR.


Figure B-1. Timeline of clinical symptoms, HSV-1 detection, and diagnosis for mother and neonate. (A) Timeline of disease symptoms and death of the pregnant woman. The patient was admitted to the hospital with signs of eclampsia and had an emergency cesarean section at 30 weeks’ gestation. A blood sample was collected for laboratory tests screening for HSV-1, HSV-2, cytomegalovirus (CMV), varicella zoster virus (VZV), parvovirus, hepatitis A, hepatitis B, and hepatitis C. The woman died within 24 hours of hospital admission. A positive HSV-1 PCR result was returned four days postmortem. (B) Timeline of the neonate’s birth and subsequent death. The infant was delivered at 30 weeks. At four days old, the neonate was started on intravenous acyclovir therapy when it was determined that the mother was positive for HSV-1 by PCR. The neonate died the following day, at 5 days old.


B.2.3 Library prep, Illumina sequencing, and genome assembly

DNA isolated from the mother’s blood, the neonate’s skin sample, and the neonate’s blood was subjected to library preparation and Illumina HTS. Library preparation for these samples used oligonucleotide baits matching HSV-1 genome sequences, enabling capture of HSV-1-specific DNA. The enriched libraries were sequenced on an Illumina MiSeq, and the resulting reads were assembled using the VirGA pipeline (202).

B.2.4 Consensus genome comparison

The viral genomes obtained from the mother and the neonate were highly similar, sharing 97.5% DNA identity. We did not detect any single-nucleotide polymorphisms (SNPs) between the three consensus genomes, but we did detect insertions/deletions (INDELs) (Figure B-2). Two INDELs were located in genes (UL36 and UL51), while the rest were in non-coding regions of the genome.
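A toy alignment shows why consensus genomes with no SNPs between them can still share less than 100% identity. If identity is computed over every alignment column, each gap column introduced by an INDEL counts as non-identical. The sketch below is a minimal illustration, assuming gap columns are scored as mismatches (Geneious may count them differently). The sequences are made up.

```python
from typing import Tuple

def identity_and_snps(a: str, b: str) -> Tuple[float, int, int]:
    """Percent identity over all columns of a pairwise alignment, plus the
    number of SNP columns (base vs. different base) and gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    same = snps = gaps = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            gaps += 1      # INDEL column: not a SNP, but not identical either
        elif x == y:
            same += 1
        else:
            snps += 1
    return 100.0 * same / len(a), snps, gaps

# Hypothetical 20-column alignment with a single 1-bp deletion and no SNPs.
seq_a = "ACGT" * 5
seq_b = "ACGT" * 4 + "A-GT"
print(identity_and_snps(seq_a, seq_b))
```

Here one gap column in 20 lowers identity to 95% even though the SNP count is zero, the same kind of pattern reported above for the mother and neonate genomes.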


Figure B-2. Genetic comparison of HSV-1 isolates transmitted from mother to neonate during birth. (A) Diagram of the HSV-1 genome and its genes (gray arrows depict forward- vs. reverse-strand encoded genes). Overlapping genes are shown below the main diagram. Black dashed vertical lines mark the trimmed genome, excluding terminal repeat regions, that was used for downstream genomic analyses (see Methods). (B) Histogram showing percent identity in a DNA alignment of the viral genomes derived from the three clinical samples from mother and neonate. Identity at each nucleotide position is color-coded: 100% identity is gray, ≤99% identity is yellow, and ≤25% identity is red. To show the locations of these non-identical sites, each genome is depicted as a horizontal gray bar (bottom), with gaps in the alignment (in/dels) shown as vertical or horizontal black bars. UL, Unique Long region; US, Unique Short region; TRL / IRL, Terminal or Internal Repeat of the Long region; TRS / IRS, Terminal or Internal Repeat of the Short region. The identity graph was generated using Geneious.

We also compared the mother and neonate full-length genomes and amino acid sequences with other publicly available HSV-1 strains. A SplitsTree phylogenetic network of the full-length genomes showed that the mother and neonate genomes cluster with other HSV-1 genomes in the North American/European clade (Figure B-3). At the amino acid level, we detected 18 amino acid differences in 12 proteins that were unique to the mother and neonate genomes and absent from all publicly available genomes (Table B-1).


Figure B-3. SplitsTree network showing the North American/European phylogenetic placement of the mother and neonate HSV-1 isolates. The three clinical isolates presented in this work are highlighted in red and fall within the North American/European clade. This network was generated by aligning the consensus sequences of the three viral isolates from mother and neonate with 50 published HSV-1 strains (see Methods). The scale bar represents 0.1% nucleotide divergence.

Table B-1: Unique amino acid variants found in all three clinical genomes from the mother and neonate that are not present in 48 other published strains of HSV-1.

AA alignment position | Mother/child | Other strains | Nucleotide position* | Gene | Gene product
41 | T | N | 123 | UL8 | DNA helicase/primase HP3
480 | I | M | 1,440 | UL8 | DNA helicase/primase HP3
270 | Q | P | 810 | UL9 | Protein involved in DNA replication
232 | N | D | 696 | UL10 | Glycoprotein M (gM)
456 | E | G | 1,368 | UL20 | Membrane protein
18 | T | A | 54 | UL22 | Glycoprotein H (gH)
229 | I | R | 687 | UL27 | Glycoprotein B (gB)
79 | Q | K/P/T | 237 | UL29 | ssDNA binding protein
214 | S | F | 642 | UL29 | ssDNA binding protein
279 | R | L | 837 | UL30 | DNA polymerase
491 | I | L | 1,473 | UL36 | ICP1/2, tegument protein
114 | I | L | 342 | UL40 | Small subunit, ribonucleotide reductase
95 | N | D | 285 | UL47 | VP13/14, binds RNA
100 | H | Q | 300 | UL47 | VP13/14, binds RNA
114 | S | R | 342 | UL47 | VP13/14, binds RNA
278 | P | T | 834 | UL47 | VP13/14, binds RNA
279 | N | S | 837 | UL47 | VP13/14, binds RNA
18 | S | Y | 54 | US11 | Tegument-associated phosphoprotein, post-transcriptional regulation

B.2.5 Genome comparison at the population level

We next examined the three genomes below the consensus level to compare them at the population level, asking whether the conservation across genomes also held within the viral populations. The three genomes differed in the number of minor variants (MV) they contained. The majority of the MV in all three genomes were located in non-coding regions. Of particular interest was an MV in the UL6 gene of the viral genome obtained from the neonate’s skin. UL6 encodes the portal protein necessary for packaging DNA into viral nucleocapsids. If this MV were to become the major nucleotide at that position, it would cause a nonsynonymous mutation in the UL6 protein (Figure B-4). Comparison of the viral genomes obtained from the neonate and the mother revealed that only a fraction of the MV were shared across genomes. This is consistent with the findings from chapter 3, where we showed that although the father and son genomes were conserved at the consensus level, the MV present in their genomes differed widely.
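To decide whether a minor allele would be nonsynonymous if it became the major allele, translate the affected codon with and without the alternative base. The sketch below does this with the standard genetic code. The codons and positions are hypothetical, because the text does not give the affected UL6 codon. For a gene encoded on the reverse strand, the alternative base would first need to be complemented.

```python
BASES = "TCAG"
# Standard genetic code, ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def classify_substitution(codon: str, index_in_codon: int, alt_base: str) -> str:
    """Classify a single-base change (index 0-2) in a coding-strand codon."""
    codon = codon.upper()
    mutant = codon[:index_in_codon] + alt_base.upper() + codon[index_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    # A '*' on the right-hand side would indicate a nonsense (stop) change.
    return f"nonsynonymous ({ref_aa}->{alt_aa})"

# Hypothetical G->T changes at the third codon position.
print(classify_substitution("GAG", 2, "T"))  # nonsynonymous (E->D)
print(classify_substitution("CTG", 2, "T"))  # synonymous
```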


Figure B-4. De novo minority variant in the UL6 portal protein of the neonate’s HSV-1 skin isolate. (A) Diagram of the 676-amino-acid UL6 portal protein of HSV-1, showing known and predicted domains (378). RGD motifs (12-18 and 238-240) contain the Arg-Gly-Asp tripeptide associated with cell adhesion (379, 380). A nuclear pore binding domain is predicted at residues 19-48. Disulfide bonds are present at amino acid positions 166 and 254. A leucine zipper motif spans amino acids 422-443 (378). The location of the UL6 minority variant detected in the neonate skin HSV-1 isolate, highlighted in panel C, is shown relative to the entire protein. (B) The UL6 portal protein does not appear to be well conserved among the 8 human herpesviruses. Identities were calculated from BLAST alignments against the HSV-1 UL6 protein on the UniProt website (381). (C) The UL6 minority variant in the neonate skin HSV-1 genome, shown at the nucleotide level. A subset of reads is shown at nucleotide position 7,373 relative to the consensus sequence. The major allele, guanine (G), is present at this position at a frequency of 90%, while the minor allele, thymine (T), is present at a frequency of 9%. Both forward (green) and reverse (purple) reads are shown to demonstrate bidirectional read support at this position.

B.2.6 Conclusion

The HSV-1 genomes obtained from the mother and the neonate allowed us to investigate aspects of HSV-1 transmission that have not been explored previously. This case report presents the first full-length HSV-1 genomes obtained from a fatal case of HSV-1 disease in a mother and neonate. We compared these genomes with one another and with other publicly available HSV-1 genomes.
Through these comparisons we learned that, at the consensus level, the HSV-1 genome obtained from the mother (source) contained no SNPs relative to the HSV-1 genomes obtained from the neonate (recipient). The high identity between the viral genomes obtained from the neonate’s skin and blood indicates that the virus causing viremia and the virus present in the neonate’s skin are genetically identical at the consensus level. Compared with other HSV-1 genomes, the mother and neonate genomes had several unique genetic variations. A more detailed investigation of these unique variations is warranted, as death is not the typical outcome of perinatal HSV-1 transmission. Similarly, differences observed between isolates at the population level may reflect differences in host immune response between the mother and the baby, or differences in the niche from which the isolates were obtained. Together, the findings of this report support those of chapter 3 and provide further insight into HSV-1 transmission.


Appendix C:

Genotypic and phenotypic diversity within the neonatal HSV-2 population

Lisa N. Akhtar1, Christopher D. Bowen2, Daniel W. Renner2, Utsav Pandey2, Ashley N. Della Fera3, David W. Kimberlin4, Mark N. Prichard4, Richard J. Whitley4, Matthew D. Weitzman3,5*, Moriah L. Szpara2*

1 Department of Pediatrics, Division of Infectious Diseases, Children’s Hospital of Philadelphia and University of Pennsylvania Perelman School of Medicine 2 Department of Biochemistry and Molecular Biology, Center for Infectious Disease Dynamics, and the Huck Institutes of the Life Sciences, Pennsylvania State University 3 Division of Protective Immunity and Division of Cancer Pathobiology, Children’s Hospital of Philadelphia 4 Department of Pediatrics, Division of Infectious Diseases, University of Alabama at Birmingham 5 Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine

Adapted from: Neonatal HSV-2 population displays wide genotypic and phenotypic diversity between hosts. bioRxiv. 2018. DOI: 10.1101/262055

Acknowledgement: U.P. was responsible for analyzing the minor variants (SNPs and INDELs) present in the neonatal genomes. The results of the analyses are shown below. M.S., L.N., and M.W. conceptualized the work.


C.1 Abstract

Neonates infected with herpes simplex virus (HSV) at the time of birth can have different courses of clinical disease. Approximately half of those infected display manifestations limited to the skin, eyes, or mouth (SEM disease, 45%). Others, however, develop invasive infections that spread systemically (disseminated disease, 25%) or to the central nervous system (CNS disease, 30%), both of which are associated with significant morbidity and mortality. The viral and/or host factors that predispose a neonate to these invasive forms of HSV infection are not known. To define the level of viral diversity within the neonatal population, we evaluated ten HSV-2 isolates cultured from neonates with a range of clinical presentations. To assess viral fitness independent of host immune factors, we measured the growth characteristics of each isolate in cultured cells. We found that HSV-2 isolates displayed diverse in vitro phenotypes. Isolates from neonates with CNS disease were associated with larger average plaque size and enhanced spread through culture, with isolates derived directly from the cerebrospinal fluid (CSF) exhibiting the most robust growth. We then sequenced the complete viral genomes of all ten neonatal HSV-2 isolates, providing the first insights into HSV genomic diversity in this clinical setting. We found extensive inter-host variation between isolates, distributed throughout the HSV-2 genome. Furthermore, we assessed intra-host variation and found that each HSV-2 isolate contained minority variants, with two isolates containing ten-fold higher levels of allelic variation than other neonatal isolates or comparable adult isolates. HSV-2 glycoprotein G (gG, US4), gI (US7), gK (UL53), and viral proteins UL8, UL20, UL24, and US2 contained variants that were found only in neonatal isolates associated with CNS disease.
Many of these genes encode viral proteins known to contribute to cell-to-cell spread and/or neurovirulence in mouse models of CNS disease. This study represents the first-ever application of comparative pathogen genomics to neonatal HSV disease.


C.2 Research summary

Along with host genetic factors, viral genetic factors are expected to affect disease outcome in neonatal HSV-2 infection. Here we analyzed a set of 10 low-passage HSV-2 isolates collected from neonates with culture- or PCR-confirmed HSV infection who were enrolled in three published clinical studies (382–384). These samples represented a wide range of clinical manifestations, including SEM, CNS, and disseminated disease, and were associated with robust de-identified clinical information (Table C-1).

Table C-1: Clinical characteristics associated with HSV-2 isolates from ten patients.

Patient isolate* | Clinical disease at diagnosis | Sample source | Morbidity score - mental | Morbidity score - motor | Age at disease onset (days) | Age at time of skin disease recurrence (days) | Gestational age at birth (weeks) | Patient sex & race
CNS11 | CNS | CSF | 4 | 4 | 12 | NR** | 37 | M, W
DISS14 | DISS + CNS | CSF | 2 | 2 | 7 | 39 | 39 | M, W
CNS03 | CNS | Skin | 4 | 4 | 17 | 61 | 37 | M, W
CNS15 | CNS | Skin | 3 | 4 | 19 | 40 | 36 | M, W
CNS17 | CNS | Skin | 4 | 4 | 17 | 95 | 40 | M, W
DISS29 | DISS + CNS | Skin | 3 | 3 | 5 | 49 | 38 | F, B
CNS12 | CNS | Skin | 4 | 4 | 16 | 52 | 41 | F, W
SEM02 | SEM | Skin | 1 | 1 | 5 | 52 | 38 | F, W
SEM13 | SEM | Skin | 4 | 4 | 11 | 72 | 27 | F, B
SEM18 | SEM | Skin | 2 | 2 | 17 | 78 | 37 | F, W

We defined the level of diversity in this population using comparative genomics and found extensive inter- and intra-host diversity distributed throughout the HSV-2 genome. Beyond the differences between these genomes at the consensus level, we also assessed whether minority variants existed within the viral population of each neonatal HSV-2 isolate. The high depth of coverage from deep sequencing of each isolate allowed us to screen for minority variants at every nucleotide position of each genome. We found minority variants in the viral genome populations of all 10 neonatal HSV-2 isolates, albeit to a different degree in each isolate (Figure C-1). In total, there were 1,452 minority variants present at a frequency of at least 2% of reads at a given locus. These minority variants were distributed across all genomic regions: unique long (UL), 46.3%; repeat long (IRL/TRL), 31.8%; unique short (US), 10.2%; and repeat short (IRS/TRS), 11.7%. Isolates CNS15 and DISS29 had 10-fold higher levels of minority variants than other neonatal isolates (Figure C-1A), and these variants were often present at a higher frequency, or penetrance, of the alternative allele than observed in other neonatal or adult HSV-2 isolates (Figure C-1C). Of the minority variants located in coding regions, 47% were synonymous and 51.5% were nonsynonymous (Figure C-1B). We further examined the specific proteins affected by these nonsynonymous minority variants (Figure C-1D) and found that nearly every HSV-2 protein harbored minority variants in at least one neonatal isolate. Only UL3, UL11, UL35, and UL55 were completely devoid of minority variants; two of these genes (UL3 and UL11) were also devoid of amino acid variations at the consensus level. Overall, the number of minority variants in these neonatal isolates is similar to what has previously been seen in adult HSV-2 samples, with the exception of the two isolates (CNS15 and DISS29) that had significantly more diverse viral populations. This study represents the first-ever application of comparative pathogen genomics to neonatal HSV disease and provides a basis for further exploration of genotype-phenotype links in this clinically vulnerable patient population.
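The 2% cutoff used above can be expressed as a simple filter over per-position allele counts. The sketch below assumes per-strand read counts have already been tabulated for each allele. It also requires the minor allele to appear on both strands, which mirrors the bidirectional read support shown for the UL6 variant in Appendix B. The minimum depth and per-strand thresholds are hypothetical parameters, not the settings of the published pipeline, and the counts are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class StrandCounts:
    forward: int
    reverse: int

    @property
    def total(self) -> int:
        return self.forward + self.reverse

def call_minority_variants(
    pileup: Dict[int, Dict[str, StrandCounts]],
    min_freq: float = 0.02,    # at least 2% of reads, as in the text
    min_depth: int = 100,      # hypothetical coverage floor
    min_per_strand: int = 1,   # hypothetical: reads required on each strand
) -> List[Tuple[int, str, float]]:
    """Return (position, allele, frequency) for each non-major allele passing the filters."""
    calls = []
    for pos in sorted(pileup):
        alleles = pileup[pos]
        depth = sum(c.total for c in alleles.values())
        if depth < min_depth:
            continue
        major = max(alleles, key=lambda a: alleles[a].total)
        for allele, c in alleles.items():
            if allele == major:
                continue
            freq = c.total / depth
            if freq >= min_freq and min(c.forward, c.reverse) >= min_per_strand:
                calls.append((pos, allele, round(freq, 3)))
    return calls

# Invented counts: position 7373 has a 9% T allele on both strands;
# position 100 has a 3% T allele seen only on the forward strand.
pileup = {
    7373: {"G": StrandCounts(450, 450), "T": StrandCounts(50, 40), "A": StrandCounts(5, 5)},
    100: {"C": StrandCounts(300, 300), "T": StrandCounts(20, 0)},
}
calls = call_minority_variants(pileup)
print(calls)
```

The strand requirement rejects the single-strand allele at position 100 even though it passes the frequency cutoff, a common guard against strand-specific sequencing artifacts.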


Figure C-1: Minority variants present in neonatal HSV-2 genome populations. (A) Total number of minority variants (MV) observed in each neonatal isolate. DISS29 and CNS15 have 10-fold more minority variants than other neonatal isolates, which is particularly noticeable for MV located in coding (genic) sequences. (B) Minority variants can be either single-nucleotide variants or polymorphisms (SNPs) or small insertions or deletions (in/dels); minority variant SNPs are more common than in/dels. (C) Pie charts show the overall frequency, or penetrance, of each minority variant. DISS29 and CNS15 have many MV that exist at high frequency or penetrance, while the penetrance of MV alleles in most other neonatal isolates is low. (D) Stacked histograms show the number of nonsynonymous MV located in each HSV-2 coding sequence (gene). Color coding of stacked histogram bars is the same as shown in


Appendix D:

Forward genetics approaches for prediction of virulence loci in field isolates of Marek’s disease virus 1 (MDV-1)

Andrew S. Bell2, Utsav Pandey1, Daniel Renner1, David A. Kennedy2, Matthew J. Jones2, Moriah L. Szpara1, Andrew F. Read2

1Department of Biochemistry and Molecular Biology, Center for Infectious Disease Dynamics, and the Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, 16802, USA

2 Center for Infectious Disease Dynamics, Departments of Biology and Entomology, Pennsylvania State University, University Park, Pennsylvania 16802, USA

Ongoing study

Acknowledgements: Viral DNA was obtained from collaborators at the USDA. U.P. and A.B. prepared viral DNA libraries for sequencing. D.R. and U.P. assembled the full-length genomes. A.B. performed the analyses. M.S. and A.F. conceptualized the work.


D.1 Abstract

This study aims to determine the genetic basis of virulence using 64 field isolates of MDV-1. These isolates span pathotypes from virulent (V) and very virulent (VV) to very virulent plus (VV+), and were collected from chicken farms across the United States between 1962 and 2014. Pathotyping of the isolates was performed by our collaborators at the USDA using the model described by R. L. Witter. We have deep-sequenced and obtained full-length genomes for all 64 isolates. A preliminary analysis of the genomes revealed single-nucleotide polymorphisms (SNPs) that distinguish ‘V’ isolates from very virulent plus (‘VV+’) isolates, but ‘VV’ isolates contained SNPs common to both ‘V’ and ‘VV+’ isolates.


D.2 Research summary

Marek’s disease virus serotype 1, or Gallid herpesvirus type 2 (GaHV-2), is the causative agent of Marek’s disease (MD). It is an alphaherpesvirus of the genus Mardivirus, which also includes the closely related non-oncogenic Marek’s disease virus serotype 2 (MDV-2) and herpesvirus of turkeys (HVT, also known as MDV serotype 3) (145, 146). Highly virulent strains of MDV-1 can cause mortality approaching 100% (148). Since the late 1960s, MD has been controlled by mass vaccination of one-day-old chicks or chick embryos (113, 150). The severity of MD has risen over the last 40 years, and widespread vaccination, along with changes in farming practices, has been proposed as one of the main causes (148, 152, 153). Despite clinical and laboratory data demonstrating increased virulence in field isolates of MDV-1, the mechanism by which MDV-1 has evolved into more virulent forms over the years is not well understood. Understanding MDV-1 evolution in the field would help us predict the future evolution of this pathogen and take precautionary measures to prevent outbreaks. Remarkably, our understanding of MDV-1 genomics and genetic variation comes exclusively from the study of 10 different strains (154–161). Although specific genes, and variations within genes, have been associated with MDV virulence, genetic markers of virulence in MDV remain elusive. Besides full-length genomes, genotype-phenotype association also requires a meticulous and standardized virulence grading method. Here we present findings from a comparative genomics study of full-length genomes from 64 field isolates of MDV-1 (see chapter 4 for methods). These isolates were obtained from poultry farms across the United States between 1962 and 2014 and pathotyped into virulent (V), very virulent (VV), and very virulent plus (VV+) using the standardized virulence grading scheme outlined by Witter et al. (153).
Comparison of the full-length genomes of these isolates by pathotype highlighted positions in the MDV genome whose alleles correlate with pathotype (Table D-1). Through this preliminary analysis, we identified SNPs in the MDV genome that distinguish ‘V’ from ‘VV+’ isolates; however, we were unable to identify SNPs that reliably distinguish ‘VV’ isolates from ‘V’ and ‘VV+’ isolates.
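One way to summarize the per-pathotype pattern in Table D-1 is to count, at each SNP, how many isolates of each pathotype carry the allele typical of VV+ isolates. The sketch below uses six rows copied from Table D-1, with alleles listed in table order from UL17 (29,292) to MDV091 (172,283). A full analysis would load all 64 isolates. Defining the "VV+-typical" allele as the most common VV+ allele is an assumption of this sketch, not the method used in the study.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (isolate, pathotype, alleles at the 13 Table D-1 SNPs in table order)
ROWS: List[Tuple[str, str, str]] = [
    ("645", "VV+", "CAACATGAGGGGG"),
    ("653A", "VV+", "CGACACTAGGGAA"),
    ("660A", "VV", "CAACATGAGGGGG"),
    ("MD5", "VV", "CGGTGCTCCAGAA"),
    ("MD8", "V", "TGGTGCTCCAGAA"),
    ("232", "V", "TGGTGCTCCACAA"),
]

def vvplus_allele_fractions(
    rows: List[Tuple[str, str, str]]
) -> Tuple[str, Dict[str, List[float]]]:
    """Return the VV+-typical allele at each SNP and, per pathotype, the
    fraction of isolates carrying it."""
    n_snps = len(rows[0][2])
    vvplus = [alleles for _, path, alleles in rows if path == "VV+"]
    # Most common VV+ allele per SNP; Counter breaks ties by first occurrence.
    ref = "".join(
        Counter(a[i] for a in vvplus).most_common(1)[0][0] for i in range(n_snps)
    )
    groups: Dict[str, List[str]] = defaultdict(list)
    for _, path, alleles in rows:
        groups[path].append(alleles)
    fractions = {
        path: [sum(a[i] == ref[i] for a in members) / len(members) for i in range(n_snps)]
        for path, members in groups.items()
    }
    return ref, fractions

ref, fractions = vvplus_allele_fractions(ROWS)
for path in ("VV+", "VV", "V"):
    print(path, " ".join(f"{x:.1f}" for x in fractions[path]))
```

Even on this small subset, the V isolates almost never carry the VV+-typical alleles, while the two VV isolates split between the two patterns. This matches the difficulty of separating ‘VV’ from the other pathotypes described above.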


Table D-1: SNPs identified across MDV-1 genomes that correlate with disease phenotypes of the isolates. Isolate name, location of isolation, isolation date, and pathotype are also shown. SNPs highlighted in orange are predominantly present in ‘VV’ and ‘VV+’ isolates whereas SNPs highlighted in green are present predominantly in ‘V’ isolates.

SNP columns (gene; position in the alignment; mutation):
1. UL17; 29,292; synonymous
2. UL36; 67,157; synonymous
3. UL36; 73,080; synonymous
4. UL36; 78,712; synonymous
5. UL43; 92,268; synonymous
6. R-LORF8; 134,065; synonymous
7. 14kD (intergenic); 134,723; synonymous
8. Meq; 135,319; non-synonymous
9. Meq; 135,387; non-synonymous
10. Meq; 135,407; synonymous
11. Meq; 135,690; non-synonymous
12. ICP0; 141,686; synonymous
13. MDV091; 172,283; synonymous

Isolate | Location | Year of isolation | Pathotype | Alleles at SNPs 1-13

Very virulent plus (VV+)
645 | PA | 1994 | VV+ | C A A C A T G A G G G G G
652 | NY | 1995 | VV+ | C A A C A T G A G G G G G
676 | PA | 1997 | VV+ | C A A C A T G A G G G G G
686 | IA | 1999 | VV+ | C A A C A T G A G G G G G
648B | OH | 1994 | VV+ | C A A C A T G A G G G G G
648A | OH | 1994 | VV+ | C A A C A T G A G G G G G
722A | IA | 2011 | VV+ | C A A C A T G A G G G G G
722B | IA | 2011 | VV+ | C A A C A T G A G G G G G
722C | IA | 2011 | VV+ | C A A C A T G A G G G G G
722D | IA | 2011 | VV+ | C A A C A T G A G G G G G
730A | IA | 2013 | VV+ | C A A C A T G A G G G G G
730B | IA | 2013 | VV+ | C A A C A T G A G G G G G
730C | IA | 2013 | VV+ | C A A C A T G A G G G G G
674 | DE | 1997 | VV+ | C A A C A T G A G G G G G
584A | NC | 1990 | VV+ | C A A C A T G A G G G G G
584B | NC | 1990 | VV+ | C A A C A T G A G G G G G
610A | MD | 1992 | VV+ | C A A C A T G A G G G G G
739A | DE | 2013 | VV+ | C A A C A T G A G G G G G
739C1 | DE | 2013 | VV+ | C A A C A T G A G G G G G
709A | PA | 2010 | VV+ | C A A C A T G A G G G G G
690 | GA | 1999 | VV+ | T A A C A T G A G G G G A
653A | DE | 1995 | VV+ | C G A C A C T A G G G A A

Very virulent (VV)
660A | OH | 1995 | VV | C A A C A T G A G G G G G
608 | AR | 1992 | VV | C A A C A T G A G G G G G
611 | PA | 1992 | VV | C A A C A T G A G G G G G
612 | ME | 1992 | VV | C A A C A T G A G G G G G
670 | ME | 1997 | VV | C A A C A T G A G G G G G
549A | DE | 1987 | VV | C A A C A T G A G G G G G
549AB | DE | 1987 | VV | C A A C A T G A G G G G G
568A | NC | 1988 | VV | C A A C A T G A G G G G G
568B | NC | 1988 | VV | C A A C A T G A G G G G G
701 | PA | 2007 | VV | C A A C A T G A G G G G G
718A | PA | 2011 | VV | C A A C A T G A G G G G G
718B | PA | 2011 | VV | C A A C A T G A G G G G G
685 | GA | 1997 | VV | T A A C A T G A G G G G A
691 | GA | 1999 | VV | T A A C A T G A G G G G A
643G | NE | 1994 | VV | T G G T A T G A G G G G G
643P | NE | 1994 | VV | T G G T A T G A G G G G G
610B | MD | 1992 | VV | T G A C A C T A G G G A A
615K | DE | 1993 | VV | T G A C A C T A G G G A A
656C | VA | 1995 | VV | T G A C A C T A G G G A A
723 | PA | 2011 | VV | T G A C A C T A G G G A A
MD5 | MD | 1977 | VV | C G G T G C T C C A G A A
287L | AL | 1979 | VV | C G G T G C T C C A G A G
MD11 | IA | 1990 | VV | T G G T G C T C C A G A G
583A | MD | 1977 | VV | T G G T A C T C C A G A A

Virulent (V)
596A | WI | 1991 | V | T G G T A C T C C A G A A
617A | OH | 1993 | V | T G G T A C T C C A G A A
MD8 | MD | 1977 | V | T G G T G C T C C A G A A
MD3 | MD | 1977 | V | T G G T G C T C C A G A A
232 | MI | 1978 | V | T G G T G C T C C A C A A
295 | CO | 1980 | V | T G G T G C T C C A C A A
571 | CA | 1989 | V | T G G T G C T C C A C A A
747-15 | GA | 2014 | V | T G G T G C T C C A C A A
747-22 | GA | 2014 | V | T G G T G C T C C A C A A
747C | GA | 2014 | V | T G G T G C T C C A C A A
GA22 | GA | 1965 | V | T G G T G C T C C A C A A
JM102W | MA | 1962 | V | T G G T G C T C C A C A A
MIS-X | GA | 1980 | V | T G G T G C T C C A C A A
MSU-2 | MI | 1980 | V | T G G T G C T C C A C A A
RB1B | NY | 1982 | V | T G G T G C T C C A C A A
RPL39 | GA | 1969 | V | T G G T G C T C C A C A A
709B (Rispens) | PA | 2010 | V | T G G T G C T C C A C A A
CU2 | Not known | Not known | V/M | T G G T G C T C C A C A A

References

1. Figlerowicz M, Alejska M, Kurzyńska-Kokorniak A, Figlerowicz M. 2003. Genetic variability: The key problem in the prevention and therapy of RNA-based virus infections. Med Res Rev 23:488–518. 2. Robertson BD, Meyer TF. 1992. Genetic variation in pathogenic bacteria. Trends Genet 8:422–427. 3. Morrison LJ, McLellan S, Sweeney L, Chan CN, MacLeod A, Tait A, Turner CMR. 2010. Role for Parasite Genetic Diversity in Differential Host Responses to Trypanosoma brucei Infection. Infect Immun 78:1096–1108. 4. Szpara ML, Gatherer D, Ochoa A, Greenbaum B, Dolan A, Bowden RJ, Enquist LW, Legendre M, Davison AJ. 2014. Evolution and diversity in human herpes simplex virus genomes. J Virol 88:1209–27. 5. Alberts B. 2002. Molecular biology of the cell, 4th ed. Garland Science, New York. 6. Behjati S, Tarpey PS. 2013. What is next generation sequencing? Arch Dis Child Educ Pract Ed 98:236–238. 7. Gire SK, Goba A, Andersen KG, Sealfon RSG, Park DJ, Kanneh L, Jalloh S, Momoh M, Fullah M, Dudas G, Wohl S, Moses LM, Yozwiak NL, Winnicki S, Matranga CB, Malboeuf CM, Qu J, Gladden AD, Schaffner SF, Yang X, Jiang P-P, Nekoui M, Colubri A, Coomber MR, Fonnie M, Moigboi A, Gbakie M, Kamara FK, Tucker V, Konuwa E, Saffa S, Sellu J, Jalloh AA, Kovoma A, Koninga J, Mustapha I, Kargbo K, Foday M, Yillah M, Kanneh F, Robert W, Massally JLB, Chapman SB, Bochicchio J, Murphy C, Nusbaum C, Young S, Birren BW, Grant DS, Schieffelin JS, Lander ES, Happi C, Gevao SM, Gnirke A, Rambaut A, Garry RF, Khan SH, Sabeti PC. 2014. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science 345:1369–1372. 8. Harris SR, Cartwright EJ, Török ME, Holden MT, Brown NM, Ogilvy-Stuart AL, Ellington MJ, Quail MA, Bentley SD, Parkhill J, Peacock SJ. 2013. Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study. Lancet Infect Dis 13:130–136. 9. 
Metsky HC, Matranga CB, Wohl S, Schaffner SF, Freije CA, Winnicki SM, West K, Qu J, Baniecki ML, Gladden-Young A, Lin AE, Tomkins-Tinch CH, Ye SH, Park DJ, Luo CY, Barnes KG, Shah RR, Chak B, Barbosa-Lima G, Delatorre E, Vieira YR, Paul LM, Tan AL, Barcellona CM, Porcelli MC, Vasquez C, Cannons AC, Cone MR, Hogan KN, Kopp EW, Anzinger JJ, Garcia KF, Parham LA, Gélvez Ramírez RM, Miranda Montoya MC, Rojas DP, Brown CM, Hennigan S, Sabina B, Scotland S, Gangavarapu K, Grubaugh ND, Oliveira G, Robles-Sikisaka R, Rambaut A, Gehrke L, Smole S, Halloran ME, Villar L, Mattar S, Lorenzana I, Cerbino-Neto J, Valim C, Degrave W, Bozza PT, Gnirke A, Andersen KG, Isern S, Michael SF, Bozza FA, Souza TML, Bosch I, Yozwiak NL, MacInnis BL, Sabeti PC. 2017. Zika virus evolution and spread in the Americas. Nature 546:411–415. 10. Farhat MR, Shapiro BJ, Kieser KJ, Sultana R, Jacobson KR, Victor TC, Warren RM, Streicher EM, Calver A, Sloutsky A, Kaur D, Posey JE, Plikaytis B, Oggioni MR, Gardy JL, Johnston JC, Rodrigues M, Tang PKC, Kato-Maeda M, Borowsky ML, Muddukrishna B, Kreiswirth BN, Kurepina N, Galagan J, Gagneux S, Birren B, Rubin EJ, Lander ES, Sabeti PC, Murray M. 2013. Genomic analysis identifies targets of convergent positive selection in drug-resistant Mycobacterium tuberculosis. Nat Genet 45:1183. 11. Laabei M, Recker M, Rudkin JK, Aldeljawi M, Gulay Z, Sloan TJ, Williams P, Endres JL, Bayles KW, Fey PD, Yajjala VK, Widhelm T, Hawkins E, Lewis K, Parfett S, Scowen L,
Peacock SJ, Holden M, Wilson D, Read TD, Elsen J van den, Priest NK, Feil EJ, Hurst LD, Josefsson E, Massey RC. 2014. Predicting the virulence of MRSA from its genome sequence. Genome Res 24:839–849. 12. Sanger F, Nicklen S, Coulson AR. 1977. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 74:5463–5467. 13. Prober JM, Trainor GL, Dam RJ, Hobbs FW, Robertson CW, Zagursky RJ, Cocuzza AJ, Jensen MA, Baumeister K. 1987. A system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides. Science 238:336–341. 14. Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SBH, Hood LE. 1986. Fluorescence detection in automated DNA sequence analysis. Nature 321:674. 15. Fleischmann RD, Adams MD, White O, Clayton RA, et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496. 16. Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H, Oliver SG. 1996. Life with 6000 Genes. Science 274:546–567. 17. Bult CJ, White O, Olsen GJ, Zhou L, Fleischmann RD, Sutton GG, Blake JA, FitzGerald LM, Clayton RA, Gocayne JD, Kerlavage AR, Dougherty BA, Tomb J-F, Adams MD, Reich CI, Overbeek R, Kirkness EF, Weinstock KG, Merrick JM, Glodek A, Scott JL, Geoghagen NSM, Weidman JF, Fuhrmann JL, Nguyen D, Utterback TR, Kelley JM, Peterson JD, Sadow PW, Hanna MC, Cotton MD, Roberts KM, Hurst MA, Kaine BP, Borodovsky M, Klenk H-P, Fraser CM, Smith HO, Woese CR, Venter JC. 1996. Complete Genome Sequence of the Methanogenic Archaeon, Methanococcus jannaschii. Science 273:1058–1073. 18. 
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey 
JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C,
Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J, International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409:860–921. 19. Reuter JA, Spacek DV, Snyder MP. 2015. High-Throughput Sequencing Technologies. Mol Cell 58:586–597. 20. Lipkin WI. 2013. The changing face of pathogen discovery and surveillance. Nat Rev Microbiol 11:133. 21. Handelsman J. 2004. Metagenomics: Application of Genomics to Uncultured Microorganisms. Microbiol Mol Biol Rev 68:669–685. 22. 
Human Microbiome Project Consortium, Methé BA, Nelson KE, Pop M, Creasy HH, Giglio MG, Huttenhower C, Gevers D, Petrosino JF, Abubucker S, Badger JH, Chinwalla AT, Earl AM, FitzGerald MG, Fulton RS, Hallsworth-Pepin K, Lobos EA, Madupu R, Magrini V, Martin JC, Mitreva M, Muzny DM, Sodergren EJ, Versalovic J, Wollam AM, Worley KC, Wortman JR, Young SK, Zeng Q, Aagaard KM, Abolude OO, Allen-Vercoe E, Alm EJ, Alvarado L, Andersen GL, Anderson S, Appelbaum E, Arachchi HM, Armitage G, Arze CA, Ayvaz T, Baker CC, Begg L, Belachew T, Bhonagiri V, Bihan M, Blaser MJ, Bloom T, Bonazzi VR, Brooks P, Buck GA, Buhay CJ, Busam DA, Campbell JL, Canon SR, Cantarel BL, Chain PS, Chen I-MA, Chen L, Chhibba S, Chu K, Ciulla DM, Clemente JC, Clifton SW, Conlan S, Crabtree J, Cutting MA, Davidovics NJ, Davis CC, DeSantis TZ, Deal C, Delehaunty KD, Dewhirst FE, Deych E, Ding Y, Dooling DJ, Dugan SP, Dunne WM Jr, Durkin AS, Edgar RC, Erlich RL, Farmer CN, Farrell RM, Faust K, Feldgarden M, Felix VM, Fisher S, Fodor AA, Forney L, Foster L, Di Francesco V, Friedman J, Friedrich DC, Fronick CC, Fulton LL, Gao H, Garcia N, Giannoukos G, Giblin C, Giovanni MY, Goldberg JM, Goll J, Gonzalez A, Griggs A, Gujja S, Haas BJ, Hamilton HA, Harris EL, Hepburn TA, Herter B, Hoffmann DE, Holder ME, Howarth C, Huang KH, Huse SM, Izard J, Jansson JK, Jiang H, Jordan C, Joshi V, Katancik JA, Keitel WA, Kelley ST, Kells C, Kinder-Haake S, King NB, Knight R, Knights D, Kong HH, Koren O, Koren S, Kota KC, Kovar CL, Kyrpides NC, La Rosa PS, Lee SL, Lemon KP, Lennon N, Lewis CM, Lewis L, Ley RE, Li K, Liolios K, Liu B, Liu Y, Lo C-C, Lozupone CA, Lunsford RD, Madden T, Mahurkar AA, Mannon PJ, Mardis ER, Markowitz VM, Mavrommatis K, McCorrison JM, McDonald D, McEwen J, McGuire AL, McInnes P, Mehta T, Mihindukulasuriya KA, Miller JR, Minx PJ, Newsham I, Nusbaum C, O’Laughlin M, Orvis J, Pagani I, Palaniappan K, Patel SM, Pearson M, Peterson J, Podar M, Pohl C, Pollard KS, Priest ME, Proctor LM, Qin X, Raes J, Ravel J, Reid JG, Rho M, 
Rhodes R, Riehle KP, Rivera MC, Rodriguez-Mueller B, Rogers Y-H, Ross MC, Russ C, Sanka RK, Sankar P, Sathirapongsasuti JF, Schloss JA, Schloss PD, TM, Scholz M, Schriml L, Schubert AM, Segata N, Segre JA, Shannon WD, Sharp RR, Sharpton TJ, Shenoy N, Sheth NU, Simone GA, Singh I, Smillie CS, Sobel JD, Sommer DD, Spicer P, Sutton GG, Sykes SM, Tabbaa DG, Thiagarajan M, Tomlinson CM, Torralba M, Treangen TJ, Truty RM, Vishnivetskaya TA, Walker J, Wang L, Wang Z, Ward DV, Warren W, Watson MA, Wellington C, Wetterstrand KA, White JR, Wilczek-Boney K, Wu YQ, Wylie KM, Wylie T, Yandava C, Ye L, Ye Y, Yooseph S, Youmans BP, Zhang L, Zhou Y, Zhu Y, Zoloth L, Zucker JD, Birren BW, Gibbs RA, Highlander SK, Weinstock GM, Wilson RK, White O. 2012. A framework for human microbiome research. Nature 486:215. 23. Yatsunenko T, Rey FE, Manary MJ, Trehan I, Dominguez-Bello MG, Contreras M, Magris M, Hidalgo G, Baldassano RN, Anokhin AP, Heath AC, Warner B, Reeder J, Kuczynski J,
Caporaso JG, Lozupone CA, Lauber C, Clemente JC, Knights D, Knight R, Gordon JI. 2012. Human gut microbiome viewed across age and geography. Nature 486:222. 24. Peterson DA, Turnbaugh PJ. 2010. A microbe-dependent viral key to Crohn’s box. Sci Transl Med 2:1–5. 25. Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. 2006. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444:1027. 26. Buford TW. 2017. (Dis)Trust your gut: the gut microbiome in age-related inflammation, health, and disease. Microbiome 5:80. 27. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 2017. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet 101:5–22. 28. Chen PE, Shapiro BJ. 2015. The advent of genome-wide association studies for bacteria. Curr Opin Microbiol 25:17–24. 29. Farhat MR, Shapiro BJ, Sheppard SK, Colijn C, Murray M. 2014. A phylogeny-based sampling strategy and power calculator informs genome-wide associations study design for microbial pathogens. Genome Med 6:101. 30. Deurenberg RH, Bathoorn E, Chlebowicz MA, Couto N, Ferdous M, García-Cobos S, Kooistra-Smid AMD, Raangs EC, Rosema S, Veloo ACM, Zhou K, Friedrich AW, Rossen JWA. 2017. Application of next generation sequencing in clinical microbiology and infection prevention. J Biotechnol 243:16–24. 31. MacCannell D. 2016. Next Generation Sequencing in Clinical and Public Health Microbiology. Clin Microbiol Newsl 38:169–176. 32. Schlaberg R, Chiu CY, Miller S, Procop GW, Weinstock G. 2017. Validation of Metagenomic Next-Generation Sequencing Tests for Universal Pathogen Detection. Arch Pathol Lab Med 141:776–786. 33. Goldberg B, Sichtig H, Geyer C, Ledeboer N, Weinstock GM. 2015. Making the Leap from Research Laboratory to Clinic: Challenges and Opportunities for Next-Generation Sequencing in Infectious Disease Diagnostics. mBio 6:e01888-15. 34. 
Berg MG, Lee D, Coller K, Frankel M, Aronsohn A, Cheng K, Forberg K, Marcinkus M, Naccache SN, Dawson G, Brennan C, Jensen DM, Hackett J Jr, Chiu CY. 2015. Discovery of a Novel Human Pegivirus in Blood Associated with Hepatitis C Virus Co-Infection. PLOS Pathog 11:e1005325. 35. Somasekar S, Lee D, Rule J, Naccache SN, Stone M, Busch MP, Sanders C, Lee WM, Chiu CY. 2017. Viral Surveillance in Serum Samples From Patients With Acute Liver Failure By Metagenomic Next-Generation Sequencing. Clin Infect Dis 65:1477–1485. 36. Graf EH, Simmon KE, Tardif KD, Hymas W, Flygare S, Eilbeck K, Yandell M, Schlaberg R. 2016. Unbiased Detection of Respiratory Viruses by Use of RNA Sequencing-Based Metagenomics: a Systematic Comparison to a Commercial PCR Panel. J Clin Microbiol 54:1000–1007. 37. Chiu CY, Coffey LL, Murkey J, Symmes K, Sample HA, Wilson MR, Naccache SN, Arevalo S, Somasekar S, Federman S, Stryke D, Vespa P, Schiller G, Messenger S, Humphries R, Miller S, Klausner JD. 2017. Diagnosis of Fatal Human Case of St. Louis Encephalitis Virus Infection by Metagenomic Sequencing, California, 2016. Emerg Infect Dis 23:1964–1968. 38. Wilson MR, Naccache SN, Samayoa E, Biagtan M, Bashir H, Yu G, Salamat SM, Somasekar S, Federman S, Miller S, Sokolic R, Garabedian E, Candotti F, Buckley RH, Reed KD, Meyer TL, Seroogy CM, Galloway R, Henderson SL, Gern JE, DeRisi JL, Chiu CY. 2014. Actionable Diagnosis of Neuroleptospirosis by Next-Generation Sequencing. N Engl J Med 370:2408–2417.
39. Naccache SN, Peggs KS, Mattes FM, Phadke R, Garson JA, Grant P, Samayoa E, Federman S, Miller S, Lunn MP, Gant V, Chiu CY. 2015. Diagnosis of Neuroinvasive Astrovirus Infection in an Immunocompromised Adult With Encephalitis by Unbiased Next-Generation Sequencing. Clin Infect Dis 60:919–923. 40. Kwong JC, McCallum N, Sintchenko V, Howden BP. 2015. Whole genome sequencing in clinical and public health microbiology. Pathology 47:199–210. 41. Köser CU, Bryant JM, Becq J, Török ME, Ellington MJ, Marti-Renom MA, Carmichael AJ, Parkhill J, Smith GP, Peacock SJ. 2013. Whole-Genome Sequencing for Rapid Susceptibility Testing of M. tuberculosis. N Engl J Med 369:290–292. 42. Köser CU, Holden MTG, Ellington MJ, Cartwright EJP, Brown NM, Ogilvy-Stuart AL, Hsu LY, Chewapreecha C, Croucher NJ, Harris SR, Sanders M, Enright MC, Dougan G, Bentley SD, Parkhill J, Fraser LJ, Betley JR, Schulz-Trieglaff OB, Smith GP, Peacock SJ. 2012. Rapid Whole-Genome Sequencing for Investigation of a Neonatal MRSA Outbreak. N Engl J Med 366:2267–2275. 43. Singhal N, Kumar M, Kanaujia PK, Virdi JS. 2015. MALDI-TOF mass spectrometry: an emerging technology for microbial identification and diagnosis. Front Microbiol 6. 44. Stoesser N, Batty EM, Eyre DW, Morgan M, Wyllie DH, Del Ojo Elias C, Johnson JR, Walker AS, Peto TEA, Crook DW. 2013. Predicting antimicrobial susceptibilities for Escherichia coli and Klebsiella pneumoniae isolates using whole genomic sequence data. J Antimicrob Chemother 68:2234–2244. 45. Lieberman TD, Michel J-B, Aingaran M, Potter-Bynoe G, Roux D, Davis MR Jr, Skurnik D, Leiby N, LiPuma JJ, Goldberg JB, McAdam AJ, Priebe GP, Kishony R. 2011. Parallel bacterial evolution within multiple patients identifies candidate pathogenicity genes. Nat Genet 43:1275. 46. 
Mutreja A, Kim DW, Thomson NR, Connor TR, Lee JH, Kariuki S, Croucher NJ, Choi SY, Harris SR, Lebens M, Niyogi SK, Kim EJ, Ramamurthy T, Chun J, Wood JLN, Clemens JD, Czerkinsky C, Nair GB, Holmgren J, Parkhill J, Dougan G. 2011. Evidence for several waves of global transmission in the seventh cholera pandemic. Nature 477:462. 47. McAdam PR, Holmes A, Templeton KE, Fitzgerald JR. 2011. Adaptive Evolution of Staphylococcus aureus during Chronic Endobronchial Infection of a Cystic Fibrosis Patient. PLOS ONE 6:e24301. 48. Houldcroft CJ, Bryant JM, Depledge DP, Margetts BK, Simmonds J, Nicolaou S, Tutill HJ, Williams R, Worth AJJ, Marks SD, Veys P, Whittaker E, Breuer J. 2016. Detection of Low Frequency Multi-Drug Resistance and Novel Putative Maribavir Resistance in Immunocompromised Pediatric Patients with Cytomegalovirus. Front Microbiol 7. 49. Khudyakov Y. 2012. Molecular surveillance of hepatitis C. Antivir Ther 17:1465–1470. 50. Kim JH, Park YK, Park E-S, Kim K-H. 2014. Molecular diagnosis and treatment of drug-resistant hepatitis B virus. World J Gastroenterol 20:5708–5720. 51. McGinnis J, Laplante J, Shudt M, George KS. 2016. Next generation sequencing for whole genome analysis and surveillance of influenza A viruses. J Clin Virol 79:44–50. 52. Chabria SB, Gupta S, Kozal MJ. 2014. Deep Sequencing of HIV: Clinical and Research Applications. Annu Rev Genomics Hum Genet 15:295–325. 53. Houldcroft CJ, Beale MA, Breuer J. 2017. Clinical and biological insights from viral genome sequencing. Nat Rev Microbiol 15:183–192. 54. Lefterova MI, Suarez CJ, Banaei N, Pinsky BA. 2015. Next-Generation Sequencing for Infectious Disease Diagnosis and Management: A Report of the Association for Molecular Pathology. J Mol Diagn 17:623–634. 55. Gwinn M, MacCannell DR, Khabbaz RF. 2017. Integrating Advanced Molecular Technologies into Public Health. J Clin Microbiol 55:703–714. 56. 
Andersen KG, Shapiro BJ, Matranga CB, Sealfon R, Lin AE, Moses LM, Folarin OA, Goba A, Odia I, Ehiane PE, Momoh M, England EM, Winnicki S, Branco LM, Gire SK, Phelan E,
Tariyal R, Tewhey R, Omoniwa O, Fullah M, Fonnie R, Fonnie M, Kanneh L, Jalloh S, Gbakie M, Saffa S, Karbo K, Gladden AD, Qu J, Stremlau M, Nekoui M, Finucane HK, Tabrizi S, Vitti JJ, Birren B, Fitzgerald M, McCowan C, Ireland A, Berlin AM, Bochicchio J, Tazon-Vega B, Lennon NJ, Ryan EM, Bjornson Z, Milner DA, Lukens AK, Broodie N, Rowland M, Heinrich M, Akdag M, Schieffelin JS, Levy D, Akpan H, Bausch DG, Rubins K, McCormick JB, Lander ES, Günther S, Hensley L, Okogbenin S, Schaffner SF, Okokhere PO, Khan SH, Grant DS, Akpede GO, Asogun DA, Gnirke A, Levin JZ, Happi CT, Garry RF, Sabeti PC. 2015. Clinical Sequencing Uncovers Origins and Evolution of Lassa Virus. Cell 162:738–750. 57. Buseh AG, Stevens PE, Bromberg M, Kelber ST. 2015. The Ebola epidemic in West Africa: Challenges, opportunities, and policy priority areas. Nurs Outlook 63:30–40. 58. Fauci AS, Morens DM. 2016. Zika Virus in the Americas – Yet Another Arbovirus Threat. N Engl J Med 374:601–604. 59. Faria NR, Quick J, Claro IM, Thézé J, Jesus JG de, Giovanetti M, Kraemer MUG, Hill SC, Black A, Costa AC da, Franco LC, Silva SP, Wu C-H, Raghwani J, Cauchemez S, Plessis L du, Verotti MP, Oliveira WK de, Carmo EH, Coelho GE, Santelli ACFS, Vinhal LC, Henriques CM, Simpson JT, Loose M, Andersen KG, Grubaugh ND, Somasekar S, Chiu CY, Muñoz-Medina JE, Gonzalez-Bonilla CR, Arias CF, Lewis-Ximenez LL, Baylis SA, Chieppe AO, Aguiar SF, Fernandes CA, Lemos PS, Nascimento BLS, Monteiro HAO, Siqueira IC, Queiroz MG de, Souza TR de, Bezerra JF, Lemos MR, Pereira GF, Loudal D, Moura LC, Dhalia R, França RF, Magalhães T, Marques ET Jr, Jaenisch T, Wallau GL, Lima MC de, Nascimento V, Cerqueira EM de, Lima MM de, Mascarenhas DL, Neto JPM, Levin AS, Tozetto-Mendoza TR, Fonseca SN, Mendes-Correa MC, Milagres FP, Segurado A, Holmes EC, Rambaut A, Bedford T, Nunes MRT, Sabino EC, Alcantara LCJ, Loman NJ, Pybus OG. 2017. Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Nature 546:406. 60. 
Naccache SN, Thézé J, Sardi SI, Somasekar S, Greninger AL, Bandeira AC, Campos GS, Tauro LB, Faria NR, Pybus OG, Chiu CY. 2016. Distinct Zika Virus Lineage in Salvador, Bahia, Brazil. Emerg Infect Dis 22:1788–1792. 61. Sardi SI, Somasekar S, Naccache SN, Bandeira AC, Tauro LB, Campos GS, Chiu CY. 2016. Coinfections of Zika and Chikungunya Viruses in Bahia, Brazil, Identified by Metagenomic Next-Generation Sequencing. J Clin Microbiol 54:2348–2353. 62. Chin C-S, Sorenson J, Harris JB, Robins WP, Charles RC, Jean-Charles RR, Bullard J, Webster DR, Kasarskis A, Peluso P, Paxinos EE, Yamaichi Y, Calderwood SB, Mekalanos JJ, Schadt EE, Waldor MK. 2011. The Origin of the Haitian Cholera Outbreak Strain. N Engl J Med 364:33–42. 63. Community Outbreak of HIV Infection Linked to Injection Drug Use of Oxymorphone — Indiana, 2015. 64. First Confirmed Cases of Middle East Respiratory Syndrome Coronavirus (MERS-CoV) Infection in the United States, Updated Information on the Epidemiology of MERS-CoV Infection, and Guidance for the Public, Clinicians, and Public Health Authorities — May 2014. 65. Tagini F, Aubert B, Troillet N, Pillonel T, Praz G, Crisinel PA, Prod’hom G, Asner S, Greub G. 2017. Importance of whole genome sequencing for the assessment of outbreaks in diagnostic laboratories: analysis of a case series of invasive Streptococcus pyogenes infections. Eur J Clin Microbiol Infect Dis 36:1173–1180. 66. Miller RR, Montoya V, Gardy JL, Patrick DM, Tang P. 2013. Metagenomics for pathogen detection in public health. Genome Med 5:81. 67. CDC Earmarks $2.3M for NGS, Bioinformatic Approaches to Combat Infectious Disease. GenomeWeb.
68. Azarian T, Cook RL, Johnson JA, Guzman N, McCarter YS, Gomez N, Rathore MH, Morris JG, Salemi M. 2015. Whole-Genome Sequencing for Outbreak Investigations of Methicillin-Resistant Staphylococcus aureus in the Neonatal Intensive Care Unit: Time for Routine Practice? Infect Control Hosp Epidemiol 36:777–785. 69. Rasko DA, Webster DR, Sahl JW, Bashir A, Boisen N, Scheutz F, Paxinos EE, Sebra R, Chin C-S, Iliopoulos D, Klammer A, Peluso P, Lee L, Kislyuk AO, Bullard J, Kasarskis A, Wang S, Eid J, Rank D, Redman JC, Steyert SR, Frimodt-Møller J, Struve C, Petersen AM, Krogfelt KA, Nataro JP, Schadt EE, Waldor MK. 2011. Origins of the E. coli Strain Causing an Outbreak of Hemolytic–Uremic Syndrome in Germany. N Engl J Med 365:709–717. 70. Depledge DP, Brown J, Macanovic J, Underhill G, Breuer J. 2016. Viral Genome Sequencing Proves Nosocomial Transmission of Fatal Varicella. J Infect Dis 214:1399–1402. 71. Mulcahy-O’Grady H, Workentine ML. 2016. The Challenge and Potential of Metagenomics in the Clinic. Front Immunol 7. 72. Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E, Bouquet J, Greninger AL, Luk K-C, Enge B, Wadford DA, Messenger SL, Genrich GL, Pellegrino K, Grard G, Leroy E, Schneider BS, Fair JN, Martínez MA, Isa P, Crump JA, DeRisi JL, Sittler T, Hackett J, Miller S, Chiu CY. 2014. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res. 73. Oliver GR, Hart SN, Klee EW. 2015. Bioinformatics for Clinical Next Generation Sequencing. Clin Chem 61:124–135. 74. MacConaill L, Meyerson M. 2008. Adding pathogens by genomic subtraction. Nat Genet 40:380. 75. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410. 76. Deneke C, Rentzsch R, Renard BY. 2017. PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data. Sci Rep 7:39194. 77. 
Bradley P, Gordon NC, Walker TM, Dunn L, Heys S, Huang B, Earle S, Pankhurst LJ, Anson L, Cesare M de, Piazza P, Votintseva AA, Golubchik T, Wilson DJ, Wyllie DH, Diel R, Niemann S, Feuerriegel S, Kohl TA, Ismail N, Omar SV, Smith EG, Buck D, McVean G, Walker AS, Peto TEA, Crook DW, Iqbal Z. 2015. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nat Commun 6:10063. 78. Quick J, Grubaugh ND, Pullan ST, Claro IM, Smith AD, Gangavarapu K, Oliveira G, Robles-Sikisaka R, Rogers TF, Beutler NA, Burton DR, Lewis-Ximenez LL, Jesus JG de, Giovanetti M, Hill SC, Black A, Bedford T, Carroll MW, Nunes M, Alcantara LCJ, Sabino EC, Baylis SA, Faria NR, Loose M, Simpson JT, Pybus OG, Andersen KG, Loman NJ. 2017. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat Protoc 12:1261. 79. Rhoads A, Au KF. 2015. PacBio Sequencing and Its Applications. Genomics Proteomics Bioinformatics 13:278–289. 80. Griffiths AJ, Miller JH, Suzuki DT, Lewontin RC, Gelbart WM. 2000. Reverse genetics. 81. Prichard MN, Gao N, Jairath S, Mulamba G, Krosky P, Coen DM, Parker BO, Pari GS. 1999. A recombinant human cytomegalovirus with a large deletion in UL97 has a severe replication deficiency. J Virol 73:5663–70. 82. Tischer BK, von Einem J, Kaufer B, Osterrieder N. 2006. Two-step red-mediated recombination for versatile high-efficiency markerless DNA manipulation in Escherichia coli. Biotechniques 40:191–7.
83. Goodman LB, Loregian A, Perkins GA, Nugent J, Buckles EL, Mercorelli B, Kydd JH, Palu G, Smith KC, Osterrieder N, Davis-Poynter N. 2007. A point mutation in a herpesvirus polymerase determines neuropathogenicity. PLoS Pathog 3:1583–1592. 84. Ye Q, Li X-F, Zhao H, Li S-H, Deng Y-Q, Cao R-Y, Song K-Y, Wang H-J, Hua R-H, Yu Y-X, Zhou X, Qin E-D, Qin C-F. 2012. A single nucleotide mutation in NS2A of Japanese encephalitis-live vaccine virus (SA14-14-2) ablates NS1’ formation and contributes to attenuation. J Gen Virol 93:1959–1964. 85. Urbanowicz RA, McClure CP, Sakuntabhai A, Sall AA, Kobinger G, Müller MA, Holmes EC, Rey FA, Simon-Loriere E, Ball JK. 2016. Human Adaptation of Ebola Virus during the West African Outbreak. Cell 167:1079–1087.e5. 86. Diehl WE, Lin AE, Grubaugh ND, Carvalho LM, Kim K, Kyawe PP, McCauley SM, Donnard E, Kucukural A, McDonel P, Schaffner SF, Garber M, Rambaut A, Andersen KG, Sabeti PC, Luban J. 2016. Ebola Virus Glycoprotein with Increased Infectivity Dominated the 2013–2016 Epidemic. Cell 167:1088–1098.e6. 87. Sun S, Selmer M, Andersson DI. 2014. Resistance to β-Lactam Antibiotics Conferred by Point Mutations in Penicillin-Binding Proteins PBP3, PBP4 and PBP6 in Salmonella enterica. PLOS ONE 9:e97202. 88. Gorgani N, Ahlbrand S, Patterson A, Pourmand N. 2009. Detection of point mutations associated with antibiotic resistance in Pseudomonas aeruginosa. Int J Antimicrob Agents 34:414–418. 89. Moresco EMY, Li X, Beutler B. 2013. Going Forward with Genetics. Am J Pathol 182:1462–1473. 90. Almasy L, Blangero J. 2009. Human QTL Linkage Mapping. Genetica 136:333–340. 91. Doumayrou J, Thébaud G, Vuillaume F, Peterschmitt M, Urbino C. 2015. Mapping genetic determinants of viral traits with FST and quantitative trait locus (QTL) approaches. Virology 484:346–353. 92. Alonso-Blanco C, Koornneef M, Ooijen JW van. 2006. QTL Analysis, p. 79–99. In Arabidopsis Protocols. Humana Press. 93. Sonah H, O’Donoughue L, Cober E, Rajcan I, Belzile F. 2015. 
Identification of loci governing eight agronomic traits using a GBS-GWAS approach and validation by QTL mapping in soya bean. Plant Biotechnol J 13:211–221. 94. Sanjuán R, Domingo-Calap P. 2016. Mechanisms of viral mutation. Cell Mol Life Sci 73:4433–4448. 95. Holland J, Spindler K, Horodyski F, Grabau E, Nichol S, VandePol S. 1982. Rapid evolution of RNA genomes. Science 215:1577–1585. 96. Andino R, Domingo E. 2015. Viral quasispecies. Virology 479–480:46–51. 97. Sanjuán R, Nebot MR, Chirico N, Mansky LM, Belshaw R. 2010. Viral Mutation Rates. J Virol 84:9733–9748. 98. Steinhauer DA, Domingo E, Holland JJ. 1992. Lack of evidence for proofreading mechanisms associated with an RNA virus polymerase. Gene 122:281–288. 99. Roberts JD, Bebenek K, Kunkel TA. 1988. The Accuracy of Reverse Transcriptase from HIV-1. Science 242:1171. 100. Urbaniak K, Markowska-Daniel I. 2014. In vivo reassortment of influenza viruses. Acta Biochim Pol 61:427–431. 101. Biek R, Pybus OG, Lloyd-Smith JO, Didelot X. 2015. Measurably evolving pathogens in the genomic era. Trends Ecol Evol 30:306–313. 102. Renzette N, Gibson L, Jensen JD, Kowalik TF. 2014. Human cytomegalovirus intrahost evolution-a new avenue for understanding and controlling herpesvirus infections. Curr Opin Virol 8:109–115.
103. Bradley AJ, Lurain NS, Ghazal P, Trivedi U, Cunningham C, Baluchova K, Gatherer D, Wilkinson GWG, Dargan DJ, Davison AJ. 2009. High-throughput sequence analysis of variants of human cytomegalovirus strains Towne and AD169. J Gen Virol 90:2375–2380. 104. Görzer I, Guelly C, Trajanoski S, Puchhammer-Stöckl E. 2010. Deep sequencing reveals highly complex dynamics of human cytomegalovirus genotypes in transplant patients over time. J Virol 84:7195–7203. 105. Legendre M, Santini S, Rico A, Abergel C, Claverie J-M. 2011. Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing. Virol J 8:99. 106. Shackelton LA, Parrish CR, Truyen U, Holmes EC. 2005. High rate of viral evolution associated with the emergence of carnivore parvovirus. Proc Natl Acad Sci U S A 102:379–384. 107. Shackelton LA, Holmes EC. 2006. Phylogenetic Evidence for the Rapid Evolution of Human B19 Erythrovirus. J Virol 80:3666–3669. 108. Michaud V, Randriamparany T, Albina E. 2013. Comprehensive Phylogenetic Reconstructions of African Swine Fever Virus: Proposal for a New Classification and Molecular Dating of the Virus. PLoS ONE 8:e69662. 109. Knipe DM, Howley P. 2013. Fields Virology. Lippincott Williams & Wilkins. 110. Davison AJ. 2002. Evolution of the herpesviruses. Vet Microbiol 86:69–88. 111. Szpara ML, Tafuri YR, Parsons L, Shamim SR, Verstrepen KJ, Legendre M, Enquist LW. 2011. A wide extent of inter-strain diversity in virulent and vaccine strains of alphaherpesviruses. PLoS Pathog 7:1–23. 112. Davison AJ, Eberle R, Ehlers B, Hayward GS, McGeoch DJ, Minson AC, Pellett PE, Roizman B, Studdert MJ, Thiry E. 2009. The order Herpesvirales. Arch Virol 154:171–177. 113. Spatz SJ, Rue CA. 2008. Sequence determination of a mildly virulent strain (CU-2) of Gallid herpesvirus type 2 using 454 pyrosequencing. Virus Genes 36:479–489. 114. Papageorgiou KV, Suárez NM, Wilkie GS, McDonald M, Graham EM, Davison AJ. 2016. Genome Sequence of Canine Herpesvirus. 
PloS One 11:e0156015. 115. McGeoch DJ, Davison AJ, Dolan A, Gatherer D, Sevilla-Reyes EE. 2008. Molecular Evolution of the Herpesvirales, p. 447–475. In Origin and Evolution of Viruses. Elsevier. 116. Baer R, Bankier AT, Biggin MD, Deininger PL, Farrell PJ, Gibson TJ, Hatfull G, Hudson GS, Satchwell SC, Séguin C, Tuffnell PS, Barrell BG. 1984. DNA sequence and expression of the B95-8 Epstein—Barr virus genome. Nature 310:207. 117. McGeoch DJ, Dolan A, Donald S, Street C. 1986. Complete DNA sequence of the short repeat region in the genome of herpes simplex virus type 1. Nucleic Acids Res 14:1727– 1746. 118. Davison AJ, Scott JE. 1986. The complete DNA sequence of varicella-zoster virus. J Gen Virol 67 ( Pt 9):1759–1816. 119. Szpara ML, Parsons L, Enquist LW. 2010. Sequence variability in clinical and laboratory isolates of herpes simplex virus 1 reveals new mutations. J Virol 84:5303–13. 120. Renzette N, Bhattacharjee B, Jensen JD, Gibson L, Kowalik TF. 2011. Extensive genome-wide variability of human cytomegalovirus in congenitally infected infants. PLoS Pathog 7:e1001344. 121. Liu P, Fang X, Feng Z, Guo Y-M, Peng R-J, Liu T, Huang Z, Feng Y, Sun X, Xiong Z, Guo X, Pang S-S, Wang B, Lv X, Feng F-T, Li D-J, Chen L-Z, Feng Q-S, Huang W-L, Zeng M- S, Bei J-X, Zhang Y, Zeng Y-X. 2011. Direct sequencing and characterization of a clinical isolate of Epstein-Barr virus from nasopharyngeal carcinoma tissue by using next- generation sequencing technology. J Virol 85:11291–11299. 122. Arias C, Weisburd B, Stern-Ginossar N, Mercier A, Madrid AS, Bellare P, Holdorf M, Weissman JS, Ganem D. 2014. KSHV 2.0: a comprehensive annotation of the Kaposi’s

171

sarcoma-associated herpesvirus genome using next-generation sequencing reveals novel genomic and functional features. PLoS Pathog 10:e1003847. 123. Drake JW, Hwang CBC. 2005. On the mutation rate of herpes simplex virus type 1. Genetics 170:969–70. 124. Lu Q, Hwang YT, Hwang CBC. 2002. Mutation Spectra of Herpes Simplex Virus Type 1 Thymidine Kinase Mutants. J Virol 76:5822–5828. 125. Pandey U, Renner DW, Thompson RL, Szpara ML, Sawtell NM. 2017. Inferred father-to- son transmission of herpes simplex virus results in near-perfect preservation of viral genome identity and in vivo phenotypes. Sci Rep 7:13666. 126. Pandey U, Bell A, Renner DW, Shreve JT, Read AF, Szpara ML. 2015. DNA from dust: the first field-isolated genomes of MDV-1, from virions in poultry dust and chicken feather folliclesAbstract, International Herpesvirus Workshop. Boise, ID. 127. Bowden R, Sakaoka H, Donnelly P, Ward R. 2004. High recombination rate in herpes simplex virus type 1 natural populations suggests significant co-infection. Infect Genet Evol 4:115–23. 128. Brown SM, Ritchie DA. 1975. Genetic studies with herpes simplex virus type 1. Analysis of mixed plaque-forming virus and its bearing on genetic recombination. Virology 64:32– 42. 129. Ben-Porat T, Deatly A, Veach RA, Blankenship ML. 1984. Equalization of the inverted repeat sequences of the pseudorabies virus genome by intermolecular recombination. Virology 132:303–14. 130. Burrel S, Boutolleau D, Ryu D, Agut H, Merkel K, Leendertz FH, Calvignac-Spencer S. 2017. Ancient Recombination Events between Human Herpes Simplex Viruses. Mol Biol Evol 34:1713–1721. 131. Javier RT, Sedarati F, Stevens JG. 1986. Two Avirulent Herpes Simplex Viruses Generate Lethal Recombinants in Vivo. Science 234:746–748. 132. Thiry E, Meurens F, Muylkens B, McVoy M, Gogev S, Thiry J, Vanderplasschen A, Epstein A, Keil G, Schynts F. 2005. Recombination in alphaherpesviruses. Rev Med Virol 15:89–103. 133. 
Muylkens B, Farnir F, Meurens F, Schynts F, Vanderplasschen A, Georges M, Thiry E. 2009. Coinfection with two closely related alphaherpesviruses results in a highly diversified recombination mosaic displaying negative genetic interference. J Virol 83:3127–3137. 134. Lassalle F, Depledge DP, Reeves MB, Brown AC, Christiansen MT, Tutill HJ, Williams RJ, Einer-Jensen K, Holdstock J, Atkinson C, Brown JR, van Loenen FB, Clark DA, Griffiths PD, Verjans GMGM, Schutten M, Milne RSB, Balloux F, Breuer J. 2016. Islands of linkage in an ocean of pervasive recombination reveals two-speed evolution of human cytomegalovirus genomes. Virus Evol 2:vew017. 135. Zell R, Taudien S, Pfaff F, Wutzler P, Platzer M, Sauerbrei A. 2012. Sequencing of 21 Varicella-Zoster Virus Genomes Reveals Two Novel Genotypes and Evidence of Recombination. J Virol 86:1608–1622. 136. Norberg P, Kasubi MJ, Haarr L, Bergstrom T, Liljeqvist J-A. 2007. Divergence and recombination of clinical herpes simplex virus type 2 isolates. J Virol 81:13158–13167. 137. Norberg P, Depledge DP, Kundu S, Atkinson C, Brown J, Haque T, Hussaini Y, MacMahon E, Molyneaux P, Papaevangelou V, Sengupta N, Koay ESC, Tang JW, Underhill GS, Grahn A, Studahl M, Breuer J, Bergström T. 2015. Recombination of Globally Circulating Varicella-Zoster Virus. J Virol 89:7133–7146. 138. Koelle DM, Norberg P, Fitzgibbon MP, Russell RM, Greninger AL, Huang M-L, Stensland L, Jing L, Magaret AS, Diem K, Selke S, Xie H, Celum C, Lingappa JR, Jerome KR, Wald A, Johnston C. 2017. Worldwide circulation of HSV-2 × HSV-1 recombinant strains. Sci Rep 7:44084.

139. Lee S-W, Markham PF, Coppo MJC, Legione AR, Markham JF, Noormohammadi AH, Browning GF, Ficorilli N, Hartley CA, Devlin JM. 2012. Attenuated Vaccines Can Recombine to Form Virulent Field Viruses. Science 337:188–188.
140. Lee K, Kolb AW, Sverchkov Y, Cuellar JA, Craven M, Brandt CR. 2015. Recombination Analysis of Herpes Simplex Virus 1 Reveals a Bias toward GC Content and the Inverted Repeat Regions. J Virol 89:7214–7223.
141. Kennedy DA, Dunn JR, Dunn PA, Read AF. 2015. An observational study of the temporal and spatial patterns of Marek’s-disease-associated leukosis condemnation of young chickens in the United States of America. Prev Vet Med 120:328–335.
142. Baigent SJ, Davison F. 2004. 6 - Marek’s disease virus: Biology and life cycle, p. 62–ii. In Davison, F, Nair, V (eds.), Marek’s Disease. Academic Press, Oxford.
143. Osterrieder K, Vautherot J-F. 2004. 3 - The genome content of Marek’s disease-like viruses, p. 17–31. In Davison, F, Nair, V (eds.), Marek’s Disease. Academic Press, Oxford.
144. Pandey U, Bell AS, Renner DW, Kennedy DA, Shreve JT, Cairns CL, Jones MJ, Dunn PA, Read AF, Szpara ML. 2016. DNA from dust: Comparative genomics of large DNA viruses in field surveillance samples. mSphere 1:e00132-16.
145. Spatz SJ, Volkening JD, Gimeno IM, Heidari M, Witter RL. 2012. Dynamic equilibrium of Marek’s disease genomes during in vitro serial passage. Virus Genes 45:526–536.
146. Gimeno IM. 2008. Marek’s disease vaccines: a solution for today but a worry for tomorrow? Vaccine 26 Suppl 3:C31-41.
147. Couteaudier M, Denesvre C. 2014. Marek’s disease virus and skin interactions. Vet Res 45:36.
148. Atkins KE, Read AF, Savill NJ, Renz KG, Islam AFMF, Walkden-Brown SW, Woolhouse MEJ. 2013. Vaccination and reduced cohort duration can drive virulence evolution: Marek’s disease virus and industrialized agriculture. Evol Int J Org Evol 67:851–860.
149. Jarosinski KW, Arndt S, Kaufer BB, Osterrieder N. 2012. Fluorescently Tagged pUL47 of Marek’s Disease Virus Reveals Differential Tissue Expression of the Tegument Protein In Vivo. J Virol 86:2428–2436.
150. Spatz SJ. 2010. Accumulation of attenuating mutations in varying proportions within a high passage very virulent plus strain of Gallid herpesvirus type 2. Virus Res 149:135–142.
151. Nair V. 2013. Latency and Tumorigenesis in Marek’s Disease. Avian Dis 57:360–365.
152. Biggs PM. 2004. 2 - Marek’s disease: Long and difficult beginnings, p. 8–16. In Davison, F, Nair, V (eds.), Marek’s Disease. Academic Press, Oxford.
153. Witter RL. 1997. Increased virulence of Marek’s disease virus field isolates. Avian Dis 41:149–163.
154. Zhang F, Liu C-J, Zhang Y-P, Li Z-J, Liu A-L, Yan F-H, Cong F, Cheng Y. 2011. Comparative full-length sequence analysis of Marek’s disease virus vaccine strain 814. Arch Virol 157:177–183.
155. Spatz SJ, Zhao Y, Petherbridge L, Smith LP, Baigent SJ, Nair V. 2007. Comparative sequence analysis of a highly oncogenic but horizontal spread-defective clone of Marek’s disease virus. Virus Genes 35:753–766.
156. Su S, Cui N, Cui Z, Zhao P, Li Y, Ding J, Dong X. 2012. Complete Genome Sequence of a Recombinant Marek’s Disease Virus Field Strain with One Reticuloendotheliosis Virus Long Terminal Repeat Insert. J Virol 86:13818–13819.
157. Niikura M, Dodgson J, Cheng H. 2005. Direct evidence of host genome acquisition by the alphaherpesvirus Marek’s disease virus. Arch Virol 151:537–549.
158. Spatz SJ, Volkening JD, Gimeno IM, Heidari M, Witter RL. 2012. Dynamic equilibrium of Marek’s disease genomes during in vitro serial passage. Virus Genes 45:526–536.

159. Cheng Y, Cong F, Zhang Y, Li Z, Xu N, Hou G, Liu C. 2012. Genome sequence determination and analysis of a Chinese virulent strain, LMS, of Gallid herpesvirus type 2. Virus Genes 45:56–62.
160. Spatz SJ, Silva RF. 2007. Sequence determination of variable regions within the genomes of gallid herpesvirus-2 pathotypes. Arch Virol 152:1665–1678.
161. Tulman ER, Afonso CL, Lu Z, Zsak L, Rock DL, Kutish GF. 2000. The Genome of a Very Virulent Marek’s Disease Virus. J Virol 74:7980–7988.
162. Hildebrandt E, Dunn JR, Perumbakkam S, Niikura M, Cheng HH. 2014. Characterizing the molecular basis of attenuation of Marek’s disease virus via in vitro serial passage identifies de novo mutations in the helicase-primase subunit gene UL5 and other candidates associated with reduced virulence. J Virol 88:6232–6242.
163. Tanaka M, Kato A, Satoh Y, Ide T, Sagou K, Kimura K, Hasegawa H, Kawaguchi Y. 2012. Herpes simplex virus 1 VP22 regulates translocation of multiple viral and cellular proteins and promotes neurovirulence. J Virol 86:5264–5277.
164. Xu F, Sternberg MR, Gottlieb SL, Berman SM, Markowitz LE, Forhan SE, Taylor LD. 2010. Seroprevalence of Herpes Simplex Virus Type 2 Among Persons Aged 14–49 Years — United States, 2005–2008. Morb Mortal Wkly Rep 59:456.
165. Steiner I. 2011. Herpes simplex virus encephalitis: new infection or reactivation? Curr Opin Neurol 24:268–274.
166. Steiner I, Benninger F. 2013. Update on Herpes Virus Infections of the Nervous System. Curr Neurol Neurosci Rep 13.
167. Dix RD, McKendall RR, Baringer JR. 1983. Comparative neurovirulence of herpes simplex virus type 1 strains after peripheral or intracerebral inoculation of BALB/c mice. Infect Immun 40:103–112.
168. Dix RD, Lukes S, Pulliam L, Baringer JR. 1983. DNA Restriction Enzyme Analysis of Viruses Isolated From Cerebrospinal Fluid and Brain-Biopsy Tissue in a Patient with Herpes Simplex Encephalitis. N Engl J Med 1424.
169. Kolb AW, Lee K, Larsen I, Craven M, Brandt CR. 2016. Quantitative Trait Locus Based Virulence Determinant Mapping of the HSV-1 Genome in Murine Ocular Infection: Genes Involved in Viral Regulatory and Innate Immune Networks Contribute to Virulence. PLOS Pathog 12:e1005499.
170. Kolb AW, Adams M, Cabot EL, Craven M, Brandt CR. 2011. Multiplex sequencing of seven ocular herpes simplex virus type-1 genomes: phylogeny, sequence variability, and SNP distribution. Invest Ophthalmol Vis Sci 52:9061–73.
171. Newman RM, Lamers SL, Weiner B, Ray SC, Colgrove RC, Diaz F, Jing L, Wang K, Saif S, Young S, Henn M, Laeyendecker O, Tobian AAR, Cohen JI, Koelle DM, Quinn TC, Knipe DM. 2015. Genome Sequencing and Analysis of Geographically Diverse Clinical Isolates of Herpes Simplex Virus 2. J Virol JVI.01303-15.
172. Argnani R, Lufino M, Manservigi M, Manservigi R. 2005. Replication-competent herpes simplex vectors: design and applications. Gene Ther 12:S170.
173. Elde NC, Child SJ, Eickbush MT, Kitzman JO, Rogers KS, Shendure J, Geballe AP, Malik HS. 2012. Poxviruses Deploy Genomic Accordions to Adapt Rapidly against Host Antiviral Defenses. Cell 150:831–841.
174. Witter RL. 1997. Increased virulence of Marek’s disease virus field isolates. Avian Dis 41:149–163.
175. Biggs PM. 2004. Marek’s disease: Long and difficult beginnings, p. 8–16. In Davison, F, Nair, V (eds.), Marek’s Disease. Academic Press, Oxford.
176. Osterrieder N, Kamil JP, Schumacher D, Tischer BK, Trapp S. 2006. Marek’s disease virus: from miasma to model. Nat Rev Microbiol 4:283–294.
177. Gimeno IM. 2008. Marek’s disease vaccines: a solution for today but a worry for tomorrow? Vaccine 26 Suppl 3:C31-41.

178. Read AF, Baigent SJ, Powers C, Kgosana LB, Blackwell L, Smith LP, Kennedy DA, Walkden-Brown SW, Nair VK. 2015. Imperfect Vaccination Can Enhance the Transmission of Highly Virulent Pathogens. PLOS Biol 13:e1002198.
179. Witter RL, Lee LF. 1984. Polyvalent Marek’s disease vaccines: Safety, efficacy and protective synergism in chickens with maternal antibodies. Avian Pathol 13:75–92.
180. Poultry Slaughter Annual Summary. U S Dep Agric Econ Stat Mark Inf Syst. Government data.
181. Islam AFMF, Walkden-Brown SW, Groves PJ, Underwood GJ. 2008. Kinetics of Marek’s disease virus (MDV) infection in broiler chickens 1: effect of varying vaccination to challenge interval on vaccinal protection and load of MDV and herpesvirus of turkey in the spleen and feather dander over time. Avian Pathol 37:225–235.
182. Nair V. 2005. Evolution of Marek’s disease – A paradigm for incessant race between the pathogen and the host. Vet J 170:175–183.
183. Depledge DP, Kundu S, Jensen NJ, Gray ER, Jones M, Steinberg S, Gershon A, Kinchington PR, DS, Balloux F, Nichols RA, Breuer J. 2014. Deep Sequencing of Viral Genomes Provides Insight into the Evolution and Pathogenesis of Varicella Zoster Virus and Its Vaccine in Humans. Mol Biol Evol 31:397–409.
184. Quinlivan M, Breuer J. 2014. Clinical and molecular aspects of the live attenuated Oka varicella vaccine: Studies of the Oka varicella vaccine. Rev Med Virol 24:254–273.
185. Zerboni L, Sen N, Oliver SL, Arvin AM. 2014. Molecular mechanisms of varicella zoster virus pathogenesis. Nat Rev Microbiol 12:197–210.
186. Weinert LA, Depledge DP, Kundu S, Gershon AA, Nichols RA, Balloux F, Welch JJ, Breuer J. 2015. Rates of Vaccine Evolution Show Strong Effects of Latency: Implications for Varicella Zoster Virus Epidemiology. Mol Biol Evol 32:1020–1028.
187. Niikura M, Dodgson J, Cheng H. 2006. Direct evidence of host genome acquisition by the alphaherpesvirus Marek’s disease virus. Arch Virol 151:537–549.
188. Zhang F, Liu C-J, Zhang Y-P, Li Z-J, Liu A-L, Yan F-H, Cong F, Cheng Y. 2011. Comparative full-length sequence analysis of Marek’s disease virus vaccine strain 814. Arch Virol 157:177–183.
189. Peters GA, Tyler SD, Grose C, Severini A, Gray MJ, Upton C, Tipples GA. 2006. A full-genome phylogenetic analysis of varicella-zoster virus reveals a novel origin of replication-based genotyping scheme and evidence of recombination between major circulating clades. J Virol 80:9850–9860.
190. Tyler SD, Peters GA, Grose C, Severini A, Gray MJ, Upton C, Tipples GA. 2007. Genomic cartography of varicella-zoster virus: a complete genome-based analysis of strain variability with implications for attenuation and phenotypic differences. Virology 359:447–58.
191. Bradley AJ, Lurain NS, Ghazal P, Trivedi U, Cunningham C, Baluchova K, Gatherer D, Wilkinson GWG, Dargan DJ, Davison AJ. 2009. High-throughput sequence analysis of variants of human cytomegalovirus strains Towne and AD169. J Gen Virol 90:2375–80.
192. Dargan DJ, Douglas E, Cunningham C, Jamieson F, Stanton RJ, Baluchova K, McSharry BP, Tomasec P, Emery VC, Percivalle E, Sarasini A, Gerna G, Wilkinson GWG, Davison AJ. 2010. Sequential mutations associated with adaptation of human cytomegalovirus to growth in cell culture. J Gen Virol 91:1535–46.
193. Spatz SJ. 2010. Accumulation of attenuating mutations in varying proportions within a high passage very virulent plus strain of Gallid herpesvirus type 2. Virus Res 149:135–142.
194. Hildebrandt E, Dunn JR, Perumbakkam S, Niikura M, Cheng HH. 2014. Characterizing the molecular basis of attenuation of Marek’s disease virus via in vitro serial passage identifies de novo mutations in the helicase-primase subunit gene UL5 and other candidates associated with reduced virulence. J Virol 88:6232–42.

195. Cunningham C, Gatherer D, Hilfrich B, Baluchova K, Derrick J, Thomson M, Griffiths PD, Wilkinson GWG, Schulz TF, Dargan DJ, Davison AJ. 2010. Sequences of complete human cytomegalovirus genomes from infected cell cultures and clinical specimens. J Gen Virol 91:605–15.
196. Depledge DP, Palser AL, Watson SJ, Lai IY-C, Gray ER, Grant P, Kanda RK, Leproust E, Kellam P, Breuer J. 2011. Specific Capture and Whole-Genome Sequencing of Viruses from Clinical Samples. PLoS ONE 6:e27805.
197. Renzette N, Bhattacharjee B, Jensen JD, Gibson L, Kowalik TF. 2011. Extensive genome-wide variability of human cytomegalovirus in congenitally infected infants. PLoS Pathog 7:1–14.
198. Renzette N, Gibson L, Bhattacharjee B, Fisher D, Schleiss MR, Jensen JD, Kowalik TF. 2013. Rapid intrahost evolution of human cytomegalovirus is shaped by demography and positive selection. PLoS Genet 9:1–14.
199. Lei H, Li T, Hung G-C, Li B, Tsai S, Lo S-C. 2013. Identification and characterization of EBV genomes in spontaneously immortalized human peripheral blood B lymphocytes by NGS technology. BMC Genomics 14:804.
200. Pellett PE, Roizman B. 2013. Herpesviridae, p. 1802–1822. In Fields Virology, 6th ed. Lippincott Williams & Wilkins, Philadelphia, PA.
201. Islam A, Harrison B, Cheetham BF, Mahony TJ, Young PL, Walkden-Brown SW. 2004. Differential amplification and quantitation of Marek’s disease viruses using real-time polymerase chain reaction. J Virol Methods 119:103–113.
202. Parsons LR, Tafuri YR, Shreve JT, Bowen CD, Shipley MM, Enquist LW, Szpara ML. 2015. Rapid genome assembly and comparison decode intrastrain variation in human alphaherpesviruses. mBio 6.
203. Spatz SJ, Schat KA. 2011. Comparative genomic sequence analysis of the Marek’s disease vaccine strain SB-1. Virus Genes 42:331–338.
204. Kuhn JH, Bao Y, Bavari S, Becker S, Bradfute S, Brister JR, Bukreyev AA, Chandran K, Davey RA, Dolnik O, Dye JM, Enterlein S, Hensley LE, Honko AN, Jahrling PB, Johnson KM, Kobinger G, Leroy EM, Lever MS, Mühlberger E, Netesov SV, Olinger GG, Palacios G, Patterson JL, Paweska JT, Pitt L, Radoshitzky SR, Saphire EO, Smither SJ, Swanepoel R, Towner JS, van der Groen G, Volchkov VE, Wahl-Jensen V, Warren TK, Weidmann M, Nichol ST. 2013. Virus nomenclature below the species level: a standardized nomenclature for natural variants of viruses assigned to the family Filoviridae. Arch Virol 158:301–311.
205. Watson G, Xu W, Reed A, Babra B, Putman T, Wick E, Wechsler SL, Rohrmann GF, Jin L. 2012. Sequence and comparative analysis of the genome of HSV-1 strain McKrae. Virology 433:528–37.
206. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK. 2012. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 22:568–576.
207. Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H, Altaf-Ul-Amin M, Ogasawara N, Kanaya S. 2011. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res 39:1–13.
208. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin) 6:80–92.
209. Cingolani P, Patel VM, Coon M, Nguyen T, Land SJ, Ruden DM, Lu X. 2012. Using Drosophila melanogaster as a Model for Genotoxic Chemical Mutational Studies with a New Program, SnpSift. Front Genet 3.
210. Roy S, Schreiber E. 2014. Detecting and Quantifying Low Level Gene Variants in Sanger Sequencing Traces Using the ab1 Peak Reporter Tool. J Biomol Tech JBT 25:S13–S14.

211. Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066.
212. Jukes TH, Cantor CR. 1969. Evolution of Protein Molecules (Chapter 24), p. 21–132. In Munro, HN (ed.), Mammalian Protein Metabolism. Academic Press.
213. Saitou N, Nei M. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425.
214. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0. Mol Biol Evol 30:2725–2729.
215. Felsenstein J. 1985. Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution 39:783–791.
216. Bell AS, Jones MJ, Christopher CL, Kennedy DA, Pandey U, Dunn PA, Szpara ML, Read AF. Marek’s disease: MDV-1 diversity and dynamics of infection in the field, a Sanger sequencing-based study in Pennsylvania, USA (In prep).
217. Kennedy DA, Cairns CL, Jones MJ, Bell AS, Salathe RM, Baigent SJ, Nair VK, Dunn PA, Read AF. Industry-wide surveillance of Marek’s disease virus on commercial poultry farms: underlying potential for virulence evolution and vaccine escape (In review).
218. Kamil JP, Tischer BK, Trapp S, Nair VK, Osterrieder N, Kung H-J. 2005. vLIP, a Viral Lipase Homologue, Is a Virulence Factor of Marek’s Disease Virus. J Virol 79:6984–6996.
219. Hearn C, Preeyanon L, Hunt HD, York IA. 2015. An MHC class I immune evasion gene of Marek’s disease virus. Virology 475:88–95.
220. Ahlers SE, Feldman LT. 1987. Immediate-early protein of pseudorabies virus is not continuously required to reinitiate transcription of induced genes. J Virol 61:1258–1260.
221. Wu CL, Wilcox KW. 1991. The conserved DNA-binding domains encoded by the herpes simplex virus type 1 ICP4, pseudorabies virus IE180, and varicella-zoster virus ORF62 genes recognize similar sites in the corresponding promoters. J Virol 65:1149–59.
222. Xie Q, Anderson AS, Morgan RW. 1996. Marek’s disease virus (MDV) ICP4, pp38, and meq genes are involved in the maintenance of transformation of MDCC-MSB1 MDV-transformed lymphoblastoid cells. J Virol 70:1125–1131.
223. Cantello JL, Parcells MS, Anderson AS, Morgan RW. 1997. Marek’s disease virus latency-associated transcripts belong to a family of spliced RNAs that are antisense to the ICP4 homolog gene. J Virol 71:1353–1361.
224. Nair V. 2013. Latency and Tumorigenesis in Marek’s Disease. Avian Dis 57:360–365.
225. Spatz SJ, Rue CA. 2008. Sequence determination of a mildly virulent strain (CU-2) of Gallid herpesvirus type 2 using 454 pyrosequencing. Virus Genes 36:479–489.
226. Spatz SJ, Petherbridge L, Zhao Y, Nair V. 2007. Comparative full-length sequence analysis of oncogenic and vaccine (Rispens) strains of Marek’s disease virus. J Gen Virol 88:1080–1096.
227. Norberg P, Tyler S, Severini A, Whitley R, Liljeqvist J-A, Bergstrom T. 2011. A genome-wide comparative evolutionary analysis of herpes simplex virus type 1 and varicella zoster virus. PloS One 6:1–8.
228. Grose C. 2012. Pangaea and the Out-of-Africa Model of Varicella-Zoster Virus Evolution and Phylogeography. J Virol 86:9558–9565.
229. Kolb AW, Ané C, Brandt CR. 2013. Using HSV-1 genome phylogenetics to track past human migrations. PloS One 8:1–9.
230. Chow VT, Tipples GA, Grose C. 2013. Bioinformatics of varicella-zoster virus: single nucleotide polymorphisms define clades and attenuated vaccine genotypes. Infect Genet Evol J Mol Epidemiol Evol Genet Infect Dis 18:351–356.
231. Shamblin CE, Greene N, Arumugaswami V, Dienglewicz RL, Parcells MS. 2004. Comparative analysis of Marek’s disease virus (MDV) glycoprotein-, lytic antigen pp38- and transformation antigen Meq-encoding genes: association of meq mutations with MDVs of high virulence. Vet Microbiol 102:147–167.

232. Santin ER, Shamblin CE, Prigge JT, Arumugaswami V, Dienglewicz RL, Parcells MS. 2006. Examination of the Effect of a Naturally Occurring Mutation in Glycoprotein L on Marek’s Disease Virus Pathogenesis. Avian Dis 50:96–103.
233. Tavlarides-Hontz P, Kumar PM, Amortegui JR, Osterrieder N, Parcells MS. 2009. A deletion within glycoprotein L of Marek’s disease virus (MDV) field isolates correlates with a decrease in bivalent MDV vaccine efficacy in contact-exposed chickens. Avian Dis 53:287–296.
234. Shaikh SAR, Katneni UK, Dong H, Gaddamanugu S, Tavlarides-Hontz P, Jarosinski KW, Osterrieder N, Parcells MS. 2013. A Deletion in the Glycoprotein L (gL) Gene of U.S. Marek’s Disease Virus (MDV) Field Strains Is Insufficient to Confer Increased Pathogenicity to the Bacterial Artificial Chromosome (BAC)–Based Strain, RB-1B. Avian Dis 57:509–518.
235. Gianni T, Massaro R, Campadelli-Fiume G. 2015. Dissociation of HSV gL from gH by αvβ6- or αvβ8-integrin promotes gH activation and virus entry. Proc Natl Acad Sci U S A 112:E3901–E3910.
236. Wu P, Reed WM, Lee LF. 2001. Glycoproteins H and L of Marek’s disease virus form a hetero-oligomer essential for translocation and cell surface expression. Arch Virol 146:983–992.
237. Qian Z, Brunovskis P, Rauscher F, Lee L, Kung HJ. 1995. Transactivation activity of Meq, a Marek’s disease herpesvirus bZIP protein persistently expressed in latently infected transformed T cells. J Virol 69:4037–4044.
238. Nair V, Kung H-J. 2004. 4 - Marek’s disease virus oncogenicity: Molecular mechanisms, p. 32–48. In Davison, F, Nair, V (eds.), Marek’s Disease. Academic Press, Oxford.
239. Tian M, Zhao Y, Lin Y, Zou N, Liu C, Liu P, Cao S, Wen X, Huang Y. 2011. Comparative analysis of oncogenic genes revealed unique evolutionary features of field Marek’s disease virus prevalent in recent years in China. Virol J 8:121.
240. Murata S, Okada T, Kano R, Hayashi Y, Hashiguchi T, Onuma M, Konnai S, Ohashi K. 2011. Analysis of transcriptional activities of the Meq proteins present in highly virulent Marek’s disease virus strains, RB1B and Md5. Virus Genes 43:66–71.
241. Spatz SJ, Silva RF. 2007. Polymorphisms in the repeat long regions of oncogenic and attenuated pathotypes of Marek’s disease virus 1. Virus Genes 35:41–53.
242. Giacinti C, Giordano A. 2006. RB and cell cycle progression. Oncogene 25:5220–5227.
243. Renzette N, Pokalyuk C, Gibson L, Bhattacharjee B, Schleiss MR, Hamprecht K, Yamamoto AY, Mussi-Pinhata MM, Britt WJ, Jensen JD, Kowalik TF. 2015. Limits and patterns of cytomegalovirus genomic diversity in humans. Proc Natl Acad Sci 201501880.
244. Afonso CL, Tulman ER, Lu Z, Zsak L, Rock DL, Kutish GF. 2001. The Genome of Turkey Herpesvirus. J Virol 75:971–978.
245. Izumiya Y, Jang H-K, Ono M, Mikami T. 2001. A Complete Genomic DNA Sequence of Marek’s Disease Virus Type 2, Strain HPRS24, p. 191–221. In Hirai, PDK (ed.), Marek’s Disease. Springer Berlin Heidelberg.
246. Ojkic D, Nagy É. 2000. The complete nucleotide sequence of fowl adenovirus type 8. J Gen Virol 81:1833–1837.
247. Koppers-Lalic D, Verweij MC, Lipińska AD, Wang Y, Quinten E, Reits EA, Koch J, Loch S, Rezende MM, Daus F, Bieńkowska-Szewczyk K, Osterrieder N, Mettenleiter TC, Heemskerk MHM, Tampé R, Neefjes JJ, Chowdhury SI, Ressing ME, Rijsewijk FAM, Wiertz EJHJ. 2008. Varicellovirus UL49.5 Proteins Differentially Affect the Function of the Transporter Associated with Antigen Processing, TAP. PLoS Pathog 4:e1000080.
248. Verweij MC, Lipinska AD, Koppers-Lalic D, van Leeuwen WF, Cohen JI, Kinchington PR, Messaoudi I, Bienkowska-Szewczyk K, Ressing ME, Rijsewijk FAM, Wiertz EJHJ. 2011. The Capacity of UL49.5 Proteins To Inhibit TAP Is Widely Distributed among Members of the Genus Varicellovirus. J Virol 85:2351–2363.

249. Schippers T, Jarosinski K, Osterrieder N. 2014. The ORF012 Gene of Marek’s Disease Virus Type 1 Produces a Spliced Transcript and Encodes a Novel Nuclear Phosphoprotein Essential for Virus Growth. J Virol 89:1348–1363.
250. Zuccola HJ, Filman DJ, Coen DM, Hogle JM. 2000. The crystal structure of an unusual processivity factor, herpes simplex virus UL42, bound to the C terminus of its cognate polymerase. Mol Cell 5:267–78.
251. Wang Y-P, Du W-J, Huang L-P, Wei Y-W, Wu H-L, Feng L, Liu C-M. 2016. The Pseudorabies Virus DNA Polymerase Accessory Subunit UL42 Directs Nuclear Transport of the Holoenzyme. Front Microbiol 7.
252. Zhukovskaya NL, Guan H, Saw YL, Nuth M, Ricciardi RP. 2015. The processivity factor complex of feline herpes virus-1 is a new drug target. Antiviral Res 115:17–20.
253. Digard P, Chow CS, Pirrit L, Coen DM. 1993. Functional analysis of the herpes simplex virus UL42 protein. J Virol 67:1159–1168.
254. Domingo E, Martín V, Perales C, Grande-Pérez A, García-Arriaza J, Arias A. 2006. Viruses as Quasispecies: Biological Implications, p. 51–82. In Domingo, E (ed.), Quasispecies: Concept and Implications for Virology. Springer Berlin Heidelberg.
255. Atkins KE, Read AF, Savill NJ, Renz KG, Islam AF, Walkden-Brown SW, Woolhouse MEJ. 2013. Vaccination and reduced cohort duration can drive virulence evolution: Marek’s disease virus and industrialized agriculture. Evolution 67:851–860.
256. Atkins KE, Read AF, Walkden-Brown SW, Savill NJ, Woolhouse MEJ. 2013. The effectiveness of mass vaccination on Marek’s disease virus (MDV) outbreaks and detection within a broiler barn: a modeling study. Epidemics 5:208–217.
257. DeLuca NA, Schaffer PA. 1988. Physical and functional domains of the herpes simplex virus transcriptional regulatory protein ICP4. J Virol 62:732–743.
258. Wagner LM, Lester JT, Sivrich FL, DeLuca NA. 2012. The N Terminus and C Terminus of Herpes Simplex Virus 1 ICP4 Cooperate To Activate Viral Gene Expression. J Virol 86:6862–6874.
259. Arvin AM, Gershon AA. 1996. Live Attenuated Varicella Vaccine. Annu Rev Microbiol 50:59–100.
260. Wald A, Corey L. 2007. Persistence in the population: epidemiology, transmission, p. . In Arvin, A, Campadelli-Fiume, G, Mocarski, E, Moore, PS, Roizman, B, Whitley, R, Yamanishi, K (eds.), Human Herpesviruses: Biology, Therapy, and Immunoprophylaxis. Cambridge University Press, Cambridge.
261. Corey L, Spear PG. 1986. Infections with Herpes Simplex Viruses. N Engl J Med 314:749–757.
262. Whitley RJ. 2006. Herpes simplex encephalitis: Adolescents and adults. Antiviral Res 71:141–148.
263. Park PJ, Chang M, Garg N, Zhu J, Chang J-H, Shukla D. 2015. Corneal lymphangiogenesis in herpetic stromal keratitis. Surv Ophthalmol 60:60–71.
264. Kimberlin DW. 2007. Herpes Simplex Virus Infections of the Newborn. Semin Perinatol 31:19–25.
265. Corey L, Wald A. 2009. Maternal and Neonatal HSV Infections. N Engl J Med 361:1376–1385.
266. Corey L. 2007. Synergistic copathogens--HIV-1 and HSV-2. N Engl J Med 356:854–856.
267. Corey L, Wald A, Celum CL, Quinn TC. 2004. The effects of herpes simplex virus-2 on HIV-1 acquisition and transmission: a review of two overlapping epidemics. J Acquir Immune Defic Syndr 1999 35:435–445.
268. Glynn JR, Biraro S, Weiss HA. 2009. Herpes simplex virus type 2: a key role in HIV incidence. AIDS Lond Engl 23:1595–1598.

269. Hayward GS, Frenkel N, Roizman B. 1975. Anatomy of herpes simplex virus DNA: strain differences and heterogeneity in the locations of restriction endonuclease cleavage sites. Proc Natl Acad Sci 72:1768–1772.
270. Wiel H van der, Weiland HT, Doornum GJJ van, Straaten PJC van der, Berger HM. 1985. Disseminated neonatal herpes simplex virus infection acquired from the father. Eur J Pediatr 144:56–57.
271. McGeoch DJ, Dalrymple MA, Davison AJ, Dolan A, Frame MC, McNab D, Perry LJ, Scott JE, Taylor P. 1988. The complete DNA sequence of the long unique region in the genome of herpes simplex virus type 1. J Gen Virol 69:1531–74.
272. McGeoch DJ, Dolan A, Donald S, Rixon FJ. 1985. Sequence determination and genetic content of the short unique region in the genome of herpes simplex virus type 1. J Mol Biol 181:1–13.
273. Wang GP, Sherrill-Mix SA, Chang K-M, Quince C, Bushman FD. 2010. Hepatitis C Virus Transmission Bottlenecks Analyzed by Deep Sequencing. J Virol 84:6218–6228.
274. Li H, Stoddard MB, Wang S, Blair LM, Giorgi EE, Parrish EH, Learn GH, Hraber P, Goepfert PA, Saag MS, Denny TN, Haynes BF, Hahn BH, Ribeiro RM, Perelson AS, Korber BT, Bhattacharya T, Shaw GM. 2012. Elucidation of Hepatitis C Virus Transmission and Early Diversification by Single Genome Sequencing. PLOS Pathog 8:e1002880.
275. Masharsky AE, Dukhovlinova EN, Verevochkin SV, Toussova OV, Skochilov RV, Anderson JA, Hoffman I, Cohen MS, Swanstrom R, Kozlov AP. 2010. A Substantial Transmission Bottleneck among Newly and Recently HIV-1-Infected Injection Drug Users in St Petersburg, Russia. J Infect Dis 201:1697–1702.
276. Renzette N, Gibson L, Jensen JD, Kowalik TF. 2014. Human cytomegalovirus intrahost evolution—a new avenue for understanding and controlling herpesvirus infections. Curr Opin Virol 8:109–115.
277. Andino R, Domingo E. 2015. Viral quasispecies. Virology 479–480:46–51.
278. Lauring AS, Andino R. 2010. Quasispecies Theory and the Behavior of RNA Viruses. PLoS Pathog 6:e1001005.
279. Sawtell NM. 2005. Detection and quantification of the rare latently infected cell undergoing herpes simplex virus transcriptional activation in the nervous system in vivo. DNA Viruses Methods Protoc 292:57–72.
280. Sawtell NM, Thompson RL. 1992. Rapid in vivo reactivation of herpes simplex virus in latently infected murine ganglionic neurons after transient hyperthermia. J Virol 66:2150–2156.
281. Sawtell NM, Thompson RL, Haas RL. 2006. Herpes Simplex Virus DNA Synthesis Is Not a Decisive Regulatory Event in the Initiation of Lytic Viral Protein Expression in Neurons In Vivo during Primary Infection or Reactivation from Latency. J Virol 80:38–50.
282. Sawtell NM, Thompson RL. 1992. Herpes simplex virus type 1 latency-associated transcription unit promotes anatomical site-dependent establishment and reactivation from latency. J Virol 66:2157–2169.
283. Sawtell NM, Thompson RL. 2004. Comparison of Herpes Simplex Virus Reactivation in Ganglia In Vivo and in Explants Demonstrates Quantitative and Qualitative Differences. J Virol 78:7784–7794.
284. Sawtell NM. 2003. Quantitative Analysis of Herpes Simplex Virus Reactivation In Vivo Demonstrates that Reactivation in the Nervous System Is Not Inhibited at Early Times Postinoculation. J Virol 77:4127–4138.
285. Thompson RL, Shieh MT, Sawtell NM. 2003. Analysis of Herpes Simplex Virus ICP0 Promoter Function in Sensory Neurons during Acute Infection, Establishment of Latency, and Reactivation In Vivo. J Virol 77:12319–12330.

286. Thompson RL, Sawtell NM. 2011. The herpes simplex virus type 1 latency associated transcript locus is required for the maintenance of reactivation competent latent infections. J Neurovirol 17:552–558.
287. Thompson RL, Preston CM, Sawtell NM. 2009. De novo synthesis of VP16 coordinates the exit from HSV latency in vivo. PLoS Pathog 5:1–14.
288. Szpara M. 2014. Isolation of herpes simplex virus nucleocapsid DNA, p. 31–41. In Diefenbach, RJ, Fraefel, C (eds.), Herpes Simplex Virus. Springer New York, New York, NY.
289. Thompson RL, Wagner EK, Stevens JG. 1983. Physical location of a herpes simplex virus type-1 gene function(s) specifically associated with a 10 million-fold increase in HSV neurovirulence. Virology 131:180–192.
290. Javier RT, Izumi KM, Stevens JG. 1988. Localization of a herpes simplex virus neurovirulence gene dissociated from high-titer virus replication in the brain. J Virol 62:1381–1387.
291. Huson DH. 1998. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14:68–73.
292. Ejercito PM, Kieff ED, Roizman B. 1968. Characterization of herpes simplex virus strains differing in their effects on social behaviour of infected cells. J Gen Virol 2:357–364.
293. Macdonald SJ, Mostafa HH, Morrison LA, Davido DJ. 2012. Genome sequence of herpes simplex virus 1 strain KOS. J Virol 86:6371–6372.
294. Smith KO. 1964. Relationship Between the Envelope and the Infectivity of Herpes Simplex Virus. Exp Biol Med 115:814–816.
295. Macdonald SJ, Mostafa HH, Morrison LA, Davido DJ. 2012. Genome sequence of herpes simplex virus 1 strain McKrae. J Virol 86:9540–9541.
296. Williams LE, Nesburn AB, Kaufman HE. 1965. Experimental induction of disciform keratitis. Arch Ophthalmol 73:112–114.
297. Ushijima Y, Luo C, Goshima F, Yamauchi Y, Kimura H, Nishiyama Y. 2007. Determination and analysis of the DNA sequence of highly attenuated herpes simplex virus type 1 mutant HF10, a potential oncolytic virus. Microbes Infect 9:142–149.
298. Bowen CD, Renner DW, Shreve JT, Tafuri Y, Payne KM, Dix RD, Kinchington PR, Gatherer D, Szpara ML. 2016. Viral forensic genomics reveals the relatedness of classic herpes simplex virus strains KOS, KOS63, and KOS79. Virology 492:179–186.
299. Rastrojo A, López-Muñoz AD, Alcamí A. 2017. Genome Sequence of Herpes Simplex Virus 1 Strain SC16. Genome Announc 5:e01392-16.
300. Szpara ML, Tafuri YR, Parsons L, Shreve JT, Engel EA, Enquist LW. 2014. Genome sequence of the anterograde-spread-defective herpes simplex virus 1 strain MacIntyre. Genome Announc 2.
301. Pfaff F, Groth M, Sauerbrei A, Zell R. 2016. Genotyping of herpes simplex virus type 1 (HSV-1) by whole genome sequencing. J Gen Virol.
302. Sawtell NM, Thompson RL. 2016. De Novo Herpes Simplex Virus VP16 Expression Gates a Dynamic Programmatic Transition and Sets the Latent/Lytic Balance during Acute Infection in Trigeminal Ganglia. PLOS Pathog 12:e1005877.
303. Stevens JG, Cook ML. 1971. Latent herpes simplex virus in spinal ganglia of mice. Science 173:843–845.
304. Sawtell NM. 1998. The probability of in vivo reactivation of herpes simplex virus type 1 increases with the number of latently infected neurons in the ganglia. J Virol 72:6888–6892.
305. Wagner LM, DeLuca NA. 2013. Temporal Association of Herpes Simplex Virus ICP4 with Cellular Complexes Functioning at Multiple Steps in PolII Transcription. PLOS ONE 8:e78242.

306. Renzette N, Pfeifer SP, Matuszewski S, Kowalik TF, Jensen JD. 2017. On the Analysis of Intrahost and Interhost Viral Populations: Human Cytomegalovirus as a Case Study of Pitfalls and Expectations. J Virol 91:e01976-16.
307. Maertzdorf J, Remeijer L, Van Der Lelij A, Buitenwerf J, Niesters HG, Osterhaus AD, Verjans GM. 1999. Amplification of reiterated sequences of herpes simplex virus type 1 (HSV-1) genome to discriminate between clinical HSV-1 isolates. J Clin Microbiol 37:3518–3523.
308. Deback C, Boutolleau D, Depienne C, Luyt CE, Bonnafous P, Gautheret-Dejean A, Garrigue I, Agut H. 2009. Utilization of microsatellite polymorphism for differentiating herpes simplex virus type 1 strains. J Clin Microbiol 47:533–40.
309. Burrel S, Deback C, Agut H, Boutolleau D. 2010. Genotypic Characterization of UL23 Thymidine Kinase and UL30 DNA Polymerase of Clinical Isolates of Herpes Simplex Virus: Natural Polymorphism and Mutations Associated with Resistance to Antivirals. Antimicrob Agents Chemother 54:4833–4842.
310. Renzette N, Pokalyuk C, Gibson L, Bhattacharjee B, Schleiss MR, Hamprecht K, Yamamoto AY, Mussi-Pinhata MM, Britt WJ, Jensen JD, Kowalik TF. 2015. Limits and patterns of cytomegalovirus genomic diversity in humans. Proc Natl Acad Sci U S A.
311. Muller W, Jones C, Koelle D. 2010. Immunobiology of Herpes Simplex Virus and Cytomegalovirus Infections of the Fetus and Newborn. Curr Immunol Rev 6:38–55.
312. Orr MT, Edelmann KH, Vieira J, Corey L, Raulet DH, Wilson CB. 2005. Inhibition of MHC Class I Is a Virulence Factor in Herpes Simplex Virus Infection of Mice. PLoS Pathog 1:62–71.
313. Lekstrom-Himes JA, Hohman P, Warren T, Wald A, Nam J, Simonis T, Corey L, Straus SE. 1999. Association of Major Histocompatibility Complex Determinants with the Development of Symptomatic and Asymptomatic Genital Herpes Simplex Virus Type 2 Infections. J Infect Dis 179:1077–1085.
314. Umene K, Yoshida M. 1989. Reiterated sequences of herpes simplex virus type 1 (HSV-1) genome can serve as physical markers for the differentiation of HSV-1 strains. Arch Virol 106:281–299.
315. Heller M, Dix RD, Baringer JR, Schachter J, Conte JE Jr. 1982. Herpetic proctitis and meningitis: recovery of two strains of herpes simplex virus type 1 from cerebrospinal fluid. J Infect Dis 146:584–588.
316. Sakaoka H, Saito H, Sekine K, Aomori T, Grillner L, Wadell G, Fujinaga K. 1987. Genomic comparison of herpes simplex virus type 1 isolates from Japan, Sweden and Kenya. J Gen Virol 68:749–764.
317. Sakaoka H, Kurita K, Iida Y, Takada S, Umene K, Kim YT, Ren CS, Nahmias AJ. 1994. Quantitative analysis of genomic polymorphism of herpes simplex virus type 1 strains from six countries: studies of molecular evolution and molecular epidemiology of the virus. J Gen Virol 75:513–27.
318. Norberg P, Bergstrom T, Rekabdar E, Lindh M. 2004. Phylogenetic Analysis of Clinical Herpes Simplex Virus Type 1 Isolates Identified Three Genetic Groups and Recombinant Viruses. J Virol 78:10755–10764.
319. Tronstein E, Johnston C, Huang M-L, Selke S, Magaret A, Warren T, Corey L, Wald A. 2011. Genital shedding of herpes simplex virus among symptomatic and asymptomatic persons with HSV-2 infection. JAMA 305:1441–1449.
320. Johnston C, Zhu J, Jing L, Laing KJ, McClurkan CM, Klock A, Diem K, Jin L, Stanaway J, Tronstein E, Kwok WW, Huang M-L, Selke S, Fong Y, Magaret A, Koelle DM, Wald A, Corey L. 2014. Virologic and Immunologic Evidence of Multifocal Genital Herpes Simplex Virus 2 Infection. J Virol 88:4921–4931.

321. Langenberg AG, Corey L, Ashley RL, Leong WP, Straus SE. 1999. A prospective study of new infections with herpes simplex virus type 1 and type 2. Chiron HSV Vaccine Study Group. N Engl J Med 341:1432–1438.
322. Corey L, Wald A, Patel R, Sacks SL, Tyring SK, Warren T, Douglas Jr JM, Paavonen J, Morrow RA, Beutner KR, et al. 2004. Once-daily valacyclovir to reduce the risk of transmission of genital herpes. N Engl J Med 350:11–20.
323. Osterrieder N, Kamil JP, Schumacher D, Tischer BK, Trapp S. 2006. Marek’s disease virus: from miasma to model. Nat Rev Microbiol 4:283–294.
324. Witter RL. 1989. Very virulent Marek’s disease viruses: importance and control. World’s Poult Sci J 45:60–65.
325. Benton WJ, Cover MS. 1957. The Increased Incidence of Visceral Lymphomatosis in Broiler and Replacement Birds. Avian Dis 1:320–327.
326. Gandon S, Mackinnon MJ, Nee S, Read AF. 2001. Imperfect vaccines and the evolution of pathogen virulence. Nature 414:751.
327. Witter RL. 1998. The changing landscape of Marek’s disease. Avian Pathol U K.
328. Rozins C, Day T. 2016. The industrialization of farming may be driving virulence evolution. Evol Appl 10:189–198.
329. Nair V, Kung H-J. 2004. 4 - Marek’s disease virus oncogenicity: Molecular mechanisms, p. 32–48. In Davison, F, Nair, V (eds.), Marek’s Disease. Academic Press, Oxford.
330. Kamil JP, Tischer BK, Trapp S, Nair VK, Osterrieder N, Kung H-J. 2005. vLIP, a Viral Lipase Homologue, Is a Virulence Factor of Marek’s Disease Virus. J Virol 79:6984–6996.
331. Xie Q, Anderson AS, Morgan RW. 1996. Marek’s disease virus (MDV) ICP4, pp38, and meq genes are involved in the maintenance of transformation of MDCC-MSB1 MDV-transformed lymphoblastoid cells. J Virol 70:1125–1131.
332. Hildebrandt E, Dunn JR, Cheng HH. 2015. The Mut UL5-I682R Marek’s Disease Virus with a Single Nucleotide Mutation Within the Helicase-Primase Subunit Gene not only Reduces Virulence but also Provides Partial Vaccinal Protection Against Marek’s Disease. Avian Dis 59:94–97.
333. Mbong EF, Woodley L, Dunkerley E, Schrimpf JE, Morrison LA, Duffy C. 2012. Deletion of the Herpes Simplex Virus 1 UL49 Gene Results in mRNA and Protein Translation Defects That Are Complemented by Secondary Mutations in UL41. J Virol 86:12351–12361.
334. Santin ER, Shamblin CE, Prigge JT, Arumugaswami V, Dienglewicz RL, Parcells MS. 2006. Examination of the effect of a naturally occurring mutation in glycoprotein L on Marek’s disease virus pathogenesis. Avian Dis 50:96–103.
335. Murata S, Okada T, Kano R, Hayashi Y, Hashiguchi T, Onuma M, Konnai S, Ohashi K. 2011. Analysis of transcriptional activities of the Meq proteins present in highly virulent Marek’s disease virus strains, RB1B and Md5. Virus Genes 43:66–71.
336. Shamblin CE, Greene N, Arumugaswami V, Dienglewicz RL, Parcells MS. 2004. Comparative analysis of Marek’s disease virus (MDV) glycoprotein-, lytic antigen pp38- and transformation antigen Meq-encoding genes: association of meq mutations with MDVs of high virulence. Vet Microbiol 102:147–167.
337. Hughes AL, Rivailler P. 2007. Phylogeny and Recombination History of Gallid Herpesvirus 2 (Marek’s Disease Virus) Genomes. Virus Res 130:28–33.
338. Wilkinson D, Weller S. 2003. The Role of DNA Recombination in Herpes Simplex Virus DNA Replication. IUBMB Life 55:451–458.
339. Loncoman CA, Vaz PK, Coppo MJ, Hartley CA, Morera FJ, Browning GF, Devlin JM. 2017. Natural recombination in alphaherpesviruses: Insights into viral evolution through full genome sequencing and sequence analysis. Infect Genet Evol J Mol Epidemiol Evol Genet Infect Dis 49:174–185.
340. Martin DP, Murrell B, Golden M, Khoosal A, Muhire B. 2015. RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evol 1.

341. Trimpert J, Groenke N, Jenckel M, He S, Kunec D, Szpara ML, Spatz SJ, Osterrieder N, McMahon DP. 2017. A phylogenomic analysis of Marek’s disease virus reveals independent paths to virulence in Eurasia and North America. Evol Appl 10:1091–1101.
342. Boni MF, Posada D, Feldman MW. 2007. An Exact Nonparametric Method for Inferring Mosaic Structure in Sequence Triplets. Genetics 176:1035–1047.
343. Lam HM, Ratmann O, Boni MF. Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm. Mol Biol Evol.
344. Spatz SJ, Volkening JD, Gimeno IM, Heidari M, Witter RL. 2012. Dynamic equilibrium of Marek’s disease genomes during in vitro serial passage. Virus Genes 45:526–536.
345. Parsons LR, Tafuri YR, Shreve JT, Bowen CD, Shipley MM, Enquist LW, Szpara ML. 2015. Rapid genome assembly and comparison decode intrastrain variation in human alphaherpesviruses. mBio 6:e02213-14.
346. Katoh K, Misawa K, Kuma K, Miyata T. 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066.
347. Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.
348. Lam HM, Ratmann O, Boni MF. 2018. Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm. Mol Biol Evol 35:247–251.
349. Fierer DS, Challberg MD. 1992. Purification and characterization of UL9, the herpes simplex virus type 1 origin-binding protein. J Virol 66:3986–3995.
350. Reddy SM, Sui D, Wu P, Lee L. 1999. Identification and structural analysis of a MDV gene encoding a protein kinase. Acta Virol 43:174–180.
351. Levy AM, Gilad O, Xia L, Izumiya Y, Choi J, Tsalenko A, Yakhini Z, Witter R, Lee L, Cardona CJ, Kung H-J. 2005. Marek’s disease virus Meq transforms chicken cells via the v-Jun transcriptional cascade: A converging transforming pathway for avian oncoviruses. Proc Natl Acad Sci 102:14831–14836.
352. Smiley JR, Lavery C, Howes M. 1992. The herpes simplex virus type 1 (HSV-1) a sequence serves as a cleavage/packaging signal but does not drive recombinational genome isomerization when it is inserted into the HSV-2 genome. J Virol 66:7505–7510.
353. Boni MF, Zhou Y, Taubenberger JK, Holmes EC. 2008. Homologous Recombination Is Very Rare or Absent in Human Influenza A Virus. J Virol 82:4807–4811.
354. Eden J-S, Tanaka MM, Boni MF, Rawlinson WD, White PA. 2013. Recombination within the Pandemic Norovirus GII.4 Lineage. J Virol 87:6270–6282.
355. Muylaert I, Tang K-W, Elias P. 2011. Replication and Recombination of Herpes Simplex Virus DNA. J Biol Chem 286:15619–15624.
356. Gershburg S, Geltz J, Peterson KE, Halford WP, Gershburg E. 2015. The UL13 and US3 Protein Kinases of Herpes Simplex Virus 1 Cooperate to Promote the Assembly and Release of Mature, Infectious Virions. PLoS ONE 10:e0131420.
357. Hicks JA, Liu H-C. 2013. Current State of Marek’s Disease Virus MicroRNA Research. Avian Dis 57:332–339.
358. Mocarski ES, Roizman B. 1982. Structure and role of the herpes simplex virus DNA termini in inversion, circularization and generation of virion DNA. Cell 31:89–97.
359. Bruckner RC, Dutch RE, Zemelman BV, Mocarski ES, Lehman IR. 1992. Recombination in vitro between herpes simplex virus type 1 a sequences. Proc Natl Acad Sci 89:10950–10954.
360. Murata S, Okada T, Kano R, Hayashi Y, Hashiguchi T, Onuma M, Konnai S, Ohashi K. 2011. Analysis of transcriptional activities of the Meq proteins present in highly virulent Marek’s disease virus strains, RB1B and Md5. Virus Genes 43:66–71.
361. Drummond AJ, Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214.

362. Kennedy DA, Cairns C, Jones MJ, Bell AS, Salathé RM, Baigent SJ, Nair VK, Dunn PA, Read AF. 2017. Industry-Wide Surveillance of Marek’s Disease Virus on Commercial Poultry Farms. Avian Dis 61:153–164.
363. Kennedy DA, Dunn JR, Dunn PA, Read AF. 2015. An observational study of the temporal and spatial patterns of Marek’s-disease-associated leukosis condemnation of young chickens in the United States of America. Prev Vet Med 120:328–335.
364. Russell TA, Stefanovic T, Tscharke DC. 2015. Engineering herpes simplex viruses by infection–transfection methods including recombination site targeting by CRISPR/Cas9 nucleases. J Virol Methods 213:18–25.
365. Sawtell NM, Thompson RL. 2014. Herpes simplex virus mutant generation and dual-detection methods for gaining insight into latent/lytic cycles in vivo. Methods Mol Biol Clifton NJ 1144:129–147.
366. Gierasch WW, Zimmerman DL, Ward SL, VanHeyningen TK, Romine JD, Leib DA. 2006. Construction and characterization of bacterial artificial chromosomes containing HSV-1 strains 17 and KOS. J Virol Methods 135:197–206.
367. Kennedy EM, Cullen BR. 2015. Bacterial CRISPR/Cas DNA endonucleases: A revolutionary technology that could dramatically impact viral research and treatment. Virology 479–480:213–220.
368. Doudna JA, Charpentier E. 2014. The new frontier of genome engineering with CRISPR-Cas9. Science 346:1258096–1258096.
369. Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA, Zhang F. 2013. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339:819–823.
370. O’Regan KJ, Bucks MA, Murphy MA, Wills JW, Courtney RJ. 2007. A conserved region of the herpes simplex virus type 1 tegument protein VP22 facilitates interaction with the cytoplasmic tail of glycoprotein E (gE). Virology 358:192–200.
371. Maringer K, Stylianou J, Elliott G. 2012. A Network of Protein Interactions around the Herpes Simplex Virus Tegument Protein VP22. J Virol 86:12971–12982.
372. Thompson RL, Williams RW, Kotb M, Sawtell NM. 2014. A Forward Phenotypically Driven Unbiased Genetic Analysis of Host Genes That Moderate Herpes Simplex Virus Virulence and Stromal Keratitis in Mice. PLoS ONE 9:1–10.
373. Wald A. 2006. Genital HSV-1 infections. Sex Transm Infect 82:189–190.
374. Lafferty WE, Downey L, Celum C, Wald A. 2000. Herpes simplex virus type 1 as a cause of genital herpes: impact on surveillance and prevention. J Infect Dis 181:1454–1457.
375. Whitley RJ, Corey L, Arvin A, Lakeman FD, Sumaya CV, Wright PF, Dunkle LM, Steele RW, Soong SJ, Nahmias AJ. 1988. Changing presentation of herpes simplex virus infection in neonates. J Infect Dis 158:109–16.
376. Caviness AC. 2013. Neonatal Herpes Simplex Virus Infection. Clin Pediatr Emerg Med 14:n/a.
377. Kimberlin DW. 2004. Neonatal Herpes Simplex Infection. Clin Microbiol Rev 17:1–13.
378. Nellissery JK, Szczepaniak R, Lamberti C, Weller SK. 2007. A Putative Leucine Zipper within the Herpes Simplex Virus Type 1 UL6 Protein Is Required for Portal Ring Formation. J Virol 81:8868–8877.
379. D’Souza SE, Ginsberg MH, Plow EF. 1991. Arginyl-glycyl-aspartic acid (RGD): a cell adhesion motif. Trends Biochem Sci 16:246–250.
380. Ruoslahti E. 1996. RGD and other recognition sequences for integrins. Annu Rev Cell Dev Biol 12:697–715.
381. The UniProt Consortium. 2017. UniProt: the universal protein knowledgebase. Nucleic Acids Res 45:D158–D169.

382. Kimberlin DW, Lin CY, Jacobs RF, Powell DA, Frenkel LM, Gruber WC, Rathore M, Bradley JS, Diaz PS, Kumar M, Arvin AM, Gutierrez K, Shelton M, Weiner LB, Sleasman JW, de Sierra TM, Soong SJ, Kiell J, Lakeman FD, Whitley RJ, National Institute of Allergy and Infectious Diseases Collaborative Antiviral Study Group. 2001. Natural history of neonatal herpes simplex virus infections in the acyclovir era. Pediatrics 108:223–229.
383. Kimberlin DW, Lin C-Y, Jacobs RF, Powell DA, Corey L, Gruber WC, Rathore M, Bradley JS, Diaz PS, Kumar M, et al. 2001. Safety and efficacy of high-dose intravenous acyclovir in the management of neonatal herpes simplex virus infections. Pediatrics 108:230–238.
384. Kimberlin DW, Whitley RJ, Wan W, Powell DA, Storch G, Ahmed A, Palmer A, Sánchez PJ, Jacobs RF, Bradley JS. 2011. Oral acyclovir suppression and neurodevelopment after neonatal herpes. N Engl J Med 365:1284–1292.

Vita

Utsav Pandey
Webpage: http://szparalab.psu.edu/

Education

The Pennsylvania State University, University Park, PA
Biochemistry, Microbiology and Molecular Biology Program, May 2018
Advisor: Moriah Szpara, Ph.D.

SUNY at Plattsburgh, Plattsburgh, NY
Bachelor of Science, Biology and Medical Technology, Magna cum laude (Honors), August 2010
Minor: Chemistry

Danbury Hospital, Danbury, CT
Diploma in Medical Technology, MLS (ASCP), May 2010

Publications

1. Pandey U, Bell AS, Renner DW, Kennedy DA, Shreve JT, Cairns CL, Jones MJ, Dunn PA, Read AF, Szpara ML. 2016. DNA from dust: Comparative genomics of large DNA viruses in field surveillance samples. mSphere. DOI: 10.1128/mSphere.00132-16

2. Pandey U, Thompson R, Renner D, Szpara M, Sawtell N. 2017. Father-to-son transmission of HSV results in near-perfect preservation of viral genome identity and in vitro phenotypes. Scientific Reports. DOI: 10.1038/s41598-017-13936-6

3. Akhtar L, Bowen C, Renner D, Pandey U, Prichard M, Whitley R, Weitzman M, Szpara M. 2018. Neonatal HSV-2 population displays wide genotypic and phenotypic diversity between hosts. bioRxiv. DOI: 10.1101/262055

4. Shipley M, Pandey U, Renner D, Grose C, Szpara M. Genomic snapshot of perinatal transmission of viremic HSV-1: a case report. Status – In preparation

5. Pandey U, Bell AS, Jones MJ, Renner D, Read AF, Szpara ML, Boni MF. Effect of recombination on evolution of virulence in field isolates of MDV-1. Status – In preparation

6. Bell AS, Jones MJ, Christopher CL, Kennedy DA, Pandey U, Szpara ML, Read AF. MDV-1 diversity and dynamics of infection in the field, a study based in Pennsylvania, USA. Status – In preparation