Genomic analysis of several new Anguillid herpesvirus-1 isolates reveals the first insights into the evolution of this virus and suggests that known Cyprinivirus species have much lower core gene diversity compared to Herpesviridae species

Donohoe O. 1,2, Delrez N. 1, Zhang H. 1, Davison A. 3, and Vanderplasschen A. 1

1. Immunology-Vaccinology (B43b), FARAH, Faculty of Veterinary Medicine, Uliège
2. Bioscience Research Institute, Athlone Institute of Technology, Athlone, Co. Westmeath, Ireland
3. University of Glasgow Centre for Virus Research, Glasgow, United Kingdom

Summary: Anguillid herpesvirus-1 (AngHV-1) belongs to the genus Cyprinivirus within the family Alloherpesviridae and is the causative agent of haemorrhagic disease in European eels (Anguilla anguilla) and Japanese eels (Anguilla japonica). To date, the genome sequences of only two AngHV-1 isolates have been published. In this study, we sequenced seven additional AngHV-1 isolates of different geographical origins, revealing low genomic variability and two main genetic lineages among isolates. Further analysis indicated five putative recombination events between these isolates, one of which suggested the existence of a third, as yet unidentified lineage. The additional genomic data were also used to compare core genes with those of other Cyprinivirus and Herpesviridae species. We observed significantly lower core gene diversity among Cyprinivirus species, which we speculate might be due to more recent divergence times or lower evolutionary rates in these species. To generate tentative estimates of both, we explored whether models of Cyprinivirus evolution that calibrate (or date) internal nodes under the assumption of co-speciation with hosts could explain the observed patterns of diversity while also being compatible with existing hypotheses regarding when these Cyprinivirus species emerged. Ultimately, we found the estimates to be incompatible with such hypotheses, and we also found a lack of support for the co-speciation assumptions. However, purely relative estimates derived without such assumptions indicate that AngHV-1 diverged earlier and has a lower evolutionary rate than other Cyprinivirus species, which also corresponds to the differences we observed in selective pressure. Notably, these observations are congruent with expected differences in epidemiology between AngHV-1 and other Cyprinivirus species, which may in turn reflect the fact that AngHV-1 is an anguillid pathogen and thus unique among Cyprinivirus species. Collectively, these are the first insights into the evolution of AngHV-1 relative to that of other Cyprinivirus and Herpesviridae species.

Isolates Sequenced: Relationship & Origin

Overall, very low genomic diversity among all fully sequenced AngHV-1 isolates.

[Fig. 1 tree: nine isolates. Newly sequenced: The Netherlands (2), Denmark (4), United Kingdom (1). Previously published: The Netherlands (1), Taiwan (1). Branch-length labels in the tree range from 0.00006 to 0.0004.]

Fig. 1 Phylogenetic analysis of AngHV-1 full-length genome sequences. UPGMA tree (1000 rounds of bootstrapping) generated using MEGA from full-length genome sequences, rooted using the branch leading to the DK2 and DK3 clade. Bootstrapping values for each node are indicated, with branches also coloured by bootstrapping value according to the colour scale on the top left.

Recombination events between isolates

Genome of KX isolate indicates recombination event involving an additional unknown isolate.

[Fig. 2 panels: (A) Pairwise identity between recombinant and parental isolates. x-axis: position in alignment (1 to 250374); y-axis: pairwise identity. Traces: FJ - KX (major parent - recombinant; FJ is the proposed major parent of KX), FJ - DK2 (major parent - minor parent), DK2 - KX (minor parent - recombinant; DK2 is the best match in the data for the proposed minor parent of KX, but still distant). The 95% and 99% breakpoint confidence intervals and the tract of sequence with a recombinant origin are marked. Recombination event P = 8.273 × 10^-6. (B) Relationship between all isolates based on the full genome excluding the proposed recombined region. (C) Relationship between all isolates based on the proposed recombinant region only. Trees in (B) and (C) mark the recombinant, the major parent, and the sequence used to infer the unknown minor parent (closest match within data).]

Fig. 2 Summary of one of five recombination events between AngHV-1 strains. (A) Identification of recombination site. Relationships between isolates in the non-recombinant (B) and recombinant (C) regions are also indicated.

Unlike the relationship between KX and the other isolates based on the whole genome (Fig. 1) and the non-recombinant part of the genome (Fig. 2B), the recombinant region is much more distantly related to the corresponding regions from other isolates (Fig. 2C), indicating possible acquisition from an unknown, genetically distant AngHV-1 isolate.

Very low core gene diversity within all Cyprinivirus species clades

We compared sequence diversity in core herpesvirus genes within the AngHV-1 species clade to that of other Cyprinivirus and mammalian Herpesviridae species clades.

• All fully sequenced isolates for Cyprinivirus species (those with >1 genome sequence available) were included
• 10 fully sequenced isolates from each of the 9 most frequently sequenced mammalian herpesvirus species were also included
• In each genome, sequences of 4 core herpesvirus genes (conserved across most herpesvirus species) were included
• 492 sequences were used to construct 48 species-level gene trees; diversity (π) and mean branch length were calculated for each

Fig. 3 Comparison of diversity between species from the Cyprinivirus genus and Herpesviridae family based on species-level nucleotide alignments and gene trees generated using four core herpesvirus genes. (A) Diversity (π) from each species-level nucleotide sequence alignment and (B) mean branch length for each species-level gene tree based on four core herpesvirus genes.

Cyprinivirus genus displays significantly lower core gene diversity than mammalian Herpesviridae genera.

No evidence of recent host jumps leading to low diversity

[Fig. 4 panels: Cyprinivirus species clade phylogenetic relationships based on four core genes; Cyprinivirus host species phylogenetic relationships based on 18S rRNA.]

Fig. 4 Phylogenetic trees of three low-diversity Cyprinivirus species based on nucleotide alignment of four core herpesvirus genes and comparison to corresponding host phylogenetic trees.

Indicates a long-term co-evolution with respective hosts.

Why such low diversity? Did contemporary Cyprinivirus species clades diverge more recently than other Herpesviridae species clades? Or did they evolve slower? Or both?

Estimation of species clade divergence times based on the assumption of co-speciation with respective hosts

• Generated estimations of host divergence times and used this information to calibrate (or date) specific events in the Cyprinivirus phylogenetic history
• This information was then used to infer the age of other nodes elsewhere in the Cyprinivirus phylogenetic tree
• Used two methods: RelTime (part of MEGA, maximum likelihood approach) and BEAST (Bayesian approach, 300 million Monte Carlo Markov chain generations)
• Both were broadly in agreement with each other regarding respective estimation of divergence times and corresponding evolutionary rates

Fig. 5 Estimation of divergence times and substitution rates for Cyprinivirus species. The calibration information used with each method and the nodes it was applied to are also indicated (mya. = millions of years ago). The estimated divergence times of each species clade (mean and median over both methods) are summarized on the right (ya. = years ago).

How does this compare to equivalent estimations for mammalian herpesvirus species? How can it be related back to differences in diversity in Fig. 3?

Comparison of species clade divergence time and evolutionary rate estimates between Cyprinivirus and Herpesviridae species

Cyprinivirus estimates from this study were compared to HHV1 (high diversity in Fig. 3) and HHV3 (min-low diversity in Fig. 3) estimates made elsewhere, to determine whether low diversity among Cyprinivirus species corresponds to more recent divergence times or lower evolutionary rates relative to other herpesvirus groups.

Cyprinivirus estimates from this study are "long-term" estimates, thus it is only valid to compare them to corresponding long-term estimates for other Herpesviridae species.

Fig. 6 (A) Scatter plot highlighting the apparent relationship between divergence time and evolutionary rate estimates for both Cyprinivirus (this study) and Herpesviridae species HHV1 and HHV3 (estimated elsewhere). Differences in Herpesviridae estimates are due to TDRP and the methodology used to derive these estimates (long-term vs short-term). (B) Violin plots comparing the estimated divergence times between each Cyprinivirus and Herpesviridae species. (C) Box plots comparing estimated substitution rates corresponding to the divergence times in (B).

No significant difference between Cyprinivirus and Herpesviridae long-term estimates for species clade divergence times.

Cyprinivirus long-term evolutionary rate estimates are significantly lower than Herpesviridae estimates.

Both Cyprinivirus and Herpesviridae estimates exhibit an apparent relationship between divergence time and evolutionary rate estimations. Extreme differences in Herpesviridae estimates from several other studies are due to the "Time Dependent Rate Phenomenon" (TDRP), related to the time scale over which estimates were made and the methodology used (short vs long-term). Cyprinivirus estimates from this study are categorized as long-term estimates.

Lower evolutionary rates among Cyprinivirus species relative to Herpesviridae species are compatible with the earlier observation of lower diversity in Fig. 3.

As with studies on other herpesvirus species, our estimates are entirely based on the assumption of co-speciation within the Cyprinivirus genus. However, we propose that such assumptions must be critically evaluated.

The assumption of co-speciation within the Cyprinivirus genus: A critical evaluation

• Current hypotheses relating to the origins of Cyprinivirus species point towards the continuous selection of increasingly pathogenic strains within aquaculture settings, implying very recent species clade divergence times. Our models of Cyprinivirus evolution that assume co-speciation result in much earlier species clade divergence times, and raise an apparent incompatibility with such hypotheses.
• The apparent evolutionary distance between Cyprinivirus species is low relative to the distance between their respective hosts, in particular the distance between Anguilliform and Cypriniform orders. If co-speciation occurred we would expect AngHV-1 to be much more distantly related to Cyprinivirus species infecting cyprinid hosts.
• CyHV-1 could not be included in the earlier analysis of diversity and estimation of species clade divergence (only a single fully sequenced lineage available). However, we note that, contrary to Fig. 4, our refined version of the Cyprinivirus tree that includes this single CyHV-1 lineage does reduce congruency with host tree topology (Fig. 7 A&B). This does not rule out co-speciation of at least some Cyprinivirus species infecting carp and goldfish, but it does create uncertainty as to which node coincides with the divergence of Cyprinus and Carassius genera, and thus which exact node the calibration should be applied to. This added uncertainty makes such long-term estimates of Cyprinivirus divergence times and evolutionary rates somewhat problematic, but it may be clarified through the future identification of additional Cyprinivirus species infecting these host species.

Fig. 7 (A) Maximum likelihood phylogenetic analysis of all Cyprinivirus species based on concatenated versions of the same four core gene sequences (1000 rounds of bootstrapping). (B) Bayesian phylogenetic analysis of all Cyprinivirus species based on the same four core gene sequences (partitioned sequences, linked trees and clocks, separate substitution models, 90 million Monte Carlo Markov chain generations). Both approaches to phylogenetic analysis are in agreement with regard to topology, and both show that the inclusion of CyHV-1 reduces congruency with the host 18S rRNA tree in Fig. 4.

Summary: Currently, there is some uncertainty around which node may coincide with a co-speciation event among cyprinid hosts. Elsewhere in the phylogeny, the large distance between Anguilliform and Cypriniform orders raises uncertainty surrounding earlier co-speciation events. Given some of the uncertainty surrounding co-speciation assumptions, future efforts towards exploration of short-term estimates may be useful. The generation of the large-scale time-structured data necessary may be achieved through collaboration with a network of reference labs with access to high quality sample archives. Given this, in parallel, we also chose to explore relative estimations of divergence time and evolutionary rates within the Cyprinivirus genus in the absence of co-speciation assumptions (see next section).

Relative comparison of the same evolutionary attributes within the Cyprinivirus genus in the absence of co-speciation assumptions indicates that AngHV-1 evolved differently to other Cyprinivirus species

Given the uncertainty surrounding estimates of divergence times and evolutionary rates (per unit of time) involving node calibration under assumption of co-speciation, we instead compared relative estimates within the Cyprinivirus genus in the absence of node calibration.

Our relative estimates indicate that the AngHV-1 species clade diverged earlier (Fig. 8A) and has a lower evolutionary rate (Fig. 8B) than other Cyprinivirus species.

AngHV-1 is unique among Cyprinivirus species as it infects anguillid hosts as opposed to cyprinid hosts. This may have profound impacts on the selective forces at play in its evolution relative to that of other Cyprinivirus species.

Fig. 8 (A) Maximum clade credibility tree of the Bayesian phylogenetic analysis in Fig. 7B. Relative node age estimates (in arbitrary units) are indicated at each node, indicating that the CyHV-2 clade diverged most recently, with AngHV-1 diverging earliest. (B) Visual summary of results from Tajima's Relative Rates Tests. This is the same tree as in Fig. 7A with branch lengths transformed to represent substitutions per site. Relative rates between each species were compared in a pairwise manner through four separate tests (i.e. one for each of the four core genes of interest in this study); in each case the proportion of tests supporting the proposed differences in rates is expressed as a percentage. The tests support the observation that CyHV-1 and CyHV-2 are evolving fastest, while AngHV-1 is evolving slower than the other species in the genus.

AngHV-1 also exhibits less evidence of positive selection within core genes

• Branch-site models were generated to establish if any individual codon changes between Cyprinivirus species in the four core genes of interest represent instances of positive selection of individual amino acids
• After comparing models permitting positively selected sites on branches of interest (against a background of purifying or neutral selection elsewhere) to null models (not permitting positive selection on the same branches), the DNA polymerase (Fig. 9A) and Helicase (Fig. 9B) genes were identified as being under positive selection within the Cyprinivirus genus, but no such evidence was identified in AngHV-1. Individual sites under positive selection on each branch were identified via the Bayes Empirical Bayes (BEB) approach.
• Positive selection included multiple instances of apparent convergent evolution towards serine residues in the CyHV-1 and CyHV-2 DNA polymerase gene. The large differences in codons used indicate a transition through other amino acid codons prior to reaching the current state.
• One of these serine residues under convergent evolution is indicated in Fig. 9C. Its approximate location on the structure of each Cyprinivirus species DNA polymerase is also conserved, indicating its importance.

Fig. 9 Sites under positive selection within the Cyprinivirus species DNA polymerase (A) and Helicase (B) gene sequences are indicated with black boxes, with site positions in the gene alignment indicated on top. (C) Serine residue under convergent evolution in CyHV-2 driven by positive selection (position 556 in the alignment in (A)), with approximate location marked in the partial CyHV-2 DNA polymerase structure, and compared to the location of the homologous residue in the DNA polymerase of other Cyprinivirus species. The approximate location is conserved.

AngHV-1 infects a very different host compared to other Cyprinivirus species, which may have profound impacts on the selective forces at play in the evolution of this virus relative to that of other known Cyprinivirus species

These observations are consistent with the expected differences in epidemiological circumstances between different Cyprinivirus species.

Unlike cyprinids, eels are extremely solitary. In addition, their life cycle and reproductive behaviour are less conducive to viral transmission, and there is greater temperature restriction on the occurrence of disease caused by AngHV-1.

Fewer AngHV-1 transmission events may lead to reduced evolutionary rates relative to other Cyprinivirus species, as observed in this study.

Conclusion: Collectively, these evolutionary comparisons between AngHV-1, other Cyprinivirus species and Herpesviridae have provided a useful insight into the differences between these related groups of viruses and open up reasonable scope for further avenues of investigation in the future. Furthermore, after thorough exploration, we have been able to conclude that the assumption of co-speciation in the estimation of Cyprinivirus species clade divergence times and evolutionary rates is currently problematic, given the degree of uncertainty surrounding this assumption. In addition, the resulting divergence times are not compatible with widely held hypotheses regarding when and how these species clades emerged. As an alternative, we provide support and justification for collaborative efforts towards exploring the utilization of archived samples in generating short-term estimates over more recent timescales, something which may represent a very important step towards understanding the evolutionary history of AngHV-1, other Cyprinivirus species and how this compares to the wider order.
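
Two of the quantities reported above, nucleotide diversity (π, Fig. 3) and Tajima's relative rate test (Fig. 8), can be illustrated on a toy alignment. The sketch below is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names `nucleotide_diversity` and `tajima_relative_rate` are hypothetical, sequences are assumed to be pre-aligned and of equal length, and any site containing a gap or ambiguity code is simply skipped.

```python
from itertools import combinations
from math import erfc, sqrt

VALID = set("ACGT")  # assumption: only unambiguous bases are compared


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (pi) in an alignment.

    For each pair of sequences, sites where either sequence has a gap or
    ambiguity code are ignored; the per-pair proportions are then averaged.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    total, pairs = 0.0, 0
    for a, b in combinations(seqs, 2):
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in VALID and y in VALID:
                compared += 1
                diffs += x != y
        if compared:
            total += diffs / compared
            pairs += 1
    return total / pairs if pairs else 0.0


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's relative rate test for two ingroup sequences and an outgroup.

    m1 counts sites where seq1 alone differs (seq2 matches the outgroup),
    m2 counts sites where seq2 alone differs. Under equal rates m1 and m2
    are expected to be equal; (m1 - m2)^2 / (m1 + m2) is compared to a
    chi-square distribution with one degree of freedom.
    Returns (m1, m2, chi2, p_value).
    """
    m1 = m2 = 0
    for a, b, o in zip(seq1.upper(), seq2.upper(), outgroup.upper()):
        if not (a in VALID and b in VALID and o in VALID):
            continue
        if a != b and b == o:
            m1 += 1  # change unique to lineage 1
        elif a != b and a == o:
            m2 += 1  # change unique to lineage 2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of chi-square with 1 d.f.: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return m1, m2, chi2, p_value


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA", "ACGTACGT"]  # made-up sequences
    print("pi =", nucleotide_diversity(toy))
    print(tajima_relative_rate("AAAATTTT", "AAAAAAAA", "AAAAAAAA"))
```

In the setting described above, one such relative rate test would be run per core gene for each pair of species, and the share of genes supporting a rate difference would be reported as a percentage. The choice of outgroup for each pair is left open here because it is not stated in the text.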