<<

The evolutionary dynamics of norovirus

John-Sebastian Eden

Bachelor of Medical Science (Hons)

A dissertation submitted for the fulfilment of the requirements for the degree Doctor of Philosophy

Submitted 2012

Originality statement

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

29th July 2012 Signed ………………………………………………………… Date ………………………………………


Copyright statement

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only).

I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.’

29th July 2012 Signed ………………………………………………………… Date ………………………………………

Authenticity statement

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’

29th July 2012 Signed ………………………………………………………… Date ………………………………………


Abstract

Norovirus (NoV) is the leading cause of both outbreak and sporadic community-acquired acute gastroenteritis. The overall incidence of NoV infection has grown dramatically since the emergence in the mid-1990s of epidemic NoV strains of the GII.4 lineage, which have been associated with five pandemics and account for 80% of NoV infections. This thesis aimed to describe the mechanisms of evolution that facilitate the emergence of epidemic GII.4 variants and to elucidate factors that contribute to their higher epidemiological fitness.

Two molecular epidemiological studies were performed to characterise the NoV strains linked to epidemics in New South Wales, Australia and those in circulation globally between 2007 and 2010. The pandemic GII.4 variant 2006b was identified as the cause of the 2007 and 2008 epidemics and the GII.4 variant New Orleans 2010 was the aetiological agent of the epidemics of 2009 and 2010. Each variant demonstrated antigenic drift in the capsid P2 domain that likely contributed to their epidemic potential. These studies also highlighted the role that recombination played in the emergence of New Orleans 2010.

A number of factors were identified that may have contributed to the higher epidemiological fitness of the pandemic NoV GII.4 variants. Firstly, by comparing the enzymatic properties of different NoV polymerases, including replication efficiency and fidelity, it was shown that GII.4 variants have higher replication and mutation rates. It was also shown that polymerases from more prevalent genotypes, such as GII.4 and GII.b, are phosphorylated by an important cellular kinase, Akt, at residue Thr33, a modification that decreases de novo polymerase activity.

Using next-generation sequencing technology, patterns of intra-host evolution were compared between acute and chronic NoV infections. Extensive heterogeneity and toggling at antigenic sites of the viral capsid were observed in the chronic patient, which suggests that immune-compromised individuals with chronic NoV infections could be a source for novel antigenic variants. In the same study, a transmission cluster was also examined and a strong genetic bottleneck was identified at the point of transmission.

Overall, this thesis suggests that a complex pattern of mutation, recombination and adaptation drives NoV evolution in response to herd immunity.


Acknowledgements

Peter White – You have been an insightful and generous supervisor. Thank you for all the coffees, the conferences and the fishing stories and most recently, I thank you for giving me a job. I don’t know where I’ll eventually end up but I hope we will continue to collaborate together.

Rowena Bull – As I come to the end of my PhD, the more I realise that the whole time, I have just been walking in your footsteps. You have paved the way for everything I have worked on and provided me with sage advice on countless occasions. Thanks for everything.

Lab people – Since starting in the lab during my honours year (way back in 2007), I have had the pleasure of working with a number of people who have made my time and the experience all the more worthwhile. I must begin by giving my sincerest thanks to Sean Pham, who has been there with me from the start and hopefully will continue to be a friend and colleague into the future. You have been a great support. To the original lab members, Elise Tu and Jenny Mak, you two were great fun and treated me so well. Thanks for all the dinners and for tolerating my mischief. To more recent lab members including Arthur Chee, Han Fui Lim, Filip Bebek, Auda Eltahla, Kun-Lee Lim and Rouba Ballouk, I want to thank you all for making the lab a fun, productive place to work.

Outside of my lab, a number of other people have made significant contributions to my project and none more so than Laura Sharpe. I don’t know what I would have done without all your help. I was forever pestering you but you always put up with me. It was great to work with you and Andrew Brown. I know the progress was a little slow (mostly due to my procrastination) but we got there in the end and can be proud of our paper. A similar thank you must also go to Mark Tanaka for the help on my last paper. You might not feel like you’ve done that much; however, from my perspective you’ve been a great support and a wonderful source of knowledge.

I would like to thank all my neighbours from the Wilkins and Brown Labs for their generosity and willingness to share equipment and reagents. I would also like to thank Bill Rawlinson, Juan Merif and Chris McIver from the Virology Diagnostic Lab at Prince of Wales Hospital for their generous provision of norovirus samples and contributions as co-authors.

Co-authors – A big thank you to everyone who contributed to the papers presented in this thesis.

Family – To Mum and Dad, you have been extremely supportive and generous. Thank you for putting up with me through all this time spent at university. I know it will be worth the effort and you deserve all my thanks. To my brother Tom, you have been a great mate who has helped me have fun and relax through all the bore of scientific research. To my sisters, Kerrianne and Hannah, thank you both for your support. To all my grandparents, I know my work has made you proud (because you tell me all the time) and I am completely grateful to you for giving me such support.

Amber – To my wonderful girlfriend (of at least five years?), thank you, thank you, thank you from the bottom of my heart. You have been my biggest support and continuously cared for me despite having to put up with my shenanigans. Hopefully we can now move on to something better (and I might also get to see you more often). All my love.

I feel that the final thank you must go to the guys from the library lawn coffee cart, who have kept me energised throughout my time at university. I have been absolutely spoiled to be able to get some of the best coffee in Sydney, just a two-minute walk from the lab. You guys are always great fun and make every day a bit more interesting. Thank you!


Table of contents

1 General introduction

1.1 Acute gastroenteritis
1.2 Background to norovirus
1.3 Structure and genome organisation
1.4 Classification
1.5 ORF1-encoding non-structural proteins
1.6 ORF2/3-encoding structural proteins
1.7 NoV immunology
1.8 Molecular epidemiology
1.9 Antigenic variation in the GII.4 lineage
1.10 Norovirus recombination
1.11 Virus-host interactions
1.12 Aims and outline of this thesis

2 “Norovirus GII.4 variant 2006b caused epidemics of acute gastroenteritis in Australia during 2007 and 2008”

2.1 Abstract
2.2 Introduction
2.3 Materials and methods
2.4 Results
2.5 Discussion

3 “Rapid evolution of pandemic noroviruses of the GII.4 lineage”

3.1 Abstract
3.2 Introduction
3.3 Materials and methods
3.4 Results
3.5 Discussion
3.6 Supporting information

4 “Norovirus RNA-dependent RNA polymerase is phosphorylated by an important survival kinase, Akt”

4.1 Abstract
4.2 Chapter text

5 “Contribution of intra- and inter-host dynamics to norovirus evolution”

5.1 Abstract
5.2 Introduction
5.3 Materials and methods
5.4 Results
5.5 Discussion

6 “The emergence of the pandemic norovirus GII.4 variants”

6.1 Abstract
6.2 Introduction
6.3 Materials and methods
6.4 Results
6.5 Discussion
6.6 Supporting information

7 Conclusions

8 References


List of figures

Figure 1-1 Enteric virus detection at the SEALS diagnostic facility between 2004 and 2008.
Figure 1-2 Genome organisation of a representative norovirus GII strain: MD145-12.
Figure 1-3 NoV demonstrates extensive genetic diversity.
Figure 1-4 A structural model of the norovirus RNA-dependent RNA polymerase.
Figure 1-5 Crystal structure of the norovirus capsid.
Figure 1-6 Proposed mechanism of norovirus recombination.
Figure 2-1 NoV is associated with outbreaks of gastroenteritis in NSW, Australia.
Figure 2-2 Phylogenetic analysis of 266 bp of the 5’ end of the capsid region in NoV.
Figure 2-3 Simplot analysis of NoV GII recombinant NSW199U/Sep/08.
Figure 3-1 Comparison of the amino acid sequence of the six NoV RdRps used in this study to other representative strains.
Figure 3-2 The effect of mutations at residue 291 on NoV GII.4 RdRp kinetic activity.
Figure 3-3 Phylogenetic analysis of the amino acid sequence of the P2 domain from GII.4 (A), GII.b/GII.3 and GII.3 (B) and GII.7 (C) strains circulating between 1987 and 2008.
Figure 3-4 Rate of evolution for the GII.4, GII.7, GII.b/GII.3 and GII.3 strains.
Figure 3-5 Hypervariable residues in GII.3, GII.4 and GII.7 are localised to common regions on the surface of the capsid P2 domain.
Figure 3-6 Alignment of the amino acid sequences of the P2 domain.
Figure 4-1 Akt phosphorylates norovirus RdRp at Thr33.
Figure 4-2 3D-modelling of norovirus RdRp showing finger-thumb domain interactions at Thr33.
Figure 4-3 Comparison of enzyme kinetics of the wild-type 2006b RdRp with a Thr33Glu phosphomimetic mutant.
Figure 5-1 Phylogenetic comparison of GII.4 ORF2 nucleotide sequence isolated longitudinally from subjects with acute and chronic NoV infections.
Figure 5-2 Distribution of single nucleotide polymorphisms (SNPs) detected from the 3’ end of ORF1 to the 5’ end of ORF3.
Figure 5-3 Phylogenetic analysis of sequences from the NoV transmission cluster.
Figure 5-4 Comparison of the intra-host distribution of NoV variants in all subjects.
Figure 5-5 Analysis of amino acid variants in the P2 domain in subject Ch with chronic NoV infection.
Figure 6-1 Epidemics of institutional acute gastroenteritis coincided with increases in NoV detection across NSW, Australia.
Figure 6-2 Phylogenetic reconstruction of NoV GII.4 capsid evolution from 1974 to 2010.
Figure 6-3 Antigenic variation in the GII.4 New Orleans 2010 variant.
Figure 6-4 Recombination breakpoints identified across genome in the GII.4 lineage.
Figure 6-5 Evolutionary forces on the GII.4 New Orleans 2010 variant ORF2.
Figure 6-6 Model for the emergence and origin of the GII.4 New Orleans 2010 variant.
Figure 6-7 Representative results from the norovirus GII full-length genome RT-PCR.
Figure 6-8 Simplot analysis of recombinant norovirus strains.
Figure 6-9 Simplot analysis of the Cairo 2007 variant in the partial ORF1/complete ORF2 regions.
Figure 6-10 Maximum likelihood phylogeny of the GII.4 ORF1 region.
Figure 6-11 Maximum likelihood phylogeny of the GII.4 ORF2 region.
Figure 6-12 Maximum likelihood phylogeny of the GII.4 ORF3 region.
Figure 6-13 Maximum likelihood phylogeny of the GII.4 capsid P2 domain based on nucleotide sequence.
Figure 6-14 Maximum likelihood phylogeny of the GII.4 capsid P2 domain based on protein sequence.


List of tables

Table 1-1 Description of ORF1-coding non-structural proteins
Table 2-1 Summary of NoV-positive specimens analysed in this study
Table 3-1 Oligonucleotide sequences designed and used in this study
Table 3-2 Comparison of the replication accuracy and rate for NoV and HCV RdRps
Table 3-3 GenBank accession numbers for the RdRps genes used in this study
Table 4-1 RdRp strains and their predicted Akt phosphorylation sites and prevalences
Table 5-1 Description of the transmission cluster cohort
Table 5-2 Description of the longitudinal study cohort
Table 5-3 Oligonucleotides used in this study
Table 6-1 Prevalence of NoV genotypes identified during 2009 and 2010
Table 6-2 GII.4 variants examined in this study
Table 6-3 Summary of recombination in the GII.4 lineage
Table 6-4 Primers used in this study


Publications during time of candidature

Published

Eden, JS, RA Bull, E Tu, CJ McIver, MJ Lyon, JA Marshall, DW Smith, J Musto, WD Rawlinson and PA White (2010) Norovirus GII.4 variant 2006b caused epidemics of acute gastroenteritis in Australia during 2007 and 2008, Journal of Clinical Virology 49(4):265-71.

Bull, RA, JS Eden, WD Rawlinson and PA White (2010) Rapid evolution of pandemic noroviruses of the GII.4 lineage, PLoS Pathogens 6(3):e1000831.

Eden*, JS, LJ Sharpe*, PA White† and AJ Brown† (2011) Norovirus RNA-dependent RNA polymerase is phosphorylated by an important survival kinase, Akt, Journal of Virology 85(20):10894-8.

Bull*, RA, JS Eden*, F Luciani, K McElroy, WD Rawlinson and PA White (2012) Contribution of intra- and inter-host dynamics to norovirus evolution, Journal of Virology 86(6):3219-29.

*Denotes authors contributed equally.

†Denotes senior authors contributed equally.

Accepted for publication

White, PA, JS Eden and G Hansman (Accepted) Molecular epidemiology of noroviruses and sapoviruses and their role in Australian outbreaks of acute gastroenteritis, Microbiology Australia.

Submitted for publication

Eden, JS, M Tanaka, MF Boni, WD Rawlinson and PA White (Submitted) The emergence of the pandemic norovirus GII.4 variants, PLoS Pathogens.


Ch. 1 –General introduction

1 General introduction

1.1 Acute gastroenteritis

Acute gastroenteritis is a common disease that impacts significantly on human health. It is characterised by an inflammation of the mucosal surface lining the gastrointestinal tract, causing vomiting, diarrhoea, nausea and abdominal cramping in affected individuals.

Acute gastroenteritis affects people of all age groups but impacts most severely on young children, the elderly and the immuno-compromised. According to the World Health Organization, in 2004, the global incidence of diarrhoeal disease was 4.62 billion episodes (276). In low-income nations, the mortality associated with diarrhoeal disease remains high, with approximately 18-21% of deaths among children under the age of five years caused by acute diarrhoeal disease (139, 277). These deaths are primarily attributed to unsafe water supplies and inadequate levels of sanitation and hygiene (195), which facilitate the spread of infectious agents such as rotavirus (RV) and norovirus (NoV). In 2008, 453 000 deaths in children younger than five years were attributed to RV infections globally (253). Across the same age group, estimates suggest that up to 200 000 deaths may be caused by NoV in developing countries (202).

In developed nations, the mortality associated with diarrhoeal disease is less than 1% (277). Despite this, the morbidity and economic impacts of acute gastroenteritis remain high and, consequently, significant strains are placed on healthcare systems. For example, in the United States (US), RV infections cause more than 200 000 hospital admissions every year (54, 135) and the economic impact of this has been estimated to exceed $US 1 billion per year (260). In the United Kingdom (UK), nosocomial outbreaks of gastroenteritis, of which 63% were caused by NoV, have been estimated to cost the English National Health Service £115 million due to the closure of wards, restriction of new admissions and staff absences (158).

There are many infectious causes of acute gastroenteritis. Bacteria such as Salmonella, Campylobacter jejuni and Clostridium difficile, as well as protozoa including Cryptosporidium and Giardia lamblia, are all examples. However, it is the human enteric viruses such as RV and NoV that together account for more than half of all cases of acute gastroenteritis globally (1, 84). Other less commonly identified human enteric viruses include adenovirus (AdV) F types 40 and 41, astrovirus (AstV) and sapovirus (SaV).

Historically, RV has been the most common cause of severe gastroenteritis in infants (14, 66, 114, 196, 264). However, following the development of two rotavirus vaccines, Rotarix (GlaxoSmithKline) and RotaTeq (Merck), and their inclusion in numerous national health programs from 2006 onwards, a dramatic decrease in the incidence of severe gastroenteritis caused by RV has been observed (200, 201, 254, 255, 261). In New South Wales (NSW), Australia, the trends have matched those seen globally. For example, between January 2004 and October 2008, a total of 39 386 diagnostic viral antigen tests were performed at the South East Area Health Service (SEALS) to detect AdV, AstV, NoV and RV in stools from cases of acute gastroenteritis. Overall, NoV was the predominant virus identified (55.59%), followed by RV (30.92%), AdV (8.44%) and AstV (5.05%). When the same figures were compared by monthly total, a shift in predominance from RV to NoV was observed in 2007, which coincided with the introduction in May of that year of a free rotavirus vaccine for all new-born infants across NSW (Figure 1-1).


Figure 1-1 Enteric virus detection at the SEALS diagnostic facility between 2004 and 2008.

The monthly totals of adenovirus, astrovirus, norovirus and rotavirus antigen detection in stool specimens from cases of acute gastroenteritis were compared between January 2004 and October 2008. The tests were performed at the SEALS diagnostic facility, Prince of Wales Hospital, Sydney, Australia, which is one of the largest such facilities in NSW. Rotavirus and norovirus were the predominant viruses detected across the period of investigation; however, following the widespread roll-out of the rotavirus vaccine in May 2007 (marked with an asterisk), the incidence of rotavirus decreased such that norovirus is now recognised as the leading cause of acute gastroenteritis in NSW, Australia.

With the emergence of epidemic NoV strains and improved surveillance, NoV is now recognised as the leading cause of viral acute gastroenteritis and is estimated to cause almost half of all cases of gastroenteritis globally (11, 199). Although commonly identified as the cause of sporadic disease, NoV is primarily associated with outbreaks in institutional settings such as aged care facilities, hospitals and child care centres (20). Furthermore, NoV is a highly infectious pathogen that is readily transmitted from person to person through faecal-oral spread (91, 256, 259). For these reasons, NoV-associated acute gastroenteritis has become a major public health concern.


1.2 Background to norovirus

According to the International Committee on Taxonomy of Viruses (ICTV), Norovirus is a genus within the family Caliciviridae that also includes Vesivirus, Lagovirus, Sapovirus and Nebovirus. Additional genera have been proposed including Recovirus, which infects rhesus monkeys (87), and Valovirus, which infects swine (145). Caliciviruses are commonly identified by their icosahedral capsid morphology and genome organisation and can cause disease in both humans and animals (101, 102). Originally, caliciviruses were classified based on their ability to infect certain hosts; however, this system became increasingly difficult to apply as an ever-growing host range was identified (101). Therefore, the classification of each genus (and individual species) is now derived from phylogenetic approaches (101). Only NoV and SaV are associated with gastrointestinal disease in humans, with SaV more commonly infecting young children and causing a much milder illness when compared to NoV (219).

The earliest likely description of NoV infection dates back to a report of winter vomiting disease by Zahorsky in 1929 (292). Sporadic reports of similar epidemics subsequently appeared in the literature without a causative agent being identified (99, 110, 262, 263). The common features of these epidemics included a filterable, non-bacterial agent causing gastroenteritis, mostly in young children, with vomiting the most characteristic symptom. In 1968, an outbreak of acute non-bacterial gastroenteritis occurred at an elementary school in Norwalk, Ohio, US and, despite numerous initial attempts, no specific pathogen was identified (68). Then, in 1972, Kapikian et al. applied a relatively novel technique, immune-electron microscopy, to stool filtrates collected from the Norwalk outbreak (127). In this study, a 27-nm viral particle was observed and, following serological evidence from experimental and natural infections, the prototype NoV species, Norwalk virus, was determined to be the cause of the outbreak of acute gastroenteritis. This was also the first time a virus had been identified as the aetiological agent of gastroenteritis in humans.


Despite the progress in understanding NoV using molecular approaches, the lack of a fully permissive cell culture system has meant that fundamental questions regarding NoV replication and pathogenesis remain unanswered. Significant effort has been made to identify a permissive cell line for human NoVs; however, these attempts have failed (79). One study showed that RNA derived from human NoV was infectious and could initiate viral replication following transfection into human hepatoma Huh-7 cells; however, the virus produced was unable to infect new cells (104). This suggested that the problems in culturing NoV were derived from a block at the receptor binding/uncoating stage. Then, in 2003, a novel murine norovirus (MNV) that caused severe disease in immuno-compromised mice was reported (129). The discovery of MNV was important as it provided not only the first small animal model to study NoV infections but also the first infectious cell culture system (281). Wobus et al. showed that MNV replicates in cells of mononuclear origin, such as primary dendritic cells and macrophages (281). Furthermore, with the development of MNV reverse genetics systems, fresh insights into NoV replication, immunology and pathogenesis are being revealed (48, 272, 289).

1.3 Structure and genome organisation

NoV possesses a single-stranded, positive-sense, poly-adenylated RNA genome of approximately 7500 nucleotides, which is packaged into a naked icosahedral virion (27-32 nm in diameter) (127). The viral genome is organised into three open-reading frames (ORFs) with short untranslated regions at both the 5’ and 3’ ends (Figure 1-2). ORF1 encodes a 200 kDa polyprotein that is initially cleaved by the viral 3C-like protease into at least six non-structural proteins. In other caliciviruses including lagovirus, vesivirus and SaV, further processing of the N-terminal protein by host proteases occurs to yield a total of seven mature non-structural proteins that are referred to as NS1-NS7 (Table 1-1) (18). The additional processing of the N-terminal protein was not observed in the human NoV strain, Camberwell virus (228); however, the MNV NS1-2 was shown to be cleaved by caspase-3 (242). Therefore, the ORF1 NS1-NS7 nomenclature is applied across all genera within Caliciviridae, including Norovirus. ORF2 and ORF3 encode the two structural proteins VP1 (60 kDa) and VP2 (25 kDa), respectively. VP1 is the major component of the viral capsid, with each virion composed of 180 VP1 molecules organised in 90 dimers (212). VP2 is a small basic protein with a poorly defined function. Recently, a fourth ORF has been identified in MNV, which is located within an alternate reading frame of the VP1-encoding region (ORF2) (173). The protein encoded by MNV ORF4 has been described as a virulence factor that localises to the mitochondria to antagonise the innate immune response and increase cell apoptosis.

There are other conserved features of the NoV genome. RNA structures at both the 5’ and 3’ terminal regions of the NoV genome have been described that may play a role in the recruitment of translational machinery, viral replication and virulence (12, 238). The 5’-terminal sequence of the NoV genome is also highly conserved (>95% within each genogroup). This sequence of approximately 20 nucleotides is repeated internally where the 3’ end of ORF1 overlaps with the 5’ end of ORF2, and also corresponds to the start of a sub-genomic RNA that is produced for the expression of the ORF2/3 structural proteins. Since both the genomic and sub-genomic RNA contain the same conserved 5’-terminal sequence, it has been proposed to function in the recruitment of both the viral RNA-dependent RNA polymerase (RdRp), to facilitate sub-genomic RNA production, and the cellular translational machinery, to express the proteins encoded by both RNAs. As in other animal caliciviruses such as Feline Calicivirus (FCV) (112), the viral protein VPg (NS5) is covalently bound to the 5’ end of the NoV genomic and sub-genomic RNA (58).
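The internal repeat of the 5’-terminal sequence described above can be looked for computationally. The following is a minimal sketch only, not a method used in this thesis: the function name, the 20-nucleotide window and the mismatch tolerance are assumptions chosen for illustration, and the toy sequence is an arbitrary string rather than a real NoV genome.

```python
def find_internal_repeats(genome, leader_len=20, max_mismatches=2):
    """Return (position, mismatches) for internal windows resembling the 5' leader.

    Hypothetical helper: the first `leader_len` nucleotides are treated as the
    conserved 5'-terminal sequence and compared with every later window.
    """
    genome = genome.upper()
    leader = genome[:leader_len]
    hits = []
    # Start at 1 so the leader itself (position 0) is not reported.
    for start in range(1, len(genome) - leader_len + 1):
        window = genome[start:start + leader_len]
        mismatches = sum(a != b for a, b in zip(leader, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Toy input (arbitrary, for illustration only): a 20-nt "leader", filler,
# then an exact internal copy of the leader starting at 0-based position 50.
toy_leader = "ACGTTGCAAGCTTAGGCATC"
toy_genome = toy_leader + "C" * 30 + toy_leader + "T" * 15

print(find_internal_repeats(toy_genome, max_mismatches=0))  # [(50, 0)]
```

In a real genome, a hit near the ORF1/ORF2 junction would be consistent with the start of the sub-genomic RNA; allowing a small number of mismatches reflects the fact that the repeat is described as highly conserved rather than identical.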


Figure 1-2 Genome organisation of a representative norovirus GII strain: MD145-12.

The genome organisation is based on the human NoV GII strain MD145-12 (GenBank accession number AY032605). There are three open-reading frames (ORFs), with the 3’ end of ORF1 overlapping the 5’ end of ORF2 by 20 bp. ORF1 encodes a large polyprotein which undergoes proteolytic cleavage to form the various non-structural (NS) proteins (18). ORF2 encodes the major structural protein VP1, which self-assembles into the viral capsid. Capsid assembly is assisted by the VP2 minor structural protein encoded by ORF3. The grey shaded region within ORF2 is a novel ORF identified in MNV (ORF4) (173). The small triangles mark the amino acid positions cleaved by the viral protease (NS6). Other features include the covalently attached protein VPg (NS5) at the 5’ end of genomic and sub-genomic RNA, and a poly(A) tail at the 3’ end.

1.4 Classification

Like most RNA viruses, NoV demonstrates extensive genetic diversity and, as such, has been classified into five genogroups based on the VP1 amino acid sequence (294). Together, the genogroups are further divided into more than 36 genotypes (140, 294). Human NoVs include viruses from genogroups I, II and IV, with the genogroup II, genotype 4 (GII.4) viruses most commonly identified in both outbreak and sporadic settings (236). There is typically 15% divergence between genotypes and 45% between genogroups, based on full-length capsid amino acid sequences (294). NoVs are also known to infect a wide range of mammals including pigs, sheep, cows, lions, dogs and mice (153, 167, 168, 270, 282). The NoV strains that infect sheep and cows are classified as GIII, those infecting lions and dogs are GIV.2 and, lastly, MNV strains form a distinct genogroup, GV. A phylogenetic analysis of the RdRp region shows the extensive genetic diversity in NoV (Figure 1-3).
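The divergence figures quoted above can be illustrated with a simple pairwise comparison. The sketch below is an assumption-laden illustration, not the classification procedure itself: it treats the typical values of 15% and 45% as hard cut-offs on the proportion of differing residues (p-distance) between two aligned capsid sequences, whereas formal genotype assignment relies on phylogenetic clustering. The function names and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing residues between two aligned, equal-length sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in compared) / len(compared)


def relationship(divergence, genotype_cutoff=0.15, genogroup_cutoff=0.45):
    """Map a divergence value onto the three levels discussed in the text.

    The cut-offs are assumptions taken from the 'typical' values quoted above.
    """
    if divergence >= genogroup_cutoff:
        return "different genogroups"
    if divergence >= genotype_cutoff:
        return "same genogroup, different genotypes"
    return "same genotype"


# Toy aligned fragments (arbitrary strings, not real VP1 sequences).
d = p_distance("ACDEFGHIKL", "ACDEFGHIVV")
print(d, relationship(d))  # 0.2 same genogroup, different genotypes
```

A full-length VP1 alignment would be used in practice; the ten-residue fragments here only keep the arithmetic easy to follow.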


Figure 1-3 NoV demonstrates extensive genetic diversity.

A maximum likelihood phylogenetic analysis of the NoV RdRp gene region (partial 3’ end, 777 nt). The analysis was performed using the best-fit model and MEGA5 with sequences derived from GenBank (n=171). NoV has five genogroups: GI (green), GII (red), GIII (orange), GIV (blue) and GV (cyan). The distance scale represents the number of nucleotide substitutions per position.


1.5 ORF1-encoding non-structural proteins

As mentioned previously, the NoV ORF1 encodes a large polyprotein that is cleaved into the viral non-structural proteins NS1-7 (Table 1-1). Our understanding of the function of each non-structural protein varies. For example, the exact functions of NS1-4 are not known, whilst the functions of NS5-7 are relatively well described. The non-structural proteins demonstrate discrete patterns of localisation in the cell (116).

Table 1-1 Description of ORF1-coding non-structural proteins

Protein | Common name | Proposed functions | Cellular localisation (a)
NS1-2 | N-terminal | Membrane anchor in replication complex | Endoplasmic reticulum
NS3 | NTPase | Nucleoside triphosphatase activity | Discrete foci in cytoplasm
NS4 | 3A-like | Inhibition of cellular protein secretion | Golgi; Endosome
NS5 | VPg | Recruitment of translation machinery; protein primer for replication | Unknown
NS6 | Protease | Cleavage of viral polyprotein; inhibition of translation of host proteins | Cytoplasm; Mitochondria
NS7 | Polymerase | Viral replication and transcription | Cytoplasm; Nucleus

(a) Based on studies of MNV by Hyde et al. (116)

A recent study has shown that the N-terminal protein (NS1-2) lacks sequence similarity to any known viral or cellular protein and has inherent structural disorder (13). This structural flexibility suggests that NS1-2 may possess multi-functional activities including a possible role in membrane recruitment during replication complex formation (116, 117). Another study used a human NoV strain, Southampton virus, to show that the NS3 protein has NTPase activity but, despite containing a predicted helicase domain, did not demonstrate any in vitro helicase activity (205). The exact function of the NS4 protein is currently unknown; however, it has been shown to inhibit protein secretion through the disassembly and antagonism of the Golgi apparatus (231). The NS5 protein, VPg, shows sequence homology to eIF1A and has been found to interact with eIF3, eIF4GI and eIF4E, and has therefore been proposed to play a role in the recruitment of translation machinery (58, 59). It has also been shown that VPg undergoes poly-uridylation by the viral polymerase and interacts with the poly(A) tail of the viral sub-genomic RNA to initiate protein-primed replication (222). The NS6 protein, the viral protease, plays an important role in cleaving the large ORF1-coding polyprotein into the various non-structural proteins. It shares sequence and functional similarities with the picornavirus 3C proteases (241) and has also been shown to inhibit cellular translation by cleavage of the poly(A) binding protein (144).

The NoV RdRp (NS7) is the primary enzyme involved in viral replication and transcription. Due to the difficulties in culturing human NoVs, an alternate approach was required to examine the replication properties of NoV. A number of studies have used recombinant protein produced in bacteria to characterise the biochemical properties of the NoV RdRp using in vitro assays (16, 17, 35, 90, 115). For example, the NoV NS7 and the NS6-7 precursor both demonstrate RdRp activity (16), although the active form of the RdRp was a homo-dimer with co-operative activity (115). Furthermore, the NS7 protein can initiate de novo and primed RNA synthesis using homo- and hetero-polymeric templates (35, 90).

The x-ray crystal structures of RdRps from GI, GII and GV viruses have also been solved (115, 147, 185). These studies show that the NoV RdRp shares many features common to other viral RdRps. NoV RdRp demonstrates a right-hand ‘closed’ conformation, where the fingers domain interacts with the thumb domain, which encloses the active site and forms a channel for the RNA to pass through (Figure 1-4). The active site of the enzyme is located within the palm domain and contains a highly conserved ‘GDD’ sequence in motif C that co-ordinates with a divalent cation such as Mn2+ or Mg2+ to interact with incoming nucleotides (Figure 1-4). A number of other conserved motifs common to all viral RdRps have been identified, motifs A – F, that contain residues important for enzyme function [described in (98)]. The C-terminal arm of the RdRp is flexible and can extend into the enzyme core; however, its role is unknown.

RNA polymerases, such as NoV RdRp, lack a proof-reading mechanism to correct the mis-incorporation of nucleotides during replication (243). Consequently, the fidelity of RdRps is generally 100-fold lower than that of DNA polymerases that possess 3’-5’ exonuclease proof-reading (226). The lower fidelity contributes towards the natural genetic diversity of RNA viruses; it also allows the viral population to evolve rapidly in response to selective pressures such as the immune system or drug therapy (267). This highlights the fundamental role that the RdRp plays in the fitness and evolution of NoV.
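The practical consequence of a 100-fold difference in fidelity can be illustrated with simple arithmetic. The sketch below assumes a genome of approximately 7500 nucleotides (the approximate length of the NoV genome) and a purely illustrative per-site error rate of 1e-4 for an RdRp; this rate and the function names are assumptions chosen only to reflect the 100-fold ratio, not measured values for NoV.

```python
# Illustrative arithmetic only: the error rates below are assumptions,
# not measured values for any norovirus polymerase.
GENOME_LENGTH = 7500                          # approximate NoV genome length (nt)
RDRP_ERROR_RATE = 1e-4                        # assumed per-site error rate for an RdRp
DNA_POL_ERROR_RATE = RDRP_ERROR_RATE / 100    # 100-fold higher fidelity


def expected_mutations(genome_length, error_rate):
    """Expected number of mis-incorporations per genome copy."""
    return genome_length * error_rate


def prob_error_free(genome_length, error_rate):
    """Probability that a single genome copy contains no errors,
    assuming independent errors at each site."""
    return (1.0 - error_rate) ** genome_length


if __name__ == "__main__":
    for label, rate in (("RdRp", RDRP_ERROR_RATE),
                        ("proof-reading DNA polymerase", DNA_POL_ERROR_RATE)):
        print(label,
              round(expected_mutations(GENOME_LENGTH, rate), 4),
              round(prob_error_free(GENOME_LENGTH, rate), 4))
```

Under these assumed rates, a substantial fraction of RdRp-copied genomes would carry at least one change, whereas almost all genomes copied by a proof-reading polymerase would be error-free.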

Figure 1-4 A structural model of the norovirus RNA-dependent RNA polymerase.

The front view of the NoV GI RdRp (PDB: 1SH0) is shown, highlighting the finger (red), palm (yellow), C-term (cyan) and thumb (blue) domains. The RdRp has a closed right-hand structure that allows the tunnelling of RNA through the central core. Below the structure, an alignment of RdRp sequences from representative strains of each genogroup shows that the protein sequences are well conserved. A number of motifs that are conserved across all viral RdRps are highlighted (98).


1.6 ORF2/3-encoding structural proteins

The VP1 protein is divided into distinct structural domains. The N-terminal arm (N-term) forms the interior portion of the capsid along with the conserved shell (S domain). A flexible hinge connects the S domain to a protruding stem (P1 domain) that leads to the hypervariable P2 domain, which forms the external surface of the viral capsid (212). The protruding P2 domain possesses motifs that are involved in binding to the host cell and are responsible for the antigenicity of the virus (235, 251). It is important to note that the cellular receptor(s) that facilitate NoV entry are currently unknown. It is known, however, that NoV, like other caliciviruses, binds to Histo-Blood Group Antigens (HBGAs), which are polymorphic carbohydrates expressed on the surface of red blood cells and mucosal epithelia, and can also be found in some secretions such as saliva and milk. Their role in viral entry and susceptibility is not well understood, but they are generally considered to be attachment factors, not necessarily cellular receptors [reviewed in (249)]. VP2 is a small basic protein with an undefined function. One study demonstrated that VP2 supported viral capsid assembly (19), whilst others have suggested that it may play a role in the recruitment of genomic RNA into the virion (96).

Figure 1-5 Crystal structure of the norovirus capsid.

The crystal structure of the GI.1 Norwalk virus capsid is presented in panel A (PDB: 1IHM). It is composed of 90 dimers of the VP1 capsid protein. A side view of a VP1 dimer is shown in panel B. In both panels, the structural regions are highlighted with color including the N-term (orange), shell (green), P1 (red) and P2 (blue). The N-term and shell domains form the interior shell of the virus, whilst the P1 and P2 domains protrude to form the exterior surface.


1.7 NoV immunology

The nature of the host immune response to NoV infection and the correlates of protection are poorly described. It is known that upon infection, the initial resistance to the virus is derived from a strong innate immune response through the signal transducer and activator of transcription-1 (STAT-1) pathway (129). Despite this, asymptomatic shedding of the virus can continue for weeks (259) and an antibody response is required for clearance of the virus (47). NoV is known to infect individuals of all ages; therefore the immunity generated from natural infection appears to be short-lived and incomplete. Furthermore, with the broad diversity observed in NoV, immunity is unlikely to be cross-protective against infection from different strains.

Much of what we have learned regarding NoV immunity is derived from volunteer challenge studies (67, 148, 198). Dolin et al. showed that partial short-term immunity lasting between 6 and 14 weeks was generated following NoV infection (67). However, a study by Parrino et al. using the prototype Norwalk virus found that all individuals who were symptomatic following an initial challenge (50%, n=6/12) were susceptible to re-infection with homologous virus when challenged again 2-4 years later (198). This suggests that despite increases in the levels of serum antibodies to Norwalk virus, the duration of immunity is short-lived.

In the same study, 50% of the volunteers did not develop illness or an antibody response following the initial challenge (and subsequent re-challenges), which indicated an inherent natural resistance to NoV infection (198). It has since been shown that individuals with a G428A nonsense mutation in the FUT2 gene (homozygous recessive) do not express the H type 1 HBGA on the surface of epithelial cells, which Norwalk viruses (GI.1) use as an attachment factor (149). These ‘non-secretors’ are genetically resistant to infection regardless of the infective dose. In contrast, Lindesmith et al. showed that infection by the NoV GII Snow Mountain virus was not dependent on the secretor status of the individual (148). This may contribute to the higher prevalence of the GII viruses, as they have a larger susceptible population for infection compared to the GI viruses.


1.8 Molecular epidemiology

Since the mid-1990s, variants of the NoV GII.4 lineage have caused 62-80% of all NoV outbreaks globally (73, 236). Furthermore, distinct GII.4 variants were associated with global epidemics of acute gastroenteritis from 1996 to present, and since 2002, have occurred with increased frequency (38, 189, 258). These included US 1995/96 virus in 1996 (189, 275), Farmington Hills virus in 2002 (157, 278), Hunter virus in 2004 (38), and the 2006a and 2006b viruses in 2006 (141, 258). The first GII.4 variant associated with a pandemic was US 1995/96 (189). From April 1995 to July 1997 it caused 55% of the gastroenteritis outbreaks in the United States (189), and was also identified as the etiological agent in large outbreaks of gastroenteritis in countries including the United Kingdom, Germany and Australia (189, 275). In 2002, the second pandemic caused by a GII.4 variant occurred with outbreaks in the US (278), Europe (157) and Australia (38). This variant, Farmington Hills, was associated with 64% of the outbreaks on cruise ships and 45% of the outbreaks on land in the US that year (278). The Farmington Hills variant then became the predominant cause of outbreaks across Europe (157) and Australia (38). In February 2004, Hunter virus was identified in NSW, Australia (38). Following its identification in Australia, this variant was subsequently detected in New Zealand, Japan, Taiwan, Madagascar, Hong Kong, Canada, Nicaragua, Brazil, and across Europe (31, 113, 121, 141, 160, 194, 217, 286). Furthermore, in 2006, two new GII.4 variants, termed 2006a (US Laurens-like) and 2006b (US Minerva-like), emerged to cause widespread NoV outbreaks across the globe (141, 258). First reported in Europe, this increase in global NoV activity in 2006 was primarily associated with the GII.4 variant 2006a (141). For example, from December 2005 to August 2006, the 2006a variant caused 61.8% of outbreaks of acute gastroenteritis across Australia and New Zealand, whilst the 2006b variant was only found in 11.3% of such outbreaks (258).

In addition to the major pandemic GII.4 variants, a number of additional GII.4 variants have been identified, including the Henry 2001, Japan 2001, Asia 2003 and Osaka 2007 variants, which were associated with epidemics localised to particular regions rather than pandemics (15, 164, 236). Furthermore, non-GII.4 strains commonly identified in molecular epidemiological studies include the recombinant GII.b/GII.3 viruses, which are often associated with sporadic infections in children, as well as the GII.6, GII.7 and GII.12 viruses (34, 38, 236, 258).

1.9 Antigenic variation in the GII.4 lineage

A number of mechanisms are thought to drive the evolution of the GII.4 lineage [reviewed in (39)]; however, all are influenced by a fundamental requirement to generate antigenic variation in the capsid in response to herd immunity (151). A minimum of a single amino acid change in the antigenic region (P2 domain) of the MNV capsid protein is sufficient to avoid immune neutralization (155). In human NoVs, amino acid divergence at key antigenic motifs within the capsid, VP1, can be seen between pandemic GII.4 variants (151, 152, 235). Diversification of the NoV capsid protruding domain through accumulated mutations has been linked to escape from host immune responses directed to previous infections (151), and this therefore allows the emergence of new epidemic GII.4 NoV variants and persistence of the lineage in the population (152). Distinct NoV GII.4 lineages have emerged where each variant was descended from its chronologic predecessor and accumulated advantageous mutations, such as the lineage of Farmington Hills 2002 – Hunter 2004 – 2006a. In this regard, NoV capsid evolution is reminiscent of the evolution of Influenza A virus (IAV) haemagglutinin (HA), where immune-driven selection leads to new antigenic variants that emerge to replace their predecessors.


1.10 Norovirus recombination

NoV inter-genotype recombinants are frequently identified in molecular epidemiological studies (25, 37, 81, 182). In NoV, homologous recombination has been proposed to occur using the ‘copy-choice’ model (34). In this model, during the transcription of negative-strand template RNA, the RdRp is blocked by the secondary structure of the sub-genomic promoter found at the ORF1/ORF2 overlap (Figure 1-6). This causes the RdRp to lose processivity and then switch template onto the sub-genomic RNA of a co-infecting virus (Figure 1-6). Recombination provides two evolutionary advantages (284). Firstly, recombination allows for the spread of advantageous genes and secondly, it creates a mechanism for the removal of detrimental genes. In NoV, recombination typically occurs at the ORF1/2 overlap (34, 37), which facilitates the exchange of non-structural and structural genes between different NoV lineages. Some studies have identified recombination at breakpoints elsewhere in the genome, including within ORF1 and ORF2 (221, 273); however, these recombination events are less common. Most recombination occurs between different genotypes (inter-genotype), although recent work suggests that intra-genotype recombination may have also played a role in the emergence of some GII.4 variants (146, 179). This remains a controversial issue, since proving recombination between two closely related viruses can be difficult.


Figure 1-6 Proposed mechanism of norovirus recombination.

[1] RNA transcription by the RdRp (grey oval) generates a negative-strand intermediate (dashed blue lines). [2] Binding of the RdRp to the almost identical RNA promoter sequences (filled black boxes) at the start of the genome and at the ORF1/2 overlap (internal initiation) generates positive-stranded genomic and sub-genomic RNA (straight blue lines). [3] These templates direct RNA synthesis from the 3’ end, which leads to the generation of further negative-sense genomic RNA as well as negative-sense sub-genomic RNA. [4] Recombination occurs when the RdRp initiates positive strand synthesis at the 3’ end of the full-length negative strand, stalls at the sub-genomic RNA promoter, and then template switches to an available negative-sense sub-genomic RNA species generated by a co-infecting virus (dashed red line). The net result is a recombinant virus that has acquired new ORF2 and ORF3 sequences (solid blue/red hybrid). Adapted from Bull et al. (2005) (34).


1.11 Virus-host interactions

Due to their compressed genomes, RNA viruses are limited in their protein-coding capabilities compared to the thousands of proteins available to the host cell. Consequently, viruses have evolved complex networks of protein interactions, through the use of short functional peptide motifs and post-translational modifications such as glycosylation and phosphorylation, to take control of host-cell functions to aid in viral replication and immune evasion [reviewed in (124)]. Although countless examples of viral-host interactions through phosphorylation have been described, some of these studies have specifically demonstrated that viral RdRps are targets of phosphorylation. This includes a number of human viruses such as hepatitis C virus (HCV), West Nile virus (WNV), tick-borne encephalitis virus (TBEV) and dengue virus type 2 (DENV2) (128, 134, 162, 177), as well as a number of plant viruses including cucumber mosaic virus (CMV) and turnip yellow mosaic virus (TYMV) (118, 133).

Kim et al. showed that the HCV RdRp (NS5B) was phosphorylated by Protein Kinase C-related Kinase 2 (PRK2) (134). The interaction between HCV NS5B and PRK2 was discovered following a random 12-mer peptide library screen using phage display, with both proteins subsequently demonstrating a peri-nuclear localisation. The phosphorylation event was confirmed using a range of methods including an in vitro kinase assay, metabolic labelling of replicon-bearing cells and western blots using a phosphoserine-specific antibody. Furthermore, the phosphorylated residue was located within the N-terminal region (aa 1-187) of NS5B. Following the knock-down of endogenous PRK2 by siRNA, HCV RNA replication was reduced. Conversely, over-expression of PRK2 significantly enhanced HCV RNA replication. This study indicated that the phosphorylation of HCV NS5B by PRK2 played a role in the regulation of viral replication.

The phosphorylation of HCV RdRp may highlight a feature common among flaviviruses. In fact, WNV, TBEV and DENV2 are all examples of flaviviruses with RdRps (NS5 protein) that are phosphorylated (128, 162, 177). Mackenzie et al. showed that the RdRp from the WNV strain, Kunjin virus, was a phosphoprotein that was localised to virus-induced membranes, which are the site of replication (162). In DENV2, the RdRp is known to have differential phosphorylation states that affect cellular localisation through interactions with the NS3 protein, with the hyper-phosphorylated form of the RdRp localised to the nucleus (128). The RdRp from TBEV is phosphorylated predominantly during the early stage post-infection and, similar to DENV2, the phosphorylation state may affect control of viral replication (177).

There are two examples of plant viruses that have phosphorylated RdRps, CMV and TYMV (118, 133). In CMV, the phosphorylation of the RdRp (2a protein) has been found to inhibit the formation of the viral replication complex by preventing 1a-2a protein interactions (133). In contrast, the phosphorylation of TYMV RdRp did not alter interactions between viral proteins involved in replication; however, phosphorylation did affect the control of RdRp stability and its function (118).


1.12 Aims and outline of this thesis

NoV is the leading cause of both outbreak and sporadic community-acquired acute gastroenteritis. It impacts heavily on those most vulnerable in the community, including the elderly, the immuno-compromised and children. Since the emergence of epidemic NoV strains of the GII.4 lineage in the mid-1990s, the overall incidence of NoV infection has grown dramatically. Furthermore, GII.4 variants have been associated with at least five pandemics of acute gastroenteritis and account for at least 80% of all infections. Given that the first generation NoV vaccines have entered the initial stages of clinical trials, it is important that we determine how the GII.4 variants have become so predominant.

Therefore, the overall aims of this thesis were to describe the mechanisms of evolution that facilitate the emergence and persistence of the epidemic NoV GII.4 variants and to then elucidate those factors that contribute to their higher epidemiological fitness.

In chapter two, a molecular epidemiological study was performed to investigate two winter epidemics of NoV-associated acute gastroenteritis in Australia in 2007 and 2008. RT-PCR, sequencing and phylogenetic analysis were used to determine if a new variant had once again emerged from the pandemic GII.4 lineage as the aetiological agent of these epidemics.

In chapter three, the replication efficiency and mutation rates, which are both important parameters in viral fitness (72), were compared between different NoV RdRps to determine if these two parameters contributed to the increased epidemiological fitness of the GII.4 strains. The in vitro mutation rates were also compared to in vivo evolution rates using a bioinformatic approach.

In chapter four, a potential Akt phosphorylation site on the NoV RdRp, predicted using in silico methods, was explored. The phosphorylation site was located on a site important for the de novo activity of the polymerase and, importantly, appeared to be a unique feature of the more prevalent genotypes such as GII.4 and GII.b.


In chapter five, the evolutionary dynamics of NoV during a transmission event, within a typical acute NoV infection and also an atypical chronic NoV infection in an immune-compromised host were investigated using next-generation sequencing.

In chapter six, the patterns of evolution of a new GII.4 variant, commonly referred to as New Orleans 2010, were described using new sequence data from two large winter epidemics in Australia in 2009 and 2010 together with sequences from public databases. This chapter details how both antigenic drift and shift contributed to the emergence of New Orleans 2010 as well as a number of other GII.4 variants.


Ch. 2 –NoV molecular epidemiology 2007-08

2 “Norovirus GII.4 variant 2006b caused epidemics of acute gastroenteritis in Australia during 2007 and 2008”

John-Sebastian Eden (a), Rowena A Bull (a), Elise Tu (a,c), Christopher J McIver (a,b,c), Michael J Lyon (d), John A Marshall (e), David W Smith (f), Jennie Musto (g), William D Rawlinson (a,b,c) and Peter A White (a)

(a) School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
(b) School of Medical Sciences, Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia
(c) Virology Division, Department of Microbiology, SEALS, Prince of Wales Hospital, Randwick, NSW, Australia
(d) Public Health Virology, Queensland Health Scientific Services, Brisbane, QLD, Australia
(e) Victorian Infectious Diseases Reference Laboratory, North Melbourne, VIC, Australia
(f) PathWest Laboratory Medicine, Queen Elizabeth II Medical Centre, Sir Charles Gairdner Hospital, WA, Australia
(g) Communicable Diseases Branch, New South Wales Department of Health, North Sydney, NSW, Australia

Published in the Journal of Clinical Virology (2010) Vol. 49 (4); pp 265 – 271

Author contributions: Conceived and designed the experiments – JSE PAW; Performed the experiments – JSE RAB ET; Analyzed the data – JSE PAW; Contributed reagents/materials/analysis tools – JSE CJM MJL JAM DWS JM WDR PAW; Wrote the paper – JSE PAW.

© Elsevier. Reprinted with permission.


2.1 Abstract

Over the last decade, four epidemics of NoV-associated gastroenteritis have been reported in Australia. These epidemics were characterized by numerous outbreaks in institutional settings such as hospitals and nursing homes, as well as increases in requests for NoV testing in diagnostic centres. During 2007 and 2008, widespread outbreaks of acute gastroenteritis were once again seen across Australia, peaking during the winter months. The primary objective of this study was to characterize two winter epidemics of NoV-associated gastroenteritis in 2007 and 2008 in Australia. Following this, we aimed to determine if these epidemics were caused by a new GII.4 variant or a previously circulating NoV strain. NoV positive faecal samples (n=219) were collected over a two-year period, December 2006 to December 2008, from cases of acute gastroenteritis in Australia. NoV RNA was amplified from these samples using a nested RT-PCR approach targeting the 5’ end of the capsid gene, termed region C. Further characterization was performed by sequence analysis of the RdRp and capsid genes, and recombination was identified using SimPlot. From 2004 to 2008, peaks in the numbers of NoV positive EIA tests from the Prince of Wales Hospital Laboratory correlated with the overall number of gastroenteritis outbreaks reported to NSW Health, thereby supporting recent studies showing that NoV is the major cause of outbreak gastroenteritis. The predominant NoV GII variant identified during the 2007-2008 period was the GII.4 pandemic variant, 2006b (71.51%, 128/179), which replaced the 2006a variant identified in the previous Australian epidemic of 2006. Four novel GII variants were also identified, including the three GII.4 variants NoV 2008, NoV Osaka 2007 and NoV Cairo 2007, and one novel recombinant NoV designated GII.e/GII.12. The increase in acute gastroenteritis outbreaks in 2007 and 2008 was associated with the spread of the NoV GII.4 variant 2006b.


2.2 Introduction

NoV is the leading cause of outbreaks of viral gastroenteritis worldwide (85, 279) and is also considered a significant cause of sporadic cases of diarrhoea in the community (202, 219). NoV transmission occurs primarily from person to person; however, transmission through contaminated food and water is also well documented (161, 175, 287). The high number of NoV-associated outbreaks of gastroenteritis in hospitals and nursing homes highlights the increased risk of transmission within semi-closed environments (25, 142). Furthermore, viral shedding is not limited to the symptomatic phase of the illness and has been reported in pre-symptomatic, post-symptomatic and even asymptomatic individuals (91, 92, 259).

NoV, a member of the Caliciviridae family, was originally identified by its small round virion of approximately 27 – 35 nm in diameter (8, 127). NoV has a single-stranded, positive-sense, poly-adenylated RNA genome of approximately 7500 nucleotides, which is organized into three ORFs (101). The first ORF encodes the non-structural proteins including the viral RdRp, whilst the second ORF encodes VP1, the capsid protein, and the third ORF encodes a minor structural protein, VP2 (109). NoVs are classified into five genogroups (GI – GV), but only GI, GII and GIV are known to infect humans, with GII the most prevalent and diverse (294). Genogroups are further divided into more than 30 genotypes and are designated, for example, as GII.1 (genogroup II, genotype 1) (269, 294). There is typically 15% divergence between genotypes and 45% between genogroups, based on full-length capsid amino acid sequences (294).

In the last decade, increasing frequencies of NoV-associated gastroenteritis epidemics have been described (38, 189, 258). A single genotype, GII.4, has emerged as the major cause of NoV pandemics, with five distinct pandemic variants identified, including US-1995/96 (189), Farmington Hills (278), Hunter (38), and the 2006a and 2006b variants (258). Overwhelming evidence now demonstrates that gastroenteritis pandemics are preceded by the emergence of a new GII.4 variant (38, 235, 258).


In the winter periods, June to August, of 2007 and 2008, increases in NoV-associated acute gastroenteritis activity were reported across Australia. Therefore, in this study we investigated the cause of the 2007 and 2008 NoV epidemics in Australia to determine if a new GII.4 variant had once again emerged from the dominant GII.4 pandemic lineage.


2.3 Materials and methods

2.3.1 Identification of NoV-associated outbreaks of acute gastroenteritis in NSW, Australia

The number of institutional gastroenteritis outbreaks reported to NSW Health, NSW Department of Health, was collated for the period January 2004 to December 2008. The reported institutional gastroenteritis outbreak data was then compared to the total number of NoV-positive samples detected by the South Eastern Area Laboratory Service (SEALS), Prince of Wales Hospital (POWH), Sydney for the same time period using GraphPad Prism v5 (GraphPad Software, San Diego, US).

2.3.2 Stool specimens

In total, 219 NoV positive stool specimens from cases of acute gastroenteritis were collected from diagnostic public health laboratories in NSW, Queensland (QLD), Victoria (VIC), and Western Australia (WA) during the period from December 2006 through December 2008.

2.3.3 Sample preparation, RNA extraction and detection of NoV GII RNA

Stool specimens were prepared as 20% (v/v) suspensions as described previously (258). Viral RNA was extracted using the QIAamp Viral RNA Mini kit (Qiagen, Hilden, DE). A 266 bp region of the NoV GII capsid gene (VP1) was amplified from extracted viral RNA with a real-time, nested RT-PCR approach using a MyiQ Single-Color real-time PCR detection system (Bio-Rad, Hercules, US) as described previously (258) for all samples except those from QLD, which were amplified using the primers p289/290 (119).

2.3.4 Amplification of the full-length RdRp and capsid genes

Reverse transcription (RT) was performed on 10 μl of extracted viral RNA using the Superscript VILO cDNA Synthesis kit (Invitrogen, Carlsbad, US) according to the manufacturer’s instructions. Full-length capsid genes were PCR amplified with the primers NV2oF2 (38) and GV132 (258). Full-length RdRp encoding regions were also amplified by using the primers GV22 and GV6 (38). For each RT-PCR, 35 cycles of amplification were performed with High Fidelity Platinum Taq DNA polymerase (Invitrogen) according to the manufacturer’s instructions.

2.3.5 Recombinant identification

Potential recombinant strains were investigated by amplifying the region across the ORF1-ORF2 overlap using a nested RT-PCR approach. The first round primers were Hep170 and NV2oR (38), and the second round primers were Hep172 (38) and G2SKR (137). Sequence data was analysed for signs of recombination using SimPlot, as described elsewhere (34).
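The general idea behind a similarity-plot analysis can be sketched as a sliding-window comparison of a query sequence against a reference across an alignment. The following is a minimal illustration only and is not the SimPlot implementation; the function names, window size and step size are assumptions, and the toy sequences in the usage note are artificial.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences of equal length,
    ignoring columns where either sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def sliding_similarity(query, reference, window=200, step=20):
    """Return (window midpoint, percent identity) pairs along an alignment.

    A sharp change in similarity to a reference along the alignment is
    the kind of signal that suggests a recombination breakpoint.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        segment_q = query[start:start + window]
        segment_r = reference[start:start + window]
        profile.append((start + window // 2,
                        percent_identity(segment_q, segment_r)))
    return profile


if __name__ == "__main__":
    # Artificial parents that differ at every position, and a chimera
    # built from the first half of one and the second half of the other.
    parent_a = "ACGT" * 25
    parent_b = "TGCA" * 25
    chimera = parent_a[:50] + parent_b[50:]
    for midpoint, identity in sliding_similarity(chimera, parent_a,
                                                 window=20, step=10):
        print(midpoint, identity)
```

In the toy example, similarity of the chimera to the first parent falls from 100% to 0% around the midpoint of the alignment, which is the pattern expected either side of a breakpoint.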

2.3.6 DNA sequencing and phylogenetic analysis

RT-PCR products were purified and sequenced directly on an ABI 3730 DNA Analyzer (Applied Biosystems, Carlsbad, US) using dye-terminator chemistry. Database searches for related sequences were conducted using BLAST. Multiple alignments and phylogenetic analyses were performed by using MEGA as described previously (258).

2.3.7 Nucleotide sequence accession numbers

The GenBank accession numbers for strains sequenced in this study are as follows: GQ849126 - GQ849130, GQ845024, and GQ845311 - GQ845370.


2.4 Results

Over the last decade, four epidemics of NoV-associated gastroenteritis have been reported in Australia, which have coincided with global pandemics. In Australia, these epidemics occurred in 1997, 2002, 2004 and 2006 and each epidemic has corresponded with the emergence of a new GII.4 variant. In 2007, a large increase in gastroenteritis activity was once again reported (Figure 2-1). The close proximity of the 2007 outbreak season to the 2006 outbreak season made this epidemic unique relative to previous epidemics, which were separated by at least two years. This increase in frequency of outbreaks occurred again in the winter season of 2008, where large numbers of NoV-associated gastroenteritis outbreaks occurred throughout NSW (Figure 2-1).

Figure 2-1 NoV is associated with outbreaks of gastroenteritis in NSW, Australia.

Total number of gastroenteritis outbreaks reported to NSW Health (Black) was compared to the total number of NoV positive faecal samples detected by POWH Laboratory (Grey) for the period January 2004 to December 2008. The GII.4 variants associated with each epidemic season are included above each peak in NoV activity in 2004, 2006, 2007 and 2008. Increases in gastroenteritis outbreaks are associated with increases in NoV activity.


2.4.1 Outbreaks of gastroenteritis in NSW are associated with peaks in NoV activity

Gastroenteritis outbreaks from institutional settings, reported to NSW Health, were compared to the total number of NoV positive samples per month detected by the POWH Laboratory, from January 2004 to December 2008 (Figure 2-1). Initially in 2004, a large increase in gastroenteritis outbreaks was associated with the GII.4 Hunter virus, as previously reported (38). Subsequently, during late 2004 and all of 2005, very few NoV positive specimens were detected by the POWH Laboratory and, importantly, the overall reports of gastroenteritis outbreaks to NSW Health were also very low (Figure 2-1). The peak in activity during 2006 was associated mainly with the spread of the NoV GII.4 variant 2006a (258). Following from this, two peaks of gastroenteritis activity can clearly be seen in the winter seasons of 2007 and 2008 in NSW, which once again correlated with increases in the detection of NoV by the POWH Laboratory (Figure 2-1). Epidemics of gastroenteritis identified in NSW, Australia during the winters of 2004, 2006, 2007 and 2008 were associated with increases in both the number of outbreaks of gastroenteritis reported to NSW Health and the number of NoV positive samples detected by the POWH Laboratory.

2.4.2 Identification of NoV GII strains by genetic analysis

NoV positive samples (n=219) from four Australian states (NSW, VIC, WA and QLD; Table 2-1) were collected between December 2006 and December 2008. RNA was extracted from these samples and amplified, and the RT-PCR products were sequenced for 179 samples. Phylogenetic analysis of the sequence data (Table 2-1 and Figure 2-2) grouped the NoV GII sequences into five genotypes: GII.4 (89.94%), GII.6 (3.91%), GII.7 (1.68%) and the recombinants GII.b/GII.3 (3.91%) and GII.e/GII.12 (0.56%). The majority of GII.4 variants identified in this study clustered with the two pandemic variants, 2006b (128, 79.50%) and 2006a (24, 14.91%). Viruses from three additional recently discovered GII.4 variants – 2008, Osaka 2007 and Cairo 2007 – were also identified (Table 2-1 and Figure 2-2).
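The prevalence figures above can be checked with simple arithmetic. In the sketch below, the total of 179 sequenced samples and the 2006b and 2006a counts (128 and 24) are taken from the text; the per-genotype counts are not stated directly and have been back-calculated from the reported percentages, so they should be read as assumptions.

```python
SEQUENCED_TOTAL = 179  # number of samples sequenced (from the text)

# Per-genotype counts back-calculated from the reported percentages
# (assumed values, not stated directly in the text).
genotype_counts = {
    "GII.4": 161,
    "GII.6": 7,
    "GII.7": 3,
    "GII.b/GII.3": 7,
    "GII.e/GII.12": 1,
}

# Counts of the two pandemic GII.4 variants (from the text).
gii4_variant_counts = {"2006b": 128, "2006a": 24}


def percentage(count, total):
    """Percentage of total, rounded to two decimal places."""
    return round(100.0 * count / total, 2)


prevalence = {g: percentage(n, SEQUENCED_TOTAL)
              for g, n in genotype_counts.items()}
within_gii4 = {v: percentage(n, genotype_counts["GII.4"])
               for v, n in gii4_variant_counts.items()}

if __name__ == "__main__":
    print(prevalence)
    print(within_gii4)
    # 2006b as a proportion of all sequenced NoVs
    print(percentage(gii4_variant_counts["2006b"], SEQUENCED_TOTAL))
```

The calculation makes the two denominators explicit: 79.50% is the share of 2006b among GII.4 sequences, whereas 71.51% is its share among all sequenced samples.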


Table 2-1 Summary of NoV-positive specimens analysed in this study

| Location^a | Date of collection | No. of confirmed outbreaks | No. of NoV specimens tested | No. sequenced^b | NoV genotype^c,d | % of isolates from region |
|---|---|---|---|---|---|---|
| NSW | December 2006 - December 2008 | 28 | 127 | 100 | GII.4 2006b | 73.0 |
| | | | | | GII.6 | 7.0 |
| | | | | | GII.4 2008 | 6.0 |
| | | | | | GII.b/GII.3 | 5.0 |
| | | | | | GII.4 2006a | 3.0 |
| | | | | | GII.7 | 3.0 |
| | | | | | Other GII.4 | 2.0 |
| | | | | | GII.e/GII.12 | 1.0 |
| QLD | January - September 2007 | 30 | 30 | 30 | GII.4 2006b | 73.3 |
| | | | | | GII.4 2006a | 23.3 |
| | | | | | GII.b/GII.3 | 3.3 |
| VIC | October - November 2007 | 21 | 21 | 20 | GII.4 2006b | 90.0 |
| | | | | | GII.4 2006a | 10.0 |
| WA | January - August 2007 | 2 | 41 | 29 | GII.4 2006b | 51.7 |
| | | | | | GII.4 2006a | 41.4 |
| | | | | | GII.b/GII.3 | 3.4 |
| | | | | | Other GII.4 | 3.4 |
| Total | | 81 | 219 | 179 | | |

^a NSW, New South Wales; QLD, Queensland; VIC, Victoria; WA, Western Australia.
^b Region sequenced included 266 bp of the capsid gene.
^c Genotype classification based on Zheng et al. 2006 (294).
^d For recombinant NoV strains, the RdRp genotype is shown on the left and the capsid genotype on the right, as in Bull et al. 2007 (37).

Ch. 2 –NoV molecular epidemiology 2007-08

Figure 2-2 Phylogenetic analysis of 266 bp of the 5’ end of the capsid region in NoV.

NoV strains were selected from both outbreak and sporadic cases to reflect the distribution of strains isolated. Strains identified in this study are in bold and italics with strain names designated by location/identification number/month/year of isolation. Bootstrap values are provided as percentages of 1000 replicates for values ≥ 65%. Reference strains are included and the NoV genotype classification is based on Zheng et al. 2006 (294). The distance scale represents the number of nucleotide substitutions per position. Phylogenetic analysis revealed that five different NoV GII genotypes were isolated in this study; furthermore, the GII.4 variants clustered into five distinct subgroups. The predominant NoV variant identified was GII.4 2006b.


2.4.3 Analysis of the 2006b capsid gene The predominant NoV variant isolated during both epidemic periods of 2007 and 2008 was GII.4 2006b (71.5% of all NoVs detected in this study). This GII.4 variant was originally identified in NSW in July 2006 but was only associated with 11.3% of outbreaks during the epidemic season of 2006 (258). Therefore, to assess possible genetic drift of the 2006b virus over a three-year period, the full-length capsid genes of two representative 2006b viruses from 2007 and 2008 were sequenced and compared to the prototype 2006b strain NSW696T/06/AU (GenBank accession number EF684915) (258). One isolate, NSW287R/Nov/07, was detected in an aged care facility in NSW in November 2007 and the other, NSW3639/Nov/08, was detected in a nosocomial outbreak in NSW in November 2008. Each shared 97% nucleotide identity and 98% amino acid identity over the full-length capsid gene with the 2006 epidemic prototype strain, NSW696T/06/AU. The 2007 and 2008 isolates also shared 97% identity with each other across this same region. Comparative analysis revealed 11 amino acid substitutions present in the 2007 and 2008 epidemic 2006b NoV variants compared to the prototype NSW696T/06/AU. A single substitution (S393G) occurred in the 2008 epidemic 2006b representative and was located at one of two sites proposed to be antigenically significant for GII.4 strains (3, 33, 152).
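The identity figures and substitution names used in this comparison (for example S393G) can be derived from a pairwise alignment. The sketch below shows one plausible way to do so; it is not the software used in this study, it assumes the two sequences are already aligned and gap-free, and the short peptide fragments are invented purely to illustrate the notation.

```python
def percent_identity(ref, query):
    """Percentage of aligned positions at which two sequences agree."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(r == q for r, q in zip(ref, query))
    return 100.0 * matches / len(ref)

def substitutions(ref, query, first_position=1):
    """List changes as <reference residue><position><query residue>."""
    return [
        f"{r}{pos}{q}"
        for pos, (r, q) in enumerate(zip(ref, query), start=first_position)
        if r != q
    ]

# Invented five-residue fragments covering positions 389-393 (illustration only).
prototype_fragment = "ASTGS"
isolate_fragment = "ASTGG"

identity = percent_identity(prototype_fragment, isolate_fragment)
changes = substitutions(prototype_fragment, isolate_fragment, first_position=389)
print(identity, changes)
```

The `first_position` argument is a hypothetical convenience so that a fragment can be numbered according to the full-length protein.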

2.4.4 Identification of novel GII variants This study identified three novel GII.4 variants as well as one novel NoV GII recombinant. A cluster of GII.4 variants was identified that represented the continued evolution of the GII.4 pandemic lineage (Hunter-2006a ancestor) and has been designated GII.4 2008 (Figure 2-2). This variant has been isolated from cases of NoV infection in Japan and European countries including the Netherlands, Sweden and France throughout 2008 (15, 179). A representative, NSW001P/Nov/08, was compared to existing sequences on GenBank. The full-length RdRp nucleotide sequence showed highest identity (96%) to 2006a strains including NZ327/06/AU (GenBank accession number EF187497). Comparison of the complete capsid sequence with reference sequences showed that NSW001P/Nov/08 had 97% identity to another GII.4 2008 variant, Apeldoorn317/07/NL (GenBank accession number AB445395). However, across the same region, the identity of NSW001P/Nov/08 to the 2006a variant, NZ327/06/AU, dropped to 92%, which suggests a higher level of divergence in the capsid gene compared to the RdRp. The differences were most evident in the hypervariable P2 domain, where only 88% nucleotide identity was retained between GII.4 2008 and GII.4 2006a variants. In this study, GII.4 2008 variants were identified in 6% (6/100) of NoV cases from NSW, which is comparable to figures in other countries during 2008 (15, 193).

Phylogenetic analysis revealed that two strains, WA080D/May/08 and NSW390I/Jan/08, clustered closely with the GII.4 Osaka 2007 variant OC07138/07/JP (GenBank accession number AB434770) (Figure 2-2). Comparisons of both full-length RdRp and capsid sequences showed NSW390I/Jan/08 and OC07138/07/JP shared 99% nucleotide identity. In October 2007, OC07138/07/JP was associated with a single outbreak in Osaka, Japan and represents a new GII.4 lineage distinct from other GII.4 pandemic variants (179).

Another GII.4 strain, NSW505G/Jul/07, was isolated from a single patient in July 2007, which also showed significant variation from recent pandemic GII.4 variants (Figure 2-2). NSW505G/Jul/07 clustered closer to a novel group of GII.4 variants isolated in Cairo, Egypt in 2007. Comparative analysis revealed 98% nucleotide identity to Cairo8/07/EG (GenBank accession number EU876888) across the capsid gene. This Cairo cluster was associated with sporadic viral gastroenteritis in young children in Egypt (126).


Figure 2-3 Simplot analysis of NoV GII recombinant NSW199U/Sep/08.

A 2037 bp region of NSW199U/Sep/08 containing the ORF1/2 overlap was analysed for recombination. The graph shows the percentage identity (%) of our reference to parental strains GoulburnValley/1983/AU (GenBank accession number DQ379714) and SaitamaU1/1997/JP (GenBank accession number AB039775) relative to the genomic position (nt). The first nucleotide of ORF2 is labelled position 1, so that numbers extend positively towards the 3’ end of the genome (ORF2) and negatively towards the 5’ end (ORF1). The window size was 200 nt and the step size was 5 nt. The recombination site occurs where the parental strains share identity and cross over in the ORF1/2 overlap.
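The sliding-window comparison described in this caption can be sketched as follows. This is a simplified illustration that scores plain percentage identity in each window; the SimPlot program may apply distance corrections that are not reproduced here, and the positions returned are window midpoints counted from the start of the alignment rather than relative to the first nucleotide of ORF2. The toy parental sequences are synthetic.

```python
def sliding_identity(query, parent, window=200, step=5):
    """Return (window midpoint, % identity) pairs for two aligned sequences."""
    if len(query) != len(parent):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        p = parent[start:start + window]
        identity = 100.0 * sum(a == b for a, b in zip(q, p)) / window
        points.append((start + window // 2, identity))
    return points

# Synthetic parents that differ at every position (illustration only).
parent_a = "ACGT" * 150
parent_b = "TGCA" * 150
# Toy recombinant: first half from parent A, second half from parent B.
recombinant = parent_a[:300] + parent_b[300:]

curve_a = dict(sliding_identity(recombinant, parent_a))
curve_b = dict(sliding_identity(recombinant, parent_b))

# The breakpoint is suggested where the two identity curves cross.
crossover = [pos for pos in curve_a if curve_a[pos] == curve_b[pos]]
print(crossover)
```

With the window and step sizes given in the caption (200 nt and 5 nt), the two curves in this toy case cross at the midpoint of the window that straddles the junction equally.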


2.4.5 Identification of novel GII recombinant A novel GII recombinant, NSW199U/Sep/08, was also identified from a sporadic case of NoV infection. This recombinant has also been isolated in oyster-associated NoV outbreaks in New Zealand during 2008 (J. Hewitt, Pers. Comm.). Based on phylogenetic analysis of the 5’ end of the capsid gene, NSW199U/Sep/08 was identified as genotype GII.12 (Figure 2-2). Full-length capsid sequencing and phylogenetic analysis confirmed the GII.12 genotype, with 91% nucleotide identity to SaitamaU1/1997/JP (GenBank accession number AB039775). Analysis of the RdRp region revealed that NSW199U/Sep/08 shared highest identity (89%) with another recombinant, GoulburnValley/1983/AU (GenBank accession number DQ379714), which has an unclassified ORF1 and a GII.13 capsid (ORF2). Simplot analysis confirmed NSW199U/Sep/08 as a recombinant (Figure 2-3) with the breakpoint located at the ORF1/2 overlap, typical of most NoV recombinants (34, 37). Therefore, NSW199U/Sep/08 was designated as a GII.e/GII.12 recombinant based on previous nomenclature (34).


2.5 Discussion The impact of NoV-associated gastroenteritis on institutions such as hospitals, day care centres and aged-care facilities is considerable, contributing to increased morbidity and economic costs within such facilities during epidemic periods (122, 223). This study compared the timing of outbreaks of gastroenteritis reported to the NSW Department of Health to the number of positive NoV EIA tests at the POWH Laboratory (Figure 2-1). The association between high incidence of gastroenteritis outbreaks in institutions and increased NoV positive EIAs suggested that NoV was the major cause of acute gastroenteritis outbreaks in NSW between 2004 and 2008. The peaks in NoV activity also coincided with the emergence of novel GII.4 variants in the years 2004 (38), 2006 (258) and 2007/08 (this study). With no current treatment or preventive measure for NoV infection, early detection leading to strict quarantine measures is one major defence.

In this study, the predominant NoV genotype identified was GII.4 (89.9%, 161/179). The high prevalence of GII.4 variants was expected as this lineage has been the cause of all major NoV gastroenteritis epidemics, not just in Australia but all across Europe, Asia and the US since the late 1990s (38, 141, 157, 189, 258). The findings of the present study show the Australian 2007 and 2008 gastroenteritis epidemics were caused by the GII.4 variant, 2006b. The 2007 epidemic differed from previous epidemics as only a small period of stasis (eight months) existed between the previous epidemic caused by the 2006a virus and the 2007 gastroenteritis season. This pattern is in contrast to previous epidemics, where there have been at least two years between epidemics caused by new GII.4 variants. One explanation for this could be that previous pandemics were caused by a single lineage of GII.4 strains. That is, each pandemic strain has evolved from the previous pandemic strain and the period of stasis is determined by the rate of mutation needed for antigenic change (235, 283). However, in contrast to the previous four pandemics, the 2006b strain did not evolve from the 2006a strain (235). In fact, the 2006b strain shared common ancestry with the 2002 Farmington Hills virus, whilst the 2006a strain was more similar to the Hunter virus (235). As two separate lineages had emerged from the 2002 and 2004 ancestral GII.4 variants, it was possible then for the 2006a and 2006b strains to cause pandemics in a similar timeframe.

Another difference to previous pandemics was that for the first time the emerging GII.4 strain (2006b) was identified in large numbers of outbreaks ahead of it causing a pandemic, in this instance, the previous winter season of 2006 (258). The 2006b variant was found to be circulating up to a year before it caused the pandemic, which is longer than any previously identified pandemic variants. This provides a comparison to influenza A, where it is not uncommon for the next pandemic strain to be circulating at low levels two years prior to it becoming a pandemic strain (239).

The identification of 2006b virus circulating prior to the 2006b epidemics of 2007 and 2008 raises the question of why it took so long for the 2006b outbreak season to peak (258). Sequence analysis of the 2006b VP1 showed that the 2006b strains from the 2007 and 2008 epidemics shared 98% amino acid identity to the original 2006 epidemic strain (NSW696T/06/AU) and 97% identity to each other. There are two sites (A and B) in the protruding P2 domain of the capsid that have been shown to be targets of neutralizing antibodies and under immune selection (3, 152). There were no amino acid changes in these key antigenic epitopes in the 2007 epidemic 2006b variant when compared to the original 2006b strain. The 2006b variant was circulating in 2006 but was not the dominant GII.4 strain in that year (258). It is possible that the high prevalence of 2006a virus in that year resulted in cross-protective immunity in the herd population and that it was therefore not until this immunity diminished that the 2006b strain was able to spread. This theory is supported by recent work showing cross-reactivity of outbreak patient sera with GII.4 NoV pandemic variants (40). It is therefore important to consider how the 2006b virus was able to cause further epidemics in 2008. A single substitution (S393G) at site B in the 2008 epidemic 2006b variant may have facilitated escape from any herd immunity generated in the population in 2007 to the previous 2006b epidemic variant; however, changes to discontinuous epitopes also cannot be discounted.

The next question is which one of these two GII.4 lineages will be responsible for the next pandemic strain. Another possibility is that both lineages may continue to circulate, as has been seen with co-circulation of pandemic influenza variants (57, 184, 239). This is of interest because, if both lineages continue to circulate, the frequency of pandemics will be higher, as the two variants could alternately cause outbreaks and drive each other to rapidly evolve or face extinction (239). Additionally, four novel NoV GII strains were isolated in this study, including three GII.4 variants and a novel recombinant GII.e/GII.12. These strains already show a global distribution and could also represent a potential source of new pandemic GII variants. Regardless, the increased incidence of NoV epidemics is evident (Figure 2-1) and without quarantine and prevention measures NoV will continue to infect unabated throughout the community.


Ch. 3 –NoV GII.4 rapid evolution

3 “Rapid evolution of pandemic noroviruses of the GII.4 lineage”

Rowena A Bull^a, John-Sebastian Eden^a, William D Rawlinson^a,b and Peter A White^a

^a School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
^b Virology Division, Department of Microbiology, SEALS, Prince of Wales Hospital, Randwick, NSW, Australia

Published in PLoS Pathogens (2010) Vol. 6 (3); e1000831

Author contributions: Conceived and designed the experiments – RAB WDR PAW; Performed the experiments – RAB JSE; Analyzed the data – RAB JSE; Contributed reagents/materials/analysis tools – RAB WDR PAW; Wrote the paper – RAB JSE WDR PAW.

Reprinted under Creative Commons Attribution License


3.1 Abstract Over the last fifteen years there have been five pandemics of NoV-associated gastroenteritis and the period of stasis between each pandemic has been progressively shortening. NoV is classified into five genogroups, which can be further classified into 25 or more different human NoV genotypes, however, only one, genogroup II genotype 4 (GII.4), is associated with pandemics. Hence, GII.4 viruses have both a higher frequency in the host population and greater epidemiological fitness. The aim of this study was to investigate if the accuracy and rate of replication are contributing to the increased epidemiological fitness of the GII.4 strains. The replication and mutation rates were determined using in vitro RdRp assays, and rates of evolution were determined by bioinformatics. GII.4 strains were compared to the second most reported genotype, recombinant GII.b/GII.3, the rarely detected GII.3 and GII.7 and as a control, HCV. The predominant GII.4 strains had a higher mutation rate and rate of evolution compared to the less frequently detected GII.b, GII.3 and GII.7 strains. Furthermore, the GII.4 lineage had on average a 1.7-fold higher rate of evolution within the capsid sequence and a greater number of non-synonymous changes compared to other NoVs, supporting the theory that it is undergoing antigenic drift at a faster rate. Interestingly, the non-synonymous mutations for all three NoV genotypes were localised to common structural residues in the capsid indicating that these sites are likely to be under immune selection. This study supports the hypothesis that the ability of the virus to generate genetic diversity is vital for viral fitness.


3.2 Introduction NoV, a member of the Caliciviridae family, is now considered the most common cause of viral gastroenteritis outbreaks in adults worldwide (85). In the US, NoV has been identified as the cause of over 73% of outbreaks of gastroenteritis (85). Furthermore, outbreak NoV strains spread rapidly, causing great economic burden on society due to medical and social expenses. Consequently, a vaccine or treatment for NoV would be useful in reducing its transmission and alleviating disease symptoms. Our limited knowledge of NoV replication and evolution has made it difficult to predict the efficacy of a treatment or longevity of a vaccine, as evidence is emerging that NoV, like many other RNA viruses, exists as a dynamic, rapidly evolving and genetically diverse population (83, 152, 163). The high level of genetic diversity in RNA viruses is recognised as the basis for their ubiquity and adaptability (71). Therefore, in order to develop a successful treatment or control program it is first necessary to understand the mechanisms behind NoV replication and evolution.

NoV is a small round virion of 27 – 38 nm in diameter and possesses a single-stranded, positive-sense, polyadenylated RNA genome of 7400 – 7700 nucleotides (10). The human NoV genome is divided into three ORFs. ORF1 encodes the non-structural proteins, including an NTPase, 3C-like protease and RdRp (102). The two structural proteins, VP1, the major capsid protein, and VP2, the minor capsid protein, are encoded by ORF2 and ORF3, respectively (96, 207). NoV is a highly diverse genus with up to 61% VP1 amino acid diversity between its five genogroups (GI to GV) (294). Up to 44% amino acid diversity over VP1 is also observed within the genogroups and has resulted in the further subgrouping of GI, GII and GIII into 8, 17 and 2 genotypes, respectively (294). VP1 exhibits the highest degree of sequence variability in the genome (49, 212). It consists of three domains, namely the shell (S) domain connected by a flexible hinge (P1 domain) to a protruding domain (P2) (248). The highly conserved S domain forms the backbone of the capsid structure (248), while the moderately conserved P1 domain encodes the flexible hinge that connects the S and P2 domains. The protruding P2 domain possesses motifs that are involved in binding to the host cell, and hence, the P2 domain is responsible for the antigenicity of the virus (235, 251). The most clinically significant of the five genogroups is GII, as it is the most prevalent human NoV genogroup detected and is more frequently associated with epidemics than other genogroups. Of particular interest is GII genotype 4 (GII.4), because this lineage accounts for 62% of all NoV outbreaks globally (236) and has also caused all five major NoV pandemics in the last decade (1995/1996, US-95/96 strain; 2002, Farmington Hills; 2004, Hunter; 2006, 2006a virus; and 2007, 2006b virus) (38, 74, 236, 258).

The basis for the increased epidemiological fitness (70) of the GII.4 strains, as determined by their high incidence and ability to cause pandemics, is currently unknown. Investigations with influenza indicate a link between increased viral evolution and increased viral incidence (111, 188). However, because of the non-culturable nature of human NoV, variations in rates of evolution have not been calculated for different NoVs and consequently this has not been investigated as a factor in determining viral incidence and epidemiological fitness. Replication efficiency and genetic diversity are both important parameters in viral fitness (72). The aim of this study was to determine if these two parameters are contributing to the increased epidemiological fitness of the GII.4 strains. Replication efficiency and genetic diversity are primarily determined by the viral RdRp, as it controls the rate at which new sequence is introduced into the genome. Therefore, using in vitro RdRp assays together with bioinformatics, the replication efficiency, mutation rate and rate of evolution of GII.4 viruses were compared with those of other NoV GII genotypes. The results of this study suggest that, like influenza A, the increased incidence of the pandemic GII.4 lineage may be a result of the combined influence of high mutation, replication and evolution rates, which together culminate in an increased epidemiological fitness for the GII.4 strains.


3.3 Materials and methods

3.3.1 NoV strains Stool samples containing NoV were obtained from the Department of Microbiology, Prince of Wales Hospital, Sydney, Australia, with the exception of the stool specimen that contained NoV/Mc17/02/TH, which was obtained from McCormick Hospital, Chiang Mai, Thailand. The six genetically diverse NoV strains used in this study comprised three GII.4 pandemic strains, NoV/Sydney348/97/AU (of the NoV/US-95/96 GII.4 pandemic lineage), NoV/NZ327/06/NZ (NoV/2006a GII.4 lineage) and NoV/NSW696T/06/AU (NoV/2006b GII.4 lineage); two recombinant strains, NoV/Sydney C14/02/AU (GII.b ORF1 and GII.3 ORF2/3 [commonly referred to as GII.b/GII.3]) and NoV/Sydney4264/01/AU (GII.4 ORF1 and GII.10 ORF2/3 [GII.4/GII.10]); and a GII.7 NoV, NoV/Mc17/02/TH, associated with rare sporadic cases of gastroenteritis. The molecular epidemiology of these strains has been described previously (38, 108, 258) and their GenBank accession numbers are presented in Table 3-3 of the Supplementary information. In this study, the RdRp enzymes are referred to by their genotype, except in the case of the GII.4 strains, which are referred to by their pandemic name, e.g. GII.4 2006b-RdRp (Table 3-2). RdRps from recombinant strains are indicated by an ‘r’ in front of the nomenclature.

3.3.2 RNA extraction and cDNA synthesis Viral RNA was extracted from 140 μl of 20% faecal suspension using the QIAamp Viral RNA kit according to the manufacturer’s instructions (Qiagen). RNA was resuspended in 50 μl of Baxter Steri-pour H2O and stored at -80°C. cDNA synthesis was performed as described previously (38).

3.3.3 Amplification of capsid and RdRp regions The full-length capsid gene, P2 domain and RdRp regions were amplified with specific primers (Table 3-1) using RT-PCR methods described previously (258). The amplified RdRp genes were cloned into pGEM-T Easy vector (Promega, Fitchburg, US).

3.3.4 DNA sequencing
Plasmids and PCR products were purified by PEG precipitation and washed with 70% ethanol. Products were sequenced directly on an ABI 3730 DNA Analyzer (Applied Biosystems) using dye-terminator chemistry.

3.3.5 Construction of RdRp expression vectors and sequence mutagenesis pGEM-T Easy vectors containing 1736 bp from the 3’ end of ORF1 were purified using the Quantum prep® plasmid miniprep kit (Bio-Rad) and used as template DNA for the construction of expression vectors. Strain-specific primers incorporating restriction enzyme sites were designed to amplify the precise RdRp region of each strain (Table 3-1). PCR was performed as described previously (258). PCR products were digested with their corresponding restriction enzymes and cloned into the expression vector pTrcHis2A (Invitrogen). Constructs containing the HCV genotype 3a RdRp (pVRL69) and HCV genotype 1b RdRp (pVRL75) were used as controls and have been described previously (123). Site-directed mutagenesis of residue 291 in the GII.4 US-95/96-RdRp and the GII.4 2006a-RdRp was carried out with the Stratagene QuikChange II mutagenesis kit, according to the manufacturer’s instructions (Agilent Technologies, Santa Clara, US). The primers used to introduce the mutation into the plasmid are listed in Table 3-1.


Table 3-1 Oligonucleotide sequences designed and used in this study

| Primer | Region | Genotype | Polarity^a | Sequence 5'-3'^b |
|---|---|---|---|---|
| GV6 | 5' Capsid | GII | - | TTRTTGACCTCTGGKACGAG |
| GV11 | 5' RdRp | GII.4 US-95/96 | + | CTAGGATCCAGGTGATGACAGTAAGGGAAC |
| GV12 | 3' RdRp | GII.4 US-95/96 | - | TCAGAATTCGAYTCGACGCCATCTTCATTCTCA |
| GV21 | 3' Protease | GII | + | GTBGGNGGYCARATGGGNATG |
| GV23 | 5' RdRp | rGII.b, GII.4 2006a | + | CGCGGATCCAGGTGGCGACAACAAGGGAA |
| GV24 | 3' RdRp | rGII.b, GII.4 2006a, GII.4 2006b | - | CCGGAATTCGATTCGACGCCATCTTCATTCACA |
| GV37 | 5' RdRp | GII.7 | + | GACGAGCTCGGGAAATCAGGACCTT |
| GV38 | 3' RdRp | GII.7 | - | CCCAAGCTTGGATTCGACGCCATC |
| GV40 | 3' RdRp | rGII.4 | - | GCCTGCAGTACTTCGACGCCATC |
| GV43 | P2 domain | GII.4 | + | GTNCCMCCHACWGTKGARTC |
| GV44 | P2 domain | GII.4 | - | ARRTGYTGNAYCCAYTCYTG |
| GV171 | 5' RdRp | GII.4 2006b | + | CGCGGATCCAGGTGGTGACAGTAAGGG |
| GV172 | 5' RdRp | rGII.4 | + | CGCGGATCCAGGCGGTGACAACAAAGG |
| GV194 | K291T | GII.4 US-95/96 | + | GGTGACTTCACAATATCAATC |
| GV195 | K291T | GII.4 US-95/96 | - | GATTGATATTGTGAAGTCACC |
| GV196 | T291K | GII.4 2006a | + | GGTGACTTCAAAATATCAATC |
| GV197 | T291K | GII.4 2006a | - | GATTGATATTTTGAAGTCACC |

^a The symbol ‘+’ indicates that the oligonucleotide is a forward primer and the symbol ‘-’ indicates that the oligonucleotide is a reverse primer.
^b Underlined sections indicate restriction enzyme sites.
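The four mutagenesis primers in Table 3-1 can be checked with a few lines of code. The sketch below confirms that each reverse primer is the reverse complement of its forward partner and locates the single codon that differs between the K291T and T291K forward primers. The reading frame is an assumption: the primers are taken to be in frame from their first base, which is not stated in the table.

```python
# Mutagenesis primer sequences copied from Table 3-1.
PRIMERS = {
    "GV194": "GGTGACTTCACAATATCAATC",  # K291T, forward
    "GV195": "GATTGATATTGTGAAGTCACC",  # K291T, reverse
    "GV196": "GGTGACTTCAAAATATCAATC",  # T291K, forward
    "GV197": "GATTGATATTTTGAAGTCACC",  # T291K, reverse
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def differing_positions(a, b):
    """0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Only the two codons needed here (standard genetic code).
CODONS = {"AAA": "K", "ACA": "T"}

pairs_match = (
    reverse_complement(PRIMERS["GV194"]) == PRIMERS["GV195"]
    and reverse_complement(PRIMERS["GV196"]) == PRIMERS["GV197"]
)

# Assumption: codons are read in frame from the first base of the primer.
diff = differing_positions(PRIMERS["GV194"], PRIMERS["GV196"])
codon_start = diff[0] - diff[0] % 3
codon_k291t = PRIMERS["GV194"][codon_start:codon_start + 3]
codon_t291k = PRIMERS["GV196"][codon_start:codon_start + 3]
print(pairs_match, CODONS[codon_k291t], CODONS[codon_t291k])
```

Under that frame assumption, the K291T primer carries a threonine codon and the T291K primer carries a lysine codon at the same position, in line with the primer names.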


3.3.6 RdRp expression and purification The NoV RdRps and control HCV RdRps were expressed in Escherichia coli, as described previously (123), except expression of the NoV RdRps was performed for 6 h at 30°C. Purity was checked by SDS-PAGE and the identity of the RdRp was confirmed by western blot with a hexa-histidine antibody and peptide sequencing performed by the Bioanalytical Mass Spectrometry Facility (University of New South Wales, AU). Recombinant RdRp was quantified with a NanoDrop ND-1000 Spectrophotometer (NanoDrop, Wilmington, US).

3.3.7 RdRp kinetic measurements Kinetic RdRp assays were performed in a final volume of 15 μl and contained 20 mM Tris-HCl (pH 7.4), 2.5 mM MnCl2, 5 mM DTT, 1 mM EDTA, 500 ng of homopolymeric C RNA template, 2 U RNasin (Promega), 4 mM sodium glutamate and increasing concentrations of [3H]-GTP (Amersham Biosciences, Little Chalfont, UK) ranging from 2 μM to 60 μM. Reactions were initiated with the addition of 50 nM RdRp and incubated for 9 min at 25°C. The reactions were terminated by adding EDTA to a final concentration of 60 mM, 10 μg herring sperm DNA and 170 μl of 20% (w/v) trichloroacetic acid. The incorporated radio-nucleotides were precipitated on ice for 30 min and then filtered through a 96-well GF/C unifilter microplate (PerkinElmer, Waltham, US) by a Filtermate harvester (PerkinElmer). Using the harvester, the filters were washed thoroughly with water and left to dry. The filter wells were each filled with 25 μl of Microscint scintillation fluid (PerkinElmer) and radioactivity was measured using a TopCount NXT liquid scintillation counter (PerkinElmer). Background measurements for each assay consisted of reactions without RdRp and were subtracted from the counts per minute (CPM) values obtained for the individual enzyme assays. Results were plotted and statistical analysis was performed with the Mann-Whitney test (one-tailed, 95% confidence interval) in GraphPad Prism v4.02 (GraphPad Software).
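To illustrate how an incorporation rate (Kcat) can be obtained from a substrate titration of this kind, the sketch below fits the Michaelis-Menten equation using the Hanes-Woolf linearisation and divides Vmax by the enzyme concentration. The fitting procedure actually applied in GraphPad Prism is not described in the text, so this is only one plausible approach. The enzyme concentration (50 nM) and the GTP range (2 to 60 μM) come from the method above; the Km, Vmax and velocity values are synthetic.

```python
def fit_michaelis_menten(substrate, velocity):
    """Hanes-Woolf fit: s/v = s/Vmax + Km/Vmax. Returns (Vmax, Km)."""
    y = [s / v for s, v in zip(substrate, velocity)]
    n = len(substrate)
    mean_x = sum(substrate) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in substrate)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(substrate, y))
    slope = sxy / sxx                      # equals 1 / Vmax
    intercept = mean_y - slope * mean_x    # equals Km / Vmax
    vmax = 1.0 / slope
    return vmax, intercept * vmax

def kcat(vmax_um_per_s, enzyme_um):
    """Turnover number in s^-1 from Vmax and enzyme concentration."""
    return vmax_um_per_s / enzyme_um

ENZYME_UM = 0.05                       # 50 nM RdRp, as in the assay above
gtp_um = [2, 5, 10, 20, 40, 60]        # within the 2-60 uM range used

# Synthetic, noise-free velocities (uM/s) generated from assumed parameters.
TRUE_VMAX, TRUE_KM = 0.01, 10.0
rates = [TRUE_VMAX * s / (TRUE_KM + s) for s in gtp_um]

vmax_fit, km_fit = fit_michaelis_menten(gtp_um, rates)
kcat_fit = kcat(vmax_fit, ENZYME_UM)
print(round(kcat_fit, 3))
```

A direct non-linear fit is generally preferred for noisy data because linearisations distort the error structure; the linear form is used here only to keep the example dependency-free.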

3.3.8 Incorporation fidelity An in vitro fidelity assay was developed to measure mutation rates and was adapted from Ward et al. (271). The RdRp assay was performed using conditions described above with a homopolymeric C RNA template, except 82.1 pmoles of [3H]UTP (2 μCi) or [3H]ATP (4 μCi) (Amersham Biosciences) were added (as the non-complementary nucleotides) with an equimolar amount of GTP (82.1 pmoles) (Promega) added as the complementary nucleotide. The total amount of ribonucleotide incorporated was calculated in a parallel experiment with the addition of 1 μCi (164.2 pmoles) [3H]GTP (Amersham Biosciences) as the correct nucleotide. The assay was incubated for 50 min at 25°C. Error frequency of the RdRp was determined by calculating the total number (pmoles) of non-complementary ribonucleotides incorporated and dividing by the total number (pmoles) of [3H]GTP ribonucleotides incorporated.
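The error-frequency calculation described above is a simple ratio, sketched below. The function name and the incorporation values are hypothetical and are chosen only to show the scale of the result; conversion of scintillation counts to pmoles is not covered.

```python
def error_frequency(noncomplementary_pmol, complementary_pmol):
    """Misincorporated pmol divided by correctly incorporated [3H]GTP pmol."""
    if complementary_pmol <= 0:
        raise ValueError("complementary incorporation must be positive")
    return noncomplementary_pmol / complementary_pmol

# Hypothetical incorporation values in pmol (not measured data).
example = error_frequency(noncomplementary_pmol=0.009, complementary_pmol=10.0)
print(f"{example:.2e} substitutions per site")
```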

3.3.9 Evolutionary analysis of NoV capsids In order to determine the rate of evolution of the rGII.3, GII.3, GII.4 and GII.7 capsids, the nucleotide sequences of ORF2 were analysed. RNA capsid sequences used for the analysis included eight from this study and 76 sequences from GenBank, with the oldest strains available dating back to 1987. The rate of evolution (substitutions/nucleotide site/year) for GII.3, GII.b/GII.3, GII.4 and GII.7 NoVs was determined by calculating the number of nucleotide substitutions in ORF2 compared to an ancestral strain and plotting this against time (88). The rate of evolution was determined by linear regression with the program GraphPad Prism v4 and was equivalent to the gradient of the line. Pairwise alignments of RNA sequences and evolutionary distances between sequences were calculated using the Maximum Composite Likelihood model in MEGA4 (246). Bootstrapped trees (1000 data sets) were constructed using the Neighbour-joining method, also with the program MEGA4. In order to determine the amount of selection each genotype is under, the average Ka/Ks ratio was calculated across the capsid for each genotype (GII.4, GII.b/GII.3 and GII.7). The Ka/Ks ratio is a measure of non-synonymous amino acid changes compared to synonymous (silent) changes. Ka/Ks > 1 indicates that positive selection is occurring, Ka/Ks = 1 is interpreted as neutral evolution and Ka/Ks < 1 is indicative of negative or purifying selection. The program Sliding Windows Alignment Analysis Program (SWAAP) v1.0.2 was utilised (213). The Nei-Gojobori model was used to calculate Ka and Ks values (183). The window size was set at 15 bp (5 aa) and the step size was 3 bp (1 aa).
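The two calculations in this section can be sketched in a few lines: the rate of evolution as the least-squares gradient of substitutions per site against isolation year, and the interpretation of a Ka/Ks ratio. The data points are synthetic, the function names are hypothetical, and the Nei-Gojobori estimation of Ka and Ks themselves is not reproduced here.

```python
def evolution_rate(years, subs_per_site):
    """Least-squares gradient (subs/site/year) and intercept of distance vs. time."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(subs_per_site) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, subs_per_site))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def selection_regime(ka, ks, tolerance=1e-9):
    """Classify a Ka/Ks ratio as positive, neutral or purifying selection."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"

# Synthetic distances from an ancestral strain, generated at 3.9e-3 subs/site/yr.
sample_years = [1987, 1992, 1997, 2002, 2007]
distances = [3.9e-3 * (year - 1987) for year in sample_years]

rate, intercept = evolution_rate(sample_years, distances)
print(f"{rate:.1e} subs/site/yr")
print(selection_regime(ka=0.02, ks=0.10))
```

The `tolerance` parameter is an illustrative choice for comparing floating-point ratios with 1; in practice a statistical test would be needed to call a ratio neutral.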

3.3.10 Protein modelling Predicted secondary structure analysis of the RdRps and capsid protein VP1 was performed by generating a Protein Data Bank (PDB) file from the amino acid sequence in FastA format using software on the CPHmodels 2.0 Server (159). Three-dimensional structures were then generated from the PDB files with PyMOL (62).


3.4 Results

3.4.1 In vitro analysis of NoV RdRps To investigate the replication efficiency and mutation rate for different NoV and control HCV RdRps, the RdRp-encoding regions from six NoVs and two HCVs were cloned and expressed. NoVs selected for this study included three pandemic GII.4 strains, a non-pandemic recombinant virus with a GII.4 ORF1 (RdRp) and a GII.10 ORF2/3 (capsid) (GII.4/GII.10), a second recombinant virus, GII.b/GII.3, which is the second most prevalent strain (38), and a rarely detected GII.7 strain. The RdRps were expressed with additional N-terminal amino acids MDP and a C-terminal myc epitope and hexa-histidine tag. Approximately 1 to 3 mg of a 60 kDa and 68 kDa enzyme, for NoV and HCV, respectively, were obtained and confirmed by mass spectrometry.

To determine the nucleotide incorporation rate for the six NoV RdRps and the two HCV RdRps, the Kcat was calculated with rGTP as substrate and poly C RNA as template (Table 3-2). The fastest enzyme was GII.7-RdRp, which had an incorporation rate (Kcat) of 0.238 s⁻¹, followed by the four GII.4 RdRps, 2006b-RdRp, 2006a-RdRp, rGII.4-RdRp and GII.4 US-95/96-RdRp, with incorporation rates of 0.209 s⁻¹ ± 0.054, 0.183 s⁻¹ ± 0.024, 0.168 s⁻¹ ± 0.024 and 0.158 s⁻¹ ± 0.039, respectively. The slowest NoV enzyme was rGII.b-RdRp with an incorporation rate of 0.155 s⁻¹ ± 0.093 (Table 3-2). The two HCV RdRps, HCV 3a-RdRp and HCV 1b-RdRp, had a much lower incorporation rate compared to NoV RdRps, with a Kcat value of 0.003 s⁻¹ for both (0.003 s⁻¹ ± 8.06 × 10⁻⁵ and 0.003 s⁻¹ ± 1.06 × 10⁻⁵ for HCV 3a-RdRp and HCV 1b-RdRp, respectively).


Table 3-2 Comparison of the replication accuracy and rate for NoV and HCV RdRps

| RdRp^a | Derivative strain | Kcat (s⁻¹)^b, mean | Kcat (s⁻¹)^b, SD | Mutation rate in vitro (nt subs/site)^b, mean | Mutation rate in vitro (nt subs/site)^b, SD | Rate of evolution, sequence data (nt subs/site/yr)^c, mean | Rate of evolution, sequence data (nt subs/site/yr)^c, SD | U^d (subs/genome) |
|---|---|---|---|---|---|---|---|---|
| GII.4 2006b | NoV/NSW696T/06/AU | 0.209 | 0.054 | 9.06 × 10⁻⁴ | 2.88 × 10⁻⁴ | 3.9 × 10⁻³ | 2.4 × 10⁻⁴ | 6.80 |
| rGII.4 | NoV/Sydney4264/01/AU | 0.168 | 0.024 | 8.34 × 10⁻⁴ | 2.56 × 10⁻⁴ | Not done | | 6.26 |
| GII.4 US-95/96 | NoV/Sydney348/97/AU | 0.158 | 0.039 | 8.87 × 10⁻⁴ | 4.42 × 10⁻⁴ | 3.9 × 10⁻³ | 2.4 × 10⁻⁴ | 6.65 |
| GII.4 2006a | NoV/NZ327/06/NZ | 0.183 | 0.024 | 5.54 × 10⁻⁴ | 3.84 × 10⁻⁴ | 3.9 × 10⁻³ | 2.4 × 10⁻⁴ | 4.16 |
| rGII.b | NoV/C14/02/AU | 0.155 | 0.093 | 1.53 × 10⁻⁴ | 1.22 × 10⁻⁴ | 2.4 × 10⁻³ | 4.9 × 10⁻⁴ | 1.15 |
| GII.7 | NoV/Mc17/02/TH | 0.238 | 0.088 | 2.21 × 10⁻⁵ | 4.58 × 10⁻⁴ | 2.3 × 10⁻³ | 1.5 × 10⁻⁴ | 0.17 |
| HCV-3a | HCV/VRL69 | 0.003 | 8.06 × 10⁻⁵ | 1.97 × 10⁻³ | 1.42 × 10⁻³ | Not done | | 18.57 |
| HCV-1b | HCV/VRL75 | 0.003 | 1.06 × 10⁻⁴ | 1.23 × 10⁻³ | 1.14 × 10⁻³ | 2.3 × 10⁻² | N/A | 11.57 |

^a The RdRps are named after their genotype except for the GII.4 pandemic strains, which are named after their subcluster. Recombinants are indicated by an r before the genotype.
^b The mean and standard deviation (SD) of a triplicate data set.
^c HVR1 rate of evolution published by Rispeter et al. (218).
^d U, number of mutations per viral replication round.
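The U values in Table 3-2 appear consistent with multiplying the in vitro mutation rate by the genome length. The sketch below checks this for the six NoV enzymes. The genome length of 7,500 nt is an assumption inferred from the tabulated values and from the 7400 – 7700 nt range given in the introduction; it is not stated in the table.

```python
# Assumed NoV genome length in nucleotides (inferred, not stated in the table).
GENOME_NT = 7500

# Mean in vitro mutation rates (nt subs/site) copied from Table 3-2.
MUTATION_RATE = {
    "GII.4 2006b": 9.06e-4,
    "rGII.4": 8.34e-4,
    "GII.4 US-95/96": 8.87e-4,
    "GII.4 2006a": 5.54e-4,
    "rGII.b": 1.53e-4,
    "GII.7": 2.21e-5,
}

# U values (subs/genome) copied from Table 3-2.
TABLE_U = {
    "GII.4 2006b": 6.80,
    "rGII.4": 6.26,
    "GII.4 US-95/96": 6.65,
    "GII.4 2006a": 4.16,
    "rGII.b": 1.15,
    "GII.7": 0.17,
}

def mutations_per_round(rate_per_site, genome_nt=GENOME_NT):
    """Expected substitutions per genome per replication round."""
    return rate_per_site * genome_nt

computed_u = {name: mutations_per_round(r) for name, r in MUTATION_RATE.items()}
for name, value in computed_u.items():
    print(f"{name}: computed {value:.2f}, table {TABLE_U[name]:.2f}")
```

Under this assumption the computed values agree with the table to within rounding. The HCV rows are omitted because they imply a different, longer genome.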

Ch. 3 –NoV GII.4 rapid evolution

The first reported GII.4-associated pandemic occurred in 1995/1996. It was not until seven years later, in 2002, that a second GII.4-associated pandemic occurred. Since the 2002 pandemic, however, three more GII.4-associated pandemics have arisen (2004, 2006 and 2007). Interestingly, a higher incorporation rate was observed in the post-2002 pandemic RdRps (2006a-RdRp and 2006b-RdRp) than in the pre-2002 GII.4 RdRps (US-95/96-RdRp [1995] and rGII.4-RdRp [2001]) (p = 0.022, Table 3-2). However, comparison of the mutation rates showed no differences between pre- and post-2002 GII.4 RdRps. To identify specific residues associated with the RdRps that had a higher incorporation rate, the amino acid sequences of the six NoV RdRps were aligned (Figure 3-1). The three NoV RdRps with slower incorporation rates (GII.4 US-95/96-RdRp, rGII.4-RdRp and rGII.b-RdRp) had a Lys at residue 291, whereas the three RdRps with faster incorporation rates had either a Thr (2006a-RdRp and 2006b-RdRp) or a Val (GII.7-RdRp) (Figure 3-1) (residue numbering based on the Lordsdale virus RdRp, GenBank accession number X86557). No other amino acid variation was unique to the three faster enzymes (Figure 3-1). A search of GenBank for all available full-length NoV GII RdRp sequences (accessed 9th November 2009) revealed that the Thr291 residue was found only in GII.4 strains isolated during or after 2001. In fact, the Lys291Thr mutation appears to have become fixed in the GII.4 pandemic lineage after its initial appearance in 2001.

Figure 3-1 Comparison of the amino acid sequence of the six NoV RdRps used in this study to other representative strains.

The top six sequences are the RdRps characterised in this study (GII.4 US-95/96, rGII.4, GII.4 2006a, GII.4 2006b, GII.7 and rGII.b). The remaining 19 strains are reference strains and are named according to their GII genotype followed by their year of isolation (GII.4 strains only) and then their strain name. The alignment illustrated that a K291T substitution first appeared in the GII.4 lineage after 2001 and was unique for the GII.4 pandemic strains.

To analyse the effect of the Lys291Thr mutation on the incorporation rate of the GII.4 RdRps, two mutant enzymes were made: GII.4 US-95/96 [K291T]-RdRp and GII.4 2006a [T291K]-RdRp. Specifically, the US-95/96-RdRp Lys291 residue was mutated to a Thr and the 2006a-RdRp Thr291 residue was mutated to a Lys. The kinetic activity of the two mutant RdRps was then compared to that of the wildtype enzymes (Figure 3-2). The Lys291Thr mutation increased US-95/96-RdRp activity by 20.2% (US-95/96 [Wild]-RdRp: 0.168 s-1 ± 0.018 [n = 6], US-95/96 [K291T]-RdRp: 0.202 s-1 ± 0.019 [n = 6], p = 0.008), whereas the reverse Thr291Lys mutation decreased activity of the 2006a-RdRp by 22.2% (2006a [Wild]-RdRp: 0.351 s-1 ± 0.037 [n = 3], 2006a [T291K]-RdRp: 0.273 s-1 ± 0.020 [n = 4], p = 0.029) (Figure 3-2).
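The percentage changes quoted above follow directly from the mean incorporation rates. As a minimal sketch (the function and variable names are illustrative and not part of the original analysis), the arithmetic can be written as:

```python
def percent_change(wild_type: float, mutant: float) -> float:
    """Change of the mutant rate relative to the wildtype rate, in percent."""
    return (mutant - wild_type) / wild_type * 100.0


# Mean incorporation rates (s-1) as quoted in the text for each enzyme pair.
us9596_change = percent_change(wild_type=0.168, mutant=0.202)  # K291T
v2006a_change = percent_change(wild_type=0.351, mutant=0.273)  # T291K

print(f"US-95/96 [K291T]: {us9596_change:+.1f}%")
print(f"2006a [T291K]:    {v2006a_change:+.1f}%")
```

Only the means are used here; the significance values reported in the text depend on the replicate measurements, which are not reproduced in this sketch.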

Figure 3-2 The effect of mutations at residue 291 on NoV GII.4 RdRp kinetic activity.

A K291T mutation in the NoV/US-1995/96-RdRp led to a 20.2% increase in RdRp activity compared to the wildtype enzyme. Blank columns show US-95/96 K291 [Wild]-RdRp: 0.168 s-1 ± 0.018 [n = 6], US-95/96 T291 [K291T]-RdRp: 0.202 s-1 ± 0.019 [n = 6], p = 0.008. In contrast, a T291K mutation led to a 22.2% reduction in activity for NoV/2006a-RdRp compared to the wildtype. Hashed columns show 2006a T291 [Wild]-RdRp: 0.351 s-1 ± 0.037 [n = 3], 2006a K291 [T291K]-RdRp: 0.273 s-1 ± 0.020 [n = 4], p = 0.029.

A high mutation rate (10-3 to 10-5 mis-incorporations per site) has been reported for most viral RdRps using either cell culture studies or biochemical analysis [reviewed in (70)]. The mutation rate of the NoV RdRp has not, however, been studied. Consequently, the present study developed an in vitro fidelity assay that enables direct comparison of the mutation rates of RdRps and could readily be applied to all non-culturable and culturable RNA viruses. Using this assay, the mutation rate (substitutions per nucleotide site) for a transversion event (incorporation of UTP into a poly C RNA template) was calculated for all six NoV RdRps and for two control HCV RdRps (Table 3-2). The two HCV RdRps had the highest mutation rates of all eight enzymes, at an average of 1.60 × 10-3 (± 0.52 × 10-3) substitutions per nucleotide site (Table 3-2). The mutation rates of the NoV RdRps were up to two orders of magnitude lower than those of the HCV RdRps (Table 3-2). The four GII.4 RdRps, US-95/96-RdRp, rGII.4-RdRp, 2006a-RdRp and 2006b-RdRp, had similar mutation rates at 8.87 × 10-4, 8.34 × 10-4, 5.54 × 10-4 and 9.06 × 10-4 substitutions per nucleotide site, respectively, and these were higher than those of the remaining two NoV RdRps, rGII.b-RdRp (1.53 × 10-4 substitutions per nucleotide site) and GII.7-RdRp (2.21 × 10-5 substitutions per nucleotide site) (Table 3-2). The in vitro fidelity assay above examined transversion events, which are reported to occur at a lower frequency than transition events (70). To confirm that the GII.4 enzymes also have a higher mutation rate with a different substrate, the transition mutation rate was examined for two enzymes, US-95/96-RdRp and GII.7-RdRp. The frequency of transition events (incorporation of ATP into a poly C RNA template) was 1.5- and 1.7-fold higher than that of transversion events (UTP; Table 3-2) for US-95/96-RdRp (1.30 × 10-3 ± 1.08 × 10-3, n = 3) and GII.7-RdRp (3.71 × 10-5 ± 1.21 × 10-5, n = 3), respectively. This increase was not significantly different from the transversion mutation rate (p = 0.5).

The in vitro transversion mutation rates were used to estimate the number of substitutions per viral genome replication event (U) for each RdRp (75) (Table 3-2). U equals the RdRp error rate multiplied by the genome size (7555 nt for GII.4 and GII.7, 7579 nt for GII.3 NoV, and 9425 nt and 9408 nt for HCV 3a and 1b, respectively). The two HCV RdRps had the highest U values, with an average of 15.07 ± 4.96 substitutions per genome replication event (Table 3-2). The NoV RdRps had lower U values than HCV, with an average of 5.97 ± 1.96 substitutions per genome replication event for the four GII.4 RdRps, 1.15 substitutions per genome replication event for rGII.b-RdRp and 0.17 substitutions per genome replication event for GII.7-RdRp (Table 3-2).
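The definition of U given above (error rate multiplied by genome size) can be illustrated with a short calculation. The sketch below uses the mean transversion rates from Table 3-2 and the genome lengths quoted in the text for three of the enzymes; the function and dictionary names are illustrative only.

```python
def substitutions_per_genome(error_rate: float, genome_length_nt: int) -> float:
    """U = per-site error rate x genome length (substitutions per replication)."""
    return error_rate * genome_length_nt


# (mean in vitro mutation rate in nt subs/site, genome length in nt)
enzymes = {
    "GII.7": (2.21e-5, 7555),
    "HCV-3a": (1.97e-3, 9425),
    "HCV-1b": (1.23e-3, 9408),
}

u_values = {
    name: substitutions_per_genome(rate, length)
    for name, (rate, length) in enzymes.items()
}

for name, u in u_values.items():
    print(f"{name}: U = {u:.2f} substitutions per genome replication event")
```

Because U scales linearly with both inputs, the roughly 90-fold difference in error rate between HCV-3a-RdRp and GII.7-RdRp dominates the modest difference in genome length.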

3.4.2 Bioinformatic analysis of NoV capsid evolution

The in vitro fidelity assay described above provides a format to directly compare the mutation rate of viral RdRps. To achieve a second, independent comparison of the mutation rate for selected NoV GII genotypes, sequence data from four lineages were gathered and substitution rates were calculated by analysing sequence variation within ORF2, the capsid gene, over time. The capsid gene was chosen for two reasons: first, this region has the most sequence data available in nucleotide databases, and second, ORF2 encodes VP1, which contains the host receptor binding domains that determine antigenicity of the virus and therefore provides the best indication of host-driven evolution (251). The four lineages examined were GII.4, GII.7, GII.3 and GII.b/GII.3 (Figure 3-3). The capsid lineage derived from the recombinant strain GII.b/GII.3 was analysed independently of the wildtype GII.3 lineage in order to examine the influence of the RdRp (ORF1) on the rate of evolution of VP1 following a recombination event. The GII.4 capsid analysis included 54 GII.4 strains circulating between 1987 and 2008, with the oldest NoV strain, MD134-7/87/US, defined as the root (Figure 3-3A). The GII.3 capsid analysis included 11 GII.b/GII.3 and 14 GII.3 strains circulating between 1987 and 2006, with MD134-10/87/US defined as the ancestral strain (Figure 3-3B). Phylogenetic analysis indicated that the GII.b/GII.3 recombination event occurred prior to 2001 and that the new recombinant virus subsequently evolved away from the wildtype GII.3 strains (Figure 3-3B). Only five GII.7 strains with full-length capsid sequence, three of which were generated in this study, were available for analysis. The five GII.7 strains were isolated between 1990 and 2007, and Leeds/90/UK was defined as the ancestral strain in this study (Figure 3-3C).

Figure 3-3 Phylogenetic analysis of the amino acid sequence of the P2 domain from GII.4 (A), GII.b/GII.3 and GII.3 (B) and GII.7 (C) strains circulating between 1987 and 2008.

The phylogenetic tree was generated using the Neighbour-Joining method (246) by comparison of the P2 amino acid sequence (152 aa for GII.4 and GII.7, and 160 aa for GII.3) obtained for 84 NoV strains. GenBank accession numbers are included in the figure and precede the strain name. The percentage bootstrap values in which the major groupings were observed among 1000 replicates are indicated. The branch lengths are proportional to the evolutionary distance between sequences and the distance scale, in nucleotide substitutions per position, is shown.

Analysis of the sequence data revealed that GII.4 NoVs had the highest rate of evolution at 3.9 × 10-3 nucleotide substitutions/site/year (equivalent to 6.30 ± 0.39 nucleotide changes/capsid/year) (r2=0.84, n=54, p<0.0001) (Figure 3-4). The rates of evolution for the wildtype GII.3 strain, the GII.b/GII.3 recombinant strain and the GII.7 strain were lower at 1.9 × 10-3, 2.4 × 10-3 and 2.3 × 10-3 nucleotide substitutions/site/year, respectively (r2=0.28, n=14, p=0.004; r2=0.63, n=11, p<0.001 and r2=0.99, n=5, p=0.002) (Figure 3-4). Statistical analysis of the data suggested that the GII.4 rate of evolution was significantly higher than the GII.b/GII.3, GII.3 and GII.7 rates of evolution (p<0.010). The amount of selection (purifying and positive selection) occurring in the capsid gene from the GII.4, GII.b/GII.3 and GII.7 strains was examined by calculating the Ka/Ks ratio for each genotype individually. The Ka/Ks ratio generated for the GII.4 strains was higher (0.0912 ± 0.0322) than that of the GII.b/GII.3 strains (0.0862 ± 0.495) and the GII.7 strains (0.0437 ± 0.0235).

Figure 3-4 Rate of evolution for the GII.4, GII.7, GII.b/GII.3 and GII.3 strains.

The rate of evolution for each genotype was determined by calculating the number of nucleotide substitutions in ORF2 compared to the oldest strain in each lineage. The number of changes was then plotted against the year of that strain's detection. The rate of evolution was equivalent to the gradient of the line (GII.4 = 6.30 ± 0.39, r2 = 0.84; GII.b = 4.03 ± 0.80, r2 = 0.68; GII.3 = 3.09 ± 0.95, r2 = 0.49; GII.7 = 3.82 ± 0.25, r2 = 0.99) divided by the length of the capsid gene (1623 bp for GII.4 and GII.7, and 1647 bp for GII.3 and GII.b/GII.3).
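The conversion described in this legend, from a regression gradient (substitutions per capsid per year) to a per-site rate, can be sketched as follows. The gradients and gene lengths are those given above; the ordinary least-squares helper is an assumption about how such a gradient might be fitted (the fitting procedure, including whether the line was constrained through the origin, is not specified here), and the toy data used to exercise it are synthetic rather than sequence data from this study.

```python
from statistics import mean


def ols_slope(x, y):
    """Ordinary least-squares gradient of y on x (assumed fitting method)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def rate_per_site(gradient: float, gene_length_nt: int) -> float:
    """Convert substitutions/capsid/year into substitutions/site/year."""
    return gradient / gene_length_nt


# (gradient in nt changes/capsid/year, capsid gene length in nt) from the legend.
reported = {
    "GII.4": (6.30, 1623),
    "GII.b/GII.3": (4.03, 1647),
    "GII.3": (3.09, 1647),
    "GII.7": (3.82, 1623),
}
rates = {name: rate_per_site(g, n) for name, (g, n) in reported.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.2e} nt substitutions/site/year")

# Synthetic example only: substitutions accumulating at exactly 2 per year.
toy_years = [0, 5, 10, 20]
toy_substitutions = [0, 10, 20, 40]
toy_slope = ols_slope(toy_years, toy_substitutions)
```

The per-site values obtained this way lie within about 0.1 × 10-3 of the rates quoted in the text.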

3.4.3 Evolution hotspots within the NoV capsid

Sequence alignments of the capsid P2 domain from 54 GII.4 strains supported previously published data showing that 15 amino acid residues vary between the GII.4 pandemic clusters. These amino acids are 296 to 298, 333, 340, 355, 365, 368, 372, 393 to 395, 407, 412 and 413 (Supplementary information Figure 3-6A) (152, 235). Examination of the position of these 15 residues on the predicted secondary structure revealed that they clustered on the surface of six exposed loops of the P2 domain (Figure 3-5). Similar amino acid alignments for GII.7 and GII.3 revealed three and six hypervariable sites, which clustered onto two and four exposed loops of the P2 domain, respectively (Supplementary information Figure 3-6B and C). A structural alignment revealed that the hypervariable residues in GII.3 and GII.7 occupied spatial sites that overlapped with those of the hypervariable residues in GII.4 described above (Figure 3-5). In particular, the site occupied by 296 to 298, 365, 368 and 372 in GII.4 corresponded to 310 and 312 in GII.3; 333 in GII.4 corresponded to 389 in GII.3; 393 to 395 in GII.4 corresponded to 392 and 404 in GII.3; and 355 in GII.4 corresponded to 395 in GII.3 (Figure 3-5). GII.7 had only two variable regions, 352/354 and 396, and these occupied spatial positions similar to those of the GII.4 variable sites 296 to 298, 365, 368 and 372, and 393 to 395, respectively (Figure 3-5).

Figure 3-5 Hypervariable residues in GII.3, GII.4 and GII.7 are localised to common regions on the surface of the capsid P2 domain.

The structure of the GII.4 P1 and P2 domains was solved previously (PDB ID 2OBS (41)), while the structure of the P domain was predicted for GII.3 and GII.7 in this study. The locations of the hypervariable residues are indicated numerically and are coloured pink for all three genotypes. Residues occupying similar regions are depicted by the same coloured circle. The previously published hypervariable residues in the P2 domain of GII.4 were localised to six main regions on the surface of the P2 domain (152, 235). GII.3 had hypervariable residues in four of these regions and GII.7 had hypervariable residues in two of these regions.

3.5 Discussion

Over the last decade, five NoV pandemics have occurred, approximately one every two years, and all have been associated with a single NoV genotype, GII.4 (20, 38, 236, 258). The reason for the predominance of the GII.4 strains has been the subject of much speculation but is currently unknown, primarily due to a limited understanding of NoV population dynamics and evolution (3, 152, 235). Studies with other RNA viruses indicate that viral fitness is dependent on many factors, such as viral mutation, replication efficiency, population size and host factors (reviewed in (83)). To date, progress has been made in understanding the role host factors have in NoV prevalence, with several studies indicating that variations in viral docking to the blood group antigens may affect infectivity of individuals within a population (reviewed in (250)). In particular, GII.4 viruses bind to all blood group antigens, whereas GII.1 and GII.3 viruses bind fewer blood group antigens, and this could account for the higher prevalence of GII.4 viruses (250). This paradigm, however, remains controversial, especially for GII NoV, as not all studies show an association between blood group antigens and clinical infection (53, 105, 148). Apart from the host/viral interaction, no other factors have been associated with NoV fitness. Recent studies performed with poliovirus have shown that an increase in fidelity leads to less genetic diversity and subsequently a reduction in viral fitness and pathogenesis because of a reduced adaptive capacity of the virus (204, 268). It has been hypothesised that viruses are fitter if they are able to produce a more robust (diverse) population [reviewed in (69, 176, 225)]. In the current study, we examined whether there was a link between the epidemiological fitness of NoV strains, as defined by their incidence, and the rate and accuracy of viral replication.

In the present study, error rates were assessed directly by examining the mutation rate of the viral RdRp and by analysing the rate of evolution for selected GII lineages. Our results are consistent with mutation rates for the poliovirus RdRp (271) and retrovirus reverse transcriptases (143), which range between 10-3 and 10-5 (Table 3-2). The more prevalent GII.4 strains had a 5- to 36-fold higher mutation rate than the less frequently detected GII.b/GII.3 and GII.7 strains, as determined by in vitro enzyme assays. Consistent with this, the rate of evolution of the capsid was on average 1.7-fold higher in GII.4 viruses than in GII.3, GII.b/GII.3 and GII.7 viruses. The GII.4 capsids also had a larger Ka/Ks ratio than the GII.b/GII.3 and GII.7 strains, suggesting that the increased incidence/epidemiological fitness of the GII.4 strains may be through greater antigenic drift, a consequence of the higher mutation rate of the GII.4 RdRp.

The mutation rates for the control HCV RdRps (average of 1.6 × 10-3 substitutions per nucleotide site, Table 3-2) were 2-fold higher than those of the GII.4 RdRps. Previously published rates of evolution for the HCV hypervariable region 1 (HVR1) within the envelope 2 glycoprotein (E2) were also higher (6-fold) than the NoV GII.4 rates of evolution calculated in this study (174) (Table 3-2). HVR1 was chosen for comparison because, like the NoV capsid gene, it is the most variable region in the genome and under the greatest immune selection. Mutation rate and rate of evolution cannot be directly compared because they are only indirectly related, owing to the increased complexity of evolution in vivo (70). However, in this study we did find a common trend between the two different measurements of diversity, with HCV displaying a higher rate than NoV for both measurements.
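The fold differences quoted in the two preceding paragraphs can be recovered from the mean values in Table 3-2. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and the unweighted averaging of the four GII.4 enzymes is an assumption about how the comparison was made.

```python
from statistics import mean

# Mean in vitro mutation rates (nt subs/site) from Table 3-2.
gii4_rates = [9.06e-4, 8.34e-4, 8.87e-4, 5.54e-4]  # 2006b, rGII.4, US-95/96, 2006a
rgiib_rate = 1.53e-4
gii7_rate = 2.21e-5
hcv_rates = [1.97e-3, 1.23e-3]  # HCV-3a, HCV-1b

gii4_mean = mean(gii4_rates)  # assumed: unweighted mean of the four GII.4 RdRps

fold_gii4_vs_rgiib = gii4_mean / rgiib_rate
fold_gii4_vs_gii7 = gii4_mean / gii7_rate
fold_hcv_vs_gii4 = mean(hcv_rates) / gii4_mean

# Rates of evolution (nt subs/site/yr): HCV HVR1 versus NoV GII.4 capsid.
fold_hvr1_vs_gii4_capsid = 2.3e-2 / 3.9e-3

print(f"GII.4 vs rGII.b: {fold_gii4_vs_rgiib:.1f}-fold")
print(f"GII.4 vs GII.7:  {fold_gii4_vs_gii7:.1f}-fold")
print(f"HCV vs GII.4:    {fold_hcv_vs_gii4:.1f}-fold")
print(f"HVR1 vs capsid:  {fold_hvr1_vs_gii4_capsid:.1f}-fold")
```

Under these assumptions the ratios round to the 5-fold, 36-fold, 2-fold and 6-fold figures given in the text.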

Interestingly, the majority of non-synonymous mutations in the P2 domain for all three NoV genotypes were localised to six common structural sites. These six hypervariable regions within the P2 domain were consistent with hypervariable sites for GII.4 capsids already identified in other studies (152, 236). We demonstrated that GII.7 and GII.3 viruses shared two and four common hypervariable sites, respectively, with GII.4 viruses (Figure 3-5). Substitutions at one of these sites (residue 395) have been shown to alter the antigenic profiles of GII.4 strains (152). Localisation of the hypervariable sites to common regions on the surface of the P2 domain suggests that these regions are likely to be under immune pressure, possibly from a neutralising antibody response (148). The lower number of amino acid changes at these sites for viruses with a GII.3 capsid may explain why GII.b/GII.3 is predominantly associated with gastroenteritis cases in children (206). This suggests that GII.b/GII.3 viruses are not as efficient at escaping herd immunity as GII.4 strains and therefore only hosts immunologically naïve to GII.3 infection are susceptible. Similarly, we propose that the low prevalence of the GII.7 strain is also a consequence of a low mutation rate in the RdRp, resulting in limited antigenic drift and an inability to escape herd immunity.

Apart from mutation rate, replication rate is considered to be another major determinant of viral fitness (7). Replication rates are important because, given the same mutation rate, a faster-replicating virus would produce a larger heterogeneous population than a slower-replicating virus in the same unit of time. Interestingly, the RdRps from the recent 2006 GII.4 pandemic strains had a higher nucleotide incorporation rate than the recombinant GII.4 RdRp and the US-95/96 pandemic GII.4 RdRp, which could be associated with a point mutation in the RdRp (Lys291Thr). Residue 291 is located in the finger domain, which comprises five β sheets that run parallel and strongly interact with each other. The innermost of these five β sheets contains motif F, which interacts directly with incoming nucleotides (30). Therefore, it is plausible that substitutions at residue 291 affect the orientation of motif F, due to the strong interaction between the five β sheets, and subsequently alter the binding affinity for the incoming nucleotide. Fixation of the Lys291Thr point mutation in the GII.4 lineage after 2001 has been paralleled by a reduction in the period of stasis between the emergence of new antigenic variants (152). Alterations in residue 291 after 2001 could have led to an increase in the rate of evolution of GII.4 strains by increasing the replication rate; however, this did not seem to have an effect on mutation rate (Table 3-2). High replication rates did not always correlate with epidemiological fitness, as the NoV strain GII.7 had the highest incorporation rate but is considered to be the least fit because it has the lowest incidence. Therefore, this study suggests that mutation rate in combination with a high replication rate is a key determinant of epidemiological fitness. Influenza research also indicates a relationship between rate of evolution and epidemiological fitness (reviewed in (111)). New antigenic influenza A variants arise every one to two years and cause more annual epidemics than influenza B, as well as the more devastating pandemics (111). Once a population has accumulated mass herd immunity to a virus, the virus is forced to alter its antigenic determinants, a possibility for viruses with poor fidelity and fast replication rates, or face extinction (27), whereas viruses such as influenza B, which have higher fidelity and slower antigenic change, are more often associated with sporadic cases (111). In this study, a parallel can be seen between the epidemiology of NoV and that of influenza, in particular between GII.b/GII.3 viruses and influenza B, and between GII.4 viruses and influenza A.

In summary, this study supports the hypothesis that epidemiological fitness is a consequence of the ability of the virus to generate genetic diversity, as the NoV pandemic GII.4 strains were associated with an increased replication and mutation rate. Therefore, it would seem that GII.4 viruses, as opposed to GII.b/GII.3 and GII.7 viruses, have reached a balance in their replication rate and mutation rate that is better suited to viral adaptation. In contrast, it would seem that the GII.7 lineage, despite having a high replication rate, has a low mutation rate that limits its adaptation and therefore its incidence. It is important to improve our understanding of the mechanisms underlying NoV epidemiological fitness as future pandemics are expected.

3.6 Supporting information

Table 3-3 GenBank accession numbers for the RdRp genes used in this study

| RdRp name | GenBank accession | Strain |
|---|---|---|
| GII.4_US-95/96 | DQ078829 | Sydney348/1997/AU |
| rGII.4 | DQ078845 | Sydney4264/2001/AU |
| GII.4_2006a | EF187497 | NZ327/2006/NZ |
| GII.4_2006b | EF684915 | NSW696T/2006/AU |
| GII.7 | GQ849131 | Mc17/2002/TH |
| rGII.b | AY845056 | C14/2002/AU |
| 4_87_MD145-12 | AY032605 | MD145-12/1987/US |
| 4_93_Lordsdale | X86557 | Lordsdale/1993/GB |
| 4_94_Camberwell | AF145896 | Camberwell/1994/AU |
| 4_97_U1 | AB039775 | SaitamaU1/1997/JP |
| 4_01_Guangzhou | DQ369797 | Guangzhou/2003/CN |
| 4_01_Ast6139 | AJ583672 | Ast6139/2001/SP |
| 4_02_Farmington | AY502023 | FarmingtonHills/2002/US |
| 4_02_B4S6 | AY587985 | B4S6/2002/GB |
| 4_02_CS-G1 | AY502020 | CS-G1/2002/US |
| 4_04_Hunter | DQ078814 | NSW504D/2004/AU |
| 4_06_Saga5 | AB447458 | Saga5/2006/JP |
| 4_06_Saga1 | AB447456 | Saga1/2006/JP |
| 1_Hawaii | U07611 | Hawaii/1994/US |
| 3_U201 | AB039782 | SaitamaU201/1998/JP |
| 6_U3 | AB039776 | SaitamaU3/1997/JP |
| 8_U25 | AB039780 | SaitamaU25/1998/JP |
| a_Snow-Mt | AY134748 | SnowMountain/1976/US |
| b_Picton | AY919139 | Picton/2001/AU |
| d_Sydney2212 | AY588132 | Sydney2212/1998/AU |

Figure 3-6 Alignment of the amino acid sequences of the P2 domain.

This analysis included A) GII.4 strains circulating between 1987 and 2006, B) GII.3 strains circulating between 1987 and 2006, and C) GII.7 strains circulating between 1990 and 2008. Sequences include the 152 aa of the P2 domain from 54 GII.4 strains and 5 GII.7 strains, and 160 aa of the P2 domain from 25 GII.3 or GII.b/GII.3 strains. Sequences were aligned using MEGA4. The NoV sequences included in the alignment are the same as in Figure 3-4.

Ch. 4 –Phosphorylation of NoV RdRp

4 “Norovirus RNA-dependent RNA polymerase is phosphorylated by an important survival kinase, Akt”

John-Sebastian Eden*, Laura J Sharpe*, Peter A White† and Andrew J Brown†

School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia

*Denotes authors contributed equally.

†Denotes senior authors contributed equally.

Published in the Journal of Virology (2011) Vol. 85 (20); pp 10894 – 8

Author contributions: Conceived and designed the experiments – JSE LJS AJB PAW; Performed the experiments – JSE LJS; Analyzed the data – JSE LJS AJB; Contributed reagents/materials/analysis tools – JSE LJS AJB PAW; Wrote the paper – JSE LJS AJB PAW.

© American Society for Microbiology. Reprinted with permission.

4.1 Abstract

Viruses commonly use their host cells’ survival mechanisms to their own advantage. We showed that Akt, an important signalling kinase involved in cell survival, phosphorylates the RdRp from NoV, the major cause of gastroenteritis outbreaks worldwide. The Akt phosphorylation of RdRp appears to be a feature unique to the more prevalent NoV genotypes such as GII.4 and GII.b. This phosphorylation event occurs at a residue (Thr33) located at the interface where the RdRp finger and thumb domains interact, and decreases de novo activity of the polymerase. This finding provides fresh insights into virus-host cell interactions.

4.2 Chapter text

NoV is the most common cause of acute gastroenteritis outbreaks, causing vomiting and diarrhoea in affected individuals (85). Since 1996, five variants of the genogroup II, genotype 4 (GII.4) lineage have been associated with increasingly frequent global pandemics of acute gastroenteritis (236). Overall, GII.4 variants are the cause of 62-80% of all NoV outbreaks globally (73, 236). A number of factors contribute to this higher epidemiological fitness, including host-cell receptor binding patterns and duration of herd immunity [reviewed in (39)]. In addition, our group recently showed that the GII.4 viruses had a greater capacity to evolve than other NoV genotypes through higher rates of mutation (33). Therefore, the viral RdRp is likely to play an important role in driving viral evolution and fitness (33). Post-translational modifications are frequently found on viral proteins (124), including RdRps (162), and this led us to examine a potential post-translational modification of the polymerase from a representative pandemic NoV strain, GII.4 2006b (NSW696T/06/AU - GenBank EF684915). This variant was associated with a global pandemic in 2007-08 (236) and caused three consecutive epidemics of gastroenteritis in Australia in 2006 (258), 2007 and 2008 (81).

Viruses take advantage of the host cells’ existing pathways and mechanisms, and one of these is the phosphatidylinositol 3-kinase (PI3K)/Akt signalling pathway (32). Akt is a serine/threonine protein kinase that phosphorylates downstream targets, leading to numerous effects, with the net result of increasing cell growth, survival and proliferation. Consequently, perturbations of the Akt signalling pathway have been implicated in the development of diseases including many human cancers (5) and diabetes (293). With such a fundamental role in the regulation of cell growth and proliferation, it is perhaps not surprising that a number of viruses have been found to interact directly with the PI3K/Akt pathway to effect greater control of their replication within the host cell. In HCV, the expression of viral proteins including NS3/4A, NS4B and NS5A results in the activation of the PI3K/Akt pathway [reviewed in (21)], which is required for efficient virus replication (29). Similarly, influenza A virus activates the PI3K/Akt pathway through the viral NS1 protein (82). This appears to suppress the induction of apoptosis, which may arrest the cell at a stage that supports viral replication. However, not all viruses activate the PI3K/Akt pathway. For example, measles virus infection leads to the down-regulation of Akt, which appears to play a role in immune suppression (44). Given the common exploitation of Akt signalling by viruses, we sought to determine whether the NoV RdRp may be a substrate for Akt.

Akt phosphorylates proteins at the minimal consensus sequence of RxRxx(S/T)h (where x is any amino acid, and h is a bulky hydrophobic amino acid) (2). This minimal consensus motif was supported by another peptide library screening approach (190). However, there is considerable variability from this consensus in the sequences surrounding Akt phosphorylation sites (232). Therefore, to help determine the potential for phosphorylation to occur on the NoV GII.4 2006b RdRp, we interrogated the amino acid sequence using Scansite (191). This program predicts various phosphorylation sites and protein binding motifs, including likelihood (stringency), based on minimal consensus sequences. This indicated that the Thr33 residue of the 2006b RdRp was predicted as both a high stringency Akt substrate site and 14-3-3 zeta binding site. The RdRp Thr33 residue is mostly conserved between pandemic strains, and almost absent in non-pandemic strains (Table 4-1, χ2: p<0.001).
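To illustrate what a strict reading of the minimal consensus RxRxx(S/T)h implies, the sketch below scans a sequence with a regular expression. It is not a reimplementation of Scansite, whose stringency scoring is not reproduced here. The set of residues treated as "bulky hydrophobic" is an assumption, as are the function name and the test peptide: the N-terminal residues of human GSK-3β, a well-characterised Akt substrate phosphorylated at Ser9, are used purely as an illustrative input and are not data from this study.

```python
import re

# Assumption: residues counted as "bulky hydrophobic" (h) for this illustration.
BULKY_HYDROPHOBIC = "FLIMWYV"

# Lookahead allows overlapping candidate sites to be reported.
AKT_MINIMAL_MOTIF = re.compile(rf"(?=R.R..[ST][{BULKY_HYDROPHOBIC}])")


def find_strict_akt_sites(sequence: str) -> list[int]:
    """Return 1-based positions of S/T residues within strict RxRxx(S/T)h matches."""
    # The S/T residue lies five residues after the first Arg of the motif.
    return [m.start() + 6 for m in AKT_MINIMAL_MOTIF.finditer(sequence.upper())]


# Illustrative positive control: GSK-3beta N-terminus (Ser9 is the Akt site).
gsk3b_sites = find_strict_akt_sites("MSGRPRTTSFAES")

# RdRp residues 28 to 34 of GII.4 2006b, as listed in Table 4-1.
nov_2006b_sites = find_strict_akt_sites("FWRSSTA")

print("GSK-3beta:", gsk3b_sites)
print("GII.4 2006b RdRp 28-34:", nov_2006b_sites)
```

The GII.4 2006b heptamer does not satisfy the strict pattern (it lacks the Arg at position -5 and a bulky hydrophobic residue at +1), which is consistent with the point made above that sequences surrounding Akt phosphorylation sites vary considerably from the consensus and motivates the use of a scoring tool rather than exact matching.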

Table 4-1 RdRp strains and their predicted Akt phosphorylation sites and prevalences

| Genotype | Strain | GenBank accession | RdRp sequence [a] | Scansite score [b] | Prevalence [c] |
|---|---|---|---|---|---|
| GI.f | Otofuke | AB187514 | FWKSTPQ | None predicted | ● |
| GI.1 | Norwalk | NC_001959 | FWRSSPE | None predicted | ● |
| GI.2 | WUG1 | AB081723 | FWKSSPE | None predicted | ● |
| GI.4 | Chiba | AB042808 | FWRSTPE | None predicted | ● |
| GI.5 | SzUG1 | AB039774 | FWRSTTE | 0.464 | ● |
| GI.6 | Hesse | AF093797 | FWRSTPE | None predicted | ● |
| *GII.b | C14 | AY845056 | FWRSSTT | 0.407 | ●●● |
| GII.c | Snow Mountain | AY134748 | FWRSSTV | 0.056 | ● |
| GII.d | OsakaNI | DQ366347 | FWRSSNS | None predicted | ● |
| GII.g | St George | GQ845370 | FWRSSTA | 0.159 | ●●● |
| GII.4 | 2010 | GQ845367 | FWRSSTA | 0.159 | ●●●●● |
| GII.4 | 2006a | EF187497 | FWRSSTA | 0.159 | ●●● |
| *GII.4 | 2006b | EF684915 | FWRSSTA | 0.159 | ●●●●● |
| GII.4 | Hunter | DQ078814 | FWRSSTA | 0.159 | ●●●●● |
| GII.4 | Farmington | AY502023 | FWRSSTA | 0.159 | ●●●●● |
| GII.4 | Lordsdale | X86557 | FWRSSTA | 0.159 | ●●● |
| GII.4 | US-1995/96 | DQ078829 | FWRSSTT | 0.407 | ●●●●● |
| GII.4 | Osaka | AB541319 | FWRSSTT | 0.407 | ●●● |
| GII.1 | Hawaii | U07611 | FWRSSTT | 0.407 | ● |
| GII.2 | MK04 | DQ456824 | FWRSSTT | 0.407 | ● |
| GII.3 | Saitama U18 | AB039781 | FWRSSNA | None predicted | ● |
| GII.5 | Neustrelitz260 | AY772730 | FWRSSNT | None predicted | ● |
| GII.6 | Saitama U16 | AB039778 | FWRSSPD | None predicted | ● |
| *GII.7 | Mc17 | GQ849131 | FWRSSPD | None predicted | ● |
| GII.11 | Swine43 | AB126320 | FWRSSTA | 0.159 | ● |
| GIII.1 | Jena | AJ011099 | FWRSSPA | None predicted | ● |
| GIII.2 | Newbury2 | AF097917 | FWRSSPA | None predicted | ● |
| GIV.1 | NSW268O | JQ613567 | FWRSSVE | None predicted | ● |
| GV.1 | MNV-CW1 | DQ285629 | FWRTSPE | None predicted | ● |
| GV.1 | NIH-2409 | JF320644 | FWRTSPD | None predicted | ● |

[a] RdRp residues 28 to 34 relative to GII.4 2006b variant NSW696T/06/AU (GenBank accession EF684915). Residues shown in boldface characters represent the predicted phosphorylation site.
[b] Values refer to Scansite stringency scores: <0.2, high stringency (indicated with grey shading and boldface characters); 0.2 to 1, medium stringency (indicated with grey shading).
[c] Prevalence data are presented as follows: ●●●●● indicates that the strain was associated with a global pandemic, ●●● indicates that the strain was associated with a regional epidemic, and ● indicates that the strain was associated with a local outbreak or with sporadic disease in the natural host.
* The strain was used in the experiments whose results are presented in Figure 4-1.

In light of the in silico predictions and the complementary expertise of the investigators (81, 232), we decided to confirm Akt-mediated phosphorylation of purified recombinant RdRp. Briefly, the coding region for NoV RdRp with a C-terminal hexa-histidine tag was cloned into pGEX-4T-1 (GE Life Science, Uppsala, SE), then expressed and purified as previously described (33). An N-terminal glutathione-S-transferase (GST) tag was included to increase the size of the RdRp to allow separation of the RdRp away from the Akt protein by SDS-PAGE. An in vitro Akt kinase assay (232) was employed to compare wild-type RdRp (GII.4 2006b) with a mutated version, Thr33Ala, in which the predicted phosphorylation site had been mutated to alanine. To detect phosphorylation, we made use of an antibody designed for identification of phosphorylated Akt substrates [PAS, Cat #9611, Cell Signaling Technology, Danvers, US (29)]. Our results showed clear Akt-dependent phosphorylation of the wild-type GII.4 2006b RdRp (Figure 4-1), which was absent in the mutated version. A second antibody that is more specific for phosphorylated Akt substrates (Cat #9614, also from Cell Signaling Technology) also detected wild-type phosphorylated RdRp and not the mutated form (data not shown). In addition, we tested a representative RdRp with a medium stringency site (GII.b C14) (C14/2002/AU - GenBank AY845056), and one with no predicted phosphorylation site (GII.7 Mc17) (Mc17/2001/TH - GenBank AY237413). This indicated that the medium stringency sequence was also phosphorylated at Thr33, but the GII.7 RdRp with no corresponding Thr residue was not (Figure 4-1).


Figure 4-1 Akt phosphorylates norovirus RdRp at Thr33.

Wildtype (WT) or mutant (T33A) recombinant NoV GII.4 2006b, GII.b C14, or GII.7 Mc17 RdRp (1 μg) were incubated with or without recombinant Akt (400 ng) as indicated at 30°C for 30 min. Reactions were stopped by addition of loading buffer, and samples were subjected to SDS-PAGE and Western blotting with Phospho-Akt Substrates (PAS) antibody, followed by stripping and probing for RdRp and Akt. Representative of three independent experiments. The bold residue in the sequence indicates the residue corresponding to Thr33 in WT GII.4 2006b RdRp. Stringency values refer to Scansite stringency: <0.2 = high stringency and 0.2-1 = medium stringency.

Akt’s phosphorylation of NoV RdRp may provide additional insights into the evolutionary strategies by which viruses exploit the host for their own benefit. The Akt pathway is active when cells are proliferating rapidly, so it is possible that NoV takes advantage of this situation to increase its survival. How this phosphorylation event may influence the function of RdRp is unclear, since there is currently very little information linking structure and function of the NoV RdRp, largely due to the lack of a cell culture system for the virus. Most of the important residues have been predicted from published crystal structures (185, 291) and comparison of structures to functionally defined residues in poliovirus and foot and mouth disease virus [e.g. (257)]. The phosphorylated Thr33 residue is located at the tip of the finger domain which interacts with the top of the thumb and encloses the active site (Figure 4-2A and B).


Figure 4-2 3D-modelling of norovirus RdRp showing finger-thumb domain interactions at Thr33.

Panel A shows the front view of the NoV GII.4 2006b RdRp highlighting the finger (red), thumb (blue) and palm (yellow) domains. Panel B shows the top view highlighting the fingertips (red) and C-terminal region (cyan). Panel C shows a close-up view of the boxed region of panel B showing the finger-thumb interactions at Thr33 for 2006b RdRp. Images produced using PyMOL, v1.3, Schrödinger, LLC.

A study by Thompson et al. suggests that this structure, which is unique to RdRps, plays a role in stabilising poliovirus RdRp (257). These same residues may play a similar stabilising role in NoV RdRp, allowing a conformation that facilitates RNA synthesis. Homology modelling predicts extensive inter-domain interactions within this region of the RdRp including two polar bridging interactions that link the finger and thumb domains at Thr33 in the GII.4 2006b RdRp (Figure 4-2C). When the Thr33 residue is phosphorylated, the negative charge of the phosphate should repel the negatively charged side chain of Glu468, altering the dynamics of these inter-domain finger/thumb interactions.

The HCV RdRp (NS5B) has been shown to be phosphorylated by Protein Kinase C-related Kinase 2, and that phosphorylation event played a role in regulating HCV RNA replication (134). To explore the possibility that the phosphorylation of Thr33 is functionally significant, we compared the enzyme kinetics of the wild-type 2006b RdRp with a mutated form that should mimic phosphorylation (Thr33Glu), using a de novo GTP incorporation assay as previously described (33). The phosphomimetic mutant had a lower maximum enzyme velocity (Vmax = 100 fmol.min⁻¹) compared to the wild-type RdRp (Vmax = 125 fmol.min⁻¹), indicating a slower rate of RNA synthesis (Figure 4-3).

The phosphomimetic mutant also had a lower affinity for the GTP substrate with a Km value of 10.4 μM, compared to 4.8 μM for the wild-type RdRp. The differences in enzyme kinetics between the wild-type and phosphomimetic RdRp suggest that the phosphorylation of Thr33 may modulate the function of the polymerase. In HCV, it has been shown that disruptions to the interactions between the finger and thumb domains will lead to a reduction in de novo RdRp activity, while efficient primer extension activity is maintained (51, 52, 237). Since NoV utilises both de novo and protein primed RNA synthesis, it is possible that phosphorylation of NoV RdRp could alter the balance between these modes of RNA synthesis.
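To make the reported parameters concrete, the following minimal sketch evaluates the predicted incorporation rate for both enzymes across a few GTP concentrations. It assumes a Hill coefficient of 1, which reduces the Hill equation to the Michaelis-Menten form; the fitted Hill coefficients are not stated in the text, so the function name `hill_rate`, its default `h` and the chosen concentrations are illustrative assumptions rather than the analysis that was run in GraphPad Prism.

```python
def hill_rate(s_um, vmax, km_um, h=1.0):
    """Rate (fmol/min) at substrate concentration s_um (uM) under the Hill equation.

    h = 1.0 is an assumption; it gives the Michaelis-Menten special case.
    """
    return vmax * s_um**h / (km_um**h + s_um**h)


# Parameters reported in the text (Vmax in fmol/min, Km in uM).
WILD_TYPE = {"vmax": 125.0, "km_um": 4.8}
T33E = {"vmax": 100.0, "km_um": 10.4}

if __name__ == "__main__":
    # Illustrative GTP concentrations; 51 uM is the top of the reported range.
    for gtp in (1.0, 4.8, 10.4, 51.0):
        wt = hill_rate(gtp, **WILD_TYPE)
        mut = hill_rate(gtp, **T33E)
        print(f"[GTP] = {gtp:5.1f} uM  WT = {wt:6.1f}  T33E = {mut:6.1f}  ratio = {mut / wt:.2f}")
```

Under these assumptions, the combined effect of the lower Vmax and higher Km is most pronounced at low substrate concentrations, where the rate depends on Vmax/Km rather than on Vmax alone.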


Figure 4-3 Comparison of enzyme kinetics of the wild-type 2006b RdRp with a Thr33Glu phosphomimetic mutant.

The enzyme kinetics of the wild-type 2006b RdRp (Wild) were compared with those of a Thr33Glu phosphomimetic mutant (T33E). Briefly, 50 ng RdRp was added to initiate a reaction in which 3H-GTP was incorporated onto homopolymeric C RNA. After a ten minute incubation, the reaction was stopped and harvested onto a filter-plate, and the incorporated nucleotides were detected via scintillation counting. Samples were tested in triplicate and background values were subtracted to obtain the measure of incorporated nucleotides as fmol.min⁻¹. This was measured across a range of GTP substrate concentrations, increasing to 51 μM. The data were analysed using the Hill equation in GraphPad Prism, v5.04. The kinetic parameters are presented with standard deviations.


In contrast to examples of viral exploitation of the Akt pathway (21, 44, 82), a decrease in polymerase activity could indicate that Akt phosphorylation of the NoV RdRp is an anti-viral strategy by the infected cell. However, this may be hard to reconcile with our observation that the Akt phosphorylation event occurs in the most epidemiologically fit genotype, GII.4 and is generally absent from others. Further work will be needed to distinguish between these possibilities and to determine the viral outcomes of this phosphorylation event.

Phosphorylation may not just alter function directly, but could act as a molecular signal for a binding partner. In line with this, a high stringency 14-3-3 zeta binding site was predicted by Scansite on the phosphorylated Thr33 residue. 14-3-3s are ubiquitous proteins that bind to phosphorylated residues; therefore further alterations of protein function could occur (181). In this case, due to the high intrinsic disorder within this region of the polymerase, it is possible that 14-3-3 binding may act as a clamp that fixes the polymerase into a functionally significant conformation. The interaction of 14-3-3 zeta and NoV RdRp may also affect the localisation within the cell to aid evasion of the host cell recognition system. The possibility that phosphorylation of NoV RdRp may affect intermolecular interactions represents another interesting avenue for further research.

Here, we have discovered a novel Akt substrate, a viral polymerase, and identified its phosphorylation site. This residue appears to be critical for the activity of the NoV RdRp, based on our in vitro polymerase assay. Importantly, it is routinely present in pandemic strains, but mostly lacking in non-pandemic genotypes. Future work should include confirming the NoV RdRp phosphorylation event in vivo, determining its impact on viral replication and whether this favours the host or the virus. In this respect, it would be worthwhile to engineer the phosphorylation site into the existing murine NoV infectious cell culture system. The current work provides a good basis for future efforts to better understand how phosphorylation of viral proteins may affect replication.


Ch. 5 – NoV evolutionary dynamics

5 “Contribution of intra- and inter-host dynamics to norovirus evolution”

Rowena A Bull(a,b)*, John-Sebastian Eden(a)*, Fabio Luciani(b), Kerensa McElroy(a,b), William D Rawlinson(a,b,c) and Peter A White(a)

(a) School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
(b) School of Medical Sciences, Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia
(c) Virology Division, Department of Microbiology, SEALS, Prince of Wales Hospital, Randwick, NSW, Australia

*Denotes authors contributed equally.

Published in the Journal of Virology (2012) Vol. 86 (6); pp 3219 – 29

Author contributions: Conceived and designed the experiments – RAB JSE PAW; Performed the experiments – JSE; Analyzed the data – RAB JSE; Contributed reagents/materials/analysis tools – RAB JSE FL KM WDR PAW; Wrote the paper – RAB JSE PAW.

© American Society for Microbiology Reprinted with permission


5.1 Abstract

NoV is an emerging RNA virus that has been associated with global epidemics of gastroenteritis. Each global epidemic arises following the emergence of novel antigenic variants. While the majority of NoV infections are mild and self-limiting, in the young, elderly and immune-compromised, severe and prolonged illness can result. As yet there is no vaccine or therapeutic treatment to prevent or control infection. In order to design effective control strategies it is important to understand the mechanisms and source of the new antigenic variants. In this study, we used NGS technology to investigate genetic diversification in three contexts: (i) the impact of a NoV transmission event on viral diversity; (ii) the contribution to diversity of intra-host evolution over a short period of time (10 days), in accordance with a typical acute NoV infection; and (iii) the contribution of intra-host evolution over a prolonged period (288 days), as observed in chronic NoV infections of immune-compromised individuals. Investigation of the transmission event revealed that minor variants at frequencies as low as 0.01% were successfully transmitted, indicating that transmission is an important source of diversity at the inter-host level of NoV evolution. Our results also suggest that chronically infected immune-compromised subjects represent a potential reservoir for the emergence of new viral variants. In contrast, in a typical acute NoV infection the viral population was highly homogenous and relatively stable. These results indicate that evolution of NoV occurs through multiple mechanisms.


5.2 Introduction

NoV is a rapidly evolving RNA virus that causes global epidemics of acute gastroenteritis (33, 151, 234, 236), approximately biennially since 2002 (236). These global epidemics are associated with the emergence of novel, antigenically distinct variants of the genogroup II, genotype 4 (GII.4) lineage that cause significant morbidity particularly in the young, elderly and immune-compromised (151, 234).

NoV has a single-stranded RNA genome of approximately 7.5 kb that is divided into three ORFs (120). ORF1 encodes all non-structural proteins involved in viral replication (18). ORF2 is the best characterised region of the NoV genome as it encodes the viral capsid protein, VP1, which contains the antigenic domains and the receptors that determine viral entry. VP1 itself can be divided into three structural domains (212). A conserved shell domain exists at the N-terminus leading into a protruding central stem, the P1 domain, which has a hypervariable insert termed the P2 domain. The P2 domain is the most surface exposed region of the viral capsid and is therefore believed to be involved in immune escape from neutralising antibodies (4, 61, 150-152). The P2 domain also contains residues involved in HBGA binding (41, 230, 252). These polymorphic carbohydrates are thought to be attachment factors for NoV (166, 230). ORF3 encodes a small basic protein, VP2. Although the exact function of VP2 is yet to be determined, it is believed to support viral capsid assembly through the stabilisation of VP1 (19).

Despite the large amount of sequence diversity arising between the global outbreak GII.4 variants (approximately 5% nucleotide difference across ORF2), minimal diversity has been observed within a global outbreak season, which raises the question of where these new variants originate. The inter-host evolutionary trends of NoV have been frequently compared to those of influenza virus (33, 235). However, in influenza virus, in addition to viral diversity generated from infections within the human population, new variants also emerge from a zoonotic source following re-assortment events between human, avian and/or swine strains, such as with the emergence of the swine-origin H1N1 2009 pandemic strain (240). NoV strains have been identified in a wide range of animals including pigs, cows, dogs, sheep and mice (129, 153, 168, 270, 282). Furthermore, human NoVs have been shown to infect some non-human primates and pigs under experimental conditions (24, 220, 245). Despite this, no example of zoonotic transmission from an animal to human has been reported. Therefore, the current evidence suggests that the evolution of the human NoV variants is confined to the human population. Analogous to re-assortment in influenza, NoV has a mechanism of recombination that facilitates the interchange of non-structural and structural genomic regions at the ORF1/2 overlap when co-infection occurs (34, 37). The exchange of antigenic elements has also been reported through recombination at the capsid P1/P2 domain boundaries (152). Therefore recombination is likely to be an important mechanism for the emergence of new NoV variants.

In addition to understanding the impact of recombination on NoV evolution, it is also important to understand NoV between-host dynamics, as transmission events will determine which variant will persist in the host population. In human immunodeficiency virus (HIV) and HCV evolutionary studies, a strong genetic bottleneck occurs following a transmission event, where on average only 1-3 viruses are transmitted to the new host (36, 89, 100, 131, 224). Strong functional constraints on the transmitted variants are believed to drive this bottleneck event [reviewed in (130)]. However, both HIV and HCV are associated with chronic infection. In-depth viral population analyses of acute viral infections caused by rhinovirus and equine influenza virus revealed that transmission events were not characterised by strong genetic bottlenecks, but rather by co-infection of a cloud of closely related variants (55, 180).

Intra-host dynamics is another source of NoV genomic diversification. It has been suggested that persons chronically infected with NoV may be a source of new variants as mutations accumulate over the course of infection (233). However, currently very little is known about the patterns of evolution at the intra-host level and how this contributes to the overall evolution at the inter-host level and the subsequent emergence of new global epidemic variants. Of the studies that have examined NoV intra-host evolution to date, the majority have been performed on immune-compromised individuals with chronic NoV infection (43, 227, 233). Despite its clinical significance, the prevalence of chronic NoV infection in the population is unknown. To date, chronic NoV infections have been identified in a range of settings where the immune status of an individual was compromised, such as transplant recipients, HIV-positive individuals and patients with leukaemia (22, 42, 227, 280). In addition, all NoV evolution studies have relied on traditional sequencing methods such as Sanger sequencing (3, 43, 235, 259, 287). No study has looked in-depth at the intra-host evolution of an acute or chronic NoV infection at high resolution using NGS methods. As most NoV infections within an epidemic season will occur between healthy individuals, quantifying and understanding the extent of diversification generated by the viral population within an individual host is important for determining the source of NoV diversity. The development of NGS enables an in-depth understanding of viral population dynamics (36, 215). For example, NGS of influenza virus has revealed that drug resistant variants are present in the viral population at low levels and only emerge when drug therapy is initiated (93). This technology can therefore be applied to test whether novel NoV variants are already present at low levels in the viral population, such that under the right selection pressure an alternative variant could emerge to dominate the inter-host population.

In terms of NoV evolution, understanding transmission events and the potential differences between acute and chronic infections is important for vaccine development and anti-viral therapy design. Given that NoV vaccines have entered the initial stages of clinical trials, the need for this information is pressing. Therefore, in this study we investigated the evolutionary dynamics of NoV during a transmission event, within a typical acute NoV infection and also within an atypical chronic NoV infection in an immune-compromised host.


5.3 Materials and methods

5.3.1 Cohort

Transmission cohort: Stool specimens were collected from three subjects, all from the same family, where a five year old boy (DS) was known to infect both his father (RF) and grandfather (RG) within a week after the presentation of symptoms (Table 5-1).

Table 5-1 Description of the transmission cluster cohort

Subject                  ID  Virus
Donor - Son              DS  GII.g/GII.12
Recipient - Father       RF  GII.g/GII.12
Recipient - Grandfather  RG  GII.g/GII.12

Clinical description (shared by all three subjects): A young boy returned from child care with symptomatic acute gastroenteritis. The father and grandfather were then exposed to vomitus and stools that day within a similar time frame.

Longitudinal cohort: Longitudinal samples were collected from two de-identified subjects through the Department of Microbiology, SEALS, Prince of Wales Hospital, Sydney, Australia. One subject had a typical acute NoV infection (referred to as Ac) and the other had an atypical chronic NoV infection (Ch). Two stool specimens were collected from the acute subject at day one (Ac_1) and day ten (Ac_10), whilst the chronic subject had three stool specimens collected at day one (Ch_1), day four (Ch_4) and day 288 (Ch_288) (Table 5-2).

Table 5-2 Description of the longitudinal study cohort

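Subject            Subject ID  Day of specimen collection  Specimen  Virus        Clinical description
Acute infection    Ac          1                           Ac_1      GII.4 2006b  Nosocomial infection of immunocompetent individual
Acute infection    Ac          10                          Ac_10     GII.4 2006b  Nosocomial infection of immunocompetent individual
Chronic infection  Ch          1                           Ch_1      GII.4 2006b  Chronic infection of infant with a severe undefined immunodeficiency
Chronic infection  Ch          4                           Ch_4      GII.4 2006b  Chronic infection of infant with a severe undefined immunodeficiency
Chronic infection  Ch          288                         Ch_288    GII.4 2006b  Chronic infection of infant with a severe undefined immunodeficiency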


Day one was defined as the first day of collection and not the first day of disease onset. All stool specimens were collected between 2010 and 2011 then stored at -20°C until required.

5.3.2 NoV capsid RT-PCR

Viral RNA was extracted from 20% (v/v) stool suspensions using the QIAamp Viral RNA mini kit (Qiagen) and initially genotyped as previously described (81). A one-step RT-PCR was then employed to amplify a region of the NoV genome that included the complete ORF2 (which encodes VP1) and partial ORF3 using the SuperScript III One-Step RT-PCR System with Platinum Taq High Fidelity (Invitrogen) and the primers described in Table 5-3. The RT-PCR products (~1.9 kb) were gel purified using the QIAquick Gel Extraction Kit (Qiagen) and then quantified using a NanoDrop 1000 spectrophotometer (Thermo Scientific, Wilmington, US).

Table 5-3 Oligonucleotides used in this study

Primer  Sequence (5' - 3') (a)       Orientation  Position (b)  Genotype
GV305   CAGRCAAGAGCCAATGTTCAGATGG    Sense        4999          GII.4
GV306   GGCCTCAATTTGTGCTTGGAGC       Antisense    6916          GII.4
GV308   GCTTGGAGCATCTCTTTRTCATG      Antisense    6903          GII.4
GV315   CAAGGCAAGAGCCAATGTTTCGATGG   Sense        5004          GII.g
GV316   CTGGTGATGATGTATTTACTGTCTCC   Sense        5662          GII.12
GV317   GTGGCTYGAATTTGAGCTTGCAGC     Antisense    6876          GII.12

(a) Degenerate bases are R (A or G) and Y (T or C).
(b) Relative to the sequence of Farmington Hills (GenBank accession number AY502023) for GII.4 and relative to the sequence of NSW199U (GenBank accession number GQ845370) for GII.g/GII.12.
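As an illustration of the degenerate bases noted under Table 5-3, the short sketch below expands a degenerate primer into the unambiguous oligonucleotides it represents. The function name `expand_degenerate` and the restricted IUPAC lookup (only R and Y, the two codes used in the table) are assumptions made for this example and were not part of the original methods.

```python
from itertools import product

# IUPAC codes needed for the primers in Table 5-3 (only R and Y are degenerate).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}


def expand_degenerate(primer):
    """Return every unambiguous sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, seq in [("GV305", "CAGRCAAGAGCCAATGTTCAGATGG"),
                      ("GV317", "GTGGCTYGAATTTGAGCTTGCAGC")]:
        print(name, expand_degenerate(seq))
```

Each primer in the table carries at most one degenerate position, so each expands to at most two sequences.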

5.3.3 DNA sequencing and analysis

Purified PCR products were sequenced directly on an ABI 3730 DNA Analyser (Applied Biosystems) using dye-terminator chemistry. Pairwise alignments of DNA and protein sequences were generated, and evolutionary distances between sequences were calculated, using programs within MEGA5 (247).

Protein structure homology modelling of the viral capsid was performed by generating a Protein Data Bank (PDB) file from the amino acid sequence in FASTA format using software on the Swiss-Model Server, Swiss Institute of Bioinformatics (9, 132, 203). Protein structures were then visualised and manipulated using PyMOL v1.4.1 (Schrödinger, Portland, US). Informative site analysis of the ORF2 coding region was performed using the DIVEIN webserver (64).

5.3.4 454 Roche FLX sequencing

Purified RT-PCR amplicons were submitted for library preparation before subsequent NGS (165) using a 454 Roche FLX Titanium at Murdoch University, Perth, Australia. Samples were barcoded and then combined in one lane on an eight gasket plate. Analysis of the 454 data was conducted as previously described (36). In brief, sequence reads were removed prior to the assembly stage if they were: shorter than 55 bp and had an average quality score <20. The terminal 20 nt were removed from all remaining reads. These remaining sequences were aligned with a nucleotide identity threshold of 95% against the unique consensus sequence for each subject with the alignment tool, MOSAIK (http://bioinformatics.bc.edu/marthlab/Mosaik). The consensus sequence for each subject was derived from sequencing the gel purified RT-PCR products on the ABI 3730 DNA Analyser. The quality of the aligned file was assessed and reads were excluded from the alignments according to the standards outlined previously (36).
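The read pre-processing described above can be sketched roughly as follows. This is not the pipeline from (36); it is a minimal illustration under two stated assumptions: that a read failing either criterion (length or mean quality) is discarded, and that the "terminal 20 nt" are trimmed from the 3' end. The function name `filter_and_trim` and the in-memory read representation are hypothetical.

```python
MIN_LENGTH = 55         # bp, from the text
MIN_MEAN_QUALITY = 20   # phred score, from the text
TRIM_3PRIME = 20        # nt; assumed here to be removed from the 3' end


def filter_and_trim(reads):
    """Filter and trim reads.

    reads: iterable of (sequence, qualities) pairs, where qualities is a list
    of per-base phred scores of the same length as the sequence.
    Returns the retained reads with their 3' terminal bases removed.
    """
    kept = []
    for seq, quals in reads:
        # Assumption: a read is discarded if it fails EITHER criterion.
        if len(seq) < MIN_LENGTH or sum(quals) / len(quals) < MIN_MEAN_QUALITY:
            continue
        kept.append((seq[:-TRIM_3PRIME], quals[:-TRIM_3PRIME]))
    return kept


if __name__ == "__main__":
    toy_reads = [
        ("A" * 100, [30] * 100),  # passes both criteria
        ("A" * 40, [30] * 40),    # too short
        ("A" * 100, [10] * 100),  # mean quality too low
    ]
    for seq, quals in filter_and_trim(toy_reads):
        print(len(seq), sum(quals) / len(quals))
```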

5.3.5 Cloning and colony PCR

The same purified ORF2/partial-ORF3 RT-PCR products submitted for 454 sequencing were TA cloned using the pGEM-T Easy Vector System (Promega). Individual colonies were screened by PCR for inserts of the appropriate size with vector specific primers. Five positive amplicons from each sample were then purified using ExoSAP-IT (GE Life Science) and sequenced directly.

5.3.6 SNP detection and haplotype reconstruction

Single nucleotide polymorphism (SNP) detection from the aligned 454 reads was performed with the SNP caller VarScan (136). A further manual check was performed around homopolymeric regions. Additionally, since 454 sequencing has an intrinsic error rate of 1% (95), a minimum quality score threshold was assigned for SNP calling. The quality score combines a variety of information on noise, false-calls and errors from homopolymeric stretches to produce a probability-of-error value using the same phred algorithm employed in Sanger sequencing (86). The quality scores range from 0 (worst) to 40 (best). Most studies use a cut-off of 20 [reviewed in (187)], since this score represents an accuracy of at least 99% (165). In this study a conservative approach was taken and the threshold was raised to 25, the median score of all the individual bases in the reads. Aligned 454 reads were further analysed with a Bayesian probabilistic method implemented in the software package ShoRAH (290), as previously described (36). This software was used to reconstruct NoV haplotypes (variants) over the capsid domain (the ORF2 coding region, ~1620 nt in length), and to estimate the frequency of occurrence of each re-assembled variant within the sample. A window size of 330 and step size of 110 were used. The resulting population of capsid variants was compared to the available sequences obtained from standard cloning and sequencing. The ShoRAH analyses were performed in triplicate for each dataset to ensure that the stochastic nature of Bayesian inference based on Markov chain Monte Carlo simulations was not affecting the results. Only SNPs and variants detected in all three simulation runs with a frequency of >2.5% were considered for further analyses, as it has been previously shown that chimeric variants may be reconstructed below this frequency [see (36) for a detailed explanation of these parameters and validation of the haplotype reconstruction method].
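Two of the thresholds described above lend themselves to a short worked sketch: the relationship between a phred score and its error probability, and the requirement that a variant appear above 2.5% in every replicate run. The code below is illustrative only; it does not call VarScan or ShoRAH, the function names and input layout are hypothetical, and reporting the mean frequency across runs is an assumption made for this example.

```python
import math


def phred_to_error(q):
    """Probability that a base call with phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score corresponding to an error probability p."""
    return -10 * math.log10(p)


def consistent_variants(runs, min_freq=0.025):
    """Keep variants found above min_freq in every run.

    runs: list of dicts mapping a variant identifier to its estimated frequency
    in one replicate run. Returns {variant: mean frequency across runs}
    (taking the mean is an assumption of this sketch).
    """
    shared = set(runs[0])
    for run in runs[1:]:
        shared &= set(run)
    return {
        v: sum(run[v] for run in runs) / len(runs)
        for v in shared
        if all(run[v] > min_freq for run in runs)
    }


if __name__ == "__main__":
    for q in (20, 25, 40):
        print(f"Q{q}: error probability {phred_to_error(q):.5f}")
    toy_runs = [
        {"h1": 0.61, "h2": 0.38, "h3": 0.01},
        {"h1": 0.60, "h2": 0.39},
        {"h1": 0.62, "h2": 0.37, "h3": 0.03},
    ]
    print(consistent_variants(toy_runs))
```

A score of 20 corresponds to a 1% error probability, and raising the cut-off to 25 lowers this to roughly 0.3%.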

5.3.7 Phylogenetic analyses

Sequences from standard cloning and reconstructed haplotypes were visualized and curated with MEGA5 and R packages. Phylogenetic and evolutionary analyses, including detection of recombination, were performed with PhyML (103), MEGA5 and RDP3 (169). Trees were constructed from sequences using the best-fit model determined by jModelTest according to the AICc criterion (211). Trees were visualized with FigTree. Group mean amino acid distances were calculated in MEGA5 using the Poisson model with uniform rates. Sequence alignments of reconstructed haplotypes and clonal data can be obtained from the authors on request.
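For readers unfamiliar with the Poisson model mentioned above, the sketch below shows the standard Poisson correction for amino acid distances, d = -ln(1 - p), where p is the proportion of differing sites. It is not MEGA5 code; the function names are hypothetical, gap positions are simply skipped pairwise (an assumption), and `group_mean_distance` illustrates a between-group mean only.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned positions that differ (positions with a gap are skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)


def group_mean_distance(group_a, group_b):
    """Mean Poisson distance over all between-group sequence pairs."""
    dists = [poisson_distance(a, b) for a in group_a for b in group_b]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    # Toy aligned peptides, not real capsid sequences.
    print(poisson_distance("MKTA", "MKTG"))
    print(group_mean_distance(["MKTA"], ["MKTA", "MKTG"]))
```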


5.4 Results

5.4.1 Subjects

Five subjects were selected for detailed analysis of the intra-host NoV population. Three of these subjects were infected with a NoV recombinant GII.g/GII.12 virus (St George virus - NSW199U/2008/AU, GenBank GQ845370) and formed a transmission cluster, where the son, donor (DS), infected two recipients, the father (RF) and grandfather (RG). A single sample was collected from each of these three subjects (Table 5-1). The remaining two subjects, Ac and Ch, were infected with a NoV GII.4 2006b variant (Figure 5-1). Subject Ac had a typical acute infection and two samples were collected nine days apart, designated Ac_1 and Ac_10 in this study (Table 5-2). In contrast, subject Ch was an infant (<2 yrs) persistently infected with the same NoV GII.4 2006b variant (Figure 5-1). This child had an underlying immune deficiency of unknown type, manifesting clinically with hepatomegaly and anaemia, and was treated long term with Adalimumab [a Tumour Necrosis Factor (TNF)-alpha inhibitor] for control of inflammation induced hepatitis. Three samples from Ch were collected at day one (Ch_1), day four (Ch_4) and day 288 (Ch_288) (Table 5-2).

5.4.2 Sequence analysis

NGS was performed on the amplicons generated from RT-PCR amplification of the ORF2/partial ORF3 region of the NoV genome for each of the eight samples. The NGS run was performed twice on the samples and the two runs combined to produce a total of 69,154 reads (28.5 Mb), with a median read length of 412 nt and a median average quality score per site of 25. Any reads that were shorter than 55 bp were automatically removed and excluded from further analysis. The remaining reads were aligned with a sequence identity threshold of >95% against the consensus sequence for each sample, which was generated by bulk sequencing of the RT-PCR product. This resulted in the alignment of 16.7 Mb with an average number of bases aligned per site of 950 (range 231-1892). For comparison, regions amplified from the eight samples were also cloned and five colonies were isolated and sequenced.
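The figures quoted above can be related to one another with some simple arithmetic, sketched below. The calculation uses the median read length as a stand-in for the mean (an approximation), and the helper names are hypothetical; the 2.5% frequency is the haplotype reporting threshold from section 5.3.6.

```python
TOTAL_READS = 69_154
MEDIAN_READ_LENGTH = 412    # nt
ALIGNED_MB = 16.7           # Mb aligned, from the text
MEAN_DEPTH = 950            # average bases aligned per site
MIN_VARIANT_FREQ = 0.025    # reporting threshold from section 5.3.6


def approx_yield_mb(n_reads, read_length):
    """Approximate yield in megabases, using the median read length as a proxy for the mean."""
    return n_reads * read_length / 1e6


def expected_supporting_reads(depth, frequency):
    """Expected number of reads carrying a variant present at the given frequency."""
    return depth * frequency


if __name__ == "__main__":
    total_mb = approx_yield_mb(TOTAL_READS, MEDIAN_READ_LENGTH)
    print(f"approximate yield: {total_mb:.1f} Mb")
    print(f"fraction of yield aligned: {ALIGNED_MB / total_mb:.2f}")
    print(f"reads expected at the 2.5% threshold: "
          f"{expected_supporting_reads(MEAN_DEPTH, MIN_VARIANT_FREQ):.1f}")
```

Under these approximations the read count and median length reproduce the stated total of about 28.5 Mb, and a variant at the 2.5% threshold would be supported by roughly two dozen reads at average depth.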


Figure 5-1 Phylogenetic comparison of GII.4 ORF2 nucleotide sequence isolated longitudinally from subjects with acute and chronic NoV infections.

The full-length ORF2 consensus sequences, determined by bulk sequencing, were generated at two timepoints (ranging 10 days) for the acute (green) and three timepoints (ranging 288 days) for the chronic (blue) subject. In addition, ORF2 sequences from GII.4 epidemic variants detected in NSW, Australia during 2006-2011 and reference sequences derived from GenBank were included in the analysis (n=248). Each major GII.4 clade has been described previously (236) except for the 2009 and 2010 GII.4 variants, which are recently emerged. Both subjects were infected with variants that clustered within the GII.4 2006b variant clade (red). In the acute subject, the consensus sequence was identical at both timepoints. For the chronic subject, ORF2 sequence at the third timepoint differed by 4.1% compared to sequence from the first two timepoints. The tree shows that subject Ch was persistently infected with the same variant and not reinfected with another circulating 2006b variant. The distance scale represents the number of nucleotide substitutions per position.


5.4.3 Intra-host population diversity

SNP analysis was performed across the full length ORF2 and partial ORF3 regions for all eight samples isolated from the five subjects (Figure 5-2). Analysis revealed that the viral population was highly homogenous within the four subjects with a typical acute NoV infection (DS, RF, RG and both time points for subject Ac, i.e. Ac_1 and Ac_10) (Figure 5-2B and D). For these four subjects, only 5 to 8 SNPs occurred at frequencies above 2%. Approximately half (46 to 64%) of the SNPs detected were non-synonymous.

Figure 5-2 Distribution of single nucleotide polymorphisms (SNPs) detected from the 3’ end of ORF1 to the 5’ end of ORF3.

Panel A portrays the NoV genomic region analysed and indicates domains of interest within ORF2, which are shown across the x-axis of panels B to D. Panel B shows the distribution of SNPs for subject Ac with acute NoV infection measured nine days apart. Panel C shows the distribution of SNPs for subject Ch (chronic infection) at days one, four and 288. Panel D shows the distribution of SNPs for the transmission cluster, involving three acutely infected family members. The donor (DS) transmitted the virus to two recipients, RF and RG. A single sample point was collected from each infected subject during the acute stage. The distribution of SNPs, portrayed as percentage of the viral population, indicates that intra-host viral populations were homogenous for the four acutely infected subjects, with only a few prevalent SNPs (>10%), panels B and D. In contrast, subject Ch (panel C) presented a heterogeneous intra-host population over the course of the infection. Multiple SNPs with a frequency of >10% in the viral population were distributed across the entire length analysed.


In contrast, in the fifth subject, Ch, who had an atypical chronic NoV infection, the population had greater heterogeneity, with 48, 59 and 109 SNPs detected above 2% in the three time points, Ch_1, Ch_4 and Ch_288, respectively (Figure 5-2C). Almost half of these SNPs detected were non-synonymous, 34 to 48%. The high frequency SNPs (>2%) were randomly distributed and did not localise to any particular region within ORF2 or ORF3 (Figure 5-2).
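The synonymous versus non-synonymous distinction used in this section can be illustrated with a small sketch based on the standard genetic code. This is not the VarScan-based workflow used in the study; the function names are hypothetical, and the example assumes that each SNP has already been placed in its in-frame reference and alternative codons.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(ref_codon, alt_codon):
    """Classify a single-codon change as 'synonymous' or 'non-synonymous'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


def fraction_nonsynonymous(codon_pairs):
    """Fraction of (ref_codon, alt_codon) pairs that change the encoded amino acid."""
    calls = [classify_snp(ref, alt) for ref, alt in codon_pairs]
    return calls.count("non-synonymous") / len(calls)


if __name__ == "__main__":
    # Toy codon pairs, not SNPs observed in this study.
    toy = [("ACT", "GCT"), ("CTA", "CTG"), ("GAA", "GAG"), ("AAA", "AGA")]
    for ref, alt in toy:
        print(ref, "->", alt, classify_snp(ref, alt))
    print(fraction_nonsynonymous(toy))
```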

5.4.4 Transmission driven NoV evolution

A transmission cluster was analysed to investigate the impact of the transmission event on NoV evolution. The transmission cluster consisted of a son (DS) who infected his father (RF) and grandfather (RG) with a NoV GII.g/GII.12 strain (Table 5-1).

NGS analysis was performed on a single sample collected from each of the three subjects. These three subjects had a highly homogenous population with very few SNPs detected (Figure 5-2D). In order to compare the infecting strain between the three subjects, the variants spanning the full-length ORF2 and first 102 nucleotides of ORF3 with a frequency above 2.5% were re-assembled from NGS reads. In the donor subject, two ORF2/3 variants were reconstructed with frequencies of ~61% and ~38%. These two variants differed by only one synonymous substitution (nt 1048 with reference to the start of ORF2). In the two recipients, distinct ORF2/3 variants with frequencies of ~99% were found. Phylogenetic analysis of the inter-host sequences from the NGS and clonal data revealed that the recipients, RF and RG, were not infected with either of the two major variants present in the donor (Figure 5-3A). The dominant variants present in the two recipients, RF and RG, were most similar to the minor variant (39%) in donor DS, but differed at two and one nucleotide sites, respectively.

The nucleotide difference between DS and RG was located in ORF3 and was non-synonymous (residue 26). The two nucleotide differences between DS and RF were both located in ORF2 and were also non-synonymous. One caused a mutation within the base of the VP1 protein, residue 509, and the other caused a mutation in residue 347, which is located within the P2 domain and lies adjacent to the HBGA binding site.


Ch. 5 – NoV evolutionary dynamics

Because the ShoRAH software has a sensitivity limit of 2.5% for variant reassembly, the raw reads were manually searched, first for the presence of each recipient’s major variant in the donor and second for the presence of the donor’s major variant in the recipients. This analysis focused on the 3’ terminal 171 nucleotides of ORF2 and 5’ terminal 133 nucleotides of ORF3, as this region displayed the greatest inter-host diversity. In this high-resolution analysis, each recipient’s major variant was found to be 100% identical to a unique minor variant (<0.01%) isolated from the donor (Figure 5-3B). Surprisingly, the major variant present in the donor was not identified in either of the two recipients, even at low frequency (<0.01%).


Figure 5-3 Phylogenetic analysis of sequences from the NoV transmission cluster.

Panel A shows a phylogenetic tree of the full-length ORF2 sequences generated from re-assembled short NGS reads and from cloning. Sequences are labelled first by their subject name, followed by whether they were generated by cloning (C) or NGS haplotype reconstructions (H). The haplotype frequency is also included at the end of the name. In this cluster, the donor (DS) had two closely related variants present at high frequency (38 and 61%), which were identified by both NGS and cloning. However, neither of the two donor variants was found in the viral populations of the two recipients and each subject’s sequences clustered separately. The distance scale represents the number of nucleotide substitutions per position. Panel B shows a high resolution phylogenetic analysis of a region spanning the 3’ terminal 171 nucleotides of ORF2 to the 5’ terminal 133 nucleotides of ORF3. This analysis was performed from NGS data and revealed substantial inter-host diversity. In each subject (DS, RF and RG), the major variant was located at the node of the branch with minor variants branching from it, indicating that the minor variants had evolved from the dominant variant. In addition, each recipient’s major variant was found to be identical to a unique minor variant (<0.01%) isolated from the donor (DS). The donor’s major variant was not identified in any of the recipient variants at frequencies as low as 0.01%. Filled circles represent major variants within a population and open circles minor variants. Red is the donor son (DS), whilst blue represents the recipient father (RF) and green the recipient grandfather (RG). The distance scale represents the number of nucleotide substitutions per position.

5.4.5 Longitudinal evolution of the intra-host viral population in a typical NoV acute infection

To examine the evolutionary dynamics of a typical acute NoV infection, two samples collected nine days apart from subject Ac were analysed via NGS of the ORF2 and partial ORF3 region. The SNPs were calculated at each time point (Figure 5-2B) and their variation in frequency between the time points determined. This revealed that the distribution of viral variants being excreted by subject Ac remained reasonably constant over time. SNP frequencies varied by a maximum of 2.6% and by an average of 0.07%. Mutations did emerge in reported antigenic regions of the capsid but at frequencies below 2.5% (data not shown).
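As an illustration of how the change in SNP frequency between two time points might be summarised, the following Python sketch computes the per-site change together with its maximum and mean. It is a minimal example under stated assumptions rather than the analysis performed here: the change is taken as the absolute difference in percentage points, a SNP not detected at one time point is treated as having a frequency of zero, and the function name `frequency_shift` and the example values are hypothetical.

```python
def frequency_shift(freqs_t1, freqs_t2):
    """Summarise SNP frequency changes between two time points.

    freqs_t1, freqs_t2 -- dicts mapping nucleotide position to SNP
                          frequency in percent (hypothetical input format)
    Returns (per-site changes, maximum change, mean change).
    """
    positions = sorted(set(freqs_t1) | set(freqs_t2))
    if not positions:
        raise ValueError("no SNP positions supplied")
    # Assumption: a SNP missing from one time point has frequency 0.
    changes = {
        pos: abs(freqs_t2.get(pos, 0.0) - freqs_t1.get(pos, 0.0))
        for pos in positions
    }
    max_change = max(changes.values())
    mean_change = sum(changes.values()) / len(changes)
    return changes, max_change, mean_change


# Hypothetical frequencies (percent) at two time points, not study data.
day_one = {134: 20.0, 512: 1.0, 900: 0.5}
day_nine = {134: 21.0, 512: 1.5}
changes, max_change, mean_change = frequency_shift(day_one, day_nine)
```

Whether the mean is taken over absolute or signed differences affects the result, so that choice should be stated alongside any reported average.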

In order to understand the population diversity at the variant level, a Bayesian statistical tool (ShoRAH) was used to reconstruct the ORF2 variants (1620 nt) present in the population at a frequency >2.5%. For both time points two variants were reconstructed. These two variants differed at one nucleotide site, which resulted in a non-synonymous substitution (S to G) at residue 134 within VP1. The major variant, VP1-134S, represented 80% and 77% of the population at the first and second time points, respectively. The minor variant, VP1-134G, represented about 20% and 21% of the population at the two time points, respectively (Figure 5-4). Other minor variants did exist within the population but were present at frequencies below 2.5%, the reconstruction threshold established for this type of analysis, and were therefore not reported (36). The individual SNPs contributing to these minor variants are presented within Figure 5-2B.


Figure 5-4 Comparison of the intra-host distribution of NoV variants in all subjects.

Full-length NoV variants of the ORF2 region were re-assembled from NGS reads and translated into amino acid sequences, with each unique variant represented by alternating grey shading. The histogram shows the frequency distribution of unique NoV variants in each sample analysed. Low frequency variants, with an estimated frequency of occurrence below the detection threshold (2%), are indicated by black dotted lines. In the subject with acute infection (Ac), only two variants were detected, with frequencies of occurrence of ~79% and ~20%, respectively. These variants remained stable over the nine days of infection. In the subject with chronic infection (Ch), no dominant variant was observed. Instead, a distribution of low frequency variants co-existed and their prevalence varied over the course of the infection. A single variant was identified in each subject within the transmission cluster cohort (DS, RF and RG).


5.4.6 Longitudinal evolution of the intra-host viral population in an atypical chronic NoV infection

In subject Ch, an infant with underlying immune deficiency and persistent NoV infection, three samples were collected at day one, day four and day 288. All three time points were analysed by NGS. There was significant variation in synonymous and non-synonymous SNPs observed over time, particularly at the third time point, where 18 SNPs reached fixation (defined as >98% of the population), eight of which resulted in an amino acid substitution. Fixation of multiple SNPs indicated that a selective sweep had occurred between day four and day 288 and that the variants present at the first two time points had been replaced by new variants. The first two time points had a mean group genetic distance of 0.7%, whilst the third time point had a mean group genetic distance of 4.1% compared to the first two time points and appeared to be genetically distinct. To confirm that the variants present at the third time point were indeed progeny from the early time points and not the result of reinfection by a new variant, phylogenetic analysis was performed on the majority consensus sequences from all three time points, which were compared to other GII.4 variants in circulation across NSW and globally (Figure 5-1). The viruses identified in the chronic subject clustered together for all three time points and away from the other GII.4 2006b variants in circulation, suggesting that subject Ch had been continuously infected with the same virus.
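To make the notion of a mean group genetic distance concrete, the sketch below computes the average pairwise distance within one group of aligned sequences and between two groups. The distance model is not specified in this section, so the uncorrected p-distance (the proportion of differing sites) is used here purely as an assumption; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations, product


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap or ambiguous base in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def mean_within_group(seqs):
    """Mean pairwise p-distance among sequences of one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between_groups(group_a, group_b):
    """Mean p-distance over all pairs drawn one from each group."""
    pairs = list(product(group_a, group_b))
    if not pairs:
        raise ValueError("both groups must contain sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy aligned sequences (hypothetical, not study data).
early = ["ACGTACGTAC", "ACGTACGTAT"]
late = ["ACGAACGAAC"]
within_early = mean_within_group(early)         # one difference in ten sites
early_vs_late = mean_between_groups(early, late)
```

A corrected substitution model would give slightly larger values than the p-distance as divergence increases, so the two should not be compared directly.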

In order to understand how the population was evolving at the variant level, the ORF2 variants were reconstructed with ShoRAH. Surprisingly, unlike the four acute subjects, there was no dominant (i.e. >50%) ORF2 variant isolated from the viral population at any of the three samples collected from subject Ch. Six main variants with a frequency >2.5% were detected at day one and five main variants were detected at day four. None of these variants were present at both day one and day four. The most common variant had a frequency of 6.1% at day one. At day 288, no ORF2 variants were reconstructed at a frequency greater than the threshold of 2.5%. The lack of variants reconstructed in the third time point and the low frequency of those that were reconstructed in the first two time points indicated that the viral population in subject Ch was highly heterogeneous, with the circulation of many (>50) minor variants at all three time points. This was also supported by the fact that all five clones, for each time point of subject Ch, were non-identical sequences (data not shown).

The distribution of ORF2 variants at the amino acid level was also determined for subject Ch. In this analysis, more variants with a higher frequency were detected compared to the nucleotide analysis. Despite the increase, the overall frequency of the individual variants remained low (Figure 5-4). At day one, only eight variants were reconstructed at a frequency >2.5%, with the most common variant present at 13.4%. Similarly, at day four and day 288 only 12 and six variants were reconstructed at a frequency >2.5%, respectively; however, no variants were detected above 10% (Figure 5-4).

Individuals persistently infected with NoV may act as a reservoir for novel antigenic variants (233). To explore this possibility, we examined the frequency of variants within the viral populations and the distribution of polymorphic positions across the antigenic P2 domain of the viral capsid (aa 275 – 417) in subject Ch (Figure 5-5).


Figure 5-5 Analysis of amino acid variants in the P2 domain in subject Ch with chronic NoV infection.

Panel A shows the positions of evolving sites (highlighted in red) on the surface of a P2 domain dimer. Each P2 monomer is distinguished by light or dark grey shading. Key antibody binding sites (site A and site B) as well as the HBGA binding pocket (highlighted aqua) are shown. Two orientations are provided: side view (left) and top view (right). Panel B shows the amino acid sequence of each NoV variant at 24 amino acid sites, of which 22 were identified as evolving and polymorphic. The frequency of each variant is provided at days one, four and 288 for subject Ch. Residues that varied between subjects and over time are highlighted by shades of red.


Reconstructions of the P2 domain amino acid sequence (at a frequency >2.5%) identified 21 distinct P2 variants present throughout the infection. Eight, six and 13 distinct variants were present at days one, four and 288, respectively (Figure 5-5B). The frequency of variants present at each time point varied significantly. At day one, the two most common variants were present at 24.7% and 24.2%. By day four, a shift in variant dominance had occurred: the first variant (present at 24.7% at day one) increased in frequency to 59.5%, whilst the second variant (present at 24.2% at day one) could no longer be detected (Figure 5-5B). Similarly, two other variants present at day one had disappeared by day four. By day 288, 13 P2 domain variants were identified that were distinct from those previously identified at days one and four. This transition of viral variants identified at day 288 was defined by substitutions at positions 293, 357, 368, 393 and 412 (Figure 5-5B). The most common variant was present at 16.9%. As in the analysis of the entire ORF2 region, the P2 domain from Ch_288 appeared to be highly heterogeneous (Figure 5-5B).

Given the co-existence of genetically diverse variants in the chronic subject, it is possible that recombination could generate additional diversity and further increase the evolution rate of NoV. In this study, the viral variants isolated from the chronically infected subject were screened for recombination events (data not shown). However, due to the closely related nature of the population and the low frequency of the variants within it, it was not possible to call any recombination events with confidence.


5.5 Discussion

Genetic diversity of NoV occurs at the inter-host level and results in new antigenic variants and subsequent escape from herd immunity (151). However, it is not known when and how this genetic change occurs. This study addresses these issues by examining NoV intra-host evolution in acute and chronic NoV infections and by following a documented transmission cluster.

Analysis of NGS data from three subjects with known epidemiological clustering revealed that transmission of NoV is characterised by a strong genetic bottleneck, with only minor variants present in the source population being successfully transmitted to the recipient hosts for subsequent infection. This result was despite the fact that all acute subjects studied had a highly homogeneous viral population, with a single variant representing >60% of the total viral population. Interestingly, the minor variants transmitted differed at the amino acid level from the dominant donor variant, with one recipient, RF, carrying an amino acid change in VP1 and the other recipient, RG, carrying an amino acid change in VP2. These findings are not unique to NoV, as similar patterns of transmission have been identified in HCV and HIV (46, 65, 154). In HIV, the variants responsible for establishing infection in the new host are reported to have unique phenotypic properties that are likely to contribute to establishment of infection. Examples include specific amino acids that increase viral entry efficiency, as well as unique glycosylation patterns that are hypothesized to help stabilize the structural proteins and aid in attachment to host receptors (97, 130).

Since NoV is known to bind to the HBGA attachment factors (41, 230, 252), it is possible that the transmission of NoV variants may be based on similar structural constraints as those seen in HIV. The amino acid change S347G identified in the GII.12 VP1 of the recipient’s (RF) variants may provide evidence of HBGA structural selective pressure, as residue 347 sits directly adjacent to the primary HBGA binding site in these viruses (107). Alternatively, the transmission of a minor variant could simply be a random event: NoV has a low infectious dose of approximately 18 virions (256), so it is likely that only a small number of variants are transmitted. In addition, the transmission route could influence which variant is able to establish infection. For example, in foot and mouth disease virus (FMDV), different evolutionary trajectories of the transmitted viral variants have been observed in isolated body compartments of the same host (285). Therefore, the NoV population excreted in the vomitus could be genetically different from the virus excreted in the faeces. Unfortunately, we were not able to test this theory as no vomitus sample was available. Whether stochastic or deterministic in nature, this study suggests that transmission is an important contributor to genetic change in the NoV population.
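The purely stochastic explanation can be explored with a simple sampling calculation. The sketch below is an idealised illustration, not an analysis from this study: it assumes that each founding virion is drawn independently and at random from the donor population at the observed variant frequencies, and that the number of founders equals the reported infectious dose of approximately 18 virions. The function names are hypothetical.

```python
def prob_variant_absent(freq, n_virions):
    """Probability that a variant at frequency `freq` is missing from
    `n_virions` independent random draws from the donor population."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be a proportion between 0 and 1")
    if n_virions < 1:
        raise ValueError("n_virions must be at least 1")
    return (1.0 - freq) ** n_virions


def prob_variant_transmitted(freq, n_virions):
    """Probability that the variant appears at least once among the draws."""
    return 1.0 - prob_variant_absent(freq, n_virions)


INFECTIOUS_DOSE = 18  # approximate infectious dose cited in the text

# Donor major variant (~61%) absent from all founding virions.
p_major_absent = prob_variant_absent(0.61, INFECTIOUS_DOSE)

# A donor minor variant at 0.01% present among the founding virions.
p_minor_present = prob_variant_transmitted(0.0001, INFECTIOUS_DOSE)
```

Under these assumptions the dominant donor variant would almost always be included among 18 founders, so the outcome depends strongly on the assumed number of founding virions; the calculation by itself does not distinguish between stochastic and selective explanations.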

Longitudinal analysis of intra-host NoV evolution revealed limited diversity within the capsid region in the immune-competent individual with a typical acute NoV infection. This was in contrast to the chronic infection, where VP1 diversity significantly increased over the course of the infection. The observed variation of the intra-host NoV population is in accordance with previous bulk sequencing studies on chronic NoV infection (43, 227). However, this study revealed that chronic NoV infection generated a very diverse viral population, with co-circulation of many (>50) minor variants (≤13.4% frequency at the amino acid level). This result contrasts with the limited diversification observed in subjects who resolved infection during the acute phase, where a single major variant dominated the population.

Viral escape from neutralising antibodies has been mapped to the hypervariable P2 domain, which sits on the protruding surface of the viral capsid (155). Based on the previous suggestion that chronic shedders may be an important source of novel antigenic variants (244), an in-depth analysis of the NoV population at the VP1 P2 domain was performed. This analysis revealed 22 polymorphic positions (Figure 5-5), seven of which occurred at sites previously determined to be functionally significant (4, 61, 150, 230). For example, residues at 296-298 and 393-395 (termed site A and site B, respectively) form a variant specific epitope involved in neutralising antibody binding (4). At days one and four, variation was observed at site A, residue 297 for patient Ch, where both arginine and histidine residues were identified (Figure 5-5B). Interestingly, variants with an arginine increased in frequency between days one and four, while variants with a histidine decreased in frequency within the same time frame. By day 288, substitutions were identified at residues 297 and 298 of site A and residues 393 and 395 of site B. More recently, two studies have identified additional residues involved in antibody binding (61, 150). Debbink et al. identified epitope A, which expanded site A to include amino acids 294, 368 and 372. Of these, the latter two were shown to evolve across all time points in the chronic patient (61). Furthermore, Lindesmith et al. showed that residues 407, 412 and 413 formed an alternate blockade epitope, with an asparagine-aspartic acid transition observed in patient Ch between day four and day 288 at position 412 (150). Therefore, despite the severe immune deficiency of subject Ch, some weak antibody response may have driven selection and variation at these antigenic sites. The P2 domain also contains two sites involved in HBGA binding (230) (Figure 5-5A). Site I (residues 344-346, 374 and 440-444) is the primary HBGA binding site and is highly conserved across GII.4 variants. In subject Ch, no substitutions were observed at site I in the reconstructed P2 variants. At site II (residues 387-396), which stabilises HBGA interactions and modulates binding specificity, substitutions were identified in variants at day 288; however, these corresponded with the residues shared with antigenic site B (393 and 395). Therefore, the changes observed at residues 393 and 395 may actually be explained by HBGA-selection and host receptor adaptation rather than immune driven selection.

Of the 22 polymorphic positions identified in the P2 domain, ten were found to contain substitutions that matched previously identified transitions between different GII.4 variants (residues 297, 298, 340, 341, 352, 356, 372, 393, 395 and 412). For example, at position 298, P2 domain variants were identified with either aspartic acid or asparagine residues. Most ancestral GII.4 variants (Pre-2001) such as Bristol, Camberwell and US-1995/96 have an aspartic acid at this position, whilst modern GII.4 variants such as Hunter (2004), 2006b (2006-7), Apeldoorn (2008) and New Orleans (2010) have an asparagine at this position. Within the chronically infected subject, position 340 was another example where the P2 variants were toggling between residues that defined GII.4 variants, with the amino acids glycine, arginine and threonine all identified in the reconstructed P2 variants. Glycine was a defining residue of the Farmington Hills (2002), Asia 2003 (2003), 2006b (2006-7) and Cairo (2007) GII.4 variants, whilst arginine was present in the Hunter (2004) and 2006a virus (2006) GII.4 variants. Lastly, a threonine at position 340 defines the recent Apeldoorn (2008) and New Orleans (2010) GII.4 variants. This provides some evidence that individuals chronically infected with NoV may act as reservoirs for new variants, as residues characteristic of a range of known antigenic variants were detected in the chronically infected subject. However, toggling between these residues may also just reflect the virus exploring its sequence space and functionally permitted changes. Due to the unavailability of serum samples, we were not able to investigate the driving force of the residue toggling in the chronic subject. The subject with chronic infection analysed here was severely immune-compromised; therefore, the evolution in the P2 domain could either be a consequence of a weak humoral immune response or of a greater capacity of the specific virus to generate novel variants at these sites.

It is also important to consider the role recombination plays in the generation of antigenic diversity. We did not detect the presence of ORF2 recombination in patient Ch with any statistical significance. The short read lengths of 454 data created a reliance on bioinformatic tools (ShoRAH) to estimate the full length ORF2 sequence. We have previously shown that variant sequences of similar length to ORF2 can be reliably reconstructed from 454 data when the variant is present at a frequency above 2.5% (36). Below this frequency there is a risk of recombinant variants being reconstructed in silico. Given the low frequency of the variants in the chronic subject and the high sequence similarity of intra-host viral populations, we could not report any recombination events with confidence, even if detected. The extent of recombination in intra-host evolution therefore remains to be determined.

While this study has shown that chronic infections have the propensity to rapidly generate novel variants, the contribution of this diversity towards the evolution of NoV at the inter-host population level is still unclear. However, it is likely to be significant, as one study has shown that chronic shedders can act as a source for nosocomial outbreaks (244). In feline calicivirus (FCV), chronic infections are considered to play an important role in the epidemiology of the disease (56). However, they occur with a much greater frequency (15-91%) than persistent NoV infections in humans; it is therefore hard to assess the contribution of this diversity to the overall evolution of NoV at the inter-host level.

In summary, we revealed that following NoV transmission only minor variants successfully established a new infection. This is the first study to investigate the impact of transmission on NoV evolution. It indicates that a significant bottleneck is likely to drive, at least in part, genetic diversification in NoV and could impact on the effectiveness of future vaccines. Further studies are needed to determine whether transmission of minor variants is a common phenomenon for NoV, as is observed in HIV and HCV (36, 89, 224). It is also important to establish the biological relevance of this transmission bottleneck. It may be the case that NoV attachment factors such as HBGA provide a structural barrier that reduces the number of viruses that can establish infection. This study also showed that in a typical acute NoV infection the viral population is highly homogeneous and relatively stable. In contrast, the immune-compromised subject with chronic NoV infection had a rapidly evolving and dynamic viral population. This suggests that subjects with chronic NoV infection could represent a potential reservoir for the emergence of new antigenic variants; however, the contribution of the genetically diverse repertoire to the overall evolution of NoV at the inter-host level has yet to be determined. It would be beneficial to study a larger cohort of subjects chronically infected with NoV, with more frequent collection points, to determine if the rapid diversification and turnover of variants observed in this study is a common phenomenon. Given that NoV vaccines have entered the initial stages of clinical trials, the need for information on NoV evolution is pressing. Such information will help predict the components needed for an effective vaccine and guide sequence predictions of future NoV pandemic variants.


Ch. 6 – NoV molecular epidemiology 2009-10

6 “The emergence of the pandemic norovirus GII.4 variants”

John-Sebastian Eden (a), Mark M Tanaka (a), Maciej F Boni (b), William D Rawlinson (a,c,d) and Peter A White (a)

(a) School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
(b) Oxford University Clinical Research Unit, Wellcome Trust Major Overseas Programme, Ho Chi Minh City, Vietnam
(c) School of Medical Sciences, Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia
(d) Virology Division, Department of Microbiology, SEALS, Prince of Wales Hospital, Randwick, NSW, Australia

Submitted for publication to PLoS Pathogens

Author contributions: Conceived and designed the experiments – JSE MMT PAW; Performed the experiments – JSE MMT MFB; Analyzed the data – JSE MMT MFB; Contributed reagents/materials/analysis tools – JSE MMT MFB WDR PAW; Wrote the paper – JSE MMT MFB PAW.


6.1 Abstract

NoV is the leading cause of viral gastroenteritis globally. During 2009 and 2010, epidemics of NoV-associated acute gastroenteritis occurred in NSW, Australia and around the world. These appear to have coincided with the emergence of a new global epidemic genogroup II, genotype 4 (GII.4) variant, commonly referred to as New Orleans 2010. In this study, we examined 123 NoV-positive stool specimens (59 from 2009, 64 from 2010), and characterised the viruses using RT-PCR and sequencing. During 2009, two distinct lineages of the GII.4 variant New Orleans 2010 were identified: a novel NSW-lineage and a global-lineage that clustered with the New Orleans 2010 strains identified globally. Furthermore, the NSW-lineage was predominant during the epidemic period of 2009, causing 67.8% of the infections. In comparison, strains from the global-lineage were only associated with 13.6% of NoV infections in 2009, but predominated during 2010, causing 65.5% of NoV infections and almost completely replacing the NSW-lineage of 2009. In order to characterise the patterns of evolution that facilitated the emergence of the GII.4 variant New Orleans 2010, the evolutionary history of the GII.4 capsid gene was reconstructed using a Bayesian time-scaled approach. This analysis revealed that the GII.4 capsid gene has evolved at a rate of 5.54 × 10^-3 subs/site/year since 1974, with oscillations in relative genetic diversity that coincided with the emergence of novel variants. A genome-wide approach was then employed with the program 3seq to consider the role that intra-genotype recombination played in the emergence of novel GII.4 variants. Recombination breakpoints were detected near the ORF1/2 overlap in five NoV variants and near the ORF2/3 overlap in two NoV variants. One putative breakpoint was identified within ORF2 in the Cairo 2007 variant. Therefore, intra-genotype recombination is a common feature of NoV GII.4 evolution and, importantly, played a role in the emergence of the current predominant GII.4 variant in circulation, New Orleans 2010. Lastly, this study highlights the many challenges in the identification of true recombination events and proposes that guidelines be applied for identifying recombination in NoV.


6.2 Introduction

NoV, a member of the Caliciviridae family, is the leading cause of acute viral gastroenteritis and is now estimated to cause almost half of all cases of gastroenteritis globally (199). Although commonly identified as the cause of sporadic disease, NoV is primarily associated with outbreaks in institutional settings such as aged-care facilities, hospitals, cruise ships and child care centres (20, 258). NoV is a highly infectious pathogen that is readily transmitted from person-to-person or through contamination of water and food sources (91, 256, 259). Furthermore, epidemics of acute gastroenteritis are associated with the emergence of antigenic variants from a specific genetic lineage, the genogroup II, genotype 4 (GII.4) viruses, which have occurred globally since the mid-1990s with increasing frequency (33, 234, 236). Consequently, NoV-associated gastroenteritis has become a major public health concern.

NoV possesses a single stranded, positive-sense, poly-adenylated RNA genome of approximately 7,500 nucleotides, which is packaged within a naked icosahedral virion of 27 – 32 nm in diameter (127). The viral genome is organised into three ORFs with short untranslated regions at both the 5’ and 3’ ends. ORF1 encodes a 200 kDa polyprotein that is cleaved by the viral protease into the non-structural proteins, which include an RdRp (18). The two structural capsid proteins VP1 and VP2 are encoded by ORF2 and ORF3, respectively. VP1 is the major component of the viral capsid (90 dimers per virion) and is divided into three major structural domains. These include a conserved shell (S) domain connected by a flexible hinge to a protruding stem (P1) domain that leads to the hypervariable P2 domain, which forms the external surface of the viral capsid (212). VP2 is a small basic protein with an undefined function, although roles in capsid assembly (19) and RNA packaging into the virion (96) have been proposed.

Like most RNA viruses, NoV demonstrates extensive genetic diversity and has been classified into five genogroups based on VP1 amino acid sequence (294). Each genogroup can be further divided into genotypes; currently more than 36 genotypes have been described (140, 294). Human NoVs include viruses from genogroups I, II and IV, with the GII.4 viruses most commonly identified in both outbreak and sporadic settings (236). NoVs are also known to infect a wide range of mammals including pigs, sheep, cows, lions, dogs and mice (153, 167, 168, 270, 282).

Since the mid-90s, variants of the NoV GII.4 lineage have caused 62-80% of NoV outbreaks globally (73, 236). Furthermore, distinct GII.4 variants were associated with global epidemics of acute gastroenteritis from 1996 to present. These GII.4 variants included US 1995/96 in 1996 (189, 275), Farmington Hills in 2002 (157, 278), Hunter in 2004 (38) and 2006b virus in 2007-2008 (81). A number of additional GII.4 clades have been identified, including the Henry 2001, Japan 2001, Asia 2003, 2006a and Apeldoorn 2008 variants; however, these were associated with epidemics localised to particular regions rather than pandemics (15, 106, 164, 171, 179, 193, 234, 236). More recently, a number of countries have reported the spread of a new GII.4 variant termed New Orleans 2010 (214, 288), which appears to have replaced 2006b as the predominant NoV in circulation.

A number of mechanisms are thought to drive the evolution of the GII.4 lineage [reviewed in (39)]. The GII.4 viruses are thought to have a larger susceptible population to infect as a result of having a wider binding range to HBGAs, compared to viruses from other genotypes (60, 230). Additionally, through a fast rate of evolution, new antigenic variants emerge from the GII.4 lineage every two to three years, which are often associated with wide-spread epidemics (33, 151).

Antigenic change is most evident within the hypervariable P2 domain of VP1, which contains the host-cell receptor binding and antigenic regions of the viral capsid and is therefore under the greatest selective pressure (4, 151, 152). In this regard, NoV capsid evolution is reminiscent of the evolution of influenza A virus haemagglutinin (HA), where immune driven selection leads to new antigenic variants that emerge to replace their predecessors. As well as evolution through genetic drift, NoV undergoes homologous recombination, generally at the ORF1/2 overlap (33, 37), although some studies have reported recombination within ORF1 and ORF2 (221, 273). Recombination at the ORF1/2 overlap is important as it facilitates the exchange of non-structural and structural genes between different NoV lineages. NoV inter-genotype recombinants are frequently identified in molecular epidemiological studies (25, 37, 81, 182); however, recent work suggests that intra-genotype recombination may have also played a role in the emergence of some GII.4 variants (179).

Since its emergence in late 2008, a new GII.4 variant commonly referred to as New Orleans 2010 has spread globally, causing epidemics of acute gastroenteritis (186, 214, 265, 288). The variant was reported in the US, Australia, Europe and across Asia, replacing the previous pandemic variant 2006b as the predominant GII.4 variant in global circulation. Using new data from two large winter epidemics in Australia in 2009 and 2010 together with sequences from public databases, we explore the evolutionary profile of this variant and determine the influence of both antigenic drift and recombination on its emergence. Our analysis then applies a genome-wide examination to consider the broader role that recombination has played in the emergence of several other GII.4 variants.


6.3 Materials and methods

6.3.1 Outbreak data and specimen collection

Institutional gastroenteritis outbreak data were collected through the NSW Department of Health, with outbreaks primarily reported from settings such as nursing homes, hospitals and child-care centres. Stools were collected through SEALS at POWH between January 2009 and December 2010. All specimens were initially confirmed to be NoV positive using the Ridascreen 3rd Generation NoV ELISA (r-biopharm, Darmstadt, DE), and monthly NoV-positive numbers were recorded. Each NoV specimen was then classified as outbreak or sporadic based on the following criteria: outbreaks were defined as those with two or more temporally linked NoV-positive specimens collected from the same institution or location; all remaining specimens were classified as sporadic.

6.3.2 Detection of NoV GI and GII RNA

All stool specimens were prepared as 20% (v/v) suspensions and viral RNA was extracted as previously described (258). Viral RNA was reverse transcribed using the SuperScript™ VILO cDNA synthesis kit (Invitrogen) according to the manufacturer’s instructions. A real-time RT-PCR targeting the 5’ end of the NoV capsid gene was employed using iQ SYBR Green Supermix (Bio-Rad) and the primers COGIF/GISKR (125, 137) and G2F3/G2SKR (108, 137) for NoV GI and GII detection, respectively. The reaction conditions were as described in the manufacturer’s instructions, with an annealing temperature of 55°C and an extension time of 60 s.

6.3.3 Amplification of NoV GII.4 RdRp and VP1 gene

An RT-PCR was developed to amplify the full-length VP1 capsid gene of GII.4 variants using the SuperScript™ III One-Step RT-PCR System with Platinum® Taq High Fidelity (Invitrogen). The reaction conditions were as described in the manufacturer’s instructions, with an annealing temperature of 55°C and an extension time of 135 s. The primers GV305 and GV306 (Supplementary information Table 6-4) generated a PCR amplicon of 1918 bp that was purified for sequencing using ExoSAP-IT (GE Life Sciences).


6.3.4 Amplification of NoV full-length GII.4 genomes

A long RT-PCR was employed to amplify the full-length genome of NoV GII.4 variants. Firstly, cDNA was transcribed from viral RNA using a modified Oligo(dT)30 primer (GV270) (Supplementary information Table 6-4) and the SuperScript™ III First-Strand Synthesis System (Invitrogen). The primer GV270 was designed to bind to the NoV poly(A) tail and generate full-length cDNA transcripts whilst incorporating a DNA tag sequence (5’-GCATGACTGACATAGCACAGCGGCCGCCC-3’). Following cDNA synthesis, a long RT-PCR was performed using the primers GV207 and GV271 (or GV17 and GV271) and the SequalPrep™ Long PCR kit (Invitrogen) (Supplementary information Table 6-4). The primers GV207 and GV17 bind to the first 33 bases of the NoV GII genome, with GV207 also carrying a NotI restriction site. The primer GV271 was complementary to the DNA tag sequence present in the cDNA primer GV270. The conditions for the long RT-PCR were as follows: initial denaturation at 94°C for 120 s; then 10 cycles of 94°C for 15 s and 68°C for 510 s; then 30 cycles of 94°C for 15 s and 68°C for 510 s, with the extension time increased by 20 s every cycle. Following the RT-PCR, amplicons were analysed by agarose gel electrophoresis, and bands corresponding to the full-length NoV genome (approximately 7.6 kb) were gel excised and purified using the QIAquick Gel Extraction kit (Qiagen).
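As an illustration only, the cycling programme described above can be written out step by step to estimate the total hold time. The sketch below assumes that the first of the 30 auto-extension cycles uses the base 510 s extension and that each later cycle adds a further 20 s; the function name and step labels are hypothetical, and ramping time between temperatures is ignored.

```python
# Sketch of the long RT-PCR cycling programme described in the text.
# Assumption: cycle 1 of the 30 auto-extension cycles uses 510 s,
# cycle 2 uses 530 s, and so on. Ramp times are not included.

def long_pcr_programme(initial_denat_s=120, denat_s=15, ext_s=510,
                       fixed_cycles=10, ramp_cycles=30, ramp_step_s=20):
    """Return a list of (step_name, seconds) tuples for the whole run."""
    steps = [("initial denaturation 94C", initial_denat_s)]
    for _ in range(fixed_cycles):
        steps.append(("denaturation 94C", denat_s))
        steps.append(("anneal/extend 68C", ext_s))
    for i in range(ramp_cycles):
        steps.append(("denaturation 94C", denat_s))
        steps.append(("anneal/extend 68C", ext_s + i * ramp_step_s))
    return steps

programme = long_pcr_programme()
total_s = sum(seconds for _, seconds in programme)
print(f"total hold time: {total_s} s ({total_s / 3600:.2f} h)")
```

Under these assumptions the programme holds for a little over eight hours, which is consistent with the long extension steps needed for a 7.6 kb amplicon.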

6.3.5 DNA sequencing and alignments

All purified PCR products were sequenced using dye-terminator chemistry on an ABI 3730 DNA Analyzer (Applied Biosystems). The VP1 capsid gene products were sequenced using primers GV305, GV307 and GV308 (Supplementary information Table 6-4) and full-length NoV genomes with primers listed in Table 6-4 (Supplementary information). The raw sequence reads were first edited with FinchTV v1.4 (Geospiza, Seattle, US), then DNA alignments were built in MEGA5 (247).

6.3.6 Time-measured phylogenetic analysis of the NoV GII.4 capsid gene

For this analysis, a total of 261 GII.4 capsid sequences identified between 1974 and 2010 were examined, of which 181 sequences were derived from GenBank and 80 were from NSW NoV strains generated as part of this study. The full-length capsid gene sequence alignments were built using MEGA5 (247) and then screened for recombination using the Recombination Detection Program (RDP) v3.44 (169). GII.4 Cairo 2007 variants, such as NSW505G/Jul/2007 (GenBank accession number GQ845368), were excluded from the phylogenetic reconstructions as there was trace evidence of recombination within ORF2 detected using RDP. The NoV GII.4 capsid phylogeny was reconstructed using the Bayesian methods implemented in BEAST (77). In order to account for rate variation amongst both lineages (branches) and sites, an uncorrelated lognormal relaxed clock model (76) was used with the generalised time reversible (GTR) substitution model, including gamma distributed substitution rates (4 classes) and invariant sites. The substitution rate parameters and base frequencies were unlinked across the (1st + 2nd) and 3rd codon partitions to allow for separate model estimates. The Bayesian Skyline demographic model (78), with 50 groups in the piecewise-constant population size function, was used to account for the complex population structure. For each analysis, three independent chains were run for 100 million steps (with at least 10% burn-in), and the trace was then checked with Tracer to ensure convergence before the three runs were combined. The maximum clade credibility tree was then inferred using TreeAnnotator and visualised with FigTree. Both Tracer and FigTree are available from http://tree.bio.ed.ac.uk/software. Molecular adaptation was also explored through the web server of the HyPhy package (63, 208, 210) (available at http://www.datamonkey.org/). The iFEL method was employed with a statistical threshold of p≤0.05 (209).
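To make the codon partitioning concrete, the following minimal sketch splits an in-frame coding sequence into the (1st + 2nd) and 3rd codon position partitions that were given separate substitution model parameters. It is an illustration of the partitioning idea only, not the BEAST configuration used here; the function name and the toy sequence are hypothetical.

```python
def split_codon_partitions(coding_seq):
    """Split an in-frame coding sequence into (1st+2nd, 3rd) position strings."""
    # Positions are 0-based, so i % 3 == 2 marks the third codon position.
    pos12 = "".join(base for i, base in enumerate(coding_seq) if i % 3 != 2)
    pos3 = coding_seq[2::3]
    return pos12, pos3

toy = "ATGAAGTTT"  # hypothetical three-codon fragment: ATG AAG TTT
pos12, pos3 = split_codon_partitions(toy)
print(pos12, pos3)
```

Third codon positions are mostly synonymous and typically evolve faster than first and second positions, which is why the two partitions are commonly modelled separately.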

6.3.7 Genome wide examination for recombination in GII.4 viruses

Full-length GII.4 genome sequences were obtained from GenBank and then combined with 15 new sequences generated as part of this study. There was an over-representation of 2006b sequences from two Japanese data sets of full-length GII.4 genomes identified between 2006 and 2009 (178, 179). Other GII.4 clades, such as Farmington Hills 2002, were also found to contain highly homogeneous sequences. Therefore, the alignment of 252 sequences was trimmed to remove those sequences with >99% identity. This resulted in an alignment of 81 sequences with representatives from all major GII.4 clades identified between 1974 and 2010, except Japan 2001, as no full-length genome representative was available from GenBank.
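One simple way to thin an alignment of near-identical sequences is a greedy filter, sketched below. This is an assumed procedure for illustration and not necessarily how the trimming was performed in this study; the function names and toy strains are hypothetical, and the result of a greedy filter depends on the order in which sequences are visited.

```python
def pairwise_identity(a, b):
    """Fraction of identical positions, skipping columns with a gap in either sequence."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)

def filter_redundant(seqs, threshold=0.99):
    """Keep a sequence only if it is not more than `threshold` identical to any kept one."""
    kept = {}
    for name, seq in seqs.items():
        if all(pairwise_identity(seq, other) <= threshold for other in kept.values()):
            kept[name] = seq
    return kept

# Hypothetical aligned sequences of 200 nt
toy_alignment = {
    "strainA": "A" * 200,
    "strainB": "A" * 199 + "C",       # 99.5% identical to strainA, removed
    "strainC": "A" * 190 + "C" * 10,  # 95% identical to strainA, kept
}
representatives = filter_redundant(toy_alignment)
print(sorted(representatives))
```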


The alignment was examined for strains that were positioned outside of the major GII.4 clades by phylogenetic analysis of the full-length genome. This led to the exclusion of three strains, Toyama5/2008/JP, Iwate5/2007/JP and PC51/2007/IN, as the recombination events were suspected to be artificial. The remaining 78 sequences were then analysed using 3seq (28), which considers each triplet of sequences to identify potential recombination breakpoints. The significance threshold was p<0.05 and the Dunn-Sidak correction was applied to correct for multiple testing. Maximum likelihood phylogenies were produced using MEGA5 based on sequences between recombination breakpoint regions identified by 3seq. An appropriate model of molecular evolution was chosen according to the Akaike Information Criterion with correction (AICc). Individual recombinants were also compared to potential parental strains using SimPlot (156).
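For reference, the Dunn-Sidak correction adjusts a raw p-value p over m independent tests to 1 - (1 - p)^m, which is equivalent to testing each comparison at a per-test level of 1 - (1 - α)^(1/m). The sketch below shows both forms; the number of tests used in the example is a hypothetical value, not the number of comparisons made by 3seq in this analysis.

```python
import math

def sidak_alpha(alpha, m):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    # Equivalent to 1 - (1 - alpha) ** (1 / m), written for numerical stability.
    return -math.expm1(math.log1p(-alpha) / m)

def sidak_adjust(p, m):
    """Dunn-Sidak adjusted p-value for one raw p-value among m tests."""
    if p >= 1.0:
        return 1.0
    # Equivalent to 1 - (1 - p) ** m.
    return -math.expm1(m * math.log1p(-p))

m_tests = 1000  # hypothetical number of comparisons
print(sidak_alpha(0.05, m_tests))
print(sidak_adjust(1e-6, m_tests))
```

The per-test threshold is slightly less conservative than the Bonferroni value α/m, although the two are nearly identical when m is large.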

6.3.8 Nucleotide sequence accession numbers

The GenBank accession numbers for strains sequenced in this study are as follows. For the full-length genome data set: GQ845367 - GQ845369, HM748971 - HM748973, JQ613570 - JQ613573, EF187497, EF684915, GQ845024, GQ845366 and DQ078814. For the NSW capsid data set: JQ613507 - JQ613566, GQ845312, GQ845317, GQ845318, GQ845329, GQ845332, GQ845336, GQ845337, GQ845341, GQ845345, GQ849126 and GQ849128. The sequence alignments used for each analysis are available from the author by request.


6.4 Results

6.4.1 Epidemics of acute gastroenteritis during 2009 and 2010

In NSW, Australia, there was an increase in the number of NoV-associated gastroenteritis outbreaks reported to the NSW Department of Health during the late winter periods of both 2009 and 2010 (Figure 6-1). The timing of these epidemics coincided with the emergence of a new GII.4 variant, New Orleans 2010, which appears to have replaced the previous pandemic variant, 2006b, as the predominant GII.4 variant in global circulation (214, 265, 288). Initially, we sought to determine the role of this variant in the winter outbreaks seen in Australia. In 2009, a peak in gastroenteritis outbreaks was reported between August and October (252% higher compared to the three previous months, n=338 outbreaks). In 2010, the peak months for reported gastroenteritis outbreaks were between July and September (114% higher compared to the three previous months, n=257 outbreaks) (Figure 6-1). The majority of outbreaks occurred within aged care facilities (47.9%), followed by child care centres (35.2%) and hospitals (14.4%). Furthermore, the 2009 and 2010 peaks in institutional gastroenteritis outbreaks coincided with increased detection of NoV positive faecal specimens at the SEALS diagnostic facility (Figure 6-1).


Figure 6-1 Epidemics of institutional acute gastroenteritis coincided with increases in NoV detection across NSW, Australia.

The monthly totals of institutional gastroenteritis outbreaks reported to NSW Health (black) were compared to the monthly totals of NoV positive faecal specimens detected by SEALS, POWH (grey) for the period January 2009 to December 2010. Increases in both institutional gastroenteritis outbreaks and NoV detection in clinical faecal specimens were observed in the late winter periods of 2009 and 2010.


6.4.2 NoV strains identified in this study

In order to characterise the aetiological agent of these epidemics of acute gastroenteritis, 59 and 64 NoV-positive stool specimens from 2009 and 2010, respectively, were examined using real-time RT-PCR. These were drawn from both outbreaks (n=29, 2009; n=24, 2010) and sporadic infections (n=3, 2009; n=26, 2010). Each specimen was then genotyped based on partial capsid gene sequencing.

In 2009, two NoV genotypes were identified from the 59 specimens tested: GII.4 (94.9%, n=56/59) and GII.g/GII.12 (5.1%, n=3/59) (Table 6-1). The predominant GII.4 variant, identified in 67.8% of samples (n=40/59, including 19/29 outbreaks), was a discrete sub-cluster of the GII.4 New Orleans 2010 variant, referred to here as the ‘NSW-lineage’ (Table 6-1). Closely related viruses that clustered with the New Orleans 2010 variants detected in the US, Europe and Asia, such as NewOrleans/1805/2009/US (GenBank accession number GU445325) (Figure 6-2), were also identified at lower frequency (13.6%, n=8/59) (Table 6-1). In this study we refer to these New Orleans 2010 variants as the ‘global-lineage’ to differentiate them from the NSW-lineage. The remaining GII.4 viruses belonged to previously described GII.4 variants, including 2006b (6.8%, n=4/59) and Apeldoorn 2008 (6.8%, n=4/59) (Table 6-1).

In 2010, six NoV genotypes were identified from the 64 specimens tested: GII.4 (73.4%, n=47/64), GII.b/GII.3 (14.1%, n=9/64), GII.7 (6.3%, n=4/64), GIV.1 (3.1%, n=2/64), GII.g/GII.12 (1.6%, n=1/64) and GI.b/GI.6 (1.6%, n=1/64) (Table 6-1). The predominant GII.4 viruses identified during 2010 were those belonging to the GII.4 New Orleans 2010 global-lineage (65.6%, n=42/64, including 19/24 outbreaks). Only a single sporadic infection was caused by a virus of the NSW-lineage (1.6%, n=1/64). The GII.4 2006b variant was also identified in four sporadic infections (6.3%, n=4/64) (Table 6-1).


Table 6-1 Prevalence of NoV genotypes identified during 2009 and 2010

| Period of study | Genotype(a) | Prototype strain(b) | GenBank accession | No. of infections | % of total infections | No. of outbreaks | % of total outbreaks |
|---|---|---|---|---|---|---|---|
| January - December 2009 | GII.4 New Orleans 2010 [NSW] | Orange/NSW001P/2008/AU | GQ845367 | 40 | 67.8 | 19 | 65.5 |
| | GII.4 New Orleans 2010 [Global] | NewOrleans/1805/2009/US | GU445325 | 8 | 13.6 | 5 | 17.2 |
| | GII.4 2006b | DenHaag/89/2006/NL | EF126965 | 4 | 6.8 | 2 | 6.9 |
| | GII.4 Apeldoorn 2008 | Apeldoorn/317/2007/NL | AB445395 | 4 | 6.8 | 2 | 6.9 |
| | GII.g/GII.12 | St George/NSW199U/2008/AU | GQ845370 | 3 | 5.1 | 1 | 3.5 |
| | Total | | | 59 | 100 | 29 | 100 |
| January - December 2010 | GII.4 New Orleans 2010 [Global] | NewOrleans/1805/2009/US | GU445325 | 42 | 65.6 | 19 | 79.2 |
| | GII.b/GII.3 | Sydney/C14/2002/AU | AY845056 | 9 | 14.1 | 1 | 4.2 |
| | GII.4 2006b | DenHaag/89/2006/NL | EF126965 | 4 | 6.3 | 0 | 0 |
| | GII.7 | Gwynedd/273/1994/US | AF414409 | 4 | 6.3 | 1 | 4.2 |
| | GIV.1 | Alphatron/98-2/1998/NL | AF195847 | 2 | 3.1 | 2 | 8.3 |
| | GII.4 New Orleans 2010 [NSW] | Orange/NSW001P/2008/AU | GQ845367 | 1 | 1.6 | 0 | 0 |
| | GII.g/GII.12 | St George/NSW199U/2008/AU | GQ845370 | 1 | 1.6 | 1 | 4.2 |
| | GI.b/GI.6 | Wakayama/WUG1/2000/JP | AB081723 | 1 | 1.6 | 0 | 0 |
| | Total | | | 64 | 100 | 24 | 100 |

(a) Genotypes based on nomenclatures proposed by NoroNet (140). Recombinant strains are presented with ORF1/ORF2 designations.
(b) Prototype strains presented as Location/Strain ID/Year/Country.


6.4.3 Distinct GII.4 variants were identified as the cause of each epidemic in 2009 and 2010

Since 2007, at least four new GII.4 variants have been identified, including Cairo 2007 (126), Osaka 2007 (179), Apeldoorn 2008 (15) and New Orleans 2010 (81, 288) (Table 6-2). We sought to compare these recently emerged GII.4 variants to those previously identified since 1974 to determine the evolutionary mechanisms that influenced their emergence. Eighty new full-length capsid gene sequences were generated from Australian strains identified between 2007 and 2010 from a total of 55 NoV outbreaks and 25 unrelated sporadic infections. The sequences were then added to an alignment of 181 reference sequences derived from GenBank, which included representative strains from all known GII.4 pandemic clades: US 1995/96, Farmington Hills 2002, Hunter 2004, 2006b and New Orleans 2010 (Table 6-2). Each pandemic GII.4 variant was characterised by a period of temporal dominance of approximately 2 years, although some variants, such as US 1995/96 and 2006b, demonstrated longer periods of circulation (Table 6-2). Also represented were sequences from other GII.4 variant clades that were associated with localised regional epidemics rather than pandemics, including Japan 2001, Henry 2001, Asia 2003, 2006a, Osaka 2007 and Apeldoorn 2008, as well as ancestral GII.4 strains such as CHDC 1970s, Bristol 1993 and Camberwell 1994 (Table 6-2).


Table 6-2 GII.4 variants examined in this study

| GII.4 variants(a) | Period of circulation | Prototype strain(b) | GenBank accession | Reference(c) |
|---|---|---|---|---|
| CHDC 1970s | 1974 - 1977 | CHDC/5191/1974/US | FJ537134 | (23) |
| Bristol 1993 | 1987 - 1993 | Bristol/B493/1993/GB | X76716 | (6) |
| Camberwell 1994 | 1987 - 1994 | Camberwell/101922/1994/AU | AF145896 | (45) |
| US 1995/96* | 1995 - 2002 | MiamiBeach/326/1995/US | AF414424 | (189) |
| Henry 2001 | 2000 - 2004 | Henry/2000/US | FJ411170 | (197) |
| Japan 2001 | 2002 - 2003 | Emmen/E006/2002/NL | AB303929 | (229) |
| Farmington Hills 2002* | 2002 - 2004 | FarmingtonHills/2002/US | AY502023 | (157) |
| Asia 2003 | 2003 - 2006 | Guangzhou/NVgz01/2003/CN | DQ369797 | (192) |
| Hunter 2004* | 2004 - 2006 | Hunter/NSW504D/2004/AU | DQ078814 | (38) |
| 2006a | 2006 - 2008 | Kenepuru/NZ327/2006/NZ | EF187497 | (258) |
| 2006b* | 2006 - 2010 | DenHaag/89/2006/NL | EF126965 | (258) |
| Osaka 2007 | 2007 - 2008 | Osaka/07138/2007/JP | AB434770 | (179) |
| Cairo 2007 | 2007 | Sutherland/NSW505G/2007/AU | GQ845368 | (126) |
| Apeldoorn 2008 | 2007 - 2009 | Apeldoorn/317/2007/NL | AB445395 | (15) |
| New Orleans 2010 [NSW]* | 2008 - 2010 | Orange/NSW001P/2008/AU | GQ845367 | (81) |
| New Orleans 2010 [Global]* | 2009 - 2010 | NewOrleans/1805/2009/US | GU445325 | (288) |

(a) GII.4 variants marked with an asterisk (*) were associated with pandemics of acute gastroenteritis and demonstrated a global distribution.
(b) Prototype strains presented as Location/Strain ID/Year/Country.
(c) References refer to the first report or full epidemiological description for each variant.


Using the Bayesian methods implemented in BEAST, the phylogenetic history of all GII.4 variants in circulation between 1974 and 2010 was reconstructed (Figure 6-2A). This analysis revealed that the GII.4 variants have evolved with a mean substitution rate of 5.54 × 10⁻³ subs/site/year since 1974 (S.E. ± 7.79 × 10⁻⁶) and demonstrate the complex oscillatory population dynamics consistent with an acute RNA virus that causes periodic epidemics of disease (Figure 6-2B). Furthermore, the cyclic changes in relative genetic diversity (NeT) have mostly coincided with the emergence of novel variants that caused pandemics and spread globally (Figure 6-2B). The most recently identified GII.4 variant, New Orleans 2010, forms two distinct genetic sub-clusters, the NSW and global-lineages, which have a mean group identity of 96.95% across the capsid gene (Figure 6-2A and C). The GII.4 New Orleans variants share a common ancestry with the GII.4 Apeldoorn 2008 variants (Figure 6-2A). The New Orleans 2010 NSW-lineage appears to be a novel Australian lineage, as no other sequences from GenBank formed part of this clade (Figure 6-2C). In contrast, the majority of GII.4 strains identified during the 2010 epidemic period clustered with the New Orleans 2010 global-lineage, which included strains identified in the US, France, Hungary, Hong Kong, Taiwan and Japan (Figure 6-2C).
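To give a sense of scale for this rate, the short calculation below converts the reported mean of 5.54 × 10⁻³ subs/site/year into expected substitutions along a single lineage. The capsid gene length of 1620 nt is an assumption made for illustration (it is not stated in the text), and the calculation ignores rate variation among sites and lineages as well as multiple substitutions at the same site.

```python
RATE = 5.54e-3         # mean substitutions/site/year reported for the GII.4 capsid gene
YEARS = 2010 - 1974    # time span covered by the sampled sequences
GENE_LENGTH_NT = 1620  # assumed VP1 coding length in nucleotides (illustrative only)

# Expected substitutions per site accumulated along one lineage over the whole period
subs_per_site = RATE * YEARS

# Expected substitutions per year across the whole gene along one lineage
subs_per_gene_per_year = RATE * GENE_LENGTH_NT

print(f"{subs_per_site:.3f} substitutions/site over {YEARS} years")
print(f"{subs_per_gene_per_year:.1f} substitutions/gene/year")
```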


Figure 6-2 Phylogenetic reconstruction of NoV GII.4 capsid evolution from 1974 to 2010. (A) Time-scaled phylogenetic analysis of the NoV GII.4 capsid gene (VP1) using strains derived from GenBank (n=181) and strains detected in NSW, Australia from 2007-2010 (n=80) (total n=261). Each GII.4 clade has been previously described (50), whilst the two lineages of the New Orleans 2010 variant are recently emerged. This analysis showed that since 1974, the GII.4 variants have evolved with a mean substitution rate of 5.54 × 10⁻³ subs/site/year. (B) The demographic history of the GII.4 variants shown as changes in relative genetic diversity (NeT) through time using a Bayesian Skyline Plot (77, 78). The GII.4 variants demonstrate a complex population structure with oscillations that coincide with the emergence of novel variants causing epidemics of acute gastroenteritis. The black line represents the median posterior value and the grey lines the 95% Highest Posterior Density intervals. (C) A close-up view of the two New Orleans 2010 lineages, NSW-lineage and global-lineage, with the strains identified in this study shown in bold text. The x-axis for all panels was scaled to time (years).


6.4.4 Antigenic variation facilitated the emergence of the GII.4 New Orleans 2010 variant

Viruses of the GII.4 New Orleans 2010 NSW-lineage were initially identified during 2008 (81); prototype strain Orange/NSW001P/Nov/2008 (GenBank accession number GQ845367). In July 2009, viruses of the NSW-lineage were identified that had amino acid substitutions at known antibody binding sites on the viral capsid (61); prototype strain Rockdale/NSW006D/Jul/2009 (GenBank accession number JQ613570). Furthermore, all viruses of the NSW-lineage identified after the Rockdale virus contained the same substitutions, which may have contributed to their predominance during the epidemic of acute gastroenteritis in NSW between August and October 2009. During the same month, July 2009, the New Orleans 2010 global-lineage was also identified for the first time in NSW.

Since both lineages of New Orleans 2010, the NSW and global-lineages, were identified in July 2009, immediately prior to the epidemic, we sought to determine the potential antigenic differences between them and the previously circulating GII.4 pandemic variant 2006b. A number of studies have shown that NoV capsid evolution is shaped by escape from herd immunity through amino acid variation mapped onto the most surface exposed region of the viral capsid, the P2 domain (3, 4, 61, 73, 150-152). Therefore, we chose the capsid P2 domain as our region of analysis. The strain Westmead/NSW3639/Nov/2008 (GenBank accession number GQ845366) was selected to represent the 2006b variants that caused the epidemic of acute gastroenteritis in NSW during 2008 (81). In this analysis, 22 residues were identified across the capsid P2 domain where variation was observed between the recent GII.4 New Orleans 2010 variants and 2006b (Figure 6-3). Of these 22 residues, 14 were common to both the global and NSW-lineages of New Orleans 2010 (coloured green in Figure 6-3). Of the remaining eight changes, five were unique to the NSW-lineage (residues 297, 356, 376, 377 and 395, coloured blue in Figure 6-3) and three were unique to the global-lineage (residues 294, 341 and 396, coloured red in Figure 6-3). In both New Orleans 2010 lineages, the unique changes included residues within two epitopes (A and D) previously shown to be targets for antibody binding (61, 150-152), which are likely to provide antigenic distinction from the previously circulating GII.4 2006b variant (Figure 6-3).
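The comparison of each lineage against the 2006b reference amounts to listing the aligned residues that differ and then sorting them into shared and lineage-specific changes. A minimal sketch of that bookkeeping is given below; the function names, the six-residue toy peptides and the starting residue number are hypothetical and do not correspond to real capsid sequences.

```python
def variable_sites(reference, query, start_residue=1):
    """Return [(residue_number, ref_aa, query_aa)] where two aligned proteins differ."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [(start_residue + i, r, q)
            for i, (r, q) in enumerate(zip(reference, query)) if r != q]

def classify_changes(reference, lineage_a, lineage_b, start_residue=1):
    """Split changed residue numbers into shared, lineage A only and lineage B only."""
    a = {pos for pos, _, _ in variable_sites(reference, lineage_a, start_residue)}
    b = {pos for pos, _, _ in variable_sites(reference, lineage_b, start_residue)}
    return {"shared": sorted(a & b), "a_only": sorted(a - b), "b_only": sorted(b - a)}

# Hypothetical aligned peptide fragments, numbered from residue 294
ref_2006b = "TSRDGH"
lineage_a = "TNRDEH"
lineage_b = "TNKDGH"
changes = classify_changes(ref_2006b, lineage_a, lineage_b, start_residue=294)
print(changes)
```

In this sketch "shared" means only that the position differs from the reference in both lineages; the substituted amino acids themselves are not required to match.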

Figure 6-3 Antigenic variation in the GII.4 New Orleans 2010 variant.

The top panels show 22 sites of antigenic variation in the NSW-lineage (left) and global-lineage (right), compared to the 2006b strain NSW3639/Nov/2008, mapped onto the GII.4 capsid P2 domain dimer structure. Fourteen of these sites are shared between the New Orleans 2010 NSW and global-lineages (coloured green). Five changes are unique to the NSW-lineage (coloured blue) and three are unique to the global-lineage (coloured red). A number of these positions are involved in neutralizing monoclonal antibody (mAb) binding or Histo-Blood Group Antigen (HBGA) binding selectivity; these residues are referred to as epitope A and D, respectively (18). In the table, strain names in bolded and italicised text were associated with epidemics of acute gastroenteritis in NSW, Australia during the year of identification.


6.4.5 Genome-wide examination of recombination in GII.4 variants from 1974 to 2010

The final aim of this study was to examine the evidence for intra- and inter-genotype recombination as a mechanism facilitating the emergence of new GII.4 variants and, in particular, the New Orleans 2010 variant that has replaced 2006b as the predominant GII.4 variant in circulation.

Selection of full-length genome sequences for recombinant analysis using 3seq: Full-length genome sequencing is not common practice in NoV molecular epidemiological studies, and the lack of full-length GII.4 sequences has hindered the search for recombinant NoVs. It has also made it difficult to distinguish true intra-genotype recombination from other evolutionary forces such as molecular adaptation and lineage-specific rate variation, or even from artefacts created during PCR or errors introduced during sequence assembly [reviewed in (170)]. Therefore, we developed a two-step reverse transcription PCR to amplify the full-length NoV genome as a single amplicon (approximately 7.6 kb) (Supplementary information Figure 6-7). Using this method, we amplified and sequenced 15 new full-length genomes from representative GII.4 strains identified between 2004 and 2010. This included representative strains from the Hunter 2004, 2006a, 2006b, Osaka 2007, Cairo 2007, Apeldoorn 2008 and New Orleans 2010 GII.4 variants (Table 6-2). These 15 strains were aligned against all available full-length genome GII.4 strains from GenBank (n=237 as of 4th October 2011). Following the removal of highly similar sequences (those with >99% similarity), this resulted in a refined alignment of 81 sequences with representatives from all major GII.4 clades identified between 1974 and 2010 except Japan 2001 (Table 6-2).

To identify and remove putative artificial recombinant genomes, the alignment was then examined for individual strains that were positioned outside of the major GII.4 clades by phylogenetic analysis of the full-length genome. This identified three recombinant strains, which are shown in Supplementary information Figure 6-8. The strain Toyama5/2008/JP (GenBank accession number AB541362) was found to have a single breakpoint at position 5631 nt with the parental strains 2006b (1-5631) and Apeldoorn 2008 (5632-7509) (Supplementary information Figure 6-8). The breakpoint is approximately 500 nt downstream of the ORF1/2 overlap and was located close to the primer binding site used in the study (position 5433 nt). Therefore, it is possible that the recombination present in Toyama5/2008/JP may be artificial. The strain Iwate5/2007/JP (GenBank accession number AB541275), referred to as Japan 2007b in Motomura et al. (179), was also found to have three breakpoints at positions 352, 2765 and 5085 nt. This produced a mosaic genome from Apeldoorn 2008 and 2006b parental strains (Supplementary information Figure 6-8). Another strain with recombination breakpoints within ORF1 was PC51/2007/IN (GenBank accession number EU921388) (Supplementary information Figure 6-8). Across the majority of the genome, PC51/2007/IN was related to Osaka 2007; however, between positions 3744 and 4320 nt the sequence showed high similarity to GII.b viruses, including the strain PC52/2007/IN (GenBank accession number EU921389), which was characterised in the same study (50). Therefore, this recombinant may also be artificial, especially since the authors used 10 separate RT-PCR reactions to amplify the full-length genome and the breakpoints were adjacent to primer binding sites (50).

For the remaining strains (n=78), the complete genome sequences were examined for recombination using 3seq (28). This program provides statistical tests for detecting mosaic structures in sequences by comparing relationships among sequence triplets. We used this method because it is resilient to false-positives from rate variation amongst sites, it is well-suited to large datasets, and it is able to calculate breakpoints with corresponding statistical significance. In this analysis, a total of eight potential recombination events were detected. The distribution of the breakpoints along the genome is plotted in Figure 6-4A. Most breakpoints (n=5/8) were located near the ORF1/2 overlap (approximate position 5100 nt), a commonly identified site of recombination in norovirus (34, 37). Two breakpoints were identified near the ORF2/3 overlap and one breakpoint internal to ORF2 was identified in the Cairo 2007 strain NSW505G/2007/AU. To examine the putative recombination events, the same alignment was used to produce maximum likelihood phylogenies of ORF1, ORF2 and ORF3 separately (Figure 6-4B and Supplementary information Figure 6-10, Figure 6-11 and Figure 6-12).
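The idea behind a triplet-based test can be sketched as a walk along the informative sites of one candidate child and two candidate parents: the walk steps up where the child matches only the first parent and down where it matches only the second, and a large sustained drop suggests that the child switches from one parent to the other. The code below is a simplified illustration of that walk and its maximum descent. It is not the 3seq implementation and does not compute the exact p-value that 3seq assigns to the descent; the function names and toy sequences are hypothetical.

```python
def informative_steps(child, parent_p, parent_q):
    """Return [(position, step)]: +1 if child matches P only, -1 if it matches Q only."""
    steps = []
    for pos, (c, p, q) in enumerate(zip(child, parent_p, parent_q)):
        if p == q or "-" in (c, p, q):
            continue  # site cannot discriminate between the two parents
        if c == p:
            steps.append((pos, +1))
        elif c == q:
            steps.append((pos, -1))
    return steps

def max_descent(steps):
    """Largest drop of the cumulative walk and the (start, end) positions spanning it."""
    height, peak, peak_pos = 0, 0, None
    best, best_span = 0, (None, None)
    for pos, step in steps:
        height += step
        if height > peak:
            peak, peak_pos = height, pos
        if peak - height > best:
            best, best_span = peak - height, (peak_pos, pos)
    return best, best_span

# Hypothetical triplet: the child matches P for the first half and Q for the second
parent_p = "AAAAAAAAAA"
parent_q = "CCCCCCCCCC"
child = "AAAAACCCCC"
descent, span = max_descent(informative_steps(child, parent_p, parent_q))
print(descent, span)
```

In this toy case the descent begins after position 4, which is where the candidate breakpoint would be placed; a child identical to one parent throughout produces no descent at all.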


Figure 6-4 Recombination breakpoints identified across the genome in the GII.4 lineage. An alignment of 78 full-length genome sequences that contained representatives from all major GII.4 variant clades except Japan 2001 was analysed using the program 3seq. (A) The summarised results that identified eight recombination breakpoints along the entire genome. The range and mean value for each breakpoint are shown along with their genome position (x-axis). The putative recombinants are listed in the legend with coloured lines matching each corresponding breakpoint. The three open reading frames (ORFs) are shown along with the structural domains of the capsid protein (ORF2), which are the N-terminal (N), Shell (S), P1 and P2 domains. Most breakpoints localised to the ORF1/2 and ORF2/3 overlaps; however, a single breakpoint was identified within ORF2. (B) Maximum likelihood phylogenies of the ORF1, ORF2 and ORF3 regions from the alignment of NoV GII.4 strains used in the 3seq recombination analysis (n=78). Bootstrap support for key nodes is shown (1000 replicates). All major GII.4 clades are represented and labelled. The branches of recombinant clades/strains are coloured as per panel A. The scale bars represent the number of substitutions per site. The phylogenetic inconsistencies across the different ORFs indicate recombination.


ORF1/2 recombination in the GII.4 lineage: The phylogenetic analysis revealed that the Asia 2003, Osaka 2007 and CHDC 1970s clades have novel ORF1s that sit away from the remaining GII.4 clades and are therefore likely to be inter-genotype recombinants with an ORF1/2 breakpoint (Figure 6-4). The mean inter-group nucleotide distance from the Osaka 2007 and Asia 2003 clades to the main GII.4 lineage was 12.47% and 11.82%, respectively. The CHDC 1970s viruses are the oldest GII.4 strains sequenced and have a 14.19% mean inter-group nucleotide distance in ORF1 to the main GII.4 lineage. In comparison, the mean intra-group distance of the main GII.4 lineage was 5.61%. Based on these comparisons, the Asia 2003, Osaka 2007 and CHDC 1970s variants may be inter-genotype recombinants; however, it is possible that they represent divergent ancestral GII.4 lineages (Table 6-3).
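The inter-group and intra-group distances quoted above are averages of pairwise distances between, or within, sets of aligned sequences. The sketch below shows the calculation using uncorrected p-distances; the distance model actually used for the values in the text is not stated here, so the choice of p-distance is an assumption, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations, product

def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def mean_between_group(group1, group2):
    """Mean pairwise distance over all pairs drawn one from each group."""
    distances = [p_distance(a, b) for a, b in product(group1, group2)]
    return sum(distances) / len(distances)

def mean_within_group(group):
    """Mean pairwise distance over all pairs within one group."""
    distances = [p_distance(a, b) for a, b in combinations(group, 2)]
    return sum(distances) / len(distances)

# Hypothetical aligned ORF1 fragments
main_lineage = ["AAAAAAAAAA", "AAAAAAAAAC"]
divergent_clade = ["AACCAAAAAA"]
between = mean_between_group(main_lineage, divergent_clade)
within = mean_within_group(main_lineage)
print(f"between: {between:.2%}, within: {within:.2%}")
```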

The Japan 2008b clade presented with clear evidence of ORF1/2 intra-genotype recombination (Figure 6-4) as previously reported (179). The parent for the Japan 2008b ORF1 was GII.4 Apeldoorn 2008 and the parent for ORF2-3 was GII.4 2006b (Table 6-3). The recent GII.4 variant, New Orleans 2010, was also identified as an ORF1/2 intra-genotype recombinant (breakpoint position 4915 – 5221 nt) with an ORF1 region derived from a 2006a parent and an ORF2-3 from Apeldoorn 2008 (Figure 6-4 and Table 6-3).

ORF2/3 recombination in the GII.4 lineage: Two of the eight recombination breakpoints were identified near the ORF2/3 overlap (single nucleotide overlap). The Osaka 2007 variants had a second breakpoint between 6542 – 6594 nt, which is approximately 100 nt upstream of the ORF2/3 overlap (Figure 6-4A). These variants have a distinct GII.4 ORF2 region; however, their ORF3 clustered closely with 2006b variants (Figure 6-4B and Table 6-3). The ancestral GII.4 strain CHDC2094/74/US (GenBank accession number FJ537135) was also found to have a breakpoint near the ORF2/3 overlap at 6661 – 6821 nt (Figure 6-4A). Across both ORF1 and ORF2, CHDC2094/74/US was related to the ancestral GII.4 CHDC 1970s variants; however, the CHDC2094/74/US ORF3 was novel and demonstrated only 83.88% nucleotide identity to its closest GII.4 relatives, the CHDC 1970s variants CHDC5191/74/US and CHDC4871/77/US (Figure 6-4A and Table 6-3). Importantly, this ORF3 region also did not demonstrate a close relationship to any other NoV GII genotype.

Recombination within the ORF2 region: In NoV, recombination typically occurs at the ORF1/2 overlap; however a number of studies have suggested that recombination may occur at breakpoints within ORF2, the VP1 gene, near the P2 domain boundaries (146, 152, 221). This scenario could lead to the creation of antigenically novel variants and contribute to the emergence of GII.4 variants.

In our analysis of full-length genome sequences using 3seq, only the Cairo 2007 variant NSW505G/2007/AU (GenBank accession number GQ845368) was found to have a possible recombination breakpoint within the capsid gene, between 5591 – 5602 nt, near the capsid shell/P1 domain boundary (Figure 6-4A and Table 6-3). In ORF1, the Cairo 2007 strain was distantly related to the Bristol 1993 variant, whereas in ORF2 it clustered closer to the Osaka 2007 variant (Figure 6-4B). The Japan 2001 GII.4 variant was not included in our 3seq analysis as no full-length genome sequence was available. In order to include the Japan 2001 variant as a possible parent, the region of analysis was shortened to a partial ORF1/complete ORF2 (genome position 4286 – 6707 nt). Using similarity plots, the Cairo 2007 strain was then compared to Japan 2001, Bristol 1993, US 1995/96 and Osaka 2007 to identify possible parent strains (Figure 6-9). This revealed that the likely ORF1 parent of Cairo 2007 is actually Japan 2001, not Bristol 1993. Following a sharp drop in identity to Japan 2001 at the proposed breakpoint (near position 5596 nt), the Cairo 2007 sequence was more closely related to the Osaka 2007 and US 1995/96 variants than to Japan 2001 (Figure 6-9). However, for both of these possible parents, the identity to the Cairo 2007 sequence falls below 90% in the P2 domain (Figure 6-9), indicating that if this were a true recombination event, it took place in the past and antigenic drift subsequently occurred.
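A similarity plot of the kind used here reports the percent identity between a query and a reference sequence in a window that slides along the alignment, so that a change of closest relative shows up as crossing curves. The following is a minimal sketch of that calculation; it is not SimPlot itself, and the window size, step size, function name and toy sequences are assumptions chosen for illustration.

```python
def sliding_identity(query, reference, window=200, step=20):
    """Return [(window_midpoint, percent_identity)] along two aligned sequences."""
    points = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = reference[start:start + window]
        matches = sum(a == b for a, b in zip(q, r))
        points.append((start + window // 2, 100.0 * matches / window))
    return points

# Hypothetical query that resembles parent 1 upstream and parent 2 downstream of position 300
query = "A" * 300 + "C" * 300
parent1 = "A" * 600
parent2 = "C" * 600
profile1 = sliding_identity(query, parent1, window=100, step=100)
profile2 = sliding_identity(query, parent2, window=100, step=100)
print(profile1)
print(profile2)
```

Plotting both profiles against genome position would show identity to the first parent falling, and identity to the second parent rising, at the simulated breakpoint.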

Table 6-3 Summary of recombination in the GII.4 lineage

| GII.4 variant | Genomic representative (a) | GenBank accession | Breakpoint(s) | ORF1 classification (b) | ORF2 classification (b) | ORF3 classification (b) | Type of recombination | References (c) |
|---|---|---|---|---|---|---|---|---|
| Osaka 2007 | Osaka1/2007/JP | AB541319 | 5078-5360; 6542-6594 | GII.e# | GII.4 Osaka 2007 | GII.4 2006b | Inter-genotype; Intra-genotype | (126, 179) |
| CHDC 1970s | CHDC5191/1974/US | FJ537134 | 4967-5281 | Ancestral GII.4 | GII.4 CHDC 1970s | GII.4 CHDC 1970s | Intra-genotype | (23) |
| Asia 2003 | Guangzhou/2003/CN | DQ369797 | 5002-5360 | GII.12# | GII.4 Asia 2003 | GII.4 Asia 2003 | Inter-genotype | (179, 192) |
| ~ | CHDC2094/1974/US | FJ537135 | 6661-6821 | Ancestral GII.4 | GII.4 CHDC 1970s | Novel GII | Inter-genotype | (23) |
| Japan 2008b | Toyama2/2008/JP | AB541356 | 4981-5126 | GII.4 Apeldoorn 2008 | GII.4 2006b | GII.4 2006b | Intra-genotype | (179) |
| New Orleans 2010 | NSW001P/2008/AU | GQ845367 | 4915-5221 | GII.4 2006a | GII.4 Apeldoorn 2008 | GII.4 Apeldoorn 2008 | Intra-genotype | (81, 146) |
| Cairo 2007 | NSW505G/2007/AU | GQ845368 | 5591-5602 | GII.4 Japan 2001 | GII.4 Osaka 2007-like | GII.4 Osaka 2007-like | Intra-genotype | This study |

(a) Genomic representative strains presented as Strain ID/Year/Country.
(b) Classified to level of genotype with GII.4 variant assignments also shown. Strains marked with a hash show their NoroNet classification but may in fact be ancestral GII.4. Recombinant regions are coloured grey.
(c) References include those that first reported a strain and/or described it as a recombinant.

Ch. 6 – NoV molecular epidemiology 2009-10

Evolution of the GII.4 New Orleans 2010 variant: It has been suggested that the most recent pandemic GII.4 variant, New Orleans 2010, was a recombinant with a mosaic P2 domain (indicating two breakpoints) derived from a 2006b-like virus (146). In our analysis of full-length genome sequences using 3seq, we did not detect any mosaic signal within ORF2 for New Orleans 2010; instead, we identified a recombination breakpoint at the ORF1/2 overlap (Figure 6-4). Our evidence therefore suggests that New Orleans 2010 was a 2006a/Apeldoorn 2008 intra-genotype recombinant, with the breakpoint residing at the ORF1/2 overlap (Figure 6-4). In order to clarify the possible origin of New Orleans 2010, the evidence for both recombination scenarios was considered: the single ORF1/2 recombination event proposed in this study and the mosaic P2 domain recombination proposed by Lam et al. (146). Factors that may have influenced each analysis to produce false-positive recombination signals were also considered.

Firstly, since we did not detect any recombination within ORF2 using our full-length genome data set, we repeated our 3seq analysis using the capsid (ORF2-encoding) sequences from our BEAST reconstructions (Figure 6-2). In this analysis, putative breakpoints were identified at around the P2 domain boundary in the Apeldoorn 2008 and New Orleans 2010 variants. The first breakpoint was located between 740 – 824 nt of ORF2 (genomic position 5824 – 5908 nt) and the second breakpoint was located between 1386 – 1443 nt of ORF2 (genomic position 6470 – 6527 nt). The position of these breakpoints matched those found by Lam et al. (146), which were 5899 nt and 6477 nt for the first and second breakpoints, respectively. To help remove effects from natural selection within the capsid, the 3seq analysis was repeated to consider just the 3rd codon position (-3po option). This analysis did not detect any recombination signal in the capsid alignment. Despite this, we examined the phylogenetic evidence for recombination by producing maximum likelihood phylogenies of the mosaic P2 region (capsid position 782 – 1415 nt). The mosaic P2 region for both Apeldoorn 2008 and New Orleans 2010 did suggest a ‘2006b-like’ origin by phylogenetic analysis; however, the bootstrap support for the key nodes was poor (<56%) (Supplementary information Figure 6-13). When the phylogenies were repeated using amino acid sequences, higher support was obtained for a ‘2006b-like’ ancestry; however, these phylogenies showed that Osaka 2007 and Cairo 2007, and not 2006b, were in fact the closest relatives of the Apeldoorn 2008 and New Orleans 2010 variants (Supplementary information Figure 6-14). This suggests either that the P2 domains of the GII.4 variants Osaka 2007, Cairo 2007, Apeldoorn 2008 and New Orleans 2010 have a recombinant origin, or that these variants have experienced parallel evolution, undergoing P2 amino acid substitutions similar to those that led to the evolution and emergence of the 2006b P2 domain.
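The restriction of the recombination screen to 3rd codon positions can be illustrated with a minimal Python sketch. This is not the 3seq implementation; it only shows how an in-frame coding alignment would be reduced to its third positions before being passed to a recombination test. The function names and the toy sequences are hypothetical and are not real norovirus data.

```python
def third_codon_positions(cds: str) -> str:
    """Return the nucleotides at codon position 3 of an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    # Index 2 is the third base of the first codon; step by one codon.
    return cds[2::3]


def strip_alignment_to_third_positions(alignment: dict) -> dict:
    """Apply third_codon_positions to every sequence in a name -> sequence mapping."""
    return {name: third_codon_positions(seq) for name, seq in alignment.items()}


# Toy in-frame fragments (hypothetical, not real norovirus sequences).
toy_alignment = {
    "strain_A": "ATGGCGTCT",
    "strain_B": "ATGGCATCC",
}
reduced = strip_alignment_to_third_positions(toy_alignment)
```

Because most third-position changes are synonymous, the reduced alignment is expected to be less affected by selection on the capsid protein, at the cost of a threefold loss of sites.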

Natural selection in the GII.4 capsid: Since the selection pressure across the capsid is likely to vary, the pattern of adaptation was examined using the iFEL method of the HyPhy package (Figure 6-5). Overall, the GII.4 capsid was characterised by strong purifying selection, with some codons under positive selection that mostly localised to the P2 domain, including codons 352, 357, 368, 393 and 395 (Figure 6-5A). The New Orleans 2010 variant was then compared to the Hunter 2004, 2006a, 2006b and Apeldoorn 2008 GII.4 variants using similarity plots of nucleotide sequence (Figure 6-5B). From the start of the capsid until the first breakpoint between 740 – 824 nt, the New Orleans 2010 variant shows the highest identity to the Hunter 2004, 2006a and Apeldoorn 2008 variants (>95%). Following the first breakpoint, the identity of New Orleans 2010 to Hunter 2004 and 2006a drops sharply to less than 80%, whilst the identity to Apeldoorn 2008 remains high (>95%) (Figure 6-5B). In the P2 domain, the nucleotide identity between New Orleans 2010 and 2006b is higher than the identity to Hunter 2004 and 2006a, which may partly explain the phylogenies suggesting a 2006b-like ancestry, although the identity remains less than 90% (Figure 6-5B). This suggests that a combination of differential selective pressure, hyper-variability and parallel evolution in the P2 domain may have led to the identification of a false-positive recombination signal in the recent New Orleans 2010 variant.
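The way individual codons are labelled in a site-by-site selection analysis can be sketched as follows. This is an illustrative decision rule only, assuming that the per-codon statistic is centred on zero (positive values indicating an excess of non-synonymous change) and that significance is judged at p<0.05; it does not reproduce the iFEL likelihood calculations of HyPhy. The function name and the example values are hypothetical.

```python
def classify_codon(selection_statistic: float, p_value: float, alpha: float = 0.05) -> str:
    """Label a codon from a zero-centred selection statistic and its p-value."""
    if p_value >= alpha:
        return "not significant"
    if selection_statistic > 0:
        return "positive selection"
    if selection_statistic < 0:
        return "purifying selection"
    return "neutral"


# Hypothetical per-codon results: (codon, statistic, p-value).
toy_results = [(1, -2.1, 0.001), (2, 1.4, 0.03), (3, 0.8, 0.40)]
labels = {codon: classify_codon(stat, p) for codon, stat, p in toy_results}
```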

Figure 6-5 Evolutionary forces on the GII.4 New Orleans 2010 variant ORF2.

(A) The distribution of non-synonymous/synonymous change across the ORF2-encoded GII.4 capsid protein, determined using the iFEL method (209) of the HyPhy package (210) on the datamonkey webserver (63, 208). The putative breakpoints (codon positions) near the P2 domain boundary are shown with ranges and mean values. The difference between the non-synonymous and synonymous rates (dN − dS) is plotted on the y-axis and the codon position on the x-axis. A value > 0 indicates that the codon is under positive selection; a value < 0 indicates purifying selection; a value of 0 indicates neutral evolution. Codon positions with statistical significance (p<0.05) are coloured as per the legend. The NoV GII.4 capsid was characterised by strong purifying selection across the gene; however, codons under positive selection were mostly localised to the P2 domain. Since the adaptive pressure varies across the capsid, this may have influenced the recombination analysis. (B) A similarity plot comparing the GII.4 New Orleans 2010 variant (Orange/NSW001P/2008/AU) to the GII.4 variants Hunter 2004, 2006a, 2006b and Apeldoorn 2008, coloured as per the key. The y-axis shows the percentage nucleotide identity and the x-axis shows the nucleotide position of ORF2. The putative breakpoints are shown as per panel A, except that they are given as nucleotide positions. (C) A breakdown of the capsid into its different structural domains. Their relative nucleotide and amino acid positions are also indicated. The x-axes of panels A to C are to scale.
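The relationship between the three coordinate systems used for the putative breakpoints (ORF2 nucleotide position, genomic position and codon position) can be checked with a short worked example. The sketch below assumes that ORF2 begins at genomic position 5085, which is the offset implied by the paired ORF2/genomic coordinates quoted for the breakpoints in the text; the function names are illustrative.

```python
ORF2_START = 5085  # genomic position of the first ORF2 nucleotide (assumed from the quoted coordinates)


def orf2_to_genome(orf2_nt: int) -> int:
    """Convert a 1-based ORF2 nucleotide position to a 1-based genomic position."""
    return ORF2_START + orf2_nt - 1


def orf2_nt_to_codon(orf2_nt: int) -> int:
    """Return the 1-based codon number containing a 1-based ORF2 nucleotide position."""
    return (orf2_nt - 1) // 3 + 1


# Breakpoint ranges reported in ORF2 coordinates.
first_breakpoint = (740, 824)
second_breakpoint = (1386, 1443)

first_genomic = tuple(orf2_to_genome(p) for p in first_breakpoint)
second_genomic = tuple(orf2_to_genome(p) for p in second_breakpoint)
first_codons = tuple(orf2_nt_to_codon(p) for p in first_breakpoint)
second_codons = tuple(orf2_nt_to_codon(p) for p in second_breakpoint)
```

Under this assumption the converted ranges reproduce the genomic positions given in the text (5824 – 5908 nt and 6470 – 6527 nt).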

6.5 Discussion

Since its emergence in late 2008, a new NoV GII.4 variant, commonly referred to as New Orleans 2010, has spread globally, causing epidemics of acute gastroenteritis (186, 214, 265, 288). The variant was reported in the US, Australia, Europe and across Asia and has now replaced the previous pandemic variant 2006b as the predominant GII.4 variant in global circulation. This study aimed to describe the origin of New Orleans 2010 and other GII.4 variants and to examine the evolutionary mechanisms that facilitated their emergence.

In order to assess the impact of the New Orleans 2010 GII.4 variant, both locally in NSW and globally, we characterised two NoV-associated epidemics of acute gastroenteritis that occurred in NSW, Australia during the winters of 2009 and 2010 (Figure 6-1), then compared these to the NoV molecular epidemiological trends seen globally. Using RT-PCR followed by sequencing of the 5’ end of the capsid gene, 123 NoV-positive stool specimens were examined (59 from 2009; 64 from 2010). This revealed that two distinct genetic lineages of the GII.4 New Orleans 2010 variant were in circulation (Table 6-1 and Figure 6-2). The first, referred to here as the NSW-lineage, was the predominant virus identified in 2009, whilst the second, referred to here as the global-lineage, was also identified in 2009 but less frequently (Table 6-1). During 2010, however, the global-lineage became predominant whilst the NSW-lineage was only identified in a single sporadic infection (Table 6-1). The New Orleans 2010 NSW-lineage was originally identified in NSW, Australia in August 2008, which was the first report of the New Orleans 2010 variant (81). By comparing 80 full-length capsid sequences from NSW GII.4 strains identified between 2007 and 2010 to those available on GenBank, we found that the New Orleans 2010 NSW-lineage was confined to NSW, Australia (Figure 6-2C). This study did not include data from other Australian states; therefore, we cannot determine the extent of spread of the NSW-lineage into these other regions, although it was likely to have occurred. In contrast, the New Orleans 2010 global-lineage demonstrated a global distribution (Figure 6-2C). In the US, the New Orleans 2010 global-lineage has replaced 2006b as the predominant GII.4 variant in circulation (288) and was associated with hundreds of outbreaks during the 2009-2010 winter period; however, the magnitude of the epidemic was less than that observed following the emergence of 2006b. Two recent studies show that the New Orleans 2010 global-lineage emerged in Scotland and Finland in October and December 2009, respectively (172, 214), which was a similar time-frame to the emergence of the global-lineage in the US (October 2009) (288). In this study, the New Orleans 2010 global-lineage was first identified in NSW in July 2009, a few months earlier than in the previously mentioned studies. However, a French study described a GII.4 2008 strain (DijonE4032/2009/FR; GenBank accession number GQ246801) that was identified in a nosocomial outbreak in February 2009 (15). Our analysis showed that it clustered with the New Orleans 2010 global-lineage (Figure 6-2C); therefore, the New Orleans 2010 global-lineage has likely been circulating in Europe since early 2009. There are currently no published data describing the prevalence of the New Orleans 2010 variant in Asia. However, phylogenetic analysis of capsid gene sequences from GenBank shows that the New Orleans 2010 global-lineage has been circulating in Taiwan, Japan and Hong Kong since 2009 (Figure 6-2C).

The current study highlights that the global epidemiological trends of NoV GII.4 epidemic variants show a single predominant GII.4 variant in global circulation. However, region-specific epidemic GII.4 variants also co-circulate, such as seen with the New Orleans 2010 NSW-lineage and more broadly with the GII.4 variants Asia 2003, 2006a, Cairo 2007, Osaka 2007 and Apeldoorn 2008. Therefore, the epidemiological success of any given GII.4 variant is likely to be determined by a range of host and immunological factors but also, similar to influenza, will be influenced by the timing and source of seeding events from other regions.

In NoV, the emergence of novel strains has been linked with antigenic variation in the capsid P2 domain (4, 33, 41, 61, 151, 152). In this study, the epidemic form of the New Orleans 2010 NSW-lineage (Rockdale/NSW006D/2009/AU) and the New Orleans 2010 pandemic-lineage (NewOrleans/1805/2009/US) were both identified for the first time in July 2009, the month before the start of the epidemic period in NSW for that year. By comparing the capsid P2 domain protein sequences, we sought to identify potential antigenic differences between them and the previously circulating GII.4 pandemic variant 2006b (Figure 6-3). In total, 22 residues were identified across the capsid P2 domain where variation was observed between the recent GII.4 New Orleans 2010 variants and 2006b. It is possible that any of these 22 substitutions contributed to the emergence of the New Orleans 2010 variants, since a minimum of a single amino acid change in the antigenic region (P2 domain) of the MNV capsid protein is sufficient to avoid immune neutralization (155). However, the epitope A residues 297 and 298 specifically have been shown to be important modulators of GII.4 2006b antigenicity (61). Therefore, the R297H substitution present in the New Orleans 2010 NSW-lineage may have been a key mutation contributing to the predominance of this variant during the epidemic of 2009, ahead of viruses of the New Orleans 2010 global-lineage, where no substitutions at 297 or 298 were observed (Figure 6-3). It is also important to note that the R297H substitution was not present in the ‘pre-epidemic’ form of the NSW-lineage (NSW001P/Nov/2008) (Figure 6-3). The distinct changes at key antigenic sites between NSW001P/2008/AU and NSW006D/2009/AU provide evidence that evolutionary transitional strains facilitate the emergence of novel epidemic GII.4 variants. For comparison, the New Orleans 2010 global-lineage did not differ at any residue across the P2 domain between the strains identified initially in 2009 and those from the epidemic period of 2010. Therefore, despite both New Orleans 2010 lineages circulating in NSW in the month preceding the 2009 epidemic period, it may be that the NSW-lineage was more antigenically distinct, which allowed it to cause the epidemic by escaping any herd immunity generated from the previous circulation of 2006b.
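The comparison of aligned P2 domain protein sequences described above can be sketched in a few lines of Python. The example reports differences in the conventional notation used in the text (for example R297H: arginine in the reference replaced by histidine at residue 297). The function name is illustrative and the five-residue fragments are hypothetical placeholders, not real capsid sequences.

```python
def list_substitutions(reference: str, query: str, first_residue: int = 1) -> list:
    """List amino acid substitutions between two aligned protein sequences.

    first_residue is the residue number of the first aligned column.
    Columns containing a gap ("-") in either sequence are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = []
    for offset, (ref_aa, qry_aa) in enumerate(zip(reference, query)):
        if ref_aa != qry_aa and "-" not in (ref_aa, qry_aa):
            substitutions.append(f"{ref_aa}{first_residue + offset}{qry_aa}")
    return substitutions


# Hypothetical aligned fragments starting at residue 296 (toy data only).
toy_reference = "SRNDG"
toy_query = "SHNDG"
differences = list_substitutions(toy_reference, toy_query, first_residue=296)
```

Counting the entries returned for a full P2 alignment would give the number of variable residues between two variants.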

To date, there have been few reports on the role that recombination plays in the emergence of GII.4 variants (37, 146, 179). NoV is known to undergo homologous recombination at the ORF1/2 overlap, which facilitates the exchange of non-structural and structural genes (34, 37). Analogous to reassortment in influenza virus, recombination is likely to be an important mechanism contributing to viral emergence (216). In this study, we performed a genome-wide examination for recombination in the GII.4 lineage using the program 3seq and found a total of eight possible recombination breakpoints (Figure 6-4A). Of the eight, five occurred near the ORF1/2 overlap, two near the ORF2/3 overlap and one was identified within ORF2.
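The tally of breakpoints by genomic location can be reproduced from the breakpoint ranges listed in Table 6-3. The sketch below is illustrative only: it takes the ORF1/2 overlap as genomic positions 5085 – 5104 and the ORF2/3 junction as position 6707 (from the ORF regions given in the legends of Figures 6-10 to 6-12), and it treats a breakpoint range as "near" a junction if it lies within 200 nt of it. The 200 nt tolerance is an assumption made for this example and is not a threshold used in the study.

```python
from collections import Counter

ORF1_2_JUNCTION = (5085, 5104)  # overlap of ORF1 (5-5104) and ORF2 (5085-6707)
ORF2_3_JUNCTION = (6707, 6707)  # shared boundary of ORF2 and ORF3
TOLERANCE = 200                 # nt; assumed for illustration only

# (variant, range start, range end), taken from Table 6-3.
BREAKPOINTS = [
    ("Osaka 2007", 5078, 5360),
    ("Osaka 2007", 6542, 6594),
    ("CHDC 1970s (CHDC5191)", 4967, 5281),
    ("Asia 2003", 5002, 5360),
    ("CHDC2094", 6661, 6821),
    ("Japan 2008b", 4981, 5126),
    ("New Orleans 2010", 4915, 5221),
    ("Cairo 2007", 5591, 5602),
]


def gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Distance in nt between two closed intervals (0 if they overlap)."""
    return max(0, b_start - a_end, a_start - b_end)


def classify(start: int, end: int) -> str:
    """Assign a breakpoint range to a junction, or to ORF2 if near neither."""
    if gap(start, end, *ORF1_2_JUNCTION) <= TOLERANCE:
        return "ORF1/2 overlap"
    if gap(start, end, *ORF2_3_JUNCTION) <= TOLERANCE:
        return "ORF2/3 overlap"
    # All ranges in this data set that reach here lie inside ORF2.
    return "within ORF2"


counts = Counter(classify(start, end) for _, start, end in BREAKPOINTS)
```

With these inputs the counts agree with the totals stated in the text: five breakpoints near the ORF1/2 overlap, two near the ORF2/3 overlap and one within ORF2.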

Phylogenetic analysis revealed that three of the ORF1/2 recombinants had novel ORF1 regions that appeared distinct from those in the main GII.4 lineage (Figure 6-4B). The three ORF1s demonstrated enough diversity to classify them as inter-genotype recombinants. On average, the CHDC 1970s, Asia 2003 and Osaka 2007 variants had 87.17% nucleotide identity to the main GII.4 lineage. The most recent NoroNet nomenclature system has classified Osaka 2007 and Asia 2003 as having non-GII.4 ORF1 genotypes, GII.e and GII.12, respectively (available from http://www.noronet.nl/databases/TypingtoolBackgroundInformation.jsp). The GII.12 ORF1 was previously referred to as ‘recombinant GII.4’ (33, 37), since the ORF1 was related to the main GII.4 lineage yet was associated with numerous ORF2/3 genotypes including GII.3, GII.10 and GII.12. The relative distance of Asia 2003 and Osaka 2007 in ORF1, compared to the CHDC 1970s variants, which are the earliest GII.4 strains described (23), suggests that these novel ORF1s may actually have evolved from ancestral GII.4 lineages rather than from different genotypes. This makes it difficult to classify these recombination events as inter-genotype or intra-genotype; it is also possible that they are not recombinants at all and are simply ancestral GII.4 lineages.

Recombination breakpoints were also identified at the ORF2/3 overlap. The Osaka 2007 variants have two breakpoints with a distinct GII.4 ORF2 and a 2006b derived ORF3 (Figure 6-4). Therefore, the Osaka 2007 variants are double recombinants with breakpoints at both the ORF1/2 and ORF2/3 overlaps (Table 6-3). The CHDC 1970s strain CHDC2094/74/US (GenBank accession number FJ537135) was also found to have a breakpoint near the ORF2/3 overlap and a novel ORF3 region with only 83.88% nucleotide identity to the other CHDC 1970s variants (Figure 6-4). Furthermore, CHDC2094/74/US did not show close identity to any other non-GII.4 strain in ORF3. Therefore, similar to the difficulties proving ORF1/2 recombination in the ancestral GII.4s, we cannot say that the differences observed in ORF3 for CHDC2094/74/US were produced through recombination without more strains for comparison.

In this study, we explored the evidence for recombination within the ORF2 region using both full-length genomes and ORF2 capsid sequences. This phenomenon has been previously suggested to occur in some GII.4 variants with breakpoints near the P2 domain boundaries (146, 152, 221). In the current study, only the Cairo 2007 variant was identified as a potential recombinant, with a breakpoint within the capsid between 5591 – 5602 nt (Figure 6-4 and Supplementary information Figure 6-9). Based on comparisons of GII.4 nucleotide identity, the ORF1/5’-ORF2 region may be derived from a Japan 2001 parent and the 3’-ORF2/ORF3 region may be derived from an Osaka 2007 lineage (Supplementary information Figure 6-9). The evidence for recombination in the Cairo 2007 strain may be complicated by the fact that the Japan 2001 variants were themselves previously suggested to be recombinants, with a breakpoint position at 5879 nt (152).

Based on this study, there appears to be widespread recombination in the GII.4 lineage (Figure 6-4 and Table 6-3). One aim of this study was to address how recombination may have played a role in the emergence of the most recent pandemic GII.4 variant, New Orleans 2010. In our analysis of full-length genomes, the New Orleans 2010 variant was identified as a GII.4 intra-genotype recombinant with a single breakpoint at the ORF1/2 overlap (Figure 6-4). Our phylogenetic analysis of ORF1 showed that the New Orleans 2010 variants were closely related to 2006a, whilst the Apeldoorn 2008 variants had a Hunter 2004-like ORF1, showing distinct segregation with strong bootstrap support (≥99%) (Figure 6-4B). In contrast, the ORF3 regions of the Apeldoorn 2008 and New Orleans 2010 variants cluster closely together and appear to maintain a Hunter 2004 – 2006a ancestry (Figure 6-4B). The origins of the ORF2 region for New Orleans 2010 are not as easily explained, but our analysis suggests an Apeldoorn 2008 ancestry (Figure 6-2 and Figure 6-4B). Lam et al. recently proposed that the New Orleans 2010 variants have a P2 domain derived from a ‘2006b-like’ ancestor through a double recombination event (146). Our analysis found conflicting evidence for this scenario. In our screen of full-length GII.4 genomes, we did not identify the mosaic P2 domain recombination proposed by Lam et al. (146); however, 3seq did identify similar breakpoints when using the large GII.4 capsid alignment of Figure 6-2. Subsequent phylogenetic analysis nevertheless revealed that all GII.4 variants identified since 2007, which include Osaka 2007, Cairo 2007, Apeldoorn 2008 and New Orleans 2010, appear to have a ‘2006b-like’ P2 domain when their protein sequences are compared (Supplementary information Figure 6-14), which is evidence for parallel evolution. Therefore, we propose a more parsimonious explanation: that ORF1/2 recombination between 2006a and Apeldoorn 2008 variants, confounded by differential patterns of selection and parallel evolution in the capsid, led to the emergence of the New Orleans 2010 variant (Figure 6-6).

Figure 6-6 Model for the emergence and origin of the GII.4 New Orleans 2010 variant.

Based on the phylogenetic and recombination analyses, the origin of New Orleans 2010 is summarised in the following model. A common ancestor (grey), temporally located in 1999, diverged to become Farmington Hills 2002 (green) and 2006b (blue). A similar divergence in the Farmington Hills 2002 lineage was involved in the emergence of Hunter 2004 (light red), which evolved to become 2006a (dark red). Apeldoorn 2008 (yellow) evolved from a close relative of a Hunter 2004-like virus and was able to emerge following four years of antigenic drift in ORF2. Apeldoorn 2008 then recombined (dashed lines) with 2006a at the ORF1/2 overlap to produce New Orleans 2010 (red/yellow hybrids) and its two distinct clusters, the NSW and global-lineages. The variants with dashed outlines were not observed directly but represent evolutionary intermediates. Farmington Hills 2002, 2006a and the New Orleans 2010 NSW-lineage represent dead-end lineages, whilst 2006b and the New Orleans 2010 global-lineage continue to circulate. The prevalence of Apeldoorn 2008 after 2010 is unknown.

As a consequence of the potential problems in identifying true recombination events, Boni et al. have proposed guidelines for the identification of homologous recombination in influenza virus (26). Currently, no such guidelines exist for NoV; however, the principles of the influenza guidelines could easily be applied. For example, the guidelines include limiting sources of contamination, determining whether a breakpoint occurs near a primer binding site, checking alignments for errors, assessing the statistical significance of recombination events using programs such as 3seq, and confirming proposed recombinants through independent identification by another laboratory. Our group has also highlighted previously that, in order to account for artificial recombination events arising from the amplification of sequences from two co-infecting viruses, it is necessary to generate a single amplicon across the region of analysis (34, 37). In this study, we avoided a number of these problems by employing a method of amplifying whole genomes in a single RT-PCR reaction. Furthermore, we tested for recombination using 3seq with a statistical threshold of p<0.05. We also removed a number of suspect recombinant strains from our analysis (those in Supplementary information Figure 6-8).

In conclusion, this study shows that the GII.4 New Orleans 2010 variant has replaced 2006b as the predominant NoV strain in circulation globally. Furthermore, two distinct genetic sub-clusters, the NSW and global-lineages, were found to be the cause of NoV-associated acute gastroenteritis in NSW, Australia during 2009 and 2010, respectively. The New Orleans 2010 variant was shown to be an ORF1/2 intra-genotype recombinant between two recent GII.4 variants, 2006a and Apeldoorn 2008. This study highlights the role that both antigenic drift and shift (recombination) play in the emergence of novel variants of the GII.4 lineage.

6.6 Supporting information

Table 6-4 Primers used in this study

| Primer | Sequence (5'-3') (a) | Orientation | Position (b) |
|---|---|---|---|
| G2F3 | TTGTGAATGAAGATGGCGTCGA | Sense | 5079 |
| G2SKR | CCRCCNGCATRHCCRTTRTACAT | Antisense | 5389 |
| GV5 | AACACTGTCATATGTGCCAC | Sense | 3521 |
| GV6 | TTRTTGACCTCTGGKACGAG | Antisense | 5155 |
| GV17 | GTGAATGAAGATGGCCTCTAACGACGCTTCCGC | Sense | 1 |
| GV63 | GGAAGAACCACTTCATGAC | Antisense | 1178 |
| GV86 | TTNGGCACGGTTGAGACTGT | Antisense | 7422 |
| GV132 | CCRGCRAAGAAAGCTCCAGCCAT | Antisense | 6729 |
| GV202 | GTCATGAAGTGGTTCTTCCC | Sense | 1160 |
| GV203 | CCWGGYCAACCTGACATGTGGAA | Sense | 1892 |
| GV205 | TTCTCAAGCAAAGGTCTBAGTGATGA | Sense | 2678 |
| GV206 | TTGGCCTCYTCYTCTTCACAGAA | Antisense | 2850 |
| GV207 | CGAGCGGCCGCGTGAATGAAGATGGCGTCTAACGACGCTTCCGC | Sense | 1 |
| GV220 | TGYTATGCCTTCTGYTGTTGGGT | Sense | 617 |
| GV221 | CGTGGAGGAGCCCTGATGCTCG | Antisense | 2066 |
| GV269 | CAACGAGACTTGGTTCTACAGCTGG | Sense | 7230 |
| GV270 | GCATGACTGACATAGCACAGCGGCCGCCC-(Tx30) | Antisense | Poly(A) Tail |
| GV271 | GCATGACTGACATAGCACAGCGGCCGCCC | Antisense | TAG |
| GV272 | GCHGCTGGTGGTTTGTGCACACC | Antisense | 477 |
| GV287 | GTCCTTGCCACCAAGRTAGGC | Antisense | 3721 |
| GV288 | CCAGATGTGTCTGCGATGAT | Sense | 944 |
| GV305 | CAGRCAAGAGCCAATGTTCAGATGG | Sense | 4999 |
| GV306 | GGCCTCAATTTGTGCTTGGAGC | Antisense | 6916 |
| GV307 | CAGTYTCTTGTCGAGTYCTCACG | Sense | 5674 |
| GV308 | GCTTGGAGCATCTCTTTRTCATG | Antisense | 6903 |

(a) Degenerate bases are N (A, C, T or G), H (A, C or T), B (C, G or T), K (G or T), W (A or T), R (A or G) and Y (T or C).
(b) Relative to the sequence of Farmington Hills (GenBank accession number AY502023).
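The degenerate base codes defined in the footnote to Table 6-4 determine how many distinct oligonucleotides each primer represents. The following Python sketch, which is illustrative only, computes this degeneracy and expands a degenerate primer into its explicit sequences. The mapping covers only the codes listed in the footnote; the function names are hypothetical, and primers containing non-base annotation (such as the poly-T extension of GV270) are not handled.

```python
from itertools import product
from math import prod

# Bases and the degenerate codes listed in the footnote to Table 6-4.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "W": "AT",
    "H": "ACT", "B": "CGT", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer)


def expand(primer: str) -> list:
    """Enumerate every explicit sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[base] for base in primer))]


g2skr_degeneracy = degeneracy("CCRCCNGCATRHCCRTTRTACAT")  # primer G2SKR
gv6_variants = expand("TTRTTGACCTCTGGKACGAG")             # primer GV6
```

For example, G2SKR contains four R positions, one N and one H, giving 2 x 2 x 2 x 2 x 4 x 3 = 192 possible sequences, whereas GV6 (one R and one K) represents four.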

Figure 6-7 Representative results from the norovirus GII full-length genome RT-PCR.

A representative result from the norovirus GII full-length genome RT-PCR developed in this study analysed by 0.8% (w/v) agarose gel electrophoresis stained with ethidium bromide. Lane 1 shows a 1 kb DNA ladder (Promega) with selected molecular weights listed. Lane 2 shows the unpurified RT-PCR reaction for the NoV strain NSW217I/2010/AU. Lane 3 shows the NoV genome amplicon (approximately 7.6kb) from NSW217I/2010/AU following purification by gel extraction. Lanes 4 and 5, present a similar result for the NoV strain NSW 817L/2010/AU, showing the unpurified and gel extracted amplicon, respectively. For lane 1, 700 ng of DNA was loaded and for lanes 2 – 5, 2 μl of sample was loaded.

Figure 6-8 Simplot analysis of recombinant norovirus strains.

(A) Simplot for the recombinant strain Toyama5/2008/JP. A single breakpoint was identified at position 5631 nt with. (B) Simplot for the recombinant strain Iwate5/2007/JP. Three breakpoints were identified at positions 352, 2765 and 5085 nt. This produced a mosaic genome from Apeldoorn 2008 and 2006b parental strains. (C) Simplot for the recombinant strain PC51/2007/IN. This recombinant had an Osaka 2007 backbone with a mosaic insertion between positions 3744 nt and 4320 nt from a GII.b parental strain. For each panel, the y-axis represents the percentage sequence similarity between the recombinant and each parent strain, and the x-axis shows the nucleotide position along the full-length genome except for panel C, which shows only the ORF1 region. Each analysis used a window size of 300 and a step size of 10.

Figure 6-9 Simplot analysis of the Cairo 2007 variant in the partial ORF1/complete ORF2 regions.

A Simplot comparing the putative recombinant Cairo 2007 (NSW505G/2007/AU) to the GII.4 variants Japan 2001, Bristol 1993, US 1995/96 and Osaka 2007, coloured as per the key. A single breakpoint identified between positions 5591 to 5602 nt (mean value 5596 nt shown – dashed line) is shown, which is located within the ORF2-encoding capsid gene. The y-axis represents the percentage sequence similarity between the recombinant and each strain used for comparison, and the x-axis shows the relative nucleotide position in the full-length genome. The relative positions of ORF1 and ORF2 are shown at the bottom. The analysis used a window size of 300 and a step size of 10. There is some evidence that the Cairo 2007 variant is a Japan 2001/Osaka 2007 intra-genotype recombinant.
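The principle behind a similarity plot with a window size of 300 and a step size of 10 can be sketched as a sliding-window calculation of percentage identity between two aligned sequences. The code below is a simplified illustration and not the Simplot program itself: it uses raw identity rather than a corrected distance model, skips alignment columns containing gaps, and the function names and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions, ignoring columns with a gap in either sequence."""
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else float("nan")


def similarity_plot(query: str, reference: str, window: int = 300, step: int = 10) -> list:
    """Return (window midpoint, percent identity) pairs along an aligned sequence pair."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        midpoint = start + window // 2 + 1  # 1-based position of the window centre
        identity = percent_identity(query[start:start + window],
                                    reference[start:start + window])
        points.append((midpoint, identity))
    return points


# Toy sequences: identical over the first half, completely different over the second.
toy_query = "A" * 600
toy_reference = "A" * 300 + "C" * 300
toy_points = similarity_plot(toy_query, toy_reference)
```

In the toy example the identity falls from 100% to 0% as the window slides across position 300, which is the kind of sharp drop that marks a candidate breakpoint when a putative recombinant is compared against each possible parent.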

Figure 6-10 Maximum likelihood phylogeny of the GII.4 ORF1 region.

A maximum likelihood phylogenetic analysis of the NoV ORF1 region (genomic position 5 to 5104 nt) used in Figure 6-4B. The analysis was performed using the best-fit model and MEGA5 with sequences derived from GenBank and this study (n=78). It is representative of 1000 bootstrap replicates. Taxa are labelled with GenBank accession numbers, strain ID, year and country of origin. The branches of recombinant clades/strains are coloured as per Figure 6-4. The scale bars represent the number of substitutions per site.

Figure 6-11 Maximum likelihood phylogeny of the GII.4 ORF2 region.

A maximum likelihood phylogenetic analysis of the NoV ORF2 region (genomic position 5085 to 6707 nt) used in Figure 6-4B. The analysis was performed using the best-fit model and MEGA5 with sequences derived from GenBank and this study (n=78). It is representative of 1000 bootstrap replicates. The branches of recombinant clades/strains are coloured as per Figure 6-4. Taxa are labelled with GenBank accession numbers, strain ID, year and country of origin. The scale bars represent the number of substitutions per site.

Figure 6-12 Maximum likelihood phylogeny of the GII.4 ORF3 region.

A maximum likelihood phylogenetic analysis of the NoV ORF3 region (genomic position 6707 to 7513 nt) used in Figure 6-4B. The analysis was performed using the best-fit model and MEGA5 with sequences derived from GenBank and this study (n=78). It is representative of 1000 bootstrap replicates. The branches of recombinant clades/strains are coloured as per Figure 6-4A. Taxa are labelled with GenBank accession numbers, strain ID, year and country of origin. The scale bars represent the number of substitutions per site.

Figure 6-13 Maximum likelihood phylogeny of the GII.4 capsid P2 domain based on nucleotide sequence.

A maximum likelihood phylogenetic analysis of the putative mosaic P2 domain (capsid gene position 782 to 1415 nt) based on nucleotide sequence. The analysis was performed using the best-fit model and MEGA5 with sequences derived from GenBank and this study (n=78). It is representative of 1000 bootstrap replicates. The various GII.4 clades have been labelled with parentheses. Taxa are labelled with GenBank accession numbers, strain ID, year and country of origin. The scale bars represent the number of substitutions per site.

Figure 6-14 Maximum likelihood phylogeny of the GII.4 capsid P2 domain based on protein sequence.

A maximum likelihood phylogenetic analysis of the putative mosaic P2 domain (capsid gene position 782 to 1415 nt) based on protein sequence. The analysis was performed using the best-fit model and MEGA5 with sequences derived from GenBank and this study (n=78). It is representative of 1000 bootstrap replicates. The various GII.4 clades have been labelled with parentheses. Taxa are labelled with GenBank accession numbers, strain ID, year and country of origin. The scale bars represent the number of substitutions per site.

Ch. 7 – Conclusions

7 Conclusions

Since 1996, there have been five pandemics of NoV-associated gastroenteritis, all caused by variants of the GII.4 lineage. The pandemic GII.4 variants included US 1995/96 in 1996 (189, 275), Farmington Hills in 2002 (157, 278), Hunter in 2004 (38), 2006b virus in 2007 and 2008 (Chapter 2) and New Orleans 2010 in 2009 and 2010 (Chapter 6). Overall, GII.4 variants cause 62-80% of all NoV outbreaks globally (73, 236). The emergence and predominance of the pandemic GII.4 variants has resulted in millions of infections globally and thousands of outbreaks impacting on those most vulnerable in the community including the elderly, children and immuno-compromised individuals.

In NSW, Australia, pandemic GII.4 variants have caused winter epidemics of acute gastroenteritis every year since 2006 and this pattern is likely to continue. Given that the period of stasis between each pandemic has been progressively shortening and that the first generation NoV vaccines have entered the initial stages of clinical trials, it is imperative that the factors contributing to the higher epidemiological fitness of the pandemic GII.4 variants are elucidated.

Therefore, this thesis aimed to describe the mechanisms of evolution of the epidemic NoV GII.4 variants to elucidate those factors that contribute to their higher epidemiological fitness and predominance as the cause of acute gastroenteritis.


NoV GII.4 variants were associated with four consecutive winter-epidemics of acute gastroenteritis in NSW, Australia between 2007 and 2010

In this thesis, two comprehensive molecular epidemiological studies were performed to characterise the NoV strains in circulation in NSW, Australia between 2007 and 2010 (Chapters 2 and 6). Using RT-PCR, sequencing and phylogenetic analyses, three distinct GII.4 variants were identified as the cause of four consecutive winter epidemics of acute gastroenteritis across NSW in 2007 to 2010.

The GII.4 variant 2006b was the predominant virus identified during 2007 and 2008, causing 90.9% (n=10/11) and 82.4% (n=14/17) of the outbreaks investigated, respectively (Table 2-1). There were some differences between the epidemics caused by GII.4 2006b in 2007 and 2008 compared to those seen in previous seasons. This was the first time that an emerging GII.4 strain was identified in circulation at least one year ahead of it causing an epidemic (December 2005) (236). This is comparable to influenza A virus, where it is not uncommon for the next pandemic strain to be circulating at low levels two years prior to it becoming a pandemic strain (239). The most likely explanation for this was a lack of surveillance during previous epidemic seasons; however, it may also be explained by the fact that two GII.4 variants, 2006a and 2006b, were in circulation during 2006 (258). Although the peak in activity during 2006 was associated mainly with the spread of the NoV GII.4 variant 2006a, it is possible that the high prevalence of 2006a virus resulted in cross-protective herd immunity (i.e. immunity to both 2006a and 2006b), and that it was not until this immunity diminished that the 2006b strain was able to spread. Cannon et al. (40) demonstrated some cross-reactivity between GII.4 variants using outbreak patient sera and a surrogate neutralization assay that blocks HBGA binding; however, studies into GII.4 antigenic variation using patient sera or other human-derived antibodies are lacking and must be considered a priority for future research.
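As a quick check of the arithmetic, the outbreak proportions quoted above can be recomputed from the counts given in the text. This is only a minimal sketch; the dictionary layout and the helper name `percent` are illustrative assumptions and are not taken from the thesis.

```python
# Counts quoted in the text: (GII.4 2006b outbreaks, outbreaks investigated).
outbreaks = {2007: (10, 11), 2008: (14, 17)}


def percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


# Proportion of investigated outbreaks attributed to GII.4 2006b, per year.
proportions = {year: percent(*counts) for year, counts in outbreaks.items()}

for year, value in proportions.items():
    print(f"{year}: {value}% of investigated outbreaks")
```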

A similar epidemiological pattern was observed during the epidemics of acute gastroenteritis in NSW during 2009 and 2010 (Chapter 6). A single GII.4 variant, termed New Orleans 2010, replaced its temporal predecessor 2006b to become the predominant NoV strain in circulation. With improved epidemiological data, sampling and sequence coverage (full-length capsid and genome sequencing), we were able to show that the New Orleans 2010 variant had diverged into two distinct genetic clades around 2009 (Figure 6-2). The ‘NSW-lineage’ was a novel clade that was originally identified in NSW during 2008 (strain NSW001P/2008/AU; GenBank accession number GQ845367) (Chapter 2) and became the predominant virus in circulation during the 2009 epidemics (67.8% of infections) (Table 6-1). The second lineage of New Orleans 2010 was referred to as the ‘global-lineage’ as it contained strains found in the US, Europe and Asia (Figure 6-2). The global lineage of New Orleans 2010 was identified in only 13.6% of infections during 2009; however, it rose to prominence in the following year to cause the epidemic of 2010 (Table 6-1).

As we move towards a potential NoV vaccine, it is important that molecular epidemiological studies, similar to those presented in this thesis, continue and that the global NoV surveillance network grows. If the pattern of multiple co-circulating GII.4 lineages continues, then a NoV vaccine may need to contain multiple virus strains and to be updated seasonally as the predominance and antigenic properties of each virus vary.


Epidemiological fitness was associated with an increased adaptive capacity through faster replication and higher mutation rates

Since 1995, NoV has caused pandemics of acute gastroenteritis that have spread across the globe within a few months, placing a great economic burden on society through medical and social expenses. Furthermore, only viruses of the GII.4 lineage are associated with pandemics of disease. Very little research has been conducted to determine why GII.4 viruses, and no other genotype, can cause pandemics. Consequently, the evolutionary properties of several pandemic GII.4 strains were compared to those of non-pandemic genotypes using a variety of methods (Chapter 3).

Since replication efficiency and genetic diversity are primarily determined by the viral RdRp, an in vitro enzyme assay was employed to compare the error rates of different viral RdRps. The more prevalent GII.4 strains were found to have a 5- to 36-fold higher mutation rate compared to the less frequently detected GII.b/GII.3 and GII.7 strains (Table 3-2). This trend was also observed when comparing the in vivo evolution rates, where, on average, the GII.4 capsid was evolving at a rate 1.7 times higher than that of the less prevalent GII genotypes (Figure 3-4).

Apart from mutation rate, the viral replication rate is considered to be another major determinant of viral fitness (7). Therefore, the replication efficiency was compared between NoV RdRps from different genotypes (Table 3-2). In this analysis, no consistent trend was observed. The ‘fastest’ NoV RdRp belonged to the GII.7 Mc17 strain (Kcat = 0.238 s⁻¹ ± 0.088), whilst the GII.b C14 strain had the ‘slowest’ RdRp (Kcat = 0.155 s⁻¹ ± 0.093). It was interesting to note that the GII.4 variants, 2006a and 2006b, contained a Thr291Lys substitution that appeared to increase the replication rate of these viruses compared to the GII.4 variant US-1995/96 (Figure 3-2). This suggests that it is likely a combination of factors, including both replication rate and mutation rate, that contributes to NoV fitness. Overall, this study showed that the GII.4 viruses were undergoing evolution and adaptation to the immune system at a much higher rate than the non-pandemic NoV strains, which supports the hypothesis that epidemiological fitness is a consequence of the ability of the virus to generate genetic diversity.


The NoV GII.4 RdRp is phosphorylated by Akt, which alters the de novo activity of the enzyme

Viruses commonly exploit post-translational modifications to interact with the host cell to aid in viral replication and immune evasion [reviewed in (124)]. For example, viruses such as HCV, influenza A virus and measles virus have been shown to interact with the PI3K/Akt signalling pathway, which plays an important role in the regulation of cell growth, survival and proliferation, to enact greater control of the host cell to benefit viral replication (29, 44, 82). The RdRps of other viruses, such as HCV, WNV, TBEV and DENV-2, have also been shown to be specific targets for phosphorylation.

The study presented in chapter 4 shows that the RdRps from the more prevalent NoV strains, including the GII.4 and GII.b viruses, were phosphorylated by Akt (Figure 4-1). The phosphorylated residue (Thr33) was initially predicted using in silico methods (ScanSite server); phosphorylation was then confirmed using an in vitro kinase assay followed by western blotting with an antibody specific for phosphorylated Akt substrates. Mapping onto structural models of the RdRp showed that Thr33 was located in a region of the enzyme where the finger and thumb domains interact (Figure 4-2). These interactions provide the ‘closed’ structure characteristic of most viral RdRps and are important in de novo polymerase activity. Using a phosphomimetic mutant (Thr33Glu), the NoV RdRp was shown to have a reduced de novo polymerase activity when phosphorylated, which may indicate regulation of viral replication (Figure 4-3).


Genetic bottlenecks occur during transmission

With the advent of NGS technologies, many aspects of viral evolution are being revealed with unprecedented detail [for example (36)]. In chapter 5, NGS was used to examine a known transmission cluster, where a young boy infected his father and grandfather with a GII.g/GII.12 NoV strain. This study revealed that following NoV transmission, only minor variants of the source viral population successfully established a new infection (Figure 5-3). This is the first study to investigate the evolutionary impact of transmission on NoV evolution and indicates that a significant bottleneck is likely to at least partly drive genetic diversification in NoV. Further studies are needed to determine whether the transmission of minor variants is a common phenomenon for NoV, as is observed in HIV and HCV (36, 89, 224).

It is also important to establish the biological relevance of this transmission bottleneck. It may be the case that NoV attachment factors such as HBGA provide a structural barrier that reduces the number of viruses that can establish infection. Also, since HBGAs are polymorphic in the human population, it is possible that the composition of these complex carbohydrates will further alter susceptibility to infection and influence which viral variants establish infection.


Immuno-compromised individuals with chronic NoV infections are a potential reservoir for novel variants

In chapter 5, the intra-host dynamics of a typical acute NoV infection were compared to an atypical chronic infection in an immuno-compromised individual using NGS of the viral capsid gene (ORF2). Overall, the analysis revealed that in a typical acute infection the viral population was relatively stable with only one SNP detected at a frequency >10% (Figure 5-2B). In contrast, the viral population in the chronically infected immuno-compromised individual demonstrated extreme heterogeneity with more than 50 minor variants identified at each time point (Figure 5-2C). It was also shown that the P2 domain amino acid changes present in the chronically infected patient mirrored the toggling observed between different epidemic GII.4 variants. Therefore, GII.4 NoVs that cause chronic infections could be a source of novel antigenic viruses.
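The idea of detecting minor variants above a frequency cut-off can be sketched as follows. This is a toy illustration under stated assumptions, not the NGS pipeline used in chapter 5: the reads are assumed to be already aligned and of equal length, sequencing error is ignored, and the names `variant_frequencies`, `call_snps`, `consensus` and `reads` are hypothetical.

```python
from collections import Counter


def variant_frequencies(reads, consensus):
    """Per-position frequency of the most common non-consensus base."""
    freqs = []
    for pos, ref_base in enumerate(consensus):
        # Count bases observed at this position, ignoring alignment gaps.
        column = Counter(read[pos] for read in reads if read[pos] != "-")
        depth = sum(column.values())
        alt_counts = [n for base, n in column.items() if base != ref_base]
        top_alt = max(alt_counts, default=0)
        freqs.append(top_alt / depth if depth else 0.0)
    return freqs


def call_snps(reads, consensus, threshold=0.10):
    """Return 0-based positions whose variant frequency exceeds the threshold."""
    freqs = variant_frequencies(reads, consensus)
    return [pos for pos, freq in enumerate(freqs) if freq > threshold]


# Toy data (assumption): 10 aligned reads over a 6-nt region.
consensus = "ACGTAC"
reads = ["ACGTAC"] * 7 + ["AGGTAC"] * 2 + ["ACGTTC"]

snps = call_snps(reads, consensus)
print(snps)
```

In this toy example the variant at position 1 is present in 2 of 10 reads and is reported, whereas the variant at position 4 is present in only 1 of 10 reads and does not exceed the 10% cut-off.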

In light of this, strategies should be employed to treat individuals with chronic NoV infections. In the case of transplant recipients, an adjustment or discontinuation of the immunosuppressive regimen may help eliminate the infection (274). Alternatively, immunoglobulin therapy has been used to successfully treat chronic NoV infections (80).


Recombination is a powerful evolutionary force that facilitates emergence in NoV

NoV has been frequently reported to recombine, further increasing its opportunity to generate genetic diversity. This thesis showed that recombination plays an important role in the emergence of NoV strains, with a number of both inter-genotype and intra-genotype recombinants identified (Chapters 2 and 6). Importantly, this work also highlighted the challenges of detecting true recombination events and suggested guidelines to minimise the detection of false-positive recombinants (Chapter 6).

During 2008, we first reported the emergence of a novel recombinant strain, referred to as St George virus (NSW199U/2008/AU – GenBank accession number GQ845370) with a breakpoint at the conserved ORF1/2 overlap (Chapter 2). A similarity plot shows that St George virus has a GII.e ORF1 region and GII.12 ORF2/3 region (Figure 2-3). The ORF1 region has since been re-classified as GII.g, to match the nomenclature applied by European and US NoV surveillance groups. Following its emergence in NSW, Australia, St George virus was identified in the US, where it was associated with 16% of all reported NoV outbreaks during the winter season of 2009 (266). It was also detected in food-borne outbreaks and sporadic infections in children in Europe between 2009 and 2010 (94). The impact and spread of St George virus highlights the important role of recombination in the emergence and evolution of novel NoV strains. It also raises questions about potential reservoirs for NoV. A study in the US showed that St George virus could infect and replicate in gnotobiotic pigs (245), which may provide some evidence that, similar to influenza, pigs may act as evolutionary intermediates and sources of novel viruses. Therefore, it would be worthwhile exploring potential animal reservoirs to assess how they contribute to emergence of human NoV strains.
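The principle behind a similarity plot can be sketched with a sliding-window identity calculation. The example below is a minimal illustration, assuming pre-aligned sequences of equal length; it is not the software used to produce Figure 2-3, and the sequences and names (`parent_a`, `parent_b`, `query`) are artificial placeholders rather than real NoV genomes.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length aligned sequences."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def similarity_plot(query, parents, window=10, step=5):
    """Sliding-window identity of the query against each named parent."""
    rows = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        scores = {
            name: identity(query[start:end], seq[start:end])
            for name, seq in parents.items()
        }
        rows.append((start, scores))
    return rows


# Artificial sequences (assumption): the query matches parent_a for the
# first 20 positions and parent_b for the last 20, mimicking a breakpoint.
parent_a = "A" * 20 + "C" * 20
parent_b = "G" * 20 + "T" * 20
query = parent_a[:20] + parent_b[20:]

rows = similarity_plot(query, {"parent_a": parent_a, "parent_b": parent_b})
for start, scores in rows:
    print(start, scores)
```

The window in which the best-matching parent switches marks the approximate breakpoint; in real data, the window and step sizes would need to be chosen with the sequence length and divergence in mind.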

A genome-wide examination for recombination using the program 3seq revealed that both inter- and intra-genotype recombination are common in the GII.4 lineage (Chapter 6). Most recombination breakpoints were found at the ORF1/2 and ORF2/3 overlaps, with one putative breakpoint located within ORF2 (Figure 6-4 and Table 6-3). In some cases, the evidence for recombination was conclusive, such as in the ORF1/2 recombinant, Japan 2008b (Figure 6-4). In other cases, there were clear inconsistencies in the phylogenetic relationships across putative breakpoints; however, the lack of suitable parents for comparison made confirming the recombination events difficult. These difficulties are compounded when the recombination breakpoint is located within the ORF2-encoded capsid gene, such as with Cairo 2007 (Figure 6-9). This is because the selective pressure varies along the capsid gene (Figure 6-5A). In the N-terminal and Shell domains, most residues were found to be under purifying selection, which is likely due to the requirement for structural conservation. In contrast, the hypervariable P2 domain was found to have a number of positions under positive selection, likely in response to immune pressure. This suggests that these domains can experience different evolutionary trajectories, and if the same selection pressure is applied to only the P2 domain of a number of different GII.4 variants, it is possible that they evolve to become more similar to each other and exhibit parallel evolution. This could be misinterpreted as recombination, as seen with the GII.4 variant New Orleans 2010 (146). This also highlights the need for guidelines for the identification of recombination in NoV, similar to those proposed for influenza A virus (26).


Future directions

The conclusions derived from this thesis suggest that the pattern of evolution in the GII.4 epidemic variants is complex, yet shares many similarities with that of influenza A virus. Many factors contribute to the higher epidemiological fitness and predominance of the GII.4 variants; however, this thesis highlighted that the GII.4 viruses appear to have an increased capacity for evolutionary change in response to herd immunity, through higher replication and mutation rates as well as through recombination amongst co-circulating lineages.

Despite this, there are still a number of avenues for further research. Firstly, in order to determine the source of novel GII.4 variants, molecular epidemiological studies need to be performed that ensure thorough sampling coverage across time and different regions of the world. Currently, developed nations such as the US, Australia, New Zealand and those in Europe and Asia have established NoV surveillance programs, such as CaliciNet in the US (265) and Food-Borne Viruses in Europe (138). Therefore, the prevalence of NoV in these countries is relatively well described. In developing nations, including parts of Africa, South America and South-East Asia, the prevalence of NoV is essentially unknown, and therefore the contribution of these regions as potential reservoirs for novel GII.4 variants is also unknown. By adopting a broader surveillance strategy with improved sequence coverage, we are more likely to identify novel GII.4 variants closer to the point of emergence as well as identify the intermediate transitional strains that may have facilitated their emergence.

It is also necessary to continue with efforts to establish a cell culture system for human NoVs. Many of the fundamental features of NoV replication, pathogenesis and immunity could be explained with such a system. It would also help specifically address some unanswered questions of the research presented in this thesis, such as confirming the RdRp phosphorylation by Akt in vivo, then determining how it might impact the viral life-cycle. More broadly, it would be interesting to determine how common these phosphorylation events are amongst other viral RdRps with a focus on the identification of shared motifs. This has implications for the design of novel anti-viral therapies as well as establishing broader evolutionary relationships amongst RNA viruses.


References

8 References

1. Akihara, S., T. G. Phan, T. A. Nguyen, G. Hansman, S. Okitsu, and H. Ushijima. 2005. Existence of multiple outbreaks of viral gastroenteritis among infants in a day care center in Japan. Arch Virol 150:2061-2075.
2. Alessi, D. R., F. B. Caudwell, M. Andjelkovic, B. A. Hemmings, and P. Cohen. 1996. Molecular basis for the substrate specificity of protein kinase B; comparison with MAPKAP kinase-1 and p70 S6 kinase. FEBS Lett 399:333-338.
3. Allen, D. J., J. J. Gray, C. I. Gallimore, J. Xerry, and M. Iturriza-Gomara. 2008. Analysis of amino acid variation in the P2 domain of the GII-4 norovirus VP1 protein reveals putative variant-specific epitopes. PLoS One 3:e1485.
4. Allen, D. J., R. Noad, D. Samuel, J. J. Gray, P. Roy, and M. Iturriza-Gomara. 2009. Characterisation of a GII-4 norovirus variant-specific surface-exposed site involved in antibody binding. Virology journal 6:150.
5. Altomare, D. A., and J. R. Testa. 2005. Perturbations of the AKT signaling pathway in human cancer. Oncogene 24:7455-7464.
6. Ando, T., M. N. Mulders, D. C. Lewis, M. K. Estes, S. S. Monroe, and R. I. Glass. 1994. Comparison of the polymerase region of small round structured virus strains previously classified in three antigenic types by solid-phase immune electron microscopy. Arch Virol 135:217-226.
7. Andreoni, M. 2004. Viral phenotype and fitness. New Microbiol 27:71-76.
8. Appleton, H., M. Buckley, B. T. Thom, J. L. Cotton, and S. Henderson. 1977. Virus-like particles in winter vomiting disease. Lancet 1:409-411.
9. Arnold, K., L. Bordoli, J. Kopp, and T. Schwede. 2006. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics 22:195-201.
10. Atmar, R. L., and M. K. Estes. 2001. Diagnosis of noncultivatable gastroenteritis viruses, the human caliciviruses. Clin Microbiol Rev 14:15-37.
11. Atmar, R. L., and M. K. Estes. 2006. The epidemiologic and clinical importance of norovirus infection. Gastroenterol Clin North Am 35:275-290, viii.
12. Bailey, D., I. Karakasiliotis, S. Vashist, L. M. Chung, J. Rees, N. McFadden, A. Benson, F. Yarovinsky, P. Simmonds, and I. Goodfellow. 2010. Functional analysis of RNA structures present at the 3' extremity of the murine norovirus genome: the variable polypyrimidine tract plays a role in viral virulence. J Virol 84:2859-2870.


13. Baker, E. S., S. R. Luckner, K. L. Krause, P. R. Lambden, I. N. Clarke, and V. K. Ward. 2012. Inherent Structural Disorder and Dimerisation of Murine Norovirus NS1-2 Protein. PLoS One 7:e30534.
14. Barnes, G. L., E. Uren, K. B. Stevens, and R. F. Bishop. 1998. Etiology of acute gastroenteritis in hospitalized children in Melbourne, Australia, from April 1980 to March 1993. Journal of clinical microbiology 36:133-138.
15. Belliot, G., A. H. Kamel, M. Estienney, K. Ambert-Balay, and P. Pothier. 2010. Evidence of emergence of new GGII.4 norovirus variants from gastroenteritis outbreak survey in France during the 2007-to-2008 and 2008-to-2009 winter seasons. Journal of clinical microbiology 48:994-998.
16. Belliot, G., S. V. Sosnovtsev, K. O. Chang, V. Babu, U. Uche, J. J. Arnold, C. E. Cameron, and K. Y. Green. 2005. Norovirus proteinase-polymerase and polymerase are both active forms of RNA-dependent RNA polymerase. J Virol 79:2393-2403.
17. Belliot, G., S. V. Sosnovtsev, K. O. Chang, P. McPhie, and K. Y. Green. 2008. Nucleotidylylation of the VPg protein of a human norovirus by its proteinase-polymerase precursor protein. Virology 374:33-49.
18. Belliot, G., S. V. Sosnovtsev, T. Mitra, C. Hammer, M. Garfield, and K. Y. Green. 2003. In vitro proteolytic processing of the MD145 norovirus ORF1 nonstructural polyprotein yields stable precursors and products similar to those detected in calicivirus-infected cells. J Virol 77:10957-10974.
19. Bertolotti-Ciarlet, A., S. E. Crawford, A. M. Hutson, and M. K. Estes. 2003. The 3' end of Norwalk virus mRNA contains determinants that regulate the expression and stability of the viral capsid protein VP1: a novel function for the VP2 protein. J Virol 77:11603-11615.
20. Blanton, L. H., S. M. Adams, R. S. Beard, G. Wei, S. N. Bulens, M. A. Widdowson, R. I. Glass, and S. S. Monroe. 2006. Molecular and epidemiologic trends of caliciviruses associated with outbreaks of acute gastroenteritis in the United States, 2000-2004. J Infect Dis 193:413-421.
21. Bode, J. G., E. D. Brenndorfer, J. Karthe, and D. Haussinger. 2009. Interplay between host cell and hepatitis C virus in regulating viral replication. Biol Chem 390:1013-1032.
22. Boillat Blanco, N., R. Kuonen, C. Bellini, O. Manuel, C. Estrade, J. Mazza-Stalder, J. D. Aubert, R. Sahli, and P. Meylan. 2011. Chronic norovirus gastroenteritis in a double hematopoietic stem cell and lung transplant recipient. Transpl Infect Dis 13:213-215.


23. Bok, K., E. J. Abente, M. Realpe-Quintero, T. Mitra, S. V. Sosnovtsev, A. Z. Kapikian, and K. Y. Green. 2009. Evolutionary dynamics of GII.4 noroviruses over a 34-year period. J Virol 83:11890-11901.
24. Bok, K., G. I. Parra, T. Mitra, E. Abente, C. K. Shaver, D. Boon, R. Engle, C. Yu, A. Z. Kapikian, S. V. Sosnovtsev, R. H. Purcell, and K. Y. Green. 2011. Chimpanzees as an animal model for human norovirus infection and vaccine development. Proceedings of the National Academy of Sciences of the United States of America 108:325-330.
25. Bon, F., K. Ambert-Balay, H. Giraudon, J. Kaplon, S. Le Guyader, M. Pommepuy, A. Gallay, V. Vaillant, H. de Valk, R. Chikhi-Brachet, A. Flahaut, P. Pothier, and E. Kohli. 2005. Molecular epidemiology of caliciviruses detected in sporadic and outbreak cases of gastroenteritis in France from December 1998 to February 2004. Journal of clinical microbiology 43:4659-4664.
26. Boni, M. F., M. D. de Jong, H. R. van Doorn, and E. C. Holmes. 2010. Guidelines for identifying homologous recombination events in influenza A virus. PLoS One 5:e10434.
27. Boni, M. F., J. R. Gog, V. Andreasen, and F. B. Christiansen. 2004. Influenza drift and epidemic size: the race between generating and escaping immunity. Theoretical population biology 65:179-191.
28. Boni, M. F., D. Posada, and M. W. Feldman. 2007. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics 176:1035-1047.
29. Brenndorfer, E. D., J. Karthe, L. Frelin, P. Cebula, A. Erhardt, J. Schulte am Esch, H. Hengel, R. Bartenschlager, M. Sallberg, D. Haussinger, and J. G. Bode. 2009. Nonstructural 3/4A protease of hepatitis C virus activates epithelial growth factor-induced signal transduction by cleavage of the T-cell protein tyrosine phosphatase. Hepatology 49:1810-1820.
30. Bruenn, J. A. 2003. A structural and primary sequence comparison of the viral RNA-dependent RNA polymerases. Nucleic Acids Res 31:1821-1829.
31. Bucardo, F., J. Nordgren, B. Carlsson, M. Paniagua, P. E. Lindgren, F. Espinoza, and L. Svensson. 2008. Pediatric norovirus diarrhea in Nicaragua. Journal of clinical microbiology 46:2573-2580.
32. Buchkovich, N. J., Y. Yu, C. A. Zampieri, and J. C. Alwine. 2008. The TORrid affairs of viruses: effects of mammalian DNA viruses on the PI3K-Akt-mTOR signalling pathway. Nat Rev Microbiol 6:266-275.
33. Bull, R. A., J. S. Eden, W. D. Rawlinson, and P. A. White. 2010. Rapid evolution of pandemic noroviruses of the GII.4 lineage. PLoS Pathog 6:e1000831.


34. Bull, R. A., G. S. Hansman, L. E. Clancy, M. M. Tanaka, W. D. Rawlinson, and P. A. White. 2005. Norovirus recombination in ORF1/ORF2 overlap. Emerging infectious diseases 11:1079-1085.
35. Bull, R. A., J. Hyde, J. M. Mackenzie, G. S. Hansman, T. Oka, N. Takeda, and P. A. White. 2011. Comparison of the replication properties of murine and human calicivirus RNA-dependent RNA polymerases. Virus Genes 42:16-27.
36. Bull, R. A., F. Luciani, K. McElroy, S. Gaudieri, S. T. Pham, A. Chopra, B. Cameron, L. Maher, G. J. Dore, P. A. White, and A. R. Lloyd. 2011. Sequential bottlenecks drive viral evolution in early acute hepatitis C virus infection. PLoS Pathog 7:e1002243.
37. Bull, R. A., M. M. Tanaka, and P. A. White. 2007. Norovirus recombination. J Gen Virol 88:3347-3359.
38. Bull, R. A., E. T. Tu, C. J. McIver, W. D. Rawlinson, and P. A. White. 2006. Emergence of a new norovirus genotype II.4 variant associated with global outbreaks of gastroenteritis. Journal of clinical microbiology 44:327-333.
39. Bull, R. A., and P. A. White. 2011. Mechanisms of GII.4 norovirus evolution. Trends Microbiol 19:233-240.
40. Cannon, J. L., L. C. Lindesmith, E. F. Donaldson, L. Saxe, R. S. Baric, and J. Vinje. 2009. Herd immunity to GII.4 noroviruses is supported by outbreak patient sera. J Virol 83:5363-5374.
41. Cao, S., Z. Lou, M. Tan, Y. Chen, Y. Liu, Z. Zhang, X. C. Zhang, X. Jiang, X. Li, and Z. Rao. 2007. Structural basis for the recognition of blood group trisaccharides by norovirus. J Virol 81:5949-5957.
42. Capizzi, T., G. Makari-Judson, R. Steingart, and W. C. Mertens. 2011. Chronic diarrhea associated with persistent norovirus excretion in patients with chronic lymphocytic leukemia: report of two cases. BMC Infect Dis 11:131.
43. Carlsson, B., A. M. Lindberg, J. Rodriguez-Diaz, K. O. Hedlund, B. Persson, and L. Svensson. 2009. Quasispecies dynamics and molecular evolution of human norovirus capsid P region during chronic infection. J Gen Virol 90:432-441.
44. Carsillo, M., D. Kim, and S. Niewiesk. 2010. Role of AKT kinase in measles virus replication. J Virol 84:2180-2183.
45. Cauchi, M. R., J. C. Doultree, J. A. Marshall, and P. J. Wright. 1996. Molecular characterization of Camberwell virus and sequence variation in ORF3 of small round-structured (Norwalk-like) viruses. J Med Virol 49:70-76.


46. Ceballos, A., G. Andreani, C. Ripamonti, D. Dilernia, R. Mendez, R. D. Rabinovich, P. C. Cardenas, C. Zala, P. Cahn, G. Scarlatti, and L. Martinez Peralta. 2008. Lack of viral selection in human immunodeficiency virus type 1 mother-to-child transmission with primary infection during late pregnancy and/or breastfeeding. J Gen Virol 89:2773-2782.
47. Chachu, K. A., D. W. Strong, A. D. LoBue, C. E. Wobus, R. S. Baric, and H. W. t. Virgin. 2008. Antibody is critical for the clearance of murine norovirus infection. J Virol 82:6610-6617.
48. Chaudhry, Y., M. A. Skinner, and I. G. Goodfellow. 2007. Recovery of genetically defined murine norovirus in tissue culture by using a fowlpox virus expressing T7 RNA polymerase. J Gen Virol 88:2091-2100.
49. Chen, R., J. D. Neill, J. S. Noel, A. M. Hutson, R. I. Glass, M. K. Estes, and B. V. Prasad. 2004. Inter- and intragenus structural variations in caliciviruses and their functional implications. J Virol 78:6469-6479.
50. Chhabra, P., A. M. Walimbe, and S. D. Chitambar. 2010. Complete genome characterization of Genogroup II norovirus strains from India: Evidence of recombination in ORF2/3 overlap. Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases 10:1101-1109.
51. Chinnaswamy, S., A. Murali, P. Li, K. Fujisaki, and C. C. Kao. 2010. Regulation of de novo-initiated RNA synthesis in hepatitis C virus RNA-dependent RNA polymerase by intermolecular interactions. J Virol 84:5923-5935.
52. Chinnaswamy, S., I. Yarbrough, S. Palaninathan, C. T. Kumar, V. Vijayaraghavan, B. Demeler, S. M. Lemon, J. C. Sacchettini, and C. C. Kao. 2008. A locking mechanism regulates RNA synthesis and host protein interaction by the hepatitis C virus polymerase. J Biol Chem 283:20535-20546.
53. Choi, J. M., A. M. Hutson, M. K. Estes, and B. V. Prasad. 2008. Atomic resolution structural characterization of recognition of histo-blood group antigens by Norwalk virus. Proceedings of the National Academy of Sciences of the United States of America 105:9175-9180.
54. Chow, C. M., A. K. Leung, and K. L. Hon. 2010. Acute gastroenteritis: from guidelines to real life. Clin Exp Gastroenterol 3:97-112.


55. Cordey, S., T. Junier, D. Gerlach, F. Gobbini, L. Farinelli, E. M. Zdobnov, B. Winther, C. Tapparel, and L. Kaiser. 2010. Rhinovirus genome evolution during experimental human infection. PLoS One 5:e10588.
56. Coyne, K. P., R. M. Gaskell, S. Dawson, C. J. Porter, and A. D. Radford. 2007. Evolutionary mechanisms of persistence and diversification of a calicivirus within endemically infected natural host populations. J Virol 81:1961-1971.
57. D'Agaro, P., T. Rossi, P. Burgnich, G. D. Molin, N. Coppola, G. Rocco, and C. Campello. 2008. The molecular epidemiology of influenza viruses: a lesson from a highly epidemic season. Journal of clinical pathology 61:355-360.
58. Daughenbaugh, K. F., C. S. Fraser, J. W. Hershey, and M. E. Hardy. 2003. The genome-linked protein VPg of the Norwalk virus binds eIF3, suggesting its role in translation initiation complex recruitment. EMBO J 22:2852-2859.
59. Daughenbaugh, K. F., C. E. Wobus, and M. E. Hardy. 2006. VPg of murine norovirus binds translation initiation factors in infected cells. Virology journal 3:33.
60. de Rougemont, A., N. Ruvoen-Clouet, B. Simon, M. Estienney, C. Elie-Caille, S. Aho, P. Pothier, J. Le Pendu, W. Boireau, and G. Belliot. 2011. Qualitative and quantitative analysis of the binding of GII.4 norovirus variants onto human blood group antigens. J Virol 85:4057-4070.
61. Debbink, K., E. F. Donaldson, L. C. Lindesmith, and R. S. Baric. 2012. Genetic Mapping of a Highly Variable Norovirus GII.4 Blockade Epitope: Potential Role in Escape from Human Herd Immunity. J Virol 86:1214-1226.
62. DeLano, W. L. 2002. The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos, CA, USA.
63. Delport, W., A. F. Poon, S. D. Frost, and S. L. Kosakovsky Pond. 2010. Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics 26:2455-2457.
64. Deng, W., B. S. Maust, D. C. Nickle, G. H. Learn, Y. Liu, L. Heath, S. L. Kosakovsky Pond, and J. I. Mullins. 2010. DIVEIN: a web server to analyze phylogenies, sequence divergence, diversity, and informative sites. Biotechniques 48:405-408.
65. Dickover, R. E., E. M. Garratty, S. Plaeger, and Y. J. Bryson. 2001. Perinatal transmission of major, minor, and multiple maternal human immunodeficiency virus type 1 variants in utero and intrapartum. J Virol 75:2194-2203.
66. Diez-Domingo, J., J. M. Baldo, M. Patrzalek, P. Pazdiora, J. Forster, L. Cantarutti, J. Y. Pircon, M. Soriano-Gabarro, and N. Meyer. 2011. Primary care-based surveillance to estimate the burden of rotavirus gastroenteritis among children aged less than 5 years in six European countries. Eur J Pediatr 170:213-222.
67. Dolin, R., N. R. Blacklow, H. DuPont, R. F. Buscho, R. G. Wyatt, J. A. Kasel, R. Hornick, and R. M. Chanock. 1972. Biological properties of Norwalk agent of acute infectious nonbacterial gastroenteritis. Proceedings of the Society for Experimental Biology and Medicine. Society for Experimental Biology and Medicine 140:578-583.
68. Dolin, R., N. R. Blacklow, H. DuPont, S. Formal, R. F. Buscho, J. A. Kasel, R. P. Chames, R. Hornick, and R. M. Chanock. 1971. Transmission of acute infectious nonbacterial gastroenteritis to volunteers by oral administration of stool filtrates. J Infect Dis 123:307-312.
69. Domingo, E. 1997. Rapid evolution of viral RNA genomes. The Journal of nutrition 127:958S-961S.
70. Domingo, E. 2007. Virus Evolution, p. 389-422. In D. M. Knipe and P. M. Howley (ed.), Field's Virology, 5th ed, vol. 1. Lippincott Williams & Wilkins, Philadelphia.
71. Domingo, E., C. Escarmis, N. Sevilla, A. Moya, S. F. Elena, J. Quer, I. S. Novella, and J. J. Holland. 1996. Basic concepts in RNA virus evolution. Faseb J 10:859-864.
72. Domingo, E., and J. J. Holland. 1997. RNA virus mutations and fitness for survival. Annu Rev Microbiol 51:151-178.
73. Donaldson, E. F., L. C. Lindesmith, A. D. Lobue, and R. S. Baric. 2010. Viral shape-shifting: norovirus evasion of the human immune system. Nat Rev Microbiol 8:231-241.
74. Doyle, T. J., L. Stark, R. Hammond, and R. S. Hopkins. 2008. Outbreaks of noroviral gastroenteritis in Florida, 2006-2007. Epidemiology and infection:1-9.
75. Drake, J. W. 1993. Rates of spontaneous mutation among RNA viruses. Proceedings of the National Academy of Sciences of the United States of America 90:4171-4175.
76. Drummond, A. J., S. Y. Ho, M. J. Phillips, and A. Rambaut. 2006. Relaxed phylogenetics and dating with confidence. PLoS biology 4:e88.
77. Drummond, A. J., and A. Rambaut. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214.
78. Drummond, A. J., A. Rambaut, B. Shapiro, and O. G. Pybus. 2005. Bayesian coalescent inference of past population dynamics from molecular sequences. Molecular biology and evolution 22:1185-1192.
79. Duizer, E., K. J. Schwab, F. H. Neill, R. L. Atmar, M. P. Koopmans, and M. K. Estes. 2004. Laboratory efforts to cultivate noroviruses. J Gen Virol 85:79-87.

80. Ebdrup, L., B. Bottiger, H. Molgaard, and A. L. Laursen. 2011. Devastating diarrhoea in a heart-transplanted patient. J Clin Virol 50:263-265.
81. Eden, J. S., R. A. Bull, E. Tu, C. J. McIver, M. J. Lyon, J. A. Marshall, D. W. Smith, J. Musto, W. D. Rawlinson, and P. A. White. 2010. Norovirus GII.4 variant 2006b caused epidemics of acute gastroenteritis in Australia during 2007 and 2008. J Clin Virol 49:265-271.
82. Ehrhardt, C., T. Wolff, S. Pleschka, O. Planz, W. Beermann, J. G. Bode, M. Schmolke, and S. Ludwig. 2007. Influenza A virus NS1 protein activates the PI3K/Akt pathway to mediate antiapoptotic signaling responses. J Virol 81:3058-3067.
83. Elena, S. F., and R. Sanjuan. 2005. Adaptive value of high mutation rates of RNA viruses: separating causes from consequences. J Virol 79:11555-11558.
84. Elliott, E. J. 2007. Acute gastroenteritis in children. BMJ 334:35-40.
85. Estes, M. K., B. V. Prasad, and R. L. Atmar. 2006. Noroviruses everywhere: has something changed? Curr Opin Infect Dis 19:467-474.
86. Ewing, B., and P. Green. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 8:186-194.
87. Farkas, T., K. Sestak, C. Wei, and X. Jiang. 2008. Characterization of a rhesus monkey calicivirus representing a new genus of Caliciviridae. J Virol 82:5408-5416.
88. Ferguson, N. M., A. P. Galvani, and R. M. Bush. 2003. Ecological and immunological determinants of influenza evolution. Nature 422:428-433.
89. Fischer, W., V. V. Ganusov, E. E. Giorgi, P. T. Hraber, B. F. Keele, T. Leitner, C. S. Han, C. D. Gleasner, L. Green, C. C. Lo, A. Nag, T. C. Wallstrom, S. Wang, A. J. McMichael, B. F. Haynes, B. H. Hahn, A. S. Perelson, P. Borrow, G. M. Shaw, T. Bhattacharya, and B. T. Korber. 2010. Transmission of single HIV-1 genomes and dynamics of early immune escape revealed by ultra-deep sequencing. PLoS One 5:e12303.
90. Fukushi, S., S. Kojima, R. Takai, F. B. Hoshino, T. Oka, N. Takeda, K. Katayama, and T. Kageyama. 2004. Poly(A)- and primer-independent RNA polymerase of Norovirus. J Virol 78:3889-3896.
91. Gallimore, C. I., D. Cubitt, N. du Plessis, and J. J. Gray. 2004. Asymptomatic and symptomatic excretion of noroviruses during a hospital outbreak of gastroenteritis. Journal of clinical microbiology 42:2271-2274.
92. Gaulin, C., M. Frigon, D. Poirier, and C. Fournier. 1999. Transmission of calicivirus by a foodhandler in the pre-symptomatic phase of illness. Epidemiology and infection 123:475-478.

93. Ghedin, E., J. Laplante, J. DePasse, D. E. Wentworth, R. P. Santos, M. L. Lepow, J. Porter, K. Stellrecht, X. Lin, D. Operario, S. Griesemer, A. Fitch, R. A. Halpin, T. B. Stockwell, D. J. Spiro, E. C. Holmes, and K. St George. 2011. Deep sequencing reveals mixed infection with 2009 pandemic influenza A (H1N1) virus strains and the emergence of oseltamivir resistance. J Infect Dis 203:168-174.
94. Giammanco, G. M., V. Rotolo, M. C. Medici, F. Tummolo, F. Bonura, C. Chezzi, V. Martella, and S. De Grazia. 2012. Recombinant norovirus GII.g/GII.12 gastroenteritis in children. Infect Genet Evol 12:169-174.
95. Gilles, A., E. Meglecz, N. Pech, S. Ferreira, T. Malausa, and J. F. Martin. 2011. Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics 12:245.
96. Glass, P. J., L. J. White, J. M. Ball, I. Leparc-Goffart, M. E. Hardy, and M. K. Estes. 2000. Norwalk virus open reading frame 3 encodes a minor structural protein. J Virol 74:6581-6591.
97. Go, E. P., G. Hewawasam, H. X. Liao, H. Chen, L. H. Ping, J. A. Anderson, D. C. Hua, B. F. Haynes, and H. Desaire. 2011. Characterization of glycosylation profiles of HIV-1 transmitted/founder envelopes by mass spectrometry. J Virol 85:8270-8284.
98. Gong, P., and O. B. Peersen. 2010. Structural basis for active site closure by the poliovirus RNA-dependent RNA polymerase. Proceedings of the National Academy of Sciences of the United States of America 107:22505-22510.
99. Goodall, J. F. 1954. The winter vomiting disease; a report from general practice. Br Med J 1:197-198.
100. Goonetilleke, N., M. K. Liu, J. F. Salazar-Gonzalez, G. Ferrari, E. Giorgi, V. V. Ganusov, B. F. Keele, G. H. Learn, E. L. Turnbull, M. G. Salazar, K. J. Weinhold, S. Moore, N. Letvin, B. F. Haynes, M. S. Cohen, P. Hraber, T. Bhattacharya, P. Borrow, A. S. Perelson, B. H. Hahn, G. M. Shaw, B. T. Korber, and A. J. McMichael. 2009. The first T cell response to transmitted/founder virus contributes to the control of acute viremia in HIV-1 infection. J Exp Med 206:1253-1272.
101. Green, K. Y., T. Ando, M. S. Balayan, T. Berke, I. N. Clarke, M. K. Estes, D. O. Matson, S. Nakata, J. D. Neill, M. J. Studdert, and H. J. Thiel. 2000. Taxonomy of the caliciviruses. J Infect Dis 181 Suppl 2:S322-330.

102. Green, K. Y., R. M. Chanock, and A. Z. Kapikian. 2001. Human calicivirus, p. 841-874. In D. M. Knipe and P. M. Howley (ed.), Fields Virology, 4th ed, vol. 1. Lippincott Williams & Wilkins, Philadelphia.
103. Guindon, S., and O. Gascuel. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696-704.
104. Guix, S., M. Asanaka, K. Katayama, S. E. Crawford, F. H. Neill, R. L. Atmar, and M. K. Estes. 2007. Norwalk virus RNA is infectious in mammalian cells. J Virol 81:12238-12248.
105. Halperin, T., H. Vennema, M. Koopmans, G. Kahila Bar-Gal, R. Kayouf, T. Sela, R. Ambar, and E. Klement. 2008. No Association between Histo-Blood Group Antigens and Susceptibility to Clinical Infections with Genogroup II Norovirus. J Infect Dis 197:63-65.
106. Han, T. H., C. H. Kim, J. Y. Chung, S. H. Park, and E. S. Hwang. 2011. Emergence of norovirus GII-4/2008 variant and recombinant strains in Seoul, Korea. Arch Virol 156:323-329.
107. Hansman, G. S., C. Biertumpfel, I. Georgiev, J. S. McLellan, L. Chen, T. Zhou, K. Katayama, and P. D. Kwong. 2011. Crystal structures of GII.10 and GII.12 norovirus protruding domains in complex with histo-blood group antigens reveal details for a potential site of vulnerability. J Virol 85:6687-6701.
108. Hansman, G. S., K. Katayama, N. Maneekarn, S. Peerakome, P. Khamrin, S. Tonusin, S. Okitsu, O. Nishio, N. Takeda, and H. Ushijima. 2004. Genetic diversity of norovirus and sapovirus in hospitalized infants with sporadic cases of acute gastroenteritis in Chiang Mai, Thailand. Journal of clinical microbiology 42:1305-1307.
109. Hardy, M. E. 2005. Norovirus protein structure and function. FEMS Microbiol Lett 253:1-8.
110. Haworth, J. C., D. A. Tyrrell, and J. E. Whitehead. 1956. Winter vomiting disease with meningeal involvement; an outbreak in a children's hospital. Lancet 271:1152-1154.
111. Hay, A. J., V. Gregory, A. R. Douglas, and Y. P. Lin. 2001. The evolution of human influenza viruses. Philosophical transactions of the Royal Society of London 356:1861-1870.
112. Herbert, T. P., I. Brierley, and T. D. Brown. 1997. Identification of a protein linked to the genomic and subgenomic mRNAs of feline calicivirus and its role in translation. J Gen Virol 78 (Pt 5):1033-1040.

113. Ho, E. C., P. K. Cheng, D. A. Wong, A. W. Lau, and W. W. Lim. 2006. Correlation of norovirus variants with epidemics of acute viral gastroenteritis in Hong Kong. J Med Virol 78:1473-1479.
114. Ho, M. S., R. I. Glass, P. F. Pinsky, and L. J. Anderson. 1988. Rotavirus as a cause of diarrheal morbidity and mortality in the United States. J Infect Dis 158:1112-1116.
115. Hogbom, M., K. Jager, I. Robel, T. Unge, and J. Rohayem. 2009. The active form of the norovirus RNA-dependent RNA polymerase is a homodimer with cooperative activity. J Gen Virol 90:281-291.
116. Hyde, J. L., and J. M. Mackenzie. 2010. Subcellular localization of the MNV-1 ORF1 proteins and their potential roles in the formation of the MNV-1 replication complex. Virology 406:138-148.
117. Hyde, J. L., S. V. Sosnovtsev, K. Y. Green, C. Wobus, H. W. Virgin, and J. M. Mackenzie. 2009. Mouse norovirus replication is associated with virus-induced vesicle clusters originating from membranes derived from the secretory pathway. J Virol 83:9709-9719.
118. Jakubiec, A., V. Tournier, G. Drugeon, S. Pflieger, L. Camborde, J. Vinh, F. Hericourt, V. Redeker, and I. Jupin. 2006. Phosphorylation of viral RNA-dependent RNA polymerase and its role in replication of a plus-strand RNA virus. J Biol Chem 281:21236-21249.
119. Jiang, X., P. W. Huang, W. M. Zhong, T. Farkas, D. W. Cubitt, and D. O. Matson. 1999. Design and evaluation of a primer pair that detects both Norwalk- and Sapporo-like caliciviruses by RT-PCR. Journal of virological methods 83:145-154.
120. Jiang, X., M. Wang, K. Wang, and M. K. Estes. 1993. Sequence and genomic organization of Norwalk virus. Virology 195:51-61.
121. Johansen, K., K. Mannerqvist, A. Allard, Y. Andersson, L. G. Burman, L. Dillner, K. O. Hedlund, K. Jonsson, U. Kumlin, T. Leitner, M. Lysen, M. Thorhagen, A. Tiveljung-Lindell, C. Wahlstrom, B. Zweygberg-Wirgart, and A. Widell. 2008. Norovirus strains belonging to the GII.4 genotype dominate as a cause of nosocomial outbreaks of viral gastroenteritis in Sweden 1997-2005. Arrival of new variants is associated with large nation-wide epidemics. J Clin Virol 42:129-134.
122. Johnston, C. P., H. Qiu, J. R. Ticehurst, C. Dickson, P. Rosenbaum, P. Lawson, A. B. Stokes, C. J. Lowenstein, M. Kaminsky, S. E. Cosgrove, K. Y. Green, and T. M. Perl. 2007. Outbreak management and implications of a nosocomial norovirus outbreak. Clin Infect Dis 45:534-540.

123. Jones, L. A., L. E. Clancy, W. D. Rawlinson, and P. A. White. 2006. High-affinity aptamers to subtype 3a hepatitis C virus polymerase display genotypic specificity. Antimicrobial agents and chemotherapy 50:3019-3027.
124. Kadaveru, K., J. Vyas, and M. R. Schiller. 2008. Viral infection and human disease--insights from minimotifs. Front Biosci 13:6455-6471.
125. Kageyama, T., S. Kojima, M. Shinohara, K. Uchida, S. Fukushi, F. B. Hoshino, N. Takeda, and K. Katayama. 2003. Broadly reactive and highly sensitive assay for Norwalk-like viruses based on real-time quantitative reverse transcription-PCR. Journal of clinical microbiology 41:1548-1557.
126. Kamel, A. H., M. A. Ali, H. G. El-Nady, A. de Rougemont, P. Pothier, and G. Belliot. 2009. Predominance and circulation of enteric viruses in the region of Greater Cairo, Egypt. Journal of clinical microbiology 47:1037-1045.
127. Kapikian, A. Z., R. G. Wyatt, R. Dolin, T. S. Thornhill, A. R. Kalica, and R. M. Chanock. 1972. Visualization by immune electron microscopy of a 27-nm particle associated with acute infectious nonbacterial gastroenteritis. J Virol 10:1075-1081.
128. Kapoor, M., L. Zhang, M. Ramachandra, J. Kusukawa, K. E. Ebner, and R. Padmanabhan. 1995. Association between NS3 and NS5 proteins of dengue virus type 2 in the putative RNA replicase is linked to differential phosphorylation of NS5. J Biol Chem 270:19100-19106.
129. Karst, S. M., C. E. Wobus, M. Lay, J. Davidson, and H. W. Virgin IV. 2003. STAT1-dependent innate immunity to a Norwalk-like virus. Science 299:1575-1578.
130. Keele, B. F. 2010. Identifying and characterizing recently transmitted viruses. Curr Opin HIV AIDS 5:327-334.
131. Keele, B. F., E. E. Giorgi, J. F. Salazar-Gonzalez, J. M. Decker, K. T. Pham, M. G. Salazar, C. Sun, T. Grayson, S. Wang, H. Li, X. Wei, C. Jiang, J. L. Kirchherr, F. Gao, J. A. Anderson, L. H. Ping, R. Swanstrom, G. D. Tomaras, W. A. Blattner, P. A. Goepfert, J. M. Kilby, M. S. Saag, E. L. Delwart, M. P. Busch, M. S. Cohen, D. C. Montefiori, B. F. Haynes, B. Gaschen, G. S. Athreya, H. Y. Lee, N. Wood, C. Seoighe, A. S. Perelson, T. Bhattacharya, B. T. Korber, B. H. Hahn, and G. M. Shaw. 2008. Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proceedings of the National Academy of Sciences of the United States of America 105:7552-7557.
132. Kiefer, F., K. Arnold, M. Kunzli, L. Bordoli, and T. Schwede. 2009. The SWISS-MODEL Repository and associated resources. Nucleic Acids Res 37:D387-392.

133. Kim, S. H., P. Palukaitis, and Y. I. Park. 2002. Phosphorylation of cucumber mosaic virus RNA polymerase 2a protein inhibits formation of replicase complex. EMBO J 21:2292-2300.
134. Kim, S. J., J. H. Kim, Y. G. Kim, H. S. Lim, and J. W. Oh. 2004. Protein kinase C-related kinase 2 regulates hepatitis C virus RNA polymerase function by phosphorylation. J Biol Chem 279:50031-50041.
135. King, C. K., R. Glass, J. S. Bresee, and C. Duggan. 2003. Managing acute gastroenteritis among children: oral rehydration, maintenance, and nutritional therapy. MMWR Recomm Rep 52:1-16.
136. Koboldt, D. C., K. Chen, T. Wylie, D. E. Larson, M. D. McLellan, E. R. Mardis, G. M. Weinstock, R. K. Wilson, and L. Ding. 2009. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25:2283-2285.
137. Kojima, S., T. Kageyama, S. Fukushi, F. B. Hoshino, M. Shinohara, K. Uchida, K. Natori, N. Takeda, and K. Katayama. 2002. Genogroup-specific PCR primers for detection of Norwalk-like viruses. Journal of virological methods 100:107-114.
138. Koopmans, M., H. Vennema, H. Heersma, E. van Strien, Y. van Duynhoven, D. Brown, M. Reacher, and B. Lopman. 2003. Early identification of common-source foodborne virus outbreaks in Europe. Emerging infectious diseases 9:1136-1142.
139. Kosek, M., C. Bern, and R. L. Guerrant. 2003. The global burden of diarrhoeal disease, as estimated from studies published between 1992 and 2000. Bull World Health Organ 81:197-204.
140. Kroneman, A., H. Vennema, K. Deforche, H. v d Avoort, S. Penaranda, M. S. Oberste, J. Vinje, and M. Koopmans. 2011. An automated genotyping tool for enteroviruses and noroviruses. J Clin Virol 51:121-125.
141. Kroneman, A., H. Vennema, J. Harris, G. Reuter, C. H. von Bonsdorff, K. O. Hedlund, K. Vainio, V. Jackson, P. Pothier, J. Koch, E. Schreier, B. E. Bottiger, and M. Koopmans. 2006. Increase in norovirus activity reported in Europe. Euro Surveill 11:E061214 061211.
142. Kroneman, A., L. Verhoef, J. Harris, H. Vennema, E. Duizer, Y. van Duynhoven, J. Gray, M. Iturriza, B. Bottiger, G. Falkenhorst, C. Johnsen, C. H. von Bonsdorff, L. Maunula, M. Kuusi, P. Pothier, A. Gallay, E. Schreier, M. Hohne, J. Koch, G. Szucs, G. Reuter, K. Krisztalovics, M. Lynch, P. McKeown, B. Foley, S. Coughlan, F. M. Ruggeri, I. Di Bartolo, K. Vainio, E. Isakbaeva, M. Poljsak-Prijatelj, A. H. Grom, J. Z. Mijovski, A. Bosch, J. Buesa, A. S. Fauquier, G. Hernandez-Pezzi, K. O. Hedlund, and M. Koopmans. 2008. Analysis of integrated virological and epidemiological reports of norovirus outbreaks collected within the foodborne viruses in Europe Network from 1 July 2001 to 30 June 2006. Journal of clinical microbiology 46:2959-2965.
143. Kunkel, T. A., R. M. Schaaper, R. A. Beckman, and L. A. Loeb. 1981. On the fidelity of DNA replication. Effect of the next nucleotide on proofreading. J Biol Chem 256:9883-9889.
144. Kuyumcu-Martinez, M., G. Belliot, S. V. Sosnovtsev, K. O. Chang, K. Y. Green, and R. E. Lloyd. 2004. Calicivirus 3C-like proteinase inhibits cellular translation by cleavage of poly(A)-binding protein. J Virol 78:8172-8182.
145. L'Homme, Y., R. Sansregret, E. Plante-Fortier, A. M. Lamontagne, M. Ouardani, G. Lacroix, and C. Simard. 2009. Genomic characterization of swine caliciviruses representing a new genus of Caliciviridae. Virus Genes 39:66-75.
146. Lam, T. T.-Y., H. Zhu, D. K. Smith, Y. Guan, E. C. Holmes, and O. G. Pybus. 2012. The recombinant origin of emerging human norovirus GII.4/2008: intra-genotypic exchange of the capsid P2 domain. Journal of General Virology 93:817-822.
147. Lee, J. H., I. Alam, K. R. Han, S. Cho, S. Shin, S. Kang, J. M. Yang, and K. H. Kim. 2011. Crystal structures of murine norovirus-1 RNA-dependent RNA polymerase. J Gen Virol 92:1607-1616.
148. Lindesmith, L., C. Moe, J. Lependu, J. A. Frelinger, J. Treanor, and R. S. Baric. 2005. Cellular and humoral immunity following Snow Mountain virus challenge. J Virol 79:2900-2909.
149. Lindesmith, L., C. Moe, S. Marionneau, N. Ruvoen, X. Jiang, L. Lindblad, P. Stewart, J. LePendu, and R. Baric. 2003. Human susceptibility and resistance to Norwalk virus infection. Nature medicine 9:548-553.
150. Lindesmith, L. C., K. Debbink, J. Swanstrom, J. Vinje, V. Costantini, R. S. Baric, and E. F. Donaldson. 2012. Monoclonal Antibody-Based Antigenic Mapping of Norovirus GII.4-2002. J Virol 86:873-883.
151. Lindesmith, L. C., E. F. Donaldson, and R. S. Baric. 2011. Norovirus GII.4 Strain Antigenic Variation. J Virol 85:231-242.
152. Lindesmith, L. C., E. F. Donaldson, A. D. Lobue, J. L. Cannon, D. P. Zheng, J. Vinje, and R. S. Baric. 2008. Mechanisms of GII.4 norovirus persistence in human populations. PLoS Med 5:e31.

153. Liu, B. L., P. R. Lambden, H. Gunther, P. Otto, M. Elschner, and I. N. Clarke. 1999. Molecular characterization of a bovine enteric calicivirus: relationship to the Norwalk-like viruses. J Virol 73:819-825.
154. Liu, C. H., B. F. Chen, S. C. Chen, M. Y. Lai, J. H. Kao, and D. S. Chen. 2006. Selective transmission of hepatitis C virus quasi species through a needlestick accident in acute resolving hepatitis. Clin Infect Dis 42:1254-1259.
155. Lochridge, V. P., and M. E. Hardy. 2007. A single-amino-acid substitution in the P2 domain of VP1 of murine norovirus is sufficient for escape from antibody neutralization. J Virol 81:12316-12322.
156. Lole, K. S., R. C. Bollinger, R. S. Paranjape, D. Gadkari, S. S. Kulkarni, N. G. Novak, R. Ingersoll, H. W. Sheppard, and S. C. Ray. 1999. Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J Virol 73:152-160.
157. Lopman, B., H. Vennema, E. Kohli, P. Pothier, A. Sanchez, A. Negredo, J. Buesa, E. Schreier, M. Reacher, D. Brown, J. Gray, M. Iturriza, C. Gallimore, B. Bottiger, K. O. Hedlund, M. Torven, C. H. von Bonsdorff, L. Maunula, M. Poljsak-Prijatelj, J. Zimsek, G. Reuter, G. Szucs, B. Melegh, L. Svennson, Y. van Duijnhoven, and M. Koopmans. 2004. Increase in viral gastroenteritis outbreaks in Europe and epidemic spread of new norovirus variant. Lancet 363:682-688.
158. Lopman, B. A., M. H. Reacher, I. B. Vipond, D. Hill, C. Perry, T. Halladay, D. W. Brown, W. J. Edmunds, and J. Sarangi. 2004. Epidemiology and cost of nosocomial gastroenteritis, Avon, England, 2002-2003. Emerging infectious diseases 10:1827-1834.
159. Lund, O., M. Nielsen, C. Lundegaard, and P. Worning. 2002. CPHmodels 2.0: X3M a Computer Program to Extract 3D Models. Critical Assessment of Techniques for Protein Structure Prediction (CASP5) Conference, United States.
160. Lyon, M. J., G. Wei, and G. A. Smith. 2005. Epidemic viral gastroenteritis in Queensland coincides with the emergence of a new norovirus variant. Commun Dis Intell 29:370-373.
161. Lysen, M., M. Thorhagen, M. Brytting, M. Hjertqvist, Y. Andersson, and K. O. Hedlund. 2009. Genetic diversity among foodborne and waterborne norovirus outbreaks in Sweden. Journal of clinical microbiology.
162. Mackenzie, J. M., M. T. Kenney, and E. G. Westaway. 2007. West Nile virus strain Kunjin NS5 polymerase is a phosphoprotein localized at the cytoplasmic site of viral RNA synthesis. J Gen Virol 88:1163-1168.

163. Manrubia, S. C., C. Escarmis, E. Domingo, and E. Lazaro. 2005. High mutation rates, bottlenecks, and robustness of RNA viral quasispecies. Gene 347:273-282.
164. Mans, J., J. C. de Villiers, N. M. du Plessis, T. Avenant, and M. B. Taylor. 2010. Emerging norovirus GII.4 2008 variant detected in hospitalised paediatric patients in South Africa. J Clin Virol 49:258-264.
165. Margulies, M., M. Egholm, W. E. Altman, S. Attiya, J. S. Bader, L. A. Bemben, J. Berka, M. S. Braverman, Y. J. Chen, Z. Chen, S. B. Dewell, L. Du, J. M. Fierro, X. V. Gomes, B. C. Godwin, W. He, S. Helgesen, C. H. Ho, G. P. Irzyk, S. C. Jando, M. L. Alenquer, T. P. Jarvie, K. B. Jirage, J. B. Kim, J. R. Knight, J. R. Lanza, J. H. Leamon, S. M. Lefkowitz, M. Lei, J. Li, K. L. Lohman, H. Lu, V. B. Makhijani, K. E. McDade, M. P. McKenna, E. W. Myers, E. Nickerson, J. R. Nobile, R. Plant, B. P. Puc, M. T. Ronan, G. T. Roth, G. J. Sarkis, J. F. Simons, J. W. Simpson, M. Srinivasan, K. R. Tartaro, A. Tomasz, K. A. Vogt, G. A. Volkmer, S. H. Wang, Y. Wang, M. P. Weiner, P. Yu, R. F. Begley, and J. M. Rothberg. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376-380.
166. Marionneau, S., N. Ruvoen, B. Le Moullac-Vaidye, M. Clement, A. Cailleau-Thomas, G. Ruiz-Palacois, P. Huang, X. Jiang, and J. Le Pendu. 2002. Norwalk virus binds to histo-blood group antigens present on gastroduodenal epithelial cells of secretor individuals. Gastroenterology 122:1967-1977.
167. Martella, V., M. Campolo, E. Lorusso, P. Cavicchio, M. Camero, A. L. Bellacicco, N. Decaro, G. Elia, G. Greco, M. Corrente, C. Desario, S. Arista, K. Banyai, M. Koopmans, and C. Buonavoglia. 2007. Norovirus in captive lion cub (Panthera leo). Emerging infectious diseases 13:1071-1073.
168. Martella, V., E. Lorusso, N. Decaro, G. Elia, A. Radogna, M. D'Abramo, C. Desario, A. Cavalli, M. Corrente, M. Camero, C. A. Germinario, K. Banyai, B. Di Martino, F. Marsilio, L. E. Carmichael, and C. Buonavoglia. 2008. Detection and molecular characterization of a canine norovirus. Emerging infectious diseases 14:1306-1308.
169. Martin, D. P., P. Lemey, M. Lott, V. Moulton, D. Posada, and P. Lefeuvre. 2010. RDP3: a flexible and fast computer program for analyzing recombination. Bioinformatics 26:2462-2463.
170. Martin, D. P., P. Lemey, and D. Posada. 2011. Analysing recombination in nucleotide sequences. Mol Ecol Resour 11:943-955.

171. Mattison, K., T. K. Sebunya, A. Shukla, L. N. Noliwe, and S. Bidawid. 2010. Molecular detection and characterization of noroviruses from children in Botswana. J Med Virol 82:321-324.
172. McAllister, G., A. Holmes, L. Garcia, F. Cameron, K. Cloy, J. Danial, J. A. Cepeda, P. Simmonds, and K. E. Templeton. 2012. Molecular epidemiology of norovirus in Edinburgh healthcare facilities, Scotland 2007-2011. Epidemiology and infection:1-9.
173. McFadden, N., D. Bailey, G. Carrara, A. Benson, Y. Chaudhry, A. Shortland, J. Heeney, F. Yarovinsky, P. Simmonds, A. Macdonald, and I. Goodfellow. 2011. Norovirus regulation of the innate immune response and apoptosis occurs via the product of the alternative open reading frame 4. PLoS Pathog 7:e1002413.
174. Mondelli, M. U., A. Cerino, A. Lisa, S. Brambilla, L. Segagni, A. Cividini, M. Bissolati, G. Missale, G. Bellati, A. Meola, B. Bruniercole, A. Nicosia, G. Galfre, and E. Silini. 1999. Antibody responses to hepatitis C virus hypervariable region 1: evidence for cross-reactivity and immune-mediated sequence variation. Hepatology 30:537-545.
175. Monroe, S. S., T. Ando, and R. I. Glass. 2000. Introduction: human enteric caliciviruses-an emerging pathogen whose time has come. J Infect Dis 181 Suppl 2:S249-251.
176. Montville, R., R. Froissart, S. K. Remold, O. Tenaillon, and P. E. Turner. 2005. Evolution of mutational robustness in an RNA virus. PLoS biology 3:e381.
177. Morozova, O. V., N. A. Tsekhanovskaya, T. G. Maksimova, V. N. Bachvalova, V. A. Matveeva, and Y. Kit. 1997. Phosphorylation of tick-borne encephalitis virus NS5 protein. Virus Res 49:9-15.
178. Motomura, K., T. Oka, M. Yokoyama, H. Nakamura, H. Mori, H. Ode, G. S. Hansman, K. Katayama, T. Kanda, T. Tanaka, N. Takeda, and H. Sato. 2008. Identification of monomorphic and divergent haplotypes in the 2006-2007 norovirus GII/4 epidemic population by genomewide tracing of evolutionary history. J Virol 82:11247-11262.
179. Motomura, K., M. Yokoyama, H. Ode, H. Nakamura, H. Mori, T. Kanda, T. Oka, K. Katayama, M. Noda, T. Tanaka, N. Takeda, and H. Sato. 2010. Divergent evolution of norovirus GII/4 by genome recombination from May 2006 to February 2009 in Japan. J Virol 84:8085-8097.
180. Murcia, P. R., G. J. Baillie, J. Daly, D. Elton, C. Jervis, J. A. Mumford, R. Newton, C. R. Parrish, K. Hoelzer, G. Dougan, J. Parkhill, N. Lennard, D. Ormond, S. Moule, A. Whitwham, J. W. McCauley, T. J. McKinley, E. C. Holmes, B. T. Grenfell, and J. L. Wood. 2010. Intra- and interhost evolutionary dynamics of equine influenza virus. J Virol 84:6943-6954.

181. Muslin, A. J., J. W. Tanner, P. M. Allen, and A. S. Shaw. 1996. Interaction of 14-3-3 with signaling proteins is mediated by the recognition of phosphoserine. Cell 84:889-897.
182. Nayak, M. K., D. Chatterjee, S. M. Nataraju, M. Pativada, U. Mitra, M. K. Chatterjee, T. K. Saha, U. Sarkar, and T. Krishnan. 2009. A new variant of Norovirus GII.4/2007 and inter-genotype recombinant strains of NVGII causing acute watery diarrhoea among children in Kolkata, India. J Clin Virol 45:223-229.
183. Nei, M., and T. Gojobori. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular biology and evolution 3:418-426.
184. Nelson, M. I., L. Edelman, D. J. Spiro, A. R. Boyne, J. Bera, R. Halpin, N. Sengamalay, E. Ghedin, M. A. Miller, L. Simonsen, C. Viboud, and E. C. Holmes. 2008. Molecular epidemiology of A/H3N2 and A/H1N1 influenza virus during a single epidemic season in the United States. PLoS Pathog 4:e1000133.
185. Ng, K. K., N. Pendas-Franco, J. Rojo, J. A. Boga, A. Machin, J. M. Alonso, and F. Parra. 2004. Crystal structure of Norwalk virus polymerase reveals the carboxyl terminus in the active site cleft. J Biol Chem 279:16638-16645.
186. Nguyen, L. M., and J. P. Middaugh. 2011. Suspected transmission of norovirus in eight long-term care facilities attributed to staff working at multiple institutions. Epidemiology and infection:1-8.
187. Nielsen, R., J. S. Paul, A. Albrechtsen, and Y. S. Song. 2011. Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12:443-451.
188. Nobusawa, E., and K. Sato. 2006. Comparison of the mutation rates of human influenza A and B viruses. J Virol 80:3675-3678.
189. Noel, J. S., R. L. Fankhauser, T. Ando, S. S. Monroe, and R. I. Glass. 1999. Identification of a distinct common strain of "Norwalk-like viruses" having a global distribution. J Infect Dis 179:1334-1344.
190. Obata, T., M. B. Yaffe, G. G. Leparc, E. T. Piro, H. Maegawa, A. Kashiwagi, R. Kikkawa, and L. C. Cantley. 2000. Peptide and protein library screening defines optimal substrate motifs for AKT/PKB. J Biol Chem 275:36108-36115.
191. Obenauer, J. C., L. C. Cantley, and M. B. Yaffe. 2003. Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs. Nucleic Acids Res 31:3635-3641.

192. Okada, M., T. Tanaka, M. Oseto, N. Takeda, and K. Shinozaki. 2006. Genetic analysis of noroviruses associated with fatalities in healthcare facilities. Arch Virol 151:1635-1641.
193. Pang, X. L., J. K. Preiksaitis, S. Wong, V. Li, and B. E. Lee. 2010. Influence of novel norovirus GII.4 variants on gastroenteritis outbreak dynamics in Alberta and the Northern Territories, Canada between 2000 and 2008. PLoS One 5:e11599.
194. Papaventsis, D. C., W. Dove, N. A. Cunliffe, O. Nakagomi, P. Combe, P. Grosjean, and C. A. Hart. 2007. Norovirus infection in children with acute gastroenteritis, Madagascar, 2004-2005. Emerging infectious diseases 13:908-911.
195. Parashar, U. D., J. S. Bresee, and R. I. Glass. 2003. The global burden of diarrhoeal disease in children. Bull World Health Organ 81:236.
196. Parashar, U. D., E. G. Hummelman, J. S. Bresee, M. A. Miller, and R. I. Glass. 2003. Global illness and deaths caused by rotavirus disease in children. Emerging infectious diseases 9:565-572.
197. Parker, T. D., N. Kitamoto, T. Tanaka, A. M. Hutson, and M. K. Estes. 2005. Identification of Genogroup I and Genogroup II broadly reactive epitopes on the norovirus capsid. J Virol 79:7402-7409.
198. Parrino, T. A., D. S. Schreiber, J. S. Trier, A. Z. Kapikian, and N. R. Blacklow. 1977. Clinical immunity in acute gastroenteritis caused by Norwalk agent. N Engl J Med 297:86-89.
199. Patel, M. M., A. J. Hall, J. Vinje, and U. D. Parashar. 2009. Noroviruses: a comprehensive review. J Clin Virol 44:1-8.
200. Patel, M. M., and U. D. Parashar. 2009. Assessing the effectiveness and public health impact of rotavirus vaccines after introduction in immunization programs. J Infect Dis 200 Suppl 1:S291-299.
201. Patel, M. M., D. Steele, J. R. Gentsch, J. Wecker, R. I. Glass, and U. D. Parashar. 2011. Real-world impact of rotavirus vaccination. Pediatr Infect Dis J 30:S1-5.
202. Patel, M. M., M. A. Widdowson, R. I. Glass, K. Akazawa, J. Vinje, and U. D. Parashar. 2008. Systematic literature review of role of noroviruses in sporadic gastroenteritis. Emerging infectious diseases 14:1224-1231.
203. Peitsch, M. C. 1996. ProMod and Swiss-Model: Internet-based tools for automated comparative protein modelling. Biochem Soc Trans 24:274-279.
204. Pfeiffer, J. K., and K. Kirkegaard. 2005. Increased fidelity reduces poliovirus fitness and virulence under selective pressure in mice. PLoS Pathog 1:e11.

205. Pfister, T., and E. Wimmer. 2001. Polypeptide p41 of a Norwalk-like virus is a nucleic acid-independent nucleoside triphosphatase. J Virol 75:1611-1619.
206. Phan, T. G., T. A. Nguyen, S. Nishimura, T. Nishimura, A. Yamamoto, S. Okitsu, and H. Ushijima. 2005. Etiologic agents of acute gastroenteritis among Japanese infants and children: virus diversity and genetic analysis of sapovirus. Arch Virol 150:1415-1424.
207. Pletneva, M. A., S. V. Sosnovtsev, and K. Y. Green. 2001. The genome of Hawaii virus and its relationship with other members of the Caliciviridae. Virus Genes 23:5-16.
208. Pond, S. L., and S. D. Frost. 2005. Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics 21:2531-2533.
209. Pond, S. L., S. D. Frost, Z. Grossman, M. B. Gravenor, D. D. Richman, and A. J. Brown. 2006. Adaptation to different human populations by HIV-1 revealed by codon-based analyses. PLoS Comput Biol 2:e62.
210. Pond, S. L., S. D. Frost, and S. V. Muse. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21:676-679.
211. Posada, D., and T. R. Buckley. 2004. Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst Biol 53:793-808.
212. Prasad, B. V., M. E. Hardy, T. Dokland, J. Bella, M. G. Rossmann, and M. K. Estes. 1999. X-ray crystallographic structure of the Norwalk virus capsid. Science 286:287-290.
213. Pride, D. T., and M. J. Blaser. 2002. Concerted evolution between duplicated genetic elements in Helicobacter pylori. J Mol Biol 316:629-642.
214. Puustinen, L., V. Blazevic, M. Salminen, M. Hamalainen, S. Rasanen, and T. Vesikari. 2011. Noroviruses as a major cause of acute gastroenteritis in children in Finland, 2009-2010. Scand J Infect Dis 43:804-808.
215. Ramachandran, S., D. S. Campo, Z. E. Dimitrova, G. L. Xia, M. A. Purdy, and Y. E. Khudyakov. 2011. Temporal variations in the hepatitis C virus intrahost population during chronic infection. J Virol 85:6369-6380.
216. Rambaut, A., O. G. Pybus, M. I. Nelson, C. Viboud, J. K. Taubenberger, and E. C. Holmes. 2008. The genomic and epidemiological dynamics of human influenza A virus. Nature 453:615-619.
217. Ramirez, S., S. De Grazia, G. M. Giammanco, M. Milici, C. Colomba, F. M. Ruggeri, V. Martella, and S. Arista. 2006. Detection of the norovirus variants GGII.4 Hunter and GGIIb/Hilversum in Italian children with gastroenteritis. J Med Virol 78:1656-1662.

218. Rispeter, K., M. Lu, S. E. Behrens, C. Fumiko, T. Yoshida, and M. Roggendorf. 2000. Hepatitis C virus variability: sequence analysis of an isolate after 10 years of chronic infection. Virus Genes 21:179-188.
219. Rockx, B., M. De Wit, H. Vennema, J. Vinje, E. De Bruin, Y. Van Duynhoven, and M. Koopmans. 2002. Natural history of human calicivirus infection: a prospective cohort study. Clin Infect Dis 35:246-253.
220. Rockx, B. H., W. M. Bogers, J. L. Heeney, G. van Amerongen, and M. P. Koopmans. 2005. Experimental norovirus infections in non-human primates. J Med Virol 75:313-320.
221. Rohayem, J., J. Munch, and A. Rethwilm. 2005. Evidence of recombination in the norovirus capsid gene. J Virol 79:4977-4990.
222. Rohayem, J., I. Robel, K. Jager, U. Scheffler, and W. Rudolph. 2006. Protein-primed and de novo initiation of RNA synthesis by norovirus 3Dpol. J Virol 80:7060-7069.
223. Said, M. A., T. M. Perl, and C. L. Sears. 2008. Healthcare epidemiology: gastrointestinal flu: norovirus in health care and long-term care facilities. Clin Infect Dis 47:1202-1208.
224. Salazar-Gonzalez, J. F., M. G. Salazar, B. F. Keele, G. H. Learn, E. E. Giorgi, H. Li, J. M. Decker, S. Wang, J. Baalwa, M. H. Kraus, N. F. Parrish, K. S. Shaw, M. B. Guffey, K. J. Bar, K. L. Davis, C. Ochsenbauer-Jambor, J. C. Kappes, M. S. Saag, M. S. Cohen, J. Mulenga, C. A. Derdeyn, S. Allen, E. Hunter, M. Markowitz, P. Hraber, A. S. Perelson, T. Bhattacharya, B. F. Haynes, B. T. Korber, B. H. Hahn, and G. M. Shaw. 2009. Genetic identity, biological phenotype, and evolutionary pathways of transmitted/founder viruses in acute and early HIV-1 infection. J Exp Med 206:1273-1289.
225. Sallie, R. 2005. Replicative homeostasis: a fundamental mechanism mediating selective viral replication and escape mutation. Virology journal 2:10.
226. Sanjuan, R., M. R. Nebot, N. Chirico, L. M. Mansky, and R. Belshaw. 2010. Viral mutation rates. J Virol 84:9733-9748.
227. Schorn, R., M. Hohne, A. Meerbach, W. Bossart, R. P. Wuthrich, E. Schreier, N. J. Muller, and T. Fehr. 2010. Chronic norovirus infection after kidney transplantation: molecular evidence for immune-driven viral evolution. Clin Infect Dis 51:307-314.
228. Seah, E. L., J. A. Marshall, and P. J. Wright. 1999. Open reading frame 1 of the Norwalk-like virus Camberwell: completion of sequence and expression in mammalian cells. J Virol 73:10531-10535.

229. Seto, Y., N. Iritani, H. Kubo, A. Kaida, T. Murakami, K. Haruki, O. Nishio, M. Ayata, and H. Ogura. 2005. Genotyping of Norovirus strains detected in outbreaks between April 2002 and March 2003 in Osaka City, Japan. Microbiol Immunol 49:275-283.
230. Shanker, S., J. M. Choi, B. Sankaran, R. L. Atmar, M. K. Estes, and B. V. Prasad. 2011. Structural analysis of histo-blood group antigen binding specificity in a norovirus GII.4 epidemic variant: implications for epochal evolution. J Virol 85:8635-8645.
231. Sharp, T. M., S. Guix, K. Katayama, S. E. Crawford, and M. K. Estes. 2010. Inhibition of cellular protein secretion by Norwalk virus nonstructural protein p22 requires a mimic of an endoplasmic reticulum export signal. PLoS One 5:e13130.
232. Sharpe, L. J., W. Luu, and A. J. Brown. 2011. Akt phosphorylates Sec24: new clues into the regulation of ER-to-Golgi trafficking. Traffic 12:19-27.
233. Siebenga, J. J., M. F. Beersma, H. Vennema, P. van Biezen, N. J. Hartwig, and M. Koopmans. 2008. High prevalence of prolonged norovirus shedding and illness among hospitalized patients: a model for in vivo molecular evolution. J Infect Dis 198:994-1001.
234. Siebenga, J. J., P. Lemey, S. L. Kosakovsky Pond, A. Rambaut, H. Vennema, and M. Koopmans. 2010. Phylodynamic reconstruction reveals norovirus GII.4 epidemic expansions and their molecular determinants. PLoS Pathog 6:e1000884.
235. Siebenga, J. J., H. Vennema, B. Renckens, E. de Bruin, B. van der Veer, R. J. Siezen, and M. Koopmans. 2007. Epochal evolution of GGII.4 norovirus capsid proteins from 1995 to 2006. J Virol 81:9932-9941.
236. Siebenga, J. J., H. Vennema, D. P. Zheng, J. Vinje, B. E. Lee, X. L. Pang, E. C. Ho, W. Lim, A. Choudekar, S. Broor, T. Halperin, N. B. Rasool, J. Hewitt, G. E. Greening, M. Jin, Z. J. Duan, Y. Lucero, M. O'Ryan, M. Hoehne, E. Schreier, R. M. Ratcliff, P. A. White, N. Iritani, G. Reuter, and M. Koopmans. 2009. Norovirus illness is a global problem: emergence and spread of norovirus GII.4 variants, 2001-2007. J Infect Dis 200:802-812.
237. Simister, P., M. Schmitt, M. Geitmann, O. Wicht, U. H. Danielson, R. Klein, S. Bressanelli, and V. Lohmann. 2009. Structural and functional analysis of hepatitis C virus strain JFH1 polymerase. J Virol 83:11926-11939.
238. Simmonds, P., I. Karakasiliotis, D. Bailey, Y. Chaudhry, D. J. Evans, and I. G. Goodfellow. 2008. Bioinformatic and functional analysis of RNA secondary structure elements among different genera of human and animal caliciviruses. Nucleic Acids Res 36:2530-2546.

239. Smith, D. J., A. S. Lapedes, J. C. de Jong, T. M. Bestebroer, G. F. Rimmelzwaan, A. D. Osterhaus, and R. A. Fouchier. 2004. Mapping the antigenic and genetic evolution of influenza virus. Science 305:371-376.
240. Smith, G. J., D. Vijaykrishna, J. Bahl, S. J. Lycett, M. Worobey, O. G. Pybus, S. K. Ma, C. L. Cheung, J. Raghwani, S. Bhatt, J. S. Peiris, Y. Guan, and A. Rambaut. 2009. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature 459:1122-1125.
241. Someya, Y., N. Takeda, and T. Miyamura. 2000. Complete nucleotide sequence of the Chiba virus genome and functional expression of the 3C-like protease in Escherichia coli. Virology 278:490-500.
242. Sosnovtsev, S. V., G. Belliot, K. O. Chang, V. G. Prikhodko, L. B. Thackray, C. E. Wobus, S. M. Karst, H. W. Virgin, and K. Y. Green. 2006. Cleavage map and proteolytic processing of the murine norovirus nonstructural polyprotein in infected cells. J Virol 80:7816-7831.
243. Steinhauer, D. A., E. Domingo, and J. J. Holland. 1992. Lack of evidence for proofreading mechanisms associated with an RNA virus polymerase. Gene 122:281-288.
244. Sukhrie, F. H., J. J. Siebenga, M. F. Beersma, and M. Koopmans. 2010. Chronic shedders as reservoir for nosocomial transmission of norovirus. Journal of clinical microbiology 48:4303-4305.
245. Takanashi, S., Q. Wang, N. Chen, Q. Shen, K. Jung, Z. Zhang, M. Yokoyama, L. C. Lindesmith, R. S. Baric, and L. J. Saif. 2011. Characterization of emerging GII.g/GII.12 noroviruses from a gastroenteritis outbreak in the United States in 2010. Journal of clinical microbiology 49:3234-3244.
246. Tamura, K., J. Dudley, M. Nei, and S. Kumar. 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular biology and evolution 24:1596-1599.
247. Tamura, K., D. Peterson, N. Peterson, G. Stecher, M. Nei, and S. Kumar. 2011. MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular biology and evolution 28:2731-2739.
248. Tan, M., R. S. Hegde, and X. Jiang. 2004. The P domain of norovirus capsid protein forms dimer and binds to histo-blood group antigen receptors. J Virol 78:6233-6242.

249. Tan, M., and X. Jiang. 2011. Norovirus-host interaction: multi-selections by human histo-blood group antigens. Trends Microbiol 19:382-388.
250. Tan, M., and X. Jiang. 2005. Norovirus and its histo-blood group antigen receptors: an answer to a historical puzzle. Trends Microbiol 13:285-293.
251. Tan, M., and X. Jiang. 2005. The P domain of norovirus capsid protein forms a subviral particle that binds to histo-blood group antigen receptors. J Virol 79:14017-14030.
252. Tan, M., M. Xia, S. Cao, P. Huang, T. Farkas, J. Meller, R. S. Hegde, X. Li, Z. Rao, and X. Jiang. 2008. Elucidation of strain-specific interaction of a GII-4 norovirus with HBGA receptors by site-directed mutagenesis study. Virology 379:324-334.
253. Tate, J. E., A. H. Burton, C. Boschi-Pinto, A. D. Steele, J. Duque, and U. D. Parashar. 2012. 2008 estimate of worldwide rotavirus-associated mortality in children younger than 5 years before the introduction of universal rotavirus vaccination programmes: a systematic review and meta-analysis. Lancet Infect Dis 12:136-141.
254. Tate, J. E., J. D. Mutuc, C. A. Panozzo, D. C. Payne, M. M. Cortese, J. E. Cortes, C. Yen, D. H. Esposito, B. A. Lopman, M. M. Patel, and U. D. Parashar. 2011. Sustained decline in rotavirus detections in the United States following the introduction of rotavirus vaccine in 2006. Pediatr Infect Dis J 30:S30-34.
255. Tate, J. E., C. A. Panozzo, D. C. Payne, M. M. Patel, M. M. Cortese, A. L. Fowlkes, and U. D. Parashar. 2009. Decline and change in seasonality of US rotavirus activity after the introduction of rotavirus vaccine. Pediatrics 124:465-471.
256. Teunis, P. F., C. L. Moe, P. Liu, S. E. Miller, L. Lindesmith, R. S. Baric, J. Le Pendu, and R. L. Calderon. 2008. Norwalk virus: how infectious is it? J Med Virol 80:1468-1476.
257. Thompson, A. A., R. A. Albertini, and O. B. Peersen. 2007. Stabilization of poliovirus polymerase by NTP binding and fingers-thumb interactions. J Mol Biol 366:1459-1474.
258. Tu, E. T., R. A. Bull, G. E. Greening, J. Hewitt, M. J. Lyon, J. A. Marshall, C. J. McIver, W. D. Rawlinson, and P. A. White. 2008. Epidemics of gastroenteritis during 2006 were associated with the spread of norovirus GII.4 variants 2006a and 2006b. Clin Infect Dis 46:413-420.
259. Tu, E. T., R. A. Bull, M. J. Kim, C. J. McIver, L. Heron, W. D. Rawlinson, and P. A. White. 2008. Norovirus excretion in an aged-care setting. Journal of clinical microbiology 46:2119-2121.
260. Tucker, A. W., A. C. Haddix, J. S. Bresee, R. C. Holman, U. D. Parashar, and R. I. Glass. 1998. Cost-effectiveness analysis of a rotavirus immunization program for the United States. JAMA 279:1371-1376.

261. Unknown. 2008. Delayed onset and diminished magnitude of rotavirus activity--United States, November 2007-May 2008. MMWR Morb Mortal Wkly Rep 57:697-700.
262. Unknown. 1943. Is There an Epidemic Vomiting Disease of Winter? Am J Public Health Nations Health 33:412-413.
263. Unknown. 1965. Winter vomiting disease. Br Med J 2:953-954.
264. van der Donck, I., L. van Hoovels, K. de Leener, T. Goegebuer, L. Vanderwegen, J. Frans, M. Rahman, and M. van Ranst. 2003. [Severe diarrhea due to rotavirus infection in a Belgian hospital 1981-2002]. Acta Clin Belg 58:12-18.
265. Vega, E., L. Barclay, N. Gregoricus, K. Williams, D. Lee, and J. Vinje. 2011. Novel surveillance network for norovirus gastroenteritis outbreaks, United States. Emerging infectious diseases 17:1389-1395.
266. Vega, E., and J. Vinje. 2011. Novel GII.12 norovirus strain, United States, 2009-2010. Emerging infectious diseases 17:1516-1518.
267. Vignuzzi, M., J. K. Stone, J. J. Arnold, C. E. Cameron, and R. Andino. 2006. Quasispecies diversity determines pathogenesis through cooperative interactions in a viral population. Nature 439:344-348.
268. Vignuzzi, M., E. Wendt, and R. Andino. 2008. Engineering attenuated virus vaccines by controlling replication fidelity. Nature medicine 14:154-161.
269. Vinje, J., R. A. Hamidjaja, and M. D. Sobsey. 2004. Development and application of a capsid VP1 (region D) based reverse transcription PCR assay for genotyping of genogroup I and II noroviruses. Journal of virological methods 116:109-117.
270. Wang, Q. H., M. G. Han, S. Cheetham, M. Souza, J. A. , and L. J. Saif. 2005. Porcine noroviruses related to human noroviruses. Emerging infectious diseases 11:1874-1881.
271. Ward, C. D., M. A. Stokes, and J. B. Flanegan. 1988. Direct measurement of the poliovirus RNA polymerase error frequency in vitro. J Virol 62:558-562.
272. Ward, V. K., C. J. McCormick, I. N. Clarke, O. Salim, C. E. Wobus, L. B. Thackray, H. W. t. Virgin, and P. R. Lambden. 2007. Recovery of infectious murine norovirus using pol II-driven expression of full-length cDNA. Proceedings of the National Academy of Sciences of the United States of America 104:11050-11055.
273. Waters, A., S. Coughlan, and W. W. Hall. 2007. Characterisation of a novel recombination event in the norovirus polymerase gene. Virology 363:11-14.

274. Westhoff, T. H., M. Vergoulidou, C. Loddenkemper, S. Schwartz, J. Hofmann, T. Schneider, W. Zidek, and M. van der Giet. 2009. Chronic norovirus infection in renal transplant recipients. Nephrol Dial Transplant 24:1051-1053.
275. White, P. A., G. S. Hansman, A. Li, J. Dable, M. Isaacs, M. Ferson, C. J. McIver, and W. D. Rawlinson. 2002. Norwalk-like virus 95/96-US strain is a major cause of gastroenteritis outbreaks in Australia. J Med Virol 68:113-118.
276. WHO. 2008. The global burden of disease - 2004 update.
277. WHO. 2011. World health statistics.
278. Widdowson, M. A., E. H. Cramer, L. Hadley, J. S. Bresee, R. S. Beard, S. N. Bulens, M. Charles, W. Chege, E. Isakbaeva, J. G. Wright, E. Mintz, D. Forney, J. Massey, R. I. Glass, and S. S. Monroe. 2004. Outbreaks of acute gastroenteritis on cruise ships and on land: identification of a predominant circulating strain of norovirus--United States, 2002. J Infect Dis 190:27-36.
279. Widdowson, M. A., S. S. Monroe, and R. I. Glass. 2005. Are noroviruses emerging? Emerging infectious diseases 11:735-737.
280. Wingfield, T., C. I. Gallimore, J. Xerry, J. J. Gray, P. Klapper, M. Guiver, and T. J. Blanchard. 2010. Chronic norovirus infection in an HIV-positive patient with persistent diarrhoea: a novel cause. J Clin Virol 49:219-222.
281. Wobus, C. E., S. M. Karst, L. B. Thackray, K. O. Chang, S. V. Sosnovtsev, G. Belliot, A. Krug, J. M. Mackenzie, K. Y. Green, and H. W. Virgin. 2004. Replication of Norovirus in cell culture reveals a tropism for dendritic cells and macrophages. PLoS biology 2:e432.
282. Wolf, S., W. Williamson, J. Hewitt, S. Lin, M. Rivera-Aban, A. Ball, P. Scholes, M. Savill, and G. E. Greening. 2009. Molecular detection of norovirus in sheep and pigs in New Zealand farms. Vet Microbiol 133:184-189.
283. Wolf, Y. I., C. Viboud, E. C. Holmes, E. V. Koonin, and D. J. Lipman. 2006. Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus. Biology direct 1:34.
284. Worobey, M., and E. C. Holmes. 1999. Evolutionary aspects of recombination in RNA viruses. J Gen Virol 80(Pt 10):2535-2543.
285. Wright, C. F., M. J. Morelli, G. Thebaud, N. J. Knowles, P. Herzyk, D. J. Paton, D. T. Haydon, and D. P. King. 2011. Beyond the consensus: dissecting within-host viral population diversity of foot-and-mouth disease virus by using next-generation genome sequencing. J Virol 85:2266-2275.

286. Wu, F. T., T. Oka, K. Katayama, H. S. Wu, D. S. Donald Jiang, T. Miyamura, N. Takeda, and G. S. Hansman. 2006. Genetic diversity of noroviruses in Taiwan between November 2004 and March 2005. Arch Virol 151:1319-1327.
287. Xerry, J., C. I. Gallimore, M. Iturriza-Gomara, D. J. Allen, and J. J. Gray. 2008. Transmission events within outbreaks of gastroenteritis determined through analysis of nucleotide sequences of the P2 domain of genogroup II noroviruses. Journal of clinical microbiology 46:947-953.
288. Yen, C., M. E. Wikswo, B. A. Lopman, J. Vinje, U. D. Parashar, and A. J. Hall. 2011. Impact of an emergent norovirus variant in 2009 on norovirus outbreak activity in the United States. Clin Infect Dis 53:568-571.
289. Yunus, M. A., L. M. Chung, Y. Chaudhry, D. Bailey, and I. Goodfellow. 2010. Development of an optimized RNA-based murine norovirus reverse genetics system. Journal of virological methods 169:112-118.
290. Zagordi, O., A. Bhattacharya, N. Eriksson, and N. Beerenwinkel. 2011. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics 12:119.
291. Zamyatkin, D. F., F. Parra, A. Machin, P. Grochulski, and K. K. Ng. 2009. Binding of 2'-amino-2'-deoxycytidine-5'-triphosphate to norovirus polymerase induces rearrangement of the active site. J Mol Biol 390:10-16.
292. Zahorsky, J. 1929. Hyperemesis hiemis or the winter vomiting disease. Arch Pediatr 46:391-395.
293. Zdychova, J., and R. Komers. 2005. Emerging role of Akt kinase/protein kinase B signaling in pathophysiology of diabetes and its complications. Physiol Res 54:1-16.
294. Zheng, D. P., T. Ando, R. L. Fankhauser, R. S. Beard, R. I. Glass, and S. S. Monroe. 2006. Norovirus classification and proposed strain nomenclature. Virology 346:312-323.
