Elucidating the Role of gpW: an Essential Baseplate Protein in P2

by

Mostafa Fatehi Hassanabad

A thesis submitted in conformity with the requirements for the degree of Master’s of Science Department of Molecular Genetics University of Toronto

© Copyright by Mostafa Fatehi Hassanabad 2011

Elucidating the role of gpW: an Essential Baseplate Protein in Bacteriophage P2

Mostafa Fatehi Hassanabad

Master’s of Science

Department of Molecular Genetics University of Toronto

2011

Abstract

The long, contractile tails of myophages are the conduit for phage DNA transfer into the bacterial host cell. The most important part of the myophage tail is the baseplate, a complex structure distal to the phage head. To better understand the structure and function of myophage baseplates, gpW, a component of the phage P2 baseplate, was studied. This protein is widely conserved among myophages and is essential for the formation of infectious phage particles.

Bioinformatic work confirmed that gpW homologues are found in almost all myophages and in many prophages. Moreover, gpW was shown to be a structural component of the virion and, using electron microscopy, it was found to be at the top of the P2 baseplate. It was also found that some single-residue substitutions can completely disrupt gpW function. Finally, evidence is presented that at least eight different proteins may be required to form intermediate P2 baseplate structures, while other proteins may be necessary for the formation of stable baseplate complexes.


Acknowledgments

First and foremost, I’d like to thank my supervisor, Alan Davidson, for giving me the opportunity to pursue graduate studies in Molecular Genetics. His support and supervision throughout the past two years have helped make my introduction to the life sciences exciting and scientifically rewarding. I thank my supervisory committee, Dr. Barbara Funnell and Dr. John Parkinson, who helped guide my project and kept me on my toes at committee meetings. I am also thankful to Battista Calvieri and Steven Doyle for teaching me how to use the electron microscope.

I have certainly come a long way since I joined the lab in 2010 and for this progress I must thank everyone in the Edwards lab. Dr. Karen Maxwell deserves special thanks for her constant guidance, constructive comments and enthusiasm. I’d also like to thank Dr. Lisa Pell for helping me organize my original experiments and for her insightful suggestions. Kelly, Nichole and Diane are incredible teachers and they always supervised me the first time I tried an experiment. On the “rarest” occasion that an experiment didn’t work perfectly, they helped dissect what went wrong. Senjuti, Norrapat and Kris took turns in getting me to do things outside the lab; a restaurant, a game of bowling or trips to the gym. In addition to stimulating discussions about everything, from theatre to world politics and sports, my lab mates were always supportive and encouraging; they have all become very valuable friends. I also thank all of the members of the Davidson lab; specifically, Dave was a very helpful mentor and Tom was my phage P2 colleague who kindly lent me several of his cloning constructs.

I must also sincerely thank all of my other teachers. I am thankful for the support and invaluable friendship of my brothers. Most importantly, I am grateful to my first two teachers, my mother and father, who taught me the reason for any pursuit.


Table of Contents

Acknowledgments

List of Tables

List of Figures

List of Appendices

Chapter 1 Introduction

1 Introduction

1.1 Background

1.2 The phage tail

1.2.1 The myophage tail

1.2.2 The baseplate is the command center of infection

1.3 Phage P2 is a good model to study myophage baseplates

1.4 GpW is homologous to gp25 in phage T4

1.5 Research goals

Chapter 2 Materials and Methods

2 Materials and Methods

2.1 Media and Buffers

2.2 Bacterial Strains

2.3 PCR amplification

2.4 Gene cloning, plasmids and mutagenesis

2.4.1 Gene Cloning

2.4.2 Plasmids

2.4.3 Mutagenesis

2.5 Preparation and transformation of competent cells

2.6 Phage preparation and purification

2.7 The in vivo complementation assay

2.8 Electron Microscopy

2.8.1 Grid Preparation

2.8.2 Sample Staining

2.8.3 Image acquisition

2.9 Particle alignment and averaging

2.10 Protein expression and purification

2.10.1 Native protein purification

2.10.2 Determining protein concentration

2.11 SDS-PAGE and Western blotting

Chapter 3 Results

3 Results

3.1 GpW is a very widely conserved protein in Myophages

3.2 GpW is a structural component of the P2 baseplate

3.3 Different mutations in gene W affect gpW function

3.4 The formation of intermediate P2 baseplate structures requires at least eight proteins

Chapter 4 Discussion and Future Directions

4 Discussion and future directions

References

Appendices


List of Tables

Table 2.1 List of media and buffers used in this study

Table 2.2 Bacterial strains used in this study

Table 3.1 Classifying gpW homologues involved in T6SS


List of Figures

Figure 1.1 The Caudovirales.

Figure 1.2. The contractile tail sheath.

Figure 1.3 R-type Pyocin

Figure 1.4 The dynamic baseplate structure.

Figure 1.5 The myophage baseplate.

Figure 1.6 Sequence alignment of gpW and gp25

Figure 1.7 The putative location of gpW in the P2 baseplate.

Figure 2.1 The gpW_MBP and gpW_trc constructs.

Figure 2.2 The V-G, D and UD constructs.

Figure 2.3. The XUD construct.

Figure 3.1. Distribution of gpW homologues in prophages.

Figure 3.2 gp46 of phage Mu is homologous to gpW.

Figure 3.3. Electron microscope image of the Wam lysate.

Figure 3.4. GpW is a structural component of P2.

Figure 3.5. EM images were used to determine the position of gpW.

Figure 3.6 The X-ray crystallography structure of a gpW homologue in G. sulfurreducens.

Figure 3.7 Selecting residues for substitutions.

Figure 3.8. In vivo complementation assays with gpW expressed from the gpW_trc construct

Figure 3.9. Transmission Electron Microscope images of mutant phage lysates.


Figure 3.10. Varying levels of gpW expression.

Figure 3.11. In vivo complementation assays with gpW expressed from the gpW_ENDO construct

Figure 3.12 GpX is required for P2 tail formation.

Figure 3.13. EM images were used to determine the position of gpX.

Figure 3.14. Purifying intermediate baseplate complexes.

Figure 3.15. EM image of putative baseplate complexes.

Figure 4.1 Our model of the P2 baseplate.


List of Appendices

Appendix A1. The P2 baseplate proteins

Appendix A2. The P2 Genome

Appendix A3. List of Primers used in this study


Chapter 1 Introduction

1 Introduction

1.1 Background

Bacteriophages (phages) are a diverse group of viruses that infect eubacteria and archaea. Phages are ubiquitous in the biosphere; wherever there are bacteria or archaea, there are phages. They are especially abundant in the oceans, with approximately 10^10 particles per liter in surface waters (Bergh et al., 1989; Wommack and Colwell, 2000). Hence, phages are the most abundant biological entity on Earth, with an estimated global population on the order of 10^31 (Hendrix, 2003). In addition, most cultivable bacteria harbor at least one prophage, a phage genome that has been incorporated into the bacterial genetic material (Casjens, 2003; Ackermann, 2007).

Because of their prevalence and abundance, phage play a significant role in regulating microbial populations and in the cycling of nutrients within the environment (Calendar, 2006). Moreover, phage contribute to bacterial diversity and evolution in several ways. Phage mediate genetic exchange among bacterial strains, and even species, thereby increasing diversity at the genetic level (Calendar, 2006). Also, by killing dominant bacterial strains or species, phage allow the survival of strains or species that would otherwise have been out-competed (Thingstad, 2000). Concurrently, the parasitic nature of phages drives antagonistic coevolution between them and their bacterial hosts.

To date, more than 5500 different phages have been examined under the electron microscope (Ackermann, 2007). A great majority of these, 96%, belong to the order Caudovirales (tailed phages) while the rest are more unusual phages such as the polyhedral, pleomorphic and filamentous phages. The Caudovirales are an extremely diverse order with tail lengths ranging from 10 to 800 nm and genome sizes ranging from 17 to 500 kb. However, they all have double-stranded DNA (dsDNA), icosahedral capsids (85% have isometric capsids) and tails that consist of stacked disks or helical strands (Calendar, 2006).

As shown in Figure 1.1, tailed bacteriophages can be classified into three families based upon their tail structure. The myophages have a long tail surrounded by a contractile sheath, the siphophages have a long, non-contractile tail and the podophages have short tails (Ackermann, 2007). In general, myophages have larger genome sizes than the other two families and the myophage tail is usually more complex, consisting of more proteins than siphophage or podophage tails.

[Figure 1.1 schematic labels: capsid, connector complex, tail, tail fibers, baseplate]

Figure 1.1 The Caudovirales. From left to right: a schematic of a long-tailed phage, followed by electron microscope (EM) images of a myophage (P2), a siphophage (λ) and a podophage (T7). The image of phage λ is adapted from www.biochem.wisc.edu/faculty/inman/empics/virus.htm.

1.2 The phage tail

The bacterial cell wall is a multilayered structure that provides structural integrity and protects the bacteria against osmotic pressure. This structure is also a barrier across which phage have to translocate their genetic material. Moreover, prior to the injection of genetic material, phage have to recognize and bind to their host cells. In the Caudovirales, both of these challenges are overcome by the action of phage tail proteins. The tail spike protein and the tail fibers bind to molecules on the surface of the bacterial cell wall during phage adsorption. Subsequently, tail proteins form a channel in the bacterial cell wall for DNA entry (Fuecht et al., 1990; Calendar, 2006). For example, the tail spike protein of phage P2 (a myophage which is the focus of our study) most likely binds to lipopolysaccharide (LPS), a molecule on the outer membrane of E. coli and many other Gram-negative bacteria (Kagayama et al., 2009). Next, the P2 tail sheath and tail tube proteins must act in concert to create a channel that traverses the outer membrane, the periplasm and the cytoplasmic membrane of E. coli.


Due to its role in host recognition and DNA injection, the tail may be considered as the phage infection machinery. As mentioned above, the myophage tail is structurally more complex than other phage tails and understanding the function of a myophage tail (phage P2) is the ultimate aim of this study.

1.2.1 The myophage tail

The myophage tail is composed of the tail tube and a surrounding contractile sheath. Hundreds of copies of the Tail Tube Protein (TTP) are stacked as hexameric rings to form the tail tube. The sheath consists of hundreds of copies of the Tail Sheath Protein (TSP), stacked as hexameric rings or as helical strands: 31-33 hexameric rings in phage P2 (Lengyel et al., 1974) and 6 strands (each with 23 copies of the TSP) in myophage T4 (Aksyuk et al., 2009). The tail is connected to the capsid through a connector complex and, distal to the capsid, there is a complex of proteins referred to as the baseplate. As shown in Figure 1.2, the sheath almost always contracts toward the capsid, thus exposing the lower end of the inner tail tube (Calendar, 2006).

Figure 1.2. The contractile tail sheath. EM image of P2 phages, one with a contracted tail sheath and the other with a relaxed tail sheath (the bar represents 20 nm).

Gene clusters that encode myophage tail-like structures have been incorporated by bacteria. These structures may be used directly as bacteria-killing agents; for example, the R-type Pyocins (Figure 1.3) secreted by many Pseudomonas species selectively kill other bacterial strains. The genes encoding Pyocins are related to phage genomes because significant sequence similarity exists between phage tail proteins and Pyocin proteins (Nakayama et al., 2000). Another myophage tail-like structure employed by bacteria is the type VI secretion system, T6SS, which is used to secrete proteins. T6SS and phage tails are clearly connected evolutionarily; several studies have found sequence and structural similarities between T6SS proteins and proteins from siphophage (Pell et al., 2009) and myophage (Leiman et al., 2009) tails. Many Gram-negative bacteria use T6SS to secrete effector proteins and toxins into their environment, into surrounding bacteria or even into eukaryotic cells. Hence, the T6SS is a major virulence determinant in many pathogenic strains of bacteria (Filloux et al., 2008; Burtnik et al., 2011).

Figure 1.3 R-type Pyocin. This myophage tail-like bacteria-killing agent is secreted by P. aeruginosa PAO1 (EM image from Senjuti Saha).

1.2.2 The baseplate is the command center of infection

During myophage assembly, tail polymerization starts at a complex structure referred to as the baseplate; in the fully formed phage, the baseplate is located at the end of the tail (Calendar, 2006; Ackermann, 1999). The baseplate, which includes a protruding spike protein, plays an important role in myophage infection. In a two-step adsorption process, tail fiber proteins bind reversibly to cell-surface receptors and the protruding spike then binds irreversibly to cell-surface receptors. Upon adsorption to the host cell, conformational changes in the baseplate trigger contraction of the tail sheath and ultimately lead to the ejection of the phage DNA through the tail tube and into the bacterial cytoplasm (Figure 1.4). Hence, the baseplate may be considered the command center of myophage infection (Kostyuchenko et al., 2003). To fully understand myophage infection, one needs to elucidate the structure and function of baseplates.


Figure 1.4. The dynamic baseplate structure. The conformation of the T4 baseplate changes upon interaction with the host cell, leading to the contraction of the tail sheath and the ejection of phage DNA. The cryoEM reconstructions of the baseplate structure prior to (A) and after (B) sheath contraction are shown (modified from Leiman et al., 2010).

1.3 Phage P2 is a good model to study myophage baseplates

Most studies of the myophage baseplate have focused on the uncharacteristically complex phage T4 baseplate. This baseplate is constructed in a stepwise fashion in which six wedge-like structures surround a central hub complex to form an intermediate structure. T4 baseplate wedges and the central hub are created separately and their assembly requires 7 and 5 proteins, respectively. The addition of tail fibers and two other proteins (gp48 and gp54, which are at the very top of the baseplate) completes the formation of the T4 baseplate (Yap et al., 2010). The baseplate wedges and central hub structures exist in other myophage baseplates but their formation usually requires fewer proteins.

While much of our current knowledge of myophage baseplate structure and function stems from studies of the T4 baseplate, some interactions between T4 baseplate proteins remain uncharacterized. Studying their homologues in a less complex system may therefore help in understanding these interactions. An ideal candidate for such research is the baseplate of phage P2, which is less complex than the T4 baseplate and more representative of myophage baseplate complexity.


P2 is a temperate myophage with a 33.6 kb genome packaged in a 60 nm icosahedral head, which is attached to a 135 nm long tail (Calendar, 2006). Bacteriophages and prophages that are similar to P2 with respect to genome organization and nucleotide sequence are called P2-like. P2-like prophages are commonly found in E. coli. In fact, at least 26% of the strains in the E. coli reference collection (a collection of 72 E. coli strains chosen from 2600 isolates from around the world; Ochman et al., 1984) contain a P2-like prophage (Calendar, 2006). In addition, P2-like phages and prophages are distributed among other proteobacteria of the gamma subgroup such as phages HP1 and HP2 of Haemophilus influenzae (Esposito et al., 1996; Williams et al., 2002), ΦCTX of Pseudomonas aeruginosa (Nakayama, 1999), K139 of Vibrio cholerae (Nesper et al., 1999), PSP3 of Salmonella potsdam (Bullas et al., 1991) and SopEΦ of Salmonella typhimurium (Mirold et al., 2001).

Of the 44 P2 genes (Appendix A2), 15 are involved in tail formation and of these, 9 may be required for baseplate assembly. By comparison, the formation of the T4 baseplate alone requires 17 proteins (Yap et al., 2010). The functions of four of the P2 baseplate proteins (gpD: a hub protein; gpV: tail spike; gpH: tail fiber; gpG: chaperone required for fiber formation) are known (Haggard-Ljungquist et al., 1995; Temple et al., 1991) and, due to its relative simplicity, the P2 baseplate should be an easier system to fully understand. Illustrated below are cartoon diagrams of our current (incomplete) model of the P2 baseplate and of the better studied T4 baseplate, alongside the cryo-EM reconstruction of the dome-shaped T4 baseplate.

[Figure 1.5 labels: panels A and B; P2 and T4; gp6 and gp25 (T4); gpW and gpJ (P2); legend: baseplate proteins, other tail proteins]

Figure 1.5 The myophage baseplate. A. A cartoon representation of our model of the P2 baseplate region alongside the more complex T4 baseplate region. Also included is the cryoEM reconstruction of the T4 baseplate (Kostyuchenko et al., 2003). B. Stretches of the T4 and P2 genomes with baseplate proteins coloured in red. The lines connect gpJ and gpW of phage P2 with their homologues in phage T4, gp6 and gp25, respectively.

1.4 GpW is homologous to gp25 in phage T4

Gp25 (132 a.a., 15.1 kDa) of bacteriophage T4 is an essential protein of unknown function and is one of the most widely conserved proteins in myophages. It is the last of seven proteins to be incorporated into baseplate wedges and is required for the formation of stable wedges (Yap et al., 2010; Leiman et al., 2010). Gp25 specifically interacts with gp6 and gp53 to form a structure at the top of the dome-shaped T4 baseplate (Fig. 1.7); however, the precise position of gp25 in the cryoEM reconstruction of the T4 baseplate is not fully determined. Moreover, new evidence suggests that gp25 may interact with other tail proteins, such as the TSP.


The gp25 homologue in phage P2 is gpW (pairwise sequence identity of 19.33%, similarity of 33%). A sequence alignment of these two proteins created using the MAFFT algorithm (Katoh et al., 2002) in Jalview (Waterhouse et al., 2009) is presented in Figure 1.6. Like gp25, gpW is a small (115 a.a., 12.6 kDa) protein that is required for the formation of infectious particles.
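To make the notion of pairwise sequence identity concrete, the following Python sketch computes percent identity from two aligned sequences. It is an illustration only: the toy sequences are hypothetical (they are not gpW or gp25), and the choice to count only columns where neither sequence has a gap is an assumption that may differ from the definition used by Jalview.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other tools
    may normalise differently (for example by the shorter sequence length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment, not taken from Figure 1.6.
toy_identity = percent_identity("MKT-AL", "MRTQAL")
print(f"{toy_identity:.1f}% identity")
```

Because the result depends on how gaps are treated, a value computed this way should only be compared with values obtained under the same convention.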

Figure 1.6. Sequence alignment of gpW and gp25. This alignment was created using the MAFFT algorithm (Katoh et al., 2002) in the Jalview (Waterhouse et al., 2009) program.

As mentioned above, gp25 is at the top of the T4 baseplate and it interacts with gp6 (Yap et al., 2010), which is homologous to gpJ from P2. In an earlier study, Haggard-Ljungquist et al. (1995) found that gpJ is a peripheral protein in the P2 baseplate. Based upon this information, gpW is thought to be a protein at the top of the P2 baseplate (Fig. 1.7).


Figure 1.7 The putative location of gpW in the P2 baseplate. GpW is thought to be at the top of the baseplate and may interact with other P2 tail proteins such as gpJ.


1.5 Research goals

The overall objective of this study was to gain insight into the mechanism by which the baseplate of bacteriophage P2 carries out its crucial role in the initial stages of the phage infection cycle. More specifically, the aim was to better understand the role of gpW, a baseplate protein of unknown function which is required for P2 infectivity. To accomplish this goal, the following questions were addressed in this study: How widespread are gpW homologues? Is gpW a structural component of the virion and, if so, where is it located? How do mutations in gene W affect gpW function and phage infectivity? And finally, which P2 proteins are required for P2 baseplate assembly? As gpW is one of the most widely conserved proteins in myophages, information gathered about its interactions within the baseplate will further our understanding of the myophage baseplate.


Chapter 2 Materials and Methods

2 Materials and Methods

2.1 Media and Buffers

Table 2.1. List of media and buffers used in this study

LB-Lennox: 5 g yeast extract, 10 g tryptone, 5 g NaCl; H2O added for a final volume of 1 L

Super LB: 1 g glucose, 0.190 g MgCl2, 0.110 g CaCl2; LB-Lennox added for a final volume of 1 L

Top agar: 7 g agar; LB-Lennox added for a final volume of 1 L

SOC media: 5 g yeast extract, 10 g tryptone, 0.5 g NaCl, 0.186 g KCl, 0.952 g MgCl2, 1.20 g MgSO4, 3.603 g glucose; H2O added for a final volume of 1 L

Storage Media (SM): 0.05 M Tris, 0.1 M NaCl, 8 mM MgSO4·7H2O; adjusted to pH 7.5 with Tris HCl

Binding buffer: 0.05 M Tris, 0.5 M NaCl, 5 mM imidazole; adjusted to pH 7.5 with Tris HCl

Wash buffer: 0.05 M Tris, 0.5 M NaCl, 0.03 M imidazole; adjusted to pH 7.5 with Tris HCl

Elution buffer: 0.05 M Tris, 0.5 M NaCl, 0.25 M imidazole; adjusted to pH 7.5 with Tris HCl

Dialysis buffer: 0.05 M Tris, 0.4 M NaCl, 2 mM dithiothreitol; adjusted to pH 8.0 with Tris HCl

Transfer buffer: 0.05 M Tris, 0.04 M glycine, 1.4 mM SDS, 20% v/v methanol; adjusted to pH 7.5 with Tris HCl

Tris buffered saline tween (TBST): 0.015 M Tris, 0.15 M NaCl, 0.1% v/v tween-20; adjusted to pH 7.5 with Tris HCl


2.2 Bacterial Strains

The K-12 strain of E. coli has a small remnant of a P2-like prophage in the preferred P2 integration site (Nilsson et al., 2004). Moreover, P2 grows better in E. coli C than in the K-12 strain (Lindahl et al., 1971), which allows higher titres of phage to be propagated in E. coli C. As such, P2 was always grown in E. coli strain C in this study. The table below includes information on the various strains used in this study.

Table 2.2. List of E. coli strains used in this study. Relevant characteristics are shown along with the original reference for each strain

Designation | Relevant Characteristic | Original Reference

C1a | Prototrophic, non-suppressor | Sasaki and Bertani (1965)

C1792 | SuIII+ (Tyr inserted) amber suppressor | Sunshine et al. (1971)

BL21 (DE3)T1R | E. coli B strain with DE3, a λ prophage carrying the T7 RNA polymerase gene and lacIq. In addition, the FhuA (tonA) genotype confers resistance to the lytic bacteriophages T1 and T5. | Studier and Moffatt (1986)

DH5α | Slow growing, transforms with high efficiency | Hanahan (1985)

2.3 PCR amplification

All primers (Appendix A3) used in this study were purchased from Eurofins MWG and were either salt-free or HPLC purified (primers used for sequencing). When amplifying genes W and X, a purified sample of phage P2 was used as the source of the template DNA and the Pfu DNA polymerase (Fermentas) was used. Colony screens were usually carried out with one gene-specific and one plasmid-specific primer (e.g., T7 promoter). To amplify larger pieces of DNA, the Phusion High Fidelity DNA polymerase (New England BioLabs) was used. The amplified DNA fragments were subjected to 1% agarose gel electrophoresis and visualized using UV light.

2.4 Gene cloning, plasmids and mutagenesis

2.4.1 Gene Cloning

I used the Clontech In-Fusion® PCR Cloning System to fuse the ends of the PCR amplified fragment to the homologous ends of a linearized vector. The 3' and 5' regions of homology were generated by adding extensions (at least 15 bp) to both PCR primers (forward and reverse) that precisely match the ends of the linearized vector. To improve the efficiency of this cloning reaction, the insert DNA fragments were PCR purified (Qiagen Purification Kit) and the linearized plasmids were gel purified (Qiagen Gel Extraction Kit). Linearized vector and insert DNA were mixed at a molar ratio of at least 1:2 and incubated with the In-Fusion® enzyme, which promotes single-strand annealing reactions, for 15 minutes at 50°C and then for another 15 minutes at 37°C. The mixture was then used to transform competent cells (see Sec. 2.5).
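As a rough illustration of the 1:2 vector-to-insert molar ratio described above, the following Python sketch converts a mass of linearized vector into the mass of insert needed. The fragment lengths and the 100 ng vector amount are hypothetical values chosen for illustration and are not taken from this study; the calculation assumes that, for double-stranded DNA, mass is proportional to length.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_per_vector=2.0):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length,
    so the average mass per base pair cancels out of the ratio.
    """
    if min(vector_ng, vector_bp, insert_bp, insert_per_vector) <= 0:
        raise ValueError("all arguments must be positive")
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


# Hypothetical example: 100 ng of a 5000 bp linearized vector, 350 bp insert.
example_ng = insert_mass_ng(vector_ng=100.0, vector_bp=5000, insert_bp=350)
print(f"{example_ng:.1f} ng of insert for a 1:2 vector:insert molar ratio")
```

Since the text specifies a ratio of at least 1:2, a larger value of insert_per_vector could equally be used.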

2.4.2 Plasmids

I used several plasmids extensively in this study for cloning gene W and expressing tagged gpW. As shown in Figure 2.1, gene W was cloned downstream of the malE sequence in the pMal.c4x (NEB) plasmid using the EcoRI and XbaI restriction sites. E. coli transformed with this construct expresses Maltose Binding Protein (MBP) fused to the N-terminus of gpW (gpW_MBP). Gene W was also cloned into pAD100 (Davidson and Sauer, 1994) using the NcoI and XbaI sites. This construct, which uses a trc hybrid promoter (-35 region of the trp and -10 region of the lac promoters) and is under the regulation of lacI and lacIq, produces gpW with a C-terminal FLAG epitope and 6xHis tag (gpW_trc; Fig. 2.1).

For the co-expression of various P2 tail proteins, the PCDF (Novagen; Streptomycin resistance conferred) and pET-21d (Novagen; Ampicillin resistance conferred) plasmids were used. As shown in Figure 2.2, genes V, W, J, I, H and G were cloned into PCDF and this construct (V-G) produces an N-terminally 6xHis-tagged gpV (made by Tom Chang). Genes U and D were cloned (UD; Fig. 2.2) into pET-21d such that a C-terminally 6xHis-tagged gpD is expressed (made by Tom Chang). I cloned gene X into the UD construct immediately upstream of gene U. To accomplish this, linear vector was generated by PCR amplifying the entire UD construct (pET-21d plasmid containing genes U and D); gene X was amplified with flanking regions homologous to the ends of the linear plasmid (Fig. 2.3).

Figure 2.1 The gpW_MBP and gpW_trc constructs. Gene W was cloned into the (A) pAD100 and (B) pMal.c4x vectors to allow the expression of FLAG-tagged and MBP-tagged gpW, respectively.

Figure 2.2 The V-G, D and UD constructs. Genes V, W, J, I, H and G were cloned into the PCDF (Novagen) plasmid such that gpV was expressed fused to an N-terminal hexahistidine (6xHis) tag. The red stars in the figure represent the 6xHis tag. The D and UD constructs were made by cloning gene D, or genes U and D, into the pET-21d expression vector (C-terminally tagged gpD) (constructs made by Tom Chang).

Figure 2.3. The XUD construct. Linearized UD was produced by PCR amplifying the entire plasmid which contained genes U and D (UD). Gene X was amplified with flanking regions that were identical to plasmid ends and cloning was completed using the In-Fusion® PCR Cloning System.

2.4.3 Mutagenesis

Gene W mutations were made in the gpW_trc construct by site-directed mutagenesis. In this technique, I used a pair of complementary mutagenic primers to amplify the complete plasmid and generate a nicked, circular DNA which contained the mutated gene W. As the product DNA was the same size as the plasmid, the template DNA had to be eliminated by digestion with the DpnI restriction enzyme (1 hr at 37°C). This enzyme digests methylated DNA; thus, only the biosynthesized (methylated) template DNA was cut, while the unmethylated in vitro product was left intact. Prior to transformation into competent cells, the in vitro generated (mutated) plasmid was gel purified (Qiagen Gel Extraction Kit).
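The idea of a complementary mutagenic primer pair can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the template sequence, the codon position and the 15-nucleotide flank length are all hypothetical, and practical primer design would also weigh melting temperature and GC content, which this sketch ignores. It is not the primer design procedure used in this study (the actual primers are listed in Appendix A3).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G and T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mutagenic_primer_pair(template, codon_start, new_codon, flank=15):
    """Return (forward, reverse) complementary primers carrying new_codon.

    codon_start is the 0-based index of the first base of the codon to
    replace. flank is a hypothetical number of unchanged bases kept on
    each side of the mutated codon.
    """
    template = template.upper()
    new_codon = new_codon.upper()
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three bases long")
    if codon_start < flank or codon_start + 3 + flank > len(template):
        raise ValueError("not enough flanking sequence around the codon")
    forward = (template[codon_start - flank:codon_start]
               + new_codon
               + template[codon_start + 3:codon_start + 3 + flank])
    return forward, reverse_complement(forward)


# Hypothetical template: replace the GGG codon at index 15 with GCT.
fwd, rev = mutagenic_primer_pair("A" * 15 + "GGG" + "C" * 15, 15, "GCT")
print(fwd)
print(rev)
```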

2.5 Preparation and transformation of competent cells

In this study, chemically competent E. coli (cells that can take up extracellular DNA) were prepared by treating the bacteria with calcium chloride. In this method, I used an overnight culture to inoculate LB-Lennox at a 1:100 dilution, which was subsequently incubated at 37°C until it reached an OD600 of 0.4-0.6 (optical density of culture measured at a wavelength of 600 nm; also known as A600 or absorbance at 600 nm). The cells were transferred to a conical tube and chilled on ice for half an hour before being pelleted at 2560 g for 10 minutes at 4°C. The pellet was then re-suspended in 0.1 M calcium chloride (1/25th the original volume) and placed on ice for 10 minutes. The cells were again pelleted as above and the pellet re-suspended in 0.1 M calcium chloride (1/125th the original volume of cell culture). Glycerol was added to the cells for a final concentration of 40% v/v (glycerol/total volume). The cells were aliquoted and flash frozen in an ethanol-dry ice bath before being stored at -80°C.
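The volumes implied by this procedure can be worked out with a short Python sketch. The 250 mL culture volume is a hypothetical example, and the glycerol calculation assumes that undiluted (100%) glycerol is added to the second resuspension, which is not stated in the text.

```python
def competent_cell_volumes(culture_ml, glycerol_fraction=0.40):
    """Resuspension and glycerol volumes (mL) for a given culture volume.

    Assumption: undiluted glycerol is added, so the glycerol volume g
    satisfies g / (second_resuspension + g) = glycerol_fraction.
    """
    if culture_ml <= 0 or not 0 <= glycerol_fraction < 1:
        raise ValueError("invalid culture volume or glycerol fraction")
    first = culture_ml / 25.0    # first 0.1 M CaCl2 resuspension
    second = culture_ml / 125.0  # second 0.1 M CaCl2 resuspension
    glycerol = second * glycerol_fraction / (1.0 - glycerol_fraction)
    return {
        "first_resuspension_ml": first,
        "second_resuspension_ml": second,
        "glycerol_ml": glycerol,
        "final_volume_ml": second + glycerol,
    }


# Hypothetical example: a 250 mL culture.
for name, value in competent_cell_volumes(250.0).items():
    print(f"{name}: {value:.2f}")
```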

Competent cells were thawed on ice (5-10 minutes) and 1.5 μL of plasmid DNA was added to 50 μL of cells, which were then incubated on ice (15-30 minutes). Next, cells were heat shocked at 42°C for 45 seconds and incubated on ice for 5 minutes. Prior to plating the transformed cells, I added 450 μL of LB (or SOC) and allowed the cells to recover at 37°C for 1 hr (the recovery step was not performed when the plasmid conferred ampicillin resistance to the transformed cells).

2.6 Phage preparation and purification

To prepare phage P2, I grew the C1a strain (37°C, 200 rpm shaking incubator) in super LB to an OD600 of 0.1 (corresponding to ~2x10^8 cfu/ml) before challenging it with P2 at a multiplicity of infection (MOI) of 1/5000 (phage/cells). The low MOI allowed at least two rounds of infection and lysis and yielded very high titres (>10^12 pfu/ml). After the addition of phage, the culture continued to grow (37°C, 100 rpm) and OD measurements were made every 20 minutes. When the OD started to decrease (usually 90-120 min after phage addition), I added EGTA (16 ml of filter-sterilized 8% w/v solution per liter of culture). EGTA is a Ca2+ chelator which disrupts the binding of newly formed phage to cellular debris.
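The arithmetic behind the infection step can be sketched in Python. The cell density, MOI and EGTA dose are taken from the text above; the 500 ml culture volume and the 10^10 pfu/ml stock titre are hypothetical values used only to make the example concrete.

```python
def phage_volume_ul(culture_ml, cells_per_ml, moi, stock_pfu_per_ml):
    """Volume of phage stock (microliters) needed to reach the given MOI."""
    if min(culture_ml, cells_per_ml, moi, stock_pfu_per_ml) <= 0:
        raise ValueError("all arguments must be positive")
    pfu_needed = culture_ml * cells_per_ml * moi
    return pfu_needed / stock_pfu_per_ml * 1000.0


def egta_volume_ml(culture_ml, ml_per_liter=16.0):
    """Volume of 8% w/v EGTA solution (ml) at 16 ml per liter of culture."""
    return culture_ml / 1000.0 * ml_per_liter


# Hypothetical example: 500 ml culture and a 1e10 pfu/ml phage stock.
stock_ul = phage_volume_ul(culture_ml=500.0, cells_per_ml=2e8,
                           moi=1 / 5000, stock_pfu_per_ml=1e10)
print(f"phage stock: {stock_ul:.2f} microliters")
print(f"EGTA solution: {egta_volume_ml(500.0):.1f} ml")
```

With a high-titre stock the required volume becomes very small, in which case the stock would normally be diluted first so that a pipettable volume is added.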

The process for preparing Wam and Xam phage mutants was similar to the one described above. To determine the effects of the amber mutations on phage formation, I grew a culture of C1a in super LB to an OD600 of 0.5 (~10^9 cfu/ml) before adding mutant phage at an MOI of 1/100 (phage/cells). Finally, to produce phage that had incorporated MBP or FLAG tagged gpW or gpX, cells expressing the tagged protein were challenged with Wam or Xam mutant phage, respectively.


Subsequent to phage production, I concentrated and purified the particles. A simple procedure that served to both purify and concentrate the phage particles was PEG (polyethylene glycol) precipitation. In this procedure, the phage lysate was added to a mixture of PEG 8000 and NaCl (80 g/L and 25 g/L, respectively) and kept at 4°C overnight. The mechanism through which PEG induces phage precipitation is not fully known, but precipitation is partially due to the fact that PEG molecules attract water molecules away from the phage. This, in turn, decreases the solvent volume available to particles (Vajda, 1978; Atha et al., 1981) and promotes precipitation. The solution was centrifuged (4500 g, 30 min at 4°C) and the resultant pellet was resuspended in SM. Chloroform was added and, upon centrifugation (4500 g, 30 min at 4°C), phage were separated from PEG and resuspended in SM.

To further purify phage, I used equilibrium centrifugation (CsCl purification). In this technique, the phage lysate was added to a concentrated solution of CsCl (final concentration: 0.64 g/ml) and the mixture was subjected to extended centrifugation (Beckman 75 Ti rotor, 50000 rpm, 223320 g, 24 hr). During the centrifugation, cesium ions are influenced by both the centrifugal force and the tendency to diffuse; eventually, a stable gradient of cesium ions is formed in the tube. Particles with equivalent densities form narrow bands which can then be extracted by puncturing the tube (Karp, 2009).
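The relationship between rotor speed and relative centrifugal force (RCF) quoted above can be checked with the standard conversion RCF = 1.118 x 10^-5 x r x rpm^2, where r is the rotor radius in centimeters. In the Python sketch below, the radius is back-calculated from the two values stated in the text rather than taken from the rotor manual, so it should be read as an inferred value.

```python
RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) at a given radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_from_rcf(rcf, rpm):
    """Rotor radius (cm) implied by a stated RCF and speed."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Radius implied by the stated 50000 rpm and 223320 g (inferred, not measured).
implied_radius_cm = radius_from_rcf(223320, 50000)
print(f"implied radius: {implied_radius_cm:.2f} cm")
print(f"RCF at that radius: {rcf_from_rpm(50000, implied_radius_cm):.0f} g")
```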

Cesium chloride interferes with phage binding and must be removed from the phage after CsCl purification. To this end, the collected phage bands were always dialyzed (using 6-8 kDa dialysis membrane) against SM. Phage preparations kept in SM at 4°C were found to have very stable titres over long periods (less than ten-fold drop in titre over one year).

2.7 The in vivo complementation assay

To assess the function of various modified versions (amino acid substitutions) of gpW and of tagged gpW, I performed the in vivo complementation assay. This assay determined the ability of gpW and modified versions of gpW to complement the activity of Wam phage mutants. Complementation can be assessed by adding Wam phage to liquid cultures or by spotting serial dilutions of Wam phage onto a lawn of gpW-expressing E. coli. The lawns were made by adding 150 μL of E. coli cells carrying gpW plasmids to 3 mL of top agar and pouring the mixture onto agar plates. Plates were then incubated overnight, usually at 37°C.

2.8 Electron Microscopy

2.8.1 Grid Preparation

I used high electrical current to vaporize carbon which then deposited as a very thin layer (6-20 nm) on a smooth surface of mica. After a week, the coated mica was submerged in water; thus the carbon film was separated from the mica and allowed to cover copper electron microscopy grids (Aurion, Electron Microscopy Sciences). The carbon-coated grids were then dried in a 60°C oven for 2 hours, after which they could be stored for use for several weeks. Prior to loading the sample, grids were glow discharged in a reduced atmosphere of air to increase the affinity of the carbon coat for charged particles (Aebi et al., 1987).

2.8.2 Sample Staining

After a grid was glow discharged, I placed 5 μL of sample on the carbon-coated side of the grid for 2 minutes and then dried it off. The grid was then washed three times by being submerged in water drops. Samples were stained by submerging the entire grid in a 15 μL drop of 2% (w/v) uranyl acetate for several seconds. Excess stain was blotted off with filter paper and the grid was air-dried for 2 minutes.

2.8.3 Image acquisition

Except for particle averaging (see below), I acquired all EM images at the Microscopy Image Laboratories of the Faculty of Medicine, University of Toronto, using an H-7000 Hitachi Transmission Electron Microscope (TEM). To obtain the best images, aperture alignment of this machine was performed under the supervision of the facility technicians. All images were acquired using a 100 kV electron beam and recorded with a 6.0 megapixel CCD camera.

2.9 Particle alignment and averaging

The alignment and averaging processes increase the contrast between the phage tail (which ought to have invariant features in all images) and its surroundings (which vary in each image) (Cardarelli, 2010). To obtain the best results, phage samples used for particle averaging were CsCl purified and then dialyzed against SM to remove the salt. Furthermore, image acquisition was performed by Nawaz Pirani at 50000x magnification using a Tecnai F20 electron microscope operating at 200 kV (FEI Company). Images were recorded on photographic films and subsequently digitized by scanning the micrographs.

The program SPIDER (Frank & Radermacher, 1996) was used to align the particles. The particles were translationally and rotationally aligned using an iterative approach whereby the average image of each iteration was used as the reference for the next round of particle alignments (the initial reference was the average of unaligned particle images).
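The iterative, reference-based scheme described above can be illustrated with a deliberately simplified Python sketch. It is not SPIDER and does not reproduce its procedures: it aligns one-dimensional toy signals by cyclic translation only (no rotation), scoring each candidate shift by its correlation with the current reference. All names and the toy data are hypothetical.

```python
def shift(signal, k):
    """Cyclically shift a 1-D signal to the right by k positions."""
    n = len(signal)
    return [signal[(i - k) % n] for i in range(n)]


def average(signals):
    """Position-wise mean of equally long signals."""
    n = len(signals)
    return [sum(column) / n for column in zip(*signals)]


def best_shift(signal, reference):
    """Shift that maximizes the correlation between signal and reference."""
    def score(k):
        return sum(a * b for a, b in zip(shift(signal, k), reference))
    return max(range(len(signal)), key=score)


def iterative_align(particles, rounds=3):
    """Align particles to a reference that is re-averaged after every round.

    The initial reference is the average of the unaligned particles.
    """
    reference = average(particles)
    aligned = list(particles)
    for _ in range(rounds):
        aligned = [shift(p, best_shift(p, reference)) for p in particles]
        reference = average(aligned)
    return reference, aligned


# Toy data: three copies of one feature at different positions.
template = [0, 0, 0, 5, 9, 5, 0, 0]
particles = [shift(template, 0), shift(template, 1), shift(template, -1)]
final_reference, aligned = iterative_align(particles)
print(final_reference)
```

In this noise-free toy case the averaged reference sharpens from a blurred peak into the original feature once the particles are brought into register, which is the effect the averaging procedure relies on.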

2.10 Protein expression and purification

I expressed wild-type and modified versions (with amino acid substitutions) of gpW using E. coli C1a cells transformed by the gpW_trc construct. The V-G, D, UD and XUD plasmids contained the corresponding baseplate genes under the control of the T7 promoter. These plasmids were transformed into BL21 (DE3) cells, which express T7 RNA polymerase.

To assess the expression of proteins, I grew a 5 mL bacterial culture until the OD600 reached 0.6-0.8, after which protein expression was induced by the addition of isopropyl-beta-D-thiogalactopyranoside (IPTG) to a final concentration of 0.190 g/L (0.8 mM). IPTG induction was followed by incubation at 37°C for 3 hours. Next, cells were pelleted (15000 g for 2 minutes) and the pellet was resuspended in SDS loading dye. This suspension was boiled at 100°C for 5 minutes before being subjected to SDS-PAGE (see Sec. 2.11).
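The equivalence between the two IPTG concentrations quoted above (0.190 g/L and 0.8 mM) can be verified with a short Python sketch. The molar mass of IPTG (238.30 g/mol) is supplied here as a known constant and is not taken from the text; the function name is hypothetical.

```python
IPTG_MOLAR_MASS = 238.30  # g/mol


def millimolar_to_g_per_l(millimolar, molar_mass):
    """Convert a concentration in mM to g/L for a compound of given molar mass."""
    return millimolar / 1000.0 * molar_mass


print(round(millimolar_to_g_per_l(0.8, IPTG_MOLAR_MASS), 3))
```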

For protein purification, I inoculated 1 L of LB-Lennox with 10 ml of overnight culture and incubated it at 37°C until the culture reached an OD600 of 0.8, at which point 0.8 mM IPTG was added. IPTG induction was followed by incubating the culture at 18°C overnight. Cells were harvested by centrifugation (15 min at 12000 g) and the pellet was frozen in preparation for purification.

2.10.1 Native protein purification

When trying to purify protein complexes, I used the native protein purification method. The frozen pellet was resuspended in 37.5ml of binding buffer and cells were lysed by sonication.


Cellular debris was removed by centrifugation (35000 g for 30 minutes at 4°C) and Ni2+-NTA was added to the supernatant. After rocking for 20 minutes, the supernatant mixture was poured onto the column and the column was washed 10 times with 15 ml of wash buffer. The protein was eluted with 5 ml of elution buffer and subsequently dialyzed against dialysis buffer to remove the imidazole (imidazole interferes with concentration measurements). Purified protein was stored at 4°C.

2.10.2 Determining protein concentration

In addition to qualitatively visualizing protein expression levels using SDS-PAGE (see Sec. 2.11), I determined the concentration of purified protein using absorbance at 280 nm. The predicted molar extinction coefficient was determined using the ExPASy ProtParam program available at: http://expasy.org/tools/protparam.html. Using the absorbance value and the molar extinction coefficient, the protein concentration was calculated according to the following equation:
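A calculation from absorbance and molar extinction coefficient is conventionally done with the Beer-Lambert law, c = A280 / (ε × l), where c is the molar concentration, ε the molar extinction coefficient (M^-1 cm^-1) and l the path length (cm). The Python sketch below assumes that this is the relation intended here. The absorbance, extinction coefficient and 1 cm path length in the example are hypothetical placeholders, not measured values; only the 12.6 kDa molecular weight of gpW is taken from this thesis.

```python
def protein_concentration(a280, extinction_coeff, path_cm=1.0):
    """Molar concentration from the Beer-Lambert law, c = A / (epsilon * l)."""
    if extinction_coeff <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return a280 / (extinction_coeff * path_cm)


def molar_to_mg_per_ml(conc_molar, mol_weight_da):
    """Convert mol/L to mg/ml (numerically equal to g/L)."""
    return conc_molar * mol_weight_da


# Hypothetical inputs: A280 = 0.5, epsilon = 20000 M^-1 cm^-1, 1 cm cuvette.
conc = protein_concentration(0.5, 20000)
print(conc, molar_to_mg_per_ml(conc, 12600))
```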

2.11 SDS-PAGE and Western blotting

I used sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) to separate proteins according to their electrophoretic mobility (a function of protein molecular weight). To directly visualize protein migration, gels (15% polyacrylamide) were stained with Coomassie brilliant blue R-250. However, to more sensitively detect proteins, I performed Western blots. In this method, the gel and a nitrocellulose membrane were soaked in transfer buffer for 15 minutes after SDS-PAGE. Two pieces of extra-thick membrane were also soaked in transfer buffer and used to sandwich the gel and nitrocellulose membrane during the transfer. Transfer was carried out at 10 V in a Bio-Rad TransBlot semi-dry electrophoretic transfer cell for ~40 minutes. The nitrocellulose membrane was blocked with 5% (w/v) skim milk (Bio-Shop) in TBST for one hour at room temperature (RT). Subsequently, the membrane was incubated with primary antibody (diluted in 5% skim milk) overnight at 4°C.

After primary antibody incubation, I washed the membrane three times in TBST (10 minutes each). Next, secondary antibody, goat-anti-mouse IgG-HRP (Santa Cruz Biotechnology; diluted 1:10,000 in 5% milk TBST), was applied to the membrane, which was kept at RT for 1 hr. Three more TBST washes were performed before the membrane was prepared for detection using the West-one™ detection system or the ECL+ detection kit (GE Health Sciences).


Chapter 3 Results

3.1 GpW is a very widely conserved protein in myophages

To determine how widely conserved gpW is across phages and prophages, I performed comprehensive PSI-BLAST searches on the NCBI non-redundant (NR) protein database (with an E-value cutoff of 0.005). Over a thousand proteins from a variety of bacterial genomes, prophages and phages were found to have significant sequence similarity with gpW. As shown in Figure 3.1, most of these proteins were found in bacterial genomes and prophages from proteobacteria (primarily the γ-proteobacteria), but many homologous proteins were also found in other Gram-negative and Gram-positive bacteria.

[Figure 3.1 inset table, number of gpW homologues: Bacteria (1314); Proteobacteria (1059), comprising γ-proteobacteria (671), β-proteobacteria (198), α-proteobacteria (100), δ-proteobacteria (67) and ε-proteobacteria (15); Firmicutes (101); Actinobacteria (45); Bacteroidetes (45); Other (45). Pie chart slices: Proteobacteria, Firmicutes, Actinobacteria, CFB group and Other, with slice labels of 82%, 8%, 4%, 3% and 3%.]

Figure 3.1. Distribution of gpW homologues in prophages. Over a thousand gpW homologues were identified in prophages from different bacteria. The great majority of these homologues were found in the proteobacteria. The inset table indicates the number of gpW homologues found in various groups of bacteria.

Moreover, at least 80% of sequenced myophage genomes have a gene W homologue. Even a group of Pseudomonas phages, which have massive genomes (200-300 kb) and are morphologically different from P2, have proteins that seem to share significant sequence similarity with gpW. Somewhat surprisingly, enterobacteria phage Mu, which is morphologically almost identical to phage P2 (and also has a similar genome size; Mu: 36.7 kb, P2: 33.6 kb), does not seem to have any proteins with significant sequence similarity to gpW. However, based upon the presence of surrounding baseplate protein-encoding genes and protein size, Mup46 (145 a.a.; 16.3 kDa) may be inferred to be homologous to gpW (Fig. 3.2).

The homology between gpW and Mup46 was confirmed using the protein structure and function prediction program HHpred (Soding, 2005). Unlike BLAST which uses sequence-sequence comparisons and searches sequence databases like NR, HHpred is based upon the pair-wise comparison of profile Hidden Markov Models (HMMs) and it searches alignment databases. As such, HHpred can be used to detect remote protein homologues which may not share significant sequence similarity. Using HHpred, I also found that the N-terminal domain of gp104 (231 a.a.) of phage A9 is a lysM domain (gpX of P2 also includes a lysM domain) while the C-terminal domain is homologous to gpW. This interesting finding, along with the close proximity of the two proteins in the virion (Sec 3.4), suggests that gpW and gpX interact with each other.

[Figure 3.2 panel labels: A) P2 genes gpV, gpJ, gpH and gpG aligned against Mu genes Mup45, Mup47, Mup49 and Mup50; sequence identities shown: 28%, 24% and 93%. B) gp46 of Mu and gpW of P2.]

Figure 3.2. gp46 of phage Mu is homologous to gpW. A) Comparison of P2 and Mu phage genomes. The genes encoding most of the P2 baseplate proteins are shown alongside a region of the Mu genome that encodes homologous proteins (red genes encode annotated tail proteins). Also included are some sequence identities between P2 and Mu proteins. B) Predicted secondary structures of gpW and gp46. The green arrows represent β sheets and the red bars are α helices. This prediction was made using the JNet Secondary Structure Prediction utility in Jalview and is based upon protein sequence.


In addition to homologues in phages and prophages, gpW was found to have a homologue in structures such as R-type pyocins (tail-like bacterial killing agents). For example, a baseplate protein in the R-type pyocin secreted by P. aeruginosa (Nakayama et al., 2000) was found to have very high sequence similarity with gpW (40% identity). Another example of a gpW homologue encoded by a bacterial gene can be found in the type six secretion systems (T6SS). As mentioned in the Introduction, earlier studies had reported structural similarities between phage tail proteins and constituent proteins of type six secretion systems. Specifically, Leiman et al. (2009) found structural similarity between gp25 of phage T4 and a protein in the T6SS; and, in fact, almost 100 gpW homologues from various Gram-negative bacteria are annotated as T6SS proteins in the NCBI protein database.

Some gpW homologues that are T6SS proteins may not be classified as such in the NCBI protein database. To confidently classify a bacterial protein as being expressed from a prophage or from the bacterial genome itself (as in the T6SS proteins or pyocin proteins), one can consider the proteins encoded by nearby genes. Prophages have clusters of genes required for phage formation. Proteins specific to phages, such as the major capsid protein, can be used as markers for prophages. Alternatively, T6SS-specific proteins can be used as markers for T6SS: Vgr family proteins (which contain domains that are structurally similar to the T4 tail spike complex and gpV in P2) or ClpV1 family proteins (which are homologous to ClpB, an ATPase associated with chaperone-related functions, and are a key component of the T6SS). In this way, I annotated almost thirty unclassified proteins which were surrounded by T6SS-specific proteins (Table 3.1).

Table 3.1. Classifying T6SS proteins. Several gpW homologues which were previously unclassified are surrounded by T6SS-specific proteins and thus, are likely to be T6SS proteins.

Accession Code | Protein Classification | Organism
YP_001185595 | T6SS-associated; ImpA domain-containing | Pseudomonas mendocina
YP_001185598 | GPW/gp25 family protein | Pseudomonas mendocina
YP_001185601 | T6SS component; VasA [Intracellular trafficking, secretion, and vesicular transport] | Pseudomonas mendocina
YP_001185603 | T6SS ATPase, ClpV1 family | Pseudomonas mendocina

YP_001267440 | Rhs element VgrG protein | Pseudomonas putida F1
YP_001267448 | GPW/gp25 family protein | Pseudomonas putida F1
YP_001267450 | T6SS component protein | Pseudomonas putida F1
YP_001267465 | T6SS ATPase, ClpV1 family | Pseudomonas putida F1

EES52233 | Rhs element VgrG protein | Leptospirilum ferrodiazotrop
EES52234 | T6SS ATPase, ClpV1 family | Leptospirilum ferrodiazotrop
EES52237 | GPW/gp25 family protein | Leptospirilum ferrodiazotrop
EES52238 | Secreted T6SS effector, Hcp1 family | Leptospirilum ferrodiazotrop
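The neighbourhood-based reasoning used for Table 3.1 can be expressed as a small Python sketch. This is an illustration under stated assumptions, not the procedure actually used (the annotation was done by inspection): the keyword lists are hypothetical stand-ins for the marker proteins named in the text (major capsid protein for prophages; VgrG and ClpV1 for T6SS), and matching is done by simple substring search on annotation strings.

```python
# Hypothetical keyword markers derived from the marker proteins named in the text.
PROPHAGE_MARKERS = ("major capsid",)
T6SS_MARKERS = ("vgrg", "clpv1")


def classify_by_neighbours(neighbour_annotations):
    """Classify a gpW homologue from the annotations of nearby genes."""
    lowered = [annotation.lower() for annotation in neighbour_annotations]
    has_prophage = any(m in a for a in lowered for m in PROPHAGE_MARKERS)
    has_t6ss = any(m in a for a in lowered for m in T6SS_MARKERS)
    if has_t6ss and not has_prophage:
        return "T6SS"
    if has_prophage and not has_t6ss:
        return "prophage"
    return "unclassified"


# Neighbours of YP_001267448 (P. putida F1) as listed in Table 3.1.
putida_neighbours = [
    "Rhs element VgrG protein",
    "T6SS component protein",
    "T6SS ATPase, ClpV1 family",
]
print(classify_by_neighbours(putida_neighbours))
```

Neighbourhoods with both kinds of marker, or with neither, are deliberately left unclassified, since the text gives no rule for those cases.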

3.2 GpW is a structural component of the P2 baseplate

It was previously known that gpW is required to form infectious phage particles (Haggard-Ljungquist et al., 1995); however, it was unclear whether phage particles are capable of assembling in the absence of gpW. To address this question, I used EM to observe the phage products produced by Wam phage (P2 phage bearing an amber mutation in gene W). These lysates were prepared by infecting C1a (non-suppressor E. coli) cells with P2 phages that bear the amber mutation in gene W. After cell lysis, cellular debris was removed by centrifugation and the supernatant was used for electron microscopy (as described in Sec. 2.8). In this manner, I determined that phage tails are not assembled in the absence of gpW (Fig. 3.3). However, there was no effect on phage head (capsid) assembly, which occurs in a pathway independent from tail assembly.

Figure 3.3. Electron microscope image of the Wam lysate. There were no fully formed phages or tails in a Wam lysate while DNA-filled heads were observed (Scale bar is 20 nm).


After determining that phage tails cannot be formed in the absence of gpW, I sought to confirm that gpW is a structural component of the phage. As mentioned in Sec 2.4.2, E. coli cells transformed with pMal_W expressed gpW fused to the large maltose binding protein (MBP). Thus, when cultures of these cells were infected with Wam phage, the produced phage particles had MBP incorporated into the particle. I purified these phage using two rounds of CsCl centrifugation (see Sec. 2.5). The purified phage were analyzed by SDS-PAGE, and a Western blot (see Sec. 2.11) was performed with an anti-MBP primary antibody (NEB, 1:10000 dilution). As shown in Figure 3.4, a single 55 kDa band was visualized, corresponding to the molecular weight of the gpW-MBP fusion protein. This experiment proved that gpW is indeed a structural component of the phage particle.

Figure 3.4. GpW is a structural component of P2. CsCl purification of MBP-tagged phage particles and the subsequent Western blot proved that gpW is indeed a structural component of the phage particle (control is cell lysate of cells transformed with pMal_W). "Below" and "above" are fractions collected from below and above the phage band (phage titre 4x10^12 pfu/ml) in the centrifuge tube; these fractions contain no phage (phage titre <10^4 pfu/ml) and should contain no MBP-tagged gpW, so they serve as negative controls.

While the previous experiment confirmed that gpW is a structural component of the P2 virion, it did not provide any information about its location within the phage. Based upon the location of its homologue in phage T4, gpW was hypothesized to be a baseplate protein. To confirm this hypothesis, we used EM images of phage particles with and without MBP fused to gpW. Compared to gpW, MBP is a large molecule and is easily distinguishable in high resolution electron microscope images. Thus, the location of MBP helped us determine the position of gpW in the phage baseplate.


EM images were acquired and particles from the images were aligned and averaged (Sec. 2.9) by Nawaz Pirani in our lab. The averaging process was required to yield high-resolution images of the tail tip region. As shown in Figure 3.5, the position of gpW was determined by comparing the average images of the tail tip region of MBP-tagged P2 (318 phage tails) and wild-type P2 (353 phage tails). Hence, gpW was found to be at the top of the P2 baseplate.


Figure 3.5. EM images were used to determine the position of gpW. A) The average of 318 aligned images from the tail tip region of MBP-tagged P2. B) The average of 353 aligned images from the tail tip region of P2. The averaged images exhibit much improved resolution compared to individual images. The arrows point to the location of MBP-tagged gpW. The tail spike protein, gpV, is also visible in both images (arrow in B).

3.3 Different mutations in gene W affect gpW function

As discussed above, Wam phage mutants are unable to produce phage tails in non-suppressor E. coli. To determine how mutations in gene W affect gpW function within the baseplate, I made point mutations in gene W. As the structure of gpW is not known, the X-ray crystal structure of a gpW homologue (27% sequence identity) from a prophage in G. sulfurreducens (Fig. 3.6; PDB ID: 2IA7) was used to determine residues that are highly exposed (SwissPDB program, Guex and Peitsch, 1997). Exposed residues were selected because mutating buried residues usually disrupts protein secondary structure and abrogates protein function. I used the MAFFT algorithm (Katoh et al., 2002) and the Jalview program (Waterhouse et al., 2009) to obtain multiple sequence alignments of gpW homologues and determine highly conserved residues. Using this information, the R37, R40, D42, W74, P76 and R77 residues were selected for mutagenesis (Fig. 3.7) because they were highly conserved and exposed residues (>25% exposed).

Non-suppressor E. coli cells transformed with constructs that expressed residue substituted versions of gpW (modified gpW_trc constructs which express gpW fused to a FLAG epitope and are inducible with IPTG; Sec 2.4.2) were used for in vivo complementation assays. As shown in Figure 3.8A, without IPTG induction, spotting patterns on lawns of bacteria expressing the R40A, D42R, W74A and R77A substitutions were similar to a lawn of bacteria expressing gpW (spotting was performed with 100-fold dilutions of Wam phage). Thus, these substitutions had little effect on in vivo complementation. However, the R37A, R37D and P76A substitutions had deleterious effects (zones of clearing were not observed with diluted Wam phage, which confirms that in vivo complementation was low).

Initially, all in vivo complementation studies were performed without IPTG induction; i.e., proteins were expressed at basal levels through leaky expression. However, with IPTG induction, the function of the R37A residue substituted gpW protein was partially rescued while R37D and P76A were still inactive (Fig. 3.8B).


Figure 3.6. The X-ray crystal structure of a gpW homologue in G. sulfurreducens. For comparison with Figure 3.7, the arrow points to where W74 of gpW maps onto the structure. Also included is a secondary structure prediction of gpW and its homologue. This prediction was made using the JNet Secondary Structure Prediction utility in Jalview and is based upon protein sequence (green arrows: β sheets; red blocks: α helices).


Figure 3.7. Selecting residues for substitutions. A representative alignment of gpW homologues from various prophages and phages highlighting the R37, R40, D42, W74, P76 and R77 residues. GpW from P2 and its homologue from a prophage in G. sulfurreducens have been boxed. Three perspectives are presented from the crystal structure of the gpW homologue (prophage in G. sulfurreducens) and the positions of various residues from gpW have been mapped onto this structure.


[Figure 3.8 panel labels: A) Empty vector, gpW, R37A, R37D, R40A, D42R, W74A, P76A and R77A. B) gpW+IPTG, R37A+IPTG, R37D+IPTG and P76A+IPTG.]

Figure 3.8. In vivo complementation assays with gpW expressed from the gpW_trc construct. A) Spotting assays to determine the ability of residue substituted gpW to complement Wam phage. Non-suppressor E. coli cells transformed with constructs coding for a residue substituted gpW are challenged with serial dilutions of Wam (no IPTG induction). B) IPTG induction improves the complementation in cells expressing R37A but not in cells expressing R37D or P76A.

Next, I assessed in vivo complementation by adding Wam phage to liquid cultures of cells transformed by the mutant constructs and then using electron microscopy to examine the produced phage. Observations made thus were consistent with the earlier spotting assay results, i.e., R40A had no effect on phage formation while no tail like structures were observed with the R37A and P76A substitutions. Moreover, the function of R37A was improved when more protein was expressed by IPTG induction (Fig. 3.9).


[Figure 3.9 panel labels: R40A, R37A, P76A and R37A+IPTG.]

Figure 3.9. Transmission Electron Microscope images of mutant phage lysates. The R40A-substituted gpW supports the formation of fully formed phage, but other substitutions have serious effects on phage formation. The deleterious effect of the R37A residue substitution can be partially overcome by IPTG induction.

I tested the expression levels of R37A, R37D, W74A and R77A versions of gpW (see Sections 2.10 and 2.11) by performing a Western blot with the anti-FLAG antibody (Fig. 3.10). Interestingly, the R37D substituted protein was expressed at a higher level than the R37A protein but had lower complementation and its function could not be rescued by IPTG induction (Fig. 3.8). To test the activity of gpW (and modified gpW versions) at lower expression levels, I cloned gene W and 45 bp upstream of the gene into pAD100 (gpW_ENDO) so as to include the endogenous translation start site. This led to much lower levels of gpW expression which were undetectable in the Western blot (Fig. 3.10).

I mutated gene W in the gpW_ENDO construct to encode residue substituted versions (R37A, R37D, W74A and R77A) of gpW. As shown in Figure 3.10, the expression levels of these proteins were much lower than those from gpW_trc. Furthermore, non-suppressor E. coli cells transformed with the gpW_ENDO constructs were used for in vivo complementation assays. The effects of the residue substitutions were much more pronounced when gpW was expressed from its endogenous translation start site (Figure 3.11). The W74A and R77A substitutions, which had previously not shown any effects in complementation (Fig. 3.8), led to a 10^4-fold decrease in phage production when expressed at lower levels. However, adding IPTG increased protein levels and the function of R37A, W74A and R77A was partially recovered (Fig. 3.11B). Not surprisingly, the R37D substitution completely knocked out gpW function and this effect was not reversed by expressing the protein at higher levels.

[Figure 3.10 labels: 15 kDa and 10 kDa markers; lanes grouped as "trc" and "Endogenous translation start site".]

Figure 3.10. Varying levels of gpW expression. The bands are just under 15 kDa (corresponding to gpW which is FLAG- and His-tagged). The lanes are from gpW_trc, gpW_trc(R37A), gpW_trc(R37D), gpW_trc(W74A), gpW_trc(R77A) and ladder. The next five lanes are for gpW expressed with its endogenous translation start site and the same residue substitutions (expression too low to visualize).


[Figure 3.11 labels: rows gpW_trc, gpW_ENDO, R37A_ENDO, R37D_ENDO, W74A_ENDO and R77A_ENDO; columns A) No IPTG and B) IPTG induction.]

Figure 3.11. In vivo complementation assays with gpW expressed from the gpW_ENDO construct. A) gpW_trc, gpW_ENDO and various residue substituted versions expressed at basal levels (without IPTG induction). B) The same assays performed with IPTG induction. The effects of residue substitutions are much more pronounced when less protein is expressed.

3.4 The formation of intermediate P2 baseplate structures requires at least eight proteins

Aside from gpV, the tail spike protein (Haggard-Ljungquist et al., 1995; Kagyama et al., 2009), and gpW, at least five other proteins are thought to be required for P2 baseplate formation: gpD, gpH, gpG, gpI and gpJ. GpD is the hub protein, while gpH and gpG are required for tail fiber formation (Haggard-Ljungquist et al., 1995). GpJ (homologous to the T4 baseplate wedge subunit gp6) and gpI share a common structural fold (detected by HHpred); and as gene J is adjacent to gene I in the P2 genome, gpI and gpJ may interact in forming the baseplate wedges of P2.

In addition to the seven proteins described above, gpU and gpX are also likely to be required for the formation of complete baseplates. Using HHpred, I found gpU to be structurally similar to gp54 of phage T4 (which along with gp48 creates a platform on top of the hub to initiate tail tube oligomerization; Yap et al., 2010). This similarity, along with the genomic position of gene U in P2 (adjacent to gene D), suggests that gpU interacts with the central hub in the P2 baseplate (Fig. 1.7). Furthermore, I showed that gpX, a small protein of unknown function, was required for tail formation (Fig. 3.12); and, as tail formation requires a fully formed baseplate, it was inferred that gpX may also be required for baseplate assembly. Moreover, using electron microscopy (in a process similar to the one used to determine the position of gpW) we found gpX to be a baseplate protein located close to gpW (Fig. 3.13).

Figure 3.12. GpX is required for P2 tail formation. GpX is a small protein of unknown function, without which no phage tails are observed. I used Xam phage to infect non-suppressor E. coli and only fully packaged capsids were observed using EM.


Figure 3.13. EM images were used to determine the position of gpX. A) The average of 353 aligned images from the tail tip region of wild-type P2. B) The average of 576 aligned images from the tail tip region of P2 with MBP-tagged gpX. The averaged images exhibit much improved resolution compared to individual images. The arrows point to the location of MBP-tagged gpX.


Thus, 9 proteins (gpV, W, J, I, H, G, U, D and X) may be required for the formation of complete P2 baseplates. To test this, I co-expressed groups of these proteins and looked for the formation of intermediate or stable structures. Tom Chang in our lab had made a construct that expressed gene D (encoding C-terminal His-tagged gpD), a construct that coexpressed genes VWIJHG (V-G; N-terminal His-tagged gpV) and a construct that coexpressed genes U and D (UD; C-terminal His-tagged gpD) (Sec. 2.4). These plasmids were transformed (and cotransformed) into cells such that we could coexpress V-G, V-G/D and V-G/UD. Using affinity chromatography (Sec. 2.10.1), His-tagged proteins were purified and the eluted proteins were analyzed by SDS-PAGE. As shown in Figure 3.14, there was no copurification of non-tagged baseplate proteins when the V-G construct was used. However, some proteins seemed to co-purify with the His-tagged gpV and gpD when using the V-G/D and V-G/UD constructs (gpV and gpD are the most intense bands in Fig. 3.14A because they are His-tagged). The sizes of the protein bands observed in Figure 3.14A correspond closely with the expected molecular weights of baseplate proteins gpW (12.6 kDa), gpU (17.4 kDa), gpI (19.6 kDa), gpV (22.1 kDa), gpJ (33 kDa) and gpD (43 kDa) (Appendix A1).

[Figure 3.14 labels: A) bands marked gpD, gpJ, gpV, gpI, gpU and gpW, with 55, 40, 35, 25, 15 and 10 kDa markers. B) V-G lane with a band marked gpV.]

Figure 3.14. Purifying intermediate baseplate complexes. A) Using the V-G/UD (1st lane) and V-G/D (3rd lane) constructs, strong bands corresponding to gpV and gpD were observed along with other bands at sizes that corresponded to the expected molecular weight of other baseplate proteins. There are also some bands corresponding to proteins larger than gpD which may include gpH but these are poorly resolved. B) Using SDS-PAGE, only one band (gpV) was observed when purifying proteins expressed from the V-G construct alone (purification using affinity chromatography).

Finally, I used electron microscopy to visualize intermediate protein complexes. The elutions of each of the purifications outlined above (V-G, V-G/D and V-G/UD) were investigated, and particles of the right shape (circular/hexagonal) and size (~25 nm diameter) for intermediate baseplate structures were only observed when using the V-G/UD co-transformation (Fig. 3.15). Together, these experiments confirmed that at least 8 proteins (V, W, J, I, H, G, U and D) are required for the formation of intermediate baseplate structures. However, the V-G/UD purified samples which had been inspected by EM immediately after purification did not seem to contain any complexes when re-inspected by EM a few days later. It was thus inferred that the formed complexes may have been unstable. This may be due to the absence of gpX; thus, to fully elucidate the proteins required for P2 baseplate formation, further work is required (as outlined in the Future Directions).


Figure 3.15. EM image of putative baseplate complexes. A) Arrows point to particles which are similar in shape and size to baseplates; they are seen in the V-G/UD purification only. The baseplate complexes shown above may be in two different conformations as has been observed for purified baseplate complexes from other phages (Yap et al., 2010; Campannaci et al., 2010) (Scale bar is 20nm). B) For comparison, lactococcal phage baseplates are shown. Scale bar is 50nm (Campannaci et al., 2010)


Chapter 4 Discussion and Future Directions

The purpose of this study was to gain insight into the structure and function of the baseplate of myophage P2. Baseplate proteins are required for recognizing and binding to cell surface molecules during phage adsorption to its bacterial host. Moreover, in myophages, conformational changes in the baseplate lead to the contraction of the tail sheath and eventual release of phage DNA. To better understand the baseplate, we studied the role of gpW, a putative baseplate protein in phage P2. This small protein is one of the most widely conserved proteins in all myophages and is homologous to gp25 in the extensively studied myophage T4. In T4, gp25 is the seventh and last protein to be incorporated into the baseplate wedges and is placed at the top of the dome-shaped baseplate. Gp25 is known to be required for the formation of stable wedges; however, its precise interactions within the baseplate are not fully understood. As the T4 baseplate is uncharacteristically complex, understanding the role of gpW within the P2 baseplate should be a more tractable aim.

Before characterizing the role of gpW in phage P2, I used bioinformatic methods to study the wide distribution of gpW homologues. By performing comprehensive PSI-BLAST searches, I found that most sequenced myophages (93/132) have a protein with significant sequence similarity to gpW. In some myophages which have no proteins with significant sequence similarity to gpW (e.g., enterobacteria phage Mu), a protein can be inferred to be homologous to gpW by similarities in secondary structure, by genome position (proteins encoded by surrounding genes have high sequence identity with P2 proteins gpV, gpH and gpG) and by HHpred.

I also found that there are over 1000 gpW homologues (sequence identity ranged from >95% to 15%) in various prophages from phylogenetically diverse bacteria. An earlier study (Leiman et al., 2009) had found a gp25 homologue in the bacterial type VI secretion system (T6SS), and almost 100 proteins from different T6SS were found to be gpW homologues. Moreover, it was possible to classify some gpW homologues as belonging to T6SS by inspecting the proteins encoded by surrounding genes. The very wide conservation of gpW in myophage tail producing gene clusters suggests that this protein plays an important role in myophage tail formation.
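The sequence identity values quoted above depend on how identity is counted. The following is a minimal Python sketch, assuming identity is taken over alignment columns in which neither sequence has a gap; other definitions (for example, dividing by the full alignment length) exist, and this is not necessarily the definition used by the search tools in this study. The function name and the toy sequences are illustrative only and are not real gpW sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are ignored (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real gpW sequences.
print(percent_identity("ACDE", "ACDF"))
```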

The first priority in my study was to determine whether gpW is a structural component of the phage and, if so, to determine its position within the virion. Using electron microscopy, I showed that Wam phage (phage that cannot produce gpW) cannot form tail structures when infecting non-suppressor E. coli strains (Fig. 3.3). This finding suggested that gpW is required for baseplate formation. In addition, by purifying P2 phage with MBP-tagged gpW and performing a Western blot with an anti-MBP antibody, I showed that gpW is a structural component of the phage (Fig. 3.4). Finally, using electron microscopy, we determined the position of gpW and confirmed that it is at the top of the P2 baseplate (Fig. 3.5).

After determining the position of gpW, I sought to characterize the interactions of gpW within the phage tail. One way to do this is to make residue substitutions in the interaction surfaces of the protein, which may knock out specific interactions. To choose residues for substitution, I mapped the sequence of gpW onto the crystal structure of a gpW homologue (from a prophage in G. sulfurreducens) and identified residues that were both conserved and exposed. Using in vivo complementation assays and electron microscopy, the effects of these residue substitutions were assessed. Mutations in gene W were shown to have differing impacts on the expression level (R37A was expressed at lower than normal levels) and on the functionality of gpW, sometimes fully abrogating function (R37D and P76A). Moreover, when expressed at lower levels (from the endogenous translation start site), every substitution checked (R37A, R37D, W74A and R77A) abrogated gpW function (Fig. 3.11A).
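The selection of conserved and exposed residues described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the thresholds, the scoring scales (0 to 1) and the residue labels below are hypothetical placeholders, not the values or the procedure used in this study.

```python
from dataclasses import dataclass


@dataclass
class Residue:
    label: str           # placeholder label, not a real gpW position
    conservation: float  # fraction of homologues sharing the residue (0 to 1)
    exposure: float      # relative solvent accessibility (0 to 1)


def pick_candidates(residues, min_conservation=0.8, min_exposure=0.4):
    """Keep residues passing both (hypothetical) thresholds, best first."""
    chosen = [
        r for r in residues
        if r.conservation >= min_conservation and r.exposure >= min_exposure
    ]
    return sorted(chosen, key=lambda r: (r.conservation, r.exposure), reverse=True)


# Hypothetical scores for illustration only.
example = [
    Residue("res_a", 0.95, 0.60),
    Residue("res_b", 0.95, 0.10),  # conserved but buried
    Residue("res_c", 0.30, 0.90),  # exposed but not conserved
    Residue("res_d", 0.85, 0.70),
]
print([r.label for r in pick_candidates(example)])
```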

The effects of residue substitution on gpW function may have been due to the destabilization of modified gpW instead of being due to the disruption of surfaces required for interactions with other tail proteins. To confirm that residue substitutions had not destabilized the secondary structure of gpW, circular dichroism (CD) should be performed.

Another important question that had to be addressed was which P2 proteins are required for baseplate formation. By purifying tagged proteins, putative baseplate complexes were isolated and it was determined that at least 8 proteins (gpV, W, J, I, H, G, U, D; V-G/UD) are necessary for the formation of intermediate baseplate structures (Fig. 3.14). However, complexes made by the coexpression of these proteins were unstable (over several days in SM buffer at 4 °C), which suggests that other proteins are required for stable complexes or that baseplates are inherently unstable without the tape measure and/or tail polymerization.

In an attempt to form more stable baseplates, gene X has been cloned into a coexpression construct, XUD (Sec. 2.4; gpD is His-tagged). Using electron microscopy, I showed that gpX is required for tail formation (Fig. 3.12). Moreover, we were able to determine the position of gpX in the P2 baseplate and found that gpX is very close to gpW (Fig. 3.13). Hence, the coexpression of gpX may lead to the formation of more stable baseplate structures. The formation of these structures can be investigated using co-purification experiments with the XUD/V-G constructs and confirmed by EM.

EM images of baseplate complexes made by coexpressing 8 proteins (V-G/UD) have been acquired (Fig. 3.15). However, the resolution of single images is not high enough to draw definitive conclusions about the structure of these complexes. Hence, image alignment and averaging may have to be applied to these EM images. Moreover, earlier studies that have produced and imaged baseplate complexes (Campanacci et al., 2010; Yap et al., 2010) used size-exclusion chromatography to purify the sample, and this procedure may also have to be applied to P2 baseplates before extensive EM work.
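The idea behind image alignment and averaging can be illustrated with a deliberately simplified sketch. It assumes one-dimensional intensity profiles, circular translations only and a known reference, whereas real single-particle EM processing works on two-dimensional images and must also handle rotations; dedicated software would be used in practice. All names and the toy profile below are hypothetical.

```python
import random


def circular_shift(profile, shift):
    """Shift a 1D intensity profile right by `shift` pixels, wrapping around."""
    n = len(profile)
    return [profile[(i - shift) % n] for i in range(n)]


def best_shift(reference, profile):
    """Return the circular shift that maximises correlation with the reference."""
    def correlation(shift):
        shifted = circular_shift(profile, shift)
        return sum(r * p for r, p in zip(reference, shifted))
    return max(range(len(reference)), key=correlation)


def align_and_average(profiles, reference):
    """Align every profile to the reference, then average pixel by pixel."""
    total = [0.0] * len(reference)
    for profile in profiles:
        aligned = circular_shift(profile, best_shift(reference, profile))
        for i, value in enumerate(aligned):
            total[i] += value
    return [t / len(profiles) for t in total]


# Hypothetical particle profile (arbitrary units), not measured data.
template = [0, 0, 1, 3, 1, 0, 0, 0]
rng = random.Random(0)
noisy = [
    [v + rng.gauss(0, 0.2) for v in circular_shift(template, rng.randrange(8))]
    for _ in range(50)
]
print([round(v, 2) for v in align_and_average(noisy, template)])
```

Averaging many aligned copies suppresses uncorrelated noise while the shared signal is retained, which is why averaged images can support conclusions that single images cannot.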

To fully determine the role of gpW, we will need to understand its interactions with proteins in the P2 baseplate and tail. In bacteriophage T4, gp25 interacts with six other proteins in the stepwise assembly of the baseplate wedges that form the structural basis of the baseplate (Yap et al., 2010). While the P2 baseplate is simpler than the T4 baseplate, gpW may interact with gpJ (homologous to gp6, a wedge protein in T4) and gpI to form a simplified version of the baseplate wedge. Furthermore, with the identification of a protein which has gpX and gpW-like domains, and with the confirmation that gpX is close to gpW in the P2 baseplate, gpX also likely interacts with gpW.

In addition to its interactions with other baseplate proteins, there are several indications that gpW may also interact with the tail sheath protein, gpFI. First, the location of both gpW and gp25 is quite close to the tail sheath protein. More significantly, the N-terminal domain structure of the tail sheath protein from a Mu-like phage is similar to the structure of the gpW homologue in a prophage from G. sulfurreducens. Hence, in myophages, gpW homologues may interact with the tail sheath protein, perhaps helping initiate its polymerization (which would explain the lack of tail structures in Wam mutants).

The most direct way to elucidate the interactions between gpW and other P2 proteins would be to express and purify tagged gpW and attempt to “pull down” other tail and baseplate proteins that interact with it. As gpW and gp25 are quite insoluble, the gpW-MBP fusion protein, which is much more soluble, can be used. In addition to the pull-down experiment, the elutions collected from this purification can be observed by EM to look for the formation of intermediate baseplate complexes.

Alternatively, one may perform the same pull-down experiment using modified gpW (expressed from mutated gene W) to determine whether any interactions have been specifically knocked out and whether these disrupted interactions affect tail and baseplate formation. As mentioned above, I have already started this work: the residue substitutions made in this study all affected the function of gpW, and they may be a good starting point for investigating knocked-out interactions. However, I have only made substitutions in one surface of gpW (the residues selected were the most highly exposed and conserved in gpW homologues) and other surfaces may have to be targeted.

Finally, it may be possible to produce P2 baseplates in vitro, i.e., by mixing constituent proteins at stoichiometrically correct ratios. The resultant intermediate complexes may be isolated and characterized by EM and analytical ultracentrifugation (through the differential sedimentation of heterogeneous particles). Recently, similar studies on the T4 baseplate assembly process confirmed the sequential incorporation of proteins in the baseplate wedges, the baseplate hub and the subsequent attachment of the two structures (Yap et al., 2010).
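Mixing proteins at a chosen molar ratio requires converting that ratio into masses using the molecular weights. The sketch below shows this conversion using the predicted molecular weights listed in Appendix A1. The 1:1:1:1 molar ratio is an assumption made purely for illustration; the actual copy numbers of these proteins in the P2 baseplate are not established here.

```python
# Predicted molecular weights (kDa) taken from Appendix A1.
MOLECULAR_WEIGHT_KDA = {"gpW": 12.6, "gpJ": 33.0, "gpI": 19.6, "gpX": 7.1}


def mass_fractions(copies, weights=MOLECULAR_WEIGHT_KDA):
    """Mass fraction of each protein needed to reach the given molar ratio."""
    masses = {name: n * weights[name] for name, n in copies.items()}
    total = sum(masses.values())
    return {name: m / total for name, m in masses.items()}


# ASSUMPTION: an equimolar mixture, chosen only for illustration.
equimolar = {"gpW": 1, "gpJ": 1, "gpI": 1, "gpX": 1}
for name, fraction in mass_fractions(equimolar).items():
    print(f"{name}: {fraction:.1%} of total protein mass")
```

Because the mass contributed by each protein scales with its molecular weight, an equimolar mixture is far from equal by mass: the largest protein dominates the total.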

Based upon its position in the P2 baseplate and the interactions of gp25 (the gpW homologue in T4), gpW may be required for baseplate wedge formation along with gpJ (homologous to gp6 from T4), gpI and gpX in P2. Phage proteins in the gpW/gp25 family may also interact with the tail sheath protein, gpFI. On this basis, we propose a model for the P2 baseplate, which is represented in Figure 4.1.

[Figure 4.1 schematic. Labelled components: gpFII, gpFI, gpW, gpU, gpX, gpI, gpH, gpJ, gpV, gpD]

Figure 4.1. Our model of the P2 baseplate. This model is based upon known positions for gpW, gpJ and gpX along with our hypothesized interactions between gpW, gpX, gpJ and gpFI.

The experiments proposed above should greatly improve our understanding of the interactions of gpW within the P2 baseplate. Moreover, the potential interaction of gpW with the tail sheath protein may suggest a reason for its very widespread existence in myophage tail-like structures. Finally, the experiments should elucidate the baseplate assembly pathway and the precise structure of the P2 baseplate.


References

Aebi, U., Pollard, T. D. (1987) A glow discharge unit to render electron microscope grids and other surfaces hydrophilic. Journal of Electron Microscopy Technique 7:29-33

Ackermann, H.W. (1999). Tailed bacteriophages: The order Caudovirales. Adv.Virus Res. 51:135-201

Ackermann, H.W. (2007) 5500 phages examined in the electron microscope. Arch Virol 152:227–243

Aksyuk, A.A., Leiman, P.G., Kurochkina, L.P., Shneider, M.M., Kostyuchenko, V.A., Mesyanzhinov, V.V., et al. (2009) The tail sheath structure of bacteriophage T4: a molecular machine for infecting bacteria. EMBO J 28: 821-829

Atha, D.H., Ingham, K.C. (1981) Mechanism of precipitation of proteins by polyethylene glycols. The Journal of Biological Chemistry 256(23): 12108-12117

Bergh, O., Borsheim, K.Y., Bratbak, G., Heldal, M. (1989) High abundance of viruses found in aquatic environments. Nature 340:467-468.

Bullas, L. R., Mostaghimi, A. R., Arensdorf, J. J., Rajadas, P.T., Zuccarelli, A. J. (1991) Salmonella phage PSP3, another member of the P2-like phage group. Virology 185:918-921

Burtnick M.N., Brett P.J., Harding S.V., Ngugi S.A., Ribot W.J., Chantratita N. et al. (2011) The Cluster 1 Type VI Secretion System Is a Major Virulence Determinant in Burkholderia pseudomallei. Infect Immun. 79(4):1512-25

Calendar, R., (2006). The Bacteriophages (2nd Ed.). Oxford University Press, New York

Campanacci, V., Veesler, D., Lichière, J., Blangy, S., Sciara, G., Moineau, S., et al. (2010) Solution and electron microscopy characterization of lactococcal phage baseplates expressed in Escherichia coli. J Struct Biol. 172(1):75-84.

Cardarelli, L., Pell, L.G., Neudecker, P., Pirani, N., Liu, A., Baker, L.A., et al. (2010). Evolutionary relationships may exist among very diverse groups of proteins even though they perform different functions and display little sequence similarity. Proc Natl Acad Sci U S A 107(32): 14384-14389

Casjens, S. (2003). Prophages and bacterial genomics: what have we learned so far? Mol. Microbiol. 49:277–300.

Davidson, A.R. and Sauer, R.T. (1994) Folded Proteins Occur Frequently in Libraries of Random Amino Acid Sequences. Proc. Natl. Acad. Sci. USA 91: 2146-2150

Esposito, D., Fitzmaurice, W. P., Benjamin, R. C., Goodman, S. D., Waldman, A. S., Scocca, J. J. (1996). The complete nucleotide sequence of bacteriophage HP1DNA. Nucleic Acids Res. 24:2360-2368.

Feucht, A., Schmid, A., Benz, R., Schwarz, H., Heller, K. J. (1990). Pore formation associated with the tail-tip protein pb2 of bacteriophage T5. The Journal of Biological Chemistry 265 (30): 18561-7.

Guex, N. and Peitsch, M.C. (1997) SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modeling. Electrophoresis 18: 2714-2723.

Haggard-Ljungquist, E., Jacobsen, E., Rishovd, S., Six, E., Nilssen, Ø., Sunshine, M., et al. (1995). Bacteriophage P2: genes involved in baseplate assembly. Virology 213:109-121

Hanahan, D. (1985) Techniques for transformation of E. coli, p. 109-135. In D. M. Glover, DNA cloning: a practical approach, vol. 1. IRL Press, Oxford, United Kingdom

Bozzola, J.J., Russell, L.D. (1999) Electron microscopy: principles and techniques for biologists (2nd Edition). Jones & Bartlett Learning

Karp, G (2009) Cell and Molecular Biology: Concepts and Experiments (9th Edition) John Wiley and Sons

Katoh, K., Misawa, K., Kuma, K., Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30: 3059–3066.


Kostyuchenko, V. A., Leiman, P. G., Chipman, P. R., Kanamaru, S., van Raaij, M. J., Arisaka, F., et al. (2003). Three-dimensional structure of bacteriophage T4 baseplate. Nat. Struct. Biol. 10: 688-693

Leiman, P. G., Basler, M., Ramagopal, U. A., Bonanno, J. B., Sauder, J. M., Pukatzki, S., et al., (2009). Type VI secretion apparatus and phage tail-associated protein complexes share a common evolutionary origin. Proc. Nat. Acad. Sci. 106:4154-4159

Leiman P.G., Arisaka F., van Raaij M.J., Kostyuchenko V.A., Aksyuk A.A., Kanamaru S., et al., (2010) Morphogenesis of the T4 tail and tail fibers. Virol J. 7:355-383.

Lengyel, J., Goldstein, R.N., Marsh, M., Calendar, R. (1974) Structure of the bacteriophage P2 tail. Virology 62:161-174

Lindahl, G., Hirota, Y., Jacob, F. (1971) On the process of cellular division in Escherichia coli: Replication of the bacterial chromosome under control of prophage P2. Proc. Nat. Acad. Sci. USA 68(10): 2407-2411

Mirold, S., Rabsch, W., Tschape, H., Hardt, W.-D. (2001) Transfer of the Salmonella type III effector sopE between unrelated phage families. J. Mol. Biol. 312:7-16.

Nakayama, K., Kanaya, S., Ohnishi, M., Terawaki, Y., and Hayashi, T. (1999). The complete nucleotide sequence of φCTX, a cytotoxin-converting phage of Pseudomonas aeruginosa: implications for phage evolution and horizontal transfer via bacteriophages. Mol. Microbiol. 31: 399-419.

Nakayama, K., Takashima, K., Ishihara, H., Shinomiya, T., Kageyama, M., Kanaya, S., et al. (2000). The R-type pyocin of Pseudomonas aeruginosa is related to P2 phage, and the F-type is related to lambda phage. Mol. Microbiol. 38:213-231

Nesper, J., J. Blass, M. Fountoulakis, and J. Reidl. (1999). Characterization of the major control region of Vibrio cholerae bacteriophage K139: immunity, exclusion, and integration. J. Bacteriol. 181:2902-2913.

Nilsson, A.S., Karlsson, J.L., Haggard-Ljungquist, E. (2004) Site-specific recombination links the evolution of P2-like coliphages and pathogenic enterobacteria. Mol. Biol. Evol. 21(1):1–13.


Ochman, H., and Selander, R. K. (1984). Standard reference strains of Escherichia coli from natural populations. J. Bacteriol.157:690-693

Pell, L. G., Kanelis, V., Donaldson, L. W., Howell, P. L., Davidson, A. R. (2009). The phage lambda major tail protein structure reveals a common evolution for long-tailed phages and the type VI bacterial secretion system. Proc. Nat. Acad. Sci. 106:4160-4165

Sasaki, I., Bertani, G., (1965) Growth abnormalities in Hfr derivatives of Escherichia coli strain C. J. Gen. Microbiol. 40:365-376.

Söding, J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21: 951-960

Studier, F.W., Moffatt B.A. (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J Mol Biol. 189(1):113-30

Sunshine, M., Thorn, M., Gibbs, W. and Calendar, R. (1971) P2 Phage Amber Mutants: Characterization by use of a Polarity Suppressor. Virology 46: 691–702

Temple, L.M., Forsburg, S.L., Calendar, R., Christie, G.E. (1991) Nucleotide sequence of the genes encoding the major tail sheath and tail tube protein of bacteriophage P2. Virology 181: 353–388.

Thingstad,T. F. (2000) Elements of a theory for the mechanisms controlling abundance, diversity, and biogeochemical role of lytic bacterial viruses in aquatic systems. Limnol. Oceanogr. 45:1320-1328.

Vajda, B.P. (1978) Concentration and purification of viruses and bacteriophages with polyethylene glycol. Folia Microbiol. 28:88-96

Waterhouse, A. M., Procter J. B., Martin D. M., Clamp, M., Barton, G.J., (2009). Jalview Version 2; a multiple sequence alignment editor and analysis workbench, Bioinformatics 25:1189-1191

Williams, B. J., Golomb, M., Phillips, T., Brownlee, J., Olson, M. V., Smith, A. L. (2002). Bacteriophage HP2 of Haemophilus influenzae. J. Bacteriol. 184:6893-6905.


Wommack, K.E. & Colwell, R.R. (2000) Virioplankton: viruses in aquatic ecosystems. Microbiology and Molecular Biology Reviews, 64: 69–114.

Yap, M.L., Mio, K., Leiman, P.G., Kanamaru, S., Arisaka, F. (2010) The baseplate wedges of bacteriophage T4 spontaneously assemble into hubless baseplate-like structure in vitro. J. Mol. Biol. 395:349-360


Appendices

Appendix A1. The P2 baseplate proteins

Protein   Residues   Predicted molecular weight   Function
gpH       669        71 kDa                       Tail fiber
gpD       387        43 kDa                       Baseplate hub
gpJ       302        33 kDa                       Peripheral member of baseplate
gpV       211        22.1 kDa                     Tail spike
gpG       175        20.2 kDa                     Required for fiber formation
gpI       176        19.6 kDa
gpU*      159        17.4 kDa
gpW       115        12.6 kDa                     Baseplate wedge protein
gpX*      67         7.1 kDa

*These proteins are putative members of the P2 baseplate.

Appendix A2. The P2 genome

[Genome map of P2. Gene order: Q, P, O, N, M, L, X, Y, K, R, S, V, W, J, I, H, G, fun(Z), FI, FII, E, E’, T, U, D, ogr, int, C, cox, B, A, tin, old. Legend categories: portal protein; capsid proteins; lysis proteins (lysA, lysB and lysC); baseplate proteins; tail proteins; hypothetical proteins.]


Gene     Description of encoded protein
Q        Portal protein
P        Terminase ATPase
O        Capsid scaffold protein
N        Major capsid protein
M        Terminase endonuclease
L        Capsid completion protein
X        Putative baseplate protein
Y        Holin protein
K        Lysin protein
lysA     Regulation of lysis
lysB     Regulation of lysis
lysC     Regulation of lysis
R        Tail completion protein
S        Tail completion protein
P2p15
V        Tail spike
W        Baseplate assembly protein
J        Baseplate assembly protein
I        Baseplate assembly protein
H        Tail fiber protein
G        Fiber assembly protein
fun(Z)
FI       Tail sheath protein
FII      Tail tube protein
E, E’    Putative tail proteins
T        Tape measure protein
U        Putative baseplate protein
D        Baseplate hub protein
ogr      Late gene activator
int      Integrase
C        Predicted transcriptional regulator
cox
P2p34
B
P2p36
P2p37
P2p38
P2p39
A        Putative replication initiator
P2p41
tin
old      Overcoming Lysogenation Defect


Appendix A3. List of primers used in this study

Primer name        Primer sequence (5’→3’)                    Use
gpW_pAD Forward    AGGAAACAGACCATGGCAATGACAGCGCGTTATCTCG     Cloning gene W in pAD100
gpW_pAD Reverse    GTCCTTGTAGTCTAGAGCACTCACAGGGATGGTTAATG    Cloning gene W in pAD100
gpW_pAD ENDO For   AGGAAACAGACCATGGCCATAAACACCCCGGCGAC       Cloning gene W with 45 base pairs upstream into pAD100
gpW_pMAL Forward   AAGGATTTCAGAATTCATGACAGCGCGTTATCTCG       Cloning gene W in pMal.c4x
gpW_pMAL Reverse   GCAGGTCGACTCTAGATCAACTCACAGGGATGGTTAATG   Cloning gene W in pMal.c4x
R37A Forward       ACACCGGTCGGCTCAGCGGTGATGCGTCGTGAT         Mutating gene W; encoding R37A
R37A Reverse       ATCACGACGCATCACCGCTGAGCCGACCGGTGT         Mutating gene W; encoding R37A
R37D Forward       GCACACCGGTCGGCTCAGATGTGATGCGTCGTGATTAC    Mutating gene W; encoding R37D
R37D Reverse       GTAATCACGACGCATCACATCTGAGCCGACCGGTGTGC    Mutating gene W; encoding R37D
R40A Forward       GCTCACGGGTGATGGCGCGTGATTACGGCTC           Mutating gene W; encoding R40A
R40A Reverse       GAGCCGTAATCACGCGCCATCACCCGTGAGC           Mutating gene W; encoding R40A
D42A Forward       GTGATGCGTCGTGCGTACGGCTCGTTG               Mutating gene W; encoding D42A
D42A Reverse       CAACGAGCCGTACGCACGACGCATCAC               Mutating gene W; encoding D42A
D42R Forward       GGTGATGCGTCGTCGTTACGGCTCGTTG              Mutating gene W; encoding D42R
D42R Reverse       CAACGAGCCGTAACGACGACGCATCACC              Mutating gene W; encoding D42R
W74A Forward       CATGGCAGTGCTGAAAGCGGAACCCCGCGTCAC         Mutating gene W; encoding W74A
W74A Reverse       GTGACGCGGGGTTCCGCTTTCAGCACTGCCATG         Mutating gene W; encoding W74A
P76A Forward       CAGTGCTGAAATGGGAAGCGCGCGTCACCCTGTCATC     Mutating gene W; encoding P76A
P76A Reverse       GATGACAGGGTGACGCGCGCTTCCCATTTCAGCACTG     Mutating gene W; encoding P76A
R77A Forward       CTGAAATGGGAACCCGCGGTCACCCTGTCATC          Mutating gene W; encoding R77A
R77A Reverse       GATGACAGGGTGACCGCGGGTTCCCATTTCAG          Mutating gene W; encoding R77A
p21d_UD_For        ATGATGCTCGCGTTAGGTATGTTTG                 Amplifying pET-21d
p21d_UD_Rev        GGTATATCTCCTTCTTAAAGTTAAA                 Amplifying pET-21d
X_p21d_UD_For      AGAAGGAGATATACCATGAAGACCTTTGCGC           Cloning gene X in pET-21d
X_p21d_UD_Rev      TAACGCGAGCATCATCTCCCACAGATTGAC            Cloning gene X in pET-21d
pADseqFor          GCTGTTGACAATTAATCATCCGGCTCG               Sequencing constructs in pAD100
pADseqRev          CTCAAGACCCGTTTAGAGGCCCCAAGGGG             Sequencing constructs in pAD100
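The forward and reverse mutagenesis primers listed above are expected to be reverse complements of each other, which gives a quick way to check a transcribed primer list for errors. The following is a minimal Python sketch of such a check, using three primer pairs copied from this appendix; the function names are illustrative, and the sketch assumes sequences contain only the unambiguous bases A, C, G and T.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an upper-case DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def is_complementary_pair(forward, reverse):
    """True when the reverse primer is the exact reverse complement of the forward primer."""
    return reverse_complement(forward) == reverse


# Primer pairs copied from the table in this appendix.
PRIMER_PAIRS = {
    "R37A": ("ACACCGGTCGGCTCAGCGGTGATGCGTCGTGAT",
             "ATCACGACGCATCACCGCTGAGCCGACCGGTGT"),
    "R40A": ("GCTCACGGGTGATGGCGCGTGATTACGGCTC",
             "GAGCCGTAATCACGCGCCATCACCCGTGAGC"),
    "D42A": ("GTGATGCGTCGTGCGTACGGCTCGTTG",
             "CAACGAGCCGTACGCACGACGCATCAC"),
}

for name, (forward, reverse) in PRIMER_PAIRS.items():
    print(name, is_complementary_pair(forward, reverse))
```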