University of Calgary PRISM: University of Calgary's Digital Repository

Graduate Studies The Vault: Electronic Theses and Dissertations

2014-10-02 Isolation, characterization, and applications of rhizobiophages

Halmillawewa, Anupama

Halmillawewa, A. (2014). Isolation, characterization, and applications of rhizobiophages (Unpublished doctoral thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/26680
http://hdl.handle.net/11023/1912
doctoral thesis

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca

UNIVERSITY OF CALGARY

Isolation, characterization, and applications of rhizobiophages

by

Anupama P. Halmillawewa

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE

DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF BIOLOGICAL SCIENCES

CALGARY, ALBERTA

AUGUST, 2014

© Anupama Halmillawewa 2014

Abstract

Rhizobiophages are bacteriophages that infect rhizobia. The rhizobia constitute a bacterial group that includes members of several different genera, all capable of nodulating legume roots and providing the plant with a source of fixed nitrogen. Rhizobiophages can play a vital role in the Rhizobium-legume symbiosis by altering the population dynamics of the rhizobia present in the rhizosphere, a property that can be used in agriculture to improve the efficacy of commercial Rhizobium inoculants and mitigate the Rhizobium competition problem. As a prerequisite for the application of such technology, a thorough understanding of rhizobiophage biology is required. Rhizobiophages were isolated from soil samples obtained from Alberta, Saskatchewan, Ontario and British Columbia using several different strains of rhizobia as trapping hosts. The isolated phages were characterized by host range, transmission electron microscopy, protein profiles and genomic analysis. All phage isolates were tailed phages belonging to the order Caudovirales, and further classification placed them in the families Siphoviridae, Myoviridae and Podoviridae. Based on host range, morphotype and trapping host, several phages were selected for detailed genome characterization by whole genome sequencing. Five rhizobiophage genomes were sequenced to finished state (vB_RleM_P10VF, vB_RglS_P106B, vB_RleM_PPF1, vB_MloP_Lo5R7ANS, and vB_MloP_Cp1R7ANS-C2) and annotated. The complete genome sequences of vB_RglS_P106B (KF977490), vB_RleM_PPF1 (KJ746502), vB_RleM_P10VF (KM199770) and vB_MloP_Lo5R7ANS (KM199771) are publicly available from the National Center for Biotechnology Information (NCBI). The genome of vB_MloP_Cp1R7ANS-C2 will be submitted to the NCBI in the near future. The integration of temperate phage vB_RleM_PPF1 into its bacterial host R. leguminosarum F1 was also studied. The site-specific recombination system of the phage targets an integration site that lies within a putative tRNA-Pro (CGG) gene in R. leguminosarum F1. Upon integration, the phage is capable of restoring the disrupted tRNA gene, owing to the 50 bp homologous sequence (att core region) it shares with its rhizobial host. To develop a phage-based inoculant technology, the phages of Rhizobium leguminosarum with the broadest possible host range were selected and tested in nodulation competition assays. The presence of phages altered nodule occupancy by phage-sensitive and phage-resistant strains of rhizobia, increasing the efficacy of nodulation by phage-resistant rhizobia under controlled environmental conditions.


Acknowledgements

Firstly, I am extremely grateful to my supervisor, Dr. Michael Hynes, for believing in me enough to give me an opportunity to work in his lab. I am indebted to him for resurrecting my own faith and rescuing my career in science, at a time when I was reconsidering my decision to pursue a doctoral degree and doubting my abilities after being at the receiving end of a series of rejections by various graduate schools. All his advice, mentorship and constant support during my time in the lab helped me become a better scientist. I am also indebted to Dr. Christopher Yost, who has been more of a co-supervisor than a committee member to me, for all his guidance, support and encouragement. His endless excitement and enthusiasm for anything related to science was contagious and always motivated me to give my best in everything I did.

I am thankful to Dr. Kenneth Sanderson for providing helpful suggestions and feedback on my research over the years. I am also grateful to Dr. Tao Dong and Dr. Antonet Svircev for serving at my thesis defense and providing valuable input on my thesis manuscript. I would also like to extend my gratitude to Dr. Raymond Turner for taking time out from his busy schedule to teach me techniques related to ‘proteomics’.

I greatly appreciate all the support provided by the University of Kelaniya, especially by the members of the Department of Microbiology, including Dr. D. L. Jayaratne and Mr. M.M. Gunawardena. I owe a big ‘thank you’ to Dr. Indrika Abeygunawardena for all her kind advice and encouragement during my struggle to get a placement to do my Ph.D.


I consider myself fortunate to have been a part of the Hynes lab when four wonderful scientists were around. I am thankful to Dr. Hao Ding for his scientific input and help with my research. I also need to thank Dr. Cynthia Yip for all her suggestions during our ‘little-scientific-coffee chats’ and for her friendship. In Dr. Dinah Tambalo I have found one of the most kind-hearted, loving and caring friends, and also a great researcher who never fails to inspire me. My work in the lab wouldn’t have been such a pleasure without the help and friendship of Marcela Restrepo. I sincerely appreciate the friendships of Dinah and Marcela, which made me feel comfortable and secure during my early days in Calgary. I am grateful to Suriakarthiga Ganesan and Rémy Gavard, two project students who worked under my supervision, for their contributions to my research.

Finally, I am very thankful to all my family and friends. I wouldn’t have achieved anything in my life without the never wavering support and unconditional love of my dearest parents. Thank you very much for always being the guiding light in my world and encouraging me to chase my own dreams. And I am blessed to have my baby sister and her little encouraging and loving words in my life.


Dedication

This work is dedicated to my loving parents, my father, Nihal Halmillawewa, and my mother, Ranjani Halmillawewa, for being there with me and making me what I am today.


Table of Contents

Abstract
Acknowledgements
Dedication
Table of Contents
List of Tables
List of Figures and Illustrations
List of Symbols, Abbreviations and Nomenclature

CHAPTER ONE: Introduction
  1.1 Bacteriophages
    1.1.1 Bacteriophage abundance and distribution
    1.1.2 The general infection process of a bacteriophage: the lytic life cycle
    1.1.3 Lysogeny and its consequences
    1.1.4 Phages as agents of horizontal gene transfer
      1.1.4.1 Generalized Transduction
      1.1.4.2 Specialized Transduction
    1.1.5 Classification of bacteriophages
    1.1.6 Bacteriophage genomics
      1.1.6.1 Diversity of phage populations
      1.1.6.2 Bacteriophage genome mosaicism
      1.1.6.3 Phage genome architecture
      1.1.6.4 Prophages and prophage-like elements
      1.1.6.5 Phage metagenomics
    1.1.7 Bacteriophages and their applications
      1.1.7.1 Phages in medicine
      1.1.7.2 Phages in agriculture
      1.1.7.3 Phages in the food industry
  1.2 Rhizobia-legume symbioses
    1.2.1 Competition for nodulation of legumes
    1.2.2 Rhizobiophages
      1.2.2.1 Rhizobiophages as transducing agents
      1.2.2.2 Rhizobiophages in agriculture
      1.2.2.3 Rhizobiophage genomics
  1.3 Research objectives

CHAPTER TWO: Materials and methods
  2.1 Bacterial strains, plasmids and growth conditions
  2.2 Isolation, storage and propagation of rhizobiophages
  2.3 Transmission electron microscopy (TEM)
  2.4 Host range determination
  2.5 One-step growth curves
  2.6 Phage lysis curves
  2.7 Transduction experiments
  2.8 Isolation of phage-resistant strains of Rlv 248SM
  2.9 Plant assays
    2.9.1 Nodulation conditions
    2.9.2 Inoculation with rhizobia and phage
    2.9.3 Nodule staining
  2.10 Analysis of phage proteins
    2.10.1 Preparation of phage samples
    2.10.2 Polyacrylamide gel electrophoresis (SDS-PAGE) and tandem mass spectrometry analysis (LC-MS/MS)
  2.11 Bacterial DNA isolation and manipulation
  2.12 DNA primers and polymerase chain reactions
  2.13 Phage DNA isolation
  2.14 Pulsed field gel electrophoresis (PFGE)
  2.15 Phage genome sequencing and assembly
    2.15.1 Phage vB_RleM_P10VF (P10VF)
    2.15.2 Phage vB_RglS_P106B (P106B)
    2.15.3 Phages vB_MloP_Lo5R7ANS (Lo5R7ANS) and vB_MloP_Cp1R7ANS-C2 (Cp1R7ANS-C2)
    2.15.4 Phage vB_RleM_PPF1 (PPF1)
  2.16 Genome annotation and in silico analysis of the phage genome
  2.17 Isolation of lysogenized strains of R. leguminosarum F1 and prophage induction from the lysogen
  2.18 Southern hybridization
    2.18.1 Preparation of phage DNA probe
    2.18.2 Hybridization
  2.19 Identification of the attachment sites in PPF1 and R. leguminosarum F1 genomes
    2.19.1 Sequencing and analysis of the lysogenized R. leguminosarum F1 genome
    2.19.2 Verification of attachment sites in vivo and site-specific recombination
  2.20 Complementation of Rlv VF39SM exopolysaccharide synthesis mutants
    2.20.1 Complementation
    2.20.2 Bacterial matings
    2.20.3 Eckhardt gel analysis
    2.20.4 Efficiency of plaquing (EOP)

CHAPTER THREE: Characterization of Rhizobium leguminosarum phages
  3.1 Results
    3.1.1 Phage isolation and trapping rhizobial hosts
    3.1.2 Host range of R. leguminosarum phages
    3.1.3 Lysis curves
    3.1.4 Morphologies of selected R. leguminosarum phages
    3.1.5 One-step growth curves
    3.1.6 Genome characterization of R. leguminosarum phages
      3.1.6.1 Restriction enzyme digestion profiles of phage genomic DNA
      3.1.6.2 PFGE
      3.1.6.3 Genome of vB_RleM_P10VF (P10VF)
    3.1.7 Analysis of phage proteins
    3.1.8 The effect of exopolysaccharide and lipopolysaccharide synthesis of R. leguminosarum bv. viciae VF39SM on phage infection
  3.2 Discussion
    3.2.1 General characteristics of R. leguminosarum phages
    3.2.2 Role of exopolysaccharide (EPS) and lipopolysaccharide (LPS) synthesis genes of R. leguminosarum bv. viciae VF39SM on phage infection
    3.2.3 Genome analysis
      3.2.3.1 Genome of vB_RleM_P10VF (P10VF)
  3.3 Author’s contributions

CHAPTER FOUR: Characterization of Rhizobium gallicum phages
  4.1 Results
    4.1.1 Phage isolation and trapping rhizobial host
    4.1.2 Host range of R. gallicum phages
    4.1.3 Lysis curves
    4.1.4 Plaque formation and morphology
    4.1.5 One-step growth curve of P106B
    4.1.6 Genome characterization of P106B
    4.1.7 Analysis of phage proteins
  4.2 Discussion
    4.2.1 General characteristics of R. gallicum phages
    4.2.2 Genome analysis
  4.3 Author’s contributions

CHAPTER FIVE: Characterization of Mesorhizobium loti phages
  5.1 Results
    5.1.1 Phage isolation and trapping rhizobial hosts
    5.1.2 Host range of Mesorhizobium phages
    5.1.3 Plaque formation and morphology
    5.1.4 Characterization of genomes
  5.2 Discussion
    5.2.1 General characteristics of M. loti phages
    5.2.2 Analysis of genomes
  5.3 Author’s contributions

CHAPTER SIX: Characterization of temperate rhizobiophage vB_RleM_PPF1 and its site-specific integration into the rhizobial host genome
  6.1 Results
    6.1.1 General characteristics and host range of PPF1
    6.1.2 Genome characterization of PPF1
    6.1.3 Isolation of lysogenized strains of R. leguminosarum F1 and prophage induction from the lysogen
    6.1.4 Identification of the attachment sites in PPF1 and R. leguminosarum F1 genomes
  6.2 Discussion
    6.2.1 Genome analysis
    Prophage genome arrangement and integration into the host chromosome
  6.3 Author’s contributions

CHAPTER SEVEN: Effect of rhizobiophages on nodulation competitiveness of Rhizobium leguminosarum
  7.1 Results
    7.1.1 Selection of phages and standard plant competition assays
    7.1.2 Determining the effective multiplicity of infection (MOI) for phage inoculation
    7.1.3 Methods of phage inoculation
    7.1.4 Phage effect on nodulation by indigenous rhizobia
    7.1.5 Effect of the phage on the nodulation competitiveness of rhizobia
  7.2 Discussion

CHAPTER EIGHT: General discussion and future directions
  8.1 General discussion
  8.2 Future directions

REFERENCES

APPENDIX I: Growth media

APPENDIX II: Solutions used for electrophoresis, Southern blots, and Eckhardt gel analysis

APPENDIX III: Solutions used in polyacrylamide gel electrophoresis

APPENDIX IV: Solutions used for nodulation competition assays and nodule staining

APPENDIX V: Genome annotations of vB_RleM_P10VF: Open reading frames in the genome and their predicted functions

APPENDIX VI: Genome annotation of vB_RglS_P106B: Open reading frames in the genome and their predicted functions

APPENDIX VII: Genome annotations of vB_MloP_Lo5R7ANS: Open reading frames in the genome and their predicted functions

APPENDIX VIII: Genome annotations of vB_MloP_Cp1R7ANS-C2: Open reading frames in the genome and their predicted functions

APPENDIX IX: Genome annotations of vB_RleM_PPF1: Open reading frames in the genome and their predicted functions

APPENDIX X: Permission for reprints


List of Tables

Table 1-1. Availability of complete genome sequences of rhizobiophages

Table 2-1. Bacterial strains used in this study

Table 2-2. Phages used to isolate a phage resistant Rlv 248SM strain

Table 2-3. Primers used to verify the probable att sites of phage PPF1

Table 2-4. Primers used to complement Rlv VF39pssD::Tn5 strain

Table 3-1. R. leguminosarum phages used in this study and their trapping information

Table 3-2. Host range of selected phages with known rhizobial strains

Table 3-3. Host range of selected phages of R. leguminosarum with 24 indigenous rhizobial strains isolated from different soil samples associated with legume growth

Table 3-4. Morphological characteristics of some of the R. leguminosarum phage isolates

Table 3-5. Host range of 23 R. leguminosarum phages against Rlv VF39SM lipopolysaccharide mutants

Table 3-6. Host range of 23 R. leguminosarum phages against Rlv VF39SM exopolysaccharide mutants

Table 3-7. Efficiency of plaquing (EOP) data for 24 R. leguminosarum phages against EPS mutant VF39pssD::Tn5 strain

Table 3-8. Efficiency of plaquing (EOP) data for 17 R. leguminosarum VF39SM phages against EPS mutant VF39pssD::Tn5 strain and its complemented strains

Table 4-1. R. gallicum phages and their morphological characteristics

Table 4-2. Host range of Rhizobium gallicum phages with known rhizobial strains

Table 4-3. Host range of Rhizobium gallicum phages with indigenous rhizobial strains

Table 4-4. Codon usage of rhizobiophages P106B, L338C, 16-3, RHEph10 and bacterial host R. leguminosarum bv. viciae 3841 for amino acid leucine

Table 5-1. Host range of Mesorhizobium phages with 11 different Mesorhizobium strains

Table 5-2. Host range of Mesorhizobium phages with Rhizobium strains

Table 5-3. Presence of direct terminal repeats (DTRs) in the genomes of T7-like Podoviridae phages

Table 6-1. Host range of phage PPF1 with 13 previously characterized strains of rhizobia and 17 indigenous rhizobial strains

Table 8-1. Summary of the morphotypes of phages isolated with different trapping hosts in this study

Table 8-2. Summary of the five rhizobiophage genomes completed in this study

Table 8-3. Phages that infect rhizobial hosts for which the complete genome has been reported (including the ones completed in this study)


List of Figures and Illustrations

Figure 1-1. Life cycle of bacteriophage λ

Figure 1-2. Generalized transduction of bacteriophages

Figure 1-3. Morphotypes of prokaryotic viruses

Figure 1-4. Mosaicism of gene cassettes among a group of five tailed bacteriophages

Figure 1-5. Comparison of the genomes of known Siphoviridae phages

Figure 3-1. Lysis curves of phages isolated using R. leguminosarum bv. viciae VF39SM as trapping host

Figure 3-2. Lysis curves of phages isolated using R. leguminosarum F3 as trapping host

Figure 3-3. Transmission electron micrographs of R. leguminosarum phages belonging to the family Siphoviridae

Figure 3-4. Transmission electron micrographs of R. leguminosarum phages belonging to the family Myoviridae

Figure 3-5. One-step growth curve of phages P9VFCI and P11VFC

Figure 3-6. Restriction enzyme profiles of DNA isolated from different R. leguminosarum phages

Figure 3-7. Pulsed field gel electrophoresis (PFGE)

Figure 3-8. Genome arrangement of vB_RleM_P10VF

Figure 3-9. Distribution of predicted ORFs in the P10VF genome

Figure 3-10. Protein profiles of selected phages of R. leguminosarum 3841

Figure 3-11. Protein profiles of selected phages of R. leguminosarum VF39SM

Figure 3-12. Protein profiles of selected phages of R. leguminosarum F3

Figure 3-13. The organization of EPS biosynthesis gene cluster of R. leg VF39SM

Figure 3-14. PCR amplified pssD and pssD+pssE genes of R. leguminosarum bv. viciae VF39SM

Figure 3-15. Eckhardt gel analysis of the Rlv VF39pssD::Tn5 and its complemented mutants

Figure 4-1. Lysis curves of phage P106B and P106CI

Figure 4-2. Morphological characterization of phage P106B

Figure 4-3. Transmission electron micrographs of R. gallicum phages

Figure 5-1. Plaque morphology of phage Lo1R7ANS-A

Figure 5-2. Transmission electron micrographs of Mesorhizobium loti phages

Figure 5-3. Pulsed field gel electrophoresis (PFGE)

Figure 5-4. Restriction enzyme profiles of DNA isolated from different M. loti phages

Figure 5-5. Genome arrangement of vB_MloP_Lo5R7ANS

Figure 5-6. Genome arrangement of vB_MloP_Cp1R7ANS-C2

Figure 5-7. Alignment of vB_MloP_Cp1R7ANS-C2 and vB_MloP_Lo5R7ANS genomes

Figure 5-8. Neighbor-joining phylogenetic tree of RNA polymerase of selected Podoviridae phages

Figure 5-9. Sequence alignment of Cp1R7ANS-C2 and Lo5R7ANS T7-like RNA polymerases

Figure 6-1. Transmission electron micrographs of temperate phage PPF1

Figure 6-2. Pulsed field gel electrophoresis (PFGE) of DNA extracted from phage PPF1

Figure 6-3. Restriction enzyme profiles of digested PPF1 phage DNA

Figure 6-4. Genome arrangement of rhizobiophage vB_RleM_PPF1 (PPF1)

Figure 6-5. Southern blot hybridization

Figure 6-6. Agarose gel (1%) showing the PCR products of the 5' and 3' regions of the phage integration into the R. leguminosarum F1 genome

Figure 6-7. The sequence alignments of probable attachment (att) sites of phage PPF1 and its bacterial host Rhizobium leguminosarum F1

Figure 6-8. The intergenic region between the putative ligase and integrase genes in the phage PPF1 genome showing the presence of direct repeats

Figure 6-9. Schematic representation of the secondary structure of the putative tRNA-Pro (CGG)

Figure 6-10. A neighbor-joining phylogenetic tree showing the phylogenetic relationship between the amino acid sequences of 39 different phage integrases

Figure 7-1. Effect of three different phages on nodulation of their respective host Rhizobium strain when co-inoculated with different multiplicities of infection (MOIs)

Figure 7-2. Effect of different phage inoculation methods on nodule occupancy by R. leguminosarum bv. viciae VF39SM and 3841

Figure 7-3. Effect of phage cocktail on nodulation by indigenous rhizobial strains present in four different soil types with histories of legume cultivation

Figure 7-4. The assembly of Magenta jars for plant assays (A) and nodules formed by R. leguminosarum bv. viciae (Rlv) VF39SM and Rlv 248SM-ϕ2(pFPdegP1) during a nodulation competition assay (B)

Figure 7-5. Effect of four different rhizobiophages on nodulation competitiveness of two strains of R. leguminosarum bv. viciae (248SM and VF39degP1)

Figure 7-6. Effect of two different phage cocktails on nodulation competitiveness of two strains of R. leguminosarum bv. viciae (VF39SM and 248SM-ϕ2(pFPdegP1))

Figure 7-7. Effect of two different phage cocktails on nodulation competitiveness of indigenous rhizobia present in three different legume soil samples against R. leguminosarum bv. viciae 248SM-ϕ2(pFPdegP1)

Figure 8-1. Neighbor-joining phylogenetic tree of terminase large subunits of 94 phages

Figure 8-2. Neighbor-joining phylogenetic tree of major capsid proteins of 25 phages


List of Symbols, Abbreviations and Nomenclature

Symbol      Definition
Amp         Ampicillin
CSPD        Disodium 3-(4-methoxyspiro{1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.1(3,7)]decan}-4-yl)phenyl phosphate
CFU         Colony forming units
EOP         Efficiency of plaquing
EPS         Exopolysaccharides
Gm          Gentamycin
gusA        β-glucuronidase
ICTV        International Committee on Taxonomy of Viruses
Km          Kanamycin
LPS         Lipopolysaccharides
LB          Lysogeny broth medium
M.O.I.      Multiplicity of infection
Nm          Neomycin
ORF         Open reading frame
PFGE        Pulsed-field gel electrophoresis
PFU         Plaque forming units
SD          Standard deviation
SDS-PAGE    Sodium dodecyl sulphate-polyacrylamide gel electrophoresis
SEM         Standard error of the mean
Sm          Streptomycin
SM          Suspension medium
TAE         Tris-acetate-EDTA
TBE         Tris-borate-EDTA
Tc          Tetracycline
TEM         Transmission electron microscopy
TY          Tryptone yeast extract medium


Chapter One: Introduction

1.1 Bacteriophages

Bacterial viruses, or bacteriophages, are ubiquitous biological entities that infect bacterial hosts. Since their discovery by Twort (1915) and d’Herelle (1917) (Duckworth, 1976), phages have been a subject of great interest to biological researchers. Their significant impact on bacterial population dynamics, their importance in controlling infectious bacterial diseases, and their abundance and diversity in the environment have earned them an important position in the field of molecular biology. Phage virulence, which results in lysis of the host bacterial cell, is a contributing factor in altering the relative numbers of bacteria in a given community (Bouvier & del Giorgio, 2007; Chibani-Chennoufi et al., 2004; Suttle, 2007; Swanson et al., 2009; Weinbauer & Rassoulzadegan, 2004). In addition, bacteriophages have a notable influence on bacterial diversity, as they play a crucial role in horizontal gene transfer via transduction (Chibani-Chennoufi et al., 2004; Weinbauer & Rassoulzadegan, 2004). Furthermore, bacteriophage research has contributed immensely to establishing the foundation of molecular microbiology (Campbell, 2003). The use of these viruses as therapeutic agents to combat common bacterial diseases, known as ‘phage therapy’, has also been widely considered since their discovery.


1.1.1 Bacteriophage abundance and distribution

Bacteriophages are widely distributed in both terrestrial and aquatic environments. The global phage population is overwhelmingly large: phages have been estimated to outnumber prokaryotic cells in the biosphere by about 10-fold, making them the most abundant biological entity (Brüssow & Hendrix, 2002). With an estimated 10³¹ phages infecting Bacteria and Archaea in the biosphere, they probably represent the most diverse and unexplored genetic reservoir on earth (Hatfull, 2008).

The soil, which is inhabited by a heterogeneous microbial community, may harbour approximately 1.5 × 10⁸ bacteriophages per gram (Ashelford et al., 2003). Soil provides an incredible range of niches with diverse groups of bacteria within them, making it a favourable habitat for bacteriophages. The abundance of phages in soil ecosystems may vary depending on chemical and physical parameters of the soil such as moisture, pH and available organic matter (Srinivasiah et al., 2008). Phage abundance and diversity in the rhizosphere are known to be higher than in other soil areas, owing to the elevated abundance of bacteria near the root surfaces (Swanson et al., 2009). The ample supply of energy sources in the form of root exudates accounts for the increased bacterial population within the rhizosphere. However, a higher VBR (virus-to-bacterium ratio) is observed in bulk soil than in the rhizosphere (Swanson et al., 2009). Although viral production, diversity and activity are higher within the rhizosphere because of elevated host populations, the rhizosphere can also be expected to contain higher concentrations of extracellular enzymes and to have a more acidic pH, both of which can cause viral decay (Williamson, 2011).
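As a worked illustration of the virus-to-bacterium ratio discussed above, the following minimal Python sketch computes the VBR for two soil samples. The function name and all bacterial and rhizosphere counts are hypothetical values chosen only for illustration; only the figure of 1.5 × 10⁸ phages per gram comes from the text (Ashelford et al., 2003).

```python
def virus_to_bacterium_ratio(viruses_per_gram: float, bacteria_per_gram: float) -> float:
    """Return the virus-to-bacterium ratio (VBR) for one soil sample."""
    if bacteria_per_gram <= 0:
        raise ValueError("bacterial count must be positive")
    return viruses_per_gram / bacteria_per_gram


# Hypothetical counts per gram of soil (illustrative only, not measured data).
# The bulk soil phage count reuses the 1.5e8 per gram estimate quoted in the text.
bulk_soil_vbr = virus_to_bacterium_ratio(viruses_per_gram=1.5e8, bacteria_per_gram=1.0e7)
rhizosphere_vbr = virus_to_bacterium_ratio(viruses_per_gram=3.0e8, bacteria_per_gram=1.0e8)

print(f"Bulk soil VBR:   {bulk_soil_vbr:.1f}")
print(f"Rhizosphere VBR: {rhizosphere_vbr:.1f}")
```

With these assumed counts, the bulk soil sample has the higher VBR even though the rhizosphere sample contains more phages per gram, because the bacterial count rises faster than the phage count near the roots.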


1.1.2 The general infection process of a bacteriophage: the lytic life cycle

When a lytic (virulent) phage, such as bacteriophage T4, infects a bacterium, it enters its lytic pathway of reproduction, which results in a large number of progeny viral particles and the lysis of the host bacterial cell. A temperate phage, however, has two developmental choices: it can enter either a lytic cycle or a lysogenic pathway, in which its lytic capacity is switched off and the host survives as a lysogenic bacterial cell (Echols, 1972) (Figure 1-1). Phage-host interactions during infection have been studied using single-step, or one-step, growth curves since the early years of phage biology. Three important parameters of infection can be identified from them: the eclipse period, the latent period and the burst size.
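The way two of these parameters are read from a one-step growth curve can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in this thesis: the function name, the `rise_factor` threshold and the titres are hypothetical, the first sample is assumed to count infected cells (infective centres), and the eclipse period is not estimated because it requires titres from artificially lysed samples.

```python
def one_step_growth_parameters(times_min, pfu_per_ml, rise_factor=2.0):
    """Estimate latent period (min) and burst size from one-step growth data.

    times_min   -- sampling times in minutes, in ascending order
    pfu_per_ml  -- plaque forming units per mL measured at each time
    rise_factor -- hypothetical threshold: the rise is taken to start at the
                   first sample whose titre exceeds rise_factor x the baseline
    """
    if len(times_min) != len(pfu_per_ml) or len(times_min) < 3:
        raise ValueError("need at least three paired time/titre samples")
    baseline = pfu_per_ml[0]  # assumed to represent infected cells
    if baseline <= 0:
        raise ValueError("initial titre must be positive")

    latent_period = None
    for i, titre in enumerate(pfu_per_ml):
        if titre > rise_factor * baseline:
            # Last sampling time before the titre starts to rise.
            latent_period = times_min[i - 1]
            break
    if latent_period is None:
        raise ValueError("no rise in titre detected")

    # Burst size: plateau titre divided by the number of infected cells.
    burst_size = max(pfu_per_ml) / baseline
    return latent_period, burst_size


# Hypothetical data, for illustration only.
times = [0, 10, 20, 30, 40, 50, 60, 70]
titres = [1.0e4, 1.0e4, 1.1e4, 1.2e4, 8.0e4, 4.0e5, 1.0e6, 1.0e6]
latent, burst = one_step_growth_parameters(times, titres)
print(f"Latent period: {latent} min, burst size: {burst:.0f} PFU per infected cell")
```

The resolution of the latent period estimate is limited by the sampling interval, so denser sampling around the rise would give a sharper value.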

The general lytic cycle of a bacteriophage progresses through several tightly programmed steps, whose efficiency and timing depend predominantly on the physiological and metabolic state of the host bacterial cell (Kutter & Sulakvelidze, 2005). Host cell recognition and adsorption of the phage onto the bacterial cell surface is the first step of any phage infection. Tailed phages bind to specific surface molecules on the host bacterium through their tail fibers or tail spikes (Kutter & Sulakvelidze, 2005). Bacterial pili, flagella, outer membrane proteins or extracellular polysaccharides can serve as phage receptors for adsorption (Crook et al., 2013; Rakhuba et al., 2010). These receptors are the key factors determining the specificity of a phage infection, and their number, localization and density govern the efficiency of the infection (Rakhuba et al., 2010). The rate of phage adsorption is influenced by the multiplicity of infection (the ratio of phage to bacteria involved in the infection), the physiological state of the host, and a series of non-specific physicochemical factors such as pH, temperature, and the presence or absence of ions (Kutter & Sulakvelidze, 2005). As phages are obligate intracellular parasites of bacteria, adsorption must be followed by penetration of the phage DNA into the host cell for the infection to proceed. This process of ‘injecting’ the phage genome into the host cytoplasm is specific to different phage groups and is known to involve several DNA transfer mechanisms. In general, an enzymatic activity associated with the tail tip aids in penetrating the peptidoglycan layer and the inner membrane, and the DNA is then released into the host cell cytoplasm. Once inside the host cell, many phages either rapidly circularize their DNA using sticky ends or terminal redundancies, or have the ends of their DNA protected with proteins, to shield it from host exonucleases (Kutter & Sulakvelidze, 2005).

In the succeeding steps, phages redirect host metabolism towards phage-directed metabolism, initially by sequestering the host RNA polymerase through its binding to strong phage promoters. Subsequently, the host transcription machinery is used to express a set of early phage genes that restructure the host system according to phage requirements (Kutter & Sulakvelidze, 2005). This restructuring usually results in complete termination of host macromolecular synthesis. Some phages encode new sigma factors to reprogram the host RNA polymerase (e.g., T4), while others encode their own RNA polymerase to transcribe their genomes (e.g., T7). Transcription, translation and replication are followed by phage assembly. DNA is packaged into a pre-assembled procapsid, and head maturation occurs through enzymatic cleavage of scaffolding and major head proteins (Black, 1989).

Once all the tail and accessory structures are assembled, the lytic cycle is concluded, in most cases, by lysis of the host cell and release of mature phage particles. This is mainly accomplished either by inhibiting peptidoglycan synthesis using a single protein or by enzymatically cleaving the peptidoglycan present in the host cell wall (Borysowski et al., 2006). Host cell lysis by dsDNA phages is achieved using a lysis system composed of two proteins, a holin and an endolysin (Wang et al., 2003). Endolysins are phage-encoded hydrolases that carry one or more of four different muralytic activities against the glycosidic, amide, or peptide bonds of the peptidoglycan and are expressed in the late phase of phage gene expression (Borysowski et al., 2006). The release of endolysins by a phage-infected cell causes degradation of the host peptidoglycan layer, resulting in loss of cell viability. However, to gain access to its substrate, the endolysin must be aided by the activity of holins. Holins are small membrane proteins that mediate the formation of permeabilizing lesions. These lesions allow the endolysins to pass through the membrane and initiate peptidoglycan degradation, resulting in host cell death (Wang et al., 2003).
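The multiplicity of infection introduced above in connection with phage adsorption can be made concrete with a short calculation. The sketch below is an illustration and not part of the methods of this thesis. It computes the MOI from phage and bacterial counts and, under the common assumption that the number of phages adsorbed per cell follows a Poisson distribution, the expected fraction of cells that adsorb at least one phage. The function names and the example counts are hypothetical.

```python
import math


def multiplicity_of_infection(phage_pfu: float, bacteria_cfu: float) -> float:
    """Return the MOI: the ratio of phage particles to bacterial cells."""
    if bacteria_cfu <= 0:
        raise ValueError("bacterial count must be positive")
    return phage_pfu / bacteria_cfu


def fraction_infected(moi: float) -> float:
    """Expected fraction of cells adsorbing at least one phage.

    Assumes Poisson-distributed adsorption, so the fraction of cells with
    zero phages is exp(-MOI).
    """
    return 1.0 - math.exp(-moi)


# Hypothetical inoculum sizes, for illustration only.
moi = multiplicity_of_infection(phage_pfu=5.0e8, bacteria_cfu=1.0e8)
infected = fraction_infected(moi)
print(f"MOI = {moi:.1f}, expected fraction of infected cells = {infected:.4f}")
```

Under this assumption a low MOI leaves a substantial fraction of cells uninfected, which is one reason the effective MOI matters when phages are applied to a bacterial population.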

1.1.3 Lysogeny and its consequences

Apart from the lytic life cycle, temperate bacteriophages are capable of entering a second developmental pathway known as the lysogenic life cycle (Figure 1-1), in which the phage genome is integrated (in most known cases) into the host chromosome, giving rise to a quiescent form of the phage genome known as a ‘prophage’. The replication of prophages is coordinated with the replication of the host genome, and the genes coding for enzymes and proteins required for the lytic life cycle of the phage are down-regulated during lysogeny. Once established, the lysogenic state is extremely stable (Echols, 1972).

However, the lytic life cycle of a prophage can be induced by different environmental factors such as UV irradiation, increased temperature, mutagenic agents and nutrient starvation. Switching from lysogeny to the lytic life cycle occurs due to the depletion of a repressor molecule. In the well-studied lysogeny module of λ phage, the phage repressor is subjected to RecA-stimulated self-proteolysis, resulting in the conversion to lytic growth (Lemire et al., 2011). A large family of lambdoid prophages found in Salmonella genomes regulate their induction using a repressor-antirepressor module (Lemire et al., 2011). During normal lysogenic growth, the antirepressor is repressed by LexA, which also undergoes RecA-stimulated cleavage in the presence of damaged DNA. Once LexA is depleted, antirepressor is synthesized, resulting in the formation of a repressor-antirepressor complex. This binding inactivates the lysogenic repressor, and the lytic life cycle is thereby induced.
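The repressor-antirepressor switch described above can be summarized as a chain of logical dependencies. The following is a minimal Boolean sketch only; the function and variable names are hypothetical, and the all-or-nothing treatment of each protein is a simplifying assumption rather than a quantitative model of induction.

```python
def prophage_state(dna_damage: bool) -> str:
    """Toy Boolean model of the repressor-antirepressor module (illustrative only)."""
    # RecA is stimulated in the presence of damaged DNA.
    reca_active = dna_damage
    # LexA undergoes RecA-stimulated cleavage, so it persists only without active RecA.
    lexa_present = not reca_active
    # LexA represses the antirepressor; once LexA is depleted, antirepressor is made.
    antirepressor_made = not lexa_present
    # Antirepressor binds the lysogenic repressor and inactivates it.
    repressor_active = not antirepressor_made
    return "lysogenic" if repressor_active else "lytic"


if __name__ == "__main__":
    for damage in (False, True):
        print(f"DNA damage={damage}: {prophage_state(damage)}")
```

In this simplification, the only input is the presence of damaged DNA; other inducing factors mentioned in the text are not modelled.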

Prophages are considered important players in the environment, as they may contain genes that are expressed during the lysogenic state and confer new phenotypic traits on the host (Abedon, 2008). This phenomenon, known as ‘lysogenic conversion’, can provide many unique phenotypic traits to the bacterial lysogen, contributing to its survival fitness. Besides providing superinfection immunity to the lysogen against related phages, lysogenic conversion can play a substantial role in the emergence of certain pathogens. Phage-encoded toxins of Corynebacterium diphtheriae, Clostridium botulinum, Streptococcus pyogenes, Staphylococcus aureus and E. coli (Shiga toxin) are key components associated with the pathogenesis of these organisms (Brüssow et al., 2004). Besides contributing prophage-encoded fitness factors to the host lifestyle, lysogenization can also result in loss of gene functions.

Figure 1-1. Life cycle of bacteriophage λ

The phage particle attaches to the host surface using its tail. Once attached, phage DNA is injected into the host cell. Within the host cell cytoplasm, the linear phage DNA molecule is circularized using its cohesive site (cos). In some infected cells, transcription, translation and replication of phage DNA occur, while in other infected cells the phage DNA is integrated into the host chromosome and the phage development process is suppressed. In cells that undergo lytic development, replicated DNA is packaged into proheads, tails are assembled and progeny particles are liberated into the surroundings by host cell lysis.

Reprinted by permission from Macmillan Publishers Ltd., Nature Reviews Genetics (Campbell, 2003), copyright (2014).

1.1.4 Phages as agents of horizontal gene transfer

Bacteriophages contribute to horizontal gene transfer via the process known as transduction. The phage-mediated transfer of DNA from one host bacterium (donor) to another host cell (recipient) was first observed by Norton Zinder and Joshua Lederberg with Salmonella phage P22 (Lederberg et al., 1951).

1.1.4.1 Generalized Transduction

Mistaken packaging of host bacterial DNA into the capsid during the lytic life cycle of a phage can lead to generalized transduction (Figure 1-2) (Griffiths et al., 2012). Generalized transduction was first observed with the temperate phage P22 (Lederberg et al., 1951). A headful of host bacterial DNA that is occasionally packaged in error can be delivered to the recipient cell by the phage. The size of the host DNA fragment that is encapsidated within the phage head approximates the size of the phage genome (Miller, 2001). The defective phage particles that contain bacterial DNA in place of viral DNA can still adsorb and inject their nucleic acid into the recipient bacteria, despite their inability to initiate their own phage-related functions (Fineran et al., 2009).

Generalized transduction can have two possible outcomes: complete transduction and abortive transduction (Benson & Roth, 1997). If the transduced host DNA shares a sufficient degree of sequence similarity with the recipient host chromosomal DNA, homologous recombination can occur, leading to stable and heritable acquisition of the transduced DNA by the recipient host (Benson & Roth, 1997).

Figure 1-2. Generalized transduction of bacteriophages

During the lytic life cycle of the phage, it can occasionally mispackage DNA from the host bacterium, resulting in a transducing phage. These particles can adsorb to another host and inject the DNA into its cytoplasm, where the transduced DNA can undergo homologous recombination with recipient chromosomal DNA. This figure has been reproduced from Griffiths et al. (2012).

In contrast, abortively transduced DNA can persist extrachromosomally within the recipient, evading nuclease-mediated degradation. However, abortively transduced DNA is non-replicative and will be inherited by only one progeny cell at each bacterial cell division.
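The consequence of this unilinear inheritance can be illustrated with a small simulation. The sketch below is an idealized assumption, not a description of any experiment: it assumes synchronous binary division, no loss or degradation of the fragment, and hypothetical function names. Under these assumptions a single abortive transductant gives rise to exactly one fragment-carrying cell among 2^n descendants after n generations.

```python
from typing import List, Tuple


def carriers_after(generations: int) -> Tuple[int, int]:
    """Return (cells carrying the fragment, total cells) after the given divisions."""
    cells: List[bool] = [True]  # start from one recipient carrying the fragment
    for _ in range(generations):
        next_cells: List[bool] = []
        for has_fragment in cells:
            # The fragment is not replicated, so it passes to one daughter only.
            next_cells.extend([has_fragment, False])
        cells = next_cells
    return sum(cells), len(cells)


if __name__ == "__main__":
    for n in (0, 1, 3, 10):
        carriers, total = carriers_after(n)
        print(f"generation {n}: {carriers} carrier(s) among {total} cells")
```

By contrast, a completely transduced (recombined) marker would be replicated with the chromosome and carried by every descendant.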

1.1.4.2 Specialized Transduction

Specialized transduction occurs as a result of the lysogenic life cycle of temperate phages. Incorrect excision of a prophage during its entry into the lytic life cycle can result in acquisition of host chromosomal DNA that flanks the phage genome, resulting in specialized transduction. In contrast to generalized transduction, specialized transducing phages can only obtain genes flanking the prophage insertion site; hence the name ‘specialized transduction’. Also, unlike the generalized transduction process, which results in particles defective in their phage functions, specialized transducing phage particles contain most or all of their phage DNA along with host DNA. Phage λ is a specialized transducing phage that can lysogenize its E. coli host by inserting its genome into the bacterial attachment site, attB, located between the galactose (gal) and the biotin (bio) operons. With λ transducing particles, genes from either the gal or the bio operon can be transduced into a new host.

1.1.5 Classification of bacteriophages

The classical system of phage classification, like any other method used for grouping, allows the easy management of data by condensing the large amounts of available information about phages into groups, within which the individual phages are placed based on their nucleic acid type and their physiological and morphological parameters. Identification of new phages and comparative analysis are greatly aided by such a system. Ideally, a classification system should also be able to outline evolutionary relationships between phages (Lavigne et al., 2008). The invention and advancement of electron microscopy allowed phage biologists to examine the morphology of virions and gather structural parameters of a phage such as capsid symmetry and tail fiber lengths, which resulted in a morphotype-based classification scheme (Nelson, 2004). In 1962, a viral classification system was proposed based on nucleic acid type, capsid shape, presence or absence of an envelope, and number of capsomers (Lwoff et al., 1962), which was later adopted by the ICTV (International Committee on Taxonomy of Viruses) (Ackermann, 2009). The widely used current phage classification system is derived from the method proposed by Bradley (Bradley, 1967), in which the phages were divided into six morphotypes, represented by phages T4, λ, T7, φX174, MS2, and fd (Ackermann, 2009).

Since then, the phage classification system has advanced, and by 2012 approximately 6300 prokaryotic viruses had been classified into 19 families (Figure 1-3) (Ackermann & Prangishvili, 2012). The nature of the nucleic acid (dsDNA, ssDNA, ssRNA and dsRNA) and the virion morphology are used as the defining factors to place these viruses in families (Lavigne et al., 2008). The classified viral population is overwhelmingly dominated by tailed phages (96.3%), and over 3600 members (57.3% overall) are grouped in the family Siphoviridae, making it the largest group of phages with available information (Ackermann & Prangishvili, 2012). All tailed phages belong to the order Caudovirales, with dsDNA genomes and icosahedral or elongated heads. The order contains three main families: Myoviridae, Siphoviridae and Podoviridae. Myoviridae family members have rigid, contractile tails composed of a neck, a contractile sheath and a central tube.

Figure 1-3. Morphotypes of prokaryotic viruses

Prokaryotic viruses that have been described and observed through electron microscopy so far can be tailed (blue), polyhedral (red), filamentous (orange) or pleomorphic (green). HHPV-1, SHI group, and STV1 group are awaiting classification (Ackermann & Prangishvili, 2012). This figure has been modified from Ackermann and Prangishvili (2012). Reprinted by permission from Springer Science + Business Media, Copyright (2014).

The most numerous of the classified phages, the siphophages, possess flexible, noncontractile and long tails. The final family of the order, Podoviridae, is characterized by a fairly short and noncontractile tail (Ackermann, 2009). The other families, which compose ~4% of the prokaryotic viruses studied, belong to more variable groups.
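As a quick consistency check on the proportions quoted above, the approximate member counts can be recovered from the reported total and percentages. This is only a back-of-the-envelope sketch: the total of 6300 is itself approximate, so the derived counts are indicative rather than exact, and the constant and function names are chosen here for illustration.

```python
# Figures quoted in the text (Ackermann & Prangishvili, 2012).
TOTAL_CLASSIFIED = 6300      # approximate number of classified prokaryotic viruses
TAILED_FRACTION = 0.963      # tailed phages (order Caudovirales)
SIPHO_FRACTION = 0.573       # family Siphoviridae, as a share of the total


def approx_count(total: int, fraction: float) -> int:
    """Round a fractional share of the total to the nearest whole virus."""
    return round(total * fraction)


tailed = approx_count(TOTAL_CLASSIFIED, TAILED_FRACTION)
siphoviridae = approx_count(TOTAL_CLASSIFIED, SIPHO_FRACTION)
non_tailed = TOTAL_CLASSIFIED - tailed

if __name__ == "__main__":
    print(f"tailed: ~{tailed}")
    print(f"Siphoviridae: ~{siphoviridae}")
    print(f"non-tailed: ~{non_tailed} ({100 * non_tailed / TOTAL_CLASSIFIED:.1f}%)")
```

The derived Siphoviridae count agrees with the "over 3600 members" stated in the text, and the non-tailed remainder corresponds to the ~4% attributed to the other families.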

Considering the increasing amount of phage genomic and proteomic data added to public databases, and in an attempt to unify the classical morphotype-based classification with the genomic relationships of phages, a new classification scheme has been proposed for the families Podoviridae (Lavigne et al., 2008) and Myoviridae (Lavigne et al., 2009). The phage genera are defined by measuring the relatedness of genomes through the number of homologous/orthologous proteins shared between them, using BLASTP-based tools. This was applied to 55 fully sequenced Podoviridae genomes available in the public databases (Lavigne et al., 2008) and was later extended to reconstruct the classification of 102 Myoviridae phage genomes. Both these attempts were mostly in agreement with the current phage classification of the ICTV (Lavigne et al., 2009).
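The idea of grouping phages by the proportion of shared homologous proteins can be sketched as follows. This is a minimal illustration under stated assumptions, not the published procedure: each genome is represented as a set of protein-family identifiers (assumed to have been assigned beforehand, for example from BLASTP hits), the function names are hypothetical, and the default threshold of 0.4 is an illustrative value that is not taken from the text above.

```python
from typing import Set


def shared_fraction(families_a: Set[str], families_b: Set[str]) -> float:
    """Fraction of protein families in genome A that also occur in genome B."""
    if not families_a:
        return 0.0
    return len(families_a & families_b) / len(families_a)


def same_group(families_a: Set[str], families_b: Set[str], threshold: float = 0.4) -> bool:
    """Group two genomes if each shares at least `threshold` of its proteins with the other.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    return (shared_fraction(families_a, families_b) >= threshold
            and shared_fraction(families_b, families_a) >= threshold)


# Toy genomes with made-up protein-family identifiers.
phage_x = {"f1", "f2", "f3", "f4", "f5"}
phage_y = {"f1", "f2", "f3", "f9"}
phage_z = {"f7", "f8", "f9"}

if __name__ == "__main__":
    print(shared_fraction(phage_x, phage_y), same_group(phage_x, phage_y))
    print(shared_fraction(phage_x, phage_z), same_group(phage_x, phage_z))
```

Requiring the criterion in both directions is a design choice made here so that a small genome contained within a much larger one is not grouped automatically; other normalizations are equally plausible.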

1.1.6 Bacteriophage genomics

Owing to their relatively small size, phage genomes were the first complete genomes to be sequenced. Since the completion of the first ssDNA phage genome in 1977 (the genome of φX174) (Hatfull, 2008), phage genomics has come a long way. By 2011, the NCBI genome database contained around 750 completed phage genomes, the majority of which belonged to members of the order Caudovirales (Hatfull & Hendrix, 2011). Recent advances in sequencing technologies have resulted in a huge influx of sequence data into public databases. Although phages harbor comparatively small genomes and show remarkable diversity and great abundance in the environment, phage genome sequencing lags far behind bacterial genome sequencing (Klumpp et al., 2012). This slow pace of phage genome sequencing can be attributed to various challenges posed by phage genomes. Most of the available sequencing technologies require pure and high-quality DNA, which can be difficult to obtain from many phages. However, the most notable obstacle in sequencing phage genomes is the presence of highly modified DNA. To evade host defense mechanisms, phage genomes are known to contain modified bases, such as methylated bases, making them resistant to many DNA manipulation techniques (Klumpp et al., 2012; Miller et al., 2003; Warren, 1980), including conventional shotgun cloning-based Sanger sequencing. Though the cloning step is avoided in PCR-based next-generation sequencing technologies, the use of methylated DNA as a PCR template can be problematic and may affect the subsequent sequencing process (Klumpp et al., 2012).

Additionally, the available pool of phage genomes is heavily dominated by phages infecting only a few bacterial species. In 2011, the NCBI phage database had around 750 completed genomes, and 70% of these genomes represented only 12 different bacterial hosts (Hatfull & Hendrix, 2011). As a result of this host bias, our understanding of phage genomics is limited to certain groups of phages, representing a very small proportion of the overall diversity. Regardless of these issues, phage comparative genomic studies have contributed immensely to improving our knowledge of phage genome architecture, genome mosaicism and, up to a certain level, the evolution of phages (Casjens, 2005; Hatfull, 2008; Hendrix, 2003).

1.1.6.1 Diversity of phage populations

The sequenced genome pool of phages consists of genomes of varying lengths. The sizes of sequenced genomes range from the ~3300 nucleotides of an ssRNA virus of E. coli to the 500 kilobase pair genome of Bacillus megaterium phage G (Hatfull & Hendrix, 2011).

Remarkable genetic diversity can be observed among phage populations. Nucleotide sequence similarities among genomes of phages with nonoverlapping host ranges are rarely observed (Hatfull, 2008). It is suggested that host preference may act as a barrier to genetic exchange between phages, where phages with common or closely related bacterial hosts can share nucleotide homologies due to the genetic contact among them (Hatfull, 2008; Hatfull & Hendrix, 2011).

1.1.6.2 Bacteriophage genome mosaicism

Bacteriophage genome mosaicism is defined as the presence of patchy sequence similarities within comparable genomes (Casjens & Thuman-Commike, 2011). The dsDNA phage genomes exhibit mosaicism, and the regions of genomes that show variable sequence similarities have been interpreted as parts with different evolutionary histories that have been horizontally exchanged among phages. Simply put, these are genomes with different combinations of modules that are exchangeable within the population. The sizes of these modules and their exchange rates, as well as the phage genomes that harbor them, are observed to be highly variable (Hatfull & Hendrix, 2011). Though mosaicism is a striking feature identified among phage genomes through comparative genomic tools, it is not unique to phages, as it can also be identified in bacteria. However, mosaicism is much more prevalent in phage genomes than in bacterial genomes (Hatfull & Hendrix, 2011). Genetic mosaicism has been described using comparative genomic analysis of the genomes of five temperate coliphages that belong to the families Myoviridae and Siphoviridae (Lawrence et al., 2002) (Figure 1-4). The sets of head genes in phages λ and N15 are closely related, while HK97 and SfV share a head gene set that differs from that of phages λ and N15 but is homologous between the two. Head genes in phage Mu are different from both of these sets. Tail genes also show relatedness, but the similarities are shared among different groups of phages than those of the head genes. The tail gene clusters of the Siphoviridae phages (λ, HK97 and N15) are homologous, whereas phages Mu and SfV have closely related tail gene clusters. The early-expressed gene regions of this group of phages have single genes or small groups of genes as modules of mosaicism, compared to the larger late-expressed head and tail mosaic modules (Lawrence et al., 2002).

Genome mosaicism can be viewed either at the nucleotide level, by heteroduplex mapping and sequence comparison, or at the protein level, by comparing gene products (Hatfull, 2008). The latter can be conveniently applied, especially with phage genomes that do not share any sequence similarity at the nucleotide level.
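The head and tail relationships described above for the five phages can be written down as module assignments to show how relatedness depends on which module is compared. The sketch below simply encodes the groupings stated in the text; the group labels ("H1", "T1", and so on) and the function name are hypothetical and serve only this illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set

# Cassette groupings as described in the text (Lawrence et al., 2002).
# Phages with the same label share a related cassette; labels are arbitrary.
HEAD_GROUP: Dict[str, str] = {"lambda": "H1", "N15": "H1", "HK97": "H2", "SfV": "H2", "Mu": "H3"}
TAIL_GROUP: Dict[str, str] = {"lambda": "T1", "HK97": "T1", "N15": "T1", "Mu": "T2", "SfV": "T2"}


def related_pairs(groups: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Return the set of phage pairs assigned to the same cassette group."""
    return {
        frozenset(pair)
        for pair in combinations(sorted(groups), 2)
        if groups[pair[0]] == groups[pair[1]]
    }


head_pairs = related_pairs(HEAD_GROUP)
tail_pairs = related_pairs(TAIL_GROUP)
related_in_both = head_pairs & tail_pairs             # same partner for heads and tails
related_in_one_module_only = head_pairs ^ tail_pairs  # the mosaic pattern

if __name__ == "__main__":
    print("related in both modules:", [sorted(p) for p in related_in_both])
    print("related in one module only:", [sorted(p) for p in related_in_one_module_only])
```

With these assignments only λ and N15 are related in both head and tail cassettes, whereas pairs such as HK97 and SfV (heads only) or Mu and SfV (tails only) are related in a single module, which is the patchy pattern that defines mosaicism.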

Figure 1-4. Mosaicism of gene cassettes among a group of five tailed bacteriophages

Homologous genes within the cassettes are co-ordinately colored; striped genes indicate that protein products perform analogous functions. This figure has been reproduced from Lawrence et al. (2002).

1.1.6.3 Phage genome architecture

Common genome arrangement themes can be identified among bacteriophages using comparative genomic tools. This gene synteny can be observed in many phage groups; Figure 1-5 indicates the common genome architecture shared by nine siphophages (Brüssow & Desiere, 2001). Such conserved syntenic relationships of phage structural and assembly genes were observed not only among genomes with sequence similarities, but also among phage genomes that have no detectable relatedness even at the amino acid level (Hatfull, 2008).

1.1.6.4 Prophages and prophage-like elements

Many sequenced bacterial genomes carry prophages or prophage-like elements, with an average of three prophages per bacterial genome (based on sequence data available by 2005) (Casjens, 2005). However, this may be an underestimate of the actual number, considering the difficulties and the incomplete methods available for prophage identification (Canchaya et al., 2004). Undeniably, prophages act as important sources of new genes for their lysogenic hosts, not only for the well-known toxin-encoding genes of human pathogens, but also in other instances where the hosts gain increased survival fitness due to phage-encoded factors. It is suggested that these prophage sequences can be helpful in understanding the evolutionary mechanisms and population dynamics of both the phage and its host.

Figure 1-5. Comparison of the genomes of known Siphoviridae phages

Same color codes are used for corresponding genes and similar overall architecture can be identified among the genes coding for virion structure and assembly functions. This figure has been reproduced from Brüssow and Desiere (2001). Reprinted by permission from John Wiley & Sons, Inc., Copyright (2014).

1.1.6.5 Phage metagenomics

Phage metagenomics is a culture-independent sequence-based analysis of phage genomes present in environmental samples (Casas & Rohwer, 2007). The presence of a high proportion of unknown sequences has been consistent with many phage metagenomic studies, complementing the high genetic diversity observed with whole genome sequence data. Though these studies are highly valuable in understanding geographic distribution of phages and phage types, the information provided in metagenomics can be rarely used in elucidating phage genome structure and evolution as the possibility of assembling whole phage genomes from such studies is relatively low at the moment (Hatfull, 2008).

1.1.7 Bacteriophages and their applications

1.1.7.1 Phages in medicine

Phage therapy is a biocontrol approach in which bacteriophages are used to control undesirable bacterial populations such as those associated with different infectious diseases (Loc-Carrillo & Abedon, 2011). Since their discovery, the potential use of bacteriophages as antibacterial agents to combat bacterial diseases has been widely appreciated (Kutter et al., 2010). As few methods were available to treat bacterial infections at that time, the study of bacteriophages as potential antibacterial candidates attracted the attention of many researchers (Gill & Hyman, 2010; Kutter et al., 2010). However, this initial excitement was short-lived, owing first to the ambiguous results of early phage therapy trials and, most importantly, to the discovery of antibiotics as a clinical treatment for bacterial infections (Fischetti et al., 2006). With the increased emergence of pathogenic bacteria resistant to most of the currently available antibiotics, the necessity of reinventing phage therapy has garnered attention. Despite the many challenges associated with using phages as an accepted method in clinical treatment (Carlton, 1999; Goodridge, 2010; Loc-Carrillo & Abedon, 2011; Sulakvelidze et al., 2001), current studies and applications of phage therapy are promising. Many multiple-phage formulations, such as pyophage and intestiphage, have been evaluated at the Eliava Institute in Tbilisi, Georgia, for their efficacy in treating different bacterial infections (Kutateladze & Adamia, 2008). Pyophage is a phage cocktail composed of five different phages, including phages against S. aureus, Streptococcus, Proteus spp., P. aeruginosa, and E. coli, which can be used for the treatment of digestive tract infections, wounds and other infectious dermatological complications (Kutateladze & Adamia, 2008; Markoishvili et al., 2002; Skiena, 2001). The ‘intestiphage’ is used as a treatment for intestinal problems caused by various pathogens including Shigella, Salmonella, Proteus, Staphylococcus, Pseudomonas, and various serovars of enteropathogenic E. coli (Kutateladze & Adamia, 2008).

1.1.7.2 Phages in agriculture

The impact of phages on agriculture is mainly mediated by the bacterial species associated with plants. The phage effect on bacteria can be due to bacterial lysis, positive selection of resistant bacterial strains, or modification of bacteria through phage conversion (Gill & Abedon, 2003). However, the subsequent effect of the phage on a specific plant can be either positive or negative, depending on the type of interaction between the phage's host and the particular plant.

The use of bacteriophages in controlling bacterial pathogens that infect economically important agricultural plants has been reported. Phages have been successfully tested as biopesticides against many phytopathogens (Svircev et al., 2011). The relatively low cost of production and ease in preparing phage-based treatments act in favour of phage biopesticides (Jones et al., 2007). Bacteriophages LIMEstone1 and LIMEstone2 have been successfully used to control potato soft rot disease caused by Dickeya solani, where potato tubers were treated with phages prior to planting (Adriaenssens et al., 2012). Many other studies have successfully implemented bacteriophage-based pathogen control methods, such as those for bacterial wilt in tomato caused by Ralstonia solanacearum (Fujiwara et al., 2011), for fire blight in pome fruits caused by Erwinia amylovora (Boule et al., 2011; Gill et al., 2003; Schnabel & Jones, 2001), for potato scab caused by Streptomyces scabies (McKenna et al., 2001) and for leaf blight in onions caused by Xanthomonas (Nath et al., 2011).

1.1.7.3 Phages in the food industry

Bacteriophages and their derivatives are used as antibacterial agents to control food-borne pathogens and spoilage contaminants in food as well as in food processing environments (Sillankorva et al., 2012). The use of phage-based products to control undesirable bacterial populations in the food industry has been widely recognized in recent years, and many such products are currently used successfully. Intralytix, a USA-based biotechnology company, has introduced several commercially available phage-based products, which can be used against food pathogens such as Listeria monocytogenes (ListShield™), Escherichia coli O157:H7 (EcoShield™) and Salmonella spp. (SalmoFresh™) (Endersen et al., 2014). The US Food and Drug Administration (FDA) has approved all these products for direct application to food. ListShield™ is the first phage-based product to be recognized as safe by the FDA. Since then, many food-borne pathogens have been controlled efficiently using phage-based biotherapeutics (Endersen et al., 2014).

1.2 Rhizobia-legume symbioses

The bacterial group commonly referred to as rhizobia is capable of nodulating leguminous plants, resulting in nitrogen-fixing symbiotic associations. The members of the group include the genera Rhizobium, Bradyrhizobium, Sinorhizobium, Mesorhizobium, Azorhizobium, and Allorhizobium (Willems, 2006). Rhizobia induce the formation of nodules on the legume plant roots, within which atmospheric nitrogen is converted to ammonia that can be utilized by the plant, thus reducing the requirement for nitrogen-containing fertilizers in agricultural fields. The rhizobia-legume symbioses play a significant role in the global nitrogen cycle, accounting for nearly a fourth of the nitrogen fixed annually all over the world (Masson-Boivin et al., 2009). Unlike the commercially available nitrogen fertilizers, which are produced at the cost of non-renewable fossil fuels through the Haber-Bosch process, the nitrogen input by nitrogen-fixing symbioses, also known as biological nitrogen fixation (BNF), is both cost effective and environmentally friendly (Bockman, 1997; Bohlool et al., 1992; Cheng, 2008). For these reasons, it is a recommended practice to inoculate legume seeds with commercially available Rhizobium inoculants in order to meet their nitrogen requirements during growth.

1.2.1 Competition for nodulation of legumes

In many countries, superior nitrogen-fixing rhizobial strains are used as inoculants in agriculture to increase legume crop productivity (Vlassak & Vanderleyden, 1997). These inoculant strains have been optimized for their nitrogen-fixing capabilities with a particular legume crop under controlled conditions, in order to obtain maximum legume yields. However, many of these strains fail to accomplish their purpose under natural conditions. When these beneficial rhizobial strains are released into the soil, which is an extremely complex ecological niche with a highly diversified microbial community, they often fail to exhibit high nodule occupancy due to the presence of indigenous rhizobia (Triplett & Sadowsky, 1992). These indigenous rhizobial strains, which are in many cases relatively ineffective in fixing nitrogen, are highly adapted to their ecological niche and hence capable of nodulating the legume crops, thereby excluding the inoculant strains from the nodules (Hynes & O'Connell, 1990). The inability of rhizobial inoculants to compete with the indigenous rhizobial strains under the prevailing soil conditions, known as “the Rhizobium competition problem” (Dowling & Broughton, 1986; Triplett & Sadowsky, 1992), is a major obstacle in the attempt to improve legume yield using superior strains. Many approaches have been suggested to overcome this problem by enhancing the nodulation competitiveness of the inoculant strains.

Inoculating larger volumes of beneficial rhizobial cultures with high cell densities into the legume soils, in the hope that relatively higher numbers may enhance nodule occupancy by the inoculants over the indigenous strains, is one such approach (Triplett & Sadowsky, 1992). Selecting strains that are naturally highly competitive has also been suggested as a potential solution. However, selecting naturally competitive strains with enhanced nitrogen-fixing capabilities is considered problematic, since many biological and environmental characteristics of the soil may affect the survival of these bacterial strains in the rhizosphere (Dowling & Broughton, 1986). Enhancing the genetic characteristics of nodulating rhizobia to develop them as successful competitors in the rhizosphere environment is another attempt made to address the Rhizobium competition problem. This has been implemented by engineering rhizobial strains that are capable of producing a narrow-spectrum bacteriocin known as trifolitoxin, which is bacteriostatic against certain bacterial species including legume symbionts (Robleto et al., 1997; Robleto et al., 1998; Triplett & Barta, 1987; Triplett et al., 1994). Although the development of trifolitoxin-producing strains with enhanced competitiveness over indigenous rhizobial populations can be considered an effective strategy against the competition problem, it may raise concerns among regulatory bodies because it involves releasing genetically engineered organisms into the environment (Amarger, 2002).

Incorporating into the inoculum an agent that enhances the survivability and competitiveness of the inoculant rhizobial strains without directly altering the strains themselves may be a better strategy to address the rhizobial competition problem, as well as to avoid the need for genetically modified organisms. The natural predation of bacteria by bacteriophages can be of great importance in implementing such a method in agriculture. Co-inoculation of phages with phage-resistant superior strains can provide the rhizobial strains with a competitive advantage by inhibiting the growth of competing indigenous rhizobial strains that are sensitive to the phages.

1.2.2 Rhizobiophages

Rhizobiophages, viruses that infect bacterial species of the rhizobial group, have been widely isolated from legume soils associated with their susceptible hosts. These phages were mainly utilized in phage typing of rhizobial populations (Appunu & Dhar, 2006; Staniewski, 1980), and some of them have been characterized in a preliminary manner using their morphotypes, host range and other physiological characters such as burst size and adsorption rates (Dhar et al., 1978; Dhar et al., 1993; Turska-Szewczuk & Russa, 2000; Turska-Szewczuk et al., 2010; Werquin et al., 1988).

1.2.2.1 Rhizobiophages as transducing agents

Rhizobiophages have been studied for their likely use as vectors in genetic engineering due to their transducing ability (Finan et al., 1984; Lawson et al., 1987; Mink et al., 1982; Shah et al., 1981). Some of the well-known rhizobiophages, such as RL38JI (Buchanan-Wollaston, 1979) and ϕM12 (Finan et al., 1984), are transducing phages. Phage RL38JI is a lytic phage of R. leguminosarum and is capable of transduction from R. leguminosarum strain 1682 to strain 1055 at a frequency greater than 10⁻⁸ (Buchanan-Wollaston, 1979). RL38JI can also transduce between R. leguminosarum strains and between R. trifolii (now considered a symbiovar of R. leguminosarum) strains. Phage ϕM12 is a lytic myophage of Sinorhizobium meliloti 1021 and has been used as a standard tool for genetic manipulation of the bacterium (Finan et al., 1984; Glazebrook & Walker, 1991). Many other phages of S. meliloti, such as N3 (Martin & Long, 1984), DF2 (Casadesus & Olivares, 1979) and phage 11 (Sik et al., 1980), have been identified and used as transducing agents for genetic studies of this rhizobial group.

1.2.2.2 Rhizobiophages in agriculture

The potential significance of rhizobiophages in altering the ecology of various rhizobial populations in the rhizosphere environment has been widely appreciated (Hashem & Angle, 1988; Mendum et al., 2001; Werquin et al., 1988). Apart from having a pivotal role in rhizobial evolution by exerting a positive selective pressure for resistant strains, the presence of rhizobiophages can also affect the legume-Rhizobium symbioses substantially by causing significant changes in the relative numbers of resistant and susceptible rhizobial strains in soil (Ahmad & Morgan, 1994; Hashem & Angle, 1988). The influence of phages on nodule occupancy and nitrogen fixation has been demonstrated using Bradyrhizobium japonicum strains (Hashem & Angle, 1990). The nodulation efficiency of B. japonicum strain USDA 117 was decreased in the presence of a virulent phage, while the population density of the bradyrhizobia also declined within the rhizosphere (Hashem & Angle, 1988). Hashem and Angle (1990) have shown that the co-inoculation of two strains of B. japonicum with rhizobiophages into the soil can cause a significant reduction of nodule occupancy by the phage-sensitive strain while allowing the resistant bradyrhizobia to occupy the greater number of nodules. In the presence of phage, the nodule occupancy by the susceptible strain B. japonicum USDA 117 was reduced to 23%, whereas a 71% increase in nodule occupancy was observed for the phage-resistant B. japonicum USDA 110. Similarly, the relative success of competing R. trifolii strains was altered in the presence of a bacteriophage (Evans et al., 1979). The nodulation efficiency of the phage-sensitive R. trifolii strain SU297/31 was decreased when the phage NT1 was present in the inoculum. This decrease was accompanied by a corresponding increase in nodule occupancy by the phage-resistant SU204. These results reinforce the concept of enhancing the nodule occupancy of superior rhizobial inoculants by using a mixture of phages to which the indigenous rhizobia may be susceptible but the inoculants are resistant. However, very few studies have been done to determine the phage effect on the nodulation of R. leguminosarum, which forms nodules with peas, lentils and other important crops like faba beans.

1.2.2.3 Rhizobiophage genomics

Despite their importance in the rhizosphere environment and related agricultural applications, the genomes of rhizobiophages had been studied very little until recently. In 2012, only two complete rhizobiophage sequences were present in the NCBI phage database. Phage 16-3 of S. meliloti Rm41 (NC_011103) has a genome 60,195 bp in length, while the 57,416 bp genome of S. meliloti 1021 phage PBC5 was deposited under the accession number NC_003324 without any associated publication.

During the past two years, the rhizobiophage genome pool has increased considerably, reaching a total of 15 complete genome sequences of phages infecting rhizobia (Table 1-1). However, this database is dominated by phages infecting R. etli and S. meliloti, whereas only one phage genome (L338C) of rhizobial host R. leguminosarum is available.


Table 1-1. Availability of complete genome sequences of rhizobiophages

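| Phage Name | Host bacterium | GenBank accession No. | Reference |
|---|---|---|---|
| Phage 16-3 | Sinorhizobium meliloti Rm41 | NC_011103 | Deák et al. (2010); Ganyu et al. (2005); Semsey et al. (1999); Semsey et al. (2002) |
| PBC5 | S. meliloti 1021 | NC_003324 | - |
| RHEph01 | Rhizobium etli | JX483873 | Santamaria et al. (2014) |
| RHEph02 | Rhizobium etli | JX483874 | Santamaria et al. (2014) |
| RHEph03 | Rhizobium etli | JX483875 | Santamaria et al. (2014) |
| RHEph04 | Rhizobium etli | JX483876 | Santamaria et al. (2014) |
| RHEph05 | Rhizobium etli | JX483877 | Santamaria et al. (2014) |
| RHEph06 | Rhizobium etli | JX483878 | Santamaria et al. (2014) |
| RHEph08 | Rhizobium etli | JX483879 | Santamaria et al. (2014) |
| RHEph09 | Rhizobium etli | JX483880 | Santamaria et al. (2014) |
| RHEph10 | Rhizobium etli | JX483881 | Santamaria et al. (2014) |
| ϕM12 | S. meliloti 1021 | KF381361 | Brewer et al. (2014) |
| L338C | R. leguminosarum 3841 | KF614509 | Restrepo (2012) |
| RR1-A | R. radiobacter P007 | NC_021560 | Engelhardt et al. (2013) |
| RR1-B | R. radiobacter P007 | NC_021557 | Engelhardt et al. (2013) |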


1.3 Research objectives

Previous studies with B. japonicum and R. trifolii have shown that a phage inoculum has the potential to alter the nodulation dynamics of phage-resistant and phage-susceptible strains within the rhizosphere. Therefore, it can be suggested that a ‘phage-cocktail’ can be used as a solution to address the Rhizobium competition problem. Co-inoculation of a ‘phage-cocktail’ with the phage-resistant inoculant rhizobia can give the inoculant strains a competitive advantage by inhibiting the growth of competing phage-sensitive native strains. The Hynes lab has suggested developing such a ‘phage-cocktail’ containing several different strains of rhizobiophages targeting a wide variety of native rhizobia. This can be used to improve the efficacy of Rhizobium inoculants and mitigate the Rhizobium competition problem. However, a prerequisite for the application of such technology is a thorough understanding of rhizobiophage biology. Detailed characterization of the phage genome accompanied by proteome data is important to screen out potentially problematic phages, such as ones that lysogenize their hosts and could result in the development of phage-resistant indigenous strains. Despite their diversity and abundance in the rhizosphere, rhizobiophages have not been studied extensively, and their diversity and precise role in an ecological context are poorly understood. Around the time this study started, there was a remarkable scarcity of available genomic data on rhizobiophages. Though this situation has been remedied somewhat over the past couple of years, the genomic data available for rhizobiophages are still biased toward phages that infect a small number of rhizobial hosts.

Therefore, this study was initiated to improve our understanding of rhizobiophages of several different rhizobial strains at the physiological, proteomic and genomic levels. It is expected that these data will ultimately serve as a resource in developing not only a phage cocktail that can be used to mitigate the Rhizobium competition problem, but also any other future rhizobiophage-related applications.

The specific objectives of this study were:

1. To isolate a variety of novel rhizobiophages from various legume soil sources obtained from Alberta, Saskatchewan, Ontario and British Columbia using different rhizobial trapping hosts.

2. To perform a preliminary characterization of some of the phage isolates through the host range analysis, growth parameters, and genome and proteome characteristics.

3. To sequence, assemble and annotate genomes of selected rhizobiophages representing diverse rhizobial host strains and different phage morphotypes.

4. To identify phages capable of lysogenization of their rhizobial hosts and to study their integration into host genome.

5. To study the impact of selected phages on the nodule occupancy of the resistant and sensitive rhizobial strains under controlled conditions.


Chapter Two: Materials and Methods

2.1 Bacterial strains, plasmids and growth conditions

The bacterial strains and plasmids used in this study are listed in Table 2-1. All rhizobial strains used for phage isolation and host range determination were grown in tryptone-yeast extract (TY) medium (Beringer, 1974) at 30°C. Strains of Escherichia coli were grown in modified lysogeny broth medium (LB) at 37°C (Sambrook, 1989). The detailed composition of each medium used in this study is included in Appendix I. The concentrations of antibiotics used to grow resistant strains of R. leguminosarum were streptomycin (Sm) 500 µg/ml, gentamicin (Gm) 30 µg/ml, neomycin (Nm) 100 µg/ml, and tetracycline (Tc) 5 µg/ml. When required, resistant strains of M. loti were grown with the following antibiotic concentrations: gentamicin (Gm) 50 µg/ml and neomycin (Nm) 200 µg/ml. The concentrations of antibiotics used to grow resistant E. coli strains were ampicillin (Amp) 100 µg/ml, kanamycin (Km) 50 µg/ml and tetracycline (Tc) 10 µg/ml.

2.2 Isolation, storage and propagation of rhizobiophages

Isolation of phages from soil samples was performed using previously described techniques (Mendum et al., 2001), with a few modifications. Soil samples used for the isolation of phages were obtained from different locations with a history of legume cultivation in Alberta, Saskatchewan, Ontario and British Columbia, Canada. An overnight culture (50 ml) of the trapping rhizobial strain grown in TY broth was inoculated with 5 g of the soil sample and incubated at 30°C with shaking at 150-200 rpm for 24 hours.


Table 2-1. Bacterial strains used in this study

| Strain or plasmid | Relevant Characteristics (a) | Source/Reference(s) |
|---|---|---|
| Rhizobium leguminosarum strains | | |
| 3841 | R. leguminosarum biovar viciae (Rlv), JB300 derivative, Smr, 6 plasmids (a-f) | Poole et al. (1994) |
| VF39SM | Biovar viciae, Smr, 6 plasmids (a-f) | Priefer (1989) |
| VF39pssA::Tn5 | pssA::Tn5 derivative of VF39SM; Smr, Nmr | Ksenzenko et al. (2007) |
| VF39pssC-Km | pssC-Kmr insertion of VF39SM; Smr, Kmr | Sadykov et al. (1998) |
| VF39pssD::Tn5 | pssD::Tn5 derivative of VF39SM; Smr, Nmr | Sadykov et al. (1998) |
| VF39pssF-Km | pssF-Kmr insertion of VF39SM; Smr, Kmr | Sadykov et al. (1998) |
| VF39pssG-Km | pssG-Kmr insertion of VF39SM; Smr, Kmr | Sadykov et al. (1998) |
| VF39pssH-Km | pssH-Kmr insertion of VF39SM; Smr, Kmr | Sadykov et al. (1998) |
| VF39pssI-Km | pssI-Kmr insertion of VF39SM; Smr, Kmr | Sadykov et al. (1998) |
| VF39pssM-Km | pssM-Kmr insertion of VF39SM; Smr, Kmr | Sadykov et al. (1998) |
| VF39pssT-Km | pssT-Kmr insertion of VF39SM; Smr, Kmr | Sadykov et al. (1998) |
| VF39-86B | lpc::Tn5 derivative of VF39SM; Smr, Nmr | Priefer (1989) |
| VF39-51 | lpc::Tn5 derivative of VF39SM; Smr, Nmr | Priefer (1989) |
| VF39-48B | lpc::Tn5 derivative of VF39SM; Smr, Nmr | Priefer (1989) |
| VF39-23 | lpc::Tn5 derivative of VF39SM; Smr, Nmr | Priefer (1989) |
| VF39-32 | lpc::Tn5 derivative of VF39SM; Smr, Nmr | Priefer (1989) |
| VF39pssD::Tn5/pSRKGm-pssD | VF39pssD::Tn5 complemented with pSRKGm carrying pssD gene from Rlv VF39SM; Smr, Nmr, Gmr | This study |
| VF39pssD::Tn5/pSRKGm-pssDE | VF39pssD::Tn5 complemented with pSRKGm carrying pssD and pssE genes from Rlv VF39SM; Smr, Nmr, Gmr | This study |
| F3 | Field isolate from Saskatchewan, isolated from pea root nodules | Frost and Yost (Unpublished) |
| F1 | Field isolate from Saskatchewan, isolated from pea root nodules | Frost and Yost (Unpublished) |
| 248 | Isolated from Vicia faba | Josey et al. (1979) |
| 248SM | Smr derivative of 248 | Hynes lab |
| 248SM-ϕ2(pFPdegP1) | Phage resistant derivative of 248SM with pFPdegP1; Smr, Tcr | This study |
| VF39(pFPdegP1) | Derivative of VF39SM with pFPdegP1; Smr, Tcr | Yost lab (Unpublished) |
| 3855 | Smr derivative of 128C53 | Brewin et al. (1982) |
| 336 | Biovar viciae, isolated from pea root nodules, Rothamsted strain 1007 | Hirsch (1979) |
| 306 | Biovar viciae, isolated from pea root nodule | Josey et al. (1979) |
| 309 | Biovar viciae, isolated from pea root nodule | Josey et al. (1979) |
| W14-2 | Biovar trifolii, 4 plasmids (a to d); Smr | Baldani et al. (1992) |
| 8401 | Biovar phaseoli 8002; Smr, cured of plasmid pSym (pRL1JI) | Lamb et al. (1982) |
| Rhizobium etli strains | | |
| CE3 | Smr derivative of CFN42, 6 plasmids (a to f) | Noel et al. (1984) |
| Rhizobium tropici strains | | |
| CIAT 899T | Isolated from Phaseolus vulgaris nodules in Colombia | Martínez-Romero et al. (1991) |
| Rhizobium leucaenae strains | | |
| CFN 299T | Isolated from P. vulgaris nodules | Martínez-Romero et al. (1991) |
| Rhizobium gallicum strains | | |
| R60spT | Biovar gallicum, isolated from P. vulgaris | Geniaux et al. (1993); Amarger et al. (1997) |
| PhP222 | Biovar gallicum, isolated from P. vulgaris | Geniaux et al. (1993) |
| PhF29 | Biovar gallicum, isolated from P. coccineus | Geniaux et al. (1993) |
| PhD12 | Biovar phaseoli, isolated from P. vulgaris | Geniaux et al. (1993) |
| PhI21 | Biovar phaseoli, isolated from P. vulgaris | Geniaux et al. (1993) |
| S014B-4 (6) | Isolated from soil associated with Vicia cracca. Identified by 16S rRNA gene sequencing. | Hynes Lab (Unpublished) |
| S004A-2 (14) | Isolated from soil associated with V. cracca. Identified by 16S rRNA gene sequencing. | Hynes Lab (Unpublished) |
| S013A-1 (15b) | Isolated from soil associated with Vicia americana. Identified by 16S rRNA gene sequencing. | Hynes Lab (Unpublished) |
| S019B-5 (27) | Isolated from soil associated with V. americana. Identified by 16S rRNA gene sequencing. | Hynes Lab (Unpublished) |
| Rhizobium spp. | | |
| S023A-4 (1) * | Isolated from soil associated with V. americana | Hynes Lab (Unpublished) |
| S022A-2 (2) * | Isolated from soil associated with V. americana | Hynes Lab (Unpublished) |
| S008A-5 (8) * | Isolated from soil associated with V. americana | Hynes Lab (Unpublished) |
| S015A-1 (12) * | Isolated from soil associated with V. americana | Hynes Lab (Unpublished) |
| S030B-4 (13b) * | Isolated from soil associated with V. americana | Hynes Lab (Unpublished) |
| S020A-5 (16a) * | Isolated from soil associated with V. americana | Hynes Lab (Unpublished) |
| S002A-4 (17a) * | Isolated from soil associated with V. americana | Hynes Lab (Unpublished) |
| S011B-1 (18) * | Isolated from soil associated with V. americana | Hynes Lab (Unpublished) |
| S028A-4 (21) * | Isolated from soil associated with V. americana | Hynes Lab (Unpublished) |
| S010B-1 (24) * | Isolated from soil associated with V. americana | Hynes Lab (Unpublished) |
| S018A-3 (3) * | Isolated from soil associated with V. cracca | Hynes Lab (Unpublished) |
| S018A-1 (5) * | Isolated from soil associated with V. cracca | Hynes Lab (Unpublished) |
| S001B-3 (7) * | Isolated from soil associated with V. cracca | Hynes Lab (Unpublished) |
| S016B-1 (10) * | Isolated from soil associated with V. cracca | Hynes Lab (Unpublished) |
| S016A-3 (11) * | Isolated from soil associated with V. cracca | Hynes Lab (Unpublished) |
| S027B-3 (4a) * | Isolated from soil associated with Lathyrus venosus | Hynes Lab (Unpublished) |
| S012A-1 (9) * | Isolated from soil associated with Lathyrus venosus | Hynes Lab (Unpublished) |
| S012A-2 (19) * | Isolated from soil associated with Lathyrus venosus | Hynes Lab (Unpublished) |
| S012B-3 (20) * | Isolated from soil associated with Lathyrus venosus | Hynes Lab (Unpublished) |
| S010A-2 (23a) * | Isolated from soil associated with Lathyrus venosus | Hynes Lab (Unpublished) |
| Sinorhizobium meliloti strains | | |
| AK631 | Wild type | Kondorosi et al. (1989) |
| 102F34 | Wild type | Becker et al. (1993) |
| ML2 | Wild type | Trevor Charles lab collection, University of Waterloo (Unpublished) |
| Mesorhizobium loti strains | | |
| R7A | Field re-isolate of ICMP3153, isolated from nodules of Lotus corniculatus, wild type symbiotic strain | Sullivan et al. (1995) |
| R7ANS | Non-symbiotic derivative of R7A; lacks ICEMlSymR7A | Ramsay et al. (2006) |
| R7ArpoN1pFUS2 | R7A derivative, rpoN1::lacZ, pFUS2, insertion duplication mutant in which the coding sequence is disrupted, Nmr | Sullivan et al. (2013) |
| R7AnadCTn5 | nadC::Tn5 derivative of R7A, nicotinate auxotroph, Gmr | Sullivan et al. (2001) |
| Mesorhizobium spp. | | |
| NZP2037, NZP2213, NZP2234, NZP2298 | Strains of diverse geographical origin (from New Zealand) | Jarvis et al. (1982) |
| R97A | Strain with symbiosis island. Isolated from nodules of Lotus corniculatus. | Sullivan et al. (1995) |
| R90A | Strain with symbiosis island. Isolated from nodules of L. corniculatus. | Dr. John Sullivan’s lab collection, NZ (Unpublished) |
| R16C, R88B | Strain with symbiosis island. Isolated from nodules of L. corniculatus. | Sullivan et al. (1995) |
| 8KC3 | Soil isolate from New Zealand | Sullivan et al. (1995) |
| Escherichia coli strains | | |
| S17.1 | Mobilizer strain, RP4 tra region, Spr | Simon et al. (1983) |
| DH5α | endA1, hsdR17, supE44, thi-1, recA1, gyrA96, relA1, Δ(argF-lacZYA)U169, φ80dlacZΔM15 | Invitrogen |
| Plasmids | | |
| pBluescriptSK+ | Cloning vector, Ampr | Stratagene |
| pFPdegP1 | pFus1 with par locus of RK2 and degP1 fusion, Tcr | Yost lab (Unpublished) |
| pSRKGm | pBBR1MCS-5-derived broad host range vector containing lac promoter and lacIq and lacZα+; Gmr | Khan et al. (2008) |
| pGEM®-T Easy Vectors | PCR cloning vector; Ampr | Promega |

a Abbreviations: Smr- Streptomycin resistant, Spr – Spectinomycin resistant, Nmr- Neomycin resistant, Gmr- Gentamicin resistant, Ampr- Ampicillin resistant, Tcr- Tetracycline resistant

* The indigenous strains were classified according to their plasmid profiles and the number within brackets indicates their profile group for easy identification.


The supernatant fluid was decanted into a centrifuge tube and treated with 20% (v/v) chloroform. The rhizobial cellular debris was removed by centrifugation at 17,000 x g at 4°C for 10 minutes using a Sorvall RC-5B refrigerated centrifuge. The top translucent supernatant was filtered using 0.2 µm nitrocellulose filters and used in plaque assays using the agar double-layer technique (Adams, 1959). Briefly, 500 µl of exponentially growing rhizobial host culture and 100 µl of filtered supernatant were added to a tube containing 5 ml of melted soft TY agar (0.5% agar). The contents of the tubes were mixed by vortexing briefly and poured evenly on a TY agar plate. Plates were allowed to solidify and incubated at 30°C for 24 hours. After the incubation, individual plaques were picked with a toothpick, suspended in 1 ml of TY with 20% chloroform, vortexed and centrifuged. The supernatant (100 µl) was re-plated using the agar overlay technique with the respective rhizobial host. Each of the phage isolates with unique plaque morphologies was purified with three successive single plaque isolations and was subsequently stored at 4°C for further characterization.

2.3 Transmission electron microscopy (TEM)

Purified phage lysate was concentrated by ultracentrifugation at 70,000 x g for 40 minutes at 4°C using an Optima™ MAX-E ultracentrifuge (Beckman Coulter Inc.). The pellet was suspended in 20 µl of suspension medium (SM) (Sambrook, 1989) (Appendix I) without gelatin. A formvar carbon-coated copper grid was placed on a drop of concentrated phage suspension for 3 minutes, followed by 30 seconds of staining with 1% uranyl acetate. The grids were observed using a Hitachi-7650 transmission electron microscope (Microscopy and Imaging Facility, University of Calgary) and images were taken with the AMT image capture engine (Advanced Microscopy Techniques, Corp., Woburn). The AMT software was used for the determination of phage dimensions.

2.4 Host range determination

The ability of phage to lyse different rhizobial strains was screened using plaque assays (Adams, 1959). Briefly, lawns of the rhizobial strains grown on TY plates were spotted with 10 µl of the rhizobiophage suspension. The plates were observed after an overnight incubation at 30°C. The type of lysis observed was reported as either “lysis” for clear and turbid plaques or “no lysis” for no plaque formation.

2.5 One-step growth curves

The one-step growth curves were performed as described by Petty et al. (2006). An overnight culture of the trapping rhizobial strain (10 ml) with OD600 0.2-0.5 was centrifuged and the pellet was resuspended in 9 ml of TY broth. One ml of appropriately diluted phage lysate was added to the cells to give an approximate multiplicity of infection (MOI) of 0.001-0.01, followed by a 2-5 minute incubation at room temperature for phage adsorption. Samples were centrifuged and the supernatant was discarded to remove any unadsorbed phages. The pellet was resuspended in 10 ml of fresh TY broth and the suspension was added to a flask containing 15 ml of fresh TY broth. Cultures were grown with continuous aeration at 30°C and samples of 200 µl were withdrawn every 20 minutes. A 100 µl aliquot of the sample was appropriately diluted and plated using the double-layer method. The other 100 µl was treated with 20% chloroform, serially diluted, and plated. A control culture containing only the phage in TY broth was also treated as described above.
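The arithmetic behind a one-step growth experiment can be sketched as follows. This is a minimal illustration and not the procedure used in this study: the function names and all numerical inputs are hypothetical, and burst size is taken here, by a common convention, as the plateau titre divided by the number of infective centres at the start of the experiment.

```python
def titre_pfu_per_ml(plaques, plated_volume_ml, dilution):
    """Phage titre from a plaque count.

    dilution is the fraction of the original sample that was plated,
    e.g. 1e-6 for a 10^-6 dilution.
    """
    return plaques / (plated_volume_ml * dilution)


def multiplicity_of_infection(phage_pfu_per_ml, phage_volume_ml,
                              cells_cfu_per_ml, cell_volume_ml):
    """Ratio of added phage particles to host cells."""
    total_phage = phage_pfu_per_ml * phage_volume_ml
    total_cells = cells_cfu_per_ml * cell_volume_ml
    return total_phage / total_cells


def burst_size(infective_centres_per_ml, plateau_pfu_per_ml):
    """Average number of progeny phages released per infected cell (assumed convention)."""
    return plateau_pfu_per_ml / infective_centres_per_ml


if __name__ == "__main__":
    # Hypothetical example: 1 ml of diluted lysate added to 9 ml of cells.
    moi = multiplicity_of_infection(1e6, 1.0, 1e8, 9.0)
    print(f"MOI = {moi:.4f}")
    print(f"Burst size = {burst_size(1e4, 5e5):.0f}")
```

With these illustrative inputs the MOI falls within the 0.001-0.01 range targeted above; real values depend on the measured titre of the lysate and the viable count of the host culture.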


2.6 Phage lysis curves

A 25 ml volume of host rhizobial culture was grown to OD600 ≈ 0.2 with constant aeration at 30°C. The culture was challenged with 1 ml of undiluted phage lysate and grown with shaking at 30°C. Samples were withdrawn at hourly intervals for 10-12 hours and OD600 of the samples was recorded to observe the onset of lysis.

2.7 Transduction experiments

Phage lysates were prepared as described above using broth cultures of defined mutant strains of rhizobia as their bacterial host. An auxotrophic mutant of R. gallicum SO14B-4 (6) purH::Tn5 (Thraya L., Hynes Lab, Unpublished) was used for the phages of R. gallicum. Two mutant strains of M. loti, R7ArpoN1pFUS2 and R7AnadCTn5, were used for phages of M. loti (Sullivan et al., 2001; Sullivan et al., 2013). Previously described methods were used for transduction experiments (Buchanan-Wollaston, 1979). Briefly, the recipient rhizobial cells were infected with phage suspensions prepared with mutant rhizobial hosts at different MOIs (approximately 0.001, 0.01, 0.1, 1 and 10) and incubated for 45 minutes at 30°C. After the incubation, cells were collected by centrifugation (12,000 x g for 2 minutes), resuspended in sterile distilled water and plated on TY agar plates with appropriate antibiotics. For the experiments with M. loti, resuspended cells were plated on G/RDM (Rhizobium defined medium with glucose) plates with appropriate antibiotics (Appendix I).


2.8 Isolation of phage-resistant strains of Rlv 248SM

An overnight broth culture of Rlv 248SM (OD600 = 0.2-0.5) was challenged with a ‘cocktail’ of five phages. Combinations of phage used are listed in Table 2-2. A tube containing 5 ml of Rlv 248SM culture was inoculated with 200 µl of each phage in the cocktail and incubated at 30°C for 48 hours. After the incubation, rhizobial cells were pelleted and washed three times with fresh TY broth. The final pellet was resuspended in 200 µl of TY broth and used to inoculate a tube containing 5 ml TY broth. The tube was also inoculated with 200 µl of each phage in the cocktail and incubated at 30°C for 48 hours. Pelleting the cells, washing with TY broth, and using the pellet as an inoculum with the phage cocktail was repeated two more times. Finally, cells were washed with TY broth and streaked on TY agar plates to isolate a phage-resistant strain of Rlv 248SM. The pFPdegP1 (Yost lab, Unpublished) was mobilized into the Rlv 248SM phage-resistant strain (Sadykov et al., 1998) using E. coli S17.1 (Simon et al., 1983) and transconjugants were selected with appropriate antibiotics.

2.9 Plant assays

2.9.1 Nodulation conditions

Pisum sativum cv. Trapper (trapper pea) seeds were surface sterilized using 2.5% hypochlorite for 5 min, followed by washing with 70% ethanol for 5 min. The seeds were then rinsed with sterile distilled water three times, before leaving them for 3-5 days on water agar plates for germination. Two germinated seedlings were transferred to each Magenta jar containing sterile vermiculite. The assembly of Magenta jars resembled that of Leonard jars, with vermiculite used as the solid support for plant roots.


Table 2-2. Phages used to isolate a phage resistant Rlv 248SM strain

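| Phage cocktail | Rhizobiophage | Reference |
|---|---|---|
| Cocktail 1 | L338C | (Restrepo, 2012) |
| Cocktail 1 | P11VFA | This Study |
| Cocktail 1 | P11VFC | This Study |
| Cocktail 1 | P10VF | (Restrepo, 2012) |
| Cocktail 1 | B1VFA | This Study |
| Cocktail 2 | C2F3A1 | This Study |
| Cocktail 2 | C2F3A2 | This Study |
| Cocktail 2 | AF3 | This Study |
| Cocktail 2 | V1VFA | This Study |
| Cocktail 2 | V1VFB | This Study |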


The initial set up contained nitrogen-free plant medium (Appendix IV) and afterwards plants were watered using sterile distilled water. Inoculated plants were grown for four weeks in a growth chamber with 16 hours of light/ day at 20 °C. After 4 weeks, the nodules were harvested and counted.

2.9.2 Inoculation with rhizobia and phage

For single bacterial inoculations, 1 ml of log phase culture of rhizobia (OD600≈ 0.5-0.7) was used for each pot. For rhizobial competition assays, the OD600 of the bacterial broth cultures were adjusted before mixing the cultures according to the required volume ratios and 1 ml of this mixed rhizobial culture was used to inoculate the seedlings in each pot.

When soil was used as the inoculum, 10 g of the soil sample was suspended in 50 ml of sterile distilled water and 1 ml of the supernatant was removed to treat each pot. For the co-inoculation of seedlings with phage, 1 ml of different dilutions of phage suspensions was used according to the required MOI. When a phage cocktail was used as the co-inoculum, undiluted phage suspensions were mixed in equal volume ratios and 1 ml of the mixture was used for the inoculation.
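The OD600 adjustment and volume-ratio mixing described above can be illustrated with a short calculation. This is only a sketch under stated assumptions: it assumes OD600 scales linearly with dilution (C1V1 = C2V2), and the function names, target OD and ratios are hypothetical and are not taken from this study.

```python
def dilute_to_target_od(current_od, target_od, final_volume_ml):
    """Return (culture_ml, diluent_ml) needed to reach target_od.

    Assumes OD600 is proportional to cell density over the range used.
    """
    if current_od <= 0 or target_od <= 0:
        raise ValueError("OD values must be positive")
    if target_od > current_od:
        raise ValueError("a culture cannot be concentrated by dilution")
    culture_ml = target_od * final_volume_ml / current_od
    return culture_ml, final_volume_ml - culture_ml


def mix_by_ratio(ratio, total_volume_ml):
    """Split total_volume_ml among OD-adjusted cultures according to ratio, e.g. (1, 9)."""
    total_parts = sum(ratio)
    return [total_volume_ml * part / total_parts for part in ratio]


if __name__ == "__main__":
    culture, diluent = dilute_to_target_od(0.7, 0.5, 10.0)
    print(f"Mix {culture:.2f} ml culture with {diluent:.2f} ml broth")
    print(mix_by_ratio((1, 9), 1.0))
```

A 1:9 mix of two cultures adjusted to the same OD600 gives an approximate 1:9 cell ratio only if both strains have a similar relationship between optical density and viable count.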

2.9.3 Nodule staining

When the bacterial inoculum contained a strain with a reporter gene (gusA), harvested nodules were briefly surface sterilized with 2.5% hypochlorite for 1 minute, then washed with 70% ethanol for 1 minute, followed by three rinses in sterile distilled water. Roots with nodules were placed in a conical centrifuge tube containing a staining solution (Appendix IV) (≈ 30 ml) and incubated at 30°C overnight with continuous shaking (Wilson et al., 1995b).


2.10 Analysis of phage proteins

2.10.1 Preparation of phage samples

Samples for the analysis of structural proteins were prepared by concentrating the purified phage lysates by ultracentrifugation at 70,000 x g for 40 minutes at 4°C using an Optima™ MAX-E ultracentrifuge (Beckman Coulter Inc.) and resuspending the pellets in 20 µl of suspension medium (Sambrook, 1989) without gelatin. The phage samples were further purified and concentrated using an Amicon Ultra-0.5 centrifugal filter device with a 100,000 NMWL (nominal molecular weight limit). Approximately 500 µl of concentrated phage sample was filtered using the device at 14,000 x g for 20 minutes and the concentrate was recovered by resuspending in 50 µl of SM without gelatin.

2.10.2 Polyacrylamide gel electrophoresis (SDS-PAGE) and tandem mass spectrometry analysis (LC-MS/MS)

The phage preparations were denatured at 100°C for 10 minutes in Laemmli solubilisation buffer with β-mercaptoethanol (Laemmli, 1970) (Appendix III) and separated on a 12% SDS-PAGE gel at 180 V for 40-50 minutes using a mini-PROTEAN cell (BioRad). PageRuler™ Plus pre-stained protein ladder (Fermentas Life Sciences) was used as the molecular size marker. The gels were stained in 0.2% Coomassie Brilliant Blue R250 (Sigma-Aldrich) in acetic acid (10%) and methanol (40%). Destaining was performed with a solution containing 40% methanol and 10% acetic acid (v/v). The protein bands of interest were excised and identified by liquid chromatography tandem mass spectrometry (LC-MS/MS) at the Southern Alberta Mass Spectrometry (SAMS) Centre, University of Calgary. The LC-MS/MS data were analysed using Mascot v2.1 (Matrix Science).

2.11 Bacterial DNA isolation and manipulation

DNA manipulations were performed according to standard techniques (Sambrook, 1989), unless otherwise mentioned. Total genomic DNA of bacteria was isolated using the mi-Bacterial Genomic DNA kit (Metabion) or EZ-10 Spin column bacterial DNA mini-preps kit (BioBasic Inc). Plasmids were isolated using EZ-10 Spin column plasmid DNA mini-preps kit (BioBasic Inc). Restriction endonucleases used in this study were purchased from Invitrogen, Fermentas, or New England Biolabs and were used according to the manufacturer’s instructions. Preparation of chemically competent E. coli cells and bacterial transformation were performed according to standard protocols (Sambrook, 1989). Gel electrophoresis was performed using 0.8-1.2% (w/v) agarose (Invitrogen) gels, and gels were electrophoresed using 1 X TAE buffer (Appendix II).

2.12 DNA primers and polymerase chain reactions

Primers were synthesized by IDT (Integrated DNA Technologies) and the amplifications were carried out using a Techne TC-312 thermal cycler. Either Taq PCR master mix kit (Qiagen) or Phusion® High-Fidelity PCR master mix (New England Biolabs) was used to set up the PCR reactions based on the requirements. When necessary, PCR products were purified with EZ-10 Spin Column PCR Products Purification Kit (Bio Basic Inc.). A typical PCR reaction included initial denaturation at 94°C for 4 minutes and 30 cycles of the following conditions: (1) denaturation at 94°C for 1 minute; (2) annealing for 1 minute at a temperature 2-5°C below the melting temperature (Tm) of the primer pair used; (3) extension at 72°C for 1-3 minutes. The thermal cycles were concluded with a final elongation at 72°C for 10 minutes. The extension times in the thermal cycles were varied according to the different polymerases used for each reaction.
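The typical cycling conditions above can be written down as a small data structure, which also makes the total hold time easy to compute. This is a hedged sketch: the function names are hypothetical, ramp times between temperatures are ignored, and taking the lower of the two primer Tm values as the reference for the annealing temperature is an assumption that is not stated in the text.

```python
def annealing_temperature(primer_tms_c, offset_c=5.0):
    """Annealing temperature set offset_c (2-5 degrees C) below the lower primer Tm (assumed rule)."""
    if not 2.0 <= offset_c <= 5.0:
        raise ValueError("offset is expected to be between 2 and 5 degrees C")
    return min(primer_tms_c) - offset_c


def build_program(primer_tms_c, extension_min=1.0, cycles=30, offset_c=5.0):
    """Return the cycling program as (step name, temperature in C, minutes) tuples."""
    anneal_c = annealing_temperature(primer_tms_c, offset_c)
    return {
        "initial": ("initial denaturation", 94.0, 4.0),
        "cycles": cycles,
        "cycle": [
            ("denaturation", 94.0, 1.0),
            ("annealing", anneal_c, 1.0),
            ("extension", 72.0, extension_min),
        ],
        "final": ("final elongation", 72.0, 10.0),
    }


def total_hold_minutes(program):
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(minutes for _, _, minutes in program["cycle"])
    return program["initial"][2] + program["cycles"] * per_cycle + program["final"][2]


if __name__ == "__main__":
    program = build_program([58.0, 60.0], extension_min=1.0)
    print(program["cycle"])
    print(total_hold_minutes(program), "minutes")
```

Longer extension times (up to 3 minutes in the conditions above) lengthen the run by the extra time multiplied by the number of cycles.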

2.13 Phage DNA isolation

Phage DNA extractions were performed according to previously described methods (Lech et al., 2001). Approximately 50 ml of the phage lysate was treated with DNase (0.05 mg/ml) and RNase (0.25 mg/ml) and incubated at 37°C for 1 hour. The phage particles were concentrated by centrifugation using an Optima L-90K ultracentrifuge (Beckman Coulter Inc.) at 85,000 x g for 2 hours at 4°C. The phage pellet was suspended in 200 µl of 0.05 M Tris-Cl, treated with 200 µl of buffered phenol to denature the capsid, and vortexed for 20 minutes to release the phage DNA. The DNA was extracted with phenol:chloroform extractions at a 1:1 volume ratio followed by a precipitation using 3 M sodium acetate and 2 volumes of 95% ethanol. The final pellet was washed with 1 ml of 70% ethanol and resuspended in 50 µl sterile distilled H2O. Restriction enzymes with different methylation sensitivities were used to characterize the isolated DNA, and included: EcoRI, XbaI, DraI, NsiI, HindIII, MboI, NdeI, BclI, SphI, KpnI, PstI, SmaI, BglI, TruII and BamHI (Fermentas). Restriction enzyme digestions were performed according to the manufacturer’s instructions.

2.14 Pulsed field gel electrophoresis (PFGE)

The approximate phage genome sizes were estimated with pulsed field gel electrophoresis using previously described methods (Steward, 2001). Agar plugs were prepared by mixing phage DNA with 1.5% (w/v) molten agarose (Pulsed field-certified agarose, Bio-Rad Laboratories) made in SM without gelatin, in 1:1 volume ratio. The samples were run on a 1% agarose gel for 18 hours in 0.5 X TBE buffer (Appendix II) at a 1 to 10 s switch time, 6 V/cm, a linear ramping factor (a=0) and an included angle of 120° using the Biorad-CHEF PFG electrophoresis system. The genome size was determined based on the standard curve generated for the low range PFG marker (New England BioLabs).
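Estimating a genome size from a marker standard curve can be sketched as below. This is an illustration only: it assumes a linear relationship between migration distance and log10 of fragment size over the range of interest, which holds only approximately for PFGE, and the marker values shown are made up for the example and are not the sizes of the marker used in this study.

```python
import math


def fit_standard_curve(distances_mm, sizes_kb):
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    if len(distances_mm) != len(sizes_kb) or len(distances_mm) < 2:
        raise ValueError("need at least two (distance, size) pairs")
    logs = [math.log10(size) for size in sizes_kb]
    n = len(distances_mm)
    mean_x = sum(distances_mm) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_mm)
    if sxx == 0:
        raise ValueError("marker bands must have distinct migration distances")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_mm, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size_kb(distance_mm, slope, intercept):
    """Interpolate a fragment size from its migration distance."""
    return 10 ** (slope * distance_mm + intercept)


if __name__ == "__main__":
    # Hypothetical marker bands: migration distance (mm) and size (kb).
    slope, intercept = fit_standard_curve([30.0, 20.0, 10.0], [10.0, 100.0, 1000.0])
    print(f"Estimated size: {estimate_size_kb(15.0, slope, intercept):.1f} kb")
```

In practice the estimate is most reliable when the phage band lies between two marker bands, so that the curve is interpolated rather than extrapolated.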

2.15 Phage genome sequencing and assembly

2.15.1 Phage vB_RleM_P10VF (P10VF)

The genome of P10VF was sequenced using Ion-Torrent (Life Technologies) technology and Illumina sequencing technology at the Biology Department, University of Regina, Canada. Assembly of the sequence data was performed using MIRA sequence assembler v3.9.16 (Chevreux et al., 1999) with standard parameters.

2.15.2 Phage vB_RglS_P106B (P106B)

A library of phage DNA fragments was generated by cloning HindIII and EcoRI digested phage DNA into pBluescript SK+ (Stratagene, La Jolla, California), and these clones were sequenced at Quintara Biosciences (Berkeley, California) using Sanger sequencing. The sequences were assembled using Newbler software (454 Life Sciences Corporation, Roche Diagnostics) with an overlap of 20 nucleotides and 85% identity. The genome was also sequenced using Ion-Torrent (Life Technologies) technology at the National Research Council of Canada, Biotechnology Institute (Montreal, Quebec). Ion-Torrent reads along with the previously obtained Sanger sequences were used in assembling the complete genome. The Newbler software (Roche Diagnostics) was used for assembly, with the parameters set to 10 nucleotides overlap and 100% identity. The large contigs obtained with this assembly as well as the Ion Torrent raw reads were subsequently re-aligned using the DNAStar software (DNASTAR Inc., Madison, Wisconsin), with the default parameters (15 nucleotide overlap and 85% identity).

2.15.3 Phages vB_MloP_Lo5R7ANS (Lo5R7ANS) and vB_MloP_Cp1R7ANS-C2 (Cp1R7ANS-C2)

The genomes of mesophages Lo5R7ANS and Cp1R7ANS-C2 were sequenced using Ion-Torrent technology at the Biology Department, University of Regina, Canada. MIRA sequence assembler v3.9.16 (Chevreux et al., 1999) with standard parameters was used to assemble the genomes.

2.15.4 Phage vB_RleM_PPF1 (PPF1)

The temperate phage PPF1 genome was sequenced using 454-pyrosequencing technology at the Ontario Agency for Health Protection and Promotion (Ontario, Canada). The 454 reads were assembled with Newbler software (454 Life Sciences Corporation, Roche Diagnostics).

2.16 Genome annotation and in silico analysis of the phage genome

Open reading frames (ORFs) of the phage genomes P106B and PPF1 were determined using the RAST (Aziz et al., 2008) and Prodigal (Hyatt et al., 2010) annotation servers, while ORF predictions for P10VF, Lo5R7ANS and Cp1R7ANS-C2 were performed with RAST (Aziz et al., 2008) and GLIMMER 3 (Delcher et al., 1999). Putative products of the predicted ORFs were compared against the database of the National Center for Biotechnology Information (NCBI) using Standard protein BLAST with the PSI-BLAST (Position-Specific Iterated BLAST) algorithm (Altschul et al., 1997). An e-value threshold of ≤ 10⁻⁴ was applied when assigning a putative function to a predicted ORF based on its BLASTP hit.
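The e-value criterion above amounts to a simple filter on the hit list of each ORF. The following is a minimal sketch, not the annotation pipeline used in this study: the data layout (a list of description and e-value pairs), the choice of the lowest e-value hit, and the "hypothetical protein" fallback label are all assumptions made for illustration.

```python
E_VALUE_CUTOFF = 1e-4  # threshold stated in the text


def assign_function(hits, cutoff=E_VALUE_CUTOFF):
    """Pick a putative function for one ORF.

    hits is a list of (description, e_value) tuples. The description of the
    hit with the lowest e-value is returned if it passes the cutoff;
    otherwise the ORF is labelled as a hypothetical protein (assumed label).
    """
    passing = [hit for hit in hits if hit[1] <= cutoff]
    if not passing:
        return "hypothetical protein"
    best = min(passing, key=lambda hit: hit[1])
    return best[0]


if __name__ == "__main__":
    # Made-up hits for two illustrative ORFs.
    orf_hits = {
        "orf_001": [("terminase large subunit", 3e-50), ("unrelated protein", 0.2)],
        "orf_002": [("weak similarity only", 1e-3)],
    }
    for orf, hits in orf_hits.items():
        print(orf, "->", assign_function(hits))
```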

The phage genomes were scanned for the presence of tRNA genes using ARAGORN v1.2 (Laslett & Canback, 2004) and tRNAscan-SE v.1.23 (Lowe & Eddy, 1997). The codon usage of the phage genomes was determined with the Countcodon program version 4 (Kazusa DNA Research Institute) and the DNA 2.0 web server (DNA2.0 Inc., USA).
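Codon usage is, at its core, a tally of in-frame triplets across the coding sequences of a genome. The sketch below shows that idea for a single coding sequence; it is not the algorithm of the Countcodon program or the DNA 2.0 server, and the function names and the example sequence are hypothetical.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons in one coding sequence (DNA or RNA alphabet)."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


def per_thousand(counts):
    """Convert raw codon counts to frequencies per 1000 codons."""
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


if __name__ == "__main__":
    counts = codon_counts("ATGGCTGCTTAA")  # toy sequence: Met-Ala-Ala-stop
    for codon, freq in sorted(per_thousand(counts).items()):
        print(codon, counts[codon], f"{freq:.1f}")
```

For a whole genome, the counters of all predicted ORFs would be summed before converting to frequencies.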

ClustalX 2.1 (Larkin et al., 2007) was used for protein alignment, and the neighbor-joining (NJ) method with default parameters was used to construct bootstrapped NJ trees. The phylogenetic trees were viewed and modified with MEGA 5.1 (Tamura et al., 2011).

2.17 Isolation of lysogenized strains of R. leguminosarum F1 and prophage induction from the lysogen

Lysogenized R. leguminosarum F1 strains were isolated using two methods. Firstly, an overnight 5 ml TY broth culture of F1 (OD600 = 0.2-0.5) was inoculated with 200 µl of PPF1 and was incubated at 30°C for 24-48 hours. After the incubation, rhizobial cells were pelleted and washed three times with fresh TY broth. Washed cells were resuspended in 200 µl of TY broth and streaked on a TY plate to isolate lysogenized F1 colonies. Secondly, phage was plated with F1 as the host strain using the TY agar overlay method (Adams, 1959). After >48 hours of incubation, cells were obtained from the opaque plaques of phage infection and streaked onto a fresh TY plate. Colonies were screened for lysogenization after three consecutive single colony isolations.

Induction of prophages from the lysogenized R. leguminosarum F1 (R. leg F1-L) strains was carried out using UV irradiation. An overnight TY broth culture of F1 (10 ml) at OD600 ≈ 0.2 was transferred into a sterile petri plate and exposed to UV irradiation for 20 seconds. After the exposure, the plate was incubated in the dark for 15-18 hours and the culture supernatant was collected by pelleting the cells at 10,000 x g for 15 minutes. The supernatant was filtered using 0.2 µm filters and plated with its F1 host using the agar double-layer technique. Phage isolates were purified with three successive single-plaque isolations and were subsequently stored at 4°C for further characterization.

2.18 Southern hybridization

2.18.1 Preparation of phage DNA probe

A total phage genomic probe of PPF1 was prepared by digoxigenin (DIG)-labelling according to the manufacturer’s protocol (Roche Applied Science). Briefly, phage DNA was digested with EcoRI (Fermentas) and ethanol precipitated. The DNA was then labeled overnight with DIG at 37°C. The phage probe was stored at -20°C.

2.18.2 Hybridization

Total genomic DNA of rhizobial strains and phage was digested with EcoRI (Fermentas) according to the manufacturer’s specification. The digested DNA was electrophoresed on a 1% (w/v) agarose gel using 1X TAE and was photographed after staining with ethidium bromide. The DNA was then depurinated by soaking in 0.25 M HCl, washed twice in strong basic transfer buffer (Appendix II) and transferred overnight onto a nylon membrane (Roche Diagnostics) following previously described methods (Sambrook, 1989). The blot was hybridized to the DIG-labelled probe at 65-67°C for 16-18 hours, washed twice with 2X SSC (Appendix II) and 0.1% SDS for 5 minutes at room temperature, and then subjected to two stringency washes at 65-67°C for 30 min with 0.1X SSC and 0.1% SDS. Hybridization signals were detected by chemiluminescence using an anti-DIG antibody linked to alkaline phosphatase (anti-DIG-AP) and CSPD substrate, following the protocol supplied by Roche Applied Science.

2.19 Identification of the attachment sites in PPF1 and R. leguminosarum F1 genomes

2.19.1 Sequencing and analysis of the lysogenized R. leguminosarum F1 genome

The total genomic DNA of strain R. leguminosarum F1-L was isolated using an EZ-10 Spin Column Bacterial DNA isolation kit (Bio Basic Inc.) and sequenced at Funomics Global Inc. using Illumina technology. The draft genome sequence of R. leguminosarum F1-L was annotated using the RAST annotation server (Aziz et al., 2008). The contigs with regions containing phage-related proteins were aligned with the temperate phage PPF1 genome sequence using Mauve multiple genome alignment software (Darling et al., 2004) to identify the R. leguminosarum F1-L contig with the desired sequence.

2.19.2 Verification of attachment sites in vivo and site-specific recombination

The probable att sites identified by in silico methods in the phage and its rhizobial host were confirmed in vivo using PCR (polymerase chain reaction) amplification and subsequent sequencing. PCR primers were designed to amplify the regions of attachment at both the 5’ and 3’ ends of phage integration (Table 2-3). The amplified products were cloned into pGEM-T Easy Vector (Promega, Madison) as per the manufacturer’s instructions. Plasmid DNA extractions were done using an EZ-10 Spin Column plasmid DNA kit (Bio Basic Inc.) and the clones were sequenced at Quintara Biosciences (Berkeley, California) using Sanger sequencing.

2.20 Complementation of Rlv VF39SM exopolysaccharide synthesis mutants

2.20.1 Complementation

VF39pssD::Tn5 was complemented to confirm the involvement of the pssD gene product in phage infection. PCR primers carrying the restriction sites required for subsequent cloning were designed to amplify the target genes in the acidic exopolysaccharide biosynthesis gene cluster of Rlv VF39SM (accession no: AF028810.2) (Table 2-4). The individual pssD gene and a fragment containing both the pssD and pssE genes were amplified using Phusion® High-Fidelity PCR master mix (New England Biolabs), and the PCR products were purified with an EZ-10 Spin Column PCR Products Purification Kit (Bio Basic Inc.). The purified PCR products were digested with NdeI and BamHI (Fermentas) and cloned into the vector pSRKGm (Khan et al., 2008). The clones were transformed into chemically competent E. coli DH5α and plated on LB agar plates containing X-gal (5-bromo-4-chloro-3-indolyl-beta-D-galacto-pyranoside) and IPTG (isopropyl β-D-1-thiogalactopyranoside) with appropriate antibiotics. Successful amplification and cloning of the genes were confirmed by plasmid isolation and subsequent sequencing at Quintara Biosciences (Berkeley, California). Selected constructs were mobilized into the Rlv VF39pssD::Tn5 strain (Sadykov et al., 1998) by conjugation with an E. coli S17-1 strain containing the construct (Simon et al., 1983), and transconjugants were selected with appropriate antibiotics.

Table 2-3. Primers used to verify the probable att sites of phage PPF1

Region | Primer name | Primer sequence (a)
5’ end region of the phage integration | PPF1_Att5’_F | ATCGTGTCGTCCCTGATCTC
5’ end region of the phage integration | PPF1_Att5’_R | GGCAGTCTATTCGGGAATGT
3’ end region of the phage integration | PPF1_LigF2 | CATGGCCTTGAGGGTATCAT
3’ end region of the phage integration | RlF1_attB_R1 | ACATAGCAGCGACACAAACG

(a) Sequence is from 5’ to 3’ direction.

Table 2-4. Primers used to complement Rlv VF39pssD::Tn5 strain

Primer name | Primer sequence (a)
VF39_pssD_F | tcggcatatgACTGAGAAAAAATTGAAAG
VF39_pssD_R | atgtggatccTCAAAGGACAGCTCCTGC
VF39_pssE_R | atcaggatccTCAGACTGCCGTAATATAGT

(a) Sequence is from 5’ to 3’ direction. The upper case letters represent sequences of Rlv VF39SM (accession no: AF028810.2), whereas the lower case letters represent the adaptor sequence. The adaptor sequences contain the restriction sites (NdeI/BamHI).
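The position of the restriction sites within the adaptor sequences of Table 2-4 can be checked with a short script. This is a minimal sketch: the primer sequences are copied from the table, the recognition sequences used are those of NdeI (CATATG) and BamHI (GGATCC), and the helper function names are hypothetical.

```python
# Recognition sequences of the two enzymes used for cloning
RECOGNITION_SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Primers from Table 2-4 (lower case = adaptor, upper case = Rlv VF39SM sequence)
PRIMERS = {
    "VF39_pssD_F": "tcggcatatgACTGAGAAAAAATTGAAAG",
    "VF39_pssD_R": "atgtggatccTCAAAGGACAGCTCCTGC",
    "VF39_pssE_R": "atcaggatccTCAGACTGCCGTAATATAGT",
}


def adaptor(primer):
    """Return the leading lower-case adaptor portion of a primer."""
    n = 0
    while n < len(primer) and primer[n].islower():
        n += 1
    return primer[:n]


def sites_in_adaptor(primer):
    """List (enzyme, 0-based offset) for recognition sites found in the adaptor."""
    tail = adaptor(primer).upper()
    found = []
    for enzyme, site in RECOGNITION_SITES.items():
        position = tail.find(site)
        if position != -1:
            found.append((enzyme, position))
    return found


for name, sequence in PRIMERS.items():
    print(name, adaptor(sequence), sites_in_adaptor(sequence))
```

Each adaptor carries four extra bases ahead of the recognition site, so the forward primer contributes the NdeI site and both reverse primers contribute the BamHI site.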


2.20.2 Bacterial matings

Bacterial cultures were grown in broth culture medium. Equal volumes of the donor and recipient cultures were mixed and centrifuged at 18,000 x g for 2 minutes. The resulting pellet was washed with 50 µl of fresh TY. Cells were resuspended in another 50 µl of fresh TY and spotted on the surface of a TY agar plate and incubated at 30°C for 24-48 hours. The cells were then scraped off from the plate and resuspended in 500 µl of sterile distilled water. Different volumes were plated on appropriate media with antibiotics to select for the transconjugants. Plates were incubated for 3-5 days at 30°C.

2.20.3 Eckhardt gel analysis

Plasmids of rhizobial strains were visualized using a modified Eckhardt gel (Eckhardt, 1978; Hynes et al., 1985; Hynes & McGregor, 1990). Rhizobial cells were grown in PH medium (OD600 ≈ 0.5) (Appendix I) and 0.1 ml of the culture was harvested on ice and washed with 0.5 ml of cold 0.3% sarkosyl solution. The cells were resuspended in 18 µl of E1 lysis solution and loaded into a gel containing 0.8% agarose and 1% SDS in 1X TBE (Appendix II). 1X TBE was used as the running buffer. After loading, gels were kept at 0.25 volts/cm for 60 minutes to allow cell lysis. The DNA was then electrophoresed at 4 volts/cm for 6-8 hours and stained with ethidium bromide for visualization under UV light.

2.20.4 Efficiency of plaquing (EOP)

Serially diluted phage lysates (100 µl) were plated separately on the test strain and the control strain, according to the previously described methods. When required, IPTG was added to the 5 ml of molten TY soft agar (final concentration of IPTG in TY soft agar ≈ 0.2 mM). Plates were incubated at 30°C for 24 hours. The EOP value was calculated by dividing the pfu/ml on the test strain by the pfu/ml on the control strain.
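The EOP calculation described above can be sketched as follows. The titre helper assumes that 100 µl of a given dilution was plated, as stated in the method; the plaque counts and dilutions in the example are hypothetical values chosen only to illustrate the arithmetic.

```python
def titre_pfu_per_ml(plaque_count, dilution, plated_volume_ml=0.1):
    """Titre (pfu/ml) from the plaque count on one plate.

    dilution is the dilution factor of the plated lysate (e.g. 1e-6);
    plated_volume_ml defaults to the 100 µl used in the method.
    """
    return plaque_count / (dilution * plated_volume_ml)


def efficiency_of_plaquing(test_pfu_per_ml, control_pfu_per_ml):
    """EOP = pfu/ml on the test strain divided by pfu/ml on the control strain."""
    if control_pfu_per_ml <= 0:
        raise ValueError("control titre must be greater than zero")
    return test_pfu_per_ml / control_pfu_per_ml


# Hypothetical counts: 12 plaques at 10^-4 on the test strain,
# 60 plaques at 10^-6 on the control strain
test_titre = titre_pfu_per_ml(12, 1e-4)
control_titre = titre_pfu_per_ml(60, 1e-6)
eop = efficiency_of_plaquing(test_titre, control_titre)
print(f"EOP = {eop:.4f}")
```

An EOP below 1 indicates that the phage forms fewer plaques on the test strain than on the control strain.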


Chapter Three: Characterization of Rhizobium leguminosarum phages

Rhizobium leguminosarum, belonging to the alpha sub-group of the Proteobacteria, is capable of nodulating a wide variety of legumes, including Pisum, Lathyrus, Vicia, Lens, Phaseolus, and Trifolium (Willems, 2006). Depending on the host plant specificity of individual strains, Rhizobium leguminosarum is further divided into biovars (more recently also called symbiovars) known as viciae, phaseoli and trifolii (van Rhijn & Vanderleyden, 1995). Although several studies on the bacteriophages of R. leguminosarum have been reported earlier, these were mainly confined to characterizing the morphological and growth parameters of phages (Barnet, 1979; Staniewski, 1980). The R. leguminosarum phage RL38JI is widely known for its transducing ability (Buchanan-Wollaston, 1979).

The Hynes lab has a collection of rhizobiophages that have been isolated from different legume soils using various R. leguminosarum trapping hosts. Some of these phages have been characterized previously based on their host range, morphology, transduction ability and growth parameters (Restrepo, 2012). Furthermore, one of the R. leguminosarum phages in our collection, vB_RleS_L338C, has been fully characterized, and its genome was the first completed R. leguminosarum phage genome to become available (accession no: KF614509) (Restrepo, 2012). This study was a continuation of the previously reported work (Restrepo, 2012), with the aim of isolating and characterizing novel rhizobiophages with a broad host range that can potentially be used in agricultural applications.


3.1 Results

3.1.1 Phage isolation and trapping rhizobial hosts

Isolation of rhizobiophages was performed using soil samples collected from Alberta, Saskatchewan, Ontario and British Columbia. Several different strains of Rhizobium leguminosarum were used as susceptible hosts for trapping phages with the broadest host range possible. An overnight culture of the trapping host was inoculated with the soil sample and incubated overnight to enrich any phage population present in the soil that was capable of infecting the rhizobial host used. After the incubation, the supernatant was treated with chloroform and bacterial cell debris was removed by centrifugation. The top translucent supernatant was filtered using 0.2 µm filters and was then assayed for the presence of phages with different plaque morphologies, using the double-layer technique with the respective trapping rhizobial host (Adams, 1959). Phages with unique plaque morphologies were isolated with at least three successive single-plaque isolations (Table 3-1).

The newly isolated phages were named according to a simple naming system, in which the first letter and the following one or two digits, if any, indicate the soil sample used for phage isolation (e.g. A in AF3, P9 in P9VFCI and L3 in L338H). The succeeding two characters indicate the rhizobial host used for trapping (e.g. F3 in AF3, VF in P9VFCI and 38 in L338H), whereas the last one or two characters, if any, identify the individual phage isolated from each soil sample with the particular trapping host (e.g. CI in P9VFCI and H in L338H). Once the phages had been classified based on their morphology, the proposed virus nomenclature system (Kropinski et al., 2009) was used in conjunction with our naming system to name them.
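The naming system can be illustrated with a small parser. This is a sketch under stated assumptions: the two-character host codes (VF, F3 and 38) are inferred from the phage names and trapping hosts listed in Table 3-1, and the function and dictionary names are hypothetical.

```python
import re

# Host codes inferred from Table 3-1 (an assumption of this sketch)
HOST_CODES = {
    "VF": "Rlv VF39SM",
    "F3": "R. leguminosarum F3",
    "38": "Rlv 3841",
}

# Soil letter plus up to two digits (matched lazily), host code, isolate suffix
NAME_PATTERN = re.compile(r"^([A-Z]\d{0,2}?)(VF|F3|38)(.*)$")


def parse_phage_name(name):
    """Split a phage name into soil sample, trapping host and isolate label."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized phage name: {name}")
    soil, host_code, isolate = match.groups()
    return {
        "soil_sample": soil,
        "host": HOST_CODES[host_code],
        "isolate": isolate or None,
    }


for phage in ("AF3", "P9VFCI", "L338H", "P10F3B"):
    print(phage, parse_phage_name(phage))
```

The soil-sample digits are matched lazily so that a name such as L338H is read as soil sample L3 with host code 38, rather than soil sample L33.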


3.1.2 Host range of R. leguminosarum phages

All the phage isolates of R. leguminosarum present in the Hynes lab collection were screened for their ability to lyse a large variety of rhizobial strains using plaque assays. Phage lysates were spotted on a lawn of the rhizobial host and results were recorded as ‘lytic’ or ‘non-lytic’ based on the clearance of the bacterial lawn after overnight incubation at 30°C. An array of previously characterized strains of rhizobia as well as 24 different indigenous rhizobial strains isolated from root nodules of P. sativum were used to select the phages with the broadest spectrum of infectivity. Host range results for selected rhizobiophages with higher infection ability are presented in Tables 3-2 and 3-3. These phages infected different strains of R. leguminosarum, whereas all the tested strains of R. gallicum, R. tropici and Mesorhizobium loti were resistant to their infection. However, phages P11VFC, B1VFA, P9VFCI, C2VFA, C2VFB, L338H, AF3, CF3 and C2F3AI were infective against R. etli strain CE3 (derivative of CFN42) (Noel et al., 1984) and R. leucaenae strain CFN 299T (Ribeiro et al., 2012). Most of the selected rhizobiophages were capable of infecting the majority of the indigenous strains tested (Table 3-3), which are presumptive R. leguminosarum isolates. Indigenous rhizobial strains identified as R. gallicum were completely resistant to infection by phages isolated using R. leguminosarum strains as trapping hosts, indicating that R. gallicum belongs to a different host specificity group for phages (see also Chapter 4).

Table 3-1. R. leguminosarum phages used in this study and their trapping information

Soil sample/source | Trapping rhizobial host | Trapped phages | Source/reference
Pea soil 9 (P9) | R. leguminosarum bv. viciae (Rlv) VF39SM | P9VFAI, P9VFAII, P9VFBI, P9VFBII, P9VFCI, P9VFCII | This study
Pea soil 10 (P10)/ Yost garden, Calgary NW | R. leguminosarum F3 | P10F3A1, P10F3B | This study
Pea soil 10 (P10)/ Yost garden, Calgary NW | Rlv VF39SM | P10VF | Restrepo (2012)
Pea soil 11 (P11)/ Yost garden, Calgary NW | Rlv VF39SM | P11VFA, P11VFC | This study
Lentil 3 (L3)/ Novozymes BioAg (52° 36.931 N/ 106° 34.816’ W) | Rlv 3841 | L338A, L338B, L338C, L338D, L338E, L338F, L338H | Restrepo (2012)
Vetch soil 1 (V1)/ Ontario | Rlv VF39SM | V1VFA, V1VFB | This study
Alfalfa soil (A)/ Agriculture and Agrifood Canada, Lethbridge Research Centre (49° 38’0’’ N/ 112° 48’0’’ W) | R. leguminosarum F3 | AF3 | This study
Clover soil 1 (C)/ Jerry Pots Calgary NW | R. leguminosarum F3 | CF3 | Restrepo (2012)
Clover soil 2 (C2) | R. leguminosarum F3 | C2F3AI, C2F3AII, C2F3B | This study
Clover soil 2 (C2) | Rlv VF39SM | C2VFA, C2VFB | This study
Bean soil 1 (B1)/ British Columbia | Rlv VF39SM | B1VFA | This study
Bean soil 2 (B2)/ British Columbia | Rlv VF39SM | B2VF-D1, B2VFBII | This study
Bean soil 3 (B3)/ British Columbia | Rlv VF39SM | B3VFA, B3VFB, B3VFC | This study

Table 3-2. Host range of selected phages with known rhizobial strains

Grey squares with a ‘+’ sign indicate lysis, whereas no lysis is indicated with clear squares with a ‘−’.

Rhizobial strain | AF3 CF3 L338H C2F3B B1VFA C2VFB C2VFA P10F3B P10F3A C2F3AI C2F3A2 P9VFCI P11VFA P11VFC
R. leg bv. viciae 3841 | + + + + + + + + + + + + + +
R. leg bv. viciae VF39SM | + + + + + + + + + + + + + +
R. leg bv. viciae 248 | + + − + − − + − + − − + + +
R. leg bv. viciae 306 | − − − − + + + + − − − + + +
R. leg bv. viciae 336 | + + − + + + + − + + + + + +
R. leg bv. viciae 309 | − − + − + + − + − − − + + +
R. leg bv. phaseoli 8401 | + + + + + + + + + − − ND ND ND
R. leg bv. phaseoli 4292 | + + − + + + + + + − − + + +
R. leg bv. trifolii W14-2 | + + + + + + + + + − − + + +
R. leg 3855 | + + + + + + + + + + + + + +
R. leg F3 | + + + + + + + + + + + + + +
R. leg 162Y10 | + + − + + + + − + + + + + +
R. etli CE3 | − + + + + + + + + − − + ND ND
R. leucaenae CFN299T | − + + + + + + + + − − + ND ND
R. tropici CIAT899T | − − − − − − − − − − − − − −
R. gal R602spT | − − − − − − − − − − − − − −
R. gal PhP222 | − − − − − − − − − − − − − −
R. gal PhF29 | − − − − − − − − − − − − − −
R. gal PhD12 | − − − − − − − − − − − − − −
R. gal PhI21 | − − − − − − − − − − − − − −
R. gal PhP222 | − − − − − − − − − − − − − −
M. loti R7A | − − − − − − − − − − − − − −
M. loti R7ANS | − − − − − − − − − − − − − −

ND: not determined

Table 3-3. Host range of selected phages of R. leguminosarum with 24 indigenous rhizobial strains isolated from different soil samples associated with legume growth

Grey squares with a ‘+’ sign indicate lysis, whereas no lysis is indicated with clear squares with a ‘−’.

Rhizobial strain* | AF3 CF3 L338H C2F3B B1VFA C2VFB V1VFB C2VFA V1VFA P10F3B P10F3A C2F3AI C2F3A2 P9VFCI P11VFA P11VFC
R. gal S014B-4 (6) | − − − − − − − − − − − − − − − −
R. gal S004A-2 (14) | − − − − − − − − − − − − − − − −
R. gal S013A-1 (15b) | − − − − − − − − − − − − − − − −
R. gal S019B-5 (27) | − − − − − − − − − − − − − − − −
R. gal S021A-2 (25) | − − − − − − − − − − − − − − − −
Rhizobium S023A-4 (1) | + + + + + + + + + + − − − + + +
Rhizobium S018A-3 (3) | + + + + + + + + + + + − − + + +
Rhizobium S027B-3 (4a) | + + + + + + + + + + + + + + + +
Rhizobium S018A-1 (5) | + + + + + + + + + + − + + + + +
Rhizobium S001B-3 (7) | + + + + + + + + + + + + + − + +
Rhizobium S008A-5 (8) | + + + + + + + + + + − + + + + +
Rhizobium S012A-1 (9) | + + + + + + − − + + + + + + + +
Rhizobium S016B-1 (10) | + + + + + + + + + + + + + + + +
Rhizobium S016A-3 (11) | + + + + + + + + + + + + + + + +
Rhizobium S015A-1 (12) | + + + + + + + + + + + + + + + +
Rhizobium S020A-5 (16a) | + + + + + + + + + + + + + + + +
Rhizobium S002A-4 (17a) | + + + + + + + + + + + + + + + +
Rhizobium S011B-1 (18) | + + + + + + + + + + + + + + + +
Rhizobium S012A-2 (19) | + + + + + + + + + + + + + + + +
Rhizobium S012B-3 (20) | + + + + + + + + + + + − − + + +
Rhizobium S028A-4 (21) | + + + + + + + + + + + − − + + +
Rhizobium S010A-2 (23a) | + + + + + + + + + + − + + + + +
Rhizobium S010B-1 (24) | + + + + + + + + + + + + + + + +

* The indigenous strains were classified according to their plasmid profiles and the number within brackets indicates their profile group for easy identification.

3.1.3 Lysis curves

Phage lysis curves, or killing curves, are useful in determining the ability of phages to infect their exponentially growing bacterial host and in comparing their infection abilities against a particular host. Among the tested phages, P10VF and P9VFCI exhibited the earliest onset of lysis: the OD600 of their host rhizobial culture, Rlv VF39SM, started to decline rapidly 2 hours after phage inoculation (Figure 3-1). Lysis of the rhizobial host R. leg F3 by phage AF3 was also initiated 1-2 hours after phage inoculation (Figure 3-2). However, infection by AF3 produced a slow decrease in the OD600 of its host culture for another 2 hours after the onset of lysis, followed by a rapid decrease during the succeeding 2 hours before stabilizing. The lysis of their respective rhizobial hosts by phages P11VFC, B1VFA, CF3 and C2F3A2 was initiated 5 hours after phage inoculation, whereas phage P11VFA showed the latest onset of lysis (6 hours) among the tested phages.

3.1.4 Morphologies of selected R. leguminosarum phages

Determination of phage morphology using electron microscopy plays a vital role in phage classification (Ackerman, 2009). The information on phage morphotype is also essential for the recently introduced nomenclature system of phages (Kropinski et al., 2009). Transmission electron microscopy (TEM) of selected phages was performed using negative staining with uranyl acetate (1%) and the phage dimensions were determined using AMT software. The TEM of phages P11VFA, L338H, P11VFC, and B1VFA revealed that they belong to the family Siphoviridae, while phages P9VFCI, V1VFA, AF3 and V1VFB were classified in the family Myoviridae (Figures 3-3 and 3-4).

Figure 3-1. Lysis curves of phages isolated using R. leguminosarum bv. viciae VF39SM as trapping host

Phages were propagated on R. leguminosarum bv. viciae VF39SM at 30°C. Values represent means of three independent replicates and error bars indicate standard error of the mean (SEM). Missing error bars are due to smaller SEM values. Values are the OD600 at different time points. Time= 0 hours represents the phage inoculation and a negative control of rhizobial host (Rlv VF39SM) without any phage was also performed.


Figure 3-2. Lysis curves of phages isolated using R. leguminosarum F3 as trapping host

Phages were propagated on R. leguminosarum F3 at 30°C. Values represent means of three independent replicates and error bars indicate standard error of the mean (SEM). Missing error bars are due to smaller SEM values. Values are the OD600 at different time points. Time= 0 hours represents the phage inoculation and a negative control of rhizobial host (F3) without any phage was also performed.


Figure 3-3. Transmission electron micrographs of R. leguminosarum phages belonging to the family Siphoviridae

Transmission electron micrographs of phage P11VFA (A), L338H (B), P11VFC (C) and B1VFA (D) stained with 1% uranyl acetate. Scale bars represent 100 nm.


Figure 3-4. Transmission electron micrographs of R. leguminosarum phages belonging to the family Myoviridae

Transmission electron micrographs of phage P9VFCI (A), V1VFA (B), AF3 (C) and V1VFB (D) stained with 1% uranyl acetate. Scale bars represent 100 nm.

The approximate tail lengths, tail widths and head diameters of phages observed using TEM are listed in Table 3-4. The head diameters of Siphoviridae phages ranged from 73-94 nm, while those of phages classified under the family Myoviridae ranged from 83-117 nm. The contractile tails of myoviruses were 111-133 nm in length, whereas the siphophage P11VFA had a much longer and possibly non-contractile tail, which was 217±15 nm in length. The tail lengths of L338H, P11VFC and B1VFA have not been determined.

3.1.5 One-step growth curves

One-step growth curves were performed to determine the time scale of phage growth. The parameters determined using one-step growth experiments are vital in understanding the phage-host interactions (Adams, 1959). The eclipse and latent periods for phage P9VFCI were 60 and 120 minutes, respectively. The burst size of the phage was calculated as 29 pfu/infected cell (Figure 3-5A). The 80-minute latent period of phage P11VFC was followed by an exponential increase for 60 minutes. The eclipse period of P11VFC infection was 60 minutes, while the burst size of the phage was determined as 11 pfu/infected cell (Figure 3-5B).

Table 3-4. Morphological characteristics of some of the R. leguminosarum phage isolates

The dimensions of the phage particles were determined using the AMT software. The values represent at least three measurements (n ≥ 3) with standard deviation (± SD).

Phage name | Trapping host | Family | Head diameter (nm) | Tail length (nm) | Tail width (nm)
P11VFA (vB_RleS_P11VFA) | R. leguminosarum bv. viciae (Rlv) VF39SM | Siphoviridae | 81±6 | 217±15 | 12±2
P11VFC (vB_RleS_P11VFC) | Rlv VF39SM | Siphoviridae | 88±6 | ND | 12±2
L338H (vB_RleS_L338H) | Rlv 3841 | Siphoviridae | 83±6 | ND | 12±2
B1VFA ★ (vB_RleS_B1VFA) | Rlv VF39SM | Siphoviridae | 76±3 | ND | 10
P9VFCI (vB_RleM_P9VFCI) | Rlv VF39SM | Myoviridae | 95±5 | 125±7 | 24±4
AF3 (vB_RleM_AF3) | R. leguminosarum F3 | Myoviridae | 114±3 | 126±6 | 18±5
V1VFA (vB_RleM_V1VFA) | Rlv VF39SM | Myoviridae | 87±4 | 129±4 | 22±9
V1VFB (vB_RleM_V1VFB) | Rlv VF39SM | Myoviridae | 92±7 | 124±13 | 20±8

ND: not determined

★ Only a single phage particle was measured (n=1)

Figure 3-5. One-step growth curves of phages P9VFCI and P11VFC

Phage P9VFCI (A) and P11VFC (B) were propagated on R. leguminosarum bv. viciae VF39SM at 30°C. Values represent means of three independent replicates and error bars indicate standard error of the mean (SEM). The values are the log pfu/ml at different time points.

3.1.6 Genome characterization of R. leguminosarum phages

3.1.6.1 Restriction enzyme digestion profiles of phage genomic DNA

Restriction profiling of phage genomes can be used as a valuable tool in preliminary characterization of phage isolates. The DNA of selected phages was digested with restriction enzymes with different methylation sensitivities (Figure 3-6). A remarkable resistance to restriction enzyme digestion was observed with the DNA of phages P9VFCI and AF3, where only Tru1I was capable of cleaving the DNA (Figures 3-6C and 3-6D).

3.1.6.2 PFGE

The genome sizes of phages selected on the basis of their host range were estimated using pulsed field gel electrophoresis (PFGE) (Figure 3-7). Total DNA extracted from the rhizobiophages was used to prepare agar plugs with 1.5% (w/v) molten agar, and the plugs were electrophoresed at 6V/cm for 16-18 hours. A standard curve, generated using the run lengths of the low range PFG marker, was used to approximate the genome sizes. The estimated size of the P11VFA genome was 116 kb, while both the AF3 and L338D genomes were ~194 kb in size.
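The use of a standard curve to approximate genome size from migration distance can be sketched as follows. This is an illustration only: it assumes a semi-logarithmic relationship (log10 of fragment size against run length) over the resolved range, which may not be the exact curve form used here, and the marker values in the example are hypothetical numbers rather than measurements from Figure 3-7.

```python
import math


def fit_standard_curve(marker):
    """Least-squares fit of log10(size_kb) = slope * distance + intercept.

    marker is a list of (migration_distance, size_kb) pairs for the ladder bands.
    """
    xs = [distance for distance, _ in marker]
    ys = [math.log10(size) for _, size in marker]
    n = len(marker)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size_kb(distance, slope, intercept):
    """Interpolate a fragment size (kb) from its migration distance."""
    return 10 ** (slope * distance + intercept)


# Hypothetical marker bands: (run length in mm, size in kb)
hypothetical_marker = [(10.0, 200.0), (20.0, 100.0), (30.0, 50.0)]
slope, intercept = fit_standard_curve(hypothetical_marker)
estimate = estimate_size_kb(15.0, slope, intercept)
print(f"estimated size: {estimate:.1f} kb")
```

Estimates are only meaningful for bands that migrate within the range spanned by the marker; extrapolating beyond the largest or smallest marker band is unreliable.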


Figure 3-6. Restriction enzyme profiles of DNA isolated from different R. leguminosarum phages

DNA from phages L338H (A), P11VFC (B), P9VFCI (C), AF3 (D), A38 (E) and B1VFA (F) was digested with Tru1I (2), HindIII (3), SalI (4), SpeI (5), BclI (6), PstI (6a), XbaI (7) and EcoRI (8). Lane 1 contained the 1 kb DNA ladder.


Figure 3-7. Pulsed field gel electrophoresis (PFGE)

DNA from phages P11VFA (1), AF3 (2) and L338D (3) was electrophoresed on a 1% agarose gel for 18 hours at a 1-10 s switch time and 6V/cm, with a linear ramping factor and an included angle of 120°. Lane 1 contained the low range PFG marker.

3.1.6.3 Genome of vB_RleM_P10VF (P10VF)

The genome of P10VF was sequenced using Ion Torrent and Illumina sequencing technologies. Sequence data obtained using both methods were used to assemble the genome using MIRA sequence assembler v3.9.16 (Chevreux et al., 1999). The assembled genome of P10VF is 156,446 bp in length and has 66.81-fold average coverage. The genome has an average G+C content of 49.9%. The ORF predictions of the genome with RAST and GLIMMER 3 (Delcher et al., 1999) resulted in 257 putative protein-coding genes, which represent approximately 94.1% of the total genome (Figure 3-8 and Appendix V). ATG was the preferred start codon in the P10VF genome, with 227 ORFs initiating with it, while GTG and TTG were employed at low frequencies (15 ORFs each). Putative products of the predicted ORFs were compared with previously characterized gene products at the amino acid level using BLASTP with the PSI-BLAST algorithm. The e-value cut-off for a significant BLASTP hit was set at 10⁻⁴. Based on the sequence comparisons, putative functions were assigned to 47 predicted ORFs (18.29%), whereas the products of 151 (58.75%) predicted protein-coding genes had no significant sequence similarity to anything available in the NCBI database. Fifty-nine of the predicted gene products exhibited homology to previously characterized hypothetical proteins (22.96%), and 26 of these hypothetical proteins showed sequence similarities to phage-related products (Figure 3-9). Both ARAGORN and tRNAscan-SE searches failed to identify any tRNA genes within the P10VF genome.
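The ORF statistics reported for P10VF can be cross-checked with simple arithmetic. In this sketch the counts are taken from the text above, except that the number of ORFs matching hypothetical proteins is computed as the remainder of the 257 predicted ORFs, an assumption that reproduces the reported 22.96%. The variable and function names are hypothetical.

```python
TOTAL_ORFS = 257

# Start codon usage reported for the P10VF genome
start_codons = {"ATG": 227, "GTG": 15, "TTG": 15}

# Functional categories of the predicted ORFs
assigned_function = 47
no_significant_hit = 151
# Assumption: hypothetical-protein matches are the remaining ORFs
hypothetical_protein = TOTAL_ORFS - assigned_function - no_significant_hit


def percent(count, total=TOTAL_ORFS):
    """Percentage of the predicted ORFs, rounded to two decimal places."""
    return round(100 * count / total, 2)


print("start codons sum to total:", sum(start_codons.values()) == TOTAL_ORFS)
print("assigned function:", assigned_function, percent(assigned_function))
print("no significant hit:", no_significant_hit, percent(no_significant_hit))
print("hypothetical protein:", hypothetical_protein, percent(hypothetical_protein))
```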

Figure 3-8. Genome arrangement of vB_RleM_P10VF

Graphical representation of the organization of the rhizobiophage vB_RleM_P10VF (P10VF) genome generated with Geneious 6.1.2 (Drummond et al., 2011). Each predicted ORF is represented by a single arrow and the scale is given in base pairs.

Figure 3-9. Distribution of predicted ORFs in the P10VF genome

3.1.7 Analysis of phage proteins

In addition to host range studies and restriction profiling of genomes, profiling of phage proteomes can be used as a tool for the preliminary differentiation and characterization of phage isolates. To examine protein profiles, phage particles were concentrated at 70,000 x g for 40 minutes at 4°C, followed by a further purification step with a centrifugal filter device. A molecular weight cut-off filter was used to further concentrate the samples and to remove any unwanted host cellular excretions present in the medium. Concentrated phage samples were run on a 12% SDS-PAGE gel to separate the phage proteins, and the phage protein profiles were used to differentiate between phages (Figures 3-10, 3-11 and 3-12).

With many phages, it was difficult to extract enough phage particles to obtain a protein profile that could be visualized on a 12% SDS-PAGE gel by simple staining with Coomassie Brilliant Blue. This was a major drawback in using protein profiles to characterize phages of R. leguminosarum.

A protein band of the phage L338C was extracted and further analyzed using trypsin proteolysis followed by liquid chromatography tandem mass spectrometry (LC-MS/MS) at the Southern Alberta Mass Spectrometry (SAMS) Centre, University of Calgary. The band with the highest intensity among the 10-12 different bands observed in the protein profile of phage L338C ran at ~36 kDa on the gel. The excised protein band was identified as the gene product of ORF 22 of the L338C genome (Restrepo, 2012) (accession no: KF614509), which is the major capsid protein with an approximate molecular mass of 35 kDa.


Figure 3-10. Protein profiles of selected phages of R. leguminosarum 3841

Phages L338H (2), A38 (3) and L338C (4) were denatured and separated on 12% SDS-PAGE. The gel was stained with 0.2% of Coomassie Brilliant Blue R250. Lane 1 contained the molecular size marker. The protein band indicated with a white arrow was cut out of the gel and analysed by LC-MS/MS. The LC-MS/MS data were compared against the predicted phage proteins of L338C.


Figure 3-11. Protein profiles of selected phages of R. leguminosarum VF39SM

Phages P11VFC (2), P11VFA (3), B2VFBII (4) and P10VF (5) were denatured and separated on 12% SDS-PAGE. The gel was stained with 0.2% of Coomassie Brilliant Blue R250. Lane 1 contained the molecular size marker.


Figure 3-12. Protein profiles of selected phages of R. leguminosarum F3

Phages C2F3AI (2), C2F3AII (3) and C2F3B (4) were denatured and separated on 12% SDS-PAGE. The gel was stained with 0.2% of Coomassie Brilliant Blue R250. Lane 1 contained the molecular size marker.

3.1.8 The effect of exopolysaccharide and lipopolysaccharide synthesis of R. leguminosarum bv. viciae VF39SM on phage infection

The efficiency of phage adsorption to bacterial hosts is one of the most important factors that define the host range of a phage. Phage adsorption is mediated through a receptor located on the host cell surface, where it interacts with the phage receptor binding protein to initiate the phage infection. Therefore, the efficacy of the infection relies upon the presence, accessibility and spatial distribution of these phage receptors on the host cell surface (Rakhuba et al., 2010). The lipopolysaccharide present in the outer membrane of Gram-negative bacteria, as well as many external protective layers in bacteria such as slime layers and capsules, are known to harbor components that act as phage receptors (Samson et al., 2013). The host bacteria can resist a phage infection by altering their specific receptor and subsequently denying phage access into the cell.

When using a phage cocktail to provide a competitive advantage for an inoculant rhizobial strain, it is important to use phages with a broad host range. Moreover, the use of phages that target different receptors on the rhizobial cell surface should increase the efficiency of the cocktail and reduce the ability of the host to develop phage resistance by modifying the receptors. Therefore, to understand the interaction of our phages with their possible receptor molecules on the host cell surface, Rlv VF39SM strains with different mutations in EPS and LPS biosynthesis genes were tested using host range studies (Tables 3-5 and 3-6). The LPS mutant strains VF39-51, VF39-23 and VF39-32 exhibited the same susceptibility pattern against the tested phages as their wild-type counterpart. Mutant VF39-86B exhibited resistance towards infection by four phages (L338H, CF3, P10F3A and P10F3B), while VF39-48B was only immune to P10F3B infection.

83 Also, 9 different exopolysaccharide mutants (pss genes) of Rlv VF39SM were screened for host range with 23 rhizobiophages (Table 3-6). The mutant VF39pssD::Tn5 exhibited a remarkable ability to withstand infection by 17 phages out of the 23 phages tested. The other EPS mutants did not show any significant difference in sensitivity to phage infection compared to the wild type. In an attempt to further confirm the increased resistance of mutant VF39pssD::Tn5 for phage infection, experiments were designed to determine the efficiency of plaquing (EOP) of the strain against its wild type counterpart.

EOP is a relative count calculated by dividing the number of plaques (pfu/ml) formed on a lawn of the test strain by the number of plaques (pfu/ml) formed on the control strain, where the mutant strain (Rlv VF39pssD::Tn5) and the wild-type host act as the test and control strains, respectively. To compare the number of plaques formed on two different strains, it is important to ensure that the number of bacterial cells and phage particles involved in an infection is the same. Therefore, the OD600 of the host cultures was standardized before the experiment, and serially diluted phages were titered using the agar-overlay technique to determine the EOP. As shown in Table 3-7, the average EOPs of the tested phages on Rlv VF39pssD::Tn5 were always lower than 1, confirming the decreased susceptibility of the mutant to phage infection compared to the wild type.
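The EOP calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the plate counts are hypothetical and are not data from this study; the only relationship taken from the text is that EOP is the titer on the test strain divided by the titer on the control strain.

```python
def titer_pfu_per_ml(plaque_count, dilution_factor, plated_volume_ml):
    """Titer from a countable plate: plaques / (dilution x volume plated)."""
    return plaque_count / (dilution_factor * plated_volume_ml)


def efficiency_of_plaquing(test_pfu_per_ml, control_pfu_per_ml):
    """EOP = titer on the test strain / titer on the control strain."""
    if control_pfu_per_ml <= 0:
        raise ValueError("control titer must be positive")
    return test_pfu_per_ml / control_pfu_per_ml


# Hypothetical plate counts, for illustration only (not data from this study).
wild_type_titer = titer_pfu_per_ml(plaque_count=120, dilution_factor=1e-6, plated_volume_ml=0.1)
mutant_titer = titer_pfu_per_ml(plaque_count=42, dilution_factor=1e-6, plated_volume_ml=0.1)

eop = efficiency_of_plaquing(mutant_titer, wild_type_titer)
print(f"EOP = {eop:.2f}")
```

An EOP below 1 indicates that the phage plaques less efficiently on the test strain than on the control strain, and an EOP of 0 indicates that no plaques were formed on the test strain.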

However, further confirmations were required to conclude that the gene product of pssD is crucial in determining the adsorption efficiency of these phages to their Rlv VF39SM host. This goal was achieved using gene complementation experiments.

The acidic exopolysaccharide biosynthesis gene cluster of Rlv VF39SM (Accession no: AF028810) (Sadykov et al., 1998) is shown in Figure 3-13.

Table 3-5. Host range of 23 R. leguminosarum phages against Rlv VF39SM lipopolysaccharide mutants

Grey squares with a ‘+’ sign indicate lysis, where no lysis is indicated with clear squares with a ‘−’.

Phage

Rhizobial Strain AF3 CF3 L338F L338B L338E Rl38JI L338A L338C L338D P10VF L338H C2F3B B1VFA V1VFB V1VFA P10F3B P10F3A C2F3AI P9VFBI P9VFAI P9VFCI P11VFA P11VFC

R. leg bv. viciae 3841   + + + + + + + + + + + + + + + + + + + + + + +
R. leg bv. viciae VF39   + + + + + + + + + + + + + + + + + + + + + + +
VF39-86B                 + + + + + + + - + + + + + + + + + + - - - + +
VF39-51                  + + + + + + + + + + + + + + + + + + + + + + +
VF39-48B                 + + + + + + + + + + + + + + + + + + + + - + +
VF39-23                  + + + + + + + + + + + + + + + + + + + + + + +
VF39-32                  + + + + + + + + + + + + + + + + + + + + + + +


Table 3-6. Host range of 23 R. leguminosarum phages against Rlv VF39SM exopolysaccharide mutants

Grey squares with a ‘+’ sign indicate confluent lysis, where no lysis is indicated with clear squares with a ‘−’.

Phage

Rhizobial Strain AF3 CF3 L338F L338B L338E Rl38JI L338A L338C L338D P10VF L338H C2F3B B1VFA V1VFB V1VFA P10F3B P10F3A C2F3AI P9VFBI P9VFAI P9VFCI P11VFA P11VFC

3841             + + + + + + + + + + + + + + + + + + + + + + +
VF39             + + + + + + + + + + + + + + + + + + + + + + +
VF39pssA::Tn5    + + + + + + + + + + + + + + + + + + + + + + +
VF39pssC-Km      + + + + + + + + + + + + + + + + + + + + + + +
VF39pssD::Tn5    + - - - - - - + + + + - - - - - + - - - - -
VF39pssF-Km      + + + + + + + + + + + + + + + + + + + + + + +
VF39pssG-Km      + + + + + + + + + + + + + + + + + + + + + + +
VF39pssH-Km      + + + + + + + + + + + + + + + + + + + + + + +
VF39pssI-Km      + + + + + + + + + + + + + + + + + + + + + + +
VF39pssM-Km      + + + + + + + + + + + + + + + + + + + + + + +
VF39pssT-Km      + + + + + + + + + + + + + + + + + + + + + + +


Figure 3-13. The organization of EPS biosynthesis gene cluster of R. leg VF39SM

The arrangement of genes in the Rhizobium leguminosarum bv. viciae VF39SM acidic exopolysaccharide biosynthesis gene cluster (Accession no: AF028810) (Sadykov et al., 1998) is represented by green arrows. The small red arrows indicate the primers used to amplify the pssD gene and both pssD and pssE genes together for gene complementation experiments.

Table 3-7. Efficiency of plaquing (EOP) data for 24 R. leguminosarum phages against EPS mutant VF39pssD::Tn5 strain

The number of plaques formed by a phage on the lawns of host rhizobia (Rlv VF39SM WT) was compared against the number of plaques formed by the phage with the test strain (Rlv VF39pssD::Tn5) to calculate the EOP. Values represent the average of two independent trials.

Average efficiency of plaquing (Compared to VF39)

Phage Name   VF39SM WT   VF39pssD::Tn5
L338A        1.0000      0.0000
L338B        1.0000      0.0000
L338C        1.0000      0.0000
L338D        1.0000      0.0000
L338E        1.0000      0.0000
L338F        1.0000      0.0000
L338H        1.0000      0.0000
P9VFAI       1.0000      0.3500
P9VFAII      1.0000      0.0000
P9VFBI       1.0000      0.0002
P9VFBII      1.0000      0.0000
P9VFCI       1.0000      0.0004
P9VFCII      1.0000      0.0002
P10VF        1.0000      0.0086
P11VFA       1.0000      0.0000
P11VFC       1.0000      0.0000
B1VFA        1.0000      0.0000
B2VF-DI      1.0000      0.0000
B3VFB        1.0000      0.0000
B3VFC        1.0000      0.4400
C2VFA        1.0000      0.0013
C2VFB        1.0000      0.0005
V1VFA        1.0000      0.0000
V1VFB        1.0000      0.0213

The gene of concern, pssD, is 459 bp long and located at the far end of the gene cluster, upstream of the pssE gene. PCR primers with appropriate adapter sites were designed to amplify the pssD gene alone, as well as both pssD and pssE genes together (932 bp) for complementation. The mutation in the pssD gene might have a polar effect on the expression of genes located downstream of it, which may also affect phage infection. To confirm the absence of such an effect, the VF39pssD::Tn5 mutant was complemented with the pssD gene alone and with both the pssD and pssE genes. The amplified PCR products (Figure 3-14) were purified using a PCR purification kit and digested with appropriate restriction enzymes before cloning into the inducible expression vector pSRKGm (Khan et al., 2008). The clones were then transformed into chemically competent E. coli DH5α and transformants were selected on LB plates with the antibiotic Gm and IPTG. Clones were screened and successful amplifications were then confirmed by Sanger sequencing. Confirmed constructs were then mobilized into Rlv VF39pssD::Tn5 using E. coli S17.1 (Simon et al., 1983), and transconjugants were selected on TY plates with Sm and Gm. Selected transconjugants, after confirmation by Eckhardt analysis (Figure 3-15), were used for EOP experiments.

The expression vector pSRKGm used in gene complementation has regulated promoters and requires an inducer to express the cloned genes (Khan et al., 2008). When performing the experiments to compare the EOPs of complemented mutants, IPTG was added to molten soft agar (final concentration of IPTG in TY soft agar ≈ 0.2 mM) before adding the bacterial host. The average EOP values for the complemented Rlv VF39pssD::Tn5 mutant are shown in Table 3-8.


Figure 3-14. PCR amplified pssD and pssD+pssE genes of R. leguminosarum bv. viciae VF39SM

The PCR-amplified pssD gene alone (459 bp) (Lane 3) and the pssD and pssE genes together (932 bp) (Lane 5) of Rlv VF39SM WT were run on a 1% agarose gel. Lanes 2 and 4 contained no-template controls for each PCR reaction. Lane 1 contained a 1 kb DNA ladder.


Figure 3-15. Eckhardt gel analysis of the Rlv VF39pssD::Tn5 and its complemented mutants

Lane 1 contains wild-type Rlv VF39SM, showing its six plasmids. Lanes 2, 3 and 4 contain Rlv VF39pssD::Tn5, Rlv VF39pssD::pSRKGm-pssD and Rlv VF39pssD::pSRKGm-pssDE, respectively.


Table 3-8. Efficiency of plaquing (EOP) data for 17 R. leguminosarum VF39SM phages against EPS mutant VF39pssD::Tn5 strain and its complemented strains

The number of plaques formed by a phage on the lawns of host rhizobia (Rlv VF39SM WT) was compared against the number of plaques formed by the phage with the test strains (Rlv VF39pssD::Tn5 and its complemented strains) to calculate the EOP. Values represent the average of two independent trials. The values where EOP is greater than 1, are highlighted indicating the restoration of sensitivity against phage infection due to gene complementation.

Average efficiency of plaquing (Compared to VF39SM WT)

Phage Name   VF39SM WT   VF39pssD   VF39pssD::pSRKGm-pssD   VF39pssD::pSRKGm-pssDE
P9VF-AI      1.00        0.05       7.33                    6.33
P9VF-AII     1.00        0.00       0.04                    6.02
P9VF-BI      1.00        0.00       0.09                    0.03
P9VF-BII     1.00        0.00       0.00                    0.01
P9VF-CI      1.00        0.07       2.88                    3.25
P9VF-CII     1.00        0.00       0.01                    0.01
P10VF        1.00        0.01       0.04                    0.03
P11VFA       1.00        0.00       0.00                    0.00
P11VFC       1.00        0.00       0.00                    0.00
B1VFA        1.00        0.00       0.00                    0.00
B3VFB        1.00        0.00       10.00                   150.00
C2VFA        1.00        0.00       0.41                    1.02
C2VFB        1.00        0.02       2.02                    2.91
V1VFA        1.00        0.00       0.08                    0.01
V1VFB        1.00        0.02       0.20                    0.24
Cp1VFA       1.00        0.00       0.17                    0.12
B2VF-DI      1.00        0.00       0.00                    0.00

The susceptibility to phages P9VF-AI, P9VF-CI, B3VFB and C2VFB was restored in both types of complemented mutants, with EOPs higher than those on the wild-type host. These increased susceptibility levels may be a result of the copy number of the expression vector used. Infection by phages P9VF-AII and C2VFA was fully restored only when the mutant was complemented with both pssD and pssE genes. None of the complemented mutants exhibited any difference in sensitivity to infection by phages P11VFA, P11VFC, B1VFA and B2VF-DI compared to the mutant Rlv VF39pssD::Tn5; infection by these phages was completely blocked in both Rlv VF39pssD::Tn5 and its complemented strains. The other phages tested showed slightly increased infectivity towards the complemented strains, though the susceptibility of the wild type was not fully restored.
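The grouping of phages by complementation outcome can be reproduced from the values in Table 3-8 with a short Python sketch. The category labels and the cut-off of EOP ≥ 1 for "restored" are assumptions made here for illustration (the table legend highlights values greater than 1); they are not a criterion defined elsewhere in this study.

```python
# EOP values transcribed from Table 3-8, in the order:
# (VF39pssD mutant, mutant + pSRKGm-pssD, mutant + pSRKGm-pssDE)
TABLE_3_8 = {
    "P9VF-AI": (0.05, 7.33, 6.33),
    "P9VF-AII": (0.00, 0.04, 6.02),
    "P9VF-BI": (0.00, 0.09, 0.03),
    "P9VF-BII": (0.00, 0.00, 0.01),
    "P9VF-CI": (0.07, 2.88, 3.25),
    "P9VF-CII": (0.00, 0.01, 0.01),
    "P10VF": (0.01, 0.04, 0.03),
    "P11VFA": (0.00, 0.00, 0.00),
    "P11VFC": (0.00, 0.00, 0.00),
    "B1VFA": (0.00, 0.00, 0.00),
    "B3VFB": (0.00, 10.00, 150.00),
    "C2VFA": (0.00, 0.41, 1.02),
    "C2VFB": (0.02, 2.02, 2.91),
    "V1VFA": (0.00, 0.08, 0.01),
    "V1VFB": (0.02, 0.20, 0.24),
    "Cp1VFA": (0.00, 0.17, 0.12),
    "B2VF-DI": (0.00, 0.00, 0.00),
}


def classify(mutant, with_pssD, with_pssDE, restored_at=1.0):
    """Assign an outcome label; restored_at is an assumed cut-off."""
    if with_pssD >= restored_at and with_pssDE >= restored_at:
        return "restored by pssD and by pssDE"
    if with_pssDE >= restored_at:
        return "restored by pssDE only"
    if with_pssD >= restored_at:
        return "restored by pssD only"
    if mutant == with_pssD == with_pssDE == 0.0:
        return "blocked in all strains"
    return "not fully restored"


groups = {}
for phage, eops in TABLE_3_8.items():
    groups.setdefault(classify(*eops), []).append(phage)

for label, phages in groups.items():
    print(f"{label}: {', '.join(phages)}")
```

With this cut-off, the phages whose EOP is restored by both constructs are P9VF-AI, P9VF-CI, B3VFB and C2VFB, and those restored only by the pssD+pssE construct are P9VF-AII and C2VFA.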

3.2 Discussion

3.2.1 General characteristics of R. leguminosarum phages

Phage host range, that is, the different bacterial genera, species and strains that can be infected by a bacteriophage, is considered a biological characteristic of great importance. It is also used as a tool during the preliminary characterization of a phage. The presence or absence of receptors on the host cell, the efficiency of phage adsorption, the successful injection of phage DNA into the bacterial cell and the ability of the phage to evade host-employed resistance mechanisms are a few of the factors that influence the infectivity spectrum of the phage. The phages in the Hynes lab collection were screened for their host range as a means of preliminary characterization, as well as to select the phages with the broadest host range. In order to use a cocktail of phages to control the growth of undesired rhizobial strains during nodulation, it is crucial to pick the phages with the broadest possible spectrum of infectivity. Therefore, phages in our collection were tested with a variety of previously characterized rhizobia and also against indigenous strains of rhizobia isolated from soils associated with various legume cultivations. Phages that exhibited a broader host range against the rhizobial strains tested were selected for further characterization (Tables 3-2 and 3-3).

Electron microscopy plays an integral role in phage biology. In fact, all current methods of phage classification rely heavily on electron microscopic observations of phage particles. Transmission electron microscopy of virion particles negatively stained with either uranyl acetate or phosphotungstate is the most commonly used technique for bacteriophages. All the examined phages of R. leguminosarum belong to the order Caudovirales, which comprises the tailed phages. The majority of phages described so far belong to this morphotype, which encompasses over 96% of all characterized phages (Ackerman, 2009). Phages P11VFA, P11VFC, L338H and B1VFA were classified in the family Siphoviridae due to the presence of a long, flexible and presumably non-contractile tail, while the rigid, contractile tails of P9VFCI, V1VFA, AF3 and V1VFB placed them in the family Myoviridae. The capsid diameter of tailed phages ranges from 30 nm to 160 nm, whereas the tail length falls within 10-800 nm (Ackerman, 2009). R. etli phages RHEph 04, RHEph 05 and RHEph 06, which have been classified in the family Myoviridae, have polyhedral heads 60 nm in diameter and contractile tails of 92 nm (Santamaria et al., 2014). The Myoviridae rhizobiophages examined in this study had larger capsids and longer tails. The dimensions of the siphophages reported in this study are similar to those of our previously characterized phage L338C (Restrepo, 2012), although the tail length was determined only for phage P11VFA. L338C is a Siphoviridae phage with a long tail of 298 nm and a head diameter of 76 ± 11 nm, while the R. leguminosarum siphophage H3V had an elongated head (58 x 76 nm) with a long tail 120 nm in length (Dhar et al., 1993).

One-step growth experiments are important in determining the duration of the phage infectious cycle and the yield of phage particles per infected host cell under given physicochemical conditions. The success of the experiment usually relies on the multiplicity of infection (MOI), as a typical one-step growth curve assumes that the infection of a single cell is initiated by a single phage. The latent period of a lytic phage infection commences with the attachment of the phage to its host cell and terminates at the lysis of the host cell, which can be detected as the first increase in the number of phage particles. The eclipse period is concluded by the appearance of the first progeny phage particle in the cytoplasm of an infected bacterial cell, a condition that can be observed only by artificial lysis of cells. The eclipse period usually overlaps with the latent period and can be determined by treating infected host cells with chloroform. The Siphoviridae phage RL1 had a latent period of 90 minutes followed by an exponential increase of 80 minutes (Dhar et al., 1978). A much longer latent period (120 minutes) with a shorter exponential increase (20 minutes) was observed with phage P9VFCI, whereas values similar to those of previously characterized rhizobiophages were observed for P11VFC. The siphophage Mlo1 also had an extended latent period of 180-200 minutes (Turska-Szewczuk et al., 2010). The burst sizes of previously described rhizobiophages vary substantially, from very few to 210 phage particles per infected cell (Atkins, 1973; Defives et al., 1993; Malek et al., 2009). The burst size of R. leguminosarum phage L338C was calculated as 17 pfu/infected cell, while P10VF had a burst size of 140 pfu/infected cell (Restrepo, 2012). The M. loti phages Mlo1, Mlo30, Mam12 and Mam20 had burst sizes ranging from 10 to 120 pfu/infected cell (Turska-Szewczuk et al., 2010), while rhizobiophages H3V and R2V of R. leguminosarum had much higher burst sizes of 240 and 200 pfu/infected cell, respectively (Dhar et al., 1993).
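As an illustration of how a latent period and a burst size can be read from a one-step growth curve, the following Python sketch may be helpful. It is a simplified sketch, not the procedure used in this study: the function name, the two-fold rise threshold and the titers are hypothetical, and it assumes that unadsorbed phages were removed so that the initial titer counts infected cells (infective centres).

```python
def one_step_growth_summary(times_min, titers_pfu_per_ml, rise_factor=2.0):
    """Estimate latent period and burst size from a one-step growth curve.

    Latent period: the last sampling time before the titer first exceeds
    rise_factor x the initial titer (resolution is limited by the sampling
    interval). Returns None if no rise is observed.
    Burst size: plateau titer / initial titer, assuming the initial titer
    represents infected cells only.
    """
    baseline = titers_pfu_per_ml[0]
    latent_period = None
    # Pair each titer with the sampling time that precedes it.
    for previous_time, titer in zip(times_min, titers_pfu_per_ml[1:]):
        if titer > rise_factor * baseline:
            latent_period = previous_time
            break
    burst_size = max(titers_pfu_per_ml) / baseline
    return latent_period, burst_size


# Hypothetical curve, for illustration only (not data from this study).
times = [0, 30, 60, 90, 120, 150, 180]
titers = [1.0e5, 1.0e5, 1.1e5, 1.2e5, 2.5e6, 1.2e7, 1.2e7]

latent, burst = one_step_growth_summary(times, titers)
print(f"latent period ~ {latent} min, burst size ~ {burst:.0f} pfu/infected cell")
```

Because the latent period is read from discrete samples, its estimate can be no finer than the sampling interval, and the burst size is only meaningful if the MOI is low enough that most infected cells received a single phage.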

3.2.2 Role of exopolysaccharide (EPS) and lipopolysaccharide (LPS) synthesis genes of R. leguminosarum bv. viciae VF39SM in phage infection

In order for phages to commence infection and initiate intracellular replication, they must first attach to their bacterial host (phage adsorption) and inject their DNA into the host cytoplasm (Samson et al., 2013). The specificity of the host attachment sites, or receptors, used by the phage is a key determinant of phage host range. Apart from the structural properties of host receptors, the localization, density and number of such receptors on the cell surface are also recognized as vital factors in phage adsorption (Rakhuba et al., 2010). Four broad categories of phage receptors are recognized in Gram-negative bacteria: flagella, pili, outer membrane proteins and extracellular polysaccharides (Crook et al., 2013). The outer membrane proteins OmpF and OmpC serve as receptors for coliphages T2 and T4, respectively (Rakhuba et al., 2010), whereas the essential porin RopA1 of Sinorhizobium meliloti is the target for many phages, including ϕM12 and N3 (Crook et al., 2013). The T3 and T7 phages use components of the outer membrane lipopolysaccharides (LPS) of Shigella and Escherichia as their receptors (Rakhuba et al., 2010). Many phages also target the pili, flagella, and capsular and slime polysaccharides of their host bacterial cell as receptors (Lotz & Pfister, 1975; Lotz et al., 1977). The external capsular or exopolysaccharide layers of bacteria can either prevent phage access to receptors located on the cell wall or can themselves serve as phage receptors (Rakhuba et al., 2010). In an attempt to determine the potential phage receptors in R. leguminosarum for the phages in the Hynes lab collection, the host range of the phages was tested with different types of VF39 and 3841 mutants. Rlv strain VF39SM contains 4-7 peritrichous flagella (Tambalo et al., 2010), whereas 1-2 subpolar flagella are present in Rlv 3841 (Miller et al., 2007). Host range tests against mutants of Rlv VF39SM and 3841 lacking flagellar filaments provided no evidence that flagella are necessary for phage infection (Restrepo, 2012). The large plasmids of rhizobia contain genes that are not only important for their nitrogen-fixing symbiotic relationship, but also helpful in increasing their survival fitness within the rhizosphere (Ding & Hynes, 2009). The lack of the symbiotic plasmid pRP2JI in R. leguminosarum bv. phaseoli makes the strain resistant to phage infection (Jun et al., 1993). However, no difference in sensitivity to phage infection compared to the wild type was observed with plasmid-cured strains (Restrepo, 2012).

As a further attempt to determine the putative phage receptors on Rlv VF39SM, five different mutants with altered LPS synthesis abilities (Priefer, 1989) were used in host range studies. Infection by phage P10F3B was prevented in both the VF39-86B and VF39-48B mutants. Apart from the altered sensitivity of VF39-86B to four of the tested phages, the other LPS mutants did not exhibit any significant changes in phage susceptibility compared to the wild-type strain (Table 3-5). Out of the nine EPS mutants screened, VF39pssD::Tn5 was selected for further studies due to its elevated resistance to the R. leguminosarum phages tested (Table 3-6). Infection by phages CF3, P10F3A, P10F3B and L338H was blocked in both the LPS mutant VF39-86B and the EPS mutant VF39pssD::Tn5.

Among the different types of polysaccharides synthesized by rhizobia, extracellular polysaccharides are important due to their functions in free-living rhizobia and in the establishment of the rhizobial symbiotic association with the host plant (Ivashina & Ksenzenko, 2012). EPS are species- or strain-specific heteropolysaccharides composed of repeating units, which have slight or no cell association (Ivashina & Ksenzenko, 2012; Skorupska et al., 2006). The gene pssD codes for a putative glucuronosyl transferase and is classified as essential for EPS biosynthesis in R. leguminosarum, as a mutation in the pssD gene results in a strain entirely deficient in EPS production (Skorupska et al., 2006). EPS biosynthesis in R. leguminosarum is initiated by transferring a UDP-glucose to the lipid carrier attached to the cytoplasmic membrane, a reaction catalyzed by the gene product of pssA. Subsequently, the unit-synthesis glucuronosyl transferase encoded by pssD and pssE catalyses the addition of a glucuronic acid residue (Skorupska et al., 2006). Thus, the gene products of pssA and pssD are involved in the early stages of EPS biosynthesis and are considered essential for the process (Ivashina & Ksenzenko, 2012; Skorupska et al., 2006). The importance of EPS in phage infection has been previously reported in Erwinia amylovora, where Podoviridae and Myoviridae phage infections were favored by the presence of two different types of host EPS (Roach et al., 2013).

Based on the EOP experiments with the wild type and the VF39pssD::Tn5 mutant strain, it is evident that the mutant strain lacks an essential component for phage infection (Table 3-7), or has a modified cell envelope that interferes with phage infection. However, complementation with the individual pssD gene, as well as with both the pssD and pssE genes, failed to restore the plaquing efficiency of the strain for the majority of the phages tested. In particular, infection by phages P11VFA, P11VFC, B1VFA and B2VF-DI was completely blocked even after gene complementation. These results may suggest the possibility of one or more random mutations in the genome of the VF39pssD::Tn5 mutant. In addition, the pRleVF39a plasmid of the Rlv VF39pssD::Tn5 mutant runs slightly faster than the pRleVF39a plasmid of the Rlv VF39 wild type (Figure 3-15), which may also play a role in the altered phage sensitivity of the mutant. Nevertheless, the involvement of EPS in infection by P9VF-AI, P9VF-CI, B3VFB and C2VFB can be concluded, as the susceptibility of the host was restored by gene complementation.

3.2.3 Genome analysis

Bacteriophages are known to modify the bases in their genomes as a mechanism to evade host restriction-modification systems. This has been previously reported for several rhizobiophages, such as P10VF (Restrepo, 2012), RL38JI (Swinton et al., 1985), RL2RES, RL1RES (Mendum et al., 2001), N3 (Martin & Long, 1984) and φM12 (Finan et al., 1984), whose modified genomes resist cleavage by most of the available restriction enzymes. Based on the restriction profiles of rhizobiophages P9VFCI and AF3, it can be suggested that these phages may harbor modified genomes. Interestingly, both of these phages with genomes resistant to enzymatic digestion were classified in the family Myoviridae, while the three other phages for which no such resistance was observed (L338H, P11VFC and B1VFA) belong to the family Siphoviridae. Rhizobiophage P10VF, which was predicted to have a modified genome, was also characterized as a Myoviridae phage (Restrepo, 2012).

3.2.3.1 Genome of vB_RleM_P10VF (P10VF)

P10VF, a phage isolated using the trapping host Rlv VF39SM and classified in the family Myoviridae, has been fully characterized previously with respect to morphology, growth and host range (Restrepo, 2012). The genome size was estimated as 194 kb based on PFGE, and the genomic DNA of the phage exhibited remarkable resistance to digestion by most of the restriction enzymes used (Restrepo, 2012). Phage P10VF exhibited a moderate host range, infecting 16 out of 24 indigenous strains tested (Restrepo, 2012). However, it had a much narrower host range against characterized rhizobial strains, with only 7 out of 25 strains being susceptible to infection (Restrepo, 2012). Due to the ease of DNA extraction and its moderate host range, P10VF was selected for further characterization (Restrepo, 2012). The genome of P10VF was sequenced using several strategies, including shotgun cloning with subsequent Sanger sequencing, 454 pyrosequencing and Ion Torrent technology. However, the data obtained through these approaches could not be used to assemble the complete genome of the phage, due to the short length of the contigs produced by the assembly (N50 = 1,566 bp). The genomic DNA of phage P10VF was suggested to be highly modified, based on its ability to resist digestion by an array of restriction enzymes. The failed attempts at sequencing using different strategies could also possibly be attributed to the presence of a modified genome (Restrepo, 2012).

The sequencing of the P10VF genome was repeated using Ion Torrent and Illumina technologies at the University of Regina, Saskatchewan. The data obtained through both sequencing methods were successfully used to assemble the genome into a contig 156,446 bp in length. The complete annotated genome of P10VF (vB_RleM_P10VF) can be found under the accession number KM199770. Out of the 257 predicted ORFs, putative functions could be assigned to only 59 protein-coding genes, leaving 198 putative genes (77.0%) with unknown functions. Additionally, BLASTP similarities revealed that 34 of the predicted ORFs could be linked to bacteriophage T4-related conserved domains.

Based on the ICTV (International Committee for the Taxonomy of Viruses) classification, the genus ‘T4-like viruses’ is one of the three genera classified under the subfamily Tevenvirinae of the family Myoviridae. This genus includes a diverse set of lytic myophages (e.g., T-even phages, Aeromonas phage 25 and Pseudomonas phage 42), which exhibit genetic and morphological similarities to the well-studied bacteriophage T4 of Escherichia coli. The members of the genus possess relatively large genomes, ranging from ~160 to 250 kb, and have the ability to encode most of the components required for their own replisome and recombination systems (Miller et al., 2003; Nolan et al., 2006). The genomes of T4 and its relatives share mosaic similarities with each other; this has been attributed to their ability to undergo DNA rearrangements, replacements, translocations and inversions (Miller et al., 2003; Petrov et al., 2010).

Gene expression, DNA replication, repair and processing

A major fraction of the predicted ORFs in the P10VF genome with functional assignments were associated with DNA metabolism, replication, recombination and repair (Figure 3-8). Genes responsible for DNA replication are scattered throughout the genome of P10VF. The genome codes for a DNA polymerase III epsilon subunit (ORF 13), an ATP-dependent helicase (ORF 76), a DNA helicase (ORF 100), a ssDNA binding protein (ORF 122), a DNA primase (ORF 148), a primase/helicase (ORF 171), a putative sliding clamp (ORF 199), sliding clamp loader subunits (ORF 200 and 201) and a DNA polymerase (ORF 214), all of which may be directly involved in forming the basic replisome of the phage during replication. The gene coding for DNA polymerase in P10VF exhibited sequence similarity to the conserved domain of gp43 (DNA polymerase) of phage T4, while the products of ORF 122 (ssDNA binding protein) and ORF 148 (DNA primase) were associated with the conserved domains of gp32 and gp61 of T4, respectively. Furthermore, sequence similarities could be identified between the respective conserved domains of T4 genes and the products of ORF 199 (putative sliding clamp), ORF 200, ORF 201 (sliding clamp loader subunits) and ORF 171 (DNA helicase). ORF 215 codes for an ATP-dependent DNA ligase, while ribonuclease H activity was assigned to ORF 69.

The P10VF genome contains six ORFs that were predicted to code for products related to different endonucleases. ORF 195 and 196 were predicted to encode subunits of recombination endonucleases, and the gene product of ORF 72 was also predicted to be an endonuclease. ORF 77, 121 and 130 code for putative homing endonucleases and exhibit sequence homology to the N-terminal catalytic GIY-YIG domain of the proteins encoded by the bacteriophage T4 segABCDEFG genes (Appendix V). Homing endonucleases are highly specific, rare-cutting DNA-cleaving enzymes that often reside within self-splicing elements such as group I introns, group II introns and inteins, conferring their own mobility within and between genomes (Belfort & Roberts, 1997; Chevalier & Stoddard, 2001; Edgell et al., 2010; Stoddard, 2014). However, the T4 genome encodes 15 homing endonucleases, 12 of which are not intron-encoded. These genes are located within intergenic regions and are known as free-standing endonucleases. The free-standing endonucleases belong mainly to two families, namely the GIY-YIG and HNH homing endonucleases (Edgell et al., 2010). The genes coding for GIY-YIG family homing endonucleases are designated seg (Similar to Endonucleases encoded within Group I introns) genes. All three homing endonucleases found in the P10VF genome can be categorized as free-standing homing endonucleases based on their association with the GIY-YIG domain of the segABCDEFG genes.

Another striking feature of T4-like phages, especially the T-even phages, is their ability to modify their genomes using their own enzyme systems, which confers resistance against host defense mechanisms. The presence of hydroxymethylated cytosine (hmCyt) in place of cytosine within the genome makes them immune to host restriction-modification systems, which are devised to recognize specific sequences containing cytosine residues. In phage T4, the phage-encoded dCTPase-dUTPase, dCMP-hydroxymethylase and dNMP kinase together form a machinery that produces a pool of hmCyt residues, which are used for the synthesis of phage DNA (Petrov et al., 2010; Warren, 1980). The genome of P10VF carries several genes that appear to be related to such a system. ORF 175 and 177 of the P10VF genome encode a putative dNMP kinase and a pyrimidine hydroxymethylase (dCMP-hydroxymethylase), while dCMP deaminase activity was assigned to ORF 15. The phage-encoded dCMP deaminase converts part of the dCMP into dUMP, whereas the rest of the dCMP residues are used as the substrate for dCMP-hydroxymethylase (Petrov et al., 2010; Warren, 1980). Phage T4 also modifies its bases by glycosylation of hmCyt residues. Hydroxymethyl groups are substituted with glucose molecules after DNA polymerization by phage-encoded α- and β-glucosyltransferases (Miller et al., 2003). Based on sequence similarities, ORF 157 of P10VF contained a putative conserved glycosyltransferase domain, hinting at the possible existence of a glycosylation system in the phage. Moreover, ORF 213 was assigned as a gene putatively coding for an adenine-specific DNA methylase. P10VF DNA exhibited noteworthy resistance to cleavage by restriction enzymes, reducing the available choices for shotgun cloning and subsequent sequencing (Restrepo, 2012). Furthermore, several failed attempts at sequencing the phage genome were also attributed to the presence of modified bases in the genome. The existence of an array of genes with a likely involvement in base modification strengthens these assumptions. Additionally, the size of the P10VF genome was estimated as 194 kb based on the results of PFGE. The discrepancy between the PFGE-estimated size and the size of the sequenced genome (156,446 bp) may be due to the presence of glu-hmCyt residues, which would considerably increase the overall molecular weight of the phage genomic DNA. If each hmCyt residue in the genome is glycosylated with a single glucose molecule, the molecular weight of P10VF genomic DNA would increase by approximately 18% of its actual MW due to glycosylation (number of C in the genome = 33,522).
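To put the size discrepancy in perspective, the following short Python sketch computes how much larger the PFGE estimate is than the assembled genome, using only the two sizes given in the text. The constant names are illustrative; the comparison itself is simple arithmetic and makes no assumption about how DNA modification affects migration in PFGE.

```python
PFGE_ESTIMATE_BP = 194_000      # genome size estimated from PFGE (194 kb)
ASSEMBLED_GENOME_BP = 156_446   # length of the assembled contig

# Absolute and relative excess of the PFGE estimate over the assembly.
excess_bp = PFGE_ESTIMATE_BP - ASSEMBLED_GENOME_BP
relative_excess = excess_bp / ASSEMBLED_GENOME_BP

print(f"PFGE estimate exceeds the assembly by {excess_bp} bp ({relative_excess:.1%})")
```

The apparent excess is roughly a quarter of the assembled genome length, which is of the same order as, though somewhat larger than, the approximately 18% mass increase estimated above for glycosylation.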

DNA packaging and phage morphogenesis

DNA packaging in dsDNA phages is an energy-driven process to which two main non-structural components contribute: the large terminase subunit, with packaging ATPase activity, and the small terminase subunit, which mediates the specific recognition of phage DNA (Casjens, 2011; Rao & Feiss, 2008). ORF 111 was assigned as a gene coding for a putative large terminase subunit, exhibiting homology to the conserved domain of gp17 of phage T4. Although not conclusively annotated as a gene coding for a small terminase subunit, the putative product of ORF 98 showed weak sequence similarity to a terminase small subunit of an uncultured phage and was also associated with the conserved domain of gp16 of T4. In dsDNA phages, DNA is typically translocated into the empty prohead by the terminase complex through an ATP-driven process. The assembly of the phage prohead needs at least three main protein components: a portal protein, a scaffolding protein and a major capsid protein (Casjens, 2011). The maturation of the T4 head occurs through the activity of the prohead protease, which cleaves the N-termini of the major capsid protein and the portal vertex protein, resulting in an empty, mature prohead ready for DNA packaging. ORF 83 of the P10VF genome is annotated as a gene coding for a major capsid protein, while ORF 80, 85 and 87 encode a trypsin-like serine protease (prohead protease), a prohead assembly protein and a portal vertex protein, respectively.

Bacteriophage tails serve several vital functions, including recognition of and attachment to a host cell, penetration of the host cell envelope and ejection of phage DNA into the host cytoplasm (Leiman et al., 2004). A number of genes in the genome of P10VF were identified as having functions associated with baseplate and tail morphogenesis. Based on sequence similarities, three baseplate wedge subunits (ORF 53, 96 and 97), a tail sheath stabilization protein (ORF 55), a tail completion and sheath stabilizer protein (ORF 66), a tail tube protein (ORF 88), a tail sheath protein (ORF 89), two baseplate hub subunit and tail lysozyme proteins (ORF 60 and 110) and a baseplate wedge protein (ORF 168) were recognized. The tail of a Myoviridae phage is composed of a baseplate and a tail tube, which is covered by the outer tail sheath. Contraction of the tail sheath pushes the phage DNA through the inner passage created by the tail tube proteins into the host cell (Aksyuk et al., 2011; Leiman et al., 2004). In phage T4, a complex formed by two proteins, the tail lysozyme (gp5) and the baseplate hub (gp27), is located at the tip of the tail tube, where it helps to penetrate the outer membrane and peptidoglycan during infection (Miller et al., 2003).

Host cell lysis

The infection cycle of a lytic phage is concluded by the release of progeny phage particles through the lysis of the host bacterial cell. The lysis process of phages, especially dsDNA phages, is a complex and carefully timed event involving two major proteins. A phage-encoded endolysin degrades the peptidoglycan layer of the host cell, while another protein, known as holin, helps the endolysin to move from the cytoplasm to the periplasmic space, where it can reach its substrate. The gene product of ORF 41 in the P10VF genome was assigned the function of N-acetylmuramoyl-L-alanine amidase. It was also associated with the conserved domain of peptidoglycan recognition proteins. N-acetylmuramoyl-L-alanine amidases are one of the five main classes of endolysins, and are capable of cleaving amide linkages between the N-acetylmuramoyl residues and stem peptides (Borysowski et al., 2006; Loessner, 2005; Pastagia et al., 2013). However, no ORF that could be identified as a holin-encoding gene was observed in the P10VF genome.

3.3 Author’s contributions

The work reported in this chapter was complementary to the study carried out by Marcela Restrepo for her M.Sc. thesis (2012). Suriakarthiga Ganesan contributed to phage isolation and host range experiments. Benjamin Perry (Dr. Christopher Yost’s lab, University of Regina) carried out the sequencing and the assembly of the P10VF genome, while I performed all the other experiments mentioned in this chapter.

Chapter Four: Characterization of Rhizobium gallicum phages

Rhizobium gallicum strains are capable of nodulating and fixing nitrogen with common bean, Phaseolus vulgaris. They have been further divided into two subgroups: biovar gallicum and biovar phaseoli, based on their host ranges (Amarger et al., 1997).

Although a number of rhizobiophages have been previously reported and characterized, to our knowledge, none have been isolated with specificity to strains of R. gallicum. We used an R. gallicum strain isolated from soil obtained from Saskatchewan, Canada, as a trapping strain to isolate R. gallicum-specific bacteriophages from a variety of soil samples from western Canada. This strain, S014B-4, was not infected by most of the phages in the Hynes lab collection.

4.1 Results

4.1.1 Phage isolation and trapping rhizobial host

Phage isolation was performed with soil samples with a history of legume cultivation, obtained from Alberta and Saskatchewan, Canada, using R. gallicum strain S014B-4 as the trapping host. Strain S014B-4 is an indigenous rhizobial host that was first isolated from soils associated with Vicia cracca in Outlook, Saskatchewan. 16S rDNA analysis of strain S014B-4 revealed that it belongs to the species R. gallicum, placing it relatively close to the other known R. gallicum strains in a phylogenetic tree based on the 16S ribosomal sequences. Phages with different plaque morphologies were isolated with single plaque isolations and stored at 2-8 °C in TY (Table 4-1). All the phage isolates were named according to the naming system described in the previous chapter. The phages that were classified according to their morphology were further named based on the guidelines of the proposed virus nomenclature system (Kropinski et al., 2009).

4.1.2 Host range of R. gallicum phages

The host ranges of the isolated phages were explored with spot testing using the agar overlay technique. Results were recorded as ‘lytic’ or ‘non-lytic’ based on the clearance on the bacterial lawn after overnight incubation at 30°C. To determine the infection spectrum of each phage isolate, an array of previously characterized strains of rhizobia was used (Table 4-2).

This collection included R. gallicum bv. gallicum strains R602spT, PhP222 and PhF29 (Amarger et al., 1997; Geniaux et al., 1993) and R. gallicum bv. phaseoli strains PhD12 and PhI21 (Amarger et al., 1997; Geniaux et al., 1993). R. leguminosarum strain F3 (Frost and Yost, unpublished) and R. leguminosarum bv. viciae (Rlv) strains 3841 (Poole et al., 1994), VF39SM (Priefer, 1989), 306 (Hirsch, 1979), 3855 (Brewin et al., 1982), 336 (Josey et al., 1979) and 248 (Josey et al., 1979) were also used for the host range assays. All these strains were resistant to the phages tested, except for R. gallicum bv. phaseoli strain PhI21, which was infected by P106CI. R. etli strain CE3 (CFN42 derivative) (Noel et al., 1984), R. tropici CIAT 899T (Martínez-Romero et al., 1991) and Sinorhizobium meliloti strains AK631 (Kondorosi et al., 1989), CC2013 (Charles et al., unpublished) and 102F34 (Becker et al., 1993) were also resistant to phage infection. P106B was infective against R. leucaenae strain CFN 299T (Ribeiro et al., 2012).

Table 4-1. R. gallicum phages and their morphological characteristics

| Soil sample / source | Phage isolated | Plaque morphology | Virion morphology* |
|---|---|---|---|
| P10 (Pea soil sample 10) / Yost garden, Calgary AB | P106A (vB_RglS_P106A) | Clear plaques with a surrounding thin halo zone (d¹ ≈ 2 mm) | Family: Siphoviridae; head diameter: 78±7 nm; tail length: 130±10 nm; tail width: ND |
| P10 (Pea soil sample 10) / Yost garden, Calgary AB | P106B (vB_RglS_P106B) | Clear plaques with no halo zone (d¹ ≈ 0.5-1 mm) | Family: Siphoviridae; head diameter: 61±6 nm; tail length: 109±8 nm; tail width: 12±3 nm |
| P10 (Pea soil sample 10) / Yost garden, Calgary AB | P106CI (vB_RglS_P106CI) | Opaque plaques with no halo zone (d¹ ≈ 2 mm) | Family: Siphoviridae; head diameter: 73±7 nm; tail length: 140 nm; tail width: ND |
| P10 (Pea soil sample 10) / Yost garden, Calgary AB | P106DII | Opaque plaques with no halo zone (d¹ ≈ 0.5-1 mm) | ND |

¹ d, plaque diameter; ND, not determined.

* The dimensions of the phage particles were determined using the AMT software. The values represent at least three measurements (n ≥ 3) with standard deviation (± SD).

Table 4-2. Host range of Rhizobium gallicum phages with known rhizobial strains.

A ‘+’ sign (grey square) indicates lysis and a ‘−’ (clear square) indicates no lysis.

| Rhizobial strain | P106B | P106A | P106CI | P106DII |
|---|---|---|---|---|
| R. gallicum S014B-4 (6) | + | + | + | + |
| R. gal bv. gallicum R602spT | − | − | − | − |
| R. gal bv. gallicum PhP222 | − | − | − | − |
| R. gal bv. gallicum PhF29 | − | − | − | − |
| R. gal bv. phaseoli PhD12 | − | − | − | − |
| R. gal bv. phaseoli PhI21 | − | − | + | − |
| R. leg bv. viciae 3841 | − | − | − | − |
| R. leg bv. viciae VF39SM | − | − | − | − |
| R. leg bv. viciae 248 | − | − | − | − |
| R. leg bv. viciae 306 | − | − | − | − |
| R. leg bv. viciae 336 | − | − | − | − |
| R. leg bv. phaseoli 8401 | − | − | − | − |
| R. leg 3855 | − | − | − | − |
| R. leg F3 | − | − | − | − |
| S. meliloti CC2013 | − | − | − | − |
| S. meliloti AK631 | − | − | − | − |
| S. meliloti 102F34 | − | − | − | − |
| R. etli CE3 | − | − | − | − |
| R. leucaenae CFN299T | − | + | − | − |
| R. tropici CIAT899 | − | − | − | − |

Table 4-3. Host range of Rhizobium gallicum phages with indigenous rhizobial strains

Twenty-four indigenous rhizobial strains isolated from different soil samples associated with legume growth were screened against the phage isolates. A ‘+’ sign (grey square) indicates lysis and a ‘−’ (clear square) indicates no lysis.

| Rhizobial strain* | P106B | P106A | P106CI | P106DII |
|---|---|---|---|---|
| R. gallicum S014B-4 (6) | + | + | + | + |
| R. gallicum S004A-2 (14) | + | + | − | + |
| R. gallicum S013A-1 (15b) | − | + | − | − |
| R. gallicum S019B-5 (27) | − | − | − | − |
| Rhizobium spp. S023A-4 (1) | − | + | − | + |
| Rhizobium spp. S022A-2 (2) | − | − | − | − |
| Rhizobium spp. S018A-3 (3) | − | − | − | + |
| Rhizobium spp. S027B-3 (4a) | − | − | − | + |
| Rhizobium spp. S018A-1 (5) | − | − | − | + |
| Rhizobium spp. S001B-3 (7) | − | − | − | − |
| Rhizobium spp. S008A-5 (8) | − | − | − | − |
| Rhizobium spp. S012A-1 (9) | − | − | − | − |
| Rhizobium spp. S016B-1 (10) | − | − | − | − |
| Rhizobium spp. S016A-3 (11) | − | − | + | − |
| Rhizobium spp. S015A-1 (12) | − | − | − | + |
| Rhizobium spp. S030B-4 (13b) | − | − | − | − |
| Rhizobium spp. S020A-5 (16a) | − | − | − | − |
| Rhizobium spp. S002A-4 (17a) | − | − | − | − |
| Rhizobium spp. S011B-1 (18) | − | − | − | − |
| Rhizobium spp. S012A-2 (19) | − | − | − | − |
| Rhizobium spp. S012B-3 (20) | − | + | + | + |
| Rhizobium spp. S028A-4 (21) | − | − | − | − |
| Rhizobium spp. S010A-2 (23a) | − | + | + | + |
| Rhizobium spp. S010B-1 (24) | − | − | − | − |
| Rhizobium spp. S021A-2 (25) | + | + | − | + |

* The indigenous strains were classified according to their plasmid profiles and the number within brackets indicates their profile group for easy identification.

Apart from these characterized rhizobial strains, phages were also screened against different indigenous rhizobial strains isolated from root nodules of Pisum sativum. These strains had been isolated from pea plant nodules by inoculating the plants with soil samples associated with different legumes (Hynes lab, unpublished). Phage P106DII showed a broader host range against these indigenous strains, infecting 9 of the 24 strains tested, whereas P106B was capable of infecting 6 indigenous rhizobial isolates (Table 4-3).

4.1.3 Lysis curves

The ability of phages to lyse their rhizobial host was determined by infecting an exponentially growing rhizobial culture (Figure 4-1). The onset of lysis by both P106B and P106CI was observed 3 hours after phage inoculation. Lysis by P106CI was rapid during the 2 hours following the onset of lysis, after which the OD600 of the host culture stabilized for the following 4-5 hours. The P106B infection showed a slow, steady decrease in the OD600 of the host culture over the course of the experiment.

4.1.4 Plaque formation and morphology

Phage P106B produced relatively small plaques with an average diameter of 0.5-1 mm on a 0.5% TY overlay after 24 hours of incubation at 30°C (Figure 4-2A). Electron microscopy of P106B revealed that the phage belongs to the family Siphoviridae (Figure 4-2B), with a characteristic long, non-contractile tail and an icosahedral head. The head diameter of the particle was estimated to be 61±6 nm, whereas the tail length and width were determined as 109±8 nm and 12±3 nm, respectively (Table 4-1).

Figure 4-1. Lysis curves of phage P106B and P106CI

Phages P106B and P106CI were propagated on R. gallicum S014B-4 (6) at 30°C. Values are the OD600 at different time points and represent means of three independent replicates; error bars indicate standard error of the mean (SEM). Time = 0 hours represents the phage inoculation. A negative control of R. gallicum S014B-4 (6) without any phage was also included.
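The summary statistic named in this caption can be sketched as follows. This is a minimal illustration only, assuming the replicate OD600 readings for one time point are available as plain numbers; the readings shown are hypothetical and are not data from this study, and the function name is illustrative.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate readings."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for an SEM")
    mean = statistics.mean(values)
    # SEM = sample standard deviation / sqrt(number of replicates)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical OD600 readings from three replicates at one time point.
replicates = [0.42, 0.45, 0.48]
mean, sem = mean_and_sem(replicates)
print(f"OD600 = {mean:.3f} +/- {sem:.3f} (SEM, n = {len(replicates)})")
```

Applying the function to every time point gives the plotted means and error bars.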

Figure 4-2. Morphological characterization of phage P106B

(A) Plaque morphology of P106B on a lawn of R. gallicum S014B-4 (6), grown on 0.5% TY overlay after 24 hours of incubation at 30°C. (B) Transmission electron micrograph of phage P106B stained with 1% uranyl acetate. Scale bar represents 100 nm.

Figure 4-3. Transmission electron micrographs of R. gallicum phages

Transmission electron micrographs of phage P106B with R. gallicum strain S014B-4 (6) (A), P106CI (B) and P106A (C) stained with 1% uranyl acetate. Scale bars represent 500 nm in A and 100 nm in B and C.

TEM was also performed on two other phages, P106A and P106CI, both of which had the characteristic morphology of phages of the family Siphoviridae (Figure 4-3 and Table 4-1).

Phages P106B and P106DII were selected for further characterization as they had comparatively higher infection ability against the indigenous strains than the other R. gallicum phages tested. However, phage DNA extraction from P106DII was challenging, and many attempts yielded DNA concentrations that were not sufficient for further analysis. Therefore, further characterization was carried out only with phage P106B.

4.1.5 One-step growth curve of P106B

A one-step growth curve for P106B was performed to assess the time scale and the burst size of the phage infection. The eclipse and the latent periods of the phage were determined as 60 minutes and 100 minutes, respectively (Figure 4-4), and were followed by an exponential increase of phage particles over a period of 60 minutes. The burst size of the phage was calculated as 21 pfu/infected cell.
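The burst size arithmetic can be illustrated with a short sketch. It assumes the common convention of dividing the plateau titre reached after the rise period by the number of infected cells (infective centres) present during the latent period; the exact procedure used in this study is not restated here, and the titres below are hypothetical values chosen only to show the calculation.

```python
def burst_size(plateau_pfu_per_ml, infected_cells_per_ml):
    """Average number of progeny phage released per infected cell."""
    if infected_cells_per_ml <= 0:
        raise ValueError("infected cell count must be positive")
    return plateau_pfu_per_ml / infected_cells_per_ml


# Hypothetical titres, for illustration only.
infected = 1.0e5  # infective centres per ml during the latent period
plateau = 2.1e6   # pfu per ml once the rise period is complete
print(f"burst size: {burst_size(plateau, infected):.0f} pfu/infected cell")
```

Because both quantities are titres from the same culture, any dilution applied during the experiment must be applied to both before taking the ratio.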

4.1.6 Genome characterization of P106B

Based on PFGE, the genome size of rhizobiophage P106B was estimated to be 56 kb (Figure 4-5A). The phage DNA was digested with an array of restriction enzymes with different methylation sensitivities. DNA of P106B was cleaved by EcoRI, XbaI, DraI, BclI, NsiI and SphI (Figure 4-5B). HindIII, MboI and NdeI were also able to digest P106B DNA (data not shown). Restriction enzymes BamHI, SmaI, KpnI, PstI and BglII were unable to cleave P106B DNA (data not shown).

Figure 4-4. One step growth curve of phage P106B

Phage P106B was propagated on R. gallicum S014B-4 (6) at 30°C. Values represent means of three independent replicates and error bars indicate standard error of the mean (SEM). The values are the log pfu/ml at different time points.

Based on the data obtained through Sanger sequencing and Ion Torrent sequencing, the genome of P106B was assembled into one final contig of 56 kb with an average coverage of 83.39X. To confirm proper assembly, the virtual digest of the assembled contig with enzymes XbaI and SphI was compared with the restriction profile of the phage DNA digested with the same enzymes. The genome has an average G+C content of 47.9%.
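The idea behind a virtual digest can be sketched as follows. This is a simplified illustration, not the software used in this study: it assumes a linear molecule, uses the standard recognition sequences of XbaI (T^CTAGA) and SphI (GCATG^C), and runs on a short hypothetical toy sequence rather than the P106B contig. The names `SITES` and `digest` are illustrative.

```python
import re

# Recognition sequence and cut offset (bases from the start of the site)
# on the top strand: XbaI cuts T^CTAGA, SphI cuts GCATG^C.
SITES = {"XbaI": ("TCTAGA", 1), "SphI": ("GCATGC", 5)}


def digest(seq, enzymes):
    """Return fragment lengths (largest first) for a linear molecule."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = SITES[name]
        # A lookahead pattern also finds overlapping occurrences.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    fragments = [end - start for start, end in zip(bounds, bounds[1:])]
    return sorted(fragments, reverse=True)


# Hypothetical 23 bp toy sequence with one XbaI and one SphI site.
toy = "AAAATCTAGACCCCCGCATGCGG"
for combo in (["XbaI"], ["SphI"], ["XbaI", "SphI"]):
    print("+".join(combo), digest(toy, combo))
```

The predicted fragment sizes are then compared with the band pattern on the gel; a mismatch in the number or sizes of bands would point to a misassembly.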

Based on the predictions from RAST and Prodigal, 95 putative ORFs were identified in the genome of P106B, which represent 93.6% of the total genome (Figure 4-6 and Appendix VI). ATG was the most frequent start codon, initiating 86.3% of the predicted ORFs, while 8.4% and 5.3% of the ORFs are initiated with the codons GTG and TTG, respectively. Putative gene products of these predicted ORFs were compared with the previously characterized gene products available in the public database using BLASTP with the PSI-BLAST algorithm. A BLASTP hit was classified as significant at a cut-off e-value of 10^-4. Based on the sequence similarities, putative functions were only assigned to 22 (23.1%) of these predicted ORFs, while 47 (49.5%) of the predicted gene products had no homology to previously characterized proteins present in the NCBI database. The remaining 26 ORFs (27.4%) exhibited sequence similarities to other hypothetical proteins, and 34.6% of these hypothetical products were homologous to other phage-related proteins (Figure 4-7).
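The proportions quoted in this paragraph follow from simple counts, as the sketch below shows. The annotation counts (22, 47 and 26 of 95 ORFs) are taken from the text; the start codon counts (82, 8 and 5) and the figure of 9 phage-related hypothetical proteins are assumptions back-calculated from the reported percentages, not values stated in the source.

```python
def percent(count, total):
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


TOTAL_ORFS = 95

# Counts reported in the text.
ANNOTATION = {
    "putative function assigned": 22,
    "no homology to characterized proteins": 47,
    "similar to hypothetical proteins": 26,
}

# Assumed counts, inferred from the reported 86.3%, 8.4% and 5.3%.
START_CODONS = {"ATG": 82, "GTG": 8, "TTG": 5}

# Both breakdowns should account for every predicted ORF.
assert sum(ANNOTATION.values()) == TOTAL_ORFS
assert sum(START_CODONS.values()) == TOTAL_ORFS

for label, count in {**ANNOTATION, **START_CODONS}.items():
    print(f"{label}: {count} ORFs ({percent(count, TOTAL_ORFS)}%)")

# Assumed: 9 of the 26 hypothetical hits are phage-related (34.6%).
print(f"phage-related hypothetical hits: {percent(9, 26)}%")
```

With conventional rounding, 22 of 95 comes out as 23.2%, marginally different from the 23.1% quoted, which presumably reflects truncation rather than a different count.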

According to both tRNAscan-SE and ARAGORN predictions, a single tRNA gene was found in the P106B genome, located at 36,501-36,576 bp between two ORFs encoding hypothetical phage proteins (ORF 51 and ORF 52). The putative tRNA is 76 bp in length with an average G+C content of 50% and is predicted to function as tRNA-Leu with the anticodon TAA (Leu, TTA).
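The relationship between the reported anticodon and the codon it reads, and between the coordinates and the gene length, can be checked with a few lines. This is a minimal sketch that assumes standard Watson-Crick pairing (wobble pairing is ignored) and inclusive 1-based coordinates; the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def feature_length(start, end):
    """Length of a feature given inclusive 1-based coordinates."""
    return end - start + 1


# The anticodon TAA (5'->3') pairs with the codon TTA (5'->3'),
# i.e. UUA in the mRNA, which encodes leucine.
print(reverse_complement("TAA"))
print(feature_length(36501, 36576))
```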

4.1.7 Analysis of phage proteins

Samples for SDS-PAGE were prepared by concentrating phage particles at 70,000 × g for 40 minutes at 4°C, followed by a further purification step with a centrifugal filter device. Concentrated phage samples were run on a 12% SDS-PAGE gel to separate the phage proteins (Figure 4-8). Approximately 15 different protein bands could be observed for phage P106B on an SDS-PAGE gel stained with 0.2% Coomassie Brilliant Blue. The most prominent protein band, which ran at ~42 kDa, was excised and analyzed using trypsin proteolysis followed by liquid chromatography tandem mass spectrometry (LC-MS/MS) at the Southern Alberta Mass Spectrometry (SAMS) Centre, University of Calgary. The protein band was identified as the major coat protein of P106B with a sequence coverage of 89%. ORF 9 of the P106B genome is assigned as a gene coding for a major coat protein, and the calculated molecular mass of the predicted protein is ~44 kDa, which is close to the value determined by SDS-PAGE.

Figure 4-5. (A) Pulsed field gel electrophoresis (PFGE) and (B) restriction enzyme profiles of P106B DNA

(A) P106B DNA (lane b) was run on a 1% agarose gel for 18 hrs at a 1-10 s switch time, 6 V/cm with a linear ramping factor and an included angle of 120°. Low range PFG marker was run in lane a. (B) Phage DNA was digested with XbaI (lane 2), SphI (lane 3), BclI (lane 4), DraI (lane 5), NsiI (lane 6) and EcoRI (lane 7). Lane 1 contained a 1 kb DNA ladder.

Figure 4-6. Genome arrangement of vB_RglS_P106B

Graphical representation of the organization of the rhizobiophage vB_RglS_P106B (P106B) genome generated with Geneious 6.1.2 (Drummond et al., 2011). Each predicted ORF is represented by a single arrow and the scale is given in base pairs. The gene encoding a putative phage coat protein was experimentally identified by SDS-PAGE followed by LC-MS/MS and is indicated by an asterisk (★).

Figure 4-7. Distribution of predicted ORFs in P106B genome

Figure 4-8. SDS-PAGE of P106B phage proteins

Phage particles were denatured and separated on 12% SDS-PAGE (Lane B.). The gel was stained with 0.2% of Coomassie Brilliant Blue R250. Lane A contained the molecular size marker. The protein band indicated with a white arrow was cut out of the gel and analysed by LC-MS/MS. The LC-MS/MS data were compared against the predicted phage proteins of P106B and the adjacent table shows the summary of the analysis.

4.2 Discussion

Rhizobium gallicum strain S014B-4 was used in this study as the trapping host to isolate phages from legume soil samples. Strain S014B-4 is an indigenous rhizobial host that was first isolated from root nodules of pea plants by inoculating them with soils associated with Vicia cracca in Outlook, Saskatchewan. Sequence analysis of the 16S rRNA gene of strain S014B-4 revealed that it belongs to the species R. gallicum, placing it relatively close to other known R. gallicum strains (sequence identity 96%, e-value 0). Strain S014B-4 was chosen because of the high level of resistance it exhibited against the majority of phages present in the Hynes lab collection. Although a number of rhizobiophages have been previously reported and characterized, to our knowledge, none have been isolated with specificity to strains of R. gallicum.

4.2.1 General characteristics of R. gallicum phages

The breadth of different bacterial genera, species and strains that a phage can lyse is considered a vital biological characteristic when characterizing a phage. To determine the host range, R. gallicum phages were screened against a broad range of rhizobia, which included previously characterized strains and some rhizobial field isolates. All the phages exhibited a narrow host range, infecting a limited number of strains (Tables 4-2 and 4-3). They did not infect most of the characterized rhizobial strains, including five R. gallicum strains from Europe. However, P106B was infective against R. leucaenae strain CFN 299T (Ribeiro et al., 2012), whereas R. gallicum bv. phaseoli strain PhI21 was susceptible to P106CI infection. Apart from these characterized rhizobial strains, phages were also screened against a number of indigenous rhizobial strains. Strains S023A-4, S004A-2, S013A-1, S010A-2, S012B-3 and S019B-5 were sensitive to P106B infection, whereas most of the other strains were resistant. Interestingly, 16S rDNA sequencing of two of these susceptible indigenous rhizobial hosts (S004A-2 and S019B-5) revealed that they also type as R. gallicum. Phage P106DII showed a broader spectrum of infectivity against indigenous strains of rhizobia, infecting 9 strains (S004A-2, S023A-4, S018A-3, S027B-3, S018A-1, S015A-1, S012B-3, S010A-2 and S021A-2).

Transmission electron microscopy is considered essential for phage classification. All R. gallicum phages examined in this study fall under the order Caudovirales, which includes all tailed phages. Furthermore, all of them were classified in the family Siphoviridae, which contains phages with characteristic long, flexible tails that are presumably non-contractile. Among the phages observed under electron microscopy, Siphoviruses are identified as the most numerous group of tailed phages (Ackermann, 2009). The head diameters of phages P106A, P106CI and P106B were estimated to be 78±7 nm, 73±7 nm and 61±6 nm, respectively, whereas the tail lengths were determined as 130±10 nm, 140 nm and 109±8 nm, respectively. Similar dimensions have been reported for rhizobiophages belonging to the family Siphoviridae. The head diameters of R. meliloti phages φM11S, NM1 and NM8 were estimated as 65, 70 and 64 nm, respectively, while the corresponding tail lengths were approximated as 127, 110 and 111 nm (Werquin et al., 1988). The R. leguminosarum phage L338C was reported to have a longer non-contractile tail, 298 nm in length (Restrepo, 2012).

The transduction ability of phages is an important element in horizontal gene transfer and is hence considered a useful tool in genetic manipulations. Rhizobiophages have been widely studied for their likely use as vectors in genetic engineering due to their transducing ability (Finan et al., 1984; Lawson et al., 1987; Mink et al., 1982; Shah et al., 1981). The transduction ability of phages P106B and P106CI was tested with an auxotrophic mutant strain of R. gallicum S014B-4. However, no putative transductants were obtained for either phage. This was observed consistently at all the different MOIs tested, indicating that it is unlikely that phages P106B and P106CI would have any application as transducing agents.

4.2.2 Genome analysis

The presence of highly modified genomes, which prevent cleavage by different restriction enzymes, has been previously reported for rhizobiophages. Phages P10VF (Restrepo, 2012), RL38JI (Swinton et al., 1985), RL2RES, RL1RES (Mendum et al., 2001), N3 (Martin & Long, 1984) and φM12 (Finan et al., 1984) contain modified genomes, which are resistant to restriction digestion. However, DNA of P106B did not exhibit such resistance and it was cleaved by a majority of the restriction enzymes used (EcoRI, XbaI, DraI, BclI, NsiI and SphI) (Figure 4-5B). HindIII, MboI and NdeI were also able to digest P106B DNA (data not shown). Nevertheless, restriction enzymes BamHI, SmaI, KpnI, PstI and BglII were unable to cleave P106B DNA (data not shown), as the genome lacked the recognition sequences specific to these enzymes.

General features of the genome

The genome of P106B is 56,024 bp in length with an average G+C content of 47.9%. In general, most of the phage genomes studied so far tend to be more A+T rich than their respective bacterial hosts (Rocha & Danchin, 2002). Although the exact genome composition of the trapping host R. gallicum S014B-4 is unknown, closely related members of the genus Rhizobium have genomes with an average G+C content of 60-63%. However, the genome of P106B is richer in A+T compared to the other characterized rhizobiophage genomes. The G+C contents of both R. leguminosarum bv. viciae 3841 phage L338C and Sinorhizobium meliloti phage 16-3 (Semsey et al., 1999) are higher than that of P106B, with a value of 59%, whereas the average G+C content of R. etli phage RHEph10 is 60.3%. R. etli phages RHEph02, RHEph03, RHEph08 and RHEph09 all have genomes with an average G+C content of 49% (Santamaria et al., 2014).
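The G+C percentages compared in this paragraph are plain base-composition ratios. The following minimal sketch shows the calculation on a hypothetical toy sequence; ambiguous bases such as N are excluded from the denominator here, which is an assumption about convention rather than a statement of how the values above were obtained.

```python
def gc_content(seq):
    """G+C content (%) of a DNA sequence, ignoring non-ACGT characters."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100 * (seq.count("G") + seq.count("C")) / counted


# Hypothetical toy sequence, for illustration only.
print(f"{gc_content('ATGCGCTTAANN'):.1f}% G+C")
```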

Based on the predictions from RAST and Prodigal, the genome of P106B contained 95 putative ORFs, which represent 93.6% of the total genome. Based on the sequence similarities, putative functions could only be assigned to 22 (23.1%) of these predicted ORFs, while 47 (49.5%) of the predicted gene products had no homology to previously characterized proteins present in the NCBI database. None of the predicted proteins of the P106B genome exhibited any sequence similarity to known lysogeny-related proteins, suggesting an exclusively lytic lifecycle.

The complete annotated genome sequence of the Rhizobium phage vB_RglS_P106B (P106B) can be found under the accession number KF977490. The current NCBI database contains fifteen fully sequenced and characterized rhizobiophage genomes, which include: Sinorhizobium phage 16-3 (Deák et al., 2010; Ganyu et al., 2005; Semsey et al., 2002), Sinorhizobium phage PBC5 (accession no: NC_003324.1), R. etli phages RHEph01-RHEph06 and RHEph08-RHEph10 (Santamaria et al., 2014), Rhizobium (Agrobacterium) radiobacter phages RR1-A and RR1-B (NC_021560.1 and NC_021557.1), S. meliloti phage ϕM12 (Brewer et al., 2014) and R. leguminosarum phage L338C (KF614509). None of these phages is reported to have been isolated using an R. gallicum strain as a trapping host. Therefore, the genome of phage P106B, isolated using R. gallicum strain S014B-4 (6), can be considered the first available complete genome sequence for a phage of R. gallicum.

Presence of tRNA in the P106B genome

Putative tRNAs have been identified as the most frequent translation-associated genes present in phage genomes (Bailly-Bechet et al., 2007). A phage genome may contain multiple tRNAs; for example, 26 tRNA genes have been recorded in Erwinia amylovora phage ϕEa21-4 (Lehman et al., 2009). Both tRNAscan-SE and ARAGORN predicted the presence of a single tRNA in the P106B genome, located at 36,501-36,576 bp. The putative tRNA gene was predicted to function as tRNA-Leu with the anticodon TAA (Leu, TTA). The distribution of tRNAs in phage genomes is often correlated with codon usage bias in the phage (Bailly-Bechet et al., 2007). Mimicking the codon usage of its host is considered the best mechanism for a phage to complete its protein synthesis, but this may not be achievable in all instances, especially due to discrepancies in the G+C content of the phage and its host. A phage genome with a relatively higher A+T composition is unable to achieve high compatibility of codon usage with its bacterial host. Therefore, it is suggested that phages may compensate for this difference by carrying tRNA genes that correspond to codons that are frequently used by the phage but rarely used by the host bacteria (Bailly-Bechet et al., 2007). Due to the unavailability of a complete genome sequence of its host bacterium R. gallicum S014B-4 or of any other strain of R. gallicum, the codon usage frequency of P106B for the amino acid leucine was compared to that of a closely related member of the genus, R. leguminosarum bv. viciae (Rlv) 3841, and three other characterized rhizobiophages (Table 4-4). Although TTA was not the predominant leucine-encoding codon in the P106B genome, it had a notable frequency of 1.16%, while it is the rarest codon used for the corresponding amino acid in the Rlv 3841 genome (0.08%).

Rhizobiophage L338C resembles its bacterial host Rlv 3841 in codon usage, and a similar pattern was observed for phages 16-3 and RHEph10, with TTA being the least used codon for leucine. Although this may suggest that phage P106B harbours the UUA-decoding tRNA-Leu in its genome as a strategy to compensate for the probable rare use of this codon in its bacterial host genome, this cannot be confirmed without comparing the codon usage frequencies of its actual host strain.
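The kind of codon usage comparison discussed here can be sketched as follows. This is an illustrative calculation only: it assumes that frequencies are expressed as a percentage of all codons in the supplied coding sequences, and it runs on two hypothetical toy sequences rather than on the annotated genomes. The tool actually used to generate the reported frequencies is not specified here.

```python
from collections import Counter

LEUCINE_CODONS = ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")


def codon_usage_percent(coding_sequences):
    """Frequency (%) of each codon across in-frame coding sequences."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons found")
    return {codon: 100 * n / total for codon, n in counts.items()}


# Hypothetical toy coding sequences, for illustration only.
toy_cds = ["ATGTTACTGTAA", "ATGCTGTTGTGA"]
usage = codon_usage_percent(toy_cds)
for codon in LEUCINE_CODONS:
    print(codon, f"{usage.get(codon, 0.0):.2f}%")
```

Running the same function over the predicted ORFs of a phage and over the annotated genes of a candidate host gives directly comparable frequencies for each leucine codon.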

DNA packaging and phage morphogenesis

Based on the functional predictions of genes within the P106B genome, only one putative phage terminase large subunit gene could be confidently identified as a component of the phage DNA packaging machinery. Typically, the DNA packaging motor of dsDNA tailed phages follows a similar pattern whereby two adjacently located genes code for two non-structural components, the large and small terminase subunits (Casjens, 2011). Contrary to this typical two-protein packaging machine, several phages are known to have a DNA packaging mechanism composed of one protein component along with a hexamer of RNA called pRNA (Shu & Guo, 2003). This atypical motor is well studied in phage φ29 (Guo et al., 1987) and has been further identified in phages such as Cp-1 (Martin et al., 1996) and φEa21-4 (Lehman et al., 2009). A genome with a single predicted terminase gene might suggest that P106B follows this alternative mechanism for its DNA packaging motor. Although in most tailed phages a gene encoding a portal protein can be identified in the vicinity of the terminase genes, sequence-based comparisons of the P106B genome did not provide any evidence for the presence of such a gene. However, two other structural protein-coding genes (ORF 2 and ORF 9) were identified in close proximity to the terminase gene. Commonly, dsDNA phages package their DNA into a prohead or procapsid, the assembly of which requires three separate procapsid proteins: a coat protein, a scaffolding protein and a portal protein (Casjens, 2011). ORF 9, which has been assigned as a putative phage coat protein, may be involved in procapsid formation. This structural protein, with an approximate molecular mass of 44 kDa, was identified on an SDS-PAGE based protein profile of the phage structural proteins and confirmed by LC-MS/MS data (Figure 4-8).

Table 4-4. Codon usage of rhizobiophages P106B, L338C, 16-3, RHEph10 and bacterial host R. leguminosarum bv. viciae 3841 for amino acid leucine

Values are frequencies of codon usage (%). The highlighted codon (TTA) is the one decoded by the tRNA present in the phage P106B genome.

| Amino acid | Codon | Phage P106B | Phage L338C | Phage 16-3 | Phage RHEph10 | Rlv 3841 |
|---|---|---|---|---|---|---|
| Leu | TTA | 1.16 | 0.07 | 0.11 | 0.04 | 0.08 |
| Leu | TTG | 2.18 | 0.73 | 0.98 | 0.39 | 0.87 |
| Leu | CTT | 1.59 | 1.16 | 1.45 | 0.84 | 1.45 |
| Leu | CTC | 0.46 | 2.69 | 1.90 | 3.08 | 3.23 |
| Leu | CTA | 0.77 | 0.26 | 0.6 | 0.11 | 0.19 |
| Leu | CTG | 1.08 | 2.39 | 2.46 | 3.69 | 4.14 |

The cluster of phage morphogenesis genes includes ORF 19, encoding a putative head morphogenesis protein, followed by a set of genes assigned functions related to tail assembly. The cluster of tail genes begins with two genes encoding tail length tape measure proteins (ORF 31 and 32), which function to define the length of the phage tail (Kutter et al., 2005). Functional predictions assigned ORF 33 and ORF 34 as genes encoding minor tail proteins, whereas the two subsequent ORFs are predicted to be involved in tail assembly. The tail morphogenesis gene cluster ends with three genes, ORF 37, 38 and 39, which code for a tail component, a tail collar domain-containing protein and a putative tail fiber assembly-like protein, respectively.

Additionally, a gene coding for a phage tail assembly-like protein (ORF 77) was identified at a location distant from this well-defined tail gene region.

DNA replication, repair and processing

The genes involved in phage DNA replication, repair and processing are clustered together in a region that loosely stretches from ORF 48 to ORF 71 within the genome. This region begins with ORF 48, predicted as a putative phage DNA polymerase containing a DNA polymerase type II-B catalytic domain, followed by ORF 50, with the functional prediction of a putative exodeoxyribonuclease enzyme. A helicase function was assigned to ORF 58, whereas DNA primase activity was attributed to ORF 71. All of the genes coding for proteins related to DNA replication and processing functions exhibit greater sequence similarity to their phage counterparts than to the corresponding bacterial genes. Additionally, the gene product of ORF 70 was identified as a probable DNA binding protein with a putative helix-turn-helix (HTH) conserved domain. The other notable prediction in this cluster is attributed to ORF 67, which encodes a probable nucleoside triphosphate pyrophosphohydrolase enzyme. Pyrophosphohydrolase activity is necessary in catalyzing the removal of pyrophosphates resulting from nucleoside triphosphate (NTP) hydrolysis (Moroz et al., 2005). Furthermore, the NTP-PPase activity of NTP pyrophosphohydrolases is reported to have a role in cellular stress responses and “house-cleaning” functions (Lu et al., 2010; Moroz et al., 2005).

Host cell lysis

The release of progeny phage particles upon lysis of the host bacterial cell marks the completion of the lytic life cycle of a virulent bacteriophage. Host cell lysis occurs by at least two different strategies, involving either a single gene product or two different proteins. In ssDNA and ssRNA phages, the lysis system requires a single protein, which inhibits the synthesis of peptidoglycan. The dsDNA phages, however, use a holin-endolysin system to achieve host cell lysis, whereby a muralytic enzyme, or endolysin, degrades the host cell peptidoglycan. The accurately timed activity of holin allows the endolysin to pass through the cytoplasmic membrane and gain access to its substrate, resulting in cell lysis. Several enzymatic activities can be associated with phage endolysins, one being the activity of “true lysozymes” or N-acetyl muramidases (Lehman et al., 2009). ORF 40 was identified as the only recognized lysis gene in the P106B genome. This gene encodes a phage-related lysozyme, and BLASTP searches with the ORF showed sequence similarity to the lysozyme of phage P22.

4.3 Author’s contributions

I have performed all the experiments mentioned in this chapter with the exception of transduction experiments. Leyna Thraya, an undergraduate project student, carried out the transduction experiments for phages, under my direction.

Chapter Five: Characterization of Mesorhizobium loti phages

Mesorhizobium spp. can nodulate a wide variety of legume hosts, including some of the agriculturally important crops such as chickpea (Cicer arietinum L.). Chickpea is the second most widely grown pulse crop worldwide (Laranjo et al., 2014) and it is becoming an important legume crop in western Canada, with almost all the Canadian chickpea production coming from Saskatchewan (88%) and Alberta (12%) (Crop profile for chickpea in Canada, March 2008). Very few records on phages of Mesorhizobium are available and most of these studies are limited to morphological and growth characterization. To our knowledge there are no available complete genome sequences for any Mesorhizobium phages. Considering the growing interest in chickpea as a pulse crop in western Canada, as well as the insufficient information available on

Mesorhizobium phages, including the lack of any transducing phages for genetic work, we initiated this study on phages of mesorhizobia.

5.1 Results

5.1.1. Phage isolation and trapping rhizobial hosts

Phage isolation was carried out with three soil samples (Lo1, Lo5 and Cp1) using

Mesorhizobium loti strains R7A and R7ANS as trapping rhizobial hosts. M. loti R7A is a wild type symbiotic strain originally isolated from nodules of Lotus corniculatus

(Sullivan et al., 1995), whereas the strain R7ANS is the non-symbiotic derivative lacking the symbiosis island ICEMlSymR7A (Ramsay et al., 2006). Phages with different plaque morphologies were further purified with at least three successive single plaque isolations and stored at 2-8 °C in TY.

5.1.2. Host range of Mesorhizobium phages

The isolated phages of M. loti were screened against a variety of rhizobial hosts using plaque assays to determine their spectrum of infectivity. The results were recorded as ‘lytic’ or ‘non lytic’ after observing the bacterial host lawns for clearance by the phage. Phage Cp1R7A-A1 was capable of infecting the majority of Mesorhizobium strains tested (Table 5-1). Mesorhizobium strain NZP2213 (Sullivan et al., 1995) was resistant to all phages tested, while R88B was only infected by Cp1R7A-A1. Apart from the trapping hosts, R97A (Sullivan et al., 1995) was the most susceptible host, with five out of eight phages forming plaques on it. The host ranges of Lo1R7ANS-A, Lo1R7ANS-B1 and Lo5R7ANS were extremely narrow when tested on the selected mesorhizobia, with their infections completely limited to the two M. loti strains (R7A and R7ANS) that were used as trapping hosts.

The phage isolates were also screened against several rhizobial strains other than Mesorhizobium spp. (Table 5-2). These strains included R. leguminosarum bv. viciae (Rlv) strains 3841 (Poole et al., 1994), VF39SM (Priefer, 1989), 306 (Hirsch, 1979), 3855 (Brewin et al., 1982), 336 (Josey et al., 1979) and 248 (Josey et al., 1979), R. etli strain CE3 (CFN42 derivative) (Noel et al., 1984), R. tropici CIAT 899T (Martínez-Romero et al., 1991), R. leucaenae strain CFN 299T (Ribeiro et al., 2012) and R. gallicum bv. gallicum strain R602spT. All of these strains were resistant to infection by the mesophages except for R. etli strain CE3 (CFN42 derivative), which was infected by phages Cp1R7ANS-C1 and Cp1R7ANS-D2. Several indigenous isolates of rhizobia were also tested against the phage collection. Most of these strains were resistant to the phages; only phages Cp1R7ANS-C2 and Cp1R7ANS-D2 were infective against Rhizobium spp. S021A-2 (25).

5.1.3. Plaque formation and morphology

Phages Lo1R7ANS-A, Lo1R7ANS-B1, Cp1R7ANS-C2 and Lo5R7ANS formed clear plaques on their host bacterial lawns. However, after ~36 hours opaque-halo zones could be observed surrounding the clear plaques. Upon further incubation at 30°C, these halo zones could be identified as concentric rings surrounding the plaques with varying degrees of opaqueness (Figure 5-1).

The electron micrographs of Lo5R7ANS, Cp1R7ANS-C2 and Cp1R7ANS-D2 (Figure 5-2) allowed us to place them in the family Podoviridae, with an icosahedral head and a very short cylindrical tail. The head diameters of Lo5R7ANS, Cp1R7ANS-C2 and Cp1R7ANS-D2 were estimated to be 60±2 nm, 59±3 nm, and 60±4 nm respectively. The tail length and width of Lo5R7ANS were determined to be 12 nm and 13 nm respectively.

5.1.4. Characterization of genomes

Based on PFGE results, the genome sizes of phages Lo1R7ANS-B1 and Cp1R7ANS-A1 were approximated to be 32 kb and 142 kb respectively (Figure 5-3). The DNA isolated from phages was digested with several different restriction enzymes to generate restriction profiles. The DNA of Lo1R7ANS-B1, Lo1R7ANS-B2 and Lo5R7ANS was cleaved by Tru1I, HindIII, SalI, PstI and XbaI (Figure 5-4), whereas these DNA samples were resistant to digestion by SpeI and EcoRI. Consistent with this, analysis of the Lo5R7ANS genome sequence revealed that it lacks recognition sites for SpeI and EcoRI. Furthermore, the restriction profiles of phage Lo1R7ANS-B1, Lo1R7ANS-B2 and Lo5R7ANS DNA were closely similar, though not identical.
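The absence of recognition sites in a sequenced genome can be checked with a simple scan. The following is a minimal sketch, not the procedure used in this study: the recognition sequences are the standard ones for the named enzymes, while the function names and the toy sequence are hypothetical and chosen only for illustration.

```python
# Standard recognition sequences of the enzymes named in the text.
# All are palindromic, so scanning one strand is sufficient.
SITES = {
    "Tru1I": "TTAA",
    "HindIII": "AAGCTT",
    "SalI": "GTCGAC",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
    "XbaI": "TCTAGA",
    "EcoRI": "GAATTC",
}


def count_sites(sequence, site):
    """Count occurrences of a recognition site, including overlapping ones."""
    sequence = sequence.upper()
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


def non_cutters(sequence):
    """Return the enzymes whose recognition site is absent from the sequence."""
    return sorted(name for name, site in SITES.items()
                  if count_sites(sequence, site) == 0)


# Hypothetical toy sequence (not phage data) lacking SpeI and EcoRI sites.
toy = "GGAAGCTTCCTTAAGGGTCGACCCTGCAGTTTCTAGAGG"
print(non_cutters(toy))
```

Applied to a genome sequence, an enzyme reported by `non_cutters` would be expected to leave the DNA undigested, provided the DNA carries no modification that blocks cleavage at sites that are present.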

Table 5-1. Host range of Mesorhizobium phages with 11 different Mesorhizobium strains

A ‘+’ sign (grey squares) indicates lysis; a ‘−’ sign (clear squares) indicates no lysis.

Rhizobial strain            Lo1R7ANS-A   Lo1R7ANS-B1  Lo5R7ANS     Cp1R7A-A1    Cp1R7ANS-A3  Cp1R7ANS-C1  Cp1R7ANS-C2  Cp1R7ANS-D2
M. loti R7A                 +            +            +            +            +            +            +            +
M. loti R7ANS               +            +            +            +            +            +            +            +
Mesorhizobium spp. R97A     −            −            −            +            +            +            +            +
Mesorhizobium spp. R90A     −            −            −            +            +            +            −            +
Mesorhizobium spp. R16C     −            −            −            +            −            +            +            +
Mesorhizobium spp. R88B     −            −            −            +            −            −            −            −
Mesorhizobium spp. 8KC3     −            −            −            +            +            +            −            −
Mesorhizobium spp. NZP2037  −            −            −            +            +            +            −            −
Mesorhizobium spp. NZP2213  −            −            −            −            −            −            −            −
Mesorhizobium spp. NZP2234  −            −            −            +            +            +            −            −
Mesorhizobium spp. NZP2298  −            −            −            +            +            +            +            −
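The host-range summaries quoted in the text (for example, five of eight phages plaquing on R97A) can be recomputed from the table. The sketch below is illustrative only: the column order is an assumption based on the reading of Table 5-1 that agrees with the statements in the text, an ASCII '-' stands in for '−', and the function names are hypothetical.

```python
# Phage columns, in the order assumed for Table 5-1.
PHAGES = ["Lo1R7ANS-A", "Lo1R7ANS-B1", "Lo5R7ANS", "Cp1R7A-A1",
          "Cp1R7ANS-A3", "Cp1R7ANS-C1", "Cp1R7ANS-C2", "Cp1R7ANS-D2"]

# One string per strain; one character per phage in PHAGES order.
TABLE_5_1 = {
    "R7A":     "++++++++",
    "R7ANS":   "++++++++",
    "R97A":    "---+++++",
    "R90A":    "---+++-+",
    "R16C":    "---+-+++",
    "R88B":    "---+----",
    "8KC3":    "---+++--",
    "NZP2037": "---+++--",
    "NZP2213": "--------",
    "NZP2234": "---+++--",
    "NZP2298": "---++++-",
}


def hosts_per_phage(table, phages):
    """Number of strains lysed by each phage."""
    return {phage: sum(row[i] == "+" for row in table.values())
            for i, phage in enumerate(phages)}


def phages_per_strain(table):
    """Number of phages that lyse each strain."""
    return {strain: row.count("+") for strain, row in table.items()}


print(hosts_per_phage(TABLE_5_1, PHAGES))
print(phages_per_strain(TABLE_5_1))
```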

Table 5-2. Host range of Mesorhizobium phages with Rhizobium strains

A ‘+’ sign (grey squares) indicates lysis; a ‘−’ sign (clear squares) indicates no lysis.

Rhizobial strain              Lo1R7ANS-A   Lo1R7ANS-B1  Lo5R7ANS     Cp1R7A-A1    Cp1R7ANS-A3  Cp1R7ANS-C1  Cp1R7ANS-C2  Cp1R7ANS-D2
R. leg bv. viciae 3841        −            −            −            −            −            −            −            −
R. leg bv. viciae VF39SM      −            −            −            −            −            −            −            −
R. leg bv. viciae 248         −            −            −            −            −            −            −            −
R. leg bv. viciae 306         −            −            −            −            −            −            −            −
R. leg bv. viciae 336         −            −            −            −            −            −            −            −
R. leg 3855                   −            −            −            −            −            −            −            −
R. etli CE3 CFN42             −            −            −            −            −            +            −            +
R. leucaenae CFN299T          −            −            −            −            −            −            −            −
R. tropici CIAT899T           −            −            −            −            −            −            −            −
R. gal bv. gallicum R602spT   −            −            −            −            −            −            −            −
R. gallicum S014B-4 (6)       −            −            −            −            −            −            −            −
R. gallicum S019B-5 (27)      −            −            −            −            −            −            −            −
R. gallicum S013A-1 (15b)     −            −            −            −            −            −            −            −
Rhizobium spp. S021A-2 (25)   −            −            −            −            −            −            +            +

Figure 5-1. Plaque morphology of phage Lo1R7ANS-A

Plaque morphology of Lo1R7ANS-A on a lawn of M. loti R7ANS, grown on 0.5% TY overlay after 72 hours of incubation at 30°C.

Figure 5-2. Transmission electron micrographs of Mesorhizobium loti phages

Transmission electron micrographs of phage Lo5R7ANS (A), Cp1R7ANS-C2 (B) and

Cp1R7ANS-D2 (C) stained with 1% uranyl acetate. Scale bars represent 100 nm.

Figure 5-3. Pulsed field gel electrophoresis (PFGE)

DNA from phages Lo1R7ANS-B1 (2) and Cp1R7ANS-A1 (3) was electrophoresed on a 1% agarose gel for 18 hrs at a 1-10 s switch time, 6 V/cm with a linear ramping factor and an included angle of 120°. Lane 1 contained low range PFG marker.

Figure 5-4. Restriction enzyme profiles of DNA isolated from different M. loti phages

DNA from phages Lo1R7ANS-B1 (A), Lo5R7ANS (B), Lo1R7ANS-B2 (C) and Cp1R7ANS-C2 (D) was digested with Tru1I (2), HindIII (3), SalI (4), SpeI (5), PstI (6), XbaI (7), EcoRI (8) and BclI (9). Lane 1 contained 1 kb DNA ladder.

The genome of phage vB_MloP_Lo5R7ANS

Using the sequence data obtained, the genome of Lo5R7ANS was assembled into a contig of 45,718 bp with an average coverage of 60.51×. The average G+C content of the genome was 61.1%. The ORF predictions of the genome with RAST and GLIMMER 3 resulted in 63 putative protein-coding genes, which represent approximately 91% of the total genome (Figure 5-5 and Appendix VII). The most frequently used start codon was ATG, with 79.4% (50 ORFs) of predicted ORFs initiating with it, while the frequencies of use of GTG and TTG were 12.7% (8 ORFs) and 7.9% (5 ORFs) respectively. Sequence comparisons of the putative gene products using BLASTP with the PSI-BLAST algorithm allowed us to assign functions to 19 predicted ORFs (30.2%). Thirty-three ORFs (52.4%) exhibited homology to hypothetical proteins, whereas putative products of 11 predicted genes (17.5%) had no significant similarity to anything available in the NCBI database. ARAGORN and tRNAscan-SE searches revealed the presence of a single tRNA gene in the Lo5R7ANS genome at 8,569-8,651 bp. The putative tRNA gene is 83 bp in length with an average G+C content of 55.4%. It was predicted to be a tRNA-Leu with the anticodon TAG (Leu, CTA). The complete annotated genome of phage Lo5R7ANS (vB_MloP_Lo5R7ANS) is deposited under the accession number KM199771.
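The percentages quoted for start codon usage and annotation categories follow directly from ORF counts out of 63. The sketch below recomputes them. The GTG and TTG counts used here (8 and 5) are an assumption: they are the values implied by the reported percentages and by the total of 63 ORFs. The function name is hypothetical.

```python
def percentages(counts):
    """Convert a mapping of category -> count into percentages (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Start codon usage among the 63 predicted ORFs of Lo5R7ANS
# (GTG and TTG counts are assumed from the reported percentages).
start_codons = {"ATG": 50, "GTG": 8, "TTG": 5}

# Annotation categories reported for the same 63 ORFs.
annotation = {
    "assigned function": 19,
    "hypothetical protein": 33,
    "no significant similarity": 11,
}

print(percentages(start_codons))
print(percentages(annotation))
```

Both sets of counts sum to 63, which is a quick consistency check on the category totals.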

Figure 5-5. Genome arrangement of vB_MloP_Lo5R7ANS

Graphical representation of the organization of rhizobiophage vB_MloP_Lo5R7ANS (Lo5R7ANS) genome generated with Geneious 6.1.2 (Drummond et al., 2011). Each predicted ORF is represented by a single arrow and the scale is given in base pairs.

The genome of phage vB_MloP_Cp1R7ANS-C2

Based on Ion Torrent sequencing data, the genome of Cp1R7ANS-C2 was assembled into a single contig of 43,865 bp with a 60.5-fold average coverage. It has an average G+C content of 60.5%. The ORF predictions for the genome revealed the presence of 67 putative protein-coding genes, which occupy approximately 89.2% of the total genome (Figure 5-6 and Appendix VIII). ATG was the preferred start codon, with 54 predicted ORFs initiating with ATG, while GTG (8 ORFs) and TTG (5 ORFs) were employed at low frequencies. Based on sequence similarity at the protein level, putative functions could only be assigned to 16 of the predicted ORFs, while 41 of the predicted gene products exhibited homology to hypothetical proteins present in the NCBI database. The gene products of the remaining 10 ORFs had no homology to previously characterized proteins present in the database. No putative tRNA genes were identified in the genome.

Both Lo5R7ANS and Cp1R7ANS-C2 exhibited sequence similarity to different gene products of various strains of Mesorhizobium (Appendix VII and VIII). This similarity was particularly pronounced in the Cp1R7ANS-C2 genome, where 49 out of 67 predicted gene products (73%) were related to proteins encoded by different mesorhizobia. To a lesser extent, 14 of the predicted Lo5R7ANS-encoded proteins (22%) showed similarity to gene products of Mesorhizobium spp. Both genomes also showed a similar gene arrangement with closely related gene products. To identify similarity between the two genomes, the Cp1R7ANS-C2 and Lo5R7ANS genomes were compared using a progressiveMauve alignment (Figure 5-7), in which the height of the colored region is proportional to the average nucleotide sequence identity in the alignment.

Figure 5-6. Genome arrangement of vB_MloP_Cp1R7ANS-C2

Graphical representation of the organization of rhizobiophage vB_MloP_Cp1R7ANS-C2 (Cp1R7ANS-C2) genome generated with Geneious 6.1.2 (Drummond et al., 2011). Each predicted ORF is represented by a single arrow and the scale is given in base pairs.

Figure 5-7. Alignment of vB_MloP_Cp1R7ANS-C2 and vB_MloP_Lo5R7ANS genomes

Genomes were aligned using Mauve v2.3.1 with the progressiveMauve algorithm. The height of the similarity plot indicates the degree of sequence similarity of the aligned regions of the phage genomes.

5.2 Discussion

5.2.1. General characteristics of M. loti phages

Isolation of phages from soil samples with a history of growing legumes was carried out using strains of M. loti. M. loti is capable of nodulating different Lotus spp. (Kaneko et al., 2000; Sullivan et al., 1995), while different mesorhizobial species including M. ciceri and M. mediterraneum (Nour et al., 1994; Nour et al., 1995) are known to nodulate chickpea (Cicer arietinum) with different nitrogen-fixing efficiencies. We used three different soil samples, which had a cultivation history of Lotus spp. and chickpea, to isolate phages infecting mesorhizobia.

The infection capacity of a phage is recognized as an important biological characteristic. The host ranges of M. loti phages were determined by screening them against different rhizobial strains using plaque assays. Phage Cp1R7A-A1 had the widest host range against the Mesorhizobium strains tested (Table 5-1). Mesorhizobium strain NZP2213 (Sullivan et al., 1995) was resistant to all phages tested, while R97A (Sullivan et al., 1995) was identified as the most susceptible host against our phage isolates apart from the trapping hosts. In general, the phages isolated from soil samples Lo1 and Lo5 (with histories of Lotus spp. growth) were observed to have an extremely narrow host range, with their infection completely limited to the trapping hosts M. loti R7A and R7ANS (Table 5-1). Phage isolates were also screened against other rhizobial species such as R. leguminosarum, R. gallicum, R. tropici and R. etli (Table 5-2). However, M. loti phages exhibited a very narrow host range against these rhizobial strains, as infections were observed in only a few instances.

Several mesophage isolates were capable of forming clear plaques surrounded by concentric rings of halo zones on TY overlays (Figure 5-1). Moreover, these halo rings were observed to be ‘growing’ upon incubation. Similarly, Pseudomonas putida phages AF (Cornelissen et al., 2012) and ϕ15 (Cornelissen et al., 2011) produce plaques that are surrounded by a halo zone, where the diameter of the halo zone increases with increasing incubation time. The presence of growing halo zones with T7-like phage ϕ15 infection has been attributed to exopolysaccharide (EPS) degradation by the phage-associated EPS depolymerase (Cornelissen et al., 2011). However, no previous record of plaques with an increasing number of concentric rings of halo zones could be found. ORF 62 of phage Cp1R7ANS-C2, which has been predicted to encode a putative lytic enzyme, exhibited sequence similarity to an R-type pyocin of Mesorhizobium spp. (sequence identity 65% and similarity 73%) (Appendix VIII). R-type pyocins are described as bactericidal molecules that resemble the non-flexible, contractile tail structure of bacteriophages. Although nothing similar was identified in the genome of Lo5R7ANS, it can be suggested that the lytic activity related to this protein may play a role in producing the halo zones.

Based on the TEM observations, all phages of M. loti examined in this study were classified in the order Caudovirales, with icosahedral heads and very short cylindrical tails, characteristic of the family Podoviridae. The head diameters of the phages were estimated to be within the range of 56-64 nm. Previously characterized Mesorhizobium phages have been classified under two families of tailed phages, Podoviridae and Siphoviridae. Phages Mlo30, Mam12, Mam20, and A1 were classified as members of the family Podoviridae (Turska-Szewczuk & Russa, 2000; Turska-Szewczuk et al., 2010), while Mlo1, ϕRP1, ϕRP2, ϕRP3, K1, K2 and C1 were placed in the family Siphoviridae (Małek et al., 2009; Turska-Szewczuk et al., 2010; Wdowiak et al., 2000). The head diameters of podophages Mlo30, Mam12, Mam20, and A1 were approximated as 56 nm, 58 nm, 60 nm and 52.4 nm respectively. These reported values are consistent with the dimensions determined for the phages examined in this study.

Transduction experiments for phages were carried out for Lo5R7ANS, Cp1R7ANS-C2 and Lo1R7ANS-B1 phages using M. loti R7ArpoN1pFUS2 and M. loti R7AnadCTn5 mutant strains. Transduction efficiencies of phages could not be determined due to the presence of high numbers of spontaneous antibiotic resistant colonies of M. loti R7A wild type strain on control plates. However, the number of colonies formed on test and control plates exhibited no difference, indicating that it was very unlikely that any of our isolated phages had the ability to transduce.

5.2.2. Analysis of genomes

The genome size of phage Lo5R7ANS was determined as 45.7 kb, while the genome of Cp1R7ANS-C2 was 43.9 kb in length. Based on PFGE, the genomes of two other phages, Lo1R7ANS-B1 and Cp1R7ANS-A1, were estimated to be 32 kb and 142 kb respectively. Previously reported Podoviridae phages of mesorhizobia have genomes within the 39-45 kb range (Turska-Szewczuk et al., 2010), and the calculated genome sizes of siphophages of mesorhizobia fall within the range of 80-103 kb (Małek et al., 2009). The genome sizes of the two Podoviridae phages Lo5R7ANS and Cp1R7ANS-C2 are within a similar range to that of the earlier described phages, whereas the morphotype of Cp1R7ANS-A1, which has a much larger genome, has not been determined in this study.

The genomes of phages Lo5R7ANS and Cp1R7ANS-C2 were sequenced using Ion Torrent technology, and assembly of these sequence data resulted in complete genome sequences. The genome of Lo5R7ANS is 45,718 bp in length with an overall G+C content of 61.1%, while Cp1R7ANS-C2 has a G+C rich (60.5%) double-stranded DNA genome, which is 43,865 bp long. Long terminal direct repeats were identified in both genomes; the repeats are 246 bp and 218 bp in length for phages Lo5R7ANS and Cp1R7ANS-C2, respectively. The presence of long terminal direct repeats is a commonly observed trait among the T7-like phages of the family Podoviridae (Table 5-3). The sizes of the terminal repeats in the mesophages are within a similar range to those of phages T3 (231 bp), ϕYeO3-12 (232 bp) and K1-5 (234 bp), whereas much longer repeat sequences have been identified in ϕKMV (414 bp) and ϕRsB1 (325 bp).
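A direct terminal repeat is a sequence present in the same orientation at both ends of a linear genome, so in a finished sequence the repeat appears as a prefix that is also a suffix. The sketch below shows one simple way to measure such a repeat. It is an illustration only, not the method used in this study; the function name, the `min_len` threshold and the toy sequence are hypothetical, and exact matching is assumed (no sequencing errors within the repeat).

```python
def terminal_repeat_length(genome, min_len=10):
    """Length of the longest prefix that is also a suffix of the genome.

    Returns 0 if no exact direct terminal repeat of at least min_len
    bases is found. The repeat cannot exceed half the genome length.
    """
    genome = genome.upper()
    for length in range(len(genome) // 2, min_len - 1, -1):
        if genome[:length] == genome[-length:]:
            return length
    return 0


# Hypothetical toy genome: a 12-base repeat flanking unrelated sequence.
repeat = "ACGTTGCAGGCT"
toy_genome = repeat + "TTTTCCCCAAAAGGGG" * 3 + repeat
print(terminal_repeat_length(toy_genome))
```

In practice, terminal repeats are often first suspected from a local increase in read coverage at the contig ends, so a scan of this kind would serve as confirmation on the finished sequence.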

The genomic organization of Lo5R7ANS and Cp1R7ANS-C2 resembles the T7-type genome arrangement. Therefore, a rough division of the genomes into three regions with early, middle and late genes can be suggested. The first cluster, of early genes, in the T7 genome ends with the gene coding for T7 RNA polymerase (RNAP), and the middle genes consist of several genes coding for different functions related to DNA metabolism. The third cluster, the late genes, codes for phage structural proteins and products related to host cell lysis.

Table 5-3. Presence of direct terminal repeats (DTRs) in the genomes of T7-like Podoviridae phages

Phage name   Host bacterium                          Length of DTR  Accession No. (Reference)
T7           Escherichia coli                        160 bp         NC_001604
T3           Escherichia coli                        231 bp         NC_003298
ϕYeO3-12     Yersinia enterocolitica (serotype O:3)  232 bp         NC_001271
VpV262       Vibrio parahaemolyticus                 138 bp         NC_003907
ϕKMV         Pseudomonas aeruginosa                  414 bp         AJ505558 (Lavigne et al., 2003)
Sp6          Salmonella typhimurium LT2              174 bp         AY370673 (Scholl et al., 2004)
K1-5         Escherichia coli                        234 bp         AY370674 (Scholl et al., 2004)
ϕRsB1        Ralstonia solanacearum                  325 bp         (Kawasaki et al., 2009)
Lo5R7ANS     Mesorhizobium loti R7ANS                246 bp         This study
Cp1R7ANS-C2  Mesorhizobium loti R7ANS                218 bp         This study

RNA polymerase (RNAP)

The gene clusters present in the conserved genome organization of a typical T7-like phage are mainly classified based on the transcription pattern of the genome. While the class I (early) genes are transcribed by the host RNA polymerase, genes in classes II and III (middle and late genes) are transcribed by a phage-encoded RNAP. The host RNAP generally transcribes a part of the genome (nearly 20% of the total genome) and the rest of the genes are transcribed by the phage using its own transcription system (Chen & Schneider, 2005). The phage-specific RNAP recognizes its cognate promoters in the phage genome to carry out transcription and also plays a role in translocating phage DNA into the host cell (Chen & Schneider, 2005; Li et al., 2013). ORF 20 of phage Lo5R7ANS and ORF 25 of phage Cp1R7ANS-C2 were assigned as genes coding for putative DNA-dependent RNA polymerases. Furthermore, the putative RNAP of phage Cp1R7ANS-C2 exhibited strong sequence similarity to the conserved domain of T3/T7-like RNAP.

The phage-specific RNAP, which is considered a characteristic feature of the subgroup of T7-like phages, has been suggested, in conjunction with its cognate promoters, as a tool for classifying the members within the subgroup (Chen & Schneider, 2005). A phylogenetic tree was constructed using the amino acid sequences of phage RNAPs from thirteen selected T7-like phages, including Cp1R7ANS-C2 and Lo5R7ANS (Figure 5-8). The RNAPs of Cp1R7ANS-C2 and Lo5R7ANS were closely related to each other, while the cluster of R. etli phages RHEph02, RHEph03, RHEph08 and RHEph09 can be identified as the next most closely related members. RNAPs from SP6, K1-5 and ϕKMV are more distantly related to those of Cp1R7ANS-C2 and Lo5R7ANS than the others.

Based on the recently suggested classification system within the family Podoviridae (Lavigne et al., 2008), all the phages used to analyze the phylogenetic relationship based on their RNAPs fall under the subfamily Autographivirinae. The phages SP6 and K1-5 belong to the genus ‘SP6-like phages’, while ϕKMV is a member of the genus ‘ϕKMV-like phages’. Coliphages T3 and T7, vibriophage VP4 and Yersinia phage ϕYeO3-12 were classified under the genus ‘T7-like phages’.

To compare the RNAPs of Cp1R7ANS-C2 and Lo5R7ANS phages, the putative protein sequences were aligned using ClustalX v.2.1 (Larkin et al., 2007b) (Figure 5-9). The two sequences exhibited 33% identity (280/840 residues) and 52% similarity (437/840 residues) to each other.
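Percent identity of this kind is obtained by counting identical columns in the pairwise alignment. The sketch below illustrates the calculation on a toy alignment. It assumes the denominator is the full alignment length (gap columns included), which is one common convention and may differ from the one used by the alignment software; the function name and toy sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percentage of alignment columns carrying the same residue in both sequences.

    Columns where either sequence has a gap ('-') never count as identical.
    The denominator is the alignment length (an assumed convention).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aln_a, aln_b) if a == b and a != "-")
    return 100 * matches / len(aln_a)


# Hypothetical toy alignment: 4 identical columns out of 7.
print(round(percent_identity("MKT-LLV", "MRTALIV"), 1))

# The reported figures follow the same arithmetic.
print(round(100 * 280 / 840), round(100 * 437 / 840))
```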

DNA replication, repair and processing

In the genome of phage Cp1R7ANS-C2, five major genes assigned functions related to DNA metabolism can be identified in a loosely arranged cluster downstream of the gene coding for RNAP. ORF 29 was predicted to encode a putative ssDNA binding protein, whereas functional predictions assigned ORF 30 as a gene coding for a phage endonuclease. The putative endonuclease (115 residues) exhibited homology to phage endonuclease class I, which is capable of resolving four-way DNA junctions. A helicase/primase function was assigned to ORF 38, while DNA polymerase (Family A) activity was attributed to ORF 42. ORF 44 was predicted to be a gene coding for a putative phage exonuclease with 5’-3’ exonuclease activity.

Figure 5-8. Neighbor-joining phylogenetic tree of RNA polymerase of selected Podoviridae phages

The phylogenetic tree was constructed based on the sequence similarity of RNA polymerases at the amino acid level. Multiple sequence alignment and generation of the neighbor-joining tree were performed using ClustalX v.2.1 (Larkin et al., 2007b), while the tree was modified with MEGA 5.1 (Tamura et al., 2011). The bootstrap values are indicated for 1000 trials and RNAPs of M. loti phages are shown with a red diamond.

Figure 5-9. Sequence alignment of Cp1R7ANS-C2 and Lo5R7ANS T7-like RNA polymerases.

The amino acid sequences of putative T7-like RNA polymerases of Cp1R7ANS-C2 and

Lo5R7ANS were aligned using ClustalX v2.1 (Larkin et al., 2007). The residues are coloured based on the default ClustalX coloring scheme where it depends on both residue type and the pattern of conservation within a column.

Similarly, genes putatively encoding a ssDNA binding protein (ORF 23), a DNA primase/helicase (ORF 31), a DNA polymerase (Family A) (ORF 34) and a 5’-3’ exonuclease (ORF 36) were identified organized in a loose cluster within the Lo5R7ANS genome. The other notable functional prediction related to the cluster of genes involved in DNA replication, repair and processing could be attributed to ORF 12 of Lo5R7ANS and ORF 19 of Cp1R7ANS-C2, which encode putative tyrosine recombinase family integrases. In both genomes, the gene coding for the putative integrase was localized upstream of the phage RNAP within the probable region of the early genes. The presence of these integrase genes means that it is likely that both phages are temperate phages, though we do not yet have any evidence of lysogeny.

The two ORFs located immediately downstream of the integrase-coding gene of the phage Lo5R7ANS, encode a restriction endonuclease (REase) BglII (ORF 13) and an adenine-specific DNA methyltransferase (MTase) (ORF 14). These two overlapping, adjacent genes are transcribed in the same direction, but in different reading frames, by the phage. The REase BglII belongs to the type II restriction-modification system (R-M system), where in most cases a separate REase and an MTase are required for the functioning of the system (Tock & Dryden, 2005). Additionally, the expression of REase and MTase of type II system should be tightly regulated to avoid any lethality against its own genome. In many type II R-M systems, a product of a closely located or tightly linked gene to the R-M gene cluster accomplishes this purpose. The neighboring gene located upstream of the putative integrase of Lo5R7ANS genome (ORF 11), which exhibited 51% sequence identity towards a transcriptional regulator of Sinorhizobium meliloti RU11/001, might play a role in regulating the expression of REase and MTase.

The Staphylococcus aureus phage ϕ42 encodes the R-M system Sau42I, which has been classified as a type IIb REase (Dempsey et al., 2005). The R-M system of ϕ42 is encoded by two adjacently located, overlapping genes, which are transcribed in different reading frames. In vitro analysis has revealed that the presence of Sau42I in the phage genome confers a different phenotypic trait on hosts lysogenized by ϕ42: lysogenic conversion of the host by ϕ42 was shown to make it resistant to infection by 23 other exogenous phages. Therefore, it can be suggested that the type II R-M system located close to the integrase gene of the Lo5R7ANS genome may have a similar role.

ORF 16 of Lo5R7ANS encodes a putative RtcB, which is characterized as a tRNA-splicing ligase (Tanaka et al., 2011). RNA ligases can be involved in repairing, splicing or editing RNA molecules by ligating broken RNAs or by altering their primary structure. The well-known phage T4 RNA ligase, in conjunction with another T4-encoded enzyme, polynucleotide kinase, acts to repair damaged tRNA molecules in a mechanism analogous to tRNA splicing (Ho et al., 2004; Schmidt, 1985). T4 infection triggers an anticodon nuclease produced by the host, resulting in the cleavage of tRNA-Lys. To avoid the depletion of tRNA-Lys in the cell and the subsequent failure of phage infection, the phage-encoded RNA ligase and polynucleotide kinase operate together to repair the damaged tRNA molecules, thereby evading the host defense mechanism (Ho et al., 2004). The putative tRNA-splicing ligase may have a similar function in Lo5R7ANS infection, and it would be interesting to investigate this experimentally.

Phage structure and assembly

In both Lo5R7ANS and Cp1R7ANS-C2 genomes, the first identifiable gene related to virion structure and assembly is the putative phage head-to-tail joining protein. The head-to-tail connector protein, which serves as a portal for DNA translocation into the prohead, is generally 510-555 amino acids in length. ORF 48 of Cp1R7ANS-C2 (550 residues) and ORF 40 of Lo5R7ANS (530 residues) were assigned as encoding putative head-to-tail joining proteins. The gene products of ORF 50 in Cp1R7ANS-C2 and ORF 43 in Lo5R7ANS were identified as putative T7-like capsid assembly proteins based on sequence similarities. Downstream of the capsid assembly protein-coding gene, a gene encoding a major capsid protein can be located. In Cp1R7ANS-C2, ORF 53 encodes a capsid protein, while ORF 50 in Lo5R7ANS was assigned the same function. The capsid protein-coding genes in both genomes are immediately followed by two genes that were predicted to encode proteins functioning as T7-like tail tubular proteins A and B. The gp11 and gp12 of coliphage T7, which function as tail tubular proteins A and B respectively, are known to be characteristic structural components of the short non-contractile tails of the family Podoviridae. The tail tubular proteins A and B of Cp1R7ANS-C2 (ORF 54 and 55) and Lo5R7ANS (ORF 51 and 52) exhibited sequence similarities to the tubular proteins of R. etli phage RHEph01, albeit with low significance. Another important component of the complex of tail proteins is the tail fiber protein. The tail fiber protein of T7 (gp17) is known to be involved in the primary attachment of the phage via host cell lipopolysaccharides. The functional predictions for ORF 60 of Cp1R7ANS-C2 and ORF 57 of Lo5R7ANS assigned them as genes encoding T7-like tail fiber proteins. Both ORFs shared sequence similarity with the conserved domain of the T7 tail fiber protein family. Additionally, a gene coding for a putative internal virion protein (ORF 56) was identified in the Lo5R7ANS genome. The tail of members of the family Podoviridae is too short to span the cell envelope during ejection of the phage DNA into the host cell. Therefore, phage T7 and its relatives are known to form an internal cylindrical structure composed of three protein components (gp14, gp15 and gp16) within the phage head, which can be extended into the cell envelope of the host during the initiation of infection. Moreover, these internal core proteins play roles in virion morphogenesis at later stages (Chang et al., 2010). The internal virion protein gp16 of T7 is also known to have a catalytic activity in cleaving glycosidic bonds in the peptidoglycan layer. The predicted product of ORF 56 of Lo5R7ANS, which was assigned as a putative internal virion protein, also exhibited sequence similarity to a transglycosylase domain protein of Sinorhizobium meliloti RU11/001 (Appendix VII).

A gene coding for a putative DNA maturase B protein was also identified in both the Lo5R7ANS (ORF 63) and Cp1R7ANS-C2 (ORF 1) genomes, although the location of this gene differed between the two genomes.

Host cell lysis

The dsDNA phages typically achieve host cell lysis by the degradation of host cell peptidoglycan with a phage-encoded endolysin, which gains access through the cytoplasmic membrane with the help of another phage-encoded protein, holin. Two potential genes involved in functions related to host cell lysis were identified in the Cp1R7ANS-C2 genome. ORF 58 exhibited sequence similarity to a putative tail-fiber/lysozyme of Acinetobacter phage IME-AB2 and a lambda-lysozyme of Thioflavicoccus mobilis 8321, while ORF 62 contained a lysozyme-like domain. The gene product of ORF 44 of phage Lo5R7ANS was predicted to be a putative N-acetylmuramoyl-L-alanine amidase with a conserved domain related to peptidoglycan recognition proteins (PGRPs).

5.3 Author’s contributions

I performed all the experiments described in this chapter, with the following exceptions: Suryakarthiga Ganesan and Rémy Gavard, two undergraduate students, carried out the phage isolation under my supervision. Rémy Gavard helped in performing host range studies, phage DNA isolation and transduction experiments. Benjamin Perry (Dr. Christopher Yost’s lab, University of Regina) carried out the sequencing and assembly of the Lo5R7ANS and Cp1R7ANS-C2 genomes.

Chapter Six: Characterization of temperate rhizobiophage vB_RleM_PPF1 and its site-specific integration into the rhizobial host genome

The presence of prophages in a genome can contribute to increased bacterial fitness and ecological success in an environment that contains closely related phages (Canchaya et al., 2003). Sinorhizobium meliloti phage 16-3 is the first genetically well-studied temperate phage of rhizobia (Dorgai et al., 1983). It is able to integrate into its host chromosome with high efficiency by site-specific recombination (Semsey et al., 1999).

We isolated phage PPF1 from an accidentally lysogenized strain of Rhizobium leguminosarum F1; the wild type F1 strain does not carry PPF1 in its genome. Although we assume that the F1 strain was lysogenized by this phage during the handling of various soil samples for phage isolation, the actual origin of PPF1 is unknown. It is also possible that the original isolate of strain F1 contained a mixed population, with some cells carrying the PPF1 prophage and others not. Nonetheless, our experiments identified PPF1 as a temperate phage that is capable of efficiently lysogenizing R. leguminosarum strain F1 and that can be induced from its lysogenized host using UV irradiation.

6.1 Results

6.1.1 General characteristics and host range of PPF1

The rhizobiophage PPF1 was isolated using R. leguminosarum strain F1. The plaques produced by PPF1 on a 0.5% TY overlay of R. leguminosarum F1 after 24-48 hrs of incubation were relatively small, with opaque lysis. Based on electron microscopy, PPF1 was characterized as a member of the family Myoviridae, with a contractile tail and an icosahedral head. The head diameter was estimated to be 83±5 nm, whereas the phage tail length was determined to be 130±5 nm (Figure 6-1).

The spectrum of infectivity of PPF1 was screened using plaque assays. Agar overlays of different rhizobial strains, including previously characterized strains and some field isolates, were used to determine the host range of phage PPF1. Selected data from the PPF1 host range studies are shown in Table 6-1. PPF1 was not infective against any of the previously characterized rhizobial strains. Apart from its host R. leguminosarum F1, five rhizobial field isolates were observed to be sensitive to PPF1 lytic infection: S023A-4, S008A-5, S016B-1, S012B-3 and S010A-2. Moreover, all of these infections formed very turbid/opaque plaques on the host rhizobial lawn.

6.1.2 Genome characterization of PPF1

The PPF1 genome, which was estimated to be 53.5 kb based on the results of PFGE (Figure 6-2), was digested with restriction enzymes with different methylation sensitivities, including SpeI, NdeI, BclI, PstI, SmaI, NsiI, SalI, KpnI, Tru1I and BamHI (Figure 6-3). PPF1 genomic DNA did not exhibit any remarkable resistance to restriction digestion. A faint extra band running at ~110 kb could be observed in the PFGE of PPF1 (Figure 6-2). This could be a phage DNA dimer of PPF1 or contaminating DNA that was present in our phage DNA preparation.

Figure 6-1. Transmission electron micrographs of temperate phage PPF1

Particles were negatively stained with 1% uranyl acetate. Bars in the micrographs represent 100 nm.

Table 6-1. Host range of phage PPF1 with 13 previously characterized strains of rhizobia and 17 indigenous rhizobial strains

Grey squares with a ‘+’ sign indicate lysis, where no lysis is indicated with clear squares with a ‘−’.

Rhizobial strain*           PPF1    Rhizobial strain*               PPF1
R. leg F1                   +       R. gallicum S013A-1 (15b)       −
R. leg bv. viciae 3841      −       R. gallicum S019B-5 (27)        −
R. leg bv. viciae VF39SM    −       Rhizobium spp. S023A-4 (1)      +
R. leg bv. viciae 248       −       Rhizobium spp. S022A-2 (2)      −
R. leg 3855                 −       Rhizobium spp. S018A-3 (3)      −
R. leg bv. viciae 336       −       Rhizobium spp. S027B-3 (4a)     −
R. leg bv. viciae 306       −       Rhizobium spp. S018A-1 (5)      −
R. leg F3                   −       Rhizobium spp. S001B-3 (7)      −
R. leg bv. phaseoli 8401    −       Rhizobium spp. S008A-5 (8)      +
R. leg bv. trifolii W14     −       Rhizobium spp. S012A-1 (9)      −
R. etli CE3                 −       Rhizobium spp. S016B-1 (10)     +
S. meliloti ML2             −       Rhizobium spp. S012B-3 (20)     +
S. meliloti AK631           −       Rhizobium spp. S010A-2 (23a)    +
R. gallicum S014B-4 (6)     −       Rhizobium spp. S010B-1 (24)     −
R. gallicum S013A-1 (15b)   −       Rhizobium spp. S021A-2 (25)     −

* The indigenous strains were classified according to their plasmid profiles and the number within brackets indicates their profile group for easy identification.

Based on 454-pyrosequencing data, the PPF1 genome was assembled into one contig of 54,506 bp with 34-fold average coverage. The genome has an average G+C content of 61.9%. ORF prediction on the PPF1 genome with RAST and Prodigal revealed 94 putative protein-encoding genes, which together account for approximately 95.4% of the total genome. ATG was the most frequent start codon, initiating 84 of the predicted ORFs, while GTG (9 ORFs) and TTG (1 ORF) were detected at low frequencies. The putative gene products of the predicted ORFs were compared with available protein sequences in the NCBI database using BLASTP with the PSI-BLAST algorithm, and an e-value below 10⁻⁴ was used as the threshold for a significant BLASTP hit. Based on sequence similarity, 70 ORFs (74.5%) shared significant similarity with previously characterized gene products, whereas 24 ORFs (25.5%) exhibited no homology to anything present in the database (Figure 6-4 and Appendix IX). Putative functions could be assigned to only 24 predicted ORFs (~25.5%), while 46 (~48.9%) predicted protein-encoding genes showed sequence similarities to hypothetical proteins. The absence of any complete tRNA genes in the PPF1 genome was confirmed by both tRNAscan-SE and ARAGORN searches.
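The ORF tallies reported above can be cross-checked with simple arithmetic. The following is a minimal sketch in Python; the variable names are illustrative, and the number of ORFs matching only hypothetical proteins is derived here as the difference between the ORFs with significant hits and the ORFs with an assigned function, which is an assumption made for this check rather than a figure taken from the annotation output.

```python
# Cross-check of the ORF tallies reported for the PPF1 genome.
# All counts come from the text; variable names are illustrative.

TOTAL_ORFS = 94

start_codons = {"ATG": 84, "GTG": 9, "TTG": 1}
significant_hits = 70     # ORFs with a significant BLASTP hit
no_hits = 24              # ORFs with no database match
assigned_function = 24    # ORFs given a putative function


def percent(count: int, total: int = TOTAL_ORFS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Assumption: every significant hit without an assigned function
# corresponds to a match against a hypothetical protein.
hypothetical_only = significant_hits - assigned_function

# The start codon counts and the hit/no-hit split should each cover all ORFs.
assert sum(start_codons.values()) == TOTAL_ORFS
assert significant_hits + no_hits == TOTAL_ORFS

for label, count in [
    ("significant hits", significant_hits),
    ("no hits", no_hits),
    ("assigned function", assigned_function),
    ("hypothetical only", hypothetical_only),
]:
    print(f"{label}: {count} ORFs ({percent(count)}%)")
```

Keeping the derived count separate from the reported counts makes it easy to see which percentages follow directly from the data and which depend on the stated assumption.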

Figure 6-2. Pulsed field gel electrophoresis (PFGE) of DNA extracted from phage PPF1

DNA from phage PPF1 (lane 2) was electrophoresed on a 1% agarose gel for 18 hrs at a 1-10 s switch time and 6 V/cm, with a linear ramping factor and an included angle of 120°. Lane 1 contained the low range PFG marker.

Figure 6-3. Restriction enzyme profiles of digested PPF1 phage DNA

DNA from phage PPF1 was digested with different restriction enzymes. Lanes: 1. 1 kb DNA ladder (Fermentas), 2. XbaI, 3. SpeI, 4. BclI, 5. HindIII, 6. PstI, 7. SalI, 8. NsiI, 9. NdeI, 10. KpnI, 11. BamHI, 12. SalI, 13. SmaI, 14. Tru1I and 15. EcoRI.

Figure 6-4. Genome arrangement of rhizobiophage vB_RleM_PPF1 (PPF1)

Graphical representation of the organization of the PPF1 rhizobiophage genome generated with Geneious v6.1.2 (Drummond et al., 2011). A single arrow represents each predicted ORF and the scale is given in base pairs.

6.1.3 Isolation of lysogenized strains of R. leguminosarum F1 and prophage induction from the lysogen

Two methods were used to isolate strains of R. leguminosarum F1 (R. leg F1) lysogenized with phage PPF1. In these methods, either a broth culture or a rhizobial lawn was infected with the phage and incubated for a prolonged period of time (>24 hours) at 30°C.

Possible lysogens were isolated after at least three consecutive single colony isolation steps and screened for lysogenization using plaque assays. The strains that exhibited low plaquing efficiency with phage PPF1 compared to the wild type R. leguminosarum F1 strain were selected for further screening with Southern hybridization.

Total genomic DNA samples of possible lysogens were digested with EcoRI and electrophoresed on an agarose gel (Figure 6-5 A). The DNA was then transferred onto a nylon membrane for hybridization. The blot was hybridized with a digoxigenin (DIG)-labeled total genomic DNA probe of phage PPF1, and hybridization signals were detected by chemiluminescence (Figure 6-5 B). No signal was detected for the wild type rhizobial strain, whereas clear signals were detected in the lanes containing DNA from potential lysogenic strains of R. leg F1 (R. leg F1-L) (Figure 6-5 B, lanes 4-11). Furthermore, the signal patterns were similar to that of the lane containing EcoRI-digested PPF1 phage DNA, which suggested that the R. leg F1-L strains were carrying the PPF1 genome.

Figure 6-5. Southern blot hybridization

DNA from phage PPF1 (2), R. leguminosarum F1 (3), eight independently lysogenized strains of R. leguminosarum F1 (4-11), R. leguminosarum VF39SM (12) and a lysogenized strain of R. leguminosarum VF39SM (13) was digested with EcoRI and electrophoresed on a 1% agarose gel (A). The DNA was then transferred onto a nylon membrane and hybridized with a whole genomic DNA probe of PPF1 (B). Lane 1 contains the 1 kb DNA ladder.

Similarly, the ability of PPF1 to lysogenize R. leguminosarum bv. viciae VF39SM was confirmed by detecting hybridization signals with genomic DNA of the lysogen, whereas no signals were detected with wild type genomic DNA. Interestingly, PPF1 was unable to form any visible plaques on a lawn of R. leguminosarum bv. viciae VF39SM. It is therefore evident that PPF1 is capable of lysogenizing this host even though it does not form visible plaques on it.

6.1.4 Identification of the attachment sites in PPF1 and R. leguminosarum F1 genomes

For the identification of the attachment sites in the PPF1 and R. leguminosarum F1 genomes, the R. leg F1-L genome was sequenced at Funomics Global Inc. using Illumina technology. The draft genome sequence of R. leg F1-L was then annotated using the RAST annotation server (Aziz et al., 2008), and the contigs with regions containing phage-related proteins were aligned with the temperate phage PPF1 genome sequence using Mauve multiple genome alignment software (Darling et al., 2004) to identify the R. leg F1-L contig with the desired sequence. The probable att sites identified by in silico methods were then verified in vivo by amplifying the regions of attachment at both the 5’ and 3’ ends of phage integration (Figure 6-6). Furthermore, to confirm the site-specific integration by the PPF1 integrase, the 5’ and 3’ attachment regions were amplified in several independently isolated R. leg F1-L strains and verified by subsequent sequencing.

Figure 6-6. Agarose gel (1%) showing the PCR products of the 5' and 3' regions of the phage integration into the R. leguminosarum F1 genome

Lanes 1 and 8 contain the 1 kb DNA ladder. PCR products amplified from genomic DNA templates of R. leg F1 (lanes 2 and 9) and five independent isolates of R. leg F1 lysogens (lanes 3-7 and 10-14) were run on the gel. Expected sizes of the PCR products were 387 bp and 645 bp for the 5’ end and 3’ end attachment regions, respectively.

Figure 6-7. The sequence alignments of probable attachment (att) sites of phage PPF1 and its bacterial host Rhizobium leguminosarum F1

The att site of PPF1 has considerable sequence similarity (100%) to the att site of 16-3 in S. meliloti 41. Putative tRNA-Pro (CGG) genes are indicated by grey arrows, whereas the predicted att core (50 bp) regions of the PPF1 att site are boxed with red dashed lines. The att core of the 16-3 attachment site (51 bp) is underlined.

Figure 6-8. The intergenic region between the putative ligase and integrase genes in the phage PPF1 genome showing the presence of direct repeats

Each 23 bp direct repeat (DR1) contains two identical direct repeats (DR2-11 bp) within it. The 50 bp homology region is indicated with a purple box.

Figure 6-9. Schematic representation of the secondary structure of the putative tRNA-Pro (CGG)

The 50 bp homology region (predicted att core region) is underlined. The arrow indicates the site where phage insertion occurs.

The integration of the phage was observed to occur 154 bp upstream of the putative integrase gene in the PPF1 genome. The phage integration site also overlaps with a putative proline-tRNA (CGG) gene in the R. leg F1-L genome, with the 3’ end of the tRNA gene constituting a 44 bp sequence region obtained from the phage genome (Figure 6-9). Furthermore, a homologous sequence of 50 bp, which consists of this 44 bp sequence followed by another 6 bp, was identified in the bacterial genome, flanking the 3’ end of the prophage genome sequence. This 50 bp homologous sequence, which occurs within the intergenic region between the putative ligase and integrase genes of the PPF1 phage genome (28,855-28,904 bp), is most likely involved in strand exchange during phage integration.

The 50 bp sequence common to the predicted att sites of both phage PPF1 and its rhizobial host F1 genome was also compared against the NCBI database. These searches revealed that the predicted core sequence in the att region of PPF1 shares sequence similarity with the 51 bp att core region of the targeted attachment site of phage 16-3 of Sinorhizobium meliloti 41 (Figure 6-7).

6.2 Discussion

PPF1 is a temperate rhizobiophage that infects R. leguminosarum strain F1 and is capable of efficiently lysogenizing this host. Based on electron microscopy, PPF1 belongs to the order Caudovirales and the family Myoviridae, with a contractile tail and an icosahedral head. The head diameter and tail length of PPF1 were estimated to be 83±5 nm and 130±5 nm, respectively (Figure 6-1). The previously characterized rhizobiophage P10VF, which has similar dimensions, was also placed in the family Myoviridae. The head diameter and tail length of P10VF were determined to be 85±6 nm and 122±7 nm (Restrepo, 2012). R. etli phages RHEph04, RHEph05 and RHEph06 also belong to the same family, with an average head diameter of 60 nm and an average tail length of 92 nm (Santamaria et al., 2014).

To determine the host range of PPF1, it was screened against a range of rhizobial strains, including previously characterized strains and some field isolates (Table 6-1). PPF1 was only capable of forming visible plaques on a limited number of strains, exhibiting a very narrow host range. It was not visibly lytic against any of the previously characterized rhizobial strains. Nevertheless, a few rhizobial field isolates were observed to be sensitive to PPF1 lytic infection, with the formation of very turbid/opaque plaques on the host rhizobial lawn.

6.2.1 Genome analysis

The genome size of phage PPF1 was estimated as 56 kb, based on the results obtained from PFGE. The DNA of PPF1 did not exhibit any remarkable resistance to restriction digestion. Using the 454-pyrosequencing data, the genome of PPF1 was assembled into a single contig of 54,506 bp with 34-fold average coverage. The average G+C content of the phage was estimated as 61.9%. Generally, bacteriophages harbor genomes that are richer in A+T than those of their respective bacterial hosts. This increase in A+T content varies slightly among different types of phages, with an average value of 4% (Rocha & Danchin, 2002). However, it has been observed that temperate phage genomes tend to resemble the composition of their host genome, with a much smaller A+T deviation (about 1.4%). The previously characterized rhizobiophages R. leguminosarum bv. viciae 3841 phage L338C (Restrepo, 2012) and Sinorhizobium meliloti phage 16-3 (Semsey et al., 1999) have genomes with a G+C content of 59%, whereas the G+C composition of S. meliloti phage PBC5 is 61.5%. The closely related members of the genus Rhizobium are known to possess genomes with an average G+C content of 60-62%. The average G+C compositions of the genomes of R. leguminosarum bv. viciae strain 3841 and R. leguminosarum bv. trifolii strain WSM2304 fall within this range, with values of 60.9% and 61.2%, respectively. Although the exact genome composition of R. leguminosarum strain F1 is unknown, it can be suggested that the similarity in genome composition between PPF1 and its host strains may facilitate establishment of the lysogenic state.
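The composition comparison discussed above can be illustrated with a short calculation. The sketch below, written in Python, computes the G+C percentage of a sequence and the A+T deviation of a phage relative to a host. The function names and the toy sequence are hypothetical, and strain 3841 is used as a stand-in host only because the composition of strain F1 is unknown.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C percentage of a DNA sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / (gc + at)


def at_deviation(phage_gc: float, host_gc: float) -> float:
    """A+T enrichment of the phage relative to the host, in percentage points."""
    return (100.0 - phage_gc) - (100.0 - host_gc)


# Toy sequence for illustration only; this is not PPF1 data.
toy = "GCGCATGCCGTA"
print(f"toy G+C: {gc_content(toy):.1f}%")

# Reported values: PPF1 61.9%, R. leguminosarum bv. viciae 3841 60.9%.
# A negative result means the phage is less A+T rich than the stand-in host.
print(f"A+T deviation: {at_deviation(61.9, 60.9):+.1f} points")
```

With the reported values, the deviation is about one percentage point in magnitude, which is in line with the small deviations described for temperate phages.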

The PPF1 genome contains 94 putative protein-encoding genes and functional predictions for the annotated gene products were predominantly associated with phage morphogenesis.

DNA packaging and phage morphogenesis

The typical DNA packaging machinery of dsDNA phages is ATP-driven and mainly involves two non-structural components: a large terminase subunit with packaging ATPase activity and a small terminase subunit that mediates the specific recognition of phage DNA (Casjens, 2011; Rao & Feiss, 2008). In contrast, phages of the φ29 family use a single protein component and an RNA hexamer molecule called pRNA to accomplish their DNA packaging (Guo et al., 1987; Shu & Guo, 2003). The genome-wide functional predictions for the PPF1 phage genome identified only a single gene coding for a putative terminase subunit (ORF 1), suggesting the possible presence of this atypical DNA packaging motor.

More often than not, tailed phages translocate their DNA into a prohead or procapsid. The assembly of this phage prohead requires three main components: a portal protein, a scaffolding protein and a major capsid protein. The portal protein creates a ring at the portal vertex of the prohead that serves as the docking site for terminases during DNA translocation (Rao & Feiss, 2008). Adjacent to the gene coding for the terminase large subunit, a protein-coding gene containing a putative conserved domain of the lambda family phage portal protein was identified in the PPF1 genome (ORF 3). In addition, two other genes coding for structural proteins associated with head morphogenesis were located in close proximity to the gene encoding the putative portal protein. ORF 7 and ORF 8 were assigned as encoding a putative head decoration protein and the major capsid protein, respectively. Aside from these structural proteins, capsid maturation in dsDNA viruses requires proteolytic cleavage that is normally achieved by a prohead protease. The functional prediction for ORF 5 assigned it as a gene coding for a putative periplasmic serine protease. Phage prohead proteases are classified into four main families, including the two serine protease families S14 and S49 (Liu & Mushegian, 2004). The serine protease-encoding gene of PPF1 shows strong sequence similarity to the conserved domain of the signal peptide peptidase A (SppA)-S49 peptidase family with an e-value of 3.03e-74, and it was also associated with the conserved domain of the S14 family-ClpP protease class with an e-value of 1.04e-39.

The contractile tail structure unique to phages of the family Myoviridae is noteworthy and has several essential elements. The contraction of the tail sheath that covers the rigid tail tube in members of this family causes the tail tube to push itself into the host cell membrane, resulting in the subsequent phage infection (Aksyuk et al., 2011; Leiman et al., 2004). The genes responsible for tail morphogenesis in PPF1 are loosely arranged together within the region that stretches from ORF 13 to ORF 24 in the genome. Based on the functional predictions, the PPF1 genome encodes 9 putative protein products that play roles in tail morphogenesis, including a tail tape measure protein (ORF 16), a tail/DNA circulation protein (ORF 17), a baseplate assembly protein (ORF 20) and a baseplate J-like protein (ORF 23). The tail sheath and tail tube proteins, which can be considered exclusive components of the tail machinery of the family Myoviridae, were predicted to be encoded by ORF 13 and ORF 14 of the PPF1 genome, respectively. In addition to this tail gene cluster, a remotely located gene coding for a putative minor tail protein (ORF 94) was also identified. Furthermore, all the genes assigned functions associated with tail morphogenesis exhibit sequence similarities to a phage sequence within the Hoeflea phototrophica bacterial genome (Appendix IX).

DNA replication, repair and processing

Genes required for DNA replication, repair and processing were identified scattered across the PPF1 genome. ORF 42 was predicted to code for a putative ATP-dependent DNA ligase with strong sequence homology to several rhizobial putative DNA ligases at the amino acid level. A probable site-specific integrase/recombinase activity was assigned to ORF 43. The protein product of ORF 43 shared the conserved catalytic domain of the site-specific tyrosine recombinase family. The phylogenetic comparison of the PPF1 integrase with 37 previously characterized phage integrases clustered it within the known site-specific tyrosine recombinase family integrases (Figure 6-10). Based on this neighbor-joining tree, the integrase of the well-known temperate rhizobiophage 16-3 was identified as closely related to the integrase of PPF1.

ORF 87 was predicted to be a gene coding for a putative DNA methylase with a conserved cytosine-C5-specific DNA methylase domain, whereas an adenine-specific methyltransferase activity was assigned to the following gene (ORF 88). Cytosine-specific DNA methyltransferases (Dcm) specifically recognize cytosine residues in a DNA molecule and covalently transfer a methyl group to the C-5 of the cytosine ring, while methylation of the N-6 of adenine is catalyzed by adenine-specific methyltransferases (Dam) (Low et al., 2001). Such modifications to the genome may affect the recognition sequences of certain restriction enzymes, thus effectively altering enzymatic cleavage. However, PPF1 DNA can be digested with BclI (Figure 6-3), the restriction sites of which can be blocked by Dam methylation.
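The sensitivity of BclI to Dam methylation follows from the overlap between its recognition sequence (TGATCA) and the Dam target motif (GATC). The following minimal Python sketch locates both motifs in a sequence and reports the BclI sites that contain a Dam target. The function names and the toy sequence are hypothetical and are not derived from the PPF1 genome.

```python
import re

BCLI_SITE = "TGATCA"   # BclI recognition sequence
DAM_SITE = "GATC"      # motif in which Dam methylates the adenine


def find_sites(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of every motif occurrence, overlaps included."""
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [match.start() for match in pattern.finditer(sequence.upper())]


def dam_sensitive_sites(sequence: str) -> list[int]:
    """Return BclI sites whose internal GATC is a potential Dam target."""
    dam_starts = set(find_sites(sequence, DAM_SITE))
    # The GATC motif begins one base into the six-base BclI site.
    return [pos for pos in find_sites(sequence, BCLI_SITE) if pos + 1 in dam_starts]


# Toy sequence for illustration only; this is not PPF1 data.
toy = "AATGATCAGGCCGATCTTTGATCA"
print("BclI sites:", find_sites(toy, BCLI_SITE))
print("Dam target sites:", find_sites(toy, DAM_SITE))
print("BclI sites containing a Dam target:", dam_sensitive_sites(toy))
```

Because every BclI site contains a GATC motif, all BclI sites are potential Dam targets, whereas a GATC motif can also occur outside a BclI site.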

Host cell lysis

Non-filamentous bacteriophages typically release their progeny particles from an infected bacterial cell by inducing host cell lysis. This process, which concludes the lytic life cycle of a phage, can be achieved by two different mechanisms. Bacteriophages with ssRNA or ssDNA genomes inhibit a step of peptidoglycan synthesis, thereby disrupting bacterial cell wall synthesis, whereas dsDNA phage-mediated cell lysis involves a lytic system that enzymatically cleaves peptidoglycan. A classical lytic system contains an endolysin, a phage-encoded muralytic enzyme that targets one of the four major bonds in the peptidoglycan macromolecule, and a holin, which allows the passage of endolysins into the periplasmic space, giving them access to their substrate. With reference to their enzymatic specificity, endolysins are divided into five main categories (Borysowski et al., 2006): (i) N-acetylmuramidases, (ii) endo-β-N-acetylglucosaminidases, (iii) lytic transglycosylases, (iv) endopeptidases and (v) N-acetylmuramoyl-L-alanine amidases. Based on sequence comparisons, lytic murein transglycosylase activity was assigned to ORF 6, whereas ORF 33 was predicted to encode a putative N-acetylmuramoyl-L-alanine amidase. Lytic transglycosylases specifically attack the glycosidic bond between the amino sugars N-acetylmuramic acid and N-acetylglucosamine in the peptidoglycan heteropolymer (Loessner, 2005). ORF 6 showed sequence similarity to the lytic murein transglycosylase of Agrobacterium vitis S4 with an e-value of 2e-16. The PSI-BLAST and Pfam searches failed to detect any conserved domains associated with this protein-coding gene. N-acetylmuramoyl-L-alanine amidases are hydrolases that cleave the amide linkage between N-acetylmuramoyl residues and stem peptides (Borysowski et al., 2006; Loessner, 2005; Pastagia et al., 2013). Apart from the hydrolytic activity, phage lysins also possess domains that function in substrate recognition (Pastagia et al., 2013). The predicted product of ORF 33 of the PPF1 genome was associated with two conserved domains: the amidase_2 family, which includes amidases with N-acetylmuramoyl-L-alanine amidase activity, and the PGRP (peptidoglycan recognition protein) superfamily, which contains receptors that recognize and bind peptidoglycan in the bacterial cell wall.

In addition to these putative endolysin-encoding genes, which can be generically included in the lysis module of the phage, ORF 32 of the PPF1 genome, which was predicted to code for a conserved hypothetical protein, may have a function in host cell lysis. This gene, whose predicted product exhibited sequence similarity to a hypothetical protein in Paenibacillus elgii, was also associated with the conserved domain SGNH-hydrolase with a marginal e-value (7.83e-05). This family of hydrolases includes a variety of lipases and esterases and hence may play a role in host cell lysis during phage infection. Interestingly, no gene in the PPF1 genome was conclusively identified as coding for a holin protein.

Prophage genome arrangement and integration into the host chromosome

To confirm that the potential lysogenic strain of R. leg F1 (R. leg F1-L) carries the PPF1 genome, Southern hybridization with an EcoRI-digested whole genomic DNA probe of PPF1 was used. As shown in Figure 6-5 B, no signal was detected with the genomic DNA of R. leg F1, whereas a clear signal was observed in the lane with R. leg F1-L genomic DNA. Moreover, the similar restriction patterns of the phage and the lysogen suggested that R. leg F1-L carried PPF1 as a prophage. Additionally, phage PPF1 was capable of lysogenizing R. leguminosarum bv. viciae VF39SM (Figure 6-5 B).

Figure 6-10. A neighbor-joining phylogenetic tree showing the phylogenetic relationship between the amino acid sequences of 39 different phage integrases.

The bootstrap values are indicated for 1000 trials, and rhizobiophage integrases are marked with a red diamond. The tree was constructed with ClustalX 2.1 (Larkin et al., 2007) and modified with MEGA 5.1 (Tamura et al., 2011).

The region containing the attachment sites on phage PPF1 and R. leg F1 was identified by analyzing the draft genome sequence of R. leg F1-L. The site-specificity of the PPF1 integrase was confirmed by amplifying the 5’ and 3’ regions of phage attachment in several independently lysogenized F1 strains (Figure 6-6). The integration of the phage genome into the host chromosome occurs at the att site located 154 bp upstream of the putative site-specific integrase gene. Further analysis of the integration site indicated that it overlaps with a putative proline-tRNA (CGG) gene, with the 3’ end of the tRNA gene constituting a 44 bp sequence region obtained from the phage genome (Figure 6-9). A 50 bp sequence identical to this region of the phage genome, consisting of this 44 bp sequence followed by another 6 bp, was identified in the bacterial genome, flanking the 3’ end of the prophage genome sequence. By carrying this homologous region of 50 bp, the phage can reconstitute the tRNA gene upon integration, hence maintaining the integrity of the gene function. A considerable number of temperate phages use site-specific integration to lysogenize their host bacterium, where the target site often overlaps with a putative tRNA gene of the host chromosome (Williams, 2002). Rhizobium leguminosarum phage ϕU (Uchiumi et al., 1998), Sinorhizobium phage 16-3 (Semsey et al., 2002), Mycobacterium phages L5 (Lee et al., 1991) and Ms6 (Freitas-Vieira et al., 1998), Haemophilus phage HP1 (Hauser & Scocca, 1992), E. coli phage P4 (Pierson & Kahn, 1987) and Salmonella phage P22 (Smith-Mungo et al., 1994) are temperate phages that use an attB site located within a tRNA gene as their target for integration into the host chromosome. The preference of temperate phages and other horizontally transferred genetic elements, such as genomic islands and integrative conjugative elements, for tRNA genes as integration sites is readily explained by the stability and conservation between strains of the attB sequence within a tRNA gene, which increases the number of different host bacterial chromosomes into which the phage can integrate (Williams, 2002). In order to restore the activity of the gene, which can be disrupted during phage integration, the attP sequence should include the homologous host DNA sequence. The small size of tRNA genes is also believed to favor their selection as integration sites, as it reduces the size of the host DNA fragment that has to be carried within the attP site to restore the integrity of the gene (Williams, 2002).
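The way a shared core sequence preserves the tRNA gene can be illustrated with a simplified string model of site-specific integration. The Python sketch below opens a circular phage genome at its copy of the core and inserts it into the host at the matching core, so that the prophage is flanked by two core copies. The function name and all sequences are hypothetical toy values, and the model ignores the integrase, strand polarity and the exact crossover position.

```python
def integrate(host: str, phage_circular: str, core: str) -> str:
    """Insert a circular phage genome into the host at a shared core sequence.

    The phage is passed as a linear string representing the circle. The result
    carries the prophage flanked by two copies of the core (attL and attR).
    """
    if host.count(core) != 1 or phage_circular.count(core) != 1:
        raise ValueError("core must occur exactly once in each genome")
    h = host.index(core)
    p = phage_circular.index(core)
    host_left, host_right = host[:h], host[h + len(core):]
    # Rotate the circle so that the phage body starts right after its core copy.
    phage_body = phage_circular[p + len(core):] + phage_circular[:p]
    return host_left + core + phage_body + core + host_right


# Toy sequences for illustration only; these are not PPF1 or F1 data.
trna_5prime = "ACGTAC"            # 5' part of a toy tRNA gene
core = "GGATCCTTAA"               # shared core; its first 8 bases end the toy gene
trna_gene = trna_5prime + core[:8]

host = "TTTT" + trna_5prime + core + "CCCC"
phage = "AAACAC" + core + "TGTGTG"

lysogen = integrate(host, phage, core)
print(lysogen)
print("tRNA gene intact:", trna_gene in lysogen)
print("core copies in lysogen:", lysogen.count(core))
```

In this model the tRNA gene remains intact at the left junction because the bases it loses to the insertion are replaced by the identical bases of the phage-borne core, which mirrors the reconstitution described above.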

Additionally, a BLAST search of the 50 bp sequence common to the predicted att sites of both phage PPF1 and its rhizobial host F1 genome revealed sequence similarity to the 51 bp att core region of the targeted attachment site of phage 16-3 (Figure 6-7). Phage 16-3 is a temperate siphovirus capable of lysogenizing Sinorhizobium meliloti 41 by site-specific recombination into its targeted integration site within a tRNA-Pro (CGG) gene (Papp et al., 1993; Semsey et al., 2002). The attP and attB sites of phage 16-3 and its rhizobial host, respectively, share an identical 51 bp core region that prevents any alteration in the tRNA gene upon phage integration. The anticodon loop of the tRNA gene is used as the targeted sublocation for integration by both 16-3 and PPF1. Also, the attB site of temperate phage ϕU on the chromosome of Rhizobium leguminosarum biovar trifolii strain 4S carries a 53 bp core sequence, of which a 47 bp region overlaps with a putative tRNA-Thr (GGU) gene (Uchiumi et al., 1998). The integration of phage ϕU into the tRNA gene follows the same pattern as that of the 16-3 and PPF1 rhizobiophages, using the anticodon loop as the preferred sublocation of integration.

The intergenic region between the putative ligase and integrase genes in the PPF1 phage genome, where the integration occurs, has a high A+T content of 47.3% compared to the average A+T composition of the total genome (≈ 38%). The presence of relatively A+T-rich attachment regions is characteristic of several temperate phages, including phage λ and lactococcal phages ϕLC3, TP901-1 and ϕadh. Furthermore, several direct repeats, which may play a role during phage integration (Figure 6-8), were identified surrounding the core sequence of the probable attP site in the PPF1 genome. Two direct repeats (23 bp) were detected at 28,755-28,777 bp and 28,973-28,995 bp in the PPF1 genome, positioned 78 bp upstream and 68 bp downstream of the attP core site, respectively. Further analysis of these two repeats revealed that each 23 bp repeat region contains two identical sequences of 11 bp. The abundant distribution of repeat sequences in the attP region has been acknowledged as a characteristic feature of many identified site-specific recombination systems.
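Exact direct repeats such as those described above can be located by indexing every k-mer in a region of interest. The following is a minimal Python sketch under stated assumptions: the function names and the toy sequence are hypothetical, only exact repeats of one fixed length are reported, and the thesis does not state which tool was actually used to find the repeats. The sketch also checks the lengths implied by the reported 1-based inclusive coordinates.

```python
from collections import defaultdict


def direct_repeats(sequence: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer that occurs more than once to its 0-based start positions."""
    positions = defaultdict(list)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def span_length(start: int, end: int) -> int:
    """Length of a feature given 1-based inclusive coordinates."""
    return end - start + 1


# Toy sequence for illustration only; this is not PPF1 data.
toy = "TTGACCAGGGGGTTGACCA"
print(direct_repeats(toy, 7))

# Lengths implied by the coordinates reported in the text.
print("repeat 1:", span_length(28755, 28777), "bp")
print("repeat 2:", span_length(28973, 28995), "bp")
print("att core:", span_length(28855, 28904), "bp")
```

Running the same function with a smaller k on a single repeat unit would, in principle, reveal nested repeats of the kind described for the 11 bp sequences.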

6.3 Author’s contributions

Marcela Restrepo performed the TEM of PPF1. The assembly and annotation of the draft genome sequence of R. leguminosarum F1-L strain, as well as the in silico analysis of phage attachment sites, were carried out by Benjamin Perry (Dr. Christopher Yost’s lab, University of Regina). I performed all the other experiments reported in this chapter.


Chapter Seven: Effect of rhizobiophages on nodulation competitiveness of Rhizobium leguminosarum

The ability of rhizobiophages to alter the population dynamics of rhizobia present in the rhizosphere environment has been widely recognized (Hashem & Angle, 1988; Mendum et al., 2001). They can also affect legume-Rhizobium symbioses substantially by causing significant changes in the relative numbers of resistant and susceptible rhizobial strains in the soil (Ahmad & Morgan, 1994; Hashem & Angle, 1988). Their ability to improve nodule occupancy by phage-resistant strains of rhizobia can be used to address the “Rhizobium-competition problem”, which is considered a major obstacle to increasing legume crop yields through the use of optimized Rhizobium inoculants (Dowling & Broughton, 1986; Triplett & Sadowsky, 1992). Based on previous studies conducted on phages of Bradyrhizobium (Hashem & Angle, 1988; Hashem & Angle, 1990), the Hynes lab has suggested developing a ‘phage cocktail’ that can be used to enhance the nodulation competitiveness of rhizobial inoculants over the indigenous rhizobia, which will be important in agricultural applications.

7.1 Results

7.1.1 Selection of phages and standard plant competition assays

In order to demonstrate the effects of phage on nodulation of Pisum sativum roots by strains of R. leguminosarum, several phages from the Hynes lab collection were selected. All the phage isolates were screened against a variety of previously characterized rhizobial strains along with 29 indigenous rhizobial strains isolated from different agricultural soils associated with legume growth (data not shown). The phage isolates showing the broadest spectrum of activity against the rhizobial strains tested were picked to further evaluate their ability to provide a competitive advantage to a host bacterium resistant to the phage during the nodulation process.

The effect of the selected virulent rhizobiophages on the nodule occupancy of rhizobia was analysed using standard competition assays (Hynes et al., 1988; Hynes & O'Connell, 1990; Oresnik et al., 1999). Each treatment was carried out in triplicate, in pots containing two seeds. Plants were watered periodically with sterile distilled water and harvested after 4 weeks to determine the nodule occupancy. Appropriate controls without any inoculant were also included for comparison. The data obtained were analyzed statistically using Student’s t-test.
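The triplicate design and the t-test mentioned above can be sketched in Python as follows. This is a minimal illustration, assuming an unpaired, two-tailed Student's t-test with pooled variance (the text does not state which variant was used); the nodule counts are hypothetical and are not data from this study. The critical values are the standard tabulated two-tailed values for 4 degrees of freedom.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def student_t(sample_a, sample_b):
    """Unpaired Student's t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical nodule counts per pot (three pots per treatment).
control = [60, 66, 63]
phage_treated = [30, 36, 33]

t_stat, df = student_t(control, phage_treated)

# Tabulated two-tailed critical values of t for df = 4.
T_CRIT_DF4 = {0.05: 2.776, 0.01: 4.604}
significant_at_001 = df == 4 and abs(t_stat) > T_CRIT_DF4[0.01]
print(round(t_stat, 2), df, significant_at_001, round(sem(control), 2))
```

With only three replicates per treatment the test has 4 degrees of freedom, so a large difference between means relative to the within-treatment spread is needed before a result reaches p < 0.01.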

7.1.2 Determining the effective multiplicity of infection (MOI) for phage inoculation

To determine the optimal multiplicity of infection (MOI) for phage inoculation during plant assays, three different rhizobiophages were used at varying dilutions to co-inoculate the Trapper pea seeds with their respective host rhizobial strains. The diluted phage lysates and bacterial cultures were added onto the planted seedlings in the pots.

The pea seeds were co-inoculated with phage P10VF and its rhizobial host, Rlv VF39SM, at four different MOIs (10, 1, 10⁻² and 10⁻⁴) (Figure 7-1 A), and the number of nodules formed by Rlv VF39SM was significantly decreased when the plants were treated with P10VF at MOIs of 10 and 1 (p < 0.01). No difference in nodule formation was observed when plants were treated with phage at lower MOIs, compared to plants inoculated only with Rlv VF39SM. Similarly, when phage P11VFA was used as a co-inoculant in nodulation assays with Rlv VF39SM, a significant reduction in the number of nodules formed was only observed at the highest phage:rhizobial inoculum ratio (MOI = 10⁻², p < 0.01) (Figure 7-1 B). Nodule formation by the strain Rlv 3841 decreased in the presence of phage L338C (at MOI = 10⁻³ and 10⁻⁴), but the observed reductions were not statistically significant. Different MOIs were used for different phages because of the maximum phage titer that could be achieved with each phage. The rhizobial culture was not diluted throughout the experiment, to maintain a constant number of bacterial cells involved in nodulation, and phage lysates were diluted accordingly to obtain the different MOIs. The highest MOI corresponds to the undiluted lysate of each phage.
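The relationship between lysate dilution and MOI described above can be made explicit with a small Python sketch. The titres and volumes below are hypothetical values chosen so that the undiluted lysate gives an MOI of 10; they are not measurements from this study, and the function names are introduced only for illustration.

```python
def moi(phage_titer_pfu_per_ml, phage_volume_ml, cells_cfu_per_ml, cell_volume_ml):
    """MOI = plaque-forming units added / colony-forming units added."""
    pfu = phage_titer_pfu_per_ml * phage_volume_ml
    cfu = cells_cfu_per_ml * cell_volume_ml
    if cfu <= 0:
        raise ValueError("cell number must be positive")
    return pfu / cfu


def dilution_factor_for(target_moi, undiluted_moi):
    """Fold-dilution of the lysate needed to reach target_moi, cells held constant."""
    if target_moi > undiluted_moi:
        raise ValueError("target MOI exceeds what the undiluted lysate can give")
    return undiluted_moi / target_moi


# Hypothetical: 1 ml of lysate at 1e10 PFU/ml added to 1 ml of culture at 1e9 CFU/ml.
undiluted_moi = moi(1e10, 1.0, 1e9, 1.0)

# Fold-dilutions of the lysate needed for each target MOI.
dilutions = {target: dilution_factor_for(target, undiluted_moi)
             for target in (10, 1, 1e-2, 1e-4)}
print(undiluted_moi, dilutions)
```

Because the bacterial inoculum is held constant, the highest achievable MOI is fixed by the titre of the undiluted lysate, which is why the MOI series differs between phages.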

7.1.3 Methods of phage inoculation

Two different methods were used to inoculate the pea seeds with rhizobiophages to determine the most effective method of inoculation for testing phage effect on nodule occupancy. Surface-sterilized pea seeds were soaked with undiluted phage lysates for 12-14 hours at room temperature with slow shaking. The phage-soaked seeds were then dried on a sterile filter paper and germinated on fresh TY agar plates. Another set of surface-sterilized seeds was germinated without any contact with phages. Upon germination, both sets were used in nodulation assays. Pea seeds that were germinated without any contact with phages were inoculated with 1 ml of fresh phage lysate during planting, as required. Two phages were used separately with their respective rhizobial hosts. Both methods of phage inoculation resulted in a reduction in nodule formation by the rhizobial host when P10VF was used with Rlv VF39SM (p < 0.01) (Figure 7-2).


Figure 7-1. Effect of three different phages on nodulation of their respective host Rhizobium strain when co-inoculated with different multiplicities of infection (MOIs)

Rlv VF39SM was infected with different MOIs of P10VF (A) and P11VFA (B). Phage L338C (C) was used to co-inoculate pea seeds with Rlv 3841. The Y-axis represents the number of nodules per pot for each treatment, and each pot contained two pea plants. Values represent means of three independent replicates and error bars indicate standard error of the mean (SEM). A significant difference compared to the control is indicated with an asterisk (Student's t-test, p < 0.01).

A considerable reduction in nodule formation was observed when the seeds were soaked with the phage before germination. However, for Rlv 3841 co-inoculated with L338C, the number of nodules formed did not differ significantly (at the p < 0.05 level) from the number formed with phage-soaked seeds.

7.1.4 Phage effect on nodulation by indigenous rhizobia

The ability of phage isolates to effectively alter the nodule occupancy of indigenous rhizobial strains was determined using four different soil samples obtained from agricultural lands with a history of legume growth. Instead of using a rhizobial strain, 1 ml of the soil suspension was used as the inoculum to nodulate the pea seedlings.

Considering the possible diversity of rhizobia in the soil samples, a phage cocktail of five different rhizobiophages (P10VF, P11VFA, L338C, L338G and P106B) was used. A significant decrease in nodule formation by indigenous rhizobial strains, compared to the controls with no phages, was observed with all four soil inocula (Figure 7-3) (p < 0.05).

7.1.5 Effect of the phage on the nodulation competitiveness of rhizobia

Rhizobiophages are known for their potential ability to alter the nodulation dynamics of competing rhizobial strains by imposing a selective pressure on their susceptible hosts.

In order to determine the effect of phages on nodulation competitiveness of R. leguminosarum, two different strains with varying susceptibility towards the selected rhizobiophages were used. Rlv 248SM exhibited an elevated resistance towards the phages used, compared to Rlv VF39. Rlv 248SM was resistant to phage P11VFA, whereas the infections by P10VF, L338C and L338G resulted in low plaquing efficiencies and opaque plaques (data not shown).


Figure 7-2. Effect of different phage inoculation methods on nodule occupancy by R. leguminosarum bv. viciae VF39SM and 3841

Phage P10VF was used with Rlv VF39SM (*) (significant, Student's t-test, p < 0.05), whereas Rlv 3841 (**) was treated with phage L338C (not significant, Student's t-test, p < 0.05). The Y-axis represents the number of nodules per pot for each treatment and each pot contained two pea plants. Values represent means of three independent replicates and error bars indicate standard error of the mean (SEM). The approximate MOI for phage P10VF was 1, while it was 0.01 for phage L338C.


Figure 7-3. Effect of phage cocktail on nodulation by indigenous rhizobial strains present in four different soil types with histories of legume cultivation

A phage cocktail of 5 different phages (P10VF, P11VFA, L338G, L338C and P106B) was used to co-inoculate pea seeds with 4 different soil suspensions (P11, B2, P12 and S1). The Y-axis represents the number of nodules per pot for each treatment and each pot contained two pea plants. Values represent means of three independent replicates and error bars indicate standard error of the mean (SEM). A significant difference compared to the control is indicated with an asterisk (Student's t-test, p < 0.05).


Figure 7-4. The assembly of Magenta jars for plant assays (A) and nodules formed by R. leguminosarum bv. viciae (Rlv) VF39SM and Rlv 248SM-ϕ2(pFPdegP1) during a nodulation competition assay (B).

Rlv VF39SM and Rlv 248SM-ϕ2(pFPdegP1) were used to co-inoculate pea seeds at a 2:1 (v/v) ratio in the absence of phage. The nodules formed by the rhizobial strain that carries the plasmid containing the reporter gene (Rlv 248SM-ϕ2(pFPdegP1)) become blue upon staining with a solution containing X-Gluc (B).

In contrast, Rlv VF39SM was susceptible to all four of these phages, and infections resulted in clear lysis of bacterial lawns. Furthermore, a strain of Rlv VF39SM carrying the plasmid pFPdegP1 with the gusA reporter gene (Yost lab, unpublished) was available for use in the nodulation competition assays. Figure 7-4B illustrates the nodules formed by Rlv 248SM-ϕ2(pFPdegP1) and wild type Rlv VF39SM during a nodulation assay, where the nodules formed by the pFPdegP1-carrying strain Rlv 248SM-ϕ2(pFPdegP1) have turned blue after staining with an X-Gluc-containing solution. For the nodulation competition assays, pea seedlings were co-inoculated with Rlv VF39(pFPdegP1) and Rlv 248SM in a 2:1 (v/v) ratio. The OD600 of the bacterial broth cultures was adjusted and the cultures were mixed according to the required volume ratio before use for inoculations.

Similarly, pea seedlings were co-inoculated with each phage separately and as a cocktail. Appropriate controls with no phage inoculum were also carried out. After four weeks of incubation, nodules were harvested, cleaned and stained overnight at 30 °C with a solution containing X-Gluc (5-bromo-4-chloro-3-indolyl-β-D-glucuronic acid) with continuous shaking. As illustrated in Figure 7-5, a considerable decrease in the number of blue nodules was observed in all phage-treated plants compared to the control (a significant difference at p < 0.02), indicating reduced nodule occupancy by the susceptible strain in the presence of phages. The decrease in blue nodules was also accompanied by a corresponding increase in the red nodules occupied by the comparatively phage-resistant strain (Rlv 248SM). However, this trend was not observed with the plants treated with phage L338C, where no increase was detected in the number of red nodules formed.


Figure 7-5. Effect of four different rhizobiophages on nodulation competitiveness of two strains of R. leguminosarum bv. viciae (248SM and VF39degP1)

Pea seeds were co-inoculated with Rlv VF39degP1 and Rlv 248SM strains in a 2:1 ratio. Four different phages (P10VF, P11VFA, L338C and L338G) were used for the co-inoculation, individually as well as in a cocktail. The Y-axis represents the number of nodules per pot for each treatment and each pot contained two pea plants. Values represent means of two independent replicates and error bars indicate standard error of the mean (SEM). A significant decrease is indicated with an asterisk (Student's t-test, p < 0.02).

Isolation of a phage-resistant strain of Rlv 248SM

Phages that exhibited the comparatively broadest spectrum of infectivity against the tested rhizobial strains were selected to design two different phage cocktails. The combinations of phages used for these cocktails are listed in Table 2-4. Phage cocktail 1 contained phages L338C, P11VFA, P11VFC, P10VF and B1VFA, while cocktail 2 was composed of C2F3A1, C2F3A2, AF3, V1VFA and V1VFB. Although all phages in cocktail 1, as well as C2F3A1 and C2F3A2 from cocktail 2, are infective against wild type Rlv 248SM, they form very opaque plaques on a wild type Rlv 248SM lawn compared to the clear lysis observed with other rhizobial hosts such as Rlv VF39SM.

To isolate a phage resistant derivative of Rlv 248SM, the wild type Rlv 248SM broth culture was infected with each phage cocktail separately and incubated for a period of 48 hours. Resulting cells were washed and infected repeatedly with the same phage cocktail two more times, before isolating them on TY agar plates. These rhizobial isolates were then screened for their co-resistance to selected phage infections.

When challenged with phage cocktail 1, no resistant variant of Rlv 248SM could be isolated. However, a strain of Rlv 248SM that was relatively resistant to infection by the phages of cocktail 2 was isolated (Rlv 248SM-ϕ2), with lower EOPs compared to the wild type Rlv 248SM (data not shown). The pFPdegP1 plasmid was mobilized into this phage-resistant 248SM strain, giving Rlv 248SM-ϕ2(pFPdegP1) and enabling easy identification of its nodule occupancy.

Nodulation competition assays were performed by co-inoculating pea seeds with Rlv VF39SM and Rlv 248SM-ϕ2(pFPdegP1) in 2:1 and 3:1 (v/v) ratios. The two phage cocktails mentioned earlier (Table 2-4), each containing five phages, were used as the phage inoculum. In the absence of phage, Rlv VF39SM and 248SM-ϕ2(pFPdegP1) occupied a roughly similar number of nodules at a 2:1 inoculation ratio, whereas nodule occupancy by Rlv VF39SM increased at the higher inoculation ratio (3:1) (Figure 7-6). When phage cocktail 1 was used as co-inoculum, nodule occupancy by the susceptible strain dropped significantly (p < 0.05), by ~57.9% at the 2:1 inoculation ratio, and the percentage decrease in nodulation compared to the control was higher when a 3:1 ratio was used (~72.9%). This reduction in red nodule formation corresponded to an increase in the number of nodules formed by the phage-resistant rhizobial strain. The increase in the number of blue nodules formed in the presence of cocktail 1 was ~43.6% and ~73.5% for the 2:1 and 3:1 inoculation ratios, respectively. Similarly, phage cocktail 2 suppressed nodulation by Rlv VF39SM by ~57.9% and ~63.6% at the 2:1 and 3:1 inoculation ratios, respectively. The increase in nodule occupancy by the phage-resistant 248SM-ϕ2(pFPdegP1) strain was considerably higher when cocktail 2 was used with a 2:1 inoculation ratio.
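The percentage decreases and increases quoted above are changes relative to the no-phage control. A minimal Python sketch of that calculation is given here, assuming the percentages were computed from mean nodule counts per treatment; the counts used are hypothetical and are not the values behind Figure 7-6.

```python
def percent_change(control, treated):
    """Signed percentage change of the treated mean relative to the control mean."""
    if control == 0:
        raise ValueError("control mean must be non-zero")
    return 100.0 * (treated - control) / control


# Hypothetical mean nodule counts (no phage vs. phage cocktail).
red_control, red_with_phage = 40.0, 16.0    # red nodules: phage-susceptible strain
blue_control, blue_with_phage = 40.0, 58.0  # blue nodules: gusA-tagged resistant strain

red_change = percent_change(red_control, red_with_phage)     # negative = decrease
blue_change = percent_change(blue_control, blue_with_phage)  # positive = increase
print(red_change, blue_change)
```

A negative value corresponds to suppressed nodulation by the susceptible strain and a positive value to increased occupancy by the resistant strain; the two need not be equal in magnitude, because the total number of nodules per plant can also change.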


Figure 7-6. Effect of two different phage cocktails on nodulation competitiveness of two strains of R. leguminosarum bv. viciae (VF39SM and 248SM-ϕ2(pFPdegP1))

Pea seeds were co-inoculated with Rlv VF39SM and Rlv 248SM-ϕ2(pFPdegP1) strains in 2:1 and 3:1 ratios. Two different phage cocktails were also used for the co-inoculation. The Y-axis represents the number of nodules per plant for each treatment. Phage cocktail 1 contained L338C, P11VFA, P11VFC, P10VF and B1VFA, whereas cocktail 2 contained C2F3A1, C2F3A2, AF3, V1VFA and V1VFB. Values represent means of three independent replicates and error bars indicate standard error of the mean (SEM). A significant decrease/increase compared to its appropriate control is indicated with an asterisk (Student's t-test, p < 0.05).


Figure 7-7. Effect of two different phage cocktails on nodulation competitiveness of indigenous rhizobia present in three different legume soil samples against R. leguminosarum bv. viciae 248SM-ϕ2(pFPdegP1)

Pea seeds were co-inoculated with soil suspensions and the Rlv 248SM-ϕ2(pFPdegP1) strain in a 1:1 ratio (v/v). Three different soil samples, Cp1 (A), V1 (B) and P12 (C), were used. Two different phage cocktails were also used for the co-inoculation. The Y-axis represents the number of nodules per pot for each treatment and each pot contained two pea plants. Phage cocktail 1 contained L338C, P11VFA, P11VFC, P10VF and B1VFA, whereas cocktail 2 contained C2F3A1, C2F3A2, AF3, V1VFA and V1VFB. Values represent means of three independent replicates and error bars indicate standard error of the mean (SEM).

In addition, indigenous rhizobia present in three different soil samples were used as co-inoculum with the phage-resistant 248SM-ϕ2(pFPdegP1) strain to determine the competition for nodule occupancy. However, the impact of rhizobiophages on nodule occupancy by indigenous rhizobia could not be evaluated, as the 248SM-ϕ2(pFPdegP1) strain consistently outcompeted the indigenous rhizobia present in the soil suspensions throughout the experiment (Figure 7-7).

7.2 Discussion

The potential significance of rhizobiophages in altering the ecology of various rhizobial populations in the rhizosphere environment has been widely appreciated (Hashem & Angle, 1988; Mendum et al., 2001; Werquin et al., 1988). Apart from having a pivotal role in rhizobial evolution by exerting a positive selective pressure for resistant strains, the presence of rhizobiophages can also affect legume-Rhizobium symbioses substantially by causing significant changes in the relative numbers of resistant and susceptible rhizobial strains in soil (Ahmad & Morgan, 1994; Hashem & Angle, 1988).

The influence of phages on nodule occupancy and nitrogen fixation has been demonstrated using Bradyrhizobium japonicum strains (Hashem & Angle, 1990).

Greenhouse experiments have demonstrated that the presence of phage can decrease the survival of the susceptible B. japonicum strain USDA 117 in the rhizosphere environment, and that nodule number, nitrogenase activity and plant dry weight were significantly reduced when the plants were inoculated only with the sensitive rhizobial strain and phage (Hashem & Angle, 1988). Hashem and Angle (1990) have shown that the co-inoculation of two strains of B. japonicum with rhizobiophages into the soil can cause a significant reduction of nodule occupancy by the phage-sensitive strain while allowing the resistant bradyrhizobia to occupy the greater number of nodules. The ability of rhizobiophages to selectively reduce the undesirable indigenous rhizobial populations within the rhizosphere environment has been regarded as a promising solution for alleviating the Rhizobium competition problem (Dowling & Broughton, 1986; Triplett & Sadowsky, 1992) in inoculant technology. The results obtained with strains of bradyrhizobia reinforce the concept of enhancing the nodule occupancy of superior rhizobial inoculants by using a mixture of phages to which the indigenous rhizobia may be susceptible, but the inoculants are resistant. However, very few studies have been done to determine the phage effect on nodulation by R. leguminosarum strains that infect peas or lentils.

The phage isolates that exhibited the broadest host specificity against the tested rhizobial strains were chosen from the Hynes lab collection to further evaluate their effect on nodule occupancy of R. leguminosarum using standard nodule competition assays.

Multiplicity of infection (MOI), or the number of phage particles present per host cell, is an important factor in determining the success of infection by a phage. To achieve the maximum efficiency of phage infection during plant assays, the optimal MOI for phage inoculation was determined. Throughout this experiment, the approximate number of rhizobial cells introduced into the rhizosphere was kept constant, whereas the phage lysates were diluted appropriately. A significant decrease in the number of nodules formed by the host rhizobia was observed only at the highest MOIs used for each phage. Nodulation increased with decreasing MOI, indicating that the number of phages involved plays an essential role in determining the effect of rhizobiophages on nodule occupancy by susceptible rhizobia. Moreover, the maximum phage effect on nodulation with both P10VF and P11VFA was observed when undiluted phage lysates were used.

For a phage inoculant to be successful in providing a competitive advantage to the rhizobial inoculants used in agriculture, its ability to selectively and negatively influence nodulation by indigenous rhizobia is essential. To accomplish this, we screened our phage isolates against a variety of indigenous rhizobia and selected the ones with the broadest spectra of infectivity. Notwithstanding the ability of the selected rhizobiophages to infect the majority of rhizobial strains tested, it is crucial to determine the competence of these phages in affecting the population size and nodule occupancy of the indigenous rhizobial strains within the rhizosphere environment. Therefore, nodulation assays were conducted in which soil suspensions, obtained from soil samples with a history of legume growth, replaced the rhizobial strains as bacterial inoculum. Pea seedlings were treated with these soil suspensions and a phage cocktail of five different rhizobiophages (P10VF, P11VFA, L338C, L338G and P106B) was used for co-inoculation. This phage cocktail significantly suppressed nodulation by indigenous rhizobia with all four soil samples (Figure 7-3).

The effect of phage on the competition for nodule occupancy by two or more rhizobial strains was determined using reporter gene technology. The use of reporter genes in studying the nodule occupancy of competing rhizobial strains is common and enables rapid screening of nodule occupancy (Wilson et al., 1995a; Wilson et al., 1999). In this study, rhizobial strains with the gusA gene were used to differentiate nodulation between competing strains. This gene encodes β-glucuronidase (GUS), which reacts with its substrate X-Gluc to form a blue precipitate, resulting in blue nodules. For the nodulation competition assays with Rlv VF39(pFPdegP1) and Rlv 248SM, the competing rhizobial cultures were co-inoculated in a 2:1 volume ratio. Since it was observed during preliminary studies (data not shown) that strain 248SM is naturally more competitive and outcompetes VF39 for nodule occupancy, VF39 was provided with a competitive advantage by increasing its population size in plant assays when determining the phage effect. Nevertheless, the phage inoculum was successful in significantly reducing nodule occupancy by the susceptible rhizobial strain VF39, allowing a rise in the number of nodules occupied by the phage-resistant strain. This was further demonstrated with Rlv VF39SM and Rlv 248SM-ϕ2(pFPdegP1). The rhizobiophages were capable of increasing nodule occupancy by phage-resistant rhizobia, despite the competitive advantage given to the phage-sensitive strain through its higher population size. These results agree with previous observations that phages of B. japonicum influence the nodulation of competing rhizobial strains in the soil environment (Hashem & Angle, 1990).

To further evaluate the phage effect on nodule occupancy of competing rhizobia, we used indigenous rhizobia present in three different soil samples as a co-inoculum with the phage-resistant 248SM-ϕ2(pFPdegP1) strain. No conclusive observation could be made during this experiment, as the 248SM-ϕ2(pFPdegP1) strain was strongly competitive against the indigenous rhizobia present in the soil samples, outcompeting them for nodule occupancy regardless of the presence of phages.

Chapter Eight: General discussion and future directions

8.1 General discussion

The genetic diversity hidden among the global phage population is considered to be remarkable (Hatfull, 2008), with the majority of the gene products of sequenced genomes exhibiting no sequence similarity to the characterized gene products present in the available public databases. Despite their diversity and abundance in the rhizosphere, and their contribution to the population dynamics and evolution of their hosts, rhizobiophages had been little studied until recently. This study was undertaken in an attempt to advance our understanding of rhizobiophage diversity in the environment, which could be beneficial in elucidating their precise role in an ecological context.

The Hynes lab collection of rhizobiophages contains more than 50 phages isolated from different soil samples obtained from Alberta, Saskatchewan, Ontario and British Columbia, Canada. We have used a variety of rhizobial strains, such as Rhizobium leguminosarum bv. viciae (Rlv) 3841, Rlv VF39, Rlv 248, R. leguminosarum F1, R. gallicum S014B-4 and Mesorhizobium loti strains R7A and R7ANS, to trap rhizobiophages from soil samples that have histories of legume cultivation.

Rhizobiophages L338C and P10VF have been characterized previously, including the complete genome sequence of L338C (Restrepo, 2012).

All the phages that were examined by electron microscopy in this study belong to the order Caudovirales, which consists of the tailed phages. This is not surprising, as the characterized viral community is highly dominated by tailed phages. Furthermore, morphological diversity could be identified among the characterized rhizobiophage isolates, which belong to all three families described in this order (Table 1-1).

Factors such as host range of the phage, ability to lysogenize their rhizobial host, morphological diversity and ease of isolating phage DNA were considered in selecting phages for further characterization. Especially when selecting phages of R. leguminosarum, their broader spectrum of infectivity was taken into account. The proposed phage-based technology that can be used to improve the competitive success of the rhizobial inoculum in nodulation requires phages with the broadest possible host range.

Five rhizobiophage genomes were sequenced and completed using different sequencing technologies, including classical Sanger sequencing with sub-clone generation, 454 pyrosequencing, Ion Torrent, and Illumina. Table 8-2 summarizes the sequencing methods and general features of the rhizobiophage genomes completed in this study.

Apart from the 5 genomes completed in this study, there are 15 rhizobiophage genomes available in the NCBI public database (Table 8-3). R. etli phages RHEph01, RHEph02, RHEph03, RHEph08, and RHEph09 are similar to T7-like phages (Santamaria et al., 2014), and the phylogenetic relationship established using the amino acid sequence similarity of the terminase large subunits of 94 phages clustered them with T7-like phages (Figure 8-1). Mesorhizobium loti phages Lo5R7ANS and Cp1R7ANS-C2 were also identified as T7-like phages based on their genome functions and architecture. However, neither of these phages had an ORF that could be conclusively identified as coding for a putative terminase large subunit. Therefore, they could not be included in the phylogenetic representation.

Table 8-1. Summary of the morphotypes of phages isolated with different trapping hosts in this study

Rhizobial host                                  Phage         Morphotype
Rhizobium leguminosarum bv. viciae (Rlv) VF39   P11VFA        Siphoviridae
                                                P11VFC        Siphoviridae
                                                B1VFA         Siphoviridae
                                                P9VFCI        Myoviridae
                                                V1VFA         Myoviridae
                                                V1VFB         Myoviridae
Rlv 3841                                        L338H         Siphoviridae
R. leguminosarum F3                             AF3           Myoviridae
R. leguminosarum F1                             PPF1          Myoviridae
R. gallicum S014B-4                             P106A         Siphoviridae
                                                P106B         Siphoviridae
                                                P106CI        Siphoviridae
Mesorhizobium loti R7ANS                        Cp1R7ANS-C2   Podoviridae
                                                Cp1R7ANS-D2   Podoviridae
                                                Lo5R7ANS      Podoviridae


Table 8-2. Summary of the five rhizobiophage genomes completed in this study

Phage name                      P10VF                   P106B                 Lo5R7ANS     Cp1R7ANS-C2   PPF1
Sequencing technology used      Ion Torrent & Illumina  Sanger & Ion Torrent  Ion Torrent  Ion Torrent   454 pyrosequencing
Average coverage                66.81X                  83.39X                60.51X       60.5X         34X
Genome size (bp)                156,446                 56,024                45,718       43,865        54,506
No. of predicted ORFs           257                     95                    63           67            94
Distribution of ORFs in the genome (%)
  Functional assignments        18.29                   23.1                  30.1         23.9          25.5
  Hypothetical proteins         22.96                   27.4                  52.4         61.2          48.9
  No significant similarity     58.75                   49.5                  17.5         14.9          25.5
% of genome coding              94.1                    93.6                  91           89.2          95.4

Table 8-3. Phages that infect rhizobial hosts for which the complete genome has been reported (including the ones completed in this study)

                                               Genome characteristics
Phage name   Rhizobial host                    Size (bp)  Average G+C%  No. of predicted ORFs  No. of predicted tRNA genes  Genome accession no.
Phage 16-3   Sinorhizobium meliloti Rm41       60,195     59.1%         110                    -                            NC_011103
PBC5         S. meliloti 1021                  57,416     61.5%         83                     -                            NC_003324
RHEph01      Rhizobium etli Bra5 t2            43,444     59.2%         56                     1                            JX483873
RHEph02      R. etli Bra5 t4                   46,486     49.4%         60                     -                            JX483874
RHEph03      R. etli Bra5 t6                   45,912     49.3%         59                     -                            JX483875
RHEph04      R. etli GR56 t1                   53,018     56.4%         81                     -                            JX483876
RHEph05      R. etli GR56 t2                   50,426     56.4%         75                     -                            JX483877
RHEph06      R. etli GR56 t4                   53,721     56.4%         82                     -                            JX483878
RHEph08      R. etli Kim5 t5                   43,619     49.4%         59                     -                            JX483879
RHEph09      R. etli Kim5 t6                   45,962     49.3%         61                     -                            JX483880
RHEph10      R. etli CE3 t4                    115,042    60.3%         171                    1                            JX483881
ϕM12         S. meliloti 1021                  194,701    49.0%         376                    10                           KF381361
L338C        R. leguminosarum bv. viciae 3841  109,558    59.0%         185                    -                            KF614509
P10VF        R. leguminosarum bv. viciae VF39  156,446    49.9%         257                    -                            KM199770
P106B        R. gallicum S014B-4               56,024     47.9%         95                     1                            KF977490 (6)
PPF1         R. leguminosarum F1               54,506     61.9%         94                     -                            KJ746502
Lo5R7ANS     Mesorhizobium loti R7ANS          45,718     61.1%         63                     1                            KM199771
Cp1R7ANS-C2  M. loti R7ANS                     43,865     60.5%         67                     -                            -
RR1-A        R. radiobacter P007               53,102     57.2%         68                     1                            NC_021560
RR1-B        R. radiobacter P007               37,378     58.3%         52                     -                            NC_021557


The neighbor-joining tree generated with amino acid sequences of 25 phage putative major capsid proteins clustered these mesophages with previously identified T7-like rhizobiophages (Figure 8-2).

Phylogenetic clustering using the terminase large subunit as a molecular marker placed our phage P10VF among a group of T4-like phages (Figure 8-1). According to the phylogenetic tree, the closest relative of P10VF is the S. meliloti T4-like superfamily phage ϕM12 (Brewer et al., 2014). The clustering of P10VF with T4-like phages was also observed consistently in the capsid protein-based phylogeny. The two siphophages L338C and P106B were more closely related to each other than to any other studied rhizobiophage, as they were grouped together in the terminase-based phylogeny.
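The notion of "closest relative" in a distance-based phylogeny can be illustrated with a small Python sketch that computes pairwise p-distances (the proportion of differing residues) from aligned amino acid sequences and reports the least divergent pair. This is only an illustration of the idea: it is not a re-implementation of the ClustalX neighbor-joining procedure used for Figures 8-1 and 8-2, and the aligned fragments and their labels are hypothetical placeholders rather than real terminase sequences.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing residues between two aligned sequences, skipping gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == "-" or res_b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if res_a != res_b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def closest_pair(alignment):
    """Return (distance, name_a, name_b) for the least divergent pair in the alignment."""
    return min(
        (p_distance(alignment[a], alignment[b]), a, b)
        for a, b in combinations(sorted(alignment), 2)
    )


# Hypothetical aligned fragments; labels are placeholders, not real phage proteins.
toy_alignment = {
    "phage_A": "MKTLAEQW-RS",
    "phage_B": "MKTLAEQWDRS",
    "phage_C": "MRSLGEHW-KS",
}

best = closest_pair(toy_alignment)
print(best)
```

A neighbor-joining tree is built from a full matrix of such pairwise distances, so sequences separated by the smallest distances tend to be joined first and appear as sister taxa.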

Aside from the morphological, proteomic, and genomic characterization of the selected rhizobiophages, this study also examined the use of rhizobiophages as a tool for mitigating the frequently encountered Rhizobium-competition problem in rhizobial inoculant technology. Two cocktails of 5 selected rhizobiophages of R. leguminosarum with the broadest possible host range were able to provide a competitive advantage to a phage-resistant rhizobial strain, allowing it to occupy the majority of the nodules formed under controlled environmental conditions. However, the success of such techniques in the natural environment relies heavily on the physicochemical and biological parameters of the ecosystem into which the phages are introduced. Therefore, the phage effect on the nodulation of phage-resistant and phage-susceptible rhizobial strains should be evaluated with field trials.


Figure 8-1. Neighbor-joining phylogenetic tree of terminase large subunits of 94 phages

A phylogenetic tree was constructed based on the sequence similarity of terminase large subunits at the amino acid level. Multiple sequence alignment and generation of the neighbor-joining tree were performed using ClustalX v.2.1 (Larkin et al., 2007), while the tree was modified with MEGA 5.1 (Tamura et al., 2011). The bootstrap values are indicated for 1000 trials, and terminases of rhizobiophages are shown with a red diamond.

Figure 8-2. Neighbor-joining phylogenetic tree of major capsid proteins of 25 phages

A phylogenetic tree was constructed based on the sequence similarity of major capsid proteins at the amino acid level. Multiple sequence alignment and generation of the neighbor-joining tree were performed using ClustalX v.2.1 (Larkin et al., 2007), while the tree was modified with MEGA 5.1 (Tamura et al., 2011). The bootstrap values are indicated for 1000 trials, and capsid proteins of rhizobiophages are shown with a red diamond.
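The bootstrap values reported on both trees come from resampling the columns of the multiple sequence alignment with replacement and rebuilding the tree for each replicate. The following Python sketch illustrates only that resampling idea, under simplifying assumptions: it uses a plain p-distance and asks how often a given pair of sequences remains the closest pair, rather than building a full neighbor-joining tree. It is not the ClustalX implementation, and the sequence names and residues are hypothetical.

```python
import random


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by sampling alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Return the pair of sequence names with the smallest p-distance."""
    names = sorted(alignment)
    best = None
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            d = p_distance(alignment[first], alignment[second])
            if best is None or d < best[0]:
                best = (d, first, second)
    return best[1], best[2]


def pair_support(alignment, pair, trials=1000, seed=0):
    """Percentage of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_alignment(alignment, rng)) == pair
        for _ in range(trials)
    )
    return 100.0 * hits / trials


# Hypothetical aligned protein fragments, for illustration only.
toy_alignment = {
    "phage_A": "MKTLLAGQ",
    "phage_B": "MKTLLSGQ",
    "phage_C": "MRSVLAGH",
}
print(pair_support(toy_alignment, ("phage_A", "phage_B"), trials=1000))
```

A support value close to 100 indicates that the grouping is insensitive to which alignment columns are sampled, which is how the bootstrap percentages on the tree branches are usually read.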

8.2 Future directions

The methods optimized previously (Restrepo, 2012) and in this study can be used to isolate and characterize novel rhizobiophages from a variety of soil samples using different rhizobial trapping hosts. However, our attempts to isolate a transducing phage that can be used as a tool in molecular biological techniques were mostly unsuccessful, especially with Rhizobium gallicum and Mesorhizobium loti hosts. The two sequenced genomes of mesophages, Lo5R7ANS and Cp1R7ANS-C2, contain integrase-coding genes, which suggests their ability to enter the lysogenic life cycle. Therefore, it is possible that these phages have the ability to mediate specialized transduction. The transduction frequencies of these phages could be studied using different mutant strains, and their ability to lysogenize host cells should be examined.

The genomes of phages P9VFCI and V1VFA have been sequenced using Ion Torrent technology, while both Ion Torrent and Illumina sequencing technologies have been used to sequence the genomes of AF3 and P11VFA. The assemblies of data obtained from both methods resulted in 15 contigs larger than 500 bp (N50 = 70,970 bp) for P11VFA. Similarly, assemblies for AF3 resulted in 12 contigs (> 500 bp) with an N50 of 29,185 bp. These genomes could be completed using a few primer-walking steps. It would be especially interesting to see the completed genome architecture of phages P9VFCI and AF3, as both these Myoviridae phage genomes were resistant to digestion by a majority of the restriction enzymes tested. Based on the preliminary annotations of the contigs of AF3, it can be suggested that AF3 is most probably a T4-like phage. This is not surprising, considering its morphotype, genome size and the remarkable ability of its DNA to resist digestion by different restriction enzymes.

It has been shown in Chapter six that the predicted att site of the temperate phage PPF1 shares sequence similarity with the targeted att site of Sinorhizobium meliloti phage 16-3, with both att sites containing a ~50 bp region from the 3’ end of a putative tRNA-Pro gene. The integrative elements of the phage 16-3 recombination system have been characterized thoroughly. The recombination of 16-3 requires the products of two genes: int and xis (Semsey et al., 1999). An expression vector containing the int gene with the attP region was capable of readily integrating into the S. meliloti host chromosome, and integration also occurred when the Int protein was provided in trans to attP. Similarly, the necessity of the Xis protein for the recombination was determined by the reappearance of the attB site in the presence of the xis gene provided using an expression vector. However, it was also shown that the integrative system of 16-3 failed to function in E. coli, indicating the requirement of a host factor. Furthermore, the integration successfully took place in several other closely related bacterial species, including Rhizobium leguminosarum, Bradyrhizobium japonicum and Agrobacterium tumefaciens (Semsey et al., 2002). Although a putative integrase-encoding gene was identified in the PPF1 genome based on sequence similarities, the involvement of the gene product in recombination should be confirmed by experiments similar to those described above. The requirement of any other PPF1 gene for recombination could also be identified using random mutagenesis. Since phage PPF1 integration occurs within a putative tRNA gene, it could be assumed that the phage can target a range of bacterial species due to the conserved nature of its targeted tRNA gene. During this study, it was observed that although PPF1 failed to form visible plaques on a lawn of R. leguminosarum VF39, it is capable of lysogenizing the same strain. Despite its extremely low ability to form visible plaques with rhizobial hosts, PPF1 may be capable of lysogenizing a large variety of other hosts. Therefore, further characterization of the phage PPF1 integrative recombination system to identify the essential components for recombination, as well as determination of its host range for lysogenization, would be interesting.
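When the host range for lysogenization is explored, a first screen could compare the phage attP region with the 3’ end of the tRNA-Pro gene of each candidate host to see whether a shared core sequence is present. The Python sketch below finds the longest exactly matching stretch between two sequences by dynamic programming. It is a simplified illustration: the sequences are hypothetical placeholders, not the PPF1 or 16-3 att sites, and a real comparison would tolerate mismatches (for example with a BLAST-style alignment).

```python
def longest_common_substring(a, b):
    """Return the longest stretch of identical consecutive bases shared by a and b."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match within `a`
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous base of both sequences.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


# Hypothetical sequences, for illustration only.
phage_attP_region = "GGGTTCAAATCC"
host_trna_3prime_end = "AATTCAAATG"
core = longest_common_substring(phage_attP_region, host_trna_3prime_end)
print(core, len(core))
```

A shared core of a length comparable to the ~50 bp region noted above would mark a host as a candidate for lysogenization tests, although sequence identity alone does not show that integration occurs.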

In Chapter seven, it was shown that the phage-resistant Rlv 248SM-ϕ2(pFPdegP1) strain became more competitive against the co-inoculated phage-sensitive Rlv VF39SM in the presence of a phage cocktail. However, we could not draw the same conclusion for indigenous rhizobia, due to the superior competitiveness of Rlv 248SM-ϕ2(pFPdegP1) in nodule occupancy regardless of the presence of phage cocktails. This experiment should be repeated with higher v/v ratios of soil suspension to Rlv 248SM-ϕ2(pFPdegP1), giving the indigenous strains initial competitive dominance, so that any subsequent change in nodule occupancy by Rlv 248SM-ϕ2(pFPdegP1) in the presence of phage cocktails can be observed. Furthermore, field trials should be carried out to determine the phage effect on nodulation by the phage-resistant strain under natural conditions. Additionally, determining the persistence of phages in the field over several consecutive growing seasons could also be helpful.

It would also be intriguing to analyze the phage-resistant variant of Rlv 248SM to understand the mechanism of phage resistance in that strain. Studies have revealed that resistance to ϕM12 and N3 phage infections in S. meliloti is conferred by different mutations within the gene coding for the essential porin RopA1 (Crook et al., 2013). Rlv 248SM has two copies of ropA genes within its genome, and PCR amplification with subsequent sequencing of the ropA genes of Rlv 248SM-ϕ2(pFPdegP1) would show whether any notable mutations are present in a ropA gene that could be responsible for its phage-resistant phenotype. Alternatively, the strain Rlv 248SM-ϕ2(pFPdegP1) could be screened for the presence of prophages in the genome that provide superinfection immunity to their lysogenized host. Southern hybridizations of genomic DNA of Rlv 248SM-ϕ2(pFPdegP1) with individual phage genomic probes would reveal its lysogenized state, if any.
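Once the ropA amplicons have been sequenced, the comparison with the parental sequence amounts to listing the substitutions and the resulting amino acid changes. The Python sketch below does this for two coding sequences that are assumed to be already aligned, of equal length, in frame and free of indels. The sequences shown are hypothetical and are not real ropA sequences.

```python
from itertools import product

# Standard genetic code, built in TCAG order (codon assignments are the same
# in the bacterial translation table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def point_mutations(reference, sample):
    """List substitutions between two in-frame coding sequences of equal length.

    Each entry is (1-based position, reference base, sample base,
    reference amino acid, sample amino acid).
    """
    reference = reference.upper()
    sample = sample.upper()
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    changes = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference, sample)):
        if ref_base == alt_base:
            continue
        start = pos - pos % 3  # first base of the codon containing this position
        ref_aa = CODON_TABLE.get(reference[start:start + 3], "?")
        alt_aa = CODON_TABLE.get(sample[start:start + 3], "?")
        changes.append((pos + 1, ref_base, alt_base, ref_aa, alt_aa))
    return changes


# Hypothetical coding sequences, for illustration only.
parent_ropA = "ATGGCTAAA"
resistant_ropA = "ATGGTTAAA"
for change in point_mutations(parent_ropA, resistant_ropA):
    print(change)
```

Substitutions that change the amino acid would be the first candidates to test, for example by complementation with the parental ropA allele.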

References

Abedon, S. (2008). Bacteriophage ecology: Population growth, evolution, and impact of bacterial viruses. Cambridge University Press.

Ackerman, H. W. (2009). Phage classification and characterization. In Bacteriophages: Methods and protocols, pp. 127-139. Edited by M. R. J. Clokie & A. M. Kropinski: Humana Press.

Ackermann, H. W. (2009). Phage classification and characterization. Methods Mol Biol 501, 127-140.

Ackermann, H. W. & Prangishvili, D. (2012). Prokaryote viruses studied by electron microscopy. Archives of Virology 157, 1843-1849.

Adams, M. H. (1959). Bacteriophages. New York: Interscience Publishers, Inc.

Adriaenssens, E. M., Van Vaerenbergh, J., Vandenheuvel, D. & other authors (2012). T4-related bacteriophage LIMEstone isolates for the control of soft rot on potato caused by 'Dickeya solani'. PLoS One 7, e33227.

Ahmad, M. H. & Morgan, V. (1994). Characterization of a cowpea (Vigna unguiculata) rhizobiophage and its effect on cowpea nodulation and growth. Biol Fert Soils 18, 297-301.

Aksyuk, A. A., Kurochkina, L. P., Fokine, A., Forouhar, F., Mesyanzhinov, V. V., Tong, L. & Rossmann, M. G. (2011). Structural conservation of the Myoviridae phage tail sheath protein fold. Structure 19, 1885-1894.

Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J. H., Zhang, Z., Miller, W. & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.

Amarger, N., Macheret, V. & Laguerre, G. (1997). Rhizobium gallicum sp. nov. and Rhizobium giardinii sp. nov., from Phaseolus vulgaris nodules. Int J Syst Bacteriol 47, 996-1006.

Amarger, N. (2002). Genetically modified bacteria in agriculture. Biochimie 84, 1061-1072.

Appunu, C. & Dhar, B. (2006). Phage typing of indigenous soybean-rhizobia and relationship of a phage group strains for their asymbiotic and symbiotic nitrogen fixation. Indian J Exp Biol 44, 1006-1011.

Ashelford, K. E., Day, M. J. & Fry, J. C. (2003). Elevated abundance of bacteriophage infecting bacteria in soil. Appl Environ Microb 69, 285-289.

Atkins, G. J. (1973). Some bacteriophages active against Rhizobium trifolii strain W19. J Virol 12, 149-156.

Aziz, R. K., Bartels, D., Best, A. A. & other authors (2008). The RAST server: Rapid annotations using subsystems technology. BMC Genomics 9.

Bailly-Bechet, M., Vergassola, M. & Rocha, E. (2007). Causes for the intriguing presence of tRNAs in phages. Genome Research 17, 1486-1495.

Baldani, J. I., Weaver, R. W., Hynes, M. F. & Eardly, B. D. (1992). Utilization of carbon substrates, electrophoretic enzyme patterns, and symbiotic performance of plasmid-cured clover rhizobia. Appl Environ Microb 58, 2308-2314.

Barnet, Y. M. (1979). Properties of Rhizobium trifolii isolates surviving exposure to specific bacteriophage. Can J Microbiol 25, 979-986.

Becker, A., Kleickmann, A., Küster, H., Keller, M., Arnold, W. & Pühler, A. (1993). Analysis of the Rhizobium meliloti genes exoU, exoV, exoW, exoT, and exoI involved in exopolysaccharide biosynthesis and nodule invasion: exoU and exoW probably encode glucosyltransferases. Molec Plant-Microbe Interact 6, 735-744.

Belfort, M. & Roberts, R. J. (1997). Homing endonucleases: keeping the house in order. Nucleic Acids Res 25, 3379-3388.

Benson, N. R. & Roth, J. (1997). A Salmonella phage-P22 mutant defective in abortive transduction. Genetics 145, 17-27.

Beringer, J. E. (1974). R Factor Transfer in Rhizobium leguminosarum. J Gen Microbiol 84, 188-198.

Black, L. W. (1989). DNA packaging in dsDNA bacteriophages. Ann Rev Microbiol 43, 267-292.

Bockman, O. C. (1997). Fertilizers and biological nitrogen fixation as sources of plant nutrients: Perspectives for future agriculture. Plant Soil 194, 11-14.

Bohlool, B. B., Ladha, J. K., Garrity, D. P. & George, T. (1992). Biological nitrogen fixation for sustainable agriculture - A Perspective. Plant Soil 141, 1-11.

Borysowski, J., Weber-Dabrowska, B. & Gorski, A. (2006). Bacteriophage endolysins as a novel class of antibacterial agents. Exp Biol Med 231, 366-377.

Boule, J., Sholberg, P. L., Lehman, S. M., O'Gorman, D. T. & Svircev, A. M. (2011). Isolation and characterization of eight bacteriophages infecting Erwinia amylovora and their potential as biological control agents in British Columbia, Canada. Can J Plant Pathol 33, 308-317.

Bouvier, T. & del Giorgio, P. A. (2007). Key role of selective viral-induced mortality in determining marine bacterial community composition. Environ Microbiol 9, 287-297.

Bradley, D. E. (1967). Ultrastructure of bacteriophage and bacteriocins. Bacteriol Rev 31, 230-314.

Brewer, T. E., Elizabeth Stroupe, M. & Jones, K. M. (2014). The genome, proteome and phylogenetic analysis of Sinorhizobium meliloti phage PhiM12, the founder of a new group of T4-superfamily phages. Virology 450-451, 84-97.

Brewin, N. J., Wood, E. A., Johnston, A. W. B., Dibb, N. J. & Hombrecher, G. (1982). Recombinant nodulation plasmids in Rhizobium leguminosarum. J Gen Microbiol 128, 1817-1827.

Brüssow, H. & Desiere, F. (2001). Comparative phage genomics and the evolution of Siphoviridae: Insights from dairy phages. Mol Microbiol 39, 213-222.

Brüssow, H. & Hendrix, R. W. (2002). Phage genomics: Small is beautiful. Cell 108, 13-16.

Brüssow, H., Canchaya, C. & Hardt, W. D. (2004). Phages and the evolution of bacterial pathogens: From genomic rearrangements to lysogenic conversion. Microbiol Mol Biol R 68, 560-+.

Buchanan-Wollaston, V. (1979). Generalized transduction in Rhizobium leguminosarum. J Gen Microbiol 112, 135-142.

Campbell, A. (2003). The future of bacteriophage biology. Nat Rev Genet 4, 471-477.

Canchaya, C., Proux, C., Fournous, G., Bruttin, A. & Brussow, H. (2003). Prophage genomics. Microbiol Mol Biol R 67, 238-275.

Canchaya, C., Fournous, G. & Brussow, H. (2004). The impact of prophages on bacterial chromosomes. Mol Microbiol 53, 9-18.

Carlton, R. M. (1999). Phage therapy: past history and future prospects. Arch Immunol Ther Exp (Warsz) 47, 267-274.

Casadesus, J. & Olivares, J. (1979). General transduction in Rhizobium meliloti by a thermosensitive mutant of bacteriophage-Df2. J Bacteriol 139, 316-317.

Casas, V. & Rohwer, F. (2007). Phage metagenomics. Advanced Bacterial Genetics: Use of Transposons and Phage for Genomic Engineering 421, 259-268.

Casjens, S. R. (2005). Comparative genomics and evolution of the tailed-bacteriophages. Curr Opin Microbiol 8, 451-458.

Casjens, S. R. (2011). The DNA-packaging nanomotor of tailed bacteriophages. Nat Rev Microbiol 9, 647-657.

Casjens, S. R. & Thuman-Commike, P. A. (2011). Evolution of mosaically related tailed bacteriophage genomes seen through the lens of phage P22 virion assembly. Virology 411, 393-415.

Chang, C. Y., Kemp, P. & Molineux, I. J. (2010). Gp15 and gp16 cooperate in translocating bacteriophage T7 DNA into the infected cell. Virology 398, 176-186.

Chen, Z. H. & Schneider, T. D. (2005). Information theory based T7-like promoter models: classification of bacteriophages and differential evolution of promoters and their polymerases. Nucleic Acids Res 33, 6172-6187.

Cheng, Q. (2008). Perspectives in biological nitrogen fixation research. J Integr Plant Biol 50, 786-798.

Chevalier, B. S. & Stoddard, B. L. (2001). Homing endonucleases: structural and functional insight into the catalysts of intron/intein mobility. Nucleic Acids Res 29, 3757-3774.

Chevreux, B., Wetter, T. & Suhai, S. (1999). Genome sequence assembly using trace signals and additional sequence information. In Computer Science and Biology: Proceedings of the German Conference on Bioinformatics (GCB), pp. 45-56.

Chibani-Chennoufi, S., Bruttin, A., Dillmann, M. L. & Brussow, H. (2004). Phage-host interaction: an ecological perspective. J Bacteriol 186, 3677-3686.

Cornelissen, A., Ceyssens, P. J., T'Syen, J., Van Praet, H., Noben, J. P., Shaburova, O. V., Krylov, V. N., Volckaert, G. & Lavigne, R. (2011). The T7-related Pseudomonas putida phage phi 15 displays virion-associated biofilm degradation properties. PLoS One 6.

Cornelissen, A., Ceyssens, P. J., Krylov, V. N., Noben, J. P., Volckaert, G. & Lavigne, R. (2012). Identification of EPS-degrading activity within the tail spikes of the novel Pseudomonas putida phage AF. Virology 434, 251-256.

Crook, M. B., Draper, A. L., Guillory, R. J. & Griffitts, J. S. (2013). The Sinorhizobium meliloti essential porin RopA1 is a target for numerous bacteriophages. J Bacteriol 195, 3663-3671.

Crop profile for chickpea in Canada (March 2008). Prepared by Pesticide Risk Reduction Program, Pest Management Centre, Agriculture and Agri-Food Canada. Available at http://www.agr.gc.ca/eng/?id=1299248507258 (Accessed: 23 August 2008).

Darling, A. C., Mau, B., Blattner, F. R. & Perna, N. T. (2004). Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14, 1394-1403.

Deák, V., Lukács, R., Buzás, Z., Pálvolgyi, A., Papp, P. P., Orosz, L. & Putnoky, P. (2010). Identification of tail genes in the temperate phage 16-3 of Sinorhizobium meliloti 41. J Bacteriol 192, 1617-1623.

Defives, C., Werquin, M., Hornez, J. P. & Derieux, J. C. (1993). In-Vivo morphogenesis and growth-characteristics of phages Cm (Myoviridae) virulent for Rhizobium meliloti. Curr Microbiol 27, 307-310.

Delcher, A. L., Harmon, D., Kasif, S., White, O. & Salzberg, S. L. (1999). Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27, 4636-4641.

Dempsey, R. M., Carroll, D., Kong, H. M., Higgins, L., Keane, C. T. & Coleman, D. C. (2005). Sau42I, a BcgI-like restriction-modification system encoded by the Staphylococcus aureus quadruple-converting phage phi 42. Microbiology 151, 1301-1311.

Dhar, B., Singh, B. D., Singh, R. B., Singh, R. M., Singh, V. P. & Srivastava, J. S. (1978). Isolation and characterization of a virus (RL1) infective on Rhizobium leguminosarum. Arch Microbiol 119, 263-267.

Dhar, B., Upadhyay, K. K. & Singh, R. M. (1993). Isolation and characterization of bacteriophages specific for Rhizobium leguminosarum biovar phaseoli. Can J Microbiol 39, 775-779.

Ding, H. & Hynes, M. F. (2009). Plasmid transfer systems in the rhizobia. Can J Microbiol 55, 917-927.

Dorgai, L., Polner, G., Jónás, E., Garamszegi, N., Ascher, Z., Páy, A., Dallmann, G. & Orosz, L. (1983). The detailed physical map of the temperate phage 16-3 of Rhizobium meliloti 41. Mol Gen Genet 191, 430-433.

Dowling, D. N. & Broughton, W. J. (1986). Competition for nodulation of legumes. Annu Rev Microbiol 40, 131-157.

Drummond, A. J., Ashton, B., Buxton, S. & other authors (2011). Geneious v5.4.

Duckworth, D. H. (1976). "Who discovered bacteriophage?". Bacteriol Rev 40, 793-802.

Echols, H. (1972). Developmental pathways for the temperate phage: lysis vs lysogeny. Annu Rev Genet 6, 157-190.

Eckhardt, T. (1978). A rapid method for the identification of plasmid desoxyribonucleic acid in bacteria. Plasmid 1, 584-588.

Edgell, D. R., Gibb, E. A. & Belfort, M. (2010). Mobile DNA elements in T4 and related phages. Virology J 7.

Endersen, L., O'Mahony, J., Hill, C., Ross, R. P., McAuliffe, O. & Coffey, A. (2014). Phage therapy in the food industry. Ann Rev Food Sci Technol 5, 327-349.

Engelhardt, T., Sahlberg, M., Cypionka, H. & Engelen, B. (2013). Biogeography of Rhizobium radiobacter and distribution of associated temperate phages in deep subseafloor sediments. ISME J 7, 199-209.

Evans, J., Barnet, Y. M. & Vincent, J. M. (1979). Effect of a bacteriophage on colonization and nodulation of clover roots by paired strains of Rhizobium trifolii. Can J Microbiol 25, 974-978.

Finan, T. M., Hartwieg, E., Lemieux, K., Bergman, K., Walker, G. C. & Signer, E. R. (1984). General Transduction in Rhizobium meliloti. J Bacteriol 159, 120-124.

Fineran, P. C., Petty, N. K. & Salmond, P. C. (2009). Transduction: Host DNA transfer by bacteriophage. In Genetics, Genomics, pp. 666-679: Elsevier Inc.

Fischetti, V. A., Nelson, D. & Schuch, R. (2006). Reinventing phage therapy: are the parts greater than the sum? Nat Biotechnol 24, 1508-1511.

Freitas-Vieira, A., Anes, E. & Moniz-Pereira, J. (1998). The site-specific recombination locus of mycobacteriophage Ms6 determines DNA integration at the tRNA(Ala) gene of Mycobacterium spp. Microbiol-UK 144, 3397-3406.

Fujiwara, A., Fujisawa, M., Hamasaki, R., Kawasaki, T., Fujie, M. & Yamada, T. (2011). Biocontrol of Ralstonia solanacearum by treatment with lytic bacteriophages. Appl Environ Microb 77, 4155-4162.

Ganyu, A., Csiszovszki, Z., Ponyi, T., Kern, A., Buzas, Z., Orosz, L. & Papp, P. P. (2005). Identification of cohesive ends and genes encoding the terminase of phage 16-3. J Bacteriol 187, 2526-2531.

Geniaux, E., Laguerre, G. & Amarger, N. (1993). Comparison of geographically distant populations of Rhizobium isolated from root nodules of Phaseolus vulgaris. Molec Ecol 2, 295-302.

Gill, J. & Abedon, S. (2003). Bacteriophage ecology and plants. APSnet Features. Available at https://www.apsnet.org/publications/apsnetfeatures/Pages/BacteriophageEcology.aspx (Accessed: 23 August 2014).

Gill, J. J., Svircev, A. M., Smith, R. & Castle, A. J. (2003). Bacteriophages of Erwinia amylovora. Appl Environ Microbiol 69, 2133-2138.

Gill, J. J. & Hyman, P. (2010). Phage choice, isolation, and preparation for phage therapy. Curr Pharm Biotechnol 11, 2-14.

Glazebrook, J. & Walker, G. C. (1991). Genetic techniques in Rhizobium meliloti. Methods Enzymol 204, 398-418.

Goodridge, L. D. (2010). Designing phage therapeutics. Curr Pharm Biotechnol 11, 15-27.

Griffiths, A. J. F., Miller, J. H., Suzuki, D. T., Lewontin, R. C. & Gelbart, W. M. (2012). An Introduction to genetic analysis, 10th edn: New York: W.H.Freeman.

Guo, P. X., Erickson, S. & Anderson, D. (1987). A small viral RNA is required for in vitro packaging of bacteriophage-Phi-29 DNA. Science 236, 690-694.

Hashem, F. M. & Angle, J. S. (1988). Rhizobiophage effects on Bradyrhizobium japonicum nodulation and soybean growth. Soil Biol Biochem 20, 69-73.

Hashem, F. M. & Angle, J. S. (1990). Rhizobiophage effects on nodulation, nitrogen-fixation, and yield of field-grown soybeans (Glycine max L Merr). Biol Fert Soils 9, 330-334.

Hatfull, G. F. (2008). Bacteriophage genomics. Curr Opin Microbiol 11, 447-453.

Hatfull, G. F. & Hendrix, R. W. (2011). Bacteriophages and their genomes. Curr Opin Virol 1, 298-303.

Hauser, M. A. & Scocca, J. J. (1992). Site-specific integration of the Haemophilus influenzae bacteriophage-Hp1 - location of the boundaries of the phage attachment site. J Bacteriol 174, 6674-6677.

Hendrix, R. W. (2003). Bacteriophage genomics. Curr Opin Microbiol 6, 506-511.

Hirsch, P. R. (1979). Plasmid-determined bacteriocin production by Rhizobium leguminosarum. J Gen Microbiol 113, 219-228.

Ho, C. K., Wang, L. K., Lima, C. D. & Shuman, S. (2004). Structure and mechanism of RNA ligase. Structure 12, 327-339.

Hyatt, D., Chen, G. L., LoCascio, P. F., Land, M. L., Larimer, F. W. & Hauser, L. J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11.

Hynes, M. F., Simon, R. & Pühler, A. (1985). The development of plasmid-free strains of Agrobacterium tumefaciens by using incompatibility with a Rhizobium meliloti plasmid to eliminate pAtC58. Plasmid 13, 99-105.

Hynes, M. F., Brucksch, K. & Priefer, U. (1988). Melanin production encoded by a cryptic plasmid in a Rhizobium leguminosarum strain. Arch Microbiol 150, 326-332.

Hynes, M. F. & McGregor, N. F. (1990). Two plasmids other than the nodulation plasmid are necessary for formation of nitrogen-fixing nodules by Rhizobium leguminosarum. Mol Microbiol 4, 567-574.

Hynes, M. F. & O'Connell, M. P. (1990). Host plant effect on competition among strains of Rhizobium leguminosarum. Can J Microbiol 36, 864-869.

Ivashina, T. V. & Ksenzenko, V. N. (2012). Exopolysaccharide biosynthesis in Rhizobium leguminosarum: From genes to functions. In The complex world of polysaccharides. Edited by N. Karunaratne: InTech.

Jarvis, B. D. W., Pankhurst, C. E. & Patel, J. J. (1982). Rhizobium loti, a new species of legume root nodule bacteria. Int J Syst Bacteriol 32, 378-380.

Jones, J. B., Jackson, L. E., Balogh, B., Obradovic, A., Iriarte, F. B. & Momol, M. T. (2007). Bacteriophages for plant disease control. Ann Rev Phytopathol 45, 245-262.

Josey, D. P., Beynon, J. L., Johnston, A. W. B. & Beringer, J. E. (1979). Strain identification in Rhizobium using intrinsic antibiotic resistance. J Appl Bacteriol 46, 343-350.

Jun, G., Aird, E. L. H., Kannenberg, E., Downie, J. A. & Johnston, A. W. B. (1993). The Sym plasmid pRP2JI and at least two other loci of Rhizobium leguminosarum biovar phaseoli can confer resistance to infection by the virulent bacteriophage RL38. FEMS Microbiol Lett 111, 321-326.

Kaneko, T., Nakamura, Y., Sato, S. & other authors (2000). Complete genome structure of the nitrogen-fixing symbiotic bacterium Mesorhizobium loti. DNA Res 7, 331-338.

Kawasaki, T., Shimizu, M., Satsuma, H., Fujiwara, A., Fujie, M., Usami, S. & Yamada, T. (2009). Genomic characterization of Ralstonia solanacearum phage phiRSB1, a T7-like wide-host-range phage. J Bacteriol 191, 422-427.

Khan, S. R., Gaines, J., Roop, R. M. & Farrand, S. K. (2008). Broad-host-range expression vectors with tightly regulated promoters and their use to examine the influence of TraR and TraM expression on Ti plasmid quorum sensing. Appl Environ Microb 74, 5053-5062.

Klumpp, J., Fouts, D. E. & Sozhamannan, S. (2012). Next generation sequencing technologies and the changing landscape of phage genomics. Bacteriophage 2, 190-199.

Kondorosi, E., Gyuris, J., Schmidt, J., John, M., Duda, E., Hoffmann, B., Schell, J. & Kondorosi, A. (1989). Positive and negative control of nod gene expression in Rhizobium meliloti is required for optimal nodulation. EMBO J 8, 1331-1340.

Kropinski, A. M., Prangishvili, D. & Lavigne, R. (2009). Position paper: The creation of a rational scheme for the nomenclature of viruses of Bacteria and Archaea. Environ Microbiol 11, 2775-2777.

Ksenzenko, V. N., Ivashina, T. V., Dubeikovskaya, Z. A., Ivanov, S. G., Nanazashvili, M. B., Druzhinina, T. N., Kalinchuk, N. A. & Shibaev, V. N. (2007). The pssA gene encodes UDP-glucose: Polyprenyl phosphate-glucosyl phosphotransferase initiating biosynthesis of Rhizobium leguminosarum exopolysaccharide. Russ J Bioorg Chem+ 33, 150-155.

Kutateladze, M. & Adamia, R. (2008). Phage therapy experience at the Eliava Institute. Med Mal Infect 38, 426-430.

Kutter, B., Raya, R. & Carlson, K. (2005). Molecular mechanisms of phage infection. In Bacteriophages: Biology and applications. Edited by B. Kutter & A. Sulakvelidze: CRC Press.

Kutter, B. & Sulakvelidze, A. (2005). In Bacteriophages: Biology and applications. Edited by B. Kutter & A. Sulakvelidze.

Kutter, E., De Vos, D., Gvasalia, G., Alavidze, Z., Gogokhia, L., Kuhl, S. & Abedon, S. T. (2010). Phage therapy in clinical practice: treatment of human infections. Curr Pharm Biotechnol 11, 69-86.

Laemmli, U. K. (1970). Cleavage of structural proteins during assembly of head of bacteriophage-T4. Nature 227, 680-&.

Lamb, J. W., Hombrecher, G. & Johnston, A. W. B. (1982). Plasmid determined nodulation and nitrogen fixation abilities in Rhizobium phaseoli. Mol Gen Genet 186, 449-452.

Laranjo, M., Alexandre, A. & Oliveira, S. (2014). Legume growth-promoting rhizobia: an overview on the Mesorhizobium genus. Microbiol Res 169, 2-17.

Larkin, M. A., Blackshields, G., Brown, N. P. & other authors (2007). Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947-2948.

Laslett, D. & Canback, B. (2004). ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res 32, 11-16.

Lavigne, R., Burkal'tseva, M. V., Robben, J. & other authors (2003). The genome of bacteriophage phi KMV, a T7-like virus infecting Pseudomonas aeruginosa. Virology 312, 49-59.

Lavigne, R., Seto, D., Mahadevan, P., Ackermann, H. W. & Kropinski, A. M. (2008). Unifying classical and molecular taxonomic classification: analysis of the Podoviridae using BLASTP-based tools. Res Microbiol 159, 406-414.

Lavigne, R., Darius, P., Summer, E. J., Seto, D., Mahadevan, P., Nilsson, A. S., Ackermann, H. W. & Kropinski, A. M. (2009). Classification of Myoviridae bacteriophages using protein sequence similarity. BMC Microbiology 9.

Lawrence, J. G., Hatfull, G. F. & Hendrix, R. W. (2002). Imbroglios of viral taxonomy: Genetic exchange and failings of phenetic approaches. J Bacteriol 184, 4891-4905.

Lawson, K. A., Barnet, Y. M. & Mcgilchrist, C. A. (1987). Environmental factors influencing numbers of Rhizobium leguminosarum biovar trifolii and its bacteriophages in two field soils. Appl Environ Microb 53, 1125-1131.

Lech, K., Reddy, K. J. & Sherman, L. A. (2001). Preparing Lambda DNA from Phage Lysates. In Current Protocols in Molecular Biology: John Wiley & Sons, Inc.

Lederberg, J., Lederberg, E. M., Zinder, N. D. & Lively, E. R. (1951). Recombination analysis of bacterial heredity. Cold Spring Harb Sym 16, 413-443.

Lee, M. H., Pascopella, L., Jacobs, W. R. & Hatfull, G. F. (1991). Site-specific integration of Mycobacteriophage-L5: integration-proficient vectors for Mycobacterium smegmatis, Mycobacterium tuberculosis, and bacille Calmette-Guérin. Proc Natl Acad Sci USA 88, 3111-3115.

Lehman, S. M., Kropinski, A. M., Castle, A. J. & Svircev, A. M. (2009). Complete genome of the broad-host-range Erwinia amylovora phage Phi Ea21-4 and its relationship to Salmonella phage Felix O1. Appl Environ Microb 75, 2139-2147.

Leiman, P. G., Chipman, P. R., Kostyuchenko, V. A., Mesyanzhinov, V. V. & Rossmann, M. G. (2004). Three-dimensional rearrangement of proteins in the tail of bacteriophage T4 on infection of its host. Cell 118, 419-429.

Lemire, S., Figueroa-Bossi, N. & Bossi, L. (2011). Bacteriophage crosstalk: Coordination of prophage induction by trans-acting antirepressors. PLoS Genet 7.

Li, W., Zhang, J. Y., Chen, Z. H., Zhang, Q., Zhang, L., Du, P. C., Chen, C. & Kan, B. (2013). The genome of VP3, a T7-like phage used for the typing of Vibrio cholerae. Arch Virol 158, 1865-1876.

Liu, J. & Mushegian, A. (2004). Displacements of prohead protease genes in the late operons of double-stranded-DNA bacteriophages. J Bacteriol 186, 4369-4375.

Loc-Carrillo, C. & Abedon, S. T. (2011). Pros and cons of phage therapy. Bacteriophage 1, 111-114.

Loessner, M. J. (2005). Bacteriophage endolysins - current state of research and applications. Curr Opin Microbiol 8, 480-487.

Lotz, W. & Pfister, H. (1975). Attachment of a long tailed Rhizobium bacteriophage to pili of its host. J Virol 16, 725-728.

Lotz, W., Acker, G. & Schmitt, R. (1977). Bacteriophage 7-7-1 adsorbs to the complex flagella of Rhizobium lupini H13-3. J Gen Virol 34, 9-17.

Low, D. A., Weyand, N. J. & Mahan, M. J. (2001). Roles of DNA adenine methylation in regulating bacterial gene expression and virulence. Infect Immun 69, 7197-7204.

Lowe, T. M. & Eddy, S. R. (1997). tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25, 955-964.

Lu, L. D., Sun, Q., Fan, X. Y., Zhong, Y., Yao, Y. F. & Zhao, G. P. (2010). Mycobacterial MazG is a novel NTP pyrophosphohydrolase involved in oxidative stress response. J Biol Chem 285, 28076-28085.

Lwoff, A., Tournier, P. & Horne, R. (1962). System of Viruses. Cold Spring Harb Sym 27, 51-&.

Małek, W., Wdowiak-Wróbel, S., Bartosik, M., Konopa, G. & Narajczyk, M. (2009). Characterization of phages virulent for Robinia pseudoacacia rhizobia. Curr Microbiol 59, 187-192.

Markoishvili, K., Tsitlanadze, G., Katsarava, R., Morris, J. G., Jr. & Sulakvelidze, A. (2002). A novel sustained-release matrix based on biodegradable poly(ester amide)s and impregnated with bacteriophages and an antibiotic shows promise in management of infected venous stasis ulcers and other poorly healing wounds. Int J Dermatol 41, 453-458.

Martín, A. C., López, R. & García, P. (1996). Analysis of the complete nucleotide sequence and functional organization of the genome of Streptococcus pneumoniae bacteriophage Cp-1. Journal of Virology 70, 3678-3687.

Martin, M. O. & Long, S. R. (1984). Generalized transduction in Rhizobium meliloti. J Bacteriol 159, 125-129.

Martínez-Romero, E., Segovia, L., Mercante, F. M., Franco, A. A., Graham, P. & Pardo, M. A. (1991). Rhizobium tropici, A novel species nodulating Phaseolus vulgaris L beans and Leucaena Sp trees. Int J Syst Bacteriol 41, 417-426.

Masson-Boivin, C., Giraud, E., Perret, X. & Batut, J. (2009). Establishing nitrogen-fixing symbiosis with legumes: how many rhizobium recipes? Trends Microbiol 17, 458-466.

McKenna, F., El-Tarabily, K. A., Hardy, G. E. S. & Dell, B. (2001). Novel in vivo use of a polyvalent Streptomyces phage to disinfest Streptomyces scabies-infected seed potatoes. Plant Pathol 50, 666-675.

Mendum, T. A., Clark, I. M. & Hirsch, P. R. (2001). Characterization of two novel Rhizobium leguminosarum bacteriophages from a field release site of genetically- modified rhizobia. Anton Leeuw Int J : 79, 189–197.

Miller, E. S., Kutter, E., Mosig, G., Arisaka, F., Kunisawa, T. & Ruger, W. (2003). Bacteriophage T4 genome. Microbiol Mol Biol R 67, 86-156.

Miller, L. D., Yost, C. K., Hynes, M. F. & Alexandre, G. (2007). The major chemotaxis gene cluster of Rhizobium leguminosarum bv. viciae is essential for competitive nodulation. Mol Microbiol 63, 348-362.

238

Miller, R. V. (2001). Environmental bacteriophage-host interactions: factors contribution to natural transduction. Anton Leeuw Int J G 79, 141-147.

Mink, M., Orosz, L. & Sik, T. (1982). Specialized and generalized transducing rhizobiophage 16-3 and 11 are closely related. FEMS Microbiol Lett 13, 383-387.

Moroz, O. V., Murzin, A. G., Makarova, K. S., Koonin, E. V., Wilson, K. S. & Galperin, M. Y. (2005). Dimeric dUTPases, HisE, and MazG belong to a new superfamily of all-alpha NTP pyrophosphohydrolases with potential "house-cleaning" functions. J Molec Biol 347, 243-255.

Nath, D. J., Ozah, B., Baruah, R., Barooah, R. C. & Borah, D. K. (2011). Effect of integrated nutrient management on soil enzymes, microbial biomass carbon and bacterial populations under rice (Oryza sativa) -wheat (Triticum aestivum) sequence. Indian J Agr Sci 81, 1143-1148.

Nelson, D. (2004). Phage taxonomy: we agree to disagree. J Bacteriol 186, 7029-7031.

Noel, K. D., Sánchez, A., Fernández, L., Leemans, J. & Cevallos, M. A. (1984). Rhizobium phaseoli symbiotic mutants with transposon Tn5 insertions. J Bacteriol 158, 148-155.

Nolan, J. M., Petrov, V., Bertrand, C., Krisch, H. M. & Karam, J. D. (2006). Genetic diversity among five T4-like bacteriophages. Virology J 3.

Nour, S. M., Fernandez, M. P., Normand, P. & Cleyetmarel, J. C. (1994). Rhizobium ciceri sp. nov, consisting of strains that nodulate chickpeas (Cicer arietinum L). Int J Syst Bacteriol 44, 511-522.

Nour, S. M., Cleyetmarel, J. C., Normand, P. & Fernandez, M. P. (1995). Genomic heterogeneity of strains nodulating chickpeas (Cicer arietinum L) and description of Rhizobium mediterraneum sp. nov. Int J Syst Bacteriol 45, 640-648.

Oresnik, I. J., Twelker, S. & Hynes, M. F. (1999). Cloning and characterization of a Rhizobium leguminosarum gene encoding a bacteriocin with similarities to RTX toxins. Appl Environ Microb 65, 2833-2840.

Papp, I., Dorgai, L., Papp, P., Jónás, E., Olasz, F. & Orosz, L. (1993). The bacterial attachment site of the temperate Rhizobium phage 16-3 overlaps the 3' end of a putative proline transfer-RNA gene. Mol Gen Genet 240, 258-264.

Pastagia, M., Schuch, R., Fischetti, V. A. & Huang, D. B. (2013). Lysins: the arrival of pathogen-directed anti-infectives. J Med Microbiol 62, 1506-1516.

239

Petrov, V. M., Ratnayaka, S., Nolan, J. M., Miller, E. S. & Karam, J. D. (2010). Genomes of the T4-related bacteriophages as windows on microbial genome evolution. Virology J 7.

Petty, N. K., Foulds, I. J., Pradel, E., Ewbank, J. J. & Salmond, G. P. C. (2006). A generalized transducing phage (phi lF3) for the genomically sequenced Serratia marcescens strain Db11: a tool for functional genomics of an opportunistic human pathogen. Microbiology 152, 1701-1708.

Pierson, L. S. & Kahn, M. L. (1987). Integration of satellite bacteriophage-P4 in Escherichia coli - DNA-sequences of the phage and host regions involved in site-specific recombination. J Molec Biol 196, 487-496.

Poole, P. S., Blyth, A., Reid, C. J. & Walters, K. (1994). Myo-inositol catabolism and catabolite regulation in Rhizobium leguminosarum bv. viciae. Microbiol-UK 140, 2787- 2795.

Priefer, U. B. (1989). Genes involved in lipopolysaccharide production and symbiosis are clustered on the chromosome of Rhizobium leguminosarum biovar viciae VF39. J Bacteriol 171, 6161-6168.

Rakhuba, D. V., Kolomiets, E. I., Dey, E. S. & Novik, G. I. (2010). Bacteriophage receptors, mechanisms of phage adsorption and penetration into host cell. Polish J Microbiol 59, 145-155.

Ramsay, J. P., Sullivan, J. T., Stuart, G. S., Lamont, I. L. & Ronson, C. W. (2006). Excision and transfer of the Mesorhizobium loti R7A symbiosis island requires an integrase IntS, a novel recombination directionality factor RdfS, and a putative relaxase RlxS. Mol Microbiol 62, 723-734.

Rao, V. B. & Feiss, M. (2008). The bacteriophage DNA packaging motor. Ann Rev Genet 42, 647-681.

Restrepo, M. (2012).Isolation and characterization of Rhizobium leguminosarum phages from western Canadian soil. In Dept of Biological Sciences: University of Calgary.

Ribeiro, R. A., Rogel, M. A., López-López, A., Ormeño-Orrillo, E., Barcellos, F. G., Martínez, J., Thompson, F. L., Martínez-Romero, E. & Hungria, M. (2012). Reclassification of Rhizobium tropici type A strains as Rhizobium leucaenae sp nov. Int J Syst Evol Micr 62, 1179-1184.

Roach, D. R., Sjaarda, D. R., Castle, A. J. & Svircev, A. M. (2013). Host exopolysaccharide quantity and composition impact Erwinia amylovora bacteriophage pathogenesis. Appl Environ Microb 79, 3249-3256.

240

Robleto, E. A., Scupham, A. J. & Triplett, E. W. (1997). Trifolitoxin production in Rhizobium etli strain CE3 increases competitiveness for rhizosphere colonization and root nodulation of Phaseolus vulgaris in soil. Molec Plant-Microbe Interact 10, 228-233.

Robleto, E. A., Kmiecik, K., Oplinger, E. S., Nienhuis, J. & Triplett, E. W. (1998). Trifolitoxin production increases nodulation competitiveness of Rhizobium etli CE3 under agricultural conditions. Appl Environ Microb 64, 2630-2633.

Rocha, E. P. C. & Danchin, A. (2002). Base composition bias might result from competition for metabolic resources. Trends Genet 18, 291-294.

Sadykov, M. R., Ivashina, T. V., Kanapin, A. A., Shlyapnikov, M. G. & Ksenzenko, V. N. (1998). Structural and functional organization of the exopolysaccharide biosynthesis genes in Rhizobium leguminosarum bv viciae VF39. Mol Biol+ 32, 665-671.

Sambrook, J., Fritsch, E.F., Maniatis, T. (1989). Molecular cloning - A laboratory manual, 2nd edition edn. Cold Spring Harbor, New York.

Samson, J. E., Magadàn, A. H., Sabri, M. & Moineau, S. (2013). Revenge of the phages: defeating bacterial defences. Nat Rev Microbiol 11, 675-687.

Santamaria, R. I., Bustos, P., Sepúlveda-Robles, O. & other authors (2014). Narrow host range bacteriophages that infect Rhizobium etli associate with distinct genomic types. Appl Environ Microb 80, 446-454.

Schmidt, F. J. (1985). RNA splicing in prokaryotes - Bacteriophage T4 leads the way. Cell 41, 339-340.

Schnabel, E. L. & Jones, A. L. (2001). Isolation and characterization of five Erwinia amylovora bacteriophages and assessment of phage resistance in strains of Erwinia amylovora. Appl Environ Microb 67, 59-64.

Scholl, D., Kieleczawa, J., Kemp, P., Rush, J., Richardson, C. C., Merril, C., Adhya, S. & Molineux, I. J. (2004). Genomic analysis of bacteriophages SP6 and K1-5, an estranged subgroup of the T7 supergroup. J Molec Biol 335, 1151-1171.

Semsey, S., Papp, I., Buzas, Z., Patthy, A., Orosz, L. & Papp, P. P. (1999). Identification of site-specific recombination genes int and xis of the Rhizobium temperate phage 16-3. J Bacteriol 181, 4185-4192.

Semsey, S., Blaha, B., Köles, K., Orosz, L. & Papp, P. P. (2002). Site-specific integrative elements of rhizobiophage 16-3 can integrate into proline tRNA (CGG) genes in different bacterial genera. J Bacteriol 184, 177-182.

241

Shah, K., Desousa, S. & Modi, V. V. (1981). Studies on transducing phage M-1 for Rhizobium japonicum D211. Arch Microbiol 130, 262-266.

Shu, D. & Guo, P. X. (2003). Only one pRNA hexamer but multiple copies of the DNA- packaging protein gp16 are needed for the motor to package bacterial virus phi29 genomic DNA. Virology 309, 108-113.

Sik, T., Horvath, J. & Chatterjee, S. (1980). Generalized transduction in Rhizobium meliloti. Mol Gen Genet 178, 511-516.

Sillankorva, S. M., Oliveira, H. & Azeredo, J. (2012). Bacteriophages and their role in food safety. Int J Microbiol 2012, 863945.

Simon, R., Priefer, U. & Pühler, A. (1983). A broad host range mobilization system for in vivo genetic engineering: transposon mutagenesis in gram negative bacteria. Bio- Technol 1, 784-791.

Skiena, S. S. (2001). Designing better phages. Bioinformatics 17 Suppl 1, S253-261.

Skorupska, A., Janczarek, M., Marczak, M., Mazur, A. & Król, J. (2006). Rhizobial exopolysaccharides: genetic control and symbiotic functions. Microbial Cell Factories 5, 7.

Smith-Mungo, L., Chan, I. T. & Landy, A. (1994). Structure of the P22 att site. Conservation and divergence in the lambda motif of recombinogenic complexes. J Biol Chem 269, 20798-20805.

Srinivasiah, S., Bhavsar, J., Thapar, K., Liles, M., Schoenfeld, T. & Wommack, K. E. (2008). Phages across the biosphere: Contrasts of viruses in soil and aquatic environments. Res Microbiol 159, 349-357.

Staniewski, R. (1980). Typing of Rhizobium with two different phage dilutions. Acta Microbiol Pol 29, 331-341.

Steward, G. F. (2001).Fingerprinting viral assemblage by pulsed field gel electrophoresis. In Methods in Microbiology: Academic Press, Ltd.

Stoddard, B. L. (2014). Homing endonucleases from mobile group I introns: discovery to genome engineering. Mobile DNA-UK 5.

Sulakvelidze, A., Alavidze, Z. & Morris, J. G. (2001). Bacteriophage therapy. Antimicrob Agents Ch 45, 649-659.

242

Sullivan, J. T., Patrick, H. N., Lowther, W. L., Scott, D. B. & Ronson, C. W. (1995). Nodulating strains of Rhizobium loti arise through chromosomal symbiotic gene transfer in the environment. P Natl Acad Sci USA 92, 8985-8989.

Sullivan, J. T., Brown, S. D., Yocum, R. R. & Ronson, C. W. (2001). The bio operon on the acquired symbiosis island of Mesorhizobium sp. strain R7A includes a novel gene involved in pimeloyl-CoA synthesis. Microbiology 147, 1315-1322.

Sullivan, J. T., Brown, S. D. & Ronson, C. W. (2013). The NifA-RpoN regulon of Mesorhizobium loti strain R7A and its symbiotic activation by a novel LacI/GalR-family regulator. PLoS One 8.

Suttle, C. A. (2007). Marine viruses--major players in the global ecosystem. Nat Rev Microbiol 5, 801-812.

Svircev, A. M., Lehman, S. M., Sholberg, P., Roach, D. & Castle, A. J. (2011). Phage biopesticides and soil bacteria: Multilayered and complex interactions. In Biocommunication in soil microorganisms, Soil Biology 23, pp. 215-235. Edited by G. Witzany: Springer-Verlag Berlin Heidelberg

Swanson, M. M., Fraser, G., Daniell, T. J., Torrance, L., Gregory, P. J. & Taliansky, M. (2009). Viruses in soils: morphological diversity and abundance in the rhizosphere. Ann Appl Biol 155, 51-60.

Swinton, D., Hattman, S., Benzinger, R., Buchananwollaston, V. & Beringer, J. (1985). Replacement of the deoxycytidine residues in Rhizobium bacteriophage Rl38JI DNA. FEBS Lett 184, 294-298.

Tambalo, D. D., Bustard, D. E., Del Bel, K. L., Koval, S. F., Khan, M. F. & Hynes, M. F. (2010). Characterization and functional analysis of seven flagellin genes in Rhizobium leguminosarum bv. viciae. Characterization of R. leguminosarum flagellins. BMC Microbiology 10.

Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M. & Kumar, S. (2011). MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28, 2731-2739.

Tanaka, N., Meineke, B. & Shuman, S. (2011). RtcB, a Novel RNA ligase, can catalyze tRNA splicing and HAC1 mRNA splicing in vivo. J Biol Chem 286, 30253-30257.

Tock, M. R. & Dryden, D. T. F. (2005). The biology of restriction and anti-restriction. Curr Opin Microbiol 8, 466-472.

243

Triplett, E. W. & Barta, T. M. (1987). Trifolitoxin production and nodulation are necessary for the expression of superior nodulation competitiveness by Rhizobium leguminosarum bv. trifolii strain-T24 on clover. Plant Physiol 85, 335-342.

Triplett, E. W. & Sadowsky, M. J. (1992). Genetics of competition for nodulation of legumes. Ann Rev Microbiol 46, 399-428.

Triplett, E. W., Breil, B. T. & Splitter, G. A. (1994). Expression of tfr and sensitivity to the rhizobial peptide antibiotic trifolitoxin in a taxonomically distinct group of a- proteobacteria including the animal pathogen Brucella abortus. Appl Environ Microbiol 60, 4163-4166.

Turska-Szewczuk, A. & Russa, R. (2000). A new Mesorhizobium loti HAMBI 1129 phage isolated from Polish soil. Curr Microbiol 40, 341-343.

Turska-Szewczuk, A., Pietras, H., Pawelec, J., Mazur, A. & Russa, R. (2010). Morphology and general characteristics of bacteriophages infectious to Robinia pseudoacacia mesorhizobia. Curr Microbiol 61, 315-321.

Uchiumi, T., Abe, M. & Higashi, S. (1998). Integration of the temperate phage phi U into the putative tRNA gene on the chromosome of its host Rhizobium leguminosarum biovar trifolii. J Gen Appl Microbiol 44, 93-99. van Rhijn, P. & Vanderleyden, J. (1995). The Rhizobium-plant symbiosis. Microbiol Rev 59, 124-142.

Vlassak, K. M. & Vanderleyden, J. (1997). Factors influencing nodule occupancy by inoculant rhizobia. Crit Rev Plant Sci 16, 163-229.

Wang, I. N., Deaton, J. & Young, R. (2003). Sizing the holin lesion with an endolysin- beta-galactosidase fusion. J Bacteriol 185, 779-787.

Warren, R. A. J. (1980). Modified bases in bacteriophage DNAs. Ann Rev Microbiol 34, 137-158.

Wdowiak, S., Małek, W. & Grządka, M. (2000). Morphology and general characteristics of phages specific for Astragalus cicer rhizobia. Curr Microbiol 40, 110- 113.

Weinbauer, M. G. & Rassoulzadegan, F. (2004). Are viruses driving microbial diversification and diversity? Environ Microbiol 6, 1-11.

Werquin, M., Ackermann, H. W. & Lévesque, R. C. (1988). A Study of 33 bacteriophages of Rhizobium meliloti. Appl Environ Microb 54, 188-196.

244

Willems, A. (2006). The taxonomy of rhizobia: an overview. Plant Soil 287, 3-14.

Williams, K. P. (2002). Integration sites for genetic elements in prokaryotic tRNA and tmRNA genes: sublocation preference of integrase subfamilies. Nucleic Acids Res 30, 866-875.

Williamson, K. E. (2011). Soil phage ecology: Abundance, distribution and interactions with bacterial hosts. In Biocommunication in soil microorganisms, Soil Biology 23, pp. 113-136. Edited by G. Witzany: Springer-Verlag Berlin Heidelberg.

Wilson, K. J., Peoples, M. B. & Jefferson, R. A. (1995a). New techniques for studying competition by rhizobia and for assessing nitrogen-fixation in the field. Plant Soil 174, 241-253.

Wilson, K. J., Sessitsch, A., Corbo, J. C., Giller, K. E., Akkermans, A. D. L. & Jefferson, R. A. (1995b). Beta-Glucuronidase (GUS) transposons for ecological and genetic studies of rhizobia and other Gram-negative bacteria. Microbiol-UK 141, 1691- 1705.

Wilson, K. J., Parra, A. & Botero, L. (1999). Application of the GUS marker gene technique to high-throughput screening of rhizobial competition. Can J Microbiol 45, 678-685.

245

Appendix I: Growth media

TY
Tryptone           5 g
Yeast extract      3 g
CaCl2·2H2O         0.4 g
MgSO4·2H2O         0.2 g
Deionized water    1 L

For solid media, add 13 g of agar per L of broth.

PH
Peptone            4 g
Yeast extract      0.5 g
Tryptone           0.5 g
CaCl2·2H2O         0.5 g
MgSO4·2H2O         0.2 g
Deionized water    1 L

LB (Modified lysogeny broth)
Tryptone           10 g
Yeast extract      5 g
NaCl               5 g
Deionized water    1 L

For solid media, add 13 g of agar per L of broth.

RDM (Rhizobium defined medium)
Salt solution*                    10 ml
Bromothymol blue (2 mg/ml)        10 ml
NH4Cl (18 g/l)                    6 ml
Trace elements**                  1 ml
L-Histidine monohydrochloride     100 mg

Adjust volume to 1 L. For solid media, add 13 g of agar per L of broth.

After autoclaving:

Add 10 ml of phosphate solution (sterile solution of 100 g of K2HPO4 and 100 g of KH2PO4 in 1 L of deionized water). Add 1 ml of vitamin solution (filter sterilised; 50 mg of nicotinamide, 50 mg of thiamine HCl and 1 ml of biotin (1 mg/ml) per 250 ml of deionized water). For RDM containing glucose as the sole source of carbon (G/RDM), add 20 ml of sterile 20% (w/v) glucose.
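The post-autoclave additions slightly dilute the medium, so the final concentrations are a little lower than the nominal ones. The following is a minimal sketch of that arithmetic, not part of the original protocol. It assumes the base medium is exactly 1000 ml, that volumes are additive, and that the 250 ml given for the vitamin solution is its final volume; the names `BASE_ML`, `ADDITIONS`, and `final_concentrations` are hypothetical.

```python
# Sketch: final solute concentrations in G/RDM after the post-autoclave additions.
# Assumptions (not stated in the recipe): base medium is exactly 1000 ml,
# volumes are additive, and the vitamin stock is made up to 250 ml.

BASE_ML = 1000.0

# (label, volume added in ml, {solute: stock concentration in g/L})
ADDITIONS = [
    ("phosphate solution", 10.0, {"K2HPO4": 100.0, "KH2PO4": 100.0}),
    ("vitamin solution", 1.0, {
        "nicotinamide": 0.2,    # 50 mg per 250 ml
        "thiamine HCl": 0.2,    # 50 mg per 250 ml
        "biotin": 0.004,        # 1 ml of 1 mg/ml per 250 ml
    }),
    ("20% (w/v) glucose", 20.0, {"glucose": 200.0}),
]


def final_concentrations(base_ml, additions):
    """Return (total volume in ml, {solute: final concentration in g/L})."""
    total_ml = base_ml + sum(vol_ml for _, vol_ml, _ in additions)
    result = {}
    for _, vol_ml, solutes in additions:
        for solute, stock_g_per_l in solutes.items():
            grams = stock_g_per_l * vol_ml / 1000.0      # mass delivered
            result[solute] = grams / (total_ml / 1000.0)  # g per litre of final medium
    return total_ml, result


if __name__ == "__main__":
    total_ml, conc = final_concentrations(BASE_ML, ADDITIONS)
    print(f"Final volume: {total_ml:.0f} ml")
    for solute, g_per_l in conc.items():
        print(f"{solute}: {g_per_l:.4g} g/L")
```

Under these assumptions the glucose ends up slightly below the nominal 0.4% (w/v) because the additions bring the total volume to a little over 1 L.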

* Salt solution:
MgSO4·7H2O    25 g
CaCl2·2H2O    2 g
FeCl3         0.66 g
EDTA          1.5 g
NaCl          20 g
Make up to 1 L with deionized water and autoclave.

** Trace elements:
ZnSO4·7H2O              3 mg
Na2MoO4·2H2O            40 mg
H3BO3                   50 mg
MnSO4·H2O               40 mg
CuSO4·5H2O              4 mg
CoCl2·6H2O (0.2 g/L)    1 ml
Make up to 200 ml with deionized water and autoclave.

Suspension medium
Tris-Cl pH 7.5    50 mM
NaCl              0.1 M
MgSO4             8 mM
Gelatin           0.01%

Appendix II: Solutions used for electrophoresis, Southern blots, and Eckhardt gel analysis

1X TAE
Tris base              2.42 g
Glacial acetic acid    0.571 mL
EDTA                   1.0 mL of 0.5 M EDTA (pH 8.0)
Deionized water        1.0 L

1X TBE
Tris base          10.8 g
Boric acid         5.5 g
EDTA               4.0 mL of 0.5 M EDTA (pH 8.0)
Deionized water    1.0 L

Basic Transfer Buffer
NaOH               16.0 g
NaCl               35.0 g
Deionized water    1.0 L

20X SSC (saline sodium citrate)
NaCl                        175.3 g
tri-Sodium citrate·2H2O     88.2 g
Deionized water             1.0 L

0.3% sarkosyl solution
N-lauryl sarcosine    3 g
1X TBE                1.0 L

E1 lysis solution
Sucrose    100 g
RNase      10 mg
1X TBE     1.0 L
Store at 4°C. Immediately before loading the cell suspension, lysozyme (~100 µg/ml) was added.

Appendix III: Solutions used in polyacrylamide gel electrophoresis

1X Running Buffer
Tris base    3.0 g
Glycine      14.4 g
ddH2O        1.0 L

Acrylamide
Acrylamide        29 g
Bis-acrylamide    1 g
Milli Q water     100 ml
Store at 4°C in an amber colored bottle.

Laemmli Solubilization Buffer (5 ml)
1 M DTT                 500 µl
0.5 M Tris, pH 6.8      500 µl
10% SDS (w/v)           1 ml
100% Glycerol           2 ml
1% Bromophenol blue     500 µl
Milli Q water           500 µl
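The Laemmli recipe is given as stock volumes, so the final concentration of each component follows from stock concentration × volume / total volume. The sketch below works through that arithmetic and checks that the listed volumes add up to the stated 5 ml. It is illustrative only; `COMPONENTS`, `total_volume_ul`, and `final_concentrations` are hypothetical names, and the bromophenol dye is treated simply as a 1% stock.

```python
# Sketch: final concentrations in the 5 ml Laemmli solubilization buffer.
# Each entry: (component, stock concentration, unit, volume in microlitres).
# Water has no stock concentration and only contributes volume.

COMPONENTS = [
    ("DTT", 1.0, "M", 500),
    ("Tris, pH 6.8", 0.5, "M", 500),
    ("SDS", 10.0, "% (w/v)", 1000),
    ("Glycerol", 100.0, "%", 2000),
    ("Bromophenol blue", 1.0, "%", 500),
    ("Milli Q water", None, None, 500),
]


def total_volume_ul(components):
    """Sum of all component volumes in microlitres."""
    return sum(vol for _, _, _, vol in components)


def final_concentrations(components):
    """Return {component: (final concentration, unit)} using C_final = C_stock * V / V_total."""
    total = total_volume_ul(components)
    return {
        name: (stock * vol / total, unit)
        for name, stock, unit, vol in components
        if stock is not None
    }


if __name__ == "__main__":
    print(f"Total volume: {total_volume_ul(COMPONENTS)} µl")
    for name, (value, unit) in final_concentrations(COMPONENTS).items():
        print(f"{name}: {value:g} {unit}")
```

The same two functions can be reused for any buffer that is specified as a list of stock volumes.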

Appendix IV: Solutions used for nodulation competition assays and nodule staining

Hoagland's plant solution

Solution A
CaCl2·7H2O    294 g
H2O           1.0 L

Solution B
KH2PO4    136 g
H2O       1.0 L

Solution C
Fe-citrate    6.7 g
H2O           1.0 L

Solution D
MgSO4      123 g
K2SO4      87 g
MnSO4      0.338 g
H3BO3      0.247 g
ZnSO4      0.288 g
CuSO4      0.1 g
CoSO4      0.056 g
Na2MoO4    0.048 g
H2O        1.0 L

1 mL of each solution was added to every 2 L of H2O.

Nodule Staining Solution

i. Make 50 mM phosphate buffer (pH 7) with 1 mM EDTA and autoclave. For 1 L:

Na2HPO4          4.76 g (33.56 mM)
NaH2PO4·2H2O     2.56 g (16.44 mM)
Adjust pH using NaOH (about 200 µl).
EDTA             0.37 g (372.2 g/mol × 1 mM × 1 L)

ii. Add the following to each liter of the autoclaved solution above (final concentrations are given first):
• 0.1% sodium lauryl sarcosine (add 10 ml of 10% stock)
• 0.1% Triton X-100 (add 1 ml of 100% Triton X-100)
• X-gluc 100 µg/ml (add 2 ml of 50 mg/ml X-gluc)
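The quantities in the staining solution follow from two standard relations: mass = molecular weight × molarity × volume, and stock volume = final concentration × final volume / stock concentration. The short sketch below reproduces the EDTA mass and the three stock volumes given above. The function names are hypothetical, and, like the recipe itself, the calculation ignores the small volume increase caused by the additions.

```python
# Sketch: reproducing the quantities in the nodule staining recipe.
# Assumption: the added stock volumes are small enough to ignore when
# computing the final volume (the recipe makes the same simplification).

def grams_needed(mw_g_per_mol, conc_mM, volume_l):
    """Mass of solute (g) for a given molarity (mM) and volume (L)."""
    return mw_g_per_mol * (conc_mM / 1000.0) * volume_l


def stock_volume_ml(stock_conc, final_conc, final_volume_ml):
    """Volume of stock (ml) needed; stock_conc and final_conc must share units."""
    return final_conc * final_volume_ml / stock_conc


if __name__ == "__main__":
    # EDTA: 372.2 g/mol, 1 mM, 1 L
    print(f"EDTA: {grams_needed(372.2, 1.0, 1.0):.2f} g")
    # Sodium lauryl sarcosine: 10% stock to 0.1% in 1000 ml
    print(f"Sarcosine stock: {stock_volume_ml(10.0, 0.1, 1000.0):g} ml")
    # Triton X-100: 100% stock to 0.1% in 1000 ml
    print(f"Triton X-100: {stock_volume_ml(100.0, 0.1, 1000.0):g} ml")
    # X-gluc: 50 mg/ml stock to 100 µg/ml (0.1 mg/ml) in 1000 ml
    print(f"X-gluc stock: {stock_volume_ml(50.0, 0.1, 1000.0):g} ml")
```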

Appendix V: Genome annotations of vB_RleM_P10VF: Open reading frames in the genome and their predicted functions

Position

Associated Putative Putative Best BLASTP match (es), (GenBank Accession No:, E value), % Start End Conserved Domains

ORF Function Identity/ % Similarity Strand (bp) (bp) Length (E value) Start Codon Start Orientation/ Orientation/

1 ORF 1 ATG 61 177 117 + None No significant similarity

Hypothetical protein Lw1_gp114 [Escherichia phage Lw1] Hypothetical (YP_008060637.1, 1e-31), 23/40 2 ATG 256 2490 2235 + None protein Hypothetical protein RB16p109 [Enterobacteria phage RB16] ( YP_003858409.1, 2e-31), 23/40

3 ORF 2 ATG 2590 2748 159 + None No significant similarity Hypothetical protein SmphiM12_197 [Sinorhizobium phage phiM12] Hypothetical (AGR47829.1, 1e-25), 35/55 4 ATG 2811 3413 603 + None protein MULTISPECIES: hypothetical protein [Xanthomonadaceae] (WP_010486539.1, 5e-23), 37/52 5 ORF 5 TTG 3394 3735 342 + None No significant similarity

6 ORF 6 ATG 3716 4045 330 + None No significant similarity

7 ORF 7 ATG 4042 4344 303 + None No significant similarity

Hypothetical Hypothetical protein PBI_HAWKEYE_80 [Mycobacterium phage 8 ATG 4341 4805 465 + None protein Hawkeye] (YP_009035975.1, 2e-14), 49/64 Hypothetical Hypothetical protein [Cupriavidus sp. UYPR2.512] (WP_018315084.1, 4e- 9 ATG 4798 5202 405 + None protein 11), 34/51

10 ORF 10 ATG 5210 5440 231 + None No significant similarity

11 ORF 11 GTG 5437 5649 213 + None No significant similarity

12 ORF 12 ATG 5694 5993 300 + None No significant similarity

250

Position Associated Putative Putative Best BLASTP match (es), (GenBank Accession No:, E value), Conserved Domains

ORF Function Start End %Identity/ %Similarity Strand Length (E value) (bp) (bp) Start Codon Start Orientation/ Orientation/ [DEDDh] cd06127 DEDDh 3'-5' exonuclease DNA polymerase III subunit epsilon [Agrobacterium sp. ATCC 31749] Putative DNA domain family (1.75e-16) (WP_006311924.1, 2e-24), 31/46 13 polymerase III ATG 6243 7094 852 + [PRK09182] PRK09182 DNA polymerase III epsilon subunit-like 3'-5' exonuclease epsilon subunit DNA polymerase III subunit [Novosphingobium resinovorum] (EZP71500.1, 3e-24), 32/47 epsilon (8.00e-36) 14 ORF 14 GTG 7166 7459 294 + None No significant similarity [deoxycytidylate_deaminase ] cd01286 Deoxycytidylate deaminase dCMP deaminase [Caldilinea aerophila] (WP_014431564.1, 2e-20), 41/60 Putative dCMP 15 ATG 7559 8056 498 + domain. (4.53e-33) Cytidine deaminase [Acholeplasma modicum] (WP_026391404.1, 2e-19), deaminase [ComEB] COG2131 44/60 Deoxycytidylate deaminase (4.92e-26) 16 ORF 16 ATG 8060 8368 309 + None No significant similarity

17 ORF 17 TTG 8365 8859 495 + None No significant similarity

18 ORF 18 ATG 8863 9069 207 + None No significant similarity Exonuclease [Prochlorococcus phage P-SSM4] (YP_214561.1 ,1e-22), Conserved [PDDEXK_] pfam12705 37/55 19 hypothetical GTG 9350 10060 711 + PD-(D/E)XK nuclease Exonuclease [Synechococcus phage S-ShM2] (YP_004322895.1, 3e-22), protein superfamily (1.48e-04) 38/55 20 ORF 20 ATG 10071 10580 510 + None No significant similarity

21 ORF 21 ATG 10567 10824 258 + None No significant similarity

22 ORF 22 ATG 10848 11117 270 + None No significant similarity

23 ORF 23 ATG 11280 11663 384 + None No significant similarity

24 ORF 24 ATG 11665 11931 267 + None No significant similarity

25 ORF 25 ATG 11993 12421 429 + None No significant similarity 251

Position Associated Putative Putative Best BLASTP match (es), (GeneBank Accession No:, E value), % Conserved Domains

ORF Function Identity/ % Similarity Strand Start End Length (E value) Orientation/ Orientation/

Start Codon Start (bp) (bp)

26 ORF 26 ATG 12418 12618 201 + None No significant similarity

27 ORF 27 ATG 12620 12835 216 + None No significant similarity

28 ORF 28 ATG 12832 13341 510 + None No significant similarity

29 ORF 29 GTG 13338 13577 240 + None No significant similarity 30 ORF 30 ATG 13564 14229 666 + None No significant similarity 31 ORF 31 ATG 14229 14477 249 + None No significant similarity Hypothetical protein [Pseudomonas amygdali] (WP_005742387.1, 6e-18), Hypothetical 40/60 32 ATG 14477 14896 420 + None protein Hypothetical protein DDB_G0286989 [Dictyostelium discoideum AX4] (XP_637451.1, 2e-17), 39/57 33 ORF 33 ATG 14940 15314 375 + None No significant similarities 34 ORF 34 ATG 15314 15625 312 + None No significant similarities Hypothetical Hypothetical protein RHEph10_gp128 [Rhizobium phage RHEph10] 35 ATG 15625 15834 210 + None protein (AGC36171.1, 5e-16), 51/62 36 ORF 36 ATG 15834 16058 225 + None No significant similarities Putative conserved protein [Rhizobium sp. LPU83] (CDM57661.1, 8e-22), Conserved 41/61 37 hypothetical ATG 16132 16515 384 + Hypothetical protein [Afipia clevelandensis] (WP_002712619.1, 3e-18), protein 45/59 [MPP_MS158] cd07404 Microscilla MS158 and related proteins, Phosphatase [Rhizobium leguminosarum] (WP_011651777.1, 4e-60), Putative metallophosphatase domain 41/56 38 ATG 16512 17249 738 + phosphatase (2.79e-28) Phosphatase [Rhizobium leguminosarum] (WP_003581383.1, 5e-60), [acc_ester] TIGR03729 42/59 putative phosphoesterase (5.67e-04) 252

Position Associated Putative Putative Best BLASTP match (es), (GenBank Accession No:, E value), % Conserved Domains

ORF Function Identity/ % Similarity Strand Start End Length (E value) Orientation/ Orientation/

Start Codon Start (bp) (bp)

39 ORF 39 ATG 17328 17858 531 + None No significant similarities

40 ORF 40 ATG 17872 18084 213 + None No significant similarities [PGRP] cd06583 Peptidoglycan recognition proteins (PGRPs) N-acetylmuramoyl-L-alanine amidase [Anaerophaga sp. HS1] N- (6.65e-15) (WP_010527797.1, 1e-31), 43/62 41 acetylmuramoyl- ATG 18071 19006 936 + [PRK11789] PRK11789 N-acetylmuramoyl-L-alanine amidase [Anaerophaga thermohalophila] L-alanine amidase N-acetyl-anhydromuranmyl- (WP_010420541.1, 2e-31), 43/62 L-alanine amidase; (4.09e- 29) [GLF] pfam03275 UDP-galactopyranose mutase (1.67e-79) [Glf] COG0562 UDP-galactopyranose UDP-galactopyranose mutase [Mesorhizobium australicum] UDP- mutase [Cell envelope (WP_015318020.1, 9e-111), 46/65 42 galactopyranose ATG 19121 20335 1215 + biogenesis, outer membrane] UDP-galactopyranose mutase [Rhizobium tropici] (WP_015340214.1, 4e- mutase (6.19e-138) 109), 45/63 [UDP-GALP_mutase] TIGR00031 UDP-galactopyranose mutase (1.12e-90) 43 ORF 43 ATG 20335 20544 210 + None No significant similarities 44 ORF 44 ATG 20545 20721 177 + None No significant similarities 45 ORF 45 ATG 20802 21410 609 + None No significant similarities 46 ORF 46 ATG 21469 21924 456 + None No significant similarities 47 ORF 47 ATG 21936 22136 201 + None No significant similarities Hypothetical protein W45_23055 [Agrobacterium tumefaciens] Hypothetical (KAJ36252.1, 1e-61), 43/55 48 ATG 23258 22239 1020 - None protein Hypothetical protein CHOED_010 [Vibrio phage CHOED] (YP_009021676.1, 4e-55), 38/50 253

Position Associated Putative Putative Best BLASTP match (es), (GenBank Accession No:, E value), % Conserved Domains

ORF Function Identity/ % Similarity Strand Start End Length (E value) Orientation/ Orientation/ Start Codon Start (bp) (bp)

Head decoration protein [Roseobacter sp. MED193] (WP_009811270.1, Putative head 3e-15), 46/56 49 ATG 24791 23262 1530 - None decoration protein Head decoration protein [Roseobacter sp. AzwK-3b] (WP_007812525.1, 3e-13), 55/69 Hypothetical protein [Photobacterium profundum] (WP_011221326.1, 8e- Hypothetical 42), 34/46 50 GTG 25936 24791 1146 - None protein Hypothetical protein [Candidatus Puniceispirillum marinum] (WP_013045242.1, 2e-39), 31/44 Hypothetical protein [Lysinibacillus fusiformis] (WP_025115867.1, 8e-17), Conserved [Peptidase_S74] pfam13884 33/50 51 hypothetical ATG 27726 25933 1794 - Chaperone of endosialidase Hypothetical protein [Novosphingobium lindaniclasticum] protein (3.39e-09) (WP_021234828.1, 2e-13), 46/62 Hypothetical protein [Salmonella phage SKML-39] (YP_007236098.1, 4e- 121), 26/42 Hypothetical Unknown structural protein [Dickeya phage Limestone] (YP_007237479.1, 52 ATG 33080 27930 5151 - None protein 5e-117), 27/43 Hypothetical protein [Shigella phage Ag3] (YP_003358661.1, 4e-116), 26/42 Baseplate wedge [Synechococcus phage S-CBM2] (AFK66262.1,8e-14), 30/47 Phage baseplate [Ube1_repeat1] cd01491 Baseplate wedge [Synechococcus phage S-ShM2] (YP_004322764.1, 2e- 53 wedge subunit ATG 34559 33120 1440 - Ubiquitin activating enzyme 13), 24/42 (T4-like gp8) (E1) ( 2.44e-05) gp8 baseplate wedge protein [Synechococcus phage S-MbCM6] (YP_007001809.1, 4e-13), 25/41

Hypothetical Hypothetical protein [Rhizobium mongolense] (WP_022717931.1,4e-07), 54 ATG 34847 34566 282 - None protein 37/56

Putative tail [15] PHA02556 gp15 proximal tail sheath stabilization protein [Synechococcus phage sheath tail sheath stabilizer and metaG-MbCM1] (YP_007001600.1, 6e-22), 35/56 55 ATG 35658 34858 801 - stabilization completion protein Proximal tail sheath stabilization [Synechococcus phage S-SSM5] protein (1.34e-24) (YP_004324707.1, 2e-21), 33/51

56 ORF 56 ATG 36074 35730 345 - None No significant similarities 254

Position Associated Putative Putative Best BLASTP match (es), (GenBank Accession No:, E value), Conserved Domains

ORF Function %Identity/ %Similarity Strand Start End Length (E value) Orientation/ Orientation/ Start Codon Start (bp) (bp)

[T4_neck-protein] gp14 head completion protein [Acinetobacter phage Acj61] Putative T4-like pfam11649 (YP_004009789.1, 2e-17), 33/52 57 ATG 36920 36084 837 - neck protein Virus neck protein (1.67e- gp14 neck protein [Enterobacteria phage CC31] (YP_004010024.1, 3e- 25) 15), 34/48 Neck protein [Sinorhizobium phage phiM12] (AGR47751.1, 2e-22), 27/47 Putative phage [13] PHA02554 58 ATG 37676 36924 753 - Neck protein [Synechococcus phage S-SSM4] (YP_007677257.1, 2e-19), neck protein neck protein (3.92e-160) 29/41 59 ORF 59 ATG 37975 37676 300 - None No significant similarities Phage baseplate Gp5 baseplate hub subunit and tail lysozyme [Klebsiella phage 0507-KN2- [5] PHA02596 hub subunit and 1] (|YP_008531971.1, 2e-46), 37/54 60 ATG 38865 37999 867 - baseplate hub subunit and tail lysozyme (T4- Baseplate hub subunit and tail lysozyme [Salmonella phage Marshall] tail lysozyme (5.44e-14) like gp5) (YP_008771749.1, 3e-46), 36/54 Hypothetical-Protein | belonging to T4-LIKE GC: 15 [Synechococcus Hypothetical phage S-PM2] (|YP_195244.1 , 8e-17), 24/41 61 ATG 40142 38862 1281 - None protein Hypothetical protein Syn1_014 [Prochlorococcus phage Syn1] (YP_004324386.1, 8e-15), 23/41 Conserved [ProQ] pfam04352 ProQ activator of osmoprotectant transporter ProP [Methylobacillus 62 hypothetical ATG 40952 40299 654 - ProQ/FINO family (6.28e- flagellatus] (WP_011480832.1, 6e-05), 30/45 protein 11) 63 ORF 63 ATG 41186 41001 186 - None No significant similarities 64 ORF 64 ATG 41347 41186 162 - None No significant similarities Putative [UvsY] pfam11056 recombination, Recombination, repair and UvsY [Synechococcus phage S-SKS1] (YP_007674498.1, 2e-13), 30/ 54 65 repair and ssDNA ATG 41802 41347 456 - ssDNA binding protein UvsY [Prochlorococcus phage P-SSM2] (YP_214370.1, 3e-12 ), 27/53 binding protein UvsY (4.15e-12) (UvsY-like protein) [3] PHA02576 Tail completion and sheath stabilizer protein gp3 tail completion and sheath stabilizer protein [Klebsiella phage KP15] Tail completion 
(1.87e-12) (YP_003580027.1, 5e-07), 26/51 66 and sheath GTG 42319 41795 525 - [Phage_T4_gp19] 3 gene product [Enterobacteria phage IME08] (YP_003734287.1,6e-07), stabilizer protein pfam06841 26/48 T4-like virus tail tube protein gp19 (3.80e-03) 255

Position Associated Putative Putative Best BLASTP match (es), (GenBank Accession No:, E value), % Conserved Domains

| ORF | Putative function | Start codon | Start (bp) | End (bp) | Length (bp) | Strand/orientation | Associated conserved domains (E value) | Best BLASTP match(es) (GenBank accession no., E value), % identity/% similarity |
|---|---|---|---|---|---|---|---|---|
| 67 | ORF 67 | ATG | 42716 | 42297 | 420 | - | None | No significant similarities |
| 68 | ORF 68 | ATG | 43474 | 42794 | 681 | - | None | No significant similarities |
| 69 | Putative ribonuclease H | GTG | 44440 | 43508 | 933 | - | [RNaseH_C] pfam09293 T4 RNase H, C terminal (6.08e-08); [rnh] PHA02567 RnaseH (1.52e-33) | Rnh RNaseH [Prochlorococcus phage Syn1] (YP_004324593.1, 3e-32), 31/53; RNase H [Synechococcus phage S-CAM1] (YP_007673021.1, 1e-31), 31/51 |
| 70 | Hypothetical protein | ATG | 44903 | 44496 | 408 | - | None | Hypothetical protein CcrColossus_gp027 [Caulobacter phage CcrColossus] (YP_006988261.1, 1e-16), 29/52 |
| 71 | ORF 71 | GTG | 45178 | 44900 | 279 | - | None | No significant similarities |
| 72 | Putative endonuclease | ATG | 45564 | 45166 | 399 | - | [Pyr_excise] pfam03013 Pyrimidine dimer DNA glycosylase (8.80e-19) | Endonuclease [Pseudochrobactrum sp. AO18b] (WP_022711926.1, 8e-26), 46/59; Endonuclease V N-glycosylase UV repair enzyme [Cronobacter phage vB_CsaM_GAP32] (YP_006987612.1, 5e-25), 44/64 |
| 73 | ORF 73 | ATG | 45820 | 45629 | 192 | - | None | No significant similarities |
| 74 | ORF 74 | ATG | 45987 | 45817 | 171 | - | None | No significant similarities |
| 75 | ORF 75 | ATG | 46166 | 45984 | 183 | - | None | No significant similarities |
| 76 | Probable ATP-dependent helicase | ATG | 47746 | 46163 | 1584 | - | [Helicase_C_2] pfam13307 Helicase C-terminal domain (3.75e-15); [DinG] COG1199 Rad3-related DNA helicase (3.97e-22) | ATP-dependent helicase [Serratia phage phiMAM1] (YP_007349084.1, 8e-62), 30/46; DNA helicase [Salmonella phage vB_SalM_SJ3] (YP_009030518.1, 4e-50), 30/46 |
| 77 | Putative homing endonuclease | ATG | 48260 | 47799 | 462 | - | [GIY-YIG_SegABCDEFG] cd10444 N-terminal catalytic GIY-YIG domain of bacteriophage T4 segABCDEFG gene encoding proteins (4.46e-13) | SegC homing endonuclease [Enterobacteria phage Phi1] (YP_001469494.1, 1e-08), 36/50; Putative Seg-like homing endonuclease GIY-YIG family [Aeromonas phage Aes012] (YP_007677835.1, 5e-07), 37/51 |
| 78 | ORF 78 | ATG | 48537 | 48271 | 267 | - | None | No significant similarities |
| 79 | ORF 79 | ATG | 48865 | 48635 | 231 | - | None | No significant similarities |
| 80 | Putative trypsin-like serine proteases | ATG | 49041 | 49730 | 690 | + | [Trypsin_2] pfam13365 Trypsin-like peptidase domain (1.53e-12); [degP_htrA_DO] TIGR02037 Periplasmic serine protease (1.15e-11) | Serine protease [Chelativorans sp. BNC1] (WP_011579594.1, 1e-20), 33/48; Hypothetical protein [Ochrobactrum sp. EGD-AQ16] (WP_021584953.1, 1e-17), 31/47 |
| 81 | ORF 81 | ATG | 50085 | 49822 | 264 | - | None | No significant similarities |
| 82 | ORF 82 | ATG | 50372 | 50157 | 216 | - | None | No significant similarities |
| 83 | Major capsid protein | ATG | 51802 | 50459 | 1344 | - | [23] PHA02541 Major capsid protein (7.33e-76); [Gp23] pfam07068 Major capsid protein Gp23 (4.79e-39) | Precursor of major head subunit [Sinorhizobium phage phiM12] (AGR47697.1, 1e-82), 40/52; Major capsid protein [Synechococcus phage S-CBM2] (AFK66286.1, 4e-77), 37/54 |
| 84 | ORF 84 | ATG | 53048 | 51873 | 1176 | - | None | No significant similarities |
| 85 | Phage prohead assembly (scaffolding) protein | ATG | 53715 | 53050 | 666 | - | [2] PHA00911 Prohead core scaffolding protein and protease (3.71e-47); [Peptidase_U9] pfam03420 Prohead core protein protease (7.92e-27) | Prohead core scaffold and protease [Synechococcus phage S-PM2] (YP_195140.1, 1e-37), 45/63; Prohead protease gp21 [Synechococcus phage S-PM2] (AAL09969.1, 1e-37), 45/62 |
| 86 | ORF 86 | ATG | 53971 | 53726 | 246 | - | None | No significant similarities |
| 87 | Putative portal vertex protein | ATG | 55719 | 54109 | 1611 | - | [20] PHA02531 Portal vertex protein (5.39e-124); [Peptidase_S80] pfam07230 Bacteriophage T4-like capsid assembly protein (Gp20) (5.70e-120) | Portal vertex protein of head [Synechococcus phage S-SSM5] (YP_004324725.1, 2e-117), 39/61; Portal protein [Cyanophage Syn30] (YP_007877943.1, 6e-117), 39/61 |
| 88 | Tail tube protein | ATG | 56352 | 55729 | 624 | - | [19] PHA02551 Tail tube protein (2.06e-14) | T4-like tail tube protein [Synechococcus phage S-CRM01] (YP_004508470.1, 1e-13), 30/50; Tail tube monomer [Synechococcus phage S-SKS1] (YP_007674508.1, 1e-12), 28/51 |
| 89 | Putative tail sheath protein | GTG | 58951 | 56432 | 2520 | - | [18] PHA02539 Tail sheath protein (2.45e-56) | Tail sheath monomer [Sinorhizobium phage phiM12] (AGR47739.1, 3e-49), 30/46; Tail sheath protein [Pelagibacter phage HTVC008M] (YP_007517947.1, 3e-49), 33/49 |
| 90 | Head completion protein | ATG | 59479 | 59018 | 462 | - | [4] PHA02552 Head completion protein (3.35e-61) | Head completion protein [Enterobacteria phage RB49] (NP_891703.1, 1e-39), 45/61; gp4 head completion protein [Enterobacteria phage Phi1] (YP_001469477.1, 2e-39), 45/61 |
| 91 | ORF 91 | ATG | 59878 | 59504 | 375 | - | None | No significant similarities |
| 92 | ORF 92 | ATG | 60033 | 59878 | 156 | - | None | No significant similarities |
| 93 | RNA polymerase sigma-70 factor | ATG | 60609 | 60073 | 537 | - | [PRK12547] PRK12547 RNA polymerase sigma factor (8.27e-21); [RpoE] COG1595 DNA-directed RNA polymerase specialized sigma subunit, sigma24 homolog (2.91e-26) | RNA polymerase sigma factor [Hyphomicrobium nitrativorans] (WP_023786975.1, 4e-19), 33/49; RNA polymerase sigma factor [Methylocystis rosea] (WP_018406644.1, 9e-18), 32/53 |
| 94 | Phosphate starvation-inducible protein PhoH | ATG | 60723 | 61481 | 759 | + | [PhoH] pfam02562 PhoH-like protein (2.92e-68) | Hypothetical protein [Proteus hauseri] (WP_023582014.1, 1e-44), 41/61; Phosphate starvation protein PhoH [Proteus mirabilis] (WP_017826955.1, 3e-44), 39/59 |
| 95 | ORF 95 | ATG | 64079 | 61452 | 2628 | - | None | No significant similarities |
| 96 | Putative base plate wedge subunit | ATG | 65859 | 64084 | 1776 | - | [6] PHA02553 Baseplate wedge subunit (1.10e-85) | Baseplate wedge [Pelagibacter phage HTVC008M] (YP_007517920.1, 3e-79), 29/47; Baseplate wedge [Synechococcus phage S-MbCM7] (YP_009008210.1, 8e-73), 29/49 |
| 97 | Putative base plate wedge subunit | ATG | 66248 | 65859 | 390 | - | [25] PHA00415 Baseplate wedge subunit (9.67e-11); [GPW_gp25] pfam04965 Gene 25-like lysozyme (4.68e-07) | Base plate wedge subunit [Synechococcus phage S-PM2] (YP_195113.1, 9e-08), 36/56; Base plate wedge subunit [Prochlorococcus phage Syn1] (YP_004324463.1, 7e-06), 36/55 |
| 98 | Conserved hypothetical protein | ATG | 66743 | 66255 | 489 | - | [16] PHA02585 Small terminase protein (4.37e-05) | Terminase small subunit [uncultured phage MedDCM-OCT-S09-C7] (ADD95599.1, 1e-04), 24/50 |
| 99 | ORF 99 | ATG | 66967 | 66743 | 225 | - | None | No significant similarities |
| 100 | DNA helicase | ATG | 68507 | 66975 | 1533 | - | [uvsW] PHA02558 UvsW helicase (3.46e-115); [SSL2] COG1061 DNA or RNA helicases of superfamily II (4.18e-26) | Helicase [Synechococcus phage S-CRM01] (YP_004508480.1, 1e-83), 34/55; UvsW [Cyanophage P-RSM6] (YP_007675121.1, 1e-80), 33/56 |
| 101 | ORF 101 | ATG | 68595 | 68843 | 249 | + | None | No significant similarities |
| 102 | ORF 102 | ATG | 68840 | 69025 | 186 | + | None | No significant similarities |
| 103 | ORF 103 | ATG | 69025 | 69216 | 192 | + | None | No significant similarities |
| 104 | Hypothetical protein | ATG | 69213 | 69608 | 396 | + | None | Hypothetical protein [Verrucomicrobium spinosum] (WP_009962169.1, 2e-19), 45/61; Hypothetical protein [Pseudomonas phage PhiPA3] (AEH03652.1, 3e-19), 40/57 |
| 105 | ORF 105 | ATG | 69896 | 70249 | 354 | + | None | No significant similarities |
| 106 | Hypothetical protein | ATG | 70259 | 70528 | 270 | + | None | Hypothetical protein [Agrobacterium tumefaciens] (WP_003496590.1, 3e-43), 79/ |
| 107 | ORF 107 | ATG | 70521 | 70640 | 120 | + | None | No significant similarities |
| 108 | ORF 108 | ATG | 70713 | 70955 | 243 | + | None | No significant similarities |
| 109 | ORF 109 | ATG | 71048 | 71257 | 210 | + | None | No significant similarities |
| 110 | Phage baseplate hub subunit and tail lysozyme (T4-like gp5) | ATG | 73802 | 71247 | 2556 | - | [5] PHA02596 Baseplate hub subunit and tail lysozyme (8.23e-24) | Baseplate hub subunit and tail lysozyme [Aeromonas phage 31] (YP_238865.1, 1e-21), 31/45; Baseplate hub subunit and tail lysozyme [Aeromonas phage 44RR2.8t] (NP_932493.1, 1e-21), 31/45 |
| 111 | Terminase large subunit | ATG | 75515 | 73806 | 1710 | - | [17] PHA02533 Large terminase protein (1.30e-166) | Large terminase protein [uncultured phage MedDCM-OCT-S09-C7] (ADD95601.1, 5e-117), 40/59; Terminase large subunit [Cyanophage P-RSM6] (YP_007675141.1, 9e-117), 39/58 |
| 112 | Conserved hypothetical protein | ATG | 75884 | 75525 | 360 | - | [Cadherin_repeat] cd11304 Cadherin tandem repeat domain (2.72e-04) | Hypothetical protein [Bradyrhizobium sp. WSM1417] (WP_027516250.1, 5e-11), 39/61; Hypothetical protein [Bradyrhizobium sp. URHA0002] (WP_027541758.1, 1e-10), 40/61 |
| 113 | Hypothetical protein | ATG | 76275 | 75946 | 330 | - | None | Hypothetical protein IE4771_CH01952 [Rhizobium etli bv. mimosae str. IE4771] (AIC27069.1, 3e-15), 42/52; Hypothetical protein [Rhizobium leguminosarum] (WP_003565238.1, 1e-13), 40/52 |
| 114 | Conserved hypothetical protein | ATG | 79965 | 76348 | 3618 | - | None | Hypothetical protein RHEph10_gp135 [Rhizobium phage RHEph10] (AGC36178.1, 0.0), 65/79; Hypothetical protein L338C_168 [Rhizobium phage vB_RleS_L338C] (YP_009003325.1, 3e-160), 48/60 |
| 115 | Conserved hypothetical protein | ATG | 81396 | 79966 | 1431 | - | [Cadherin_repeat] cd11304 Cadherin tandem repeat domain (1.41e-04) | Cellulosome anchoring protein [Rhodopirellula baltica] (WP_007331294.1, 7e-23), 27/42; VCBS family protein [Rhodopirellula europaea] (WP_008653861.1, 9e-23), 25/46 |
| 116 | Hypothetical protein | ATG | 83138 | 81429 | 1710 | - | None | Hypothetical protein RHEph10_gp003 [Rhizobium phage RHEph10] (AGC36047.1, 3e-86), 37/52; MULTISPECIES: hypothetical protein [Rhizobium] (WP_018449494., 2e-74), 36/50 |
| 117 | ORF 117 | ATG | 83788 | 83195 | 594 | - | None | No significant similarities |
| 118 | ORF 118 | ATG | 84076 | 83795 | 282 | - | None | No significant similarities |
| 119 | ORF 119 | ATG | 84635 | 84880 | 246 | + | None | No significant similarities |
| 120 | ORF 120 | ATG | 84942 | 85514 | 573 | + | None | No significant similarities |
| 121 | Putative endonuclease | ATG | 86062 | 85517 | 546 | - | [GIY-YIG_SegABCDEFG] cd10444 N-terminal catalytic GIY-YIG domain of bacteriophage T4 segABCDEFG gene encoding proteins (3.27e-18) | Hypothetical protein [Clostridium methoxybenzovorans] (WP_024348460.1, 2e-18), 41/62; gp330 [Bacillus phage G] (YP_009015633.1, 5e-18), 40/55; Putative GIY-YIG family Seg-like homing endonuclease [Acinetobacter phage Ac42] (YP_004009528.1, 5e-16), 35/49 |
| 122 | Single stranded DNA-binding protein (T4-like phage gp32) | ATG | 86182 | 87204 | 1023 | + | [32] PHA02550 Single-stranded DNA binding protein (7.43e-24) | Single-stranded DNA-binding protein [Prochlorococcus phage P-SSM2] (ACY75884.1, 2e-31), 32/51; Single-stranded DNA binding protein [Synechococcus phage S-SSM4] (YP_007677400.1, 6e-30), 28/46 |
| 123 | Conserved hypothetical protein | ATG | 89090 | 87273 | 1818 | - | [PRK03918] PRK03918 Chromosome segregation protein (1.12e-05) | Hypothetical protein RirG_008610 [Rhizophagus irregularis DAOM 197198w] (EXX79131., 2e-05), 25/44; Hypothetical protein [Veillonella sp. ACP1] (WP_009661121.1, 2e-05), 23/42 |
| 124 | ORF 124 | ATG | 89518 | 89090 | 429 | - | None | No significant similarities |
| 125 | Conserved hypothetical protein | ATG | 89691 | 89515 | 177 | - | [PHA02078] PHA02078 Hypothetical protein (8.55e-04) | No significant similarities |
| 126 | Putative base plate subunit | ATG | 90383 | 89691 | 693 | - | [T4_baseplate] pfam12322 T4 bacteriophage base plate protein (3.09e-05) | Baseplate hub subunit [Prochlorococcus phage P-SSM7] (YP_004324840.1, 2e-13), 27/46; Baseplate hub subunit [Prochlorococcus phage P-SSM4] (YP_214575.1, 4e-09), 25/47 |
| 127 | Putative baseplate tail tube initiator | ATG | 90928 | 90395 | 534 | - | None | Putative baseplate tail tube initiator gp54 [Dickeya phage Limestone] (YP_007237372.1, 3e-11), 28/48; Baseplate tail tube [Klebsiella phage 0507-KN2-1] (YP_008531978.1, 5e-11), 26/49 |
| 128 | Phage DNA end protector protein during packaging | ATG | 91554 | 90925 | 630 | - | [2] PHA02577 DNA end protector protein (9.96e-34) | DNA end protector protein [Aeromonas phage 31] (YP_238862.1, 1e-26), 32/51; DNA end protector protein [Aeromonas phage 44RR2.8t] (NP_932490.1, 1e-26), 32/51 |
| 129 | ORF 129 | ATG | 91640 | 91792 | 153 | + | None | No significant similarities |
| 130 | Phage-associated homing endonuclease | ATG | 92421 | 91903 | 519 | - | [GIY-YIG_SegABCDEFG] cd10444 N-terminal catalytic GIY-YIG domain of bacteriophage T4 segABCDEFG gene encoding proteins (4.46e-13) | Hypothetical protein [Clostridium botulinum] (WP_024931749.1, 1e-12), 36/50; SegC homing endonuclease [Enterobacteria phage Phi1] (YP_001469494.1, 2e-11), 29/45 |
| 131 | ORF 131 | ATG | 92518 | 92718 | 201 | + | None | No significant similarities |
| 132 | ORF 132 | ATG | 92737 | 92943 | 207 | + | None | No significant similarities |
| 133 | ORF 133 | ATG | 92943 | 93182 | 240 | + | None | No significant similarities |
| 134 | ORF 134 | ATG | 93179 | 93454 | 276 | + | None | No significant similarities |
| 135 | ORF 135 | ATG | 93451 | 93987 | 537 | + | None | No significant similarities |
| 136 | Hypothetical protein | ATG | 94059 | 94310 | 252 | + | None | Hypothetical protein SmphiM12_090 [Sinorhizobium phage phiM12] (AGR47722.1, 5e-10), 41/66 |
| 137 | ORF 137 | ATG | 94312 | 94596 | 285 | + | None | No significant similarities |
| 138 | ORF 138 | ATG | 94604 | 94831 | 228 | + | None | No significant similarities |
| 139 | ORF 139 | ATG | 94920 | 95234 | 315 | + | None | No significant similarities |
| 140 | ORF 140 | ATG | 95242 | 95460 | 219 | + | None | No significant similarities |
| 141 | ORF 141 | ATG | 95529 | 95888 | 360 | + | None | No significant similarities |
| 142 | ORF 142 | TTG | 95890 | 96024 | 135 | + | None | No significant similarities |
| 143 | ORF 143 | ATG | 96021 | 96245 | 225 | + | None | No significant similarities |
| 144 | ORF 144 | ATG | 96731 | 96414 | 318 | - | None | No significant similarities |
| 145 | ORF 145 | ATG | 97068 | 96724 | 345 | - | None | No significant similarities |
| 146 | ORF 146 | ATG | 97463 | 97068 | 396 | - | None | No significant similarities |
| 147 | ORF 147 | ATG | 97785 | 97456 | 330 | - | None | No significant similarities |
| 148 | DNA primase, T4-like | TTG | 97942 | 98904 | 963 | + | [61] PHA02540 DNA primase (6.30e-61) | DNA primase subunit gp61 [Synechococcus phage S-CBM2] (AFK66330.1, 5e-37), 32/49; DNA primase subunit [Prochlorococcus phage P-HM2] (YP_004323543.1, 7e-37), 30/50 |
| 149 | Ribonucleotide reductase beta subunit | ATG | 98918 | 99928 | 1011 | + | [RNRR2] cd01049 Ribonucleotide Reductase, R2/beta subunit, ferritin-like diiron-binding domain (5.14e-38); [NrdF] COG0208 Ribonucleotide reductase, beta subunit (6.76e-22) | Putative ribonucleoside-diphosphate reductase protein, beta subunit [Rhizobium phage RHEph10] (AGC36114.1, 1e-1010, 47/67 |
| 150 | Ribonucleotide reductase alpha subunit | ATG | 99936 | 101600 | 1665 | + | [RNR_I] cd01679 Class I ribonucleotide reductase (2.30e-134); [Ribonuc_red_lgC] pfam02867 Ribonucleotide reductase, barrel domain (2.06e-96) | Hypothetical protein [Spiribacter sp. UAH-SP71] (WP_023368018.1, 1e-180), 50/63; Ribonucleotide-diphosphate reductase subunit alpha [Mannheimia varigena] (WP_025216694.1, 2e-180), 48/67 |
| 151 | Conserved hypothetical protein | ATG | 101683 | 101955 | 273 | + | [GrxC] COG0695 Glutaredoxin and related proteins (4.10e-03) | Glutaredoxin [gamma proteobacterium SCGC AAA168-I18] (WP_020031309.1, 2e-04), 33/53 |
| 152 | ORF 152 | ATG | 101918 | 102217 | 300 | + | None | No significant similarities |
| 153 | ORF 153 | TTG | 102228 | 102497 | 270 | + | None | No significant similarities |
| 154 | ORF 154 | ATG | 102559 | 102828 | 270 | + | None | No significant similarities |
| 155 | Hypothetical protein | ATG | 102954 | 103265 | 312 | + | None | Hypothetical protein [Phyllobacterium sp. YR531] (WP_008121927.1, 7e-12), 40/56; Hypothetical protein [Microbulbifer variabilis] (WP_020413317.1, 9e-07), 34/50 |
| 156 | ORF 156 | ATG | 103277 | 103540 | 264 | + | None | No significant similarities |
| 157 | Conserved hypothetical protein | TTG | 103558 | 104670 | 1113 | + | [RfaG] COG0438 Glycosyltransferase (8.33e-06) | No significant similarities |
| 158 | ORF 158 | ATG | 104748 | 105020 | 273 | + | None | No significant similarities |
| 159 | ORF 159 | ATG | 105020 | 105421 | 402 | + | None | No significant similarities |
| 160 | ORF 160 | ATG | 105421 | 105600 | 180 | + | None | No significant similarities |
| 161 | ORF 161 | ATG | 105602 | 105943 | 342 | + | None | No significant similarities |
| 162 | ORF 162 | ATG | 105940 | 106149 | 210 | + | None | No significant similarities |
| 163 | ORF 163 | ATG | 106366 | 106650 | 285 | + | None | No significant similarities |
| 164 | ORF 164 | TTG | 106700 | 106963 | 264 | + | None | No significant similarities |
| 165 | ORF 165 | ATG | 107015 | 107311 | 297 | + | None | No significant similarities |
| 166 | ORF 166 | ATG | 107308 | 107460 | 153 | + | None | No significant similarities |
| 167 | ORF 167 | GTG | 107457 | 108245 | 789 | + | None | No significant similarities |
| 168 | Putative base plate wedge protein | TTG | 108247 | 108834 | 588 | + | [Phage_gp53] pfam11246 Base plate wedge protein 53 (4.76e-06) | Baseplate wedge subunit [Synechococcus phage S-RSM4] (YP_003097469.1, 5e-06), 24/42; Baseplate wedge subunit [Pelagibacter phage HTVC008M] (YP_007517894.1, 3e-05), 24/42 |
| 169 | ORF 169 | ATG | 108947 | 109312 | 366 | + | None | No significant similarities |
| 170 | Putative RecA-like recombination protein | TTG | 109328 | 110518 | 1191 | + | [RecA] COG0468 RecA/RadA recombinase (1.71e-03) | UvsX RecA-like recombination protein [Aeromonas phage PX29] (YP_009011443.1, 1e-64), 36/55; UvsX RecA-like recombination protein [Acinetobacter phage 133] (YP_004300647.1, 1e-64), 36/56 |
| 171 | DNA primase/helicase | ATG | 110583 | 112052 | 1470 | + | [41] PHA02542 41 helicase (4.39e-120) | DNA primase subunit [Synechococcus phage S-RSM4] (YP_003097303.1, 7e-89), 34/55; DNA primase-helicase [Synechococcus phage S-MbCM7] (YP_009008282.1, 7e-88), 33/55; DNA primase-helicase subunit [Synechococcus phage S-CAM1] (YP_007673069.1, 9e-87), 34/55 |
| 172 | Hypothetical protein | ATG | 112062 | 112661 | 600 | + | None | Hypothetical protein [Delftia phage phiW-14] (YP_003358879.1, 7e-12), 31/46; Junction endodeoxyribonuclease [Serratia phage phiMAM1] (YP_007349117.1, 2e-09), 28/40 |
| 173 | ORF 173 | ATG | 112665 | 113090 | 426 | + | None | No significant similarities |
| 174 | Hypothetical protein | ATG | 113152 | 113739 | 588 | + | None | Hypothetical protein [Pseudomonas phage YuA] (YP_001595841.1, 1e-11), 36/46; Hypothetical protein MP1412_17 [Pseudomonas phage MP1412] (YP_006561024.1, 2e-10), 34/45 |
| 175 | Putative deoxynucleotide monophosphate kinase | ATG | 113739 | 114311 | 573 | + | None | Deoxynucleotide monophosphate kinase [Pseudomonas syringae] (WP_024659829.1, 3e-12), 32/51; Deoxynucleotide monophosphate kinase [Pseudomonas syringae] (WP_024670653.1, 9e-12), 32/51 |
| 176 | ORF 176 | ATG | 114322 | 114774 | 453 | + | None | No significant similarities |
| 177 | Putative pyrimidine hydroxymethylase | ATG | 114838 | 115620 | 783 | + | [TS_Pyrimidine_Hmase] cd00351 Thymidylate synthase and pyrimidine hydroxymethylase (4.40e-18); [Thymidylat_synt] pfam00303 Thymidylate synthase (9.85e-17) | Hypothetical protein JS09_0136 [Escherichia phage vB_EcoM_JS09] (YP_009037459.1, 5e-25), 33/51; gp42 dCMP hydroxymethylase [Acinetobacter phage Acj9] (YP_004010226.1, 1e-24), 36/53 |
| 178 | Hypothetical protein | ATG | 115617 | 116648 | 1032 | + | None | Hypothetical protein Acj61p079 [Acinetobacter phage Acj61] (YP_004009696.1, 7e-07), 27/39; Hypothetical protein Acj9p088 [Acinetobacter phage Acj9] (YP_004010225.1, 4e-06), 25/40 |
| 179 | ORF 179 | ATG | 116658 | 117293 | 636 | + | None | No significant similarities |
| 180 | Putative DNA topoisomerase large subunit | ATG | 117335 | 119281 | 1947 | + | [GyrB] COG0187 Type IIA topoisomerase (1.83e-84); [39] PHA02569 DNA topoisomerase II large subunit (1.92e-81); [PRK05559] PRK05559 DNA topoisomerase IV subunit B (1.32e-65) | Topoisomerase II large subunit [Escherichia phage PBECO 4] (AGC35133.1, 2.00E-142), 40/58; DNA gyrase subunit B [Cronobacter phage vB_CsaM_GAP32] (YP_006987424.1, 7.00E-142), 41/57 |
| 181 | ORF 181 | ATG | 119291 | 119602 | 312 | + | None | No significant similarities |
| 182 | Hypothetical protein | TTG | 119698 | 120219 | 522 | + | [PTZ00427] PTZ00427 isoleucine-tRNA ligase (9.16e-03) | No significant similarities |
| 183 | Hypothetical protein | ATG | 120289 | 120504 | 216 | + | [Nckap1] pfam09735 Membrane-associated apoptosis protein (8.51e-03) | No significant similarities |
| 184 | ORF 184 | TTG | 120517 | 120882 | 366 | + | None | No significant similarities |
| 185 | ORF 185 | ATG | 120879 | 121052 | 174 | + | None | No significant similarities |
| 186 | Hypothetical protein | ATG | 121045 | 121287 | 243 | + | None | Hypothetical protein 7-7-1_00059 [Agrobacterium phage 7-7-1] (YP_007006515.1, 7e-10), 43/58 |
| 187 | Topoisomerase IV subunit A | ATG | 121346 | 122704 | 1359 | + | [52] PHA02592 DNA topoisomerase II medium subunit (2.24e-64); [DNA_topoisoIV] pfam00521 DNA gyrase/topoisomerase IV, subunit A (9.44e-66) | Topoisomerase II medium subunit [Escherichia phage PBECO 4] (AGC35132.1, 6e-61), 31/55; Topoisomerase II medium subunit [Enterobacteria phage vB_KleM-RaK2] (YP_007007314.1, 8e-59), 30/54 |
| 188 | ORF 188 | TTG | 122715 | 122963 | 249 | + | None | No significant similarities |
| 189 | ORF 189 | ATG | 123034 | 123225 | 192 | + | None | No significant similarities |
| 190 | Hypothetical protein | ATG | 123298 | 123687 | 390 | + | None | Hypothetical protein PAK_P100160c [Pseudomonas phage PAK_P1] (YP_004327175.1, 2e-22), 46/61; Hypothetical protein DFL12P1_0030 [Dinoroseobacter phage IMEphi4] (AHX00991.1, 2e-19), 43/66 |
| 191 | ORF 191 | ATG | 123684 | 124127 | 444 | + | None | No significant similarities |
| 192 | Hypothetical protein | ATG | 124124 | 124729 | 606 | + | None | Hypothetical protein [Comamonas badia] (WP_024538526.1, 1e-31), 40/60; Hypothetical protein [Bradyrhizobium sp. S23321] (WP_015685103.1, 5e-31), 44/60 |
| 193 | ORF 193 | ATG | 124739 | 125188 | 450 | + | None | No significant similarities |
| 194 | Putative RNA polymerase sigma factor | ATG | 125245 | 125733 | 489 | + | [55] PHA02547 RNA polymerase sigma factor (1.02e-11) | gp55 [Synechococcus phage S-RIM8 A.HR1] (YP_007518209.1, 6e-13), 31/50; Hypothetical protein CPPG_00172 [Cyanophage P-RSM1] (YP_007877723.1, 7e-11), 30/47; Late transcription sigma factor [Synechococcus phage syn9] (YP_717814.2, 9e-11), 36/54 |
| 195 | Putative endonuclease subunit | ATG | 125733 | 126713 | 981 | + | [47] PHA02546 endonuclease subunit (3.45e-06) | Recombination endonuclease subunit [Synechococcus phage S-RSM4] (YP_003097327.1, 2e-07), 23/42 |
| 196 | Putative endonuclease subunit | TTG | 126717 | 128342 | 1626 | + | [46] PHA02562 endonuclease subunit (3.89e-95) | gp46 recombination endonuclease [Synechococcus phage S-CAM8] (AET72554.1, 2e-80), 32/53; Hypothetical protein SXCG_00167 [Synechococcus phage S-CAM8] (YP_008125654.1, 4e-80), 32/53 |
| 197 | ORF 197 | ATG | 128344 | 128559 | 216 | + | None | No significant similarities |
| 198 | Putative glutamyl-tRNA amidotransferase subunit B | ATG | 128559 | 128825 | 267 | + | [GatB_Yqey] pfam02637 GatB domain (6.41e-09); [gatB] PRK05477 Aspartyl/glutamyl-tRNA amidotransferase subunit B (1.47e-10) | Aspartyl/glutamyl-tRNA(Asn/Gln) amidotransferase subunit B [Caulobacter vibrioides] (WP_004620094.1, 3e-10), 56/66; Glutamyl-tRNA amidotransferase subunit B [Caulobacter vibrioides] (WP_010920294.1, 7e-10), 54/66 |
| 199 | Putative sliding clamp | TTG | 128901 | 129611 | 711 | + | [45] PHA02545 Sliding clamp (1.04e-09) | Sliding clamp DNA polymerase accessory protein [Pelagibacter phage HTVC008M] (YP_007517980.1, 3e-11), 26/44; Sliding clamp DNA polymerase accessory protein [Synechococcus phage Syn19] (YP_004323972., 1e-09), 27/45 |
| 200 | Putative clamp loader subunit | TTG | 129622 | 130572 | 951 | + | [4] PHA02544 Clamp loader, small subunit (1.17e-89); [rfc] PRK00440 replication factor C small subunit (3.15e-56); [PLN03025] PLN03025 replication factor C subunit (4.00e-32) | Sliding clamp loader [Synechococcus phage syn9] (YP_717826.1, 4e-62), 37/58; Clamp loader subunit [Synechococcus phage S-SSM5] (YP_004324752.1, 5e-61), 36/58 |
| 201 | Putative clamp loader subunit | ATG | 130569 | 131000 | 432 | + | [62] PHA02593 Clamp loader small subunit (8.61e-06) | Putative clamp loader subunit gp62 [Dickeya phage Limestone] (YP_007237440.1, 3e-04), 31/52 |
| 202 | ORF 202 | ATG | 131002 | 131397 | 396 | + | None | No significant similarities |
| 203 | ORF 203 | ATG | 131394 | 131633 | 240 | + | None | No significant similarities |
| 204 | ORF 204 | ATG | 131705 | 131929 | 225 | + | None | No significant similarities |
| 205 | ORF 205 | GTG | 131926 | 132186 | 261 | + | None | No significant similarities |
| 206 | ORF 206 | ATG | 132188 | 132520 | 333 | + | None | No significant similarities |
| 207 | ORF 207 | ATG | 132522 | 132635 | 114 | + | None | No significant similarities |
| 208 | Hypothetical protein | ATG | 132721 | 132975 | 255 | + | None | gp44 [Phage phiJL001] (YP_223968.1, 3e-07), 47/54 |
| 209 | ORF 209 | ATG | 133026 | 133265 | 240 | + | None | No significant similarities |
| 210 | ORF 210 | ATG | 133277 | 133789 | 513 | + | None | No significant similarities |
| 211 | ORF 211 | ATG | 133789 | 134007 | 219 | + | None | No significant similarities |
| 212 | ORF 212 | ATG | 133958 | 134398 | 441 | + | None | No significant similarities |
| 213 | Adenine-specific methylase | ATG | 134392 | 135225 | 834 | + | [Dam] COG0338 Site-specific DNA methylase (6.92e-24); [MethyltransfD12] pfam02086 D12 class N6 adenine-specific DNA methyltransferase (3.11e-20) | DNA adenine methylase [Hippea sp. KM1] (WP_025209635.1, 8e-18), 27/47; DNA adenine methylase [Sulfurihydrogenibium azorense] (WP_012675033.1, 4e-17), 26/50 |
| 214 | DNA polymerase, T4-like | ATG | 135263 | 137797 | 2535 | + | [43] PHA02528 DNA polymerase (2.44e-154) | DNA polymerase [Synechococcus phage S-SKS1] (YP_007674467.1, 3e-118), 32/53; DNA polymerase [Synechococcus phage syn9] (YP_717843.1, 1e-116), 31/51 |
| 215 | ORF 215 | ATG | 137809 | 138573 | 765 | + | None | No significant similarities |
| 216 | ORF 216 | GTG | 138588 | 138788 | 201 | + | None | No significant similarities |
| 217 | ATP-dependent DNA ligase | ATG | 138862 | 140124 | 1263 | + | [30] PHA02587 DNA ligase (2.60e-25); [CDC9] COG1793 ATP-dependent DNA ligase (1.44e-26) | ATP-dependent DNA ligase [Paenibacillus alvei] (WP_005552334.1, 2e-22), 30/49; gp32.95 [Bacillus phage SPO1] (YP_002300428.1, 3e-22), 28/46 |
| 218 | ORF 218 | ATG | 140126 | 140281 | 156 | + | None | No significant similarities |
| 219 | ORF 219 | ATG | 140283 | 140495 | 213 | + | None | No significant similarities |
| 220 | ORF 220 | ATG | 140495 | 140845 | 351 | + | None | No significant similarities |
| 221 | ORF 221 | ATG | 140845 | 141174 | 330 | + | None | No significant similarities |
| 222 | ORF 222 | ATG | 141171 | 141560 | 390 | + | None | No significant similarities |
| 223 | Hypothetical protein | ATG | 141557 | 142147 | 591 | + | None | Hypothetical protein [Tenacibaculum maritimum] (WP_024740060.1, 3e-11), 31/; Hypothetical protein VPFG_00208 [Vibrio phage nt-1] (YP_008125357.1, 7e-11), 32/ |
| 224 | Putative thymidylate synthase | ATG | 142144 | 142983 | 840 | + | [Thymidylat_synt] pfam00303 Thymidylate synthase (1.14e-130) | Thymidylate synthase [Alistipes sp. CAG:831] (WP_021928618.1, 2e-108), 61/75; Putative thymidylate synthase protein [Rhizobium phage RHEph10] (AGC36120.1, 4e-108), 58/71 |
| 225 | Dihydrofolate reductase | GTG | 142993 | 143379 | 387 | + | [DHFR_1] pfam00186 Dihydrofolate reductase (6.24e-26) | Dihydrofolate reductase [Methylobacterium extorquens] (WP_003598382.1, 1e-26), 42/62; Dihydrofolate reductase [uncultured bacterium] (AIA14822.1, 4e-26), 41/63 |
| 226 | ORF 226 | ATG | 143376 | 143606 | 231 | + | None | No significant similarities |
| 227 | ORF 227 | ATG | 143614 | 143910 | 297 | + | None | No significant similarities |
| 228 | ORF 228 | ATG | 143907 | 144086 | 180 | + | None | No significant similarities |
| 229 | ORF 229 | ATG | 144088 | 144486 | 399 | + | None | No significant similarities |
| 230 | ORF 230 | ATG | 144557 | 144805 | 249 | + | None | No significant similarities |
| 231 | Conserved hypothetical protein | ATG | 144810 | 145580 | 771 | + | [DUF3820] pfam12843 Protein of unknown function (1.16e-04) | Hypothetical protein SmphiM12_381 [Sinorhizobium phage phiM12] (AGR48013.1, 4e-49), 41/60; Hypothetical protein RB43ORF286c [Enterobacteria phage RB43] (YP_239262.1, 1e-38), 35/56 |
| 232 | ORF 232 | ATG | 145577 | 145834 | 258 | + | None | No significant similarities |
| 233 | Hypothetical protein | ATG | 145834 | 146250 | 417 | + | None | Hypothetical protein SmphiM12_384 [Sinorhizobium phage phiM12] (AGR48016.1, 7e-05), 26/48 |
| 234 | Hypothetical protein | ATG | 146332 | 146580 | 249 | + | None | Hypothetical protein P106B_56 [Rhizobium phage vB_RglS_P106B] (YP_009005982.1, 2e-13), 40/63 |
| 235 | ORF 235 | ATG | 146653 | 146931 | 279 | + | None | No significant similarities |
| 236 | ORF 236 | ATG | 146928 | 147080 | 153 | + | None | No significant similarities |
Position Associated Putative Putative Best BLASTP match (es), (GenBank Accession No:, E value), %Identity/ Conserved Domains

ORF Function Start End %Similarity Strand Length (E value) Orientation/ Orientation/

Start Codon Start (bp) (bp)

237 | Conserved hypothetical protein | GTG | 147071 | 148264 | 1194 | + | [GT1_YqgM_like] cd03801 GT1 family-like glycosyltransferases (1.41E-08); [Glyco_transf_4] pfam13439 Glycosyltransferase Family 4 (8.26E-03) | No significant similarities
238 | ORF 238 | ATG | 148275 | 148787 | 513 | + | None | No significant similarities
239 | Conserved hypothetical protein | ATG | 148801 | 149061 | 261 | + | [DUF2312] pfam10073 Uncharacterized protein conserved in bacteria (DUF2312) (1.58e-25) | MULTISPECIES: hypothetical protein [Rhizobium] (WP_007631616.1, 5e-39), 82/87; Hypothetical protein [Rhizobium sp. 2MFCol3.1] (P_018899724.1, 1e-38), 86/91
240 | ORF 240 | ATG | 149054 | 149236 | 183 | + | None | No significant similarities
241 | ORF 241 | ATG | 149738 | 149286 | 453 | - | None | No significant similarities
242 | ORF 242 | ATG | 149843 | 150289 | 447 | + | None | No significant similarities
243 | ORF 243 | ATG | 150433 | 150681 | 249 | + | None | No significant similarities
244 | ORF 244 | ATG | 150692 | 151123 | 432 | + | None | No significant similarities
245 | ORF 245 | ATG | 151190 | 151501 | 312 | + | None | No significant similarities
246 | ORF 246 | ATG | 151984 | 152220 | 237 | + | None | No significant similarities
247 | ORF 247 | ATG | 152285 | 152527 | 243 | + | None | No significant similarities
248 | ORF 248 | ATG | 152543 | 152950 | 408 | + | None | No significant similarities
249 | ORF 249 | ATG | 152978 | 153343 | 366 | + | None | No significant similarities
250 | ORF 250 | ATG | 153450 | 153920 | 471 | + | None | No significant similarities
251 | ORF 251 | ATG | 153923 | 154186 | 264 | + | None | No significant similarities


252 | Hypothetical protein | ATG | 154189 | 154527 | 339 | + | None | Hypothetical protein [Sinorhizobium meliloti] (WP_028003164.1, 7e-09), 35/48
253 | ORF 253 | ATG | 154605 | 155108 | 504 | + | None | No significant similarities
254 | ORF 254 | ATG | 155086 | 155463 | 378 | + | None | Hypothetical protein [Mesorhizobium sp. LSHC414A00] (WP_023724094.1, 6e-08), 41/62
255 | ORF 255 | ATG | 155450 | 155770 | 321 | + | None | No significant similarities
256 | Conserved hypothetical protein | ATG | 155757 | 156155 | 399 | + | [tRNA-synt_1b] pfam00579 tRNA synthetases class I (W and Y) (7.78e-03) | Hypothetical protein 7-7-1_00088 [Agrobacterium phage 7-7-1] (YP_007006544.1, 2e-25), 43/60; Hypothetical protein [Mesorhizobium sp. L103C120A0] (WP_023828496.1, 1e-08), 40/61
257 | ORF 257 | GTG | 156152 | 156343 | 192 | + | None | No significant similarities


Appendix VI: Genome annotation of vB_RgaS_P106B: Open reading frames in the genome and their predicted functions

ORF/tRNA | Putative function | Start codon | Start (bp) | End (bp) | Length (bp) | Strand/orientation | Associated conserved domain(s) (E value) | Best BLASTP match(es) (GenBank accession no., E value), % identity/% similarity

1 | Phage Terminase Large Subunit | ATG | 1 | 1737 | 1737 | + | XtmB [COG1783] Phage terminase large subunit (5.56e-04) | Phage terminase, large subunit [Shigella boydii ATCC 9905] (ZP_11647700.1, 4e-108), 37/93; gp33 TerL protein [Escherichia coli LT-68] (ZP_11504207.1, 1e-108), 37/93
2 | Putative Phage Structural Protein | ATG | 1750 | 3411 | 1662 | + | DUF4055 [pfam13264] Domain of unknown function (5.15e-07) | Conserved hypothetical protein [Methylosinus trichosporium OB3b] (ZP_06887193.1, 2e-110), 42/86; Putative structural phage protein [Ahrensia sp. R2A130] (ZP_07375873.1, 2e-52), 32/81; Structural protein [Pseudomonas phage MP1412] (YP_006561059.1, 1e-44), 29/79
3 | Conserved Hypothetical Protein | ATG | 3449 | 4258 | 810 | + | None | Hypothetical protein HK639_05 [Escherichia phage HK639] (YP_004934035.1, 1e-62), 47/91; Conserved hypothetical protein [Klebsiella pneumoniae subsp. rhinoscleromatis ATCC 13884] (ZP_06014051.1, 1e-54), 46/81; Conserved hypothetical protein [Escherichia coli E22] (ZP_03043735.1, 1e-53), 48/80
4 | ORF04 | GTG | 4330 | 4536 | 207 | - | None | No significant similarities
5 | Hypothetical Protein | ATG | 4533 | 4724 | 192 | - | None | Hypothetical protein RHEph04_gp002 [Rhizobium phage RHEph04] (AGC35688.1, 2e-09), 52/92
6 | ORF06 | ATG | 4721 | 4846 | 126 | - | None | No significant similarities
7 | ORF07 | ATG | 4856 | 4975 | 120 | + | None | No significant similarities
8 | ORF08 | ATG | 4986 | 5102 | 117 | + | None | No significant similarities
9 | Putative Phage Coat Protein | ATG | 5095 | 6312 | 1218 | + | [pfam11651] P22 coat protein superfamily-gene protein 5 (2.94e-04) | Hypothetical protein EbC_31800 [Erwinia billingiae Eb661] (YP_003742558.1, 4e-128), 56/94; P22 coat protein [Herbaspirillum sp. YR522] (ZP_10591699.1, 5e-89), 49/93


10 | ORF10 | ATG | 6415 | 6966 | 552 | + | None | No significant similarities
11 | ORF11 | ATG | 6968 | 7225 | 258 | + | None | No significant similarities
12 | Hypothetical Protein | ATG | 7232 | 7795 | 564 | + | None | Hypothetical protein RHEph10_gp002 [Rhizobium phage RHEph10] (AGC36046.1, 1e-09), 38/47; Hypothetical protein Bresu_1398 [Brevundimonas subvibrioides ATCC] (YP_003818333.1, 4e-04), 44/32
13 | Lysophospholipase L1-like esterase | ATG | 7797 | 9014 | 1218 | + | [cd01830] SGNH_hydrolase subfamily (3.01e-08) | Lysophospholipase L1-like esterase [Saccharomonospora xinjiangensis XJ-54] (ZP_09983067.1, 3e-05), 27/70
14 | ORF14 | ATG | 9027 | 10661 | 1635 | + | None | No significant similarities
15 | ORF15 | ATG | 10685 | 11008 | 324 | + | None | No significant similarities
16 | Hypothetical Protein | ATG | 11085 | 11603 | 519 | + | None | Hypothetical protein PMI41_01878 [Phyllobacterium sp. YR531] (ZP_10587064.1, 2e-16), 39/87; Hypothetical protein BN406_01810 [Sinorhizobium meliloti Rm41] (YP_006840681.1, 5e-13), 47/57
17 | ORF17 | ATG | 11670 | 12161 | 492 | - | None | No significant similarities
18 | Hypothetical Protein | ATG | 12243 | 12605 | 363 | + | None | Hypothetical protein mlr8012 [Mesorhizobium loti MAFF303099] (NP_108201.1, 8e-10), 35/93
19 | Putative head Morphogenesis Protein | ATG | 12608 | 13660 | 1053 | + | [pfam04233] Phage Mu protein F like protein Superfamily (2.05e-04) | Putative head morphogenesis protein SPP1 gp7 [Methylobacterium nodulans ORS 2060] (P_002502054.1, 2e-89), 47/94; Head morphogenesis protein SPP1 gp7 [Sinorhizobium meliloti Rm41] (YP_006839633.1, 4e-75), 39/92
21 | Hypothetical Protein | ATG | 13984 | 14208 | 225 | - | None | Hypothetical protein MAXJ12_12917 [Mesorhizobium alhagi CCNWXJ12-2] (ZP_09293218.1, 2e-16), 58/89; Hypothetical protein MexAM1_META1p3940 [Methylobacterium extorquens AM1] (YP_002964900.1, 7e-16), 51/89


22 | Putative Phage Protein | ATG | 14189 | 15367 | 1179 | - | None | Putative phage protein [bacterium Ellin514] (ZP_03626372.1, 1e-20), 28/66; Probable phage protein [candidate division TM7 single-cell isolate TM7c] (ZP_02544430.1, 8e-15), 26/58
23 | ORF23 | ATG | 15364 | 15585 | 222 | - | None | No significant similarities
24 | ORF24 | GTG | 15582 | 15785 | 204 | - | None | No significant similarities
25 | Phage Protein | ATG | 15782 | 16903 | 1122 | - | The AAA+ superfamily (7.24e-05); AAA domain-dynein-related subfamily (3.07e-07) | Hypothetical protein FP2177 [Flavobacterium psychrophilum JIP02/86] (YP_001297039.1, 1e-55), 42/73; ATPase family protein associated with various cellular activities (AAA) [Opitutaceae bacterium TAV1] (ZP_10264985.1, 1e-53), 40/79; Hypothetical protein ObacDRAFT_9610 [Diplosphaera colitermitum TAV2] (ZP_03724222.1, 2e-51), 45/59
26 | Hypothetical Protein | TTG | 16990 | 17394 | 405 | + |  | Hypothetical protein BN406_01804 [Sinorhizobium meliloti Rm41] (YP_006840675.1, 5e-24), 46/85; Hypothetical protein SM11_chr1828 [Sinorhizobium meliloti SM11] (YP_005720357.1, 2e-22), 45/85; Hypothetical protein Smed_1886 [Sinorhizobium medicae WSM419] (YP_001327555.1, 4e-19), 41/85
27 | Hypothetical Protein | ATG | 17394 | 17819 | 426 | + | DUF4128 [pfam13554] Bacteriophage related domain of unknown function (9.31e-16) | Hypothetical protein Xaut_3714 [Xanthobacter autotrophicus Py2] (YP_001418597.1, 7e-10), 31/96; Hypothetical protein Mnod_3541 [Methylobacterium nodulans ORS 2060] (YP_002498754.1, 1e-08), 32/94; Hypothetical protein Smed_1885 [Sinorhizobium medicae WSM419] (YP_001327554.1, 1e-05), 31/80
28 | Hypothetical Protein | ATG | 17836 | 18627 | 792 | + | None | Hypothetical protein A1OC_02103 [Stenotrophomonas maltophilia Ab55555] (ZP_18105531.1, 2e-43), 52/58; Hypothetical protein Smed_1884 [Sinorhizobium medicae WSM419] (YP_001327553.1, 9e-38), 49/56
29 | Hypothetical Protein | ATG | 18714 | 19112 | 399 | + | None | Hypothetical protein A1OC_02105 [Stenotrophomonas maltophilia Ab55555] (ZP_18105533.1, 6e-09), 34/85; Hypothetical protein Smed_1883 [Sinorhizobium medicae WSM419] (YP_001327552.1, 2e-06), 28/95


30 | ORF30 | GTG | 19349 | 19480 | 132 | + | None | No significant similarities
31 | Putative Tail Length Tape Measure Protein | ATG | 19502 | 20353 | 852 | + | [pfam06791] Prophage tail length tape measure protein Superfamily (5.44e-08) | Prophage tail length tape measure protein [Rhizobium sp. CF122] (ZP_10733181.1, 9e-32), 47/54; Prophage tail length tape measure protein [Rhizobium leguminosarum bv. trifolii WSM2012] (ZP_18308913.1, 2e-25), 45/46
32 | Putative Tail Length Tape Measure Protein Precursor | ATG | 20617 | 21780 | 1164 | + | COG5281 [COG5281] Phage-related minor tail protein (1.01e-10); [TIGR01541] Phage tail tape measure protein (4.85e-09) | Phage tail tape measure protein lambda [Sphingomonas sp. LH128] (ZP_10871687.1, 2e-25), 42/39; Prophage tail length tape measure protein [Phyllobacterium sp. YR531] (ZP_10587054.1, 2e-23), 43/40
33 | Phage Minor Tail Protein | ATG | 21784 | 22149 | 366 | + | [pfam05939] Phage minor tail protein (4.70e-19) | Prophage LambdaSo, minor tail protein M [Comamonas testosteroni S44] (ZP_07043761.1, 7e-12), 40/73; Phage minor tail protein [Achromobacter xylosoxidans C54] (ZP_16402796.1, 5e-11), 41/77
34 | Phage Minor Tail Protein | ATG | 22146 | 22862 | 717 | + | [pfam05100] Phage minor tail protein L Superfamily (3.02e-42) | Putative phage minor tail protein L [Rhizobium phage RHEph10] (AGC36059.1, 6e-65), 41/98; Phage minor tail protein L [Methylobacterium nodulans ORS 2060] (YP_002498743.1, 2e-52), 37/98
35 | Phage Tail Assembly Protein | ATG | 22862 | 23611 | 750 | + | [COG1310] Predicted metal-dependent protease of the PAD1/JAB1 superfamily (1.04e-11) | Putative phage tail assembly protein [Escherichia coli SE11] (YP_002292921.1, 4e-43), 38/92; Tail assembly protein [Shigella sonnei Ss046] (YP_310686.1, 5e-43), 38/92
36 | Putative Phage Tail Protein | TTG | 23589 | 24179 | 591 | + | [COG4723] Phage-related protein, tail component, Lambda tail I superfamily (9.70e-22) | Putative tail assembly protein I [Rhizobium phage RHEph10] (AGC36061.1, 8e-37), 42/96; Putative phage tail protein. Bacteriophage lambda tail assembly I [Ralstonia syzygii R24] (CCA86125.1, 1e-27), 40/96


37 | Putative phage-related protein, Tail component | GTG | 24552 | 28070 | 3519 | + | [COG4733] Phage-related protein, tail component (4.46e-123) | Putative tail tip fiber protein [Burkholderia phage phiE125] (NP_536377.1, 1e-118), 40/49; Putative tail assembly protein [Rhizobium phage RHEph10] (AGC36062.1, 6e-132), 41/46
38 | Tail collar domain-containing protein | ATG | 28067 | 29224 | 1158 | + | [pfam07484] Phage Tail Collar Domain (4.16e-16) | Phage tail fiber protein [Pseudomonas stutzeri KOS6] (ZP_16058461.1, 4e-33), 41/49; Phage tail fiber protein [Pseudomonas pseudoalcaligenes KF707] (ZP_21060829.1, 2e-31), 44/45; Phage tail collar domain-containing protein [Polaromonas naphthalenivorans CJ2] (YP_983525.1, 1e-27), 36/52; Tail Collar domain-containing protein [Desulfovibrio africanus str. Walvis Bay] (YP_005053494.1, 1e-26), 53/27
39 | Putative tail fiber assembly-like protein | ATG | 29236 | 29748 | 513 | + | None | Tail fiber assembly-like protein [Synechococcus phage S-CRM01] (YP_004508450.1, 4e-12), 49/42; Tail fiber assembly-like protein [Ralstonia solanacearum GMI1000] (NP_520041.1, 2e-11), 56/31
40 | Phage related lysozyme | ATG | 29745 | 30728 | 984 | + | [cd00737] Endolysins (5.62e-46); [COG3772] Phage-related lysozyme: muramidase (1.90e-36) | Lysozyme [Enterobacteria phage P22] (NP_059622.1, 2e-49), 58/44
41 | Hypothetical protein | ATG | 30725 | 30991 | 267 | + | None | Hypothetical protein Agau_C200517 [Agrobacterium tumefaciens F2] (ZP_12504634.1, 1e-15), 45/94
42 | Hypothetical Protein | GTG | 30957 | 31124 | 168 | + | None | Hypothetical protein AGRO_3661 [Agrobacterium sp. ATCC 31749] (ZP_08529658.1, 7e-14), 71/80
43 | ORF43 | ATG | 31154 | 31315 | 162 | - | None | No significant similarities
44 | ORF44 | ATG | 31462 | 31782 | 321 | + | None | No significant similarities
45 | ORF45 | ATG | 31802 | 31918 | 117 | - | None | No significant similarities
46 | Hypothetical Protein | ATG | 31915 | 32286 | 372 | - | None | Hypothetical protein RCCGE510_07166 [Rhizobium sp. CCGE 510] (ZP_10834293.1, 6e-09), 40/65; Hypothetical protein PROSTU_01916 [Providencia stuartii ATCC 25827] (ZP_02960015.1, 2e-07), 33/93


47 | ORF47 | ATG | 32289 | 32486 | 198 | - | None | No significant similarities
48 | Putative Phage DNA Polymerase | ATG | 32500 | 34551 | 2052 | - | [cd05538] DNA polymerase type-II B subfamily catalytic domain (2.22e-12) | Putative DNA polymerase [Edwardsiella phage MSW-3] (YP_007348961.1, 2e-160), 41/98; Putative DNA polymerase [Vibrio phage vB_VchM-138] (YP_007006391.1, 6e-148), 38/99; Putative DNA polymerase protein [Rhizobium phage RHEph04] (AGC35732.1, 1e-40), 27/78
49 | ORF49 | ATG | 34556 | 34723 | 168 | - | None | No significant similarities
50 | Putative Exodeoxyribonuclease | ATG | 34723 | 35418 | 696 | - | [PHA02570] Exonuclease: DnaQ-like 3'-5' exonuclease domain superfamily (5.45e-07) | Exodeoxyribonuclease VIII [Klebsiella oxytoca KCTC 1686] (YP_005019476.1, 2e-19), 32/97; Exonuclease [Enterobacteria phage mEpX2] (YP_007111467.1, 2e-19), 32/98; Exodeoxyribonuclease VIII [Klebsiella pneumoniae subsp. pneumoniae 1084] (YP_006636877.1, 3e-19), 32/97
51 | Hypothetical Phage Protein | ATG | 35536 | 36474 | 939 | - | None | gp68 [Burkholderia phage BcepB1A] (YP_024902.1, 1e-22), 35/95; Hypothetical protein [Klebsiella phage JD001] (YP_007392873.1, 3e-10), 32/58; Hypothetical protein 7-7-1_00020 [Agrobacterium phage 7-7-1] (YP_007006476.1, 1e-09), 29/69
tRNA | tRNA-Leu | TGG | 36501 | 36576 | 76 |  |  | 
52 | Hypothetical Phage Protein | ATG | 36579 | 37652 | 1074 | - | [pfam10926] Protein of unknown function of Superfamily cl00641, CRISPR/Cas system-associated protein Cas4 (6.88e-32) | Hypothetical protein [Klebsiella phage JD001] (YP_007392872.1, 5e-25), 26/83; gp67 [Burkholderia phage BcepB1A] (YP_024901.1, 6e-22), 25/97; Hypothetical protein 7-7-1_00021 [Agrobacterium phage 7-7-1] (YP_007006477.1, 3e-16), 26/99
53 | ORF53 | ATG | 37659 | 37952 | 294 | - | None | No significant similarities


54 | ORF54 | ATG | 37949 | 38395 | 447 | - | None | No significant similarities
55 | ORF55 | ATG | 38501 | 39142 | 642 | + | None | No significant similarities
56 | ORF56 | ATG | 39139 | 39369 | 231 | + | None | No significant similarities
57 | Hypothetical Protein | ATG | 39371 | 39781 | 411 | + | None | Hypothetical protein SCH4B_4353 [Silicibacter sp. TrichCH4B] (ZP_05743180.1, 9e-06), 30/82
58 | Putative Helicase | ATG | 39781 | 41472 | 1692 | + | [cd00079] Helicase superfamily c-terminal domain (5.28e-13); [pfam00271] Helicase conserved C-terminal domain (4.05e-15) | Putative helicase [Edwardsiella phage MSW-3] (BAM68876.1, 5e-136), 43/98; Putative helicase [Vibrio phage vB_VchM-138] (YP_007006398.1, 2e-124), 39/95
59 | Hypothetical Protein | ATG | 41469 | 41858 | 390 | + | None | Hypothetical protein [Edwardsiella phage MSW-3] (YP_007348969.1, 1e-20), 46/94; gp62 [Burkholderia phage BcepB1A] (YP_024897.1, 4e-18), 45/87; Hypothetical protein VchM-138_0011 [Vibrio phage vB_VchM-138] (YP_007006400.1, 9e-18), 44/90
60 | ORF60 | ATG | 41855 | 42142 | 288 | + | None | No significant similarities
61 | ORF61 | ATG | 42139 | 42363 | 225 | - | None | No significant similarities
62 | Hypothetical Protein | TTG | 42421 | 42627 | 207 | - | None | Hypothetical protein amb1493 [Magnetospirillum magneticum AMB-1] (YP_420856.1, 3e-11), 40/98
63 | ORF63 | ATG | 42624 | 42815 | 192 | - | None | No significant similarities
64 | ORF64 | ATG | 42812 | 42997 | 186 | - | None | No significant similarities
65 | Hypothetical Protein | ATG | 42994 | 43227 | 234 | - | [pfam12844] Helix-turn-helix domain: HTH-XRE Superfamily (4.19e-04) | No significant similarities
66 | ORF66 | ATG | 43291 | 43563 | 273 | - | None | No significant similarities


67 | Probable Nucleoside Triphosphate Pyrophosphohydrolase | ATG | 43584 | 44324 | 741 | - | [cd11541] Nucleoside Triphosphate Pyrophosphohydrolase, NTP-Ppase superfamily (9.09e-18) | Hypothetical protein 7-7-1_00030 [Agrobacterium phage 7-7-1] (YP_007006486.1, 1e-56), 56/82; MazG nucleotide pyrophosphohydrolase domain-containing protein [Candidatus Chloracidobacterium thermophilum B] (YP_004862440.1, 1e-20), 42/51
68 | ORF68 | ATG | 44321 | 44545 | 225 | - | None | No significant similarities
69 | ORF69 | ATG | 44511 | 44771 | 261 | - | None | No significant similarities
70 | Putative DNA Binding Protein | TTG | 44859 | 45092 | 234 | + | [pfam12728] Helix-turn-helix domain (2.13e-08); [TIGR01764] DNA binding domain, excisionase family (6.76e-07) | DNA-binding protein, putative [Synechococcus sp. WH 5701] (ZP_01085983.1, 7e-04), 33/62
71 | DNA Primase | ATG | 45089 | 47428 | 2340 | + | None | DR0530-like primase [Burkholderia phage BcepNazgul] (NP_919008.2, 6e-23), 26/46; Primase-polymerase [Burkholderia phage AH2] (YP_006561162.1, 2e-19), 29/27
72 | Hypothetical Protein | ATG | 47558 | 48133 | 576 | - | None | Hypothetical protein 7-7-1_00073 [Agrobacterium phage 7-7-1] (YP_007006529.1, 7e-64), 56/98
73 | ORF73 | ATG | 48187 | 48531 | 345 | + | None | No significant similarities
74 | ORF74 | ATG | 48673 | 48909 | 237 | - | None | No significant similarities
75 | ORF75 | ATG | 49081 | 49293 | 213 | - | None | No significant similarities
76 | ORF76 | ATG | 49468 | 49677 | 210 | - | None | No significant similarities
77 | Phage Tail-assembly like protein | GTG | 49719 | 50267 | 549 | - | [COG2110] Predicted phosphatase homologous to the C-terminal domain of histone macroH2A1 (4.98e-20) | Phage tail assembly protein [Microscilla marina ATCC 23134] (ZP_01691567.1, 2e-32), 40/79


78 | ORF78 | ATG | 50264 | 50464 | 201 | - | None | No significant similarities
79 | ORF79 | ATG | 50628 | 51011 | 384 | - | None | No significant similarities
80 | ORF80 | ATG | 51059 | 51451 | 393 | - | None | No significant similarities
81 | ORF81 | ATG | 51469 | 51945 | 477 | - | None | No significant similarities
82 | ORF82 | ATG | 51945 | 52103 | 159 | - | None | No significant similarities
83 | ORF83 | ATG | 52108 | 52236 | 129 | - | None | No significant similarities
84 | ORF84 | ATG | 52233 | 52439 | 207 | - | None | No significant similarities
85 | ORF85 | ATG | 52405 | 52746 | 342 | - | None | No significant similarities
86 | Hypothetical Phage Protein | ATG | 53149 | 53424 | 276 | - | None | Hypothetical protein RDJLphi1_gp43 [Roseobacter phage RDJL Phi 1] (YP_004421811.1, 5e-08), 36/94; gp88 [Mycobacterium phage Konstantine] (YP_002242145.1, 7e-05), 34/95
87 | Hypothetical Protein | ATG | 53421 | 53720 | 300 | - | None | Hypothetical protein 7-7-1_00060 [Agrobacterium phage 7-7-1] (YP_007006516.1, 8e-19), 47/95
88 | ORF88 | ATG | 53717 | 53938 | 222 | - | None | No significant similarities
89 | ORF89 | ATG | 53943 | 54113 | 171 | - | None | No significant similarities
90 | ORF90 | ATG | 54114 | 54377 | 264 | - | None | No significant similarities
91 | ORF91 | ATG | 54392 | 54604 | 213 | - | None | No significant similarities
92 | ORF92 | TTG | 54636 | 54758 | 123 | - | None | No significant similarities
93 | Hypothetical Protein | GTG | 54783 | 55346 | 564 | + | None | Hypothetical protein RAZWK3B_15538 [Roseobacter sp. AzwK-3b] (ZP_01903791.1, 8e-04), 34/43
94 | ORF94 | ATG | 55351 | 55716 | 366 | - | None | No significant similarities
95 | ORF95 | GTG | 55723 | 55968 | 246 | - | None | No significant similarities
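The coordinate columns of these annotation tables can be cross-checked against one another: in the entries sampled here, the stated length equals the inclusive span between the start and end positions, and ORF lengths are multiples of three. The following is a minimal sketch of such a check, not part of the annotation workflow described in this thesis. The names `OrfRow` and `check_orf` are hypothetical and were chosen for illustration; the sample values are copied from Appendices V and VI.

```python
from typing import List, NamedTuple


class OrfRow(NamedTuple):
    """One table row: label, start (bp), end (bp), stated length (bp)."""
    label: str
    start: int
    end: int
    length: int


def check_orf(row: OrfRow, coding: bool = True) -> List[str]:
    """Return a list of inconsistencies found in a row (empty if none)."""
    problems = []
    # Coordinates are inclusive; abs() also covers rows that list a
    # minus-strand ORF with start > end (e.g. ORF 241 in Appendix V).
    span = abs(row.end - row.start) + 1
    if span != row.length:
        problems.append(f"{row.label}: stated length {row.length} != span {span}")
    # Protein-coding ORFs (stop codon included) should be whole codons.
    if coding and row.length % 3 != 0:
        problems.append(f"{row.label}: length {row.length} is not a multiple of 3")
    return problems


# Sample rows copied from the tables (assumed to be transcribed correctly).
sample_rows = [
    OrfRow("Appendix VI ORF 1", 1, 1737, 1737),
    OrfRow("Appendix VI ORF04", 4330, 4536, 207),
    OrfRow("Appendix VI ORF95", 55723, 55968, 246),
    OrfRow("Appendix V ORF 241", 149738, 149286, 453),
]
trna_row = OrfRow("Appendix VI tRNA-Leu", 36501, 36576, 76)

for r in sample_rows:
    print(r.label, check_orf(r) or "consistent")
# The tRNA gene is not translated, so the codon check is skipped.
print(trna_row.label, check_orf(trna_row, coding=False) or "consistent")
```

A check of this kind only confirms that the columns agree with each other; it says nothing about whether the gene calls themselves are correct.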

Appendix VII: Genome annotations of vB_MloP_Lo5R7ANS: Open reading frames in the genome and their predicted functions

ORF | Putative function | Start codon | Start (bp) | End (bp) | Length (bp) | Strand/orientation | Associated conserved domains (E value) | Best BLASTP match(es) (GenBank accession no., E value), % identity/% similarity

1 | Hypothetical protein | ATG | 1118 | 1684 | 567 | + | None | Hypothetical protein BW45_23075 [Agrobacterium tumefaciens] (KAJ36256.1, 3e-57), 98/64
2 | Hypothetical protein | ATG | 1734 | 2552 | 819 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805363.1, 1e-120), 72/79; Hypothetical protein [Aminobacter sp. J41] (WP_024847825.1, 3e-97), 61/70
3 | Hypothetical protein | ATG | 2640 | 3038 | 399 | + | None | Hypothetical protein SMRU11_1270 [Sinorhizobium meliloti RU11/001] (CDH85470.1, 6e-11), 91/50
4 | Hypothetical protein | ATG | 3137 | 3358 | 222 | + | None | MULTISPECIES: hypothetical protein [Sphingobium] (WP_013039370.1, 5e- 07 4), 73/61
5 | Hypothetical protein | ATG | 3361 | 3588 | 228 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805356.1, 3e-19), 90/70
6 | ORF 6 | ATG | 3585 | 3863 | 279 | + | None | No significant similarities
7 | Hypothetical protein | ATG | 3972 | 4139 | 168 | + | None | Hypothetical protein SMRU11_1276 [Sinorhizobium meliloti RU11/001] (CDH85476.1, 1e-04), 92/56
8 | Conserved hypothetical protein | ATG | 4173 | 4640 | 468 | + | HTH_39 (pfam14090) Helix-turn-helix domain (1.12e-12) | Hypothetical protein SMRU11_1277 [Sinorhizobium meliloti RU11/001] (CDH85477.1, 7e-17), 52/64; Hypothetical protein [Stenotrophomonas maltophilia] (WP_004154592.1, 3e-09), 43/63
9 | Hypothetical protein | ATG | 4737 | 5195 | 459 | + | HTH_20 pfam12840 Helix-turn-helix domain (3.49e-04) | Hypothetical protein [Xanthomonas oryzae] (WP_024712333.1, 1e-04), 63/53
10 | Hypothetical protein | ATG | 5197 | 5565 | 369 | + | None | Hypothetical protein [Desulfovibrio oxyclinae] (WP_018123356.1, 3e-15), 47/61


11 | Hypothetical protein | ATG | 5690 | 6070 | 381 | + | None | Transcriptional regulator [Sinorhizobium meliloti RU11/001] (CDH85478.1, 2e-24), 51/65; MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023709139.1, 6e-12), 39/55
12 | Putative phage integrase | ATG | 6073 | 7152 | 1080 | + | HP1_INT_C [cd00797] Phage HP1 integrase (1.14e-25); Phage_integrase [pfam00589] Phage integrase family (2.11e-20) | Tyrosine recombinase xerD [Sinorhizobium meliloti RU11/001] (CDH85479.1, 5e-98), 47/66; Integrase [Mesorhizobium sp. LNJC384A00] (WP_023750195.1, 1e-61), 35/55
13 | Putative restriction endonuclease BglII | ATG | 7233 | 7886 | 654 | + | Endonuc-BglII [pfam09195] Restriction endonuclease BglII (8.30e-44) | Hypothetical protein [Novispirillum itersonii] (WP_019647224.1, 3e-83), 60/73; Restriction endonuclease BglII [Rhodospirillum centenum] (WP_012567837.1, 2e-81), 61/73; Restriction endonuclease BglII [Afipia sp. 1NLS2] (WP_009339934.1, 4e-78), 60/73
14 | Putative adenine-specific DNA methyltransferase, transcriptional activator | ATG | 7861 | 8502 | 642 | + | ME4 [COG4725] Transcriptional activator, adenine-specific DNA methyltransferase (1.42e-36) | S-adenosylmethionine-binding protein [Acidobacterium capsulatum] (WP_015896818.1, 9e-113), 76/85; Transcriptional activator, adenine-specific DNA methyltransferase [Nitrincola sp. AK23] (EXJ09175.1, 7e-109), 73/86
15 | Hypothetical protein | ATG | 8855 | 9400 | 546 | + | None | Hypothetical protein [Fluviicola taffensis] (WP_013685338.1, 3e-49), 52/66; Hypothetical protein [Hymenobacter norwichensis] (WP_022824845.1, 9e-45), 51/63
16 | Putative RtcB | ATG | 9397 | 10050 | 654 | + | RtcB [COG1690] Uncharacterized conserved protein [Function unknown] (4.87e-14); RtcB [pfam01139] tRNA-splicing ligase RtcB (9.68e-17) | Protein RtcB [Asaia platycodi SF2.1] (CDG40936.1, 1e-70), 55/69; Hypothetical protein [Meganema perideroedes] (WP_018633569.1, 2e-70), 56/71


17 | Hypothetical protein | ATG | 10047 | 10328 | 282 | + | None | Hypothetical protein [Mycobacterium avium] (WP_023870746.1, 7e-10), 37/54; Hypothetical protein DC74_2935 [Streptomyces albulus] (AIA03435.1, 4e-08), 36/54
18 | T7-like phage DNA-directed RNA polymerase | TTG | 10485 | 12950 | 2466 | + | RNA_pol [pfam00940] DNA-dependent RNA polymerase (1.33e-158); RPO41 [COG5108] Mitochondrial DNA-directed RNA polymerase [Transcription] (8.54e-167) | DNA-directed RNA polymerase, mitochondrial [Sinorhizobium meliloti RU11/001] (CDH85436.1, 0.0), 58/72; T3/T7 RNA polymerase [Mesorhizobium sp. LNJC405B00] (WP_023729895.1, 0.0), 42/58
19 | Hypothetical protein | ATG | 13031 | 13234 | 204 | + | None | Hypothetical protein SMRU11_1236 [Sinorhizobium meliloti RU11/001] (CDH85437.1, 2e-17), 59/75
20 | Hypothetical protein | ATG | 13411 | 13677 | 267 | + | None | Hypothetical protein SMRU11_1237 [Sinorhizobium meliloti RU11/001] (CDH85438.1, 1e-30), 64/75
21 | ORF 21 | ATG | 13674 | 14036 | 363 | + | None | No significant similarities
22 | Hypothetical protein | GTG | 14033 | 14158 | 126 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023729900.1, 7e-07), 63/71
23 | Putative phage single-stranded DNA-binding protein | ATG | 14187 | 14927 | 741 | + | PHA00458 [PHA00458] single-stranded DNA-binding protein (1.30e-11) | Single-stranded DNA-binding protein [Pseudomonas sp. TJI-51] (WP_009682638.1, 6e-70), 50/64; Putative phage single-stranded DNA-binding protein [Rhizobium phage RHEph01] (AGC35527.1, 3e-65), 51/65


25 | Hypothetical protein | ATG | 15413 | 15649 | 237 | + | None | Hypothetical protein [Mesorhizobium sp. LNJC384A00] (WP_023750225.1, 3e-05), 39/55
26 | ORF 26 | ATG | 15652 | 15831 | 180 | + | None | No significant similarities
27 | ORF 27 | ATG | 15831 | 16088 | 258 | + | None | No significant similarities
28 | Hypothetical protein | ATG | 16088 | 16585 | 498 | + | None | Hypothetical protein BW45_22910 [Agrobacterium tumefaciens] (KAJ36288.1, 4e-41), 47/65; MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805402.1, 4e-23), 36/56
29 | ORF 29 | ATG | 16582 | 16809 | 228 | + | None | No significant similarities
30 | ORF 30 | GTG | 16806 | 16955 | 150 | + | None | No significant similarities
31 | DNA primase/helicase | ATG | 16956 | 18605 | 1650 | + | GP4d_helicase [cd01122] GP4d_helicase (7.54e-85); TOPRIM_primases [cd01029] TOPRIM_primases (5.33e-09) | DNA primase [Pseudomonas putida] (WP_010953243.1, 0.0), 59/73; MULTISPECIES: DNA primase [Thioalkalivibrio] (WP_018868296., 0.0), 55/71; Putative DNA primase/helicase protein [Rhizobium phage RHEph01] (AGC35532.1, 0.0), 58/70
32 | ORF 32 | ATG | 18586 | 18813 | 228 | + | None | No significant similarities
33 | Conserved hypothetical protein | ATG | 18961 | 19542 | 582 | + | PcnB [COG0617] tRNA nucleotidyltransferase/poly(A) polymerase (2.57e-05) | Hypothetical protein [Rhodopseudomonas palustris] (WP_013502669.1, 4e-10), 29/46


34 | Putative DNA polymerase | ATG | 19542 | 21539 | 1998 | + | DNA_pol_A_pol_I_B [cd08643] Polymerase I (7.35e-146); DNA_pol_A [pfam00476] DNA polymerase family A (3.39e-34) | DNA polymerase [Sinorhizobium meliloti RU11/001] (CDH85448.1, 0.0), 69/79; DNA polymerase [Azorhizobium caulinodans] (WP_012172120.1, 0.0), 56/69
35 | Hypothetical protein | ATG | 21701 | 21901 | 201 | + | None | Hypothetical protein SMRU11_1248 [Sinorhizobium meliloti RU11/001] (CDH85449.1, 5e-16), 55/72
36 | Putative 5'-3' exonuclease | TTG | 21898 | 22629 | 732 | + | PHA00439 [PHA00439] exonuclease (8.36e-51); PIN_53EXO [cd09859] PIN domain of the 5'-3' exonuclease of Taq DNA polymerase I and homologs (1.16e-13) | DNA polymerase I [Sinorhizobium meliloti RU11/001] (CDH85451.1, 3e-120), 71/81; Phage exonuclease [Azorhizobium caulinodans] (WP_012172115.1, 1e-65), 50/61; Putative 5'-3' exonuclease protein [Rhizobium phage RHEph01] (AGC35537.1, 2e-61), 49/60
37 | Hypothetical protein | GTG | 22617 | 22931 | 315 | + | None | Hypothetical protein SMRU11_1251 [Sinorhizobium meliloti RU11/001] (CDH85452.1, 3e-29), 76/83
38 | ORF 38 | TTG | 23152 | 23036 | 117 | - | None | No significant similarities
39 | Hypothetical protein | ATG | 23139 | 23351 | 213 | + | None | Hypothetical protein RHEph01_gp031 [Rhizobium phage RHEph01] (AGC35542.1, 5e-04), 45/56
40 | Putative head-to-tail joining protein | TTG | 23355 | 24947 | 1593 | + | Head-tail_con [pfam12236] Bacteriophage head to tail connecting protein (1.50e-94) | Head-to-tail joining protein [Sinorhizobium meliloti RU11/001] (CDH85454.1, 0.0), 63/78; MULTISPECIES: head-tail joining protein [Thioalkalivibrio] (WP_018868285.1, 4e-139), 44/65; Putative head-to-tail joining protein [Rhizobium phage RHEph01] (AGC35543.1, 3e-137), 43/63
41 | ORF 41 | ATG | 24940 | 25083 | 144 | + | None | No significant similarities


42 | ORF 42 | GTG | 25537 | 25133 | 405 | - | None | No significant similarities
43 | Putative capsid assembly protein | ATG | 25589 | 25936 | 348 | + | Capsid assembly protein PHA00435 [PHA00435] capsid assembly protein (6.94e-07) | Scaffolding protein [Sinorhizobium meliloti RU11/001] (CDH85455.1, 2e-39), 59/79; Scaffolding protein [Pelagibacter phage HTVC019P] (YP_007517836.1, 1e-2), 42/61; Putative T7-like capsid assembly protein [Rhizobium phage RHEph01] (AGC35546.1, 6e-21), 39/67
44 | Putative N-acetylmuramoyl-L-alanine amidase | ATG | 26026 | 26484 | 459 | + | PGRP [cd06583] Peptidoglycan recognition proteins (PGRPs) (1.13e-19); PHA00447 [PHA00447] lysozyme (1.84e-42); Amidase_2 [pfam01510] N-acetylmuramoyl-L-alanine amidase (2.05e-16) | N-acetylmuramoyl-L-alanine amidase [Agrobacterium tumefaciens] (KAJ36279.1, 4e-60), 59/76; Lysozyme [Mesorhizobium sp. LNJC384A00] (WP_023750217.1, 3e-50), 52/72
45 | Hypothetical protein | GTG | 26481 | 26696 | 216 | + | None | Hypothetical protein BW45_22985 [Agrobacterium tumefaciens] (KAJ36294.1, 8e-08), 47/70
46 | Hypothetical protein | ATG | 26689 | 27015 | 327 | + | None | Hypothetical protein [Mesorhizobium sp. LNJC405B00] (WP_023729918.1, 2e-11), 37/55
47 | Hypothetical protein | TTG | 26987 | 27211 | 225 | + | None | Hypothetical protein SMRU11_1257 [Sinorhizobium meliloti RU11/001] (CDH85457.1, 6e-21), 60/73; Hypothetical protein [Mesorhizobium sp. LNJC405B00] (WP_023729919.1, 8e-08), 58/72
48 | Probable transcriptional regulator | ATG | 27583 | 27254 | 330 | - | mtlR [PRK11001] mannitol repressor protein (1.36e-05); MtlR [COG3722] Transcriptional regulator [Transcription] (9.50e-04) | Hypothetical protein [Bradyrhizobium sp. Tv2a-2] (WP_024519536.1, 1e-11), 37/61; Mannitol operon repressor [Alishewanella jeotgali] (WP_008951916.1, 5e-10), 37/51


49 ORF 49 GTG 27955 27767 189 - None No significant similarities Minor capsid protein 10B [Sinorhizobium meliloti RU11/001] (CDH85458.1, PHA00201 [PHA00201] Putative major 3e-167), 77/87 50 ATG 27956 28891 936 + major capsid protein capsid protein Capsid protein [Mesorhizobium sp. LNJC405B00] (WP_023729922.1, 4e-99), (1.99e-44) 53/68 Tail tubular protein A [Sinorhizobium meliloti RU11/001 (CDH85459.1, 2e- 66), 57/73 PHA00428 [PHA00428] Putative T7-like tail MULTISPECIES: tail tubular protein A [Thioalkalivibrio] (WP_018868278.1, 51 ATG 28972 29559 588 + tail tubular protein A tubular protein A 4e-44), 44/66 (6.02e-44) Putative tail tubular protein A [Rhizobium phage RHEph01] (AGC35551.1, 1e- 36), 40/60

Tail tubular protein B [Sinorhizobium meliloti RU11/001] (CDH85460.1, 0.0), Putative T7-like tail 56/72 52 ATG 29559 32033 2475 + None tubular protein B Putative tail tubular protein B [Rhizobium phage RHEph01] ( AGC35552.1, 0.0), 41/58

53 | Conserved hypothetical protein | ATG | 32033 | 32497 | 465 | + | PHA01733 [PHA01733] hypothetical protein (2.67e-20) | Hypothetical protein [Desulfovibrio oxyclinae] (WP_018123373.1, 3e-23), 36/54; Hypothetical protein [Pelagibacter phage HTVC019P] (YP_007517841.1, 1e-19), 34/54; Putative internal virion protein [Rhizobium phage RHEph01] (AGC35553.1, 4e-17), 32/47
54 | Hypothetical protein | ATG | 32501 | 33046 | 546 | + | None | Hypothetical protein SMRU11_1261 [Sinorhizobium meliloti RU11/001] (CDH85461.1, 3e-80), 70/80; Putative internal virion protein [Rhizobium phage RHEph01] (AGC35554.1, 3e-35), 42/60
55 | Hypothetical protein | ATG | 33046 | 35280 | 2235 | + | None | Hypothetical protein SMRU11_1262 [Sinorhizobium meliloti RU11/001] (CDH85462.1, 2e-125), 36/53; Hypothetical protein RHEph01_gp045 [Rhizobium phage RHEph01] (AGC35555.1, 9e-2), 23/42


56 | Putative internal virion protein | ATG | 35338 | 39063 | 3726 | + | PHA03415 [PHA03415] Putative internal virion protein; Provisional (5.19e-09) | Transglycosylase domain protein [Sinorhizobium meliloti RU11/001] (CDH85463.1, 0.0), 47/63; Putative internal virion protein [Rhizobium phage RHEph01] (AGC35556.1, 6e-87), 29/45

57 | Putative T7-like tail fibre protein | ATG | 39139 | 40773 | 1635 | + | Phage_T7_tail [pfam03906] Phage T7 tail fibre protein (4.14e-15) | Collagen-like protein 7 [Sinorhizobium meliloti RU11/001] (CDH85464.1, 4e-122), 40/50

58 | Hypothetical protein | ATG | 40782 | 41189 | 408 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805376.1, 1e-33), 47/59; Hypothetical protein [Mesorhizobium sp. LNJC384A00] (WP_023750205.1, 3e-20), 54/66

59 | Hypothetical protein | GTG | 41219 | 41395 | 177 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023729931.1, 2e-25), 82/

60 | Hypothetical protein | GTG | 41392 | 41664 | 273 | + | None | Hypothetical protein SMRU11_1266 [Sinorhizobium meliloti RU11/001] (CDH85466.1, 2e-22), 54/70

61 | Hypothetical protein | ATG | 41664 | 41927 | 264 | + | None | Hypothetical protein [Mesorhizobium sp. LNJC384A00] (WP_023750213.1, 1e-33), 70/80; Hypothetical protein [Mesorhizobium sp. LNJC405B00] (WP_023729920.1, 1e-30), 65/77

62 | Conserved hypothetical protein | ATG | 41927 | 43177 | 1251 | + | None | Putative conserved protein [Rhizobium sp. LPU83] (CDM57381.1, 1e-25), 29/44; Hypothetical protein [Asticcacaulis excentricus] (WP_013477879.1, 6e-24), 30/45

63 | Putative DNA maturase B | ATG | 43278 | 44984 | 1707 | + | None | DNA maturase B DNA packaging protein B [Sinorhizobium meliloti RU11/001] (CDH85467.1, 0.0), 76/84; Phage-related DNA maturase [Azorhizobium caulinodans] (WP_012172098.1, 0.0), 61/74; Putative terminase protein large subunit [Rhizobium phage RHEph01] (AGC35562.1, 0.0), 59/72

64 | Conserved hypothetical protein | ATG | 45426 | 45175 | 252 | - | pfam06169 Protein of unknown function (DUF982) (1.10e-05) | Hypothetical protein [Mesorhizobium sp. L2C054A000] (WP_023821684.1, 3e-39), 84/87; Hypothetical protein [Mesorhizobium sp. LNJC394B00] (WP_023744009.1, 6e-35), 74/81


Appendix VIII: Genome annotations of vB_MloP_Cp1R7ANS-C2: Open reading frames in the genome and their predicted functions

ORF | Putative function | Start codon | Start (bp) | End (bp) | Length | Strand orientation | Associated conserved domains (E value) | Best BLASTP match(es) (GenBank accession no., E value), % identity/% similarity

1 | Putative DNA maturase B protein | GTG | 11 | 424 | 414 | + | None | DNA maturase B [Mesorhizobium sp. L2C067A000] (WP_023815422.1, 2e-81), 92/98; Predicted DNA packaging protein B or DNA maturase B [Citrobacter phage CR8] (YP_009004210.1, 2e-21), 47/63; Large terminase subunit [Vibrio phage Vc1] (AHN84667.1, 7e-20), 49/69

2 | Hypothetical protein | GTG | 547 | 738 | 192 | - | None | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815421.1, 3e-16), 60/76; Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805368.1, 1e-15), 57/76

3 | Hypothetical protein | ATG | 735 | 1055 | 321 | - | None | Hypothetical protein [Mesorhizobium sp. LSHC420B00] (WP_023722778.1, 1e-26), 66/79; Hypothetical protein [Mesorhizobium sp. LSHC420B00] (WP_023720041.1, 3e-26), 63/78

4 | Hypothetical protein | ATG | 1049 | 1243 | 195 | - | None | Hypothetical protein [Mesorhizobium ciceri biovar biserrulae WSM1271] (YP_004144685.1, 5e-24), 78/87; Hypothetical protein [Mesorhizobium sp. LNHC252B00] (WP_023758276.1, 7e-24), 74/81

5 | Hypothetical protein | ATG | 2176 | 2967 | 792 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805363.1, 3e-115), 69/78; Hypothetical protein [Aminobacter sp. J41] (WP_024847825.1, 5e-104), 62/73

6 | ORF 6 | ATG | 3036 | 3179 | 144 | + | None | No significant similarities

7 | Conserved hypothetical protein | GTG | 3200 | 3610 | 411 | + | PHA00451 [PHA00451] Protein kinase (2.37e-04) | No significant similarities


8 | Hypothetical protein | ATG | 3639 | 3827 | 189 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805362.1, 2e-35), 95/100

9 | Hypothetical protein | ATG | 3922 | 4224 | 303 | + | None | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815415.1, 2e-46), 93/95; Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805361.1, 6e-45), 90/92

10 | Hypothetical protein | ATG | 4227 | 4529 | 303 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805360.1, 3e-60), 87/89; Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815414.1, 1e-59), 86/88

11 | Hypothetical protein | ATG | 4526 | 4735 | 210 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805359.1, 6e-41), 94/95; Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815413.1, 9e-41), 94/97

12 | Hypothetical protein | ATG | 4732 | 5037 | 306 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805358.1, 2e-62), 90/95; Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815412.1, 5e-46), 72/80
13 | Hypothetical protein | ATG | 5034 | 5384 | 351 | + | None | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815411.1, 7e-05), 35/60

14 | Hypothetical protein | ATG | 5381 | 5599 | 219 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805356.1, 2e-43), 89/98

15 | Hypothetical protein | ATG | 5859 | 6041 | 183 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805354.1, 1e-18), 71/81



16 | ORF 16 | ATG | 6107 | 6517 | 411 | + | None | No significant similarities

17 | Hypothetical protein | ATG | 6675 | 6905 | 231 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805353.1, 7e-38), 86/88; Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815410.1, 6e-37), 84/86
18 | Hypothetical protein | ATG | 6924 | 7298 | 375 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805352.1, 4e-75), 94/95
19 | Putative integrase | ATG | 7279 | 8259 | 981 | + | HP1_INT_C [cd00797] Phage HP1 integrase (4.54e-29); Phage_integrase [pfam00589] Phage integrase family (6.79e-22) | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815408.1, 0.0), 84/90; Phage HP1 integrase [Azorhizobium caulinodans ORS 571] (YP_001526486.1, 1e-34), 31/45
20 | ORF | ATG | 8256 | 8447 | 192 | + | None | No significant similarities

21 | ORF | ATG | 8725 | 8898 | 174 | + | None | No significant similarities

22 | Hypothetical protein | ATG | 8898 | 9092 | 195 | + | None | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815449.1, 7e-13), 54/63
23 | Hypothetical protein | ATG | 9089 | 9490 | 402 | + | None | Hypothetical protein RHEph03_gp016 [Rhizobium phage RHEph03] (AGC35643.1, 1e-43), 55/67; Hypothetical protein RHEph02_gp015 [Rhizobium phage RHEph02] (AGC35582.1, 1e-43), 55/67; Hypothetical protein L338C_147 [Rhizobium phage vB_RleS_L338C] (YP_009003304.1, 7e-35), 48/67
24 | Hypothetical protein | TTG | 9487 | 9726 | 240 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805410.1, 2e-21), 62/71


25 | Putative phage RNA polymerase | ATG | 9771 | 12257 | 2487 | + | RNA_pol [pfam00940] DNA-dependent RNA polymerase (1.42e-133); PHA00452 [PHA00452] T3/T7-like RNA polymerase (0e+00) | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815447.1, 0.0), 80/88; T3/T7 RNA polymerase [Mesorhizobium sp. LNJC405B00] (WP_023729895.1, 2e-159), 35/54; DNA-directed RNA polymerase [Azorhizobium caulinodans ORS 571] (YP_001526523.1, 3e-147), 35/52

26 | Hypothetical protein | ATG | 12286 | 12522 | 237 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805408.1, 8e-46), 95/97; Hypothetical protein [Aminobacter sp. J41] (WP_024847862.1, 4e-27), 69/80

27 | Hypothetical protein | TTG | 12523 | 12738 | 216 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805407.1, 1e-46), 100/100

28 | Hypothetical protein | ATG | 12738 | 13040 | 303 | + | None | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815446.1, 1e-56), 93/95; Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805406.1, 1e-55), 93/94
29 | Putative phage single-stranded DNA-binding protein | ATG | 13033 | 13935 | 903 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805404.1, 3e-138), 95/98; Single-stranded DNA-binding protein [Lysobacter capsici AZ78] (EYR67303.1, 8e-22), 34/50

30 | Putative phage endonuclease | GTG | 13970 | 14317 | 348 | + | PHA00159 [PHA00159] Endonuclease I (1.75e-36); Phage_endo_I [pfam05367] Phage endonuclease I (2.24e-36) | MULTISPECIES: endonuclease I [Mesorhizobium] (WP_023805403.1, 1e-74), 96/99; Hypothetical protein [Aminobacter sp. J41] (WP_024847859.1, 8e-52), 71/81

31 | ORF 31 | ATG | 14314 | 14757 | 444 | + | None | No significant similarities

32 | Hypothetical protein | ATG | 14754 | 15011 | 258 | + | None | Hypothetical protein [Campylobacter showae] (WP_002951730.1, 3e-04), 33/49

33 | ORF 33 | GTG | 14996 | 15178 | 183 | + | None | No significant similarities

34 | ORF 34 | ATG | 15171 | 15344 | 174 | + | None | No significant similarities

35 | ORF 35 | ATG | 15344 | 15925 | 582 | + | None | No significant similarities

36 | Hypothetical protein | ATG | 15965 | 16234 | 270 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805401.1, 1e-33), 74/

37 | Hypothetical protein | ATG | 16234 | 16452 | 219 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805400.1, 1e-36), 88/93; Hypothetical protein RHEph01_gp029 [Rhizobium phage RHEph01] (AGC35540.1, 2e-06), 50/67
38 | Putative primase/helicase protein | ATG | 16449 | 18122 | 1674 | + | GP4d_helicase [cd01122] (8.64e-61); TOPRIM_primases [cd01029] (1.51e-08); Toprim_2 [pfam13155] Toprim-like (7.29e-11) | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815445.1, 0.0), 91/95; DNA primase [Azorhizobium caulinodans ORS 571] (YP_001526514.1, 1e-126), 44/59; DNA primase/helicase-like protein [Ralstonia phage RSB2] (YP_009017750.1, 9e-113), 41/56

39 | Conserved hypothetical protein | ATG | 18122 | 18460 | 339 | + | HTH_39 [pfam14090] Helix-turn-helix domain (1.66e-09) | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805398.1, 2e-66), 90/93; Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815444.1, 1e-63), 88/93

40 | Hypothetical protein | ATG | 18457 | 18693 | 237 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805397.1, 0.0), 42/


41 | ORF 41 | ATG | 18693 | 18899 | 207 | + | None | No significant similarities

42 | Putative DNA polymerase | TTG | 18900 | 20837 | 1938 | + | DNA_pol_A_pol_I_B [cd08643] Polymerase I (9.78e-140); DNA_pol_A [pfam00476] DNA polymerase family A (8.65e-31); POLAc [smart00482] DNA polymerase A domain (2.58e-14) | Hypothetical protein [Aminobacter sp. J41] (WP_024847853.1, 0.0), 57/70; DNA polymerase [Xanthomonas arboricola] (WP_016904324.1, 3e-151), 41/54
43 | Hypothetical protein | TTG | 20834 | 21010 | 177 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805395.1, 9e-24), 83/90

44 | Putative phage exonuclease | ATG | 20994 | 21806 | 813 | + | PHA00439 [PHA00439] Exonuclease (1.54e-69); PIN_T4-like [cd09860] PIN domain of bacteriophage T3, T4 RNase H, T5-5'nuclease, and homologs (3.21e-06) | Exonuclease [Mesorhizobium sp. L2C067A000] (WP_023815441.1, 0.0), 93/96; Exonuclease [Mesorhizobium sp. L2C089B000] (WP_023805394.1, 0.0), 93/96
45 | Conserved hypothetical protein | ATG | 21806 | 22189 | 384 | + | DUF3310 [pfam11753] Protein of unknown function (DUF3310) (3.06e-10) | MULTISPECIES: alkaline exonuclease [Mesorhizobium] (WP_023805393.1, 4e-25), 87/90; Hypothetical protein [Aminobacter sp. J41] (WP_024847849.1, 7e-24), 47/54

46 | Hypothetical protein | ATG | 22258 | 22446 | 189 | + | None | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815440.1, 6e-33), 95/96; Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805392.1, 3e-32), 92/96


47 | Hypothetical protein | ATG | 22450 | 22653 | 204 | + | None | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815439.1, 4e-40), 99/100; Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805391.1, 3e-26), 100/100
48 | Putative phage head-to-tail joining protein | TTG | 22668 | 24320 | 1653 | + | Head-tail_con [pfam12236] Bacteriophage head to tail connecting protein (2.00e-98) | Head-tail joining protein [Mesorhizobium sp. L2C067A000] (WP_023815438.1, 0.0), 95/97; Head-tail joining protein [Mesorhizobium sp. L2C089B000] (WP_023805390.1, 0.0), 95/97; Putative head-to-tail joining protein [Rhizobium phage RHEph01] (AGC35543.1, 2e-141), 46/64
49 | Hypothetical protein | ATG | 24320 | 24799 | 480 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805389.1, 1e-81), 76/80; Hypothetical protein RHEph01_gp034 [Rhizobium phage RHEph01] (AGC35545.1, 1e-11), 39/52

50 | Putative T7-like capsid assembly protein | ATG | 24823 | 25683 | 861 | + | PHA00435 [PHA00435] Capsid assembly protein (2.35e-09); Phage_T7_Capsid [pfam05396] Phage T7 capsid assembly protein (1.67e-05) | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815436.1, 0.0), 91/96; Putative T7-like capsid assembly protein [Rhizobium phage RHEph01] (AGC35546.1, 5e-33), 33/49

51 | Conserved hypothetical protein | ATG | 25693 | 25929 | 237 | + | HNHc [cd00085] HNH nucleases (1.80e-05); HNHc [smart00507] HNH nucleases (1.76e-04) | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805387.1, 2e-45), 99/98; Hypothetical protein [Aminobacter sp. J41] (WP_024847841.1, 9e-32), 78/80

52 | ORF | ATG | 26014 | 26166 | 153 | - | None | No significant similarity


53 | Putative major capsid protein | ATG | 26289 | 27251 | 963 | + | PHA00201 [PHA00201] Major capsid protein (3.87e-41) | Capsid protein [Mesorhizobium sp. L2C089B000] (WP_023805386.1, 0.0), 99/99; Capsid protein [Mesorhizobium sp. L2C067A000] (WP_023815435.1, 0.0), 99/99
54 | Putative T7-like tail tubular protein A | ATG | 27328 | 27930 | 603 | + | PHA00428 [PHA00428] Tail tubular protein A (1.77e-32) | Tail protein [Mesorhizobium sp. L2C067A000] (WP_023815434.1, 8e-128), 95/97; Tail protein [Mesorhizobium sp. L2C089B000] (WP_023805385.1, 8e-128), 95/97; Tail tubular protein A [Pseudomonas putida KT2440] (NP_744432.1, 2e-26), 40/56; Putative tail tubular protein A [Rhizobium phage RHEph01] (AGC35551.1, 6e-26), 38/55
55 | Putative T7-like tail tubular protein B | ATG | 27931 | 30309 | 2379 | + | None | Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805383.1, 0.0), 87/93; Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815432.1, 0.0), 87/93; Tail tubular protein B [Pseudomonas putida KT2440] (NP_744433.1, 5e-90), 31/48; Putative tail tubular protein B [Rhizobium phage RHEph01] (AGC35552.1, 6e-86), 29/47
56 | Conserved hypothetical protein | ATG | 30316 | 30798 | 483 | + | PHA01733 [PHA01733] Hypothetical protein (9.73e-04) | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805382.1, 8e-110), 94/96; Hypothetical protein [Aminobacter sp. J41] (WP_024847837.1, 3e-43), 51/63
57 | Hypothetical protein | ATG | 30791 | 31252 | 462 | + | None | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815431.1, 6e-97), 97/99; Hypothetical protein [Mesorhizobium sp. L2C089B000] (WP_023805381.1, 5e-91), 97/99
58 | Putative lytic enzyme | ATG | 31252 | 33246 | 1995 | + | None | Membrane protein [Mesorhizobium sp. L2C067A000] (WP_023815430.1, 0.0), 91/95; Putative tail-fiber/lysozyme protein [Acinetobacter phage IME-AB2] (AFV51565.1, 1e-14), 27/44; Muramidase (phage lambda lysozyme) [Thioflavicoccus mobilis 8321] (YP_007243356.1, 1e-14), 33/54


59 | Hypothetical protein | ATG | 33246 | 37046 | 3801 | + | None | Hypothetical protein [Mesorhizobium sp. L2C067A000] (WP_023815429.1, 0.0), 88/93
60 | Phage tail fibre protein | GTG | 37140 | 38795 | 1656 | + | Phage_T7_tail [pfam03906] Phage T7 tail fibre protein (5.56e-16) | Hypothetical protein [Mesorhizobium sp. LNJC384A00] (WP_023750206.1, 3e-75), 36/48; Phage tail fiber protein [Pseudomonas putida KT2440] (NP_744437.1, 4e-26), 26/43
61 | Hypothetical protein | ATG | 38954 | 39184 | 231 | + | None | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805376.1, 2e-23), 59/79; Hypothetical protein PBC5p15 [Sinorhizobium phage PBC5] (NP_542275.1, 1e-13), 43/66
62 | Putative lytic enzyme with lysozyme-like domain | TTG | 39248 | 40000 | 753 | + | Lysozyme-like domain; COG3179 [COG3179] Predicted chitinase (7.49e-13) | Pyocin R, lytic enzyme [Mesorhizobium sp. L2C089B000] (WP_023805375.1, 9e-105), 65/73; Pyocin R, lytic enzyme [Mesorhizobium sp. L2C067A000] (WP_023815426.1, 5e-104), 65/73; Chitinase [Rhizobium phaseoli] (WP_016734091.1, 1e-48), 43/61
63 | Hypothetical protein | ATG | 40004 | 40372 | 369 | + | None | Hypothetical protein [Aminobacter sp. J41] (WP_024847829.1, 2e-60), 69/84; MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805374.1, 1e-50), 68/77
64 | Hypothetical protein | GTG | 40369 | 40548 | 180 | + | None | Hypothetical protein [Aminobacter sp. J41] (WP_024847828.1, 7e-11), 53/72

65 | Conserved hypothetical protein | ATG | 40545 | 40778 | 234 | + | PHA01735 [PHA01735] Hypothetical protein (2.15e-12) | MULTISPECIES: hypothetical protein [Mesorhizobium] (WP_023805372.1, 1e-32), 79/85; Hypothetical protein [Aminobacter sp. J41] (WP_024847827.1, 4e-20), 63/76

66 | Conserved hypothetical protein | GTG | 40775 | 42028 | 1254 | + | Lipase_GDSL_2 [pfam13472] GDSL-like Lipase/Acylhydrolase family (1.03e-05) | Putative conserved protein [Rhizobium sp. LPU83] (CDM57381.1, 3e-31), 29/43; Hypothetical protein [Bradyrhizobium sp. YR681] (WP_008132324.1, 2e-21), 28/43
67 | Hypothetical protein | ATG | 42044 | 42289 | 246 | + | None | Hypothetical protein [Mesorhizobium sp. (WP_023815423.1, 8e-32), 69/81
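
The Start, End and Length columns in these annotation tables can be cross-checked against one another. The following is a minimal sketch, not part of the original annotation workflow. It assumes that Length is the inclusive coordinate span (|End - Start| + 1) and that each ORF covers a whole number of codons. Both assumptions are inferred from the listed values rather than stated in the text, and the names `Orf`, `expected_length` and `is_consistent` are hypothetical. The sample rows are copied from the table above.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    """One table row, reduced to its coordinate columns (hypothetical helper type)."""
    number: int
    start: int
    end: int
    length: int
    strand: str


def expected_length(start: int, end: int) -> int:
    # Assumption: coordinates are 1-based and inclusive, so the span is |end - start| + 1.
    # abs() also covers rows where a minus-strand ORF is listed with start > end.
    return abs(end - start) + 1


def is_consistent(orf: Orf) -> bool:
    # Assumption: a complete ORF spans a whole number of codons (length divisible by 3).
    return expected_length(orf.start, orf.end) == orf.length and orf.length % 3 == 0


# Sample rows taken from the vB_MloP_Cp1R7ANS-C2 table (ORFs 1, 2, 6 and 66).
ROWS = [
    Orf(1, 11, 424, 414, "+"),
    Orf(2, 547, 738, 192, "-"),
    Orf(6, 3036, 3179, 144, "+"),
    Orf(66, 40775, 42028, 1254, "+"),
]

inconsistent = [orf.number for orf in ROWS if not is_consistent(orf)]
print("Inconsistent ORFs:", inconsistent)
```

A check of this kind is mainly useful for catching transcription errors in the coordinate columns. It says nothing about whether the predicted function is correct.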


Appendix IX: Genome annotations of vB_RleM_PPF1: Open reading frames in the genome and their predicted functions

ORF | Putative function | Start codon | Start (bp) | End (bp) | Length | Strand orientation | Associated conserved domains (E value) | Best BLASTP match(es) (GenBank accession no., E value), % identity/% similarity
1 | Hypothetical protein | ATG | 121 | 705 | 585 | + | None | Hypothetical protein Avi_2309 [Agrobacterium vitis S4] (YP_002549657.1, 7e-73), 59/74; Hypothetical protein PSE_3036 [Pseudovibrio sp. FO-BEG1] (YP_005081566.1, 5e-46), 47/61
2 | Phage terminase, large subunit | ATG | 702 | 2693 | 1992 | + | [pfam05876] Phage terminase large subunit (GpA) (5.08e-128) | Phage terminase large subunit (gp15) [Agrobacterium vitis S4] (YP_002549658.1, 0), 79/87; Phage terminase large subunit [Pseudovibrio sp. FO-BEG1] (YP_005081567.1, 0), 62/74; Terminase GpA [Methylosinus trichosporium] (WP_003611085.1, 0), 51/66
3 | Portal protein | ATG | 2678 | 4312 | 1635 | + | [pfam05136] Phage portal protein, lambda family (1.22e-64); [COG5511] Bacteriophage capsid protein (4.57e-30) | Phage portal protein, lambda family [Rhodobacteraceae bacterium KLH11] (WP_008755733.1, 0), 60/75; Bacteriophage capsid protein [Agrobacterium vitis S4] (YP_002549659.1, 0), 57/69; Capsid protein of prophage [Pseudovibrio sp. FO-BEG1] (YP_005081568.1, 0), 58/73

4 | Hypothetical protein | ATG | 4312 | 4584 | 273 | + | None | Hypothetical protein [Ochrobactrum intermedium] (WP_006470858.1, 1e-09), 38/56
5 | Putative periplasmic serine protease | ATG | 4559 | 5518 | 960 | + | [cd07022] Signal peptide peptidase A (SppA) type, a serine protease (1.42e-74); [COG0616] Periplasmic serine proteases (ClpP class) (5.35e-40) | Periplasmic serine proteases (ClpP class) [Agrobacterium vitis S4] (YP_002549660.1, 2e-146), 66/82; Periplasmic serine protease [Pelagibaca bermudensis] (WP_007800175.1, 6e-78), 47/63; Serine peptidase [Roseobacter sp. MED193] (WP_009810009.1, 1e-77), 49/62

6 | Putative lytic murein transglycosylase | ATG | 5523 | 6068 | 546 | + | None | Lytic murein transglycosylase [Agrobacterium vitis S4] (YP_002549661.1, 2e-16), 41/53


7 | Putative head decorative protein | ATG | 6099 | 6500 | 402 | + | None | Hypothetical protein LOKG_00025 [Loktanella phage pCB2051-A] (YP_007674922.1, 3e-17), 38/52; Lambda head decoration protein D HDPD [Sinorhizobium meliloti AK83] (YP_004548722.1, 2e-10), 37/51
8 | Putative major capsid protein | ATG | 6534 | 7598 | 1065 | + | [pfam03864] Phage major capsid protein E (3.69e-52) | Type II secretory pathway pullulanase PulA [Agrobacterium sp. ATCC 31749] (WP_006314168.1, 2e-84), 43/58; Major capsid protein [Burkholderia phage AH2] (YP_006561146.1, 5e-62), 39/55; Capsid protein E [Burkholderia phage BcepNazgul] (NP_918991.1, 6e-52), 34/50
9 | ORF 9 | ATG | 7666 | 7941 | 276 | + | None | No significant similarities

10 | ORF 10 | ATG | 7944 | 8342 | 399 | + | None | No significant similarities
11 | Hypothetical protein | ATG | 8329 | 9012 | 684 | + | None | Hypothetical protein [Hoeflea phototrophica] (WP_007196953.1, 5e-40), 45/92; Hypothetical protein BBta_6594 [Bradyrhizobium sp. BTAi1] (YP_001242403.1, 1e-05), 27/84
12 | ORF 12 | ATG | 9040 | 9243 | 204 | + | None | No significant similarities
13 | Bacteriophage tail sheath protein | ATG | 9244 | 10734 | 1491 | + | [pfam04984] Phage tail sheath protein (4.98e-08) | Putative sheath protein [Hoeflea phototrophica] (WP_007196955.1, 1e-172), 51/69; Putative bacteriophage Mu tail sheath [Ahrensia sp. R2A130] (WP_009464998.1, 3e-110), 41/58
14 | Putative tail tube protein | ATG | 10762 | 11133 | 372 | + | [pfam10618] Phage tail tube protein (6.67e-05) | Hypothetical protein [Hoeflea phototrophica] (WP_007196956.1, 8e-35), 51/65
15 | Conserved hypothetical protein | ATG | 11133 | 11453 | 321 | + | [pfam10109] Mu-like prophage FluMu protein gp41 (3.55e-05) | Hypothetical protein [Hoeflea phototrophica] (WP_007196957.1, 3e-08), 30/56
16 | Phage tail tape measure protein | ATG | 11557 | 13377 | 1821 | + | [pfam10145] Phage-related minor tail protein (4.4e-19); [TIGR01760] Phage tail tape measure protein (3.57e-09) | Phage tail tape measure protein TP901, core region [Hoeflea phototrophica] (WP_007196958.1, 4e-100), 36/53; Phage tail tape measure protein [Agrobacterium vitis S4] (YP_002549675.1, 8e-73), 34/54


17 | Putative tail/DNA circulation protein | ATG | 13379 | 13837 | 459 | + | [pfam07157] DNA circulation protein N-terminus (1.89e-12); [COG4228] Mu-like prophage DNA circulation protein (1.36e-10) | Tail/DNA circulation protein, putative [Hoeflea phototrophica] (WP_007196959.1, 3e-26), 41/55; DNA circulation family protein [Hyphomicrobium denitrificans ATCC 51888] (YP_003755364.1, 6e-21), 41/53
18 | Hypothetical protein | ATG | 13848 | 14471 | 624 | + | None | Hypothetical protein [Hoeflea phototrophica] (WP_007196960.1, 7e-22), 40/56; Hypothetical protein [Ahrensia sp. R2A130] (WP_009464985.1, 2e-08), 36/54
19 | Putative bacteriophage tail protein GpP, Mu-like | ATG | 14480 | 15526 | 1047 | + | [COG4379] Mu-like prophage tail protein gpP (3.41e-30) | Phage tail protein, putative [Hoeflea phototrophica] (WP_007196961.1, 2e-106), 52/64; Putative prophage MuSo2 tail protein [Ahrensia sp. R2A130] (WP_009464984.1, 8e-84), 48/61
20 | Baseplate assembly protein V | ATG | 15523 | 15984 | 462 | + | [pfam06890] Bacteriophage Mu Gp45 protein (2.66e-12); [TIGR01644] Phage baseplate assembly protein V (5.2e-05) | Phage-related baseplate assembly protein V [Hoeflea phototrophica] (WP_007196962.1, 2e-22), 48/56; Mu-like prophage protein GP45-like protein [Methylobacterium nodulans ORS 2060] (YP_002497719.1, 3e-11), 35/46
21 | Hypothetical protein | ATG | 16028 | 16264 | 237 | + | None | Hypothetical protein [Citrobacter freundii] (WP_003034764.1, 7e-18), 51/66; Hypothetical protein Entas_2691 [Enterobacter asburiae LF7a] (YP_004829204.1, 3e-14), 46/61
22 | Putative bacteriophage GP46 family protein | ATG | 16268 | 16804 | 537 | + | [pfam07409] Phage protein GP46 (7.20e-24) | Bacteriophage protein GP46 [Hoeflea phototrophica] (WP_007196963.1, 3e-60), 55/70; GP46 family protein [Ahrensia sp. R2A130] (WP_009464981.1, 1e-21), 37/52
23 | Putative baseplate J-like protein | ATG | 16776 | 17858 | 1083 | + | [COG3299] Uncharacterized homolog of phage Mu protein gp47 (2.05e-23); [pfam04865] Baseplate J-like protein (3.49e-16) | Baseplate J-like protein [Hoeflea phototrophica] (WP_007196964.1, 4e-84), 49/61; Baseplate J family protein [Methylobacterium nodulans ORS 2060] (YP_002497717.1, 6e-56), 40/52


24 | Conserved hypothetical protein | ATG | 17848 | 18633 | 786 | + | [COG3778] Uncharacterized protein conserved in bacteria [Function unknown] (6.72e-13) | Tail protein, putative [Hoeflea phototrophica] (WP_007196965.1, 1e-48), 43/56; Putative bacteriophage related protein [Ahrensia sp. R2A130] (WP_009464979.1, 2e-29), 37/50
25 | Hypothetical protein | ATG | 18652 | 19536 | 885 | + | None | Hypothetical protein [Hoeflea phototrophica] (WP_007196966.1, 9e-71), 47/64; Hypothetical protein RHEph04_gp035 [Rhizobium phage RHEph04] (AGC35721.1, 3e-66), 53/69

26 | Hypothetical protein | ATG | 19546 | 20247 | 702 | + | None | Hypothetical protein HEAR2278 [Herminiimonas arsenicoxydans] (YP_001100533.1, 1e-13), 42/48; Hypothetical protein [Acinetobacter baumannii] (WP_001173623.1, 5e-10), 26/40

27 | Hypothetical protein | ATG | 20244 | 21008 | 765 | + | None | p023 [Bradyrhizobium sp. S23321] (YP_005451134.1, 2e-09), 29/44
28 | Hypothetical protein | ATG | 21008 | 21859 | 852 | + | None | Hypothetical protein S23_28020 [Bradyrhizobium sp. S23321]
29 | ORF 29 | ATG | 21856 | 22113 | 258 | - | None | No significant similarities
30 | ORF 30 | ATG | 22184 | 22579 | 396 | + | None | No significant similarities

31 | Hypothetical protein | ATG | 22581 | 23432 | 852 | + | None | Hypothetical protein [Rhizobium leguminosarum] (WP_003586748.1, 2e-18), 42/59
32 | Conserved hypothetical protein | GTG | 23545 | 24651 | 1107 | + | [cd00229] SGNH_hydrolase (7.83e-05) | Hypothetical protein [Paenibacillus elgii] (WP_010495069.1, 3e-05), 26/54
33 | Putative N-acetylmuramoyl-L-alanine amidase | ATG | 24720 | 25244 | 525 | + | [pfam01510] N-acetylmuramoyl-L-alanine amidase (1.49e-11); [cd06583] Peptidoglycan recognition proteins (PGRPs) (7.95e-07) | Hypothetical protein PBC5p05 [Sinorhizobium phage PBC5] (NP_542265.1, 2e-64), 56/97; N-acetylmuramoyl-L-alanine amidase Cell wall hydrolase; Autolysin; ORFL3 [Sinorhizobium fredii HH103] (YP_005187481.1, 2e-59), 51/97; N-acetylmuramoyl-L-alanine amidase [Ochrobactrum intermedium] (WP_006472367.1, 1e-58), 52/97


34 | ORF 34 | ATG | 25244 | 25570 | 327 | + | None | No significant similarities
35 | Hypothetical protein | ATG | 25567 | 25926 | 360 | + | None | Hypothetical protein, partial [Rhizobium etli] (WP_010028892.1, 2e-10), 45/64; Hypothetical protein Avi_3055 [Agrobacterium vitis S4] (YP_002550207.1, 6e-09), 38/75
36 | Hypothetical protein | ATG | 25883 | 26137 | 255 | + | None | Hypothetical protein Rpal_0665 [Rhodopseudomonas palustris TIE-1] (YP_001989700.1, 7e-05), 36/90

37 | Hypothetical protein | ATG | 26168 | 26434 | 267 | + | None | Hypothetical protein [Rhizobium etli] (WP_010008740.1, 5e-16), 49/82
38 | Hypothetical protein | ATG | 26431 | 26616 | 186 | + | None | Hypothetical protein [Rhizobium etli] (WP_010008742.1, 6e-07), 45/98
39 | Hypothetical protein | GTG | 26613 | 27095 | 483 | + | [COG2217] Cation transport ATPase (3.55e-04) | Hypothetical protein [Rhizobium etli] (WP_010008744.1, 2e-41), 47/98
40 | ORF 40 | ATG | 27257 | 27373 | 117 | - | None | No significant similarities
41 | Hypothetical protein | ATG | 27348 | 27641 | 294 | - | None | Hypothetical protein Rleg_6990 [Rhizobium leguminosarum bv. trifolii WSM1325] (YP_002984975.1, 2e-57), 93/95; Hypothetical protein [Rhizobium leguminosarum] (WP_003539226.1, 3e-50), 80/93

42 | Putative ATP-dependent DNA ligase clustered with Ku protein, Lig D | ATG | 27700 | 28746 | 1047 | + | [cd07906] Adenylation domain of Mycobacterium tuberculosis LigD and LigC-like ATP-dependent DNA ligases (6.23e-78); [cd07971] Oligonucleotide/oligosaccharide binding (OB)-fold domain of ATP-dependent DNA ligase (1.59e-37); [TIGR02779] DNA polymerase LigD (4.72e-117) | Putative DNA ligase [Rhizobium leguminosarum bv. viciae 3841] (YP_771149.1, 0), 81/87; DNA polymerase LigD-like ligase domain-containing protein [Rhizobium leguminosarum] (WP_003539228.1, 0), 80/86; DNA polymerase LigD, ligase domain protein [Rhizobium leguminosarum bv. trifolii WSM1325] (YP_002984974.1, 0), 79/86

43 | Probable site-specific integrase/recombinase | ATG | 29009 | 30124 | 1116 | - | [cd01182] DNA breaking-rejoining enzymes, integrase/recombinases (2.93e-17); [pfam00589] Phage integrase family (1.10e-10) | Phage integrase family protein [Rhizobium grahamii] (WP_016553054.1, 0), 86/92; Phage integrase family protein [Ochrobactrum anthropi ATCC 49188] (YP_001368763.1, 5e-153), 59/72; Site-specific recombinase XerD [Rhizobium sp. AP16] (WP_007694989.1, 1e-141), 70/80

44 | ORF 44 | ATG | 30182 | 30295 | 114 | - | None | No significant similarities
45 | ORF 45 | ATG | 30324 | 30560 | 237 | - | None | No significant similarities
46 | Hypothetical protein | ATG | 30557 | 31072 | 516 | - | None | Hypothetical protein [Rhizobium sp. CF122] (WP_007788764.1, 1e-30), 37/53; Hypothetical protein SM11_chr1881 [Sinorhizobium meliloti SM11] (YP_005720409.1, 2e-22), 38/54
47 | Hypothetical protein | ATG | 31072 | 31371 | 300 | - | None | Hypothetical protein [Brevundimonas diminuta] (WP_003164146.1, 2e-08), 44/54

Hypothetical protein Mpe_B0272 [Methylibium petroleiphilum PM1] Hypothetical (YP_001023282.1, 9e-31), 50/67 48 ATG 31368 31901 534 - None protein Hypothetical protein PCC7424_3310 [Cyanothece sp. PCC 7424] (YP_002378577.1, 3e-06), 41/48 49 ORF49 ATG 31898 32140 243 - None No significant similarities Hypothetical Hypothetical protein [Rhizobium sp. PRF 81] (WP_004110160.1, 2e-28), 50 ATG 32142 32417 276 - None protein 61/75

51 ORF 51 ATG 32407 33156 750 - None No significant similarities

[pfam07799] Protein of Hypothetical protein [Mesorhizobium alhagi] (WP_008835307.1, 2e-59), unknown function 57/70 Conserved (DUF1643) (7.70e-43) Protein of unknown function DUF1643 [Rhizobium freirei] 52 GTG 33153 33704 552 - hypothetical protein [COG4333] Uncharacterized (WP_004121178.1, 1e-55), 56/67 protein conserved in bacteria DUF1643 protein [Caulobacter phage CcrColossus] (YP_006988429.1, (3.19-33) 2e-37), 47/65 306

Position Associated Putative Best BLASTP match (es), (GenBank Accession No:, E value), Putative Function Conserved Domains

ORF %Identity/ %Similarity Strand Start End Length (E value) Orientation/ Orientation/ Start Codon Start

(bp) (bp)

Hypothetical PRK12775-containing protein [Rhodobacter phage RcapNL] 53 ATG 33701 33916 216 - None protein (YP_007518420.1, 3e-06), 33/56 54 ORF 54 ATG 33913 34140 228 - None No significant similarities 55 ORF 55 ATG 34140 34388 249 - None No significant similarities 56 ORF 56 ATG 34385 34603 219 - None No significant similarities Hypothetical Hypothetical protein [Rhizobium sp. CF122] (WP_007788766.1, 2e-06), 57 ATG 34597 34794 198 - None protein 62/78 [cd06554] ASC-1 homology Hypothetical protein OCAR_5535 [Oligotropha carboxidovorans OM5] Conserved 58 ATG 34791 35282 492 - domain, ASC-1-like (YP_002288531.1, 5e-25), 42/54 hypothetical protein subfamily (7.33e-11) Hypothetical protein [Sinorhizobium meliloti BL225C] [pfam14216] Domain of Conserved (YP_005713864.1, 2e-22), 42/51 59 GTG 35279 35713 435 - unknown function hypothetical protein Hypothetical protein BN69_1583 [Methylocystis sp. SC2] (DUF4326) (1.65e-11) (YP_006591685.1, 1e-21), 40/51 Hypothetical Protein of unknown function [Rhizobium sp.] (CCF19083.1, 4e-45), 60 ATG 35710 36120 411 - None protein 55/72 61 ORF 61 ATG 36117 36587 471 - None No significant similarities Hypothetical protein [Rhizobium leguminosarum] (WP_018516443.1, [pfam07505] Phage protein 2e-1670, 63/71 Gp37/Gp68 (4.02e-82) Gp37Gp68 family protein [Burkholderia sp. CCGE1002] Gp37/Gp68 family 62 ATG 36584 37780 1197 - [COG4422] Bacteriophage (YP_003609867.1, 5e-112), 51/61 phage protein protein gp37 [Function Phage Gp37Gp68 [Ochrobactrum intermedium] (WP_006470827.1, 4e- unknown] (3.68e-54) 108), 49/58

Hypothetical protein [Rhizobium sp. PDO1-076] (WP_007598575.1, 2e- 29),54/68 Hypothetical 63 ATG 37773 38297 525 - None Hypothetical protein Avi_6152 [Agrobacterium vitis S4] protein (YP_002547864.1, 3e-13), 60/75

64 ORF 64 ATG 38299 38907 609 - None No significant similarities 307

Position Associated Putative Best BLASTP match (es), (GenBank Accession No:, E value), % Putative Function Conserved Domains Start Start

ORF Start End Identity/ % Similarity

Codon Length (E value) n/ Strand n/

(bp) (bp) Orientatio gp42 [Mesorhizobium alhagi] (WP_008836206.1, 4e-91), 58/76 Hypothetical Hypothetical protein phiE125p43 [Burkholderia phage phiE125] 65 ATG 38907 39602 696 - None protein (NP_536399.1, 2e-26), 30/57

66 ORF 66 ATG 39606 39752 147 - None No significant similarities Hypothetical protein [Agrobacterium tumefaciens] (WP_003496583.1, Hypothetical 67 ATG 39742 40356 615 - None 5e-39), 53/65 protein

Hypothetical protein [Mesorhizobium alhagi] (WP_008836205.1, 2e-11), 52/65 Hypothetical 68 ATG 40349 40618 270 - None Hypothetical protein [Rhizobium sp. CF122] (WP_007788768.1, 4e-07), protein 44/64

Hypothetical Hypothetical protein [Rhizobium sp. CF122] (WP_007788770.1, 2e-10), 69 ATG 40608 40958 351 - None protein 38/50 70 ORF 70 GTG 40960 41229 270 - None No significant similarities

71 ORF 72 ATG 41226 41423 198 - None No significant similarities

72 ORF 72 GTG 41410 41745 336 + None No significant similarities 73 ORF 73 ATG 41742 42119 378 + None No significant similarities No significant similarities 74 ORF 74 GTG 42387 42578 192 - None

Hypothetical protein RHECIAT_CH0000505 [Rhizobium etli CIAT 652] (YP_001976676.1, 3e-11), 51/65 Hypothetical 75 TTG 42788 43123 336 + None Hypothetical protein BN406_00410 [Sinorhizobium meliloti Rm41] protein (YP_006839281.1, 4e-09), 37/60

Probable transcriptional [cd00093] Helix-turn-helix Hypothetical protein [Sinorhizobium meliloti] (WP_017272106.1, 9e-57), 76 regulator with a ATG 43120 43464 345 - XRE-family like proteins 72/85 helix-turn-helix (5.74e-07) domain Hypothetical Hypothetical protein [Sinorhizobium medicae] (WP_018208520.11, 1e- 77 ATG 43560 43787 228 + None protein 26), 68/77 308

Position Associated Putative

Best BLASTP match (es), (GenBank Accession No:, E value), % Putative Function Conserved Domains Start Start

ORF Start End Identity/ % Similarity

Codon Length (E value) (bp) (bp) / Strand Orientation Hypothetical protein [Rhizobium sp. CF122] (WP_007788781.1, 2e-47), Hypothetical 57/71 Conserved Hypothetical protein Mesop_3723 78 ATG 44020 44508 489 + None protein [Mesorhizobium opportunistum WSM2075] (YP_004612259.1, 1e-45), 55/67 Hypothetical protein SM11_chr1442 [Sinorhizobium meliloti SM11] Hypothetical (YP_005719979.1, 2e-08), 42/53 79 ATG 44508 44774 267 + None protein Hypothetical protein Sinme_1860 [Sinorhizobium meliloti AK83] (YP_004549201.1, 5e-08), 49/58 Hypothetical Hypothetical protein [Rhizobium leguminosarum] (WP_018516457.1, 80 ATG 44847 45320 474 + None protein 2e-64), 89/93 Hypothetical protein [Rhizobium leguminosarum] (WP_018516459.1, 0), Hypothetical 81 GTG 45292 46860 1569 + None 77/84 Hypothetical protein Avi_6173 [Agrobacterium vitis S4] protein (YP_002547878.1, 4e-174), 55/70 Hypothetical protein [Yersinia phage phiR1-RT](YP_007235865.1, 2e- 40), 68/76 Hypothetical Hypothetical protein BRADO3615 [Bradyrhizobium sp. ORS 278] 82 ATG 46857 47186 330 + None protein (YP_001205619.1, 2e-37), 66/75 Hypothetical protein CGPG_00035 [Cellulophaga phage phiST] (YP_007673417.1,1e-35), 62/74 Hypothetical protein [Rhizobium leguminosarum] (WP_018516460.1, Hypothetical 5e-54), 91/ 92 83 ATG 47183 47473 291 + None protein Hypothetical protein [Mesorhizobium alhagi] (WP_008836173.1, 2e-10), 45/54 Hypothetical protein [Rhizobium leguminosarum] (WP_01851646.1, 1e- Hypothetical 48), 88/91 84 ATG 47463 47738 276 + None protein Hypothetical protein [Nitratireductor pacificus] (WP_008599086.1, 2e- 06), 45/62 Hypothetical Hypothetical protein [Rhizobium leguminosarum] (WP_018516462.1, 85 ATG 47735 47932 198 + None protein 1e-14), 52/72 Hypothetical protein [Rhizobium leguminosarum] (WP_018516463.1, Hypothetical 5e-46), 75/81 86 ATG 47929 48225 297 + None protein Hypothetical protein Atu0448 [Agrobacterium fabrum str. C58] (NP_353479.1, 6e-25), 50/65 309

Position Associated Putative Best BLASTP match (es), (GenBank Accession No:, E value), % Putative Function Conserved Domains

ORF Identity/ % Similarity Strand Start End Length (E value) Start Codon Start

(bp) (bp) Orientation/

[COG0270], Site-specific Modification methylase [Mesorhizobium loti MAFF303099] DNA methylase (9.30e-44) (NP_108594.1, 0), 58/67 C-5 cytosine- [cd00315] Cytosine-C5 Modification methylase [Mesorhizobium alhagi] (WP_008836169.1, 0), 87 specific DNA ATG 48222 50357 2136 + specific DNA methylases 53/64 C-5 cytosine-specific methylase (2.03e-17) DNA methylase [Xanthobacter autotrophicus Py2] (YP_001418583.1, 0), 55/66 [COG0338] Site-specific D12 class N6 adenine-specific DNA methyltransferase DNA methylase [Hyphomicrobium denitrificans ATCC 51888] (YP_003755323.1, 9e- Putative adenine- (9.17e-44) 79), 47/64 DNA 88 specific GTG 50377 51279 903 + [pfam02086] D12 class N6 methyltransferase [Mesorhizobium sp. L2C066B000] (WP_023819834.1, methyltransferase adenine-specific DNA 2e-77), 45/ 61 methyltransferase DNA adenine methyltransferase [Pseudomonas putida] (4.00e-19) (WP_009399679.1, 2e-77), 47/63 Hypothetical protein [Rhizobium leguminosarum] (WP_018516465.1, Hypothetical 6e-45), 85/91 89 ATG 51276 51536 261 + None protein Hypothetical protein [Bradyrhizobium sp. YR681] (WP_008130511.1, 7e-12), 44/55 Hypothetical protein [Rhizobium leguminosarum] (WP_018516466.1, 0), Probable [pfam13730] Helix-turn- 93/96 transcriptional helix domain (5.47e-12) Hypothetical protein [Rhizobium sp. 
CF122] (WP_007788794.1, 9e-143), 90 regulator with a ATG 51603 52943 1341 + [COG3355] Predicted 53/62 helix-turn-helix transcriptional regulator Hypothetical protein SM11_chr1450 [Sinorhizobium meliloti SM11] domain (4.41e-04) (YP_005719987.1, 5e-88), 45/57 [cd08000] NusG N-terminal (NGN) domain Superfamily Transcription antitermination protein NusG [Sinorhizobium meliloti Transcription (1.75e-15) SM11] (YP_005719988.1, 1e-73), 59/74 91 antitermination ATG 52930 53589 660 + [COG0250] Transcription NusG antitermination factor [Sinorhizobium meliloti BL225C] protein NusG anti-terminator (YP_005713497.1, 3e-70), 57/73 (1.08e-10) 92 ORF 92 ATG 53644 53796 153 + None No significant similarities 93 ORF 93 ATG 53793 53921 129 - None No significant similarities Prophage LambdaW5, minor tail protein Z [Pseudovibrio sp. FO-BEG1] Putative minor tail (YP_005081565.1, 4e-45), 45/966 94 ATG 53880 54416 537 + None protein Hypothetical protein [Hoeflea phototrophica] (WP_007196944.1, 5e-32), 43/58 310
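The start, end, and length columns of the annotation table appear to be related by length = end - start + 1, which is what one would expect if both coordinates are inclusive. The following is a minimal sketch that checks this for a few rows copied from the table. The function names are illustrative and not part of the thesis, and the inclusive-coordinate reading and the whole-codon check (length divisible by 3) are assumptions rather than stated methods.

```python
# Rows copied from the table: (ORF, start_bp, end_bp, reported_length_bp)
ROWS = [
    (37, 26168, 26434, 267),
    (42, 27700, 28746, 1047),
    (43, 29009, 30124, 1116),
    (87, 48222, 50357, 2136),
    (94, 53880, 54416, 537),
]


def orf_length(start_bp: int, end_bp: int) -> int:
    """Length of an ORF, assuming inclusive start and end coordinates."""
    return end_bp - start_bp + 1


def check_row(start_bp: int, end_bp: int, reported_length_bp: int) -> bool:
    """True if the reported length matches the coordinates and is a whole number of codons."""
    length = orf_length(start_bp, end_bp)
    return length == reported_length_bp and length % 3 == 0


for orf, start, end, reported in ROWS:
    status = "ok" if check_row(start, end, reported) else "mismatch"
    print(f"ORF {orf}: {start}-{end} -> {orf_length(start, end)} bp ({status})")
```

A check of this kind can help catch transcription errors in the coordinate columns when a table is re-keyed or extracted from a PDF.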


Appendix X: Permission for reprints

JOHN WILEY AND SONS LICENSE TERMS AND CONDITIONS

Jul 01, 2014

This is a License Agreement between Anupama Halmillawewa ("You") and John Wiley and Sons ("John Wiley and Sons") provided by Copyright Clearance Center ("CCC"). The license consists of your order details, the terms and conditions provided by John Wiley and Sons, and the payment terms and conditions.

All payments must be made in full to CCC. For payment instructions, please see information listed at the bottom of this form.

License Number 3419120602667

License date Jun 30, 2014

Licensed content publisher John Wiley and Sons

Licensed content publication Molecular Microbiology

Licensed content title Comparative phage genomics and the evolution of Siphoviridae: insights from dairy phages

Licensed copyright line Copyright © 2001, John Wiley and Sons

Licensed content author Harald Brüssow,Frank Desiere

Licensed content date Dec 21, 2001

Start page 213

End page 223

Type of use Dissertation/Thesis

Requestor type University/Academic

Format Print and electronic

Portion Figure/table

Number of figures/tables 1

Original Wiley figure/table number(s) Figure 4

Will you be translating? No

Title of your thesis / dissertation Isolation, characterization and applications of rhizobiophages

Expected completion date Jul 2014

Expected size (number of pages) 250

Total 0.00 USD

TERMS AND CONDITIONS

This copyrighted material is owned by or exclusively licensed to John Wiley & Sons, Inc. or one of its group companies (each a "Wiley Company") or handled on behalf of a society with which a Wiley Company has exclusive publishing rights in relation to a particular work (collectively "WILEY"). By clicking "accept" in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the billing and payment terms and conditions established by the Copyright Clearance Center Inc. ("CCC's Billing and Payment terms and conditions"), at the time that you opened your Rightslink account (these are available at any time at http://myaccount.copyright.com)).

Terms and Conditions

• The materials you have requested permission to reproduce or reuse (the "Wiley Materials") are protected by copyright.

• You are hereby granted a personal, non-exclusive, non-sub licensable (on a stand-alone basis), non-transferable, worldwide, limited license to reproduce the Wiley Materials for the purpose specified in the licensing process. This license is for a one-time use only and limited to any maximum distribution number specified in the license. The first instance of republication or reuse granted by this licence must be completed within two years of the date of the grant of this licence (although copies prepared before the end date may be distributed thereafter). The Wiley Materials shall not be used in any other manner or for any other purpose, beyond what is granted in the license. Permission is granted subject to an appropriate acknowledgement given to the author, title of the material/book/journal and the publisher. You shall also duplicate the copyright notice that appears in the Wiley publication in your use of the Wiley Material. Permission is also granted on the understanding that nowhere in the text is a previously published source acknowledged for all or part of this Wiley Material. Any third party content is expressly excluded from this permission.

• With respect to the Wiley Materials, all rights are reserved. Except as expressly granted by the terms of the license, no part of the Wiley Materials may be copied, modified, adapted (except for minor reformatting required by the new Publication), translated, reproduced, transferred or distributed, in any form or by any means, and no derivative works may be made based on the Wiley Materials without the prior permission of the respective copyright owner. You may not alter, remove or suppress in any manner any copyright, trademark or other notices displayed by the Wiley Materials. You may not license, rent, sell, loan, lease, pledge, offer as security, transfer or assign the Wiley Materials on a stand-alone basis, or any of the rights granted to you hereunder to any other person.

• The Wiley Materials and all of the intellectual property rights therein shall at all times remain the exclusive property of John Wiley & Sons Inc, the Wiley Companies, or their respective licensors, and your interest therein is only that of having possession of and the right to reproduce the Wiley Materials pursuant to Section 2 herein during the continuance of this Agreement. You agree that you own no right, title or interest in or to the Wiley Materials or any of the intellectual property rights therein. You shall have no rights hereunder other than the license as provided for above in Section 2. No right, license or interest to any trademark, trade name, service mark or other branding ("Marks") of WILEY or its licensors is granted hereunder, and you agree that you shall not assert any such right, license or interest with respect thereto.

• NEITHER WILEY NOR ITS LICENSORS MAKES ANY WARRANTY OR REPRESENTATION OF ANY KIND TO YOU OR ANY THIRD PARTY, EXPRESS, IMPLIED OR STATUTORY, WITH RESPECT TO THE MATERIALS OR THE ACCURACY OF ANY INFORMATION CONTAINED IN THE MATERIALS, INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTY OF MERCHANTABILITY, ACCURACY, SATISFACTORY QUALITY, FITNESS FOR A PARTICULAR PURPOSE, USABILITY, INTEGRATION OR NON-INFRINGEMENT AND ALL SUCH WARRANTIES ARE HEREBY EXCLUDED BY WILEY AND ITS LICENSORS AND WAIVED BY YOU

• WILEY shall have the right to terminate this Agreement immediately upon breach of this Agreement by you.

• You shall indemnify, defend and hold harmless WILEY, its Licensors and their respective directors, officers, agents and employees, from and against any actual or threatened claims, demands, causes of action or proceedings arising from any breach of this Agreement by you.

• IN NO EVENT SHALL WILEY OR ITS LICENSORS BE LIABLE TO YOU OR ANY OTHER PARTY OR ANY OTHER PERSON OR ENTITY FOR ANY SPECIAL, CONSEQUENTIAL, INCIDENTAL, INDIRECT, EXEMPLARY OR PUNITIVE DAMAGES, HOWEVER CAUSED, ARISING OUT OF OR IN CONNECTION WITH THE DOWNLOADING, PROVISIONING, VIEWING OR USE OF THE MATERIALS REGARDLESS OF THE FORM OF ACTION, WHETHER FOR BREACH OF CONTRACT, BREACH OF WARRANTY, TORT, NEGLIGENCE, INFRINGEMENT OR OTHERWISE (INCLUDING, WITHOUT LIMITATION, DAMAGES BASED ON LOSS OF PROFITS, DATA, FILES, USE, BUSINESS OPPORTUNITY OR CLAIMS OF THIRD PARTIES), AND WHETHER OR NOT THE PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION SHALL APPLY NOTWITHSTANDING ANY FAILURE OF ESSENTIAL PURPOSE OF ANY LIMITED REMEDY PROVIDED HEREIN.

• Should any provision of this Agreement be held by a court of competent jurisdiction to be illegal, invalid, or unenforceable, that provision shall be deemed amended to achieve as nearly as possible the same economic effect as the original provision, and the legality, validity and enforceability of the remaining provisions of this Agreement shall not be affected or impaired thereby.

• The failure of either party to enforce any term or condition of this Agreement shall not constitute a waiver of either party's right to enforce each and every term and condition of this Agreement. No breach under this agreement shall be deemed waived or excused by either party unless such waiver or consent is in writing signed by the party granting such waiver or consent. The waiver by or consent of a party to a breach of any provision of this Agreement shall not operate or be construed as a waiver of or consent to any other or subsequent breach by such other party.

• This Agreement may not be assigned (including by operation of law or otherwise) by you without WILEY's prior written consent.

• Any fee required for this permission shall be non-refundable after thirty (30) days from receipt by the CCC.

• These terms and conditions together with CCC's Billing and Payment terms and conditions (which are incorporated herein) form the entire agreement between you and WILEY concerning this licensing transaction and (in the absence of fraud) supersedes all prior agreements and representations of the parties, oral or written. This Agreement may not be amended except in writing signed by both parties. This Agreement shall be binding upon and inure to the benefit of the parties' successors, legal representatives, and authorized assigns.

• In the event of any conflict between your obligations established by these terms and conditions and those established by CCC's Billing and Payment terms and conditions, these terms and conditions shall prevail.

• WILEY expressly reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction, (ii) these terms and conditions and (iii) CCC's Billing and Payment terms and conditions.

• This Agreement will be void if the Type of Use, Format, Circulation, or Requestor Type was misrepresented during the licensing process.

• This Agreement shall be governed by and construed in accordance with the laws of the State of New York, USA, without regards to such state's conflict of law rules. Any legal action, suit or proceeding arising out of or relating to these Terms and Conditions or the breach thereof shall be instituted in a court of competent jurisdiction in New York County in the State of New York in the United States of America and each party hereby consents and submits to the personal jurisdiction of such court, waives any objection to venue in such court and consents to service of process by registered or certified mail, return receipt requested, at the last known address of such party.

WILEY OPEN ACCESS TERMS AND CONDITIONS

Wiley Publishes Open Access Articles in fully Open Access Journals and in Subscription journals offering Online Open. Although most of the fully Open Access journals publish open access articles under the terms of the Creative Commons Attribution (CC BY) License only, the subscription journals and a few of the Open Access Journals offer a choice of Creative Commons Licenses: Creative Commons Attribution (CC-BY) license, Creative Commons Attribution Non-Commercial (CC-BY-NC) license and Creative Commons Attribution Non-Commercial-NoDerivs (CC-BY-NC-ND) License. The license type is clearly identified on the article.

Copyright in any research article in a journal published as Open Access under a Creative Commons License is retained by the author(s). Authors grant Wiley a license to publish the article and identify itself as the original publisher. Authors also grant any third party the right to use the article freely as long as its integrity is maintained and its original authors, citation details and publisher are identified as follows: [Title of Article/Author/Journal Title and Volume/Issue. Copyright (c) [year] [copyright owner as specified in the Journal]. Links to the final article on Wiley's website are encouraged where applicable.

The Creative Commons Attribution License

The Creative Commons Attribution License (CC-BY) allows users to copy, distribute and transmit an article, adapt the article and make commercial use of the article. The CC-BY license permits commercial and non-commercial re-use of an open access article, as long as the author is properly attributed.

The Creative Commons Attribution License does not affect the moral rights of authors, including without limitation the right not to have their work subjected to derogatory treatment. It also does not affect any other rights held by authors or third parties in the article, including without limitation the rights of privacy and publicity. Use of the article must not assert or imply, whether implicitly or explicitly, any connection with, endorsement or sponsorship of such use by the author, publisher or any other party associated with the article.

For any reuse or distribution, users must include the copyright notice and make clear to others that the article is made available under a Creative Commons Attribution license, linking to the relevant Creative Commons web page.

To the fullest extent permitted by applicable law, the article is made available as is and without representation or warranties of any kind whether express, implied, statutory or otherwise and including, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of defects, accuracy, or the presence or absence of errors.

Creative Commons Attribution Non-Commercial License

The Creative Commons Attribution Non-Commercial (CC-BY-NC) License permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes. (see below)

Creative Commons Attribution-Non-Commercial-NoDerivs License

The Creative Commons Attribution Non-Commercial-NoDerivs License (CC-BY-NC-ND) permits use, distribution and reproduction in any medium, provided the original work is properly cited, is not used for commercial purposes and no modifications or adaptations are made. (see below)

Use by non-commercial users

For non-commercial and non-promotional purposes, individual users may access, download, copy, display and redistribute to colleagues Wiley Open Access articles, as well as adapt, translate, text- and data-mine the content subject to the following conditions:

• The authors' moral rights are not compromised. These rights include the right of "paternity" (also known as "attribution" - the right for the author to be identified as such) and "integrity" (the right for the author not to have the work altered in such a way that the author's reputation or integrity may be impugned).

• Where content in the article is identified as belonging to a third party, it is the obligation of the user to ensure that any reuse complies with the copyright policies of the owner of that content.

• If article content is copied, downloaded or otherwise reused for non-commercial research and education purposes, a link to the appropriate bibliographic citation (authors, journal, article title, volume, issue, page numbers, DOI and the link to the definitive published version on Wiley Online Library) should be maintained. Copyright notices and disclaimers must not be deleted.

• Any translations, for which a prior translation agreement with Wiley has not been agreed, must prominently display the statement: "This is an unofficial translation of an article that appeared in a Wiley publication. The publisher has not endorsed this translation."

Use by commercial "for-profit" organisations

Use of Wiley Open Access articles for commercial, promotional, or marketing purposes requires further explicit permission from Wiley and will be subject to a fee. Commercial purposes include:

• Copying or downloading of articles, or linking to such articles for further redistribution, sale or licensing;

• Copying, downloading or posting by a site or service that incorporates advertising with such content;

• The inclusion or incorporation of article content in other works or services (other than normal quotations with an appropriate citation) that is then available for sale or licensing, for a fee (for example, a compilation produced for marketing purposes, inclusion in a sales pack)

• Use of article content (other than normal quotations with appropriate citation) by for-profit organisations for promotional purposes

• Linking to article content in e-mails redistributed for promotional, marketing or educational purposes;

• Use for the purposes of monetary reward by means of sale, resale, licence, loan, transfer or other form of commercial exploitation such as marketing products

• Print reprints of Wiley Open Access articles can be purchased from: [email protected]

Further details can be found on Wiley Online Library http://olabout.wiley.com/WileyCDA/Section/id-410895.html

Other Terms and Conditions: v1.9

If you would like to pay for this license now, please remit this license along with your payment made payable to "COPYRIGHT CLEARANCE CENTER" otherwise you will be invoiced within 48 hours of the license date. Payment should be in the form of a check or money order referencing your account number and this invoice number 501340650. Once you receive your invoice for this order, you may pay your invoice by credit card. Please follow instructions provided at that time.

Make Payment To:
Copyright Clearance Center
Dept 001
P.O. Box 843006
Boston, MA 02284-3006

For suggestions or comments regarding this order, contact RightsLink Customer Support: [email protected] or +1-877-622-5543 (toll free in the US) or +1-978-646-2777.

Gratis licenses (referencing $0 in the Total field) are free. Please retain this printable license for your reference. No payment is required.


SPRINGER LICENSE TERMS AND CONDITIONS

Jul 01, 2014

This is a License Agreement between Anupama Halmillawewa ("You") and Springer ("Springer") provided by Copyright Clearance Center ("CCC"). The license consists of your order details, the terms and conditions provided by Springer, and the payment terms and conditions.

All payments must be made in full to CCC. For payment instructions, please see information listed at the bottom of this form.

License Number 3419111399201

License date Jun 30, 2014

Licensed content publisher Springer

Licensed content publication Archives of Virology

Licensed content title Prokaryote viruses studied by electron microscopy

Licensed content author H. W. Ackermann

Licensed content date Jan 1, 2012

Volume number 157

Issue number 10

Type of Use Thesis/Dissertation

Portion Figures

Author of this Springer article No

Order reference number None

Original figure numbers Figure 2

Title of your thesis / dissertation Isolation, characterization and applications of rhizobiophages

Expected completion date Jul 2014

Estimated size(pages) 250

Total 0.00 CAD

Terms and Conditions

Introduction The publisher for this copyrighted material is Springer Science + Business Media. By clicking "accept" in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the Billing and Payment terms and conditions established by Copyright Clearance Center, Inc. ("CCC"), at the time that you opened your Rightslink account and that are available at any time at http://myaccount.copyright.com).

Limited License With reference to your request to reprint in your thesis material on which Springer Science and Business Media control the copyright, permission is granted, free of charge, for the use indicated in your enquiry.

Licenses are for one-time use only with a maximum distribution equal to the number that you identified in the licensing process.

This License includes use in an electronic form, provided it is password protected or on the university’s intranet or repository, including UMI (according to the definition at the Sherpa website: http://www.sherpa.ac.uk/romeo/). For any other electronic use, please contact Springer at ([email protected] or [email protected]).

The material can only be used for the purpose of defending your thesis limited to university-use only. If the thesis is going to be published, permission needs to be re-obtained (selecting "book/textbook" as the type of use).

Although Springer holds copyright to the material and is entitled to negotiate on rights, this license is only valid, subject to a courtesy information to the author (address is given with the article/chapter) and provided it concerns original material which does not carry references to other sources (if material in question appears with credit to another source, authorization from that source is required as well).

Permission free of charge on this occasion does not prejudice any rights we might have to charge for reproduction of our copyrighted material in the future.

Altering/Modifying Material: Not Permitted You may not alter or modify the material in any manner. Abbreviations, additions, deletions and/or any other alterations shall be made only with prior written authorization of the author(s) and/or Springer Science + Business Media. (Please contact Springer at [email protected] or [email protected])

Reservation of Rights Springer Science + Business Media reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction, (ii) these terms and conditions and (iii) CCC's Billing and Payment terms and conditions.

Copyright Notice: Disclaimer You must include the following copyright and permission notice in connection with any reproduction of the licensed material: "Springer and the original publisher /journal title, volume, year of publication, page, chapter/article title, name(s) of author(s), figure number(s), original copyright notice) is given to the publication in which the material was originally published, by adding; with kind permission from Springer Science and Business Media"

Warranties: None

Example 1: Springer Science + Business Media makes no representations or warranties with respect to the licensed material.

Example 2: Springer Science + Business Media makes no representations or warranties with respect to the licensed material and adopts on its own behalf the limitations and disclaimers established by CCC on its behalf in its Billing and Payment terms and conditions for this licensing transaction.

Indemnity You hereby indemnify and agree to hold harmless Springer Science + Business Media and CCC, and their respective officers, directors, employees and agents, from and against any and all claims arising out of your use of the licensed material other than as specifically authorized pursuant to this license.

No Transfer of License This license is personal to you and may not be sublicensed, assigned, or transferred by you to any other person without Springer Science + Business Media's written permission.

No Amendment Except in Writing This license may not be amended except in a writing signed by both parties (or, in the case of Springer Science + Business Media, by CCC on Springer Science + Business Media's behalf).

Objection to Contrary Terms Springer Science + Business Media hereby objects to any terms contained in any purchase order, acknowledgment, check endorsement or other writing prepared by you, which terms are inconsistent with these terms and conditions or CCC's Billing and Payment terms and conditions. These terms and conditions, together with CCC's Billing and Payment terms and conditions (which are incorporated herein), comprise the entire agreement between you and Springer Science + Business Media (and CCC) concerning this licensing transaction. In the event of any conflict between your obligations established by these terms and conditions and those established by CCC's Billing and Payment terms and conditions, these terms and conditions shall control.

Jurisdiction All disputes that may arise in connection with this present License, or the breach thereof, shall be settled exclusively by arbitration, to be held in The Netherlands, in accordance with Dutch law, and to be conducted under the Rules of the 'Netherlands Arbitrage Instituut' (Netherlands Institute of Arbitration). OR:

All disputes that may arise in connection with this present License, or the breach thereof, shall be settled exclusively by arbitration, to be held in the Federal Republic of Germany, in accordance with German law.

Other terms and conditions: v1.3

If you would like to pay for this license now, please remit this license along with your payment made payable to "COPYRIGHT CLEARANCE CENTER" otherwise you will be invoiced within 48 hours of the license date. Payment should be in the form of a check or money order referencing your account number and this invoice number 501340641. Once you receive your invoice for this order, you may pay your invoice by credit card. Please follow instructions provided at that time. Make Payment To: Copyright Clearance Center Dept 001 P.O. Box 843006 Boston, MA 02284-3006 For suggestions or comments regarding this order, contact RightsLink Customer Support: [email protected] or +1-877-622-5543 (toll free in the US) or +1-978-646-2777.

Gratis licenses (referencing $0 in the Total field) are free. Please retain this printable license for your reference. No payment is required.

NATURE PUBLISHING GROUP LICENSE TERMS AND CONDITIONS

Jul 01, 2014

This is a License Agreement between Anupama Halmillawewa ("You") and Nature Publishing Group ("Nature Publishing Group") provided by Copyright Clearance Center ("CCC"). The license consists of your order details, the terms and conditions provided by Nature Publishing Group, and the payment terms and conditions.

All payments must be made in full to CCC. For payment instructions, please see information listed at the bottom of this form.

License Number 3419120310148

License date Jun 30, 2014

Licensed content publisher Nature Publishing Group

Licensed content publication Nature Reviews Genetics

Licensed content title The future of bacteriophage biology

Licensed content author Allan Campbell

Licensed content date Jun 1, 2003

Volume number 4

Issue number 6

Type of Use reuse in a dissertation / thesis

Requestor type academic/educational

Format print and electronic

Portion figures/tables/illustrations

Number of figures/tables/illustrations 1

High-res required no

Figures Figure 1

Author of this NPG article no

Your reference number None

Title of your thesis / dissertation Isolation, characterization and applications of rhizobiophages

Expected completion date Jul 2014

Estimated size (number of pages) 250

Total 0.00 USD

Terms and Conditions for Permissions

Nature Publishing Group hereby grants you a non-exclusive license to reproduce this material for this purpose, and for no other use, subject to the conditions below:

• NPG warrants that it has, to the best of its knowledge, the rights to license reuse of this material. However, you should ensure that the material you are requesting is original to Nature Publishing Group and does not carry the copyright of another entity (as credited in the published version). If the credit line on any part of the material you have requested indicates that it was reprinted or adapted by NPG with permission from another source, then you should also seek permission from that source to reuse the material.

• Permission granted free of charge for material in print is also usually granted for any electronic version of that work, provided that the material is incidental to the work as a whole and that the electronic version is essentially equivalent to, or substitutes for, the print version. Where print permission has been granted for a fee, separate permission must be obtained for any additional, electronic re-use (unless, as in the case of a full paper, this has already been accounted for during your initial request in the calculation of a print run). NB: In all cases, web-based use of full-text articles must be authorized separately through the 'Use on a Web Site' option when requesting permission.

• Permission granted for a first edition does not apply to second and subsequent editions and for editions in other languages (except for signatories to the STM Permissions Guidelines, or where the first edition permission was granted for free).

• Nature Publishing Group's permission must be acknowledged next to the figure, table or abstract in print. In electronic form, this acknowledgement must be visible at the same time as the figure/table/abstract, and must be hyperlinked to the journal's homepage.

• The credit line should read: Reprinted by permission from Macmillan Publishers Ltd: [JOURNAL NAME] (reference citation), copyright (year of publication) For AOP papers, the credit line should read: Reprinted by permission from Macmillan Publishers Ltd: [JOURNAL NAME], advance online publication, day month year (doi: 10.1038/sj.[JOURNAL ACRONYM].XXXXX) Note: For republication from the British Journal of Cancer, the following credit lines apply. Reprinted by permission from Macmillan Publishers Ltd on behalf of Cancer Research UK: [JOURNAL NAME] (reference citation), copyright (year of publication) For AOP papers, the credit line should read: Reprinted by permission from Macmillan Publishers Ltd on behalf of Cancer Research UK: [JOURNAL NAME], advance online publication, day month year (doi: 10.1038/sj.[JOURNAL ACRONYM].XXXXX)

• Adaptations of single figures do not require NPG approval. However, the adaptation should be credited as follows: Adapted by permission from Macmillan Publishers Ltd: [JOURNAL NAME] (reference citation), copyright (year of publication) Note: For adaptation from the British Journal of Cancer, the following credit line applies. Adapted by permission from Macmillan Publishers Ltd on behalf of Cancer Research UK: [JOURNAL NAME] (reference citation), copyright (year of publication)

• Translations of 401 words up to a whole article require NPG approval. Please visit http://www.macmillanmedicalcommunications.com for more information. Translations of up to 400 words do not require NPG approval. The translation should be credited as follows: Translated by permission from Macmillan Publishers Ltd: [JOURNAL NAME] (reference citation), copyright (year of publication). Note: For translation from the British Journal of Cancer, the following credit line applies. Translated by permission from Macmillan Publishers Ltd on behalf of Cancer Research UK: [JOURNAL NAME] (reference citation), copyright (year of publication)

We are certain that all parties will benefit from this agreement and wish you the best in the use of this material. Thank you.

Special Terms: v1.1

If you would like to pay for this license now, please remit this license along with your payment made payable to "COPYRIGHT CLEARANCE CENTER" otherwise you will be invoiced within 48 hours of the license date. Payment should be in the form of a check or money order referencing your account number and this invoice number 501340646. Once you receive your invoice for this order, you may pay your invoice by credit card. Please follow instructions provided at that time. Make Payment To: Copyright Clearance Center Dept 001 P.O. Box 843006 Boston, MA 02284-3006 For suggestions or comments regarding this order, contact RightsLink Customer Support: [email protected] or +1-877-622-5543 (toll free in the US) or +1-978-646-2777.

Gratis licenses (referencing $0 in the Total field) are free. Please retain this printable license for your reference. No payment is required.
