Quick viewing(Text Mode)

Molecular Factors and Genetic Differences Defining Symbiotic Phenotypes of Galega Spp

Molecular Factors and Genetic Differences Defining Symbiotic Phenotypes of Galega Spp

Department of Food and Environmental Sciences University of Helsinki Helsinki

Molecular factors and genetic differences defining symbiotic phenotypes of spp. and

Neorhizobium galegae strains

Janina Österman

ACADEMIC DISSERTATION

To be presented, with the permission of the Faculty of Agriculture and Forestry of the University of Helsinki, for public examination in lecture hall 2, Viikki building B, Latokartanonkaari 7, on the 26th of June 2015, at 12 noon.

Helsinki 2015

Supervisor: Professor Kristina Lindström Department of Environmental Sciences University of Helsinki, Finland

Co-supervisors: Professor J. Peter W. Young Department of Biology University of York, England

Docent David Fewer Department of Food and Environmental Sciences University of Helsinki, Finland

Pre-examiners: Professor Katharina Pawlowski Department of Ecology, Environment and Sciences Stockholm University, Sweden

Docent Minna Pirhonen Department of Agricultural Sciences University of Helsinki, Finland

Opponent: Senior Lecturer, Doctor Xavier Perret Department of Botany and Plant Biology Laboratory of Microbial Genetics University of Geneva, Switzerland

Dissertationes Schola Doctoralis Scientiae Circumiectalis, Alimentarie, Biologicae ISSN 2342-5423 (print) ISSN 2342-5431 (online)

ISBN 978-951-51-1191-3 (paperback) ISBN 978-951-51-1192-0 (PDF) Electronic version available at http://ethesis.helsinki.fi/

Hansaprint Vantaa 2015

Abstract

Nitrogen is an indispensable element for and animals to be able to synthesise essential biological compounds such as amino acids and nucleotides. Although there is plenty of nitrogen in the form of nitrogen gas (N2) in the Earth’s atmosphere, it is not readily available to plants but needs to be converted (fixed) into ammonia before it can be utilised. Nitrogen-fixing living freely in the soil or in symbiotic association with plants, fix N2 into ammonia used by the plants. This is known as biological (BNF). In contrast to industrial nitrogen fixation, an energy-demanding process using high temperature and pressure to produce chemical fertilizers, BNF makes use of solar energy alone to complete the same reaction. However, the requirements on compatibility of plants and nitrogen- fixing micro-organism, the rate of conversion and the ability of the micro-organisms to survive in stressful environments are limiting factors of this system. The current demand for more sustainable food production makes BNF an attractive alternative. However, optimization of existing BNF systems as well as development of new highly productive ones is necessary, to be able to replace the use of chemical fertilisers. In order to develop new alternatives, we need to gain more knowledge on the requirements set by both plants and micro-organisms for successful and efficient nitrogen fixation to occur. In this thesis, the nitrogen-fixing legume host Galega (goat’s rue) and its symbiotic microbial partner Neorhizobium galegae were used as a model system to investigate the features defining good symbiotic nitrogen fixation. Studies of genetic diversity within the host plant showed that there are genetic traits making a distinction between the two species G. orientalis and G. officinalis, both at a whole-genome level and at the level of specific -related genes. Genome sequencing of ten strains of N. galegae provided a useful dataset for studying i) the genomic features separating N. galegae from related nitrogen-fixing bacteria () and ii) the genetically encoded characteristics that divide strains of N. galegae into two separate symbiovars (symbiotic variants that show different phenotypes on the two different Galega host plant species). These studies provided new information on genes possibly involved in determining host specificity and efficiency of nitrogen fixation. In addition, previously unrecognised genetic contents provided insight into the ecology of N. galegae. Most importantly, genome sequencing enabled identification of the noeT gene, responsible for acetylation of the N. galegae Nod factor (signal molecule required for symbiosis). Although the noeT gene did not turn out to be the crucial determinant enabling nodulation of Galega spp. as previously anticipated, these results are important for future studies on mechanisms behind the selectiveness (host specificity) observed in nitogen-fixing symbioses between Galega and N. galegae.

3 Sammanfattning

Växter och djur är beroende av kväve för att kunna syntetisera nödvändiga biologiska föreningar så som aminosyror och nukleotider. Även om det finns ett överflöd av kvävgas i atmosfären, kan växter inte använda sig av kvävgasen som sådan, utan den måste först omvandlas (fixeras) till ammoniak. Kvävefixerande bakterier har en naturlig förmåga att fixera kvävgas till fördel för växterna. Det här fenomenet kallas biologisk kvävefixering. Bakterierna kan antingen vara frilevande eller leva i symbios med baljväxter. I motsats till industriell kvävefixering, en energikrävande process som är beroende av hög temperatur och högt tryck för att binda kväve i kvävegödsel, utnyttjar biologisk kvävefixering enbart solenergi för samma reaktion. Biologisk kvävefixering begränsas å andra sidan av krav på kompatibilitet mellan värdväxter och kvävefixerande mikroorganismer, effektiviteten av kvävefixeringen samt mikroorganismernas förmåga att överleva under krävande förhållanden. Den rådande efterfrågan på mer hållbara sätt att producera mat för jordens befolkning gör biologisk kvävefixering ett attraktivt alternativ. För att kunna ersätta använding av industriellt kvävegödsel krävs ändå att existerande biologiska kvävefixeringssystem optimeras och att nya högproduktiva system utvecklas. Detta å sin sida förutsätter förbättrad kunskap om de krav som både växter och mikroorganismer ställer för att en lyckad, effektiv kvävefixering ska vara möjlig. I denna avhandling användes som modellsystem växt – bakterie paret Galega (getärt) och Neorhizobium galegae, som ingår kvävefixerande symbios med varandra, för att studera särdrag som är kännetecknande för en lyckad symbiotisk kvävefixering. Studier av den genetiska diversiteten inom värdväxten visade att genetiska drag skiljer de två arterna G. orientalis och G. officinalis åt, både då hela genomet beaktas och på ett plan av specifika symbiosrelaterade gener. Genomsekvensering av tio stammar av N. galegae tillhandahöll ett användbart dataset för att studera genetiska drag som dels skiljer N. galegae från närbesläktade kvävefixerande bakterier (rhizobier), dels delar in stammar av N. galegae i två symbiovarer (d.v.s. varianter av bakterien som uttrycker sig olika på de två olika värdväxtarterna av Galega). Dessa studier resulterade i ny information om gener som kan vara involverade i att bestämma värdväxtspecificiteten och effektivitetsgraden för kvävefixeringen. Dessutom identifierades genetiskt innehåll som ger ny information om bakteriens ekologi. Av största vikt var att genomsekvenseringen ledde till identifiering av genen noeT, som modifierar den så kallade Nod faktorn (en signalmolekyl som behövs för initiering av en fungerande symbios) hos N. galegae. Även om det visade sig att noeT inte spelar den avgörande rollen som möjliggör symbios med arter av Galega som tidigare antogs, så utgör dessa resultat en viktig grund för kommande studier av mekanismerna som bidrar till den selektivitet (värdväxtspecificitet) som präglar den kvävefixerande symbiosen mellan Galega och N. galegae.

4

Acknowledgements

This work was mostly carried out at the excellent working facilities of the Department of Food and Environmental Sciences, University of Helsinki. The Department of Environmental Sciences, University of Helsinki, is acknowledged for providing office space during the final stage. Financial support was provided by the Academy of Finland (project 132544) and the Swedish Cultural Foundation in Finland (grant 13/7440-1304). I wish to thank my supervisors for supporting me through this work. My main supervisor Prof. Kristina Lindström always kept the door open, making it easy to discuss problems and choices to be made. Stina’s support and her faith in me have been important for my scientific development. During the seven weeks I spent in Prof. Peter Young’s group at the Univeristy of York I got a good introduction to the world of bioinformatics and genomics, and I am grateful for the discussions we had - the huge amount of information on rhizobia that Peter possesses is incredible. I also want to thank Doc. David Fewer for helping me with tricky bioinformatics and English editing especially at the beginning of my PhD studies. In addition to my official supervisors I am truly grateful for all the time and effort that Lars Paulin at the DNA Sequencing and Genomics lab has put into guiding me through genome sequencing and bioinformatics related to this. You were like a supervisor to me! Lasse also put me in contact with a number of persons without whom I could not have made it: Pia Laine and Juhana Kammonen from the sequencing lab and Per Johansson from the Department of Food Hygiene and Environmental Health. Pia did a great job with assembly of the two complete N. galegae genomes, but also managed to make work more fun with her sense of humor , so thank you - I really appreciated our little chats! Juhana deserves special thanks for patiently explaining every detail of genome assembly but also for always taking the time to help me out and provide nice detailed descriptions. I also greatly appreciate the guidance to bioinformatics analyses provided by Per. I also wish to thank all co-authors of the publications for their contribution to this thesis! The collaboration with Evgeny Andronov was of the utmost importance for my first publication. I highly appreciate the collaboration with Prof. Jane Thomas- Oates and her student Joanne Marsh, who produced the high-quality mass spectrometry data for the second publication. I also want to express my sincere gratitude to John Sullivan, who provided me with detailed guidance in the mutant construction process for the second publication. Thanks also to Patrik Koskinen for running PANNZER for me, it made my work a lot easier. The pre-examiners of this thesis, Doc. Minna Pirhonen and Prof. Katharina Pawlowski, are acknowledged for their thorough work and for their comments on how the thesis could be improved. I want to thank all members of the N2-group for your friendship, coffee breaks and seminar discussions. Abdy, thanks for being such a good friend and for your company in the lab! I wish to thank Petri for listening to my complaints and for providing advice whenever needed, and Leena S. for showing a genuine interest in

5 my work and for your encouragement. I am also grateful for the help I received from Peter H. during my stay in York (and to some extent after that too!), and for the friendship of Anja W., whose company made my time in York so much nicer! Monna, Annette, Jenny, Kia and Emilia, my hardworking and caring friends, thank you for your friendship and for making sure I have a social life! Ett stort tack till mamma och pappa för att ni alltid trott på mej och varit intresserade av det jag gjort, samt till Emilia för att du trots våra olikheter visat uppskattning för det jag åstadkommit. Jag vill också tacka Nanne för barnpassning när det blev ont om tid på slutrakan, samt Linda och Sandra för att ni alltid är intresserade och uppmuntrande oberoende vad saken gäller! Slutligen, Tack till Reidar, för att du stött mig genom åren och haft tålamod med mej då stressen gjort mig på besvärligt humör. Jag är tacksam över vår lilla familj!

6

7 Contents

Abstract ...... 3

Sammanfattning ...... 4

Acknowledgements ...... 5

Contents ...... 8

List of original publications ...... 10

Abbreviations ...... 11

1 Introduction ...... 12

1.1 Biological nitrogen fixation in ...... 12

1.1.1 Galega species G. orientalis and G. officinalis ...... 13

1.1.2 Neorhizobium galegae ...... 15

1.2 Determinants of nitrogen-fixing symbioses ...... 16

1.2.1 Plant receptors ...... 17

1.2.2 nod genes and Nod factors ...... 18

1.2.3 Additional contributors to successful symbioses ...... 22

1.2.3.1 Protein secretion systems ...... 22

1.2.3.2 Surface polysaccharides ...... 23

1.3 Genome sequencing – an important tool in biological sciences ...... 24

2 Outline, objective and methods of the work...... 27

3 Diversity of the Galega species in the gene centre of G. orientalis ...... 29

3.1 Galega NORK and NFR5 receptor genes ...... 30

3.2 Evaluation of co-evolution between Galega spp. and N. galegae ...... 33

4 The genome of N. galegae ...... 35

4.1 N. galegae compared to reference strains of related species ...... 36

4.2 Differences within N. galegae ...... 38

8

4.2.1 Genome-scale differences between strains that show different symbiotic phenotypes ...... 38

4.2.2 Genes within the symbiosis gene region...... 39

4.2.3 Secretion systems of type IV and VI ...... 41

5 The NoeT protein modulates the Nod factor of N. galegae ...... 43

5.1 Additional players in the symbiotic interaction ...... 43

6 Conclusions ...... 45

7 References ...... 46

9 List of original publications

This thesis is based on the following publications:

I Österman, J., Chizhevskaja, E. P., Andronov, E. E., Fewer, D. P., Terefework, Z., Roumiantseva, M. L., Onichtchouk, O. P., Dresler- Nurmi, A., Simarov, B. V., Dzyubenko, N. I., Lindström, K. 2011. is more diverse than in Caucasus—whole-genome AFLP analysis and phylogenetics of symbiosis-related genes. Molecular Ecology, 20: 4808–4821. doi: 10.1111/j.1365-294X.2011.05291.x

II Österman, J., Marsh, J., Laine, P. K., Zeng, Z., Alatalo, E., Sullivan, J. T., Young, J. P. W., Thomas-Oates, J., Paulin, L., Lindström, K. 2014. Genome sequencing of two Neorhizobium galegae strains reveals a noeT gene responsible for the unusual acetylation of the nodulation factors. BMC Genomics, 15:500. doi: 10.1186/1471-2164-15-500

III Österman, J., Mousavi, S. A., Koskinen, P., Paulin, L., Lindström, K. 2015. Genomic features separating ten strains of Neorhizobium galegae with different symbiotic phenotypes. BMC Genomics, 16:348. doi:10.1186/s12864-015-1576-3

The publications are referred to in the text by their roman numerals. The publications are reproduced with kind permission from the publishers.

The contribution of the author to the publications:

I Janina Österman performed phylogenetic analyses and analyses of positive selection on plant genes, and interpreted these results with David Fewer. Janina Österman also interpreted AFLP results with Evgeny Andronov, wrote the manuscript and was the corresponding author.

II Janina Österman participated in the design of the study, carried out the molecular genetic studies and the plant assays, performed bioinformatics analyses of the genome data and wrote the main part of the manuscript. Janina Österman was also the corresponding author during the article submission process.

III Janina Österman designed the study, performed genome assembly, manual annotation and bioinformatics analyses. Janina Österman also performed experimental laboratory work, wrote the manuscript and was the submitting author.

10

Abbreviations aa Amino acids ABC ATP-binding cassette AFLP Amplified Fragment Length Polymorphism BNF Biological nitrogen fixation CDS Protein-coding DNA sequence COG Clusters of orthologous groups DNA Deoxyribonucleic acid EPS Exopolysaccharide GlcNAc N-acetylglucosamine KPS Capsular polysaccharide LCO Lipo-chitin oligosaccharide LPS Lipopolysaccharide LysM Lysin motif MFP Membrane fusion protein Mpf Mating pair formation N Nitrogen NCR Nodule-specific cysteine-rich NF Nod(ulation) factor NO Nitric oxide OMF Outer membrane factor PCR Polymerase chain reaction QS Quorum sensing RNA Ribonucleic acid rRNA Ribosomal ribonucleic acid Sec The general secretory pathway sp. Species (singular form) spp. Species (plural form) sv. Symbiovar T1SS Type I secretion system T2SS Type II secretion system T3SS Type III secretion system T4SS Type IV secretion system T5SS Type V secretion system T6SS Type VI secretion system Tat Twin-arginine translocation Tg Teragrams tRNA Transfer ribonucleic acid UPGMA Unweighted Pair Group Method with Arithmetic Mean

11 Introduction

1 Introduction

Sustainable development, land management and food production are topics that gain more and more attention in society of today, where environmental change and growing populations challenge people all over the world to come up with new ways of using the natural resources. Chemical nitrogen fertilizers are widely applied to agricultural fields to ensure sufficient food production. However, their production uses natural resources to obtain the high temperature and pressure required (Burris & Roberts 1993), and their application leads to pollution problems (Fowler et al. 2013). A natural phenomenon that can be applied to meet the needs of sustainability in food production is biological nitrogen fixation (BNF), which derives the required energy from sunlight and makes use of the nitrogen gas in the atmosphere to produce nutrients for plants. It thus favours both land management and food production. One reason behind the currently limited application of BNF in landfarming is the restricted amount of food crops able to utilise this system, with a majority of all plants able to fix nitrogen in association with bacteria found in the Leguminosae family (Sprent 2007). A second reason is the knowledge gap regarding the mechanisms determining the efficiency of BNF. In this thesis, studies of genetic features of the plant species Galega orientalis and Galega officinalis and their partners in biological nitrogen fixation, bacteria in the species Neorhizobium galegae, are reported. The results of this research contribute to the general knowledge on the mechanisms used in systems of BNF and the diversity within such systems.

1.1 Biological nitrogen fixation in legumes

Nitrogen is indispensable for plant growth. Molecular nitrogen (N2) is abundant in the atmosphere, but the applicability of nitrogen in this form in natural systems is restricted. Soils are often poor in nitrogen sources that can be used by plants, and plants are not able to use N2 as such. Biological nitrogen fixation is the natural process where atmospheric nitrogen is converted by micro-organisms into a form that can be used by plants as nutrients. This can be performed by free-living, associative or symbiotic microbes (Dalton & Kramer 2007). The agriculturally most important form of BNF is carried out in co-operation between legume plants and their symbiotic bacteria, collectively named rhizobia. This system accounts for a majority of the total amount of biologically fixed nitrogen by all agricultural plants, currently estimated to be 50-70 Tg nitrogen annually (Herridge et al. 2008), while the input of industrial nitrogen fertilizers is 120 Tg nitrogen per year (Fowler et al. 2013). Symbiotic nitrogen fixation takes place in root nodules, special structures that appear on plant roots or stems when compatible rhizobia enter symbiosis with the legume plant. However, selectiveness can be observed in the symbiotic interactions, where both rhizobia and host plants show limitations in compatibility with different nitrogen-fixing partners (Perret et al. 2000). This is called the host specificity of

12

nitrogen fixation, which makes symbiotic BNF an even more intriguing but also complicated system that is still far from well understood. Plants in the Galega, used as model plants in this thesis, are highly selective, the only microsymbiont currently known to be compatible with their nitrogen fixation machinery being strains of the rhizobial genus Neorhizobium galegae. In addition, strains of N. galegae are very host specific too, the only known host plant of published studies to date being G. orientalis and G. officinalis (Lindström 1989). However, unpublished observations indicate that strains of N. galegae might be able to induce nodules, even effective ones in some cases, on species of Acacia (J. Österman; V. Martinez Alcántara and P. Balatti; unpublished).

1.1.1 Galega species G. orientalis and G. officinalis

The legume plant genus Galega, in the subfamily Papilionoideae of the plant family (also called Leguminosae), consists of two well-known, taxonomically accepted species: G. orientalis ( galega) and G. officinalis (goat’s rue; sometimes used for both species). There are sources claiming that there are up to eight species of Galega, but the only well-studied ones which have not been assigned to an alternative genus are the two aforementioned species. These herbaceous perennials are able to fix nitrogen through symbiotic interactions with a specific soil bacterium, Neorhizobium galegae. The properties of these plants have made them of interest as green manure and as fodder plants, but also for use in bioremediation. Galega plants are perennials that grow from 0.3 to 1.2 meters tall and reproduce by seed. They have odd pinnate leaves with 3-9 pairs of leaflets and a terminal leaflet. The species can be told apart from each other by studying their stipules: on G. officinalis they have a hastate base while on G. orientalis they are entire. Flowers of G. orientalis are usually darker than the white–light purple and blue veined flowers of G. officinalis, and pods are hanging, compared to the erect pods of G. officinalis. (Mossberg & Stenberg 2012, Hämet-Ahti et al. 1998, LuontoPortti/NatureGate 2014) According to the most recent molecular studies, Galega forms a sister clade to all other legume species in the Vicioid clade (Wojciechowski et al. 2004, Figure 1). G. orientalis originates from Caucasus (Vavilov 1926), while the origin of G. officinalis is more uncertain. It is speculated that the gene centre of G. officinalis is in Western Asia or southern Europe, in or in Bulgaria. G. orientalis has been reported naturalized in Austria and France (ARS-GRIN). Observations have been reported in Finland, Sweden, Norway, Denmark and Estonia (EOL). G. officinalis, on the other hand, is reported world-wide (Figure 2).

13 Introduction

Figure 1 Phylogeny of Galega orientalis and closely related species in the Vicioid clade of the family Leguminosae, based on parsimony analyses of the plastid matK gene. IRLC: Inverted repeat lacking clade. Modified from Wojciechowski et al. (2004) .

G. officinalis has been found toxic to sheep (Gresham & Booth 1991, Keeler et al. 1986) due to its production of the alkaloid galegine. It also produces vasicine, an alkaloid which is also found in G. orientalis at much lower concentrations (Laakso et al. 1990). No significant amounts of galegine have been found in G. orientalis, which does not induce toxicity in animals. The galegine concentration in G. officinalis varies over plant tissues and phenological growth stages, being highest in reproductive tissues and averaged over all plant parts at the immature pod stage, potentially being most toxic at the stage when it is likely to be harvested (Oldham et al. 2011). Despite its known toxicity, recent studies indicate that a controlled daily dose of G. officinalis in the diet of sheep (2 g dry matter per kg body weight, or about 120 g per sheep) does improve milk production (González-Andrés et al. 2004). The mechanism behind the higher milk production is, however, not clear (González-Andrés et al. 2004). The amount at which consumption of G. officinalis gets toxic has been estimated to be about 400 g of dried plant material per day (Oldham et al. 2011), even though adaptation to the toxin has been observed when animals were exposed to low levels of toxin on several consecutive days (Keeler et al. 1986). Historically, G. officinalis has, however, been used as a medicinal plant

14

(Duke 1987). In Finland, G. officinalis is mostly found as an ornamental (Hämet- Ahti et al. 1998). G. orientalis, on the other hand, has great potential as a plant in Finnish climate conditions, competing with red clover in the quantity and quality of the yield (Varis 1986). In addition, this plant has potential for use in bioremediation of oil-contaminated soil (Mikkonen et al. 2011, Kaksonen et al. 2006, Suominen et al. 2000)

Figure 2 Distribution of Galega officinalis. In the countries coloured with orange, G. officinalis has been reported native or naturalized. In countries coloured in yellow, observations of G. officinalis have been made, but the status has not been found in the literature. Occurrence in big countries is marked with yellow stripes, although observations have not been made across all of the country. Based on information from ARS-GRIN, NRCS and EOL.

1.1.2 Neorhizobium galegae

The soil bacterium Neorhizobium galegae was initially described as Rhizobium galegae (Lindström 1989), a genus of the family Rhizobiaceae in the class of . For a long time, the taxonomic position of R. galegae was uncertain, floating between the clades of Rhizobium and Agrobacterium (Martens et al. 2008, Terefework et al. 1998). Thus, there was a need for a more accurate species classification. As a result of extensive phylogenetic analysis, support for a separate genus was found (Figure 3) and the genus Neorhizobium was proposed by Mousavi et al. in 2014. The type species of Neorhizobium is the former R. galegae, thus renamed N. galegae. The former R. huautlense, R. alkalisoli and R. vignae were also included in this new genus, as well as some strains lacking genus classification. The type strain of N. galegae is strain HAMBI 540T (syn. gal1261T, ATCC 43677T, DSM 11542T, LMG 6214T).

15 Introduction

Figure 3 Phylogenetic tree showing the position of N. galegae in relation to closely related species. Modified from the tree based on six housekeeping genes by Mousavi et al. (2014).

Bacteria belonging to the species N. galegae are Gram-negative rods, able to initiate a symbiotic relationship with the plant species Galega orientalis and G. officinalis (Lindström 1989). This rhizobial species has a very narrow host range compared to, for example, the broad host range strain Sinorhizobium (syn. Ensifer) fredii NGR234, which nodulates host plants from more than 112 plant genera (Pueppke & Broughton 1999). Symbiosis of N. galegae on Galega spp. is initiated by plant root hair deformation and infection thread formation, leading to penetration by the bacterial cells into root cortex cells and formation of indeterminate root nodules that can be effective (nitrogen-fixing) or ineffective (root nodules where no nitrogen fixation takes place) (Lipsanen & Lindström 1988). Strains of N. galegae are very host specific, fixing nitrogen with only one of the two Galega host plant species, and are therefore divided into two symbiovars (formerly biovars, (Rogel et al. 2011)) based on which one of the Galega plant species they induce effective nodules on (Radeva et al. 2001). Strains of N. galegae do not occur naturally in Finnish soils, making inoculation necessary for Galega plants to grow successfully (Lindström et al. 1990).

1.2 Determinants of nitrogen-fixing symbioses

Symbiosis between nitrogen-fixing bacteria and their host plant is a tightly regulated event, requiring signal perception at several different stages. Even though more information is acquired constantly, we still do not know the exact requirements at every step on the way to formation of a successful nitrogen-fixing symbiosis. The fact that only certain bacteria are able to induce nodules on a specific host plant makes it even harder to pinpoint the critical events. Signal molecules exuded from

16

plant roots, flavonoids, are crucial for initiation of the cascade leading to a functional root nodule (Franche et al. 2009). These molecules are polyaromatic secondary metabolites that can trigger gene expression in compatible rhizobia living close to the plant roots, leading to production of a rhizobial signal molecule, the Nod factor. Different plants produce different flavonoids, and different rhizobia respond only to certain flavonoids (Perret et al. 2000, Cárdenas et al. 1995). This is the first control step in initiation of a nitrogen-fixing symbiosis. In the following step, the host plant must be able to recognise the Nod factor secreted by the rhizobial strain and to distinguish the rhizobium from plant pathogenic bacteria. The plant has receptor gene products that interact with rhizobial Nod factors, optimally leading to a signalling cascade in the plant resulting in root hair deformation, bacterial penetration and nodule formation (Franche et al. 2009). The nodules can be of determinate or indeterminate type, characterised by a loss of meristermatic activity after infection (determinate) or retained meristematic activity at the tip of the nodule, giving rise to an elongated nodule shape (indeterminate). In infected nodule primordia cells, rhizobia differentiate into so called bacteroids. Plant factors cause the bacteroids in indeterminate nodules to go through irreversible terminal differentiation, while no such phenomenon is observed for bacteroids in determinate nodules where bacteria maintain their normal size, genome content and the ability to resume growth when released from the nodule (Mergaert et al. 2006). These plant factors of indeterminate nodules were recently identified as nodule-specific cysteine- rich (NCR) peptides that are targeted to the bacteria and enter the bacterial membrane and cytosol (van de Velde et al. 2010). The determinants of early interactions at the onset of symbiosis are discussed in further detail below.

1.2.1 Plant receptors

The current understanding of the plant signal transduction cascade initiated by the Nod factor (Figure 4) is that the first step is carried out by two plant LysM-type receptor kinases, called NFR1 (for Nod factor receptor 1) and NFR5 in the model legume Lotus japonicus (Madsen et al. 2003, Radutoiu et al. 2003) (corresponding proteins in the second model legume Medicago truncatula named LYK3 (Limpens et al. 2003) and NFP (Arrighi et al. 2006, Ben Amor et al. 2003) respectively). Both of these receptors have extracellular LysM domains, which are involved in Nod factor recognition (Radutoiu et al. 2007). A NFR1-NFR5 complex has the potential to initiate downstream signalling through autophosporylation as well as phosphorylation of NFR5 by NFR1 (Madsen et al. 2011). NFR5 has no in vitro kinase activity (Madsen et al. 2011), but it has been shown to form a complex with another receptor required for nodulation, named SYMRK (symbiosis receptor-like kinase, Stracke et al. 2002) in L. japonicus (Antolín-Llovera et al. 2014). The corresponding protein is called NORK (also known as DMI2) in Medicago species (Endre et al. 2002). The NFR1 and NFR5 genes have been shown to participate in determining host specificity, as expression of L. japonicus NFR1 and NFR5 in M. truncatula and L. filicaulis extended their host ranges to include rhizobial species

17 Introduction normally nodulating L. japonicus (Radutoiu et al. 2007). It was further demonstrated that recognition of the Nod factor by the LysM domains of NFR1 and NFR5 depends on the structure of the Nod factor (Radutoiu et al. 2007). However, while both NFR1 and NFR5 are required in L. japonicus (Madsen et al. 2003, Radutoiu et al. 2003), NFP alone performs the same function in M. truncatula (Ben Amor et al. 2003). A single leucine residue in the LysM2 region of NFR5 (NFP) has been shown to play a major role in recognition of different Nod factors in both L. japonicus and M. truncatula (Bensmihen et al. 2011, Radutoiu et al. 2007). Signal transduction following Nod factor perception is believed to be common with the arbuscular mycorrhizal Myc factor signalling pathway. Following Nod factor perception by NFR1 and NFR5, SYMRK (NORK) is activated and a secondary messenger produced, mediating perinuclear calcium spiking.

Figure 4 Nod factor perception signalling cascade. The plant receptor names used for Galega spp. are used, with alternative names used for L. japonicus or Medicago spp. in brackets. The Nod factor binds to the LysM-type receptor kinases NFR1 and NFR5, which interact with the NORK receptor to produce a secondary messenger that induces calcium oscillations in the nucleus. The calcium oscillations are decoded by the calcium (Ca2+)- and calmodulin (CaM)-dependent serine/threonine protein kinase (CCaMK) that associates with and phosphorylates CYCLOPS. The expression of nodulation-associated genes then leads to rhizobial infection. Based on Radutoiu et al. (2003) and Oldroyd (2013). Illustrative representation of calcium oscillation decoding from Oldroyd (2013).

1.2.2 nod genes and Nod factors

The very first step in the interaction between host plant and microsymbiont is the production of signal molecules. Flavonoids exuded from plant roots induce expression of the rhizobial nodD gene, the product of which acts as a transcriptional regulator and activates other genes conferring the nodulation ability to rhizobia (primarily nod, nol and noe genes). Among the first nod genes to be activated are the so called common nod (nodABC) genes, which are found in almost all rhizobia and are required for the synthesis of the rhizobial signal molecule, the Nod factor (NF). NFs are lipochitin oligosaccharides (LCOs) decorated with diverse substitutions (Figure 5A). The nodABC genes are responsible for production of the core structure of the NF, generally consisting of three to five ȕ-1,4-linked N-acetylglucosamine

18

(GlcNAc) residues, with an N-acyl group substituted on the non-reducing-terminal monosaccharide residue. NodC is an N-acetylglucosaminyl transferase (chitin synthase) responsible for linking GlcNAc residues together to form an oligomer (Geremia et al. 1994), NodB is a chitooligosaccharide deacetylase that deacetylates the nonreducing GlcNAc residue of the oligomer (John et al. 1993), and NodA is an acyltransferase involved in N-acylation of the deacetylated acetamido group (Atkinson et al. 1994, Röhrig et al. 1994). The core structure is further modified by other nodulation genes, giving it the final species-specific structure. For example, the nodEF genes influence the length and degree of unsaturation of the fatty acyl group, which varies in different rhizobia. There are tens of different nodulation genes identified, and the combination of those genes present in a certain rhizobium defines the final structure of the NFs of that species. The general structure of the NFs of N. galegae is shown in Figure 5B, and a summary of all nod, noe and nol genes identified to date is presented in Table 1.

Figure 5 A) General Nod factor structure. R1-R10, positions where substitutions are found; n, oligomerisation degree of the N-acetylglucosamine backbone, usually 0-3. B) Nod factor sturctur of N. galegae. The proteins responsible for the specific substitutions are indicated in green. a In N. galegae, fatty acyl structures with carbon chains ranging from C14 to C22 with differing degrees of unsaturation have been observed (Yang et al. 1999, Paper II). b One single LCO variant has been detected having a methyl group on the reducing terminal position in HAMBI 1174 (Paper II).

There are several reports showing that a single nod gene involved in shaping the NF can be crucial for nodulation. For example, nodS (encoding an S-adenosyl methionine methyl transferase) is required for nodulation of R. tropici CIAT899 on Phaseolus vulgaris (Waelkens et al. 1995), nodX (encoding an acetyl transferase) is

19 Introduction required for R. leguminosarum sv. viciae strain TOM to nodulate Afghanistan pea (Davis et al. 1988) while nodH (encoding a sulfotransferase) was reported to be essential for nodulation of alfalfa by R. leguminosarum sv. viciae and R. meliloti (Faucher et al. 1989). However, the structure of the NF does not seem to be the sole thing determining whether symbiosis is initiated or not. Most rhizobia produce NF mixes consisting of NFs with different backbone lengths, different fatty acyl chains and different substitutions, within the framework of alternatives provided by the nodulation genes present. Whether the host plant recognises only one, a few or all of the different NFs presented is still unknown. In N. galegae, the two symbiovars have been shown to produce similar NF mixtures (Yang et al. 1999), indicating that the NF structure is not the final determinant for effective nitrogen fixation. In addition to the genes involved in modulating the NFs, the symbiosis gene region where these are located usually also contains nif and fix genes, required for the function of the nitrogenase complex that performs the actual nitrogen fixation.

Table 1. Summary of rhizobial nod, nol and noe genes with identified or putative functions as determinants of nodulation.

Gene Function Selected references nodA N-acyltransferase Atkinson et al. 1994, Röhrig et al. 1994 nodB Chitin oligosaccharide deacetylase John et al. 1993 nodC UDP-GlcNAc transferase Geremia et al. 1994, Spaink et al. 1994 nodD LysR transcriptional activator Rossen et al. 1985, Horvath et al. 1987 nodE Beta ketoacyl ACP synthase Demont et al. 1993, Bloemberg et al. 1995 nodF Acyl carrier protein Shearman et al. 1986, Demont et al. 1993, Ritsema et al. 1994 nodG 3-oxoacyl-ACP reductase López-Lara & Geiger 2001 nodH Sulfotransferase Roche et al. 1991, Ehrhardt et al. 1995, Schultze et al. 1995 nodI ATP-binding protein, Nod factor transport Evans & Downie 1986, Spaink et al. 1995 nodJ Membrane protein, Nod factor transport Evans & Downie 1986, Spaink et al. 1995 nodK Unknown; nodY homolog Dobert et al. 1994 nodL 6-O-acetyltransferase Bloemberg et al. 1994, Berck et al. 1999 nodM D-glucosamine synthetase Baev et al. 1992 nodN Unknown Surin & Downie 1988 nodO Calcium-binding protein, pore forming? Economou et al. 1990 nodP ATP sulfurylase Schwedock et al. 1994 nodQ ATP sulfurylase, APS kinase Schwedock et al. 1994 nodR Biosynthesis of highly unsaturated fatty Schlaman et al. 2006 acids?

20

nodS S-adenosyl methionine methyl transferase Geelen et al. 1995, Jabbouri et al. 1995 nodT Outer membrane protein Rivilla et al. 1995 nodU 6-O-carbamoyltransferase Jabbouri et al. 1995 nodVW Two component regulation Göttfert et al. 1990, Loh et al. 1997 nodX O-acetyl transferase Firmin et al. 1993 nodY Unknown Banfalvi et al. 1988 nodZ Fucosyltransferase Stacey et al. 1994 syrM LysR type regulator Mulligan & Long 1989 nolA Transcriptional regulation Garcia et al. 1996 nolB NopB, possibly part of the TTSS pili Annapurna & Krishnan 2003, Saad et al. 2008 nolC Unknown Krishnan & Pueppke 1991 nolE Periplasmic protein? Davis & Johnston 1990 nolFGHI membrane proteins Saier et al. 1994 nolJ NopA, part of the TTSS pili Boundy-Mills et al. 1994, Ausmees et al. 2004, Saad et al. 2008 nolK GDP-fucose epimerase/reductase, involved Goethals et al. 1992, Mergaert et in fucosylation al. 1996 nolL O-acetyltransferase Berck et al. 1999 nolMN Unknown Luka et al. 1993 nolO 3-O-carbamoyltransferase Jabbouri et al. 1998 nolP Unknown Davis & Johnston 1990 nolQ Unknown Plazanet et al. 1995 nolR LysR type regulator Kondorosi et al. 1991 nolS Unknown Plazanet et al. 1995 nolTUVW Part of TTSS? Annapurna & Krishnan 2003 nolX NopX, part of the TTSS translocon Annapurna & Krishnan 2003, Marie et al. 2003 nolYZ Unknown Dockendorff et al. 1994 noeA Methyltransferase? Ardourel et al. 1995 noeB Unknown Ardourel et al. 1995 noeC Arabinosylation Mergaert et al. 1996 noeD Unknown Lohrke et al. 1998 noeE Sulfotransferase (fucosylated Nod factors) Quesada-Vincens et al. 1998 noeH Arabinosylation? Lee et al. 2008 noeI 2-O-methyltransferase Jabbouri et al. 1998 noeJ mannose 1-phosphate guanylyltransferase Price, 1999 noeK phosphomannomutase Price, 1999 noeL GDP-mannose 4,6-dehydratase Price, 1999 noeOP Arabinosylation? Lee et al. 2008 noeT O-acetyltransferase Paper II

21 Introduction

1.2.3 Additional contributors to successful symbioses

In addition to the compounds classified as signal molecules or molecules involved in signal perception, there are a number of chemical compounds and molecular systems common to a broad range of plants and microorganisms, not only those involved in nitrogen-fixing symbioses, which are thought to play a role in affecting the final outcome of the symbiosis-specific events. Such contributors are, for example, plant lectins, bacterial secretion systems and surface polysaccharides. Plant lectins are proteins that reversibly and nonenzymatically bind specific carbohydrates. These lectins can have diverse functions in different plants, but legume lectins have been proposed roles in accumulating symbiotic bacteria at the root hair tips and/or as Nod factor signal transmitters in the plant (for a review see (De Hoff et al. 2009)). The roles of secretion systems and surface polysaccharides are discussed below.

1.2.3.1 Protein secretion systems Bacterial secretion systems are found in any bacteria, functioning in delivery of bacterial proteins (and exceptionally, nucleic acids) out of the cell across the inner and outer membranes, and sometimes further across a host membrane into a host cell. In Gram-negative bacteria, there are six different types of secretion systems identified to date, the type I – VI secretion systems (reviewed in Tseng et al. (2009)). The type I, III, IV and VI secretion systems transport substrates directly across both bacterial membranes, while the type II and V secretion systems translocate proteins transported into the periplasm via the universal Sec or Tat pathways, further across the outer membrane. T1SSs, built up by ATP-binding cassette (ABC) transporters, Outer Membrane Factors (OMFs) and Membrane Fusion Proteins (MFPs), contribute to transport of a variety of substrates and are commonly present in rhizobia (Schmeisser et al. 2009). The T2SS is found among Gram-negative (Tseng et al. 2009), where it is used to transport lipoproteins, toxins, proteases, cellulases and lipases across the outer membrane (Johnson et al. 2006). However, among rhizobia the use of T2S seems to be confined to broad-host-range strains (Schmeisser et al. 2009). The T5SS comprises three different forms of protein secretion, where a beta barrel connected to the protein to be secreted is used for translocation across the outer membrane (Tseng et al. 2009). Although known to be mostly involved in virulence of animal and human pathogens (Tseng et al. 2009), genes involved in T5S have also been found in rhizobial species (Schmeisser et al. 2009). T3SSs are widely used by bacterial pathogens to inject virulence proteins into host cells (Hueck, 1998), but similar systems have also been encountered in rhizobia, being present in strains of at least Bradyrhizobium japonicum, B. elkanii, loti and Sinorhizobium fredii (Marie et al. 2001). In B. elkanii USDA61, the T3SS has been shown to hijack the soybean symbiosis genes in the absence of functional NFs and plant NFR receptors, bypassing NF recognition to induce root nodules (Okazaki et al. 2013).

22

The T4SS is different from the other known secretion systems in that it translocates nucleic acids in addition to proteins. In rhizobia, T4SSs can be used for both symbiosis-related protein secretion (Hubber et al. 2007) and for DNA transfer only (Jones et al. 2007). The model T4SS is the VirB/D4 transfer system located on the Ti (tumour-inducing) plasmid of the plant pathogen Agrobacterium fabrum (former A. tumefaciens) causing crown gall disease (for a review see (Zhu et al. 2000). In addition to the virB (T-DNA transfer) and tra-trb (plasmid conjugal transfer) systems found on the Ti plasmid in A. fabrum C58 (Zhu et al. 2000), the same strain has a second plasmid, pAtC58, where a third T4SS is located (Chen et al. 2002). The genes of this third T4SS, named the avhB system (Agrobacterium virulence homologue virB) display high similarity to the virB genes of the plasmid pTi, but are responsible for conjugational transfer of plasmid pAtC58 (Chen et al. 2002). Homologous systems of all three A. fabrum C58 T4SSs can be found also in rhizobia, where the T4SSs can be divided into four different types based on gene organization combined with phylogeny of relaxase genes, representative genes of the mating pair formation (Mpf) component and the coupling protein (Ding et al. 2013). These four types are the QS-regulated conjugation system (type I), the RctA- repressed conjugation sytems (type II), T4SSs with uncharacterized relaxases from plasmids that do not have associated Mpf components (type III) and the type IV systems with MOBPO-type relaxases (Ding et al. 2013). The most recently identified secretion sytem, the T6SS, was named as late as in year 2006, when the system was identified in Vibrio cholerae where it is required for cytotoxicity toward Dictyostelium amoebae (Pukatzki et al. 2006). This one-step secretion system, widespread among proteobacteria, consists of 13 core components named TssA-TssM that can be associated with other conserved genes (Coulthurst, 2013). T6SSs can roughly be divided into two broad categories depending on whether they target eukaryotic cells or competitor bacterial cells (Coulthurst 2013). Even among plant-associated bacteria, the role played by the T6SS is very diverse (Records, 2011). The T6SSs can be regulated through complicated regulatory networks according to the role in each bacterium’s lifestyle (Ho et al. 2014, Coulthurst 2013). In Azoarcus sp., the T6SS gene cluster was shown to be enhanced under nitrogen fixation, even if it did not seem to be under control of NifA (Sarkar & Reinhold-Hurek 2014).

1.2.3.2 Surface polysaccharides It is generally agreed that Nod factors are indispensable for initiation of nitrogen- fixing symbioses, but there is also a large amount of studies showing that bacterial infection also requires the interaction of different surface polysaccharides (Fraysse et al. 2003). Such surface polysaccharides are exopolysaccharides (EPSs), capsular polysaccharides (KPSs) and lipopolysaccharides (LPSs). EPSs are polysaccharides targeted to the cell surface and secreted into the cell’s surroundings, having little or no cell contact (Skorupska et al. 2006, Fraysse et al. 2003). These polysaccharides are known to be involved in protection against environmental stress and attachment to surfaces, but are also considered to have a specific role in the root invasion

23 Introduction process, during which they may inhibit the plant defense response or even functioning as signalling molecules that trigger plant developmental responses (Skorupska et al. 2006, Fraysse et al. 2003). The EPSs, encoded by exo or pss gene clusters, can have differences in the chemical structure, for example in sugar composition and degree of polymerisation (Skorupska et al. 2006). The well-studied EPSs produced by S. meliloti, succinoglycans and galactoglucans, are both symbiotically active but produced under different conditions (Skorupska et al. 2006). KPSs surround the bacterial cell, forming a coherent hydrated matrix on the cell surface, providing protection against bacteriophages and dry conditions (Skorupska et al. 2006, Fraysse et al. 2003). The K-antigen type of KPS has been found to be strain-specific, while EPSs are conserved within species, their production being dependent on growth conditions, and symbiotic involvement occurring after initial contact with the host cell (Fraysse et al. 2003). In S. meliloti – Alfalfa symbiosis, the K-antigen seems to have both an active and a passive way of action, providing protection against natural plant defence products or microorganisms and determining host range through resistance to plant defence, respectively (Fraysse et al. 2003). In S. meliloti, succinoglycan, galactoglucan and KPS complement symbiotic deficiencies of each other on M. sativa (forming intdeterminate nodules) (Skorupska et al. 2006), while in S. fredii, KPS was shown to be essential for symbiosis with soybean and pigeon pea (producing determinate nodules), while EPS was not (Parada et al. 2006). The LPSs, attached to the cell outer membrane, consist of three main domains – a lipid A that is inserted into the bacterial phospholipid layer, a core polysaccharide associated with the lipid A, and possibly an O-antigen domain that can be associated with the core saccharide (Fraysse et al. 2003). These polysaccharides seem to be structurally dependent on culture conditions (Fraysse et al. 2003). LPSs are believed to be actively involved in the later stages of symbiotic interactions with the host plant by mediating protection against host defence mechanisms, but there is also a possibility that their function is just to allow bacteria to adapt to the endosymbiotic conditions (Fraysse et al. 2003). Production of the rhamnan O antigen of S. fredii strain NGR234 LPS has been shown to be flavonoid-inducible, but also the lipid A core was suggested to undergo modifications upon flavonoid induction (Broughton et al. 2006).

1.3 Genome sequencing – an important tool in biological sciences

Methods for DNA sequencing were introduced already in the 1970s, among others in the form of Sanger sequencing (Sanger et al. 1977). The automated Sanger method, also referred to as “first generation technology”, was the dominant DNA sequencing method over several decades (van Dijk et al. 2014). However, a demand for faster, higher throughput and cheaper methods led to the development of new, next- generation sequencing (NGS), technologies in the 21st century. Today it is possible

24

to sequence the whole genome of almost any organism. As computational capacities increase and methodologies continue to develop towards cheaper and more rapid solutions, genome sequencing is becoming an established step in any research in the field of biological sciences. The NGS methods dominating the markets today comprise several sequencing techniques with slightly different approaches to reach the same goal: to obtain a maximum amount of sequence data in a minimum amount of time. Among these NGS technologies, the first one to be released was the “454 pyrosequencing” method provided by 454 Life Sciences (nowadays Roche Diagnostics) (Margulies et al. 2005). Other popular methods include the Illumina/Solexa “sequencing by synthesis” technologies and SOLiD “sequencing by ligation” (Valouev et al. 2008). In this thesis, the 454 pyrosequencing and Illumina MiSeq technologies were applied. The most recent methods that have gained a foothold on the markets are the “ion semiconductor sequencing” by Ion Torrent (Rothberg et al. 2011) and “single-molecule real-time (SMRT) sequencing” by Pacific Biosciences (Eid et al. 2009), released in 2010. The approaches applied in these technologies result in differences in costs, time required to run a sample and length and amount of output raw sequence reads (for recent reviews see van Dijk et al. 2014, Buermans & den Dunnen 2014). The choice of technology used for a certain sequencing project then depends on the availability of services, the suitability of a certain technology for the given type of sample, and the expectations on the outcome. Combining different technologies in one sequencing project can generate the most complete final genome sequence (reviewed in Forde & O’Toole 2013, Metzker 2010). However, the aims of sequencing projects vary. Sometimes it is desirable to obtain the complete sequence of an organism for which no previous genome data is found (de novo sequencing) while the aim of another project can be to re-sequence genomes of an organism that has already been sequenced. Nowadays it is also common to sequence only a majority of the genome, leaving gaps in the sequence meaning that the complete genome cannot be assembled, but one can obtain a maximum amount of data on the genome contents with a minimum cost in time and money. There is thus a tradeoff between completeness of the obtained information and resources sacrificed to obtain the data. Depending on the level of completeness achieved in a sequencing project, the final genome data will be in the form of a complete genome (complete assembly with no gaps in the sequence), contigs (genome represented as a set of sequences containing no gaps, but the internal linking of these sequences is not reported) or scaffolds (long sequences that can contain gaps, but the internal arrangement of the contigs used to build up the scaffolds is known while the link between separate scaffolds in unknown). The presence of repetitive regions is often a reason behind gaps in the assembly (Forde & O’Toole 2013). Genomes sequenced to date have sizes that range from a few thousands of nucleotides (bacteriophages) to several billions of nucleotides (plants). The largest genome sequenced to date is that of the Loblolly pine (Pinus taeda) which is reported to be 22.18 billion base pairs (Neale et al. 2014). Bacterial genomes are much less complicated than eukaryotic genomes, and therefore much cheaper and faster to sequence. Bacteria have much smaller genomes that do not have introns or

25 Introduction alternative splicing of genes, so determining the genes based on DNA sequence is a straight-forward task. The amount of nonsense DNA in between genes is also much smaller in bacteria, making data gained from DNA sequencing of these organisms more compact and easier to analyse. For these reasons, genome sequencing has become a method of preference for many researchers working on any aspects of phenomena involving bacteria, to obtain a maximum amount of data containing information on all possible functions performed by the bacterium. Finding the right tools to extract the substantial information from the huge amount of data is then a crucial part of the process in genomics. Both experimental and computational improvements will continue to maximize the information that can be extracted using NGS methods (van Dijk et al. 2014). Since the publication of the first bacterial genome, that of Haemophilus influenzae in 1995 (Fleischmann et al. 1995), the number of available bacterial genome sequences in the GenBank database has increased exponentially. In October 2014, there were more than 27 000 entries of bacterial genome assemblies (including all levels: contigs, scaffolds, chromosome or complete), of which close to 2 700 were complete genome sequences. Among strains belonging to the genera Rhizobium, Neorhizobium, Sinorhizobium/Ensifer, Bradyrhizobium and Mesorhizobium, there were 303 genome assemblies registered, out of which 34 were complete. Whole-genome data can be used in many different ways to provide answers to very different hypotheses. The NGS technologies are suitable for sequencing of transcriptomes (RNA-Seq) and sequencing of DNA fragments involved in protein- DNA interactions (ChIP-Seq), as well as providing a tool for metagenomics, exploring genetic diversity, population structure and interactions, and for diagnostics of infectious outbreaks (Forde & O’Toole 2013). If genomic sequence is available from several interesting bacterial species or strains of the same species, it can be used to explore differences or similarities in these species/strains, depending on the hypothesis made. Often, genome sequence is used to verify experimental work or if it is used in the first step, experimental work will follow genome analysis to verify hypotheses made based on genome contents.

26

2 Outline, objective and methods of the work

The general aim of the work presented in this thesis was to contribute to the knowledge on determinants of host specificity in symbioses between legume plants and rhizobia. For this purpose the host plant Galega and its microsymbiont Neorhizobium galegae were used as a model system. The specific aims were:

I. To study the diversity of Galega plants in the gene centre of G. orientalis, compared to the diversity of the rhizobial population. II. To identify new symbiosis-related properties of N. galegae through genomic comparison of representative strains of the symbiovars orientalis and officinalis. III. To evaluate the genomic similarity between strains of N. galegae, as well as indentify traits separating very efficient nitrogen fixers from less efficient ones.

The studies were performed utilising sequence analysis methods for single genes and complete genomes, alongside with experimental methods. The work on Galega diversity (Paper I) was done based on DNA from material sampled during the INTAS collaboration expedition to north-west Caucasus (Adygeya, Russia) in 1999 (Andronov et al. 2003), and sequence data prepared thereof. Two studies based on genomic data were designed to improve our knowledge on the genomic properties of N. galegae, potentially contributing to the phenotypic differences observed between strains under symbiotic conditions. The first study comprised the genomes of N. galegae strains HAMBI 540T (type strain of N. galegae) and HAMBI 1141, representing symbiovars orientalis and officinalis respectively (Paper II). In the second genome-based study, eight more genomes of N. galegae, four strains of each symbiovar, were included to expand the comparative genomics study (Paper III). Based on information gained from greenhouse pot experiments (not part of this thesis), out of the four strains chosen to represent symbiovar orientalis strains, two strains are known to be very efficient nitrogen fixers (HAMBI 2427 and HAMBI 2566) while two strains are less efficient nitrogen fixers (HAMBI 2605 and HAMBI 2610). Strain HAMBI 540T is also a very efficient nitrogen fixer. A bar chart visualising these results can be found in Figure 2 of paper III. Unfortunately, corresponding information was not available for the sv. officinalis strains. Instead, the replicon profile was known for three of the four sv. officinalis strains (Figure 6). This information functioned as a criterion when the sv. officinalis strains to be sequenced were selected, as one of the goals was to sequence as different strains as possible. In summary, the strains for which genome sequencing was performed were HAMBI 540, HAMBI 2427, HAMBI 2566, HAMBI 2605 and HAMBI 2610 (N. galegae sv. orientalis) and HAMBI 1141, HAMBI 490, HAMBI 1145, HAMBI 1146 and HAMBI 1189 (N. galegae sv. officinalis). The main results of the original publications will be presented and discussed in the following chapters 3, 4 and 5.

27 Outline, objective and methods of the work

Figure 6 Plasmid profile of a selection of N. galegae strains, produced by Eckhardt gel electrophoresis performed by K. Lindström in 1983. Arrows denote replicons where symbiosis genes were detected by hybridisation experiments. Strains with an underlined culture collection code were used in the present study.

Table 2. Summary of methods used in the original publications.

Method Original publication Experimental work DNA isolation I; II; III PCR I; II; III Deletion mutant construction II; III Nod factor analysis II Cre-lox based DNA excision II Plasmid curing II Conjugation experiments II AFLP I Bioinformatics UPGMA clustering I Phylogenetic analysis I; II Tests of positive selection I; II Genomic alignment II COG category assignment II Symbiosis region comparisons II; III Analysis of ortholog groups II; III Analysis of codon usage II

28

3 Diversity of the Galega species in the gene centre of G. orientalis

The diversity of a certain plant species is considered to be largest in the gene centre (centre of origin) of that plant species (Vavilov 1926), i.e. the geographical region where the plant has developed its specific characteristics, developed a diversity and from where it has begun to spread to other parts of the world. Because Caucasus has been pinpointed as the gene centre of G. orientalis based on phenotypic studies (Vavilov 1926, Andronov et al. 2003), we wanted to study the genetic diversity of both G. orientalis and G. officinalis in this region (Paper I). In addition, the diversity of the nitrogen-fixing microsymbionts of these plant species was studied, in order to see whether there is a correlation between the diversities of the partners in symbiosis (Paper I). Analysis of amplified fragment length polymorphisms (AFLP), produced from DNA isolated from 78 plant accessions collected in the Caucasus region, showed that there are genetic differences that separate plants of the two species into two major UPGMA (Unweighted Pair Group Method with Arithmetic Mean) clusters. This is logical, since they are separate species and should have genetic features that make this distinction. However, there is a difference in intra-species diversity. The accessions of G. orientalis (mean genetic similarity 84.2 %) were genetically less similar to each other than the accessions of G. officinalis (mean genetic similarity 90.8 %). In addition, the G. orientalis accessions formed two subclusters that showed statistically significant association with collection sites (Fishers’ exact test, P<0.0001), indicating genetic differences within the species even at very short geographical distances (Figure 7). Another UPGMA dendrogram was produced from AFLP analysis results of N. galegae strains collected from plants in the Caucasus region (Figure 2 in Paper I). The N. galegae strains, too, formed two main clusters, according to the symbiovar designation. Compared to the plant UPGMA dendrogram, a similar kind of subclustering as that observed for G. orientalis accessions could be observed within the N. galegae sv. orientalis main cluster. However, for the rhizobial strains, there was no statistical evidence for association between subcluster and sampling location. This might be a reflection of the ability of both the bacteria and, especially, their genes to move more rapidly than the host plant material. Thus, the genomic differences in the rhizobial population seem more randomly distributed.

29 Diversity of the Galega species in the gene centre of G. orientalis

Figure 7 Geographical distribution of the plant accessions in the two subclusters of G. orientalis. A) The G. orientalis cluster of the UPGMA dendrogram (modified from figure 2 of Paper I). B) Geographical location of the sampling sites of the G. orientalis plant accessions (modified from Andronov et al. 2003). The roman numerals (I and II) denote in which AFLP subcluster (marked by I and II in A) plant accessions from each sampling site were placed. If plant accessions were found in both clusters I and II, the sampling site is assigned to the cluster where a majority of plant accessions from that site are found and the number of accessions is indicated in parentheses.

Generally, the AFLP results showed that the sampled G. orientalis plants are indeed more diverse than the G. officinalis plants, also at the genomic level. This is in accordance with the initial hypothesis and brings further support to the status of Caucasus as the gene centre of G. orientalis. Furthermore, based on the AFLP fingerprinting, the rhizobial population seems to follow a similar evolutionary pattern as their host plants. Results indicating that rhizobia which nodulate a legume appear to be most diverse in the host’s centre of diversity have also been found e.g. for the leguminous trees Leucaena leucocephala, Gliricidia sepium and Calliandra calothyrsus (Bala et al. 2003). Whether this is a result of environmental factors or specific plant – microbe interactions is at this point not clear.

3.1 Galega NORK and NFR5 receptor genes

Based on the genomic AFLP analysis described above, eleven G. orientalis and G. officinalis plant accessions each, representing different AFLP groups, were picked out to represent the Galega populations in the sampled region. From these twenty- two plants, the receptor genes NORK and NFR5 (Figure 4) were subjected to partial PCR amplification for further investigation of diversity within the sampled Galega population. The NORK gene contains introns, but only sequence corresponding to the mRNA produced from the gene was used for analysis. Phylogenetic analyses of the twenty-two NORK and NFR5 genes generated phylogenetic trees where, like in the AFLP dendrogram, the two plant species are separated into two major clusters

30

(Paper I). In other words, the genes encoding receptors involved in symbiosis functions differ enough to distinguish between the two plant species, a fact that might be of importance in Nod factor recognition. However, since both N. galegae symbiovars are able to nodulate both plant species, the host specificity for nitrogen fixation is most probably determined at a later stage of the infection process, and not by the plant receptors for the signalling molecule. The NORK analysis clearly shows a greater diversity among the G. orientalis strains compared to the G. officinalis plants, which have very few nucleotide substitutions in their NORK genes. The NFR5 phylogeny shows that the NFR5 gene is more diverse than the NORK gene. However, this phylogeny does not reveal the same inter-species difference in diversity seen so clearly in the NORK analysis. In the NFR5 phylogeny, there are two subclusters in both of the “species clusters”. The members of the G. orientalis subclusters are not, however, arranged the same way as in the AFLP analysis dendrogram. Analyses of positive selection indicated that the differences observed in the nucleotide sequences of these genes seem to have evolved in a random manner, not under any selective pressure. This applies to both inter-species differences in the two genes, and differences between gene sequences of the Galega species and closely related plants. Because of the remarkable diversity in the NFR5 gene, the protein sequence of this gene was studied in more detail. Inside the genus Galega, 16.6 % of the codons in the amplified NFR5 fragment had a nonsynonymous polymorphic site. This is remarkable compared to 0.9 % in the NORK gene. In Paper I, we speculated that this difference in diversity could be a consequence of interaction with the Nod factor – if NFR5 interacts with the rich mixture of NFs of N. galegae (Yang et al. 1999, Paper II), it might need different NFR5 proteins to recognize different NFs. It was also found that the LysM2 region of the NFR5 protein (Figure 8) had the highest frequency of amino acid substitutions, when comparing those positions where the sequence was identical between the representatives of the Galega species but differed from one or more of the reference legume accessions. Since publication of Paper I, studies have been published reporting that NFR5 is completely dependent on NFR1 for functional signal initiation, while both NFR1 and NFR5 show similar high-affinity ligand binding of NFs (Broghammer et al. 2012). Moreover, the LysM2 domain of L. japonicus NFR5 was shown to bind specifically to the M. loti NF, causing a conformational change in LysM2, while the corresponding domain of L. filicaulis, differing from the one in L. japonicus only in a Leu-to-Lys substitution, did not bind (Sørensen et al. 2014). These results demonstrate the important role of NFR5 in Nod factor recognition.

31 Diversity of the Galega species in the gene centre of G. orientalis

Figure 8 Structure and protein sequence of the sequenced Galega NFR5 fragment. The structure of L. japonicus NFR5, for which the gene has been sequenced in its entirety, is given as a reference. The blue arrow represents the fragment sequenced in Galega, and the corresponding protein sequence is given as an alignment of three accessions, one representing G. orientalis accessions, and one each from the two phylogenetic subclusters of G. officinalis accessions. The bottom line indicates the differences between the sequences: black circle, the two G. officinalis sequences differ while one of them is identical to the G. orientalis sequence; black cross, the two G. officinalis sequences are identical while differing from G. orientalis; red circle, accessions G072 and G118 are identical to the G. officinalis sequences while differing from the other G. orientalis accessions (not shown); red cross, all three sequences are different. SP, signal peptide; LysM, Lysin motif domain; TM, transmembrane domain; PK, protein kinase domain.

An alignment of the 22 Galega NFR5 protein sequences showed that, in the 281 aa alignment, there is a total of 47 positions with amino acid substitutions. There are 28 positions where at least two strains have an amino acid substitution (the same substitution) with regard to the other strains. Among these 28 positions, there are 7 positions making a distinction between the two species, and 11 positions making a distinction between the two subgroups observed for G. officinalis in the NFR5 phylogenetic tree (even though the amino acid of one subcluster may be the same as that one observed for the G. orientalis strains). Five positions made a distinction between the two separately clustered G. orientalis strains G072 and G118 from the other G. orientalis strains. The LysM2 region holds four of the positions where amino acid substitutions were observed, while the LysM3 region holds six of these positions. The region with the highest frequency of distinguishing substitutions is, however, a region at the C-terminal end. The region spanning amino acids 227 to 243 (Figure 8) holds 9 substitiution positions, i.e. 53 % of the residues in this region. Of special interest is residue 230 (serine/proline/leucine, Figure 8), which makes perfect distinction between the two Galega species but also between the two subclusters of G. officinalis. The biological importance of this residue has, however, not been investigated. Comparing these numbers to numbers of the corresponding NORK protein alignment, there is a remarkable difference. The 316 aa long NORK alignment revealed a total of 11 positions with amino acid substitutions, among

32

which were three single substitutions, while the other eight positions all perfectly separated the two Galega species. Although the tests for positive selection performed in Paper I did not show evidence for positive selection acting upon specific sites of the NFR5 sequence, the observed differences strongly indicated that this gene either has undergone or is undergoing evolution under selective pressure.

3.2 Evaluation of co-evolution between Galega spp. and N. galegae

The term co-evolution has been used with different definitions, but the definition I will use here is that two species jointly evolve in response to selective pressure that they exert on one another. Co-evolution theories between rhizobia and legumes have been studied to some extent, but there is much more evidence of plant selection of rhizobia than there is of effects that rhizobia have on plant evolution (for a review see (Martínez-Romero 2009)). Co-evolution due to geographical isolation has been suggested for the symbiotic relationships of rhizobia with pea (Lie et al. 1987) and common bean (Aguilar et al. 2004) based on nodulation phenotypes and genetic characteristics of the rhizobia. In Paper I, co-evolution between Galega spp. and the respective symbiovars of N. galegae was evaluated based on genetic data from both of the symbiotic partners. N. galegae is presumed to have obtained its symbiosis genes by horizontal gene transfer, and the evolution of these occurred under host constraint (Suominen et al. 2001). What we do not know is whether the host plants have in turn evolved in a direction optimising interactions with specific microsymbionts. The evolution of plants is much slower than that of bacteria, which is why adoption to environmental conditions is more obvious in the microsymbionts. However, the long common history of Galega and N. galegae in Caucasus could be a good base for investigation of co-evolution between these organisms. The diversity observed within the product of NFR5 (discussed in section 3.1) indicates that this gene is (has been) subjected to selective pressure. In Paper I, I concluded that a possible explanation for this might be that the microsymbiont has influenced the evolution of this plant receptor gene, generating a small-scale co- evolution between plant and microsymbiont. This view is supported by the present understanding that NFR5 interacts directly with the bacterial Nod factor (Sørensen et al. 2014, Broghammer et al. 2012). At the genus level, no distinction could be made between phylogenies of the well-conserved chloroplast gene matK, used previously in phylogenies of closely related genera and species (Wojciechowski et al. 2004), and corresponding phylogenies of the NORK and NFR5 genes (Figure 6 in Paper I). Thus, I concluded that no sign of adaptation can be observed directly from the NORK and NFR5 phylogenies. However, the amino acid substitutions directing specificity in recognition of specific Nod factors may be located at positions which are not influencing the results of the phylogenetic analyses performed. A natural diversity is expected in different legume genera, but the specific amino acid sites that are crucial for interaction with the specific microsymbionts may be hidden within the

33 Diversity of the Galega species in the gene centre of G. orientalis differences observed within the plant genus. Because of the slow evolution of plants, microsymbionts are more likely to influence small-scale directed selection (i.e. selection on specific, crucial protein residues) rather than selection affecting the whole gene, or even less the whole species. Evidence of positive selection was found in NORK of Medicago species, but this was not considered associated with changes in rhizobial specificity (De Mita et al. 2007). Within Galega NFR5, despite the obvious diversity, analysis of positive selection did not reveal any sites under positive selection (Paper I). This could either be due to the fact that there really is no selection to be detected, or the restrictions of the software used rendered detection of positive selection impossible. The large amount of substitutions found within NFR5 makes the task difficult for any software. In conclusion, there is no clear evidence of co-evolution between Galega spp. and N. galegae when symbiosis- related genes are analysed with the available dataset and methods, even if it would be an appealing explanation for the observed diversity.

34

4 The genome of N. galegae

The genomes of N. galegae strains seem to be fairly comparable in size (Papers II and III; Table 2), although their replicon content can vary to some extent. This is, however, not unusual among rhizobia. Based on mapping of contigs in the eight draft genomes to the complete genomes of reference strains HAMBI 540T and HAMBI 1141, the main replicon, the chromosome, is fairly well conserved in all strains. However, even if all ten strains analysed appear to have a chromid (replicon with plasmid-type replication system, a G + C composition close to that of the chromosome and having core genes found on the chromosome in other species (Harrison et al. 2010)), the genetic contents of this replicon is much more variable between strains. Based on analysis of ortholog groups in the ten sequenced genomes (Paper III), the core genome of N. galegae consists of genes included in 4255 ortholog groups. This can be translated into a percentage of all coding DNA sequences (CDSs) ranging from 65.1 % to 74.8 % per strain. The remaining 3566 ortholog groups contain genes of the N. galegae accessory genome. In addition, each strain has a number of genes unique to that strain. Among the strains analysed in the work included in this thesis, between 2.0 % and 7.2 % of the protein-coding genes were strain-specific.

Table 3. Summary of relevant data on the ten N. galegae genomes sequenced in this study

Strain Genome sizea Repliconsb Predicted tRNAsc rRNA (HAMBI (Mbp) CDSs operons code) 540T 6.45 2 6230 51 3

s i l 2427 6.55 3 or 4 6318 49 3 a t

n 2566 6.55 2 6289 50 3 e i

r 2605 6.86 4 6654 49 3 o 2610 6.07 2 5792 50 3

1141 6.41 3 6213 50 3 s i l 490 6.32 2 6071 50 3 a n i 1145 6.28 4 6027 50 3 c i f

f 1146 6.41 4 6101 49 3 o 1189 6.00 2 5781 50 3 a In draft genomes, total size of contigs b In draft genomes, estimated from number of repABC replicons and preliminary assembly of contigs c Total number of identified tRNAs when those in the rRNA operon-containing contigs of draft genomes are counted 3 times (because there are three rRNA operons in each genome)

Some of the genes contained within the symbiosis gene region of strain HAMBI 1174, which is an antibiotic-resistant derivative of strain HAMBI 540T, had previously been sequenced and partial internal arrangements determined in published work (Suominen et al. 2001, Suominen et al. 1999). However, the work included in

35 The genome of N. galegae this thesis enabled revision and completion of the previous work, filling in the gaps, confirming the order that genes appear and allowing for detection of previously unidentified genes relevant for symbiotic functions. The aims outlined in this thesis made detailed analysis of the symbiosis gene regions (Figure 1 in Paper III) an area of special interest, but analysis of the produced dataset was further extended to whole-genome comparisons between N. galegae and related rhizobial strains, as well as intra-species genomic comparisons to identify genomic features separating N. galegae strains that show different symbiotic phenotypes.

4.1 N. galegae compared to reference strains of related species

For some time, there has been some debate in the scientific community on whether strains of N. galegae are actually more similar to strains of the genus Agrobacterium than to traditional rhizobial genera, such as Rhizobium (formerly the genus definition of Neorhizobium species) and Sinorhizobium (syn. Ensifer). Phylogenetic work by Mousavi and companions (2014) performed on a well-represented number of rhizobial strains, showed that strains of (the former) R. galegae clustered with strains of R. huautlense and R. alkalisoli, forming a monophyletic clade separated from strains of as well Agrobacterium, Sinorhizobium as other Rhizobium species. Based on this work, R. galegae, R. huautlense and R. alkalisoli were transferred to a genus of their own and were assigned the genus name Neorhizobium. In Paper II, it was shown that, when the complete genome is taken into consideration in analysis of the whole proteome (in the form of analysis of ortholog groups), the strains of N. galegae were more closely related to the reference strain of R. leguminosarum than they were to the representative strain of A. fabrum (Figure 2 in Paper II). Alignments of the whole genome sequences of strains HAMBI 540T and HAMBI 1141 to a set of rhizobial reference strains showed that strains of N. galegae are more similar to strains of the genus Rhizobium than they are to strains belonging to the genus Sinorhizobium (Figure 4 in Paper II). This result confirms that there is a close relationship between N. galegae and Rhizobium ssp. also at a whole-genome level and not only based on a selection of house-keeping genes. However, when it comes to strain HAMBI 1141 it was found that the origin of its plasmid, harbouring the symbiosis genes in this strain, is not easily identifiable. There was very little homology detected when the plasmid sequence was compared to plasmids of a set of 12 related strains. The highest score of total homology (measured as total sequence length of homologous regions) was observed in plasmid pRtrCIAT899b of R. tropici CIAT 899, albeit representing only 14 % (including the homologous symbiosis genes) of the whole HAMBI 1141 plasmid. When the two complete N. galegae genomes were compared to eight rhizobial strains, representing different species, through analysis of ortholog groups (Paper III), the results showed that N. galegae actually seems to have a lot in common with Mesorhizobium ciceri sv. biserrulae strain WSM1271 (Figure 9). Traditionally,

36

strains of N. galegae have not been regarded very closely related to strains of Mesorhizobium, but due to the close relationship between Galega and Cicer (chickpea) (Figure 1), a representative of strains belonging to the species M. ciceri (the only complete genome of the species so far) was included in this analysis. However, it needs to be pointed out that this representative strain does not induce root nodules on Cicer, although it is phylogenetically very closely related to species of M. ciceri which do (Nandasena et al. 2007).

Figure 9 Illustration of the results from an OrthoMCL analysis of orthologous genes with ten rhizobial genomes. Each strain has its own colour. The strain name is followed by an indication of the total number of protein-coding genes in the strain, in parenthesis. The numbers in the tips of the strain-specific ovals indicate the number of singletons identified in the analysis. The number of pairwise ortholog groups are indicated for all combinations of pairs between one N. galegae strain and one of the reference rhizobial strains, as well as the number of ortholog groups between each reference strain and the two N. galegae strain together. Finally, the number of singletons of HAMBI 1141 and HAMBI 540T are indicated, as well as the number of ortholog groups shared by these two strains.

In the analysis comparing the ten strains, the two N. galegae strains had 63 ortholog groups shared with only strain WSM1271, the highest number of ortholog groups shared with a single strain from the included set of related rhizobial strains. In line with the results obtained from the genomic alignments (Figure 4 in Paper II), R. leguminosarum sv. viciae strain 3841 and R. tropici strain CIAT 899 were the strains following WSM1271 in the number of ortholog groups shared uniquely with the two N. galegae strains. The separation of N. galegae from other rhizobial species can however be seen also in this analysis, where all 10 strains included shared more than 2100 ortholog groups, while there were more than 700 groups shared by the two N. galegae strains only, and more than 1000 ortholog groups unique to N. galegae if only one of the two strains would be considered. Only M. ciceri sv. biserrulae strain WSM1271 reaches to the same proportion of strain-specific genes (in relation to the proteome size) in this analysis. However, when the genes unique to the two N. galegae strains were compared to the N. galegae core genome (see chapter 4, p. 33),

37 The genome of N. galegae only 441 out of the 739 species-specific ortholog groups were included in the core genome. Nevertheless, none of the genes in these groups seem to be directly related to known symbiosis functions. Among those groups containing genes that could be assigned to a COG category with a defined functional class, a majority was found involved in amino acid transport and metabolism, followed by genes involved in transcription and in carbohydrate transport and metabolism.

4.2 Differences within N. galegae

In the analysis of ortholog groups involving ten rhizobial genomes (Figure 9), the two N. galegae strains were found to have 376 (HAMBI 1141) and 462 (HAMBI 540T) genes respectively, which were unique to these strains in the gene pool comprising all genes of all 10 rhizobial genomes. This means that these genes definitely also separate the two N. galegae strains included in this analysis. When the functional annotations of these genes were investigated in terms of assigned COG categories, most genes did not obtain any specific functional classification. Among the genes that were included in a descriptive COG category, genes unique to one strain or the other mainly had functions in amino acid transport and metabolism (COG E) or in transcription (COG K). In addition, in HAMBI 540T, genes involved in energy production and conversion (COG C), as well as replication, recombination and repair (COG L), made up a considerable share of the strain-specific genes.

4.2.1 Genome-scale differences between strains that show different symbiotic phenotypes

When COG categories of the genes defined as symbiovar-specific, based on all 10 sequenced N. galegae genomes (Table 3 in Paper III), were analysed, the categories of transcription as well as amino acid transport and metabolism were again the most well represented specified categories. These results would indicate that the properties making a strain able to nodulate a certain host plant most probably lie within functions in transcriptional regulation or in metabolic functions. However, we must not forget that a crucial determinant of nitrogen fixation might also hide among the genes for which specific functions are yet to be determined. When ortholog groups separating efficient nitrogen fixers from less efficient ones were analysed (Table 4 in Paper III), the proportion of genes with unknown function was lower than in any of the other comparisons made, while genes involved in amino acid or inorganic ion transport and metabolism were well represented. Genes involved in nitric oxide (NO) reduction (norEFCBQD and nnrSR) were found in all of the more efficient sv. orientalis nitrogen fixers and in one sv. officinalis strain, HAMBI 1141. NO production has been found related to symbiosis in M. truncatula – S. meliloti and L. japonicus – M. loti systems (Baudouin et al. 2006, Shimoda et al. 2005), and the genes involved in NO reduction have been found regulated by NO and regulators of

38

symbiosis genes (de Bruijn et al. 2006, Mesa et al. 2003). An ntrP gene (regulator of metabolic processes under stressful conditions) was also found in all identified high- efficiency nitrogen fixers among the sv. orientalis strains, but not in the strains identified as less efficient plant growth promoters. Genes of an ABC transporter for aliphatic sulfonates were identified as unique to sv. orientalis efficient nitrogen fixers, and were not found in any of the sv. officinalis strains. An attractive explanation to the differences observed in plant growth promoting effects under nitrogen-limiting conditions is thus that the ability to make use of different sources of energy, and to manage different metabolic pathways, is decisive for how good a nitrogen fixer a strain is.

4.2.2 Genes within the symbiosis gene region

Differences between strains of N. galegae can also be observed at a more detailed level, through analysis of specific genetic regions. One region of outmost interest is the symbiosis gene region, containing the majority of the nod, nif and fix genes. The available information from the ten genomes indicates that the symbiosis genes of N. galegae can be located on any replicon except for the chromosome. From an evolutionary perspective it is interesting to note that the genes coding for symbiotic functions are sometimes found on plasmids that have the potential of being transferred to other bacteria via conjugational transfer, or even being lost. Upon comparison of the symbiosis gene regions of all ten sequenced N. galegae strains (Figure 10; Figure 1 in Paper III), the only apparent difference clearly separating strains of the two symbiovars was the presence of an rpoN gene copy, downstream of a gene coding for a hypothetical protein, in strains of sv. orientalis. This rpoN2 gene was found to be essential for nitrogen fixation to occur in strain HAMBI 540T (Paper III), but it remains to be shown whether it is a determinant of the different host specificity shown by the symbiovars. The presence of an rpoN gene copy involved in symbiosis is not unique to N. galegae, being encountered at least in R. etli (Michiels et al. 1998) and in M. loti (Sullivan et al. 2013), but so far there are no reports on a possible function in host specificity. Strains belonging to the same symbiovar have highly syntenic symbiosis gene regions (Figure 10). With the exception of the rpoN gene mentioned above, the same genes known to have symbiotic functions are present in strains of both symbiovars, even if each symbiovar has its own rearrangement of the gene clusters. The genetic regions separating the main symbiosis gene clusters are, however, divergent also between strains of the same symbiovar.

39 The genome of N. galegae

40

Figure 10 ACT comparisons of the symbiosis gene regions containing nod, noe, nif and fix genes. The gene structures of the reference strains (HAMBI 540T and HAMBI 1141) are given. Red connecting lines indicate syntenic regions, blue connecting lines indicate inverted regions. A) Comparisons of sv. orientalis strains. The symbiosis gene region of HAMBI 2427 is almost identical to HAMBI 540T, while HAMBI 2566, HAMBI 2605 and HAMBI 2610 show high shared synteny while differing from HAMBI 540T to some extent. B) Comparisons of sv. officinalis strains. The symbiosis gene region of HAMBI 490 and HAMBI 1146 are identical and differ only slightly from HAMBI 1141. On the other hand, HAMBI 1145 and HAMBI 1189 are highly syntenic while diverging clearly from the structure in HAMBI 1141, having noticeable insertions between symbiosis gene clusters and additionally an inversion of the nodD2 – noeT gene cluster.

Even if the same genes are present in strains of both symbiovars, the DNA sequences are not, however, always identical. At a closer look, a gene first suggested to be a non-functional remnant in N. galegae (Paper II), the potential nifQ gene, turned into a gene of interest. In other rhizobia, the nifQ gene is involved in specifically targeting molybdenum for the iron-molybdenum cofactor of the nitrogenase complex (Imperial et al. 1984). However, in N. galegae, the proposed molybdenum-binding motif (Hernandez et al. 2008) is missing. Nevertheless, there was an apparent similarity of the protein sequence between strains of the same symbiovar, while showing a dramatic difference when the two symbiovars were compared. Plant inoculation tests with nifQ gene replacement deletion mutants of HAMBI 540T and HAMBI 1141 showed that the nifQ gene is needed for nitrogen fixation in the symbiosis of N. galegae with Galega spp. (Paper III), despite the lack of a traditional molybdenum-binding motif in its protein sequence. This raises the question whether this gene could have a function differing from that in related rhizobial species. It also remains to be shown if the two versions of NifQ observed in the two symbiovars are able to complement each other, or if the observed differences in protein sequence are involved in host specific interactions. On the other hand, it was observed that when Nod, Nif or Fix proteins from all ten sequenced strains were aligned, a majority of the amino acid substitutions observed between HAMBI 540T and HAMBI 1141 (Paper II) were making a distinction between the two symbiovars, while being conserved in strains of the same symbiovar. Nonetheless, the exceptionally high rate of non-synonymous substitutions in NifQ makes this protein particularly interesting.

4.2.3 Secretion systems of type IV and VI

In addition to the gene clusters directly involved in symbiosis functions, the presence of genes forming type IV and type VI secretion systems (T4SSs and T6SSs) was studied. The presence of T4SSs in rhizobia is not unusual, but these are far from found in all rhizobia. Although two types of T4SSs were found in the completely sequenced N. galegae HAMBI 1141, three out of the eight draft genomes produced did not contain any T4SS (Paper III). Proteins secreted by a T4SS in M. loti strain R7A have been shown to have host-specific effects on symbiosis (Hubber et al. 2007), while the T4SS found in S. meliloti 1021 was shown to be required only for

41 The genome of N. galegae

DNA transfer functions (Jones et al. 2007). The N. galegae strain HAMBI 1207, an antibiotic-resistant derivative of HAMBI 1141, was able to transfer its symbiosis plasmid to the Nod- sv. orientalis strain HAMBI 1587. This demonstrated that the T4SS genes found in this strain are involved in conjugational DNA transfer (Paper II). However, the results indicated that both of the two T4SS systems present in this strain are needed for transfer to occur. The exact combination of genes required for conjugal transfer in this strain is not yet known, but this could potentially be further investigated with the help of the strains harbouring only one T4SS in their genome. A T6SS has been found to secrete proteins that affect the infection process in nitrogen-fixing symbioses of at least R. leguminosarum strains nodulating pea (Bladergroen et al. 2003). Based on the comparison of the genomes of N. galegae strains HAMBI 540T and HAMBI 1141, it seemed plausible that the T6SS found in strain HAMBI 540T, similar to the corresponding system in A. fabrum C58 where the T6SS is likely to be involved in tumorigenesis (Wu et al. 2008), could have a symbiovar-specific function in symbiotic interactions since it was absent in HAMBI 1141 (Paper II). However, when the the genomes of eight more strains were added to the analyses, it was found that neither the different types of T4SSs nor the presence of a T6SS is a symbiovar-specific characteristic in N. galegae but all types of secretion systems can be found in strains of both symbiovars (Table 2 in Paper III). Nevertheless, the same T6SS genes are found organised in the same order in all N. galegae strains where such a system is encountered. The gene content of corresponding T4SSs might, on the other hand, vary slightly in different strains. The presence of the quorum sensing (QS) regulation system genes traI/traR/traM is traditionally associated with the T4SS of type I (Ding et al. 2013). However, in N. galegae strain HAMBI 490, an atypical set of QS regulation genes were found not very distantly from the T4SS genes classified as type IV (Paper III). Whether these QS genes are actually involved in regulation of the T4SS is, however, not known. Despite the fact that type IV and type VI secretion systems are frequently found in strains of N. galegae, their presence is not a typical characteristic of the species. In strain HAMBI 2610, none of the three types of secretion systems mentioned above can be found, while all three systems are found in strain HAMBI 2605. Moreover, although T3SSs are fairly common among rhizobia, no such genes could be found among the genes predicted in the 10 sequenced N. galegae strains.

42

5 The NoeT protein modulates the Nod factor of N. galegae

The Nod factor structure of N. galegae is fairly simple (Figure 5), with relatively few substitutions on the GlcNAc backbone. The structures of the NFs in this rhizobial species were studied for the first time in 1999 (Yang et al. 1999). Since then, it has been known that the NFs of N. galegae can have a very rare substitution in the form of an acetyl moiety on the GlcNAc residue next to the non-reducing-terminal residue. However, it was not until the genome sequencing of strains HAMBI 540T and HAMBI 1141 that the gene responsible for this substitution was identified (Paper II). This gene, an acetyl transferase gene named noeT, is located at one end of the symbiosis gene region, close to the nifQ gene (Figure 10), and is preceded by a nod box. The function of this gene was determined through construction of a deletion mutant of strain HAMBI 1174, the antibiotic-resistant derivative strain of HAMBI 540T, and mass spectrometry analysis of the LCOs produced by this mutant strain as compared to the wild-type HAMBI 1174 strain. A majority of the LCOs identified by mass spectrometry were acetylated, while no such LCOs could be detected in the NF extract isolated from culture of the deletion mutant strain. These results showed that the noeT gene is required for acetylation of the NF of N. galegae. To test the initial hypothesis that the acetyl substitution of the NFs is a determinant of the host specificity observed in N. galegae, the noeT mutant was used in inoculation tests on both Galega hosts and the non-hosts Trifolium repens, Pisum sativum cv. Afghanistan, Phaseolus vulgaris, Vicia hirsuta and Astragalus sinicus. However, the host range of the deletion mutant strain did not change in comparison to the wild-type: the non-host plants were not nodulated by the mutant strain, ineffective nodules were formed on G. officinalis and effective nodules observed on G. orientalis. As opposed to what was expected, host specificity does not seem to be affected by the presence or absence of the acetyl moiety on the NF of N. galegae. However, nodule formation on G. orientalis was not completely unaffected by the mutation. Formation of new nodules stagnated at an earlier stage on plants inoculated by the mutant strain (assigned the HAMBI collection code HAMBI 3275) as compared to strain HAMBI 1174 (Figure 7 in Paper II). The most likely reason for this is that the acetyl moiety provides a protective function on the NF, preventing it from degredation by plant chitinases. This kind of function of NF substitutions, including acetyl groups, has been proposed before (Snoeck et al. 2001, Schultze et al. 1998, Staehelin et al. 1994). In this case, the acetyl substitution has an important function under in vivo conditions, even if it is not a direct determinant of host specificity as previously anticipated.

5.1 Additional players in the symbiotic interaction

If the rare structure of the N. galegae NFs does not directly contribute to the strict host specificity, then what does? As the first molecule contributing to recognition by

43 The NoeT protein modulates the Nod factor of N. galegae the host plant, the structure of the NF is definitely of importance, but the combination of backbone chain length and all the different substitutions is perhaps more important than a single substitution. Both NF structure and the outcome of interactions between host plant and microsymbiont can be affected by a number of factors, some of which are discussed below. The type of flavonoids secreted by the plant makes a big difference for induction of the nod genes and strains normally inable to induce root nodules on some legume plant can gain this ability if the induction system of the nodD regulator is circumvented (Cárdenas et al. 1995). In N. galegae, the best synthetic inducer of the nodD1 gene is apigenin, although the level of induction is far from the level observed when Galega root exudate was used as the inducing agent (Suominen et al. 2003). However, the Galega root exudate is not efficient for induction of nod genes of other rhizobial species, even if it contains small amounts of flavonoids known to induce the nodD gene in these closely related rhizobia (Suominen et al. 2003). The chemical compound responsible for the improved level of N. galegae nodD1 induction in the Galega root exudate was found to be a chalcone-type compound (Suominen et al. 2003), which is still not fully identified to date. Apparently, also the type of nodD gene and the level of induction can affect the structure of the produced LCOs (Schlaman et al. 2006a). This is reflected in the results obtained in this work (Paper II) and those of Yang et al. (1999), where the reported backbone length and the structure of the fatty acyl differ to some extent. This is worth noting when analysing NF structures. Growth conditions can also contribute to what kind of NFs that are produced, e.g. the nodX gene has been shown to be temperature regulated, resulting in different NF mixtures at different growth temperatures (Olsthoorn et al. 2000). Usually bacteria are grown under optimal conditions in laboratory experiments such as for NF extraction, and thus the obtained NFs may differ from the NFs produced under natural conditions. Rhizobia also produce a wide variety of different surface polysaccharides, such as EPSs, LPSs and KPSs. In N. galegae, genes required for EPS and LPS production have been found (Paper II), but no KPS production genes. Both EPSs and LPSs affect nodule formation at different stages of the plant – rhizobium interactions leading to root nodule colonisation by the rhizobia, but like every other aspect of plant – bacterium interaction, the impact of surface polysaccharides on successful nitrogen fixation seems to be a complex system that is not yet fully understood (for recent reviews see (Skorupska et al. 2006, Becker et al. 2005)).

44

6 Conclusions

Biological nitrogen fixation through symbiosis between legume plants and specialised rhizobial bacteria has potential as a system that can be used to increase sustainability in agricultural food production practises. This thesis used two species of the legume plant Galega and strains of their nitrogen-fixing microsymbiont N. galegae as a model system to evaluate genetic features that can affect the efficiency of nitrogen-fixing symbioses. This kind of information is needed for optimisation of biological nitrogen fixation systems for agricultural application. The legume plant species G. orientalis and G. officinalis can be distinguished based on genetic features, even when plants growing within a few kilometres from each other are analysed. Genetic diversity between species is observed even within symbiosis-related genes, supporting a view that not only differences in the symbiotic bacteria, but also the host plants, contribute to the host specificity observed in nitrogen-fixing symbioses. Being the only rhizobial species known to induce nitrogen-fixing root nodules on Galega spp., genetic features separating N. galege from other rhizobial species were investigated. No clearly symbiosis-related features unique to N. galegae could be identified, but the analyses provided useful information on similarities with other rhizobia. The complete genome sequences of two N. galegae strains provided new insights into the ecology of this rhizobium, and enabled identification of a key modifier, the noeT gene, of the Nod factors of N. galegae. Genome comparisons of ten N. galegae strains were performed in order to identify intra-specific genetic features possibly contributing to the different symbiotic phenotypes observed for different strains on Galega plants. These analyses identified some differences in the genetic regions containing known symbiosis-related genes: (i) a symbiovar orientalis-specific rpoN gene and (ii) a putative nifQ gene with a remarkable ratio of nonsynonymous amino acid substitutions when comparing strains of the two symbiovars orientalis and officinalis, and lacking a molybdenum-binding motif thought to be essential for its function in symbiosis. Both the rpoN gene and the nifQ gene were shown to be essential for nitrogen fixation, but their impact on host specificity should be further investigated. In addition to the symbiovar-related differences, genes involved in metabolic functions were found to separate strains shown to have a high efficiency in nitrogen fixation from strains with a lower efficiency. Among others, genes for nitric oxide reduction were present in high- efficiency nitrogen fixers but not in low-efficiency strains. For the purpose of identifying good inoculant strains for agricultural applications, it would be good to confirm the effect of such metabolic genes on the efficiency of nitrogen fixation. The information produced in this thesis has broadened the knowledge on the symbiotic Galega spp. and N. galegae strains. Moreover, the availability of genome sequences for ten N. galegae strains will facilitate any further studies where phenotypic characteristics are to be evaluated.

45 References

7 References

Aguilar, O. M., Riva, O., Peltzer, E. 2004. Analysis of Rhizobium etli and of its symbiosis with wild Phaseolus vulgaris supports coevolution in centers of host diversification. Proc. Natl. Acad. Sci. USA 101: 13548-13553. Andronov, E. E., Terefework, Z., Roumiantseva, M. L., Dzyubenko, N. I., Onichtchouk, O. P., Kurchak, O. N., Dresler-Nurmi, A., Young, J. P., Simarov, B. V., Lindström, K. 2003. Symbiotic and genetic diversity of Rhizobium galegae isolates collected from the Galega orientalis gene center in the Caucasus. Appl. Environ. Microbiol. 69: 1067-1074. Annapurna, K. & Krishnan, H. B. 2003. Molecular aspects of soybean - specific nodulation by Sinorhizobium fredii USDA257. Indian. J. Exp. Biol. 41: 1114-1123. Antolín-Llovera, M., Ried, M., Parniske, M. 2014. Cleavage of the SYMBIOSIS RECEPTOR-LIKE KINASE ectodomain promotes complex formation with Nod factor receptor 5. Curr. Biol. 24: 422-427. Ardourel, M., Lortet, G., Maillet, F., Roche, P., Truchet, G., Promé, J.-C., Rosenberg, C. 1995. In Rhizobium meliloti, the operon associated with the nod box n5 comprises nodL, noeA and noeB, three host-range genes specifically required for the nodulation of particular Medicago species. Mol. Microbiol. 17: 687-699. Arrighi, J. F., Barre, A., Ben Amor, B., Bersoult, A., Soriano, L. C., Mirabella, R., de Carvalho-Niebel, F., Journet, E. P., Ghérardi, M., Huguet, T., Geurts, R., Dénarié, J., Rougé, P., Gough, C. 2006. The Medicago truncatula lysine motif- receptor-like kinase gene family includes NFP and new nodule-expressed Genes. Plant Physiol. 142: 265-279. ARS-GRIN: USDA, ARS, National Genetic Resources Program. Germplasm Resources Information Network - (GRIN) [Online Database]. National Germplasm Resources Laboratory, Beltsville, Maryland. URL: http://www.ars- grin.gov/cgi-bin/npgs/html/index.pl (26 February 2014) Atkinson, E. M., Palcic, M. M., Hindsgaul, O., Long, S. R. 1994. Biosynthesis of Rhizobium meliloti lipooligosaccharide nod factors: NodA is required for an N- acyltransferase activity. Proc. Natl. Acad. Sci. USA 91: 8418-8422. Ausmees, N., Kobayashi, H., Deakin, W. J., Marie, C., Krishnan, H. B., Broughton, W. J., Perret, X. 2004. Characterization of NopP, a type III secreted effector of Rhizobium sp. strain NGR234. J. Bacteriol. 186: 4774-4780. Baev, N., Schultze, M., Barlier, I., Ha, D. C., Virelizier, H., Kondorosi, E., Kondorosi, A. 1992. Rhizobium nodM and nodN genes are common nod genes: nodM encodes functions for efficiency of nod signal production and bacteroid maturation. J. Bacteriol. 174: 7555-7565.

46

Bala, A., Murphy, P. J., Osunde, A. O., Giller, K. E. 2003. Nodulation of tree legumes and the ecology of their native rhizobial populations in tropical soils. Appl. Soil. Ecol. 22: 211-223. Banfalvi, Z., Nieuwkoop, A., Schell, M., Besl, L., Stacey, G. 1988. Regulation of nod gene expression in Bradyrhizobium japonicum. Mol. Gen. Genet. 214: 420- 424. Baudouin, E., Pieuchot, L., Engler, G., Pauly, N., Puppo, A. 2006. Nitric oxide is formed in Medicago truncatula-Sinorhizobium meliloti functional nodules. Mol. Plant-Microbe Interact. 19: 970-975. Becker, A., Fraysse, N., Sharypova, L. 2005. Recent advances in studies on structure and symbiosis-related function of rhizobial K-antigens and lipopolysaccharides. Mol. Plant-Microbe Interact. 18: 899-905. Ben Amor, B., Shaw, S. L., Oldroyd, G. E. D., Maillet, F., Penmetsa, R. V., Cook, D., Long, S. R., Dénarié, J., Gough, C. 2003. The NFP locus of Medicago truncatula controls an early step of Nod factor signal transduction upstream of a rapid calcium flux and root hair deformation. Plant J. 34: 495-507. Bensmihen, S., de Billy, F., Gough, C. 2011. Contribution of NFP LysM domains to the recognition of Nod factors during the Medicago truncatula/Sinorhizobium meliloti symbiosis. PLoS ONE 6: e26114. Berck, S., Perret, X., Quesada-Vincens, D., Promé, J.-C., Broughton, W. J., Jabbouri, S. 1999. NolL of Rhizobium sp. strain NGR234 is required for O- acetyltransferase activity. J. Bacteriol. 181: 957-964. Bladergroen, M. R., Badelt, K., Spaink, H. P. 2003. Infection-blocking genes of a symbiotic Rhizobium leguminosarum strain that are involved in temperature- dependent protein secretion. Mol. Plant-Microbe Interact. 16: 53-64. Bloemberg, G. V., Thomas-Oates, J. E., Lugtenberg, B. J. J., Spaink, H. P. 1994. Nodulation protein NodL of Rhizobium leguminosarum O-acetylates lipo- oligosaccharides, chitin fragments and N-acetylglucosamine in vitro. Mol. Microbiol. 11: 793-804. Bloemberg, G. V., Kamst, E., Harteveld, M., van der Drift, K. M. G. M., Haverkamp, J., Thomas-Oates, J. E., Lugtenberg, B. J. J., Spaink, H. P. 1995. A central domain of Rhizobium NodE protein mediates host specificity by determining the hydrophobicity of fatty acyl moieties of nodulation factors. Mol. Microbiol. 16: 1123-1136. Boundy-Mills, K. L., Kosslak, R. M., Tully, R. E., Pueppke, S. G., Lohrke, S., Sadowsky, M. J. 1994. Induction of the Rhizobium fredii nod box-independent nodulation gene nolJ requires a functional nodD1 gene. Mol. Plant-Microbe Interact. 7: 305-308. Broghammer, A., Krusell, L., Blaise, M., Sauer, J., Sullivan, J. T., Maolanon, N., Vinther, M., Lorentzen, A., Madsen, E. B., Jensen, K. J., Roepstorff, P., Thirup, S., Ronson, C. W., Thygesen, M. B., Stougaard, J. 2012. Legume receptors

47 References

perceive the rhizobial lipochitin oligosaccharide signal molecules by direct binding. Proc. Natl. Acad. Sci. USA 109: 13859-13864. Broughton, W. J., Hanin, M., Reliü, B., Kopciñska, J., Golinowski, W., ùimúek, ù., Ojanen-Reuhs, T., Reuhs, B., Marie, C., Kobayashi, H., Bordogna, B., Le Quéré, A., Jabbouri, S., Fellay, R., Perret, X., Deakin, W. J. 2006. Flavonoid- inducible modifications to rhamnan O antigens are necessary for Rhizobium sp. strain NGR234-legume symbioses. J. Bacteriol. 188: 3654-3663. Buermans, H. P. J. & den Dunnen, J. T. 2014. Next generation sequencing technology: Advances and applications. Biochim. Biophys. Acta 1842: 1932- 1941. Burris, R. H. & Roberts, G. P. 1993. Biological nitrogen fixation. Annu. Rev. Nutr. 13: 317-335. Cárdenas, L., Domínguez, J., Quinto, C., López-Lara, I. M., Lugtenberg, B. J. J., Spaink, H. P., Rademaker, G. J., Haverkamp, J., Thomas-Oates, J. E. 1995. Isolation, chemical structures and biological activity of the lipo-chitin oligosaccharide nodulation signals from Rhizobium etli. Plant Mol. Biol. 29: 453-464. Chen, L., Chen, Y., Wood, D. W., Nester, E. W. 2002. A new type IV secretion system promotes conjugal transfer in Agrobacterium tumefaciens. J. Bacteriol. 184: 4838-4845. Coulthurst, S. J. 2013. The type VI secretion system – a widespread and versatile cell targeting system. Res. Microbiol. 164: 640-654. Dalton, D. A. & Kramer, S. 2007. Nitrogen-fixing bacteria in non-legumes. In: Plant-Associated Bacteria. Gnanamanickam, S. S. (ed). Springer, Dordrecht, the Netherlands. 712 p. Davis, E. O., Evans, I. J., Johnston, A. W. B. 1988. Identification of nodX, a gene that allows Rhizobium leguminosarum biovar viciae strain TOM to nodulate Afghanistan peas. Mol. Gen. Genet. 212: 531-535. Davis, E. O. & Johnston, A. W. B. 1990. Analysis of three nodD genes in Rhizobium leguminosarum biovar phaseoli; nodD 1 is preceded by nolE, a gene whose product is secreted from the cytoplasm. Mol. Microbiol. 4: 921-932. de Bruijn, F. J., Rossbach, S., Bruand, C., Parrish, J. R. 2006. A highly conserved Sinorhizobium meliloti operon is induced microaerobically via the FixLJ system and by nitric oxide (NO) via NnrR. Environ. Microbiol. 8: 1371-1381. De Hoff, P. L., Brill, L. M., Hirsch, A. M. 2009. Plant lectins: The ties that bind in root symbiosis and plant defense. Mol. Genet. Genomics 282: 1-15. De Mita, S., Santoni, S., Ronfort, J., Bataillon, T. 2007. Adaptive evolution of the symbiotic gene NORK is not correlated with shifts of rhizobial specificity in the genus Medicago . BMC Evol. Biol. 7: 210.

48

Demont, N., Debellé, F., Aurelle, H., Dénarié, J., Promé, J. C. 1993. Role of the Rhizobium meliloti nodF and nodE genes in the biosynthesis of lipo- oligosaccharidic nodulation factors. J. Biol. Chem. 268: 20134-20142. Ding, H., Yip, C. B., Hynes, M. F. 2013. Genetic characterization of a novel rhizobial plasmid conjugation system in Rhizobium leguminosarum bv. viciae strain VF39SM. J. Bacteriol. 195: 328-339. Dobert, R. C., Breil, B. T., Triplett, E. 1994. DNA sequence of the common nodulation genes of Bradyrhizobium elkanii and their phylogenetic relationship to those of other nodulating bacteria. Mol. Plant-Microbe Interact. 7: 564-572. Dockendorff, T. ɋ., Sharma, A. J., Stacey, G. 1994. Identification and characterization of the nolYZ genes of Bradyrhizobium japonicum. Mol. Plant- Microbe Interact. 7: 173-180. Duke, J. A. 1987. CRC handbook of medicinal herbs. CRC Press LLC, Boca Raton, Florida, USA. Economou, A., Hamilton, W. D., Johnston, A. W., Downie, J. A. 1990. The Rhizobium nodulation gene nodO encodes a Ca2+-binding protein that is exported without N-terminal cleavage and is homologous to haemolysin and related proteins. EMBO J. 9: 349-354. Ehrhardt, D. W., Atkinson, E. M., Faull, K. F., Freedberg, D. I., Sutherlin, D. P., Armstrong, R., Long, S. R. 1995. In vitro sulfotransferase activity of NodH, a nodulation protein of Rhizobium meliloti required for host-specific nodulation. J. Bacteriol. 177: 6237-6245. Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D., Baybayan, P., Bettman, B., Bibillo, A., Bjornson, K., Chaudhuri, B., Christians, F., Cicero, R., Clark, S., Dalal, R., deWinter, A., Dixon, J., Foquet, M., Gaertner, A., Hardenbol, P., Heiner, C., Hester, K., Holden, D., Kearns, G., Kong, X., Kuse, R., Lacroix, Y., Lin, S., Lundquist, P., Ma, C., Marks, P., Maxham, M., Murphy, D., Park, I., Pham, T., Phillips, M., Roy, J., Sebra, R., Shen, G., Sorenson, J., Tomaney, A., Travers, K., Trulson, M., Vieceli, J., Wegener, J., Wu, D., Yang, A., Zaccarin, D., Zhao, P., Zhong, F., Korlach, J., Turner, S. 2009. Real-time DNA sequencing from single polymerase molecules. Science 323: 133-138. Endre, G., Kereszt, A., Kevei, Z., Mihacea, S., Kalo, P., Kiss, G. B. 2002. A receptor kinase gene regulating symbiotic nodule development. Nature 417: 962-966. EOL: Encyclopedia of Life. URL: www.eol.org (26 February 2014) Evans, I. J. & Downie, J. A. 1986. The nodI gene product of Rhizobium leguminosarum is closely related to ATP-binding bacterial transport proteins; nucleotide sequence analysis of the nodI and nodJ genes. Gene 43: 95-101. Faucher, C., Camut, S., Dénarié, J., Truchet, G. 1989. The nodH and nodQ host range genes of Rhizobium meliloti behave as avirulence genes in R.

49 References

leguminosarum bv. viciae and determine changes in the production of plant- specific extracellular signals. Mol. Plant-Microbe Interact. 2: 291-300. Firmin, J. L., Wilson, K. E., Carlson, R. W., Davies, A. E., Downie, J. A. 1993. Resistance to nodulation of cv. Afghanistan peas is overcome by nodX, which mediates an O-acetylation of the Rhizobium leguminosarum lipo- oligosaccharide nodulation factor. Mol. Microbiol. 10: 351-360. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M., McKenney, K., Sutton, G., FitzHugh, W., Fields, C., Gocayne, J. D., Scott, J., Shirley, R., Liu, L., Glodek, A., Kelley, J. M., Weidman, J. F., Phillips, C. A., Spriggs, T., Hedblom, E., Cotton, M. D., Utterback, T. R., Hanna, M. C., Nguyen, D. T., Saudek, D. M., Brandon, R. C., Fine, L. D., Fritchman, J. L., Fuhrmann, J. L., Geoghagen, N. S. M., Gnehm, C. L., McDonald, L. A., Small, K. V., Fraser, C. M., Smith, H. O., Venter, J. C. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae rd. Science 269: 496-512. Forde, B. M. & O’Toole, P. W. 2013. Next-generation sequencing technologies and their impact on microbial genomics. Brief. Funct. Genomics 12: 440-453. Fowler, D., Coyle, M., Skiba, U., Sutton, M. A., Cape, J. N., Reis, S., Sheppard, L. J., Jenkins, A., Grizzetti, B., Galloway, J. N., Vitousek, P., Leach, A., Bouwman, A. F., Butterbach-Bahl, K., Dentener, F., Stevenson, D., Amann, M., Voss, M. 2013. The global nitrogen cycle in the twenty-first century. Phil. Trans. R. Soc. B 368: 20130164. Franche, C., Lindström, K., Elmerich, C. 2009. Nitrogen-fixing bacteria associated with leguminous and non-leguminous plants. Plant Soil 321: 35-59. Fraysse, N., Couderc, F., Poinsot, V. 2003. Surface polysaccharide involvement in establishing the rhizobium-legume symbiosis. Eur. J. Biochem. 270: 1365-1380. Garcia, M., Dunlap, J., Loh, J., Stacey, G. 1996. Phenotypic characterization and regulation of the nolA gene of Bradyrhizobium japonicum. Mol. Plant-Microbe Interact. 9: 625-635. Geelen, D., Leyman, B., Mergaert, P., Klarskov, K., Van Montagu, M., Geremia, R., Holsters, M. 1995. NodS is an S-adenosyl-L-methionine-dependent methyltransferase that methylates chitooligosaccharides deacetylated at the non- reducing end. Mol. Microbiol. 17: 387-397. Geremia, R. A., Mergaert, P., Geelen, D., Van Montagu, M., Holsters, M. 1994. The NodC protein of Azorhizobium caulinodans is an N- acetylglucosaminyltransferase. Proc. Natl. Acad. Sci. USA 91: 2669-2673. Goethals, K., Mergaert, P., Gao, M., Geelen, D., Van Montagu, M., Holsters, M. 1992. Identification of a new inducible nodulation gene in Azorhizobium caulinodans. Mol. Plant-Microbe Interact. 5: 405-411.

50

González-Andrés, F., Redondo, P. A., Pescador, R., Urbano, B. 2004. Management of Galega officinalis L. and preliminary results on its potential for milk production improvement in sheep. New Zeal. J. Agr. Res. 47: 233-245. Göttfert, M., Grob, P., Hennecke, H. 1990. Proposed regulatory pathway encoded by the nodV and nodW genes, determinants of host specificity in Bradyrhizobium japonicum. Proc. Natl. Acad. Sci. USA 87: 2680-2684. Gresham, A. & Booth, K. 1991. Poisoning of sheep by goat's rue. Vet. Record 129: 197-198. Harrison, P. W., Lower, R. P. J., Kim, N. K. D., Young, J. P. W. 2010. Introducing the bacterial ‘chromid’: Not a chromosome, not a plasmid. Trends Microbiol. 18: 141-148. Hernandez, J. A., Curatti, L., Aznar, C. P., Perova, Z., Britt, R. D., Rubio, L. M. 2008. Metal trafficking for nitrogen fixation: NifQ donates molybdenum to NifEN/NifH for the biosynthesis of the nitrogenase FeMo-cofactor. Proc. Natl. Acad. Sci. USA 105: 11679-11684. Herridge, D. F., Peoples, M. B., Boddey, R. M. 2008. Global inputs of biological nitrogen fixation in agricultural systems. Plant Soil 311: 1-18. Ho, B., Dong, T., Mekalanos, J. 2014. A view to a kill: The bacterial type VI secretion system. Cell Host Microbe 15: 9-21. Horvath, B., Bachem, C. W. B., Schell, J., Kondorosi, A. 1987. Host-specific regulation of nodulation genes in Rhizobium is mediated by a plant-signal, interacting with the nodD gene product. EMBO J. 6: 841-848. Hubber, A. M., Sullivan, J. T., Ronson, C. W. 2007. Symbiosis-induced cascade regulation of the Mesorhizobium loti R7A VirB/D4 type IV secretion system. Mol. Plant Microbe Interact. 20: 255-261. Hueck, C. J. 1998. Type III protein secretion systems in bacterial pathogens of animals and plants. Microbiol. Mol. Biol. Rev. 62: 379-433. Hämet-Ahti, L., Suominen, J., Ulvinen, T., Uotila, P. 1998. Retkeilykasvio (field flora of Finland). Finnish Museum of Natural History, Helsinki, Finland. 656 p. Imperial, J., Ugalde, R. A., Shah, V. K., Brill, W. J. 1984. Role of the nifQ gene product in the incorporation of molybdenum into nitrogenase in Klebsiella pneumoniae. J. Bacteriol. 158: 187-194. Jabbouri, S., Fellay, R., Talmont, F., Kamalaprija, P., Burger, U., Relic, B., Promé, J., Broughton, W. J. 1995. Involvement of nodS in N-methylation and nodU in 6-O-carbamoylation of Rhizobium sp. NGR234 nod factors. J. Biol. Chem. 270: 22968-22973. Jabbouri, S., Relic, B., Hanin, M., Kamalaprija, P., Burger, U., Promé, D., Promé, J. C., Broughton, W. J. 1998. nolO and noeI (HsnIII) of Rhizobium sp. NGR234 are involved in 3-O-carbamoylation and 2-O-methylation of Nod factors. J. Biol. Chem. 273: 12047-12055.

51 References

John, M., Röhrig, H., Schmidt, J., Wieneke, U., Schell, J. 1993. Rhizobium NodB protein involved in nodulation signal synthesis is a chitooligosaccharide deacetylase. Proc. Natl. Acad. Sci. USA 90: 625-629. Johnson, T. L., Abendroth, J., Hol, W. G. J., Sandkvist, M. 2006. Type II secretion: From structure to function. FEMS Microbiol. Lett. 255: 175-186. Jones, K. M., Lloret, J., Daniele, J. R., Walker, G. C. 2007. The type IV secretion system of Sinorhizobium meliloti strain 1021 is required for conjugation but not for intracellular symbiosis. J. Bacteriol. 189: 2133-2138. Kaksonen, A. H., Jussila, M. M., Lindström, K., Suominen, L. 2006. Rhizosphere effect of Galega orientalis in oil-contaminated soil. Soil Biol. Biochem. 38: 817- 827. Keeler, R., Johnson, A., Stuart, L., Evans, J. 1986. Toxicosis from and possible adaptation to Galega officinalis in sheep and the relationship to Verbesina encelioides toxicosis. Vet. Hum. Toxicol. 28: 309-315. Kondorosi, E., Pierre, M., Cren, M., Haumann, U., Buiré, M., Hoffmann, B., Schell, J., Kondorosi, A. 1991. Identification of NolR, a negative transacting factor controlling the nod regulon in Rhizobium meliloti. J. Mol. Biol. 222: 885-896. Krishnan, H. B. & Pueppke, S. G. 1991. nolC, a Rhizobium fredii gene involved in cultivar-specific nodulation of soybean, shares homology with a heat-shock gene. Mol. Microbiol. 5: 737-745. Laakso, I., Virkajärvi, P., Airaksinen, H., Varis, E. 1990. Determination of vasicine and related alkaloids by gas chromatography-mass spectrometry. J. Chromatogr. 505: 424-428. Lee, K. B., De Backer, P., Aono, T., Liu, C. T., Suzuki, S., Suzuki, T., Kaneko, T., Yamada, M., Tabata, S., Kupfer, D. M., Najar, F. Z., Wiley, G. B., Roe, B., Binnewies, T. T., Ussery, D. W., D'Haeze, W., den Herder, J., Gevers, D., Vereecke, D., Holsters, M., Oyaizu, H. 2008. The genome of the versatile nitrogen fixer Azorhizobium caulinodans ORS571. BMC Genomics 9: 271- 2164-9-271. Lie, T. A., Göktan, D., Engin, M., Pijnenborg, J., Anlarsal, E. 1987. Co-evolution of the legume-Rhizobium association. Plant Soil 100: 171-181. Limpens, E., Franken, C., Smit, P., Willemse, J., Bisseling, T., Geurts, R. 2003. LysM domain receptor kinases regulating rhizobial Nod factor-induced infection. Science 302: 630-633. Lindström, K. 1989. Rhizobium galegae, a new species of legume root nodule bacteria. Int. J. Syst. Bacteriol. 39: 365-367. Lindström, K., Lipsanen, P., Kaijalainen, S. 1990. Stability of markers used for identification of two Rhizobium galegae inoculant strains after five years in the field. Appl. Environ. Microb. 56: 444-450.

52

Lipsanen, P. & Lindström, K. 1988. Infection and root nodule structure in the Rhizobium galegae sp. nov.-Galega sp. symbiosis. Symbiosis 6: 81-95. Loh, J., Garcia, M., Stacey, G. 1997. NodV and NodW, a second flavonoid recognition system regulating nod gene expression in Bradyrhizobium japonicum. J. Bacteriol. 179: 3013-3020. Lohrke, S. M., Day, B., Kolli, V. S. K., Hancock, R., Yuen, J. P.-Y., de Souza, M. L., Stacey, G., Carlson, R., Tong, Z., Hur, H.-G., Orf, J. H., Sadowsky, M. J. 1998. The Bradyrhizobium japonicum noeD gene: A negatively acting, genotype-specific nodulation gene for soybean. Mol. Plant-Microbe Interact. 11: 476-488. López-Lara, I. M. & Geiger, O. 2001. The nodulation protein NodG shows the enzymatic activity of an 3-oxoacyl-acyl carrier protein reductase. Mol. Plant- Microbe Interact. 14: 349-357. Luka, S., Sanjuan, J., Carlson, R. W., Stacey, G. 1993. nolMNO genes of Bradyrhizobium japonicum are co-transcribed with nodYABCSUIJ, and nolO is involved in the synthesis of the lipo-oligosaccharide nodulation signals. J. Biol. Chem. 268: 27053-27059. LuontoPortti/NatureGate 2014. URL: http://www.luontoportti.com/suomi/en/kukkakasvit/goat-s-rue (27 Feb 2014). Madsen, E. B., Antolín-Llovera, M., Grossmann, C., Ye, J., Vieweg, S., Broghammer, A., Krusell, L., Radutoiu, S., Jensen, O. N., Stougaard, J., Parniske, M. 2011. Autophosphorylation is essential for the in vivo function of the Lotus japonicus Nod factor receptor 1 and receptor-mediated signalling in cooperation with Nod factor receptor 5. Plant J. 65: 404-417. Madsen, E. B., Madsen, L. H., Radutoiu, S., Olbryt, M., Rakwalska, M., Szczyglowski, K., Sato, S., Kaneko, T., Tabata, S., Sandal, N., Stougaard, J. 2003. A receptor kinase gene of the LysM type is involved in legume perception of rhizobial signals. Nature 425: 637-640. Margulies, M., Egholm, M., Altman, W. E., Attiya, S., Bader, J. S., Bemben, L. A., Berka, J., Braverman, M. S., Chen, Y., Chen, Z., Dewell, S. B., Du, L., Fierro, J. M., Gomes, X. V., Godwin, B. C., He, W., Helgesen, S., Ho, C. H., Irzyk, G. P., Jando, S. C., Alenquer, M. L. I., Jarvie, T. P., Jirage, K. B., Kim, J., Knight, J. R., Lanza, J. R., Leamon, J. H., Lefkowitz, S. M., Lei, M., Li, J., Lohman, K. L., Lu, H., Makhijani, V. B., McDade, K. E., McKenna, M. P., Myers, E. W., Nickerson, E., Nobile, J. R., Plant, R., Puc, B. P., Ronan, M. T., Roth, G. T., Sarkis, G. J., Simons, J. F., Simpson, J. W., Srinivasan, M., Tartaro, K. R., Tomasz, A., Vogt, K. A., Volkmer, G. A., Wang, S. H., Wang, Y., Weiner, M. P., Yu, P., Begley, R. F., Rothberg, J. M. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376-380. Marie, C., Broughton, W. J., Deakin, W. J. 2001. Rhizobium type III secretion systems: Legume charmers or alarmers? Curr. Opin. Plant Biol. 4: 336-342.

53 References

Marie, C., Deakin, W. J., Viprey, V., Kopciñska, J., Golinowski, W., Krishnan, H. B., Perret, X., Broughton, W. J. 2003. Characterization of Nops, nodulation outer proteins, secreted via the type III secretion system of NGR234. Mol. Plant-Microbe Interact. 16: 743-751. Martens, M., Dawyndt, P., Coopman, R., Gillis, M., De Vos, P., Willems, A. 2008. Advantages of multilocus sequence analysis for taxonomic studies: A case study using 10 housekeeping genes in the genus Ensifer (including former Sinorhizobium). Int. J. Syst. Evol. Microbiol. 58: 200-214. Martínez-Romero, E. 2009. Coevolution in Rhizobium-legume symbiosis? DNA Cell Biol. 28: 361-370. Mergaert, P., D'Haeze, W., Fernández-López, M., Geelen, D., Goethals, K., Promé, J., Van Montagu, M., Holsters, M. 1996. Fucosylation and arabinosylation of Nod factors in Azorhizobium caulinodans: Involvement of nolKnodZ as well as noeC and/or downstream genes. Mol. Microbiol. 21: 409-419. Mergaert, P., Uchiumi, T., Alunni, B., Evanno, G., Cheron, A., Catrice, O., Mausset, A., Barloy-Hubler, F., Galibert, F., Kondorosi, A., Kondorosi, E. 2006. Eukaryotic control on bacterial cell cycle and differentiation in the Rhizobium– legume symbiosis. Proc. Natl. Acad. Sci. USA 103: 5230-5235. Mesa, S., Bedmar, E. J., Chanfon, A., Hennecke, H., Fischer, H. 2003. Bradyrhizobium japonicum NnrR, a denitrification regulator, expands the FixLJ-FixK2 regulatory cascade. J. Bacteriol. 185: 3978-3982. Metzker, M. L. 2010. Sequencing technologies - the next generation. Nat. Rev. Genet. 11: 31-46. Michiels, J., Moris, M., Dombrecht, B., Verreth, C., Vanderleyden, J. 1998. Differential regulation of Rhizobium etli rpoN2 gene expression during symbiosis and free-living growth. J. Bacteriol. 180: 3620-3628. Mikkonen, A., Kondo, E., Lappi, K., Wallenius, K., Lindström, K., Hartikainen, H., Suominen, L. 2011. Contaminant and plant-derived changes in soil chemical and microbiological indicators during fuel oil rhizoremediation with Galega orientalis. Geoderma 160: 336-346. Mossberg, B. & Stenberg, L. 2012. Suuri Pohjolan kasvio. Tammi, Helsinki, Finland. 928 p. Mousavi, S. A., Österman, J., Wahlberg, N., Nesme, X., Lavire, C., Vial, L., Paulin, L., de Lajudie, P., Lindström, K. 2014. Phylogeny of the Rhizobium- Allorhizobium-Agrobacterium clade supports the delineation of Neorhizobium gen. nov. Syst. Appl. Microbiol. 37: 208-215. Mulligan, J. T. & Long, S. R. 1989. A family of activator genes regulates expression of Rhizobium meliloti nodulation genes. Genetics 122: 7-18.

54

Nandasena, K. G., O'Hara, G. W., Tiwari, R. P., Willlems, A., Howieson, J. G. 2007. Mesorhizobium ciceri biovar biserrulae, a novel biovar nodulating the pasture legume Biserrula pelecinus L. Int. J. Syst. Evol. Microbiol. 57: 1041-1045. Neale, D. B., Wegrzyn, J. L., Stevens, K. A., Zimin, A. V., Puiu, D., Crepeau, M. W., Cardeno, C., Koriabine, M., Holtz-Morris, A. E., Liechty, J. D., Martínez- García, P. J., Vasquez-Gross, H. A., Lin, B. Y., Zieve, J. J., Dougherty, W. M., Fuentes-Soriano, S., Wu, L. S., Gilbert, D., Marçais, G., Roberts, M., Holt, C., Yandell, M., Davis, J. M., Smith, K. E., Dean, J. F., Lorenz, W. W., Whetten, R. W., Sederoff, R., Wheeler, N., McGuire, P. E., Main, D., Loopstra, C. A., Mockaitis, K., deJong, P. J., Yorke, J. A., Salzberg, S. L., Langley, C. H. 2014. Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biol. 15: R59. NRCS: Department of Agriculture, National Resources Conservation Service (NRCS), Plants Database. URL: www.plants.usda.gov (27 February 2014) Okazaki, S., Kaneko, T., Sato, S., Saeki, K. 2013. Hijacking of leguminous nodulation signaling by the rhizobial type III secretion system. Proc. Natl. Acad. Sci. USA 110: 17131-17136. Oldham, M., Ransom, C. V., Ralphs, M. H., Gardner, D. R. 2011. Galegine content in goatsrue (Galega officinalis) varies by plant part and phenological growth stage. Weed Sci. 59: 349-352. Oldroyd, G. E. D. 2013. Speak, friend, and enter: Signalling systems that promote beneficial symbiotic associations in plants. Nat. Rev. Micro. 11: 252-263. Olsthoorn, M. M. A., Stokvis, E., Haverkamp, J., Spaink, H. P., Thomas-Oates, J. 2000. Growth temperature regulation of host-specific modifications of rhizobial lipo-chitin oligosaccharides: The function of nodX is temperature regulated. Mol. Plant-Microbe Interact. 13: 808-820. Parada, M., Vinardell, J. M., Ollero, F. J., Hidalgo, Á., Gutiérrez, R., Buendía- Clavería, A. M., Lei, W., Margaret, I., López-Baena, F. J., Gil-Serrano, A., Rodríguez-Carvajal, M. A., Moreno, J., Ruiz-Sainz, J. E. 2006. Sinorhizobium fredii HH103 mutants affected in capsular polysaccharide (KPS) are impaired for nodulation with soybean and Cajanus cajan. Mol. Plant-Microbe Interact. 19: 43-52. Perret, X., Staehelin, C., Broughton, W. J. 2000. Molecular basis of symbiotic promiscuity. Microbiol. Mol. Biol. Rev. 64: 180-201. Plazanet, C., Réfrégier, G., Demont, N., Truchet, G., Rosenberg, C. 1995. The Rhizobium meliloti region located downstream of the nod box n6 is involved in the specific nodulation of Medicago lupulina. FEMS Microbiol. Lett. 133: 285- 291. Price, N. P. J. 1999. Carbohydrate determinants of Rhizobium–legume symbioses. Carbohydr. Res. 317: 1-9.

55 References

Pueppke, S. G. & Broughton, W. J. 1999. Rhizobium sp. strain NGR234 and R. fredii USDA257 share exceptionally broad, nested host ranges. Mol. Plant-Microbe Interact. 12: 293-318. Pukatzki, S., Ma, A. T., Sturtevant, D., Krastins, B., Sarracino, D., Nelson, W. C., Heidelberg, J. F., Mekalanos, J. J. 2006. Identification of a conserved bacterial protein secretion system in Vibrio cholerae using the Dictyostelium host model system. Proc. Natl. Acad. Sci. USA 103: 1528-1533. Quesada-Vincens, D., Hanin, M., Broughton, W. J., Jabbouri, S. 1998. In vitro sulfotransferase activity of NoeE, a nodulation protein of Rhizobium sp. NGR234. Mol. Plant-Microbe Interact. 11: 592-600. Radeva, G., Jurgens, G., Niemi, M., Nick, G., Suominen, L., Lindström, K. 2001. Description of two biovars in the Rhizobium galegae species: Biovar orientalis and biovar officinalis. Syst. Appl. Microbiol. 24: 192-205. Radutoiu, S., Madsen, L. H., Madsen, E. B., Jurkiewicz, A., Fukai, E., Quistgaard, E. M. H., Albrektsen, A. S., James, E. K., Thirup, S., Stougaard, J. 2007. LysM domains mediate lipochitin-oligosaccharide recognition and Nfr genes extend the symbiotic host range. EMBO J. 26: 3923-3935. Radutoiu, S., Madsen, L. H., Madsen, E. B., Felle, H. H., Umehara, Y., Gronlund, M., Sato, S., Nakamura, Y., Tabata, S., Sandal, N., Stougaard, J. 2003. Plant recognition of symbiotic bacteria requires two LysM receptor-like kinases. Nature 425: 585-592. Records, A. R. 2011. The type VI secretion system: A multipurpose delivery system with a phage-like machinery. Mol. Plant-Microbe Interact. 24: 751-757. Ritsema, T., Geiger, O., van Dillewijn, P., Lugtenberg, B. J., Spaink, H. P. 1994. Serine residue 45 of nodulation protein NodF from Rhizobium leguminosarum bv. viciae is essential for its biological function. J. Bacteriol. 176: 7740-7743. Rivilla, R., Sutton, J. M., Downie, J. A. 1995. Rhizobium leguminosarum NodT is related to a family of outer-membrane transport proteins that includes TolC, PrtF, CyaE and AprF. Gene 161: 27-31. Roche, P., Debellé, F., Maillet, F., Lerouge, P., Faucher, C., Truchet, G., Dénarié, J., Promé, J. 1991. Molecular basis of symbiotic host specificity in Rhizobium meliloti: nodH and nodPQ genes encode the sulfation of lipo-oligosaccharide signals. Cell 67: 1131-1143. Rogel, M. A., Ormeño-Orrillo, E., Martinez Romero, E. 2011. Symbiovars in rhizobia reflect bacterial adaptation to legumes. Syst. Appl. Microbiol. 34: 96- 104. Rossen, L., Shearman, C. A., Johnston, A. W. B., Downie, J. A. 1985. The nodD gene of Rhizobium leguminosarum is autoregulatory and in the presence of plant exudate induces the nodA,B,C genes. EMBO J. 4: 3369-3373.

56

Rothberg, J. M., Hinz, W., Rearick, T. M., Schultz, J., Mileski, W., Davey, M., Leamon, J. H., Johnson, K., Milgrew, M. J., Edwards, M., Hoon, J., Simons, J. F., Marran, D., Myers, J. W., Davidson, J. F., Branting, A., Nobile, J. R., Puc, B. P., Light, D., Clark, T. A., Huber, M., Branciforte, J. T., Stoner, I. B., Cawley, S. E., Lyons, M., Fu, Y., Homer, N., Sedova, M., Miao, X., Reed, B., Sabina, J., Feierstein, E., Schorn, M., Alanjary, M., Dimalanta, E., Dressman, D., Kasinskas, R., Sokolsky, T., Fidanza, J. A., Namsaraev, E., McKernan, K. J., Williams, A., Roth, G. T., Bustillo, J. 2011. An integrated semiconductor device enabling non-optical genome sequencing. Nature 475: 348-352. Röhrig, H., Schmidt, J., Wieneke, U., Kondorosi, E., Barlier, I., Schell, J., John, M. 1994. Biosynthesis of lipooligosaccharide nodulation factors: Rhizobium NodA protein is involved in N-acylation of the chitooligosaccharide backbone. Proc. Natl. Acad. Sci. USA 91: 3122-3126. Saad, M. M., Staehelin, C., Broughton, W. J., Deakin, W. J. 2008. Protein-protein interactions within type III secretion system-dependent pili of Rhizobium sp. strain NGR234. J. Bacteriol. 190: 750-754. Saier, M. H., Tam, R., Reizer, A., Reizer, J. 1994. Two novel families of bacterial membrane proteins concerned with nodulation, cell division and transport. Mol. Microbiol. 11: 841-847. Sanger, F., Nicklen, S., Coulson, A. R. 1977. DNA sequencing with chain- terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467. Sarkar, A. & Reinhold-Hurek, B. 2014. Transcriptional profiling of nitrogen fixation and the role of NifA in the diazotrophic endophyte Azoarcus sp. strain BH72. PLoS ONE 9: e86527. Schlaman, H. R. M., Olsthoorn, M. M. A., Harteveld, M., Dörner, L., Djordjevic, M. A., Thomas-Oates, J., Spaink, H. P. 2006. The production of species-specific highly unsaturated fatty acyl-containing LCOs from Rhizobium leguminosarum bv. trifolii is stringently regulated by nodD and involves the nodRL genes. Mol. Plant-Microbe Interact. 19: 215-226. Schmeisser, C., Liesegang, H., Krysciak, D., Bakkou, N., Le Quéré, A., Wollherr, A., Heinemeyer, I., Morgenstern, B., Pommerening-Röser, A., Flores, M., Palacios, R., Brenner, S., Gottschalk, G., Schmitz, R. A., Broughton, W. J., Perret, X., Strittmatter, A. W., Streit, W. R. 2009. Rhizobium sp. strain NGR234 possesses a remarkable number of secretion systems. Appl. Env. Microbiol. 75: 4035-4045. Schultze, M., Staehelin, C., Brunner, F., Genetet, I., Legrand, M., Fritig, B., Kondorosi, É., Kondorosi, Á. 1998. Plant chitinase/lysozyme isoforms show distinct substrate specificity and cleavage site preference towards lipochitooligosaccharide nod signals. Plant J. 16: 571-580. Schultze, M., Staehelin, C., Röhrig, H., John, M., Schmidt, J., Kondorosi, E., Schell, J., Kondorosi, A. 1995. In vitro sulfotransferase activity of Rhizobium meliloti

57 References

NodH protein: Lipochitooligosaccharide nodulation signals are sulfated after synthesis of the core structure. Proc. Natl. Acad. Sci. USA 92: 2706-2709. Schwedock, J. S., Liu, C., Leyh, T. S., Long, S. R. 1994. Rhizobium meliloti NodP and NodQ form a multifunctional sulfate-activating complex requiring GTP for activity. J. Bacteriol. 176: 7055-7064. Shearman, C. A., Rossen, L., Johnston, A. W. B., Downie, J. A. 1986. The Rhizobium leguminosarum nodulation gene nodF encodes a polypeptide similar to acyl-carrier protein and is regulated by nodD plus a factor in pea root exudate. EMBO J. 5: 647-652. Shimoda, Y., Nagata, M., Suzuki, A., Abe, M., Sato, S., Kato, T., Tabata, S., Higashi, S., Uchiumi, T. 2005. Symbiotic rhizobium and nitric oxide induce gene expression of non-symbiotic hemoglobin in Lotus japonicus. Plant Cell Physiol. 46: 99-107. Skorupska, A., Janczarek, M., Marczak, M., Mazur, A., Król, J. 2006. Rhizobial exopolysaccharides: Genetic control and symbiotic functions. Microb. Cell. Fact. 5: 7. Snoeck, C., Luyten, E., Poinsot, V., Savagnac, A., Vanderleyden, J., Promé, J. 2001. Rhizobium sp. BR816 produces a complex mixture of known and novel lipochitooligosaccharide molecules. Mol. Plant-Microbe Interact. 14: 678-684. Sørensen, K. K., Simonsen, J. B., Maolanon, N. N., Stougaard, J., Jensen, K. J. 2014. Chemically synthesized 58-mer LysM domain binds lipochitin oligosaccharide. ChemBioChem 15: 2097-2105. Spaink, H. P., Wijfjes, A. H., Lugtenberg, B. J. 1995. Rhizobium NodI and NodJ proteins play a role in the efficiency of secretion of lipochitin oligosaccharides. J. Bacteriol. 177: 6276-6281. Spaink, H. P., Wijfjes, A. H. M., der van Drift, K. M. G. M., Haverkamp, J., Thomas-Oates, J. E., Lugtenberg, B. J. J. 1994. Structural identification of metabolites produced by the NodB and NodC proteins of Rhizobium leguminosarum. Mol. Microbiol. 13: 821-831. Sprent, J. I. 2007. Evolving ideas of legume evolution and diversity: A taxonomic perspective on the occurrence of nodulation. New Phytol. 174: 11-25. Stacey, G., Luka, S., Sanjuan, J., Banfalvi, Z., Nieuwkoop, A. J., Chun, J. Y., Forsberg, L. S., Carlson, R. 1994. nodZ, a unique host-specific nodulation gene, is involved in the fucosylation of the lipooligosaccharide nodulation signal of Bradyrhizobium japonicum. J. Bacteriol. 176: 620-633. Staehelin, C., Schultze, M., Kondorosi, É., Mellor, R. B., Boiler, T., Kondorosi, A. 1994. Structural modifications in Rhizobium meliloti Nod factors influence their stability against hydrolysis by root chitinases. Plant J. 5: 319-330. Stracke, S., Kistner, C., Yoshida, S., Mulder, L., Sato, S., Kaneko, T., Tabata, S., Sandal, N., Stougaard, J., Szczyglowski, K., Parniske, M. 2002. A plant

58

receptor-like kinase required for both bacterial and fungal symbiosis. Nature 417: 959-962. Sullivan, J. T., Brown, S. D., Ronson, C. W. 2013. The NifA-RpoN regulon of Mesorhizobium loti strain R7A and its symbiotic activation by a novel LacI/GalR-family regulator. PLoS ONE 8: e53762. Suominen, L., Jussila, M. M., Mäkeläinen, K., Romantschuk, M., Lindström, K. 2000. Evaluation of the Galega–Rhizobium galegae system for the bioremediation of oil-contaminated soil. Environ. Pollut. 107: 239-244. Suominen, L., Paulin, L., Saano, A., Saren, A., Tas, E., Lindström, K. 1999. Identification of nodulation promoter (nod-box) regions of Rhizobium galegae. FEMS Microbiol. Lett. 177: 217-223. Suominen, L., Roos, C., Lortet, G., Paulin, L., Lindström, K. 2001. Identification and structure of the Rhizobium galegae common nodulation genes: Evidence for horizontal gene transfer. Mol. Biol. Evol. 18: 907-916. Suominen, L., Luukkainen, R., Roos, C., Lindström, K. 2003. Activation of the nodA promoter by the nodD genes of Rhizobium galegae induced by synthetic flavonoids or Galega orientalis root exudate. FEMS Microbiol. Lett. 219: 225- 232. Surin, B. P. & Downie, J. A. 1988. Characterization of the Rhizobium leguminosarum genes nodLMN involved in efficient host-specific nodulation. Mol. Microbiol. 2: 173-183. Terefework, Z., Nick, G., Suomalainen, S., Paulin, L., Lindström, K. 1998. Phylogeny of Rhizobium galegae with respect to other rhizobia and agrobacteria. Int. J. Syst. Bacteriol. 48: 349-356. Tseng, T. T., Tyler, B. M., Setubal, J. C. 2009. Protein secretion systems in bacterial-host associations, and their description in the gene ontology. BMC Microbiol. 9: S2. Valouev, A., Ichikawa, J., Tonthat, T., Stuart, J., Ranade, S., Peckham, H., Zeng, K., Malek, J. A., Costa, G., McKernan, K., Sidow, A., Fire, A., Johnson, S. M. 2008. A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res. 18: 1051-1063. Van de Velde, W., Zehirov, G., Szatmari, A., Debreczeny, M., Ishihara, H., Kevei, Z., Farkas, A., Mikulass, K., Nagy, A., Tiricz, H., Satiat-Jeunemaître, B., Alunni, B., Bourge, M., Kucho, K., Abe, M., Kereszt, A., Maroti, G., Uchiumi, T., Kondorosi, E., Mergaert, P. 2010. Plant peptides govern terminal differentiation of bacteria in symbiosis. Science 327: 1122-1126. van Dijk, E. L., Auger, H., Jaszczyszyn, Y., Thermes, C. 2014. Ten years of next- generation sequencing technology. Trends. Genet. 30: 418-426. Varis, E. 1986. Goat's rue (Galega orientalis Lam.) a potential pasture legume for temperate conditions. J. Agr. Sci. Finland 58: 83-101.

59 References

Vavilov, N. I. 1926. Studies on the origin of cultivated plants. Institut de Botanique Appliquée et d’Amélioration des Plantes, Leningrad, State Press (in Russian and English) 3-248. Waelkens, F., Voets, T., Vlassak, K., Vanderleyden, J., van Rhijn, P. 1995. The nodS gene of Rhizobium tropici strain CIAT899 is necessary for nodulation on Phaseolus vulgaris and on Leucaena leucocephala. Mol. Plant-Microbe Interact. 8: 147-154. Wojciechowski, M. F., Lavin, M., Sanderson, M. J. 2004. A phylogeny of legumes (Leguminosae) based on analysis of the plastid matK gene resolves many well- supported subclades within the family. Am. J. Bot. 91: 1846-1862. Wu, H., Chung, P., Shih, H., Wen, S., Lai, E. 2008. Secretome analysis uncovers an hcp-family protein secreted via a type VI secretion system in Agrobacterium tumefaciens. J. Bacteriol. 190: 2841-2850. Yang, G., Debellé, F., Savagnac, A., Ferro, M., Schiltz, O., Maillet, F., Promé, D., Treilhou, M., Vialas, C., Lindstrom, K., Dénarié, J., Promé, J. 1999. Structure of the Mesorhizobium huakuii and Rhizobium galegae Nod factors: A cluster of phylogenetically related legumes are nodulated by rhizobia producing Nod factors with Įȕ-unsaturated N-acyl substitutions. Mol. Microbiol. 34: 227-237. Zhu, J., Oger, P. M., Schrammeijer, B., Hooykaas, P. J. J., Farrand, S. K., Winans, S. C. 2000. The bases of crown gall tumorigenesis. J. Bacteriol. 182: 3885-3895.

60