University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange

Doctoral Dissertations Graduate School

12-2020

Genetic and Epigenetic Control of Soybean Agronomic Traits

Daniel Niyikiza [email protected]

Follow this and additional works at: https://trace.tennessee.edu/utk_graddiss

Part of the Plant Breeding and Genetics Commons

Recommended Citation Niyikiza, Daniel, "Genetic and Epigenetic Control of Soybean Agronomic Traits. " PhD diss., University of Tennessee, 2020. https://trace.tennessee.edu/utk_graddiss/6084

This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. To the Graduate Council:

I am submitting herewith a dissertation written by Daniel Niyikiza entitled "Genetic and Epigenetic Control of Soybean Agronomic Traits." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, with a major in Plants, Soils, and Insects.

Tarek Hewezi, Major Professor

We have read this dissertation and recommend its acceptance:

Vince R. Pantalone, Carl E. Sams, Tom Gill, Tessa Burch-Smith

Accepted for the Council:

Dixie L. Thompson

Vice Provost and Dean of the Graduate School

(Original signatures are on file with official studentecor r ds.) Genetic and Epigenetic Control of Soybean Agronomic Traits

A Dissertation Presented for the

Doctor of Philosophy

Degree

The University of Tennessee, Knoxville

Daniel Niyikiza

December 2020

Copyright © 2020 by Daniel Niyikiza

All rights reserved.

ii DEDICATION

To my wife Aline Josee U., our children Abba Shelby S. I. and Addy Davin N. N., without whom this dissertation would not have been completed.

iii ACKNOWLEDGEMENTS

My sincere gratitude and special appreciation go to Dr. Tarek Hewezi for having accepted me in his Lab and to serve as the major supervisor of my research project. His advice and guidance and incredible support were key determinants in the completion of this project. I would like to thank the co-supervisor of my research project, Dr. Vince Pantalone for his unequalled assistance throughout the course of the research project and dissertation. I will always remember that he initiated my first steps into the soybean breeding paths and technically accompanied me through this research journey. My heartfelt appreciation to the members of my committee, Dr. Tom Gill, Dr. Carl Sams, and Dr. Tessa Burch-Smith for enriching contribution to the quality of my dissertation. Your constructive advices were helpful. To Dr. Hewezi’s lab members: John Hollis Rice, Dr. Sarbottam Piya, Dr. Pratyush Routray, Dr. Aditi Rambani, Dr. Yin Fang, Dr. Valeria Lopes-Caitar, Dr. Long Miao, Morgan Bennet, and others I could not list here, and Dr. Pantalone’s lab members: Greyson Dickey, Rachel Fulton, Alison Willette, Dr. Chris Smallwood, Dr. Mia Cunicelli, Cris Wyman, Victoria Benelli, and many others not listed here, for your invaluable technical support. My sincere acknowledgement to Lisa A. Fritz and Susan T. Laired for their technical support for SCN screening assays. Much appreciation to The Borlaug for Higher Education and Agriculture Research and development (BHEARD) for granting me the scholarship and financial support to my project and The Tennessee Soybean Promotion Board (TSB) for financially co-supporting my research project. Special thanks, to my colleagues in Rwanda Agriculture Board, Aime Parfait G., Aimable I., Charles N., Francois S. and many others for their technical and logistics support. To Dr. Dave Ader and all my friends at UT for making my stay memorable And finally, my heartfelt gratitude to my friends, family members and colleagues for the encouragement along this resourceful experience.

iv ABSTRACT

Greater soybean productivity depends on the genetic improvement of yield components, epigenetic effects and the interactions with surrounding environment. We explored the possibility of soybean improvement by conventional biparental crossing of soybean lines, differing by seed quality, yield, and resistance to a range of pathogens, including soybean cyst nematode (SCN). We investigated the role of Resistance to Heterodera glycines (Rhg) 1 and 4 loci in soybean resistance to SCN, race 2 and 3. Further, we evaluated the adaptability of temperate-origin soybean lines in tropical environments of Rwanda. Lastly, we examined , RNA splicing, DNA methylation and their interactions in three distinct developmental stages of soybean nodules. Briefly, compared to the parental lines, the recombinant inbred lines (RILs) generated from the biparental cross, represented varying phenotypes for seed yield, and oil contents, with high broad-sense heritability scores. This suggests a possibility of selection of best individuals with potentially high genetic gain for each trait. Analysis of gene expression, nucleotide sequences, and copy number variation of Rhg1 and Rhg4 in a set of RILs revealed that resistance to race 2 is mediated independently of Rhg1 and Rhg4. Importantly, a QTL on chromosome 17, associated with resistance to SCN race 2 was identified, a finding that provides the foundation for cloning the underlying SCN resistance gene. Our data suggested a possibility to stack favorable SCN resistance alleles in high yielding cultivars as the yield of RILs harboring SCN resistance alleles to race 2, compared to the susceptible RILs, was not negatively impacted. Some US-developed soybean lines could adapt and double the current local yield potential. However, our data revealed a significant GXE interaction, implying their fitness in specific micro-climates of Rwanda. Amino acid profile and consequently seed storage protein can be improved through the manipulation of soybean nodulation. Our results revealed dynamic changes in gene expression, alternative splicing events and extensive DNA methylation reprogramming in the developing nodules. The results also revealed novel insights to the associations between the transcriptome, spliceome, and methylome of the developing soybean nodules and improved our understanding of the genetic and epigenetic mechanisms controlling soybean nodulation.

v TABLE OF CONTENTS

Chapter 1: General introduction and literature review ...... 1 1. Introduction ...... 2 1.1 The Implications of genetics and in Soybean adaptation, agronomic and quality traits ...... 2 1.2 The genetic basis of resistance and the importance of epigenetics in SCN control .... 10 1.3 Soybean nodulation: a biological process for seed yield and quality improvement ... 15 1.4 Soybean production in Rwanda: Challenges and opportunities ...... 19 1.5 Organization of the Dissertation...... 21 References ...... 22 Chapter 2: Performance evaluation of TN09-029 x NCC05-1168 recombinant inbred lines (RIL) population of Soybean Glycines max (L) Merr...... 41 Abstract ...... 42 1. Introduction ...... 43 2. Results ...... 44 2.1 Agronomic traits ...... 44 2.2 Seed quality traits ...... 45 3. Discussion...... 47 3.1 The agronomic performance improved ...... 47 3.2 Protein and oil content relatively increased ...... 48 4. Conclusion ...... 50 5. Materials and Methods ...... 50 5.1 Plant material ...... 50 5.2 Field experiments ...... 51 5.3 Agronomic traits ...... 51 5.4 Protein, Oil and Amino Acid Content ...... 51 5.5 Fatty Acid content ...... 51 5.6 Meal Protein ...... 52 5.7 Heritability estimates ...... 52 5.8 Statistical analysis ...... 52 References ...... 53 Appendix ...... 59

vi Chapter 3: Investigation of Rhg1 and Rhg4 loci for selected TN09-029 x NCC05-1168 recombinant inbred lines with different responses to Soybean Cyst Nematode ...... 66 Abstract ...... 67 1. Introduction ...... 68 2. Results ...... 71 2.1 Greenhouse screening assays for resistance to Hg type 0 (race 3) and Hg type 1.2.5.7 (race 2)...... 71 2.2 SCN race 2 response confirmation in the greenhouse ...... 72 2.3 Rhg1 and Rhg4 amplification and sequencing ...... 72 2.4 Copy Number Variation (CNV) at Rhg 1 and 4...... 72 2.5 Rhg1 &4 transcript quantification by real time RTPCR ...... 73 2.6 Marker assisted selection (MAS) ...... 73 2.6.1 SSR markers ...... 73 2.6.2 SNP marker analysis...... 74 3. Discussion...... 74 3.1 The RIL resistance to SCN, race 2 is mediated independently of Rhg1 and 4 ...... 74 3.2 Favorable SCN resistance alleles could be exploited while maintaining high yield potential ...... 79 4. Conclusion ...... 79 5. Material and Methods ...... 80 5.1 Greenhouse screening assays for resistance to Hg type 0 (race 3) and Hg type 1.2.5.7 (race 2)...... 80 5.2 SCN response confirmation in the green house ...... 81 5.3 Rhg1 and Rhg4 PCR amplification and sequencing ...... 81 5.4 Copy Number Variation (CNV) at Rhg1 and 4 ...... 82 5.5 qPCR quantification of the expression levels of Rhg1 and 4 ...... 82 5.5.1 Nematode inoculation ...... 82 5.5.2 Tissue collection and total RNA extraction and qPCR assays ...... 83 5.6 Marker assisted selection (MAS) ...... 83 5.6.1 SSR markers ...... 83 5.6.2 SNP markers...... 84 References ...... 85 Appendix ...... 92 Chapter 4: Adaptability evaluation of US-developed Soybean Recombinant Inbred Lines in Rwandan conditions...... 115 Abstract ...... 116

vii 1. Introduction ...... 117 2. Results ...... 119 2.1 Rainfall distribution at Muyumbu Research Station (MRS) (2019B) and Rubona Research Station (RRS) (2020A) ...... 119 2.2 Seasonal and locational agronomic performance ...... 119 2.3 Across seasons and environments performance...... 120 3. Discussion...... 120 3.1 The yield potential could be doubled ...... 121 3.2 The genotype x environment (GXE) significantly affect the overall performance ... 123 4. Conclusion ...... 124 5. Material and Methods ...... 124 5.1 Plant material ...... 124 5.2 Field experiments ...... 125 5.3 Agronomic traits measurement ...... 125 References ...... 126 Appendix ...... 131 Chapter 5: Interactions of gene expression, alternative splicing, and DNA methylation in determining nodule identity ...... 140 Abstract ...... 142 1. Introduction ...... 143 2. Results ...... 147 2.1 Anatomical features characteristic of three distinct developmental stages of soybean nodules ...... 147 2.2 Half of soybean genes change expression during nodulation ...... 148 2.3 9,669 core genes determine nodule transcriptome identity ...... 149 2.4 Promoter:GUS activity of core and stage -specific genes confirms RNA-seq findings ...... 150 2.5 Each nodule developmental stage preferentially regulates various metabolic pathways ...... 151 2.6 Nodule developmental stages acquired specific signal transduction components that contribute to nodule transcriptome specificity ...... 152 2.7. Small secretory peptides contribute to nodule transcriptome specificity ...... 152 2.8 Nodule development involves highly elaborated transporter activities ...... 153 2.9 Active mRNA splicing occurs during the early stage of nodule development...... 153

viii 2.10 Splicing variants of certain genes were differentially regulated during nodule development ...... 155 2.11 Transcriptional activity, gene size, exon number, and intron length impact alternative splicing in the developing nodules ...... 156 2.12 Alternative splicing impacts protein domains organization ...... 157 2.13 Alternative splicing impacts the targeting efficiency of miRNAs ...... 158 2.14 Nodule-specific alternatively spliced genes ...... 159 2.15 Hallmarks of nodule methylomes ...... 160 2.16 Impact of DNA methylation on nodule transcriptomes ...... 162 3. Discussion...... 163 4. Material and methods ...... 172 4.1 Bacterial strain and growth conditions ...... 172 4.2 Plant growth conditions and tissue collection ...... 172 4.3 Light and electron microscopy ...... 173 4.4 Acetylene reduction assay ...... 173 4.5 RNA-seq library preparation and analysis...... 173 4.6 MethylC- Seq library preparation and analysis ...... 174 4.7 Quantitative real-time RT-PCR analysis ...... 175 4.8 MicroRNA target prediction ...... 176 4.9 Identification of protein domains ...... 176 4.10 GO term and KEGG analysis ...... 176 4.11 Promoter GUS experiment...... 176 References ...... 178 Appendix ...... 188 Chapter 6: Conclusions ...... 211 Vita ...... 216

ix LIST OF TABLES Table 2-1: Significant differences among genotypes (G), environments (E) and, genotypes and environments (GXE) interaction (P=0.05), for plant height, lodging, maturity date, yield, oil, protein and meal protein...... 59 Table 2-2: Coefficient of correlation between seven traits from the RIL population grown over five different environments across Tennessee...... 60 Table 2-3: RIL population development from inception of cross between TN09-029 and NCC05-1168 in 2012 to the summer 2016 field trial...... 61 Table 3-1: SCN-race 3 reaction of the RILs, Parental lines and Indicator Lines based on FI...... 92 Table 3-2: SCN-race2 reaction of the RILs, Parental lines and Indicator Lines based on FI...... 96 Table 3-3: Single Nucleotides Polymorphism in RHG1 coding sequence of selected RILs in comparison to the susceptible (William 82) and resistant (Forrest) alleles...... 100 Table 3-4: Single Nucleotides Polymorphism in RHG4 coding sequence of selected RILs in comparison to the susceptible (William 82) and resistant (Forrest) alleles...... 101 Table 3-5: Single Nucleotides Polymorphism in Rhg1 promoter sequence of selected RILs in comparison to the susceptible (William 82) and resistant (Forrest) alleles...... 102 Table 3-6: Single Nucleotides Polymorphism in Rhg4 promoter sequence of selected RILs in comparison to the susceptible (W82) and resistant (Forrest) alleles ...... 103 Table 3-7: Screening of selected RILs with SSR markers...... 104 Table 3-8: Primer sequences used in the study ...... 106 Table 4-1: Crop seasons in Rwanda...... 131 Table 4-2: Significant differences among genotypes (G), (P=0.05), for yield, plant height, maturity date, lodging at MRS during season B 2019...... 132 Table 4-3: Pearson correlation between traits at MRS during season B2019 (below the diagonal) and at RRS during season A2020 (above the diagonal)...... 133 Table 4-4: Significant differences among genotypes (G), environments (E) and, genotypes and environments (GXE) interaction (P=0.05), for yield, plant height, maturity and lodging at MRS and RRS during seasons B 2019 and A2020...... 134 Table 5-1: List of primers used in this study ...... 188

x LIST OF FIGURES Figure 2-1: Frequency distribution for seed yield, n=114...... 62 Figure 2-2. Frequency distribution for seed oil content, n=114...... 63 Figure 2-3. Frequency distribution for seed protein content, n=114...... 64 Figure 2-4. Frequency distribution for meal protein, n=114...... 65 Figure 3-1: Comparison of mean seed yield between SCN resistant and SCN susceptible Line...... 107 Figure 3-2: SCN cyst counts on selected representative lines (n=8)...... 108 Figure 3-3: Copy number variation of Rhg1 genomic intervals based on CGH data...... 109 Figure 3-4: Copy number variation of Rhg4 genomic intervals based on CGH data...... 110 Figure 3-5: Relative expression levels of GmSNAP18 from different RILs at 3, 5, 7 and 9 dpi...... 111 Figure 3-6: Relative expression levels of GmSHMT18 from different RILs at 3, 5,7 and 9 dpi...... 112 Figure 3-7: Relative expression levels of GmSNAP08 from SCN-72 and SCN-137 at 3dpi...... 113 Figure 3-8: Relative expression levels of GmSHMT18 from different SCN-072 and SCN- 137 at 3 dpi...... 114 Figure 4-1: Daily rainfall at RRS during season B 2019...... 135 Figure 4-2: Cumulative rainfall during season A 2020...... 136 Figure 4-3: Thirty-three (33) top yielders at MRS during season B 2019...... 137 Figure 4-4: Frequency distribution for seed yield. Checks are shown on the top of their yield class...... 138 Figure 4-5: Fifteen (15) top yielders at RRS during season B 2019...... 139 Figure 4-6: Ten (10) top yielders at RRS during season B 2020...... 139 Figure 5-1: Histological analysis of soybean nodules...... 190 Figure 5-2: Transcriptional landscape of developing soybean nodules...... 191 Figure 5-3: KEGG pathway enrichment analysis of the 9,669 nodule core genes...... 192 Figure 5-4: GO term enrichment analysis of the nodule core genes showing gradual increases or decreases in the expression...... 193 Figure 5-5: GO term enrichment analysis of the 9,669 nodule core genes...... 194

xi Figure 5-6: Histochemical localization of GUS activity driven by the promoters of core and stage -specific genes in the developing nodules...... 195 Figure 5-7: KEGG pathway analysis of the DEGs unique to the 12, 22, 36-d-old nodules.196 Figure 5-8: Fold-change plot of differentially expressed transcription factors specific to the 12-, 22-, and 36-d-old nodules...... 197 Figure 5-9: Heat map demonstrating the differential expression pattern of genes encoding small secretory peptides in the 12, 22, 36-d-old nodules...... 198 Figure 5-10: Fold-change plot of differentially expressed genes with transport-related functions in the developing nodules...... 199 Figure 5-11: Alternative RNA splicing landscape of developing soybean nodules...... 200 Figure 5-12: GO term enrichment analysis of 2,323 differentially spliced genes...... 201 Figure 5-13: KEGG pathway analysis of 2,323 differentially spliced genes...... 202 Figure 5-14: Impacts of gene expression, gene size, exon number, and intron length on alternative splicing in the developing nodules...... 203 Figure 5-15: Impact of alternative splicing on protein domains organization and miRNAs targeting efficiency...... 204 Figure 5-16: qPCR confirmation of differential usage of DNA binding domains-containing exons of three transcription factors...... 205 Figure 5-17: Heatmap highlighting differential gene expression of key genes of DNA methylation pathway...... 206 Figure 5-18: Methylome landscape of developing soybean nodules...... 207 Figure 5-19: Differential DNA methylation occurred preferentially in TEs adjacent to genes...... 208 Figure 5-20: Impacts of DNA methylation on gene expression levels...... 209 Figure 5-21: GO term enrichment analysis of the 1864 differentially expressed/differentially methylated genes...... 210

xii

Chapter 1: General introduction and literature review

1

1. Introduction

1.1 The Implications of genetics and epigenetics in Soybean adaptation, agronomic and quality traits

Soybean [Glycine max (L.) Merrill] is a prime leguminous crop, grown for its use as a staple food for human consumption and a dietary protein source for animal feed (Hartman et al., 2011) and feedstock for biofuels (McFarlane 2013; Kim et al., 2018). Soybean is also preferred for its role in the soil recycling process through the biological nitrogen fixation (BNF) (Ferguson, 2013). The crop is widely believed to have been domesticated from China 9000 years ago (Carter et al., 2004), but recent studies with the integration of new molecular biotechnologies revealed that soybean domestication is a complex process with potential multiple geographical centers of origin (Kim et al., 2011; Sedivy et al., 2017). It is alleged that soybean domestication in China was followed by the expansion of its cultivation worldwide from China to Korea, through Japan and South and Southeast Asia to finally North America in 1765 (Wilson, 2008; Hymowitz and Harlan, 1983). The first domestication efforts were later on followed by both artificial and natural selection, which created a considerable number of land races around the globe (Carter et al., 2004). These factors together with the autogamous nature of soybean and its strict geographical adaptation to specific latitudes and longitudes may have led to the genetical and morphological diversity among created landraces (Li and Nelson, 2001; Nelson, 2011; Liu et al., 2017). Soybean adaptation is believed to be mainly conditioned by photoperiod (Bandillo et al., 2015) and temperature (Liu et al., 2017). However, according to Cao et al. (2017), in addition to previous factors, soybean adaptation is determined by the coupled allelic actions of the ‘E’ genes that regulate flowering time and maturity (Cao et al., 2017) and the long juvenile genes ‘J’ that delay flowering (Lu et al., 2017). Recently, it was also revealed that epigenetic marks especially DNA methylation could have been an important factor in soybean domestication process. In fact, the methylome analyses across diverse soybean accessions reported that more than 75% of differentially methylated regions (DMRs) were explained by factors other than genetic variations (Shen et al., 2018).

2

Soybean is naturally a short-day crop, critically sensitive to day length and for which yield in a given latitudinal zone is strictly dependent on appropriate photoperiods (Destro et al., 2001; Watanabe et al., 2011). In fact, a 12-hours or less daylength is said to trigger soybean flowering, and thus making difficult its adaptation in laltitudes below 20oC and negatively affecting yields due to the resultant short vegetative status (Hartwig and Kiihl, 1979; Sinclair and Hinson, 1992). Therefore, for optimal growth and yield potential achievement, soybean should be grown in areas with specific daylength, which is mostly a function of geographical latitudes (Langewisch et al., 2017). The involvement of the “E” and “J” genes in the control of soybean flowering time and maturity, in addition to environmental cues such as photoperiod and temperature, was later confirmed. A total of ten (10) maturity E plus one (1) J (long juvenile) loci have been identified so far. Initially E1, E2 (Bernard, 1971) and E3 (Buzzell, 1971), then E4 (Buzzell and Voldeng, 1980) and E5 (Mcblain and Bernard, 1987) were reported. Later on, other genes including E6 (Bonato and Vello, 1999), E7 (Cober and Voldeng, 2001), E8 (Cober et al., 2010), E9 (Kong et al., 2014) and E10 (Samanfar et al., 2017) were identified. In the meantime, Ray et al. (1995) had discovered the J gene. Among these, only E6 and E9 exhibited a dominant action in conferring early flowering phenotype, whereas others acted in a recessive manner (Bonato and Vello, 1999; Kong et al., 2014). Of these 10 E loci, six of them and the J locus have been cloned and molecularly characterized, E1 (Xia et al., 2012), E2 (Watanabe et al., 2011), E3 (Watanabe et al., 2009), E4 (Liu et al., 2008), E9 (Sun et al., 2011; Zhao et al., 2016) and J (Yue et al., 2017; Lu et al., 2017). Miranda and Bilyeu (2015) reported a yet to ascertain epistatic interaction between the J gene and some of the E genes to determine the flowering time and maturity date. Nevertheless, Lu et al. (2017) reported a genetic dependency of J gene to the flower repressor E1. The genetic manipulation of E1-E4 genes is believed to have the biggest role in defining the American soybean maturity groups in temperate latitudes (Miranda and Bilyeu, 2015). These E1, E2, E3 and E4 genes have been functionally characterized and reported to encode for soybean-specific putative transcription factors (Xia et al., 2012), an ortholog of GIGANTEA, a protein that is involved in the circadian rhythm regulation in Arabidopsis (Fowler et al., 1999; Watanabe et al., 2011), phytochromes GmPhyA3 (Watanabe et al., 2009) and GmPhyA2 (Watanabe et al., 2011), respectively. In summary, Miranda et al. (2020) reported major and minor effect alleles that when combined can control flowering and maturity in soybean, as well as plant height in typical

3

tropical environments. According to them, E1, E2, and E3 allelic combinations, the long juvenile trait (J), and Dt1 gene are the main genetic factors that influence plant development and adaptation, which are key traits for soybean yield improvement in tropical environments. In the same line, Jia et al. (2014) suggested that current soybean breeding focus should target short post-flowering reproductive growth by genetically manipulating E1, E2, E3, and E4 in very low MGs of the high-latitudes in cold regions. The J locus was reported to encode for the ortholog of Arabidopsis thaliana EARLY FLOWERING 3 (ELF3) (Jia et al., 2014). The flowering time could be defined not only by genetic factors, but also by epigenetic variations through segregating DNA methylation. In fact, in Arabidopsis thaliana, some of the DMRs could serve as epigenetic Quantitative Trait Loci (QTLepi) explaining 60-90% of the variation for flowering time (Cortijo et al., 2014). Phenotypically, the allelic variations of the E genes contribute to the definition of the maturity group classification system (Langewisch et al., 2017). The maturity group (MG) classification system has been devised to characterize varieties’ ecological and adaptable areas. This classification allowed soybean breeders to group soybean into 13 MGs all over the world based on latitudinal adaptations (Liu et al., 2017). However, in addition to these main MGs, more sub-groups are being progressively identified within individual MGs (Gai et al., 2001). With this classification, earlier varieties are assigned lower MG and late maturing varieties, the higher MG (Langewisch et al., 2017; Liu et al., 2017). It is worth mentioning that the MG classification was initially designed to fit the American soybean production systems but not the sub-Saharan-Africa production systems (Miranda and Bilyeu, 2015). Another important component of soybean adaptation is plant height. Soybean plant height is reported to be under the joint control of maturity and architecture genes (Bernard, 1972). The soybean architecture consists of two main types of phenotypes; the determinate and the indeterminate, based on stem termination habit, both controlled by Dt genes (Liu et al., 2010). These genes were discovered in the 70’s by Bernard, (1972) and who named them Dt1 and Dt2. The author found that their allelic combinations and epistatic interactions were producing three types of phenotypes: the determinate, the indeterminate, and the semi-determinate. While the determinate type is characterized by main stem growth termination immediately or few days after flowering, the indeterminate type continue its main stem growth several days after flowering (Miranda et al., 2020). The semi-determinate is the middle phenotype resulting from

4

the monogenic segregation pattern of Dt1 and Dt2 loci (Bernard, 1972). The dominant Dt1 maintains the indeterminate growth habit, while its homozygous recessive (dt1dt1) genotype produces fully determinate phenotype. On the other hand, in Dt1/Dt1 background, Dt2/Dt2 produces semi-determinate plants, and dt2/dt2 have an indeterminate phenotype. A dt1/dt1 genotype exhibits a determinate stem growth termination habit no matter of alleles at Dt2 locus (Bernard, 1972; Tian et al., 2010). While Dt1 is incompletely dominant over dt1, Dt2 exhibits complete dominance over dt2 (Lockhart, 2014). Molecular and functional characterization of Dt1 gene (Liu et al., 2010; Tian et al., 2010) and Dt2 (Lockhart, 2014) was undertaken. The Dt1 (GmTfl1) is a homolog of the Arabidopsis TFL1 (terminal flower1), a gene coding for a signaling protein which plays a regulatory role in shoot apical meristems (SAM) (Tian et al., 2010). The Dt2 gene was reported to a be a gain of function MADS-Domain transcription factor gene, which might partially repress the Dt1 gene in SAM, but the mode of action remains elusive (Ping et al., 2014). However, it was reported that Dt2 directly binds to the Dt1 promoter and represses its transcription (Liu et al., 2016). Dt2 was recently reported to regulate a number of genes associated with time to flowering, density of stomata, water use efficiency, and plant response to stress (Zhang et al., 2019). A third recessive allele, dt1-t, is reported to produce a middle phenotype between the determinate and the semi-determinate (Thompson et al., 1997), but its functional role in this phenotype remians unknown. Soybean major flowering and maturity genes and QTL have been previously reported to have a pleiotropic effects on other agronomic traits such as plant height and node number as well as yield ( Cober and Morrison, 2010; Chapman et al., 2003; Zhang et al., 2015). Plant height and number of nodes are important traits of the soybean plant architecture, considering their significant correlation with seed yield (Chang et al., 2018). High correlations were also reported between plant height and number of nodes themselves (Chang et al., 2018). Though plant height was reported to be controlled by many quantitative trait loci (QTLs) in soybeans (Cao et al., 2017; Yao et al., 2015; Fang et al., 2020), some QTLs and major genes including Dt1, E1 and E2 were found to be tightly associated with plant height and number of nodes (Kato et al., 2015; Miranda et al., 2020; Zhang et al., 2015; Liu et al., 2011). These genes and many others controlling plant architecture traits such as plant height, number of internodes, and seed weight and size represent bona fide targets for genetic manipulation by soybean

5

breeders in order to reach a green revolution in soybean as has been done in other crops (Liu et al., 2020). Plant lodging is also a critical consideration in soybean production systems. Lodging could reduce seed yield by 12–18%, 18–32%, and 13–15% when occurring at R3, R5, and R6, respectively (Woods and Swearingin, 1977). Several QTLs involved in soybean resistance to lodging have been reported (Hwang and Lee, 2019; Yamaguchi et al., 2014; Lee et al., 1996). However, Hwang and Lee (2019), by combining QTLs and meta-QTL data across multiple populations, identified two most important QTLs on chromosome 6 and 19 associated with lodging. It was suggested that in order to achieve optimum yields, soybean breeders should focus on developing taller plants with sturdy stems and shorter internodes in order to resist lodging (Liu et al., 2020). Soybean yield remains the ultimate goal of most breeding programs. Soybean seed yield can be estimated by a number of parameters, including number of plants per acre, number of pods per plant, number of seed per pod, seed size (number of seeds per pound), and seed weight per unit area (Lee and Herbek, 2005; Liu et al., 2010). Seed weight is a complex and multigenic trait that was reported to significantly correlate with seed yield (Liu et al., 2017; Li et al., 2020). Additionally, seed size was found to be quantitative and tightly associated with seed weight (Johnson et al., 2001). Several QTLs associated with soybean seed yield were previously identified (Guzman et al., 2007; Kabelka et al., 2004), and a list of QTLs associated with different yield components is available in Soybase database (Grant et al., 2010). However, genes underlying these QTLs and their functional roles remain largely elusive. Nevertheless, some genes controlling seed number and seed size were cloned and functionally characterized (Jeong et al., 2013; 2011; Fang et al., 2013). The Ln gene is an orthologue of the Arabidopsis JAGGED (JAG) and codes for C2HC zinc finger-type transcription factor. It regulates seed number and is tightly linked to the genes controlling the soybean narrow leaflet trait (Fang et al., 2013). In a separate study, Lu et al. (2017), discovered a Type 2C protein phosphatases (PP2C) gene, which through the association with the Gm ZBR1 transcription factor might regulate seed weight and size. GmZBR1 is also an orthologue of Arabidopsis BZR1, a key transcription factor of the brassinosteroids pathway. Thus, identification and functional characterization of genes controlling individual or a group of yield component traits may provide a strong genetic base for improving soybean yield. For instance, Diers et al. (2018), through a large scale nested

6

association mapping of 5,600 inbred lines, identified a significant number of SNPs associated with soybean agronomic traits including yield, maturity, plant height, plant lodging, and seed mass. The potential impact of epigenetics on soybean yields was also highlighted. In fact, inherited DMRs were associated with soybean higher and stable yields (Raju et al., 2018). Altogether, the above findings provide us with an insight to the importance of narrowing down the genetic and epigenetic components underlying different soybean architectural and yield traits. Soybean seed quality traits consist of mainly protein and protein subunits content, oil content and fatty acid composition as well as isoflavone content (Ma et al., 2015). In fact, soybean seeds are reported to contain 20% of oil, 40% of protein, 35% of carbohydrates, and 5% of ash, making it the first and the second crop in protein and oil, respectively (Liu, 1999). Soybean oil is composed of 5 main fatty acids: 10% palmitic acid (16:0), 4% stearic acid (18:0), 24% oleic acid (18:1), 54% linoleic acid (18:2), and 10% linolenic acid (18:3) of total lipids (Wilson, 2004). The first two are saturated, while the last 3 are unsaturated fatty acids. The uses of soybean oil range from food (vegetable oil, margarine, and mayonnaise) to industrial (candles, plastic, and biodiesels fuels). The efforts to improve soybean oil quality and quantity started a long time ago aiming at manipulating its fatty acid composition through the decrease of palmitic and linolenic acids, and the increase of stearic and oleic acids (Rajcan, 2005; Clemente and Cahoon, 2009). This manipulation intended to achieve different industrial and health related objectives (Fehr, 2007). However, there have been reports of conflicting correlations between relative oil concentration and other traits, including seed yield and protein concentration, making it a relatively difficult to achieve breeding objectives (Morrison et al., 2008; Lee et al., 2007; Clemente and Cahoon, 2009). Nevertheless, soybean breeders were able to increase oil content, while maintaining acceptable levels of protein and seed yield (Eskandari et al., 2013). Soybean fatty acid improvement focused mainly on improving its oxidative stability in order to avoid trans-fats generated through hydrogenation and to enhance -3 fatty acid content of the oil (Clemente and Cahoon, 2009). The development of lines with altered fatty acid profile has been achieved and is still ongoing through conventional breeding and genetic engineering (Fehr, 2007; Rajcan, 2005; Drexler et al., 2003). A list of identified alleles for altered fatty acid profiles was reviewed (Rajcan, 2005; Fehr 2007). Later on, several studies were initiated to identify individual genes underlying fatty acid

7

profile alteration. Main genes involved in fatty acid profile alteration in different oilseed crops have been identified and some were exploited to modify fatty acid profiles in soybean elite lines (Drexler et al., 2003; Fehr, 2007). For instance, the latest breakthrough was realized by the development of soybean lines combining low linolenic and high oleic content. Initially, low linolenic levels were achieved by manipulation of the microsomal omega-3 fatty-acid desaturase (FAD3) genes (Bilyeu et al., 2003; Bilyeu et al., 2005; Anai et al., 2005; Byrum et al., 1997). The microsomal omega-6 fatty-acid desaturase enzyme alternatively named delta-12 fatty acid desaturase 2 enzyme (FAD2), is known to catalyze the conversion of oleic acid to linolenic acid in developing seeds (Schlueter et al., 2007). Therefore, the gene encoding this enzyme was manipulated to alter the oleic acid content in soybean through X-ray irradiation. Interestingly, the GmFAD2-1a encoding a disrupted microsomal omega-6 fatty-acid desaturase (FAD2) was found to be responsible for high levels of oleic acid (Anai et al., 2008). In separate studies, among FAD2 genes and the FAD2-1 desaturases, high expression levels of FAD2- 1A (Glyma.10g278000) and FAD2-1B (Glyma.20g111000), were recorded in developing seeds (Tang et al., 2005). In fact, one of the PIs with the FAD2- 1A allele could produce up to 85% oleic content (Pham et al., 2011). In addition, high oleic levels were achieved by combined allelic actions of FAD2-1A and FAD2-1B (Pham et al., 2010; Darr et al., 2020). The challenge is the ability to develop a soybean line combining high oleic and low linolenic line for health and oxidative stability purposes. To that end, the introgression of FAD3 mutant genes into high oleic acid background produced the lowest and highest levels of linolenic and oleic acid, respectively (Pham et al., 2012). Soybean protein improvement focused on increasing the meal protein in quantity and quality. Protein content in most of current commercial soybean lines ranges between 38 and 42% of dry weight (Patil et al., 2017), yet the minimum market requirement is set at, at least, 41.5% in order to produce a minimum protein meal of 47.5% (Patil et al., 2017; Smallwood et al., 2018). However, considering the historical negative correlation between protein content and yield (Patil et al., 2017; Bandillo et al., 2015), soybean breeders seek to develop high yielding cultivars with high protein content to meet the market standards and maintain acceptable levels of oil content. Indeed, soybean breeders could exploit the genetic potential of some lines and develop elite lines called “High Protein” lines with above 47.5% protein meal, the market minimum requirement

8

(Pantalone et al., 2020; Pantalone and Smallwood, 2018; Panthee and Pantalone, 2006; Smallwood et al., 2018). Seed protein content and composition are complex quantitative traits that are highly influenced by the environment and the genotype-environment interaction (Chaudhary et al., 2015). Several major QTLs (Patil et al., 2017) and meta QTLs (Zhao-Ming et al., 2011) associated with protein content have been reported on different chromosomes mainly on chromosome 15 and 20 (Patil et al., 2017). Similarly, in a separate study using a combination of QTL markers, Diversity arrays technology and next generation sequencing analysis, two major QTLs and important SNPs for protein content were reported (Samanfar et al., 2019). In another study, 13 candidate genes from QTL hotspots that may control seed protein content were reported (Huang et al., 2019). These genes are involved in protein synthesis and were highly expressed in mature seed, suggesting a role in protein accumulation (Huang et al., 2019). Nine of the 20 canonical protein amino acid (AA) namely phenylalanine, valine, threonine, tryptophan, methionine, leucine, isoleucine, lysine and histidine cannot be de novo synthetized by the human body and livestock (Young, 1994). These must be supplied via daily diets and are called essential amino acids (EAAs) (Galili et al., 2005; Beauregard and Hefford, 2006). Soybean is one of the major sources of protein for human and animal feed. However, the quality of soybean protein needs to be supplemented with some EAAs. The main lacking or less abundant AAs are the sulfur containing AA such as methionine and cysteine (Dinkins et al., 2001). Other limiting EAAs are those synthesized through the aspartate pathway such as threonine, lysine, and isoleucine (Azevedo et al., 1997; Qi et al., 2011). Soybean protein quality can be improved through a direct manipulation of individual AAs profile or increasing the ratio of glycinin (11S) to β-conglycinin (7S) components (Panthee et al., 2004; Rajcan, 2005). In fact, these two are the major subunits of soybean storage protein (70%) (Thanh and Shibasaki, 1978). The glycinin can contain up to 3 and 4.5% of methionine and cysteine, whereas β-conglycinin contain only 1% (Harada et al., 1989). Major QTLs for soybean storage protein were recently mapped on chromosome 3 (11S A1), chromosome 10 (7S α′ and 11S A4), and chromosome 13 (11S A3) and could help in molecular marker assisted selection (MAS) for protein quality (Boehm et al., 2018). Individual AA manipulation has been also investigated to improve soybean AA profile mainly through genetic engineering. The most successful cases targeted enzymes involved in the biosynthesis pathway.

9

For instance, by overexpressing the cytosolic isoform of O-acetylserine sulfhydrylase (OASS), a key sulfur assimilatory enzyme, cysteine content of soybean seed could be increased by 58 to 74% in protein-bound cysteine and 22 to 32% increase in the free cysteine (Kim et al., 2012). In another study, the identification, by site directed mutagenesis, of two regulatory residues Glu- 257 and Thr-359 that are involved in lysine inhibition permitted to generate feedback resistant alleles. Seed specific expression of these resistant alleles in soybean seed resulted in up to 100- fold increase in threonine content and increase of other free AA, which boosted total free AA content by 3.5-fold increase (Qi et al., 2011). There is one notable example, TN04-5321, in the U.S. Soybean Germplasm Collection of a line bred for enhanced cysteine (Panthee and Pantalone, 2006) Altogether, it is clear that soybean agronomic and seed quality traits can be manipulated to improve the overall yield by conventional breeding methods as well as genetic engineering approaches. The role played by epigenetics on soybean agronomic performance is also becoming more evident even though the large part of it remains unknown. However, soybean can also be subject to several biotic and abiotic stress factors that can significantly affect the overall performance and reduce its yield components, including seed yield. Such stress factors include the soybean cyst nematode.

1.2 The genetic basis of resistance and the importance of epigenetics in SCN control

The Soybean Cyst Nematode (SCN, Heterodera glycines) reduces soybean production in many regions of the world and was reported to be the first pathogen to cause major soybean economic losses in the United States (Niblack et al., 2006). The pathogen has been reported in all major growing areas including North America (2 countries), South America (16 countries), Asia (9 countries), Europe (3 countries) and Africa (1 country) (https://www.cabi.org/isc/datasheet/27027#todistribution consulted on August 4, 2020). SCN belongs to a group of sedentary, obligate plant-parasitic nematodes, with 4 juvenile stages and an adult stage life cycle. The first stage juvenile (J1) forms inside the egg, molts into second stage juvenile (J2) before hatching from the egg. J2 develops into third (J3) and fourth (J4) stage juveniles, which molts to an adult nematode (Bohlmann, 2015). J3 is marked by sex dimorphism and morphology differentiation between the females and the males. The adult vermiform male moves out of the plant tissues to look for females to fertilize, while the females remain attached

10

to the plant tissue producing eggs (Niblack et al., 2006). The duration of the life cycle is highly depending on the environmental cues such as soil temperatures (Bohlmann, 2015). Under favorable conditions, J2 hatch from the egg and penetrate the soybean root tissue. They are said to receive signals that attract them to plant roots. Upon reaching the soybean root tips they migrate intracellularly towards plant vascular tissues where after several cell wall probing with stylets, and cell wall degrading enzymes secretions, they select a final target cell and establish the initial feeding site (Hewezi and Baum, 2013; Hewezi, 2015). Soybean Cyst Nematode infection process has been divided in two parasitic phases. The first phase occurs in both resistant and susceptible hosts and consists of feeding cell initiation, cell wall degradation accompanied with the enlargement of plasmodesmata and nuclei. The second phase depends on the outcome of the first and involves transformation of the initial feeding site into a functional and permanent feeding cell called “syncytium’’ in the case of the compatible interaction (susceptible). In the case of incompatible interaction, the initiated feeding sites are degenerated, and nematode development is arrested (reviewed in Whitham et al., 2016). There have been extensive reports on the role of nematode secretions called “ effector ” in the outcome of the nematode infection process, from nematode penetration, intracellular migration, feeding cell selection, syncytium formation, development and maintenance (Hewezi and Baum, 2013; Hewezi 2015). In fact, the syncytium remains the sole nutrient source for sedentary stages of the nematodes and critical to the survival of the nematode (Gardner et al., 2015). The effector proteins are secreted through the stylet by the esophageal glands consisting of two sub ventral and one dorsal gland (Hewezi and Baum, 2013). The role of effector proteins include the modulation of the host plant development pathways, the alteration of phytohormonal in favor of the nematode via the feeding cell, as well as subversion of host plant stress and defense response mechanisms (Hewezi et al., 2008; Hewezi et al., 2010; Hewezi et al., 2014; Hewezi, 2015; Hewezi et al., 2015). In other words, it is the outcome of the interaction between nematode effector proteins and host plant interacting proteins that determines the fate of the infective nematode and the infection process in general (Hewezi and Baum, 2013). In addition, the role and the list of putative effector proteins in epigenetic modifications during SCN infection was reviewed by Hewezi (2020). The most remarkable symptom of the compatible interaction is the presence of female nematodes and mature cysts attached to soybean roots. Young females, small in size, white in color, can be

11

observed, with only part of their body on the surface. Older females are large, yellow or brown in color and appear completely on the surface of the root. Dead brown cysts are also present on the roots. Though the above ground symptoms occur in a complex with other biotic and abiotic effects, at the underground level, SCN infection caused an increase in isoflavonoids concentrations and number of nodules but with decreased shoot weight and net nitrogenases activity on SCN susceptible roots inoculated with Bradhorhizobium japonicum (USDA 110) (Kennedy and Niblack, 1999). The decreased nitrogen fixation efficiency is believed to be the result of an alteration in leghemoglobin metabolism and oxygen diffusion (Huang and Barker, 1983). Ko et al. (1984) reported a localized suppression of nodulation after SCN infection and suggested that this was the result of SCN infection and poor plant growth. Similarly, it was reported that soybean cultivars with higher SCN female indices exhibited lower nitrogen fixation rates (McGinnity et al., 1980). This could be associated with SCN impediment of soybean growth and development by interfering with root function through the formation of feeding sites (syncytia) (Wang et al., 2003). In fact, the bacterial community analysis in the soybean infested rhizosphere, revealed a reduced number of the rhizobiales in SCN infested soybeans than in the control (Zhu et al., 2013) probably due to change in root exudates composition (Bais et al., 2006). Indeed, a negative correlation between SCN and soybean isoflavone concentration was reported (Carter et al., 2018) yet, the nodule formation process is triggered by the perception of soybean root flavonoids and expression of nod genes from Rhizobia (Bradyrhizobium) (Kosslak et al., 1987; Ramongolalaina et al., 2018). Finally, a negative impact of SCN on soybean seed fatty acid profile when grown in SCN-heavily infested area had been reported (Beuselinck, 2008). Indeed, relative positive and negative correlation were reported between SCN resistance and soybean oil and protein content, respectively (St-Amour et al., 2020). QTL mapping has helped to understand the genetics of soybean complex traits including cyst nematode resistance. To date, more than 200 SCN resistance QTLs have been mapped to 19 chromosomes and are published on soybase database (Grant et al., 2010). Few of these QTLs have been listed as confirmed SCN QTLs by the soybean genetics committee, namely cqSCN1 (rhg1) (Concibido et al., 2004) on chr 18, cqSCN 2 (rhg4) on chr 08 (Meksem et al., 2001), cqSCN-003 on chr 16 (Glover et al., 2004), cqSCN-005 on chr 17, cqSCN-006 on chr15, cqSCN-007 on chr 18 (Kim and Diers, 2013), qSCN-10 on chr10 (Kadam et al., 2015) and the closest newly reported qSCN-PL10 on chr10 (Guo et al., 2020). Further, with the use of

12

SoySNP50K iSelect BeadChip, the GWAS reported 14 loci with 60 Single nucleotide polymorphisms (SNPs) associated with SCN resistance to SCN race 3, from diverse set of 553 soybean plant introductions (PIs) ranging from MGIII to MGV (Vuong et al., 2015). Similarly, from 461 soybean accessions of different origins, GWAS plus KASP SNP markers, identified 12 SNPs at 4 genomic regions on chromosomes 7,8,10 and 18 including SNPs at Rhg1 and Rhg4 (Tran et al., 2019). Though several QTLs have been discovered, Rhg1 and Rhg4 remain the most utilized loci (Concibido et al., 2004) and the only genes that have been cloned and molecularly characterized so far (Liu et al., 2012; Cook et al., 2014; Cook et al., 2012). However, recently other independent sources of resistance to SCN have been reported (Tran et al., 2019). The Rhg1 locus consist of a three functional genes tandem haplotype (Cook et al., 2012), encoding an amino acid transporter (AAT) (Glyma.18G022400), an ɑ-Soluble N-ethylamide-sensitive factor Attachment Protein (a-soluble N-ethylmaleimide–sensitive factor attachment protein) (Glyma.18g022500), and WI12 (wound-inducible domain) proteins (Glyma.18g022700) that additively contribute to the resistance against SCN. In fact, the susceptible lines were reported to have these genes in one copy while in the resistant lines, these genes are found in multiple copies ranging between 3 and 10 copies (Cook et al., 2012; Liu et al., 2017). Based on histological responses, copy number variation and type of allele on the haplotype, the Rhg1 locus was classified into low copy rhg1 with 1-3 copies (Peking-type or Rhg-a), high copy Rhg1 with 7-10 copies (PI 88788-type or Rhg-b) and single copy Rhg1 with only 1 copy (Rhg1- s) (Brucker et al., 2005; Mitchum, 2016; Kim et al., 2010; Cook et al., 2014) . It was revealed that the low copy Rhg1 (Rhg1-a) needs to be complemented by a Peking-type (resistant) Rhg4 (Cook et al., 2014) in order to provide full resistance to SCN race 3, whereas high copy Rhg1 (Rhg-b) allele functions alone to confer full SCN resistance to race 3 (Liu et al., 2017; Kandoth et al., 2017; Cook et al., 2014). Copy number variation and sequence polymorphism were associated with Rhg1-b mediated SCN resistance (Patil et al., 2017; Liu et al., 2017). In fact, five amino acid change in sequence were reported between Rhg1-a and Rhg1-b (Liu et al., 2017) and contributed to define the difference in SCN resistance between the two types. The Rhg4 gene, which encodes encodes Serine-hydroxylmethyl-transferase (GmSHMT08), was cloned and functionally characterized (Liu et al., 2012). Two missense mutations (P130R and N358Y) associated with resistance, were identified between the resistant and the susceptible alleles (Liu et al., 2012; Wu et al., 2016; Lakhssassi et al., 2019). The epistatic interaction

13

between Rhg-a and Rhg4 is responsible for the Peking-type of resistance to SCN (Patil et al., 2019). Interestingly, an incompatible interaction between Rhg1-b (GmSNAP18) and GmSHMT08 of Peking (Rhg4-a) was reported (Lakhssassi et al., 2020). In addition, the same study revealed a complex protein-protein interaction involving GmSHMT08, GmSNAP18, and GmPR08 IV (pathogenesis related protein VI). However, recently, resistant soybean line ZDD 11047 with Rgh1-a allele was reported to lack a resistant Rhg4 and exhibited full resistance to a SCN suggesting a potential role played by other loci in the resistance to SCN (Guo et al., 2020). In fact, there have been reports of other major and minor QTLs that are involved in SCN resistance, but their genetic and molecular functions remain elusive. For instance, a pool of 58 SCN soybean resistant lines were recently reported to carry none of the two most common resistance alleles, the Peking-type and the PI 88788-type, suggesting that novel SCN resistance alleles may be involved (Tran et al., 2019). Therefore, there is need to discover and elucidate functional mechanism for other QTLs and genes involved in SCN resistance in order to diversify the resistance sources. Indeed, the over-exploitation of only Rhg1 and Rhg4 in resistance to SCN may create shifts in the pathogen population and push the later to become more virulent and reproduce on otherwise resistant cultivars. Not only the genetic makeup of the soybean plant has been explored to understand soybean-SCN interactions, but also epigenetic marks contributing to the control of gene expression, were reported to be involved in controlling plant response to SCN infection (Hewezi, 2020). In fact, cyst nematodes as well as various pathogens can significantly alter plant methylome, which was found to impact gene expression level, transposon mobility, and stability (Hewezi et al., 2017; Hewezi et al., 2018). There is a wide range of epigenetic modifications regulating gene expression including DNA methylation, histone modification, and small and long non-coding RNA. However, DNA methylation can be used in soybean breeding programs because it has proven to be stably inherited (Hewezi, 2020). SCN infections are associated with integral soybean methylome changes distinguishing compatible to the incompatible reactions (Rambani et al., 2015; Rambani et al., 2020a). Those changes were reported to affect protein-coding genes, non-coding genes (miRNAs) and transposable elements, (Rambani et al., 2015; Rambani et al., 2020a; Rambani et al., 2020b). In response to SCN infection, the resistant and susceptible lines exhibited an opposite methylation pattern. Importantly, a considerable number of differentially methylated genes overlapped with syncytium differentially expressed genes (Rambani et al.,

14

2015; Rambani et al., 2020a). Fuctional characterzation of a set of the differentially methylated genes pointed into key role of DNA methylation in controlling soybean response to SCN infection. As seen above, the below-ground SCN effects can include, stunted roots, reduced number of nitrogen-fixing nodules and root susceptibility to more soil-borne plant pathogens (Tylka, 1995). Therefore, SCN could have a negative impact on the AA profile, building blocks of soybean protein through the disruption of normal biological nitrogen fixation metabolism resulting from the decreased number of nodule and fixed nitrogen.

1.3 Soybean nodulation: a biological process for seed yield and quality improvement

Biological nitrogen fixation (BNF), alternatively nodulation, is a biological process resulting from the symbiotic relationship between legume plants and the rhizosphere-inhabiting nitrogen fixing bacteria globally rereferred to as Rhizobia (Hameren et al., 2013). In the case of soybean, the most frequent interacting partner is the bacteria called Bradyrhizobium japonicum. This important process play a triple role of: (i) helping farmers to sustainably increase crop yields, (ii) enhancing soil fertility and (iii) reducing farming cost as well as negative environmental impacts associated with nitrogen fertilizer uses (Hirel et al., 2007). During such interaction, atmospheric nitrogen is converted into a more assimilable form of nitrogen (ammonium), required by the plant for growth, and in return the plant provides photosynthetically fixed carbon in form of sugars, to be used by the rhizobia as a source of energy, and reductants (Prell and Poole, 2006). The key stage of this interaction is the formation of a novel organ commonly called “root nodule” infected with the bacteria from where the atmospheric nitrogen conversion occurs. The nitrogen conversion is catalyzed by enzyme complex called nitrogenases, in an anaerobic environment (inside the nodule) although the bacteria are obligately aerobic. This aerobic- anaerobic buffer zone is maintained by a root nodule synthesized proteins called leghemoglobins, which fix the free oxygen through aerobic respiration and facilitate the optimal conditions for the nitrogenases activity (Santana et al., 1998). Nodule formation is a unique, complex and highly host-specific cellular process that involves cascades of signals between the plants and the interacting bacteria. This communication initiates profound morphological and metabolic changes in both symbionts (Kosslak et al., 1987). Based on structural morphology and histology, two types of nodules can be formed in legumes, the

15

determinate and the indeterminate. On one hand, in the determinate type of nodules like in soybean and Lotus japonicum, the initiation of cell divisions starts in the outer cortex of the root but cease immediately, leading to the formation of spherical nodules exhibiting a transient meristem. On the other hand, in the indeterminate type of nodules like in Medicago truncatula, the process starts by the formation of a persistent apical meristem arising from the divisions of the cells in the inner cortex, which leads to the initiation of “nodule primordia” with a continuous growth. A characteristic feature of the indeterminate nodules is the appearance of differentiation gradient of plants and bacterial cells from distal (apical) meristem to the proximal region close to the taproot (Ferguson et al., 2010; Laporte et al., 2012). In soybean, BNF process is initiated by soybean roots that release flavonoid volatiles into the soil, luring the compatible rhizobia in the vicinity of the plant roots. The rhizobia is then attached to the roots and synthesize highly specific signal molecules commonly known as “Nod factors” (NFs) (Dénarié et al., 1996; Liu et al., 2017). The following step is the activation of bacterial NodD transcription factors, which in turn induce the transcription of the genes controlling the synthesis NFs (Geurts et al., 2005), the kick-start of the soybean-rhizobia symbiosis. The bacteria infect the rots through root hair or cracks in the epidermal tissue. However, the emerging root hair remains the privileged entry of infecting bacteria (Oldroyd and Downie, 2008). The root hair cells ultimately curl around rhizobia, entrapping the growing attached bacteria to form the structure called “pre-infection threads”. The NFs release, and bacteria root hair entry trigger a two-ways reaction from the plant host: (i). the cortical responses that lead to nodulation genes expression characterized by the calcium spiking–dependent signaling pathway and (ii) the epidermal responses that lead to root hair deformation through signaling pathways other than the calcium-spiking (Oldroyd and Downie, 2008). The epidermal program consists of steps associated with the bacterial entry and infection such as root hair curling, infection thread (IT) formation and advancement through the cortex, whereas the cortical program consists of steps leading to the nodule organogenesis (Guinel, 2015). For a successful symbiosis the two programs must be tightly coordinated in such way that a nodule primordium initiation only takes place in close proximity with the site of bacterial infection (Oldroyd and Downie, 2008). The formation and development of nodules are controlled by a coordinated network of genes responsible for various physiological and morphological changes.

16

The perception of released NFs is mediated by Lysin motif (LysM) containing receptors on the soybean roots (Hameren et al., 2013). Two LysM receptor kinases (GmNFRI1ɑ/β and GmNFR5 ɑ/β) were identified (Indrasumunar and Gresshoff, 2010; Madsen et al., 2003; Radutoiu et al., 2007). However, GmNFRI1ɑ and GmNFR5 ɑ/β were reported to be the primary NF receptors, while GmNFR1b was functionally redundant (Indrasumunar et al., 2011). The GmNFR1ɑ and GmNFR1β were located on soybean chromosomes 2 and 14, respectively (Indrasumunar et al., 2011), whereas GmNFR5α and GmNFR β were located on chromosomes 11 and 1, respectively (Indrasumunar and Gresshoff, 2010). The extracellular structure of the LysM domain is the major determinant of rhizobia-host plant specificity, and hence successful symbiosis (Radutoiu et al., 2007), whereas the internal structure is conserved across all legumes (Nakagawa et al., 2011). A parallel signaling pathway using the lectin nucleotide phosphorylase (LNP) at the growing zone of the root hair, was reported as the main NF receptor (Etzeler et al., 1999) but its functional mechanism remain unknown. Downstream the LysM receptor kinase, the two receptor-like kinases GmNORK (nodulation receptor kinase) ɑ/β alternatively GmSymRK ɑ/β (symbiosis receptor kinase) on chromosome 1 and 9, respectively, are activated (Wang and Deng, 2016; Indrasumunar et al., 2015; Stracke et al., 2002). Both GmNORK ɑ/β are said to act in both mycorrhizal and rhizobia infection. However, RNAi of GmNORKβ resulted in a stronger effect than RNAi of GmNORKɑ, suggesting its possible functional specialization after genome duplication ( Wang and Deng, 2016; Indrasumunar et al., 2015). It was suggested that activation of GmNORK induces a cascade of downstream reactions controlling rhizobia infection and nodule organogenesis (Indrasumunar et al., 2011; Madsen et al., 2010) or act in a protein-protein complex with NF receptors for downstream transduction of the NF message (Riely et al., 2004; Oldroyd and Downie, 2008). The fluctuation in ionic balance across the plasma membrane, accompanied with root hair deformation, depolarization of the membrane, calcium-spiking, formation of pre-infection threads, cortical cell divisions, inhibition of reactive oxygen, hormonal modulation are the earliest signs from the plant (Cooper, 2007). The immediately acting genes downstream of GmNORK are not yet functionally validated in soybean. However, Medicago truncatula MtDMI1 (Does not make infections1) and Lotus Japonicum, CASTOR and POLLUX, encoding ion channels required for perinuclear calcium were reported to act downstream the GmNORK orthologues (Zhu et al., 2006; Charpentier et al., 2008; Charpentier,

17

2008). In fact, these ubiquitous genes are highly conserved in various legumes and non-legume systems (Chen et al., 2009). The calcium-spiking gradient signal around the cell nucleus is received by a putative GmDMI3 encoding a calcium/calmodulin-dependent protein kinases (CCaMK) downstream (Choudhury and Pandey, 2015). The activation of CcaMK is reported to induce cytokinin increase in cortical cells and these hormonal changes are perceived by the cytokinin receptors, LjLHK1/MtCRE1, on the cortical cell membrane (Tirichine et al., 2007). In fact, the cytokinine is the first hormone to integrate the nodule organogenesis signaling, playing a central role in nodule formation right downstream NFs perception (Frugier et al., 2008; Tirichine et al., 2007). For example, the application of low concentrations of exogenous cytokinin increased nodule numbers (Mens et al., 2018). In a separate study, the soybean Isopentenyl Transferase (GmIPT5) orthologous of L. Japonicus IPT3(ljIPT3), required to build up cytokinin needed for nodule development (Reid et al., 2017) was upregulated in the shoot independently of GmNARK (Nodule Autoregulation Receptor Kinase) after rhizobia inoculation, suggesting its key role in nodule development (Mens et al., 2018).

Downstream, the nodule inception (NIN) encoding a transmembrane protein with a predicted DNA-binding domain suggesting its action as a transcription factor was identified (Soyano et al., 2013). Simultaneously, the transcription factors of the GRAS family namely, GmNSP1 and GmNSP2 are induced by Gm DMI3, and bind to the promoters of early nodulation (ENOD) genes that are responsible for root hair curling and nodule formation (Gleason et al., 2006; Hirsch et al., 2009). Canonically, phytohormones regulate cellular and developmental growth through perceived signals by the plants. NF perception induce novel signaling pathways, and associated hormonal biosynthesis continue to play key roles throughout nodule organogenesis modulating the integration of both developmental and environmental signals into nodule development (Ryu et al., 2012). Not only individual hormones play a critical role in modulating plant development but also hormonal activity is a function of their balance during biosynthesis, inactivation pathways, signaling and translocation mechanisms (Ryu et al., 2012). For instance, cytokinin/auxin balance plays a critical role in the development and differentiation of meristematic cells and secondary organs. The overexpression of soybean microRNA160, targeting the repressors of auxin response

18

factor transcription factors significantly reduced nodule primordium, implying that auxin hypersensitivity has inhibition effect on nodule development. On the other hand, the overexpression of miRNA393, targeting auxin receptor affected, nodule numbers suggesting that minimal threshold of auxin signaling is needed for optimum nodule development (Turner et al., 2013). Beyond, the historically known regulators of cellular and metabolic process, recent studies unraveled the role played by newly discovered epigenetic modifications in the regulation of plant developmental processes. A critical role of epigenetics and other chromatin-based regulation of gene expression in developing nodules has been reviewed by Mergaert et al. (2020). DNA methylation and demethylation patterns in M. truncatula, an indeterminate nodule producing legume revealed a DNA methylation reprogramming during nodule development, especially during the differentiation stage (Satgé et al., 2016). However, DNA methylation patterns during various stages of nodule formation and development in soybean have not yet been reported. Therefore, profiling DNA methylation patterns in developing soybean nodules and investigating the potential impacts on gene transcription and alternative splicing will improve our understanding of molecular mechanisms controlling nodule formation, development and senescence in soybean. The end goal of soybean nodulation is the conversion of atmospheric nitrogen (N2) into a more assimilable form of Nitrogen to the plants, ammonia (NH4+). However, the latter is also a highly toxic form of nitrogen that is immediately converted into AA in a process called nitrogen assimilation. Two main nitrogen assimilation pathways were reported, the glutamine synthase (GS) and glutamate synthase (GOGAT) pathways that produce the glutamine (Gln) and glutamate (Glu) respectively from the ammonia and serving as template for the synthesis of other AA needed by the plant mainly via transamination process (Ohyama et al., 2010). The AA are then translocated via the phloem to respective sink tissues especially soybean pod and seed to serve as the building blocks to synthesize the proteins (Ohyama et al., 2010). Thus, through the optimization of the nodulation process, soybean protein content can be indirectly improved.

1.4 Soybean production in Rwanda: Challenges and opportunities

Being a short-day crop, soybean adaptation in latitudes below 20oC is complicated especially the sub-Saharan-Africa region where daylengths range from 12 to 13.5 hr, yet there is a huge

19

potential of soybean production (Miranda and Bilyeu, 2015). A recent study reported the main soybean producer countries in Africa include South Africa (31%), Nigeria (25%), Zambia (11%), Uganda (6%), Malawi (6%), Zimbabwe (3%), Mozambique (3%) and the rest (16%) (Shurtleff and Aoyagi 2019). In all these countries, the soybean production potential is still very low in comparison to what was achieved in most of temperate countries. In fact, poor soybean adaptation is listed as the main limiting factor (Miranda and Bilyeu, 2015; Miranda et al., 2020). In addition to this, the low soybean yields in sub-Saharan Africa, is attributed to the lack of high yielding varieties, inadequate use of fertilizers and rhizobia inoculants (Khojely et al., 2018). In Rwanda, soybean was introduced in the 1920’s but has gained relatively low farmers’ interest, due to the competition with common beans and other crops. So far, few cultivars grown in the country are plant introductions from international and regional research institutions or neighboring countries. These few available varieties are also low yielding and less attractive to farmers. Yet, the soybean products demand is growing with the creation of new soybean processing facilities both for oil and animal feed. In addition to this narrow genetic diversity and variety collection, there is poor soil fertility, climatic variability, pests and diseases, and limited skills of best agronomic practices (Mugabo et al., 2014). The narrow genetic diversity would be improved by introducing more competitive varieties in terms of yield but also seed quality traits. In addition, there is a need to improve soybean crop husbandry through efficient application of fertilizers and rhizobia inoculants in marginal soils. Alternatively, for sustainable production a strong breeding program should be put in place in order to frequently release new material. As seen above, through the introduction of exotic soybean lines and the genetic manipulation of the flowering, maturity and long juvenile genes, a soybean high yield potential could be achieved. Overall, soybean agronomic and seed quality traits are major breeding objectives pursued by soybean breeders. The improvement of those traits can be achieved through conventional techniques or biotechnology. Understanding genetic and epigenetic basis of each trait is a paramount requirement since most of them are a function of the genotype (G) and/or Epigenotype (Gepi), the environment (E), the GXE and/or GepiXE and management options (M). We explored, the ways of improving soybean traits such as yield, seed quality traits especially protein and oil, pest and disease resistance, and nodulation.

20

1.5 Organization of the Dissertation

This dissertation is framed into five chapters based on the objective of the research project. i) The General introduction reviews past and current findings about soybean improvement in terms of agronomic traits, seed quality traits, SCN resistance, soybean nodulation and the status of soybean cultivation in Rwanda. ii) Chapter 2 highlights the findings from the multilocational evaluation of the performance of a population of RILs developed at UT, Soybean Breeding and Genetics Program. iii) Chapter 3, the role of Rhg1 and 4 loci in conferring SCN resistance to segregating soybean RILs was investigated. iv) Chapter 4 reports main findings from the adaptability study of US-developed soybean RILs in tropical environments of Rwanda. v) Chapter 5 Gene expression, gene splicing, DNA methylation and their interaction in defining soybean nodulation efficiency was explored. vi) The General conclusions highlight key results demonstrating the genetic and epigenetic control of soybean agronomic and seed quality traits as well as future perspectives.

21

References

Anai T, Yamada T, Hideshima R, Kinoshita T, Rahman SM, Takagi Y (2008) Two high- oleic-acid Soybean mutants, M23 and KK21, have disrupted microsomal omega-6 fatty acid desaturase, encoded by GmFAD2-1a. Breeding Science 58: 447–52. Anai T, Yamada T, Kinoshita T, Rahman SM, Takagi Y (2005) Identification of corresponding genes for three low-α-linolenic acid mutants and elucidation of their contribution to fatty acid biosynthesis in soybean seed. Plant Science 168: 1615–23. Azevedo RA, Arruda P, Turner WL, Lea PJ (1997) The biosynthesis and metabolism of the aspartate derived amino acids in higher plants. Phytochemistry 46 : 395–419. Bais HP, Weir TL, Perry LG, Gilroy S, Vivanco JM (2006) The role of root exudates in rhizosphere interactions with plants and other . Annual Review of Plant Biology 57: 233–66. Bandillo N, Jarquin D, Song Q, Nelson R, Cregan P, Specht J, Lorenz A (2015) A population structure and genome-wide association analysis on the USDA soybean germplasm collection. The Plant Genome 8: 1-13. Beauregard M, Hefford MA (2006) Enhancement of essential amino acid contents in crops by genetic engineering and protein design. Plant Biotechnology Journal 4: 561–74. Bernard RL (1971) Two major genes for time of flowering and maturity in soybeans 1. Crop Science 11: 242–44. https://doi.org/10.2135/cropsci1971.0011183x001100020022x. Bernard RL (1971) Two genes affecting stem termination in soybeans. Crop Science 12: 235– 39. Beuselinck PR (2008) Effect of soybean cyst nematode on fatty acid levels of soybean seed. Plant Health Progress 9: 5. Bilyeu K D, L Palavalli, D A Sleper, and P R Beuselinck (2003) Three microsomal omega-3 fatty-acid desaturase genes contribute to soybean linolenic acid levels. Crop Science 43: 1833–38. Bilyeu K, Palavalli L, Sleper D, Beuselinck P (2005) Mutations in soybean microsomal omega-3 fatty acid desaturase genes reduce linolenic acid concentration in soybean seeds. Crop Science 45: 1830–36. Boehm JD, Nguyen V, Tashiro RM, Anderson D, Shi C, Wu X, Woodrow L, Yu K, Cui Y,

22

Zenglu Li (2018) Genetic mapping and validation of the loci controlling 7S Α′ and 11S A- type storage protein subunits in soybean [Glycine Max (L.) Merr.]. Theoretical and Applied Genetics 131: 659–71. Bohlmann H (2015) Introductory chapter on the basic biology of cyst nematodes. Advances in Botanical Research 73:33–59. Bonato ER, Vello NA (1999) E6, a dominant gene conditioning early flowering and maturity in soybeans. Genetics and Molecular Biology 22: 229–32. Brucker E, Carlson S, Wright E, Niblack T, Diers B (2005) Rhg1 Alleles from soybean PI 437654 and PI 88788 respond differentially to isolates of Heterodera Glycines in the greenhouse. Theoretical and Applied Genetics 111: 44–49. Buzzell RI (1971) Inheritance of a soybean flowering repsonse to fluorescent-daylength conditions. Canadian Journal of Genetics and Cytology 13:703–707. Buzzell RI, and Voldeng HD (1980) Inheritance of insensitivity to long day-length. Soybean Genetics Newsletter 7: 26–29. Byrum JR, Kinney AJ, Stecca KL, Grace DJ, Diers BW (1997) Alteration of the omega-3 fatty acid desaturase gene is associated with reduced linolenic acid in the A5 soybean genotype. Theoretical and Applied Genetics 94: 356–59. Cao D, Takeshima R, Zhao C, Liu B, Jun A, Kong F (2017) Molecular Mechanisms of fowering under long days and stem growth habit in soybean. Journal of Experimental Botany 68: 1873–84. Cao Y, Li S, He X, Chang F, Kong J, Gai J, Zhao T (2017) Mapping QTLs for plant height and flowering time in a Chinese summer planting soybean RIL population. Euphytica 213: 39. Carter A, Rajcan I, Woodrow L, Navabi A, Eskandari M (2018) Genotype, environment, and genotype by environment interaction for seed isoflavone concentration in soybean grown in soybean cyst nematode infested and non-infested environments. Field Crops Research 216: 189–96. Carter Jr T E, Nelson RL, Sneller CH, Cui Z, (2004) Genetic diversity in soybean. Soybeans: Improvement, Production, and Uses, 16:303–416. Chang F, Guo C, Sun F, Zhang J, Wang Z, Kong J, He Q, Sharmin RA, Zhao T (2018). “Genome-wide association studies for dynamic plant height and number of nodes on the

23

main stem in summer sowing soybeans. Frontiers in Plant Science 9: 1–13. Chapman A, Pantalone VR, Ustun A, Allen FL, Landau-Ellis D, Trigiano RN, Gresshoff PM (2003) Quantitative trait loci for agronomic and seed quality traits in an F2 and F4:6 soybean population. Euphytica 129: 387–93. Charpentier M, Bredemeier R, Wanner G, Takeda N, Schleiff E, Parniske M (2008) Lotus Japonicus CASTOR and POLLUX are ion channels essential for perinuclear calcium spiking in legume root endosymbiosis. The Plant Cell Online 20: 3467–79. Charpentier M (2008) Functional characterisation of two channels proteins involved in leguminous symbiosis. PhD dissertation. Chaudhary J, Patil GB, Sonah H, Deshmukh RK, Vuong TD, Valliyodan B, Nguyen HT. (2015) Expanding omics resources for improvement of soybean seed composition traits. Frontiers in Plant Science 6: 1–16. Chen C, Fan C, Gao M, Zhu H (2009) Antiquity and function of CASTOR and POLLUX, the twin ion channel-encoding genes key to the evolution of root symbioses in plants. Plant Physiology 149: 306–17. Choudhury SR, Pandey S (2015) Phosphorylation-dependent regulation of G-protein cycle during nodule formation in soybean. The Plant Cell 27: 3260–76. Clemente TE, Cahoon EB (2009) Soybean Oil: Genetic approaches for modification of functionality and total content. Plant Physiology 151: 1030–40. Cober ER, Voldeng HD (2001) A new soybean maturity and photoperiod-sensitivity locus linked to E1 and T. Crop Science 41: 698–701. Cober ER, Morrison MJ (2010) Regulation of seed yield and agronomic characters by photoperiod sensitivity and growth habit genes in soybean. Theoretical and Applied Genetics 120: 1005–12. Cober ER, Molnar SJ, Charette M, Voldeng HD (2010) A new locus for early maturity in soybean. Crop Science 50: 524–527. Concibido VC, Diers BW, Arelli PR (2004) A decade of QTL mapping for cyst nematode resistance in soybean. Crop Science 44: 1121–31. Cook DE, Bayless AM, Wang K, Guo X, Song Q, Jiang J, Bent AF (2014) Distinct Copy cumber, coding sequence, and locus methylation patterns underlie Rhg1-mediated soybean resistance to soybean cyst nematode. Plant Physiology 165: 630-647.

24

Cook DE, Lee TG, Guo X, Melito S, Wang K, Bayless AM, Wang J (2012) Copy number variation of multiplegenes at Rhg1 mediates nematode resistance in soybean. Science 338: 1206–9. Cooper JE (2007) Early interactions between legumes and rhizobia:disclosing complexity in a molecular dialogue. Journal of Applied Microbiology 103: 1355–65.Cortijo, Sandra, René Wardenaar, Maria Colomé-Tatché, Arthur Gilly, Mathilde Etcheverry, Karine Labadie, Erwann Caillieux, et al. 2014. “Mapping the Epigenetic Basis of Complex Traits.” Science 343 (6175): 1145–48. https://doi.org/10.1126/science.1248127. Cortijo S, Wardenaar R, Colomé-Tatché M, Gilly A, Etcheverry M, Labadie K, Caillieux E (2014) Mapping the epigenetic basis of complex traits. Science 343: 1145–48. Darr L, Cunicelli M, Bhandari H, Bilyeu K, Chen F, Hewezi T, Li, Carl Sams Z, Pantalone V (2020) Field performance of high oleic soybeans with mutant FAD2-1A and FAD2-1B genes in tennessee. Journal of the American Oil Chemists’ Society 97: 49–56. Dénarié J, Debellé F, Promé JC (1996) Rhizobium lipo-chitooligosaccharide nodulation factors: signaling molecules mediating recognition and morphogenesis. Annual Review of Biochemistry 65: 503–535. Destro D, Carpentieri-Pípolo V, Kiihl RS, Almeida LA (2001) Photoperiodism and genetic control of the long juvenile period in soybean: A review. Crop Breeding and Applied Biotechnology 1: 72–92. Dinkins RD., M. S. Srinivasa Reddy, Curtis A. Meurer, Bo Yan, Harold Trick, Françoise Thibaud-Nissen, John J. Finer, Wayne A. Parrott, and Glenn B. Collins (2001) Increased sulfur amino acids in soybean plants overexpressing the maize 15 KDa Zein Protein. In Vitro Cellular and Developmental Biology - Plant 37: 742–47. Drexler H, Spiekermann P, Meyer A, Domergue F, Zank T, Sperling P, Abbadi A, Heinz E (2003) Metabolic engineering of fatty acids for breeding of new oilseed crops: strategies, problems and first results. Journal of Plant Physiology 160: 779–802. Eskandari M, Cober ER, Rajcan I (2013) Genetic control of soybean seed oil: II. QTL and genes that increase oil concentration without decreasing protein or with increased seed yield. Theoretical and Applied Genetics 126: 1677–87. Etzeler ME, Kalsi G, NicholasI N. Ewing NN, Roberts NJ, Day RB, Murphy JB (1999) A Nod factor binding lectin with apyrase activity from legume roots. Proceedings of the

25

National Academy of Sciences 96: 5856–5861. Fang C, Li W, Li G, Wang Z, Zhou Z, Ma Y, Shen Y, Li C, Wu Y, Zhu B, Yang W (2013) Cloning of Ln gene through combined approach of map-based cloning and association study in soybean. Journal of Genetics and Genomics 40: 93–96. Fang, Y, Liu S, Dong Q, Zhang K, Tian Z, Li X (2020) Linkage analysis and multi-locus genome-wide associationstudies identify QTNs controlling soybean plant height. Frontiers in Plant Science 11: 1–14. Fehr WR (2007) Breeding for modified fatty acid composition in soybean. Crop Science 47:S- 72. Ferguson B J, IndrasumunarA, Hayashi S, Lin MH, Lin YH, Reid DE, Gresshoff PM (2010) Molecular analysis of legume nodule development and autoregulation. Journal of Integrative Plant Biology 52: 61–76. Ferguson BJ (2013) The Development and Regulation of Soybean Nodules. In eds. Board JE, A comprehensive survey of international soybean research - Genetics, Physiology, Agronomy and Nitrogen Relationships Relationships, 31–47. Fowler S, Lee K, Onouchi H, Samach A, Richardson K, Morris B, Coupland G, Putterill J (1999) GIGANTEA: A Circadian clock-controlled gene that regulates photoperiodic flowering in Arabidopsis and encodes aprotein with several possible membrane-spanning Domains. EMBO Journal 18: 4679–88. Frugier F, Kosut Sa, Murray JD, Crespi M, Szczyglowski K (2008) Cytokinin: secret agent of symbiosis. Trends in Plant Science 13: 115–20. Gai JY, WangYS, Zhang MC, Wang JA, Chang RZ ( 2001) Studies on the classification of maturity groups of soybeans in China. Acta Agronomica Sinica 27: 286–292. Galili G, Amir R, Hoefgen R, Hesse H (2005) Improving the levels of essential amino acids and sulfur metabolites in plants. Biological Chemistry 386: 817–31. Gardner M, Verma A, Mitchum MG (2015) Emerging roles of cyst nematode effectors in exploiting plant cellular processes. In Advances in Botanical Research. Academic Press 73: 259-291. Geurts R, Fedorova E, Bisseling T (2005) Nod factor signaling genes and their function in the early stages of rhizobium infection. Current Opinion in Plant Biology 8: 346–52. Gleason C, Chaudhuri S, Yang T, Muñoz A, Poovaiah BW, Oldroyd GED (2006)

26

Nodulation independent of Rhizobia induced by a calcium-activated kinase lacking autoinhibition. Nature 441: 1149–52. Glover KD, Wang D, Arelli PR, Carlson SR, Cianzio SR, Diers BW (2004) Near isogenic lines confirm a soybean cyst nematode resistance gene from PI 88788 on linkage Group J. Crop Science 44: 936-941. Grant D, Nelson RT, Cannon SB, Shoemake RC (2010) SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Research 38: 843–46. Guinel FC (2015) Ethylene , a hormone at the center-stage of nodulation. 6: 1121. Guo W, Chen JS, Zhang F, Li ZY, Chen HF, Zhang CJ, Chen LM, Yuan SL, Li R, Cao D, Hao QN (2020) Characterization of pingliang Xiaoheidou (ZDD 11047), a soybean variety with resistance to soybean cyst nematode heterodera glycines. Plant Molecular Biology 103: 253–67. Guzman PS, Diers BW, Neece DJ, St. Martin SK, LeRoy AR, Grau CR, Hughes TJ, and Nelson (2007) QTL Associated with yield in three backcross-derived populations of soybean. Crop Science 47: 111–22. Hameren BV, Hayashi S, Gresshoff PNM, Ferguson BJ (2013) Advances in the Identification of novel factors required in soybean nodulation , a process critical to sustainable agriculture and food security. Journal of Plant Biology & Soil Health 1: 1–6. Harada JJ, Barker SJ, Goldberg RB (1989) Soybean beta-conglycinin genes are clustered in several DNA regions and are regulated by transcriptional and posttranscriptional processes. The Plant Cell 1: 415–25. Hartman GL, West EW, Herman TK (2011) Crops that feed the world 2. Soybean-worldwide Production, use, and constraints caused by pathogens and pests. Food Security 3: 5–17. Hartwig EE, Kiihl RAS (1979) Identification and utilization of a delayed flowering character in soybeans for short-day conditions. Field Crops Research 2: 145–51. Hewezi T (2015) Cellular signaling pathways and posttranslational modifications mediated by nematode effector proteins. Plant Physiology 169: 1018–26. Hewezi T (2020) Epigenetic mechanisms in nematode–plant interactions. Annual Review of Phytopathology 58: 1–20. Hewezi T, Baum TJ (2013) Manipulation of plant cells by cyst and root-knot nematode effectors. Molecular Plant-Microbe Interactions 26: 9–16.

27

Hewezi T, Howe PJ, Maier TR, Hussey RS, Mitchum MG, Davis EL, Baum TJ (2010) Arabidopsis spermidine synthase is targeted by an effector protein of the cyst nematode Heterodera schachtii. Plant Physiology 152: 968–84. Hewezi T, Howe PJ, Maier TR, Hussey RS, Mitchum MG, Davis EL, Baum TJ (2008). Cellulose binding protein from the parasitic nematode Heterodera schachtii interacts with Arabidopsis pectin methylesterase: cooperative cell wall modification during Parasitism. Plant Cell 20: 3080–93. Hewezi T, Juvale PS, Piya S, Maier TR, Rambani A, Rice JH, Mitchum MG, Davis EL, Hussey RS, Baum TJ (2015) The cyst nematode effector protein 10A07 targets and recruits host posttranslational machinery to mediate its nuclear trafficking and to promote parasitism in Arabidopsis. Plant Cell 27: 891–907. Hewezi T, Lane T, Piya S, Rambani A, Rice JH, Staton M (2017) Cyst nematode parasitism induces dynamic changes in the root Epigenome. Plant Physiology 174: 405-420. Hewezi T, Pantalone V, Bennett M, Stewart NC, M. Burch-Smith T (2018) Phytopathogen- induced changes to plant methylomes. Plant Cell Reports 37: 17–23. Hewezi T, Piya S, Richard G, Rice JH (2014) Spatial and temporal expression patterns of auxin response transcription factors in the syncytium induced by the beet cyst nematode Heterodera schachtii in Arabidopsis. Molecular Plant Pathology 15: 730–36. Hirel B, Gouis JL, Ney B, Gallais A (2007) The challenge of improving nitrogen use efficiency in crop Plants: towards a more central role for genetic variability and quantitative genetics within integrated approaches. Journal of Experimental Botany 58: 2369–87. Hirsch S, Kim J, Munoz A, Heckmann AB, Downie JA, Oldroyd GED (2009) GRAS proteins form a DNA binding complex to induce gene expression during nodulation signaling in Medicago truncatula. The Plant Cell Online 21: 545–57. Huang JS, Barker,KR (1983) Influence of Heterodera glycines on leghemoglobins of soybean nodules. Phyytopathology 73: 1002-1004. Huang S, Yu J, Li Y, Wang J, Wang X, Qi H, Xu M, Qin, H., Yin, Z., Mei, H. and Chang, H (2019) Identification of soybean genes related to soybean seed protein content based on quantitative trait loci collinearity analysis. Journal of Agricultural and Food Chemistry 67: 258–74. Hwang S, Lee TG (2019) Integration of lodging resistance QTL in Soybean. Scientific Reports

28

9: 1–11. Hymowitz T, Harlan JR (1983) Introduction of Soybean to North America by Samuel Bowen in 1765. Economic Botany 37: 371–79. Indrasumunar A, Gresshoff PM (2010) Duplicated Nod-Factor receptor 5 (NFR5) genes are mutated in soybean (Glycine Max L. Merr.). Plant Signaling and Behavior 5: 535–36. Indrasumunar A, Searle I, Lin MH, Kereszt A, Men A, Carroll BJ, Gresshoff PM (2011) Nodulation Factor Receptor Kinase 1α controls nodule organ number in soybean (Glycine Max L. Merr). Plant Journal 65: 39–50. Indrasumunar A, Wilde J, Hayashi S, Li D, Gresshoff PM (2015) Functional analysis of duplicated symbiosis receptor kinase (SymRK) genes during nodulation and mycorrhizal infection in soybean (Glycine Max). Journal of Plant Physiology 176: 157–68. Jeong N, Moon JK, Kim HS, Kim CG, Jeong SC (2011) Fine genetic mapping of the genomic region controlling leaflet shape and number of seeds per pod in the soybean. Theoretical and Applied Genetics 122: 865–74. Jeong N, Suh SJ, Kim MH, Lee S, Moon JK, Kim HS, Jeong SC (2013). “Ln Is a key regulator of leaflet shape and number of seeds per pod in soybean.” Plant Cell 24: 4807–18. Jia H, Jiang B, Wu C, Lu W, Hou W, Sun S, Yan H, Han T 2(014) Maturity group classification and maturity locus genotyping of early-maturing soybean varieties from high- latitude cold regions. PLoS ONE 9: 1–9. Johnson SL, Fehr WR, Welke GA, Cianzio SR (2001) Genetic variability for seed size of two- and three-parent soybean populations. Crop Science 41: 1029–1033. Kabelka EA, Diers BW, Fehr WR, Leroy AR, Baianu IC, You T, Neece DJ, Nelson RL (2004) Putative alleles for increased yield from soybean plant introductions. Crop Science 44: 784–91. Kadam S, Vuong TD, Qiu D, Meinhardt CG, Song L, Deshmukh R, Patil G, Wan J, Valliyodan B, Scaboo AM, Shannon JG (2015) Genomic-assisted phylogenetic analysis and marker development for next generation soybean cyst nematode resistance breeding. Plant Science 242: 342–50. Kandoth PK, Liu S, Prenger E, Ludwig A, Lakhssassi, Heinz NR, Zhou Z, Howland A, Gunther J, Eidson S, Dhroso A (2017) Systematic mutagenesis of serine

29

Hydroxymethyltransferase reveals an essential role in nematode resistance. Plant Physiology 175 : 1370–80. Kato S, Fujii K, Yumoto S, Ishimoto, Shiraiwa MT, Sayama T, Kikuchi A, Nishio T (2015) Seed yield and its components of indeterminate and determinate lines in recombinant inbred lines of soybean. Breeding Science 65: 154–60. Kennedy MJ, Niblack TL, Krishnan HB (1999) Infection by heterodera glycines elevates isoflavonoid production and influences soybean nodulation. Journal of Nematology 31: 341–347. Khojely DM, Ibrahim SE, Sapey E, Han T (2018) History, current status, and prospects of soybean production and research in sub-saharan-Africa. Crop Journal 6: 226–35. Kim DS, Hanifzadeh M, Kumar A (2018) Trend of biodiesel feedstock and its impact on biodiesel emission characteristics. Environmental Progress and Sustainable Energy 37: 7– 19. Kim, MY, Van K, Kang YJ, Kim KH, Lee SH (2011) Tracing soybean domestication history: from nucleotide to genome.Breeding Science 61: 445–52. Kim M, Diers BW (2013) Fine mapping of the SCN resistance QTL CqSCN-006 and CqSCN- 007 from Glycine Soja PI 468916. Crop Science 53: 775–85. Kim M, Hyten DL, Bent AF, Diers. BW (2010) Fine mapping of the SCN Resistance Locus Rhg1-b from PI 88788. The Plant Genome 3: 81–89. Kim WS, Chronis D, Juergens M, Schroeder AC, Hyun SW, Jez JM, Krishnan HB. (2012) Transgenic soybean plants overexpressing O-Acetylserine Sulfhydrylase accumulate enhanced levels of cysteine and bowman-birk protease inhibitor in seeds. Planta 235: 13– 23. Ko MP, Barker KR, Huang J (1984) Nodulation of soybeans as affected by half-rootinfection with Heterodera Glycines 1. 16: 97–105. Kong F, H Nan H, D Cao D, Li Y, Wu F, Wang J, Lu S, Yuan X, Cober ER, Abe J, Liu B. (2014) A new dominant gene E9 conditions early flowering and maturity in soybean. Crop Science 54: 2529–35. Kosslak RM, Bookland R, Barkei J, Paaren HE, Appelbaum ER (1987) Induction of Bradyrhizobium Japonicum common nod genes by isoflavones isolated from Glycine Max.” Proceedings of the National Academy of Sciences 84: 7428–32.

30

Lakhssassi N, Patil G, Piya S, Zhou Z, Baharlouei A, Kassem MA, Lightfoot DA, Hewezi T, Barakat A, Nguyen HT, Meksem K (2019) Genome reorganization of the GmSHMT gene family in soybean showed a lack of functional redundancy in resistance to soybean cyst nematode. Scientific Reports 9: 1–16. Lakhssassi N, Piya S, Knizia D, Baze AE, Cullen MA, Meksem J, Lakhssassi A, Hewezi T, Meksem K (2020) Mutations at the Serine Hydroxymethyltransferase impact its interaction with a Soluble Nsf Attachment Protein and a Pathogenesis-Related Protein in Soybean. Vaccines 8: 1–24. Langewisch T, Lenis J, Jiang GL, Wang D, Pantalone V, Bilyeu K (2017) The development and use of a molecular model for soybean maturity groups. BMC plant biology 91: 1–13. Laporte P, Niebel A, Frugier. F (2012) Legume roots and nitrogen-fixing symbiotic interactions. In Root Genomics and Soil Interactions 145–70. Lee GJ, X. Wu, Shannon JG, Sleper DA, Nguyen HT (2007) Soybean. In Kole C ,eds, C Oilseeds 2:1–3. Lee C, Herbek J (2005) Estimating Soybean Yield. University of Kentucky. agr-188: 7–8. Lee SH, Bailey MA, Mian MAR, Shipe ER, Ashley DA, Parrott WA, Hussey RS, aBoerma HR (1996) Identification of quantitative trait loci for plant height, lodging, and maturity in a soybean population segregating for growth Habit. Theoretical and Applied Genetics 92: 516–23. Li Z, Nelson RL (2001) Genetic diversity among soybean accessions from three countries measured by RAPDs. Crop Science 41: 1337–47. Li M, Liu Y, Wang C, Yang X, Li D, Zhang X, Xu C, Zhang Y, Li W, Zhao L. (2020) Identification of traits contributing to high and stable yields in different soybean varieties across three Chinese latitudes. Frontiers in Plant Science 10: 1–14. Liu B, LiuXB, Wang C, Li YS, Jin J, Herbert SJ. (2010) Soybean yield and yield component distribution across the main axis in response tolight enrichment and shading under different densities. Plant, Soil and Environment 56: 384–92. Liu B, Kanazawa A, Matsumura H, Takahashi R, Harada K, Abe J. (2008) Genetic redundancy in soybean photoresponses associated with duplication of the phytochrome A gene. Genetics 180: 995–1007.

31

Liu B, Watanabe S, Uchiyama T, Kong F, Kanazawa A, Xia Z, Nagamatsu A, Arai M, Yamada T, Kitamura K, Masuta C (2010) The Soybean stem growth habit gene Dt1 is an ortholog of arabidopsis TERMINAL FLOWER1. Plant Physiology 153: 198–210. Liu KS (1999) Chemistry and nutritional value of soybean components. In eds K.S. Liu , Aspen Publishers, Inc., MD, Soybeans: Chemistry, Technology, and Utilization, 25–113. Liu S, Kandoth PK, Lakhssassi N, Kang J, Colantonio V, Heinz R, Yeckel G, Zhou Z, Bekal S, Dapprich J, Rotter B (2017) The Soybean GmSNAP18 gene underlies two types of resistance to soybeancyst nematode. Nature Communications 8: 1–11. Liu S,. Kandoth PK, Warren SD, Yeckel G, Heinz R, Alden J, Yang C, Jamai A, El- Mellouki T, Juvale PS, Hill J (2012) A Soybean cyst nematode resistance gene points to a new mechanism of plant resistance to pathogens. Nature 492: 256–60. Liu S, Zhang M, Feng F, Tian Z (2020) Toward a ‘Green Revolution’ for Soybean. Molecular Plant 13: 688–97. Liu W, Kim MY, Van K, Lee YH, Li H, Liu X, Lee SH (2011) QTL Identification of yield- related traits and their association with flowering and maturity in soybean. Journal of Crop Science and Biotechnology 14: 65–70. Liu X, Wu JA, Ren H, Qi Y, Li C, Cao J, Zhang X, Zhang Z, Cai Z, Gai J (2017) Genetic variation of world soybean maturity date and geographic distribution of maturity groups.” Breeding Science 67: 221–32. Liu Y, Jiang X, Guan D, Zhou W, Ma M, Zhao B, Cao F, Li L, Li J (2017) Transcriptional analysis of genes involved in competitive nodulation in Bradyrhizobium diazoefficiens at the presence of soybean root Exudates. Scientific Reports 7: 1–11. Liu Y, Zhang D, Ping J, Li S, Chen Z, Ma J (2016) Innovation of a regulatory mechanism modulating semi-determinate stem growth through artificial selection in soybean. PLoS Genetics 12: 1–22. Liu Z, Li H, Fan X, Huang W, Yang J, Wen Z, i Li Y, Guan R, Guo Y, Chang R, Wang D (2017) Phenotypic characterization and genetic dissection of nine agronomic traits in Tokachi Nagaha and its derived cultivars in soybean (Glycine Max (L.) Merr.). Plant Science 256: 72–86. Lockhart J (2014) Finding Dt2, the dominant gene that specifies the semideterminate growth habit in soybean. Plant Cell 26: 2725.

32

Lu S, Zhao X, Hu Y, Liu S, Nan H, Li X, Fang C, Shi X, Kong L, Su T (2017) Natural variation at the soybean J locus improves adaptation to the tropics and enhances yield.” Nature Genetics 49: 773–79. Lu X, Xiong Q, Cheng T, Li QT, LiuXL, Bi YD, Li W, Zhang WK, Ma B, Lai YC, Du WG (2017) A PP2C-1 allele underlying a enhances soybean 100-Seed weight. Molecular Plant 10: 670–84. Ma L, Li B, Han F, Yan S, Wang L, Sun J (2015) Evaluation of the chemical quality traits of soybean seeds, as related to sensory attributes of soymilk. Food Chemistry 173: 694–701. Madsen, Esben Bjørn, Lene Heegaard Madsen, Simona Radutoiu, Shusei Sato, Takakazu Kaneko, Satoshi Tabata, and Niels Sandal. 2003. “A Receptor Kinase Gene of the LysM Type Is Involved in Legume Perception of Rhizobial Signals,” 637–40. Madsen LH, Tirichine L, Jurkiewicz A, Sullivan JT, Heckmann AB, Bek AS, Ronson CW, James EK, Stougaard J (2010) The molecular network governing nodule organogenesis and infection in the model legume lotus japonicus. Nature Communications 1: 1-2. Mcblain BA, Bernard RL (1987) A New gene affecting the time of flowering and maturity in soybeans. Journal of Heredity 78: 160–62. McFarlane J, (2013) Soybean oil derivatives for fuel and chemical feedstocks. In Soybean-Bio- Active Compounds, 112–33. McGinnity PJ, George Kapusta G, Myers O (1980) Soybean cyst nematode and rhizobium strain influences on soybean nodulation and N 2-Fixation . Agronomy Journal 72: 785–89. Meksem K, Pantazopoulos P, Njiti VN, Hyten LD, Arelli PR, Lightfoot DA (2001) Forrest’ resistance to the soybean cyst nematode is bigenic: saturation mapping of the Rhg1 and Rhg4 Loci. Theoretical and Applied Genetics 103: 710–17. Mens C, Li D, Haaima LE, Gresshoff PM, Ferguson BJ (2018) Local and systemic effect of cytokinins on soybean nodulation and regulation of their isopentenyl transferase (IPT) biosynthesis genes following rhizobia inoculation. Frontiers in Plant Science 9: 1–14. Mergaert P, Kereszt A, Kondorosi E (2020) Gene expression in nitrogen-fixing symbiotic nodule cells in medicago truncatula and other nodulating Plants. The Plant Cell 32: 42–68. Miranda C, Bilyeu K (2015) Adaptation of soybean in tropical environments: A Review. Miranda C, Scaboo A, Cober E, Denwar N, and Kristin Bilyeu K (2020) The effects and interaction of soybean maturity gene alleles controlling flowering time, maturity, and

33

adaptation in tropical environments. BMC Plant Biology 20: 1–13. Mitchum M (2016) Soybean resistance to the soybean cyst nematode heterodera glycines: An update. Phytopathology 106: 1444-1450. Morrison MJ, Cober ER, Saleem MF, McLaughlin NB, Frégeau-Reid J, Ma BL, Yan W, Woodrow L (2008) Changes in isoflavone concentration with 58 years of genetic improvement of short-season soybean cultivars in Canada. Crop Science 48: 2201–2208. Mugabo J, Tollens E, Chianu J, Obi A, Vanlauwe B (2014) Resource use efficiency in soybean production in Rwanda. Ressource 5: 116–22. Nakagawa T, Kaku H, Shimoda Y, Sugiyama A, Shimamura M, Takanashi K, Yazaki K, Aoki T, Shibuya N, Kouchi H (2011) From defense to symbiosis: limited alterations in the kinase domain of LysM receptor-like kinases are crucial for evolution of legume-rhizobium Symbiosis. Plant Journal 65: 169–80. Nelson RL (2011) Managing self-pollinated germplasm collections to maximize utilization. Plant Genetic Resources: Characterisation and Utilisation 9: 123–33. Niblack TL, Lambert KN, Tylka GL (2006) A model plant pathogen from the kingdom animalia: heterodera glycines, the soybean cyst nematode. Annual Review of Phytopathology 44: 283–303. Ohyama T, Ito S, Nagumo Y, Ohtake N (2010) Symbiotic nitrogen fixation and its assimilation in soybean. In Ohyama T, Sueyoshi K , eds, Nitrogen assimilation in plants. p. 661. Oldroyd GED, Downie JA (2008) Coordinating nodule morphogenesis with rhizobial infection in legumes. Annual Review of Plant Biology 59: 519–546. Pantalone V, Cunicelli M, Wyman C (2020) Registration of Soybean Cultivar ‘TN15-5007’ with High Meal Protein. Journal of Plant Registrations 14: 139–43. Pantalone V, Smallwood C (2018) Registration of ‘TN11-5102’ Soybean cultivar with high yield and high protein meal. Journal of Plant Registrations 12: 304. Panthee DR, Kwanyuen P, Sams CE, West DR, Saxton AM, Pantalone VR (2004) Quantitative Trait Loci for β-Conglycinin (7S) and Glycinin (11S) fractions of soybean storage protein. Journal of the American Oil Chemists’ Society 81: 1005–12. Panthee DR, Pantalone VR (2006) Registration of soybean Germplasm Lines TN03–350 and TN04–5321 with improved protein concentration and quality. Crop Science 46: 2328–29.

34

Patil GB, Lakhssassi N, Wan J, Song L, Zhou Z, Klepadlo M, Vuong TD, Stec AO, Kahil SS, Colantonio V, Valliyodan B (2019) Whole-genome re-Sequencing reveals the impact of the interaction of copy number variants of the Rhg1 and Rhg4 genes on broad-based resistance to soybean cyst nematode. Plant Biotechnology Journal 17: 1595–1611. Patil G, Mian R, Vuong T, Pantalone V, Song Q, Chen P, Shannon GJ, Carter TC, Nguyen HT (2017) Molecular mapping and genomics of soybean seed protein: A Review and Perspective for the Future. Theoretical and Applied Genetics 130: 1975–1991. Pham AT, Lee JD, Shannon JG, Bilyeu KD (2010) Mutant alleles of FAD2-1A and FAD2-1B combine to produce soybeans with the high oleic acid seed oil Trait. BMC Plant Biology 10: 195. Pham AT, Lee JD, Shannon JG, Bilyeu KD (2011) A Novel FAD2-1A allele in a soybean plant introduction offers an alternate means to produce soybean seed oil with 85% oleic acid content. Theoretical and Applied Genetics 123: 793–802. Pham AT, Lee JD, Shannon JG, Bilyeu KD (2012) Combinations of mutant FAD2 and FAD3 genes to produce high oleic acid and low linolenic acid soybean oil. Theoretical and Applied Genetics 125: 503–15. Ping J, Liu Y, Sun L, Zhao M, Li Y, M She M, Sui Y, Lin F, Liu X, Tang Z, Nguyen H (2014) Dt2 is a gain-of-function MADS-Domain factor gene that specifies semideterminacy in soybean. Plant Cell 26: 2831–42. Prell J, Poole P (2006) Metabolic changes of rhizobia in legume nodules. Trends in Microbiology 14: 161–68. Qi Q, Huang J, Crowley J, Ruschke L, Goldman BS, Wen L, Rapp WD (2011) Metabolically engineered soybean seed with enhanced threonine levels: biochemical characterization and seed-specific expression of lysine-insensitive variants of aspartate kinases from the enteric bacterium xenorhabdus bovienii. Plant Biotechnology Journal 9: 193–204. Radutoiu S, Madsen LH, Madsen EB, Jurkiewicz A, Fukai E, Quistgaard EMH, Albrektsen AS, James EK, Thirup S, Stougaard J (2007) LysM domains mediate lipochitin-oligosaccharide recognition and Nfr genes extend the symbiotic host range. EMBO Journal 26: 3923–35 https://doi.org/10.1038/sj.emboj.7601826. Rajcan I, Hou G, Weir AD ( 2005 Advances in breeding of seed quality traits in soybean.”

35

Journal Of Crop Improvement 14:145–174. Raju SKK, Shao M, Sanchez R, Xu Y, Sandhu A, Graef G, Mackenzie S (2018) An epigenetic breeding system in soybean for increased yield and stability. Plant Biotechnology Journal 16: 1836–1847. Rambani A, Hu Y, Piya S, Long M, Rice JH , Pantalone V, Hewezi T (2020) Identifcation of differentially methylated MiRNA genes during compatible and incompatible interactions between soybean and soybean cyst nematode. Molecular Plant-Microbe Interactions. Rambani A, Pantalone V, Yang S, Rice JH, Song Q, Mazarei M, Arelli PR, Meksem K, Stewart CN, Hewezi T (2020) Identification of introduced and stably inherited DNA methylation variants in soybean associated with soybean cyst nematode parasitism. New Phytologist 227: 168–184 Rambani A, Rice JH, Liu J, Lane T, Ranjan P, Mazarei M, Pantalone V, Stewart CN, Staton M, and Hewezi T (2015) The methylome of soybean roots during the compatible interaction with the soybean cyst nematode. Plant Physiology 168: 1364–77. Ramongolalaina C, Teraishi M, Okumoto Y (2018) QTLs underlying the genetic interrelationship between efficient compatibility of bradyrhizobium strains with soybean and genistein secretion by soybean roots. PLoS ONE 14: 13. Ray JD, Hinson K, Mankono JE, Malo MF (1995) Genetic control of a long-juvenile trait in soybean. Crop Science 35: 1001–106. Reid D, Nadzieja M, Novák O, Heckmann AB, Sandal N, Stougaard J (2017) Cytokinin biosynthesis promotes cortical cell responses during nodule development. Plant Physiology 175: 361–75. Riely BK, Ané JM, Penmetsa RV, Cook DR (2004) Genetic and genomic analysis in model legumes bring nod-factor signaling to center stage. Current Opinion in Plant Biology 7: 408–413. Ryu H, Cho H, Choi D, Hwang I (2012) Plant hormonal regulation of nitrogen-fixing nodule organogenesis. Molecules and Cells 34: 117–126. Samanfar B, Cober ER, Charette M, Tan LH, Bekele WA, Morrison MJ, Kilian A, Belzile F, Molnar SJ (2019) Genetic analysis of high protein content in ‘AC Proteus’ related soybean populations using SSR, SNP, DArT and DArTseq markers. Scientific Reports 9: 1– 10.

36

Samanfar B, Molnar SJ, Charette M, Schoenrock A, Dehne F, Golshani A, Belzile F, and Cober ER (2017). Mapping and identification of a potential candidate gene for a novel maturity locus, E10, in Soybean.” Theoretical and Applied Genetics 130: 377–90. Santana MA, Pihakaski-Maunsbach K, Sandal N, Marcker KA, Smith AG (1998). Evidence that the plant host synthesizes the heme moiety of leghemoglobin in root nodules.” Plant Physiology 116: 1259–69. Satgé C, Moreau S, Sallet E, Lefort G, Auriac MC, Remblière C, Cottret L, Gallardo K, Noirot C, Jardinaud MF, Gamas P (2016) Reprogramming of DNA methylation is critical for nodule development in medicago truncatula. Nature Plants 2:1-10. Schlueter JA, Vasylenko-Sanders IF, Deshpande S, Yi J, Siegfried M, Roe BA, Schlueter SD, Scheffler BE, Shoemaker RC (2007) The FAD2 gene family of soybean: insights into the structural and functional divergence of a paleopolyploid genome.” Crop Science 47: S- 14 Sedivy EJ, Wu F, Hanzawa Y (2017) Soybean domestication: the origin, genetic architecture and molecular bases. New Phytologist 214: 539–53. Shen Y, Zhang J, Liu Y, Liu S, Liu Z, Duan Z, Wang Z, Zhu B, Guo YL, Tian Z (2018) DNA methylation footprints during soybean domestication and improvement. Genome Biology 19: 25–27. Shurtleff W, Aoyagi A (2019) History of soybeans and soyfoods in Africa (1857-2019): Extensively annotated bibliography and sourcebook. Soyinfo center. Sinclair TR, and Hinson K (1992) Soybean flowering in response to the long‐juvenile trait. Crop Science 32: 1242–48. Smallwood C, Fallen B, Pantalone V (2018) Registration of ‘TN11-5140’ soybean cultivar. Journal of Plant Registrations 12: 203–207. Soyano T, Kouchi H, Hirota A, Hayashi M (2013) NODULE INCEPTION directly targets NF-Y subunit genes to regulate essential processes of root nodule development in lotus japonicus. PLoS Genetics 9: e1003352. St-Amour VTB, Mimee B, Torkamaneh D, Jean M, Belzile F, O’Donoughue LS (2020) Characterizing resistance to soybean cyst nematode in PI 494182, an early maturing soybean accession. Crop Science 1: 1–17.

37

Stracke S, Kistner C, Yoshida S, Mulder, Sato LS, Kaneko T, Tabata S, Sandal N, Stougaard J, Szczyglowski K, Parniske MA (2002) A plant receptor-like kinase required for both bacterial and fungal symbiosis. Nature 417: 959–62. Sun H, Jia Z, Cao D, Jiang B, Wu C, Hou W, Liu Y, Fei Z, Zhao D, Han T (2011) GmFT2a, a soybean homolog of flowering locus T, is involved in flowering transition and maintenance. PLoS ONE 6: 18–20. Tang GQ, Novitzky WP, Griffin HC, Huber SC, Dewey RE (2005) Oleate desaturase enzymes of soybean: evidence of regulation through differential stability and phosphorylation.” Plant Journal 44: 433–46. Thanh VH, Shibasaki K (1978) Major proteins of soybean seeds. subunit structure of β- Conglycinin. Journal of Agricultural and Food Chemistry 26: 692–95. Thompson JA, Bernard RL, Nelson RL (1997) A third allele at the soybean Dt1 locus. Crop Science 37: 757–62. Tian Z, Wang X, Lee R, Li Y, Specht JE, Nelson RL, McClean PE, Qiu L, Ma J (2010) Artificial selection for determinate growth habit in soybean. Proceedings of the National Academy of Sciences of the United States of America 107: 8563–8568. Tirichine L, Sandal N, Madsen LH, Radutoiu S, Albrektsen AS, Sato S, Asamizu E, Tabata S, Stougaard J (2006) A Gain-of-function mutation in a cytokinin receptor triggers spontaneous root nodule organogenesis. Science 315: 104–7. Tran DT, Steketee CJ, Boehm JD, Noe J, Li Z (2019) Genome-wide association analysis pinpoints additional major genomic regions conferring resistance to soybean cyst nematode (Heterodera Glycines Ichinohe). Frontiers in Plant Science 10: 1–13. Turner M, Nizampatnam NR, Baron M, Coppin S, Damodaran S, Adhikari S, Arunachalam SP, Yu O, Subramanian S (2013) Ectopic expression of MiR160 results in auxin hypersensitivity, cytokinin hyposensitivity, and inhibition of symbiotic nodule development in soybean.” Plant Physiology 162: 2042–55. Tylka GL,(1995) Soybean cyst nematode. Plant Pathology. Iowa State Universty. https://www.plantpath.iastate.edu/scn/files/page/files/PM0879.pdf. Vuong TD, Sonah H, Meinhardt CG, Deshmukh R, Kadam S, Nelson RL, Shannon JG, Nguyen HT (2015) Genetic architecture of cyst nematode resistance revealed by genome- wide association study in soybean. BMC Genomics 16:593.

38

Wang J, Niblack TL, Tremain JA, Wiebold WJ, Tylka GL, Marett CC, Noel GR, Myers O, Schmidt ME (2003) Soybean cyst nematode reduces soybean yield without causing obvious aboveground symptoms. Plant Disease 87: 623–28. Wang L, Deng L (2016) GmACP expression is decreased in GmNORK knockdown transgenic soybean roots. Crop Journal 4: 509–516. Watanabe S, K Harada K, Abe J (2011) Genetic and molecular bases of photoperiod responses of flowering in soybean.” Breeding Science 6: 531–43. Watanabe S, Hideshima R, Zhengjun X, Tsubokura Y, Sato S, Nakamoto Y, Yamanaka N, Takahashi R, Ishimoto M, Anai T, Tabata S (2009) Map-Based cloning of the gene associated with the soybean maturity locus E3.” Genetics 182: 1251–62. Watanabe S, Xia Z, Hideshima R, Tsubokura Y, Sato S, Yamanaka N, Takahashi R, Anai T, Tabata S, Kitamura K, Harada K (2011) A map-based cloning strategy employing a residual heterozygous line reveals that the GIGANTEA gene is involved in soybean maturity and flowering. Genetics 188: 395–407. Whitham SA, Qi M, Innes RW, Ma W, Lopes-Caitar V, Hewezi (2016) Molecular soybean- pathogen interactions. Annual Review of Phytopathology 54: 443-468. Wilson RF, (2004) Seed composition. In eds Boerma and J.E. Specht, 3rd ed. Soybeans: Improvement, Production, and Uses. American Society of Agronomy; Crop Science Society of America; Soil Science Society of America 621–77. Wilson RF, (2008) Soybean: market driven research needs.” In Genetics and genomics of Soybean, 3–15. Woods SJ, Swearingin ML (1977) Influence of simulated early lodging upon soybean seed yield and its components. Agronomy Journal 69: 239–42. Wu XY, Zhou GC, Chen YX, Wu P, Liu LW, Ma FF, Wu M, Liu CC, Zeng YJ, Chu AE, Hang YY (2016) Soybean cyst nematode resistance emerged via artificial selection of duplicated serine hydroxymethyltransferase genes. Frontiers in Plant Science 7: 1–8. Xia Z, Watanabe S, Yamada T, Tsubokura Y, Nakashima H, Zhai H, Anai T, Sato S, Yamazaki T, Lü S, Wu H (2012) Positional cloning and characterization reveal the molecular basis for soybean maturity locus E1 that regulates photoperiodic flowering. Proceedings of the National Academy of Sciences of the United States of America 109: 28–30. Yamaguchi N, Sayama T, Yamazaki H, Miyoshi T, Ishimoto M, Funatsuki H (2014)

39

Quantitative Trait Loci associated with lodging tolerance in soybean cultivar ‘Toyoharuka.’” Breeding Science 64: 300–308. Yao D, Liu ZZ, Zhang J, Liu SY, Qu J, Guan SY, Pan LD, Wang D, Liu JW, Wang PW (2015) Analysis of Quantitative Trait Loci for main plant traits in soybean.” Genetics and Molecular Research 14: 6101–6109. Young VR (1994) Adult Amino Acid requirements : the case for a major revision in current recommendations. The Journal of Nutrition 124: 1517S-1523S. Yue Y, Liu N, Jiang B, Li M, Wang H, Jiang Z, Pan H, Xia Q, Ma Q, Han T, Nian H (2017) A single nucleotide deletion in J encoding GmELF3 confers long juvenility and is associated with adaption of tropic soybean. Molecular Plant 10: 656–58. Zhang D, Wang X, Li S, Wang C, Gosney MJ, Mickelbart MV, Ma J (2019) A post- domestication mutation, Dt2, triggers systemic modification of divergent and convergent pathways modulating multiple agronomic traits in soybean.” Molecular Plant 12: 1366–82. Zhang J, Song Q, Cregan PB, Nelson RL, Wang X, Wu J, Jiang GL (2015) Genome-Wide association study for flowering time, maturity dates and plant height in early maturing soybean (Glycine Max) Germplasm. BMC Genomics 16: 217 . https://doi.org/10.1186/s12864-015-1441-4. Zhao-Ming Q, Ya-Nan S, Qiong W, Chun-Yan L, Guo-Hua H, Qing-Shan C (2011) A meta- analysis of seed protein concentration QTL in soybean. Canadian Journal of Plant Science 9: 221–30. Zhao C, Takeshima R, Zhu J, Xu M, Sato M, Watanabe S, Kanazawa A, Liu B, Kong F, Yamada T, Abe J (2016) A recessive allele for delayed flowering at the soybean maturity locus E9 is a leaky allele of FT2a, a FLOWERING LOCUS T Ortholog. BMC Plant Biology 16: 1–15. Zhu H, Riely BK, Burns NJ, Ané JM (2006) Tracing nonlegume orthologs of legume genes required for nodulation and arbuscular mycorrhizal symbioses.” Genetics 172: 2491–99. Zhu Y, Tian J, Shi F, Su L, Liu K, Xiang M, Liu X (2013) Rhizosphere bacterial communities associated with healthy and heterodera glycines-infected soybean roots. European Journal of Soil Biology 58: 32–37.

40

Chapter 2: Performance evaluation of TN09-029 x NCC05-1168 recombinant inbred lines (RIL) population of Soybean Glycines max (L) Merr.

41

Abstract

Soybean is an economically important crop in the US and in the world. Soybean productions in the US represent a major share of the global production, ranking the second after Brazil. Despite the increasing global demand of soybean products, soybean yields are hampered by several biotic and abiotic stresses. In this context, in order to meet global and the state’s yield requirements, soybean continuous improvement is of greatest importance. A population of recombinant inbred lines resulting from a cross between TN09-029 and NCC05-1168 was developed and evaluated in multilocational trials across Tennessee in 2016 and 2017. The objective of the cross was to create a segregating population of recombinant inbred lines (RILs) for seed yield and quality, maturity date as well as resistance to soybean pathogens including soybean cyst nematodes among other factors. The lines were evaluated in 5 environments over two consecutive years. Agronomic traits, including plant height, lodging, date to maturity, seed yield, and protein and oil contents were evaluated. The average seed yield was 3,410 kg ha-1 while the average protein and oil contents were 390.9 and 213.3, respectively. The top yielding RIL was in the same group as the high yielding check of the test. The meal protein of the RILs was in the range of commercially acceptable levels with a minimum of 43.6 and a maximum of 47.3 %. The broad sense heritability scores ranged from 63.8 to 80% except for lodging with h2 of 56.9. Altogether, our results suggest a possibility of selection of the best performers from the RIL population to be released as cultivars or germplasm.

42

1. Introduction

Soybean is by virtue of its share on the global market an important crop in the world and in the US in particular. Soybean is the top oilseed crop with 61% of the global oil seed production and occupies the first place with 70% of protein meal consumption worldwide (American Soybean Association, 2019). Soybean is also a major source of protein for the livestock feed industry (Yin and Blachier, 2011) while being the leading feedstock for the biodiesels industry (Kim et al., 2017; Rajcan, 2005). In the US, soybean share was 33% of the total area planted with 123.7 million MT in 2018. The US’ average yield was estimated at 3.5 MT/Ha, whereas in Tennessee the yield was at 3.1 MT/Ha, ranking the state at 17th position across the nation (American Soybean Association, 2019).

Though soybean production has been increasing over time, global demand also keeps growing at a much higher rate. Thus, maintaining soybean economic advantage is paramount. This requires a continuous study of the genetics of the yield components and interactions with the environment and management options (GxExM) (Xavier and Rainey, 2020; Assefa et al., 2019; Bellaloui et al., 2020). Improved soybean performance and higher grain yields are the main goal pursued by breeders in a bid to cope with increasing demand of soybeans in the market. This is achieved through cultural practices and genetic manipulations of existing germplasm (Thompson and Nelson, 1998; Sherrie et al., 2011). However, being a quantitative trait, determined by multiple interactions between genes and environments, yield is less predictable , and hence multi-year and multilocation field trials are necessary (Yuan et al., 2002). In fact, multilocation trials are becoming a standard tool to evaluate the performance across environments and select the best genotypes fitting in specific environments (Li et al., 2020).

Currently, research focus is on breeding for seed quality in soybeans with emphasis put on improving the oil quality and on increasing the protein content for both feed and human consumption. However, seed protein and oil concentrations are also quantitative traits, and are highly affected by environmental conditions (Qi et al., 2011; Grieshop and Fahey, 2001; Thakur and Hurburgh, 2007). In fact, it was reported that GxExM variability affects seed quality in a

43

similar manner as yield (Assefa et al., 2018). Not only that there is a need to increase the state’s average yield but also increasing seed yield through genotypic selection while maintaining high levels of protein and oil has been a constant goal for plant breeders; taking into consideration the historical evidences of contrasting trends between protein and seed yield at one hand and protein and oil at the other hand (Cunicelli, 2017; Wilcox and Shibles, 2001; Yaklich et al., 2002; Panthee et al., 2005; Mello Filho et al., 2004).

Soybean cyst nematode (SCN, Heterodera glycines) is also another soybean breeding challenge that seriously reduces soybean yields in the US, as SCN associated yield and dollar losses per hectare were ranked first (Bandara et al., 2020). Furthermore, yield drag associated with SCN resistance alleles have been reported (Kopisch-Obuch etl., 2005; Donald et al., 2006), putting the breeding for SCN resistance task in jeopardy. So, the goal of soybean breeders is to achieve combined high yield and resistance to SCN in released cultivars. In this context, the soybean breeding and genetics program at the University of Tennessee initiated a cross between TN09- 029 x NCC05-1168, in order to develop a recombinant inbred lines population segregating for among other traits, resistance to SCN, seed yield and seed quality with the objective of combining high yield, high oil concentration and acceptable levels of protein content in SCN resistant cultivar. Here, we present the results from the study conducted over two years and five environments across Tennessee.

2. Results

2.1 Agronomic traits

Yield, a complex and polygenic trait, is the main goal pursued by soybean breeders. The average seed yield of our trials was 3,410 kg ha-1 with the minimum and maximum of 2,305 and 4,344 kg ha-1, respectively (Table 2-1). The yield in our RIL population was positively correlated with all agronomic traits except with lodging and oil concentration where a negative correlation was recorded (Table 2-2). Plant height is an important breeding trait as some yield increasing quantitative trait loci (QTLs) were found associated with increasing height in soybean (Alcivar et al. 2007).

44

Plant height in the RIL population ranged from 23.7 cm and 37.0 cm with a mean of 28 cm. As for the correlations, our study revealed a moderate positive correlation (r=0.45, p0.001) between plant height and yield, whereas no significant correlation between plant height and resistance to lodging was recorded (Table 2-2). Resistance to lodging is of breeding consideration since lodging and seed yield are inversely correlated and complete lodging could lead to up to 30% or more of yield loss (Saito, 2012) . In addition, serious lodging can make machinery harvest difficult which has economic implications (Kato et al., 2015). Overall, the RIL population was resistant to lodging with an average score of 1.8. A highly significant negative correlation between lodging and seed yield was recorded (r=-0.16) (Table 2-2).

Maturity date (days after panting until harvest ) is a crucial parameter for choosing a suitable cultivar (Salmeron et al., 2014) and determining the right planting time in order to maximize crop returns for any maturity group (Boyer et al., 2015). The RIL population average relative maturity was 124.8 days. The earliest line matured at 119 days and the latest at 131.2 days after planting (Table 2-1). In terms of correlation with other agronomic traits, our study revealed a positive and a negative correlation between the date to maturity and plant height and lodging, respectively (Table 2-2).

2.2 Seed quality traits

The soybean seed oil content ranges from 8 % to 28% (Wilson 2004). However, more recent literature, reports a range of 18% to 22% in cultivated soybeans (Patil et al.,2018). Increasing oil content in new soybean lines is the main objective of many breeding programs. However, due to the strong negative correlation between oil and protein content, combining high oil, high protein, and high yield is challenging (Rotundo, Miller-Garvin, and Naeve 2016). In our study, the oil content ranged from 203.8 to 224.0 g/kg with a mean of 213.3 g/kg on seed dry weight basis (Table 2-1). Most of the RILs were found in the same cluster as the higher oil content parent (TN09-029). However, 44 lines had greater oil content and four of them showed equal or greater than 200g kg-1 (Figure 2-2). As expected, a negative correlation between oil and protein was found among the RILs (r= -.49, p0.001). As for correlation with yield, oil content was negatively correlated with yield (p0.0005) (Table 2-2).

45

The genetic variability in seed protein content was reported to be in the range of 34% and 57% (Wilson 2004). Nevertheless, developing high-protein and high-yielding soybean cultivars is challenging due to the inverse relationship between seed yield and seed protein content (Cober and Voldeng 2000). In our population, the mean protein content was 390.9 g kg-1 with a minimum of 374.4 g kg-1 and a maximum of 419.1 g kg-1 (Table 2-1). Seed protein concentration of the parental line NCC05-1168 was significantly higher (399.1g kg-1) than the parental line TN09-029 (384.3 g kg-1). The protein content distribution showed few transgressive segregants with protein content being higher than the best parent NCC05-1168 (Figure 2-3).

The protein fraction in the soybean meal ranges from 42.7% to 51.1 (Thakur and Hurburgh 2007). However, the US soybean meal comes second in terms of protein fraction after Brazil (Thakur and Hurburgh 2007). Yet, this aspect is becoming an important demand tool especially for animal feed processors. Therefore, improving this soybean meal quality trait is important for US soybean breeders. The meal protein of our RIL population was found in an acceptable range of meal protein content (41- 48%) with a minimum of 43.6 and a maximum of 48.4 % (Table 2- 1). Notably, most of the RILs were found in the middle protein meal content class (44.7-45.7 %) (Figure 2-4).

Fatty acid is in important consideration for soybean breeding especially for improved oil quality which is mostly used for human consumption and the biofuel industry. Soybean oil is the most consumed worldwide at 59% of all oilseeds (Soy stats, 2019) Currently, breeding programs are oriented towards high oleic and low linolenic acid content in soybean due to new FDA regulations (FDA, 2015) and health benefits (Sala-Vila and Ros, 2011) and improved biodiesels quality (Fallen et al., 2012). Fatty acid mean composition for the RIL population was 11.1, 3.5, 17.1, 53.9, 6.7 for palmitic (16:0), stearic (18:0), oleic (18:1), linoleic (18:2) and linolenic (18:3) respectively (Table 2-1). All traits under our study had strong broad sense heritability. The estimates ranged from 57% (lodging) to 80% (meal protein) (Table 2-1).

46

3. Discussion

3.1 The agronomic performance improved

The development of this RIL population led to a yield increase in comparison to the parental lines and the checks. The mean yield of our RIL population was in the range of acceptable yields in comparison with cultivars currently grown across the state of Tennessee. In fact, the state’s average yield in 2017 was 3,360 kg ha-1. Moreover, the RILs seed yield average remained greater than the state’s 5 years average, which was 3180 kg ha-1 (https://data.nal.usda.gov/dataset/nass- quick-stats accessed 2020-08-25). In addition, the RIL’s average seed yield was in the same range as the average seed yield from all early IV conventional lines tested in the Tennessee variety trials in 2017 (3,813kg ha-1). However, that average was slightly lower than the average seed yield from the late IV conventional lines of the same tests (Sykes et al. 2017). Interestingly, the RIL population top yielding line SCNepi-134 with 4154 kg ha-1, were found in the same group as the popular conventional yield check in Tennessee ‘Ellis’ (Pantalone et al., 2017). Even though the check TN09-008 (Pantalone et al., 2018), a newly released cultivar outperformed all RILs, both parents and other checks in the test with 4344 kg ha-1, about 50% of the RIL population was in middle yield class (3121-3580 kg ha-1). In total, 57 lines and 34 of the RILs found in classes of higher yields than the high yielding parent, TN09-029 (Figure 2-1). This distribution of yields, taken together with broad sense heritability (Table 2-1), is a good indication about the selection of best parents aiming at a high genetic gain as previously suggested (Mahmud and Kramer, 1951). For instance, promising results were obtained by Wiggins et al. (2018) with the selection in the upper 5, 10, 15, and 20% tails. Thus, our RIL population provides potential selection options for yield improvement for different environments across Tennessee. Correlation wise, the trends were somehow in contradiction with the report of Mello Filho et al, (2004) where negative correlations were recorded in most of their populations. However, the same trends were found by Mourtzinis et al. (2017) except with oil concentration.

It is worth to note that harsh weather (dry) conditions during year 2016 season could have contributed to the reduction of seed yields, thus, negatively impacting the overall yield means in our trials. In fact, severe stress especially during seed filling stage causes a shift towards the formation of small seeds, which greatly impacts seed yield (Dornbos and Mullen 1991). The range of plant heights in or population was optimal for resistance to lodging since the regression

47

analysis in soybean found an increase in lodging score as a result of an increase in height (Wilcox and Sediyama 1981; Kato et al., 2015). Also, QTLs of plant height were associated with lodging (Yamaguchi et al., 2014). The positive correlation between plant height and yield was consistent with previous findings. Indeed, positive correlations between plant height at maturity and the number of nodes at maturity as well the total number of pods at maturity which in turn was positively correlated with grain yield per plant were reported (Machado et al., 2017) However, both traits being polygenic and complex traits, the correlation is not always evident (Diondra et al., 2008).

This lodging score of our RILs was better than the scores recorded over 2 years in the Tennessee State Variety Test in early and late IV maturity groups, which were 2.2 and 1.9, respectively (Sykes et al., 2017). The RIL standability rating has a practical and economical meaning since it makes harvesting easy, in addition to reported negative correlation between susceptibility to lodging and seed yield. Indeed, our findings confirmed previous reports of negative correlations between both traits (Saito, 2012; Ramli et al., 1980). Thus, in terms of resistance to lodging, our population performed well in relation to breeding objectives. The average maturity date was the same as for the early maturing parent (NCC05-1168) and two days earlier than the late maturing parent (TN09-029). This puts our RIL population in clusters of early IV and late IV maturity groups (MG). In fact, in two-years (2016-2017) evaluation across Tennessee, the average maturity for early and late IV groups was 123 days and 135 days, respectively (Sykes et al., 2017). Some of our population top yielding lines such as SCN-134 matured around 5 days before all checks. This finding has an agroeconomic impact especially as similar yield potential was found between shorter-season and later-maturing cultivars, when exposed to the similar environment (Egli, 1993). Consistent with a previous report (Cober and Morrison, 2010), maturity date was positively correlated with seed yield. The results of our study suggest that, early planting (May) is recommended as suggested by Boyer et al., (2015).

3.2 Protein and oil content relatively increased

The results our study indicate that the line with lowest oil content was above the lowest commercially acceptable oil content (Wilson 2004). Moreover, the RIL average oil content mean was not that different from the average oil content of the candidate lines tested across seven

48

research stations in the Tennessee variety trials in 2017 that were 214 and 215 gkg-1 for the early IV and late IV maturity group, respectively (Sykes et al., 2017). Interestingly, the RIL average was about 2 units greater than the 2017 averages for mid-south states (195g kg-1) and nationwide (191 g kg-1) (Miller-Garvin and Naeve, 2017). In all, four of the RILs showed equal or greater than 200g kg-1 oil content, and therefore they can be classified as high-oil content lines (Stobaugh et al., 2017). The correlation trend between oil and protein were consistent with earlier findings (Cober and Voldeng, 2001; Carrera et al., 2011). This trend was also previously found by Rotundo et al., (2016) between protein and oil both in time (r =-.39 to -.58) and spatially (r =-.41 to -.61). However, the negative correlation recorded between oil content and yield were in disagreement with previous reports (Wiggins et al. 2018; Mourtzinis et al. 2017). Nevertheless, some researchers found a similar relationship (Stobaugh et al., 2017). This contradiction means that though both traits are under genetic control, they may be highly influenced by environmental, management practices, and interaction effects (Bellaloui et al., 2011). Overall the oil content data of RILs provide good selection options for improving existing soybean cultivars in term of oil content and yield in general.

Despite, historical negative correlation between oil and protein, soybean breeders were able to overcome that negative correlation either by recurrent selection (Brim and Burton 1979) or complementary parentage (Pantalone et al., 1996). Though the mean protein content of our RILs was less than average targeted content for commercial cultivars (Wilson, 2004), it was comparatively similar to the content found across Tennessee state, in the 2017 Tennessee Variety Test (Sykes et al., 2017). The parental lines protein content confirmed earlier findings by Gillen and Shelton (2011). The protein content of the lines that recorded higher than the higher protein content parent taken together with their seed yield may provide an important decision-making tool about their use either as cultivar or germplasm.

In terms of meal protein, none of the RILs was capable of producing a high meal protein (48%), a current concern for soybean breeders targeting high yield as their top priority (Pantalone and Smallwood, 2018). However, complemented with digestibility tests (not tested) and concentration in key amino acids, the RIL meal protein content could be in acceptable ranges on feed markets (Thakur and Hurburgh, 2007). As for fatty acid content, our results were in the ranges of typical soybean oil averages (Lee et al., 2007) and as expected, none of the RILs was

49

found under elite high oleic, low linolenic group (Wilson, 2004). However, improvement can be made for some of RILs with higher and stable yields across different environments. In fact, the heritability scores for the measured traits were in the ranges of previously reported data ( Cober and Voldeng, 2000; Brim and Burton, 1979; Burton and Brim, 1981), suggesting that selection of best candidate lines could be made from our RIL population for these traits.

4. Conclusion

The increasing global demand for soybean products coupled with biotic and abiotic challenges facing the soybean production systems require a continuous improvement of the existing soybean cultivars. In order to develop a multipurpose soybean line population, a cross was made between two parental lines differing for among other traits, resistance to a number of pathogens including

SCN, maturity date, seed yield and quality. The progeny population was advanced to the F5 generation and tested in multiple locations across Tennessee in five environments. Our results showed an increase in seed yield in comparison to the checks and parental lines. Similarly, improved oil and protein contents were recorded. Overall, our study, demonstrated that it is possible to combine more than one desirable trait in a cultivar while maintaining at acceptable levels, the otherwise historically negatively correlated traits.

5. Materials and Methods

5.1 Plant material

A population of 115, F5 derived RILs was developed from a cross between TN09-029 and NCC05-1168. Line TN09-029, a late maturity group IV line, is the result of the crossing between Fowler and Anand, and was reported as highly resistant to SCN race 2, 3 and 5 but susceptible to stem canker (Gillen and Shelton, 2012). On the other hand, the parental line NCC05-1168 is an early maturity group V line resulted from the cross between TN97-167 and S99-2281, and was rated as resistant to stem canker, SCN race 3 but susceptible to SCN race 2 and 5 (Gillen and Shelton, 2011).

The RIL population was advanced through the single pod decent method up to the F5 generation

(Table 2-3). At the F5 generation, 115 individual plants were pulled, and individual lines were

50

advanced to the F5:7 by bulk method and evaluated under 5 different locations in Tennessee as detailed below. RILs were tested against standard checks grown in Tennessee, including Ellis (Pantalone et al., 2017), ‘Osage’ (Chen et al., 2007), and the newly released cultivar TN09-008 (Pantalone et al., 2018).

5.2 Field experiments

The F5:6 and F5:7 were tested in two row plots of 4.9m x 0.8m in two environments at the East Tennessee Research and Education Center in Knoxville (ETREC) and the Research and

Education Center at Milan, TN (RECM) in 2016. In 2017, for the F5:7 test, in addition to the above stations, a third location was added at the Highland Rim Research and Education Center in Springfield, TN (HRREC) to cover the middle Tennessee environment. The field trials were set up in a randomized complete block design (RCBD) in two and three replications during the first and the second year, respectively.

5.3 Agronomic traits

For each line, data were taken for flower color, and pubescence color. Plant height was measured as the distance from the ground to the highest node in cm. Plant lodging was measured at the scale of 1 (upright position) to 5 (prostrate position). Maturity date was recorded in days after planting until harvesting. Yield and moisture contents were directly measured by ALMACO Seed Spector LRX using Vantage HD harvest software, mounted on the harvest combine.

5.4 Protein, Oil and Amino Acid Content

Seed protein, amino acid, and oil contents were measured on a dry weight basis by near infrared reflectance (NIR) spectroscopy using a PERTEN DA 7250 analyzer (Perten, Hägersten, Sweden) per manufacturer’s guidelines. Briefly, about 100 randomly selected whole soybean seeds from each plot were scanned and readings were recorded. Protein and oil concentrations were obtained on a 13% moisture basis and converted to g kg-1 dry weight basis.

5.5 Fatty Acid content

The fatty acids were extracted and the composition of five main fatty acids found in soybean oil namely palmitic acid (16:0), stearic acid (18:0), oleic acid (18:1), linoleic acid (18:2), and

51

linolenic acid (18:3) were analyzed. The content was determined by the method described by Spencer et al. (2004) using a Hewlett Packard HP 6890 series gas chromatograph (Agilent Technologies, Santa Clara, CA). Gas chromatography signals were analyzed by ChemStation software (Agilent technologies).

5.6 Meal Protein

Meal protein, an important parameter for soybean processors and feed industry, was calculated as per the following formula given by Pantalone and Smallwood, (2018).

Meal Protein=prot 13/(1-oil13/100)*0.92; with prot13 and oil13 standing for protein and oil concentrations on 13% moisture basis.

5.7 Heritability estimates

Heritability on entry mean basis was calculated with the following formula (Nyquist and Baker 1991):

Where h2 represents the broad sense heritability; σ2g, the genetic variance; σ2ge, the genotype x environmental interaction variance; σ2, the overall error variance; e, the number of environments; and r, the number of replications in the experiment.

5.8 Statistical analysis

Analysis of variances (ANOVA) for genetic effects of RILs for plant height, lodging, maturity, yield, protein, oil and fatty acids contents across environments were done by GLIMIX procedure of SAS 9.4 (SAS institute Inc., USA). Environments and replication within environment as well as the environment by genotype interaction were treated as random effects.

52

References

Alcivar A, Jacobsen J, Rainho J, Meksem K, Lightfoot D, Kassem M (2007) Genetic analysis of soybean plant height, hypocotyl and internode lengths. Journal of Agriculture Food and Environmental Sciences 1: 1–20. American Soybean Association, (2019) Soy Stats: A reference guide to important soybean facts & figures. The Association, 1–36. Assefa Y, Bajjalieh N, Archontoulis S, Casteel S, Davidson D, Kovács P, Naeve S, and Ignacio A. Ciampitti IA (2018) Spatial characterization of soybean yield and quality (amino acids, oil, and protein) for United States. Scientific Reports 8: 1–11. Assefa Y, Purcell L C, Salmeron M, Naeve S, Casteel S N, Kovács P, Sotirios Archontoulis, Licht M, Below F, Kandel H, Lindsey LE (2019) Assessing variation in US soybean seed composition (Protein and Oil). Frontiers in Plant Science 10: 298. Bandara AY, Weerasooriya DK, Bradley CA, Allen TW, Esker PD (2020) Dissecting the economic impact of soybean diseases in the United States over two decades. PLoS ONE 15: 1–28. Bellaloui N, Reddy KN, Bruns HA, Gillen AM, Mengistu A, Zobiole LHS, Fisher DK, Abbas HK, Zablotowicz RM, Kremer RJ (2011) Soybean seed composition and quality: Interactions of environment, genotype, and management practices. In Maxwell JE, eds, Soybeans: Cultivation, Uses and Nutrition. Nova Scientific Publishers, Inc., pp1-42. Bellaloui N, McClure AM, Mengistu A, Abbas HK (2020) The Influence of different agricultural practices, the environment and cultivar differences on soybean seed protein, oil, sugras and amino ccids. Plants 9:378 Boyer CN, Stefanini M, Larson JA, Smith SA, Mengistu A, Bellaloui N (2015) Profitability and risk analysis of soybean planting date by maturity group. Agronomy Journal 107: 2253– 2262. Brim CA, Burton JW (1979) Recurrent selection in soybeans .2. selection for increased percent

53

protein in seeds. Crop Science 19: 494–98. Burton JW, Brim CA (1981) Recurrent selection in soybeans. III. selection for increased percent oil in seeds. Crop Science 21: 31. Carrera C, Martínez MJ, Dardanelli J, Balzarini M (2011) Environmental variation and correlation of seed components in nontransgenic soybeans: protein, oil, unsaturated fatty acids, tocopherols, and isoflavones. Crop Science 51: 800–809. Chen P, Sneller CH, Mozzoni LA, Rupe JC (2007) Registration of ‘Osage’ Ssoybean. Journal of Plant Registrations 1: 89–92. Cober ER, Voldeng HD (2000) Developing high-protein, high-yield soybean populations and lines.” Crop Science 40: 39–42. Cober ER, Morrison MJ (2010) Regulation of seed yield and agronomic characters by photoperiod sensitivity and growth habit genes in soybean. Theoretical and Applied Genetics 120: 1005–1012. Cunicelli MJ, (2017) Effect of the mutant Danbaekkong or stem termination alleles on soybean seed protein concentration , amino acid composition , and other seed quality and agronomic traits. Msc. Thesis. De Bruin JL, Pedersen P (2008) Yield improvement and stability for soybean cultivars with resistance to Heterodera glycines ichinohe. Agronomy Journal 100: 1354–59. Diondra W, Ivey S, Washington E, Woods S, Walker J, Krueger N (2008) Is there a correlation between plant height and yield in soybean. Reviews in Biology and Biotechnology. 7:70-6. Donald PA, Pierson PE, St Martin SK, Sellers PR, Noel GR, Macguidwin AE, Faghihi J, Ferris VR, Grau CR, Jardine DJ, Melakeberhan H (2006) Assessing Heterodera glycines-resistant and susceptible cultivar yield response. Journal of Nematology 38: 76–82. Dornbos DL, Mullen RE (1991) Influence of stress during soybean seed fill on seed weight , germination , and seedling growth rate. Canadian Journal of Plant Sciences. 71: 373–83. Egli DB (1993) Cultivar maturity and potential yield of soybean. Field Crops Research 32: 147– 58. Fallen BD, Rainey KR, Sams CE, Kopsell DA, Pantalone VR (2012) Evaluation of agronomic and seed characteristics in elevated oleic acid soybean lines in the South-Eastern US.

54

Journal of the American Oil Chemists’ Society 89: 1333–1343. Food and Drug Administration, (2015) Final determination regarding partially hydrogenated oils. Federal Register 80(116). Gillen AM, W Shelton GW (2011) Uniform soybean tests. Gillen AM, W Shelton GW (2012) Uniform soybean tests: Southern States. 12. Grieshop CM, Fahey GC (2001) Comparison of quality characteristics of soybeans from Brazil, China, and the United States. Journal of Agricultural and Food Chemistry 49: 2669–2673. Kato S, Fujii K, Yumoto S, Ishimoto M, Shiraiwa T, Sayama T, Kikuchi A, Nishio T (2015) Seed yield and its components of indeterminate and determinate lines in recombinant inbred lines of soybean. Breeding Science 65: 154–60. Kim DS, Hanifzadeh M, Kumar A (2017) Trend of biodiesel feedstock and its impact on biodiesel emission characteristics. Environmental Progress and Sustainable Energy 37: 7– 19. Kopisch-Obuch FJ, McBroom RL, Diers BW (2005) Association between soybean cyst nematode resistance loci and yield in soybean. Crop Science 45: 956–965. Lee JG, Bilyeu KD, Shannon JG (2007) Genetics and breeding for modified fatty acid profile in soybean seed oil. Journal of Crop Science and Biotechnology 10: 201–10. Li M, Liu Y, Wang C, Yang X, Li D, Zhang X, Xu C, Zhang Y, Li W, Zhao L (2020) Identification of traits contributing to high and stable yields in different soybean varieties across three chinese latitudes. Frontiers in Plant Science 10: 1–14. Machado BQV, Nogueira APO, Hamawaki OT, Rezende GF, Jorge GL, Silveira IC, Medeiros LA, Hamawaki RL, Hamawaki CDL (2017) Phenotypic and genotypic correlations between soybean agronomic traits and path analysis. Genetics and Molecular Research 16 (2). Mahmud I, Kramer HH (1951) Segregation for yield and maturity following a soybean cross. Agronomy Journal 43: 605–9. Mello Filho OLD, Sediyama CS, Moreira MA, Reis MS, Massoni GA, Piovesan ND (2004) Grain yield and seed quality of soybean selected for high protein content. Pesquisa Agropecuaria Brasileira 39: 445–50.

55

Miller-Garvin J, Naeve SL (2017) United States Soybean Quality. University of Minnesota, St Paul. Annual Report. 17. Mourtzinis S, Gaspar AP, Naeve SL, Conley SP (2017) Planting date, maturity, and temperature effects on soybean seed yield and composition. Agronomy Journal 109: 2040– 2049. Nyquist WE, Baker RJ (1991) Estimation of heritability and prediction of selection response in plant populations. Critical Reviews in Plant Sciences 10: 235–322. Pantalone VR, Smallwood C (2018) Registration of ‘TN11-5102’ soybean cultivar with high yield and high protein meal. Journal of Plant Registrations 12: 304. Pantalone VR, Smallwood C, Fallen B (2017) Development of ‘Ellis’ soybean with high soymeal protein, resistance to stem canker, southern root knot nematode, and frogeye leaf spot. Journal of Plant Registrations 11: 250. Pantalone VR, Smallwood C, Fallen B, Hatcher CN, Arelli P (2018) Registration of ‘TN09- 008’ soybean cyst nematode–resistant cultivar.” Journal of Plant Registrations 12: 309. Pantalone VR, Burton JW, Carter TE (1996) Soybean fibrous root heritability and genotypic correlations with agronomic and seed quality traits. Crop Science 36: 1120–25. Panthee DR, Pantalone VR, West DR, Saxton AM, Sams CE (2005) Quantitative trait loci for seed protein and oil concentration, and seed size in soybean. Crop Science 45: 2015–22. Patil G, Vuong TD, Kale S, Valliyodan B, Deshmukh R, Zhu, C, Wu X, Bai Y, Youngbluth D, Lu F, Kumpatla S, Shannon J, Varshney R, Nguyen HT(2018) Dissecting genomic hotspots underlying seed protein, oil, and sucrose content in an interspecific mapping population of soybean using high-density linkage mapping. Plant Biotechnology Journal, 16 :1939–1953. Qi ZM, Wu Q, Han X, Sun Y, Du X, Liu C, Jiang H, Hu G, Chen Q (2011) Soybean oil content QTL mapping and integrating with meta-analysis method for mining genes. Euphytica 179: 499–514. Rajcan I, Hou G Weir AD (2005) Advances in breeding of seed quality traits in soybean. Journal Of Crop Improvement. 14: 145–74. Ramli B, Noor M, Caviness CE (1980) Influence of induced lodging on pod distribution and seed yield in soybeans. Agronomy Journal, 2–4.

56

Rotundo JL, Miller-Garvin JE, Naeve SL (2016) Regional and temporal variation in soybean seed protein and oil across the United States. Crop Science 56: 797–808. Saito K, Nishimura K, Kitahara T (2012) Effect of lodging on seed yield of field-grown soybean. Journal of Crop Sciences. 8: 27–32. Sala-Vila A, Ros E (2011) Mounting evidence that increased consumption of linolenic acid, the vegetable n-3 fatty acid, may benefit cardiovascular health. Clinical Lipidology 6: 365–69. Salmeron M, Gbur EE, Bourland FM, Buehring NW, Earnest L, Fritschi FB, Bobby R. Golden BR, Hathcoat D, Lofton J, Miller TD, Neely C (2014) Soybean maturity group choices for early and late plantings in the midsouth. Agronomy Journal 106: 1893–1901. Sherrie I, Khaled O, Washington E, Lage P, Woods S, Kantartzi SK, Meksem K, Lightfoot DA, Kassem MA (2011) Evaluation of several agronomic traits in ‘Essex’ By ‘Forrest’ recombinant inbred line population of soybean [Glycine Max (L.) Merr.]. Atlas Journal of Plant Biology 1: 13–17. Spencer MM, Pantalone VR, Meyer EJ, Landau-Ellis D, Hyten DL (2003) Mapping the fas locus controlling stearic acid content in soybean. Theoretical and Applied Genetics 106: 615–19. Stobaugh BL, Palacios LF, Chen P, Orazaly M (2017) Agronomic evaluation of high-protein and high-oil soybean genotypes for specialty markets.” Journal of Crop Improvement 31: 1– 14. Sykes V, Willette A, Blair R, Smallwood C (2017) Soybean variety tests in Tennessee (2017). Thakur M, Hurburgh CR (2007) Quality of US soybean meal compared to the quality of soybean meal from other origins. Journal of the American Oil Chemists’ Society 84: 835– 8343. Thompson JA, Nelson RL (1998) Utilization of diverse germplasm for soybean yield improvement. Crop Science 38: 1362–68. Wiggins B, Wiggins S, Cunicelli M, Smallwood C, Allen F, West D, Pantalone V (2018) Genetic gain for soybean seed protein, oil, and yield in a recombinant inbred line population. Journal of the American Oil Chemists’ Society 1:43-50. Wilcox JR, Sediyama T (1981) Interrelationships among height, lodging and yield in determinate and indeterminate soybeans. Euphytica 30: 323–26.

57

Wilcox JR, Shibles RM.(2001) Interrelationships among seed quality attributes in soybean. Crop Science 41: 11–14. Wilson RF(2004) Seed Composition. In Boerma and J.E. Specht,eds Soybeans: Improvement, Production, and Uses. American Society of Agronomy; Crop Science Society of America; Soil Science Society of America. Pp 621–77 Xavier A, Rainey KM (2020) Quantitative genomic dissection of soybean yield components. G3: Genes, , Genetics 10: 665–75. Yaklich RW, Vinyard B, Camp M, Douglass S (2002) Analysis of seed protein and oil from soybean Northern and Southern region uniform tests. 1504–1515. Yamaguchi N, Sayama T, Yamazaki H, Miyoshi T, Ishimoto M, Funatsuki H (2014) Quantitative trait loci associated with lodging tolerance in soybean cultivar ‘Toyoharuka. Breeding Science 64: 300–308. Yin Y, Andrew A, Blachier F (2011) Soya Bean meal and its extensive use in livestock feeding and nutrition. Soybean and Nutrition 12: 369-384 Yuan J, Njiti VN, Meksem K, Iqbal MJ, Triwitayakorn K, Kassem MA, Davis GT, Schmidt ME, Lightfoot DA (2002) Quantitative trait loci in two soybean recombinant inbred line populations segregating for yield and disease resistance. Crop Science 42: 271– 77.

58

Appendix

Table 2-1: Significant differences among genotypes (G), environments (E) and, genotypes and environments (GXE) interaction (P=0.05), for plant height, lodging, maturity date, yield, oil, protein and meal protein.

std. Genotype GxE Z dev. LSD Resid. Coeff. Trait P value Value mean min max a value Var. Var. b H2 Yield (Kg/ha) <.0001 <.0001 3410 2305 4344 329 553 194937 12.9 63.8 Protein(g/kg) <.0001 0.0008 390.9 374.4 419.1 7.3 10.0 95.6 2.5 76.5 Meal Protein (%) <.0001 0.0009 45.4 43.6 48.4 0.8 12.3 1.1 2.3 80 Maturity (d.a.p) <.0001 <.0001 124.8 119.0 131.1 2.7 3.7 6.1 2.0 75.6 Oil (g/kg) <.0001 0.0028 213.3 203.8 224.0 3.7 5.0 25.0 2.3 76.2 Lodging <.0001 <.0001 1.8 1.1 2.8 0.3 0.7 0.3 28.4 56.9 Height (cm) <.0001 0.0002 28.7 23.7 37.0 2.0 3.1 6.8 9.1 73.3 Palmitic <.0001 0.0094 11.9 11.1 12.6 0.3 0.3 0.1 2.7 Stearic <.0001 0.0009 4.0 3.5 4.7 0.3 0.3 0.1 6.1 Oleic <.0001 <.0001 20.0 17.1 23.3 1.1 1.4 1.4 6.0 Linoleic <.0001 <.0001 56.3 53.9 58.7 0.9 1.1 1.0 1.8 Linolenic <.0001 0.0002 7.9 6.7 9.1 0.4 0.5 0.2 5.8 a std. deviation of LSMEANs b (Root MSE * 100)/mean H2 broad sense heritability

59

Table 2-2: Coefficient of correlation between seven traits from the RIL population grown over five different environments across Tennessee

Yield Maturity(dap) Lodging Height Protein Oil Meal Protein Yield 0.32(**) -0.16(**) 0.45(**) 0.33(**) -0.43(**) 0.18(**)

Maturity -0.16(**) 0.25(**) 0.16(**) -0.72(**) -0.15(**)

Lodging -0.004(NS) -0.07(*) 0.12(**) -0.020(NS)

Height 0.15(**) -0.26(**) 0.057(NS)

Protein -0.49(**) 0.90(**)

Oil -0.10(**)

Meal Protein

** and *Significant at 5 and 0.1% probability level, NS: non-significant

60

Table 2-3: RIL population development from inception of cross between TN09-029 and NCC05-1168 in 2012 to the summer 2016 field trial.

Year Season Location Generation grown Procedure 2012 Summer ETREC

2012/13 Winter TARS F1 F1 hills 1 ft apart

2013 Summer ETREC F2 F2 pod pick

2013/14 Winter 27 farms F3 F2 pod pick

2014 Summer ETREC F4 F2 pod pick

2015 Summer ETREC F5 Single plant pull

2015/16 Winter 27 farms F5:6 Bulk Harvest

2016 Summer ETREC/MILAN F5:7 Bulk Harvest

ETREC: East Tennessee Research and Education center, TARS: USDA, Tropical Agriculture research, Isabela, PR ,27 farms:27 farms of homestead, Inc., Homestead, FL, MILAN: Research and Education Center at Milan

61

Figure 2-1: Frequency distribution for seed yield, n=114. Arrows denoting lines’ yield are shown to compare parental-progeny yields. Each class consists of 459 units (kg)

62

Figure 2-2. Frequency distribution for seed oil content, n=114. Arrows denoting parental lines’ oil content are shown to compare parental-progeny yields. Each class consists of 4 units (g)

63

Figure 2-3. Frequency distribution for seed protein content, n=114. Arrows denoting parental lines’ protein content are shown to compare parental-progeny yields. Each class consists of 9 units (g)

64

TN09-029

Figure 2-4. Frequency distribution for meal protein, n=114. Arrows denoting parental lines’ protein content are shown to compare parental-progeny yields. Each class consists of 1 unit (g)

65

Chapter 3: Investigation of Rhg1 and Rhg4 loci for selected TN09-029 x NCC05-1168 recombinant inbred lines with different responses to Soybean Cyst Nematode

66

Abstract

Soybean Cyst Nematode (SCN) is the most destructive and yield reducing pathogen in the US and in the world, causing annual yield losses estimated to be over $1 billion in the US. Breeding for resistance remains the main strategy to control SCN and maintain yields. Rhg1 and 4 are the main loci involved in SCN resistance. Rhg1 is responsible for resistance phenotype found in almost all commercialized soybean cultivars. However, recent studies report a shift in virulence of SCN populations and the latter are progressively becoming adapted, breaking the resistance of most of the grown cultivars. This shift is often related to sole use of one source of resistance and a wide genetic diversity among SCN populations. We evaluated 115 F5-derived recombinant inbred lines (RILs), resulting from a cross between TN09-029 and NCC05-1168 and segregating for SCN resistance. The RILs were screened for resistance to SCN race 2 (HG 1.2.5.7) and race 3 (Hg type 0) in the greenhouse. We then investigated the association of SCN response variability in 6 selected soybean RILs representing the resistant, susceptible, and the moderately resistant phenotypes, with potential genotypic variability at Rhg1 and 4. We tested them through stepwise assays including, screening in the greenhouse, sequencing and copy number variation of both loci, transcript abundancy as well as marker assisted selection using SSR and SNP markers. All the RILs presented similar phenotypical and genotypical responses except with SSR marker selection assay. The selected RILs showed distinct SSR markers (satt574 and satt082) genotypes at chromosome 17. The results confirmed the observed cyst-count Female indices (FI) and suggest that the observed SCN phenotypes could be associated with the genotypical polymorphism at chromosome 17. The results suggest that the low copy rhg1 coupled with rhg4 confer resistance to SCN race 3 but not race 2, which seems to be linked to the SSR marker cqSCN 005. Therefore, soybean breeding programs should stack different resistance genes in order to achieve broad resistance to various SCN races. In addition, no significant differences were found between the mean seed yield of the SCN-resistant (3,359 kg ha-1) and the SCN- susceptible RILs (3,416 kg ha-1) to SCN, race 2. These results suggest that resistance to SCN, race 2 does not necessarily impede yield suggesting a possibility to improve seed yield and quality in SCN resistance enhanced cultivars. Our results serve as a baseline for cloning the major SCN resistance gene responsible for resistance against SCN race 2.

67

1. Introduction

The soybean cyst nematode (SCN, Heterodera glycines Ichinohe) is considered as the most destructive pest of soybean production worldwide. In the USA, all soybean producing states reported SCN infections except West Virginia, causing losses amounting to 1 billion US dollars ( Cook et al., 2012; Wang et al. 2017; Korasick et al., 2020). In the southern region, SCN was ranked first among soybean loss causing diseases followed by root-knot nematode (Meloidogyne spp.), frogeye leaf spot (Cercospora sojina), and Cercospora leaf blight (Cercospora kikuchii) in that order. The average percentage of yield loss caused by SCN in 16 southern states was estimated at 1.54% equivalent to 0.6 million MT and 2.0% amounting to 0.05 million MT in Tennessee (Allen et al., 2017). In addition to economic losses, the genetic variability of SCN populations is remarkable, making breeding for resistance and control a real challenge (Kim et al., 1997). Initially, SCN populations were classified in 16 races with reference to their capacity to reproduce on a selected list of 4 soybean resistant lines called differentials or indicator lines: Peking, Picket, PI88788, and PI90763 (Riggs and Schmitt, 1988; Niblack et al., 2002). However, the number kept on increasing with the discovery of new races (Lian et al., 2017). Thus, a more robust classification system, “Hg types (Heterodera glycines)” was introduced to limit the increasing number of differentials that were being added for classifying new virulent populations. In this revised system, SCN populations are classified based on their ability to reproduce on a list of 7 soybean lines called “indicator lines”. A cut off Female index (FI) of 10% in comparison to the standard susceptible line, is used to distinguish between virulent and non-virulent populations on a given indicator line. Thus, a population with a FI greater than 10% on line 1,2,5, and 7 will be named HG 1,2,5,7 and a non-virulent population to all 7 lines will be named HG 0 (Niblack et al., 2002; Beeman et al., 2016). The deployment of resistant cultivars through breeding programs has been the core SCN control strategy for decades (Arelli et al., 2015; Chen et al., 2001; Niblack et al., 2008). The current research focus has been the increase of diversity of SCN resistance genes through the identification of new sources of resistance with the aim of reducing the shift of SCN populations (Arelli et al., 2015; Niblack et al., 2008; Gardner et al., 2017). In the US, soybean breeders have been exploiting available sources of resistance to control SCN. These include plant introduction (PI) 88788, PI 90763, PI 548402, and PI 209332 (Concibidoet al., 2004). Several Quantitative

68

Trait Loci (QTL) related to resistance to SCN have been identified and mapped. The first QTL was mapped in the region containing Rhg1 (Resistance to Heterodera glycines1) loci on chromosome 18 and was named cqSCN-001(http://www. soybase.org accessed on April 5 2020, (Concibido et al., 2004). Commonly known types of SCN resistance, based on their Peking-type (PI 548402) and PI 88788-type were reported to inherit this QTL (Concibido et al., 1997; Glover et al., 2004). Similarly, the second QTL from Peking source, named cqSCN-002 was fine- mapped in the region containing Rhg 4 loci on chromosome 8 (Meksem et al., 2001). Later on, other QTL were confirmed and named cqSCN-003 on chromosome 16 from PI 88788 (Glover et al., 2004), cqSCN-005 with SCN resistance from Hartwig ( Peking type) on chromosome 17 (Kazi et al., 2010), cqSCN-006 on chromosome 15, cqSCN-007 on chromosome 18 from Glycine soja PI 468916 (Kim and Diers, 2013) and recently qSCN10, located on chromosome 10 from PI 567516 C (Kadam et al., 2015). However, Rhg1 and Rhg4 remain the two major QTLs that have been consistently mapped and reported in most of soybean germplasm (Liu et al., 2017; Cook et al., 2012; Liu et al., 2012; Vuong et al., 2010). In fact, it is reported that almost 90% of SCN resistant commercial cultivars in the central United States exploit the Rhg1-b allele (haplotype), inherited from the soybean line PI 88788, as the major SCN resistance locus (Cook et al., 2012). The sole use of this source of resistance gave rise to more adapted SCN populations putting in jeopardy the reliance on the use of resistant cultivars (Kim et al., 1997; Niblack et al., 2008; Mccarville et al., 2017).

The Rhg1 locus consists of a ~31.2 kb tandem repeat of three genes on its haplotype, encoding an amino acid transporter (GmAAT(Glyma.18G022400), an ɑ-Soluble N-ethylamide-sensitive factor Attachment Protein (a-soluble N-ethylmaleimide–sensitive factor attachment protein) (Glyma.18g022500), and WI12 (wound-inducible domain) proteins (Glyma.18g022700) that additively contribute to the resistance against SCN (Liu et al., 2017; Cook et al., 2012; Cook et al., 2014). A fourth gene encoding a PLAC-domain protein was later reported in the tandem (Glyma.18G022600) (Bayless et al., 2019; Lee et al., 2015). The genes cause the SCN‐induced syncytium to degrade a few days after induction (Whitham et al. 2016; Bayless et al. 2019). The resistant Rhg1 were classified into two main groups, Rhg1-a (Peking-type) and Rhg1-b (PI 88788-type) based on their histological responses to SCN infection (Brucker et al., 2005; Kim et al., 2010; Mitchum, 2016), whereas the susceptible was named Rhg1-s (Essex and Williams

69

genotypes for example) (Liu et al., 2017). Phenotypically, the PI 88788 -type resistance is manifested later on during the juvenile 3-4 (J3-J4) stages whereas in the Peking-type resistance (PI 548402), the resistant response occurs as early as J2 stage (Klink et al., 2010; Klink et al., 2009). The analysis of the GmSNAP18, the dominant candidate gene mediating resistance to SCN at the Rhg1 locus revealed polymorphisms of 5 amino acids (AA) between Rhg1-a and Rhg1-b (Liu et al., 2017). It was reported that the Rhg1-a requires the presence of Rhg 4 alleles, while Rhg-b requires only the Rhg1-b allele (Liu et al., 2017; Yu et al., 2016; Kandoth et al., 2017).

The Rhg1-b allele resistance was assumed to result from the copy number variation (cnv) of the genes at Rhg1 locus. In fact, the susceptible lines were reported to have these genes in one copy while in the resistant lines, these genes are found in multiple copies ranging between 3 and 10 copies (Cook et al., 2012; Lee et al., 2015). However, Patil et al., (2019) found that despite Rhg1- b and cnv at that locus being the main parameters conferring resistance to PI 88788, there is a minimum cnv threshold under which an epistatic effect of Rhg1-b and the Peking-type Rhg4 is required to achieve full resistance (Patil et al. 2019).

The SCN resistance gene Rhg4 (Glyma08g108900) encodes a serine hydroxylxymethyltransferase (GmSHMT08) protein (Kandoth et al., 2017; Liu et al., 2012; Korasick et al., 2020). Two genetic polymorphisms (389 G/C and 1,165 A/T in the first exon and in the second exon respectively, that alter the kinetic properties of the SHMT, are responsible for the difference between the susceptible and resistant Rhg4 alleles (Shi et al., 2015; Liu et al., 2012). It has been suggested that changes in the kinetic characteristics of this enzyme may affect folate homeostasis, causing folate deficiency in the nematode induced-feeding sites leading to nematode developmental arrest. Also, folate deficiency in the feeding site may induce hypersensitive response-like programmed cell death, resulting in degeneration of the feeding sites and subsequently SCN resistance (Liu et al., 2012; Korasick et al., 2020). Breeding programs have been focusing on diversifying and/ or combining various sources of SCN resistance in order to maintain soybean resistance in commercial cultivars. In that sense, the UT soybean program developed a population of soybean recombinant inbred lines (RILs) with the goal to combine high yield, seed quality traits, and SCN resistance. The population was screened in the field across Tennessee and in the greenhouse for resistance to SCN race 3 (Hg

70

type 0) and race 2 (Hg type 1.2.5.7). All RILs showed resistant to race 3 but with variable responses to race 2. In the following series of assays, we investigated the population variability in relation to SCN-race 2 response. Thus, we selected two RILs representing differential phenotypic responses: i. resistant, susceptible, or moderately resistant to race 2. We then investigated copy number and sequence variability at rhg1 and Rhg4 as the main known SCN resistance loci in a bid to genotypically explain their phenotypical response to SCN. The RILs showed neither difference in copy number nor significant polymorphisms at both loci. We then explored other potential differences at other QTL through MAS (Marker assisted selection) using SSR markers and BARCSoySNP6K chip. Here we present the main results from the assays. The findings will serve as a baseline to find the causal factor of the phenotypical variability amongst the TN09-029 x NCC05-1168 RILs and provide for a potential new source of resistance to future breeding initiatives.

2. Results

2.1 Greenhouse screening assays for resistance to Hg type 0 (race 3) and Hg type 1.2.5.7 (race 2)

To investigate the phenotypical variation among the 115 F5-derived RIL, we screened them against SCN race 2 and 3 in the greenhouse at Jackson, Tennessee. All RILs exhibited a resistant response to Hg type 0 (Race 3) with FI ranging from 0 to 9. The parental lines TN09-029 and NCC05-1168 were scored Female Index (FI) of 0 and 4, respectively. The susceptible indicator line Hutcheson recorded FI of 95 (Table 3-1). In response to SCN race 2 (HG type 1.2.5.7), 49 RILs scored less that 10 FI and were rated resistant (HG type 1.2.5.7, while 65 scored more than 10 FI and were recorded susceptible (Table 3-2). Among the resistant lines 36 lines scored less than FI of 5 and were considered as highly resistant. On the other hand, among the 65 susceptible lines 61 lines scored a FI less than 50, and therefore considered as tolerant to SCN, race2. This distribution is consistent with our finding that the parental line TN09-029 was resistant (FI = 0), whereas the parental line NCC05- 1168 was susceptible (FI = 95) (Table 3-2). In addition, from seed yield data (chapter 1) the mean seed yield of the SCN resistant lines (3,358,9 kg ha-1) was not significantly different to the mean seed yield of the susceptible lines (3,416.6 kg ha-1) (Fig. 3-1).

71

2.2 SCN race 2 response confirmation in the greenhouse

To confirm the recorded phenotypes, a confirmation test was carried out in the greenhouse on a number of selected lines in addition to the parental lines. The selection was based on the recorded FI during the first round. The confirmation assay showed the same SCN response pattern with TN09-029, SCN-109 scoring FI less than 10, whereas the rest of the lines scored FI more than 10 (Fig 3-2). The susceptible check William 82 had the highest FI after inoculation with SCN race 2. The RILs in this assay exhibited consistent phenotypical responses to SCN race 2 and provided a reliable ground to proceed to the next steps of our investigation.

2.3 Rhg1 and Rhg4 amplification and sequencing

To investigate the sequence variation at rhg1 and 4, we cloned and sequenced the promoter and coding sequences (CDS) region containing of both from various resistant and susceptible RILs. We sequenced the CDS of both loci with the aim of identifying sequence variability between the susceptible and the resistant lines. Second, we sequenced the 2kb region in the promoter, immediately upstream of the translation start codon ATG. The Rhg1 (Gm -SNAP18) locus CDS of the selected RILs matched the sequences of the resistant cultivar Forrest (Peking-type) but showed 6 SNPs in comparison to the susceptible allele from Williams 82 (Table 3-3). The sequence of Rhg4 (Gm SHMT08) locus of the RILs also was identical to the sequence of Forrest, presenting two major SNPs in comparison to the susceptible check (Williams 82) (Table 3-4). In our study, the 2kb promoter region of Rhg1 and Rhg4 loci, revealed 9 distinguishing SNPs in Rhg1 and 6 in Rhg4 (Table 3-5 and 3-6). Considering that all resistant and susceptible lines and Williams 82 showed similar polymorphism, these SNPs don’t seem to be lined to resistance differences among the RILs.

2.4 Copy Number Variation (CNV) at Rhg 1 and 4

To investigate cnv profiles of our RILs, seeds of one resistant, one susceptible, one tolerant line and their parental lines were analyzed for difference in copy number of Rhg1 and Rhg4 using the CGH array. Williams 82, which contains single copy at Rhg1 and 4 was used as a reference and calibrator. The CGH array showed approximately three copies of the Rhg1 and one copy of Rhg4 in the three RILs and their parents compared to the single copy detected in Williams 82 (Fig 3-3,

72

and Fig 3-4). Therefore, absence any variation in the copy number of Rhg1 and Rhg4 in the NILs indicates that resistance differences among the RILs is independent of copy number of Rhg1 or Rhg4.

2.5 Rhg1 &4 transcript quantification by real time RTPCR

To get an insight into the genetic responses of rhg1 to SCN, we analyzed the expression of GmSNAP18 in the selected RILs with reference to the non-infected conditions of the susceptible RIL (SCN-066 or SCN-072) during SCN infection at 3,5,7 and 9 days post infection (dpi) through quantitative real-time PCR (qRT–PCR). At 3 dpi, the transcripts of GmSNAP18 were only slightly induced in the moderately resistant RIL (SCN-099) and the resistant RIL SCN-109. At 5, 7 and 9 dpi no significant difference in the expression between the resistant and the susceptible RILs under infected or non-infected condition can be detected (Fig 3-5). Next, we analyzed the transcript abundance of GmSHMT08 (Rhg4) at 3, 5,7 and 9 dpi (Fig3- 6) in the selected representative RILs. High levels of expression were found in RILs under infected conditions at 3, and 9 dpi. At 7 dpi, GmSHMT08 was only significantly upregulated in the infected condition of the susceptible reference RIL SCN-066. Surprisingly, there was a remarkable upregulation at 5 dpi in the resistant RIL SCN-109, both under non-infected and infected conditions. In contrast, the moderately resistant RIL SCN-099 was almost 6-fold upregulated under non-infected conditions and became significantly downregulated at 5 dpi whereas under infected conditions no significant changes in the expression were observed. To confirm this observed trend, we selected another set of representative RIL; the susceptible SCN-072 and the resistant SCN-137, and we quantified the transcript abundancy of the two genes at 3dpi (Fig 3-7 and 3-8). Similarly, there was a no significant induction of GmSNAP18 whereas GmSHMT08 was slightly induced under infected conditions.

2.6 Marker assisted selection (MAS)

2.6.1 SSR markers

To investigate the genetic basis under SCN resistance (race 2) in our RIL population, we subjected the selected lines to a number of SSR markers linked to Rhg1, 4 and 5. The selected RILs were screened with sets of SSR markers, Satt309 and Sat_168 (linked to Rhg1) and Sat_162 and Satt632 (linked to Rhg4). In addition, we used Satt574 and Satt082, which are

73

linked to cqSCN-005 loci on chromosome 17 and commonly used to screen for resistance to race 2. All the RILs exhibit the same resistance alleles at Rhg1 and 4 loci. The variability between the susceptible and the resistant lines was only detected using markers on chromosome 17 (Rhg5 alternatively cqscn 005) (Table 3-7). However, line SCN-066 showed a segregation pattern, which was confirmed by divergent responses of the SSR markers on chr 17. Surprisingly, even the phenotypically resistant RIL SCN-109 exhibited segregating reactions with satt574. One of the moderate resistant RIL, SCN-042 did also produce an opposite of the expected response exhibiting resistant genotype at both satt574 and satt082. As expected RIL SCN-099, the moderately resistant line, showed a segregating pattern with both markers at chr17. Nevertheless, with marker satt082, the results confirmed the observed phenotype, except one plant that exhibited a heterozygous genotype. Together, these results indicate that unidentified major SCN resistance gene on chromosome 17 control soybean resistance to race 2.

2.6.2 SNP marker analysis

The analysis of Illumina Soy6K SNP chip data was filtered and resulted into a list of 154 high impact SNPs across all the 20 soybean chromosomes. The SNP were then compared to the list of previously reported list of syncytium differentially expressed genes (Rambani et al., 2015). Only three genes (Glyma.13g238200, Glyma.17g0538800, and Glyma.16g196200) with high impact SNPs associated with the susceptible phenotypes overlapped with the syncytium DEGs.

3. Discussion

3.1 The RIL resistance to SCN, race 2 is mediated independently of Rhg1 and 4

Though SCN can enter and trigger the first phase of the infection in the same way in the resistant and the susceptible lines (Mahalingam and Skorupska, 1996), the resistant cultivars have the ability of preventing the second phase of infection process by disrupting the formation and maintenance of the syncytium (Klink et al., 2009; Sobczak et al., 2005; Whitham et al., 2016) . The main sources of resistance to SCN have been reviewed by Concibido et al., (2004) and reported Peking (PI548402), PI88788, and PI437654 to be most predominant (Concibido et al., 2004). From a total of 40 SCN resistance QTLs, only candidate genes for Rhg1 and Rhg4 could be cloned (Cook et al., 2012; Liu et al., 2012). Ever since, different groups of researchers studied

74

the underlying mechanisms of soybean resistance to SCN. In the same line, we investigated the resistance mechanism from a population of recombinant inbred lines segregating for SCN resistance with the focus on their Rhg1 and Rhg4 loci.

Any phenotypic variation including variation in resistance is governed by genetics and environmental factors (Willmore et al., 2007). Here, we sought to unravel genetic factors associated the observed phenotypes with regards to SCN resistance. A female index can be an indicator of aggressiveness and reproducibility easiness for an SCN population on a resistant soybean line (Beeman et al., 2016; Tylka, 2016). So, a higher FI means that the SCN population will easily reproduce and grow on a specific soybean line. The phenotypic screening of the RIL population revealed that the whole population was resistant to race 3 with FI less than 10, whereas a distribution of resistance and susceptibility response was observed against race 2 within the same population. The complete resistance of the population might come from its pedigree parentage. In fact both parents have been previously been reported to be resistant to SCN race ? (Gillen and Shelton, 2011; Gillen and Shelton, 2012). Equally important, at one side, in the background of TN09-029 there is a known resistant line to SCN, Hartwig (Anand, 1992) both at maternal and paternal sides. At the other side, parent NCC05- 1168 has in its background, S99-2281 and TN97-167 reported to have certain levels of resistance to various races of SCN including race 3 (Shannon et al., 2009). In fact, it has been reported that S99-228 exhibits a field and greenhouse broad resistance to SCN especially HG types 2.5.7, 1.2.5.7, 0, 1.2.6.7, and 2.7 (races 1, 2, 3, 5, and 14) (Shannon et al., 2009), while S99-2281 was rated resistant to race 2, 3 and 14 (Gillen and Shelton, 2002). However, the RIL population segregated for resistance to race 2. Our result is consistent with the fact that the parental line, NCC05-1168 is susceptible to SCN race 2 (Gillen and Shelton, 2011). Surprisingly, NCC05- 1168 results from the cross between S99-2281 and TN97-167, reportedly both resistant to SCN race 2 (Shannon et al., 2009; Gillen and Shelton, 2002). It is worth to note that from several crosses involving S99-2281, none is said to have SCN resistance as a breeding trait (Gillen and Shelton, 2009).

Next, we investigated genotypic variations that could be associated with the observed phenotypic variability to race 2. We sequenced the coding sequence of both rhg1 and rhg4 to investigate any significant SNPs in both sequences.

75

The cloning of Rhg4 revealed the critical role played by this locus in SCN resistance (Kandoth et al., 2017; Liu et al., 2012). Through the map-based cloning, Mutation analysis, gene silencing and transgenic complementation of soybean SHMT at Chromosome 08, they found that the alleles of Rhg4 responsible for SCN resistance or susceptibility differed by two SNPs in the CDS which modify a major regulatory property of the enzyme (Liu et al., 2012). Recently, cnv at Rhg4 consisting of a 35.7-kb tandem repeat unit was reported (Patil et al., 2019). Sequence analysis revealed matching Rhg1 sequence polymorphism to the Forrest (Peking) SCN resistance allele, elsewhere called Rhg1-a (Liu et al., 2017). The six identified SNPs in Rhg1 sequence were previously reported to be associated with the resistance to SCN (Liu et al., 2017), (Matsye et al., 2012). Surprisingly, the same SNPs of GmSNAP18 were present in the resistant and the susceptible lines, suggesting that SCN resistance variability among our genotypes is not the result of sequence differences in Rhg1-a. Similarly, the sequence alignment of the Rhg4 (SHMT 08) revealed similarities between our selected lines and Forest. The two SNPs distinguishing Forrest from the susceptible line William 82 were present in all the 6 selected lines. The two SNPs were reported to cause a change in amino acids from Arginine to proline and from tyrosine to asparagine and in some instances explain the variability in SCN response between the resistant and the susceptible lines (Liu et al., 2012; Wu et al., 2016; Rambani et al., 2020). These mutations were functionally reported to have an impact on the encoded GmSHMT08 enzyme by impairing its properties, such as subunits associations, PLP cofactor and substrate binding, and catalytic site structure (Patil et al., 2019).

As previously reported, having both Forrest Rhg1 and Rhg4 alleles suggest a full resistance to SCN (Meksem et al., 2001; Liu et al., 2017). Additionally, Rhg4 is primarily linked to SCN race 3, but it was reported to have some minor effects on resistance to race 2 and 14 (Shaibu et al., 2020). However, both groups used SCN race 3 in their assays, which may infer the involvement of rhg1 and 4 loci in resistance to race 3 but not necessarily to race 2 as all selected lines exhibited identical sequences at Rhg1 and 4. The results suggest that our RILs lines inherited Forrest (Peking) type Rhg1-a and 4 loci and this would partly explain the observed resistance to SCN race 3 (Meksem et al., 2001) but not race 2.

The promoter sequences play an important role in transcription and subsequently gene expression as they control the binding of RNA polymerase to DNA. It has been reported that

76

sequence polymorphism at rhg1 and 4 between the resistant and the susceptible lines. In fact, both alpha-SNAP [PI 88788] and alpha-SNAP [Peking/PI 548402] presented nearly identical sequences, which differ from Williams 82 (Matsye et al., 2012). A recent report suggested that SNPs in the promoter sequences at Rhg1 and Rhg4 may provide an additional layer of resistance to SCN (Patil et al., (2019). In our investigation the promoter sequences at both Rhg1 and 4 could not explain the phenotypical variability in plant response to race 2. Altogether, our results suggest that the phenotypical variation was not due to variability in the promoter sequences at Rhg1 and 4. Copy number variation is reported to be an important source of genetic variation that can affect gene expression profile, phenotypic variation and adaptability in soybean (Anderson et al., 2014; Lye and Purugganan, 2019; McHale et al., 2012). Previous studies pointed out the role played by cnv in SCN resistance (Cook et al., 2012; Cook et al., 2014; Lee et al., 2016). It was reported that both Rhg1 copy number and type were key in controlling SCN resistance, with greater copy number of Rhg1 conferring greater resistance (Yu et al., 2016). In fact, the strong positive selection for high copy number combined with recombination at Rhg1 locus is hypothesized to have led to a wide range of copy number found in soybean populations (Lee et al., 2015). The predominance of PI 88788 (over 95% of marketed cultivars) with high copies of rhg1, as the main source of SCN resistance in the USA, could be an indication that soybean breeders focused on increasing the copy number at rhg1 and its ultimate sole use lead to most of SCN populations currently breaking the resistance (Patil et al., 2019; Cook et al., 2012). Cook et al., (2014), classified SCN copy number variants at Rhg1 in single copy type represented by W82, low copy with 2 to 3 copies represented by Peking and high copy types with 4 to ten copies represented by PI88788.

Recent findings reported that not only cnv at rhg1 copy number is critical to SCN resistance but it also at Rhg4 (Patil et al., 2019). According to their findings, at least 5.6 copies of the PI88788- type hg1 (Rhg1-b) are critical to mediate SCN resistance, regardless of the action of Rhg4 (GmSHMT08) alleles but with the GmSNAP18 copies falling below 5.6, a Peking-type GmSHMT08 allele is compulsory to ensure SCN resistance. Overall, they suggested that the more Rhg4 copies are present, the more resistance to multiple SCN races is achieved (Patil et al., 2019; Yu et al., 2016). All the RILs in our population and the parental lines had only one copy of

77

Rhg4 as in the low copy check Williams 82 (Fig. 3-4). These results suggest also that cnv was not responsible of the variability within the RIL population against SCN race 2.

Increased transcript abundancy of Rhg1 (Cook et al., 2014; Cook et al., 2012) and Rhg4 (Kandoth et al., 2017) were associated with increased resistance to SCN. From the Rhg1 tandem repeat, GmSNAP18 was found to be the strongest and most expressed gene (Liu et al., 2017). Therefore, we investigated whether changes in the expression levels of Rhg1 and Rhg4 could be responsible for the variable response of our RILs to race 2. No significant changes in the expression of Rhg1 were observed among the susceptible and resistant RILs in response to infection by race 2. Similarly, Rhg4 (SHMT08) transcript quantification revealed high expression levels in both susceptible and resistant RILs under infected conditions at 3, and 9 dpi. This is in agreement with previous reports of SHMT08 induction as early as 3 dpi (Liu et al., 2012) and at later stages (5 and 10 dpi) (Lakhssassi et al., 2019). Taken together, our results suggest that the resistance to SCN found in our RILs is independent of Rhg1 and Rhg4 and another unidentified major SCN may be responsible for resistance to race 2 in this population. In this context, it may be important to mention that Schuster et al., (2001) identified novel QTL on LG D2 (chr17) in a region delimited by two SSR markers (Satt082/Satt574) that was associated with resistance to SCN race 14. Interestingly, the same region was later linked to resistance to SCN race 2 (Kazi et al., 2010). We used these markers together with other canonical markers at Rhg1 and 4, to investigate the possible association of this region with resistance to race 2 in our population. The use of the of Satt082/Satt574 was considered after assessing our population pedigree and finding it had cultivar “Hartwig” known to have broad spectrum of resistance to several SCN races including race 2, derived from PI 437654 (Kazi et al., 2010; Anand, 1992). In fact, SCN race 2 can easily reproduce on Peking and PI88788 (Gardner et al., 2017), a finding that may explain why our Peking-type Rhg1 and 4 did not produce a resistant phenotype to SCN race 2 in all RILs. Moreover, the use of these markers was based on their previous use for SCN race 2 screening (Arelli et al., 2006). In general, these markers confirmed the association between resistance phenotype to race 2 and the indicated QTL (cqSCN-005). Of note is that the QTL associated with this marker were previously reported to be located in the interval linked to SDS leaf scorch from from ‘Pyramid’ and ‘Ripley’ (cqSDS-001) (Kazi et al., 2010). A fine mapping of this region will be necessary to identify gene candidates for in-depth functional

78

characterization in order to identify the major resistance gene conferring resistance against race 2. In summary, we investigated the association between the observed resistance to SCN phenotypes and genotypic variability especially at Rhg1 and 4 loci. We subjected a population of RIL segregating for SCN resistance to SCN race 2 and 3 infections in the greenhouse. We also examined Rhg1 and 4 for DNA polymorphism in the coding sequence and promoter region, sequences, copy number variation, and gene expression levels. or any other QTL that may explain the phenotypical variation between the resistant, the susceptible and the moderately susceptible lines. In the greenhouse, all RIL were resistant to SCN race 3 and segregated for SCN race 2. The resistance to SCN race 3 is consistent with could be the parental pedigree and the Rhg1 and 4 sequences. However, neither sequence variability in Rhg1 and 4, cnv, nor transcript abundance could explain the observed variability in resistance to SCN race 2. Interestingly, the observed phenotypes could only be associated with the variability found with SSR markers at chromosome 17. Taken together, our results point into a new major resistance gene controlling soybean response to SCN race 2.

3.2 Favorable SCN resistance alleles could be exploited while maintaining high yield potential

The average yield from resistant and susceptible lines were not statistically different. This result is consistent with what was reported by De Bruin and Pedersen, (2008). However, our results were in contradiction to reports from Donald et al., (2006). Our results suggest that resistance to SCN race 2 in our population does not necessarily impact yield and support the possibility for the selection of new cultivars with high yield potential especially in SCN highly infested fields. Furthermore, considering that all RILs were resistant to SCN race 3, this result means that the resistant lines combined double races SCN resistance.

4. Conclusion

SCN is major soybean production threat in the US and globally. Rhg1 and Rhg4 are major SCN- resistance loci deployed in most of commercial resistant cultivars. However, this resistance is becoming progressively broken and associated yield losses are increasing. We examined whether the resistance found in a segregating population of RILs resulting from parents differing by their

79

resistance to SCN-races, was associated to Rhg1 and Rhg4 loci. The DNA polymorphism analysis at CDS and coding sequences, the copy number variation, transcript quantification and SNP marker assays did not reveal any causative difference between the resistant and susceptible lines to SCN, race 2. However, SSR markers linked to SCN- resistance QTL on chromosome 17 revealed a differentiating SCN, race 2 resistance pattern among the RILs suggesting its genetic action in conferring the SCN- resistance to the resistant lines. We further analyzed the seed yield of the resistant lines and the susceptible lines to check whether the seed yield was negatively impacted by the presence of SCN, race 2 resistance alleles. The means seed yield of the resistant and susceptible lines were not statistically different, suggesting that the resistance alleles to SCN-race 2 were not causing any yield drag in the resistant lines.

5. Material and Methods

5.1 Greenhouse screening assays for resistance to Hg type 0 (race 3) and Hg type 1.2.5.7 (race 2)

A total of 114 RILs were screened for SCN phenotype in the greenhouse at Jackson, TN at the USDA-ARS research station. Line SCN-13 was dropped off the assay because of seed shortage. In addition, two parental lines, 7 indicator lines (ILs) and two susceptible controls, “Hutcheson” (Buss and Camper, 1988) and 5601T (Pantalone et al., 2003) were included in the test. The parental lines, TN09-029 and NCC05-1168 were also included in the essay. The ILs included PI 548402 (Peking) (IL1), PI 88788 (IL2), PI 90763 (IL3), PI 437654 (IL4), PI 209332 (IL5), PI 89772 (IL6) and PI 548316 (Cloud) (IL7). The bioassays were performed as per the method described by Arelli et al., (2017). Briefly, 5 seeds from each line were planted separately in clay micro pots filled with steam-sterilized soil and inoculated with approximately 2500 eggs of homogenous population of SCN (race 2 and 3). Each line was tested in five single plant replications. Five weeks post inoculation, SCN cysts were extracted and separated from the root and other plants debris through a series of sieves with pasteurized water. The extracted cysts were collected, suspended in water in 50-ml tube and kept in cold storage for further counting under a stereomicroscope at the University of Tennessee, Department of Plant, Sciences in Dr Hewezi’s Lab. Based on HG type classification established by Niblack et al., (2002), cyst numbers were recorded for each RIL and translated into female indices with reference to the 8

80

indicator lines and 2 susceptible lines. The FI was used to determine the RIL population reaction to SCN and classify the respective lines into the resistant or susceptible group.

5.2 SCN response confirmation in the green house

From the SCN cyst counts in greenhouse at Jackson Tennessee, six representative RILs were selected. These lines included SCN-099 and SCN-042 (tolerant), SCN-066 and SCN-072 (susceptible), and SCN-109, SCN-137 (resistant). These lines were selected based on FI values. The two parental lines TN09-029 and NCC05-1168 as well as the susceptible check Williams 82 (PI518671) were also included in the assay. Homogenous nematode populations of races 2 (HG Type 1.2.5.7) and 3 (HG Type 0), that were maintained In Hewezi lab were used to inoculate 8 pots of each line as per the method described by Arelli et al., (2017). Briefly, one seed of each line was planted in a plastic micro pot filled with steam-sterilized soil and inoculated with 2500 eggs of SCN races 2 (HG Type 1.2.5.7) and 3 (HG Type 0). Each line was replicated 8 times, and the roots from the same plant were bulk-collected together 5 weeks post inoculation to make 8 replicated samples per line. SCN cysts from each plant were extracted and suspended in test tubes containing tap water. The counting was done using a stereomicroscope at the University of Tennessee, Department of Plant sciences in Dr. Hewezi’s Lab. Cysts count numbers were then translated into female indices (FI) with references to Williams 82 susceptible indictor line. The FI was used to confirm the individual line reaction to SCN.

5.3 Rhg1 and Rhg4 PCR amplification and sequencing

Soybean seeds of the RILs SCN-137, SCN-099, SCN-066, SCN-109, and SCN-072 were water- soaked for 30 min, followed by surface sterilization in 10% (v/v) bleach solution for about 10 min, and thoroughly rinsed in running tap water. Seeds germination was done rolled papers for 3 days at 26°C in the dark. Root tissue was collected in sterile Eppendorf tubes and frozen in liquid nitrogen. Frozen tissues from the tubes were ground, homogenized using a mini bead beater (Stanley Goldberg) and stored at -80oC. Genomic DNA was extracted from root ground tissues using DNeasy Plant Mini Kit (Qiagen). The primers flanking the targeted coding sequence of rhg1 and rhg4 region were designed and the respective fragments were amplified by PCR. The promoter sequences (2kb) upstream of both loci were also PCR-amplified and sequenced to check for any significant variability in the sequences. The primers used for amplification of CDS

81

and promoter regions are provided in Table 3-8, in appendix. The amplicons obtained by PCR were isolated by agarose gel electrophoresis and purified using the Zymo-research DNA purification kit following the manufacturer’s guidelines. Purified PCR products were sent for sequencing and obtained sequences were aligned to the published Williams 82 and Forrest rhg1 and 4 sequences (www.soybase.org).

5.4 Copy Number Variation (CNV) at Rhg1 and 4

Copy number Rhg1 and Rhg4 was determined in the parental lines and selected resistant and susceptible RILs using array comparative genomic hybridization (aCGH) at the University of Minnesota. The four genotypes were each compared to genotype 'Williams 82', which is known to have only one copy at these loci. The aCGH microarray design, chemistry, and data analysis were all performed as described by Dobbels et al., (2017). Final images of the aCGH data were generated using Spotfire DecisionSite software.

5.5 qPCR quantification of the expression levels of Rhg1 and 4

5.5.1 Nematode inoculation

The nematode inoculation and tissue collection was done as previously described by Rambani et al., 2015). Briefly, after seeds germinating described in 2.1, freshly-hatched J2, race 2 and 3 nematodes were thoroughly rinsed and re-suspended in 0.1% (w/v) sterilized agarose (450 J2 per 100 mL). Three-days-old seedlings were then transferred on sterile blue blotter paper moistened with MES (4-Morpholineethanesulfonic acid monohydrate)-buffered H2O in petri dishes (150 mm) and inoculated with about 2500 J2, directly applied on the roots. Each line was inoculated in three biological replications and the respective controls were mock-inoculated with agarose (500 mL of 0.1% (w/v). Roots of inoculated seedlings were then covered again with sterilized and sterile MES-buffer-soaked blue blotter paper (pH 6.4). The petri dishes were hermetically sealed and incubated at 26°C in 16 h of light (75 mmol m–2 s –1)/8 h of dark conditions. The nematode infection success was Fuschisin stain confirmed in a number of the inoculated samples as previously described by (Daykin and Hussey (1986).

82

5.5.2 Tissue collection and total RNA extraction and qPCR assays

The expression of Rhg1 and 4 after infection with SCN, was quantified at different time points after inoculation (dpi). The total RNA from infected and non-infected samples was extracted from frozen ground root tissues according to Verwoerd et al., (1989). The extracted total RNA was DNase treated using DNase I kit (New England Biolabs). Fifty nanograms of DNase treated RNA were used for gene expression quantification using the Verso One-Step qPCR SYBR Kit (Fisher Scientific) following the manufacturer’s protocol. qPCR reactions were run on a QuantStudio™ 6 Flex Real-Time PCR System (Applied Biosystems) using the manufacturer’s protocol. A dissociation curve was generated after qPCR amplification using the following temperature settings: 95°C for 15 s, 60°C for 1 min 15 s, and 95°C for 15 sec. The qPCR data were generated from three biological replicates, each with 3 technical replicates. The reference gene, the soybean ubiquitin (Glyma.20G141600) was used as reference gene to normalize gene expression abundance. Determination of changes in gene expression due to SCN-infection in our samples in comparison to noninfected control samples was performed as previously described by Hewezi et al., (2015). The primer sequences used in qPCR assays are provided in Table 3-7 in appendix.

5.6 Marker assisted selection (MAS)

5.6.1 SSR markers

The marker assisted selection was done as previously described by Arelli et al., (2017). Briefly, seeds of the selected lines were grown in the greenhouse at USDA ARS, Jackson, Tennessee. DNA of leaf tissues were purified with Whatman FTA purification reagent, and used in 50 ml PCR reactions with simple sequence repeat (SSR) markers using with the following thermal cycles: 95°C for 2 min; 33 cycles of 92°C for 30 s, 47°C or 50°C for 30 s, 68°C for 30 s; 68°C for 5 min. PCR reactions were performed with primers for two co-segregating SSR markers (satt 309 and sat_168) for rhg1 (Cregan et al., 1999), Satt 632 and sat_162 for Rhg4 (Arelli et al., 2006), (Arelli et al., 2007). In addition, two micro satellites (satt 574 and satt082) for QTL cqSCN-005 (Schuster et al., 2001; Arelli et al., 2006; Wu et al., 2009) were used to screen resistance phenotypes to race 2.

83

5.6.2 SNP markers

The seeds of the selected RILs and parental lines were grown in the greenhouse at the University of Tennessee, Knoxville. Genomic DNA was extracted with Qiagen DNA extraction kit following the manufacturer’s guidelines. 50 ng of DNA per samples were sent to the USDA- ARS in Beltsville, MD for analysis of SCN samples on the Illumina Soy6K SNP chip. The SNP data for were then extracted from the raw data file using PLINK plugin v2-1-4 and converted to VCF file and ultimately analyzed with software SNPEff v4.3.1t for genetic variant annotation and their effect prediction. The predicted effects were filtered by the impact level (moderate and high impact).

84

References

Allen TW, Damcone JP, Dufault NS, Faske TR, Hershman DE, Hollier CA, T. Isakeit T, Kemerait RC, Kleczewski NM, Kratochvil RJ, Mehl HL (2017) Southern United States soybean disease loss estimates for 2017. In Proceedings of the Southern Soybean Disease Workers (Ssdw), 10–15. Anand, SC (1992) Registration of ‘Hartwig’ Soybean. Crop Science 32: 1069–70. Anderson JE, Kantar MB, Kono TY, Fu F, Stec AO, Song Q, Cregan PB, Specht JE, Diers BW, Cannon SB, McHale LK (2014) A Roadmap for functional structural variants in the soybean Genome. G3: Genes, Genomes, Genetics 4: 1307–18. Arelli PR, Pantalone VR, Allen FL, Mengistu A (2007). Registration of soybean germplasm JTN-5303. Journal of Plant Registrations 1: 69–70. Arelli PR, Young LD, Mengistu A (2006) Registration of high yielding and multiple disease‐resistant soybean germplasm JTN‐5503. Crop Science 46: 2723–24. Arelli PR, Pantalone VR, Allen FL, Mengistu A, Fritz LA. (2015) Registration of JTN-5203 soybean germplasm with resistance to multiple cyst nematode populations.” Journal of Plant Registrations 9: 108–14. Arelli PR, Shannon JG, Mengistu A, Gillen AM, Fritz LA (2017) Registration of conventional soybean germplasm JTN-4307 with resistance to nematodes and fungal diseases.” Journal of Plant Registrations 11: 192. Bayless AM, Zapotocny RW, Han S, Grunwald DJ, Amundson KK, Bent AF (2019) The Rhg1-a (Rhg1 low-copy) nematode resistance source harbors a copia-family retrotransposon within the Rhg1-encoded α-SNAP gene. Plant Direct 3: 1–19. Beeman AQ, Harbach CJ, Marett CC, Tylka GL (2016) Soybean cyst nematode HG Type test results differ among multiple samples from the same field but the management implications are the same. Plant Health Progress, 17: 160–62. Brucker E, Niblack T, Kopisch-Obuch FJ, Diers BW (2005) The Effect of Rhg1 on reproduction of Heterodera glycines in the field and greenhouse and associated effects on agronomic traits. Crop Science 45: 1721–27. Buss GR, Camper Jr HM, Roane CW (1988) Registration of ‘HUTCHESON’ soybean. Crop Science 28: 1024–25.

85

Chen SY, Porter PM, Orf JH, Reese CD, Stienstra WC, Young ND, Walgenbach DD, Schaus PJ, Arlt TJ, Breitenbach FR (2001) Soybean cyst nematode population development and associated soybean yields of resistant and susceptible cultivars in Minnesota. Plant Disease 85 (7): 760–66. Concibido VC, Diers BW, Arelli PR (2004) A Decade of QTL mapping for cyst nematode resistance in soybean. Crop Science 44: 1121–31. Concibido VC, Lange DA, Denny RL, Orf JH, Young ND (1997) Genome mapping of soybean cyst nematode resistance genes in ‘Peking’, PI 90763, and PI 88788 using DNA markers. Crop Science 37: 258–64. Cook DE, Bayless AM, Wang K, Guo X, Song Q, Jiang J, Bent AF (2014) Distinct copy number, coding Sequence, and locus methylation patterns underlie Rhg1-mediated soybean resistance to soybean cyst nematode. Plant Physiology 165: 630-47. Cook DE, Lee TG, Guo X, Melito S, Wang K, Bayless AM, Wang J, Hughes TJ, Willis DK, Clemente TE, Diers BW (2012) Copy number variation of multiple genes at Rhg1 mediates nematode resistance in soybean. Science 338: 1206–9. Cregan PB, Mudge J, Fickus EW, Danesh D, Denny R, Young ND (1999) Two simple sequence repeat markers to select for soybean cyst nematode resistance conditioned by the Rhg1 locus. Theoretical and Applied Genetics 99: 811–18. Daykin, Hussey, RS (1986) Staining and histopathological techniques in nematology. In Barker KR, Carter CC, Nasser JN, eds, An Advanced Treatise on Meloidogyne: Biology and Control. 2: 39-48 Dobbels AA, Michno JM, Campbell BW, Virdi KS, Stec AO, Muehlbauer GJ, Naeve SL, Stupar RM (2017) An induced chromosomal translocation in soybean disrupts a KASI ortholog and is associated with a high-sucrose and low-oil seed phenotype. G3: Genes, Genomes, Genetics 7: 1215–23. Gardner M, Heinz R, Wang J, Mitchum MG (2017) Genetics and adaptation of soybean cyst nematode to broad spectrum soybean resistance. G3: Genes, Genomes, Genetics 7: 835–41. Gillen AM, Shelton GW (2002) Uniform Soybean Tests. 180. Gillen AM, Shelton GW (2009) Uniform Soybean Tests. 200. Gillen AM, Shelton GW (2011) Uniform Soybean Tests. Gillen AM, Shelton GW (2012) Uniform Soybean Tests: Southern States. 12.

86

Glover KD, Wang D, Arelli PR, Carlson SR, Cianzio SR, Diers BW (2004) Near isogenic lines confirm a soybean cyst nematode resistance gene from PI 88788 on linkage group J. Crop Science 44: 936-41. Hewezi T, Juvale PS, Piya S, Maier TR, Rambani A, Rice JH, Mitchum MG, Davis EL, Hussey RS, Baum TJ (2015) The cyst nematode effector protein 10A07 targets and recruits host posttranslational machinery to mediate its nuclear trafficking and to promote parasitism in Arabidopsis. Plant Cell 27: 891–907. Kadam S, Vuong TD, Qiu D, Meinhardt CG, Song L, Deshmukh R, Patil G, Wan J, Valliyodan B, Scaboo AM, Shannon JG (2015) Genomic-assisted phylogenetic analysis and marker development for next generation soybean cyst nematode resistance breeding. Plant Science 242: 342–50. Kandoth PK, Liu S, Prenger E, Ludwig A, Lakhssassi N, Heinz R, Zhou Z, Kandoth PK, Liu S, Prenger E, Ludwig A, Lakhssassi N, Heinz R, Zhou Z, Howland A, Gunther J, Eidson S, Dhroso A (2017) Systematic mutagenesis of serine hydroxymethyltransferase reveals an essential role in nematode resistance. Plant Physiology 175: 1370–80. Kazi S, Shultz J, Afzal J, Hashmi R, Jasim M, Bond J, Arelli PR, Lightfoot DA (2010) Iso- lines and inbred-lines confirmed loci that underlie resistance from cultivar ‘Hartwig’ to three soybean cyst nematode populations. Theoretical and Applied Genetics 120: 633–44. Kim DG, Riggs RD, Robbins RT, Rakes L (1997) Distribution of races of Heterodera glycines in the Central United States. Journal of Nematology 29: 173–79. Kim M, Diers BW (2013) Fine mapping of the SCN resistance QTL CqSCN-006 and CqSCN- 007 from Glycine soja PI 468916. Crop Science 53: 775–85. Kim M, Hyten DL, Bent AF, Diers BW (2010) Fine mapping of the SCN resistance locus Rhg1-b from PI 88788 . The Plant Genome 3: 81–89. Klink VP, Hosseini P, Matsye P, Alkharouf NW, Matthews BF (2009) A gene expression analysis of syncytia laser microdissected from the roots of the Glycine Max (Soybean) Genotype PI 548402 (Peking) undergoing a resistant reaction after infection by Heterodera glycines (soybean cyst nematode). Plant Molecular Biology. 71: 525-67. Klink VP, Hosseini P, Matsye P, Alkharouf NW, Matthews BF (2010) Syncytium gene expression in Glycine Max[PI 88788] roots undergoing a resistant reaction to the parasitic nematode Heterodera glycines. Plant Physiology and Biochemistry 48: 176–93.

87

Korasick DA, Kandoth PK, Tanner JJ, Mitchum MG, Beamer LJ (2020) Impaired folate binding of serine hydroxymethyltransferase 8 from soybean underlies resistance to the soybean cyst nematode. Journal of Biological Chemistry 295: 3708–18. Lakhssassi N, Patil G, Piya S, Zhou Z, Baharlouei A, Kassem MA, David A. Lightfoot, Hewezi T, Barakat A, Nguyen HT, Meksem K (2019) Genome reorganization of the GmSHMT gene family in soybean showed a lack of functional redundancy in resistance to soybean cyst nematode. Scientific Reports 9: 1–16. Lee TG, Diers BW, Hudson ME (2016) An efficient method for measuring copy number variation applied to improvement of nematode resistance in soybean. Plant Journal 88: 143– 53. Lee TG, Kumar I, Diers BW, and Hudson ME (2015) Evolution and selection of Rhg1, a copy-number variant nematode-resistance locus. Molecular Ecology 24: 1774–91. Lian Y, Guo J, Li H, Wu Y, Wei H, Wang J, Li J, Lu W (2017) A new race (X12 ) of soybean cyst nematode in China. Journal of Nematology 49: 321–26. Liu S, Kandoth PK, Lakhssassi N, Kang J, Colantonio V, Heinz R, Yeckel G, Zhou Z, Bekal S, Dapprich J, Rotter B (2017) The Soybean GmSNAP18 gene underlies two types of resistance to soybean cyst nematode. Nature Communications 8: 1–11. Liu S, Kandoth PK, Warren SD, Yeckel G, Heinz R, Alden J, Yang C, Jamai A, El- Mellouki T, Juvale PS, Hill J (2012) A soybean cyst nematode resistance gene points to a new mechanism of plant resistance to pathogens. Nature 492: 256–60. Lye ZN, Purugganan MD (2019) Copy number variation in domestication. Trends in Plant Science 24: 352–65. Mahalingam R, Skorupska HT (1996) Cytological expression of early response to infection by Heterodera glycines ichinohe in resistant PI 437654 soybean. Genome 39: 986–98. Matsye PD, Lawrence GW, Youssef RM, Kim KH, Lawrence KS, Matthews BH, Klink VP (2012) The expression of a naturally occurring, truncated allele of an α-SNAP gene suppresses plant parasitic nematode infection. Plant Molecular Biology 80: 131–55. Mccarville MT, Marett CC, Mullaney MP, Gebhart GD, Tylka GL (2017) Increase in soybean cyst nematode virulence and reproduction on resistant soybean varieties in Iowa from 2001 to 2015 and the effects on soybean yields. Plant Health Progress 18: 146–55. McHale LK, Haun WJ, Xu WW, Bhaskar PB, Anderson JE, Hyten DL, Gerhardt DJ,

88

Jeddeloh JA, Stupar RM (2012) Structural variants in the soybean genome localize to clusters of biotic stress-response genes. Plant Physiology 159: 1295–1308. Meksem K, Pantazopoulos P, Njiti VN, Hyten LD, Arelli PR, Lightfoot DA (2001) Forrest’ resistance to the soybean cyst nematode is bigenic: saturation mapping of the Rhg 1 and Rhg 4 loci. Theoretical and Applied Genetics 103: 710–17. Mitchum M (2016) Soybean resistance to the soybean cyst nematode Heterodera glycines: An update. Phytopathology 106: 1444-50. Niblack TL, Colgrove AL, Colgrove K, Bond JP (2008) Shift in virulence of soybean cyst nematode is associated with use of resistance from PI 88788. Plant Health Progress 9: 29. Niblack TL, Arelli PR, Noel GR, Opperman CH, Orf JH, Schmitt DP, Shannon JG, Tylka GL (2002) A revised classification scheme for genetically diverse populations of Heterodera glycines. Journal of Nematology 34: 279–88. Pantalone VR, Allen FL, Landau-Ellis D (2003) Registration of 5601T soybean. Crop Science 43:1123-1124. Patil GB, Lakhssassi N, Wan J, Song L, Zhou Z, Klepadlo M, Vuong TD, Stec AO, Kahil SS, Colantonio V, Valliyodan B (2019) Whole-genome re-sequencing reveals the impact of the interaction of copy number variants of the Rhg1 and Rhg4 genes on broad-based resistance to soybean cyst nematode. Plant Biotechnology Journal 17: 1595–1611. Rambani A, Pantalone V, Yang S, Rice JH, Song Q, Mazarei M, Arelli PR, Meksem K, Stewart CN, Hewezi T (2020) Identification of introduced and stably inherited DNA methylation variants in soybean associated with soybean cyst nematode parasitism. New Phytologist 227: 168–84. Rambani A, Rice JH, Liu J, Lane T, Ranjan P, Mazarei M, Pantalone V, Stewart CN, Staton M, Hewezi T (2015) The methylome of soybean roots during the compatible interaction with the soybean cyst nematode. Plant Physiology 168: 1364–77. Riggs RD, Schmitt DP (1988) Complete characterization of the race scheme for Heterodera glycines. Journal of Nematology 20: 392–95. Schuster I, Abdelnoor RV, Marin SRR, Carvalho VP, Kiihl RAS, Silva JFV, Sediyama CS, Barros EG, Moreira MA (2001) Identification of a new major QTL associated with resistance to soybean cyst nematode (Heterodera glycines). Theoretical and Applied Genetics 102: 91–96.

89

Shaibu AS, Li B, Zhang S, Sun J (2020) Soybean cyst nematode-resistance: gene identification and breeding strategies. The Crop Journal, 1–13. Shannon JG, Lee JD, Wrather JA, Sleper DA, Mian MAR, Bond JP, Robbins RT (2009) Registration of S99-2281 soybean germplasm line with resistance to frogeye leaf spot and three nematode species. Journal of Plant Registrations 3: 94–98. Shi Z, Liu S, Noe J, Arelli P, Meksem K, Li Z (2015) SNP identification and marker assay development for high-throughput selection of soybean cyst nematode resistance. BMC Genomics, 16: 314. Sobczak M, Avrova A, Jupowicz J, Phillips MS, Ernst K, Kumar A (2005) Characterization of susceptibility and resistance responses to potato cyst nematode (Globodera Spp.) infection of tomato lines in the absence and presence of the broad-spectrum nematode resistance hero gene. Molecular Plant-Microbe Interactions 18: 158–68. Tylka GL (2016) Understanding soybean cyst nematode HG types and races. Plant Health Progress 17: 149–51. Verwoerd TC, Dekker BMM, Hoekema A (1989) A small-scale procedure for the rapid isolation of plant RNAs. Nucleic Acids Research 17: 2362. Vuong TD, Sleper DA, Shannon JG, Nguyen HT (2010) Novel quantitative trait loci for broad-based resistance to soybean cyst nematode (Heterodera glycines ichinohe) in soybean PI 567516C. Theoretical and Applied Genetics 121: 1253–66. Wang X, Bergstrom GC, Chen S, Thurston DM, Cummings JA, Handoo ZA, Hult MN, Skantar AM (2017) First report of the soybean cyst nematode, Heterodera glycines, in New York. Plant Disease 101: 1957. Whitham SA, Qi M, Innes RW, Ma W, Lopes-Caitar V, Hewezi T (2016) Molecular soybean-pathogen interactions. Annual Review of Phytopathology 54: 443-68. Willmore KE, Young NM,. Richtsmeier JT (2007) Phenotypic variability: its components, measurement and underlying developmental processes. Evolutionary Biology 34: 99–120. Wu XY, Zhou GC, Chen YX, Wu P, Liu LW, Ma FF, Wu M, Liu CC, Zeng YJ, Chu AE, Hang YY (2016) Soybean cyst nematode resistance emerged via artificial selection of duplicated Serine hydroxymethyltransferase genes. Frontiers in Plant Science 7: 1–8. Wu X, Blake S, Sleper DA, Shannon JG, Cregan P, Nguyen HT (2009) QTL, additive and epistatic effects for SCN resistance in PI 437654. Theoretical and Applied Genetics 118:

90

1093–1105. Yu N, Lee TG, Rosa DP, Hudson M, Diers BW (2016) Impact of Rhg1 copy number, type, and interaction with Rhg4 on resistance to Heterodera glycines in soybean.” Theoretical and Applied Genetics 129: 2403–12.

91

Appendix

Table 3-1: SCN-race 3 reaction of the RILs, Parental lines and Indicator Lines based on FI. Genotypes #1 #2 #3 #4 #5 Total Mean STD FI Reaction Error SCN-099 0 0 0 0 0 0 0 0 R SCN-115 0 0 0 0 0 0 0 0 0 R TN09-029 0 0 0 0 0 0 0 0 0 R SCN-108 1 0 0 0 0 1 0 0 1 R SCN-135 0 1 0 0 1 0 0 1 R SCN-119 1 0 0 1 0 0 1 R SCN-015 0 0 0 2 0 2 0 0 1 R SCN-049 2 0 0 0 0 2 0 0 1 R SCN-058 0 1 0 1 0 2 0 0 1 R SCN-107 1 0 0 1 0 2 0 0 1 R SCN-110 0 1 0 0 1 2 0 0 1 R SCN-118 0 0 1 0 1 2 0 0 1 R SCN-066 1 0 0 1 2 1 0 2 R SCN-102 1 1 0 0 2 1 0 2 R SCN-112 1 1 0 0 2 1 0 2 R SCN-127 2 0 0 0 2 1 1 2 R SCN-133 0 0 0 2 2 1 1 2 R SCN-138 0 0 1 1 2 1 0 2 R SCN-047 2 1 0 0 0 3 1 0 2 R SCN-089 0 3 0 0 0 3 1 1 2 R SCN-091 0 0 1 0 2 3 1 0 2 R SCN-104 0 0 1 1 1 3 1 0 2 R SCN-113 0 1 1 0 1 3 1 0 2 R SCN-087 1 1 0 2 1 0 2 R SCN-068 0 0 2 1 3 1 0 3 R SCN-098 0 2 0 1 3 1 0 3 R SCN-065 2 2 0 0 0 4 1 0 3 R SCN-069 0 0 1 1 2 4 1 0 3 R SCN-139 1 2 0 1 0 4 1 0 3 R SCN-027 0 1 2 2 0 5 1 0 4 R SCN-070 1 1 0 0 3 5 1 1 4 R SCN-081 2 1 2 0 0 5 1 0 4 R SCN-090 3 2 0 0 0 5 1 1 4 R SCN-092 2 0 0 0 3 5 1 1 4 R

92

Table 3-1 continued SCN-100 0 3 1 0 4 1 1 4 R SCN-109 1 0 1 1 2 5 1 0 4 R SCN-125 1 0 0 2 2 5 1 0 4 R SCN-131 2 1 1 0 4 1 0 4 R SCN-136 0 0 2 2 4 1 1 4 R SCN-048 3 1 0 0 2 6 1 1 4 R SCN-072 0 1 0 2 3 6 1 1 4 R SCN-117 0 1 1 0 4 6 1 1 4 R SCN-128 1 3 0 0 2 6 1 1 4 R SCN-134 1 3 1 0 1 6 1 0 4 R NCC05-1168 2 1 1 1 1 6 1 0 4 R SCN-093 0 2 2 1 5 1 0 4 R SCN-130 0 0 1 4 5 1 1 4 R SCN-078 0 2 2 4 1 1 5 R SCN-061 3 0 2 1 1 7 1 1 5 R SCN-063 0 2 2 1 2 7 1 0 5 R SCN-067 2 1 3 0 1 7 1 1 5 R SCN-129 3 0 2 2 0 7 1 1 5 R SCN-096 4 1 1 0 6 2 1 5 R SCN-101 5 0 0 1 6 2 1 5 R SCN-103 2 1 0 3 6 2 1 5 R SCN-123 2 1 3 0 6 2 1 5 R SCN-041 2 3 0 2 1 8 2 1 6 R SCN-052 0 5 2 1 0 8 2 1 6 R SCN-057 0 1 3 2 2 8 2 1 6 R SCN-120 0 0 3 2 3 8 2 1 6 R SCN-114 0 3 2 2 7 2 1 6 R SCN-105 7 0 0 0 7 2 2 6 R SCN-050 3 1 3 0 7 2 1 6 R SCN-042 3 2 1 1 2 9 2 0 6 R SCN-043 2 2 4 1 0 9 2 1 6 R SCN-026 2 1 3 3 0 9 2 1 6 R SCN-034 3 1 3 2 1 10 2 0 7 R SCN-037 3 1 3 1 8 2 1 7 R SCN-056 3 1 2 2 8 2 0 7 R SCN-106 0 1 0 6 3 10 2 1 7 R SCN-035 2 2 3 2 1 10 2 0 7 R SCN-084 2 1 2 2 3 10 2 0 7 R

93

Table 3-1 continued SCN-088 3 1 2 6 2 1 7 R SCN-095 1 1 5 2 1 10 2 1 7 R SCN-046 0 1 5 3 1 10 2 1 7 R SCN-014 1 2 3 6 2 1 7 R SCN-021 2 3 0 4 2 11 2 1 8 R SCN-054 2 2 1 3 3 11 2 0 8 R SCN-137 1 0 0 10 0 11 2 2 8 R SCN-045 4 5 0 1 1 11 2 1 8 R SCN-008 1 3 2 1 4 11 2 1 8 R SCN-028 1 1 9 0 0 11 2 2 8 R SCN-116 3 2 1 3 9 2 0 8 R SCN-040 2 2 2 3 9 2 0 8 R SCN-039 1 3 2 3 9 2 0 8 R SCN-086 0 0 7 2 9 2 2 8 R SCN-006 2 1 3 3 9 2 0 8 R SCN-017 1 4 2 7 2 1 8 R SCN-053 0 4 3 7 2 1 8 R SCN-025 1 2 4 7 2 1 8 R SCN-012 0 3 5 4 0 12 2 1 9 R SCN-032 3 2 3 3 1 12 2 0 9 R SCN-111 1 2 3 3 3 12 2 0 9 R SCN-132 3 0 2 2 5 12 2 1 9 R SCN-031 3 2 2 3 2 12 2 0 9 R SCN-044 7 0 0 5 0 12 2 2 9 R SCN-051 0 4 7 0 1 12 2 1 9 R SCN-019 1 1 6 3 1 12 2 1 9 R SCN-020 1 3 3 2 3 12 2 0 9 R SCN-023 2 3 4 1 2 12 2 1 9 R SCN-022 4 1 1 3 3 12 2 1 9 R SCN-029 0 2 4 4 10 3 1 9 R SCN-122 1 4 5 3 2 9 R SCN-126 3 2 3 2 10 3 0 9 R SCN-038 4 3 2 1 10 3 1 9 R SCN-060 1 1 2 6 10 3 1 9 R SCN-024 2 3 2 3 10 3 0 9 R SCN-016 0 4 6 3 0 13 3 1 9 R SCN-030 4 5 2 2 0 13 3 1 9 R SCN-033 2 3 3 3 2 13 3 0 9 R

94

Table 3-1 continued SCN-036 2 1 3 5 2 13 3 1 9 R SCN-121 2 2 2 3 4 13 3 0 9 R SCN-124 4 4 3 2 0 13 3 1 9 R SCN-055 2 3 3 2 3 13 3 0 9 R SCN-059 7 0 1 5 0 13 3 1 9 R SCN-018 2 5 3 2 1 13 3 1 9 R 5601T 47 5 14 28 45 139 28 8 100 S PI 548402 0 1 0 0 1 0 0 1 R (Peking) PI 88788 0 3 3 2 2 5 R PI 90763 0 0 0 0 0 0 0 0 0 R PI 437654 0 0 0 0 0 0 0 R PI 209332 3 3 1 3 10 3 1 9 R PI 89772 0 1 1 1 1 2 R PI 548316 (Cloud) 3 1 4 2 1 7 R Hutcheson 30 18 20 30 34 132 26 3 95 S R for races in the table above indicates a female index 10; S indicates a female index . Female index (FI) = (average number of females on test line)/ (average number of females on 5601T) × 100.

95

Table 3-2: SCN-race2 reaction of the RILs, Parental lines and Indicator Lines based on FI. Genotype #1 #2 #3 #4 #5 Total Mean STD FI Reaction Error SCN-109 2 0 1 2 0 5 1 0 0 R SCN-137 1 2 2 1 2 8 2 0 0 R TN09-029 0 2 4 1 1 8 2 1 0 R SCN-041 1 2 0 5 8 2 1 0 R SCN-086 3 2 3 1 1 10 2 0 0 R SCN-108 1 3 4 2 1 0 R SCN-114 0 3 3 2 2 10 2 1 0 R SCN-115 4 2 1 3 10 3 1 0 R SCN-113 2 2 1 1 7 13 3 1 1 R SCN-095 2 6 4 0 1 13 3 1 1 R SCN-048 3 3 3 2 11 3 0 1 R SCN-106 3 1 2 2 6 14 3 1 1 R SCN-107 5 2 2 3 12 3 1 1 R SCN-116 1 5 2 4 3 15 3 1 1 R SCN-054 3 4 2 2 7 18 4 1 1 R SCN-105 1 6 6 4 1 18 4 1 1 R SCN-128 7 0 2 4 6 19 4 1 1 R SCN-026 8 3 3 2 5 21 4 1 1 R SCN-117 8 2 1 3 8 22 4 2 1 R SCN-081 3 1 2 7 9 22 4 2 1 R SCN-047 1 3 7 6 7 24 5 1 1 R SCN-100 8 5 4 5 2 24 5 1 1 R SCN-126 0 4 14 6 3 27 5 2 1 R SCN-027 2 9 4 7 22 6 2 1 R SCN-133 8 3 5 6 22 6 1 1 R SCN-061 4 12 6 5 1 28 6 2 1 R SCN-049 7 13 5 6 5 36 7 1 1 R SCN-092 4 14 6 6 9 39 8 2 2 R SCN-127 2 2 8 26 4 42 8 5 2 R SCN-091 9 3 16 12 3 43 9 3 2 R SCN-069 10 7 11 12 5 45 9 1 2 R SCN-012 13 11 12 5 8 49 10 1 2 R SCN-068 8 11 14 11 6 50 10 1 2 R SCN-112 19 18 21 12 27 97 19 2 4 R SCN-104 42 19 16 23 0 100 20 7 4 R SCN-121 3 70 24 3 7 107 21 13 4 R SCN-122 15 34 20 16 35 120 24 4 5 R

96

Table 3-2 continued SCN-124 24 24 32 20 100 25 3 5 R SCN-087 29 26 17 2 54 128 26 9 5 R SCN-110 21 35 24 46 3 129 26 7 5 R SCN-102 31 35 27 34 127 32 2 6 R SCN-123 53 83 0 15 11 162 32 15 6 R SCN-036 18 16 52 29 58 173 35 9 7 R SCN-103 76 33 21 13 31 174 35 11 7 R SCN-096 2 98 31 13 144 36 22 7 R SCN-130 9 3 81 17 77 187 37 17 7 R SCN-101 20 24 69 113 38 16 7 R SCN-019 41 36 49 47 173 43 3 8 R SCN-052 26 45 10 104 185 46 21 9 R SCN-093 60 62 75 9 39 245 49 12 9 R SCN-125 3 21 51 99 76 250 50 18 10 S SCN-098 100 82 12 18 44 256 51 17 10 S SCN-032 33 41 52 62 73 261 52 7 10 S SCN-017 19 65 39 44 104 271 54 14 11 S SCN-014 18 90 99 8 56 271 54 18 11 S SCN-006 39 68 57 65 229 57 7 11 S SCN-018 70 51 68 48 55 292 58 4 11 S SCN-020 34 51 96 181 60 18 12 S SCN-022 81 87 79 2 249 62 20 12 S SCN-129 60 83 24 58 92 317 63 12 12 S SCN-118 55 68 27 73 97 320 64 11 12 S SCN-023 82 59 13 106 260 65 20 13 S SCN-033 35 41 71 83 114 344 69 14 13 S SCN-042 29 101 75 42 99 346 69 15 13 S SCN-059 47 25 113 100 285 71 21 14 S SCN-099 75 81 49 49 105 359 72 11 14 S SCN-084 79 25 63 130 297 74 22 14 S SCN-028 20 80 13 107 155 375 75 27 15 S SCN-034 100 31 11 206 27 375 75 36 15 S SCN-031 56 63 97 67 107 390 78 10 15 S SCN-139 226 46 2 120 4 398 80 42 15 S SCN-016 16 67 106 115 97 401 80 18 16 S SCN-015 58 94 70 94 92 408 82 7 16 S SCN-046 116 85 108 21 330 83 22 16 S SCN-025 35 99 96 96 111 437 87 13 17 S SCN-044 120 105 5 91 132 453 91 22 18 S SCN-131 91 77 108 130 56 462 92 13 18 S

97

Table 3-2 continued SCN-089 108 137 34 279 93 31 18 S SCN-090 52 230 101 70 41 494 99 34 19 S SCN-134 126 108 84 80 105 503 101 8 19 S SCN-039 81 108 110 104 403 101 7 20 S SCN-132 43 145 111 112 411 103 21 20 S SCN-120 5 156 161 322 107 51 21 S SCN-024 152 104 72 107 435 109 16 21 S SCN-058 136 107 173 47 86 549 110 21 21 S SCN-135 158 23 107 159 447 112 32 22 S SCN-008 82 99 102 119 161 563 113 13 22 S SCN-043 122 118 147 107 101 595 119 8 23 S SCN-029 145 76 165 79 144 609 122 18 24 S SCN-088 123 18 263 182 25 611 122 47 24 S SCN-078 187 117 94 144 75 617 123 20 24 S SCN-040 61 192 68 186 118 625 125 28 24 S SCN-037 155 128 86 160 100 629 126 15 24 S SCN-050 81 112 212 117 114 636 127 22 25 S SCN-021 68 148 83 179 165 643 129 22 25 S SCN-051 112 107 186 144 113 662 132 15 26 S SCN-030 105 1 40 201 318 665 133 57 26 S SCN-136 172 112 206 39 172 701 140 29 27 S SCN-045 137 168 13 251 569 142 49 28 S SCN-035 131 156 193 110 123 713 143 15 28 S SCN-038 135 162 122 140 166 725 145 8 28 S SCN-138 110 178 139 188 128 743 149 15 29 S SCN-056 170 72 161 197 600 150 27 29 S SCN-053 175 302 134 94 62 767 153 42 30 S SCN-111 181 109 108 241 171 810 162 25 31 S SCN-065 212 5 250 197 664 166 55 32 S SCN-119 79 175 108 156 324 842 168 42 33 S SCN-070 57 121 173 104 425 880 176 65 34 S SCN-055 287 22 10 305 258 882 176 66 34 S SCN-060 217 213 160 152 160 902 180 14 35 S SCN-063 268 101 189 277 341 1176 235 41 46 S SCN-057 253 226 281 250 272 1282 256 10 50 S SCN-066 335 264 310 312 209 1430 286 22 55 S SCN-072 260 280 268 336 538 1682 336 52 65 S SCN-067 458 264 103 625 1450 363 114 70 S NCC05-1168 615 432 511 490 402 2450 490 37 95 S 5601T 427 490 636 559 466 2578 516 37 100 S

98

Table 3-2 continued PI548402(Peking) 70 117 216 77 113 593 119 26 23 S PI88788 152 128 352 433 1065 266 75 52 S PI90763 0 0 2 0 2 1 1 0 R PI437654 4 2 1 7 2 1 0 R PI209332 114 314 526 954 318 119 62 R PI89772 1 1 6 8 3 2 1 R PI548316(Cloud) 305 206 136 183 830 208 36 40 S Hutch 548 495 569 646 177 2435 487 81 94 S R for races in the table above indicates a female index 10; S indicates a female index . Female index (FI) = (average number of females on test line)/ (average number of females on 5601T) × 100.

99

Table 3-3: Single Nucleotides Polymorphism in RHG1 coding sequence of selected RILs in comparison to the susceptible (William 82) and resistant (Forrest) alleles. SCN A SNP position phenotype

636 868 873 874 875 877

W82 C G DEL DEL DEL C S

SCN-72 G T G G T A S

SCN-66 G T G G T A S

SCN-99 G T G G T A S

SCN-42 G T G G T A S

SCN-109 G T G G T A R

SCN-137 G T G G T A R

FORREST G T G G T A R A: adenine, C:cytosine, G:Guanine, T:Tyrosine, R: Resistant, S:, Susceptible

100

Table 3-4: Single Nucleotides Polymorphism in RHG4 coding sequence of selected RILs in comparison to the susceptible (William 82) and resistant (Forrest) alleles.

Lines SNP position SCN phenotype

465 1148

W82 C A S

SCN-72 G T S

SCN-66 G T S

SCN-99 G T S

SCN-42 G T S

SCN-109 G T R

SCN-137 G T R

FORREST G T R A: adenine, C:cytosine, G:Guanine, T:Tyrosine, R: Resistant, S:, Susceptible

101

Table 3-5: Single Nucleotides Polymorphism in Rhg1 promoter sequence of selected RILs in comparison to the susceptible (William 82) and resistant (Forrest) alleles Lines SNP position SCN phenotype

337 339 366 823 837 978 1025 1034 1093

W82 G T A A G C A C C S

SCN-72 A C A C T A C C C S

SCN-66 A C A C T A C C C S

SCN-99 A C A C T A C C C S

SCN-42 A C A C T A C C C S

SCN-109 A C A C T A C C C R

SCN-137 A C A C T A C C C R

FORREST A C T C T A C T A R

Ref. position 334 336 363 815 829 970 1016 1025 1082

A: adenine, C:cytosine, G:Guanine, T:Tyrosine, R: Resistant, S:, Susceptible

102

Table 3-6: Single Nucleotides Polymorphism in Rhg4 promoter sequence of selected RILs in comparison to the susceptible (W82) and resistant (Forrest) alleles Lines SNP position SCN phenotype

135 222 652 764 1252 1391

W82 DEL C C A A DEL S

SCN-72 T T T DEL T T S

SCN-66 T T T DEL T T S

SCN-99 T T T DEL T T S

SCN-42 T T T DEL T T S

SCN-109 T T T DEL T T R

SCN-137 T T T DEL T T R

FORREST T T T DEL T T R

Ref. position 630 717 1147 1257 1745 1884 A: adenine, C: cytosine, T:tyrosine, DEL: deletion R: resistant, S:, susceptible

103

Table 3-7: Screening of selected RILs with SSR markers. Chromosome # 8 18 17 SCN Phenotype

Gene/QTL Rhg4 rhg1 cqSCN-005 Sample Name Sat_162 Satt632 Satt309 Sat_168 Satt574 Satt082

PI 88788 (1) a a a b - b R PI 88788 (2) a a a b - b R Peking (1) a a b a - b R Peking (2) a a b a - b R TN09-029 (1) a a b a b b RP TN09-029 (2) a a b a b b SCN-109 (1) a a b a a a,b R SCN-109 (2) a a b a a b SCN-109 (3) a a b a b b SCN-109 (4) a a b a a b SCN-109 (5) a a b a a b SCN-137 (1) a a - a b b R SCN-137 (2) a a b a b b SCN-137 (3) a a b a b b SCN-137 (4) a a b a b b SCN-137 (5) a a b - b b SCN-042 (1) a a b a b b MR SCN-042 (2) a a b a b b SCN-042 (3) a a b a b b SCN-042 (4) a a b a b b SCN-042 (5) a a b a b b SCN-099 (1) a a b a - a MR SCN-099 (2) a a b a a b SCN-099 (3) a a b a - b SCN-099 (4) a a b a a a SCN-099 (5) a a b a b b NCC05-1168 (1) a a b a a a SP NCC05-1168 (2) a a b a a a

104

Table 3-7 continued

SCN-066 (1) a a b a b b S SCN-066 (2) a a b a b b SCN-066 (3) a a b a a a SCN-066 (4) a a b a b b SCN-066 (5) a a b a b b SCN-072 (1) a a b a a a S SCN-072 (2) a a b a a a SCN-072 (3) a a b a a a SCN-072 (4) a a b a a a SCN-072 (5) a a b a a a R for resistant, RP for resistant parent, MR for moderately resistant SP for susceptible parent, S for susceptible. – sign for failed reaction. Lines with the same letter produced the same reaction with the same marker. In parenthesis at the end of line name: the number of the plant on the test

105

Table 3-8: Primer sequences used in the study PCR Coding sequences Gene Direction Sequence (5’-3’) Rhg1 (Gm Forward GGACAATCCTTCTTGTTA CGC SNAP18) Reverse CAGACTCCAGCAACCTCATTA Rhg4 (Gm Forward CTTTTAAAAACGAACACATACGC SHMT08) ATGACCTCATCGAGAAGGAGAA Reverse AAGGGAACACCGCGAAGTTA TGCTTCGCGTAGGCCTTAAA Promoter sequences Rhg1 (Gm Forward TTCAACATGAGTATGATAATAATA SNAP18) ACTCCAACATGGACTGAAATTCA

Reverse TTTCCTCCGATCGAAACAAA Rhg4 (Gm Forward CCGACCAAAAATATTGGTACAGC SHMT08) CAAGGTCCTCCTACCTATAGTC

Reverse TGCCAAAGCCATTTGTAAAGAGTC RT-QPCR Rhg1 (Gm Forward ACAAGGCTGGAGCGACATAC SNAP18) Reverse AGCAATGTGCAGCATCGACA Rhg4 (Gm Forward TAA CTT CGC CGT GTT CCC TT SHMT08) Reverse TGT TTC GCG TAG GCC TTA AA A: adenine, C: cytosine, T:tyrosine, DEL: deletion R: resistant, S:, susceptible

106

Figure 3-1: Comparison of mean seed yield between SCN resistant and SCN susceptible Line. Error bars represent the standard error of the means.

107

Figure 3-2: SCN cyst counts on selected representative lines (n=8). In red: susceptible lines with10 FI, green: resistant lines with  10 FI, LSD groups on top of the error bars, orange dotted line: a cut off level of 10 FI

108

CGH log2 ratio (Sam ple/W illiam s 82)

Figure 3-3: Copy number variation of Rhg1 genomic intervals based on CGH data. Each spot indicates the log2 ratios of each genotype compared to Williams 82 for a specific CGH probe. A value of zero indicates that there is no difference between the sample and Williams 82 (one copy). The different genotypes are represented by spot color (Red: TN09-029; Blue: NC05- 1168; Green: SCN-109; Black: SCN-099; Yellow: SCN-066).

109

CGH log2 ratio (Sam ple/ Willia ms 82)

Figure 3-4: Copy number variation of Rhg4 genomic intervals based on CGH data. Each spot indicates the log2 ratios of each genotype compared to Williams 82 for a specific CGH probe. The different genotypes are represented by spot color (Red: TN09-029; Blue: NC05-1168; Green: SCN-109; Black: SCN-099; Yellow: SCN-066).

110

Figure 3-5: Relative expression levels of GmSNAP18 from different RILs at 3, 5, 7 and 9 dpi. Standard errors are represented on the top of each genotype. T-test, ***: highly significant difference, **: significant difference at p value 0.05. Note: at 7 and 9 dpi, only the susceptible (SCN-066) and the resistant (SCN-109) were tested. Comparison of gene expression levels was made with RIL-66-mock infected roots (taken as 1). The qpcr results are normalized to a soybean ubiquitin (Glyma.20G141600)

111

Figure 3-6: Relative expression levels of GmSHMT18 from different RILs at 3, 5,7 and 9 dpi. Standard errors are represented on the top of each genotype. T-test, ***: highly significant difference at p value 0.05. Note: at 7 and 9 dpi, only the susceptible (SCN-066) and the resistant (SCN-109) were tested. Comparison of gene expression levels was made with RIL-66-mock infected roots (taken as 1). The qpcr results are normalized to a soybean ubiquitin (Glyma.20G141600)

112

Figure 3-7: Relative expression levels of GmSNAP08 from SCN-72 and SCN-137 at 3dpi. Standard errors are represented on the top of each genotype. Comparison of gene expression levels was made with RIL-072-mock infected roots (taken as 1). The qpcr results are normalized to a soybean ubiquitin (Glyma.20G141600).

113

Figure 3-8: Relative expression levels of GmSHMT18 from different SCN-072 and SCN- 137 at 3 dpi. Standard errors are represented on the top of each genotype. T-test, **: difference at p value 0.05. comparison of gene expression levels was made with RIL-072-mock infected roots (taken as 1). The qpcr results are normalized to a soybean ubiquitin (Glyma.20G141600).

114

Chapter 4: Adaptability evaluation of US-developed Soybean Recombinant Inbred Lines in Rwandan conditions

115

Abstract

About 70% of the Rwandan population live on agriculture-related activities. Soybean is among the selected priority crops that are supported by the government through the agriculture sector subsidy program. However, the national production and yields per hectare remain very low compared to other countries such as the USA. Yet soybean products and by products demand is increasingly rising. On the list of limiting factors, the narrow germplasm is ranked first. We introduced and tested a US-developed soybean population of 115 recombinant inbred lines segregating for yield among other factors. The lines were tested during the cropping seasons B2019 and A2020. They were tested at two research stations using a randomized complete block design (RCBD) with 3 replications. At one of the stations in the low altitudes the top yielder from the US-developed RIL outperformed the high yielding local check by almost 1.2 MT/ha. A total of 32 RILs yielded more than the local check. At the other station, the general performance of the RIL population was in the range of the top performing local check. In general, our data suggest that the US-developed population though coming for a temperate zone, can easily adapt in some agroecological zones of Rwanda. The available evaluations need to be completed by seed quality data in order to provide a complete profile of the US-RIL under Rwandan conditions.

116

1. Introduction

Soybean [Glycine max (L.) Merrill], is among the oldest crop introductions in Rwanda, having been introduced by the Belgians in early 1920’s (Niyibituronsa et al., 2018; Shurtleff and Akiko, 2010). However, the crop gained a relatively low farmers’ interest until recently after the government’s efforts to boost its production. In fact, the area under soybean production in Rwanda had only reached an estimated 1640 ha in 1973 (Munezero et al., 2018) and increased to 46695 ha in 2019 (NISR, 2019). This increase was the result of the government’s decision to include soybean among priority crops under the crop intensification program (CIP). Under the CIP, a subsidy scheme helps farmers’ access to improved seeds and fertilizers. The decision is said to have been informed by among other factors soybean’s competitive advantages in nutritional value, climate adaptability and response to agriculture inputs (Mugabo et al., 2014). In terms of production, recent statistics report annual production of 24,525 MT which is equivalent to around 0.5 tons/ha in yield. Put in context of the local cropping calendar, with two major soybean cropping seasons (A&B), the yield per ha is comparable to almost 50% of the reported annual production (World Bank, 2015) (Table 4-1). This yield remains very low compared to the yields obtained in other parts of the world such as the USA where the average yield was 3.3 tons/ha in 2018 (American Soybean Association, 2018). However, despite the government efforts to promote soybean production, the country’s productivity does not increase proportionally to the deployed efforts. This is mainly due to among other factors and in descending order, the poor germplasm leading to lack of high yielding and adapted varieties, poor soil fertility, climatic variability, pests and diseases, poor farmers’ accessibility to quality seeds, and limited skills of best agronomic practices (Mugabo et al., 2014). However, among these factors, the poor germplasm and lack of high yielding varieties remain the most important limiting factors to soybean production in the country. Nutritionally, soybean contribution also increased up to 8% in protein in 2010 (NISR, 2010) which makes soybean the second legume crop after common bean. However, recent statistics have reported high levels of chronic malnutrition (up to 45%) especially in children under the age of five (NISR, 2019) . Last but not least, in the livestock industry, recent reports rank the

117

unaffordability of quality feeds as the major limiting factor in the sector (Mbuza et al., 2017; Mutimura et al., 2013) Though soybean has been selected as a priority crop, overall, the local production does not meet the national demand in terms of soybean needs. The national soybean grain demand of the two main soybean processing plants, on their own, is estimated at 62,000 MT/year which is almost double the current national annual production (MINAGRI, 2018). The local germplasm consists of a limited number of varieties but only two of them, namely Peka6 and SB 24 are the most grown by farmers. Yet, the yield of these two does not even reach the average minimum yield of the varieties grown in countries like the US. In terms of seed quality (amino acids, fatty acids, protein and oil), there are no reported data about seed quality traits for the local varieties. On the other hand, the US soybeans were among the best in terms of seed quality with the average protein and oil content of 35.7% and 19.5% respectively, on a 13% moisture basis in 2014 (Assefa et al., 2019). These data suggest that, the US varieties could be an asset to the soybean industry in Rwanda both in terms of production and seed quality. In fact, previous studies proved that soybean meal was the most affordable way to fill the protein gap in in animal feed (Hishamunda et al., 1998) and could even be the best replacement to commercial feeds (Nyina- Wamwiza et al., 2007). Soybean breeding has been generally the core strategy used to mitigate the above listed production challenges that reduce production and negatively affects not only the agriculture sector but livestock equally. Unfortunately, in Rwanda, the lack of varieties (germplasm), make soybean research in general and breeding in particular quite nonexistent. Therefore, there is a need to create genetic diversity of the local soybean germplasm through introduction and breeding programs for future research initiatives intended to find practical solutions to the identified soybean production problems. In the present study, we tested, in Rwandan conditions, the adaptability of a population of soybean recombinant inbred lines (RILs) developed by the soybean breeding program at the University of Tennessee with the overall goal of improving the local germplasm pool for high yield, seed quality (oil and protein content) and disease resistance.

118

2. Results

2.1 Rainfall distribution at Muyumbu Research Station (MRS) (2019B) and Rubona Research Station (RRS) (2020A)

Rainfall distribution during the soybean growing season is a critical parameter for maximizing grain yields as grain yield is affected by the rainfall during growing season (Mandic et al., 2017). During the cropping season B 2019, RRS was characterized by rain shortage towards the end of the season (end May - July). Daily rainfall reached almost 0 mm per day during the last two months of the season approximately 60 -115 days after planting) (Figure 4-1) while during the cropping season A2020, the closest meteorology station recorded a very high cumulative rainfall attaining 236.5 mm (Figure 4-2).

2.2 Seasonal and locational agronomic performance

In events of climatic and microclimatic diversity like in Rwanda, testing newly developed and or introduced genotypes across multiple environments and cropping seasons is a requirement in order to avail the most adapted and stably performing lines across or for specific environments. We tested the RILs population in two year-locations. Seasons consisted of season 2019B and season 2020A according to local agriculture season naming (Ndayambaje et al., 2014, Nahayo et al., 2018). Locations consisted of the Rwanda agriculture board research stations in RRS representing the medium altitudes and MRS representing the low altitudes. During the season 2019B, at MRS, the yields ranged from 771 kg ha-1 to 3426 kg ha-1 with a population mean of 2033 kg ha-1 (Table 4-2). A total of 32 US-developed lines including Ellis yielded numerically higher than the top yielding check (Figure 4-3). Population wise, the high yielding local check (Peka6) came in group of the middle-class yielders (Figure 4-4). The earliest line matured at 102 days whereas the latest matured at 120 days and in general plants were resistant to lodging with a lodging score of 1.6. Plant height ranged from 38.3cm to 65.7 cm with an average of 51.2 cm. The performance of the population and the entire trial at RRS was not as high as at MRS. In fact, the yield of the most performing line SCN-031 was only 714 kg ha-1 (Figure 4-5) due a shortage of rains at critical stages of the vegetative growth and maturation (Figure 4-1). The Pearson correlation analysis between traits at MRS during season B2019 (Table 4-3, below the diagonal) revealed a highly significant (p0.001) and positive correlation between yield and

119

height (r=0.61), lodging (r=0.24) and maturity (r=0.43). Plant height and lodging (r=0.3) and maturity (r=0.39) were positively correlated. The correlation between lodging and maturity was non-significant. The performance, at RRS during season 2020A, was comparatively low to the MRS but higher than RRS season B2019. The local check Peka-6 was the highest performer with 1630 Kgha-1 (Figure 4-6 and Table 4-3. A Number of the US-developed lines such as Ellis, SCN-138, SCN- 006 SCN-072, SCN-038 consistently came in the top yielders. The earliest lines matured at 120 days whilst the latest was 130 days. Plant height ranged from 27.4 to 43.5 with a mean of 19.7 while the resistance to lodging was at mean of 1.2. As for the relationships between agronomic traits at RRS (Table 4-3, above the diagonal), all traits were positively correlated except for the correlation between plant yield and days to maturity (non-significant).

2.3 Across seasons and environments performance

Multi-environment trials helps to evaluate the performance of cultivars in a given environment by quantifying GXE effects and determining cultivar stability (Gurmu et al., 2010). We analyzed the performance of the US-developed lines by combining data from the first season (B2019) at MRS and second (A2020) season at RRS. Data from B2019 at RRS were dropped out due to the very apparent effect of droughts on the overall performance. Similarly, no data from A2020 at MRS was collected after the whole trial was swept out by inundations (the station is located in the valley). The combined performance of the RIL revealed a significant GXE effect for yield and lodging (Table 4-4). The average mean yield was 1393.7 while the maximum yield was 2239.4. The days to maturity ranged from 113 to 125. The mean recorded plant height was 31.5 while the average plant lodging was 1.4.

3. Discussion

In this study, we investigated the general performance of a US-developed population of recombinant inbred lines introduced into Rwandan conditions. The population consisted of 115 lines plus 1 of its parental lines and 2 elite checks. They were tested against two popular local checks over 2 consecutive agricultural seasons. Due to environmental cues, trials at some

120

locations completely or partly failed and data were dropped out. Here we discuss the main findings from collected data.

3.1 The yield potential could be doubled

Yield is the primary breeding objective pursued by breeders in potentially successful varieties. At MRS during 2019B season, our population outperformed the local checks and as the top yielders could double the standard average yields obtained by the farmers locally. In fact, official data from the institute of statistics reports the country’s annual average yields as around 0.5 Mt ha-1 (NISR, 2018a; National Institute of Statistics of Rwanda, 2020). The same performance was reached by Rurangwa et al., (2018) only after applying rhizobia and different types of fertilizers including manure and potassium (Rurangwa et al., 2018). Our yields were largely higher than what was obtained in farmers’ field after applying only rhizobia inoculation and urea (Nsengiyumva et al., 2017) and even those obtained during the pan-African soybean variety trials across three environments (Soybean Innovation Lab, 2019). Nevertheless, the average mean yield of our population was in the range of the reported performance under optimum management (RAB, 2016). The poor performance at RRS during 2019B season is mainly explained by rains shortage around the podding (R3-R4) and filling stages (R5-R6) (Pedersen, 2004) (Figure 4-1). In fact, soybean development and yields may be limited by water stress during critical development stages, especially at the germination-emergence and flowering-grain filling stages (Rodrigues et al., 2017; Desclaux et al., 2000) . Yield loss estimates over 56 years could reach up to 21.8% due to severe droughts (Wang et al., 2020). In terms of losses due to water deficits at critical growth stages, Sioni and Kramer, (1977) recorded up to 20% reduction in pod number, as a result of flower abortion. According to the authors, seeds per pod and seed size, seed weight and yield were also impacted (Sionit and Kramer, 1977; Desclaux et al., 2000)` During the cropping season 2020 A, due to heavy rains (Figure 4-2), the low altitude trial was completely swept out by the rains, therefore our data collection and analysis only focused on medium altitude location, in RRS. The 2020A season was generally characterized by higher than the normal rainfall countrywide which could have resulted in higher yields. However, at RRS, there was an opposite trend of soybean production to increasing rainfall since 2010 (Mikova,

121

2015) even though, overall, the prediction models suggest that climate change were unlikely to affect soybean production in Africa in the near future (Foyer et al., 2019). Nevertheless, the obtained yields at RRS were in the range of what was reported earlier (Nsengiyumva et al., 2017) and even still higher than what was obtained during the pan-African soybean variety trials (Soybean Innovation Lab, 2019). Days to maturity is an important consideration while deciding on the right genotypes to be grown. At one hand, early maturity may results from a premature flowering causing short stature of the plants and hence reducing yields (Sinclair and Hinson 1992; Miranda et al., 2020). This short stature was reported to be the result of very short days (~12 hours of photoperiod) during the vegetative stage also called juvenile stage when temperate genotypes are introduced to the tropics (Miranda et al., 2020). Soybean breeders could overcome this natural constraint and maintain high yields through the genetic exploitation of the long juvenile (Lj) trait (Miranda et al., 2020; Hartwig and Kiihl, 1979; James and Lawn, 2011). In fact, the J gene allows the plant to delay flowering and have longer vegetative growth even when exposed to reduced photoperiods of 12 hrs (Bäurle and Dean, 2006). This could be another area of future research with the US- developed population in a bid to manipulate its yield components and adaptability in Rwanda. Similarly, it is generally assumed that maturity, positively correlate yield estimates in soybeans and later maturing cultivars will mostly out-yield earlier maturating cultivars (Moreira et al., 2019). At the other hand, early maturing may be preferred against late maturing varieties especially in areas experiencing harsh environmental conditions. Soybean breeders are then called upon to develop early maturing varieties while maintaining acceptable level of yields (Yamaguchi et al., 2015; Chigeza et al., 2019). Therefore, breeders often times make a tradeoff between high yields and early maturity especially in semi-arid areas that experience rain shortage and prolonged droughts. The positive correlations were previously reported between yield and plant height (Li et al., 2020; Gawęda et al., 2020), lodging (Mansur et al., 1996) and days to maturity (Abugalieva et al., 2016; Balla and Ibrahim, 2017; Moreira et al., 2019). Plant height was higher in MRS probably due to more vegetative growth which may explain higher yields obtained at MRS. In fact, this positive relationship between yield and vegetative growth was reported previously

(Wang et al., 2020; Diondra et al., 2008; Ruiz-vega, 1984; Morsy, 2014) . However, in other studies no correlation was found between yield and height, as both traits are polygenic and

122

complex traits (Diondra et al., 2008; Shree et al., 2018). As for the relationship between yield and plant lodging, the later was found to only affect yield negatively depending on the stage of the crops or management options (Shapiro and Flowerday, 1987; Ramli et al., 1980; Wilcox and Sediyama, 1981; Leffel, 1961; Xiang et al., 2013). Elsewhere, lodging was found to have no effect on seed yield (Kabelka et al., 2004; Tefera et al., 2009). This could be due to polygenic nature of resistance to lodging (Chen et al., 2017). The positive correlation between yield and days to maturity could be phenologically explained by the production of more number of branches/plant, number of pods/plant (Akram et al., 2014). In, summary, our data showed that, a selection made from the US-developed to yielders may offer the best varieties fitting in the low lands of Rwanda. Equally, some lines could be grown in the middle altitudes considering their earliness and comparable yields to the currently available variety options.

3.2 The genotype x environment (GXE) significantly affect the overall performance

The combined performance revealed a significant GXE effect for yield and lodging (Table 4-4). The maximum yields was relatively comparable to the yields obtained in previous studies (Rurangwa et al., 2018). The latest maturing line of our population matured only 2 days after the earliest maturing lines, ‘Caviness’ from the pan-African soybean variety trial (Soybean Innovation Lab, 2019). However, the population members would be classified in the class of medium to late maturating varieties according to Chiegeza et al., 2019 (Chigeza et al., 2019). Surprisingly, the combined cross environments trials did not reveal a significant GXE z value for days to maturity, suggesting that maturity date was less influenced by environmental cues. Yet, according to Mourtniz and Conley, (2017), maturity group, a direct controller of maturity date in the temperate zone, is determined by two abiotic factors namely photoperiod and temperature (Mourtzinis and Conley, 2017). This could an indication that, the concept of maturity groups does not apply in the sub Saharan Africa (Miranda et al., 2020) or at least need to be continuously monitored (Mourtzinis and Conley, 2017) since temperate varieties, when grown in the tropics, flower too early to allow for optimum vegetative growth, regardless of maturity group (Miranda et al., 2020). Alternatively, this could be explained by a strong involvement of the E genes that control time to flowering and maturity, but this should be confirmed by the investigation of the RILs E locus.

123

Taken together, the cross-environments and seasons, data analysis suggest that in general the US-developed lines fit more in the low altitudes of Rwanda than the medium altitudes. That the same population revealed an equal performance as the local checks in the medium altitudes.

4. Conclusion

Soybean is becoming a priority crop in Rwandan agriculture. The current soybean yields, and production nationwide do not meet the existing market given the growing soybean-based products and by products demand. The Rwandan soybean program is characterized by a narrow germplasm with a low yield potential than cannot provide enough and stable production to meet the national demand. Considering the challenge of adaptation of the temperate soybean lines when taken to the tropics, we introduced and tested the performance of US-developed maturity group V lines in Rwandan conditions. The lines were tested in two major soybean growing areas; the low and medium altitude and tested them over two consecutive cropping seasons (A and B). The study showed that the US-developed population could outperform the local checks and double the current yield potential. It showed further that the US-developed lines perform better in the low altitudes than the middle altitudes. There was a highly significant GXE effect on yield. The US-developed lines matured in the same ranged as the earliest local check and could be classified as medium-to-late maturing lines. For getting a better insight to the adaptation and stability of these lines, they should be tested in more environments and over many seasons. Quality traits should be also investigated in order to get a comprehensive profile of the US- developed population in Rwandan conditions. Further, it would be important to investigate the effect of maturity (E) and long juveniles (J) genes on the adaptation of this population.

5. Material and Methods

5.1 Plant material

A total of 120 US-developed soybean lines were tested in two different research stations in

Rwanda over two agricultural seasons. Among them 115, F5:7 RILs derived from a cross between TN09-029 and NCC05-1168 at the University of Tennessee, Soybean Breeding and Genetics Program. These lines were in the range of late IV- V maturity groups (MG). Line TN09-029, is a late IV MG, highly resistant to SCN race 2, 3 and 5 (Gillen and Shelton, 2012) while Line

124

NCC05-1168, is an early V MG, resistant to stem canker and SCN race 3 (Gillen and Shelton, 2011). In addition, standards consisting of three popular checks grown in Tennessee, Ellis (Pantalone et al., 2017) , Osage (Chen et al., 2007) and the recently released high yielding and SCN resistant check TN09-008 (Pantalone et al., 2018) . To complete the trial, the two most prominently grown local checks Peka-6 and SB 24 were included in the test.

5.2 Field experiments

The trial was set up in four row plots of 5m x 1.60 m in two locations at Muyumbu research station (MRS) and Rubona research station (RRS) representing the low and mid altitudes agroecological zones respectively. Soil types in low altitudes consist of ferralsols, nitosols and vertisols while major soil types in medium altitudes conist of nitosols, ferralsols, leptisols and lexisols (Ndayambaje et al., 2014). MRS is located at 1° 59' 44.8S/30° 14' 35E and 1361m asl while RRS is located at 2° 27' 36'' S/ 29° 45' 36'' E and 1706 asl. The experiments were conducted in Season B 2019 and Season A 2020. In season B 2019, planting was done from 20th to 21st March at MRS and from 4th to 5th April at RRS. In season A 2020, the planting was done from 11th to 12th October 2020 at RRS and 14th to 15th October 2020 at MRS. The trials were set up in a randomized complete block design (RCBD) in 3 replications.

5.3 Agronomic traits measurement

For each entry, flower color and pubescence color notes were taken. Plant height measured from the ground to the highest node in cm was recorded. Plant lodging notes were measured at the scale of 1= upright position to 5= prostrate position. Date to maturity was recorded in days after germination. The seed yield at 13% moisture content was measured by an electronic scale.

125

References

Abugalieva S, Didorenko S, Anuarbek S, Volkova L, Gerasimova Y , Sidorik I, and Turuspekov Y (2016) Assessment of soybean flowering and seed maturation time in different latitude regions of kazakhstan. PLoS ONE 11: 1–12. American Soybean Association (2018) SOYSTATS 2018 :A Reference Guide to Important Soybean Facts & Figures. http://soystats.com/wp-content/uploads/SoyStats2018.pdf. Akram M, Fares WM, Fateh HS, Rizk AM (2014) Genetic variability , correlation and path analysis in soybean. Egyptian Journal of Plant Breeding 15:89-102 Assefa Y, Purcell LC, Salmeron M, Naeve S, Casteel SN, Kovács P, Archontoulis S, Licht M, Below F, Kandel H, Lindsey LE (2019) Assessing variation in US soybean seed composition (Protein and Oil). Frontiers in Plant Science 10:298. Balla MY, SE (2017) Genotypic correlation and path coefficient analysis of soybean [Glycine Max (L.) Merr.] for yield and its components. Agricultural Research and Technology 7: 3– 7. Bäurle I, Dean C (2006) The timing of developmental transitions in plants. Cell 125: 655–64. Binagwaho A, Agbonyitor M., Rukundo A., Ratnayake N., Ngabo F., Kayumba J., Dowdle B., Chopyak E (2018) Underdiagnosis of malnutrition in infants and young children in Rwanda: implications for attainment of the Millennium Development Goal to end poverty and hunger. International Journal for Equity in Health 10:1-6. Chen H, Yang Z, Chen L, Zhang C, Yuan S, Zhang X, Qiu D, Wan Q, Zhan Y, Chen S, Shan Z (2017) Combining QTL and candidate gene analysis with phenotypic model to unravel the relationship between lodging and related traits in soybean. Molecular Breeding 37: 43. Chen P, Sneller CH, Mozzoni LA, Rupe JC (2007) Registration of ‘Osage’ soybean. Journal of Plant Registrations 1: 89–92. Chigeza G, Boahen S, Gedil M, Agoyi E, Mushoriwa H, Denwar N, Gondwe T, Tesfaye A, Kamara A, Alamu OE, Chikoye D (2019) Public sector soybean (Glycine Max) breeding: advances in cultivar development in the African tropics. Plant Breeding 138: 455–64. Desclaux D, Huynh TT, Roumet P(2000) Identification of soybean plant characteristics that

126

indicate the timing of drought stress. Crop Science 40: 716–22. Diondra W, Ivey S, Washington E, Woods S, Walker J, Krueger N (2008) Is there a correlation between plant height and yield in soybean .7: 70-76. Foyer CH, Siddique KHM, Tai APK, Anders S, Fodor N, Wong FL, Ludidi N, Chapman MA, Ferguson BJ, Considine MJ, Zabel F (2019) Modelling predicts that soybean is poised to dominate crop production across Africa. Plant Cell and Environment 42: 373–385. Gawęda D, Nowak A, Haliniarz M, Woźniak A (2020) Yield and economic effectiveness of soybean grown under different cropping systems.” International Journal of Plant Production:1-11. Gillen AM, Shelton GW (2011) Uniform Soybean Tests: Southern States. Gillen AM, Shelton GW (2012) Uniform Soybean Tests: Southern States. 12. Gurmu F, Mohammed H, Alemaw G (2010) Genotype X Environment interactioins and stability of soybean for grain yield and nutrition quality. African Crop Science Journal 17: 87–99. Hartwig EE, Kiihl RAS (1979) Identification and utilization of a delayed flowering character in soybeans for short-day conditions. Field Crops Research 2: 145–51. Hishamunda N, Jolly CM, Engle CR (1998) Evaluation of small-scale aquaculture with intra- rural household trade as an alternative enterprise for limited resource farmers: The case of Rwanda. Food Policy 23: 143–54. James AT, Lawn RJ (2011) Application of physiological understanding in soybean improvement. II. Broadening phenological adaptation across regions and sowing dates.” Crop and Pasture Science 62: 12–24. Kabelka EA, Diers BW, Fehr WR, Leroy AR, Baianu IC, You T, Neece DJ, Nelson RL (2004) Putative alleles for increased yield from soybean plant introductions. Crop Science 44: 784–791. Leffel RC (1961) Plant lodging as a selection criterion in soybean breeding. Crop Science 1: 346-349. Li M, Liu Y, Wang C, Yang X, Li D, Zhang X, Xu C, Zhang Y, Li W, Zhao L ( 2020) Identification of traits contributing to high and stable yields in different soybean varieties across three chinese latitudes. Frontiers in Plant Science 10: 1–14.

127

Mandic V, Bijelic Z, Krnjaja V, Simic A, Ruzic-Muslic D, Dragicevic V, Petricevic V (2017) The rainfall use efficiency and soybean grain yield under rainfed conditions in vojvodina. Biotechnology in Animal Husbandry Biotehnologija u Stocarstvu 33: 475–86. Mansur LM, Orf JH, Chase K, Jarvik T, Cregan PB, Lark KG (1996) Genetic mapping of agronomic traits using recombinant inbred lines of soybean. Crop Science 36: 1327–36. Mbuza F, Manishimwe R, Mahoro J, Simbankabo T Nishimwe K (2017) Characterization of broiler poultry production system in Rwanda. Tropical Animal Health and Production 49: 71–77. Mikova K (2015) Effect of climate change on crop production in Rwanda. Earth Sciences 4: 120. MINAGRI, (2018) Strategic Plan for Agriculture Transformation 2018-24. Miranda C, Scaboo A, Cober E, Denwar N, Bilyeu K (2020) The effects and interaction of soybean maturity gene alleles controlling flowering time, maturity, and adaptation in tropical environments. BMC Plant Biology 20: 1–13. Munezero M, Mulyungi P, Wanzala FN, Ntaganira E, and Nsengiyumva A. (2018) Effect of social-economic factors on profitability of soya bean in Rwanda. International Journal of Scientific and Engineering Research 9:828-833. Moreira FF, Hearst AA, Cherkauer KA, Rainey KM (2019) Improving the efficiency of soybean breeding with high-throughput canopy phenotyping. Plant Methods 15: 1–9. Mourtzinis S, Conley SP (2017) Delineating soybean maturity groups across the United States. Agronomy Journal 109: 1397–1403. Mugabo J, Tollens E, Chianu J, Obi A, Vanlauwe B (2014) Resource use efficiency in soybean production in Rwanda 5: 116–22. Mutimura M, Lussa AB, Mutabazi J, Myambi CB, Cyamweshi RA, Ebong C (2013) Status of animal feed resources in Rwanda. Tropical Grasslands-Forrajes Tropicales 1: 109–10. Nahayo L, Habiyaremye G, Kayiranga A, Kalisa E, Mupenzi C, Nsanzimana DF (2018) Rainfall variability and its impact on rain-fed crop production in Rwanda. American Journal of Social Science Research 4: 9–15. National Institute of Statistics of Rwanda (2020) Seasonal Agricultural Survey. National Institute of Statistics of Rwanda (2010) Crop Assesment Report. National Institute of Statistics of Rwanda (2018a) Seasonal agriculture survey.

128

National Institute of Statistics of Rwanda (2018b) Seasonal agriculture survey National Institute of Statistics of Rwanda (2019) Seasonal agriculture survey report: Rwanda Ndayambaje JD, T. Mugiraneza T, Mohren GMJ (2014) Woody biomass on farms and in the landscapes of Rwanda. Agroforestry Systems 88: 101–24. Niyibituronsa M, Onyango AN, Gaidashova S, Imathiu SM, Uwizerwa M, Wanjuki I, Nganga F, Muhutu JC, Birungi J, Ghimire S, Raes K (2018) Evaluation of mycotoxin content in soybean (Glycine Max L.) grown in Rwanda.” African Journal of Food, Agriculture, Nutrition and Development 18: 13808–13824. Nsengiyumva A, C K Byamushana CK, Rurangwa E (2017) Evaluation of the response of two soybean varieties to rhizobia inoculation for improved Biological Nitrogen Fixation.” International Journal of Scientific & Technology Research 06: 109–12. Nyina-Wamwiza L, Bernard Wathelet B, Kestemont P (2007) Potential of local agricultural by-products for therearing of African catfish Clarias Gariepinus in Rwanda: effects on growth, feed utilization and body composition. Aquaculture Research 38: 206–14. Pantalone V, Smallwood C, Fallen B (2017) Development of ‘Ellis’ Soybean with High Soymeal Protein, Resistance to Stem Canker, Southern Root Knot Nematode, and Frogeye Leaf Spot.” Journal of Plant Registrations 11: 250-255. Pantalone V, Smallwood C, Fallen B, Hatcher CN, Arelli P (2018) Registration of ‘TN09- 008’ Soybean Cyst Nematode rResistant cultivar. Journal of Plant Registrations 12: 309– 313. Pedersen P (2004) Soybean Growth and Development. Iowa State University Extension. Ramli B, Noor M, Caviness CE (1980) Influence of induced lodging on pod distribution and seed yield in soybeans. Agronomy Journal 72:904–906. Rodrigues TR, Casaroli D, Evangelista AWP, Júnior JA (2017) Water availability to soybean crop as a function of the least limiting water range and evapotranspiration1. Pesquisa Agropecuária Tropical 47: 161–67. Ruiz-vega J (1984) Soybean phenology and yield as influenced by environmental and management factors. PhD dissertation 180. Rurangwa E, Vanlauwe B, Giller KE (2018) Benefits of inoculation, P fertilizer and manure on yields of common bean and soybean also increase yield of subsequent maize. Agriculture, Ecosystems and Environment 261: 219–29.

129

Rwanda Agriculture Board, (2016) Feasibility study for implementaion of the project on increased soybean production and productivity for sustainable markets. Shapiro CA, Flowerday AD (1987) Effect of simulated lodging on soybean yield. Journal of Agronomy and Crop Science 158: 8–16. Shree Y, Ram S, Bhushan S, Verma N, Ahmad A (2018) Correlation between yield and yield attributing traits in soybean (Glycine Max (L.) Merrill ). Journal of Pharmacognosy and phytochemistry :298–301. Shurtleff W, Akiko A. (2010) History of Soybeans and Soyfoods in Canada (1831-2010): Extensively Annotated Bibliography and Sourcebook. Sinclair TR, Hinson K (1992) Soybean flowering in response to the long‐juvenile trait.” Crop Science 32: 1242–48. Sionit N, Kramer PJ (1977) Effect of water stress during different stages of growth of soybean. Agronomy Journal 69: 274–78. Soybean Innovation Lab, (2019) Pan-African soybean variety trials. Tefera HA, Kamara Y, Asafo-Adjei B, Dashiell KE (2009) Improvement in grain and fodder yields of early-maturing promiscuous soybean varieties in the guinea savanna of Nigeria. Crop Science 49 (6): 2037–42. Wang C, Linderholm HW, Song Y, Wang F, Liu Y, Tian J, Xu J, Song Y, Ren G (2020) Impacts of drought on maize and soybean production in northeast China during the past five decades. International Journal of Environmental Research and Public Health 17:2459. Wilcox JR, Sediyama T (1981) Interrelationships among height, lodging and yield in determinate and indeterminate soybeans. Euphytica 30: 323–26. World Bank (2015) Agriculture global practice technical assistance paper: RWANDA agriculture sector risk assessment. World Bank: 96290-RW Xiang D, Yu X, Wan Y, Guo K, Yang W, Gong W (2013) Responses of soybean lodging and lodging-related traits to potassium under shading by maize in relay strip intercropping system. African Journal of Agricultural Research 8: 6499–6508. Yamaguchi N, Kurosaki H, Ishimoto M, Kawasaki M, Senda M, Miyoshi T (2015) Early- maturing and chilling-tolerant soybean lines derived from crosses between Japanese and Polish cultivars. Plant Production Science 18: 234–39.

130

Appendix

Table 4-1: Crop seasons in Rwanda.

Seasons YEAR1 YEAR2 Months Months J F M A M J J A S O N D J F M A M J J A S O N D Season A Soybean Season B Soybean Season C Source: Adapted from (World Bank 2015)and (NISR 2018b). Note: the crop seasons of the current starts by September of the previous year.

131

Table 4-2: Significant differences among genotypes (G), (P=0.05), for yield, plant height, maturity date, lodging at MRS during season B 2019. Genotype std. LSD Resid. Coeff. Trait P value mean min max dev.a value Var. Var. b Yield (Bu/Acre) 0.0021 2033.0 771.6 3426.2 357.0 786.9 1 0.05 Height 0.0446 51.2 38.3 65.6 6.0 14.9 0.9 1.94 Maturity (Days After Planting) 0.0454 107.3 101.6 120.3 2.0 4.9 0.9 0.89 Lodging 0.7908 NS 1.6 1.6 1.6 1.6 0.3 0.05 14.15 a std. deviation of LSMEANs b (Root MSE * 100)/mean NS: non-significant

132

Table 4-3: Pearson correlation between traits at MRS during season B2019 (below the diagonal) and at RRS during season A2020 (above the diagonal). Yield Height Lodging Maturity Yield 0.21(***) 0.18(**) -0.09(NS) Height 0.61(***) 0.4(***) 0.023(***) Lodging 0.24(***) 0.3(***) 0.28(***) Maturity 0.43(***) 0.39(***) 0.04(NS) NS: non-significant, (***): significant at p0.001, (**): significant at p0.05

133

Table 4-4: Significant differences among genotypes (G), environments (E) and, genotypes and environments (GXE) interaction (P=0.05), for yield, plant height, maturity and lodging at MRS and RRS during seasons B 2019 and A2020. Trait Genotype GxE Z mea mi ma std. LSD Resid. Coeff. P value Value n n x dev.a value Var. Var. b Yield (Bu/Acre) 0.001 0.00030 139 66 223 250.1 554.5 123970 25.3 3.7 3.0 9.4 .0 Height 0.003 0.11170 39.2 31. 51.9 4.0 9.0 50.7 18.1 3 Maturity (Days <.0001 0.08320 113. 10 125. 1.7 3.2 6.3 2.2 After Planting) 6 9.9 3 Lodging 0.164 0.00770 1.4 1.4 1.4 1.4 0.3 0.1 16.6 a std. deviation of LSMEANs b (Root MSE * 100)/mean

134

Figure 4-1: Daily rainfall at RRS during season B 2019.

135

Figure 4-2: Cumulative rainfall during season A 2020.

136

Figure 4-3: Thirty-three (33) top yielders at MRS during season B 2019.

137

Figure 4-4: Frequency distribution for seed yield. Checks are shown on the top of their yield class.

138

Figure 4-5: Fifteen (15) top yielders at RRS during season B 2019.

Figure 4-6: Ten (10) top yielders at RRS during season B 2020.

139

Chapter 5: Interactions of gene expression, alternative splicing, and DNA methylation in determining nodule identity

140

This chapter was published in Plant Journal by by Daniel Niyikiza, Sarbottam Piya, Pratyush Routray, Long Miao, Won-Seok Kim, Tessa Burch- Smith, Tom Gill, Carl Sams, Prakash R. Arelli, Vince Pantalone, Hari B. Krishnan, Tarek Hewezi. My contributions to this manuscript include performing experiments, data analysis, results interpretation, and writing.

141

Abstract

Soybean nodulation is a highly controlled process that involves complex gene regulation at both transcriptional and post-transcriptional levels. In this study, we profiled gene expression changes, alternative splicing events, and DNA methylation patterns during nodule formation, development and senescence. The transcriptome data uncovered key transcription patterns of nodule development that included 9,669 core genes and 7,302 stage-specific genes. Alternative splicing analysis uncovered a total of 2,323 genes that undergo alternative splicing events in at least one nodule developmental stage, with activation of exon skipping and repression of intron retention being the most common splicing events in nodules compared with roots. About 40% of the differentially spliced genes were also differentially expressed at the same nodule developmental stage, implying a substantial association between gene expression and alternative splicing.

Genome-wide-DNA methylation analysis revealed dynamic changes in nodule methylomes that was specific to each nodule stage, occurred in a sequence-specific manner, and impacted the expression of 1,864 genes. An attractive hypothesis raised by our data is that increased DNA methylation may contribute to the efficiency of alternative splicing. Together, our results provide intriguing insights into the associations between gene expression, alternative splicing, and DNA methylation that may shape transcriptome complexity and proteome specificity in developing soybean nodules.

142

1. Introduction

One distinguishing characteristic of legume plants is their ability to form nitrogen-fixing nodules through symbiotic interaction with specialized soil bacteria collectively known as rhizobia.

Under limited nitrogen availability in the soil, the plants secrete flavonoids that attract compatible rhizobia, which in turn produce a group of lipochitooligosaccharides, known as Nod factors (NFs) (Lerouge et al., 1990). Perception of NFs by plant receptor-like kinases activates symbiosis signaling cascades leading to the formation of infection threads that allow rhizobia passage from the initial infection site at epidermal root hairs to the root cortex where nodule organogenesis takes place (Limpens and Bisseling, 2003, Oldroyd et al., 2011, Popp and Ott,

2011). Depending on host plants, two types of nodules can be recognized: determinate and indeterminate. The indeterminate nodules such as those formed on Medicago truncatula maintain active apical meristems, which constantly produces new cells that gradually become infected with rhizobia, leading to the formation of successive zones at various developmental stages. In contrast, determinate nodules such as those formed on soybean, lack a persistent meristem, and hence do not exhibit a gradient of developmental states but rather contain nitrogen‐fixing bacteroids (differentiated bacteria) at a similar phase of symbiosis (Ferguson et al., 2010). The determinate nodules have a life span of few weeks that can be divided into three main stages: formation, development, and senescence. Nodule formation involves active cell division and in this stage nitrogen fixation begins. At the development stage, the nodules are fully developed and nitrogen fixation is at its highest level. During the last stage, there is a drastic decline in nitrogen fixation and the nodules undergo senescence.

143

The identity of nitrogen-fixing nodules is determined by a unique gene expression pattern, which is established through the function of a suite of cis-binding transcription factors in association with other transcriptional and post-transcriptional modifiers. The transcriptomes of soybean nodules have been described in a number of studies, providing compelling evidence for the importance of various signal transduction pathways, suppression of defense responses, cell wall modifications, regulation of primary and secondary metabolic pathways, and phytohormones for nodule development (Lee et al., 2004, Brechenimacher et al., 2008, Libault et al., 2009, Libault et al., 2010, Severin et al., 2010). However, most of these studies have focused on one developmental stage and lack the resolution necessary for advancing our understanding of the many biological processes and signaling pathways that are involved in nodule formation, development, and senescence.

There is accumulating data indicating that post-transcriptional regulation of mRNA, by miRNAs and alternative splicing, plays a key regulatory role in various biological processes including nodulation. For example, 139 soybean miRNA genes have been classified as nodule-regulated miRNAs, which have the potential to regulate the expression of a significant number of target genes at different developmental stages (Yan et al., 2015). Alternative splicing is also a fundamental regulatory mechanism in plants that not only enhances transcriptome and proteome diversity but also contributes to the specificity and functions of genes. The possibility of generating multiple mRNA variants from a single gene through differential use of introns and exons may impact mRNA stability and protein abundance. For example, intron retention, the most common splicing event in plants (Filichkin et al., 2010, Zhang et al., 2010, Marquez et al.,

2012, Shen et al., 2014, Thatcher et al., 2014, Wang et al., 2014), is frequently producing non- functional mRNAs, which are immediately degraded through the nonsense-mediated mRNA

144

decay (NMD) surveillance system (Kalyna et al., 2012, Drechsel et al., 2013). Differential usage of exons that contain nuclear localization signals or protein-protein interaction domains can also impact protein localization and protein-protein networks. Also, alternative usage of exons and introns has the potential to modulate the targeting efficiency of miRNAs (Yang et al., 2012,

Thatcher et al., 2014). Recent reports have demonstrated the involvement of alternative splicing in various developmental programs, determining tissue identities, and stress responses.

Alternative splicing of individual genes impacts nodule organogenesis. These genes included the

CCAAT-binding transcription factor MtHAP2-1, the SymRK-interacting protein 1 (SIP1), and the Rj2 resistance gene (Combier et al., 2008a, Wang et al., 2013a, Tang et al., 2016).

Nevertheless, the level of global alternative splicing and its dynamic during nodule development remains elusive.

Consistent with the notion that genetic mechanisms function in tight association with epigenetic mechanisms, it has been recently demonstrated that epigenetic modifications, including DNA methylation and repressive histone marks contribute to transcriptional control of gene expression during nodule organogenesis in Medicago truncatula (Satge et al., 2016, Pecrix et al., 2018).

While these studies point to DNA demethylation as the major driving force of epigenetic modifications mediating gene expression reprogramming that controls indeterminate nodule differentiation, the role of this epigenetic mark in the regulation of gene expression during the development of determinate nodules remains unknown. DNA methylation frequently occurs in protein-coding genes and transposable elements (TEs) in the CG, CHG and CHH sequence contexts. The CG methylation is carried-out by the activity of methyltransferase enzyme MET1, whereas CHG methylation is catalyzed by the chromomethylases CMT2 and CMT3 (Law and

Jacobsen, 2010, Kim and Zilberman, 2014). CHH methylation is established de novo through the

145

function of DOMAINS REARRANGED METHYLTRANSFERASEs (DRMs) and the RNA- directed DNA methylation (RdDM) pathway (Matzke and Mosher, 2014). Cytosine methylation in the body and regulatory regions of protein-coding genes has been shown to regulate gene expression (Rambani et al., 2015, Hewezi et al., 2017). Methylation of TEs located within or nearby genes can also impact their transcriptional activity in various developmental and stress contexts (Secco et al., 2015, Hewezi et al., 2017, Piya et al., 2017).

Recent progress in high-throughput RNA sequencing (RNA-seq) technology and mapping of short reads has revolutionized not only gene expression quantification but also facilitated the identification of RNA modifications including splicing variants. Also, the development of whole

-genome bisulfite sequencing (BS-seq) has facilitated profiling genome-wide cytosine methylation at single-base resolution. In this study, we used RNA-seq and BS-seq to profile gene expression, alternative splicing variants, and DNA methylation changes and their interactions during three developmental stages of soybean nodules. Our analysis revealed that nodule development is not only associated with massive gene expression changes but also involves remarkable alternative splicing, particularly during the early nodule developmental stage. We demonstrate that differential exon and intron usage can impact the targeting efficiency of microRNAs and generate structural and functional rearrangements of protein domains that may impact gene expression and function. Genome-wide-DNA methylation analysis revealed that nodule methylomes undergo dynamic changes in a temporal and stage-specific manner and that

CHH hypermethyaltion and CHG hypomethylation are the key determinants of nodule methylomes. In summary, our analyses provide unprecedented insights into the interactions between the transcriptome, spliceome, and methylome of the developing nodules and expand our understanding of gene regulation during nodulation.

146

2. Results

2.1 Anatomical features characteristic of three distinct developmental stages of soybean nodules

Soybean nodules harvested at 12, 22, and 36 day after inoculation (dpi) with Bradyrhizobium diazoefficiens (strain USDA110) were examined by light and electron microscopy. Anatomical features characteristic of the three distinct stages in the lifecycle of a determinate nodule were observed (Figure 5-1). Light microscopic examination of 12 dpi soybean nodules revealed a central zone of actively dividing cells that is prominently stained by hematoxylin and eosin

(Figure 5-1a). The blue-purple stain reflects the abundance of nucleic acids in these cells. The outer cortex of parenchymatous cells and a layer of scleriod cells surrounded this region.

Numerous vascular bundles were also seen in the outer cortex of the nodule. Electron microscopic observation of 12 dpi soybean nodules revealed the presence of a prominent nucleus in the infected cells (Figure 5-1b). These cells contained granular cytoplasm and numerous organelles including mitochondria and endoplasmic reticulum. Bacteria were also seen in the infected cells that were enclosed by peribacteriod membrane. Light microscopic examination of

22 dpi soybean nodules revealed a prominent bacteria-infected central zone that stained pink with hematoxylin and eosin (Figure 5-1c). Most of the cells in this region appear to be completely filled with bacteria. Ultrastructural observation of this region clearly showed the presence of densely packed bacteria, which were differentiated into bacterioids (Figure 5-1d).

These bacterioids, which contained prominent poly--hydroxybutyrate granules, were enclosed by peribacteriod membranes (symbiosomes). In contrast to the 12 dpi nodules, the symbiosomes in the 22 dpi nodules enclosed several bacterioids. Prominent peroxisomes were seen in uninfected cells. Light microscopic examination of the 36 dpi soybean nodules revealed the presence of small vesicles inside the infected cells and small starch grains (dark particles) lining

147

the outer cell walls of uninfected cells (Figure 5-1e) as confirmed by Lugol's iodine staining.

Ultrastructural observation of thin sections of the 36 dpi nodules showed loss of cell organelle integrity and the presence of several vacuolar structures (Figure 5-1f). These vacuolar structures presumably represent the disintegrated symbiosomes resulting from nodule senescence.

In addition, the nitrogen fixation activity of soybean nodules harvested at 12, 22, and 36 dpi was estimated in three biological samples using the acetylene reduction method. Acetylene reduction activity was barely detected in 12 dpi nodules, highest in 22 dpi nodules (4.8 ± 0.4 μmol/g/hr), and declined in 36 dpi nodules (2.0 ± 0.2 μmol/g/hr). Together, these data show that the 12 dpi nodules are not fully differentiated for functional nitrogen fixation, whereas the 22 dpi nodules are fully developed showing high nitrogen fixation activity, which declines at 36 dpi where the nodules entre the senescence stage. Thus, at 12, 22, and 36 dpi the nodules formed in our experimental system represented the formation, development, and senescence stages of nodulation, respectively.

2.2 Half of soybean genes change expression during nodulation

To gain insight into the cellular reprogramming and gene expression changes that occur during nodule formation, development, and senescence, we performed comprehensive transcriptome analysis of the nodules at 12, 22, and 36 dpi formed by B. diazoefficiens (strain USDA110) infection. At each time point, three biological RNA samples were collected from the nodules and the corresponding root tissues (controls) and used for constructing 18 RNA-seq libraries (ref

Table S1, published paper). Principal Component Analysis (PCA) of RNA-seq data revealed high similarity between the biological replicates and obvious separation between root and nodules samples (Figure 5-2a). Using a false discovery rate (FDR) of 0.01 and log2 fold change

≥ 1, we identified 15,372, 20,057 and 19,883 differentially expressed genes (DEGs) in the 12-,

148

22- and 36-day-old nodules compared to the corresponding root tissues (Figure 5-2b, and Data

S1-3, published paper). The analysis identified 6,620, 8,821, and 9,131 upregulated genes and

8,752, 11,236, and 10,752 downregulated genes, respectively, the in the 12-, 22- and 36-day-old nodules compared with the corresponding root tissues. This resulted in a total of 26,301 unique

DEGs, indicating that about 50% of soybean genes change expression during nodulation. The

DEGs were grouped into five gene sets that exhibit distinct patterns of expression (Figure 5-2c).

The first three gene sets included 2,992, 1,880 and 2,430 DEGs that were specific to the 12-, 22- and 36-d-old nodules, respectively (Figure 5-2b, and Data S1-3, published paper). The fourth gene set included 9,669 nodule core genes that were similarly regulated during all nodule developmental stages (Figure 5-2b, and Data S4, published paper). The fifth gene list included

334 genes with expression patterns that distinguish the 12-d-old nodules from those of the 22- and 36-d-old nodules (Figure 5-2b, and Data S5, published paper). These 334 genes were upregulated in the 12-d-old nodules but were downregulated in both of the 22- and 36-d-old nodules or vice versa. These expression configurations imply that unique and common cellular reprogramming occurs during nodule developmental stages.

2.3 9,669 core genes determine nodule transcriptome identity

We examined whether the expression profiles of the 9,669 common nodule genes (Figure 5-2c) determine the identity of soybean nodules. To this end, we compared the expression patterns of those genes in various soybean tissues and organs using publicly available RNA-seq data. PCA analysis revealed that the nodule core genes could be clustered in a group separated from other tissues and organs (Figure 5-2d). KEGG pathway analysis was then used to assign the core genes to metabolic pathways. The analysis revealed that genes involved in the biosynthesis of secondary metabolites, particularly phenylpropanoid biosynthesis, are the most significantly

149

enriched among the 9,669 core genes (Figure 5-3). We also examined the core gene set for genes showing gradual increases or decreases in expression during the course of nodule development.

We identified 716 and 863 genes showing gradual increases or decreases in the expression, respectively (Figure 5-2e and Data S4, published paper). The 716 genes with continuous increases in the expression were enriched in oxidoreductase activity and metal-binding functions

(Figure 5-4a). The 863 genes with continuous decreases in expression were enriched in functions associated with primary metabolic processes, cell wall organization, and hydrolase activity

(Figure 5-4b). These findings suggest differential requirements for these cellular processes over the course of nodule development.

GO term enrichment analysis of the core genes revealed the molecular functions and biological processes that are common in soybean nodules undergoing various stages of development.

Among the most significantly overrepresented terms we found nitrate transport, xylem development, regulation of meristem growth, negative regulation of programmed cell death, negative regulation of defense response, cell wall biogenesis, regulation of cell size, cell tip growth, regulation of hydrogen peroxide metabolic process, metal ion transport, transmembrane transport, regulation of hormone levels, auxin polar transport, and root morphogenesis (Figure 5-

5). These findings are consistent with previous reports showing the importance of these biological processes during nodule formation and development (Cabeza et al., 2014, Yuan et al.,

2016, Yuan et al., 2017). Together, our analyses suggest that the common set of 9,669 genes represents a core suite of genes that determine nodule transcriptome identity.

2.4 Promoter:GUS activity of core and stage -specific genes confirms RNA-seq findings

We next examined the specificity of the expression patterns of four DEGs showing constitutive or stage-specific expression in the nodule using promoter:GUS assays. Transgenic hairy roots

150

expressing the promoter:GUS fusion constructs of Glyma.07G218400, Glyma.01G130200,

Glyma.14G142700, and Glyma.04G051000 were selected based on the expression of Discosoma sp. red fluorescent protein (DsRed) and then inoculated with B. diazoefficiens USDA110. GUS staining of root tissues at various nodule developmental stages revealed that GUS activity of the core gene Glyma.07G218400 throughout the three nodule developmental stages (Figure 5-6a-c).

In contrast, the stage specific genes Glyma.01G130200, Glyma.14G142700, and

Glyma.04G051000 showed GUS activity exclusive to the 12-, 22-, or 36-d-old nodules, respectively (Figure 5-6d-l), confirming RNA-seq data.

2.5 Each nodule developmental stage preferentially regulates various metabolic pathways

Nitrogen-fixing nodules in various legumes exhibit distinct metabolic phenotypes (Udvardi and

Poole, 2013). Thus, we examined whether particular metabolic pathways are preferentially regulated during various nodule developmental stage in our system. KEGG pathway analysis was implemented using the 2,992, 1,880 and 2,430 DEGs specific to the 12-, 22- and 36-d-old nodules, respectively. The analysis revealed that genes involved in the metabolism of purine and sulfur, and biosynthesis of carotenoids and folate are the most significantly enriched among the

2,992 DEGs unique to the 12-d-old nodules (Figure 5-7a). Similarly, genes involved in the metabolism of biotin, pyruvate, and glycerophospholipid, as well as those involved in synthesis and degradation of ketone bodies and biosynthesis of fatty acid are the most significantly enriched among the 1,880 DEGs specific to the 22-d-old nodules (Figure 5-7b). Genes encoding functions associated with ubiquitin mediated proteolysis and metabolism of ascorbate and aldarate were exclusively enriched among the DEGs unique to the 36-d-old nodules (Figure 5-

7c). These data suggest that the nodule transcriptome is reprogrammed to support specific metabolic demands associated with each developmental stage.

151

2.6 Nodule developmental stages acquired specific signal transduction components that contribute to nodule transcriptome specificity

Careful examination of the DEGs specific to each stage revealed that transcription factors (TFs) constitute a significant portion of the DEGs. There are 263, 160, and 139 differentially expressed

TFs that are unique to the 12-, 22-, and 36-d-old nodules, respectively (Figure 5-8 and Data S6-8, published paper). The majority of these transcription factors belong to the NAC, AP2, bZIP,

ERF, C2H2, bHLH, MYB, WRKY, ZF, and GRAS families (Figure 5-8 and Data S6-8, published paper). Of note is that several members belonging to each of these families are simultaneously expressed at particular nodule stage (Figure 5-8), suggesting that cross-regulation may occur between these family members to ensure tight and specific control over the transcriptional activity of downstream target genes. In addition to transcription factors, we noted that each stage is associated with different types of protein kinases and phosphatases, including leucine-rich repeat (LRR) receptors, mitogen-activated protein kinase (MAPK), dual specificity phosphatases and kinases. Together, these findings suggest that each nodule developmental stage uses specific signal transduction components that contribute to nodule transcriptome specificity.

2.7. Small secretory peptides contribute to nodule transcriptome specificity

Various classes of small secretory peptides have been shown to play vital role in symbiosis

Djordjevic et al., 2015; de Bang et al., 2017; Gautrat et al., 2020). Therefore, we examined stage- specific DEGs for transcripts coding for small secretory peptides. We found that various members of the CLV3/ESR-related (CLE), cysteine-rich, and Rapid ALkalinization Factor

(RALF) peptide classes (Figure 5-9), are specifically differentially regulated at the three nodule developmental stages, suggesting that members of these peptide classes may have stage-specific function during nodulation. Other peptides encoded by the stage-specific DEGs were also

152

identified, namely, ROTUNDIFOLIA like, EPIDERMAL PATTERNING FACTOR-like (EPFL) and Root meristem Growth Factor (RGF) (Figure 5-9). Interestingly, transcripts encoding

Tapetum Determinant 1 (TAPD1) peptides were identified only among the 12-d DEGs, suggesting a unique role of this peptide class during nodule organogenesis. Based on functional data of orthologs, potential roles of these peptides in autoregulation of nodulation (CLEs), infection threads and nodule development (RALFs and ROTUNDIFOLIA), suppression of plant defense responses (cysteine-rich and phytosulfokines), and nodule formation (RGF) can be postulated (Combier et al., 2008b, Mortier et al., 2010, Wang et al., 2015a, Li et al., 2019).

2.8 Nodule development involves highly elaborated transporter activities

Several studies have shown that nodule development involves active nutrient and metabolite transport to and from nodules (reviewed in Udvardi and Poole (2013). Our analysis revealed that nodule development employs networks of genes to carry out highly integrated transport functions. Among the nodule core genes there are more than 400 genes with transport-related functions (Data S4, published paper), including amino acid transporters, sugar transporters, ABC transporters, nitrate transporters, aquaporin transporters, peptide transporters, and various ion transporters for instance. Also, more than 200 genes encoding these transporters were identified among the stage-specific DEGs (Figure 5-10), a finding that may reflect the specialization and differences between the three nodule developmental stages in term of carbon and nitrogen metabolism.

2.9 Active mRNA splicing occurs during the early stage of nodule development

We next explored our RNA-seq data for differentially spliced events using the JunctionSeq package (Hartley and Mullikin, 2016). Using a FDR cut-off of 0.05, we identified 5,564, 1,099,

153

and 542 differentially spliced events in the nodules compared to the root tissues at 12, 22, and 36 dpi, respectively (Figure 5-11a-c, Data S9-11, published paper). These differentially spliced events were classified into two main groups: differentially used exons (DUEs) and intron retention (IR) events. In the 12-d-old nodules, 2,671 DUEs from 1,447 genes were identified

(Figure 5-11a, and Data S9, published paper). Of these DUEs, 497 were significantly highly used in the nodules and 2,174 were significantly highly used in the corresponding root samples

(Figure 5-11a, and Data S9, published paper). In the 22-d-old nodules, the number of DUEs was significantly lower. We identified 563 DUEs from 267 genes with 191 exons being significantly highly used in the nodules and 372 in the roots (Figure 5-11b, and Data S10, published paper). In the 36-d-old nodules, only 365 exons from 159 genes were identified as differentially spliced. Of these 365 DUEs, 128 were significantly highly used in the nodules and 237 in the roots (Figure

5-11c, and Data S11, published paper). Similarly, we detected 2,893, 536, and 275 IR events from 1,748, 308, and 167 genes in the 12-, 22-, and 36-d-old nodules, respectively (Figure 5-11a- c, and Data S9-11, published paper). Remarkably, 83.3 % of the IR events identified in the 12-d- old nodule were suppressed in comparison with root tissues (Data S9, published paper). In the

22-, and 36-d-old nodules only 61% and 51% of the identified IR events were suppressed (Data

S10 and 11, published paper). Also, we found that more than 99% (3,688/3,696) of the IR events produce premature termination codons (Data S9 - 11, published paper). These results indicate that active pre-mRNA splicing occurs during the early stage of nodule development with activation of exon skipping and repression of IR events as compared with root tissues.

In order to validate the alternative splicing data we quantified the abundance of 2 differentially spliced events at each nodule developmental stage using qPCR with primers designed to specifically amplify the targeted exons or introns. The relative expression levels of nodules and

154

root tissues at various developmental stages confirmed all six nodule splicing events with only slight differences in the abundance levels between qPCR and RNA-seq observed (Figure 5-11d- f). These results validate the splicing events detected from our RNA-seq data.

GO term enrichment analysis of genes containing differentially spliced exons or IR events revealed that the most predominant molecular function categories are those involved in oxidoreductase activity (Figure 5-12), suggesting that differential splicing may contribute to fine- tuning redox signaling during nodule development. KEGG pathway analysis revealed that genes involved in biosynthesis of primary and secondary metabolites, ubiquitination, plant-pathogen interactions, splicing, RNA transport, and protein processing are significantly enriched among the DSGs (Figure 5-13).

2.10 Splicing variants of certain genes were differentially regulated during nodule development

Analysis of the DUEs at each nodule developmental stage revealed unique and common splicing variants. For example, we identified 142 DUEs common to the 12 and 22-d-old nodules; 236 common to the 22- and 36-d-old nodules; and 96 common to the 12- and 36-d-old nodules. In addition, 70 exons belonging to 35 genes were found to have consistently undergone differential splicing during the three nodule developmental stages (Figure 5-11g). Therefore, we examined whether splicing variants of certain genes were differentially regulated during nodule development. Our results indicate that certain splicing variants can be differentially accumulated throughout nodule developmental stages. For examples, among the 142 DUEs common to the

12- and 22-d-old nodules, 7 showed opposite regulation in these two developmental stages in comparison with the corresponding root tissues. An obvious example is the exon 2 of

Glyma.14G145700, encoding a GATA binding transcription factor. The relative abundance of

155

this exon was increased two-fold in the 12-d-old nodules but it then showed more than two-fold decrease in the 22-d-old nodules despite the abundance of the mRNA not being significantly altered between the two nodule stages (Data S9 and 10, published paper). Similarly, of the 96

DUEs common to the 12- and 36-d-old nodules 12 were inversely regulated in these two stages.

A representative example is exon 4 of Glyma.01G080200, encoding a NAC domain transcription factor, whose abundance was reduced by ~ 2 fold in the 12-d-old nodules relative to root tissues but was increased in the 36-d-old nodules (Data S9 and 11, published paper). Another interesting example is the splicing events of exon 1 and 2 of the Glyma.16G170100 gene encoding an extensin-like protein. At 12 dpi, the abundance of exon 1 was increased but exon 2 was decreased. However, at 36 dpi this pattern was inverted with the abundance of exon 1 being decreased and exon 2 being increased (Data S9 and 11, published paper).

Of note is that the large majority (68) of the 70 DUEs during all the three nodule developmental stages are similarly spliced (Figure 5-11h), suggesting that inclusion or exclusion of these exons may be of fundamental importance for nodule development. The remaining two exons of the 70

DUEs showed a splicing pattern in the 12-d-old nodules that is opposite to that found in the 22- and 36-d-old nodules. Our analysis also revealed that 26 IR events were oppositely regulated between at least two nodule developmental stages. Together, these results indicate that certain splicing variants differentially accumulate during nodule development, yielding stage-specific transcripts.

2.11 Transcriptional activity, gene size, exon number, and intron length impact alternative splicing in the developing nodules

We examined whether gene expression impacts alternative splicing events in the nodules. To this end, we compared the lists of DEGs and DSGs identified at each nodule developmental stage for

156

potential overlaps. As shown in Figure 14a-c, 32.5%, 59.1% and 55.2% of the DSGs were also differentially expressed in the 12-, 22-, and 36-d-old nodules, implying that gene transcription may contribute to alternative splicing in the developing nodules. Comparing the abundance of differentially spliced exons and differentially retained introns with the expression levels of the corresponding genes revealed multidirectional associations: highly abundant exons/introns were associated with either up-regulation or down-regulation of the corresponding genes and vice versa (Figure 5-14d-f). This finding suggests that gene transcription and alternative splicing may impact each other in a reciprocal fashion.

We next examined the potential impact of gene size, exon number, and intron length on the frequency of alternative splicing at various nodule developmental stages. We found that DSGs with 3 or more splicing events are larger, and contain more exons and longer introns than those

DSGs with 1 or 2 splicing events (Figure 5-14g-i). This trend was consistent across the three nodule stages, suggesting that large genes and those with high numbers of exons and long introns tend to be alternatively spliced to a high degree.

2.12 Alternative splicing impacts protein domains organization

To provide further insights into the functional significance of alternative splicing during nodule development, the amino acid sequences of the DUEs were retrieved and used to identify annotated protein domains and families using Pfam database. A collection of functionally annotated protein domains was generated (Figure 5-15a, and Data S12, published paper). Among the most predominant domains in the DUEs we found the glycoside hydrolase family 1 domain, peroxidase superfamily domain, and the catalytic domain of the E2 ubiquitin-conjugating enzyme. Domains associated with defense responses, including protein kinase, leucine-rich repeat, NB-ARC, Toll/interleukin-1 receptor (TIR), lipoxygenase, xylanase inhibitor, LURP-

157

one-related (LOR) superfamily, and Mlo were also detected. Additionally, numerous DNA binding domains such as WRKY, B3, Myb-like, no apical meristem (NAM), homeodomain, and

AP2, were identified (Figure 5-15a and Data S12, published paper). Furthermore, domains involved in mRNA splicing were recognized. This included a number of RNA recognition motifs, cyclophilin, DEAD/DEAH box helicase, and pentatricopeptide repeat (PPR) (Figure 5-

15a and Data S12, published paper). Also, several protein-protein interaction domains were identified, including regulator SNARE, WD40, and Tetratricopeptide repeat-like domain (TPR) for instance. Together, these data suggest that differential usage of exons can alter organization of protein domains and likely influence protein-protein interaction networks.

We used qPCR to confirm differential usage of exons encoding DNA binding domains of three transcription factors. The expression levels of Glyma.13G227100 and Glyma.04G257000 isoforms containing the AP2 and B3 domains, respectively, were more abundant in the three nodule developmental stages as compared with the corresponding roots (Figure 5-16a,b). In contrast, the Glyma.09G199800 isoform containing the AP2 domain was significantly less abundant in the 22- and 36-d nodules as compared with roots (Figure 5-16c).

2.13 Alternative splicing impacts the targeting efficiency of miRNAs

We next examined whether alternative usage of exons and introns impacts the ability of miRNA to recognize its target genes, thereby impacting mRNA stability and translation. To this end, we retrieved the nucleotide sequences of differentially spliced exons and introns and searched for putative miRNA binding sites using a penalty score cutoff of 3.0 for miRNA/target pairing. We identified 205 putative binding sites (in 182 unique genes) for 119 miRNA genes (Data 5-S13).

Remarkably, 19 putative binding sites in 18 different genes were identified for gma-miR1533

(Figure 5-15b). Similarly, several target sites were predicted for gma-miR10415, gma-miR9724,

158

gma-miR1520, and gma-miR396 (Figure 5-15b), suggesting that alternative splicing may eventually modulate the targeting efficiency of these miRNAs during nodule development. Our analysis also suggested that alternative splicing modulating miRNA-targeting efficiency may impact nodule transcriptomes as several differentially spliced transcription factors were found to contain predicted or confirmed miRNA binding sites. For example, exon 5 of the bZIP transcription factor Glyma.13G260300 is significantly less used in the 22-d-old nodules and contains a near perfect complementary site for gma-miR4378 (Figure 5-15c), thereby making this splice variant less accessible to miRNA regulation. In contrast, exon 2 of the gene encoding

ETHYLENE RESPONSIVE ELEMENT BINDING FACTOR5 (ATERF5, Glyma.01G206700) is significantly highly used in the 12-d-old nodules and contains a near perfect complementary site for gma-miR9752 (Figure 5-15d), thereby subjecting the mRNA of this transcription factor to fine post-transcriptional regulation by gma-miR9752.

2.14 Nodule-specific alternatively spliced genes

We next identified a list of genes that underwent alternative splicing only in the nodules by comparing our list of DSGs with those experienced alternative splicing in various soybean tissues and organs (Iniguez et al., 2017). Out of the 2,323 nodule DSGs, 516 were identified as nodule-specific alternatively spliced genes (Data 5-S14). Among the nodule-specific alternatively spliced genes we found several nodulins, leghemoglobin-related, nodule-specific sucrose transporters, SNARE-like. Genes with possible regulatory roles in nodule organogenesis were also identified, including several MAP kinases and transcription factors of the WRKY and

MYB families. Several genes involved in oxidation reduction were also identified among the nodule-specific alternatively spliced genes.

159

2.15 Hallmarks of nodule methylomes

Our RNA-seq data revealed that key genes involved in DNA methylation (MET1, DRM2, DRM3,

CMT1, CMT3, DDM1, and VIM1) and active demethylation (ROS1, and DME) are significantly differentially expressed between nodules and root tissues (Figure 5-17), suggesting active reprogramming of DNA methylation in the developing nodules. Therefore, we used bisulphite sequencing to profile genome-wide DNA methylation at a single base resolution in the three nodule developmental stages and the corresponding root tissues (Table S2, published paper) using the same tissues utilized for RNA-seq library preparation. Analysis of global DNA methylation over protein coding genes revealed higher levels of cytosine methylation in the three nodule developmental stages compared with the corresponding root tissues in the CG, CHG, and

CHH sequence contexts (Figure 5-18a-c). In contrast, global DNA methylation over TEs revealed differences between nodules and roots only in the CHH context (Figure 5-18d-f), with contrasting patterns between the 12-d-old nodules and that of the 22- and 36-d-old nodules.

Though the 12-d -old nodules showed reduced global CHH-methylation compared with the corresponding root tissues (Figure 5-18d), the 22- and 36-d-old nodules showed increased CHH- methylation (Figure 5-18e,f).

To further characterize the difference in DNA methylation in more detail, we identified differentially methylated regions (DMRs) between nodules and root tissues in all sequence contexts using a false discovery rate (FDR) less than 0.01 and a minimum methylation difference of 50%. Using these stringent criteria, we identified 28,405, 39,920, and 47,885 DMRs in the nodule at 12, 22 and 36 dpi compared with roots (Figure 5-18g). About 80% of these DMRs mapped to unannotated genomic regions (Figure 5-18g), and the remaining 20% mapped to TEs and protein-coding genes. In the 12-d-old nodule the number of hypo-DMRs was higher than

160

hyper-DMRs, whereas at the later developmental stages the opposite trend was observed with hyper-DMRs being higher than hypo-DMRs (Figure 5-18h), suggesting that DNA methylation increases with nodule age. Hierarchical clustering of the DMRs revealed that the large majority of the DMRs are unique to a single nodule developmental stage and that only small portions are common to two or the three stages (Figure 5-18i), suggesting that nodule methylomes undergo dynamic changes in a temporal and stage-specific mode. Examining the sequence contexts and methylation directions of the DMRs revealed striking features of the nodule methylomes. We found that DMRs in the CHG and CHH contexts are more abundant than CG-DMRs (Figure 5-

18j). Additionally, CHG-DMRs were predominantly hypomethylated, whereas CHH-DMRs were predominantly hypermethylated (Figure 5-18j), suggesting that hyper- and hypo- methylation in the nodules tend to occur in a sequence-specific fashion. Mapping the DMRs to annotated protein-coding genes and TEs revealed other characteristics of nodule methylomes.

There are 2,425, 2,831, and 2,589 DMRs overlapping with protein-coding genes (Data S15-17, published paper), and 4,100, 6,306, 5,456 DMRs overlapping with TEs in the nodules at 12, 22, and 36 dpi, respectively (Data S18-20, published paper). The number and percentage of CHH-

DMRs overlapping with gene promoter was increased about three folds between the 12- and 36- d-old nodules, whereas those associated with gene body regions were reduced (Figure 5-18k), suggesting that preferential CHH methylation sites change with a progression in nodule development. TE-associated DMRs overlapped mainly with the long terminal repeat retrotransposons Gypsy and Copia, as well as the DNA transposons Mutator, and this overlap was consistent across the three sequence contexts (Figure 5-18l). We also observed that differential DNA methylation occurs preferentially in the TEs located nearby genes (Figure 5-

161

19), suggesting the DNA methylation of TEs may impact the transcript abundance of proximate genes.

2.16 Impact of DNA methylation on nodule transcriptomes

Comparing the DEGs with those associated with DMRs at each nodule developmental stage revealed that more than one-third of the DMR-overlapping genes were significantly differentially expressed, resulting in a unique set of 1,864 differentially expressed/differentially methylated genes (Figure 5-20a). To further examine the association between DNA methylation and gene expression, we associated hyper- and hypo-DMRs in the gene body and flanking regions

(promoter and untranslated regions) with gene expression levels at the three nodule developmental stages. We found that gene body hypo-DMRs were associated with higher gene expression levels as compared with gene body hyper-DMRs (Figure 5-20b-d). This association was consistent across the CG, CHG, and CHH sequence contexts and the three nodule developmental stages. Similarly, hypo-DMRs in all sequence contexts located in the flanking regions were generally associated with higher gene expression levels as compared with hyper-

DMRs (Figure 5-20e-g) across the three nodule stages except CHH-DMRs at 36 dpi (Figure 5-

20g). Thus, differential DNA methylation in gene body and flanking regions appears to impact gene transcription levels in the developing nodules. Because hyper- and hypomethylation occurred favorably in TEs adjacent to genes (Figure 5-19), we examined the association between

TE methylation and transcript abundance of the closet genes within 5 kb. We found a trend of increased transcript abundance of genes located within 5 kb of hypermethylated TEs compared with those genes associated with hypomethylated TEs. This trend was consistent with TEs located downstream or upstream of nearby genes (Figure 5-20h,i), suggesting that DNA methylation of TEs located within or nearby genes may suppress TE-originated aberrant

162

transcripts that interfere with active gene expression. GO term enrichment analysis indicated that the 1864 differentially expressed/differentially methylated genes are enriched for genes involved in jasmonic acid biosynthetic process, defense responses, signal transductions, DNA metabolic process, and beta-glucan biosynthetic process (Figure 5-21).

3. Discussion

In this study, we present comprehensive analyses of gene expression, alternative splicing, and

DNA methylation changes in the developing soybean nodules. Our analysis revealed extensive interaction between gene transcription and DNA methylation as well as widespread alternative splicing events, which were previously unknown. The transcriptome data clearly uncovered key transcription patterns of nodule development that included a set of 9,669 core genes and 7,302 stage-specific genes. While 16,832 of the previously reported nodule DEGs (Brechenimacher et al., 2008, Hayashi et al., 2008, Yuan et al., 2016, Yuan et al., 2017) were identified in the current analysis, the versatility of our RNA-seq analysis expanded the nodule DEGs with more than

9,000 genes. The core- and stage-specific expression patterns were associated with a significant number of up- and downregulated TFs belonging to the NAC, AP2, bZIP, ERF, C2H2, bHLH,

MYB, WRKY, ZF, and GRAS families, which previously have been shown to play key regulatory roles during nodulation (Godiard et al., 2011, Battaglia et al., 2014, Kang et al., 2014,

Soyano and Hayashi, 2014). The identification of a large number of transcription factors (1,409) among the core (847) and stage-specific DEGs (562 genes) not only expands the list of previously reported nodule transcription factors several fold but also sheds light on a possible cross regulation and combinatorial mode of action.

Of note is that members of several transcription factor families showed both up- and downregulation at a single nodule developmental stage, a finding that demonstrates the

163

complexity of gene regulation during nodule development. While some of these TFs may be bifunctional acting as activators and repressors, their functions might provide fine control over the expression of their direct targets. DNA methylation of gene body and regulatory regions has been proposed to play a role in fine-tuning gene expression (Regulski et al., 2013, Takuno and

Gaut, 2013, Rambani et al., 2015, Hewezi et al., 2017). Our analysis indicates that DNA methylation may directly contribute to the regulatory functions of these TFs. Of the 1,409 differentially expressed TFs, 119 were differentially methylated in the promoter or gene body regions. In addition, DNA methylation in the promoter and untranslated regions (UTRs) may interfere with the binding capacity of the TFs to regulate the transcription of their target genes, thereby modulating the gene expression network. This hypothesis is supported by our finding that 2091 gene promoters were identified as differentially methylated. Our data also demonstrate that alternative splicing can modulate transcription factor activity during nodule development as

97 of the differentially expressed TFs were also subjected to differential exon usage and/or IR events. For example, exon 2 and 3 of Glyma.09G199800 encode an AP2 DNA binding domain, and hence differential usage of these exons during the three nodule stages (Supplemental Figure

9) is expected to impact the ability of this TF to bind to its target genes. Furthermore, differential usage of exons can also modulate TF association with other proteins as in the case of the HLH domain located in exon 1 and 2 of the bHLH TF Glyma.12G114300. The HLH is a protein- protein interaction domain involved in homo- and heterodimerization of bHLH proteins (Toledo-

Ortiz et al., 2003, Jin et al., 2011). Thus, low usage of exon 1 observed in the 12-d-old nodule may impact the ability of this TF to form homo- and heterodimer combinations with other bHLH factors, thereby generating tissue-restricted expression.

164

Metabolites exchange between the plant and bacterioids involves a wide range of highly integrated transporters whose expression is regulated in a temporal and spatial manner (Udvardi and Poole, 2013, Mus et al., 2016). One notable finding of our transcriptome analysis is that several hundreds of genes encoding transport functions were identified among the core and stage-specific genes. DNA methylation and alternative splicing seem to contribute to the regulation and functional specificity of the nodule transport system as more than 80 genes encoding transport functions were identified as differentially methylated or differentially spliced.

Interestingly, several of the transporter-encoding genes produced splicing variants exclusive to the nodules.

Beyond massive gene expression changes, nodule development involves remarkable alternative splicing activity particularly during the 12-d-old nodule stage. This finding supports previous reports on the prevalence of alternative splicing in rapidly dividing cells and differentiating organs (Shen et al., 2014, Thatcher et al., 2014). We identified a total of 2,323 genes that underwent alternative splicing events in at least one nodule developmental stage. Alternative splicing in the nodules seems to be a tightly regulated process as many splicing-related genes are themselves alternatively spliced, including several splicing factors and RNA binding proteins,

Our findings are in agreement with several recent reports showing that essential components of the spliceosome undergo alternative splicing and may control the splicing of their own transcripts in response to various developmental and stress signals (Filichkin et al., 2010, Chang et al., 2014, Shen et al., 2014, Thatcher et al., 2016, Calixto et al., 2018, Dong et al., 2018). In addition, genes involved in the nonsense-mediated mRNA decay (NMD) surveillance system also experienced alternative splicing. This included the eukaryotic release factor 1

(Glyma.10G170200), and the REGULATOR OF NONSENSE TRANSCRIPTS-LIKE PROTEIN

165

UPF3 (Glyma.04G257000), which function in NMD (Merai et al., 2013, Degtiar et al., 2015,

Szadeczky-Kardoss et al., 2018).

Our analysis also demonstrated that 516 genes undergo alternative splicing uniquely in the nodules and encode diverse functions related to nodule organogenesis and development. Since the large majority (>90%) of these genes are expressed in other tissues, alternative splicing of these genes may enhance their functional capacity in the nodules. Notable examples of these genes are those encoding WRKY and MYB transcription factors, which represent two of the largest and functionally diverse families of transcription factors in plants. Alternative splicing of

WRKY and MYB genes have shown to modulate their functional activity (Li et al., 2006, Peng et al., 2008, Zhao and Beers, 2013). Thus, it is tempting to speculate that alternative splicing of these transcription factors may give rise to variants with altered or nodule-specific regulatory roles. In agreement with this assumption, alternative splicing of a heat-shock transcription factor in Medicago sativa yield a nodule-specific transcript (He et al., 2008). Consistent with the role of reversible protein phosphorylation in the regulation of splicing factors and regulatory protein assembles (Stamm, 2008), we identified several genes involved in reversible protein phosphorylation, including phosphatases, serine/threonine-protein kinases, LRR receptor-like kinases, and MAP kinases, among the nodule-specific alternatively spliced genes. This finding points into a specific role of these kinases in controlling alternative splicing events in the nodules.

Further, our analyses suggest that the interaction between gene expression and alternative splicing is multifaceted and involves a variety of regulation. Alternative splicing may shape nodule transcriptome through the control of RNA biogenesis as GO terms related to gene expression, translation, structural constituent of ribosome, and ribonucleoprotein complex were

166

the most significantly enriched among the DSGs. Suppression of IR events could also shape the nodule transcriptome. It is well known that IR events frequently generate transcripts with premature termination codons that are selectively degraded through NMD (Chang et al., 2007,

Isken and Maquat, 2007, Kalyna et al., 2012, Drechsel et al., 2013). Interestingly, our data revealed that 77.8% (2,885/3,708) of the identified IR events were suppressed in the nodules as compared with the control root tissues. This would likely to reduce the abundance of nonfunctional transcripts and enhance the translation efficiency of productive transcripts. This is line with our finding that more than 99% (3,688/3696) of the IR events discovered in our analysis generate premature termination codons. Of note is that almost all of the IR events detected in several members of the MYB family TFs were suppressed in the 12-d-old nodules, a finding that may suggest that the magnitude of functional transcripts of these family members is of great importance for this nodule developmental stage. Thus, control of RNA biogenesis and processing, and suppression of IR events may provide the nitrogen-fixing nodules with a potent mechanism for adjusting gene expression levels taken into consideration that modulating the levels of functional and translatable mRNA can profoundly impact gene expression.

Unlike recent studies reporting on the absence of substantial association between gene expression and alternative splicing (Li et al., 2013, Chang et al., 2014, Hartmann et al., 2016,

Dong et al., 2018), our analyses suggest that both processes may be coordinated and functionally linked as about 40% of the DSGs were differentially expressed at the same nodule developmental stage. The association between the abundance of differentially spliced events and both of up- and downregulation of the corresponding genes highlights the complexity of this process. These multidirectional associations can be explained by the possibility that in some cases splicing could occur concurrently with gene transcription (Kornblihtt et al., 2004), leading

167

to positive linkage between gene expression and alternative splicing events. In other cases, splicing could be initiated after completion of gene transcription, and then degradation of nonfunctional transcripts through NMD could lead to gene downregulation. In addition, our finding that more than two-thirds (1,574 of 2,323) of the DSGs exhibited multi-splicing events, many of which are in opposite directions, not only highlights the complexity of this process but also precluded our effort to generalize such association between gene expression and alternative splicing. The comparison between DEGs and DSGs also implied that changes in protein abundance in the developing nodule could occur without changes in gene expression because

60% of the DSGs showed no changes in expression between nodules and root tissues.

We demonstrated that differential usage of exons can generate structural and functional rearrangements of protein domains that potentially impact gene expression and function. For example, alternatively spliced transcripts encoding TFs could lose their DNA-binding domains, therefore abrogating their ability to regulate expression of their targets. Additionally, stage- specific splicing events may contribute to stage-specific function. For instance, the genes encoding the NAM domain TFs Glyma.11G096600, Glyma.14G152700, and Glyma.12G186900 have the potential to lose their DNA-binding domains at one or more nodule developmental stages, resulting in stage-specific function. Alternative splicing can also impact gene expression abundance via generating miRNA-sensitive or -insensitive transcripts (Yang et al., 2012,

Thatcher et al., 2014). In this context, we found that 182 nodule alternatively spliced genes have the potential to lose or gain miRNA target sites that are anticipated to alter the targeting efficiency of 119 miRNAs. The fact that many of these 119 miRNAs were reported to be expressed in soybean nodules (Wang et al., 2009, Joshi et al., 2010, Yan et al., 2015), implies that splicing-modulated miRNA targeting efficiency could be of biological significance. In

168

accordance with this suggestion, the differentially spliced transcripts included binding sites for miR1512, miR172, and miR167, which were reported to play key role in soybean nodule formation and development (Li et al., 2010, Thatcher et al., 2014, Wang et al., 2015b).

The mechanism underlying alternative splicing is partially elucidated, and cis- and trans-splicing factors seem to play crucial roles in regulating this process (Reddy et al., 2013). Differences in splicing activity between the 12-d-old nodules and later stages are likely related to the significant differences in the expression of considerable number of cis- and trans-splicing factors (Data 5-

S21), a finding in agreement with previous reports (Thatcher et al., 2014, Calixto et al., 2018).

Given recent studies indicating that DNA methylation may contribute to the regulation of alternative splicing in maize and rice (Regulski et al., 2013, Wang et al., 2016), we examined this possibility by comparing the DSGs and gene-body DMGs at each developmental stage.

Interestingly, 159 unique genes common to these sets were identified, indicating that DNA methylation may contribute to the regulation of alternative splicing in the developing nodules.

Remarkably, the large majority (89.3%) of the gene-body DMRs overlapping with the 159 DSGs was hyper-methylated, suggesting that gain but not loss of DNA methylation may influence alternative splicing efficiency. Our finding also implies that the impact of DNA methylation on alternative splicing efficiency is independent of DNA methylation sequence context as gene- body methylation of these 159 DSGs was found in CG (28.3%), CHG (34.4%) and CHH

(37.3%) contexts. The mechanisms through which DNA methylation regulates alternative splicing remain largely unknown but it may involve methyl-CpG-binding domain (MBD) proteins and homologues of Heterochromatin Protein1 (HP1) (Maor et al., 2015). MBD proteins recognize methylated DNA in various sequence contexts and recruit repressor complexes (Grafi et al., 2007) that may slow Pol II elongation rate leading to exon or intron inclusion on the

169

mature mRNA. Plant Like HP1 (LHP1) binds to the repressive epigenetic mark H3K27me3

(Turck et al., 2007), which is associated with body DNA methylation (Mathieu et al., 2005, Zhou et al., 2016) and, analogous to animal HP1, it may recruit splicing factors to the pre-mRNA

(Maor et al., 2015). In support of this hypothesis, the soybean homolog of Arabidopsis LHP1

(Glyma.03G094200) was identified among the core upregulated genes. Although the extent to which gene body methylation impacts alternative splicing needs to be investigated further, other epigenetic marks and chromatin assembly may also play crucial role in the regulation of alternative splicing.

In our system both DNA hyper- and hypo-methylation establish the methylome of soybean nodules, significantly impacting the expression of more than 1,800 genes. The patterns of hyper- and hypo-methylation seem be stage-specific and occur in a sequence-specific manner. Our finding that CHH-DMRs were mostly hypermethylated is in agreement with a recent study showing that hyper CHH-DMRs were predominant in nodule samples of Medicago truncatula

(Satge et al., 2016), suggesting that common components of the epigenetic mechanisms controlling determinant and indeterminate nodules have been equally recruited during evolution.

The prevalence of hyper CHH-DMRs in soybean nodules seems to be associated with increased expression of DRM2 and its paralog DRM3 throughout the three nodule developmental stages.

The involvement of SUVH histone methyltransferases, which regulate CHH methylation independent of siRNAs (Stroud et al., 2013, Li et al., 2018), could be also responsible for the hyper CHH-DMRs as homologs of SUVH6 (Glyma.01G202700 and Glyma.11G040100) were identified among the upregulated nodule core genes. We also observed that the number of CHH-

DMRs increased noticeably from early to late nodule developmental stages, suggesting a role of

CHH methylation in nodule stage transition. Similarly, the level of CHH methylation was

170

dramatically increased from early to late stages of seed development (Bouyer et al., 2017,

Kawakatsu et al., 2017). Another striking feature of the nodule methylome is the massive loss of

DNA methylation at CHG sites. More than three-fourths of CHG-DMRs were hypomethylated, which could point to a mechanism for counterbalancing the elevated level of CHH methylation despite the fact that the overlap between hypo-CHG DMRs and hyper-CHH DMRs is very negligible. Our gene expression analysis suggests that hypomethylation of CHG sites is mediated through active and passive mode of actions. Though increased expression of REPRESSOR OF

SILENCING 1 (ROS1) and DEMETER (DME) during the 12- and 22-day-old nodules might be responsible for CHH hypomethylation, absence of CMT3 expression during the latest stage may have contributed to passive demethylation of CHH sites.

In this study, elucidating the interrelationship between gene expression, alternative splicing and epigenetic modifications has shed light on possible mechanism of regulation underlying cell differentiation and reprogramming during nodule development. Our analysis revealed dynamic changes in gene expression, reprogramming of DNA methylation, and widespread alternative splicing events during nodule development in soybean. Our results indicate that activation of exon skipping and repression of IR events are important features of the nodule spliceome. Our results also establish CHH hypermethylation and CHG hypomethylation as key determinants of the nodule methylome. Together the data presented in this study can be explored to further address the relevance of alternative splicing variants and different epigenetic statuses to nodule development and function.

171

4. Material and methods

4.1 Bacterial strain and growth conditions

Starter cultures of Bradyrhizobium diazoefficiens USDA110 were grown in 30 mL of yeast extract-mannitol (YEM) broth containing spectinomycin (100 µg/mL) on a rotary shaker (150 rpm) for 3 days at 30° C. This culture was used to inoculate 1 L of antibiotic-free YEM broth in a Fernbach flask and grown for an additional 3 days.

4.2 Plant growth conditions and tissue collection

Soybean seeds (Glycine max (L.) Merr. cv Williams 82), were obtained from Missouri

Foundation Seed Stocks, Columbia, MO. The seeds were surface-sterilized in 50% bleach (2.5%

NaClO) and about 200 seeds were planted in large plastic tray (50 x 28 x 15 cm deep) with drainage holes, filled with vermiculite. Each tray was watered at planting with 4 L of sterile water and the seeds allowed to germinate in dark for 3 days. B. diazoefficiens USDA110 cultures diluted in YEM broth to give a final concentration of 4.5 x 106 cells/ml were poured over each tray. Following inoculation, the trays were transferred to a growth chamber that was maintained at a constant temperature of 28° C with a light intensity of 500 mol of photons per square meter per second, and a 12-h light period. Similarly, trays of soybean seeds that were not inoculated with B. diazoefficiens USDA110 were also grown under the same conditions. Soybean plants were watered alternatively with sterile water and Jensen's Nitrogen-free solution as needed.

Roots and nodules harvested at 12, 22, and 36 dpi were immediately frozen in liquid nitrogen and stored at -80° C.

172

4.3 Light and electron microscopy

Freshly harvested nodules at 12, 22, and 36 dpi were immediately fixed either in FAA or 2.5% glutaraldehyde and processed for light and electron microscopy as described earlier (Krishnan,

2002).

4.4 Acetylene reduction assay

Nitrogen fixation activity of soybean nodules was estimated using the acetylene reduction method (Schwinghamer et al., 1970).

4.5 RNA-seq library preparation and analysis

Total RNA of each sample was extracted from 100 mg frozen ground plant tissue following the method described by Verwoerd et al. (1989). Total RNA was treated with DNase I (Invitrogen).

RNAseq libraries were prepared using NEBNext Ultra RNA Library Prep Kit (New England

Biolabs) following the manufacturer’s instructions. The nine libraries of nodules (3-time points

X 3 biological replicates), and the nine libraries of corresponding root tissues were multiplexed and sequenced in two different lanes using the HiSeq2500 system with 150-bp pair-end reads.

Quality assessment of sequenced reads was done using FastQC version 0.11.4. After removing the low quality reads and adapters with Trimmomatic 0.35 (Bolger et al., 2014), the remaining high-quality reads were mapped to the latest soybean reference genome assembly using TopHat v2.0.14 (Trapnell et al., 2009). Reads mapped to the soybean genome were counted using HTSeq

(Anders et al., 2015). 2015). Differentially expressed genes (DEGs) were identified using

DESeq2 package (Love et al., 2014). A gene was considered differentially expressed if the log2FC was ≥1 using a false discovery rate of less than 1%. Analysis of differential used exons and splice junctions was performed using R package JunctionSeq (Hartley and Mullikin, 2016).

173

JunctionSeq pipeline uses the QoRTs quality-control/data-processing software package (Hartley and Mullikin, 2015) for quality control and read counts. Reads mapped to each non-overlapping exonic regions and splice junctions belonging to each individual gene were counted. Similarly, reads mapped to novel splice junctions that are within the gene's span with a minimum of six reads per biological replicate were also counted. These read counts were used to compute differential usage of each exonic region or splice junction using the DESeq2 package (Love et al., 2014) with a false discovery rate less than 5%.

For principle component analysis (PCA) of nodule core genes in different soybean tissues, read counts for the 9,669 core genes were extracted from publicly available RNA-seq experiments summarized in Table 5-S3. PCA were generated using plotPCA function in DESeq2 (Love et al.,

2014).

4.6 MethylC- Seq library preparation and analysis

DNA was extracted using DNAeasy Plant Mini Kit (Qiagen) following the manufacturer’s instructions. DNA was then treated with bisulfite using Zymo EZ DNA Methylation kit (Zymo

Research). The bisulfite-treated DNA samples were used to prepare methylC-seq libraries. The libraries were prepared using TruSeq DNA methylation Kit (Illumina) as per manufacturer’s instructions. The nine libraries of nodules (3 time points X 3 biological replicates), and the nine libraries of corresponding root tissues were multiplexed and sequenced in two different lanes using the HiSeq2500 system with 150-bp pair-end reads.

Quality assessment of sequenced reads was done using FastQC version 0.11.4. After removing the low quality reads and adapters with Trimmomatic 0.35 (Bolger et al., 2014), the remaining high-quality reads were mapped to the latest soybean reference genome version (Wm82.a2.v1)

174

using Bismark (Krueger and Andrews, 2011). The mapped reads were used to identify differentially methylated cytosines in each of 18 samples separately using the methylKit package

(Akalin et al., 2012). Cytosines were considered only if covered by at least 10 high quality reads.

Three methylation call files corresponding to the CG, CHG, and CHH sequence contexts were generated for each treatment. Then, the soybean genome was divided into 200-bp non- overlapping windows and used to identify differentially hyper- and hypo-methylated regions

(200 bp pins) in nodule samples compared with the corresponding root samples using methylKit package (Akalin et al., 2012) with a minimum methylation difference of 50% in the CG, CHG and CHH sequence contexts using a false discovery rate of less than 1%. Differentially methylated regions (DMRs) overlapping with various annotated features of protein-coding genes and transposable elements (TEs) were determined and classified. Genomic coordinates of TEs in the new assembly of soybean reference genome (Wm82.a2.v1) (Rambani et al., 2020) were used to map DMRs to various TE families.

4.7 Quantitative real-time RT-PCR analysis

Total RNA was isolated from 20 mg tissue as explained in Verwoerd et al. (1989). Total RNA was treated with DNase I (Invitrogen) and approximately 25 ng RNA was used in a qPCR reaction. The qPCR was performed using Verso SYBR green One-Step qRT-PCR Rox mix

(Thermo Scientific) according to the manufacturer’s protocol. The PCR cycle comprised of 1 cycle of 50°C for 15 minutes and 95°C for 15 minutes for cDNA synthesis and thermo-start activation followed by 40 cycles of 95°C for 15 s, 60°C for 30 s and 72°C for 30 s. The dissociation curves were created using the following program: 95°C for 15 s and 60°C for 75 s, followed by a slow gradient from 60°C to 95°C. E3 ubiquitin ligase was used as internal controls to normalize mRNA level. The primer sequences used in this study are presented in Table 5-1.

175

4.8 MicroRNA target prediction

The putative targets of the miRNAs were predicted using psRNATarget (Dai et al., 2018). A stringent penalty score cutoff of 3.0 or lower was used as threshold for predicting the miRNA targets.

4.9 Identification of protein domains

Conserved protein domains were identified by submitting protein sequence to batch web CD- search tool (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) using E-value of 0.01.

4.10 GO term and KEGG analysis

GO terms enrichment analysis was performed using agriGO (Du et al., 2010) with Fisher’s exact test and Bonferroni multi-test adjustment with a significance cutoff p-value of 0.05. KEGG analysis was performed using KOBAS and pathways with adjusted p-value of <0.05 was considered statistically significant (Xie et al., 2011).

4.11 Promoter GUS experiment

For promoter-GUS analysis, we amplified around a two Kilobase promoter DNA fragment using the primers listed in Table 5-1 and cloned into Gateway entry clone (GEC) CD3-1840 (Wang et al., 2013b) above the GUS gene to generate a translational GUS construct driven by promoter of the gene of interest. To create the destination vector, the polyA signal of Octopine synthase from the pEarleyGate 301 (Earley et al., 2006) vector was cloned into pKGW_RedRoot using SpeI and HindIII restriction sites (Karimi et al., 2002) and hence named pKGW_RedRoot_OCSA.

The promoter-GUS constructs from the entry vector were transferred to the destination vector by

LR reaction and were confirmed by sequencing at UT genome core sequencing facility. The

176

sequence-confirmed constructs were transformed to Agrobacterium rhizogenes strain K599; and transient transgenic soybean roots were generated by hairy root transformation technique

(Kereszt et al., 2007). Transgenic hairy roots were screened based on the expression of DsRed under a fluorescent stereomicroscope and non-transgenic roots were excised from the plants.

Plants with transgenic roots were inoculated with B. diazoefficiens USDA110, transferred into the pot with vermiculite and watered twice a week with the nitrogen-free solution. The roots were then excised from the plants at stated time points and stained for GUS expression. GUS staining was performed as described in Jefferson et al. (1987) and images were captured using a stereo-microscope fitted with a camera. The images were further processed using ImageJ and

Adobe Photoshop.

177

References

Akalin A, Kormaksson M, Li S, Garrett-Bakelman FE, Figueroa ME, Melnick A, Mason, CE (2012) methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biology, 13:R87. Anders S, Pyl PT, Huber W (2015) HTSeq-a Python framework to work with high-throughput sequencing data. Bioinformatics, 31:166-169. Battaglia M, Ripodas C, Clua J, Baudin M, Aguilar OM, Niebel A, Zanetti ME, Blanco FA (2014) A nuclear factor Y interacting protein of the GRAS family is required for nodule organogenesis, infection thread progression, and lateral root growth. Plant Physiology, 164:1430-1442. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30: 2114-2120. Bouyer D, Kramdi A, Kassam M, Heese M, Schnittger A, Roudier F, Colot V (2017) DNA methylation dynamics during early plant life. Genome Biology, 18: 179. Brechenimacher L, Kim MY, Benitez M, Li M, Joshi T, Calla B, Lee MP, Libault M, Vodkin LO, Xu D, Lee SH, Clough SJ, Stacey G (2008) Transcription profiling of soybean nodulation by Bradyrhizobium japonicum. Molecular Plant Microbe Interaction, 21: 631-645. Cabeza RA, Liese R, Lingner A, von Stieglitz I, Neumann J, Salinas-Riester G, Pommerenke C, Dittert K, Schulze J (2014) RNA-seq transcriptome profiling reveals that Medicago truncatula nodules acclimate N 2 fixation before emerging P deficiency reaches the nodules. Journal of Experimental Botany, 65: 6035-6048. Calixto CPG, Guo WB, James AB, Tzioutziou NA, Entizne JC, Panter PE, Knight H, Nimmo HG, Zhang RX, Brown JWS (2018) Rapid and dynamic alternative splicing impacts the Arabidopsis cold response transcriptome. Plant Cell, 30: 1424-1444. Chang CY, Lin WD, Tu SL (2014) Genome-wide analysis of heat-sensitive alternative splicing in Physcomitrella patens. Plant Physiology, 165: 826-840. Chang YF, Imam JS, Wilkinson ME (2007) The nonsense-mediated decay RNA surveillance pathway. Annual Review of Biochemistry, 76: 51-74.

178

Combier JP, de Billy F, Gamas P, Niebel A, Rivas S (2008a) Trans-regulation of the expression of the transcription factor MtHAP2-1 by a uORF controls root nodule development. Gene and Develepment, 22: 1549-1559. Combier JP, Kuster H, Journet EP, Hohnjec N, Gamas P, Niebel A (2008b) Evidence for the involvement in nodulation of the two small putative regulatory peptide-encoding genes MtRALFL1 and MtDVL1. Molecular Plant Microbe Interaction, 21: 1118-1127. Dai X, Zhuang Z, Zhao PX (2018) psRNATarget: a plant small RNA target analysis server (2017 release). Nucleic Acids Research, 46: W49-W54. de Bang TC, Lundquist PK, Dai X, Boschiero C, Zhuang Z, Pant P, Torres-Jerez I, Roy S, Nogales J, Veerappan V, Dickstein R, Udvardi MK, Zhao PX, Scheible WR (2017) Genome-wide identification of medicago peptides involved in macronutrient responses and nodulation. Plant Physiology, 175: 1669-1689. Degtiar E, Fridman A, Gottlieb D, Vexler K, Berezin I, Farhi R, Golani L, Shaul O (2015) The feedback control of UPF3 is crucial for RNA surveillance in plants. Nucleic Acids Research, 43: 4219-4235. Djordjevic MA, Mohd-Radzman NA, Imin N (2015) Small-peptide signals that control root nodule number, development, and symbiosis. Journal Experimental Botany, 66: 5171- 5181. Dong C, He F, Berkowitz O, Liu J, Cao P, Tang M, Shi H, Wang W, Li Q, Shen Z, Whelan J, Zheng L (2018) Alternative splicing plays a critical role in maintaining mineral nutrient homeostasis in rice (Oryza sativa). Plant Cell, 30: 2267-2285. Drechsel G, Kahles A, Kesarwani AK, Stauffer E, Behr J, Drewe P, Ratsch G, Wachtera A (2013) Nonsense-mediated decay of alternative precursor mRNA splicing variants is a major determinant of the Arabidopsis steady state transcriptome. Plant Cell, 25: 3726- 3742. Earley KW, Haag JR, Pontes O, Opper K, Juehne T, Song K, Pikaard CS (2006) Gateway- compatible vectors for plant functional genomics and proteomics. Plant Journal, 45: 616- 629. Ferguson BJ, Indrasumunar A, Hayashi S, Lin MH, Lin YH, Reid DE, Gresshoff PM. (2010) Molecular analysis of legume nodule development and autoregulation. Journal of Integrative Plant Bioliology, 52: 61-76.

179

Filichkin SA, Priest HD, Givan SA, Shen RK, Bryant DW, Fox SE, Wong WK, Mockler TC (2010) Genome-wide mapping of alternative splicing in Arabidopsis thaliana. Genome Research, 20: 45-58. Gautrat P, Laffont C, Frugier FJCB (2020) Compact root architecture 2 promotes root competence for nodulation through the miR2111 systemic effector. Current Biology, 30: 1339-1345. Godiard L, Lepage A, Moreau S, Laporte D, Verdenaud M, Timmers T, Gamas P (2011) MtbHLH1, a bHLH transcription factor involved in Medicago truncatula nodule vascular patterning and nodule to plant metabolic exchanges. New Phytologist, 191: 391-404. Grafi G, Zemach A, Pitto L (2007) Methyl-CpG-binding domain (MBD) proteins in plants. Bba-Gene Structure and Expression, 1769: 287-294. Hartley SW, Mullikin JC (2015) QoRTs: a comprehensive toolset for quality control and data processing of RNA-Seq experiments. BMC Bioinformatics, 16: 224. Hartley SW, Mullikin JC (2016) Detection and visualization of differential splicing in RNA- Seq data with JunctionSeq. Nucleic Acids Research, 44: e127-e127. Hartmann L, Drewe-Boss P, Wiessner T, Wagner G, Geue S, Lee HC, Obermuller DM, Kahles A, Behr J, Sinz FH, Ratsch G, Wachter A (2016) Alternative splicing substantially diversifies the transcriptome during early photomorphogenesis and correlates with the energy availability in Arabidopsis. Plant Cell, 28: 2715-2734. Hayashi S, Gresshoff PM, Kinkema M (2008) Molecular analysis of lipoxygenases associated with nodule development in soybean. Molecular Plant Microbe Interactions, 21: 843-853. He ZS, Zou HS, Wang YZ, Zhu JB, Yu GQ (2008) Maturation of the nodule-specific transcript MsHSF1c in Medicago sativa may involve interallelic trans-splicing. Genomics, 92:115-121. Hewezi T, Lane T, Piya S, Rambani A, Rice JH, Staton M (2017) Cyst nematode parasitism induces dynamic changes in the root epigenome. Plant Physiology, 174: 405-420. Iniguez LP, Ramirez M, Barbazuk WB, Hernandez G (2017) Identification and analysis of alternative splicing events in Phaseolus vulgaris and Glycine max. BMC Genomics, 18: 650. Isken O, Maquat LE (2007) Quality control of eukaryotic mRNA: safeguarding cells from abnormal mRNA function. Genes Development, 21: 1833-1856.

180

Jin J, Hewezi T, Baum TJ (2011) The Arabidopsis bHLH25 and bHLH27 transcription factors contribute to susceptibility to the cyst nematode Heterodera schachtii. Plant Journal, 65: 319-328. Joshi T, Yan Z, Libault M, Jeong DH, Park S, Green PJ, Sherrier DJ, Farmer A, May G, Meyers BC, Xu D, Stacey G (2010) Prediction of novel miRNAs and associated target genes in Glycine max. BMC bioinformatics, 11 Suppl 1, S14. Kalyna M, Simpson CG, Syed NH, Lewandowska D, Marquez Y, Kusenda B, Marshall J, Fuller J, Cardle L, McNicol J, Dinh HQ, Barta A, Brown JWS (2012) Alternative splicing and nonsense-mediated decay modulate expression of important regulatory genes in Arabidopsis. Nucleic Acids Reseacrh, 40: 2454-2469. Kang H, Chu XJ, Wang C, Xiao AF, Zhu H, Yuan SL, Yang ZZ, Ke DX, Xiao SB, Hong ZL, Zhang ZM (2014) A MYB coiled-coil transcription factor interacts with NSP2 and is involved in nodulation in Lotus japonicus. New Phytologist, 201: 837-849. Karimi M, Inze D, Depicker A (2002) GATEWAY vectors for Agrobacterium-mediated plant transformation. Trends in Plant Sciences, 7: 193-195. Kawakatsu T, Nery JR, Castanon R, Ecker JR (2017) Dynamic DNA methylation reconfiguration during seed development and germination. Genome Biology, 18: 171. Kereszt A, Li D, Indrasumunar A, Nguyen CD, Nontachaiyapoom S, Kinkema M, Gresshoff PM (2007) Agrobacterium rhizogenes-mediated transformation of soybean to study root biology. Nature Protocols, 2: 948-952. Kim MY, Zilberman D (2014) DNA methylation as a system of plant genomic immunity. Trends in Plant Sciences, 19: 320-326. Kornblihtt AR, De la Mata M, Fededa JP, Munoz MJ, Nogues G (2004) Multiple links between transcription and splicing. RNA, 10: 1489-1498. Krueger F, Andrews SR (2011) Bismark: a flexible aligner and methylation caller for Bisulfite- Seq applications. Bioinformatics, 27: 1571-1572. Law JA, Jacobsen SE (2010) Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nature Reviews Genetics, 11: 204-220. Lee H, Hur CG, Oh CJ, Kim HB, Park SY, An CS (2004) Analysis of the root nodule- enhanced transcriptome in soybean. Molecular Cell, 18: 53-62.

181

Lerouge P, Roche P, Faucher C, Maillet F, Truchet G, Prome JC, Denarie J (1990) Symbiotic host-specificity of Rhizobium meliloti Is determined by a sulfated and acylated glucosamine oligosaccharide signal. Nature, 344: 781-784. Li H, Deng Y, Wu TL, Subramanian S, Yu O (2010) Misexpression of miR482, miR1512, and miR1515 increases soybean nodulation. Plant Physiology, 153: 1759-1770. Li JG, Li XJ, Guo L, Lu F, Feng XJ, He K, Wei LP, Chen ZL, Qu LJ, Gu HY (2006) A subgroup of MYB transcription factor genes undergoes highly conserved alternative splicing in Arabidopsis and rice. Journal of Experimental Botany, 57: 1263-1273. Li Q, Li M, Zhang D, Yu L, Yan J, Luo L (2019) The peptide-encoding MtRGF3 gene negatively regulates nodulation of Medicago truncatula. Biochemical and Biophysical Research Communications, 523: 66-71. Li WF, Lin WD, Ray P, Lan P, Schmidt W (2013) Genome-wide detection of condition- sensitive alternative splicing in Arabidopsis roots. Plant Physiology, 162: 1750-1763. Li XQ, Harris CJ, Zhong ZH, Chen W, Liu R, Jia B, Wang ZH, Li SS, Jacobsen SE, Du JM (2018) Mechanistic insights into plant SUVH family H3K9 methyltransferases and their binding to context-biased non-CG DNA methylation. Proceedings of National Academy of Sciences of the USA, 115: E8793-E8802. Libault M, Farmer A, Brechenmacher L, Drnevich J, Langley RJ, Bilgin DD, Radwan O, Neece DJ, Clough SJ, May GD, Stacey G (2010) Complete transcriptome of the soybean root hair cell, a single-cell model, and its alteration in response to Bradyrhizobium japonicum infection. Plant Physiology, 152: 541-552. Libault M, Joshi T, Takahashi K, Hurley-Sommer A, Puricelli K, Blake S, Finger RE, Taylor CG, Xu D, Nguyen HT, Stacey G (2009) Large-scale analysis of putative soybean regulatory gene expression identifies a Myb gene involved in soybean nodule development. Plant Physiology, 151, 1207-1220. Limpens E, Bisseling T (2003) Signaling in symbiosis. Current Opiion in Plant Biology, 6: 343- 350. Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15: 550. Maor GL, Yearim A, Ast G (2015) The alternative role of DNA methylation in splicing regulation. Trends in Genetics, 31: 274-280.

182

Marquez Y, Brown JWS, Simpson C, Barta A, Kalyna M (2012) Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Research, 22: 1184-1195. Mathieu O, Probst AV, Paszkowski J (2005) Distinct regulation of histone H3 methylation at lysines 27 and 9 by CpG methylation in Arabidopsis. The Embo Journal, 24: 2783-2791. Matzke MA, Mosher RA (2014) RNA-directed DNA methylation: an epigenetic pathway of increasing complexity. Nature Reviews Genetics, 15: 394-408. Merai Z, Benkovics AH, Nyiko T, Debreczeny M, Hiripi L, Kerenyi Z, Kondorosi E, Silhavy D (2013) The late steps of plant nonsense-mediated mRNA decay. Plant Journal, 73: 50-62. Mortier V, Den Herder G, Whitford R, Van de Velde W, Rombauts S, D'Haeseleer K, Holsters M, Goormachtig S (2010) CLE peptides control Medicago truncatula nodulation locally and systemically. Plant Physiology, 153: 222-237. Mus F, Crook MB, Garcia K, Costas AG, Geddes BA, Kouri ED, Paramasivan P, Ryu MH, Oldroyd GED, Poole PS, Udvardi MK, Voigt CA, Ane JM, Peters JW (2016) Symbiotic nitrogen fixation and the challenges to its extension to nonlegumes. Applied Environmental Microbiology, 82: 3698-3710. Oldroyd GED, Murray JD, Poole PS, Downie JA (2011) The rules of engagement in the legume-rhizobial symbiosis. Annual Reviews of Genetics, 45: 119-144. Pecrix Y, Staton SE, Sallet E, Lelandais-Brere C, Moreau S, Carrere S, Blein T, Jardinaud MF, Latrasse D, Zouine M, Zahm M, Kreplak J, Mayjonade B, Satge C, Perez M, Cauet S, Marande W, Chantry-Darmon C, Lopez-Roques C, Bouchez O, Berard A, Debelle F, Munos S, Bendahmane A, Berges H, Niebel A, Buitink J, Frugier F, Benhamed M, Crespi M, Gouzy J, Gamas P (2018) Whole-genome landscape of Medicago truncatula symbiotic genes. Nature Plants, 4: 1017-1025. Peng Y, Bartley LE, Chen XW, Dardick C, Chern MS, Ruan R, Canlas PE, Ronald PC (2008) OsWRKY62 is a negative regulator of basal and Xa21-mediated defense against Xanthomonas oryzae pv. oryzae in rice. Molecular Plant, 1: 446-458. Piya S, Bennett M, Rambani A, Hewezi T (2017) Transcriptional activity of transposable elements may contribute to gene expression changes in the syncytium formed by cyst nematode in arabidopsis roots. Plant Signaling and Behaviour, 12: e1362521.

183

Popp C, Ott T (2011) Regulation of signal transduction and bacterial infection during root nodule symbiosis. Current Opinion in Plant Biology, 14: 458-467. Rambani A, Pantalone V, Yang S, Rice JH, Song Q, Mazarei M, Arelli PR, Meksem K, Stewart CN, Hewezi T (2020) Identification of introduced and stably inherited DNA methylation variants in soybean associated with soybean cyst nematode parasitism. New Phytologist, 227: 168-184 Rambani A, Rice JH, Liu J, Lane T, Ranjan P, Mazarei M, Pantalone V, Stewart CN Jr, Staton M, Hewezi T (2015) The methylome of soybean roots during the compatible interaction with the soybean cyst nematode. Plant Physiology, 168: 1364-1377. Reddy ASN, Marquez Y, Kalyna M, Barta A (2013) Complexity of the alternative splicing landscape in plants. Plant Cell, 25: 3657-3683. Regulski M, Lu ZY, Kendall J, Donoghue MTA, Reinders J, Llaca V, Deschamps S, Smith A, Levy D, McCombie WR, Tingey S, Rafalski A, Hicks J, Ware D, Martienssen RA (2013) The maize methylome influences mRNA splice sites and reveals widespread paramutation-like switches guided by small RNA. Genome Research, 23: 1651-1662. Satge C, Moreau S, Sallet E, Lefort G, Auriac MC, Rembliere C, Cottret L, Gallardo K, Noirot C, Jardinaud MF, Gamas P (2016) Reprogramming of DNA methylation is critical for nodule development in Medicago truncatula. Nature Plants, 2: 16166. Schwinghamer E A, Evans HJ, Dawson MD (1970) Evaluation of effectiveness in mutant

strains of Rhizobium by acetylene reduction relative to other criteria of N2 fixation. Plant Soil, 33: 192-212. Secco D, Wang C, Shou HX, Schultz MD, Chiarenza S, Nussaume L, Ecker JR, Whelan J, Lister R (2015) Stress induced gene expression drives transient DNA methylation changes at adjacent repetitive elements. Elife, 4: e09343. Severin AJ, Woody JL, Bolon YT, Joseph B, Diers BW, Farmer AD, Muehlbauer GJ, Nelson RT, Grant D, Specht JE, Graham MA, Cannon SB, May GD, Vance CP, Shoemaker RC (2010) RNA-Seq Atlas of Glycine max: A guide to the soybean transcriptome. BMC Plant Biology, 2229: 1110-1160. Shen YT, Zhou ZK, Wang Z, Li WY, Fang C, Wu M, Ma YM, Liu TF, Kong LA, Peng DL, Tian ZX (2014) Global dissection of alternative splicing in paleopolyploid soybean. Plant Cell, 26: 996-1008.

184

Soyano T, Hayashi M (2014) Transcriptional networks leading to symbiotic nodule organogenesis. Current Opiion in Plant Biology, 20: 146-154. Stamm S (2008) Regulation of alternative splicing by reversible protein phosphorylation. Journal of Biology and Chemistry, 283: 1223-1227. Stroud H, Greenberg MV, Feng S, Bernatavichute YV, Jacobsen SE (2013) Comprehensive analysis of silencing mutants reveals complex regulation of the Arabidopsis methylome. Cell, 152: 352-364. Szadeczky-Kardoss I, Csorba T, Auber A, Schamberger A, Nyiko T, Taller J, Orban TI, Burgyan J, Silhavy D (2018) The nonstop decay and the RNA silencing systems operate cooperatively in plants. Nucleic Acids Research, 46: 4632-4648. Takuno S, Gaut BS (2013) Gene body methylation is conserved between plant orthologs and is of evolutionary consequence. Proceedings of National Academy of Sciences of the USA, 110: 1797-1802. Tang F, Yang S, Zhu H (2016) Functional analysis of alternative transcripts of the soybean Rj2 gene that restricts nodulation with specific rhizobial strains. Plant Biology, 18: 537-541. Thatcher SR, Danilevskaya ON, Meng X, Beatty M, Zastrow-Hayes G, Harris C, Van Allen B, Habben J, Li BL (2016) Genome-wide analysis of alternative splicing during development and drought stress in maize. Plant Physiology, 170: 586-599. Thatcher SR, Zhou WG, Leonard A, Wang BB, Beatty M, Zastrow-Hayes G, Zhao XY, Baumgarten A, Li BL (2014) Genome-wide analysis of alternative splicing in Zea mays: Landscape and genetic regulation. Plant Cell, 26: 3472-3487. Toledo-Ortiz G, Huq E, Quail PH (2003) The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell, 15: 1749-1770. Trapnell C, Pachter L, Salzberg SL (2009) TopHat: discovering splice junctions with RNA- Seq. Bioinformatics, 25: 1105-1111. Turck F, Roudier F, Farrona S, Martin-Magniette ML, Guillaume E, Buisine N, Gagnot S, Martienssen RA, Coupland G, Colot V (2007) Arabidopsis TFL2/LHP1 specifically associates with genes marked by trimethylation of histone H3 lysine 27. PLoS Genetics, 3: 855-866. Udvardi M, Poole PS (2013) Transport and metabolism in legume-rhizobia symbioses. Annual Review of Plant Biology, 64: 781-805.

185

Verwoerd TC, Dekker BMM, Hoekema A (1989) A small-scale procedure for the rapid isolation of plant RNAs. Nucleic Acids Research, 17: 2362-2362. Wang C, Yu HX, Zhang ZM, Yu LL, Xu XS, Hong ZL, Luo L (2015a) Phytosulfokine is involved in positive regulation of Lotus japonicus nodulation. Molecular Plant Microbe Interactions, 28: 847-855. Wang C, Zhu H, Jin LP, Chen T, Wang LX, Kang H, Hong ZL, Zhang ZM (2013a) Splice variants of the SIP1 transcripts play a role in nodule organogenesis in Lotus japonicus. Plant Molecular Biology, 82: 97-111. Wang X, Fan C, Zhang X, Zhu J, Fu YF (2013b) BioVector, a flexible system for gene specific-expression in plants. BMC Plant Biology, 13: 198. Wang XT, Hu LJ, Wang XF, Li N, Xu CM, Gong L, Liu B (2016) DNA methylation affects gene alternative splicing in plants: An example from rice. Molecular Plant, 9: 305-307. Wang Y, Li P, Cao X, Wang X, Zhang A, Li X (2009) Identification and expression analysis of miRNAs from nitrogen-fixing soybean nodules. Biochemical and Biophysical Research Communications, 378: 799-803. Wang YN, Li KX, Chen L, Zou YM, Liu HP, Tian YP, Li DX, Wang R, Zhao F, Ferguson BJ, Gresshoff PM, Li X (2015b) MicroRNA167-directed regulation of the auxin response factors GmARF8a and GmARF8b is required for soybean nodulation and lateral root development. Plant Physiology, 168: 984-999. Wang YN, Wang LX, Zou YM, Chen L, Cai ZM, Zhang SL, Zhao F, Tian YP, Jiang Q, Ferguson BJ, Gresshoff PM, Li X (2014) Soybean miR172c targets the repressive AP2 transcription factor NNC1 to activate ENOD40 expression and regulate nodule initiation. Plant Cell, 26: 4782-4801. Xie C, Mao X, Huang J, Ding Y, Wu J, Dong S, Kong L, Gao G, Li CY, Wei L (2011) KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Research, 39: W316-322. Yan Z, Hossain MS, Arikit S, Valdes-Lopez O, Zhai JX, Wang J, Libault M, Ji TM, Qiu LJ, Meyers BC, Stacey G (2015) Identification of microRNAs and their mRNA targets during soybean nodule development: functional analysis of the role of miR393j-3p in soybean nodulation. New Phytologist, 207: 748-759.

186

Yang XZ, Zhang HY, Li L (2012) Alternative mRNA processing increases the complexity of microRNA-based gene regulation in Arabidopsis. Plant Journal, 70: 421-431. Yuan S, Li R, Chen S, Chen H, Zhang C, Chen L, Hao Q, Shan Z, Yang Z, Qiu D, Zhang X, Zhou X (2016) RNA-Seq analysis of differential gene expression responding to different rhizobium strains in soybean (Glycine max) roots. Frontier in Plant Sciences, 7: 721. Yuan SL, Li R, Chen HF, Zhang CJ, Chen LM, Hao QN, Chen SL, Shan ZH, Yang ZL, Zhang XJ (2017) RNA-Seq analysis of nodule development at five different developmental stages of soybean (Glycine max) inoculated with Bradyrhizobium japonicum strain 113-2. Scientific Reports, 7: 42248. Zhang GJ, Guo GW, Hu XD, Zhang Y, Li QY, Li RQ, Zhuang RH, Lu ZK, He ZQ, Fang XD, Chen L, Tian W, Tao Y, Kristiansen K, Zhang XQ, Li SG, Yang HM, Wang J, Wang J (2010) Deep RNA sequencing at single base-pair resolution reveals high complexity of the rice transcriptome. Genome Research, 20: 646-654. Zhao C, Beers EP (2013) Alternative splicing of Myb-related genes MYR1 and MYR2 may modulate activities through changes in dimerization, localization, or protein folding. Plant Signaling and Behaviour, 8: e27325. Zhou SL, Liu XY, Zhou C, Zhou QW, Zhao Y, Li GL, Zhou DX (2016) Cooperation between the H3K27me3 chromatin mark and non-cg methylation in epigenetic regulation. Plant Physiology, 172: 1131-1141.

187

Appendix

Table 5-1: List of primers used in this study Gene ID Purpose Direction (5'-3')

Glyma.01g130200 Cloning Forward ctagtctcgagccGGCTGTCTGTACTGTTGTGAGTTATTACT

Glyma.01g130200 Cloning Reverse gagcGAATTCAGCCATTTCTGAAAGGGAATAAGTAATTATGC

Glyma.14g142700 Cloning Forward ctagtctcgagccCCTGTTTAAGTCCACAACCAAAATCC

Glyma.14g14270 Cloning Reverse gagcGGTACCACACATTACAAGGAGCAATGTATTGATCAT

Glyma.04g051000 Cloning Forward gagcCCTAGGCTTATACTTCTGTTGTTTGTAATAGCTCGAC

Glyma.04g051000 Cloning Reverse gagcGGATCCTCCCATATTCTCAACGAGACAAAGAATATCC

Glyma.07g218400 Cloning Forward GATTAGATCTCTCCCACTTTTCATAAAGGTCAAC

Glyma.07g218400 Cloning Reverse gagcGGTACCCTCCTGGTGTATAAGAGCTGAACTAAAAGC

Glyma.08G109400 qRT-PCR Gene Forward CACTTACCAGAGTACAGGCTTC

Glyma.08G109400 qRT-PCR Reverse ATATCGTCGAGGTTGGTTTCC

Glyma.08G109400_E001 qRT-PCR Exon 1 Forward GCATAATGGAGTGGGTCAGT

Glyma.08G109400_E001 qRT-PCR Reverse AGTCAATGGTGAAGGTCATAGAG

Glyma.01G009700 qRT-PCR Gene Forward TGGATCAAACTTCTCTGCTCTG

Glyma.01G009700 qRT-PCR Reverse GCCATCAAGGTGAAAGGAAATG

Glyma.01G009700_E001 qRT-PCR Exon 1 Forward CCTGTTGTGTTCCCTCTCATTA

Glyma.01G009700_E001 qRT-PCR Reverse TGTGAGCAGTGCTTGGTT

Glyma.17G085500 qRT-PCR Gene Forward CGAAATCTCTGTGGCAGTGG

Glyma.17G085500 qRT-PCR Reverse AGCGGATACTGTCTCTGAGC

Glyma.17G085500_E010 qRT-PCR Exon 10 Forward AGAGCTGCTTTCTTGAGGCA

Glyma.17G085500_E010 qRT-PCR Reverse CCCCCAGGAAACATTTGATCG

Glyma.16G169100 qRT-PCR Gene Forward GTCTTCGAGAGAATGGGCTAAG

Glyma.16G169100 qRT-PCR Reverse GACCTTTAAGGGCCCGATTAT

Glyma.16G169100_E002 qRT-PCR Exon 2 Forward ATCCCATTCCCACTCCAAAG

Glyma.16G169100_E002 qRT-PCR Reverse CTCCACCTTCAACTACCTTCAG

Glyma.09G199800 qRT-PCR Gene Forward GGAAGAAAAGGCCGCTAGATC

Glyma.09G199800 qRT-PCR Reverse TCGCAATGATGCGATAAATTCC

Glyma.09G199800_E002 qRT-PCR Exon 2 Forward ACCACCACCACGGTTCTTCTG

Glyma.09G199800_E002 qRT-PCR Reverse CAACCTCGGAGCCAGAGTTAG

Glyma.04G257000_E013 qRT-PCR Gene Forward CAAAGTCAGTCTACAACTGGTACAC

Glyma.04G257000 qRT-PCR Reverse ACCAAAACGACGAGAGGGTC

188

Table 5-1 continued Glyma.04G257000 qRT-PCR Exon 13 Forward TACGGTTACTGTTCCCGATGGC

Glyma.04G257000_E013 qRT-PCR Reverse GCTGCGGAACGAAACATCTTTACC

Glyma.04G257000_E014 qRT-PCR Exon 14 Forward GGGACAACGATCTATCCGAAGGAG

Glyma.04G257000_E014 qRT-PCR Reverse AAGCACAACAGCTGGGTTGGTC

Glyma.13G227100 qRT-PCR Gene Forward CATTCATGATGGCATTACCATTTGGG

Glyma.13G227100 qRT-PCR Reverse GAATCTTAGCTCTCCCACTTTTGCAT

Glyma.13G227100_001 qRT-PCR Exon 1 Forward GCCTTGGCCTACACAATATATGAGC

Glyma.13G227100_001 qRT-PCR Reverse TTAGTCAAGGTGGCTGCAGCAGT

Glyma.13G227100_007 qRT-PCR Exon 7 Forward GAAGAAACCAAGGAGCAAGAACC

Glyma.13G227100_007 qRT-PCR Reverse TGGCACTGCGAATCACACGA

189

Figure 5-1: Histological analysis of soybean nodules. Nodules collected at 12, 22, 36 dpi were examined by light microscopy (a, c, and e) or electron microscopy (b, d, and f). (a, b) Nodules collected at 12 dpi reveal actively dividing cells with prominent nuclei. (c, d) Nodules collected at 22 dpi are packed with bacteroids with prominent poly-β-hydroxybutyrate granules. (e, f) Nodules collected at 36 dpi exhibit signs of senescence as indicated by loss of internal organization and disintegration of symbiosomes. B = bacteroid; CW = cell wall; DSY = disintegrating symbiosomes; ER = endoplasmic reticulum; IZ = infection zone; M = mitochondria; N = nucleus; P = peroxisome; PHB = poly-β-hydroxybutyrate; SL = sclereid layer; SY = symbiosomes; VB = vascular bundle.

190

Figure 5-2: Transcriptional landscape of developing soybean nodules. (a) PCA of RNA-seq data showing high similarity between the biological replicates and clear separation between root and nodule samples collected at 12, 22, 36 dpi. (b) Venn diagrams showing unique and shared differentially expressed genes in the nodules at 12, 22, 36 dpi. (c) Heatmap displaying five main clusters of genes that exhibit differential expression patterns in the developing nodules. (d) PCA of expression levels of the 9,669 nodule core genes in various soybean tissues and organs. The three nodule stages are clustered in a group separated from other tissues and organs. (e) Expression profiles of 716 and 863 genes among the core set that are showing gradual increases or decreases in expression across the three nodule stages.

191

Figure 5-3: KEGG pathway enrichment analysis of the 9,669 nodule core genes. KEGG pathway analysis was conducted using KOBAS 2.0 (http://kobas.cbi.pku.edu.cn) with a significance cutoff of adjusted P value of 0.05.

192

Figure 5-4: GO term enrichment analysis of the nodule core genes showing gradual increases or decreases in the expression. (a) GO term enrichment analysis of 716 core genes showing gradual increases in the expression across the three nodule developmental stages. (b) GO term enrichment analysis of 863 core genes showing gradual decreases in the expression across the three nodule developmental stages. GO terms enrichment analysis was generated using agriGO (Du et al., 2010) with Fisher’s exact test and Bonferroni multi-test adjustment with a significance cutoff of adjusted p-value of 0.05.

193

Figure 5-5: GO term enrichment analysis of the 9,669 nodule core genes.

194

Figure 5-6: Histochemical localization of GUS activity driven by the promoters of core and stage -specific genes in the developing nodules. Multiple transgenic hairy roots expressing the promoter:GUS fusion constructs of the indicated genes were generated, inoculated with Bradyrhizobium diazoefficiens, and then stained for GUS activity at various nodule developmental stages. (a-c) GUS staining of the core gene Glyma.07G218400 in the nodules at 12, 22, and 36 dpi. (d-f) GUS staining of Glyma.01G130200 specific to the 12-d-old nodules. (g-i) GUS staining of Glyma.14G142700 specific to the 22-d-old nodules. (j-l) GUS staining of Glyma.04G051000 specific –to the 36-d-old nodules. Scale bars = 200 µm

195

Figure 5-7: KEGG pathway analysis of the DEGs unique to the 12, 22, 36-d-old nodules. (a) KEGG pathway analysis of the 2,992 DEGs unique to the 12-d-old nodules. (b) KEGG pathway analysis of the 1,880 DEGs unique to the 22-d-old nodules. (c) KEGG pathway analysis of the 2,430 DEGs unique to the 36-d-old nodules.

196

Figure 5-8: Fold-change plot of differentially expressed transcription factors specific to the 12-, 22-, and 36-d-old nodules. Shown are the fold change values of 263, 160, 139 differentially expressed TFs unique to the 12- , 22-, and 36-d-old nodules, respectively. Each transcription factor is symbolized by a dot. Green and red dots denote upregulation and downregulation, respectively.

197

Figure 5-9: Heat map demonstrating the differential expression pattern of genes encoding small secretory peptides in the 12, 22, 36-d-old nodules. The color bar represents log2 fold change values in the nodules relative to the corresponding root tissues.

198

Figure 5-10: Fold-change plot of differentially expressed genes with transport-related functions in the developing nodules. Shown are the fold change values of 642 DEGs with transport-related functions identified among the core and stage-specific genes.

199

Figure 5-11: Alternative RNA splicing landscape of developing soybean nodules. (a-c) Volcano plots highlighting significance and fold change of differentially spliced events between nodules and roots at 12 (a), 22, (b), and 36 (c) dpi. For each differentially spliced event -log10 adjusted p value and log2 fold change of expression in the nodules versus the corresponding roots were plotted. Red and green dots denote splicing events that are significantly more abundant in nodules or roots, respectively. DUE, differentially used exon; IR, intron retention. (d-f) qPCR quantification of the abundance of differentially spliced events at various nodule developmental stages. Shown are the fold change values of the indicated differentially spliced events in the nodules at 12 (d), 22 (e), and 36 (f) dpi compared to the corresponding root tissues. Data are averages of three biological replicates ± SE. (g) Venn diagram showing unique and common differentially used exons between the three nodule developmental stages. The number of differentially spliced genes is shown in parentheses. (h) Heatmap showing the expression patterns of 70 differentially used exons that are common to the three nodule developmental stages. Note that 68 exons are showing the same expression patterns across the three nodule developmental stages.

200

Figure 5-12: GO term enrichment analysis of 2,323 differentially spliced genes.

201

Figure 5-13: KEGG pathway analysis of 2,323 differentially spliced genes.

202

Figure 5-14: Impacts of gene expression, gene size, exon number, and intron length on alternative splicing in the developing nodules. (a-c) Venn diagrams showing significant overlaps between the differentially expressed genes (DEGs) and differentially spliced genes (DSGs) in the nodules at 12 (a), 22 (b), and 36 (c) dpi. (d-f) Fold-change plots comparing the abundance of the DEGs and differentially spliced exons/introns. Plotted are the log2 fold change values of the differentially spliced events versus log2 fold change values of the corresponding DEGs. DUE, differentially used exon. (g-i) Impacts of gene size, exon number, and intron length on alternative splicing in the developing nodules. DSGs with 3 or more splicing events are larger (g), and contain more exons (h), and longer introns (i) as compared with those having 1 or 2 splicing events.

203

Figure 5-15: Impact of alternative splicing on protein domains organization and miRNAs targeting efficiency. (a) Bubble plot highlighting examples of the most predominant protein domains detected in the differentially used exons. Bubble size represents the number of protein domains assigned to each category. The complete list of the protein domains is provided in Data S12. (b) Bubble plot highlighting examples of the most prevalent miRNA target sites detected in the differentially used exons or introns. Bubble size represents the number of binding sites assigned to each miRNA. The complete list of the 205 putative miRNA binding sites is provided in Data S13. (c, d) Exon expression profile for the bZIP transcription factor Glyma.13G260300 (c) and the ethylene responsive element binding factor5 Glyma.01G206700 (d). Gene diagram showing exons (boxes) and introns (lines between boxes) is included below each blot. Differentially spliced exons are highlighted in pink. Binding sites for gma-miR4378 and gma-miR9752 are shown in (c) and (d), respectively. P values of the significantly differentially spliced exons are indicated above each exon.

204

Figure 5-16: qPCR confirmation of differential usage of DNA binding domains-containing exons of three transcription factors. Shown are the fold change values of the indicated differentially spliced exons of Glyma.13G227100 (a), Glyma.04G257000 (b), and Glyma.09G199800 (c) in the nodules at 12, 22, and 36 dpi relative to the corresponding root tissues. Data are averages of three biological replicates ± SE.

205

Figure 5-17: Heatmap highlighting differential gene expression of key genes of DNA methylation pathway. The RKPM values of the indicated genes row-wise normalized using Z score and used to construct the heat map.

206

Figure 5-18: Methylome landscape of developing soybean nodules. (a-c) Comparison of global DNA methylation levels between nodules (red) and roots (light blue) over protein-coding genes at 12 (a), 22 (b) and 36 (c) dpi in the CG, CHG and CHH sequence contexts. (d-f) Comparison of global DNA methylation levels between nodules and roots over TEs at 12 (d), 22 (e) and 36 (f) dpi in the CG, CHG and CHH sequence contexts. (g, h) Total numbers of DMRs identified in the nodules at 12, 22, and 36 dpi (g) and their classification into hyper-and hypomethylation (h). (i) Hierarchical clustering of the DMRs identified in the nodule at 12, 22, and 36 dpi. (j) DNA methylation sequence contexts of the DMRs identified in the nodule at 12, 22, and 36 dpi. (k) Stacked bar graph demonstrating the percentage of DMRs overlapping with different features of protein-coding genes, including promoter, gene body, and UTRs. (l) Stacked bar graph demonstrating the percentage of DMRs mapped to various families of TEs.

207

Figure 5-19: Differential DNA methylation occurred preferentially in TEs adjacent to genes. The numbers of DMR-associated TEs identified in the nodules at 12, 22, and 36 dpi were determined for 1-Kb non-overlapping bins and used to generate the graph.

208

Figure 5-20: Impacts of DNA methylation on gene expression levels. (a) Venn diagram showing overlaps between differentially expressed genes (DEGs) and differentially methylated genes (DMGs) in the nodules at 12, 22, and 36 dpi. (b-d) Impacts of DNA methylation in gene body in the CG (b), CHG (c), and CHH (d) sequence contexts on gene expression levels. (e-g) Impacts of DNA methylation in regulatory regions (promoter and UTRs) in the CG (e), CHG (f), and CHH (g) sequence contexts on gene expression levels. (h, i) Impacts of DNA methylation in TEs located upstream (h) or downstream (i) of the closet genes on their expression levels.

209

Figure 5-21: GO term enrichment analysis of the 1864 differentially expressed/differentially methylated genes.

210

Chapter 6: Conclusions

211

Soybean is among the world-leading crops grown as a source of protein and oil for human and animal consumption. In addition, soybean plays a critical role in soil nitrogen recycling through biological nitrogen fixation. Commercially, soybean has been primarily cultivated in the temperate regions but lately, its cultivation was extended to the tropics. The increasing global demand for soybean products requires a continuous improvement for soybean seed yield and quality. Efforts to improve soybean agronomic and seed quality traits, through the use of various conventional and biotechnological techniques have been undertaken ever since. The objective of this dissertation was to explore the genetic and epigenetic approaches of manipulating soybean agronomic and seed quality traits. Previous studies have associated SCN resistance to potential yield drag, which makes uncertain the development of high-yielding and SCN-resistant soybean cultivars. To investigate the impact of SCN resistance alleles on soybean yield, a population of

115 F5-derived recombinant inbred lines (RILs) segregating for yield and resistance to soybean cyst nematodes, was developed through a conventional breeding method and tested in multiple locations for yield and seed quality traits. We demonstrated that it was possible to combine SCN resistance and high yield without infringing on the later. In fact, The RIL top yielders (SCN- 038= 4096 kgha-1, SCN-095=3991kgha-1) were found in the same yield group as the high-yield check (Ellis = 4057kg ha-1). Meanwhile, the mean seed yield of the resistant lines and the susceptible lines were not significantly different. These results suggest that soybean yield potential can be improved while enhancing soybean cyst nematode resistance among US soybean cultivars. Future initiatives may focus on finding and/or developing soybean elite cultivars combining high yield and high protein with acceptable levels of oil content as well as broad resistance to various races of SCN. Indeed, SCN is the most devastating and the first yield reducing soybean pathogen. The use of resistant cultivars remains the main strategy to control this disease. Rhg1 and Rhg4 loci have been confirmed to be responsible of SCN resistance found in most commercial soybean lines in the US and in other soybean growing areas. It has been reported that high copy number of Rhg1- b is needed for SCN resistance, whereas low copy number of Rhg-a requires Rhg4. To elucidate the role of these two loci in the resistance to SCN found in our RILs population, we conducted a stepwise investigation of the Rhg1 and Rhg4 loci from 6 selected representative RILs

212

distinguishable by their SCN resistance phenotypes. The selected lines were either resistant, moderately resistant or susceptible to SCN race 2 based on their respective FI. In addition to SCN phenotyping in the greenhouse, we investigated the sequence polymorphism at Rhg1 and Rhg4 coding sequence and promoter, expression levels, copy number variation, and SSR and SNP markers. The coding sequences of Rhg1 (Gm a-SNAP18) and Rhg4 (Gm SHMT08) from the selected RILs matched the sequences of the resistant cultivar Forrest (Peking-type). As for the promoters, the 2kb sequences upstream of the translational start codon, Rhg1 and Rhg4 loci, revealed 9 distinguishing SNPs in Rhg1 and 6 in Rhg4 but were not decisive as all resistant and susceptible lines as well as Williams 82 showed similar polymorphisms. The copy number variation (CNV) especially at Rhg1 locus, was identified as a major factor in conferring the resistance to SCN. To understand the role that may be played by CNV in SCN resistance among our RILs, we determined the copy number at Rhg1 and Rhg4 loci using the CGH array. All our selected RILs and their parents revealed three copies of the Rhg1 and one copy of Rhg4 in comparison to the single copy reference, Williams 82, suggesting that the resistance to SCN in our population is independent of the CNV of Rhg1 and Rhg4. The relative expression of Gm a-SNAP18 and Gm SHMT08 were measured to quantify their respective transcript abundance as their increased expression has been associated with enhanced resistance to SCN. Using real time RT-PCR (qRT–PCR), we analyzed the expression of GmSNAP18 and Gm SHMT08 in the selected RILs in response to SCN infection (race 2) at 3, 5, 7 and 9-days post infection (dpi) relative to non-infected conditions. Surprisingly, on one hand, the Rhg1 (Gm a-SNAP18) expression did not change significantly either in the resistant or the susceptible in response. On the other hand, Rhg4 (SHMT08) was highly expressed both in the resistant and susceptible line at 3 and 9 dpi, suggesting that the resistance to SCN race 2 in our RILs is under the control of genetic factors other than Rhg1 and Rhg4. Considering the parental pedigree of our RIL population, and previously reported SCN resistance QTLs, we investigated their possible to DNA markers. We screened the selected lines with previously reported SSR markers linked to Rhg1, 4 and 5. These markers were Satt309 and Sat_168 (linked to rhg1), Sat_162 and Satt632 (linked to rhg4) and Satt574 and Satt082, linked to cqSCN-005 loci alternatively Rhg5. Intriguingly, all the selected RILs showed the same resistance alleles at Rhg1 and 4 loci. However, Satt574 and Satt082, showed a clear differentiation pattern for SCN resistance between the

213

resistance and the susceptible RILs, suggesting a potential major resistance QTL on chromosome 17 as previously reported. Taken together, the resistance to SCN race 2 found in our RIL population is controlled by a yet to be identified gene on chromosome 17. More focus should now be on fine-mapping the region marked by this QTL in order to identify novel resistance genes, on chromosome 17, conferring the resistance to SCN race 2.

In Africa, one of the strategies to sustainably mitigate the low soybean yield potential, in most cases due narrow genetic diversity within local germplasm, is through exotic plant introductions from external soybean breeding programs. However, most of soybean introductions from temperate regions, tend to flower earlier, and be very short in height resulting in reduced yields. In Rwanda, the few available genetic materials can only produce less than a half of yields obtained in other soybean producing countries especially the USA. In order to improve the existing soybean germplasm and increase the current yield production potential, we introduced a population of RILs from the University of Tennessee, soybean breeding program and tested them over two seasons in two locations in Rwanda. A total of 115 lines were tested in season B-2019 and A-2020 for yield and other agronomic traits. Interestingly, at low altitude, the US-developed RILs could out-yield the highest yielding local check by almost 1.2 MT/ha, whereas at the middle altitude the yield of the RILs was in the range of the most performing local check, suggesting that a selection of candidate adaptable lines can be made from the US-developed RIL population to improve the existing local material. The next move should be testing selected lines in more locations in order to assess the GXE impact on the overall performance. In addition, it would be interesting to manipulate and assess the impact of genes responsible for soybean adaptation to tropical environments.

Nodulation and atmospheric nitrogen fixation processes play key roles in the biosynthesis of amino acids used as the building blocks of storage protein, the major constituent of soybean seed. A better understanding of gene expression profiles and splicing patterns in developing nodules as well as how they are affected by the epigenetic mechanism would help to improve its efficiency. We assessed gene expression changes, alternative splicing events, and DNA methylation patterns as well as their interactions, during 3 stages of nodule development. This includes nodule formation (12dpi), development (22dpi) and senescence (36dpi) stages. A total

214

of 55,312 differentially expressed genes (DEGs), 116,210 differentially methylated genes and 7,205 differentially spliced events were identified in comparison with root tissues. Among the DEGs, 9,669 core genes and 7,302 stage-specific genes were discovered. Also,7845 DMRs overlapping with protein coding genes were identified. Among the differentially spliced events, the most frequent were introns retention and exons skipping. Alternative splicing events seem to be impacted by gene expression as 40% of the differentially spliced genes were found to be differentially expressed at the same nodule stage. Similarly, DNA methylation affected gene expression as 1864 DMGs were also differentially methylated. Lastly, DNA methylation appeared to have an impact on gene splicing efficiency. Our findings provide an insight to the role played by gene expression, DNA methylation and gene splicing as well as their interaction in determining nodule identity and functions. Importantly, our analysis pointed into several key genes that can be used to directly enhance soybean nodulation and nitrogen fixation efficiency under normal and stress conditions with ultimate goal of improving seed yield and quality. Overall, this study contributed to a better understanding of genetic ways of manipulating soybean agronomic and seed quality traits and how they can be influenced by epigenetic processes especially DNA methylation.

215

Vita

Daniel NIYIKIZA is originally from the southern province of Rwanda. After finishing his high school in biochemistry section, he joined the then University of Rwanda where he graduated with a bachelor’s degree with Hons. in crop production and horticulture. He went on to do a post graduate diploma in irrigation and drainage offered by the same institution. He later on, joined the University of Liege-Gembloux Agro-biotech, where he pursued an international master’s program in tropical and sub-tropical plant protection and finished with distinction. He joined the University of Tennessee, Knoxville in 2016, where he enrolled for a Doctor of Philosophy program in the department of plant sciences under the leadership of Dr. Tarek Hewezi and Dr. Vince Pantalone. His dissertation focused on understanding the genetic and epigenetic mechanisms of controlling the agronomic and seed quality traits of Soybean. Professionally, the author started his career with the then Rwanda Agriculture development Authority (RADA) which was later merged to become the Rwanda Agriculture Board, the institution in charge of Agriculture research in Rwanda. His research was part of a grant from the Borlaug Higher Education for Agricultural Research and development (BHEARD) and the United States Agency for International Development (USAID).

216