ANALYSIS OF SOYBEAN GENE REGULATION REVEALS REGULATORY MECHANISMS OF RHG1-MEDIATED RESISTANCE TO SOYBEAN CYST NEMATODE

BY

USAWADEE CHAIPROM

DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Informatics in the Graduate College of the University of Illinois at Urbana-Champaign, 2019

Urbana, Illinois

Doctoral Committee:

Professor Matthew Hudson, Chair Professor Stephen Moose Assistant Professor Amy Marshall-Colon Assistant Professor Sihai Dave Zhao

ABSTRACT

Yield suppression due to soybean cyst nematode (SCN) infestation has been estimated to cost one billion US dollars per year. The majority of SCN-resistant soybean varieties carry the same resistance locus (Rhg1) which is a copy number polymorphism of 1-11 copies of a 31.2kb genomic region. Since Rhg1-mediated resistance is so widely used, this research project aims to identify regulatory mechanisms in soybeans by which copy number variation (CNV) at the Rhg1 locus mediates nematode resistance. We performed -wide analyses using high-throughput methods in three studies:

The first study focused on RNA-seq profiling to identify early response genes in SCN- infected roots. We found changes vary among susceptible (Glycine max cv.

Williams 82; a single copy) and resistant soybeans (G. max cv. Peking; three copies of Rhg1, G. max cv. Fayette; 11 copies and a wild relative G. soja PI 468916; non-Rhg1 resistance; a single copy). This study demonstrates the importance of the early plant responses to migrating nematodes in pathogenicity determination. The most extensive resistant transcriptome reaction was observed in PI 468916, where the resistant response was qualitatively different from that of commonly used

G. max varieties, providing further evidence for the benefits of the PI 468916 as a genetic source to improve the genetic diversity of SCN resistance in soybean.

In the second study, we investigated transcriptional differences between uninfected roots of three Fayette isolines carrying 8-11 copies of the Rhg1 repeat and showing different levels of resistance to SCN, using RNA-seq and Fluidigm-based qRT-PCR. A gene co-expression network suggested key genes involved in signal transduction and ethylene-mediated signaling pathway in a complex transcriptional effect of the repeat sequence. We found a positive correlation between copy number and expression levels of phytoalexin biosynthesis-related genes, which were induced

ii throughout the early stages of SCN infection. This result suggested increased expression of these genes in soybeans harboring Rhg1 repeats could enhance Rhg1-mediated SCN resistance.

The third study revealed potential regulatory mechanisms of genes in copy-number variant

Fayette lines, using small RNA sequencing (sRNA-seq) and whole genome bisulfite sequencing

(WGBS). We identified phased siRNAs that may regulate defense-related genes in both cis and trans, leading to complex alterations in the sRNA regulatory network in Rhg1-mediated SCN resistance. The WGBS analysis provided preliminary evidence for epigenetic control of gene expression and transposon silencing initiated by the Rhg1 repeat and mediated by sRNAs via

RNA-directed DNA methylation (RdDM) process. The results suggested that manipulation of sRNAs and DNA methylation could be an alternative method for fine-tuning of gene expression to enhance SCN resistance in soybeans with CNV at the Rhg1 locus.

iii

ACKNOWLEDGMENTS

I would like to express my deepest gratitude to my advisor, Dr. Matthew Hudson for giving me an opportunity to join his laboratory, to work on diverse datasets, and to participate in several conferences/workshops. My PhD study would have not been possible without his continuous support, patience, guidance, and helpful discussions all the time of my research and writing of the thesis. I appreciated all his understanding, encouragement, inspiration, and cheerfulness during the tough times in my PhD journey.

Besides my advisor, I would like to thank the rest of my advisory and thesis committee members: Dr. Stephen Moose, Dr. Amy Marshall-Colon, and Dr. Dave Zhao for providing useful courses to help me developed the ideas for this research project, for their time and insightful comments during qualifying exam, preliminary exam, and thesis defense.

I would like to acknowledge the funding sources: Thai government scholarship, United

Soybean Board (USB), Conference travel awards from the Illinois Informatics Institute that made my PhD study and research project possible.

My sincere thanks also go to Karin Readel and Dianne Carson for being the best coordinators for the Informatics PhD program and Crop Sciences program, respectively.

I am so lucky to be a part of the Hudson lab. I am profoundly grateful to Kankshita

Swaminathan and Gopal Battu for their encouragement and making here like my home. I thank

Christina Fliege for being the best organizer in our lab, for helping me with everything and for your friendship since the graduate student orientation in Crop Sciences. Thanks also to Mohammad

Belaffif for sharing the ideas, providing me all useful resources about bioinformatics and being like my brother here. I cannot obtain good datasets for analyses without Tong Geon Lee who generated whole genome bisulfite sequencing data, Esmaeil Miraeiz who was the main

iv collaborator in nematode experiments, and Alison Colgrove who provided nematode population.

I also had the great pleasure of working with Jia Dong who are supportive, motivated and friendly.

Thanks should also go to Jessica Mattick and Eric Dyer for helping me setup my first RNA experiment. Many thanks to other lab members and alumni for amazing teamwork: Yuanjin Fang,

Joao Paulo Gomes Viana, Zhicong Lin, Xing Liu, Wei Wei, Christ Austin, Gautam Naishadham, and Jason Ebaugh.

My life in UIUC is more enjoyable because of these people: Siraprapa Rawig and Hsiao-

Ying Huang. Thank you so much for sharing experiences in graduate school and providing such a wonderful friendship. In particular, my roommate Lassamon Maitreemit who is everything for me since the first day in the US until now. Special thanks to Chainarong Amornbunchornvej for inspiration, all the adventures, and being my better half.

Last but not least, I would like to thank my beloved mother for unconditional love, my supportive brother for taking care of our mother throughout my study, and the rest of my family members for their support.

Usawadee Chaiprom

May 2019

v

Dedicated to the memory of my father Werasak Chaiprom, who always believed in my ability to be successful in my PhD journey He was unable to see I made it through the second half of this roller coaster road so this is for him .

vi

TABLE OF CONTENTS

CHAPTER 1 GENERAL INTRODUCTION ...... 1

CHAPTER 2 EARLY TRANSCRIPTIONAL RESPONSES TO SOYBEAN CYST NEMATODE HG TYPE 0 SHOW GENETIC DIFFERENCES AMONG RESISTANT AND SUSCEPTIBLE SOYBEANS ...... 10

CHAPTER 3 IDENTIFICATION OF DEFENSE MECHANISMS INVOLVED IN RHG1- MEDIATED RESISTANCE TO SOYBEAN CYST NEMATODE IN SOYBEAN ROOTS .... 41

CHAPTER 4 REGULATORY MECHANISMS OF GENES IMPLICATED IN RHG1- MEDIATED RESISTANCE TO SOYBEAN CYST NEMATODE...... 77

REFERENCES ...... 119

vii

CHAPTER 1 GENERAL INTRODUCTION

Soybeans and Soybean cyst nematode (SCN)

Soybean (Glycine max (L.) Merr.) is a major source for producing vegetable oil, meal, and biodiesel. In 2018, soybean production in the United States accounted for 4.54 billion bushels, according to USDA Crop Production 2018 Summary (February 2019). However, several soybean diseases caused significant yield suppression. Based on the soybean disease loss estimates from the top 28 soybean-producing states in the US and Ontario in Canada (Bradley, 2017), soybean cyst nematode (SCN; Heterodera glycines Ichinohe) was estimated to reduce approximately greater than 109 million bushels in 2015 which was the highest yield reductions compared to other soybean diseases.

SCN is a sedentary endoparasite that infects soybean roots. The lifespan of the nematode takes about 3-4 weeks (Davis and Tylka, 2000). After a nematode is hatched from an egg, SCN penetrates soybean roots as a second-stage juvenile (J2) and establishes a feeding cell near vascular tissues. The nematode uses a stylet, and secrets effector enzymes to destroy cell walls of neighboring cells to fuse them with the initial feeding cell and form a multinucleate cell, which is known as a syncytium. The J2 female molts twice to enlarge its sausage-shaped body at the third and fourth juvenile stage. Then, the female nematode molts again to become a lemon-shaped adult.

Meanwhile, a vermiform male nematode stops feeding and becomes motile to exit the root for mating with adult females. After copulation, the female lays eggs inside the body and gelatinous matrix outside its body. The lemon-shaped female filled with eggs will finally die, change color to dark brown and form the body that is known as a cyst. The cyst can be visible with the naked eye from outside the roots. During SCN infestation, as a biotrophic parasite, SCN requires the

1 maintenance of a syncytium for its development until the reproductive stage. In the susceptible response, the nematode successfully establishes and maintains the syncytium whereas syncytial necrosis and degeneration are observed in SCN-resistant lines as a resistance response (Kim et al.,

1987).

Together with crop rotation, growing SCN-resistant soybean varieties is a management strategy to deal with SCN infestation. This would maintain soybean yield and control SCN population density in the field. Since the first discovery in 1954 and wide distribution in 1990 of

SCN in the US (Tylka and Marett, 2014), the breeders have been developing many SCN-resistant varieties. Nevertheless, 97% of developed resistant varieties carry resistance genes derived from a single source, namely PI 88788 (Tylka, 2016). Another resistance source is Peking. The overuse of same resistant sources could lead to the shift of the SCN population. Niblack et al. (2008) reported the survey on the SCN infestation in resistant lines in Illinois and showed that the SCN population have adapted to PI 88788. More diversity of resistant soybeans is needed for rotation.

Therefore, understanding molecular mechanisms underlying variations in SCN resistance in soybeans is beneficial for developing new resistant lines.

Copy number variation (CNV) at the resistance QTL (Rhg1)

The susceptible or resistant responses to SCN infestation are driven by genetic variations in soybeans. Studies in quantitative trait loci (QTL) mapping have been performed in several soybean populations to identify loci or genes associated with SCN resistance. The major QTLs for resistance to H. glycines (Rhg) are Rhg1 (previously described as a recessive or co-dominant locus) and Rhg4 (a dominant locus). Copy number variation (CNV) of a 31.2kb region at the Rhg1 locus was predicted to confer SCN resistance (Cook et al., 2012). The study showed that four genes are located within the repeat unit, but three non-canonical resistance genes are expressed, including

2

Glyma.18G022400-an amino acid transporter, Glyma.18G022500-an alpha-soluble NSF attachment protein (alpha-SNAP), and Glyma.18G022700-a predicted wound inducible protein 12

(WI12). The susceptible soybean line Williams 82 or W82 carries one copy of this repeat unit, whereas the resistance alleles in Forrest (Peking-derived line) and Fayette (PI 88788-derived line) contain three copies and ten copies of this repeat unit, respectively. According to Cook et al.

(2014), Peking is categorized into a low-copy-number Rhg1 group while PI 88788 is in a high- copy-number Rhg1 group. Single nucleotide variants (SNVs) at the Rhg1 locus were identified from aligned whole-genome shotgun sequencing (WGS) data (Lee et al., 2015). Based on datasets of 22 soybean germplasm accessions, the results revealed three subtypes of repeat unit, consisting of Williams 82 (W), Peking (P) and two variants of Fayette (FA and FB). For the Rhg4 locus, two

SNPs within an Rhg4 gene encoding a serine hydroxymethyltransferase (SHMT) conferred SCN resistance (Liu et al., 2012). Investigating interaction of Rhg4 and Rhg1 showed that Rhg4 is required to accomplish SCN resistance in plants with P-type Rhg1 (Yu et al., 2016). In addition,

CNV at the Rhg1 locus in Fayette individuals was detected by TaqMan assay and confirmed by

WGS (Lee et al., 2016). This study showed that an increase in copy number increases SCN resistance using a standard SCN test in a greenhouse (Niblack et al., 2009). Taken together, these genomic studies suggested that variations in DNA sequences and genomic structure in the major

QTLs could contribute to the differences in molecular mechanisms underlying SCN resistance.

Transcriptomic changes during plant-nematode interactions

Analyses of soybean gene expression have been performed in both whole roots and syncytia in diverse aspects (Alkharouf et al., 2006; Ithal et al., 2007; Klink et al., 2007a; Klink et al., 2007b; Puthoff et al., 2007; Klink et al., 2009; Klink et al., 2011; Mazarei et al., 2011; Wan et al., 2015). Majority of these studies relied on the microarray experiments. Some studies focused

3 on Rhg1-mediated resistance, including gene expression profiling in the syncytium between SCN- susceptible and SCN-resistant soybean near-isogenic lines (NILs) using microarrays (Kandoth et al., 2011). These NILs are different at the Rhg1 locus, as determined by SSR markers (Mudge,

1999; Li and Chen, 2005). Defense-related genes were induced in the resistant line during the SCN infestation, such as resistance genes or NBS-LRR genes, hypersensitive-like response genes, and salicylic acid signaling genes. In addition, the pericycle cells and syncytia of infected roots were isolated at 3, 6, and 9 days post-inoculation (dpi) using the Laser Capture Microdissection (LCM) technique for exploring gene expression (Matsye et al., 2011). The results showed two genes located in the Rhg1 locus, an amino acid transporter (Glyma.18G022400) and alpha-SNAP

(Glyma.18G022500), were uniquely expressed in the syncytium for all time points in both Peking and PI 88788. The gene expression pattern suggested that the Rhg1-mediated resistance to SCN is localized in the syncytium.

To validate the functions of differentially expressed genes in previous microarray experiments, Matthews et al. (2013) engineered 100 potential SCN resistance genes in soybean roots. The authors demonstrated three impacts of differentially expressed genes when overexpressed in transgenic roots, including supporting the defense response, enhancing SCN susceptibility, and no effect. The overexpression of the alpha-SNAP (Glyma.18G022500) gene derived from Peking in susceptible W82 reduced the SCN infestation (Matsye et al., 2012).

Similarly, the overexpression of three Rhg1 genes in W82 roots enhanced SCN resistance (Cook et al., 2012). Interestingly, high accumulation of Rhg1-resistance-type alpha-SNAPs was found in feeding cells, compared to wild-type alpha-SNAPs (Bayless et al., 2016). Due to the dysfunction of Rhg1-resistance-type alpha-SNAPs, the high expression of the gene induced cytotoxicity to support the degeneration of the syncytium (Bayless et al., 2016). These studies confirmed that the

4 regulation of expression of Rhg1 genes is involved in SCN resistance in soybeans. Thus, investigation of transcriptomic responses and regulatory variations in plant-nematode interactions have a potential to point out candidate genes and key players in the control of gene expression in soybeans during SCN parasitism.

Regulation of gene expression in plant responses to nematode infection

Transcription factors Transcription factors (TFs) are known to transcriptionally regulate gene expression related to various biological processes by binding to cis-regulatory elements (motifs) in the promoter and enhancer DNA regions. Hosseini and Mathews (2014) identified differentially expressed genes in

Peking SCN-infested roots at 6 and 8 dpi. For transcription factor enrichment analysis, over- represented transcription factor binding sites (TFBSs) were predicted in promoter sequences of these genes. This study showed evidence of dynamic regulation of transcription factors in soybean roots during SCN infestation. For prediction of TF-gene interactions and TF motif binding site enrichment, the PlantTFDB v4.0 database (Jin et al., 2017) provided collection of motifs integrated from different sources such as literature, experiments (e.g., DAP-seq, ampDAP-seq, ChIP-seq), and computational prediction; however, the majority of these motifs were derived from the interactions in Arabidopsis. Then, the TFs and their corresponding motifs were projected to soybean genes based on BLAST reciprocal best hits (RBHs). We can utilize the motifs in this database to scan for known binding motifs in promoters of gene sets of interest and perform enrichment analysis by comparing the motifs identified in the gene sets to those in the promoters of all genes in the genome as background.

5

Small RNAs Small RNAs (sRNAs) are non-coding RNAs with lengths of 20-24 nucleotides (nt) that play important roles in transcriptional and post-transcriptional regulation in plant growth and development, nutrient homeostasis and stress responses such as response to nematode infection

(Hewezi et al., 2008; Li et al., 2012; Xu et al., 2014b). The genome-wide sRNA expression profiles can be examined using sRNA sequencing (sRNA-seq) approach to capture variations of two major classes of plant sRNAs: microRNA (miRNA) and small interfering RNA (siRNA).

A. miRNA is 21-22 nt sRNA which mainly functions as post-transcriptional regulators of gene expression by binding targeted mRNA at coding region, 3’UTR or 5’UTR. MIR gene is transcribed by DNA-dependent RNA polymerase II (Pol II) and formed a stem-loop structure or primary-miRNA (pri-miRNA). The pri-miRNA is cleaved by RNA nuclease, named DICER-

LIKE 1 (DCL1) and then becomes precursor miRNA or pre-miRNA. After that, pre-miRNA is cleaved at the loop by DCL1 again to generate a miRNA/miRNA* duplex in the nucleus. In order to stabilize miRNA/miRNA* duplex before exporting to the cytoplasm, methyltransferase or HUA

ENHANCER 1 (HEN1) adds 2’-O-methyl group at 3’ terminal of the duplex (Yu et al., 2005;

Yang et al., 2006). One strand of miRNA duplex is incorporated into ARGONAUTE protein

(AGO) as a part of RNA-induced silencing complex (RISC). The miRNA guides the silencing complex based on the complementary bases with targeted mRNA. Then, mRNA is cleaved and degraded, or its translation process is inhibited. Based on the basis of plant miRNA biogenesis and the availability of genome assembly sequences in public database, bioinformatic approaches can be used to predict mature miRNAs, the hairpin structure of precursors and also the target transcripts.

B. siRNA is involved in both transcriptional and post-transcriptional regulations. For transcriptional regulation, 24-nt siRNA is associated with transposon and heterochromatic regions

6

(heterochromatic siRNA; het-siRNAs), participating in RNA-directed DNA methylation pathway

(RdDM). The precursor of 24-nt is transcribed by DNA-dependent RNA polymerase IV (Pol IV) and turned into double-stranded RNA (dsRNA) by RNA-dependent polymerase 2 (RDR2). Then,

DICER-LIKE 3 (DCL3) chops the dsRNA into 24-nt siRNAs. The 24-nt siRNA guides the AGO4-

RISC to the non-coding scaffold transcripts that are produced by DNA-dependent RNA polymerase V (Pol V). The RISC complex, siRNA and scaffold transcript interact with the chromatin-remodeling complex (DDR) and recruit de novo methyltransferase DOMAIN

REARRANGED METHYLTRANSFERASE (DRM2) for the establishment of DNA methylation

(Law and Jacobsen, 2011; Matzke and Mosher, 2014). Then, in general, the expression of genes and transposons are regulated by DNA methylation. For post-transcriptional regulation, phased siRNA (phasiRNA) is derived from the targets of miRNA that have been cut and converted to dsRNA by RNA-dependent polymerase 6 (RDR6). The dsRNA is subsequently cleaved at every

~21-nt to generate phasiRNAs by DICER-LIKE 4 (DCL4). Unlike miRNA mapping pattern, the pattern of mapped phasiRNAs is distributed equally across a reference sequence. Like other sRNAs, phasiRNA is integrated into RISC complex to process silencing activity by mRNA cleavage which can be in cis or trans (Zhai et al., 2011; Fei et al., 2013). For example, a subset of phasiRNAs, called trans-acting siRNA (ta-siRNA) is well documented in Arabidopsis. The ta- siRNA is derived from Trans-Acting Short Interference RNA3 (TAS3) transcript, one of miR390 targets, which is related to a regulation of lateral root growth and development by inhibiting an activity of auxin response factors or ARF (Marin et al., 2010). The miR390-TAS-ARF regulatory network play roles in lateral root and nodule development in the legume (Hobecker et al., 2017).

In addition, there was evidence of self-targeting siRNAs generated from TAS3 genes or NBS-LRR genes reported in legume (Zhai et al., 2011; Arikit et al., 2014).

7

DNA methylation Gene expression regulation and transposon silencing by DNA methylation are essential for plants in response to environmental conditions such as salicylic acid (SA) responses (Dowen et al., 2012) or pathogen infection (Rambani et al., 2015; Hewezi et al., 2017). Moreover, differentially methylated regions (DMRs) at the Rhg1 locus between SCN-susceptible and SCN- resistant soybeans have been identified using DNA-methylation-restriction enzymes and PCR

(Cook et al., 2014), indicating involvement of DNA methylation in Rhg1-mediated SCN resistance. DNA methylation in plants is induced by the RNA-directed DNA methylation (RdDM) pathway. The RdDM pathway is dependent on the activity of 24-nt sRNA generated from Pol IV- dependent siRNA biogenesis, AGO4-RISC complex, nascent scaffold transcribed by Pol V, and

DRM2 to establish DNA methylation (Law and Jacobsen, 2011; Matzke and Mosher, 2014). To maintain DNA methylation patterns during DNA replication, the methylation in symmetric CG and CHG contexts (in which H represents any bases except G) is maintained by DNA

METHYLTRANSFERASE 1 (MET1 or DMT1) and CHROMOMETHYLSASE 3 (CMT3). In contrast, asymmetric CHH methylation is maintained by DRM2 using the process of RdDM (Cao et al., 2003). The DNA methylation can be explored at single-base resolution in genome-scale using whole genome bisulfite sequencing (WGBS) method to identify variation in DNA methylation that possibly contributes to differences in regulation of gene expression or suppression of transposon activity related to SCN resistance in soybeans.

Overall, this research project aims to understand molecular mechanisms underlying variations in SCN resistance in soybeans carrying CNV at the Rhg1 locus by employing high- throughput sequencing data. The project included three studies:

1) Early transcriptional responses to soybean cyst nematode HG type 0 show genetic differences among resistant and susceptible soybeans

8

2) Identification of defense mechanisms involved in Rhg1-mediated resistance to soybean cyst nematode in soybean roots

3) Transcriptional regulations of genes involved in Rhg1-mediated resistance to soybean cyst nematode

9

CHAPTER 2 EARLY TRANSCRIPTIONAL RESPONSES TO SOYBEAN CYST NEMATODE HG TYPE 0 SHOW GENETIC DIFFERENCES AMONG RESISTANT AND SUSCEPTIBLE SOYBEANS

Introduction

Soybean cyst nematode (SCN) is a billion-dollar pest that causes substantial yield losses.

SCN control is predominantly through the use of resistant cultivars. A few SCN-resistant soybean cultivars (Glycine max) were derived from PI 548402 (Peking) whereas most of soybeans carry PI

88788 source of resistance such as Fayette cultivar (Bernard et al., 1988). Genetically, the difference between SCN-susceptible and most SCN-resistant soybeans is associated with the copy number variation (CNV) of the segment at the Rhg1 locus (Cook et al., 2012; Cook et al., 2014;

Lee et al., 2015; Lee et al., 2016). The Rhg1 locus is located on chromosome 18 and contains three functioning genes: gene encoding amino acid transporter (Glyma.18G022400), gene encoding alpha-soluble NSF attachment protein or alpha-SNAP (Glyma.18G022500) and gene encoding wound-inducible protein 12 (Glyma.18G022700). An alternative source for breeding SCN resistance is G. soja PI 468916, a wild relative which has two SCN-resistance QTLs (cqSCN006 and cqSCN007).

During the early stages of infection, soybean-SCN interaction begins with synchronization between nematode’s life cycle and the presence of soybean. Second-stage juveniles (J2s) of SCN hatch from eggs and search for a plant root. The invading J2s penetrate the root epidermis and intracellularly migrate through the cortical cells until reaching the vascular cylinder. Then, each nematode initiates a feeding cell by inducing morphological changes and fusion of neighboring cells to form a syncytium. The migrating SCN uses mechanical force exerted by its stylet and releases various secretions, including degrading enzymes and effector . These cause

10 activation of plant defense response and immunity. During compatible interaction (susceptible host), SCN can manipulate and suppress host defense to maintain the syncytium until cyst formation. By contrast, resistant plants developed a battery of defense reactions to overcome a pathogen attack, causing degeneration of syncytium during incompatible reaction.

Plant immune systems in defense response to pathogen attacks compose of two tiers. Plants first recognize pathogen- or damage-associated molecular patterns (PAMPs or DAMPs) by pattern-recognition receptors (PRRs) to activate the PAMP-triggered immunity (PTI), for example, influx of calcium ion, reactive oxygen species (ROS) burst, production of secondary metabolites such as phytoalexins, transcriptional reprogramming, hormone signaling, restriction of nutrient transfer, activation of pathogenesis-related genes, and activation of mitogen-activated protein (MAP) kinase signaling cascades. In the second tier, induction of effector-triggered immunity (ETI) is governed by plant intracellular receptors (NBS-LRR proteins) that are involved in the detection of pathogen effectors, leading to robust and rapid defense response at the infection sites such as hypersensitive response (HR) and programmed cell death (PDC).

With the developments of genomic technologies, investigators have identified several defense-related genes involved in plant immune responses in both soybeans and wild soybeans using differential display, microarray and next-generation sequencing (NGS) approaches

(Alkharouf et al., 2004; Alkharouf et al., 2006; Ithal et al., 2007; Klink et al., 2007b; Klink et al.,

2009; Klink et al., 2010; Kandoth et al., 2011; Mazarei et al., 2011; Jain et al., 2016; Zhang et al.,

2017; Zhang and Song, 2017). These transcriptomic studies revealed global and dynamic defense responses of plant hosts to SCN infection during both compatible and incompatible interactions across different parasitism stages. Most of the previous studies mainly focused on the later stages of nematode parasitism. However, both migration and sedentary phases are critical stages

11 determining the fate of pathogenicity (Kammerhofer et al., 2015). Therefore, characterization of gene expression profiles in response to SCN at the early stages of parasitism could provide key early response genes and clues to the physiological changes and transcriptional reprogramming during compatible and incompatible interactions.

This study aims to extend knowledge of root transcriptomic responses to SCN infection at the migratory phase by conducting RNA-seq. We identified differentially expressed genes and predicted transcription factors which are biologically relevant for the establishment of both incompatibility (resistant host) and compatibility (susceptible host) during SCN HG type 0 infection on three soybean genotypes (Peking, Fayette and Williams 82) and a wild soybean representative (G. soja). This is the first analysis of the soybean-SCN interactions as early as 8 hours post inoculation (hpi) using RNA-seq and demonstrates expression profiles of early response genes that potentially enhance SCN-resistance in soybeans.

Materials and methods

Plant materials and nematode procurement for RNA-seq analysis Four different genotypes were used in this experiment, including Glycine max cv. Williams

82, G. max cv. Peking (PI 548402), G. max cv. Fayette (PI 518674) and G. soja PI 468916. For each genotype, seeds were harvested from a single plant grown in the greenhouse. The Fayette seeds were obtained from an individual plant (Fayette#99) harboring 11 copies of the Rhg1 repeat

(Lee et al., 2016). The surface-sterilized seeds were pre-germinated on moistened germination paper in 10 cm Petri dishes at room temperature in the dark for three days, spraying distilled water daily.

The SCN populations, HG Type 0 egg mass, were independently harvested from stock plants, maintained by Dr. Alison Colgrove, University of Illinois at Urbana-Champaign. Eggs were

12 rinsed into Petri dishes with 1 cm depth of distilled water. The dishes containing the inoculum were placed in a shaker incubator at 27 °C with constant gentle agitation of 50 rpm for four days based on Mahalingam et al. (1998). The second stage juvenile nematodes (J2s) were then separated from unhatched eggs by running them through metal screens covering with some filter papers into distilled water and were further purified by being gently centrifuged at 1,000 revolutions per minute (RPM) for one minute to concentrate to about 1500 J2s/ml. Eight uniform seedlings per each control condition and treatment were selected and put into a deep inside 10 cm Petri dish.

Before that, the dishes were covered with one piece of sterilized paper (20 x 30 cm) which were three times folded and moistened with distilled water. One milliliter of inoculum was added directly on the roots and covered with a second piece of moistened filter paper. Control mock- inoculated replicates received the same amount of distilled water. The dishes were placed in a large tray (50 x 30 x 10 cm) with 1 cm water below to add humidity and wrapped in a semi-clear plastic bag for the duration of the time point. The trays were placed under fluorescent lights of 16/8 h

(light/dark) photoperiod at 26 °C. Roots were harvested at 8 hours post inoculation (hpi) and immediately frozen in liquid nitrogen. Each frozen root was ground to a fine powder with a pre- chilled mortar and pestle in liquid nitrogen. The powder was stored in microfuge tubes at –80 °C until RNA extraction. Root samples were randomly selected from each genotype and stained with acid fuchsin (Bybd Jr et al., 1983) to determine the extent of nematode infection under a Leica

MZ16 F stereomicroscope equipped with a Canon EX05 DSLR camera.

RNA extraction and library preparation for RNA-seq Total RNA was extracted from approximately 150 mg of individual fine powder root samples using the E-Z® 96 Plant RNA Kit (Omega Bio-Tek) according to manufacturer’s instructions. The total RNA was treated using DNase I (Thermo Scientific) to remove remaining genomic DNA. The integrity of isolated RNA was assessed by visualizing the intact 18S and 28S 13 ribosomal bands on a 1% agarose gel. RNA concentrations were measured by a NanoDrop ND-

1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). The RNA-seq libraries were prepared with Illumina's “TruSeq Stranded mRNAseq Sample Prep kit” according to the manufacturer’s instructions. The libraries were then quantitated by qPCR and sequenced on one lane for 101 cycles from each end of the fragments on a HiSeq 4000 instrument using a HiSeq

4000 sequencing kit version 1. Three biological replicates per each condition (inoculated and uninoculated) and each genotype were selected for paired-end RNA–sequencing at DNA Services

Lab in the Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, IL.

Differential gene expression analysis Quality checks were performed on the raw reads of 24 RNA-seq libraries using FastQC

0.11.7 (Andrews, 2010). Trimmomatic 0.36 (Bolger et al., 2014) was used for removal of adapter sequences, trimming low-quality bases (quality score < 20) and length filtering (≥ 36 bp) from the

FASTQ-format short read data. After passing the quality control, filtered reads were mapped to the G. max cv. Williams 82 reference genome assembly version Wm82.a2.v1 (Schmutz et al.,

2010) using HISAT2 aligner version 2.0.4 (Kim et al., 2015a). Then, the transcript abundance was estimated by featureCounts (Liao et al., 2014). Differential gene expression analysis was carried out using edgeR package (Robinson, 2010). The filtering was performed in order to remove lowly expressed genes based on a minimum counts per million (CPM) threshold set at 1.0 (which is corresponding 12-13 reads across all samples) in at least three samples. After that, the trimmed mean of M-values normalization method (TMM) was then performed to eliminate composition biases between libraries (Robinson and Oshlack, 2010) and calculate a normalization factor for each RNA-seq library. For each genotype, the common, trended and tagwise negative binomial dispersions were estimated before fitting a generalized linear model (GLM). The likelihood ratio test (LRT) was performed to test for differential gene expression between inoculated versus 14 uninoculated roots of each genotype. The differentially expressed genes (DEGs) were identified for each comparison using a false-discovery rate (FDR) cut-off set at 0.05. The multiple comparisons of DEG sets were conducted and visualized by UpSetR (Conway et al., 2017) package in R. The expression profiles and hierarchical clustering of unique DEGs were performed using function heatmap.2 in gplots (Warnes et al., 2016) package in R.

Gene ontology and pathway enrichment analysis The analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and

(KEGG) pathway enrichment on each DEG list was performed using the AgriGO web-server (Du et al., 2010b) and the KEGG Orthology Based Annotation System (KOBAS) server version 3.0

(Wu et al., 2006; Xie et al., 2011), respectively. The Hypergeometric tests with FDR-corrected P- value cut-off set at 0.05 were performed to identify statistically enriched GO terms and KEGG pathways in each DEG list, compared to the whole genome set as a background. Additional annotation of DEGs was based on the gene annotation in the reference genome of Williams 82 version Wm82.a2.v1 (Schmutz et al., 2010) retrieved from the Phytozome database (Goodstein et al., 2012), and other databases such as SoyBase (Grant et al., 2010), PlantTFDB v4.0 (Jin et al.,

2017), BlastKOALA (Kanehisa et al., 2016) and the KEGG web-server (Kanehisa and Goto,

2000). The DEGs were categorized based on their annotations. The heatmap comparing gene expression profiles in each category was done using the R package ggplot2 (Wickham, 2016).

Results

Identification of differentially expressed genes (DEGs) The root transcriptome analysis was conducted on four selected genotypes with and without SCN HG Type 0 inoculation. One genotype displayed a susceptible reaction (Williams

82) and three genotypes displayed a resistant reaction (Fayette, Peking, and G. soja PI 468916). A

15 total of 50 genes in Williams 82, 93 genes in Fayette, 174 genes in Peking and 364 genes in G. soja were significantly differentially expressed between SCN-infected and uninfected roots. These accounted for 519 unique DEGs across four genotypes. The expression heatmap of all 519 DEGs illustrated that majority of DEGs were induced in response to SCN infection. The hierarchical clustering showed that DEGs were clustered based on similar direction of expression changes; however, the expression levels were different across four genotypes. The expression profiles of all

DEGs in susceptible Williams 82 were highly correlated with those in resistant Fayette, compared to those in Peking and G. soja [Figure 2.1]. The UpSetR plot showed that over 75% of DEGs were genotype-specific, including 33 genes in Williams 82, 30 genes in Fayette, 84 genes in Peking, and 254 genes in G. soja [Figure 2.2A]. The DEGs found in Fayette tended to overlap with DEGs detected in other resistant genotypes, leading to the fewest number of Fayette-specific DEGs. For pairwise comparisons, the direction of expression changes was the same in all overlapped DEGs identified in different genotypes [Figure 2.2B]. A total of 28 genes were found to be significantly up-regulated in SCN-infected roots compared to uninfected roots in common across all three resistant genotypes. The statistically significant induction of these 28 genes was not detected in susceptible Williams 82. However, all genotypes shared the significant induction of six genes during SCN infection [Figure 2.2C].

GO and KEGG analyses revealed genes involving in plant defense system From GO term enrichment analysis [Table 2.1 and Table 2.2], three enriched GO terms in the categories of biological processes were found in common in four genotypes, including oxidation-reduction process (GO:0055114), response to oxidative stress (GO:0006979), and response to chemical stimulus (GO:0042221). For the enrichment of GO molecular functions, the results confirmed activities of common enriched GO terms such as oxidoreductase activity

(GO:0016491), peroxidase activity (GO:0004601), and antioxidant activity (GO:0016209). By 16 contrast, fatty acid synthase activity (GO:0004312), nutrient reservoir activity (GO:0045735), and ion membrane transporter activity (GO:0015075) were specifically enriched in some genotypes.

Moreover, the KEGG pathway enrichment analysis showed that Phenylpropanoid biosynthesis

(gmx00940), Biosynthesis of secondary metabolites (gmx01110), and Metabolic pathways

(gmx01100) were commonly over-represented in the DEGs from all genotypes. However, other

KEGG pathways were enriched in specific genotypes, such as Cyanoamino acid

(gmx00460) and Flavonoid biosynthesis (gmx00941), and Plant-pathogen interaction

(gmx04626). Taken together, the annotation of DEGs revealed the transcriptional changes of many defense-related genes and pathways in response to SCN infection at the migratory phase which varied among genotypes. As well as the difference between susceptible or resistant reactions, different pathways were found to be activated in the different resistant genotypes. We then investigated details of each differentially activated pathway.

Identification of genes related to pathogen detection and signal transduction This study revealed many candidate genes harboring known or putative functions in plant- pathogen interactions [Figure 2.3A]. The annotation of the total set of DEGs showed 37 genes encoding proteins with kinase activity. These genes all belong to the receptor-like kinase (RLKs) family. Generally, RLKs are pattern recognition receptors (PPRs) activated by the pathogen- associated molecular patterns (PAMPs) or damage-associated molecular patterns (DAMPs). The

RLKs play a role in pathogen perception to initiate a general signal transduction cascade resulting in specific plant responses (Afzal et al., 2008). Further investigation showed the significant induction of genes in several subclasses of RLKs mostly in resistant genotypes, for example, 14 leucine-rich repeat (LRR)-RLKs (only Glyma.13G352800 found in all genotypes), nine cysteine- rich RLKs (all detected in G. soja), and one wall-associated kinase (Glyma.14G158600). The activation of many RLKs indicates that initiation of PAMP-triggered immunity (PTI) may be the 17 first layer of plant innate immunity in response to migrating SCN. Involvement of calcium ion fluxes at the plasma membrane, which is part of PTI, has been implicated in the plant-nematode interaction (Teillet et al., 2013; Zhang et al., 2017). In this study, 17 genes related to the calcium signaling were induced in resistant genotypes, including eight genes in Peking, seven genes in G. soja and only Glyma.19G255500 in Fayette. In addition to the PTI, the intracellular nucleotide- binding site-leucine-rich repeat (NBS-LRR) proteins have a known role in effector-triggered immunity (ETI), which confers faster and stronger responses to the pathogen in the second tier of plant immunity. The induction of five genes encoding NBS-LRRs was identified in G. soja or

Peking, including Glyma.02G023800, Glyma.08G192900, Glyma.16G135200,

Glyma.16G136600, and Glyma.18G226500. The results suggested that an ETI may be triggered by SCN-derived effectors in the resistant genotypes, leading to more robust responses such as the hypersensitive response. It is worth noting that the PPR and NBS-LRR genes were predominantly detected in G. soja DEGs. By contrast, the number of DEGs related to calcium ion signaling pathway in Peking is about the same as those in G. soja.

Transcriptional reprogramming by alteration of transcription factors Based on their protein sequences and homology to Arabidopsis thaliana proteins in the

PlantTFDB database, there were 13 TF families annotated from 45 unique DEGs across all genotypes [Figure 2.4]. The results showed several genes belonging to five major TF families, including WRKY (12 genes), MYB or MYB-related (10 genes), Ethylene responsive factors or

ERFs (six genes), C2H2 (seven genes), and bHLH (three genes). However, only one TF gene was identified in each of the other TF families: bZIP, CAMTA, G2-like, HSF, HD-ZIP, MICK-MADS, and ARR. Most of these families were identified in G. soja or Peking DEGs. Interestingly, the

C2H2 family was found in all three resistant genotypes, whereas only one down-regulated gene in the G2-like TF family was found among the susceptible Williams 82 DEGs. 18

Both WRKYs and MYBs are involved in the regulatory interplay between soybean and

SCN (Li et al., 2011; Hosseini and Matthews, 2014). The over-expression of approximately 30

WRKY genes in soybean hairy roots promoted resistance to SCN (Yang et al., 2017). In this experiment, all 12 WRKY genes were induced in SCN-infected roots, and were classified into six subfamilies: WRKY33, WRKY40, WRKY41, WRKY51, WRKY53, and WRKY75. Of these, two

WRKY33 genes (Glyma.01G128100, Glyma.03G042700) and a WRKY41 gene

(Glyma.05G215900) were commonly differentially expressed between Peking and G. soja.

Similarly, of ten MYB or MYB-related genes, Peking and G. soja DEGs have one up-regulated

MYB gene (Glyma.06G303100 or a homolog of MYB-related gene CAPRICE in Arabidopsis) in common. However, half of the MYB genes in G. soja DEGs were suppressed in SCN-infected roots. The down-regulated MYB genes were closely related to the functionally characterized genes in Arabidopsis: AtMYB5 (Glyma.19G257400), AtMYB36 (Glyma.11G194100) and AtMYB102

(Glyma.12G218200, Glyma.13G282100, and Glyma.13G302400) whereas the up-regulated MYB genes were classified into GmMYB29 (Glyma.02G005600, Glyma.10G006600, and

Glyma.10G180800) and GmMYB92 (Glyma.16G023000). Based on published functional analysis,

AtWRKY33 regulates genes involved in redox homeostasis, hormone signaling crosstalk and camalexin biosynthesis pathway in early responses to necrotrophic fungus Botrytis (Birkenbihl et al., 2012). The functions of MYB genes have also been reported as regulators of several defense- related genes, for example, AtMYB102 increased the susceptibility to aphids through the ethylene- dependent signaling pathway (Zhu et al., 2018), and GmMYB29 regulates biosynthesis of isoflavones, which are known phytoalexins in defense against pathogens (Chu et al., 2017).

In addition, six C2H2 genes were induced in Fayette or Peking or G. soja. Of these, two genes (Glyma.04G04490 and Glyma.06G045400) were identified as homologs of AtZAT10 in

19

Arabidopsis. The AtZAT10 gene product is a key regulator in plant defense responses to abiotic stress (Mittler et al., 2006). Interestingly, these two induced genes in soybean (Glyma.04G044900 and Glyma.06G045400) were predicted to interact with WRKYs and MAPKs controlling soybean- rhizobium symbiosis (Yuan et al., 2018). On the other hand, one C2H2 gene (Glyma.17G0064700, a homolog of AtZFP2) was down-regulated in Fayette. Overexpression studies have demonstrated that the AtZFP2 is involved in abscission control (Cai and Lashbrook, 2008), possibly implicating down-regulation of abscission-related functions in response to infection in resistant plants.

Ethylene biosynthesis and signaling in response to SCN at migration phase In this study, five ACC-oxidase genes involved in Ethylene (ET) biosynthesis were significantly induced upon nematode infection [Figure 2.5]. One of them (Glyma.02G268200) was common among all genotypes, while the others were detected in resistant genotypes. This indicated that migrating SCN may trigger some response of ethylene level regardless of interaction type, but resistant plants show a stronger and broader response. In addition, expression of six ERF TF genes was modulated in the resistant interaction. All four of the homologous genes (Glyma.02G006200,

Glyma.10G036600, Glyma.15G079100, and Glyma.20G195900) of subfamily B-3 ERFs in

Arabidopsis were up-regulated in G. soja or Peking. Two genes belong to subfamily B-4 showed an opposite expression pattern in two genotypes: Glyma.04G201900 was up-regulated in G. soja, but Glyma.16G047600 (GmERF113) was down-regulated in Peking. The overexpression of

GmERF113 (a paralog of ABR1, AtRAP2.6, and AtRAP2.6L in Arabidopsis) enhanced resistance to Phytophthora sojae in soybean (Zhao et al., 2017b). Similarly, overexpression of an ERF gene

(RAP2.6) was shown to increase resistance against the beet cyst nematode (Ali et al., 2013). On the other hand, the genetic disruption of AtRAP2.6L in Arabidopsis increased defense responses to

Pseudomonas syringae (Sun et al., 2010). Overall, a complex set of responses were observed

20 consistent with the involvement of ethylene signaling in all genotypes, but involving different sets of genes.

Phenylpropanoid biosynthesis pathway was over-represented in all genotypes The KEGG pathway enrichment analysis showed phenylpropanoid biosynthesis

(gmx00940) as the top enriched pathway in all genotypes. Phenylpropanoids, one of the largest group of secondary metabolites in plants, are mainly involved in response to biotic or abiotic stresses and give broad-spectrum antimicrobial activity against pathogen attacks (Dixon et al.,

2002; Korkina, 2007). The expression levels of 65 genes annotated as involved in the phenylpropanoid pathway were altered in response to SCN, of which most (83%) were significantly induced in response to SCN infection [Figure 2.6]. These DEGs encode components that were classified into two main branches of the phenylpropanoid pathway: lignin biosynthesis and flavonoid/isoflavonoid biosynthesis pathways.

A. Lignin biosynthesis and cell wall-related genes

Cell wall is a physical barrier for plant defense against pathogen attack. One of main components of plant secondary cell wall is lignin which is built from lignin monomers

(monolignols). In this experiment, we found the suppression of genes encoding key enzymes in production of monolignols in both resistant and susceptible genotypes such as Glyma.01G020900,

Glyma.08G080600, Glyma.18G113100, and Glyma.20G045800. However, the majority of lignin biosynthesis-related DEGs were annotated as peroxidases, and either induced (26 genes) or suppressed (5 genes) in SCN-infected roots. Of these, different sets of peroxidase genes were found in each genotype, although the resistant genotypes shared the induction of six genes. The peroxidase genes are also related to oxidation-reduction processes and oxidative stress, over- represented GO terms identified in all genotypes [Figure 2.6]. The peroxidases are responsible for

21 the formation of H2O2 and ROS in oxidative bursts, as part of the early defense response, and also involved in lignin polymerization using H2O2 for cell wall modification (Almagro et al., 2009).

More of the cell wall-related DEGs were found to be differentially expressed and predominantly induced in resistant genotypes, for example, six dirigent–like (DIR) genes and four laccase genes

(Glyma.08G359100, Glyma.11G137500, Glyma.U027300, and Glyma.U027400).

B. Flavonoid/isoflavonoid biosynthesis

In this experiment, 18 DEGs were related to flavonoid and isoflavonoid biosynthesis, for example, six chalcone synthase (CHS) genes were differentially expressed. The CHS is a gatekeeper of flavonoid biosynthesis and its expression leads to accumulation of flavonoid and isoflavonoid phytoalexins in plant defense (Dao et al., 2011). It is worth noting that the flavonoid biosynthesis (gmx00941) pathway was over-represented in all three resistant genotypes.

According to the annotation of cytochrome P450 genes (P450s) in soybean and common bean reported in Reinprecht et al. (2017), seven DEGs were classified into three subfamilies involved in this pathway, including CYP81E (Glyma.09G048900, Glyma.09G049200, Glyma.09G049300, and Glyma.15G156100), CYP93A (Glyma.03G142100 and Glyma.19G146800), and CYP93C

(Glyma.13G173500). Of total 16 P450 genes found induced in this study, four DEGs

(Glyma.09G049200, Glyma.11G062500, Glyma.11G062600 and Glyma.16G195600) were found in common among all the resistant genotypes while the rest of P450 genes were only detected in the resistant genotypes.

Carbohydrate metabolism and transport protein The expression levels of 80 genes involved in carbohydrate metabolism or categorized into transport protein family were found to be altered in this experiment [Figure 2.7]. Of these, 42 and ten DEGs were uniquely detected in resistant G. soja and susceptible Williams 82, respectively.

22

However, they tended to be induced in G. soja, but mostly suppressed in Williams 82. The down- regulated genes detected in Williams 82 were annotated with the functions of sugar transporter

(Glyma.06G167000, a homolog of AtSWEET10), inositol transporter (Glyma.09G087400), pfkB family carbohydrate kinase (Glyma.14G073100 and Glyma.17G251900), MATE transporter

(Glyma.19G120900), potassium channel (Glyma.08G232100), and beta-glucosidase

(Glyma.08G150400).

Gene induction or repression in susceptible vs. resistant genotypes Approximately half of the DEGs in the susceptible Williams 82 were suppressed in SCN- infected roots, while on average only 13.4% of the DEGs showed reduced expression levels in the three resistant genotypes. More importantly, some down-regulated genes in the susceptible genotype were identified to have direct and indirect functions in plant defense, such as genes involved in carbohydrate metabolism and transport proteins. In contrast, many defense-related genes showed increased expression among the resistant genotype DEGs, for instance: genes related to plant innate immunity, phytohormone, and transcriptional regulation. Furthermore, none of the pathogenesis-related (PR) genes were significantly induced in the susceptible genotype.

Wild soybean reaction to nematode infection Although novel resistance loci against SCN have been discovered in some wild soybeans

(Wang et al., 2001; Winter et al., 2007), these are still incompletely characterized genotypes. With

364 significant DEGs (254 unique genes) identified at 8 hours after SCN inoculation, Glycine soja

PI 468916 showed a very broad genomic response to the invasion compared to that of Glycine max genotypes. In addition to the genes that were found in common across resistant genotypes, several genes identified uniquely in G. soja were categorized as directly or indirectly plant-defense related.

G. soja showed the induction of diverse PR genes upon nematode infection, including PR1, PR4, protease inhibitors (PR6), PR10 and PRp27-like genes [Figure 2.3B]. Detection of 18 out of the 23 total of 20 PR genes identified across the experiment shows a remarkable completeness of the PI

468916-type resistance at early time points. Of these, PR4, PR6, and two PR10 genes were uniquely seen in G. soja. For hormone-related genes, three genes in the gibberellin (GA) signaling

(Glyma.02G245600, Glyma.13G361700, and Glyma.20G153400) and two auxin transporter genes

(Glyma.04G252300 and Glyma.06G110200) were down-regulated while two auxin-responsive genes or GH3 genes (Glyma.02G125600 and Glyma.05G101300) were up-regulated. In addition, many genes with either unknown or poorly-known functions were found to be induced in PI

468916.

Differential expression of previously identified SCN resistance genes Resistance quantitative trait loci (QTLs) on chromosomes 18 (Rhg1) and 8 (Rhg4) have been consistently mapped in a variety of soybean germplasm. These QTLs represent the major sources of resistance in soybean cultivars, and the loci underlying them have been characterized at the molecular level (Concibido et al., 2004; Cook et al., 2012; Liu et al., 2012; Yuan et al., 2012;

Bayless et al., 2016). Therefore, the three expressed genes located in the 31kb segment of the Rhg1 locus, including Glyma.18G022400 (encoding a putative amino acid permease),

Glyma.18G022500 (encoding a predicted alpha-SNAP), Glyma.18G022700 (a predicted wound- inducible protein) (Cook et al., 2012) and the Rhg4 gene (Glyma.08G108900) encoding a Ser hydroxymethyl transferase (Liu et al., 2012) were separately investigated for their transcript abundance in this study. No statistically significant differences were found between inoculated and uninoculated conditions except, interestingly, for the significant induction of the wound-inducible protein Glyma.18G022700 in G. soja, with a fold change of 2.05 (FDR = 0.00039).

24

Discussion

Importance of early interaction in determining compatibility It has been stated that compatible and incompatible reactions in soybean-SCN are not determined until females establish a feeding site (Ruben et al., 2006), which is approximately 2 days after inoculation (Klink et al., 2007a). Previous studies mainly focused on the transcriptomic responses during the later sedentary phase of the soybean-SCN pathosystem (Li et al., 2011;

Mazarei et al., 2011; Hosseini and Matthews, 2014; Li et al., 2018). In this study, the differential expression of numerous genes related to defense mechanisms within the first 8 hpi can be clearly attributed to the host reaction to the nematode invasion. This demonstrates that the plant, especially the resistant genotypes, have already mounted an extensive reaction at this time point. Moreover, the tremendous differences in response between resistant and susceptible genotypes at this time point are very likely to determine the subsequent defense cascades leading to either resistance or susceptibility.

Resistant soybeans showed effective plant innate immune system The nematode wounds the root tissue as it migrates to the vascular system, using both physical forces and a mixture of degradative enzymes (Gheysen and Mitchum, 2008). The induction of multiple RLK genes, with a much larger response in resistant genotypes, indicated differences in the sensitivity of pathogen recognition between plants with resistant and susceptible reactions. In addition, many PTI-related DEGs were induced in resistant genotypes, such as genes in the generation of ROS signaling, genes related to calcium ion signaling, and pathogenesis- related genes (e.g., PR1, PR4, PR6 and PR10). This is consistent with the initiation of PTI signal transduction by PPRs in the resistant genotypes. Furthermore, the expression of NBS-LRR genes was elevated in plants with resistant reactions, suggesting that resistant soybeans start integrating

25

PTI and ETI as early as 8 hpi for robust SCN responses, as previously indicated by Zhang et al.

(2017) during the establishment phase (3, 5, and 8 dpi).

G. soja PI 468916 has a wide range of defense mechanisms against SCN G. soja PI 468916 showed the most extensive transcriptome changes against SCN HG Type

0 at the migratory stage, inducing a diverse set of defense-related mechanisms. Consistently, the

RNA-seq analysis revealed the activation of a complex regulatory network in the resistant G. soja

PI 424093 infected with SCN HG type 2.5.7 (Zhang et al., 2017). All WRKY genes identified in this experiment were significantly up-regulated in the resistant G. soja PI 468916 or Peking genotypes, including WRKY33. The overexpression of the AtWRKY33 reduced susceptibility to the beet cyst nematode in Arabidopsis (Ali et al., 2013), suggesting that WRKYs may play a central role in transcriptional regulation in early response to SCN in resistant soybeans. However, the suppression of MYB102 genes and the alteration of some hormone signaling genes, especially auxin- and gibberellin–related genes, were uniquely identified in G. soja PI 468916, in addition to a massive activation of pathogen recognition and pathogen responsive genes. The results confirmed that this G. soja PI 468916 deploys a broader range of defense mechanisms for SCN resistance than the other resistant soybeans investigated, likely indicating a broad genetic base for the resistance present in this line. This provides further evidence for the benefits of PI 468916

(Kabelka et al., 2005; Kim and Diers, 2013; Yu and Diers, 2017) as a genetic source to improve the genetic diversity of SCN resistance in soybean.

Defense responses of rhg1 and rhg4 loci during the migration phase Glyma.18G022700 was the only gene of the three in the rhg1 repeat which was significantly up-regulated, and only in G. soja, an interesting finding as this gene has distant similarity to wound-inducible proteins, yet its role in SCN resistance remains to be established.

Moreover, the rhg4 gene was not significantly up-regulated in our analysis. However, non- 26 significant changes in expression of rhg1 and 4 were observed by previous transcriptomic studies in whole roots (Wan et al., 2015; Zhang et al., 2017; Li et al., 2018), Additionally, the alpha-SNAP protein encoded at Rhg1 (Glyma.18G022500) is specifically induced only in the feeding cell and those surrounding it (Matsye et al., 2011; Bayless et al., 2016). Thus the resolution of our study, where whole root RNA was used, may be insufficient to detect induction responses that are specific to particular cells.

Resistant reactions in soybeans harboring the SCN-resistant Rhg1 locus The nematode-responsive DEGs in Fayette tend to overlap with those in other resistant genotypes, compared to those in the susceptible Williams 82. Fayette relies on introgressed Rhg1- b from PI 88788 for SCN resistance, and is otherwise isogenic with Williams 82 (Bernard et al.,

1988). The CNV at the Rhg1 locus that confers SCN resistance is generally described as three copies in Peking and 8-11 copies in Fayette (Cook et al., 2012; Lee et al., 2016). It is significant that the presence of Rhg1 in this genotype also alters the early response of other genes.

Resistance in Peking also requires Rhg4 (Yu et al., 2016). The DEGs uniquely identified in Peking outnumbered those in Fayette, suggesting a distinct resistance mechanism in Peking from Fayette. The resistant reaction of Peking is rapid while the reaction is prolonged in PI 88788

(Klink et al., 2011). Our findings also revealed candidate genes that may suggest the different mechanisms involved. Peking showed extensive changes of genes in calcium ion signaling and

ERF TF genes (including ERF113) in the ethylene signaling pathway in combination with other

TF genes (e.g., WRKYs and MYBs). By contrast, only two C2H2 TF genes belong Fayette DEGs.

The role of these C2H2 genes in soybean-SCN interactions remains to be explored.

Phenylpropanoid pathway plays a central role in both susceptible and resistant reactions The involvement of the phenylpropanoid pathway has been reported in both compatible and incompatible SCN-soybean interactions (Edens et al., 1995; Kandoth et al., 2011; Li et al., 27

2011; Zhang et al., 2017; Li et al., 2018). This pathway was over-represented in the DEGs of all four genotypes, suggesting its central role in the early defense response to SCN infection. The activation of multiple peroxidases in this study also implies that these genotypes share common strategies to overcome SCN invasion by altering cell wall structures and generating ROS in both susceptible and resistant reactions. Consistently, the functional analysis of peroxidase 53

(AtPRX53) in Arabidopsis roots infected with the beet cyst nematode showed that the AtPRX53 promoter showed up-regulation at the nematode penetration sites and in their migration paths (Jin et al., 2011). Interestingly, the down-regulation of genes in monolignol biosynthesis was also detected in both the susceptible and resistant lines, indicating that plants may have reduced activity of the lignification process during SCN infection in both reactions. In contrast, the resistant genotypes recruited more cell wall-related DEGs for modulating cell wall metabolism such as dirigent-like genes, and laccase genes. The induction of both lignin biosynthesis- and cell wall- related genes together in resistant reactions can be attributed to modifications in cell wall structures as a physical barrier against pathogen invasion. In addition to physical defense, plants produce defensive chemicals such as phytoalexins. The metabolites derived from the flavonoid/isoflavonoid biosynthesis pathway that commonly serve as phytoalexins in plant- nematode interactions mostly belong to the flavonols, isoflavonoids and pterocarpan classes (Chin et al., 2018). The significant up-regulation of P450 genes was detected in response to invading

SCN in both susceptible and resistant genotypes. Several P450 subfamilies have been identified to be involved in phytoalexin biosynthesis and early response to elicitors such as CYP93A1 as defense marker in soybean (Kinzler et al., 2016) and CYP81E genes involved in pathogen defense in Medicago (Liu et al., 2003). These subfamilies were also induced in this experiment, indicating that the biosynthesis of phytoalexins is important for general early responses to SCN infection.

28

P450s seem to be involved in the early response, but probably are also important throughout the

Soybean-SCN interaction, as they have been identified in studies at later time points (Li et al.,

2011; Li et al., 2018).

Suppression of carbohydrate metabolism and transport proteins may contribute to susceptibility in Williams 82 The fewest SCN-responsive DEGs were found in the susceptible Williams 82, which also showed a higher proportion of down-regulated to up-regulated genes, compared to resistant genotypes. Down-regulation was identified in genes encoding proteins in carbohydrate metabolism as well as transport proteins, for examples a beta-glucosidase, carbohydrate kinases, inositol transporter, a potassium channel, and a MATE efflux family protein. The alteration in sugar transporter genes has been previously observed in plant-nematode interactions (Hofmann et al., 2009; Zhao et al., 2018). Our data thus implies that these genes may be suppressed by SCN to enhance susceptibility. Furthermore, manipulation of sugar transport genes during compatible interactions was observed before syncytium establishment, possibly establishing suitable conditions for this process. However, the induction of DEGs that overlapped with resistant genotypes implies that Williams 82 is able to detect the nematode infection and activate defense- related pathways in a (somewhat muted) early response to migrating SCN. Compared to the extensive and diverse reactions in resistant genotypes, however, Williams 82 seems to show a much less comprehensive genomic response to SCN.

Conclusion

This study demonstrated that changes in expression of defense-related genes occur prior to nematode feeding site selection in all four soybean genotypes studied. By far the most extensive changes occurred in the three genotypes with incompatible reactions, indicating a more sensitive

29 and comprehensive response of the resistant plant transcriptome to pathogen invasion. Our findings also provide evidence that obvious differences between incompatible and compatible reactions have occurred by 8 hpi. Although some defense-related genes were also activated in the susceptible genotype, this reaction seems to be inadequate for SCN resistance. We also observed changes in genes involved in carbohydrate metabolism and transport that may help establish the conditions for a compatible interaction. This, in turn, suggests a pivotal role of this early transcriptomic response in the determination of pathogenicity success. Our results revealed many candidate genes involved in this response that may be potentially useful for engineering broader- spectrum SCN resistance in response to population genetic shifts.

30

Figures and Tables

Figure 2.1 Heatmap and dendrograms of hierarchical clustering showing the expression patterns of all 519 unique differentially expressed genes (DEGs) in response to nematode invasion in four genotypes (Fayette, Peking, Glycine soja PI 468916, and Williams 82). The colors represent the up-regulation (blue) and down-regulation (red) of statistically significant DEGs with different magnitudes (brightness) of log2 fold change values, in inoculated relative to uninoculated roots. The dendrogram on top shows the clustering of genotypes. The dendrogram and color bar on side shows the clustering of DEGs.

31

Figure 2.2 Multiple comparisons of four differentially expressed gene (DEG) sets identified in Williams 82, Fayette, Peking and Glycine soja PI 468916.

A) The UpSetR plot showing number of DEGs in each set and each comparison.

B) The pairwise comparisons showing the number of overlapped DEGs between two DEG sets based on their expression patterns.

C) The heatmap showing log2 fold change values of DEGs that were identified commonly in three or four genotypes between SCN-infected versus uninfected roots. The significantly induced DEGs were displayed in blue, while the significantly suppressed DEGs were displayed in red. The statistically significant changes in gene expression between SCN- infected versus uninfected roots were identified using a false-discovery rate (FDR) cut-off set at 0.05. “.” FDR < 0.1, “*” FDR < 0.05, “**” FDR < 0.01, “***” FDR < 0.001.

32

Figure 2.3 Bar graphs and heatmaps showing the gene frequency and expression profiles of pattern-triggered immunity (PTI)/effector-triggered immunity (ETI) receptor and calcium- mediated signaling genes (A and B) and pathogenesis-related genes (C and D). The bar graphs on the top of heatmaps represent the number of differentially expressed genes (DEGs) in each genotype which were uniquely identified or overlapped either susceptible or other resistant genotypes. The heatmaps display the expression patterns of each DEG in each of four genotypes (Fayette, Peking, Glycine soja PI 468916, and Williams 82). The colors in heatmaps represent the up-regulation (blue) and down-regulation (red) of DEGs with different magnitudes (brightness) of log2 fold change values, comparing between inoculated and uninoculated roots. The statistically significant changes in gene expression were identified using a false-discovery rate (FDR) cut-off set at 0.05. “.” FDR < 0.1, “*” FDR < 0.05, “**” FDR < 0.01, “***” FDR < 0.001.

33

Figure 2.4 Bar graph and heatmap showing the gene frequency and expression profiles of transcription factor genes that were differentially expressed in response to nematode invasion at 8 hours post inoculation (hpi). (A) The bar graph represents the number of transcription factor genes which were uniquely identified or overlapped either susceptible or other resistant genotypes. (B) The heatmap displays the expression patterns of each transcription factor genes in four genotypes (Fayette, Peking, Glycine soja PI 468916, and Williams 82). The colors in the heatmap show the up-regulation (blue) and down-regulation (red) of genes with different magnitudes (brightness) of log2 fold change values, comparing between inoculated and uninoculated roots. The statistically significant changes in gene expression were identified using a false-discovery rate (FDR) cut-off set at 0.05. “.” FDR < 0.1, “*” FDR < 0.05, “**” FDR < 0.01, “***” FDR < 0.001.

34

Figure 2.5 Bar graph (A) and heatmap (B) showing the gene frequency and expression profiles of differentially expressed genes (DEGs) related to hormone biosynthesis and signaling. The bar graph on the top of heatmap represents the number of DEGs in each genotype which were uniquely identified or overlapped either susceptible or other resistant genotypes. The heatmap displays the expression patterns of each DEG in each of four genotypes (Fayette, Peking, Glycine soja PI 468916, and Williams 82). The colors in heatmaps represent the up-regulation (blue) and down- regulation (red) of DEGs with different magnitudes (brightness) of log2 fold change values, comparing between inoculated and uninoculated roots. The statistically significant changes in gene expression were identified using a false-discovery rate (FDR) cut-off set at 0.05. “.” FDR < 0.1, “*” FDR < 0.05, “**” FDR < 0.01, “***” FDR < 0.001.

35

Figure 2.6 Bar graphs and heatmaps showing the gene frequency and expression profiles of differentially expressed genes (DEGs) involved in phenylpropanoid pathway (A and B) and oxidation-reduction process (C and D). The bar graphs on the top of heatmaps represent the number of DEGs in each genotype which were uniquely identified or overlapped either susceptible or other resistant genotypes. The heatmaps display the expression patterns of each DEG in each of four genotypes (Fayette, Peking, Glycine soja PI 468916, and Williams 82). The colors in heatmaps represent the up-regulation (blue) and down-regulation (red) of DEGs with different magnitudes (brightness) of log2 fold change values, comparing between inoculated and uninoculated roots. The statistically significant changes in gene expression were identified using a false-discovery rate (FDR) cut-off set at 0.05. “.” FDR < 0.1, “*” FDR < 0.05, “**” FDR < 0.01, “***” FDR < 0.001.

36

Figure 2.7 Bar graphs and heatmaps showing the gene frequency and expression profiles of differentially expressed genes (DEGs) classified in transport protein family (A and B) or involved in carbohydrate metabolism (C and D). The bar graphs on the top of heatmaps represent the number of DEGs in each genotype which were uniquely identified or overlapped either susceptible or other resistant genotypes. The heatmaps display the expression patterns of each DEG in each of four genotypes (Fayette, Peking, Glycine soja PI 468916, and Williams 82). The colors in heatmaps represent the up-regulation (blue) and down-regulation (red) of DEGs with different magnitudes (brightness) of log2 fold change values, comparing between inoculated and uninoculated roots. The statistically significant changes in gene expression were identified using a false-discovery rate (FDR) cut-off set at 0.05. “.” FDR < 0.1, “*” FDR < 0.05, “**” FDR < 0.01, “***” FDR < 0.001.

37

Table 2.1 The over-represented GO terms identified in differentially expressed genes (DEGs) of each genotype across the experiment, using the AgriGO web-server.

GO biological processes False discovery rate (FDR)† GO term GO biological processes description Fayette Peking G. soja Williams 82 GO:0055114 oxidation reduction 7.20E-06 3.30E-09 2.20E-16 0.00018 GO:0006979 response to oxidative stress 0.006 5.70E-07 4.80E-11 0.00021 GO:0009607 response to biotic stimulus 0.014 - 2.70E-05 - GO:0006950 response to stress 0.027 0.0049 1.70E-05 0.82 GO:0042221 response to chemical stimulus 0.038 0.00015 1.70E-07 0.00021 GO:0006952 defense response 0.04 - 0.0014 - GO:0050896 response to stimulus 0.047 0.024 0.00013 0.046 GO:0008152 metabolic process 0.19 0.0042 4.80E-06 1 GO:0051704 multi- process - - 0.0062 - GO molecular function False discovery rate (FDR)† GO term GO molecular function description Fayette Peking G. soja Williams 82 GO:0005506 iron ion binding 2.40E-10 2.40E-11 4.70E-12 1.50E-05 GO:0020037 heme binding 3.40E-10 2.40E-11 7.20E-13 1.10E-05 GO:0046906 tetrapyrrole binding 3.40E-10 2.40E-11 7.20E-13 1.10E-05 oxidoreductase activity, acting on GO:0016705 paired donors, with incorporation or 1.50E-06 8.30E-07 3.70E-06 0.022 reduction of molecular oxygen GO:0016491 oxidoreductase activity 2.40E-06 1.30E-09 3.10E-16 0.00016 GO:0046914 transition metal ion binding 1.00E-05 8.60E-08 0.00015 0.0012 GO:0009055 electron carrier activity 2.00E-05 0.00066 0.004 6.60E-02 GO:0003824 catalytic activity 0.00013 0.012 2.50E-06 0.6 GO:0043167 ion binding 0.0013 1.10E-07 4.70E-05 0.00016 GO:0046872 metal ion binding 0.0013 1.10E-07 4.70E-05 0.00016 GO:0043169 cation binding 0.0013 1.10E-07 4.70E-05 0.00016 oxidoreductase activity, acting on GO:0016684 0.0017 2.70E-07 1.00E-12 0.00016 peroxide as acceptor GO:0004601 peroxidase activity 0.0017 2.70E-07 1.00E-12 0.00016 GO:0016209 antioxidant activity 0.0033 1.10E-06 7.20E-13 0.00031 GO:0016829 lyase activity 0.041 - 1 - GO:0015075 ion transmembrane transporter activity 0.044 1 1 - GO:0045735 nutrient reservoir activity - 0.0013 0.83 0.002 GO:0004312 fatty-acid synthase activity - 0.0037 0.00024 -

38

Table 2.1 Continued

GO molecular function (Continued) False discovery rate (FDR)† GO term GO molecular function description Fayette Peking G. soja Williams 82 3-oxoacyl-[acyl-carrier-protein] GO:0004315 - 0.0037 0.00024 - synthase activity oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2- GO:0016706 - 0.078 0.035 - oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors GO cellular component False discovery rate (FDR)† GO term GO cellular component description Fayette Peking G. soja Williams 82 GO:0044421 extracellular region part - 0.029 - - GO:0031012 extracellular matrix - 0.029 - -

† The analysis of GO term enrichment was performed on DEGs of each genotype, compared to genes in whole genome as a background, using the hypergeometric tests with FDR cut-off set at 0.05.

39

Table 2.2 The over-represented KEGG pathway in biological process category identified in differentially expressed genes (DEGs) of each genotype across the experiment, using the KOBAS.

False discovery rate (FDR)† KEGG ID KEGG pathway Fayette Peking G. soja Williams 82 gmx00940 Phenylpropanoid biosynthesis 1.10E-08 7.04E-12 1.09E-20 1.79E-08 Biosynthesis of secondary gmx01110 8.33E-07 7.04E-12 4.78E-20 1.67E-05 metabolites gmx00460 Cyanoamino acid metabolism 0.000 0.014 2.65E-06 0.105 gmx01100 Metabolic pathways 0.001 2.37E-07 6.16E-12 1.67E-05 gmx04626 Plant-pathogen interaction 0.012 1.03E-05 0.076 - gmx00500 Starch and sucrose metabolism 0.032 0.150 0.009 0.359 gmx00941 Flavonoid biosynthesis 0.035 0.001 2.65E-06 0.105 Ubiquinone and other terpenoid- gmx00130 0.164 0.014 0.023 - quinone biosynthesis Cysteine and methionine gmx00270 0.287 0.014 0.046 0.057 metabolism gmx04712 Circadian rhythm - plant 0.164 0.014 0.006 - gmx00900 Terpenoid backbone biosynthesis 0.164 0.356 0.023 - gmx00591 Linoleic acid metabolism - - 0.024 0.005 gmx00902 Monoterpenoid biosynthesis - - 0.024 - gmx00910 Nitrogen metabolism 0.159 - 0.045 -

† The analysis of GO term enrichment was performed on DEGs of each genotype, compared to genes in whole genome as a background, using the hypergeometric tests with FDR cut-off set at 0.05.

40

CHAPTER 3 IDENTIFICATION OF DEFENSE MECHANISMS INVOLVED IN RHG1-MEDIATED RESISTANCE TO SOYBEAN CYST NEMATODE IN SOYBEAN ROOTS

Introduction

Soybean cyst nematode or SCN (Heterodera glycines Ichinohe) is a sedentary endoparasite that infects soybean roots, causing a dramatic decrease in soybean yield. SCN penetrates soybean roots as a second-stage juvenile (J2) and intracellularly migrates to establish a feeding cell near vascular tissues. The nematode uses a stylet, and secrets effector enzymes to destroy cell walls of neighboring cells to fuse them with the initial feeding cell and form a multinucleate cell, which is known as a syncytium.

The management of SCN infestation in the field is by a combination of using SCN-resistant soybeans, crop rotation, and seed treatments. One of the plant breeding challenges is a limited number of resistance sources to improve the SCN resistance in soybeans. Using a single resistance source (PI 88788) led to a decrease in efficiency of SCN-resistant soybeans. The fine mapping of the SCN resistance locus rhg1-b that is derived from PI 88788 (Kim et al., 2010) and further investigation in structural variation revealed the copy number variation (CNV) of a 31.2kb segment at the Rhg1 locus conferring the SCN resistance (Cook et al., 2012). The overexpression of three Rhg1 genes in susceptible W82 roots enhanced SCN resistance (Cook et al., 2012). In addition, CNV at the Rhg1 locus was reported in Fayette lines ranging from 8 to 11 copies with different levels of resistance to SCN (Lee et al., 2016). The Fayette cultivar was classified into high-copy number Rhg1 group with two types of repeat units: one found in PI 88788 (FA, FB) and one W type found in Williams 82, as Fayette lines harboring SCN-resistant Rhg1-b allele which

41 was developed from introgression of resistant loci in PI 88788 into Williams background (Bernard et al., 1988; Lee et al., 2015).

Understanding differences in molecular mechanisms between susceptible and resistant responses against SCN would be beneficial for the improvement of SCN resistance in soybeans.

There have been numerous gene expression studies to identify the candidate genes related to defense responses in PI 88788 (Klink et al., 2010; Klink et al., 2011; Matsye et al., 2011). For

Rhg1-mediated resistance, RNA microarray analysis comparing between SCN-susceptible versus

SCN-resistant soybean near-isogenic lines (NILs) that differ at the Rhg1 locus identified potential

SCN-responsive genes such as genes encoding CC-NB-LRR proteins, hypersensitive-like response proteins, and salicylic acid signaling proteins (Kandoth et al., 2011).

Plants have developed complex defense systems against pathogen attack such as pathogen recognition and activation of multiple signaling pathways. The systems subsequently lead to dynamic changes of plant cell structures (e.g., callose deposition and lignification at the cell wall) and production of toxic chemicals (e.g., biosynthesis of secondary metabolites or phytoalexins) during the plant-pathogen interactions. Recently, we detected differential expression of diverse early response genes during the incompatible interaction between resistant Fayette#99 and SCN

HG type 0 at 8 hours post inoculation or migration phase described in Chapter 2 (Miraeiz et al., submitted), suggesting that other defense-related genes may involve in the SCN resistance mediated by CNV at the Rhg1 locus. However, little is known about an impact of CNV of the

Rhg1 repeat on global transcriptomic changes involved in plant defense responses.

This study examined genome-wide gene expression profiles in uninfected soybean roots using RNA-seq to identify differentially expressed genes (DEGs) in isogenic Fayette lines with

CNV at the Rhg1 locus. The co-expression network was constructed to determine gene clusters

42 and potential defense mechanisms involved in the Rhg1-mediated SCN resistance. Candidate genes were selected based on the differential expression in the Fayette lines and previously identified as early response genes in Chapter 2 (Miraeiz et al., submitted). Then, we further validated the dynamic expression of the candidate genes in soybean roots infected with SCN HG- type 0 at three time points (8, 24 and 48 hours post inoculation), from migratory phase to the establishment of syncytium stage using high-throughput qRT-PCR.

Materials and Methods

Plant materials and RNA extraction for RNA-seq analysis of three Fayette lines Three Fayette lines with CNV at the Rhg1 locus were selected from the TaqMan assay screen (Lee et al., 2016), including Fayette#01 (low, 8-9 predicted copies at the Rhg1 locus),

Fayette#19 (mid, 9-10 predicted copies at the Rhg1 locus) and Fayette#99 (high, 11 predicted copies). Using seeds harvested from a single plant, ten vapor-sterilized seeds were germinated and each grown in a cone-tainer filled with autoclaved a mixture of sand and turface (2:1). The plants were maintained in the growth chamber for 15 days under the 16/8 h (light/dark) condition at 26

°C. For each Fayette line, whole roots were collected from four biological replicates and immediately frozen in liquid nitrogen. Plant tissues were kept in -80 °C freezer until RNA extraction. The frozen tissue was ground to a fine powder with a mortar and pestle, homogenized in CTAB buffer prepared for RNA extraction (Chang et al., 1993), and purified in acidic phenol/chloroform. DNase I (New England Biolabs) was used for removing contaminated DNA according to manufacturer’s protocol. The RNA was precipitated with isopropanol and resuspended in RNase-free water. Then, RNA concentration and quality were measured using

Nanodrop and Bioanalyzer. The RNA samples were sent for library preparation and sequencing at the Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, IL.

43

RNA-seq data pre-processing and differential expression analysis The 100nt paired-end reads were sequenced using Illumina’s HiSeq2500 sequencing system. After quality check using Trimmomatic (Bolger et al., 2014) and FastQC (Andrews, 2010), the reads containing low-quality bases (q < 20) were trimmed and removed based on their lengths

(less than 35 nucleotides). Then, HISAT2 (Kim et al., 2015a) was used to map quality assessed reads against soybean (G. max cv. Williams 82) reference genome (Schmutz et al., 2010), obtained from the Phytozome database (Goodstein et al., 2012). The reads that were mapped to exon regions were quantified for each gene, using featureCount (Liao et al., 2014). Identification of differentially expressed genes (DEGs) among three Fayette lines was performed by ANOVA-like tests for the negative binomial generalized linear model implemented in edgeR (Robinson et al.,

2010) with adjusted p-value (FDR) cut-off set at 0.05. The heatmap.2 function in gplot (Warnes et al., 2016) R package was used to draw heatmap of expression profiles.

Co-expression network analysis and clustering The normalized expression values (TPM) of DEGs were used for calculating the Pearson correlation coefficient of every gene pairs to create a square adjacency matrix. Using igraph R package (Csardi and Nepusz, 2006), the co-expression network was constructed based on strongly positive or negative co-expressed genes with the correlation coefficient cut-off set at ± 0.9. The network properties were examined to identify gene clusters, bridging genes that connect between different clusters, and hub genes. The Girvan-Newman algorithm (Newman and Girvan, 2004) was applied to detect gene clusters based on edge betweenness. The edge betweenness represents the number of shortest paths that pass through an edge between gene pairs. High betweenness of an edge indicates that this edge tends to link between two genes from different clusters, while low edge betweenness shows a connection within the cluster. Similarly, betweenness centrality of a node is the number of shortest paths pass through that node between two nodes. The betweenness

44 centrality of a gene can be used to indicate whether the gene is a member of a cluster or a bridge between two clusters. In addition, the degree of connectivity was measured to identify highly connected genes or hub genes, which can be key regulators in the gene clusters.

GO term enrichment analysis and KEGG pathway annotation A set of DEGs in the co-expression network was identified in each gene cluster and performed gene set enrichment analysis based on their gene ontology (GO) terms and KEGG pathways, compared to those annotations of all genes in soybean genome as background. The GO term enrichment tool (Morales et al., 2013) embed on SoyBase database (Grant et al., 2010) was used to identify over-represented GO terms with Fisher’s exact test and Bonferroni correction for multiple testing cut-off set at 0.05. For KEGG pathways annotation using KOBAS (Xie et al.,

2011) and BlastKOALA (Kanehisa et al., 2016).

Transcription factor binding site analysis A set of promoter sequences (1kb upstream sequences from transcript start site) of DEGs in each gene cluster was retrieved from Biomart tool embed in the Phytozome database to examine enrichment analysis of transcription factor (TF) binding sites using the PlantTFDB 4.0 (Jin et al.,

2017). The TFs and their motifs pre-identified in the database were integrated from different sources such as literature, experiments, and computational prediction; however, the majority of these were derived from Arabidopsis to soybean by the BLAST reciprocal best hits. The tool integrated two steps for the enrichment analysis. The first step is to scan for TF binding site motifs in promoter sequences based on FIMO (Grant et al., 2011) with p-value cut-off at 10e-5. Then, the

Fisher’s exact tests were applied to identify over-represented motifs in the promoter sequence set with adjusted p-values (Benjamini-Hochberg; p < 0.01), compared those sites found in the promoter sequences of all genes in soybean genome as background.

45

Plant materials, nematode inoculation and RNA extraction for qRT-PCR Based on our previous RNA-seq analysis of soybeans in response to SCN infection at 8 hours post inoculation described in Chapter 2 (Miraeiz et al., submitted), soybean seeds were harvested from a single plant of four genotypes grown in a greenhouse, including Glycine max cv.

Williams 82, G. max cv. Peking, G. max cv. Fayette and G. soja PI 468916. The Fayette seeds were obtained from a Fayette#99 plant harboring 11 copies of the 31.2kb repeat segment at the

Rhg1 locus (Lee et al., 2016). The vapor-phase sterilized seeds were pre-germinated at room temperature in the dark for three days to prepare for mock (water) and SCN inoculation. Before the inoculation, eggs of SCN HG Type 0 were obtained from Dr. Alison Colgrove, University of

Illinois at Urbana-Champaign. According to the inoculum preparation in Mahalingam et al., (1998) with some modification, Petri dishes containing eggs in distilled water was incubated at 27 °C with constant gentle agitation of 50 rpm for four days. Using a metal screen and filter papers, the unhatched eggs were removed to select for the second stage juvenile nematodes (J2s) and prevent a variation due to infection of late hatching J2s. For inoculation, a three-days-old seedling was placed on top of a sterilized germination paper (20x30 cm2), which were folded three times and moistened with distilled water in a 10-cm petri dish. Each seeding was inculcated by pipetting one ml of inoculum (1500 J2s) or one ml of distilled water onto the roots above the root tip and covered with a piece of moistened filter paper. The Petri dishes were randomly placed in plastic trays (50 x 30 x 10 cm), which were pre-filled with 1-cm water. Each tray was covered with a humidity dome and double layered by a semi-clear plastic bag. The seedlings were maintained in the growth chamber under fluorescent lights of 16/8 h (light/dark) photoperiod at 26 °C. Eight biological replicates of root tissues were harvested for each condition and flash frozen in liquid nitrogen at three time points, including 8, 24 and 48 hours post inoculation (hpi). All tissue samples were kept in -80 °C freezer until RNA extraction. Total RNA was extracted from individual root samples

46 using E-Z® 96 Plant RNA Kit (Omega Bio-Tek) and treated with DNAse I (Thermo Scientific) according to manufacturer’s protocol. The quality of RNA was assessed with NanoDrop ND-1000 spectrophotometer (NanoDrop, Wilmington, DE, USA) and 1% agarose gel electrophoresis.

Quantitative real-time RT-PCR (qRT-PCR) The total RNA was used for cDNA synthesis with the High-Capacity cDNA Reverse

Transcription Kit (Applied Biosystems). This included four biological replicates of cDNA samples for 8 and 24 hpi, but three biological replicates for 48 hpi. Primer sets were designed for target genes using NCBI Primer-Blast tool (Ye et al., 2012). Then, cDNA samples and primer sets were prepared to perform qPCR at the Functional Genomics Unit of the Roy J. Carver Biotechnology

Center, University of Illinois at Urbana-Champaign, IL. The samples were processed on the

Fluidigm Dynamic Array and the BioMarkTM HD system according to the manufacture’s procedures. For differential expression analysis, Ct values were obtained from the Biomark Real- time PCR Software (version 4.3.1) and normalized by the geometric mean of three reference genes, including Glyma.12G051100 (SKIP16), Glyma.20G130700 (TIP41), and Glyma.20G141600

(Ubiquitin). Using limma R package (Ritchie et al., 2015), log2 relative expression values of each gene normalized by internal gene references were used to fit a linear model and followed by estimation of variances with empirical Bayes approach. For each gene and genotype, log2 fold change values were calculated from pairwise comparisons between SCN-infected versus uninfected roots. The differentially expressed genes were identified at an adjusted p-values cut-off of 0.05. The heatmap showing expression profiles was drown using ggplot2 R package (Gómez-

Rubio, 2017).

47

Results

Differentially expressed genes In total, 465 differentially expressed genes (DEGs) were identified in uninfected roots of

Fayette lines with CNV at the Rhg1 locus, including Fayette#01 (low, 8-9 predicted copies at the

Rhg1 locus), Fayette#19 (mid, 9-10 predicted copies at the Rhg1 locus) and Fayette#99 (high, 11 predicted copies). The expression profiles of all DEGs showed that the expression pattern of DEGs in Fayette#19 was more similar to those in Fayette#99 than Fayette#01. There was no statistical significance detected in expression levels of three Rhg1 genes in uninfected roots, comparing between any of these Fayette lines. However, 58 genes were also differentially expressed in soybean roots infected with SCN HG-type 0 at 8 hours post inoculation (hpi), a migration phase of SCN described in Chapter 2 (Miraeiz et al., submitted) [Figure 3.1]. This common gene set suggested that some genes expressed differentially in Fayette lines due to the CNV at the Rhg1 locus possibly also involved in the Rhg1-mediated resistance to SCN at the early time points.

Co-expression network of differentially expressed genes The expression levels of 465 DEGs in all 12 samples of Fayette lines were normalized and calculated the Pearson correlation coefficient of every gene pairs. After removing weakly correlated genes (|R| < 0.9), a co-expression network was constructed by 393 strongly correlated

DEGs (3,731 interactions), consisting of ten network components. The largest component was made of 373 co-expressed DEGs and 3,720 interactions [Figure 3.2A] while the other nine components consisted of either two or three co-expressed genes. For the largest sub-network, the degree of connectivity ranged from one to 72 with an average of 19.95. The gene clusters identified based on edge betweenness in this sub-network accounted for seven clusters (cluster 1 – cluster

7), consisting of 102, 99, 67, 65, 21, 14 and five gene members in each cluster, respectively [Figure

48

3.2B]. In addition, 54 early-response genes were clustered in only the first four clusters; however, majority of these genes belong to either cluster 1 (18 genes) or cluster 4 (29 genes).

Expression profiles of genes in each cluster and their predicted functions The heatmap and GO term enrichment analysis revealed distinct expression profiles of

DEGs and their functions in each cluster [Figure 3.3]. The genes clusters were detected based on the edge betweenness in co-expression network; thus, each cluster contained genes that showed both up- and down-regulation in a specific group of samples, representing a group of genes involved in biological processes that were activated in specific samples. The network properties were examined to identify bridging genes that connect between two clusters and hub genes in each cluster [Figure 3.4].

The largest cluster (cluster 1) represented a set of genes that were highly expressed in

Fayette#01. A gene encoding C2H2 zinc finger transcription factor (Glyma.17G236200) had the highest degree of connectivity (72 neighbors). Moreover, total 21 transcription factor (TF) genes were found in this cluster, including seven APETALA2/Ethylene responsive factor (AP2/ERF) genes, five

WRKY genes (Glyma.05G215900, Glyma.08G021900, Glyma.09G061900, Glyma.16G026400, and Glyma.19G254800), four C2H2 zinc-finger genes (Glyma.04G044900, Glyma.06G045400,

Glyma.17G236200, and Glyma.20G133200), two C3H zinc-finger genes (Glyma.02G296600 and

Glyma.14G016300), and one gene of each three other TF families (bZIP, NAC, and HSF).

Interestingly, the GO term enrichment analysis showed 11 over-represented terms in this cluster such as intercellular signal transduction (GO:0035556), ethylene-mediated signaling pathway and biosynthetic process (GO:0009873 and GO:0009693) and respiratory burst involved in defense response (GO:0002679). Further investigation showed that some genes were classified into several

GO terms; therefore, this cluster represented a group of genes involved in multiple biological processes related to signaling. 49

The cluster 2 contained three highest connectivity genes (Glyma.03G054100 encoding

NBS-LRR, Glyma.03G073300 encoding proteasome subunit alpha type-2-A, and

Glyma.03G111400 encoding cellulose synthase-like protein). These genes were strongly correlated and directly connected to 49 other genes. The cluster represented a group of defense- related genes (GO:0006952) such as 16 NBS-LRR genes (mostly located in complex NBS-LRR loci on chromosome 3 and 13), three dirigent-like genes (Glyma.03G044300, Glyma.03G045600, and Glyma.03G046000) and two lipoxygenase genes (Glyma.04G105900 and

Glyma.15G026500). Approximately, half of NBS-LRR genes in cluster 2 were found adjacently on chromosome 3. These genes were closely related to the NBS-LRR Rps-k-2 gene (Resistance to

Phytophthora sojae) identified in soybean (Gao and Bhattacharyya, 2008). In addition, the heatmap showed that the defense-related genes in these clusters were either intensely up-regulated or down-regulated in Fayette#01, compared to those in the other two lines. Moreover, similar expression patterns were found in cluster 7, the smallest cluster. Of total five genes, cluster 7 contains two genes encoding transporter proteins in Multi-antimicrobial extrusion (MATE) family.

In cluster 3, genes were significantly induced or suppressed in Fayette#19, compared to those in Fayette#01 and Fayette#99. A gene with the highest degree of connectivity was a receptor- like kinase (RLK) gene (Glyma.12G142100), connecting to 16 genes. The over-represented GO terms were xenobiotic catabolic and metabolic process (GO:0042178 and GO:0006805), including five glycosyltransferase genes (Glyma.08G338500, Glyma.08G338600, Glyma.08G338700,

Glyma.08G338900, and Glyma.15G054500). Additionally, two other RLK genes

(Glyma.06G261000 and Glyma.09G240700) and seven NBS-LRR genes were also found in this gene cluster. Most of these NBS-LRR genes were found in complex NBS-LRR loci on chromosome 8 (Glyma.08G317400, Glyma.08G317700, Glyma.08G318000, Glyma.08G319300,

50 and Glyma.08G323200) and their sequences were homologous to the RPM1 genes encoding NSB-

LRR disease resistance to Pseudomonas syringe in Arabidopsis (Grant et al., 1995).

The cluster 4 had 16 of 65 total genes related to the oxidation-reduction process

(GO:055144). This GO term was not significantly enriched in the biological process category; however, 12 of these genes were also classified in heme binding (GO:0020037), which was one of significantly over-represented GO terms in the molecular function category, including nine cytochrome P450 (P450) genes and three peroxidase genes (Glyma.01G130800,

Glyma.03G038200, and Glyma.03G038700). One of P450 genes (Glyma.03G143700) was hub gene, and its expression was strongly correlated with 39 other genes. The expression levels of genes in this cluster tended to be positively correlated with the CNV at the Rhg1 locus. In other words, the genes were highly expressed in high-copy-number Fayette#99 (11 copies), lower expressed in mid-copy Fayette#19 (9-10 predicted copies) and lowest in low-copy Fayette#01 (8-

9 predicted copies).

The most frequent genes in cluster 5 belong to NBS-LRR family which lie tightly to each other on chromosome 13. The predicted closest protein homologs of these genes were NBS-LRR

RPG1-B, conferring resistance to Pseudomonas syringe in soybean (Ashfield et al., 2004). One of these was hub gene (Glyma.13G190300) connected to ten genes in the cluster. The heatmap showed up-regulation or down-regulation of genes in Fayette#99 relative to those in two other

Fayette lines.

Similarly, the high expression levels of genes in cluster 6 were illustrated in Fayette#99 compared to those in other lines. This cluster represented a group of genes related to trichoblast differentiation (GO:0010054) such as five genes encoding pollen Ole e1 allergen and extensin

51 family proteins (Glyma.03G188300, Glyma.10G016600, Glyma.11G232100, and

Glyma.18G025200, and Glyma.19G188400).

Bridging between clusters Based on the top 10 genes ranked by the betweenness centrality, the clusters in the co- expression network were linked by diverse functional genes [Figure 3.4C]. Of these, eight genes acted as connectors between cluster 2 and other clusters. A gene with the highest betweenness in this network was AP2/ERF gene (Glyma.02G016100), bridging between cluster 1 and cluster 2.

The AP2/ERF is a transcription factor superfamily that plays regulatory roles in plant development and stress responses. This gene was directly connected to different types of genes, for instance,

Glyma.11G228100 encoding a homolog protein of HSPRO-Arabidopsis ortholog of beet cyst nematode resistance gene (Cai et al., 1997), RLK genes (Glyma.08G255000 and

Glyma.12G002400), and NBS-LRR genes (Glyma.03G039500, Glyma.06G259400, and

Glyma.13G194100). Interestingly, one of these NBS-LRR genes (Glyma.03G039500) was connecting three clusters (cluster 1, cluster 2, and cluster 4). Another NBS-LRR gene

(Glyma.13G188300) was a connector of two clusters (cluster 3 and cluster 5). In addition, cell wall-related genes are involved in both cluster 2 and cluster 3, such as Glyma.02G024900 encoding wall-associated receptor kinase, Glyma.02G231600 encoding laccase gene, and

Glyma.13G076900 encoding l-ascorbate oxidase gene.

The enrichment analysis of transcription factor motif binding sites The enrichment of 31 motifs in the promoter sequences of DEGs [Figure 3.5; Figure 3.6A] were examined separately based on the clusters in the co-expression network with adjusted p-value cut-off set at 0.01. These genes were categorized into ten TF families, including six WRKY motifs, six NAC motifs genes, five HD-ZIP motifs, five MYB motifs, two C2H2 motifs, two ERF motifs, two BBR-BPC motifs, and one motif of each other families (bZIP, Trihelix, and CAMTA). Of 52 seven gene clusters, the significantly over-represented motifs were detected in promoter sequences of only five clusters (cluster 1, 2, 3, 4, and 5). Based on the TF families, motifs of five TF families

(BBR-BPC, C2H2, WRKY, MYB, NAC) were significantly over-represented in multiple clusters.

For example, three TF families (BBR-BPC, C2H2, WRKY) were predicted to regulate genes in cluster 1 and 3 whereas MYB motif binding sites were significantly enriched in the promoters of

DEGs in three clusters (cluster 1, 2 and 5). For cluster–specific motifs, DEGs in cluster 1 were targeted by one CAMTA motif. The sites of six HD-ZIP motifs were over-represented in the promoters of DEGs in cluster 3. Moreover, the enriched motifs of the ERF family were identified in the promoters of DEGs in cluster 4. The motif of the bZIP TF family was enriched in cluster 5.

One Trihelix motif was significantly over-represented in promoters of DEGs in cluster 2.

Considering all unique target genes of each TF family [Figure 3.6B], the motif binding sites of major TF families (e.g., BBR-BPC, C2H2, and MYB) were generally found in the promoters of

DEGs correlated with the number of DEGs in each cluster. However, some TF families tended to bind to promoters of DEGs in specific clusters regardless of the total number of DEGs in each cluster, for example, the target DEGs of ERF and NAC TF families in cluster 4 outnumbered those in other clusters.

Expression profiling of genes in response to SCN at 8, 24 and 48 hpi by qRT-PCR. Primer sets were designed for selected genes [Table 3.1], including DEGs in uninfected roots of three Fayette lines which were overlapped with early response genes induced by the SCN infection at 8 hpi (overlapped genes), other randomly selected early response genes which were non-overlapped with the DEGs identified in three Fayette lines (non-overlapped genes), candidate genes located in major SCN-resistance QTLs (Rhg1, Rhg4, cqSCN006, and cqSCN007), and reference genes. For statistical analysis, the differentially expressed genes were identified between

SCN-infected roots versus uninfected roots of each genotype using the limma R package with an 53 adjusted p-value cut-off set at 0.05. For overlapped genes, the log2 fold change values (SCN- infected / uninfected roots) estimated by qRT-PCR were closely correlated with the RNA-seq data

(RFayette#99 = 0.8661 and RWilliams 82= 0.6635), validating the pattern of expression observed in the

RNA-seq experiment [Figure 3.7A]. We attribute the differences to biological variations in the plant-nematode experimental system, the different error characteristics of RNA-seq and qPCR technology, and the different statistical analyses employed. The qRT-PCR results showed the statistically significant changes in expression levels of 36 out of 58 genes in response to SCN invasion in at least one genotype. The majority of genes were transiently induced at 8 hpi in susceptible Williams 82, but those remained up-regulated until 24 hpi or 48 hpi in resistant

Fayette#99, especially overlapped genes in cluster 4 [Figure 3.7C]. For example, four cytochrome

P450 genes (Glyma.09G049200, Glyma.11G062500, Glyma.11G062600, and

Glyma.15G156100), one endochitinase PR4 gene (Glyma.13G346700), one glycinol-4- dimethylallyltransferase (Glyma.01G134600), one gene encoding stress-induced protein SAM22

(Glyma.17G030400), one gene encoding ATP-binding cassette (ABC) transporter protein

(Glyma.15G012000), and one bHLH TF gene (Glyma.09G064200).

The temporal expression profiles were further investigated on non-overlapped genes that were altered during SCN-infection at 8 hpi in at least one of four genotypes described in the RNA- seq analysis in Chapter 2 [Figure 3.7B and 3.7D]. The high correlation between the qRT-PCR and

RNA-seq data was found in all four genotypes (RFayette#99 = 0.7272, RWilliams 82= 0.7025, RPeking =

0.8346, RG. soja = 0.7332). As expected, the genes were induced at 8 hpi and predominantly found in G. soja which had a wide range of gene expression changes in response to SCN infection, validating the early response genes identified in the RNA-seq data analysis in Chapter 2. The qRT-

PCR showed 38 differentially expressed genes in at least one genotype. Of these, 18 DEGs were

54 transiently induced at 8 hpi in both susceptible and resistant genotypes, for example, a receptor- like kinase (RLK) gene (Glyma.02G088700), a calmodulin gene (Glyma.03G157800), an ACC- oxidase gene (Glyma.02G268200), two dirigent-like genes (Glyma.01G127100 and

Glyma.03G044900), two peroxidase genes (Glyma.15G129200 and Glyma.20G169200), a beta- glucosidase gene (Glyma.11G129700). Moreover, some genes were continuously up-regulated in resistant lines at 8 hpi and later time points, for example, P450 gene CYP82A

(Glyma.01G135200), eugenol synthase gene (Glyma.04G131100), gene encoding subtilisin-like serine endopeptidase family protein (Glyma.10G119400), tryptophan aminotransferase-related gene (Glyma.16G162400), flavin-dependent monooxygenase gene (Glyma.17G046600), and gene encoding quinohemoprotein ethanol dehydrogenase (Glyma.18G256900). Interestingly, other genes were induced at two or three time points only in G. soja, including three peroxidase genes

(Glyma.09G277900, Glyma.09G284700, and Glyma.18G055600), two WRKY genes

(Glyma.07G023300 and Glyma.13G370100), one chitinase gene (Glyma.03G024900) and one

LRR-RLK (Glyma.16G064200). The statistically significant down-regulation of a gene encoding inositol transporter (Glyma.09G087400) was uniquely found in SCN-infected roots of susceptible

Williams 82 at 8 hpi and perfectly confirmed the RNA-seq experiment.

In addition, the expression of candidate genes located in major SCN-resistance QTLs

(Rhg1, Rhg4, cqSCN006, and cqSCN007) was measured during the SCN infection [Figure 3.7B and 3.7D]. The statistically significant changes in expression of most genes were not detected.

However, a p-Nitrophenyl phosphatase (Glyma.18G244900) was significantly down-regulated at

8 hpi in G. soja.

55

Discussion

Isogenic Fayette line is a tool to investigate SCN resistance conferred by CNV at the Rhg1 locus The gene expression study in SCN syncytium was previously performed on resistant and susceptible soybeans NILs differing at the Rhg1 locus developed from G. max cv. Evans and PI

209322 (Kandoth et al., 2011). However, we investigated impacts of CNV at the Rhg1 locus on genome-wide transcriptome changes in isogenic lines of Fayette cultivar which propagated by self- fertilization. Therefore, we report putative defense mechanisms involved in the SCN resistance conferred by CNV at the Rhg1 locus, focusing on an SCN-resistant cultivar in high-copy number

Rhg1 group derived from PI 88788, a major resistance source. Based on the efficient genotyping method developed for CNV detection at the Rhg1 locus using TaqMan assay (Lee et al., 2016), a wide distribution of copy number was identified in Fayette isolines. We selected three Fayette lines which represented three groups of copy numbers: Fayette#01 (low, 8-9 predicted copies at the

Rhg1 locus), Fayette#19 (mid, 9-10 predicted copies) and Fayette#99 (high, 11 predicted copies).

Using these isolines for this analysis could reduce noises contributed from variation in genetic background.

In this experiment, the RNA-seq analysis showed 465 genes differentially expressed in whole roots of copy-number variant Fayette lines, suggesting complex biological processes participate in the Rhg1-mediated SCN resistance. On the contrary, the RNA-seq and qRT-PCR results showed that the statistically significant changes in expression of three genes at the Rhg1 repeat were not detected in whole roots of uninfected Fayette lines at 15 days. Moreover, these genes were not significantly altered in whole roots of resistant Fayette#99 infected with SCN at 8,

24, and 48 hpi, suggesting transcript levels of Rhg1 genes in Fayette lines may differ in specific tissues and different stages of SCN invasion. Matsye et al. (2011) previously identified the expression of two Rhg1 genes, an amino acid transporter (Glyma.18G022400) and alpha-SNAP

56

(Glyma.18G022500), in the syncytia of SCN infected roots at three time points (3, 6, and 9 days post-inoculation) in both Peking and PI 88788. In addition, high expression of gene encoding dysfunctional Rhg1-resistance-type alpha-SNAPs, which were highly accumulated in the feeding cells, induced cytotoxicity to support the degeneration of the syncytium (Bayless et al., 2016). In protein levels, the coevolution of defective forms of Rhg1-resistance-type alpha-SNAPs and its partner proteins (N-ethylmaleimide sensitive factor, NSFRAN07) was found to be a key mechanism in vesicle trafficking to balance between plant viability and cytotoxicity for the Rhg1-mediated resistance (Bayless et al., 2018). The characterization of alpha-SNAPs indicated the expression of the Rhg1 genes may differ extensively and locally in the syncytia. The atypical Rhg1-resistance- type alpha-SNAP proteins provided both pros and cons for plants; thus, we speculate that the expression levels of Rhg1 genes in Fayette lines may be under control to maintain the balance in other tissues before the syncytium establishment stage or in uninfected roots.

Co-expression network of differentially expressed genes revealed key regulators and complex regulatory network involved in SCN resistance More than a half of the DEGs in the co-expression network were higher expressed or lower expressed in the low-copy-Rhg1 Fayette#01 than those in other Fayette lines, clustering in the two largest clusters (cluster 1 and cluster 2) and the smallest cluster (cluster 7). Majority of genes encoding transcription factors (TFs) was clustered in cluster 1 such as genes encoding TFs in C2H2 zinc finger, WRKY and AP2/ERF families. These TF families are classified in multiple over- represented GO terms, which are related to signal transduction, hormone signaling, and respiratory burst, suggesting that the CNV at the Rhg1 locus may confer SCN resistant by recruiting many

TFs for transcriptional regulation in several biological processes. Based on sequence homology, a total of four genes (Glyma.04G044900, Glyma.06G045400, Glyma.17G236200, and

Glyma.20G133200) in cluster 1 were homologous to SALT TOLERANCE ZINC FINGER

57

(STZ/ZAT10) in Arabidopsis. One of these (Glyma.17G236200) was a hub gene closely related to gene encoding soybean cold-inducible zinc finger (SCOF-1) protein (Kim et al., 2001). The functional characterization of SCOF-1 in soybean or ZAT10 in Arabidopsis suggested that these genes regulate plant defense responses to abiotic stresses (Kim et al., 2001; Mittler et al., 2006), enhance tolerance to photoinhibitory light and exogenous H2O2 (Rossel et al., 2007), and involve in early responsive suppression of jasmonate (JA) biosynthesis-related gene (Pauwels and

Goossens, 2008). In addition, Yuan et al. (2018) found this set of C2H2 zinc finger genes plays a role in legume-rhizobia symbiosis and could form a complex transcriptional regulatory network by interacting with genes encoding WRKYs and several protein kinases such as MAPK, cGMP- dependent protein kinase and 5’AMP-activated protein kinase. In this study, the qRT-PCR illustrated that couple C2H2 zinc finger genes (Glyma.04G044900 and Glyma.06G045400) and a

WRKY gene (Glyma.19G254800) were induced in SCN infected roots in resistant genotypes, suggesting that these genes perhaps control plant defense responses to biotic stress, including SCN infection. The interactions of the C2H2 genes with other TFs and genes in signaling pathways were predicted in the co-expression network, indicating a complex regulatory network involved in

Rhg1-mediated SCN-resistance.

Activation of different sets of R-genes may lead to diverse defense mechanisms involved in the Rhg1-mediated SCN-resistance As the largest group of resistance genes, NBS-LRR genes encoding intercellular immune receptors mediate effector-triggered immunity (ETI) in response to pathogen attack. In co- expression network, NBS-LRR genes were identified in several gene clusters (e.g., cluster 1, 2, 3 and 5) and also categorized into “defense response,” one of over-represented GO terms in the cluster 2. These included two hub genes and two bridging genes in the network, indicating the

SCN resistance conferred by the CNV at the Rhg1 locus may require NBS-LRR genes functioning

58 as main players in detecting pathogen and activating downstream defense mechanisms. Most of

NBS-LRR genes in plant genome are generally organized in genomic clusters. The differentially expressed NBS-LRR genes in these Fayette lines showed distinct expression patterns in each network cluster and high correlation with other NBS-LRR genes at proximity in complex NBS-

LRR gene loci (mostly on chromosome 3, 8, and 13). These genes were homologous to well- characterized resistance genes and located in QTLs of other diseases. For example, the NBS-LRR genes in cluster 5, which showed distinct expression levels in high-copy-Rhg1 Fayette from those in other lines, were closely related to RPG1-B resistance to Pseudomonas syringe in soybean

(Ashfield et al., 2004). The RPG1-B indirectly detects pathogen effectors by interacting with

RIN4-like proteins in soybean (Selote and Kachroo, 2010), indicating that these differentially expressed NBS-LRR genes in Fayette lines perhaps act as guards and recognize board-spectrum effectors. Based on the observations in uninfected Fayette lines, these sets of NBS-LRR genes were predicted to interact with different types of defense-related genes within and between network clusters. Thus, we hypothesize that alteration of these NBS-LRR genes in Fayette lines may lead to diverse pathogen recognition by plant and subsequent activation of downstream defense responses which can affect the levels of resistance to SCN or multiple biotic stresses. It is worth noting that few differentially expressed NBS-LRR genes found in uninfected Fayette lines were overlapped with early response genes activated at 8 hpi, implying these NBS-LRR genes might not directly involved in the SCN response at early stages of infection.

The continuous induction of phytoalexin biosynthesis-related genes at the early stages of infection probably enhances Rhg1-mediated SCN resistance Genes related to the oxidation-reduction process (e.g., P450 and peroxidase genes) were clustered in cluster 4, and their expression levels tended to be higher in high-copy-Rhg1 Fayette.

About half of SCN-responsive genes at the early infection stage were identified in this cluster.

59

Consistently, Guttikonda et al. (2010) examined expression profiles of P450 genes in soybean from

99 microarray libraries in public database and found several P450 genes were induced by SCN

(e.g., CYP736A28, CYP93E1, CYP82A4, CYP94C18, and CYP81E11). The time-course expression profiles validated that the induction of the overlapped genes in cluster 4 was found in

SCN-infected roots at the migration phase (8 hpi) and remained up-regulated until the syncytium establishment stage (48 hpi) in resistant high-copy-Rhg1 Fayette. On the contrary, these genes were induced at only 8 hpi in susceptible Williams 82, suggesting that this group of genes play an important role in early defense response in both resistant and susceptible reactions, but a stable induction perhaps enhances Rhg1-mediated SCN-resistance in resistant Fayette lines. Based on protein sequence similarity, their roles were predicted to be a part of the biosynthesis of secondary metabolite, especially pathways in phenylpropanoid biosynthesis. For example, peroxidase genes involved in the lignin biosynthesis, five P450 genes (e.g., two CYP81E, two CYP71D8, and one

CYP93A1) participated in flavonoid/isoflavonoid biosynthesis towards phytoalexins which were reported as insect-induced defense response gene in Medicago (Liu et al., 2003), elicitor early- responsive genes in soybean (Schopfer and Ebel, 1998), and defense response marker in soybean

(Kinzler et al., 2016), respectively. Assuming the continuous induction of these genes holds true for other Fayette lines with different expression levels correlated to the copy number, the results implied different rates of phytoalexin accumulation in SCN-infected roots of Fayette lines with

CNV at the Rhg1 locus, leading to variation in levels of SCN-resistance.

Ethylene is a crucial hormone to regulate early response genes in SCN resistance Ethylene plays a role in modulating root colonization of cyst nematodes (Wubben et al.,

2001; Bent et al., 2006), attractiveness of roots to SCN (Hu et al., 2017), and throughout much of the SCN life cycle in infected roots (Tucker et al., 2010). The AP2/ERF is another key TF family in an ethylene signaling pathway involved in plant-cyst nematode interactions such as RAP2.3 60

(Hermsmeier et al., 2000), GmEREBP1 (Mazarei et al., 2002) and RAP2.6 (Ali et al., 2013). In this study, one AP2/ERF gene (Glyma.02G016100; homolog to ERF71/HRE2 in Arabidopsis) was identified as a bridging gene between cluster 1 and cluster 2. The ERF71/HRE2 gene plays a role in low oxygen signaling in Arabidopsis (Licausi et al., 2010) and is regulated by ethylene, validated by the functional characterization of its closely related gene, ERF73/HRE1 (Hess et al., 2011). The role of ERF71 and ERF73 genes may regulate ROS homeostasis (Yao et al., 2017), suggesting a possible function of Glyma.02G016100 gene in response to SCN infection; however, further analysis of this gene is still needed. The enrichment analysis of TF binding motifs in the promoters of DEGs illustrated three TF families (ERFs, NAC, and HD-ZIP) tend to bind to a specific group of genes, such as ERF motifs were most frequently identified in the promoters of DEGs in cluster

4 compared to those in the other clusters, suggesting that the ERFs may preferentially regulate genes related to oxidation-reduction processes in the Rhg1-mediated resistance to SCN. In addition, there were seven other AP2/ERF genes and other genes related to ethylene biosynthesis and signaling identified in the co-expression network. The results suggested that ethylene may play an important role in the regulation of defense-related genes at the early stage of infection.

Conclusion

This study demonstrated a wide range of defense-related genes involved in the Rhg1- mediated resistance to SCN by performing RNA-seq analysis on isogenic lines of Fayette cultivar that carry 8 to 11 copies of a 31.2kb repeat segment at the Rhg1 locus. The predicted defense mechanisms included integration of several TF families in signaling pathways, a modulation of ethylene signaling, activation of R-genes in plant immunity and accumulation of secondary metabolites such as phytoalexins for chemical defense. We also provided evidence for the participation of genes related to the biosynthesis of secondary metabolites in the early response to

61

SCN infection in resistant reactions. The candidate genes will be further functionally characterized to confirm their roles in the Rhg1-mediated SCN resistance and can apply to improve SCN resistance in PI 88788-derived cultivars.

62

Figures and Table

Figure 3.1 Venn Diagram comparing between two sets of differentially expressed genes (DEGs). Red circle represents the DEGs identified in RNA-seq analysis of three Fayette lines (Fayette#01, Fayette#19, and Fayette#99) with copy number variation (CNV) at the Rhg1 locus. Blue circle represents the DEGs identified in RNA-seq analysis of four genotypes (Fayette#99, Peking, Glycine soja PI 468916, and Williams 82) in early response to SCN infection (described in Chapter 2). The rectangle represents candidate genes that were selected for qRT-PCR using Fluidigm Biomark HD system. Purple rectangle denotes overlapped genes identified in both sets of DEGs. Blue rectangle denotes non-overlapped genes that were differentially expressed in early response to SCN infection, but not differentially expressed in uninfected Fayette lines and genes located within Rhg1, Rhg4, cqSCN006, and cqSCN007.

63

A Co−eCxpressiono-expr netwessorkio ofn DEGsnetwork B

Glyma.06G302700 Glyma.18G025200 Glyma.15G223800 Glyma.11G232100 Glyma.02G149100 Glyma.12G027200 Glyma.10G016600 Glyma.09G281800 Glyma.19G188400 Glyma.09G048400 Glyma.17G081300 Glyma.15G155700 Glyma.06G156600 Glyma.01G214400 Glyma.11G070500 Glyma.11G227500

Glyma.03G188300

Glyma.10G282200 Glyma.03G185600 Glyma.20G196800 Glyma.07G087400 Glyma.09G049200 Glyma.01G232400 Glyma.13G363000 Glyma.01G130800 Glyma.01G211800 Glyma.03G168000 Glyma.19G014600 Glyma.04G196400 Glyma.09G269500 Glyma.06G226700 Glyma.04G230400 Glyma.02G062600 Glyma.03G038200 Glyma.02G307300Glyma.10G295300 Glyma.03G004300 Glyma.13G184900 Glyma.16G195600Glyma.13G218200 Glyma.18G000400 Glyma.02G298500 Glyma.03G038700Glyma.11G062600 Glyma.18G148700 Glyma.04G121700 Glyma.17G041200Glyma.16G146700 Glyma.15G156100 Glyma.03G143700Glyma.01G134600 Glyma.02G184200 Glyma.20G103600 Glyma.13G068800 Glyma.02G053600 Glyma.04G230500 Glyma.02G235700 Glyma.10G285400 Glyma.08G174900Glyma.11G051800 Glyma.11G228600 Glyma.17G245100 Glyma.19G151100 Glyma.04G058100 Glyma.13G346700Glyma.06G286700Glyma.03G147700 Glyma.09G205700 Glyma.11G062500 Glyma.02G249300 Glyma.01G085700 Glyma.15G012000 Glyma.16G008500 Glyma.15G038700 Glyma.13G067700 Glyma.20G099100 Glyma.13G188400 Glyma.13G246600 Glyma.19G076800 Glyma.03G249100 Glyma.16G043800 Glyma.15G036800 Glyma.09G064200 Glyma.13G190800 Glyma.13G190300 Glyma.17G030400 Glyma.13G190400 Glyma.15G026900 Glyma.13G234400

Glyma.15G077800 Glyma.13G193300 Glyma.11G225600 Glyma.02G024600 Glyma.07G238700 Glyma.15G034000 Glyma.11G090200

Glyma.15G057400 Glyma.02G012100 Glyma.19G229700 Glyma.12G213200 Glyma.03G039800 Glyma.02G020000 Glyma.03G205400 Glyma.20G137800 Glyma.10G159100 Glyma.01G018000 Glyma.15G021100 Glyma.08G177100 Glyma.20G156800 Glyma.02G028400 Glyma.10G093900 Glyma.08G355000 Glyma.02G023800 Glyma.03G052800 Glyma.02G012800 Glyma.09G131100 Glyma.09G130400 Glyma.03G253200 Glyma.14G002800 Glyma.16G026400 Glyma.19G254800 Glyma.04G044900 Glyma.10G160300 Glyma.02G028700 Glyma.16G033900 Glyma.10G230600Glyma.02G040700 Glyma.06G295400 Glyma.15G025300 Glyma.06G290000Glyma.05G211700Glyma.15G077100Glyma.18G260700 Glyma.03G046000 Glyma.02G048600 Glyma.09G061900Glyma.13G005100Glyma.12G217500Glyma.18G026900Glyma.05G154900 Glyma.08G177200 Glyma.02G016200 Glyma.04G100300Glyma.08G021900 Glyma.02G012700 Glyma.13G184500 Glyma.11G181200Glyma.12G002400Glyma.14G016300Glyma.13G095100Glyma.12G117000Glyma.12G130200Glyma.12G221200Glyma.13G065300Glyma.04G228800 Glyma.12G214100Glyma.16G178500 Glyma.14G212200Glyma.04G092100Glyma.11G180500Glyma.04G228900Glyma.18G029300 Glyma.06G260100 Glyma.04G136200 Glyma.05G047100 Glyma.03G044300 Glyma.15G028500 Glyma.05G064200Glyma.06G045400 Glyma.13G195900 Glyma.15G019600 Glyma.02G018300 Glyma.12G197300 Glyma.06G259400Glyma.14G041700Glyma.17G236200Glyma.05G175000Glyma.02G255400Glyma.06G099700Glyma.13G314100Glyma.03G015800 Glyma.02G016100 Glyma.06G102000Glyma.12G236500Glyma.05G215900Glyma.20G133200 Glyma.03G099400 Glyma.07G215500 Glyma.12G186800 Glyma.15G109500 Glyma.02G049300 Glyma.17G128900Glyma.06G287000Glyma.06G144000 Glyma.02G242900Glyma.03G029600 Glyma.02G015700 Glyma.11G228100 Glyma.12G093100Glyma.13G054300 Glyma.03G039500Glyma.03G102900 Glyma.12G119200 Glyma.06G135900Glyma.02G296600 Glyma.12G060700 Glyma.04G020700 Glyma.19G016100 Glyma.01G067100Glyma.05G187200 Glyma.01G114400Glyma.08G106900 Glyma.04G105900Glyma.15G026500Glyma.03G044000 Glyma.11G194600 Glyma.20G070000 Glyma.07G063300 Glyma.02G024800Glyma.13G194100 Glyma.03G063400 Glyma.16G136200 Glyma.13G194900 Glyma.10G159800 Glyma.15G019400 Glyma.14G199600 Glyma.03G188900 Glyma.03G048500Glyma.15G026000Glyma.13G196100 Glyma.20G169000 Glyma.10G204400 Glyma.02G125900 Glyma.07G094000 Glyma.10G160200Glyma.03G100500 Glyma.13G200400 Glyma.01G231000 Glyma.03G043000Glyma.03G073300 Glyma.15G022300 Glyma.02G024900 Glyma.03G043500Glyma.02G028100Glyma.10G158600Glyma.10G105900Glyma.13G186200 Glyma.04G190400 Glyma.10G109500 Glyma.13G302500 Glyma.12G141700 Glyma.02G028800Glyma.03G054100Glyma.03G111400Glyma.13G194800Glyma.10G094900 Glyma.10G101200Glyma.13G208800 Glyma.13G204100 Glyma.02G229600 Glyma.13G184800Glyma.03G043600Glyma.02G060300Glyma.03G045600 Glyma.03G038100 Glyma.18G005300 Glyma.02G054800Glyma.13G179900 Glyma.03G191300 Glyma.12G022700 Glyma.02G059900 Glyma.02G037500Glyma.13G173900Glyma.02G021600Glyma.10G124800 Glyma.03G115100 Glyma.03G188500 Glyma.19G226800 Glyma.13G201500 Glyma.12G187400 Glyma.17G253000 Glyma.08G143400 Glyma.03G047000 Glyma.03G045300 Glyma.13G195100 Glyma.03G189900 Glyma.14G176400 Glyma.17G251400 Glyma.15G019300 Glyma.05G070600 Glyma.02G033900 Glyma.09G102800 Glyma.02G032800 Glyma.15G274600 Glyma.12G135000 Glyma.02G037300 Glyma.13G188300 Glyma.06G269700 Glyma.02G018700 Glyma.06G194600 Glyma.01G190600 Glyma.13G178100 Glyma.20G044100 Glyma.19G202900 Glyma.03G185100 Glyma.17G111100 Glyma.05G055700 Glyma.02G030200 Glyma.10G108500 Glyma.14G087200 Glyma.12G129500 1 2 3 4 5 6 7 Glyma.18G300300 Glyma.02G231600 Glyma.09G240700

Glyma.11G233400 Glyma.06G261000 Glyma.06G050900 Glyma.16G189300 Glyma.17G140300 Glyma.13G144300 Glyma.08G323200Glyma.08G333900 Glyma.02G132100 Glyma.16G009000 Glyma.10G255000 Glyma.05G130500 Glyma.18G190900 Glyma.08G338600 Cluster ID Glyma.07G212800 Glyma.02G303300 Glyma.12G141400 Glyma.08G319300 Glyma.08G338500

Glyma.14G167800 Glyma.15G054500 Glyma.12G147700 Glyma.02G311900 Glyma.15G216700 Glyma.14G136300 Glyma.08G305500 Glyma.16G070000 Glyma.08G331800 Glyma.08G230000 Glyma.08G317800Glyma.08G317400Glyma.12G139800 Glyma.10G059700 Glyma.08G314400 Glyma.08G334100 Glyma.08G338700 Glyma.12G136900Glyma.U041300 Glyma.08G336200 Glyma.04G190900 Glyma.08G321800 Glyma.12G142100 Glyma.13G076900 Glyma.08G318000 Glyma.08G339000Glyma.08G338900 Glyma.18G275600 Early response genes Glyma.07G259000 Glyma.02G272500 Glyma.11G236700 Glyma.18G191900 Glyma.04G213900 Glyma.11G036300 Glyma.08G317700 Glyma.08G312300 Glyma.08G323400 identified in Miraeiz et al. (submitted) Glyma.15G044400 Glyma.07G097100

Glyma.08G325100 Transcription factor genes

Figure 3.2 Co-expression network of differentially expressed genes (DEGs) and gene clusters.

A) Co-expression network was constructed based on strongly co-expressed DEGs with the Person correlation coefficient cut-off set at ± 0.9. Colors represent seven gene clusters (green, orange, blue, pink, light green, yellow, and ivory) and DEGs overlapped with early response genes (purple) described in the RNA-seq analysis in Chapter 2 (Miraeiz et al., submitted). The circle node represents protein encoding genes whereas triangle represents a subset of those genes encoding transcript factors.

B) Bar chart showing number of DEGs and early response genes in each cluster identified based on edge betweenness using Girvan-Newman algorithm.

64

Color Key Color Key Cluster2 Cluster4

−2 −1 0 1 2 −2 −1 0 1 2 Value Cluster 1 Value Cluster 4 Glyma.04G190400Glyma.12G187400 Glyma.06G286700Glyma.10G159100 Glyma.13G065300Glyma.20G169000 Glyma.13G363000 Glyma.12G214100Glyma.12G119200 Glyma.15G036800 Glyma.14G016300Glyma.18G260700 Glyma.19G076800Glyma.09G281800 Glyma.14G002800Glyma.15G077100 Glyma.15G012000Top enriched GO term: Glyma.09G130400Glyma.14G041700Top enriched GO term: Glyma.13G234400 Glyma.16G026400Glyma.10G230600 Glyma.04G121700Glyma.04G230400 Glyma.08G177200Glyma.03G253200 Glyma.15G156100 Glyma.13G302500Glyma.02G125900 Glyma.09G049200 Glyma.13G314100Glyma.02G012700 Glyma.18G148700 Glyma.20G137800Glyma.13G005100 Glyma.01G211800 Glyma.12G117000Glyma.08G177100 Glyma.19G151100Glyma.16G008500MF - Heme binding Glyma.12G002400Glyma.06G290000BP - Signal transduction Glyma.03G147700 Glyma.19G254800Glyma.01G067100 Glyma.08G174900 Glyma.09G131100Glyma.16G178500 Glyma.04G196400Glyma.16G195600 Glyma.02G040700Glyma.05G211700 Glyma.03G038200 Glyma.04G044900Glyma.16G033900 Glyma.03G038700 Glyma.18G026900Glyma.06G295400 Glyma.17G041200Glyma.13G218200 Glyma.11G181200Glyma.09G061900 Glyma.02G053600 Glyma.02G023800Glyma.12G130200BP - Ethylene signaling Glyma.16G146700 Glyma.14G199600Glyma.02G016100 Glyma.04G230500Glyma.04G058100 Glyma.05G047100Glyma.16G136200 Glyma.07G087400 Glyma.02G242900Glyma.14G212200 Glyma.03G143700 Glyma.04G020700Glyma.17G128900 Glyma.01G134600 Glyma.05G175000Glyma.06G260100 Glyma.11G062500 Glyma.07G215500 Glyma.11G062600Glyma.11G0518009 P450 genes Glyma.08G021900Glyma.12G236500BP - Respiratory burst Glyma.05G215900 Glyma.10G295300 Glyma.05G154900Glyma.06G259400 Glyma.09G269500 Glyma.13G054300Glyma.06G045400 Glyma.03G185600Glyma.06G156600 Glyma.03G015800Glyma.06G144000 Glyma.01G130800 Glyma.04G136200Glyma.12G186800 Glyma.02G298500 Glyma.04G228900Glyma.06G099700 Glyma.11G070500Glyma.19G014600 Glyma.02G296600Glyma.04G228800 Glyma.02G3073003 peroxidase genes Glyma.17G236200Glyma.12G060700 Glyma.16G043800 Glyma.12G093100Glyma.04G092100 Glyma.10G282200 Glyma.18G029300Glyma.11G180500 Glyma.06G226700Glyma.03G168000 Glyma.15G109500Glyma.11G228100 Glyma.20G099100 Glyma.08G106900Glyma.19G016100 Glyma.17G245100 Glyma.02G255400Glyma.20G133200 Glyma.02G249300 Glyma.20G070000Glyma.06G135900 Glyma.10G285400Glyma.09G2057004 ABC transporter Glyma.13G195900Glyma.03G02960020 TF genes: C2H2, AP2/ERFs, WRKYs Glyma.18G000400 Glyma.11G194600Glyma.10G204400 Glyma.20G103600 Color Key Glyma.06G287000Glyma.01G231000 Glyma.17G030400Glyma.02G024600 Glyma.06G102000Glyma.05G064200 Glyma.15G026900 Glyma.12G217500Glyma.04G100300 Glyma.13G246600 Glyma.03G063400Glyma.12G221200 Glyma.13G068800 Glyma.13G095100Glyma.01G114400 Glyma.09G064200Glyma.13G346700 Cluster7 Glyma.08G355000Glyma.05G1872005 NBS-LRR genes Glyma.02G184200 Glyma.14G176400Glyma.20G156800 Glyma.07G238700Glyma.03G249100 Glyma.12G022700Glyma.19G226800 Glyma.15G057400 Glyma.03G205400Glyma.06G194600

Color Key

1

1

1

1

9

9

9

9

9

9

9

9

1

1

1

1

9

9

9

9

9

9 9

9 4 RLK genes Cluster3

0

0

0

0

1

1

1

1

9

9

9

9

0

0

0

0

1

1

1

1

9

9

9

9

e

e

e

e

e

e

e

e

e

e

e

e

t

t

t

t

t

t

t

t

t

t

t

t

e

e

e

e

e

e

e

e

e

e

e

e

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t t

−2 −1 0 1 2 t

t

t

t

t

t

t

t

t

t

t

t

t

e

e

e

e

e

e

e

e

e

e e

−2 −1 0 1 2 e

e

e

e

e

e

e

e

e

e

e

e

e

y

y

y

y

y

y

y

y

y

y y

5 y

y

y

y

y

y

y

y

y

y

y y

Value y Value

a

a

a

a

a

a

a

a

a

a a

Cluster a

a

a

a

a

a

a

a

a

a

a a

2 a

F

F

F

F

F

F

F

F

F

F F

Cluster F

F

F

F

F

F

F

F

F

F

F F F Glyma.20G196800 Glyma.18G005300 Glyma.13G184900 Glyma.03G115100Glyma.02G028800 Glyma.03G004300 Glyma.17G111100Glyma.12G135000 Glyma.15G038700 Glyma.12G129500Glyma.02G018700 Glyma.03G039800 Glyma.13G204100Glyma.06G269700 Glyma.11G227500 Glyma.02G030200Glyma.08G143400 Glyma.10G160200Glyma.11G233400Top enriched GO term: Glyma.01G232400 Glyma.03G048500 Glyma.02G062600 Glyma.03G054100Glyma.03G100500 Glyma.15G034000 Glyma.13G186200Glyma.13G194900 Glyma.13G194100Glyma.13G196100 Glyma.13G190800 Glyma.03G043000Glyma.03G044000 Glyma.13G190300 Glyma.02G024800Glyma.03G043500 Glyma.13G193300 Glyma.07G063300Glyma.15G026500BP - Defense response Glyma.13G190400 Glyma.12G141700Glyma.15G028500 Glyma.13G1884005 NBS-LRR genes (Chr. 13) Glyma.13G184800Glyma.07G094000 Glyma.13G067700 Glyma.13G200400Glyma.02G028100 Glyma.02G229600Glyma.15G019400 Glyma.11G225600 Glyma.03G038100Glyma.03G045600 Glyma.01G085700 Glyma.02G020000Glyma.10G093900 Glyma.15G077800 Glyma.15G021100Glyma.19G229700BP - SAR Glyma.06G302700 Glyma.03G111400Glyma.03G052800 Glyma.11G228600 Glyma.10G094900Glyma.13G194800 Glyma.02G012100Glyma.03G191300 Glyma.02G235700

Glyma.02G048600Glyma.03G039500

1

1

1

1

9

9

9

9

9

9 9

Glyma.04G105900 9

0

0

0

0

1

1

1

1

9

9 9

Glyma.03G099400 9

e

e

e

e

e

e

e

e

e

e e

Glyma.03G044300 e

t

t

t

t

t

t

t

t

t

t t

Glyma.02G012800 t

t

t

t

t

t

t

t

t

t

t t

Glyma.10G160300 t

e

e

e

e

e

e

e

e

e

e e

Glyma.02G024900 e

y

y

y

y

y

y

y

y

y

y y

Glyma.03G046000Glyma.02G015700 y

a

a

a

a

a

a

a

a

a

a a

Glyma.03G188900 a

F

F

F

F

F

F

F

F

F

F F Glyma.20G044100Glyma.03G189900 F Glyma.02G059900Glyma.15G019600 Color Key Glyma.03G102900Glyma.10G109500 Glyma.15G025300Glyma.13G184500 Cluster6 Glyma.02G033900Glyma.02G01830016 NBS-LRR genes (Chr. 3 & 13) Glyma.02G049300Glyma.11G090200 Glyma.02G016200Glyma.02G028700 6 Glyma.13G179900 −2 −1 0 1 2 Cluster Glyma.10G124800 Glyma.15G026000 Value Top enriched GO term: Glyma.10G101200 Glyma.10G159800Glyma.03G188500 Glyma.13G195100Glyma.03G073300 Color Key Glyma.10G105900Glyma.02G0548003 Dirigent-like genes Glyma.17G251400 Glyma.10G158600Glyma.13G208800 Glyma.01G214400 Glyma.02G060300Glyma.02G021600 Glyma.02G028400Glyma.15G022300 Glyma.17G081300BP - Trichoblast differentiation Glyma.05G070600Glyma.03G185100 Glyma.12G213200 Glyma.13G178100Glyma.02G037500 Glyma.11G232100 Cluster1 Glyma.03G045300Glyma.03G043600 Glyma.15G019300Glyma.03G0470002 Lipoxygenase genes Glyma.10G016600 Glyma.13G201500Glyma.13G173900 Glyma.12G027200 Glyma.02G032800Glyma.17G253000 Glyma.09G048400 Glyma.15G223800

Glyma.18G025200

1

1

1

1

9

9

9

9

9

9 9

9 Glyma.15G1557005 Pollen Ole e1 allergen and

0

0

0

0

1

1

1

1

9

9 9

9 Glyma.03G188300

e

e

e

e

e

e

e

e

e

e e

−2 −1 0 1 2 e Glyma.02G149100

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t t

t Glyma.19G188400

e

e

e

e

e

e

e

e

e

e

e

e

1

1

1

1

9

9

9

9

9

9 9

Value 9

y

y

y

y

y

y

y

y

y

y

y

y

0

0

0

0

1

1

1

1

9

9

9

9

e

e

e

e

e

e

e

e

e

e e

3 e

t

t

t

t

t

t

t

t

t

t t

t extensin family protein genes

t

t

t

t

t

t

t

t

t

t t

Cluster t

a

a

a

a

a

a

a

a

a

a

a

a

e

e

e

e

e

e

e

e

e

e

e

e

y

y

y

y

y

y

y

y

y

y

y

y

a

a

a

a

a

a

a

a

a

a

a

a

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F

F F Glyma.08G331800 Glyma.08G314400Glyma.08G319300 Glyma.12G139800Glyma.12G136900 Glyma.08G305500 Color Key Glyma.15G054500 Cluster57 Glyma.04G213900Glyma.14G136300 Glyma.11G036300 −1 0 1 Glyma.10G108500 Value Cluster Glyma.12G142100Glyma.08G339000 Glyma.14G167800Top enriched GO term: Glyma.15G274600 Glyma.08G334100Glyma.08G338900 Glyma.08G338600 Glyma.09G102800 Glyma.08G323200Glyma.08G338500 Glyma.10G255000 Glyma.19G2029002 Multi-antimicrobial extrusion Glyma.16G009000 Glyma.02G132100 Glyma.01G190600 Glyma.02G303300Glyma.12G141400BP - Xenobiotic metabolic process

Glyma.08G338700Glyma.07G212800 Glyma.02G037300

1

1

1

1

9

9

9

9

9

9

9

9

0

0

0

0

1

1

1

1

9

9 9

Glyma.06G050900 9

e

e

e

e

e

e

e

e

e

e

e

e

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t t

Glyma.11G236700 t

e

e

e

e

e

e

e

e

e

e

e

e

y

y

y

y

y

y

y

y

y

y

y

y

a

a

a

a

a

a

a

a

a

a a

Glyma.01G018000 a (MATE) genes

F

F

F

F

F

F

F

F

F

F F Glyma.12G197300 F

Glyma.04G190900Glyma.18G275600

1

1

1

1

9

9

9

9

9

9 9 Glyma.18G300300Glyma.13G144300 9

Glyma.16G070000Glyma.08G317800

0

0

0

0

1

1

1

1

9

9 9 Glyma.09G240700Glyma.05G055700 9

Glyma.06G261000Glyma.16G189300

#

#

#

#

#

#

#

#

#

# # Glyma.08G333900Glyma.13G1883005 Glycosyltransferase genes #

Glyma.08G312300Glyma.18G190900

e

e

e

e

e

e

e

e

e

e e

Glyma.08G317700Glyma.07G259000 e

t

t

t

t

t

t

t

t

t

t t

Glyma.08G321800Glyma.15G044400 t

t

t

t

t

t

t

t

t

t

t t Glyma.U041300 t

Glyma.08G318000Glyma.08G3362007 NBS-LRR genes (Chr. 8)

e

e

e

e

e

e

e

e

e

e e Glyma.12G147700Glyma.08G317400 e

Glyma.17G140300Glyma.08G323400

y

y

y

y

y

y

y

y

y

y y Glyma.08G325100Glyma.07G097100 y

Glyma.14G087200Glyma.05G130500

a

a

a

a

a

a

a

a

a

a a Glyma.13G076900Glyma.02G2316003 RLK genes a

Glyma.08G230000Glyma.15G216700

F

F

F

F

F

F

F

F

F

F F Glyma.02G311900Glyma.10G059700 F

Glyma.18G191900Glyma.02G272500

1

1

1

1

9

9

9

9

9

9

9

9

1

1

1

1

9

9

9

9

9

9

9

9

0

0

0

0

1

1

1

1

9

9

9

9

0

0

0

0

1

1

1

1

9

9

9

9

e

e

e

e

e

e

e

e

e

e

e

e

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

#

#

#

#

#

#

#

#

#

#

#

#

e

e

e

e

e

e

e

e

e

e

e

e

y

y

y

y

y

y

y

y

y

y

y

y

a

a

a

a

a

a

a

a

a

a

a

a

e

e

e

e

e

e

e

e

e

e

e

e

F

F

F

F

F

F

F

F

F

F

F

F

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

t

e

e

e

e

e

e

e

e

e

e

e

e

y

y

y

y

y

y

y

y

y

y

y

y

a

a

a

a

a

a

a

a

a

a

a

a

F

F

F

F

F

F

F

F

F

F

F F

Figure 3.3 Heatmap showing gene expression profiles and over-represented GO terms in each cluster. The cluster IDs were represented by the circle on top of each heatmap with number and color corresponding to cluster 1 to cluster 7. The colors in heatmap represent normalized expression levels in each samples. BP stands for enriched GO terms in biological process category. MF stands for enriched GO terms in molecular function category.

65

Figure 3.4 Hub genes in each cluster and top 10 genes ranked by the betweenness centrality.

A) The co-expression network showing hub genes (blue) and top 10 genes (bridging genes) ranked by betweenness centrality (red).

B) The list of hub genes identified based on the highest degree of connectivity in each cluster. The colors represents corresponding cluster of each hub gene.

C) The list top 10 genes (bridging genes) ranked by betweenness centrality. The colors represent clusters in the network that the genes link them together.

66

TF Motif ID Gene ID Method Data source Data source ID Motif family

MP00540 Glyma.04G026300 BBR-BPC DAP PMID:27203113 (O’Malley et al., 2016) AT5G42520

MP00253 Glyma.05G137100 BBR-BPC DAP PMID:27203113 (O’Malley et al., 2016) AT2G01930

MP00126 Glyma.13G345200 bZIP DAP PMID:27203113 (O’Malley et al., 2016) AT1G06070

MP00229 Glyma.02G293300 C2H2 DAP PMID:27203113 (O’Malley et al., 2016) AT1G72050

MP00118 Glyma.10G257900 C2H2 ampDAP PMID:27203113 (O’Malley et al., 2016) AT5G67450

MP00043 Glyma.15G053600 CAMTA PBM PMID:25215497 (Weirauch et al., 2014) M0582_1.02

MP00623 Glyma.12G117000 ERF PBM PMID:25215497 (Weirauch et al., 2014) M0004_1.02

MP00311 Glyma.03G111700 ERF DAP PMID:27203113 (O’Malley et al., 2016) AT2G44840

Glyma.13G149500 MP00486 HD-ZIP DAP PMID:27203113 (O’Malley et al., 2016) AT5G03790 Glyma.10G064700

MP00323 Glyma.13G169900 HD-ZIP DAP PMID:27203113 (O’Malley et al., 2016) AT3G01470

MP00322 Glyma.18G258400 HD-ZIP DAP PMID:27203113 (O’Malley et al., 2016) AT3G01220

MP00320 Glyma.16G021000 HD-ZIP DAP PMID:27203113 (O’Malley et al., 2016) AT2G46680

MP00225 Glyma.01G044600 HD-ZIP DAP PMID:27203113 (O’Malley et al., 2016) AT1G69780

MP00510 Glyma.16G063700 MYB DAP PMID:27203113 (O’Malley et al., 2016) AT5G14340

MP00479 Glyma.11G107100 MYB DAP PMID:27203113 (O’Malley et al., 2016) AT4G38620

MP00438 Glyma.09G206200 MYB DAP PMID:27203113 (O’Malley et al., 2016) AT4G17785

MP00337 Glyma.U027500 MYB DAP PMID:27203113 (O’Malley et al., 2016) AT3G09230

MP00259 Glyma.02G019300 MYB DAP PMID:27203113 (O’Malley et al., 2016) AT2G02820

MP00612 Glyma.20G192500 NAC ChIP-seq SRA:SRX669382 (Zhang et al., 2015) SRX669382

MP00439 Glyma.17G154100 NAC DAP PMID:27203113 (O’Malley et al., 2016) AT4G17980

MP00363 Glyma.09G167400 NAC DAP PMID:27203113 (O’Malley et al., 2016) AT3G17730

MP00343 Glyma.20G192300 NAC DAP PMID:27203113 (O’Malley et al., 2016) AT3G10500

MP00333 Glyma.16G043200 NAC DAP PMID:27203113 (O’Malley et al., 2016) AT3G04070

MP00179 Glyma.20G172100 NAC DAP PMID:27203113 (O’Malley et al., 2016) AT1G34190

MP00004 Glyma.11G247300 Trihelix PBM PMID:25215497 (Weirauch et al., 2014) M1287_1.02

MP00618 Glyma.03G109100 WRKY PBM PMID:24477691 (Franco-Zorrilla et al., 2014) WRKY12

MP00539 Glyma.18G238600 WRKY DAP PMID:27203113 (O’Malley et al., 2016) AT5G41570

MP00531 Glyma.04G054200 WRKY DAP PMID:27203113 (O’Malley et al., 2016) AT5G26170

MP00525 Glyma.04G238300 WRKY DAP PMID:27203113 (O’Malley et al., 2016) AT5G24110

MP00408 Glyma.06G142000 WRKY DAP PMID:27203113 (O’Malley et al., 2016) AT3G56400

MP00094 Glyma.02G306300 WRKY SELEX PMID:26531826 (Mathelier et al., 2016) MA0589.1

Figure 3.5 List of 31 over-represented transcription factor (TF) motifs identified in promoters of differentially expressed genes in Fayette lines. The motifs and their metadata were retrieved from PlantTFDB v.4.

67

Figure 3.6 Enrichment analysis of transcription factor (TF) motifs and number of unique genes harboring over-represented motifs in their promoters.

A) The 31 over-represented motifs were identified in promoters of differentially expressed genes (DEGs) in this study. The enrichment analyses were performed based on gene clusters by comparing motifs found in promoter sequences of DEGs in each cluster, compared to those found in the promoters of all genes in soybean genome as background. The motifs were classified into ten TF families. The size of circle represents number of genes harboring each motif. The colors represent levels of adjusted p-values. The significantly enriched GO terms identified using Fisher’s exact test with adjusted p-values (p < 0.01). B) Bar charts show number of unique genes harboring over-represented motifs in their promoters. The motifs were classified into ten TF families: BBR-BPC, bZIP, C2H2, CAMTA, ERF, HD-ZIP, MYB, NAC, Trihelix, and WRKY. The colors represent genes in each cluster (cluster 1 to cluster 7) and purple color represents early response genes that were differentially expressed in early response to SCN infection at 8 hours post inoculation (hpi) based on our previous RNA-seq analysis described in Chapter 2 (Miraeiz et al., submitted) and differentially expressed in uninfected Fayette lines.

68

A B

Correlation between qR T−PCR and RNA−Seq (58 overlapped genes) Correlation between qRT−PCR and RNA−Seq (non−overlapped genes)

q

q

e

e

S

S

0

8

1

A

A

N

N R

R = 0.8661 R

Fayette#99

RG. soja = 0.7332

y

y

b

b

p−value < 2.2e−16 p−value = 7.48e −14

)

)

6

s

s

t

t o

o RPeking = 0.8346 o

R = 0.6635 o r

Williams 82 r

p−value < 2.2e −16

d

d e

p−value = 1.857e−08 e

t

t

4

5

c

c

e

e

f

f

n

n

i

i

n

n

u

u

/

/

2

d

d

e

e

t

t

c

c

e

e

f

f

n

n

i

i

0

0

N

N C

C RFayette#99 = 0.7274

S

S (

( Genotypes

p−value = 1.459e −13

e

e

2 g

g G. soja

− n

n RWilliams 82 = 0.7025 a

Genotypes a h

h p−value = 2.155e −12 Peking

c

c

● Fayette#99 ● Fayette#99 d

d log2 fold change of non−overlapped genes

l

l

o

o

4

5 f

Williams 82 f Williams 82

− 2

2 Fayette#99 Peking Soja Williams 82

g

g o

Glyma.11G228100o . *

l l −4 −2 0 2 4 6 8 Glyma.08G317400 −5 0 5 . 10 15 UC Glyma.02G016200 log2** fold change of non −o verlapped genes log2 fold change (SCN−infected/uninfected roots) by qRT−PCR Glyma.20G169200 ** log2 fold change*** (SCN −inf ected/uninf*** ected roots) by qR* T−PCR Glyma.19G245400 Fayette#99 Peking *** Soja Williams 82 Glyma.18G256900 *** *** * . * *** *** ** * Glyma.11G228100 . * Glyma.16G162400 *** *** * *** ** Glyma.08G317400 . UC Glyma.15G129200 *** . ** *** ** Glyma.02G016200 ** Glyma.11G129700 . ** * * Glyma.10G119400 *** * ** *** * Glyma.20G169200 log2** fold change of non*** −o verlapped genes*** * Glyma.08G091400 * * ** ** 8hpiDEGs−Fayette Glyma.19G245400 *** C Glyma.04G131100D **Fayette#99* ** Peking *** Soja Williams 82 Overlapped genes Glyma.18G256900 *** *** * . * *** *** ** * log2 fold change of overlapped genes Glyma.04G126300 N on -o v erla p pe d g e ne s Glyma.16G162400Glyma.11G228100 log2*** fold*** change* of non . −o verlapped genes**** ** Glyma.03G044900 ** . *** *** ** Fayette#99 Williams 82 Glyma.15G129200Glyma.08G317400 *** . ** *** . ** UC Glyma.03G024900 * ** Glyma.11G129700Glyma.02G016200 **. Fayette#99 ** Peking * Soja * Williams 82 Glyma.19G254800 * * . Glyma.02G268200 * * *** * Glyma.10G119400 *** * ** *** * Glyma.11G228100Glyma.02G088700 ** ***. **** ** Glyma.18G026900 ** * Glyma.20G169200Glyma.08G091400 ** * **** ***** *** 8hpiDEGs−Fayette Glyma.08G317400 . UC Glyma.14G212200 * * . * Glyma.01G135200Glyma.19G245400Glyma.04G131100 ** * . ** *** * Glyma.02G016200 ** Glyma.13G065300 . Glyma.18G256900Glyma.04G126300 *** *** * . * *** *** ** * Glyma.20G248900 Glyma.12G093100 Glyma.16G162400Glyma.03G044900 ***** ***. * *** *** ** ** Glyma.20G169200Glyma.20G001400 **. **** ***** . *. Glyma.15G129200Glyma.03G024900 *** . ** *** * ** ** Glyma.11G181200 . . Glyma.19G245400Glyma.19G162200 . * *** * Glyma.11G129700Glyma.02G268200 *. *** **** * Glyma.10G230600 * ** . Glyma.18G256900Glyma.18G055600 *** **** * . * *** ***** ** * Glyma.10G119400Glyma.02G088700 ***** * ***** *** *** DEGs Glyma.09G130400 * . * Glyma.16G162400Glyma.17G053600 *** *** * *** ** Glyma.08G091400Glyma.01G135200 ** * . *** ** *** 8hpiDEGs−Fayette Glyma.15G129200Glyma.17G046600 **** *. * ***** *** * . *** found Glyma.08G021900 . 1 Glyma.04G131100 ** * ** *** 1 Glyma.11G129700Glyma.17G030300 . ** **** * * in Glyma.06G295400 ** . Glyma.20G248900Glyma.04G126300 Glyma.10G119400Glyma.17G030000 ***. * **. *** *** Glyma.06G045400 * . Glyma.20G001400Glyma.03G044900 **. . **** ***** . **. Fayette Glyma.08G091400Glyma.16G064200 * * ** ** * ** 8hpiDEGs−Fayette Glyma.05G215900 . Glyma.19G162200Glyma.03G024900 . * * ** * Glyma.04G131100Glyma.15G142400 ** * ** *** Glyma.04G044900 Glyma.18G055600Glyma.02G268200 * * *. *** ** * Glyma.04G126300Glyma.15G079100 . * *** Glyma.17G053600Glyma.02G088700 ** *** *** ** Glyma.03G253200 Glyma.03G044900Glyma.14G049000 **. . *** ***** ** Glyma.17G046600Glyma.01G135200 *** * *. ***** *** * . * Glyma.02G242900 * * . Glyma.03G024900Glyma.13G370100 * ** * ** * Glyma.17G030300 *** * Glyma.02G023800 Glyma.02G268200Glyma.13G302400 * * *** * * Glyma.20G248900Glyma.17G030000 . . *** ** Glyma.02G088700Glyma.12G110100 ** ***. ***** *** Glyma.01G114400 Glyma.20G001400Glyma.16G064200 . * ** ** *. . Glyma.01G135200Glyma.11G116500 *** * . ** * 8hpiDEGs−nonFayette Glyma.19G162200Glyma.15G142400 . * * Glyma.15G026500 Glyma.09G284700Glyma.18G055600 * *. **** *** * 2 Glyma.15G079100Glyma.20G248900 . * *** Glyma.02G028100 2 Glyma.09G277900Glyma.17G053600 * ** *** . Glyma.14G049000Glyma.20G001400 . **** ** . . Glyma.09G198900Glyma.17G046600 * * * *** *** * . * Glyma.13G370100Glyma.19G162200 * . * ** * * Glyma.17G140300 Glyma.09G172100 * * ** . . s Glyma.17G030300 *** * Glyma.13G302400Glyma.18G055600 * . *** ** Glyma.16G070000 . e Glyma.09G087400 ** Glyma.17G030000Glyma.12G110100 . . ***** *** n Glyma.17G053600 Glyma.14G136300 Glyma.07G186100 . * 3 e Glyma.16G064200 ** * 3 Glyma.11G116500Glyma.17G046600 * * * *** *** * . * 8hpiDEGs−nonFayette Glyma.07G212800 * ** . ** G Glyma.07G023300Glyma.15G142400 . ** * Glyma.09G284700Glyma.17G030300 * **** * * Glyma.06G303100Glyma.15G079100 . **** *** Glyma.04G213900 . Glyma.09G277900Glyma.17G030000 . * . ***** *** . ** Glyma.03G157800Glyma.14G049000 . ***** ** ** Glyma.09G198900Glyma.16G064200 * *** *** * Glyma.19G151100 * * . * Glyma.03G042700Glyma.13G370100 * ** ** * * s Glyma.09G172100Glyma.15G142400 * * ** . . DEGs

Glyma.19G076800 . e Glyma.03G024600Glyma.13G302400 * * Glyma.09G087400Glyma.15G079100 . * *** ** Glyma.18G148700 ** * . n Glyma.02G064300Glyma.12G110100 . ** * found e Glyma.07G186100Glyma.14G049000 . *** ** . s Glyma.02G029400 ** e Glyma.11G116500 * 8hpiDEGs−nonFayette Glyma.17G245100 * ** * G Glyma.07G023300Glyma.13G370100 * . ** * * * in n Glyma.01G128100 ** .

e Glyma.09G284700 * * * * Glyma.17G030400 * * Glyma.06G303100Glyma.13G302400 *** *** non- G Glyma.01G127100Glyma.09G277900 * *** ** *** . * Glyma.16G195600 ** * ** * Glyma.03G157800Glyma.12G110100 . **. ** *** Glyma.09G198900Glyma.03G042700 * ** *** * * Fayette Glyma.16G146700 * . Glyma.11G116500Glyma.20G126700 * 8hpiDEGs−nonFayette Glyma.09G172100 * * ** . . s Glyma.03G024600 * * Glyma.16G008500 Glyma.09G284700Glyma.16G159700 * . * * * ** . e Glyma.09G087400 ** Glyma.02G064300Glyma.09G277900 * ** *** . n Glyma.16G145600 Glyma.15G156100 ** * . * Glyma.07G186100 . e Glyma.02G029400Glyma.09G198900 * ***** * 8hpiDEGs−old Glyma.15G012000 Glyma.05G083900 ** ** * . G Glyma.07G023300 . ** * Glyma.01G128100Glyma.09G172100 * *** ** . . s Glyma.03G057500

Glyma.13G346700 * ** ** ** e Glyma.06G303100Glyma.01G127100 *** ***** * Glyma.09G087400Glyma.01G125400 ** Glyma.13G068800 . . n Glyma.03G157800 . ** **

e Glyma.07G186100 . Glyma.20G126700Glyma.03G042700 ** *

Glyma.11G070500 * G Glyma.18G245200Glyma.07G023300 . ** * Glyma.16G159700Glyma.03G024600 * . * Glyma.11G062600 * * * * Glyma.18G244900Glyma.06G303100 *** **** Glyma.16G145600Glyma.02G064300 Glyma.11G062500 ** ** * ** 4 Glyma.18G244800Glyma.03G157800 . ** ** 8hpiDEGs−old 4 Glyma.05G083900Glyma.02G029400 ** Glyma.11G051800 * * . Glyma.18G244700Glyma.03G042700 ** * cqSCN Glyma.03G057500Glyma.01G128100 ** . Glyma.18G244600Glyma.03G024600 * * Glyma.10G295300 . Glyma.01G125400Glyma.01G127100 *** ** * Glyma.18G244500Glyma.02G064300 . Glyma.09G064200 * ** * Glyma.15G191200 Glyma.02G029400Glyma.18G245200 ** Glyma.09G049200 * ** . * Glyma.20G126700 Glyma.01G128100 ** . Glyma.18G244900Glyma.16G159700 . * Glyma.18G022700 Glyma.08G174900 ** * * Glyma.01G127100 *** ** * Glyma.18G244800Glyma.16G145600 cqSCN006 Glyma.07G087400 . Glyma.18G022500 8hpiDEGsRhg1 −old Glyma.18G244700Glyma.05G083900 cqSCN& Glyma.04G230400 ** ** ** * Glyma.18G022400Glyma.20G126700 Glyma.18G244600Glyma.03G057500 Glyma.04G121700 ** ** . ** Glyma.16G159700 . cqSCN007 Glyma.18G244500Glyma.01G125400Glyma.11G234600 . Glyma.16G145600 Glyma.04G058100 * ** . Glyma.15G191200Glyma.11G234500 8hpiDEGsRhg1Paralog−old Glyma.05G083900 Glyma.18G245200 Glyma.03G147700 . . Glyma.11G234300 Glyma.03G057500Glyma.18G022700 Glyma.03G038700 Glyma.18G244900 * Glyma.01G125400 Glyma.18G022500Glyma.18G244800Glyma.08G108900 . Rhg1Rhg4 Glyma.02G307300 * * Rhg1 Glyma.18G022400Glyma.18G244700 cqSCN Glyma.01G211800 . . Glyma.18G245200 Glyma.20G129300Glyma.18G244600 * Glyma.01G134600 * ** ** * Glyma.18G244900Glyma.11G234600 * Glyma.18G291500Glyma.18G244500 . Glyma.18G244800Glyma.11G234500 Rhg1PRhargalog1 Glyma.18G040000Glyma.15G191200 Glyma.15G026400 . Glyma.18G244700 cqSCNJia Glyma.11G234300Glyma.16G154200 homeolog weakly Glyma.18G244600 Glyma.14G049200 * * Glyma.18G022700Glyma.12G194800 NoClustercorrelated Glyma.08G108900Glyma.18G244500 . . RRhg4hg4 Glyma.10G261200 ** ** * * genes Glyma.18G022500Glyma.12G065600 Rhg1 Glyma.15G191200 Glyma.06G087800 * . Glyma.18G022400 Glyma.20G129300 T08 T24 T48 T08* T24 T48 T08 T24 T48 T08 T24 T48 Glyma.18G291500Glyma.18G022700 T08 T24 T48 T08 T24 T48 Glyma.11G234600 Glyma.18G040000Glyma.18G022500 Hour s post inoculation Rhg1 Hours post inoculation Glyma.11G234500 Rhg1PJiaaralog Glyma.16G154200Glyma.18G022400 Glyma.11G234300 Glyma.12G194800 log2 fold change Glyma.11G234600 log2 fold change Glyma.12G065600Glyma.08G108900 . Rhg4 (SCN−infected/uninfected roots) Glyma.11G234500 (SCN −infected/uninf ected roots) Rhg1Paralog Glyma.11G234300Glyma.20G129300 T08 T24 T48 T08* T24 T48 T08 T24 T48 T08 T24 T48 Glyma.18G291500 Hour s post inoculation −4 0 4 Glyma.08G108900 . Rhg4 Glyma.18G040000 −5 0 5 10 Jia Glyma.16G154200 Glyma.20G129300 * Glyma.12G194800 log2 fold change Glyma.18G291500 Glyma.12G065600 Glyma.18G040000 (SCN −infected/uninf ected roots) Jia Glyma.16G154200 T08 T24 T48 T08 T24 T48 T08 T24 T48 T08 T24 T48 Glyma.12G194800 Hour s post inoculation Glyma.12G065600 −5 0 5 10 T08 T24 T48 T08 T24 T48 T08 T24 T48 T08 T24 T48 Figure 3.7 Temporal expression profiles of candidate genes measureHourlog2s post fold inoculationchange by qRT-PCR using Fluidigm (SCN−infected/uninfected roots) log2 fold change Biomark HD system. Overlapped genes denote genes that were (SCN−5 −inf 0ected/uninfdifferentially5 ected10 roots) expressed in early −5 0 5 10 response to SCN infection at 8 hours post inoculation (hpi) and differentially expressed in uninfected Fayette lines. Non-overlapped genes denote genes that were differentially expressed in early response to SCN infection at 8 hpi, but not differentially expressed in uninfected Fayette lines. Log2 fold change value represents normalized expression level of a gene in SCN–infected roots relative to those in uninfected roots.

69

Figure 3.7 Continued

A) Scatter plot showing correlation between log2 fold change values of overlapped genes measured by RNA-seq described in Chapter 2 (Miraeiz et al., submitted) and qRT-PCR at 8 hpi in Fayette#99 (black circle) and Williams 82 (green triangle).

B) Scatter plot showing correlation between log2 fold change values of non-overlapped genes measured by RNA-seq described in Chapter 2 (Miraeiz et al., submitted) and qRT-PCR at 8 hpi in Fayette#99 (black circle), Peking (red square) and Glycine soja PI 468916 (blue diamond) and Williams 82 (green triangle). C) Heatmap showing expression profiles of overlapped genes at 8, 24 and 48 hpi in Fayette#99 and Williams 82. The genes were grouped by the gene clusters identified in co-expression network. D) Heatmap showing expression profiles of non-overlapped genes at 8, 24 and 48 hpi in Fayette#99, Peking, Glycine soja PI 468916 and Williams 82. The genes were grouped into two main categories: 1) Differentially expressed genes (DEGs) previously identified in RNA-seq analysis described in Chapter 2 (Miraeiz et al., submitted) including DEGs found in Fayette and DEGs found in non-Fayette genotypes, and 2) Genes located in QTLs including cqSCN006, cqSCN007, Rhg1, Rhg1 homeolog, and Rhg4. The colors in heatmap represent up-regulation (purple) and down-regulation (orange). The statistically significant changes in gene expression were identified using FDR-adjusted p-value cut-off set at 0.05. (0 – 0.001 “***”, 0.001 – 0.01 “**”, 0.01 – 0.05 “*”, and 0.05 – 0.1 “.”)

70

Table 3.1 Gene list and primer sequences (forward and reverse sequences) designed for qRT-PCR. Three endogenous reference genes (Bold) were used for normalization and calculation relative expression values.

Gene List Description Forward Sequences Reverse Sequences Glyma.01G114400 Otubain CTGCTGCACTTCTTGTTCTCA CCAGACACATGATACCACAAAATC Glyma.01G125400 NBS-LRR TTGAGGATCCTGGAAACCGT GGCCTCAGTCCCCTTGTTA Glyma.01G127100 dirigent-like gene CCCTCATTTACGCCAATGCC GTTGCTTCCGGTGAAGGTCT Glyma.01G128100 WRKY33 GAAACCCAAACCCAAGGAGT GGTGTTGTTGGAGGTGTTGT Glyma.01G134600 Glycinol 4-dimethylallyltransferase GTTGGTGTGTTACAAGCTGCG GGGATAATTGCCCGGATGCC Glyma.01G135200 CYP82A ATCAAGCTCGGTGCCAAGAA ATGAGCTCAGCGACGAGAAG Glyma.01G211800 NADPH:isoflavone reductase CTGGAGTTAAGCTGATTCAGGGA ACGGTCTACATCCAACCCAA Glyma.02G016200 Pollen Ole e 1 allergen and extensin AGCTTGAGAACCCCAAACCC CGGCTCCTTGAATTGGGGAA family protein Glyma.02G023800 NBS-LRR TGACATTGATCCCTCGCACG AGTTCAGACTCCATCCTGTTGAC Glyma.02G028100 Matrix metalloproteinase TTCGCAATAACGACCCTCAG CGCCGAAATGGATGATTGGTC Glyma.02G029400 C2H2 zinc finger CTGCATTAATTTCCCACCGCA AGCCAAATAGCCTGAGCCTT Glyma.02G064300 ribonuclease 1 ACCCGTAGGATACTGCAAGC CTTATTGTCGCAATCTATAGCCGTT Glyma.02G088700 Receptor-like kinase CTGGCAGATAGGCTCGTGAA CTGCTACCTGGAGTTGCACA Glyma.02G242900 U-box domain-containing protein CATCGAACGGATCCCCACTC CTCACTTTCCCTTCCCCACG Glyma.02G268200 ACC-oxidase AATTGAGGTAATCACGAATGGGAG TTGCTCTGTCTCCTGTGCCT Glyma.02G307300 NAD(P)H-dependent 6'-deoxychalcone GGAAGTTCTCCTTTCCTATTGAGGT TGCAAGGTTCATCTCCACTTGA synthase Glyma.03G024600 NA AATCGAAAGCAAATGAAGCACA GGCATGCATATTTGTGGTGTGT Glyma.03G024900 Chitinase CACTCCCCACCTTATCGAGC GGTCATCTCCTCCGATGCTG Glyma.03G038700 Peroxidase TGGTGGATGAGATCAAAGAGGC CCCCTAGCGCAACGACAGA Glyma.03G042700 WRKY33 ATCCAAACCCAAGGAGTTACTACA GCTGGTGTTGGTGTTGTTGTT Glyma.03G044900 dirigent-like gene GGGAGAGTGGAGGGGTTGTA CCCAAGATCGTGATGGTGCT Glyma.03G057500 DNAJ heat shock family protein GGAGAAGGCTCCAGCAATGT CCCGGCTTGATCTCAATGGT

71

Table 3.1 Continued Gene List Description Forward Sequences Reverse Sequences Glyma.03G147700 dirigent-like gene GCAGGAAATATCACTCTTCACTCT CAGACAAAGAAAGTGGATTTGGC Glyma.03G157800 calmodulin ACCGTGTCACACATTGCTCA ATGAATCCATCAGAGTCCTTGTCA Glyma.03G253200 Indole-3-acetic acid induced protein TCTCACCAGACGTGGCTACT CACGCAAATCGGCAACATCA ARG-2 homolog Glyma.04G044900 C2H2 zinc finger CAGAACTGCCACTCCATCGT CCTCGGTGCATGAAGAGTGT Glyma.04G058100 acetyltransferase GGTCGCCTTGTCACCCAAAAAT ACTGAACAACGAGTGGGACG Glyma.04G121700 Polyphenol oxidase CCGAAGGTGTAGAACCCACC GATGAGCAGCGGATCGTACA Glyma.04G126300 Membrane steroid binding protein 1 AGGTTGTCCAACAAGCTAAGA GACAAACAACGCACAACTGC Glyma.04G131100 eugenol synthase TCCTCCGTTCCAAGCTTTCC AGCACAGCTTTTGTATCACCG Glyma.04G213900 alcohol dehydrogenase 1 GGAAGCCAAGGGACAGACAC GAGATCCGTCACACCCTCAC Glyma.04G230400 NA CAGAACTGCTCTGTTGCTGA GTGGTACACATGGTTCGTCG Glyma.05G083900 NA TAGCCTCCGAAACATTGCCT GCACTCTTTGCACAACGGAC Glyma.05G215900 WRKY41 TCCTTTAGCCTCACTCCTGCT ATGAGACCCTTGGAGTGGTGA Glyma.06G045400 C2H2 zinc finger/ZAT10 CCACAAGCATAAGCACAAGCA CGTCCACGGTGTCTCTAAGT Glyma.06G087800 Malic enzyme GTGGTGGTGTGAGGGACTTG AGCATCTCTCTCTCCCTCGG Glyma.06G295400 NA AGAAGATTGTGGCCTCGGTT ACTCCTGAGTGTGAAAGAACCA Glyma.06G303100 MYB_related CPC GCTGACATAGATCGCTCCTT GCAATCAAAGACCACCTCTCC Glyma.07G023300 WRKY40 AGTGGGTTGGCTGGTTTAGA CGCTTTCTACCTCCTTGGGAA Glyma.07G087400 Cytosolic aldehyde dehydrogenase AACACTCTCTACCCTGCCTC CGTTTTTCCTGAAACAGAATCTAGG Glyma.07G186100 putative branched-chain-amino-acid GAGCATTGTTAGGGGTGGCA TAGTGCACCCTTCTGGTAGC aminotransferase 7 Glyma.07G212800 Nitrite and sulphite reductase CTGGCAACCAGTTTTGTGGG CTCACAGGCCTAGTCACAGC Glyma.08G021900 WRKY41 TCAAGTCTCGTTCTGGCTAAGT TGATGCTCTTCTCTGGAAGCA Glyma.08G091400 Glutamate decarboxylase TGGCTGCAATCTTGGGTTCA ACCACTTGCAGCATCCACAT Glyma.08G108900 serine hydroxymethyltransferase TTAACTTCGCGGTGTTCCCT TATTTTCCAAGCGCAACGGC Glyma.08G174900 Glutathione S-transferase ACATTGACAAAAAGGTGTATCCTGC CCCAAACGTGTCACCTCCATA Glyma.08G317400 NBS-LRR-RPM1 ACAGGGGGTTTCCGAACCTA GCACCTCTGTCAGTTCGTCA

72

Table 3.1 Continued Gene List Description Forward Sequences Reverse Sequences Glyma.09G049200 CYP81E10 CCTTGCTTTGGCCATGCTTT CGGTCTTGCCCTACTTGTGT Glyma.09G064200 bHLH TCGGAATCAACGGTCATACCTT CACCCCATGAAAACAGCAGTC Glyma.09G087400 inositol transporter 2 GAAAGAACACACCCTGAAACCA AGCCCCAGAAATAACTCCAGTG Glyma.09G130400 heavy-metal-associated domain- GCAGTATCTGGTATTTCAGGGGT AGCTTGCCAAGACTGAATTTTGTA containing protein Glyma.09G172100 U-box domain AGAAGAGATGGTGTTGGGATGG GGATTTGCCACCTTTGCGAC Glyma.09G198900 dirigent-like gene CAGCCAACAGGTCCTCCAAA AGCCCGTCGAGACTAGATGT Glyma.09G277900 Peroxidase GTCCCAACGCTGGTTCGATT GTAGTTCCACCAAGAGCAACG Glyma.09G284700 Peroxidase GAAGCGCAGTAGAGAAAGCA CAGAACCATCACAACCCCTGA Glyma.10G119400 Subtilisin-like serine endopeptidase GGGAGTCGTCCAATCCACAG TCCAATCCTGCCATTCTCGC family protein (Cucumisin) Glyma.10G230600 heavy-metal-associated domain- ATCCCCCTTTTCTCATTTTCCCT AGAAACTGACTCAACCCCTGAA containing protein Glyma.10G261200 PRp27-like protein CAGGGTTGTGCCAAATGGTG GCCTGACCATTCCCATTCCA Glyma.10G295300 Glycinol 4-dimethylallyltransferase CCATCACATGCAGACCGTTG ACGCAAATCCAAAATGCCCG Glyma.11G051800 CYP81E8 ATCGGCCATCTGCTTTCCTC TACAGCGGAAGTCTCTGTGC Glyma.11G062500 CYP71D8 GGCAACAACAGTGAAGCACA TCCAGCAGCAAATATGTCCCAT Glyma.11G062600 CYP71D8 GCAACGGTAGTGAAGCGGAG CCAGCAGCAAATATGTTCCATATCA Glyma.11G070500 NADPH:isoflavone reductase TCAGGAGTAAATCTAATTCAGGGAG TCGTGACGATCCACATCCAA Glyma.11G116500 Arabidopsis protein of unknown TGGTTAGCTTATTAGAGAGCGTG ATCTGCCAATTGTGAGCAACC function Glyma.11G129700 Beta-glucosidase AGGTCCTAATTTGACTTAGAGGGT ACCGATACACATCTTTGTTTCACC Glyma.11G181200 F-box family protein CCGTATCGTCCCTCATGCAC CGGAAGGAGGAGTTGTGGAG Glyma.11G228100 Hs1pro-1 protein AGTTCCGACATGCCTTCCAA TCGTAAGTGGCGCAAACAGA Glyma.11G234300 Paralog Rhg1- WI 12 GCAACGCGGAATCTCTTTTCT GCTTGTTGCTGGAATCAGCG Glyma.11G234500 Paralog Rhg1-alpha-SNAP GCATGACCCCTCTGGATTCT AGACAGACACACAAAACAGTGTA Glyma.11G234600 Paralog Rhg1-amino acid transporter ATTGTTTTAAGAGATGTCCACGG TGGCAAGCACACTTGTAACC Glyma.12G051100 F-BOX ONLY PROTEIN 3 GAGCCCAAGACATTGCGAGAG CGGAAGCGGAAGAACTGAACC

73

Table 3.1 Continued Gene List Description Forward Sequences Reverse Sequences Glyma.12G093100 CCR4-NOT transcription complex CAGCGAAGATCCGAATCCCA TTGAAGGGTTTGGTGGGGTC family protein Glyma.12G110100 NA CACGCTTGTACGAAACGCC CTTGTCTTTGCCTTGCAGACC Glyma.13G065300 NA GATTCGCGTGTTCGTTCTCG TGTCCGCGTACTGTTCCTTC Glyma.13G068800 CYP82A3 GAGGAGCACCGTCAGAAGAA CCTCCCAAAATCAATTCCAAGC Glyma.13G302400 MYB-like 102 GGGAAACAAGTGGTCGGCTAT TGGATTAACGTTGCCAATTGCT Glyma.13G346700 Endochitinase PR4 TTGATCAGGCTGACTCTGGC TCAATGTGGCAAAAATGTCCAGT Glyma.13G370100 WRKY40 TCGTTTGCTTTCAGCAAGAGC TCCACCTCCTTTTTGGGAAGT Glyma.14G049000 ACC-oxidase AGAGGAAGGCTACCCTTAACC TGACACTGCCTCCTTGAACC Glyma.14G049200 ACC-oxidase TCCTAATATGCACCAACAAACACA GGGTATTCCATGATTCACCAACTCA Glyma.14G136300 Phytochromobilin:ferredoxin CAGTTTTACTTGTGCCTTGAGGA CTGTAAAGGCGAAGGGATCAAG oxidoreductase Glyma.14G212200 U-box domain-containing protein TTGTCCACAAGGCAAAACGC ACCTTTCACAAGTCACACCTC Glyma.15G012000 ABC transporter CAAGACCAAGCATCCCCGT TTGACCTTCAGACGTCATTGGT Glyma.15G026400 Lipoxygenase AGAAAAGATCCCAACAGTGAGA ACGTCTTGAGCCACTGATTTG Glyma.15G026500 Lipoxygenase GGCACCGGCTTAGACTTCTT TCCAACTTTTCCTTTCCCACCA Glyma.15G079100 ERF AAGCTGCACAAGTAGCACAG TCGGTGTCCTGTAAGACAGA Glyma.15G129200 Peroxidase AGTTGCTTTGTCAGGTGCTCA AGTTGGGTCACTGTTGCCAT Glyma.15G142400 Beta-glucosidase GGGTTTGGAGGAAAGAGGCT GGATTCCAACCTTACACATCACT Glyma.15G156100 CYP81E AACAACACCACCGTAGGCTC TCCTCGTTCGAGTTCTTGGC Glyma.15G191200 gamma-SNAP TCAGTCTCACGAGGTGGAGT CCGCATCCCAAGGTGAAGAA Glyma.16G008500 Kinase-like protein ACCGGAAAAAGACCAAACACATATC CATGGTTGTCAAATGGCCTGTCA Glyma.16G064200 LRR-RLK GTTGAGCTGATATCAATTGTGAA TGTTTGGGGCATTGAAGACA Glyma.16G070000 Tubby C 2 GGGGTAATACCTCTGAGTTGC AGTTGGAGATTCACCGGCAT Glyma.16G145600 MLO-like protein 12 TCCGAATCTTCGATTAAACCGT TGAACCACTTTCCAATACCATGA Glyma.16G146700 Mitochondrial phosphate transporter ATTCCTTGATGGCTCCCGTG GCAAGCAGCATAGAACGACG Glyma.16G159700 NBS-LRR TCTTCCAACCTCACAAGATGCT CTCTTAGGCTCGGCAGATGG

74

Table 3.1 Continued Gene List Description Forward Sequences Reverse Sequences Glyma.16G162400 Tryptophan aminotransferase-related TGCTAATGCTGGAAGTGGGG CGGTGATTGCATTCCCAACG protein 4 Glyma.16G195600 CYP71A26 ATGGGCATGGTGATGGTGTC CCAAACATATCCAGTATCAAAGCCT Glyma.17G030000 PR10 TGTGAATGTGATCCAAGGTG ATATTTGAAGGGGCTAGCTTCATTG Glyma.17G030300 Stress-induced protein SAM22 GGCTGTTGATGCCTTCAGGA TCTCCATCCTCAACGAAAGTGA Glyma.17G030400 Stress-induced protein SAM22 TAGCTACAGCGTAGTGGGTG ATTCGACAGTGAGCTTGCCA Glyma.17G046600 Flavin-dependent monooxygenase 1 GGGAAGTCACTGGCTTTGGA TCTGCCCCTTTCTTCTGTGG Glyma.17G053600 calmodulin-binding family protein CTGCGCTTGCTTTTCTCTCG CTTCAGTTTGTCGTTCTTGTATGGT Glyma.17G140300 Protease inhibitor/seed storage/LTP GGTGTTCCCAAGGGATTCGT TGCCGGGCACAAATACTGAA family Glyma.17G245100 NA CGGTGAACTCAGGGAGTCAG AGATCGAACCCATGTTGCCA Glyma.18G022400 amino acid transporter CGGAGATGTGCTATCTGGAA CCACTGCAAGAAGAGTTGAC Glyma.18G022500 α-SNAP TCGCCAAATCATGGGACAAGG CAATGTGCAGCATCGACATGGG Glyma.18G022700 predicted wound-inducible protein CACTGTATGACGCCCTAAACTC ATGGACTGCGGAACGAATC Glyma.18G026900 RLK (putative CCR3) ACATCTCCTTCCCCTGTCCA ACATGTCTAGTGGCGTTGGG Glyma.18G055600 Peroxidase CTGTCGTAGCATTGGGTGGT ACCCCTTGTTTGAAAAGGCAG Glyma.18G148700 1-Deoxy-D-xylulose 5-phosphate TCATCAAAGCACATTGCAGCC CCAGGTCCCAGGTCAAGATCC synthase 2 Glyma.18G244500 Lecithin:cholesterol acyltransferase CTCACAAAGCTTCTGGGGGA GAAAGGGCAAGCCAAGCAAA Glyma.18G244600 AP2 domain CCCTACGGTGGTAACTCAGC GTGTGGTGGTTGGTTCCTCA Glyma.18G244700 Calcineurin-like phosphoesterase GGTTGTCGGTGCTTCCTTTG CAGTCTCGAGTCTCAAGGGTT Glyma.18G244800 Chromatin assembly factor 1 subunit A TCCCTCATCTCCATCATCTGAG CCATGAAGAAAATTGTGTCTTGCG Glyma.18G244900 p-Nitrophenyl phosphatase CGGCTAATTATAATCGTAACCGTTC TTTTGCCCTTTGACCGAAGC Glyma.18G245200 LETM1-like protein TCGAAGACGCGCATTTCCT TTATCACCGTCGCTCATGAAAAC Glyma.18G256900 PQQ enzyme repeat - Quinohemoprotein TTCGATTCCATGCTGCCGAT TCCATGCCCATTAGAACTTGCT ethanol dehydrogenase type-1 Glyma.19G076800 Lysine histidine transporter 1 GTCAGAGCTTGGATGGGGTC CAACCACAAGCTGTTGAGGC Glyma.19G151100 dirigent-like gene TGACTCGAAACTTGTGGGCA AACACTGATGGTGCTTCCGT

75

Table 3.1 Continued Gene List Description Forward Sequences Reverse Sequences Glyma.19G162200 NA CAGAATCTGGGGTAGCTTGC GGAAGCGATTGCATGAGAAGC Glyma.19G245400 PR4 TGGGACGCTAGCAAACCTTA TATTTGTCACCCGCAAGCAC Glyma.19G254800 WRKY53 GTCAGCGTACCACTTGGACA CCAGGATTGGGGACTTGGTG Glyma.20G001400 Peroxidase GGCAATAGTTAAAAGCACGGT ATCGCAACCCCTGACAAAG Glyma.20G126700 Cyclin GAAAGGGAGTGATGGGTGTGA CCTTCATCGCAGTAAAGGGC Glyma.20G130700 TIP41-like family AGGATGAACTCGCTGATAATGG CAGAAACGCAACAGAAGAAACC Glyma.20G137800 Cysteine-rich receptor-like protein GCGCTACTCTACTCTACAACCAA GCACTGATTGCATTCCAACTCT kinase Glyma.20G141600 Ubiquitin family GTGTAATGTTGGATGTGTTCCC ACACAATTGAGTTCAACACAAACCG Glyma.20G169200 Peroxidase CGTGCCAATTGCTCCGTTAG GCATGCATCGACCCAGTCA Glyma.20G248900 Protein phosphatase 2C family protein GCCACAGATGGGGTATGGGA TAGCCGCCTCAACCAACATT

76

CHAPTER 4 REGULATORY MECHANISMS OF GENES IMPLICATED IN RHG1-MEDIATED RESISTANCE TO SOYBEAN CYST NEMATODE

Introduction

Small RNA (sRNA) is 20-24nt non-coding RNA that regulates gene expression, including processes involved in plant development, stress responses, and disease defense. There are two major classes of sRNAs: micro RNA (miRNA) and small interfering (siRNA). The functions of miRNA are in posttranscriptional regulation by mRNA cleavage and translational inhibition, while siRNA has roles in both transcriptional and posttranscriptional regulation. A few studies of sRNA related to nematode parasitism in host plants have been reported. The sugar beet cyst nematode was used in an Arabidopsis sRNA study during plant-nematode interactions (Hewezi et al., 2008).

The susceptibility to the nematode decreased in Arabidopsis that carries mutated genes involved in sRNA biogenesis, indicating the sRNA regulation of gene expression in plant host during cyst nematode infestation.

For the soybean cyst nematode, Li et al. (2012) found 40 SCN race 3-responsive miRNA families in roots using high-throughput sequencing of sRNAs (sRNA-seq) to compare miRNA expression between resistant and susceptible soybean cultivars planted in SCN-inoculated and uninoculated soil for 30 days after emerging. Moreover, sRNA populations were identified in roots of two sister soybean lines that are different in SCN race 4 resistance using sRNA-seq from

Illumina GAIIx (Xu et al., 2014a). To predict targets of identified miRNAs, miRNA-induced cleavage sites were predicted using parallel analysis of RNA ends (PARE) or degradome analysis, which is a high-throughput technique to capture miRNA-cleaved mRNAs containing 5’- monophosphorylated RNAs and poly (A) tails. In addition, Xu et al. (2014a) validated 14

77 differentially expressed miRNAs using qRT-PCR and found seven upregulated miRNAs in SCN- infected roots. Of these miRNAs, the authors identified their targets in the SCN genome based on computational miRNA target prediction, indicating potential cross-species regulatory roles of sRNAs between plant and nematode. Genome-wide identification of soybean miRNA responsive to SCN infection and a predicted regulatory network showed candidate miRNAs (e.g., miR159, miR171, miR398, miR399, miR408, miR1512, miR2119, and miR9750) for manipulating SCN infection (Tian et al., 2017). The functional characterization revealed that the regulatory network of soybean miR396 and its target, growth-regulating factor (GRF) transcription factor genes, is important for female SCN development in infected soybean roots (Noon et al., 2019).

In addition to posttranscriptional control, sRNA is involved in transcriptional control via

DNA methylation and histone modification. Lister et al. (2008) showed the correlation between sRNA and DNA methylation location and abundance in Arabidopsis. DNA methylation is one of the epigenetic control mechanisms that are important for regulating patterns of gene expression and transposon silencing. DNA methylation in plants is induced by the RNA-directed DNA methylation (RdDM) pathway. According to the process of RdDM, siRNA incorporates into

ARGONAUTE 4 (AGO4) in the RNA-induced silencing complex (RISC) to target nascent scaffold transcribed by Pol V and recruits DNA methyltransferase (DOMAINS REARRANGED

METHYLTRANSFERASE 2 or DRM2) to establish DNA methylation (Law and Jacobsen, 2011).

The maintenance of DNA methylation during DNA replication is depended on types of methylation context. The symmetric CG and CHG methylation (in which H represents A, T or C) is maintained by DNA METHYLTRANSFERASE 1 (MET1 or DMT1) and

CHROMOMETHYLSASE 3 (CMT3). In contrast, asymmetric CHH methylation is maintained by DRM2 using the process of RdDM (Cao et al., 2003). Variation in siRNA populations and

78

DNA methylation modifications are also involved in plant-nematode interactions. Clusters of 23-

24nt siRNAs (heterochromatic siRNA; hc-siRNAs) were over-represented in Arabidopsis root galls induced by root-knot nematode and colocalized in putative promoters of genes differentially expressed in galls, suggesting a role of siRNA populations in transcriptional regulation by RdDM during nematode infection (Cabrera et al., 2018). The beet cyst nematode parasitism induces dynamic changes in methylomes of the Arabidopsis roots during syncytium formation and maintenance phases (Hewezi et al., 2017). DNA methylation was investigated in SCN-susceptible soybean roots using whole-genome bisulfite sequencing (Rambani et al., 2015). Approximately one-third of hypo or hypermethylated regions overlapped with unique genes, including 278 uniquely expressed genes in the syncytium, indicating that SCN induces epigenetic controls of gene expression during the compatible interactions. For Rhg1-mediated SCN resistance, differentially methylated regions (DMRs) have been reported at the Rhg1 locus, using restriction enzyme-based methylation (Cook et al., 2014) and bisulfite sequencing (Schmitz et al., 2013). The majority of DMRs were located at the common promoter between an amino acid transporter

(Glyma.18G022400) and an alpha-SNAP (Glyma.18G022500) as well as in both the 5’ and 3’ flanking regions of the neighboring WI12 gene (Glyma.18G022700). The results suggested the regulation of Rhg1 genes by DNA methylation involved in SCN resistance.

Little is known about regulation of sRNA and its relationship to DNA methylation marks related to SCN resistance conferred by the CNV at the Rhg1 locus. Therefore, this study focused on the analysis of regulatory variations in uninfected Fayette lines with three different forms of the CNV at the Rhg1 locus (8-11 copies), by integrating small RNA sequencing (sRNA-seq) and whole-genome bisulfite sequencing (WGBS) data. We identified candidate genes or transposable elements (TEs) that were potentially under regulation of either sRNAs or DNA methylation

79 determined by the proximity of gene/TEs locations to the differentially expressed sRNA clusters or differentially methylated regions.

Materials and Methods

Plant materials for sRNA-seq experiment and RNA extraction Fayette cultivar isolines harboring the resistant Rhg1-b allele originally developed by introgression of PI 88788 locus in Williams background (Bernard et al., 1988) were selected for sRNA-seq experiments based on the copy number of the 31.2kb segment at the Rhg1 locus that were reported in previous studies (Cook et al., 2012; Lee et al., 2015; Lee et al., 2016), including

Fayette#01 (low, 8-9 predicted copies at the Rhg1 locus), Fayette#19 (mid, 9-10 predicted copies at the Rhg1 locus) and Fayette#99 (high, 11 predicted copies). Seed for three Fayette lines were harvested from single plants propagated in the greenhouse at University of Illinois at Urbana-

Champaign. The sterilized seed was germinated and grown in a cone-tainer filled with autoclaved mixture of sand and turface (2:1). The plants were maintained in growth chamber for 15 days under

16/8h (light/dark) condition at 26 °C. Four biological replicates of whole roots were harvested and flash-frozen in liquid nitrogen. Plant tissues were stored in -80 °C freezer until RNA extraction.

Frozen tissue was ground to fine powder with a mortar and pestle tissue, homogenized in CTAB buffer for RNA extraction (Chang et al., 1993) and purified in acidic phenol/chloroform. DNase I

(New England Biolabs) was used for removing contaminated DNA according to manufacturer’s protocol. The RNA was precipitated with isopropanol and re-suspended in RNase-free water.

Then, RNA concentration and quality was measured using Nanodrop and Bioanalyzer. The RNA samples were sent for small RNA library preparation and sequencing at the DNA services unit in the Roy J. Carver Biotechnology Center, University of Illinois at Urbana-Champaign, IL.

80

Plant materials for DNA methylation Young leaves of three Fayette lines (Fayette#01, Fayette#19 and Fayette#99) were harvested from a single soybean plant of each line. Based on methods described in previous studies

(Cook et al., 2012; Lee et al., 2016), the plants was grown and maintained in a growth chamber for 10 days under photocycle of 16/8 h (light/dark), 23/20°C (light/dark) and 50% relative humidity. The leaf DNA samples were prepared using a previously reported CTAB protocol

(Doyle and Dickson, 1987). The DNA samples were then sent for construction of Whole Genome

Bisulfite Sequencing (WGBS) library and sequencing at the DNA services unit in the Roy J. Carver

Biotechnology Center, University of Illinois at Urbana-Champaign, IL. Since the cost for WGBS per sample is still high, a single biological replicate per Fayette line was selected to perform

WGBS. Before sequencing, the genomic DNA samples were treated with sodium bisulfite to convert unmethylated cytosines to uracils, which are sequenced as thymines. On the other hand, the methylated cytosine bases remain unchanged. The method enables researchers to distinguish between methylated and unmethylated cytosine residues across the soybean genome with a single- base resolution.

Analysis of sRNA-seq data and miRNA target prediction The 50nt single-end reads were generated from Illumina’s HiSeq2500 sequencing system, according to the Illumina protocol, at the Roy J. Carver Biotechnology Center at the University of

Illinois at Urbana-Champaign. After quality check using FastQC (Andrews, 2010), the reads were processed with following steps: adapter and low quality base trimming, tRNA and rRNA removing, and length filtering. The 18-27nt trimmed reads that were not matched tRNA and rRNA were selected for miRNA identification. The reads were aligned against plant mature miRNA sequences retrieved from miRBase version 21 (Kozomara and Griffiths-Jones, 2014). The mapped reads classified as conserved miRNAs were quantified, normalized and performed differential

81 expression analysis using edgeR (Robinson et al., 2010). Then, the unmapped reads were mapped against soybean reference genome (Schmutz et al., 2010) to predict new miRNA precursor loci

(MIR) with major miRNA sequences and phasiRNA precursor loci, using the stand-alone application for small RNA gene annotation and quantification, namely ShortStack v 3 (Axtell,

2013). In addition to the prediction of new miRNAs and phasiRNA, the sRNA clusters were quantified to identify differentially expressed sRNA clusters using edgeR (Robinson et al., 2010).

The characterization of differentially expressed sRNA clusters were based on their mapping loci such as transcribed regions (5’UTR+gene body+3’UTR) and 2kb flanking sequences of transcribed regions using BEDtools (Quinlan and Hall, 2010). For miRNA target prediction, the sequences of mature miRNAs identified from sRNA-seq data and preloaded soybean transcripts were input to feed in the psRNATarget pipeline (Dai and Zhao, 2011), a web tool implemented for sRNA target analysis in plants. The default setting of schema V2 (2017 release) with some modifications for a calculation of target accessibility was used in this study. The expectation was set at 2.5 as a medium stringent cut-off threshold, allowing 1 mismatch at the seed region (from

2nd to 13th nucleotide of miRNA-mRNA duplex).

Preprocessing and analysis of WGBS data Quality of paired-end reads sequenced from three WGBS libraries were assessed using

TrimGalore (Krueger, 2015). The adapters and low-quality bases in bisulfite-treated reads were trimmed before mapping the reads against soybean reference genome (Schmutz et al., 2010). The genome alignment was performed using a three-letter aligner, Bismark (Krueger and Andrews,

2011). Cytosines in both reference sequence and bisulfite-treated reads were substituted with

Thymines. Then, the reference was indexed and aligned using Bowtie (Langmead, 2010).

Methylation calling was consequently carried out by using the bismark_methylation_extractor function, producing report of context-dependent (CpG, CHG, and CHH) methylation for each 82 library. For statistical analysis, differentially methylated regions (DMRs) were identified using

DMRcaller R package (Catoni et al., 2018). The genome sequence each chromosome was divided into 500 bp non-overlapped windows to calculate weighted methylation levels for each windows which is equal to the sum of methylated reads divided by the total number of reads for a cytosine sequence context (Schultz et al., 2012). The minimum cytosine number of 4 was required in each window for calculating the methylation levels and further pairwise comparisons. The Fisher’s exact test and multiple testing correction (adjusted p-value < 0.01) were applied to detected DMR for each window with a minimum methylation difference of 0.4, 0.2, and 0.1 for CG, CHG, and

CHH, respectively. The adjacent DMRs were merged together or separated with a minimum gap between DMRs set at 200 bp. Moreover, locations of DMRs were investigated to identify DMR- associated genes and DMR-associated transposable elements (TEs) using BEDtools. The DMR- associated genes represent genes overlapped with DMRs in their transcribed regions or 2kb flanking gene regions whereas DMR-associated TEs denote TE with DMRs located in the intervals of TE body or 2kb flanking regions. The genome-wide and local methylation profiles were visualized using the circlize R package (Gu et al., 2014) and karyoploteR (Gel and Serra, 2017).

Results

Expression of miRNAs in Fayette lines Mature miRNAs of two families (miR5037 and miR5374) were differentially expressed in

Fayette lines [Figure 4.1]. These miRNA families were suppressed in Fayette#99 compared to those in other Fayette lines. The prediction of miRNA targets using psRNATarget tool showed that transcripts of 14 genes were potentially targeted by miR5037, including two homeologous transcription factor genes (Glyma.05G045300 and Glyma.17G127500). These genes encode

GRAS family transcription factors which are homologous to Arabidopsis RGL1 gene, a negative

83 regulator of gibberellin responses (Wen and Chang, 2002). No target gene was predicted for miR5374 based on the expectation value cut-off threshold set at 2.5 in this study; however, the potential targets of miR5374 previously identified in SCN-infected soybean roots were NBS-LRR transcripts (Li et al., 2012).

In addition, the prediction of miRNA precursor loci was performed based on the sRNA mapping to genome sequence after filtered out mature miRNA sequences. Of total 37 predicted miRNA loci, 16 miRNA precursor loci with at least 100 mapped reads were selected to investigate the similarity of predicted miRNA sequences compared to mature miRNA sequences in miRBase and their genomic loci. Predicted miRNA sequences in eight loci showed high similarity to previously reported mature miRNA sequences in miRbase, but only five loci were previously identified as MIR precursor loci in soybean genome. Therefore, out of 16 predicted loci, there were

11 MIR precursor loci generating either new mature miRNAs (eight loci) or variants of known mature miRNAs (three loci) in Fayette lines [Table 4.1]. The expression levels of miRNAs in predicted precursor loci were not differentially expressed in Fayette lines.

No significant differential expression of genes and sRNAs was observed among three Fayette lines carrying different copy numbers at the Rhg1 locus Copy number variation at the Rhg1 locus among Fayette lines allows us to investigate the relationship between copy number variation and sRNA regulation. However, there was no significant difference in the expression levels of sRNAs at the Rhg1 locus among three Fayette lines [Figure 4.2A]. Similarly, according to the RNA-seq data described in Chapter 3, there was not enough evidence to conclude that the mRNAs of three genes encoded within the Rhg1 repeat were differentially expressed among three Fayette lines using FDR cut-off of 0.05 [Figure 4.2B].

The relative expression measured by qRT-PCR confirmed that the expression levels of the three

84 expressed genes in the Rhg1 locus were not detectably altered in uninfected roots of Fayette lines even though copy number variations were present [Figure 4.2C]. sRNA clusters at the Rhg1 locus were predicted as PHAS loci in Fayette lines Based on the prediction of PHAS loci with a phase score threshold set at 100 calculated by the ShortStack pipeline, 55 PHAS loci were identified in sRNA generated from Fayette root samples [Table 4.2], residing at transcribed regions (44 loci), non-transcript regions (nine loci) and upstream sequence plus 5’end of transcribed regions (two loci). NBS-LRR genes were the most frequent genes harboring PHAS loci (28 genes). It is worth noting that two PHAS loci were predicted in the repeat unit at the Rhg1 locus. One PHAS locus was overlapped the promoter and transcribed region of Glyma.18G022400 encoding amino acid transporter protein

(Chr18:1637483-1640792). Another PHAS locus was found at transcribed regions and downstream of Glyma.18G022700 encoding a putative wound inducible protein 12

(Chr18:1652929-1655160) [Figure 4.3]. siRNA-associated genes Based on the FDR threshold set at 0.05, 300 sRNA clusters were differentially expressed in Fayette lines. In total, there were 248 genes harboring differentially expressed sRNA clusters at either transcribed regions or 2kb flanking regions of genes, representing sRNA-associated genes in Fayette lines [Figure 4.4]. The defense response (GO:0006952) was the over-represented GO term in this set of genes, including 33 genes encoding nucleotide-binding site and leucine-rich repeat (NBS-LRR) proteins, a dirigent-like gene (Glyma.03G045600), a glutathione-S-transferase gene (Glyma.08G306900), a receptor-like protein gene (Glyma.03G052400), a callose synthase gene (Glyma.06G292500), an allene oxide synthase (Glyma.17G246500) and a lipoxyenase gene

(Glyma.15G026500). The intersection of two gene sets, sRNA-associated genes and differentially expressed genes identified in Fayette roots (from RNA-seq data in Chapter 3), showed 33 85

“overlapping” genes, denoting sRNA-associated DEGs. Significantly for nematode resistance, the most frequent genes in the overlapping set belong to the NBS-LRR family (18 genes), preferentially located at NBS-LRR gene clusters on chromosome 3, 8 and 13. The scatter plots between normalized expression values of sRNA and mRNA mapping to each overlapping gene illustrated that the sRNA abundance within sRNA clusters tended to be positively correlated with the mRNA abundance of the genes. However, a negative correlation was found in nine clusters, the genomic locations of which contained all or part of seven genes (Glyma.03G100500,

Glyma.08G338900, Glyma.13G194800, and Glyma.13G194900, Glyma.13G199500,

Glyma.13G208800, and Glyma.15G026500) [Figure 4.5]. Interestingly, the PHAS loci were identified in two of these genes, Glyma.13G193300 and Glyma.13G194900, meaning that both positive and negative correlation between sRNA and mRNA expression was observed. Further investigation of sRNA mapping at Glyma.13G194900 showed low sRNA abundance in

Fayette#01, compared to those sRNAs clustered tightly in one exon encoding NB-ARC domain in other two Fayette lines [Figure 4.6].

Global methylation levels and differentially methylated regions in Fayette leaves The WGBS reads, after quality filtering, were mapped against the bisulfite-converted genome reference to investigate genome-wide DNA methylation profiles in leaves of three uninfected Fayette lines (Fayette#01, Fayette#19, and Fayette#99), using the Bismark pipeline.

The range of mapping efficiency accounted for 66.60-67.60 % of total paired-end reads analyzed in each library. The methylation calling was performed in three contexts (CG, CHG and CHH where H represents any nucleotides except G) to calculate the weighted methylation levels of cytosines in each context within 500 bp windows. In this study, the breakdown of cytosine methylation was approximately 51.43 % in CG, 30.13 % in CHG and 2.50 % in the CHH context of all methylation calls. The global methylation levels in Fayette#01 were likely to be higher than 86 those in other Fayette lines, especially in the CHG and CHH contexts, although the mapping efficiency of three samples was roughly the same [Figure 4.7; Table 4.3].

Hypomethylated regions outnumber hypermethylated regions in Fayette#19 and Fayette#99 relative to Fayette#01. Genome-wide differentially methylated regions (DMRs) were identified in each context based on three pairwise comparisons: DMRs in Fayette#19 relative to Fayette#01 (mF19vsF01),

DMRs in Fayette#19 relative to Fayette#01 (mF99vsF01), and DMRs in Fayette#99 relative to

Fayette#19 (mF99vsF19). The density plots of DMRs in each comparison showed both hypermethylation and hypomethylation in the CG and CHG context in all three pairwise comparisons [Figure 4.8]. For the number of DMRs in CG and CHG, Figure 4.8 shows a considerably greater number of hypomethylated regions than hypermethylated regions in both mF19vsF01 and mF99vsF01. On the other hand, the number of hypermethylated regions in mF99vsF19 was moderately greater than the number of hypomethylation regions. Interestingly, most of CHH regions (> 99%) were hypomethylated in mF19vsF01 and mF99vsF01, but only eight methylated CHH regions were found in mF99vsF19, indicating a relatively high similarity of CHH methylation profiles between Fayette#19 and Fayette#99, which were both more distinct from the CHH methylation in Fayette#01. An extensive increase in cytosine methylation levels in

Fayette#01 (low, 8-9 predicted copies at the Rhg1 locus) compared to those in Fayette#19 (mid,

9-10 predicted copies) and Fayette#99 (high, 11 predicted copies) implied wide-ranging impacts of copy number on DNA methylation, and in turn of methylation on transcriptional control potentially involved in Rhg1-mediated SCN resistance [Figure 4.9A].

We next investigated the intersection of DMRs identified from three pairwise comparisons.

More than half of DMRs were specific to each comparison. The mF99vsF01 and mF19vsF01 frequently shared DMRs which represented 12 %, 8 % and 33 % of the union sets of DMRs in CG

87

(5,969 DMRs), CHG (10,502 DMRs) and CHH (1,448 DMRs), respectively. The DMRs common to all three comparisons were the least frequent, but these still included nine methylated CG regions, 54 methylated CHG regions and one methylated CHH region. This indicated that the

DMRs with different methylation levels among all three Fayette lines were relatively rare in the genome [Figure 4.9B].

DMR-associated genes Based on locations of DMRs in the genome, the union set of DMRs in each context from all three pairwise comparisons were used to further identify genes that were potentially regulated by DNA methylation (DMR-associated genes). In the CG and CHH contexts, about 60% of total

DMRs overlapped either transcribed regions (5’UTR+gene body+3’UTR) or 2kb flanking regions of the gene. However, only 30% of methylated CHG regions were located at those regions. The genes harboring DMRs in transcribed regions were found predominately in the CG context whereas the genes with DMRs in their flanking regions outnumbered the genes with DMRs in transcribed regions in the CHG and CHH contexts. In total, the DMR-associated genes accounted for 3,844, 3,628 and 1,120 genes in the CG, CHG, and CHH contexts, respectively. Of these, we found 68 DMR-associated genes were differentially expressed in uninfected roots of Fayette lines with CNV at the Rhg1 locus (from RNA-seq analysis described in Chapter 3). Moreover, these

DMR-associated genes included 13 of the early response genes that were differentially expressed in Fayette#99 roots infected with SCN HG type 0 at 8 hpi, described in Chapter 2 (Miraeiz et al., submitted). Taken together, we found 76 DMR-associated DEGs were either related to CNV at the

Rhg1 locus or responsive to SCN infection. More than half of these genes contain CG DMRs in transcribed regions. Of these candidate genes, five genes were found to be differentially expressed in both the early-response and copy-number experiments (Glyma.04G230400, Glyma.02G028100,

Glyma.11G062500 and Glyma.14G136300, Glyma.19G076800), including three genes highly 88 expressed in Fayette#99 and induced in SCN infection (Glyma.04G230400 encoding an unknown protein, Glyma.11G062500 encoding cytochrome P450 71D8, and Glyma.19G076800 encoding a predicted lysine/histidine transporter). Segments in exonic regions of Glyma.11G062500 and

Glyma.19G076800 were hypomethylated in Fayette#99 relative to Fayette#01 in CHH and CG contexts, respectively whereas a hypermethylated CG site was located 2kb downstream of the gene boundaries of Glyma.04G230400. These three differentially methylated sites also contained annotated transposable elements [Figure 4.10 and Figure 4.11].

DMR-associated transposable elements (TEs) The proportion of DMRs mapped to regions either annotated as transposons or within 2kb of annotated transposon regions ranged from 68 - 85 % of the total DMRs identified in the three comparisons. Unlike the DMRs in gene regions, the proportion of DMRs in the CHG context were the highest, indicating that involvement of CHG methylation in TE expression control may lead to variation in Fayette lines. The DMRs were preferentially located at the 2kb flanking TE regions compared to those in TE regions in all methylation contexts. In total, we identified 2,233 CG,

7,081 CHG and 1,610 CHH DMR-associated TEs. Based on the annotation of TEs retrieved from

Phytozome (Schmutz et al., 2010) and SoyTE database (Du et al., 2010a), the DMR-associated

TEs were mainly classified into two classes, including class I - long terminal repeat (LTR) retrotransposons (Gypsy; RLC, Copia; RLG, and unknown; RLx) and class II – DNA transposons

(Helitron and two superfamilies of terminal inverted repeat (TIR) DNA transposons: CACTA and

Mutator-like; MULE). The class II DNA transposons were the most abundant in CHH DMRs, accounted for ~ 67% of total TEs. On the other hand, the class I retrotransposons, especially Gypsy superfamily, were predominately overlapped the DMRs in CHG sites. The CG DMRs were evenly distributed in the regions of two TE classes [Figure 4.10 and Figure 4.12].

89

DMR-associated sRNA clusters Small RNA clusters with at least 100 mapping sRNA reads were identified using the

ShortStack pipeline based on Fayette sRNA-seq data (see section on identification of sRNA clusters differentially expressed in Fayette lines ). We observed proportions of CHH DMRs that were distinct from the proportions of CG and CHG DMRs mapping to sRNA clusters. About 40

% of DMRs with the CHH context were overlapped with sRNA clusters, while sRNA clusters overlapped less than 10% of total DMRs in other contexts. There were 323, 801 and 551 sRNA clusters overlapped with DMRs in CG, CHG, and CHH contexts, respectively. The total and unique 1,510 DMR-associated sRNA clusters included four DMR-associated sRNA clusters in the

CHH context located within the Rhg1 locus. However, only in 17 CG, 20 CHG and one CHH

DMR-associated sRNA clusters did sRNA abundance show statistically significant differences among the copy-number variant Fayette lines. These represented a total of 28 sRNA clusters, mapping to unique 27 TEs and 32 genes with no significant alteration of their transcript levels found between Fayette lines [Figure 4.10, Figure 4.13 and Figure 4.14].

Discussion

In this study, we investigated profiles of sRNA expression and DNA methylation marks in uninfected lines isolated from a population of the Fayette cultivar that showed CNV at the Rhg1 locus, ranging from predicted values of eight to 11 copies, to determine potential transcriptional and posttranscriptional regulation associated with CNV. High similarity between expression levels of miRNA families were observed in uninfected roots of the CNV Fayette lines, except for downregulation of miR5037 and miR5374 in the 11-copy Fayette#99 line relative to the other

Fayette lines. Relatively few changes in expression of conserved miRNA families in the Fayette lines was consistent with the isogenic nature of these lines outside the Rhg1 CNV. However, the

90

Fayette#99 soybeans with high copy number at the Rhg 1 locus showed significantly altered levels of these miRNAs, possibly indicating a role for their down-regulation in preventing SCN infection. miR5037 was previously shown to be induced by freezing stress in Medicago (Shu et al., 2016) and involved in boron-deficiency response in Citrus sinensis leaves. Its role in boron deficiency response is thought to be by targeting genes encoding major facilitator superfamily protein in order to limit boron export and improve plant tolerance to boron-deficiency (Lu et al., 2015). In this study, the predicted targets of miR5037 in soybean were found to include two GRAS transcription factor genes (Glyma.05G045300 and Glyma.17G127500) which were closely related to a repressor of gibberellin (GA) responses (RGL1) functionally characterized in Arabidopsis (Wen and Chang,

2002). This indicates a possible role of the abiotic stress-related miR5037 in modulating the GA signaling response in Fayette lines. Consistent with the implications of our study, miR5374 was found to be downregulated in resistant soybean (G. max cv. Harbin xiaoheidou) roots infected with

SCN race 3 at 30 days after seedling emergence, and found to potentially target NBS-LRR transcripts (Li et al., 2012). The suppression of these two miRNA families in 11-copy Fayette#99, which shows higher resistance levels to SCN than the other Fayette lines (Lee et al., 2016), suggests that posttranscriptional regulation of hormone signaling and plant immunity could contribute to the enhanced resistance of this high copy isolate relative to other Fayette lines. The role of these miRNAs in SCN resistance needs to be further explored.

Defense-related genes were potentially regulated by siRNAs in Fayette lines The majority of sRNA-associated genes were annotated as NBS-LRR genes. The NBS-

LRR genes, also known as resistance (R) genes, play essential roles in pathogen recognition by detecting effector molecules and initiating effector-triggered immunity (ETI). The activation of

ETI leads to rapid and locally strong defense responses such as the hypersensitive response (HR) and programed cell death (PCD). We also found alterations in expression of these NBS-LRR genes 91

(and other sRNA-associated genes related to defense responses) in RNA-seq analysis in Fayette lines with CNV at the Rhg1 locus using the same samples (see Chapter 3). The co-localization of differentially expressed sRNA clusters located in or near the differentially expressed genes indicated the involvement of sRNAs in regulating gene expression, notably the NBS-LRR genes found in this study. However, the transcript levels of the genes were usually positively correlated with the level of sRNAs mapping to the locus, suggesting indirect silencing mechanisms or sRNAs as a degradation byproduct (Li et al., 2015). A previous large scale sRNA-seq study on several soybean tissues demonstrated that NBS-LRR genes in soybean preferentially harbor PHAS loci for generating phasiRNAs triggered by miRNAs (Arikit et al., 2014). This mechanism was also reported to play a role in biotic interactions between plants and pathogens (Shivaprasad et al.,

2012; Zhao et al., 2015) and in symbiotic interactions (Zhai et al., 2011). In this study, we detected two predicted PHAS loci with differentially expressed sRNAs and located in differentially expressed NBS-LRR genes (Glyma.13G193300 and Glyma.13G194900). The Glyma.13G194900 was a predicted target gene of miR1510 and produced phasiRNA in soybean (Fei et al., 2018) whereas Glyma.13G193300 was also predicted as a PHAS locus in another and independent soybean study (Sun et al., 2016). Due to the high levels of sequence similarity between NBS-LRR genes, the target prediction of phasiRNAs generated from these PHAS loci showed that they could potentially regulate several loci in both cis and trans, indicating a high complexity of sRNA regulation network in defense response with many NBS-LRRs potentially affected by this system.

Based on the presence of PHAS loci in NBS-LRR genes in the CNV Fayette lines, we hypothesize that these phasiRNAs act to suppress some of these NBS-LRR transcripts during SCN infection, controlling the impact of ETI on defense responses and leading to different levels of SCN resistance.

92

DMR-associated genes reveals candidate genes potentially under DNA methylation control CG methylated regions were more frequently associated with transcribed regions than with flanking regions in the data generated in this study. In soybean, CG body methylation was previously found to be abundant in whole genome duplication (WGD) genes, which were more highly expressed than single-copy genes (Kim et al., 2015b). A more recent study, however, showed that the methylation levels of the first intron were negatively correlated with gene expression levels (Anastasiadi et al., 2018). For DMR-associated genes, we found five candidate genes that were possibly regulated by DNA methylation and showed differential expression in response to both CNV at the Rhg1 locus and early SCN infection. Some of these candidate genes have predicted functions directly related to defense response. Differential CG methylation was observed in the transcribed regions of Glyma.19G076800. This gene encodes a predicted amino acid transporter closely related to Arabidopsis lysine histidine transporter 1 (ATLHT1), which is involved in the uptake of the ACC precursor for ethylene biosynthesis (Shin et al., 2015). In addition, CHH hypomethylation was seen in transcribed regions of Glyma.11G062500, encoding a cytochrome P450 (CYP71D8), a gene also highly expressed in Fayette#99 and induced during early SCN response. The CYP71D8 gene is one of genes related to glyceollin biosynthesis and activated by the elicitors described in a prior study (Schopfer and Ebel, 1998). We also found

Glyma.02G028100, encoding a matrix metalloproteinase, was differentially methylated in the upstream regions in CG and CHG contexts. The matrix metalloproteinase takes part in defense responses to stress, such as in the interaction of Arabidopsis with necrotrophic and biotrophic pathogens (Zhao et al., 2017a) as well as biotic and abiotic stress responses in soybean (Liu et al.,

2001; Cho et al., 2009). Any role of these genes in suppressing SCN parasitism still needs to be further investigated, but the differential expression and methylation of clear defense-related genes

93 in the high-copy Rhg1 lines is highly suggestive of a role of Rhg1 copy number in creating a transcriptional profile able to suppress SCN.

Unusually high levels of genome-wide CHH methylation in Fayette#01 The WGBS revealed distinct methylation profiles in leaves of the three Fayette CNV lines.

Consistent with genome-wide expression profiles of genes and sRNAs, the global methylation patterns in low-copy Fayette (8-9 copies) were less similar than those in the other two Fayette lines, even though the detection of CNV at the Rhg1 locus categorized Fayette#19 as carrying 9-

10 predicted copies of the repeat based on statistical tests of differences in the copy number and

SCN assay tests between Fayette lines and PI 88988 (P > 0.05) (Lee et al., 2016). A marginal increase in methylation levels in was observed in Fayette#99 compared to Fayette#19. On the other hand, hypomethylation was detected generally throughout the whole genome in mid-copy-Rhg1

Fayette (9-10 copies) and high-copy-Rhg1 Fayette (11 copies) relative to low-copy-Rhg1 Fayette

(8-9 copies), especially in non-CG contexts. According to the roles of non-CG methylation characterized in Arabidopsis (Stroud et al., 2014), this suggested extensive changes in histone modification and sRNA regulation occurring in Fayette#01 compared to the other Fayette lines.

Further characterization of DMRs based on their genomic locations revealed that the non-CG methylated regions tended to fit with annotated transposon regions and sRNA clusters in Fayette lines. Consistent with previous DNA methylation studies in plants such as soybean (Song et al.,

2013) and sugar beet (Zakrzewski et al., 2017), CHH methylation was found to be associated with class II DNA transposons and the presence of sRNAs. In contrast to CHH methylation, the retrotransposon regions tended to harbor CHG methylation sites. These different methylation contexts may be involved in regulating different classes of TEs. Moreover, the 24-nt sRNAs were prominent in DMR-associated sRNA clusters which overlapped TE regions, indicating the contribution of RNA-directed DNA methylation (RdDM) to reinforce silencing of the TEs. 94

However, most of the sRNAs in DMR-associated sRNA clusters were not differentially expressed among Fayette lines. Thus, elevation of CHH methylation levels via RdDM in

Fayette#01, compared to the other lines, was not supported by sRNA data. Since WGBS analysis was performed in just one biological replicate for each Fayette line in this study, these results require confirmation using an experiment with more samples to avoid biases due to variation of

DNA methylation in plant samples. Moreover, Fayette leaves were used for this DNA methylation experiment. The dynamic and tissue-specific patterns of DNA methylation have been reported in past studies, for example, CHH methylation level changes over time during soybean seed development (An et al., 2017), lower CHH methylation has been found in roots compared to other organs in soybean (Song et al., 2013), SCN induced hypomethylation in infected soybean roots

(Rambani et al., 2015), and hypomethylation was induced during sugar beet cyst nematode parasitism in Arabidopsis (Hewezi et al., 2017). The large-scale hypomethylation of higher copy

Fayette lines observed in leaves provides preliminary results supporting a hypothesis of DNA methylation involvement in Rhg1-mediated resistance, which can be further validated in Fayette roots during SCN infection. Such validation could be achieved using lower cost, locus-specific technologies such as PCR-based methods. sRNAs at the Rhg1 locus possibly involved in locally DNA methylation and also regulate other genes in trans. High levels of sRNAs with sequences matching the Rhg1 locus, especially in the intergenic regions, were found in all three Fayette lines, but neither gene transcription levels nor sRNA abundance showed a statistically significant difference between the different copy number lines.

These results suggested a slight increase in the copy number at the Rhg1 locus (within the range of 1-2 copies) might not affect the expression of sRNA or mRNA sufficiently to be detectible in the assay. However, DMRs were identified in this locus based on the generally higher levels of

95

CHH methylation were observed in Fayette#01 than those in other Fayette lines. A previous study showed high methylation levels in all contexts at the Rhg1 locus in soybeans with three copies relative to the methylation in a single copy soybean (Cook et al., 2014); however this was a comparison between resistant and susceptible lines and a 3-fold copy number difference. The

Fayette lines harbor 8-11 copies; the difference is small but the lowest copy line showed somewhat greater methylation. This is likely the result of natural variation between biological lines, and biological replicates are necessary to validate any predicted differences. We speculated, however, that the presence of 24nt sRNAs overlapping the DMRs in Fayette may trigger RNA-directed DNA methylation to maintain DNA methylation marks that locally regulate the Rhg1 genes. Moreover, two PHAS loci were predicted at this locus, revealing another potential role of sRNAs produced at the Rhg1 locus that may regulate another genes in trans.

Conclusion

This study revealed a number of changes outside the Rhg1 locus that appear to be under the control of the repeat at this locus, even with the relatively small percentage change of the studied lines in CNV at the Rhg1 locus, ranging from 8-11 copies. These changes, in mRNA and sRNA and genome methylation, indicate potential regulatory mechanisms whereby the CNV at the Rhg1 locus may control SCN resistance. We identified key miRNAs and phasiRNAs that we predict regulate the transcription levels of defense-related genes such as NBS-LRR genes, leading to complex alterations in the sRNA regulatory network in Rhg1-mediated SCN resistance in

Fayette lines. Our results suggest that manipulation of the sRNAs implicated in this mechanism could be an alternative method to enhance SCN resistance in soybeans. In addition, we found a putative association of differentially methylated DNA sites with genes, TEs and sRNA clusters, providing preliminary evidence for epigenetic control of gene expression and transposon silencing

96 initiated by the Rhg1 repeat and mediated by sRNAs via the RdDM process. We were unable to generate enough biological replicates for statistically valid WGBS analysis, however, our results can be used to identify candidate genes for further, gene-level analysis and potential application to

SCN resistance.

97

Figures and Tables

Figure 4.1 Identification of micro RNAs (miRNAs) in Fayette lines. A) Multidimensional scaling (MDS) plot showing high similarity of miRNA expression between Fayette lines with different Rhg1 copy numbers B) Boxplot showing expression profiles of two miRNA families differentially expressed between uninfected roots of lines with different Rhg1 copy number.

98

Figure 4.2 Expression profiles of small RNAs (sRNAs) and genes in the repeat unit of Rhg1 locus in Fayette#01 (low, 8-9 predicted copies at the Rhg1 locus), Fayette#19 (mid, 9-10 predicted copies at the Rhg1 locus), and Fayette#99 (high, 11 predicted copies)

A) The boxplot showing sRNA expression at the Rhg1 locus based on sRNA-seq data. The non-transcribed regions are represented by intergenic # for example, intergenic 3 refers to the non-transcribed interval between Glyma.18G022400 and Glyma.18G022500. B) The boxplot showing normalized expression of genes in the Rhg1 locus generated using RNA-seq. C) The boxplot showing relative expression of Rhg1 genes generated using qRT-PCR. The relative expression levels of each gene were normalized to the expression level of the reference gene, Glyma.12G051100 (SKIP16).

99

Figure 4.3 Genomic alignment diagrams of the Rhg1 locus showing small RNAs indicating the presence of two PHAS loci. Locus#1 (Chr18:1637483-1640792) mapping to promoters and transcribed regions of Glyma.18G022400 (amino acid transporter). Locus#2 (Chr18:1652929- 1655160) overlapping transcribed regions and downstream of Glyma.18G022700 encoding a putative wound inducible protein.

100

Figure 4.4 sRNA clusters and sRNA-associated genes in Fayette lines A) Heatmap showing the expression profiles of 300 sRNA clusters differentially expressed between Fayette lines with different Rhg1 copy number. All are significant at corrected P < 0.05. The color represents normalized expression levels (log2 count per million mapped reads) standardized across samples. The darker colors, the higher expression levels. B) Whole genome view diagram showing uneven distribution of 248 sRNA-associated genes (red) located on chromosome 2, 3, 8 and 13. These loci included 33 NBS-LRR genes (upside down black triangle) preferentially located on chromosome 3 and 13. C) Over-represented GO terms found in the list of sRNA-associated genes

101

Glyma.02G012100 Glyma.02G028400 Glyma.02G059900 Glyma.03G038700 Glyma.03G043000 Glyma.03G043500 Glyma.03G043600 Glyma.03G045300 Glyma.03G045600

Cluster_13588 Cluster_14093 Cluster_15134 Cluster_28554 Cluster_28804 Cluster_28818 Cluster_28826 Cluster_28915 Cluster_28937 6 6 ● ● ●●●●● 8 6 ● ● ●● 6 ● 6 ● ● ● ●●● ● ● ● ● 6 ●● ●● ●●●● ● 6 ● ● ● ● ●● ●● ● ●●●●● 6 ● ●● ●● ●● ●● ●●● 6 ● ● ● ● ● ● 4 ● ● 4 ● 4 ● 4 ● ● ● ● 4 ● 4 ● 4 4 ● 4 ● ● 2 ● ● ● ●● 2 ● 2 2 2 ● ● ● 2 ● 2 2 2 ● ● ● ● 0 ●●●●●● 0 ●● ● 0 0 0 0 0 0 0 0 1 2 3 0 2 4 6 0 1 2 3 4 0 2 4 6 8 0 1 2 3 0.0 0.5 1.0 1.5 2.0 2.5 0 1 2 3 0.0 0.5 1.0 1.5 2.0 0.0 2.5 5.0 7.5

Glyma.03G047000 Glyma.03G054100 Glyma.03G054100 Glyma.03G100500 Glyma.03G100500 Glyma.08G317400 Glyma.08G318000 Glyma.08G318000 Glyma.08G321800

Cluster_29025 Cluster_29310 Cluster_29316 Cluster_32279 Cluster_32281 Cluster_102336 Cluster_102349 Cluster_102358 Cluster_102486 8 6 8 6 ●● ●●● 6 ● ● ●● ● ●●● 6 ●● ●● ● ● ● ● ● ● 6 ●● ●● ● ●● ● ● 6 ● ● ● ● ●● ● ● 6 ● ● ● ● ● ● ●● ● ● ● 6 ● ● ● ● 4 ● ● ●● ● 4 ●● 4 4 4 4 ● ● 4 ● ●● 4 4 ●● ● ● ● 2 2 2 2 2 2 ● 2 2 2 ● 0 ●● ● ● 0 ●● 0 ●● 0 ● 0 ● 0 0 0 0 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 0 1 2 3 4 0 1 2 3 4 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 4 5

Glyma.08G336200 Glyma.08G338900 Glyma.08G338900 Glyma.13G184800 Glyma.13G184800 Glyma.13G184800 Glyma.13G184900 Glyma.13G188300 Glyma.13G190300

Cluster_103005 Cluster_103127 Cluster_103128 Cluster_161655 Cluster_161656 Cluster_161657 Cluster_161662 Cluster_161811 Cluster_161884 10.0 ● 10.0 8 8 ● ● ● ●● ● ● ●● ●● ● ●●●● 6 ● ● ● 6 ● ●●● ●● ●●●● ● ●●●●● ●● ● 6 ●● ● ● ● ● ● ● ● ● ● ●● ● 6 ● ● 9 ●● ● 7.5 7.5 ●● ● 6 6 ●● ● ● ● ● ●● ● ● ●●● ●● ●●● ● ● ● ● 4 ● ●● ● ● 4 4 4 ● 4 4 ●● ● 6 5.0 5.0 ● ● ●● ● ● ● 2 ● 2 2 2 2 ● ● 2 ● 3 2.5 2.5 ● ●

0 0 0 0 0 0 0 0.0 0.0 s

l 0 1 2 3 4 0 2 4 6 0 2 4 6 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3

e

v e

l Glyma.13G190400 Glyma.13G190400 Glyma.13G190400 Glyma.13G190400 Glyma.13G190800 Glyma.13G190800 Glyma.13G190800 Glyma.13G190800 Glyma.13G193300 A

N Cluster_161890 Cluster_161891 Cluster_161892 Cluster_161897 Cluster_161903 Cluster_161904 Cluster_161905 Cluster_161906 Cluster_162017

R s ● ● 6 ● 8 ●● ● ● ● 8 ● ● ● ● ● ●● ●● ●● ● ● ●● ● ● ● 6 ● 7.5 ● ●●● 7.5 ● ● ● ●● ● 5.0 ●● ● ●● ●● 5.0 ●● ●●● ● ● ● ● ● ● ● ● ● 6 ● ●● ●●●● ● ● ● ● ● ● ●● ● 6 ● 6 ● 4 ● ● ● ● 4 5.0 ● 5.0 4 4 2.5 ● 4 2.5 ●● ● ● ● ● ● ● 2 ● 2 2 2.5 2.5 2 ● ● 2 ●● 0.0 ● ●● 0.0 ● ●●● 0 ●● 0 0.0 0.0 0 0 0 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 0 1 2 3

Glyma.13G193300 Glyma.13G193300 Glyma.13G193300 Glyma.13G194100 Glyma.13G194100 Glyma.13G194100 Glyma.13G194100 Glyma.13G194800 Glyma.13G194900

Cluster_162018 Cluster_162019 Cluster_162020 Cluster_162051 Cluster_162052 Cluster_162053 Cluster_162055 Cluster_162087 Cluster_162095 6 ● 9 8 10.0 6 ● ● ● ● ● ● ● ●●● ●● ● ● ● ●●● ● ● ● ● ● ● ● ● 7.5 ● ●●● ● ● 7.5 ● ● ● 7.5 ●●● ● 6 ● ● 6 ●● ● ● 6 ●● ●● ●● 7.5 ● 4 ● ● ● 4 ● ● ● 5.0 5.0 ● ● 5.0 4 ● ●● 4 5.0 ●● 3 ● ● ● ● ● ●● ●● ● ●● 2 2 2.5 ●● ● 2.5 2.5 ● 2 ● 2 ● 2.5 0 ● ●●●●● ● ● 0.0 ●● ● 0 0 0.0 0.0 0 ● ● 0 ● 0.0 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 3 0 1 2 3

Glyma.13G199500 Glyma.13G204100 Glyma.13G208800 Glyma.15G026500 Glyma.15G034000 Glyma.17G245100 Soybeans Cluster_162279 Cluster_162423 Cluster_162597 Cluster_180446 Cluster_180683 Cluster_214045 8 6 ● Fayette#01 6 ● ●● ●●● ●●●● ● ● ●●● ● ● 6 ●● ● ●● ●● 6 ●● 6 ● ● ● ●●●● ● ●● ● ● 4 ● ● ● ● ●●● ●● 4 Fayette#19 4 ●● 4 ● ● ● ● 4 4 ● ● 2 ● ● 2 ● ● 2 2 0 ●●●●● ● Fayette#99 2 ●●● 2 −2 0 0 0 0 0 0 2 4 6 0.0 2.5 5.0 7.5 0 1 2 0 2 4 0 1 2 0 2 4 6 mRNAlevels

Figure 4.5 Scatter plots showing the correlation between expression levels of 33 sRNA-associated differentially expressed genes (DEGs) and sRNA levels in each sRNA clusters mapping to the same locus. Each dot denotes each biological replicate of three Fayette lines, including Fayette#01 (red), Fayette#19 (green), and Fayette#99 (blue).

102

Figure 4.6 Genomic read alignment diagrams showing sRNA-seq and RNA-seq reads mapping to a predicted PHAS locus in an NBS-LRR gene (Glyma.13G194900). The dashed lines represent an interval of the PHAS locus, residing in NB-ARC domain of the gene with detectible read depth within the interval in both sRNA-seq and RNA-seq data. The sRNA-seq data showing high levels of sRNA abundance identified in Fayette#19 and Fayette#99 in relative to those in Fayette#01. The opposite expression patterns of mRNAs to those of sRNAs was found in RNA-seq data, representing a negative correlation between sRNA and mRNA expression levels in this PHAS locus. The alignment of sRNA reads below showing phasing patterns of mapped sRNA reads and a locus that was a predicted target site of miR1510 to trigger phased siRNA biogenesis.

103

Percentage of C methylation calculated from all methylation calls

Fayette#01 Fayette#19 Fayette#99 60.00%

52.80% 51.90% 50.00% 49.60%

40.00%

32.90%

30.00% 28.50% 29.00%

20.00%

10.50% 10.10% 10.00% 7.50% 4.70%

1.50% 1.30% 0.00% C methylated in CpG C methylated in CHG C methylated in CHH C methylated in context context context unknown context (CN or CHN)

Figure 4.7 Analysis of cytosine methylation in whole genome bisulfite sequencing (WGBS) data using Bismark. Bar graph showing the percentage of global DNA methylation calls in three contexts (CG, CHG, and CHH) in three Fayette lines with varying Rhg1 copy number, including Fayette#01 (blue), Fayette#19 (orange), and Fayette#99 (grey).

104

Figure 4.8 Circular plots showing density of differentially methylated regions (DMRs) across the soybean genome, identified in three pairwise comparisons of three Fayette lines with different Rhg1 copy number (Fayette#19 vs Fayette#01, Fayette#99 vs Fayette#01, and Fayette#99 vs Fayette#19). Four main tracks display the density of DMRs in CG context, CHG context, CHH context and gene density, respectively. Each main track contains two sub-tracks showing hypermethylated regions (red) and hypomethylated regions (blue).

105

Figure 4.9 Bar graphs showing the number of differentially methylated regions (DMRs) A) Number of regions where hypermethylation (red) and hypomethylation (blue) was detected in each context for each comparison between three Fayette lines with different Rhg1 copy number, including Fayette#19 vs Fayette#01 (F19/F01), Fayette#99 vs Fayette#01 (F99/F01), and Fayette#99 vs Fayette#19 (F99/F19). B) Number of DMRs in the union set identified from the three pairwise comparisons of Fayette lines with different Rhg1 copy number. In total, 5,969 CG, 10,502 CHG, and 1,448 CHH DMRs were identified.

106

Figure 4.10 Percentage of differentially methylated regions (DMRs) between Fayette copy variant lines that mapped to genomic regions annotated as genes, Transposable Elements (TEs) and sRNA clusters

A) Percentage of DMRs mapped to genes and 2kb flanking regions around genes B) Percentage of DMRs mapped to TEs and 2kb flanking regions around TEs C) Percentage of DMRs mapped to sRNA clusters D) Number of DMR-associated genes, DMR-associated TEs, and DMR-associated sRNA clusters identified in the CG, CHG and CHH methylation contexts.

107

Figure 4.11 Differentially methylated region (DMR)-associated genes A) Number of genes harboring DMRs in gene regions (dark blue), 2kb upstream regions (grey) and 2kb downstream regions (light blue) in three different methylation contexts (CG, CHG, and CHH). B) Number of DMR-associated genes and DMR-associated differentially expressed genes (DEGs). The DEGs denote the genes previously identified in uninfected Fayette lines with different copy numbers or in Fayette#99 induced by SCN infection at 8 hours post inoculation. C) Gene ID and annotation of five DMR-associated DEGs related to CNV at the Rhg1 locus and induced in early response. D) Methylation profiles of two candidate genes (Glyma.19G076800 and Glyma.11G062500) in three methylation contexts (mCG, mCHG, and mCHH). Each dot represents a weighted methylation level in each 500 bp windows measured in each of three Fayette samples, including Fayette#01 (black), Fayette#19 (green), and Fayette#99 (dark purple). The rectangle represents different regions, including sRNA clusters (grey), DMR-associated TEs (light purple), hypermethylated regions (red) and hypomethylated regions (blue).

108

Figure 4.12 Differentially methylated region (DMR)-associated transposable elements (TEs)

A) Number of TEs harboring DMRs in TE regions (dark green), 2kb upstream regions (grey) and 2kb downstream regions (light green) in three different methylation contexts (CG, CHG, and CHH). B) Proportion of DMR-associated TEs categorized into two classes. Class I - long terminal repeat (LTR) retrotransposons included three superfamilies (Gypsy; RLC, Copia; RLG, and unknown; RLx). Class II – DNA transposons included Helitron and two superfamilies of terminal inverted repeat (TIR) DNA transposons: CACTA and Mutator-like; MULE). The methylation was illustrated in in three different methylation contexts (CG, CHG, and CHH)

109

Figure 4.13 Differentially methylated region (DMR)-associated sRNA clusters

A) Number of DMR-associated sRNA clusters, number of DMR-associated sRNA clusters differentially expressed between Fayette lines with different Rhg1 copy number, number of genes overlapped DMR-associated sRNA clusters differentially expressed between Fayette lines with different Rhg1 copy number, and number of DEGs overlapped DMR- associated sRNA clusters differentially expressed in Fayette lines with different Rhg1 copy number B) Locations of DMRs, sRNA clusters and TEs on chromosome 13 in three methylation contexts (mCG, mCHG, and mCHH). The rectangle represents different regions, including sRNA clusters (grey), DMR-associated sRNA clusters (orange), DMR-associated TEs (light purple), hypermethylated regions (red) and hypomethylated regions (blue).

110

Figure 4.14 Differentially methylated region (DMRs) located at the Rhg1 locus that overlap annotated genes and sRNA clusters. The chromosome bar represents a segment on chromosome 18 (1620000-1670000 bp), containing three expressed genes (Glyma.18G022400, Glyma.18G022500, and Glyma.18G022700) in a repeat unit of Rhg1 locus. Three panels on top of chromosome bar showing the methylation profiles in three methylation contexts (mCG, mCHG, and mCHH). Each dot represents a weighted methylation level in each 500 bp windows measured in each of three Fayette samples, including Fayette#01 (black), Fayette#19 (green), and Fayette#99 (dark purple). The rectangle represents different regions, including sRNA clusters (grey), DMR- associated sRNA clusters (orange), DMR-associated TEs (light purple), hypermethylated regions (red), and hypomethylated regions (blue). Nine panels below the chromosome bar represent the DMRs identified in each methylation context from three different pairwise comparisons, including Fayette#19 vs Fayette#01 (F19/F01), Fayette#99 vs Fayette#01 (F99/F01), and Fayette#99 vs Fayette#19 (F99/F19).

111

Table 4.1 Prediction of micro RNA (miRNA) loci in Fayette lines. 16 small RNA clusters were predicted to be miRNA loci using ShortStack pipeline. Of these, 11 miRNA loci generating either new mature miRNAs (eight loci) or variants of known mature miRNAs (three loci) in Fayette lines and five precursor loci confirmed precursors previously found in the miRbase (shaded in grey).

Cluster Dicer Homologous Homologous miRNA Locus Major RNA reads Genomic loci ID call miRNA in miRBase sequences in miRBase Chr19:1947558- Cluster_ >gma-miR5786 UGUCGCAGGAUAGAGGGCACUG 22 UGUCGCAGGAUAGAGGGCACU Intergenic 1947775 230083 MIMAT0023222 Chr09:17134641- Cluster_ >gma-miR1507c-3p CCUCAUUCCAAACAUCAUCUAA 22 CCUCAUUCCAAACAUCAUCU Intergenic 17134930 109108 MIMAT0021006 Chr07:5432604- Cluster_ GAGCUCUCAGCACUCCAGUGU 21 no match Glyma.07G061200 5432870 77948 Chr07:35657345- Cluster_ >gma-miR397a UCAUUGAGUGUAGCAUUGAUG 21 UCAUUGAGUGCAGCGUUGAUG Intergenic 35657439 84965 MIMAT0021627 Chr13:36156849- Cluster_ AAAUUGGAUAUAUAUCAAGGACCU 24 no match Glyma.13G256100 36157130 164215 Chr01:33062176- Cluster_ >gma-miR2111a CUCCUUGGGAUACAGAUUAUC 21 GUCCUUGGGAUGCAGAUUACG Intergenic 33062253 6160 MIMAT0022457 Chr09:49753895- Cluster_ >gma-miR5774a UCGACACGUGGCAUGAGACUA 21 GCUGGCGUCGACACGUGGCAU Intergenic 49754113 117397 MIMAT0023176 Chr05:40444487- Cluster_ >gma-miR10419 CAAAAAACACAUAUGAAACUG 21 CAAAAAACACAUAUGAAACUG Glyma.05G225800 40444640 61118 MIMAT0041665 Chr15:3707147- Cluster_ UACGUUCUGUAUCCAUUUUCC 21 no match Glyma.15G046500 3707324 181044 Chr15:6767917- Cluster_ UGCGCUACAGGUAUAGGGACC 21 no match Glyma.15G088100 6768097 182115 Chr01:56697631- Cluster_ CAGGUGAUUCGUAAAACUCAC 21 no match Intergenic 56697709 13023 Chr05:39506572- Cluster_ AUGGGUGACAGUAAGAUUCAAU 22 no match Intergenic 39506768 60732 Chr16:37660412- Cluster_ AGGACUAAAAGUAAAAAGCACU 22 no match Glyma.16G219200 37660636 202981

112

Table 4.1 Continued Cluster Dicer Homologous Homologous miRNA Locus Major RNA reads Genomic loci ID call miRNA in miRBase sequences in miRBase Chr05:36014268- Cluster_ GGCUCUGUGAAUCUGUCUCCGA 22 no match Glyma.05G169800 36014412 59449 Chr19:1832784- Cluster_ >gma-miR4413a UGAUACAGUGACUUACAAUUC 21 AAGAGAAUUGUAAGUCACUG Glyma.19G017600 1832904 230032 MIMAT0018338 Chr18:16051456- Cluster_ >gma-miR395a CUGAAGUGUUUGGGGGAGCUU 21 CUGAAGUGUUUGGGGGAACUC Intergenic 16051584 220315 MIMAT0021624

113

Table 4.2 Prediction of PHAS loci for potentially generating phased siRNAs (phasiRNAs) in Fayette lines. 55 PHAS loci were identified, including two loci located in the repeat unit of the Rhg1 locus (bold). The siRNAs in two PHAS loci were differentially expressed and also overlapped genes differentially expressed between Fayette lines with different copy number (highlighted in orange).

Major Phase Phase Locus Cluster RNA Genomic Loci Annotation Size Score Reads common promoter of Chr01:52908699-52909341 Cluster_11620 630 20 121.2 intergenic Glyma.01G194800- Glyma.01G194900 Chr02:2814079-2816537 Cluster_14165 70 21 101 Glyma.02G030500 NBS-LRR Chr03:18686167-18686737 Cluster_30586 1266 21 262.5 Glyma.03G075300 NBS-LRR Chr03:25988024-25988962 Cluster_31489 113 21 110.7 Glyma.03G087600 NBS-LRR DTW domain-containing Chr03:35270985-35272331 Cluster_34687 387 21 128.1 Glyma.03G136600 protein Chr04:21176905-21179420 Cluster_44996 208 21 157.9 Glyma.04G137800 NBS-LRR Chr04:46771881-46772682 Cluster_48850 4368 21 626.3 Glyma.04G195800 NA Chr04:49035279-49037832 Cluster_49788 9313 21 1634.6 Glyma.04G219600 NBS-LRR Chr05:38268881-38270070 Cluster_60215 985 21 186 Glyma.05G198500 XS domain-containing protein Chr05:656561-659174 Cluster_51414 673 21 345.7 Glyma.05G006900 NBS-LRR Chr06:11932940-11934246 Cluster_66283 3769 21 446 Glyma.06G146200 NBS-LRR Chr06:19392854-19394272 Cluster_69558 6756 21 1729.1 Glyma.06G205100 NBS-LRR Chr06:43286820-43290286 Cluster_72541 2107 21 493 Glyma.06G254300 NBS-LRR Chr06:45089044-45090508 Cluster_73151 3196 21 544.5 Glyma.06G263500 NBS-LRR Chr06:47410360-47411454 Cluster_74144 405 21 163 Glyma.06G285500 NBS-LRR NAC domain comtaining Chr07:4028403-4029792 Cluster_77434 4655 21 763.1 Glyma.07G048000 protein NAC domain comtaining Chr07:4043422-4044570 Cluster_77441 2037 21 517.5 Glyma.07G048100 protein Glyma.08G179100 and Chr08:14355437-14358167 Cluster_94503 2235 24 316.1 intergenic Glyma.08G179000

114

Table 4.2 Continued Major Phase Phase Locus Cluster RNA Genomic Loci Annotation Size Score Reads Chr08:41922814-41923905 Cluster_101559 1195 20 122.4 Glyma.08G301200 NBS-LRR Chr08:41924119-41924724 Cluster_101560 872 21 147.5 Glyma.08G301200 NBS-LRR Chr09:2736883-2737914 Cluster_105090 2299 21 318.2 Glyma.09G032400 NA Glyma.09G281600 Chr09:49693551-49694319 Cluster_117367 140 21 176.4 intergenic dowmstream- Glyma.09G281500 Chr11:30571590-30574287 Cluster_139462 1224 21 611 Glyma.11G212800 NBS-LRR Chr12:14942331-14943305 Cluster_146893 140 21 187.4 Glyma.12G132200 NBS-LRR Chr12:39528729-39532374 Cluster_151950 808 24 169.5 Glyma.12G236500 NBS-LRR GRAS family transcription Chr13:19079809-19081369 Cluster_156834 131 21 124.7 Glyma.13G081700 factor Glyma.13G083000(GDP- fucose protein O- fucosyltransferase), Chr13:19268507-19270175 Cluster_156916 5852 21 475.3 intergenic Glyma.13G083100(DUAL SPECIFICITY PROTEIN KINASE TTK) Chr13:30664782-30665581 Cluster_162020 1035 21 150.5 Glyma.13G193300 NBS-LRR Chr13:30836086-30837210 Cluster_162095 192 21 115.1 Glyma.13G194900 NBS-LRR Chr14:1746879-1747476 Cluster_168610 191 21 156.5 Glyma.14G024400 NBS-LRR between Glyma.14G155900 (ZINC FINGER CLIPPER - Chr14:33937892-33938552 Cluster_175711 725 21 102.3 intergenic RELATED),Glyma.14G156000 (F14J8.16 PROTEIN- RELATED) common downstream of Chr14:6866779-6868844 Cluster_170444 738 24 119.3 intergenic Glyma.14G079900 and Glyma.14G079800 Chr15:15008639-15010541 Cluster_185698 892 20 156.2 Glyma.15G168500 NBS-LRR Chr15:15011282-15011745 Cluster_185699 1025 21 172.3 Glyma.15G168500 NBS-LRR

115

Table 4.2 Continued Major Phase Phase Locus Cluster RNA Genomic Loci Annotation Size Score Reads Chr15:41539939-41543351 Cluster_189645 629 21 145 Glyma.15G226100 NBS-LRR Chr15:43699364-43700634 Cluster_190088 297 21 171.6 Glyma.15G232400 NBS-LRR Chr15:43720423-43721620 Cluster_190101 203 21 165.6 Glyma.15G232600 NBS-LRR between Glyma.15G236400, Chr15:44545761-44548227 Cluster_190386 9036 21 1724.1 intergenic Glyma.15G236400 Chr16:10358688-10359220 Cluster_197089 1628 21 227.7 Glyma.16G085900 NBS-LRR Chr16:10648224-10648931 Cluster_197173 420 21 270.1 Glyma.16G087100 NBS-LRR NAC domain comtaining Chr16:1452614-1453782 Cluster_193833 355 21 114.7 Glyma.16G016600 protein NAC domain comtaining Chr16:1461638-1463784 Cluster_193837 6430 21 550.6 Glyma.16G016700 protein between Glyma.16G101900 Chr16:20513877-20514428 Cluster_198170 482 20 121.9 intergenic and Glyma.16G101800 promoter + 5' end Chr16:30784375-30785322 Cluster_200997 6158 21 1146.8 Glyma.16G147100 (NA) Glyma.16G147100 Chr16:31891181-31894678 Cluster_201448 1378 21 281.5 Glyma.16G159100 NBS-LRR Chr16:5769781-5771773 Cluster_195553 5778 21 1075.8 Glyma.16G058900 NA miR5372,miR4409,RPP Chr17:40464527-40464896 Cluster_214216 740 21 267.2 Glyma.17G249500 NA Chr17:40465109-40465356 Cluster_214217 362 21 131.4 Glyma.17G249500 NA Chr18:13208541-13209402 Cluster_219552 275 21 163.8 Glyma.18G111400 NBS-LRR promoter and amino acid transporter Chr18:1637483-1640792 Cluster_215372 972 21 162.7 Glyma.18G022400 (Rhg1) Glyma.18G022700 + Chr18:1652929-1655160 Cluster_215384 1169 23 214.3 W12 (Rhg1) downstream Callose synthase 1 family Chr18:5084146-5084934 Cluster_216654 405 22 136.9 Glyma.18G057600 protein Protein AUXIN SIGNALING Chr19:34719663-34720641 Cluster_235436 477 21 169 Glyma.19G100200 F-BOX Chr19:40038232-40039049 Cluster_237871 233 21 105.1 Glyma.19G138900 NA

116

Table 4.2 Continued Major Phase Phase Locus Cluster RNA Genomic Loci Annotation Size Score Reads between Gyma.19G138900 and Chr19:40041016-40042617 Cluster_237872 488 21 143.5 intergenic Glyma.19G139000

117

Table 4.3 Bismark report summary showing total whole genome bisulfite sequencing (WGBS) reads analyzed in each Fayette sample, mapping efficiency, number of cytosines (C’s) processed during methylation calls, and global methylation levels in three contexts (CG, CHG, and CHH)

Fayette#01 Fayette#19 Fayette#99 Sequence pairs analyzed in total 61709042 66268997 54686129 Number of paired-end alignments with a unique best hit 41086371 44270920 36961882 Mapping efficiency 66.60% 66.80% 67.60% Sequence pairs with no alignments under any condition: 15191955 15562470 13644034 Sequence pairs did not map uniquely: 5430716 6435607 4080213 Sequence pairs which were discarded because genomic sequence could not be extracted: 193 235 177

Total number of C's analyzed: 1660700580 1733556582 1560255054

Total methylated C's in CpG context: 79256917 80233153 72467572 Total methylated C's in CHG context: 63953956 58912262 53133664 Total methylated C's in CHH context: 62477861 19805211 15605157 Total methylated C's in Unknown context: 230 176 197

Total unmethylated C's in CpG context: 70954415 81612734 67258185 Total unmethylated C's in CHG context: 130140075 147773745 129886712 Total unmethylated C's in CHH context: 1253917356 1345219477 1221903764 Total unmethylated C's in Unknown context: 1956 2175 1752

C methylated in CpG context: 52.80% 49.60% 51.90% C methylated in CHG context: 32.90% 28.50% 29.00% C methylated in CHH context: 4.70% 1.50% 1.30% C methylated in unknown context (CN or CHN): 10.50% 7.50% 10.10%

118

REFERENCES Afzal AJ, Wood AJ, Lightfoot DA (2008) Plant receptor-like serine threonine kinases: roles in signaling and plant defense. Molecular Plant-Microbe Interactions 21: 507–517

Ali MA, Abbas A, Kreil DP, Bohlmann H (2013) Overexpression of the transcription factor RAP2.6 leads to enhanced callose deposition in syncytia and enhanced resistance against the beet cyst nematode Heterodera schachtiiin Arabidopsis roots. BMC Plant Biology 13: 47

Alkharouf N, Khan R, Matthews B (2004) Analysis of expressed sequence tags from roots of resistant soybean infected by the soybean cyst nematode. Genome 47: 380–388

Alkharouf NW, Klink VP, Chouikha IB, Beard HS, MacDonald MH, Meyer S, Knap HT, Khan R, Matthews BF (2006) Timecourse microarray analyses reveal global changes in gene expression of susceptible Glycine max (soybean) roots during infection by Heterodera glycines (soybean cyst nematode). Planta 224: 838–852

Almagro L, Gómez Ros LV, Belchi-Navarro S, Bru R, Ros Barceló A, Pedreño MA (2009) Class III peroxidases in plant defence reactions. Journal of Experimental Botany 60: 377–390

An YC, Goettel W, Han Q, Bartels A, Liu Z, Xiao W (2017) Dynamic Changes of Genome-Wide DNA Methylation during Soybean Seed Development. Sci Rep. doi: 10.1038/s41598-017-12510-4

Anastasiadi D, Esteve-Codina A, Piferrer F (2018) Consistent inverse correlation between DNA methylation of the first intron and gene expression across tissues and species. & Chromatin 11: 37

Andrews S (2010) Babraham Bioinformatics - FastQC A Quality Control tool for High Throughput Sequence Data. FastQC, https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Arikit S, Xia R, Kakrana A, Huang K, Zhai J, Yan Z, Valdés-López O, Prince S, Musket TA, Nguyen HT, et al (2014) An Atlas of Soybean Small RNAs Identifies Phased siRNAs from Hundreds of Coding Genes[W]. Plant Cell 26: 4584–4601

Ashfield T, Ong LE, Nobuta K, Schneider CM, Innes RW (2004) Convergent evolution of disease resistance gene specificity in two flowering plant families. Plant Cell 16: 309–318

Axtell MJ (2013) ShortStack: comprehensive annotation and quantification of small RNA genes. RNA (New York, NY) 19: 740–51

Bayless AM, Smith JM, Song J, McMinn PH, Teillet A, August BK, Bent AF (2016) Disease resistance through impairment of α-SNAP-NSF interaction and vesicular trafficking by soybean Rhg1. Proc Natl Acad Sci USA 113: E7375–E7382

Bayless AM, Zapotocny RW, Grunwald DJ, Amundson KK, Diers BW, Bent AF (2018) An atypical N-ethylmaleimide sensitive factor enables the viability of nematode-resistant Rhg1 soybeans. PNAS 115: E4512–E4521

Bent AF, Hoffman TK, Schmidt JS, Hartman GL, Hoffman DD, Xue P, Tucker ML (2006) Disease- and Performance-Related Traits of Ethylene-Insensitive Soybean. Crop Science 46: 893–901

119

Bernard RL, Noel GR, Anand SC, Shannon JG (1988) Registration of ‘Fayette’ Soybean. Crop Science 28: 1028–1029

Birkenbihl RP, Diezel C, Somssich IE (2012) Arabidopsis WRKY33 is a key transcriptional regulator of hormonal and metabolic responses towards Botrytis cinerea infection. Plant Physiology 111.192641

Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120

Bradley C (2017) Soybean Disease Loss Estimates From the United States and Ontario, Canada — 2015. Soybean Disease Management 4

Bybd Jr D, Kirkpatrick T, Barker K (1983) An improved technique for clearing and staining plant tissues for detection of nematodes. Journal of Nematology 15: 142

Cabrera J, Ruiz-Ferrer V, Fenoll C, Escobar C (2018) sRNAs involved in the regulation of plant developmental processes are altered during the root-knot nematode interaction for feeding site formation. Eur J Plant Pathol 152: 945–955

Cai D, Kleine M, Kifle S, Harloff HJ, Sandal NN, Marcker KA, Klein-Lankhorst RM, Salentijn EM, Lange W, Stiekema WJ, et al (1997) Positional cloning of a gene for nematode resistance in sugar beet. Science 275: 832–834

Cai S, Lashbrook CC (2008) Stamen Abscission Zone Transcriptome Profiling Reveals New Candidates for Abscission Control: Enhanced Retention of Floral Organs in Transgenic Plants Overexpressing Arabidopsis ZINC FINGER PROTEIN2. Plant Physiology 146: 1305–1321

Cao X, Aufsatz W, Zilberman D, Mette MF, Huang MS, Matzke M, Jacobsen SE (2003) Role of the DRM and CMT3 methyltransferases in RNA-directed DNA methylation. Curr Biol 13: 2212–2217

Catoni M, Tsang JM, Greco AP, Zabet NR (2018) DMRcaller: a versatile R/Bioconductor package for detection and visualization of differentially methylated regions in CpG and non-CpG contexts. Nucleic Acids Res 46: e114

Chang S, Puryear J, Cairney J (1993) A simple and efficient method for isolating RNA from pine trees. Plant Mol Biol Rep 11: 113–116

Chin S, Behm C, Mathesius U (2018) Functions of Flavonoids in Plant–Nematode Interactions. Plants 7: 85–85

Cho C-W, Chung E, Kim K, Soh H-A, Jeong YK, Lee S-W, Lee Y-C, Kim K-S, Chung Y-S, Lee J-H (2009) Plasma membrane localization of soybean matrix metalloproteinase differentially induced by senescence and abiotic stress. Biol Plant 53: 461

Chu S, Wang J, Zhu Y, Liu S, Zhou X, Zhang H, Wang CE, Yang W, Tian Z, Cheng H, et al (2017) An R2R3-type MYB transcription factor, GmMYB29, regulates isoflavone biosynthesis in soybean. PLoS Genetics 13: 1–26

Concibido VC, Diers BW, Arelli PR (2004) A decade of QTL mapping for cyst nematode resistance in soybean. Crop Science 44: 1121–1131

120

Conway JR, Lex A, Gehlenborg N (2017) UpSetR: An R package for the visualization of intersecting sets and their properties. Bioinformatics 33: 2938–2940

Cook DE, Bayless AM, Wang K, Guo X, Song Q, Jiang J, Bent AF (2014) Distinct Copy Number, Coding Sequence, and Locus Methylation Patterns Underlie Rhg1-Mediated Soybean Resistance to Soybean Cyst Nematode. Plant Physiology 165: 630–647

Cook DE, Lee TG, Guo X, Melito S, Wang K, Bayless AM, Wang J, Hughes TJ, Willis DK, Clemente TE, et al (2012) Copy number variation of multiple genes at Rhg1 mediates nematode resistance in soybean. Science 338: 1206–1209

Csardi G, Nepusz T (2006) The igraph software package for complex network research. InterJournal Complex Systems: 1695

Dai X, Zhao PX (2011) psRNATarget: a plant small RNA target analysis server. Nucleic Acids Res 39: W155-159

Dao TTH, Linthorst HJM, Verpoorte R (2011) Chalcone synthase and its functions in plant resistance. Phytochemistry Reviews 10: 397–412

Davis E, Tylka G (2000) Soybean cyst nematode disease. The Plant Health Instructor. doi: 10.1094/PHI- I-2000-0725-01

Dixon RA, Achnine L, Kota P, Liu C, Reddy MS, Wang L (2002) The phenylpropanoid pathway and plant defence—a genomics perspective. Molecular Plant Pathology 3: 371–390

Dowen RH, Pelizzola M, Schmitz RJ, Lister R, Dowen JM, Nery JR, Dixon JE, Ecker JR (2012) Widespread dynamic DNA methylation in response to biotic stress. Proc Natl Acad Sci USA 109: E2183-2191

Doyle JJ, Dickson EE (1987) Preservation of Plant Samples for DNA Restriction Endonuclease Analysis. Taxon 36: 715–722

Du J, Grant D, Tian Z, Nelson RT, Zhu L, Shoemaker RC, Ma J (2010a) SoyTEdb: a comprehensive database of transposable elements in the soybean genome. BMC Genomics 11: 113

Du Z, Zhou X, Ling Y, Zhang Z, Su Z (2010b) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Research 38: W64–W70

Edens R, Anand S, Bolla R (1995) Enzymes of the phenylpropanoid pathway in soybean infected with Meloidogyne incognita or Heterodera glycines. Journal of Nematology 27: 292

Fei Q, Xia R, Meyers BC (2013) Phased, secondary, small interfering RNAs in posttranscriptional regulatory networks. Plant Cell 25: 2400–2415

Fei Q, Yu Y, Liu L, Zhang Y, Baldrich P, Dai Q, Chen X, Meyers BC (2018) Biogenesis of a 22-nt microRNA in Phaseoleae species by precursor-programmed uridylation. PNAS 115: 8037–8042

Gao H, Bhattacharyya MK (2008) The soybean-Phytophthora resistance locus Rps1-k encompasses coiled coil-nucleotide binding-leucine rich repeat-like genes and repetitive sequences. BMC Plant Biol 8: 29

121

Gel B, Serra E (2017) karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 33: 3088–3090

Gheysen G, Mitchum MG (2008) Molecular insights in the susceptible plant response to nematode infection.

Gómez-Rubio V (2017) ggplot2 - Elegant Graphics for Data Analysis (2nd Edition). Journal of Statistical Software; Vol 1, Book Review 2 (2017). doi: 10.18637/jss.v077.b02

Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, et al (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40: D1178–D1186

Grant CE, Bailey TL, Noble WS (2011) FIMO: scanning for occurrences of a given motif. Bioinformatics 27: 1017–1018

Grant D, Nelson RT, Cannon SB, Shoemaker RC (2010) SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res 38: D843–D846

Grant MR, Godiard L, Straube E, Ashfield T, Lewald J, Sattler A, Innes RW, Dangl JL (1995) Structure of the Arabidopsis RPM1 gene enabling dual specificity disease resistance. Science 269: 843–846

Gu Z, Gu L, Eils R, Schlesner M, Brors B (2014) circlize Implements and enhances circular visualization in R. Bioinformatics 30: 2811–2812

Guttikonda SK, Trupti J, Bisht NC, Chen H, An Y-QC, Pandey S, Xu D, Yu O (2010) Whole genome co-expression analysis of soybean cytochrome P450 genes identifies nodulation-specific P450 monooxygenases. BMC Plant Biol 10: 243

Hermsmeier D, Hart JK, Byzova M, Rodermel SR, Baum TJ (2000) Changes in mRNA Abundance within Heterodera schachtii-Infected Roots of Arabidopsis thaliana. MPMI 13: 309–315

Hess N, Klode M, Anders M, Sauter M (2011) The hypoxia responsive transcription factor genes ERF71/HRE2 and ERF73/HRE1 of Arabidopsis are differentially regulated by ethylene. Physiol Plant 143: 41–49

Hewezi T, Howe P, Maier TR, Baum TJ (2008) Arabidopsis Small RNAs and Their Targets During Cyst Nematode Parasitism. MPMI 21: 1622–1634

Hewezi T, Lane T, Piya S, Rambani A, Rice JH, Staton M (2017) Cyst Nematode Parasitism Induces Dynamic Changes in the Root Epigenome. Plant Physiology 174: 405–420

Hobecker KV, Reynoso MA, Bustos-Sanmamed P, Wen J, Mysore KS, Crespi M, Blanco FA, Zanetti ME (2017) The MicroRNA390/TAS3 Pathway Mediates Symbiotic Nodulation and Lateral Root Growth. Plant Physiology 174: 2469–2486

Hofmann J, Hess PH, Szakasits D, Blöchl A, Wieczorek K, Daxböck-Horvath S, Bohlmann H, van Bel AJ, Grundler FM (2009) Diversity and activity of sugar transporters in nematode-induced root syncytia. Journal of Experimental Botany 60: 3085–3095

122

Hosseini P, Matthews BF (2014) Regulatory interplay between soybean root and soybean cyst nematode during a resistant and susceptible reaction. BMC plant biology 14: 300

Hu Y, You J, Li C, Williamson VM, Wang C (2017) Ethylene response pathway modulates attractiveness of plant roots to soybean cyst nematode Heterodera glycines. Sci Rep. doi: 10.1038/srep41282

Ithal N, Recknor J, Nettleton D, Hearne L, Maier T, Baum TJ, Mitchum MG (2007) Parallel genome- wide expression profiling of host and pathogen during soybean cyst nematode infection of soybean. Mol Plant Microbe Interact 20: 293–305

Jain S, Chittem K, Brueggeman R, Osorno JM, Richards J, Nelson BD (2016) Comparative transcriptome analysis of resistant and susceptible common bean genotypes in response to soybean cyst nematode infection. PLoS ONE 11: e0159338

Jin J, Hewezi T, Baum TJ (2011) Arabidopsis peroxidase AtPRX53 influences cell elongation and susceptibility to heterodera schachtii. Plant Signaling and Behavior 6: 1778–1786

Jin J, Tian F, Yang D-C, Meng Y-Q, Kong L, Luo J, Gao G (2017) PlantTFDB 4.0: toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res 45: D1040– D1045

Kabelka E, Carlson S, Diers B (2005) Localization of two loci that confer resistance to soybean cyst nematode from Glycine soja PI 468916. Crop Science 45: 2473–2481

Kammerhofer N, Radakovic Z, Regis JM, Dobrev P, Vankova R, Grundler FM, Siddique S, Hofmann J, Wieczorek K (2015) Role of stress‐related hormones in plant defence during early infection of the cyst nematode Heterodera schachtii in Arabidopsis. New Phytologist 207: 778–789

Kandoth PK, Ithal N, Recknor J, Maier T, Nettleton D, Baum TJ, Mitchum MG (2011) The Soybean Rhg1 Locus for Resistance to the Soybean Cyst Nematode Heterodera glycines Regulates the Expression of a Large Number of Stress- and Defense-Related Genes in Degenerating Feeding Cells. PLANT PHYSIOLOGY 155: 1960–1975

Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Research 28: 27–30

Kanehisa M, Sato Y, Morishima K (2016) BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. J Mol Biol 428: 726–731

Kim D, Langmead B, Salzberg SL (2015a) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12: 357–360

Kim JC, Lee SH, Cheong YH, Yoo CM, Lee SI, Chun HJ, Yun DJ, Hong JC, Lee SY, Lim CO, et al (2001) A novel cold-inducible zinc finger protein from soybean, SCOF-1, enhances cold tolerance in transgenic plants. Plant J 25: 247–259

Kim KD, El Baidouri M, Abernathy B, Iwata-Otsubo A, Chavarro C, Gonzales M, Libault M, Grimwood J, Jackson S a. (2015b) A comparative epigenomic analysis of polyploidy-derived genes in soybean and common bean. Plant Physiology 168: 1433–1447

123

Kim M, Diers BW (2013) Fine mapping of the SCN resistance QTL cqSCN-006 and cqSCN-007 from Glycine soja PI 468916. Crop Science 53: 775–785

Kim M, Hyten DL, Bent AF, Diers BW (2010) Fine Mapping of the SCN Resistance Locus rhg1-b from PI 88788. The Plant Genome Journal 3: 81

Kim YH, Riggs RD, Kim KS (1987) Structural Changes Associated with Resistance of Soybean to Heterodera glycines. J Nematol 19: 177–187

Kinzler AJ, Prokopiak ZA, Vaughan MM, Erhardt PW, Sarver JG, Trendel JA, Zhang Z, Dafoe NJ (2016) Cytochrome P450, CYP93A1, as defense marker in soybean. Biol Plant 60: 724–730

Klink VP, Hosseini P, MacDonald MH, Alkharouf NW, Matthews BF (2009) Population-specific gene expression in the plant pathogenic nematode Heterodera glycines exists prior to infection and during the onset of a resistant or susceptible reaction in the roots of the Glycine max genotype Peking. BMC genomics 10: 111

Klink VP, Hosseini P, Matsye PD, Alkharouf NW, Matthews BF (2011) Differences in gene expression amplitude overlie a conserved transcriptomic program occurring between the rapid and potent localized resistant reaction at the syncytium of the Glycine max genotype Peking (PI 548402) as compared to the prolonged and potent. Plant Molecular Biology 75: 141–165

Klink VP, Hosseini P, Matsye PD, Alkharouf NW, Matthews BF (2010) Syncytium gene expression in Glycine max([PI 88788]) roots undergoing a resistant reaction to the parasitic nematode Heterodera glycines. Plant Physiol Biochem 48: 176–193

Klink VP, Overall CC, Alkharouf NW, MacDonald MH, Matthews BF (2007a) A time-course comparative microarray analysis of an incompatible and compatible response by Glycine max (soybean) to Heterodera glycines (soybean cyst nematode) infection. Planta 226: 1423–1447

Klink VP, Overall CC, Alkharouf NW, MacDonald MH, Matthews BF (2007b) Laser capture microdissection (LCM) and comparative microarray expression analysis of syncytial cells isolated from incompatible and compatible soybean (Glycine max) roots infected by the soybean cyst nematode (Heterodera glycines). Planta 226: 1389–1409

Korkina L (2007) Phenylpropanoids as naturally occurring antioxidants: from plant defense to human health. Cellular and Molecular Biology 53: 15–25

Kozomara A, Griffiths-Jones S (2014) miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res 42: D68–D73

Krueger F (2015) Trim Galore:A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files.

Krueger F, Andrews SR (2011) Bismark: A flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27: 1571–1572

Langmead B (2010) Aligning short sequencing reads with Bowtie. Current protocols in bioinformatics 32: 11–7

124

Law JA, Jacobsen SE (2011) Establising, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet 11: 204–220

Lee TG, Diers BW, Hudson ME (2016) An efficient method for measuring copy number variation applied to improvement of nematode resistance in soybean. The Plant Journal 88: 143–153

Lee TG, Kumar I, Diers BW, Hudson ME (2015) Evolution and selection of Rhg1, a copy-number variant nematode-resistance locus. Molecular Ecology 24: 1774–1791

Li Q, Li Y, Moose SP, Hudson ME (2015) Transposable elements, mRNA expression level and strand- specificity of small RNAs are associated with non-additive inheritance of gene expression in hybrid plants. BMC Plant Biology 15: 168

Li S, Chen Y, Zhu X, Wang Y, Jung K-H, Chen L, Xuan Y, Duan Y (2018) The transcriptomic changes of Huipizhi Heidou (Glycine max), a nematode-resistant black soybean during Heterodera glycines race 3 infection. Journal of Plant Physiology 220: 96–104

Li X, Wang X, Zhang S, Liu D, Duan Y, Dong W (2012) Identification of soybean microRNAs involved in soybean cyst nematode infection by deep sequencing. PLoS ONE 7: e39650

Li XY, Wang X, Zhang SP, Liu DW, Duan YX, Dong W (2011) Comparative profiling of the transcriptional response to soybean cyst nematode infection of soybean roots by deep sequencing. Chinese Science Bulletin 56: 1904–1911

Li YH, Chen SY (2005) Effect of the rhg1 Gene on Population Development of Heterodera glycines. Journal of nematology 37: 168–177

Liao Y, Smyth GK, Shi W (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30: 923–930

Licausi F, van Dongen JT, Giuntoli B, Novi G, Santaniello A, Geigenberger P, Perata P (2010) HRE1 and HRE2, two hypoxia-inducible ethylene response factors, affect anaerobic responses in Arabidopsis thaliana. Plant J 62: 302–315

Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR (2008) Highly Integrated Single-Base Resolution Maps of the Epigenome in Arabidopsis. Cell 133: 523–536

Liu C-J, Huhman D, Sumner LW, Dixon RA (2003) Regiospecific hydroxylation of isoflavones by cytochrome p450 81E enzymes from Medicago truncatula. Plant J 36: 471–484

Liu S, Kandoth PK, Warren SD, Yeckel G, Heinz R, Alden J, Yang C, Jamai A, El-Mellouki T, Juvale PS, et al (2012) A soybean cyst nematode resistance gene points to a new mechanism of plant resistance to pathogens. Nature 492: 256–260

Liu Y, Dammann C, Bhattacharyya MK (2001) The matrix metalloproteinase gene GmMMP2 is activated in response to pathogenic infections in soybean. Plant Physiol 127: 1788–1797

Lu Y-B, Qi Y-P, Yang L-T, Guo P, Li Y, Chen L-S (2015) Boron-deficiency-responsive microRNAs and their targets in Citrus sinensis leaves. BMC Plant Biol 15: 271

125

Mahalingam R, Knap HT, Lewis SA (1998) Inoculation Method for Studying Early Responses of Glycine max to Heterodera glycines. J Nematol 30: 237–240

Marin E, Jouannet V, Herz A, Lokerse AS, Weijers D, Vaucheret H, Nussaume L, Crespi MD, Maizel A (2010) miR390, Arabidopsis TAS3 tasiRNAs, and their AUXIN RESPONSE FACTOR targets define an autoregulatory network quantitatively regulating lateral root growth. The Plant cell 22: 1104–17

Matsye PD, Kumar R, Hosseini P, Jones CM, Tremblay A, Alkharouf NW, Matthews BF, Klink VP (2011) Mapping cell fate decisions that occur during soybean defense responses. Plant Mol Biol 77: 513–528

Matsye PD, Lawrence GW, Youssef RM, Kim KH, Lawrence KS, Matthews BF, Klink VP (2012) The expression of a naturally occurring, truncated allele of an α-SNAP gene suppresses plant parasitic nematode infection. Plant Molecular Biology 80: 131–155

Matthews BF, Beard H, MacDonald MH, Kabir S, Youssef RM, Hosseini P, Brewer E (2013) Engineered resistance and hypersusceptibility through functional metabolic studies of 100 genes in soybean to its major pathogen, the soybean cyst nematode. Planta 237: 1337–1357

Matzke MA, Mosher RA (2014) RNA-directed DNA methylation: an epigenetic pathway of increasing complexity. Nature Reviews Genetics 15: 394–408

Mazarei M, Liu W, Al-Ahmad H, Arelli PR, Pantalone VR, Stewart CN (2011) Gene expression profiling of resistant and susceptible soybean lines infected with soybean cyst nematode. Theoretical and Applied Genetics 123: 1193–1206

Mazarei M, Puthoff DP, Hart JK, Rodermel SR, Baum TJ (2002) Identification and Characterization of a Soybean Ethylene-Responsive Element-Binding Protein Gene Whose mRNA Expression Changes During Soybean Cyst Nematode Infection. MPMI 15: 577–586

Miraeiz E, Chaiprom U, Afsharifar A, Karegar A, Drnevich J, Hudson M (submitted) Early transcriptional response to soybean cyst nematode HG type 0 show genetic differences among resistant and susceptible soybeans.

Mittler R, Kim Y, Song L, Coutu J, Coutu A, Ciftci-Yilmaz S, Lee H, Stevenson B, Zhu J-K (2006) Gain- and loss-of-function mutations in Zat10 enhance the tolerance of plants to abiotic stress. FEBS Lett 580: 6537–6542

Morales AMAP, O’Rourke JA, Mortel M van de, Scheider KT, Bancroft TJ, Borém A, Nelson RT, Nettleton D, Baum TJ, Shoemaker RC, et al (2013) Transcriptome analyses and virus induced gene silencing identify genes in the Rpp4-mediated Asian soybean rust resistance pathway. Functional Plant Biol 40: 1029–1047

Mudge J (1999) Marker-assisted selection for soybean cyst nematode resistance. Ph.D. thesis. University of Minnesota, St. Paul, MN.

Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Physical Review E. doi: 10.1103/PhysRevE.69.026113

126

Niblack TL, Colgrove AL, Colgrove K, Bond JP (2008) Shift in Virulence of Soybean Cyst Nematode is Associated with Use of Resistance from PI 88788. Plant Health Progress 9: 29

Niblack TL, Tylka GL, Arelli P, Bond J, Diers B, Donald P, Faghihi J, Gallo K, Heinz RD, Lopez- Nicora H, et al (2009) A Standard Greenhouse Method for Assessing Soybean Cyst Nematode Resistance in Soybean: SCE08 (Standardized Cyst Evaluation 2008). Plant Health Progress 10: 33

Noon JB, Hewezi T, Baum TJ (2019) Homeostasis in the soybean miRNA396–GRF network is essential for productive soybean cyst nematode infections. J Exp Bot 70: 1653–1668

Pauwels L, Goossens A (2008) Fine-tuning of early events in the jasmonate response. Plant Signal Behav 3: 846–847

Puthoff DP, Ehrenfried ML, Vinyard BT, Tucker ML (2007) GeneChip profiling of transcriptional responses to soybean cyst nematode, Heterodera glycines, colonization of soybean roots. Journal of Experimental Botany 58: 3407–3418

Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26: 841–842

Rambani A, Rice JH, Liu J, Lane T, Ranjan P, Mazarei M, Pantalone V, Stewart CN, Staton M, Hewezi T (2015) The Methylome of Soybean Roots during the Compatible Interaction with the Soybean Cyst Nematode. Plant Physiol 168: 1364–1377

Reinprecht Y, Perry GE, Pauls KP (2017) A Comparison of Phenylpropanoid Pathway Gene Families in Common Bean. Focus on P450 and C4H Genes. doi: 10.1007/978-3-319-63526-2_11

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43: e47–e47

Robinson M (2010) EdgeR: A biocon-MD, McCarthy DJ and Smyth GK: EdgeR: A biocon-DJ and Smyth GK: EdgeR: A biocon-GK: EdgeR: A bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140

Robinson MD, Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11: R25

Rossel JB, Wilson PB, Hussain D, Woo NS, Gordon MJ, Mewett OP, Howell KA, Whelan J, Kazan K, Pogson BJ (2007) Systemic and Intracellular Responses to Photooxidative Stress in Arabidopsis. Plant Cell 19: 4091–4110

Ruben E, Jamai A, Afzal J, Njiti V, Triwitayakorn K, Iqbal M, Yaegashi S, Bashir R, Kazi S, Arelli P (2006) Genomic analysis of the rhg1 locus: candidate genes that underlie soybean resistance to the cyst nematode. Molecular Genetics Genomics 276: 503–516

Schmitz RJ, He Y, Valdés-López O, Khan SM, Joshi T, Urich MA, Nery JR, Diers B, Xu D, Stacey G, et al (2013) Epigenome-wide inheritance of cytosine methylation variants in a recombinant inbred population. Genome Research 23: 1663–1674

127

Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, et al (2010) Genome sequence of the palaeopolyploid soybean. Nature 463: 178–183

Schopfer CR, Ebel J (1998) Identification of elicitor-induced cytochrome P450s of soybean (Glycine max L.) using differential display of mRNA. Mol Gen Genet 258: 315–322

Schultz MD, Schmitz RJ, Ecker JR (2012) ‘Leveling’ the playing field for analyses of single-base resolution DNA methylomes. Trends Genet 28: 583–585

Selote D, Kachroo A (2010) RPG1-B-Derived Resistance to AvrB-Expressing Pseudomonas syringae Requires RIN4-Like Proteins in Soybean. Plant Physiology 153: 1199–1211

Shin K, Lee S, Song W-Y, Lee R-A, Lee I, Ha K, Koo J-C, Park S-K, Nam H-G, Lee Y, et al (2015) Genetic identification of ACC-RESISTANT2 reveals involvement of LYSINE HISTIDINE TRANSPORTER1 in the uptake of 1-aminocyclopropane-1-carboxylic acid in Arabidopsis thaliana. Plant Cell Physiol 56: 572–582

Shivaprasad PV, Chen H-M, Patel K, Bond DM, Santos BACM, Baulcombe DC (2012) A microRNA superfamily regulates nucleotide binding site-leucine-rich repeats and other mRNAs. Plant Cell 24: 859–874

Shu Y, Liu Y, Li W, Song L, Zhang J, Guo C (2016) Genome-Wide Investigation of MicroRNAs and Their Targets in Response to Freezing Stress in Medicago sativa L., Based on High-Throughput Sequencing. G3 (Bethesda) 6: 755–765

Song Q-X, Lu X, Li Q-T, Chen H, Hu X-Y, Ma B, Zhang W-K, Chen S-Y, Zhang J-S (2013) Genome- wide analysis of DNA methylation in soybean. Molecular plant 6: 1961–74

Stroud H, Do T, Du J, Zhong X, Feng S, Johnson L, Patel DJ, Jacobsen SE (2014) The roles of non- CG methylation in Arabidopsis. Nat Struct Mol Biol 21: 64–72

Sun F, Liu P, Xu J, Dong H (2010) Mutation in RAP2.6L, a transactivator of the ERF transcription factor family, enhances Arabidopsis resistance to Pseudomonas syringae. Physiological and Molecular Plant Pathology 74: 295–302

Sun Y, Mui Z, Liu X, Yim AK-Y, Qin H, Wong F-L, Chan T-F, Yiu S-M, Lam H-M, Lim BL (2016) Comparison of Small RNA Profiles of Glycine max and Glycine soja at Early Developmental Stages. Int J Mol Sci. doi: 10.3390/ijms17122043

Teillet A, Dybal K, Kerry BR, Miller AJ, Curtis RH, Hedden P (2013) Transcriptional changes of the root-knot nematode Meloidogyne incognita in response to Arabidopsis thaliana root signals. PloS One 8: e61259

Tian B, Wang S, Todd TC, Johnson CD, Tang G, Trick HN (2017) Genome-wide identification of soybean microRNA responsive to soybean cyst nematodes infection by deep sequencing. BMC Genomics 18: 572

Tucker ML, Xue P, Yang R (2010) 1-Aminocyclopropane-1-carboxylic acid (ACC) concentration and ACC synthase expression in soybean roots, root tips, and soybean cyst nematode (Heterodera glycines)-infected roots. J Exp Bot 61: 463–472

128

Tylka GL (2016) More SCN-Resistant Soybean Varieties Than Ever , but Diversity of Resistance is Lacking. Integrated Crop Management, http://crops.extension.iastate.edu/cropnews/2016/10/more- scn-resistant-soybean-varieties-ever-diversity-resistance-lacking

Tylka GL, Marett CC (2014) Distribution of the Soybean Cyst Nematode, Heterodera glycines, in the United States and Canada: 1954 to 2014. Plant Health Progress 15: 14–16

Wan J, Vuong T, Jiao Y, Joshi T, Zhang H, Xu D, Nguyen HT (2015) Whole-genome gene expression profiling revealed genes and pathways potentially involved in regulating interactions of soybean with cyst nematode (Heterodera glycines Ichinohe). BMC Genomics 16: 148

Wang D, Diers B, Arelli P, Shoemaker R, Genetics A (2001) Loci underlying resistance to race 3 of soybean cyst nematode in Glycine soja plant introduction 468916. Theoretical Applied Genetics 103: 561–566

Warnes MGR, Bolker B, Bonebakker L, Gentleman R (2016) Package ‘gplots’’.’ Various R Programming Tools for Plotting Data

Wen C-K, Chang C (2002) Arabidopsis RGL1 Encodes a Negative Regulator of Gibberellin Responses. Plant Cell 14: 87–100

Wickham H (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York

Winter SM, Shelp BJ, Anderson TR, Welacky TW, Rajcan I (2007) QTL associated with horizontal resistance to soybean cyst nematode in Glycine soja PI464925B. Theoretical Applied Genetics 114: 461–472

Wu J, Mao X, Cai T, Luo J, Wei L (2006) KOBAS server: a web-based platform for automated annotation and pathway identification. Nucleic Acids Research 34: W720–W724

Wubben MJ, Su H, Rodermel SR, Baum TJ (2001) Susceptibility to the sugar beet cyst nematode is modulated by ethylene signal transduction in Arabidopsis thaliana. Mol Plant Microbe Interact 14: 1206–1212

Xie C, Mao X, Huang J, Ding Y, Wu J, Dong S, Kong L, Gao G, Li C-Y, Wei L (2011) KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases. Nucleic Acids Res 39: W316-322

Xu M, Li Y, Zhang Q, Xu T, Qiu L, Fan Y, Wang L (2014a) Novel MiRNA and PhasiRNA Biogenesis Networks in Soybean Roots from Two Sister Lines That Are Resistant and Susceptible to SCN Race 4. PLoS One. doi: 10.1371/journal.pone.0110051

Xu Y, Guo M, Liu X, Wang C, Liu Y (2014b) Inferring the soybean (Glycine max) microrna functional network based on target gene network. Bioinformatics 30: 94–103

Yang Y, Zhou Y, Chi Y, Fan B, Chen Z (2017) Characterization of Soybean WRKY Gene Family and Identification of Soybean WRKY Genes that Promote Resistance to Soybean Cyst Nematode. Scientific Reports 7: 17804

Yang Z, Ebright YW, Yu B, Chen X (2006) HEN1 recognizes 21–24 nt small RNA duplexes and deposits a methyl group onto the 2′ OH of the 3′ terminal nucleotide. Nucleic Acids Res 34: 667–675

129

Yao Y, Chen X, Wu A-M (2017) ERF-VII members exhibit synergistic and separate roles in Arabidopsis. Plant Signal Behav. doi: 10.1080/15592324.2017.1329073

Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden TL (2012) Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics 13: 134

Yu B, Yang Z, Li J, Minakhina S, Yang M, Padgett RW, Steward R, Chen X (2005) Methylation as a Crucial Step in Plant microRNA Biogenesis. Science 307: 932–935

Yu N, Diers BW (2017) Fine mapping of the SCN resistance QTL cqSCN-006 and cqSCN-007 from Glycine soja PI 468916. Euphytica 213: 54

Yu N, Lee TG, Rosa DP, Hudson M, Diers BW (2016) Impact of Rhg1 copy number, type, and interaction with Rhg4 on resistance to Heterodera glycines in soybean. Theoretical and Applied Genetics 129: 2403–2412

Yuan C-P, Li Y-H, Liu Z-X, Guan R-X, Chang R-Z, Qiu L-J (2012) DNA sequence polymorphism of the Rhg4 candidate gene conferring resistance to soybean cyst nematode in Chinese domesticated and wild soybeans. Molecular Breeding 30: 1155–1162

Yuan S, Li X, Li R, Wang L, Zhang C, Chen L, Hao Q, Zhang X, Chen H, Shan Z, et al (2018) Genome-Wide Identification and Classification of Soybean C2H2 Zinc Finger Proteins and Their Expression Analysis in Legume-Rhizobium Symbiosis. Front Microbiol 9: 126

Zakrzewski F, Schmidt M, Van Lijsebettens M, Schmidt T (2017) DNA methylation of retrotransposons, DNA transposons and genes in sugar beet (Beta vulgaris L.). Plant J 90: 1156– 1175

Zhai J, Jeong D-H, De Paoli E, Park S, Rosen BD, Li Y, González AJ, Yan Z, Kitto SL, Grusak MA, et al (2011) MicroRNAs as master regulators of the plant NB-LRR defense gene family via the production of phased, trans-acting siRNAs. Genes Dev 25: 2540–2553

Zhang H, Kjemtrup-Lovelace S, Li C, Luo Y, Chen LP, Song B-H (2017) Comparative RNA-Seq Analysis Uncovers a Complex Regulatory Network for Soybean Cyst Nematode Resistance in Wild Soybean (Glycine soja). Sci Rep 7: 9699

Zhang H, Song B-H (2017) RNA-seq data comparisons of wild soybean genotypes in response to soybean cyst nematode (Heterodera glycines). Genomics Data 14: 36–39

Zhao D, You Y, Fan H, Zhu X, Wang Y, Duan Y, Xuan Y, Chen L (2018) The Role of Sugar Transporter Genes during Early Infection by Root-Knot Nematodes. International Journal of Molecular Sciences 19: 302

Zhao M, Cai C, Zhai J, Lin F, Li L, Shreve J, Thimmapuram J, Hughes TJ, Meyers BC, Ma J (2015) Coordination of MicroRNAs, PhasiRNAs, and NB-LRR Genes in Response to a Plant Pathogen: Insights from Analyses of a Set of Soybean Rps Gene Near-Isogenic Lines. The Plant Genome. doi: 10.3835/plantgenome2014.09.0044

Zhao P, Zhang F, Liu D, Imani J, Langen G, Kogel K-H (2017a) Matrix metalloproteinases operate redundantly in Arabidopsis immunity against necrotrophic and biotrophic fungal pathogens. PLoS ONE 12: e0183577

130

Zhao Y, Chang X, Qi D, Dong L, Wang G, Fan S, Jiang L, Cheng Q, Chen X, Han D, et al (2017b) A Novel Soybean ERF Transcription Factor, GmERF113, Increases Resistance to Phytophthora sojae Infection in Soybean. Frontiers in Plant Science 8: 1–16

Zhu L, Guo J, Ma Z, Wang J, Zhou C (2018) Arabidopsis transcription factor MYB102 increases plant susceptibility to aphids by substantial activation of ethylene biosynthesis. Biomolecules. doi: 10.3390/biom8020039

131