Protein-Protein Interaction Assay in Phytophthora Sojae Using Yeast Two-Hybrid System


Abasi Aikebaierjiang

A Dissertation

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

May 2020

Committee:

Vipaporn Phuntumart, Advisor

Pavel Anzenbacher, Graduate Faculty Representative

Raymond Larsen

Paul Morris

Scott Rogers


The oomycete pathogen Phytophthora sojae is one of the most serious threats to soybean production worldwide. Transcription factors are crucial for the survival of all living organisms, including oomycetes, which makes them attractive targets for disease control methods.

A potential transcription factor in P. sojae, Ps1365 (PHYSODRAFT_342624), was discovered via a yeast one-hybrid system (Rutter, 2012). In this research, I aimed to find the proteins that interact with Ps1365, with the hypothesis that these interacting proteins function together with Ps1365 to activate the expression of other genes. Ps1365 is a small protein of 156 amino acids (17,424 Da). Genomic analysis indicated that Ps1365 is one of 22 paralogous proteins in P. sojae, but only four orthologs were noted in P. infestans. Ps1365 was used as bait in yeast two-hybrid (Y2H) analysis to screen a cDNA library of P. sojae mycelia. The assays showed that Ps1365 lacks autoactivation and toxicity in the yeast strain Saccharomyces cerevisiae Y2HGold. Following the Y2H screen, high-throughput sequencing was performed and revealed two prey sequences that are potential partners of Ps1365. These sequences were identified as PHYSODRAFT_291312 and PHYSODRAFT_356433.

Analysis of the deduced amino acid sequences of PHYSODRAFT_291312 and PHYSODRAFT_356433 showed that they are small globular proteins of 8,288 Da and 11,737.68 Da, respectively, and contain α-helices, the simplest form of a transcription factor. SignalP analysis showed that both PHYSODRAFT_291312 and PHYSODRAFT_356433 lack a signal peptide and do not contain nuclear localization signals. These predictions partly support our hypothesis that they can freely pass through the nuclear membrane because of their small sizes to interact with Ps1365. Co-expression analysis using existing data from the Fungal and Oomycete Genomics Resource (FungiDB) as well as from the National Center for Biotechnology Information (NCBI) showed that Ps1365, PHYSODRAFT_291312, and PHYSODRAFT_356433 are expressed together during mycelial growth and during infection, further supporting the idea that Ps1365 and the two candidate prey proteins may function together to drive the expression of downstream genes.

Altogether, these results support the hypothesis that the two candidate P. sojae proteins, PHYSODRAFT_291312 and PHYSODRAFT_356433, may interact with Ps1365 to regulate the expression of downstream genes.

This dissertation is dedicated to my lovely parents, Abbas Abla and Maramnisa Mosayuf.

ACKNOWLEDGMENTS

First of all, I would like to express my deep appreciation to my dissertation advisor, Dr. Vipaporn Phuntumart, for the great role she has played, the expert guidance she has provided, and the academic and technical support she has given during the process of initiating, conducting, and finalizing my research. When I joined her research group, Dr. Phuntumart made the adjustment to her lab much easier by patiently demonstrating how to use the lab's equipment and tools. While I was conducting my research, Dr. Phuntumart provided me with constructive criticism, guidance on how to develop new scientific ideas, ways to troubleshoot lab protocols, and many research and writing tips. Without this very generous and selfless provision of her time, it would have been almost impossible for me to complete my research and dissertation. I can say that Dr. Phuntumart is one of the best and most conscientious advisors I have ever seen in my life.

I also express my special gratitude to the other members of my committee. During my research, my committee members provided me with new research ideas and constructive criticism, which steered my research work in a more fruitful direction. Their generous support also resulted in the successful completion of my preliminary examination, proposal, dissertation defense, and final oral examination.

At the very beginning of this research, Dr. Paul Morris provided me with the first sample, the mycelium of Phytophthora sojae strain P6497. When my research encountered difficulties and bottlenecks, Dr. Morris provided me with expert direction and support. He also helped evaluate my analysis results and showed me the crucial points to consider. During my coursework and research, Dr. Scott Rogers made me fully realize the importance of thinking about biological phenomena in the context of evolution, and this played a great role in my research. When I was constructing the phylogenetic tree of Ps1365, Dr. Rogers helped me evaluate my analysis and showed me essential points to consider and complement. When I was writing my dissertation, Dr. Rogers took his precious time to review it despite his own busy schedule and heavy workload, gave me important suggestions for revision and crucial criticism, and showed me the correct way to express and organize ideas in academic writing. Dr. Raymond Larsen made me realize the importance of research methods by repeatedly emphasizing that we should not forget to learn about the research methods previous researchers used, rather than focusing only on their “flashy” results, and this learning strategy was a great help to me. In addition, Dr. Larsen gave me many valuable comments and criticisms that helped me orient my research strategies. Dr. Pavel Anzenbacher ensured the successful completion of my preliminary exam, proposal meeting, dissertation defense, and final oral exam according to the rules of the Graduate College by closely collaborating with me and all the other members of the committee.

I would also like to thank all the former and current members of the Phuntumart Lab. Eric Budge, Rebecca Cull, and Angsana Keeratijarut showed me how to use lab equipment correctly whenever I asked them. Dilshan Beligala, Shannon Miller, James Artman, Alexander Howard, Gayathri Beligala, Satyaki Ghosh, Maheshi Kukulekanagme, and Kevin Rowlands generously helped me whenever I needed assistance. Without their help, it would not have been easy for me to successfully complete this research.

Moreover, I express my gratitude to Bowling Green State University and the United States Department of Agriculture - National Institute of Food and Agriculture (USDA-NIFA) Agriculture and Food Research Initiative Oomycete-Soybean CAP (award 2011-68004-30104) for funding my research. I also express my gratitude to the staff of both the Department of Biological Sciences and Bowling Green State University for providing me with a supportive learning environment. I also wish to thank the Graduate College for providing me with excellent academic service.

Finally, I would like to express my gratitude to both of my parents, Abbas Abla and Maramnisa Mosayuf, for the material and emotional support they provided during the entire course of my study at BGSU. They gave me courage and confidence whenever my studies encountered problems and hardships. Without their support, it would have been difficult to achieve today’s results.

TABLE OF CONTENTS

CHAPTER 1. INTRODUCTION

Introduction

I Phytophthora sojae

General Information

Taxonomy

Life Cycle

II Gene Regulation

III Gene Regulation in Oomycetes

Changes of Transcription Levels in Oomycetes

Gene Silencing in Oomycetes

Regulatory Elements in Oomycetes

Transcription Factors in Oomycetes

References

CHAPTER 2. YEAST TWO-HYBRID ANALYSIS

Introduction

I Yeast Two-hybrid Assay (Y2H)

II Phylogenetic Analysis

Hypotheses and Aims

Material and Methods

I Biological Material

II Bioinformatic Analysis of Ps1365

III Y2H Bait Construction

IV Bait Autoactivation Assay

V Bait Toxicity Assay

VI Prey Library Construction

VII Yeast Two-hybrid Screening

VIII Prey Colony Insert-Checking

IX Plasmid Rescue

Results

Aim 1) Analysis of the Bait Sequence, Ps1365

Bait Sequence Analysis

Hydrophobicity Analysis

Signal Peptide and Conserved Domain Prediction

Secondary Structure Prediction

Subcellular Localization Prediction

In Silico Gene Expression Analysis

Phylogenetic Analysis

Aim 2) Cloning the Bait Sequence, Ps1365 into Yeast, Saccharomyces cerevisiae Y2HGold

Bait Vector Construction

Analysis of Bait Construct

Bait Transformation and Autoactivation Assay

Bait Toxicity Assay

Aim 3) Identify Interactive Proteins of a Novel P. sojae Transcription Factor, Ps1365 Using Yeast Two-hybrid Assay

Prey Library Construction

Y2H Screening

Plasmid Rescue

Discussion

References

CHAPTER 3. ANALYSIS OF THE POTENTIAL INTERACTOR PROTEINS

Introduction

I Bioinformatic Analysis of Interactor Proteins of Ps1365

Hypotheses and Aims

Materials and Methods

I Yeast Two-hybrid Assay

II Bioinformatic Analysis

Results

Aim 1) Analysis of the Potential Interactive Sequences of Ps1365 Obtained from Yeast Two-hybrid Assay

Analysis of the Potential Prey Inserts

Analysis of the Potential Prey Inserts by BLAST

Validation of the Gene Models of the Prey Candidates

Protein Features of PHYSODRAFT_291312

PHYSODRAFT_291312 against Protein Databases

PHYSODRAFT_291312: Globular or Membrane

Domain and Protein Family Prediction of PHYSODRAFT_291312

Secondary Structure of PHYSODRAFT_291312

Tertiary Structure and Functional Analysis of PHYSODRAFT_291312

Signal Peptide and Subcellular Localization Prediction of PHYSODRAFT_291312

Protein Features of PHYSODRAFT_356433

PHYSODRAFT_356433: Globular or Membrane

Analysis of PHYSODRAFT_356433 against Protein Databases

Tertiary Structure and Functional Prediction of PHYSODRAFT_356433

Signal Peptide and Subcellular Localization Analyses of PHYSODRAFT_356433

Co-expression Analysis of Bait and Prey Protein Genes

Discussion

References

APPENDIX A. CONSENSUS SEQUENCES OF THE FOUR PREY CONTIGS OBTAINED FROM SEQUENCHER

LIST OF FIGURES

Figure

1.1 Eukaryotic tree of life

2.1 The two-hybrid principle

2.2 Parameters used for constructing the phylogenetic tree of Ps1365, its 21 homologous proteins in P. sojae as well as some of their homologous counterparts in other oomycetes

2.3 pGBKT7 Vector picture

2.4 Vectors used for the autoactivation assay

2.5 Partial amino acid sequence of Ps1365

2.6 Full-length nucleotide (A) and deduced amino acid sequences (B) of Ps1365

2.7 The Kyte-Doolittle Hydropathy Plot analysis of Ps1365

2.8 SignalP analysis showed that Ps1365 did not contain any cleavage site indicating a recognizable signal peptide

2.9 Predicted secondary structure of Ps1365 by JPred 4, a secondary structure predictor server

2.10 Prediction of Ps1365 subcellular localization by CELLO

2.11 In silico gene expression analysis of Ps1365 in three different developmental stages of P. sojae, mycelium, cyst and infection

2.12 Phylogenetic divergences of the 22 proteins belong to a novel protein family in P. sojae, including Ps1365

2.13 Agarose gel electrophoresis analysis of Ps1365 gradient PCR products

2.14 Agarose gel electrophoresis analysis of pGBKT7 (BD) vector after restriction digestion with EcoRI and/or BamHI at 37°C for one hour

2.15 Agarose gel electrophoresis analysis of colony PCR reactions of transformed TOP10 E. coli cells containing the vector pGBKT7-Ps1365

2.16 Agarose gel electrophoresis of colony PCR of E. coli transformant colony #4 containing pGBKT7-Ps1365

2.17 Alignment of the Ps1365 gene sequence in plasmid pGBKT7-Ps1365 with the Ps1365 database gene sequence

2.18 Autoactivation assay of Ps1365-encoding gene in S. cerevisiae Y2HGold

2.19 Toxicity assay of Ps1365-encoding gene in S. cerevisiae Y2HGold

2.20 Agarose gel electrophoresis of total RNA extracted from P. sojae mycelium

2.21 Agarose gel electrophoresis of P. sojae mycelium first-strand cDNAs synthesized using SMART III Oligo, CDSIII and CDSIII/6 primers

2.22 Agarose gel electrophoresis of LD-PCR amplicons of P. sojae mycelium cDNAs synthesized using 5’ PCR and 3’ PCR primers

2.23 Agarose gel electrophoresis of bait colony PCR using Ps1365 gene-specific forward and reverse primers

2.24 Agarose gel electrophoresis of blue diploid colonies insert-check PCR using Matchmaker Insert Check PCR Mix 2

2.25 Agarose gel electrophoresis of blue diploid colonies batch insert-check PCR reactions using Matchmaker Insert Check PCR Mix 2

2.26 Agarose gel electrophoresis of plasmids extracted from E. coli after transformed with prey plasmids

2.27 Agarose gel electrophoresis of E. coli transformants colony PCR and direct PCR of plasmids from some E. coli colonies using Matchmaker Insert Check PCR Mix 2

3.1 Default parameters of Sequencher 5.4.5 used to align the prey sequences

3.2 Alignment of the PHYSODRAFT_291312 (is shown here as PHYSODRAFT_291312-t26_1) gene model with RNA-Seq data

3.3 DNA sequence of the predicted new gene model of PHYSODRAFT_291312 after aligning the original gene model from FungiDB with RNA-Seq data from the same database

3.4 Alignment of PHYSODRAFT_356433 gene model (is shown here as PHYSODRAFT_356433-t26_1) with RNA-Seq data

3.5 Features of PHYSODRAFT_291312 protein achieved from FungiDB

3.6 Prediction of potential transmembrane domains in PHYSODRAFT_291312

3.7 Secondary structure prediction of PHYSODRAFT_291312

3.8 The structural model of PHYSODRAFT_291312 predicted by Phyre2 web server

3.9 Tertiary structure of PHYSODRAFT_291312 predicted by RaptorX

3.10 SignalP Signal Peptide Prediction of PHYSODRAFT_291312

3.11 Analysis of PHYSODRAFT_291312 by CoSiDe Combined Signal Peptide Predictor predicted the best cleavage site at the 23rd amino acid residue

3.12 Subcellular localization prediction of PHYSODRAFT_291312 using Phobius

3.13 Features of PHYSODRAFT_356433 protein achieved from FungiDB

3.14 Prediction of potential transmembrane domains in PHYSODRAFT_356433

3.15 Secondary structure prediction of PHYSODRAFT_356433

3.16 Structure model of PHYSODRAFT_356433 by Phyre2 web server represented in ribbon diagram

3.17 Prediction of PHYSODRAFT_356433 tertiary structure by RaptorX

3.18 SignalP Signal Peptide Prediction showed that PHYSODRAFT_356433 contains no signal peptide because no cleavage site was observed from all the three scores

3.19 Prediction of subcellular localization of PHYSODRAFT_356433 by Phobius

3.20 Transcription levels of three P. sojae protein genes: Ps1365 (PHYSODRAFT_342624), PHYSODRAFT_291312 and PHYSODRAFT_356433 during the three developmental stages of P. sojae

LIST OF TABLES

Table

2.1 Primers in this research

3.1 NCBI BLAST analysis results of the contigs of P. sojae prey inserts

CHAPTER 1. INTRODUCTION

Introduction

I Phytophthora sojae

General Information

Phytophthora sojae is one of the most important plant pathogens, causing substantial losses in worldwide soybean production every year. Some wildflower species, for example lupins, have also been reported to be hosts for P. sojae infection (Tyler, 2007). Phytophthora sojae belongs to Class Oomycota, literally meaning “egg fungus”. Numerous species belong to the Genus Phytophthora, and almost all of them are plant pathogens. For example, P. infestans causes blight disease in potatoes (Erwin, Bartnicki-Garcia, & Tsao, 1983), and P. citrophthora and P. nicotianae cause fruit and root rot, as well as gummosis, on citrus (Ahmed et al., 2012). Some Phytophthora species have very high host specificity; for example, P. sojae infects soybean as its primary host (Tyler, 2007). Other members of Phytophthora can infect a wide range of plants; for example, P. nicotianae infects more than 72 genera of plants (Grote et al., 2002).

The Genus Phytophthora may encompass 200-600 species (Brasier, 2007), and more than 50 of them have been reported to severely affect plants, including economically important crops (Erwin et al., 1983). Examples include P. infestans, which caused the famous Irish Great Famine of 1845-49 (Erwin et al., 1983; Yoshida et al., 2013), and P. ramorum, which is responsible for sudden oak death (Tyler et al., 2006). Meanwhile, P. sojae has been causing substantial decreases in soybean production worldwide for decades. It has been reported to cause $1-2 billion of economic loss worldwide every year, including $200 million of loss annually in the United States alone (Tyler, 2007).

Taxonomy

Oomycetes are in the protist group called Heterokonta (Stramenopiles), while fungi are within the Opisthokonta, which includes Animalia and Fungi. The morphology of oomycetes is similar to that of fungi. In addition, oomycetes and fungi are similar to each other in the following physiological and pathological aspects:

1) They are osmotrophs (obtain nutrients by direct absorption);

2) They are mainly filamentous (grow by extending hyphae);

3) They reproduce by forming spores;

4) Although not universal, both true fungi and oomycetes include parasitic members (Money, 1998).

On the other hand, the differences between them can be categorized into the following aspects:

1) Cell wall chemistry: the cell walls of oomycetes are composed mainly of glucans (β-1,3 and β-1,6) and cellulose (reviewed in Erwin et al., 1983). In contrast, the cell walls of true fungi consist primarily of chitin and/or chitosan (Judelson & Blanco, 2005; Lamour & Kamoun, 2009; Tyler, 2007).

2) Hyphal structure: oomycete hyphae are coenocytic, lacking septa; among true fungi, many are unicellular, while multicellular hyphae are divided by septa into cells, each containing one or more nuclei (Judelson & Blanco, 2005; Lamour & Kamoun, 2009; Tyler, 2007).

3) Motile asexual spores: motile asexual spores are very common in oomycetes and are usually biflagellated zoospores; in true fungi, motile asexual spores are uncommon (Judelson & Blanco, 2005; Lamour & Kamoun, 2009; Tyler, 2007).

4) Predominant asexual spore formation: in oomycetes, the predominant asexual spores are produced from a sporangium, which is usually undesiccated and consists of a single multinucleate cell; in true fungi, the predominant asexual spores usually form as desiccated conidia, which are either unicellular or multicellular, with each cell containing only a single haploid nucleus (Judelson & Blanco, 2005; Lamour & Kamoun, 2009; Tyler, 2007).

5) Sexual spores: oomycetes use only oospores as sexual spores, while true fungi have various types of sexual spores. Sexual spores of oomycetes develop on hyphal termini; in fungi, sexual spores form within enclosing structures and in large quantities (Judelson & Blanco, 2005; Lamour & Kamoun, 2009; Tyler, 2007).

6) Ploidy of hyphae: oomycete hyphae are mostly diploid, whereas the hyphal cells of true fungi are haploid, dikaryotic, or polynucleate (Judelson & Blanco, 2005; Lamour & Kamoun, 2009; Tyler, 2007).

7) Biochemistry: oomycete spores use mycolaminarin and lipids as energy reserves, while the spores of true fungi use glycogen and trehalose; toxic secondary metabolites are very common in fungi but have not been reported in oomycetes; fungi use peptides as mating hormones, but oomycetes may use lipids; oomycetes usually lack pigments, whereas pigments are very common in fungi (Judelson & Blanco, 2005; Lamour & Kamoun, 2009; Tyler, 2007).

Fungi and oomycetes also differ in their taxonomic positions. Together with diatoms and brown algae, oomycetes belong to Stramenopiles (Tyler, 2007). By contrast, fungi belong to Opisthokonta (an unranked taxonomic unit) (Lee, Ristaino, & Heitman, 2012), close to animals (Judelson & Blanco, 2005; Lamour & Kamoun, 2009; Tyler, 2007). These differences show that oomycetes are not fungi, despite the similarities in their morphology and lifestyle (Figure 1.1).

Figure 1.1. Eukaryotic tree of life. Oomycetes (red arrow) are phylogenetically distant from fungi (light blue arrow). This figure was adapted from Baldauf (2003) and Lee, Ristaino, and Heitman (2012).

One of the existing hypotheses about the origin of oomycetes holds that oomycetes, as well as other stramenopiles such as brown algae, are the products of a symbiosis event between eukaryotes. According to this hypothesis, heterokonts originated when a heterotrophic eukaryotic ancestor engulfed a red alga (Rogers, 2012).

Life Cycle

Like other oomycetes and many fungi, P. sojae uses hyphae to absorb nutrients (i.e., osmotrophy) from its host (usually a soybean plant) or environment. Oomycetes appear to have gained osmotrophy via horizontal transfer of genes from fungi. Phytophthora species reproduce both sexually and asexually. For asexual reproduction, P. sojae hyphae produce sporangia. A sporangium has two possible fates: either it germinates directly to form hyphae, or it forms many zoospores inside. Zoospores are asexual spores with two flagella, and they can move freely in water or wet soil. Flooding or irrigation induces P. sojae to produce zoospores. The life span of zoospores is very short. Once they find the root of a soybean plant, they attach themselves to the root surface and form cysts. At this point, the zoospores lose their flagella and start developing hyphae, which penetrate the host root system to absorb nutrients. In harsh environments, P. sojae produces chlamydospores instead of zoospores. Chlamydospores have very thick cell walls and can withstand harsh conditions (Tyler, 2007).

Phytophthora sojae utilizes oospores for sexual reproduction. Like other oomycetes, it has two sexual organs: antheridia (male reproductive organs) and oogonia (female reproductive organs). These are the sites of male and female meiosis, respectively, where haploid gametes are produced. During fertilization, the oogonium and antheridium fuse, and the haploid nucleus from the antheridium enters the oogonium; fertilization then occurs, followed by the production of many oospores. Oospores lack flagella but have thick cell walls. Compared with zoospores, oospores have a longer life span. Under favorable conditions, oospores can germinate to form functional hyphae (Tyler, 2007).

II Gene Regulation

Gene regulation is one of the most important processes in living organisms. In eukaryotes, regulation of gene expression can occur at different levels, including chromatin-level regulation, transcriptional regulation, post-transcriptional regulation, translational regulation, and post-translational regulation. Among them, transcriptional regulation is the key process in overall gene expression, as transcription is the bridge between genes and proteins. Transcription is regulated by the interactions between trans-factors (transcription factors) and cis-elements (enhancer/silencer sequences on DNA). Transcription is simpler in prokaryotes: in some cases RNA polymerase alone is sufficient to initiate transcription, while in other cases a few activator proteins are needed. In eukaryotes, however, transcriptional initiation is more complex and requires numerous transcription factors.

Eukaryotic transcription factors can be divided into two main categories:

1) General Transcription Factors (GTFs) - These transcription factors are essential for initiating transcription: they bind to the TATA box in the promoter region of protein-coding genes and form the basal transcription machinery with RNA polymerase II and mediator proteins. GTFs are common to the transcription of all class II genes in eukaryotes, and they are highly conserved. There are six GTFs: TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH.

2) Specific Transcription Factors - These transcription factors (TFs) are specific for a single gene target or a group of gene targets. They recognize and bind to specific DNA sequence motifs and create a complex that allows RNA polymerase to begin RNA synthesis. Since TFs control the rate of mRNA synthesis, they are crucial for the regulation of gene transcription in eukaryotes. According to their effects on transcription, they can be divided into activators and repressors. Activators activate the transcription of target genes by binding to DNA sequences called enhancers, while repressors inhibit the transcription of target genes by binding to enhancers or silencers, or by directly interacting with activators to suppress their functions (Krebs, Goldstein, & Kilpatrick, 2011).

Although very diverse, transcription factors share one ubiquitous feature: they all have DNA-binding domains (DBDs). Some transcription factors also have trans-activating domains (TADs). DBDs bind to promoter or enhancer sites on DNA, while TADs bind the proteins necessary to initiate transcription. Transcription factors can be divided into several main families based on the structure and shape of their DNA-binding domains:

(1) Leucine Zippers: every seventh amino acid residue of TFs in this group is a leucine. Two such polypeptides bind together through hydrophobic interactions between the leucine residues of the two polypeptides, forming a structure shaped like a zipper, so leucine zippers always appear as dimers (Krebs et al., 2011).

(2) Helix-turn-helix (HTH): TFs in this group are composed of two helices. One lies in the major groove of the DNA double helix, and the other also associates with the major groove; the two are connected by a section of the protein that allows them to shift relative to one another (i.e., the turn) (Krebs et al., 2011).

(3) Zinc-finger: this domain is composed of a zinc-binding site and a loop of 23 amino acid residues (Krebs et al., 2011).

(4) Basic Helix-loop-helix (bHLH): this domain is composed of two regions, an N-terminal basic region and a C-terminal helix-loop-helix (Peng, Shan, Kuang, Lu, & Chen, 2013). Each helix of this structure is amphipathic, i.e., one side is hydrophobic while the other is hydrophilic. A connecting loop of 12 to 28 amino acids lies between the two α-helices. TFs with this domain usually form homodimers or heterodimers, and the basic regions of these domains are responsible for binding to DNA (Krebs et al., 2011). The basic region recognizes and binds to an E-box (5'-CANNTG-3') of the target gene (Peng et al., 2013).
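Two of the sequence patterns described in the list above are regular enough to sketch in code. The following is a minimal Python illustration, not a validated prediction tool: `find_e_boxes` scans a DNA string for the E-box consensus 5'-CANNTG-3', and `leucine_heptad_start` looks for leucines spaced seven residues apart, as described for leucine zippers. The function names, the example sequences, and the threshold of four consecutive heptads are assumptions made for this illustration and do not come from the analyses in this dissertation.

```python
import re

# E-box consensus 5'-CANNTG-3', where N is any base.
# The lookahead lets overlapping occurrences be reported as well.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_e_boxes(dna):
    """Return (0-based start, matched 6-mer) for every E-box in `dna`."""
    seq = dna.upper()
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(seq)]


def leucine_heptad_start(protein, min_repeats=4):
    """Return the 0-based start of the first run of `min_repeats` leucines
    spaced exactly seven residues apart, or -1 if there is none.

    `min_repeats=4` is an arbitrary illustrative threshold (assumption).
    """
    seq = protein.upper()
    for start in range(len(seq)):
        positions = range(start, start + 7 * min_repeats, 7)
        if positions[-1] >= len(seq):
            break  # not enough residues left for a full run
        if all(seq[p] == "L" for p in positions):
            return start
    return -1


if __name__ == "__main__":
    # Hypothetical example sequences, made up for this sketch.
    print(find_e_boxes("GGCACGTGTTCAGCTGAA"))
    print(leucine_heptad_start("MK" + "LAAAAAA" * 4))
```

Because the reverse complement of CANNTG is again CANNTG, scanning one strand is enough to find E-boxes on both strands. A heptad match of this kind is only a crude indicator; it says nothing about whether the region actually forms a dimerizing helix.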

III Gene Regulation in Oomycetes

Although the genome of P. sojae has been sequenced and published (Tyler et al., 2006), oomycete gene regulation is understudied (Seidl, Wang, Ackerveken, Govers, & Snel, 2012). Knowledge of the promoter motifs and transcription factors associated with development is essential to fully understand an organism (Xiang, Kim, Roy, & Judelson, 2009).

Changes of Transcription Levels in Oomycetes

Genes have different expression levels at different stages of a life cycle and under different living conditions. cDNA macroarray analysis showed that this is also true for oomycetes (Kim & Judelson, 2003). Through this analysis, Kim and Judelson (2003) identified a set of genes whose transcription levels change during the formation of sporangia, hyphae, and cysts. They also found that some of those genes were activated during starvation or in a non-sporulating mutant strain. They predicted that the products of those genes include possible regulators, enzymes, and transporters; the regulators include TFs, while the enzymes were mainly dehydrogenases. The same study showed that dehydrogenase-encoding genes were mainly expressed during sporulation and stress responses. A notable finding of their study was a creatine kinase-like enzyme in oomycetes, because creatine kinase had previously been found only in animals and trypanosomes (Kim & Judelson, 2003).

Wang et al. (2009) studied the genes involved in asexual sporulation by developing mutant P. sojae strains (using UV irradiation) that cannot produce oospores. They then used suppression subtractive hybridization to compare gene expression in the mutant strain and the normal strain during asexual sporogenesis. Their results indicated that 39 putative genes were expressed at high levels in the normal strain, and they predicted that the products of those genes function in processes such as metabolism, cell cycle control, signaling, cell defense, protein biosynthesis, and regulation of transcription. They also reported that the following six proteins were related to the formation of oospores: developmental protein DG1037, glycoside hydrolase, hypothetical protein UB145, FAD-dependent pyridine nucleotide-disulphide oxidoreductase, phosphatidylinositol-4-phosphate 5-kinase, and a sugar transporter (Wang et al., 2009).

Gene Silencing in Oomycetes

Gene silencing methods are very useful, and indeed essential, for understanding the mechanisms of transcription regulation. Although several gene-silencing methods for oomycetes have been developed, they differ in efficiency (Ah-Fong, Bormann-Chung, & Judelson, 2008). Gene silencing takes place at both transcriptional and post-transcriptional levels (Ah-Fong et al., 2008). Transcriptional silencing can be achieved by transferring a copy of the target gene into the target cell/organism, which may result in changes, such as chromatin remodeling, that affect transcription. In contrast, post-transcriptional silencing is achieved by introducing a double-stranded RNA homologous to the target gene into the target cell so that the transcripts of the target gene are degraded by the RNA-induced silencing complex (RISC) (Ah-Fong et al., 2008). However, recent studies showed that there is no distinct boundary between transcriptional and post-transcriptional silencing, and double-stranded RNAs used for post-transcriptional silencing can also trigger transcriptional silencing (Ah-Fong et al., 2008).

Gene silencing has been reported in some oomycete species. For example, van West, Kamoun, van ’t Klooster, and Govers (1999) silenced the P. infestans inf1 gene by introducing inf1 sense, antisense, and promoter-less sequences into P. infestans. Ah-Fong et al. (2008) transferred sense, antisense, and hairpin sequences into P. infestans, analyzed their efficiency in silencing the inf1 elicitin gene, and reported that a hairpin sequence was most effective in silencing the target gene. They also detected small RNAs that were 21 nucleotides in length and homologous to inf1 in the partially silenced P. infestans strains, and suggested that this might be a manifestation of RNAi-like silencing of the target gene.

RNAi has three pathways: miRNA, siRNA, and piRNA (Wilson & Doudna, 2013). miRNAs and siRNAs silence the expression of the target gene by interfering with the target mRNA through RISC complexes in the cytoplasm (Wilson & Doudna, 2013); in addition, siRNAs inhibit the transcription of the target gene through RITS (RNA-induced initiation of transcriptional silencing) complexes inside the nucleus (Deng et al., 2015; Verdel et al., 2004). In oomycetes, silencing signals can also be transmitted between the two nuclei in a heterokaryon (van West et al., 1999). Artificial gene silencing has also helped clarify the role of RNA helicase in oomycetes. Walker & van West (2007) silenced an RNA helicase-encoding gene using RNA interference and showed that RNA helicase is essential for the normal formation of zoospores.

Regulatory Elements in Oomycetes

Specific DNA sequences play important roles in transcriptional regulation, and these

sequences include promoters, terminators and their specific motifs. The length of intergenic

regions in oomycetes is usually less than 500 bp, and this may imply that oomycete

promoters are not as complex as plant and metazoan promoters (Xiang et al., 2009). Judelson

et al. (1992), using transient assays with β-glucuronidase as a reporter, observed the activities of promoter and terminator sequences in three oomycete species (P. infestans, P. megasperma f. sp. glycinea, and Achlya ambisexualis). Their results indicated a vast difference between the transcriptional machineries of oomycetes and higher fungi. Roy, Poidevin, Jiang, and Judelson (2013) analyzed the core promoters in oomycetes and showed that the promoter regions of some oomycetes have initiator-like sequences that are 16-19 nt long and flanked by FPR (flanking promoter region) sequences. Using expectation maximization, they found sequences resembling INR (initiator), FPR, and a novel regulatory sequence called DPEP (Downstream Promoter Element Peronosporales), but no TATA box was found (Roy et al., 2013). They also conducted mutagenesis analysis and showed that DPEP was a core motif. Their genome-wide search indicated that only a small portion of P. infestans genes have INRs and/or FPRs, and that promoters without INR/FPR motifs possess pyrimidine-rich sequences in the regions close to transcription start sites (Roy et al., 2013). Their findings indicated the following correlations between the distribution of INR/FPR motifs and the target gene functions:

1) Genes with combined INR+FPR motifs were expressed at higher levels and were related to infection and development;

2) Genes with DPEP and FPR motifs were expressed constitutively (Roy et al., 2013).

The correlation between these motifs and development/infection indicates that oomycete promoter motifs participate not only in the initiation of general transcription, but also in the regulation of the expression of life-cycle-related genes (Roy et al., 2013). INR, FPR, and combined INR+FPR motifs are also present in other oomycete species, such as P. sojae, Pythium ultimum, Hyaloperonospora arabidopsidis, and Saprolegnia parasitica, while DPEP exists in all studied oomycetes except S. parasitica (Roy et al., 2013). They estimated that stramenopiles other than oomycetes have INR motifs, but not FPR and DPEP motifs. The difference in core elements between oomycetes and animals, fungi, and plants can explain why animal, fungal, and plant core promoters do not work well in oomycetes. In addition, TATA-like motifs were found in some oomycete genes, but there was no conclusive evidence about their function (Roy et al., 2013).

Seidl et al. (2012) predicted 19 conserved DNA motifs from the genome sequences of three Phytophthora spp. (P. infestans, P. sojae, and P. ramorum) and in planta gene expression data. The motifs were predicted by comparing the upstream regions of co-expressed genes in P. infestans with the upstream regions of their orthologs in the two other Phytophthora species (P. sojae and P. ramorum). Some of these DNA motifs were found in the regions upstream of genes encoding transporters, RXLR effectors (RXLR is a protein motif composed of four amino acid residues: Arg-X-Leu-Arg) (Li, 2010), and transcriptional regulators. Specific elements near the INR or INR/FPR sequences of the sporulation-related genes Cdc14 and Pks1 have been reported to be essential for the normal expression of those genes. A very specific motif called the cold box, which is only 7 nt long, has been shown to mediate the temperature-related expression of genes specific to zoospore formation (Seidl et al., 2012).

Xiang et al. (2009) analyzed Pks1, a protein kinase in P. infestans, using a Pks1 promoter fused with the β-glucuronidase reporter gene, and showed that the gene is expressed during the intermediate stage of sporulation. They also reported that the transcription start sites of the gene were located within Inr-like and T-rich regions. They identified a CCGTTG sequence, located 110 nt upstream of the transcription start site, as a major regulator of sporulation-specific transcription. The sequence was also found in other sporulation-induced promoters (Xiang et al., 2009).

Transcription Factors in Oomycetes

Transcription factors are one of the key components of transcription regulation.

Alteration of TFs gives rise to morphological and physiological diversity of organisms

(Gamboa-Meléndez, Huerta, & Judelson, 2013). It has been estimated that Phytophthora spp.

genomes each contain approximately 700 transcription factors, and this number is similar to

that of fungi, such as Fusarium graminearum and Magnaporthe oryzae (Judelson, 2012).

Transcription factors can be divided into several families according to their domain structure and conformation, and previous studies have identified some important members of those families in oomycetes. Among them, basic leucine zipper (bZIP) family transcription factors regulate development and stress response in eukaryotes (Gamboa-Meléndez et al., 2013). bZIP transcription factors exist exclusively in eukaryotes, and they control physiological processes of animals, plants, and fungi (Ye, Wang, Dong, Tyler, & Wang, 2013). To date, the bZIP transcription factors in oomycetes are still not well understood (Ye et al., 2013). The only bZIP whose function is known in oomycetes is Pibzp1 in P. infestans; it is associated with zoospore motility and plant infection, and it interacts with a protein kinase (Ye et al., 2013).

Gamboa-Meléndez et al. (2013) computationally identified 38 bZIP-family transcription factors from P. infestans using PFAM, INTERPRO, BLASTP, and SMART. Half of the bZIP TFs have amino acids other than asparagine at the site corresponding to residue 235 of GCN4 in S. cerevisiae, while bZIP TFs in non-oomycete eukaryotes always have asparagine at the same site. Through interspecific comparisons, they identified these amino acid substitutions as specific to oomycetes (Gamboa-Meléndez et al., 2013). They also

observed that transcription levels of approximately two-thirds of P. infestans bZIP TFs changed dramatically in the life cycle of P. infestans, and the transcription of the majority of

those genes increased in zoospores, sporangia, or cysts formed by zoospores. Their findings

also indicated that the function of eight P. infestans bZIP TFs was to mediate defense of P.

infestans against peroxide damage (Gamboa-Meléndez et al., 2013). Additionally, Blanco & Judelson (2005) showed that a member of the bZIP TF family, Pibzp1, is essential for the swimming activity of zoospores and the formation of appressoria in P. infestans. A study by Ye et al. (2013) predicted potential bZIPs in several oomycetes, including P. sojae, as well as in two diatoms and two fungi. They hypothesized that the expansions of novel bZIPs in oomycete genomes are the products of gene duplication. They also found that bZIP transcription levels changed mainly during the zoosporangia, zoospore, cyst, and infection stages. Moreover, many putative bZIPs have novel DNA-binding domains (Ye et al., 2013). In addition, Pibzp1-silenced P. infestans strains have the following characteristics:

1) High efficiency in cyst germination;

2) Low efficiency in pathogenicity because they cannot produce appressoria (Blanco

& Judelson, 2005; Walker & van West, 2007).

LIM is a zinc-binding protein motif which mediates protein-protein interactions

(Ravinder & Goyal, 2017). Tani, Kim, and Judelson (2005) identified seven genes potentially

encoding NIFs (nuclear LIM interactor-interacting factors) from P. infestans, and among

them, four were associated with sporulation, while three were associated with hyphae. The promoter fusion constructs of these genes with a β-glucuronidase reporter gene showed specific spatial and temporal activity. In their previous study, they identified two novel transcription regulators whose expression increased significantly during sporulation and zoosporogenesis, and both of them were similar to nuclear LIM interactor-interacting factors.

Another important group of TFs is the Myb family. These TFs have Myb domains as DNA-binding domains, and Myb domains include helix-turn-helix (HTH) structures (Ambawat, Sharma, Yadav, & Yadav, 2013). In P. infestans, Xiang and Judelson (2014) found that transcription levels of some Myb genes fluctuated with the day-and-night cycle and showed a general trend of increasing with the number of sporangia formed. In contrast, the transcription of the Myb2R4-encoding gene did not follow this pattern. When overexpressed, Myb2R4 doubled the amount of sporulation and greatly increased the transcription of Myb2R1. Using chromatin immunoprecipitation, they found that Myb2R4 binds to the promoter of Myb2R1. When they tried to silence all eight Myb genes using DNA-directed RNA interference, only one of them, Myb2R3, was silenced, which resulted in reduced sporulation. They also found that seven of those genes had negative effects on vegetative growth when overexpressed, while Myb3R6 negatively affected the dormancy of sporangia. Their research showed that the effectiveness of silencing triggered by hairpin constructs was determined by copy number, and that induced abnormal expression is interrupted by epigenetic silencing and excision of the transgene in P. infestans (Xiang & Judelson, 2014).

Myb proteins are extremely diverse DNA-binding proteins, and Myb TFs usually have two or three tandem arrays of Myb domains (Xiang & Judelson, 2010). In oomycetes, R2R3 Myb TFs possess helices similar to c-Myb, while R1R2R3 Myb TFs have either c-Myb-like or novel sequences. Xiang and Judelson (2010) also found that the transcription levels of eight R2R3 and R1R2R3 genes increased at the sporulation stage and that the expression of three R2R3 and R1R2R3 genes increased when zoospores were being released. The oomycete species Hyaloperonospora arabidopsidis, which has fewer R2R3 and R1R2R3 genes than Phytophthora, cannot produce zoospores. Their work showed that the expression of most R2R3 and R1R2R3 genes is specific to germination or sporulation.

Zhang et al. (2012) studied the role of P. sojae Myb-family TFs in the functioning of the protein kinase PsSAK1. They sequenced the transcriptome of a PsSAK1-silenced P. sojae strain during the cyst stage and at 1.5 h after infection, and found that the transcription levels of several Myb-family TFs were altered, including an R2R3 Myb TF, PsMYB1. Their results showed that the transcription factor PsMYB1 works downstream of PsSAK1 and is essential for the development of zoospores.

References

Ah-Fong, A. M. V., Bormann-Chung, C. A., & Judelson, H. S. (2008). Optimization of

transgene-mediated silencing in Phytophthora infestans and its association with

small-interfering RNAs. Fungal Genetics and Biology, 45(8), 1197-1205.

doi:10.1016/j.fgb.2008.05.009

Ahmed, Y., D'onghia, A. M., Ippolito, A., Shimy, H. E., Cirvilleri, G., & Yaseen, T.

(2012). Phytophthora nicotianae is the predominant Phytophthora species in citrus

nurseries in Egypt. Phytopathologia Mediterranea, 51(3), 519-527.

Ambawat, S., Sharma, P., Yadav, N., & Yadav, R. (2013). MYB transcription factor genes as

regulators for plant responses: An overview. Physiology and Molecular Biology of

Plants, 19(3), 307-321. doi:10.1007/s12298-013-0179-1

Baldauf, S. L. (2003). The deep roots of eukaryotes. Science, 300(5626), 1703-1706.

doi:10.1126/science.1085544

Blanco, F. A., & Judelson, H. S. (2005). A bZIP transcription factor

from Phytophthora interacts with a protein kinase and is required for zoospore

motility and plant infection. Molecular Microbiology, 56(3), 638-648.

doi:10.1111/j.1365-2958.2005.04575.x

Brasier, C. (2007). Phytophthora biodiversity: How many Phytophthora species are

there? Proceedings of the Fourth Meeting of the International Union of Forest

Research Organizations (IUFRO) Working Party S07.02.09: Phytophthoras in Forest

and Natural Ecosystems, Gen. Tech., 101-115.

Deng, X., Zhou, H., Zhang, G., Wang, W., Mao, L., Zhou, X., . . . Lu, H. (2015). Sgf73, a

subunit of SAGA complex, is required for the assembly of RITS complex in fission

yeast. Scientific Reports, 5(1), 14707. doi:10.1038/srep14707

Erwin, D., Bartnicki-Garcia, S., & Tsao, P. (1983). Phytophthora: Its biology, taxonomy,

ecology, and pathology. St. Paul, Minn: American Phytopathological Society.

Gamboa-Meléndez, H., Huerta, A. I., & Judelson, H. S. (2013). bZIP transcription factors in

the oomycete Phytophthora infestans with novel DNA-binding domains are involved

in defense against oxidative stress. Eukaryotic Cell, 12(10), 1403-1412.

doi:10.1128/EC.00141-13

Grote, D., Olmos, A., Kofoet, A., Tuset, J. J., Bertolini, E., & Cambra, M. (2002). Specific

and sensitive detection of Phytophthora nicotianae by simple and

nested-PCR. European Journal of Plant Pathology, 108(3), 197-207.

doi:1015139410793

Judelson, H. S. (2012). Dynamics and innovations within oomycete genomes: Insights into

biology, pathology, and evolution. Eukaryotic Cell, 11(11), 1304-1312.

doi:10.1128/EC.00155-12

Judelson, H. S., & Blanco, F. A. (2005). The spores of Phytophthora: Weapons of the plant

destroyer. Nature Reviews Microbiology, 3(1), 47-58. doi:10.1038/nrmicro1064

Judelson, H. S., Tyler, B. M., & Michelmore, R. W. (1992). Regulatory sequences for

expressing genes in oomycete fungi. Molecular & General Genetics : MGG, 234(1),

138-146. doi:10.1007/bf00272355

Kim, K. S., & Judelson, H. S. (2003). Sporangium-specific gene expression in the oomycete

phytopathogen Phytophthora infestans. Eukaryotic Cell, 2(6), 1376-1385.

doi:10.1128/EC.2.6.1376-1385.2003

Krebs, J. E., Goldstein, E. S., & Kilpatrick, S. T. (2011). Lewin's essential genes (3rd ed.).

Burlington, MA: Jones & Bartlett Learning.

Lamour, K., & Kamoun, S. (2009). Oomycete genetics and genomics: diversity, interactions

and research tools. Hoboken, NJ: Wiley-Blackwell.

Lee, S. C., Ristaino, J. B., & Heitman, J. (2012). Parallels in intercellular communication in

oomycete and fungal pathogens of plants and humans. PLoS Pathogens, 8(12),

e1003028. doi:10.1371/journal.ppat.1003028

Li, S. (2010). Characterization of soybean GmPUB1 proteins that interact with the

Phytophthora sojae effector Avr1b protein (Master's thesis, Iowa State University).

Money, N. P. (1998). Why oomycetes have not stopped being fungi. Mycological

Research, 102(6), 767-768. doi:10.1017/S095375629700556X

Peng, H., Shan, W., Kuang, J., Lu, W., & Chen, J. (2013). Molecular characterization of

cold-responsive basic helix-loop-helix transcription factors MabHLHs that interact

with MaICE1 in banana fruit. Planta, 238(5), 937-953.

doi:10.1007/s00425-013-1944-7

Ravinder, R., & Goyal, N. (2017). Cloning, characterization and subcellular localization of

nuclear LIM interactor interacting factor gene from Leishmania donovani. Gene, 611,

1-8. doi:10.1016/j.gene.2017.02.007

Rogers, S. O. (2012). Integrated molecular evolution (1st ed.). Boca Raton, FL: CRC Press.

Roy, S., Poidevin, L., Jiang, T., & Judelson, H. S. (2013). Novel core promoter elements in

the oomycete pathogen Phytophthora infestans and their influence on expression

detected by genome-wide analysis. BMC Genomics, 14(1), 106.

doi:10.1186/1471-2164-14-106

Seidl, M. F., Wang, R. P., Van den Ackerveken, G., Govers, F., & Snel, B. (2012).

Bioinformatic inference of specific and general transcription factor binding sites in the

plant pathogen Phytophthora infestans. PLoS One, 7(12), e51295.

doi:10.1371/journal.pone.0051295

Tani, S., Kim, K., & Judelson, H. (2005). A cluster of NIF transcriptional regulators with

divergent patterns of spore-specific expression in Phytophthora infestans. Fungal

Genetics and Biology, 42(1), 42-50. doi:10.1016/j.fgb.2004.09.005

Tyler, B. M. (2007). Phytophthora sojae: Root rot pathogen of soybean and model

oomycete. Molecular Plant Pathology, 8(1), 1-8.

doi:10.1111/j.1364-3703.2006.00373.x

Tyler, B. M., Tripathy, S., Zhang, X., Dehal, P., Jiang, R. H. Y., Aerts, A., . . . Boore, J. L.

(2006). Phytophthora genome sequences uncover evolutionary origins and

mechanisms of pathogenesis. Science, 313(5791), 1261-1266.

doi:10.1126/science.1128796

van West, P., Kamoun, S., van ’t Klooster, J. W., & Govers, F. (1999). Internuclear gene

silencing in Phytophthora infestans. Molecular Cell, 3(3), 339-348.

doi:10.1016/S1097-2765(00)80461-X

Verdel, A., Jia, S., Gerber, S., Sugiyama, T., Gygi, S., Grewal, S. I. S., & Moazed, D. (2004).

RNAi-mediated targeting of heterochromatin by the RITS

complex. Science, 303(5658), 672-676. doi:10.1126/science.1093686

Walker, C. A., & van West, P. (2007). Zoospore development in the oomycetes. Fungal

Biology Reviews, 21(1), 10-18. doi:10.1016/j.fbr.2007.02.001

Wang, Z., Wang, Z., Shen, J., Wang, G., Zhu, X., & Lu, H. (2009). Identification

of Phytophthora sojae genes involved in asexual sporogenesis. Journal of

Genetics, 88(2), 141-148. doi:10.1007/s12041-009-0021-2

Wilson, R. C., & Doudna, J. A. (2013). Molecular mechanisms of RNA interference. Annual

Review of Biophysics, 42(1), 217-239. doi:10.1146/annurev-biophys-083012-130404

Xiang, Q., & Judelson, H. S. (2010). Myb transcription factors in the

oomycete Phytophthora with novel diversified DNA-binding domains and

developmental stage-specific expression. Gene, 453(1), 1-8.

doi:10.1016/j.gene.2009.12.006

Xiang, Q., & Judelson, H. S. (2014). Myb transcription factors and light regulate sporulation

in the oomycete Phytophthora infestans. PLoS One, 9(4), e92086.

doi:10.1371/journal.pone.0092086

Xiang, Q., Kim, K. S., Roy, S., & Judelson, H. S. (2009). A motif within a complex promoter

from the oomycete Phytophthora infestans determines transcription during an

intermediate stage of sporulation. Fungal Genetics and Biology, 46(5), 400-409.

doi:10.1016/j.fgb.2009.02.006

Ye, W., Wang, Y., Dong, S., Tyler, B. M., & Wang, Y. (2013). Phylogenetic and

transcriptional analysis of an expanded bZIP transcription factor family

in Phytophthora sojae. BMC Genomics, 14(1), 839. doi:10.1186/1471-2164-14-839

Yoshida, K., Schuenemann, V. J., Cano, L. M., Pais, M., Mishra, B., Sharma, R., . . .

Burbano, H. A. (2013). The rise and fall of the Phytophthora infestans lineage that

triggered the Irish potato famine. eLife, 2, e00731. doi:10.7554/eLife.00731

Zhang, M., Lu, J., Tao, K., Ye, W., Li, A., Liu, X., . . . Wang, Y. (2012). A Myb transcription

factor of Phytophthora sojae, regulated by MAP kinase PsSAK1, is required for

zoospore development. PLoS One, 7(6), e40246. doi:10.1371/journal.pone.0040246

CHAPTER 2. YEAST TWO-HYBRID ANALYSIS

Introduction

I Yeast Two-hybrid Assay (Y2H)

Most cellular activities depend on protein-protein interactions. Several methods have been developed for identifying potentially interacting protein pairs or complexes. These include affinity chromatography, coimmunoprecipitation (Phizicky & Fields, 1995), microscale thermophoresis (Wienken, Baaske, Rothbauer, Braun, & Duhr, 2010), bimolecular fluorescence complementation (Ohad & Yalovsky, 2010), and yeast two-hybrid (Y2H) analysis (Coates & Hall, 2003). Among them, Y2H is widely used because it is effective and accurate and does not require special or expensive instruments.

The principle of Y2H is based on the nature and function of transcription factors. In general, a transcription factor has a DNA-binding domain (DBD) and a trans-activating domain (TAD). While the DBD is responsible for recognizing and binding to the regulatory sites of the target gene, the TAD binds other proteins to facilitate transcription initiation. To actually initiate transcription, however, the DBD and the TAD must come into close proximity to activate the basal transcription machinery. To investigate the interaction of two proteins, A and B, two fusion proteins are constructed: protein A is fused with the DBD and is called the "bait", and protein B is fused with the TAD and is called the "prey". If A and B interact, they bind each other, and this binding physically brings the DBD and the TAD into close proximity, so that transcription of the downstream gene, known as a reporter gene, is activated (Figure 2.1).

Figure 2.1. The two-hybrid principle. Matchmaker® Gold Yeast Two-Hybrid

USA, Inc.). In this system, one protein is fused with the DNA-binding domain of GAL4 (orange, marked as “GAL4 DNA-BD”), and this protein is called the “bait”. Another protein is fused with the trans-activating domain of GAL4 (green, marked as “GAL4 AD”), and this protein is called the “prey”. If the bait and prey proteins interact, GAL4 DNA-BD and GAL4 AD come into close proximity, and transcription of the reporter genes occurs. This figure was adapted from the Matchmaker® Gold Yeast Two-Hybrid System User Manual, Protocol No. PT4084-1 (Clontech Laboratories, Inc., Mountain View, CA).
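The reporter logic of the two-hybrid principle can be summarized as a small truth-table sketch. The Python below is only an illustrative model and is not part of the Matchmaker protocol; the names `reporter_active` and `bait_autoactivates` are hypothetical. It also includes the autoactivation case, in which a bait activates the reporter without any prey and therefore produces a false positive.

```python
def reporter_active(bait_prey_interact: bool, bait_autoactivates: bool = False) -> bool:
    """Toy model of a two-hybrid reporter readout.

    The reporter is transcribed when the DNA-binding domain (on the bait)
    and the activating domain (on the prey) are brought together by a
    bait-prey interaction, or when the bait alone activates transcription
    (autoactivation, which gives a false positive).
    """
    return bait_prey_interact or bait_autoactivates


# Enumerate the four possible combinations of the two conditions.
for interact in (False, True):
    for autoactivates in (False, True):
        status = "ON" if reporter_active(interact, autoactivates) else "OFF"
        print(f"interaction={interact!s:5} autoactivation={autoactivates!s:5} reporter={status}")
```

In this simplified view, a positive readout is informative only for a bait that does not autoactivate, which is why the bait is tested alone before screening.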

The viability of the Y2H system has been well-established in many studies. For example, Peng et al. (2013) utilized Y2H to successfully show the interactions of five

MabHLHs that form heterodimers as well as their interactions with MaICE1 in banana fruit.

Rajagopala et al. (2012) tested the feasibility of Y2H by analyzing protein complexes of known structure, the E. coli DNA polymerase III complex and the MntR complex, in a Y2H assay, and by reviewing previously published Y2H assays on the following protein complexes: Varicella Zoster Virus ribonucleotide reductase, bacteriophage λ, yeast proteasome, and human spliceosome. Their analysis showed that Y2H is suitable for analyzing interaction

networks of subunits within a protein complex. In P. sojae, H. Brar and M. K. Bhattacharyya

used Y2H to identify several proteins from soybean that potentially interact with Avr1b, one

of the well-studied effector proteins from P. sojae (unpublished, reviewed in Li, 2010). Li

(2010) used Y2H to further confirm that some conserved residues in the Avr1b protein were

needed for the interactions between Avr1b and soybean GmPUB1s. This study demonstrated

that Y2H analysis is suitable for studying the interactions of P. sojae proteins. In addition, Cheng et al. (2018) studied the molecular mechanism of GmPIB1, a soybean bHLH transcription factor that is expressed in resistant soybean against P. sojae infection. Using Y2H, they showed that GmPIB1 can form homodimers. Naveed, Bibi, and Ali (2019) used Y2H to identify the potential interactive targets of the P. infestans RXLR effector PiAvrblb2 in tomato. Taken together, these studies show that Y2H is suitable for studying TFs, including TFs that form homodimers.

II Phylogenetic Analysis

Phylogenetic analysis plays an important role in biological studies of the evolutionary relationships of organisms, genes, and proteins. Through phylogenetic analysis, evolutionary connections can be made within and between species using similarities between nucleotide and amino acid sequences. This is especially useful when studying conserved domains of proteins. Comparative genomics can help identify conserved domains or motifs in proteins under study through comparison and alignment with known protein sequences. For example, Yan et al. (2013) studied the phylogeny of ERF proteins in sorghum. They determined the hidden Markov model (HMM) of the conserved AP2 domain by analyzing Arabidopsis AP2/ERF proteins via the PFAM database (http://pfam.janelia.org/search/sequence), then used this HMM to find AP2/ERF proteins in the sorghum genome. In another study, Martinez-Duncker et al. (2003) used a neighbor-joining phylogenetic analysis of four groups of enzymes (POFUT1, POFUT2, α2- and α6-FUTs) and concluded that the four groups are, in fact, four distinct enzyme families, i.e., the members within each group are closer to each other than to the members of other groups. Furthermore, they were able to infer from the phylogenetic tree the approximate time of the divergence events of the enzymes relative to the evolution and emergence of the host species.
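Distance-based methods such as neighbor-joining start from a matrix of pairwise distances between aligned sequences. As a minimal sketch of that first step only, the Python below computes the simplest such measure, the p-distance (the proportion of differing sites). The toy alignment and the sequence names are hypothetical, and the cited studies may have used other distance models.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ("-") in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Hypothetical toy alignment (not real Ps1365 data).
alignment = {
    "seqA": "MKTAYIAK",
    "seqB": "MKTAYLAK",
    "seqC": "MRTA-LSK",
}

# Pairwise distance table that a tree-building method would take as input.
distances = {
    (x, y): p_distance(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}

for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

A tree-building program then clusters the sequences from such a table; that clustering step is not shown here.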

Hypotheses and Aims

In this chapter, I hypothesized that a potential TF, Ps1365, interacts with one or more

proteins in P. sojae. This study has three aims:

1) Bioinformatic and phylogenetic analysis of the bait sequence, Ps1365.

2) Cloning of the bait sequence into the yeast strain Saccharomyces cerevisiae Y2HGold to construct the bait yeast strain.

3) Identification of proteins that interact with Ps1365 via Y2H.

Materials and Methods

I Biological Materials

Phytophthora sojae: P. sojae strain P6497 was grown using V8 agar plates. To prepare

1 L of V8 agar media, 200 mL V8 vegetable juice was added to 2.5 g CaCO3, and adjusted to

1 L by adding deionized water (dH2O). The pH was adjusted to 6-7 by using one or more

drops of 10 M NaOH. Then, 15 g of Bacto agar was added and stirred until homogenized, then autoclaved at 121°C for 30 min. The autoclaved media was poured into sterile petri

dishes. An autoclaved polycarbonate membrane (Sigma-Aldrich, St. Louis, MO) was

carefully placed onto each newly-made V8 agar plate using sterile forceps. To propagate P.

sojae mycelium, a plug with well-grown P. sojae mycelia was removed from an old V8 agar

plate using a cork borer (the cork borer penetrated through the V8 agar media until it hit the

bottom of the petri dish). Then the plug with mycelia on top was placed onto the center of the polycarbonate membrane on the newly-made V8 agar plate using sterile forceps or a sterile needle, and the new V8 agar plate was incubated at room temperature (about 18-22°C) for 5-7 days. The mycelia were scraped off the polycarbonate membrane surface using a razor blade, placed into sterile screw-capped 2.0 mL microfuge tubes, flash-frozen in liquid nitrogen, and immediately stored at -80°C until use.
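The V8 agar recipe given above is stated per 1 L, so preparing other volumes is a matter of linear scaling. The following Python is a small sketch of that arithmetic; the helper name `scale_recipe` is hypothetical, and pH adjustment and autoclaving are unaffected by it.

```python
# Amounts per 1 L of V8 agar medium, taken from the recipe in the text.
V8_AGAR_PER_LITER = {
    "V8 vegetable juice (mL)": 200.0,
    "CaCO3 (g)": 2.5,
    "Bacto agar (g)": 15.0,
}


def scale_recipe(per_liter: dict, final_volume_l: float) -> dict:
    """Scale per-liter component amounts to the requested final volume."""
    if final_volume_l <= 0:
        raise ValueError("final volume must be positive")
    return {name: amount * final_volume_l for name, amount in per_liter.items()}


# Example: amounts needed for 500 mL of medium.
half_liter = scale_recipe(V8_AGAR_PER_LITER, 0.5)
for name, amount in half_liter.items():
    print(f"{name}: {amount}")
```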

Escherichia coli: One Shot® Mach1™-T1R and One Shot® TOP10 Chemically

Competent E. coli cells (Invitrogen, Carlsbad, CA) were used to propagate the yeast plasmids pGBKT7 and pGADT7-Rec. Cells were cultured on LB agar plates and in LB broth with 50 μg/mL kanamycin or 100 μg/mL ampicillin for selection. To prepare 500 mL of LB broth, 500 mL dH2O was added to 12.5 g LB ready-made powder (Lab Express International Inc., Fairfield, NJ), then stirred until the powder was completely dispersed. The pH was adjusted to 7.0, if needed. Then, the media was autoclaved at 121°C for 30 min. LB agar was prepared similarly, except that 7.5 g of Bacto agar was added before autoclaving. It was then poured into sterile petri dishes once it had cooled to about 55°C. To make LB broth with kanamycin (50 μg/mL) or ampicillin (100 μg/mL), one volume of 50 mg/mL kanamycin stock solution or 100 mg/mL ampicillin stock solution was added to 1000 volumes of LB broth at room temperature, and the container was shaken vigorously to mix well. LB agar with kanamycin (50 μg/mL) or ampicillin (100 μg/mL) was made by adding one volume of 50 mg/mL kanamycin stock solution or 100 mg/mL ampicillin stock solution to 1000 volumes of LB agar once the agar had cooled to about 55°C, and the container was shaken vigorously to mix well. The antibiotic-containing agar was immediately poured into sterile petri dishes. E. coli cells on LB agar plates were incubated at 37°C overnight; E. coli cells in LB broth were cultured at 37°C overnight with shaking at 225 rpm.

Saccharomyces cerevisiae: two yeast strains, S. cerevisiae Y2HGold and Y187 were

used. Both of them were grown on YPDA agar plates if they did not contain any vector.

Y2HGold with bait vector was grown on SD/-Trp single dropout (SDO) agar plates, while

Y187 with prey constructs was grown on SD/-Leu agar plates for selection. The diploid yeast

cells, which were formed by the budding and mating of the bait and prey yeast strains, were grown on SD/–Leu/–Trp double dropout media (DDO). X-α-Gal and Aureobasidin A (AbA) were used for observing the activity of reporter genes. Ready-made media powder pouches from Clontech were used for the preparation of all the above YPDA and SD media (including all the nutrient-dropout media, e.g., SDO and DDO). Broth media were prepared by dissolving a pouch of the corresponding ready-made broth media powder from Clontech in 500 mL of dH2O, adjusting the pH to about 5.8 with 10 M NaOH, and autoclaving at 121°C for 15 min. Agar media were prepared by one of two procedures: (1) using a pouch of the corresponding ready-made broth media powder from Clontech, following exactly the same procedure as for the broth media, except that 10 g of Bacto agar was added to each 500 mL of broth media before autoclaving; or (2) directly using ready-made agar media powder from Clontech, following exactly the same procedure as for the broth media, except that the mixture was homogenized before autoclaving. The components of YPDA agar media were as follows:

peptone 20 g/L

yeast extract 10 g/L

agar 20 g/L

adenine 120 mg/L

The components of SD agar (including SDO and DDO) media were as follows:

yeast nitrogen base 6.7 g/L

agar 20 g/L

amino acids see below, “amino acid compositions of SD media”

dextrose 2%

The amino acid compositions of SD media are as follows:

L-Adenine hemisulfate salt 20 mg/L

L-Arginine HCl 20 mg/L

L-Histidine HCl monohydrate 20 mg/L

L-Isoleucine 30 mg/L

L-Leucine 100 mg/L

L-Lysine HCl 30 mg/L

L-Methionine 20 mg/L

L-Phenylalanine 50 mg/L

L-Threonine 200 mg/L

L-Tryptophan 20 mg/L

L-Tyrosine 30 mg/L

L-Uracil 20 mg/L

L-Valine 150 mg/L

SDO and DDO lack the corresponding amino acids. For example, SD/-Trp media lacks L-Tryptophan in its amino acid composition.

X-α-Gal stock solution was prepared by dissolving 250 mg of X-α-Gal in 12.5 mL of dimethylformamide (DMF) and stored at -20°C. AbA stock solution was prepared by dissolving 1 mg of AbA in 2 mL of ethanol and stored at 4°C. To make yeast media containing X-α-Gal, 1 mL of X-α-Gal stock solution (20 mg/mL) was added to 500 mL of yeast media once the media had cooled to 55°C after autoclaving, and the container was shaken to mix thoroughly. To make yeast media containing AbA, 200 µL of AbA stock solution (500 µg/mL) was added to 500 mL of yeast media once the media had cooled to 55°C after autoclaving, and the container was shaken to mix thoroughly.
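The stock and working concentrations quoted above follow from simple dilution arithmetic. The sketch below recomputes them from the stated masses and volumes. The function names are illustrative only, and the small volume of added stock is neglected in the final volume, which is an assumption.

```python
def stock_concentration(mass_mg: float, volume_ml: float) -> float:
    """Return the stock concentration in mg/mL."""
    return mass_mg / volume_ml


def working_concentration(stock_mg_per_ml: float, stock_volume_ml: float,
                          media_volume_ml: float) -> float:
    """Return the final concentration in ug/mL.

    Assumption: the added stock volume is neglected in the final volume.
    """
    return stock_mg_per_ml * stock_volume_ml / media_volume_ml * 1000.0


x_gal_stock = stock_concentration(250, 12.5)   # mg/mL
aba_stock = stock_concentration(1, 2)          # mg/mL

x_gal_final = working_concentration(x_gal_stock, 1.0, 500)  # 1 mL into 500 mL
aba_final = working_concentration(aba_stock, 0.2, 500)      # 200 uL into 500 mL

print(f"X-alpha-Gal: stock {x_gal_stock:g} mg/mL, final {x_gal_final:g} ug/mL")
print(f"AbA: stock {aba_stock * 1000:g} ug/mL, final {aba_final * 1000:g} ng/mL")
```

Under these assumptions the media contain roughly 40 µg/mL X-α-Gal and 200 ng/mL AbA.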

II Bioinformatic Analysis of Ps1365

To obtain full-length amino acid and nucleotide sequences of Ps1365, its partial amino acid sequence was used as a query to search against P. sojae nucleotide and protein databases at NCBI (National Center for Biotechnology Information) (https://www.ncbi.nlm.nih.gov),

JGI (U.S. Department of Energy Joint Genome Institute) (https://jgi.doe.gov) and FungiDB

(Stajich et al., 2012) (https://fungidb.org/fungidb/) using BLAST tools. The NCBI BLAST

parameters were as follows: the database was “Non-redundant protein sequences (nr)” and the algorithm was “blastp (protein-protein BLAST)”. For the JGI BLAST, the alignment program was set to blastp (blast protein vs. protein) and the database to “Phytophthora sojae v3.0 filtered model proteins”; all other parameters were left at their defaults. For the FungiDB BLAST, “Target Data Type” was set to “Proteins”, “BLAST Program” to “blastp”, and “Target Organism” to “Phytophthora sojae strain P6497”; all other parameters were left at their defaults.

The amino acid sequences of the 30 homologous proteins of Ps1365 from P. sojae were provided by Brian Rutter. The eight homologous protein sequences of Ps1365 from other oomycetes were retrieved from FungiDB. Proteins with abnormal insertions or deletions, which indicate that they are encoded by pseudogenes, were removed. The alignment of Ps1365 with its homologous P. sojae proteins and with related proteins from other oomycetes, as well as the construction of their phylogenetic trees, was performed using MEGA 7.0.26 (Kumar, Stecher, & Tamura, 2016). Two methods, neighbor-joining (NJ) and maximum likelihood (ML), were used to construct the nucleic acid and protein phylogenetic trees. The parameters in Figure 2.2a were used for the nucleic acid NJ tree, those in Figure 2.2b for the nucleic acid ML tree, those in Figure 2.2c for the protein NJ tree, and those in Figure 2.2d for the protein ML tree.

Figure 2.2. Parameters used for constructing the phylogenetic tree of Ps1365, its 21 homologous proteins in P. sojae as well as some of their homologous counterparts in other oomycetes. The software MEGA 7.0.26 was used for the tree construction. a) nucleic acid NJ tree parameters; b) nucleic acid ML tree parameters; c) protein NJ tree parameters; d) protein ML tree parameters.

A Kyte-Doolittle hydropathy plot (Kyte & Doolittle, 1982) was used to analyze the hydrophobicity of Ps1365, with the window size set to 9. The signal peptide was predicted using SignalP (Petersen, Brunak, von Heijne, & Nielsen, 2011): the organism group was set to “Eukaryotes”, “D-cutoff values” was left at the default, and “Method” was set to “Input sequences may include TM regions”. Ps1365 was analyzed for potential conserved domains using Pfam (Finn et al., 2016). The secondary structure of Ps1365 was predicted using JPred 4 (Drozdetskiy, Cole, Procter, & Barton, 2015) with default parameters. The subcellular localization of Ps1365 was predicted by CELLO (Yu, Lin, & Hwang, 2004; Yu, Chen, Lu, & Hwang, 2006), with “Eukaryotes” selected for “ORGANISMS” and “Protein” selected for “SEQUENCES”.
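To illustrate what a Kyte-Doolittle analysis with a window size of 9 computes, the following minimal sketch averages the published hydropathy index over a sliding window. It is not the tool used in this work. The function name is hypothetical, and the input is only the first line of the partial Ps1365 sequence shown in Figure 2.5, used here purely as an example.

```python
# Kyte-Doolittle hydropathy indices (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[float]:
    """Mean hydropathy of every full window, one value per window start."""
    scores = [KD[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Illustrative input: first line of the partial Ps1365 sequence (Figure 2.5).
fragment = "NVRGRTRDGKFIYATTMKGNLEPKKKGNASEVKEESQTIGSQAKKRAKIESNPGVSS"
profile = hydropathy_profile(fragment, window=9)

# Index of the window with the highest mean hydropathy (0-based).
peak = max(range(len(profile)), key=profile.__getitem__)
print(f"{len(profile)} windows; most hydrophobic window starts at residue {peak + 1}")
```

Positive window averages indicate hydrophobic stretches and negative averages indicate hydrophilic ones. A sequence of length n gives n - 8 values for a window of 9.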

III Y2H Bait Construction

DNA was extracted from frozen P. sojae mycelia ground with a mortar and pestle in liquid nitrogen. The resulting dry mycelium powder was immediately used for DNA extraction with the Qiagen DNeasy Plant Mini Kit (Qiagen, Redwood City, CA) according to the manufacturer’s protocol.

Cloning of the Ps1365 bait gene into E. coli: the Y2H bait was constructed using the Matchmaker Gold Yeast Two-Hybrid System according to its manual (protocol No.: PT4084-1, Clontech Laboratories, Inc., Mountain View, CA). The Ps1365-encoding gene was amplified by PCR using gene-specific forward and reverse primers (Table 2.1) and P. sojae genomic DNA as the template, with the following PCR program:

initialization: 95°C, 2 min

30 cycles of: denaturation 95°C, 30 sec; annealing 67°C, 30 sec; extension 72°C, 1 min

final elongation: 72°C, 5 min

The PCR products were purified using the QIAGEN MinElute PCR Purification Kit (QIAGEN, Redwood City, CA) according to its manual. First, 5 volumes of Buffer PB were added to the PCR reaction; the resulting mixture was yellow, indicating that its pH was ≤ 7.5. The PCR products were passed through MinElute columns by centrifugation at 13,000 rpm at room temperature for 1 min. The DNA on the MinElute column membrane was then washed with 750 μL of Buffer PE by centrifugation at 13,000 rpm at room temperature for 1 min, and the column was centrifuged once more at 13,300 rpm at room temperature for 1 min. Finally, the DNA was eluted with 30 μL dH2O by centrifugation at 13,000 rpm for 1 min. The purified PCR product was sequenced (DNA Analysis, LLC, Cincinnati, OH) using the Ps1365-encoding gene forward primer to confirm the gene sequence and its open reading frame.

Figure 2.3. pGBKT7 (BD) vector. Clontech Laboratories, Inc. (n/k/a Takara Bio USA, Inc.). This plasmid was used as bait. This figure was adapted from the Matchmaker® Gold Yeast Two-Hybrid System User Manual, Protocol No.: PT4084-1 (Clontech Laboratories, Inc., Mountain View, CA).

Table 2.1. Primers in this research

Primer             Sequence (5’-3’)                            PCR annealing temperature (°C)

Ps1365F            CATGGAGGCCGAATTCATGAATGTGCGCGGTAGAACGCGT    67

Ps1365             GCAGGTCGACGGATCCTCACTCTTCTTCCTTGTCACCCGG    67

SMART III Oligo    AAGCAGTGGTATCAACGCAGAGTGGCCATTATGGCCGGG     70.8

CDS III            ATTCTAGAGGCCGAGGCGGCCGACATG-d(T)30VN        42

CDS III/6          ATTCTAGAGGCCGAGGCGGCCGACATG-NNNNNN          42

5’ PCR Primer      TTCCACCCAAGCAGTGGTATCAACGCAGAGTGG           68

3’ PCR Primer      GTATCGATGCCCACCCTCTAGAGGCCGAGGCGGCCGACA     68

After sequencing, the full-length sequence of the Ps1365-encoding gene was cloned into the bait vector pGBKT7 (BD) (Figure 2.3) as follows. Because the gene-specific forward and reverse primers used to amplify the Ps1365 gene carry overhangs homologous to the two sequences flanking the multiple cloning site of pGBKT7 (BD), the amplified sequence could be inserted into the multiple cloning site of the vector by homologous recombination. To prepare the vector for recombination, circular pGBKT7 was double-digested with EcoRI and BamHI at 37°C for one hour and then incubated at 65°C for 20 min. The double-digested vector was then purified by ethanol precipitation (Lamitina Lab, 2007).
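The primer design described above can be checked with a short sketch. It splits each Ps1365 primer from Table 2.1 at its restriction site (EcoRI, GAATTC, in the forward primer; BamHI, GGATCC, in the reverse primer) and translates the gene-specific part. The helper names are hypothetical, and the reverse primer is written without the internal space that appears in the table, on the assumption that the space is a typesetting artifact.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def gene_specific_part(primer: str, site: str) -> str:
    """Return the part of the primer that follows the restriction site."""
    return primer[primer.index(site) + len(site):]


forward = "CATGGAGGCCGAATTCATGAATGTGCGCGGTAGAACGCGT"
reverse = "GCAGGTCGACGGATCCTCACTCTTCTTCCTTGTCACCCGG"  # space in table removed

fwd_gene = gene_specific_part(forward, "GAATTC")  # after the EcoRI site
rev_gene = gene_specific_part(reverse, "GGATCC")  # after the BamHI site

print(translate(fwd_gene))                      # N-terminal residues
print(translate(reverse_complement(rev_gene)))  # C-terminal residues and stop
```

The forward primer translates to MNVRGRTR, whose residues after the start methionine match the beginning of the partial Ps1365 sequence reported by Rutter (2012). The reverse primer, read on the coding strand, ends in a stop codon.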

The ethanol precipitation was conducted as follows:

1) A 1/10 volume of sodium acetate (pH 5.2) was added to the digested vector.

2) The mixture was mixed by gentle pipetting.

3) 2.5 volumes (relative to vector + sodium acetate) of cold ethanol (stored at -20°C) were added to the reaction.

4) The mixture was mixed by gentle pipetting.

5) The reaction was stored at -20°C for three hours.

6) The reaction was centrifuged at 16,300 g (max) for 15 min.

7) The supernatant was carefully decanted.

8) 1 mL of 70% ethanol was added; the tube was centrifuged at 16,300 g for 2 min, and the residual ethanol was carefully decanted.

9) The tube was centrifuged once more at 16,300 g for 2 min, and the residual ethanol was carefully decanted.

10) The DNA pellet was resuspended in 15 μL dH2O.
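The volumes in steps 1) and 3) scale with the sample volume. A minimal sketch of that arithmetic is given below. The function name and the 50 μL example volume are hypothetical, since the volume of the digested vector is not stated here.

```python
def ethanol_precipitation_volumes(sample_ul: float) -> dict[str, float]:
    """Reagent volumes for the precipitation steps described above.

    Sodium acetate: 1/10 of the sample volume.
    Cold ethanol: 2.5 x (sample + sodium acetate).
    """
    sodium_acetate_ul = sample_ul / 10
    ethanol_ul = 2.5 * (sample_ul + sodium_acetate_ul)
    return {
        "sodium_acetate_ul": sodium_acetate_ul,
        "cold_ethanol_ul": ethanol_ul,
        "total_ul": sample_ul + sodium_acetate_ul + ethanol_ul,
    }


# Hypothetical example: a 50 uL restriction digest.
for name, volume in ethanol_precipitation_volumes(50).items():
    print(f"{name}: {volume:g}")
```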

The purified double-digested pGBKT7 vector and the Ps1365 gene insert were joined via homologous recombination using an In-Fusion HD Cloning Kit (Clontech Laboratories, Inc., Mountain View, CA) according to its manual. First, the following reagents were combined in the volumes shown:

5× In-Fusion HD Enzyme Premix 4 μL

linearized pGBKT7 vector 10.3 μL (50.573 ng)

Ps1365 5.4 μL (50.274 ng)

dH2O 0.3 μL

The reaction was incubated at 50°C for 15 min.
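The reaction set-up above can be sanity-checked numerically. The sketch below confirms the total volume and the final premix concentration, and estimates the insert:vector molar ratio. The two fragment lengths are assumptions not given in this section: about 7,300 bp for pGBKT7, and 503 bp for the insert (the 471-bp gene plus the two 16-nt primer extensions from Table 2.1).

```python
reaction_ul = {
    "5x In-Fusion HD Enzyme Premix": 4.0,
    "linearized pGBKT7 vector": 10.3,
    "Ps1365 insert": 5.4,
    "dH2O": 0.3,
}

total_ul = sum(reaction_ul.values())
premix_final_x = 5 * reaction_ul["5x In-Fusion HD Enzyme Premix"] / total_ul

vector_ng, insert_ng = 50.573, 50.274

# ASSUMPTIONS (not stated in the text): approximate fragment lengths in bp.
VECTOR_BP = 7300
INSERT_BP = 471 + 2 * 16

# For double-stranded DNA, moles are proportional to mass / length,
# so the per-bp mass constant cancels in the ratio.
molar_ratio = (insert_ng / INSERT_BP) / (vector_ng / VECTOR_BP)

print(f"total volume: {total_ul:.1f} uL, premix at {premix_final_x:.2f}x")
print(f"insert:vector molar ratio (under the assumed lengths): {molar_ratio:.1f}:1")
```

Because the insert and vector masses are nearly equal, the molar ratio is determined almost entirely by the ratio of the two fragment lengths.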

The product of the In-Fusion reaction was then transformed into Invitrogen One Shot® TOP10 Chemically Competent E. coli cells (Thermo Fisher Scientific Inc., Waltham, MA) according to the protocol below:

1) Frozen TOP10 cells were taken out of -80°C storage and immediately placed on ice. Then 2 μL of the In-Fusion reaction was added to each tube of TOP10 cells and mixed gently.

2) The cells were incubated on ice for 10 min.

3) The cells were heat-shocked in a 42°C water bath for 30 sec.

4) The cells were immediately incubated on ice for 2 min.

5) 250 μL of room-temperature SOC medium was added to each tube of TOP10 cells.

6) The tubes were shaken at 200 rpm at 37°C for one hour.

7) Each tube was plated on one LB agar plate containing 50 μg/mL kanamycin.

8) The plates were incubated at 37°C overnight.

The components of the SOC medium are as follows:

Tryptone 2%

Yeast Extract 0.5%

NaCl 10 mM

KCl 2.5 mM

MgCl2•6H2O 10 mM

glucose 20 mM

The pGBKT7 vectors containing the Ps1365 gene insert were extracted from the E. coli cells using the Zyppy™ Plasmid Miniprep Kit (Zymo Research Corporation, Irvine, CA) according to its protocol. First, 100 μL of 7× Lysis Buffer was added to each 600 μL of bacterial liquid culture and mixed. Then, 350 μL of 4°C Neutralization Buffer was added to each reaction and mixed. The reactions were centrifuged at 16,000 for 4 min. The supernatants were passed through Zymo-Spin IIN columns by centrifugation at 16,000 g for 30 sec to trap the plasmids on the column membranes. Each column membrane was washed with 200 μL of Endo-Wash Buffer by centrifugation at 16,000 rpm for 30 sec. This step was repeated with 400 μL of Endo-Wash Buffer. The plasmids on each column membrane were eluted with 30 μL dH2O by centrifugation at 16,000 g for 30 sec. The purified plasmid was sequenced by DNA Analysis, LLC (Cincinnati, OH) to ensure that the gene was in the correct open reading frame within the vector pGBKT7.

Sub-cloning of the Ps1365 gene in yeast strain S. cerevisiae Y2HGold: the pGBKT7 vector containing the full-length sequence of the Ps1365 gene (abbreviated as pGBKT7-Ps1365) was transformed into the yeast strain S. cerevisiae Y2HGold using the Yeastmaker™ Yeast Transformation System 2 (Clontech Laboratories, Inc., Mountain View, CA) according to its protocol. First, competent yeast cells were prepared as follows:

1) Yeast cells were incubated on YPDA agar plates at 30°C until the colony diameters reached 2-3 mm.

2) Each single colony with a diameter of 2-3 mm was transferred into 3 mL of YPDA broth and incubated with shaking at 250 rpm at 30°C for 12 h.

3) A 5 μL aliquot of the culture was transferred into 50 mL of YPDA broth and incubated with shaking at 250 rpm at 30°C until the OD600 reached 0.194.

4) The yeast culture was centrifuged at 700 g for 5 min to pellet the yeast cells, and the supernatant was discarded. The cell pellet was resuspended in 100 mL of fresh YPDA broth and incubated with shaking at 250 rpm at 30°C until the OD600 reached 0.447.

5) The resulting yeast culture was divided into two 50 mL volumes in two sterile Falcon conical tubes and centrifuged at 700 g for 5 min to pellet the yeast cells. The supernatants were discarded.

6) The cells were resuspended in 30 mL dH2O.

7) The suspension was centrifuged at 700 g for 5 min and the supernatants were discarded. The cell pellets were resuspended in 1.5 mL of 1.1× TE/LiAc (prepared by mixing 1.1 mL of 10X TE Buffer and 1.1 mL of 1 M LiAc (10X) solution and adding sterile dH2O to a total volume of 10 mL).

8) The suspension was centrifuged at 16,300 g for 1 min.

9) The supernatants were discarded and the cells were resuspended in 600 μL of 1.1× TE/LiAc, then placed on ice.

Then, yeast transformation was conducted as follows:

1) The following components were mixed in each empty tube:

Plasmid [pGBKT7-Ps1365] 2 μL (118.6 ng)

Yeastmaker Carrier DNA denatured 5 μL (50 μg)

2) 50 μL of competent cells was added to each tube and mixed by gentle vortexing.

3) 500 μL of PEG/LiAc (prepared by combining 8 mL of 50% PEG 3350, 1 mL of 10X TE Buffer, and 1 mL of 1 M LiAc (10X)) was added to each reaction tube and mixed by gentle vortexing.

4) The tubes were incubated at 30°C for 30 min. The cells were mixed every 10 min by gentle vortexing.

5) 20 μL of DMSO (dimethylsulfoxide) was added and mixed by gentle vortexing.

6) The tubes were incubated in a 42°C water bath for 15 min. The cells were mixed every 5 min by gentle vortexing.

7) The yeast cells were pelleted by centrifugation at 16,300 g for 1 min.

8) The supernatants were discarded. The cells were resuspended in 1 mL of YPD Plus Medium.

9) The yeast cells were pelleted by centrifugation at 16,300 g for 1 min.

10) The supernatants were discarded. The cells were resuspended in 1 mL of 0.9% NaCl solution.

11) The cells were diluted to 1/10 and 1/100 concentrations with 0.9% NaCl solution and plated on SD/-Trp agar plates, then incubated at 30°C until colonies 2-3 mm in diameter appeared (five days).
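The compositions of the two LiAc solutions used in the competent-cell preparation and transformation steps above follow from the dilution rule C1·V1 = C2·V2. The sketch below works them out from the stated recipes. The function name is illustrative only.

```python
def final_concentration(stock: float, stock_volume_ml: float,
                        total_volume_ml: float) -> float:
    """Dilution rule C1 * V1 = C2 * V2, solved for C2."""
    return stock * stock_volume_ml / total_volume_ml


# 1.1x TE/LiAc: 1.1 mL of 10X TE + 1.1 mL of 1 M LiAc, made up to 10 mL.
te_liac = {
    "TE (x)": final_concentration(10, 1.1, 10),
    "LiAc (M)": final_concentration(1.0, 1.1, 10),
}

# PEG/LiAc: 8 mL of 50% PEG 3350 + 1 mL of 10X TE + 1 mL of 1 M LiAc.
peg_total_ml = 8 + 1 + 1
peg_liac = {
    "PEG 3350 (%)": final_concentration(50, 8, peg_total_ml),
    "TE (x)": final_concentration(10, 1, peg_total_ml),
    "LiAc (M)": final_concentration(1.0, 1, peg_total_ml),
}

print("1.1x TE/LiAc:", {k: round(v, 3) for k, v in te_liac.items()})
print("PEG/LiAc:", {k: round(v, 3) for k, v in peg_liac.items()})
```

The recipes therefore correspond to 1.1× TE with 0.11 M LiAc, and to 40% PEG 3350 in 1× TE with 0.1 M LiAc.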

Then, glycerol stocks of Y2HGold [pGBKT7-Ps1365] (Y2HGold cells including pGBKT7-Ps1365) were made and stored at -80°C.

To confirm whether the bait strain (S. cerevisiae Y2HGold transformants) contained the Ps1365-encoding gene sequence, yeast colony PCR was performed on two single bait colonies. First, Y2HGold [pGBKT7-Ps1365] cells were grown on SD/-Trp agar plates at 30°C for 5 days; single yeast colonies 2-3 mm in diameter were then picked and lysed with NaOH as follows:

1) Each single yeast colony was picked and suspended in 20 μL of 20 mM NaOH.

2) The yeast-NaOH suspension was incubated at 95°C for 45 minutes to break down the yeast cell walls.

3) The suspension was centrifuged at maximum speed (20,817 g) for 10 minutes.

Using the resulting supernatants as templates, yeast colony PCR was performed with Taq 2× Master Mix (New England Biolabs, Ipswich, MA) and the Ps1365 gene-specific primers (Table 2.1). The supernatant from untransformed S. cerevisiae Y2HGold cells was used as a negative control, and the vector pGBKT7-Ps1365 was used as a positive control. The following PCR program was used:

initialization: 95°C, 2 min

30 cycles of: denaturation 95°C, 30 sec; annealing 67°C, 30 sec; extension 72°C, 1 min

final elongation: 72°C, 5 min

The components of the Taq 2× Master Mix used are as follows:

Tris-HCl 20 mM

KCl 100 mM

MgCl2 3 mM

dNTPs 0.4 mM

Glycerol 10%

IGEPAL® CA-630 0.16%

Tween® 20 0.1%

Taq DNA Polymerase 50 Units/mL

IV Bait Autoactivation Assay

The pGBKT7-Ps1365 bait strain was tested for autoactivation to confirm that Ps1365 would not be able to activate the expression of the reporter genes without the presence of prey proteins. The bait autoactivation assay was performed using the Matchmaker Gold Yeast

Two-Hybrid System according to the manufacturer’s protocol (protocol No.: PT4084-1,

Clontech Laboratories, Inc., Mountain View, CA).

In order to conduct the autoactivation test, five test groups were set up:

① empty cell group: Y2HGold cells without any vector

② empty vector group: Y2HGold cells transformed with pGBKT7 (BD) vector without any insert

③ test group: Y2HGold transformed with pGBKT7-Ps1365

④ negative control: Y2HGold [pGBKT7-Lam] X Y187 [pGADT7-T]

⑤ positive control: Y2HGold [pGBKT7-53] X Y187 [pGADT7-T]

Group ④, the negative control, consists of the diploid yeast cells resulting from the mating of Y2HGold [pGBKT7-Lam] and Y187 [pGADT7-T] cells. These diploid yeast cells will not express the reporter genes because the bait (Lam) and prey (T antigen) do not interact. Group ⑤, the positive control, consists of the diploid yeast cells resulting from the mating of Y2HGold [pGBKT7-53] and Y187 [pGADT7-T] cells. These diploid yeast cells will express the reporter genes because of the interaction between the bait (p53) and prey (T antigen).

Groups ①, ② and ③ were each plated on the following three agar media at 1/10, 1/100 and 1/1000 dilutions:

(1) SDO (SD/–Trp). Only yeast cells harboring the pGBKT7 vector can grow on this medium.

(2) SDO/X (SD/–Trp/X-α-Gal). Only yeast cells harboring the pGBKT7 vector can grow on this medium. In addition, cells in which the MEL1 gene is activated form a blue pigment on this medium.

(3) SDO/X/A (SD/–Trp/X-α-Gal/AbA). Only yeast cells that harbor the pGBKT7 vector and in which the AUR1-C gene is activated can grow on this medium. In addition, if the MEL1 gene is activated, the yeast cells form typical blue colonies on this medium.

The two control groups were each plated on the following three agar media at 1/10, 1/100 and 1/1000 dilutions:

(1) DDO (SD/–Leu/–Trp). Only yeast cells harboring both the pGBKT7 and pGADT7 vectors can grow on this medium.

(2) DDO/X (SD/–Leu/–Trp/X-α-Gal). Only yeast cells harboring both the pGBKT7 and pGADT7 vectors can grow on this medium. In addition, activation of the MEL1 gene causes the formation of a blue pigment.

(3) DDO/X/A (SD/–Leu/–Trp/X-α-Gal/AbA). Only yeast cells that harbor both the pGBKT7 and pGADT7 vectors and in which the AUR1-C gene is activated can grow on this medium. As mentioned above, activation of the MEL1 gene causes the formation of a blue pigment.

The plates were incubated at 30°C until colony diameters reached 2-3 mm. If blue

colonies appeared on both SDO/X and SDO/X/A agar plates of the test group, this meant that

Ps1365 caused autoactivation. If white or pale blue colonies appeared on the test group

SDO/X plates and no colonies on the test group SDO/X/A plates, this meant that Ps1365

failed to autoactivate the reporter genes without the presence of prey proteins.
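The decision rule in the preceding paragraph can be written out explicitly. The sketch below encodes only the two outcomes described there and returns "inconclusive" for any other combination. That third category, the function name, and the color labels are assumptions added for illustration.

```python
from typing import Optional


def interpret_autoactivation(sdo_x: Optional[str], sdo_x_a: Optional[str]) -> str:
    """Classify the test-group plates.

    Each argument is the colony color observed on that medium
    ('blue', 'pale blue' or 'white'), or None if no colonies grew.
    """
    if sdo_x == "blue" and sdo_x_a == "blue":
        return "autoactivation"
    if sdo_x in ("white", "pale blue") and sdo_x_a is None:
        return "no autoactivation"
    # Combinations not covered by the stated rule (assumed category).
    return "inconclusive"


print(interpret_autoactivation("blue", "blue"))
print(interpret_autoactivation("pale blue", None))
print(interpret_autoactivation("white", "blue"))
```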

The positive and negative control groups were constructed (as follows) according to

Matchmaker Gold Yeast Two-Hybrid System User Manual:

The diploid yeast cells containing pGBKT7-53 and pGADT7-T were used as a positive control, while the diploid yeast cells containing pGBKT7-Lam and pGADT7-T were used as a negative control in the autoactivation test.

Firstly, S. cerevisiae Y2HGold cells were divided into two groups: one group was transformed with the pGBKT7-53 vector (Figure 2.4a), and the other was transformed with the pGBKT7-Lam vector (Figure 2.4b). S. cerevisiae Y187 cells were transformed with the pGADT7-T vector (Figure 2.4c). All of these transformations were performed using the Yeastmaker Yeast Transformation System 2 according to its protocol. The resulting Y2HGold [pGBKT7-53] and Y2HGold [pGBKT7-Lam] cells were selected on SD/-Trp agar plates, while Y187 [pGADT7-T] cells were selected on SD/-Leu agar plates.

Within yeast cells, pGBKT7-53 expressed the GAL4 BD-fused p53 protein; pGBKT7-Lam expressed the GAL4 BD-fused lamin, while pGADT7-T expressed the GAL4

AD-fused T antigen. pGBKT7-53 was used for a positive control because the p53 protein expressed by pGBKT7-53 interacts with the T antigen (which is expressed by pGADT7-T), while pGBKT7-Lam was used for a negative control because lamin expressed by pGBKT7-Lam cannot interact with the T antigen. Thus, the diploid yeast cells containing pGBKT7-53 and pGADT7-T are the positive control, while the diploid yeast cells containing pGBKT7-Lam and pGADT7-T are the negative control in the autoactivation test.

Figure 2.4. Vectors used for the autoactivation assay. a) pGBKT7-53 vector picture. Clontech Laboratories, Inc. (n/k/a Takara Bio USA, Inc.). This vector contains a built-in murine p53 insert instead of the bait gene. b) pGBKT7-Lam DNA-BD Control Vector picture. Clontech Laboratories, Inc. (n/k/a Takara Bio USA, Inc.). This vector contains a human lamin C-encoding gene instead of the bait gene. c) pGADT7-T vector picture. Matchmaker® Gold (n/k/a Takara Bio USA, Inc.). This vector contains an SV40 large T-antigen gene. p53 and T-antigen can interact with each other, whereas lamin C and T-antigen cannot, so the diploid yeast cells (formed by Y2H mating) that contain both pGBKT7-53 and pGADT7-T function as the positive control, while the diploid yeast cells that contain both pGBKT7-Lam and pGADT7-T function as the negative control.

For the mating experiments, once the yeast colonies on the SD/-Trp and SD/-Leu plates reached 2-3 mm in diameter, mating was performed by co-culturing Y2HGold [pGBKT7-53] or Y2HGold [pGBKT7-Lam] cells with Y187 [pGADT7-T] cells in 500 μL of 2X YPDA with shaking at 200 rpm at 30°C overnight.

V Bait Toxicity Assay

To construct the control group for the toxicity test, S. cerevisiae Y2HGold cells were transformed directly with the pGBKT7 (BD) vector without any insert and then plated on SDO agar plates for screening. The test group and the empty vector group were plated separately on SDO agar plates and incubated at 30°C for five days. The growth of the yeast cells and/or the size of the colonies was used as the indicator of toxicity. If Ps1365 were toxic to yeast, there would be no colonies on the test group SDO plates, or the colonies of yeast cells containing the Ps1365 gene would be significantly smaller than those of the empty vector group. If there were no obvious difference in colony diameter between the empty vector group and the test group on SDO plates, Ps1365 would be considered non-toxic to yeast.

VI Prey Library Construction

Total RNA was extracted from P. sojae mycelia using the QIAGEN RNeasy Plant Mini Kit (Qiagen, Redwood City, CA) according to its protocol. First, 10 μL of β-mercaptoethanol (β-ME) was added to each 1 mL of Buffer RLT (from the QIAGEN RNeasy Plant Mini Kit) and mixed. Tubes of P. sojae mycelium powder were quickly removed from -80°C storage and immediately placed in liquid nitrogen. The tubes were then taken out of the liquid nitrogen, and 450 μL of Buffer RLT (containing β-ME) was immediately added to each tube, followed by vigorous vortexing to homogenize the sample. The lysates were passed through QIAshredder spin columns by centrifugation at 14,000 rpm for 2 min. Next, 0.5 volumes of RNase-free ethanol was added to the supernatant of each flow-through, and each mixture was immediately passed through an RNeasy spin column by centrifugation at 11,000 rpm for 1 min. The flow-throughs were discarded. The membrane of each RNeasy spin column was washed with 700 μL of Buffer RW1 by centrifugation at 11,000 rpm for 15 sec, washed with 500 μL of Buffer RPE by centrifugation at 11,000 rpm for 15 sec, and washed once more with 500 μL of Buffer RPE by centrifugation at 11,000 rpm for 2 min. All traces of Buffer RPE were removed by centrifuging once more at 14,000 rpm for 1 min. The RNA was eluted from each column membrane with 50 μL of RNase-free water by centrifugation at 11,000 rpm for 1 min. This elution step was repeated using the resulting eluate.

Both the first-strand cDNA synthesis and the long-distance PCR (LD-PCR) were performed according to the protocol of the Clontech Make Your Own “Mate & Plate™” Library System (Clontech Laboratories, Inc., Mountain View, CA) using the primers listed in Table 2.1. For the first-strand cDNA synthesis, 3 μL of template RNA (between 450 ng and 560 ng) and 1 μL of 10 μM CDS III or CDS III/6 primer were combined, incubated at 72°C for 2 min, cooled on ice for 3 min, and centrifuged at 14,000 rpm at 4°C for 10 sec. To each of these reactions, 5 μL of the following mixture was added:

5× First-strand Buffer 2 μL

10 mM DTT (dithiothreitol) 1 μL

10 mM dNTP Mix (10 mM of each) 1 μL

SMART MMLV Reverse Transcriptase (200 U/μL) 1 μL

The reactions were incubated at 42°C for 10 min (CDS III/6-primed reactions were incubated at 25°C for 10 min before this step). Then 1 μL of 10 μM SMART III-modified oligo primer was added to each tube, and the reactions were mixed and incubated at 42°C for one hour, then held at 75°C for 10 min to terminate first-strand synthesis. The reactions were then cooled to 20°C, 1 μL of RNase H (2 units) was added to each, and the reactions were incubated at 37°C for 20 min. Finally, the reactions were stored at -20°C.

To amplify first-strand cDNAs through LD-PCR, 2 μL of first-strand cDNA was taken

for each reaction. A mixture was prepared using the following reagents in the shown volume:

sterile dH2O 368 μL

10× Advantage 2 PCR Buffer 80 μL

10 mM dNTPs (10 mM of each) 16 μL

5’ PCR primer (10 μM) (Table 2.1) 16 μL

3’ PCR primer (10 μM) (Table 2.1) 16 μL

5M Betaine Solution 272 μL

50× Advantage 2 Polymerase Mix 16 μL

The components of the 10× Advantage 2 PCR Buffer used are as follows:

Tricine-KOH (pH 8.7 at 25°C) 400 mM

KOAc 150 mM

Mg(OAc)2 35 mM

BSA 37.5 µg/ml

Tween 0.05 %

Nonidet-P40 0.05 %

The 50× Advantage 2 Polymerase Mix used contained Taq polymerase, and its buffer composition is as follows:

Glycerol 50%

Tris-HCl (pH 8.0) 15 mM

KCl 75 mM

EDTA 0.05 mM

To each 2 μL of first-strand cDNA, 98 μL of the mixture above was added, and the following PCR program was run:

95°C 30 sec

22 cycles of: 95°C 10 sec; 68°C 6 min (extension time increased by 5 sec in each subsequent cycle)

68°C 5 min
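The LD-PCR master mix and cycling program above can be checked with a few lines of arithmetic. The sketch below derives the number of 98 μL aliquots the mix provides, the per-reaction volumes, and the cumulative extension time. It assumes that the first cycle uses a 6 min extension and that each later cycle adds 5 sec to the previous one.

```python
master_mix_ul = {
    "sterile dH2O": 368,
    "10x Advantage 2 PCR Buffer": 80,
    "10 mM dNTPs": 16,
    "5' PCR primer (10 uM)": 16,
    "3' PCR primer (10 uM)": 16,
    "5 M betaine": 272,
    "50x Advantage 2 Polymerase Mix": 16,
}
MIX_PER_REACTION_UL = 98
REACTION_VOLUME_UL = 100  # 98 uL of mix + 2 uL of first-strand cDNA

total_ul = sum(master_mix_ul.values())
n_reactions = total_ul // MIX_PER_REACTION_UL
per_reaction_ul = {name: vol / n_reactions for name, vol in master_mix_ul.items()}

# Final concentrations in each 100 uL reaction.
buffer_final_x = 10 * per_reaction_ul["10x Advantage 2 PCR Buffer"] / REACTION_VOLUME_UL
betaine_final_molar = 5 * per_reaction_ul["5 M betaine"] / REACTION_VOLUME_UL

# Assumed schedule: cycle 1 = 6 min, then +5 sec per subsequent cycle.
CYCLES = 22
extension_s = [6 * 60 + 5 * cycle for cycle in range(CYCLES)]

print(f"{total_ul} uL of mix, enough for {n_reactions} reactions")
print(f"buffer {buffer_final_x:g}x, betaine {betaine_final_molar:g} M")
print(f"total extension time: {sum(extension_s) / 60:.2f} min")
```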

The LD-PCR products were purified by ethanol precipitation. A 1/10 volume of 3 M sodium acetate (pH 5.2) was added to each LD-PCR reaction and mixed by gentle pipetting. Then 2.5 volumes (relative to the combined volume of the LD-PCR reaction and sodium acetate) of cold ethanol (stored at -20°C) was added and mixed by gentle pipetting. The reactions were incubated at -20°C overnight, then removed from the freezer and centrifuged at 14,000 rpm for 20 min at room temperature. The supernatants were discarded, and the precipitates were resuspended in 1 mL of 70% ethanol per reaction, mixed by gentle pipetting, and centrifuged at 14,000 rpm for 2 min at room temperature; the supernatants were carefully decanted. The remaining pellets were centrifuged once more at 14,000 rpm for 2 min, and the supernatants were carefully decanted. Finally, 21 μL of dH2O (pH 8.11) was added to each reaction to rehydrate the DNA.

The library stocks were then made according to Clontech Make Your Own “Mate &

Plate™” Library System User Manual (Clontech Laboratories, Inc., Mountain View, CA). For making library stocks, SmaI-linearized pGADT7-Rec vectors and cDNAs were co-transformed into S. cerevisiae Y187 cells using Yeastmaker Yeast Transformation System

2 (Clontech Laboratories, Inc., Mountain View, CA) according to its protocol. The resultant transformants were incubated on SD/-Leu agar plates (with 50 μg/ml kanamycin) at 30°C for

5 days. The ligation process was automatically completed within S. cerevisiae Y187 cells.

The colonies were collected to produce glycerol stocks and stored at -80 °C.

VII Yeast Two-hybrid Screening

The Y2H was performed using Matchmaker® Gold Yeast Two-hybrid System

(Clontech Laboratories, Inc., Mountain View, CA) according to its protocol as follows:

1) A fresh colony (2-3 mm in diameter) of the bait yeast construct was cultured at 30°C in 50 mL of SD/-Trp broth with shaking at 260 rpm until the OD600 reached 0.781.

2) The cells were pelleted by centrifugation at 1000 g for 5 min. The cell pellet was resuspended in 4 mL of SD/-Trp broth to a cell density of 1.16 × 10^8 cells/mL.

3) Next, 4 mL of the bait strain was mixed with two tubes of the P. sojae mycelium prey library (1 mL each) in a 2 L flask. Then 45 mL of 2X YPDA broth (containing 50 µg/mL kanamycin) was added, and the mixture was incubated with shaking at 50 rpm at 30°C for 20 h. Microscopic observation was used to confirm the mating of yeast cells. If mating had not occurred, incubation was continued.

4) Once three-lobed structures indicating successful mating were observed, the culture was centrifuged at 1000 g for 10 min to pellet the cells. The supernatant was discarded and the cell pellet was resuspended in 10 mL of 0.5X YPDA (with 50 µg/mL kanamycin), then plated on SD/-Trp, SD/-Leu and SD/–Leu/–Trp plates at 1/10, 1/100, 1/1000 and 1/10000 dilutions (100 µL per plate).

5) The remaining culture was plated on 150 mm DDO/X/A agar plates (200 µL to each plate).

6) All plates were incubated at 30°C for five days.

7) Numbers of independent colonies on SDO and DDO plates were counted, and then the number of screened diploid colonies (formed by mating) and mating efficiency were calculated.
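The calculation in step 7) can be sketched as follows. The formulas reflect the convention commonly used with this kit, in which viability is expressed as cfu/mL, mating efficiency is the diploid cfu/mL as a percentage of the cfu/mL of the less viable ("limiting") partner, and the number of screened clones is the diploid cfu/mL multiplied by the resuspension volume. This convention is an assumption here rather than something stated in the text. All colony counts below are hypothetical placeholders, not results from this work.

```python
def cfu_per_ml(colonies: int, dilution: float, plated_ul: float = 100.0) -> float:
    """Viable cells per mL of the undiluted culture.

    dilution is the plated fraction, e.g. 1/100 for a 1/100 dilution.
    """
    return colonies / (plated_ul / 1000.0 * dilution)


def mating_summary(bait_cfu: float, prey_cfu: float, diploid_cfu: float,
                   resuspension_ml: float = 10.0) -> dict[str, float]:
    """Mating efficiency (%) and number of diploid clones screened."""
    limiting_cfu = min(bait_cfu, prey_cfu)  # the less viable partner
    return {
        "mating_efficiency_percent": 100.0 * diploid_cfu / limiting_cfu,
        "clones_screened": diploid_cfu * resuspension_ml,
    }


# Hypothetical colony counts, for illustration only.
bait = cfu_per_ml(200, 1 / 10000)   # SD/-Trp plate
prey = cfu_per_ml(100, 1 / 1000)    # SD/-Leu plate
diploid = cfu_per_ml(50, 1 / 100)   # SD/-Leu/-Trp (DDO) plate

summary = mating_summary(bait, prey, diploid, resuspension_ml=10.0)
print(f"mating efficiency: {summary['mating_efficiency_percent']:.1f}%")
print(f"clones screened: {summary['clones_screened']:.2e}")
```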

VIII Prey Colony Insert-checking

Blue diploid yeast colonies, which indicate possible positive interactions, were randomly selected and analyzed for potential prey gene inserts using Clontech Insert-check PCR Mix 2 (Clontech Laboratories, Inc., Mountain View, CA) according to its protocol as follows:

1) Each blue yeast colony was picked and suspended in 20 μL of 20 mM NaOH.

2) The yeast-NaOH suspension was incubated at 95°C for 45 minutes to break down the yeast cell walls.

3) The yeast cell debris was pelleted by centrifugation at 20,817 g for 10 min. Then, 8 μL of nano-pure water, 2 μL of the supernatant, and 10 μL of Matchmaker Insert-check PCR Mix 2 (PCR polymerase, dNTPs, primers, and buffer) were combined for each yeast colony.

Then, the following insert-check PCR program was run:

94°C 1 min

30 cycles of: 98°C 10 sec; 68°C 3 min

IX Plasmid Rescue

Plasmid extraction was performed on single blue yeast colonies from DDO/X/A agar plates using the Zymoprep™ Yeast Plasmid Miniprep II Kit (Zymo Research Corp, Irvine, CA) according to its protocol. Each yeast colony was suspended in 200 μL of Solution 1, and 5 μL of Zymolyase (5 Units/μL) was added to each reaction. The reactions were incubated at 37°C for 60 min; then 200 μL of Solution 2 was added to each reaction, followed by 400 μL of Solution 3. The reactions were centrifuged at 20,817 g for 3 min. The supernatants were transferred to Zymo-Spin-1 columns and centrifuged at 20,817 g for 30 sec, and the flow-throughs were discarded. The spin-column membranes were washed with 550 μL of Wash Buffer by centrifugation at 20,817 g for 2 min. Finally, the plasmids were eluted from the spin-column membranes with 10 μL of Zyppy Elution Buffer by centrifugation at 20,817 g for 1 min. The plasmids from each individual colony were then transformed into One Shot® Mach1™-T1R and One Shot® TOP10 Chemically Competent E. coli cells (Thermo Fisher Scientific Inc., Waltham, MA) according to the manufacturer’s manual. For the transformation, the frozen Mach1™-T1R and One Shot® TOP10 cells were first taken out of -80°C storage and immediately placed on ice to thaw. Then, 5 μL of the plasmid extract from yeast was added to each tube of thawed competent cells and mixed by gentle tapping. The reactions were incubated on ice for 30 min, heat-shocked in a 42°C water bath for 30 sec, and incubated on ice again for 2 min. Next, 250 μL of room-temperature SOC medium was added to each tube, and the tubes were incubated at 37°C with shaking at 225 rpm for one hour. Each tube of transformed E. coli cells was plated on one LB agar plate containing 100 μg/mL ampicillin and incubated at 37°C overnight to produce primary E. coli plates. The resulting individual colonies were transferred onto new LB agar plates containing 100 μg/mL ampicillin to produce secondary E. coli plates. Two secondary E. coli plates and one primary E. coli plate were sent to the University of Chicago Comprehensive Cancer Center DNA Sequencing & Genotyping Facility (UCCCC-DSF), Illinois, for high-throughput plasmid extraction and Sanger sequencing.

Results

Aim 1) Analysis of the Bait Sequence, Ps1365

Bait Sequence Analysis

Rutter et al. (2012) identified a potential novel family of transcription factors in P. sojae. They reported that the upstream regions of 20% of P. sojae genes contain a common motif, “GCCGCC”. Their study used yeast one-hybrid analysis (Y1H) to identify proteins that bind to the “GCCGCC” motif, using P. sojae cDNA libraries from both zoospores and mycelia. They identified 31 proteins that were able to bind to the GCCGCC motif and found that each of these proteins contains a conserved region of 50 amino acids. Using BLAST, they identified six P. infestans proteins and one P. ramorum protein that are orthologous to Ps1365.

Re-analysis of the 31 previously identified genes in the P. sojae genome predicted that 22 encode hypothetical proteins with complete open reading frames, while nine were identified as pseudogenes. The following six proteins were considered to be pseudogene products because they have abnormal deletions:

PHYSODRAFT_342634-t26_1

PHYSODRAFT_288539-t26_1

PHYSODRAFT_283985-t26_1

PHYSODRAFT_320623-t26_1

PHYSODRAFT_534655-t26_1

PHYSODRAFT_342550-t26_1

The following six were considered pseudogenes because of abnormal insertions (three of them also appear in the deletion list above, which gives nine pseudogenes in total):

PHYSODRAFT_288539-t26_1

PHYSODRAFT_283985-t26_1

PHYSODRAFT_288533-t26_1

PHYSODRAFT_289262-t26_1

PHYSODRAFT_342550-t26_1

PHYSODRAFT_257212-t26_1
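The tally of pseudogenes can be checked with a short set computation. The following is only a minimal sketch in Python; it uses the identifiers exactly as listed above, and the variable names are illustrative rather than part of the original analysis.

```python
# Identifiers copied from the two lists above; variable names are illustrative.
deletions = {
    "PHYSODRAFT_342634-t26_1",
    "PHYSODRAFT_288539-t26_1",
    "PHYSODRAFT_283985-t26_1",
    "PHYSODRAFT_320623-t26_1",
    "PHYSODRAFT_534655-t26_1",
    "PHYSODRAFT_342550-t26_1",
}
insertions = {
    "PHYSODRAFT_288539-t26_1",
    "PHYSODRAFT_283985-t26_1",
    "PHYSODRAFT_288533-t26_1",
    "PHYSODRAFT_289262-t26_1",
    "PHYSODRAFT_342550-t26_1",
    "PHYSODRAFT_257212-t26_1",
}

both = deletions & insertions         # entries with deletions and insertions
pseudogenes = deletions | insertions  # every entry flagged as a pseudogene
remaining = 31 - len(pseudogenes)     # 31 proteins were reported by Rutter (2012)

print(sorted(both))
print(len(pseudogenes), remaining)
```

The union of the two lists contains nine identifiers, which leaves the 22 hypothetical proteins discussed next.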

Among the remaining 22 hypothetical proteins, Ps1365 was selected for further characterization because it encodes a small protein and has no introns, which makes it convenient to clone and manipulate. Additionally, Ps1365 is expressed in several important stages, including mycelium, zoospores and during infection. The partial amino acid sequence of Ps1365 (Figure 2.5) was identified and provided by Rutter (2012).

NVRGRTRDGKFIYATTMKGNLEPKKKGNASEVKEESQTIGSQAKKRAKIESNPGVSS

VSSVSEEDPVSTICLRLLNRVTYPFLSEDVASILSNMVGSQSDPLGLVEGDLDYRDEK

Figure 2.5. Partial amino acid sequence of Ps1365. This partial protein sequence was identified by Rutter (2012). The region labeled red is a conserved region which is shared by a family of 22 proteins in P. sojae.

The full-length sequence of Ps1365 was retrieved from NCBI via BLAST search (identical sequences were obtained from both the JGI and FungiDB databases) and is shown in Figure 2.6. The full-length Ps1365 coding sequence is 471 nt long and encodes a putative small protein of 156 amino acids (Figure 2.6). Ps1365 is associated with accession numbers XP_009539397.1 (NCBI), PHYSODRAFT_342624-t26_1 (FungiDB) and PHYSODRAFT_342624 (JGI version 3). The gene has no intron and is located at positions 572879 – 573349 on the forward strand of the P. sojae genome (the P. sojae genome sequence can be downloaded from https://fungidb.org/fungidb/showQuestion.do?questionFullName=GenomicSequenceQuestions.SequencesByTaxon). According to FungiDB, the product of PHYSODRAFT_342624-t26_1 is a hypothetical protein of 156 amino acids, with a molecular weight of 17,424 Da and an isoelectric point of 8.39.

A) Nucleotide Sequence of 1365 (XP_009539397.1, PHYSODRAFT_342624)

ATGAATGTGCGCGGTAGAACGCGTGACGGCAAGTTCATCTATGCCACGACAATGAAAGACAATCTT
AAGCCCAAGAAGAAAGGCAACGCCTCTGAGGTGAAGGAGGAGAGTCAGACGATCGGGAGTCAG
GCGAAGAAGCGGGCGAAGATCGAGAGCAATCCTGGCGTCTCAAGTGTGTCGAGTGTCTCGGAGG
AGGATCCTGTTTTCACGATCTGCCTCCGTTTGCTCAACAGGGTCACCTACCCGTTCCTGAGCGAGG
ACGTGGCTTCCATTCTATCAAACATGGTGGAGTCTCAGTATGACCCGCTGGGGCTTGTGGAGGGCG
ACCTGGACTATCGCGATGAAGATGAAAAACGAAACAAGAACGTTAGTTCGTATGCAAAGTTGTCGC
TTACCGCGACGCGCGACAAGTTGTTGCAGAACGCCCGTAAAGCACTAGCGTACGCGCCGGGTGAC
AAGGAAGAAGAGTGA

B) Amino Acid Sequence of 1365 (XP_009539397.1, PHYSODRAFT_342624)

MNVRGRTRDGKFIYATTMKDNLKPKKKGNASEVKEESQTIGSQAKKRAKIESNPGVSSVSSVSEEDP
VFTICLRLLNRVTYPFLSEDVASILSNMVESQYDPLGLVEGDLDYRDEDEKRNKNVSSYAKLSLTATRDK
LLQNARKALAYAPGDKEEE*

Figure 2.6. Full-length nucleotide (A) and deduced amino acid sequences (B) of Ps1365. The partial amino acid sequence of Ps1365 was used to query the NCBI, JGI and FungiDB databases. BLAST results showed that Ps1365 is associated with accession numbers XP_009539397.1 (NCBI), PHYSODRAFT_342624-t26_1 (FungiDB), and PHYSODRAFT_342624 (JGI version 3). Start and stop codons are underlined. An asterisk (*) indicates the stop codon in the amino acid sequence.
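The relation between the 471-nt coding sequence and the 156-residue protein can be illustrated with a small translation routine. This is a minimal sketch in Python using the standard genetic code; the function and variable names are illustrative, and only the first 30 and last 15 nucleotides of the sequence in Figure 2.6A are used as a demonstration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(orf: str) -> str:
    """Translate an in-frame DNA sequence, keeping '*' for stop codons."""
    orf = "".join(orf.split()).upper()  # drop spaces and line breaks
    if len(orf) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


# First 30 and last 15 nucleotides of the Ps1365 sequence in Figure 2.6A.
head = translate("ATGAATGTGCGCGGTAGAACGCGTGACGGC")
tail = translate("AAGGAAGAAGAGTGA")
print(head, tail)

# A 471-nt ORF holds 157 codons: 156 amino acids plus the stop codon.
codons = 471 // 3
print(codons, codons - 1)
```

Passing the complete nucleotide sequence from Figure 2.6A to `translate` should reproduce the deduced protein in Figure 2.6B, including the terminal asterisk.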

Hydrophobicity Analysis

The subcellular localization of Ps1365 was first examined with a Kyte-Doolittle hydropathy plot (Kyte & Doolittle, 1982). With a window size of 9, most of Ps1365 was hydrophilic; only the region between amino acid residues 50 and 95 showed several possibly hydrophobic stretches (Figure 2.7). This indicates that Ps1365 is probably not a membrane protein but rather a globular protein.

Figure 2.7. The Kyte-Doolittle Hydropathy Plot analysis of Ps1365. The full-length amino acid sequence of Ps1365 was analyzed with a window size of 9. The region between amino acid residues 50 and 95 showed possible hydrophobic parts. The numbers on the horizontal axis represent the positions of the amino acid residues; the numbers on the vertical axis indicate the degree of hydrophobicity at each position.
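The sliding-window calculation behind a hydropathy plot of this kind can be sketched as follows. This is a minimal Python illustration, not the tool used in this study: it averages the published Kyte-Doolittle index over a window of 9 residues and reports the value at the window center. The function name and the short demo sequence (the first 20 residues from Figure 2.6B) are chosen only for illustration.

```python
# Kyte-Doolittle hydropathy index for each amino acid (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return (center_position, mean_hydropathy) for each full window."""
    protein = protein.rstrip("*").upper()  # ignore a trailing stop symbol
    half = window // 2
    profile = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        score = sum(KD[aa] for aa in segment) / window
        profile.append((start + half + 1, score))  # 1-based center residue
    return profile


# First 20 residues of Ps1365 from Figure 2.6B, used here only as a short demo.
demo = "MNVRGRTRDGKFIYATTMKD"
for position, score in hydropathy_profile(demo):
    print(position, round(score, 2))
```

Positive window averages indicate hydrophobic stretches and negative averages indicate hydrophilic ones; a 156-residue protein yields 148 windows of size 9.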

Signal Peptide and Conserved Domain Prediction

The secretory signal peptide of Ps1365 was analyzed with SignalP (Petersen, Brunak, von Heijne, & Nielsen, 2011) (http://sigpep.services.came.sbg.ac.at/signalblast.html). The C-score is the cleavage site score, which is high at the cleavage site. Amino acid residues with lower S-scores are potentially part of the mature protein, while residues with higher S-scores are potentially part of a signal peptide. The Y-score is highest where the C-score is significantly higher and the S-score decreases abruptly, implying the potential existence of a cleavage site. The results indicated that Ps1365 does not contain any recognized signal peptide (Figure 2.8). Additionally, the Pfam tool (pfam.xfam.org) did not identify any conserved domain in Ps1365, suggesting that Ps1365 either has no functional domains or has domains that are not yet represented in the database.

Figure 2.8. SignalP analysis showed that Ps1365 did not contain any cleavage site indicating a recognizable signal peptide. The x-axis represents amino acid residues and their positions, while the y-axis shows the value of the three scores: C, S and Y, which together indicate the potential existence of a cleavage site.

Secondary Structure Prediction

Analysis of the secondary structure of Ps1365 using JPred 4 (Drozdetskiy, Cole, Procter, & Barton, 2015) (http://www.compbio.dundee.ac.uk/jpred/) indicated a structure containing either a helix-loop-helix (HLH) or a helix-turn-helix (HTH) motif (Figure 2.9). Within the JPred prediction results, Jnet indicates the final secondary structure prediction, jhmm shows the JNet HMM profile prediction, and jpssm represents the JNet PSIBLAST PSSM profile prediction. The presence of more than one α-helix in the Jnet result in Figure 2.9 implies the potential existence of an HLH or HTH domain in Ps1365.

Figure 2.9. Predicted secondary structure of Ps1365 by JPred 4, a secondary structure predictor server. “H” represents helical, “E” represents extended, “-” represents other types of structures. Jnet = Final secondary structure prediction for query, jhmm = JNet HMM profile prediction, jpssm = JNet PSIBLAST PSSM profile prediction.

Subcellular Localization Prediction

The potential subcellular localization of Ps1365 was predicted by analyzing the full-length putative amino acid sequence of Ps1365 using CELLO (Yu, Lin, & Hwang, 2004; Yu, Chen, Lu, & Hwang, 2006). CELLO is a subcellular localization prediction program that combines the analyses of five different classifiers: amino acid composition, N-peptide composition, partitioned sequence composition, physico-chemical composition, and neighboring sequence composition. The prediction indicated that Ps1365 may localize to the nucleus, with a high reliability score of 4.306 (Figure 2.10).

Figure 2.10. Prediction of Ps1365 subcellular localization by CELLO. The reliability score for Ps1365 being a nuclear protein (4.306) was markedly higher than the score for any other subcellular location, implying that Ps1365 might be a nuclear protein. The values alongside the subcellular locations/organelles are the degrees of reliability for the protein to be localized to that location/organelle. Each reliability score is the sum of the reliabilities of five classifiers: amino acid composition, N-peptide composition, partitioned sequence composition, physico-chemical composition, and neighboring sequence composition. The location/organelle with the highest score is reported as the potential localization of the protein.

In Silico Gene Expression Analysis

The transcription levels of the Ps1365-encoding gene were directly observed from the transcriptome data deposited in FungiDB (Figure 2.11). Relatively higher levels of transcription of Ps1365 occurred during mycelium and cyst stages compared to the infection stage. The highest level of transcription occurred during mycelium growth (the transcription of Ps1365 gene increased following maturation).

Figure 2.11. In silico gene expression analysis of Ps1365 in three different developmental stages of P. sojae, mycelium, cyst and infection (Tyler 2014, FungiDB: fungidb.org).

Phylogenetic Analysis

Phylogenetic analyses were performed on Ps1365 and its 21 homologous proteins from P. sojae, as well as several related proteins from other oomycete species (Figure 2.12). Among them, the 22 P. sojae proteins were provided by Brian Rutter, and the protein sequences from other oomycetes were obtained by BLAST searches in FungiDB. The sequences were aligned using ClustalW embedded in MEGA7. All 22 P. sojae proteins formed a distinct protein subfamily with a bootstrap value of 93. The analysis also resolved the first divergence of the 22 P. sojae proteins with a bootstrap value of 98, but failed to show any subsequent divergence events within the two larger subgroups of the 22 P. sojae proteins. The phylogenetic tree demonstrated a closer connection between the four P. infestans proteins and the P. sojae proteins, with a bootstrap value of 93. Although the tree showed the P. sojae-P. infestans group (22 proteins from P. sojae, four proteins from P. infestans) and the other-oomycete group (two proteins from P. infestans, one protein from P. ramorum, one protein from Pythium vexans) as two major branches, the analysis failed to support the evolutionary relationship between the two major branches with a significant bootstrap value.

Figure 2.12. Phylogenetic divergences of the 22 proteins belonging to a novel protein family in P. sojae, including Ps1365. The evolutionary history was inferred by using Maximum Likelihood, based on the JTT matrix-based model (Jones, Taylor, & Thornton, 1992). The tree with the best ln likelihood (-488.30) is shown. The percentage of trees in which the associated taxa clustered together in bootstrap analyses is shown next to the branches. The initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior ln likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site (scale at lower left). The analysis involved 30 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 41 positions in the final dataset. Evolutionary analyses were conducted in MEGA7 (Kumar, Stecher, & Tamura, 2016). Bootstrap values ≥ 50 % (1000 replicates) are given at the branching points. Ps1365 was labeled as jgi|Physo3|342624|gm1.22199 g.

Aim 2) Cloning the Bait Sequence, Ps1365 into Yeast, Saccharomyces cerevisiae Y2HGold

Bait Vector Construction

All three PCR reactions, run at three different annealing temperatures (65°C, 66°C and 67°C), showed bands of about 500 bp, which was the expected size of the Ps1365-encoding gene plus two sticky ends (Figure 2.13). The PCR products were purified using the QIAGEN MinElute PCR Purification Kit and then sent to DNA Analysis, LLC, Cincinnati, OH for sequencing.

Figure 2.13. Agarose gel electrophoresis analysis of Ps1365 gradient PCR products. The Ps1365-encoding gene was amplified from the P. sojae genome using Ps1365-gene-specific forward and reverse primers at three different annealing temperatures: 65°C, 66°C, 67°C. Annealing time was set as 30 sec. PCR products were subjected to electrophoresis using a 1% agarose gel in 1X TAE buffer. Lane “100 bps”: the marker NEB (New England BioLabs, Ipswich, MA) 100 bp DNA ladder, the unit of the numbers on the figure is “bp”; the temperatures on the figure show the annealing temperature of the PCR reaction in the corresponding lane.

The pGBKT7 (BD) vector was prepared for cloning by restriction digestion with EcoRI and/or BamHI at 37°C for one hour. The digestion products were electrophoresed on a 0.8% agarose gel. Both the single and double digestions were successful: all the digested vectors showed sharp, narrow bands, while the undigested vector showed a broad, faint band (Figure 2.14), implying that the vectors had been cut. The Ps1365 insert was prepared for cloning by direct PCR amplification and subsequent PCR purification. The pGBKT7 (BD) vector and the Ps1365 insert were then joined in an In-Fusion homologous recombination reaction to insert Ps1365 into the vector, and TOP10 cells were transformed with the In-Fusion product. Among the numerous colonies obtained, eight randomly selected E. coli single colonies were analyzed by colony PCR at an annealing temperature of 47°C, using the universal T7 promoter primer and a specifically designed T7 terminator primer. The PCR reactions were subjected to electrophoresis on a 1% agarose gel. Of the eight colonies, only colony #4 showed a band of more than approximately 500 bp, implying the possible presence of the Ps1365 gene insert.

Figure 2.14. Agarose gel electrophoresis analysis of pGBKT7 (BD) vector after restriction digestion with EcoRI and/or BamHI at 37°C for one hour. “M” = 1 kb DNA ladder (NEB, MA), the numbers on the ladder are in “kb”; “-” is the circular pGBKT7 (BD) vector without any digestion, which served as the control; “E” represents circular pGBKT7 (BD) vector single-digested with EcoRI, while “B” represents circular pGBKT7 (BD) single-digested with BamHI; “D” means circular pGBKT7 (BD) vector double-digested with both EcoRI and BamHI.

Numerous colonies resulted after E. coli cells were transformed with the vector pGBKT7-Ps1365. Among these, eight single colonies were randomly selected, and colony PCR was performed using the universal T7 promoter primer and a specifically designed T7 terminator primer (Table 1).

The PCR products were subjected to electrophoresis on a 1% agarose gel. A colony carrying the Ps1365 gene insert was expected to give a band of about 500 bp or larger. The expected band is longer than the Ps1365-encoding gene itself because the amplicon includes the Ps1365 gene insert as well as the 5’ and 3’ flanking vector sequences (the vector sequences downstream of the T7 promoter site and upstream of the T7 terminator site). The desired band of about 500 bp or larger was observed on the agarose gel for colony #4 (Figure 2.15), which implies the possible presence of the Ps1365 gene insert.

The plasmid was extracted from this colony and analyzed for the presence of the Ps1365-encoding gene by PCR. Two separate reactions were performed: one using the T7 promoter primer and the specifically designed T7 terminator primer described above, and the other using the Ps1365 gene-specific forward and reverse primers. Both PCR reactions showed that the recombinant vector contained the Ps1365-encoding gene (Figure 2.16).

Figure 2.15. Agarose gel electrophoresis analysis of colony PCR reactions of transformed TOP10 E. coli cells containing the vector pGBKT7-Ps1365. Among the eight randomly selected single colonies, only colony #4 showed a band of more than approximately 500 bp, implying the possible presence of the Ps1365 gene insert. “M” = 50 bp DNA Ladder (NEB, MA). The molecular sizes of the ladder are in “bp”; “W” is water (negative control); “V” is pGBKT7 vector without any insert; “1”-“8” are the colony PCR reactions of eight randomly selected E. coli single colonies after transformation with the In-Fusion product of the pGBKT7 vector and the Ps1365 gene insert. Each lane represents one single colony.

Figure 2.16. Agarose gel electrophoresis of colony PCR of E. coli transformant colony #4 containing pGBKT7-Ps1365. The purified plasmid was PCR-amplified using either T7 primers or Ps1365 gene-specific primers. “M” = NEB 100 bp DNA ladder (NEB, MA), the numbers on the figure are in “bp”; “T7” is PCR products amplified using the T7-PCR program; “G” is PCR products amplified using the Ps1365-PCR program; “W” means water was used instead of the colony #4 plasmid template; “V” means empty pGBKT7 vector without any insert was used instead of the colony #4 plasmid template; “T” means the colony #4 plasmids were used as templates.

Analysis of Bait Construct

To make sure that the inserted Ps1365-encoding gene was in frame with the host vector pGBKT7, the vector was extracted from the E. coli colony and sequenced using the T7 promoter primer. The sequencing result was then aligned with the Ps1365-encoding gene sequence from the databases using Clustal Omega (Sievers et al., 2011). The alignment (Figure 2.17) showed that the sequence inserted into the vector pGBKT7 (BD) was identical to the Ps1365 gene and that the insert is in frame with the vector pGBKT7 (BD) start site. This implies that the Ps1365-encoding gene was successfully inserted into the vector pGBKT7 (BD).

Figure 2.17. Alignment of the Ps1365 gene sequence in plasmid pGBKT7-Ps1365 with the Ps1365 database gene sequence. The two sequences were aligned using Clustal Omega (Sievers et al., 2011) (http://www.ebi.ac.uk/Tools/msa/clustalo/). The Ps1365 gene sequence within the plasmid pGBKT7-1365 formed an exact match with the database Ps1365 gene sequence, and within the correct reading frame of the vector pGBKT7. “Ps1365” is the Ps1365 gene sequence within the plasmid pGBKT7-1365; “PHYSODRAFT_342624-t26_1” is the database Ps1365 gene sequence. A dash (-) means there is no nucleotide at the same position in the input sequence. An asterisk (*) means an exact match. Red boxes at the beginning indicate the codons, and thus the reading frame of the pGBKT7 vector, while red boxes at the end show the stop codons of both the database Ps1365 gene sequence and the Ps1365 gene sequence within the plasmid pGBKT7-Ps1365.

Bait Transformation and Autoactivation Assay

The constructed pGBKT7-Ps1365 plasmid was then transformed into the yeast strain S. cerevisiae Y2HGold using the Clontech Yeastmaker Yeast Transformation System 2 according to its manual. Yeast transformants were incubated on SD/-Trp selective agar plates at 30°C, and after five days of incubation, typical yeast single colonies with a diameter of 2-3 mm were obtained, as described in the manual.

To test for autoactivation, the MEL1 and AUR1-C genes were used as reporter genes in the Matchmaker Gold Yeast Two-Hybrid System. Positive control diploid cells were incubated on DDO/X and DDO/X/A media, while Y2HGold cells carrying pGBKT7-Ps1365 were incubated on SDO/X and SDO/X/A media. Incubation was conducted at 30°C for 5 days. When the AUR1-C gene is expressed, yeast can grow on agar plates containing Aureobasidin A, whereas Aureobasidin A is lethal to yeast cells when AUR1-C is silent. In addition, when the MEL1 gene is activated and expressed, yeast cells convert X-α-Gal into a blue pigment, indol (Chevalier, Roy, & Savoie, 1991), so that yeast colonies appear blue. If the MEL1 gene is inactive, yeast colonies appear white. In the autoactivation test, Y2HGold [pGBKT7-1365] failed to form blue colonies on SDO/X (Figure 2.18a) and SDO/X/A plates (Figure 2.18b), which implies that Ps1365 alone cannot activate the reporter genes MEL1 and AUR1-C. In contrast, the positive control group (constructed by mating (co-culturing) Y2HGold [pGBKT7-53] and Y187 [pGADT7-T] cells, which carry the true interaction of p53 and T antigen) produced blue colonies on both DDO/X (Figure 2.18c) and DDO/X/A plates (Figure 2.18d), implying the activation of both the MEL1 and AUR1-C genes. These results indicated that Ps1365 cannot activate the reporter genes inside the yeast strain Y2HGold without the participation of prey proteins.


Figure 2.18. Autoactivation assay of Ps1365-encoding gene in S. cerevisiae Y2HGold. The plates were incubated at 30°C for 5 days. a) Growth of Y2HGold cells including pGBKT7-Ps1365 on a SDO/X plate. b) Growth of Y2HGold cells including pGBKT7-Ps1365 on a SDO/X/A plate. c) Growth of positive control group yeast colonies on a DDO/X plate. d) Growth of positive control group yeast colonies on a DDO/X/A plate. Positive control cells on both DDO/X and DDO/X/A media appeared blue as expected.

Bait Toxicity Assay

In the toxicity test, if Ps1365 were toxic to Y2HGold cells, it would impede or even completely stop their growth. No significant difference was observed between the colony diameters of the empty vector group and the test group after five days of incubation at 30°C on SDO agar medium (Figure 2.19), indicating that Ps1365 is not toxic to the yeast strain Y2HGold and can be used for further analysis.


Figure 2.19. Toxicity assay of Ps1365-encoding gene in S. cerevisiae Y2HGold. a) Y2HGold cells transformed with pGBKT7-Ps1365. b) Y2HGold cells transformed with pGBKT7, as control. No significant difference was found between the colony diameters of the two groups.

Aim 3) Identify Interactive Proteins of a Novel P. sojae Transcription Factor, Ps1365 Using Yeast Two-hybrid Analysis

Prey Library Construction

The construction of the P. sojae mycelium prey library started with the extraction of total RNA from P. sojae mycelium as described earlier, and the electrophoresis results indicated that RNA was successfully extracted from P. sojae mycelia (Figure 2.20). SMART III Oligo, CDSIII and CDSIII/6 primers were used to synthesize the first-strand cDNAs, with total RNA extracted from P. sojae mycelium as the template. Among them, SMART III Oligo has homology with the 3’ end of the pGADT7-Rec vector, while the CDSIII and CDSIII/6 primers have homology with the 5’ end of the pGADT7-Rec vector. The difference is that the CDSIII primer is the equivalent of an oligo-dT primer, while CDSIII/6 is the equivalent of a random primer. Two reactions were used for the synthesis of cDNAs: one using the combination of SMART III and CDSIII primers, and the other using the combination of SMART III and CDSIII/6 primers. Both the CDSIII- and CDSIII/6-primed cDNA synthesis reactions exhibited cDNA bands (Figure 2.21). The first-strand cDNAs were amplified via LD-PCR (Make Your Own “Mate & Plate” Library System, Mountain View, CA) using 5’ and 3’ PCR primers, and the resultant PCR products also produced the expected cDNA bands (Figure 2.22).

Figure 2.20. Agarose gel electrophoresis of total RNA extracted from P. sojae mycelium. Lane M = 1 kb ladder (NEB, Ipswich, MA), Lanes 1-4 are RNA samples extracted from four independent hyphal mats of P. sojae. Amount of RNA (ng/lane): Lane 1 - 172.96. Lane 2 - 39.69. Lane 3 - 152.87. Lane 4 - 185.51.

Figure 2.21. Agarose gel electrophoresis of P. sojae mycelium first-strand cDNAs synthesized using SMART III Oligo, CDSIII and CDSIII/6 primers. Lane “1 kb” = 1 kb ladder (NEB, Ipswich, MA); Lanes “RNA” are total RNAs extracted from P. sojae mycelium; Lane “CDSIII” is CDSIII-primed cDNA synthesis; “CDSIII/6” is CDSIII/6-primed cDNA synthesis; Lane “100 bps” = 100 bp DNA Ladder (NEB, Ipswich, MA). Unit of the numbers on the figure is “kb”.

Figure 2.22. Agarose gel electrophoresis of LD-PCR amplicons of P. sojae mycelium cDNAs synthesized using 5’ PCR and 3’ PCR primers. Lane “1 kb” = 1 kb ladder (NEB, Ipswich, MA); Lane “(-)” is the negative control in which sterile dH2O was used instead of DNA template; Lanes “CDSIII” are the LD-PCR reactions whose template is CDSIII-primed first-strand cDNAs; Lanes “CDSIII/6” are the LD-PCR reactions whose template is CDSIII/6-primed first-strand cDNAs. The unit of the numbers on the figure is “kb”.

Y2H Screening

Before conducting the Y2H screening, the bait yeast strain was recovered from glycerol stocks and PCR was used to validate the presence of Ps1365. Agarose gel electrophoresis showed the expected band of Ps1365 at 500 bp (Figure 2.23), which confirmed the presence of the bait construct pGBKT7-Ps1365 within Y2HGold cells.

To prepare the bait strain before Y2H mating, the bait cells were incubated in SD/-Trp liquid medium and counted with a hemocytometer to ensure that the bait cell density exceeded 10⁸ cells/ml. In addition, the cell density of the P. sojae mycelium prey library was measured using a library titration method before the Y2H screening. The titration was performed in duplicate and showed that the cell densities of both prey library tubes were higher than the minimum required by the manufacturer, Takara Bio, Inc. (2 × 10⁷ cfu/mL). Tube #1 had a cell density of 4.04 × 10⁷ cfu/mL and tube #2 of 4.34 × 10⁷ cfu/mL.
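A library titer of this kind is conventionally computed from plate counts as colonies divided by the product of the plated volume and the dilution factor. The sketch below, in Python, illustrates that arithmetic only. The colony counts, plated volume and dilution are hypothetical values chosen so that the results match the titers reported above; they are not taken from the experimental records.

```python
def titer_cfu_per_ml(colonies, plated_volume_ml, dilution_factor):
    """cfu/mL = colonies / (volume plated in mL x dilution factor)."""
    return colonies / (plated_volume_ml * dilution_factor)


MIN_TITER = 2e7  # minimum library titer quoted in the text (cfu/mL)

# Hypothetical plate counts: 0.1 mL of a 1e-4 dilution spread per plate.
for tube, colonies in (("#1", 404), ("#2", 434)):
    titer = titer_cfu_per_ml(colonies, 0.1, 1e-4)
    print(tube, f"{titer:.2e}", titer > MIN_TITER)
```

Under these assumed plating conditions, both tubes clear the minimum titer by roughly a factor of two.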

Figure 2.23. Agarose gel electrophoresis of bait colony PCR using Ps1365 gene-specific forward and reverse primers. Lane “100 bps” = 100 bp DNA Ladder (NEB, Ipswich, MA); Lane “water” is the negative control group in which water was used instead of the DNA template; Lane “empty Y2HGold cells” is the negative control group in which Y2HGold cells without any vector transformed were used instead of Y2HGold[pGBKT7-Ps1365]; Lane “pGBKT7-Ps1365” is the positive control group in which the plasmid pGBKT7-Ps1365 was used as PCR template instead of cell lysate supernatants; Lanes “Bait Yeast Colony 1” and “Bait Yeast Colony 2” are the respective colony PCR reactions of the two tested bait yeast colonies.

To perform the Y2H mating, liquid cultures of the bait strain and the prey library were co-incubated for 24 h in 2X YPDA Broth medium using the Clontech Matchmaker® Gold Yeast Two-Hybrid System according to its manual. The mating mixture was then plated on DDO/X/A plates and incubated at 30°C for five days, yielding 120 blue colonies. The blue colonies were analyzed by colony PCR using Matchmaker Insert-check PCR Mix 2, and subsequent agarose gel electrophoresis showed PCR amplicons of different lengths (Figure 2.24). Because of the large number of blue colonies, yeast batch colony PCR was performed by combining every 10 colonies into a single PCR reaction. The result showed that this Y2H was successful, since all the colony groups produced bands longer than 500 bp (Figure 2.25), indicating the insertion of prey DNA from P. sojae (the amplicon from the empty vector is about 300 bp).
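The pooling scheme used for the batch colony PCR, and the size cutoff used to call a pool insert-positive, can be written out as a short sketch. This Python example is illustrative only: the colony labels and the 800 bp example band are hypothetical, while the pool size of 10, the 500 bp cutoff and the approximate 300 bp empty-vector amplicon come from the text.

```python
def batch(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# 120 blue colonies were obtained; the labels are illustrative.
colonies = [f"colony_{n}" for n in range(1, 121)]
pools = batch(colonies, 10)
print(len(pools), pools[0][0], pools[-1][-1])

EMPTY_VECTOR_BP = 300  # approximate empty-vector amplicon size given in the text


def has_insert(band_bp, cutoff_bp=500):
    """Flag a band as insert-positive using the 500 bp cutoff applied in the text."""
    return band_bp > cutoff_bp


# 800 bp is a hypothetical band size used only to show a positive call.
print(has_insert(EMPTY_VECTOR_BP), has_insert(800))
```

Pooling reduces 120 single-colony reactions to 12 batch reactions, at the cost of not knowing which colony within a pool contributed a given band.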

Figure 2.24. Agarose gel electrophoresis of blue diploid colonies insert-check PCR using Matchmaker Insert Check PCR Mix 2. a) Insert-check PCR of three randomly selected blue diploid colonies. b) Insert-check PCR results of 11 randomly selected blue diploid colonies. c) Insert-check PCR results of another 11 randomly selected blue diploid colonies. Lane “100 bps” = 100 bp DNA Ladder (NEB, Ipswich, MA); Lane “1 kb” = 1 kb ladder (NEB, Ipswich, MA); Lane “-” is the negative control in which nano-pure H2O was used instead of yeast lysate supernatant; Lane “+” is the positive control in which circular pGADT7-Rec vector was used instead of yeast lysate supernatant; Lanes “colonies” are the insert-check PCR reactions of blue diploid colonies and each lane represents one single colony. The unit of the numbers on the figure is “bp”.

Figure 2.25. Agarose gel electrophoresis of blue diploid colonies batch insert-check PCR reactions using Matchmaker Insert Check PCR Mix 2. Lane “100 bps” = 100 bp DNA Ladder (NEB, Ipswich, MA); Lane “1 kb” = 1 kb ladder (NEB, Ipswich, MA); Lane “-” is the negative control in which nano-pure H2O was used instead of yeast lysate supernatant; Lane “+” is the positive control in which circular pGADT7-Rec vector was used instead of yeast lysate supernatant; Lanes “colonies” are the insert-check PCR reactions of blue diploid colonies and each lane represents at most ten colonies. The unit of the numbers on the figure is “bp”.

Plasmid Rescue

The prey inserts that showed positive reactions were sequenced to identify the potential interacting proteins. To obtain sufficient plasmid for sequencing, the plasmids were extracted from blue yeast colonies using the Zymoprep™ Yeast Plasmid Miniprep II Kit (Zymo Research, Irvine, CA). The extracted plasmids were transformed into Invitrogen Mach1-T1R and TOP10 competent E. coli cells. Numerous E. coli colonies grew on LB agar plates containing 100 μg/mL ampicillin. To confirm the presence of the plasmids in E. coli, plasmid extraction was performed on four randomly selected E. coli colonies using the Zyppy Plasmid Miniprep Kit according to its protocol. Gel electrophoresis results (Figure 2.26) indicated that each of the E. coli colonies tested contained plasmids from the yeast.

Figure 2.26. Agarose gel electrophoresis of plasmids extracted from E. coli after transformation with prey plasmids. Lane M = 1 kb ladder (NEB, Ipswich, MA); Lanes “P1”-“P4” are the plasmid extractions from the four single E. coli colonies, respectively. The unit of the numbers on the figure is “kb”.

Following the successful initial experiments with those four colonies, additional group colony PCR (four single colonies per group) was performed. A total of five groups were tested using the Insert-Check PCR Mix, following the PCR program in its manual. Electrophoresis analysis showed that all five colony groups and the four plasmid extractions from E. coli produced bands longer than 500 bp (Figure 2.27), indicating the presence of DNA inserts derived from P. sojae.

Figure 2.27. Agarose gel electrophoresis of E. coli transformants colony PCR and direct PCR of plasmids from some E. coli colonies using Matchmaker Insert Check PCR Mix 2. Lane “100 bps” = 100 bp DNA Ladder (NEB, Ipswich, MA). Lane “1 kb” = 1 kb ladder (NEB, Ipswich, MA). Lane “water” is the negative control in which water was used instead of PCR templates. Lane “empty vector” is the positive control in which circular pGADT7-Rec vector was used as DNA template. Lanes “E. coli colonies” are the colony PCR reactions of E. coli colonies, and each lane represents four E. coli colonies. Lanes “plasmids extracted from E. coli” are the PCR reactions of four plasmid extractions from E. coli. The unit of the numbers on the figure is “kb”.

Discussion

Transcription factors (TFs) play important roles in gene regulation. Identifying TFs that are associated with pathogenicity and zoospore functions could provide new targets for controlling the diseases caused by Phytophthora species. Ps1365 was identified as a member of a novel TF family in P. sojae based on a yeast one-hybrid assay (Rutter, 2012). In silico analysis showed that Ps1365 is a small protein of 156 aa with a mass of 17,424 Da and no signal peptide or transmembrane domains. Further analysis revealed that Ps1365 may contain an HLH/HTH domain, indicating its potential function as a TF, since the HTH domain is known to bind DNA (reviewed in Brennan & Matthews, 1989). This partly supports my hypothesis that Ps1365 is a TF. No nuclear localization sequence (NLS) was detected in Ps1365 using currently available software, which appears to contradict the hypothesis that Ps1365 is a TF that can enter the nucleus. However, considering that the molecular weight of Ps1365 is about 17.4 kDa, and that the upper size limit for molecules to pass through the nuclear envelope is 40-60 kDa (Zanta, Belguise-Valladier, & Behr, 1999), Ps1365 may be able to enter the nucleus without an NLS because of its relatively small size (Kalderon, Roberts, Richardson, & Smith, 1984). In addition, the prediction tools used are not fully reliable in predicting the subcellular localization of proteins (Min, 2010). Therefore, the prediction results alone are not enough to completely refute the hypothesis that Ps1365 is a TF. The protein domain/family-prediction software (e.g., Pfam) failed to return a match, which seemingly implies that Ps1365 does not have any known characteristic domain. However, the expansion rate of the domain databases lags behind that of protein databases. Current domain-prediction methods have their shortcomings (Ochoa, 2013), and studies of TFs in Phytophthora are still in their infancy. Therefore, it is possible that Ps1365 has a functional domain(s) that has not yet been added to current databases.
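The size argument above can be written as a simple comparison. The following is a minimal sketch only: the function name and constants are hypothetical, the masses come from the text, and passing the check says nothing about whether a protein actually enters the nucleus.

```python
# Passive-diffusion window for the nuclear envelope quoted in the text (daltons).
PASSIVE_LIMIT_LOW_DA = 40_000
PASSIVE_LIMIT_HIGH_DA = 60_000


def may_diffuse_passively(mass_da: float, limit_da: float = PASSIVE_LIMIT_LOW_DA) -> bool:
    """Return True when the mass is below the chosen size limit.

    The conservative lower bound of the 40-60 kDa window is used by default.
    """
    return mass_da < limit_da


ps1365_mass_da = 17_424  # mass reported for Ps1365 in the in silico analysis
ps1365_below_limit = may_diffuse_passively(ps1365_mass_da)
```

Using the lower bound of the window as the default keeps the check conservative; a protein between 40 and 60 kDa would need a case-by-case judgment.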

Phylogenetic analysis of Ps1365 using MEGA7 indicated that all 22 proteins from P. sojae, including Ps1365, form a protein family (see the phylogram in Figure 2.12). Compared with the similar proteins in other oomycetes, four proteins from P. infestans appear to be evolutionarily close to the 22 P. sojae proteins. This implies that the divergence of the P. sojae and P. infestans proteins occurred during and after the separation of P. sojae and P. infestans.

The 22 P. sojae proteins are phylogenetically closer to each other than to other oomycete proteins, indicating that these proteins evolved from a common ancestor and that the 22 proteins emerged after the divergence of P. sojae and P. infestans. In other words, the 22 P. sojae proteins are the products of gene duplications from an ancestral gene sequence within P. sojae, not the result of speciation from a common ancestral species with P. infestans or other oomycetes.

In this study, Ps1365 was used as the bait sequence in Y2H screening. Before Y2H screening, Ps1365 was tested for autoactivation and toxicity. The autoactivation results indicated that the Ps1365 protein alone cannot activate expression of the reporter genes AUR1-C and MEL1 in yeast. The toxicity test showed that Ps1365 does not deter the growth of Y2HGold cells, indicating that it is not toxic to yeast cells. Taken together, these results show that Ps1365 can be used as a bait in the Matchmaker Gold Y2H System.

After the mating of the bait and prey strains, 120 blue colonies resulted, indicating interaction between Ps1365 and prey proteins from P. sojae. White colonies also appeared, which may be because the GAL4-AD-prey fusion protein cannot bind with its target upstream activating sequence (UAS) (Clontech Laboratories, Inc., 2013).

The number of blue colonies produced in this study is comparable to those reported elsewhere. Wang et al. (2002) reported 65 potential positive colonies when they screened a human liver cDNA library for proteins interacting with hepatitis B virus X protein, while Shahheydari et al. (2014) reported 204 potential positive colonies when they searched a human breast carcinoma cDNA library for interacting partners of tumor protein D52. Thus, the number of positive colonies obtained in this experiment falls between the numbers reported in these two studies and is consistent with them.

After Y2H screening, potential prey inserts were detected using the insert-check PCR reaction. The PCR amplicons showed bands longer than 500 bp, indicating that the Y2H was successful and confirming that the yeast cells contained recombinant plasmids with inserted genes from P. sojae. Any amplicons with bands below 500 bp were excluded from the analysis, as they might have come from transformants containing the empty pGADT7-Rec vector.
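The 500 bp cutoff described above amounts to a simple filter on estimated band sizes. The sketch below is illustrative only: the colony names and sizes are hypothetical, and the treatment of a band at exactly 500 bp (excluded here) is an assumption, since the text only distinguishes bands longer than and below 500 bp.

```python
MIN_INSERT_AMPLICON_BP = 500  # cutoff stated in the text


def keep_insert_positive(amplicons_bp: dict[str, int],
                         cutoff_bp: int = MIN_INSERT_AMPLICON_BP) -> list[str]:
    """Return colony IDs whose insert-check amplicon is longer than the cutoff."""
    return [colony for colony, size in amplicons_bp.items() if size > cutoff_bp]


# Hypothetical band sizes estimated from a gel, in base pairs.
example_amplicons = {"colony_01": 1200, "colony_02": 450, "colony_03": 800}
kept_colonies = keep_insert_positive(example_amplicons)
```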

References

Brennan, R. G., & Matthews, B. W. (1989). The helix-turn-helix DNA binding motif. The

Journal of Biological Chemistry, 264(4), 1903-1906.

Cheng, Q., Dong, L., Gao, T., Liu, T., Li, N., Wang, L., . . . Zhang, S. (2018). The bHLH

transcription factor GmPIB1 facilitates resistance to Phytophthora sojae in Glycine

max. Journal of Experimental Botany, 69(10), 2527-2541. doi:10.1093/jxb/ery103

Chevalier, P., Roy, D., & Savoie, L. (1991). X-α-gal-based medium for simultaneous

enumeration of bifidobacteria and lactic acid bacteria in milk

doi:10.1016/0167-7012(91)90034-N

Coates, P., & Hall, P. (2003). The yeast two‐hybrid system for identifying protein–protein

interactions. The Journal of Pathology, 199(1), 4-7. doi:10.1002/path.1267

Drozdetskiy, A., Cole, C., Procter, J., & Barton, G. J. (2015). JPred4: A protein secondary

structure prediction server. Nucleic Acids Research, 43(W1), W389-W394.

doi:10.1093/nar/gkv332

Finn, R. D., Coggill, P., Eberhardt, R. Y., Eddy, S. R., Mistry, J., Mitchell, A. L., . . .

Bateman, A. (2016). The Pfam protein families database: Towards a more sustainable

future. Nucleic Acids Research, 44(D1), D279-D285. doi:10.1093/nar/gkv1344

Goujon, M., McWilliam, H., Li, W., Valentin, F., Squizzato, S., Paern, J., & Lopez, R. (2010).

A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids

Research, 38(Web Server issue), W695-W699. doi:10.1093/nar/gkq313

Jones, D. T., Taylor, W. R., & Thornton, J. M. (1992). The rapid generation of mutation data

matrices from protein sequences. Computer Applications in the Biosciences : CABIOS,

8(3), 275-282.

Kalderon, D., Roberts, B. L., Richardson, W. D., & Smith, A. E. (1984). A short amino acid

sequence able to specify nuclear location. Cell, 39(3), 499-509.

doi:10.1016/0092-8674(84)90457-4

Kumar, S., Stecher, G., & Tamura, K. (2016). MEGA7: Molecular evolutionary genetics

analysis version 7.0 for bigger datasets. Molecular Biology and Evolution, 33(7),

1870-1874. doi:10.1093/molbev/msw054

Kyte, J., & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character

of a protein. Journal of Molecular Biology, 157(1), 105-132.

doi:10.1016/0022-2836(82)90515-0

Lamitina Lab. (2007). Ethanol precipitation of DNA. Unpublished manuscript. Retrieved

from

http://docs.wixstatic.com/ugd/803ab9_1cd1cb09279649b388391953899ae1f9.pdf

Li, S. (2010). Characterization of soybean GmPUB1 proteins that interact with the

Phytophthora sojae effector Avr1b protein (Master's thesis, Iowa State University)

Martinez-Duncker, I., Mollicone, R., Candelier, J., Breton, C., & Oriol, R. (2003). A new

superfamily of protein-O-fucosyltransferases, α2-fucosyltransferases,

and α6-fucosyltransferases: Phylogeny and identification of conserved peptide motifs.

Glycobiology, 13(12), 1-5. doi:10.1093/glycob/cwg113

McWilliam, H., Li, W., Uludag, M., Squizzato, S., Park, Y. M., Buso, N., . . . Lopez, R.

(2013). Analysis tool web services from the EMBL-EBI. Nucleic Acids Research,

41(Web Server issue), W597-W600. doi:10.1093/nar/gkt376

Min, X. J. (2010). Evaluation of computational methods for secreted protein prediction in

different eukaryotes. Journal of Proteomics & Bioinformatics, 3(4), 143-147.

doi:10.4172/jpb.1000133

Naveed, Z. A., Bibi, S., & Ali, G. S. (2019). The Phytophthora RXLR effector Avrblb2

modulates plant immunity by interfering with Ca2+ signaling pathway. Frontiers in

Plant Science, 10 doi:10.3389/fpls.2019.00374

Ochoa, A. (2013). Protein domain prediction using context statistics, the false discovery rate,

and comparative genomics, with application to Plasmodium falciparum (Doctoral

dissertation, Princeton University)

Ohad, N., & Yalovsky, S. (2010). Utilizing bimolecular fluorescence complementation (BiFC)

to assay protein-protein interaction in plants. Methods in Molecular Biology (Clifton,

N.J.), 655, 347-358.

Peng, H., Shan, W., Kuang, J., Lu, W., & Chen, J. (2013). Molecular characterization of

cold-responsive basic helix-loop-helix transcription factors MabHLHs that interact

with MaICE1 in banana fruit. Planta, 238(5), 937-953.

doi:10.1007/s00425-013-1944-7

Petersen, T. N., Brunak, S., von Heijne, G., & Nielsen, H. (2011). SignalP 4.0:

Discriminating signal peptides from transmembrane regions. Nature Methods, 8(10),

785-786. doi:10.1038/nmeth.1701

Phizicky, E. M., & Fields, S. (1995). Protein-protein interactions: Methods for detection and

analysis. Microbiological Reviews, 59(1), 94-123.

Rajagopala, S. V., Sikorski, P., Caufield, J. H., Tovchigrechko, A., & Uetz, P. (2012).

Studying protein complexes by the yeast two-hybrid system. Methods, 58(4), 392-399.

doi:10.1016/j.ymeth.2012.07.015

Rutter, B. D. (2012). Catch of the day: a yeast one-hybrid assay identifies a novel

DNA-binding domain in Phytophthora sojae (Master's thesis, Bowling Green State

University)

Shahheydari, H., Frost, S., Smith, B., Groblewski, G., Chen, Y., & Byrne, J. (2014).

Identification of PLP2 and RAB5C as novel TPD52 binding partners through yeast

two-hybrid screening. Molecular Biology Reports, 41(7), 4565-4572.

doi:10.1007/s11033-014-3327-y

Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., . . . Higgins, D. G.

(2011). Fast, scalable generation of high‐quality protein multiple sequence

alignments using Clustal Omega. Molecular Systems Biology, 7(1), 539-n/a.

doi:10.1038/msb.2011.75

Stajich, J. E., Harris, T., Brunk, B. P., Brestelli, J., Fischer, S., Harb, O. S., . . . Roos, D. S.

(2012). FungiDB: An integrated functional genomics database for fungi. Nucleic Acids

Research, 40(D1), D675-D681. doi:10.1093/nar/gkr918

Wang, X. Z., Jiang, X. R., Chen, X. C., Chen, Z. X., Li, D., Lin, J. Y., & Tao, Q. M. (2002).

Seek protein which can interact with hepatitis B virus X protein from human liver

cDNA library by yeast two-hybrid system. World Journal of Gastroenterology, 8(1),

95-98. doi:10.3748/wjg.v8.i1.95

Wienken, C. J., Baaske, P., Rothbauer, U., Braun, D., & Duhr, S. (2010). Protein-binding

assays in biological liquids using microscale thermophoresis. Nature Communications,

1(7), 100. doi:10.1038/ncomms1093

Yan, H. W., Hong, L., Zhou, Y. Q., Jiang, H. Y., Zhu, S. W., Fan, J., & Cheng, B. J. (2013).

A genome-wide analysis of the ERF gene family in sorghum. Genetics and Molecular

Research : GMR, 12(2), 2038-2055. doi:10.4238/2013.May.13.1

Yu, C., Chen, Y., Lu, C., & Hwang, J. (2006). Prediction of protein subcellular localization.

Proteins: Structure, Function, and Bioinformatics, 64(3), 643-651.

doi:10.1002/prot.21018

Yu, C., Lin, C., & Hwang, J. (2004). Predicting subcellular localization of proteins for

gram‐negative bacteria by support vector machines based on n‐peptide

compositions. Protein Science, 13(5), 1402-1406. doi:10.1110/ps.03479604

Zanta, M. A., Belguise-Valladier, P., & Behr, J. (1999). Gene delivery: A single nuclear

localization signal peptide is sufficient to carry DNA to the cell nucleus. Proceedings

of the National Academy of Sciences of the United States of America, 96(1), 91-96.

doi:10.1073/pnas.96.1.91

CHAPTER 3. ANALYSIS OF THE POTENTIAL INTERACTOR PROTEINS

Introduction

I Bioinformatic Analysis of Interactor Proteins of Ps1365

Bioinformatic analysis makes it possible to efficiently analyze sequences obtained from lab experiments and to predict the possible structures and functions of genes and proteins. Genome and transcriptome databases can be used to assist and complement lab experiments. For example, Kaur, Kocher & Gupta (2012) cloned the potential alkaline protease gene from Bacillus circulans MTCC 7906 and analyzed the sequence using the Basic Local Alignment Search Tool (BLAST) at the National Center for Biotechnology Information (NCBI) website, followed by phylogenetic analysis of the alkaline protease gene. From these analyses, they concluded that the cloned sequence is indeed a novel alkaline protease from B. circulans MTCC 7906. Genome and transcriptome databases have also been used in studies of oomycetes. Tian et al. (2004) performed similarity and motif searches using public databases, including the GenBank nonredundant database, to analyze expressed sequence tags (ESTs) from tomato leaves infected with P. infestans. They discovered a potential member of the Kazal serine protease inhibitor family and named its gene epi1, which was later shown to have protease-inhibiting function. The genome of P. sojae strain P6497 was sequenced by the United States Department of Energy Joint Genome Institute using a whole genome shotgun strategy at 9x coverage (Tyler et al., 2006). The project has the accession AAQY00000000 in the GenBank database at NCBI. The genome size is approximately 95 Mb (Tyler et al., 2006), and the genome contains 26,489 coding genes and 25 pseudogenes (Howe et al., 2020; Kersey et al., 2018; Protists.ensembl.org, 2018). Currently, P. sojae transcriptomic data from ten different developmental stages (mycelia, zoosporangia, zoospores, cysts, germinating cysts, and five infection stages) are available under the accession number SRP006969 (Ye et al., 2011).

In this chapter, existing genomic and transcriptomic databases were used to analyze the genes encoding potential interactor proteins of Ps1365, a potential transcription factor identified in a previous study (Rutter, 2012). After sequencing, several software packages were employed to analyze the sequences obtained. The identities, structures and functions of the interactive proteins were determined according to their gene annotations in GenBank, FungiDB and JGI. Additional software included Sequencher 5.4.5 (Gene Codes Corporation, Ann Arbor, MI, USA), which automatically compares the forward and reverse-complementary orientations to assemble the best possible contigs, so DNA assembly can be done regardless of orientation. Sequencher also trims ends to remove poor-quality and vector sequences that may mislead data analysis. Additionally, a series of protein analysis tools were used to predict the identity, structure and functions of the prey protein candidates.

Hypotheses and Aims

Ps1365 was identified as a novel transcription factor via a yeast one-hybrid assay (Rutter, 2012). This chapter aims to analyze the potential interactor proteins of Ps1365 that were obtained from the yeast two-hybrid (Y2H) assay. In Chapter 2, the bait vector pGBKT7 (Clontech, CA) was used to clone the Ps1365-encoding gene and screen the cDNA library of P. sojae mycelium. The bait plasmid containing the Ps1365 gene insert showed neither self-activation nor toxicity in the yeast strain Y2HGold. After the Y2H assay, many prey plasmids were sequenced. The hypotheses for this chapter are:

1) The potential prey candidates are transcription factors, or at least nuclear proteins.

2) Transcription of the bait gene and the two potential prey candidate genes shows the same temporal and spatial patterns.

In this chapter, there were two aims:

1) Analyze the proteins that, according to the Y2H results, interact with the potential novel P. sojae transcription factor Ps1365.

2) Confirm the co-expression of the bait and the potential prey proteins using existing transcriptome data available in FungiDB.

Materials and Methods

I Yeast Two-hybrid Assay

The detailed procedures of the lab experiment are provided in Chapter 2. Briefly, the “bait” plasmid was constructed by cloning the Ps1365 gene from P. sojae into the pGBKT7 vector. Following sequencing and alignment analysis, the “bait” plasmid pGBKT7-Ps1365 was transformed into the yeast strain S. cerevisiae Y2HGold. Neither autoactivation nor toxicity of Ps1365 was observed. Y2H was then performed by mating Y2HGold cells containing pGBKT7-Ps1365 with Y187 cells containing P. sojae mycelium cDNA library plasmids. Once three-lobed structures were observed, indicating successful mating, the mated yeast cells were plated on double-dropout medium and assayed for X-α-Gal color and Aureobasidin A resistance. The interaction between Ps1365 and its interacting proteins was observed visually as positive blue colonies (see details in Chapter 2). The plasmids were extracted from the blue yeast colonies and re-transformed into competent E. coli cells in order to amplify the plasmids sufficiently for sequencing. Single colonies of E. coli transformants were subcultured, and plasmids were extracted and sequenced by the University of Chicago Comprehensive Cancer Center DNA Sequencing & Genotyping Facility (UCCCC-DSF).

II Bioinformatic Analysis

Sequencher 5.4.5 (Gene Codes Corporation, Ann Arbor, MI) was used to screen for sequence redundancy and infer consensus sequences. To remove redundant sequences and analyze the potential prey inserts, vector sequences were trimmed and the remaining parts of the sequences were aligned with each other using Sequencher 5.4.5 with default parameters (Figure 3.1).

Figure 3.1. Default parameters of Sequencher 5.4.5 used to align the prey sequences.
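The idea of screening for redundancy regardless of read orientation can be illustrated with a short sketch. This is not Sequencher's assembly algorithm, which builds contigs from overlapping, highly similar reads; the sketch only groups reads that are exactly identical after reverse-complementing, and all names in it are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Pick one orientation (the lexicographically smaller) as the representative."""
    forward = seq.upper()
    return min(forward, reverse_complement(forward))


def group_redundant(reads: dict[str, str]) -> dict[str, list[str]]:
    """Group read names whose sequences match exactly in either orientation."""
    groups: dict[str, list[str]] = {}
    for name, seq in reads.items():
        groups.setdefault(canonical(seq), []).append(name)
    return groups


# Hypothetical trimmed reads: r2 is the reverse complement of r1.
example_reads = {"r1": "ATGC", "r2": "GCAT", "r3": "AAAA"}
example_groups = group_redundant(example_reads)
```

Choosing a canonical orientation means each read is compared once instead of twice, which is the same reason an assembler considers both strands when placing a read.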

The resulting consensus sequences, together with the acceptable-quality in-contig sequences (highly similar sequences assembled into the same group, i.e., a contig) and out-of-contig sequences (sequences not assembled into any group because they differ from the in-contig sequences), were subjected to BLAST searches against transcript, nucleotide and protein databases via the National Center for Biotechnology Information (NCBI)

(https://www.ncbi.nlm.nih.gov/) (Zhang, Schwartz, Wagner, & Miller, 2000) and FungiDB

(http://fungidb.org/fungidb/) (Stajich et al., 2012; Tyler et al., 2006). The RNA-Seq data of

the hypothetical protein hits were retrieved directly from FungiDB. The gene models of the

hypothetical protein hits in FungiDB were aligned with RNA-Seq data to correct the gene models. The molecular weight and isoelectric point of the retrieved hypothetical protein hits were calculated using the ExPASy Compute pI/Mw tool (https://web.expasy.org/compute_pi/)

(Gasteiger et al., 2005). The putative amino acid sequences (retrieved from FungiDB) were blasted against protein databases by NCBI blastp tool (Altschul et al., 1997; Altschul et al.,

2005) as well as PSI-BLAST (https://www.ebi.ac.uk/Tools/sss/psiblast/) (Altschul et al.,

1997). Other characteristics of the deduced proteins were also analyzed. The potential

transmembrane domains were predicted by Kyte-Doolittle Hydropathy Plot

(https://web.expasy.org/protscale/) (Gasteiger et al., 2005; Kyte & Doolittle, 1982), TMHMM

Transmembrane Helix Prediction (http://www.cbs.dtu.dk/services/TMHMM/) (Krogh,

Larsson, von Heijne, & Sonnhammer, 2001; Sonnhammer, von Heijne, & Krogh, 1998), DAS

- Transmembrane Prediction Server (https://tmdas.bioinfo.se/DAS/) (Cserzö, Wallin, Simon,

von Heijne, & Elofsson, 1997), HMMTOP transmembrane topology prediction server

(http://www.enzim.hu/hmmtop/) (Tusnády & Simon, 1998; Tusnády & Simon, 2001) and

TMpred Prediction of Transmembrane Regions and Orientation

(https://embnet.vital-it.ch/software/TMPRED_form.html) (Hofmann & Stoffel, 1993).

Possible domain/motifs and protein family memberships were predicted using InterPro

protein sequence analysis & classification (http://www.ebi.ac.uk/interpro/) (Mitchell et al.,

2019; Jones et al., 2014), CDART Conserved Domain Architecture Retrieval Tool

(https://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi) (Geer, Domrachev, Lipman,

& Bryant, 2002), NCBI CD-Search (Conserved Domain Search)

(https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) (Marchler-Bauer & Bryant, 2004;

Marchler-Bauer et al., 2011; Marchler-Bauer et al., 2015; Marchler-Bauer et al., 2017),

Scooby-domain (Sequence hydrophobicity predicts domains)

(http://www.ibi.vu.nl/programs/scoobywww/) (George, Lin, & Heringa, 2005; Pang, Lin,

Wouters, Heringa, & George, 2008), Pfam protein families database (http://pfam.xfam.org/)

(Finn et al., 2016), SMART Simple Modular Architecture Research Tool

(http://smart.embl-heidelberg.de/) (Letunic & Bork, 2018; Schultz, Milpetz, Bork, & Ponting,

1998). Several servers were used to predict secondary structure of the proteins. These servers

included Jpred 4 Protein Secondary Structure Prediction Server

(http://www.compbio.dundee.ac.uk/jpred/) (Drozdetskiy, Cole, Procter, & Barton, 2015),

NetSurfP Protein Surface Accessibility and Secondary Structure Predictions

(http://www.cbs.dtu.dk/services/NetSurfP/) (Petersen, Petersen, Andersen, Nielsen, &

Lundegaard, 2009), GOR protein secondary structure prediction server

(https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html) (Garnier, Gibrat, &

Robson, 1996; Sen, Jernigan, Garnier, & Kloczkowski, 2005), NetTurnP β-turn region predictor (http://www.cbs.dtu.dk/services/NetTurnP/) (Petersen, Lundegaard, & Petersen,

2010), PORTER Protein Secondary Structure Prediction at University College Dublin

(http://distillf.ucd.ie/porter/) (Pollastri & McLysaght, 2005), SOPMA Secondary Structure

Prediction Server (https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html)

(Geourjon & Deléage, 1995). Their potential tertiary structures and possible functions were

determined with Phyre2 Protein Fold Recognition Server

(http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) (Kelley, Mezulis, Yates, Wass,

& Sternberg, 2015), (PS)2 Protein Structure Prediction Server (http://ps2v3.life.nctu.edu.tw/)

(Huang et al., 2015), RaptorX Protein Structure Prediction Server

(http://raptorx.uchicago.edu/StructurePrediction/predict/) (Källberg et al., 2012; Peng & Xu,

2011), PFP (Protein Function Prediction) (http://kiharalab.org/pfp.php) (Hawkins, Chitale,

Luban, & Kihara, 2009) and ESG (Extended Similarity Group method)

(http://kiharalab.org/esg.php) (Chitale, Hawkins, Park, & Kihara, 2009). The existence of

signal peptide and subcellular localization of the hypothetical proteins were predicted by

SignalP signal peptide prediction server (http://www.cbs.dtu.dk/services/SignalP/) (Petersen

et al., 2011), Wolf PSORT (https://wolfpsort.hgc.jp/) (Horton et al., 2007), TargetP protein

subcellular localization predictor server (http://www.cbs.dtu.dk/services/TargetP/)

(Emanuelsson et al., 2000), CELLO Subcellular Localization Predictive System

(http://cello.life.nctu.edu.tw/) (Yu, Lin, & Hwang, 2004; Yu, Chen, Lu, & Hwang, 2006),

iPSORT subcellular localization site predictor (http://ipsort.hgc.jp/) (Bannai et al., 2002;

Bannai et al., 2001), CoSiDe Combined Signal Peptide Predictor

(http://sigpep.services.came.sbg.ac.at/coside.html) (Frank, 2013).

Co-expression analysis of the bait gene PHYSODRAFT_342624-t26_1 and the two prey candidate genes was conducted using the transcriptome data for those genes available in FungiDB.
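One common way to quantify co-expression across developmental stages is a correlation coefficient between two expression profiles. The sketch below uses the Pearson coefficient as an assumption; the text does not state which statistic, if any, was applied to the FungiDB data, and the expression values shown are hypothetical placeholders rather than values from the database.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression values across four stages (not FungiDB data).
bait_profile = [10.0, 20.0, 15.0, 30.0]
prey_profile = [5.0, 11.0, 8.0, 16.0]
bait_prey_r = pearson(bait_profile, prey_profile)
```

A coefficient near +1 would be consistent with similar temporal expression patterns, but it does not by itself show that two proteins interact.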

Results

Aim 1) Analysis of the Potential Interactive Sequences of Ps1365 Obtained from Yeast

Two-hybrid Assay

Analysis of the Potential Prey Inserts

E. coli transformants were transferred onto two new LB agar ampicillin plates (40 colonies/plate on average) and incubated at 37°C overnight to make secondary plates. The plates were sent to UCCCC-DSF for sequencing, and a total of 188 potential prey sequences were obtained. Among them, 48 sequences were discarded due to poor sequencing quality. The remaining 140 sequences were of good quality and were used in the analysis.

Initially, Sequencher 5.4.5 identified four contigs: Contig[0001], Contig[0003], Contig[0007] and Contig[0074], with 76, 71, 7 and 6 members, respectively (Table 3.1). However, all six sequences from Contig[0074] were of low quality, so this contig was discarded. The consensus sequences of the remaining three contigs are provided in the Appendix.

Table 3.1. NCBI BLAST analysis results of the contigs of P. sojae prey inserts.

Contig No.   Numbers of raw sequences   Numbers of good quality sequences   BLAST                                    E-value   %ID
[0001]       76                         64                                  18S ribosomal RNA gene                   0.0       100%
[0003]       71                         69                                  PHYSODRAFT_291312; PHYSODRAFT_356433     0.0       99%
[0007]       7                          6                                   28S large subunit ribosomal RNA gene     5e-175    99%
[0074]       6                          0                                   N/A                                      N/A       N/A

Analysis of the Potential Prey Inserts by BLAST

The consensus sequences of the three contigs were analyzed using the NCBI BLAST tool against both P. sojae nucleotide and protein databases. BLAST analysis of Contig[0001] against the NCBI, FungiDB and JGI databases gave the same results and indicated that it is an 18S ribosomal RNA gene, while that of Contig[0007] indicated that it is a 28S large subunit ribosomal RNA gene (Table 3.1). For Contig[0003], BLAST analysis indicated that it is either hypothetical protein PHYSODRAFT_291312 or PHYSODRAFT_356433 (Table 3.1).

Validation of the Gene Models of the Prey Candidates

The gene models of the two hypothetical protein candidates in FungiDB were predicted by annotation software rather than determined by laboratory experimentation. In addition, the PHYSODRAFT_291312-t26_1 gene model in FungiDB includes an extremely long intron. It was therefore necessary to use RNA-Seq data to determine the correct gene structures of the two prey protein candidates.

Alignment of the FungiDB gene model of PHYSODRAFT_291312 with the existing RNA-Seq data (available in FungiDB) showed that the new model of the PHYSODRAFT_291312 gene is significantly shorter than the predicted gene model (Figure 3.2). According to the alignment, the new gene model of PHYSODRAFT_291312 is now predicted to contain three exons and two introns (Figures 3.2 and 3.3), while the predicted gene model of PHYSODRAFT_356433 appeared to be correct (Figure 3.4).

Figure 3.2. Alignment of the PHYSODRAFT_291312 (shown here as PHYSODRAFT_291312-t26_1) gene model with RNA-Seq data. (a) P. sojae transcription log scale graph obtained from FungiDB. (b) Predicted gene models in FungiDB. (c) Newly devised gene model of PHYSODRAFT_291312-t26_1 after aligning the original gene model with RNA-Seq data. White rectangles represent untranslated regions (UTRs), black rectangles represent coding sequences (CDSs), and solid lines represent introns.

>JH159173 | Phytophthora sojae unplaced genomic scaffold PHYSOscaffold_23, whole genome shotgun sequence. | 174389 to 178361

CACGACTTCAGCGAGTGCAACTGACGGCGTCGTGCCATCGTCAGTGCAAGTGAGTGACGC TTAGTGAAAAGCCGGCCCACCACGTCAAGGTGGTCACTCGTCGGACCGGGAAAGGACACC CGGATTGCTCGGGGTGCCCCGAGAACGCGTCAACGTCCCCAAGGGAGGCTCAAACTACCT GTGGGTCTGGTAAAATGCGGCTTCATCACCTAGGCGTTCCACGGCCCAAGCTATCCCCCG CAAGTCGATTGGAAGCGACAAGTTCGACGACTCAAGGTGCTTTCACGCGGAATGGAACCG CGGCTGACCGTCTCCCTTTCAGGACGGCCACACTTTGAGAGTCTCCCCGGACACACCCCA ACCTAGATCCGATCAAGTCGAATCCGCCAGCCTCCCCCCGCTACAACGAGCAGGTTCGAC CGTACTGCACAATGTGCAAGCTCCACCCACGATTCATCTCAAGCATGCGGCCTTAGGGAC CCCACGCGAGGACTACCTGCGTAAGTGACGACTCCTCTCGATCCACGCACGAATTGCAAC ACCAAAGCCACCGCTGAGTTTCCTAGGCTCAACAGATCCTCGATGCCAACCCCCTTGCAG ATTGAGACTCGTCCGTTTTGCCCGACGAGAACCTGGTCCGAATCGCGAAGCGTTTCTTCT ACTCGCCCGGCGGCTCCAAGGTGTCTGGACGATTTTCATCATGACCTATCAAGGTCGGTC CCGACTCCCTGCCGCGCCGTTTGAGGGCTCACTCCATGCCCCAAGTCGTCTGGAAGACTT TGTGTCAAGGTCGGTCAGACCCATCCAGATGATGACTTGTCACACTTCGTGACTGCTTCG CAATCCCACTGGTGATCAGCTCGCCAGCCAACTCCTTTATCGCCCGCAGCAGAACAGCCA AGCCCAAGTTCATCCAAGCCGAAGCCAGACAGACAAGAACCCAGCCATTCCGCCGCAAAC TTCCGGAATCGGCCGTTGATCATCTCACGTTGTTCTCGTCGACGTCTCGAGTACACCAAG ATGCTTTACGCCGCCATACGTGCGTTTTTTCGGCTGCTCACCGAATACCGATAACCCATG GCGGAGCTCTCCCGGTGCGACGAACTGTCTTAGTTCCAGGCACAGACGTTTCCAGCCGAC CACTCCCCCGGTCCGACCGTACTGCGTCAGCGACTCTGAAGCTGCAGCCACCTTTGTTGT GTGGACATCTGCCTCCCCCCGCTCACCCGGGCAGGAATCAGACTGCGCTTCGTCGTCTCA TTAAATTCGCACGCCCAGGCCGAAGCCCCGACGCTGATACTCGTTTCCAAGTTCTTCGAC ATCATGACATCTCCAGGTTGGTATTCTGTGTCCCCCTCTCATCCCAGGGCGGAACTCAGG CCGCAACTCGAAGAATCTCTCAAGAACACAACGTCCGGACCGAAGTCAAGACTACGTTCT TCAACGACTGCTACAACACGCTGCGCCCCCCGTAAGGTTGAAGTCTACATCCAGCCCCCA TCCCCCTCTCGCCGGGACAGGAATTGGAATATACTACGCAAGCTCGCAATCAAGATCAAC GGATCAGTGCCCAAGACCGAAGTCCTGGACCCATCTCCACCAGGTTTCCAGCCGGGGTTT GTACCCCCAGACATCCAGAGCGCACTGGACCATGCATCGCAGGCCGGAAGCTGTTTTTTA GAGAGGAGGTCTCCTCTATATACGCTAGCCTCTTGCGGAGAATTCTCCCCGCGTTCTACA TATTTGCTACTTTAGGTTTGGAATTAAGTCAATGTAATCAACATTCCCAGCCAAACTCCC CACCTGACGATGTCTTCCACGTAGCTCACCCAGACAAAAGCCCGAGATTACAACTAGAAC TGAGCGACCGCGAGGCCACCCAGATGCTACATCATGGTATAAGTAAAACAACATTGAGAG 
TAGTGGTATTTCACGGACGGCAGAGCCTCCCACTTATTCTACACCTCCCAAGTCATTTCA CAACGTCAGACTAGAGTCAAGCTCAACAGGGTCTTCTTTCCCCGCTGATTATTCCAAGCC CGTTCCCTTGGCTGTGGGTTCGCTAGACAGTAGATAGGGACAGTGGGAATCTCATTAATC CATTCATGCGCGTCACTAGTTAGATGACGAGGCATTTGGCTACCTTAAGAGAGTCATAGT TACTCCCGCCGTTTACCCGCGCTTGGTTGAATCTCTTCACTTTGACATTCAGAGCACTGG GCAGAAATCACATTGTGTCAACACCGTCTCCGGCCATCACAATGCTTTGTTTTAATTAAA CAGTCGGATTCCCCTTGTCCGCTCCAGTTCTGAGCCGGTTGTTCAACGCACTAGGGAAAC AGCCGCCAGCCCGAAAGCCAACGACCTTTCTCTCCGGCGCAGCAAGAGCAGCCCGACCGC

CGGGCCGCATCCGGTCCCCGAAAGGTCCAGACGCAGCCCAGCGTAGGCCGCCACAAGCTC TCCAAAAAGACCCCTAGGCCCAACCCTTAGAGCCAATCCTTTTCCCGAAGTTACGGATCT ATTTTGCCGACTTCCCTTATCTACATTCTTCTATCAACTAGAGGCTGCTAACCTTGGAGA CCTGATGCGGTTATGAGTACGAACGAGGGTGCGAATAAATCTCTAGCCAGGATTTTCAAG GGCTGTCGTGGGCGCACAGGACACTTCAAAAAGTAAAGTGCTTTGCCAAGGCGTCCTCCT TATCGCCGGATAATCCGTTTCCAAGGCAGGACATGCTTGTTAAAAAGAAAAGAGAACTCT TCCCTGGGCACACGCTAGCGTCTCCTGGGTCGGTTGTGTTGCCACACATTATCCACGTCT CGGTTCAGGAATATTAACCTGATTCCCTTTCGATACACGAGGCTGCATACAAAACGCGAG GACGAACCCCACGCCCCAAACAGCCCAGCTTTCAAACGGAATTATCCTATCTCTTAGGAT CGACTCACCCGTGTCCAATTACTGATCACACGGAACCTTTCTCCACTTCAGTCTTCAAAG TTCGCATTTGAATATTTGCTACTACCACCAAGATCTGCACTAAAGGCCGTTTCACTCAGG CTCACGCCACGAGCTTCTTCACGACCCTCACGCCCTCCTACTCATTAACGCGTACATTAC ATAT GCCAT CGCGCTAACGGCAAAGTATAAGTAGCCCGCTTTAGCGCCATCCATTTTCAG GGCTAGTTCATTCGGCAGGTGAGTTGTTACACACTCCTTAGCGGATTCCGACTTCCATGG CCACCGTCCTGCTGTCTAAATGAACCAACACCTTTTATGGTATCTAGGTGAGCGGGCATT TTGGCACTTTAACTTTGCGTTCGGTTCATCCCGCATCGCCAGACGAGCTTACCCCGTATG GCCCACTAGCAACTTGATATTCACATCCACAAGTTCAATTAAGAAACCTGCAGGTCTTAC AGATTTAAAGTTTGAGAATAGGTCGAGGAAGTTTCTTCCCCGAATCCTCTAATCATTCGC TTTACCTCATAAAACTATCGTAAATAAGTTGCTGCTATCCTGAGGGAAATTTCGGAGGGA ACCAGCTACTAGATGGTTCGATTAGTCTTTCGCCCCTATACCCAAGTTTGACGATCGATT TGCACGTCAGAATCGCTACGAGCTTCCACCAGAGTTTCCCCTGGCTTCACCCTACTCAGG CATAGTTCACCATCTTTCGGGTACCAACATATGTGCTCAAACTCAAATCTTTCACCACGA AGGTTCATGATCGGTCGATAGTGCCACGCCGCAGACCGAAGCCCGCAACATAGACGATAA TCACCTTTCCCGATGTGCCGCGAATAGCGATAGGTGTCTTCTGGGCACCCAACATCATAC AATTGCAACGCACTCCGCTGCCTGTCAAGTGCTGGCGGTGGAGAGTAGGCTGACTTGTAA TTTCAAATATTGGGAAAGATAAATCCTTTGTAGACGACTTAACTACAGAACGGGGTGTTG TAAGCATGAGAGT

Figure 3.3. DNA sequence of the predicted new gene model of PHYSODRAFT_291312 after aligning the original gene model from FungiDB with RNA-Seq data from the same database. Red letters represent exonic regions, including untranslated regions (UTRs) and coding sequences (CDSs), and black letters represent introns.

Figure 3.4. Alignment of the PHYSODRAFT_356433 gene model (shown here as PHYSODRAFT_356433-t26_1) with RNA-Seq data. (a) P. sojae transcription log scale graph from FungiDB. (b) Predicted gene model in FungiDB.

Protein Features of PHYSODRAFT_291312

Data from FungiDB (http://fungidb.org/fungidb/) showed that the gene encoding PHYSODRAFT_291312 is located on the forward strand of P. sojae genomic sequence JH159173. Its exact position on the genome is 167015 - 178157 (+). The predicted coding region of PHYSODRAFT_291312 is 213 bp long. The deduced length of PHYSODRAFT_291312 is 70 amino acids, and the protein does not have any signal peptide or transmembrane domains (Figure 3.5).

Figure 3.5. Features of PHYSODRAFT_291312 protein retrieved from FungiDB.
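The relationship between the 213 bp coding region and the 70-residue protein can be checked with simple arithmetic. The sketch below assumes that the reported 213 bp includes the stop codon, which is an assumption consistent with the two reported numbers rather than something stated in the FungiDB record; the function name is hypothetical.

```python
def protein_length_from_cds(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a coding sequence of cds_bp base pairs."""
    if cds_bp <= 0 or cds_bp % 3 != 0:
        raise ValueError("a complete coding sequence must be a positive multiple of 3")
    codons = cds_bp // 3
    # The stop codon occupies one codon but does not encode an amino acid.
    return codons - 1 if includes_stop else codons


physodraft_291312_length = protein_length_from_cds(213)  # 71 codons, one of them the stop
```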

Additional analysis of PHYSODRAFT_291312 using the ExPASy Compute pI/Mw tool (https://web.expasy.org/compute_pi/) (Gasteiger et al., 2005) showed that it has a molecular weight of 8,288.62 Da and an isoelectric point of 11.34.
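An average molecular weight of this kind is obtained by summing the average masses of the residues and adding one water molecule for the free termini. The sketch below illustrates that calculation with standard average residue masses; it is not the ExPASy implementation, it does not estimate the isoelectric point, and the example peptide is hypothetical because the PHYSODRAFT_291312 sequence is not reproduced in this section.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the N-terminal H and C-terminal OH


def molecular_weight(protein: str) -> float:
    """Average molecular weight (Da) of an unmodified protein sequence."""
    protein = protein.strip().upper()
    if not protein:
        raise ValueError("empty sequence")
    unknown = set(protein) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein) + WATER_MASS


example_mw = molecular_weight("MKRLLS")  # hypothetical peptide, not the prey protein
```

Small differences from a server's output can arise from rounding of the residue masses or from how modifications and terminal groups are handled.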

PHYSODRAFT_291312 against Protein Databases

A BLAST search of the PHYSODRAFT_291312 amino acid sequence against protein databases using the NCBI blastp and PSI-BLAST tools gave the highest-scoring hit to hypothetical protein PHMEG_00035810 in Phytophthora megakarya, with one amino acid difference.

PHYSODRAFT_291312: Globular or Membrane

Whether the two possible protein candidates are globular or membrane proteins was assessed by predicting the possible existence of transmembrane domains using several different software tools. Kyte-Doolittle Hydropathy Plot, DAS, HMMTOP, TMHMM, and TMpred predicted no transmembrane domains in PHYSODRAFT_291312 (Figure 3.6).


Figure 3.6. Prediction of potential transmembrane domains in PHYSODRAFT_291312. (a) Prediction using the ExPASy ProtScale Kyte-Doolittle hydropathy plot. The x-axis represents amino acid residue positions and the y-axis the hydrophobicity score of each residue. Window size was set to 19. All residue scores are below +1.6, suggesting that PHYSODRAFT_291312 may not have transmembrane domains. (b) Prediction using DAS. The x-axis represents amino acid residue positions and the y-axis the hydrophobicity score of each residue. All scores are lower than the loose hit value (1.7), suggesting that PHYSODRAFT_291312 may not have transmembrane domains. (c) Prediction using TMHMM. The x-axis represents amino acid residue positions and the y-axis the probability of each residue's location in the protein molecule (interior, surface, transmembrane). The probability of a surface location is highest for all residues, so PHYSODRAFT_291312 may be a water-soluble globular protein. (d) Prediction using TMpred. The x-axis represents amino acid residue positions and the y-axis the TMpred score of each residue. All scores are lower than 500, suggesting that PHYSODRAFT_291312 may not be a transmembrane protein.
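The Kyte-Doolittle analysis in panel (a) can be sketched as a sliding-window average of per-residue hydropathy values (Kyte & Doolittle, 1982), using the window of 19 and the +1.6 cutoff given in the caption. This is a minimal illustration, not the ProtScale implementation, which also offers window-edge weighting options; the function names are assumptions.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list:
    """Return the unweighted mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(values) < window:
        return []  # sequence shorter than one window: no score can be computed
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def has_candidate_tm_segment(sequence: str, window: int = 19, cutoff: float = 1.6) -> bool:
    """True if any window average exceeds the cutoff used in the caption (+1.6)."""
    return any(score > cutoff for score in hydropathy_profile(sequence, window))
```

A stretch of leucines, for example, averages 3.8 and is flagged, whereas a stretch of lysines averages -3.9 and is not.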

Domain and Protein Family Prediction of PHYSODRAFT_291312

The two hypothetical candidate proteins were analyzed for domain structures and protein family membership in order to predict their functions. Pfam, CDART, InterPro, NCBI CD-Search, and SMART found no domain structure in PHYSODRAFT_291312.

Secondary Structure of PHYSODRAFT_291312

The secondary structure of PHYSODRAFT_291312 was predicted using several tools (Figure 3.7). Jpred 4, SOPMA, GOR, and NetSurfP all predicted α-helical structure in PHYSODRAFT_291312. Although the results differ among tools, the predicted helices suggest that PHYSODRAFT_291312 may contain HLH or HTH domains.

Figure 3.7. Secondary structure prediction of PHYSODRAFT_291312. (a) Secondary structure predicted by GOR. “c” is random coil; “e” is extended strand; “h” is α-helix. (b)

Secondary structure predicted by Jpred 4. Letter “H” represents helical; “-” represents other

types of structures. (c) Secondary structure predicted by SOPMA. “h” is α-helix; “e” is

extended strand; “c” is random coil; “t” is beta-turn.

Tertiary Structure and Functional Analysis of PHYSODRAFT_291312

(PS)2, ESG, and PFP failed to find any significant template for PHYSODRAFT_291312. Phyre2 analysis (Kelley et al., 2015) indicated that the tertiary structure of PHYSODRAFT_291312 resembles interleukin-37, with a confidence level of 59.6% (Figure 3.8). RaptorX predicted the structural model shown in Figure 3.9.

Figure 3.8. The structural model of PHYSODRAFT_291312 predicted by the Phyre2 web server. The direction of the arrows and the change of rainbow colors (from blue to red) show the direction of the polypeptide chain, from N-terminus to C-terminus.

Figure 3.9. Tertiary structure of PHYSODRAFT_291312 predicted by RaptorX.

Figure 3.10. SignalP signal peptide prediction for PHYSODRAFT_291312. Amino acid residue positions are shown on the x-axis, and the y-axis indicates the values of the three scores. The C-score is the cleavage site score, which is high at a cleavage site. Residues with lower S-scores are potentially part of the mature protein, whereas residues with higher S-scores are potentially part of a signal peptide. The Y-score is highest where the C-score is significantly elevated and the S-score drops abruptly, indicating a potential cleavage site.

Signal Peptide and Subcellular Localization Prediction of PHYSODRAFT_291312

SignalP did not detect a signal peptide in PHYSODRAFT_291312 (Figure 3.10), whereas the CoSiDe Combined Signal Peptide Predictor predicted a signal peptide with a cleavage site after the 22nd amino acid residue (Figure 3.11). iPSORT also did not detect a signal peptide and predicted that the protein may localize to mitochondria. Phobius predicted PHYSODRAFT_291312 to be a non-cytoplasmic protein (Figure 3.12).

TargetP and CELLO subcellular localization prediction servers predicted that PHYSODRAFT_291312 is a mitochondrial protein, while WoLF PSORT assigned the highest probability to a nuclear localization.

Figure 3.11. Analysis of PHYSODRAFT_291312 by the CoSiDe Combined Signal Peptide Predictor, which predicted the best cleavage site at the 23rd amino acid residue. The x-axis represents amino acid residue positions, and the y-axis represents scores indicating the probability of cleavage.

Figure 3.12. Subcellular localization prediction of PHYSODRAFT_291312 using Phobius. Of the four subcellular localization probabilities, that of a non-cytoplasmic location is the highest, as indicated by the blue curve.

Protein Features of PHYSODRAFT_356433

Data from FungiDB (http://fungidb.org/fungidb/) indicated that the gene encoding PHYSODRAFT_356433 is located at position 114601-117100 on the reverse strand of P. sojae genomic sequence JH159173. The predicted transcript is 2500 bp long, has no intron, and encodes 101 amino acids. According to FungiDB, PHYSODRAFT_356433 has no functional domains or signal peptide, but has a transmembrane domain (Figure 3.13). Calculation with the ExPASy Compute pI/Mw tool indicated that the molecular weight of PHYSODRAFT_356433 is 11,737.68 Da and its isoelectric point is 9.64.

Figure 3.13. Features of the PHYSODRAFT_356433 protein retrieved from FungiDB.

PHYSODRAFT_356433: Globular or Membrane

For PHYSODRAFT_356433, the Kyte-Doolittle hydropathy plot predicted no transmembrane domains. In contrast, DAS, HMMTOP, TMHMM, and TMpred each predicted one transmembrane domain (Figure 3.14).


Figure 3.14. Prediction of potential transmembrane domains in PHYSODRAFT_356433. (a) Prediction using the ExPASy ProtScale Kyte-Doolittle hydropathy plot. The x-axis represents amino acid residue positions and the y-axis the hydrophobicity score of each residue. Window size was set to 19. Very few residues score above +1.6, suggesting that PHYSODRAFT_356433 may not have transmembrane domains. (b) Prediction using DAS. The x-axis represents amino acid residue positions and the y-axis the hydrophobicity score of each residue. The scores of the residues between positions 19 and 46 are above the loose hit value (1.7), so PHYSODRAFT_356433 may have a transmembrane domain. (c) Prediction using TMHMM. The x-axis represents amino acid residue positions and the y-axis the probability of each residue's location in the protein molecule (interior, surface, transmembrane). The residues at positions 30-52 showed the highest probability of being a transmembrane segment, so PHYSODRAFT_356433 may have a transmembrane region. (d) Prediction using TMpred. The x-axis represents amino acid residue positions and the y-axis the TMpred score of each residue. The scores of the residues at positions 29-45 are above 500, suggesting the possible existence of a transmembrane segment.
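The segment boundaries quoted in this caption (for example, positions 19 to 46 above the DAS loose cutoff of 1.7, or positions 29 to 45 above a TMpred score of 500) amount to reading contiguous runs of above-threshold positions from a per-residue score profile. The helper below is a generic sketch of that step; it is not part of DAS, TMHMM, or TMpred, and the function name is an assumption.

```python
def segments_above(scores, cutoff):
    """Return 1-based inclusive (start, end) runs where scores exceed the cutoff."""
    segments = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > cutoff:
            if start is None:
                start = position  # a new above-cutoff run begins here
        elif start is not None:
            segments.append((start, position - 1))  # the run ended at the previous residue
            start = None
    if start is not None:
        segments.append((start, len(scores)))  # the run extends to the last residue
    return segments
```

Given a profile exported from one of the servers, `segments_above(profile, 1.7)` would list the candidate regions under the DAS loose cutoff; a minimum segment length could be added as a further filter.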

Analysis of PHYSODRAFT_356433 against Protein Databases

BLAST analysis of PHYSODRAFT_356433 against the protein databases using the NCBI blastp and PSI-BLAST tools returned hypothetical protein PHMEG_00034225 of P. megakarya as the highest-scoring hit, with two amino acid differences. Domain analysis of PHYSODRAFT_356433 by Pfam, CDART, InterPro, NCBI CD-Search, and SMART did not identify any domains or protein families.

Secondary structure prediction of PHYSODRAFT_356433 by NetSurfP, JPred 4, GOR, SOPMA, and PORTER indicated that the putative protein contains α-helical structure (Figure 3.15).

Figure 3.15. Secondary structure prediction of PHYSODRAFT_356433. (a)

Secondary structure predicted by GOR. “c” is random coil; “e” is extended strand; “h” is

α-helix. (b) Secondary structure predicted by Jpred 4. Letter “H” represents helical; “E”

represents extended, “-” represents other types of structures. (c) Secondary structure

predicted by SOPMA. “h” represents α-helix; “e” represents extended strand; “c” represents

random coil; “t” represents β-turn.

Tertiary Structure and Functional Prediction of PHYSODRAFT_356433

(PS)2 and PFP failed to predict a tertiary structure for PHYSODRAFT_356433, while ESG predicted that it may be a membrane protein involved in transport. Phyre2 predicted that the tertiary structure of PHYSODRAFT_356433 is similar to that of influenza B virus nucleoprotein (20.7% confidence), an RNA-binding protein (Figure 3.16). RaptorX predicted the tertiary structure of PHYSODRAFT_356433 shown in Figure 3.17.

Figure 3.16. Structural model of PHYSODRAFT_356433 predicted by the Phyre2 web server, represented as a ribbon diagram. The change of rainbow colors (from blue to red) shows the direction of the polypeptide chain, from N-terminus to C-terminus.

Figure 3.17. Prediction of PHYSODRAFT_356433 tertiary structure by RaptorX.

Figure 3.18. SignalP signal peptide prediction showed that PHYSODRAFT_356433 contains no signal peptide, because none of the three scores indicated a cleavage site. The C-score is the cleavage site score, which is high at a cleavage site. Residues with lower S-scores are potentially part of the mature protein, whereas residues with higher S-scores are potentially part of a signal peptide. The Y-score is highest where the C-score is significantly elevated and the S-score drops abruptly, indicating a potential cleavage site.

Signal Peptide and Subcellular Localization Analyses of PHYSODRAFT_356433

CoSiDe, iPSORT, and SignalP (Figure 3.18) predicted that PHYSODRAFT_356433 does not have a signal peptide. CELLO predicted that PHYSODRAFT_356433 may be an extracellular or nuclear protein. TargetP predicted that PHYSODRAFT_356433 is neither a mitochondrial nor a secreted protein. WoLF PSORT predicted that PHYSODRAFT_356433 is a nuclear protein. Phobius failed to predict the subcellular localization of PHYSODRAFT_356433 (Figure 3.19).

Figure 3.19. Prediction of subcellular localization of PHYSODRAFT_356433 by Phobius. The amino acids between positions 27 and 47 showed the highest probability of being a transmembrane region.

In conclusion, the majority of prediction models indicate that PHYSODRAFT_291312 is a protein without transmembrane domains. Some software tools predicted that it may have more than one α-helix, which indicates the possibility of HTH or HLH domains, the characteristic structures of TFs. Signal peptide and subcellular localization predictors gave varied results. Taken together, these analyses suggest that PHYSODRAFT_291312 may be a water-soluble globular protein, and possibly a TF. According to four secondary structure prediction tools, PHYSODRAFT_291312 appears to contain α-helical structure. Phyre2 predicted that PHYSODRAFT_356433 is similar to an RNA-binding protein, but with low confidence. Signal peptide and subcellular localization predictions for PHYSODRAFT_356433 were similarly inconsistent. Taken together, these results suggest that PHYSODRAFT_356433 may be a TF.

Aim 2) Confirmation of the Co-expression of the Bait and the Potential Prey Protein Genes using Existing Transcriptome Data Available in FungiDB

Co-expression Analysis of Bait and Prey Protein Genes

The expression profiles of the bait gene Ps1365 and the candidate prey genes PHYSODRAFT_291312 and PHYSODRAFT_356433 were available in the FungiDB database. All three genes showed their highest transcription levels during the mycelium stage. The lowest transcription level of Ps1365 occurred during the infection stage, whereas the lowest transcription levels of PHYSODRAFT_291312 and PHYSODRAFT_356433 occurred during the cyst stage. During the mycelium, cyst, and infection stages, the transcription levels of the two potential prey genes were significantly higher than that of Ps1365 (Figure 3.20). Apart from these differences, the three genes showed a similar expression pattern during the mycelium and cyst stages.
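One way to quantify the similarity of two expression patterns, as opposed to their absolute levels, is the Pearson correlation between per-stage expression values. The sketch below is an illustration only and was not part of the analysis reported here. The numbers in the usage line are placeholders rather than FungiDB values, and with only three stages a correlation coefficient carries little statistical weight.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson correlation between two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 stages)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Placeholder profiles (three stages each), NOT values taken from FungiDB.
bait_profile = [1.0, 2.0, 3.0]
prey_profile = [2.0, 4.0, 6.0]
print(pearson_correlation(bait_profile, prey_profile))
```

Because the correlation is insensitive to scale, two genes with very different absolute transcript levels can still show a coefficient close to 1 if their levels rise and fall together across stages.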

Figure 3.20. Transcription levels of three P. sojae genes, Ps1365 (PHYSODRAFT_342624), PHYSODRAFT_291312, and PHYSODRAFT_356433, during three developmental stages of P. sojae. The transcriptome data (Tyler, 2014) were retrieved from FungiDB (fungidb.org). In all three stages, the transcription levels of the two prey genes (PHYSODRAFT_291312 and PHYSODRAFT_356433) are significantly higher than that of the bait gene (PHYSODRAFT_342624). Except in the infection stage, the expression patterns of the three genes are essentially the same in the other two developmental stages.

Discussion

In this research, two potential interaction partners of the bait protein Ps1365 were identified through yeast two-hybrid screening and subsequent bioinformatic analyses. Yeast two-hybrid screening has proven feasible in many previous studies. For example, it was used successfully to identify the human proteins that interact with the Toxoplasma gondii protein SAG2 (Lai & Lau, 2017). The authors used the same system as this study (Matchmaker Gold Yeast Two-Hybrid System), with SAG2 as bait and a human cDNA library as prey. Eighteen clones were initially identified as harboring potential interaction partners of SAG2, and sequencing yielded thirteen candidate preys. The candidates were validated by small-scale, one-to-one Y2H with both SAG2-inserted and empty bait vectors, which showed that only one prey, a human zinc-finger protein (HZF), is a true interaction partner of SAG2. Further examination using β-galactosidase and co-immunoprecipitation assays confirmed the interaction between HZF and SAG2. In a second study, Xin et al. (2017) also used the Matchmaker Gold Yeast Two-Hybrid System to screen for proteins that interact with the bovine muscle protein CMYA1. In addition to full-length CMYA1, they used the Xin-repeats alone (16 aa) as bait. Twenty-seven putative interacting proteins were identified; some interacted only with CMYA1, others only with the Xin-repeats, and only three interacted with both. One-to-one Y2H assays confirmed the interactions of these three proteins with both CMYA1 and the Xin-repeats. Two highly abundant universal ribosomal proteins were also identified. From these results, the authors speculated that the Xin-repeats may play a role in translation via interactions between the ribosome and the cytoskeleton.

The combination of Y2H and bioinformatic analysis showed that two putative proteins, PHYSODRAFT_291312 and PHYSODRAFT_356433, are potential interaction partners of Ps1365. PHYSODRAFT_291312 was predicted to lack transmembrane domains and may be a globular protein with one or two α-helices, which suggests that it could be a transcription factor. Its molecular weight is 8,288 Da, well below 40 kDa, the upper limit for molecules to pass freely through the nuclear envelope (Zanta et al., 1999; Kalderon et al., 1984). Thus, the PHYSODRAFT_291312 protein may be able to enter the nucleus freely without a nuclear localization signal. These predictions partly support our hypothesis that PHYSODRAFT_291312 is a transcription factor, or at least a nuclear protein, that interacts with Ps1365. Signal peptide and subcellular localization prediction tools gave varied results, possibly because current prediction tools are limited in their ability to analyze protein localization (Min, 2010). Co-expression analysis of Ps1365 and PHYSODRAFT_291312 using existing transcriptome data in FungiDB showed that the two genes are co-expressed in mycelium and during infection. The expression levels of Ps1365 and PHYSODRAFT_291312 differ, but two interacting proteins need not be present in an exact one-to-one ratio, or in any fixed ratio, to interact with each other. They may also have different half-lives, interact with several other proteins, or both. In addition, PHYSODRAFT_291312 may be stored as a reserve for future interaction with Ps1365, or it may perform functions other than interacting with Ps1365. Taken together, these results support the possibility that the two proteins form a complex that performs a certain function.

Secondary structure prediction showed that PHYSODRAFT_356433 may have more than one α-helix, which suggests the possible presence of HTH and HLH domains, the characteristic structures of transcription factors or, at least, of nucleic acid-binding proteins (Jones, 2004; Brennan & Matthews, 1989). Phyre2 analysis indicated that PHYSODRAFT_356433 may be an RNA-binding protein, but with a low confidence level of 20.7%. Together, our data support the assumption that PHYSODRAFT_356433 may be a nucleic acid-binding protein, a necessary attribute of transcription factors. The molecular weight of the deduced PHYSODRAFT_356433 protein is 11,737.68 Da, which is lower than 40 kDa; thus it may be able to enter the nucleus freely without an NLS and interact with other proteins, including Ps1365 (Zanta et al., 1999; Kalderon et al., 1984). Existing transcriptome data in FungiDB showed similar expression patterns for Ps1365 and PHYSODRAFT_356433 in mycelium and cyst, which is consistent with the hypothesis that Ps1365 interacts with PHYSODRAFT_356433.
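The size argument used for both prey proteins reduces to a comparison of each predicted mass against the 40 kDa passive-diffusion limit cited above. The short sketch below makes that comparison explicit, using the Compute pI/Mw masses reported in the Results (8,288.62 Da and 11,737.68 Da). The constant and function names are illustrative, and the cutoff is treated as a simple hard threshold, which is a simplification.

```python
PASSIVE_DIFFUSION_LIMIT_DA = 40_000.0  # 40 kDa limit cited in the text

# Masses reported by the ExPASy Compute pI/Mw tool in the Results.
PREDICTED_MASS_DA = {
    "PHYSODRAFT_291312": 8288.62,
    "PHYSODRAFT_356433": 11737.68,
}


def may_diffuse_passively(mass_da: float, limit_da: float = PASSIVE_DIFFUSION_LIMIT_DA) -> bool:
    """True if the mass is below the assumed passive-diffusion limit."""
    return mass_da < limit_da


for name, mass in PREDICTED_MASS_DA.items():
    print(f"{name}: {mass:.2f} Da, below limit: {may_diffuse_passively(mass)}")
```

Both masses fall well below the limit, so neither protein would need a nuclear localization signal on size grounds alone; this says nothing about whether either protein is actually nuclear.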

Validation of the interactions of Ps1365 with PHYSODRAFT_291312, and with

PHYSODRAFT_356433 requires at least an individual one-to-one Y2H assay of each pair.

Other methods that could be used to confirm protein-protein interactions in P. sojae include

co-immunoprecipitation (co-IP), pull-down assays and crosslinking analysis.

Two ribosomal RNAs were also identified as interactors of Ps1365. Their presence could be explained in two ways. First, it may reflect an innate weakness of Y2H, namely false positives (Vidalain, Boxem, Ge, Li & Vidal, 2004). Second, the hybrid yeast cells may have highly expressed the plasmids carrying ribosomal RNA genes, allowing rRNA to be recovered as an “interactive protein”. The yeast two-hybrid assay is known to produce some false positives (Lai & Lau, 2017; Xin et al., 2017). In some systems, Y2H has also failed to show genuine interactions between proteins. For example, in the work of Strausak et al. (2003), Y2H did not show the interaction between Atox1 and MBS5/6, whereas surface plasmon resonance (SPR) analysis did. The authors attributed this weakness to the indirect nature of Y2H: detection depends on the formation of an active transcription factor in yeast. According to the authors, factors such as salt concentration and pH may interfere with the formation of an intact transcription factor and thus produce false positives, false negatives, or both; they therefore preferred a real-time SPR method. In addition, Chua et al. (2012) failed to show the interaction between PfAha1 and PfHsp90 by Y2H, and they noted that Y2H was unable to detect the interactions between PfHsp90 and many putative co-chaperones (Chua, Low, Lehming, & Sim, 2012). These examples show that Y2H is prone to false negatives, which may explain why so few candidate proteins were obtained in this study.

Some potential interaction partners of Ps1365 may have escaped detection in this Y2H analysis. To recover them, the P. sojae protein library could be screened using methods such as protein probing or phage display (Phizicky and Fields, 1995). Weak and transient interactions that may have been missed could be analyzed by crosslinking and label transfer protein interaction analyses (Golemis, 2002; Phizicky and Fields, 1995).

Possible functions of the bait protein (Ps1365) and the two potential prey proteins may be deduced from their transcription patterns. The transcriptome data from FungiDB indicated that transcription of all three genes is highest during the mycelium (mature) stage. This pattern suggests that the three proteins may be essential for vegetative development as well as for the invasion of host tissues, and even cells, by hyphae. In other words, the three proteins may regulate the expression of proteins that perform non-reproductive growth and developmental functions. Secondary structure predictions for the three proteins hinted at the possible presence of HLH or HTH domains. Because the majority of known HLH and HTH proteins perform functions not directly related to reproductive cell formation (Jones, 2004; Norton, 2000; Rosinski & Atchley, 1999), the predicted secondary structures of the three proteins are consistent with their presumptive functions.

The findings of this research may have practical value in controlling the soybean pathogen P. sojae. Because transcription is one of the most important processes in all living organisms, and transcription factors are key players in eukaryotic transcriptional regulation, a target species could be controlled by controlling its transcription factors. Bao et al. (2019) found that symptoms such as stunting in infected potatoes result from RNA-directed silencing of the potato transcription factor gene StTCP23 by the pathogen Potato spindle tuber viroid (PSTVd); their findings established that control over transcription factors can alter, and even manipulate, the target organism. In this study, three potential transcription factors from P. sojae were theoretically predicted. These three potential transcription factors may serve as biological targets for future studies on controlling P. sojae. Such studies may involve methods including Host-Induced Gene Silencing (HIGS) (Baulcombe, 2015; Goulin et al., 2019; Qi et al., 2019) and small-molecule inhibition (Berg, 2008; Fontaine et al., 2017) to manipulate the three potential P. sojae TFs and thereby suppress this destructive and persistent pathogen.

In conclusion, in this study we obtained two candidate prey proteins as well as two rRNAs by Y2H. Initial analysis indicated that Ps1365 may interact with the two candidate prey proteins, PHYSODRAFT_291312 and PHYSODRAFT_356433. One-to-one Y2H assays, as well as co-immunoprecipitation or pull-down assays followed by western blot, could be applied to confirm these interactions.

References

Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., & Lipman,

D. J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database

search programs. Nucleic Acids Research, 25(17), 3389-3402.

doi:10.1093/nar/25.17.3389

Altschul, S. F., Wootton, J. C., Gertz, E. M., Agarwala, R., Morgulis, A., Schäffer, A. A., &

Yu, Y. (2005). Protein database searches using compositionally adjusted substitution

matrices. The FEBS Journal, 272(20), 5101-5109.

doi:10.1111/j.1742-4658.2005.04945.x

Bannai, H., Tamada, Y., Maruyama, O., Nakai, K., & Miyano, S. (2002). Extensive feature

detection of N-terminal protein sorting signals. Bioinformatics, 18(2), 298-305.

doi:10.1093/bioinformatics/18.2.298

Bannai, H., Tamada, Y., Maruyama, O., Nakai, K., & Miyano, S. (2001). Views:

Fundamental building blocks in the process of knowledge discovery. Paper presented

at the Fourteenth International Florida Artificial Intelligence Research Society

Conference,

Bao, S., Owens, R. A., Sun, Q., Song, H., Liu, Y., Eamens, A. L., . . . Zhang, R. (2019).

Silencing of transcription factor encoding gene StTCP23 by small RNAs derived from

the virulence modulating region of potato spindle tuber viroid is associated with

symptom development in potato. PLoS Pathogens, 15(12), e1008110.

doi:10.1371/journal.ppat.1008110

Baulcombe, D. C. (2015). VIGS, HIGS and FIGS: Small RNA silencing in the interactions of

viruses or filamentous organisms with their plant hosts. Current Opinion in Plant

Biology, 26, 141-146. doi:10.1016/j.pbi.2015.06.007

Berg, T. (2008). Inhibition of transcription factors with small organic molecules. Current

Opinion in Chemical Biology, 12(4), 464-471. doi:10.1016/j.cbpa.2008.07.023

Brennan, R. G., & Matthews, B. W. (1989). The helix-turn-helix DNA binding motif. The

Journal of Biological Chemistry, 264(4), 1903-1906.

Chitale, M., Hawkins, T., Park, C., & Kihara, D. (2009). ESG: Extended similarity group

method for automated protein function prediction. Bioinformatics, 25(14), 1739-1745.

doi:10.1093/bioinformatics/btp309

Chua, C. S., Low, H., Lehming, N., & Sim, T. S. (2012). Molecular analysis of Plasmodium

falciparum co-chaperone Aha1 supports its interaction with and regulation of Hsp90 in

the malaria parasite. International Journal of Biochemistry and Cell Biology, 44(1),

233-245. doi:10.1016/j.biocel.2011.10.021

Cserzö, M., Wallin, E., Simon, I., von Heijne, G., & Elofsson, A. (1997). Prediction of

transmembrane alpha-helices in prokaryotic membrane proteins: The dense alignment

surface method. Protein Engineering, 10(6), 673-676. doi:10.1093/protein/10.6.673

Drozdetskiy, A., Cole, C., Procter, J., & Barton, G. J. (2015). JPred4: A protein secondary

structure prediction server. Nucleic Acids Research, 43(W1), W389-W394.

doi:10.1093/nar/gkv332

Emanuelsson, O., Nielsen, H., Brunak, S., & von Heijne, G. (2000). Predicting subcellular

localization of proteins based on their N-terminal amino acid sequence. Journal of

Molecular Biology, 300(4), 1005-1016. doi:10.1006/jmbi.2000.3903

Finn, R. D., Coggill, P., Eberhardt, R. Y., Eddy, S. R., Mistry, J., Mitchell, A. L., . . .

Bateman, A. (2016). The Pfam protein families database: Towards a more sustainable

future. Nucleic Acids Research, 44(D1), D279-D285. doi:10.1093/nar/gkv1344

Fontaine, F., Overman, J., Moustaqil, M., Mamidyala, S., Salim, A., Narasimhan, K., . . .

Francois, M. (2017). Small-molecule inhibitors of the SOX18 transcription factor. Cell

Chemical Biology, 24(3), 346-359. doi:10.1016/j.chembiol.2017.01.003

Frank, K. (2013). Sequence and structure searches for biological molecules - development

and applications of bioinformatics methods in molecule biology (doctoral dissertation).

University of Salzburg, Salzburg, Austria

Tusnady, G. E., & Simon, I. (2001). The HMMTOP transmembrane topology

prediction server. Bioinformatics, 17(9), 849-850.

doi:10.1093/bioinformatics/17.9.849

Garnier, J., Gibrat, J. F., & Robson, B. (1996). GOR method for predicting protein secondary

structure from amino acid sequence. Methods in Enzymology, 266, 540-553.

Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S., Wilkins, M., Appel, R., & Bairoch, A.

(2005). Protein identification and analysis tools on the ExPASy server. The

proteomics protocols handbook (pp. 571-607). Totowa, NJ: Humana Press. doi:571

Geer, L. Y., Domrachev, M., Lipman, D. J., & Bryant, S. H. (2002). CDART: Protein

homology by domain architecture. Genome Research, 12(10), 1619-1623.

doi:10.1101/gr.278202

George, R. A., Lin, K., & Heringa, J. (2005). Scooby-domain: Prediction of globular domains

in protein sequence. Nucleic Acids Research, 33(Web Server issue), W160-W163.

doi:10.1093/nar/gki381

Geourjon, C., & Deléage, G. (1995). SOPMA: Significant improvements in protein

secondary structure prediction by consensus prediction from multiple alignments.

Computer Applications in the Biosciences : CABIOS, 11(6), 681-684.

Pollastri, G., & McLysaght, A. (2005). PORTER: A new, accurate server for protein

secondary structure prediction. Bioinformatics, 21(8), 1719-1720.

doi:10.1093/bioinformatics/bti203

Golemis, E. (2002). Protein‐Protein interactions: A molecular cloning manual. Cold Spring

Harbor (New York): Cold Spring Harbor Laboratory Press.

Goulin, E. H., Galdeano, D. M., Granato, L. M., Matsumura, E. E., Dalio, R. J. D., &

Machado, M. A. (2019). RNA interference and CRISPR: Promising approaches to

better understand and control citrus pathogens. Microbiological Research, 226, 1-9.

doi:10.1016/j.micres.2019.03.006

Hawkins, T., Chitale, M., Luban, S., & Kihara, D. (2009). PFP: Automated prediction of gene

ontology functional annotations with confidence scores using protein sequence data.

Proteins, 74(3), 566-582. doi:10.1002/prot.22172

Hofmann, K., & Stoffel, W. (1993). TMBASE - A database of membrane spanning protein

segments [Abstract]. Biol. Chem. Hoppe-Seyler 374,166

Horton, P., Park, K., Obayashi, T., Fujita, N., Harada, H., Adams-Collier, C. J., & Nakai, K.

(2007). WoLF PSORT: Protein localization predictor. Nucleic Acids Research,

35(Web Server issue), W585-W587. doi:10.1093/nar/gkm259

Howe, K. L., Contreras-Moreira, B., De Silva, N., Maslen, G., Akanni, W., Allen, J., . . .

Flicek, P. (2020). Ensembl Genomes 2020 - enabling non-vertebrate genomic

research. Nucleic Acids Research, 48(D1), D689-D695. doi:10.1093/nar/gkz890

Huang, T., Hwang, J., Chen, C., Chu, C., Lee, C., & Chen, C. (2015). (PS)2: Protein structure

prediction server version 3.0. Nucleic Acids Research, 43(W1), W338-W342.

doi:10.1093/nar/gkv454

Jones, P., Binns, D., Chang, H., Fraser, M., Li, W., McAnulla, C., . . . Hunter, S. (2014).

InterProScan 5: Genome-scale protein function classification. Bioinformatics, 30(9),

1236-1240. doi:10.1093/bioinformatics/btu031

Jones, S. (2004). An overview of the basic helix-loop-helix proteins. Genome Biology, 5(6),

226.

Kalderon, D., Roberts, B. L., Richardson, W. D., & Smith, A. E. (1984). A short amino acid

sequence able to specify nuclear location. Cell, 39(3), 499-509.

doi:10.1016/0092-8674(84)90457-4

Källberg, M., Wang, H., Wang, S., Peng, J., Wang, Z., Lu, H., & Xu, J. (2012).

Template-based protein structure modeling using the RaptorX web server. Nature

Protocols, 7(8), 1511-1522. doi:10.1038/nprot.2012.085

Kaur, I., Kocher, G., & Gupta, V. (2012). Molecular cloning and nucleotide sequence of the

gene for an alkaline protease from Bacillus circulans MTCC 7906. Indian Journal of

Microbiology, 52(4), 630-637. doi:10.1007/s12088-012-0297-4

Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N., & Sternberg, M. J. E. (2015). The

Phyre2 web portal for protein modeling, prediction and analysis. Nature Protocols,

10(6), 845-858. doi:10.1038/nprot.2015.053

Kersey, P. J., Allen, J. E., Allot, A., Barba, M., Boddu, S., Bolt, B. J., . . . Yates, A. (2018).

Ensembl Genomes 2018: An integrated omics infrastructure for non-vertebrate species.

Nucleic Acids Research, 46(D1), D802-D808. doi:10.1093/nar/gkx1011

Krogh, A., Larsson, B., von Heijne, G., & Sonnhammer, E. L. L. (2001). Predicting

transmembrane protein topology with a hidden Markov model: Application to

complete genomes. Journal of Molecular Biology, 305(3), 567-580.

doi:10.1006/jmbi.2000.4315

Kyte, J., & Doolittle, R. F. (1982). A simple method for displaying the hydropathic character

of a protein. Journal of Molecular Biology, 157(1), 105-132.

doi:10.1016/0022-2836(82)90515-0

Lai, M., & Lau, Y. (2017). Screening and identification of host proteins interacting with

Toxoplasma gondii SAG2 by yeast two-hybrid assay. Parasites & Vectors, 10(1),

456-458. doi:10.1186/s13071-017-2387-y

Letunic, I., & Bork, P. (2018). 20 years of the SMART protein domain annotation resource.

Nucleic Acids Research, 46(D1), D493-D496. doi:10.1093/nar/gkx922

Marchler-Bauer, A., & Bryant, S. H. (2004). CD-search: Protein domain annotations on the

fly. Nucleic Acids Research, 32(Web Server issue), W327-W331.

doi:10.1093/nar/gkh454

Marchler-Bauer, A., Bo, Y., Han, L., He, J., Lanczycki, C. J., Lu, S., . . . Bryant, S. H. (2017).

CDD/SPARCLE: Functional classification of proteins via subfamily domain

architectures. Nucleic Acids Research, 45(D1), D200-D203. doi:10.1093/nar/gkw1129

Marchler-Bauer, A., Derbyshire, M. K., Gonzales, N. R., Lu, S., Chitsaz, F., Geer, L. Y., . . .

Bryant, S. H. (2015). CDD: NCBI's conserved domain database. Nucleic Acids

Research, 43(Database issue), D222-D226. doi:10.1093/nar/gku1221

Marchler-Bauer, A., Lu, S., Anderson, J. B., Chitsaz, F., Derbyshire, M. K., DeWeese-Scott,

C., . . . Bryant, S. H. (2011). CDD: A conserved domain database for the functional

annotation of proteins. Nucleic Acids Research, 39(Database issue), D225-D229.

doi:10.1093/nar/gkq1189

Min, X. J. (2010). Evaluation of computational methods for secreted protein prediction in

different eukaryotes. Journal of Proteomics & Bioinformatics, 3(4), 143-147.

doi:10.4172/jpb.1000133

Mitchell, A. L., Attwood, T. K., Babbitt, P. C., Blum, M., Bork, P., Bridge, A., . . . Finn, R. D.

(2019). InterPro in 2019: Improving coverage, classification and access to protein

sequence annotations. Nucleic Acids Research, 47(D1), D351-D360.

doi:10.1093/nar/gky1100

Norton, J. D. (2000). ID helix-loop-helix proteins in cell growth, differentiation and

tumorigenesis. Journal of Cell Science, 113(Pt 22), 3897-3905.

Pang, C. N. I., Lin, K., Wouters, M. A., Heringa, J., & George, R. A. (2008). Identifying

foldable regions in protein sequence from the hydrophobic signal. Nucleic Acids

Research, 36(2), 578-588. doi:10.1093/nar/gkm1070

Peng, J., & Xu, J. (2011). RaptorX: Exploiting structure information for protein alignment by

statistical inference. Proteins: Structure, Function, and Bioinformatics, 79(S10),

161-171. doi:10.1002/prot.23175

Petersen, B., Lundegaard, C., & Petersen, T. N. (2010). NetTurnP – neural network

prediction of beta-turns by use of evolutionary information and predicted protein

sequence features. PLoS One, 5(11), e15079. doi:10.1371/journal.pone.0015079

Petersen, B., Petersen, T. N., Andersen, P., Nielsen, M., & Lundegaard, C. (2009). A generic

method for assignment of reliability scores applied to solvent accessibility predictions.

BMC Structural Biology, 9(1), 51. doi:10.1186/1472-6807-9-51

Petersen, T. N., Brunak, S., von Heijne, G., & Nielsen, H. (2011). SignalP 4.0:

Discriminating signal peptides from transmembrane regions. Nature Methods, 8(10),

785-786. doi:10.1038/nmeth.1701

Phizicky, E. M., & Fields, S. (1995). Protein-protein interactions: Methods for detection and

analysis. Microbiological Reviews, 59(1), 94-123.

Pollastri, G., & McLysaght, A. (2005). PORTER: A new, accurate server for protein

secondary structure prediction. Bioinformatics (Oxford, England), 21(8), 1719-1720.

doi:10.1093/bioinformatics/bti203

Qi, T., Guo, J., Peng, H., Liu, P., Kang, Z., & Guo, J. (2019). Host-induced gene silencing: A

powerful strategy to control diseases of wheat and barley. International Journal of

Molecular Sciences, 20(1), 206. doi:10.3390/ijms20010206

Rosinski, J. A., & Atchley, W. R. (1999). Molecular evolution of helix–turn–helix proteins.

Journal of Molecular Evolution, 49(3), 301-309. doi:10.1007/PL00006552

Rutter, B. D. (2012). Catch of the day: a yeast one-hybrid assay identifies a novel

DNA-binding domain in Phytophthora sojae (Master's thesis, Bowling Green State

University).

Schultz, J., Milpetz, F., Bork, P., & Ponting, C. P. (1998). SMART, a simple modular

architecture research tool: Identification of signaling domains. Proceedings of the

National Academy of Sciences of the United States of America, 95(11), 5857-5864.

doi:10.1073/pnas.95.11.5857

Sen, T. Z., Jernigan, R. L., Garnier, J., & Kloczkowski, A. (2005). GOR V server for protein

secondary structure prediction. Bioinformatics (Oxford, England), 21(11), 2787-2788.

doi:10.1093/bioinformatics/bti408

Sonnhammer, E. L., von Heijne, G., & Krogh, A. (1998). A hidden Markov model for

predicting transmembrane helices in protein sequences. Proceedings. International

Conference on Intelligent Systems for Molecular Biology, 6, 175-182.

Stajich, J. E., Harris, T., Brunk, B. P., Brestelli, J., Fischer, S., Harb, O. S., . . . Roos, D. S.

(2012). FungiDB: An integrated functional genomics database for fungi. Nucleic Acids

Research, 40(D1), D675-D681. doi:10.1093/nar/gkr918

Strausak, D., Howie, M. K., Firth, S. D., Schlicksupp, A., Pipkorn, R., Multhaup, G., &

Mercer, J. F. B. (2003). Kinetic analysis of the interaction of the copper chaperone

Atox1 with the metal binding sites of the Menkes protein. Journal of Biological

Chemistry, 278(23), 20821-20827. doi:10.1074/jbc.M212437200

Tian, M., Huitema, E., Cunha, L. d., Torto-Alalibo, T., & Kamoun, S. (2004). A Kazal-like

extracellular serine protease inhibitor from Phytophthora infestans targets the tomato

pathogenesis-related protease P69B. Journal of Biological Chemistry, 279(25),

26370-26377. doi:10.1074/jbc.M400941200

Tusnády, G. E., & Simon, I. (1998). Principles governing amino acid composition of integral

membrane proteins: Application to topology prediction. Journal of Molecular Biology,

283(2), 489-506. doi:10.1006/jmbi.1998.2107

Tusnády, G. E., & Simon, I. (2001). The HMMTOP transmembrane topology prediction

server. Bioinformatics (Oxford, England), 17(9), 849-850.

doi:10.1093/bioinformatics/17.9.849

Tyler, B. M., Tripathy, S., Zhang, X., Dehal, P., Jiang, R. H. Y., Aerts, A., . . . Boore, J. L.

(2006). Phytophthora genome sequences uncover evolutionary origins and

mechanisms of pathogenesis. Science, 313(5791), 1261-1266.

doi:10.1126/science.1128796

Vidalain, P., Boxem, M., Ge, H., Li, S., & Vidal, M. (2004). Increasing specificity in

high-throughput yeast two-hybrid experiments. Methods, 32(4), 363-370.

doi:10.1016/j.ymeth.2003.10.001

Xin, X., Wang, T., Liu, X., Sui, G., Jin, C., Yue, Y., . . . Guo, H. (2017). A yeast two-hybrid

assay reveals CMYA1 interacting proteins. Comptes Rendus - Biologies, 340(6-7),

314-323. doi:10.1016/j.crvi.2017.06.003

Ye, W., Wang, X., Tao, K., Lu, Y., Dai, T., Dong, S., . . . Wang, Y. (2011). Digital gene

expression profiling of the Phytophthora sojae transcriptome. Molecular

Plant-Microbe Interactions : MPMI, 24(12), 1530-1539.

doi:10.1094/MPMI-05-11-0106

Yu, C., Chen, Y., Lu, C., & Hwang, J. (2006). Prediction of protein subcellular localization.

Proteins: Structure, Function, and Bioinformatics, 64(3), 643-651.

doi:10.1002/prot.21018

Yu, C., Lin, C., & Hwang, J. (2004). Predicting subcellular localization of proteins for

Gram-negative bacteria by support vector machines based on n-peptide

compositions. Protein Science, 13(5), 1402-1406. doi:10.1110/ps.03479604

Zanta, M. A., Belguise-Valladier, P., & Behr, J. (1999). Gene delivery: A single nuclear

localization signal peptide is sufficient to carry DNA to the cell nucleus. Proceedings

of the National Academy of Sciences of the United States of America, 96(1), 91-96.

doi:10.1073/pnas.96.1.91

Zhang, Z., Schwartz, S., Wagner, L., & Miller, W. (2000). A greedy algorithm for aligning

DNA sequences. Journal of Computational Biology: A Journal of Computational

Molecular Cell Biology, 7(1-2), 203-214.

APPENDIX A. CONSENSUS SEQUENCES OF THE FOUR PREY CONTIGS OBTAINED FROM SEQUENCHER

Contig[0001] CTCAAAGATTAAGCCATGCATGTCTAAGTATAAACACTTTTGTACTGTGA AACTGCGAATGGCTCATTATATCAGTTATAGTCTACTCGATAGTACCTTA CTACTTGGATACCCGTAGTAATTCTAGAGCTAATACATGCATAAATACCC AACTGCTTGTCGGGCGGGTAGCATTTATTAGATTGAAACCAATGCAGTCT TCGGGCTGGTATTGTGTTGAGTCATAATAACTGTGCGGATCGCGCTTTTG CGCGATAAATCGATTGAGTTTCTGCCCTATCAGCTTTGGATGGTAGGATA TGGGCCTACCATGGCATTAACGGGTAACGGGGAATTAGGGTTTGATTCCG GAGAGGGAGCCTTAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGTAA ATTACCCAATCCTGACACAGGGAGGTAGTGACAATAAATAACAATG CTCTGGCTCTTCGAGTCGGGCAATTGGAATGAGAACAATTTAAATCCCTT AACGAGGATCAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTC CAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTT GGATTTCTGTTTTGGATGTCCGGTCCGCTCCCTCTGGGAGTGCGTACTTA TGGATGTTCGAGGCATTTTT:TGTGAGGCTGCCTTTCTGCCATTAAGTTG GTGGGTTGGTGGGCTTGCATCGTTTACTGTGAAA:A:AATTAGAGTGTTT AAAGCAGGCGTTTGCTCATTTGAATACATTAGCATGGAATAATAAGATAC GGCCTTGGTGGTCTATTTTGTTGGTTTGCACACCAGGGTAATGATTAATA GGGACAGTTGGGGGTATTCATATTTCAGCGTCAGAGGTGAAATTCTTGGA TCGCTGAAAGATGAGCTTAGGCGAAAGCATTTACCAAGGATGTTTTCATT AATCAAAAAAAAAAAA

Contig[0003] GTAAGCTCGTCTGGCGATGCGGGATGAACCGAACGCAAAGTTAAAGTGCC AAAATGCCCGCTCACCTAGATACCATAAAAGGTGTTGGTTCATTTAGACA GCAGGACGGTGGCCATGGAAGTCGGAATCCGCTAAGGAGTGTGTAACAAC TCACCTGCCGAATGAACTAGCCCTGAAAATGGATGGCGCTAAAGCGGGCT ACTTATACTTTGCCGTTAGCGCGATGGCATATGTAATGTACGCGTTAATG AGTAGGAGGGCGTGAGGGTCGTGAAGAAGCTCGTGGCGTGAGCCTGAGTGAAAC GGCCTTTAGTGCAGATCTTGGTGGTAGTAGCAAATATTCAAATGCG AACTTTGAAGACTGAAGTGGAGAAAGGTTCCGTGTGATCAGTAATTGGAC ACGGGTGAGTCGATCCTAAGAGATAGGATAATTCCGTTTGAAAGCTGGGC TGTTTGGGGCGTGGGGTTCGTCCTCGCGTTTTGTATGCAGCCTCGTGTAT CGAAAGGGAATCAGGTTAATATTCCTGAACCGAGACGTGGATAATGTGTG GCAACACAACCGACCCAGGAGACGCTAGCGTGTGCCCAGGGAAGAGTTCT CTTTTCTTTTTAACAAGCATGTCCTGCCTTGGAAACGGATTATCCGGCGA TAAGGAGGACGCCTTGGCAAAGCACTTTACTTTTTGAAGTGTCCTGTGCG CCCACGACAGCCCTTGAAAATCCTGG:CTAGAGATTTATTCGCACCCTCG TTCGTACTCATAACCGCATCAGGTCTCCAAGGTTAGCAGCCTCTAGTTGA TAGAAGAATGTAGATAA:GGGAA:GT:CGGCAAAATAGATCCGTAACTTC GGAAAAAAAAAAAAAAA

Contig[0007] GAGGAAAAGAAACTAACAAGGATTCCCCTAGTAACGGCGAGTGAAGCGGG AAGAGCTCAAGCTTAAAATCTCCGTGCAAGTTTTGCGCGGCGAATTGTAG TCTATAGAGGCGTGGTCAGCGTGGGCGCTTGGGGCAAGTTCCTTGGAGGA GGACAGCATGGAGGGTGATACTCCCGTTCATCCCTGAGTGGCTCGTGCGT ACGACCCGTGTTCTTTGAGTCGCGTTGTTTGGGAATGCAGCGCAAAGTAG GTGGTAAATTCCATCTAAAGCTAAATATTGGTGCGAGACCGATAGCGAAC AAGTACCGTGAGGGAAAGATGAAAAGAACTTTAAAAAAAAAAAA

Contig[0074] GACGGTGTTGACACAATGTGATTTCTGCCCAGTGCTCTGAATGTCAAAGT GAAGAGATTCAACCAAGCGCGGGTAAACGGCGGGAGTAACTATGACTCTC TTAAGGTAGCCAAATGCCTCGTCATCTAACTGTGACGCGCATGAATGGAT TAATGAGATTCCCACTGTCCCTATCTACCGTCTAGCGAACCCACAGCCAA GGGAACGGGCTTGGAATAATCAGCGGGGAAAGAAGACCCTGTTGAGCTTG ACTCTAGTCTGACGTTGTGAAATGACTTGGGAGGTGTAGAATAAGTGGGA GGCTCTGCCGTCCGTGAAATACCACTACTCTCAATGTTGTTTTACTTATA CCATGATGTAGCATCTGGGTGGCCTCGCGGTCGCTCAGTTCTAGTTGTAA TCTCGGGCTTTTGTCTGGGTGAGCTACGTGGAAGACATCGTCAGGTGGGG AGTTTGGCTGGGGCGGCACATCTGTTAAATGATAACACAGGTGTCCTAAG GTGAGCTCAATGAGAACAGAAATCTCATGTAGAACAAAAGGGTAAAAGCT CACTTGATTTTGATTTTCAGTATGAATACAAACCGTGAAAGCGTGGCCTA TCGATCCTTTAGTTCTTTAGAATTTTAAGCTAGAGGTGTCAGAAAAGTTA CCACAGGGATAACTGGCTTGTGGCAGCCAAGCGTCCATAGCGACGTTGCT TTTTGATTCTTCGATGTCGGCTCTTCCTATCATTGCGAAGTAGAACTCGC CAATTGTTGGATTGTTCACCCACTAATAGGGAACGTGAGCTGGGTTTAGA CCGTCGTGAGACAGGTTAGTTTTACCCTACTGATGAGTTCGTTGTCTAAA CAGTAATCCAACCCAGTACGAGAGGAACCGTTGGTTCAGATAATTGGTAA CTGCGGTTAGCTGAAAAGCTAGTGCCGCCAAGCTACCATCTGTAGGATTA TGGCTGAAC:CCTCTAAGTCAGAATCCATGCTGGAATAGACGATATCACC TTTCCGA:TGTGCCGCGAATAGCGATAG:GTGTCTTTCTGGGACCCANCA TCATA:CAA:TGCAACGCACTCCGCTGCC:GTCAGTG:CTGGCG:NGAA: TTAGGCTG:AC:TT:GTAATTYCAAAAATA:TGTGGGGAANN