Functional Genomic Screening of Nematocida parisii host-exposed proteins

By

Eashwar Mohan

A thesis submitted in conformity with the requirements

for the degree of Master of Science

Department of Molecular Genetics

University of Toronto

© Copyright by Eashwar Mohan 2021

Functional Genomic Screening of Nematocida parisii host-exposed proteins

Eashwar Mohan

Master of Science

Department of Molecular Genetics

University of Toronto

2021

Abstract

Microsporidia are a divergent group of obligate intracellular pathogens that remain relatively poorly understood. The intestine-infecting microsporidian Nematocida parisii has been shown to secrete a diverse arsenal of proteins into host cells upon infection, termed "host-exposed" proteins. These secreted proteins may serve as effector proteins, and the model organism Saccharomyces cerevisiae has proven to be a reliable system for studying bacterial effectors. This thesis uses yeast to study microsporidian host-exposed proteins. I generated a pipeline to clone N. parisii genes into yeast expression vectors and demonstrated its reliability by generating a collection of 97 N. parisii host-exposed genes along with additional controls. Screening this gene collection identified 23 toxic genes, demonstrating that this strategy may further our understanding of microsporidian infection biology by identifying novel microsporidian effector proteins and revealing more about the roles of these proteins and their expression in yeast.


Table of Contents

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1. INTRODUCTION
1.1. Microsporidia
1.1.1. Microsporidia History, Evolution and Host Range
1.1.2. Microsporidia Genetic and Cellular Biology
1.1.3. Microsporidian Life Cycle
1.1.4. Infection and Clinical Information
1.1.5. Nematocida parisii
1.2. Pathogen Effector Proteins
1.2.1. Pathogen Effector Proteins
1.2.2. Microsporidia Host-Exposed Proteins
1.3. Leveraging Yeast to Study Effector Proteins
1.3.1. Usage and Applications of Yeast to Study Bacterial Effector Proteins
1.4. High-throughput Cloning Techniques
1.5. Thesis Rationale
2. MATERIALS AND METHODS
2.1. N. parisii Gene Amplification and Cloning
2.1.1. N. parisii Spore Prep and Genomic DNA Extraction
2.1.2. Two-Step PCR Primer Design
2.1.3. Two-Step PCR Protocol
2.1.4. Gateway Cloning Protocols
2.2. Bacterial/Yeast Strains and Culturing
2.2.1. Bacterial and Yeast Strains
2.2.2. Media Formulations
2.2.3. Transformation Protocols and Strain Generation
2.3. Yeast Growth Assays
2.3.1. Spot Dilution Assays
2.3.2. Arrayed Liquid Growth Assays
2.4. LASSO Protocol
2.4.1. LASSO Probe Synthesis
2.4.2. Blunt-end Intramolecular Ligation
2.4.3. Large Volume Intramolecular Ligation
2.4.4. OPool Probe Capture
3. RESULTS
3.1. Cloning and Screening of N. parisii Host-Exposed Factors
3.1.1. Host-Exposed Gene Pilot and Development of the Two-Step PCR System
3.1.2. Yeast Growth Pilot Screen of the 3 N. parisii genes
3.2. N. parisii Host-Exposed Protein Screening List and Composition
3.2.1. Choosing Genes and Designing Primers
3.2.2. Functional Genomic Screening of N. parisii host-exposed genes
3.3. Future Directions
3.4. LASSO Cloning
3.4.1. LASSO Cloning Procedure
3.4.2. Implementation and Progress Using the Published LASSO Protocol
3.5. Modifications to the LASSO Protocol
3.5.1. Blunt-End Ligation
3.5.2. Large Volume Ligation Reactions
3.5.3. Oligo Pool Mediated Gene Capture
3.6. Future Directions
4. DISCUSSION
4.1. Yeast as a tool to study microsporidian effector proteins
4.2. High-throughput LASSO-based gene capture remains a challenge
4.3. Thesis Summary
5. REFERENCES
6. APPENDICES


List of Tables

Table 1. Two-Step PCR reaction components and parameters

Table 2. List of Bacteria and Yeast source strains

Table 3. List of source Plasmids

Table 4. Gene Cloning Pilot Summary Table

Table 5. Toxic N. parisii Genes Summary

Table 6. Blunt-end Ligation Protocols and Results


List of Figures

Figure 1. Diagram of a microsporidian spore

Figure 2. Microsporidian Life Cycle

Figure 3. PCR Results for HE Pilot Screen

Figure 4. Two-Step PCR Primer Diagram

Figure 5. Two-Step PCR Development Results

Figure 6. Streaking Assay of Pilot Gene Transgenic Yeast Lines

Figure 7. Spot Dilution Assay of Pilot Gene Transgenic Yeast Lines

Figure 8. Cloning and Screening Pipeline for N. parisii HE Genes

Figure 9. Screen Control Gene Growth Curves

Figure 10. Normalized Growth Values for N. parisii Genes

Figure 11. LASSO Probe Creation and Gene Capture Diagram

Figure 12. Blunt-end Ligation Results

Figure 13. Large Volume Ligation Results

Figure 14. Inverse PCR Results

Figure 15. N. parisii Opool LASSO Probe Capture Results

Figure 16. K12 Opool LASSO Probe Capture Results


List of Abbreviations

Amp: Ampicillin
APE: A Plasmid Editor
APX: Ascorbate peroxidase
ATP: Adenosine triphosphate
bp: Base pairs
Carb: Carbenicillin
cDNA: Complementary DNA
Gal: Galactose
gDNA: Genomic DNA
Glu: Glucose
HE: Host-Exposed
HK: Hexokinase
Kan: Kanamycin
LASSO: Long Adapter Single Strand Oligonucleotide
Mb: Megabases
PCR: Polymerase Chain Reaction
PNK: Polynucleotide kinase
ProK: Proteinase K
RNA-Seq: RNA Sequencing
RTH: Round-the-horn site-directed mutagenesis
SP: Signal Peptide
Tm: Melting Temperature
TMD: Transmembrane Domain
tRNA: Transfer RNA
Ura: Uracil


Introduction:

1.1: Microsporidia

Thousands of pathogenic microbes cause great harm to human health and wellbeing. Pathogenic lifestyles have evolved in many distinct evolutionary lineages and phylogenetic taxa1, and many pathogens have adapted to an intracellular niche2–4. Intracellular pathogens invade host cells and use them as an environment in which to replicate. They can be facultative or obligate, with the latter having an absolute requirement for a host cell to reproduce2,3,5–7. Microsporidia are an interesting class of obligate intracellular pathogens with a profound effect on human health and interests.

1.1.1: Microsporidia History, Evolution and Host Range

Microsporidia were first investigated by Louis Pasteur, who was studying a "pepper disease" spreading among silkworms about 150 years ago8,9. The origins and classification of microsporidia would remain a topic of debate for many years. In 1867, the causative agent of this disease was determined to be a microscopic parasite, which was named Nosema bombycis and classified as a schizomycete fungus9. Later studies placed Nosema in its own new group, Microsporidia, although the taxonomic classification of this group would change many times. Their highly unusual method of infection led to the removal of microsporidia from the fungi and their placement in the Sporozoa, a group of protists characterized by spore formation9. In the 1980s, it was proposed that some eukaryotic lineages arose before the endosymbiotic acquisition of the mitochondrion, and this theory shaped the understanding of microsporidian evolution. Because microsporidia lack mitochondria, and because molecular evidence such as their short, almost prokaryote-like ribosomal RNAs pointed the same way, microsporidia came to be viewed as early-diverging eukaryotes9. Phylogenetic analyses of other microsporidian genes later cast doubt on this theory. Microsporidian tubulin genes are highly similar to those of fungi, and despite lacking mitochondria, microsporidia retain genes of mitochondrial origin. These findings placed microsporidia within the fungi; although their exact relationships to other fungal lineages are still being resolved, microsporidia are now considered a highly divergent member of the fungi9.


Currently, over 1400 microsporidian species have been identified, divided into around 200 genera8,9. Microsporidia have evolved to be successful animal pathogens: every vertebrate group and almost all invertebrate groups serve as hosts for at least one microsporidian species8–10. Additionally, certain microsporidian species can infect protists9. There are 16 known microsporidian species that can infect humans, causing the disease microsporidiosis10,11. The first case of human infection was recorded in 1959, and microsporidia gained clinical relevance with the emergence of HIV, as they predominantly cause disease in immunocompromised patients10. Microsporidia are highly variable as a group: some species have narrow host specificity, while others have broad host ranges or can even infect both vertebrates and invertebrates, such as Edhazardia aedis, which can infect mosquitoes and humans11,12. Microsporidian infection of invertebrates has also emerged as a problem for agriculturally important species such as bees, in which Nosema ceranae has been tied to colony collapse disorder13,14. Microsporidia also vary in other traits, such as genome size, which ranges from 2.3 megabases (Mb), the smallest known eukaryotic genome, to 51.3 Mb12. Because microsporidia infect humans and other important animals, they have a considerable impact on human health and wellbeing.

1.1.2: Microsporidia Genetic and Cellular Biology

Despite their evolutionary relationship to other fungi, microsporidia have adapted to a very different ecological niche and have become highly distinct from other fungi. The most striking difference is in their genomes, which, aside from varying in size, have undergone large-scale reduction and gene compaction15,16. Microsporidia have lost many genes that are required for free-living, extracellular growth but dispensable for intracellular parasitism, while retaining core cellular processes such as transcription and translation9,12,15. Comparisons of homologous genes in microsporidia and the yeast Saccharomyces cerevisiae show that microsporidian genes are often shorter, with an average 15% reduction in length in Encephalitozoon cuniculi15. Accompanying this substantial gene loss, microsporidia have lost many eukaryotic metabolic pathways. The absence of mitochondria imposes strong metabolic limitations; however, microsporidia have retained some mitochondrial processes, such as mitochondrial iron-sulfur cluster assembly17. Microsporidia also retain some metabolic enzymes that are likely indispensable or have been repurposed for other functions. For example, in glucose metabolism microsporidia have lost nearly all mitochondrially associated components but retain the glycolytic enzymes, some of which have acquired a secretion signal that directs them into the host. Microsporidia rely on their hosts for many metabolites, such as ATP, and possess various transporters to import them; however, many gaps in our understanding of their metabolism remain18. Gene reduction is also seen in the structures of individual proteins, some of which have lost large sections or entire domains that are presumably dispensable for pathogenic growth. One example is the aminoacyl-tRNA synthetases, which have lost their regulatory domains and are more error-prone as a result19. In addition to these losses, microsporidia have modified some proteins to compensate for them. Microsporidian ribosomal RNAs are shortened and closely resemble their prokaryotic counterparts; this shortening eliminated many binding sites for eukaryotic ribosomal proteins, but these proteins have been altered so that they no longer need to bind those rRNA regions and instead bind to each other to form a functional ribosome20. Although many microsporidian genes are conserved with other eukaryotes, around 50% of the microsporidian genome is unannotated15. Microsporidian genes are also difficult to study in their native context because microsporidia cannot be cultured outside a host cell and are not genetically tractable. The variability of microsporidia as a group is further reflected in how far different species have diverged from other eukaryotes. For example, some microsporidian species have lost introns and splicing machinery that are retained in other species21,22, and some species are intranuclear pathogens12,22.

Despite this variability, microsporidia share morphological and genetic traits that identify them as a taxonomic group. Conserved morphological features include the polar tube10, a component of the microsporidian spore that is used to infect host cells, and 800 proteins are conserved across all microsporidia12.

Consistent with their reduced genomes, microsporidia have reduced morphological features compared to other eukaryotes. Microsporidia pass through several stages in their life cycle, but the most recognizable is the spore. Microsporidian spores range from 1 μm to 40 μm in size, depending on the species9. The spore has a relatively simple morphology due to its reduced genome and metabolic inactivity (Figure 1). The exospore wall is composed of a granular, proteinaceous layer, while the endospore is primarily made of chitin. The polaroplasts are collections of stacked membranes, which are more tightly stacked in the lamellar polaroplast than in the vesicular polaroplast. The posterior vacuole is an expandable vacuole located at the posterior end of the spore; both the polaroplast and the posterior vacuole play roles in the infection process9,10. Depending on the species, the spore contains either a single nucleus or a diplokaryon (a pair of closely apposed nuclei). The nucleus is contained in the sporoplasm, the internal material of the spore, which also contains microsporidian ribosomes9,10. The most distinctive feature of the spore is the polar tube, which is essential for the microsporidian infection cycle. The polar tube is a long, coiled, hollow structure that ranges from 4 to 500 μm in length in different species9,10,23. The polar tube is used to invade host cells and deposit the sporoplasm and nucleus, which then proceed through the rest of the infection cycle to produce progeny spores (see Section 1.1.3). Spores can also vary within a species, as some species have been noted to produce multiple types of spores of varying sizes23.

Figure 1. An illustration of the microsporidian spore. Major structures discussed in the text are labeled. Different species can vary in these structures. Created with BioRender.com.

1.1.3: Microsporidian Life Cycle

Microsporidia rely on their hosts and can only reproduce after infecting a host cell. The spore is the most distinguishable form of microsporidia and the infectious stage of its life cycle9,10. Many microsporidian infections are localized to the gut and intestinal tract, but other species have different tissue specificities, with muscles and eyes also being common sites of infection9,10,21,23. The infection cycle begins when a spore is ingested by a suitable animal host. Once the spore reaches an appropriate site of infection, it fires its polar tube to infect a host cell. Polar tube firing is driven by an influx of water into the spore, which increases osmotic pressure and causes the posterior vacuole and polaroplasts to swell9,10. The polar tube is hollow, and when fired it everts from the spore, starting at the anchoring disk. Firing is extremely rapid: it is complete in under two seconds, and the tube can extend at around 100 µm/s9,24. The swelling components then push the sporoplasm and nucleus through the polar tube and into the host cell9,10,24. Whether the polar tube penetrates the host cell remains an open question; there is evidence for both mechanisms, possibly with different species using one or the other. In some microsporidia, the polar tube is thought to penetrate the host cell, whereas in others it is thought to press against the host cell membrane, which invaginates to take in the microsporidian nucleus9,10,23–25. The trigger for polar tube firing is also not fully understood. Several mechanisms, such as pH shifts and trehalose signaling, have been proposed, but whether these triggers are universal and how physiologically relevant they are remains under investigation9,10,24. Once inside the host cell, the sporoplasm develops into a meront. During the meront stage, the microsporidian is closely associated with its host cell, and this stage is the most variable among microsporidian species. Meronts are commonly found in the host cell cytoplasm, although some microsporidian meronts reside in the host cell nucleus9. Host cell components, such as the endoplasmic reticulum and mitochondria, can localize to the meront9,25,26. The meront can exert a strong influence on its host cell, and this influence varies between microsporidian species. These changes range from metabolic alterations25–27 to whole-cell changes, such as hypertrophy into large structures known as xenomas9 or inhibition of apoptosis28. The meront then produces progeny spores, either directly in the host cytoplasm or within a sporophorous vacuole, and the spores exit the host cell to infect other cells in the same animal or a new individual9 (Figure 2).

1.1.4: Infection and Clinical Information

There are 16 known microsporidian species that can infect humans, causing the disease microsporidiosis10,11. The first case of microsporidian infection in a human host was recorded in 1959, in which a microsporidian infection was causing seizures in a 9-year-old child10,29. Microsporidiosis is found across the globe, and because different microsporidian species infect different tissues, the resulting symptoms vary10,29. The most common site of infection is the intestinal tract, with diarrhea as the main symptom. Other common sites of infection include the muscles and eyes10,29. With the emergence of HIV, microsporidian infections became more prevalent because of the potential of microsporidia to act as opportunistic pathogens10,30. Microsporidian infections can occur in both immunocompetent and immunocompromised individuals10,30. In immunocompetent individuals, infections often go unnoticed because they are asymptomatic or self-limiting30. Infection by an intestinal microsporidian species may cause acute diarrhea that does not warrant diagnostic testing; as such, the true extent of infection in immunocompetent populations is unknown. These infections are often a cause of traveler's sickness, and immunocompetent people have been observed to shed microsporidia in their feces, indicating at least limited proliferation of microsporidia within such hosts30. Immune responses to microsporidia are T cell-mediated and depend on both interleukin-12 and interferon gamma29. Microsporidian infections in immunocompromised patients can be chronic and sometimes lethal, and these infections are the focus of medical microsporidia research. Infections have been recorded in patients with multiple causes of immunodeficiency, such as HIV/AIDS and immunosuppressive drug therapy10,11,29,31. Symptoms can differ drastically depending on the species involved but commonly include chronic diarrhea and weight loss10,29. Sites of infection also vary with the species; common locations are the eyes, muscles and small intestine, although many species can cause systemic infections10. Microsporidiosis was first diagnosed using electron microscopy because of the small size of these pathogens, but this process was arduous. More recently, staining techniques have enabled detection of microsporidia by light microscopy, and PCR-based diagnostics combined with other molecular approaches allow identification of microsporidia at the subspecies level32. Treatment of infections in immunocompromised patients involves drug regimens such as albendazole, fumagillin or other compounds, but no treatment is universal because of lack of efficacy against some species, host toxicity and the severity of the infection10.

Figure 2. An illustration of the microsporidian infection cycle. The example used in this diagram is Nematocida parisii infection of Caenorhabditis elegans. Infections by different microsporidian species in different hosts can depart from this scheme; however, the general steps are well conserved. Credit to Dr. Alexandra Willis for creating this image.

1.1.5: Nematocida parisii

Microsporidia are known to infect virtually all animal groups8,10, and several species have been observed to naturally infect the nematode Caenorhabditis elegans. C. elegans is a widely used model organism that has more recently been used to study host-pathogen interactions. In 2008, a natural intestinal microsporidian pathogen of C. elegans was identified and named Nematocida parisii; this species has since served as a useful model for studying microsporidian intestinal infections in C. elegans14,33. N. parisii infection can lead to death of the host nematode; however, the host can tolerate the pathogen burden for a substantial time, and over 100,000 spores are produced from a single infected worm14,33. The use of N. parisii to study intestinal infections has revealed many insights into host-pathogen interactions. N. parisii does not induce the innate immune response typical of bacterial or other fungal infections; instead, it induces a response shared with viral pathogens such as Orsay virus11,33. N. parisii induces various changes in host intestinal cells upon infection. Notable changes include remodeling of the host cell cytoskeleton to facilitate non-lytic exit of progeny spores from the host cell14,34, as well as fusion of host cells into a syncytium that can aid the cell-to-cell spread of N. parisii35.

N. parisii has a genome size of 4.1 Mb with 2661 coding genes21, a smaller gene complement than even that of the bacterium Escherichia coli, which has a 4.6 Mb genome with 4401 genes36,37. The average N. parisii gene is 1104 base pairs (bp) long, and the large majority of gene predictions are supported by evidence such as RNA-Seq or protein homology21. Similar to other microsporidian genomes, the N. parisii genome is compact, with only 28% of the genome predicted to be non-coding and an average distance between genes of around 400 bp. A distinctive feature of N. parisii and related microsporidia is the absence of the introns and splicing machinery commonly found in eukaryotes21. As with other microsporidia, genome compaction has left N. parisii reliant on the host cell and host factors to complete its life cycle; for example, in hosts deficient for factors such as the GTPase RAB-11, which is essential for spore exit, the pathogen's life cycle is impaired34,38. Several strains of N. parisii exhibit genomic differences, including different genome sizes and numbers of genes21. Regions of the N. parisii genome have undergone rare loss-of-heterozygosity events; combined with the diploid nature of the genome, this hints at possible sexual reproduction in N. parisii, although no sexual mating cycle has yet been observed21,34.
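The genome figures above are internally consistent, which a quick back-of-envelope calculation can illustrate. The sketch below uses a hypothetical helper, `coding_summary`, that is not part of the thesis pipeline; it assumes genes do not overlap, that every base outside a gene counts as non-coding, and that the number of intergenic gaps is roughly equal to the number of genes.

```python
# Back-of-envelope check of the N. parisii genome statistics quoted above.
# Inputs are the values stated in the text. Assumptions (not from the thesis):
# genes do not overlap, all bases outside genes are non-coding, and the number
# of intergenic gaps is approximately the number of genes.

GENOME_BP = 4.1e6      # N. parisii genome size (4.1 Mb)
GENE_COUNT = 2661      # predicted coding genes
MEAN_GENE_BP = 1104    # average gene length (bp)


def coding_summary(genome_bp: float, gene_count: int, mean_gene_bp: float) -> dict:
    """Return coding fraction, non-coding fraction and mean intergenic gap (bp)."""
    coding_bp = gene_count * mean_gene_bp
    noncoding_bp = genome_bp - coding_bp
    return {
        "coding_fraction": coding_bp / genome_bp,
        "noncoding_fraction": noncoding_bp / genome_bp,
        # Spread the non-coding sequence evenly over one gap per gene.
        "mean_intergenic_bp": noncoding_bp / gene_count,
    }


if __name__ == "__main__":
    summary = coding_summary(GENOME_BP, GENE_COUNT, MEAN_GENE_BP)
    for key, value in summary.items():
        print(f"{key}: {value:.2f}")
```

Run as written, it reports a non-coding fraction of about 0.28 and a mean gap of roughly 437 bp, in line with the 28% and "around 400 bp" stated in the text.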

1.2: Pathogen Effector Proteins

Microsporidia, like most pathogens, depend on their hosts and use a variety of mechanisms to influence them. One common mechanism seen across pathogens of diverse phyla is the use of effector proteins that modulate the host to improve the fitness of the pathogen39–41. Effector proteins are found in many different bacterial pathogens and are a prominent topic of study because of their roles in pathogenesis and disease. These proteins are secreted from the pathogen into the host and are important for pathogen proliferation, but the roles of secreted proteins in microsporidia are not as well studied as those of their bacterial counterparts11,21,42.


1.2.1: Pathogen Effector Proteins

Effector proteins are frequently studied in the context of bacterial pathogens and their associated diseases. The term "effector protein" is used in many areas of study outside pathogenic organisms; in this context, an effector protein is a secreted protein that a pathogen uses to interact with its host. Bacterial effector proteins are defined as proteins secreted into the host by bacterial secretion systems, and they can also contain domains with characteristics similar to those of eukaryotic proteins43. These features can be used to predict effector proteins in a given bacterial genome, adjusting for differences between bacterial secretion systems43. Effector proteins are found in many taxa of pathogenic bacteria and their numbers vary, with Legionella pneumophila possessing over 300 effector proteins and the Legionella genus over 18,000 in total43,44. The effector proteins in a bacterium's arsenal can have a wide range of functions that help subvert host processes. A prime example is Shigella, an intracellular pathogen whose effectors cause rapid cell death in macrophages to evade the host immune system but prevent cell death in epithelial cells to maintain a suitable reproductive environment41. Effector proteins are not limited to intracellular bacteria: enteropathogenic E. coli delivers effectors into host epithelial cells to modify the cytoskeleton and promote bacterial adhesion, while Yersinia species deliver effector proteins that alter cytoskeletal dynamics to avoid phagocytosis by host cells45. The large number of effector proteins in a pathogenic bacterium's repertoire, many of which have redundant or additive effects, can complicate analysis of their roles during infection45. In addition, many effectors regulate other effectors, as seen with Shigella OspD2, which regulates the Shigella type III secretion system41, and Legionella meta-effectors, which either suppress or amplify the activity of other secreted effector proteins46. Eukaryotic pathogens also employ effector proteins to a similar end. Apicomplexans such as Toxoplasma gondii have secretory organelles that release proteins into the host that are essential for infection and play a role in host manipulation. Examples of such eukaryotic effectors are GRA15, which modulates the host cell's immune response through NF-κB signaling, and T. gondii Inhibitor of STAT Transcription (TgIST), which blocks type I interferon signaling3,47,48. Plasmodium falciparum, a causative agent of malaria, uses effector proteins such as P. falciparum erythrocyte membrane protein 1 (PfEMP1) to make host erythrocytes more rigid and increase parasite adhesion, and ring-infected erythrocyte surface antigen (RESA) to avoid trafficking of infected red blood cells to the spleen, which is detrimental to the parasite39,49.

1.2.2: Microsporidia Host-Exposed Proteins

The microsporidian genome, in addition to its characteristic gene losses and compaction, encodes many proteins that have been modified to serve new purposes compared to their functions in other eukaryotes. Microsporidian genome analysis has shown that several proteins have been modified through the addition of secretion signals. An interesting example of this phenomenon is N. parisii hexokinase (HK), a key glycolytic enzyme that catalyzes the phosphorylation of glucose to glucose-6-phosphate, which has a signal peptide while the other glycolytic enzymes do not21. N. parisii HK was found to be secreted in a heterologous yeast-based assay and during infection of the C. elegans intestine, demonstrating that its secretion signal is functional21,42. HK also carries secretion signals in other microsporidian genomes, even those distantly related to N. parisii. HK from Nosema bombycis and N. ceranae was found to phosphorylate host glucose, confirming the catalytic activity of the secreted protein27, while HK from Trachipleistophora hominis was found to localize to the plaque matrix (the interface between the meront and its host cell) and to influence glucose metabolism in host cells26. HK from N. bombycis and Antonospora locustae was found to localize to the host nucleus11,50,51, suggesting a role in host cell transcriptional regulation, and downregulation of N. bombycis HK impaired microsporidian proliferation50. Additionally, serine protease inhibitors (serpins), which are found in many eukaryotic organisms, are co-opted as effector proteins by N. bombycis, which uses them to inhibit melanization, an important innate immune response of insects, in its silkworm hosts52.

Microsporidia possess many proteins that are secreted to the host but are not as well studied as hexokinase. Many such proteins were confirmed to be exposed to the host during infection using a biotin labelling system42. Reinke et al. used the enzyme ascorbate peroxidase (APX), which biotinylates nearby proteins when biotin-phenol and hydrogen peroxide are added, to tag proteins in the nucleus and cytoplasm of C. elegans intestinal cells infected by N. parisii. Biotinylated proteins were identified by mass spectrometry, and after C. elegans proteins were removed, the remaining N. parisii proteins were classified as host-exposed42. Seventy-two host-exposed proteins were discovered for N. parisii, and compared with the rest of the genome these proteins were enriched for signal peptides and transmembrane domains, with over 75% of them carrying at least one of these features. Furthermore, the host-exposed proteins were found to be members of large gene families. In addition to supporting the function of signal peptides in microsporidia, these enriched features were used to predict the remaining host-exposed proteins in the N. parisii genome and in other available microsporidian genomes. The predicted number of host-exposed proteins varies between microsporidia, ranging from 6% to 32% of the genes in a genome, with N. parisii encoding 713 such proteins (27% of the genome). The large gene families that include these host-exposed proteins are conserved within, and specific to, particular clades, as N. parisii shares most of these gene families with other Nematocida species but not with other microsporidian clades42. These host-exposed proteins could act as microsporidian effector proteins that, like hexokinase, are secreted into the host cell to manipulate it and bolster pathogen proliferation. Determining their roles is challenging because only 7% of these proteins have a predicted Pfam domain, which limits homology-based comparisons to other eukaryotic proteins. Uncovering the functions of these proteins would reveal a great deal about microsporidian infection, as secreted proteins are important for pathogenicity in bacteria41 and microsporidia50. Some of these proteins may be involved in the changes seen in host cells upon microsporidian infection, such as the syncytium formation seen with N. parisii35, and studying them could reveal the mechanisms and patterns underlying microsporidian host-pathogen interactions.
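As a quick consistency check, the 27% figure can be recovered from the host-exposed protein count above and the gene count given in Section 1.1.5. This hypothetical snippet assumes both numbers refer to the same N. parisii genome assembly; because strains differ in gene number, the percentage should be treated as approximate.

```python
# Percentage of N. parisii genes predicted to encode host-exposed proteins.
# Both inputs are taken from the text; the assumption that they describe the
# same genome assembly is ours, not the thesis's.

NP_PREDICTED_GENES = 2661   # N. parisii coding genes (Section 1.1.5)
NP_HOST_EXPOSED = 713       # predicted N. parisii host-exposed proteins


def percent_of_genome(host_exposed: int, total_genes: int) -> float:
    """Return the share of a genome's predicted genes encoding host-exposed proteins, in percent."""
    return 100.0 * host_exposed / total_genes


pct = percent_of_genome(NP_HOST_EXPOSED, NP_PREDICTED_GENES)
print(f"{pct:.1f}% of N. parisii genes")
```

The result, about 26.8%, rounds to the 27% quoted in the text.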

1.3: Leveraging Yeast to Study Effector Proteins

The budding yeast S. cerevisiae is a routinely studied model organism that has proved very effective for characterizing effector proteins. Bacterial effector proteins target eukaryotic processes, and because these processes are conserved across eukaryotic cells53, yeast has emerged as a powerful tool to analyze effector proteins.

1.3.1: Usage and Applications of Yeast to Study Bacterial Effector Proteins

S. cerevisiae is a commonly used eukaryotic model organism and was the first eukaryote to have its genome sequenced53,54. Yeast offers many advantages as a model organism, such as a short generation time, ease of genetic manipulation, and conservation with other eukaryotes. Yeast has been a workhorse for studying conserved eukaryotic processes for more than 50 years45. Bacterial effector proteins have evolved to target important and well-conserved eukaryotic processes that are similar between the pathogen’s native host and yeast45.


Because effector targets are conserved, biology gleaned from a yeast-based study of an effector generally reflects how the effector functions in its native context, and effector functions are often conserved between the pathogen’s native host and the yeast model. Yeast can be used to determine the cellular localization of an effector and the proteins it interacts with, and yeast deletion strains can be used to infer the function of an effector through its genetic interactions55,56. Additionally, the versatility of yeast can be leveraged to develop new assays that identify the specific processes perturbed by effector proteins, such as intracellular trafficking40,57.

There are many benefits to using yeast to understand bacterial effector proteins. One advantage is its ease of use compared with other methods for examining effector proteins. Yeast are simple and inexpensive to culture, while allowing for many genetic and biochemical assays to elucidate the function of a given protein45,58. Many pathogens have complex interactions with their hosts, and the manipulations needed to study effector function in the native system can be complicated; switching to yeast offers a simpler system that also allows a great deal of flexibility. A wide array of informative yeast assays only requires a cloned copy of the effector protein of interest, providing a strong resource for studying effector proteins from pathogens that are difficult to culture or not genetically tractable45. Yeast also provides a powerful tool to identify potential effector proteins among uncharacterized proteins. An example is the expression of secreted and non-secreted proteins from Shigella flexneri in yeast: the relative growth of each expression strain was measured, and a higher proportion of secreted proteins than non-secreted proteins caused a growth defect, indicating that yeast growth inhibition can help identify effector proteins45,57,59,60.

1.4 High-throughput Cloning Techniques

Gene cloning is an essential step for many types of research, and many cloning methods are available, ranging from traditional restriction enzyme digestion and ligation to recombination-based methods. Owing to rapid advances in genome sequencing, sequencing the genome of a given organism is no longer a heavy burden, and thousands of sequenced genomes are readily available61. A sequenced genome can reveal many insights into an organism’s biology, such as the identities of its genes, but it has limitations. The main limitation is that the sequence of a gene may not indicate its function, especially for divergent or poorly understood organisms. One solution is high-throughput functional genomics, which can screen thousands of genes and reveal the functions of entire genomes at once. Although functional genomics is effective, its major bottleneck is the cloning of target genes. High-throughput and scalable cloning methods such as Gateway or In-Fusion cloning are available, but capturing and cloning each gene can still be demanding. Because reading (sequencing) DNA is relatively cheap compared with writing (synthesizing) it, there is a large gap between the number of available DNA sequences and the number of genes with known functions. Developing high-throughput gene capture and cloning methods would strengthen functional genomics, allowing faster characterization of genes with unknown functions.

1.5: Thesis Rationale

N. parisii possesses over 700 proteins that are predicted to be exposed to the host cell upon infection42. Some of these proteins have been shown to affect host cells in other microsporidia species11,26,27,50,51; however, most of these host-exposed proteins have no known function42. N. parisii can induce various changes in the host cell, such as fusion of neighbouring intestinal cells35, and these changes may be caused or mediated by some of these host-exposed proteins. Host-exposed proteins that act in the host cell could be novel microsporidian effector proteins, an aspect of microsporidian pathology that remains poorly studied.

I propose that these previously identified host-exposed proteins are microsporidian effector proteins that microsporidia use to modulate the host cell environment. To determine which host-exposed proteins are potential effectors, I will use yeast and the techniques developed to study bacterial effector proteins. I hypothesize that these methods, such as measuring growth inhibition, will be applicable in the microsporidian context. I will express a subset of N. parisii proteins in yeast and measure yeast growth defects, which will identify potential N. parisii effector proteins. Despite the differences between C. elegans and yeast, N. parisii proteins that target conserved eukaryotic processes should disrupt yeast growth because those processes are conserved between the two cell types. This research has the potential to identify novel microsporidian effector proteins that may have important roles in microsporidian biology. It would also help establish yeast as a viable model for studying the functions of microsporidian proteins, enabling future researchers to leverage yeast and its wide range of available assays to characterize proteins of interest. Finally, I will attempt to develop and use new gene capture techniques to capture a large set of N. parisii genes for use in large-scale functional genomic screens.

Materials & Methods

2.1 N. parisii Gene Amplification and Cloning

2.1.1 N. parisii Spore Prep and Genomic DNA Extraction

To produce a stock of N. parisii spores, a frozen stock of infected C. elegans was first thawed. The infection was allowed to progress for 6 days in the thawed worms, producing a large number of spores. A small chunk of the infected source plate was then transferred to a fresh plate of worms to propagate the infection. After another 6 days, the infected worms were washed off the plates with 6 mL of sterile water and collected into 15 mL conical tubes. The tubes were centrifuged at 1400 rcf for 30 seconds and the worms were rinsed with water, for a total of three washes, after which the supernatant was removed down to 1 mL. Aliquots of 25 μL were made for each set of plates, and one aliquot from each set was checked for contamination by plating it and incubating at 21 and 25℃.

N. parisii spores were extracted by adding 500 μL of silicon beads to the aliquots and vortexing for 5 minutes on a Disruptor machine. The liquid from the tubes was collected into a 50 mL conical tube. The beads were washed twice with 1 mL water, and these washes were collected into the same conical tube. The liquid was passed through a 5 μm filter to remove worm debris, and the filtrate was then aliquoted and frozen at -80℃. Spore concentrations were determined by staining a volume of the aliquoted filtrate with Direct Yellow and calculating the average spore count under a fluorescence microscope.
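The conversion from stained spore counts to a concentration is not spelled out above. The sketch below shows one common way to do it, assuming (hypothetically) that spores were counted in the large squares of a standard hemocytometer, each holding 0.1 µL, and that the stained sample was diluted before loading; the function names are illustrative only.

```python
"""Hypothetical sketch: spore concentration from Direct Yellow-stained counts.

Assumptions (not stated in Section 2.1.1): spores are counted in the large
squares of a standard hemocytometer, each holding 0.1 uL, and the stained
sample was diluted by `dilution_factor` before loading.
"""
from statistics import mean

SQUARES_PER_UL = 10  # each large square is 1 mm x 1 mm x 0.1 mm = 0.1 uL


def spores_per_ul(counts_per_square: list[int], dilution_factor: float = 1.0) -> float:
    """Average spores per square, scaled to spores per uL of undiluted filtrate."""
    if not counts_per_square:
        raise ValueError("need at least one counted square")
    return mean(counts_per_square) * SQUARES_PER_UL * dilution_factor


def volume_for_spores(target_spores: float, concentration_per_ul: float) -> float:
    """Volume (uL) of filtrate containing `target_spores`, e.g. 10 million for gDNA."""
    return target_spores / concentration_per_ul


# Usage with made-up counts from four squares of a 1:10 dilution.
conc = spores_per_ul([52, 48, 50, 50], dilution_factor=10)
print(conc)                                   # spores per uL of filtrate
print(volume_for_spores(10_000_000, conc))    # uL needed for the gDNA prep
```

The same helper gives the volume of filtrate to spin down when 10 million spores are needed for the genomic DNA extraction described next.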

N. parisii genomic DNA was prepared using an Epicentre MasterPure Yeast DNA Kit and DNA LoBind tubes. First, 10 million spores were centrifuged for 1 minute at full speed and the pellet was washed with 100 μL water; this wash was repeated once more. The supernatant was removed and 300 μL of Yeast Cell Lysis Solution was added along with 2.5 μL Proteinase K (ProK) and 1 μL RNase A (5 μg/mL). The solution was mixed, incubated at 65℃ for 30 minutes, and then placed on ice for 5 min. Next, 150 μL of MPC Protein Precipitation Reagent was added, and the solution was vortexed for 10 minutes and then centrifuged at full speed for 10 minutes. The supernatant was removed and 500 μL of 70% ethanol was mixed in. The solution was spun twice to remove the ethanol. Finally, the pellet was resuspended in 12 μL of 10 mM Tris pH 8.0 buffer.

2.1.2 Two-Step PCR Primer Design

Primers for amplification of N. parisii genes (and the Legionella pneumophila controls) were designed manually using gene sequences from the MicrosporidiaDB database (L. pneumophila sequences were provided by Dr. Malene Urbanus from the Ensminger Lab). Gene sequences were downloaded and analyzed in A Plasmid Editor (APE) software. Primers were designed to amplify only the soluble portion of each gene of interest, that is, the section exposed to the host cell45,62. Prediction tools such as SignalP 5.063,64 and TMHMM 2.065–67 were used to predict the positions of any signal peptides and transmembrane domains, and these portions of each gene were excluded while preserving the soluble host-exposed regions. Forward and reverse primers were designed with the following criteria, ranked by priority: 1. the melting temperature (Tm) of both primers should be between 52-57℃, ideally 55℃; 2. forward and reverse annealing regions should not exceed 40 bases; 3. GC content should be around 30-40%. If a primer did not satisfy the first two requirements, it was omitted from analysis. After the annealing regions were designed, additional sequences were added, namely partial Gateway attB1/B2 sites (ACAAAAAAGCAGGCTCA for forward primers and GGGGACCACTTTGTACAAGAAAGCTGGGTT for reverse primers). The reverse primer also received a yeast synthetic terminator with a TAG stop codon and additional GC base pairs to boost the GC percentage (TAGTATATATTTAATAAAGAGTATCATCTTTCAAACCGC68), as well as a randomly selected barcode69 (see Reference 69, set 17-2), both added as reverse complements. The sequences were added and combined into the final primers using Microsoft Excel.
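The ranked criteria above can be expressed as a small check on each candidate annealing region. The sketch below is not the original workflow (which was done by hand in APE and Excel). It assumes the basic GC-content Tm formula, which is cruder than the calculators in APE or the NEB Tm tool, so its values are only indicative. It also assumes a hypothetical minimum annealing length of 18 nt. For a reverse primer, the same functions would be applied to the reverse complement of the gene's 3' end.

```python
"""Sketch of the ranked primer rules in Section 2.1.2.

Assumption: Tm is estimated with the basic formula
    Tm = 64.9 + 41 * (nG + nC - 16.4) / N
rather than the calculator used in APE or by NEB, so values are indicative.
"""

TM_MIN, TM_MAX = 52.0, 57.0   # rule 1 (highest priority)
MAX_ANNEAL_LEN = 40           # rule 2
GC_MIN, GC_MAX = 0.30, 0.40   # rule 3 (lowest priority, a preference only)


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (n_gc - 16.4) / len(seq)


def evaluate(region: str) -> dict:
    """Report each rule; a region is usable only if rules 1 and 2 both pass."""
    tm = basic_tm(region)
    result = {
        "tm": round(tm, 1),
        "rule1_tm": TM_MIN <= tm <= TM_MAX,
        "rule2_length": len(region) <= MAX_ANNEAL_LEN,
        "rule3_gc": GC_MIN <= gc_fraction(region) <= GC_MAX,
    }
    result["usable"] = result["rule1_tm"] and result["rule2_length"]
    return result


def pick_forward_region(gene: str, min_len: int = 18):
    """Grow the annealing region from the 5' end until rule 1 is met.

    Returns None when no region of at most 40 nt reaches the Tm window,
    i.e. the primer would be omitted from analysis.
    """
    for n in range(min_len, MAX_ANNEAL_LEN + 1):
        region = gene[:n]
        if TM_MIN <= basic_tm(region) <= TM_MAX:
            return region
    return None
```

Growing the region until the Tm rule is met mirrors the ranking: length is only capped once Tm is satisfied, and GC content is reported but never used to reject a primer.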

2.1.3 Two-Step PCR Protocol

The first step of the two-step PCR protocol amplifies the gene from the genome and adds the yeast synthetic terminator and barcode to the 3’ end, as well as partial Gateway attB1/B2 sites to the 5’ and 3’ ends, respectively. The second step uses the product from the first step as a template. The primers in this reaction carry the complete Gateway attB1 and attB2 sequences (GGGGACAAGTTTGTACAAAAAAGCAGGCTCA for the F primer, GGGGACCACTTTGTACAAGAAAGCTGGGTT for the R primer) and complete the partial Gateway sites generated in the first step, allowing the amplicon to be cloned using Gateway BP clonase. The cycle numbers of the two reactions total 30 cycles (10 in the first step and 20 in the second) to minimize the potential for introducing errors during amplification and to prevent smearing on agarose gels. See Table 1 for the PCR reaction mixtures and cycle parameters.
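To make the order of the added sequences concrete, the following idealized, sequence-level sketch traces a gene through both PCR steps. It assumes perfect annealing at unique sites. It also assumes the final top strand reads attB1, gene, terminator, barcode, attB2. The attB strings are copied as listed in Sections 2.1.2 and 2.1.3. Because the partial and complete attB2 strings listed there are identical, only the attB1 end is extended in the second step here. The gene and barcode in any example are hypothetical.

```python
"""Idealized sketch of the two-step PCR (Section 2.1.3) at the sequence level."""

PARTIAL_ATTB1 = "ACAAAAAAGCAGGCTCA"
PARTIAL_ATTB2 = "GGGGACCACTTTGTACAAGAAAGCTGGGTT"
FULL_ATTB1 = "GGGGACAAGTTTGTACAAAAAAGCAGGCTCA"
FULL_ATTB2 = "GGGGACCACTTTGTACAAGAAAGCTGGGTT"
TERMINATOR = "TAGTATATATTTAATAAAGAGTATCATCTTTCAAACCGC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def idealized_pcr(template: str, fwd: str, rev: str, fwd_anneal: int, rev_anneal: int) -> str:
    """Top strand of the amplicon; primer 5' tails are carried into the product."""
    if fwd_anneal <= 0 or rev_anneal <= 0:
        raise ValueError("annealing lengths must be positive")
    fwd_site = fwd[-fwd_anneal:]
    rev_site = revcomp(rev[-rev_anneal:])  # top-strand sequence bound by the reverse primer
    start = template.find(fwd_site)
    stop = template.find(rev_site, start) if start >= 0 else -1
    if start < 0 or stop < 0:
        raise ValueError("primer binding site not found")
    core = template[start:stop + rev_anneal]
    return fwd[:-fwd_anneal] + core + revcomp(rev[:-rev_anneal])


def two_step_product(template: str, gene: str, barcode: str, anneal_len: int = 20) -> str:
    # Step 1: gene-specific primers add partial attB1 (5') and, on the reverse
    # primer, reverse complements of the terminator, barcode and partial attB2.
    fwd1 = PARTIAL_ATTB1 + gene[:anneal_len]
    rev1 = PARTIAL_ATTB2 + revcomp(barcode) + revcomp(TERMINATOR) + revcomp(gene[-anneal_len:])
    step1 = idealized_pcr(template, fwd1, rev1, anneal_len, anneal_len)
    # Step 2: universal primers anneal to the partial sites and complete them.
    return idealized_pcr(step1, FULL_ATTB1, FULL_ATTB2, len(PARTIAL_ATTB1), len(PARTIAL_ATTB2))
```

The partial attB1 site is exactly the last 17 bases of the complete attB1 sequence, which is why the second-step primer can anneal to the first-step product and extend it into a full site.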

Step 1 reaction mixture: nuclease-free H2O, 32.5 µL; 5X Phusion Buffer, 10 µL; 10 mM dNTPs, 1 µL; 10 µM F and R primers, 2.5 µL each; template DNA (2 ng/µL), 1 µL; Phusion, 0.5 µL
Step 1 cycle parameters: 1. 98℃ for 1 min; 2. 98℃ for 10 sec; 3. 62℃ for 20 sec; 4. 72℃ for 2:30 min; 5. go to step 2, 9 times; 6. 72℃ for 6 min
Step 2 reaction mixture: nuclease-free H2O, 30.5 µL; 5X Phusion Buffer, 10 µL; 10 mM dNTPs, 1 µL; 10 µM F and R primers, 2.5 µL each; template DNA (Step 1 PCR product), 3 µL; Phusion, 0.5 µL
Step 2 cycle parameters: 1. 98℃ for 1 min; 2. 98℃ for 10 sec; 3. 72℃ for 20 sec; 4. 72℃ for 2:30 min; 5. go to step 2, 19 times; 6. 72℃ for 6 min

Table 1. Reaction components and PCR cycle parameters for both steps of the Two-Step PCR system. Final reaction volumes were 50 µL for both steps.

2.1.4 Gateway Cloning Protocols

Gateway cloning was performed with separate BP and LR cloning steps. The product from the two-step PCR was combined with the pDONR221 donor vector and BP clonase, and positive clones from this BP reaction were combined with the desired empty yeast expression vector and LR clonase to produce a cloned yeast expression vector. The BP reaction was composed of 1 µL PCR product, 1 µL of 75 ng/μL pDONR221 and 0.5 µL BP Clonase II (Invitrogen). The LR reaction was composed of 1 µL of 75 ng/μL pDONR221 entry clone, 1 µL of 75 ng/μL empty yeast expression vector and 0.5 µL LR Clonase II (Invitrogen). Both reactions were incubated at 25℃ overnight, after which 0.3 µL Proteinase K was added and the reactions were incubated at 37℃ for 10 minutes. Protocols were provided by Dr. Urbanus from the Ensminger Lab.


2.2 Bacterial/Yeast Strains and Culturing

2.2.1 Bacterial and Yeast Strains

Bacterial strains used include DH5α competent E. coli cells (purchased from NEB or Invitrogen) and E. coli K12. DH5α competent cells were used for cloning and propagation of vectors. The yeast strain used was BY4741, a MATa haploid strain carrying the ura3Δ0 deletion (Table 2). This strain was used to express cloned N. parisii and L. pneumophila proteins and screen for growth defects. Bacterial and yeast strains were stored at -80℃ in 25% glycerol.

Bacterial strains were cultured in LB medium with antibiotics added when necessary. Liquid cultures were grown overnight at 37℃ with shaking at 220 RPM; solid cultures were grown under similar conditions without shaking. Yeast strains were grown in YPD medium, or in Ura- synthetic dropout medium (mix purchased from ThermoFisher) when selection for the expression plasmid was needed. Ura- medium with glucose (Glu, added from a 40% stock) was used for selection after yeast transformation without inducing gene expression, while Ura- medium with galactose (Gal, added from a 40% stock) was used to induce expression of the cloned genes. Liquid yeast cultures were grown overnight at 30℃ with shaking at 220 RPM, while solid cultures were grown for ~40 hours without shaking.

Strain; Species; Genotype; Selection marker; Description; Source
DH5α; E. coli; fhuA2 Δ(argF-lacZ)U169 phoA glnV44 Φ80 Δ(lacZ)M15 gyrA96 recA1 relA1 endA1 thi-1 hsdR17; N/A; Competent cells used for cloning and propagating plasmids; NEB
BY4741; S. cerevisiae; MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0; Ura3-; Yeast used as a host to express host-exposed proteins; Ensminger Lab
K12; E. coli; Wildtype; N/A; E. coli strain cultured for genomic DNA; Navarre Lab

Table 2. List of major bacterial and yeast strains that were used to generate derivative cloned strains or used in experiments.


2.2.2 Media Formulations

LB Medium: 10g of NaCl, 10g of Tryptone, 5g of yeast extract, optional 15g of agar, 975mL of H2O (960mL if adding agar)

YPD Medium: 20g of bacto-peptone, 10g of yeast extract, optional 24g of agar, 950mL of H2O, 50 mL of 40% glucose following autoclave sterilization

SOC Medium: 20g of tryptone, 5g of yeast extract, 0.5g of NaCl, 10mL of 250mM KCl, 5mL of 2M MgCl2, 960 mL H2O, 20 mL of 1M glucose following autoclave sterilization

Ura- Synthetic Dropout Medium: 1.92g of Ura- Mix, 6.7g of yeast nitrogen base w/ ammonium sulfate, optional 25g of agar, 890mL H2O (865mL if adding agar), 100 mL 40% glucose or galactose following autoclave sterilization

All recipes formatted to make one litre of medium.
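As a worked check of the recipes above, the final sugar concentration follows from the dilution relation C1V1 = C2V2. This assumes the 40% sugar stocks are w/v and takes the nominal final volume as 1 L. The Ura- value is therefore approximate, since its liquid components add up to slightly less than 1 L.

```latex
C_{\text{final}} = \frac{C_{\text{stock}}\,V_{\text{stock}}}{V_{\text{total}}}
\qquad
\text{YPD: } \frac{40\% \times 50\ \text{mL}}{1000\ \text{mL}} = 2\%
\qquad
\text{Ura}^-\text{: } \frac{40\% \times 100\ \text{mL}}{1000\ \text{mL}} \approx 4\%
\qquad
\text{SOC: } \frac{1\ \text{M} \times 20\ \text{mL}}{1000\ \text{mL}} = 20\ \text{mM}
```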

2.2.3 Transformation Protocols and Strain Generation

Bacteria were transformed by heat shock, following the protocols provided with NEB5α competent cells. Bacterial transformations were used to generate and propagate Gateway cloning products, with BP reaction transformants selected on kanamycin and LR reaction transformants selected on ampicillin/carbenicillin. Colonies from either reaction were isolated and cultured overnight, after which plasmids were extracted using a Qiagen Miniprep Kit. Plasmids were Sanger sequenced using Invitrogen M13F primers and validated using the APE alignment tool.

Yeast transformations were performed following the protocol published by Gietz & Schiestl (2007). Yeast were transformed with plasmid extracted from positive, sequence-verified LR transformants and were grown on Ura- medium. A single colony was isolated and grown in liquid culture, after which it was stored at -80℃.


Plasmid name; Description; Selection marker(s); Source
pDONR221; Gateway donor vector; Kan; Reinke Lab
pAG426Gal-ccdB; Gateway destination vector; Amp, Ura3; Ensminger Lab

Table 3. List of major plasmids used to generate derivative cloned gene vectors.

2.3 Yeast Growth Assays

2.3.1 Spot Dilution Assays

Spot dilution assays were performed on relatively dry plates, which were kept at room temperature for 2-3 days before use. Overnight cultures of each strain were grown (3 mL Ura- + glucose medium) and their optical densities were measured the following day using a spectrophotometer. The overnight cultures were used to make a 5 mL culture with an OD of about 0.8. This culture was centrifuged at 3000 rpm for 5 minutes, the supernatant was removed, and the pellet was resuspended in 1 mL sterile water and transferred to a 1.5 mL tube. This suspension was briefly centrifuged at 10,000 rpm and the pellet was resuspended in 500 μL water. A 96-well plate was used to prepare the serial dilutions for plating. The first well in each row or column was left empty, while the following 4-6 wells received 180 μL water. Then 200 μL of the cell suspension was added to the first well, and 20 μL was transferred from it to the second well, giving a 10-fold dilution. This was repeated for the remaining wells, after which 10 μL of each dilution was spotted onto the plate. The plate was allowed to dry, then placed in a 30℃ incubator and imaged two days later.
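The arithmetic behind this set-up can be sketched as follows. The sketch assumes, as an approximation, that OD600 scales linearly with cell density. It also treats resuspending the 5 mL culture in 500 µL as a 10-fold concentration; function names are illustrative.

```python
"""Sketch of the spot dilution arithmetic in Section 2.3.1 (illustrative names)."""

CONCENTRATION_FACTOR = 5.0 / 0.5  # 5 mL culture pelleted and resuspended in 500 uL


def overnight_volume_ml(od_overnight: float, od_target: float = 0.8, final_ml: float = 5.0) -> float:
    """C1V1 = C2V2: mL of overnight culture diluted to `final_ml` at `od_target`."""
    if od_overnight < od_target:
        raise ValueError("overnight culture is less dense than the target")
    return od_target * final_ml / od_overnight


def serial_dilution_factors(n_wells: int, transfer_ul: float = 20.0, diluent_ul: float = 180.0) -> list[float]:
    """Cumulative dilution of each well relative to the first well."""
    step = (transfer_ul + diluent_ul) / transfer_ul  # 20 uL into 180 uL gives 10-fold
    return [step ** i for i in range(n_wells)]


def od_equivalents(n_wells: int, od_target: float = 0.8) -> list[float]:
    """Approximate OD-equivalent of the suspension in each well before spotting."""
    start = od_target * CONCENTRATION_FACTOR
    return [start / factor for factor in serial_dilution_factors(n_wells)]


# Usage: an overnight at OD 4.0 needs about 1 mL diluted to 5 mL.
print(overnight_volume_ml(4.0))
print(od_equivalents(5))
```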

2.3.2 Arrayed Liquid Growth Assays

Liquid growth assays were performed in 96 well format. First, a master plate is created by using a pin tool to make indents on an omnitray agar plate of non-inducing medium. Colonies from the transformation plate are dotted into their appropriate spots dictated by the plate layout. The plate layout should have the borders occupied by the empty vector negative control with additional negative controls located somewhere inside the plate border to control for plate effects. Additionally, the plate layout should contain empty spots used for contamination checks. The master plate is grown at 30℃ for two days.
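The layout rules above can be generated programmatically. The following hypothetical sketch places empty-vector (EV) controls in every border well and adds a few extra EV wells inside the border. It also leaves some wells blank for contamination checks and fills the remaining wells with test strains. The interior control and blank positions are arbitrary choices for illustration, not the layout used in this work.

```python
"""Hypothetical sketch of a 96-well master plate layout (Section 2.3.2 rules)."""

ROWS = "ABCDEFGH"
COLS = range(1, 13)


def make_layout(strains, interior_ev=("C4", "F9"), blanks=("D6", "E7")):
    """Map every well (e.g. 'A1') to 'EV', 'BLANK' or a strain name."""
    layout = {}
    for row in ROWS:
        for col in COLS:
            on_border = row in ("A", "H") or col in (1, 12)
            layout[f"{row}{col}"] = "EV" if on_border else None
    for well in interior_ev:          # extra negative controls to detect plate effects
        layout[well] = "EV"
    for well in blanks:               # wells left empty to check for contamination
        layout[well] = "BLANK"
    free = [well for well, content in layout.items() if content is None]
    if len(strains) > len(free):
        raise ValueError(f"{len(strains)} strains but only {len(free)} free wells")
    for well, strain in zip(free, strains):
        layout[well] = strain
    for well in free[len(strains):]:  # any unused wells are also left empty
        layout[well] = "BLANK"
    return layout


# Usage: 56 test strains exactly fill the free interior wells.
plate = make_layout([f"strain{i}" for i in range(56)])
print(plate["A1"], plate["B2"], plate["D6"])
```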

Following the growth period, the master plate is used to create a liquid preculture plate. A 96-well plate is filled with 200 µL of non-inducing medium per well, and a 96-pin tool is used to pin yeast from the master plate to the preculture plate. The preculture plate is then sealed with a breathable, sterile seal and grown at 30℃ with shaking overnight. The next day, the preculture plate should be kept shaking until it is used to inoculate the screening plates.

The screening plates are generated by filling 96-well plates with 200 µL of inducing or non-inducing medium per well, with equal numbers of inducing and non-inducing plates for the desired number of replicates. A 2 µL slit pinning tool is used to pin the preculture plates into the screening plates, which are then sealed with a breathable, sterile seal. The plates are then placed in the growth robot, which incubates them at 30℃ with shaking and takes OD600 measurements to monitor the growth of each well over two days. The measurement interval depends on the capacity of the robot given the number of replicates and total plates; 15-minute intervals are ideal, but 30-minute intervals are acceptable.
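The methods do not specify how the OD600 time courses are summarized. One common approach, shown here purely as an assumption, scores each well by the area under its growth curve (trapezoidal rule). Each area is normalized to the mean area of the empty-vector (EV) wells in the same medium, and growth on inducing medium is then compared with growth on non-inducing medium.

```python
"""Sketch of one way to summarize OD600 time courses (metric is an assumption)."""
from statistics import mean


def auc(times_h, ods):
    """Trapezoidal area under one OD600 curve (OD x hours)."""
    if len(times_h) != len(ods) or len(times_h) < 2:
        raise ValueError("need matching time and OD lists with at least two points")
    return sum(
        (t1 - t0) * (y0 + y1) / 2
        for t0, t1, y0, y1 in zip(times_h, times_h[1:], ods, ods[1:])
    )


def relative_growth(times_h, well_ods, ev_curves):
    """AUC of each well divided by the mean AUC of the EV control wells."""
    ev_auc = mean(auc(times_h, curve) for curve in ev_curves)
    return {well: auc(times_h, curve) / ev_auc for well, curve in well_ods.items()}


def induction_ratio(rel_inducing, rel_non_inducing):
    """Relative growth on inducing medium divided by that on non-inducing medium.

    Values well below 1 would flag strains that grow poorly only when the
    cloned gene is expressed.
    """
    return {well: rel_inducing[well] / rel_non_inducing[well] for well in rel_inducing}
```

Normalizing within each medium separates a gene-specific defect from the generally slower growth of yeast on galactose.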

2.4 LASSO Protocol

2.4.1 LASSO Probe Synthesis

LASSO probes are constructed from Pre-LASSO probes, ordered in pools, by combining them with a universal adapter. The Pre-LASSOs are ordered in a state that cannot capture the target gene and must be converted through a series of reactions. The first reaction is a fusion PCR that joins the Pre-LASSO and the universal adapter (adapter_242). The fusion PCR reaction contains 5 µL 5X Phusion Buffer, 0.6 µL 10 mM dNTPs, 0.2 µL Phusion, 20 ng of Pre-LASSO and adapter_242, and H2O to 25 µL. The thermal profile is 4 min at 95℃, then 15 sec at 95℃, 20 sec at 50℃ and 40 sec at 72℃ (steps 2-4 repeated 10 times), after which 1 µL each of the fusion BLAF and fusion RFPR200EcoRI primers is added and the reaction is run for 30 cycles. Fusion PCR products are run on a 1.1% agarose gel with SYBR Safe and the product is gel extracted. The fusion PCR product is then digested with EcoRI using 5 μL of 10X EcoRI buffer and 1 μL (20 units/μL) of EcoRI restriction enzyme (NEB) for 1 h at 37°C, followed by 10 min at 80°C. The digested DNA is purified with AMPure beads (1.4X, washed twice with 70% ethanol) and eluted in 40 µL water. The digest leaves sticky ends for the subsequent intramolecular ligation.

Intramolecular ligation is performed by combining 5 ng of EcoRI-digested fusion PCR product with 400 units of T4 DNA Ligase in a 2 mL reaction of 1X T4 Ligase Buffer (NEB). The ligation reaction is performed at 16℃ overnight. After the ligation, the reaction was concentrated in a SpeedVac to ~20 µL and then brought to 100 µL with water. The DNA was purified with AMPure beads (1.4X, washed twice with 70% ethanol) and eluted in 50 µL water. Linear DNA was digested by adding 2 µL of an exonuclease mix (1 µL Lambda Exonuclease at 5 units/µL and 1 µL Exonuclease I at 20 units/µL) to the 50 µL solution and incubating for 30 min at 37°C, then 20 min at 80°C.

Inverse PCR is performed in a 25 µL reaction containing 10 µL of circularized LASSO precursors (see above), 2.5 µL of 10X Klentaq Mutant Buffer, 0.2 µL of Omni Klentaq LA, 0.6 µL dNTPs and 1 µL each of 0.4 µM reverse primer TiolNew and forward primer SapINew. Thermal profile: 4 min at 95°C, then 30 cycles of 10 s at 95°C, 20 s at 55°C and 40 s at 72°C, followed by 4 min at 72°C. The product is purified with AMPure beads (1.4X, washed twice with 70% ethanol), eluted in 40 µL nuclease-free water, and quantified with a Nanodrop.

The final enzymatic digests to produce mature LASSO probes are as follows. Purified inverse PCR product (1 µg) is digested with 1 µL of BspQI restriction enzyme in 4 µL of 10X CutSmart buffer at 50°C for 1 hour, then 80°C for 20 min. After this digestion, 1 µL (5 units) of Lambda Exonuclease is added directly and the reaction is incubated for 30 min at 37°C. The mature ssDNA is purified using AMPure beads (1.4X, washed twice with 70% ethanol), eluted in 40 µL water, and quantified with a Nanodrop.

2.4.2 Blunt-end Intramolecular Ligation

Blunt-end intramolecular ligations were performed by modifying the “Round the horn site directed mutagenesis” protocol70. Primers were phosphorylated with polynucleotide kinase (PNK) to provide the 5’ phosphates needed for blunt-end ligation. The phosphorylation reaction contained 5 µL 10X T4 DNA Ligase Buffer, 5 µL of 100 µM primer, 1 µL PNK and 39 µL of water, and was incubated at 37℃ for 1 hour, then 65℃ for 20 min. These primers were used to amplify the product in a PCR following the original LASSO protocols. If the product being ligated was amplified from a bacterial plasmid, the reaction was treated with 1 µL of DpnI at 37℃ for 1 hour. Blunt-end ligations were always performed at a DNA concentration of 2 ng/µL, although the volume varied from 20-40 µL. The reaction was incubated at 25℃ overnight, after which it was treated following the original LASSO protocol (see Methods Section 2.4.1).


2.4.3 Large Volume Intramolecular Ligation

Large volume intramolecular ligations were performed using EcoRI digested fusion product. 2µg of EcoRI digested fusion product was ligated in a 20mL reaction, with similar composition and conditions to the original protocol. The ligation reaction was purified using a NEB Monarch DNA/PCR Column Purification Kit.
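Intramolecular ligations depend on keeping the DNA dilute, so it helps to compute the final DNA concentration and component volumes for each set-up. The sketch below does this. It assumes a T4 DNA ligase stock of 400 U/µL (a common commercial concentration; check the tube) and a 10X buffer diluted to 1X. The DNA stock concentration is a hypothetical parameter. Only the DNA amounts and reaction volumes come from the protocols above.

```python
"""Sketch for setting up dilute intramolecular ligations (Section 2.4).

Assumptions: ligase stock of 400 U/uL, 10X buffer used at 1X, and a
hypothetical DNA stock concentration supplied by the user.
"""


def ligation_setup(dna_ng: float, total_ul: float, ligase_units: float = 400.0,
                   ligase_stock_u_per_ul: float = 400.0,
                   dna_stock_ng_per_ul: float = 1.0,
                   buffer_stock_x: float = 10.0) -> dict:
    buffer_ul = total_ul / buffer_stock_x          # dilute buffer stock to 1X
    ligase_ul = ligase_units / ligase_stock_u_per_ul
    dna_ul = dna_ng / dna_stock_ng_per_ul
    water_ul = total_ul - buffer_ul - ligase_ul - dna_ul
    if water_ul < 0:
        raise ValueError("components exceed the reaction volume")
    return {
        "dna_ng_per_ul": dna_ng / total_ul,
        "buffer_ul": buffer_ul,
        "ligase_ul": ligase_ul,
        "dna_ul": dna_ul,
        "water_ul": water_ul,
    }


# Usage: the standard ligation (5 ng in 2 mL) and the large-volume
# ligation (2 ug in 20 mL) work out to very different DNA concentrations.
print(ligation_setup(5, 2000)["dna_ng_per_ul"])
print(ligation_setup(2000, 20000)["dna_ng_per_ul"])
```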

2.4.4 OPool Probe Capture

Opool probe capture began with preparation of the Opool probes. The Opool was resuspended to 250 ng/µL and phosphorylated with PNK in a reaction containing 5 µL 10X T4 DNA Ligase Buffer, 1 µL PNK, 1 µL Opool and 43 µL of water, incubated at 37℃ for 1 hour, then 95℃ for 5 min. Opool probe annealing was performed in 15 µL of Ampligase Buffer containing 250 ng of genomic DNA (the volume added varied) and 1 µL of 5 ng/µL phosphorylated Opool probes. The reaction was incubated at 95℃ for 5 min and 65℃ for 1 hour, and held at 65℃ until the gap-filling mix was added. The gap-filling mix was made of 0.2 µL 1 mM dNTPs, 0.5 µL 10X Ampligase Buffer, 1 µL NAD+, 3 µL water, 0.1 µL Ampligase and 0.3 µL Phusion per reaction. A master mix of the gap-filling components was prepared, and 5 µL was added to each annealing reaction, bringing the total reaction volume to 20 µL. The reaction was incubated at 72℃ for 30 min and 98℃ for 3 min, then held at 37℃. The reaction was then exonuclease digested by adding 3 µL of an exonuclease mix containing 1 µL each of Exonuclease I, Exonuclease III and Lambda Exonuclease, and incubating at 37℃ for 1 hour, then 80℃ for 20 min. The exonuclease-treated reaction was then used as the template in a PCR to amplify the captured genes. The PCR was composed of 10 µL 5X Kapa HiFi Buffer, 1 µL 10 mM dNTPs, 2.5 µL each of 10 µM forward and reverse primers, 1 µL Kapa HiFi Polymerase, 10 µL of the exonuclease-treated reaction and 23 µL water. The PCR program was: 1. 2 min at 95℃; 2. 15 sec at 98℃; 3. 15 sec at 55℃; 4. 5 min at 72℃; 5. go to step 2, 23 times; 6. 8 min at 72℃; 7. hold at 4℃. The PCR products were then run on an agarose gel for visualization.
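Scaling the per-reaction gap-filling recipe into a master mix can be sketched as below. The component volumes are taken from the protocol above. The 10% pipetting overage is an assumption, not part of the protocol. Note that the listed components sum to 5.1 µL per reaction, slightly more than the 5 µL dispensed.

```python
"""Sketch of scaling the per-reaction gap-filling mix (Section 2.4.4).

The 10% overage is an assumption, not part of the protocol.
"""

GAP_FILL_PER_RXN_UL = {
    "1 mM dNTPs": 0.2,
    "10X Ampligase Buffer": 0.5,
    "NAD+": 1.0,
    "water": 3.0,
    "Ampligase": 0.1,
    "Phusion": 0.3,
}


def master_mix(per_rxn: dict, n_reactions: int, overage: float = 0.10) -> dict:
    """Volume (uL) of each component for n reactions plus a pipetting overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_rxn.items()}


# Usage: components for 10 capture reactions.
for component, volume in master_mix(GAP_FILL_PER_RXN_UL, 10).items():
    print(f"{component}: {volume} uL")
```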

Results

3.1 Cloning and Screening of N. parisii Host-Exposed Factors

Many pathogens release effector proteins to influence their hosts for the pathogen’s benefit. A pathogen can possess a diverse toolkit of effector proteins, each with a unique role during infection. Effector proteins can serve many different purposes, sometimes with opposite effects on the host, but when regulated they provide a versatile means to counter or exploit the host. Shigella illustrates this: it employs effector proteins that can cause macrophage death to subvert the host immune system, but also uses effectors that prevent epithelial cell death to avoid losing its reproductive environment41. Understanding the functions of effector proteins, and how pathogens use them during infection, would reveal many details about the pathology and treatment of infectious disease.

Microsporidia, like many pathogens, have evolved various mechanisms to influence their hosts and elicit specific changes. The nematode-infecting microsporidian N. parisii can remodel the host cell cytoskeleton by modulating host RAB-1114,34,38 and can fuse neighboring host cells together to aid cell-to-cell spread35. Other microsporidia influence their hosts in diverse ways, and some of these effects could be mediated by microsporidian host-exposed proteins acting as effector proteins. Microsporidia secrete many proteins into their host cells upon infection42. Most of these proteins have unknown functions, but some are known, such as hexokinase (HK), which has been shown to be secreted by a variety of microsporidia. Microsporidian HKs can influence host cell metabolism and support microsporidian proliferation, as knockdown of Nosema bombycis HK reduced its proliferation21,26,50.

A promising way to study effector proteins is to clone effectors of interest and express them in yeast. Previous work on Shigella secreted proteins demonstrated that secreted proteins are more likely than non-secreted proteins to be cytotoxic and cause growth defects in yeast59. Yeast has been applied to study effectors from many bacterial pathogens, such as Legionella and Yersinia45. Studying effector proteins in yeast is possible because of the conservation between the pathogen’s natural host and yeast45, which means that the targets and functions of effector proteins are also conserved. Yeast has proven to be a versatile system for effector protein investigation and may therefore be an excellent tool to study microsporidian effector proteins. The host-exposed proteins secreted by N. parisii42 can have a multitude of roles during infection, either interfacing with or influencing the host cell. The latter category would include effector proteins that modulate host processes, which could be detected in yeast if they are cytotoxic and cause a growth defect. By expressing N. parisii host-exposed proteins in yeast and measuring yeast growth, it is possible to identify novel microsporidian effector proteins.

This chapter details my work developing a functional genomics system to clone and screen N. parisii host-exposed proteins in yeast, including the cloning system I used to generate my transgenic yeast and the growth assays I performed to identify novel microsporidian effectors.

3.1.1 Host-Exposed Gene Pilot and Development of the Two-Step PCR System

In order to screen the N. parisii host-exposed proteins, they first must be captured and cloned into a yeast expression vector, which is then transformed into yeast and finally expressed. The capture step of the process is performed through PCR amplification of the target gene from N. parisii genomic DNA (gDNA). gDNA was obtained by extracting it from prepared N. parisii spores (See Methods Section 2.1.1.). Normally, amplification of eukaryotic genes for cloning uses complementary DNA (cDNA) made from RNA. This is done to avoid the issue of introns present in eukaryotic genomes, which would alter the protein if included. N. parisii, like several other microsporidia, lacks introns and splicing machinery12,21, meaning that genes of interest can be amplified directly from the gDNA without the impacts of introns.

To amplify the genes from the genome, I designed primers following the guidelines in Methods Section 2.1.2, except that at this stage I did not remove signal peptides or transmembrane domains. These were the first set of conditions I tried for primer design, based primarily on rules that work for common PCRs, with no considerations specific to the microsporidian genome. Because of the low GC% of the N. parisii genome21, primers could become quite long to meet some of the design requirements, which led to the ranking system for the requirements. The forward and reverse primers were designed to have an ideal Tm of 55℃. Because Phusion polymerase, which has a stabilizing effect on primers, would be used in the PCRs, the annealing temperature for the reaction was raised to 62℃. I gave this rule the highest priority because I eventually aimed to run many reactions at once, which requires a single annealing temperature. The second rule limited each annealing region to a maximum of 40 bp, to avoid multiple repeated regions within an annealing region, which could lead to primer dimerization. The final rule was a GC% between 30-40%; this rule was ranked last because it is the least essential, and Phusion polymerase’s stabilizing effect would alleviate this limitation. The primers were given barcodes69 and yeast synthetic terminators68 to be added to the 3’ end of the gene, as stated in the Methods. The barcode was added so that pooled screens could be performed with this gene set, using next-generation sequencing to measure the abundance of each barcode as a proxy for yeast growth. The yeast synthetic terminators were added to provide a transcription rate similar to that of the traditional CYC1 terminator, which is blocked in the expression vector by the barcode. One difference was that the first set of primers carried complete Gateway attB1/B2 sites rather than adding them through a two-step process; this was changed later (see below).

To test whether this system could be used to amplify N. parisii genes, I selected three genes from the N. parisii genome with different sizes and different genomic loci, to check that the approach would work across the genome. The genes represented three size categories: less than 1 kb, 1-2 kb and more than 2 kb. The locations of the genes were verified using MicrosporidiaDB. The three genes chosen were NEPG_01270 (1512 bp), NEPG_01777 (678 bp) and NEPG_02635 (2847 bp). The sequence of each gene was obtained from MicrosporidiaDB and primers were designed. In addition to these three N. parisii genes, I obtained three Legionella pneumophila genes from the Ensminger Lab to serve as controls for yeast growth defects upon expression. These L. pneumophila genes encode proteins with known toxicities and growth defects in yeast. They were treated like the N. parisii genes, meaning that I designed primers with the same rules and sequence additions. The three genes were lpg0439 (no toxicity), lpg2176 (moderate toxicity) and lpg0695 (severe toxicity). Two of these genes (lpg0439 and lpg0695) were cloned a second time without the yeast synthetic terminator and barcode. These served as controls to ensure that the additional sequences did not affect yeast growth (see Table 4 for a summary of all pilot genes).


After designing primers, I amplified the target genes from genomic DNA or cloned vectors. lpg0439 and lpg0695 were provided in plasmids, while L. pneumophila genomic DNA was used for lpg2176. The PCR protocol I designed was heavily based on the recommended protocols supplied with NEB Phusion Polymerase. The primary change concerned the amount of template DNA added to the reaction. Most genomic amplifications use a template concentration of around 1-5 ng/µL. Because the N. parisii genome is quite small, I was able to use 2 ng in a 50 µL reaction. I performed PCR reactions for the six genes; the two sequence controls lacking the barcode and terminator were not amplified at this time. In the first attempt, 4 of the 6 genes were successfully amplified, as verified by agarose gel electrophoresis (1.5% TAE, SYBR Safe). Two of the genes (lpg2176 and NEPG_02635) did not amplify (Figure 3, left). These two were reattempted and amplified in the second trial (Figure 3, right).
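The reasoning behind the lower template amount can be checked with back-of-the-envelope arithmetic. The number of genome copies in a given mass of DNA is the mass divided by the genome's molar mass (genome length × average mass per base pair), multiplied by Avogadro's number. The sketch below assumes about 650 g/mol per base pair. It also assumes a genome size of roughly 4.1 Mb for N. parisii, an approximate value used here for illustration rather than one reported in this thesis. A human genome (about 3.1 Gb) at the low end of the usual 50-250 ng input is included only as a familiar comparison.

```python
AVOGADRO = 6.022e23  # molecules per mole
BP_MASS = 650.0      # approximate g/mol per base pair of double-stranded DNA


def genome_copies(mass_ng: float, genome_bp: float) -> float:
    """Number of haploid genome copies in `mass_ng` nanograms of DNA."""
    grams = mass_ng * 1e-9
    genome_molar_mass = genome_bp * BP_MASS  # g/mol for one genome copy
    return grams / genome_molar_mass * AVOGADRO


# Assumed genome sizes (approximate, for illustration only).
N_PARISII_BP = 4.1e6
HUMAN_BP = 3.1e9

copies_parisii = genome_copies(2, N_PARISII_BP)  # 2 ng per 50 uL reaction
copies_human = genome_copies(50, HUMAN_BP)       # 50 ng = 1 ng/uL in 50 uL
print(f"N. parisii, 2 ng: {copies_parisii:.2e} genome copies")
print(f"Human, 50 ng:     {copies_human:.2e} genome copies")
print(f"Ratio: {copies_parisii / copies_human:.1f}x")
```

Under these assumptions, 2 ng of N. parisii DNA contains roughly 4.5 × 10^5 genome copies, about 30 times more template copies than 50 ng of human DNA. This is consistent with a small genome needing much less input mass.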

Figure 3. SYBR Safe agarose gel electrophoresis of pilot gene PCR products. Left. The first trial of all 6 pilot genes. NEB 1 kb Plus Ladder is in Lanes 1 and 8. Lanes 2-4 are NEPG_01270, NEPG_01777 and NEPG_02635, with Lanes 5-7 being the corresponding no-template negative controls in the same order. Lanes 9-11 are lpg2176, lpg0439 and lpg0695, with Lanes 12-14 being the corresponding no-template controls in the same order. Right. The second trial, with the two genes that failed to amplify. Lane 1 is the NEB 1 kb Plus Ladder. Lane 2 is NEPG_02635 and Lane 3 is lpg2176. Lanes 4 and 5 are the no-template negative controls in the same order. Faint bands at the bottom of PCR lanes in both gels are the primers.

Following successful PCR amplification of my six pilot genes, I proceeded to Gateway cloning to generate my donor and expression vectors. The first step of Gateway cloning is the BP reaction, which uses BP Clonase to recombine the PCR product with the pDONR221 plasmid, creating a cloned entry vector. The second step is the LR reaction, which uses LR Clonase to recombine the insert from an entry vector into a destination vector. I used this step to generate an expression vector for each target gene. Initial BP cloning reactions were performed using the manufacturer's suggested protocols; this was eventually changed to the protocol provided in Methods Section 2.1.4. Transformations of BP cloning products yielded many transformants on selective plates. However, screening colonies by colony PCR or BsrGI restriction digest showed that positive colonies were extremely rare. Even after several attempts, only a few genes had been cloned into the pDONR221 vector, and some were still missing. After testing several reaction variables, I began to suspect that the primers used to amplify the genes were causing the problem. BP cloning is biased toward shorter fragments. Because the primers carried complete Gateway attB1/B2 sites and were long enough (over 100 nt) to be recombined, leftover primers would be cloned preferentially over the genes71. This idea was supported when BP reactions using gel-extracted PCR product produced the donor vectors I had been missing.

Performing gel extractions for every gene I intended to clone would not be feasible, so I needed a solution to the cloning failures that would scale easily. PCR amplification for Gateway cloning is commonly performed in two steps72. Although this is often done to reduce the cost of primer synthesis, a two-step process could also resolve the issues with Gateway cloning. If the Gateway attB1/B2 sites are added across two reactions, the long primers used in the first step to amplify the gene would not carry full Gateway sites and would not recombine as they did before. The second-step primers would also not recombine, so only the gene would recombine into the vector. At this point, I had already obtained BP clones for all six genes. Still, developing this two-step method would be beneficial in the later stages of this project because it would eliminate the need to gel extract every PCR reaction. To test this new method, I generated primers for lpg2176 and lpg0439, both with the barcode and yeast synthetic terminator, following the guidelines listed in Methods Section 2.1.2 (Figure 4). I performed the first step using parameters similar to those of the original PCR, as this step differs from the original only in which primers are added. One change was that the number of PCR cycles was lowered from 29 to 9. The second PCR step adds another 20 cycles, and running too many cycles on a template increases the risk of introducing errors into the amplicon sequence. Additionally, informal observations suggested that running a PCR for too many cycles may degrade the DNA, which appears as large smears on gels and compromises any subsequent visualization. Following the first PCR step, I used a Qubit High-Sensitivity DNA kit to measure the DNA concentration of the reaction. This step was essential to determine how much of the first reaction should be added to the second reaction, and I determined that 3 µL of the first reaction would be sufficient as template in the second reaction. I then performed the second reaction and ran the products of both reactions on an agarose gel. Both steps worked as expected, and I was able to reliably amplify the target genes (Figure 5). Following the two-step PCR, I used the PCR products directly in a BP reaction to test whether the two-step process would improve cloning efficiency. After transformation, I selected three colonies from each transformation and digested the extracted plasmids to verify the insert. All three colonies from both plates contained the proper insert, a far higher cloning efficiency than I had achieved before. Two-step PCR therefore allowed me to reliably amplify and clone target genes with high efficiency while eliminating the need for any purification steps.

Gene | Species | Length (bp) | Annotation | F Primer | R Primer
NEPG_01270 | N. parisii | 1512 | UTP-glucose-1-phosphate uridylyltransferase15,21 | GGGGACAAGTTTGTACAAAAAAGCAGGCTCAATGGGAGACAACAGTTCACA | GGGGACCACTTTGTACAAGAAAGCTGGGTTTGTGAAGATCGTGTGTTGCGGTTTGAAAGATGATACTCTTTATTAAATATATACTAATGATCAATGACATTTAAATTTCCTGTTAC
NEPG_01777 | N. parisii | 678 | Hypothetical Protein21 | GGGGACAAGTTTGTACAAAAAAGCAGGCTCAATGCATACACTAAGTAATCATCAACAT | GGGGACCACTTTGTACAAGAAAGCTGGGTTCCGACTACTCTTGTGTTGCGGTTTGAAAGATGATACTCTTTATTAAATATATACTAATATAATCCAGTATATAACGGATGTTTTTCT
NEPG_02635 | N. parisii | 2808 | Hypothetical Protein21 | GGGGACAAGTTTGTACAAAAAAGCAGGCTCAATGATCATGAAACTATTAATAATGATGTACAC | GGGGACCACTTTGTACAAGAAAGCTGGGTTGCTGGTTGCATATTGTTGCGGTTTGAAAGATGATACTCTTTATTAAATATATACTACTGATTAAAGTCCATTTCTCTTATACTTGT
lpg0439 | L. pneumophila | 774 | Uncharacterized Protein | GGGGACAAGTTTGTACAAAAAAGCAGGCTCAATGAGAGGAAAAAAAGAATTTTTAAAACATG | GGGGACCACTTTGTACAAGAAAGCTGGGTTCGGTTGTTGTTGTTGTTGCGGTTTGAAAGATGATACTCTTTATTAAATATATACTATGCGAAGGGATTTATTTTATCATCTT
lpg2176 | L. pneumophila | 1824 | Probable sphingosine-1-phosphate lyase75 | GGGGACAAGTTTGTACAAAAAAGCAGGCTCAATGTTCGTAAGGATATGTGCAATG | GGGGACCACTTTGTACAAGAAAGCTGGGTTCCTACCTTGTTGTTGTTGCGGTTTGAAAGATGATACTCTTTATTAAATATATACTACACTTTATATTTCTCTGTTTTTCTATTTCTAAGT
lpg0695 | L. pneumophila | 2847 | Phosphocholine transferase AnkX73,74 | GGGGACAAGTTTGTACAAAAAAGCAGGCTCAATGGTAAAAATTATGCCAAATCTACCT | GGGGACCACTTTGTACAAGAAAGCTGGGTTCATGAGGTGTTGTTGTTGCGGTTTGAAAGATGATACTCTTTATTAAATATATACTACCATTTTAATTTCAAGGATGTACCAAC
lpg0439 control sequence | L. pneumophila | 774 | Uncharacterized Protein | GGGGACAAGTTTGTACAAAAAAGCAGGCTCAATGAGAGGAAAAAAAGAATTTTTAAAACATG | GGGGACCACTTTGTACAAGAAAGCTGGGTTTGCGAAGGGATTTATTTTATCATCTT
lpg0695 control sequence | L. pneumophila | 2847 | Phosphocholine transferase AnkX73,74 | GGGGACAAGTTTGTACAAAAAAGCAGGCTCAATGGTAAAAATTATGCCAAATCTACCT | GGGGACCACTTTGTACAAGAAAGCTGGGTTCCATTTTAATTTCAAGGATGTACCAAC

Table 4. A summary table for the genes cloned in the pilot screen. Gene sizes excluding untranslated regions, annotations, F primer and R primer sequences are provided. For the L. pneumophila controls, lpg0439 (green) is not toxic, lpg2176 (orange) is moderately toxic and lpg0695 (red) has severe toxicity.

Figure 4. Two-step PCR primer design for amplification of host-exposed genes. The gene-specific primer is the annealing region. Template for the first PCR is N. parisii genomic DNA (L. pneumophila gDNA or cloned plasmids in the case of developing two-step PCR), and the product of this reaction was used as template for the second PCR. Images are not to scale.
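The logic of the two-step design can be sketched in Python. The sketch assumes a common arrangement in which step 1 primers carry only a 12 nt partial attB overhang. Step 2 then uses universal adapter primers carrying the full attB1 and attB2 sequences. The overhang length may differ from the design in Methods Section 2.1.2, and the gene, annealing regions and barcode/terminator placeholder are hypothetical. The point it illustrates is that no step 1 primer contains a complete attB site, while the final product has full attB sites at both ends.

```python
ATTB1 = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"  # full attB1 (step 2 forward adapter)
ATTB2 = "GGGGACCACTTTGTACAAGAAAGCTGGGT"  # full attB2 (step 2 reverse adapter)
OVERLAP = 12  # assumed length of the partial attB overhang on step 1 primers


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def step1_primers(fwd_anneal: str, rev_anneal: str, rev_tail: str = ""):
    """Step 1 primers carry only the 3' end of each attB site.
    `rev_tail` stands in for the barcode and terminator (hypothetical)."""
    fwd = ATTB1[-OVERLAP:] + fwd_anneal
    rev = ATTB2[-OVERLAP:] + rev_tail + rev_anneal
    return fwd, rev


def step1_amplicon(gene: str, fwd: str, rev: str,
                   fwd_anneal: str, rev_anneal: str) -> str:
    """Top strand of the step 1 product, given primers that anneal at the
    start of `gene` and (on the bottom strand) at its end."""
    if not (gene.startswith(fwd_anneal)
            and gene.endswith(reverse_complement(rev_anneal))):
        raise ValueError("annealing regions do not match the gene ends")
    interior = gene[len(fwd_anneal):len(gene) - len(rev_anneal)]
    return fwd + interior + reverse_complement(rev)


def step2_product(amplicon: str) -> str:
    """Adapters extend the shared 12 nt overlaps to full attB sites."""
    head = ATTB1[-OVERLAP:]
    tail = reverse_complement(ATTB2[-OVERLAP:])
    if not (amplicon.startswith(head) and amplicon.endswith(tail)):
        raise ValueError("amplicon lacks the partial attB overlaps")
    return ATTB1 + amplicon[OVERLAP:-OVERLAP] + reverse_complement(ATTB2)


gene = "ATGAAAGATTTTACCAATAAAGCATTTGAAGATATTTAA"  # hypothetical ORF
fwd_anneal = gene[:18]
rev_anneal = reverse_complement(gene[-18:])
barcode_and_terminator = "ACGTACGTAC"  # hypothetical placeholder
fwd, rev = step1_primers(fwd_anneal, rev_anneal, barcode_and_terminator)
amp = step1_amplicon(gene, fwd, rev, fwd_anneal, rev_anneal)
final = step2_product(amp)
print(ATTB1 in fwd, ATTB2 in rev)  # False False
print(final.startswith(ATTB1), final.endswith(reverse_complement(ATTB2)))  # True True
```

Because leftover step 1 primers lack a complete attB site in this arrangement, they cannot compete with the amplicon in the BP reaction. This is the property that removed the need for gel extraction.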

Figure 5. SYBR Safe agarose gel of the two-step PCR amplification of lpg0439 and lpg2176. Lane 1 contains the 1 kb Plus Ladder. Lanes 2-5 are step 1 reactions: lpg0439 and lpg2176 with template, followed by lpg0439 and lpg2176 without template. Lanes 6-9 are the corresponding step 2 reactions in the same order. Lane 10 contains a negative control that has no template from step 1. Amplification seen in the negative control lanes was due to contaminated reagents and was not seen in subsequent trials.


3.1.2 Yeast Growth Pilot Screen of the 3 N. parisii genes

Using the BP donor clones I generated, I used LR cloning to create yeast expression vectors for each of the pilot genes. The LR reaction uses LR Clonase to recombine the donor clones with an empty yeast expression vector, producing a yeast expression vector carrying the cloned gene. LR cloning of the pilot genes reliably produced yeast expression vectors without the issues of the BP reaction, as there is only one species of reactant and no interfering primers. The next step was to transform the yeast expression vectors into yeast and generate the expression lines. The expression vectors were transformed into the haploid yeast strain BY4741. This strain is auxotrophic for uracil because it lacks a functional URA3 gene, which therefore serves as a selectable marker for transformations. The yeast expression vector pAG426Gal-ccdB is a high-copy 2-micron plasmid that carries the URA3 gene76. Yeast transformations produced transgenic lines for all of the pilot genes, which were made into frozen stocks. I then tested whether the lines had the expected growth phenotypes by streaking each strain on inducing (galactose, Gal) and non-inducing (glucose, Glu) media (Figure 6).

This assay demonstrated that all transgenic lines could grow normally under non-inducing conditions. When the genes were expressed, the three L. pneumophila genes showed their predicted growth phenotypes, and the two sequence controls grew the same as their barcoded counterparts. NEPG_01777 and NEPG_02635 did not cause any growth defects upon expression, but NEPG_01270 caused a moderate growth defect. This assay shows that gene expression can cause growth defects and that adding the barcode and yeast synthetic terminator does not alter the effects of the gene. However, the strength of the effects is hard to judge from streaks. To address this, I performed a spot dilution assay to better resolve how strongly each gene affects growth. For this assay, I generated a negative control for gene expression: yeast transformed with the empty pAG426Gal-ccdB vector, essentially a wild-type yeast line. The spot dilution assay visualized the growth defects more clearly. The defects matched the predicted phenotypes of the three L. pneumophila controls and the phenotypes seen in the streaking assay (Figure 7). NEPG_01270 shows a much clearer moderate growth defect in the spot dilution assay. NEPG_01270 is annotated as UTP-glucose-1-phosphate uridylyltransferase21. This enzyme catalyzes the reversible formation of UDP-glucose and pyrophosphate from UTP and glucose-1-phosphate and is involved in glycogen metabolism77. It is also found in several other microsporidia species15. Together, these assays demonstrate that N. parisii genes cloned and expressed in yeast with this pipeline can affect yeast growth (Figure 8).

Figure 6. A diagram of the plate layout for streaked yeast colonies transformed with a cloned gene. Non-inducing medium is supplemented with glucose (A), while inducing medium is supplemented with galactose (B). The order on the plates (left to right) is the Legionella pneumophila controls, the three N. parisii genes and the two unmodified Legionella controls (no terminator or barcode). Expected growth phenotypes are depicted where applicable: lpg0695 causes a severe yeast growth defect, lpg2176 a moderate defect, and lpg0439 does not affect yeast growth. Successful yeast transformants were plated on inducing and non-inducing media in the same layout as in parts A and B (C).


Figure 7. Spot dilution assay of the 6 cloned genes. The top pair are the Legionella controls: lpg0695 has severe yeast toxicity, lpg2176 has moderate toxicity and lpg0439 has no toxicity. The bottom pair are the N. parisii cloned genes: only NEPG_01270 shows a moderate growth defect.


Figure 8. Schematic of cloning and expressing N. parisii host-exposed proteins. The pipeline begins with Two-Step PCR to amplify the gene of interest (primers as in Figure 4), followed by Gateway cloning to produce a yeast expression vector. The vector is then transformed into yeast and the gene is expressed. If expression causes a growth defect, the gene could encode a microsporidian effector protein. Created with BioRender.com.

3.2 N. parisii Host-Exposed Protein Screening List and Composition

3.2.1 Choosing Genes and Designing Primers

Using the methods and pipeline developed in the previous sections, a large collection of N. parisii HE proteins can be screened to determine whether they are potential microsporidian effector proteins. The first component of this gene list is genes confirmed to encode host-exposed proteins, namely those that Reinke et al. 2017 detected as HE using their APX tagging system42. A common practice in the study of bacterial effector proteins is to remove all parts of the proteins except the cytosolic regions, which are the regions delivered to the host45,60. Including other regions, such as transmembrane domains and signal peptides, can impact the folding of bacterial proteins. Microsporidian proteins may additionally be trafficked through the yeast's secretory pathway21,78. If the proteins are trafficked out of the cell, then cytotoxic functions of the host-exposed regions cannot be detected. I therefore removed the signal peptides (SPs) and transmembrane domains (TMDs) from these proteins, a step that was not included in the pilot analyses (Results Sections 3.1.1 and 3.1.2). NEPG_01270 was the only one of the three pilot N. parisii genes without an SP or TMD. The other two genes may therefore also be cytotoxic, with their SPs and/or TMDs preventing a growth defect. Removing SPs and TMDs should keep them from interfering with the cytotoxicity of the cytosolic regions. However, this choice meant removing some genes from the list of 72 confirmed host-exposed genes. The APX method used to discover these proteins does not consider protein topology: anything that is host-exposed is tagged, including proteins with multiple transmembrane domains or small host-exposed regions (less than 50 amino acids). Host-exposed proteins with either of these features were omitted from the screen list. It would be difficult to clone each external region between multiple transmembrane domains, and very small host-exposed regions are unlikely to function in the yeast cell. One exception was NEPG_01777, which has two TMDs and was included because it had been chosen for the pilot screen. After removing genes with these traits, I designed primers following the guidelines in Methods Section 2.1.2. The first group of genes in the screen list was therefore the remaining 52 Confirmed HE Genes.
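The inclusion rules above can be written as a small filter. This minimal sketch assumes that SP and TMD boundaries are already available as 1-based, inclusive amino acid coordinates, for example from a prediction tool. The function names and inputs are hypothetical. A protein is excluded if it has more than one TMD (unless an exception is made, as for NEPG_01777) or if no region of at least 50 amino acids remains after SP/TMD removal. Keeping the longest remaining region is an assumption of the sketch, standing in for a topology-aware choice of the host-exposed region.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # 1-based, inclusive amino acid coordinates


def remaining_segments(length: int, removed: Sequence[Span]) -> List[Span]:
    """Protein segments left after removing SP and TMD spans."""
    segments: List[Span] = []
    start = 1
    for s, e in sorted(removed):
        if s > start:
            segments.append((start, s - 1))
        start = max(start, e + 1)
    if start <= length:
        segments.append((start, length))
    return segments


def choose_clone_region(length: int, sp: Optional[Span] = None,
                        tmds: Sequence[Span] = (), min_len: int = 50,
                        allow_multi_tmd: bool = False) -> Optional[Span]:
    """Return the amino acid span to clone, or None if the protein is
    excluded by the multi-TMD or minimum-length rule."""
    if len(tmds) > 1 and not allow_multi_tmd:
        return None
    removed = ([sp] if sp else []) + list(tmds)
    segments = remaining_segments(length, removed)
    if not segments:
        return None
    # Assumption: the longest remaining segment is the host-exposed region.
    best = max(segments, key=lambda seg: seg[1] - seg[0] + 1)
    return best if best[1] - best[0] + 1 >= min_len else None


def to_nucleotide_span(aa_span: Span) -> Span:
    """Convert a 1-based amino acid span to its 1-based coding-sequence span."""
    return 3 * aa_span[0] - 2, 3 * aa_span[1]


# Hypothetical 300 aa protein with a 22 aa signal peptide.
region = choose_clone_region(300, sp=(1, 22))
print(region, to_nucleotide_span(region))
```

Gene-specific primers would then be designed against the nucleotide span returned for each protein.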

The second group in the screen list contains genes that are predicted to be HE. Reinke et al. 2017 found that the HE proteins they detected were enriched for certain features, such as the presence of an SP or TMD, and that many HE proteins are members of large gene families. To test whether these predictions are accurate, I selected 20 genes that were predicted to be HE but were not detected by biotin tagging42. These genes were randomly chosen from a master list of predicted HE genes and cross-referenced with the list of confirmed HE genes to prevent duplicates. Genes with the undesired traits detailed in the previous paragraph were also excluded. An additional criterion controlled for the large gene families to which some of these HE genes belong. The gene families differ in their member counts42. To avoid over- or underrepresenting any single large family, I randomly chose genes within each family and checked the final list to ensure that the families were properly represented. After a suitable list of 20 genes was generated, I designed primers following Methods Section 2.1.2, replacing any genes that did not meet the criteria for good primer design with another randomly chosen gene.
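One way to implement family-aware random selection is to shuffle the candidates and accept genes in order while capping how many come from any one family. This prevents a single large family from dominating the list. The sketch below illustrates that assumption and is not the exact procedure used here. The gene and family names and the cap value are hypothetical, and a fixed random seed makes the draw reproducible.

```python
import random
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple


def pick_family_capped(candidates: Sequence[Tuple[str, Optional[str]]], n: int,
                       max_per_family: int, exclude: Iterable[str] = (),
                       seed: int = 0) -> List[str]:
    """Randomly pick up to `n` genes from (gene, family) pairs.

    Genes in `exclude` (e.g. confirmed HE genes) are skipped, and at most
    `max_per_family` genes are taken from any one family. Genes without a
    family (None) are not capped."""
    excluded = set(exclude)
    pool = [c for c in candidates if c[0] not in excluded]
    rng = random.Random(seed)  # fixed seed gives a reproducible draw
    rng.shuffle(pool)
    picked: List[str] = []
    per_family: Counter = Counter()
    for gene, family in pool:
        if family is not None and per_family[family] >= max_per_family:
            continue
        picked.append(gene)
        if family is not None:
            per_family[family] += 1
        if len(picked) == n:
            break
    return picked


# Hypothetical candidates: one large family, one small family, unassigned genes.
cands = ([(f"a{i}", "A") for i in range(30)]
         + [(f"b{i}", "B") for i in range(5)]
         + [(f"n{i}", None) for i in range(5)])
print(pick_family_capped(cands, 10, max_per_family=3, exclude={"n0"}, seed=1))
```

Without the cap, a uniform draw from this candidate list would be expected to pick mostly members of the large family.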

The third group in the screen list serves primarily to test whether including SPs and TMDs influences yeast growth upon gene expression. To create this group, I randomly chose 15 genes from the first group (Confirmed HE Genes) and 5 genes from the second group (Predicted HE Genes), for a total of 20 genes. This 3:1 ratio approximately matches the size ratio of the two groups (52:20). I designed primers for these genes as before, but used the full gene sequence without removing SPs or TMDs. If SPs and TMDs affect yeast growth when they are retained, then the genes in this group will show different growth defects compared to their counterparts in the first two groups. Microsporidian signal sequences have been shown to function in yeast, so SPs and TMDs may change the localization of the expressed proteins and shuttle them away from where they exert their cytotoxic effects21,78. In that case, genes that were cytotoxic in the first two groups would show no cytotoxicity in this group. Conversely, genes that were not cytotoxic in the first two groups may become cytotoxic when their SPs or TMDs are restored. The outcome for this group is difficult to predict. Either way, it will show how SPs and TMDs should be handled in microsporidian genes and will guide future work using yeast to study microsporidia.

The final group contains 20 genes that are not predicted to be HE genes. This category includes non-secreted genes, some of which are likely microsporidian housekeeping genes not involved in modulating the host cell. Most genes in this category lack SPs and TMDs, distinguishing them from the previous groups, where such features were abundant. As with the previous groups, I randomly selected 20 genes and designed primers for each. Eleven of the 20 genes have been annotated (Appendix 3), with annotations of varying completeness21. This group was added to test whether trends observed for bacterial effector proteins apply in the microsporidian context: bacterial effector proteins are more likely than non-secreted proteins to be cytotoxic59. If this trend holds for microsporidia, the Not Predicted HE group should show lower rates of cytotoxicity than the first two groups. In total, 112 genes were selected for screening to identify and understand effector proteins (see Appendix 3 for the gene list). The genes were amplified and cloned following the protocols in Methods Sections 2.1.3, 2.1.4 and 2.2.3.


3.2.2 Functional Genomic Screening of N. parisii host-exposed genes

Of the 112 genes selected for cloning and screening, 97 were successfully cloned into the yeast expression vector; the remaining genes were lost at various stages of the cloning pipeline. Three genes failed to amplify in the two-step PCR, giving either no observable amplicon or an incorrectly sized one. Another ten could not be BP or LR cloned correctly. This was indicated by a lack of bacterial colonies or by an abundance of colonies carrying an incorrect insert, as confirmed by plasmid extraction and restriction digest. Entry vectors with the proper restriction pattern were Sanger sequenced bidirectionally using M13 primers (Invitrogen), and 95% of the successful genes had the correct sequence (see Appendix 3 for a full breakdown of all cloned genes). Eighty-five of the successfully LR cloned genes were transformed into yeast and arrayed into a liquid growth screen master plate (see Methods Section 2.3.2). The remaining 11 were delayed by contamination in the yeast transformation experiments. They were successfully transformed later but could not be screened at the time. The liquid screen of the gene library included the control genes from the spot dilution assay (Figure 7): the three Legionella controls and an empty vector negative control. Analysis of the liquid screening growth curves showed that the controls had toxicities and growth patterns consistent with the previous assay (Figure 9).

[Figure 9 plot: Control Growth Curves on Glucose vs. Galactose Media; OD (0 to 1.8) versus Time (Hours, 0 to 50) for EV, lpg0439, lpg2176 and lpg0695 on Glu and Gal.]

Figure 9. Growth curves for the controls on non-inducing (glucose, Glu) and inducing (galactose, Gal) media. Growth curves were obtained by measuring the OD620 of the control wells at 15-minute intervals over a 48-hour period. EV represents the empty vector negative control, lpg0439 is the low toxicity positive control, lpg2176 is the moderate toxicity positive control and lpg0695 is the severe toxicity positive control.

Liquid screening was used to identify additional toxic N. parisii genes by comparing the area under each gene's growth curve to that of the empty vector negative controls, all on inducing medium. A gene was defined as toxic if its area under the curve fell more than three standard deviations below the empty vector mean, with the standard deviation calculated from the negative controls of both master plate replicates (Figure 10). Using this approach, 16 HE genes, 0 Predicted HE genes, 1 SP/TM gene and 6 Non HE genes were determined to be toxic (Table 5). These counts correspond to 30% of the HE, 0% of the Predicted HE, 5.8% of the SP/TM and 40% of the Non HE genes screened showing distinguishable toxicity.
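The toxicity call can be sketched as follows. The sketch computes each well's area under the OD620 curve with the trapezoidal rule, subtracts the mean AUC of the empty vector wells, and flags genes whose difference lies more than three standard deviations below zero. Using the sample standard deviation is an assumption. The growth reduction percentage is computed here as 100 × (1 - AUC_gene / mean AUC_EV), which is an assumed definition for illustration. The function names and example AUC values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def auc_trapezoid(times: List[float], ods: List[float]) -> float:
    """Area under a growth curve (OD x hours) by the trapezoidal rule."""
    if len(times) != len(ods):
        raise ValueError("times and ods must have the same length")
    return sum((t1 - t0) * (y0 + y1) / 2
               for t0, t1, y0, y1 in zip(times, times[1:], ods, ods[1:]))


def call_toxic(gene_aucs: Dict[str, float], ev_aucs: List[float],
               k: float = 3.0) -> Dict[str, dict]:
    """Flag genes whose EV-subtracted AUC is below -k standard deviations."""
    ev_mean = mean(ev_aucs)
    threshold = -k * stdev(ev_aucs)  # sample SD of EV wells (assumption)
    calls = {}
    for gene, auc in gene_aucs.items():
        delta = auc - ev_mean  # "subtracted EV area", as plotted in Figure 10
        calls[gene] = {
            "delta_auc": delta,
            "growth_reduction_pct": 100 * (1 - auc / ev_mean),  # assumed definition
            "toxic": delta < threshold,
        }
    return calls


times = [0.25 * i for i in range(193)]  # 15-minute reads over 48 hours
flat_auc = auc_trapezoid(times, [1.0] * 193)  # constant OD of 1.0 for 48 h
calls = call_toxic({"g1": 20.0, "g2": 47.5}, [48.0, 49.0, 47.0])  # hypothetical
print(flat_auc, calls)
```

Because each threshold depends on the spread of the empty vector wells, pooling the controls from both master plate replicates (as described above) gives a more stable estimate of that spread.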

[Figure 10 plot: Distribution of Area Under Curve Relative to Empty Vector Average Area; y-axis Area Under Curve (subtracted EV area), -30 to 10; series: EV average, 3X Std.dev EV, lpg0439, lpg2176, lpg0695, HE Genes, Pred HE Genes, SP/TM, Non HE Genes.]

Figure 10. Areas under the curve for each gene, normalized to the empty vector (EV) controls. The EV average (0) is denoted by a purple line, while the threshold at three standard deviations below the EV average is denoted by a black line. All genes that fell below this line were labelled as toxic.


Gene | Group | Annotation | Growth Reduction (%)
NEPG_00672 | HE | Phosphoglycerate kinase42 | 73.0
NEPG_00719 | HE | N/A | 74.3
NEPG_00771 | HE | N/A | 83.5
NEPG_00935 | HE | N/A | 85.3
NEPG_01040 | HE | N/A | 86.4
NEPG_01056 | HE | N/A | 84.5
NEPG_01134 | HE | N/A | 85.2
NEPG_01266 | HE | Adaptin-N42 | 74.3
NEPG_01270 | HE | UTP-glucose-1-phosphate uridylyltransferase42 | 65.2
NEPG_01319 | HE | N/A | 20.5
NEPG_01767 | HE | N/A | 83.5
NEPG_01930 | HE | N/A | 85.8
NEPG_02502 | HE | N/A | 84.5
NEPG_02541 | HE | N/A | 72.7
NEPG_02543 | HE | N/A | 84.6
NEPG_02661 | HE | N/A | 77.9
NEPG_02234 | SP/TM | N/A | 75.8
NEPG_00023 | Non HE | 60S ribosomal protein 7A21 | 34.3
NEPG_01641 | Non HE | N/A | 60.8
NEPG_01820 | Non HE | 14-3-3 family protein beta/alpha21 | 28.7
NEPG_02077 | Non HE | N/A | 83.0
NEPG_02185 | Non HE | N/A | 78.1
NEPG_02506 | Non HE | N/A | 79.3

Table 5. Summary table of all toxic genes. Growth reduction values are relative to the average growth of the empty vector negative controls.


3.3 Future Directions

The first step to follow up on this work would be to finish transforming the outstanding expression clones into yeast and to screen these remaining N. parisii genes. The results of the screen would then help decide the next steps. One possibility would be to make another expression vector library using a low-copy CEN plasmid. Expressing genes from low-copy plasmids would place a lower burden on the yeast, so only strongly toxic genes would cause a growth defect. This would be desirable if the high-copy plasmid screen showed noticeable toxicity and growth defects for many Non HE genes. The existing gene collection should also be screened on solid medium. Solid growth screens are similar to liquid growth screens but have lower sensitivity; however, they are less likely to generate suppressor mutants that confound the data. Repeating the screen on solid medium would provide an orthogonal method to identify novel microsporidian effectors, bolstering the confidence and reliability of the findings from the liquid growth screen. Additionally, the barcodes added to each gene allow pooled screening. In this approach, all of the yeast strains are grown in a single culture, the plasmids are extracted and sequenced, and read depth is used as a proxy measure of growth to identify genes that cause growth defects79,80. This screen should be performed after the arrayed screens so that the different screening techniques can be compared. A pooled screen is designed for much higher throughput and would require a much larger sample size than the current gene set provides. However, because barcodes rather than the genes themselves are measured, a larger sample size could be obtained by generating multiple expression vectors of a single gene, each with a unique barcode. Methods such as site-directed mutagenesis or round-the-horn PCR could be used for this70. Pooled screening is very promising, as it would provide a powerful method to analyze many microsporidian genes at once. By varying the screen conditions, it would also provide a way to determine the functions of genes other than effector proteins. Combined with high-throughput gene capture and cloning techniques, pooled screening would be a powerful tool. The data and experience gained from pooled screens in this project would be valuable for developing a system to perform much larger screens.
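The read-depth readout of a pooled screen could be summarized as a per-barcode log2 fold change between inducing and non-inducing cultures, after normalizing each sample to its total read count. When a gene carries several unique barcodes, as proposed above, the barcode values can then be averaged per gene. The sketch below assumes that barcode counts have already been tallied from sequencing reads. The pseudocount, function names and example counts are hypothetical, and published pooled-screen analyses use more elaborate statistics.

```python
import math
from collections import defaultdict
from typing import Dict


def barcode_log2fc(induced: Dict[str, int], uninduced: Dict[str, int],
                   pseudocount: float = 0.5) -> Dict[str, float]:
    """log2 ratio of each barcode's relative abundance (galactose vs glucose).
    Counts are normalized to library size; the pseudocount avoids log(0)."""
    total_i = sum(induced.values())
    total_u = sum(uninduced.values())
    result = {}
    for bc in set(induced) | set(uninduced):
        freq_i = (induced.get(bc, 0) + pseudocount) / total_i
        freq_u = (uninduced.get(bc, 0) + pseudocount) / total_u
        result[bc] = math.log2(freq_i / freq_u)
    return result


def gene_scores(lfc: Dict[str, float],
                barcode_to_gene: Dict[str, str]) -> Dict[str, float]:
    """Average the log2 fold changes of barcodes belonging to the same gene."""
    grouped = defaultdict(list)
    for bc, value in lfc.items():
        grouped[barcode_to_gene[bc]].append(value)
    return {gene: sum(v) / len(v) for gene, v in grouped.items()}


# Hypothetical counts: bc3 is depleted after induction.
induced = {"bc1": 100, "bc2": 100, "bc3": 10}
uninduced = {"bc1": 100, "bc2": 100, "bc3": 100}
lfc = barcode_log2fc(induced, uninduced)
scores = gene_scores(lfc, {"bc1": "geneA", "bc2": "geneA", "bc3": "geneB"})
print(scores)
```

Barcodes of toxic genes become depleted in the induced culture and therefore receive strongly negative scores, while neutral genes stay near the library-wide baseline.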

Following completion of the screen, toxic genes from the HE or Predicted HE groups should be prioritized for further study as likely microsporidian effector proteins. To confirm that they are effector proteins, biochemical assays should be performed to determine their functions. Assays such as affinity purification and pulldowns would be useful to identify the proteins that interact with the protein of interest. Using the Gateway system, an expression clone of the target gene fused to a protein tag, such as a FLAG tag, can be used to pull down interacting partners81. Additionally, this strategy could test how well the relevant biology is conserved between C. elegans and S. cerevisiae, by comparing the interacting partners of a single gene of interest in the two organisms. This set of experiments would also provide insight into the differences between the two organisms and into which interactions may be lost by using yeast instead of nematodes.

An additional method to identify interacting partners would be to use yeast single-gene deletion strains. If an effector targets a pathway that is lost through a gene deletion, the deletion strain will show differential growth when the effector is expressed, indicating a genetic interaction. This technique has been used to great effect to determine the partners and potential functions of bacterial effector proteins, such as the OspF effector protein of Shigella flexneri56. A hindrance of this technique is that screening the entire yeast deletion collection for many genes of interest can be very resource intensive, so the genes to screen must be chosen very selectively. However, condensed deletion collections can provide much of the robustness of the full collection without the high cost. Using computational modeling and the observation that many interactions between yeast genes are redundant, a set of 90 yeast deletion strains was developed that covers approximately 70% of all known genetic interactions. This condensed set identified the same interacting partners and function for the aforementioned OspF protein56,82.
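The idea of a condensed collection is to choose a few deletion strains that together capture most known genetic interactions. This can be illustrated with a greedy set-cover heuristic. The sketch below is a generic illustration of that selection problem, not the computational method behind the published 90-strain collection. The strain names and interaction sets are hypothetical.

```python
from typing import Dict, List, Set


def greedy_cover(strain_interactions: Dict[str, Set[str]], max_strains: int,
                 target_fraction: float = 1.0) -> List[str]:
    """Greedily pick strains that add the most not-yet-covered interactions,
    stopping at `max_strains` or once `target_fraction` of all
    interactions is covered."""
    universe = set().union(*strain_interactions.values())
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(strain_interactions)
    while remaining and len(chosen) < max_strains:
        if universe and len(covered) >= target_fraction * len(universe):
            break
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:  # no strain adds anything new
            break
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


# Hypothetical strains and the interactions each one reports on.
strains = {
    "d1": {"a", "b", "c"},
    "d2": {"c", "d"},
    "d3": {"d", "e"},
    "d4": {"a"},
}
print(greedy_cover(strains, max_strains=4))
print(greedy_cover(strains, max_strains=4, target_fraction=0.6))
```

Greedy selection is a standard approximation for set cover. Redundant interactions (here, "a" and "c" appearing in several strains) are what allow a small subset to cover most of the total.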

3.4 LASSO Cloning

Advances in technology, primarily in DNA sequencing, have led to a greater understanding of microbes through analysis of their genomes. Next-generation sequencing has expanded the available catalogue of microbial genomes, with over 20,000 complete genomes available61. Despite this wealth of genomic information, genomic data have several limitations. A genome provides the sequences of the proteins it encodes, but it may not provide information about the functions or roles of many of those proteins, especially in divergent organisms. Sequencing a genome has become inexpensive and is now feasible for most laboratories. Synthesizing the genes for all the proteins in a genome, however, remains too expensive for many. This cost difference has created a bottleneck between genomic and proteomic data, with many proteins in a genome having unknown functions61. In addition, many proteins are annotated based on homology to proteins of known function. This is not always adequate to describe protein function in some microbial contexts, as proteins may be co-opted to serve different roles in different organisms. To bridge the gap between genomic and proteomic data, high-throughput functional assays are needed. Because of the high cost of DNA synthesis, acquiring commercially made genes is not feasible on a genomic scale. An alternative is high-throughput capture of genes directly from the genome. This approach has several advantages: it is relatively low cost and requires only genomic DNA, which can be acquired even from organisms that are difficult to culture.

There are many methods for generating an ORFeome (all open reading frames in a genome); however, the limiting step is capturing each gene from the genome. Traditionally, PCR is used to amplify the gene, which is then cloned as needed72, but genome-scale PCR is demanding and not feasible in most contexts. Recently, a new technique called Long Adapter Single Strand Oligonucleotide (LASSO) cloning was introduced. LASSO cloning uses single-stranded commercial DNA oligonucleotides to capture genes from the genome, and capture of a whole ORFeome in a single reaction is achievable83. LASSO builds on the principle of molecular inversion probes, a method for sequencing specific loci at high depth, and uses similarly structured single-stranded DNA probes that bind a specific target DNA sequence, which is then captured by synthesis83. LASSO cloning would remove gene capture as the limiting step, because it provides a way to capture an ORFeome quickly and inexpensively. The captured ORFeome could then be cloned and analysed in a given experiment to provide information on every gene in the genome.

LASSO cloning would be highly beneficial for studying the functions of microsporidian genes. The microsporidian genome is largely uncharacterized and has been a challenge to study15. Understanding microsporidian proteins is hampered by the divergence of this group from other well-known fungi such as yeast, and microsporidia cannot be cultured outside a host and many species are not genetically tractable15,21. LASSO cloning would allow screening of microsporidian ORFeomes, providing a rapid tool to characterize genes. With LASSO cloning, screening all 700 N. parisii and other microsporidian HE proteins42 would be feasible, and owing to the versatility of yeast as a model organism, assays could also be designed to learn about the functions of proteins other than HE proteins. LASSO cloning is not limited to N. parisii and could be used to obtain and screen ORFeome libraries for many microsporidia species, further expanding our understanding of microsporidian biology.

This chapter details my progress in developing a LASSO system capable of capturing N. parisii genes and my attempts to capture multiple genes at once. I discuss the steps I took to capture a gene, different approaches and modifications to the LASSO protocol, and the pitfalls and limitations of LASSO cloning that I encountered.

3.4.1 LASSO Cloning Procedure

LASSO cloning begins with an oligo pool of single-stranded DNA probes called pre-LASSO probes. The pre-LASSO probes in this pool are not ready for gene capture and must first be processed through a set of reactions. The pre-LASSO probes provide the target specificity of the final LASSO probe and are combined with a universal 242 bp adapter to create it. The final LASSO probe is 345 bp long; ordering a pool of completed LASSO probes would have been expensive and was often not an option at the time the LASSO protocol was published. The main steps to create a functional LASSO probe are fusion PCR with the adapter, intramolecular ligation to form a circular product, and inverse PCR to create a linear oligo with the correct orientation83 (Figure 11). A series of enzymatic steps is interspersed between these main steps; see Methods Section 2.5.1 or Tosi et al. 2017 for the full protocol.
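Why circularization followed by inverse PCR changes the probe's orientation can be seen in a toy string model: inverse PCR with outward-facing primers opens the circle at a new point, so the sequence is rotated. The sketch assumes a hypothetical layout in which the pre-LASSO-derived part carries the extension arm ("E") and ligation arm ("L") side by side, fused to the adapter ("A"); real sequences, lengths and the published probe layout (Tosi et al. 2017) are not modelled.

```python
def circularize(linear: str) -> str:
    """Model intramolecular ligation of a linear molecule.

    The circle is represented by the same string, read with the understanding
    that its last base is now joined to its first base.
    """
    return linear


def inverse_pcr(circle: str, open_at: int) -> str:
    """Model inverse PCR with outward-facing primers whose 5' ends meet at open_at.

    The product is the circle linearized at that point, i.e. a rotation of the
    sequence: the old ends become an internal junction and vice versa.
    """
    i = open_at % len(circle)
    return circle[i:] + circle[:i]


# Hypothetical toy fusion PCR product: extension arm, ligation arm, adapter.
fusion_product = "EEEE" + "LLLL" + "AAAAAA"
# Opening the circle between the two arms places them at opposite ends,
# separated by the adapter.
probe = inverse_pcr(circularize(fusion_product), open_at=4)
```

In this toy, `probe` reads ligation arm, adapter, extension arm, the arrangement a molecular-inversion-style probe needs so that its two arms can bind either side of a target.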

LASSO probes are generated in pools, the same format in which the pre-LASSOs are acquired. The resulting pooled LASSO probes enable pooled gene capture, allowing a large number of genes to be captured in a single reaction and enabling capture and cloning of thousands of genes83. Gene capture occurs when the LASSO probes bind to their target sites in the DNA: the extension arm acts as a primer for DNA synthesis across the gene, and the newly synthesized sequence is ligated to the ligation arm, creating a circular product83 (Figure 11).
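The capture step can be approximated in silico as "everything from the extension-arm site through the ligation-arm site". This is a minimal single-strand sketch, assuming the arm binding sites are written in the same 5' to 3' sense as the probe strand and ignoring mismatches, the reverse strand and the adapter; the function name and toy sequences are hypothetical.

```python
from typing import Optional


def simulate_capture(genome: str, ext_site: str, lig_site: str) -> Optional[str]:
    """Toy model of LASSO gene capture on one strand.

    ext_site and lig_site are the sequences the extension and ligation arms
    are assumed to target. Synthesis primed by the extension arm fills the gap
    up to the ligation arm, so the captured insert spans both sites.
    Returns None if either site is missing or out of order.
    """
    start = genome.find(ext_site)
    if start == -1:
        return None
    end = genome.find(lig_site, start + len(ext_site))
    if end == -1:
        return None
    return genome[start : end + len(lig_site)]


# Hypothetical toy "genome" with a short ORF between flanking sequence.
toy_genome = "TTTT" + "ATGAAACCCGGGTAA" + "TTTT"
captured = simulate_capture(toy_genome, ext_site="ATGAAA", lig_site="GGGTAA")
```

Here `captured` is the whole toy ORF, and a probe whose ligation-arm site is absent captures nothing, which is the in-silico analogue of a failed capture.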

3.4.2 Implementation and Progress Using the Published LASSO Protocol

To test whether LASSO cloning would be a viable way to clone N. parisii genes, I started by attempting to capture a single gene using the LASSO method. The gene chosen was NEPG_00061, a 966 bp gene selected bioinformatically to be representative of an average N. parisii gene.


Figure 11. Diagram of LASSO probe synthesis and gene capture. A. A formula for generating LASSO probes from the pre-LASSO and the adapter. B. The main steps to generate the LASSO probes; enzymatic reactions and purification steps have been omitted for simplicity. C. Gene capture using a LASSO probe. The probe binds to the DNA, the gap is filled by synthesis and the probe is amplified to generate a linear molecule. Enzymatic steps are omitted. Created in BioRender. Adapted from Tosi et al., 2017.

A pre-LASSO probe and the universal adapter were obtained to test whether the full protocol could produce a functional LASSO probe. The first attempt to follow the published protocol was not successful, and because the protocol has few visualization steps, it was difficult to identify the exact step that had failed. The fusion PCR step was successful and reproducible (Figure 10); however, problems seemed to arise in the intramolecular ligation step. The ligation must be intramolecular so that each probe ligates to itself and its orientation can be reversed in the subsequent inverse PCR. Intermolecular ligation competes with intramolecular ligation and becomes increasingly favoured as DNA concentration rises, so the original authors used a very low concentration of fusion PCR product in the ligation reaction to minimize intermolecular ligation events. The ligation reaction used 5 ng of EcoRI-digested fusion PCR product in a 2 mL reaction, which was then concentrated to 100 µL and purified. This is a resource-intensive process, and in some attempts no product remained after the purification steps. DNA concentration measurements of the ligation products detected no DNA. Because template below the detection limit could still amplify, I used the ligation product as template in the inverse PCR anyway, but no amplicons were detected, suggesting a lack of template. These findings indicated that the ligation procedure from the published protocol was the source of the problem. Several further attempts at the ligation reaction produced the same lack of ligated product, indicating that this step would have to be modified to achieve the desired intramolecular ligation.
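To put the published ligation conditions in perspective, the sketch below converts 5 ng in 2 mL into a molar concentration. It assumes an average mass of 650 g/mol per base pair of dsDNA and a fragment of roughly 375 bp; both are approximations, and the function name is illustrative. Even with no losses, concentrating the reaction to 100 µL gives only about 0.05 ng/µL, which is likely below what many UV spectrophotometers can quantify and may help explain why no DNA was measurable.

```python
AVG_G_PER_MOL_PER_BP = 650.0  # common approximation for double-stranded DNA


def dna_conc_pM(mass_ng: float, length_bp: int, volume_ul: float) -> float:
    """Molar concentration (pM) of a dsDNA fragment of given mass, length and volume."""
    moles = (mass_ng * 1e-9) / (length_bp * AVG_G_PER_MOL_PER_BP)
    litres = volume_ul * 1e-6
    return moles / litres * 1e12  # mol/L -> pM


# Published ligation: 5 ng in 2 mL; fragment length assumed to be ~375 bp.
in_ligation = dna_conc_pM(5, 375, 2000)
# Upper bound after concentrating to 100 uL, assuming no DNA is lost.
after_concentration = dna_conc_pM(5, 375, 100)
mass_per_ul_after = 5 / 100  # ng/uL, again assuming full recovery
```

The ligation therefore runs at roughly 10 pM of fragment, a concentration chosen to favour circularization over joining of separate molecules.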

3.5 Modifications to the LASSO Protocol

3.5.1 Blunt-End Ligation

A potential solution for the intramolecular ligation problem was to use blunt-end ligation instead of the EcoRI sticky-end method from the original protocol. This idea was inspired by a PCR-based mutagenesis protocol called Round-the-horn site-directed mutagenesis (RTH). RTH is a versatile method that introduces mutations into plasmids by PCR. It uses phosphorylated primers that generate a linear, blunt-ended product carrying the desired mutation. Because the primers are phosphorylated, blunt-end ligation can join the two ends intramolecularly to regenerate a circular vector70.
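RTH relies on back-to-back primers: two outward-facing primers whose 5' ends meet, so PCR copies the entire plasmid into one linear, blunt-ended product. The sketch below designs such a pair without a mutation, as in a blunting control. It is a simplified illustration assuming a fixed primer length; real primer design would also consider melting temperature and secondary structure, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def back_to_back_primers(circular_template: str, position: int, length: int = 20):
    """Design outward-facing primers whose 5' ends meet at `position`.

    The forward primer copies the top strand starting at `position`; the
    reverse primer is the reverse complement of the `length` bases just
    upstream. PCR then yields a linear copy of the plasmid that starts and
    ends at `position` (primers would be 5'-phosphorylated for ligation).
    """
    n = len(circular_template)
    doubled = circular_template + circular_template  # handles wrap-around
    p = position % n
    forward = doubled[p : p + length]
    upstream_start = (p - length) % n
    reverse = reverse_complement(doubled[upstream_start : upstream_start + length])
    return forward, reverse


# Hypothetical 16 bp "plasmid"; cut at position 0 with 3-nt primers.
fwd, rev = back_to_back_primers("ATGCGTACCTTAGGCA", position=0, length=3)
```

Doubling the template lets the reverse primer wrap around the origin of the sequence string, which is necessary because a plasmid has no true ends.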

First, I generated a positive control for blunt-end ligation by designing primers to linearize a readily available plasmid, pMA112 (3853 bp), with blunt ends. The primers did not introduce a mutation; they only produced a blunt-ended linear product, which let me test whether the blunt-end ligation conditions in the RTH protocol could perform the ligation. I amplified the plasmid to generate the blunt ends and performed the blunt-end ligation as per the RTH protocol (see Methods Section 2.5.2). The PCR product was DpnI digested before ligation to remove the template plasmid, preventing it from transforming competent cells and providing a cleaner negative control. The ligation products were transformed, and colony count was used as a measure of how successful the reaction was. Colonies grew after transformation with the ligation products while the negative control showed no growth, indicating that blunt-end ligation as per the RTH protocol works. The next step was to vary the concentration of the plasmid in the reaction to see which would provide the best reaction efficiency. I tried three protocols from different sources; each was performed with the pMA112 plasmid and transformed into DH5α E. coli, using colony count as a measure of ligation efficiency (Table 6).

Protocol | Conditions | Colony count (1 µL) | Colony count (10 ng of DNA) | Colony count (control)
1 | 8.5 µL PCR product; 1 µL 10X T4 ligase buffer; 0.5 µL T4 DNA ligase (0 µL for control). DNA concentration = 18.36 ng/µL | 73 | 123 | 0
2 | 2 µL PCR product; 2 µL 10X T4 ligase buffer; 1 µL T4 DNA ligase (0 µL for control); water to 20 µL. DNA concentration = 2 ng/µL | 72 | 224 | 2
3 | 2 µL PCR product; 17 µL 1X CutSmart with 1 mM ATP (18 µL for control); 1 µL T4 DNA ligase (0 µL for control). DNA concentration = 2 ng/µL | 53 | 132 | 2

Table 6. A summary of the results of the blunt-end ligation using pMA112. All ligation reactions were incubated at 25℃ overnight and transformed into E. coli, using colony count as a measure of reaction efficiency.

I then applied blunt-end ligation to the fusion PCR product in place of the intramolecular ligation from the published protocol. Using the methods developed from the RTH testing, I performed a blunt-end ligation on fusion PCR product amplified with phosphorylated primers. The new protocol produced blunt-end ligated products; however, two product species were formed. Two bands were seen, one at 375 bp and another at 750 bp, and both were present after exonuclease digestion. This indicates that both products were circular: the smaller band is the desired intramolecular ligation product, while the larger band likely formed by intermolecular ligation of two molecules followed by intramolecular ligation. Comparing band intensities between exonuclease-digested and undigested reactions showed that, although blunt-end ligation was possible, it was inefficient, as there was a noticeable decrease in intensity (Figure 12). Inverse PCR using these ligation products did not yield any amplicons; however, this could reflect a problem with either the ligated template or the PCR conditions, as I did not have a positive control for the inverse PCR. Later attempts to produce more ligation product did not reproduce the results seen in Figure 12, indicating a lack of reproducibility in the ligation reaction.

Figure 12. Agarose gel for the visualization of the blunt-end ligation reaction products. All reactions had 40 ng of DNA in varying solution volumes. Lane annotations (left to right, after the ladder): solution size (µL) 20, 20, 20, 10, 20; ligated +, +, +, +, -; exonuclease digestion +, -, -, -, -; purified +, +, -, -, -. Exonuclease digestion was done with exonuclease I and exonuclease λ for one hour at 37℃. Purification of the reactions was performed using a purification column. Lane 1 contains the 1 kb+ ladder. The green circle denotes the desired band and the red circle denotes the larger band; both are seen again in Lane 3 for clarity. The last lane is fusion PCR in a mock reaction to serve as a size marker. Note that the bright band seen above the larger ligated product is an artifact of an interaction between the SDS loading dye and SYBR Safe.
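The colony counts in Table 6 can be normalized to DNA input to compare the three ligation protocols on a per-nanogram basis. This sketch assumes the listed DNA concentration is that of the ligation reaction (so 1 µL carries that many ng) and ignores the control counts; it is an illustrative calculation, not an analysis reported in the original work.

```python
from typing import Dict, Tuple

# Values transcribed from Table 6:
# protocol -> (DNA concentration in ng/uL, colonies from 1 uL, colonies from 10 ng)
TABLE_6: Dict[int, Tuple[float, int, int]] = {
    1: (18.36, 73, 123),
    2: (2.0, 72, 224),
    3: (2.0, 53, 132),
}


def colonies_per_ng(protocol: int) -> Tuple[float, float]:
    """Colonies per ng of ligated DNA from the 1 uL and the 10 ng transformations.

    Assumes 1 uL of ligation reaction carries `conc` ng of DNA.
    """
    conc, from_1ul, from_10ng = TABLE_6[protocol]
    return from_1ul / conc, from_10ng / 10.0


for p in sorted(TABLE_6):
    per_ng_1ul, per_ng_10ng = colonies_per_ng(p)
    print(f"Protocol {p}: {per_ng_1ul:.1f} and {per_ng_10ng:.1f} colonies/ng")
```

Under both normalizations protocol 2 gives the most colonies per ng, and the dilute reactions (protocols 2 and 3) outperform protocol 1 when 1 µL is transformed.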

3.5.2 Large Volume Ligation Reactions

My next approach to achieving reliable intramolecular ligation was to use a method provided in the supplemental information for the LASSO paper. The supplemental protocol used 1 µg of EcoRI-digested fusion PCR product in a 5 mL reaction, a much larger reaction than in the main protocol and at a higher DNA concentration. This protocol was likely used by the authors to confirm that intramolecular ligation is possible, and I chose it because, even if the ligation were not very efficient, this large amount of DNA should produce enough ligated product to proceed with the rest of the LASSO protocol. This ligation did produce the desired product; however, it was inefficient, as much of the input fusion PCR DNA was lost and the products were split into two bands (Figure 13). As in the previous ligation, one band of the expected size was produced along with another of double the size. Exonuclease digestion revealed the topology of each band: the smaller band remained, indicating that it was circular, whereas the larger band was lost, indicating that it was linear.
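The reasoning used to interpret these gels, band size relative to the expected product plus survival after exonuclease treatment, can be written as a small classifier. This is a sketch assuming that exonuclease-resistant bands are circular and that multimers run near whole multiples of the monomer size; the 10% size tolerance and the function name are hypothetical choices.

```python
from typing import Optional


def interpret_band(size_bp: float, monomer_bp: float, survives_exo: bool,
                   tolerance: float = 0.1) -> Optional[str]:
    """Classify a gel band as an n-mer of the expected product and infer topology.

    Exonuclease treatment degrades linear DNA from its free ends, so a band
    that survives digestion is assumed to be circular. Returns None if the
    size is not within `tolerance` (fraction of monomer) of a whole multiple.
    """
    n = round(size_bp / monomer_bp)
    if n < 1 or abs(size_bp - n * monomer_bp) > tolerance * monomer_bp:
        return None
    topology = "circular" if survives_exo else "linear"
    names = {1: "monomer", 2: "dimer", 3: "trimer"}
    label = names.get(n, f"{n}-mer")
    return f"{topology} {label}"


# Bands from the large-volume ligation, taking ~375 bp as the monomer size.
small_band = interpret_band(375, 375, survives_exo=True)
large_band = interpret_band(750, 375, survives_exo=False)
```

Applied to the blunt-end ligation instead, where the 750 bp band also survived digestion, the same function would report a circular dimer, matching the interpretation given there.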

Figure 13. SYBR Safe agarose gel for the large-volume ligation. Lane annotations (left to right): ligated +, -, +, -; exonuclease digestion -, +, +, -. Ligation was performed with 1 µg of DNA in 5 mL of buffer overnight. The ligation was purified using a silica DNA purification column. Lane 1 contains the 100 bp ladder. The green circle denotes the desired intramolecular ligation product, while the red circle denotes larger intermolecular products. Lane 4 contains fusion PCR product as a marker for size comparison.

Next, I moved on to the inverse PCR using product from this ligation as the template. Initially, the inverse PCRs were unsuccessful or produced bands several kb in size, but the issue was fixed when the annealing temperatures were corrected to suit Phusion polymerase (NEB) and a higher concentration of SDS was added to the loading dye. Successful inverse PCRs produced the desired band along with some undesired species (Figure 14): one band of the right size and two larger bands double and triple the size of the LASSO probe. These larger bands suggest that the ligation reaction also produced circular multimers of corresponding size, present at amounts too low to be seen on the gel (Figure 13) but sufficient to be amplified by PCR.


Figure 14. Agarose gel for the inverse PCR. Lane 1 contains the 100 bp ladder. Lane 3 contains the inverse PCR. The green circle denotes the desired product, while the red circles denote undesired DNA species produced by the reaction. Lane 4 contains fusion PCR product as a marker for size comparison. Lane 5 is the inverse PCR no-template negative control. Lane 2 is empty.

The next steps are the final enzymatic reactions that create a functional LASSO probe: the inverse PCR product is treated with BspQI, exonuclease and USER enzyme to convert it into its final single-stranded form. The probes are purified after digestion, but my attempts at purification did not recover product, possibly because of the inefficient inverse PCR. The inverse PCR produced multiple product species, and the intended one was less prevalent than the other two. The inverse PCR product is purified, used in the final enzymatic steps and then purified one final time; multiple purification steps starting from a low yield likely led to the loss of the small amount of LASSO probe present. This, combined with the findings from the previous sections, indicates that constructing LASSO probes is a difficult process that, without refinement, would not be feasible in most circumstances.
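The effect of several sequential purifications on an already small yield compounds multiplicatively. The sketch below makes that arithmetic explicit; the starting mass, the number of steps and the 60% per-step recovery are hypothetical values chosen only for illustration, not measurements from these experiments.

```python
from typing import Sequence


def remaining_ng(start_ng: float, recoveries: Sequence[float]) -> float:
    """Mass left after sequential purification steps with fractional recoveries."""
    mass = start_ng
    for r in recoveries:
        mass *= r  # each step keeps only a fraction of its input
    return mass


# Hypothetical: 10 ng of correct-size product and three column purifications,
# each assumed to recover 60% of its input.
left = remaining_ng(10.0, [0.6, 0.6, 0.6])
```

Under these assumptions only about a fifth of the starting material survives, so a product that is already a minority species after inverse PCR can easily fall below detection.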


3.5.3 Oligo Pool Mediated Gene Capture

When the LASSO protocol was published, DNA oligo synthesis technology did not allow cheap synthesis of complete LASSO probes, which is why they were ordered as pre-LASSOs that were processed into their final configurations. Recently, new options for synthetic oligos have become available, and pooled oligos can now reach 350 bp. This allows commercial synthesis of full-length, complete LASSO probes, an exceedingly valuable option because it bypasses the intensive and difficult process of generating LASSO probes from the shorter pre-LASSOs. To test whether these new synthetic oligo pools (Opools) can capture a gene as the LASSO probes could83, I ordered a pool of 12 oligos, designed under conditions similar to those used in the LASSO protocol, to see if these probes could capture N. parisii genes.

Opool LASSO probes ordered from Integrated DNA Technologies (IDT) have several synthesis requirements and limitations, the most important being a limit of 350 bp per oligo. I designed 12 probes to capture a collection of N. parisii genes: three genes from the HE Pilot, three other randomly selected HE proteins and six proteins not predicted to be HE (see Appendix 1). The Opool LASSO probes were designed with parameters from both the original LASSO protocol and the Two-Step PCR cloning, so that the capture protocols from LASSO could be combined with the screening systems of the Two-Step method. Barcodes, yeast synthetic terminators and Gateway cloning sites were added, while SPs and TMDs were removed. Capture with the N. parisii Opool LASSO probes was attempted twice (see Methods Section 2.5.4), and no amplification of target genes was seen. Extensive smearing was visible on the gels (Figure 15), which was later found to be caused by running too many cycles on reactions containing carried-over Phusion buffer; lowering the cycle number removed the smearing. A new Opool was then designed targeting E. coli K-12 genes as positive controls, as these genes had previously been captured using the LASSO protocol83 (see Appendices 1 and 2).
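Fitting arms, adapter, barcodes and cloning sites into a single oligo is a length-budget problem. The sketch below checks a design against the 350-base limit mentioned above; the component names and lengths in the example are hypothetical placeholders, not the actual probe design from Appendix 1.

```python
from typing import Dict

MAX_OPOOL_LENGTH = 350  # per-oligo synthesis limit cited in the text


def check_probe_budget(components: Dict[str, int], limit: int = MAX_OPOOL_LENGTH) -> int:
    """Return the bases left under the synthesis limit; raise if the design is too long."""
    total = sum(components.values())
    if total > limit:
        raise ValueError(f"probe is {total} nt, exceeding the {limit} nt limit")
    return limit - total


# Hypothetical component lengths for a single probe design.
example_design = {
    "ligation_arm": 40,
    "extension_arm": 40,
    "adapter": 200,
    "barcode": 12,
    "gateway_sites": 50,
}
spare = check_probe_budget(example_design)
```

Raising an error rather than silently truncating keeps an over-length design from being submitted, since every added element (for example a terminator) must be paid for by shortening something else.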


Figure 15. Opool PCR using N. parisii probes. Lanes: 1, 1 kb+ ladder; 2, PCR +; 3, Capture -; 4, PCR -. Smearing was visible in all lanes but did not always appear the same. Capture negative (Lane 3) is PCR on a capture reaction with no genomic DNA template. PCR negative (Lane 4) is a PCR with no capture reaction. Image is representative of all similar trials.

Using the E. coli K-12 Opools allowed more rapid trials, owing to the ease of acquiring genomic DNA. Smearing was present in the first few trials but was found to be caused by running too many cycles in Opool PCRs that contained Phusion buffer carried over from the previous step, the Opool capture. Although resolving the smearing clarified the capture efficiency of each reaction, subsequent Opool capture trials revealed that none of the target genes were being captured (Figure 16). The cause was not determined, which led me to conclude that Opool-based gene capture was not possible under these conditions. One potential cause is the low proportion of complete synthetic oligos in the Opools: at the probe length recommended for capture in the original LASSO protocol, fewer than 20% of the probes would be the correct length.
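The low fraction of full-length probes follows from stepwise chemical synthesis: each base addition succeeds with some coupling efficiency, and a full-length oligo needs every coupling to succeed. The sketch below uses the standard approximation of efficiency raised to the number of couplings; the 99.5% average coupling efficiency is an assumed, vendor-dependent value.

```python
def full_length_fraction(length_nt: int, coupling_efficiency: float) -> float:
    """Approximate fraction of full-length oligos from stepwise synthesis.

    The first base is attached to the support, so an oligo of length n needs
    n - 1 successful couplings: fraction ~= coupling_efficiency ** (n - 1).
    """
    return coupling_efficiency ** (length_nt - 1)


# Assumed average coupling efficiency of 99.5% (hypothetical; vendor-dependent).
for n in (150, 250, 350):
    print(n, round(full_length_fraction(n, 0.995), 3))
```

Under this assumption a 350-base probe is full length less than 20% of the time, so most molecules in such a pool would lack part of an arm or adapter.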


Figure 16. Opool PCR using E. coli probes. Lanes 1 to 6; gel labels: 1 kb+ ladder, PCR +, Capture -, PCR -. Capture negative (Lane 4) is PCR on a capture reaction with no genomic DNA template. PCR negative (Lane 6) is a PCR with no capture reaction. Smearing is visible in Lane 6 but was not seen in subsequent trials. Image is representative of all similar trials.

3.6 Future Directions

Replicating or modifying the techniques from the published LASSO protocol was not as fruitful as desired. Future work should use a simpler, sequential process to determine the feasibility of the LASSO method. First, gene capture should be demonstrated using a synthetic, pre-constructed LASSO probe before the generation of LASSO probes is pursued and tested. Although a commercial oligo of that size would not be inexpensive, it would be a good first step to determine whether LASSO capture is possible and which reaction conditions it requires. Following successful capture with the commercial LASSO probe, the next step would be to use the LASSO protocol to generate a LASSO probe for capture. This step would test whether the LASSO probe generation protocol is feasible and whether any changes are necessary or could improve the process. Additionally, once a LASSO probe has been successfully generated and is ready for capture, the synthetic probe would serve as a positive control for gene capture.


The next challenge would be to determine whether multiple-gene capture is possible using the LASSO protocol. There are several potential routes to test this: purchasing additional commercial LASSO probes or generating multiple LASSO probes using the LASSO protocol. The former would be more costly than the latter but would circumvent the need to generate the probes before capture, allowing multiple-gene capture to be tested without the confounding variables introduced by generating multiple LASSO probes at once. The generation of multiple probes using the LASSO protocol could then be tested in isolation, with the synthetic mixture serving as a positive control for capture, as with the previous set of steps. The capture reactions should ideally begin with a small number of target genes of easily distinguishable sizes, such as a set of <1 kb, ~1 kb and >1 kb genes. Following successful capture of three genes, the number of targets can be increased sequentially to determine how it affects capture and whether any changes are needed for a larger target set. Once a working multiple-capture protocol exists, it would also serve as a positive control for testing the Opools: if the Opools fail while the synthetic or generated probe sets succeed, the Opools can be safely deemed unviable. LASSO cloning is a powerful technique; however, it has seen limited use outside the original publication, and follow-up publications from the inventors of the protocol have shown that the conditions of LASSO capture are very sensitive to change, with multiple sources of error84.
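Choosing a first multi-gene target set of "easily distinguishable sizes" can be automated from a list of candidate gene lengths. The sketch below greedily keeps genes, shortest first, only when each is at least a set ratio longer than the previous pick, so neighbouring products should separate as distinct bands. The 1.3-fold ratio and the gene names and lengths are hypothetical, and the greedy pass may return fewer than k genes even when a valid set exists.

```python
from typing import Dict, List


def pick_resolvable_targets(gene_lengths: Dict[str, int], k: int,
                            min_ratio: float = 1.3) -> List[str]:
    """Greedily pick up to k genes whose lengths differ pairwise by >= min_ratio.

    Genes are scanned shortest first; a gene is kept only if it is at least
    min_ratio times longer than the last gene kept.
    """
    chosen: List[str] = []
    last_len = 0.0
    for name, length in sorted(gene_lengths.items(), key=lambda kv: kv[1]):
        if not chosen or length >= min_ratio * last_len:
            chosen.append(name)
            last_len = length
        if len(chosen) == k:
            break
    return chosen


# Hypothetical candidate genes (lengths in bp).
candidates = {"geneA": 600, "geneB": 700, "geneC": 1000, "geneD": 1600}
first_set = pick_resolvable_targets(candidates, k=3)
```

Because each kept gene is at least `min_ratio` times the previous one, every pair in the result differs by at least that ratio, giving a <1 kb, ~1 kb and >1 kb set for these toy lengths.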

Discussion:

4.1 Yeast as a tool to study microsporidian effector proteins

Leveraging yeast as a tool to study effector proteins can be a powerful and informative method. Yeast has previously been used to study effectors from a variety of different bacterial pathogens, such as Shigella and Legionella46,56,59,60. In addition, the versatility of yeast genetics allows for generation of a variety of assays to further elucidate the biology of effector proteins56,57.

A common use of yeast in the study of effector proteins is to identify effectors through their toxicity59,60. I cloned three N. parisii host-exposed proteins and used yeast to identify NEPG_01270 as a host-exposed protein that is toxic to yeast, indicating, first, that it may be an effector protein and, second, that microsporidian host-exposed proteins can be toxic to yeast, enabling the use of this model organism to identify additional novel microsporidian effectors. The status of NEPG_01270 as an effector protein is not certain, as the gene lacks a signal peptide or transmembrane domain, so the mechanism by which it is secreted into the host cell is unclear. Additionally, NEPG_01270 is annotated as UTP-glucose-1-phosphate uridylyltransferase21, an enzyme that catalyzes the reversible interconversion of UTP and glucose-1-phosphate with UDP-glucose and pyrophosphate and plays a role in glycogen metabolism77. This finding is interesting because hexokinase (HK), another metabolic enzyme, is secreted by N. parisii and other microsporidia21,42. HK has been shown to play a variety of roles in different microsporidia species. HK from Nosema bombycis and N. ceranae was found to phosphorylate host glucose, confirming the catalytic activity of the secreted protein27, while HK from Trachipleistophora hominis was found to localize to the plaque matrix (the interface between the meront and its host cell) and was shown to influence glucose metabolism in the host cells26. HK from N. bombycis and Antonospora locustae was found to localize to the host nucleus11,50,51, suggesting a role in host cell transcriptional regulation, and downregulation of N. bombycis HK impaired the microsporidium's proliferation50. In the reverse direction, UTP-glucose-1-phosphate uridylyltransferase can release glucose-1-phosphate from UDP-glucose, and it is possible that N. parisii, and other species that secrete this enzyme, use it to make glucose-derived metabolites available in the host cell for their benefit. N. parisii depends on the host cell for the nutrients it uses to replicate and may be using this enzyme to access host glucose. The enzyme has no SP or TMD and was found to localize to the cytoplasm, where it may be influencing host metabolism, but it was also found in the nucleus, where it may be playing a less understood role.
The activity and metabolic capability of this enzyme are unknown; future studies should examine its enzymatic activity, as was done for HK27, and any role it may play in transcriptional regulation when present in the nucleus. Because UTP-glucose-1-phosphate uridylyltransferase is conserved in other microsporidia, even those outside the Nematocida genus such as Encephalitozoon hellem, understanding the role of the N. parisii gene may shed light on its orthologs.

I then generated a gene set of 112 N. parisii genes, 72 of which encode host-exposed proteins42, to identify novel microsporidian effector proteins and to learn the rules governing heterologous expression of microsporidian proteins in yeast. Of these genes, 97 were successfully cloned into yeast expression vectors and 85 were arrayed into a master plate, ready for screening. I conducted this screen to identify novel microsporidian effector candidates and to better understand the rules governing the expression of microsporidian genes in yeast, such as the influence of SPs and TMDs on yeast growth and whether, as seen in bacteria57,59, microsporidian secreted proteins have higher rates of toxicity than non-secreted proteins. The screen revealed a total of 23 genes that were toxic to yeast, of which only five were annotated. Of the three annotated HE genes, two encode proteins involved in glucose metabolism77,85, while the third, adaptin, is important in vesicle formation86. The metabolic proteins were UTP-glucose-1-phosphate uridylyltransferase, identified earlier, and phosphoglycerate kinase. The detection of these proteins, along with hexokinase, as host-exposed proteins suggests that N. parisii may be modulating host metabolism to free the nutrients needed for replication. The secretion of adaptin may play a variety of roles during infection. Progeny N. parisii spores are known to exit the host cell in a non-lytic manner that depends on host factors such as RAB-1138. It is possible that the release of adaptin plays a role in this process, as the identification of host-exposed proteins would have captured secreted proteins involved in the vesicle formation step of the microsporidian life cycle21,42. The Non HE genes showed higher rates of toxicity than the other groups. Only two of these genes had available annotations: one is a 14-3-3 protein family member and the other is 60S Ribosomal Protein 7A. In yeast, 14-3-3 proteins have been shown to interfere with cellular trafficking87 when overexpressed, and some ribosomal proteins can cause growth defects when overexpressed88,89. The results from the screen suggest that the inclusion of signal peptides and transmembrane domains can influence yeast growth phenotypes, as the SP/TM group had a far lower rate of toxicity than the HE genes, and two of the toxic HE genes, NEPG_01266 and NEPG_02543, were not toxic when these elements were present.
The findings also revealed that HE genes and Non HE genes have very similar rates of toxicity, which contrasts with the trends seen for bacterial effector proteins57,59. An additional screen with a larger sample size for each group would be needed to support these findings.

The rates of toxicity seen in this screen raise the issue of the shared evolutionary history of yeast and microsporidia. The toxic genes from this screen may be novel microsporidian effectors, but their toxicity may instead reflect a phenotype that would also be seen upon overexpression of homologous yeast genes. One such example is UTP-glucose-1-phosphate uridylyltransferase, which caused measurable toxicity in yeast; however, overexpression of the yeast UTP-glucose-1-phosphate uridylyltransferase (UGP1) has been shown to cause growth defects in the presence of galactose90. Around 15% of yeast genes have been found to be toxic when overexpressed in yeast88, and microsporidia have a highly reduced genome compared to yeast15,21, which may result in an enrichment for toxic or non-toxic genes. The data from this screen suggest that N. parisii is enriched for toxic genes, as the Non HE group had a toxicity rate of 40%; however, a larger sample size is needed to draw meaningful conclusions. It is difficult to determine whether the toxicity seen in yeast is unique to the microsporidian gene or shared with its yeast homologue. The lack of genome annotation for microsporidia makes it challenging to identify genes and check the toxicity of any available yeast homologues. Various techniques, including those mentioned in the Future Directions section, would be useful in understanding the toxic genes from this screen. Although some toxicity may arise from functions shared with yeast, other toxic genes may be novel microsporidian effector proteins. Future studies focusing on the interactions of these toxic genes may reveal the functions of the proteins, and comparing these results to the interaction patterns seen in yeast may identify a homologue or reveal whether the toxicity is unique to microsporidia.
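Why a larger sample size matters can be shown with a confidence interval on a toxicity rate. The sketch below computes a Wilson score interval, a standard interval for a binomial proportion; the group sizes (4 of 10 versus 40 of 100 genes) are hypothetical and only illustrate how the same observed 40% rate becomes far more informative as the group grows, for example when judging it against the roughly 15% overexpression-toxicity rate reported for yeast genes.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (95% by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical group sizes with the same observed 40% toxicity rate.
small = wilson_interval(4, 10)
large = wilson_interval(40, 100)
```

With ten genes the interval is several times wider than with a hundred, so a 40% rate in a small group cannot by itself establish enrichment for toxic genes.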

4.2 High-throughput LASSO-based gene capture remains a challenge

The ability to clone many genes quickly, even at the scale of whole ORFeomes, would be a powerful tool. It would enable researchers to understand the functions of an organism's ORFeome by coupling high-throughput cloning techniques with functional genomic technologies of similar throughput. Reading (sequencing) DNA is far cheaper than writing (synthesizing) it, and combined with the abundance of available genomic data, this has created a growing bottleneck between knowing the sequence of a gene and knowing its function. LASSO is a promising tool because it uses cheaper pooled oligo synthesis91 to capture genes for functional genomic screening. LASSO cloning is capable of capturing whole ORFeomes, making the screening of multiple genomes a less daunting task83. For a poorly understood group of organisms like microsporidia, methods such as LASSO cloning combined with high-throughput functional genomics could greatly accelerate our understanding of these pathogens.

My efforts to use LASSO cloning to capture N. parisii genes were not successful; however, several insights can be gleaned from them. LASSO cloning is a powerful but challenging method. Creating LASSO probes is difficult and requires many steps, some far more challenging and liable to fail than others. Many of these steps are complicated and need further changes to make them simpler, more reliable and repeatable. A flaw in the process is the lack of visualization steps to verify that it is working as intended, which makes it difficult to know when a step has failed. A further issue with changing the protocol was revealed by the original developers of the method two years after publication of the original protocol: changing variables in steps such as the intramolecular ligation can hamper the outcome of subsequent steps, such as gene capture84. Attempts to bypass LASSO probe assembly using Opools were unsuccessful, and the cause of failure remains unclear. It may be due to the low proportion of full-length probes, altered capture conditions or another cause. Regardless, further work on LASSO-based gene capture should adopt a more controlled, sequential process to test the feasibility and capability of the technique, as discussed earlier.

4.3 Thesis Summary

In conclusion, I have shown that yeast can be a powerful tool to study host-exposed proteins of the Caenorhabditis elegans intracellular pathogen Nematocida parisii. Yeast has already been used extensively to study bacterial effector proteins, and I have demonstrated that it may be equally valuable for studying microsporidian effector proteins, an aspect of microsporidian infection biology that demands much greater understanding. The N. parisii gene NEPG_01270 encodes UTP-glucose-1-phosphate uridylyltransferase, is toxic to yeast and can cause visible growth defects, suggesting that it may be a novel microsporidian effector protein, a premise strengthened by the use of other metabolic proteins by an assortment of microsporidia. I created a collection of 97 N. parisii genes for use in a functional genomic screen to learn more about microsporidian secreted proteins. This screen has the potential to be very informative, revealing the identities of potential microsporidian effector proteins and the patterns governing microsporidian secreted proteins, such as the influence of signal peptides and transmembrane domains on yeast growth and the rates of toxicity for secreted and non-secreted proteins. These potential effector proteins are interesting candidates for future study, to learn more about the breadth of functions of these secreted factors.

References:

1. Taylor, L. H., Latham, S. M. & Woolhouse, M. E. Risk factors for human disease emergence. Philos Trans R Soc Lond B Biol Sci 356, 983–989 (2001).

2. Alvar, J. et al. Leishmaniasis Worldwide and Global Estimates of Its Incidence. PLoS ONE 7, e35671 (2012).

3. Sidik, S. M. et al. A Genome-wide CRISPR Screen in Toxoplasma Identifies Essential Apicomplexan Genes. Cell 166, 1423-1435.e12 (2016).

4. Bliska, J. B. & Casadevall, A. Intracellular pathogenic bacteria and fungi — a case of convergent evolution? Nature Reviews Microbiology 7, 165–171 (2009).

5. Stentiford, G. D. et al. Microsporidia – Emergent Pathogens in the Global Food Chain. Trends in Parasitology 32, 336–348 (2016).

6. Billings, A. N., Teltow, G. J., Weaver, S. C. & Walker, D. H. Molecular characterization of a novel Rickettsia species from Ixodes scapularis in Texas. Emerg Infect Dis 4, 305–309 (1998).

7. Lamason, R. L. et al. Rickettsia Sca4 Reduces Vinculin-Mediated Intercellular Tension to Promote Spread. Cell 167, 670-683.e10 (2016).

8. Zhang, G. et al. A Large Collection of Novel Nematode-Infecting Microsporidia and Their Diverse Interactions with Caenorhabditis elegans and Other Related Nematodes. PLOS Pathogens 12, e1006093 (2016).

9. Keeling, P. J. & Fast, N. M. Microsporidia: Biology and Evolution of Highly Reduced Intracellular Parasites. Annual Review of Microbiology 56, 93–116 (2002).

10. Han, B. & Weiss, L. M. Microsporidia: Obligate Intracellular Pathogens Within the Fungal Kingdom. Microbiology Spectrum 5, (2017).

11. Szumowski, S. C. & Troemel, E. R. Microsporidia-Host Interactions. Curr Opin Microbiol 26, 10–16 (2015).

12. Wadi, L. & Reinke, A. W. Evolution of microsporidia: An extremely successful group of eukaryotic intracellular parasites. PLoS Pathog 16, e1008276 (2020).

13. Johnson, R. M., Evans, J. D., Robinson, G. E. & Berenbaum, M. R. Changes in transcript abundance relating to colony collapse disorder in honey bees (Apis mellifera). Proc Natl Acad Sci U S A 106, 14790–14795 (2009).

14. Troemel, E. R. New Models of Microsporidiosis: Infections in Zebrafish, C. elegans, and Honey Bee. PLOS Pathogens 7, e1001243 (2011).

15. Katinka, M. D. et al. Genome sequence and gene compaction of the eukaryote parasite Encephalitozoon cuniculi. Nature 414, 450–453 (2001).

16. Quandt, C. A. et al. The genome of an intranuclear parasite, Paramicrosporidium saccamoebae, reveals alternative adaptations to obligate intracellular parasitism. eLife 6, e29594 (2017).

17. Goldberg, A. V. et al. Localization and functionality of microsporidian iron–sulphur cluster assembly proteins. Nature 452, 624–628 (2008).

18. Major, P. et al. A new family of cell surface located purine transporters in Microsporidia and related fungal endoparasites. eLife 8, e47037 (2019).

19. Melnikov, S. V. et al. Error-prone protein synthesis in parasites with the smallest eukaryotic genome. Proc. Natl. Acad. Sci. U.S.A. 115, E6245–E6253 (2018).

20. Barandun, J., Hunziker, M., Vossbrinck, C. R. & Klinge, S. Evolutionary compaction and adaptation visualized by the structure of the dormant microsporidian ribosome. Nat Microbiol 4, 1798–1804 (2019).

21. Cuomo, C. A. et al. Microsporidian genome analysis reveals evolutionary strategies for obligate intracellular growth. Genome Res 22, 2478–2488 (2012).

22. Whelan, T. A., Lee, N. T., Lee, R. C. H. & Fast, N. M. Microsporidian Introns Retained against a Background of Genome Reduction: Characterization of an Unusual Set of Introns. Genome Biol Evol 11, 263–269 (2019).

23. Luallen, R. J. et al. Discovery of a Natural Microsporidian Pathogen with a Broad Tissue Tropism in Caenorhabditis elegans. PLoS Pathog 12, (2016).

24. Troemel, E. R. & Becnel, J. J. Genome analysis and polar tube firing dynamics of mosquito-infecting microsporidia. Fungal Genet Biol 83, 41–44 (2015).

25. Han, B. et al. Microsporidia Interact with Host Cell Mitochondria via Voltage-Dependent Anion Channels Using Sporoplasm Surface Protein 1. mBio 10, e01944-19 (2019).

26. Ferguson, S. & Lucocq, J. The invasive cell coat at the microsporidian Trachipleistophora hominis–host cell interface contains secreted hexokinases. MicrobiologyOpen 8, e00696 (2019).

27. Dolgikh, V. V., Tsarev, A. A., Timofeev, S. A. & Zhuravlyov, V. S. Heterologous overexpression of active hexokinases from microsporidia Nosema bombycis and Nosema ceranae confirms their ability to phosphorylate host glucose. Parasitol Res 118, 1511–1518 (2019).

28. Scanlon, M., Leitch, G. J., Shaw, A. P., Moura, H. & Visvesvara, G. S. Susceptibility to apoptosis is reduced in the Microsporidia-infected host cell. J. Eukaryot. Microbiol. 46, 34S-35S (1999).

29. Weber, R., Bryan, R. T., Schwartz, D. A. & Owen, R. L. Human Microsporidial Infections. Clin. Microbiol. Rev. 7, 36 (1994).

30. Sak, B., Kváč, M., Kučerová, Z., Květoňová, D. & Saková, K. Latent Microsporidial Infection in Immunocompetent Individuals – A Longitudinal Study. PLoS Negl Trop Dis 5, (2011).

31. Texier, C., Vidau, C., Viguès, B., El Alaoui, H. & Delbac, F. Microsporidia: a model for minimal parasite–host interactions. Current Opinion in Microbiology 13, 443–449 (2010).

32. Mathis, A., Weber, R. & Deplazes, P. Zoonotic Potential of the Microsporidia. Clinical Microbiology Reviews 18, 423–445 (2005).

33. Troemel, E. R., Félix, M.-A., Whiteman, N. K., Barrière, A. & Ausubel, F. M. Microsporidia Are Natural Intracellular Parasites of the Nematode Caenorhabditis elegans. PLoS Biol 6, e309 (2008).

34. Troemel, E. R. Host-Microsporidia Interactions in Caenorhabditis elegans, a Model Nematode Host. 6 (2020).

35. Balla, K. M., Luallen, R. J., Bakowski, M. A. & Troemel, E. R. Cell-to-cell spread of microsporidia causes Caenorhabditis elegans organs to form syncytia. Nature Microbiology 1, 16144 (2016).

36. Blattner, F. R. et al. The Complete Genome Sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).

37. Serres, M. H. et al. A functional update of the Escherichia coli K-12 genome. Genome Biol 2, research0035.1-research0035.7 (2001).

38. Szumowski, S. C., Botts, M. R., Popovich, J. J., Smelkinson, M. G. & Troemel, E. R. The small GTPase RAB-11 directs polarized exocytosis of the intracellular pathogen N. parisii for fecal-oral transmission from C. elegans. PNAS 111, 8215–8220 (2014).

39. Ooij, C. van et al. The Malaria Secretome: From Algorithms to Essential Function in Blood Stage Infection. PLOS Pathogens 4, e1000084 (2008).

40. Shohdy, N., Efe, J. A., Emr, S. D. & Shuman, H. A. Pathogen effector protein screening in yeast identifies Legionella factors that interfere with membrane trafficking. Proceedings of the National Academy of Sciences 102, 4866–4871 (2005).

41. Mou, X., Souter, S., Du, J., Reeves, A. Z. & Lesser, C. F. Synthetic bottom-up approach reveals the complex interplay of Shigella effectors in regulation of epithelial cell death. PNAS 115, 6452–6457 (2018).

42. Reinke, A. W., Balla, K. M., Bennett, E. J. & Troemel, E. R. Identification of microsporidia host-exposed proteins reveals a repertoire of rapidly evolving proteins. Nat Commun 8, 1–11 (2017).

43. Gomez-Valero, L. et al. More than 18,000 effectors in the Legionella genus genome provide multiple, independent combinations for replication in human cells. PNAS 116, 2265–2273 (2019).

44. Burstein, D. et al. Genome-Scale Identification of Legionella pneumophila Effectors Using a Machine Learning Approach. PLoS Pathog 5, (2009).

45. Siggers, K. A. & Lesser, C. F. The Yeast Saccharomyces cerevisiae: A Versatile Model System for the Identification and Characterization of Bacterial Virulence Proteins. Cell Host & Microbe 4, 8–15 (2008).

46. Urbanus, M. L. et al. Diverse mechanisms of metaeffector activity in an intracellular bacterial pathogen, Legionella pneumophila. Mol Syst Biol 12, (2016).

47. English, E. D., Adomako-Ankomah, Y. & Boyle, J. P. Secreted effectors in Toxoplasma gondii and related species: determinants of host range and pathogenesis? Parasite Immunol 37, 127–140 (2015).

48. Matta, S. K. et al. Toxoplasma gondii effector TgIST blocks type I interferon signaling to promote infection. Proc Natl Acad Sci USA 116, 17480–17491 (2019).

49. Matthews, K. M., Pitman, E. L. & de Koning-Ward, T. F. Illuminating how malaria parasites export proteins into host erythrocytes. Cell. Microbiol. 21, e13009 (2019).

50. Huang, Y. et al. A secretory hexokinase plays an active role in the proliferation of Nosema bombycis. PeerJ 6, e5658 (2018).

51. Senderskiy, I. V., Timofeev, S. A., Seliverstova, E. V., Pavlova, O. A. & Dolgikh, V. V. Secretion of Antonospora (Paranosema) locustae Proteins into Infected Cells Suggests an Active Role of Microsporidia in the Control of Host Programs and Metabolic Processes. PLoS ONE 9, e93585 (2014).

52. Bao, J. et al. Nosema bombycis suppresses host hemolymph melanization through secreted serpin 6 inhibiting the prophenoloxidase activation cascade. Journal of Invertebrate Pathology 168, 107260 (2019).

53. Botstein, D., Chervitz, S. A. & Cherry, J. M. Yeast as a Model Organism. Science 277, 1259–1260 (1997).

54. Goffeau, A. et al. Life with 6000 genes. Science 274, 546, 563–567 (1996).

55. Patrick, K. L. et al. Quantitative Yeast Genetic Interaction Profiling of Bacterial Effector Proteins Uncovers a Role for the Human Retromer in Salmonella Infection. Cell Systems 7, 323-338.e6 (2018).

56. Kramer, R. W. et al. Yeast Functional Genomic Screens Lead to Identification of a Role for a Bacterial Effector in Innate Immunity Regulation. PLOS Pathogens 3, e21 (2007).

57. Curak, J., Rohde, J. & Stagljar, I. Yeast as a tool to study bacterial effectors. Current Opinion in Microbiology 12, 18–23 (2009).

58. Hanson, P. K. Saccharomyces cerevisiae: A Unicellular Model Genetic Organism of Enduring Importance. Current Protocols Essential Laboratory Techniques 16, e21 (2018).

59. Slagowski, N. L., Kramer, R. W., Morrison, M. F., LaBaer, J. & Lesser, C. F. A Functional Genomic Yeast Screen to Identify Pathogenic Bacterial Proteins. PLOS Pathogens 4, e9 (2008).

60. Campodonico, E. M., Chesnel, L. & Roy, C. R. A yeast genetic system for the identification and characterization of substrate proteins transferred into host cells by the Legionella pneumophila Dot/Icm system. Mol. Microbiol. 56, 918–933 (2005).

61. Land, M. et al. Insights from 20 years of bacterial genome sequencing. Funct Integr Genomics 15, 141–161 (2015).

62. Sisko, J. L., Spaeth, K., Kumar, Y. & Valdivia, R. H. Multifunctional analysis of Chlamydia-specific genes in a yeast expression system. Molecular Microbiology 60, 51–66 (2006).

63. Nielsen, H., Engelbrecht, J., Brunak, S. & von Heijne, G. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10, 1–6 (1997).

64. Almagro Armenteros, J. J. et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nature Biotechnology 37, 420–423 (2019).

65. Sonnhammer, E. L. L. & Krogh, A. A hidden Markov model for predicting transmembrane helices in protein sequences. 8.

66. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. Journal of Molecular Biology 305, 567–580 (2001).

67. Möller, S., Croning, M. D. & Apweiler, R. Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17, 646–653 (2001).

68. Curran, K. A. et al. Short Synthetic Terminators for Improved Heterologous Gene Expression in Yeast. ACS Synth. Biol. 4, 824–832 (2015).

69. Hawkins, J. A., Jones, S. K., Finkelstein, I. J. & Press, W. H. Indel-correcting DNA barcodes for high-throughput sequencing. PNAS 115, E6217–E6226 (2018).

70. Moore, S. D. & Prevelige, P. E. A P22 scaffold protein mutation increases the robustness of head assembly in the presence of excess portal protein. J. Virol. 76, 10245–10255 (2002).

71. Marsischky, G. & LaBaer, J. Many Paths to Many Clones: A Comparative Look at High-Throughput Cloning Methods. Genome Res. 14, 2020–2028 (2004).

72. Matsuyama, A. & Yoshida, M. Systematic Cloning of an ORFeome Using the Gateway System. in Reverse Chemical Genetics: Methods and Protocols (ed. Koga, H.) 11–24 (Humana Press, 2009). doi:10.1007/978-1-60761-232-2_2.

73. Tan, Y., Arnold, R. J. & Luo, Z.-Q. Legionella pneumophila regulates the small GTPase Rab1 activity by reversible phosphorylcholination. Proc. Natl. Acad. Sci. U.S.A. 108, 21212–21217 (2011).

74. Pan, X., Lührmann, A., Satoh, A., Laskowski-Arce, M. A. & Roy, C. R. Ankyrin repeat proteins comprise a diverse family of bacterial type IV effectors. Science 320, 1651–1654 (2008).

75. Brüggemann, H., Cazalet, C. & Buchrieser, C. Adaptation of Legionella pneumophila to the host environment: role of protein secretion, effectors and eukaryotic-like proteins. Current Opinion in Microbiology 9, 86–94 (2006).

76. Alberti, S., Gitler, A. D. & Lindquist, S. A suite of Gateway® cloning vectors for high-throughput genetic analysis in Saccharomyces cerevisiae. Yeast 24, 913–919 (2007).

77. Führing, J. I. et al. A Quaternary Mechanism Enables the Complex Biological Functions of Octameric Human UDP-glucose Pyrophosphorylase, a Key Enzyme in Cell Metabolism. Sci Rep 5, (2015).

78. Slamovits, C. H., Burri, L. & Keeling, P. J. Characterization of a Divergent Sec61β Gene in Microsporidia. Journal of Molecular Biology 359, 1196–1202 (2006).

79. Smith, A. M. et al. Competitive Genomic Screens of Barcoded Yeast Libraries. J Vis Exp (2011) doi:10.3791/2864.

80. Pierce, S. E., Davis, R. W., Nislow, C. & Giaever, G. Genome-wide analysis of barcoded Saccharomyces cerevisiae gene-deletion mutants in pooled cultures. Nat Protoc 2, 2958–2974 (2007).

81. Dunham, W. H., Mullin, M. & Gingras, A.-C. Affinity-purification coupled to mass spectrometry: Basic principles and strategies. Proteomics 12, 1576–1590 (2012).

82. Bosis, E., Salomon, D. & Sessa, G. A Simple Yeast-Based Strategy to Identify Host Cellular Processes Targeted by Bacterial Effector Proteins. PLoS One 6, (2011).

83. Tosi, L. et al. Long-adapter single-strand oligonucleotide probes for the massively multiplexed cloning of kilobase genome regions. Nat Biomed Eng 1, 0092 (2017).

84. Shukor, S., Tamayo, A., Tosi, L., Larman, H. B. & Parekkadan, B. Quantitative assessment of LASSO probe assembly and long-read multiplexed cloning. BMC Biotechnology 19, 50 (2019).

85. Watson, H. C. et al. Sequence and structure of yeast phosphoglycerate kinase. EMBO J 1, 1635–1640 (1982).

86. Boehm, M. & Bonifacino, J. S. Adaptins. Mol Biol Cell 12, 2907–2920 (2001).

87. van Heusden, G. P. H. & Yde Steensma, H. Yeast 14-3-3 proteins. Yeast 23, 159–171 (2006).

88. Sopko, R. et al. Mapping Pathways and Phenotypes by Systematic Gene Overexpression. Molecular Cell 21, 319–330 (2006).

89. Boyer, J. et al. Large-scale exploration of growth inhibition caused by overexpression of genomic fragments in Saccharomyces cerevisiae. Genome Biol 5, R72 (2004).

90. Daran, J. M., Dallies, N., Thines-Sempoux, D., Paquet, V. & François, J. Genetic and Biochemical Characterization of the UGP1 Gene Encoding the UDP-Glucose Pyrophosphorylase from Saccharomyces cerevisiae. European Journal of Biochemistry 233, 520–530 (1995).

91. Kosuri, S. & Church, G. M. Large-scale de novo DNA synthesis: technologies and applications. Nature Methods 11, 499–507 (2014).


Appendices:

Appendix 1: N. parisii Opool Probe List

Gene   Signal Peptide   TM#   Gene Length (bp)   Probe

A

GG

GAAATTTAA

NEPG_01270 N 0 1512 GGATGATAA GGTTGTAACA G ATGTCATTGA TCATTAGTAT ATATTTAATA AAGAGTATCA TCTTTCAAAC CGCAACACAC GATCTTCACA AACCCAGCTT TCTTGTACAA AGTGGTCCCC TCTCTTCAGG ATCGTGCCAT TGGAGGCTC CTACAGTAGT TTCTCAAATT TCTCAAATAC TCTAAATTCC AGTTCTACCT CCCTTCGCAG TTGCCTGTGC TCAAGCTCTA ACTCCCTCTT CCTCTTCC CCGGGGGGA CAAGTTTGT CAAAAAAGC AGGCTCAATG GGAGACAAC AGTTCACAGG AAGTAGGAC C

NEPG_01777 N 2 678 GCAGAGTTTA CTAGATTAAT AGAAAAACA TCCGTTATAT ACTGGATTAT ATTAGTATAT ATTTAATAAA GAGTATCATC TTTCAAACCG CAACACAAG AGTAGTCGG AACCCAGCTT TCTTGTACAA AGTGGTCCCC TCTCTTCAGG ATCGTGCCAT TGGAGGCTC CTACAGTAGT TTCTCAAATT TCTCAAATAC TCTAAATTCC AGTTCTACCT CCCTTCGCAG TTGCCTGTGC TCAAGCTCTA ACTCCCTCTT CCTCTTCCGG CCGGGGGGA CAAGTTTGTA CAAAAAAGC AGGCTCAATG CATACACTAA GTAATCATCA AC

TCAA

CCCAGC

NEPG_02635 Y 0 2751 GTTCCTGAAA GTTCTTCAAC CAATCCAATT ACTATATAGT ATATATTTAA TAAAGAGTAT CATCTTTCAA ACCGCAACAA TATGCAACCA GCAA TTTCTTGTAC AAAGTGGTCC CCTCTCTTCA GGATCGTGC CATTGGAGG CTCCTACAGT AGTTTC ATTTCTCAAA TACTCTAAAT TCCAGTTCTA CCTCCCTTCG CAGTTGCCTG TGCTCAAGCT CTAACTCCCT CTTCCTCTTCC GGCCGGGGG GACAAGTTTG TACAAAAAA GCAGGCTCA ATGATCATGA AACTATTAAT AATGATGTAC ACAGTGTGT GC

GCCGGG

GAAAAAG

CG

NEPG_00061 N 0 733 CA AGATAAACA AGAAATTGCT TAGTATAATA GAAGGATAG TATATATTTA ATAAAGAGT ATCATCTTTC AAACCGCAAC AACAACAACA ACCGAACCCA GCTTTCTTGT ACAAAGTGG TCCCCTCTCTT CAGGATCGT GCCATTGGA GGCTCCTACA GTAGTTTCTC AAATTTCTCA AATACTCTAA ATTCCAGTTC TACCTCCCTT CGCAGTTGCC TGTGCTCAAG CTCTAACTCC CTCTTCCTCTT C GGGACAAGT TTGTACAAAA AAGCAGGCT CAATGGTGCA GTGTGAAGA AGAACTTATA GC

TCTCAA

TT

NEPG_00805 N 1 399 CTAACACTTC CAAAACTAGC AAACCTACAA TCACATAGTA TATATTTAAT AAAGAGTATC ATCTTTCAAA CCGCAACAAC AACAAGGTA GGAACCCAG CTTTCTTGTA CAAAGTGGTC CCCTCTCTTC AGGATCGTG CCATTGGAG GCTCCTACAG TAGTTTCTCA AA ATACTCTAAA TTCCAGTTCT ACCTCCCTTC GCAGTTGCCT GTGCTCAAGC TCTAACTCCC TCTTCCTCTTC CGGCCGGGG GGACAAGTTT GTACAAAAA AGCAGGCTC AATGTCACAA TTACTTAGCA CATCTTCTAT AGATTTTAAA GAAG

CAA

CG

NEPG_02163 Y 6 2607 CTGCATTAGA CAAATTGATT GCACATCTAT TTCAGAAGTA GTATATATTT AATAAAGAG TATCATCTTT CAAAC CAACAACACC TCATGAACCC AGCTTTCTTG TACAAAGTG GTCCCCTCTC TTCAGGATCG TGCCATTGGA GGCTCCTACA GTAGTTTCTC AAATTTCTCA AATACTCTAA ATTCCAGTTC TACCTCCCTT CGCAGTTGCC TGTGCTCAAG CTCTAACTCC CTCTTCCTCTT CCGGCCGGG GGGACAAGT TTGTACAAAA AAGCAGGCT CAATGAAGG TTCACATCTC TCTGTTGTTT TTGATACAGG C

AG

NEPG_02370 Y 0 2646 GTTGACAGCA TTTTATGTGC TCAAGAATAC TAGTATATAT TTAATAAAGA GTATCATCTT TCAAACCGCA ACAACAACCG AGAATGAAC CCAGCTTTCT TGTACAAAGT GGTCCCCTCT CTTCAGGATC GTGCCATTGG AGGCTCCTAC AGTAGTTTCT CAAATTTCTC AAATACTCTA AATTCCAGTT CTACCTCCCT TCGCAGTTGC CTGTGCTCAA GCTCTAACTC CCTCTTCCTCT TCCGGCCGG GGGGACA TTTGTACAAA AAAGCAGGC TCAATGAAGT TTACTTTTGA GAACATTGAA GAAGTACAA AGAACCAC

G

CTACAGT

NEPG_00492 N 0 1218 GCCTGGATTT TGTCAATCAA AAAGTAACA ATTGAGTAGT ATATATTTAA TAAAGAGTAT CATCTTTCAA ACCGCAACAA CAACCTGCGC CGAACCCAGC TTTCTTGTAC AAAGTGGTCC CCTCTCTTCA GGATCGTGC CATTGGAGG CTC AGTTTCTCAA ATTTCTCAAA TACTCTAAAT TCCAGTTCTA CCTCCCTTCG CAGTTGCCTG TGCTCAAGCT CTAACTCCCT CTTCCTCTTCC GGCCGGGG GACAAGTTTG TACAAAAAA GCAGGCTCA ATGGAAGAA GACAAAAGA GAAGAAGAA AGAGAGCAG ATAACC

C

AGAGGAC

AATAAAGA

NEPG_01060 N 0 1182 GTAGAGCAA GAAAGGATT CCTGGGGTT GGGTCGGAC TTGCAGGAAT AGTATATATT T GTATCATCTT TCAAACCGCA ACAACAACG GTCTCGGAAC CCAGCTTTCT TGTACAAAGT GGTCCCCTCT CTTCAGGATC GTGCCATTGG AGGCTCCTA AGTAGTTTCT CAAATTTCTC AAATACTCTA AATTCCAGTT CTACCTCCCT TCGCAGTTGC CTGTGCTCAA GCTCTAACTC CCTCTTCCTCT TCCGGCCGG GGGGACAAG TTTGTACAAA AAAGCAGGC TCAATGGACG AGGATGAAG GG AAAGAATTG


CCTGTGCT

NEPG_01151 N 0 2841 GGCCGCTGT ACCGCACATT GGGCATGTA CTATTACAAC AATAGTATAT ATTTAATAAA GAGTATCATC TTTCAAACCG CAACAACAA GAATCGCTGA ACCCAGCTTT CTTGTACAAA GTGGTCCCCT CTCTTCAGGA TCGTGCCATT GGAGGCTCC TACAGTAGTT TCTCAAATTT CTCAAATACT CTAAATTCCA GTTCTACCTC CCTTCGCAGT TG CAAGCTCTAA CTCCCTCTTC CTCTTCCGGC CGGGGGGAC AAGTTTGTAC AAAAAAGCA GGCTCAATGC ATGAAGGAA AAAACTGCGT AGGATGCGA AGAG

C

CTC

NEPG_01796 N 0 2088 CGATTCACTG CAGTACTACA CTCAAGGGTA TTGTAAATAG TATATATTTA ATAAAGAGT ATCATCTTTC AAACCGCAAC AACAAGCGC ATTACAACCC AGCTTTCTTG TACAAAGTG GTCCCCTCT TTCAGGATCG TGCCATTGGA GGCTCCTACA GTAGTTT AAATTTCTCA AATACTCTAA ATTCCAGTTC TACCTCCCTT CGCAGTTGCC TGTGCTCAAG CTCTAACTCC CTCTTCCTCTT CCGGCCGGG GGGACAAGT TTGTACAAAA AAGCAGGCT CAATGGGAC AGGAAAAGC AGAGCTGGC TTAGAAATG

AAGAGCAT

TATATATTT

NEPG_01801 N 0 2142 CCTAGATTAT CTAAAAAGA GAAAATGCA CAGACATATA G AATAAAGAG TATCATCTTT CAAACCGCAA CAACAAGGA CCAAGTAACC CAGCTTTCTT GTACAAAGT GGTCCCCTCT CTTCAGGATC GTGCCATTGG AGGCTCCTAC AGTAGTTTCT CAAATTTCTC AAATACTCTA AATTCCAGTT CTACCTCCCT TCGCAGTTGC CTGTGCTCAA GCTCTAACTC CCTCTTCCTCT TCCGGCCGG GGGGACAAG TTTGTACAAA AAAGCAGGC TCAATGCACT TA TATGCCAGAA GAAAATGGC

Appendix 2: E. coli K-12 Opool Probe List

Gene   Gene Length (bp)   Probe

AG

AGTTTCT

aat 702 GTACCACGAT GCTTGTTTTC ACCACAAGA ATAGTATATA TTTAATAAAG AGTATCATCT TTCAAACCGC AACACACGAT CTTCACAAAC CCAGCTTTCT TGTACAAAGT GGTCCCCTCT CTTCAGGATC GTGCCATTGG AGGCTCCTAC AGT CAAATTTCTC AAATACTCTA AATTCCAGTT CTACCTCCCT TCGCAGTTGC CTGTGCTCAA GCTCTAACTC CCTCTTCCTCT TCCGGCCGG GGGGACA TTTGTACAAA AAAGCAGGC TCAATGCGCC TGGTTCAGCT TTCTCGCCAT TCAATAGCCT TCCCTTCCCC

ATAC

GCAACACAA

nadk 876 AGCACCAAG CTCGGCTGGT CAAAAAAATT ATTCTAGTAT ATATTTAATA AAGAGTATCA TCTTTCAAAC C GAGTAGTCG GAACCCAGCT TTCTTGTACA AAGTGGTCCC CTCTCTTCAG GATCGTGCCA TTGGAGGCTC CTACAGTAGT TTCTCAAATT TCTCAA TCTAAATTCC AGTTCTACCT CCCTTCGCAG TTGCCTGTGC TCAAGCTCTA ACTCCCTCTT CCTCTTCCGG CCGGGGGGA CAAGTTTGTA CAAAAAAGC AGGCTCAATG AATAATCATT TCAAGTGTAT TGGCATTGTG GGACACCC

CT

GCAA

fepD 1002 GCTGATCTTC CTCGTGCGAC GTAAAACGC GAGGTGGTG CATAGTATAT ATTTAATAAA GAGTATCATC TTTCAAACCG CAACAATATG CAACCA CCCAGCTTTC TTGTACAAAG TGGTCCCCTC TCTTCAGGAT CGTGCCATTG GAGGCTCCTA CAGTAGTTTC TCAAATTTCT CAAATACTCT AAATTCCAGT TCTACCTCCC TTCGCAGTTG CCTGTGCTCA AGCTCTAACT CCCTCTTC CTTCCGGCCG GGGGGACAA GTTTGTACAA AAAAGCAGG CTCAATGTCT GGTTCTGTTG CCGTGACACG CGCCATTGCC G

TTT

AGTTTGTAC

gltD 1416 GCAGACGGT ATTATGAACT GGCTGGAAG TTTAGTATAT ATTTAATAAA GAGTATCATC TTTCAAACCG CAACAACAAC AACAACCGA ACCCAGCTTT CTTGTACAAA GTGGTCCCCT CTCTTCAGGA TCGTGCCATT GGAGGCTCC TACAGTAGTT TCTCAAA CTCAAATACT CTAAATTCCA GTTCTACCTC CCTTCGCAGT TGCCTGTGCT CAAGCTCTAA CTCCCTCTTC CTCTTCCGGC CGGGGGGAC A AAAAAAGCA GGCTCAATGA GTCAGAATGT TTATCAATTT ATCGACCTGC AGCGCGTTG ATCC

A

CAAGGTA

hybC 1702 GACGGCAAC GAAGTGGTTT CAGTGAAGG TTCTGTAGTA TATATTTAAT AAAGAGTATC ATCTTTCAAA CCGCAACAAC AA GGAACCCAG CTTTCTTGTA CAAAGTGGTC CCCTCTCTTC AGGATCGTG CCATTGGAG GCTCCTACAG TAGTTTCTCA AATTTCTCA ATACTCTAAA TTCCAGTTCT ACCTCCCTTC GCAGTTGCCT GTGCTCAAGC TCTAACTCCC TCTTCCTCTTC CGGCCGGGG GGACAAGTTT GTACAAAAA AGCAGGCTC AATGAGCCA GAGAATTACT ATTGATCCGG TAACCCGTAT TGAGGG

ACAAG

10

CAGCTTTCT

ydhK 20 CAAGCACCAC CGCAAGGTA CGCTGGCCTC TTAGTATATA TTTAATAAAG AGTATCATCT TTCAAACCGC AACAACAACA CCTCATGAAC C TGTACAAAGT GGTCCCCTCT CTTCAGGATC GTGCCATTGG AGGCTCCTAC AGTAGTTTCT CAAATTTCTC AAATACTCTA AATTCCAGTT CTACCTCCCT TCGCAGTTGC CTGTGCTCAA GCTCTAACTC CCTCTTCCTCT TCCGGCCGG GGGG TTTGTACAAA AAAGCAGGC TCAATGAACG CATCGTCATG GTCCTTGCGC AATTTGCCCT GGTTCAGG


ACTCTA

gyrB 2412 CGCCCTGAAA GCGGCGAAT ATCGATATTT AGTATATATT TAATAAAGA GTATCATCTT TCAAACCGCA ACAACAACCG AGAATGAAC CCAGCTTTCT TGTACAAAGT GGTCCCCTCT CTTCAGGATC GTGCCATTGG AGGCTCCTAC AGTAGTTTCT CAAATTTCTC AAAT AATTCCAGTT CTACCTCCCT TCGCAGTTGC CTGTGCTCAA GCTCTAACTC CCTCTTCCTCT TCCGGCCGG GGGGACAAG TTTGTACAAA AAAGCAGGC TCAATGTCGA ATTCTTATGA CTCCTCCAGT ATCAAAGTCC TGAAAGGG

AAG

AGCTTTCT

gcvP 2871 CTGCTCCTGC GTACCGATTA GCGAATACCA GTAGTATATA TTTAATAAAG AGTATCATCT TTCAAACCGC AACAACAACC TGCGCCGAAC CC TGTACAAAGT GGTCCCCTCT CTTCAGGATC GTGCCATTGG AGGCTCCTAC AGTAGTTTCT CAAATTTCTC AAATACTCTA AATTCCAGTT CTACCTCCCT TCGCAGTTGC CTGTGCTCAA GCTCTAACTC CCTCTTCCTCT TCCGGCCGG GGGGAC TTTGTACAAA AAAGCAGGC TCAATGACAC AGACGTTAA GCCAGCTTGA AAACAGCGG CGCTTTTATT G

AC

AAAAAAGC

narZ 3738 GGTCGCGAT CAGGTACAG GAGGCGAAA AAATAGTATA TATTTAATAA AGAGTATCAT CTTTCAAACC GCAACAACA ACGGTCTCG GAACCCAGCT TTCTTGTACA AAGTGGTCCC CTCTCTTCAG GATCGTGCCA TTGGAGGCTC CTACAGTAGT TTCTCAAATT TCTCAAAT TCTAAATTCC AGTTCTACCT CCCTTCGCAG TTGCCTGTGC TCAAGCTCTA ACTCCCTCTT CCTCTTCCGG CCGGGGGGA CAAGTTTGTA C AGGCTCAATG AGTAAACTTT TGGATCGCTT TCGCTACTTC AAACAAAAG GGCG

CAGTT

GCTTTCT

carB 3219 CGGTGCAGG AAATGCACGC ACAGATCAAA TAGTATATAT TTAATAAAGA GTATCATCTT TCAAACCGCA ACAACAAGA ATCGCTGAAC CCA TGTACAAAGT GGTCCCCTCT CTTCAGGATC GTGCCATTGG AGGCTCCTAC AGTAGTTTCT CAAATTTCTC AAATACTCTA AATTC CTACCTCCCT TCGCAGTTGC CTGTGCTCAA GCTCTAACTC CCTCTTCCTCT TCCGGCCGG GGGGACAAG TTTGTACAAA AAAGCAGGC TCAATGCCAA AACGTACAG ATATAAAAAG TATCCTGATT CTGGGTGCG GG

Appendix 3: N. parisii cloned gene list

See electronic file: Final Gene List w PCR results thesis version.xlsx