
Metabolic Network Analysis of Apicomplexan Parasites to Identify Novel Drug Targets

by

Stacy Susan Hung

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Molecular Genetics

University of Toronto

© Copyright by Stacy Susan Hung 2014


Metabolic Network Analysis of Apicomplexan Parasites to Identify Novel Drug Targets

Stacy Susan Hung Doctor of Philosophy Molecular Genetics University of Toronto 2014

Abstract

Parasites of the phylum Apicomplexa include many important human and veterinary pathogens such as Plasmodium (malaria), Toxoplasma (a leading opportunistic infection associated with AIDS and congenital neurological birth defects) and Eimeria (an economically significant disease of poultry and cattle). The lack of effective vaccines or treatments and the increasing prevalence of drug-resistant strains stress the urgency to develop novel drug therapies.

Research presented within this thesis has provided numerous insights into the metabolic capabilities of apicomplexans and, importantly, identifies potential enzyme drug targets offering a plethora of opportunities for further investigation and experimental validation. I first focused on generating a pipeline for accurately reconstructing metabolic networks by integrating enzyme annotations from complementary curated datasets and automated tools. Key to this integration is the Density Estimation Tool for Enzyme ClassificaTion (DETECT), a probabilistic method I developed for improved enzyme prediction accounting for sequence diversity across enzyme families. My comparative analyses across the resulting networks revealed that apicomplexans adopt differing strategies for performing similar core metabolic activities. Pantothenate biosynthesis was highlighted as a druggable pathway based on its conserved enzyme complement across the phylum and absence from the host. Using P. falciparum as a model for , I incorporated gene expression, thermodynamics and evolutionary data to gain insight into the operation of metabolic pathways in the parasite. Finally, the application of my metabolic reconstruction pipeline to other parasites illustrates its utility for establishing a meaningful characterization of metabolism for a newly sequenced , and when combined with additional experimental datasets provides a wealth of insights into the biology of these .


Acknowledgements

I express my sincerest gratitude to my supervisor Dr. John Parkinson for giving me the opportunity to pursue my doctoral thesis in his lab. I am truly grateful for his insightful feedback and unwavering support throughout my Ph.D. degree. I would also like to thank all current and previous members of the Parkinson Lab, especially postdoctoral fellow Dr. James Wasmuth, who was an inspirational project mate and helped me feel grounded in my project. A special thanks to Alexandra Gast, Viviana Pszenny, and other members of Dr. Michael Grigg’s Lab at the NIH, who have been instrumental in fast-forwarding my wet-lab expertise and knowledge of working with Toxoplasma parasites in the lab. Finally, I’d like to thank all my friends and family for their love, support, and encouragement.


Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1 Introduction
1.1 The Apicomplexa
1.1.1 Taxonomy
1.1.1.1 Genomes
1.1.2 Epidemiology and Pathology
1.1.2.1 Plasmodium
1.1.2.2 Toxoplasma gondii
1.1.2.3 Cryptosporidium
1.1.2.4 Other Apicomplexa
1.1.3 General morphological features
1.1.4 Life Cycle
1.1.4.1 General life cycle of apicomplexans
1.1.4.2 Plasmodium falciparum life cycle
1.1.4.3 Toxoplasma gondii life cycle
1.1.5 Population Genetics and Virulence
1.1.5.1 Plasmodium
1.1.5.2 Toxoplasma gondii
1.1.6 Available drugs for the Apicomplexa
1.1.7 Metabolism as a source of drug targets
1.1.7.1 Enzymes are essential for life
1.1.7.2 Mechanistic basis of enzymes for highly directed
1.2 Experimental manipulations in the Apicomplexa
1.2.1 Methods for genetic manipulation
1.2.1.1 Transfection and transformation
1.2.1.2 Gene knockouts
1.3 Metabolic reconstruction and analysis as a route to drug discovery
1.3.1 Metabolic reconstruction for the Apicomplexa
1.3.2 Importance of enzyme annotation for metabolic reconstruction
1.3.2.1 Enzyme annotation
1.3.2.2 Automated methods for enzyme prediction
1.3.2.3 Curated resources for metabolic reconstruction
1.3.3 Modelling, Network-based methods and FBA: Applications to drug discovery
1.4 Project goals
Chapter 2 Improving enzyme annotation for accurate metabolic reconstruction
2 Improving enzyme annotation for accurate metabolic reconstruction
2.1 Introduction
2.2 Materials and Methods
2.2.1 Protein sequence and enzyme data
2.2.2 Generation of probability profiles
2.2.3 Probability score calculation
2.2.4 Five-fold cross-validation and ROC analysis
2.2.5 Prediction of malarial enzymes
2.3 Results and Discussion
2.3.1 Assessing enzyme diversity
2.3.2 Density estimation tool for enzyme classification (DETECT)
2.3.3 Comparison to current prediction methods
2.3.4 Expanding the metabolome of P. falciparum
2.4 Concluding Remarks
Chapter 3 Reconstructing parasite metabolism: Biological insights
3 Reconstructing parasite metabolism: Biological insights
3.1 Introduction
3.2 Materials and Methods
3.2.1 Metabolic reconstruction for other organisms
3.2.2 Biochemical assays in T. gondii
3.3 Results and Discussion
3.3.1 Integration of enzyme datasets
3.3.2 Diversity in apicomplexan metabolism
3.3.2.1 Insights from metabolic networks for the Apicomplexa
3.3.2.2 Metabolic insights into apicomplexan Eimeria tenella
3.3.2.2.1 The mannitol cycle is essential for E. tenella
3.3.2.2.2 E. tenella possesses additional enzymes for polyketide sugar unit biosynthesis and streptomycin biosynthesis
3.3.2.2.3 Identification of genes for lipoic acid metabolism
3.3.2.3 The pantothenate biosynthesis pathway for drug targeting
3.3.2.4 Assessing activity of pantothenate biosynthesis enzymes in Toxoplasma gondii
3.3.2.5 Gene knockout experiments reveal misassembly of potentially bifunctional PanBE enzyme in Toxoplasma gondii
3.3.3 Reduced metabolic diversity in tapeworms
3.3.3.1 Energy metabolism in Echinococcus and other parasitic helminths
3.3.3.2 Amino acid metabolism
3.3.4 Metabolic insights into phytopathogen Ophiostoma ulmi
3.3.4.1 Overview of metabolism in O. ulmi
3.3.4.2 Cell-wall degradation enzymes as potential drug targets
3.4 Concluding Remarks
Chapter 4 Reconstructing parasite metabolism: Applications
4 Reconstructing parasite metabolism: Applications
4.1 Introduction
4.2 Materials and Methods
4.2.1 Metabolic reconstruction for iMPMP420
4.2.2 Correlation of thermodynamics data with gene expression
4.2.3 Evolutionary analysis
4.2.4 Expression analysis
4.3 Results and Discussion
4.3.1 Reconstruction of iMPMP420 and comparison to other models
4.1.1 Thermodynamics variability analysis of P. falciparum metabolism
4.1.1.1 Candidates for regulation in phospholipid biosynthesis pathways
4.1.1.2 Correlation of thermodynamics data with gene expression – an unresponsive genome
4.1.2 Metabolic enzymes as drug targets
4.1.3 Evolutionary analysis of P. falciparum metabolism
4.1.4 Selective pressures in essential pathways
4.1.5 Transcriptomic analyses for the Apicomplexa
4.1.5.1 Toxoplasma displays strain-specific differences in arachidonic acid metabolism
4.1.5.2 P. falciparum pathways are highly expressed throughout the intraerythrocytic development cycle
4.2 Concluding Remarks
Chapter 5 Summary of findings and future directions
5 Summary of findings and future directions
5.1 Summary of findings
5.2 Future directions
5.2.1 Expansion and improvement of DETECT
5.2.1.1 Classification of transporter proteins and detection of proteins belonging to ortholog groups
5.2.1.2 Enabling the classification of under-represented enzyme families
5.2.1.3 Integration of additional information to improve predictive capacity
5.2.2 Understanding the evolution of apicomplexan metabolism
5.2.3 Experimental validation of the pantothenate biosynthesis pathway
References


List of Tables

Table 1-1 Genome size and number of genes for sequenced apicomplexans
Table 1-2 Tools and resources for apicomplexan enzyme annotation
Table 2-1 Matches of various resources and methods to MPMP annotations
Table 2-2 Lists of proteins correctly annotated by either DETECT or BLAST
Table 3-1 Enzymes for streptomycin biosynthesis pathway that have been identified in the
Table 3-2 E. tenella genes for lipoic acid metabolism identified based on orthology to T. gondii
Table 3-3 Genes involved in pantothenate biosynthesis for target selection
Table 3-4 Band sizes of PCR products generated from TgPanBE gDNA
Table 3-5 Band sizes of PCR products generated from TgPanBE cDNA
Table 3-6 Maximal RNA-Seq expression for E. multilocularis (Em), E. granulosus (Eg), and H. microstoma (Hm) genes mapping to malate dismutation
Table 4-1 Properties of iMPMP420, Huthmacher et al., and Plata et al. models
Table 4-2 Maximum and minimum free energy change values for regulation candidates and their associated pathway(s)
Table 4-3 List of gold standard enzymes compiled for P. falciparum
Table 4-4 List of experimentally verified enzyme drug targets in P. falciparum
Table 4-5 Evaluation of iMPMP420 for predicting essential enzymes compared to previous P. falciparum FBA models
Table 4-6 Essential enzymes identified by all three iMPMP420, Huthmacher et al., and Plata et al. that have not been reported in the literature
Table 4-7 P. falciparum genomes assembled for calculation of dN/dS
Table 4-8 Sites of positive selection in genes of the pyrimidine and/or purine metabolism pathways
Table 4-9 Expression datasets for comparative transcriptomics of the Apicomplexa


List of Figures

Figure 1-1 Overview of groups among Apicomplexa
Figure 1-2 Morphological features of Toxoplasma gondii and Plasmodium falciparum
Figure 1-3 Malaria parasite life cycle
Figure 1-4 Life cycle of Toxoplasma gondii
Figure 1-5 Malaria transmission areas and the distribution of reported resistance or treatment failures with selected antimalarial drugs, September 2004
Figure 1-6 Metabolic reconstruction for apicomplexan genomes
Figure 1-7 Common measures used in network analysis
Figure 2-1 Number of proteins in each EC category as annotated in Swiss-Prot
Figure 2-2 Boxplots show ranges of global sequence alignment scores for a representative set of 50 gold standard enzymes
Figure 2-3 Boxplots for the length of positive alignments corresponding to the fifty representative enzymes
Figure 2-4 Using density estimation values to calculate a combined probability score for EC:1.1.1.100
Figure 2-5 ROC curves for DETECT, BLAST and PSI-BLAST based on data generated from 5-fold cross-validation analyses
Figure 2-6 Overlap in enzyme curations and predictions for P. falciparum as annotated by four database resources
Figure 2-7 Re-annotation of MAL13P1.301 and PF11_0395 using DETECT
Figure 2-8 The distribution of scores for alignments of PF13_0141 against (A) EC:1.1.1.27 and (B) EC:1.1.1.37 proteins compared to the respective within-family alignments indicates that PF13_0141 is more likely to belong to EC:1.1.1.37 based on density estimation profiles
Figure 2-9 Density distributions of similarity scores for PF11_0207
Figure 2-10 Density distributions of similarity scores for PFI1475w
Figure 2-11 Density distributions of similarity scores for PFD0485w and PFL1535w
Figure 2-12 DETECT prediction for PF14_0424
Figure 3-1 Enzyme complements of the Apicomplexa as annotated by a variety of resources
Figure 3-2 Heatmap showing the conservation of individual metabolic pathways for sequenced apicomplexans
Figure 3-3 An integrated view of metabolism in C. parvum, P. falciparum, and T. gondii
Figure 3-4 Overlap of integrated enzyme datasets for E. tenella, T. gondii, and N. caninum
Figure 3-5 Heatmap showing the conservation of individual metabolic pathways for E. tenella, T. gondii, and N. caninum
Figure 3-6 The mannitol cycle in Eimeria tenella
Figure 3-7 Lipoic acid biosynthesis and salvage pathways as illustrated in KEGG
Figure 3-8 Pantothenate biosynthesis
Figure 3-9 Spectrophotometric assay of PanB activity in Toxoplasma gondii
Figure 3-10 Schematic illustration of the single-crossover gene insertion mechanism using a modified endogenous gene tagging approach
Figure 3-11 Genomic landscape of TgPanBE showing the predicted gene model with tracks for various experimental datasets
Figure 3-12 Schematic of TgPanBE cDNA
Figure 3-13 PCR amplification of pantothenate synthesis genes from gDNA
Figure 3-14 PCR amplification of TgPanBE from cDNA using RFLP analysis
Figure 3-15 Heatmap on the conservation of individual metabolic pathways
Figure 3-16 Overlap of enzyme datasets for E. multilocularis and comparison of integrated datasets for H. microstoma and S. mansoni
Figure 3-17 Metabolic network of E. multilocularis
Figure 3-18 Proposed schematic overview of oxidative phosphorylation and malate dismutation in tapeworms
Figure 3-19 Overlap of enzyme datasets for Ophiostoma ulmi and comparison to Yeast
Figure 3-20 Heatmap showing the conservation of individual metabolic pathways for Ophiostoma ulmi compared to Saccharomyces cerevisiae (Yeast) and Arabidopsis thaliana
Figure 4-1 Boxplots showing distribution of fold change expression for bottleneck reactions and regulation candidate reactions
Figure 4-2 The expression profiles of enzyme-encoding genes
Figure 4-3 ∆G°, expression, and dN/dS for reactions in iMPMP420 ordered by increasing ∆Gmin
Figure 4-4 Overlap of gold standard enzymes with predicted essential enzymes by iMPMP420, Huthmacher et al., and Plata et al.
Figure 4-5 Distribution of dN/dS ratios for genes in iMPMP420
Figure 4-6 Location of codon sites under selection relative within the PfCPSII gene
Figure 4-7 Evolutionary rates of P. falciparum enzymes in the context of (A) pathways and (B) superclass pathways
Figure 4-8 Cluster analysis of gene expression based on microarray expression data for three strains of Toxoplasma (PRU, RH and VEG) and RNA-Seq data for T. gondii VEG and N. caninum
Figure 4-9 GSEA enrichment plot shows that enzymes involved in lipid metabolism are overall more active in T. gondii VEG than T. gondii PRU
Figure 4-10 Cluster analysis of gene expression based on RNA-seq and microarray datasets for P. falciparum 3D7


Chapter 1 Introduction

1 Introduction

1.1 The Apicomplexa

The eukaryotic phylum Apicomplexa comprises more than 5000 species of obligate intracellular parasites (Cavalier-Smith, 1993) that are responsible for a number of severe diseases in a range of organisms including humans. Species of Plasmodium are the etiological agents of malaria, affecting an estimated 300-500 million people and resulting in up to three million deaths each year (Snow, et al., 2005), of which more than 90% are attributed to infection by P. falciparum (Greenwood, et al., 2005). Toxoplasma gondii and Cryptosporidium cause food-borne and water-borne illnesses that are primarily health threats in HIV+/AIDS and immunocompromised populations (Kaplan, et al., 2000). Theileria parva and T. annulata, which are responsible for great economic losses in cattle in Africa, have profound medical, social, and economic effects (Billiouw, et al., 2002). Currently, there is a lack of effective vaccines or treatments against many apicomplexans, and the increasing emergence of drug-resistant strains has stressed the urgency to develop novel drug therapies.

1.1.1 Taxonomy

The Apicomplexa forms a large and diverse group of single-celled eukaryotes inhabiting a wide array of terrestrial and marine environments depending on their host niche. The phylum evolved from a free-living ancestor that adapted to an obligate intracellular parasitic lifestyle through the loss of flagella and development of gliding motility (Vavra and Small, 1969). Morphological and molecular evidence indicates that the apicomplexans are closely related to the dinoflagellates and ciliates, which together form the taxonomic group known as the Alveolata (Gould, et al., 2008; Moore, et al., 2008). While analysis of small subunit rRNA has suggested that apicomplexans are more closely related to dinoflagellates than to ciliates (Fast, et al., 2002; Van de Peer and De Wachter, 1997), more recent investigations of plastid genes have uncovered a more intimate relationship with the red algal phylum Chromerida (Janouskovec, et al., 2010; Moore, et al., 2008).

Formally, the Apicomplexa can be broken down into three main clades: the Coccidia, Gregarina, and Hematozoa (see Figure 1-1). The Coccidia are the largest group of apicomplexans and include, most prominently, Toxoplasma gondii, as well as species from the genera , Eimeria and . Unlike other apicomplexans, coccidians are spore-forming parasites that infect intestinal tracts of . The gregarines, on the other hand, infect invertebrate hosts exclusively and all undergo meiosis. Other characteristics that distinguish members of the gregarine clade are phenotypic in nature, such as the appearance of the apical complex in the sporozoite stage, and the nucleus and nucleolus of trophozoites being large and conspicuous. Note that Cryptosporidium is traditionally considered a member of the Coccidia; however, phylogenetic evidence suggests a closer relationship to the gregarines (Carreno, et al., 1999; Leander, et al., 2003; Zhu, et al., 2000). Comparative data for the gregarines are still lacking, so the phylogenetic position of Cryptosporidium remains unresolved (Rueckert, et al., 2011). The third apicomplexan clade, the Hematozoa, is represented by two subgroups: the Haemosporidia, which includes the malaria-causing Plasmodium species, and the Piroplasmida, which includes species of veterinary importance such as those from the genera Babesia and Theileria. Hematozoans are sometimes referred to as parasites as they make use of vertebrate hosts for asexual reproduction and blood-sucking invertebrate vectors to complete their sexual life cycle (Levine, 1973).


Figure 1-1 Overview of groups among Apicomplexa. Three main clades are coloured and their life cycle indicated. Figure from the Tree of Life Web Project (http://tolweb.org/), used with permission from the publisher.

1.1.1.1 Genomes

Due to the difficulties associated with the cultivation of parasites and genetic manipulation in the laboratory, biomedical research for the Apicomplexa is challenging. In recent years, however, a number of members have been selected for genome sequencing providing new opportunities for scientists to gain insight into the evolution and biology of these parasites.

Currently, there are genomes for over two dozen apicomplexans available through public databases. Several annotated genomes of Plasmodium spp. have been released (Carlton, et al., 2008; Carlton, et al., 2002; Gardner, et al., 2002; Neafsey, et al., 2012; Pain, et al., 2008), with sequencing of other malaria species such as P. ovale and P. reichenowi underway. Both bovine and human strains of Cryptosporidium, C. parvum and C. hominis, respectively, have also been sequenced and annotation is ongoing (Abrahamsen, et al., 2004; Xu, et al., 2004). Cryptosporidium genomes are unique in being particularly streamlined and relatively small. The coccidian genomes for T. gondii and Neospora caninum have also been annotated and published (Reid, et al., 2012), with Eimeria tenella and Sarcocystis neurona, also members of the Coccidia, in preparation (pers. comm.). Apicomplexan genomes of veterinary importance have also been sequenced, including Babesia bovis, Babesia microti, Theileria parva, and T. annulata (Brayton, et al., 2007; Cornillot, et al., 2012; Gardner, et al., 2005; Pain, et al., 2005). Other species such as Gregarina niphandrodes and Perkinsus marinus are marine parasites for which raw sequence data is available from NCBI. Currently, the most comprehensive source for annotated genome sequences is EuPathDB (Aurrecoechea, et al., 2009). The full list of the published genomes is summarized in Table 1-1.

Table 1-1 Genome size and number of genes for sequenced apicomplexans.

Database      Species                          Mbp     Genes   Reference
CryptoDB      Cryptosporidium hominis          8.74    3956    Xu et al., 2004
              Cryptosporidium parvum           9.10    3886    Abrahamsen et al., 2004
PiroplasmaDB  Babesia bovis                    8.18    3781    Brayton et al., 2007
              Babesia microti                  6.39    3554    Cornillot et al., 2012
              Theileria annulata               8.35    3845    Pain et al., 2005
              Theileria parva                  8.35    4167    Gardner et al., 2005
PlasmoDB                                       26.18   5776    Tachibana et al., 2012
              Plasmodium falciparum 3D7        23.33   5772    Gardner et al., 2002
              Plasmodium knowlesi              23.74   5307    Pain et al., 2008
                                               27.01   5507    Carlton et al., 2008
              Plasmodium yoelii yoelii 17XNL   22.94   7774    Carlton et al., 2002
ToxoDB        Eimeria tenella                  51.81   9235    In preparation
              Neospora caninum                 59.10   7266    Reid et al., 2012
              Toxoplasma gondii ME49           65.13   8814    Reid et al., 2012
              Toxoplasma gondii TgCkUg2        41.93   n/a     Bontell et al., 2009
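
One way to make the comparison of genome compactness concrete is to divide the gene count by the genome size for each entry in Table 1-1. The short Python sketch below does this; it is an illustration only and not part of the analyses in this thesis. The function names `gene_density` and `rank_by_density` are hypothetical, genes per Mbp is only a crude proxy for how streamlined a genome is, and table rows lacking a species name or a gene count are omitted.

```python
# Genome size (Mbp) and gene count, copied from Table 1-1.
# Rows with a missing species name or gene count are left out.
GENOMES = {
    "Cryptosporidium hominis": (8.74, 3956),
    "Cryptosporidium parvum": (9.10, 3886),
    "Babesia bovis": (8.18, 3781),
    "Babesia microti": (6.39, 3554),
    "Theileria annulata": (8.35, 3845),
    "Theileria parva": (8.35, 4167),
    "Plasmodium falciparum 3D7": (23.33, 5772),
    "Plasmodium knowlesi": (23.74, 5307),
    "Plasmodium yoelii yoelii 17XNL": (22.94, 7774),
    "Eimeria tenella": (51.81, 9235),
    "Neospora caninum": (59.10, 7266),
    "Toxoplasma gondii ME49": (65.13, 8814),
}


def gene_density(size_mbp, n_genes):
    """Return genes per Mbp (hypothetical helper, crude compactness proxy)."""
    return n_genes / size_mbp


def rank_by_density(genomes):
    """Return (species, genes per Mbp) pairs, densest genome first."""
    densities = [(name, gene_density(*values)) for name, values in genomes.items()]
    return sorted(densities, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, density in rank_by_density(GENOMES):
        print(f"{name:32s} {density:6.1f} genes/Mbp")
```

On these values, the Cryptosporidium and piroplasm genomes carry roughly 430 to 560 genes per Mbp, compared with fewer than 200 for the coccidian genomes listed under ToxoDB. The measure ignores gene length and intron content, so it should be read as a rough indication only.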


1.1.2 Epidemiology and Pathology

1.1.2.1 Plasmodium

Malaria is a mosquito-borne infectious disease caused by parasitic protozoans of the genus Plasmodium. There are more than 100 species of Plasmodium that infect a wide array of species including reptiles, birds, and mammals. Five species are known to infect humans, with the majority of deaths caused by P. falciparum (~75%) and P. vivax (~20%), while P. ovale and P. malariae result in a generally milder form of malaria that is rarely fatal (Nadjm and Behrens, 2012). P. knowlesi is a zoonotic species prevalent in Southeast Asia that is responsible for malaria in macaques but can also cause severe infection in humans. Transmission in all cases occurs exclusively through the bite of Anopheles mosquitoes.

Approximately 40% of the world’s population lives in areas where malaria is transmitted. There are an estimated 300-500 million cases and up to 2.7 million deaths from malaria each year. The mortality levels are greatest in sub-Saharan Africa, where children under 5 years of age account for 90% of all deaths due to malaria (Breman, 2001). Resistance to anti-malarial drugs and insecticides, the decay of public health infrastructure, population movements, political unrest, and environmental changes are contributing to the spread of malaria (Greenwood and Mutabingwa, 2002). Recent studies suggest that the number of malaria cases may double in 20 years if new methods of control are not devised and implemented (Breman, 2001).

Infection with malaria parasites may result in a wide variety of symptoms, ranging from absent or very mild to severe disease and even death. In classical or uncomplicated malaria, infected individuals suffer from “attacks” consisting of three stages: (i) a cold stage (shivering), (ii) a hot stage (fever, headaches, vomiting, and sometimes seizures), and (iii) a sweating stage (sweats with a return to normal temperatures, and tiredness). Untreated, these attacks occur every second day with P. falciparum, P. vivax, and P. ovale, or every third day with infection by P. malariae. More commonly, however, patients present with a combination of these symptoms, which makes diagnosis difficult; such cases are often attributed to influenza, especially in countries where malaria cases are infrequent. Severe malaria occurs when infections are complicated by organ failure or abnormalities in the patient’s blood or metabolism. In such cases, manifestations include cerebral malaria, severe anemia (destruction of red blood cells), renal failure (leakage of hemoglobin from red blood cells into the urine), and metabolic acidosis (excessive acidity in the blood and tissue fluids) (Bartoloni and Zammarchi, 2012). Coinfection of HIV with malaria increases mortality (Korenromp, et al., 2005).

1.1.2.2 Toxoplasma gondii

Toxoplasma gondii, a zoonotic protozoal parasite, is well-known for its global distribution and ability to infect virtually all warm-blooded animals. In humans, T. gondii causes disease in congenitally infected infants and immunocompromised individuals and infects up to a third of the world’s population (Montoya and Liesenfeld, 2004). Prevalence varies widely between countries with up to 95% of some populations being infected with Toxoplasma (www.cdc.gov). In France, a high prevalence of infection has been related to a preference for eating raw or undercooked meats containing tissue cysts, while in Central America a high prevalence has been related to the frequency of stray cats in a climate favouring survival of oocysts and soil exposure.

Transmission occurs through three principal routes: foodborne, animal-to-human (zoonotic), and mother-to-child (congenital). In foodborne transmission, individuals become infected by eating undercooked contaminated meat (especially pork, lamb, and venison) that contains the tissue form of the parasite (bradyzoites). In zoonotic transmission, cats play an important role in the spread of toxoplasmosis and become infected by eating rodents, birds, or other small animals that harbour the parasite. The cat can then shed millions of oocysts in its feces for as long as three weeks after infection, potentially contaminating soil and water in the environment. This may lead to infection in people by accidental ingestion of the oocysts from food or water that has been in contact with the contaminated feces. Intermediate hosts other than humans may also become infected through zoonotic transmission; Toxoplasma has even been detected in marine ecosystems, where ingestion of oocysts from land-to-sea run-off has been suggested as the likely cause for hundreds of sea otter deaths along the California coast (Conrad, et al., 2005). The third form of transmission occurs congenitally, whereby a woman who is newly infected with Toxoplasma during pregnancy passes the infection to her unborn child.


Primary T. gondii infection in immunocompetent adults (including pregnant women) and children is typically asymptomatic (Montoya and Liesenfeld, 2004). Only a small portion of these cases (~10%) result in non-specific illness that rarely requires treatment (Remington, 1974). In cases where the fetus has been infected, however, symptoms of toxoplasmosis typically include retinochoroiditis (inflammation of the eye), hydrocephalus (brain swelling due to accumulation of fluid in the skull), convulsions, and intracerebral calcification. Toxoplasmosis can be life-threatening to patients who are immunocompromised or have AIDS (Liesenfeld, 1999), with the disease almost always resulting from reactivation of chronic infection (Porter and Sande, 1992). The central nervous system is the site most typically affected, and infection there is characterized by encephalitis (inflammation of the brain) and systemic infections (Montoya and Liesenfeld, 2004). Recent studies have also linked Toxoplasma infection to mental illnesses such as depression, schizophrenia, and suicidal behavior (Flegr, 2007; Zhang, et al., 2012). In severe cases, the patient may die from necrosis (caused by intracellular growth of tachyzoites) of the intestine and lymph nodes that line the abdominal cavity before other organs are severely damaged (Dubey and Frenkel, 1973).

1.1.2.3 Cryptosporidium

Cryptosporidium is a genus of apicomplexan protozoans that causes the diarrheal disease cryptosporidiosis. Several species infect mammals, with disease in humans attributed mainly to C. parvum and C. hominis. Infected individuals or animals can shed millions of oocysts in the stool for weeks after symptoms begin, potentially contaminating soil, sources of food, and water supplies. This may lead to infection in people by accidental ingestion of the oocysts from food or water that has been in contact with the contaminated feces. Once ingested, the parasites inhabit the small intestine, leading to infection of intestinal epithelial tissue. The parasite is spread most commonly through contaminated water supplies, and is a major culprit of diarrhea in humans. Its protective outer shell makes eradication of Cryptosporidium difficult, since the parasite is able to survive harsh chemical treatments such as chlorine disinfection. In 1993, Milwaukee water supplies contaminated by Cryptosporidium oocysts resulted in the largest documented waterborne disease outbreak in American history, in which over 400,000 residents became infected (Hoxie, et al., 1997). Cryptosporidiosis, characterized by watery diarrhea, is typically acute and short-term, but can become severe in children and immunocompromised individuals.

1.1.2.4 Other Apicomplexa

Besides the Plasmodium species responsible for human malaria, there are several other vector-transmitted blood parasites of veterinary importance, including Theileria parva, the cause of East Coast fever, and Babesia bovis, the cause of tick fever, to which well over half of the world’s cattle population is at risk. In addition to Toxoplasma gondii, a number of coccidian parasites have a major impact on human and animal health world-wide and are among the most successful and widespread parasitic protozoa; these include Neospora caninum, which is the leading cause of abortion in cattle, and Eimeria tenella, the apicomplexan with the largest economic impact on poultry.

1.1.3 General morphological features

The apical organelles are characteristic secretory vesicles of the Apicomplexa and consist of rhoptries, micronemes and dense granules. The sequential secretion of these apical organelles is key to a successful host-cell invasion (for a review, see Blackman and Bannister, 2001). In the early stages of invasion, micronemes release proteins that have cell-binding properties and are involved in host-cell recognition. Rhoptry proteins are then released and participate in a multitude of roles including: (i) contribution to a dynamic host-parasite complex crucial to the internalization of the parasite (known as the “moving junction”), (ii) modification of the host plasma membrane, (iii) facilitation of the formation of and integration into the parasitophorous vacuole membrane (PVM), which surrounds the intracellular parasite, and (iv) modulation of host cell transcription. The role of dense granules is not yet understood, but they are believed to be involved in building the membranous structures of the parasitophorous vacuole (Mercier, et al., 2005). Other features that are common to all members include a haploid nucleus, Golgi apparatus, absence of centrioles, , and ejectile organelles and inclusions, and a three-membrane pellicle of alveolar structure penetrated by micropores. See Figure 1-2.

Figure 1-2 Morphological features of Toxoplasma gondii and Plasmodium falciparum. Figure from (Baum, et al., 2006), used with permission from the publisher.

All apicomplexans are characterized by the presence of the apical complex, which contains an assembly of organelles required for host invasion (Sibley, 2004). All members of the phylum (with the exception of Cryptosporidium) also contain a defining organelle, the apicoplast, a relict non-photosynthetic plastid likely originating from the through secondary endosymbiosis (Keeling, 2004; Kohler, et al., 1997). The organelle has been shown to be essential for parasite development (McConkey, et al., 1997), and because of its absence from the human host, provides a source of potential drug targets against apicomplexans (Soldati, 1999). While the exact role of the apicoplast is unclear, the plastid appears to encode enzymes that function in the biosynthesis of fatty acids (Surolia and Surolia, 2001; Waller, et al., 1998), isoprenoids (Jomaa, et al., 1999) and (Sato and Wilson, 2002; van Dooren, et al., 2002), suggesting that one or more of these compounds could be exported from the apicoplast analogous to what occurs in plastids. Since its discovery, the apicoplast has been the subject of intensive investigation, and several in-depth reviews of these studies are available (Foth and McFadden, 2003; Roos, et al., 2002).

1.1.4 Life Cycle

1.1.4.1 General life cycle of apicomplexans

Apicomplexans have diverse hosts, infecting virtually all animals from mollusks to mammals, with varying host ranges and specificities. For example, P. falciparum is restricted to a single host, infecting humans via transmission by Anopheles mosquitoes, whereas T. gondii has an exceptionally large host range and can infect nearly any tissue of warm-blooded animals. Life cycles may be simple, where a single host is involved, or complex, in which sexual reproduction in a vector species is required for transmission. In general, the basic life cycle begins when an infective sporozoite enters the host cell and divides repeatedly, forming merozoites. Eventually the host cell bursts, releasing merozoites that infect new cells. This step can repeat several times until the merozoites transform into sexually reproductive cells, gamonts, which join together in pairs and form a gamontocyst. Within the gamontocyst, the gamonts divide to form numerous gametes, which fuse together in pairs to form diploid zygotes. Through meiosis, zygotes then give rise to haploid sporozoites, thereby initiating the cycle again.

1.1.4.2 Plasmodium falciparum life cycle

During a blood meal, a malaria-infected female Anopheles mosquito inoculates sporozoites into the human host. The parasites then migrate through the blood stream to liver cells, where they multiply asexually, producing thousands of merozoites. The infected liver cells rupture, releasing the merozoites, which then infect red blood cells and multiply until the cells burst. This releases more merozoites, repeating the cycle and causing fever each time parasites break free and invade new blood cells. Some merozoites leave the asexual cycle and, instead of replicating, develop into the sexual forms of the parasite, called gametocytes, that circulate in the blood stream. If ingested by an Anopheles mosquito during a blood meal, the gametocytes develop further into mature sex cells known as gametes. Inside the mosquito’s stomach, the fertilized female gametes develop into actively moving ookinetes that invade the midgut wall of the mosquito, where they form oocysts. Inside the oocyst, thousands of active sporozoites develop, eventually causing the oocyst to burst and release the sporozoites, which migrate to the mosquito’s salivary glands. Inoculation of the sporozoites into a new human host perpetuates the malaria life cycle. See a summary of the malaria parasite life cycle in Figure 1-3.

Figure 1-3 Malaria parasite life cycle. Figure from www.cdc.gov, used with permission from the publisher.

1.1.4.3 Toxoplasma gondii life cycle

There are three infectious stages of the parasite: tachyzoites, bradyzoites (tissue cysts) and sporozoites (oocysts). Oocysts, which are only produced by the definitive host, are passed in feces and when ingested can infect virtually all warm-blooded animals. Infection can also occur by consumption of contaminated meat containing tissue cysts. Shortly after digestion, both oocysts and tissue cysts transform into tachyzoites, which are the highly metabolic trophozoite form of T. gondii. The tachyzoites then disseminate throughout the body in macrophages and lymphocytes, or freely in the bloodstream. Within the host cells, tachyzoites divide rapidly, causing tissue destruction and spreading the infection. Tachyzoites are also capable of infecting the fetus of a pregnant woman. Eventually, tachyzoites localize to the neural and muscle tissue where, up to three days later, they may convert into bradyzoites, leading to tissue cysts that may persist in the host for life (Dubey, et al., 1998). While the exact mechanism triggering this conversion process is not yet known, it is believed to be a response to the host immune reaction. The T. gondii life cycle is summarized in Figure 1-4.

Figure 1-4 Life cycle of Toxoplasma gondii. Figure from http://en.wikipedia.org/wiki/Toxoplasma_gondii, used with permission from the publisher.

1.1.5 Population Genetics and Virulence

1.1.5.1 Plasmodium

There are over 100 species of Plasmodium, which can infect many animal species including reptiles, birds, and various mammals. Five species cause malaria in humans: P. falciparum, P. vivax, P. ovale, P. malariae, and P. knowlesi (Bhaumik, 2013). Of these, P. falciparum and P. vivax have the greatest impact on human disease. P. falciparum is responsible for the most fatal form of malaria and predominates in Africa, where parasite populations show remarkably similar population genetic structure (Mobegi, et al., 2012). The large-scale population structure in P. falciparum follows continental lines, with major branches in Africa, South and Central America, and South and East Asia. While population differences exist within and between countries and vary by region, the overall picture is consistent with a recent geographic spread from a source population in Africa, which remains the largest population (Volkman, et al., 2012). P. vivax, on the other hand, is the most prevalent human malaria parasite and has a much wider distribution due to its ability to survive at higher altitudes and in cooler climates. Population genetic studies in P. vivax, however, are limited due to a lack of suitable in vitro culturing methods. As well, information on the geographical origin of isolates is poor and biased by opportunistic rather than systematic sampling (Carlton, et al., 2013).

1.1.5.2 Toxoplasma gondii

Molecular analysis of T. gondii strains dispersed throughout the world suggests that most strains fall into one of three genotypes (Types I, II, and III) that are due to clonal proliferation of three individual progeny resulting from a single genetic cross (Grigg, et al., 2001; Sibley and Boothroyd, 1992). Type I strains (e.g. RH, GT1) are highly virulent and require only one infectious oocyst to kill 100% of mice (LD100 ~ 1), while Type II (e.g. ME49) and III (e.g. VEG) strains are less virulent and require an inoculum several orders of magnitude higher to guarantee mortality (LD100 ~ 10^5) (Sibley, 2012). Strains also differ in their capacity to induce disease as well as in the rate at which they induce death. Whole genome sequences for the prototypic strains (ME49, GT1, and VEG) reveal three major clonal lineages found in North America and Europe. More recent studies, however, have detected additional genetic variation, suggesting a more complex population structure than previously thought. The ME49 strain is the most prevalent clinical isolate.

1.1.6 Available drugs for the Apicomplexa

Currently, there are only a small number of drugs against malaria parasites, with no consistently effective therapy available against other apicomplexans. Plasmodium species, especially P. falciparum, have developed resistance against essentially all drugs in clinical use rendering them ineffective in many parts of the world (Figure 1-5). No effective vaccine exists, although efforts to develop one are ongoing. Existing treatment for malaria consists of four main classes of drugs – antifolates, artemisinins, atovaquone, and quinolines – each with distinct modes of action and molecular mechanisms of resistance, but all targeting some aspect of the parasite’s metabolism.

Figure 1-5 Malaria transmission areas and the distribution of reported resistance or treatment failures with selected antimalarial drugs, September 2004. (WHO guidelines for the treatment of malaria, 2006)

The folate metabolic pathway, which includes the enzymes dihydrofolate reductase (DHFR) and dihydropteroate synthase (DHPS), is essential for DNA synthesis and the metabolism of certain amino acids. Humans are not capable of de novo synthesis of folate cofactors, and instead rely on the salvage of folate from external nutrient sources. Pyrimethamine and proguanil (and its analogs) are known to target DHFR activity, while sulfa drugs inhibit DHPS. Sulfonamide drugs such as sulfadoxine act through competitive inhibition of DHPS (Zhang and Meshnick, 1991), but lack sufficient effectiveness when used individually as antimalarials (Nzila, 2006). The combination of sulfadiazine and pyrimethamine is synergistic and is currently used as a front-line treatment for both malaria and toxoplasmosis. In cases where the patient is intolerant to sulfonamides or pyrimethamine, atovaquone can be used as an alternative. Atovaquone, a licensed antiparasitic agent for the treatment of clinical malaria, toxoplasmosis, and babesiosis, selectively inhibits the cytochrome bc1 complex, resulting in a collapse of the mitochondrial membrane potential and death of the parasite (Mather and Vaidya, 2008).

Two classes of antimalarials, the artemisinins and quinolines, inhibit parasite growth by interfering with hemoglobin degradation. Malaria parasites rely on hemoglobin as a source of amino acids. Digestion of hemoglobin, however, releases free heme, which catalyzes the production of reactive oxygen species that are toxic to the parasites. To detoxify the heme, the parasite converts it into an insoluble and chemically inert substance (hemozoin) through a process known as heme biocrystallization (Hempelmann, 2007; Pagola, et al., 2000). In malaria parasites, the detoxification process is distinct from that of mammals, where instead the enzyme heme oxygenase breaks down excess heme into biliverdin, iron, and carbon monoxide (Kikuchi, et al., 2005). Current inhibitors of heme biocrystallization are the quinoline drugs such as , and mefloquine. Artemisinin is an extract from the sweet wormwood plant, Artemisia annua, and is another antimalarial that targets hemoglobin digestion. The production of artemisinin is very expensive, so synthetic derivatives such as artemether, artesunate and dihydroartemisinin are utilized instead and incorporated into artemisinin-based combination therapies (ACTs), which are currently the treatment of choice for uncomplicated malaria (www.who.int). The mechanism of action for these drugs is not well elucidated but is thought to involve the production of free radicals during the process of heme degradation (Meshnick, 2002).

A number of drugs against the Apicomplexa rely on compounds derived from antibacterial chemotherapy that interfere with essential functions of the apicoplast. Due to similarities of the transcriptional and translational machinery encoded by the apicoplast to bacterial counterparts, antibiotics such as doxycycline, clindamycin, and spiramycin, all of which block protein synthesis, have shown potent parasiticidal properties and are used clinically for the treatment of malaria and toxoplasmosis (Fichera and Roos, 1997). Nitazoxanide, which exhibits a broad range of efficacy against protozoan parasites, is currently available for the treatment of persistent diarrhea caused by Cryptosporidium parvum as well as for the treatment of equine myeloencephalitis caused by the apicomplexan Sarcocystis neurona (Esposito, et al., 2005). Treatments against Babesia include diminazene diaceturate, effective for bovine and canine babesiosis, and quinine and clindamycin for symptomatic human babesiosis (Vial and Gorenflot, 2006). Ionophores are fermentation products of Streptomyces and other microbial species, and are used extensively as treatment for poultry coccidiosis caused by Eimeria spp. (Yadav and Gupta, 2001).

1.1.7 Metabolism as a source of drug targets

Enzymes are considered by many in the pharmaceutical community to be the most attractive targets for small molecule drug intervention in human diseases. The attractiveness of enzymes as targets stems from their essential catalytic roles in many physiological processes that may be altered in disease states. In addition, the structures of enzyme active sites and other ligand binding pockets on enzymes lend themselves well to high-affinity interactions with drug-like inhibitors. Not surprisingly, enzymes represent the most prominent target family inhibited by drugs currently on the market (Hopkins and Groom, 2002; Imming, et al., 2006). A recent systematic study indicates that enzymes comprise 29% of all human drug targets approved by the US Food and Drug Administration (Rask-Andersen, et al., 2011). Here I provide a brief overview of the salient features of enzyme catalysis and structure that make this class of macromolecules such attractive targets for chemotherapeutic intervention in human diseases.

1.1.7.1 Enzymes are essential for life

A simple but common definition of life as “a series of chemical reactions” reflects the fact that living cells, and in turn organisms, depend on chemical transformations for every essential life process. The synthesis of the building blocks of life (proteins, nucleic acids, polysaccharides, and lipids) involves sequential series of chemical reactions which, if uncatalyzed, would proceed at a rate too slow to sustain life. For instance, the formation of UMP via the decarboxylation of OMP is accelerated 10^17-fold by the enzyme OMP decarboxylase (EC 4.1.1.23), enabling the reaction to proceed at the rate necessary for living organisms (Radzicka and Wolfenden, 1995). Given the essentiality of enzymes for life, selective inhibition of critical enzymes in apicomplexan parasites is an attractive means of therapeutic intervention for many human diseases such as malaria, toxoplasmosis, and cryptosporidiosis.

1.1.7.2 Mechanistic basis of enzymes for highly directed drug design

Because of the mechanisms associated with the catalytic process, enzymes offer unique opportunities for drug design that are not available for cell surface receptors, nuclear hormone receptors, ion channels, transporters, and DNA (Robertson, 2005). In fact, the majority of drugs that target enzymes have some resemblance to the respective substrate, either by undergoing catalysis in the active site, chemically reacting with the enzyme cofactor, or containing a structural motif related to the substrate (Robertson, 2005). Thus, by competing with the natural substrate, these drugs inhibit enzyme activity by wasting catalytic cycles. Sulfa drugs, for instance, including those that are used against dihydropteroate synthase (DHPS) in malaria parasites, are structural analogues of the enzyme’s substrate, pABA, and act by competitive inhibition (Zhang and Meshnick, 1991). Another feature that has been exploited for enzyme-targeted drugs is their ability to function as “transition-state inhibitors”, based on the principle that enzymes bind more tightly to substrates that are distorted towards the transition state, thereby enforcing catalysis (Schramm, 1998). Oseltamivir and zanamivir, which are now used to treat influenza, are two potent transition-state inhibitors that were designed using the crystal structure of the enzyme and a model of the transition state (von Itzstein, et al., 1993). Many drugs that target enzymes are irreversible inhibitors; in these cases, either the enzyme is covalently modified by the respective drug, or binding lasts so long (up to days) that it is considered functionally irreversible (Robertson, 2005). Aspirin, perhaps the most highly consumed drug in North America, is an irreversible covalent inactivator of prostaglandin-endoperoxide synthase; incubation of the purified enzyme with [acetyl-3H]aspirin leads to inactivation and covalent modification of a single serine residue, which can be recovered as [3H]acetylserine in proteolytic digests of the enzyme (Van Der Ouderaa, et al., 1980). In short, the distinguishing features that enable enzymes to perform a selective chemical reaction set them apart as a specialized class of targets for highly directed drug design.

1.2 Experimental manipulations in the Apicomplexa

Unlike other apicomplexans, T. gondii is readily amenable to genetic manipulation in the laboratory. Features such as the high efficiency of transient and stable transfection, the availability of many cell markers, and the relative ease with which the parasite can be studied using advanced microscopic techniques make cell biology studies of the parasite easier to perform (Kim and Weiss, 2004). For these reasons, T. gondii remains the best model system for studying the biology of the Apicomplexa. Much of the understanding of mechanisms of drug resistance, biology of the apicoplast, and the process of host cell invasion has been advanced through studies in T. gondii. While recent studies of the Apicomplexa facilitated by genome sequencing have revealed surprising differences in cell biology and metabolism across the phylum, T. gondii remains an important system for understanding the biology of apicomplexan parasites.

1.2.1 Methods for genetic manipulation

1.2.1.1 Transfection and transformation

A protocol for transfection was first established through the combined efforts of several laboratories (Donald and Roos, 1993; Kim, et al., 1993; Soldati and Boothroyd, 1993). The ability to transfect Toxoplasma led to the rapid development of various tools for genetic manipulation of T. gondii. For example, gene disruptions and the stable expression of transgenes are readily achievable with transformation through either homologous recombination or random integration. Furthermore, a wide variety of markers are available for the selection of stable transfectants (Donald, et al., 1996; Donald and Roos, 1993; Kim, et al., 1993; Messina, et al., 1995; Sibley, et al., 1994; Soldati, et al., 1995). Currently, T. gondii remains the apicomplexan species most readily amenable to genetic manipulation. Transient transfection efficiency is high and expression of epitope tags, reporter constructs and heterologous proteins is relatively uncomplicated (Kim and Weiss, 2004).

1.2.1.2 Gene knockouts

Transfection technology can also be used to remove or alter endogenous genes, where gene targeting or “knock-out” experiments provide a means for assessing the involvement of a gene and its product in a given biological process. T. gondii is haploid and most proteins are encoded by a single-copy gene, so targeting a particular locus is usually sufficient to produce a null mutant. The approach of gene targeting is based on a double-crossover homologous recombination event between the replacement locus provided by transfection of a plasmid and the target locus in the genome. When linearized, constructs for gene targeting typically contain genomic sequences from the 5' and 3' ends of the target gene flanking a selectable marker. If an assay for phenotypic selection against the target gene is available, gene targeting is straightforward. In the absence of selection, however, random integration can produce considerable background, which can be counter-selected through the introduction of a second, negative selectable marker outside the homologous flanking sequence (Gilbert, et al., 2007). Deletion of the gene encoding KU80, an essential component of the nonhomologous recombination pathway, significantly increases the efficiency of gene targeting (Fox, et al., 2009).

For essential genes, gene targeting is unsuitable since knocking out the gene will result in a non-viable mutant that cannot be studied. For instance, a gene targeting experiment in T. gondii may “fail” (i.e. be unable to produce a knockout line), indicating either that the locus is essential or that integration of the targeting plasmid is occurring largely by non-homologous recombination. In such cases, the use of a conditional gene knockout approach is more appropriate and has been done using the tetracycline regulatable expression system, which enables the observation of loss-of-function mutants for essential genes (Meissner, et al., 2001). The system is not perfect, however, since in order to maintain expression of the essential gene, parasites must be in the presence of the drug anhydrotetracycline (ATc) for a prolonged period of time, eventually leading to the generation of revertants and subsequent loss of regulation. An improved version of the system based on the use of a functional transcriptional activating is suitable for the conditional disruption of essential genes with no apparent reversion effects (Meissner, et al., 2002).

1.3 Metabolic reconstruction and analysis as a route to drug discovery

Analysis of metabolic networks has proven useful for many applications (Pal, et al., 2006; Pharkya and Maranas, 2006; Samal, et al., 2006), and in the context of eukaryotic parasites has the potential to play an important role in drug discovery (Cowman and Crabb, 2003). With advances in metabolic network analysis and the increasing number of apicomplexan and host genomes, we can gain insights into parasitology through metabolic reconstruction. Previous analyses of metabolic networks revealed the presence of many pathway holes, enzymes that are essential for a complete biochemical pathway, but for which no gene has been identified (Green and Karp, 2004). This is especially evident in parasite genomes due to the large phylogenetic distance that separates parasites and model organisms (Pinney, et al., 2005). These enzymes are difficult to identify as they may either be absent and supplied by the host, or present and have diverged too far to be recognizable.
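The notion of a pathway hole can be made concrete with a short sketch. The following is only an illustration under stated assumptions: the pathway names, their EC number memberships and the function name find_pathway_holes are hypothetical placeholders and do not correspond to any real pathway definition or to the tools cited above.

```python
from typing import Dict, Set


def find_pathway_holes(pathways: Dict[str, Set[str]],
                       annotated: Set[str]) -> Dict[str, Set[str]]:
    """Return, per pathway, the required EC numbers that lack an annotated gene."""
    holes = {}
    for name, required in pathways.items():
        missing = required - annotated
        # Report only partially present pathways: a pathway with no annotated
        # enzyme at all is more plausibly absent than riddled with holes.
        if missing and missing != required:
            holes[name] = missing
    return holes


# Toy data (hypothetical pathway membership, for illustration only).
pathways = {
    "pathway_A": {"1.1.1.1", "2.7.1.1", "4.1.1.23"},
    "pathway_B": {"3.5.4.9", "6.3.4.3"},
}
annotated_ecs = {"1.1.1.1", "4.1.1.23"}

holes = find_pathway_holes(pathways, annotated_ecs)
print(holes)
```

In this toy case only pathway_A is reported, since pathway_B has no annotated enzymes and is treated as absent rather than incomplete. Whether a hole reflects host supply or a diverged enzyme cannot be decided from such a list alone.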

The apicomplexans have developed strategies to survive in their environmental niche and as a result have that have been shaped largely by adaptive evolution. Defining features such as the apicoplast have revealed pathways that are novel to apicomplexans and absent from the host. For example, the shikimate pathway, previously believed to be confined to , , and fungi, has been shown to be functional and necessary for growth in Plasmodium, Toxoplasma, and Cryptosporidium (Roberts, et al., 1998). The pathway comprises seven enzymes that lead to the production of chorismate, a common precursor to folates and aromatic compounds. Humans and other mammals, which lack the shikimate pathway, rely exclusively on exogenous folates and aromatic compounds. The abandonment of some pathways is also a driving force in the evolution of obligate parasites, and is particularly evident in Cryptosporidium. With one of the smallest known apicomplexan genomes, Cryptosporidium is an example where metabolic streamlining has occurred due to a reliance on the host for supplying essential nutrients (Abrahamsen, et al., 2004; Xu, et al., 2004). Thus, there is a need to fill these gaps to better understand the biology of these parasites. Comparisons between apicomplexan and host metabolic networks are useful for highlighting specific adaptations that the parasite evolved to survive in the host. More in-depth global comparisons and analyses of metabolic networks will provide insights to better understand the biology of Apicomplexa.
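At its simplest, a comparison between parasite and host networks reduces to a set difference over enzyme complements. The sketch below assumes each network is summarized as a set of EC numbers; the sets shown are toy values chosen for illustration, and the function name parasite_specific is hypothetical.

```python
from typing import Iterable, List


def parasite_specific(parasite_ecs: Iterable[str],
                      host_ecs: Iterable[str]) -> List[str]:
    """EC numbers present in the parasite network but absent from the host."""
    return sorted(set(parasite_ecs) - set(host_ecs))


# Toy enzyme complements (illustrative, not real annotation sets).
parasite = {"2.7.1.1", "2.7.1.71", "4.2.3.5"}
host = {"2.7.1.1", "1.1.1.1"}

candidates = parasite_specific(parasite, host)
print(candidates)
```

Such a list is only a starting point: an enzyme absent from the host by annotation may still have a functional host counterpart that has not been recognized.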

1.3.1 Metabolic reconstruction for the Apicomplexa

Metabolic reconstruction has become an indispensable tool for studying metabolism from a systems perspective (Feist and Palsson, 2008). In addition to crystallizing current knowledge on metabolism and highlighting areas requiring further investigation, in the context of parasites, reconstructions are playing an increasingly important role in drug discovery (Pinney, et al., 2007). Early metabolic reconstructions focusing on model organisms such as and yeast relied almost exclusively on experimental evidence and are hence extremely well characterized (Herrgard, et al., 2008; Reed and Palsson, 2003). Accordingly, they serve as useful models for the generation of reconstructions in less well-studied species. For organisms with a fully sequenced genome, established protocols have been developed that typically involve the initial generation of a draft reconstruction using automated homology-based methods, followed by iterative rounds of refinement through experimentation (when possible) and manual curation, whereby functions and reactions are individually evaluated against organism-specific literature and/or expert opinion (Thiele and Palsson, 2010). See Figure 1-6 for a schematic overview of the metabolic reconstruction process.

Figure 1-6 Metabolic reconstruction for apicomplexan genomes. The steps involved in generating a draft metabolic reconstruction beginning with the genome sequence and ultimately finishing with a list of potential drug targets. Data structures are represented by light blue rounded boxes, and methods are displayed as pink boxes.

Genomes for the Apicomplexa are now available for several species (Section 1.1.1.1). The ability to identify enzymes in these genomes, however, is complicated by high levels of sequence and functional divergence relative to the well-annotated model organisms. In adapting to a broad range of host environments, these obligate parasites have acquired a variety of sophisticated strategies to access host cell nutrients, resulting in significant remodelling of their own metabolism (Seeber, et al., 2008). Comparisons to model organisms reveal an overall marked reduction in metabolic capabilities due to extensive gene loss, together with lineage-specific innovations resulting from lateral gene transfer, endosymbiotic events, or high rates of divergence (Anantharaman, et al., 2007; Wasmuth, et al., 2009). Amongst these innovations might be a number of lineage-specific enzymes that do not share similarity to previously sequenced enzymes and in the absence of biochemical data will remain uncharacterized (Kuo and Kissinger, 2008). Hence, although draft reconstructions for Apicomplexa can be easily generated, in the absence of a comprehensive biochemical investigation, these reconstructions should only be considered as a hypothetical view of an organism’s metabolic capabilities, limiting subsequent analyses (Ginsburg, 2009). Nevertheless, such reconstructions allow the generation of testable hypotheses concerning the presence of key enzymes and pathways. For example, a recent reconstruction of Plasmodium falciparum metabolism suggested nicotinate mononucleotide adenylyltransferase to be essential. This was subsequently confirmed in a biochemical assay (Plata, et al., 2010).

1.3.2 Importance of enzyme annotation for metabolic reconstruction

Prior to the advent of genomic sequencing, metabolic network construction involved painstaking studies focused on individual enzymes derived from an organism of interest. As a result, only a small number of reconstructions were possible (Papoutsakis, 1984), but these are very well developed and serve as models for the reconstruction process in other organisms (Christie, et al., 2004). The challenge is now to apply this knowledge to help annotate the increasing numbers of genomes that are currently being sequenced. The process has been aided by the development of the Enzyme Commission (EC) hierarchy, in which sequences are mapped to distinct enzymatic functions. Due to the large number of genes encoded by a genome, the annotation process is reliant on the use of automated methods based on sequence similarity or profile discovery. Commonly used tools such as BLAST (Altschul, et al., 1990) are useful when looking for homologues based on overall sequence similarity, while more sensitive profile-based methods [e.g. PFAM (Bateman, et al., 2002), PROSITE (Hulo, et al., 2008)] focus on the conservation of domains, sequence patterns and even single residues (Mistry, et al., 2007; Zhang, et al., 2008; Zhang and Meshnick, 1991).

1.3.2.1 Enzyme annotation

To determine the enzyme complement of an organism, genes encoding catalytic proteins are mapped to an Enzyme Commission (EC) number: a hierarchical classification system that uniquely assigns a four-digit number to biochemically characterized enzymatic reactions (http://www.chem.qmul.ac.uk/iubmb/enzyme/). For example, EC 1.X.X.X represents classes of oxidoreductases, whereas EC 1.1.1.1 represents specifically alcohol dehydrogenase. Because of the complex lifecycles of parasites, enzyme annotation is not amenable to genome-scale biochemical investigations. Instead, a number of alternative methods have been employed. Herein, accessible, competitive, and actively maintained resources suitable for the metabolic reconstruction of apicomplexan parasites will be the focus (Table 1-2). These methods can be categorized according to the level of annotation, varying from purely automated approaches, where the set of gene-enzyme mappings is obtained in a high-throughput fashion, to fully curated datasets, where an expert has validated the occurrence of each individual enzyme. Manual curation is a time-consuming process limited by the availability of curators. Consequently, such datasets tend to be relatively incomplete, albeit of the highest quality. By contrast, automated datasets can be rapidly generated for an entire genome, providing a higher level of coverage at the expense of accuracy. Several methods for enzyme annotation in P. falciparum have been evaluated (Ginsburg, 2009; Hung, et al., 2010) and surprisingly display a lack of overlap. For example, of 327 enzymes annotated by KEGG, only 269 are similarly annotated in BioCyc (Hung, et al., 2010). Because each approach has its own merits, a more meaningful reconstruction can be derived with a judicious integration of these datasets, providing a compromise between accuracy and coverage.
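The hierarchical nature of EC numbers, and the overlap figure quoted above, can be illustrated with a brief sketch. The helper names ec_levels and ec_match_depth are hypothetical, and the treatment of "-" as an unspecified level is an assumption about how partial EC numbers might be written.

```python
from typing import List


def ec_levels(ec: str) -> List[str]:
    """Split an EC number such as '1.1.1.1' into its four hierarchical levels."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels in EC number, got {ec!r}")
    return parts


def ec_match_depth(a: str, b: str) -> int:
    """Count the leading levels shared by two EC numbers (0 to 4)."""
    depth = 0
    for x, y in zip(ec_levels(a), ec_levels(b)):
        # Assumption: '-' marks an unspecified level and never counts as a match.
        if x != y or x == "-":
            break
        depth += 1
    return depth


# Two oxidoreductases acting on the same donor/acceptor type share three levels.
same_subsubclass = ec_match_depth("1.1.1.1", "1.1.1.2")
# An oxidoreductase and a transferase share none.
different_class = ec_match_depth("1.1.1.1", "2.7.1.1")

# Overlap quoted in the text: 269 of 327 KEGG enzymes also annotated in BioCyc.
overlap_fraction = 269 / 327
print(same_subsubclass, different_class, round(overlap_fraction, 3))
```

The quoted figures correspond to roughly 82% of the KEGG annotations being shared with BioCyc, leaving 58 enzymes that appear in KEGG only.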

Table 1-2 Tools and resources for apicomplexan enzyme annotation.

Resource    Approach / content                                                    URL

Automated tools
BLAST       Local alignment similarity                                            http://blast.ncbi.nlm.nih.gov/
DETECT      Density estimation                                                    http://www.compsysbio.org/projects/DETECT/
EFICAz      Functionally conserved residues                                       http://cssb2.biology.gatech.edu/EFICAz/
PRIAM       Enzyme-specific profiles                                              http://priam.prabi.fr/

Curated resources
BioCyc      Metabolic pathways                                                    www.biocyc.org
EuPathDB    Genomics and functional genomics database for eukaryotic pathogens    http://eupathdb.org/eupathdb/
KEGG        General metabolism                                                    http://www.genome.jp/kegg/
BRENDA      Enzymes found in primary literature                                   http://www.brenda-enzymes.org/
MPMP        Manually constructed pathways in P. falciparum                        http://sites.huji.ac.il/malaria/
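One way to picture an integration of the curated and automated resources in Table 1-2 is sketched below. The rule used here (accept every curated assignment, and accept an automated assignment only when at least two tools agree) is an assumption made purely for illustration and is not presented as the integration strategy used in this thesis; the gene identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

Annotations = Dict[str, Set[str]]  # gene identifier -> set of EC numbers


def integrate(curated: Annotations,
              automated: Dict[str, Annotations],
              min_support: int = 2) -> Annotations:
    """Merge curated annotations with automated ones supported by several tools."""
    merged = defaultdict(set)
    for gene, ecs in curated.items():
        merged[gene] |= ecs  # curated assignments are accepted as-is

    # Count how many automated tools propose each (gene, EC) pair.
    votes = defaultdict(lambda: defaultdict(int))
    for predictions in automated.values():
        for gene, ecs in predictions.items():
            for ec in ecs:
                votes[gene][ec] += 1

    for gene, ec_votes in votes.items():
        for ec, n in ec_votes.items():
            if n >= min_support:
                merged[gene].add(ec)
    return dict(merged)


curated = {"gene_1": {"1.1.1.1"}}
automated = {
    "BLAST": {"gene_2": {"2.7.1.1"}, "gene_3": {"4.1.1.23"}},
    "PRIAM": {"gene_2": {"2.7.1.1"}},
}
integrated = integrate(curated, automated)
print(integrated)
```

Raising min_support favours accuracy, while lowering it to 1 favours coverage, which mirrors the trade-off between curated and automated datasets.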

1.3.2.2 Automated methods for enzyme prediction

For the identification of enzymes that are highly conserved and well represented in sequence databases, traditional homology-based searches are appropriate. Typically such enzymes are associated with core functionalities such as nucleotide and energy metabolism (Peregrin-Alvarez, et al., 2009). For this reason, the application of BLAST (Altschul, et al., 1990), popularized for its usability and speed, is often the first method of choice for obtaining a first-pass annotation of a genome. In its simplest form, the user annotates a gene with the top scoring enzyme match based on a specified E-value cutoff. Other homology-based methods include the profile methods such as PRIAM (Claudel-Renard, et al., 2003) and its genomic equivalents metaSHARK and metaTIGER (Pinney, et al., 2005). The PRIAM tool searches a set of position-specific scoring matrices generated using PSI-BLAST to search for enzymes in the proteome. metaTIGER improves on PRIAM by employing separate profiles encapsulating well-annotated eukaryotes (Pinney, et al., 2005). Whereas a recent analysis of metaTIGER applied to P. falciparum identified the need for additional curation (Ginsburg, 2009; Hung, et al., 2010), a subsequent analysis nonetheless demonstrated its utility at least for annotating certain pathways (Whitaker, et al., 2009).
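The simplest form of BLAST-based annotation described above, assigning the top-scoring enzyme match that passes an E-value cutoff, can be sketched as follows. The sketch assumes that hits have already been parsed into tuples; the tuple layout, the default cutoff of 1e-10 and the gene identifiers are illustrative assumptions rather than recommended settings.

```python
from typing import Dict, Iterable, Tuple

# (query gene, EC number of the matched enzyme, E-value, bit score)
Hit = Tuple[str, str, float, float]


def annotate_top_hit(hits: Iterable[Hit],
                     evalue_cutoff: float = 1e-10) -> Dict[str, str]:
    """Assign each gene the EC number of its best-scoring hit under the cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for gene, ec, evalue, bitscore in hits:
        if evalue > evalue_cutoff:
            continue  # hit is not significant enough to be considered
        if gene not in best or bitscore > best[gene][1]:
            best[gene] = (ec, bitscore)
    return {gene: ec for gene, (ec, _) in best.items()}


# Toy hits (hypothetical values).
hits = [
    ("gene_1", "2.7.1.1", 1e-80, 290.0),
    ("gene_1", "2.7.1.2", 1e-40, 150.0),
    ("gene_2", "1.1.1.1", 1e-3, 35.0),
]
annotations = annotate_top_hit(hits)
print(annotations)
```

Here gene_2 receives no annotation because its only hit fails the cutoff, and gene_1 is assigned a single EC number even though a second significant hit exists.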

Methods like BLAST operate on the basis of single-sequence comparisons, where the top-scoring match defines the catalytic activity that is predicted for a given protein sequence. This can be problematic for low quality matches and instances where the protein sequence represents a multifunctional enzyme that can arise through gene fusion events, as exemplified by the bifunctional enzyme dihydrofolate reductase-thymidylate synthase (Bzik, et al., 1987). As an indication of the scale of this problem, the Swiss-Prot database annotates ~128 000 proteins with an enzymatic activity, of which ~4000 (3%) possess more than one activity. Multifunctional enzymes can also arise where an enzyme demonstrates expanded substrate specificity. For example, the Toxoplasma gondii gene TGME49_024490, annotated as a farnesyl-diphosphate synthase in EuPathDB, has also been shown to possess geranylgeranyl-diphosphate synthase activity (Ling, et al., 2007). A recent assessment of enzyme sequence diversity (Hung, et al., 2010) revealed that nearly 70% of all enzyme families exhibit some degree of overlap between alignments of the same or different enzyme families. Those with the highest degree of overlap belonged to broad substrate enzymes (e.g. protein kinases). Hence by focusing only on top sequence matches, in addition to only partially annotating multifunctional enzymes, it is also possible that the wrong substrate specificity can be assigned.

A recently developed method, DETECT (Hung, et al., 2010), avoids these limitations by calculating a probability score that accounts for the sequence diversity present across enzyme families. Instead of relying on similarity to a single sequence, enzyme activities are predicted based on the integration of similarity scores for all relevant proteins of a particular enzyme family. The resulting probability score generates ranked lists of enzyme candidates that can be exploited for the purposes of metabolic reconstruction. Applied to P. falciparum, DETECT was found to significantly outperform BLAST and PRIAM.

A final noteworthy approach is EFICAz (Arakaki, et al., 2009) which combines four components that each predict enzyme function based on enzyme-specific profiles of functionally discriminating residues. The method has been applied at the genome-scale for over 20 eukaryotes including two apicomplexans, and analysis of the predictions for the human proteome has indicated significantly greater numbers of unique enzyme assignments compared to KEGG; however, further assessments revealed high numbers of false positives together with other inconsistencies (Arakaki, et al., 2009).

1.3.2.3 Curated resources for metabolic reconstruction

Accurate enzyme annotation is crucial if the metabolic reconstruction is to be used for subsequent downstream analyses. Despite advances in enzyme prediction, many problems persist including: incorrect substrate specificity, cofactor usage, reaction reversibility, pathway holes, and false positive predictions. Software tools such as metaSHARK and PathwayTools in conjunction with methods for gap analysis (Chen and Vitkup, 2006; Lee and Sonnhammer, 2003; Suhre, 2007; Zhou, et al., 2008) can be used to aid in the refinement process, but do not replace manual curation. Accordingly, several research groups have devoted significant time and effort to provide high quality annotations by improving enzyme predictions with expert curation.

As a partially curated metabolic resource, KEGG is widely used for its reference pathway maps based on the current knowledge of biochemistry. Pathway maps are presented as reaction networks where enzymes (and compounds) link to detailed entries containing information on nomenclature, chemistry, and gene annotation. KEGG performs automatic genome-based annotation for the prediction of enzyme-encoding genes by applying a method based on BLAST for best-hit functional assignment. Although enzyme entries in KEGG include support from the primary literature, these references are absent for most species. Further, in the absence of the scores used to identify enzymes based on similarity searches, the quality of the annotations can be difficult to assess.

BioCyc (Caspi, et al., 2010) is a collection of over 500 organism-specific Pathway/Genome Databases (PGDBs) resulting from the prediction of metabolic networks by the PathwayTools software (Karp, et al., 2009). PGDBs are constructed from experimentally determined pathways (Caspi, et al., 2006), but because enzymatic functions are inferred by matches to gene annotations, the quality of such an automated construction can be compromised for organisms with a poorly annotated genome. Fortunately, databases that have been developed for apicomplexan species are designated ‘Tier 2’ and therefore receive a moderate amount of review with occasional updating. Curators that collaborate on a given PGDB can assign credits to objects in the database and have the ability to contribute literature references, evidence codes for enzyme functions, and other information that is not present in pathway databases such as KEGG (Kanehisa, et al., 2010).

EuPathDB (Aurrecoechea, et al., 2009), an integrated database for 11 eukaryotic pathogens including four genera of apicomplexans, provides access to genomic and functional genomic data with links to external databases such as KEGG, BLAST, and OrthoMCL (Chen and Vitkup, 2006). Apicomplexan PGDBs from BioCyc are also located in EuPathDB (Karp, et al., 2009), but differences in numbers of predicted enzymes indicate that data from EuPathDB is not fully synchronized with BioCyc (Figure 3-1). For instance, PGDBs are available for P. berghei, P. chabaudi, and P. falciparum, but EuPathDB provides enzyme predictions for only P. falciparum. Notwithstanding, EuPathDB provides high quality enzyme annotations for the Apicomplexa, which include enzyme predictions from BioCyc, annotations from sequencing annotation centers, and additional curation from the scientific community. In some cases, contributions are included for fully curated datasets such as enzyme predictions from the manually constructed metabolic pathways for P. falciparum.

The malaria parasite metabolic pathway (MPMP) resource (Ginsburg, 2006) represents a gold standard set of annotations for P. falciparum. Pathways are manually curated based on the presence of at least three to four enzymes acting consecutively and with annotations in the genome. Pathways not meeting these criteria are also accepted if supported by biochemical evidence. Because of its comprehensive coverage and focus on pathways relevant to P. falciparum, MPMP remains the most authoritative metabolic database available for any apicomplexan. As such, MPMP is not only an essential tool for exploring malarial metabolism but serves as a valuable model for the development of similar resources representing other apicomplexans.

Finally, the BRENDA database (Chang, et al., 2009) provides a resource for enzymes that have been identified in the primary literature, with the latest release (July 2010) indicating that P. falciparum contains 162 experimentally verified enzymes. BRENDA also contains a wide range of functional data for enzyme classes including biochemical and molecular properties of enzymes.

1.3.3 Modelling, Network-based methods and FBA: Applications to drug discovery

Many researchers have applied computational approaches to study topological characteristics of the metabolic network and their impact on various functional properties (Deutscher, et al., 2006; Jeong, et al., 2000; Stelling, et al., 2002). As well, a number of well-developed tools are available for network analysis including Cytoscape (Shannon, et al., 2003) and tYNA (Yip, et al., 2006). These tools can be used to analyse the structure and organization of the network to identify enzymes that are critically positioned in the network and therefore likely to play an important role in the metabolism of the organism. Examples of commonly used parameters are illustrated in Figure 1-7. Comparison of these parameters between apicomplexans and model organisms may reveal critical enzymes such as those defined as “hubs” (Albert, et al., 2000), which are highly connected nodes essential to the integrity of the network, “chokepoints” (Yeh, et al., 2004), which are enzymes that catalyze reactions that cannot be compensated by any other enzyme, and “bottlenecks” (Yu, et al., 2007), which are paths that control the flow of most of the reactions and therefore represent critical points in the network.

Figure 1-7 Common measures used in network analysis. (i) Shortest path is the minimum number of edges between two nodes (the shortest path between 1 and 8 is 1 → 5 → 8); (ii) Betweenness measures the number of shortest paths passing through a node (node 8 has the highest betweenness centrality); (iii) Degree represents the number of edges connected to a node (node 7 has the highest degree).
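
As a rough illustration of the three measures defined in Figure 1-7, the following minimal Python sketch computes them with a breadth-first search. The network is a small hypothetical example invented for this purpose (it is not the graph drawn in the figure), and the function names are assumptions. Betweenness is computed here as the sum, over all node pairs, of the fraction of shortest paths that pass through a node.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network (node -> set of neighbours), invented for illustration.
GRAPH = {
    1: {5},
    2: {7},
    3: {7},
    4: {7},
    5: {1, 8},
    7: {2, 3, 4, 8},
    8: {5, 7},
}


def bfs(graph, source):
    """Return (distance, number of shortest paths) from source to each reachable node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:            # first time this neighbour is reached
                dist[nb] = dist[node] + 1
                paths[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:  # node lies on a shortest path to nb
                paths[nb] += paths[node]
    return dist, paths


def degree(graph):
    """Number of edges attached to each node."""
    return {node: len(nbs) for node, nbs in graph.items()}


def betweenness(graph):
    """Sum over node pairs of the fraction of shortest paths passing through each node."""
    info = {node: bfs(graph, node) for node in graph}
    score = {node: 0.0 for node in graph}
    for s, t in combinations(graph, 2):
        dist_s, paths_s = info[s]
        dist_t, paths_t = info[t]
        if t not in dist_s:               # s and t are not connected
            continue
        for v in graph:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:
                score[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return score


distances, _ = bfs(GRAPH, 1)
print("shortest path length 1 -> 8:", distances[8])
print("degree:", degree(GRAPH))
print("betweenness:", betweenness(GRAPH))
```

On a real metabolic network the nodes would be enzymes or metabolites, and dedicated tools such as those cited above would normally be used instead of hand-written code.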

With the increasing availability of high quality metabolic reconstructions, a variety of modelling procedures have been developed to analyze how these reconstructions are organized and operate. In particular, a number of constraint-based approaches have been developed which now make it possible to study how systems organize with scarce kinetic data (Raman and Chandra, 2009). The most established method is flux balance analysis (FBA) (Kauffman, et al., 2003; Lee, et al., 2006). FBA is a sophisticated and elegant mathematical modelling framework that captures the complex interrelationships of metabolic networks. Within this framework, the flux of metabolites is computed through the network to allow for the identification of key enzymes and pathways that process nutrients imported into the cell to the final metabolites required for growth.
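
For orientation, FBA is usually stated as a linear program. The formulation below is the standard textbook form, given here as general background rather than as the specific setup of any of the cited studies; the symbols are introduced for illustration only.

$$\max_{v}\; c^{\mathsf T} v \quad \text{subject to} \quad S\,v = 0, \qquad l_j \le v_j \le u_j \;\; \text{for every reaction } j$$

Here $S$ is the stoichiometric matrix (metabolites by reactions), $v$ is the vector of reaction fluxes, $c$ selects the objective (commonly a biomass or growth reaction), and $l_j$, $u_j$ are lower and upper bounds on each flux. The constraint $S\,v = 0$ expresses the steady-state assumption that every internal metabolite is produced as fast as it is consumed. Under this formulation, a gene knockout can be mimicked by setting $l_j = u_j = 0$ for the affected reactions, and a change in growth conditions by adjusting the bounds on the corresponding uptake reactions.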

As the fundamentals of flux balance analysis are quite simple, the method has found diverse uses in physiological studies, gap-filling efforts and genome-scale synthetic biology (Feist and Palsson, 2008). In addition, by altering bounds on certain reactions, growth in different conditions or growth of the organism with multiple gene knockouts can be simulated (Edwards, et al., 2002); FBA can then be used to predict the yields of important cofactors such as ATP, NADH, or NADPH (Varma and Palsson, 1995). For most genome-scale metabolic reconstructions which are incomplete and contain “knowledge gaps”, FBA-based algorithms can help to predict which reactions are missing by comparing in silico simulations to experimental results (Kumar and Maranas, 2009; Reed, et al., 2006). FBA has also found applications in metabolic engineering, where the goal is to confer useful properties to a biological system, such as the ability to achieve higher efficiencies in metabolite production through alterations in metabolic flux distribution. The FBA-based algorithm, OptKnock (Burgard, et al., 2003), for example, can predict gene knockouts that allow an organism to produce desirable compounds (Feist, et al., 2009; Park, et al., 2009).

There is a growing effort to use network models to identify drug targets and characterize mechanisms of disease. For instance, a network-based pipeline for identifying potential antimicrobials is in the process of development (Shen, et al., 2010), while the human metabolic network reconstruction has been analysed to identify alternative enzyme drug targets for treating hyperlipidemia (Ma, et al., 2007) and subsequently to predict biomarker changes characterizing a diverse set of genetically inherited metabolic disorders (Shlomi, et al., 2009). Importantly, two recent metabolic models for P. falciparum have been published predicting distinct but overlapping sets of clinically relevant drug targets (Huthmacher, et al., 2010; Plata, et al., 2010). Huthmacher et al. integrate gene expression data to provide a life cycle stage-specific model that predicts 35 of 57 experimentally demonstrated essential enzymes. The model by Plata et al. is able to reproduce phenotypes of experimental gene knockout and drug inhibition assays with up to 90% accuracy. While each builds distinct frameworks to represent the metabolism of the malaria parasite, both models are able to produce predictions that are in good accordance with experimental evidence. These along with future models that incorporate additional experimental datasets for more precise metabolic measurements will lead to a better understanding of parasite physiology and ultimately accelerate the identification of desperately needed new drug leads against apicomplexan diseases.

1.4 Project goals

Our current knowledge of apicomplexan metabolism has revealed a number of pathways specific to the phylum, but for which many enzymes have not yet been elucidated. Due to selection pressures associated with surviving in an obligate host, these enzymes are either absent from the parasite, or present in the parasite and have evolved to be highly divergent from the host organism. To answer key questions concerning the evolution and conservation of apicomplexan parasites, I am interested in identifying parasite-specific enzymes that are critical for survival and up-regulated during stages of parasite growth and infection. These represent adaptations of the parasite to persist in the host and are important targets for therapeutic intervention. With the availability of genomics, functional genomics, and high-throughput datasets, I can systematically study the metabolism of apicomplexan parasites (as outlined by my three main aims). Beginning an analysis of apicomplexan metabolism requires the reconstruction of its metabolic network through the accurate identification of enzymes. The network serves as an ideal platform for comparative analysis (Aim 1), where I can explore how pathways vary within and across genera to highlight important enzymes based on their topological relationship to other enzymes in the network and their patterns in sequence conservation relative to other organisms and taxonomic classes. These enzymes, which are critical for metabolism and divergent from human, represent important adaptations in apicomplexans. The in vivo significance of these enzymes will be examined using gene expression data (Aim 2). Transcriptome data generated through microarray and RNASeq technologies will be analysed to highlight the enzymes that are significantly up-regulated during clinically relevant stages of the parasite and therefore critical for survival in the host. Candidate enzymes identified in silico will be experimentally validated (Aim 3). Biochemical assays will confirm the activity of the enzymes, and knockout studies will establish if the identified genes are essential for parasite survival and persistence. The results from this work will provide a wealth of insights into the biology of apicomplexans and a foundation for the development of anti-parasitic therapeutics.

Chapter 2 Improving enzyme annotation for accurate metabolic reconstruction

2.1 Introduction

Due to the large number of genes encoded by a genome, the annotation of genes is heavily reliant on the use of automated methods based on sequence similarity (e.g. BLAST) or profile discovery (e.g. PFAM). Although widely applied in databases such as KEGG (Kanehisa, et al., 2007) and BioCyc (Caspi, et al., 2007), sequence-based methods suffer from limitations when dealing with enzymatic proteins. In particular, homology transfer methods typically ignore the range of sequence variation characterizing individual enzyme classes; while certain enzymes are relatively easy to classify based on sequence conservation, there are numerous examples of enzymes that share high sequence similarity and catalyze different reactions (Rost, 2002). Furthermore, the use of only a single protein match to assign function (usually the highest scoring homolog) greatly impacts the reliability of the assignment. In the absence of some measure of reliability, genes may be indiscriminately annotated with incorrect EC numbers that can lead to non-biologically relevant interpretations (Green and Karp, 2005). Consequently, there is a need to associate enzyme annotations with an appropriate confidence score, which is lacking even in recently developed functional classification tools (Espadaler, et al., 2008; Levy, et al., 2005). In an attempt to circumvent these issues, enzyme databases such as KEGG and BioCyc devote considerable effort to manual curation to improve on the quality of annotations. However, critical comparisons of these databases reveal numerous discrepancies, highlighting the importance of developing consistent methods and standards for assigning reliable gene annotations (Ginsburg, 2009).

Herein, I describe a new probabilistic method for enzyme prediction based on both global and local sequence alignment, which I term the Density Estimation Tool for Enzyme ClassificaTion (DETECT). DETECT uses a Bayesian framework that integrates information from density estimation profiles generated for each EC number. Instead of relying on similarity to a single sequence, a probability score is calculated from the similarity scores of all relevant proteins for a particular EC number. Each EC number is represented by a family of proteins that have been either biochemically characterized or computationally predicted to have the same catalytic activity. I define an enzyme family as the set of proteins that share a common EC number. A high-scoring match of an unknown protein to a single member of an enzyme family provides evidence for shared catalytic activity, but is not always sufficient to justify the transfer of function. In many cases, the unknown protein will also exhibit varying degrees of similarity to other members of the family and, more importantly, may also show similarity to proteins from different enzyme families. Based on this prior information, a posterior probability can be computed representing the likelihood of membership to a particular enzyme family. By providing a list of probabilities associated with predicted functions for an unknown enzymatic protein, DETECT not only improves enzyme annotation accuracy, but also provides ranked lists of candidate enzymes that may be exploited for the purposes of metabolic network reconstruction. Applying DETECT to the genome of the apicomplexan Plasmodium falciparum, the causative agent of malaria, I demonstrate how DETECT improves on existing approaches.

2.2 Materials and Methods

2.2.1 Protein sequence and enzyme data

Proteins used for generating the density estimation profiles were obtained from Swiss-Prot Version 57.0, which is considered a non-redundant database. Sequence fragments were excluded by rejecting sequences annotated with a description field containing “fragment”. The 397,213 proteins taken from Swiss-Prot were assigned either: (i) a complete EC number, (ii) an incomplete EC number (either partial EC or contained “n” as an EC digit), (iii) multiple EC numbers, or (iv) no EC number (representing unannotated enzymes and/or non-enzyme proteins). To avoid ambiguity, multifunctional enzymes (proteins that have been assigned multiple EC numbers) and incompletely annotated enzymes were excluded, leaving 127,478 proteins spanning 2,277 complete EC categories. The proteome for P. falciparum was obtained from PlasmoDB (version 5.5 - http://www.plasmodb.org) (Bahl, et al., 2003). As of January 2009, the P. falciparum genome consists of 5459 proteins, 2022 of which have been functionally annotated, leaving 3257 proteins annotated as “unknown function”.
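
The filtering rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the script used for the thesis: the record layout, the function names, and the case-insensitive match on “fragment” are all assumptions, and the example records are hypothetical.

```python
import re

# A complete EC number has four numeric fields, e.g. 2.7.2.3.
COMPLETE_EC = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def is_fragment(description):
    """True if the description mentions 'fragment' (case handling is an assumption)."""
    return "fragment" in description.lower()


def classify_ec(ec_numbers):
    """Assign a protein to one of the four annotation groups (i)-(iv)."""
    if not ec_numbers:
        return "none"
    if len(ec_numbers) > 1:
        return "multiple"
    if COMPLETE_EC.match(ec_numbers[0]):
        return "complete"
    # Partial EC (e.g. 1.1.1.-) or an 'n' digit (e.g. 3.4.21.n1).
    return "incomplete"


def keep_for_profiles(description, ec_numbers):
    """Keep only non-fragment proteins with exactly one complete EC number."""
    return not is_fragment(description) and classify_ec(ec_numbers) == "complete"


# Hypothetical records: (description, list of EC numbers).
RECORDS = [
    ("Phosphoglycerate kinase", ["2.7.2.3"]),
    ("Bifunctional dihydrofolate reductase-thymidylate synthase", ["1.5.1.3", "2.1.1.45"]),
    ("Putative oxidoreductase", ["1.1.1.-"]),
    ("Putative serine protease", ["3.4.21.n1"]),
    ("Malate dehydrogenase (Fragment)", ["1.1.1.37"]),
    ("Uncharacterized protein", []),
]

for description, ecs in RECORDS:
    print(description, "->", classify_ec(ecs), "| kept:", keep_for_profiles(description, ecs))
```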

2.2.2 Generation of probability profiles

To increase the efficiency of the analyses, proteins from Swiss-Prot were first aligned using BLAST and filtered with an E-value < 1. Protein pairs from these alignments were globally aligned using the Needleman-Wunsch algorithm (Rice, et al., 2000), resulting in a total of 55,089,045 alignments covering 4147 species (based on unique Swiss-Prot identifiers). Here, global rather than local alignments have been used to avoid individual domain matches that may not be relevant to the catalytic specificity of the enzyme. All self-alignments and alignments with a BLAST bit score < 50 were considered not meaningful and excluded from the DETECT analysis. EC categories were mapped to query and hit proteins of all global alignments. Alignments were grouped by the EC annotation of the query protein. Each EC-specific set of alignments was further divided into one of two categories: (i) positive alignments: aligned protein is annotated with the same EC number as the query protein, or (ii) negative alignments: aligned protein is annotated with a different EC number (or has no EC number at all). A probability profile was generated for each enzyme, consisting of the density estimation values for positive alignment scores and density estimation values for negative alignment scores. Density estimation values were calculated using the R statistical program (http://www.r-project.org/). This was performed for all enzyme categories with ≥ 30 members (585 EC categories). EC categories with < 30 members were considered to have inadequate sequence data to produce accurate and meaningful probability profiles.
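
To make the idea of a probability profile concrete, the sketch below builds positive and negative score densities for one EC number using a simple Gaussian kernel density estimate. It is a stand-in for the R-based calculation, not a reproduction of it: the kernel, the rule-of-thumb bandwidth, the tuple layout and the function names are all assumptions, and the alignment scores are invented.

```python
import math
import statistics


def gaussian_kde(scores, bandwidth=None):
    """Return a function that estimates the probability density of alignment scores."""
    n = len(scores)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption, the source states no bandwidth.
        bandwidth = 1.06 * statistics.stdev(scores) * n ** (-0.2)
    norm = n * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores) / norm

    return density


def build_profile(alignments, ec):
    """Split the alignments of one EC category into positive and negative densities.

    alignments: list of (query_ec, hit_ec_or_None, score) tuples.
    """
    positive = [s for q, h, s in alignments if q == ec and h == ec]
    negative = [s for q, h, s in alignments if q == ec and h != ec]
    return gaussian_kde(positive), gaussian_kde(negative)


# Hypothetical global alignment scores for queries annotated with EC 2.7.2.3.
ALIGNMENTS = [
    ("2.7.2.3", "2.7.2.3", 850), ("2.7.2.3", "2.7.2.3", 900),
    ("2.7.2.3", "2.7.2.3", 950), ("2.7.2.3", "2.7.2.3", 1000),
    ("2.7.2.3", "2.7.2.3", 880),
    ("2.7.2.3", "1.1.1.37", 100), ("2.7.2.3", None, 150),
    ("2.7.2.3", "2.7.11.1", 200), ("2.7.2.3", None, 120),
    ("2.7.2.3", "1.1.1.27", 180),
]

pos_density, neg_density = build_profile(ALIGNMENTS, "2.7.2.3")
for score in (150, 500, 900):
    print(score, pos_density(score), neg_density(score))
```

In practice a profile would only be built for EC categories with at least 30 members, as described above; the toy example uses five scores per class purely to keep it short.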

2.2.3 Probability score calculation

The probability profiles provide a probabilistic framework for predicting the enzymatic function of an unknown protein based on sequence diversity. For an unknown protein P, pairwise global sequence alignments are generated with every Swiss-Prot protein. The resulting alignments are then categorized by the EC number of the aligned protein (only proteins annotated with EC categories are included in this analysis). The non-redundant set of hit EC numbers E = {E1, E2, …, En} represents n potential enzyme activities that are performed by the unknown protein. A probability score corresponding to the likelihood that P has enzyme activity Ei is calculated for each i=1, 2, …, n, based on the respective alignment scores, together with the prior information known about Ei. These multiple pieces of evidence are integrated using Bayes' theorem as follows:

Given unknown protein P and alignments that have been categorized by EC number as described earlier, let F be the EC category of interest and m be the number of alignments between P and proteins with activity F. Now define the alignments as a1, a2, …, am having hit proteins f1, f2, …, fm, and scores s1, s2, …, sm. The probability pi that P belongs to F based on ai is

$$p_i = P(F \mid s_i) = \frac{P(F)\,P(s_i \mid F)}{P(F)\,P(s_i \mid F) + P(F')\,P(s_i \mid F')} \tag{1}$$

P(F) is the probability that a protein belongs to F, calculated as the ratio of proteins with F activity to the total number of proteins. P(F') is the probability that a protein does not belong to F, and is equal to 1 - P(F). P(si|F) represents the probability of seeing an alignment with score si given that the hit protein has activity F; this is simply the density estimation value taken from the positive hit distribution for the probability profile of F. P(si|F') is the probability of seeing an alignment with score si given that the hit protein does not have activity F, and is equal to the density estimation value taken from the negative hit distribution for the probability profile of F.

Thus, pi represents the likelihood that P belongs to F given its alignment to protein fi with score si. This gives us m likelihood scores: p1, p2, …, pm. Combining these probabilities, a single score is obtained that represents the overall likelihood that P is classified with activity F. The integrated likelihood score is calculated as

$$P_{\text{combined}} = 1 - \prod_{i=1}^{m} (1 - p_i) \tag{2}$$

and represents a combined score that expresses increased confidence when additional evidence is available. While formula (2) is relatively simple to implement and has been applied in other contexts such as the prediction of protein-protein interactions (von Mering, et al., 2005), it is appreciated that an alternative and more elegant solution would be to merge (1) and (2) as previously described (Leontovich, et al., 2008). However, comparisons between these approaches revealed little significant difference. In many instances, an unknown enzyme will have multiple EC predictions, which can then be ranked according to the integrated likelihood score.
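
As a worked illustration of formulas (1) and (2), the sketch below computes the per-alignment posterior and the integrated score for one EC category. The function names are assumptions, and the prior and density values are invented numbers chosen only to show the arithmetic.

```python
def posterior(prior, dens_pos, dens_neg):
    """Formula (1): probability that P belongs to F given a single alignment score.

    prior    -- P(F), fraction of proteins annotated with activity F
    dens_pos -- P(s_i | F), density of the score in the positive distribution
    dens_neg -- P(s_i | F'), density of the score in the negative distribution
    """
    numerator = prior * dens_pos
    denominator = numerator + (1.0 - prior) * dens_neg
    return numerator / denominator if denominator > 0 else 0.0


def combined_score(posteriors):
    """Formula (2): one minus the product of (1 - p_i) over all alignments."""
    remaining = 1.0
    for p in posteriors:
        remaining *= 1.0 - p
    return 1.0 - remaining


# Hypothetical values: P(F) = 0.01 and two alignments to members of F.
PRIOR = 0.01
DENSITIES = [(0.004, 0.0001), (0.002, 0.00002)]  # (positive, negative) per alignment

p_values = [posterior(PRIOR, pos, neg) for pos, neg in DENSITIES]
print("per-alignment posteriors:", p_values)
print("integrated score:", combined_score(p_values))
```

Because each factor (1 - p_i) is at most 1, the integrated score can never be lower than the largest individual posterior, which is the sense in which additional evidence increases confidence.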

2.2.4 Five-fold cross-validation and ROC analysis

To compare the performance of DETECT with BLAST and PSI-BLAST, I applied a five-fold cross-validation approach to predict EC categories for proteins from Swiss-Prot. Proteins annotated with an EC category containing < 30 protein members were removed while the remaining 385,142 proteins were randomly assigned into five sets of equal size. For each set, training data was assembled from the other four data sets and used to generate density estimation distributions for all enzyme classes. Using these distributions, DETECT was applied to generate predictions where each EC number is associated with a probability confidence score. Separate BLAST and PSI-BLAST searches were performed for each protein in the set against the training set and the highest scoring match extracted. PSI-BLAST was run with five iterations and an E-value cutoff of 1. For each method, I defined true positives (TP) as Swiss-Prot annotated enzymes that matched the prediction with a score greater than the assigned cutoff; false positives (FP) as those proteins with incorrect enzyme predictions (including those with no enzyme annotation); true negatives (TN) as proteins neither predicted to have enzymatic activity nor annotated by Swiss-Prot as having enzyme activity; and false negatives (FN) as enzymes annotated by Swiss-Prot but not predicted to have any enzyme activity. Values were determined for a range of score cutoffs (BLAST E-value 0.1 to e-300; DETECT probability 0.1 to 1.0) and were used to generate Receiver Operating Characteristic (ROC) curves.
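
The evaluation procedure can be sketched as follows. This is a minimal illustration of the definitions given above rather than the evaluation code itself: the record layout and function names are assumptions, the example predictions are invented, and scores are assumed to be oriented so that higher is better (a BLAST E-value would first need to be transformed, for example to its negative logarithm).

```python
import random


def five_folds(items, seed=0):
    """Randomly partition items into five folds whose sizes differ by at most one."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::5] for i in range(5)]


def confusion_counts(records, cutoff):
    """Count TP, FP, TN, FN at one score cutoff.

    records: list of (true_ec_or_None, predicted_ec_or_None, score) per protein.
    """
    tp = fp = tn = fn = 0
    for true_ec, pred_ec, score in records:
        called = pred_ec is not None and score > cutoff
        if called and true_ec == pred_ec:
            tp += 1
        elif called:
            fp += 1  # wrong EC number, or a protein with no enzyme annotation
        elif true_ec is None:
            tn += 1
        else:
            fn += 1  # annotated enzyme with no prediction above the cutoff
    return tp, fp, tn, fn


def roc_point(records, cutoff):
    """Return (true positive rate, false positive rate) at one cutoff."""
    tp, fp, tn, fn = confusion_counts(records, cutoff)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical predictions for five proteins.
PREDICTIONS = [
    ("2.7.2.3", "2.7.2.3", 0.95),
    ("1.1.1.37", "1.1.1.27", 0.60),
    (None, "2.7.11.1", 0.30),
    (None, None, 0.0),
    ("3.3.1.1", "3.3.1.1", 0.40),
]

for cutoff in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(cutoff, confusion_counts(PREDICTIONS, cutoff), roc_point(PREDICTIONS, cutoff))
```

Plotting the (false positive rate, true positive rate) pairs across the full range of cutoffs gives the ROC curve for a method.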

2.2.5 Prediction of malarial enzymes

Each automated method (DETECT, BLAST and PRIAM) was applied to the P. falciparum proteome to generate predictions that were compared to the well annotated database of Malaria Parasite Metabolic Pathways (MPMP) (Ginsburg, 2006) (release Dec 17 2008). For the DETECT predictions, density distributions were constructed after the removal of Plasmodium proteins from Swiss-Prot. After removing partial EC numbers, annotations to EC numbers with < 30 proteins, and proteins that mapped to more than one EC number, 457 protein-EC mappings were obtained from MPMP. The query sequences were classified as either: (i) enzymes based on MPMP annotation or (ii) non-enzymes based on Gene Ontology (GO) evidence. The set of enzymes representing 457 unique proteins were annotated by MPMP with a single complete EC number containing at least 30 protein members. The set of non-enzymes represented 274 unique proteins with annotations that were not in MPMP, did not contain “ase”, and mapped to a GO experimental and/or author statement evidence code.

2.3 Results and Discussion

2.3.1 Assessing enzyme diversity

Previous sequence-based methods for genome annotations have relied on single score cutoffs to define enzyme classifications. As such these methods typically ignore the range of sequence variation characterising individual enzyme classes, which can make certain enzymes easier to discriminate on the basis of sequence similarity than others. To explore sequence diversity within different enzyme families, James Wasmuth (post-doc in Parkinson lab) performed a systematic set of global alignments for each enzyme against every other protein within the Swiss-Prot database (Boeckmann, et al., 2005). The Swiss-Prot database was chosen as it provides manually curated high quality protein annotations (Schnoes, et al., 2009). From Swiss-Prot I identified 127,478 proteins annotated with one of 2,277 complete EC categories. Of these, 1,285 (56%) were associated with 10 or fewer distinct protein sequences, while only 80 (4%) were represented by >400 (Figure 2-1). For each of these 127,478 enzymes, BLAST searches were initially performed against Swiss-Prot to identify potential sequence matches. Each match was globally aligned using the Needleman-Wunsch algorithm. Results were split into two groups: (i) aligned protein is annotated with the same EC number as query protein (“positive alignment”); or (ii) aligned protein is annotated with a different EC number than query protein, or has not been annotated with an EC number (“negative alignment”). For each EC category, similarity scores of the positive and negative alignments for each individual protein belonging to that category are combined to yield two density distributions. To reduce potential bias caused by sampling relatively small datasets, density profiles were created only for those enzyme families which contain ≥ 30 distinct proteins (585 EC numbers represented by 115,407 proteins – Supplementary Table S1). The results of this analysis for fifty representative enzymes are shown in Figure 2-2.
Across all enzymes I note a wide spectrum of sequence diversity suggesting that some enzymes are easier to discriminate than others. For example, for proteins belonging to the enzyme family EC:2.7.2.3 (phosphoglycerate kinase), scores of alignments to other family members may be readily discriminated from the alignment scores of non-family members. Conversely, for proteins belonging to EC:2.7.11.1 (serine/threonine kinases), it is difficult to distinguish family members on the basis of their alignment scores. Interestingly, I found that for enzymes on both ends of the sequence diversity spectrum, one or two EC categories dominated the negative alignments for that enzyme. This could be problematic for enzymes considered difficult to distinguish based on alignment scores, and may be explained by the level of specificity present within the EC hierarchy where a new reaction must be assigned on the basis of a very small difference in sequence specificity. For instance, EC:1.1.1.37 has most of its negative alignments to EC:1.1.1.27; while both enzymes perform similar activities, they differ in sequence by only a few key catalytic residues. I found this was the case for many enzymes with low discrimination profiles. For enzymes with highly discriminatory profiles, however, the predominant negative EC category was typically associated with a completely different enzyme activity showing no similarity in function or substrates. The enzyme EC:3.3.1.1 (adenosylhomocysteinase), for instance, hits mostly EC:1.1.1.86 (ketol-acid reductoisomerase) in the negative distribution. These observed differences in alignment score distributions may at least in part be attributable to the degree of substrate specificity associated with the enzyme. As an example, I note that EC:2.7.2.3 has a very specific function in the glycolysis pathway where binding of 3-phospho-D-glycerate leads to its phosphorylation.
On the other hand, the reaction catalyzed by EC:2.7.11.1 is thought to involve approximately 30% of all cellular proteins (Cohen, 2000). Within our set of 585 enzymes, 401 (69%) exhibit some degree of overlap between their positive and negative alignments (Figure 2-2). This suggests that homology-based methods used to predict enzyme function could be improved through accounting for such sequence similarity overlaps. Motivated by these considerations, in the following section I describe the development of DETECT, a novel homology-based approach for automated enzyme annotation.

Figure 2-1 Number of proteins in each EC category as annotated in Swiss-Prot. Of the 2,277 complete EC categories in Swiss-Prot, more than half (56%) represent categories with 10 or fewer proteins, while only a very small number (4%) represent categories that have greater than 400 proteins.

Figure 2-2 Boxplots show ranges of global sequence alignment scores for a representative set of 50 gold standard enzymes. Alignment scores were divided into enzymes from the same EC category (blue) and enzymes from different EC categories (red). Enzymes are ordered by the percent overlap between the interquartile ranges of the positive and negative hit scores. Boxplots close to the right side of the graph represent enzymes that are readily distinguishable on the basis of sequence alignment scores. In contrast, boxplots closer to the left-hand side represent enzymes that are more difficult to discriminate. The lower graphs represent density distributions of alignment scores for three enzymes representing the cross-spectrum of enzyme diversity. Again, colours indicate alignment scores to enzymes from the same EC category (blue) and different EC categories (red). The table inset shows a summary of enzyme overlap in global alignment scores that match the same EC (positive dataset) or a different EC (negative dataset). Overlap is averaged between positive and negative datasets (i.e. 10% overlap means if score range was 500, then the overlap was 50). Numbers have been presented for enzymes that have been annotated for at least 30 proteins.


2.3.2 Density estimation tool for enzyme classification (DETECT)

In the previous section I noted that the distribution of alignment scores may be used to discriminate members of enzyme families. While score increases with the length of the alignment, it is not true that the length of the enzyme is correlated to its sequence variability (Figure 2-3). Consequently, it should be possible to exploit these distributions to classify potentially unknown proteins. For example, from Figure 2-2, I note that if a protein aligns to a member of EC:1.1.1.100 with a score between 200 and 600, there is a level of uncertainty as to whether the protein belongs to that enzyme class. Furthermore, as score increases, there is a corresponding increase in confidence associated with the classification. Hence, through comparing the distribution of alignment scores of an unknown protein to the set of proteins associated with an enzyme family in comparison to those not belonging to that family, it is possible to obtain a probability of the unknown protein belonging to that enzyme class. For example, given an alignment with a score of 450, an unknown protein is more likely to belong to 1.1.1.100 than to another EC number since the score distribution of positive alignments is denser than that of negative alignments. Conversely, for an alignment with a score of 300, an unknown protein is unlikely to belong to 1.1.1.100. Hence by incorporating information on the sequence diversity across the various enzyme families, it is possible to compute a probability score for the likely association of an unknown protein to each enzyme family.


Figure 2-3 Boxplots for the length of positive alignments corresponding to the fifty representative enzymes. (A) Lengths of positive alignments for the fifty representative enzymes shown in Figure 2-2. (B) For the same group and ordering of enzymes, boxplots of the global alignment scores (as in Figure 2-2, repeated here for comparison). Scores are divided into positive (blue) and negative (red) alignments. See Results and Discussion for more detail. (A & B) Length of the enzyme sequence does not correspond to variability in sequence similarity.


Based on these ideas, I have developed a novel algorithm, which I term DETECT (Density Estimation Tool for Enzyme ClassificaTion), that applies a Bayesian statistical framework to the enzyme density profiles generated above to assign a probability score to an enzyme annotation. For a query protein, q, a BLAST search first retrieves all matches of q to Swiss-Prot. For each EC category within the set of matches, the Needleman-Wunsch algorithm is used to generate global alignment scores between q and each protein assigned that EC category. These scores are then used to derive a pair of density estimation values from the density distribution plot for that EC category generated earlier (Figure 2-4). The two values correspond to the density of positive and negative alignment scores previously generated for that EC category. Multiple pairs of values, generated from all matches to proteins from the same EC category, are combined within a Bayesian framework to generate an integrated likelihood score (ILS) that q belongs to that EC category. Briefly, for each pair of scores, a likelihood score is generated based on prior probabilities associated with that EC category. The final ILS is then obtained from the product of all likelihood scores. See Materials and Methods (section 2.2) for further details.
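The per-alignment step can be sketched in a few lines. This is an illustration only, not the DETECT implementation: it assumes a simple Gaussian kernel density estimate with a hand-picked bandwidth, a prior supplied by the caller, and hypothetical function names. It covers only the conversion of one alignment score into a probability; the way these values are integrated into the ILS is specified in section 2.2 and is not reproduced here.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels.

    Assumption: a fixed, user-chosen bandwidth; the thesis does not
    state the kernel or bandwidth in this section.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def posterior_positive(score, pos_density, neg_density, prior_pos):
    """Bayes' rule: P(same EC | score) from the two score densities."""
    p = pos_density(score) * prior_pos
    n = neg_density(score) * (1.0 - prior_pos)
    return p / (p + n) if (p + n) > 0 else 0.0
```

With densities built from positive and negative training scores for one EC category, a score falling where the positive density dominates yields a posterior near 1, and a score in the region of overlap yields an intermediate value.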


Figure 2-4 Using density estimation values to calculate a combined probability score for EC:1.1.1.100. The plot shows the density estimates of P(alignment score | positive alignment to EC:1.1.1.100) and P(alignment score | negative alignment to EC:1.1.1.100). Using the density estimation values of a given alignment score, we can obtain the probability that a protein that has aligned to an EC:1.1.1.100 protein belongs to the same enzyme category. For brevity, in the accompanying formula ‘positive alignment to EC:1.1.1.100’ and ‘negative alignment to EC:1.1.1.100’ are each abbreviated by a symbol, and ‘alignment score’ is abbreviated as ‘score’.

To evaluate performance, I compared DETECT with BLAST and PSI-BLAST using five-fold cross-validation to predict enzyme activities for proteins in Swiss-Prot. Another method based on enzyme profiles, PRIAM (Claudel-Renard, et al., 2003), was also considered for the ROC analysis. However, this was not possible because PRIAM profiles have been pre-trained on Swiss-Prot and cannot be regenerated for five-fold cross-validation. Due to difficulties in differentiating between enzymes with multiple catalytic sites and those with a single catalytic site performing multiple types of reactions, proteins assigned multiple EC categories were not included. Applying a range of score cutoffs, DETECT predictions were defined as the EC category with the highest ILS; BLAST and PSI-BLAST predictions were the highest scoring hit from the training set. Proteins predicted to have two or more EC numbers were not included in the performance analysis. DETECT and BLAST were also evaluated based on the entire list of predictions (not just the top-scoring prediction). From the ROC curves in Figure 2-5, it is clear that the DETECT approach significantly outperforms the BLAST-based methods with greater sensitivity and specificity across all score thresholds. The gain in performance when the entire list of predictions is considered is almost negligible, illustrating the discriminatory power of DETECT and highlighting the usefulness of a ranked list of ILSs. In contrast, BLAST (similar to PSI-BLAST) predicts with much lower accuracy when the top-scoring hit is considered.
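The construction of a ROC curve from scored predictions can be sketched as follows. The sketch assumes each held-out prediction has already been labelled correct or incorrect against its Swiss-Prot annotation; how true and false positives were counted in the actual evaluation is not detailed in this section, and the function names are hypothetical.

```python
def roc_points(scored):
    """Compute (FPR, TPR) pairs from (score, is_correct) tuples.

    Each distinct score is used as a threshold; predictions scoring at
    or above the threshold are called positive.
    """
    positives = sum(1 for _, ok in scored if ok)
    negatives = len(scored) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted({s for s, _ in scored}, reverse=True):
        tp = sum(1 for s, ok in scored if s >= threshold and ok)
        fp = sum(1 for s, ok in scored if s >= threshold and not ok)
        points.append((
            fp / negatives if negatives else 0.0,
            tp / positives if positives else 0.0,
        ))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

In a five-fold setting the scored predictions from each held-out fold would be pooled, or the curves averaged, before plotting; which of these was done here is not stated.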

Figure 2-5 ROC curves for DETECT, BLAST and PSI-BLAST based on data generated from five-fold cross-validation analyses.


2.3.3 Comparison to current prediction methods

One of the central aims of developing DETECT is to improve the automated prediction of enzyme function from genomic information. Here I am interested in DETECT’s ability to accurately predict metabolic enzymes from the annotated genome of P. falciparum. To date a number of resources have been constructed that describe genome scale reconstructions of enzyme complements. For P. falciparum, these include KEGG (Kanehisa, et al., 2007), BioCyc (Caspi, et al., 2007), BRENDA (Schomburg, et al., 2004) and the Malaria Parasite Metabolic Pathways (MPMP) (Ginsburg, 2006). Each dataset exhibits a wide range in coverage and accuracy, reflecting potential errors and biases in their respective curations (Figure 2-6). Due to its high standard of curation and coverage, I chose MPMP as a suitable baseline to examine the performance of DETECT relative to the other resources, simple BLAST analyses and predictions derived from a profile-based tool termed PRIAM (Claudel-Renard, et al., 2003) (Table 2-1 and Supplementary Tables S1-3). For this analysis, predictions of DETECT and BLAST are comparable due to the lack of a good negative training set for P. falciparum. Consequently the improved specificity observed with Swiss-Prot for DETECT is difficult to assess. Of 765 proteins annotated with EC numbers by MPMP, DETECT predicts activities for 622. Of these, 325 (52.3%) matched the MPMP/BRENDA annotations. However, within this set of 622 proteins are 198 which are annotated by MPMP/BRENDA with EC numbers not present within the set of 582 EC categories used by the DETECT algorithm (Supplementary Table S2). Ignoring these leaves 424 proteins and increases the proportion of correct predictions to 76.7%. Furthermore, considering only predictions with an ILS in excess of 0.2 (a defined cutoff, providing a good compromise between specificity and sensitivity as derived with reference to Figure 2-5), DETECT correctly predicts 246 of 276 proteins (89.1%).
BLAST unsurprisingly provided a higher number of matches - 352 (58.4%) compared to DETECT. However, PRIAM was found to correctly predict only 117 of the 424 proteins annotated with one of the 582 EC categories, while KEGG and BioCyc match 233 (55.0%) and 150 (35.4%) respectively. I also performed a comparison to a generic automatic annotation tool based on the transfer of Gene Ontology terms, Blast2GO (Conesa, 2005). Using MPMP as the gold standard, DETECT was found to have significantly greater accuracy and coverage than Blast2GO. Of the 301 Blast2GO annotations only 154 matched MPMP annotations (51.1%) compared to an accuracy of 76.7% for DETECT (data not shown).

Figure 2-6 Overlap in enzyme curations and predictions for P. falciparum as annotated by four database resources: KEGG (Kanehisa et al., 2007), BioCyc (Caspi et al., 2007), BRENDA (Schomburg et al., 2004) and the Malaria Parasite Metabolic Pathways (MPMP; Ginsburg, 2006), and two typically applied automated annotation methods: BLAST (top scoring match against all enzymes in Swiss-Prot using an E-value cutoff of 1e-10) and PRIAM (again using an E-value cutoff of 1e-10; Claudel-Renard et al., 2003).

Table 2-1 Matches of various resources and methods to MPMP annotations.

| Dataset | Predictions for 765 proteins annotated by MPMP | Correct predictions (%) | Predictions for 582 categories* | Correct predictions for 582 categories* (%) | Additional predictions (not in MPMP) |
|---|---|---|---|---|---|
| DETECT | 622 | 325 (52.3%) | 424 | 325 (76.7%) | 2451 |
| DETECT (>0.2) | 318 | 246 (77.4%) | 276 | 246 (89.1%) | 465 |
| BLAST | 603 | 352 (58.4%) | 365 | 309 (84.7%) | 2342 |
| PRIAM | 258 | 169 (65.5%) | 147 | 117 (79.6%) | 143 |
| KEGG | 470 | 389 (82.8%) | 264 | 233 (88.3%) | 201 |
| BioCyc | 285 | 263 (92.3%) | 160 | 150 (93.8%) | 21 |

* The 582 categories refer to those EC categories with more than 30 representatives in the Swiss-Prot dataset.
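The percentages in Table 2-1 follow directly from the counts, and the two denominators used in the text (all predictions, and predictions restricted to the 582 EC categories) can be checked with a short calculation. The dictionary below simply restates the table; the variable and function names are hypothetical.

```python
# (predictions, correct) over all MPMP-annotated proteins, then over the
# subset within the 582 EC categories; values copied from Table 2-1.
TABLE_2_1 = {
    "DETECT": ((622, 325), (424, 325)),
    "DETECT (>0.2)": ((318, 246), (276, 246)),
    "BLAST": ((603, 352), (365, 309)),
    "PRIAM": ((258, 169), (147, 117)),
    "KEGG": ((470, 389), (264, 233)),
    "BioCyc": ((285, 263), (160, 150)),
}


def percent(correct, total):
    """Percentage of correct predictions, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


def summarize(table):
    """Return {dataset: (percent over all, percent over 582 categories)}."""
    return {
        name: (percent(c_all, n_all), percent(c_sub, n_sub))
        for name, ((n_all, c_all), (n_sub, c_sub)) in table.items()
    }
```

For DETECT, removing the 198 proteins whose MPMP/BRENDA EC numbers lie outside the 582 categories reduces the denominator from 622 to 424, which is what lifts the proportion of correct predictions from 52.3% to 76.7%.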


Of the 246 DETECT predictions with an ILS > 0.2 that match MPMP/BRENDA annotations, four were incorrectly predicted by BLAST (Table 2-2). These include MAL13P1.301 and PF11_0395, annotated by MPMP as guanylyl cyclases (GCs - EC:4.6.1.2). GCs are responsible for the synthesis of cyclic GMP, an important second messenger molecule. In eukaryotes, GCs are highly conserved and the presence of multiple isoforms is common. The two Plasmodium GCs, however, are significantly divergent from mammalian homologues in both structure and function (Baker and Kelly, 2004). In addition to containing two GC catalytic domains, the sequences of MAL13P1.301 and PF11_0395 appear to have regions similar to P-type ATPases (EC:3.6.3.-) (Figure 2-7). On the basis of only local similarity, applying BLAST would incorrectly assign these proteins to EC:3.6.3.1. On the other hand, from Figure 2-7, I note that the distributions of MAL13P1.301 and PF11_0395 alignment scores overlap significantly more with the positive distribution for EC:4.6.1.2 than with that for EC:3.6.3.1, resulting in higher ILSs for EC:4.6.1.2 (MAL13P1.301: ILS=0.27 and 0.0024 for EC:4.6.1.2 and EC:3.6.3.1, respectively; PF11_0395: ILS=0.91 and 0.55 for EC:4.6.1.2 and EC:3.6.3.1, respectively). While a role for the P-type ATPase domain has yet to be found, a previous study has suggested the two isoforms of GC encode bifunctional enzymes in P. falciparum (Carucci, et al., 2000). The diverse nature of these sequences relative to their hosts suggests that these two GCs may represent suitable targets for therapeutic intervention. The serine/threonine specific protein phosphatase (EC:3.1.3.16) activity of PF10_0320 was also correctly predicted by DETECT but not by BLAST. Such phosphatases have been shown to play an important role both in parasite growth and development (Ward, et al., 1994) and also in the invasion process (Rangachari, et al., 1986).
The density estimation profiles resulted in the assignment of a high probability for EC:3.1.3.16 (ILS = 0.94) relative to the top BLAST prediction of EC:6.1.1.17 (ILS = 10^-5). Conversely, there were ten examples where BLAST but not DETECT correctly predicts enzyme function (Table 2-2). Note that the majority of these proteins were correctly predicted by DETECT based on the second-ranked hit, further supporting the usefulness of having a ranked list of enzymes. Additionally, while only two of 246 correct DETECT predictions had fewer than five positive matches, I note that seven of the ten incorrect predictions were based on fewer than five positive matches. This suggests that the number of positive matches should be used as an additional criterion to filter DETECT predictions. Intriguingly, the remaining three incorrect DETECT predictions were based on a large number of positive sequence matches to their respective EC categories. This might suggest that the MPMP annotations are incorrect. For instance, PFB0505c is reported in PlasmoDB as a β-ketoacyl-ACP synthase III (fabH; EC:2.3.1.180) but annotated by MPMP as a β-ketoacyl-ACP synthase I (fabB; EC:2.3.1.41) and fatty-acid synthase (FASN; EC:2.3.1.85). Based on a large number of alignments to both EC:2.3.1.180 and EC:2.3.1.41, DETECT predicts the gene to have FabH activity, whereas BLAST predicts it to have FabB activity. Although FabH activity has been demonstrated for P. falciparum (Waller, et al., 1998), recent experimental studies focusing on a putative FabB/F enzyme indicate that FabF and not FabB activity is also present (Sharma, et al., 2009). Another potential misannotation involves PF13_0141, assigned L-lactate dehydrogenase activity (LDH; EC:1.1.1.27) by MPMP but predicted by DETECT to be malate dehydrogenase (MDH; EC:1.1.1.37). In Plasmodium, LDH is an essential enzyme required for the production of ATP (Gomez, et al., 1997), while MDH plays a crucial role in pathogenicity (Chan and Sim, 2004).
Members of both families have been characterized and sequenced from a wide variety of organisms representative of all domains of life (Madern, 2002). While structurally distinct, MDH and LDH share significant sequence similarity. Furthermore, analysis of the MDH family identified two distinct groups of closely related enzymes (Goward and Nicholls, 1994), responsible for a bimodal distribution of within-family alignment scores (Figure 2-8). This is further supported by a group of LDH members that share greater sequence similarity to the MDH family than the rest of the LDH’s (Pazos, et al., 2006). Based on global alignments, PF13_0141 produces many significant matches to both LDH and MDH family members. Inspection of the resulting density distribution profiles indicates a better fit to the distribution of positive alignment scores for EC:1.1.1.37 than that of EC:1.1.1.27 (Figure 2-8), resulting in a higher ILS (0.97 versus 0.31). Interestingly, a recent study uncovered additional MDH activity for the Plasmodium LDH based on molecular function (Wiwanitkit, 2007), suggesting that perhaps PF13_0141 has broad specificity.


Table 2-2 Lists of proteins correctly annotated by either DETECT or BLAST. From the list of 276 proteins with DETECT ILS scores > 0.2 and members of the 582 EC categories with ≥ 30 members, 4 were correctly identified by DETECT but not BLAST (E-value < 1), and 10 were correctly identified by BLAST but not DETECT. Grey backgrounds indicate predictions matching MPMP annotations. Positives and Negatives indicate the number of proteins used to formulate the DETECT prediction. Seven of the incorrect DETECT predictions were based on matches to fewer than five proteins.

| Protein ID | DETECT prediction | ILS | Positives | Negatives | MPMP annotation | BLAST prediction | PlasmoDB annotation |
|---|---|---|---|---|---|---|---|
| PFF1145c | 2.7.10.2 | 0.97 | 134 | 744 | 2.7.10.2 | 2.7.11.25 | phosphatidylinositol 4-kinase, putative |
| PF10_0320 | 3.1.3.16 | 0.94 | 7 | 28 | 3.1.3.16 | 4.6.1.1 | Lipoate-protein ligase A type 2 |
| MAL13P1.301 | 4.6.1.2 | 0.27 | 49 | 109 | 4.6.1.2 | 3.6.3.1 | m1-family aminopeptidase |
| PF11_0395 | 4.6.1.2 | 0.91 | 49 | 92 | 4.6.1.2 | 3.6.3.1 | M18 aspartyl aminopeptidase |
| PF13_0141 | 1.1.1.37 | 0.97 | 292 | 268 | 1.1.1.27 | 1.1.1.27 | L-lactate dehydrogenase |
| MAL13P1.122 | 5.2.1.8 | 0.85 | 4 | 176 | 2.1.1.43 | 2.1.1.43 | SET domain protein, putative |
| PFB0505c | 2.3.1.180 | 0.30 | 163 | 201 | 2.3.1.41; 2.3.1.85 | 2.3.1.41 | β-ketoacyl-acyl carrier protein synthase III precursor, putative |
| PF11_0242 | 6.5.1.1 | 0.42 | 1 | 1004 | 2.7.11.17 | 2.7.11.17 | calcium-dependent protein kinase, putative |
| PF11_0060 | 3.1.3.48 | 1.00 | 2 | 515 | 2.7.11.17 | 2.7.11.17 | exported serine/threonine protein kinase |
| PFD0740w | 2.7.10.2 | 0.68 | 150 | 907 | 2.7.11.22 | 2.7.11.22 | Ser/Thr protein kinase, putative |
| PFF0275c | 6.1.1.7 | 1.00 | 1 | 520 | 2.7.4.6 | 2.7.4.6 | protein kinase, putative |
| PFA0340w | 6.1.1.7 | 1.00 | 1 | 137 | 2.7.7.60 | 2.7.7.60 | casein kinase II, alpha subunit |
| PFF0745c | 5.2.1.8 | 0.93 | 1 | 62 | 3.1.-.-; 3.1.13.1 | 3.1.13.1 | adenylate kinase |
| PFL0475w | 3.1.26.5 | 1.00 | 1 | 50 | 3.1.4.17 | 3.1.4.35 | glycosyltransferase family 28 protein, putative |


Figure 2-7 Re-annotation of MAL13P1.301 and PF11_0395 using DETECT. (A) Plasmodium protein MAL13P1.301 appears to be a bifunctional enzyme based on an alignment to a P-Type ATPase (ALA8_ARATH) and guanylyl cyclase (GUC2G_RAT). (B and C) Overlap of the density distribution of Plasmodium alignment scores to P-type ATPase (B) or guanylyl cyclase (C) enzymes compared to the density distribution of positive alignment scores for the respective enzyme class.

Figure 2-8 The distribution of scores for alignments of PF13_0141 against (A) EC:1.1.1.27 and (B) EC:1.1.1.37 proteins compared to the respective within-family alignments indicates that PF13_0141 is more likely to belong to EC:1.1.1.37 based on density estimation profiles.

2.3.4 Expanding the metabolome of P. falciparum

As noted above, BLAST proved to be very effective in confirming MPMP predictions. However, a major challenge in enzyme annotation is to be able to discriminate false positive predictions. By calculating an ILS that accounts for observed diversity in sequence similarity distributions, DETECT provides a useful method for the generation of ranked lists of enzyme predictions that can be combined with metabolic network reconstruction (Caspi, et al., 2007; Kanehisa, et al., 2007) methods to help prioritize candidate enzymes for more detailed studies. Applied to Plasmodium, DETECT predicts activities for an additional 2451 genes (Supplementary Table S3). However, using an ILS > 0.2, relying on predictions involving at least six positive sequence matches and ignoring the highly promiscuous EC categories - 2.7.11.1 (serine/threonine kinase - 47 annotations), 2.7.13.3 (histidine kinase - 735 annotations) and 2.7.7.6 (RNA polymerase - 70 annotations), results in a high confidence list of 88 predictions worthy of follow-up investigations.
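The three filtering criteria described above translate directly into a small predicate. The following is a sketch under stated assumptions: the record layout, class name and function name are hypothetical, and only the thresholds and the three excluded EC categories are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    gene: str        # gene identifier (hypothetical field name)
    ec: str          # predicted EC category
    ils: float       # integrated likelihood score
    positives: int   # number of positive sequence matches


# Highly promiscuous EC categories excluded in the text.
PROMISCUOUS = {"2.7.11.1", "2.7.13.3", "2.7.7.6"}


def high_confidence(predictions, min_ils=0.2, min_positives=6):
    """Keep predictions with ILS above the cutoff, enough positive
    matches, and an EC category outside the promiscuous set."""
    return [
        p for p in predictions
        if p.ils > min_ils
        and p.positives >= min_positives
        and p.ec not in PROMISCUOUS
    ]
```

Keeping the thresholds as parameters makes it straightforward to relax the positive-match requirement when inspecting individual candidates that fall just below it.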

The conserved Plasmodium protein of unknown function, PF11_0207, is predicted by DETECT to be EC:5.99.1.2 (DNA topoisomerase) with an ILS~0.48. Biochemical evidence has been found for topoisomerase (Topo) I and II in P. falciparum (Cheesman, et al., 1994; Tosh and Kilbey, 1995), and a number of putative annotations for Topo are reported in PlasmoDB. BLAST, on the other hand, predicts PF11_0207 to be EC:6.5.1.1 (DNA ligase) with an E-value < 10^-14. While many eukaryotes express three DNA ligase isoforms, P. falciparum expresses only one (Cheesman, et al., 1994). The gene for the sole ligase, DNA ligase I, has been identified, and no evidence suggests the presence of additional isoforms in the parasite. The density profile for PF11_0207 alignment scores against the positive score distribution for EC:5.99.1.2 shows a large degree of overlap, compared to a single alignment for EC:6.5.1.1, suggesting a more likely membership of the Topo family (Figure 2-9). The accurate identification of the genes encoding Topo may be important as studies have suggested that certain Topo II poisons may act selectively against the human malaria parasite (Chavalitshewinkoon, et al., 1993; Gamage, et al., 1994).


Figure 2-9 Density distributions of similarity scores for PF11_0207. Unknown protein PF11_0207 is predicted by BLAST to be EC:6.5.1.1. The density profile for ten PF11_0207 alignment scores against the positive alignments for EC:5.99.1.2 (A) shows a large degree of overlap, compared to a single alignment for EC:6.5.1.1 (B), suggesting a more likely membership of EC:5.99.1.2.

DETECT also serves to reinforce the predictions of existing tools when searching for genes with unidentified enzyme activity. For instance, PFI1475w, a merozoite surface protein, is predicted by BLAST to be EC:3.1.11.6 (Exonuclease VII) with only moderate statistical support (E-value < 0.001). With an ILS~0.56, DETECT also makes the same prediction, where analysis of the density profile for matching PFI1475w alignment scores against the positive score distribution of EC:3.1.11.6 indicates a good fit (Figure 2-10). The putative Exonuclease VII has not been biochemically characterized in P. falciparum, but genes for Exonuclease I and III have been identified, suggesting other exonucleases are yet to be identified.


Figure 2-10 Density distributions of similarity scores for PFI1475w. The BLAST prediction for PFI1475w, EC:3.1.11.6 (Exonuclease VII), which has only moderate statistical support (E-value < 0.001), is reinforced by DETECT with the same prediction and ILS~0.56. The density profile for the alignment scores of PFI1475w matching to EC:3.1.11.6 members indicates a good fit, suggesting the possibility of a novel exonuclease in Plasmodium.

Even in instances of fewer than six positive matches, the ILS can produce informative predictions. For example, two genes annotated with “unknown function” are PFD0485w and PFL1535w, which DETECT predicts to have glucose-6-phosphate isomerase (GPI) activity (EC:5.3.1.9) with high confidence scores of 0.90 and 0.91, respectively. One may dismiss these as spurious predictions given that the experimentally characterized GPI (Srivastava, et al., 1992) has been annotated in P. falciparum as PF14_0341. This enzyme, which catalyzes the reversible conversion of glucose-6-phosphate to fructose-6-phosphate, is necessary for glycolysis in P. falciparum. As evidenced by the absence of mitochondrial activity in the parasite (Mather, et al., 2007), the energy needs of P. falciparum are met entirely through the anaerobic consumption of exogenous glucose. While PF14_0341 is the only gene that has been annotated to have GPI activity, purification studies detected the presence of several isozymes in P. falciparum (Srivastava, et al., 1992). The appearance of multiple isozymes is a common feature for Plasmodial glycolytic enzymes (Maeda, et al., 2009), suggesting that PFD0485w and PFL1535w may represent isozymes for GPI. Global alignments of PFD0485w and PFL1535w to non-Plasmodium members of the GPI family have optimal scores based on their fit to the positive distribution for EC:5.3.1.9 (Figure 2-11).

Figure 2-11 Density distributions of similarity scores for PFD0485w and PFL1535w. Genes of unknown function, PFD0485w and PFL1535w, are predicted by DETECT to be EC:5.3.1.9 (glucose-6-phosphate isomerase) with high confidence ILSs of 0.90 and 0.91, respectively. Global alignments of PFD0485w and PFL1535w to non-Plasmodium members of EC:5.3.1.9 have scores that fit optimally within the positive density distribution.

Finally, in cases where enzyme activity may be expected (e.g. through biochemical assays or hole filling algorithms), DETECT provides a powerful approach to identify and prioritize candidate proteins that may be at the extreme of sequence homology detection. The MPMP database identifies 428 enzyme categories, of which 404 have been annotated to one or more proteins. Of the remaining categories, DETECT predicts a protein for four, mostly with low ILSs: EC:1.1.1.25 (PF14_0424, ILS~10^-7), EC:1.4.4.2 (PFL2095w, ILS~0.001), EC:2.7.1.23 (PFI0650c, ILS~0.001) and EC:3.6.1.1 (PF14_0541; PFL1700c; PF11_0190; and PF11_0202, ILS: 1; 1; ~0.003; ~0.0006, respectively). PF14_0541 and PFL1700c are both annotated in PlasmoDB as putative V-type H+-translocating pyrophosphatases, confirming the DETECT predictions, but currently lack the appropriate EC designation. As a component of the shikimate pathway, which is not present in mammals, EC:1.1.1.25 (shikimate dehydrogenase) is of particular interest from a drug target point of view. Evidence for the requirement of the shikimate pathway was originally provided by isolation of P. falciparum mutants requiring pABA for growth (McConkey, et al., 1994). While the presence of shikimate dehydrogenase has been reported at very low levels in P. falciparum, no protein has previously been identified. Although possessing only a low ILS, PF14_0424 produces reasonable alignments to the active site of three shikimate dehydrogenase proteins found in bacteria (Figure 2-12). As such, PF14_0424 represents a suitable target for confirming shikimate dehydrogenase activity.

Figure 2-12 DETECT prediction for PF14_0424. Shikimate dehydrogenase (EC:1.1.1.25) is a missing enzyme in the druggable shikimate pathway for P. falciparum. DETECT makes a prediction to EC:1.1.1.25 for unknown protein PF14_0424, which produces reasonable alignments to the active site of three shikimate dehydrogenase proteins found in bacteria.


2.4 Concluding Remarks

By focusing on P. falciparum as a model organism, my results have shown that sequence diversity is an important factor in the accurate identification of enzymes. Moreover, by providing a ranked list of ILSs, DETECT facilitates the prioritization of potential novel enzymes, which has helped to guide species-specific metabolic reconstructions for divergent groups of parasitic organisms. Chapter 3 delineates the value of DETECT when combined appropriately with other automated approaches and curated resources for enzyme annotation towards the accurate metabolic reconstruction for 15 fully sequenced members of the Apicomplexa, along with separate studies for four tapeworm species and the phytopathogen Ophiostoma ulmi. I anticipate that DETECT will become a standard tool providing valuable predictions for many future reconstructions. Another future goal is to apply DETECT to the prediction of multifunctional enzymes. Current Swiss-Prot annotations indicate that there are over 4000 proteins that catalyze more than one reaction. A potential approach involves assessing all high-scoring predictions made by DETECT and examining the potential of multiple catalytic domains within the protein of interest.


Chapter 3 Reconstructing parasite metabolism: Biological insights


3.1 Introduction

Although genetic and biochemical evidence remains the gold standard for enzyme and pathway curation, experimental manipulation in apicomplexans is not trivial. Many have complex life cycles in multiple hosts restricting their genetic amenability. Furthermore, parasite material is often difficult to obtain in sufficient quantities for biochemical analysis or can be contaminated with host material. By contrast, a number of automated methods and resources have been developed that attempt to mine the wealth of apicomplexan sequence data. Unfortunately, for many classes of enzymes, high rates of sequence divergence have made the task of accurately annotating enzyme-encoding genes in apicomplexans challenging (as described in Chapter 2). Consequently, the value of these resources and tools lies in their ability to generate hypotheses that serve to focus subsequent experimental investigations.

This data chapter describes the development of an integrated pipeline for defining metabolism in parasites with the aim of gaining insight into their metabolic capabilities in a species-specific as well as phylum-wide context. The pipeline is applied to three sets of parasites: (i) the Apicomplexa, (ii) four tapeworm species and (iii) the phytopathogen Ophiostoma ulmi. Based on these reconstructions, integrated comparative genomic analyses were undertaken to highlight metabolic innovations and potential enzyme drug targets within each group. Druggable targets for the Apicomplexa were selected for experimental validation, with preliminary functional assay and knockout experiments in Toxoplasma gondii further described.


3.2 Materials and Methods

3.2.1 Metabolic reconstruction for other organisms

The 31.5 Mb genome sequence of Ophiostoma ulmi was generated at the University of Toronto using paired-end whole genome shotgun sequencing (Illumina GAIIx) to 200x coverage. The corresponding gene model for O. ulmi (2012-02-29), containing 8,639 genes, was searched against the UniProt/Swiss-Prot protein database (v 58.0) using the following homology-based enzyme prediction tools: (i) DETECT (Hung, et al., 2010) (cutoff ILS > 0.2, at least 5 positive hits), (ii) BLAST (E-value < 1e-10), (iii) PRIAM (Claudel-Renard, et al., 2003) (E-value < 1e-10), and (iv) ortholog mappings to Yeast based on OrthoMCL (Li, et al., 2003). No data for Ophiostoma was available from KEGG. The BRENDA resource (Barthelmes, et al., 2007) provided biochemical evidence for four enzymes. The final set of 783 enzymes from O. ulmi was obtained by integrating the datasets from BRENDA, DETECT, Yeast orthologs, and enzymes identified by both BLAST and PRIAM. See Supplementary Table S4 for gene-EC mappings and Supplementary Table S5 for the corresponding evidence. Yeast and Arabidopsis thaliana EC numbers were obtained by combining species-specific datasets from BRENDA and BioCyc (YeastCyc and AraCyc, respectively).
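The integration rule described above (accept BRENDA, DETECT and ortholog-derived annotations, and accept BLAST or PRIAM annotations only where the two agree) can be sketched with set operations. This is an illustration under assumptions: each dataset is represented as a collection of (gene, EC) pairs, and the function name and evidence labels are hypothetical.

```python
def integrate_annotations(brenda, detect, orthologs, blast, priam):
    """Merge enzyme annotations and record the supporting evidence.

    Each argument is an iterable of (gene, ec) pairs. BLAST and PRIAM
    pairs are retained only when both tools report the same pair.
    Returns a dict mapping each accepted pair to its evidence labels.
    """
    sources = {
        "BRENDA": set(brenda),
        "DETECT": set(detect),
        "ortholog": set(orthologs),
        "BLAST+PRIAM": set(blast) & set(priam),
    }
    evidence = {}
    for label, pairs in sources.items():
        for pair in pairs:
            evidence.setdefault(pair, []).append(label)
    return evidence
```

Recording the evidence labels alongside each pair mirrors the separation between gene-EC mappings and their supporting evidence in the supplementary tables.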

Genomes for Echinococcus sp. and Hymenolepis microstoma ranged in size from 114-141 Mb and were generated at the Wellcome Trust Sanger Institute using capillary, shotgun, 454 and Illumina platforms according to manufacturer’s instructions. To map the enzymes in E. multilocularis, its 11,675 gene models (retrieved 2012-02-02) were investigated using the following homology-based enzyme prediction tools against the UniProt database (release 2012-02) unless specified: (i) DETECT (Hung, et al., 2010) (cutoff ILS > 0.2, at least 5 positive hits), (ii) BLASTP (E-value < 1e-10), (iii) PRIAM enzyme rel. 30-Nov-2010 (Claudel-Renard, et al., 2003) (E-value < 1e-10), and (iv) ortholog mappings to S. mansoni based on OrthoMCL (Li, et al., 2003). There were some discrepancies in the identified enzymes, so EC mappings were consolidated based on predictions from all four automated methods. For genes with multiple EC mappings, only the EC with the highest number of predictions was assigned to the gene (for instance EmuJ_000342600.1 was predicted to have EC 3.6.3.10 by a single method and EC 3.6.3.9 by two methods, so to avoid ambiguity only EC 3.6.3.9 was assigned). Genes with a tie between multiple ECs were not included (187 genes). For predictions that differed between two or more methods, DETECT predictions were ranked higher than BLAST and PRIAM, which are in turn ranked higher than OrthoMCL predictions. The final set of 522 enzymes for E. multilocularis was obtained by integrating the datasets from BRENDA, DETECT, S. mansoni orthologs, and enzymes identified by both BLAST and PRIAM (Supplementary Table S6). Hymenolepis microstoma EC numbers were obtained from the combined dataset of high-confidence DETECT predictions, BRENDA, enzymes predicted by both BLAST and PRIAM, and orthologs of E. multilocularis (based on a BLASTP hit with an E-value < 1e-10).
Taenia solium EC numbers were obtained from the combined dataset of high-confidence DETECT predictions together with enzymes predicted by both BLAST and PRIAM. S. mansoni (gene models retrieved 2012-04-26) EC numbers were obtained from the combined dataset of BRENDA enzymes, high-confidence DETECT predictions and enzymes predicted by both BLAST and PRIAM. Mouse and human EC numbers were the combined results of HumanCyc and those obtained from BRENDA.
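The vote-based consolidation of conflicting EC mappings can be sketched as below. The sketch implements only the majority rule and the exclusion of ties; it does not implement the method precedence (DETECT over BLAST and PRIAM over OrthoMCL), since how that ranking interacts with the tie rule is not specified here. The function name and the assignment of ECs to particular methods in the usage example are hypothetical.

```python
from collections import Counter


def consolidate(predictions):
    """Choose one EC for a gene from per-method predictions.

    `predictions` maps a method name to its predicted EC (or None).
    Returns the EC supported by the most methods, or None when there
    is no prediction or the top vote is tied.
    """
    votes = Counter(ec for ec in predictions.values() if ec)
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between ECs: gene is excluded
    return ranked[0][0]
```

For a gene such as EmuJ_000342600.1, where two methods support EC 3.6.3.9 and one supports EC 3.6.3.10, the function returns 3.6.3.9.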

The 51.8 Mb Eimeria tenella (Houghton strain) genome was generated at the Wellcome Trust Sanger Institute using Sanger capillary and Illumina GAIIx sequencing. The corresponding gene model for E. tenella (2011-07), containing 8,786 genes, was searched using the following homology-based enzyme prediction tools: (i) DETECT (Hung, et al., 2010) (cutoff ILS > 0.2, at least 5 positive hits), (ii) BLAST (E-value < 1e-10), (iii) PRIAM (Claudel-Renard, et al., 2003) (E-value < 1e-10), and (iv) ToxoDB orthologs. No data for Eimeria was available from KEGG or EuPathDB. To account for EuPathDB-specific annotations for highly conserved apicomplexan enzymes, the set of enzymes shared by both P. falciparum and T. gondii and producing a BLAST hit (E-value < 1e-10) to the E. tenella gene model was included as an additional dataset (based on PlasmoDB and ToxoDB gene-EC mappings). The BRENDA resource (Barthelmes, et al., 2007) provided biochemical evidence for 15 enzymes, with evidence for an additional 68 enzymes from the supplemental resource AMENDA. The final set of 571 enzymes for E. tenella was obtained by integrating the datasets from BRENDA, DETECT, T. gondii orthologs, apicomplexan-conserved enzymes and enzymes identified by both BLAST and PRIAM. See Supplementary Tables S7 and S8 for gene-EC mappings and evidence for each enzyme.

3.2.2 Biochemical assays in T. gondii

Pantothenate biosynthesis was selected for experimental validation based on comparative analyses that indicate the pathway is conserved across apicomplexans and absent in humans. The pathway operates sequentially through the reactions encoded by the genes panB, panE, and panC. Pantothenate synthetase (PS; EC 6.3.2.1), encoded by the panC gene, catalyzes the adenosine triphosphate (ATP)-dependent condensation of D-pantoate and β-alanine to form pantothenate. The activity was assayed spectrophotometrically by coupling AMP production to the activities of myokinase, pyruvate kinase and lactate dehydrogenase as described by Pfleiderer et al. (Pfleiderer, et al., 1960). In this assay, the rate of pantothenate formation is proportional to the rate of NADH oxidation: two molecules of NADH are oxidized for every molecule of pantothenate formed. In a final volume of 1 mL, the reaction mixture contained 100 µL Tris/HCl, 10 µL MgSO4, 50 µL ATP, 72 µL NADH, 2 µL phosphoenolpyruvate (PEP), 0.8 µL myokinase, 0.28 µL pyruvate kinase, 0.55 µL lactate dehydrogenase, 10 µL β-alanine, and 1 µL pantoate. 20 µL of T. gondii RH lysate was added with and without the addition of the reaction substrate, pantoate. The absorption change at 340 nm was monitored immediately for 30 min in order to determine background activity. The PS reaction was then initiated by addition of pantoate solution and the A340 was monitored for 30 min. Where β-alanine instead of pantoate was used to initiate the PS reaction, it was replaced by pantoate in the assay mix for background determination.
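
The 2:1 stoichiometry between NADH oxidation and pantothenate formation can be turned into a rate as in the sketch below. It assumes the commonly used molar extinction coefficient of NADH at 340 nm (6220 M^-1 cm^-1) and a 1 cm path length, neither of which is stated in the protocol above; the function name and the example slope are illustrative only.

```python
NADH_EXTINCTION_340 = 6220.0  # M^-1 cm^-1; standard literature value (assumed here)
NADH_PER_PANTOTHENATE = 2     # stoichiometry of the coupled assay, as stated in the text


def pantothenate_rate_nmol_per_min(slope_per_min, background_per_min=0.0,
                                   path_length_cm=1.0, volume_ml=1.0):
    """Convert A340 slopes into a pantothenate formation rate (nmol/min).

    Slopes are given as positive decreases in A340 per minute. The
    background slope (measured before the reaction is initiated) is
    subtracted before applying the Beer-Lambert law.
    """
    net_slope = slope_per_min - background_per_min
    # Beer-Lambert: concentration change (mol/L per min) = dA / (epsilon * l)
    nadh_molar_per_min = net_slope / (NADH_EXTINCTION_340 * path_length_cm)
    # Scale by the assay volume (mL -> L) and convert mol to nmol
    nadh_nmol_per_min = nadh_molar_per_min * (volume_ml / 1000.0) * 1e9
    return nadh_nmol_per_min / NADH_PER_PANTOTHENATE


# Illustrative numbers only (not measured data): a net slope of
# 0.0622 A340/min in a 1 mL assay.
rate = pantothenate_rate_nmol_per_min(0.0622)
```

The path length and volume are kept as parameters because plate-based formats would need different values from a standard cuvette.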

Ketopantoate hydroxymethyltransferase (KPHMT; EC 2.1.2.11), encoded by the panB gene, catalyzes the first step of the pathway by forming ketopantoate from α-ketoisovalerate. The activity was assayed spectrophotometrically by coupling the KPHMT-catalyzed formation of ketopantoate, from α-ketoisovalerate and N5,N10-methylenetetrahydrofolate, to ketopantoate reductase. The decrease in absorbance of NADPH at 340 nm was measured at 37 °C using a spectrophotometer.

Assays were performed in 100 µL Hepes (pH = 7.5), 4 µL MgSO4, 50 µL α-ketoisovalerate, 20 µL NADPH, 25 µL formaldehyde, and 25 µL tetrahydrofolate. N5,N10-methylenetetrahydrofolate was prepared by mixing 25 mM formaldehyde and 25 mM tetrahydrofolate in the assay buffer and incubating for 5 min at 37 °C. A three-fold serial dilution was used to test varying concentrations of tachyzoite lysate mixture from Toxoplasma, with 20 µL of E. coli lysate overexpressing a Mycobacterium tuberculosis panB as a positive control.

3.3 Results and Discussion

3.3.1 Integration of enzyme datasets

To define an accurate first draft of parasite metabolism, I first sought to exploit high-confidence datasets drawn from a variety of automated (DETECT, BLAST, PRIAM), partially curated (BioCyc, KEGG, and EuPathDB), and experimental (BRENDA) resources. Given the disparity among datasets (Figure 3-1), I developed a set of formal rules for integrating these datasets that would allow for a more meaningful reconstruction. Essential to this integration is the inclusion of all enzyme annotations that are supported by biochemical evidence or validated by an expert; these are typified by enzymes obtained through BRENDA and EuPathDB (which includes BioCyc along with manually curated datasets such as MPMP). Because of the limited coverage provided by these datasets, the inclusion of annotations derived from automated methods is appropriate for identifying candidates that can further extend the reconstruction or serve for pathway-hole filling purposes (Section 3.1.3). Using a rigorous score cutoff, I have shown that DETECT provides high-quality enzyme predictions, albeit for a limited number (582) of enzyme classes (Hung, et al., 2010). These predictions are supplemented with enzyme annotations that are consistently predicted across multiple automated methods (i.e. BLAST, PRIAM, KEGG). Hence, a reasonable compromise between coverage and accuracy is the collation of enzyme annotations from BRENDA, EuPathDB and DETECT together with annotations that are consistent across all three of BLAST, KEGG and PRIAM (Figure 3-1). Although this integration is unlikely to be perfect, it nevertheless provides a useful framework for the generation of draft metabolic reconstructions suitable for preliminary comparative analyses and manual refinement.
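
The integration rule described above amounts to a union of the trusted sources plus the three-way consensus of the remaining automated sources. The following is a minimal sketch of that rule, not the pipeline itself; the function name is hypothetical and the EC numbers in the toy input are placeholders chosen only to exercise the logic.

```python
def integrate_enzymes(brenda, eupathdb, detect, blast, priam, kegg):
    """Combine enzyme (EC number) sets according to the integration rule.

    Every EC from BRENDA, EuPathDB or DETECT is kept; ECs from BLAST,
    PRIAM and KEGG are kept only when all three agree.
    """
    trusted = set(brenda) | set(eupathdb) | set(detect)
    consensus = set(blast) & set(priam) & set(kegg)
    return trusted | consensus


# Toy input (placeholder EC numbers, not real annotations for any species)
integrated = integrate_enzymes(
    brenda={"1.1.1.1"},
    eupathdb={"2.7.1.1"},
    detect={"6.3.2.1"},
    blast={"3.1.3.25", "5.4.2.2"},
    priam={"3.1.3.25", "5.4.2.2"},
    kegg={"3.1.3.25"},
)
```

In this toy case EC 5.4.2.2 is dropped because KEGG does not support it, which is the intended trade-off between coverage and accuracy.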

Figure 3-1 Enzyme complements of the Apicomplexa as annotated by a variety of resources. (a) Phylogenetic tree of apicomplexans with annotated genomes. The matching table contains the numbers of unique enzymes identified by various automated tools (BLAST, DETECT, PRIAM) and curated resources (BioCyc, EuPathDB, KEGG, BRENDA) for metabolic reconstruction. The final column lists the number of unique enzymes obtained by integrating all seven data sources and is based on the union of enzymes identified by (i) BRENDA, (ii) EuPathDB (includes BioCyc predictions), and (iii) DETECT, in addition to enzymes found by all three of BLAST, PRIAM, and KEGG. A blank (‘-’) indicates a species for which annotations were not available from the corresponding resource. The tree was constructed by combining the phylogenies from (Perkins et al., 2007) and (Zhu et al., 2000). (b) Venn diagrams of enzyme datasets for C. parvum, P. falciparum, and T. gondii. Total numbers of unique enzymes identified by each dataset are noted in brackets (e.g. for P. falciparum, 481 enzymes were identified by either BRENDA or EuPathDB, whereas 175 enzymes were identified by all three of BLAST, PRIAM, and KEGG).

3.3.2 Diversity in apicomplexan metabolism

Mapping our integrated datasets of enzyme complements onto KEGG pathway maps reveals surprising differences that warrant further investigation (Figure 3-2). Whereas entire pathways, such as de novo amino acid biosynthesis and those associated with the apicoplast, have been lost in all three species of Cryptosporidium, C. muris appears to possess components of the tricarboxylic acid (TCA) cycle that are absent from C. hominis and C. parvum (Figure 3-2). Genome data mining confirms that both C. parvum and C. hominis lack genes for nearly all TCA cycle enzymes (Abrahamsen, et al., 2004; Xu, et al., 2004). In contrast, electron microscopy studies reveal a detailed structure of the C. muris mitosome (a mitochondrion-derived organelle), with acetyl-CoA being fully oxidized by the canonical TCA cycle (Keithly, et al., 2005; Mogi and Kita, 2010). Furthermore, similarities of the C. muris mitosome to the mitochondrion in Plasmodium and Toxoplasma, in which morphological studies demonstrate divergence from their host counterparts, might hint at a previously undiscovered drug target in C. muris (Keithly, et al., 2005; Mather, et al., 2007).

Figure 3-2 Heatmap showing the conservation of individual metabolic pathways for sequenced apicomplexans. Each row indicates an individual metabolic pathway grouped by their superclass membership (defined by KEGG). Each column indicates apicomplexan species (grouped by lineage). Coloured tiles indicate the level of conservation (percentage of enzymes detected) of each pathway within each species (see inset colour key on bottom right). The presence of enzymes is based on the integration of enzymes obtained from the union of BRENDA, EuPathDB, DETECT, in addition to those shared by all three of BLAST, PRIAM, and KEGG. Individual and groups of pathways of interest have been highlighted and are described in more detail in the text: (i) amino acid metabolism (pink box), (ii) the TCA cycle (blue box), and (iii) pathways for Vitamin B metabolism (green boxes).
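
The conservation level plotted in each tile (percentage of a pathway's enzymes detected in a species) can be computed as in the following sketch. The function name and the toy species enzyme set are illustrative assumptions; the three EC numbers used for the pathway are those given in this chapter for pantothenate biosynthesis.

```python
def pathway_conservation(pathway_enzymes, species_enzymes):
    """Return, per pathway, the percentage of its enzymes found in a species.

    pathway_enzymes: dict mapping pathway name -> iterable of EC numbers.
    species_enzymes: set of EC numbers in the integrated dataset of one species.
    Pathways with no enzymes assigned are skipped.
    """
    conservation = {}
    for pathway, ecs in pathway_enzymes.items():
        ecs = set(ecs)
        if not ecs:
            continue
        conservation[pathway] = 100.0 * len(ecs & species_enzymes) / len(ecs)
    return conservation


pathways = {"pantothenate biosynthesis": ["2.1.2.11", "1.1.1.169", "6.3.2.1"]}
# Hypothetical species carrying two of the three enzymes plus an unrelated one
toy_species = {"2.1.2.11", "6.3.2.1", "2.7.1.1"}
scores = pathway_conservation(pathways, toy_species)
```

Because the score is a simple fraction, pathways with very few assigned enzymes can appear well conserved on the strength of one or two enzymes shared with other pathways.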

Similar conservation patterns between species might also suggest the presence of previously undiscovered pathways. For instance, enzyme components required for vitamin B metabolism are conserved across species of Plasmodium, Toxoplasma, and Cryptosporidium although they are generally absent from Theileria (Figure 3-2). In particular, several vitamin B de novo synthesis pathways exist in the Apicomplexa that are absent from humans (hence their designation as essential vitamins in the latter), and therefore represent attractive drug targets. Included in these pathways is vitamin B9 (folate) biosynthesis, which is a proven inhibitory pathway, based on the drug pyrimethamine that targets the bifunctional DHFR-TS enzyme (Escalante, et al., 2009). The pathway for vitamin B6 biosynthesis has been experimentally validated in Plasmodium and Toxoplasma (Gengenbacher, et al., 2006; Knockel, et al., 2007; Krungkrai, et al., 1989; Wrenger, et al., 2006), but evidence for vitamin B1 (thiamine) biosynthesis has only been found in Plasmodium and has yet to be established in Toxoplasma; the conservation of enzyme complements for thiamine biosynthesis in T. gondii suggests that further experimental investigations might uncover a previously undiscovered pathway in the parasite. Finally, vitamin B5 (pantothenate) biosynthesis is of particular interest since it has not yet been studied in any apicomplexan; the pathway has been selected for experimental validation in T. gondii with details described in Section 3.3.2.3.

Pathways display a spectrum of conservation, from those that are well-conserved across all lineages, such as carbohydrate and energy metabolism, to those that are diverse across and within the phylum, as illustrated by amino acid metabolism (Figure 3-2), where distinct patterns of auxotrophy have resulted from losses and gains of biosynthetic pathways due to horizontal gene transfer (Chaudhary and Roos, 2005). These observations are limited, however, given that KEGG pathways are relatively generic, integrating knowledge of transformations from several species. Consequently, resulting pathways are not organism-specific and might include enzymes that are not biologically relevant. For example, KEGG assigns only three enzymes to , so the presence of just two of these enzymes, implicated in other pathways, is clearly misleading for many of the Apicomplexa (Figure 3-2). Furthermore, enzymes that represent lineage specific adaptations might be excluded (Green and Karp, 2006) or the presence of a pathway might be misreported for a species. For example, purine nucleoside phosphorylase is a key component of the polyamine biosynthesis pathway; and while both Plasmodium and Toxoplasma have highly similar amino acid sequences for this enzyme, only the Plasmodium version leads to a functional pathway (Chaudhary, et al., 2006). This serves to illustrate the importance of additional biochemical data for accurate metabolic reconstruction in Apicomplexa.

3.3.2.1 Insights from metabolic networks for the Apicomplexa

To overcome the limitations of a pathway-centric approach, metabolic reconstructions can be analysed from a network perspective in which enzymes are linked if they share a common substrate (Figure 3-3). Metabolic networks have been shown to consist of a highly conserved, but nonetheless flexible, ‘core’ of enzymes involved in linking reactions across multiple pathways (Peregrin-Alvarez, et al., 2009). Consistent with these ideas, the apicomplexan network displays an equivalent core with both highly conserved enzymes (of potential pan-apicomplexan therapeutic interest) and enzymes that are lineage-specific, suggesting that these parasites have evolved different strategies for performing similar core metabolic activities. Further differences between species are found at the periphery of the network and highlight the lineage-specificity of many apicomplexan pathways. For example, Toxoplasma and Cryptosporidium possess carbohydrate metabolic pathways apparently not present in Plasmodium, whereas the ability to synthesize heme (associated with the porphyrin and chlorophyll synthesis pathway) appears lost in Cryptosporidium. Unlike other apicomplexans, Toxoplasma can convert aspartate to lysine through the diaminopimelate pathway (Chaudhary and Roos, 2005), and this is consistent with the presence of enzymes making up the pathways for lysine biosynthesis and degradation. Such a representation of the metabolic network is a novel and comprehensive approach to viewing metabolism that reveals the complex nature of metabolic pathways in the Apicomplexa. An important application of these networks is their comparison to networks of other organisms: comparison with the host can provide information on host–parasite interactions (Forst, 2006), while comparison with nonparasitic organisms can uncover the gain or loss of enzymes and pathways, offering clues to the evolutionary histories of these parasites (Nerima, et al., 2010).
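
The network construction, in which enzymes are linked whenever they share a metabolite, can be sketched as follows. This is an illustrative sketch rather than the code used to build Figure 3-3; the function name is hypothetical, and the toy input uses the four mannitol cycle reactions described for E. tenella later in this chapter (Figure 3-6), with cofactors omitted.

```python
from itertools import combinations


def build_enzyme_network(enzyme_metabolites):
    """Link enzymes (nodes) that share at least one metabolite (edges).

    enzyme_metabolites: dict mapping EC number -> set of metabolite names.
    Returns a set of edges, each a sorted pair of EC numbers.
    """
    edges = set()
    for ec_a, ec_b in combinations(sorted(enzyme_metabolites), 2):
        if enzyme_metabolites[ec_a] & enzyme_metabolites[ec_b]:
            edges.add((ec_a, ec_b))
    return edges


mannitol_cycle = {
    "1.1.1.17": {"fructose-6-phosphate", "mannitol-1-phosphate"},
    "3.1.3.22": {"mannitol-1-phosphate", "mannitol"},
    "1.1.1.67": {"mannitol", "fructose"},
    "2.7.1.1": {"fructose", "fructose-6-phosphate"},
}
edges = build_enzyme_network(mannitol_cycle)
```

In practice, ubiquitous metabolites such as ATP or water would need to be excluded before linking enzymes, otherwise nearly every pair of enzymes becomes connected.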

Figure 3-3 An integrated view of metabolism in C. parvum, P. falciparum, and T. gondii. Species overlap for three apicomplexans is shown in the global metabolic network (obtained from KEGG) where individual enzymes (nodes) are connected through common metabolites (edges). The colour and size of nodes represent the coverage by the three selected species (see inset key). A number of pathways with connected enzymes have been highlighted in black circles and the conserved ‘core’ has been highlighted with a dotted black circle; pathways highlighted by red circles are mentioned in the text. Enzymes have been grouped according to pathway membership in KEGG. The presence of enzymes is based on the integration of enzymes obtained from the union of BRENDA, EuPathDB, DETECT, in addition to those shared by all three of BLAST, PRIAM, and KEGG. The network layout has been adapted from (Peregrin-Alvarez et al., 2009).

3.3.2.2 Metabolic insights into apicomplexan Eimeria tenella

Eimeria spp. are a group of parasites that, while belonging to the same phylum as T. gondii, Cryptosporidium spp., and malaria parasites (Plasmodium spp.), set themselves apart from other members of the phylum by their unique genome structure. The sequence of Eimeria is broken into large alternating blocks, an arrangement not seen in any other organism. Consequently, the sequencing of its genome was a particularly daunting and time-consuming task (Lim, et al., 2012). To increase the value of the recently sequenced E. tenella genome (Reid, et al., 2012) and to gain a better understanding of the metabolic capabilities of this parasite, I have performed a separate metabolic network analysis, which I describe herein.

Eimeria, together with other apicomplexans including Toxoplasma and Neospora, belongs to a subclass known as the Coccidia; infection by these parasites is known as coccidiosis. Correspondingly, I have mapped the final integrated enzyme dataset for E. tenella to KEGG pathways against those for T. gondii and N. caninum (Figure 3-4). The network that results not only illustrates the general conservation of metabolism across the coccidia, but also highlights subtle species-specific pathways that may be associated with differences in host range (Figure 3-5). For instance, streptomycin and polyketide sugar biosynthesis may play a role in the parasite, given the relatively higher numbers of enzymes in E. tenella. On the other hand, E. tenella possesses fewer enzymes for pathways such as lipoic acid metabolism and valine, leucine, and isoleucine degradation relative to T. gondii and N. caninum, perhaps representing non-functional pathways in the parasite.

Figure 3-4 Overlap of integrated enzyme datasets for E. tenella, T. gondii, and N. caninum. The total number of unique enzymes for E. tenella was obtained by integrating five different data sources, based on the union of enzymes identified by: (i) BRENDA and AMENDA, (ii) ToxoDB orthologs, (iii) apicomplexan-conserved enzymes (found in both T. gondii and P. falciparum), (iv) high-confidence predictions from DETECT and (v) enzymes identified by both BLAST and PRIAM. Enzymes for T. gondii were obtained based on the integration of the datasets from BRENDA, ToxoDB, DETECT, and hits shared by BLAST, PRIAM, and KEGG. Enzymes for N. caninum were obtained similarly to T. gondii with the addition of enzymes identified through ToxoDB orthologs.

Figure 3-5 Heatmap showing the conservation of individual metabolic pathways for E. tenella, T. gondii, and N. caninum. Each row indicates an individual metabolic pathway grouped by their superclass membership (defined by KEGG). Coloured tiles indicate the level of conservation (percentage of enzymes detected) of each pathway within each species. For T. gondii, enzymes are based on the integration of BRENDA, ToxoDB, DETECT in addition to those shared by all three of BLAST, PRIAM, and KEGG. A similar integration is performed for identifying N. caninum enzymes, but with an additional dataset of the enzymes corresponding to T. gondii orthologs.

3.3.2.2.1 The mannitol cycle is essential for E. tenella

Interestingly, E. tenella possesses 49 enzymes that are absent in both Toxoplasma and Neospora, 12 of which appear to be involved in pathways for carbohydrate metabolism. Of particular interest are three enzymes of the mannitol cycle (Figure 3-6): EC 1.1.1.17 (mannitol-1-phosphate dehydrogenase), EC 3.1.3.22 (mannitol-1-phosphatase, M1Pase), and EC 1.1.1.67 (mannitol dehydrogenase). Enzyme assays have confirmed that all three enzymes exist in Eimeria, and are essential for survival of the parasite (Liberator, et al., 1998; Michalski, et al., 1992; Schmatz, et al., 1989). Moreover, while M1Pase has been molecularly characterized in Eimeria (Liberator, et al., 1998) with the deduced amino acid sequence mapping to ETH_00027300, neither sequence similarity searches nor experimental studies have provided evidence for the presence of these enzymes in T. gondii and N. caninum. Current data suggest that mannitol is produced during oocyst formation in the chicken gut and accumulated as an energy reserve for sporulation (Schmatz, et al., 1989). Given that higher eukaryotic hosts are unable to synthesize or catabolise mannitol, the mannitol cycle in Eimeria is evolutionarily unique and represents an attractive drug target for the control of coccidiosis.

Figure 3-6 The mannitol cycle in Eimeria tenella. The enzymes involved are mannitol-1-phosphate dehydrogenase (EC 1.1.1.17), mannitol-1-phosphatase (EC 3.1.3.22), mannitol dehydrogenase (EC 1.1.1.67), and hexokinase (EC 2.7.1.1). Fructose-6-phosphate enters the mannitol cycle from glycolysis via mannitol-1-phosphate dehydrogenase and is reduced to mannitol-1-phosphate, which in turn is dephosphorylated by mannitol-1-phosphatase to yield mannitol. Mannitol is then oxidized to fructose by mannitol dehydrogenase and is rephosphorylated to fructose-6-phosphate by hexokinase.

3.3.2.2.2 E. tenella possesses additional enzymes for polyketide sugar unit biosynthesis and streptomycin biosynthesis

Several pathways appear to have greater complements of enzymes in Eimeria compared to Toxoplasma and Neospora, including streptomycin biosynthesis and polyketide sugar unit biosynthesis (Figure 3-5). It is worth noting, however, that the enzymes mapping to the streptomycin biosynthesis pathway that are found in all three species also play a role in other biochemical processes (Table 3-1). Hexokinase and phosphoglucomutase, for instance, participate in multiple essential pathways including glycolysis, galactose metabolism, starch and sucrose metabolism, and amino sugar and nucleotide sugar metabolism. Two other enzymes that were identified in all three species, inositol-phosphate phosphatase and inositol 2-dehydrogenase, play key roles in inositol phosphate metabolism, as suggested by their names. Whether the two additional genes identified in E. tenella (ETH_00014095 and ETH_00014090), which appear to be missing from Toxoplasma and Neospora, encode enzymes involved in streptomycin biosynthesis has yet to be confirmed biochemically; experimental evidence is currently restricted to studies in Streptomyces spp. and E. coli. Interestingly, both genes, which have predicted enzyme activities based on BLAST and PRIAM, are also associated with the polyketide sugar unit biosynthesis pathway, where, coupled with an EC:5.1.3.- or EC:1.1.1.- enzyme (multiple candidates found in E. tenella), they represent a consecutive path of reactions. This suggests that these genes may encode functional enzymes of polyketide sugar unit biosynthesis, which appear to act immediately downstream of the amino sugar and nucleotide sugar metabolic pathway, resulting in the production of dTDP-L-Rham, a precursor to novobiocin biosynthesis.

The vancomycin biosynthesis pathway, on the other hand, requires only a single enzyme, dTDP-glucose 4,6-dehydratase (EC 4.2.1.46), which is predicted by BLAST and PRIAM to be encoded by ETH_00014095 (with no orthologs from T. gondii or N. caninum). The presence of this enzyme may be an artefact of its participation in one or both of the polyketide sugar unit biosynthesis and streptomycin biosynthesis pathways (Table 3-1). ETH_00014095 is annotated as a hypothetical protein in EuPathDB (note that 6352/8786 or 72% of E. tenella proteins are annotated by EuPathDB as hypothetical proteins) and produces an identical sequence match to the dTDP-glucose-4,6-dehydratase found in Streptococcus pneumoniae (NP_357916.1). This enzyme has only been characterized in bacterial species, with no experimental evidence available to support the presence of this gene in E. tenella.

Table 3-1 Enzymes for streptomycin biosynthesis pathway that have been identified in the coccidia.

| Enzyme (EC) | Associated pathways | T. gondii | N. caninum | E. tenella |
|---|---|---|---|---|
| Hexokinase (2.7.1.1) | Glycolysis, fructose and mannose metabolism, galactose metabolism, starch and sucrose metabolism, amino sugar and nucleotide metabolism, butirosin and neomycin biosynthesis | TGME49_265450 | NCLIV_039820 | ETH_00009955 |
| Phosphoglucomutase (5.4.2.2) | Glycolysis, pentose phosphate pathway, galactose metabolism, purine metabolism, starch and sucrose metabolism, amino sugar and nucleotide sugar metabolism | TGME49_285980 | NCLIV_014450 | ETH_00002785* |
| Inositol-phosphate phosphatase (3.1.3.25) | Inositol phosphate metabolism | TGME49_222970 | NCLIV_006000 | ETH_00012965 |
| Inositol 2-dehydrogenase (1.1.1.18) | Inositol phosphate metabolism | TGME49_268870 | NCLIV_037480 | ETH_00027970 |
| dTDP-glucose 4,6-dehydratase (4.2.1.46) | Polyketide sugar unit biosynthesis, biosynthesis of vancomycin group antibiotics | None | None | ETH_00014095 |
| dTDP-4-dehydrorhamnose reductase (1.1.1.133) | Polyketide sugar unit biosynthesis | None | None | ETH_00014090 |

*ETH_00004545 was predicted by DETECT to have EC 5.4.2.2 as well, but has not been included since ToxoDB annotates only ETH_00002785 as a phosphoglucomutase.

3.3.2.2.3 Identification of genes for lipoic acid metabolism

T. gondii and N. caninum both possess the required complement of enzymes to carry out lipoic acid (LA) metabolism: (i) lipoic acid synthetase (LipA; EC 2.8.1.8), (ii) lipoyl (octanoyl) transferase (LipB; EC 2.3.1.181), and (iii) lipoate-protein ligase A (LlpA; EC 2.7.7.63). LA is a highly conserved cofactor that is required for the function of several key enzyme complexes in oxidative and one-carbon metabolism. The metabolism of LA, represented by the LA de novo biosynthesis and LA salvage pathways, has been most extensively studied in E. coli (Figure 3-7; for a recent review, see Spalding and Prigge, 2010). Functional complementation in E. coli using the corresponding genes from T. gondii and P. falciparum confirms the suspected role of these enzymes (Thomsen-Zieger, et al., 2003). While homology-based sequence searches did not produce any significant enzyme predictions, ToxoDB does list three genes annotated as hypothetical proteins as orthologs to the T. gondii-encoded LipA (Table 3-2). To determine if the remaining members of the pathways exist as unannotated genes, T. gondii protein sequences for LipB (TGME49_315640) and LlpA (TGME49_271820) were searched against the E. tenella genome. High-scoring sequence matches were identified, suggesting that E. tenella is capable of lipoic acid metabolism and highlighting deficiencies in the E. tenella gene model (Table 3-2). Importantly, all candidates were found to have a conserved domain consistent with the catalytic domains of enzymes associated with the predicted activity.

Table 3-2 E. tenella genes for lipoic acid metabolism identified based on orthology to T. gondii

| Enzyme | EC | T. gondii | E. tenella | E-value |
|---|---|---|---|---|
| LipA, lipoic acid synthetase | 2.8.1.8 | TGME49_226400 | ETH_00028530, ETH_00035460, ETH_00036950 | 1.2E-11, 6.0E-55, 1.0E-24 |
| LipB, lipoyl (octanoyl) transferase | 2.3.1.181 | TGME49_315640 | NODE_6489_length_3039_cov_9.373478 | 4.3E-21 |
| LlpA, lipoate-protein ligase A | 2.7.7.63 | TGME49_271820 | NODE_1560_length_1923_cov_16.313572, NODE_1417_length_4458_cov_16.122702 | 1.4E-28, 3.2E-21 |

Figure 3-7 Lipoic acid biosynthesis and salvage pathways as illustrated in KEGG. Enzymes that have been highlighted in red have been identified in one or more coccidian species.

3.3.2.3 The pantothenate biosynthesis pathway for drug targeting

By reconstructing metabolic pathways for 14 apicomplexan parasites, my comparative analyses have highlighted the conservation and potential druggability of pantothenate biosynthesis. Genes encoding the enzymes of this pathway have been identified bioinformatically in Toxoplasma gondii and have also been linked to expression during the metabolically active life stage of the parasite (Table 3-3). Given the conservation across the Apicomplexa (with the exception of Theileria spp., whose genomes are not well-annotated) and absence from humans (who are completely reliant on the uptake of pantothenate from their diet), enzymes required for de novo pantothenate biosynthesis are suitable drug targets that can be assessed in the model organism T. gondii. Using this information, we can assess the feasibility of targeting these enzymes through biochemical assays and knockout studies. The pathway, which is found in bacteria, fungi, plants and potentially apicomplexans, comprises three enzymes (Figure 3-8). The resulting product is pantothenate, which serves as the precursor for coenzyme A, an essential cofactor in numerous reactions required for sustaining life.

Table 3-3 Genes involved in pantothenate biosynthesis for target selection.

| ToxoDB gene ID | Gene Name | EC No. | Enzyme Name | Similarity to human | Tachyzoite expression |
|---|---|---|---|---|---|
| TGME49_057050 | PanB | 2.1.2.11 | 3-methyl-2-oxobutanoate hydroxymethyltransferase | None | y |
| | PanE | 1.1.1.169 | 2-dehydropantoate 2-reductase | | |
| TGME49_065870 | PanC | 6.3.2.1 | Pantoate--beta-alanine ligase | None | y |

Figure 3-8 Pantothenate biosynthesis. The proposed pathway and genes identified in Toxoplasma gondii. The pathway comprises three different enzymes: ketopantoate hydroxymethyltransferase (PanB), ketopantoate reductase (PanE), and pantothenate synthetase (PanC). The genes encoding each enzyme are shown in italics.

3.3.2.4 Assessing activity of pantothenate biosynthesis enzymes in Toxoplasma gondii

To validate the proposed candidates for further drug development, biochemical assays were undertaken to confirm the enzymatic activity of the target genes. Ketopantoate hydroxymethyltransferase (KPHMT) activity was determined spectrophotometrically, as described in Materials and Methods. Figure 3-9 suggests low levels of KPHMT activity in Toxoplasma gondii tachyzoites when compared to the Mycobacterium tuberculosis-derived positive control for KPHMT overexpressed in E. coli. Experiments were limited to unfiltered tachyzoite lysate, which contains other enzymes and proteins that may have affected the absorbance readings of the assay. A more appropriate strategy for probing the activity of these enzymes in Toxoplasma would be the use of genetic approaches such as functional complementation and/or gene knockouts.

Figure 3-9 Spectrophotometric assay of PanB activity in Toxoplasma gondii. PanB activity is correlated with the decrease in absorbance at 340 nm when compared to background activity. The reaction mixture contained 100 µL Hepes (pH = 7.5), 4 µL MgSO4, 50 µL α-ketoisovalerate, 20 µL NADPH, 25 µL formaldehyde, and 25 µL tetrahydrofolate. A three-fold serial dilution was used to test varying concentrations of tachyzoite lysate mixture from Toxoplasma using 20 µL of E. coli lysate overexpressing a Mycobacterium tuberculosis PanB as a positive control.

3.3.2.5 Gene knockout experiments reveal misassembly of potentially bifunctional PanBE enzyme in Toxoplasma gondii

In Toxoplasma, genes are knocked out through double-crossover homologous recombination in ΔKU80 parasites, a process that normally takes several months. In an effort to shorten this process from months to weeks, a single-crossover strategy in ΔKU80 parasites was proposed and tested. This modified approach takes advantage of endogenous gene tagging through the use of ligation-independent cloning (LIC) (Huynh and Carruthers, 2009). Here, tagging occurs by inserting a LIC plasmid at the 3’ end of the gene of interest. To achieve a gene knockout, however, the LIC plasmid is instead inserted at the start of the gene, upstream of the ATG start codon (Figure 3-10). Pantothenate biosynthesis genes predicted in Toxoplasma were assessed for their suitability for gene knockout with the modified tagging protocol; preliminary investigations, however, revealed that TGME49_057050, predicted to encode activities for both PanB and PanE, is likely to be misassembled based on a comparison of the gene model to experimental datasets in ToxoDB (Figure 3-11). To resolve this discrepancy, I designed primers along the length of the gene to amplify overlapping fragments from both gDNA and cDNA (Figure 3-12A). While gDNA produced PCR products consistent with the gene model (Figure 3-13, Table 3-4), this was not the case for the products from cDNA (Figure 3-14, Table 3-5). To investigate, clean PCR products from cDNA were sequenced, assembled, and searched against the NCBI NR/NT database (Figure 3-12B). Comparisons revealed novel sequence in two of the cDNA fragments (not found in canonical PanB sequence), with one of these fragments containing entirely novel sequence not present in the T. gondii genome. Mapping the completed cDNA sequence of TGME49_057050 (done by Melissa Chiasson of the Grigg laboratory, Bethesda) confirms that the gene indeed encodes a fusion protein. The revised and improved gene model for TGME49_057050, which was released subsequent to these experiments (Figure 3-11E), confirms a misassembly but more importantly supports our sequencing results. Further functional experiments are required to prove or disprove the bifunctionality of the gene.

The gene knockout method based on the modified tagging protocol was tested with PanC, but led to spurious results revealing a critical design flaw in the system. Due to replication of the promoter region during homologous recombination, duplication of the endogenous copy of the gene occurs downstream of the insertion site. Alternative strategies have been explored and are currently being tested (pers. comm.).

Figure 3-10 Schematic illustration of the single-crossover gene insertion mechanism using a modified endogenous gene tagging approach. A pYFP.LIC.HXGPRT vector was generated that contained an upstream LIC cassette for cloning of the promoter region for the gene of interest fused in frame to YFP, which was followed by the HXGPRT cassette for positive selection of parasites in MPA plus xanthine.

Figure 3-11 Genomic landscape of TgPanBE showing the predicted gene model with tracks for various experimental datasets. (A) Gene model (highlighted in yellow) for TGME49_057050 with sequenced cDNA fragments in red. (B) Intron splice junctions align with the gene model, except for those that appear upstream of the transcriptional start site. (C) ChIP-chip data indicate a transcriptional start site further upstream of what is predicted by the gene model. (D) RNASeq reads for day 3 and day 4 time points from tachyzoites agree with (B) and (C) and indicate a start site that is further upstream than what is predicted by the gene model. (E) Revised and improved gene model for TGME49_057050 (August 2014) with the same experimental tracks as in (B) to (D).

Figure 3-12 Schematic of TgPanBE cDNA showing (A) location of primers and (B) assembly of sequencing reads based on PCR products from cDNA against PanBE gene model.

Figure 3-13 PCR amplification of pantothenate synthesis genes from gDNA. Lane 1: TgPanK, lane 2: the TgPanK target region for insertion, lane 3: the TgPanC target region for insertion, lane 4: TgPanC (optimized for qPCR), lane 5: region linking PanB domain and PanE domain on TgPanBE, lane 6: the PanE domain of TgPanBE, lane 7: PanB domain of TgPanBE (optimized for qPCR), lane 8: TgPanBE.

Figure 3-14 PCR amplification of TgPanBE from cDNA using RFLP analysis. Relative locations of primers are detailed in Figure 3-12. Lane 1: 1F-2R, lane 2: 1F-3R, lane 3: 1F-4R, lane 4: 2F-3R, lane 5: 2F-4R, lane 6: 3F-2R, lane 7: 3F-4R, lane 8: 1F-1R, lane 9: 2F-2R, lane 10: 3F-3R, lane 11: 4F-4R.

Table 3-4 Band sizes of PCR products generated from TgPanBE gDNA.

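| Lane | PCR Product | Expected (bp) | Actual (bp) |
|------|-------------|---------------|-------------|
| 1 | PanK gene | 8552 | - |
| 2 | PanK tagging | 933 | ~1000 |
| 3 | PanC tagging | 1085 | ~1100 |
| 4 | PanC for qPCR | 399 | ~400 |
| 5 | PanBE RFLP (3F, 3R) | 1024 | ~1000 |
| 6 | PanE domain (4F, 4R) | 4246 | ~4500 |
| 7 | PanB for qPCR (2F, 2R) | 1226 | ~1300 |
| 8 | PanB Fwd, PanE Rvs (2F, 4R) | 6179 | ~6200 |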

Table 3-5 Band sizes of PCR products generated from TgPanBE cDNA.

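| Lane | PCR Product | Expected (bp) | Actual (bp) |
|------|-------------|---------------|-------------|
| 1 | 1F, 2R** | < 4257 | ~2800 |
| 2 | 1F, 3R | < 4709 | * |
| 3 | 1F, 4R | < 5576 | - |
| 4 | 2F, 3R** | 822 | ~1000 |
| 5 | 2F, 4R | 1689 | * |
| 6 | 3F, 2R | 171 | ~170 |
| 7 | 3F, 4R | 1490 | * |
| 8 | 1F, 1R** | 736 | ~900 |
| 9 | 2F, 2R** | 370 | ~400 |
| 10 | 3F, 3R | 623 | ~750 |
| 11 | 4F, 4R | 1013 | * |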

*Possible non-specific primer binding

**PCR products sent out for sequencing

3.3.3 Reduced metabolic diversity in tapeworms

Echinococcosis and cysticercosis are amongst the most severe parasitic diseases in humans, caused by the proliferation of larval tapeworms in vital organs (Garcia, et al., 2007). Larval tapeworms can persist asymptomatically in a human host for decades (W.H.O., 2012), eventually causing a spectrum of debilitating pathologies and death (Garcia, et al., 2007). Tapeworm infections are highly prevalent worldwide (Budke, et al., 2009) and there is a need to develop new and specific medicines to help eradicate these parasites and the diseases they cause.

Towards understanding how these parasites live and grow, genome sequencing has been performed for four species: Echinococcus multilocularis, E. granulosus, Hymenolepis microstoma, and Taenia solium (Tsai, et al., 2013). I have reconstructed the metabolic networks of these four parasites to gain greater insight into their metabolic capabilities and to identify potential enzyme candidates for much-needed therapeutic intervention.

Tapeworm metabolism in some instances differs greatly from that of other animals (and from the pathways available in KEGG), so flatworm-specific pathways were retrieved from the literature and reconstructed using genes from the tapeworm and S. mansoni genomes. To date, the metabolic pathways of tapeworms have not been well characterized; no data for Echinococcus were available from KEGG, and the BRENDA resource (Barthelmes, et al., 2007) provided biochemical evidence for only 5 enzymes. While a recent survey of expressed sequence tags uncovered a role for fermentative pathways in the germinal layer of the hydatid cyst of E. granulosus (Parkinson, et al., 2012), the current study constitutes the first comprehensive, genome-wide metabolic pathway reconstruction for a tapeworm. In total, 1,032 genes (representing ~10% of the genome) could be assigned EC numbers. Ratios of enzyme complements for E. multilocularis, S. mansoni, H. microstoma, T. solium, mouse and human were calculated based on KEGG pathways, and grouped according to superclass (Figure 3-15). E. multilocularis shares 91% (410/453) of its enzymes with S. mansoni (Figure 3-16). Mapping E. multilocularis enzymes to KEGG metabolic pathways provides a unique perspective on pathways that are conserved across most species, but may have been lost in the parasite (Figure 3-15 and Figure 3-17). For instance, pathways for thiamine metabolism and the de novo synthesis of phenylalanine, tyrosine and tryptophan, which appear to be present in schistosomes, are noticeably lacking in the metabolic network for E. multilocularis. With the complete loss of a gut, it is presumed that tapeworms acquire many essential components from the host and that many enzymes have therefore been lost compared to free-living counterparts (Olson, et al., 2012). This is consistent with our metabolic reconstruction, which shows a high proportion of missing pathways (e.g. fatty acid biosynthesis, vitamin B6 and glutamate metabolism).

Figure 3-15 Heatmap of the conservation of individual metabolic pathways for E. multilocularis (Em), T. solium (Ts), H. microstoma (Hm) and S. mansoni (Sm) compared to human (Hs) and mouse (Mm). Each row indicates an individual metabolic pathway grouped by its superclass membership (defined by KEGG). Coloured tiles indicate the level of conservation (percentage of enzymes detected) of each pathway within each species. KEGG pathways with insufficient evidence (containing only one enzyme) in E. multilocularis have been removed.
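The conservation score behind each heatmap tile can be illustrated with a minimal sketch. The function name, the placeholder pathway names and the placeholder EC numbers below are hypothetical and are not taken from the reconstruction; the sketch also assumes that "insufficient evidence" means fewer than two detected enzymes in the species of interest.

```python
from typing import Dict, Set


def pathway_conservation(
    pathways: Dict[str, Set[str]],
    detected: Set[str],
    min_detected: int = 2,
) -> Dict[str, float]:
    """Percentage of each pathway's EC numbers detected in one species.

    Pathways with fewer than `min_detected` detected enzymes are dropped
    (assumed reading of the "only one enzyme" filter in the caption).
    """
    conserved = {}
    for name, ec_numbers in pathways.items():
        found = ec_numbers & detected
        if len(found) < min_detected:
            continue
        conserved[name] = 100.0 * len(found) / len(ec_numbers)
    return conserved


# Placeholder data for illustration only.
toy_pathways = {
    "pathway A": {"1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"},
    "pathway B": {"5.5.5.5", "6.6.6.6"},
}
toy_detected = {"1.1.1.1", "2.2.2.2", "4.4.4.4", "6.6.6.6"}
toy_conservation = pathway_conservation(toy_pathways, toy_detected)
```

Here pathway A keeps three of its four enzymes and is reported, whereas pathway B is supported by a single enzyme and is filtered out.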

Figure 3-16 Overlap of enzyme datasets for E. multilocularis and comparison of integrated datasets for H. microstoma and S. mansoni. (A) Total numbers of unique enzymes identified by each dataset are noted in brackets (e.g. 380 enzymes were identified by both BLAST and PRIAM, while 342 enzymes were identified by either EC-annotated S. mansoni orthologs or from BRENDA). (B) Overlap of integrated enzyme datasets for E. multilocularis, S. mansoni, and H. microstoma. The total number of unique enzymes was obtained by integrating five different data sources (also shown in (A)), based on the union of enzymes identified by: (i) BRENDA, (ii) EC-annotated S. mansoni orthologs, (iii) DETECT and (iv) enzymes found in both BLAST and PRIAM. Enzymes for S. mansoni (07-2012 gene model) were predicted by integrating the datasets from BRENDA, DETECT, and hits shared by both BLAST and PRIAM. Enzymes for H. microstoma were predicted using the same method as for S. mansoni, with additional BLAST hits to E. multilocularis annotations.
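The integration rule described in this caption amounts to a simple set expression. The following is a minimal sketch; the function name and the toy EC sets are hypothetical, and only the combination logic (union of curated and DETECT predictions with the BLAST and PRIAM consensus) follows the caption.

```python
from typing import Iterable, Set


def integrate_enzymes(
    brenda: Iterable[str],
    orthologs: Iterable[str],
    detect: Iterable[str],
    blast: Iterable[str],
    priam: Iterable[str],
) -> Set[str]:
    """Union of BRENDA, ortholog and DETECT ECs plus the BLAST/PRIAM consensus."""
    # BLAST and PRIAM predictions are only trusted where both agree.
    consensus = set(blast) & set(priam)
    return set(brenda) | set(orthologs) | set(detect) | consensus


# Toy sets for illustration only.
toy_integrated = integrate_enzymes(
    brenda={"1.1.1.27"},
    orthologs={"4.2.1.2"},
    detect={"1.2.4.1"},
    blast={"1.2.1.12", "2.7.1.1"},
    priam={"1.2.1.12", "3.1.3.11"},
)
```

Requiring agreement between BLAST and PRIAM discards the two EC numbers that each tool predicts alone, while the other three sources contribute directly.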

Figure 3-17 Metabolic network of E. multilocularis. Node size corresponds to number of evidences, with background nodes (smallest) included for context of KEGG pathways. Enzyme nodes predicted to be in E. multilocularis are outlined in black. Groups of enzymes circled in black appear to be conserved in the parasite while enzymes circled in red appear to be absent in the parasite. Note that the malate dismutation pathway is not represented in KEGG.

3.3.3.1 Energy metabolism in Echinococcus and other parasitic helminths

Experimental evidence has shown that E. multilocularis, E. granulosus, and H. microstoma are all capable of two forms of anaerobic respiration, via malate dismutation (Fioravanti, 1982; Matsumoto, et al., 2008; Xiao, et al., 1993) and via lactate dehydrogenase (LDH; EC 1.1.1.27) (Burke, et al., 1972; Sarciron, et al., 1990; Xiao, et al., 1995). Our results support the view that tapeworms are capable of both aerobic and anaerobic metabolism, through oxidative phosphorylation as well as malate dismutation and the production of lactate, and RNA-Seq data revealed no significant differences in the expression of genes involved in aerobic or anaerobic respiration pathways. I found evidence in E. multilocularis, T. solium, and H. microstoma of several enzymes involved in the anaerobic malate dismutation pathway, a unique respiratory chain shown to exist in several parasitic helminths (roundworms and tapeworms) and fully characterised in Fasciola hepatica (Tielens, 1994; Tielens, et al., 1992; van Grinsven, et al., 2009). This specialised electron-transport chain is coupled to ATP formation via oxidative phosphorylation without the use of oxygen as the terminal electron acceptor (pathway outlined in Figure 3-18). The key components are fumarate reductase (FRD) and rhodoquinone (RQ), which serves as the terminal electron carrier. Importantly, FRD catalyses the reverse of the reaction carried out by complex II of the ETC: succinate dehydrogenase (SDH) performs the forward activity under aerobic conditions, whereas FRD performs the reverse activity under anaerobic conditions.

In conditions where oxygen levels are low for the parasite, a likely scenario for carbohydrate consumption might be analogous to that of F. hepatica. First, glycogen is degraded to phosphoenolpyruvate via the glycolytic pathway, and phosphoenolpyruvate is then converted to malate via oxaloacetate. Once malate has been transported into the mitochondria, malate dismutation occurs (malate is simultaneously reduced and oxidized into two different products). By way of the oxidation pathway, malic enzyme (EC 1.1.1.40; EmuJ_000645800, EmuJ_000695200, and EmuJ_001145700 predicted by BLAST and PRIAM) and pyruvate dehydrogenase (EC 1.2.4.1; EmuJ_000956200 and EmuJ_000590700 predicted by multiple methods) convert some of the malate into acetyl-CoA. The remaining malate is converted to fumarate and reduced to succinate via fumarase (FUM; EC 4.2.1.2; EmuJ_000256500 predicted by BLAST and DETECT) and FRD, respectively. To date, no protein sequence has been definitively identified for FRD in parasitic helminths, and its divergence from bacterial ancestors is described below. The parasiticidal effects of mitochondrial fumarate reductase inhibitors have been demonstrated in vitro, suggesting that the malate dismutation pathway (including rhodoquinone synthesis) would be an effective target for the development of novel therapeutics in tapeworms (Matsumoto, et al., 2008).

To better understand which of these two pathways is more dominant, and during which stages of the parasite’s life cycle each pathway operates, RNA-Seq data (Tsai, et al., 2013) were examined for E. multilocularis, E. granulosus, and H. microstoma. To assess the relative up- or downregulation of these genes, RNA-Seq data were also examined for two well-established housekeeping enzymes: glyceraldehyde 3-phosphate dehydrogenase (GAPDH; EC 1.2.1.12) and pyruvate dehydrogenase (PDH; EC 1.2.4.1), the latter also a participant in the TCA cycle and thus representative of aerobic respiration. Note that since a full enzyme complement was not available for E. granulosus, a simple BLASTP search was done using E. multilocularis sequences to obtain putative orthologs for the enzymes of interest. See Supplementary Table S6 for the complete set of mappings.

In E. multilocularis, GAPDH has the highest expression (at least 23-fold greater than all other enzymes) amongst the genes examined. FUM, LDH, and PDH exhibited similar expression during all stages, except in pregravid parasites, where LDH was downregulated. In E. granulosus, only protoscolex FPKM values were available, but these indicated that GAPDH, while still having the highest expression amongst the enzymes examined, was only 3-fold greater than LDH, which in turn had 4-fold greater expression than both PDH and FUM. RNA-Seq data for H. microstoma displayed similar patterns of expression to E. multilocularis, with FPKM values constant in larval stages but reaching a minimum in adult parasites. These data show that two distinct forms of anaerobic respiration play an important role in the metabolism of larval tapeworm parasites. Moreover, the differential expression of these enzymes across the three species suggests that these tapeworms need to employ anaerobic respiration pathways to differing extents.

Lactate dehydrogenase, which degrades carbohydrates into lactate, is a classical adaptation to anaerobic metabolism. While this strategy is used by all helminths to some extent, the majority additionally utilize malate dismutation, whereby carbohydrates are degraded to phosphoenolpyruvate (PEP) and subsequently reduced to malate (Mehlhorn, 2008); evidence for the importance of this unique respiratory chain is lacking. Unlike lactate, which is excreted from the cytosol, malate is transported into the mitochondria for further degradation. Malate is then metabolized in one of two ways: (i) oxidation to acetate or (ii) reduction to succinate, which is often further metabolized and excreted as propionate (Figure 3-18). For malate dismutation to maintain redox balance, twice as much propionate must be produced as acetate. RNA-Seq data for lactate dehydrogenase and malate dismutation genes suggest that these two types of anaerobic respiration are complementary, whereby one pathway is predominant over the other depending on the parasite and its life stage (Table 3-6).
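The 2:1 ratio of propionate to acetate can be made explicit with a short balance. This is a sketch under the commonly described stoichiometry, assumed here rather than taken from the reconstruction: oxidation of one malate to acetate (via malic enzyme and pyruvate dehydrogenase) yields two reducing equivalents, while reduction of one malate to succinate (via fumarase and fumarate reductase) consumes one. The symbols x and y are introduced only for illustration.

```latex
% x: malate oxidized to acetate; y: malate reduced to succinate (excreted as propionate)
% Assumed: +2 NAD(P)H per malate oxidized, -1 NADH per malate reduced
\begin{align}
  2x - y &= 0 && \text{(no net production of reducing equivalents)} \\
  y &= 2x && \text{(two propionate per acetate)}
\end{align}
```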

Figure 3-18 Proposed schematic overview of oxidative phosphorylation and malate dismutation in tapeworms. In malate dismutation, the anaerobic fermentation variant operative in most adult helminths, fumarate (rather than oxygen) acts as a terminal electron acceptor of the electron transport chain. Aerobic (black arrows) and anaerobic (red arrows) pathways are shown. Transport of electrons is represented by dashed arrows, with electron carriers coloured in blue. End product metabolites are shown in boxes and multi-step reactions have been represented as two-step arrows. Genes predicted for key enzymes involved in anaerobic respiration for E. multilocularis are shown in grey boxes. Abbreviations: AcCoA, acetyl-CoA; CITR, citrate; FM, fumarase; FRD, fumarate reductase; FUM, fumarate; LDH, lactate dehydrogenase; MAL, malate; MDH, malate dehydrogenase; ME, malic enzyme; OXAC, oxaloacetate; PDH, pyruvate dehydrogenase; PEP, phosphoenolpyruvate; PK, pyruvate kinase; PYR, pyruvate; SDH, succinate dehydrogenase; SUCC, succinate.

Table 3-6 Maximal RNA-Seq expression for E. multilocularis (Em), E. granulosus (Eg), and H. microstoma (Hm) genes mapping to malate dismutation.

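| Species | gene_id | abbr | EC | average_maximal_FPKM |
|---------|---------|------|----|----------------------|
| Em | EmW_000422600 | FR | 1.3.1.6 | 1194.87 |
| Eg | EgG_000422600 | FR | 1.3.1.6 | 336.065 |
| Hm | HmN_000447800 | FR | 1.3.1.6 | 1572.7 |
| Em | EmW_000256500 | FUM | 4.2.1.2 | 304.466 |
| Eg | EgG_000256500 | FUM | 4.2.1.2 | 203.333 |
| Hm | HmN_000372800 | FUM | 4.2.1.2 | 1111.13 |
| Em | EmW_000608500 | LDH | 1.1.1.27 | 365.43522 |
| Eg | EgG_000660800 | LDH | 1.1.1.27 | 214.96418 |
| Hm | HmN_000478500 | LDH | 1.1.1.27 | 1325.8615 |
| Em | EmW_000956200 | PDH | 1.2.4.1 | 460.1185 |
| Eg | EgG_000956200 | PDH | 1.2.4.1 | 241.699 |
| Hm | HmN_000382600 | PDH | 1.2.4.1 | 2210.82 |
| Em | EmW_000254600 | GAPDH | 1.2.1.12 | 3884.2024 |
| Eg | EgG_000254600 | GAPDH | 1.2.1.12 | 961.3325067 |
| Hm | HmN_000374100 | GAPDH | 1.2.1.12 | 4565.1365 |

Abbreviations: FUM, fumarase; FR, fumarate reductase; LDH, lactate dehydrogenase; PDH, pyruvate dehydrogenase (TCA cycle; aerobic respiration); GAPDH, glyceraldehyde-3-phosphate dehydrogenase (housekeeping enzyme).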

3.3.3.2 Amino acid metabolism

The reduced capability of synthesizing amino acids in the fluke S. mansoni (Berriman, et al., 2009) is further reduced in tapeworms. For instance, host-derived arginine is a major source of synthesized proline in flukes (Mehlhorn, 2008), but enzymes essential for interconversion from arginine and for de novo biosynthesis are missing from E. multilocularis, and direct import of proline has been shown in Echinococcus protoscoleces (Jeffs and Arme, 1987). Likewise, serine appears to be imported in Echinococcus, with the concurrent loss of enzymes involved in its biosynthetic pathway (Figure 3-15). Tapeworms also appear to have lost many enzymes in the molybdopterin biosynthesis pathway, which allows molybdenum to be utilized as an enzymatic cofactor. Molybdopterin is most widely used in bacteria, but was hitherto believed to be present in all eukaryotes too (Schwarz and Mendel, 2006).

3.3.4 Metabolic insights into phytopathogen Ophiostoma ulmi

The phytopathogen Ophiostoma ulmi is the causative agent of the first incident of Dutch elm disease (DED), one of the most destructive plant diseases, which has had tremendous economic impacts on the global forestry and horticultural industries. Unfortunately, very few resources are directed to the control of the disease because the molecular basis for O. ulmi pathogenicity is still not understood (Gagne, et al., 2001; Massoumi Alamouti, et al., 2007; Temple, et al., 2006). Thus, genome sequencing of this plant parasite provides the opportunity for comparative analysis to understand the basis for its pathogenicity and to gain further insights into parasitism in general (Khoshraftar, et al., 2013). To this end, I have reconstructed and analysed the metabolic pathways for the first draft of the genome sequence and annotation of O. ulmi.

3.3.4.1 Overview of metabolism in O. ulmi

Reconstruction of the metabolic network for Ophiostoma ulmi was achieved by integrating several automated datasets together with ortholog mappings to Saccharomyces cerevisiae (Yeast) (Figure 3-19A). In total, 1,378 genes (representing 16% of the genome) map to some enzymatic activity based on EC number annotation. This number aligns well with the Yeast consensus metabolic reconstruction, consisting of 832 genes (representing 13% of its genome) (Herrgard, et al., 2008). O. ulmi shares 79% (615/783) of its enzymes with Yeast (Figure 3-19B).

Figure 3-19 Overlap of enzyme datasets for Ophiostoma ulmi and comparison to Yeast. (A) Total numbers of unique enzymes identified by each dataset are noted in brackets (e.g. 668 enzymes were identified by both BLAST and PRIAM, while 325 enzymes were identified by Yeast orthologs). (B) Overlap of integrated enzyme datasets for O. ulmi and Yeast. The total number of unique enzymes was obtained by integrating four different data sources (also shown in (A)), based on the union of enzymes identified by: (i) EC-annotated Yeast orthologs, (ii) DETECT and (iii) enzymes found in both BLAST and PRIAM.

Mapping O. ulmi enzymes to KEGG metabolic pathways provides a unique perspective on pathways that are conserved amongst fungi (represented by Yeast) but may have been lost in the organism, in addition to highlighting pathways unique to O. ulmi that may play a role in causing disease in its host, the elm tree (represented by A. thaliana) (Figure 3-20). For instance, pathways for fatty acid biosynthesis and the metabolism of D-arginine and D-ornithine have noticeably fewer enzymes in O. ulmi compared to Yeast. Other pathways such as anthocyanin biosynthesis and photosynthesis are plant-specific and, as expected, appear to be missing from the fungal species. Of interest are a number of pathways that have noticeably greater proportions of enzymes in O. ulmi compared to Yeast, including glycosphingolipid biosynthesis (ganglio series), glycan degradation (N-glycans, gangliosides), lipoic acid metabolism, and drug metabolism by enzymes other than cytochrome P450. These activities, particularly those related to glycan degradation, are likely to play an important role in the success and survival of O. ulmi as a phytopathogen.

Figure 3-20 Heatmap showing the conservation of individual metabolic pathways for Ophiostoma ulmi compared to Saccharomyces cerevisiae (Yeast) and Arabidopsis thaliana. Each row indicates an individual metabolic pathway grouped by their superclass membership (defined by KEGG). Coloured tiles indicate the level of conservation (percentage of enzymes detected) of each pathway within each species. Enzyme datasets for Yeast and A. thaliana are based on EC annotations from BioCyc (YeastCyc and AraCyc) combined with those obtained from BRENDA.

In general, core essential pathways such as those related to amino acid, carbohydrate, energy, and nucleotide metabolism are highly conserved across both fungi and plants (Figure 3-20). Interestingly, a recent study by Oliveira et al. (Oliveira, et al., 2012) showed that elm trees inoculated with O. novo-ulmi (a more virulent relative of O. ulmi) had significantly reduced contents of glucose, fructose, starch and sucrose, suggesting that carbohydrate metabolism pathways are important to the pathogenicity of the . Moreover, functional categorization of an EST library for O. novo-ulmi revealed that, of the EST sequences associated with metabolism, the greatest representation was in carbohydrate metabolism (Hintz, et al., 2011). These results suggest that, while metabolic reconstruction predicts that O. ulmi has enzyme complements similar to those of Yeast and A. thaliana for a number of pathways, such as those involved in carbohydrate metabolism, expression profiles are an essential component in assessing the functionality of specific pathways. In addition, the close phylogenetic relationship to O. novo-ulmi might also hint that the pathogenic role of O. ulmi is at least partially a result of decreasing carbohydrates, consequently reducing the efficiency of photosynthesis, and eventually leading to plant senescence.

3.3.4.2 Cell-wall degradation enzymes as potential drug targets

Endopolygalacturonase (ePG) (EC 3.2.1.15), an enzyme involved in cell wall degradation, has been identified in O. ulmi (DETECT prediction: 8095_g). ePG belongs to the polygalacturonase (PG) family of enzymes, which catalyze the hydrolysis of pectin compounds that comprise 30% of the primary cell wall in plants (Juge, 2006). Previous studies have implicated PGs as virulence factors in other phytopathogens, including Botrytis cinerea (ten Have, et al., 1998) and Alternaria citri (Isshiki, et al., 2001), where the enzyme could assist host invasion, tissue destruction and similar processes associated with plant disease. A recent study assessing the role of ePG in O. ulmi, however, suggests that the enzyme functions as a parasitic fitness factor as opposed to a virulence factor, given that targeted disruption of the gene led to a reduction in pectin-degrading activity and not a lethal phenotype (Temple, et al., 2009). Other pectinase enzymes present in O. ulmi, such as pectin methylesterase (EC 3.1.1.11; BLAST and PRIAM prediction: 5432_g) and pectinase (EC 3.2.1.67; BLAST and PRIAM prediction: 4052_g), likely act in concert to contribute to successful invasion of the host. The production of PG enzymes is particularly important for the success and survival of O. ulmi, as it enters the host directly through a pre-existing wound and therefore lacks specialized penetration structures (De Lorenzo and Ferrari, 2002). Moreover, a role for ePG as a minor virulence factor remains possible, and when combined with other virulence candidates, ePG represents a potential target for the control of Dutch elm disease.

3.4 Concluding Remarks

Herein, I have described the development and application of an integrated pipeline for defining metabolism in parasites, with the aim of gaining insight into their metabolic capabilities at various taxonomic levels. My comparative analysis of the reconstructed metabolic pathways for 14 apicomplexan species highlights a spectrum of conservation, from pathways that are well conserved across all lineages, such as carbohydrate metabolism and energy metabolism, to those that are diverse across and within the phylum, as illustrated by auxotrophies in amino acid metabolism. Consistent with these findings, the apicomplexan metabolic network displays a highly conserved but nonetheless flexible core of enzymes together with those that are lineage-specific, suggesting these parasites have evolved different strategies for performing similar core metabolic activities.

Delving more deeply into the metabolism of Eimeria tenella, comparisons against the related species Toxoplasma gondii and Neospora caninum reveal subtle species-specific pathways that may be associated with differences in host range, but importantly confirm the presence of mannitol cycle enzymes that, due to their absence from the host (and other Apicomplexa), represent attractive drug targets for coccidiosis. Genes for lipoic acid metabolism, which were putatively identified based on homology to known counterparts in T. gondii and Neospora, highlight deficiencies in the E. tenella gene model; further curation and improved gene model prediction will enable a more accurate reconstruction, and hence assessment, of E. tenella’s metabolic capabilities.

Comparative analysis of the reconstructed metabolisms for four tapeworm species highlights an extensive reduction in overall metabolic capacity compared with other animals, together with an increased ability to absorb nutrients from the host. Nevertheless, tapeworms are capable of most core metabolic processes, with their main energy source, carbohydrates, being catabolised by two complementary anaerobic pathways (malate dismutation and lactate fermentation). The parasiticidal effects of mitochondrial fumarate reductase inhibitors have been demonstrated in vitro, suggesting that the malate dismutation pathway would be an effective target for the development of novel therapies.

My reconstruction of Ophiostoma ulmi metabolic pathways highlights specific enzymes that may play a role in virulence for the phytopathogen, particularly those involved in glycan degradation. Endopolygalacturonase (ePG), along with other pectinase enzymes involved in cell wall degradation, was identified in the metabolome and likely contributes to the successful invasion of the host. Significantly, these enzymes, when combined with other virulence factors, represent potential targets for the control of Dutch elm disease.

To fully appreciate the themes and variations I have alluded to from my comparative analyses, further comparisons must be carried out with other organisms such as the host, to provide information on host-parasite interactions, or other non-parasitic organisms, to uncover the gain or loss of enzymes and pathways offering clues to the evolutionary histories of these parasites. Importantly, I have shown that the combination of functional genomics and bioinformatics provides a powerful means for metabolic reconstruction, and refinement of these protocols can be aided through complementary biochemical investigations. My preliminary experiments in the model apicomplexan, T. gondii, suggest that enzymes in the pantothenate biosynthesis pathway represent potential drug targets and, importantly, establish the in vivo relevance of these targets. Further experiments, such as inhibition of the enzymes, are necessary to ensure these candidates are amenable to intervention. Additional candidates that are supported through these investigations promise a wealth of biological insights into the function and evolution of apicomplexan metabolism and are expected to lay the foundation for future drug discovery initiatives.

Chapter 4 Reconstructing parasite metabolism: Applications

4.1 Introduction

As established in Chapter 3, the identification of enzyme complements enables the derivation of draft metabolic networks by mapping enzymes to their corresponding pathways, aided by resources such as KEGG. Such reconstructions can be analyzed, as I have done, by applying comparative bioinformatics approaches to interrogate multi-species relationships; these analyses have (i) helped to crystallize our current knowledge of parasite metabolism, (ii) provided insights into the metabolic capabilities of parasites and (iii) prioritized lists of enzyme drug targets for therapeutic intervention. Importantly, however, these reconstructions serve as the foundation for building more detailed genome-scale models through the implementation of constraint-based approaches and incorporation of other data types, with the ultimate goal of achieving a complete understanding of the organism’s metabolism.

Due to the complex, often non-intuitive relationships between enzymes and pathways, such studies require the use of computational tools that provide global views of the organization and operation of the network. Currently, one of the more successful approaches is flux balance analysis (FBA), in which fluxes of metabolites through the network are computed to allow for the identification of key enzymes and pathways that process nutrients imported into the cell into the final metabolites required for growth (Lee, et al., 2006; Raman and Chandra, 2009).
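The constraints that FBA optimizes over can be illustrated on a toy network. The sketch below is not a solver and is not the model described in this thesis; it only checks the steady-state mass balance (S·v = 0) and flux bounds for hand-picked candidate flux vectors and compares their objective values. The network, bounds, candidate names and helper functions are all hypothetical; a real analysis would pass the same matrix, bounds and objective to a linear programming solver.

```python
from typing import Dict, List, Sequence


def mass_balance(S: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Net production of each metabolite (rows of S) for flux vector v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_feasible(S, v, lower, upper, tol: float = 1e-9) -> bool:
    """True if v satisfies steady state (S.v = 0) and the flux bounds."""
    balanced = all(abs(r) <= tol for r in mass_balance(S, v))
    bounded = all(lo - tol <= x <= hi + tol for x, lo, hi in zip(v, lower, upper))
    return balanced and bounded


def objective_value(c: Sequence[float], v: Sequence[float]) -> float:
    """Linear objective c.v (here: flux through the biomass reaction)."""
    return sum(c_j * v_j for c_j, v_j in zip(c, v))


# Toy network. R1: uptake -> A; R2: A -> B; R3: B -> biomass; R4: A -> byproduct
S = [
    [1, -1, 0, -1],  # metabolite A
    [0, 1, -1, 0],   # metabolite B
]
lower = [0, 0, 0, 0]
upper = [10, 10, 10, 10]
objective = [0, 0, 1, 0]  # maximize flux through R3

candidates: Dict[str, List[float]] = {
    "all to biomass": [10, 10, 10, 0],
    "half to byproduct": [10, 5, 5, 5],
    "unbalanced": [10, 10, 5, 0],  # B accumulates, so steady state is violated
}
feasible = {n: v for n, v in candidates.items() if is_feasible(S, v, lower, upper)}
best = max(feasible, key=lambda n: objective_value(objective, feasible[n]))
```

In this toy case the uptake bound of 10 limits biomass flux, so the candidate that routes all flux through R2 and R3 is the best of the feasible ones.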

Biology has also become an increasingly data-rich field, so determining how to organize, sort, interrelate, and contextualize currently available high-throughput datasets is a challenge. As well, it is not uncommon for reconstruction efforts to provide high-quality estimates of cellular parameters such as growth yield, specific fluxes, and reaction reversibility, where such theoretical values are often used for hypothesis building or validation in biological studies (Hiratsuka, et al., 2008; Lee, et al., 2007). By serving as a framework on which other data types can be overlaid, the metabolic reconstruction has served as a powerful tool for contextualizing high-throughput and predicted datasets. For example, overlaying microarray data onto the metabolic network may foster insight into metabolic hotspots or pathways that are significantly altered under certain conditions (Usaite, et al., 2006).

Extensive stage-specific transcriptional data have been acquired for apicomplexan parasites, revealing functions of novel genes and providing insights into complex biological processes of the parasite. dbEST (Boguski, et al., 1993) and EuPathDB (Aurrecoechea, et al., 2007) host the largest collection of expressed sequence tags (ESTs) for the Apicomplexa. Serial analysis of gene expression (SAGE) projects have also been carried out for both P. falciparum and T. gondii (Gunasekera, et al., 2007; Patankar, et al., 2001; Radke, et al., 2005). Microarray expression data are also available for several Plasmodium species (P. falciparum, P. berghei, P. vivax, P. yoelii) and T. gondii (Bozdech, et al., 2003; Bozdech, et al., 2008; Hall, et al., 2005; Kidgell, et al., 2006; Tarun, et al., 2008). At this time, microarray data are missing for Theileria parasites; however, collections of ESTs from different T. annulata life cycle stages have been sequenced (Pain, et al., 2005) and, in the case of T. parva, an alternative but powerful technique called Massively Parallel Signature Sequencing (MPSS) was used to gain insight into parasite gene expression profiles (Bishop, et al., 2005); recently, the transcriptome of B. bovis was analysed to identify virulence determinants (Mesplet, et al., 2011). With the recent advent of ultra-high-throughput sequencing technologies, a handful of RNA-Seq datasets have been generated, with several more underway. Currently, there are published studies for P. falciparum (Bartfai, et al., 2010; Otto, et al., 2010), and for T. gondii and N. caninum (Reid, et al., 2012), examining genome-wide expression patterns using RNA-Seq.

Towards gaining a more complete understanding of apicomplexan metabolism, this data chapter describes (i) the application of FBA to the P. falciparum genome producing a model that integrates thermodynamics, gene expression, and evolutionary data; and (ii) the incorporation of high-throughput gene expression datasets for analyzing the metabolisms of P. falciparum and two coccidians, Toxoplasma and Neospora.

4.2 Materials and Methods

4.2.1 Metabolic reconstruction for iMPMP420

The metabolic reconstruction was based on information obtained from the Malaria Parasite Metabolic Pathways (MPMP) web resource. A detailed description can be found on its website and works by Ginsburg (Ginsburg, 2006; Ginsburg, 2009). Farhan Raja was responsible for producing the model, iMPMP420, including its curation and flux balance analysis. Briefly, this reconstruction was compiled using various literature sources and represents metabolic physiology of intraerythrocytic P. falciparum. All enzymes were checked for associated gene annotations in PlasmoDB, and special care was taken to avoid inferring the existence of entire pathways observed in other unicellular eukaryotic organisms based on the evidence of a few enzymes. The pathways shown in MPMP were represented with KEGG reaction and compound identifiers. Generally, the reactions indicated by MPMP maps were used in the model. However, by systematically representing each reaction displayed in the maps with reactions from the KEGG database, potential annotation errors in MPMP were identified. Network completeness was investigated using Flux Balance Analysis in an iterative process. Gaps/inconsistencies in the network, such as the inability to produce biomass components, were reconciled using additional reactions assigned with KEGG identifiers where possible. Intercompartmental and extracellular transport reactions were added in order to provide the necessary metabolite transport. Enzyme cofactor usage regarding NAD/NADPH was taken from MPMP when available as it was found to match Plasmodium-specific data presented in the BRENDA database.

4.2.2 Correlation of thermodynamics data with gene expression

In order to correlate thermodynamics data with gene expression data, reactions were mapped to EC numbers, which were subsequently mapped to genes. Of the 875 reactions in the model, 733 mapped to an EC number, of which 292 reactions (171 unique EC numbers) (33% of the model) corresponded to thermodynamics data. Microarray data was obtained for the complete transcriptome of the intraerythrocytic developmental cycle (IDC) of three strains of P. falciparum (3D7, HB3, and Dd2) at a one-hour time-scale resolution (Llinas, et al., 2006).

Expression data was log-transformed and mapped to 198 unique EC numbers. From the set of 292 reactions with an EC number and values for minimum and maximum Gibbs free energy, 182 reactions (21% of the model) could be mapped to gene expression data (DeRisi et al. dataset). In total, 114 EC numbers were associated with both expression and thermodynamics data (see Figure 4-3). Expression profiles were hierarchically clustered using uncentered Pearson correlation as a measure of similarity.
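The similarity measure named above can be illustrated with a minimal sketch. It assumes the usual definition of uncentered Pearson correlation, in which the means are taken to be zero so that the value equals the cosine of the angle between two profiles; the EC labels and expression values are hypothetical and are not taken from the datasets used here.

```python
import math


def uncentered_pearson(x, y):
    """Uncentered Pearson correlation of two equal-length profiles.

    Unlike the ordinary Pearson coefficient, the profile means are not
    subtracted, so the result is the cosine of the angle between x and y.
    """
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for an all-zero profile")
    return dot / (norm_x * norm_y)


# Hypothetical log-transformed profiles over four time points.
profiles = {
    "EC_a": [0.5, 1.0, 1.5, 1.0],
    "EC_b": [1.0, 2.0, 3.0, 2.0],      # same shape as EC_a, twice the amplitude
    "EC_c": [-0.5, -1.0, -1.5, -1.0],  # mirror image of EC_a
}
similarity_ab = uncentered_pearson(profiles["EC_a"], profiles["EC_b"])
similarity_ac = uncentered_pearson(profiles["EC_a"], profiles["EC_c"])
```

Because the measure ignores amplitude but not sign, profiles that rise and fall together score close to 1 and mirror-image profiles score close to -1.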

4.2.3 Evolutionary analysis

Raw sequencing reads (obtained from the Sanger Institute website, www.sanger.ac.uk) were aligned to the P. falciparum 3D7 genomic sequence (PlasmoDB v9.2) using BWA (Li and Durbin, 2009). SAMtools (Li, et al., 2009) was applied to the resulting alignments to generate variant files, which were annotated and filtered using VCFtools (Danecek, et al., 2011). High-quality SNPs were called based on the following criteria: (i) quality score of at least 30, (ii) maximum depth of 2*average depth at variant site, (iii) minimum depth of max{5, 1/10*(maximum depth)}, and (iv) no variants within three base pairs of each other. Consensus sequences for each assembly were obtained based on top-scoring BLASTN hits (E-value cutoff of 1e-50) to the CDS sequences for 3D7. A custom Perl script was written to produce CDS versions of the assembled genes and to map each sequence to the corresponding 3D7 gene id. MUSCLE (version 3.8.31) (Edgar, 2004) was used to generate multiple sequence alignments of each gene as input into the Datamonkey server (Delport, et al., 2010) for dN and dS calculation. The FUBAR module of the HyPhy package was run via datamonkey.org to estimate dN and dS values at every codon for each gene alignment. Sites were deemed to be under positive or negative selection based on a posterior probability cutoff of 0.9. To avoid infinite values in instances where dS = 0, ω values were calculated with dS set to 1e-10. All statistics were performed in R (v 2.14.2) (http://www.R-project.org). Mann-Whitney tests were performed for each group of enzymes (based on MPMP) to examine whether the dN/dS distribution of the genes in that group differed from the dN/dS distribution of all other groups of enzymes.
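The four SNP criteria listed above can be read as a simple filter. The following is a minimal sketch and not the VCFtools commands used in this work. It assumes that "average depth" is the mean depth over all variant sites in the input, that the proximity rule is applied to the unfiltered variant list, and that "within three base pairs" means a position difference of at most 3. The function name, dictionary keys and example variants are hypothetical.

```python
def filter_snps(variants, min_quality=30, proximity=3):
    """Filter variants (dicts with 'pos', 'qual', 'depth') from one chromosome."""
    if not variants:
        return []
    mean_depth = sum(v["depth"] for v in variants) / len(variants)
    max_depth = 2 * mean_depth          # criterion (ii)
    min_depth = max(5, max_depth / 10)  # criterion (iii)

    ordered = sorted(variants, key=lambda v: v["pos"])
    kept = []
    for i, v in enumerate(ordered):
        if v["qual"] < min_quality:     # criterion (i)
            continue
        if not (min_depth <= v["depth"] <= max_depth):
            continue
        # Criterion (iv): drop variants with a neighbour within `proximity` bp.
        left_close = i > 0 and v["pos"] - ordered[i - 1]["pos"] <= proximity
        right_close = (i + 1 < len(ordered)
                       and ordered[i + 1]["pos"] - v["pos"] <= proximity)
        if left_close or right_close:
            continue
        kept.append(v)
    return kept


# Hypothetical variants, for illustration only.
example_variants = [
    {"pos": 100, "qual": 50, "depth": 20},
    {"pos": 102, "qual": 60, "depth": 22},   # within 3 bp of position 100
    {"pos": 500, "qual": 20, "depth": 21},   # quality below 30
    {"pos": 900, "qual": 45, "depth": 19},
    {"pos": 1500, "qual": 40, "depth": 3},   # depth below the minimum
]
high_quality = filter_snps(example_variants)
```

In this toy input the mean depth is 17, so the accepted depth window is 5 to 34 and only the variant at position 900 passes all four criteria.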

4.2.4 Expression analysis

Raw expression values obtained from NCBI GEO for the Bahl et al. dataset (Bahl, et al., 2010) and from the DeRisi lab website for the Bozdech et al. dataset (Bozdech, et al., 2003) were RMA normalized. RNA-Seq data from Reid et al. (Reid, et al., 2012) and Bártfai et al. (Bartfai, et al., 2010) were provided as RPKM values and transformed logarithmically to the range [0,16] using the MayDay SeaSight program (Battke and Nieselt, 2011). Both datasets were quantile normalized to reconcile the different distributions of the data. In Cluster3, data was adjusted to a [-3,3] scale by centering arrays (mean) and hierarchically clustered (centroid uncentered correlation). Expression for a given enzyme family (EC number) was calculated by taking the mean expression for all genes mapping to that particular enzyme family. Hierarchical clustering was applied to the dataset using Pearson correlation as the distance metric.
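The per-enzyme-family averaging described above can be sketched as follows. This is an illustrative sketch only: the gene identifiers, EC labels and expression values are hypothetical, and a gene annotated with several EC numbers is assumed to contribute to each of them.

```python
from collections import defaultdict
from statistics import mean


def ec_expression(gene_expression, gene_to_ec):
    """Average the profiles of all genes mapped to each EC number."""
    grouped = defaultdict(list)
    for gene, profile in gene_expression.items():
        for ec in gene_to_ec.get(gene, ()):
            grouped[ec].append(profile)
    # zip(*profiles) walks the grouped profiles one time point at a time.
    return {ec: [mean(values) for values in zip(*profiles)]
            for ec, profiles in grouped.items()}


# Hypothetical normalized expression values over three time points.
gene_expression = {
    "geneA": [2.0, 4.0, 6.0],
    "geneB": [4.0, 6.0, 8.0],
    "geneC": [1.0, 1.0, 1.0],
}
gene_to_ec = {"geneA": ["EC_x"], "geneB": ["EC_x"], "geneC": ["EC_y"]}
ec_profiles = ec_expression(gene_expression, gene_to_ec)
```

Here the two genes assigned to EC_x are averaged point by point, while EC_y simply inherits the profile of its single gene.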

4.3 Results and Discussion

4.3.1 Reconstruction of iMPMP420 and comparison to other models

We generated a metabolic reconstruction consisting of the chemical reactions that transport and interconvert metabolites within P. falciparum 3D7. This network reconstruction, termed iMPMP420 (Raja, 2010), is based on MPMP (Ginsburg, 2006), the most reliable source of metabolic information for P. falciparum (Ginsburg, 2009). A comparison of the general properties of iMPMP420 with those of two previously published P. falciparum models is outlined in Table 4-1. The major improvements of iMPMP420 over previous P. falciparum network reconstructions fall under the following categories: (i) increased reconstruction accuracy and (ii) incorporation of reaction thermodynamics and dN/dS data for a more stringent model.

Table 4-1 Properties of iMPMP420, Huthmacher et al., and Plata et al. models.

Property                  iMPMP420 (this study)    Huthmacher et al., 2010    Plata et al., 2010
Genes                     420  579  366
Compartments              5  6  4
Reactions                 510  998  1001
Cytoplasm                 395  503
Mitochondria              43  49
Apicoplast                45  105
Food vacuole              9
Endoplasmic reticulum     18
Transport reactions       82  377  233
Exchange reactions        117  111
Metabolites               459  1622  616
Cytoplasmic               335  537
Mitochondrial             83
Apicoplast                135
Extracellular             60  159

Since iMPMP420 is based on the manually curated database MPMP, only enzymes that belong to a validated pathway or have biochemical information are included in the model. Previous models include predictions from less-curated resources such as BioCyc and KEGG, which are not necessarily relevant since many of these predictions are simply homology-based without additional evidence (Hung and Parkinson, 2011). Flux balance analysis (FBA), which has been applied to iMPMP420 and models by Huthmacher et al. and Plata et al., is a mathematical approach for analyzing the flow of metabolites through a metabolic network where the number of alternative optimal solutions scales exponentially with the size of the network; consequently, the erroneous inclusion of additional reactions limits the applicability and usefulness of the model (Mahadevan and Schilling, 2003).

To further improve the model, constraints based on the laws of thermodynamics can be applied to determine feasible ranges for the rates of biochemical reactions and the concentrations of metabolites (Boghigian, et al., 2010; Henry, et al., 2007; Soh and Hatzimanikatis, 2010). Importantly, thermodynamics data helps to improve the curation of the model by providing information about reaction reversibility that might otherwise be represented incorrectly based on what is stated in the primary literature or on assignments made through general heuristic rules (Henry, et al., 2007). For genome-scale models like iMPMP420, thermodynamics data is scarce and available only for a small fraction of reactions; fortunately, the Group Contribution method (Jankowski, et al., 2008), a technique for estimating thermodynamic and other properties from molecular structures, provides a means of estimating the Gibbs free energy change of nearly every reaction. Without thermodynamic analysis, the range of feasible Gibbs energy of reactions tends to be larger, resulting in less precise classification of reactions (Kummel, et al., 2006). Thus, incorporation of this data into iMPMP420 increases the stringency of the model and enables more accurate in vivo predictions.

In addition to improving the curation of the model, I am interested in examining how the thermodynamics of reactions may have impacted the organization of the parasite’s metabolic network. Previous studies have examined the effect of thermodynamics from a regulation perspective. For example, in a recent study analysing the genome-scale metabolic model for Geobacter sulfurreducens, thermodynamic flux analysis identified reaction candidates for transcriptional regulation (Garg, et al., 2010). A comparison of free energy ranges with corresponding gene expression fold changes confirmed that the model predictions are consistent with experimental data. In another study, thermodynamics analysis of metabolome data enabled the calculation of feasible flux distributions, leading to systems-level insights into the compartmentalized flux physiology of Saccharomyces cerevisiae (Jol, et al., 2012). Here I show that the incorporation of thermodynamics data into iMPMP420 not only clarifies the picture of how pathways operate in P. falciparum, but also highlights specific reactions representing candidates for drug targeting.

4.3.2 Thermodynamics variability analysis of P. falciparum metabolism

TVA (also known as thermodynamics-based flux analysis) provides a framework for a genome-scale thermodynamics analysis of metabolism by enforcing constraints related to the thermodynamic feasibility of reactions in the network in addition to the mass balance constraints typically used in metabolic flux analysis (Henry, et al., 2007). By maximizing and minimizing the standard free energy change of each reaction in the metabolic network, TVA estimates the range of free energy change for that reaction under optimal conditions. Here, I have used TVA to examine how thermodynamics has influenced the metabolism of P. falciparum.

In order to establish the thermodynamic constraints required for TVA, the ∆G° (standard free energy) of the reactions in the model must be estimated or measured experimentally. E. coli, having one of the most well-characterized genome-scale metabolic models, has experimental data for only a small fraction of its reactions (Henry, et al., 2006); fortunately, for most organisms, including P. falciparum, thermodynamic properties can be estimated using the group contribution method (Jankowski, et al., 2008). The group contribution method was applied (Mahadevan Lab, University of Toronto) to calculate the ∆G° for 159 reactions (or 300 reactions if forward and reverse reactions are considered distinct) in iMPMP420. Reactions were classified either as (i) “thermodynamic bottlenecks”, which are reactions defined to be operating at equilibrium (maximum and minimum ∆G near zero) and expected to be largely influenced by perturbations in the concentration of the reactants or products, or (ii) “candidates for regulation”, which represent reactions where the range of free energy change is completely negative (∆Gmin, ∆Gmax < 0) or completely positive (∆Gmin, ∆Gmax > 0), and which are most likely insensitive to perturbations in metabolite concentrations and therefore subject to regulation. The latter group of reactions is particularly important since these have the potential to act as regulatory control points for their respective pathways (Kummel, et al., 2006). For many biosynthetic pathways, regulation occurs at the first reaction step of a linear pathway (Umbarger, 1978); indeed, out of the 14 reactions that were identified in iMPMP420 to be candidates for regulation, nearly half catalyze the first reaction in the linear portion of their respective biosynthetic pathways (Table 4-2).
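The two reaction classes defined above can be expressed as a rule on the feasible free energy range. The sketch below is illustrative rather than the procedure used to build Table 4-2. The `tolerance` parameter is a hypothetical stand-in for "near zero", since no numerical cutoff is stated, and the "unclassified" label for ranges that span zero more widely is likewise an assumption.

```python
def classify_reaction(dg_min, dg_max, tolerance=1.0):
    """Classify a reaction from its feasible free energy range.

    `tolerance` is a hypothetical cutoff for "near zero"; the threshold
    actually applied is not given in the text.
    """
    if dg_min > dg_max:
        raise ValueError("dg_min must not exceed dg_max")
    if abs(dg_min) <= tolerance and abs(dg_max) <= tolerance:
        return "thermodynamic bottleneck"   # both bounds near zero
    if dg_max < 0 or dg_min > 0:
        return "candidate for regulation"   # range entirely negative or positive
    return "unclassified"                   # range spans zero more widely


# Ranges for R02055r and R02251r are taken from Table 4-2;
# the last two ranges are hypothetical.
examples = {
    "R02055r": classify_reaction(-156, -44),
    "R02251r": classify_reaction(671, 832),
    "near_equilibrium": classify_reaction(-0.4, 0.6),
    "wide_range": classify_reaction(-50, 80),
}
```

Checking the bottleneck condition first means that a range lying entirely on one side of zero, but within the tolerance, is treated as near equilibrium; this ordering is a design choice of the sketch.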

Table 4-2 Maximum and minimum free energy change values for regulation candidates and their associated pathway(s).

Reaction   EC          ∆Gmin   ∆Gmax   Pathway
R02251     2.3.1.20    -832    -671    PE / PS metabolism, Utilization of phospholipids
R00851r    2.3.1.15    -504    -343    Fatty acid synthesis, Utilization of phospholipids
R01800r    2.7.8.8     -307    -152    PE / PS metabolism
R03416r    3.1.1.5     -322    -171    PC metabolism, PE / PS metabolism, Utilization of phospholipids
R02239r    3.1.3.4     -280    -130    Utilization of phospholipids
R00329r    3.6.1.6     -649    -490    Purine metabolism
R02055r    4.1.1.65    -156    -44     PE / PS metabolism
Rxxps1     6.2.1.3     -276    -120    PE / PS metabolism, Utilization of phospholipids, Sphingomyelin and ceramide metabolism
R02251r    2.3.1.20    671     832     PE / PS metabolism, Utilization of phospholipids
R02241r    2.3.1.51    671     832     Fatty acid synthesis, PE / PS metabolism, Utilization of phospholipids
R02240     2.7.1.107   130     280     Utilization of phospholipids
R00330     2.7.4.6     490     649     Dolichol metabolism
R01801     2.7.8.5     152     307     PE / PS metabolism
R01315     3.1.1.4     66      221     PC metabolism, PE / PS metabolism, Utilization of phospholipids
R02053     3.1.1.4     117     267     PC metabolism, PE / PS metabolism, Utilization of phospholipids
R03435     3.1.4.11    171     322     Inositol phosphate metabolism
Rxxps1r    6.2.1.3     84      313     PE / PS metabolism, Utilization of phospholipids, Sphingomyelin and ceramide metabolism

PC = phosphatidylcholine; PE = phosphatidylethanolamine; PS = phosphatidylserine

4.3.2.1 Candidates for regulation in phospholipid biosynthesis pathways

Interestingly, the majority of reactions that represent candidates for regulation are involved in pathways for phospholipid biosynthesis (7 out of 14) (Table 4-2), a metabolic process that is absent from normal mature human erythrocytes (Van Deenen and de Gier, 1975). For example, phosphatidylserine decarboxylase (PSD; EC 4.1.1.65) plays a unique role in the biosynthesis of phosphatidylethanolamine (PE), one of the major phospholipids found in malaria parasites. During the stage of asexual intraerythrocytic proliferation, the production of parasite membranes is critical and is accomplished through massive de novo synthesis of phospholipids, mostly in the form of PE and phosphatidylcholine (PC) (Vial and Ancelin, 1992). P. falciparum has two routes via which PE can be produced, namely (i) the serine decarboxylase-CDP-ethanolamine pathway, in which PE is synthesized directly from ethanolamine, or (ii) the phosphatidylserine (PS) decarboxylation pathway, in which PSD acts to convert PS into PE. As the PE pool derived from de novo biosynthesized PS can be rapidly methylated to PC, this pathway would represent an important route of PC synthesis from the serine precursor (Elabbadi, et al., 1997). The production of PC can also occur through the de novo biosynthesis pathway beginning with the action of choline kinase (CK), which, interestingly, was recently found to be potently inhibited by several chemical compounds (Crowther, et al., 2011). Moreover, the PfPSD gene product is known to localize exclusively to the endoplasmic reticulum, unlike in other organisms, where PSD is found in the inner mitochondrial membrane space (Baunaure, et al., 2004). While other enzymes required for the highly up-regulated PE and PS biosynthesis pathways were not among the 14 reactions operating far from equilibrium, thermodynamics does reiterate the importance of such enzymes to malaria parasite replication and highlights the possibility of their future development as drug targets.

4.3.2.2 Correlation of thermodynamics data with gene expression – an unresponsive genome

Previous studies have demonstrated that reactions operating far from equilibrium are capable of imposing flux control and are more likely to be regulated by the cell (Wang, et al., 2004). A recent study by Garg et al. implemented TVA using the metabolic network of Geobacter sulfurreducens, resulting in the identification of reactions that are subject to regulatory control consistent with experimental data (Garg, et al., 2010). To determine whether such “flux control points” exist in malaria parasites, I examined expression profiles across 48 individual 1-h time points from the blood stage of P. falciparum (DeRisi lab) in the context of thermodynamics data for reactions in iMPMP420. This dataset provides a detailed and representative perspective of the changes in gene transcript levels that occur during the complete asexual intraerythrocytic developmental cycle (IDC) of the parasite. During this particular lifecycle stage of P. falciparum, which is responsible for all the clinical malaria symptoms and complications, the human red blood cell provides a relatively stable environment compared with environments where free-living unicellular organisms reside.

In evaluating the range of gene expression fold changes for regulation candidates and bottleneck reactions, genes corresponding to predicted regulatory candidates appear to have a broader range of fold change than the thermodynamic bottleneck reactions, which have the narrowest range of gene expression (Figure 4-1); however, a two-sample t-test assuming equal variance gives a p-value of 0.977, suggesting no statistical difference between the average fold change expression of the two types of reactions. Furthermore, hierarchical clustering of the expression profiles for reactions in iMPMP420 reveals a continuous cascade of transcription for the majority of genes, consistent with the idea that genes follow a “just-in-time” manufacturing process whereby induction occurs only once, when it is required (Bozdech, et al., 2003). Not surprisingly, expression peaks during the metabolically active trophozoite stage for the majority of these genes, with only a small number of genes, such as the house-keeping enzyme glucose-6-phosphate dehydrogenase (EC 1.1.1.49), displaying a relatively constant profile (Figure 4-2). In model organisms such as E. coli and yeast, reactions operating as thermodynamic bottlenecks tend to have limited potential for regulation as these reactions are very sensitive to minor perturbations in metabolite concentrations (Henry, et al., 2007). Our analyses, however, support a fundamentally different mode of gene expression in malaria parasites, whereby metabolism appears to be “hard-wired” and non-responsive to specific metabolic perturbations (Bozdech, et al., 2003; Deitsch, et al., 2007) (Figure 4-3). While some studies have detected changes in transcription upon various challenges such as drug pressure and host body temperature, the changes observed were of low amplitude and could be explained by cell cycle arrest or sexual differentiation (Hu, et al., 2010; Natalang, et al., 2008; Oakley, et al., 2007; Tamez, et al., 2008). For instance, during continual drug treatment with a lethal antifolate for 24 h, malaria parasites showed very little deviation in their transcriptome, even as the parasites died from treatment (Ganesan, et al., 2008). These observations are consistent with studies suggesting that P. falciparum may have a limited capacity to mount directed transcriptional responses, based on a marked paucity of specific transcription factors in the genome (Coulson, et al., 2004), and with experiments that were unable to detect significant alterations in transcript levels following environmental perturbations (Ganesan, et al., 2008; Gunasekera, et al., 2007; Le Roch, et al., 2008; Young, et al., 2008). Thus, while certain highly regulated enzymes such as PSD operate far from equilibrium and a small number of constitutively expressed enzymes such as glucose-6-phosphate dehydrogenase operate close to equilibrium, the thermodynamics of reactions for P. falciparum in general is not affected by transcriptional regulation, further supporting the model of a constrained program that largely pre-determines the expression levels of metabolic genes in malaria parasites.
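The comparison of fold changes between the two reaction classes can be sketched as below. This is not the analysis script behind Figure 4-1. It assumes expression values on a linear, strictly positive scale, takes fold change as the ratio of maximum to minimum expression over the time course, and computes only the pooled-variance t statistic and its degrees of freedom. All input values are hypothetical.

```python
import math
from statistics import mean, variance


def fold_change(profile):
    """Ratio of maximum to minimum expression over the time course."""
    if min(profile) <= 0:
        raise ValueError("fold change needs strictly positive expression values")
    return max(profile) / min(profile)


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic assuming equal variances.

    Returns the statistic and its degrees of freedom (n_a + n_b - 2).
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical fold changes for the two reaction classes.
regulation_candidates = [2.0, 3.5, 5.0]
bottlenecks = [2.5, 3.0, 4.5]
t_value, dof = pooled_t_statistic(regulation_candidates, bottlenecks)
```

Converting the statistic to a p-value requires the t distribution, which the Python standard library does not provide; a statistics package would normally be used for that step.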

Figure 4-1 Boxplots showing distribution of fold change expression for bottleneck reactions and regulation candidate reactions. Fold change expression for each reaction was calculated as the ratio of maximum to minimum gene expression over the entire IDC time course.

Figure 4-2 The expression profiles of enzyme-encoding genes reveal a continuous cascade of expression consistent with the idea that P. falciparum has evolved a highly specialized mode of coordinated regulation that resembles a “just-in-time” manufacturing process whereby induction of any given gene occurs once per cycle and only at a time when it is required. In particular, our subset of enzyme-encoding genes appears to be expressed mostly during the highly metabolic trophozoite stage, and as expected does not include expression profiles typical of invasion proteins which are expressed during the early ring stage or virulence proteins which are expressed in the later schizont stage. Genes (with at least 80% data) were hierarchically clustered based on centered correlation and using average linkage of log2 transformed expression data for 3D7. Red represents higher expression and green indicates lower expression in the experiment.

Figure 4-3 ∆G°, expression, and dN/dS for reactions in iMPMP420 ordered by increasing ∆Gmin.

4.3.3 Metabolic enzymes as drug targets

In order to investigate the potential of metabolic enzymes as drug targets, single enzyme deletions were performed in silico to identify those required for parasitic growth (Raja, 2010). FBA of iMPMP420 predicts 162 of the 325 (50%) enzymes to be required for production of biomass components essential for growth (Supplementary Table S9). This relatively high proportion of enzymes further illustrates the simplified nature of P. falciparum metabolism offering few alternate routes for the production of growth metabolites. In particular, candidate targets were associated with many key metabolic pathways known to be highly up-regulated in the blood stages of the malaria parasite such as the TCA cycle, glycolysis, the pentose phosphate pathway, and purine and pyrimidine metabolism.
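The logic of an in silico single-enzyme deletion screen can be illustrated on a toy network. The sketch below deliberately swaps FBA for a much simpler reachability (network expansion) test, so it ignores stoichiometry, flux bounds and the linear optimization that FBA performs; it only asks whether a target metabolite can still be produced from the seed nutrients after one enzyme is removed. The network, enzyme names and metabolites are hypothetical.

```python
def producible(reactions, seeds):
    """Grow the seed set with the products of every reaction whose
    substrates are all available, until nothing new can be made."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for _enzyme, substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def essential_enzymes(reactions, seeds, targets):
    """Enzymes whose removal leaves at least one target unreachable."""
    enzymes = sorted({enzyme for enzyme, _, _ in reactions})
    essential = []
    for knocked_out in enzymes:
        remaining = [r for r in reactions if r[0] != knocked_out]
        if not targets <= producible(remaining, seeds):
            essential.append(knocked_out)
    return essential


# Hypothetical toy network: (enzyme, substrates, products).
toy_reactions = [
    ("E1", {"glucose"}, {"A"}),
    ("E2", {"A"}, {"B"}),
    ("E3", {"A"}, {"B"}),        # alternate route to B
    ("E4", {"B"}, {"biomass"}),
]
toy_essential = essential_enzymes(toy_reactions, {"glucose"}, {"biomass"})
```

In the toy network E2 and E3 back each other up, so only E1 and E4 are flagged; a network with few such alternate routes yields a high proportion of essential enzymes.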

To evaluate the ability of the model to predict drug targets, I compiled a list of known enzyme drug targets from MPMP together with lists generated by previous groups (Fatumo, et al., 2009; Huthmacher, et al., 2010; Plata, et al., 2010; Yeh, et al., 2004). I have also included four additional enzymes that were identified from two recent high-throughput biochemical screens as potential drug targets based on evidence of essentiality, druggability, and selectivity (Crowther, et al., 2011; Preuss, et al.). Importantly, while other groups have typically included gold standard enzymes that are not specific to the malaria parasite (e.g. enzymes with evidence as anti-cancer agents or enzymes inhibited in P. berghei only), I have employed a strict yet comprehensive gold standard dataset encompassing all enzymes known to be essential in P. falciparum. Applying this criterion and combining all five datasets produces a list of 86 gold standard enzymes (Table 4-3). Within this gold standard set, 9 enzymes are currently verified drug targets (Table 4-4). Comparing the ability of iMPMP420 to predict essential enzymes with that of other P. falciparum models indicates a performance similar to that achieved by Huthmacher et al. and a clear increase in sensitivity over the predictions made by Plata et al.; importantly, iMPMP420 predicts the greatest number of currently verified antimalarial targets (6 of 9) (Table 4-4 and Table 4-5). Interestingly, of the 30 enzymes that were identified to be essential by all three models, the largest share (11/30 or 37%) participate in purine and/or pyrimidine metabolism, including 8 proposed targets from the literature (Figure 4-4). Currently, no antimalarials are available against either of these pathways, making this overlapping dataset an ideal set of high-confidence candidates for antimalarial drug development.

Table 4-3 List of gold standard enzymes compiled for P. falciparum.

1.1.1.205    3.4.21.62    3.1.3.56     2.3.1.37
1.1.1.267    3.4.22.-     3.1.4.17     2.5.1.1
1.1.1.8      3.4.22.1     3.3.1.1      2.5.1.10
1.10.2.2     3.4.23.-     4.1.1.23     2.5.1.58
1.3.3.1      3.4.23.38    4.1.1.50     2.7.1.33
1.5.1.3      3.4.23.39    4.1.2.13     2.7.7.6
1.9.3.1      3.4.24.-     4.2.1.24     2.7.7.7
2.3.1.24     3.5.4.4      4.2.3.5      3.1.1.5
2.3.1.50     4.1.1.17     4.4.1.5      3.4.14.1
2.4.1.80     6.3.4.4      6.3.2.2      3.5.1.98
2.4.2.1      6.4.1.2      6.3.5.5      3.5.2.3
2.4.2.8      1.17.4.1     5.99.1.2     3.6.1.23
2.5.1.15     1.3.99.1     1.8.1.9      4.2.1.60
2.5.1.19     1.6.5.3      1.3.1.9      4.3.2.2
2.6.1.85     2.1.1.45     4.2.1.1      4.6.1.12
2.7.8.3      2.1.1.64     1.1.1.27     5.99.1.3
3.1.4.12     2.3.1.15     1.15.1.1     6.3.5.2
3.4.11.1     2.3.1.41     1.3.3.4      1.1.1.49
3.4.11.18    2.5.1.16     1.3.98.1     2.3.1.97
3.4.11.2     2.5.1.18     1.8.1.7      2.7.4.8
3.4.11.21    2.5.1.21     2.1.1.103    3.1.1.31
3.4.11.9     2.7.1.32

Table 4-4 List of experimentally verified enzyme drug targets in P. falciparum.

EC          Enzyme                                             Drug                             iMPMP420   Huthmacher et al.   Plata et al.   Reference
1.1.1.205   inosine-5-monophosphate dehydrogenase              Bredinin                         Yes        No                  No             Webster and Whaun, 1982
1.1.1.267   1-deoxy-D-xylulose 5-phosphate reductoisomerase    Fosmidomycin                     Yes        Yes                 Yes            Lell et al., 2003
1.3.3.1     dihydroorotate dehydrogenase                       Triazolopyrimidine compound 38   Yes        Yes                 Yes            Coteron et al., 2011
1.3.99.1    flavoprotein subunit of succinate dehydrogenase    Plumbagin                        No         No                  No             Suraveratum et al., 2000
1.5.1.3     dihydrofolate reductase                            Pyrimethamine                    Yes        No                  Yes            Sixsmith et al., 1984
2.1.1.45    thymidylate synthase                               Pyrimethamine                    Yes        Yes                 Yes            Sixsmith et al., 1984
2.5.1.15    dihydropteroate synthase                           Sulfadoxine                      Yes        No                  Yes            Triglia and Cowman, 1994
3.5.4.4     adenosine deaminase                                Deoxycoformycin                  No         No                  No             Tyler et al., 2007
5.99.1.2    topoisomerase I                                    Camptothecin                     No         Yes                 No             Bodley et al., 1998

Table 4-5 Evaluation of iMPMP420 for predicting essential enzymes compared to previous P. falciparum FBA models.

Study               Predicted essential enzymes   Verified drug targets   TP   FN   FP    TN    Sensitivity (TP/(TP+FN))
iMPMP420            162                           6                       41   45   121   118   41/86 = 48%
Huthmacher et al.   135                           4                       39   47   96    ±     39/86 = 45%
Plata et al.        62                            5                       22   64   40    ±     22/86 = 26%

TP = Gold standard enzyme that is also predicted to be essential by the model
FN = Gold standard enzyme that is not predicted to be essential by the model
FP = Reaction predicted by the model to be essential but not in gold standard set
TN = Reaction predicted by the model to be non-essential and not in the gold standard set
± Unknown since models did not specify total number of distinct enzymes based on EC number classification
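The sensitivity values in Table 4-5 follow directly from the TP and FN counts, and the number of predicted essential enzymes equals TP + FP. A minimal sketch that reproduces this arithmetic from the tabulated counts (the variable and function names are illustrative only):

```python
def sensitivity(tp, fn):
    """Fraction of gold standard enzymes recovered: TP / (TP + FN)."""
    return tp / (tp + fn)


# Counts copied from Table 4-5.
table_4_5 = {
    "iMPMP420": {"TP": 41, "FN": 45, "FP": 121},
    "Huthmacher et al.": {"TP": 39, "FN": 47, "FP": 96},
    "Plata et al.": {"TP": 22, "FN": 64, "FP": 40},
}

summary = {}
for model, counts in table_4_5.items():
    summary[model] = {
        "predicted_essential": counts["TP"] + counts["FP"],
        "gold_standard": counts["TP"] + counts["FN"],
        "sensitivity_percent": round(100 * sensitivity(counts["TP"], counts["FN"])),
    }

for model, row in summary.items():
    print(model, row)
```

Each model is scored against the same 86 gold standard enzymes, so the differences in sensitivity reflect the TP counts alone.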

Figure 4-4 Overlap of gold standard enzymes with enzymes predicted to be essential by iMPMP420, Huthmacher et al., and Plata et al. The total number of enzymes identified by each dataset is noted in brackets (e.g. 162 enzymes were predicted to be essential by iMPMP420). Gold standard enzymes have been compiled from lists generated by Yeh et al., Fatumo et al., Huthmacher et al., and Plata et al., in union with those annotated as validated drug targets by MPMP.

Furthermore, 15 of the 30 enzymes identified by all three models have not yet been reported in the literature as potential drug targets for P. falciparum (Table 4-6). To investigate the suitability of these candidates for further investigation, I examined the similarity of these enzymes to human proteins and their expression based on recently generated RNA-Seq datasets (PlasmoDB). These 15 enzymes are encoded as single-copy genes and are all considered to be expressed at some point during the life cycle of the malaria parasite. Sequence similarity searches revealed four candidates having relatively low sequence similarity to human transcripts (20-40%) and another four candidates (MAL13P1.186, PFE0150c, PFA0340w, and PF07_0018) having no significant similarity to any human protein (E-value > 10^-2). More interestingly, I find that MAL13P1.186, PFE0150c, and PFA0340w are predicted to encode enzymes that are all involved in isoprenoid metabolism, which is one of several pathways in P. falciparum that localize to the apicoplast, a defining chloroplast-like organelle of apicomplexan parasites. Furthermore, isoprenoid biosynthesis, while being essential in most organisms, has two possible routes, the mevalonate and nonmevalonate pathways; P. falciparum, like many other pathogenic organisms including E. coli and M. tuberculosis, relies on the nonmevalonate route, while humans do not. Given the suitability of this pathway for drug targeting, the enzymes involved in the nonmevalonate route for isoprenoid biosynthesis have received significant attention, including the first enzyme of the pathway, 1-deoxy-D-xylulose-5-phosphate reductoisomerase (MAL13P1.186), which is currently the focus of inhibition studies (Jackson and Dowd, 2012). Moreover, the second enzyme of the pathway, 1-deoxy-D-xylulose 5-phosphate (DOXP), was one of the enzymes captured by all three models; DOXP, however, has already been explored as a potential drug target in P. falciparum, with the drug fosmidomycin exhibiting antimalarial activity in vitro and in a rodent malarial model (Wiesner and Jomaa, 2007). The fourth candidate exhibiting no significant similarity to human proteins is PF07_0018, a Plasmodium membrane protein of unknown function that is annotated to the CoA biosynthesis pathway. While CoA is a required cofactor involved in many downstream biosynthetic pathways, targeting membrane proteins is known to be particularly challenging. Nonetheless, FBA knockouts, when combined with our current knowledge of P. falciparum metabolism, strongly support the suitability of enzymes involved in isoprenoid biosynthesis for further investigation as targets against malaria.

Table 4-6 Essential enzymes identified by all three models (iMPMP420, Huthmacher et al., and Plata et al.) that have not been reported in the literature.

EC         Gene ID       Gene Product                                                         Pathway                                                       RNASeq   Human Similarity
2.2.1.7    MAL13P1.186   1-deoxy-D-xylulose 5-phosphate synthase                              Isoprenoid metabolism±                                        Yes      No*
2.7.1.148  PFE0150c      4-diphosphocytidyl-2c-methyl-D-erythritol kinase, putative           Isoprenoid metabolism±                                        Yes      No*
2.7.7.60   PFA0340w      2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase, putative   Isoprenoid metabolism±                                        Yes      No*
4.2.1.11   PF10_0155     enolase                                                              Glycolysis                                                    Yes      Yes
2.7.2.3    PFI1105w      phosphoglycerate kinase                                              Glycolysis                                                    Yes      Yes
2.5.1.6    PFI1090w      S-adenosylmethionine synthetase                                      Methionine and polyamine metabolism                           ?        Yes
2.7.1.24   PF14_0415     dephospho-CoA kinase, putative                                       CoA biosynthesis±                                             Yes      Yes
2.7.7.3    PF07_0018     conserved Plasmodium membrane protein, unknown function              CoA biosynthesis                                              Yes      No*
4.1.1.36   MAL8P1.81     flavoprotein, putative                                               CoA biosynthesis                                              Yes      Yes
1.3.3.3    PF11_0436     coproporphyrinogen III oxidase                                       Porphyrin metabolism                                          Yes      Yes
4.1.1.37   PFF0360w      uroporphyrinogen III decarboxylase                                   Porphyrin metabolism±                                         Yes      Low**
4.99.1.1   MAL13P1.326   ferrochelatase                                                       Porphyrin metabolism£                                         Yes      Low**
2.7.4.9    PFL2465c      thymidylate kinase                                                   Pyrimidine metabolism                                         Yes      Yes
2.1.3.2    MAL13P1.221   aspartate carbamoyltransferase                                       Pyrimidine metabolism, Asparagine and aspartate metabolism±   ?        Low**
2.4.2.10   PFE0630c      orotate phosphoribosyltransferase                                    Pyrimidine metabolism                                         Yes      Low**

*No significant sequence similarity (E-value > 10e-2)
**Low sequence identity (20-40%) to human transcripts
±Nuclear genes with apicoplast signal sequences
£Nuclear genes with mitochondrial signal sequences

4.3.4 Evolutionary analysis of P. falciparum metabolism

Recent surveys on the genetic diversity of P. falciparum have revealed a number of salient features on the genome-wide variation and evolution of the parasite. In addition to identifying new antigens and vaccine candidates, these studies provide high-resolution polymorphism maps that serve as a solid foundation for further understanding the population biology of these parasites (Jeffares, et al., 2007; Mu, et al., 2007; Volkman, et al., 2007). Nonetheless, the influence of selective pressures and evolutionary diversity present in the repertoire of genes required for metabolic processes has yet to be examined. Here I combine and analyze evolutionary data with thermodynamics and gene expression data in the context of a robust framework for P. falciparum metabolism.

To gain further insight into factors influencing the organization of the malaria parasite metabolic network, I analyzed the selective pressures across 18 different strains of P. falciparum (Table 4-7). More specifically, I have calculated the rate of synonymous substitutions per synonymous site (dS) and the rate of nonsynonymous substitutions per nonsynonymous site (dN) for all enzyme-encoding genes in our model. The dN to dS ratio (ω) provides information on selection, where ω > 1, ω < 1, and ω = 1 indicate positive or adaptive selection, negative or purifying selection, and neutral evolution, respectively. Traditionally, the ω ratio is calculated as an average over all amino acids in the protein, which makes ω > 1 a very strict criterion for detecting positive selection given that the majority of amino acids in many proteins are largely invariable due to strong functional constraints (Li, 1997). A more appropriate approach to modelling this scenario is the use of codon-site models to test for and identify critical amino acids in a protein under positive selection. Recently, a number of such tools have been developed to calculate heterogeneous ω ratios (Nielsen and Yang, 1998; Suzuki, 2004; Suzuki and Gojobori, 1999). To determine the gene- and site-specific selection pressures acting on metabolic enzymes in P. falciparum, I chose to use the codon-based approximate Bayesian method FUBAR (Murrell, et al., 2013), which improves on both the speed and the accuracy of previous models for inferring heterogeneous selection. The method is a component of the HyPhy software package as implemented on the Datamonkey server (www.datamonkey.org) (Delport, et al., 2010). Sites under positive or negative selection were considered significant with a posterior probability greater than 0.9.
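The interpretation of ω given above, together with the dS = 0 safeguard described in Section 4.2.3, can be written as a short sketch. It is illustrative only and is not the HyPhy or FUBAR implementation; the function names are hypothetical, and the dN and dS inputs in the example are invented so that their ratio matches a gene-level value of 1.13.

```python
def omega(dn, ds, ds_floor=1e-10):
    """dN/dS ratio, replacing dS = 0 with a small floor to avoid infinity."""
    if dn < 0 or ds < 0:
        raise ValueError("substitution rates must be non-negative")
    return dn / (ds if ds > 0 else ds_floor)


def selection_regime(w):
    """Interpret a dN/dS ratio as positive, purifying or neutral."""
    if w > 1:
        return "positive (adaptive) selection"
    if w < 1:
        return "negative (purifying) selection"
    return "neutral evolution"


# Hypothetical rates chosen to give a ratio of about 1.13.
example_omega = omega(0.113, 0.1)
example_regime = selection_regime(example_omega)

# With dS = 0 the floor keeps the ratio finite.
floored_omega = omega(0.02, 0.0)
```

In practice an exact ω of 1 is rarely observed, so values near 1 are better judged with a statistical test or a posterior probability than with a strict equality.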

Table 4-7 P. falciparum genomes assembled for calculation of dN/dS.

| WGS Study | Strain / Library | Number of reads | Mean sequencing depth | Raw SNPs | Filtered SNPs |
|---|---|---|---|---|---|
| P. falciparum E5 | 3D7-E5 1 / ERR034866 | 14,278,193 | 33.20 | 1226 | 744 |
| P. falciparum reference isolates for genotyping | 7G8 / ERR019313 | 17,069,968 | 46.96 | 4642 | 3701 |
| P. falciparum clones from 3D7xHB3 | X33 / ERR034867 | 23,378,113 | 15.00 | 3005 | 1198 |
| | XP8 / ERR034868 | 24,441,874 | 13.60 | 2155 | 825 |
| | X10 / ERR034869 | 21,996,114 | 11.94 | 2726 | 1030 |
| | X4 / ERR034870 | 18,074,566 | 13.01 | 2529 | 852 |
| | XP2 / ERR034871 | 19,529,950 | 14.49 | 2507 | 942 |
| P. falciparum Dd2 clones | ParentalDd2 / ERR019427 | 17,040,505 | 46.37 | 4041 | 2693 |
| | Clone1Dd2 / ERR019428 | 16,515,058 | 44.89 | 6454 | 4592 |
| | Clone2Dd2 / ERR019429 | 20,192,367 | 47.60 | 5145 | 3787 |
| | Clone3Dd2 / ERR019430 | 19,721,320 | 46.28 | 5445 | 3632 |
| P. falciparum enrichment test | Ech_1 / ERR018814 | 7,906,118 | 28.22 | 6844 | 4341 |
| P. falciparum 3D7 | 3D7-700L / ERR019077 | 6,031,033 | 17.67 | 97 | 13 |
| | 3D7_500 / ERR034295 | 82,480,286 | 36.77 | 164 | 34 |
| P. falciparum IT | IT_SPRI_A1 / ERR047192 | 2,081,762 | 11.38 | 6903 | 5243 |
| | PfIT-1 / ERR019272 | 67,387,086 | 34.96 | 6515 | 4731 |
| P. falciparum progeny from HB3xDd2 | Pfcl7 / ERR027131 | 16,298,957 | 33.54 | 2590 | 1412 |
| | Pfcl12 / ERR027132 | 18,875,021 | 32.09 | 2304 | 1117 |

Estimates for dN/dS could be calculated for 742 enzyme-encoding genes, with a distribution skewed around 0.80 indicating neutral or purifying selection (Figure 4-5); this is expected given the importance of metabolism to the survival of the malaria parasite. In comparison to free energy change data, I find that selective pressures vary widely in both groups of reactions classified either as bottlenecks or those operating far from equilibrium (candidates for regulation) (Figure 4-3). A previous study examining genome-level variation and evolution in P. falciparum, however, correlated levels of expression with dN/dS rates of protein-coding genes based on the observation that genes expressed at low levels are rapidly evolving, while those that are abundantly expressed are conserved during evolution (Jeffares, et al., 2007). As I have shown in our expression analysis, metabolic enzymes overall exhibit high levels of expression, so correspondingly I would expect these genes to be under purifying selection and thus more evolutionarily conserved. Indeed, I find that nearly 90% of all enzymes in our model are associated with ω ratios ≤ 1, with only a small proportion exhibiting positive selection (11%). Amongst this latter group are the currently verified drug targets and bifunctional enzymes dihydrofolate reductase-thymidylate synthase (DHFR-TS) (PF3D7_0417200, EC 1.5.1.3/2.1.1.45) and dihydropterin pyrophosphokinase-dihydropteroate synthase (PPPK-DHPS) (PF3D7_0810800, EC 2.5.1.15/2.7.6.3). Consistent with the positive selection of DHFR-TS as a drug resistance marker, our data indicate a very high ratio of nonsynonymous to synonymous substitutions for the gene (dN/dS = 1.13, 97th percentile). This result is further supported by data from PlasmoDB, which reports higher numbers of nonsynonymous SNPs compared to synonymous SNPs as identified from several high-throughput experimental SNP studies (N/S = 6, 94th percentile).
Moreover, mutations that have been implicated in pyrimethamine resistance occur at amino acid positions 50, 51, 59, 108, and 164 of DHFR (Brown, et al., 2010; Cowman, et al., 1988; Lozovsky, et al., 2009; Peterson, et al., 1988; Sirawaraporn, et al., 1997) and are consistent with the locations of nonsynonymous SNPs identified through our assemblies. Many pathways such as those for purine interconversion and pyrimidine biosynthesis have been established as essential for parasite development and distinct from those of the human host; in the following section I provide a unique evolutionary perspective into individual enzymes of such pathways lending greater insight into their potential as antimalarial drug targets.

Figure 4-5 Distribution of dN/dS ratios for genes in iMPMP420.

4.1.4 Selective pressures in essential pathways

Of the list of genes containing codon sites under strong positive or negative selection (posterior probability > 0.9) (Table 4-8), the gene encoding the carbamoyl phosphate synthetase II (PfCPSII) enzyme (PF3D7_1308200, EC 6.3.5.5) is particularly notable, as it catalyzes the first and rate-limiting step in de novo pyrimidine biosynthesis, a pathway that is indispensable in malaria parasites due to the absence of the pyrimidine salvage pathway. Our analysis identified two sites of selection in PfCPSII: one under diversifying positive selective pressure (dN/dS = 7.148, T1675G); the other under purifying negative selective pressure (dN/dS = 0.1032, T726A). Interestingly, the mRNA transcript has been shown to contain two insertion sequences that are absent from mammalian orthologs, providing unique targets exploitable for the design of new antimalarials (Flores, et al., 1994). While the exact role of these sequences has yet to be characterized in P. falciparum, they appear to be advantageous based on experiments in Toxoplasma gondii demonstrating the functional importance of the insertions as novel domains (Fox and Bzik, 2003). Significantly, codon 1675 is located within the unusually large insertion sequence of PfCPSII that is flanked by two ATP-binding regions together comprising the synthetase domain of PfCPSII (Figure 4-6). The codon, predicted in our analysis to be under strong positive selection (the most common mutation is a T to G nucleotide substitution resulting in a change from lysine to asparagine), is also a confirmed SNP (PlasmoDB SNP ID: CombinedSNP.MAL13.1760) with type data for 9 isolates. The high degree of positive selective pressure at this site and its position within the unique insertion sequence might suggest that the codon substitution confers a fitness benefit for the parasite. On the other hand, mutations at codon 726, which is under negative selection, are unlikely to affect protein function, consistent with the location of this residue on the boundary of the variable linker region of the enzyme (Figure 4-6). Collectively, the incorporation of this information will be beneficial for future experiments such as those to test the inhibitory effect of nucleic acid therapies against PfCPSII.

Table 4-8 Sites of positive selection in genes of the pyrimidine and/or purine metabolism pathways. Codon-specific rates of synonymous (dS) and nonsynonymous (dN) nucleotide substitutions were estimated by FUBAR. The Bayesian posterior probability provided for each codon was considered to be strong support for positive selection.

Gene EC Codon dN/dS P(dN/dS)*

PF3D7_0923800.1 1.8.1.9 39 7.689 0.931 PF3D7_0417200 1.5.1.3/ 108 8.096 0.942 2.1.1.45 PF3D7_0305800 2.7.4.3 739 7.591 0.929 744 8.358 0.941 PF3D7_0605600 2.7.4.6 748 7.915 0.932 1696 7.870 0.931 PF3D7_0316300.1 3.6.1.1 162 8.790 0.939 166 9.728 0.955 171 9.527 0.952 185 8.394 0.936 191 8.880 0.947 482 7.484 0.923 715 8.589 0.938 843 9.176 0.949 1043 9.560 0.945 1051 8.799 0.937 1779 8.554 0.938 PF3D7_0802600 4.6.1.1 1996 6.222 6.222 2008 7.904 7.904 2010 7.060 7.060 PF3D7_1138400 4.6.1.2 1247 7.523 0.925 1411 7.259 0.922 1939 8.584 0.936 PF3D7_1360500 4.6.1.2 1041 8.402 0.937 1425 8.148 0.934 2554 8.584 0.942 2834 8.520 0.941 PF3D7_1308200 6.3.5.5 1675 7.148 0.921

*Posterior probability for positive selection at the site.

Figure 4-6 Location of codon sites under selection within the PfCPSII gene. The N-terminal GAT domain is composed of a putative structural domain (PSD) and glutaminase (GLNase) domain, which is joined by a short linker sequence to the CPS domain comprising the CPSa and CPSb subdomains. Insertion sequences are represented by white boxes. Below the protein, active sites are highlighted by (*) and ATP-binding domains are drawn as rounded boxes.

To examine evolutionary pressures from a pathway perspective, I compared dN/dS values for groups of enzymes as defined by MPMP (Figure 4-7). Pathways involved in carbohydrate, cofactor and vitamin, and nucleotide metabolism were generally under neutral selection, while those for amino acid and lipid metabolism appeared to be under significantly greater negative selection. Not surprisingly, folate biosynthesis, which contains two enzymes (DHFR and DHPS) that have developed resistance to current antimalarial drugs, is under positive selection. Glycine and serine metabolism was also found to be under positive selection and includes the enzyme aminomethyltransferase (PfAMT) (PF3D7_1344000, EC 2.1.2.10, ω = 1.045), which interacts with the folate biosynthesis pathway by converting H4folate to CH2H4folate. Whether the selective pressures on enzymes are affected by those of nearby pathways has not yet been studied, but this represents a possible scenario to explain the relatively high dN/dS ratio predicted for PfAMT.
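A pathway-level summary of this kind can be sketched as a simple grouping step. The snippet below is a minimal illustration, assuming gene-level ω values and a gene-to-pathway mapping are already available as dictionaries; the function name, gene identifiers and values are hypothetical and do not come from iMPMP420.

```python
from collections import defaultdict
from statistics import median


def pathway_medians(gene_omega, gene_pathway):
    """Return the median omega per pathway (hypothetical helper)."""
    grouped = defaultdict(list)
    for gene, omega in gene_omega.items():
        pathway = gene_pathway.get(gene)
        if pathway is not None:  # skip genes without a pathway assignment
            grouped[pathway].append(omega)
    return {pathway: median(values) for pathway, values in grouped.items()}


# Hypothetical inputs, for illustration only.
omega_by_gene = {"geneA": 0.4, "geneB": 0.8, "geneC": 1.2}
pathway_by_gene = {"geneA": "lipid", "geneB": "lipid", "geneC": "folate"}

print(pathway_medians(omega_by_gene, pathway_by_gene))
```

The median is used here because boxplots summarize distributions by their median; a mean or another statistic could be substituted depending on the comparison of interest.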

Figure 4-7 Evolutionary rates of P. falciparum enzymes in the context of (A) pathways and (B) superclass pathways. Metabolic pathway classifications are as defined by MPMP: (1) arginine and proline metabolism, (2) asparagine and aspartate metabolism, (3) glutamate metabolism, (4) glycine and serine metabolism, (5) lysine metabolism, (6) methionine and polyamine metabolism, (7) nitrogen metabolism, (8) phenylalanine and tyrosine metabolism, (9) selenocysteine metabolism, (10) glycolysis, (11) mannose and fructose metabolism, (12) pentose phosphate cycle, (13) pyruvate metabolism, (14) TCA cycle, (15) CoA biosynthesis, (16) folate biosynthesis, (17) lipoic acid metabolism, (18) nicotinate and nicotinamide metabolism, (19) porphyrin metabolism, (20) pyridoxal phosphate (vitamin B6) metabolism, (21) riboflavin metabolism, (22) shikimate biosynthesis, (23) thiamine metabolism, (24) hemoglobin digestion, (25) dolichol metabolism, (26) inositol phosphate metabolism, (27) isoprenoid metabolism, (28) phosphatidylcholine metabolism, (29) phosphatidylserine & phosphatidylethanolamine metabolism, (30) sphingomyelin and ceramide metabolism, (31) terpenoid metabolism, (32) purine metabolism and (33) pyrimidine metabolism. Boxplots have been coloured by superclass category, where AA = amino acid metabolism; C = carbohydrate metabolism; CoF & V = cofactors and other vitamins; Hb = haemoglobin digestion; L = lipid metabolism; N = nucleotide metabolism.

While positive selection appears to be associated with drug resistance mechanisms, strong negative selection for an enzyme may serve as a beneficial attribute for current drug targets by reducing the risk of developing resistance mutations against the binding inhibitor. To identify such candidates, I mapped essentiality data as predicted through in silico deletions by iMPMP420 and found that the majority of these pathways contained at least one essential enzyme with a value of ω ≤ 1. Isoprenoid biosynthesis, however, stood out as having all (11) of its enzymes predicted to be essential and under purifying selection. The pathway leads to the production of isoprenoids, which are fundamental building blocks for diverse cellular compounds (e.g. steroids, cholesterol, ubiquinones) that are vital to a variety of biological processes (Rohmer, 1999; Sacchettini and Poulter, 1997). Importantly, the pathway utilized by and essential for P. falciparum is the 2-C-methyl-D-erythritol-4-phosphate (MEP) pathway (Jomaa, et al., 1999; Ralph, et al., 2004), which is biochemically distinct from that of humans, in which isoprenoids are synthesized via the mevalonate pathway (Goldstein and Brown, 1990). Due to their absence in human cells, all enzymes of the MEP pathway are excellent molecular targets for the development of antimalarial drugs, including the second enzyme of the pathway, deoxyxylulose phosphate reductoisomerase (DXR, EC 1.1.1.267), which is currently the target of clinical trials for the drug fosmidomycin as a combination therapy to treat malaria (Borrmann, et al., 2004; Lell, et al., 2003; Missinou, et al., 2002). Other enzymes that have been functionally characterized include 1-deoxy-D-xylulose-5-phosphate synthase (EC 2.2.1.7) and 2-C-methyl-D-erythritol-2,4-cyclodiphosphate synthase (EC 4.6.1.12) (Handa, et al., 2013; Rohdich, et al., 2001).
Our dN/dS data indicate that 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (PF3D7_0811900, EC 1.17.1.2) and 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (PF3D7_0106900, EC 2.7.7.60) have particularly low values of dN/dS (ω values of 0.646 and 0.807, respectively) and represent attractive candidates in the MEP pathway that have not yet been explored in P. falciparum as antimalarial drug targets. Besides isoprenoid biosynthesis, numerous other pathways appear to be under purifying selection and indicate the presence of enzymes affected predominantly by synonymous polymorphisms (Figure 4-7A). Glutamate metabolism was found to be the most significantly enriched for enzymes under negative selection (Mann-Whitney test, P = 5.71 E-4), with enzymes of this pathway receiving recent interest as potential targets for therapeutic intervention (Zocher, et al., 2012).
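For readers unfamiliar with the Mann-Whitney test used for such pathway comparisons, the following is a minimal pure-Python sketch of the two-sided test. It assumes the normal approximation with a tie correction and no continuity correction, which is only reasonable for moderately large samples; it is not the software used for the analyses in this chapter, and the function names and example values are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for value in combined:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical, no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value


# Hypothetical omega values for one pathway versus all other enzymes.
pathway_omegas = [0.31, 0.42, 0.45, 0.50, 0.52]
other_omegas = [0.70, 0.78, 0.81, 0.85, 0.90, 0.95, 1.10]
print(mann_whitney_u(pathway_omegas, other_omegas))
```

For small samples an exact test is preferable to the normal approximation, and established statistics packages provide both.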

Collectively, our dN/dS data indicate that mutations are quite common but non-detrimental (either neutral or purifying) when comparing strains of P. falciparum. Our study supports a high level of diversity in P. falciparum, which appears to be a fitness advantage for the parasite consistent with its ability to survive and propagate within a highly selective and hostile host environment. As this diversity also allows the parasite to develop resistance to antimalarial drugs rapidly, systematic analyses such as those described in this study are important steps towards identifying novel and effective drug targets. Notwithstanding, more extensive exploration of polymorphisms in P. falciparum metabolic enzymes is necessary to fully understand the impact of selective pressures on the organization of the malaria parasite metabolic network. Our integrated analysis of P. falciparum metabolism within the context of a robust FBA model has allowed for informative hypothesis generation that, when combined with more detailed in-depth analysis, will help to achieve a greater understanding of malaria parasite metabolism and ultimately lead to more effective drug therapies.

4.1.5 Transcriptomic analyses for the Apicomplexa

With the rapid advancement of technologies for transcriptomics, a number of genome-wide gene expression datasets for the Apicomplexa have recently been generated (Bahl, et al., 2010; Llinas, et al., 2006; Wakaguri, et al., 2009). By exploiting these datasets, I can build on my initial comparative analyses by using the metabolic network as a platform for overlaying transcriptomic data. Comparisons across species will provide insight into parasite evolution by highlighting differences in the expression of different groups of enzymes. Datasets were selected based on being high quality (e.g. exhibiting reproducibility through replicates) and having genome-scale coverage (in order to capture all enzyme-encoding genes) (Table 4-9). I performed an analysis of coccidian metabolism (using a microarray dataset for three Toxoplasma strains as well as an RNA-Seq dataset for T. gondii and its close relative N. caninum) as well as a Plasmodium-specific analysis based on a high quality microarray and RNA-Seq dataset.

Table 4-9 Expression datasets for comparative transcriptomics of the Apicomplexa.

| | Bahl et al., 2010 | Reid et al., 2012 | Bozdech et al., 2003 | Bártfai et al., 2010 |
|---|---|---|---|---|
| Species/strains | Toxoplasma gondii PRU, RH, VEG | Toxoplasma gondii VEG, Neospora caninum | P. falciparum HB3 | P. falciparum 3D7 |
| Expression platform | Oligonucleotide microarray (Affymetrix) | RNA-seq (Illumina GAIIx) | Custom long-oligonucleotide microarray | RNA-seq (Illumina GAIIx) |
| Genes expressed* | 3,986 | 80%, 74% of genes | ~2000 | > 90% genome |
| Replicates | 3 | 2 | 1 | 1 |
| Normalization procedure | RMA | RPKM | NOMAD** | RPKM |
| Differential expression | SAM | DESeq | Fast Fourier Transform (FFT) | n/a |
| Source | NCBI GEO [GSE20145] | Personal comm. | derisilab.ucsf.edu | www.plasmodb.org |

*During the tachyzoite/trophozoite stage.
**Normalized by linear scale (global normalization); datasets have not been log-transformed or mean-centered (as in figures).
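Two of the datasets in Table 4-9 are normalized as RPKM (reads per kilobase of transcript per million mapped reads). As a reminder of what this unit means, the standard definition can be written as a one-line function; the function name and the example numbers below are hypothetical and are not taken from either dataset.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    # Scale by 1e3 (bases to kilobases) and 1e6 (reads to millions of reads).
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2 kb gene in a 10-million-read library.
print(rpkm(500, 2000, 10_000_000))
```

Because RPKM corrects for both gene length and library size, values are comparable across genes within a sample, although comparisons between platforms (for example, RNA-seq versus microarray) still require care.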

4.1.5.1 Toxoplasma displays strain-specific differences in arachidonic acid metabolism

Hierarchical clustering of the coccidian datasets revealed that expression is very similar across Toxoplasma strains and between platforms (Figure 4-8). Pathways for amino acid metabolism, however, appeared to be down-regulated in N. caninum (Mann-Whitney test, P = 8.19e-05) in comparison to Toxoplasma strains, particularly for (i) valine, leucine and isoleucine degradation and (ii) phenylalanine, tyrosine and tryptophan biosynthesis (Figure 4-8). To further investigate the significance of this disparity, I applied Gene Set Enrichment Analysis (GSEA) (Subramanian et al., 2005) based on gene expression values for individual enzyme-encoding genes. Indeed, while GSEA revealed an enrichment in pathways for amino acid metabolism, the leading edge subset (LES) (representing the core grouping of enzymes contributing to the enrichment score) was biased by genes encoding histone-lysine-N-methyltransferases and tRNA-ligases. On the other hand, GSEA for specific enzyme classes (vs. for individual enzyme genes) reveals that two enzymes in arachidonic acid metabolism (a component of lipid metabolic pathways) are enriched in the type III strain (VEG) (Figure 4-9). Importantly, these findings are supported by a recently generated metabolomics dataset indicating that metabolites produced in the arachidonic acid pathway have the highest levels in type III strains (pers. comm.). In addition, preliminary experiments in mice infected with T. gondii have suggested a role for the arachidonic acid pathway in manifesting distinct phenotypes (pers. comm.).

Figure 4-8 Cluster analysis of gene expression based on microarray expression data for three strains of Toxoplasma (PRU, RH and VEG) and RNA-Seq data for T. gondii VEG and N. caninum. A cluster of enzymes appears to be distinctly down-regulated in N. caninum in comparison to Toxoplasma strains (highlighted in orange box).

Figure 4-9 GSEA enrichment plot shows that enzymes involved in lipid metabolism are overall more active in T. gondii VEG than T. gondii PRU. The green curve corresponds to the ES (enrichment score) curve, which is the running sum of the weighted ES obtained from the GSEA software. The total height of the curve indicates the extent of enrichment, with the normalized enrichment score (NES) and FDR value indicated. Each of the black peaks represents a gene in the pathway, with those belonging to the Leading Edge Subset (LES) indicated by the double-headed blue arrow. The "Signal-to-Noise" ratio (SNR) statistic was used to rank the genes according to their correlation with either the T. gondii PRU phenotype (red) or T. gondii VEG phenotype (blue). The graph on the bottom represents the ranked, ordered, and non-redundant list of genes.
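The running-sum enrichment score plotted in such figures can be sketched in a few lines. The snippet below follows the weighted Kolmogorov-Smirnov-like statistic described by Subramanian et al. (2005), but it is an illustrative sketch rather than the GSEA software itself: it omits the permutation-based normalization (NES) and FDR estimation, and the function name, gene names and ranking scores are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Return (ES, running-sum curve) for a ranked gene list.

    ranked_genes: genes ordered by decreasing ranking metric.
    scores: ranking metric for each gene, in the same order.
    gene_set: collection of genes belonging to the pathway of interest.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking score")

    running, best, curve = 0.0, 0.0, []
    for score, hit in zip(scores, in_set):
        if hit:
            running += abs(score) ** weight / hit_norm  # step up at a hit
        else:
            running -= 1.0 / n_miss  # step down at a miss
        curve.append(running)
        if abs(running) > abs(best):
            best = running  # ES is the maximum deviation from zero
    return best, curve


# Hypothetical ranked list and gene set, for illustration only.
genes = ["g1", "g2", "g3", "g4"]
metric = [2.0, 1.0, -1.0, -2.0]
es, curve = enrichment_score(genes, metric, {"g1", "g2"})
print(es, curve)
```

Genes appearing before the peak of the curve (for a positive ES) correspond to the leading edge subset; the ranking metric itself, such as the signal-to-noise ratio, is assumed here to have been computed beforehand.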

4.1.5.2 P. falciparum pathways are highly expressed throughout the intraerythrocytic development cycle

Investigation of expression datasets for P. falciparum indicated that the majority of enzymes exhibit similar profiles across the metabolically active trophozoite life cycle stage (Figure 4-10). A small subset of genes, however, are associated with transcript levels that appeared down-regulated in RNA-seq and up-regulated in microarray data; this artefact is most likely a result of probe-specific background hybridization on the array, as suggested by a landmark study by Marioni et al. that observed differences in technology were associated with large array intensities and small sequence counts (Marioni, et al., 2008). Based on average P. falciparum expression for individual enzyme classes, GSEA highlights an enrichment of enzymes in glycerophospholipid metabolism. Importantly, pathways involved in glycerophospholipid metabolism, such as phosphatidylethanolamine and phosphatidylserine biosynthesis, were previously identified in my P. falciparum FBA analyses as candidates for regulation based on their strictly negative or positive free energy change; these reactions represent potential drug targets based on their essentiality to membrane formation during parasite proliferation. Finally, to glean insight into the differences in pathway expression across different stages of the IDC, I applied GSEA to compare trophozoite-, schizont- and ring-stage expression. My analysis revealed that glycolysis had the highest level of enrichment in trophozoites compared to the less metabolically active ring and schizont stages of the parasite (p < 0.05), and this is supported by experiments demonstrating that enzymes of the glycolytic pathway reach extremely high concentrations during the IDC stages of P. falciparum (Roth, et al., 1988). These and other findings help to clarify and solidify our understanding of P. falciparum metabolism, and when integrated with additional more detailed datasets will provide greater insight into the biology and evolution of these parasites.

Figure 4-10 Cluster analysis of gene expression based on RNA-seq and microarray datasets for P. falciparum 3D7. Timepoints that have been included in the heatmap are those relevant to the metabolically active trophozoite life cycle stage of the parasite. Highlighted in the yellow box is a cluster of enzymes exhibiting differential expression between platforms.

4.2 Concluding Remarks

The analysis of our high-confidence model for P. falciparum metabolism, iMPMP420, has provided insight into the operation and evolution of pathways in the malaria parasite as well as a list of prioritized enzymes representing suitable candidates for further drug intervention. By enforcing constraints for thermodynamic variability within the context of biologically relevant pathways, our model identifies highly regulated pathways such as phospholipid biosynthesis which are critical for survival during intraerythrocytic proliferation. Moreover, the integration of gene expression data into the model strongly supports P. falciparum’s “hard-wired” metabolism whereby the thermodynamics and hence regulation of enzymes is non-responsive to metabolic perturbations. Importantly, GSEA of two high-resolution genome-scale expression datasets for P. falciparum highlights an enrichment of enzymes in glycerophospholipid metabolism which overlaps with candidates for regulation identified through our thermodynamically-constrained FBA. Further incorporation of evolutionary data reveals that the majority of enzymes in iMPMP420 are under purifying selection typical of highly conserved enzymes, which is consistent with a large complement of enzymes in the model exhibiting high levels of expression. From a pathway perspective, most pathways are under neutral or purifying selection, while not surprisingly folate biosynthesis, which contains two enzymes that have developed resistance to current antimalarial drugs, is under positive selection. Finally, in a comparison to two existing models for P. falciparum metabolism, iMPMP420 was found to predict the greatest number of currently verified antimalarial targets. Essential enzymes predicted by all three models include candidates in the purine and pyrimidine biosynthesis pathways for which no antimalarials are yet available. Collectively, our integrated analysis of P. falciparum metabolism has allowed for informative hypothesis generation that, when combined with more in-depth analyses, will help to achieve a greater understanding of parasite metabolism and ultimately lead to more effective drug therapies.

Chapter 5

Summary of findings and future directions

5 Summary of findings and future directions

5.1 Summary of findings

This thesis project represents a systematic analysis of apicomplexan metabolism with the aims of (1) identifying novel drug targets and (2) gaining insight into apicomplexan biology and parasitism in general. Given the difficulties associated with enzyme prediction in phylogenetically divergent organisms such as the Apicomplexa, I have developed an improved method for enzyme classification (DETECT) based on a probabilistic framework that incorporates sequence similarity data across enzyme families. Combining DETECT with complementary tools and resources for studying metabolism has enabled genome-scale metabolic reconstructions of 15 apicomplexan parasites including the medically important species Plasmodium falciparum and Toxoplasma gondii, along with four tapeworm species and the phytopathogen Ophiostoma ulmi. DETECT will continue to serve as an essential component in this pipeline that I have established for studying the metabolism of newly sequenced organisms. When integrated with pathway and conservation information, these reconstructions have uncovered a number of important pathways as well as biological insights into variations and themes across groups of parasites. Importantly, comparative analyses have highlighted pantothenate biosynthesis as an attractive drug target for the Apicomplexa. The pathway, which is absent from humans, contains two enzymes that I have experimentally confirmed in T. gondii to be encoded by the same gene, suggesting a potentially bifunctional enzyme, and yielding new avenues for future experimental investigations. Moving beyond the simple classification of pathways, the inclusion of thermodynamics, gene expression, and SNP datasets, when applied to P. falciparum, has yielded further key insights into how pathways are organized and operate. These studies integrated with a proven metabolic reconstruction pipeline serve as a potent framework for driving further investigations aimed at exploiting metabolism for therapeutic intervention.

5.2 Future directions

5.2.1 Expansion and improvement of DETECT

5.2.1.1 Classification of transporter proteins and detection of proteins belonging to ortholog groups

While the intent of DETECT is to serve as an improved method for predicting enzyme-encoding proteins, the crux of the tool, stemming from its unique implementation of a probabilistic framework, can be extended to other biological contexts where sequence similarity is a determining factor in classifying a protein. Applied more generally, the set of probability profiles utilized by DETECT can instead be viewed as the profiles for a collection of distinct annotation entities (e.g. EC numbers, transporter classification numbers, ortholog groups) comprising a particular category of protein (e.g. enzymes, transporters, orthologs). In the optimal situation, DETECT, by taking into account sequence diversity across all subcategories of the protein category of interest, will produce more accurate function predictions whereby each assignment is linked to a probability confidence score.

Transporter proteins, which are notoriously difficult to classify, are particularly well-suited for the framework of DETECT since proteins of this type are annotated with a universal system for classification equivalent to the Enzyme Commission (EC) hierarchy for enzymes (Bairoch, 1994). The five-tier system, known as the Transporter Classification (TC) system (Saier, et al., 2006), classifies transport proteins according to both function and phylogeny. Furthermore, the use of global sequence alignments by DETECT to avoid bit scores biased by local high-scoring domains is also advantageous for transporter protein classification since the functions of these proteins are largely defined by the presence of multiple domains and/or motifs. Despite the importance of transporters to the life science industries (e.g. drug design, pharmacological studies, etc.), only a handful of tools have been developed specifically for transport protein prediction (Li, et al., 2008; Lin, et al., 2006; Saier, 2000), with BLAST remaining the most common method for annotation. The extension of DETECT as I have described is therefore a natural step forward in achieving improved predictions. Exploiting the use of TC-specific density profiles within a probabilistic framework will allow transport proteins to be more accurately classified and ultimately facilitate the study of cellular and pharmacological functions.

Another worthwhile application of DETECT relates to the phylogenetic classification of proteins, where instead of enzyme families, each class denotes a group of proteins related by orthology, in-paralogy, or co-orthology. These groups are denoted through ortholog group identifiers as defined by the Clusters of Orthologous Groups (COG) of proteins database (Tatusov, et al., 2003). Unfortunately, identifying orthologs and distinguishing them from paralogs is not always straightforward since genetic (e.g. gene duplication) and population-level (e.g. speciation) events can yield complex gene histories (Koonin, 2005). As the ability to accurately predict orthology is difficult, several different types of algorithms have been developed for ortholog prediction including methods based on phylogeny, evolutionary distance metrics, and sequence similarity (e.g. BLAST) (for a review, see Kuzniar, et al., 2008). While all types are widely used, only a handful of studies have evaluated the performance of the different prediction algorithms, and these have yielded contradictory results (Altenhoff and Dessimoz, 2009; Chen, et al., 2007; Hulsen, et al., 2006). Thus, alternative approaches are necessary to consolidate these predictions; one solution would be to exploit the uniquely probabilistic nature of DETECT, whose predictions can serve as additional support for those generated by existing algorithms. The DETECT framework, which is based on density profiles generated from the bit scores of global alignments between proteins, is a suitable vehicle for ortholog classification since the clustering of orthologs is driven primarily by the presence of multiple domains in proteins (Fischer, et al., 2011). Indeed, the senior author of OrthoMCL (Li, et al., 2003) has expressed a strong interest in implementing this approach and is delegating the task to programmers in his lab (David Roos, pers. comm.).

5.2.1.2 Enabling the classification of under-represented enzyme families

The Bayesian framework that underlies DETECT provides a confidence score assessing the likelihood that an unknown protein of interest belongs to a particular enzyme family. Benchmark tests indicate that DETECT significantly outperforms other homology-based methods but only when examining enzyme families with 30 or more protein members; with fewer than 30 protein members, however, the ability to accurately predict the correct enzyme family based on the highest confidence score is severely reduced (DETECT underperforms compared to BLAST and PRIAM). Notwithstanding, the resulting confidence scores associated with smaller families still represent probabilities which serve as quantitative measures for assessing the likelihood of enzyme function. For instance, the family of flavin reductases (EC 1.5.1.30) has five protein members, with sequences belonging to a wide range of species (cow, human, frog, mouse, and bacteria). Alignments of these members to other enzyme families produce similar scores, resulting in a density distribution profile that is highly indiscriminate. For an unclassified protein X that putatively belongs to EC 1.5.1.30, the resulting confidence score will be a very low probability. Protein X will also produce low probabilities against the other enzyme families exhibiting similarity to EC 1.5.1.30 (EC 1.5.1.36, 1.14.13.29, and 1.14.13.166 to name a few). Thus, while the relative ranking of these probability scores for protein X may not be correct, the ability to identify potential functional candidates and provide somewhat informative confidence scores to these candidates suggests that DETECT lends itself as a useful tool for enzyme prediction even for under-represented enzyme families.
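The idea of turning positive and negative score distributions into a confidence score can be illustrated with a small sketch. This is not the DETECT code: it assumes Gaussian kernel density estimates with a fixed bandwidth and a user-supplied prior, choices made here purely for illustration, and the function names and alignment scores are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with Gaussian kernels."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


def posterior_in_family(score, pos_scores, neg_scores, prior, bandwidth=10.0):
    """Bayes' rule: P(family | score) from positive and negative score densities."""
    pos_density = gaussian_kde(pos_scores, bandwidth)(score)
    neg_density = gaussian_kde(neg_scores, bandwidth)(score)
    numerator = pos_density * prior
    denominator = numerator + neg_density * (1 - prior)
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical alignment scores: within-family (positive) and between-family (negative).
positive = [300, 320, 350, 400]
negative = [20, 35, 50, 60, 80]

print(posterior_in_family(330, positive, negative, prior=0.1))
print(posterior_in_family(40, positive, negative, prior=0.1))
```

When the positive and negative distributions overlap heavily, as described for small families, the two densities are similar at any given score and the posterior stays close to the prior, which is one way to picture why the resulting confidence scores are poorly discriminating.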

5.2.1.3 Integration of additional information to improve predictive capacity

In its simplest form, DETECT uses information on sequence similarity scores to calculate the likelihood that a protein of interest belongs to a particular enzyme family. Incorporating additional information, such as phylogenetic relationships or protein lengths, could further improve the classification accuracy of DETECT. I first examined protein length (number of amino acids) within enzyme families as a possible factor related to the degree of separation between positive and negative score distributions. While there was no correlation between these two measures (Figure 2-3B), it is worth noting that protein lengths for most EC numbers vary to a much lesser extent than the similarity scores. This suggests that protein length is a fairly well-conserved feature of enzymes, and should improve predictions if properly integrated into the framework of DETECT.
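One way to picture such an integration is to treat protein length as a second likelihood term in the Bayesian score. The formulation below is an illustrative sketch, not the formulation used in DETECT. All symbols are introduced here for illustration: $s$ is the alignment score, $\ell$ the length of the query protein, $P(E)$ the prior probability of enzyme family $E$, $f_E^{+}$ and $f_E^{-}$ the positive and negative score densities, and $g_E^{+}$ and $g_E^{-}$ the corresponding length densities. It assumes that score and length are conditionally independent given family membership.

```latex
P(E \mid s, \ell) =
  \frac{P(E)\, f_E^{+}(s)\, g_E^{+}(\ell)}
       {P(E)\, f_E^{+}(s)\, g_E^{+}(\ell)
        + \bigl(1 - P(E)\bigr)\, f_E^{-}(s)\, g_E^{-}(\ell)}
```

Because alignment scores themselves tend to scale with sequence length, the independence assumption is unlikely to hold exactly; a joint density over $(s, \ell)$ would avoid it at the cost of requiring more family members per estimate.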

Information on phylogenetic relationships can also be valuable for enhancing the predictive capacity of DETECT, because similarity between two enzyme genes can arise either through convergence (i.e. no common evolutionary history) or through descent with modification from a common ancestor (i.e. homology). Importantly, sequence similarity and homology are not interchangeable, since homologous genes may diverge to the extent that similarities are difficult to detect (Reeck, et al., 1987). Most sequence-based functional predictions, however, are based on the identification of similarities that are thought to be due to homology. The current implementation of DETECT and other commonly used approaches for enzyme prediction (e.g. BLAST, PRIAM) do not take advantage of any information on how these enzymes evolve. For instance, gene duplication and subsequent functional divergence of the duplicates can result in homologs with different functions being present in one species (i.e. paralogs), whereas orthologs, which arise through speciation, are more likely to retain the same function. Incorporating this evolutionary information would require the use of phylogenetic trees, which would likely be a slow and labor-intensive process, but it would likely provide more accurate enzyme function predictions.

5.2.2 Understanding the evolution of apicomplexan metabolism

As more members of the phylum Apicomplexa are sequenced, the system of methods I have established for studying apicomplexan metabolism makes it possible to accurately annotate and analyse the resulting genomes. Incorporation of actively updated resources such as EuPathDB (Aurrecoechea, et al., 2009) and LAMP (Shanmugasundram, et al., 2013), in addition to experimental (e.g. expression, kinetic, physiological) datasets, will be integral to the generation of more biologically relevant interpretations and testable hypotheses. Importantly, these future studies will help to expand our current understanding of apicomplexan metabolism and provide new avenues for drug intervention. My preliminary findings have shown that apicomplexans have evolved distinct pathways for performing similar core metabolic activities. Despite being related, different Apicomplexa have evolved different suites of enzymes reflecting altered life cycle strategies. The mechanisms and evolutionary histories by which these differences arose remain largely unknown, but with the insights gleaned from my analyses, additional in-depth studies can begin to fill in these knowledge gaps and help to better understand the evolution of apicomplexan metabolism.

Host-specific adaptations are likely to play a prominent role in creating diversity in metabolic pathways, where specific evolutionary changes must have taken place to enable the parasite to survive in that niche. Such adaptive traits could originate by a process of gradual change, but there are mechanisms that would allow potential parasites to adapt very quickly. In particular, lateral gene transfer (LGT) is a process by which genetic information is passed from one genome to an unrelated genome where it is stably integrated and maintained. While LGT is a well-known mechanism for genome evolution among prokaryotes (Lawrence, 1999), the process is especially important in the evolution of a parasitic lifestyle, where the transmission of infection-related factors may confer an immediate selective advantage (de Koning, et al., 2000).

Apicomplexans have undergone LGT via endosymbiosis, through which organelles such as the mitochondrion and apicoplast were acquired. Many of the genes encoded by or targeted to these organelles are known to be involved in active parasite metabolic pathways, including fatty acid biosynthesis, isoprenoid biosynthesis and heme biosynthesis. Given that these genes are readily transferable, one might hypothesize that in adapting to a new host environment, additional LGT events have occurred in the Apicomplexa, leading to changes in the sets of enzymes necessary for metabolism. In fact, adaptation to parasitism seems to favour the acquisition of new genes by LGT, given the observation that many eukaryotic organisms that have become ecologically specialized are rich in horizontally transferred genes (Keeling and Palmer, 2008). Large-scale changes beyond LGT also represent likely events in the evolution of divergent apicomplexan metabolisms. For instance, Cryptosporidium has presumably disposed of its apicoplast entirely and instead evolved alternative methods for metabolizing fatty acids by utilizing resources from the host (Zhu, 2004). Plasmodium species, which replicate exclusively in red blood cells, rely heavily on heme biosynthesis for survival. The pathway, which is absent from all other apicomplexans, is thought to have evolved through mosaic origins, a phenomenon describing a pathway that is composed of enzymes with different evolutionary origins (Obornik and Green, 2005). Other examples of mosaic pathways in the Apicomplexa include polyamine biosynthesis (Cook, et al., 2007), pyrimidine biosynthesis (Nara, et al., 2000), and shikimate biosynthesis (Richards, et al., 2006). Future phylogenetic studies based on these altered enzyme gene sets are essential and will lend insight into the evolution of these parasites and their metabolisms.

Beyond standard phylogenetic analyses, biochemical experiments are necessary to confirm the activities of the divergent enzymes. For instance, while Toxoplasma gondii encodes a set of genes associated with heme biosynthesis, this pathway is not known to be functional in the parasite. Presumably, these are relict genes retained in the genome that were simply missed during negative selection sweeps. Moreover, experiments may help to establish whether or not an enzyme has acquired an additional or altered function; many bifunctional enzymes have been observed in the Apicomplexa, so it is possible that metabolic pathways differ between species due to these modified enzymes. My metabolic networks serve as a starting point for performing additional experiments (e.g. cellular localization, gene knockouts, crystal structure determination), which can then be integrated with current datasets (e.g. phylogenetic analyses, expression, FBA, SNP analyses) and which, taken together, will provide further key insights into how these pathways are organized and operate.

5.2.3 Experimental validation of the pantothenate biosynthesis pathway

Metabolic reconstructions have revealed that Toxoplasma gondii, along with other members of the Apicomplexa, encodes the genes involved in the biosynthesis of pantothenate. Since pantothenate cannot be synthesized by humans, this pathway is an attractive drug target. Using T. gondii as a model organism, my preliminary biochemical assays for panB and panC activity were inconclusive, but sequencing experiments suggest the presence of a potentially bifunctional gene encoding the enzyme activities of panB and panE; this latter result is particularly interesting since previously characterized bifunctional enzymes in Toxoplasma have proven successful as novel drug targets (Fox and Bzik, 2003; Hortua Triana, et al., 2012; Ling, et al., 2007; Pashley, et al., 1997). Importantly, however, the functionality of these genes must be experimentally confirmed before drawing any further conclusions about the viability of these enzymes as drug targets. Towards this end, in collaboration with members of the Grigg laboratory, experiments are continuing to validate the bifunctional panBE gene, along with functional complementation and/or gene knockdown experiments to assess the essentiality of pantothenate biosynthesis enzymes.


References

Abrahamsen, M.S., Templeton, T.J., Enomoto, S., Abrahante, J.E., Zhu, G., Lancto, C.A., Deng, M., Liu, C., Widmer, G., Tzipori, S., Buck, G.A., Xu, P., Bankier, A.T., Dear, P.H., Konfortov, B.A., Spriggs, H.F., Iyer, L., Anantharaman, V., Aravind, L. and Kapur, V. (2004) Complete genome sequence of the apicomplexan, Cryptosporidium parvum, Science, 304, 441-445. Albert, R., Jeong, H. and Barabasi, A.L. (2000) Error and attack tolerance of complex networks, Nature, 406, 378-382. Altenhoff, A.M. and Dessimoz, C. (2009) Phylogenetic and functional assessment of orthologs inference projects and methods, PLoS Comput Biol, 5, e1000262. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. (1990) Basic local alignment search tool, J Mol Biol, 215, 403-410. Anantharaman, V., Iyer, L.M. and Aravind, L. (2007) Comparative genomics of : new insights into the evolution of eukaryotic signal transduction and gene regulation, Annu Rev Microbiol, 61, 453-475. Arakaki, A.K., Huang, Y. and Skolnick, J. (2009) EFICAz2: enzyme function inference by a combined approach enhanced by machine learning, BMC Bioinformatics, 10, 107. Aurrecoechea, C., Brestelli, J., Brunk, B.P., Dommer, J., Fischer, S., Gajria, B., Gao, X., Gingle, A., Grant, G., Harb, O.S., Heiges, M., Innamorato, F., Iodice, J., Kissinger, J.C., Kraemer, E., Li, W., Miller, J.A., Nayak, V., Pennington, C., Pinney, D.F., Roos, D.S., Ross, C., Stoeckert, C.J., Jr., Treatman, C. and Wang, H. (2009) PlasmoDB: a functional genomic database for malaria parasites, Nucleic Acids Res, 37, D539-543. Aurrecoechea, C., Brestelli, J., Brunk, B.P., Fischer, S., Gajria, B., Gao, X., Gingle, A., Grant, G., Harb, O.S., Heiges, M., Innamorato, F., Iodice, J., Kissinger, J.C., Kraemer, E.T., Li, W., Miller, J.A., Nayak, V., Pennington, C., Pinney, D.F., Roos, D.S., Ross, C., Srinivasamoorthy, G., Stoeckert, C.J., Jr., Thibodeau, R., Treatman, C. and Wang, H. 
(2009) EuPathDB: a portal to eukaryotic pathogen databases, Nucleic Acids Res, 38, D415-419. Aurrecoechea, C., Heiges, M., Wang, H., Wang, Z., Fischer, S., Rhodes, P., Miller, J., Kraemer, E., Stoeckert, C.J., Jr., Roos, D.S. and Kissinger, J.C. (2007) ApiDB: integrated resources for the apicomplexan bioinformatics resource center, Nucleic Acids Res, 35, D427-430. Bahl, A., Brunk, B., Crabtree, J., Fraunholz, M.J., Gajria, B., Grant, G.R., Ginsburg, H., Gupta, D., Kissinger, J.C., Labo, P., Li, L., Mailman, M.D., Milgram, A.J., Pearson, D.S., Roos, D.S., Schug, J., Stoeckert, C.J. and Whetzel, P. (2003) PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data, Nucleic Acids Res, 31, 212-215. Bahl, A., Davis, P.H., Behnke, M., Dzierszinski, F., Jagalur, M., Chen, F., Shanmugam, D., White, M.W., Kulp, D. and Roos, D.S. (2010) A novel multifunctional oligonucleotide microarray for Toxoplasma gondii, BMC Genomics, 11, 603. Bairoch, A. (1994) The ENZYME data bank, Nucleic Acids Res, 22, 3626-3627. Baker, D.A. and Kelly, J.M. (2004) Purine nucleotide cyclases in the malaria parasite, Trends Parasitol, 20, 227-232. Bartfai, R., Hoeijmakers, W.A., Salcedo-Amaya, A.M., Smits, A.H., Janssen-Megens, E., Kaan, A., Treeck, M., Gilberger, T.W., Francoijs, K.J. and Stunnenberg, H.G. (2010) H2A.Z demarcates intergenic regions of the plasmodium falciparum epigenome that are dynamically marked by H3K9ac and H3K4me3, PLoS Pathog, 6, e1001223. 140

Barthelmes, J., Ebeling, C., Chang, A., Schomburg, I. and Schomburg, D. (2007) BRENDA, AMENDA and FRENDA: the enzyme information system in 2007, Nucleic Acids Res, 35, D511- 514. Bartoloni, A. and Zammarchi, L. (2012) Clinical aspects of uncomplicated and severe malaria, Mediterr J Hematol Infect Dis, 4, e2012026. Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M. and Sonnhammer, E.L. (2002) The Pfam protein families database, Nucleic Acids Res, 30, 276-280. Battke, F. and Nieselt, K. (2011) Mayday SeaSight: combined analysis of deep sequencing and microarray data, PLoS One, 6, e16345. Baum, J., Papenfuss, A.T., Baum, B., Speed, T.P. and Cowman, A.F. (2006) Regulation of apicomplexan actin-based motility, Nat Rev Microbiol, 4, 621-628. Baunaure, F., Eldin, P., Cathiard, A.M. and Vial, H. (2004) Characterization of a non- mitochondrial type I phosphatidylserine decarboxylase in Plasmodium falciparum, Mol Microbiol, 51, 33-46. Berriman, M., Haas, B.J., LoVerde, P.T., Wilson, R.A., Dillon, G.P., Cerqueira, G.C., Mashiyama, S.T., Al-Lazikani, B., Andrade, L.F., Ashton, P.D., Aslett, M.A., Bartholomeu, D.C., Blandin, G., Caffrey, C.R., Coghlan, A., Coulson, R., Day, T.A., Delcher, A., DeMarco, R., Djikeng, A., Eyre, T., Gamble, J.A., Ghedin, E., Gu, Y., Hertz-Fowler, C., Hirai, H., Hirai, Y., Houston, R., Ivens, A., Johnston, D.A., Lacerda, D., Macedo, C.D., McVeigh, P., Ning, Z., Oliveira, G., Overington, J.P., Parkhill, J., Pertea, M., Pierce, R.J., Protasio, A.V., Quail, M.A., Rajandream, M.A., Rogers, J., Sajid, M., Salzberg, S.L., Stanke, M., Tivey, A.R., White, O., Williams, D.L., Wortman, J., Wu, W., Zamanian, M., Zerlotini, A., Fraser-Liggett, C.M., Barrell, B.G. and El-Sayed, N.M. (2009) The genome of the blood fluke Schistosoma mansoni, Nature, 460, 352-358. Bhaumik, S. (2013) Malaria funds drying up: World Malaria Report 2012, Natl Med J India, 26, 62. 
Billiouw, M., Vercruysse, J., Marcotty, T., Speybroeck, N., Chaka, G. and Berkvens, D. (2002) Theileria parva epidemics: a case study in eastern Zambia, Vet Parasitol, 107, 51-63. Bishop, R., Shah, T., Pelle, R., Hoyle, D., Pearson, T., Haines, L., Brass, A., Hulme, H., Graham, S.P., Taracha, E.L., Kanga, S., Lu, C., Hass, B., Wortman, J., White, O., Gardner, M.J., Nene, V. and de Villiers, E.P. (2005) Analysis of the transcriptome of the protozoan Theileria parva using MPSS reveals that the majority of genes are transcriptionally active in the schizont stage, Nucleic Acids Res, 33, 5503-5511. Blackman, M.J. and Bannister, L.H. (2001) Apical organelles of Apicomplexa: biology and isolation by subcellular fractionation, Mol Biochem Parasitol, 117, 11-25. Boeckmann, B., Blatter, M.C., Famiglietti, L., Hinz, U., Lane, L., Roechert, B. and Bairoch, A. (2005) Protein variety and functional diversity: Swiss-Prot annotation in its biological context, C R Biol, 328, 882-899. Boghigian, B.A., Shi, H., Lee, K. and Pfeifer, B.A. (2010) Utilizing elementary mode analysis, pathway thermodynamics, and a genetic algorithm for metabolic flux determination and optimal metabolic network design, BMC Syst Biol, 4, 49. Boguski, M.S., Lowe, T.M. and Tolstoshev, C.M. (1993) dbEST--database for "expressed sequence tags", Nat Genet, 4, 332-333. Borrmann, S., Issifou, S., Esser, G., Adegnika, A.A., Ramharter, M., Matsiegui, P.B., Oyakhirome, S., Mawili-Mboumba, D.P., Missinou, M.A., Kun, J.F., Jomaa, H. and Kremsner, 141

P.G. (2004) Fosmidomycin-clindamycin for the treatment of Plasmodium falciparum malaria, J Infect Dis, 190, 1534-1540. Bozdech, Z., Llinas, M., Pulliam, B.L., Wong, E.D., Zhu, J. and DeRisi, J.L. (2003) The transcriptome of the intraerythrocytic developmental cycle of Plasmodium falciparum, PLoS Biol, 1, E5. Bozdech, Z., Mok, S., Hu, G., Imwong, M., Jaidee, A., Russell, B., Ginsburg, H., Nosten, F., Day, N.P., White, N.J., Carlton, J.M. and Preiser, P.R. (2008) The transcriptome of Plasmodium vivax reveals divergence and diversity of transcriptional regulation in malaria parasites, Proc Natl Acad Sci U S A, 105, 16290-16295. Brayton, K.A., Lau, A.O., Herndon, D.R., Hannick, L., Kappmeyer, L.S., Berens, S.J., Bidwell, S.L., Brown, W.C., Crabtree, J., Fadrosh, D., Feldblum, T., Forberger, H.A., Haas, B.J., Howell, J.M., Khouri, H., Koo, H., Mann, D.J., Norimine, J., Paulsen, I.T., Radune, D., Ren, Q., Smith, R.K., Jr., Suarez, C.E., White, O., Wortman, J.R., Knowles, D.P., Jr., McElwain, T.F. and Nene, V.M. (2007) Genome sequence of Babesia bovis and comparative analysis of apicomplexan hemoprotozoa, PLoS Pathog, 3, 1401-1413. Breman, J.G. (2001) The ears of the hippopotamus: manifestations, determinants, and estimates of the malaria burden, Am J Trop Med Hyg, 64, 1-11. Brown, K.M., Costanzo, M.S., Xu, W., Roy, S., Lozovsky, E.R. and Hartl, D.L. (2010) Compensatory mutations restore fitness during the evolution of dihydrofolate reductase, Mol Biol Evol, 27, 2682-2690. Budke, C.M., White, A.C., Jr. and Garcia, H.H. (2009) Zoonotic larval cestode infections: neglected, neglected tropical diseases?, PLoS Negl Trop Dis, 3, e319. Burgard, A.P., Pharkya, P. and Maranas, C.D. (2003) Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization, Biotechnol Bioeng, 84, 647-657. Burke, W.F., Gracy, R.W. and Harris, B.G. (1972) Studies on enzymes from parasitic helminths. 3. 
Purification and properties of lactate dehydrogenase from the tapeworm, Hymenolepis diminuta, Comp Biochem Physiol B, 43, 345-359. Bzik, D.J., Li, W.B., Horii, T. and Inselburg, J. (1987) Molecular cloning and sequence analysis of the Plasmodium falciparum dihydrofolate reductase-thymidylate synthase gene, Proc Natl Acad Sci U S A, 84, 8360-8364. Carlton, J.M., Adams, J.H., Silva, J.C., Bidwell, S.L., Lorenzi, H., Caler, E., Crabtree, J., Angiuoli, S.V., Merino, E.F., Amedeo, P., Cheng, Q., Coulson, R.M., Crabb, B.S., Del Portillo, H.A., Essien, K., Feldblyum, T.V., Fernandez-Becerra, C., Gilson, P.R., Gueye, A.H., Guo, X., Kang'a, S., Kooij, T.W., Korsinczky, M., Meyer, E.V., Nene, V., Paulsen, I., White, O., Ralph, S.A., Ren, Q., Sargeant, T.J., Salzberg, S.L., Stoeckert, C.J., Sullivan, S.A., Yamamoto, M.M., Hoffman, S.L., Wortman, J.R., Gardner, M.J., Galinski, M.R., Barnwell, J.W. and Fraser- Liggett, C.M. (2008) Comparative genomics of the neglected human malaria parasite Plasmodium vivax, Nature, 455, 757-763. Carlton, J.M., Angiuoli, S.V., Suh, B.B., Kooij, T.W., Pertea, M., Silva, J.C., Ermolaeva, M.D., Allen, J.E., Selengut, J.D., Koo, H.L., Peterson, J.D., Pop, M., Kosack, D.S., Shumway, M.F., Bidwell, S.L., Shallom, S.J., van Aken, S.E., Riedmuller, S.B., Feldblyum, T.V., Cho, J.K., Quackenbush, J., Sedegah, M., Shoaibi, A., Cummings, L.M., Florens, L., Yates, J.R., Raine, J.D., Sinden, R.E., Harris, M.A., Cunningham, D.A., Preiser, P.R., Bergman, L.W., Vaidya, A.B., van Lin, L.H., Janse, C.J., Waters, A.P., Smith, H.O., White, O.R., Salzberg, S.L., Venter, J.C., Fraser, C.M., Hoffman, S.L., Gardner, M.J. and Carucci, D.J. (2002) Genome sequence and 142 comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii, Nature, 419, 512-519. Carlton, J.M., Das, A. and Escalante, A.A. (2013) Genomics, population genetics and evolutionary history of Plasmodium vivax, Adv Parasitol, 81, 203-222. Carreno, R.A., Martin, D.S. and Barta, J.R. 
(1999) Cryptosporidium is more closely related to the gregarines than to coccidia as shown by phylogenetic analysis of apicomplexan parasites inferred using small-subunit ribosomal RNA gene sequences, Parasitol Res, 85, 899-904. Carucci, D.J., Witney, A.A., Muhia, D.K., Warhurst, D.C., Schaap, P., Meima, M., Li, J.L., Taylor, M.C., Kelly, J.M. and Baker, D.A. (2000) Guanylyl cyclase activity associated with putative bifunctional integral membrane proteins in Plasmodium falciparum, J Biol Chem, 275, 22147-22156. Caspi, R., Altman, T., Dale, J.M., Dreher, K., Fulcher, C.A., Gilham, F., Kaipa, P., Karthikeyan, A.S., Kothari, A., Krummenacker, M., Latendresse, M., Mueller, L.A., Paley, S., Popescu, L., Pujar, A., Shearer, A.G., Zhang, P. and Karp, P.D. (2010) The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases, Nucleic Acids Res, 38, D473-479. Caspi, R., Foerster, H., Fulcher, C.A., Hopkinson, R., Ingraham, J., Kaipa, P., Krummenacker, M., Paley, S., Pick, J., Rhee, S.Y., Tissier, C., Zhang, P. and Karp, P.D. (2006) MetaCyc: a multiorganism database of metabolic pathways and enzymes, Nucleic Acids Res, 34, D511-516. Caspi, R., Foerster, H., Fulcher, C.A., Kaipa, P., Krummenacker, M., Latendresse, M., Paley, S., Rhee, S.Y., Shearer, A.G., Tissier, C., Walk, T.C., Zhang, P. and Karp, P.D. (2007) The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases, Nucleic Acids Res. Cavalier-Smith, T. (1993) protozoa and its 18 phyla, Microbiol Rev, 57, 953-994. Chan, M. and Sim, T.S. (2004) Functional characterization of an alternative [lactate dehydrogenase-like] malate dehydrogenase in Plasmodium falciparum, Parasitol Res, 92, 43-47. Chang, A., Scheer, M., Grote, A., Schomburg, I. and Schomburg, D. (2009) BRENDA, AMENDA and FRENDA the enzyme information system: new content and tools in 2009, Nucleic Acids Res, 37, D588-592. Chaudhary, K. and Roos, D.S. 
(2005) Protozoan genomics for drug discovery, Nat Biotechnol, 23, 1089-1091. Chaudhary, K., Ting, L.M., Kim, K. and Roos, D.S. (2006) Toxoplasma gondii purine nucleoside phosphorylase biochemical characterization, inhibitor profiles, and comparison with the Plasmodium falciparum ortholog, J Biol Chem, 281, 25652-25658. Chavalitshewinkoon, P., Wilairat, P., Gamage, S., Denny, W., Figgitt, D. and Ralph, R. (1993) Structure-activity relationships and modes of action of 9-anilinoacridines against chloroquine- resistant Plasmodium falciparum in vitro, Antimicrob Agents Chemother, 37, 403-406. Cheesman, S., McAleese, S., Goman, M., Johnson, D., Horrocks, P., Ridley, R.G. and Kilbey, B.J. (1994) The gene encoding topoisomerase II from Plasmodium falciparum, Nucleic Acids Res, 22, 2547-2551. Chen, F., Mackey, A.J., Vermunt, J.K. and Roos, D.S. (2007) Assessing performance of orthology detection strategies applied to eukaryotic genomes, PLoS One, 2, e383. Chen, L. and Vitkup, D. (2006) Predicting genes for orphan metabolic activities using phylogenetic profiles, Genome Biol, 7, R17. Christie, K.R., Weng, S., Balakrishnan, R., Costanzo, M.C., Dolinski, K., Dwight, S.S., Engel, S.R., Feierbach, B., Fisk, D.G., Hirschman, J.E., Hong, E.L., Issel-Tarver, L., Nash, R., 143

Sethuraman, A., Starr, B., Theesfeld, C.L., Andrada, R., Binkley, G., Dong, Q., Lane, C., Schroeder, M., Botstein, D. and Cherry, J.M. (2004) Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms, Nucleic Acids Res, 32, D311-314. Claudel-Renard, C., Chevalet, C., Faraut, T. and Kahn, D. (2003) Enzyme-specific profiles for genome annotation: PRIAM, Nucleic Acids Res, 31, 6633-6639. Cohen, P. (2000) The regulation of protein function by multisite phosphorylation--a 25 year update, Trends Biochem Sci, 25, 596-601. Conrad, P.A., Miller, M.A., Kreuder, C., James, E.R., Mazet, J., Dabritz, H., Jessup, D.A., Gulland, F. and Grigg, M.E. (2005) Transmission of Toxoplasma: clues from the study of sea otters as sentinels of Toxoplasma gondii flow into the marine environment, Int J Parasitol, 35, 1155-1168. Cook, T., Roos, D., Morada, M., Zhu, G., Keithly, J.S., Feagin, J.E., Wu, G. and Yarlett, N. (2007) Divergent polyamine metabolism in the Apicomplexa, Microbiology, 153, 1123-1130. Cornillot, E., Hadj-Kaddour, K., Dassouli, A., Noel, B., Ranwez, V., Vacherie, B., Augagneur, Y., Bres, V., Duclos, A., Randazzo, S., Carcy, B., Debierre-Grockiego, F., Delbecq, S., Moubri-Menage, K., Shams-Eldin, H., Usmani-Brown, S., Bringaud, F., Wincker, P., Vivares, C.P., Schwarz, R.T., Schetters, T.P., Krause, P.J., Gorenflot, A., Berry, V., Barbe, V. and Ben Mamoun, C. (2012) Sequencing of the smallest Apicomplexan genome from the human pathogen Babesia microti, Nucleic Acids Res, 40, 9102-9114. Coulson, R.M., Hall, N. and Ouzounis, C.A. (2004) Comparative genomics of transcriptional control in the human malaria parasite Plasmodium falciparum, Genome Res, 14, 1548-1554. Cowman, A.F. and Crabb, B.S. 
(2003) Functional genomics: identifying drug targets for parasitic diseases, Trends Parasitol, 19, 538-543. Cowman, A.F., Morry, M.J., Biggs, B.A., Cross, G.A. and Foote, S.J. (1988) Amino acid changes linked to pyrimethamine resistance in the dihydrofolate reductase-thymidylate synthase gene of Plasmodium falciparum, Proc Natl Acad Sci U S A, 85, 9109-9113. Crowther, G.J., Napuli, A.J., Gilligan, J.H., Gagaring, K., Borboa, R., Francek, C., Chen, Z., Dagostino, E.F., Stockmyer, J.B., Wang, Y., Rodenbough, P.P., Castaneda, L.J., Leibly, D.J., Bhandari, J., Gelb, M.H., Brinker, A., Engels, I.H., Taylor, J., Chatterjee, A.K., Fantauzzi, P., Glynne, R.J., Van Voorhis, W.C. and Kuhen, K.L. (2011) Identification of inhibitors for putative malaria drug targets among novel antimalarial compounds, Mol Biochem Parasitol, 175, 21-29. Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T., McVean, G. and Durbin, R. (2011) The variant call format and VCFtools, Bioinformatics, 27, 2156-2158. de Koning, A.P., Brinkman, F.S., Jones, S.J. and Keeling, P.J. (2000) Lateral gene transfer and metabolic adaptation in the human parasite Trichomonas vaginalis, Mol Biol Evol, 17, 1769- 1773. De Lorenzo, G. and Ferrari, S. (2002) Polygalacturonase-inhibiting proteins in defense against phytopathogenic fungi, Curr Opin Plant Biol, 5, 295-299. Deitsch, K., Duraisingh, M., Dzikowski, R., Gunasekera, A., Khan, S., Le Roch, K., Llinas, M., Mair, G., McGovern, V., Roos, D., Shock, J., Sims, J., Wiegand, R. and Winzeler, E. (2007) Mechanisms of gene regulation in Plasmodium, Am J Trop Med Hyg, 77, 201-208. 144

Delport, W., Poon, A.F., Frost, S.D. and Kosakovsky Pond, S.L. (2010) Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology, Bioinformatics, 26, 2455-2457. Deutscher, D., Meilijson, I., Kupiec, M. and Ruppin, E. (2006) Multiple knockout analysis of genetic robustness in the yeast metabolic network, Nat Genet, 38, 993-998. Donald, R.G., Carter, D., Ullman, B. and Roos, D.S. (1996) Insertional tagging, cloning, and expression of the Toxoplasma gondii hypoxanthine-xanthine-guanine phosphoribosyltransferase gene. Use as a selectable marker for stable transformation, J Biol Chem, 271, 14010-14019. Donald, R.G. and Roos, D.S. (1993) Stable molecular transformation of Toxoplasma gondii: a selectable dihydrofolate reductase-thymidylate synthase marker based on drug-resistance mutations in malaria, Proc Natl Acad Sci U S A, 90, 11703-11707. Dubey, J.P. and Frenkel, J.K. (1973) Experimental toxoplasma infection in mice with strains producing oocysts, J Parasitol, 59, 505-512. Dubey, J.P., Lindsay, D.S. and Speer, C.A. (1998) Structures of Toxoplasma gondii tachyzoites, bradyzoites, and sporozoites and biology and development of tissue cysts, Clin Microbiol Rev, 11, 267-299. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Res, 32, 1792-1797. Edwards, J.S., Covert, M. and Palsson, B. (2002) Metabolic modelling of microbes: the flux- balance approach, Environ Microbiol, 4, 133-140. Elabbadi, N., Ancelin, M.L. and Vial, H.J. (1997) Phospholipid metabolism of serine in Plasmodium-infected erythrocytes involves phosphatidylserine and direct serine decarboxylation, Biochem J, 324 ( Pt 2), 435-445. Escalante, A.A., Smith, D.L. and Kim, Y. (2009) The dynamics of mutations associated with anti-malarial drug resistance in Plasmodium falciparum, Trends Parasitol, 25, 557-563. Espadaler, J., Eswar, N., Querol, E., Aviles, F.X., Sali, A., Marti-Renom, M.A. and Oliva, B. 
(2008) Prediction of enzyme function by combining sequence similarity and protein interactions, BMC Bioinformatics, 9, 249. Esposito, M., Stettler, R., Moores, S.L., Pidathala, C., Muller, N., Stachulski, A., Berry, N.G., Rossignol, J.F. and Hemphill, A. (2005) In vitro efficacies of nitazoxanide and other thiazolides against Neospora caninum tachyzoites reveal antiparasitic activity independent of the nitro group, Antimicrob Agents Chemother, 49, 3715-3723. Fast, N.M., Xue, L., Bingham, S. and Keeling, P.J. (2002) Re-examining evolution using multiple protein molecular phylogenies, J Eukaryot Microbiol, 49, 30-37. Fatumo, S., Plaimas, K., Mallm, J.P., Schramm, G., Adebiyi, E., Oswald, M., Eils, R. and Konig, R. (2009) Estimating novel potential drug targets of Plasmodium falciparum by analysing the metabolic network of knock-out strains in silico, Infect Genet Evol, 9, 351-358. Feist, A.M. and Palsson, B.O. (2008) The growing scope of applications of genome-scale metabolic reconstructions using Escherichia coli, Nat Biotechnol, 26, 659-667. Feist, A.M., Zielinski, D.C., Orth, J.D., Schellenberger, J., Herrgard, M.J. and Palsson, B.O. (2009) Model-driven evaluation of the production potential for growth-coupled products of Escherichia coli, Metab Eng, 12, 173-186. Fichera, M.E. and Roos, D.S. (1997) A plastid organelle as a drug target in apicomplexan parasites, Nature, 390, 407-409. Fioravanti, C.F. (1982) Mitochondrial malate dehydrogenase, decarboxylating ("malic" enzyme) and transhydrogenase activities of adult Hymenolepis microstoma (Cestoda), J Parasitol, 68, 213-220. 145

Fischer, S., Brunk, B.P., Chen, F., Gao, X., Harb, O.S., Iodice, J.B., Shanmugam, D., Roos, D.S. and Stoeckert, C.J., Jr. (2011) Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups, Curr Protoc Bioinformatics, Chapter 6, Unit 6 12 11-19. Flegr, J. (2007) Effects of toxoplasma on human behavior, Schizophr Bull, 33, 757-760. Flores, M.V., O'Sullivan, W.J. and Stewart, T.S. (1994) Characterisation of the carbamoyl phosphate synthetase gene from Plasmodium falciparum, Mol Biochem Parasitol, 68, 315-318. Forst, C.V. (2006) Host-pathogen systems biology, Drug Discov Today, 11, 220-227. Foth, B.J. and McFadden, G.I. (2003) The apicoplast: a plastid in Plasmodium falciparum and other Apicomplexan parasites, Int Rev Cytol, 224, 57-110. Fox, B.A. and Bzik, D.J. (2003) Organisation and sequence determination of glutamine- dependent carbamoyl phosphate synthetase II in Toxoplasma gondii, Int J Parasitol, 33, 89-96. Fox, B.A., Ristuccia, J.G., Gigley, J.P. and Bzik, D.J. (2009) Efficient gene replacements in Toxoplasma gondii strains deficient for nonhomologous end joining, Eukaryot Cell, 8, 520-529. Gagne, P., Yang, D.Q., Hamelin, R.C. and Bernier, L. (2001) Genetic Variability of Canadian Populations of the Sapstain Fungus Ophiostoma piceae, Phytopathology, 91, 369-376. Gamage, S.A., Tepsiri, N., Wilairat, P., Wojcik, S.J., Figgitt, D.P., Ralph, R.K. and Denny, W.A. (1994) Synthesis and in vitro evaluation of 9-anilino-3,6-diaminoacridines active against a multidrug-resistant strain of the malaria parasite Plasmodium falciparum, J Med Chem, 37, 1486- 1494. Ganesan, K., Ponmee, N., Jiang, L., Fowble, J.W., White, J., Kamchonwongpaisan, S., Yuthavong, Y., Wilairat, P. and Rathod, P.K. (2008) A genetically hard-wired metabolic transcriptome in Plasmodium falciparum fails to mount protective responses to lethal antifolates, PLoS Pathog, 4, e1000214. Garcia, H.H., Moro, P.L. and Schantz, P.M. 
(2007) Zoonotic helminth infections of humans: echinococcosis, cysticercosis and fascioliasis, Curr Opin Infect Dis, 20, 489-494. Gardner, M.J., Bishop, R., Shah, T., de Villiers, E.P., Carlton, J.M., Hall, N., Ren, Q., Paulsen, I.T., Pain, A., Berriman, M., Wilson, R.J., Sato, S., Ralph, S.A., Mann, D.J., Xiong, Z., Shallom, S.J., Weidman, J., Jiang, L., Lynn, J., Weaver, B., Shoaibi, A., Domingo, A.R., Wasawo, D., Crabtree, J., Wortman, J.R., Haas, B., Angiuoli, S.V., Creasy, T.H., Lu, C., Suh, B., Silva, J.C., Utterback, T.R., Feldblyum, T.V., Pertea, M., Allen, J., Nierman, W.C., Taracha, E.L., Salzberg, S.L., White, O.R., Fitzhugh, H.A., Morzaria, S., Venter, J.C., Fraser, C.M. and Nene, V. (2005) Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes, Science, 309, 134-137. Gardner, M.J., Hall, N., Fung, E., White, O., Berriman, M., Hyman, R.W., Carlton, J.M., Pain, A., Nelson, K.E., Bowman, S., Paulsen, I.T., James, K., Eisen, J.A., Rutherford, K., Salzberg, S.L., Craig, A., Kyes, S., Chan, M.S., Nene, V., Shallom, S.J., Suh, B., Peterson, J., Angiuoli, S., Pertea, M., Allen, J., Selengut, J., Haft, D., Mather, M.W., Vaidya, A.B., Martin, D.M., Fairlamb, A.H., Fraunholz, M.J., Roos, D.S., Ralph, S.A., McFadden, G.I., Cummings, L.M., Subramanian, G.M., Mungall, C., Venter, J.C., Carucci, D.J., Hoffman, S.L., Newbold, C., Davis, R.W., Fraser, C.M. and Barrell, B. (2002) Genome sequence of the human malaria parasite Plasmodium falciparum, Nature, 419, 498-511. Garg, S., Yang, L. and Mahadevan, R. (2010) Thermodynamic analysis of regulation in metabolic networks using constraint-based modeling, BMC Res Notes, 3, 125. 146

Gengenbacher, M., Fitzpatrick, T.B., Raschle, T., Flicker, K., Sinning, I., Muller, S., Macheroux, P., Tews, I. and Kappes, B. (2006) Vitamin B6 biosynthesis by the malaria parasite Plasmodium falciparum: biochemical and structural insights, J Biol Chem, 281, 3633-3641.
Gilbert, L.A., Ravindran, S., Turetzky, J.M., Boothroyd, J.C. and Bradley, P.J. (2007) Toxoplasma gondii targets a protein phosphatase 2C to the nuclei of infected host cells, Eukaryot Cell, 6, 73-83.
Ginsburg, H. (2006) Progress in in silico functional genomics: the malaria Metabolic Pathways database, Trends Parasitol, 22, 238-240.
Ginsburg, H. (2009) Caveat emptor: limitations of the automated reconstruction of metabolic pathways in Plasmodium, Trends Parasitol, 25, 37-43.
Goldstein, J.L. and Brown, M.S. (1990) Regulation of the mevalonate pathway, Nature, 343, 425-430.
Gomez, M.S., Piper, R.C., Hunsaker, L.A., Royer, R.E., Deck, L.M., Makler, M.T. and Vander Jagt, D.L. (1997) Substrate and cofactor specificity and selective inhibition of lactate dehydrogenase from the malarial parasite P. falciparum, Mol Biochem Parasitol, 90, 235-246.
Gould, S.B., Tham, W.H., Cowman, A.F., McFadden, G.I. and Waller, R.F. (2008) Alveolins, a new family of cortical proteins that define the infrakingdom Alveolata, Mol Biol Evol, 25, 1219-1230.
Goward, C.R. and Nicholls, D.J. (1994) Malate dehydrogenase: a model for structure, evolution, and catalysis, Protein Sci, 3, 1883-1888.
Green, M.L. and Karp, P.D. (2004) A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases, BMC Bioinformatics, 5, 76.
Green, M.L. and Karp, P.D. (2005) Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers, Nucleic Acids Res, 33, 4035-4039.
Green, M.L. and Karp, P.D. (2006) The outcomes of pathway database computations depend on pathway ontology, Nucleic Acids Res, 34, 3687-3697.
Greenwood, B. and Mutabingwa, T. (2002) Malaria in 2002, Nature, 415, 670-672.
Greenwood, B.M., Bojang, K., Whitty, C.J. and Targett, G.A. (2005) Malaria, Lancet, 365, 1487-1498.
Grigg, M.E., Bonnefoy, S., Hehl, A.B., Suzuki, Y. and Boothroyd, J.C. (2001) Success and virulence in Toxoplasma as the result of sexual recombination between two distinct ancestries, Science, 294, 161-165.
Gunasekera, A.M., Myrick, A., Le Roch, K., Winzeler, E. and Wirth, D.F. (2007) Plasmodium falciparum: genome wide perturbations in transcript profiles among mixed stage cultures after chloroquine treatment, Exp Parasitol, 117, 87-92.
Hall, N., Karras, M., Raine, J.D., Carlton, J.M., Kooij, T.W., Berriman, M., Florens, L., Janssen, C.S., Pain, A., Christophides, G.K., James, K., Rutherford, K., Harris, B., Harris, D., Churcher, C., Quail, M.A., Ormond, D., Doggett, J., Trueman, H.E., Mendoza, J., Bidwell, S.L., Rajandream, M.A., Carucci, D.J., Yates, J.R., 3rd, Kafatos, F.C., Janse, C.J., Barrell, B., Turner, C.M., Waters, A.P. and Sinden, R.E. (2005) A comprehensive survey of the Plasmodium life cycle by genomic, transcriptomic, and proteomic analyses, Science, 307, 82-86.
Handa, S., Ramamoorthy, D., Spradling, T.J., Guida, W.C., Adams, J.H., Bendinskas, K.G. and Merkler, D.J. (2013) Production of recombinant 1-deoxy-d-xylulose-5-phosphate synthase from Plasmodium vivax in Escherichia coli, FEBS Open Bio, 3, 124-129.

Hempelmann, E. (2007) Hemozoin biocrystallization in Plasmodium falciparum and the antimalarial activity of crystallization inhibitors, Parasitol Res, 100, 671-676.
Henry, C.S., Broadbelt, L.J. and Hatzimanikatis, V. (2007) Thermodynamics-based metabolic flux analysis, Biophys J, 92, 1792-1805.
Henry, C.S., Jankowski, M.D., Broadbelt, L.J. and Hatzimanikatis, V. (2006) Genome-scale thermodynamic analysis of Escherichia coli metabolism, Biophys J, 90, 1453-1461.
Herrgard, M.J., Swainston, N., Dobson, P., Dunn, W.B., Arga, K.Y., Arvas, M., Bluthgen, N., Borger, S., Costenoble, R., Heinemann, M., Hucka, M., Le Novere, N., Li, P., Liebermeister, W., Mo, M.L., Oliveira, A.P., Petranovic, D., Pettifer, S., Simeonidis, E., Smallbone, K., Spasic, I., Weichart, D., Brent, R., Broomhead, D.S., Westerhoff, H.V., Kirdar, B., Penttila, M., Klipp, E., Palsson, B.O., Sauer, U., Oliver, S.G., Mendes, P., Nielsen, J. and Kell, D.B. (2008) A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology, Nat Biotechnol, 26, 1155-1160.
Hintz, W., Pinchback, M., de la Bastide, P., Burgess, S., Jacobi, V., Hamelin, R., Breuil, C. and Bernier, L. (2011) Functional categorization of unique expressed sequence tags obtained from the yeast-like growth phase of the elm pathogen Ophiostoma novo-ulmi, BMC Genomics, 12, 431.
Hiratsuka, T., Furihata, K., Ishikawa, J., Yamashita, H., Itoh, N., Seto, H. and Dairi, T. (2008) An alternative menaquinone biosynthetic pathway operating in microorganisms, Science, 321, 1670-1673.
Hopkins, A.L. and Groom, C.R. (2002) The druggable genome, Nat Rev Drug Discov, 1, 727-730.
Hortua Triana, M.A., Huynh, M.H., Garavito, M.F., Fox, B.A., Bzik, D.J., Carruthers, V.B., Loffler, M. and Zimmermann, B.H. (2012) Biochemical and molecular characterization of the pyrimidine biosynthetic enzyme dihydroorotate dehydrogenase from Toxoplasma gondii, Mol Biochem Parasitol, 184, 71-81.
Hoxie, N.J., Davis, J.P., Vergeront, J.M., Nashold, R.D. and Blair, K.A. (1997) Cryptosporidiosis-associated mortality following a massive waterborne outbreak in Milwaukee, Wisconsin, Am J Public Health, 87, 2032-2035.
Hu, G., Cabrera, A., Kono, M., Mok, S., Chaal, B.K., Haase, S., Engelberg, K., Cheemadan, S., Spielmann, T., Preiser, P.R., Gilberger, T.W. and Bozdech, Z. (2010) Transcriptional profiling of growth perturbations of the human malaria parasite Plasmodium falciparum, Nat Biotechnol, 28, 91-98.
Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., Cuche, B.A., de Castro, E., Lachaize, C., Langendijk-Genevaux, P.S. and Sigrist, C.J. (2008) The 20 years of PROSITE, Nucleic Acids Res, 36, D245-249.
Hulsen, T., Huynen, M.A., de Vlieg, J. and Groenen, P.M. (2006) Benchmarking ortholog identification methods using functional genomics data, Genome Biol, 7, R31.
Hung, S.S. and Parkinson, J. (2011) Post-genomics resources and tools for studying apicomplexan metabolism, Trends Parasitol, 27, 131-140.
Hung, S.S., Wasmuth, J., Sanford, C. and Parkinson, J. (2010) DETECT--a density estimation tool for enzyme classification and its application to Plasmodium falciparum, Bioinformatics, 26, 1690-1698.
Huthmacher, C., Hoppe, A., Bulik, S. and Holzhutter, H.G. (2010) Antimalarial drug targets in Plasmodium falciparum predicted by stage-specific metabolic network analysis, BMC Syst Biol, 4, 120.

Huynh, M.H. and Carruthers, V.B. (2009) Tagging of endogenous genes in a Toxoplasma gondii strain lacking Ku80, Eukaryot Cell, 8, 530-539.
Imming, P., Sinning, C. and Meyer, A. (2006) Drugs, their targets and the nature and number of drug targets, Nat Rev Drug Discov, 5, 821-834.
Isshiki, A., Akimitsu, K., Yamamoto, M. and Yamamoto, H. (2001) Endopolygalacturonase is essential for citrus black rot caused by Alternaria citri but not brown spot caused by Alternaria alternata, Mol Plant Microbe Interact, 14, 749-757.
Jankowski, M.D., Henry, C.S., Broadbelt, L.J. and Hatzimanikatis, V. (2008) Group Contribution Method for Thermodynamic Analysis of Complex Metabolic Networks, Biophys J, 95, 1487-1499.
Janouskovec, J., Horak, A., Obornik, M., Lukes, J. and Keeling, P.J. (2010) A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids, Proc Natl Acad Sci U S A, 107, 10949-10954.
Jeffares, D.C., Pain, A., Berry, A., Cox, A.V., Stalker, J., Ingle, C.E., Thomas, A., Quail, M.A., Siebenthall, K., Uhlemann, A.C., Kyes, S., Krishna, S., Newbold, C., Dermitzakis, E.T. and Berriman, M. (2007) Genome variation and evolution of the malaria parasite Plasmodium falciparum, Nat Genet, 39, 120-125.
Jeffs, S.A. and Arme, C. (1987) Echinococcus granulosus: specificity of amino acid transport systems in protoscoleces, Parasitology, 95 (Pt 1), 71-78.
Jeong, H., Tombor, B., Albert, R., Oltvai, Z.N. and Barabasi, A.L. (2000) The large-scale organization of metabolic networks, Nature, 407, 651-654.
Jol, S.J., Kummel, A., Terzer, M., Stelling, J. and Heinemann, M. (2012) System-level insights into yeast metabolism by thermodynamic analysis of elementary flux modes, PLoS Comput Biol, 8, e1002415.
Jomaa, H., Wiesner, J., Sanderbrand, S., Altincicek, B., Weidemeyer, C., Hintz, M., Turbachova, I., Eberl, M., Zeidler, J., Lichtenthaler, H.K., Soldati, D. and Beck, E. (1999) Inhibitors of the nonmevalonate pathway of isoprenoid biosynthesis as antimalarial drugs, Science, 285, 1573-1576.
Juge, N. (2006) Plant protein inhibitors of cell wall degrading enzymes, Trends Plant Sci, 11, 359-367.
Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T. and Yamanishi, Y. (2007) KEGG for linking genomes to life and the environment, Nucl. Acids Res., gkm882.
Kanehisa, M., Goto, S., Furumichi, M., Tanabe, M. and Hirakawa, M. (2010) KEGG for representation and analysis of molecular networks involving diseases and drugs, Nucleic Acids Res, 38, D355-360.
Kaplan, J.E., Jones, J.L. and Dykewicz, C.A. (2000) Protists as opportunistic pathogens: public health impact in the 1990s and beyond, J Eukaryot Microbiol, 47, 15-20.
Karp, P.D., Paley, S.M., Krummenacker, M., Latendresse, M., Dale, J.M., Lee, T.J., Kaipa, P., Gilham, F., Spaulding, A., Popescu, L., Altman, T., Paulsen, I., Keseler, I.M. and Caspi, R. (2009) Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology, Brief Bioinform, 11, 40-79.
Kauffman, K.J., Prakash, P. and Edwards, J.S. (2003) Advances in flux balance analysis, Curr Opin Biotechnol, 14, 491-496.
Keeling, P.J. (2004) Diversity and evolutionary history of plastids and their hosts, Am J Bot, 91, 1481-1493.

Keeling, P.J. and Palmer, J.D. (2008) Horizontal gene transfer in eukaryotic evolution, Nat Rev Genet, 9, 605-618.
Keithly, J.S., Langreth, S.G., Buttle, K.F. and Mannella, C.A. (2005) Electron tomographic and ultrastructural analysis of the Cryptosporidium parvum relict mitochondrion, its associated membranes, and organelles, J Eukaryot Microbiol, 52, 132-140.
Khoshraftar, S., Hung, S., Khan, S., Gong, Y., Tyagi, V., Parkinson, J., Sain, M., Moses, A.M. and Christendat, D. (2013) Sequencing and annotation of the Ophiostoma ulmi genome, BMC Genomics, 14, 162.
Kidgell, C., Volkman, S.K., Daily, J., Borevitz, J.O., Plouffe, D., Zhou, Y., Johnson, J.R., Le Roch, K., Sarr, O., Ndir, O., Mboup, S., Batalov, S., Wirth, D.F. and Winzeler, E.A. (2006) A systematic map of genetic variation in Plasmodium falciparum, PLoS Pathog, 2, e57.
Kikuchi, G., Yoshida, T. and Noguchi, M. (2005) Heme oxygenase and heme degradation, Biochem Biophys Res Commun, 338, 558-567.
Kim, K., Soldati, D. and Boothroyd, J.C. (1993) Gene replacement in Toxoplasma gondii with chloramphenicol acetyltransferase as selectable marker, Science, 262, 911-914.
Kim, K. and Weiss, L.M. (2004) Toxoplasma gondii: the model apicomplexan, Int J Parasitol, 34, 423-432.
Knockel, J., Muller, I.B., Bergmann, B., Walter, R.D. and Wrenger, C. (2007) The apicomplexan parasite Toxoplasma gondii generates pyridoxal phosphate de novo, Mol Biochem Parasitol, 152, 108-111.
Kohler, S., Delwiche, C.F., Denny, P.W., Tilney, L.G., Webster, P., Wilson, R.J., Palmer, J.D. and Roos, D.S. (1997) A plastid of probable green algal origin in Apicomplexan parasites, Science, 275, 1485-1489.
Koonin, E.V. (2005) Orthologs, paralogs, and evolutionary genomics, Annu Rev Genet, 39, 309-338.
Korenromp, E.L., Williams, B.G., de Vlas, S.J., Gouws, E., Gilks, C.F., Ghys, P.D. and Nahlen, B.L. (2005) Malaria attributable to the HIV-1 epidemic, sub-Saharan Africa, Emerg Infect Dis, 11, 1410-1419.
Krungkrai, J., Webster, H.K. and Yuthavong, Y. (1989) De novo and salvage biosynthesis of pteroylpentaglutamates in the human malaria parasite, Plasmodium falciparum, Mol Biochem Parasitol, 32, 25-37.
Kumar, V.S. and Maranas, C.D. (2009) GrowMatch: an automated method for reconciling in silico/in vivo growth predictions, PLoS Comput Biol, 5, e1000308.
Kummel, A., Panke, S. and Heinemann, M. (2006) Putative regulatory sites unraveled by network-embedded thermodynamic analysis of metabolome data, Mol Syst Biol, 2, 2006 0034.
Kuo, C.H. and Kissinger, J.C. (2008) Consistent and contrasting properties of lineage-specific genes in the apicomplexan parasites Plasmodium and Theileria, BMC Evol Biol, 8, 108.
Kuzniar, A., van Ham, R.C., Pongor, S. and Leunissen, J.A. (2008) The quest for orthologs: finding the corresponding gene across genomes, Trends Genet, 24, 539-551.
Lawrence, J.G. (1999) Gene transfer, speciation, and the evolution of bacterial genomes, Curr Opin Microbiol, 2, 519-523.
Le Roch, K.G., Johnson, J.R., Ahiboh, H., Chung, D.W., Prudhomme, J., Plouffe, D., Henson, K., Zhou, Y., Witola, W., Yates, J.R., Mamoun, C.B., Winzeler, E.A. and Vial, H. (2008) A systematic approach to understand the mechanism of action of the bisthiazolium compound T4 on the human malaria parasite, Plasmodium falciparum, BMC Genomics, 9, 513.

Leander, B.S., Clopton, R.E. and Keeling, P.J. (2003) Phylogeny of gregarines (Apicomplexa) as inferred from small-subunit rDNA and beta-tubulin, Int J Syst Evol Microbiol, 53, 345-354.
Lee, J.M., Gianchandani, E.P. and Papin, J.A. (2006) Flux balance analysis in the era of metabolomics, Brief Bioinform, 7, 140-150.
Lee, J.M. and Sonnhammer, E.L. (2003) Genomic gene clustering analysis of pathways in eukaryotes, Genome Res, 13, 875-882.
Lee, S.M., Ender, M., Adhikari, R., Smith, J.M., Berger-Bachi, B. and Cook, G.M. (2007) Fitness cost of staphylococcal cassette chromosome mec in methicillin-resistant Staphylococcus aureus by way of continuous culture, Antimicrob Agents Chemother, 51, 1497-1499.
Lell, B., Ruangweerayut, R., Wiesner, J., Missinou, M.A., Schindler, A., Baranek, T., Hintz, M., Hutchinson, D., Jomaa, H. and Kremsner, P.G. (2003) Fosmidomycin, a novel chemotherapeutic agent for malaria, Antimicrob Agents Chemother, 47, 735-738.
Leontovich, A.M., Tokmachev, K.Y. and van Houwelingen, H.C. (2008) The comparative analysis of statistics, based on the likelihood ratio criterion, in the automated annotation problem, BMC Bioinformatics, 9, 31.
Levine, N.D. (ed) (1973) The Apicomplexa and the coccidia proper. Protozoan parasites of domestic animals and man. Burgess Publishing Company, Minneapolis.
Levy, E.D., Ouzounis, C.A., Gilks, W.R. and Audit, B. (2005) Probabilistic annotation of protein sequences based on functional classifications, BMC Bioinformatics, 6, 302.
Li, H., Dai, X. and Zhao, X. (2008) A nearest neighbor approach for automated transporter prediction and categorization from protein sequences, Bioinformatics, 24, 1129-1136.
Li, H. and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform, Bioinformatics, 25, 1754-1760.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G. and Durbin, R. (2009) The Sequence Alignment/Map format and SAMtools, Bioinformatics, 25, 2078-2079.
Li, L., Stoeckert, C.J., Jr. and Roos, D.S. (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes, Genome Res, 13, 2178-2189.
Li, W.-H. (1997) Molecular Evolution. Sunderland, Massachusetts.
Liberator, P., Anderson, J., Feiglin, M., Sardana, M., Griffin, P., Schmatz, D. and Myers, R.W. (1998) Molecular cloning and functional expression of mannitol-1-phosphatase from the apicomplexan parasite Eimeria tenella, J Biol Chem, 273, 4237-4244.
Liesenfeld, O. (1999) Immune responses to Toxoplasma gondii in the gut, Immunobiology, 201, 229-239.
Lim, L.S., Tay, Y.L., Alias, H., Wan, K.L. and Dear, P.H. (2012) Insights into the genome structure and copy-number variation of Eimeria tenella, BMC Genomics, 13, 389.
Lin, H.H., Han, L.Y., Cai, C.Z., Ji, Z.L. and Chen, Y.Z. (2006) Prediction of transporter family from protein sequence by support vector machine approach, Proteins, 62, 218-231.
Ling, Y., Li, Z.H., Miranda, K., Oldfield, E. and Moreno, S.N. (2007) The farnesyl-diphosphate/geranylgeranyl-diphosphate synthase of Toxoplasma gondii is a bifunctional enzyme and a molecular target of bisphosphonates, J Biol Chem, 282, 30804-30816.
Llinas, M., Bozdech, Z., Wong, E.D., Adai, A.T. and DeRisi, J.L. (2006) Comparative whole genome transcriptome analysis of three Plasmodium falciparum strains, Nucleic Acids Res, 34, 1166-1173.

Lozovsky, E.R., Chookajorn, T., Brown, K.M., Imwong, M., Shaw, P.J., Kamchonwongpaisan, S., Neafsey, D.E., Weinreich, D.M. and Hartl, D.L. (2009) Stepwise acquisition of pyrimethamine resistance in the malaria parasite, Proc Natl Acad Sci U S A, 106, 12025-12030.
Ma, H., Sorokin, A., Mazein, A., Selkov, A., Selkov, E., Demin, O. and Goryanin, I. (2007) The Edinburgh human metabolic network reconstruction and its functional analysis, Mol Syst Biol, 3, 135.
Madern, D. (2002) Molecular evolution within the L-malate and L-lactate dehydrogenase super-family, J Mol Evol, 54, 825-840.
Maeda, T., Saito, T., Harb, O.S., Roos, D.S., Takeo, S., Suzuki, H., Tsuboi, T., Takeuchi, T. and Asai, T. (2009) Pyruvate kinase type-II isozyme in Plasmodium falciparum localizes to the apicoplast, Parasitol Int, 58, 101-105.
Mahadevan, R. and Schilling, C.H. (2003) The effects of alternate optimal solutions in constraint-based genome-scale metabolic models, Metab Eng, 5, 264-276.
Marioni, J.C., Mason, C.E., Mane, S.M., Stephens, M. and Gilad, Y. (2008) RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays, Genome Res, 18, 1509-1517.
Massoumi Alamouti, S., Kim, J.J., Humble, L.M., Uzunovic, A. and Breuil, C. (2007) Ophiostomatoid fungi associated with the northern spruce engraver, Ips perturbatus, in western Canada, Antonie Van Leeuwenhoek, 91, 19-34.
Mather, M.W., Henry, K.W. and Vaidya, A.B. (2007) Mitochondrial drug targets in apicomplexan parasites, Curr Drug Targets, 8, 49-60.
Mather, M.W. and Vaidya, A.B. (2008) Mitochondria in malaria and related parasites: ancient, diverse and streamlined, J Bioenerg Biomembr, 40, 425-433.
Matsumoto, J., Sakamoto, K., Shinjyo, N., Kido, Y., Yamamoto, N., Yagi, K., Miyoshi, H., Nonaka, N., Katakura, K., Kita, K. and Oku, Y. (2008) Anaerobic NADH-fumarate reductase system is predominant in the respiratory chain of Echinococcus multilocularis, providing a novel target for the chemotherapy of alveolar echinococcosis, Antimicrob Agents Chemother, 52, 164-170.
McConkey, G.A., Ittarat, I., Meshnick, S.R. and McCutchan, T.F. (1994) Auxotrophs of Plasmodium falciparum dependent on p-aminobenzoic acid for growth, Proc Natl Acad Sci U S A, 91, 4244-4248.
McConkey, G.A., Rogers, M.J. and McCutchan, T.F. (1997) Inhibition of Plasmodium falciparum protein synthesis. Targeting the plastid-like organelle with thiostrepton, J Biol Chem, 272, 2046-2049.
Mehlhorn, H. (2008) Encyclopedia of parasitology. Springer, New York.
Meissner, M., Brecht, S., Bujard, H. and Soldati, D. (2001) Modulation of myosin A expression by a newly established tetracycline repressor-based inducible system in Toxoplasma gondii, Nucleic Acids Res, 29, E115.
Meissner, M., Schluter, D. and Soldati, D. (2002) Role of Toxoplasma gondii myosin A in powering parasite gliding and host cell invasion, Science, 298, 837-840.
Mercier, C., Adjogble, K.D., Daubener, W. and Delauw, M.F. (2005) Dense granules: are they key organelles to help understand the parasitophorous vacuole of all apicomplexa parasites?, Int J Parasitol, 35, 829-849.
Meshnick, S.R. (2002) Artemisinin: mechanisms of action, resistance and toxicity, Int J Parasitol, 32, 1655-1660.

Mesplet, M., Palmer, G.H., Pedroni, M.J., Echaide, I., Florin-Christensen, M., Schnittger, L. and Lau, A.O. (2011) Genome-wide analysis of peptidase content and expression in a virulent and attenuated Babesia bovis strain pair, Mol Biochem Parasitol, 179, 111-113.
Messina, M., Niesman, I., Mercier, C. and Sibley, L.D. (1995) Stable DNA transformation of Toxoplasma gondii using phleomycin selection, Gene, 165, 213-217.
Michalski, W.P., Edgar, J.A. and Prowse, S.J. (1992) Mannitol metabolism in Eimeria tenella, Int J Parasitol, 22, 1157-1163.
Missinou, M.A., Borrmann, S., Schindler, A., Issifou, S., Adegnika, A.A., Matsiegui, P.B., Binder, R., Lell, B., Wiesner, J., Baranek, T., Jomaa, H. and Kremsner, P.G. (2002) Fosmidomycin for malaria, Lancet, 360, 1941-1942.
Mistry, J., Bateman, A. and Finn, R.D. (2007) Predicting active site residue annotations in the Pfam database, BMC Bioinformatics, 8, 298.
Mobegi, V.A., Loua, K.M., Ahouidi, A.D., Satoguina, J., Nwakanma, D.C., Amambua-Ngwa, A. and Conway, D.J. (2012) Population genetic structure of Plasmodium falciparum across a region of diverse endemicity in West Africa, Malar J, 11, 223.
Mogi, T. and Kita, K. (2010) Diversity in mitochondrial metabolic pathways in parasitic protists Plasmodium and Cryptosporidium, Parasitol Int, 59, 305-312.
Montoya, J.G. and Liesenfeld, O. (2004) Toxoplasmosis, Lancet, 363, 1965-1976.
Moore, R.B., Obornik, M., Janouskovec, J., Chrudimsky, T., Vancova, M., Green, D.H., Wright, S.W., Davies, N.W., Bolch, C.J., Heimann, K., Slapeta, J., Hoegh-Guldberg, O., Logsdon, J.M. and Carter, D.A. (2008) A photosynthetic alveolate closely related to apicomplexan parasites, Nature, 451, 959-963.
Mu, J., Awadalla, P., Duan, J., McGee, K.M., Keebler, J., Seydel, K., McVean, G.A. and Su, X.Z. (2007) Genome-wide variation and identification of vaccine targets in the Plasmodium falciparum genome, Nat Genet, 39, 126-130.
Murrell, B., Moola, S., Mabona, A., Weighill, T., Sheward, D., Kosakovsky Pond, S.L. and Scheffler, K. (2013) FUBAR: a fast, unconstrained Bayesian approximation for inferring selection, Mol Biol Evol, 30, 1196-1205.
Nadjm, B. and Behrens, R.H. (2012) Malaria: an update for physicians, Infect Dis Clin North Am, 26, 243-259.
Nara, T., Hashimoto, T. and Aoki, T. (2000) Evolutionary implications of the mosaic pyrimidine-biosynthetic pathway in eukaryotes, Gene, 257, 209-222.
Natalang, O., Bischoff, E., Deplaine, G., Proux, C., Dillies, M.A., Sismeiro, O., Guigon, G., Bonnefoy, S., Patarapotikul, J., Mercereau-Puijalon, O., Coppee, J.Y. and David, P.H. (2008) Dynamic RNA profiling in Plasmodium falciparum synchronized blood stages exposed to lethal doses of artesunate, BMC Genomics, 9, 388.
Neafsey, D.E., Galinsky, K., Jiang, R.H., Young, L., Sykes, S.M., Saif, S., Gujja, S., Goldberg, J.M., Young, S., Zeng, Q., Chapman, S.B., Dash, A.P., Anvikar, A.R., Sutton, P.L., Birren, B.W., Escalante, A.A., Barnwell, J.W. and Carlton, J.M. (2012) The malaria parasite Plasmodium vivax exhibits greater genetic diversity than Plasmodium falciparum, Nat Genet, 44, 1046-1050.
Nerima, B., Nilsson, D. and Maser, P. (2010) Comparative genomics of metabolic networks of free-living and parasitic eukaryotes, BMC Genomics, 11, 217.
Nielsen, R. and Yang, Z. (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene, Genetics, 148, 929-936.

Nzila, A. (2006) Inhibitors of de novo folate enzymes in Plasmodium falciparum, Drug Discov Today, 11, 939-944.
Oakley, M.S., Kumar, S., Anantharaman, V., Zheng, H., Mahajan, B., Haynes, J.D., Moch, J.K., Fairhurst, R., McCutchan, T.F. and Aravind, L. (2007) Molecular factors and biochemical pathways induced by febrile temperature in intraerythrocytic Plasmodium falciparum parasites, Infect Immun, 75, 2012-2025.
Obornik, M. and Green, B.R. (2005) Mosaic origin of the heme biosynthesis pathway in photosynthetic eukaryotes, Mol Biol Evol, 22, 2343-2353.
Oliveira, H., Sousa, A., Alves, A., Nogueira, A.J.A. and Santos, C. (2012) Inoculation with Ophiostoma novo-ulmi subsp. americana affects photosynthesis, nutrition and oxidative stress in in vitro Ulmus minor plants, Environmental and Experimental Botany, 77, 146-155.
Olson, P.D., Zarowiecki, M., Kiss, F. and Brehm, K. (2012) Cestode genomics - progress and prospects for advancing basic and applied aspects of flatworm biology, Parasite Immunol, 34, 130-150.
Olszewski, K.L., Mather, M.W., Morrisey, J.M., Garcia, B.A., Vaidya, A.B., Rabinowitz, J.D. and Llinas, M. (2010) Branched tricarboxylic acid metabolism in Plasmodium falciparum, Nature, 466, 774-778.
Otto, T.D., Wilinski, D., Assefa, S., Keane, T.M., Sarry, L.R., Bohme, U., Lemieux, J., Barrell, B., Pain, A., Berriman, M., Newbold, C. and Llinas, M. (2010) New insights into the blood-stage transcriptome of Plasmodium falciparum using RNA-Seq, Mol Microbiol, 76, 12-24.
Pagola, S., Stephens, P.W., Bohle, D.S., Kosar, A.D. and Madsen, S.K. (2000) The structure of malaria pigment beta-haematin, Nature, 404, 307-310.
Pain, A., Bohme, U., Berry, A.E., Mungall, K., Finn, R.D., Jackson, A.P., Mourier, T., Mistry, J., Pasini, E.M., Aslett, M.A., Balasubrammaniam, S., Borgwardt, K., Brooks, K., Carret, C., Carver, T.J., Cherevach, I., Chillingworth, T., Clark, T.G., Galinski, M.R., Hall, N., Harper, D., Harris, D., Hauser, H., Ivens, A., Janssen, C.S., Keane, T., Larke, N., Lapp, S., Marti, M., Moule, S., Meyer, I.M., Ormond, D., Peters, N., Sanders, M., Sanders, S., Sargeant, T.J., Simmonds, M., Smith, F., Squares, R., Thurston, S., Tivey, A.R., Walker, D., White, B., Zuiderwijk, E., Churcher, C., Quail, M.A., Cowman, A.F., Turner, C.M., Rajandream, M.A., Kocken, C.H., Thomas, A.W., Newbold, C.I., Barrell, B.G. and Berriman, M. (2008) The genome of the simian and human malaria parasite Plasmodium knowlesi, Nature, 455, 799-803.
Pain, A., Renauld, H., Berriman, M., Murphy, L., Yeats, C.A., Weir, W., Kerhornou, A., Aslett, M., Bishop, R., Bouchier, C., Cochet, M., Coulson, R.M., Cronin, A., de Villiers, E.P., Fraser, A., Fosker, N., Gardner, M., Goble, A., Griffiths-Jones, S., Harris, D.E., Katzer, F., Larke, N., Lord, A., Maser, P., McKellar, S., Mooney, P., Morton, F., Nene, V., O'Neil, S., Price, C., Quail, M.A., Rabbinowitsch, E., Rawlings, N.D., Rutter, S., Saunders, D., Seeger, K., Shah, T., Squares, R., Squares, S., Tivey, A., Walker, A.R., Woodward, J., Dobbelaere, D.A., Langsley, G., Rajandream, M.A., McKeever, D., Shiels, B., Tait, A., Barrell, B. and Hall, N. (2005) Genome of the host-cell transforming parasite Theileria annulata compared with T. parva, Science, 309, 131-133.
Pal, C., Papp, B., Lercher, M.J., Csermely, P., Oliver, S.G. and Hurst, L.D. (2006) Chance and necessity in the evolution of minimal metabolic networks, Nature, 440, 667-670.
Papoutsakis, E.T. (1984) Equations and calculations for fermentations of butyric acid bacteria, Biotechnol Bioeng, 26, 174-187.
Park, J.M., Kim, T.Y. and Lee, S.Y. (2009) Constraints-based genome-scale metabolic simulation for systems metabolic engineering, Biotechnol Adv, 27, 979-988.

Parkinson, J., Wasmuth, J.D., Salinas, G., Bizarro, C.V., Sanford, C., Berriman, M., Ferreira, H.B., Zaha, A., Blaxter, M.L., Maizels, R.M. and Fernandez, C. (2012) A transcriptomic analysis of Echinococcus granulosus larval stages: implications for parasite biology and host adaptation, PLoS Negl Trop Dis, 6, e1897.
Pashley, T.V., Volpe, F., Pudney, M., Hyde, J.E., Sims, P.F. and Delves, C.J. (1997) Isolation and molecular characterization of the bifunctional hydroxymethyldihydropterin pyrophosphokinase-dihydropteroate synthase gene from Toxoplasma gondii, Mol Biochem Parasitol, 86, 37-47.
Patankar, S., Munasinghe, A., Shoaibi, A., Cummings, L.M. and Wirth, D.F. (2001) Serial analysis of gene expression in Plasmodium falciparum reveals the global expression profile of erythrocytic stages and the presence of anti-sense transcripts in the malarial parasite, Mol Biol Cell, 12, 3114-3125.
Pazos, F., Rausell, A. and Valencia, A. (2006) Phylogeny-independent detection of functional residues, Bioinformatics, 22, 1440-1448.
Peregrin-Alvarez, J.M., Sanford, C. and Parkinson, J. (2009) The conservation and evolutionary modularity of metabolism, Genome Biol, 10, R63.
Peterson, D.S., Walliker, D. and Wellems, T.E. (1988) Evidence that a point mutation in dihydrofolate reductase-thymidylate synthase confers resistance to pyrimethamine in falciparum malaria, Proc Natl Acad Sci U S A, 85, 9114-9118.
Pfleiderer, G., Kreiling, A. and Wieland, T. (1960) UBER PANTOTHENSAURESYNTHETASE AUS E-COLI .1. ANREICHERUNG MIT HILFE EINES OPTISCHEN TESTES, Biochemische Zeitschrift, 333, 302-307.
Pharkya, P. and Maranas, C.D. (2006) An optimization framework for identifying reaction activation/inhibition or elimination candidates for overproduction in microbial systems, Metab Eng, 8, 1-13.
Pinney, J.W., Papp, B., Hyland, C., Wambua, L., Westhead, D.R. and McConkey, G.A. (2007) Metabolic reconstruction and analysis for parasite genomes, Trends Parasitol, 23, 548-554.
Pinney, J.W., Shirley, M.W., McConkey, G.A. and Westhead, D.R. (2005) metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella, Nucleic Acids Res, 33, 1399-1409.
Plata, G., Hsiao, T.L., Olszewski, K.L., Llinas, M. and Vitkup, D. (2010) Reconstruction and flux-balance analysis of the Plasmodium falciparum metabolic network, Mol Syst Biol, 6, 408.
Porter, S.B. and Sande, M.A. (1992) Toxoplasmosis of the central nervous system in the acquired immunodeficiency syndrome, N Engl J Med, 327, 1643-1648.
Preuss, J., Hedrick, M., Sergienko, E., Pinkerton, A., Mangravita-Novo, A., Smith, L., Marx, C., Fischer, E., Jortzik, E., Rahlfs, S., Becker, K. and Bode, L. (2012) High-throughput screening for small-molecule inhibitors of Plasmodium falciparum glucose-6-phosphate dehydrogenase 6-phosphogluconolactonase, J Biomol Screen, 17, 738-751.
Radke, J.R., Behnke, M.S., Mackey, A.J., Radke, J.B., Roos, D.S. and White, M.W. (2005) The transcriptome of Toxoplasma gondii, BMC Biol, 3, 26.
Radzicka, A. and Wolfenden, R. (1995) A proficient enzyme, Science, 267, 90-93.
Raja, F. (2010) Flux Balance Analysis of Plasmodium falciparum Metabolism. Biochemistry. University of Toronto, Toronto.
Ralph, S.A., van Dooren, G.G., Waller, R.F., Crawford, M.J., Fraunholz, M.J., Foth, B.J., Tonkin, C.J., Roos, D.S. and McFadden, G.I. (2004) Tropical infectious diseases: metabolic maps and functions of the Plasmodium falciparum apicoplast, Nat Rev Microbiol, 2, 203-216.

Raman, K. and Chandra, N. (2009) Flux balance analysis of biological systems: applications and challenges, Brief Bioinform, bbp011.
Rangachari, K., Dluzewski, A., Wilson, R.J. and Gratzer, W.B. (1986) Control of malarial invasion by phosphorylation of the host cell membrane cytoskeleton, Nature, 324, 364-365.
Rask-Andersen, M., Almen, M.S. and Schioth, H.B. (2011) Trends in the exploitation of novel drug targets, Nat Rev Drug Discov, 10, 579-590.
Reeck, G.R., de Haen, C., Teller, D.C., Doolittle, R.F., Fitch, W.M., Dickerson, R.E., Chambon, P., McLachlan, A.D., Margoliash, E., Jukes, T.H. et al. (1987) "Homology" in proteins and nucleic acids: a terminology muddle and a way out of it, Cell, 50, 667.
Reed, J.L., Famili, I., Thiele, I. and Palsson, B.O. (2006) Towards multidimensional genome annotation, Nat Rev Genet, 7, 130-141.
Reed, J.L. and Palsson, B.O. (2003) Thirteen years of building constraint-based in silico models of Escherichia coli, J Bacteriol, 185, 2692-2699.
Reid, A.J., Vermont, S.J., Cotton, J.A., Harris, D., Hill-Cawthorne, G.A., Konen-Waisman, S., Latham, S.M., Mourier, T., Norton, R., Quail, M.A., Sanders, M., Shanmugam, D., Sohal, A., Wasmuth, J.D., Brunk, B., Grigg, M.E., Howard, J.C., Parkinson, J., Roos, D.S., Trees, A.J., Berriman, M., Pain, A. and Wastling, J.M. (2012) Comparative genomics of the apicomplexan parasites Toxoplasma gondii and Neospora caninum: Coccidia differing in host range and transmission strategy, PLoS Pathog, 8, e1002567.
Remington, J.S. (1974) Toxoplasmosis in the adult, Bull N Y Acad Med, 50, 211-227.
Rice, P., Longden, I. and Bleasby, A. (2000) EMBOSS: the European Molecular Biology Open Software Suite, Trends Genet, 16, 276-277.
Richards, T.A., Dacks, J.B., Campbell, S.A., Blanchard, J.L., Foster, P.G., McLeod, R. and Roberts, C.W. (2006) Evolutionary origins of the eukaryotic shikimate pathway: gene fusions, horizontal gene transfer, and endosymbiotic replacements, Eukaryot Cell, 5, 1517-1531.
Roberts, F., Roberts, C.W., Johnson, J.J., Kyle, D.E., Krell, T., Coggins, J.R., Coombs, G.H., Milhous, W.K., Tzipori, S., Ferguson, D.J., Chakrabarti, D. and McLeod, R. (1998) Evidence for the shikimate pathway in apicomplexan parasites, Nature, 393, 801-805. Robertson, J.G. (2005) Mechanistic basis of enzyme-targeted drugs, Biochemistry, 44, 5561-5571. Rohdich, F., Eisenreich, W., Wungsintaweekul, J., Hecht, S., Schuhr, C.A. and Bacher, A. (2001) Biosynthesis of terpenoids. 2C-Methyl-D-erythritol 2,4-cyclodiphosphate synthase (IspF) from Plasmodium falciparum, Eur J Biochem, 268, 3190-3197. Rohmer, M. (1999) The discovery of a mevalonate-independent pathway for isoprenoid biosynthesis in bacteria, algae and higher plants, Nat Prod Rep, 16, 565-574. Roos, D.S., Crawford, M.J., Donald, R.G., Fraunholz, M., Harb, O.S., He, C.Y., Kissinger, J.C., Shaw, M.K. and Striepen, B. (2002) Mining the Plasmodium genome database to define organellar function: what does the apicoplast do?, Philos Trans R Soc Lond B Biol Sci, 357, 35-46. Rost, B. (2002) Enzyme function less conserved than anticipated, J Mol Biol, 318, 595-608. Roth, E.F., Jr., Calvin, M.C., Max-Audit, I., Rosa, J. and Rosa, R. (1988) The enzymes of the glycolytic pathway in erythrocytes infected with Plasmodium falciparum malaria parasites, Blood, 72, 1922-1925. Rueckert, S., Simdyanov, T.G., Aleoshin, V.V. and Leander, B.S. (2011) Identification of a divergent environmental DNA sequence clade using the phylogeny of gregarine parasites (Apicomplexa) from crustacean hosts, PLoS One, 6, e18163.

Sacchettini, J.C. and Poulter, C.D. (1997) Creating isoprenoid diversity, Science, 277, 1788-1789. Saier, M.H., Jr. (2000) A functional-phylogenetic classification system for transmembrane solute transporters, Microbiol Mol Biol Rev, 64, 354-411. Saier, M.H., Jr., Tran, C.V. and Barabote, R.D. (2006) TCDB: the Transporter Classification Database for membrane transport protein analyses and information, Nucleic Acids Res, 34, D181-186. Samal, A., Singh, S., Giri, V., Krishna, S., Raghuram, N. and Jain, S. (2006) Low degree metabolites explain essential reactions and enhance modularity in biological networks, BMC Bioinformatics, 7, 118. Sarciron, M.E., Delabre-Defayolle, I., Audin, P., Petavy, A.F. and Paris, J. (1990) Effects of ethyl N-N-benzyl-methyl-oxamate in Meriones unguiculatus infected with Echinococcus multilocularis metacestodes. Biochemical and ultrastructural observations, Arzneimittelforschung, 40, 607-610. Sato, S. and Wilson, R.J. (2002) The genome of Plasmodium falciparum encodes an active delta-aminolevulinic acid dehydratase, Curr Genet, 40, 391-398. Schmatz, D.M., Baginsky, W.F. and Turner, M.J. (1989) Evidence for and characterization of a mannitol cycle in Eimeria tenella, Mol Biochem Parasitol, 32, 263-270. Schnoes, A.M., Brown, S.D., Dodevski, I. and Babbitt, P.C. (2009) Annotation error in public databases: misannotation of molecular function in enzyme superfamilies, PLoS Comput Biol, 5, e1000605. Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G. and Schomburg, D. (2004) BRENDA, the enzyme database: updates and major new developments, Nucleic Acids Res, 32. Schramm, V.L. (1998) Enzymatic transition states and transition state analog design, Annu Rev Biochem, 67, 693-720. Schwarz, G. and Mendel, R.R. (2006) Molybdenum cofactor biosynthesis and molybdenum enzymes, Annu Rev Plant Biol, 57, 623-647. Seeber, F., Limenitakis, J. and Soldati-Favre, D. 
(2008) Apicomplexan mitochondrial metabolism: a story of gains, losses and retentions, Trends Parasitol, 24, 468-478. Shanmugasundram, A., Gonzalez-Galarza, F.F., Wastling, J.M., Vasieva, O. and Jones, A.R. (2013) Library of Apicomplexan Metabolic Pathways: a manually curated database for metabolic pathways of apicomplexan parasites, Nucleic Acids Res, 41, D706-713. Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B. and Ideker, T. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks, Genome Res, 13, 2498-2504. Sharma, S., Sharma, S.K., Surolia, N. and Surolia, A. (2009) Beta-ketoacyl-ACP synthase I/II from Plasmodium falciparum (PfFabB/F)--is it B or F?, IUBMB Life, 61, 658-662. Shen, Y., Liu, J., Estiu, G., Isin, B., Ahn, Y.Y., Lee, D.S., Barabasi, A.L., Kapatral, V., Wiest, O. and Oltvai, Z.N. (2010) Blueprint for antimicrobial hit discovery targeting metabolic networks, Proc Natl Acad Sci U S A, 107, 1082-1087. Shlomi, T., Cabili, M.N. and Ruppin, E. (2009) Predicting metabolic biomarkers of human inborn errors of metabolism, Mol Syst Biol, 5, 263. Sibley, L.D. (2004) Intracellular parasite invasion strategies, Science, 304, 248-253. Sibley, L.D. (ed) (2012) Evolution of Virulence in Eukaryotic Microbes. Wiley-Blackwell.

Sibley, L.D. and Boothroyd, J.C. (1992) Virulent strains of Toxoplasma gondii comprise a single clonal lineage, Nature, 359, 82-85. Sibley, L.D., Messina, M. and Niesman, I.R. (1994) Stable DNA transformation in the obligate intracellular parasite Toxoplasma gondii by complementation of tryptophan auxotrophy, Proc Natl Acad Sci U S A, 91, 5508-5512. Sirawaraporn, W., Yongkiettrakul, S., Sirawaraporn, R., Yuthavong, Y. and Santi, D.V. (1997) Plasmodium falciparum: asparagine mutant at residue 108 of dihydrofolate reductase is an optimal antifolate-resistant single mutant, Exp Parasitol, 87, 245-252. Snow, R.W., Guerra, C.A., Noor, A.M., Myint, H.Y. and Hay, S.I. (2005) The global distribution of clinical episodes of Plasmodium falciparum malaria, Nature, 434, 214-217. Soh, K.C. and Hatzimanikatis, V. (2010) Network thermodynamics in the post-genomic era, Curr Opin Microbiol, 13, 350-357. Soldati, D. (1999) The apicoplast as a potential therapeutic target in Toxoplasma and other apicomplexan parasites, Parasitol Today, 15, 5-7. Soldati, D. and Boothroyd, J.C. (1993) Transient transfection and expression in the obligate intracellular parasite Toxoplasma gondii, Science, 260, 349-352. Soldati, D., Kim, K., Kampmeier, J., Dubremetz, J.F. and Boothroyd, J.C. (1995) Complementation of a Toxoplasma gondii ROP1 knock-out mutant using phleomycin selection, Mol Biochem Parasitol, 74, 87-97. Spalding, M.D. and Prigge, S.T. (2010) Lipoic acid metabolism in microbial pathogens, Microbiol Mol Biol Rev, 74, 200-228. Srivastava, I.K., Schmidt, M., Grall, M., Certa, U., Garcia, A.M. and Perrin, L.H. (1992) Identification and purification of glucose phosphate isomerase of Plasmodium falciparum, Mol Biochem Parasitol, 54, 153-164. Stelling, J., Klamt, S., Bettenbrock, K., Schuster, S. and Gilles, E.D. (2002) Metabolic network structure determines key aspects of functionality and regulation, Nature, 420, 190-193. Suhre, K. 
(2007) Inference of gene function based on gene fusion events: the rosetta-stone method, Methods Mol Biol, 396, 31-41. Surolia, N. and Surolia, A. (2001) Triclosan offers protection against blood stages of malaria by inhibiting enoyl-ACP reductase of Plasmodium falciparum, Nat Med, 7, 167-173. Suzuki, Y. (2004) New methods for detecting positive selection at single amino acid sites, J Mol Evol, 59, 11-19. Suzuki, Y. and Gojobori, T. (1999) A method for detecting positive selection at single amino acid sites, Mol Biol Evol, 16, 1315-1328. Tamez, P.A., Bhattacharjee, S., van Ooij, C., Hiller, N.L., Llinas, M., Balu, B., Adams, J.H. and Haldar, K. (2008) An erythrocyte vesicle protein exported by the malaria parasite promotes tubovesicular lipid import from the host cell surface, PLoS Pathog, 4, e1000118. Tarun, A.S., Peng, X., Dumpit, R.F., Ogata, Y., Silva-Rivera, H., Camargo, N., Daly, T.M., Bergman, L.W. and Kappe, S.H. (2008) A combined transcriptome and proteome survey of malaria parasite liver stages, Proc Natl Acad Sci U S A, 105, 305-310. Tatusov, R.L., Fedorova, N.D., Jackson, J.D., Jacobs, A.R., Kiryutin, B., Koonin, E.V., Krylov, D.M., Mazumder, R., Mekhedov, S.L., Nikolskaya, A.N., Rao, B.S., Smirnov, S., Sverdlov, A.V., Vasudevan, S., Wolf, Y.I., Yin, J.J. and Natale, D.A. (2003) The COG database: an updated version includes eukaryotes, BMC Bioinformatics, 4, 41.

Temple, B., Bernier, L. and Hintz, W.E. (2009) Characterisation of the polygalacturonase gene of the Dutch elm disease pathogen Ophiostoma novo-ulmi, New Zealand Journal of Forestry Science, 29, 39-37. Temple, B., Pines, P.A. and Hintz, W.E. (2006) A nine-year genetic survey of the causal agent of Dutch elm disease, Ophiostoma novo-ulmi in Winnipeg, Canada, Mycol Res, 110, 594-600. ten Have, A., Mulder, W., Visser, J. and van Kan, J.A. (1998) The endopolygalacturonase gene Bcpg1 is required for full virulence of Botrytis cinerea, Mol Plant Microbe Interact, 11, 1009-1016. Thiele, I. and Palsson, B.O. (2010) Reconstruction annotation jamborees: a community approach to systems biology, Mol Syst Biol, 6, 361. Thomsen-Zieger, N., Schachtner, J. and Seeber, F. (2003) Apicomplexan parasites contain a single lipoic acid synthase located in the plastid, FEBS Lett, 547, 80-86. Tielens, A.G. (1994) Energy generation in parasitic helminths, Parasitol Today, 10, 346-352. Tielens, A.G., Horemans, A.M., Dunnewijk, R., van der Meer, P. and van den Bergh, S.G. (1992) The facultative anaerobic energy metabolism of Schistosoma mansoni sporocysts, Mol Biochem Parasitol, 56, 49-57. Tosh, K. and Kilbey, B. (1995) The gene encoding topoisomerase I from the human malaria parasite Plasmodium falciparum, Gene, 163, 151-154. Tsai, I.J., Zarowiecki, M., Holroyd, N., Garciarrubio, A., Sanchez-Flores, A., Brooks, K.L., Tracey, A., Bobes, R.J., Fragoso, G., Sciutto, E., Aslett, M., Beasley, H., Bennett, H.M., Cai, J., Camicia, F., Clark, R., Cucher, M., De Silva, N., Day, T.A., Deplazes, P., Estrada, K., Fernandez, C., Holland, P.W., Hou, J., Hu, S., Huckvale, T., Hung, S.S., Kamenetzky, L., Keane, J.A., Kiss, F., Koziol, U., Lambert, O., Liu, K., Luo, X., Luo, Y., Macchiaroli, N., Nichol, S., Paps, J., Parkinson, J., Pouchkina-Stantcheva, N., Riddiford, N., Rosenzvit, M., Salinas, G., Wasmuth, J.D., Zamanian, M., Zheng, Y., Cai, X., Soberon, X., Olson, P.D., Laclette, J.P., Brehm, K. 
and Berriman, M. (2013) The genomes of four tapeworm species reveal adaptations to parasitism, Nature, 496, 57-63. Umbarger, H.E. (1978) Amino acid biosynthesis and its regulation, Annu Rev Biochem, 47, 532-606. Usaite, R., Patil, K.R., Grotkjaer, T., Nielsen, J. and Regenberg, B. (2006) Global transcriptional and physiological responses of Saccharomyces cerevisiae to ammonium, L-alanine, or L-glutamine limitation, Appl Environ Microbiol, 72, 6194-6203. Van de Peer, Y. and De Wachter, R. (1997) Evolutionary relationships among the eukaryotic crown taxa taking into account site-to-site rate variation in 18S rRNA, J Mol Evol, 45, 619-630. Van Deenen, L.L. and de Gier, J. (1975) Lipids of the red cell membrane. In, The red blood cell. Academic Press, New York, N.Y., 147-211. Van Der Ouderaa, F.J., Buytenhek, M., Nugteren, D.H. and Van Dorp, D.A. (1980) Acetylation of prostaglandin endoperoxide synthetase with acetylsalicylic acid, Eur J Biochem, 109, 1-8. van Dooren, G.G., Su, V., D'Ombrain, M.C. and McFadden, G.I. (2002) Processing of an apicoplast leader sequence in Plasmodium falciparum and the identification of a putative leader cleavage enzyme, J Biol Chem, 277, 23612-23619. van Grinsven, K.W., van Hellemond, J.J. and Tielens, A.G. (2009) Acetate:succinate CoA-transferase in the anaerobic mitochondria of Fasciola hepatica, Mol Biochem Parasitol, 164, 74-79. Varma, A. and Palsson, B.O. (1995) Parametric sensitivity of stoichiometric flux balance models applied to wild-type Escherichia coli metabolism, Biotechnol Bioeng, 45, 69-79.

Vavra, J. and Small, E.B. (1969) Scanning electron microscopy of gregarines (Protozoa, Sporozoa) and its contribution to the theory of gregarine movement, J Protozool, 16, 745-757. Vial, H.J. and Ancelin, M.L. (1992) Malarial lipids. An overview, Subcell Biochem, 18, 259-306. Vial, H.J. and Gorenflot, A. (2006) Chemotherapy against babesiosis, Vet Parasitol, 138, 147-160. Volkman, S.K., Neafsey, D.E., Schaffner, S.F., Park, D.J. and Wirth, D.F. (2012) Harnessing genomics and genome biology to understand malaria biology, Nat Rev Genet, 13, 315-328. Volkman, S.K., Sabeti, P.C., DeCaprio, D., Neafsey, D.E., Schaffner, S.F., Milner, D.A., Jr., Daily, J.P., Sarr, O., Ndiaye, D., Ndir, O., Mboup, S., Duraisingh, M.T., Lukens, A., Derr, A., Stange-Thomann, N., Waggoner, S., Onofrio, R., Ziaugra, L., Mauceli, E., Gnerre, S., Jaffe, D.B., Zainoun, J., Wiegand, R.C., Birren, B.W., Hartl, D.L., Galagan, J.E., Lander, E.S. and Wirth, D.F. (2007) A genome-wide map of diversity in Plasmodium falciparum, Nat Genet, 39, 113-119. von Itzstein, M., Wu, W.Y., Kok, G.B., Pegg, M.S., Dyason, J.C., Jin, B., Van Phan, T., Smythe, M.L., White, H.F., Oliver, S.W. and et al. (1993) Rational design of potent sialidase-based inhibitors of influenza virus replication, Nature, 363, 418-423. von Mering, C., Jensen, L.J., Snel, B., Hooper, S.D., Krupp, M., Foglierini, M., Jouffre, N., Huynen, M.A. and Bork, P. (2005) STRING: known and predicted protein-protein associations, integrated and transferred across organisms, Nucleic Acids Res, 33, D433-437. Wakaguri, H., Suzuki, Y., Katayama, T., Kawashima, S., Kibukawa, E., Hiranuka, K., Sasaki, M., Sugano, S. and Watanabe, J. (2009) Full-Malaria/Parasites and Full-Arthropods: databases of full-length cDNAs of parasites and arthropods, update 2009, Nucleic Acids Res, 37, D520-525. Waller, R.F., Keeling, P.J., Donald, R.G., Striepen, B., Handman, E., Lang-Unnasch, N., Cowman, A.F., Besra, G.S., Roos, D.S. and McFadden, G.I. 
(1998) Nuclear-encoded proteins target to the plastid in Toxoplasma gondii and Plasmodium falciparum, Proc Natl Acad Sci U S A, 95, 12352-12357. Wang, L., Birol, I. and Hatzimanikatis, V. (2004) Metabolic control analysis under uncertainty: framework development and case studies, Biophys J, 87, 3750-3763. Ward, G.E., Fujioka, H., Aikawa, M. and Miller, L.H. (1994) Staurosporine inhibits invasion of erythrocytes by malarial merozoites, Exp Parasitol, 79, 480-487. Wasmuth, J., Daub, J., Peregrin-Alvarez, J.M., Finney, C.A. and Parkinson, J. (2009) The origins of apicomplexan sequence innovation, Genome Res, 19, 1202-1213. Whitaker, J.W., Westhead, D.R. and McConkey, G.A. (2009) Alio intuitu: the automated reconstruction of the metabolic networks of parasites, Trends Parasitol, 25, 396-397. Wiwanitkit, V. (2007) Plasmodium and host lactate dehydrogenase molecular function and biological pathways: implication for antimalarial drug discovery, Chem Biol Drug Des, 69, 280-283. Wrenger, C., Eschbach, M.L., Muller, I.B., Laun, N.P., Begley, T.P. and Walter, R.D. (2006) Vitamin B1 de novo synthesis in the human malaria parasite Plasmodium falciparum depends on external provision of 4-amino-5-hydroxymethyl-2-methylpyrimidine, Biol Chem, 387, 41-51. Xiao, S., Feng, J., Guo, H. and Yao, M. (1995) Effects of mebendazole, albendazole and praziquantel on alanine aminotransferase and aspartate aminotransferase of Echinococcus granulosus cyst wall harbored in mice, Zhongguo Ji Sheng Chong Xue Yu Ji Sheng Chong Bing Za Zhi, 13, 107-110. Xiao, S.H., Feng, J.J., Guo, H.F., Jiao, P.Y., Yao, M.Y. and Jiao, W. (1993) Effects of mebendazole, albendazole, and praziquantel on succinate dehydrogenase, fumarate reductase, and malate dehydrogenase in Echinococcus granulosus cysts harbored in mice, Zhongguo Yao Li Xue Bao, 14, 151-154.
Xu, P., Widmer, G., Wang, Y., Ozaki, L.S., Alves, J.M., Serrano, M.G., Puiu, D., Manque, P., Akiyoshi, D., Mackey, A.J., Pearson, W.R., Dear, P.H., Bankier, A.T., Peterson, D.L., Abrahamsen, M.S., Kapur, V., Tzipori, S. and Buck, G.A. (2004) The genome of Cryptosporidium hominis, Nature, 431, 1107-1112. Yadav, A. and Gupta, S.K. (2001) Study of resistance against some ionophores in Eimeria tenella field isolates, Vet Parasitol, 102, 69-75. Yeh, I., Hanekamp, T., Tsoka, S., Karp, P.D. and Altman, R.B. (2004) Computational analysis of Plasmodium falciparum metabolism: organizing genomic information to facilitate drug discovery, Genome Res, 14, 917-924. Yip, K.Y., Yu, H., Kim, P.M., Schultz, M. and Gerstein, M. (2006) The tYNA platform for comparative interactomics: a web tool for managing, comparing and mining multiple networks, Bioinformatics, 22, 2968-2970. Young, J.A., Johnson, J.R., Benner, C., Yan, S.F., Chen, K., Le Roch, K.G., Zhou, Y. and Winzeler, E.A. (2008) In silico discovery of transcription regulatory elements in Plasmodium falciparum, BMC Genomics, 9, 70. Yu, H., Kim, P.M., Sprecher, E., Trifonov, V. and Gerstein, M. (2007) The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics, PLoS Comput Biol, 3, e59. Zhang, T., Zhang, H., Chen, K., Shen, S., Ruan, J. and Kurgan, L. (2008) Accurate sequence-based prediction of catalytic residues, Bioinformatics, 24, 2329-2338. Zhang, Y. and Meshnick, S.R. (1991) Inhibition of Plasmodium falciparum dihydropteroate synthetase and growth in vitro by sulfa drugs, Antimicrob Agents Chemother, 35, 267-271. Zhang, Y., Traskman-Bendz, L., Janelidze, S., Langenberg, P., Saleh, A., Constantine, N., Okusaga, O., Bay-Richter, C., Brundin, L. and Postolache, T.T. (2012) Toxoplasma gondii immunoglobulin G antibodies and nonfatal suicidal self-directed violence, J Clin Psychiatry, 73, 1069-1076.
Zhou, Y., Ramachandran, V., Kumar, K.A., Westenberger, S., Refour, P., Zhou, B., Li, F., Young, J.A., Chen, K., Plouffe, D., Henson, K., Nussenzweig, V., Carlton, J., Vinetz, J.M., Duraisingh, M.T. and Winzeler, E.A. (2008) Evidence-based annotation of the malaria parasite's genome using comparative expression profiling, PLoS One, 3, e1570. Zhu, G. (2004) Current progress in the fatty acid metabolism in Cryptosporidium parvum, J Eukaryot Microbiol, 51, 381-388. Zhu, G., Keithly, J.S. and Philippe, H. (2000) What is the phylogenetic position of Cryptosporidium?, Int J Syst Evol Microbiol, 50 Pt 4, 1673-1681. Zocher, K., Fritz-Wolf, K., Kehr, S., Fischer, M., Rahlfs, S. and Becker, K. (2012) Biochemical and structural characterization of Plasmodium falciparum glutamate dehydrogenase 2, Mol Biochem Parasitol, 183, 52-62.